* [PATCH v5 00/18] x86-64: Add vector math functions to libmvec
@ 2021-12-29  6:39 Sunil K Pandey
From: Sunil K Pandey @ 2021-12-29  6:39 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, andrey.kolesov, marius.cornea

Changes from v4:
-  Replace big negative RIP offset with a table lookup bias.

Changes from v3:
-  Remove more unused data table fields.

Changes from v2:
-  Include LOE (live on exit) register info.
-  Apply more peephole optimizations.
-  Optimize the load of an all-bits-set constant into a ZMM register.
-  Replace 3 kmovw + andl with a single kandw instruction.
-  Restructure data table and remove unused fields.
-  Fix data table and field alignment according to ISA.
-  Fix data offset according to ISA.
-  Remove exit call dead code.
-  Remove unnecessary save/restore.
-  Keep cfi_escape for callee-saved registers only.
-  Add DW_CFA_expression comments corresponding to each cfi_escape.
-  Define a macro for each numeric data table offset.
-  Replace numeric data table offsets with macro names (a short sketch of
   this cleanup follows the change log below).
-  Add data table structure definition as comments.
-  Restructure data table and add comments to each data field value.
-  Replace numeric sequential labels with meaningful label names.
-  Add more comments to labels as well as to call sites.
-  Replace internal special value processing paths with calls to standard
   scalar math functions; this makes the code more compact and aligned
   with the previous libmvec submission.

Changes from v1:
-  Add ISA-specific sections for all libmvec functions.
-  Add libmvec functions to math-vector-fortran.h.
-  Change labels to sequential numbering.
-  Fix function name in the GNU header plate.
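
To make the data table cleanup above concrete, here is a minimal
before/after sketch of the named-offset change.  The field names, offsets
and table symbol below are invented for illustration and are not taken
from the actual patches:

/* Hypothetical constant-table access in a .S file.  Before the cleanup,
   a field was addressed by a raw byte offset:
       vmovupd   64+__svml_example_data(%rip), %zmm3
   After the cleanup, each field has a named offset macro and the same
   access becomes self-describing:  */
#define _AbsMask  0   /* offset of the sign-clearing mask  */
#define _PiHigh   64  /* offset of the high half of pi     */
/*     vmovupd   _PiHigh+__svml_example_data(%rip), %zmm3  */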

This patch set implements the following vector math functions for libmvec,
with SSE, AVX, AVX2 and AVX512 versions, as per the vector ABI.  It also
contains accuracy, microbenchmark and ABI tests with regenerated ulps.  A
short usage sketch follows the function list below.

atan
atanf
asin
asinf
hypot
hypotf
exp2
exp2f
exp10
exp10f
cosh
coshf
expm1
expm1f
sinh
sinhf
cbrt
cbrtf
atan2
atan2f
log10
log10f
log2
log2f
log1p
log1pf
atanh
atanhf
acosh
acoshf
erf
erff
tanh
tanhf
asinh
asinhf
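
For reference, here is a minimal sketch of how a C program reaches the new
vector entry points.  The compiler flags, the -march choice and the mangled
name cited in the comment are illustrative assumptions, not part of this
series; atan stands in for any of the functions listed above:

/* Build with auto-vectorization and fast-math enabled, e.g. something
   like "gcc -O2 -march=skylake-avx512 -ffast-math test.c -lm", linking
   against glibc's libm/libmvec.  GCC may then compile the scalar atan
   calls in the loop below into calls to the vector variants added by
   this series, such as _ZGVeN8v_atan (the AVX512 8-lane double form).  */
#include <math.h>

#define N 1024

double in[N], out[N];

int
main (void)
{
  for (int i = 0; i < N; i++)
    in[i] = i * 0.001;

  /* Candidate loop for vectorization over the libmvec entry points.  */
  for (int i = 0; i < N; i++)
    out[i] = atan (in[i]);

  /* Use the results so the loop cannot be discarded.  */
  return out[N - 1] > 0.0 ? 0 : 1;
}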

Sunil K Pandey (18):
  x86-64: Add vector atan/atanf implementation to libmvec
  x86-64: Add vector asin/asinf implementation to libmvec
  x86-64: Add vector hypot/hypotf implementation to libmvec
  x86-64: Add vector exp2/exp2f implementation to libmvec
  x86-64: Add vector exp10/exp10f implementation to libmvec
  x86-64: Add vector cosh/coshf implementation to libmvec
  x86-64: Add vector expm1/expm1f implementation to libmvec
  x86-64: Add vector sinh/sinhf implementation to libmvec
  x86-64: Add vector cbrt/cbrtf implementation to libmvec
  x86-64: Add vector atan2/atan2f implementation to libmvec
  x86-64: Add vector log10/log10f implementation to libmvec
  x86-64: Add vector log2/log2f implementation to libmvec
  x86-64: Add vector log1p/log1pf implementation to libmvec
  x86-64: Add vector atanh/atanhf implementation to libmvec
  x86-64: Add vector acosh/acoshf implementation to libmvec
  x86-64: Add vector erf/erff implementation to libmvec
  x86-64: Add vector tanh/tanhf implementation to libmvec
  x86-64: Add vector asinh/asinhf implementation to libmvec

 bits/libm-simd-decl-stubs.h                   |  198 ++
 math/bits/mathcalls.h                         |   36 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |  144 ++
 sysdeps/x86/fpu/bits/math-vector.h            |   72 +
 .../x86/fpu/finclude/math-vector-fortran.h    |   72 +
 sysdeps/x86_64/fpu/Makeconfig                 |   18 +
 sysdeps/x86_64/fpu/Versions                   |   36 +
 sysdeps/x86_64/fpu/libm-test-ulps             |  352 ++++
 .../fpu/multiarch/svml_d_acosh2_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_acosh2_core.c |   27 +
 .../fpu/multiarch/svml_d_acosh2_core_sse4.S   | 1469 +++++++++++++++
 .../fpu/multiarch/svml_d_acosh4_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_acosh4_core.c |   27 +
 .../fpu/multiarch/svml_d_acosh4_core_avx2.S   | 1536 +++++++++++++++
 .../fpu/multiarch/svml_d_acosh8_core-avx2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_acosh8_core.c |   27 +
 .../fpu/multiarch/svml_d_acosh8_core_avx512.S |  480 +++++
 .../fpu/multiarch/svml_d_asin2_core-sse2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_asin2_core.c  |   27 +
 .../fpu/multiarch/svml_d_asin2_core_sse4.S    |  288 +++
 .../fpu/multiarch/svml_d_asin4_core-sse.S     |   20 +
 .../x86_64/fpu/multiarch/svml_d_asin4_core.c  |   27 +
 .../fpu/multiarch/svml_d_asin4_core_avx2.S    |  273 +++
 .../fpu/multiarch/svml_d_asin8_core-avx2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_asin8_core.c  |   27 +
 .../fpu/multiarch/svml_d_asin8_core_avx512.S  |  295 +++
 .../fpu/multiarch/svml_d_asinh2_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_asinh2_core.c |   27 +
 .../fpu/multiarch/svml_d_asinh2_core_sse4.S   | 1662 +++++++++++++++++
 .../fpu/multiarch/svml_d_asinh4_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_asinh4_core.c |   27 +
 .../fpu/multiarch/svml_d_asinh4_core_avx2.S   | 1601 ++++++++++++++++
 .../fpu/multiarch/svml_d_asinh8_core-avx2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_asinh8_core.c |   27 +
 .../fpu/multiarch/svml_d_asinh8_core_avx512.S |  510 +++++
 .../fpu/multiarch/svml_d_atan22_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_atan22_core.c |   28 +
 .../fpu/multiarch/svml_d_atan22_core_sse4.S   |  471 +++++
 .../fpu/multiarch/svml_d_atan24_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_atan24_core.c |   28 +
 .../fpu/multiarch/svml_d_atan24_core_avx2.S   |  451 +++++
 .../fpu/multiarch/svml_d_atan28_core-avx2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_atan28_core.c |   28 +
 .../fpu/multiarch/svml_d_atan28_core_avx512.S |  475 +++++
 .../fpu/multiarch/svml_d_atan2_core-sse2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_atan2_core.c  |   27 +
 .../fpu/multiarch/svml_d_atan2_core_sse4.S    |  245 +++
 .../fpu/multiarch/svml_d_atan4_core-sse.S     |   20 +
 .../x86_64/fpu/multiarch/svml_d_atan4_core.c  |   27 +
 .../fpu/multiarch/svml_d_atan4_core_avx2.S    |  225 +++
 .../fpu/multiarch/svml_d_atan8_core-avx2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_atan8_core.c  |   27 +
 .../fpu/multiarch/svml_d_atan8_core_avx512.S  |  213 +++
 .../fpu/multiarch/svml_d_atanh2_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_atanh2_core.c |   27 +
 .../fpu/multiarch/svml_d_atanh2_core_sse4.S   | 1519 +++++++++++++++
 .../fpu/multiarch/svml_d_atanh4_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_atanh4_core.c |   27 +
 .../fpu/multiarch/svml_d_atanh4_core_avx2.S   | 1479 +++++++++++++++
 .../fpu/multiarch/svml_d_atanh8_core-avx2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_atanh8_core.c |   27 +
 .../fpu/multiarch/svml_d_atanh8_core_avx512.S |  401 ++++
 .../fpu/multiarch/svml_d_cbrt2_core-sse2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_cbrt2_core.c  |   27 +
 .../fpu/multiarch/svml_d_cbrt2_core_sse4.S    |  467 +++++
 .../fpu/multiarch/svml_d_cbrt4_core-sse.S     |   20 +
 .../x86_64/fpu/multiarch/svml_d_cbrt4_core.c  |   27 +
 .../fpu/multiarch/svml_d_cbrt4_core_avx2.S    |  505 +++++
 .../fpu/multiarch/svml_d_cbrt8_core-avx2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_cbrt8_core.c  |   27 +
 .../fpu/multiarch/svml_d_cbrt8_core_avx512.S  |  253 +++
 .../fpu/multiarch/svml_d_cosh2_core-sse2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_cosh2_core.c  |   27 +
 .../fpu/multiarch/svml_d_cosh2_core_sse4.S    |  396 ++++
 .../fpu/multiarch/svml_d_cosh4_core-sse.S     |   20 +
 .../x86_64/fpu/multiarch/svml_d_cosh4_core.c  |   27 +
 .../fpu/multiarch/svml_d_cosh4_core_avx2.S    |  412 ++++
 .../fpu/multiarch/svml_d_cosh8_core-avx2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_cosh8_core.c  |   27 +
 .../fpu/multiarch/svml_d_cosh8_core_avx512.S  |  323 ++++
 .../fpu/multiarch/svml_d_erf2_core-sse2.S     |   20 +
 .../x86_64/fpu/multiarch/svml_d_erf2_core.c   |   27 +
 .../fpu/multiarch/svml_d_erf2_core_sse4.S     |  987 ++++++++++
 .../fpu/multiarch/svml_d_erf4_core-sse.S      |   20 +
 .../x86_64/fpu/multiarch/svml_d_erf4_core.c   |   27 +
 .../fpu/multiarch/svml_d_erf4_core_avx2.S     |  984 ++++++++++
 .../fpu/multiarch/svml_d_erf8_core-avx2.S     |   20 +
 .../x86_64/fpu/multiarch/svml_d_erf8_core.c   |   27 +
 .../fpu/multiarch/svml_d_erf8_core_avx512.S   |  983 ++++++++++
 .../fpu/multiarch/svml_d_exp102_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_exp102_core.c |   27 +
 .../fpu/multiarch/svml_d_exp102_core_sse4.S   |  418 +++++
 .../fpu/multiarch/svml_d_exp104_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_exp104_core.c |   27 +
 .../fpu/multiarch/svml_d_exp104_core_avx2.S   |  429 +++++
 .../fpu/multiarch/svml_d_exp108_core-avx2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_exp108_core.c |   27 +
 .../fpu/multiarch/svml_d_exp108_core_avx512.S |  287 +++
 .../fpu/multiarch/svml_d_exp22_core-sse2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_exp22_core.c  |   27 +
 .../fpu/multiarch/svml_d_exp22_core_sse4.S    |  325 ++++
 .../fpu/multiarch/svml_d_exp24_core-sse.S     |   20 +
 .../x86_64/fpu/multiarch/svml_d_exp24_core.c  |   27 +
 .../fpu/multiarch/svml_d_exp24_core_avx2.S    |  341 ++++
 .../fpu/multiarch/svml_d_exp28_core-avx2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_exp28_core.c  |   27 +
 .../fpu/multiarch/svml_d_exp28_core_avx512.S  |  301 +++
 .../fpu/multiarch/svml_d_expm12_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_expm12_core.c |   27 +
 .../fpu/multiarch/svml_d_expm12_core_sse4.S   |  421 +++++
 .../fpu/multiarch/svml_d_expm14_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_expm14_core.c |   27 +
 .../fpu/multiarch/svml_d_expm14_core_avx2.S   |  408 ++++
 .../fpu/multiarch/svml_d_expm18_core-avx2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_expm18_core.c |   27 +
 .../fpu/multiarch/svml_d_expm18_core_avx512.S |  334 ++++
 .../fpu/multiarch/svml_d_hypot2_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_hypot2_core.c |   28 +
 .../fpu/multiarch/svml_d_hypot2_core_sse4.S   |  279 +++
 .../fpu/multiarch/svml_d_hypot4_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_hypot4_core.c |   28 +
 .../fpu/multiarch/svml_d_hypot4_core_avx2.S   |  289 +++
 .../fpu/multiarch/svml_d_hypot8_core-avx2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_hypot8_core.c |   28 +
 .../fpu/multiarch/svml_d_hypot8_core_avx512.S |  235 +++
 .../fpu/multiarch/svml_d_log102_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_log102_core.c |   27 +
 .../fpu/multiarch/svml_d_log102_core_sse4.S   | 1089 +++++++++++
 .../fpu/multiarch/svml_d_log104_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_log104_core.c |   27 +
 .../fpu/multiarch/svml_d_log104_core_avx2.S   | 1074 +++++++++++
 .../fpu/multiarch/svml_d_log108_core-avx2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_log108_core.c |   27 +
 .../fpu/multiarch/svml_d_log108_core_avx512.S |  299 +++
 .../fpu/multiarch/svml_d_log1p2_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_log1p2_core.c |   27 +
 .../fpu/multiarch/svml_d_log1p2_core_sse4.S   | 1398 ++++++++++++++
 .../fpu/multiarch/svml_d_log1p4_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_log1p4_core.c |   27 +
 .../fpu/multiarch/svml_d_log1p4_core_avx2.S   | 1383 ++++++++++++++
 .../fpu/multiarch/svml_d_log1p8_core-avx2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_log1p8_core.c |   27 +
 .../fpu/multiarch/svml_d_log1p8_core_avx512.S |  317 ++++
 .../fpu/multiarch/svml_d_log22_core-sse2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_log22_core.c  |   27 +
 .../fpu/multiarch/svml_d_log22_core_sse4.S    | 1339 +++++++++++++
 .../fpu/multiarch/svml_d_log24_core-sse.S     |   20 +
 .../x86_64/fpu/multiarch/svml_d_log24_core.c  |   27 +
 .../fpu/multiarch/svml_d_log24_core_avx2.S    | 1324 +++++++++++++
 .../fpu/multiarch/svml_d_log28_core-avx2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_log28_core.c  |   27 +
 .../fpu/multiarch/svml_d_log28_core_avx512.S  |  293 +++
 .../fpu/multiarch/svml_d_sinh2_core-sse2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_sinh2_core.c  |   27 +
 .../fpu/multiarch/svml_d_sinh2_core_sse4.S    |  456 +++++
 .../fpu/multiarch/svml_d_sinh4_core-sse.S     |   20 +
 .../x86_64/fpu/multiarch/svml_d_sinh4_core.c  |   27 +
 .../fpu/multiarch/svml_d_sinh4_core_avx2.S    |  470 +++++
 .../fpu/multiarch/svml_d_sinh8_core-avx2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_sinh8_core.c  |   27 +
 .../fpu/multiarch/svml_d_sinh8_core_avx512.S  |  461 +++++
 .../fpu/multiarch/svml_d_tanh2_core-sse2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_tanh2_core.c  |   27 +
 .../fpu/multiarch/svml_d_tanh2_core_sse4.S    | 1272 +++++++++++++
 .../fpu/multiarch/svml_d_tanh4_core-sse.S     |   20 +
 .../x86_64/fpu/multiarch/svml_d_tanh4_core.c  |   27 +
 .../fpu/multiarch/svml_d_tanh4_core_avx2.S    | 1279 +++++++++++++
 .../fpu/multiarch/svml_d_tanh8_core-avx2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_tanh8_core.c  |   27 +
 .../fpu/multiarch/svml_d_tanh8_core_avx512.S  |  472 +++++
 .../fpu/multiarch/svml_s_acoshf16_core-avx2.S |   20 +
 .../fpu/multiarch/svml_s_acoshf16_core.c      |   28 +
 .../multiarch/svml_s_acoshf16_core_avx512.S   |  449 +++++
 .../fpu/multiarch/svml_s_acoshf4_core-sse2.S  |   20 +
 .../fpu/multiarch/svml_s_acoshf4_core.c       |   28 +
 .../fpu/multiarch/svml_s_acoshf4_core_sse4.S  |  389 ++++
 .../fpu/multiarch/svml_s_acoshf8_core-sse.S   |   20 +
 .../fpu/multiarch/svml_s_acoshf8_core.c       |   28 +
 .../fpu/multiarch/svml_s_acoshf8_core_avx2.S  |  370 ++++
 .../fpu/multiarch/svml_s_asinf16_core-avx2.S  |   20 +
 .../fpu/multiarch/svml_s_asinf16_core.c       |   28 +
 .../multiarch/svml_s_asinf16_core_avx512.S    |  260 +++
 .../fpu/multiarch/svml_s_asinf4_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_s_asinf4_core.c |   28 +
 .../fpu/multiarch/svml_s_asinf4_core_sse4.S   |  252 +++
 .../fpu/multiarch/svml_s_asinf8_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_s_asinf8_core.c |   28 +
 .../fpu/multiarch/svml_s_asinf8_core_avx2.S   |  249 +++
 .../fpu/multiarch/svml_s_asinhf16_core-avx2.S |   20 +
 .../fpu/multiarch/svml_s_asinhf16_core.c      |   28 +
 .../multiarch/svml_s_asinhf16_core_avx512.S   |  476 +++++
 .../fpu/multiarch/svml_s_asinhf4_core-sse2.S  |   20 +
 .../fpu/multiarch/svml_s_asinhf4_core.c       |   28 +
 .../fpu/multiarch/svml_s_asinhf4_core_sse4.S  |  509 +++++
 .../fpu/multiarch/svml_s_asinhf8_core-sse.S   |   20 +
 .../fpu/multiarch/svml_s_asinhf8_core.c       |   28 +
 .../fpu/multiarch/svml_s_asinhf8_core_avx2.S  |  457 +++++
 .../fpu/multiarch/svml_s_atan2f16_core-avx2.S |   20 +
 .../fpu/multiarch/svml_s_atan2f16_core.c      |   28 +
 .../multiarch/svml_s_atan2f16_core_avx512.S   |  399 ++++
 .../fpu/multiarch/svml_s_atan2f4_core-sse2.S  |   20 +
 .../fpu/multiarch/svml_s_atan2f4_core.c       |   28 +
 .../fpu/multiarch/svml_s_atan2f4_core_sse4.S  |  384 ++++
 .../fpu/multiarch/svml_s_atan2f8_core-sse.S   |   20 +
 .../fpu/multiarch/svml_s_atan2f8_core.c       |   28 +
 .../fpu/multiarch/svml_s_atan2f8_core_avx2.S  |  362 ++++
 .../fpu/multiarch/svml_s_atanf16_core-avx2.S  |   20 +
 .../fpu/multiarch/svml_s_atanf16_core.c       |   28 +
 .../multiarch/svml_s_atanf16_core_avx512.S    |  174 ++
 .../fpu/multiarch/svml_s_atanf4_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_s_atanf4_core.c |   28 +
 .../fpu/multiarch/svml_s_atanf4_core_sse4.S   |  164 ++
 .../fpu/multiarch/svml_s_atanf8_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_s_atanf8_core.c |   28 +
 .../fpu/multiarch/svml_s_atanf8_core_avx2.S   |  148 ++
 .../fpu/multiarch/svml_s_atanhf16_core-avx2.S |   20 +
 .../fpu/multiarch/svml_s_atanhf16_core.c      |   28 +
 .../multiarch/svml_s_atanhf16_core_avx512.S   |  393 ++++
 .../fpu/multiarch/svml_s_atanhf4_core-sse2.S  |   20 +
 .../fpu/multiarch/svml_s_atanhf4_core.c       |   28 +
 .../fpu/multiarch/svml_s_atanhf4_core_sse4.S  |  361 ++++
 .../fpu/multiarch/svml_s_atanhf8_core-sse.S   |   20 +
 .../fpu/multiarch/svml_s_atanhf8_core.c       |   28 +
 .../fpu/multiarch/svml_s_atanhf8_core_avx2.S  |  335 ++++
 .../fpu/multiarch/svml_s_cbrtf16_core-avx2.S  |   20 +
 .../fpu/multiarch/svml_s_cbrtf16_core.c       |   28 +
 .../multiarch/svml_s_cbrtf16_core_avx512.S    |  235 +++
 .../fpu/multiarch/svml_s_cbrtf4_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_s_cbrtf4_core.c |   28 +
 .../fpu/multiarch/svml_s_cbrtf4_core_sse4.S   |  490 +++++
 .../fpu/multiarch/svml_s_cbrtf8_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_s_cbrtf8_core.c |   28 +
 .../fpu/multiarch/svml_s_cbrtf8_core_avx2.S   |  509 +++++
 .../fpu/multiarch/svml_s_coshf16_core-avx2.S  |   20 +
 .../fpu/multiarch/svml_s_coshf16_core.c       |   28 +
 .../multiarch/svml_s_coshf16_core_avx512.S    |  321 ++++
 .../fpu/multiarch/svml_s_coshf4_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_s_coshf4_core.c |   28 +
 .../fpu/multiarch/svml_s_coshf4_core_sse4.S   |  305 +++
 .../fpu/multiarch/svml_s_coshf8_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_s_coshf8_core.c |   28 +
 .../fpu/multiarch/svml_s_coshf8_core_avx2.S   |  308 +++
 .../fpu/multiarch/svml_s_erff16_core-avx2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_s_erff16_core.c |   28 +
 .../fpu/multiarch/svml_s_erff16_core_avx512.S |  185 ++
 .../fpu/multiarch/svml_s_erff4_core-sse2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_s_erff4_core.c  |   28 +
 .../fpu/multiarch/svml_s_erff4_core_sse4.S    |  664 +++++++
 .../fpu/multiarch/svml_s_erff8_core-sse.S     |   20 +
 .../x86_64/fpu/multiarch/svml_s_erff8_core.c  |   28 +
 .../fpu/multiarch/svml_s_erff8_core_avx2.S    |  669 +++++++
 .../fpu/multiarch/svml_s_exp10f16_core-avx2.S |   20 +
 .../fpu/multiarch/svml_s_exp10f16_core.c      |   28 +
 .../multiarch/svml_s_exp10f16_core_avx512.S   |  269 +++
 .../fpu/multiarch/svml_s_exp10f4_core-sse2.S  |   20 +
 .../fpu/multiarch/svml_s_exp10f4_core.c       |   28 +
 .../fpu/multiarch/svml_s_exp10f4_core_sse4.S  |  311 +++
 .../fpu/multiarch/svml_s_exp10f8_core-sse.S   |   20 +
 .../fpu/multiarch/svml_s_exp10f8_core.c       |   28 +
 .../fpu/multiarch/svml_s_exp10f8_core_avx2.S  |  331 ++++
 .../fpu/multiarch/svml_s_exp2f16_core-avx2.S  |   20 +
 .../fpu/multiarch/svml_s_exp2f16_core.c       |   28 +
 .../multiarch/svml_s_exp2f16_core_avx512.S    |  271 +++
 .../fpu/multiarch/svml_s_exp2f4_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_s_exp2f4_core.c |   28 +
 .../fpu/multiarch/svml_s_exp2f4_core_sse4.S   |  238 +++
 .../fpu/multiarch/svml_s_exp2f8_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_s_exp2f8_core.c |   28 +
 .../fpu/multiarch/svml_s_exp2f8_core_avx2.S   |  245 +++
 .../fpu/multiarch/svml_s_expm1f16_core-avx2.S |   20 +
 .../fpu/multiarch/svml_s_expm1f16_core.c      |   28 +
 .../multiarch/svml_s_expm1f16_core_avx512.S   |  281 +++
 .../fpu/multiarch/svml_s_expm1f4_core-sse2.S  |   20 +
 .../fpu/multiarch/svml_s_expm1f4_core.c       |   28 +
 .../fpu/multiarch/svml_s_expm1f4_core_sse4.S  |  358 ++++
 .../fpu/multiarch/svml_s_expm1f8_core-sse.S   |   20 +
 .../fpu/multiarch/svml_s_expm1f8_core.c       |   28 +
 .../fpu/multiarch/svml_s_expm1f8_core_avx2.S  |  351 ++++
 .../fpu/multiarch/svml_s_hypotf16_core-avx2.S |   20 +
 .../fpu/multiarch/svml_s_hypotf16_core.c      |   28 +
 .../multiarch/svml_s_hypotf16_core_avx512.S   |  239 +++
 .../fpu/multiarch/svml_s_hypotf4_core-sse2.S  |   20 +
 .../fpu/multiarch/svml_s_hypotf4_core.c       |   28 +
 .../fpu/multiarch/svml_s_hypotf4_core_sse4.S  |  265 +++
 .../fpu/multiarch/svml_s_hypotf8_core-sse.S   |   20 +
 .../fpu/multiarch/svml_s_hypotf8_core.c       |   28 +
 .../fpu/multiarch/svml_s_hypotf8_core_avx2.S  |  269 +++
 .../fpu/multiarch/svml_s_log10f16_core-avx2.S |   20 +
 .../fpu/multiarch/svml_s_log10f16_core.c      |   28 +
 .../multiarch/svml_s_log10f16_core_avx512.S   |  238 +++
 .../fpu/multiarch/svml_s_log10f4_core-sse2.S  |   20 +
 .../fpu/multiarch/svml_s_log10f4_core.c       |   28 +
 .../fpu/multiarch/svml_s_log10f4_core_sse4.S  |  243 +++
 .../fpu/multiarch/svml_s_log10f8_core-sse.S   |   20 +
 .../fpu/multiarch/svml_s_log10f8_core.c       |   28 +
 .../fpu/multiarch/svml_s_log10f8_core_avx2.S  |  243 +++
 .../fpu/multiarch/svml_s_log1pf16_core-avx2.S |   20 +
 .../fpu/multiarch/svml_s_log1pf16_core.c      |   28 +
 .../multiarch/svml_s_log1pf16_core_avx512.S   |  271 +++
 .../fpu/multiarch/svml_s_log1pf4_core-sse2.S  |   20 +
 .../fpu/multiarch/svml_s_log1pf4_core.c       |   28 +
 .../fpu/multiarch/svml_s_log1pf4_core_sse4.S  |  252 +++
 .../fpu/multiarch/svml_s_log1pf8_core-sse.S   |   20 +
 .../fpu/multiarch/svml_s_log1pf8_core.c       |   28 +
 .../fpu/multiarch/svml_s_log1pf8_core_avx2.S  |  254 +++
 .../fpu/multiarch/svml_s_log2f16_core-avx2.S  |   20 +
 .../fpu/multiarch/svml_s_log2f16_core.c       |   28 +
 .../multiarch/svml_s_log2f16_core_avx512.S    |  231 +++
 .../fpu/multiarch/svml_s_log2f4_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_s_log2f4_core.c |   28 +
 .../fpu/multiarch/svml_s_log2f4_core_sse4.S   |  223 +++
 .../fpu/multiarch/svml_s_log2f8_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_s_log2f8_core.c |   28 +
 .../fpu/multiarch/svml_s_log2f8_core_avx2.S   |  226 +++
 .../fpu/multiarch/svml_s_sinhf16_core-avx2.S  |   20 +
 .../fpu/multiarch/svml_s_sinhf16_core.c       |   28 +
 .../multiarch/svml_s_sinhf16_core_avx512.S    |  318 ++++
 .../fpu/multiarch/svml_s_sinhf4_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_s_sinhf4_core.c |   28 +
 .../fpu/multiarch/svml_s_sinhf4_core_sse4.S   |  308 +++
 .../fpu/multiarch/svml_s_sinhf8_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_s_sinhf8_core.c |   28 +
 .../fpu/multiarch/svml_s_sinhf8_core_avx2.S   |  309 +++
 .../fpu/multiarch/svml_s_tanhf16_core-avx2.S  |   20 +
 .../fpu/multiarch/svml_s_tanhf16_core.c       |   28 +
 .../multiarch/svml_s_tanhf16_core_avx512.S    |  381 ++++
 .../fpu/multiarch/svml_s_tanhf4_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_s_tanhf4_core.c |   28 +
 .../fpu/multiarch/svml_s_tanhf4_core_sse4.S   |  832 +++++++++
 .../fpu/multiarch/svml_s_tanhf8_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_s_tanhf8_core.c |   28 +
 .../fpu/multiarch/svml_s_tanhf8_core_avx2.S   |  844 +++++++++
 sysdeps/x86_64/fpu/svml_d_acosh2_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_acosh4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_acosh4_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_d_acosh8_core.S       |   25 +
 sysdeps/x86_64/fpu/svml_d_asin2_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_asin4_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_asin4_core_avx.S    |   25 +
 sysdeps/x86_64/fpu/svml_d_asin8_core.S        |   25 +
 sysdeps/x86_64/fpu/svml_d_asinh2_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_asinh4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_asinh4_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_d_asinh8_core.S       |   25 +
 sysdeps/x86_64/fpu/svml_d_atan22_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_atan24_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_atan24_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_d_atan28_core.S       |   25 +
 sysdeps/x86_64/fpu/svml_d_atan2_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_atan4_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_atan4_core_avx.S    |   25 +
 sysdeps/x86_64/fpu/svml_d_atan8_core.S        |   25 +
 sysdeps/x86_64/fpu/svml_d_atanh2_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_atanh4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_atanh4_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_d_atanh8_core.S       |   25 +
 sysdeps/x86_64/fpu/svml_d_cbrt2_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_cbrt4_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_cbrt4_core_avx.S    |   25 +
 sysdeps/x86_64/fpu/svml_d_cbrt8_core.S        |   25 +
 sysdeps/x86_64/fpu/svml_d_cosh2_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_cosh4_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_cosh4_core_avx.S    |   25 +
 sysdeps/x86_64/fpu/svml_d_cosh8_core.S        |   25 +
 sysdeps/x86_64/fpu/svml_d_erf2_core.S         |   29 +
 sysdeps/x86_64/fpu/svml_d_erf4_core.S         |   29 +
 sysdeps/x86_64/fpu/svml_d_erf4_core_avx.S     |   25 +
 sysdeps/x86_64/fpu/svml_d_erf8_core.S         |   25 +
 sysdeps/x86_64/fpu/svml_d_exp102_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_exp104_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_exp104_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_d_exp108_core.S       |   25 +
 sysdeps/x86_64/fpu/svml_d_exp22_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_exp24_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_exp24_core_avx.S    |   25 +
 sysdeps/x86_64/fpu/svml_d_exp28_core.S        |   25 +
 sysdeps/x86_64/fpu/svml_d_expm12_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_expm14_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_expm14_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_d_expm18_core.S       |   25 +
 sysdeps/x86_64/fpu/svml_d_hypot2_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_hypot4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_hypot4_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_d_hypot8_core.S       |   25 +
 sysdeps/x86_64/fpu/svml_d_log102_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_log104_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_log104_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_d_log108_core.S       |   25 +
 sysdeps/x86_64/fpu/svml_d_log1p2_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_log1p4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_d_log1p8_core.S       |   25 +
 sysdeps/x86_64/fpu/svml_d_log22_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_log24_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_log24_core_avx.S    |   25 +
 sysdeps/x86_64/fpu/svml_d_log28_core.S        |   25 +
 sysdeps/x86_64/fpu/svml_d_sinh2_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_sinh4_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_sinh4_core_avx.S    |   25 +
 sysdeps/x86_64/fpu/svml_d_sinh8_core.S        |   25 +
 sysdeps/x86_64/fpu/svml_d_tanh2_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_tanh4_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_tanh4_core_avx.S    |   25 +
 sysdeps/x86_64/fpu/svml_d_tanh8_core.S        |   25 +
 sysdeps/x86_64/fpu/svml_s_acoshf16_core.S     |   25 +
 sysdeps/x86_64/fpu/svml_s_acoshf4_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_acoshf8_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_acoshf8_core_avx.S  |   25 +
 sysdeps/x86_64/fpu/svml_s_asinf16_core.S      |   25 +
 sysdeps/x86_64/fpu/svml_s_asinf4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_asinf8_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_asinf8_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_s_asinhf16_core.S     |   25 +
 sysdeps/x86_64/fpu/svml_s_asinhf4_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_asinhf8_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_asinhf8_core_avx.S  |   25 +
 sysdeps/x86_64/fpu/svml_s_atan2f16_core.S     |   25 +
 sysdeps/x86_64/fpu/svml_s_atan2f4_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_atan2f8_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_atan2f8_core_avx.S  |   25 +
 sysdeps/x86_64/fpu/svml_s_atanf16_core.S      |   25 +
 sysdeps/x86_64/fpu/svml_s_atanf4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_atanf8_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_atanf8_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_s_atanhf16_core.S     |   25 +
 sysdeps/x86_64/fpu/svml_s_atanhf4_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_atanhf8_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_atanhf8_core_avx.S  |   25 +
 sysdeps/x86_64/fpu/svml_s_cbrtf16_core.S      |   25 +
 sysdeps/x86_64/fpu/svml_s_cbrtf4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_cbrtf8_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_cbrtf8_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_s_coshf16_core.S      |   25 +
 sysdeps/x86_64/fpu/svml_s_coshf4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_coshf8_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_coshf8_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_s_erff16_core.S       |   25 +
 sysdeps/x86_64/fpu/svml_s_erff4_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_s_erff8_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_s_erff8_core_avx.S    |   25 +
 sysdeps/x86_64/fpu/svml_s_exp10f16_core.S     |   25 +
 sysdeps/x86_64/fpu/svml_s_exp10f4_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_exp10f8_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_exp10f8_core_avx.S  |   25 +
 sysdeps/x86_64/fpu/svml_s_exp2f16_core.S      |   25 +
 sysdeps/x86_64/fpu/svml_s_exp2f4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_exp2f8_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_exp2f8_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_s_expm1f16_core.S     |   25 +
 sysdeps/x86_64/fpu/svml_s_expm1f4_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_expm1f8_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_expm1f8_core_avx.S  |   25 +
 sysdeps/x86_64/fpu/svml_s_hypotf16_core.S     |   25 +
 sysdeps/x86_64/fpu/svml_s_hypotf4_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_hypotf8_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_hypotf8_core_avx.S  |   25 +
 sysdeps/x86_64/fpu/svml_s_log10f16_core.S     |   25 +
 sysdeps/x86_64/fpu/svml_s_log10f4_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_log10f8_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_log10f8_core_avx.S  |   25 +
 sysdeps/x86_64/fpu/svml_s_log1pf16_core.S     |   25 +
 sysdeps/x86_64/fpu/svml_s_log1pf4_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_log1pf8_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S  |   25 +
 sysdeps/x86_64/fpu/svml_s_log2f16_core.S      |   25 +
 sysdeps/x86_64/fpu/svml_s_log2f4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_log2f8_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_log2f8_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_s_sinhf16_core.S      |   25 +
 sysdeps/x86_64/fpu/svml_s_sinhf4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_sinhf8_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_sinhf8_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_s_tanhf16_core.S      |   25 +
 sysdeps/x86_64/fpu/svml_s_tanhf4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_tanhf8_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_tanhf8_core_avx.S   |   25 +
 .../fpu/test-double-libmvec-acosh-avx.c       |    1 +
 .../fpu/test-double-libmvec-acosh-avx2.c      |    1 +
 .../fpu/test-double-libmvec-acosh-avx512f.c   |    1 +
 .../x86_64/fpu/test-double-libmvec-acosh.c    |    3 +
 .../x86_64/fpu/test-double-libmvec-asin-avx.c |    1 +
 .../fpu/test-double-libmvec-asin-avx2.c       |    1 +
 .../fpu/test-double-libmvec-asin-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-double-libmvec-asin.c |    3 +
 .../fpu/test-double-libmvec-asinh-avx.c       |    1 +
 .../fpu/test-double-libmvec-asinh-avx2.c      |    1 +
 .../fpu/test-double-libmvec-asinh-avx512f.c   |    1 +
 .../x86_64/fpu/test-double-libmvec-asinh.c    |    3 +
 .../x86_64/fpu/test-double-libmvec-atan-avx.c |    1 +
 .../fpu/test-double-libmvec-atan-avx2.c       |    1 +
 .../fpu/test-double-libmvec-atan-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-double-libmvec-atan.c |    3 +
 .../fpu/test-double-libmvec-atan2-avx.c       |    1 +
 .../fpu/test-double-libmvec-atan2-avx2.c      |    1 +
 .../fpu/test-double-libmvec-atan2-avx512f.c   |    1 +
 .../x86_64/fpu/test-double-libmvec-atan2.c    |    3 +
 .../fpu/test-double-libmvec-atanh-avx.c       |    1 +
 .../fpu/test-double-libmvec-atanh-avx2.c      |    1 +
 .../fpu/test-double-libmvec-atanh-avx512f.c   |    1 +
 .../x86_64/fpu/test-double-libmvec-atanh.c    |    3 +
 .../x86_64/fpu/test-double-libmvec-cbrt-avx.c |    1 +
 .../fpu/test-double-libmvec-cbrt-avx2.c       |    1 +
 .../fpu/test-double-libmvec-cbrt-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-double-libmvec-cbrt.c |    3 +
 .../x86_64/fpu/test-double-libmvec-cosh-avx.c |    1 +
 .../fpu/test-double-libmvec-cosh-avx2.c       |    1 +
 .../fpu/test-double-libmvec-cosh-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-double-libmvec-cosh.c |    3 +
 .../x86_64/fpu/test-double-libmvec-erf-avx.c  |    1 +
 .../x86_64/fpu/test-double-libmvec-erf-avx2.c |    1 +
 .../fpu/test-double-libmvec-erf-avx512f.c     |    1 +
 sysdeps/x86_64/fpu/test-double-libmvec-erf.c  |    3 +
 .../fpu/test-double-libmvec-exp10-avx.c       |    1 +
 .../fpu/test-double-libmvec-exp10-avx2.c      |    1 +
 .../fpu/test-double-libmvec-exp10-avx512f.c   |    1 +
 .../x86_64/fpu/test-double-libmvec-exp10.c    |    3 +
 .../x86_64/fpu/test-double-libmvec-exp2-avx.c |    1 +
 .../fpu/test-double-libmvec-exp2-avx2.c       |    1 +
 .../fpu/test-double-libmvec-exp2-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-double-libmvec-exp2.c |    3 +
 .../fpu/test-double-libmvec-expm1-avx.c       |    1 +
 .../fpu/test-double-libmvec-expm1-avx2.c      |    1 +
 .../fpu/test-double-libmvec-expm1-avx512f.c   |    1 +
 .../x86_64/fpu/test-double-libmvec-expm1.c    |    3 +
 .../fpu/test-double-libmvec-hypot-avx.c       |    1 +
 .../fpu/test-double-libmvec-hypot-avx2.c      |    1 +
 .../fpu/test-double-libmvec-hypot-avx512f.c   |    1 +
 .../x86_64/fpu/test-double-libmvec-hypot.c    |    3 +
 .../fpu/test-double-libmvec-log10-avx.c       |    1 +
 .../fpu/test-double-libmvec-log10-avx2.c      |    1 +
 .../fpu/test-double-libmvec-log10-avx512f.c   |    1 +
 .../x86_64/fpu/test-double-libmvec-log10.c    |    3 +
 .../fpu/test-double-libmvec-log1p-avx.c       |    1 +
 .../fpu/test-double-libmvec-log1p-avx2.c      |    1 +
 .../fpu/test-double-libmvec-log1p-avx512f.c   |    1 +
 .../x86_64/fpu/test-double-libmvec-log1p.c    |    3 +
 .../x86_64/fpu/test-double-libmvec-log2-avx.c |    1 +
 .../fpu/test-double-libmvec-log2-avx2.c       |    1 +
 .../fpu/test-double-libmvec-log2-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-double-libmvec-log2.c |    3 +
 .../x86_64/fpu/test-double-libmvec-sinh-avx.c |    1 +
 .../fpu/test-double-libmvec-sinh-avx2.c       |    1 +
 .../fpu/test-double-libmvec-sinh-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-double-libmvec-sinh.c |    3 +
 .../x86_64/fpu/test-double-libmvec-tanh-avx.c |    1 +
 .../fpu/test-double-libmvec-tanh-avx2.c       |    1 +
 .../fpu/test-double-libmvec-tanh-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-double-libmvec-tanh.c |    3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |   18 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |   18 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |   18 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |   18 +
 .../fpu/test-float-libmvec-acoshf-avx.c       |    1 +
 .../fpu/test-float-libmvec-acoshf-avx2.c      |    1 +
 .../fpu/test-float-libmvec-acoshf-avx512f.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-acoshf.c    |    3 +
 .../x86_64/fpu/test-float-libmvec-asinf-avx.c |    1 +
 .../fpu/test-float-libmvec-asinf-avx2.c       |    1 +
 .../fpu/test-float-libmvec-asinf-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-float-libmvec-asinf.c |    3 +
 .../fpu/test-float-libmvec-asinhf-avx.c       |    1 +
 .../fpu/test-float-libmvec-asinhf-avx2.c      |    1 +
 .../fpu/test-float-libmvec-asinhf-avx512f.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-asinhf.c    |    3 +
 .../fpu/test-float-libmvec-atan2f-avx.c       |    1 +
 .../fpu/test-float-libmvec-atan2f-avx2.c      |    1 +
 .../fpu/test-float-libmvec-atan2f-avx512f.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-atan2f.c    |    3 +
 .../x86_64/fpu/test-float-libmvec-atanf-avx.c |    1 +
 .../fpu/test-float-libmvec-atanf-avx2.c       |    1 +
 .../fpu/test-float-libmvec-atanf-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-float-libmvec-atanf.c |    3 +
 .../fpu/test-float-libmvec-atanhf-avx.c       |    1 +
 .../fpu/test-float-libmvec-atanhf-avx2.c      |    1 +
 .../fpu/test-float-libmvec-atanhf-avx512f.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-atanhf.c    |    3 +
 .../x86_64/fpu/test-float-libmvec-cbrtf-avx.c |    1 +
 .../fpu/test-float-libmvec-cbrtf-avx2.c       |    1 +
 .../fpu/test-float-libmvec-cbrtf-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-float-libmvec-cbrtf.c |    3 +
 .../x86_64/fpu/test-float-libmvec-coshf-avx.c |    1 +
 .../fpu/test-float-libmvec-coshf-avx2.c       |    1 +
 .../fpu/test-float-libmvec-coshf-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-float-libmvec-coshf.c |    3 +
 .../x86_64/fpu/test-float-libmvec-erff-avx.c  |    1 +
 .../x86_64/fpu/test-float-libmvec-erff-avx2.c |    1 +
 .../fpu/test-float-libmvec-erff-avx512f.c     |    1 +
 sysdeps/x86_64/fpu/test-float-libmvec-erff.c  |    3 +
 .../fpu/test-float-libmvec-exp10f-avx.c       |    1 +
 .../fpu/test-float-libmvec-exp10f-avx2.c      |    1 +
 .../fpu/test-float-libmvec-exp10f-avx512f.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-exp10f.c    |    3 +
 .../x86_64/fpu/test-float-libmvec-exp2f-avx.c |    1 +
 .../fpu/test-float-libmvec-exp2f-avx2.c       |    1 +
 .../fpu/test-float-libmvec-exp2f-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-float-libmvec-exp2f.c |    3 +
 .../fpu/test-float-libmvec-expm1f-avx.c       |    1 +
 .../fpu/test-float-libmvec-expm1f-avx2.c      |    1 +
 .../fpu/test-float-libmvec-expm1f-avx512f.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-expm1f.c    |    3 +
 .../fpu/test-float-libmvec-hypotf-avx.c       |    1 +
 .../fpu/test-float-libmvec-hypotf-avx2.c      |    1 +
 .../fpu/test-float-libmvec-hypotf-avx512f.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-hypotf.c    |    3 +
 .../fpu/test-float-libmvec-log10f-avx.c       |    1 +
 .../fpu/test-float-libmvec-log10f-avx2.c      |    1 +
 .../fpu/test-float-libmvec-log10f-avx512f.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-log10f.c    |    3 +
 .../fpu/test-float-libmvec-log1pf-avx.c       |    1 +
 .../fpu/test-float-libmvec-log1pf-avx2.c      |    1 +
 .../fpu/test-float-libmvec-log1pf-avx512f.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-log1pf.c    |    3 +
 .../x86_64/fpu/test-float-libmvec-log2f-avx.c |    1 +
 .../fpu/test-float-libmvec-log2f-avx2.c       |    1 +
 .../fpu/test-float-libmvec-log2f-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-float-libmvec-log2f.c |    3 +
 .../x86_64/fpu/test-float-libmvec-sinhf-avx.c |    1 +
 .../fpu/test-float-libmvec-sinhf-avx2.c       |    1 +
 .../fpu/test-float-libmvec-sinhf-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-float-libmvec-sinhf.c |    3 +
 .../x86_64/fpu/test-float-libmvec-tanhf-avx.c |    1 +
 .../fpu/test-float-libmvec-tanhf-avx2.c       |    1 +
 .../fpu/test-float-libmvec-tanhf-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-float-libmvec-tanhf.c |    3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |   18 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |   18 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |   18 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |   18 +
 628 files changed, 64608 insertions(+), 18 deletions(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log102_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log102_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log102_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log104_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log104_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log104_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log108_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log108_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log108_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log22_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log22_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log22_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log24_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log24_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log24_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log28_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log28_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log28_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_acosh2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_acosh4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_acosh4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_acosh8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_asin2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_asin4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_asin4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_asin8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_asinh2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_asinh4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_asinh4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_asinh8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atan22_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atan24_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atan24_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atan28_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atan2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atan4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atan4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atan8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atanh2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atanh4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atanh4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atanh8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_cbrt2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_cbrt4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_cbrt4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_cbrt8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_cosh2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_cosh4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_cosh4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_cosh8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_erf2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_erf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_erf4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_erf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_exp102_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_exp104_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_exp104_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_exp108_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_exp22_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_exp24_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_exp24_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_exp28_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_expm12_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_expm14_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_expm14_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_expm18_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_hypot2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_hypot4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_hypot4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_hypot8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log102_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log104_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log104_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log108_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log1p2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log1p4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log1p8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log22_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log24_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log24_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log28_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_sinh2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_sinh4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_sinh4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_sinh8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_tanh2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_tanh4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_tanh4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_tanh8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_acoshf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_acoshf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_acoshf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_acoshf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_asinf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_asinf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_asinf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_asinf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_asinhf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_asinhf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_asinhf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_asinhf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atan2f16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atan2f4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atan2f8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atan2f8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atanf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atanf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atanf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atanf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atanhf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atanhf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atanhf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atanhf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_cbrtf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_cbrtf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_cbrtf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_cbrtf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_coshf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_coshf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_coshf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_coshf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_erff16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_erff4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_erff8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_erff8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_exp10f16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_exp10f4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_exp10f8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_exp10f8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_exp2f16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_exp2f4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_exp2f8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_exp2f8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_expm1f16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_expm1f4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_expm1f8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_expm1f8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_hypotf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_hypotf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_hypotf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_hypotf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log10f16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log10f4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log10f8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log10f8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log1pf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log1pf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log1pf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log2f16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log2f4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log2f8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log2f8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_sinhf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_sinhf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_sinhf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_sinhf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_tanhf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_tanhf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_tanhf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_tanhf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acosh.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asin-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asin-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asin-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asin.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asinh.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atanh.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cbrt.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cosh.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erf.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp10.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-expm1.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-hypot.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log10-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log10-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log10-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log10.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log1p.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log2-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log2-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log2-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-sinh.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tanh.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acoshf.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinf.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinhf.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atan2f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanf.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanhf.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-cbrtf.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-coshf.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erff-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erff-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erff-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erff.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp10f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp2f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-expm1f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-hypotf.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log10f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log1pf.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log2f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-sinhf.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanhf.c

-- 
2.31.1



* [PATCH v5 01/18] x86-64: Add vector atan/atanf implementation to libmvec
  2021-12-29  6:39 [PATCH v5 00/18] x86-64: Add vector math functions to libmvec Sunil K Pandey
@ 2021-12-29  6:39 ` Sunil K Pandey
  2021-12-29 21:25   ` H.J. Lu
  2021-12-29  6:39 ` [PATCH v5 02/18] x86-64: Add vector asin/asinf " Sunil K Pandey
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Sunil K Pandey @ 2021-12-29  6:39 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, andrey.kolesov, marius.cornea

Implement vectorized atan/atanf with SSE, AVX, AVX2 and AVX512 versions
for libmvec, as per the vector ABI.  The patch also adds accuracy and ABI
tests for vector atan/atanf, with regenerated ulps.
---
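
The new symbols follow the x86_64 vector function ABI naming
_ZGV<isa>N<lanes>v_<func> (b = SSE, c = AVX, d = AVX2, e = AVX512), and are
normally reached through compiler auto-vectorization rather than by calling
the mangled names directly.  A minimal usage sketch, assuming GCC with
-ffast-math so that <math.h> exposes the SIMD declarations, and explicit
linking against libmvec (exact flags vary by compiler version; the file and
symbol names in the comments are illustrative only):

  /* atan-vec.c: illustrative only, not part of this patch.
     Build (roughly): gcc -O3 -ffast-math atan-vec.c -lmvec -lm  */
  #include <math.h>
  #include <stdio.h>

  #define N 1024

  int
  main (void)
  {
    static double x[N], y[N];

    for (int i = 0; i < N; i++)
      x[i] = (double) i / N;

    /* With atan declared as a SIMD function, GCC may vectorize this
       loop into calls such as _ZGVdN4v_atan on AVX2 hardware.  */
    for (int i = 0; i < N; i++)
      y[i] = atan (x[i]);

    printf ("%g %g\n", y[0], y[N - 1]);
    return 0;
  }
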
 bits/libm-simd-decl-stubs.h                   |  11 +
 math/bits/mathcalls.h                         |   2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
 sysdeps/x86/fpu/bits/math-vector.h            |   4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
 sysdeps/x86_64/fpu/Makeconfig                 |   1 +
 sysdeps/x86_64/fpu/Versions                   |   2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |  20 ++
 .../fpu/multiarch/svml_d_atan2_core-sse2.S    |  20 ++
 .../x86_64/fpu/multiarch/svml_d_atan2_core.c  |  27 ++
 .../fpu/multiarch/svml_d_atan2_core_sse4.S    | 245 ++++++++++++++++++
 .../fpu/multiarch/svml_d_atan4_core-sse.S     |  20 ++
 .../x86_64/fpu/multiarch/svml_d_atan4_core.c  |  27 ++
 .../fpu/multiarch/svml_d_atan4_core_avx2.S    | 225 ++++++++++++++++
 .../fpu/multiarch/svml_d_atan8_core-avx2.S    |  20 ++
 .../x86_64/fpu/multiarch/svml_d_atan8_core.c  |  27 ++
 .../fpu/multiarch/svml_d_atan8_core_avx512.S  | 213 +++++++++++++++
 .../fpu/multiarch/svml_s_atanf16_core-avx2.S  |  20 ++
 .../fpu/multiarch/svml_s_atanf16_core.c       |  28 ++
 .../multiarch/svml_s_atanf16_core_avx512.S    | 174 +++++++++++++
 .../fpu/multiarch/svml_s_atanf4_core-sse2.S   |  20 ++
 .../x86_64/fpu/multiarch/svml_s_atanf4_core.c |  28 ++
 .../fpu/multiarch/svml_s_atanf4_core_sse4.S   | 164 ++++++++++++
 .../fpu/multiarch/svml_s_atanf8_core-sse.S    |  20 ++
 .../x86_64/fpu/multiarch/svml_s_atanf8_core.c |  28 ++
 .../fpu/multiarch/svml_s_atanf8_core_avx2.S   | 148 +++++++++++
 sysdeps/x86_64/fpu/svml_d_atan2_core.S        |  29 +++
 sysdeps/x86_64/fpu/svml_d_atan4_core.S        |  29 +++
 sysdeps/x86_64/fpu/svml_d_atan4_core_avx.S    |  25 ++
 sysdeps/x86_64/fpu/svml_d_atan8_core.S        |  25 ++
 sysdeps/x86_64/fpu/svml_s_atanf16_core.S      |  25 ++
 sysdeps/x86_64/fpu/svml_s_atanf4_core.S       |  29 +++
 sysdeps/x86_64/fpu/svml_s_atanf8_core.S       |  29 +++
 sysdeps/x86_64/fpu/svml_s_atanf8_core_avx.S   |  25 ++
 .../x86_64/fpu/test-double-libmvec-atan-avx.c |   1 +
 .../fpu/test-double-libmvec-atan-avx2.c       |   1 +
 .../fpu/test-double-libmvec-atan-avx512f.c    |   1 +
 sysdeps/x86_64/fpu/test-double-libmvec-atan.c |   3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-libmvec-atanf-avx.c |   1 +
 .../fpu/test-float-libmvec-atanf-avx2.c       |   1 +
 .../fpu/test-float-libmvec-atanf-avx512f.c    |   1 +
 sysdeps/x86_64/fpu/test-float-libmvec-atanf.c |   3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
 50 files changed, 1741 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atan2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atan4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atan4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atan8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atanf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atanf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atanf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atanf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanf.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
index 2ccdd1fc53..b4647ca918 100644
--- a/bits/libm-simd-decl-stubs.h
+++ b/bits/libm-simd-decl-stubs.h
@@ -109,4 +109,15 @@
 #define __DECL_SIMD_acosf32x
 #define __DECL_SIMD_acosf64x
 #define __DECL_SIMD_acosf128x
+
+#define __DECL_SIMD_atan
+#define __DECL_SIMD_atanf
+#define __DECL_SIMD_atanl
+#define __DECL_SIMD_atanf16
+#define __DECL_SIMD_atanf32
+#define __DECL_SIMD_atanf64
+#define __DECL_SIMD_atanf128
+#define __DECL_SIMD_atanf32x
+#define __DECL_SIMD_atanf64x
+#define __DECL_SIMD_atanf128x
 #endif
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index 2cc6654208..3e27c21f21 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -54,7 +54,7 @@ __MATHCALL_VEC (acos,, (_Mdouble_ __x));
 /* Arc sine of X.  */
 __MATHCALL (asin,, (_Mdouble_ __x));
 /* Arc tangent of X.  */
-__MATHCALL (atan,, (_Mdouble_ __x));
+__MATHCALL_VEC (atan,, (_Mdouble_ __x));
 /* Arc tangent of Y/X.  */
 __MATHCALL (atan2,, (_Mdouble_ __y, _Mdouble_ __x));
 
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index b37b55777e..a93258db6f 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -47,10 +47,18 @@ GLIBC_2.22 _ZGVeN8v_sin F
 GLIBC_2.22 _ZGVeN8vv_pow F
 GLIBC_2.22 _ZGVeN8vvv_sincos F
 GLIBC_2.35 _ZGVbN2v_acos F
+GLIBC_2.35 _ZGVbN2v_atan F
 GLIBC_2.35 _ZGVbN4v_acosf F
+GLIBC_2.35 _ZGVbN4v_atanf F
 GLIBC_2.35 _ZGVcN4v_acos F
+GLIBC_2.35 _ZGVcN4v_atan F
 GLIBC_2.35 _ZGVcN8v_acosf F
+GLIBC_2.35 _ZGVcN8v_atanf F
 GLIBC_2.35 _ZGVdN4v_acos F
+GLIBC_2.35 _ZGVdN4v_atan F
 GLIBC_2.35 _ZGVdN8v_acosf F
+GLIBC_2.35 _ZGVdN8v_atanf F
 GLIBC_2.35 _ZGVeN16v_acosf F
+GLIBC_2.35 _ZGVeN16v_atanf F
 GLIBC_2.35 _ZGVeN8v_acos F
+GLIBC_2.35 _ZGVeN8v_atan F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index dabb74cbb9..1c0e5c5e35 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -62,6 +62,10 @@
 #  define __DECL_SIMD_acos __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_acosf
 #  define __DECL_SIMD_acosf __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_atan
+#  define __DECL_SIMD_atan __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_atanf
+#  define __DECL_SIMD_atanf __DECL_SIMD_x86_64
 
 # endif
 #endif
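
For a C caller this hook means that, when __FAST_MATH__ is in effect and the
compiler supports it, __DECL_SIMD_x86_64 turns the <math.h> prototypes into
SIMD declarations.  Roughly (illustrative expansion only; the exact form
chosen in math-vector.h depends on the GCC version and on whether OpenMP is
enabled):

  /* Approximate effect of __DECL_SIMD_atan / __DECL_SIMD_atanf.  */
  __attribute__ ((__simd__ ("notinbranch"))) double atan (double __x);
  __attribute__ ((__simd__ ("notinbranch"))) float atanf (float __x);
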
diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
index 4bcbd1fbce..ddcccb11d7 100644
--- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
@@ -30,6 +30,8 @@
 !GCC$ builtin (powf) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (acos) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (acosf) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (atan) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (atanf) attributes simd (notinbranch) if('x86_64')
 
 !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
@@ -45,3 +47,5 @@
 !GCC$ builtin (powf) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (acos) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (acosf) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (atan) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (atanf) attributes simd (notinbranch) if('x32')
diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
index 7acf1f306c..dae0887f13 100644
--- a/sysdeps/x86_64/fpu/Makeconfig
+++ b/sysdeps/x86_64/fpu/Makeconfig
@@ -23,6 +23,7 @@ postclean-generated += libmvec.mk
 # Define for both math and mathvec directories.
 libmvec-funcs = \
   acos \
+  atan \
   cos \
   exp \
   log \
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index 2985fe7ca7..424f6d526e 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -15,6 +15,8 @@ libmvec {
   }
   GLIBC_2.35 {
     _ZGVbN2v_acos; _ZGVcN4v_acos; _ZGVdN4v_acos; _ZGVeN8v_acos;
+    _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan;
     _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
+    _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf;
   }
 }
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 6c12976c82..2e64e59803 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -164,6 +164,26 @@ float: 2
 float128: 2
 ldouble: 1
 
+Function: "atan_vlen16":
+float: 1
+
+Function: "atan_vlen2":
+double: 1
+
+Function: "atan_vlen4":
+double: 1
+float: 1
+
+Function: "atan_vlen4_avx2":
+double: 1
+
+Function: "atan_vlen8":
+double: 1
+float: 1
+
+Function: "atan_vlen8_avx2":
+float: 1
+
 Function: "atanh":
 double: 2
 float: 2
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core-sse2.S
new file mode 100644
index 0000000000..115e5223aa
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized atan, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN2v_atan _ZGVbN2v_atan_sse2
+#include "../svml_d_atan2_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core.c
new file mode 100644
index 0000000000..93f079ffcb
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized atan, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN2v_atan
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN2v_atan, __GI__ZGVbN2v_atan, __redirect__ZGVbN2v_atan)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core_sse4.S
new file mode 100644
index 0000000000..f0ad036b9e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core_sse4.S
@@ -0,0 +1,245 @@
+/* Function atan vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
+ *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
+ *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
+ *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
+ *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
+ *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
+ *
+ */
+
+/* Offsets for data table __svml_datan_data_internal_avx512
+ */
+#define AbsMask                       	0
+#define Shifter                       	16
+#define MaxThreshold                  	32
+#define MOne                          	48
+#define One                           	64
+#define LargeX                        	80
+#define Zero                          	96
+#define Tbl_H                         	112
+#define Tbl_L                         	368
+#define dIndexMed                     	624
+#define Pi2                           	640
+#define Pi2_low                       	656
+#define coeff                         	672
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN2v_atan_sse4)
+        lea       Tbl_H+128+__svml_datan_data_internal_avx512(%rip), %rcx
+        movups    __svml_datan_data_internal_avx512(%rip), %xmm4
+        movups    Shifter+__svml_datan_data_internal_avx512(%rip), %xmm3
+        andps     %xmm0, %xmm4
+        movaps    %xmm3, %xmm12
+        movaps    %xmm4, %xmm5
+        addpd     %xmm4, %xmm12
+        movaps    %xmm12, %xmm7
+
+/*
+ * table lookup sequence
+ * VPERMUTE not available
+ */
+        movaps    %xmm12, %xmm10
+        subpd     %xmm3, %xmm7
+        subpd     %xmm7, %xmm5
+        mulpd     %xmm4, %xmm7
+        movups    MaxThreshold+__svml_datan_data_internal_avx512(%rip), %xmm2
+        psllq     $3, %xmm10
+
+/* saturate X range */
+        movups    LargeX+__svml_datan_data_internal_avx512(%rip), %xmm8
+        pxor      %xmm4, %xmm0
+        cmplepd   %xmm4, %xmm2
+        addpd     One+__svml_datan_data_internal_avx512(%rip), %xmm7
+        minpd     %xmm4, %xmm8
+        movups    MOne+__svml_datan_data_internal_avx512(%rip), %xmm6
+        movaps    %xmm2, %xmm1
+        movaps    %xmm2, %xmm9
+        andnps    %xmm5, %xmm1
+        andps     %xmm2, %xmm6
+        andnps    %xmm7, %xmm9
+        andps     %xmm2, %xmm8
+        orps      %xmm6, %xmm1
+        orps      %xmm8, %xmm9
+
+/* R+Rl = DiffX/Y */
+        divpd     %xmm9, %xmm1
+        pand      .FLT_11(%rip), %xmm10
+
+/* set table value to Pi/2 for large X */
+        movups    Pi2+__svml_datan_data_internal_avx512(%rip), %xmm4
+        movd      %xmm10, %eax
+        andps     %xmm2, %xmm4
+        pshufd    $2, %xmm10, %xmm11
+        movaps    %xmm2, %xmm10
+
+/* polynomial evaluation */
+        movaps    %xmm1, %xmm2
+        mulpd     %xmm1, %xmm2
+        movd      %xmm11, %edx
+        movups    coeff+__svml_datan_data_internal_avx512(%rip), %xmm5
+        movaps    %xmm2, %xmm7
+        movups    coeff+32+__svml_datan_data_internal_avx512(%rip), %xmm6
+        movaps    %xmm2, %xmm9
+        mulpd     %xmm2, %xmm5
+        mulpd     %xmm2, %xmm7
+        addpd     coeff+16+__svml_datan_data_internal_avx512(%rip), %xmm5
+        mulpd     %xmm2, %xmm6
+        mulpd     %xmm7, %xmm5
+        addpd     coeff+48+__svml_datan_data_internal_avx512(%rip), %xmm6
+        mulpd     %xmm1, %xmm9
+        addpd     %xmm5, %xmm6
+        movups    coeff+64+__svml_datan_data_internal_avx512(%rip), %xmm8
+        mulpd     %xmm2, %xmm8
+        mulpd     %xmm6, %xmm7
+        addpd     coeff+80+__svml_datan_data_internal_avx512(%rip), %xmm8
+        addpd     %xmm7, %xmm8
+        mulpd     %xmm8, %xmm9
+        movups    dIndexMed+__svml_datan_data_internal_avx512(%rip), %xmm14
+        cmplepd   %xmm12, %xmm14
+        addpd     %xmm9, %xmm1
+        movslq    %eax, %rax
+        movaps    %xmm14, %xmm3
+        movslq    %edx, %rdx
+        movsd     -128(%rax,%rcx), %xmm13
+        movsd     (%rcx,%rax), %xmm15
+        movhpd    -128(%rdx,%rcx), %xmm13
+        movhpd    (%rcx,%rdx), %xmm15
+        andnps    %xmm13, %xmm3
+        andps     %xmm14, %xmm15
+        orps      %xmm15, %xmm3
+        andnps    %xmm3, %xmm10
+        orps      %xmm4, %xmm10
+        addpd     %xmm1, %xmm10
+        pxor      %xmm10, %xmm0
+        ret
+
+END(_ZGVbN2v_atan_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_datan_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(16)) VUINT32 AbsMask[2][2];
+        __declspec(align(16)) VUINT32 Shifter[2][2];
+        __declspec(align(16)) VUINT32 MaxThreshold[2][2];
+        __declspec(align(16)) VUINT32 MOne[2][2];
+        __declspec(align(16)) VUINT32 One[2][2];
+        __declspec(align(16)) VUINT32 LargeX[2][2];
+        __declspec(align(16)) VUINT32 Zero[2][2];
+        __declspec(align(16)) VUINT32 Tbl_H[32][2];
+        __declspec(align(16)) VUINT32 Tbl_L[32][2];
+        __declspec(align(16)) VUINT32 dIndexMed[2][2];
+        __declspec(align(16)) VUINT32 Pi2[2][2];
+        __declspec(align(16)) VUINT32 Pi2_low[2][2];
+        __declspec(align(16)) VUINT32 coeff[6][2][2];
+    } __svml_datan_data_internal_avx512;
+#endif
+__svml_datan_data_internal_avx512:
+        /*== AbsMask ==*/
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff
+        /*== Shifter ==*/
+        .align 16
+        .quad 0x4318000000000000, 0x4318000000000000
+        /*== MaxThreshold ==*/
+        .align 16
+        .quad 0x401f800000000000, 0x401f800000000000
+        /*== MOne ==*/
+        .align 16
+        .quad 0xbff0000000000000, 0xbff0000000000000
+        /*== One ==*/
+        .align 16
+        .quad 0x3ff0000000000000, 0x3ff0000000000000
+        /*== LargeX ==*/
+        .align 16
+        .quad 0x47f0000000000000, 0x47f0000000000000
+        /*== Zero ==*/
+        .align 16
+        .quad 0x0000000000000000, 0x0000000000000000
+        /*== Tbl_H ==*/
+        .align 16
+        .quad 0x0000000000000000, 0x3fcf5b75f92c80dd
+        .quad 0x3fddac670561bb4f, 0x3fe4978fa3269ee1
+        .quad 0x3fe921fb54442d18, 0x3fecac7c57846f9e
+        .quad 0x3fef730bd281f69b, 0x3ff0d38f2c5ba09f
+        .quad 0x3ff1b6e192ebbe44, 0x3ff270ef55a53a25
+        .quad 0x3ff30b6d796a4da8, 0x3ff38d6a6ce13353
+        .quad 0x3ff3fc176b7a8560, 0x3ff45b54837351a0
+        .quad 0x3ff4ae10fc6589a5, 0x3ff4f68dea672617
+        .quad 0x3ff5368c951e9cfd, 0x3ff56f6f33a3e6a7
+        .quad 0x3ff5a25052114e60, 0x3ff5d013c41adabd
+        .quad 0x3ff5f97315254857, 0x3ff61f06c6a92b89
+        .quad 0x3ff6414d44094c7c, 0x3ff660b02c736a06
+        .quad 0x3ff67d8863bc99bd, 0x3ff698213a9d5053
+        .quad 0x3ff6b0bae830c070, 0x3ff6c78c7edeb195
+        .quad 0x3ff6dcc57bb565fd, 0x3ff6f08f07435fec
+        .quad 0x3ff7030cf9403197, 0x3ff7145eac2088a4
+        /*== Tbl_L ==*/
+        .align 16
+        .quad 0x0000000000000000, 0x3c68ab6e3cf7afbd
+        .quad 0x3c7a2b7f222f65e2, 0x3c72419a87f2a458
+        .quad 0x3c81a62633145c07, 0x3c80dae13ad18a6b
+        .quad 0x3c7007887af0cbbd, 0xbc9bd0dc231bfd70
+        .quad 0x3c9b1b466a88828e, 0xbc9a66b1af5f84fb
+        .quad 0x3c96254cb03bb199, 0xbc812c77e8a80f5c
+        .quad 0xbc4441a3bd3f1084, 0x3c79e4a72eedacc4
+        .quad 0xbc93b03e8a27f555, 0x3c9934f9f2b0020e
+        .quad 0xbc996f47948a99f1, 0xbc7df6edd6f1ec3b
+        .quad 0x3c78c2d0c89de218, 0x3c9f82bba194dd5d
+        .quad 0xbc831151a43b51ca, 0xbc8487d50bceb1a5
+        .quad 0xbc9c5f60a65c7397, 0xbc7acb6afb332a0f
+        .quad 0xbc99b7bd2e1e8c9c, 0xbc9b9839085189e3
+        .quad 0xbc97d1ab82ffb70b, 0x3c99239ad620ffe2
+        .quad 0xbc929c86447928e7, 0xbc8957a7170df016
+        .quad 0xbc7cbe1896221608, 0xbc9fda5797b32a0b
+        /*== dIndexMed ==*/
+        .align 16
+        .quad 0x4318000000000010, 0x4318000000000010
+        /*== Pi2 ==*/
+        .align 16
+        .quad 0x3ff921fb54442d18, 0x3ff921fb54442d18
+        /*== Pi2_low ==*/
+        .align 16
+        .quad 0x3c91a62633145c07, 0x3c91a62633145c07
+        /*== coeff6 ==*/
+        .align 16
+        .quad 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97
+        .quad 0xbfb74257c46790cc, 0xbfb74257c46790cc
+        .quad 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0
+        .quad 0xbfc249248eef04da, 0xbfc249248eef04da
+        .quad 0x3fc999999998741e, 0x3fc999999998741e
+        .quad 0xbfd555555555554d, 0xbfd555555555554d
+        .align 16
+        .type	__svml_datan_data_internal_avx512,@object
+        .size	__svml_datan_data_internal_avx512,.-__svml_datan_data_internal_avx512
+        .align 16
+
+.FLT_11:
+        .long	0x00000078,0x00000000,0x00000078,0x00000000
+        .type	.FLT_11,@object
+        .size	.FLT_11,16
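
The ALGORITHM DESCRIPTION above is the classic argument-reduction scheme for
atan: reduce |x| against a base point b via s = (|x| - b)/(1 + b*|x|)
(or s = -1/x for large |x|), evaluate a short odd polynomial in s, add back
the stored atan(b), and restore the sign at the end.  A scalar C model of
that scheme, for illustration only (the vector code replaces the branches
with a branch-free Tbl_H/Tbl_L lookup and uses the minimax 'coeff'
polynomial; atan_model and its truncated Taylor polynomial are not part of
the patch):

  #include <math.h>

  /* Scalar sketch of the reduction described in the comment above; a
     truncated Taylor series stands in for the tuned polynomial, so the
     accuracy is far below that of the real code.  */
  double
  atan_model (double x)
  {
    /* atan(b) for the base points used in the description.  */
    static const double atan_b[5] =
      { 0.0, 0.46364760900080612, 0.78539816339744831,
        0.98279372324732907, 1.57079632679489662 };
    double ax = fabs (x);
    int i;
    double s;

    if (ax <= 7.0 / 16.0)       { i = 0; s = ax; }
    else if (ax <= 11.0 / 16.0) { i = 1; s = (ax - 0.5) / (1.0 + 0.5 * ax); }
    else if (ax <= 19.0 / 16.0) { i = 2; s = (ax - 1.0) / (1.0 + ax); }
    else if (ax <= 39.0 / 16.0) { i = 3; s = (ax - 1.5) / (1.0 + 1.5 * ax); }
    else                        { i = 4; s = -1.0 / ax; }

    /* atan(s) ~= s + s^3*P(s^2) on the reduced range.  */
    double s2 = s * s;
    double p = s + s * s2 * (-1.0 / 3.0 + s2 * (1.0 / 5.0 - s2 / 7.0));

    return copysign (atan_b[i] + p, x);
  }
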
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core-sse.S
new file mode 100644
index 0000000000..79c48dbc91
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized atan, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN4v_atan _ZGVdN4v_atan_sse_wrapper
+#include "../svml_d_atan4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core.c
new file mode 100644
index 0000000000..64ce66b9fd
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized atan, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN4v_atan
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN4v_atan, __GI__ZGVdN4v_atan, __redirect__ZGVdN4v_atan)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core_avx2.S
new file mode 100644
index 0000000000..50336514d7
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core_avx2.S
@@ -0,0 +1,225 @@
+/* Function atan vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
+ *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
+ *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
+ *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
+ *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
+ *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
+ *
+ */
+
+/* Offsets for data table __svml_datan_data_internal_avx512
+ */
+#define AbsMask                       	0
+#define Shifter                       	32
+#define MaxThreshold                  	64
+#define MOne                          	96
+#define One                           	128
+#define LargeX                        	160
+#define Zero                          	192
+#define Tbl_H                         	224
+#define Tbl_L                         	480
+#define dIndexMed                     	736
+#define Pi2                           	768
+#define Pi2_low                       	800
+#define coeff                         	832
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN4v_atan_avx2)
+        lea       Tbl_H+128+__svml_datan_data_internal_avx512(%rip), %rdi
+        vmovupd   Shifter+__svml_datan_data_internal_avx512(%rip), %ymm4
+        vmovupd   One+__svml_datan_data_internal_avx512(%rip), %ymm9
+
+/* saturate X range */
+        vmovupd   LargeX+__svml_datan_data_internal_avx512(%rip), %ymm6
+        vandpd    __svml_datan_data_internal_avx512(%rip), %ymm0, %ymm7
+        vaddpd    %ymm4, %ymm7, %ymm2
+        vcmpge_oqpd MaxThreshold+__svml_datan_data_internal_avx512(%rip), %ymm7, %ymm3
+        vminpd    %ymm7, %ymm6, %ymm10
+        vsubpd    %ymm4, %ymm2, %ymm5
+
+/*
+ * table lookup sequence
+ * VPERMUTE not available
+ */
+        vpsllq    $3, %ymm2, %ymm13
+        vsubpd    %ymm5, %ymm7, %ymm8
+        vcmpge_oqpd dIndexMed+__svml_datan_data_internal_avx512(%rip), %ymm2, %ymm2
+        vfmadd231pd %ymm7, %ymm5, %ymm9
+        vpand     .FLT_11(%rip), %ymm13, %ymm14
+        vblendvpd %ymm3, MOne+__svml_datan_data_internal_avx512(%rip), %ymm8, %ymm11
+        vblendvpd %ymm3, %ymm10, %ymm9, %ymm12
+        vxorpd    %ymm0, %ymm7, %ymm1
+
+/* R+Rl = DiffX/Y */
+        vdivpd    %ymm12, %ymm11, %ymm0
+        vextractf128 $1, %ymm14, %xmm4
+        vmovd     %xmm14, %eax
+        vmovd     %xmm4, %ecx
+        movslq    %eax, %rax
+        vpextrd   $2, %xmm14, %edx
+        movslq    %ecx, %rcx
+        vpextrd   $2, %xmm4, %esi
+        movslq    %edx, %rdx
+        movslq    %esi, %rsi
+        vmovsd    -128(%rax,%rdi), %xmm15
+        vmovsd    (%rdi,%rax), %xmm7
+        vmovsd    -128(%rcx,%rdi), %xmm5
+        vmovsd    (%rdi,%rcx), %xmm9
+        vmovhpd   -128(%rdx,%rdi), %xmm15, %xmm15
+        vmovhpd   (%rdi,%rdx), %xmm7, %xmm8
+        vmovhpd   -128(%rsi,%rdi), %xmm5, %xmm6
+        vmovhpd   (%rdi,%rsi), %xmm9, %xmm10
+
+/* polynomial evaluation */
+        vmulpd    %ymm0, %ymm0, %ymm5
+        vmulpd    %ymm5, %ymm5, %ymm4
+        vinsertf128 $1, %xmm6, %ymm15, %ymm11
+        vinsertf128 $1, %xmm10, %ymm8, %ymm12
+        vblendvpd %ymm2, %ymm12, %ymm11, %ymm13
+        vmovupd   coeff+__svml_datan_data_internal_avx512(%rip), %ymm8
+        vmovupd   coeff+64+__svml_datan_data_internal_avx512(%rip), %ymm2
+        vmulpd    %ymm5, %ymm0, %ymm6
+        vfmadd213pd coeff+32+__svml_datan_data_internal_avx512(%rip), %ymm5, %ymm8
+        vfmadd213pd coeff+96+__svml_datan_data_internal_avx512(%rip), %ymm5, %ymm2
+
+/* set table value to Pi/2 for large X */
+        vblendvpd %ymm3, Pi2+__svml_datan_data_internal_avx512(%rip), %ymm13, %ymm7
+        vmovupd   coeff+128+__svml_datan_data_internal_avx512(%rip), %ymm3
+        vfmadd213pd %ymm2, %ymm4, %ymm8
+        vfmadd213pd coeff+160+__svml_datan_data_internal_avx512(%rip), %ymm3, %ymm5
+        vfmadd213pd %ymm5, %ymm4, %ymm8
+        vfmadd213pd %ymm0, %ymm6, %ymm8
+        vaddpd    %ymm8, %ymm7, %ymm0
+        vxorpd    %ymm1, %ymm0, %ymm0
+        ret
+
+END(_ZGVdN4v_atan_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+.FLT_11:
+        .long	0x00000078,0x00000000,0x00000078,0x00000000,0x00000078,0x00000000,0x00000078,0x00000000
+        .type	.FLT_11,@object
+        .size	.FLT_11,32
+        .align 32
+
+#ifdef __svml_datan_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(32)) VUINT32 AbsMask[4][2];
+        __declspec(align(32)) VUINT32 Shifter[4][2];
+        __declspec(align(32)) VUINT32 MaxThreshold[4][2];
+        __declspec(align(32)) VUINT32 MOne[4][2];
+        __declspec(align(32)) VUINT32 One[4][2];
+        __declspec(align(32)) VUINT32 LargeX[4][2];
+        __declspec(align(32)) VUINT32 Zero[4][2];
+        __declspec(align(32)) VUINT32 Tbl_H[32][2];
+        __declspec(align(32)) VUINT32 Tbl_L[32][2];
+        __declspec(align(32)) VUINT32 dIndexMed[4][2];
+        __declspec(align(32)) VUINT32 Pi2[4][2];
+        __declspec(align(32)) VUINT32 Pi2_low[4][2];
+        __declspec(align(32)) VUINT32 coeff[6][4][2];
+    } __svml_datan_data_internal_avx512;
+#endif
+__svml_datan_data_internal_avx512:
+        /*== AbsMask ==*/
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
+        /*== Shifter ==*/
+        .align 32
+        .quad 0x4318000000000000, 0x4318000000000000, 0x4318000000000000, 0x4318000000000000
+        /*== MaxThreshold ==*/
+        .align 32
+        .quad 0x401f800000000000, 0x401f800000000000, 0x401f800000000000, 0x401f800000000000
+        /*== MOne ==*/
+        .align 32
+        .quad 0xbff0000000000000, 0xbff0000000000000, 0xbff0000000000000, 0xbff0000000000000
+        /*== One ==*/
+        .align 32
+        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
+        /*== LargeX ==*/
+        .align 32
+        .quad 0x47f0000000000000, 0x47f0000000000000, 0x47f0000000000000, 0x47f0000000000000
+        /*== Zero ==*/
+        .align 32
+        .quad 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000
+        /*== Tbl_H ==*/
+        .align 32
+        .quad 0x0000000000000000, 0x3fcf5b75f92c80dd
+        .quad 0x3fddac670561bb4f, 0x3fe4978fa3269ee1
+        .quad 0x3fe921fb54442d18, 0x3fecac7c57846f9e
+        .quad 0x3fef730bd281f69b, 0x3ff0d38f2c5ba09f
+        .quad 0x3ff1b6e192ebbe44, 0x3ff270ef55a53a25
+        .quad 0x3ff30b6d796a4da8, 0x3ff38d6a6ce13353
+        .quad 0x3ff3fc176b7a8560, 0x3ff45b54837351a0
+        .quad 0x3ff4ae10fc6589a5, 0x3ff4f68dea672617
+        .quad 0x3ff5368c951e9cfd, 0x3ff56f6f33a3e6a7
+        .quad 0x3ff5a25052114e60, 0x3ff5d013c41adabd
+        .quad 0x3ff5f97315254857, 0x3ff61f06c6a92b89
+        .quad 0x3ff6414d44094c7c, 0x3ff660b02c736a06
+        .quad 0x3ff67d8863bc99bd, 0x3ff698213a9d5053
+        .quad 0x3ff6b0bae830c070, 0x3ff6c78c7edeb195
+        .quad 0x3ff6dcc57bb565fd, 0x3ff6f08f07435fec
+        .quad 0x3ff7030cf9403197, 0x3ff7145eac2088a4
+        /*== Tbl_L ==*/
+        .align 32
+        .quad 0x0000000000000000, 0x3c68ab6e3cf7afbd
+        .quad 0x3c7a2b7f222f65e2, 0x3c72419a87f2a458
+        .quad 0x3c81a62633145c07, 0x3c80dae13ad18a6b
+        .quad 0x3c7007887af0cbbd, 0xbc9bd0dc231bfd70
+        .quad 0x3c9b1b466a88828e, 0xbc9a66b1af5f84fb
+        .quad 0x3c96254cb03bb199, 0xbc812c77e8a80f5c
+        .quad 0xbc4441a3bd3f1084, 0x3c79e4a72eedacc4
+        .quad 0xbc93b03e8a27f555, 0x3c9934f9f2b0020e
+        .quad 0xbc996f47948a99f1, 0xbc7df6edd6f1ec3b
+        .quad 0x3c78c2d0c89de218, 0x3c9f82bba194dd5d
+        .quad 0xbc831151a43b51ca, 0xbc8487d50bceb1a5
+        .quad 0xbc9c5f60a65c7397, 0xbc7acb6afb332a0f
+        .quad 0xbc99b7bd2e1e8c9c, 0xbc9b9839085189e3
+        .quad 0xbc97d1ab82ffb70b, 0x3c99239ad620ffe2
+        .quad 0xbc929c86447928e7, 0xbc8957a7170df016
+        .quad 0xbc7cbe1896221608, 0xbc9fda5797b32a0b
+        /*== dIndexMed ==*/
+        .align 32
+        .quad 0x4318000000000010, 0x4318000000000010, 0x4318000000000010, 0x4318000000000010
+        /*== Pi2 ==*/
+        .align 32
+        .quad 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18
+        /*== Pi2_low ==*/
+        .align 32
+        .quad 0x3c91a62633145c07, 0x3c91a62633145c07, 0x3c91a62633145c07, 0x3c91a62633145c07
+        /*== coeff6 ==*/
+        .align 32
+        .quad 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97
+        .quad 0xbfb74257c46790cc, 0xbfb74257c46790cc, 0xbfb74257c46790cc, 0xbfb74257c46790cc
+        .quad 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0
+        .quad 0xbfc249248eef04da, 0xbfc249248eef04da, 0xbfc249248eef04da, 0xbfc249248eef04da
+        .quad 0x3fc999999998741e, 0x3fc999999998741e, 0x3fc999999998741e, 0x3fc999999998741e
+        .quad 0xbfd555555555554d, 0xbfd555555555554d, 0xbfd555555555554d, 0xbfd555555555554d
+        .align 32
+        .type	__svml_datan_data_internal_avx512,@object
+        .size	__svml_datan_data_internal_avx512,.-__svml_datan_data_internal_avx512
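For reference, the table-based path above reduces to roughly the following scalar C model (a sketch only, not part of the patch; tbl_h[] and atan_poly() are stand-in names for Tbl_H and the coeff polynomial, with tbl_h[i] = atan (i / 4.0)):

#include <math.h>

extern const double tbl_h[32];          /* stand-in for Tbl_H           */
extern double atan_poly (double s);     /* stand-in for the coeff poly  */

static double
atan4_sketch (double x)
{
  double ax = fabs (x);                           /* AbsMask              */
  if (ax >= 0x1.f8p+2)                            /* MaxThreshold = 7.875 */
    /* Large |x|: table value is Pi/2, reduced argument is -1/|x|.  */
    return copysign (0x1.921fb54442d18p+0 + atan_poly (-1.0 / ax), x);
  double b = nearbyint (4.0 * ax) * 0.25;         /* Shifter rounding trick */
  int i = (int) (4.0 * b);                        /* table index, 0 .. 31   */
  double s = (ax - b) / fma (b, ax, 1.0);         /* reduced argument       */
  return copysign (tbl_h[i] + atan_poly (s), x);  /* sign via the final xor */
}
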
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core-avx2.S
new file mode 100644
index 0000000000..723734e10b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized atan, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN8v_atan _ZGVeN8v_atan_avx2_wrapper
+#include "../svml_d_atan8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core.c
new file mode 100644
index 0000000000..e97a41b6bc
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized atan, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN8v_atan
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN8v_atan, __GI__ZGVeN8v_atan, __redirect__ZGVeN8v_atan)
+  __attribute__ ((visibility ("hidden")));
+#endif
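The dispatch above follows the usual libmvec IFUNC pattern.  Conceptually it amounts to the resolver sketched below; cpu_has_avx512_skx() is a hypothetical stand-in for the feature test done by IFUNC_SELECTOR in ifunc-mathvec-avx512-skx.h:

typedef double v8df_t __attribute__ ((vector_size (64)));

extern v8df_t _ZGVeN8v_atan_skx (v8df_t);
extern v8df_t _ZGVeN8v_atan_avx2_wrapper (v8df_t);
extern int cpu_has_avx512_skx (void);   /* hypothetical feature test */

/* Pick the AVX-512 kernel when available, otherwise fall back to the
   AVX2 wrapper built from svml_d_atan8_core-avx2.S.  */
static v8df_t (*atan8_resolver (void)) (v8df_t)
{
  return cpu_has_avx512_skx () ? _ZGVeN8v_atan_skx
                               : _ZGVeN8v_atan_avx2_wrapper;
}
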
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core_avx512.S
new file mode 100644
index 0000000000..fa6cb47308
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core_avx512.S
@@ -0,0 +1,213 @@
+/* Function atan vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
+ *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
+ *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
+ *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
+ *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
+ *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
+ *
+ */
+
+/* Offsets for data table __svml_datan_data_internal_avx512
+ */
+#define AbsMask                       	0
+#define Shifter                       	64
+#define MaxThreshold                  	128
+#define MOne                          	192
+#define One                           	256
+#define LargeX                        	320
+#define Zero                          	384
+#define Tbl_H                         	448
+#define dIndexMed                     	704
+#define Pi2                           	768
+#define coeff_1                       	832
+#define coeff_2                       	896
+#define coeff_3                       	960
+#define coeff_4                       	1024
+#define coeff_5                       	1088
+#define coeff_6                       	1152
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN8v_atan_skx)
+        vmovups   Shifter+__svml_datan_data_internal_avx512(%rip), %zmm4
+        vmovups   MaxThreshold+__svml_datan_data_internal_avx512(%rip), %zmm3
+        vmovups   One+__svml_datan_data_internal_avx512(%rip), %zmm9
+
+/* saturate X range */
+        vmovups   LargeX+__svml_datan_data_internal_avx512(%rip), %zmm7
+        vandpd    __svml_datan_data_internal_avx512(%rip), %zmm0, %zmm8
+
+/* R+Rl = DiffX/Y */
+        vbroadcastsd .FLT_10(%rip), %zmm15
+        vaddpd    {rn-sae}, %zmm4, %zmm8, %zmm2
+        vxorpd    %zmm0, %zmm8, %zmm1
+        vcmppd    $29, {sae}, %zmm3, %zmm8, %k2
+
+/* round to 2 bits after binary point */
+        vreducepd $40, {sae}, %zmm8, %zmm6
+        vsubpd    {rn-sae}, %zmm4, %zmm2, %zmm5
+
+/*
+ * if|X|>=MaxThreshold, set DiffX=-1
+ * VMSUB(D, DiffX, LargeMask, Zero, One);
+ */
+        vblendmpd MOne+__svml_datan_data_internal_avx512(%rip), %zmm6, %zmm10{%k2}
+        vfmadd231pd {rn-sae}, %zmm8, %zmm5, %zmm9
+        vmovups   dIndexMed+__svml_datan_data_internal_avx512(%rip), %zmm5
+
+/* table lookup sequence */
+        vmovups   Tbl_H+__svml_datan_data_internal_avx512(%rip), %zmm6
+        vgetmantpd $0, {sae}, %zmm10, %zmm14
+        vgetexppd {sae}, %zmm10, %zmm11
+        vmovups   coeff_5+__svml_datan_data_internal_avx512(%rip), %zmm10
+
+/*
+ * if|X|>=MaxThreshold, set Y=X
+ * VMADD(D, Y, LargeMask, X, Zero);
+ */
+        vminpd    {sae}, %zmm8, %zmm7, %zmm9{%k2}
+        vcmppd    $29, {sae}, %zmm5, %zmm2, %k1
+        vmovups   Tbl_H+128+__svml_datan_data_internal_avx512(%rip), %zmm7
+        vmovups   coeff_1+__svml_datan_data_internal_avx512(%rip), %zmm8
+        vgetmantpd $0, {sae}, %zmm9, %zmm3
+        vgetexppd {sae}, %zmm9, %zmm12
+        vmovups   coeff_3+__svml_datan_data_internal_avx512(%rip), %zmm9
+        vpermt2pd Tbl_H+64+__svml_datan_data_internal_avx512(%rip), %zmm2, %zmm6
+        vsubpd    {rn-sae}, %zmm12, %zmm11, %zmm4
+        vpermt2pd Tbl_H+192+__svml_datan_data_internal_avx512(%rip), %zmm2, %zmm7
+        vrcp14pd  %zmm3, %zmm13
+        vmovups   coeff_4+__svml_datan_data_internal_avx512(%rip), %zmm12
+        vmovups   coeff_6+__svml_datan_data_internal_avx512(%rip), %zmm11
+        vblendmpd %zmm7, %zmm6, %zmm2{%k1}
+        vmulpd    {rn-sae}, %zmm13, %zmm14, %zmm0
+        vfnmadd231pd {rn-sae}, %zmm3, %zmm13, %zmm15
+        vfnmadd213pd {rn-sae}, %zmm14, %zmm0, %zmm3
+        vfmadd213pd {rn-sae}, %zmm15, %zmm15, %zmm15
+        vfmadd213pd {rn-sae}, %zmm13, %zmm13, %zmm15
+        vfmadd213pd {rn-sae}, %zmm0, %zmm15, %zmm3
+        vscalefpd {rn-sae}, %zmm4, %zmm3, %zmm0
+
+/* set table value to Pi/2 for large X */
+        vblendmpd Pi2+__svml_datan_data_internal_avx512(%rip), %zmm2, %zmm3{%k2}
+        vmovups   coeff_2+__svml_datan_data_internal_avx512(%rip), %zmm2
+
+/* polynomial evaluation */
+        vmulpd    {rn-sae}, %zmm0, %zmm0, %zmm14
+        vmulpd    {rn-sae}, %zmm14, %zmm14, %zmm13
+        vmulpd    {rn-sae}, %zmm0, %zmm14, %zmm15
+        vfmadd231pd {rn-sae}, %zmm14, %zmm8, %zmm2
+        vfmadd231pd {rn-sae}, %zmm14, %zmm9, %zmm12
+        vfmadd213pd {rn-sae}, %zmm11, %zmm10, %zmm14
+        vfmadd213pd {rn-sae}, %zmm12, %zmm13, %zmm2
+        vfmadd213pd {rn-sae}, %zmm14, %zmm13, %zmm2
+        vfmadd213pd {rn-sae}, %zmm0, %zmm15, %zmm2
+        vaddpd    {rn-sae}, %zmm3, %zmm2, %zmm0
+        vxorpd    %zmm1, %zmm0, %zmm0
+        ret
+
+END(_ZGVeN8v_atan_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_datan_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 AbsMask[8][2];
+        __declspec(align(64)) VUINT32 Shifter[8][2];
+        __declspec(align(64)) VUINT32 MaxThreshold[8][2];
+        __declspec(align(64)) VUINT32 MOne[8][2];
+        __declspec(align(64)) VUINT32 One[8][2];
+        __declspec(align(64)) VUINT32 LargeX[8][2];
+        __declspec(align(64)) VUINT32 Zero[8][2];
+        __declspec(align(64)) VUINT32 Tbl_H[32][2];
+        __declspec(align(64)) VUINT32 dIndexMed[8][2];
+        __declspec(align(64)) VUINT32 Pi2[8][2];
+        __declspec(align(64)) VUINT32 coeff[6][8][2];
+    } __svml_datan_data_internal_avx512;
+#endif
+__svml_datan_data_internal_avx512:
+        /*== AbsMask ==*/
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
+        /*== Shifter ==*/
+        .align 64
+        .quad 0x4318000000000000, 0x4318000000000000, 0x4318000000000000, 0x4318000000000000, 0x4318000000000000, 0x4318000000000000, 0x4318000000000000, 0x4318000000000000
+        /*== MaxThreshold ==*/
+        .align 64
+        .quad 0x401f800000000000, 0x401f800000000000, 0x401f800000000000, 0x401f800000000000, 0x401f800000000000, 0x401f800000000000, 0x401f800000000000, 0x401f800000000000
+        /*== MOne ==*/
+        .align 64
+        .quad 0xbff0000000000000, 0xbff0000000000000, 0xbff0000000000000, 0xbff0000000000000, 0xbff0000000000000, 0xbff0000000000000, 0xbff0000000000000, 0xbff0000000000000
+        /*== One ==*/
+        .align 64
+        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
+        /*== LargeX ==*/
+        .align 64
+        .quad 0x47f0000000000000, 0x47f0000000000000, 0x47f0000000000000, 0x47f0000000000000, 0x47f0000000000000, 0x47f0000000000000, 0x47f0000000000000, 0x47f0000000000000
+        /*== Zero ==*/
+        .align 64
+        .quad 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000
+        /*== Tbl_H ==*/
+        .align 64
+        .quad 0x0000000000000000, 0x3fcf5b75f92c80dd
+        .quad 0x3fddac670561bb4f, 0x3fe4978fa3269ee1
+        .quad 0x3fe921fb54442d18, 0x3fecac7c57846f9e
+        .quad 0x3fef730bd281f69b, 0x3ff0d38f2c5ba09f
+        .quad 0x3ff1b6e192ebbe44, 0x3ff270ef55a53a25
+        .quad 0x3ff30b6d796a4da8, 0x3ff38d6a6ce13353
+        .quad 0x3ff3fc176b7a8560, 0x3ff45b54837351a0
+        .quad 0x3ff4ae10fc6589a5, 0x3ff4f68dea672617
+        .quad 0x3ff5368c951e9cfd, 0x3ff56f6f33a3e6a7
+        .quad 0x3ff5a25052114e60, 0x3ff5d013c41adabd
+        .quad 0x3ff5f97315254857, 0x3ff61f06c6a92b89
+        .quad 0x3ff6414d44094c7c, 0x3ff660b02c736a06
+        .quad 0x3ff67d8863bc99bd, 0x3ff698213a9d5053
+        .quad 0x3ff6b0bae830c070, 0x3ff6c78c7edeb195
+        .quad 0x3ff6dcc57bb565fd, 0x3ff6f08f07435fec
+        .quad 0x3ff7030cf9403197, 0x3ff7145eac2088a4
+        /*== dIndexMed ==*/
+        .align 64
+        .quad 0x4318000000000010, 0x4318000000000010, 0x4318000000000010, 0x4318000000000010, 0x4318000000000010, 0x4318000000000010, 0x4318000000000010, 0x4318000000000010
+        /*== Pi2 ==*/
+        .align 64
+        .quad 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18
+        /*== coeff6 ==*/
+        .align 64
+        .quad 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97
+        .quad 0xbfb74257c46790cc, 0xbfb74257c46790cc, 0xbfb74257c46790cc, 0xbfb74257c46790cc, 0xbfb74257c46790cc, 0xbfb74257c46790cc, 0xbfb74257c46790cc, 0xbfb74257c46790cc
+        .quad 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0
+        .quad 0xbfc249248eef04da, 0xbfc249248eef04da, 0xbfc249248eef04da, 0xbfc249248eef04da, 0xbfc249248eef04da, 0xbfc249248eef04da, 0xbfc249248eef04da, 0xbfc249248eef04da
+        .quad 0x3fc999999998741e, 0x3fc999999998741e, 0x3fc999999998741e, 0x3fc999999998741e, 0x3fc999999998741e, 0x3fc999999998741e, 0x3fc999999998741e, 0x3fc999999998741e
+        .quad 0xbfd555555555554d, 0xbfd555555555554d, 0xbfd555555555554d, 0xbfd555555555554d, 0xbfd555555555554d, 0xbfd555555555554d, 0xbfd555555555554d, 0xbfd555555555554d
+        .align 64
+        .type	__svml_datan_data_internal_avx512,@object
+        .size	__svml_datan_data_internal_avx512,.-__svml_datan_data_internal_avx512
+        .align 8
+
+.FLT_10:
+        .long	0x00000000,0x3ff00000
+        .type	.FLT_10,@object
+        .size	.FLT_10,8
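The AVX-512 path above avoids VDIVPD: the quotient DiffX/Y is built from the operands' mantissas and exponents with a VRCP14PD seed plus one refinement step, then rescaled with VSCALEFPD.  A minimal C model, using frexp/ldexp/fma as stand-ins for the getmant/getexp/scalef/FMA instructions (the 1.0/md seed here is exact, whereas VRCP14PD is only ~14 bits accurate, which is what the refinement compensates for):

#include <math.h>

static double
div_sketch (double n, double d)         /* models DiffX / Y above  */
{
  int en, ed;
  double mn = 2.0 * frexp (n, &en);     /* mantissa in [1, 2)      */
  double md = 2.0 * frexp (d, &ed);
  double r0 = 1.0 / md;                 /* rcp14-style seed        */
  double q0 = mn * r0;                  /* first quotient estimate */
  double e  = fma (-md, r0, 1.0);       /* seed error: 1 - md*r0   */
  double rem = fma (-q0, md, mn);       /* residual: mn - q0*md    */
  double r1 = fma (r0, e + e * e, r0);  /* refined reciprocal      */
  double q  = fma (rem, r1, q0);        /* refined quotient        */
  return ldexp (q, en - ed);            /* VSCALEFPD equivalent    */
}
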
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core-avx2.S
new file mode 100644
index 0000000000..27623cdf16
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized atanf, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN16v_atanf _ZGVeN16v_atanf_avx2_wrapper
+#include "../svml_s_atanf16_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core.c
new file mode 100644
index 0000000000..940de26615
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized atanf, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN16v_atanf
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN16v_atanf, __GI__ZGVeN16v_atanf,
+	       __redirect__ZGVeN16v_atanf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core_avx512.S
new file mode 100644
index 0000000000..4a37f03e69
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core_avx512.S
@@ -0,0 +1,174 @@
+/* Function atanf vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
+ *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
+ *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
+ *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
+ *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
+ *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
+ *
+ */
+
+/* Offsets for data table __svml_satan_data_internal_avx512
+ */
+#define AbsMask                       	0
+#define Shifter                       	64
+#define MaxThreshold                  	128
+#define MOne                          	192
+#define One                           	256
+#define LargeX                        	320
+#define Zero                          	384
+#define Tbl_H                         	448
+#define Pi2                           	576
+#define coeff_1                       	640
+#define coeff_2                       	704
+#define coeff_3                       	768
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN16v_atanf_skx)
+        vandps    __svml_satan_data_internal_avx512(%rip), %zmm0, %zmm7
+        vmovups   MaxThreshold+__svml_satan_data_internal_avx512(%rip), %zmm3
+        vmovups   One+__svml_satan_data_internal_avx512(%rip), %zmm8
+
+/* round to 2 bits after binary point */
+        vreduceps $40, {sae}, %zmm7, %zmm5
+
+/* saturate X range */
+        vmovups   LargeX+__svml_satan_data_internal_avx512(%rip), %zmm6
+        vmovups   Shifter+__svml_satan_data_internal_avx512(%rip), %zmm2
+        vcmpps    $29, {sae}, %zmm3, %zmm7, %k1
+
+/* table lookup sequence */
+        vmovups   Tbl_H+__svml_satan_data_internal_avx512(%rip), %zmm3
+        vsubps    {rn-sae}, %zmm5, %zmm7, %zmm4
+        vaddps    {rn-sae}, %zmm2, %zmm7, %zmm1
+        vxorps    %zmm0, %zmm7, %zmm0
+        vfmadd231ps {rn-sae}, %zmm7, %zmm4, %zmm8
+        vmovups   coeff_2+__svml_satan_data_internal_avx512(%rip), %zmm4
+
+/* if|X|>=MaxThreshold, set DiffX=-1 */
+        vblendmps MOne+__svml_satan_data_internal_avx512(%rip), %zmm5, %zmm9{%k1}
+        vmovups   coeff_3+__svml_satan_data_internal_avx512(%rip), %zmm5
+
+/* if|X|>=MaxThreshold, set Y=X */
+        vminps    {sae}, %zmm7, %zmm6, %zmm8{%k1}
+
+/* R+Rl = DiffX/Y */
+        vgetmantps $0, {sae}, %zmm9, %zmm12
+        vgetexpps {sae}, %zmm9, %zmm10
+        vpermt2ps Tbl_H+64+__svml_satan_data_internal_avx512(%rip), %zmm1, %zmm3
+        vgetmantps $0, {sae}, %zmm8, %zmm15
+        vgetexpps {sae}, %zmm8, %zmm11
+        vmovups   coeff_1+__svml_satan_data_internal_avx512(%rip), %zmm1
+
+/* set table value to Pi/2 for large X */
+        vblendmps Pi2+__svml_satan_data_internal_avx512(%rip), %zmm3, %zmm9{%k1}
+        vrcp14ps  %zmm15, %zmm13
+        vsubps    {rn-sae}, %zmm11, %zmm10, %zmm2
+        vmulps    {rn-sae}, %zmm13, %zmm12, %zmm14
+        vfnmadd213ps {rn-sae}, %zmm12, %zmm14, %zmm15
+        vfmadd213ps {rn-sae}, %zmm14, %zmm13, %zmm15
+        vscalefps {rn-sae}, %zmm2, %zmm15, %zmm7
+
+/* polynomial evaluation */
+        vmulps    {rn-sae}, %zmm7, %zmm7, %zmm8
+        vmulps    {rn-sae}, %zmm7, %zmm8, %zmm6
+        vfmadd231ps {rn-sae}, %zmm8, %zmm1, %zmm4
+        vfmadd213ps {rn-sae}, %zmm5, %zmm4, %zmm8
+        vfmadd213ps {rn-sae}, %zmm7, %zmm6, %zmm8
+        vaddps    {rn-sae}, %zmm9, %zmm8, %zmm10
+        vxorps    %zmm0, %zmm10, %zmm0
+        ret
+
+END(_ZGVeN16v_atanf_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_satan_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 AbsMask[16][1];
+        __declspec(align(64)) VUINT32 Shifter[16][1];
+        __declspec(align(64)) VUINT32 MaxThreshold[16][1];
+        __declspec(align(64)) VUINT32 MOne[16][1];
+        __declspec(align(64)) VUINT32 One[16][1];
+        __declspec(align(64)) VUINT32 LargeX[16][1];
+        __declspec(align(64)) VUINT32 Zero[16][1];
+        __declspec(align(64)) VUINT32 Tbl_H[32][1];
+        __declspec(align(64)) VUINT32 Pi2[16][1];
+        __declspec(align(64)) VUINT32 coeff[3][16][1];
+    } __svml_satan_data_internal_avx512;
+#endif
+__svml_satan_data_internal_avx512:
+        /*== AbsMask ==*/
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
+        /*== Shifter ==*/
+        .align 64
+        .long 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000
+        /*== MaxThreshold ==*/
+        .align 64
+        .long 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000
+        /*== MOne ==*/
+        .align 64
+        .long 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000
+        /*== One ==*/
+        .align 64
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== LargeX ==*/
+        .align 64
+        .long 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000
+        /*== Zero ==*/
+        .align 64
+        .long 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000
+        /*== Tbl_H ==*/
+        .align 64
+        .long 0x00000000, 0x3e7adbb0
+        .long 0x3eed6338, 0x3f24bc7d
+        .long 0x3f490fdb, 0x3f6563e3
+        .long 0x3f7b985f, 0x3f869c79
+        .long 0x3f8db70d, 0x3f93877b
+        .long 0x3f985b6c, 0x3f9c6b53
+        .long 0x3f9fe0bb, 0x3fa2daa4
+        .long 0x3fa57088, 0x3fa7b46f
+        .long 0x3fa9b465, 0x3fab7b7a
+        .long 0x3fad1283, 0x3fae809e
+        .long 0x3fafcb99, 0x3fb0f836
+        .long 0x3fb20a6a, 0x3fb30581
+        .long 0x3fb3ec43, 0x3fb4c10a
+        .long 0x3fb585d7, 0x3fb63c64
+        .long 0x3fb6e62c, 0x3fb78478
+        .long 0x3fb81868, 0x3fb8a2f5
+        /*== Pi2 ==*/
+        .align 64
+        .long 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB
+        /*== coeff3 ==*/
+        .align 64
+        .long 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de
+        .long 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2
+        .long 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa
+        .align 64
+        .type	__svml_satan_data_internal_avx512,@object
+        .size	__svml_satan_data_internal_avx512,.-__svml_satan_data_internal_avx512
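The single-precision kernel keeps the whole 32-entry Tbl_H in two ZMM registers and indexes it with VPERMT2PS; each lane's index is the integer round(4*|x|) sitting in the low bits of Shifter + |x|.  A scalar sketch of that lookup (illustrative names only):

static float
tbl_lookup_sketch (const float tbl_lo[16], const float tbl_hi[16],
                   unsigned idx)
{
  idx &= 31;                        /* VPERMT2PS uses index bits 4:0 */
  return idx < 16 ? tbl_lo[idx] : tbl_hi[idx - 16];
}
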
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core-sse2.S
new file mode 100644
index 0000000000..fe81170666
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized atanf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN4v_atanf _ZGVbN4v_atanf_sse2
+#include "../svml_s_atanf4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core.c
new file mode 100644
index 0000000000..975ece6812
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized atanf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN4v_atanf
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN4v_atanf, __GI__ZGVbN4v_atanf,
+	       __redirect__ZGVbN4v_atanf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core_sse4.S
new file mode 100644
index 0000000000..c58a894e10
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core_sse4.S
@@ -0,0 +1,164 @@
+/* Function atanf vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
+ *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
+ *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
+ *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
+ *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
+ *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
+ *
+ */
+
+/* Offsets for data table __svml_satan_data_internal
+ */
+#define _sSIGN_MASK                   	0
+#define _sABS_MASK                    	16
+#define _sONE                         	32
+#define _sPIO2                        	48
+#define _sPC8                         	64
+#define _sPC7                         	80
+#define _sPC6                         	96
+#define _sPC5                         	112
+#define _sPC4                         	128
+#define _sPC3                         	144
+#define _sPC2                         	160
+#define _sPC1                         	176
+#define _sPC0                         	192
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN4v_atanf_sse4)
+/*
+ * To use minps/maxps operations for argument reduction,
+ * uncomment the _AT_USEMINMAX_ definition.
+ *  Declarations
+ * Variables
+ * Constants
+ */
+        movups    _sABS_MASK+__svml_satan_data_internal(%rip), %xmm2
+
+/*
+ * 1) If x>1,      then r=-1/x, PIO2=Pi/2
+ * 2) If -1<=x<=1, then r=x,    PIO2=0
+ * 3) If x<-1,     then r=-1/x, PIO2=-Pi/2
+ */
+        movups    _sONE+__svml_satan_data_internal(%rip), %xmm1
+        andps     %xmm0, %xmm2
+        movaps    %xmm2, %xmm9
+        movaps    %xmm1, %xmm3
+        cmpleps   %xmm1, %xmm9
+        maxps     %xmm2, %xmm3
+        minps     %xmm2, %xmm1
+        divps     %xmm3, %xmm1
+        movups    __svml_satan_data_internal(%rip), %xmm4
+        movaps    %xmm9, %xmm10
+        andps     %xmm4, %xmm0
+        andnps    %xmm4, %xmm9
+        pxor      %xmm0, %xmm9
+        pxor      %xmm1, %xmm9
+
+/* Polynomial. */
+        movaps    %xmm9, %xmm8
+        mulps     %xmm9, %xmm8
+        movaps    %xmm8, %xmm7
+        mulps     %xmm8, %xmm7
+        movups    _sPC8+__svml_satan_data_internal(%rip), %xmm6
+        mulps     %xmm7, %xmm6
+        movups    _sPC7+__svml_satan_data_internal(%rip), %xmm5
+        mulps     %xmm7, %xmm5
+        addps     _sPC6+__svml_satan_data_internal(%rip), %xmm6
+        mulps     %xmm7, %xmm6
+        addps     _sPC5+__svml_satan_data_internal(%rip), %xmm5
+        mulps     %xmm7, %xmm5
+        addps     _sPC4+__svml_satan_data_internal(%rip), %xmm6
+        mulps     %xmm7, %xmm6
+        addps     _sPC3+__svml_satan_data_internal(%rip), %xmm5
+        mulps     %xmm5, %xmm7
+        addps     _sPC2+__svml_satan_data_internal(%rip), %xmm6
+        mulps     %xmm8, %xmm6
+        addps     _sPC1+__svml_satan_data_internal(%rip), %xmm7
+        andnps    _sPIO2+__svml_satan_data_internal(%rip), %xmm10
+        addps     %xmm6, %xmm7
+        mulps     %xmm7, %xmm8
+        pxor      %xmm0, %xmm10
+        addps     _sPC0+__svml_satan_data_internal(%rip), %xmm8
+
+/* Reconstruction. */
+        mulps     %xmm8, %xmm9
+        addps     %xmm9, %xmm10
+        movaps    %xmm10, %xmm0
+        ret
+
+END(_ZGVbN4v_atanf_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_satan_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(16)) VUINT32 _sSIGN_MASK[4][1];
+        __declspec(align(16)) VUINT32 _sABS_MASK[4][1];
+        __declspec(align(16)) VUINT32 _sONE[4][1];
+        __declspec(align(16)) VUINT32 _sPIO2[4][1];
+        __declspec(align(16)) VUINT32 _sPC8[4][1];
+        __declspec(align(16)) VUINT32 _sPC7[4][1];
+        __declspec(align(16)) VUINT32 _sPC6[4][1];
+        __declspec(align(16)) VUINT32 _sPC5[4][1];
+        __declspec(align(16)) VUINT32 _sPC4[4][1];
+        __declspec(align(16)) VUINT32 _sPC3[4][1];
+        __declspec(align(16)) VUINT32 _sPC2[4][1];
+        __declspec(align(16)) VUINT32 _sPC1[4][1];
+        __declspec(align(16)) VUINT32 _sPC0[4][1];
+} __svml_satan_data_internal;
+#endif
+__svml_satan_data_internal:
+        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000 //_sSIGN_MASK
+        .align 16
+        .long 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF //_sABS_MASK
+        .align 16
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000 //_sONE
+        .align 16
+        .long 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB //_sPIO2
+        .align 16
+        .long 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0 //_sPC8
+        .align 16
+        .long 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631 //_sPC7
+        .align 16
+        .long 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384 //_sPC6
+        .align 16
+        .long 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629 //_sPC5
+        .align 16
+        .long 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474 //_sPC4
+        .align 16
+        .long 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8 //_sPC3
+        .align 16
+        .long 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F //_sPC2
+        .align 16
+        .long 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49 //_sPC1
+        .align 16
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000 //_sPC0
+        .align 16
+        .type	__svml_satan_data_internal,@object
+        .size	__svml_satan_data_internal,.-__svml_satan_data_internal
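The reduction in the SSE4 (and AVX2) float kernels is the three-case scheme from the comment above, done branch-free with min/max.  A scalar model; the real code evaluates the _sPC8.._sPC0 polynomial where this sketch simply calls atanf on the reduced argument:

#include <math.h>

static float
atanf_reduction_sketch (float x)
{
  float ax = fabsf (x);                            /* _sABS_MASK        */
  float r  = fminf (ax, 1.0f) / fmaxf (ax, 1.0f);  /* |x| or 1/|x|      */
  float base = 0.0f;
  if (ax > 1.0f)
    {
      r = -r;                    /* atan (|x|) = pi/2 + atan (-1/|x|)   */
      base = 0x1.921fb6p+0f;     /* _sPIO2 (0x3FC90FDB)                 */
    }
  return copysignf (base + atanf (r), x);          /* _sSIGN_MASK xor   */
}
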
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core-sse.S
new file mode 100644
index 0000000000..1652a8f5c6
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized atanf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN8v_atanf _ZGVdN8v_atanf_sse_wrapper
+#include "../svml_s_atanf8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core.c
new file mode 100644
index 0000000000..733d8c3bc3
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized atanf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN8v_atanf
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN8v_atanf, __GI__ZGVdN8v_atanf,
+	       __redirect__ZGVdN8v_atanf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core_avx2.S
new file mode 100644
index 0000000000..e333f979c4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core_avx2.S
@@ -0,0 +1,148 @@
+/* Function atanf vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
+ *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
+ *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
+ *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
+ *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
+ *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
+ *
+ */
+
+/* Offsets for data table __svml_satan_data_internal
+ */
+#define _sSIGN_MASK                   	0
+#define _sABS_MASK                    	32
+#define _sONE                         	64
+#define _sPIO2                        	96
+#define _sPC8                         	128
+#define _sPC7                         	160
+#define _sPC6                         	192
+#define _sPC5                         	224
+#define _sPC4                         	256
+#define _sPC3                         	288
+#define _sPC2                         	320
+#define _sPC1                         	352
+#define _sPC0                         	384
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN8v_atanf_avx2)
+/*
+ * 1) If x>1,      then r=-1/x, PIO2=Pi/2
+ * 2) If -1<=x<=1, then r=x,    PIO2=0
+ * 3) If x<-1,     then r=-1/x, PIO2=-Pi/2
+ */
+        vmovups   _sONE+__svml_satan_data_internal(%rip), %ymm2
+        vmovups   __svml_satan_data_internal(%rip), %ymm7
+        vmovups   _sPC7+__svml_satan_data_internal(%rip), %ymm13
+
+/*
+ * To use minps/maxps operations for argument reduction,
+ * uncomment the _AT_USEMINMAX_ definition.
+ *  Declarations
+ * Variables
+ * Constants
+ */
+        vandps    _sABS_MASK+__svml_satan_data_internal(%rip), %ymm0, %ymm3
+        vmaxps    %ymm3, %ymm2, %ymm5
+        vminps    %ymm3, %ymm2, %ymm4
+        vcmple_oqps %ymm2, %ymm3, %ymm6
+        vdivps    %ymm5, %ymm4, %ymm11
+        vandps    %ymm7, %ymm0, %ymm9
+        vandnps   %ymm7, %ymm6, %ymm8
+        vxorps    %ymm9, %ymm8, %ymm10
+        vxorps    %ymm11, %ymm10, %ymm15
+
+/* Polynomial. */
+        vmulps    %ymm15, %ymm15, %ymm14
+        vmovups   _sPC8+__svml_satan_data_internal(%rip), %ymm0
+        vmulps    %ymm14, %ymm14, %ymm12
+        vfmadd213ps _sPC6+__svml_satan_data_internal(%rip), %ymm12, %ymm0
+        vfmadd213ps _sPC5+__svml_satan_data_internal(%rip), %ymm12, %ymm13
+        vfmadd213ps _sPC4+__svml_satan_data_internal(%rip), %ymm12, %ymm0
+        vfmadd213ps _sPC3+__svml_satan_data_internal(%rip), %ymm12, %ymm13
+        vfmadd213ps _sPC2+__svml_satan_data_internal(%rip), %ymm12, %ymm0
+        vfmadd213ps _sPC1+__svml_satan_data_internal(%rip), %ymm12, %ymm13
+        vfmadd213ps %ymm13, %ymm14, %ymm0
+        vfmadd213ps _sPC0+__svml_satan_data_internal(%rip), %ymm14, %ymm0
+        vandnps   _sPIO2+__svml_satan_data_internal(%rip), %ymm6, %ymm1
+        vxorps    %ymm9, %ymm1, %ymm1
+
+/* Reconstruction. */
+        vfmadd213ps %ymm1, %ymm15, %ymm0
+        ret
+
+END(_ZGVdN8v_atanf_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_satan_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(32)) VUINT32 _sSIGN_MASK[8][1];
+        __declspec(align(32)) VUINT32 _sABS_MASK[8][1];
+        __declspec(align(32)) VUINT32 _sONE[8][1];
+        __declspec(align(32)) VUINT32 _sPIO2[8][1];
+        __declspec(align(32)) VUINT32 _sPC8[8][1];
+        __declspec(align(32)) VUINT32 _sPC7[8][1];
+        __declspec(align(32)) VUINT32 _sPC6[8][1];
+        __declspec(align(32)) VUINT32 _sPC5[8][1];
+        __declspec(align(32)) VUINT32 _sPC4[8][1];
+        __declspec(align(32)) VUINT32 _sPC3[8][1];
+        __declspec(align(32)) VUINT32 _sPC2[8][1];
+        __declspec(align(32)) VUINT32 _sPC1[8][1];
+        __declspec(align(32)) VUINT32 _sPC0[8][1];
+} __svml_satan_data_internal;
+#endif
+__svml_satan_data_internal:
+        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000 //_sSIGN_MASK
+        .align 32
+        .long 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF //_sABS_MASK
+        .align 32
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000 //_sONE
+        .align 32
+        .long 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB //_sPIO2
+        .align 32
+        .long 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0 //_sPC8
+        .align 32
+        .long 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631 //_sPC7
+        .align 32
+        .long 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384 //_sPC6
+        .align 32
+        .long 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629 //_sPC5
+        .align 32
+        .long 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474 //_sPC4
+        .align 32
+        .long 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8 //_sPC3
+        .align 32
+        .long 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F //_sPC2
+        .align 32
+        .long 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49 //_sPC1
+        .align 32
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000 //_sPC0
+        .align 32
+        .type	__svml_satan_data_internal,@object
+        .size	__svml_satan_data_internal,.-__svml_satan_data_internal
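The AVX2 kernel evaluates the same degree-8 polynomial in r^2 as the SSE4 one, but splits it into two independent FMA chains in r^4 and merges them at the end, which shortens the dependency chain relative to plain Horner.  A C sketch with the coefficients passed in (pc[k] corresponds to _sPCk):

#include <math.h>

static float
atanf_poly_sketch (float r, float base, const float pc[9])
{
  float t  = r * r;
  float t4 = t * t;
  float hi = fmaf (fmaf (fmaf (pc[8], t4, pc[6]), t4, pc[4]), t4, pc[2]);
  float lo = fmaf (fmaf (fmaf (pc[7], t4, pc[5]), t4, pc[3]), t4, pc[1]);
  float p  = fmaf (fmaf (hi, t, lo), t, pc[0]);  /* pc0 + pc1*t + ... + pc8*t^8 */
  return fmaf (r, p, base);                      /* reconstruction              */
}
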
diff --git a/sysdeps/x86_64/fpu/svml_d_atan2_core.S b/sysdeps/x86_64/fpu/svml_d_atan2_core.S
new file mode 100644
index 0000000000..e86d5b7047
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_atan2_core.S
@@ -0,0 +1,29 @@
+/* Function atan vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN2v_atan)
+WRAPPER_IMPL_SSE2 atan
+END (_ZGVbN2v_atan)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN2v_atan)
+#endif
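The plain (non-multiarch) entry points are thin wrappers: the SSE2 one loops over the lanes with scalar atan, and each wider one splits its vector and calls the next narrower variant twice.  Conceptually (GCC vector extensions, illustrative only):

#include <math.h>

typedef double v2df __attribute__ ((vector_size (16)));
typedef double v4df __attribute__ ((vector_size (32)));

static v2df
vatan_n2_sketch (v2df x)            /* models WRAPPER_IMPL_SSE2 atan         */
{
  v2df r = { atan (x[0]), atan (x[1]) };
  return r;
}

static v4df
vatan_n4_sketch (v4df x)            /* models WRAPPER_IMPL_AVX _ZGVbN2v_atan */
{
  v2df lo = { x[0], x[1] }, hi = { x[2], x[3] };
  v2df rlo = vatan_n2_sketch (lo), rhi = vatan_n2_sketch (hi);
  v4df r = { rlo[0], rlo[1], rhi[0], rhi[1] };
  return r;
}
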
diff --git a/sysdeps/x86_64/fpu/svml_d_atan4_core.S b/sysdeps/x86_64/fpu/svml_d_atan4_core.S
new file mode 100644
index 0000000000..eb11fd2f17
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_atan4_core.S
@@ -0,0 +1,29 @@
+/* Function atan vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN4v_atan)
+WRAPPER_IMPL_AVX _ZGVbN2v_atan
+END (_ZGVdN4v_atan)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN4v_atan)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_atan4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_atan4_core_avx.S
new file mode 100644
index 0000000000..b83a4be33d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_atan4_core_avx.S
@@ -0,0 +1,25 @@
+/* Function atan vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN4v_atan)
+WRAPPER_IMPL_AVX _ZGVbN2v_atan
+END (_ZGVcN4v_atan)
diff --git a/sysdeps/x86_64/fpu/svml_d_atan8_core.S b/sysdeps/x86_64/fpu/svml_d_atan8_core.S
new file mode 100644
index 0000000000..9685a32bdc
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_atan8_core.S
@@ -0,0 +1,25 @@
+/* Function atan vectorized with AVX-512, wrapper to AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN8v_atan)
+WRAPPER_IMPL_AVX512 _ZGVdN4v_atan
+END (_ZGVeN8v_atan)
diff --git a/sysdeps/x86_64/fpu/svml_s_atanf16_core.S b/sysdeps/x86_64/fpu/svml_s_atanf16_core.S
new file mode 100644
index 0000000000..f82d2422ae
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_atanf16_core.S
@@ -0,0 +1,25 @@
+/* Function atanf vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN16v_atanf)
+WRAPPER_IMPL_AVX512 _ZGVdN8v_atanf
+END (_ZGVeN16v_atanf)
diff --git a/sysdeps/x86_64/fpu/svml_s_atanf4_core.S b/sysdeps/x86_64/fpu/svml_s_atanf4_core.S
new file mode 100644
index 0000000000..6b8c4d9624
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_atanf4_core.S
@@ -0,0 +1,29 @@
+/* Function atanf vectorized with SSE2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN4v_atanf)
+WRAPPER_IMPL_SSE2 atanf
+END (_ZGVbN4v_atanf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4v_atanf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_atanf8_core.S b/sysdeps/x86_64/fpu/svml_s_atanf8_core.S
new file mode 100644
index 0000000000..315681f6c0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_atanf8_core.S
@@ -0,0 +1,29 @@
+/* Function atanf vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN8v_atanf)
+WRAPPER_IMPL_AVX _ZGVbN4v_atanf
+END (_ZGVdN8v_atanf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8v_atanf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_atanf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_atanf8_core_avx.S
new file mode 100644
index 0000000000..b9cd502186
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_atanf8_core_avx.S
@@ -0,0 +1,25 @@
+/* Function atanf vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+        .text
+ENTRY (_ZGVcN8v_atanf)
+WRAPPER_IMPL_AVX _ZGVbN4v_atanf
+END (_ZGVcN8v_atanf)
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx.c
new file mode 100644
index 0000000000..0f7176a20b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-atan.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx2.c
new file mode 100644
index 0000000000..0f7176a20b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx2.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-atan.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx512f.c
new file mode 100644
index 0000000000..0f7176a20b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx512f.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-atan.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atan.c b/sysdeps/x86_64/fpu/test-double-libmvec-atan.c
new file mode 100644
index 0000000000..982687b169
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-atan.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE double
+#define LIBMVEC_FUNC atan
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index 0abc7d2021..467c913990 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -28,6 +28,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVbN2v_log)
 VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVbN2v_exp)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVbN2vv_pow)
 VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVbN2v_acos)
+VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVbN2v_atan)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index dda093b914..b72a7de84e 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVdN4v_log)
 VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVdN4v_exp)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVdN4vv_pow)
 VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVdN4v_acos)
+VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVdN4v_atan)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m256i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index f3230463bb..d2434df21e 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -28,6 +28,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVcN4v_log)
 VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVcN4v_exp)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVcN4vv_pow)
 VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVcN4v_acos)
+VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVcN4v_atan)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index cf9f52faf0..f7aaf8159e 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -28,6 +28,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVeN8v_log)
 VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVeN8v_exp)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVeN8vv_pow)
 VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVeN8v_acos)
+VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVeN8v_atan)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx.c
new file mode 100644
index 0000000000..9251c65f8a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-atanf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx2.c
new file mode 100644
index 0000000000..9251c65f8a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx2.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-atanf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx512f.c
new file mode 100644
index 0000000000..9251c65f8a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx512f.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-atanf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atanf.c b/sysdeps/x86_64/fpu/test-float-libmvec-atanf.c
new file mode 100644
index 0000000000..2a8ab87e86
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-atanf.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE float
+#define LIBMVEC_FUNC atanf
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index abbd3ed870..af769c56fa 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -28,6 +28,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVeN16v_logf)
 VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVeN16v_expf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVeN16vv_powf)
 VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVeN16v_acosf)
+VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVeN16v_atanf)
 
 #define VEC_INT_TYPE __m512i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index 8a24027952..76e61d2f1e 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -28,6 +28,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVbN4v_logf)
 VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVbN4v_expf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVbN4vv_powf)
 VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVbN4v_acosf)
+VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVbN4v_atanf)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index aff0442606..5e27eaaf29 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVdN8v_logf)
 VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVdN8v_expf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVdN8vv_powf)
 VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVdN8v_acosf)
+VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVdN8v_atanf)
 
 /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
 #undef VECTOR_WRAPPER_fFF
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index 913584d111..28daf79aa9 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -28,6 +28,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVcN8v_logf)
 VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVcN8v_expf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVcN8vv_powf)
 VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVcN8v_acosf)
+VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVcN8v_atanf)
 
 #define VEC_INT_TYPE __m128i
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 02/18] x86-64: Add vector asin/asinf implementation to libmvec
  2021-12-29  6:39 [PATCH v5 00/18] x86-64: Add vector math functions to libmvec Sunil K Pandey
  2021-12-29  6:39 ` [PATCH v5 01/18] x86-64: Add vector atan/atanf implementation " Sunil K Pandey
@ 2021-12-29  6:39 ` Sunil K Pandey
  2021-12-29 21:25   ` H.J. Lu
  2021-12-29  6:39 ` [PATCH v5 03/18] x86-64: Add vector hypot/hypotf " Sunil K Pandey
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Sunil K Pandey @ 2021-12-29  6:39 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, andrey.kolesov, marius.cornea

Implement vectorized asin/asinf for libmvec, with SSE, AVX, AVX2 and
AVX512 versions, as per the vector ABI.  The patch also includes
accuracy and ABI tests for vector asin/asinf with regenerated ulps.
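
As an illustrative sketch only (the compiler flags and the helper name
below are assumptions, not part of this patch), callers reach the new
entry points through GCC auto-vectorization against the vector ABI,
e.g. _ZGVdN4v_asin for the AVX2 case:

  #include <math.h>
  #include <stddef.h>

  /* Compiled with e.g. -O2 -ffast-math -fopenmp-simd -march=haswell,
     GCC may vectorize this loop and call _ZGVdN4v_asin from libmvec.  */
  void
  map_asin (const double *in, double *out, size_t n)
  {
    for (size_t i = 0; i < n; i++)
      out[i] = asin (in[i]);
  }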
---
 bits/libm-simd-decl-stubs.h                   |  11 +
 math/bits/mathcalls.h                         |   2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
 sysdeps/x86/fpu/bits/math-vector.h            |   4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
 sysdeps/x86_64/fpu/Makeconfig                 |   1 +
 sysdeps/x86_64/fpu/Versions                   |   2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |  20 ++
 .../fpu/multiarch/svml_d_asin2_core-sse2.S    |  20 ++
 .../x86_64/fpu/multiarch/svml_d_asin2_core.c  |  27 ++
 .../fpu/multiarch/svml_d_asin2_core_sse4.S    | 288 +++++++++++++++++
 .../fpu/multiarch/svml_d_asin4_core-sse.S     |  20 ++
 .../x86_64/fpu/multiarch/svml_d_asin4_core.c  |  27 ++
 .../fpu/multiarch/svml_d_asin4_core_avx2.S    | 273 ++++++++++++++++
 .../fpu/multiarch/svml_d_asin8_core-avx2.S    |  20 ++
 .../x86_64/fpu/multiarch/svml_d_asin8_core.c  |  27 ++
 .../fpu/multiarch/svml_d_asin8_core_avx512.S  | 295 ++++++++++++++++++
 .../fpu/multiarch/svml_s_asinf16_core-avx2.S  |  20 ++
 .../fpu/multiarch/svml_s_asinf16_core.c       |  28 ++
 .../multiarch/svml_s_asinf16_core_avx512.S    | 260 +++++++++++++++
 .../fpu/multiarch/svml_s_asinf4_core-sse2.S   |  20 ++
 .../x86_64/fpu/multiarch/svml_s_asinf4_core.c |  28 ++
 .../fpu/multiarch/svml_s_asinf4_core_sse4.S   | 252 +++++++++++++++
 .../fpu/multiarch/svml_s_asinf8_core-sse.S    |  20 ++
 .../x86_64/fpu/multiarch/svml_s_asinf8_core.c |  28 ++
 .../fpu/multiarch/svml_s_asinf8_core_avx2.S   | 249 +++++++++++++++
 sysdeps/x86_64/fpu/svml_d_asin2_core.S        |  29 ++
 sysdeps/x86_64/fpu/svml_d_asin4_core.S        |  29 ++
 sysdeps/x86_64/fpu/svml_d_asin4_core_avx.S    |  25 ++
 sysdeps/x86_64/fpu/svml_d_asin8_core.S        |  25 ++
 sysdeps/x86_64/fpu/svml_s_asinf16_core.S      |  25 ++
 sysdeps/x86_64/fpu/svml_s_asinf4_core.S       |  29 ++
 sysdeps/x86_64/fpu/svml_s_asinf8_core.S       |  29 ++
 sysdeps/x86_64/fpu/svml_s_asinf8_core_avx.S   |  25 ++
 .../x86_64/fpu/test-double-libmvec-asin-avx.c |   1 +
 .../fpu/test-double-libmvec-asin-avx2.c       |   1 +
 .../fpu/test-double-libmvec-asin-avx512f.c    |   1 +
 sysdeps/x86_64/fpu/test-double-libmvec-asin.c |   3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-libmvec-asinf-avx.c |   1 +
 .../fpu/test-float-libmvec-asinf-avx2.c       |   1 +
 .../fpu/test-float-libmvec-asinf-avx512f.c    |   1 +
 sysdeps/x86_64/fpu/test-float-libmvec-asinf.c |   3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
 50 files changed, 2189 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_asin2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_asin4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_asin4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_asin8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_asinf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_asinf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_asinf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_asinf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asin-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asin-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asin-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asin.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinf.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
index b4647ca918..ae8ee882d0 100644
--- a/bits/libm-simd-decl-stubs.h
+++ b/bits/libm-simd-decl-stubs.h
@@ -120,4 +120,15 @@
 #define __DECL_SIMD_atanf32x
 #define __DECL_SIMD_atanf64x
 #define __DECL_SIMD_atanf128x
+
+#define __DECL_SIMD_asin
+#define __DECL_SIMD_asinf
+#define __DECL_SIMD_asinl
+#define __DECL_SIMD_asinf16
+#define __DECL_SIMD_asinf32
+#define __DECL_SIMD_asinf64
+#define __DECL_SIMD_asinf128
+#define __DECL_SIMD_asinf32x
+#define __DECL_SIMD_asinf64x
+#define __DECL_SIMD_asinf128x
 #endif
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index 3e27c21f21..bb53b7021e 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -52,7 +52,7 @@
 /* Arc cosine of X.  */
 __MATHCALL_VEC (acos,, (_Mdouble_ __x));
 /* Arc sine of X.  */
-__MATHCALL (asin,, (_Mdouble_ __x));
+__MATHCALL_VEC (asin,, (_Mdouble_ __x));
 /* Arc tangent of X.  */
 __MATHCALL_VEC (atan,, (_Mdouble_ __x));
 /* Arc tangent of Y/X.  */
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index a93258db6f..ab03a07f92 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -47,18 +47,26 @@ GLIBC_2.22 _ZGVeN8v_sin F
 GLIBC_2.22 _ZGVeN8vv_pow F
 GLIBC_2.22 _ZGVeN8vvv_sincos F
 GLIBC_2.35 _ZGVbN2v_acos F
+GLIBC_2.35 _ZGVbN2v_asin F
 GLIBC_2.35 _ZGVbN2v_atan F
 GLIBC_2.35 _ZGVbN4v_acosf F
+GLIBC_2.35 _ZGVbN4v_asinf F
 GLIBC_2.35 _ZGVbN4v_atanf F
 GLIBC_2.35 _ZGVcN4v_acos F
+GLIBC_2.35 _ZGVcN4v_asin F
 GLIBC_2.35 _ZGVcN4v_atan F
 GLIBC_2.35 _ZGVcN8v_acosf F
+GLIBC_2.35 _ZGVcN8v_asinf F
 GLIBC_2.35 _ZGVcN8v_atanf F
 GLIBC_2.35 _ZGVdN4v_acos F
+GLIBC_2.35 _ZGVdN4v_asin F
 GLIBC_2.35 _ZGVdN4v_atan F
 GLIBC_2.35 _ZGVdN8v_acosf F
+GLIBC_2.35 _ZGVdN8v_asinf F
 GLIBC_2.35 _ZGVdN8v_atanf F
 GLIBC_2.35 _ZGVeN16v_acosf F
+GLIBC_2.35 _ZGVeN16v_asinf F
 GLIBC_2.35 _ZGVeN16v_atanf F
 GLIBC_2.35 _ZGVeN8v_acos F
+GLIBC_2.35 _ZGVeN8v_asin F
 GLIBC_2.35 _ZGVeN8v_atan F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index 1c0e5c5e35..73cb8849ff 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -66,6 +66,10 @@
 #  define __DECL_SIMD_atan __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_atanf
 #  define __DECL_SIMD_atanf __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_asin
+#  define __DECL_SIMD_asin __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_asinf
+#  define __DECL_SIMD_asinf __DECL_SIMD_x86_64
 
 # endif
 #endif
diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
index ddcccb11d7..4552c2bdfa 100644
--- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
@@ -32,6 +32,8 @@
 !GCC$ builtin (acosf) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (atan) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (atanf) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (asin) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (asinf) attributes simd (notinbranch) if('x86_64')
 
 !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
@@ -49,3 +51,5 @@
 !GCC$ builtin (acosf) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (atan) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (atanf) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (asin) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (asinf) attributes simd (notinbranch) if('x32')
diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
index dae0887f13..e0eae0b196 100644
--- a/sysdeps/x86_64/fpu/Makeconfig
+++ b/sysdeps/x86_64/fpu/Makeconfig
@@ -23,6 +23,7 @@ postclean-generated += libmvec.mk
 # Define for both math and mathvec directories.
 libmvec-funcs = \
   acos \
+  asin \
   atan \
   cos \
   exp \
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index 424f6d526e..10baf869a5 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -15,8 +15,10 @@ libmvec {
   }
   GLIBC_2.35 {
     _ZGVbN2v_acos; _ZGVcN4v_acos; _ZGVdN4v_acos; _ZGVeN8v_acos;
+    _ZGVbN2v_asin; _ZGVcN4v_asin; _ZGVdN4v_asin; _ZGVeN8v_asin;
     _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan;
     _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
+    _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
     _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf;
   }
 }
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 2e64e59803..ea0f833381 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -93,6 +93,26 @@ float: 1
 float128: 2
 ldouble: 1
 
+Function: "asin_vlen16":
+float: 1
+
+Function: "asin_vlen2":
+double: 1
+
+Function: "asin_vlen4":
+double: 1
+float: 1
+
+Function: "asin_vlen4_avx2":
+double: 1
+
+Function: "asin_vlen8":
+double: 1
+float: 1
+
+Function: "asin_vlen8_avx2":
+float: 1
+
 Function: "asinh":
 double: 2
 float: 2
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core-sse2.S
new file mode 100644
index 0000000000..57e1d41a7b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized asin, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN2v_asin _ZGVbN2v_asin_sse2
+#include "../svml_d_asin2_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core.c
new file mode 100644
index 0000000000..e46c3af81e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized asin, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN2v_asin
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN2v_asin, __GI__ZGVbN2v_asin, __redirect__ZGVbN2v_asin)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core_sse4.S
new file mode 100644
index 0000000000..a6f7a41623
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core_sse4.S
@@ -0,0 +1,288 @@
+/* Function asin vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *      SelMask = (|x| >= 0.5) ? 1 : 0;
+ *      R = SelMask ? sqrt(0.5 - 0.5*|x|) : |x|
+ *      asin(x) = (SelMask ? (Pi/2 - 2*Poly(R)) : Poly(R))*(-1)^sign(x)
+ *
+ */
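+
+/* Scalar sketch of the algorithm above, as a reference comment only
+ * (Poly stands for the polynomial approximation built from the
+ * poly_coeff table below; the C-like names are illustrative):
+ *
+ *      double ax  = fabs (x);
+ *      int    sel = ax >= 0.5;
+ *      double R   = sel ? sqrt (0.5 - 0.5 * ax) : ax;
+ *      double res = sel ? (M_PI_2 - 2.0 * Poly (R)) : Poly (R);
+ *      return (x < 0.0) ? -res : res;
+ */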
+
+/* Offsets for data table __svml_dasin_data_internal
+ */
+#define AbsMask                       	0
+#define OneHalf                       	16
+#define SmallNorm                     	32
+#define One                           	48
+#define Two                           	64
+#define sqrt_coeff                    	80
+#define poly_coeff                    	144
+#define Pi2H                          	336
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN2v_asin_sse4)
+        subq      $72, %rsp
+        cfi_def_cfa_offset(80)
+        movaps    %xmm0, %xmm5
+        movups    __svml_dasin_data_internal(%rip), %xmm3
+        movups    OneHalf+__svml_dasin_data_internal(%rip), %xmm8
+
+/* x = |arg| */
+        movaps    %xmm3, %xmm4
+        andps     %xmm5, %xmm4
+
+/* Y = 0.5 - 0.5*x */
+        movaps    %xmm8, %xmm6
+        mulpd     %xmm4, %xmm6
+        movaps    %xmm8, %xmm14
+
+/* x^2 */
+        movaps    %xmm4, %xmm2
+        subpd     %xmm6, %xmm14
+        mulpd     %xmm4, %xmm2
+
+/* S ~ -2*sqrt(Y) */
+        cvtpd2ps  %xmm14, %xmm9
+        minpd     %xmm14, %xmm2
+        movlhps   %xmm9, %xmm9
+        movaps    %xmm14, %xmm15
+        rsqrtps   %xmm9, %xmm10
+        cmpltpd   SmallNorm+__svml_dasin_data_internal(%rip), %xmm15
+        addpd     %xmm14, %xmm14
+        cvtps2pd  %xmm10, %xmm11
+        andnps    %xmm11, %xmm15
+        movaps    %xmm4, %xmm1
+        movaps    %xmm15, %xmm12
+        andnps    %xmm5, %xmm3
+        mulpd     %xmm15, %xmm12
+        mulpd     %xmm14, %xmm15
+        mulpd     %xmm12, %xmm14
+        cmpnltpd  %xmm8, %xmm1
+        subpd     Two+__svml_dasin_data_internal(%rip), %xmm14
+
+/* polynomial */
+        movups    poly_coeff+__svml_dasin_data_internal(%rip), %xmm6
+        movaps    %xmm2, %xmm12
+        mulpd     %xmm2, %xmm6
+        mulpd     %xmm2, %xmm12
+        addpd     poly_coeff+16+__svml_dasin_data_internal(%rip), %xmm6
+        movups    One+__svml_dasin_data_internal(%rip), %xmm7
+        movaps    %xmm12, %xmm8
+        cmpltpd   %xmm4, %xmm7
+        mulpd     %xmm12, %xmm6
+        movmskpd  %xmm7, %edx
+        movups    poly_coeff+32+__svml_dasin_data_internal(%rip), %xmm9
+        movaps    %xmm14, %xmm0
+        movups    poly_coeff+64+__svml_dasin_data_internal(%rip), %xmm7
+        mulpd     %xmm2, %xmm9
+        mulpd     %xmm2, %xmm7
+        addpd     poly_coeff+48+__svml_dasin_data_internal(%rip), %xmm9
+        addpd     poly_coeff+80+__svml_dasin_data_internal(%rip), %xmm7
+        mulpd     %xmm12, %xmm8
+        mulpd     %xmm12, %xmm7
+        addpd     %xmm6, %xmm9
+        mulpd     %xmm15, %xmm0
+        mulpd     %xmm8, %xmm9
+        movups    poly_coeff+96+__svml_dasin_data_internal(%rip), %xmm10
+        mulpd     %xmm2, %xmm10
+        movups    sqrt_coeff+__svml_dasin_data_internal(%rip), %xmm13
+        mulpd     %xmm14, %xmm13
+        addpd     poly_coeff+112+__svml_dasin_data_internal(%rip), %xmm10
+        addpd     sqrt_coeff+16+__svml_dasin_data_internal(%rip), %xmm13
+        addpd     %xmm7, %xmm10
+        mulpd     %xmm14, %xmm13
+        addpd     %xmm9, %xmm10
+        addpd     sqrt_coeff+32+__svml_dasin_data_internal(%rip), %xmm13
+        mulpd     %xmm12, %xmm10
+        mulpd     %xmm13, %xmm14
+        movups    poly_coeff+128+__svml_dasin_data_internal(%rip), %xmm11
+        mulpd     %xmm2, %xmm11
+        addpd     sqrt_coeff+48+__svml_dasin_data_internal(%rip), %xmm14
+        addpd     poly_coeff+144+__svml_dasin_data_internal(%rip), %xmm11
+        mulpd     %xmm14, %xmm0
+        addpd     %xmm10, %xmm11
+        subpd     %xmm15, %xmm0
+        mulpd     %xmm11, %xmm12
+        movups    poly_coeff+160+__svml_dasin_data_internal(%rip), %xmm13
+        movaps    %xmm1, %xmm14
+        mulpd     %xmm2, %xmm13
+        addpd     poly_coeff+176+__svml_dasin_data_internal(%rip), %xmm13
+        addpd     %xmm12, %xmm13
+        mulpd     %xmm13, %xmm2
+        andnps    %xmm4, %xmm14
+        andps     %xmm1, %xmm0
+        orps      %xmm0, %xmm14
+        mulpd     %xmm14, %xmm2
+        addpd     %xmm2, %xmm14
+        movups    Pi2H+__svml_dasin_data_internal(%rip), %xmm0
+        andps     %xmm1, %xmm0
+        addpd     %xmm14, %xmm0
+        pxor      %xmm3, %xmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm5
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        addq      $72, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(80)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm5, 32(%rsp)
+        movups    %xmm0, 48(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 edx
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -64)
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -72)
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $2, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -64)
+        cfi_offset(13, -72)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      asin@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 48(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN2v_asin_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_dasin_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(16)) VUINT32 AbsMask[2][2];
+        __declspec(align(16)) VUINT32 OneHalf[2][2];
+        __declspec(align(16)) VUINT32 SmallNorm[2][2];
+        __declspec(align(16)) VUINT32 One[2][2];
+        __declspec(align(16)) VUINT32 Two[2][2];
+        __declspec(align(16)) VUINT32 sqrt_coeff[4][2][2];
+        __declspec(align(16)) VUINT32 poly_coeff[12][2][2];
+        __declspec(align(16)) VUINT32 Pi2H[2][2];
+} __svml_dasin_data_internal;
+#endif
+__svml_dasin_data_internal:
+        /*== AbsMask ==*/
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff
+        /*== OneHalf ==*/
+        .align 16
+        .quad 0x3fe0000000000000, 0x3fe0000000000000
+        /*== SmallNorm ==*/
+        .align 16
+        .quad 0x3000000000000000, 0x3000000000000000
+        /*== One ==*/
+        .align 16
+        .quad 0x3ff0000000000000, 0x3ff0000000000000
+        /*== Two ==*/
+        .align 16
+        .quad 0x4000000000000000, 0x4000000000000000
+        /*== sqrt_coeff[4] ==*/
+        .align 16
+        .quad 0xbf918000993B24C3, 0xbf918000993B24C3 /* sqrt_coeff4 */
+        .quad 0x3fa400006F70D42D, 0x3fa400006F70D42D /* sqrt_coeff3 */
+        .quad 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97 /* sqrt_coeff2 */
+        .quad 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D /* sqrt_coeff1 */
+        /*== poly_coeff[12] ==*/
+        .align 16
+        .quad 0x3fa07520C70EB909, 0x3fa07520C70EB909 /* poly_coeff12 */
+        .quad 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED /* poly_coeff11 */
+        .quad 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE /* poly_coeff10 */
+        .quad 0x3f7A583395D45ED5, 0x3f7A583395D45ED5 /* poly_coeff9 */
+        .quad 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6 /* poly_coeff8 */
+        .quad 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57 /* poly_coeff7 */
+        .quad 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E /* poly_coeff6 */
+        .quad 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd /* poly_coeff5 */
+        .quad 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE /* poly_coeff4 */
+        .quad 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8 /* poly_coeff3 */
+        .quad 0x3fb333333337E0DE, 0x3fb333333337E0DE /* poly_coeff2 */
+        .quad 0x3fc555555555529C, 0x3fc555555555529C /* poly_coeff1 */
+        /*== Pi2H ==*/
+        .align 16
+        .quad 0x3ff921fb54442d18, 0x3ff921fb54442d18
+        .align 16
+        .type	__svml_dasin_data_internal,@object
+        .size	__svml_dasin_data_internal,.-__svml_dasin_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core-sse.S
new file mode 100644
index 0000000000..1006fddc59
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized asin, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN4v_asin _ZGVdN4v_asin_sse_wrapper
+#include "../svml_d_asin4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core.c
new file mode 100644
index 0000000000..b896516f5e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized asin, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN4v_asin
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN4v_asin, __GI__ZGVdN4v_asin, __redirect__ZGVdN4v_asin)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core_avx2.S
new file mode 100644
index 0000000000..80467b616f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core_avx2.S
@@ -0,0 +1,273 @@
+/* Function asin vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *      SelMask = (|x| >= 0.5) ? 1 : 0;
+ *      R = SelMask ? sqrt(0.5 - 0.5*|x|) : |x|
+ *      asin(x) = (SelMask ? (Pi/2 - 2*Poly(R)) : Poly(R))*(-1)^sign(x)
+ *
+ */
+
+/* Offsets for data table __svml_dasin_data_internal
+ */
+#define AbsMask                       	0
+#define OneHalf                       	32
+#define SmallNorm                     	64
+#define One                           	96
+#define Two                           	128
+#define sqrt_coeff                    	160
+#define poly_coeff                    	288
+#define Pi2H                          	672
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN4v_asin_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+        vmovupd   __svml_dasin_data_internal(%rip), %ymm6
+        vmovupd   OneHalf+__svml_dasin_data_internal(%rip), %ymm10
+        vmovupd   One+__svml_dasin_data_internal(%rip), %ymm8
+        vmovapd   %ymm0, %ymm5
+
+/* x = |arg| */
+        vandpd    %ymm5, %ymm6, %ymm4
+
+/* Y = 0.5 - 0.5*x */
+        vmovapd   %ymm10, %ymm15
+        vfnmadd231pd %ymm4, %ymm10, %ymm15
+
+/* x^2 */
+        vmulpd    %ymm4, %ymm4, %ymm7
+        vcmplt_oqpd %ymm4, %ymm8, %ymm9
+
+/* S ~ -2*sqrt(Y) */
+        vcmplt_oqpd SmallNorm+__svml_dasin_data_internal(%rip), %ymm15, %ymm13
+        vminpd    %ymm15, %ymm7, %ymm2
+        vaddpd    %ymm15, %ymm15, %ymm7
+        vcmpnlt_uqpd %ymm10, %ymm4, %ymm1
+        vcvtpd2ps %ymm15, %xmm11
+        vmovupd   poly_coeff+64+__svml_dasin_data_internal(%rip), %ymm10
+        vmulpd    %ymm2, %ymm2, %ymm15
+        vrsqrtps  %xmm11, %xmm12
+        vmovupd   poly_coeff+192+__svml_dasin_data_internal(%rip), %ymm11
+        vfmadd213pd poly_coeff+96+__svml_dasin_data_internal(%rip), %ymm2, %ymm10
+        vcvtps2pd %xmm12, %ymm14
+        vmulpd    %ymm15, %ymm15, %ymm12
+        vfmadd213pd poly_coeff+224+__svml_dasin_data_internal(%rip), %ymm2, %ymm11
+        vandnpd   %ymm14, %ymm13, %ymm0
+        vandnpd   %ymm5, %ymm6, %ymm3
+        vmulpd    %ymm0, %ymm0, %ymm6
+        vmovupd   poly_coeff+128+__svml_dasin_data_internal(%rip), %ymm13
+        vmovupd   poly_coeff+256+__svml_dasin_data_internal(%rip), %ymm14
+        vfmadd213pd poly_coeff+160+__svml_dasin_data_internal(%rip), %ymm2, %ymm13
+        vfmadd213pd poly_coeff+288+__svml_dasin_data_internal(%rip), %ymm2, %ymm14
+        vfmadd213pd %ymm11, %ymm15, %ymm13
+        vmovmskpd %ymm9, %edx
+        vmulpd    %ymm7, %ymm0, %ymm9
+        vfmsub213pd Two+__svml_dasin_data_internal(%rip), %ymm6, %ymm7
+
+/* polynomial */
+        vmovupd   poly_coeff+__svml_dasin_data_internal(%rip), %ymm6
+        vmovupd   sqrt_coeff+__svml_dasin_data_internal(%rip), %ymm0
+        vmulpd    %ymm7, %ymm9, %ymm8
+        vfmadd213pd poly_coeff+32+__svml_dasin_data_internal(%rip), %ymm2, %ymm6
+        vfmadd213pd sqrt_coeff+32+__svml_dasin_data_internal(%rip), %ymm7, %ymm0
+        vfmadd213pd %ymm10, %ymm15, %ymm6
+        vmovupd   poly_coeff+320+__svml_dasin_data_internal(%rip), %ymm10
+        vfmadd213pd sqrt_coeff+64+__svml_dasin_data_internal(%rip), %ymm7, %ymm0
+        vfmadd213pd %ymm13, %ymm12, %ymm6
+        vfmadd213pd poly_coeff+352+__svml_dasin_data_internal(%rip), %ymm2, %ymm10
+        vfmadd213pd sqrt_coeff+96+__svml_dasin_data_internal(%rip), %ymm7, %ymm0
+        vfmadd213pd %ymm14, %ymm15, %ymm6
+        vfmsub213pd %ymm9, %ymm8, %ymm0
+        vfmadd213pd %ymm10, %ymm15, %ymm6
+        vblendvpd %ymm1, %ymm0, %ymm4, %ymm4
+        vmulpd    %ymm6, %ymm2, %ymm2
+        vfmadd213pd %ymm4, %ymm4, %ymm2
+        vandpd    Pi2H+__svml_dasin_data_internal(%rip), %ymm1, %ymm1
+        vaddpd    %ymm2, %ymm1, %ymm0
+        vxorpd    %ymm3, %ymm0, %ymm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm5
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovupd   %ymm5, 32(%rsp)
+        vmovupd   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovupd   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      asin@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 64(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN4v_asin_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_dasin_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(32)) VUINT32 AbsMask[4][2];
+        __declspec(align(32)) VUINT32 OneHalf[4][2];
+        __declspec(align(32)) VUINT32 SmallNorm[4][2];
+        __declspec(align(32)) VUINT32 One[4][2];
+        __declspec(align(32)) VUINT32 Two[4][2];
+        __declspec(align(32)) VUINT32 sqrt_coeff[4][4][2];
+        __declspec(align(32)) VUINT32 poly_coeff[12][4][2];
+        __declspec(align(32)) VUINT32 Pi2H[4][2];
+} __svml_dasin_data_internal;
+#endif
+__svml_dasin_data_internal:
+        /*== AbsMask ==*/
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
+        /*== OneHalf ==*/
+        .align 32
+        .quad 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000
+        /*== SmallNorm ==*/
+        .align 32
+        .quad 0x3000000000000000, 0x3000000000000000, 0x3000000000000000, 0x3000000000000000
+        /*== One ==*/
+        .align 32
+        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
+        /*== Two ==*/
+        .align 32
+        .quad 0x4000000000000000, 0x4000000000000000, 0x4000000000000000, 0x4000000000000000
+        /*== sqrt_coeff[4] ==*/
+        .align 32
+        .quad 0xbf918000993B24C3, 0xbf918000993B24C3, 0xbf918000993B24C3, 0xbf918000993B24C3 /* sqrt_coeff4 */
+        .quad 0x3fa400006F70D42D, 0x3fa400006F70D42D, 0x3fa400006F70D42D, 0x3fa400006F70D42D /* sqrt_coeff3 */
+        .quad 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97 /* sqrt_coeff2 */
+        .quad 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D /* sqrt_coeff1 */
+        /*== poly_coeff[12] ==*/
+        .align 32
+        .quad 0x3fa07520C70EB909, 0x3fa07520C70EB909, 0x3fa07520C70EB909, 0x3fa07520C70EB909 /* poly_coeff12 */
+        .quad 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED /* poly_coeff11 */
+        .quad 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE /* poly_coeff10 */
+        .quad 0x3f7A583395D45ED5, 0x3f7A583395D45ED5, 0x3f7A583395D45ED5, 0x3f7A583395D45ED5 /* poly_coeff9 */
+        .quad 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6 /* poly_coeff8 */
+        .quad 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57 /* poly_coeff7 */
+        .quad 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E /* poly_coeff6 */
+        .quad 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd /* poly_coeff5 */
+        .quad 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE /* poly_coeff4 */
+        .quad 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8 /* poly_coeff3 */
+        .quad 0x3fb333333337E0DE, 0x3fb333333337E0DE, 0x3fb333333337E0DE, 0x3fb333333337E0DE /* poly_coeff2 */
+        .quad 0x3fc555555555529C, 0x3fc555555555529C, 0x3fc555555555529C, 0x3fc555555555529C /* poly_coeff1 */
+        /*== Pi2H ==*/
+        .align 32
+        .quad 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18
+        .align 32
+        .type	__svml_dasin_data_internal,@object
+        .size	__svml_dasin_data_internal,.-__svml_dasin_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core-avx2.S
new file mode 100644
index 0000000000..354a55dfaa
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized asin, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN8v_asin _ZGVeN8v_asin_avx2_wrapper
+#include "../svml_d_asin8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core.c
new file mode 100644
index 0000000000..b03e4a2b9c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized asin, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN8v_asin
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN8v_asin, __GI__ZGVeN8v_asin, __redirect__ZGVeN8v_asin)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core_avx512.S
new file mode 100644
index 0000000000..b2fd8edb13
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core_avx512.S
@@ -0,0 +1,295 @@
+/* Function asin vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *      SelMask = (|x| >= 0.5) ? 1 : 0;
+ *      R = SelMask ? sqrt(0.5 - 0.5*|x|) : |x|
+ *      asin(x) = (SelMask ? (Pi/2 - 2*Poly(R)) : Poly(R))*(-1)^sign(x)
+ *
+ */
+
+/* Offsets for data table __svml_dasin_data_internal
+ */
+#define AbsMask                       	0
+#define OneHalf                       	64
+#define SmallNorm                     	128
+#define One                           	192
+#define Two                           	256
+#define sqrt_coeff_1                  	320
+#define sqrt_coeff_2                  	384
+#define sqrt_coeff_3                  	448
+#define sqrt_coeff_4                  	512
+#define poly_coeff_1                  	576
+#define poly_coeff_2                  	640
+#define poly_coeff_3                  	704
+#define poly_coeff_4                  	768
+#define poly_coeff_5                  	832
+#define poly_coeff_6                  	896
+#define poly_coeff_7                  	960
+#define poly_coeff_8                  	1024
+#define poly_coeff_9                  	1088
+#define poly_coeff_10                 	1152
+#define poly_coeff_11                 	1216
+#define poly_coeff_12                 	1280
+#define Pi2H                          	1344
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN8v_asin_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovups   OneHalf+__svml_dasin_data_internal(%rip), %zmm8
+
+/* S ~ -2*sqrt(Y) */
+        vmovups   SmallNorm+__svml_dasin_data_internal(%rip), %zmm10
+        vmovups   Two+__svml_dasin_data_internal(%rip), %zmm14
+        vmovups   sqrt_coeff_1+__svml_dasin_data_internal(%rip), %zmm15
+        vmovups   sqrt_coeff_2+__svml_dasin_data_internal(%rip), %zmm2
+        vmovups   sqrt_coeff_3+__svml_dasin_data_internal(%rip), %zmm1
+        vmovups   One+__svml_dasin_data_internal(%rip), %zmm9
+        vmovaps   %zmm0, %zmm6
+
+/* x = |arg| */
+        vandpd    __svml_dasin_data_internal(%rip), %zmm6, %zmm4
+
+/* Y = 0.5 - 0.5*x */
+        vmovaps   %zmm8, %zmm11
+        vfnmadd231pd {rn-sae}, %zmm4, %zmm8, %zmm11
+
+/* x^2 */
+        vmulpd    {rn-sae}, %zmm4, %zmm4, %zmm7
+        vrsqrt14pd %zmm11, %zmm12
+        vcmppd    $17, {sae}, %zmm10, %zmm11, %k1
+        vcmppd    $21, {sae}, %zmm8, %zmm4, %k2
+        vcmppd    $17, {sae}, %zmm4, %zmm9, %k0
+        vmovups   poly_coeff_5+__svml_dasin_data_internal(%rip), %zmm10
+
+/* polynomial */
+        vmovups   poly_coeff_1+__svml_dasin_data_internal(%rip), %zmm8
+        vmovups   poly_coeff_3+__svml_dasin_data_internal(%rip), %zmm9
+        vminpd    {sae}, %zmm11, %zmm7, %zmm3
+        vxorpd    %zmm12, %zmm12, %zmm12{%k1}
+        vaddpd    {rn-sae}, %zmm11, %zmm11, %zmm0
+        vxorpd    %zmm6, %zmm4, %zmm5
+        vmulpd    {rn-sae}, %zmm12, %zmm12, %zmm13
+        vmulpd    {rn-sae}, %zmm12, %zmm0, %zmm7
+        vmovups   poly_coeff_7+__svml_dasin_data_internal(%rip), %zmm11
+        vmovups   poly_coeff_4+__svml_dasin_data_internal(%rip), %zmm12
+        vfmsub213pd {rn-sae}, %zmm14, %zmm13, %zmm0
+        vmovups   sqrt_coeff_4+__svml_dasin_data_internal(%rip), %zmm13
+        vfmadd231pd {rn-sae}, %zmm3, %zmm9, %zmm12
+        vmovups   poly_coeff_11+__svml_dasin_data_internal(%rip), %zmm9
+        vfmadd231pd {rn-sae}, %zmm0, %zmm15, %zmm2
+        vmovups   poly_coeff_9+__svml_dasin_data_internal(%rip), %zmm15
+        vmulpd    {rn-sae}, %zmm0, %zmm7, %zmm14
+        vfmadd213pd {rn-sae}, %zmm1, %zmm0, %zmm2
+        vmovups   poly_coeff_2+__svml_dasin_data_internal(%rip), %zmm1
+        kmovw     %k0, %edx
+        vfmadd213pd {rn-sae}, %zmm13, %zmm0, %zmm2
+        vfmadd231pd {rn-sae}, %zmm3, %zmm8, %zmm1
+        vmovups   poly_coeff_10+__svml_dasin_data_internal(%rip), %zmm8
+        vmulpd    {rn-sae}, %zmm3, %zmm3, %zmm0
+        vfmsub213pd {rn-sae}, %zmm7, %zmm14, %zmm2
+        vmovups   poly_coeff_6+__svml_dasin_data_internal(%rip), %zmm7
+        vfmadd231pd {rn-sae}, %zmm3, %zmm15, %zmm8
+        vfmadd213pd {rn-sae}, %zmm12, %zmm0, %zmm1
+        vblendmpd %zmm2, %zmm4, %zmm2{%k2}
+        vfmadd231pd {rn-sae}, %zmm3, %zmm10, %zmm7
+        vmovups   poly_coeff_8+__svml_dasin_data_internal(%rip), %zmm10
+        vmovups   Pi2H+__svml_dasin_data_internal(%rip), %zmm4
+        vfmadd231pd {rn-sae}, %zmm3, %zmm11, %zmm10
+        vmovups   poly_coeff_12+__svml_dasin_data_internal(%rip), %zmm11
+        vfmadd213pd {rn-sae}, %zmm10, %zmm0, %zmm7
+        vfmadd231pd {rn-sae}, %zmm3, %zmm9, %zmm11
+        vmulpd    {rn-sae}, %zmm0, %zmm0, %zmm10
+        vfmadd213pd {rn-sae}, %zmm7, %zmm10, %zmm1
+        vfmadd213pd {rn-sae}, %zmm8, %zmm0, %zmm1
+        vfmadd213pd {rn-sae}, %zmm11, %zmm0, %zmm1
+        vmulpd    {rn-sae}, %zmm3, %zmm1, %zmm3
+        vfmadd213pd {rn-sae}, %zmm2, %zmm2, %zmm3
+        vaddpd    {rn-sae}, %zmm4, %zmm3, %zmm3{%k2}
+        vxorpd    %zmm5, %zmm3, %zmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm6
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm6, 64(%rsp)
+        vmovups   %zmm0, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     64(%rsp,%r14,8), %xmm0
+        call      asin@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 128(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN8v_asin_skx)
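+
+/* The special-input path above amounts to this scalar fallback (an
+   illustrative sketch only; args[] and res[] stand for the zmm spills at
+   64(%rsp) and 128(%rsp), mask for the range-mask bits carried in %edx):
+
+       for (int i = 0; i < 8; i++)
+         if (mask & (1u << i))
+           res[i] = asin (args[i]);
+ */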
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_dasin_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 AbsMask[8][2];
+        __declspec(align(64)) VUINT32 OneHalf[8][2];
+        __declspec(align(64)) VUINT32 SmallNorm[8][2];
+        __declspec(align(64)) VUINT32 One[8][2];
+        __declspec(align(64)) VUINT32 Two[8][2];
+        __declspec(align(64)) VUINT32 sqrt_coeff[4][8][2];
+        __declspec(align(64)) VUINT32 poly_coeff[12][8][2];
+        __declspec(align(64)) VUINT32 Pi2H[8][2];
+} __svml_dasin_data_internal;
+#endif
+__svml_dasin_data_internal:
+        /*== AbsMask ==*/
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
+        /*== OneHalf ==*/
+        .align 64
+        .quad 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000
+        /*== SmallNorm ==*/
+        .align 64
+        .quad 0x3000000000000000, 0x3000000000000000, 0x3000000000000000, 0x3000000000000000, 0x3000000000000000, 0x3000000000000000, 0x3000000000000000, 0x3000000000000000
+        /*== One ==*/
+        .align 64
+        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
+        /*== Two ==*/
+        .align 64
+        .quad 0x4000000000000000, 0x4000000000000000, 0x4000000000000000, 0x4000000000000000, 0x4000000000000000, 0x4000000000000000, 0x4000000000000000, 0x4000000000000000
+        /*== sqrt_coeff[4] ==*/
+        .align 64
+        .quad 0xbf918000993B24C3, 0xbf918000993B24C3, 0xbf918000993B24C3, 0xbf918000993B24C3, 0xbf918000993B24C3, 0xbf918000993B24C3, 0xbf918000993B24C3, 0xbf918000993B24C3 /* sqrt_coeff4 */
+        .quad 0x3fa400006F70D42D, 0x3fa400006F70D42D, 0x3fa400006F70D42D, 0x3fa400006F70D42D, 0x3fa400006F70D42D, 0x3fa400006F70D42D, 0x3fa400006F70D42D, 0x3fa400006F70D42D /* sqrt_coeff3 */
+        .quad 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97 /* sqrt_coeff2 */
+        .quad 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D /* sqrt_coeff1 */
+        /*== poly_coeff[12] ==*/
+        .align 64
+        .quad 0x3fa07520C70EB909, 0x3fa07520C70EB909, 0x3fa07520C70EB909, 0x3fa07520C70EB909, 0x3fa07520C70EB909, 0x3fa07520C70EB909, 0x3fa07520C70EB909, 0x3fa07520C70EB909 /* poly_coeff12 */
+        .quad 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED /* poly_coeff11 */
+        .quad 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE /* poly_coeff10 */
+        .quad 0x3f7A583395D45ED5, 0x3f7A583395D45ED5, 0x3f7A583395D45ED5, 0x3f7A583395D45ED5, 0x3f7A583395D45ED5, 0x3f7A583395D45ED5, 0x3f7A583395D45ED5, 0x3f7A583395D45ED5 /* poly_coeff9 */
+        .quad 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6 /* poly_coeff8 */
+        .quad 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57 /* poly_coeff7 */
+        .quad 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E /* poly_coeff6 */
+        .quad 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd /* poly_coeff5 */
+        .quad 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE /* poly_coeff4 */
+        .quad 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8 /* poly_coeff3 */
+        .quad 0x3fb333333337E0DE, 0x3fb333333337E0DE, 0x3fb333333337E0DE, 0x3fb333333337E0DE, 0x3fb333333337E0DE, 0x3fb333333337E0DE, 0x3fb333333337E0DE, 0x3fb333333337E0DE /* poly_coeff2 */
+        .quad 0x3fc555555555529C, 0x3fc555555555529C, 0x3fc555555555529C, 0x3fc555555555529C, 0x3fc555555555529C, 0x3fc555555555529C, 0x3fc555555555529C, 0x3fc555555555529C /* poly_coeff1 */
+        /*== Pi2H ==*/
+        .align 64
+        .quad 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18
+        .align 64
+        .type	__svml_dasin_data_internal,@object
+        .size	__svml_dasin_data_internal,.-__svml_dasin_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core-avx2.S
new file mode 100644
index 0000000000..e0582f27d4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized asinf.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN16v_asinf _ZGVeN16v_asinf_avx2_wrapper
+#include "../svml_s_asinf16_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core.c
new file mode 100644
index 0000000000..4435055566
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized asinf, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN16v_asinf
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN16v_asinf, __GI__ZGVeN16v_asinf,
+	       __redirect__ZGVeN16v_asinf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core_avx512.S
new file mode 100644
index 0000000000..7afdfd1317
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core_avx512.S
@@ -0,0 +1,260 @@
+/* Function asinf vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *      SelMask = (|x| >= 0.5) ? 1 : 0;
+ *      R = SelMask ? sqrt(0.5 - 0.5*|x|) : |x|
+ *      asin(x) = (SelMask ? (Pi/2 - 2*Poly(R)) : Poly(R))*(-1)^sign(x)
+ *
+ *
+ */
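+
+/* A minimal scalar C sketch of the selection above (illustrative only;
+   asinf_poly is a placeholder name for the minimax polynomial built from
+   the poly_coeff/sqrt_coeff tables below, approximated here with the libm
+   asinf so the sketch stays self-contained):
+
+       #include <math.h>
+
+       static float asinf_poly (float r) { return asinf (r); }
+
+       static float asinf_ref (float x)
+       {
+         float ax  = fabsf (x);
+         int   sel = ax >= 0.5f;                          // SelMask
+         float r   = sel ? sqrtf (0.5f - 0.5f * ax) : ax; // R
+         float p   = asinf_poly (r);                      // Poly(R)
+         float res = sel ? (float) M_PI_2 - 2.0f * p : p;
+         return copysignf (res, x);                       // (-1)^sign(x)
+       }
+ */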
+
+/* Offsets for data table __svml_sasin_data_internal
+ */
+#define AbsMask                       	0
+#define OneHalf                       	64
+#define SmallNorm                     	128
+#define One                           	192
+#define Two                           	256
+#define sqrt_coeff_1                  	320
+#define sqrt_coeff_2                  	384
+#define poly_coeff_1                  	448
+#define poly_coeff_2                  	512
+#define poly_coeff_3                  	576
+#define poly_coeff_4                  	640
+#define poly_coeff_5                  	704
+#define Pi2H                          	768
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN16v_asinf_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovups   __svml_sasin_data_internal(%rip), %zmm4
+        vmovups   OneHalf+__svml_sasin_data_internal(%rip), %zmm6
+
+/* SQ ~ -2*sqrt(Y) */
+        vmovups   SmallNorm+__svml_sasin_data_internal(%rip), %zmm8
+        vmovups   Two+__svml_sasin_data_internal(%rip), %zmm12
+        vmovups   sqrt_coeff_1+__svml_sasin_data_internal(%rip), %zmm13
+        vmovups   One+__svml_sasin_data_internal(%rip), %zmm7
+        vmovaps   %zmm0, %zmm3
+
+/* x = |arg| */
+        vandps    %zmm3, %zmm4, %zmm2
+        vandnps   %zmm3, %zmm4, %zmm1
+
+/* x^2 */
+        vmulps    {rn-sae}, %zmm2, %zmm2, %zmm5
+        vcmpps    $17, {sae}, %zmm2, %zmm7, %k0
+        vcmpps    $21, {sae}, %zmm6, %zmm2, %k2
+        vmovups   poly_coeff_2+__svml_sasin_data_internal(%rip), %zmm7
+        kmovw     %k0, %edx
+
+/* Y = 0.5 - 0.5*x */
+        vmovaps   %zmm6, %zmm9
+        vfnmadd231ps {rn-sae}, %zmm2, %zmm6, %zmm9
+        vmovups   poly_coeff_5+__svml_sasin_data_internal(%rip), %zmm6
+        vrsqrt14ps %zmm9, %zmm10
+        vcmpps    $17, {sae}, %zmm8, %zmm9, %k1
+        vminps    {sae}, %zmm9, %zmm5, %zmm0
+        vmovups   sqrt_coeff_2+__svml_sasin_data_internal(%rip), %zmm8
+        vmovups   poly_coeff_4+__svml_sasin_data_internal(%rip), %zmm5
+        vxorps    %zmm10, %zmm10, %zmm10{%k1}
+        vaddps    {rn-sae}, %zmm9, %zmm9, %zmm14
+        vmulps    {rn-sae}, %zmm10, %zmm10, %zmm11
+        vmulps    {rn-sae}, %zmm10, %zmm14, %zmm4
+        vfmsub213ps {rn-sae}, %zmm12, %zmm11, %zmm14
+        vmulps    {rn-sae}, %zmm14, %zmm4, %zmm15
+        vfmadd231ps {rn-sae}, %zmm14, %zmm13, %zmm8
+        vmovups   poly_coeff_3+__svml_sasin_data_internal(%rip), %zmm14
+
+/* polynomial */
+        vmovups   poly_coeff_1+__svml_sasin_data_internal(%rip), %zmm13
+        vfmsub213ps {rn-sae}, %zmm4, %zmm15, %zmm8
+        vfmadd231ps {rn-sae}, %zmm0, %zmm14, %zmm5
+        vfmadd231ps {rn-sae}, %zmm0, %zmm13, %zmm7
+        vmulps    {rn-sae}, %zmm0, %zmm0, %zmm15
+        vblendmps %zmm8, %zmm2, %zmm2{%k2}
+        vfmadd213ps {rn-sae}, %zmm5, %zmm15, %zmm7
+        vfmadd213ps {rn-sae}, %zmm6, %zmm0, %zmm7
+        vmulps    {rn-sae}, %zmm0, %zmm7, %zmm9
+        vmovups   Pi2H+__svml_sasin_data_internal(%rip), %zmm0
+        vfmadd213ps {rn-sae}, %zmm2, %zmm2, %zmm9
+        vaddps    {rn-sae}, %zmm0, %zmm9, %zmm9{%k2}
+        vxorps    %zmm1, %zmm9, %zmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm3
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm3, 64(%rsp)
+        vmovups   %zmm0, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $16, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     64(%rsp,%r14,4), %xmm0
+        call      asinf@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 128(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN16v_asinf_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_sasin_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 AbsMask[16][1];
+        __declspec(align(64)) VUINT32 OneHalf[16][1];
+        __declspec(align(64)) VUINT32 SmallNorm[16][1];
+        __declspec(align(64)) VUINT32 One[16][1];
+        __declspec(align(64)) VUINT32 Two[16][1];
+        __declspec(align(64)) VUINT32 sqrt_coeff[2][16][1];
+        __declspec(align(64)) VUINT32 poly_coeff[5][16][1];
+        __declspec(align(64)) VUINT32 Pi2H[16][1];
+} __svml_sasin_data_internal;
+#endif
+__svml_sasin_data_internal:
+        /*== AbsMask ==*/
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
+        /*== OneHalf ==*/
+        .align 64
+        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000
+        /*== SmallNorm ==*/
+        .align 64
+        .long 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000
+        /*== One ==*/
+        .align 64
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== Two ==*/
+        .align 64
+        .long 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000
+        /*== sqrt_coeff[2] ==*/
+        .align 64
+        .long 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004 /* sqrt_coeff2 */
+        .long 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001 /* sqrt_coeff1 */
+        /*== poly_coeff[5] ==*/
+        .align 64
+        .long 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07 /* poly_coeff5 */
+        .long 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B /* poly_coeff4 */
+        .long 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4 /* poly_coeff3 */
+        .long 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12 /* poly_coeff2 */
+        .long 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF /* poly_coeff1 */
+        /*== Pi2H ==*/
+        .align 64
+        .long 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB
+        .align 64
+        .type	__svml_sasin_data_internal,@object
+        .size	__svml_sasin_data_internal,.-__svml_sasin_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core-sse2.S
new file mode 100644
index 0000000000..b958db7795
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized asinf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN4v_asinf _ZGVbN4v_asinf_sse2
+#include "../svml_s_asinf4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core.c
new file mode 100644
index 0000000000..5a7aa94264
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized asinf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN4v_asinf
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN4v_asinf, __GI__ZGVbN4v_asinf,
+	       __redirect__ZGVbN4v_asinf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core_sse4.S
new file mode 100644
index 0000000000..ddcceeb7b9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core_sse4.S
@@ -0,0 +1,252 @@
+/* Function asinf vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *      SelMask = (|x| >= 0.5) ? 1 : 0;
+ *      R = SelMask ? sqrt(0.5 - 0.5*|x|) : |x|
+ *      asin(x) = (SelMask ? (Pi/2 - 2*Poly(R)) : Poly(R))*(-1)^sign(x)
+ *
+ *
+ */
+
+/* Offsets for data table __svml_sasin_data_internal
+ */
+#define AbsMask                       	0
+#define OneHalf                       	16
+#define SmallNorm                     	32
+#define One                           	48
+#define Two                           	64
+#define sqrt_coeff                    	80
+#define poly_coeff                    	112
+#define Pi2H                          	192
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN4v_asinf_sse4)
+        subq      $72, %rsp
+        cfi_def_cfa_offset(80)
+        movaps    %xmm0, %xmm2
+        movups    __svml_sasin_data_internal(%rip), %xmm1
+        movups    OneHalf+__svml_sasin_data_internal(%rip), %xmm5
+
+/* x = |arg| */
+        movaps    %xmm1, %xmm0
+        andps     %xmm2, %xmm0
+
+/* Y = 0.5 - 0.5*x */
+        movaps    %xmm5, %xmm3
+        mulps     %xmm0, %xmm3
+        movaps    %xmm5, %xmm8
+
+/* x^2 */
+        movaps    %xmm0, %xmm14
+        movaps    %xmm0, %xmm15
+        mulps     %xmm0, %xmm14
+        subps     %xmm3, %xmm8
+        cmpnltps  %xmm5, %xmm15
+
+/* SQ ~ -2*sqrt(Y) */
+        rsqrtps   %xmm8, %xmm6
+        minps     %xmm8, %xmm14
+        movaps    %xmm8, %xmm9
+        movaps    %xmm14, %xmm10
+        cmpltps   SmallNorm+__svml_sasin_data_internal(%rip), %xmm9
+        mulps     %xmm14, %xmm10
+        addps     %xmm8, %xmm8
+        andnps    %xmm6, %xmm9
+        movaps    %xmm15, %xmm3
+        movaps    %xmm9, %xmm7
+        andnps    %xmm0, %xmm3
+        mulps     %xmm9, %xmm7
+        andnps    %xmm2, %xmm1
+        mulps     %xmm8, %xmm9
+        mulps     %xmm7, %xmm8
+
+/* polynomial */
+        movups    poly_coeff+__svml_sasin_data_internal(%rip), %xmm11
+        mulps     %xmm14, %xmm11
+        subps     Two+__svml_sasin_data_internal(%rip), %xmm8
+        movups    poly_coeff+32+__svml_sasin_data_internal(%rip), %xmm12
+        mulps     %xmm14, %xmm12
+        addps     poly_coeff+16+__svml_sasin_data_internal(%rip), %xmm11
+        mulps     %xmm10, %xmm11
+        addps     poly_coeff+48+__svml_sasin_data_internal(%rip), %xmm12
+        movups    sqrt_coeff+__svml_sasin_data_internal(%rip), %xmm13
+        addps     %xmm11, %xmm12
+        mulps     %xmm8, %xmm13
+        mulps     %xmm9, %xmm8
+        mulps     %xmm14, %xmm12
+        addps     sqrt_coeff+16+__svml_sasin_data_internal(%rip), %xmm13
+        addps     poly_coeff+64+__svml_sasin_data_internal(%rip), %xmm12
+        mulps     %xmm8, %xmm13
+        mulps     %xmm12, %xmm14
+        subps     %xmm9, %xmm13
+        andps     %xmm15, %xmm13
+        orps      %xmm13, %xmm3
+        mulps     %xmm3, %xmm14
+        movups    One+__svml_sasin_data_internal(%rip), %xmm4
+        addps     %xmm14, %xmm3
+        cmpltps   %xmm0, %xmm4
+        movups    Pi2H+__svml_sasin_data_internal(%rip), %xmm0
+        andps     %xmm15, %xmm0
+        movmskps  %xmm4, %edx
+        addps     %xmm3, %xmm0
+        pxor      %xmm1, %xmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm2
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        addq      $72, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(80)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm2, 32(%rsp)
+        movups    %xmm0, 48(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 edx
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -64)
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -72)
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -64)
+        cfi_offset(13, -72)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      asinf@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 48(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN4v_asinf_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_sasin_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(16)) VUINT32 AbsMask[4][1];
+        __declspec(align(16)) VUINT32 OneHalf[4][1];
+        __declspec(align(16)) VUINT32 SmallNorm[4][1];
+        __declspec(align(16)) VUINT32 One[4][1];
+        __declspec(align(16)) VUINT32 Two[4][1];
+        __declspec(align(16)) VUINT32 sqrt_coeff[2][4][1];
+        __declspec(align(16)) VUINT32 poly_coeff[5][4][1];
+        __declspec(align(16)) VUINT32 Pi2H[4][1];
+} __svml_sasin_data_internal;
+#endif
+__svml_sasin_data_internal:
+        /*== AbsMask ==*/
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
+        /*== OneHalf ==*/
+        .align 16
+        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000
+        /*== SmallNorm ==*/
+        .align 16
+        .long 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000
+        /*== One ==*/
+        .align 16
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== Two ==*/
+        .align 16
+        .long 0x40000000, 0x40000000, 0x40000000, 0x40000000
+        /*== sqrt_coeff[2] ==*/
+        .align 16
+        .long 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004 /* sqrt_coeff2 */
+        .long 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001 /* sqrt_coeff1 */
+        /*== poly_coeff[5] ==*/
+        .align 16
+        .long 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07 /* poly_coeff5 */
+        .long 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B /* poly_coeff4 */
+        .long 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4 /* poly_coeff3 */
+        .long 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12 /* poly_coeff2 */
+        .long 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF /* poly_coeff1 */
+        /*== Pi2H ==*/
+        .align 16
+        .long 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB
+        .align 16
+        .type	__svml_sasin_data_internal,@object
+        .size	__svml_sasin_data_internal,.-__svml_sasin_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core-sse.S
new file mode 100644
index 0000000000..6273c919d6
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized asinf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN8v_asinf _ZGVdN8v_asinf_sse_wrapper
+#include "../svml_s_asinf8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core.c
new file mode 100644
index 0000000000..946b25b43f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized asinf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN8v_asinf
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN8v_asinf, __GI__ZGVdN8v_asinf,
+	       __redirect__ZGVdN8v_asinf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core_avx2.S
new file mode 100644
index 0000000000..89c156dbbb
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core_avx2.S
@@ -0,0 +1,249 @@
+/* Function asinf vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *      SelMask = (|x| >= 0.5) ? 1 : 0;
+ *      R = SelMask ? sqrt(0.5 - 0.5*|x|) : |x|
+ *      asin(x) = (SelMask ? (Pi/2 - 2*Poly(R)) : Poly(R))*(-1)^sign(x)
+ *
+ *
+ */
+
+/* Offsets for data table __svml_sasin_data_internal
+ */
+#define AbsMask                       	0
+#define OneHalf                       	32
+#define SmallNorm                     	64
+#define One                           	96
+#define Two                           	128
+#define sqrt_coeff                    	160
+#define poly_coeff                    	224
+#define Pi2H                          	384
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN8v_asinf_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+        vmovups   __svml_sasin_data_internal(%rip), %ymm5
+        vmovups   OneHalf+__svml_sasin_data_internal(%rip), %ymm9
+        vmovups   One+__svml_sasin_data_internal(%rip), %ymm6
+        vmovaps   %ymm0, %ymm4
+
+/* x = |arg| */
+        vandps    %ymm4, %ymm5, %ymm3
+
+/* Y = 0.5 - 0.5*x */
+        vmovaps   %ymm9, %ymm12
+        vfnmadd231ps %ymm3, %ymm9, %ymm12
+
+/* x^2 */
+        vmulps    %ymm3, %ymm3, %ymm7
+        vcmplt_oqps %ymm3, %ymm6, %ymm8
+
+/* SQ ~ -2*sqrt(Y) */
+        vcmplt_oqps SmallNorm+__svml_sasin_data_internal(%rip), %ymm12, %ymm10
+        vminps    %ymm12, %ymm7, %ymm1
+        vaddps    %ymm12, %ymm12, %ymm15
+        vcmpnlt_uqps %ymm9, %ymm3, %ymm0
+        vrsqrtps  %ymm12, %ymm11
+        vmovups   poly_coeff+64+__svml_sasin_data_internal(%rip), %ymm7
+        vmulps    %ymm1, %ymm1, %ymm6
+        vmovups   sqrt_coeff+__svml_sasin_data_internal(%rip), %ymm9
+        vfmadd213ps poly_coeff+96+__svml_sasin_data_internal(%rip), %ymm1, %ymm7
+        vmovmskps %ymm8, %edx
+
+/* polynomial */
+        vmovups   poly_coeff+__svml_sasin_data_internal(%rip), %ymm8
+        vandnps   %ymm11, %ymm10, %ymm13
+        vmulps    %ymm13, %ymm13, %ymm14
+        vfmadd213ps poly_coeff+32+__svml_sasin_data_internal(%rip), %ymm1, %ymm8
+        vandnps   %ymm4, %ymm5, %ymm2
+        vmulps    %ymm15, %ymm13, %ymm5
+        vfmsub213ps Two+__svml_sasin_data_internal(%rip), %ymm14, %ymm15
+        vfmadd213ps %ymm7, %ymm6, %ymm8
+        vfmadd213ps sqrt_coeff+32+__svml_sasin_data_internal(%rip), %ymm15, %ymm9
+        vmulps    %ymm15, %ymm5, %ymm15
+        vfmadd213ps poly_coeff+128+__svml_sasin_data_internal(%rip), %ymm1, %ymm8
+        vfmsub213ps %ymm5, %ymm15, %ymm9
+        vmulps    %ymm8, %ymm1, %ymm1
+        vblendvps %ymm0, %ymm9, %ymm3, %ymm3
+        vfmadd213ps %ymm3, %ymm3, %ymm1
+        vandps    Pi2H+__svml_sasin_data_internal(%rip), %ymm0, %ymm0
+        vaddps    %ymm1, %ymm0, %ymm10
+        vxorps    %ymm2, %ymm10, %ymm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm4
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %ymm4, 32(%rsp)
+        vmovups   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      asinf@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 64(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN8v_asinf_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_sasin_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(32)) VUINT32 AbsMask[8][1];
+        __declspec(align(32)) VUINT32 OneHalf[8][1];
+        __declspec(align(32)) VUINT32 SmallNorm[8][1];
+        __declspec(align(32)) VUINT32 One[8][1];
+        __declspec(align(32)) VUINT32 Two[8][1];
+        __declspec(align(32)) VUINT32 sqrt_coeff[2][8][1];
+        __declspec(align(32)) VUINT32 poly_coeff[5][8][1];
+        __declspec(align(32)) VUINT32 Pi2H[8][1];
+} __svml_sasin_data_internal;
+#endif
+__svml_sasin_data_internal:
+        /*== AbsMask ==*/
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
+        /*== OneHalf ==*/
+        .align 32
+        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000
+        /*== SmallNorm ==*/
+        .align 32
+        .long 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000
+        /*== One ==*/
+        .align 32
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== Two ==*/
+        .align 32
+        .long 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000
+        /*== sqrt_coeff[2] ==*/
+        .align 32
+        .long 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004 /* sqrt_coeff2 */
+        .long 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001 /* sqrt_coeff1 */
+        /*== poly_coeff[5] ==*/
+        .align 32
+        .long 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07 /* poly_coeff5 */
+        .long 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B /* poly_coeff4 */
+        .long 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4 /* poly_coeff3 */
+        .long 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12 /* poly_coeff2 */
+        .long 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF /* poly_coeff1 */
+        /*== Pi2H ==*/
+        .align 32
+        .long 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB
+        .align 32
+        .type	__svml_sasin_data_internal,@object
+        .size	__svml_sasin_data_internal,.-__svml_sasin_data_internal
diff --git a/sysdeps/x86_64/fpu/svml_d_asin2_core.S b/sysdeps/x86_64/fpu/svml_d_asin2_core.S
new file mode 100644
index 0000000000..8ff8bc58df
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_asin2_core.S
@@ -0,0 +1,29 @@
+/* Function asin vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN2v_asin)
+WRAPPER_IMPL_SSE2 asin
+END (_ZGVbN2v_asin)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN2v_asin)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_asin4_core.S b/sysdeps/x86_64/fpu/svml_d_asin4_core.S
new file mode 100644
index 0000000000..dbe33952bc
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_asin4_core.S
@@ -0,0 +1,29 @@
+/* Function asin vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN4v_asin)
+WRAPPER_IMPL_AVX _ZGVbN2v_asin
+END (_ZGVdN4v_asin)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN4v_asin)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_asin4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_asin4_core_avx.S
new file mode 100644
index 0000000000..513a31bde5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_asin4_core_avx.S
@@ -0,0 +1,25 @@
+/* Function asin vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN4v_asin)
+WRAPPER_IMPL_AVX _ZGVbN2v_asin
+END (_ZGVcN4v_asin)
diff --git a/sysdeps/x86_64/fpu/svml_d_asin8_core.S b/sysdeps/x86_64/fpu/svml_d_asin8_core.S
new file mode 100644
index 0000000000..06694298cf
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_asin8_core.S
@@ -0,0 +1,25 @@
+/* Function asin vectorized with AVX-512, wrapper to AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN8v_asin)
+WRAPPER_IMPL_AVX512 _ZGVdN4v_asin
+END (_ZGVeN8v_asin)
diff --git a/sysdeps/x86_64/fpu/svml_s_asinf16_core.S b/sysdeps/x86_64/fpu/svml_s_asinf16_core.S
new file mode 100644
index 0000000000..015d583e3f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_asinf16_core.S
@@ -0,0 +1,25 @@
+/* Function asinf vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN16v_asinf)
+WRAPPER_IMPL_AVX512 _ZGVdN8v_asinf
+END (_ZGVeN16v_asinf)
diff --git a/sysdeps/x86_64/fpu/svml_s_asinf4_core.S b/sysdeps/x86_64/fpu/svml_s_asinf4_core.S
new file mode 100644
index 0000000000..d80f06c16d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_asinf4_core.S
@@ -0,0 +1,29 @@
+/* Function asinf vectorized with SSE2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN4v_asinf)
+WRAPPER_IMPL_SSE2 asinf
+END (_ZGVbN4v_asinf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4v_asinf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_asinf8_core.S b/sysdeps/x86_64/fpu/svml_s_asinf8_core.S
new file mode 100644
index 0000000000..304ad0a7f5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_asinf8_core.S
@@ -0,0 +1,29 @@
+/* Function asinf vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN8v_asinf)
+WRAPPER_IMPL_AVX _ZGVbN4v_asinf
+END (_ZGVdN8v_asinf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8v_asinf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_asinf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_asinf8_core_avx.S
new file mode 100644
index 0000000000..a2f7dc112e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_asinf8_core_avx.S
@@ -0,0 +1,25 @@
+/* Function asinf vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+        .text
+ENTRY (_ZGVcN8v_asinf)
+WRAPPER_IMPL_AVX _ZGVbN4v_asinf
+END (_ZGVcN8v_asinf)
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-asin-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-asin-avx.c
new file mode 100644
index 0000000000..e37cfdce58
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-asin-avx.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-asin.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-asin-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-asin-avx2.c
new file mode 100644
index 0000000000..e37cfdce58
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-asin-avx2.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-asin.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-asin-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-asin-avx512f.c
new file mode 100644
index 0000000000..e37cfdce58
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-asin-avx512f.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-asin.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-asin.c b/sysdeps/x86_64/fpu/test-double-libmvec-asin.c
new file mode 100644
index 0000000000..d2e16e67f4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-asin.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE double
+#define LIBMVEC_FUNC asin
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index 467c913990..5746bb5be3 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -29,6 +29,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVbN2v_exp)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVbN2vv_pow)
 VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVbN2v_acos)
 VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVbN2v_atan)
+VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVbN2v_asin)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index b72a7de84e..8d3d5493ed 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -32,6 +32,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVdN4v_exp)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVdN4vv_pow)
 VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVdN4v_acos)
 VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVdN4v_atan)
+VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVdN4v_asin)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m256i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index d2434df21e..f43328f2ff 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -29,6 +29,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVcN4v_exp)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVcN4vv_pow)
 VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVcN4v_acos)
 VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVcN4v_atan)
+VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVcN4v_asin)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index f7aaf8159e..8b566c199a 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -29,6 +29,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVeN8v_exp)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVeN8vv_pow)
 VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVeN8v_acos)
 VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVeN8v_atan)
+VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVeN8v_asin)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx.c
new file mode 100644
index 0000000000..6aa8f5f370
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-asinf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx2.c
new file mode 100644
index 0000000000..6aa8f5f370
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx2.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-asinf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx512f.c
new file mode 100644
index 0000000000..6aa8f5f370
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx512f.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-asinf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-asinf.c b/sysdeps/x86_64/fpu/test-float-libmvec-asinf.c
new file mode 100644
index 0000000000..2bbe2395a0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-asinf.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE float
+#define LIBMVEC_FUNC asinf
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index af769c56fa..3d3218a310 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -29,6 +29,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVeN16v_expf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVeN16vv_powf)
 VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVeN16v_acosf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVeN16v_atanf)
+VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVeN16v_asinf)
 
 #define VEC_INT_TYPE __m512i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index 76e61d2f1e..7d75b9f60f 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -29,6 +29,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVbN4v_expf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVbN4vv_powf)
 VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVbN4v_acosf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVbN4v_atanf)
+VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVbN4v_asinf)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index 5e27eaaf29..405dde49bc 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -32,6 +32,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVdN8v_expf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVdN8vv_powf)
 VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVdN8v_acosf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVdN8v_atanf)
+VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVdN8v_asinf)
 
 /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
 #undef VECTOR_WRAPPER_fFF
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index 28daf79aa9..7558443f2e 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -29,6 +29,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVcN8v_expf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVcN8vv_powf)
 VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVcN8v_acosf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVcN8v_atanf)
+VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVcN8v_asinf)
 
 #define VEC_INT_TYPE __m128i
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 03/18] x86-64: Add vector hypot/hypotf implementation to libmvec
  2021-12-29  6:39 [PATCH v5 00/18] x86-64: Add vector math functions to libmvec Sunil K Pandey
  2021-12-29  6:39 ` [PATCH v5 01/18] x86-64: Add vector atan/atanf implementation " Sunil K Pandey
  2021-12-29  6:39 ` [PATCH v5 02/18] x86-64: Add vector asin/asinf " Sunil K Pandey
@ 2021-12-29  6:39 ` Sunil K Pandey
  2021-12-29 21:24   ` H.J. Lu
  2021-12-29  6:39 ` [PATCH v5 04/18] x86-64: Add vector exp2/exp2f " Sunil K Pandey
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Sunil K Pandey @ 2021-12-29  6:39 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, andrey.kolesov, marius.cornea

Implement vectorized hypot/hypotf with SSE, AVX, AVX2 and AVX512
versions for libmvec, as per the vector ABI.  Also add accuracy and
ABI tests for vector hypot/hypotf, with regenerated ulps.
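
With the __DECL_SIMD_hypot/__DECL_SIMD_hypotf declarations added below, a
compiler that understands glibc's simd attribute can auto-vectorize plain
hypot calls into these kernels.  A minimal sketch, assuming something like
GCC with -O3 -ffast-math and a suitable -march= (exact options are
compiler-dependent, and the function name here is only illustrative):

  #include <math.h>

  /* The hypot call in this loop may be compiled into _ZGVbN2vv_hypot,
     _ZGVdN4vv_hypot or _ZGVeN8vv_hypot calls, depending on the target
     ISA.  */
  void
  distances (double *restrict r, const double *x, const double *y, int n)
  {
    for (int i = 0; i < n; i++)
      r[i] = hypot (x[i], y[i]);
  }
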
---
 bits/libm-simd-decl-stubs.h                   |  11 +
 math/bits/mathcalls.h                         |   2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
 sysdeps/x86/fpu/bits/math-vector.h            |   4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
 sysdeps/x86_64/fpu/Makeconfig                 |   1 +
 sysdeps/x86_64/fpu/Versions                   |   2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |  20 ++
 .../fpu/multiarch/svml_d_hypot2_core-sse2.S   |  20 ++
 .../x86_64/fpu/multiarch/svml_d_hypot2_core.c |  28 ++
 .../fpu/multiarch/svml_d_hypot2_core_sse4.S   | 279 +++++++++++++++++
 .../fpu/multiarch/svml_d_hypot4_core-sse.S    |  20 ++
 .../x86_64/fpu/multiarch/svml_d_hypot4_core.c |  28 ++
 .../fpu/multiarch/svml_d_hypot4_core_avx2.S   | 289 ++++++++++++++++++
 .../fpu/multiarch/svml_d_hypot8_core-avx2.S   |  20 ++
 .../x86_64/fpu/multiarch/svml_d_hypot8_core.c |  28 ++
 .../fpu/multiarch/svml_d_hypot8_core_avx512.S | 235 ++++++++++++++
 .../fpu/multiarch/svml_s_hypotf16_core-avx2.S |  20 ++
 .../fpu/multiarch/svml_s_hypotf16_core.c      |  28 ++
 .../multiarch/svml_s_hypotf16_core_avx512.S   | 239 +++++++++++++++
 .../fpu/multiarch/svml_s_hypotf4_core-sse2.S  |  20 ++
 .../fpu/multiarch/svml_s_hypotf4_core.c       |  28 ++
 .../fpu/multiarch/svml_s_hypotf4_core_sse4.S  | 265 ++++++++++++++++
 .../fpu/multiarch/svml_s_hypotf8_core-sse.S   |  20 ++
 .../fpu/multiarch/svml_s_hypotf8_core.c       |  28 ++
 .../fpu/multiarch/svml_s_hypotf8_core_avx2.S  | 269 ++++++++++++++++
 sysdeps/x86_64/fpu/svml_d_hypot2_core.S       |  29 ++
 sysdeps/x86_64/fpu/svml_d_hypot4_core.S       |  29 ++
 sysdeps/x86_64/fpu/svml_d_hypot4_core_avx.S   |  25 ++
 sysdeps/x86_64/fpu/svml_d_hypot8_core.S       |  25 ++
 sysdeps/x86_64/fpu/svml_s_hypotf16_core.S     |  25 ++
 sysdeps/x86_64/fpu/svml_s_hypotf4_core.S      |  29 ++
 sysdeps/x86_64/fpu/svml_s_hypotf8_core.S      |  29 ++
 sysdeps/x86_64/fpu/svml_s_hypotf8_core_avx.S  |  25 ++
 .../fpu/test-double-libmvec-hypot-avx.c       |   1 +
 .../fpu/test-double-libmvec-hypot-avx2.c      |   1 +
 .../fpu/test-double-libmvec-hypot-avx512f.c   |   1 +
 .../x86_64/fpu/test-double-libmvec-hypot.c    |   3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
 .../fpu/test-float-libmvec-hypotf-avx.c       |   1 +
 .../fpu/test-float-libmvec-hypotf-avx2.c      |   1 +
 .../fpu/test-float-libmvec-hypotf-avx512f.c   |   1 +
 .../x86_64/fpu/test-float-libmvec-hypotf.c    |   3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
 50 files changed, 2151 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_hypot2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_hypot4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_hypot4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_hypot8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_hypotf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_hypotf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_hypotf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_hypotf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-hypot.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-hypotf.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
index ae8ee882d0..adf65f6bc2 100644
--- a/bits/libm-simd-decl-stubs.h
+++ b/bits/libm-simd-decl-stubs.h
@@ -131,4 +131,15 @@
 #define __DECL_SIMD_asinf32x
 #define __DECL_SIMD_asinf64x
 #define __DECL_SIMD_asinf128x
+
+#define __DECL_SIMD_hypot
+#define __DECL_SIMD_hypotf
+#define __DECL_SIMD_hypotl
+#define __DECL_SIMD_hypotf16
+#define __DECL_SIMD_hypotf32
+#define __DECL_SIMD_hypotf64
+#define __DECL_SIMD_hypotf128
+#define __DECL_SIMD_hypotf32x
+#define __DECL_SIMD_hypotf64x
+#define __DECL_SIMD_hypotf128x
 #endif
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index bb53b7021e..2ed820a0dc 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -144,7 +144,7 @@ __MATHCALL (sqrt,, (_Mdouble_ __x));
 
 #if defined __USE_XOPEN || defined __USE_ISOC99
 /* Return `sqrt(X*X + Y*Y)'.  */
-__MATHCALL (hypot,, (_Mdouble_ __x, _Mdouble_ __y));
+__MATHCALL_VEC (hypot,, (_Mdouble_ __x, _Mdouble_ __y));
 #endif
 
 #if defined __USE_XOPEN_EXTENDED || defined __USE_ISOC99
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index ab03a07f92..12bb03245b 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -49,24 +49,32 @@ GLIBC_2.22 _ZGVeN8vvv_sincos F
 GLIBC_2.35 _ZGVbN2v_acos F
 GLIBC_2.35 _ZGVbN2v_asin F
 GLIBC_2.35 _ZGVbN2v_atan F
+GLIBC_2.35 _ZGVbN2vv_hypot F
 GLIBC_2.35 _ZGVbN4v_acosf F
 GLIBC_2.35 _ZGVbN4v_asinf F
 GLIBC_2.35 _ZGVbN4v_atanf F
+GLIBC_2.35 _ZGVbN4vv_hypotf F
 GLIBC_2.35 _ZGVcN4v_acos F
 GLIBC_2.35 _ZGVcN4v_asin F
 GLIBC_2.35 _ZGVcN4v_atan F
+GLIBC_2.35 _ZGVcN4vv_hypot F
 GLIBC_2.35 _ZGVcN8v_acosf F
 GLIBC_2.35 _ZGVcN8v_asinf F
 GLIBC_2.35 _ZGVcN8v_atanf F
+GLIBC_2.35 _ZGVcN8vv_hypotf F
 GLIBC_2.35 _ZGVdN4v_acos F
 GLIBC_2.35 _ZGVdN4v_asin F
 GLIBC_2.35 _ZGVdN4v_atan F
+GLIBC_2.35 _ZGVdN4vv_hypot F
 GLIBC_2.35 _ZGVdN8v_acosf F
 GLIBC_2.35 _ZGVdN8v_asinf F
 GLIBC_2.35 _ZGVdN8v_atanf F
+GLIBC_2.35 _ZGVdN8vv_hypotf F
 GLIBC_2.35 _ZGVeN16v_acosf F
 GLIBC_2.35 _ZGVeN16v_asinf F
 GLIBC_2.35 _ZGVeN16v_atanf F
+GLIBC_2.35 _ZGVeN16vv_hypotf F
 GLIBC_2.35 _ZGVeN8v_acos F
 GLIBC_2.35 _ZGVeN8v_asin F
 GLIBC_2.35 _ZGVeN8v_atan F
+GLIBC_2.35 _ZGVeN8vv_hypot F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index 73cb8849ff..437977c5fd 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -70,6 +70,10 @@
 #  define __DECL_SIMD_asin __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_asinf
 #  define __DECL_SIMD_asinf __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_hypot
+#  define __DECL_SIMD_hypot __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_hypotf
+#  define __DECL_SIMD_hypotf __DECL_SIMD_x86_64
 
 # endif
 #endif
diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
index 4552c2bdfa..cda31479a6 100644
--- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
@@ -34,6 +34,8 @@
 !GCC$ builtin (atanf) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (asin) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (asinf) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (hypot) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (hypotf) attributes simd (notinbranch) if('x86_64')
 
 !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
@@ -53,3 +55,5 @@
 !GCC$ builtin (atanf) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (asin) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (asinf) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (hypot) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (hypotf) attributes simd (notinbranch) if('x32')
diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
index e0eae0b196..7769a02731 100644
--- a/sysdeps/x86_64/fpu/Makeconfig
+++ b/sysdeps/x86_64/fpu/Makeconfig
@@ -27,6 +27,7 @@ libmvec-funcs = \
   atan \
   cos \
   exp \
+  hypot \
   log \
   pow \
   sin \
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index 10baf869a5..e359e5dc2c 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -17,8 +17,10 @@ libmvec {
     _ZGVbN2v_acos; _ZGVcN4v_acos; _ZGVdN4v_acos; _ZGVeN8v_acos;
     _ZGVbN2v_asin; _ZGVcN4v_asin; _ZGVdN4v_asin; _ZGVeN8v_asin;
     _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan;
+    _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
     _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
     _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
     _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf;
+    _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf;
   }
 }
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index ea0f833381..a7513ec94e 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -1375,6 +1375,26 @@ double: 1
 float128: 1
 ldouble: 1
 
+Function: "hypot_vlen16":
+float: 1
+
+Function: "hypot_vlen2":
+double: 1
+
+Function: "hypot_vlen4":
+double: 1
+float: 1
+
+Function: "hypot_vlen4_avx2":
+double: 1
+
+Function: "hypot_vlen8":
+double: 1
+float: 1
+
+Function: "hypot_vlen8_avx2":
+float: 1
+
 Function: "j0":
 double: 3
 float: 9
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core-sse2.S
new file mode 100644
index 0000000000..237e38459e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized hypot.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN2vv_hypot _ZGVbN2vv_hypot_sse2
+#include "../svml_d_hypot2_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core.c
new file mode 100644
index 0000000000..3f0865f05d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized hypot, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN2vv_hypot
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN2vv_hypot, __GI__ZGVbN2vv_hypot,
+	       __redirect__ZGVbN2vv_hypot)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core_sse4.S
new file mode 100644
index 0000000000..931f34e5f2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core_sse4.S
@@ -0,0 +1,279 @@
+/* Function hypot vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *      HIGH LEVEL OVERVIEW
+ *
+ *      Calculate z = (x*x+y*y)
+ *      Calculate reciprocal sqrt (z)
+ *      Calculate error = z*(rsqrt(z)*rsqrt(z)) - 1
+ *      Calculate fixing part p with polynomial
+ *      Fix answer with sqrt(z) = z * rsqrt(z) + error * p * z
+ *
+ *      ALGORITHM DETAILS
+ *
+ *    Multiprecision branch for _HA_ only
+ *      Remove sign from both arguments
+ *      Find maximum (_x) and minimum (_y) (by abs value) between arguments
+ *      Split _x into _a and _b for multiprecision
+ *      If _x >> _y we will not split _y for multiprecision
+ *      all _y will be put into lower part (_d) and higher part (_c = 0)
+ *      Fixing _hilo_mask for the case _x >> _y
+ *      Split _y into _c and _d for multiprecision with fixed mask
+ *
+ *      compute Hi and Lo parts of _z = _x*_x + _y*_y
+ *
+ *      _zHi = _a*_a + _c*_c
+ *      _zLo = (_x + _a)*_b + _d*_y + _d*_c
+ *      _z = _zHi + _zLo
+ *
+ *    No multiprecision branch for _LA_ and _EP_
+ *      _z = _VARG1 * _VARG1 + _VARG2 * _VARG2
+ *
+ *    Check _z exponent to be within borders [3BC ; 441] else goto Callout
+ *
+ *    _s  ~ 1.0/sqrt(_z)
+ *    _s2 ~ 1.0/(sqrt(_z)*sqrt(_z)) ~ 1.0/_z = (1.0/_z + O)
+ *    _e[rror]  =  (1.0/_z + O) * _z - 1.0
+ *    calculate fixing part _p
+ *    _p = (((_POLY_C5*_e + _POLY_C4)*_e +_POLY_C3)*_e +_POLY_C2)*_e + _POLY_C1
+ *    some parts of the polynomial are skipped for lower accuracy flavors
+ *
+ *    result = _z * (1.0/sqrt(_z) + O) + _p * _e[rror] * _z
+ *
+ *
+ */
+
+/* Offsets for data table __svml_dhypot_data_internal
+ */
+#define _dHiLoMask                    	0
+#define _dAbsMask                     	16
+#define _dOne                         	32
+#define _POLY_C5                      	48
+#define _POLY_C4                      	64
+#define _POLY_C3                      	80
+#define _POLY_C2                      	96
+#define _POLY_C1                      	112
+#define _LowBoundary                  	128
+#define _HighBoundary                 	144
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN2vv_hypot_sse4)
+        subq      $88, %rsp
+        cfi_def_cfa_offset(96)
+
+/*
+ *  Defines
+ *  Implementation
+ * Multiprecision branch for _HA_ only
+ * _z = _VARG1 * _VARG1 + _VARG2 * _VARG2
+ */
+        movaps    %xmm0, %xmm10
+        movaps    %xmm1, %xmm2
+        mulpd     %xmm0, %xmm10
+        mulpd     %xmm1, %xmm2
+        addpd     %xmm2, %xmm10
+
+/*
+ * _s  ~ 1.0/sqrt(_z)
+ * _s2 ~ 1.0/(sqrt(_z)*sqrt(_z)) ~ 1.0/_z
+ */
+        cvtpd2ps  %xmm10, %xmm7
+        movlhps   %xmm7, %xmm7
+        rsqrtps   %xmm7, %xmm8
+        cvtps2pd  %xmm8, %xmm11
+        movaps    %xmm11, %xmm2
+        mulpd     %xmm11, %xmm2
+
+/* _e[rror]  ~  (1.0/_z + O) * _z - 1.0 */
+        mulpd     %xmm10, %xmm2
+        subpd     _dOne+__svml_dhypot_data_internal(%rip), %xmm2
+
+/*
+ * calculate fixing part _p
+ * _p = (((_POLY_C5*_e + _POLY_C4)*_e +_POLY_C3)*_e +_POLY_C2)*_e + _POLY_C1
+ * some parts of the polynomial are skipped for lower accuracy flavors
+ */
+        movups    _POLY_C4+__svml_dhypot_data_internal(%rip), %xmm9
+        mulpd     %xmm2, %xmm9
+        addpd     _POLY_C3+__svml_dhypot_data_internal(%rip), %xmm9
+        mulpd     %xmm2, %xmm9
+        addpd     _POLY_C2+__svml_dhypot_data_internal(%rip), %xmm9
+        mulpd     %xmm2, %xmm9
+        addpd     _POLY_C1+__svml_dhypot_data_internal(%rip), %xmm9
+
+/* result = _z * (1.0/sqrt(_z) + O) + _p * _e[rror] * _z */
+        mulpd     %xmm9, %xmm2
+        mulpd     %xmm11, %xmm2
+        mulpd     %xmm10, %xmm11
+        mulpd     %xmm10, %xmm2
+
+/* Check _z exponent to be within borders [3BC ; 441] else goto Callout */
+        movq      _LowBoundary+__svml_dhypot_data_internal(%rip), %xmm5
+        movq      _HighBoundary+__svml_dhypot_data_internal(%rip), %xmm3
+        pshufd    $221, %xmm10, %xmm4
+        pcmpgtd   %xmm4, %xmm5
+        pcmpgtd   %xmm3, %xmm4
+        por       %xmm4, %xmm5
+        pshufd    $80, %xmm5, %xmm6
+        movmskpd  %xmm6, %edx
+        addpd     %xmm11, %xmm2
+
+/*  The end of implementation  */
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm1 xmm2
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movaps    %xmm2, %xmm0
+        addq      $88, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(96)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm0, 32(%rsp)
+        movups    %xmm1, 48(%rsp)
+        movups    %xmm2, 64(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 edx
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -80)
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -88)
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -96)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $2, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    64(%rsp), %xmm2
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -80)
+        cfi_offset(13, -88)
+        cfi_offset(14, -96)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm2
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        movsd     48(%rsp,%r14,8), %xmm1
+        call      hypot@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 64(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN2vv_hypot_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_dhypot_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(16)) VUINT32 _dHiLoMask[2][2];
+        __declspec(align(16)) VUINT32 _dAbsMask[2][2];
+        __declspec(align(16)) VUINT32 _dOne[2][2];
+        __declspec(align(16)) VUINT32 _POLY_C5[2][2];
+        __declspec(align(16)) VUINT32 _POLY_C4[2][2];
+        __declspec(align(16)) VUINT32 _POLY_C3[2][2];
+        __declspec(align(16)) VUINT32 _POLY_C2[2][2];
+        __declspec(align(16)) VUINT32 _POLY_C1[2][2];
+        __declspec(align(16)) VUINT32 _LowBoundary[4][1];
+        __declspec(align(16)) VUINT32 _HighBoundary[4][1];
+} __svml_dhypot_data_internal;
+#endif
+__svml_dhypot_data_internal:
+        /* legacy algorithm */
+        .quad 0xffffc00000000000, 0xffffc00000000000       /* _dHiLoMask     */
+        .align 16
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff       /* _dAbsMask      */
+        .align 16
+        .quad 0x3FF0000000000000, 0x3FF0000000000000       /* _dOne          */
+        .align 16
+        .quad 0xBFCF800000000000, 0xBFCF800000000000       /* _POLY_C5            */
+        .align 16
+        .quad 0x3FD1800000000000, 0x3FD1800000000000       /* _POLY_C4            */
+        .align 16
+        .quad 0xBFD4000000000000, 0xBFD4000000000000       /* _POLY_C3            */
+        .align 16
+        .quad 0x3FD8000000000000, 0x3FD8000000000000       /* _POLY_C2            */
+        .align 16
+        .quad 0xBFE0000000000000, 0xBFE0000000000000       /* _POLY_C1            */
+        .align 16
+        .long 0x3BC00000, 0x3BC00000, 0x3BC00000, 0x3BC00000       /* _LowBoundary   */
+        .align 16
+        .long 0x44100000, 0x44100000, 0x44100000, 0x44100000       /* _HighBoundary  */
+        .align 16
+        .type	__svml_dhypot_data_internal,@object
+        .size	__svml_dhypot_data_internal,.-__svml_dhypot_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core-sse.S
new file mode 100644
index 0000000000..5e7c75c44c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized hypot.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN4vv_hypot _ZGVdN4vv_hypot_sse_wrapper
+#include "../svml_d_hypot4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core.c
new file mode 100644
index 0000000000..06f34d35e1
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized hypot, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN4vv_hypot
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN4vv_hypot, __GI__ZGVdN4vv_hypot,
+	       __redirect__ZGVdN4vv_hypot)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core_avx2.S
new file mode 100644
index 0000000000..45028ab7e9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core_avx2.S
@@ -0,0 +1,289 @@
+/* Function hypot vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *      HIGH LEVEL OVERVIEW
+ *
+ *      Calculate z = (x*x+y*y)
+ *      Calculate reciprocal sqrt (z)
+ *      Calculate error = z*(rsqrt(z)*rsqrt(z)) - 1
+ *      Calculate fixing part p with polynomial
+ *      Fix answer with sqrt(z) = z * rsqrt(z) + error * p * z
+ *
+ *      ALGORITHM DETAILS
+ *
+ *    Multiprecision branch for _HA_ only
+ *      Remove sign from both arguments
+ *      Find maximum (_x) and minimum (_y) (by abs value) between arguments
+ *      Split _x into _a and _b for multiprecision
+ *      If _x >> _y we will not split _y for multiprecision
+ *      all _y will be put into lower part (_d) and higher part (_c = 0)
+ *      Fixing _hilo_mask for the case _x >> _y
+ *      Split _y into _c and _d for multiprecision with fixed mask
+ *
+ *      compute Hi and Lo parts of _z = _x*_x + _y*_y
+ *
+ *      _zHi = _a*_a + _c*_c
+ *      _zLo = (_x + _a)*_b + _d*_y + _d*_c
+ *      _z = _zHi + _zLo
+ *
+ *    No multiprecision branch for _LA_ and _EP_
+ *      _z = _VARG1 * _VARG1 + _VARG2 * _VARG2
+ *
+ *    Check _z exponent to be within borders [3BC ; 441] else goto Callout
+ *
+ *    _s  ~ 1.0/sqrt(_z)
+ *    _s2 ~ 1.0/(sqrt(_z)*sqrt(_z)) ~ 1.0/_z = (1.0/_z + O)
+ *    _e[rror]  =  (1.0/_z + O) * _z - 1.0
+ *    calculate fixing part _p
+ *    _p = (((_POLY_C5*_e + _POLY_C4)*_e +_POLY_C3)*_e +_POLY_C2)*_e + _POLY_C1
+ *    some parts of the polynomial are skipped for lower accuracy flavors
+ *
+ *    result = _z * (1.0/sqrt(_z) + O) + _p * _e[rror] * _z
+ *
+ *
+ */
+
+/* Offsets for data table __svml_dhypot_data_internal
+ */
+#define _dHiLoMask                    	0
+#define _dAbsMask                     	32
+#define _dOne                         	64
+#define _POLY_C5                      	96
+#define _POLY_C4                      	128
+#define _POLY_C3                      	160
+#define _POLY_C2                      	192
+#define _POLY_C1                      	224
+#define _LowBoundary                  	256
+#define _HighBoundary                 	288
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN4vv_hypot_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $128, %rsp
+        vmovapd   %ymm1, %ymm2
+        vmovapd   %ymm0, %ymm1
+
+/*
+ *  Defines
+ *  Implementation
+ * Multiprecision branch for _HA_ only
+ * _z = _VARG1 * _VARG1 + _VARG2 * _VARG2
+ */
+        vmulpd    %ymm1, %ymm1, %ymm0
+
+/*
+ * calculate fixing part _p
+ * _p = (((_POLY_C5*_e + _POLY_C4)*_e +_POLY_C3)*_e +_POLY_C2)*_e + _POLY_C1
+ * some parts of the polynomial are skipped for lower accuracy flavors
+ */
+        vmovupd   _POLY_C4+__svml_dhypot_data_internal(%rip), %ymm15
+        vmovups   _LowBoundary+__svml_dhypot_data_internal(%rip), %xmm4
+        vfmadd231pd %ymm2, %ymm2, %ymm0
+
+/*
+ * _s  ~ 1.0/sqrt(_z)
+ * _s2 ~ 1.0/(sqrt(_z)*sqrt(_z)) ~ 1.0/_z
+ */
+        vcvtpd2ps %ymm0, %xmm12
+
+/* Check _z exponent to be within borders [3BC ; 441] else goto Callout */
+        vextractf128 $1, %ymm0, %xmm3
+        vrsqrtps  %xmm12, %xmm13
+        vshufps   $221, %xmm3, %xmm0, %xmm5
+        vcvtps2pd %xmm13, %ymm3
+        vpcmpgtd  %xmm5, %xmm4, %xmm6
+        vpcmpgtd  _HighBoundary+__svml_dhypot_data_internal(%rip), %xmm5, %xmm7
+        vpor      %xmm7, %xmm6, %xmm9
+        vpshufd   $80, %xmm9, %xmm8
+        vmulpd    %ymm3, %ymm3, %ymm14
+        vpshufd   $250, %xmm9, %xmm10
+
+/* _e[rror]  ~  (1.0/_z + O) * _z - 1.0 */
+        vfmsub213pd _dOne+__svml_dhypot_data_internal(%rip), %ymm0, %ymm14
+        vfmadd213pd _POLY_C3+__svml_dhypot_data_internal(%rip), %ymm14, %ymm15
+        vfmadd213pd _POLY_C2+__svml_dhypot_data_internal(%rip), %ymm14, %ymm15
+        vfmadd213pd _POLY_C1+__svml_dhypot_data_internal(%rip), %ymm14, %ymm15
+
+/* result = _z * (1.0/sqrt(_z) + O) + _p * _e[rror] * _z */
+        vmulpd    %ymm15, %ymm14, %ymm14
+        vmulpd    %ymm14, %ymm3, %ymm15
+        vmulpd    %ymm15, %ymm0, %ymm4
+        vfmadd213pd %ymm4, %ymm3, %ymm0
+        vinsertf128 $1, %xmm10, %ymm8, %ymm11
+        vmovmskpd %ymm11, %edx
+
+/*  The end of implementation  */
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm1 ymm2
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovupd   %ymm1, 32(%rsp)
+        vmovupd   %ymm2, 64(%rsp)
+        vmovupd   %ymm0, 96(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -112; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovupd   96(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -112; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        movsd     64(%rsp,%r14,8), %xmm1
+        call      hypot@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 96(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN4vv_hypot_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_dhypot_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(32)) VUINT32 _dHiLoMask[4][2];
+        __declspec(align(32)) VUINT32 _dAbsMask[4][2];
+        __declspec(align(32)) VUINT32 _dOne[4][2];
+        __declspec(align(32)) VUINT32 _POLY_C5[4][2];
+        __declspec(align(32)) VUINT32 _POLY_C4[4][2];
+        __declspec(align(32)) VUINT32 _POLY_C3[4][2];
+        __declspec(align(32)) VUINT32 _POLY_C2[4][2];
+        __declspec(align(32)) VUINT32 _POLY_C1[4][2];
+        __declspec(align(32)) VUINT32 _LowBoundary[8][1];
+        __declspec(align(32)) VUINT32 _HighBoundary[8][1];
+} __svml_dhypot_data_internal;
+#endif
+__svml_dhypot_data_internal:
+        /* legacy algorithm */
+        .quad 0xffffc00000000000, 0xffffc00000000000, 0xffffc00000000000, 0xffffc00000000000       /* _dHiLoMask     */
+        .align 32
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff       /* _dAbsMask      */
+        .align 32
+        .quad 0x3FF0000000000000, 0x3FF0000000000000, 0x3FF0000000000000, 0x3FF0000000000000       /* _dOne          */
+        .align 32
+        .quad 0xBFCF800000000000, 0xBFCF800000000000, 0xBFCF800000000000, 0xBFCF800000000000       /* _POLY_C5            */
+        .align 32
+        .quad 0x3FD1800000000000, 0x3FD1800000000000, 0x3FD1800000000000, 0x3FD1800000000000       /* _POLY_C4            */
+        .align 32
+        .quad 0xBFD4000000000000, 0xBFD4000000000000, 0xBFD4000000000000, 0xBFD4000000000000       /* _POLY_C3            */
+        .align 32
+        .quad 0x3FD8000000000000, 0x3FD8000000000000, 0x3FD8000000000000, 0x3FD8000000000000       /* _POLY_C2            */
+        .align 32
+        .quad 0xBFE0000000000000, 0xBFE0000000000000, 0xBFE0000000000000, 0xBFE0000000000000       /* _POLY_C1            */
+        .align 32
+        .long 0x3BC00000, 0x3BC00000, 0x3BC00000, 0x3BC00000, 0x3BC00000, 0x3BC00000, 0x3BC00000, 0x3BC00000       /* _LowBoundary   */
+        .align 32
+        .long 0x44100000, 0x44100000, 0x44100000, 0x44100000, 0x44100000, 0x44100000, 0x44100000, 0x44100000       /* _HighBoundary  */
+        .align 32
+        .type	__svml_dhypot_data_internal,@object
+        .size	__svml_dhypot_data_internal,.-__svml_dhypot_data_internal
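
Both kernels above, and the AVX-512 kernel that follows, handle out-of-range
lanes the same way: the inputs and the fast-path result are spilled to the
stack, and each set bit of the range mask routes one lane through the scalar
hypot via the PLT.  Schematically, for a 4-lane kernel (illustrative C only,
names invented):

  #include <math.h>

  static void
  fixup_special_lanes (double res[4], const double x[4], const double y[4],
                       unsigned int mask)
  {
    /* mask is the movmskpd/vmovmskpd/kmovw result; bit i set means lane i
       fell outside the range the fast path supports.  */
    for (int i = 0; i < 4; i++)
      if (mask & (1u << i))
        res[i] = hypot (x[i], y[i]);
  }
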
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core-avx2.S
new file mode 100644
index 0000000000..a53e82cf9a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized hypot.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN8vv_hypot _ZGVeN8vv_hypot_avx2_wrapper
+#include "../svml_d_hypot8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core.c
new file mode 100644
index 0000000000..6052c752c9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized hypot, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN8vv_hypot
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN8vv_hypot, __GI__ZGVeN8vv_hypot,
+	       __redirect__ZGVeN8vv_hypot)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core_avx512.S
new file mode 100644
index 0000000000..1e5e716a8d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core_avx512.S
@@ -0,0 +1,235 @@
+/* Function hypot vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *      HIGH LEVEL OVERVIEW
+ *
+ *      Calculate z = (x*x+y*y)
+ *      Calculate reciprocal sqrt (z)
+ *      Calculate error = z*(rsqrt(z)*rsqrt(z)) - 1
+ *      Calculate fixing part p with polynomial
+ *      Fix answer with sqrt(z) = z * rsqrt(z) + error * p * z
+ *
+ *      ALGORITHM DETAILS
+ *
+ *    Multiprecision branch for _HA_ only
+ *      Remove sign from both arguments
+ *      Find maximum (_x) and minimum (_y) (by abs value) between arguments
+ *      Split _x into _a and _b for multiprecision
+ *      If _x >> _y we will not split _y for multiprecision
+ *      all _y will be put into lower part (_d) and higher part (_c = 0)
+ *      Fixing _hilo_mask for the case _x >> _y
+ *      Split _y into _c and _d for multiprecision with fixed mask
+ *
+ *      compute Hi and Lo parts of _z = _x*_x + _y*_y
+ *
+ *      _zHi = _a*_a + _c*_c
+ *      _zLo = (_x + _a)*_b + _d*_y + _d*_c
+ *      _z = _zHi + _zLo
+ *
+ *    No multiprecision branch for _LA_ and _EP_
+ *      _z = _VARG1 * _VARG1 + _VARG2 * _VARG2
+ *
+ *    Check _z exponent to be within borders [3BC ; 441] else goto Callout
+ *
+ *    _s  ~ 1.0/sqrt(_z)
+ *    _s2 ~ 1.0/(sqrt(_z)*sqrt(_z)) ~ 1.0/_z = (1.0/_z + O)
+ *    _e[rror]  =  (1.0/_z + O) * _z - 1.0
+ *    calculate fixing part _p
+ *    _p = (((_POLY_C5*_e + _POLY_C4)*_e +_POLY_C3)*_e +_POLY_C2)*_e + _POLY_C1
+ *    some parts of the polynomial are skipped for lower accuracy flavors
+ *
+ *    result = _z * (1.0/sqrt(_z) + O) + _p * _e[rror] * _z
+ *
+ *
+ */
+
+/* Offsets for data table __svml_dhypot_data_internal
+ */
+#define _dAbsMask                     	0
+#define _lExpBound_uisa               	64
+#define _lExpBound                    	128
+#define _dHalf                        	192
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN8vv_hypot_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $256, %rsp
+        vgetexppd {sae}, %zmm0, %zmm2
+        vgetexppd {sae}, %zmm1, %zmm3
+        vmovups   _dHalf+__svml_dhypot_data_internal(%rip), %zmm9
+        vmaxpd    {sae}, %zmm3, %zmm2, %zmm4
+        vmulpd    {rn-sae}, %zmm0, %zmm0, %zmm2
+        vandpd    _dAbsMask+__svml_dhypot_data_internal(%rip), %zmm4, %zmm5
+        vfmadd231pd {rn-sae}, %zmm1, %zmm1, %zmm2
+
+/* Select exponent bound so that no scaling is needed */
+        vpcmpq    $5, _lExpBound_uisa+__svml_dhypot_data_internal(%rip), %zmm5, %k0
+        vrsqrt14pd %zmm2, %zmm6
+        kmovw     %k0, %edx
+        vmulpd    {rn-sae}, %zmm6, %zmm2, %zmm7
+        vmulpd    {rn-sae}, %zmm6, %zmm9, %zmm8
+        vfnmadd231pd {rn-sae}, %zmm7, %zmm8, %zmm9
+        vfmadd231pd {rn-sae}, %zmm9, %zmm8, %zmm8
+        vfmadd213pd {rn-sae}, %zmm7, %zmm7, %zmm9
+        vfnmadd231pd {rn-sae}, %zmm9, %zmm9, %zmm2
+        vfmadd213pd {rn-sae}, %zmm9, %zmm8, %zmm2
+
+/*  The end of implementation  */
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1 zmm2
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        vmovaps   %zmm2, %zmm0
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm0, 64(%rsp)
+        vmovups   %zmm1, 128(%rsp)
+        vmovups   %zmm2, 192(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm2
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -240; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -248; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -256; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   192(%rsp), %zmm2
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -240; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -248; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -256; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm2
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     64(%rsp,%r14,8), %xmm0
+        movsd     128(%rsp,%r14,8), %xmm1
+        call      hypot@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 192(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN8vv_hypot_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_dhypot_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(64)) VUINT32 _dAbsMask[8][2];
+        __declspec(align(64)) VUINT32 _lExpBound_uisa[8][2];
+        __declspec(align(64)) VUINT32 _lExpBound[8][2];
+        __declspec(align(64)) VUINT32 _dHalf[8][2];
+} __svml_dhypot_data_internal;
+#endif
+__svml_dhypot_data_internal:
+        /* legacy algorithm */
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff       /* _dAbsMask      */
+        /* fma based algorithm*/
+        .align 64
+        .quad 0x407ff00000000000, 0x407ff00000000000, 0x407ff00000000000, 0x407ff00000000000, 0x407ff00000000000, 0x407ff00000000000, 0x407ff00000000000, 0x407ff00000000000       /* _lExpBound_uisa */
+        .align 64
+        .quad 0x404f800000000000, 0x404f800000000000, 0x404f800000000000, 0x404f800000000000, 0x404f800000000000, 0x404f800000000000, 0x404f800000000000, 0x404f800000000000       /* _lExpBound      */
+        .align 64
+        .quad 0x3FE0000000000000, 0x3FE0000000000000, 0x3FE0000000000000, 0x3FE0000000000000, 0x3FE0000000000000, 0x3FE0000000000000, 0x3FE0000000000000, 0x3FE0000000000000       /* _dHalf          */
+        .align 64
+        .type	__svml_dhypot_data_internal,@object
+        .size	__svml_dhypot_data_internal,.-__svml_dhypot_data_internal
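
The SPECIAL_VALUES_BRANCH / RANGEMASK_CHECK / SCALAR_MATH_CALL sequence above is
shared by all the kernels in this series: the vector inputs and the fast-path
result are spilled to the stack, and every lane flagged in the range mask is
recomputed with the scalar libm routine.  A minimal C model of that flow for the
8-lane double kernel follows; the function name and the array-based shape are
illustrative only, not taken from the sources:

#include <math.h>

/* Each set bit in range_mask marks a lane that left the fast path;
   that lane's result is replaced by a scalar hypot call.  */
void
hypot_fixup_special_lanes (const double *x, const double *y, double *res,
                           unsigned int range_mask, int nlanes)
{
  for (int lane = 0; lane < nlanes; lane++)   /* incl %r12d; cmpl $8, %r12d */
    if (range_mask & (1u << lane))            /* btl %r12d, %r13d           */
      res[lane] = hypot (x[lane], y[lane]);   /* call hypot@PLT             */
}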
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core-avx2.S
new file mode 100644
index 0000000000..a6ba40df4d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized hypotf.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN16vv_hypotf _ZGVeN16vv_hypotf_avx2_wrapper
+#include "../svml_s_hypotf16_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core.c
new file mode 100644
index 0000000000..0c9eb6a364
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized hypotf, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN16vv_hypotf
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN16vv_hypotf, __GI__ZGVeN16vv_hypotf,
+	       __redirect__ZGVeN16vv_hypotf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core_avx512.S
new file mode 100644
index 0000000000..46a156d136
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core_avx512.S
@@ -0,0 +1,239 @@
+/* Function hypotf vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *      HIGH LEVEL OVERVIEW
+ *
+ *      Calculate z = (x*x+y*y)
+ *      Calculate reciprocal sqrt(z)
+ *      Make two NR (Newton-Raphson) iterations
+ *
+ *      ALGORITHM DETAILS
+ *
+ *    Multiprecision branch for _HA_ only
+ *      Remove sign from both arguments
+ *      Find maximum (_x) and minimum (_y) (by abs value) between arguments
+ *      Split _x into _a and _b for multiprecision
+ *      If _x >> _y we will not split _y for multiprecision;
+ *      all of _y is put into the lower part (_d) and the higher part is zero (_c = 0)
+ *      Fix _hilo_mask for the case _x >> _y
+ *      Split _y into _c and _d for multiprecision with fixed mask
+ *
+ *      compute Hi and Lo parts of _z = _x*_x + _y*_y
+ *
+ *      _zHi = _a*_a + _c*_c
+ *      _zLo = (_x + _a)*_b + _d*_y + _d*_c
+ *      _z = _zHi + _zLo
+ *
+ *    No multiprecision branch for _LA_ and _EP_
+ *      _z = _VARG1 * _VARG1 + _VARG2 * _VARG2
+ *
+ *    Check _z exponent to be within borders [1E3 ; 60A] else goto Callout
+ *
+ *    Compute reciprocal sqrt s0 ~ 1.0/sqrt(_z),
+ *      that multiplied by _z, is final result for _EP_ version.
+ *
+ *    First iteration (or zero iteration):
+ *       s =  z * s0
+ *       h = .5 * s0
+ *       d =  s *  h - .5
+ *
+ *    Second iteration:
+ *       h = d * h + h
+ *       s = s * d + s
+ *       d = s * s - z (in multiprecision for _HA_)
+ *
+ *    result = s - h * d
+ *
+ *    EP version of the function can be implemented as y[i]=sqrt(a[i]^2+b[i]^2)
+ *    with all intermediate operations done in target precision for i=1,..,n.
+ *    It can return result y[i]=0 in case a[i]^2 and b[i]^2 underflow in target
+ *    precision (for some i). It can return result y[i]=NAN in case
+ *    a[i]^2+b[i]^2 overflow in target precision, for some i. It can return
+ *    result y[i]=NAN in case a[i] or b[i] is infinite, for some i.
+ *
+ *
+ */
+
+/* Offsets for data table __svml_shypot_data_internal
+ */
+#define _sAbsMask                     	0
+#define _sHalf                        	64
+#define _iExpBound                    	128
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN16vv_hypotf_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $256, %rsp
+        vgetexpps {sae}, %zmm0, %zmm2
+        vgetexpps {sae}, %zmm1, %zmm3
+        vmovups   _sHalf+__svml_shypot_data_internal(%rip), %zmm6
+        vmaxps    {sae}, %zmm3, %zmm2, %zmm4
+        vmulps    {rn-sae}, %zmm0, %zmm0, %zmm2
+        vandps    _sAbsMask+__svml_shypot_data_internal(%rip), %zmm4, %zmm5
+        vfmadd231ps {rn-sae}, %zmm1, %zmm1, %zmm2
+        vpcmpd    $5, _iExpBound+__svml_shypot_data_internal(%rip), %zmm5, %k0
+        vrsqrt14ps %zmm2, %zmm7
+        kmovw     %k0, %edx
+        vmulps    {rn-sae}, %zmm7, %zmm2, %zmm9
+        vmulps    {rn-sae}, %zmm7, %zmm6, %zmm8
+        vfnmadd231ps {rn-sae}, %zmm9, %zmm9, %zmm2
+        vfmadd213ps {rn-sae}, %zmm9, %zmm8, %zmm2
+
+/*
+ * VSCALEF( S, _VRES1, _VRES1, sExp );
+ *  The end of implementation
+ */
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1 zmm2
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        vmovaps   %zmm2, %zmm0
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm0, 64(%rsp)
+        vmovups   %zmm1, 128(%rsp)
+        vmovups   %zmm2, 192(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm2
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -240; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -248; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -256; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $16, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   192(%rsp), %zmm2
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -240; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -248; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -256; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm2
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     64(%rsp,%r14,4), %xmm0
+        movss     128(%rsp,%r14,4), %xmm1
+        call      hypotf@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 192(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN16vv_hypotf_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_shypot_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(64)) VUINT32 _sAbsMask[16][1];
+        __declspec(align(64)) VUINT32 _sHalf[16][1];
+        __declspec(align(64)) VUINT32 _iExpBound[16][1];
+} __svml_shypot_data_internal;
+#endif
+__svml_shypot_data_internal:
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff  /* _sAbsMask      */
+        .align 64
+        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000  /* _sHalf         */
+        /* fma based algorithm*/
+        .align 64
+        .long 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000  /* _iExpBound     */
+        .align 64
+        .type	__svml_shypot_data_internal,@object
+        .size	__svml_shypot_data_internal,.-__svml_shypot_data_internal
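
The ALGORITHM DETAILS comment in this file is easier to follow in scalar form.
The sketch below is a hedged C rendering of the _LA_ path it describes
(z = x*x + y*y, a reciprocal square-root estimate, then the two documented
iterations).  rsqrt14_model is an illustrative stand-in for vrsqrt14ps, not
something taken from the sources; it only roughly mimics that instruction's
~2^-14 relative accuracy.

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative stand-in for vrsqrt14ps: an estimate of 1/sqrt(z),
   modeled by truncating the exact value to about 14 mantissa bits.  */
static float
rsqrt14_model (float z)
{
  float r = 1.0f / sqrtf (z);
  uint32_t bits;
  memcpy (&bits, &r, sizeof bits);
  bits &= 0xfffffe00u;
  memcpy (&r, &bits, sizeof r);
  return r;
}

/* Scalar model of the vector fast path described above.  */
static float
hypotf_fast_path (float x, float y)
{
  float z = x * x + y * y;        /* _z = _VARG1*_VARG1 + _VARG2*_VARG2 */
  float s0 = rsqrt14_model (z);   /* s0 ~ 1.0/sqrt(_z)                  */
  float s = z * s0;               /* first iteration                    */
  float h = 0.5f * s0;
  float d = s * h - 0.5f;
  h = d * h + h;                  /* second iteration                   */
  s = s * d + s;
  d = s * s - z;
  return s - h * d;               /* result = s - h*d                   */
}

int
main (void)
{
  printf ("%g vs %g\n", hypotf_fast_path (3.0f, 4.0f), hypotf (3.0f, 4.0f));
  return 0;
}

The final correction step makes the result error quadratic in the seed error,
so a seed accurate to about 2^-14 ends up below single-precision rounding error.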
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core-sse2.S
new file mode 100644
index 0000000000..5e9dd22d94
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized hypotf.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN4vv_hypotf _ZGVbN4vv_hypotf_sse2
+#include "../svml_s_hypotf4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core.c
new file mode 100644
index 0000000000..91c9f5ca3f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized hypotf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN4vv_hypotf
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN4vv_hypotf, __GI__ZGVbN4vv_hypotf,
+	       __redirect__ZGVbN4vv_hypotf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core_sse4.S
new file mode 100644
index 0000000000..a3f6d21ce1
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core_sse4.S
@@ -0,0 +1,265 @@
+/* Function hypotf vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *      HIGH LEVEL OVERVIEW
+ *
+ *      Calculate z = (x*x+y*y)
+ *      Calculate reciprocal sqrt(z)
+ *      Make two NR (Newton-Raphson) iterations
+ *
+ *      ALGORITHM DETAILS
+ *
+ *    Multiprecision branch for _HA_ only
+ *      Remove sign from both arguments
+ *      Find maximum (_x) and minimum (_y) (by abs value) between arguments
+ *      Split _x into _a and _b for multiprecision
+ *      If _x >> _y we will not split _y for multiprecision;
+ *      all of _y is put into the lower part (_d) and the higher part is zero (_c = 0)
+ *      Fix _hilo_mask for the case _x >> _y
+ *      Split _y into _c and _d for multiprecision with fixed mask
+ *
+ *      compute Hi and Lo parts of _z = _x*_x + _y*_y
+ *
+ *      _zHi = _a*_a + _c*_c
+ *      _zLo = (_x + _a)*_b + _d*_y + _d*_c
+ *      _z = _zHi + _zLo
+ *
+ *    No multiprecision branch for _LA_ and _EP_
+ *      _z = _VARG1 * _VARG1 + _VARG2 * _VARG2
+ *
+ *    Check _z exponent to be within borders [1E3 ; 60A] else goto Callout
+ *
+ *    Compute reciprocal sqrt s0 ~ 1.0/sqrt(_z),
+ *      that multiplied by _z, is final result for _EP_ version.
+ *
+ *    First iteration (or zero iteration):
+ *       s =  z * s0
+ *       h = .5 * s0
+ *       d =  s *  h - .5
+ *
+ *    Second iteration:
+ *       h = d * h + h
+ *       s = s * d + s
+ *       d = s * s - z (in multiprecision for _HA_)
+ *
+ *    result = s - h * d
+ *
+ *    EP version of the function can be implemented as y[i]=sqrt(a[i]^2+b[i]^2)
+ *    with all intermediate operations done in target precision for i=1,..,n.
+ *    It can return result y[i]=0 in case a[i]^2 and b[i]^2 underflow in target
+ *    precision (for some i). It can return result y[i]=NAN in case
+ *    a[i]^2+b[i]^2 overflow in target precision, for some i. It can return
+ *    result y[i]=NAN in case a[i] or b[i] is infinite, for some i.
+ *
+ *
+ */
+
+/* Offsets for data table __svml_shypot_data_internal
+ */
+#define _sHiLoMask                    	0
+#define _sAbsMask                     	16
+#define _sHalf                        	32
+#define _LowBoundary                  	48
+#define _HighBoundary                 	64
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN4vv_hypotf_sse4)
+        subq      $88, %rsp
+        cfi_def_cfa_offset(96)
+
+/*
+ *  Implementation
+ * Multiprecision branch for _HA_ only
+ * No multiprecision branch for _LA_
+ * _z = _VARG1 * _VARG1 + _VARG2 * _VARG2
+ */
+        movaps    %xmm0, %xmm8
+        movaps    %xmm1, %xmm2
+        mulps     %xmm0, %xmm8
+        mulps     %xmm1, %xmm2
+
+/*
+ *  Variables
+ *  Defines
+ *  Constants loading
+ */
+        movups    _sHalf+__svml_shypot_data_internal(%rip), %xmm5
+        addps     %xmm2, %xmm8
+
+/* _s0  ~ 1.0/sqrt(_z) */
+        rsqrtps   %xmm8, %xmm10
+
+/* First iteration */
+        movaps    %xmm10, %xmm2
+        movaps    %xmm8, %xmm3
+        mulps     %xmm8, %xmm2
+        mulps     %xmm5, %xmm10
+        movaps    %xmm2, %xmm6
+        mulps     %xmm10, %xmm6
+
+/* Check _z exponent to be within borders [1E3 ; 60A] else goto Callout */
+        movdqu    _LowBoundary+__svml_shypot_data_internal(%rip), %xmm4
+        subps     %xmm6, %xmm5
+
+/* Second iteration */
+        movaps    %xmm5, %xmm7
+        pcmpgtd   %xmm8, %xmm4
+        mulps     %xmm2, %xmm5
+        mulps     %xmm10, %xmm7
+        addps     %xmm5, %xmm2
+        addps     %xmm7, %xmm10
+
+/* Finish second iteration in native precision for _LA_ */
+        movaps    %xmm2, %xmm9
+        mulps     %xmm2, %xmm9
+        pcmpgtd   _HighBoundary+__svml_shypot_data_internal(%rip), %xmm3
+        subps     %xmm8, %xmm9
+        mulps     %xmm9, %xmm10
+        por       %xmm3, %xmm4
+        movmskps  %xmm4, %edx
+        subps     %xmm10, %xmm2
+
+/*  The end of implementation  */
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm1 xmm2
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movaps    %xmm2, %xmm0
+        addq      $88, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(96)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm0, 32(%rsp)
+        movups    %xmm1, 48(%rsp)
+        movups    %xmm2, 64(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 edx
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -80)
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -88)
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -96)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    64(%rsp), %xmm2
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -80)
+        cfi_offset(13, -88)
+        cfi_offset(14, -96)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm2
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        movss     48(%rsp,%r14,4), %xmm1
+        call      hypotf@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 64(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN4vv_hypotf_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_shypot_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(16)) VUINT32 _sHiLoMask[4][1];
+        __declspec(align(16)) VUINT32 _sAbsMask[4][1];
+        __declspec(align(16)) VUINT32 _sHalf[4][1];
+        __declspec(align(16)) VUINT32 _LowBoundary[4][1];
+        __declspec(align(16)) VUINT32 _HighBoundary[4][1];
+} __svml_shypot_data_internal;
+#endif
+__svml_shypot_data_internal:
+        /* legacy algorithm */
+        .long 0xFFF80000, 0xFFF80000, 0xFFF80000, 0xFFF80000  /* _sHiLoMask     */
+        .align 16
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff  /* _sAbsMask      */
+        .align 16
+        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000  /* _sHalf         */
+        .align 16
+        .long 0x1E300000, 0x1E300000, 0x1E300000, 0x1E300000  /* _LowBoundary   */
+        .align 16
+        .long 0x60A00000, 0x60A00000, 0x60A00000, 0x60A00000  /* _HighBoundary  */
+        .align 16
+        .type	__svml_shypot_data_internal,@object
+        .size	__svml_shypot_data_internal,.-__svml_shypot_data_internal
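
Unlike the AVX-512 version, this SSE4 variant does not use vgetexpps; the
fast-path range check is done directly on the bit pattern of z with pcmpgtd
against the _LowBoundary/_HighBoundary table entries.  A hedged C model of that
test (the helper name is illustrative, not from the sources):

#include <stdint.h>
#include <string.h>

/* z = x*x + y*y is always non-negative, so its IEEE bit pattern can be
   range-checked with signed 32-bit compares, which is what pcmpgtd does.  */
int
hypotf_needs_callout (float z)
{
  int32_t bits;
  memcpy (&bits, &z, sizeof bits);
  return bits < (int32_t) 0x1E300000      /* _LowBoundary: below fast-path range  */
         || bits > (int32_t) 0x60A00000;  /* _HighBoundary: above fast-path range */
}

Any lane for which this is true sets a bit in the movmskps result and is then
recomputed by the scalar callout.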
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core-sse.S
new file mode 100644
index 0000000000..d37556e331
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized hypotf.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN8vv_hypotf _ZGVdN8vv_hypotf_sse_wrapper
+#include "../svml_s_hypotf8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core.c
new file mode 100644
index 0000000000..6cc497e73d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized hypotf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN8vv_hypotf
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN8vv_hypotf, __GI__ZGVdN8vv_hypotf,
+	       __redirect__ZGVdN8vv_hypotf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core_avx2.S
new file mode 100644
index 0000000000..733022ff01
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core_avx2.S
@@ -0,0 +1,269 @@
+/* Function hypotf vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *      HIGH LEVEL OVERVIEW
+ *
+ *      Calculate z = (x*x+y*y)
+ *      Calculate reciprocal sqrt(z)
+ *      Make two NR (Newton-Raphson) iterations
+ *
+ *      ALGORITHM DETAILS
+ *
+ *    Multiprecision branch for _HA_ only
+ *      Remove sign from both arguments
+ *      Find maximum (_x) and minimum (_y) (by abs value) between arguments
+ *      Split _x into _a and _b for multiprecision
+ *      If _x >> _y we will not split _y for multiprecision;
+ *      all of _y is put into the lower part (_d) and the higher part is zero (_c = 0)
+ *      Fix _hilo_mask for the case _x >> _y
+ *      Split _y into _c and _d for multiprecision with fixed mask
+ *
+ *      compute Hi and Lo parts of _z = _x*_x + _y*_y
+ *
+ *      _zHi = _a*_a + _c*_c
+ *      _zLo = (_x + _a)*_b + _d*_y + _d*_c
+ *      _z = _zHi + _zLo
+ *
+ *    No multiprecision branch for _LA_ and _EP_
+ *      _z = _VARG1 * _VARG1 + _VARG2 * _VARG2
+ *
+ *    Check _z exponent to be within borders [1E3 ; 60A] else goto Callout
+ *
+ *    Compute reciprocal sqrt s0 ~ 1.0/sqrt(_z),
+ *      that multiplied by _z, is final result for _EP_ version.
+ *
+ *    First iteration (or zero iteration):
+ *       s =  z * s0
+ *       h = .5 * s0
+ *       d =  s *  h - .5
+ *
+ *    Second iteration:
+ *       h = d * h + h
+ *       s = s * d + s
+ *       d = s * s - z (in multiprecision for _HA_)
+ *
+ *    result = s - h * d
+ *
+ *    EP version of the function can be implemented as y[i]=sqrt(a[i]^2+b[i]^2)
+ *    with all intermediate operations done in target precision for i=1,..,n.
+ *    It can return result y[i]=0 in case a[i]^2 and b[i]^2 underflow in target
+ *    precision (for some i). It can return result y[i]=NAN in case
+ *    a[i]^2+b[i]^2 overflow in target precision, for some i. It can return
+ *    result y[i]=NAN in case a[i] or b[i] is infinite, for some i.
+ *
+ *
+ */
+
+/* Offsets for data table __svml_shypot_data_internal
+ */
+#define _sHiLoMask                    	0
+#define _sAbsMask                     	32
+#define _sHalf                        	64
+#define _LowBoundary                  	96
+#define _HighBoundary                 	128
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN8vv_hypotf_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $128, %rsp
+
+/*
+ *  Implementation
+ * Multiprecision branch for _HA_ only
+ * No multiprecision branch for _LA_
+ * _z = _VARG1 * _VARG1 + _VARG2 * _VARG2
+ */
+        vmulps    %ymm0, %ymm0, %ymm8
+
+/*
+ *  Variables
+ *  Defines
+ *  Constants loading
+ */
+        vmovups   _sHalf+__svml_shypot_data_internal(%rip), %ymm7
+
+/* Check _z exponent to be within borders [1E3 ; 60A] else goto Callout */
+        vmovups   _LowBoundary+__svml_shypot_data_internal(%rip), %ymm2
+        vfmadd231ps %ymm1, %ymm1, %ymm8
+
+/* _s0  ~ 1.0/sqrt(_z) */
+        vrsqrtps  %ymm8, %ymm6
+        vpcmpgtd  %ymm8, %ymm2, %ymm3
+
+/* First iteration */
+        vmulps    %ymm8, %ymm6, %ymm9
+        vmulps    %ymm7, %ymm6, %ymm2
+        vfnmadd231ps %ymm9, %ymm2, %ymm7
+        vfmadd213ps %ymm9, %ymm7, %ymm9
+
+/* Second iteration */
+        vfmadd132ps %ymm7, %ymm2, %ymm2
+        vpcmpgtd  _HighBoundary+__svml_shypot_data_internal(%rip), %ymm8, %ymm4
+        vpor      %ymm4, %ymm3, %ymm5
+
+/* Finish second iteration in native precision for _LA_ */
+        vfmsub231ps %ymm9, %ymm9, %ymm8
+        vmovmskps %ymm5, %edx
+        vfnmadd213ps %ymm9, %ymm8, %ymm2
+
+/*  The end of implementation  */
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm1 ymm2
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        vmovaps   %ymm2, %ymm0
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %ymm0, 32(%rsp)
+        vmovups   %ymm1, 64(%rsp)
+        vmovups   %ymm2, 96(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx ymm2
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -112; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   96(%rsp), %ymm2
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -112; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm2
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        movss     64(%rsp,%r14,4), %xmm1
+        call      hypotf@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 96(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN8vv_hypotf_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_shypot_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(32)) VUINT32 _sHiLoMask[8][1];
+        __declspec(align(32)) VUINT32 _sAbsMask[8][1];
+        __declspec(align(32)) VUINT32 _sHalf[8][1];
+        __declspec(align(32)) VUINT32 _LowBoundary[8][1];
+        __declspec(align(32)) VUINT32 _HighBoundary[8][1];
+} __svml_shypot_data_internal;
+#endif
+__svml_shypot_data_internal:
+        /* legacy algorithm */
+        .long 0xFFF80000, 0xFFF80000, 0xFFF80000, 0xFFF80000, 0xFFF80000, 0xFFF80000, 0xFFF80000, 0xFFF80000  /* _sHiLoMask     */
+        .align 32
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff  /* _sAbsMask      */
+        .align 32
+        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000  /* _sHalf         */
+        .align 32
+        .long 0x1E300000, 0x1E300000, 0x1E300000, 0x1E300000, 0x1E300000, 0x1E300000, 0x1E300000, 0x1E300000  /* _LowBoundary   */
+        .align 32
+        .long 0x60A00000, 0x60A00000, 0x60A00000, 0x60A00000, 0x60A00000, 0x60A00000, 0x60A00000, 0x60A00000  /* _HighBoundary  */
+        .align 32
+        .type	__svml_shypot_data_internal,@object
+        .size	__svml_shypot_data_internal,.-__svml_shypot_data_internal
diff --git a/sysdeps/x86_64/fpu/svml_d_hypot2_core.S b/sysdeps/x86_64/fpu/svml_d_hypot2_core.S
new file mode 100644
index 0000000000..ea98f36324
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_hypot2_core.S
@@ -0,0 +1,29 @@
+/* Function hypot vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN2vv_hypot)
+WRAPPER_IMPL_SSE2_ff hypot
+END (_ZGVbN2vv_hypot)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN2vv_hypot)
+#endif
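
For the non-multiarch build, WRAPPER_IMPL_SSE2_ff provides the two-lane vector
ABI entry point on top of the scalar routine.  As a rough model only (the real
expansion is the assembly macro in sysdeps/x86_64/fpu/svml_d_wrapper_impl.h, and
the function name below is hypothetical):

#include <math.h>

/* C-level equivalent of the SSE2 wrapper: each lane of the
   two-element vector is handled by a scalar hypot call.  */
void
zgvbn2vv_hypot_model (const double x[2], const double y[2], double r[2])
{
  r[0] = hypot (x[0], y[0]);
  r[1] = hypot (x[1], y[1]);
}

The wider wrapper files below follow the same pattern, forwarding an AVX or
AVX-512 vector to the next narrower variant.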
diff --git a/sysdeps/x86_64/fpu/svml_d_hypot4_core.S b/sysdeps/x86_64/fpu/svml_d_hypot4_core.S
new file mode 100644
index 0000000000..cedbbff2b6
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_hypot4_core.S
@@ -0,0 +1,29 @@
+/* Function hypot vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN4vv_hypot)
+WRAPPER_IMPL_AVX_ff _ZGVbN2vv_hypot
+END (_ZGVdN4vv_hypot)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN4vv_hypot)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_hypot4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_hypot4_core_avx.S
new file mode 100644
index 0000000000..e0fef5203d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_hypot4_core_avx.S
@@ -0,0 +1,25 @@
+/* Function hypot vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN4vv_hypot)
+WRAPPER_IMPL_AVX_ff _ZGVbN2vv_hypot
+END (_ZGVcN4vv_hypot)
diff --git a/sysdeps/x86_64/fpu/svml_d_hypot8_core.S b/sysdeps/x86_64/fpu/svml_d_hypot8_core.S
new file mode 100644
index 0000000000..7588e4407b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_hypot8_core.S
@@ -0,0 +1,25 @@
+/* Function hypot vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN8vv_hypot)
+WRAPPER_IMPL_AVX512_ff _ZGVdN4vv_hypot
+END (_ZGVeN8vv_hypot)
diff --git a/sysdeps/x86_64/fpu/svml_s_hypotf16_core.S b/sysdeps/x86_64/fpu/svml_s_hypotf16_core.S
new file mode 100644
index 0000000000..06d421a926
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_hypotf16_core.S
@@ -0,0 +1,25 @@
+/* Function hypotf vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN16vv_hypotf)
+WRAPPER_IMPL_AVX512_ff _ZGVdN8vv_hypotf
+END (_ZGVeN16vv_hypotf)
diff --git a/sysdeps/x86_64/fpu/svml_s_hypotf4_core.S b/sysdeps/x86_64/fpu/svml_s_hypotf4_core.S
new file mode 100644
index 0000000000..7e8553cae4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_hypotf4_core.S
@@ -0,0 +1,29 @@
+/* Function hypotf vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN4vv_hypotf)
+WRAPPER_IMPL_SSE2_ff hypotf
+END (_ZGVbN4vv_hypotf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4vv_hypotf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_hypotf8_core.S b/sysdeps/x86_64/fpu/svml_s_hypotf8_core.S
new file mode 100644
index 0000000000..a9bf27370b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_hypotf8_core.S
@@ -0,0 +1,29 @@
+/* Function hypotf vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN8vv_hypotf)
+WRAPPER_IMPL_AVX_ff _ZGVbN4vv_hypotf
+END (_ZGVdN8vv_hypotf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8vv_hypotf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_hypotf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_hypotf8_core_avx.S
new file mode 100644
index 0000000000..8b8008a7e9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_hypotf8_core_avx.S
@@ -0,0 +1,25 @@
+/* Function hypotf vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+        .text
+ENTRY(_ZGVcN8vv_hypotf)
+WRAPPER_IMPL_AVX_ff _ZGVbN4vv_hypotf
+END(_ZGVcN8vv_hypotf)
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx.c
new file mode 100644
index 0000000000..c6a26a63e4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-hypot.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx2.c
new file mode 100644
index 0000000000..c6a26a63e4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx2.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-hypot.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx512f.c
new file mode 100644
index 0000000000..c6a26a63e4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx512f.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-hypot.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-hypot.c b/sysdeps/x86_64/fpu/test-double-libmvec-hypot.c
new file mode 100644
index 0000000000..c0f600a443
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-hypot.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE double
+#define LIBMVEC_FUNC hypot
+#include "test-vector-abi-arg2.h"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index 5746bb5be3..9bc9d1dafa 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -30,6 +30,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVbN2vv_pow)
 VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVbN2v_acos)
 VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVbN2v_atan)
 VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVbN2v_asin)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVbN2vv_hypot)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index 8d3d5493ed..c41994d90a 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -33,6 +33,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVdN4vv_pow)
 VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVdN4v_acos)
 VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVdN4v_atan)
 VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVdN4v_asin)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVdN4vv_hypot)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m256i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index f43328f2ff..881f6c801a 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -30,6 +30,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVcN4vv_pow)
 VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVcN4v_acos)
 VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVcN4v_atan)
 VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVcN4v_asin)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVcN4vv_hypot)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index 8b566c199a..6fd106fe68 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -30,6 +30,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVeN8vv_pow)
 VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVeN8v_acos)
 VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVeN8v_atan)
 VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVeN8v_asin)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVeN8vv_hypot)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx.c
new file mode 100644
index 0000000000..97d11ad1d3
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-hypotf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx2.c
new file mode 100644
index 0000000000..97d11ad1d3
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx2.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-hypotf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx512f.c
new file mode 100644
index 0000000000..97d11ad1d3
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx512f.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-hypotf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-hypotf.c b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf.c
new file mode 100644
index 0000000000..38776fa724
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE float
+#define LIBMVEC_FUNC hypotf
+#include "test-vector-abi-arg2.h"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index 3d3218a310..4c2ea6ddfe 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -30,6 +30,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVeN16vv_powf)
 VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVeN16v_acosf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVeN16v_atanf)
 VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVeN16v_asinf)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVeN16vv_hypotf)
 
 #define VEC_INT_TYPE __m512i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index 7d75b9f60f..1d5d952d07 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -30,6 +30,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVbN4vv_powf)
 VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVbN4v_acosf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVbN4v_atanf)
 VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVbN4v_asinf)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVbN4vv_hypotf)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index 405dde49bc..7a750f3781 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -33,6 +33,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVdN8vv_powf)
 VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVdN8v_acosf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVdN8v_atanf)
 VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVdN8v_asinf)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVdN8vv_hypotf)
 
 /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
 #undef VECTOR_WRAPPER_fFF
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index 7558443f2e..af816a7789 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -30,6 +30,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVcN8vv_powf)
 VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVcN8v_acosf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVcN8v_atanf)
 VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVcN8v_asinf)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVcN8vv_hypotf)
 
 #define VEC_INT_TYPE __m128i
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 04/18] x86-64: Add vector exp2/exp2f implementation to libmvec
  2021-12-29  6:39 [PATCH v5 00/18] x86-64: Add vector math functions to libmvec Sunil K Pandey
                   ` (2 preceding siblings ...)
  2021-12-29  6:39 ` [PATCH v5 03/18] x86-64: Add vector hypot/hypotf " Sunil K Pandey
@ 2021-12-29  6:39 ` Sunil K Pandey
  2021-12-29 21:25   ` H.J. Lu
  2021-12-29  6:39 ` [PATCH v5 05/18] x86-64: Add vector exp10/exp10f " Sunil K Pandey
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Sunil K Pandey @ 2021-12-29  6:39 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, andrey.kolesov, marius.cornea

Implement vectorized exp2/exp2f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector exp2/exp2f with regenerated ulps.
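
For reference, a minimal caller sketch (not part of this patch; the
function and file names are illustrative) showing how the new entry
points are reached: with GCC, -O2 -fopenmp-simd together with
-ffast-math (or at least -fno-math-errno) and a matching -m option lets
the compiler turn a plain exp2 loop into calls such as _ZGVdN4v_exp2.

  #include <math.h>

  /* Compile e.g.: gcc -O2 -fopenmp-simd -ffast-math -mavx2 -c vec_exp2.c  */
  void
  vec_exp2 (const double *x, double *y, int n)
  {
  #pragma omp simd
    for (int i = 0; i < n; i++)
      y[i] = exp2 (x[i]);
  }
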
---
 bits/libm-simd-decl-stubs.h                   |  11 +
 math/bits/mathcalls.h                         |   2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
 sysdeps/x86/fpu/bits/math-vector.h            |   4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
 sysdeps/x86_64/fpu/Makeconfig                 |   1 +
 sysdeps/x86_64/fpu/Versions                   |   2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |  20 +
 .../fpu/multiarch/svml_d_exp22_core-sse2.S    |  20 +
 .../x86_64/fpu/multiarch/svml_d_exp22_core.c  |  27 ++
 .../fpu/multiarch/svml_d_exp22_core_sse4.S    | 325 +++++++++++++++++
 .../fpu/multiarch/svml_d_exp24_core-sse.S     |  20 +
 .../x86_64/fpu/multiarch/svml_d_exp24_core.c  |  27 ++
 .../fpu/multiarch/svml_d_exp24_core_avx2.S    | 341 ++++++++++++++++++
 .../fpu/multiarch/svml_d_exp28_core-avx2.S    |  20 +
 .../x86_64/fpu/multiarch/svml_d_exp28_core.c  |  27 ++
 .../fpu/multiarch/svml_d_exp28_core_avx512.S  | 301 ++++++++++++++++
 .../fpu/multiarch/svml_s_exp2f16_core-avx2.S  |  20 +
 .../fpu/multiarch/svml_s_exp2f16_core.c       |  28 ++
 .../multiarch/svml_s_exp2f16_core_avx512.S    | 271 ++++++++++++++
 .../fpu/multiarch/svml_s_exp2f4_core-sse2.S   |  20 +
 .../x86_64/fpu/multiarch/svml_s_exp2f4_core.c |  28 ++
 .../fpu/multiarch/svml_s_exp2f4_core_sse4.S   | 238 ++++++++++++
 .../fpu/multiarch/svml_s_exp2f8_core-sse.S    |  20 +
 .../x86_64/fpu/multiarch/svml_s_exp2f8_core.c |  28 ++
 .../fpu/multiarch/svml_s_exp2f8_core_avx2.S   | 245 +++++++++++++
 sysdeps/x86_64/fpu/svml_d_exp22_core.S        |  29 ++
 sysdeps/x86_64/fpu/svml_d_exp24_core.S        |  29 ++
 sysdeps/x86_64/fpu/svml_d_exp24_core_avx.S    |  25 ++
 sysdeps/x86_64/fpu/svml_d_exp28_core.S        |  25 ++
 sysdeps/x86_64/fpu/svml_s_exp2f16_core.S      |  25 ++
 sysdeps/x86_64/fpu/svml_s_exp2f4_core.S       |  29 ++
 sysdeps/x86_64/fpu/svml_s_exp2f8_core.S       |  29 ++
 sysdeps/x86_64/fpu/svml_s_exp2f8_core_avx.S   |  25 ++
 .../x86_64/fpu/test-double-libmvec-exp2-avx.c |   1 +
 .../fpu/test-double-libmvec-exp2-avx2.c       |   1 +
 .../fpu/test-double-libmvec-exp2-avx512f.c    |   1 +
 sysdeps/x86_64/fpu/test-double-libmvec-exp2.c |   3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-libmvec-exp2f-avx.c |   1 +
 .../fpu/test-float-libmvec-exp2f-avx2.c       |   1 +
 .../fpu/test-float-libmvec-exp2f-avx512f.c    |   1 +
 sysdeps/x86_64/fpu/test-float-libmvec-exp2f.c |   3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
 50 files changed, 2293 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_exp22_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_exp24_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_exp24_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_exp28_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_exp2f16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_exp2f4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_exp2f8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_exp2f8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp2f.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
index adf65f6bc2..36d6643eb9 100644
--- a/bits/libm-simd-decl-stubs.h
+++ b/bits/libm-simd-decl-stubs.h
@@ -142,4 +142,15 @@
 #define __DECL_SIMD_hypotf32x
 #define __DECL_SIMD_hypotf64x
 #define __DECL_SIMD_hypotf128x
+
+#define __DECL_SIMD_exp2
+#define __DECL_SIMD_exp2f
+#define __DECL_SIMD_exp2l
+#define __DECL_SIMD_exp2f16
+#define __DECL_SIMD_exp2f32
+#define __DECL_SIMD_exp2f64
+#define __DECL_SIMD_exp2f128
+#define __DECL_SIMD_exp2f32x
+#define __DECL_SIMD_exp2f64x
+#define __DECL_SIMD_exp2f128x
 #endif
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index 2ed820a0dc..645088cbf3 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -127,7 +127,7 @@ __MATHCALL (logb,, (_Mdouble_ __x));
 
 #ifdef __USE_ISOC99
 /* Compute base-2 exponential of X.  */
-__MATHCALL (exp2,, (_Mdouble_ __x));
+__MATHCALL_VEC (exp2,, (_Mdouble_ __x));
 
 /* Compute base-2 logarithm of X.  */
 __MATHCALL (log2,, (_Mdouble_ __x));
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index 12bb03245b..1717f2dee9 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -49,32 +49,40 @@ GLIBC_2.22 _ZGVeN8vvv_sincos F
 GLIBC_2.35 _ZGVbN2v_acos F
 GLIBC_2.35 _ZGVbN2v_asin F
 GLIBC_2.35 _ZGVbN2v_atan F
+GLIBC_2.35 _ZGVbN2v_exp2 F
 GLIBC_2.35 _ZGVbN2vv_hypot F
 GLIBC_2.35 _ZGVbN4v_acosf F
 GLIBC_2.35 _ZGVbN4v_asinf F
 GLIBC_2.35 _ZGVbN4v_atanf F
+GLIBC_2.35 _ZGVbN4v_exp2f F
 GLIBC_2.35 _ZGVbN4vv_hypotf F
 GLIBC_2.35 _ZGVcN4v_acos F
 GLIBC_2.35 _ZGVcN4v_asin F
 GLIBC_2.35 _ZGVcN4v_atan F
+GLIBC_2.35 _ZGVcN4v_exp2 F
 GLIBC_2.35 _ZGVcN4vv_hypot F
 GLIBC_2.35 _ZGVcN8v_acosf F
 GLIBC_2.35 _ZGVcN8v_asinf F
 GLIBC_2.35 _ZGVcN8v_atanf F
+GLIBC_2.35 _ZGVcN8v_exp2f F
 GLIBC_2.35 _ZGVcN8vv_hypotf F
 GLIBC_2.35 _ZGVdN4v_acos F
 GLIBC_2.35 _ZGVdN4v_asin F
 GLIBC_2.35 _ZGVdN4v_atan F
+GLIBC_2.35 _ZGVdN4v_exp2 F
 GLIBC_2.35 _ZGVdN4vv_hypot F
 GLIBC_2.35 _ZGVdN8v_acosf F
 GLIBC_2.35 _ZGVdN8v_asinf F
 GLIBC_2.35 _ZGVdN8v_atanf F
+GLIBC_2.35 _ZGVdN8v_exp2f F
 GLIBC_2.35 _ZGVdN8vv_hypotf F
 GLIBC_2.35 _ZGVeN16v_acosf F
 GLIBC_2.35 _ZGVeN16v_asinf F
 GLIBC_2.35 _ZGVeN16v_atanf F
+GLIBC_2.35 _ZGVeN16v_exp2f F
 GLIBC_2.35 _ZGVeN16vv_hypotf F
 GLIBC_2.35 _ZGVeN8v_acos F
 GLIBC_2.35 _ZGVeN8v_asin F
 GLIBC_2.35 _ZGVeN8v_atan F
+GLIBC_2.35 _ZGVeN8v_exp2 F
 GLIBC_2.35 _ZGVeN8vv_hypot F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index 437977c5fd..c7a972521b 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -74,6 +74,10 @@
 #  define __DECL_SIMD_hypot __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_hypotf
 #  define __DECL_SIMD_hypotf __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_exp2
+#  define __DECL_SIMD_exp2 __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_exp2f
+#  define __DECL_SIMD_exp2f __DECL_SIMD_x86_64
 
 # endif
 #endif
diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
index cda31479a6..0994e6dfac 100644
--- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
@@ -36,6 +36,8 @@
 !GCC$ builtin (asinf) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (hypot) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (hypotf) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (exp2) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (exp2f) attributes simd (notinbranch) if('x86_64')
 
 !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
@@ -57,3 +59,5 @@
 !GCC$ builtin (asinf) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (hypot) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (hypotf) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (exp2) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (exp2f) attributes simd (notinbranch) if('x32')
diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
index 7769a02731..03b2364417 100644
--- a/sysdeps/x86_64/fpu/Makeconfig
+++ b/sysdeps/x86_64/fpu/Makeconfig
@@ -27,6 +27,7 @@ libmvec-funcs = \
   atan \
   cos \
   exp \
+  exp2 \
   hypot \
   log \
   pow \
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index e359e5dc2c..12b7ad1830 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -17,10 +17,12 @@ libmvec {
     _ZGVbN2v_acos; _ZGVcN4v_acos; _ZGVdN4v_acos; _ZGVeN8v_acos;
     _ZGVbN2v_asin; _ZGVcN4v_asin; _ZGVdN4v_asin; _ZGVeN8v_asin;
     _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan;
+    _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
     _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
     _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
     _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
     _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf;
+    _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
     _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf;
   }
 }
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index a7513ec94e..bc4479ad39 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -1276,6 +1276,26 @@ float: 1
 float128: 2
 ldouble: 1
 
+Function: "exp2_vlen16":
+float: 1
+
+Function: "exp2_vlen2":
+double: 1
+
+Function: "exp2_vlen4":
+double: 1
+float: 1
+
+Function: "exp2_vlen4_avx2":
+double: 1
+
+Function: "exp2_vlen8":
+double: 1
+float: 1
+
+Function: "exp2_vlen8_avx2":
+float: 1
+
 Function: "exp_downward":
 double: 1
 float: 1
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core-sse2.S
new file mode 100644
index 0000000000..330260baaa
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized exp2, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN2v_exp2 _ZGVbN2v_exp2_sse2
+#include "../svml_d_exp22_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core.c
new file mode 100644
index 0000000000..e0cf198030
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized exp2, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN2v_exp2
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN2v_exp2, __GI__ZGVbN2v_exp2, __redirect__ZGVbN2v_exp2)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core_sse4.S
new file mode 100644
index 0000000000..7388c242f6
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core_sse4.S
@@ -0,0 +1,325 @@
+/* Function exp2 vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   exp2(x)  = 2^n * T[j] * (1 + P(y))
+ *   where
+ *        x = m*(1/K) + y,    y in [-1/K..1/K]
+ *        m = n*K + j,           m,n,j - signed integer, j in [-K/2..K/2]
+ *
+ *        values of 2^(j/K) are tabulated
+ *
+ *        P(y) is a minimax polynomial approximation of exp2(x)-1
+ *        on small interval [-1/K..1/K]
+ *
+ *  Special cases:
+ *
+ *   exp2(NaN)  = NaN
+ *   exp2(+INF) = +INF
+ *   exp2(-INF) = 0
+ *   exp2(x)    = 1 for subnormals
+ *   For IEEE double
+ *     if x >= 1024.0 then exp2(x) overflows
+ *     if x < -1076.0 then exp2(x) underflows
+ *
+ */
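
A scalar C sketch of the same scheme, for illustration only: it assumes
K = 128, uses Taylor terms in ln 2 in place of the minimax dPC1..dPC4
coefficients defined below, and computes on the fly the 2^(j/K) values
that the real code reads from the _dbT table.

  #include <math.h>
  #include <stdint.h>

  static double
  exp2_sketch (double x)
  {
    /* Normal-range inputs only; out-of-range lanes take the scalar
       special-value path in the assembly.  */
    const int K = 128;
    double m = nearbyint (x * K);      /* done via the _dbShifter add/sub trick */
    double r = x - m / K;              /* reduced argument, |r| <= 1/(2K) */
    int64_t mi = (int64_t) m;
    int64_t n = mi >> 7;               /* exponent part, mi / K */
    int64_t j = mi & (K - 1);          /* table index; _dbT[j] holds 2^(j/K) */
    double t = exp2 ((double) j / K);  /* stands in for the _dbT lookup */
    const double l = 0x1.62e42fefa39efp-1;  /* ln 2 */
    double p = r * (l + r * (l*l/2 + r * (l*l*l/6 + r * l*l*l*l/24)));
    return ldexp (t + t * p, (int) n); /* 2^n * T[j] * (1 + P(r)) */
  }
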
+
+/* Offsets for data table __svml_dexp2_data_internal
+ */
+#define _dbT                          	0
+#define _dbShifter                    	1024
+#define _dPC1                         	1040
+#define _dPC2                         	1056
+#define _dPC3                         	1072
+#define _dPC4                         	1088
+#define _lIndexMask                   	1104
+#define _iAbsMask                     	1120
+#define _iDomainRange                 	1136
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN2v_exp2_sse4)
+        subq      $72, %rsp
+        cfi_def_cfa_offset(80)
+
+/*  R  */
+        movaps    %xmm0, %xmm7
+        movups    _dbShifter+__svml_dexp2_data_internal(%rip), %xmm1
+
+/* out, basePtr, iIndex, iBaseOfs, iSize, iGran, iOfs */
+        lea       __svml_dexp2_data_internal(%rip), %rsi
+
+/*  Load argument  */
+        movaps    %xmm1, %xmm10
+        addpd     %xmm0, %xmm10
+        movaps    %xmm10, %xmm6
+        subpd     %xmm1, %xmm6
+        subpd     %xmm6, %xmm7
+
+/*
+ *  Polynomial
+ * poly(dN) = a1*dR+...+a4*dR^4
+ */
+        movups    _dPC4+__svml_dexp2_data_internal(%rip), %xmm8
+        mulpd     %xmm7, %xmm8
+        addpd     _dPC3+__svml_dexp2_data_internal(%rip), %xmm8
+        mulpd     %xmm7, %xmm8
+        addpd     _dPC2+__svml_dexp2_data_internal(%rip), %xmm8
+        movdqu    _lIndexMask+__svml_dexp2_data_internal(%rip), %xmm9
+
+/*  Index and lookup  */
+        movdqa    %xmm9, %xmm5
+        pandn     %xmm10, %xmm9
+        pand      %xmm10, %xmm5
+
+/*  2^N  */
+        psllq     $45, %xmm9
+        movd      %xmm5, %eax
+        movq      _iAbsMask+__svml_dexp2_data_internal(%rip), %xmm2
+
+/* Check for overflow/underflow  */
+        pshufd    $221, %xmm0, %xmm4
+        pextrw    $4, %xmm5, %ecx
+
+/* a1+...+a4*dR^3 ! */
+        mulpd     %xmm7, %xmm8
+        shll      $3, %eax
+        pand      %xmm2, %xmm4
+        shll      $3, %ecx
+        movq      (%rsi,%rax), %xmm1
+        movhpd    (%rsi,%rcx), %xmm1
+
+/* dR=dR*dT */
+        mulpd     %xmm1, %xmm7
+        addpd     _dPC1+__svml_dexp2_data_internal(%rip), %xmm8
+
+/*
+ *  Reconstruction
+ * exp2 = {2^N later}*(Tj+Tj*poly)
+ * dN = dT+dT*dR*(a1+...+a4*dR^3)
+ */
+        mulpd     %xmm7, %xmm8
+        addpd     %xmm8, %xmm1
+        movq      _iDomainRange+__svml_dexp2_data_internal(%rip), %xmm3
+        pcmpgtd   %xmm3, %xmm4
+        movmskps  %xmm4, %edx
+
+/* quick 2^N */
+        paddq     %xmm9, %xmm1
+        andl      $3, %edx
+
+/*  Finish   */
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm1
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movaps    %xmm1, %xmm0
+        addq      $72, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(80)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm0, 32(%rsp)
+        movups    %xmm1, 48(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 edx xmm1
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -64)
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -72)
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $2, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm1
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -64)
+        cfi_offset(13, -72)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm1
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      exp2@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 48(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN2v_exp2_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_dexp2_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(16)) VUINT32 _dbT[(1<<7)][2];
+        __declspec(align(16)) VUINT32 _dbShifter[2][2];
+        __declspec(align(16)) VUINT32 _dPC1[2][2];
+        __declspec(align(16)) VUINT32 _dPC2[2][2];
+        __declspec(align(16)) VUINT32 _dPC3[2][2];
+        __declspec(align(16)) VUINT32 _dPC4[2][2];
+        __declspec(align(16)) VUINT32 _lIndexMask[2][2];
+        __declspec(align(16)) VUINT32 _iAbsMask[4][1];
+        __declspec(align(16)) VUINT32 _iDomainRange[4][1];
+} __svml_dexp2_data_internal;
+#endif
+__svml_dexp2_data_internal:
+        /*== _dbT ==*/
+        .quad 0x3ff0000000000000, 0x3ff0163da9fb3335   /*2^( 0 /128),2^( 1 /128)*/
+        .quad 0x3ff02c9a3e778061, 0x3ff04315e86e7f85   /*2^( 2 /128),2^( 3 /128)*/
+        .quad 0x3ff059b0d3158574, 0x3ff0706b29ddf6de   /*2^( 4 /128),2^( 5 /128)*/
+        .quad 0x3ff0874518759bc8, 0x3ff09e3ecac6f383   /*2^( 6 /128),2^( 7 /128)*/
+        .quad 0x3ff0b5586cf9890f, 0x3ff0cc922b7247f7   /*2^( 8 /128),2^( 9 /128)*/
+        .quad 0x3ff0e3ec32d3d1a2, 0x3ff0fb66affed31b   /*2^( 10 /128),2^( 11 /128)*/
+        .quad 0x3ff11301d0125b51, 0x3ff12abdc06c31cc   /*2^( 12 /128),2^( 13 /128)*/
+        .quad 0x3ff1429aaea92de0, 0x3ff15a98c8a58e51   /*2^( 14 /128),2^( 15 /128)*/
+        .quad 0x3ff172b83c7d517b, 0x3ff18af9388c8dea   /*2^( 16 /128),2^( 17 /128)*/
+        .quad 0x3ff1a35beb6fcb75, 0x3ff1bbe084045cd4   /*2^( 18 /128),2^( 19 /128)*/
+        .quad 0x3ff1d4873168b9aa, 0x3ff1ed5022fcd91d   /*2^( 20 /128),2^( 21 /128)*/
+        .quad 0x3ff2063b88628cd6, 0x3ff21f49917ddc96   /*2^( 22 /128),2^( 23 /128)*/
+        .quad 0x3ff2387a6e756238, 0x3ff251ce4fb2a63f   /*2^( 24 /128),2^( 25 /128)*/
+        .quad 0x3ff26b4565e27cdd, 0x3ff284dfe1f56381   /*2^( 26 /128),2^( 27 /128)*/
+        .quad 0x3ff29e9df51fdee1, 0x3ff2b87fd0dad990   /*2^( 28 /128),2^( 29 /128)*/
+        .quad 0x3ff2d285a6e4030b, 0x3ff2ecafa93e2f56   /*2^( 30 /128),2^( 31 /128)*/
+        .quad 0x3ff306fe0a31b715, 0x3ff32170fc4cd831   /*2^( 32 /128),2^( 33 /128)*/
+        .quad 0x3ff33c08b26416ff, 0x3ff356c55f929ff1   /*2^( 34 /128),2^( 35 /128)*/
+        .quad 0x3ff371a7373aa9cb, 0x3ff38cae6d05d866   /*2^( 36 /128),2^( 37 /128)*/
+        .quad 0x3ff3a7db34e59ff7, 0x3ff3c32dc313a8e5   /*2^( 38 /128),2^( 39 /128)*/
+        .quad 0x3ff3dea64c123422, 0x3ff3fa4504ac801c   /*2^( 40 /128),2^( 41 /128)*/
+        .quad 0x3ff4160a21f72e2a, 0x3ff431f5d950a897   /*2^( 42 /128),2^( 43 /128)*/
+        .quad 0x3ff44e086061892d, 0x3ff46a41ed1d0057   /*2^( 44 /128),2^( 45 /128)*/
+        .quad 0x3ff486a2b5c13cd0, 0x3ff4a32af0d7d3de   /*2^( 46 /128),2^( 47 /128)*/
+        .quad 0x3ff4bfdad5362a27, 0x3ff4dcb299fddd0d   /*2^( 48 /128),2^( 49 /128)*/
+        .quad 0x3ff4f9b2769d2ca7, 0x3ff516daa2cf6642   /*2^( 50 /128),2^( 51 /128)*/
+        .quad 0x3ff5342b569d4f82, 0x3ff551a4ca5d920f   /*2^( 52 /128),2^( 53 /128)*/
+        .quad 0x3ff56f4736b527da, 0x3ff58d12d497c7fd   /*2^( 54 /128),2^( 55 /128)*/
+        .quad 0x3ff5ab07dd485429, 0x3ff5c9268a5946b7   /*2^( 56 /128),2^( 57 /128)*/
+        .quad 0x3ff5e76f15ad2148, 0x3ff605e1b976dc09   /*2^( 58 /128),2^( 59 /128)*/
+        .quad 0x3ff6247eb03a5585, 0x3ff6434634ccc320   /*2^( 60 /128),2^( 61 /128)*/
+        .quad 0x3ff6623882552225, 0x3ff68155d44ca973   /*2^( 62 /128),2^( 63 /128)*/
+        .quad 0x3ff6a09e667f3bcd, 0x3ff6c012750bdabf   /*2^( 64 /128),2^( 65 /128)*/
+        .quad 0x3ff6dfb23c651a2f, 0x3ff6ff7df9519484   /*2^( 66 /128),2^( 67 /128)*/
+        .quad 0x3ff71f75e8ec5f74, 0x3ff73f9a48a58174   /*2^( 68 /128),2^( 69 /128)*/
+        .quad 0x3ff75feb564267c9, 0x3ff780694fde5d3f   /*2^( 70 /128),2^( 71 /128)*/
+        .quad 0x3ff7a11473eb0187, 0x3ff7c1ed0130c132   /*2^( 72 /128),2^( 73 /128)*/
+        .quad 0x3ff7e2f336cf4e62, 0x3ff80427543e1a12   /*2^( 74 /128),2^( 75 /128)*/
+        .quad 0x3ff82589994cce13, 0x3ff8471a4623c7ad   /*2^( 76 /128),2^( 77 /128)*/
+        .quad 0x3ff868d99b4492ed, 0x3ff88ac7d98a6699   /*2^( 78 /128),2^( 79 /128)*/
+        .quad 0x3ff8ace5422aa0db, 0x3ff8cf3216b5448c   /*2^( 80 /128),2^( 81 /128)*/
+        .quad 0x3ff8f1ae99157736, 0x3ff9145b0b91ffc6   /*2^( 82 /128),2^( 83 /128)*/
+        .quad 0x3ff93737b0cdc5e5, 0x3ff95a44cbc8520f   /*2^( 84 /128),2^( 85 /128)*/
+        .quad 0x3ff97d829fde4e50, 0x3ff9a0f170ca07ba   /*2^( 86 /128),2^( 87 /128)*/
+        .quad 0x3ff9c49182a3f090, 0x3ff9e86319e32323   /*2^( 88 /128),2^( 89 /128)*/
+        .quad 0x3ffa0c667b5de565, 0x3ffa309bec4a2d33   /*2^( 90 /128),2^( 91 /128)*/
+        .quad 0x3ffa5503b23e255d, 0x3ffa799e1330b358   /*2^( 92 /128),2^( 93 /128)*/
+        .quad 0x3ffa9e6b5579fdbf, 0x3ffac36bbfd3f37a   /*2^( 94 /128),2^( 95 /128)*/
+        .quad 0x3ffae89f995ad3ad, 0x3ffb0e07298db666   /*2^( 96 /128),2^( 97 /128)*/
+        .quad 0x3ffb33a2b84f15fb, 0x3ffb59728de5593a   /*2^( 98 /128),2^( 99 /128)*/
+        .quad 0x3ffb7f76f2fb5e47, 0x3ffba5b030a1064a   /*2^( 100 /128),2^( 101 /128)*/
+        .quad 0x3ffbcc1e904bc1d2, 0x3ffbf2c25bd71e09   /*2^( 102 /128),2^( 103 /128)*/
+        .quad 0x3ffc199bdd85529c, 0x3ffc40ab5fffd07a   /*2^( 104 /128),2^( 105 /128)*/
+        .quad 0x3ffc67f12e57d14b, 0x3ffc8f6d9406e7b5   /*2^( 106 /128),2^( 107 /128)*/
+        .quad 0x3ffcb720dcef9069, 0x3ffcdf0b555dc3fa   /*2^( 108 /128),2^( 109 /128)*/
+        .quad 0x3ffd072d4a07897c, 0x3ffd2f87080d89f2   /*2^( 110 /128),2^( 111 /128)*/
+        .quad 0x3ffd5818dcfba487, 0x3ffd80e316c98398   /*2^( 112 /128),2^( 113 /128)*/
+        .quad 0x3ffda9e603db3285, 0x3ffdd321f301b460   /*2^( 114 /128),2^( 115 /128)*/
+        .quad 0x3ffdfc97337b9b5f, 0x3ffe264614f5a129   /*2^( 116 /128),2^( 117 /128)*/
+        .quad 0x3ffe502ee78b3ff6, 0x3ffe7a51fbc74c83   /*2^( 118 /128),2^( 119 /128)*/
+        .quad 0x3ffea4afa2a490da, 0x3ffecf482d8e67f1   /*2^( 120 /128),2^( 121 /128)*/
+        .quad 0x3ffefa1bee615a27, 0x3fff252b376bba97   /*2^( 122 /128),2^( 123 /128)*/
+        .quad 0x3fff50765b6e4540, 0x3fff7bfdad9cbe14   /*2^( 124 /128),2^( 125 /128)*/
+        .quad 0x3fffa7c1819e90d8, 0x3fffd3c22b8f71f1 /*2^( 126 /128),2^( 127 /128)*/
+        .align 16
+        .quad 0x42c8000000000000, 0x42c8000000000000  /* _dbShifter - 0x433-7=0x42c shifted right on K!*/
+        //log2(relerr) = -53.547756365162
+        .align 16
+        .quad 0x3fe62e42fefa3685, 0x3fe62e42fefa3685 /* _dPC1 */
+        .align 16
+        .quad 0x3fcebfbdff82ca48, 0x3fcebfbdff82ca48 /* _dPC2 */
+        .align 16
+        .quad 0x3fac6b09b180f045, 0x3fac6b09b180f045 /* _dPC3 */
+        .align 16
+        .quad 0x3f83b2ab5bb1268f, 0x3f83b2ab5bb1268f /* _dPC4 */
+        .align 16
+        .quad 0x000000000000007f, 0x000000000000007f          /* _lIndexMask =(2^K-1)*/
+        .align 16
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff          /* _iAbsMask */
+        .align 16
+        .long 0x408fefff, 0x408fefff, 0x408fefff, 0x408fefff /* _iDomainRange */
+        .align 16
+        .type	__svml_dexp2_data_internal,@object
+        .size	__svml_dexp2_data_internal,.-__svml_dexp2_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core-sse.S
new file mode 100644
index 0000000000..51c5de1100
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized exp2, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN4v_exp2 _ZGVdN4v_exp2_sse_wrapper
+#include "../svml_d_exp24_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core.c
new file mode 100644
index 0000000000..bb979afde6
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized exp2, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN4v_exp2
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN4v_exp2, __GI__ZGVdN4v_exp2, __redirect__ZGVdN4v_exp2)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core_avx2.S
new file mode 100644
index 0000000000..6aaadafeeb
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core_avx2.S
@@ -0,0 +1,341 @@
+/* Function exp2 vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   exp2(x)  = 2^n * T[j] * (1 + P(y))
+ *   where
+ *        x = m*(1/K) + y,    y in [-1/K..1/K]
+ *        m = n*K + j,           m,n,j - signed integer, j in [-K/2..K/2]
+ *
+ *        values of 2^(j/K) are tabulated
+ *
+ *        P(y) is a minimax polynomial approximation of exp2(x)-1
+ *        on small interval [-1/K..1/K]
+ *
+ *  Special cases:
+ *
+ *   exp2(NaN)  = NaN
+ *   exp2(+INF) = +INF
+ *   exp2(-INF) = 0
+ *   exp2(x)    = 1 for subnormals
+ *   For IEEE double
+ *     if x >= 1024.0 then exp2(x) overflows
+ *     if x < -1076.0 then exp2(x) underflows
+ *
+ */
+
+/* Offsets for data table __svml_dexp2_data_internal
+ */
+#define _dbT                          	0
+#define _dbShifter                    	1024
+#define _dPC1                         	1056
+#define _dPC2                         	1088
+#define _dPC3                         	1120
+#define _dPC4                         	1152
+#define _lIndexMask                   	1184
+#define _iAbsMask                     	1216
+#define _iDomainRange                 	1248
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN4v_exp2_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+
+/* out, basePtr, iIndex, iBaseOfs, iSize, iGran, iOfs */
+        lea       __svml_dexp2_data_internal(%rip), %r8
+        vmovupd   _dbShifter+__svml_dexp2_data_internal(%rip), %ymm4
+        vmovupd   _lIndexMask+__svml_dexp2_data_internal(%rip), %ymm3
+        vmovapd   %ymm0, %ymm1
+
+/*  Load argument  */
+        vaddpd    %ymm4, %ymm1, %ymm2
+        vsubpd    %ymm4, %ymm2, %ymm0
+
+/*  Index and lookup  */
+        vandps    %ymm3, %ymm2, %ymm9
+        vpandn    %ymm2, %ymm3, %ymm2
+
+/*  2^N  */
+        vpsllq    $45, %ymm2, %ymm3
+
+/*  R  */
+        vsubpd    %ymm0, %ymm1, %ymm15
+
+/* Check for overflow/underflow  */
+        vextractf128 $1, %ymm1, %xmm5
+
+/*
+ *  Polynomial
+ * poly(dN) = a1*dR+...+a4*dR^4
+ */
+        vmovupd   _dPC4+__svml_dexp2_data_internal(%rip), %ymm0
+        vshufps   $221, %xmm5, %xmm1, %xmm6
+        vandps    _iAbsMask+__svml_dexp2_data_internal(%rip), %xmm6, %xmm7
+        vpcmpgtd  _iDomainRange+__svml_dexp2_data_internal(%rip), %xmm7, %xmm8
+        vfmadd213pd _dPC3+__svml_dexp2_data_internal(%rip), %ymm15, %ymm0
+        vmovmskps %xmm8, %eax
+        vfmadd213pd _dPC2+__svml_dexp2_data_internal(%rip), %ymm15, %ymm0
+
+/* a1+...+a4*dR^3 ! */
+        vfmadd213pd _dPC1+__svml_dexp2_data_internal(%rip), %ymm15, %ymm0
+        vextractf128 $1, %ymm9, %xmm12
+        vmovd     %xmm9, %edx
+        vmovd     %xmm12, %esi
+        shll      $3, %edx
+        vpextrd   $2, %xmm9, %ecx
+        shll      $3, %esi
+        vpextrd   $2, %xmm12, %edi
+        shll      $3, %ecx
+        vmovq     (%r8,%rdx), %xmm10
+        shll      $3, %edi
+        vmovq     (%r8,%rsi), %xmm13
+        vmovhpd   (%r8,%rcx), %xmm10, %xmm11
+        vmovhpd   (%r8,%rdi), %xmm13, %xmm14
+        vinsertf128 $1, %xmm14, %ymm11, %ymm4
+
+/* dR=dR*dT */
+        vmulpd    %ymm15, %ymm4, %ymm15
+
+/*
+ *  Reconstruction
+ * exp2 = {2^N later}*(Tj+Tj*poly)
+ * dN = dT+dT*dR*(a1+...+a4*dR^3)
+ */
+        vfmadd213pd %ymm4, %ymm15, %ymm0
+
+/* quick 2^N */
+        vpaddq    %ymm3, %ymm0, %ymm0
+
+/*  Finish   */
+        testl     %eax, %eax
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm1
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovupd   %ymm1, 32(%rsp)
+        vmovupd   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0
+
+        xorl      %edx, %edx
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovupd   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      exp2@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 64(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN4v_exp2_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_dexp2_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(32)) VUINT32 _dbT[(1<<7)][2];
+        __declspec(align(32)) VUINT32 _dbShifter[4][2];
+        __declspec(align(32)) VUINT32 _dPC1[4][2];
+        __declspec(align(32)) VUINT32 _dPC2[4][2];
+        __declspec(align(32)) VUINT32 _dPC3[4][2];
+        __declspec(align(32)) VUINT32 _dPC4[4][2];
+        __declspec(align(32)) VUINT32 _lIndexMask[4][2];
+        __declspec(align(32)) VUINT32 _iAbsMask[8][1];
+        __declspec(align(32)) VUINT32 _iDomainRange[8][1];
+} __svml_dexp2_data_internal;
+#endif
+__svml_dexp2_data_internal:
+        /*== _dbT ==*/
+        .quad 0x3ff0000000000000, 0x3ff0163da9fb3335   /*2^( 0 /128),2^( 1 /128)*/
+        .quad 0x3ff02c9a3e778061, 0x3ff04315e86e7f85   /*2^( 2 /128),2^( 3 /128)*/
+        .quad 0x3ff059b0d3158574, 0x3ff0706b29ddf6de   /*2^( 4 /128),2^( 5 /128)*/
+        .quad 0x3ff0874518759bc8, 0x3ff09e3ecac6f383   /*2^( 6 /128),2^( 7 /128)*/
+        .quad 0x3ff0b5586cf9890f, 0x3ff0cc922b7247f7   /*2^( 8 /128),2^( 9 /128)*/
+        .quad 0x3ff0e3ec32d3d1a2, 0x3ff0fb66affed31b   /*2^( 10 /128),2^( 11 /128)*/
+        .quad 0x3ff11301d0125b51, 0x3ff12abdc06c31cc   /*2^( 12 /128),2^( 13 /128)*/
+        .quad 0x3ff1429aaea92de0, 0x3ff15a98c8a58e51   /*2^( 14 /128),2^( 15 /128)*/
+        .quad 0x3ff172b83c7d517b, 0x3ff18af9388c8dea   /*2^( 16 /128),2^( 17 /128)*/
+        .quad 0x3ff1a35beb6fcb75, 0x3ff1bbe084045cd4   /*2^( 18 /128),2^( 19 /128)*/
+        .quad 0x3ff1d4873168b9aa, 0x3ff1ed5022fcd91d   /*2^( 20 /128),2^( 21 /128)*/
+        .quad 0x3ff2063b88628cd6, 0x3ff21f49917ddc96   /*2^( 22 /128),2^( 23 /128)*/
+        .quad 0x3ff2387a6e756238, 0x3ff251ce4fb2a63f   /*2^( 24 /128),2^( 25 /128)*/
+        .quad 0x3ff26b4565e27cdd, 0x3ff284dfe1f56381   /*2^( 26 /128),2^( 27 /128)*/
+        .quad 0x3ff29e9df51fdee1, 0x3ff2b87fd0dad990   /*2^( 28 /128),2^( 29 /128)*/
+        .quad 0x3ff2d285a6e4030b, 0x3ff2ecafa93e2f56   /*2^( 30 /128),2^( 31 /128)*/
+        .quad 0x3ff306fe0a31b715, 0x3ff32170fc4cd831   /*2^( 32 /128),2^( 33 /128)*/
+        .quad 0x3ff33c08b26416ff, 0x3ff356c55f929ff1   /*2^( 34 /128),2^( 35 /128)*/
+        .quad 0x3ff371a7373aa9cb, 0x3ff38cae6d05d866   /*2^( 36 /128),2^( 37 /128)*/
+        .quad 0x3ff3a7db34e59ff7, 0x3ff3c32dc313a8e5   /*2^( 38 /128),2^( 39 /128)*/
+        .quad 0x3ff3dea64c123422, 0x3ff3fa4504ac801c   /*2^( 40 /128),2^( 41 /128)*/
+        .quad 0x3ff4160a21f72e2a, 0x3ff431f5d950a897   /*2^( 42 /128),2^( 43 /128)*/
+        .quad 0x3ff44e086061892d, 0x3ff46a41ed1d0057   /*2^( 44 /128),2^( 45 /128)*/
+        .quad 0x3ff486a2b5c13cd0, 0x3ff4a32af0d7d3de   /*2^( 46 /128),2^( 47 /128)*/
+        .quad 0x3ff4bfdad5362a27, 0x3ff4dcb299fddd0d   /*2^( 48 /128),2^( 49 /128)*/
+        .quad 0x3ff4f9b2769d2ca7, 0x3ff516daa2cf6642   /*2^( 50 /128),2^( 51 /128)*/
+        .quad 0x3ff5342b569d4f82, 0x3ff551a4ca5d920f   /*2^( 52 /128),2^( 53 /128)*/
+        .quad 0x3ff56f4736b527da, 0x3ff58d12d497c7fd   /*2^( 54 /128),2^( 55 /128)*/
+        .quad 0x3ff5ab07dd485429, 0x3ff5c9268a5946b7   /*2^( 56 /128),2^( 57 /128)*/
+        .quad 0x3ff5e76f15ad2148, 0x3ff605e1b976dc09   /*2^( 58 /128),2^( 59 /128)*/
+        .quad 0x3ff6247eb03a5585, 0x3ff6434634ccc320   /*2^( 60 /128),2^( 61 /128)*/
+        .quad 0x3ff6623882552225, 0x3ff68155d44ca973   /*2^( 62 /128),2^( 63 /128)*/
+        .quad 0x3ff6a09e667f3bcd, 0x3ff6c012750bdabf   /*2^( 64 /128),2^( 65 /128)*/
+        .quad 0x3ff6dfb23c651a2f, 0x3ff6ff7df9519484   /*2^( 66 /128),2^( 67 /128)*/
+        .quad 0x3ff71f75e8ec5f74, 0x3ff73f9a48a58174   /*2^( 68 /128),2^( 69 /128)*/
+        .quad 0x3ff75feb564267c9, 0x3ff780694fde5d3f   /*2^( 70 /128),2^( 71 /128)*/
+        .quad 0x3ff7a11473eb0187, 0x3ff7c1ed0130c132   /*2^( 72 /128),2^( 73 /128)*/
+        .quad 0x3ff7e2f336cf4e62, 0x3ff80427543e1a12   /*2^( 74 /128),2^( 75 /128)*/
+        .quad 0x3ff82589994cce13, 0x3ff8471a4623c7ad   /*2^( 76 /128),2^( 77 /128)*/
+        .quad 0x3ff868d99b4492ed, 0x3ff88ac7d98a6699   /*2^( 78 /128),2^( 79 /128)*/
+        .quad 0x3ff8ace5422aa0db, 0x3ff8cf3216b5448c   /*2^( 80 /128),2^( 81 /128)*/
+        .quad 0x3ff8f1ae99157736, 0x3ff9145b0b91ffc6   /*2^( 82 /128),2^( 83 /128)*/
+        .quad 0x3ff93737b0cdc5e5, 0x3ff95a44cbc8520f   /*2^( 84 /128),2^( 85 /128)*/
+        .quad 0x3ff97d829fde4e50, 0x3ff9a0f170ca07ba   /*2^( 86 /128),2^( 87 /128)*/
+        .quad 0x3ff9c49182a3f090, 0x3ff9e86319e32323   /*2^( 88 /128),2^( 89 /128)*/
+        .quad 0x3ffa0c667b5de565, 0x3ffa309bec4a2d33   /*2^( 90 /128),2^( 91 /128)*/
+        .quad 0x3ffa5503b23e255d, 0x3ffa799e1330b358   /*2^( 92 /128),2^( 93 /128)*/
+        .quad 0x3ffa9e6b5579fdbf, 0x3ffac36bbfd3f37a   /*2^( 94 /128),2^( 95 /128)*/
+        .quad 0x3ffae89f995ad3ad, 0x3ffb0e07298db666   /*2^( 96 /128),2^( 97 /128)*/
+        .quad 0x3ffb33a2b84f15fb, 0x3ffb59728de5593a   /*2^( 98 /128),2^( 99 /128)*/
+        .quad 0x3ffb7f76f2fb5e47, 0x3ffba5b030a1064a   /*2^( 100 /128),2^( 101 /128)*/
+        .quad 0x3ffbcc1e904bc1d2, 0x3ffbf2c25bd71e09   /*2^( 102 /128),2^( 103 /128)*/
+        .quad 0x3ffc199bdd85529c, 0x3ffc40ab5fffd07a   /*2^( 104 /128),2^( 105 /128)*/
+        .quad 0x3ffc67f12e57d14b, 0x3ffc8f6d9406e7b5   /*2^( 106 /128),2^( 107 /128)*/
+        .quad 0x3ffcb720dcef9069, 0x3ffcdf0b555dc3fa   /*2^( 108 /128),2^( 109 /128)*/
+        .quad 0x3ffd072d4a07897c, 0x3ffd2f87080d89f2   /*2^( 110 /128),2^( 111 /128)*/
+        .quad 0x3ffd5818dcfba487, 0x3ffd80e316c98398   /*2^( 112 /128),2^( 113 /128)*/
+        .quad 0x3ffda9e603db3285, 0x3ffdd321f301b460   /*2^( 114 /128),2^( 115 /128)*/
+        .quad 0x3ffdfc97337b9b5f, 0x3ffe264614f5a129   /*2^( 116 /128),2^( 117 /128)*/
+        .quad 0x3ffe502ee78b3ff6, 0x3ffe7a51fbc74c83   /*2^( 118 /128),2^( 119 /128)*/
+        .quad 0x3ffea4afa2a490da, 0x3ffecf482d8e67f1   /*2^( 120 /128),2^( 121 /128)*/
+        .quad 0x3ffefa1bee615a27, 0x3fff252b376bba97   /*2^( 122 /128),2^( 123 /128)*/
+        .quad 0x3fff50765b6e4540, 0x3fff7bfdad9cbe14   /*2^( 124 /128),2^( 125 /128)*/
+        .quad 0x3fffa7c1819e90d8, 0x3fffd3c22b8f71f1 /*2^( 126 /128),2^( 127 /128)*/
+        .align 32
+        .quad 0x42c8000000000000, 0x42c8000000000000, 0x42c8000000000000, 0x42c8000000000000  /* _dbShifter - 0x433-7=0x42c shifted right on K!*/
+        //log2(relerr) = -53.547756365162
+        .align 32
+        .quad 0x3fe62e42fefa3685, 0x3fe62e42fefa3685, 0x3fe62e42fefa3685, 0x3fe62e42fefa3685 /* _dPC1 */
+        .align 32
+        .quad 0x3fcebfbdff82ca48, 0x3fcebfbdff82ca48, 0x3fcebfbdff82ca48, 0x3fcebfbdff82ca48 /* _dPC2 */
+        .align 32
+        .quad 0x3fac6b09b180f045, 0x3fac6b09b180f045, 0x3fac6b09b180f045, 0x3fac6b09b180f045 /* _dPC3 */
+        .align 32
+        .quad 0x3f83b2ab5bb1268f, 0x3f83b2ab5bb1268f, 0x3f83b2ab5bb1268f, 0x3f83b2ab5bb1268f /* _dPC4 */
+        .align 32
+        .quad 0x000000000000007f, 0x000000000000007f, 0x000000000000007f, 0x000000000000007f          /* _lIndexMask =(2^K-1)*/
+        .align 32
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff          /* _iAbsMask */
+        .align 32
+        .long 0x408fefff, 0x408fefff, 0x408fefff, 0x408fefff, 0x408fefff, 0x408fefff, 0x408fefff, 0x408fefff /* _iDomainRange */
+        .align 32
+        .type	__svml_dexp2_data_internal,@object
+        .size	__svml_dexp2_data_internal,.-__svml_dexp2_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core-avx2.S
new file mode 100644
index 0000000000..c9c17f0aaa
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized exp2, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN8v_exp2 _ZGVeN8v_exp2_avx2_wrapper
+#include "../svml_d_exp28_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core.c
new file mode 100644
index 0000000000..3be9e88e98
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized exp2, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN8v_exp2
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN8v_exp2, __GI__ZGVeN8v_exp2, __redirect__ZGVeN8v_exp2)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core_avx512.S
new file mode 100644
index 0000000000..90f21695f0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core_avx512.S
@@ -0,0 +1,301 @@
+/* Function exp2 vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *     Double precision mantissa represented as: 1.b1b2b3 ... b52
+ *     Constant for double precision: S = 2^48 x 1.5
+ *
+ *     2^X = 2^Xo  x  2^{X-Xo}
+ *     2^X = 2^K  x  2^fo  x  2^{X-Xo}
+ *     2^X = 2^K  x  2^fo  x  2^r
+ *
+ *     2^K  --> Manual scaling
+ *     2^fo --> Table lookup
+ *     r    --> 1 + poly    (r = X - Xo)
+ *
+ *     Xo = K  +  fo
+ *     Xo = K  +  0.x1x2x3x4
+ *
+ *     r = X - Xo
+ *       = Vreduce(X, imm)
+ *       = X - VRndScale(X, imm),    where Xo = VRndScale(X, imm)
+ *
+ *     Rnd(S + X) = S + Xo,    where S is selected as S = 2^48 x 1.5
+ *         S + X = S + floor(X) + 0.x1x2x3x4
+ *     Rnd(S + X) = Rnd(2^48 x 1.5 + X)
+ *     (Note: 2^exp x 1.b1b2b3 ... b52,  2^{exp-52} = 2^-4 for exp=48)
+ *
+ *     exp2(x) =  2^K  x  2^fo  x (1 + poly(r)),   where 2^r = 1 + poly(r)
+ *
+ *     Scale back:
+ *     dest = src1 x 2^floor(src2)
+ *
+ *
+ */
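
A scalar C rendering of this reduction, for illustration only:
floor-based rounding to 4 fractional bits stands in for
VRNDSCALE/VREDUCE, exp2(i/16) stands in for the 16-entry Frac_PowerD0
table, Taylor terms in ln 2 stand in for poly_coeff1..poly_coeff6, and
ldexp stands in for the final VSCALEFPD.

  #include <math.h>

  static double
  exp2_avx512_sketch (double x)
  {
    /* Normal-range inputs only; special values go to the scalar path.  */
    double x0 = floor (x * 16.0) / 16.0;   /* Xo = VRndScale (x, 4 frac bits) */
    double r  = x - x0;                    /* r = VReduce (x), r in [0, 1/16) */
    int    k  = (int) floor (x);           /* 2^K, applied last via VSCALEFPD */
    int    i  = (int) ((x0 - k) * 16.0);   /* index of 2^fo in Frac_PowerD0 */
    double t  = exp2 (i / 16.0);           /* tabulated 2^(i/16) */
    const double l = 0x1.62e42fefa39efp-1; /* ln 2 */
    double p = l + r * (l*l/2 + r * (l*l*l/6 + r * (l*l*l*l/24
               + r * (l*l*l*l*l/120 + r * l*l*l*l*l*l/720))));
    return ldexp (t + t * r * p, k);       /* 2^K * 2^fo * (1 + r*P(r)) */
  }
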
+
+/* Offsets for data table __svml_dexp2_data_internal_avx512
+ */
+#define Frac_PowerD0                  	0
+#define poly_coeff1                   	128
+#define poly_coeff2                   	192
+#define poly_coeff3                   	256
+#define poly_coeff4                   	320
+#define poly_coeff5                   	384
+#define poly_coeff6                   	448
+#define add_const                     	512
+#define AbsMask                       	576
+#define Threshold                     	640
+#define _lIndexMask                   	704
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN8v_exp2_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovups   poly_coeff5+__svml_dexp2_data_internal_avx512(%rip), %zmm14
+        vmovups   poly_coeff6+__svml_dexp2_data_internal_avx512(%rip), %zmm6
+
+/*
+ * Reduced argument
+ * where VREDUCE is available
+ */
+        vreducepd $65, {sae}, %zmm0, %zmm10
+        vmovups   poly_coeff4+__svml_dexp2_data_internal_avx512(%rip), %zmm7
+        vmovups   add_const+__svml_dexp2_data_internal_avx512(%rip), %zmm3
+        vmovups   poly_coeff3+__svml_dexp2_data_internal_avx512(%rip), %zmm8
+        vmovups   __svml_dexp2_data_internal_avx512(%rip), %zmm13
+
+/* c6*r   + c5 */
+        vfmadd231pd {rn-sae}, %zmm10, %zmm6, %zmm14
+        vmovups   poly_coeff2+__svml_dexp2_data_internal_avx512(%rip), %zmm9
+        vmovups   Threshold+__svml_dexp2_data_internal_avx512(%rip), %zmm2
+
+/*
+ *
+ *  HA
+ * Variables and constants
+ * Load constants and vector(s)
+ */
+        vmovups   poly_coeff1+__svml_dexp2_data_internal_avx512(%rip), %zmm11
+
+/* c6*r^2 + c5*r + c4 */
+        vfmadd213pd {rn-sae}, %zmm7, %zmm10, %zmm14
+
+/*
+ * Integer form of K+0.b1b2b3b4 in lower bits - call K_plus_f0
+ * Mantissa of normalized double precision FP: 1.b1b2...b52
+ */
+        vaddpd    {rd-sae}, %zmm3, %zmm0, %zmm4
+        vandpd    AbsMask+__svml_dexp2_data_internal_avx512(%rip), %zmm0, %zmm1
+
+/* c6*r^3 + c5*r^2 + c4*r + c3 */
+        vfmadd213pd {rn-sae}, %zmm8, %zmm10, %zmm14
+        vcmppd    $29, {sae}, %zmm2, %zmm1, %k0
+
+/* c6*r^4 + c5*r^3 + c4*r^2 + c3*r + c2 */
+        vfmadd213pd {rn-sae}, %zmm9, %zmm10, %zmm14
+        kmovw     %k0, %edx
+
+/* c6*r^5 + c5*r^4 + c4*r^3 + c3*r^2 + c2*r + c1 */
+        vfmadd213pd {rn-sae}, %zmm11, %zmm10, %zmm14
+
+/* Table value: 2^(0.b1b2b3b4) */
+        vpandq    _lIndexMask+__svml_dexp2_data_internal_avx512(%rip), %zmm4, %zmm5
+        vpermt2pd Frac_PowerD0+64+__svml_dexp2_data_internal_avx512(%rip), %zmm5, %zmm13
+
+/* T*r */
+        vmulpd    {rn-sae}, %zmm10, %zmm13, %zmm12
+
+/* T + (T*r*(c6*r^5 + c5*r^4 + c4*r^3 + c3*r^2 + c2*r + c1)) */
+        vfmadd213pd {rn-sae}, %zmm13, %zmm12, %zmm14
+
+/* Scaling placed at the end to avoid accuracy loss when T*r*scale underflows */
+        vscalefpd {rn-sae}, %zmm0, %zmm14, %zmm1
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        vmovaps   %zmm1, %zmm0
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm0, 64(%rsp)
+        vmovups   %zmm1, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm1
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm1
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm1
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     64(%rsp,%r14,8), %xmm0
+        call      exp2@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 128(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN8v_exp2_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_dexp2_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 Frac_PowerD0[16][2];
+        __declspec(align(64)) VUINT32 poly_coeff1[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
+        __declspec(align(64)) VUINT32 add_const[8][2];
+        __declspec(align(64)) VUINT32 AbsMask[8][2];
+        __declspec(align(64)) VUINT32 Threshold[8][2];
+        __declspec(align(64)) VUINT32 _lIndexMask[8][2];
+} __svml_dexp2_data_internal_avx512;
+#endif
+__svml_dexp2_data_internal_avx512:
+        /*== Frac_PowerD0 ==*/
+        .quad 0x3FF0000000000000
+        .quad 0x3FF0B5586CF9890F
+        .quad 0x3FF172B83C7D517B
+        .quad 0x3FF2387A6E756238
+        .quad 0x3FF306FE0A31B715
+        .quad 0x3FF3DEA64C123422
+        .quad 0x3FF4BFDAD5362A27
+        .quad 0x3FF5AB07DD485429
+        .quad 0x3FF6A09E667F3BCD
+        .quad 0x3FF7A11473EB0187
+        .quad 0x3FF8ACE5422AA0DB
+        .quad 0x3FF9C49182A3F090
+        .quad 0x3FFAE89F995AD3AD
+        .quad 0x3FFC199BDD85529C
+        .quad 0x3FFD5818DCFBA487
+        .quad 0x3FFEA4AFA2A490DA
+        .align 64
+        .quad 0x3FE62E42FEFA398B, 0x3FE62E42FEFA398B, 0x3FE62E42FEFA398B, 0x3FE62E42FEFA398B, 0x3FE62E42FEFA398B, 0x3FE62E42FEFA398B, 0x3FE62E42FEFA398B, 0x3FE62E42FEFA398B  /*== poly_coeff1 ==*/
+        .align 64
+        .quad 0x3FCEBFBDFF84555A, 0x3FCEBFBDFF84555A, 0x3FCEBFBDFF84555A, 0x3FCEBFBDFF84555A, 0x3FCEBFBDFF84555A, 0x3FCEBFBDFF84555A, 0x3FCEBFBDFF84555A, 0x3FCEBFBDFF84555A  /*== poly_coeff2 ==*/
+        .align 64
+        .quad 0x3FAC6B08D4AD86B9, 0x3FAC6B08D4AD86B9, 0x3FAC6B08D4AD86B9, 0x3FAC6B08D4AD86B9, 0x3FAC6B08D4AD86B9, 0x3FAC6B08D4AD86B9, 0x3FAC6B08D4AD86B9, 0x3FAC6B08D4AD86B9  /*== poly_coeff3 ==*/
+        .align 64
+        .quad 0x3F83B2AD1B172252, 0x3F83B2AD1B172252, 0x3F83B2AD1B172252, 0x3F83B2AD1B172252, 0x3F83B2AD1B172252, 0x3F83B2AD1B172252, 0x3F83B2AD1B172252, 0x3F83B2AD1B172252  /*== poly_coeff4 ==*/
+        .align 64
+        .quad 0x3F55D7472713CD19, 0x3F55D7472713CD19, 0x3F55D7472713CD19, 0x3F55D7472713CD19, 0x3F55D7472713CD19, 0x3F55D7472713CD19, 0x3F55D7472713CD19, 0x3F55D7472713CD19  /*== poly_coeff5 ==*/
+        .align 64
+        .quad 0x3F24A1D7F526371B, 0x3F24A1D7F526371B, 0x3F24A1D7F526371B, 0x3F24A1D7F526371B, 0x3F24A1D7F526371B, 0x3F24A1D7F526371B, 0x3F24A1D7F526371B, 0x3F24A1D7F526371B  /*== poly_coeff6 ==*/
+        .align 64
+        .quad 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000  /* add_const     */
+        .align 64
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff  /* AbsMask       */
+        .align 64
+        .quad 0x408fefff00000000, 0x408fefff00000000, 0x408fefff00000000, 0x408fefff00000000, 0x408fefff00000000, 0x408fefff00000000, 0x408fefff00000000, 0x408fefff00000000  /* Threshold     */
+        .align 64
+        .quad 0x000000000000000F, 0x000000000000000F, 0x000000000000000F, 0x000000000000000F, 0x000000000000000F, 0x000000000000000F, 0x000000000000000F, 0x000000000000000F  /* _lIndexMask   */
+        .align 64
+        .type	__svml_dexp2_data_internal_avx512,@object
+        .size	__svml_dexp2_data_internal_avx512,.-__svml_dexp2_data_internal_avx512
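
For reference, the special-value fallback in the AVX-512 kernel above boils down
to the scalar loop below: the kmovw result is treated as a bit mask and one
scalar exp2 call is made per flagged lane.  This is only an illustrative C
sketch (array size and names are placeholders), not code from the patch:

#include <math.h>

/* src holds the 8 input lanes spilled to the stack, dst the vector result
   spill area; mask is the range-check mask produced by vcmppd/kmovw.  */
static void
exp2_special_lanes (const double src[8], double dst[8], unsigned int mask)
{
  for (int i = 0; i < 8; i++)
    if (mask & (1u << i))        /* btl %r12d, %r13d in the loop above */
      dst[i] = exp2 (src[i]);    /* call exp2@PLT for this lane */
}
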
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core-avx2.S
new file mode 100644
index 0000000000..4daa687852
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized exp2f.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN16v_exp2f _ZGVeN16v_exp2f_avx2_wrapper
+#include "../svml_s_exp2f16_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core.c
new file mode 100644
index 0000000000..e90d9d8684
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized exp2f, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN16v_exp2f
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN16v_exp2f, __GI__ZGVeN16v_exp2f,
+	       __redirect__ZGVeN16v_exp2f)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core_avx512.S
new file mode 100644
index 0000000000..6b512159bc
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core_avx512.S
@@ -0,0 +1,271 @@
+/* Function exp2f vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *     Single precision mantissa represented as: 1.b1b2b3 ... b23
+ *     Constant for single precision: S = 2^19 x 1.5
+ *
+ *     2^X = 2^Xo  x  2^{X-Xo}
+ *     2^X = 2^K  x  2^fo  x  2^{X-Xo}
+ *     2^X = 2^K  x  2^fo  x  2^r
+ *
+ *     2^K  --> Manual scaling
+ *     2^fo --> Table lookup
+ *     r    --> 1 + poly    (r = X - Xo)
+ *
+ *     Xo = K  +  fo
+ *     Xo = K  +  0.x1x2x3x4
+ *
+ *     r = X - Xo
+ *       = Vreduce(X, imm)
+ *       = X - VRndScale(X, imm),    where Xo = VRndScale(X, imm)
+ *
+ *     Rnd(S + X) = S + Xo,    where S is selected as S = 2^19 x 1.5
+ *         S + X = S + floor(X) + 0.x1x2x3x4
+ *     Rnd(S + X) = Rnd(2^19 x 1.5 + X)
+ *     (Note: 2^exp x 1.b1b2b3 ... b23,  2^{exp-23} = 2^-4 for exp=19)
+ *
+ *     exp2(x) =  2^K  x  2^fo  x (1 + poly(r)),   where 2^r = 1 + poly(r)
+ *
+ *     Scale back:
+ *     dest = src1 x 2^floor(src2)
+ *
+ *
+ */
+
+/* Offsets for data table __svml_sexp2_data_internal_avx512
+ */
+#define Frac_PowerS0                  	0
+#define poly_coeff1                   	64
+#define poly_coeff2                   	128
+#define poly_coeff3                   	192
+#define add_const                     	256
+#define AbsMask                       	320
+#define Threshold                     	384
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN16v_exp2f_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovups   add_const+__svml_sexp2_data_internal_avx512(%rip), %zmm3
+
+/*
+ * Reduced argument
+ * where VREDUCE is available
+ */
+        vreduceps $65, {sae}, %zmm0, %zmm6
+        vmovups   poly_coeff3+__svml_sexp2_data_internal_avx512(%rip), %zmm5
+        vmovups   poly_coeff2+__svml_sexp2_data_internal_avx512(%rip), %zmm10
+        vmovups   Threshold+__svml_sexp2_data_internal_avx512(%rip), %zmm2
+
+/*
+ *
+ *  HA
+ * Variables and constants
+ * Load constants and vector(s)
+ */
+        vmovups   poly_coeff1+__svml_sexp2_data_internal_avx512(%rip), %zmm7
+
+/*
+ * Integer form of K+0.b1b2b3b4 in lower bits - call K_plus_f0
+ * Mantissa of normalized single precision FP: 1.b1b2...b23
+ */
+        vaddps    {rd-sae}, %zmm3, %zmm0, %zmm4
+        vandps    AbsMask+__svml_sexp2_data_internal_avx512(%rip), %zmm0, %zmm1
+
+/* c3*r   + c2 */
+        vfmadd231ps {rn-sae}, %zmm6, %zmm5, %zmm10
+        vcmpps    $30, {sae}, %zmm2, %zmm1, %k0
+
+/* c3*r^2 + c2*r + c1 */
+        vfmadd213ps {rn-sae}, %zmm7, %zmm6, %zmm10
+
+/* Table value: 2^(0.b1b2b3b4) */
+        vpermps   __svml_sexp2_data_internal_avx512(%rip), %zmm4, %zmm9
+        kmovw     %k0, %edx
+
+/* T*r */
+        vmulps    {rn-sae}, %zmm6, %zmm9, %zmm8
+
+/* T + (T*r*(c3*r^2 + c2*r + c1)) */
+        vfmadd213ps {rn-sae}, %zmm9, %zmm8, %zmm10
+
+/* Scaling placed at the end to avoid accuracy loss when T*r*scale underflows */
+        vscalefps {rn-sae}, %zmm0, %zmm10, %zmm1
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        vmovaps   %zmm1, %zmm0
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm0, 64(%rsp)
+        vmovups   %zmm1, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm1
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $16, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm1
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm1
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     64(%rsp,%r14,4), %xmm0
+        call      exp2f@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 128(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN16v_exp2f_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_sexp2_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 Frac_PowerS0[16][1];
+        __declspec(align(64)) VUINT32 poly_coeff1[16][1];
+        __declspec(align(64)) VUINT32 poly_coeff2[16][1];
+        __declspec(align(64)) VUINT32 poly_coeff3[16][1];
+        __declspec(align(64)) VUINT32 add_const[16][1];
+        __declspec(align(64)) VUINT32 AbsMask[16][1];
+        __declspec(align(64)) VUINT32 Threshold[16][1];
+} __svml_sexp2_data_internal_avx512;
+#endif
+__svml_sexp2_data_internal_avx512:
+        /*== Frac_PowerS0 ==*/
+        .long 0x3F800000
+        .long 0x3F85AAC3
+        .long 0x3F8B95C2
+        .long 0x3F91C3D3
+        .long 0x3F9837F0
+        .long 0x3F9EF532
+        .long 0x3FA5FED7
+        .long 0x3FAD583F
+        .long 0x3FB504F3
+        .long 0x3FBD08A4
+        .long 0x3FC5672A
+        .long 0x3FCE248C
+        .long 0x3FD744FD
+        .long 0x3FE0CCDF
+        .long 0x3FEAC0C7
+        .long 0x3FF5257D
+        .align 64
+        .long 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222  /*== poly_coeff1 ==*/
+        .align 64
+        .long 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B  /*== poly_coeff2 ==*/
+        .align 64
+        .long 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA  /*== poly_coeff3 ==*/
+        .align 64
+        .long 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000   /* add_const */
+        .align 64
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff   /* AbsMask   */
+        .align 64
+        .long 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000   /* Threshold=126.0 */
+        .align 64
+        .type	__svml_sexp2_data_internal_avx512,@object
+        .size	__svml_sexp2_data_internal_avx512,.-__svml_sexp2_data_internal_avx512
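
To make the AVX-512 exp2f algorithm described above easier to follow, here is a
rough scalar equivalent of the main path.  It is an illustrative sketch only:
special-input handling is omitted, the table lookup is replaced by a direct
exp2f call, and the Taylor coefficients stand in for the minimax
poly_coeff1..poly_coeff3 values stored in the data table:

#include <math.h>

/* Scalar outline of the vrndscale/vreduce + table + vscalefps scheme.  */
static float
exp2f_avx512_sketch (float x)
{
  float xo = floorf (x * 16.0f) * 0.0625f; /* round down to 4 fraction bits */
  float r  = x - xo;                       /* vreduceps: r in [0, 1/16) */
  float kf = floorf (xo);                  /* integer part K */
  int   j  = (int) ((xo - kf) * 16.0f);    /* low 4 bits of K + 0.b1b2b3b4 */
  float t  = exp2f ((float) j * 0.0625f);  /* Frac_PowerS0[j] = 2^(0.b1b2b3b4) */
  float p  = r * (0.693147f + r * (0.240227f + r * 0.0555041f)); /* ~ 2^r - 1 */
  return ldexpf (t + t * p, (int) kf);     /* vscalefps: scale by 2^K */
}
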
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core-sse2.S
new file mode 100644
index 0000000000..0b3fec834c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized exp2f, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN4v_exp2f _ZGVbN4v_exp2f_sse2
+#include "../svml_s_exp2f4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core.c
new file mode 100644
index 0000000000..db47118d97
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized exp2f, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN4v_exp2f
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN4v_exp2f, __GI__ZGVbN4v_exp2f,
+	       __redirect__ZGVbN4v_exp2f)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core_sse4.S
new file mode 100644
index 0000000000..0d9f45d5c3
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core_sse4.S
@@ -0,0 +1,238 @@
+/* Function exp2f vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   exp2(x)  = 2^n * T[j] * (1 + P(y))
+ *   where
+ *        x = m*(1/K) + y,    y in [-1/K..1/K]
+ *        m = n*K + j,           m,n,j - signed integer, j in [-K/2..K/2]
+ *
+ *        values of 2^j/K are tabulated
+ *
+ *        P(y) is a minimax polynomial approximation of exp2(x)-1
+ *        on small interval [-1/K..1/K]
+ *
+ *  Special cases:
+ *
+ *   exp2(NaN)  = NaN
+ *   exp2(+INF) = +INF
+ *   exp2(-INF) = 0
+ *   exp2(x)    = 1 for subnormals
+ *   For IEEE float
+ *     if x >= 128.0 then exp2f(x) overflows
+ *     if x < -151.0 then exp2f(x) underflows
+ *
+ */
+
+/* Offsets for data table __svml_sexp2_data_internal
+ */
+#define _sShifter                     	0
+#define _sPC0                         	16
+#define _sPC1                         	32
+#define _sPC2                         	48
+#define _sPC3                         	64
+#define _sPC4                         	80
+#define _sPC5                         	96
+#define _sPC6                         	112
+#define _iAbsMask                     	128
+#define _iDomainRange                 	144
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN4v_exp2f_sse4)
+        subq      $72, %rsp
+        cfi_def_cfa_offset(80)
+
+/* Check for overflow/underflow  */
+        movups    __svml_sexp2_data_internal(%rip), %xmm1
+
+/*  Implementation  */
+        movaps    %xmm1, %xmm5
+
+/*  Polynomial  */
+        movups    _sPC6+__svml_sexp2_data_internal(%rip), %xmm4
+        addps     %xmm0, %xmm5
+        movaps    %xmm5, %xmm3
+
+/*  2^N  */
+        pslld     $23, %xmm5
+
+/* Check for overflow/underflow  */
+        movdqu    _iAbsMask+__svml_sexp2_data_internal(%rip), %xmm2
+        subps     %xmm1, %xmm3
+
+/*  R  */
+        movaps    %xmm0, %xmm1
+        pand      %xmm0, %xmm2
+        pcmpgtd   _iDomainRange+__svml_sexp2_data_internal(%rip), %xmm2
+        subps     %xmm3, %xmm1
+        movmskps  %xmm2, %edx
+        mulps     %xmm1, %xmm4
+        addps     _sPC5+__svml_sexp2_data_internal(%rip), %xmm4
+        mulps     %xmm1, %xmm4
+        addps     _sPC4+__svml_sexp2_data_internal(%rip), %xmm4
+        mulps     %xmm1, %xmm4
+        addps     _sPC3+__svml_sexp2_data_internal(%rip), %xmm4
+        mulps     %xmm1, %xmm4
+        addps     _sPC2+__svml_sexp2_data_internal(%rip), %xmm4
+        mulps     %xmm1, %xmm4
+        addps     _sPC1+__svml_sexp2_data_internal(%rip), %xmm4
+        mulps     %xmm4, %xmm1
+        addps     _sPC0+__svml_sexp2_data_internal(%rip), %xmm1
+
+/*  Reconstruction  */
+        paddd     %xmm5, %xmm1
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm1
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movaps    %xmm1, %xmm0
+        addq      $72, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(80)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm0, 32(%rsp)
+        movups    %xmm1, 48(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 edx
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -64)
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -72)
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm1
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -64)
+        cfi_offset(13, -72)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm1
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      exp2f@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 48(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN4v_exp2f_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_sexp2_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(16)) VUINT32 _sShifter[4][1];
+        __declspec(align(16)) VUINT32 _sPC0[4][1];
+        __declspec(align(16)) VUINT32 _sPC1[4][1];
+        __declspec(align(16)) VUINT32 _sPC2[4][1];
+        __declspec(align(16)) VUINT32 _sPC3[4][1];
+        __declspec(align(16)) VUINT32 _sPC4[4][1];
+        __declspec(align(16)) VUINT32 _sPC5[4][1];
+        __declspec(align(16)) VUINT32 _sPC6[4][1];
+        __declspec(align(16)) VUINT32 _iAbsMask[4][1];
+        __declspec(align(16)) VUINT32 _iDomainRange[4][1];
+} __svml_sexp2_data_internal;
+#endif
+__svml_sexp2_data_internal:
+        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000   /* _sShifter */
+        .align 16
+        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000   /* _sPC0  */
+        .align 16
+        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218   /* _sPC1  */
+        .align 16
+        .long 0x3e75fdef, 0x3e75fdef, 0x3e75fdef, 0x3e75fdef   /* _sPC2  */
+        .align 16
+        .long 0x3d6357cf, 0x3d6357cf, 0x3d6357cf, 0x3d6357cf   /* _sPC3  */
+        .align 16
+        .long 0x3c1d962c, 0x3c1d962c, 0x3c1d962c, 0x3c1d962c   /* _sPC4  */
+        .align 16
+        .long 0x3aaf7a51, 0x3aaf7a51, 0x3aaf7a51, 0x3aaf7a51   /* _sPC5  */
+        .align 16
+        .long 0x39213c8c, 0x39213c8c, 0x39213c8c, 0x39213c8c   /* _sPC6  */
+        /* common */
+        .align 16
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff   /* _iAbsMask */
+        .align 16
+        .long 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000   /* _iDomainRange=126.0 */
+        .align 16
+        .type	__svml_sexp2_data_internal,@object
+        .size	__svml_sexp2_data_internal,.-__svml_sexp2_data_internal
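
The SSE4 path above (and the AVX2 path that follows) does not use a lookup
table; it relies on the classical Shifter trick plus an integer add into the
exponent field.  The C sketch below is only an illustration of that technique,
not code from the patch: the decimal ln(2)^k/k! Taylor coefficients stand in
for the minimax _sPC1.._sPC6 values, and overflow/underflow handling is left
out:

#include <stdint.h>
#include <string.h>

static float
exp2f_shifter_sketch (float x)
{
  const float shifter = 0x1.8p23f;  /* _sShifter = 2^23 * 1.5 */
  float s = x + shifter;            /* N lands in the low mantissa bits */
  float n = s - shifter;            /* N = x rounded to nearest integer */
  float r = x - n;                  /* |r| <= 0.5 */
  /* Degree-6 polynomial for 2^r (_sPC0.._sPC6 in the table).  */
  float p = 1.0f + r * (0.693147f + r * (0.240227f + r * (0.0555041f
            + r * (0.00961812f + r * (0.00133336f + r * 0.000154035f)))));
  /* pslld $23 + paddd: add N to the exponent field, i.e. multiply by 2^N.  */
  uint32_t pb, sb;
  memcpy (&pb, &p, sizeof pb);
  memcpy (&sb, &s, sizeof sb);
  pb += sb << 23;                   /* low bits of s hold exactly N */
  memcpy (&p, &pb, sizeof pb);
  return p;
}
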
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core-sse.S
new file mode 100644
index 0000000000..4da2278ed8
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized exp2f, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN8v_exp2f _ZGVdN8v_exp2f_sse_wrapper
+#include "../svml_s_exp2f8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core.c
new file mode 100644
index 0000000000..dc34671263
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized exp2f, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN8v_exp2f
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN8v_exp2f, __GI__ZGVdN8v_exp2f,
+	       __redirect__ZGVdN8v_exp2f)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core_avx2.S
new file mode 100644
index 0000000000..aa7af4be79
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core_avx2.S
@@ -0,0 +1,245 @@
+/* Function exp2f vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   exp2(x)  = 2^n * T[j] * (1 + P(y))
+ *   where
+ *        x = m*(1/K) + y,    y in [-1/K..1/K]
+ *        m = n*K + j,           m,n,j - signed integer, j in [-K/2..K/2]
+ *
+ *        values of 2^j/K are tabulated
+ *
+ *        P(y) is a minimax polynomial approximation of exp2(x)-1
+ *        on small interval [-1/K..1/K]
+ *
+ *  Special cases:
+ *
+ *   exp2(NaN)  = NaN
+ *   exp2(+INF) = +INF
+ *   exp2(-INF) = 0
+ *   exp2(x)    = 1 for subnormals
+ *   For IEEE float
+ *     if x >= 128.0 then exp2f(x) overflows
+ *     if x < -151.0 then exp2f(x) underflows
+ *
+ */
+
+/* Offsets for data table __svml_sexp2_data_internal
+ */
+#define _sShifter                     	0
+#define _sPC0                         	32
+#define _sPC1                         	64
+#define _sPC2                         	96
+#define _sPC3                         	128
+#define _sPC4                         	160
+#define _sPC5                         	192
+#define _sPC6                         	224
+#define _iAbsMask                     	256
+#define _iDomainRange                 	288
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN8v_exp2f_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+        vmovups   __svml_sexp2_data_internal(%rip), %ymm1
+
+/* Check for overflow/underflow  */
+        vmovups   _sPC6+__svml_sexp2_data_internal(%rip), %ymm7
+
+/*  Implementation  */
+        vaddps    %ymm1, %ymm0, %ymm6
+        vsubps    %ymm1, %ymm6, %ymm4
+
+/*  2^N  */
+        vpslld    $23, %ymm6, %ymm8
+
+/*  R  */
+        vsubps    %ymm4, %ymm0, %ymm5
+
+/*  Polynomial  */
+        vfmadd213ps _sPC5+__svml_sexp2_data_internal(%rip), %ymm5, %ymm7
+        vfmadd213ps _sPC4+__svml_sexp2_data_internal(%rip), %ymm5, %ymm7
+        vfmadd213ps _sPC3+__svml_sexp2_data_internal(%rip), %ymm5, %ymm7
+        vfmadd213ps _sPC2+__svml_sexp2_data_internal(%rip), %ymm5, %ymm7
+        vfmadd213ps _sPC1+__svml_sexp2_data_internal(%rip), %ymm5, %ymm7
+        vfmadd213ps _sPC0+__svml_sexp2_data_internal(%rip), %ymm5, %ymm7
+
+/* Check for overflow/underflow  */
+        vandps    _iAbsMask+__svml_sexp2_data_internal(%rip), %ymm0, %ymm2
+        vpcmpgtd  _iDomainRange+__svml_sexp2_data_internal(%rip), %ymm2, %ymm3
+        vmovmskps %ymm3, %edx
+
+/*  Reconstruction  */
+        vpaddd    %ymm8, %ymm7, %ymm1
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm1
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        vmovaps   %ymm1, %ymm0
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %ymm0, 32(%rsp)
+        vmovups   %ymm1, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx ymm1
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   64(%rsp), %ymm1
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm1
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      exp2f@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 64(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN8v_exp2f_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_sexp2_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(32)) VUINT32 _sShifter[8][1];
+        __declspec(align(32)) VUINT32 _sPC0[8][1];
+        __declspec(align(32)) VUINT32 _sPC1[8][1];
+        __declspec(align(32)) VUINT32 _sPC2[8][1];
+        __declspec(align(32)) VUINT32 _sPC3[8][1];
+        __declspec(align(32)) VUINT32 _sPC4[8][1];
+        __declspec(align(32)) VUINT32 _sPC5[8][1];
+        __declspec(align(32)) VUINT32 _sPC6[8][1];
+        __declspec(align(32)) VUINT32 _iAbsMask[8][1];
+        __declspec(align(32)) VUINT32 _iDomainRange[8][1];
+} __svml_sexp2_data_internal;
+#endif
+__svml_sexp2_data_internal:
+        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000   /* _sShifter */
+        .align 32
+        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000   /* _sPC0  */
+        .align 32
+        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218   /* _sPC1  */
+        .align 32
+        .long 0x3e75fdef, 0x3e75fdef, 0x3e75fdef, 0x3e75fdef, 0x3e75fdef, 0x3e75fdef, 0x3e75fdef, 0x3e75fdef   /* _sPC2  */
+        .align 32
+        .long 0x3d6357cf, 0x3d6357cf, 0x3d6357cf, 0x3d6357cf, 0x3d6357cf, 0x3d6357cf, 0x3d6357cf, 0x3d6357cf   /* _sPC3  */
+        .align 32
+        .long 0x3c1d962c, 0x3c1d962c, 0x3c1d962c, 0x3c1d962c, 0x3c1d962c, 0x3c1d962c, 0x3c1d962c, 0x3c1d962c   /* _sPC4  */
+        .align 32
+        .long 0x3aaf7a51, 0x3aaf7a51, 0x3aaf7a51, 0x3aaf7a51, 0x3aaf7a51, 0x3aaf7a51, 0x3aaf7a51, 0x3aaf7a51   /* _sPC5  */
+        .align 32
+        .long 0x39213c8c, 0x39213c8c, 0x39213c8c, 0x39213c8c, 0x39213c8c, 0x39213c8c, 0x39213c8c, 0x39213c8c   /* _sPC6  */
+        /* common */
+        .align 32
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff   /* _iAbsMask */
+        .align 32
+        .long 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000   /* _iDomainRange=126.0 */
+        .align 32
+        .type	__svml_sexp2_data_internal,@object
+        .size	__svml_sexp2_data_internal,.-__svml_sexp2_data_internal
diff --git a/sysdeps/x86_64/fpu/svml_d_exp22_core.S b/sysdeps/x86_64/fpu/svml_d_exp22_core.S
new file mode 100644
index 0000000000..f03080a977
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_exp22_core.S
@@ -0,0 +1,29 @@
+/* Function exp2 vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN2v_exp2)
+WRAPPER_IMPL_SSE2 exp2
+END (_ZGVbN2v_exp2)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN2v_exp2)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_exp24_core.S b/sysdeps/x86_64/fpu/svml_d_exp24_core.S
new file mode 100644
index 0000000000..40475c7a94
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_exp24_core.S
@@ -0,0 +1,29 @@
+/* Function exp2 vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN4v_exp2)
+WRAPPER_IMPL_AVX _ZGVbN2v_exp2
+END (_ZGVdN4v_exp2)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN4v_exp2)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_exp24_core_avx.S b/sysdeps/x86_64/fpu/svml_d_exp24_core_avx.S
new file mode 100644
index 0000000000..a7d22409df
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_exp24_core_avx.S
@@ -0,0 +1,25 @@
+/* Function exp2 vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN4v_exp2)
+WRAPPER_IMPL_AVX _ZGVbN2v_exp2
+END (_ZGVcN4v_exp2)
diff --git a/sysdeps/x86_64/fpu/svml_d_exp28_core.S b/sysdeps/x86_64/fpu/svml_d_exp28_core.S
new file mode 100644
index 0000000000..f68aaed427
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_exp28_core.S
@@ -0,0 +1,25 @@
+/* Function exp2 vectorized with AVX-512, wrapper to AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN8v_exp2)
+WRAPPER_IMPL_AVX512 _ZGVdN4v_exp2
+END (_ZGVeN8v_exp2)
diff --git a/sysdeps/x86_64/fpu/svml_s_exp2f16_core.S b/sysdeps/x86_64/fpu/svml_s_exp2f16_core.S
new file mode 100644
index 0000000000..8ba4e82272
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_exp2f16_core.S
@@ -0,0 +1,25 @@
+/* Function exp2f vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN16v_exp2f)
+WRAPPER_IMPL_AVX512 _ZGVdN8v_exp2f
+END (_ZGVeN16v_exp2f)
diff --git a/sysdeps/x86_64/fpu/svml_s_exp2f4_core.S b/sysdeps/x86_64/fpu/svml_s_exp2f4_core.S
new file mode 100644
index 0000000000..916f176dca
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_exp2f4_core.S
@@ -0,0 +1,29 @@
+/* Function exp2f vectorized with SSE2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN4v_exp2f)
+WRAPPER_IMPL_SSE2 exp2f
+END (_ZGVbN4v_exp2f)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4v_exp2f)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_exp2f8_core.S b/sysdeps/x86_64/fpu/svml_s_exp2f8_core.S
new file mode 100644
index 0000000000..b8821b952b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_exp2f8_core.S
@@ -0,0 +1,29 @@
+/* Function exp2f vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN8v_exp2f)
+WRAPPER_IMPL_AVX _ZGVbN4v_exp2f
+END (_ZGVdN8v_exp2f)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8v_exp2f)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_exp2f8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_exp2f8_core_avx.S
new file mode 100644
index 0000000000..ddaaf3b59a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_exp2f8_core_avx.S
@@ -0,0 +1,25 @@
+/* Function exp2f vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+        .text
+ENTRY (_ZGVcN8v_exp2f)
+WRAPPER_IMPL_AVX _ZGVbN4v_exp2f
+END (_ZGVcN8v_exp2f)
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx.c
new file mode 100644
index 0000000000..341ec99724
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-exp2.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx2.c
new file mode 100644
index 0000000000..341ec99724
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx2.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-exp2.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx512f.c
new file mode 100644
index 0000000000..341ec99724
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx512f.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-exp2.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp2.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp2.c
new file mode 100644
index 0000000000..b3b04f63e4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp2.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE double
+#define LIBMVEC_FUNC exp2
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index 9bc9d1dafa..2f7172bd7b 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVbN2v_acos)
 VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVbN2v_atan)
 VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVbN2v_asin)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVbN2vv_hypot)
+VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVbN2v_exp2)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index c41994d90a..e2d519faac 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -34,6 +34,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVdN4v_acos)
 VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVdN4v_atan)
 VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVdN4v_asin)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVdN4vv_hypot)
+VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVdN4v_exp2)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m256i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index 881f6c801a..1ce4d8b413 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVcN4v_acos)
 VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVcN4v_atan)
 VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVcN4v_asin)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVcN4vv_hypot)
+VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVcN4v_exp2)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index 6fd106fe68..6c87cec648 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVeN8v_acos)
 VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVeN8v_atan)
 VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVeN8v_asin)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVeN8vv_hypot)
+VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVeN8v_exp2)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx.c
new file mode 100644
index 0000000000..0281d386fb
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-exp2f.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx2.c
new file mode 100644
index 0000000000..0281d386fb
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx2.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-exp2f.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx512f.c
new file mode 100644
index 0000000000..0281d386fb
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx512f.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-exp2f.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp2f.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f.c
new file mode 100644
index 0000000000..bf57661bee
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE float
+#define LIBMVEC_FUNC exp2f
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index 4c2ea6ddfe..597d7d7598 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVeN16v_acosf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVeN16v_atanf)
 VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVeN16v_asinf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVeN16vv_hypotf)
+VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVeN16v_exp2f)
 
 #define VEC_INT_TYPE __m512i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index 1d5d952d07..3500eec810 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVbN4v_acosf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVbN4v_atanf)
 VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVbN4v_asinf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVbN4vv_hypotf)
+VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVbN4v_exp2f)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index 7a750f3781..921b9c65d6 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -34,6 +34,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVdN8v_acosf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVdN8v_atanf)
 VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVdN8v_asinf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVdN8vv_hypotf)
+VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVdN8v_exp2f)
 
 /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
 #undef VECTOR_WRAPPER_fFF
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index af816a7789..6cbcb57521 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVcN8v_acosf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVcN8v_atanf)
 VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVcN8v_asinf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVcN8vv_hypotf)
+VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVcN8v_exp2f)
 
 #define VEC_INT_TYPE __m128i
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 05/18] x86-64: Add vector exp10/exp10f implementation to libmvec
  2021-12-29  6:39 [PATCH v5 00/18] x86-64: Add vector math functions to libmvec Sunil K Pandey
                   ` (3 preceding siblings ...)
  2021-12-29  6:39 ` [PATCH v5 04/18] x86-64: Add vector exp2/exp2f " Sunil K Pandey
@ 2021-12-29  6:39 ` Sunil K Pandey
  2021-12-29 21:25   ` H.J. Lu
  2021-12-29  6:39 ` [PATCH v5 06/18] x86-64: Add vector cosh/coshf " Sunil K Pandey
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Sunil K Pandey @ 2021-12-29  6:39 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, andrey.kolesov, marius.cornea

Implement vectorized exp10/exp10f with SSE, AVX, AVX2 and AVX512
versions for libmvec, as per the vector ABI.  The patch also contains
accuracy and ABI tests for vector exp10/exp10f with regenerated ulps.
---
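Illustrative usage note (not part of the commit): with the
__DECL_SIMD_exp10* declarations added below, callers normally reach the
new _ZGV*_exp10 entry points through compiler auto-vectorization rather
than by calling them directly.  A minimal sketch, assuming GCC >= 6 and
an AVX-512 capable target (file and function names are made up):

  /* exp10-loop.c
     gcc -O3 -ffast-math -march=skylake-avx512 exp10-loop.c -lmvec -lm  */
  #define _GNU_SOURCE
  #include <math.h>

  void
  scale_decades (double *restrict out, const double *restrict in, int n)
  {
    for (int i = 0; i < n; i++)
      /* Expected to vectorize to one of the _ZGV*_exp10 variants,
         e.g. _ZGVeN8v_exp10 when 512-bit vectors are preferred.  */
      out[i] = exp10 (in[i]);
  }

-ffast-math is needed here because bits/math-vector.h only maps
__DECL_SIMD_* to a SIMD attribute or pragma under __FAST_MATH__.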
 bits/libm-simd-decl-stubs.h                   |  11 +
 math/bits/mathcalls.h                         |   2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
 sysdeps/x86/fpu/bits/math-vector.h            |   4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
 sysdeps/x86_64/fpu/Makeconfig                 |   1 +
 sysdeps/x86_64/fpu/Versions                   |   2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |  20 +
 .../fpu/multiarch/svml_d_exp102_core-sse2.S   |  20 +
 .../x86_64/fpu/multiarch/svml_d_exp102_core.c |  27 ++
 .../fpu/multiarch/svml_d_exp102_core_sse4.S   | 418 +++++++++++++++++
 .../fpu/multiarch/svml_d_exp104_core-sse.S    |  20 +
 .../x86_64/fpu/multiarch/svml_d_exp104_core.c |  27 ++
 .../fpu/multiarch/svml_d_exp104_core_avx2.S   | 429 ++++++++++++++++++
 .../fpu/multiarch/svml_d_exp108_core-avx2.S   |  20 +
 .../x86_64/fpu/multiarch/svml_d_exp108_core.c |  27 ++
 .../fpu/multiarch/svml_d_exp108_core_avx512.S | 287 ++++++++++++
 .../fpu/multiarch/svml_s_exp10f16_core-avx2.S |  20 +
 .../fpu/multiarch/svml_s_exp10f16_core.c      |  28 ++
 .../multiarch/svml_s_exp10f16_core_avx512.S   | 269 +++++++++++
 .../fpu/multiarch/svml_s_exp10f4_core-sse2.S  |  20 +
 .../fpu/multiarch/svml_s_exp10f4_core.c       |  28 ++
 .../fpu/multiarch/svml_s_exp10f4_core_sse4.S  | 311 +++++++++++++
 .../fpu/multiarch/svml_s_exp10f8_core-sse.S   |  20 +
 .../fpu/multiarch/svml_s_exp10f8_core.c       |  28 ++
 .../fpu/multiarch/svml_s_exp10f8_core_avx2.S  | 331 ++++++++++++++
 sysdeps/x86_64/fpu/svml_d_exp102_core.S       |  29 ++
 sysdeps/x86_64/fpu/svml_d_exp104_core.S       |  29 ++
 sysdeps/x86_64/fpu/svml_d_exp104_core_avx.S   |  25 +
 sysdeps/x86_64/fpu/svml_d_exp108_core.S       |  25 +
 sysdeps/x86_64/fpu/svml_s_exp10f16_core.S     |  25 +
 sysdeps/x86_64/fpu/svml_s_exp10f4_core.S      |  29 ++
 sysdeps/x86_64/fpu/svml_s_exp10f8_core.S      |  29 ++
 sysdeps/x86_64/fpu/svml_s_exp10f8_core_avx.S  |  25 +
 .../fpu/test-double-libmvec-exp10-avx.c       |   1 +
 .../fpu/test-double-libmvec-exp10-avx2.c      |   1 +
 .../fpu/test-double-libmvec-exp10-avx512f.c   |   1 +
 .../x86_64/fpu/test-double-libmvec-exp10.c    |   3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
 .../fpu/test-float-libmvec-exp10f-avx.c       |   1 +
 .../fpu/test-float-libmvec-exp10f-avx2.c      |   1 +
 .../fpu/test-float-libmvec-exp10f-avx512f.c   |   1 +
 .../x86_64/fpu/test-float-libmvec-exp10f.c    |   3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
 50 files changed, 2617 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_exp102_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_exp104_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_exp104_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_exp108_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_exp10f16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_exp10f4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_exp10f8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_exp10f8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp10.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp10f.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
index 36d6643eb9..bc18621f17 100644
--- a/bits/libm-simd-decl-stubs.h
+++ b/bits/libm-simd-decl-stubs.h
@@ -153,4 +153,15 @@
 #define __DECL_SIMD_exp2f32x
 #define __DECL_SIMD_exp2f64x
 #define __DECL_SIMD_exp2f128x
+
+#define __DECL_SIMD_exp10
+#define __DECL_SIMD_exp10f
+#define __DECL_SIMD_exp10l
+#define __DECL_SIMD_exp10f16
+#define __DECL_SIMD_exp10f32
+#define __DECL_SIMD_exp10f64
+#define __DECL_SIMD_exp10f128
+#define __DECL_SIMD_exp10f32x
+#define __DECL_SIMD_exp10f64x
+#define __DECL_SIMD_exp10f128x
 #endif
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index 645088cbf3..870778457f 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -111,7 +111,7 @@ __MATHCALL (modf,, (_Mdouble_ __x, _Mdouble_ *__iptr)) __nonnull ((2));
 
 #if __GLIBC_USE (IEC_60559_FUNCS_EXT_C2X)
 /* Compute exponent to base ten.  */
-__MATHCALL (exp10,, (_Mdouble_ __x));
+__MATHCALL_VEC (exp10,, (_Mdouble_ __x));
 #endif
 
 #if defined __USE_XOPEN_EXTENDED || defined __USE_ISOC99
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index 1717f2dee9..b3c1f59593 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -49,40 +49,48 @@ GLIBC_2.22 _ZGVeN8vvv_sincos F
 GLIBC_2.35 _ZGVbN2v_acos F
 GLIBC_2.35 _ZGVbN2v_asin F
 GLIBC_2.35 _ZGVbN2v_atan F
+GLIBC_2.35 _ZGVbN2v_exp10 F
 GLIBC_2.35 _ZGVbN2v_exp2 F
 GLIBC_2.35 _ZGVbN2vv_hypot F
 GLIBC_2.35 _ZGVbN4v_acosf F
 GLIBC_2.35 _ZGVbN4v_asinf F
 GLIBC_2.35 _ZGVbN4v_atanf F
+GLIBC_2.35 _ZGVbN4v_exp10f F
 GLIBC_2.35 _ZGVbN4v_exp2f F
 GLIBC_2.35 _ZGVbN4vv_hypotf F
 GLIBC_2.35 _ZGVcN4v_acos F
 GLIBC_2.35 _ZGVcN4v_asin F
 GLIBC_2.35 _ZGVcN4v_atan F
+GLIBC_2.35 _ZGVcN4v_exp10 F
 GLIBC_2.35 _ZGVcN4v_exp2 F
 GLIBC_2.35 _ZGVcN4vv_hypot F
 GLIBC_2.35 _ZGVcN8v_acosf F
 GLIBC_2.35 _ZGVcN8v_asinf F
 GLIBC_2.35 _ZGVcN8v_atanf F
+GLIBC_2.35 _ZGVcN8v_exp10f F
 GLIBC_2.35 _ZGVcN8v_exp2f F
 GLIBC_2.35 _ZGVcN8vv_hypotf F
 GLIBC_2.35 _ZGVdN4v_acos F
 GLIBC_2.35 _ZGVdN4v_asin F
 GLIBC_2.35 _ZGVdN4v_atan F
+GLIBC_2.35 _ZGVdN4v_exp10 F
 GLIBC_2.35 _ZGVdN4v_exp2 F
 GLIBC_2.35 _ZGVdN4vv_hypot F
 GLIBC_2.35 _ZGVdN8v_acosf F
 GLIBC_2.35 _ZGVdN8v_asinf F
 GLIBC_2.35 _ZGVdN8v_atanf F
+GLIBC_2.35 _ZGVdN8v_exp10f F
 GLIBC_2.35 _ZGVdN8v_exp2f F
 GLIBC_2.35 _ZGVdN8vv_hypotf F
 GLIBC_2.35 _ZGVeN16v_acosf F
 GLIBC_2.35 _ZGVeN16v_asinf F
 GLIBC_2.35 _ZGVeN16v_atanf F
+GLIBC_2.35 _ZGVeN16v_exp10f F
 GLIBC_2.35 _ZGVeN16v_exp2f F
 GLIBC_2.35 _ZGVeN16vv_hypotf F
 GLIBC_2.35 _ZGVeN8v_acos F
 GLIBC_2.35 _ZGVeN8v_asin F
 GLIBC_2.35 _ZGVeN8v_atan F
+GLIBC_2.35 _ZGVeN8v_exp10 F
 GLIBC_2.35 _ZGVeN8v_exp2 F
 GLIBC_2.35 _ZGVeN8vv_hypot F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index c7a972521b..f3f9c2e092 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -78,6 +78,10 @@
 #  define __DECL_SIMD_exp2 __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_exp2f
 #  define __DECL_SIMD_exp2f __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_exp10
+#  define __DECL_SIMD_exp10 __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_exp10f
+#  define __DECL_SIMD_exp10f __DECL_SIMD_x86_64
 
 # endif
 #endif
diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
index 0994e6dfac..c033abbedc 100644
--- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
@@ -38,6 +38,8 @@
 !GCC$ builtin (hypotf) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (exp2) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (exp2f) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (exp10) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (exp10f) attributes simd (notinbranch) if('x86_64')
 
 !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
@@ -61,3 +63,5 @@
 !GCC$ builtin (hypotf) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (exp2) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (exp2f) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (exp10) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (exp10f) attributes simd (notinbranch) if('x32')
diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
index 03b2364417..fd0a9da439 100644
--- a/sysdeps/x86_64/fpu/Makeconfig
+++ b/sysdeps/x86_64/fpu/Makeconfig
@@ -27,6 +27,7 @@ libmvec-funcs = \
   atan \
   cos \
   exp \
+  exp10 \
   exp2 \
   hypot \
   log \
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index 12b7ad1830..f29cfa4cbf 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -17,11 +17,13 @@ libmvec {
     _ZGVbN2v_acos; _ZGVcN4v_acos; _ZGVdN4v_acos; _ZGVeN8v_acos;
     _ZGVbN2v_asin; _ZGVcN4v_asin; _ZGVdN4v_asin; _ZGVeN8v_asin;
     _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan;
+    _ZGVbN2v_exp10; _ZGVcN4v_exp10; _ZGVdN4v_exp10; _ZGVeN8v_exp10;
     _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
     _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
     _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
     _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
     _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf;
+    _ZGVbN4v_exp10f; _ZGVcN8v_exp10f; _ZGVdN8v_exp10f; _ZGVeN16v_exp10f;
     _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
     _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf;
   }
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index bc4479ad39..45f2e4bb53 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -1252,6 +1252,26 @@ float: 1
 float128: 3
 ldouble: 2
 
+Function: "exp10_vlen16":
+float: 3
+
+Function: "exp10_vlen2":
+double: 1
+
+Function: "exp10_vlen4":
+double: 1
+float: 1
+
+Function: "exp10_vlen4_avx2":
+double: 1
+
+Function: "exp10_vlen8":
+double: 1
+float: 1
+
+Function: "exp10_vlen8_avx2":
+float: 1
+
 Function: "exp2":
 double: 1
 float: 1
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core-sse2.S
new file mode 100644
index 0000000000..ab615c0323
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized exp10, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN2v_exp10 _ZGVbN2v_exp10_sse2
+#include "../svml_d_exp102_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core.c
new file mode 100644
index 0000000000..5c5625b278
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized exp10, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN2v_exp10
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN2v_exp10, __GI__ZGVbN2v_exp10, __redirect__ZGVbN2v_exp10)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core_sse4.S
new file mode 100644
index 0000000000..7c6e5de3e0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core_sse4.S
@@ -0,0 +1,418 @@
+/* Function exp10 vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   exp10(x)  = 2^(x/log10(2)) = 2^n * (1 + T[j]) * (1 + P(y))
+ *   where
+ *        x = m*log10(2)/K + y,  y in [-log10(2)/K..log10(2)/K]
+ *        m = n*K + j,           m,n,j - signed integer, j in [-K/2..K/2]
+ *
+ *        values of 2^j/K are tabulated
+ *
+ *        P(y) is a minimax polynomial approximation of exp10(x)-1
+ *        on small interval [-log10(2)/K..log10(2)/K]
+ *
+ *  Special cases:
+ *
+ *   exp10(NaN)  = NaN
+ *   exp10(+INF) = +INF
+ *   exp10(-INF) = 0
+ *   exp10(x)    = 1 for subnormals
+ *   For IEEE double
+ *     if x >  3.39782712893383973096e+02 then exp10(x) overflow
+ *     if x < -3.45133219101941108420e+02 then exp10(x) underflow
+ *
+ */
+
+/* Offsets for data table __svml_dexp10_data_internal
+ */
+#define _dbT                          	0
+#define _dbLg2_10                     	1024
+#define _dbShifter                    	1040
+#define _dbInvLg2_10hi                	1056
+#define _dbInvLg2_10lo                	1072
+#define _dPC1                         	1088
+#define _dPC2                         	1104
+#define _dPC3                         	1120
+#define _dPC4                         	1136
+#define _dPC5                         	1152
+#define _lExpMask                     	1168
+#define _iIndexMask                   	1184
+#define _iAbsMask                     	1200
+#define _iDomainRange                 	1216
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN2v_exp10_sse4)
+        subq      $72, %rsp
+        cfi_def_cfa_offset(80)
+
+/*  R  */
+        movaps    %xmm0, %xmm12
+
+/*  Load argument  */
+        movups    _dbLg2_10+__svml_dexp10_data_internal(%rip), %xmm13
+        lea       __svml_dexp10_data_internal(%rip), %rsi
+        mulpd     %xmm0, %xmm13
+        movups    _dbShifter+__svml_dexp10_data_internal(%rip), %xmm1
+        addpd     %xmm1, %xmm13
+        movaps    %xmm13, %xmm9
+        subpd     %xmm1, %xmm9
+        movups    _dbInvLg2_10hi+__svml_dexp10_data_internal(%rip), %xmm8
+        mulpd     %xmm9, %xmm8
+        movups    _dbInvLg2_10lo+__svml_dexp10_data_internal(%rip), %xmm10
+        mulpd     %xmm9, %xmm10
+        subpd     %xmm8, %xmm12
+        subpd     %xmm10, %xmm12
+
+/*
+ *  Polynomial
+ * poly(dN) = a1*dR+...+a5*dR^5
+ */
+        movups    _dPC5+__svml_dexp10_data_internal(%rip), %xmm11
+        mulpd     %xmm12, %xmm11
+        addpd     _dPC4+__svml_dexp10_data_internal(%rip), %xmm11
+        mulpd     %xmm12, %xmm11
+        addpd     _dPC3+__svml_dexp10_data_internal(%rip), %xmm11
+        mulpd     %xmm12, %xmm11
+        addpd     _dPC2+__svml_dexp10_data_internal(%rip), %xmm11
+
+/* a1+...+a5*dR^4 ! */
+        mulpd     %xmm12, %xmm11
+        addpd     _dPC1+__svml_dexp10_data_internal(%rip), %xmm11
+        movq      _iIndexMask+__svml_dexp10_data_internal(%rip), %xmm5
+
+/*  Index and lookup  */
+        pshufd    $136, %xmm13, %xmm6
+
+/*  2^N  */
+        psllq     $45, %xmm13
+        pand      %xmm5, %xmm6
+
+/* iIndex*=sizeof(D); */
+        pslld     $3, %xmm6
+        movd      %xmm6, %eax
+        pshufd    $1, %xmm6, %xmm7
+        movq      _iAbsMask+__svml_dexp10_data_internal(%rip), %xmm2
+
+/* a1*dR+...+a5*dR^5 */
+        mulpd     %xmm11, %xmm12
+        movd      %xmm7, %ecx
+
+/* Check for overflow/underflow  */
+        pshufd    $221, %xmm0, %xmm4
+        movq      _iDomainRange+__svml_dexp10_data_internal(%rip), %xmm3
+        pand      %xmm2, %xmm4
+        movslq    %eax, %rax
+        pcmpgtd   %xmm3, %xmm4
+        movslq    %ecx, %rcx
+        movmskps  %xmm4, %edx
+
+/* lM==EXP(2^N) */
+        pand      _lExpMask+__svml_dexp10_data_internal(%rip), %xmm13
+        movsd     (%rsi,%rax), %xmm1
+        movhpd    (%rsi,%rcx), %xmm1
+
+/* Tj*poly */
+        mulpd     %xmm1, %xmm12
+        addpd     %xmm12, %xmm1
+
+/* quick 2^N */
+        paddq     %xmm13, %xmm1
+        andl      $3, %edx
+
+/*  Finish   */
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm1
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movaps    %xmm1, %xmm0
+        addq      $72, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(80)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm0, 32(%rsp)
+        movups    %xmm1, 48(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 edx xmm1
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -64)
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -72)
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $2, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm1
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -64)
+        cfi_offset(13, -72)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm1
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      exp10@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 48(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN2v_exp10_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_dexp10_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(16)) VUINT32 _dbT[(1<<7)][2];
+        __declspec(align(16)) VUINT32 _dbLg2_10[2][2];
+        __declspec(align(16)) VUINT32 _dbShifter[2][2];
+        __declspec(align(16)) VUINT32 _dbInvLg2_10hi[2][2];
+        __declspec(align(16)) VUINT32 _dbInvLg2_10lo[2][2];
+        __declspec(align(16)) VUINT32 _dPC1[2][2];
+        __declspec(align(16)) VUINT32 _dPC2[2][2];
+        __declspec(align(16)) VUINT32 _dPC3[2][2];
+        __declspec(align(16)) VUINT32 _dPC4[2][2];
+        __declspec(align(16)) VUINT32 _dPC5[2][2];
+        __declspec(align(16)) VUINT32 _lExpMask[2][2];
+        __declspec(align(16)) VUINT32 _iIndexMask[4][1];
+        __declspec(align(16)) VUINT32 _iAbsMask[4][1];
+        __declspec(align(16)) VUINT32 _iDomainRange[4][1];
+} __svml_dexp10_data_internal;
+#endif
+__svml_dexp10_data_internal:
+        /*== _dbT ==*/
+        .quad 0x3ff0000000000000    /*2^( 0 /128)*/
+        .quad 0x3ff0163da9fb3335    /*2^( 1 /128)*/
+        .quad 0x3ff02c9a3e778061    /*2^( 2 /128)*/
+        .quad 0x3ff04315e86e7f85    /*2^( 3 /128)*/
+        .quad 0x3ff059b0d3158574    /*2^( 4 /128)*/
+        .quad 0x3ff0706b29ddf6de    /*2^( 5 /128)*/
+        .quad 0x3ff0874518759bc8    /*2^( 6 /128)*/
+        .quad 0x3ff09e3ecac6f383    /*2^( 7 /128)*/
+        .quad 0x3ff0b5586cf9890f    /*2^( 8 /128)*/
+        .quad 0x3ff0cc922b7247f7    /*2^( 9 /128)*/
+        .quad 0x3ff0e3ec32d3d1a2    /*2^( 10 /128)*/
+        .quad 0x3ff0fb66affed31b    /*2^( 11 /128)*/
+        .quad 0x3ff11301d0125b51    /*2^( 12 /128)*/
+        .quad 0x3ff12abdc06c31cc    /*2^( 13 /128)*/
+        .quad 0x3ff1429aaea92de0    /*2^( 14 /128)*/
+        .quad 0x3ff15a98c8a58e51    /*2^( 15 /128)*/
+        .quad 0x3ff172b83c7d517b    /*2^( 16 /128)*/
+        .quad 0x3ff18af9388c8dea    /*2^( 17 /128)*/
+        .quad 0x3ff1a35beb6fcb75    /*2^( 18 /128)*/
+        .quad 0x3ff1bbe084045cd4    /*2^( 19 /128)*/
+        .quad 0x3ff1d4873168b9aa    /*2^( 20 /128)*/
+        .quad 0x3ff1ed5022fcd91d    /*2^( 21 /128)*/
+        .quad 0x3ff2063b88628cd6    /*2^( 22 /128)*/
+        .quad 0x3ff21f49917ddc96    /*2^( 23 /128)*/
+        .quad 0x3ff2387a6e756238    /*2^( 24 /128)*/
+        .quad 0x3ff251ce4fb2a63f    /*2^( 25 /128)*/
+        .quad 0x3ff26b4565e27cdd    /*2^( 26 /128)*/
+        .quad 0x3ff284dfe1f56381    /*2^( 27 /128)*/
+        .quad 0x3ff29e9df51fdee1    /*2^( 28 /128)*/
+        .quad 0x3ff2b87fd0dad990    /*2^( 29 /128)*/
+        .quad 0x3ff2d285a6e4030b    /*2^( 30 /128)*/
+        .quad 0x3ff2ecafa93e2f56    /*2^( 31 /128)*/
+        .quad 0x3ff306fe0a31b715    /*2^( 32 /128)*/
+        .quad 0x3ff32170fc4cd831    /*2^( 33 /128)*/
+        .quad 0x3ff33c08b26416ff    /*2^( 34 /128)*/
+        .quad 0x3ff356c55f929ff1    /*2^( 35 /128)*/
+        .quad 0x3ff371a7373aa9cb    /*2^( 36 /128)*/
+        .quad 0x3ff38cae6d05d866    /*2^( 37 /128)*/
+        .quad 0x3ff3a7db34e59ff7    /*2^( 38 /128)*/
+        .quad 0x3ff3c32dc313a8e5    /*2^( 39 /128)*/
+        .quad 0x3ff3dea64c123422    /*2^( 40 /128)*/
+        .quad 0x3ff3fa4504ac801c    /*2^( 41 /128)*/
+        .quad 0x3ff4160a21f72e2a    /*2^( 42 /128)*/
+        .quad 0x3ff431f5d950a897    /*2^( 43 /128)*/
+        .quad 0x3ff44e086061892d    /*2^( 44 /128)*/
+        .quad 0x3ff46a41ed1d0057    /*2^( 45 /128)*/
+        .quad 0x3ff486a2b5c13cd0    /*2^( 46 /128)*/
+        .quad 0x3ff4a32af0d7d3de    /*2^( 47 /128)*/
+        .quad 0x3ff4bfdad5362a27    /*2^( 48 /128)*/
+        .quad 0x3ff4dcb299fddd0d    /*2^( 49 /128)*/
+        .quad 0x3ff4f9b2769d2ca7    /*2^( 50 /128)*/
+        .quad 0x3ff516daa2cf6642    /*2^( 51 /128)*/
+        .quad 0x3ff5342b569d4f82    /*2^( 52 /128)*/
+        .quad 0x3ff551a4ca5d920f    /*2^( 53 /128)*/
+        .quad 0x3ff56f4736b527da    /*2^( 54 /128)*/
+        .quad 0x3ff58d12d497c7fd    /*2^( 55 /128)*/
+        .quad 0x3ff5ab07dd485429    /*2^( 56 /128)*/
+        .quad 0x3ff5c9268a5946b7    /*2^( 57 /128)*/
+        .quad 0x3ff5e76f15ad2148    /*2^( 58 /128)*/
+        .quad 0x3ff605e1b976dc09    /*2^( 59 /128)*/
+        .quad 0x3ff6247eb03a5585    /*2^( 60 /128)*/
+        .quad 0x3ff6434634ccc320    /*2^( 61 /128)*/
+        .quad 0x3ff6623882552225    /*2^( 62 /128)*/
+        .quad 0x3ff68155d44ca973    /*2^( 63 /128)*/
+        .quad 0x3ff6a09e667f3bcd    /*2^( 64 /128)*/
+        .quad 0x3ff6c012750bdabf    /*2^( 65 /128)*/
+        .quad 0x3ff6dfb23c651a2f    /*2^( 66 /128)*/
+        .quad 0x3ff6ff7df9519484    /*2^( 67 /128)*/
+        .quad 0x3ff71f75e8ec5f74    /*2^( 68 /128)*/
+        .quad 0x3ff73f9a48a58174    /*2^( 69 /128)*/
+        .quad 0x3ff75feb564267c9    /*2^( 70 /128)*/
+        .quad 0x3ff780694fde5d3f    /*2^( 71 /128)*/
+        .quad 0x3ff7a11473eb0187    /*2^( 72 /128)*/
+        .quad 0x3ff7c1ed0130c132    /*2^( 73 /128)*/
+        .quad 0x3ff7e2f336cf4e62    /*2^( 74 /128)*/
+        .quad 0x3ff80427543e1a12    /*2^( 75 /128)*/
+        .quad 0x3ff82589994cce13    /*2^( 76 /128)*/
+        .quad 0x3ff8471a4623c7ad    /*2^( 77 /128)*/
+        .quad 0x3ff868d99b4492ed    /*2^( 78 /128)*/
+        .quad 0x3ff88ac7d98a6699    /*2^( 79 /128)*/
+        .quad 0x3ff8ace5422aa0db    /*2^( 80 /128)*/
+        .quad 0x3ff8cf3216b5448c    /*2^( 81 /128)*/
+        .quad 0x3ff8f1ae99157736    /*2^( 82 /128)*/
+        .quad 0x3ff9145b0b91ffc6    /*2^( 83 /128)*/
+        .quad 0x3ff93737b0cdc5e5    /*2^( 84 /128)*/
+        .quad 0x3ff95a44cbc8520f    /*2^( 85 /128)*/
+        .quad 0x3ff97d829fde4e50    /*2^( 86 /128)*/
+        .quad 0x3ff9a0f170ca07ba    /*2^( 87 /128)*/
+        .quad 0x3ff9c49182a3f090    /*2^( 88 /128)*/
+        .quad 0x3ff9e86319e32323    /*2^( 89 /128)*/
+        .quad 0x3ffa0c667b5de565    /*2^( 90 /128)*/
+        .quad 0x3ffa309bec4a2d33    /*2^( 91 /128)*/
+        .quad 0x3ffa5503b23e255d    /*2^( 92 /128)*/
+        .quad 0x3ffa799e1330b358    /*2^( 93 /128)*/
+        .quad 0x3ffa9e6b5579fdbf    /*2^( 94 /128)*/
+        .quad 0x3ffac36bbfd3f37a    /*2^( 95 /128)*/
+        .quad 0x3ffae89f995ad3ad    /*2^( 96 /128)*/
+        .quad 0x3ffb0e07298db666    /*2^( 97 /128)*/
+        .quad 0x3ffb33a2b84f15fb    /*2^( 98 /128)*/
+        .quad 0x3ffb59728de5593a    /*2^( 99 /128)*/
+        .quad 0x3ffb7f76f2fb5e47    /*2^( 100 /128)*/
+        .quad 0x3ffba5b030a1064a    /*2^( 101 /128)*/
+        .quad 0x3ffbcc1e904bc1d2    /*2^( 102 /128)*/
+        .quad 0x3ffbf2c25bd71e09    /*2^( 103 /128)*/
+        .quad 0x3ffc199bdd85529c    /*2^( 104 /128)*/
+        .quad 0x3ffc40ab5fffd07a    /*2^( 105 /128)*/
+        .quad 0x3ffc67f12e57d14b    /*2^( 106 /128)*/
+        .quad 0x3ffc8f6d9406e7b5    /*2^( 107 /128)*/
+        .quad 0x3ffcb720dcef9069    /*2^( 108 /128)*/
+        .quad 0x3ffcdf0b555dc3fa    /*2^( 109 /128)*/
+        .quad 0x3ffd072d4a07897c    /*2^( 110 /128)*/
+        .quad 0x3ffd2f87080d89f2    /*2^( 111 /128)*/
+        .quad 0x3ffd5818dcfba487    /*2^( 112 /128)*/
+        .quad 0x3ffd80e316c98398    /*2^( 113 /128)*/
+        .quad 0x3ffda9e603db3285    /*2^( 114 /128)*/
+        .quad 0x3ffdd321f301b460    /*2^( 115 /128)*/
+        .quad 0x3ffdfc97337b9b5f    /*2^( 116 /128)*/
+        .quad 0x3ffe264614f5a129    /*2^( 117 /128)*/
+        .quad 0x3ffe502ee78b3ff6    /*2^( 118 /128)*/
+        .quad 0x3ffe7a51fbc74c83    /*2^( 119 /128)*/
+        .quad 0x3ffea4afa2a490da    /*2^( 120 /128)*/
+        .quad 0x3ffecf482d8e67f1    /*2^( 121 /128)*/
+        .quad 0x3ffefa1bee615a27    /*2^( 122 /128)*/
+        .quad 0x3fff252b376bba97    /*2^( 123 /128)*/
+        .quad 0x3fff50765b6e4540    /*2^( 124 /128)*/
+        .quad 0x3fff7bfdad9cbe14    /*2^( 125 /128)*/
+        .quad 0x3fffa7c1819e90d8    /*2^( 126 /128)*/
+        .quad 0x3fffd3c22b8f71f1     /*2^( 127 /128)*/
+        .align 16
+        .quad 0x407a934f0979a371, 0x407a934f0979a371  /* _dbLg2_10*2^K */
+        .align 16
+        .quad 0x4338800000000000, 0x4338800000000000  /* _dbShifter */
+        .align 16
+        .quad 0x3f63441350a00000, 0x3f63441350a00000  /* _dbInvLg2_10hi/2^K 53-11-K bits*/
+        .align 16
+        .quad 0xbd10c0219dc1da99, 0xbd10c0219dc1da99  /* _dbInvLg2_10lo/2^K */
+        //PC0 = 1.0
+        .align 16
+        .quad 0x40026bb1bbb55516, 0x40026bb1bbb55516  /* _dPC1 */
+        .align 16
+        .quad 0x40053524c73ce8e3, 0x40053524c73ce8e3  /* _dPC2 */
+        .align 16
+        .quad 0x4000470591ccea8b, 0x4000470591ccea8b  /* _dPC3 */
+        .align 16
+        .quad 0x3ff2bd767584db59, 0x3ff2bd767584db59  /* _dPC4 */
+        .align 16
+        .quad 0x3fe144c03efafb54, 0x3fe144c03efafb54  /* _dPC5 */
+        .align 16
+        .quad 0xfff0000000000000, 0xfff0000000000000  /* _lExpMask */
+        .align 16
+        .long 0x0000007f, 0x0000007f, 0x0000007f, 0x0000007f          /* _iIndexMask =(2^K-1)*/
+        //common
+        .align 16
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff          /* _iAbsMask */
+        .align 16
+        .long 0x40733a70, 0x40733a70, 0x40733a70, 0x40733a70 /* _iDomainRange */
+        .align 16
+        .type	__svml_dexp10_data_internal,@object
+        .size	__svml_dexp10_data_internal,.-__svml_dexp10_data_internal
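A scalar C model of the K = 128 scheme described in the ALGORITHM
DESCRIPTION comment above may help when reading the kernel.  It is only
a sketch (not part of the patch): it reuses the constants from the data
table but does the final scaling with exp2/ldexp instead of the kernel's
integer exponent arithmetic, and it has no special-case handling (the
kernel routes out-of-range lanes to the scalar exp10 instead).

  #include <math.h>
  #include <stdint.h>

  #define K 128

  static double
  exp10_model (double x)
  {
    /* m = round (x * log2(10) * K) via the right-shifter trick
       (_dbLg2_10 and _dbShifter).  */
    const double shifter = 0x1.8p52;                    /* 2^52 + 2^51 */
    double s = fma (x, 0x1.a934f0979a371p+8, shifter);  /* log2(10) * K */
    int64_t m = (int64_t) (s - shifter);                /* m = n*K + j */
    int64_t n = m >> 7;
    int64_t j = m & (K - 1);

    /* Reduced argument r = x - m * log10(2)/K; the kernel splits this
       constant into hi/lo parts (_dbInvLg2_10hi/lo) for extra precision.  */
    double r = x - (double) m * 0x1.34413509f79ffp-9;

    /* P(r) ~ 10^r - 1 on the reduced interval; _dPC1.._dPC5 are close
       to ln(10)^k / k!.  */
    double p = r * (0x1.26bb1bbb55516p+1
               + r * (0x1.53524c73ce8e3p+1
               + r * (0x1.0470591ccea8bp+1
               + r * (0x1.2bd767584db59p+0
               + r * 0x1.144c03efafb54p-1))));

    /* T[j] = 2^(j/K); the kernel loads this from the 128-entry _dbT table.  */
    double t = exp2 ((double) j / K);

    /* result = 2^n * T[j] * (1 + P(r)).  */
    return ldexp (t + t * p, (int) n);
  }

In the assembly the 2^n factor is applied without ldexp: psllq $45 lines
n up with the double exponent field, _lExpMask discards everything else,
and paddq adds n to the exponent of T[j]*(1 + P(r)).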
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core-sse.S
new file mode 100644
index 0000000000..260c052143
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized exp10, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN4v_exp10 _ZGVdN4v_exp10_sse_wrapper
+#include "../svml_d_exp104_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core.c
new file mode 100644
index 0000000000..e3e302be72
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized exp10, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN4v_exp10
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN4v_exp10, __GI__ZGVdN4v_exp10, __redirect__ZGVdN4v_exp10)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core_avx2.S
new file mode 100644
index 0000000000..1a53f43c9e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core_avx2.S
@@ -0,0 +1,429 @@
+/* Function exp10 vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   exp10(x)  = 2^(x/log10(2)) = 2^n * (1 + T[j]) * (1 + P(y))
+ *   where
+ *        x = m*log10(2)/K + y,  y in [-log10(2)/K..log10(2)/K]
+ *        m = n*K + j,           m,n,j - signed integer, j in [-K/2..K/2]
+ *
+ *        values of 2^j/K are tabulated
+ *
+ *        P(y) is a minimax polynomial approximation of exp10(x)-1
+ *        on small interval [-log10(2)/K..log10(2)/K]
+ *
+ *  Special cases:
+ *
+ *   exp10(NaN)  = NaN
+ *   exp10(+INF) = +INF
+ *   exp10(-INF) = 0
+ *   exp10(x)    = 1 for subnormals
+ *   For IEEE double
+ *     if x >  3.39782712893383973096e+02 then exp10(x) overflow
+ *     if x < -3.45133219101941108420e+02 then exp10(x) underflow
+ *
+ */
+
+/* Offsets for data table __svml_dexp10_data_internal
+ */
+#define _dbT                          	0
+#define _dbLg2_10                     	1024
+#define _dbShifter                    	1056
+#define _dbInvLg2_10hi                	1088
+#define _dbInvLg2_10lo                	1120
+#define _dPC1                         	1152
+#define _dPC2                         	1184
+#define _dPC3                         	1216
+#define _dPC4                         	1248
+#define _dPC5                         	1280
+#define _lExpMask                     	1312
+#define _iIndexMask                   	1344
+#define _iAbsMask                     	1376
+#define _iDomainRange                 	1408
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN4v_exp10_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+        lea       __svml_dexp10_data_internal(%rip), %r8
+        vmovapd   %ymm0, %ymm2
+        vmovupd   _dbShifter+__svml_dexp10_data_internal(%rip), %ymm3
+
+/*  Load argument  */
+        vmovupd   _dbLg2_10+__svml_dexp10_data_internal(%rip), %ymm0
+        vfmadd213pd %ymm3, %ymm2, %ymm0
+        vsubpd    %ymm3, %ymm0, %ymm1
+
+/*  R  */
+        vmovupd   _dbInvLg2_10hi+__svml_dexp10_data_internal(%rip), %ymm3
+        vfnmadd213pd %ymm2, %ymm1, %ymm3
+
+/* Check for overflow/underflow  */
+        vextractf128 $1, %ymm2, %xmm4
+        vfnmadd132pd _dbInvLg2_10lo+__svml_dexp10_data_internal(%rip), %ymm3, %ymm1
+        vshufps   $221, %xmm4, %xmm2, %xmm5
+        vandps    _iAbsMask+__svml_dexp10_data_internal(%rip), %xmm5, %xmm6
+        vpcmpgtd  _iDomainRange+__svml_dexp10_data_internal(%rip), %xmm6, %xmm7
+
+/*
+ *  Polynomial
+ * poly(dN) = a1*dR+...+a5*dR^5
+ */
+        vmovupd   _dPC5+__svml_dexp10_data_internal(%rip), %ymm4
+        vmovmskps %xmm7, %eax
+        vfmadd213pd _dPC4+__svml_dexp10_data_internal(%rip), %ymm1, %ymm4
+        vfmadd213pd _dPC3+__svml_dexp10_data_internal(%rip), %ymm1, %ymm4
+        vfmadd213pd _dPC2+__svml_dexp10_data_internal(%rip), %ymm1, %ymm4
+
+/* a1+...+a5*dR^4 ! */
+        vfmadd213pd _dPC1+__svml_dexp10_data_internal(%rip), %ymm1, %ymm4
+
+/* a1*dR+...+a5*dR^5 */
+        vmulpd    %ymm4, %ymm1, %ymm1
+
+/*  Index and lookup  */
+        vextractf128 $1, %ymm0, %xmm8
+        vshufps   $136, %xmm8, %xmm0, %xmm9
+        vandps    _iIndexMask+__svml_dexp10_data_internal(%rip), %xmm9, %xmm10
+
+/* iIndex*=sizeof(D); */
+        vpslld    $3, %xmm10, %xmm13
+        vmovd     %xmm13, %edx
+
+/*  2^N  */
+        vpsllq    $45, %ymm0, %ymm0
+        vpextrd   $2, %xmm13, %esi
+        movslq    %edx, %rdx
+        vpextrd   $1, %xmm13, %ecx
+        movslq    %esi, %rsi
+        vpextrd   $3, %xmm13, %edi
+        movslq    %ecx, %rcx
+        movslq    %edi, %rdi
+        vmovsd    (%r8,%rdx), %xmm11
+        vmovsd    (%r8,%rsi), %xmm14
+        vmovhpd   (%r8,%rcx), %xmm11, %xmm12
+        vmovhpd   (%r8,%rdi), %xmm14, %xmm15
+
+/* lM==EXP(2^N) */
+        vpand     _lExpMask+__svml_dexp10_data_internal(%rip), %ymm0, %ymm6
+        vinsertf128 $1, %xmm15, %ymm12, %ymm5
+
+/* Tj*poly */
+        vfmadd213pd %ymm5, %ymm5, %ymm1
+
+/* quick 2^N */
+        vpaddq    %ymm6, %ymm1, %ymm0
+
+/*  Finish   */
+        testl     %eax, %eax
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm2
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovupd   %ymm2, 32(%rsp)
+        vmovupd   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0
+
+        xorl      %edx, %edx
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovupd   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      exp10@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 64(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN4v_exp10_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_dexp10_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(32)) VUINT32 _dbT[(1<<7)][2];
+        __declspec(align(32)) VUINT32 _dbLg2_10[4][2];
+        __declspec(align(32)) VUINT32 _dbShifter[4][2];
+        __declspec(align(32)) VUINT32 _dbInvLg2_10hi[4][2];
+        __declspec(align(32)) VUINT32 _dbInvLg2_10lo[4][2];
+        __declspec(align(32)) VUINT32 _dPC1[4][2];
+        __declspec(align(32)) VUINT32 _dPC2[4][2];
+        __declspec(align(32)) VUINT32 _dPC3[4][2];
+        __declspec(align(32)) VUINT32 _dPC4[4][2];
+        __declspec(align(32)) VUINT32 _dPC5[4][2];
+        __declspec(align(32)) VUINT32 _lExpMask[4][2];
+        __declspec(align(32)) VUINT32 _iIndexMask[8][1];
+        __declspec(align(32)) VUINT32 _iAbsMask[8][1];
+        __declspec(align(32)) VUINT32 _iDomainRange[8][1];
+} __svml_dexp10_data_internal;
+#endif
+__svml_dexp10_data_internal:
+        /*== _dbT ==*/
+        .quad 0x3ff0000000000000    /*2^( 0 /128)*/
+        .quad 0x3ff0163da9fb3335    /*2^( 1 /128)*/
+        .quad 0x3ff02c9a3e778061    /*2^( 2 /128)*/
+        .quad 0x3ff04315e86e7f85    /*2^( 3 /128)*/
+        .quad 0x3ff059b0d3158574    /*2^( 4 /128)*/
+        .quad 0x3ff0706b29ddf6de    /*2^( 5 /128)*/
+        .quad 0x3ff0874518759bc8    /*2^( 6 /128)*/
+        .quad 0x3ff09e3ecac6f383    /*2^( 7 /128)*/
+        .quad 0x3ff0b5586cf9890f    /*2^( 8 /128)*/
+        .quad 0x3ff0cc922b7247f7    /*2^( 9 /128)*/
+        .quad 0x3ff0e3ec32d3d1a2    /*2^( 10 /128)*/
+        .quad 0x3ff0fb66affed31b    /*2^( 11 /128)*/
+        .quad 0x3ff11301d0125b51    /*2^( 12 /128)*/
+        .quad 0x3ff12abdc06c31cc    /*2^( 13 /128)*/
+        .quad 0x3ff1429aaea92de0    /*2^( 14 /128)*/
+        .quad 0x3ff15a98c8a58e51    /*2^( 15 /128)*/
+        .quad 0x3ff172b83c7d517b    /*2^( 16 /128)*/
+        .quad 0x3ff18af9388c8dea    /*2^( 17 /128)*/
+        .quad 0x3ff1a35beb6fcb75    /*2^( 18 /128)*/
+        .quad 0x3ff1bbe084045cd4    /*2^( 19 /128)*/
+        .quad 0x3ff1d4873168b9aa    /*2^( 20 /128)*/
+        .quad 0x3ff1ed5022fcd91d    /*2^( 21 /128)*/
+        .quad 0x3ff2063b88628cd6    /*2^( 22 /128)*/
+        .quad 0x3ff21f49917ddc96    /*2^( 23 /128)*/
+        .quad 0x3ff2387a6e756238    /*2^( 24 /128)*/
+        .quad 0x3ff251ce4fb2a63f    /*2^( 25 /128)*/
+        .quad 0x3ff26b4565e27cdd    /*2^( 26 /128)*/
+        .quad 0x3ff284dfe1f56381    /*2^( 27 /128)*/
+        .quad 0x3ff29e9df51fdee1    /*2^( 28 /128)*/
+        .quad 0x3ff2b87fd0dad990    /*2^( 29 /128)*/
+        .quad 0x3ff2d285a6e4030b    /*2^( 30 /128)*/
+        .quad 0x3ff2ecafa93e2f56    /*2^( 31 /128)*/
+        .quad 0x3ff306fe0a31b715    /*2^( 32 /128)*/
+        .quad 0x3ff32170fc4cd831    /*2^( 33 /128)*/
+        .quad 0x3ff33c08b26416ff    /*2^( 34 /128)*/
+        .quad 0x3ff356c55f929ff1    /*2^( 35 /128)*/
+        .quad 0x3ff371a7373aa9cb    /*2^( 36 /128)*/
+        .quad 0x3ff38cae6d05d866    /*2^( 37 /128)*/
+        .quad 0x3ff3a7db34e59ff7    /*2^( 38 /128)*/
+        .quad 0x3ff3c32dc313a8e5    /*2^( 39 /128)*/
+        .quad 0x3ff3dea64c123422    /*2^( 40 /128)*/
+        .quad 0x3ff3fa4504ac801c    /*2^( 41 /128)*/
+        .quad 0x3ff4160a21f72e2a    /*2^( 42 /128)*/
+        .quad 0x3ff431f5d950a897    /*2^( 43 /128)*/
+        .quad 0x3ff44e086061892d    /*2^( 44 /128)*/
+        .quad 0x3ff46a41ed1d0057    /*2^( 45 /128)*/
+        .quad 0x3ff486a2b5c13cd0    /*2^( 46 /128)*/
+        .quad 0x3ff4a32af0d7d3de    /*2^( 47 /128)*/
+        .quad 0x3ff4bfdad5362a27    /*2^( 48 /128)*/
+        .quad 0x3ff4dcb299fddd0d    /*2^( 49 /128)*/
+        .quad 0x3ff4f9b2769d2ca7    /*2^( 50 /128)*/
+        .quad 0x3ff516daa2cf6642    /*2^( 51 /128)*/
+        .quad 0x3ff5342b569d4f82    /*2^( 52 /128)*/
+        .quad 0x3ff551a4ca5d920f    /*2^( 53 /128)*/
+        .quad 0x3ff56f4736b527da    /*2^( 54 /128)*/
+        .quad 0x3ff58d12d497c7fd    /*2^( 55 /128)*/
+        .quad 0x3ff5ab07dd485429    /*2^( 56 /128)*/
+        .quad 0x3ff5c9268a5946b7    /*2^( 57 /128)*/
+        .quad 0x3ff5e76f15ad2148    /*2^( 58 /128)*/
+        .quad 0x3ff605e1b976dc09    /*2^( 59 /128)*/
+        .quad 0x3ff6247eb03a5585    /*2^( 60 /128)*/
+        .quad 0x3ff6434634ccc320    /*2^( 61 /128)*/
+        .quad 0x3ff6623882552225    /*2^( 62 /128)*/
+        .quad 0x3ff68155d44ca973    /*2^( 63 /128)*/
+        .quad 0x3ff6a09e667f3bcd    /*2^( 64 /128)*/
+        .quad 0x3ff6c012750bdabf    /*2^( 65 /128)*/
+        .quad 0x3ff6dfb23c651a2f    /*2^( 66 /128)*/
+        .quad 0x3ff6ff7df9519484    /*2^( 67 /128)*/
+        .quad 0x3ff71f75e8ec5f74    /*2^( 68 /128)*/
+        .quad 0x3ff73f9a48a58174    /*2^( 69 /128)*/
+        .quad 0x3ff75feb564267c9    /*2^( 70 /128)*/
+        .quad 0x3ff780694fde5d3f    /*2^( 71 /128)*/
+        .quad 0x3ff7a11473eb0187    /*2^( 72 /128)*/
+        .quad 0x3ff7c1ed0130c132    /*2^( 73 /128)*/
+        .quad 0x3ff7e2f336cf4e62    /*2^( 74 /128)*/
+        .quad 0x3ff80427543e1a12    /*2^( 75 /128)*/
+        .quad 0x3ff82589994cce13    /*2^( 76 /128)*/
+        .quad 0x3ff8471a4623c7ad    /*2^( 77 /128)*/
+        .quad 0x3ff868d99b4492ed    /*2^( 78 /128)*/
+        .quad 0x3ff88ac7d98a6699    /*2^( 79 /128)*/
+        .quad 0x3ff8ace5422aa0db    /*2^( 80 /128)*/
+        .quad 0x3ff8cf3216b5448c    /*2^( 81 /128)*/
+        .quad 0x3ff8f1ae99157736    /*2^( 82 /128)*/
+        .quad 0x3ff9145b0b91ffc6    /*2^( 83 /128)*/
+        .quad 0x3ff93737b0cdc5e5    /*2^( 84 /128)*/
+        .quad 0x3ff95a44cbc8520f    /*2^( 85 /128)*/
+        .quad 0x3ff97d829fde4e50    /*2^( 86 /128)*/
+        .quad 0x3ff9a0f170ca07ba    /*2^( 87 /128)*/
+        .quad 0x3ff9c49182a3f090    /*2^( 88 /128)*/
+        .quad 0x3ff9e86319e32323    /*2^( 89 /128)*/
+        .quad 0x3ffa0c667b5de565    /*2^( 90 /128)*/
+        .quad 0x3ffa309bec4a2d33    /*2^( 91 /128)*/
+        .quad 0x3ffa5503b23e255d    /*2^( 92 /128)*/
+        .quad 0x3ffa799e1330b358    /*2^( 93 /128)*/
+        .quad 0x3ffa9e6b5579fdbf    /*2^( 94 /128)*/
+        .quad 0x3ffac36bbfd3f37a    /*2^( 95 /128)*/
+        .quad 0x3ffae89f995ad3ad    /*2^( 96 /128)*/
+        .quad 0x3ffb0e07298db666    /*2^( 97 /128)*/
+        .quad 0x3ffb33a2b84f15fb    /*2^( 98 /128)*/
+        .quad 0x3ffb59728de5593a    /*2^( 99 /128)*/
+        .quad 0x3ffb7f76f2fb5e47    /*2^( 100 /128)*/
+        .quad 0x3ffba5b030a1064a    /*2^( 101 /128)*/
+        .quad 0x3ffbcc1e904bc1d2    /*2^( 102 /128)*/
+        .quad 0x3ffbf2c25bd71e09    /*2^( 103 /128)*/
+        .quad 0x3ffc199bdd85529c    /*2^( 104 /128)*/
+        .quad 0x3ffc40ab5fffd07a    /*2^( 105 /128)*/
+        .quad 0x3ffc67f12e57d14b    /*2^( 106 /128)*/
+        .quad 0x3ffc8f6d9406e7b5    /*2^( 107 /128)*/
+        .quad 0x3ffcb720dcef9069    /*2^( 108 /128)*/
+        .quad 0x3ffcdf0b555dc3fa    /*2^( 109 /128)*/
+        .quad 0x3ffd072d4a07897c    /*2^( 110 /128)*/
+        .quad 0x3ffd2f87080d89f2    /*2^( 111 /128)*/
+        .quad 0x3ffd5818dcfba487    /*2^( 112 /128)*/
+        .quad 0x3ffd80e316c98398    /*2^( 113 /128)*/
+        .quad 0x3ffda9e603db3285    /*2^( 114 /128)*/
+        .quad 0x3ffdd321f301b460    /*2^( 115 /128)*/
+        .quad 0x3ffdfc97337b9b5f    /*2^( 116 /128)*/
+        .quad 0x3ffe264614f5a129    /*2^( 117 /128)*/
+        .quad 0x3ffe502ee78b3ff6    /*2^( 118 /128)*/
+        .quad 0x3ffe7a51fbc74c83    /*2^( 119 /128)*/
+        .quad 0x3ffea4afa2a490da    /*2^( 120 /128)*/
+        .quad 0x3ffecf482d8e67f1    /*2^( 121 /128)*/
+        .quad 0x3ffefa1bee615a27    /*2^( 122 /128)*/
+        .quad 0x3fff252b376bba97    /*2^( 123 /128)*/
+        .quad 0x3fff50765b6e4540    /*2^( 124 /128)*/
+        .quad 0x3fff7bfdad9cbe14    /*2^( 125 /128)*/
+        .quad 0x3fffa7c1819e90d8    /*2^( 126 /128)*/
+        .quad 0x3fffd3c22b8f71f1     /*2^( 127 /128)*/
+        .align 32
+        .quad 0x407a934f0979a371, 0x407a934f0979a371, 0x407a934f0979a371, 0x407a934f0979a371  /* _dbLg2_10*2^K */
+        .align 32
+        .quad 0x4338800000000000, 0x4338800000000000, 0x4338800000000000, 0x4338800000000000  /* _dbShifter */
+        .align 32
+        .quad 0x3f63441350a00000, 0x3f63441350a00000, 0x3f63441350a00000, 0x3f63441350a00000  /* _dbInvLg2_10hi/2^K 53-11-K bits*/
+        .align 32
+        .quad 0xbd10c0219dc1da99, 0xbd10c0219dc1da99, 0xbd10c0219dc1da99, 0xbd10c0219dc1da99  /* _dbInvLg2_10lo/2^K */
+        //PC0 = 1.0
+        .align 32
+        .quad 0x40026bb1bbb55516, 0x40026bb1bbb55516, 0x40026bb1bbb55516, 0x40026bb1bbb55516  /* _dPC1 */
+        .align 32
+        .quad 0x40053524c73ce8e3, 0x40053524c73ce8e3, 0x40053524c73ce8e3, 0x40053524c73ce8e3  /* _dPC2 */
+        .align 32
+        .quad 0x4000470591ccea8b, 0x4000470591ccea8b, 0x4000470591ccea8b, 0x4000470591ccea8b  /* _dPC3 */
+        .align 32
+        .quad 0x3ff2bd767584db59, 0x3ff2bd767584db59, 0x3ff2bd767584db59, 0x3ff2bd767584db59  /* _dPC4 */
+        .align 32
+        .quad 0x3fe144c03efafb54, 0x3fe144c03efafb54, 0x3fe144c03efafb54, 0x3fe144c03efafb54  /* _dPC5 */
+        .align 32
+        .quad 0xfff0000000000000, 0xfff0000000000000, 0xfff0000000000000, 0xfff0000000000000  /* _lExpMask */
+        .align 32
+        .long 0x0000007f, 0x0000007f, 0x0000007f, 0x0000007f, 0x0000007f, 0x0000007f, 0x0000007f, 0x0000007f          /* _iIndexMask =(2^K-1)*/
+        //common
+        .align 32
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff          /* _iAbsMask */
+        .align 32
+        .long 0x40733a70, 0x40733a70, 0x40733a70, 0x40733a70, 0x40733a70, 0x40733a70, 0x40733a70, 0x40733a70 /* _iDomainRange */
+        .align 32
+        .type	__svml_dexp10_data_internal,@object
+        .size	__svml_dexp10_data_internal,.-__svml_dexp10_data_internal
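The SSE4 and AVX2 kernels above share the same special-values strategy:
the fast path always produces a full vector result, and any lane whose
range-mask bit is set is then recomputed with the scalar exp10 and
patched into the saved result.  A hedged C rendering of that loop (not
part of the patch; the helper name is made up):

  #define _GNU_SOURCE
  #include <math.h>

  /* in/out stand for the argument and result vectors spilled to the
     stack in the assembly; mask is the movmskps/vmovmskps result and
     vlen is 2 (SSE4) or 4 (AVX2).  */
  static void
  fixup_special_lanes (const double *in, double *out,
                       unsigned int mask, int vlen)
  {
    for (int lane = 0; lane < vlen; lane++)
      if (mask & (1u << lane))         /* lane flagged as out of range */
        out[lane] = exp10 (in[lane]);  /* same scalar call, exp10@PLT */
  }

The assembly walks the mask one bit at a time with btl and keeps only
the callee-saved registers it needs live across the call, which is what
the DW_CFA_expression/cfi_escape annotations describe.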
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core-avx2.S
new file mode 100644
index 0000000000..3aff9446d3
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized exp10, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN8v_exp10 _ZGVeN8v_exp10_avx2_wrapper
+#include "../svml_d_exp108_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core.c
new file mode 100644
index 0000000000..d592663169
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized exp10, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN8v_exp10
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN8v_exp10, __GI__ZGVeN8v_exp10, __redirect__ZGVeN8v_exp10)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core_avx512.S
new file mode 100644
index 0000000000..953cb5bc1a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core_avx512.S
@@ -0,0 +1,287 @@
+/* Function exp10 vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *   Typical exp10() implementation, except that:
+ *    - tables are small (16 elements), allowing for fast gathers
+ *    - all arguments processed in the main path
+ *        - final VSCALEF assists branch-free design (correct overflow/underflow and special case responses)
+ *        - a VAND is used to ensure the reduced argument |R|<2, even for large inputs
+ *        - RZ mode used to avoid overflow to +/-Inf for x*log2(10); helps with special case handling
+ *        - SAE used to avoid spurious flag settings
+ *
+ */
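+
+/* A minimal scalar sketch of the main path above (illustrative only:
+   the helper name is made up, libm calls stand in for the Exp_tbl_H
+   lookup and for the degree-6 minimax polynomial, and the special-value
+   mask handled via Threshold below is omitted):
+
+     #include <math.h>
+
+     static double exp10_main_path_sketch (double x)
+     {
+       const double shifter = 0x1.8p+48;             // 2^(52-4) * 1.5 (Shifter)
+       double z  = x * log2 (10.0) + shifter;        // fma with RZ in the kernel
+       double z0 = z - shifter;                      // x*log2(10), 4 frac bits
+       double r  = x - z0 * log10 (2.0);             // L2H/L2L split in the kernel
+       int    j  = (int) ((z0 - floor (z0)) * 16.0); // low 4 bits of the index
+       double th = exp2 (j / 16.0);                  // stands in for Exp_tbl_H[j]
+       double p  = exp (r * log (10.0));             // kernel: th*(1 + r*poly(r))
+       return scalbn (th * p, (int) floor (z0));     // vscalefpd: * 2^floor(z0)
+     }
+ */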
+
+/* Offsets for data table __svml_dexp10_data_internal_avx512
+ */
+#define Exp_tbl_H                     	0
+#define L2E                           	128
+#define Shifter                       	192
+#define L2H                           	256
+#define L2L                           	320
+#define EMask                         	384
+#define poly_coeff6                   	448
+#define poly_coeff5                   	512
+#define poly_coeff4                   	576
+#define poly_coeff3                   	640
+#define poly_coeff2                   	704
+#define poly_coeff1                   	768
+#define AbsMask                       	832
+#define Threshold                     	896
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN8v_exp10_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovups   L2E+__svml_dexp10_data_internal_avx512(%rip), %zmm4
+        vmovups   Shifter+__svml_dexp10_data_internal_avx512(%rip), %zmm2
+        vmovups   L2H+__svml_dexp10_data_internal_avx512(%rip), %zmm5
+        vmovups   L2L+__svml_dexp10_data_internal_avx512(%rip), %zmm3
+
+/* polynomial */
+        vmovups   poly_coeff6+__svml_dexp10_data_internal_avx512(%rip), %zmm6
+        vmovups   poly_coeff4+__svml_dexp10_data_internal_avx512(%rip), %zmm7
+        vmovups   poly_coeff3+__svml_dexp10_data_internal_avx512(%rip), %zmm9
+        vmovups   poly_coeff2+__svml_dexp10_data_internal_avx512(%rip), %zmm8
+        vmovups   poly_coeff1+__svml_dexp10_data_internal_avx512(%rip), %zmm11
+        vmovups   Threshold+__svml_dexp10_data_internal_avx512(%rip), %zmm14
+        vmovaps   %zmm0, %zmm1
+
+/* 2^(52-4)*1.5 + x * log2(10) */
+        vfmadd213pd {rz-sae}, %zmm2, %zmm1, %zmm4
+        vandpd    AbsMask+__svml_dexp10_data_internal_avx512(%rip), %zmm1, %zmm13
+
+/* Z0 ~ x*log2(10), rounded down to 4 fractional bits */
+        vsubpd    {rn-sae}, %zmm2, %zmm4, %zmm0
+
+/* Table lookup: Th */
+        vmovups   __svml_dexp10_data_internal_avx512(%rip), %zmm2
+        vcmppd    $29, {sae}, %zmm14, %zmm13, %k0
+
+/* R = x - Z0*log10(2) */
+        vfnmadd213pd {rn-sae}, %zmm1, %zmm0, %zmm5
+        vpermt2pd Exp_tbl_H+64+__svml_dexp10_data_internal_avx512(%rip), %zmm4, %zmm2
+        kmovw     %k0, %edx
+        vfnmadd231pd {rn-sae}, %zmm0, %zmm3, %zmm5
+        vmovups   poly_coeff5+__svml_dexp10_data_internal_avx512(%rip), %zmm3
+
+/* ensure |R|<2 even for special cases */
+        vandpd    EMask+__svml_dexp10_data_internal_avx512(%rip), %zmm5, %zmm12
+        vmulpd    {rn-sae}, %zmm12, %zmm12, %zmm10
+        vmulpd    {rn-sae}, %zmm12, %zmm2, %zmm15
+        vfmadd231pd {rn-sae}, %zmm12, %zmm6, %zmm3
+        vfmadd231pd {rn-sae}, %zmm12, %zmm7, %zmm9
+        vfmadd231pd {rn-sae}, %zmm12, %zmm8, %zmm11
+        vfmadd213pd {rn-sae}, %zmm9, %zmm10, %zmm3
+        vfmadd213pd {rn-sae}, %zmm11, %zmm10, %zmm3
+        vfmadd213pd {rn-sae}, %zmm2, %zmm15, %zmm3
+        vscalefpd {rn-sae}, %zmm0, %zmm3, %zmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm1, 64(%rsp)
+        vmovups   %zmm0, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     64(%rsp,%r14,8), %xmm0
+        call      exp10@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 128(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN8v_exp10_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_dexp10_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 Exp_tbl_H[16][2];
+        __declspec(align(64)) VUINT32 L2E[8][2];
+        __declspec(align(64)) VUINT32 Shifter[8][2];
+        __declspec(align(64)) VUINT32 L2H[8][2];
+        __declspec(align(64)) VUINT32 L2L[8][2];
+        __declspec(align(64)) VUINT32 EMask[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff1[8][2];
+        __declspec(align(64)) VUINT32 AbsMask[8][2];
+        __declspec(align(64)) VUINT32 Threshold[8][2];
+    } __svml_dexp10_data_internal_avx512;
+#endif
+__svml_dexp10_data_internal_avx512:
+        /*== Exp_tbl_H ==*/
+        .quad 0x3ff0000000000000
+        .quad 0x3ff0b5586cf9890f
+        .quad 0x3ff172b83c7d517b
+        .quad 0x3ff2387a6e756238
+        .quad 0x3ff306fe0a31b715
+        .quad 0x3ff3dea64c123422
+        .quad 0x3ff4bfdad5362a27
+        .quad 0x3ff5ab07dd485429
+        .quad 0x3ff6a09e667f3bcd
+        .quad 0x3ff7a11473eb0187
+        .quad 0x3ff8ace5422aa0db
+        .quad 0x3ff9c49182a3f090
+        .quad 0x3ffae89f995ad3ad
+        .quad 0x3ffc199bdd85529c
+        .quad 0x3ffd5818dcfba487
+        .quad 0x3ffea4afa2a490da
+        /*== log2(10) ==*/
+        .align 64
+        .quad 0x400A934F0979A371, 0x400A934F0979A371, 0x400A934F0979A371, 0x400A934F0979A371, 0x400A934F0979A371, 0x400A934F0979A371, 0x400A934F0979A371, 0x400A934F0979A371
+        /*== Shifter=2^(52-4)*1.5 ==*/
+        .align 64
+        .quad 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0
+        /*== L2H = log10(2)_high ==*/
+        .align 64
+        .quad 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff
+        /*== L2L = log10(2)_low ==*/
+        .align 64
+        .quad 0xbc49dc1da994fd21, 0xbc49dc1da994fd21, 0xbc49dc1da994fd21, 0xbc49dc1da994fd21, 0xbc49dc1da994fd21, 0xbc49dc1da994fd21, 0xbc49dc1da994fd21, 0xbc49dc1da994fd21
+        /*== EMask ==*/
+        .align 64
+        .quad 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff
+        /*== poly_coeff6 ==*/
+        .align 64
+        .quad 0x3fcb137ed8ac2020, 0x3fcb137ed8ac2020, 0x3fcb137ed8ac2020, 0x3fcb137ed8ac2020, 0x3fcb137ed8ac2020, 0x3fcb137ed8ac2020, 0x3fcb137ed8ac2020, 0x3fcb137ed8ac2020
+        /*== poly_coeff5 ==*/
+        .align 64
+        .quad 0x3fe141a8e24f9424, 0x3fe141a8e24f9424, 0x3fe141a8e24f9424, 0x3fe141a8e24f9424, 0x3fe141a8e24f9424, 0x3fe141a8e24f9424, 0x3fe141a8e24f9424, 0x3fe141a8e24f9424
+        /*== poly_coeff4 ==*/
+        .align 64
+        .quad 0x3ff2bd77a0926c9d, 0x3ff2bd77a0926c9d, 0x3ff2bd77a0926c9d, 0x3ff2bd77a0926c9d, 0x3ff2bd77a0926c9d, 0x3ff2bd77a0926c9d, 0x3ff2bd77a0926c9d, 0x3ff2bd77a0926c9d
+        /*== poly_coeff3 ==*/
+        .align 64
+        .quad 0x40004705908704c8, 0x40004705908704c8, 0x40004705908704c8, 0x40004705908704c8, 0x40004705908704c8, 0x40004705908704c8, 0x40004705908704c8, 0x40004705908704c8
+        /*== poly_coeff2 ==*/
+        .align 64
+        .quad 0x40053524c73dfe25, 0x40053524c73dfe25, 0x40053524c73dfe25, 0x40053524c73dfe25, 0x40053524c73dfe25, 0x40053524c73dfe25, 0x40053524c73dfe25, 0x40053524c73dfe25
+        /*== poly_coeff1 ==*/
+        .align 64
+        .quad 0x40026bb1bbb554c2, 0x40026bb1bbb554c2, 0x40026bb1bbb554c2, 0x40026bb1bbb554c2, 0x40026bb1bbb554c2, 0x40026bb1bbb554c2, 0x40026bb1bbb554c2, 0x40026bb1bbb554c2
+        /*== AbsMask ==*/
+        .align 64
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
+        /*== Threshold ==*/
+        .align 64
+        .quad 0x40733A7146F72A41, 0x40733A7146F72A41, 0x40733A7146F72A41, 0x40733A7146F72A41, 0x40733A7146F72A41, 0x40733A7146F72A41, 0x40733A7146F72A41, 0x40733A7146F72A41
+        .align 64
+        .type	__svml_dexp10_data_internal_avx512,@object
+        .size	__svml_dexp10_data_internal_avx512,.-__svml_dexp10_data_internal_avx512
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core-avx2.S
new file mode 100644
index 0000000000..dda41c9c8f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized exp10f, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN16v_exp10f _ZGVeN16v_exp10f_avx2_wrapper
+#include "../svml_s_exp10f16_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core.c
new file mode 100644
index 0000000000..8176a5912b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized exp10f, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN16v_exp10f
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN16v_exp10f, __GI__ZGVeN16v_exp10f,
+	       __redirect__ZGVeN16v_exp10f)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core_avx512.S
new file mode 100644
index 0000000000..fc9309c90f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core_avx512.S
@@ -0,0 +1,269 @@
+/* Function exp10f vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *   Typical exp10() implementation, except that:
+ *    - tables are small (16 elements), allowing for fast gathers
+ *    - all arguments processed in the main path
+ *        - final VSCALEF assists branch-free design (correct overflow/underflow and special case responses)
+ *        - a VAND is used to ensure the reduced argument |R|<2, even for large inputs
+ *        - RZ mode used to avoid overflow to +/-Inf for x*log2(10); helps with special case handling
+ *        - SAE used to avoid spurious flag settings
+ *
+ */
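+
+/* Illustrative note (an assumption drawn from the data section below,
+   not text from the kernel): with 10 fractional index bits kept by the
+   Shifter, 2^(j/1024) is reconstructed from two 32-entry tables as
+   Exp_tbl_H[j >> 5] * Exp_tbl_L[j & 31], so each lookup fits a single
+   two-source vpermt2ps:
+
+     #include <math.h>
+
+     static float exp2_by_1024th_sketch (int j)   // j in [0, 1023]
+     {
+       float hi = exp2f ((j >> 5) / 32.0f);       // stands in for Exp_tbl_H
+       float lo = exp2f ((j & 31) / 1024.0f);     // stands in for Exp_tbl_L
+       return hi * lo;                            // == 2^(j/1024)
+     }
+
+   This would also explain the x != 0 mask below: Exp_tbl_L[0] is
+   0x3f800001 rather than exactly 1.0, so lanes with x == 0 skip the Tl
+   multiply, presumably to keep exp10f (0) == 1.0 exact.  */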
+
+/* Offsets for data table __svml_sexp10_data_internal_avx512
+ */
+#define Exp_tbl_L                     	0
+#define Exp_tbl_H                     	128
+#define L2E                           	256
+#define Shifter                       	320
+#define L2H                           	384
+#define L2L                           	448
+#define EMask                         	512
+#define AbsMask                       	576
+#define Threshold                     	640
+#define poly_coeff2                   	704
+#define poly_coeff1                   	768
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN16v_exp10f_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovups   L2E+__svml_sexp10_data_internal_avx512(%rip), %zmm2
+        vmovups   Shifter+__svml_sexp10_data_internal_avx512(%rip), %zmm1
+        vmovups   L2H+__svml_sexp10_data_internal_avx512(%rip), %zmm5
+        vmovups   L2L+__svml_sexp10_data_internal_avx512(%rip), %zmm4
+
+/* ensure |R|<2 even for special cases */
+        vmovups   EMask+__svml_sexp10_data_internal_avx512(%rip), %zmm6
+        vmovups   poly_coeff2+__svml_sexp10_data_internal_avx512(%rip), %zmm9
+
+/* 2^(23-10)*1.5 + x * log2(10) */
+        vfmadd213ps {rz-sae}, %zmm1, %zmm0, %zmm2
+        vmovups   poly_coeff1+__svml_sexp10_data_internal_avx512(%rip), %zmm10
+        vmovups   __svml_sexp10_data_internal_avx512(%rip), %zmm8
+        vmovups   Exp_tbl_H+__svml_sexp10_data_internal_avx512(%rip), %zmm15
+        vmovups   Threshold+__svml_sexp10_data_internal_avx512(%rip), %zmm13
+        vpsrld    $5, %zmm2, %zmm3
+
+/* Z0 ~ x*log2(10), rounded down to 10 fractional bits */
+        vsubps    {rn-sae}, %zmm1, %zmm2, %zmm1
+        vpermt2ps Exp_tbl_L+64+__svml_sexp10_data_internal_avx512(%rip), %zmm2, %zmm8
+        vpermt2ps Exp_tbl_H+64+__svml_sexp10_data_internal_avx512(%rip), %zmm3, %zmm15
+        vandps    AbsMask+__svml_sexp10_data_internal_avx512(%rip), %zmm0, %zmm12
+
+/* R = x - Z0*log10(2) */
+        vfnmadd213ps {rn-sae}, %zmm0, %zmm1, %zmm5
+        vcmpps    $29, {sae}, %zmm13, %zmm12, %k0
+        vfnmadd231ps {rn-sae}, %zmm1, %zmm4, %zmm5
+        kmovw     %k0, %edx
+        vrangeps  $2, {sae}, %zmm6, %zmm5, %zmm11
+        vfmadd231ps {rn-sae}, %zmm11, %zmm9, %zmm10
+        vmulps    {rn-sae}, %zmm11, %zmm10, %zmm14
+
+/* x!=0? */
+        vpxord    %zmm7, %zmm7, %zmm7
+        vcmpps    $4, {sae}, %zmm7, %zmm0, %k1
+
+/* Th*Tl */
+        vmulps    {rn-sae}, %zmm8, %zmm15, %zmm15{%k1}
+        vfmadd213ps {rn-sae}, %zmm15, %zmm14, %zmm15
+        vscalefps {rn-sae}, %zmm1, %zmm15, %zmm1
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        vmovaps   %zmm1, %zmm0
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm0, 64(%rsp)
+        vmovups   %zmm1, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm1
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $16, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm1
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm1
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     64(%rsp,%r14,4), %xmm0
+        call      exp10f@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 128(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN16v_exp10f_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_sexp10_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 Exp_tbl_L[32][1];
+        __declspec(align(64)) VUINT32 Exp_tbl_H[32][1];
+        __declspec(align(64)) VUINT32 L2E[16][1];
+        __declspec(align(64)) VUINT32 Shifter[16][1];
+        __declspec(align(64)) VUINT32 L2H[16][1];
+        __declspec(align(64)) VUINT32 L2L[16][1];
+        __declspec(align(64)) VUINT32 EMask[16][1];
+        __declspec(align(64)) VUINT32 AbsMask[16][1];
+        __declspec(align(64)) VUINT32 Threshold[16][1];
+        __declspec(align(64)) VUINT32 poly_coeff2[16][1];
+        __declspec(align(64)) VUINT32 poly_coeff1[16][1];
+    } __svml_sexp10_data_internal_avx512;
+#endif
+__svml_sexp10_data_internal_avx512:
+        /*== Exp_tbl_L ==*/
+        .long 0x3f800001, 0x3f801631, 0x3f802c65, 0x3f80429d
+        .long 0x3f8058d9, 0x3f806f18, 0x3f80855c, 0x3f809ba3
+        .long 0x3f80b1ee, 0x3f80c83d, 0x3f80de90, 0x3f80f4e7
+        .long 0x3f810b42, 0x3f8121a0, 0x3f813803, 0x3f814e69
+        .long 0x3f8164d3, 0x3f817b41, 0x3f8191b3, 0x3f81a829
+        .long 0x3f81bea2, 0x3f81d520, 0x3f81eba2, 0x3f820227
+        .long 0x3f8218b0, 0x3f822f3d, 0x3f8245cf, 0x3f825c64
+        .long 0x3f8272fd, 0x3f828999, 0x3f82a03a, 0x3f82b6df
+        /*== Exp_tbl_H ==*/
+        .align 64
+        .long 0x3f800000, 0x3f82cd87, 0x3f85aac3, 0x3f88980f
+        .long 0x3f8b95c2, 0x3f8ea43a, 0x3f91c3d3, 0x3f94f4f0
+        .long 0x3f9837f0, 0x3f9b8d3a, 0x3f9ef532, 0x3fa27043
+        .long 0x3fa5fed7, 0x3fa9a15b, 0x3fad583f, 0x3fb123f6
+        .long 0x3fb504f3, 0x3fb8fbaf, 0x3fbd08a4, 0x3fc12c4d
+        .long 0x3fc5672a, 0x3fc9b9be, 0x3fce248c, 0x3fd2a81e
+        .long 0x3fd744fd, 0x3fdbfbb8, 0x3fe0ccdf, 0x3fe5b907
+        .long 0x3feac0c7, 0x3fefe4ba, 0x3ff5257d, 0x3ffa83b3
+        /*== log2(10) ==*/
+        .align 64
+        .long 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78
+        /*== Shifter=2^(23-10)*1.5 ==*/
+        .align 64
+        .long 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000
+        /*== L2H = log10(2)_high ==*/
+        .align 64
+        .long 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b
+        /*== L2L = log10(2)_low ==*/
+        .align 64
+        .long 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860
+        /*== EMask ==*/
+        .align 64
+        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000
+        /*== AbsMask ==*/
+        .align 64
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
+        /*== Threshold ==*/
+        .align 64
+        .long 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818
+        /*== poly_coeff2 ==*/
+        .align 64
+        .long 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA
+        /*== poly_coeff1 ==*/
+        .align 64
+        .long 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D
+        .align 64
+        .type	__svml_sexp10_data_internal_avx512,@object
+        .size	__svml_sexp10_data_internal_avx512,.-__svml_sexp10_data_internal_avx512
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core-sse2.S
new file mode 100644
index 0000000000..460d01357d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized exp10f, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN4v_exp10f _ZGVbN4v_exp10f_sse2
+#include "../svml_s_exp10f4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core.c
new file mode 100644
index 0000000000..7ce90a9bae
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized exp10f, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN4v_exp10f
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN4v_exp10f, __GI__ZGVbN4v_exp10f,
+	       __redirect__ZGVbN4v_exp10f)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core_sse4.S
new file mode 100644
index 0000000000..879592b789
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core_sse4.S
@@ -0,0 +1,311 @@
+/* Function exp10f vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   exp10(x)  = 2^(x/log10(2)) = 2^n * (1 + T[j]) * (1 + P(y))
+ *   where
+ *        x = m*log10(2)/K + y,  y in [-log10(2)/K..log10(2)/K]
+ *        m = n*K + j,           m,n,j - signed integer, j in [-K/2..K/2]
+ *
+ *        values of 2^j/K are tabulated
+ *
+ *        P(y) is a minimax polynomial approximation of exp10(y)-1
+ *        on small interval [-log10(2)/K..log10(2)/K]
+ *
+ *  Special cases:
+ *
+ *   exp10(NaN)  = NaN
+ *   exp10(+INF) = +INF
+ *   exp10(-INF) = 0
+ *   exp10(x)    = 1 for subnormals
+ *   For IEEE float
+ *     if x >  38.5318412780761720 then exp10f(x) overflow
+ *     if x < -45.4555282592773440 then exp10f(x) underflow
+ *
+ */
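+
+/* A minimal scalar model of the K = 32 reduction described above
+   (illustrative only: the helper name and the rounded polynomial
+   coefficients are assumptions; the kernel uses the exact _sT, _sPC0..2
+   and _sInvLg2_10hi/lo table values and sends out-of-range lanes to the
+   scalar exp10f call):
+
+     #include <math.h>
+
+     static float exp10f_k32_sketch (float x)
+     {
+       const float shifter = 0x1.8p+23f;          // 2^23 * 1.5 (_sShifter)
+       float mf = x * 32.0f * (float) log2 (10.0) + shifter;
+       int   m  = (int) (mf - shifter);           // round (x * 32 * log2 (10))
+       int   j  = m & 31;                         // _iIndexMask: _sT index
+       int   n  = m >> 5;                         // 2^n scale (arithmetic shift)
+       float r  = x - m * (float) (log10 (2.0) / 32.0); // y above
+       float tj = exp2f (j / 32.0f);              // stands in for _sT[j]
+       float p  = r * (2.302585f + 2.650949f * r);// ~ 10^r - 1 (_sPC1, _sPC2)
+       return ldexpf (tj + tj * p, n);            // "quick mul 2^N" in the kernel
+     }
+ */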
+
+/* Offsets for data table __svml_sexp10_data_internal
+ */
+#define _sT                           	0
+#define _sLg2_10                      	128
+#define _sShifter                     	144
+#define _sInvLg2_10hi                 	160
+#define _sInvLg2_10lo                 	176
+#define _sPC0                         	192
+#define _sPC1                         	208
+#define _sPC2                         	224
+#define _iIndexMask                   	240
+#define _iAbsMask                     	256
+#define _iDomainRange                 	272
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN4v_exp10f_sse4)
+        subq      $72, %rsp
+        cfi_def_cfa_offset(80)
+        movaps    %xmm0, %xmm4
+
+/*  Load argument  */
+        movups    _sLg2_10+__svml_sexp10_data_internal(%rip), %xmm2
+        lea       __svml_sexp10_data_internal(%rip), %r8
+        mulps     %xmm4, %xmm2
+        movups    _sShifter+__svml_sexp10_data_internal(%rip), %xmm5
+
+/*  R  */
+        movups    _sInvLg2_10hi+__svml_sexp10_data_internal(%rip), %xmm14
+        addps     %xmm5, %xmm2
+        movaps    %xmm2, %xmm1
+        movups    _sInvLg2_10lo+__svml_sexp10_data_internal(%rip), %xmm15
+        subps     %xmm5, %xmm1
+        mulps     %xmm1, %xmm14
+        movaps    %xmm4, %xmm5
+        mulps     %xmm1, %xmm15
+        subps     %xmm14, %xmm5
+
+/*
+ *  Polynomial
+ * exp10 = 2^N*(Tj+Tj*poly)
+ * poly(sN) = {1+later} a0+a1*sR
+ */
+        movups    _sPC2+__svml_sexp10_data_internal(%rip), %xmm1
+        subps     %xmm15, %xmm5
+        mulps     %xmm5, %xmm1
+        movdqu    _iIndexMask+__svml_sexp10_data_internal(%rip), %xmm3
+
+/*  Index and lookup  */
+        movdqa    %xmm3, %xmm10
+
+/* remove index bits */
+        pandn     %xmm2, %xmm3
+        pand      %xmm2, %xmm10
+
+/*  2^N  */
+        pslld     $18, %xmm3
+
+/* iIndex *= sizeof(S); */
+        pslld     $2, %xmm10
+        addps     _sPC1+__svml_sexp10_data_internal(%rip), %xmm1
+        movd      %xmm10, %edx
+        pshufd    $1, %xmm10, %xmm7
+        pshufd    $2, %xmm10, %xmm9
+        pshufd    $3, %xmm10, %xmm11
+        movd      %xmm7, %ecx
+        movd      %xmm9, %esi
+        movd      %xmm11, %edi
+
+/* Check for overflow/underflow  */
+        movdqu    _iAbsMask+__svml_sexp10_data_internal(%rip), %xmm6
+        pand      %xmm4, %xmm6
+        mulps     %xmm1, %xmm5
+        movslq    %edx, %rdx
+        addps     _sPC0+__svml_sexp10_data_internal(%rip), %xmm5
+        movslq    %ecx, %rcx
+        movslq    %esi, %rsi
+        movslq    %edi, %rdi
+        movd      (%r8,%rdx), %xmm0
+        movd      (%r8,%rcx), %xmm8
+        movd      (%r8,%rsi), %xmm13
+        movd      (%r8,%rdi), %xmm12
+        punpckldq %xmm8, %xmm0
+        punpckldq %xmm12, %xmm13
+        punpcklqdq %xmm13, %xmm0
+
+/* Tj_l+Tj_h*poly */
+        mulps     %xmm0, %xmm5
+        pcmpgtd   _iDomainRange+__svml_sexp10_data_internal(%rip), %xmm6
+        addps     %xmm5, %xmm0
+        movmskps  %xmm6, %eax
+
+/* quick mul 2^N */
+        paddd     %xmm3, %xmm0
+
+/*  Finish   */
+        testl     %eax, %eax
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 eax xmm0 xmm4
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        addq      $72, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(80)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm4, 32(%rsp)
+        movups    %xmm0, 48(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 eax
+
+        xorl      %edx, %edx
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -64)
+        movl      %edx, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -72)
+        movl      %eax, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -64)
+        cfi_offset(13, -72)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      exp10f@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 48(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN4v_exp10f_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_sexp10_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(16)) VUINT32 _sT[(1<<5)][1];
+        __declspec(align(16)) VUINT32 _sLg2_10[4][1];
+        __declspec(align(16)) VUINT32 _sShifter[4][1];
+        __declspec(align(16)) VUINT32 _sInvLg2_10hi[4][1];
+        __declspec(align(16)) VUINT32 _sInvLg2_10lo[4][1];
+        __declspec(align(16)) VUINT32 _sPC0[4][1];
+        __declspec(align(16)) VUINT32 _sPC1[4][1];
+        __declspec(align(16)) VUINT32 _sPC2[4][1];
+        __declspec(align(16)) VUINT32 _iIndexMask[4][1];
+        __declspec(align(16)) VUINT32 _iAbsMask[4][1];
+        __declspec(align(16)) VUINT32 _iDomainRange[4][1];
+} __svml_sexp10_data_internal;
+#endif
+__svml_sexp10_data_internal:
+        /*== _sT ==*/
+        .long 0x3f800000  // 2^( 0 /32 )
+        .long 0x3f82cd87  // 2^( 1 /32 )
+        .long 0x3f85aac3  // 2^( 2 /32 )
+        .long 0x3f88980f  // 2^( 3 /32 )
+        .long 0x3f8b95c2  // 2^( 4 /32 )
+        .long 0x3f8ea43a  // 2^( 5 /32 )
+        .long 0x3f91c3d3  // 2^( 6 /32 )
+        .long 0x3f94f4f0  // 2^( 7 /32 )
+        .long 0x3f9837f0  // 2^( 8 /32 )
+        .long 0x3f9b8d3a  // 2^( 9 /32 )
+        .long 0x3f9ef532  // 2^( 10/32 )
+        .long 0x3fa27043  // 2^( 11/32 )
+        .long 0x3fa5fed7  // 2^( 12/32 )
+        .long 0x3fa9a15b  // 2^( 13/32 )
+        .long 0x3fad583f  // 2^( 14/32 )
+        .long 0x3fb123f6  // 2^( 15/32 )
+        .long 0x3fb504f3  // 2^( 16/32 )
+        .long 0x3fb8fbaf  // 2^( 17/32 )
+        .long 0x3fbd08a4  // 2^( 18/32 )
+        .long 0x3fc12c4d  // 2^( 19/32 )
+        .long 0x3fc5672a  // 2^( 20/32 )
+        .long 0x3fc9b9be  // 2^( 21/32 )
+        .long 0x3fce248c  // 2^( 22/32 )
+        .long 0x3fd2a81e  // 2^( 23/32 )
+        .long 0x3fd744fd  // 2^( 24/32 )
+        .long 0x3fdbfbb8  // 2^( 25/32 )
+        .long 0x3fe0ccdf  // 2^( 26/32 )
+        .long 0x3fe5b907  // 2^( 27/32 )
+        .long 0x3feac0c7  // 2^( 28/32 )
+        .long 0x3fefe4ba  // 2^( 29/32 )
+        .long 0x3ff5257d  // 2^( 30/32 )
+        .long 0x3ffa83b3  // 2^( 31/32 )
+        .align 16
+        .long 0x42d49a78, 0x42d49a78, 0x42d49a78, 0x42d49a78  /* _sLg2_10*2^K   */
+        .align 16
+        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000  /* _sShifter */
+        .align 16
+        .long 0x3c1a2000, 0x3c1a2000, 0x3c1a2000, 0x3c1a2000  /* _sInvLg2_10hi/2^K hi (24-K-7) bits*/
+        .align 16
+        .long 0x341a84fc, 0x341a84fc, 0x341a84fc, 0x341a84fc  /* _sInvLg2_10lo/2^K  lo bits */
+        // otherwise exp10(0) won't produce exact 1.0
+        .align 16
+        .long 0x2fecc868, 0x2fecc868, 0x2fecc868, 0x2fecc868  /* _sPC0 */
+        .align 16
+        .long 0x40135e1b, 0x40135e1b, 0x40135e1b, 0x40135e1b  /* _sPC1 */
+        .align 16
+        .long 0x4029a8d2, 0x4029a8d2, 0x4029a8d2, 0x4029a8d2  /* _sPC2 */
+        .align 16
+        .long 0x0000001f, 0x0000001f, 0x0000001f, 0x0000001f  /* _iIndexMask =(2^K-1)*/
+        //common
+        .align 16
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff   /* _iAbsMask */
+        .align 16
+        .long 0x4217b818, 0x4217b818, 0x4217b818, 0x4217b818   /* _iDomainRange=-log10(max_denormal=0x007fffff) RZ */
+        .align 16
+        .type	__svml_sexp10_data_internal,@object
+        .size	__svml_sexp10_data_internal,.-__svml_sexp10_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core-sse.S
new file mode 100644
index 0000000000..3f3fe252da
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized exp10f, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN8v_exp10f _ZGVdN8v_exp10f_sse_wrapper
+#include "../svml_s_exp10f8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core.c
new file mode 100644
index 0000000000..1f5ed5a59d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized exp10f, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN8v_exp10f
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN8v_exp10f, __GI__ZGVdN8v_exp10f,
+	       __redirect__ZGVdN8v_exp10f)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core_avx2.S
new file mode 100644
index 0000000000..b576412cf1
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core_avx2.S
@@ -0,0 +1,331 @@
+/* Function exp10f vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   exp10(x)  = 2^(x/log10(2)) = 2^n * (1 + T[j]) * (1 + P(y))
+ *   where
+ *        x = m*log10(2)/K + y,  y in [-log10(2)/K..log10(2)/K]
+ *        m = n*K + j,           m,n,j - signed integer, j in [-K/2..K/2]
+ *
+ *        values of 2^j/K are tabulated
+ *
+ *        P(y) is a minimax polynomial approximation of exp10(y)-1
+ *        on small interval [-log10(2)/K..log10(2)/K]
+ *
+ *  Special cases:
+ *
+ *   exp10(NaN)  = NaN
+ *   exp10(+INF) = +INF
+ *   exp10(-INF) = 0
+ *   exp10(x)    = 1 for subnormals
+ *   For IEEE float
+ *     if x >  38.5318412780761720 then exp10f(x) overflow
+ *     if x < -45.4555282592773440 then exp10f(x) underflow
+ *
+ */
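+
+/* Illustrative note (not text from the kernel): the "quick mul 2^N"
+   step below scales the table result by 2^n with one integer add.  The
+   n bits already sit at bit 5 once the index bits are masked off, so
+   vpslld $18 moves them into the float exponent field and vpaddd adds
+   them to the result; a scalar equivalent, valid while the scaled value
+   stays a finite normal number, is:
+
+     #include <stdint.h>
+     #include <string.h>
+
+     static float scale_by_2n_sketch (float t, int n)
+     {
+       uint32_t bits;
+       memcpy (&bits, &t, sizeof bits);    // bit pattern of t
+       bits += (uint32_t) n << 23;         // add n to the biased exponent
+       memcpy (&t, &bits, sizeof t);
+       return t;                           // == t * 2^n
+     }
+ */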
+
+/* Offsets for data table __svml_sexp10_data_internal
+ */
+#define _sT                           	0
+#define _sLg2_10                      	128
+#define _sShifter                     	160
+#define _sInvLg2_10hi                 	192
+#define _sInvLg2_10lo                 	224
+#define _sPC0                         	256
+#define _sPC1                         	288
+#define _sPC2                         	320
+#define _iIndexMask                   	352
+#define _iAbsMask                     	384
+#define _iDomainRange                 	416
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN8v_exp10f_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+        lea       __svml_sexp10_data_internal(%rip), %rax
+        vmovups   _sShifter+__svml_sexp10_data_internal(%rip), %ymm4
+
+/*  Load argument  */
+        vmovups   _sLg2_10+__svml_sexp10_data_internal(%rip), %ymm1
+        vmovups   _iIndexMask+__svml_sexp10_data_internal(%rip), %ymm2
+        vmovaps   %ymm0, %ymm3
+        vfmadd213ps %ymm4, %ymm3, %ymm1
+
+/*  Index and lookup  */
+        vandps    %ymm2, %ymm1, %ymm7
+
+/* iIndex *= sizeof(S); */
+        vpslld    $2, %ymm7, %ymm10
+        vsubps    %ymm4, %ymm1, %ymm0
+
+/* Check for overflow/underflow  */
+        vandps    _iAbsMask+__svml_sexp10_data_internal(%rip), %ymm3, %ymm5
+        vpcmpgtd  _iDomainRange+__svml_sexp10_data_internal(%rip), %ymm5, %ymm6
+        vmovmskps %ymm6, %edx
+        vmovd     %xmm10, %ecx
+        vextractf128 $1, %ymm10, %xmm6
+        vpextrd   $1, %xmm10, %esi
+        vpextrd   $2, %xmm10, %edi
+        vpextrd   $3, %xmm10, %r8d
+        movslq    %ecx, %rcx
+        movslq    %esi, %rsi
+        movslq    %edi, %rdi
+        movslq    %r8d, %r8
+        vmovd     (%rax,%rcx), %xmm8
+        vmovd     (%rax,%rsi), %xmm9
+        vmovd     (%rax,%rdi), %xmm11
+        vmovd     (%rax,%r8), %xmm12
+        vpunpckldq %xmm9, %xmm8, %xmm13
+        vpunpckldq %xmm12, %xmm11, %xmm14
+        vpunpcklqdq %xmm14, %xmm13, %xmm15
+
+/*  R  */
+        vmovups   _sInvLg2_10hi+__svml_sexp10_data_internal(%rip), %ymm13
+        vmovd     %xmm6, %r9d
+        vfnmadd213ps %ymm3, %ymm0, %ymm13
+        vpextrd   $1, %xmm6, %r10d
+        movslq    %r9d, %r9
+        movslq    %r10d, %r10
+        vfnmadd132ps _sInvLg2_10lo+__svml_sexp10_data_internal(%rip), %ymm13, %ymm0
+        vmovd     (%rax,%r9), %xmm4
+        vmovd     (%rax,%r10), %xmm5
+        vpunpckldq %xmm5, %xmm4, %xmm9
+
+/*
+ *  Polynomial
+ * exp10 = 2^N*(Tj+Tj*poly)
+ * poly(sN) = {1+later} a0+a1*sR
+ */
+        vmovups   _sPC2+__svml_sexp10_data_internal(%rip), %ymm4
+        vfmadd213ps _sPC1+__svml_sexp10_data_internal(%rip), %ymm0, %ymm4
+        vpextrd   $2, %xmm6, %r11d
+        vpextrd   $3, %xmm6, %ecx
+        movslq    %r11d, %r11
+        movslq    %ecx, %rcx
+        vfmadd213ps _sPC0+__svml_sexp10_data_internal(%rip), %ymm0, %ymm4
+        vmovd     (%rax,%r11), %xmm7
+        vmovd     (%rax,%rcx), %xmm8
+        vpunpckldq %xmm8, %xmm7, %xmm11
+
+/* remove index bits */
+        vpandn    %ymm1, %ymm2, %ymm0
+        vpunpcklqdq %xmm11, %xmm9, %xmm12
+
+/*  2^N  */
+        vpslld    $18, %ymm0, %ymm1
+        vinsertf128 $1, %xmm12, %ymm15, %ymm14
+
+/* Tj_l+Tj_h*poly */
+        vfmadd213ps %ymm14, %ymm14, %ymm4
+
+/* quick mul 2^N */
+        vpaddd    %ymm1, %ymm4, %ymm0
+
+/*  Finish   */
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm3
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %ymm3, 32(%rsp)
+        vmovups   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      exp10f@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 64(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN8v_exp10f_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_sexp10_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(32)) VUINT32 _sT[(1<<5)][1];
+        __declspec(align(32)) VUINT32 _sLg2_10[8][1];
+        __declspec(align(32)) VUINT32 _sShifter[8][1];
+        __declspec(align(32)) VUINT32 _sInvLg2_10hi[8][1];
+        __declspec(align(32)) VUINT32 _sInvLg2_10lo[8][1];
+        __declspec(align(32)) VUINT32 _sPC0[8][1];
+        __declspec(align(32)) VUINT32 _sPC1[8][1];
+        __declspec(align(32)) VUINT32 _sPC2[8][1];
+        __declspec(align(32)) VUINT32 _iIndexMask[8][1];
+        __declspec(align(32)) VUINT32 _iAbsMask[8][1];
+        __declspec(align(32)) VUINT32 _iDomainRange[8][1];
+} __svml_sexp10_data_internal;
+#endif
+__svml_sexp10_data_internal:
+        /*== _sT ==*/
+        .long 0x3f800000  // 2^( 0 /32 )
+        .long 0x3f82cd87  // 2^( 1 /32 )
+        .long 0x3f85aac3  // 2^( 2 /32 )
+        .long 0x3f88980f  // 2^( 3 /32 )
+        .long 0x3f8b95c2  // 2^( 4 /32 )
+        .long 0x3f8ea43a  // 2^( 5 /32 )
+        .long 0x3f91c3d3  // 2^( 6 /32 )
+        .long 0x3f94f4f0  // 2^( 7 /32 )
+        .long 0x3f9837f0  // 2^( 8 /32 )
+        .long 0x3f9b8d3a  // 2^( 9 /32 )
+        .long 0x3f9ef532  // 2^( 10/32 )
+        .long 0x3fa27043  // 2^( 11/32 )
+        .long 0x3fa5fed7  // 2^( 12/32 )
+        .long 0x3fa9a15b  // 2^( 13/32 )
+        .long 0x3fad583f  // 2^( 14/32 )
+        .long 0x3fb123f6  // 2^( 15/32 )
+        .long 0x3fb504f3  // 2^( 16/32 )
+        .long 0x3fb8fbaf  // 2^( 17/32 )
+        .long 0x3fbd08a4  // 2^( 18/32 )
+        .long 0x3fc12c4d  // 2^( 19/32 )
+        .long 0x3fc5672a  // 2^( 20/32 )
+        .long 0x3fc9b9be  // 2^( 21/32 )
+        .long 0x3fce248c  // 2^( 22/32 )
+        .long 0x3fd2a81e  // 2^( 23/32 )
+        .long 0x3fd744fd  // 2^( 24/32 )
+        .long 0x3fdbfbb8  // 2^( 25/32 )
+        .long 0x3fe0ccdf  // 2^( 26/32 )
+        .long 0x3fe5b907  // 2^( 27/32 )
+        .long 0x3feac0c7  // 2^( 28/32 )
+        .long 0x3fefe4ba  // 2^( 29/32 )
+        .long 0x3ff5257d  // 2^( 30/32 )
+        .long 0x3ffa83b3  // 2^( 31/32 )
+        .align 32
+        .long 0x42d49a78, 0x42d49a78, 0x42d49a78, 0x42d49a78, 0x42d49a78, 0x42d49a78, 0x42d49a78, 0x42d49a78  /* _sLg2_10*2^K   */
+        .align 32
+        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000  /* _sShifter */
+        .align 32
+        .long 0x3c1a2000, 0x3c1a2000, 0x3c1a2000, 0x3c1a2000, 0x3c1a2000, 0x3c1a2000, 0x3c1a2000, 0x3c1a2000  /* _sInvLg2_10hi/2^K hi (24-K-7) bits*/
+        .align 32
+        .long 0x341a84fc, 0x341a84fc, 0x341a84fc, 0x341a84fc, 0x341a84fc, 0x341a84fc, 0x341a84fc, 0x341a84fc  /* _sInvLg2_10lo/2^K  lo bits */
+        // otherwise exp10(0) won't produce exact 1.0
+        .align 32
+        .long 0x2fecc868, 0x2fecc868, 0x2fecc868, 0x2fecc868, 0x2fecc868, 0x2fecc868, 0x2fecc868, 0x2fecc868  /* _sPC0 */
+        .align 32
+        .long 0x40135e1b, 0x40135e1b, 0x40135e1b, 0x40135e1b, 0x40135e1b, 0x40135e1b, 0x40135e1b, 0x40135e1b  /* _sPC1 */
+        .align 32
+        .long 0x4029a8d2, 0x4029a8d2, 0x4029a8d2, 0x4029a8d2, 0x4029a8d2, 0x4029a8d2, 0x4029a8d2, 0x4029a8d2  /* _sPC2 */
+        .align 32
+        .long 0x0000001f, 0x0000001f, 0x0000001f, 0x0000001f, 0x0000001f, 0x0000001f, 0x0000001f, 0x0000001f  /* _iIndexMask =(2^K-1)*/
+        //common
+        .align 32
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff   /* _iAbsMask */
+        .align 32
+        .long 0x4217b818, 0x4217b818, 0x4217b818, 0x4217b818, 0x4217b818, 0x4217b818, 0x4217b818, 0x4217b818   /* _iDomainRange=-log10(max_denormal=0x007fffff) RZ */
+        .align 32
+        .type	__svml_sexp10_data_internal,@object
+        .size	__svml_sexp10_data_internal,.-__svml_sexp10_data_internal
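
The special-input path above (L(RANGEMASK_CHECK), L(SPECIAL_VALUES_LOOP) and
L(SCALAR_MATH_CALL)) amounts to the scalar fallback loop sketched below in C.
This is an illustration only, not code from the patch; the names src, dst,
mask and lanes are placeholders, and the 8-lane count matches the AVX2
exp10f kernel.

#define _GNU_SOURCE
#include <math.h>

/* For every lane whose bit is set in the range mask (the btl %r12d, %r13d
   test above), recompute that element with the scalar exp10f and overwrite
   the vector result that was spilled to the stack.  */
void
special_values_fallback (const float *src, float *dst, unsigned int mask,
                         int lanes)
{
  for (int i = 0; i < lanes; i++)
    if (mask & (1u << i))
      dst[i] = exp10f (src[i]);
}
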
diff --git a/sysdeps/x86_64/fpu/svml_d_exp102_core.S b/sysdeps/x86_64/fpu/svml_d_exp102_core.S
new file mode 100644
index 0000000000..157fb3b7c0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_exp102_core.S
@@ -0,0 +1,29 @@
+/* Function exp10 vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN2v_exp10)
+WRAPPER_IMPL_SSE2 exp10
+END (_ZGVbN2v_exp10)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN2v_exp10)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_exp104_core.S b/sysdeps/x86_64/fpu/svml_d_exp104_core.S
new file mode 100644
index 0000000000..9b9d0a5d4b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_exp104_core.S
@@ -0,0 +1,29 @@
+/* Function exp10 vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN4v_exp10)
+WRAPPER_IMPL_AVX _ZGVbN2v_exp10
+END (_ZGVdN4v_exp10)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN4v_exp10)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_exp104_core_avx.S b/sysdeps/x86_64/fpu/svml_d_exp104_core_avx.S
new file mode 100644
index 0000000000..1ba1a819ed
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_exp104_core_avx.S
@@ -0,0 +1,25 @@
+/* Function exp10 vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN4v_exp10)
+WRAPPER_IMPL_AVX _ZGVbN2v_exp10
+END (_ZGVcN4v_exp10)
diff --git a/sysdeps/x86_64/fpu/svml_d_exp108_core.S b/sysdeps/x86_64/fpu/svml_d_exp108_core.S
new file mode 100644
index 0000000000..a530dc12de
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_exp108_core.S
@@ -0,0 +1,25 @@
+/* Function exp10 vectorized with AVX-512, wrapper to AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN8v_exp10)
+WRAPPER_IMPL_AVX512 _ZGVdN4v_exp10
+END (_ZGVeN8v_exp10)
diff --git a/sysdeps/x86_64/fpu/svml_s_exp10f16_core.S b/sysdeps/x86_64/fpu/svml_s_exp10f16_core.S
new file mode 100644
index 0000000000..e5043bc875
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_exp10f16_core.S
@@ -0,0 +1,25 @@
+/* Function exp10f vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN16v_exp10f)
+WRAPPER_IMPL_AVX512 _ZGVdN8v_exp10f
+END (_ZGVeN16v_exp10f)
diff --git a/sysdeps/x86_64/fpu/svml_s_exp10f4_core.S b/sysdeps/x86_64/fpu/svml_s_exp10f4_core.S
new file mode 100644
index 0000000000..75e6637a82
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_exp10f4_core.S
@@ -0,0 +1,29 @@
+/* Function exp10f vectorized with SSE2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN4v_exp10f)
+WRAPPER_IMPL_SSE2 exp10f
+END (_ZGVbN4v_exp10f)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4v_exp10f)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_exp10f8_core.S b/sysdeps/x86_64/fpu/svml_s_exp10f8_core.S
new file mode 100644
index 0000000000..d481d2dee9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_exp10f8_core.S
@@ -0,0 +1,29 @@
+/* Function exp10f vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN8v_exp10f)
+WRAPPER_IMPL_AVX _ZGVbN4v_exp10f
+END (_ZGVdN8v_exp10f)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8v_exp10f)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_exp10f8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_exp10f8_core_avx.S
new file mode 100644
index 0000000000..65944bd4d2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_exp10f8_core_avx.S
@@ -0,0 +1,25 @@
+/* Function exp10f vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+        .text
+ENTRY (_ZGVcN8v_exp10f)
+WRAPPER_IMPL_AVX _ZGVbN4v_exp10f
+END (_ZGVcN8v_exp10f)
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx.c
new file mode 100644
index 0000000000..7cdda9895b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-exp10.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx2.c
new file mode 100644
index 0000000000..7cdda9895b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx2.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-exp10.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx512f.c
new file mode 100644
index 0000000000..7cdda9895b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx512f.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-exp10.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp10.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp10.c
new file mode 100644
index 0000000000..b1461ed85e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp10.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE double
+#define LIBMVEC_FUNC exp10
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index 2f7172bd7b..256e8f07c9 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -32,6 +32,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVbN2v_atan)
 VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVbN2v_asin)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVbN2vv_hypot)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVbN2v_exp2)
+VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVbN2v_exp10)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index e2d519faac..9de1dab2c2 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVdN4v_atan)
 VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVdN4v_asin)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVdN4vv_hypot)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVdN4v_exp2)
+VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVdN4v_exp10)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m256i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index 1ce4d8b413..43865ab099 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -32,6 +32,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVcN4v_atan)
 VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVcN4v_asin)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVcN4vv_hypot)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVcN4v_exp2)
+VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVcN4v_exp10)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index 6c87cec648..5dbdacf617 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -32,6 +32,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVeN8v_atan)
 VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVeN8v_asin)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVeN8vv_hypot)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVeN8v_exp2)
+VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVeN8v_exp10)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx.c
new file mode 100644
index 0000000000..be3cdaa80d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-exp10f.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx2.c
new file mode 100644
index 0000000000..be3cdaa80d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx2.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-exp10f.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx512f.c
new file mode 100644
index 0000000000..be3cdaa80d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx512f.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-exp10f.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp10f.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f.c
new file mode 100644
index 0000000000..06f447eb8d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE float
+#define LIBMVEC_FUNC exp10f
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index 597d7d7598..c159c8f583 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -32,6 +32,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVeN16v_atanf)
 VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVeN16v_asinf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVeN16vv_hypotf)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVeN16v_exp2f)
+VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVeN16v_exp10f)
 
 #define VEC_INT_TYPE __m512i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index 3500eec810..c745ef744a 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -32,6 +32,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVbN4v_atanf)
 VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVbN4v_asinf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVbN4vv_hypotf)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVbN4v_exp2f)
+VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVbN4v_exp10f)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index 921b9c65d6..c9226cf4dc 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVdN8v_atanf)
 VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVdN8v_asinf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVdN8vv_hypotf)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVdN8v_exp2f)
+VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVdN8v_exp10f)
 
 /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
 #undef VECTOR_WRAPPER_fFF
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index 6cbcb57521..92970c5ace 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -32,6 +32,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVcN8v_atanf)
 VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVcN8v_asinf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVcN8vv_hypotf)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVcN8v_exp2f)
+VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVcN8v_exp10f)
 
 #define VEC_INT_TYPE __m128i
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 06/18] x86-64: Add vector cosh/coshf implementation to libmvec
  2021-12-29  6:39 [PATCH v5 00/18] x86-64: Add vector math functions to libmvec Sunil K Pandey
                   ` (4 preceding siblings ...)
  2021-12-29  6:39 ` [PATCH v5 05/18] x86-64: Add vector exp10/exp10f " Sunil K Pandey
@ 2021-12-29  6:39 ` Sunil K Pandey
  2021-12-29 21:26   ` H.J. Lu
  2021-12-29  6:39 ` [PATCH v5 07/18] x86-64: Add vector expm1/expm1f " Sunil K Pandey
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Sunil K Pandey @ 2021-12-29  6:39 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, andrey.kolesov, marius.cornea

Implement vectorized cosh/coshf for libmvec, with SSE, AVX, AVX2 and
AVX512 versions, as required by the vector ABI.  The patch also adds
accuracy and ABI tests for vector cosh/coshf, with regenerated ulps.
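
A minimal usage sketch (not part of the patch): with the SIMD declarations
this patch adds to math-vector.h visible, GCC can map a loop such as the
one below onto the new vector entry points (e.g. _ZGVdN4v_cosh for an AVX2
build) when compiled with options along the lines of
-O2 -ffast-math -fopenmp-simd -mavx2.  The array names and sizes here are
illustrative.

#include <math.h>

#define N 1024
double vals[N], res[N];

/* cosh is declared as an OpenMP SIMD function, so the compiler may
   replace N scalar cosh calls with calls to a vector variant.  */
void
cosh_array (void)
{
#pragma omp simd
  for (int i = 0; i < N; i++)
    res[i] = cosh (vals[i]);
}
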
---
 bits/libm-simd-decl-stubs.h                   |  11 +
 math/bits/mathcalls.h                         |   2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
 sysdeps/x86/fpu/bits/math-vector.h            |   4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
 sysdeps/x86_64/fpu/Makeconfig                 |   1 +
 sysdeps/x86_64/fpu/Versions                   |   2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |  20 +
 .../fpu/multiarch/svml_d_cosh2_core-sse2.S    |  20 +
 .../x86_64/fpu/multiarch/svml_d_cosh2_core.c  |  27 ++
 .../fpu/multiarch/svml_d_cosh2_core_sse4.S    | 396 +++++++++++++++++
 .../fpu/multiarch/svml_d_cosh4_core-sse.S     |  20 +
 .../x86_64/fpu/multiarch/svml_d_cosh4_core.c  |  27 ++
 .../fpu/multiarch/svml_d_cosh4_core_avx2.S    | 412 ++++++++++++++++++
 .../fpu/multiarch/svml_d_cosh8_core-avx2.S    |  20 +
 .../x86_64/fpu/multiarch/svml_d_cosh8_core.c  |  27 ++
 .../fpu/multiarch/svml_d_cosh8_core_avx512.S  | 323 ++++++++++++++
 .../fpu/multiarch/svml_s_coshf16_core-avx2.S  |  20 +
 .../fpu/multiarch/svml_s_coshf16_core.c       |  28 ++
 .../multiarch/svml_s_coshf16_core_avx512.S    | 321 ++++++++++++++
 .../fpu/multiarch/svml_s_coshf4_core-sse2.S   |  20 +
 .../x86_64/fpu/multiarch/svml_s_coshf4_core.c |  28 ++
 .../fpu/multiarch/svml_s_coshf4_core_sse4.S   | 305 +++++++++++++
 .../fpu/multiarch/svml_s_coshf8_core-sse.S    |  20 +
 .../x86_64/fpu/multiarch/svml_s_coshf8_core.c |  28 ++
 .../fpu/multiarch/svml_s_coshf8_core_avx2.S   | 308 +++++++++++++
 sysdeps/x86_64/fpu/svml_d_cosh2_core.S        |  29 ++
 sysdeps/x86_64/fpu/svml_d_cosh4_core.S        |  29 ++
 sysdeps/x86_64/fpu/svml_d_cosh4_core_avx.S    |  25 ++
 sysdeps/x86_64/fpu/svml_d_cosh8_core.S        |  25 ++
 sysdeps/x86_64/fpu/svml_s_coshf16_core.S      |  25 ++
 sysdeps/x86_64/fpu/svml_s_coshf4_core.S       |  29 ++
 sysdeps/x86_64/fpu/svml_s_coshf8_core.S       |  29 ++
 sysdeps/x86_64/fpu/svml_s_coshf8_core_avx.S   |  25 ++
 .../x86_64/fpu/test-double-libmvec-cosh-avx.c |   1 +
 .../fpu/test-double-libmvec-cosh-avx2.c       |   1 +
 .../fpu/test-double-libmvec-cosh-avx512f.c    |   1 +
 sysdeps/x86_64/fpu/test-double-libmvec-cosh.c |   3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-libmvec-coshf-avx.c |   1 +
 .../fpu/test-float-libmvec-coshf-avx2.c       |   1 +
 .../fpu/test-float-libmvec-coshf-avx512f.c    |   1 +
 sysdeps/x86_64/fpu/test-float-libmvec-coshf.c |   3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
 50 files changed, 2637 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_cosh2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_cosh4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_cosh4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_cosh8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_coshf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_coshf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_coshf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_coshf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cosh.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-coshf.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
index bc18621f17..35c6ac57a8 100644
--- a/bits/libm-simd-decl-stubs.h
+++ b/bits/libm-simd-decl-stubs.h
@@ -164,4 +164,15 @@
 #define __DECL_SIMD_exp10f32x
 #define __DECL_SIMD_exp10f64x
 #define __DECL_SIMD_exp10f128x
+
+#define __DECL_SIMD_cosh
+#define __DECL_SIMD_coshf
+#define __DECL_SIMD_coshl
+#define __DECL_SIMD_coshf16
+#define __DECL_SIMD_coshf32
+#define __DECL_SIMD_coshf64
+#define __DECL_SIMD_coshf128
+#define __DECL_SIMD_coshf32x
+#define __DECL_SIMD_coshf64x
+#define __DECL_SIMD_coshf128x
 #endif
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index 870778457f..60a314f69e 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -68,7 +68,7 @@ __MATHCALL (tan,, (_Mdouble_ __x));
 /* Hyperbolic functions.  */
 
 /* Hyperbolic cosine of X.  */
-__MATHCALL (cosh,, (_Mdouble_ __x));
+__MATHCALL_VEC (cosh,, (_Mdouble_ __x));
 /* Hyperbolic sine of X.  */
 __MATHCALL (sinh,, (_Mdouble_ __x));
 /* Hyperbolic tangent of X.  */
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index b3c1f59593..4907680143 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -49,48 +49,56 @@ GLIBC_2.22 _ZGVeN8vvv_sincos F
 GLIBC_2.35 _ZGVbN2v_acos F
 GLIBC_2.35 _ZGVbN2v_asin F
 GLIBC_2.35 _ZGVbN2v_atan F
+GLIBC_2.35 _ZGVbN2v_cosh F
 GLIBC_2.35 _ZGVbN2v_exp10 F
 GLIBC_2.35 _ZGVbN2v_exp2 F
 GLIBC_2.35 _ZGVbN2vv_hypot F
 GLIBC_2.35 _ZGVbN4v_acosf F
 GLIBC_2.35 _ZGVbN4v_asinf F
 GLIBC_2.35 _ZGVbN4v_atanf F
+GLIBC_2.35 _ZGVbN4v_coshf F
 GLIBC_2.35 _ZGVbN4v_exp10f F
 GLIBC_2.35 _ZGVbN4v_exp2f F
 GLIBC_2.35 _ZGVbN4vv_hypotf F
 GLIBC_2.35 _ZGVcN4v_acos F
 GLIBC_2.35 _ZGVcN4v_asin F
 GLIBC_2.35 _ZGVcN4v_atan F
+GLIBC_2.35 _ZGVcN4v_cosh F
 GLIBC_2.35 _ZGVcN4v_exp10 F
 GLIBC_2.35 _ZGVcN4v_exp2 F
 GLIBC_2.35 _ZGVcN4vv_hypot F
 GLIBC_2.35 _ZGVcN8v_acosf F
 GLIBC_2.35 _ZGVcN8v_asinf F
 GLIBC_2.35 _ZGVcN8v_atanf F
+GLIBC_2.35 _ZGVcN8v_coshf F
 GLIBC_2.35 _ZGVcN8v_exp10f F
 GLIBC_2.35 _ZGVcN8v_exp2f F
 GLIBC_2.35 _ZGVcN8vv_hypotf F
 GLIBC_2.35 _ZGVdN4v_acos F
 GLIBC_2.35 _ZGVdN4v_asin F
 GLIBC_2.35 _ZGVdN4v_atan F
+GLIBC_2.35 _ZGVdN4v_cosh F
 GLIBC_2.35 _ZGVdN4v_exp10 F
 GLIBC_2.35 _ZGVdN4v_exp2 F
 GLIBC_2.35 _ZGVdN4vv_hypot F
 GLIBC_2.35 _ZGVdN8v_acosf F
 GLIBC_2.35 _ZGVdN8v_asinf F
 GLIBC_2.35 _ZGVdN8v_atanf F
+GLIBC_2.35 _ZGVdN8v_coshf F
 GLIBC_2.35 _ZGVdN8v_exp10f F
 GLIBC_2.35 _ZGVdN8v_exp2f F
 GLIBC_2.35 _ZGVdN8vv_hypotf F
 GLIBC_2.35 _ZGVeN16v_acosf F
 GLIBC_2.35 _ZGVeN16v_asinf F
 GLIBC_2.35 _ZGVeN16v_atanf F
+GLIBC_2.35 _ZGVeN16v_coshf F
 GLIBC_2.35 _ZGVeN16v_exp10f F
 GLIBC_2.35 _ZGVeN16v_exp2f F
 GLIBC_2.35 _ZGVeN16vv_hypotf F
 GLIBC_2.35 _ZGVeN8v_acos F
 GLIBC_2.35 _ZGVeN8v_asin F
 GLIBC_2.35 _ZGVeN8v_atan F
+GLIBC_2.35 _ZGVeN8v_cosh F
 GLIBC_2.35 _ZGVeN8v_exp10 F
 GLIBC_2.35 _ZGVeN8v_exp2 F
 GLIBC_2.35 _ZGVeN8vv_hypot F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index f3f9c2e092..708e81b3d0 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -82,6 +82,10 @@
 #  define __DECL_SIMD_exp10 __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_exp10f
 #  define __DECL_SIMD_exp10f __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_cosh
+#  define __DECL_SIMD_cosh __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_coshf
+#  define __DECL_SIMD_coshf __DECL_SIMD_x86_64
 
 # endif
 #endif
diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
index c033abbedc..81d0238ebf 100644
--- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
@@ -40,6 +40,8 @@
 !GCC$ builtin (exp2f) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (exp10) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (exp10f) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (cosh) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (coshf) attributes simd (notinbranch) if('x86_64')
 
 !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
@@ -65,3 +67,5 @@
 !GCC$ builtin (exp2f) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (exp10) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (exp10f) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (cosh) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (coshf) attributes simd (notinbranch) if('x32')
diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
index fd0a9da439..5bc2df134f 100644
--- a/sysdeps/x86_64/fpu/Makeconfig
+++ b/sysdeps/x86_64/fpu/Makeconfig
@@ -26,6 +26,7 @@ libmvec-funcs = \
   asin \
   atan \
   cos \
+  cosh \
   exp \
   exp10 \
   exp2 \
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index f29cfa4cbf..53346d16a2 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -17,12 +17,14 @@ libmvec {
     _ZGVbN2v_acos; _ZGVcN4v_acos; _ZGVdN4v_acos; _ZGVeN8v_acos;
     _ZGVbN2v_asin; _ZGVcN4v_asin; _ZGVdN4v_asin; _ZGVeN8v_asin;
     _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan;
+    _ZGVbN2v_cosh; _ZGVcN4v_cosh; _ZGVdN4v_cosh; _ZGVeN8v_cosh;
     _ZGVbN2v_exp10; _ZGVcN4v_exp10; _ZGVdN4v_exp10; _ZGVeN8v_exp10;
     _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
     _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
     _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
     _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
     _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf;
+    _ZGVbN4v_coshf; _ZGVcN8v_coshf; _ZGVdN8v_coshf; _ZGVeN16v_coshf;
     _ZGVbN4v_exp10f; _ZGVcN8v_exp10f; _ZGVdN8v_exp10f; _ZGVeN16v_exp10f;
     _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
     _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf;
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 45f2e4bb53..ac70f15208 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -891,6 +891,26 @@ float: 2
 float128: 3
 ldouble: 3
 
+Function: "cosh_vlen16":
+float: 2
+
+Function: "cosh_vlen2":
+double: 2
+
+Function: "cosh_vlen4":
+double: 2
+float: 2
+
+Function: "cosh_vlen4_avx2":
+double: 2
+
+Function: "cosh_vlen8":
+double: 2
+float: 2
+
+Function: "cosh_vlen8_avx2":
+float: 2
+
 Function: Real part of "cpow":
 double: 2
 float: 5
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core-sse2.S
new file mode 100644
index 0000000000..bfe4e3d0f0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized cosh, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN2v_cosh _ZGVbN2v_cosh_sse2
+#include "../svml_d_cosh2_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core.c
new file mode 100644
index 0000000000..99561fea47
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized cosh, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN2v_cosh
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN2v_cosh, __GI__ZGVbN2v_cosh, __redirect__ZGVbN2v_cosh)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core_sse4.S
new file mode 100644
index 0000000000..150bfae7e1
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core_sse4.S
@@ -0,0 +1,396 @@
+/* Function cosh vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute cosh(x) as (exp(x)+exp(-x))/2,
+ *   where exp is calculated as
+ *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
+ *
+ *   Special cases:
+ *
+ *   cosh(NaN) = quiet NaN, and raise invalid exception
+ *   cosh(INF) = that INF
+ *   cosh(0)   = 1
+ *   cosh(x) overflows for big x (roughly |x| > MAXLOG + log(2))
+ *
+ */
+
+/* Offsets for data table __svml_dcosh_data_internal
+ */
+#define _dbT                          	0
+#define _dbInvLn2                     	2064
+#define _dbLn2hi                      	2080
+#define _dbLn2lo                      	2096
+#define _dbShifter                    	2112
+#define _iIndexMask                   	2128
+#define _dPC2                         	2144
+#define _dPC3                         	2160
+#define _dPC4                         	2176
+#define _iMaxIndex                    	2192
+#define _lExpMask                     	2208
+#define _dSign                        	2224
+#define _iDomainRange                 	2240
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN2v_cosh_sse4)
+        subq      $72, %rsp
+        cfi_def_cfa_offset(80)
+        movaps    %xmm0, %xmm4
+        movups    _dSign+__svml_dcosh_data_internal(%rip), %xmm2
+        lea       _dbT+__svml_dcosh_data_internal(%rip), %r8
+
+/*  Abs argument  */
+        movaps    %xmm2, %xmm5
+
+/* dXSign=0x0010000000000000 */
+        psrlq     $11, %xmm2
+
+/*
+ *  Load argument
+ * dM = x*2^K/log(2) + RShifter
+ */
+        movups    _dbInvLn2+__svml_dcosh_data_internal(%rip), %xmm3
+        andnps    %xmm4, %xmm5
+        mulpd     %xmm5, %xmm3
+        movups    _dbShifter+__svml_dcosh_data_internal(%rip), %xmm1
+        addpd     %xmm1, %xmm3
+
+/*
+ *  R
+ * dN = dM - RShifter
+ */
+        movaps    %xmm3, %xmm15
+        subpd     %xmm1, %xmm15
+
+/* dR = dX - dN*Log2_hi/2^K */
+        movups    _dbLn2hi+__svml_dcosh_data_internal(%rip), %xmm14
+        mulpd     %xmm15, %xmm14
+
+/* dR = (dX - dN*Log2_hi/2^K) - dN*Log2_lo/2^K */
+        movups    _dbLn2lo+__svml_dcosh_data_internal(%rip), %xmm1
+        mulpd     %xmm15, %xmm1
+
+/*
+ * Check for overflow/underflow
+ *
+ */
+        pshufd    $221, %xmm5, %xmm7
+        subpd     %xmm14, %xmm5
+        movq      _iIndexMask+__svml_dcosh_data_internal(%rip), %xmm8
+
+/*  Index and lookup  */
+        pshufd    $136, %xmm3, %xmm9
+
+/*
+ *  G1,G2,G3: dTdif,dTn * 2^N,2^(-N)
+ * NB: copied from sinh_la - to be optimized!!!!!
+ */
+        psllq     $44, %xmm3
+
+/*
+ * trick
+ * 256=-iIndex
+ */
+        movq      _iMaxIndex+__svml_dcosh_data_internal(%rip), %xmm12
+        pand      %xmm8, %xmm9
+        subpd     %xmm1, %xmm5
+        psubd     %xmm9, %xmm12
+
+/* iIndex <<= 3 (scale index to byte offset) */
+        movdqa    %xmm9, %xmm10
+
+/* (iMaxIndex - iIndex) <<= 3 (scale to byte offset) */
+        pslld     $3, %xmm12
+        pslld     $3, %xmm10
+        movd      %xmm12, %esi
+        pshufd    $1, %xmm12, %xmm13
+        movq      _iDomainRange+__svml_dcosh_data_internal(%rip), %xmm6
+        movd      %xmm13, %edi
+        pcmpgtd   %xmm6, %xmm7
+        movmskps  %xmm7, %eax
+
+/* dR2 = dR^2 */
+        movaps    %xmm5, %xmm7
+
+/* lM now is an EXP(2^N) */
+        pand      _lExpMask+__svml_dcosh_data_internal(%rip), %xmm3
+        pshufd    $1, %xmm10, %xmm11
+        movslq    %esi, %rsi
+        mulpd     %xmm5, %xmm7
+        movd      %xmm10, %edx
+        movsd     (%r8,%rsi), %xmm6
+        movd      %xmm11, %ecx
+        movslq    %edi, %rdi
+        movslq    %edx, %rdx
+        movslq    %ecx, %rcx
+        movhpd    (%r8,%rdi), %xmm6
+
+/*  */
+        psubq     %xmm3, %xmm6
+
+/* lX- = EXP(1/2) */
+        psubq     %xmm2, %xmm6
+
+/*
+ * sinh(r) = r +r*r^2*a3 ....
+ * dSinh_r = r^2*a3
+ */
+        movups    _dPC3+__svml_dcosh_data_internal(%rip), %xmm2
+        mulpd     %xmm7, %xmm2
+
+/* dSinh_r = r + r*r^2*a3 */
+        mulpd     %xmm5, %xmm2
+        movsd     (%r8,%rdx), %xmm0
+        movhpd    (%r8,%rcx), %xmm0
+        paddq     %xmm3, %xmm0
+        addpd     %xmm2, %xmm5
+
+/* dTn = dTn*2^N - dTn*2^-N */
+        movaps    %xmm0, %xmm3
+        subpd     %xmm6, %xmm3
+
+/* dTp = dTn*2^N + dTn*2^-N */
+        addpd     %xmm6, %xmm0
+        mulpd     %xmm5, %xmm3
+
+/* poly(r) = dTp + dTn*sinh(dR)+dTp*dR2*(a2 +a4*dR2) */
+        movups    _dPC4+__svml_dcosh_data_internal(%rip), %xmm5
+        mulpd     %xmm7, %xmm5
+        addpd     _dPC2+__svml_dcosh_data_internal(%rip), %xmm5
+        mulpd     %xmm5, %xmm7
+
+/* dTn*sinh(dR)+dTp*dR2*(a2 +a4*dR2) */
+        mulpd     %xmm0, %xmm7
+        addpd     %xmm7, %xmm3
+
+/* _VRES1 = dTp + dTn*sinh(dR)+dTp*dR2*(a2 +a4*dR2) */
+        addpd     %xmm3, %xmm0
+        andl      $3, %eax
+
+/*  Ret H  */
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 eax xmm0 xmm4
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        addq      $72, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(80)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm4, 32(%rsp)
+        movups    %xmm0, 48(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 eax xmm0
+
+        xorl      %edx, %edx
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -64)
+        movl      %edx, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -72)
+        movl      %eax, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $2, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -64)
+        cfi_offset(13, -72)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      cosh@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 48(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN2v_cosh_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_dcosh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(16)) VUINT32 _dbT[(1 + (1<<8))][2];  //dTpj ONLY!
+        __declspec(align(16)) VUINT32 _dbInvLn2[2][2];
+        __declspec(align(16)) VUINT32 _dbLn2hi[2][2];
+        __declspec(align(16)) VUINT32 _dbLn2lo[2][2];
+        __declspec(align(16)) VUINT32 _dbShifter[2][2];
+        __declspec(align(16)) VUINT32 _iIndexMask[4][1];          //(1<<K)-1
+        __declspec(align(16)) VUINT32 _dPC2[2][2];
+        __declspec(align(16)) VUINT32 _dPC3[2][2];
+        __declspec(align(16)) VUINT32 _dPC4[2][2];
+        __declspec(align(16)) VUINT32 _iMaxIndex[4][1];       //(1<<K)
+        __declspec(align(16)) VUINT32 _lExpMask[2][2];
+        __declspec(align(16)) VUINT32 _dSign[2][2];               //0x8000000000000000
+        __declspec(align(16)) VUINT32 _iDomainRange[4][1];
+} __svml_dcosh_data_internal;
+#endif
+__svml_dcosh_data_internal:
+        /*== _dbT ==*/
+        .quad 0x3fe0000000000000, 0x3fe00b1afa5abcbf, 0x3fe0163da9fb3335, 0x3fe02168143b0281
+        .quad 0x3fe02c9a3e778061, 0x3fe037d42e11bbcc, 0x3fe04315e86e7f85, 0x3fe04e5f72f654b1
+        .quad 0x3fe059b0d3158574, 0x3fe0650a0e3c1f89, 0x3fe0706b29ddf6de, 0x3fe07bd42b72a836
+        .quad 0x3fe0874518759bc8, 0x3fe092bdf66607e0, 0x3fe09e3ecac6f383, 0x3fe0a9c79b1f3919
+        .quad 0x3fe0b5586cf9890f, 0x3fe0c0f145e46c85, 0x3fe0cc922b7247f7, 0x3fe0d83b23395dec
+        .quad 0x3fe0e3ec32d3d1a2, 0x3fe0efa55fdfa9c5, 0x3fe0fb66affed31b, 0x3fe1073028d7233e
+        .quad 0x3fe11301d0125b51, 0x3fe11edbab5e2ab6, 0x3fe12abdc06c31cc, 0x3fe136a814f204ab
+        .quad 0x3fe1429aaea92de0, 0x3fe14e95934f312e, 0x3fe15a98c8a58e51, 0x3fe166a45471c3c2
+        .quad 0x3fe172b83c7d517b, 0x3fe17ed48695bbc0, 0x3fe18af9388c8dea, 0x3fe1972658375d2f
+        .quad 0x3fe1a35beb6fcb75, 0x3fe1af99f8138a1c, 0x3fe1bbe084045cd4, 0x3fe1c82f95281c6b
+        .quad 0x3fe1d4873168b9aa, 0x3fe1e0e75eb44027, 0x3fe1ed5022fcd91d, 0x3fe1f9c18438ce4d
+        .quad 0x3fe2063b88628cd6, 0x3fe212be3578a819, 0x3fe21f49917ddc96, 0x3fe22bdda27912d1
+        .quad 0x3fe2387a6e756238, 0x3fe2451ffb82140a, 0x3fe251ce4fb2a63f, 0x3fe25e85711ece75
+        .quad 0x3fe26b4565e27cdd, 0x3fe2780e341ddf29, 0x3fe284dfe1f56381, 0x3fe291ba7591bb70
+        .quad 0x3fe29e9df51fdee1, 0x3fe2ab8a66d10f13, 0x3fe2b87fd0dad990, 0x3fe2c57e39771b2f
+        .quad 0x3fe2d285a6e4030b, 0x3fe2df961f641589, 0x3fe2ecafa93e2f56, 0x3fe2f9d24abd886b
+        .quad 0x3fe306fe0a31b715, 0x3fe31432edeeb2fd, 0x3fe32170fc4cd831, 0x3fe32eb83ba8ea32
+        .quad 0x3fe33c08b26416ff, 0x3fe3496266e3fa2d, 0x3fe356c55f929ff1, 0x3fe36431a2de883b
+        .quad 0x3fe371a7373aa9cb, 0x3fe37f26231e754a, 0x3fe38cae6d05d866, 0x3fe39a401b7140ef
+        .quad 0x3fe3a7db34e59ff7, 0x3fe3b57fbfec6cf4, 0x3fe3c32dc313a8e5, 0x3fe3d0e544ede173
+        .quad 0x3fe3dea64c123422, 0x3fe3ec70df1c5175, 0x3fe3fa4504ac801c, 0x3fe40822c367a024
+        .quad 0x3fe4160a21f72e2a, 0x3fe423fb2709468a, 0x3fe431f5d950a897, 0x3fe43ffa3f84b9d4
+        .quad 0x3fe44e086061892d, 0x3fe45c2042a7d232, 0x3fe46a41ed1d0057, 0x3fe4786d668b3237
+        .quad 0x3fe486a2b5c13cd0, 0x3fe494e1e192aed2, 0x3fe4a32af0d7d3de, 0x3fe4b17dea6db7d7
+        .quad 0x3fe4bfdad5362a27, 0x3fe4ce41b817c114, 0x3fe4dcb299fddd0d, 0x3fe4eb2d81d8abff
+        .quad 0x3fe4f9b2769d2ca7, 0x3fe508417f4531ee, 0x3fe516daa2cf6642, 0x3fe5257de83f4eef
+        .quad 0x3fe5342b569d4f82, 0x3fe542e2f4f6ad27, 0x3fe551a4ca5d920f, 0x3fe56070dde910d2
+        .quad 0x3fe56f4736b527da, 0x3fe57e27dbe2c4cf, 0x3fe58d12d497c7fd, 0x3fe59c0827ff07cc
+        .quad 0x3fe5ab07dd485429, 0x3fe5ba11fba87a03, 0x3fe5c9268a5946b7, 0x3fe5d84590998b93
+        .quad 0x3fe5e76f15ad2148, 0x3fe5f6a320dceb71, 0x3fe605e1b976dc09, 0x3fe6152ae6cdf6f4
+        .quad 0x3fe6247eb03a5585, 0x3fe633dd1d1929fd, 0x3fe6434634ccc320, 0x3fe652b9febc8fb7
+        .quad 0x3fe6623882552225, 0x3fe671c1c70833f6, 0x3fe68155d44ca973, 0x3fe690f4b19e9538
+        .quad 0x3fe6a09e667f3bcd, 0x3fe6b052fa75173e, 0x3fe6c012750bdabf, 0x3fe6cfdcddd47645
+        .quad 0x3fe6dfb23c651a2f, 0x3fe6ef9298593ae5, 0x3fe6ff7df9519484, 0x3fe70f7466f42e87
+        .quad 0x3fe71f75e8ec5f74, 0x3fe72f8286ead08a, 0x3fe73f9a48a58174, 0x3fe74fbd35d7cbfd
+        .quad 0x3fe75feb564267c9, 0x3fe77024b1ab6e09, 0x3fe780694fde5d3f, 0x3fe790b938ac1cf6
+        .quad 0x3fe7a11473eb0187, 0x3fe7b17b0976cfdb, 0x3fe7c1ed0130c132, 0x3fe7d26a62ff86f0
+        .quad 0x3fe7e2f336cf4e62, 0x3fe7f3878491c491, 0x3fe80427543e1a12, 0x3fe814d2add106d9
+        .quad 0x3fe82589994cce13, 0x3fe8364c1eb941f7, 0x3fe8471a4623c7ad, 0x3fe857f4179f5b21
+        .quad 0x3fe868d99b4492ed, 0x3fe879cad931a436, 0x3fe88ac7d98a6699, 0x3fe89bd0a478580f
+        .quad 0x3fe8ace5422aa0db, 0x3fe8be05bad61778, 0x3fe8cf3216b5448c, 0x3fe8e06a5e0866d9
+        .quad 0x3fe8f1ae99157736, 0x3fe902fed0282c8a, 0x3fe9145b0b91ffc6, 0x3fe925c353aa2fe2
+        .quad 0x3fe93737b0cdc5e5, 0x3fe948b82b5f98e5, 0x3fe95a44cbc8520f, 0x3fe96bdd9a7670b3
+        .quad 0x3fe97d829fde4e50, 0x3fe98f33e47a22a2, 0x3fe9a0f170ca07ba, 0x3fe9b2bb4d53fe0d
+        .quad 0x3fe9c49182a3f090, 0x3fe9d674194bb8d5, 0x3fe9e86319e32323, 0x3fe9fa5e8d07f29e
+        .quad 0x3fea0c667b5de565, 0x3fea1e7aed8eb8bb, 0x3fea309bec4a2d33, 0x3fea42c980460ad8
+        .quad 0x3fea5503b23e255d, 0x3fea674a8af46052, 0x3fea799e1330b358, 0x3fea8bfe53c12e59
+        .quad 0x3fea9e6b5579fdbf, 0x3feab0e521356eba, 0x3feac36bbfd3f37a, 0x3fead5ff3a3c2774
+        .quad 0x3feae89f995ad3ad, 0x3feafb4ce622f2ff, 0x3feb0e07298db666, 0x3feb20ce6c9a8952
+        .quad 0x3feb33a2b84f15fb, 0x3feb468415b749b1, 0x3feb59728de5593a, 0x3feb6c6e29f1c52a
+        .quad 0x3feb7f76f2fb5e47, 0x3feb928cf22749e4, 0x3feba5b030a1064a, 0x3febb8e0b79a6f1f
+        .quad 0x3febcc1e904bc1d2, 0x3febdf69c3f3a207, 0x3febf2c25bd71e09, 0x3fec06286141b33d
+        .quad 0x3fec199bdd85529c, 0x3fec2d1cd9fa652c, 0x3fec40ab5fffd07a, 0x3fec544778fafb22
+        .quad 0x3fec67f12e57d14b, 0x3fec7ba88988c933, 0x3fec8f6d9406e7b5, 0x3feca3405751c4db
+        .quad 0x3fecb720dcef9069, 0x3feccb0f2e6d1675, 0x3fecdf0b555dc3fa, 0x3fecf3155b5bab74
+        .quad 0x3fed072d4a07897c, 0x3fed1b532b08c968, 0x3fed2f87080d89f2, 0x3fed43c8eacaa1d6
+        .quad 0x3fed5818dcfba487, 0x3fed6c76e862e6d3, 0x3fed80e316c98398, 0x3fed955d71ff6075
+        .quad 0x3feda9e603db3285, 0x3fedbe7cd63a8315, 0x3fedd321f301b460, 0x3fede7d5641c0658
+        .quad 0x3fedfc97337b9b5f, 0x3fee11676b197d17, 0x3fee264614f5a129, 0x3fee3b333b16ee12
+        .quad 0x3fee502ee78b3ff6, 0x3fee653924676d76, 0x3fee7a51fbc74c83, 0x3fee8f7977cdb740
+        .quad 0x3feea4afa2a490da, 0x3feeb9f4867cca6e, 0x3feecf482d8e67f1, 0x3feee4aaa2188510
+        .quad 0x3feefa1bee615a27, 0x3fef0f9c1cb6412a, 0x3fef252b376bba97, 0x3fef3ac948dd7274
+        .quad 0x3fef50765b6e4540, 0x3fef6632798844f8, 0x3fef7bfdad9cbe14, 0x3fef91d802243c89
+        .quad 0x3fefa7c1819e90d8, 0x3fefbdba3692d514, 0x3fefd3c22b8f71f1, 0x3fefe9d96b2a23d9
+        .quad 0x3ff0000000000000
+        .align 16
+        .quad 0x3ff71547652b82fe, 0x3ff71547652b82fe /* _dbInvLn2 = 1/log(2) */
+        .align 16
+        .quad 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000 /* _dbLn2hi  = log(2) hi*/
+        .align 16
+        .quad 0xBDAC610CA86C3899, 0xBDAC610CA86C3899 /* _dbLn2lo  = log(2) lo*/
+        .align 16
+        .quad 0x42B8000000000000, 0x42B8000000000000 /* _dbShifter */
+        .align 16
+        .long 0x000000FF, 0x000000FF, 0x000000FF, 0x000000FF         /* _iIndexMask */
+        .align 16
+        .quad 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD /* _dPC2 */
+        .align 16
+        .quad 0x3FC5555570813E14, 0x3FC5555570813E14 /* _dPC3 */
+        .align 16
+        .quad 0x3FA55555CF16D299, 0x3FA55555CF16D299 /* _dPC4 */
+        .align 16
+        .long 0x00000100, 0x00000100, 0x00000100, 0x00000100 /* _iMaxIndex */
+        .align 16
+        .quad 0x7ff0000000000000, 0x7ff0000000000000 /* _lExpMask */
+        .align 16
+        .quad 0x8000000000000000, 0x8000000000000000 /* _dSign*/
+        .align 16
+        .long 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99 /* _iDomainRange 0x40861d9ac12a3e85 =(1021*2^K-0.5)*log(2)/2^K -needed for quick exp*/
+        .align 16
+        .type	__svml_dcosh_data_internal,@object
+        .size	__svml_dcosh_data_internal,.-__svml_dcosh_data_internal
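
The ALGORITHM DESCRIPTION above reduces cosh(x) to (exp(t) + exp(-t))/2 with
t = |x|, where exp(t) is split as 2^M * 2^(j/2^K) * exp(r).  The scalar
sketch below illustrates that reduction in plain C; it is not code from the
patch: K = 8 matches the 257-entry _dbT table, but exp2() and exp() stand in
for the table lookup and the short polynomial, and special inputs are not
handled.  The same scheme is used by the AVX2 and AVX512 variants that
follow.

#include <math.h>

#define K 8

double
cosh_sketch (double x)
{
  double t = fabs (x);
  /* m = t*2^K/log(2) rounded to an integer; split it into the power of
     two M and the table index j, leaving a small remainder r.  */
  double m = nearbyint (t * (1 << K) / M_LN2);
  double r = t - m * (M_LN2 / (1 << K));
  int M = (int) m >> K;
  int j = (int) m & ((1 << K) - 1);
  double tj = exp2 ((double) j / (1 << K));   /* stands in for _dbT[j] */
  double ep = ldexp (tj * exp (r), M);        /* ~ exp(t)  */
  double en = ldexp (exp (-r) / tj, -M);      /* ~ exp(-t) */
  return 0.5 * (ep + en);
}
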
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core-sse.S
new file mode 100644
index 0000000000..4410d34583
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized cosh, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN4v_cosh _ZGVdN4v_cosh_sse_wrapper
+#include "../svml_d_cosh4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core.c
new file mode 100644
index 0000000000..c4f59206a9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized cosh, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN4v_cosh
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN4v_cosh, __GI__ZGVdN4v_cosh, __redirect__ZGVdN4v_cosh)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core_avx2.S
new file mode 100644
index 0000000000..2d86a02923
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core_avx2.S
@@ -0,0 +1,412 @@
+/* Function cosh vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute cosh(x) as (exp(x)+exp(-x))/2,
+ *   where exp is calculated as
+ *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
+ *
+ *   Special cases:
+ *
+ *   cosh(NaN) = quiet NaN, and raise invalid exception
+ *   cosh(INF) = that INF
+ *   cosh(0)   = 1
+ *   cosh(x) overflows for big x (roughly |x| > MAXLOG + log(2))
+ *
+ */
+
+/* Offsets for data table __svml_dcosh_data_internal
+ */
+#define _dbT                          	0
+#define _dbInvLn2                     	2080
+#define _dbLn2hi                      	2112
+#define _dbLn2lo                      	2144
+#define _dbShifter                    	2176
+#define _iIndexMask                   	2208
+#define _dPC2                         	2240
+#define _dPC3                         	2272
+#define _dPC4                         	2304
+#define _iMaxIndex                    	2336
+#define _lExpMask                     	2368
+#define _dSign                        	2400
+#define _iDomainRange                 	2432
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN4v_cosh_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+        lea       _dbT+__svml_dcosh_data_internal(%rip), %rax
+        vmovupd   _dSign+__svml_dcosh_data_internal(%rip), %ymm8
+        vmovupd   _dbShifter+__svml_dcosh_data_internal(%rip), %ymm6
+
+/*
+ *  Load argument
+ * dM = x*2^K/log(2) + RShifter
+ */
+        vmovupd   _dbInvLn2+__svml_dcosh_data_internal(%rip), %ymm3
+
+/*
+ * Trick: the 2^(-j/2^K) entry is looked up
+ * from the same table at (iMaxIndex - iIndex)
+ */
+        vmovups   _iMaxIndex+__svml_dcosh_data_internal(%rip), %xmm14
+
+/* dSign >> 11 = 0x0010000000000000 */
+        vpsrlq    $11, %ymm8, %ymm5
+        vmovapd   %ymm0, %ymm7
+
+/*  Abs argument  */
+        vandnpd   %ymm7, %ymm8, %ymm4
+        vfmadd213pd %ymm6, %ymm4, %ymm3
+
+/*  Index and lookup  */
+        vextractf128 $1, %ymm3, %xmm12
+        vshufps   $136, %xmm12, %xmm3, %xmm13
+        vpand     _iIndexMask+__svml_dcosh_data_internal(%rip), %xmm13, %xmm15
+        vpsubd    %xmm15, %xmm14, %xmm0
+
+/* Scale (iMaxIndex - iIndex) to a byte offset: <<= 3 */
+        vpslld    $3, %xmm0, %xmm2
+        vmovd     %xmm2, %r9d
+        vpextrd   $2, %xmm2, %r11d
+        movslq    %r9d, %r9
+        vpextrd   $1, %xmm2, %r10d
+        movslq    %r11d, %r11
+        movslq    %r10d, %r10
+        vmovsd    (%rax,%r9), %xmm12
+
+/*
+ * Check for overflow/underflow
+ *
+ */
+        vextractf128 $1, %ymm4, %xmm9
+        vmovsd    (%rax,%r11), %xmm14
+        vmovhpd   (%rax,%r10), %xmm12, %xmm13
+        vshufps   $221, %xmm9, %xmm4, %xmm10
+
+/* Scale iIndex to a byte offset: <<= 3 */
+        vpslld    $3, %xmm15, %xmm9
+
+/*
+ *  R
+ * dN = dM - RShifter
+ */
+        vsubpd    %ymm6, %ymm3, %ymm15
+        vmovd     %xmm9, %ecx
+        vpcmpgtd  _iDomainRange+__svml_dcosh_data_internal(%rip), %xmm10, %xmm11
+        vmovupd   _dbLn2hi+__svml_dcosh_data_internal(%rip), %ymm6
+
+/*
+ *  G1,G2,G3: dTdif,dTn * 2^N,2^(-N)
+ * NB: copied from sinh_la - to be optimized!!!!!
+ */
+        vpsllq    $44, %ymm3, %ymm3
+        vmovmskps %xmm11, %edx
+
+/* dR = dX - dN*Log2_hi/2^K */
+        vfnmadd231pd %ymm6, %ymm15, %ymm4
+
+/* lM now is an EXP(2^N) */
+        vpand     _lExpMask+__svml_dcosh_data_internal(%rip), %ymm3, %ymm3
+
+/* dR = (dX - dN*Log2_hi/2^K) - dN*Log2_lo/2^K */
+        vfnmadd231pd _dbLn2lo+__svml_dcosh_data_internal(%rip), %ymm15, %ymm4
+        movslq    %ecx, %rcx
+        vpextrd   $2, %xmm9, %edi
+        vpextrd   $1, %xmm9, %esi
+        movslq    %edi, %rdi
+        vmovsd    (%rax,%rcx), %xmm1
+        vpextrd   $3, %xmm9, %r8d
+        vpextrd   $3, %xmm2, %ecx
+        movslq    %esi, %rsi
+        movslq    %r8d, %r8
+        movslq    %ecx, %rcx
+
+/* dR2 = dR^2 */
+        vmulpd    %ymm4, %ymm4, %ymm0
+        vmovsd    (%rax,%rdi), %xmm10
+        vmovhpd   (%rax,%rsi), %xmm1, %xmm8
+        vmovhpd   (%rax,%r8), %xmm10, %xmm11
+        vmovhpd   (%rax,%rcx), %xmm14, %xmm2
+        vinsertf128 $1, %xmm11, %ymm8, %ymm1
+        vinsertf128 $1, %xmm2, %ymm13, %ymm2
+        vpaddq    %ymm3, %ymm1, %ymm6
+
+/*  */
+        vpsubq    %ymm3, %ymm2, %ymm1
+
+/*
+ * sinh(r) = r +r*r^2*a3 ....
+ * dSinh_r = r^2*a3
+ */
+        vmulpd    _dPC3+__svml_dcosh_data_internal(%rip), %ymm0, %ymm2
+
+/* lX- = EXP(1/2) */
+        vpsubq    %ymm5, %ymm1, %ymm5
+
+/* dSinh_r = r + r*r^2*a3 */
+        vfmadd213pd %ymm4, %ymm4, %ymm2
+
+/* poly(r) = dTp + dTn*sinh(dR)+dTp*dR2*(a2 +a4*dR2) */
+        vmovupd   _dPC4+__svml_dcosh_data_internal(%rip), %ymm4
+
+/* dTn = dTn*2^N - dTn*2^-N */
+        vsubpd    %ymm5, %ymm6, %ymm1
+
+/* dTp = dTn*2^N + dTn*2^-N */
+        vaddpd    %ymm5, %ymm6, %ymm3
+        vfmadd213pd _dPC2+__svml_dcosh_data_internal(%rip), %ymm0, %ymm4
+        vmulpd    %ymm2, %ymm1, %ymm1
+        vmulpd    %ymm4, %ymm0, %ymm0
+
+/* dTn*sinh(dR)+dTp*dR2*(a2 +a4*dR2) */
+        vfmadd213pd %ymm1, %ymm3, %ymm0
+
+/* _VRES1 = dTp + dTn*sinh(dR)+dTp*dR2*(a2 +a4*dR2) */
+        vaddpd    %ymm0, %ymm3, %ymm0
+
+/*  Ret H  */
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm7
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovupd   %ymm7, 32(%rsp)
+        vmovupd   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovupd   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      cosh@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 64(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN4v_cosh_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_dcosh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(32)) VUINT32 _dbT[(1 + (1<<8))][2];  //dTpj ONLY!
+        __declspec(align(32)) VUINT32 _dbInvLn2[4][2];
+        __declspec(align(32)) VUINT32 _dbLn2hi[4][2];
+        __declspec(align(32)) VUINT32 _dbLn2lo[4][2];
+        __declspec(align(32)) VUINT32 _dbShifter[4][2];
+        __declspec(align(32)) VUINT32 _iIndexMask[8][1];          //(1<<K)-1
+        __declspec(align(32)) VUINT32 _dPC2[4][2];
+        __declspec(align(32)) VUINT32 _dPC3[4][2];
+        __declspec(align(32)) VUINT32 _dPC4[4][2];
+        __declspec(align(32)) VUINT32 _iMaxIndex[8][1];       //(1<<K)
+        __declspec(align(32)) VUINT32 _lExpMask[4][2];
+        __declspec(align(32)) VUINT32 _dSign[4][2];               //0x8000000000000000
+        __declspec(align(32)) VUINT32 _iDomainRange[8][1];
+} __svml_dcosh_data_internal;
+#endif
+__svml_dcosh_data_internal:
+        /*== _dbT ==*/
+        .quad 0x3fe0000000000000, 0x3fe00b1afa5abcbf, 0x3fe0163da9fb3335, 0x3fe02168143b0281
+        .quad 0x3fe02c9a3e778061, 0x3fe037d42e11bbcc, 0x3fe04315e86e7f85, 0x3fe04e5f72f654b1
+        .quad 0x3fe059b0d3158574, 0x3fe0650a0e3c1f89, 0x3fe0706b29ddf6de, 0x3fe07bd42b72a836
+        .quad 0x3fe0874518759bc8, 0x3fe092bdf66607e0, 0x3fe09e3ecac6f383, 0x3fe0a9c79b1f3919
+        .quad 0x3fe0b5586cf9890f, 0x3fe0c0f145e46c85, 0x3fe0cc922b7247f7, 0x3fe0d83b23395dec
+        .quad 0x3fe0e3ec32d3d1a2, 0x3fe0efa55fdfa9c5, 0x3fe0fb66affed31b, 0x3fe1073028d7233e
+        .quad 0x3fe11301d0125b51, 0x3fe11edbab5e2ab6, 0x3fe12abdc06c31cc, 0x3fe136a814f204ab
+        .quad 0x3fe1429aaea92de0, 0x3fe14e95934f312e, 0x3fe15a98c8a58e51, 0x3fe166a45471c3c2
+        .quad 0x3fe172b83c7d517b, 0x3fe17ed48695bbc0, 0x3fe18af9388c8dea, 0x3fe1972658375d2f
+        .quad 0x3fe1a35beb6fcb75, 0x3fe1af99f8138a1c, 0x3fe1bbe084045cd4, 0x3fe1c82f95281c6b
+        .quad 0x3fe1d4873168b9aa, 0x3fe1e0e75eb44027, 0x3fe1ed5022fcd91d, 0x3fe1f9c18438ce4d
+        .quad 0x3fe2063b88628cd6, 0x3fe212be3578a819, 0x3fe21f49917ddc96, 0x3fe22bdda27912d1
+        .quad 0x3fe2387a6e756238, 0x3fe2451ffb82140a, 0x3fe251ce4fb2a63f, 0x3fe25e85711ece75
+        .quad 0x3fe26b4565e27cdd, 0x3fe2780e341ddf29, 0x3fe284dfe1f56381, 0x3fe291ba7591bb70
+        .quad 0x3fe29e9df51fdee1, 0x3fe2ab8a66d10f13, 0x3fe2b87fd0dad990, 0x3fe2c57e39771b2f
+        .quad 0x3fe2d285a6e4030b, 0x3fe2df961f641589, 0x3fe2ecafa93e2f56, 0x3fe2f9d24abd886b
+        .quad 0x3fe306fe0a31b715, 0x3fe31432edeeb2fd, 0x3fe32170fc4cd831, 0x3fe32eb83ba8ea32
+        .quad 0x3fe33c08b26416ff, 0x3fe3496266e3fa2d, 0x3fe356c55f929ff1, 0x3fe36431a2de883b
+        .quad 0x3fe371a7373aa9cb, 0x3fe37f26231e754a, 0x3fe38cae6d05d866, 0x3fe39a401b7140ef
+        .quad 0x3fe3a7db34e59ff7, 0x3fe3b57fbfec6cf4, 0x3fe3c32dc313a8e5, 0x3fe3d0e544ede173
+        .quad 0x3fe3dea64c123422, 0x3fe3ec70df1c5175, 0x3fe3fa4504ac801c, 0x3fe40822c367a024
+        .quad 0x3fe4160a21f72e2a, 0x3fe423fb2709468a, 0x3fe431f5d950a897, 0x3fe43ffa3f84b9d4
+        .quad 0x3fe44e086061892d, 0x3fe45c2042a7d232, 0x3fe46a41ed1d0057, 0x3fe4786d668b3237
+        .quad 0x3fe486a2b5c13cd0, 0x3fe494e1e192aed2, 0x3fe4a32af0d7d3de, 0x3fe4b17dea6db7d7
+        .quad 0x3fe4bfdad5362a27, 0x3fe4ce41b817c114, 0x3fe4dcb299fddd0d, 0x3fe4eb2d81d8abff
+        .quad 0x3fe4f9b2769d2ca7, 0x3fe508417f4531ee, 0x3fe516daa2cf6642, 0x3fe5257de83f4eef
+        .quad 0x3fe5342b569d4f82, 0x3fe542e2f4f6ad27, 0x3fe551a4ca5d920f, 0x3fe56070dde910d2
+        .quad 0x3fe56f4736b527da, 0x3fe57e27dbe2c4cf, 0x3fe58d12d497c7fd, 0x3fe59c0827ff07cc
+        .quad 0x3fe5ab07dd485429, 0x3fe5ba11fba87a03, 0x3fe5c9268a5946b7, 0x3fe5d84590998b93
+        .quad 0x3fe5e76f15ad2148, 0x3fe5f6a320dceb71, 0x3fe605e1b976dc09, 0x3fe6152ae6cdf6f4
+        .quad 0x3fe6247eb03a5585, 0x3fe633dd1d1929fd, 0x3fe6434634ccc320, 0x3fe652b9febc8fb7
+        .quad 0x3fe6623882552225, 0x3fe671c1c70833f6, 0x3fe68155d44ca973, 0x3fe690f4b19e9538
+        .quad 0x3fe6a09e667f3bcd, 0x3fe6b052fa75173e, 0x3fe6c012750bdabf, 0x3fe6cfdcddd47645
+        .quad 0x3fe6dfb23c651a2f, 0x3fe6ef9298593ae5, 0x3fe6ff7df9519484, 0x3fe70f7466f42e87
+        .quad 0x3fe71f75e8ec5f74, 0x3fe72f8286ead08a, 0x3fe73f9a48a58174, 0x3fe74fbd35d7cbfd
+        .quad 0x3fe75feb564267c9, 0x3fe77024b1ab6e09, 0x3fe780694fde5d3f, 0x3fe790b938ac1cf6
+        .quad 0x3fe7a11473eb0187, 0x3fe7b17b0976cfdb, 0x3fe7c1ed0130c132, 0x3fe7d26a62ff86f0
+        .quad 0x3fe7e2f336cf4e62, 0x3fe7f3878491c491, 0x3fe80427543e1a12, 0x3fe814d2add106d9
+        .quad 0x3fe82589994cce13, 0x3fe8364c1eb941f7, 0x3fe8471a4623c7ad, 0x3fe857f4179f5b21
+        .quad 0x3fe868d99b4492ed, 0x3fe879cad931a436, 0x3fe88ac7d98a6699, 0x3fe89bd0a478580f
+        .quad 0x3fe8ace5422aa0db, 0x3fe8be05bad61778, 0x3fe8cf3216b5448c, 0x3fe8e06a5e0866d9
+        .quad 0x3fe8f1ae99157736, 0x3fe902fed0282c8a, 0x3fe9145b0b91ffc6, 0x3fe925c353aa2fe2
+        .quad 0x3fe93737b0cdc5e5, 0x3fe948b82b5f98e5, 0x3fe95a44cbc8520f, 0x3fe96bdd9a7670b3
+        .quad 0x3fe97d829fde4e50, 0x3fe98f33e47a22a2, 0x3fe9a0f170ca07ba, 0x3fe9b2bb4d53fe0d
+        .quad 0x3fe9c49182a3f090, 0x3fe9d674194bb8d5, 0x3fe9e86319e32323, 0x3fe9fa5e8d07f29e
+        .quad 0x3fea0c667b5de565, 0x3fea1e7aed8eb8bb, 0x3fea309bec4a2d33, 0x3fea42c980460ad8
+        .quad 0x3fea5503b23e255d, 0x3fea674a8af46052, 0x3fea799e1330b358, 0x3fea8bfe53c12e59
+        .quad 0x3fea9e6b5579fdbf, 0x3feab0e521356eba, 0x3feac36bbfd3f37a, 0x3fead5ff3a3c2774
+        .quad 0x3feae89f995ad3ad, 0x3feafb4ce622f2ff, 0x3feb0e07298db666, 0x3feb20ce6c9a8952
+        .quad 0x3feb33a2b84f15fb, 0x3feb468415b749b1, 0x3feb59728de5593a, 0x3feb6c6e29f1c52a
+        .quad 0x3feb7f76f2fb5e47, 0x3feb928cf22749e4, 0x3feba5b030a1064a, 0x3febb8e0b79a6f1f
+        .quad 0x3febcc1e904bc1d2, 0x3febdf69c3f3a207, 0x3febf2c25bd71e09, 0x3fec06286141b33d
+        .quad 0x3fec199bdd85529c, 0x3fec2d1cd9fa652c, 0x3fec40ab5fffd07a, 0x3fec544778fafb22
+        .quad 0x3fec67f12e57d14b, 0x3fec7ba88988c933, 0x3fec8f6d9406e7b5, 0x3feca3405751c4db
+        .quad 0x3fecb720dcef9069, 0x3feccb0f2e6d1675, 0x3fecdf0b555dc3fa, 0x3fecf3155b5bab74
+        .quad 0x3fed072d4a07897c, 0x3fed1b532b08c968, 0x3fed2f87080d89f2, 0x3fed43c8eacaa1d6
+        .quad 0x3fed5818dcfba487, 0x3fed6c76e862e6d3, 0x3fed80e316c98398, 0x3fed955d71ff6075
+        .quad 0x3feda9e603db3285, 0x3fedbe7cd63a8315, 0x3fedd321f301b460, 0x3fede7d5641c0658
+        .quad 0x3fedfc97337b9b5f, 0x3fee11676b197d17, 0x3fee264614f5a129, 0x3fee3b333b16ee12
+        .quad 0x3fee502ee78b3ff6, 0x3fee653924676d76, 0x3fee7a51fbc74c83, 0x3fee8f7977cdb740
+        .quad 0x3feea4afa2a490da, 0x3feeb9f4867cca6e, 0x3feecf482d8e67f1, 0x3feee4aaa2188510
+        .quad 0x3feefa1bee615a27, 0x3fef0f9c1cb6412a, 0x3fef252b376bba97, 0x3fef3ac948dd7274
+        .quad 0x3fef50765b6e4540, 0x3fef6632798844f8, 0x3fef7bfdad9cbe14, 0x3fef91d802243c89
+        .quad 0x3fefa7c1819e90d8, 0x3fefbdba3692d514, 0x3fefd3c22b8f71f1, 0x3fefe9d96b2a23d9
+        .quad 0x3ff0000000000000
+        .align 32
+        .quad 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe /* _dbInvLn2 = 1/log(2) */
+        .align 32
+        .quad 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000 /* _dbLn2hi  = log(2) hi*/
+        .align 32
+        .quad 0xBDAC610CA86C3899, 0xBDAC610CA86C3899, 0xBDAC610CA86C3899, 0xBDAC610CA86C3899 /* _dbLn2lo  = log(2) lo*/
+        .align 32
+        .quad 0x42B8000000000000, 0x42B8000000000000, 0x42B8000000000000, 0x42B8000000000000 /* _dbShifter */
+        .align 32
+        .long 0x000000FF, 0x000000FF, 0x000000FF, 0x000000FF, 0x000000FF, 0x000000FF, 0x000000FF, 0x000000FF         /* _iIndexMask */
+        .align 32
+        .quad 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD /* _dPC2 */
+        .align 32
+        .quad 0x3FC5555570813E14, 0x3FC5555570813E14, 0x3FC5555570813E14, 0x3FC5555570813E14 /* _dPC3 */
+        .align 32
+        .quad 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299 /* _dPC4 */
+        .align 32
+        .long 0x00000100, 0x00000100, 0x00000100, 0x00000100, 0x00000100, 0x00000100, 0x00000100, 0x00000100 /* _iMaxIndex */
+        .align 32
+        .quad 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000 /* _lExpMask */
+        .align 32
+        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000 /* _dSign*/
+        .align 32
+        .long 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99 /* _iDomainRange 0x40861d9ac12a3e85 =(1021*2^K-0.5)*log(2)/2^K -needed for quick exp*/
+        .align 32
+        .type	__svml_dcosh_data_internal,@object
+        .size	__svml_dcosh_data_internal,.-__svml_dcosh_data_internal
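
For readers following the data flow above: the kernel is the table-driven
exp reduction the header comment describes, with K = 8, a 257-entry
2^(j/2^K) table (_dbT) and a short polynomial for the residual r.  The
scalar sketch below is illustrative only: plain Taylor coefficients instead
of the tuned _dPC values, a single exp evaluation instead of the fused
positive/negative lookups, and no special-case handling.

#include <math.h>
#include <stdio.h>

#define K 8
#define TBL_N (1 << K)			/* 256, as _iMaxIndex above */

static double tbl[TBL_N + 1];		/* tbl[j] = 2^(j/2^K) */

static void
init_tbl (void)
{
  for (int j = 0; j <= TBL_N; j++)
    tbl[j] = exp2 ((double) j / TBL_N);
}

/* exp(x) = 2^M * 2^(j/2^K) * exp(r), with |r| <= log(2)/2^(K+1).  */
static double
exp_model (double x)
{
  long n = lround (x / M_LN2 * TBL_N);	/* role of InvLn2 + Shifter */
  long j = n & (TBL_N - 1);		/* n mod 2^K, like _iIndexMask */
  long m = (n - j) / TBL_N;		/* floor (n / 2^K) */
  double r = x - (double) n * (M_LN2 / TBL_N);
  /* Degree-4 Taylor for exp(r) is ample for |r| < 0.0014.  */
  double er = 1.0 + r * (1.0 + r * (0.5 + r * (1.0 / 6.0 + r * (1.0 / 24.0))));
  return ldexp (tbl[j] * er, (int) m);
}

static double
cosh_model (double x)
{
  double e = exp_model (fabs (x));
  return 0.5 * (e + 1.0 / e);
}

int
main (void)
{
  init_tbl ();
  for (double x = -6.0; x <= 6.0; x += 1.5)
    printf ("x = % .2f  model = % .17g  libm = % .17g\n",
	    x, cosh_model (x), cosh (x));
  return 0;
}
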
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core-avx2.S
new file mode 100644
index 0000000000..8b385cc297
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized cosh, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN8v_cosh _ZGVeN8v_cosh_avx2_wrapper
+#include "../svml_d_cosh8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core.c
new file mode 100644
index 0000000000..576b3186d5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized cosh, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN8v_cosh
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN8v_cosh, __GI__ZGVeN8v_cosh, __redirect__ZGVeN8v_cosh)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core_avx512.S
new file mode 100644
index 0000000000..53040cef9a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core_avx512.S
@@ -0,0 +1,323 @@
+/* Function cosh vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute cosh(x) as (exp(x)+exp(-x))/2,
+ *   where exp is calculated as
+ *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
+ *
+ *   Special cases:
+ *
+ *   cosh(NaN) = quiet NaN, and raise invalid exception
+ *   cosh(+/-INF) = +INF
+ *   cosh(0)   = 1
+ *   cosh(x) overflows for big x; the overflow threshold is about MAXLOG+log(2)
+ *
+ */
+
+/* Offsets for data table __svml_dcosh_data_internal
+ */
+#define _dTp_h                        	0
+#define _dTn_h                        	128
+#define _dbShifter_UISA               	256
+#define _dPC2_UISA                    	320
+#define _dPC3_UISA                    	384
+#define _dPC4_UISA                    	448
+#define _dPC5_UISA                    	512
+#define _dPC6_UISA                    	576
+#define _dPC7_UISA                    	640
+#define _dbInvLn2                     	704
+#define _dbLn2hi                      	768
+#define _dbLn2lo                      	832
+#define _dbShifter                    	896
+#define _dPC2                         	960
+#define _dPC3                         	1024
+#define _dPC4                         	1088
+#define _lExpMask                     	1152
+#define _dSign                        	1216
+#define _iDomainRange                 	1280
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN8v_cosh_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovups   _dSign+__svml_dcosh_data_internal(%rip), %zmm11
+        vmovups   _dbShifter_UISA+__svml_dcosh_data_internal(%rip), %zmm15
+
+/*
+ *  Load argument
+ * dM = x*2^K/log(2) + RShifter
+ */
+        vmovups   _dbInvLn2+__svml_dcosh_data_internal(%rip), %zmm4
+        vmovups   _dbLn2hi+__svml_dcosh_data_internal(%rip), %zmm2
+        vmovups   _dbLn2lo+__svml_dcosh_data_internal(%rip), %zmm3
+        vmovups   _dPC7_UISA+__svml_dcosh_data_internal(%rip), %zmm8
+        vmovups   _dPC6_UISA+__svml_dcosh_data_internal(%rip), %zmm9
+        vmovups   _dPC2_UISA+__svml_dcosh_data_internal(%rip), %zmm7
+        vmovups   _dPC3_UISA+__svml_dcosh_data_internal(%rip), %zmm6
+        vmovaps   %zmm0, %zmm10
+
+/*  Abs argument  */
+        vandnpd   %zmm10, %zmm11, %zmm5
+
+/*  Index and lookup  */
+        vmovups   __svml_dcosh_data_internal(%rip), %zmm11
+        vmovups   _dTn_h+__svml_dcosh_data_internal(%rip), %zmm0
+        vfmadd213pd {rn-sae}, %zmm15, %zmm5, %zmm4
+
+/*
+ * Check for overflow/underflow
+ *
+ */
+        vpsrlq    $32, %zmm5, %zmm12
+
+/* dN = dM - RShifter */
+        vsubpd    {rn-sae}, %zmm15, %zmm4, %zmm1
+        vpmovqd   %zmm12, %ymm13
+        vpermt2pd _dTn_h+64+__svml_dcosh_data_internal(%rip), %zmm4, %zmm0
+        vpermt2pd _dTp_h+64+__svml_dcosh_data_internal(%rip), %zmm4, %zmm11
+
+/* dR = dX - dN*Log2_hi/2^K */
+        vfnmadd231pd {rn-sae}, %zmm2, %zmm1, %zmm5
+
+/*
+ * poly(r) = Gmjp*(1 + a2*r^2 + a4*r^4) + Gmjn*(r + a3*r^3 + a5*r^5) =
+ * = Gmjp_h + Gmjp_l + Gmjp*r^2*(a2 + a4*r^2) + Gmjn*(r + r^3*(a3 + a5*r^2))
+ */
+        vmovups   _dPC5_UISA+__svml_dcosh_data_internal(%rip), %zmm12
+        vpsllq    $48, %zmm4, %zmm2
+
+/* dR = dX - dN*Log2_hi/2^K */
+        vfnmadd231pd {rn-sae}, %zmm3, %zmm1, %zmm5
+        vmulpd    {rn-sae}, %zmm5, %zmm5, %zmm1
+        vfmadd231pd {rn-sae}, %zmm1, %zmm8, %zmm12
+        vmovups   _dPC4_UISA+__svml_dcosh_data_internal(%rip), %zmm8
+        vfmadd213pd {rn-sae}, %zmm6, %zmm1, %zmm12
+        vfmadd231pd {rn-sae}, %zmm1, %zmm9, %zmm8
+        vfmadd213pd {rn-sae}, %zmm7, %zmm1, %zmm8
+        vpcmpgtd  _iDomainRange+__svml_dcosh_data_internal(%rip), %ymm13, %ymm14
+        vmovmskps %ymm14, %edx
+
+/* dOut=r^2*(a2 + a4*r^2) */
+        vmulpd    {rn-sae}, %zmm1, %zmm8, %zmm6
+
+/* lM now is an EXP(2^N) */
+        vpandq    _lExpMask+__svml_dcosh_data_internal(%rip), %zmm2, %zmm3
+        vpaddq    %zmm3, %zmm11, %zmm4
+        vpsubq    %zmm3, %zmm0, %zmm0
+        vsubpd    {rn-sae}, %zmm0, %zmm4, %zmm14
+        vaddpd    {rn-sae}, %zmm0, %zmm4, %zmm13
+
+/* dM=r^2*(a3 +a5*r^2) */
+        vmulpd    {rn-sae}, %zmm1, %zmm12, %zmm0
+        vfmadd213pd {rn-sae}, %zmm13, %zmm13, %zmm6
+
+/* dM= r + r^3*(a3 +a5*r^2) */
+        vfmadd213pd {rn-sae}, %zmm5, %zmm5, %zmm0
+        vfmadd213pd {rn-sae}, %zmm6, %zmm14, %zmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm10
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm10, 64(%rsp)
+        vmovups   %zmm0, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     64(%rsp,%r14,8), %xmm0
+        call      cosh@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 128(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN8v_cosh_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_dcosh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(64)) VUINT32 _dTp_h[(1<<4)][2];
+        __declspec(align(64)) VUINT32 _dTn_h[(1<<4)][2];
+        __declspec(align(64)) VUINT32 _dbShifter_UISA[8][2];
+        __declspec(align(64)) VUINT32 _dPC2_UISA[8][2];
+        __declspec(align(64)) VUINT32 _dPC3_UISA[8][2];
+        __declspec(align(64)) VUINT32 _dPC4_UISA[8][2];
+        __declspec(align(64)) VUINT32 _dPC5_UISA[8][2];
+        __declspec(align(64)) VUINT32 _dPC6_UISA[8][2];
+        __declspec(align(64)) VUINT32 _dPC7_UISA[8][2];
+        __declspec(align(64)) VUINT32 _dbInvLn2[8][2];
+        __declspec(align(64)) VUINT32 _dbLn2hi[8][2];
+        __declspec(align(64)) VUINT32 _dbLn2lo[8][2];
+        __declspec(align(64)) VUINT32 _dbShifter[8][2];
+        __declspec(align(64)) VUINT32 _dPC2[8][2];
+        __declspec(align(64)) VUINT32 _dPC3[8][2];
+        __declspec(align(64)) VUINT32 _dPC4[8][2];
+        __declspec(align(64)) VUINT32 _lExpMask[8][2];
+        __declspec(align(64)) VUINT32 _dSign[8][2];               //0x8000000000000000
+        __declspec(align(64)) VUINT32 _iDomainRange[16][1];
+} __svml_dcosh_data_internal;
+#endif
+__svml_dcosh_data_internal:
+        /*== _dTp_h ==*/
+        .quad 0x3fe0000000000000, 0x3fe0b5586cf9890f, 0x3fe172b83c7d517b, 0x3fe2387a6e756238
+        .quad 0x3fe306fe0a31b715, 0x3fe3dea64c123422, 0x3fe4bfdad5362a27, 0x3fe5ab07dd485429
+        .quad 0x3fe6a09e667f3bcd, 0x3fe7a11473eb0187, 0x3fe8ace5422aa0db, 0x3fe9c49182a3f090
+        .quad 0x3feae89f995ad3ad, 0x3fec199bdd85529c, 0x3fed5818dcfba487, 0x3feea4afa2a490da
+        /*== _dTn_h ==*/
+        .align 64
+        .quad 0x3fe0000000000000, 0x3fdea4afa2a490da, 0x3fdd5818dcfba487, 0x3fdc199bdd85529c
+        .quad 0x3fdae89f995ad3ad, 0x3fd9c49182a3f090, 0x3fd8ace5422aa0db, 0x3fd7a11473eb0187
+        .quad 0x3fd6a09e667f3bcd, 0x3fd5ab07dd485429, 0x3fd4bfdad5362a27, 0x3fd3dea64c123422
+        .quad 0x3fd306fe0a31b715, 0x3fd2387a6e756238, 0x3fd172b83c7d517b, 0x3fd0b5586cf9890f
+        .align 64
+        .quad 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000 /* _dbShifter_UISA  */
+        .align 64
+        .quad 0x3fe0000000000004, 0x3fe0000000000004, 0x3fe0000000000004, 0x3fe0000000000004, 0x3fe0000000000004, 0x3fe0000000000004, 0x3fe0000000000004, 0x3fe0000000000004 /* _dPC2_UISA       */
+        .align 64
+        .quad 0x3fc5555555555543, 0x3fc5555555555543, 0x3fc5555555555543, 0x3fc5555555555543, 0x3fc5555555555543, 0x3fc5555555555543, 0x3fc5555555555543, 0x3fc5555555555543 /* _dPC3_UISA       */
+        .align 64
+        .quad 0x3fa5555555484f37, 0x3fa5555555484f37, 0x3fa5555555484f37, 0x3fa5555555484f37, 0x3fa5555555484f37, 0x3fa5555555484f37, 0x3fa5555555484f37, 0x3fa5555555484f37 /* _dPC4_UISA       */
+        .align 64
+        .quad 0x3f81111111286a0c, 0x3f81111111286a0c, 0x3f81111111286a0c, 0x3f81111111286a0c, 0x3f81111111286a0c, 0x3f81111111286a0c, 0x3f81111111286a0c, 0x3f81111111286a0c /* _dPC5_UISA       */
+        .align 64
+        .quad 0x3f56c183da08f116, 0x3f56c183da08f116, 0x3f56c183da08f116, 0x3f56c183da08f116, 0x3f56c183da08f116, 0x3f56c183da08f116, 0x3f56c183da08f116, 0x3f56c183da08f116 /* _dPC6_UISA       */
+        .align 64
+        .quad 0x3f2a018d76da03da, 0x3f2a018d76da03da, 0x3f2a018d76da03da, 0x3f2a018d76da03da, 0x3f2a018d76da03da, 0x3f2a018d76da03da, 0x3f2a018d76da03da, 0x3f2a018d76da03da /* _dPC7_UISA       */
+        /*== _dbInvLn2 ==*/
+        .align 64
+        .quad 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe /* _dbInvLn2 = 1/log(2) */
+        .align 64
+        .quad 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000 /* _dbLn2hi  = log(2) hi*/
+        .align 64
+        .quad 0xBDAC610CA86C3899, 0xBDAC610CA86C3899, 0xBDAC610CA86C3899, 0xBDAC610CA86C3899, 0xBDAC610CA86C3899, 0xBDAC610CA86C3899, 0xBDAC610CA86C3899, 0xBDAC610CA86C3899 /* _dbLn2lo  = log(2) lo*/
+        .align 64
+        .quad 0x42B8000000000000, 0x42B8000000000000, 0x42B8000000000000, 0x42B8000000000000, 0x42B8000000000000, 0x42B8000000000000, 0x42B8000000000000, 0x42B8000000000000 /* _dbShifter */
+        .align 64
+        .quad 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD /* _dPC2 */
+        .align 64
+        .quad 0x3FC5555570813E14, 0x3FC5555570813E14, 0x3FC5555570813E14, 0x3FC5555570813E14, 0x3FC5555570813E14, 0x3FC5555570813E14, 0x3FC5555570813E14, 0x3FC5555570813E14 /* _dPC3 */
+        .align 64
+        .quad 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299 /* _dPC4 */
+        .align 64
+        .quad 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000 /* _lExpMask */
+        .align 64
+        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000 /* _dSign*/
+        .align 64
+        .long 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99 /* _iDomainRange 0x40861d9ac12a3e85 =(1021*2^K-0.5)*log(2)/2^K -needed for quick exp*/
+        .align 64
+        .type	__svml_dcosh_data_internal,@object
+        .size	__svml_dcosh_data_internal,.-__svml_dcosh_data_internal
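
The special-value path above, and the equivalent blocks in the other
kernels, all work the same way: lanes whose domain-range compare sets a bit
in the mask held in %edx are spilled to the stack together with the vector
result, and each flagged lane is recomputed with the scalar libm routine so
that errno and FP exceptions match the scalar function.  A hedged C sketch
of that loop; the names are illustrative, not the real internal interface.

#include <math.h>

/* Re-do the flagged lanes with scalar cosh; 'mask' is the bit mask the
   kernel computed from the domain-range compare, 'vlen' is 4, 8 or 16
   depending on the variant (the float kernels call coshf instead).  */
static void
fixup_special_lanes (const double *src, double *dst,
		     unsigned int mask, int vlen)
{
  for (int i = 0; i < vlen; i++)
    if (mask & (1u << i))
      dst[i] = cosh (src[i]);
}
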
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core-avx2.S
new file mode 100644
index 0000000000..456d8a129f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized coshf.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN16v_coshf _ZGVeN16v_coshf_avx2_wrapper
+#include "../svml_s_coshf16_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core.c
new file mode 100644
index 0000000000..34c008871a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized coshf, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN16v_coshf
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN16v_coshf, __GI__ZGVeN16v_coshf,
+	       __redirect__ZGVeN16v_coshf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core_avx512.S
new file mode 100644
index 0000000000..276e3cfe4d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core_avx512.S
@@ -0,0 +1,321 @@
+/* Function coshf vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute cosh(x) as (exp(x)+exp(-x))/2,
+ *   where exp is calculated as
+ *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
+ *
+ *   Special cases:
+ *
+ *   cosh(NaN) = quiet NaN, and raise invalid exception
+ *   cosh(+/-INF) = +INF
+ *   cosh(0)   = 1
+ *   cosh(x) overflows for big x; the overflow threshold is about MAXLOG+log(2)
+ *
+ */
+
+/* Offsets for data table __svml_scosh_data_internal
+ */
+#define _sExp_tbl_PH                  	0
+#define _sExp_tbl_NH                  	128
+#define _sShifter_UISA                	256
+#define _iDomainRange_UISA            	320
+#define _sPC1_UISA                    	384
+#define _sPC2_UISA                    	448
+#define _sPC3_UISA                    	512
+#define _sInvLn2                      	576
+#define _sLn2hi                       	640
+#define _sLn2lo                       	704
+#define _sSign                        	768
+#define _iExpMask                     	832
+#define _sShifter                     	896
+#define _iDomainRange                 	960
+#define _sPC1                         	1024
+#define _sPC2                         	1088
+#define _sPC3                         	1152
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN16v_coshf_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovups   _sSign+__svml_scosh_data_internal(%rip), %zmm4
+        vmovups   _sShifter_UISA+__svml_scosh_data_internal(%rip), %zmm6
+
+/*
+ *  Load argument
+ * dM = x/log(2) + RShifter
+ */
+        vmovups   _sInvLn2+__svml_scosh_data_internal(%rip), %zmm10
+        vmovups   _sLn2hi+__svml_scosh_data_internal(%rip), %zmm7
+        vmovups   _sLn2lo+__svml_scosh_data_internal(%rip), %zmm9
+
+/*  */
+        vmovups   _sPC3_UISA+__svml_scosh_data_internal(%rip), %zmm2
+
+/* x^2 */
+        vmovups   _sPC2_UISA+__svml_scosh_data_internal(%rip), %zmm3
+
+/*  G1,G2 2^N,2^(-N)  */
+        vmovups   __svml_scosh_data_internal(%rip), %zmm12
+        vmovups   _sExp_tbl_NH+__svml_scosh_data_internal(%rip), %zmm13
+
+/*
+ *  Implementation
+ *  Abs argument
+ */
+        vandnps   %zmm0, %zmm4, %zmm1
+
+/* Check for overflow/underflow  */
+        vpternlogd $255, %zmm5, %zmm5, %zmm5
+        vfmadd213ps {rn-sae}, %zmm6, %zmm1, %zmm10
+        vpcmpd    $1, _iDomainRange_UISA+__svml_scosh_data_internal(%rip), %zmm1, %k1
+
+/* iM now is an EXP(2^N) */
+        vpslld    $18, %zmm10, %zmm11
+
+/*
+ *  R
+ * sN = sM - RShifter
+ */
+        vsubps    {rn-sae}, %zmm6, %zmm10, %zmm8
+        vpermt2ps _sExp_tbl_PH+64+__svml_scosh_data_internal(%rip), %zmm10, %zmm12
+        vpermt2ps _sExp_tbl_NH+64+__svml_scosh_data_internal(%rip), %zmm10, %zmm13
+        vpandnd   %zmm1, %zmm1, %zmm5{%k1}
+
+/* sR = sX - sN*Log2_hi */
+        vfnmadd231ps {rn-sae}, %zmm7, %zmm8, %zmm1
+        vptestmd  %zmm5, %zmm5, %k0
+
+/* sR = (sX - sN*Log2_hi) - sN*Log2_lo */
+        vfnmadd231ps {rn-sae}, %zmm9, %zmm8, %zmm1
+        kmovw     %k0, %edx
+        vmulps    {rn-sae}, %zmm1, %zmm1, %zmm4
+        vmulps    {rn-sae}, %zmm4, %zmm2, %zmm2
+
+/* sSinh_r = r + r*(r^2*(a3)) */
+        vfmadd213ps {rn-sae}, %zmm1, %zmm1, %zmm2
+
+/* sOut = r^2*(a2) */
+        vmulps    {rn-sae}, %zmm4, %zmm3, %zmm1
+        vpandd    _iExpMask+__svml_scosh_data_internal(%rip), %zmm11, %zmm14
+        vpaddd    %zmm14, %zmm12, %zmm15
+        vpsubd    %zmm14, %zmm13, %zmm10
+
+/* sG2 = 2^N*Th + 2^(-N)*T_h */
+        vaddps    {rn-sae}, %zmm10, %zmm15, %zmm5
+
+/* sG1 = 2^N*Th - 2^(-N)*T_h */
+        vsubps    {rn-sae}, %zmm10, %zmm15, %zmm6
+
+/* res = sG1*(r + r*(r^2*(a3))) + sG2*(1+r^2*(a2)) */
+        vfmadd213ps {rn-sae}, %zmm5, %zmm5, %zmm1
+        vfmadd213ps {rn-sae}, %zmm1, %zmm2, %zmm6
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm6
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        vmovaps   %zmm6, %zmm0
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm0, 64(%rsp)
+        vmovups   %zmm6, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm6
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $16, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm6
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm6
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     64(%rsp,%r14,4), %xmm0
+        call      coshf@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 128(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN16v_coshf_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_scosh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(64)) VUINT32 _sExp_tbl_PH[32][1];
+        __declspec(align(64)) VUINT32 _sExp_tbl_NH[32][1];
+        __declspec(align(64)) VUINT32 _sShifter_UISA[16][1];
+        __declspec(align(64)) VUINT32 _iDomainRange_UISA[16][1];
+        __declspec(align(64)) VUINT32 _sPC1_UISA[16][1];
+        __declspec(align(64)) VUINT32 _sPC2_UISA[16][1];
+        __declspec(align(64)) VUINT32 _sPC3_UISA[16][1];
+        __declspec(align(64)) VUINT32 _sInvLn2[16][1];
+        __declspec(align(64)) VUINT32 _sLn2hi[16][1];
+        __declspec(align(64)) VUINT32 _sLn2lo[16][1];
+        __declspec(align(64)) VUINT32 _sSign[16][1];
+        __declspec(align(64)) VUINT32 _iExpMask[16][1];
+        __declspec(align(64)) VUINT32 _sShifter[16][1];
+        __declspec(align(64)) VUINT32 _iDomainRange[16][1];
+        __declspec(align(64)) VUINT32 _sPC1[16][1];
+        __declspec(align(64)) VUINT32 _sPC2[16][1];
+        __declspec(align(64)) VUINT32 _sPC3[16][1];
+} __svml_scosh_data_internal;
+#endif
+__svml_scosh_data_internal:
+        /* _sExp_tbl_PH 2^(i/32-1), i=0..31 */
+        .long 0x3f000000, 0x3f02cd87, 0x3f05aac3, 0x3f08980f
+        .long 0x3f0b95c2, 0x3f0ea43a, 0x3f11c3d3, 0x3f14f4f0
+        .long 0x3f1837f0, 0x3f1b8d3a, 0x3f1ef532, 0x3f227043
+        .long 0x3f25fed7, 0x3f29a15b, 0x3f2d583f, 0x3f3123f6
+        .long 0x3f3504f3, 0x3f38fbaf, 0x3f3d08a4, 0x3f412c4d
+        .long 0x3f45672a, 0x3f49b9be, 0x3f4e248c, 0x3f52a81e
+        .long 0x3f5744fd, 0x3f5bfbb8, 0x3f60ccdf, 0x3f65b907
+        .long 0x3f6ac0c7, 0x3f6fe4ba, 0x3f75257d, 0x3f7a83b3
+        /* _sExp_tbl_NH 2^(-i/32-1), i=0..31 */
+        .align 64
+        .long 0x3f000000, 0x3efa83b3, 0x3ef5257d, 0x3eefe4ba
+        .long 0x3eeac0c7, 0x3ee5b907, 0x3ee0ccdf, 0x3edbfbb8
+        .long 0x3ed744fd, 0x3ed2a81e, 0x3ece248c, 0x3ec9b9be
+        .long 0x3ec5672a, 0x3ec12c4d, 0x3ebd08a4, 0x3eb8fbaf
+        .long 0x3eb504f3, 0x3eb123f6, 0x3ead583f, 0x3ea9a15b
+        .long 0x3ea5fed7, 0x3ea27043, 0x3e9ef532, 0x3e9b8d3a
+        .long 0x3e9837f0, 0x3e94f4f0, 0x3e91c3d3, 0x3e8ea43a
+        .long 0x3e8b95c2, 0x3e88980f, 0x3e85aac3, 0x3e82cd87
+        .align 64
+        .long 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000         /* 1.5*2^18 _sShifter_UISA */
+        .align 64
+        .long 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E         /* _iDomainRange_UISA */
+        .align 64
+        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000         /* _sPC1_UISA=1       */
+        .align 64
+        .long 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f         /* _sPC2_UISA         */
+        .align 64
+        .long 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd         /* _sPC3_UISA         */
+        .align 64
+        .long 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B       /* _sInvLn2  */  //k=0
+        .align 64
+        .long 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000       /* _sLn2hi   */
+        .align 64
+        .long 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4       /* _sLn2lo   */
+        .align 64
+        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000       /* _sSign    */
+        .align 64
+        .long 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000       /* _iExpMask */
+        .align 64
+        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000           /* _sShifter */
+        .align 64
+        .long 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E           /* _iDomainRange */
+        .align 64
+        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000         /* _sPC1=1  */
+        .align 64
+        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000         /* _sPC2  */
+        .align 64
+        .long 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57         /* _sPC3  */
+        .align 64
+        .type	__svml_scosh_data_internal,@object
+        .size	__svml_scosh_data_internal,.-__svml_scosh_data_internal
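
All of these kernels load the argument with the same right-shifter idiom
("dM = x/log(2) + RShifter"): adding a constant of the form 1.5*2^p under
round-to-nearest quantizes x/log(2), leaves the quantized value in the low
mantissa bits ready for the index mask or the shift into the exponent field,
and subtracting the shifter back recovers it as a floating-point value.  The
shifters appear to differ only in p: 1.5*2^44 and 1.5*2^48 for the double
tables, 1.5*2^18 for the 32-entry float tables, and 1.5*2^23 for plain
integers as in _sShifter.  A small stand-alone demonstration of the float
integer case, illustrative only:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int
main (void)
{
  /* Assumes the default round-to-nearest rounding mode.  */
  const float shifter = 0x1.8p23f;	/* 1.5 * 2^23, like _sShifter */
  float x = 41.7f;
  float m = x + shifter;		/* rounds x to the nearest integer */

  uint32_t bits;
  memcpy (&bits, &m, sizeof bits);

  printf ("rounded value: %g\n", (double) (m - shifter));	/* 42 */
  printf ("low mantissa bits: %u\n", bits & 0xffu);		/* 42 */
  return 0;
}
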
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core-sse2.S
new file mode 100644
index 0000000000..c719dc7d6a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized coshf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN4v_coshf _ZGVbN4v_coshf_sse2
+#include "../svml_s_coshf4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core.c
new file mode 100644
index 0000000000..c2dfcd44f8
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized coshf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN4v_coshf
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN4v_coshf, __GI__ZGVbN4v_coshf,
+	       __redirect__ZGVbN4v_coshf)
+  __attribute__ ((visibility ("hidden")));
+#endif
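
The SSE4 kernel that follows takes the table-free route: N = round(x/log(2)),
2^(N-1) and 2^(-N-1) are built directly in the exponent field (pslld $23
against _iHalf), and exp(r) is split into its even and odd halves so that
cosh(x) = (2^(N-1)+2^(-N-1))*C(r) + (2^(N-1)-2^(-N-1))*S(r).  Below is a
hedged scalar model of that scheme, with textbook 1/k! coefficients instead
of the tuned _sPC values and no special-case handling.

#include <math.h>
#include <stdio.h>

static float
coshf_model (float x)
{
  float ax = fabsf (x);
  int n = (int) lrintf (ax * (float) (1.0 / M_LN2));
  float r = ax - (float) n * (float) M_LN2;	/* |r| <= log(2)/2 */
  float r2 = r * r;

  /* sG2 = 2^(N-1) + 2^(-N-1), sG1 = 2^(N-1) - 2^(-N-1) */
  float g_hi = ldexpf (1.0f, n - 1);
  float g_lo = ldexpf (1.0f, -n - 1);

  /* Even/odd halves of exp(r): cosh(x) = sG2*C(r) + sG1*S(r).  */
  float even = 1.0f + r2 * (0.5f + r2 * (1.0f / 24.0f + r2 * (1.0f / 720.0f)));
  float odd  = r * (1.0f + r2 * (1.0f / 6.0f + r2 * (1.0f / 120.0f)));

  return (g_hi + g_lo) * even + (g_hi - g_lo) * odd;
}

int
main (void)
{
  for (float x = -10.0f; x <= 10.0f; x += 2.5f)
    printf ("x = % .2f  model = % .7g  libm = % .7g\n",
	    (double) x, (double) coshf_model (x), (double) coshf (x));
  return 0;
}
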
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core_sse4.S
new file mode 100644
index 0000000000..506f6a4bd9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core_sse4.S
@@ -0,0 +1,305 @@
+/* Function coshf vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute cosh(x) as (exp(x)+exp(-x))/2,
+ *   where exp is calculated as
+ *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
+ *
+ *   Special cases:
+ *
+ *   cosh(NaN) = quiet NaN, and raise invalid exception
+ *   cosh(+/-INF) = +INF
+ *   cosh(0)   = 1
+ *   cosh(x) overflows for big x; the overflow threshold is about MAXLOG+log(2)
+ *
+ */
+
+/* Offsets for data table __svml_scosh_data_internal
+ */
+#define _sInvLn2                      	0
+#define _sLn2hi                       	16
+#define _sLn2lo                       	32
+#define _sSign                        	48
+#define _sShifter                     	64
+#define _iDomainRange                 	80
+#define _sPC1                         	96
+#define _sPC2                         	112
+#define _sPC3                         	128
+#define _sPC4                         	144
+#define _sPC5                         	160
+#define _sPC6                         	176
+#define _iHalf                        	192
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN4v_coshf_sse4)
+        subq      $72, %rsp
+        cfi_def_cfa_offset(80)
+
+/*
+ *  Implementation
+ *  Abs argument
+ */
+        movups    _sSign+__svml_scosh_data_internal(%rip), %xmm1
+
+/*
+ *  Load argument
+ * dM = x/log(2) + RShifter
+ */
+        movups    _sInvLn2+__svml_scosh_data_internal(%rip), %xmm9
+        andnps    %xmm0, %xmm1
+        mulps     %xmm1, %xmm9
+
+/* Check for overflow/underflow  */
+        movaps    %xmm1, %xmm3
+        movups    _sShifter+__svml_scosh_data_internal(%rip), %xmm4
+        movups    _sLn2hi+__svml_scosh_data_internal(%rip), %xmm5
+        addps     %xmm4, %xmm9
+
+/*
+ *  R
+ * sN = sM - RShifter
+ */
+        movaps    %xmm9, %xmm6
+
+/*
+ *  G1,G2 2^N,2^(-N)
+ * iM now is an EXP(2^N)
+ */
+        pslld     $23, %xmm9
+        movups    _sLn2lo+__svml_scosh_data_internal(%rip), %xmm7
+        subps     %xmm4, %xmm6
+
+/* sR = sX - sN*Log2_hi */
+        mulps     %xmm6, %xmm5
+
+/* sR = (sX - sN*Log2_hi) - sN*Log2_lo */
+        mulps     %xmm6, %xmm7
+        movdqu    _iDomainRange+__svml_scosh_data_internal(%rip), %xmm2
+        pcmpgtd   %xmm2, %xmm3
+        pcmpeqd   %xmm1, %xmm2
+
+/*
+ * sinh(r) ~ r*(1 + r^2*(a3 + r^2*a5)) = r + r*(r^2*(a3 + r^2*a5))
+ * sSinh_r = (a3+r^2*a5)
+ */
+        movups    _sPC5+__svml_scosh_data_internal(%rip), %xmm10
+        por       %xmm2, %xmm3
+
+/*
+ * cosh(X) = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2))
+ * sOut = (a4 + a6*sR2)
+ */
+        movups    _sPC6+__svml_scosh_data_internal(%rip), %xmm11
+        subps     %xmm5, %xmm1
+        movmskps  %xmm3, %edx
+        movdqu    _iHalf+__svml_scosh_data_internal(%rip), %xmm8
+        subps     %xmm7, %xmm1
+
+/* sR2 = sR^2, shuffled */
+        movaps    %xmm1, %xmm13
+        movdqa    %xmm8, %xmm2
+        mulps     %xmm1, %xmm13
+        paddd     %xmm9, %xmm2
+        mulps     %xmm13, %xmm10
+        psubd     %xmm9, %xmm8
+        mulps     %xmm13, %xmm11
+        addps     _sPC3+__svml_scosh_data_internal(%rip), %xmm10
+        addps     _sPC4+__svml_scosh_data_internal(%rip), %xmm11
+
+/* sSinh_r = r^2*(a3+r^2*a5) */
+        mulps     %xmm13, %xmm10
+
+/* sOut = a2+sR2*(a4+a6*sR2) */
+        mulps     %xmm13, %xmm11
+
+/* sSinh_r = r + r*(r^2*(a3+r^2*a5)) */
+        mulps     %xmm1, %xmm10
+        addps     _sPC2+__svml_scosh_data_internal(%rip), %xmm11
+        addps     %xmm10, %xmm1
+
+/* sOut = sR2*(a2+sR2*(a4+a6*sR2) */
+        mulps     %xmm11, %xmm13
+
+/* sG1 = 2^(N-1)-2^(-N-1) */
+        movdqa    %xmm2, %xmm12
+
+/* sG2 = 2^(N-1)+2^(-N-1) */
+        addps     %xmm8, %xmm2
+        subps     %xmm8, %xmm12
+
+/* sOut = sG2*sR2*(a2+sR2*(a4+a6*sR2) */
+        mulps     %xmm2, %xmm13
+
+/* sOut = sG1*sinh(dR)+sG2*sR2*(a2+sR2*(a4+a6*sR2) */
+        mulps     %xmm1, %xmm12
+        addps     %xmm12, %xmm13
+
+/* sOut = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2) */
+        addps     %xmm13, %xmm2
+
+/*  Ret H  */
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm2
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movaps    %xmm2, %xmm0
+        addq      $72, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(80)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm0, 32(%rsp)
+        movups    %xmm2, 48(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 edx
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -64)
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -72)
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm2
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -64)
+        cfi_offset(13, -72)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm2
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      coshf@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 48(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN4v_coshf_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_scosh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(16)) VUINT32 _sInvLn2[4][1];
+        __declspec(align(16)) VUINT32 _sLn2hi[4][1];
+        __declspec(align(16)) VUINT32 _sLn2lo[4][1];
+        __declspec(align(16)) VUINT32 _sSign[4][1];
+        __declspec(align(16)) VUINT32 _sShifter[4][1];
+        __declspec(align(16)) VUINT32 _iDomainRange[4][1];
+        __declspec(align(16)) VUINT32 _sPC1[4][1];
+        __declspec(align(16)) VUINT32 _sPC2[4][1];
+        __declspec(align(16)) VUINT32 _sPC3[4][1];
+        __declspec(align(16)) VUINT32 _sPC4[4][1];
+        __declspec(align(16)) VUINT32 _sPC5[4][1];
+        __declspec(align(16)) VUINT32 _sPC6[4][1];
+        __declspec(align(16)) VUINT32 _iHalf[4][1];
+} __svml_scosh_data_internal;
+#endif
+__svml_scosh_data_internal:
+        .long 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B       /* _sInvLn2  */  //k=0
+        .align 16
+        .long 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000       /* _sLn2hi   */
+        .align 16
+        .long 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4       /* _sLn2lo   */
+        .align 16
+        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000       /* _sSign    */
+        .align 16
+        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000           /* _sShifter */
+        .align 16
+        .long 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E           /* _iDomainRange */
+        .align 16
+        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000         /* _sPC1=1  */
+        .align 16
+        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000         /* _sPC2  */
+        .align 16
+        .long 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57         /* _sPC3  */
+        .align 16
+        .long 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72         /* _sPC4  */
+        .align 16
+        .long 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461         /* _sPC5  */
+        .align 16
+        .long 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3         /* _sPC6  */
+        // Integer constants
+        .align 16
+        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000 /* _iHalf*/
+        .align 16
+        .type	__svml_scosh_data_internal,@object
+        .size	__svml_scosh_data_internal,.-__svml_scosh_data_internal
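The L(SPECIAL_VALUES_BRANCH) path above amounts to the loop below: lanes whose input is NaN or whose |x| is at or above _iDomainRange set a bit in the movmskps mask, and each set bit is redone with the scalar coshf so exceptions and errno match libm.  A minimal C model (coshf_fixup_lanes is a hypothetical name):

  #include <math.h>

  /* mask comes from movmskps; vlen is 4 for this SSE4 kernel.  */
  static void
  coshf_fixup_lanes (const float *src, float *dst, int mask, int vlen)
  {
    for (int i = 0; i < vlen; i++)
      if (mask & (1 << i))
        dst[i] = coshf (src[i]);    /* the coshf@PLT call above */
  }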
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core-sse.S
new file mode 100644
index 0000000000..c27229e1fa
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized coshf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN8v_coshf _ZGVdN8v_coshf_sse_wrapper
+#include "../svml_s_coshf8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core.c
new file mode 100644
index 0000000000..e82818b2c9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized coshf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN8v_coshf
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN8v_coshf, __GI__ZGVdN8v_coshf,
+	       __redirect__ZGVdN8v_coshf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core_avx2.S
new file mode 100644
index 0000000000..9149061e7e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core_avx2.S
@@ -0,0 +1,308 @@
+/* Function coshf vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute cosh(x) as (exp(x)+exp(-x))/2,
+ *   where exp is calculated as
+ *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
+ *
+ *   Special cases:
+ *
+ *   cosh(NaN) = quiet NaN (invalid exception raised for a signaling NaN)
+ *   cosh(+/-INF) = +INF
+ *   cosh(0)   = 1
+ *   cosh(x) overflows to +INF once |x| exceeds about MAXLOG+log(2)
+ *
+ */
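The algorithm is the same as in the SSE4 kernel; what changes is that the multiply/add pairs of the argument reduction collapse into FMAs.  A sketch of that step with AVX2 intrinsics (illustrative only; coshf8_reduce is a hypothetical name, the __m256 parameters stand for broadcasts of the data-table constants, and the code needs -mavx2 -mfma):

  #include <immintrin.h>

  /* r = (|x| - n*ln2_hi) - n*ln2_lo with n = round(|x|/ln2) via the
     shifter trick, mirroring the vfmadd213ps/vfnmadd231ps sequence.  */
  static __m256
  coshf8_reduce (__m256 ax, __m256 inv_ln2, __m256 ln2_hi, __m256 ln2_lo,
                 __m256 shifter)
  {
    __m256 m = _mm256_fmadd_ps (ax, inv_ln2, shifter);  /* ax/ln2 + shifter */
    __m256 n = _mm256_sub_ps (m, shifter);
    __m256 r = _mm256_fnmadd_ps (n, ln2_hi, ax);        /* ax - n*ln2_hi    */
    return _mm256_fnmadd_ps (n, ln2_lo, r);             /* ... - n*ln2_lo   */
  }

The polynomial evaluation and the 2^(N-1)/2^(-N-1) reconstruction then follow the scalar sketch shown for the SSE4 version.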
+
+/* Offsets for data table __svml_scosh_data_internal
+ */
+#define _sInvLn2                      	0
+#define _sLn2hi                       	32
+#define _sLn2lo                       	64
+#define _sSign                        	96
+#define _sShifter                     	128
+#define _iDomainRange                 	160
+#define _sPC1                         	192
+#define _sPC2                         	224
+#define _sPC3                         	256
+#define _sPC4                         	288
+#define _sPC5                         	320
+#define _sPC6                         	352
+#define _iHalf                        	384
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN8v_coshf_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+        vmovups   _sSign+__svml_scosh_data_internal(%rip), %ymm2
+        vmovups   _sShifter+__svml_scosh_data_internal(%rip), %ymm7
+
+/*
+ *  Load argument
+ * dM = x/log(2) + RShifter
+ */
+        vmovups   _sInvLn2+__svml_scosh_data_internal(%rip), %ymm10
+        vmovups   _sLn2hi+__svml_scosh_data_internal(%rip), %ymm8
+        vmovups   _iDomainRange+__svml_scosh_data_internal(%rip), %ymm3
+
+/*
+ * sinh(r) ~ r*(1 + r^2*(a3 + r^2*a5)) = r + r*(r^2*(a3 + r^2*a5))
+ * sSinh_r = (a3+r^2*a5)
+ */
+        vmovups   _sPC5+__svml_scosh_data_internal(%rip), %ymm15
+        vmovups   _iHalf+__svml_scosh_data_internal(%rip), %ymm11
+        vmovaps   %ymm0, %ymm1
+
+/*
+ *  Implementation
+ *  Abs argument
+ */
+        vandnps   %ymm1, %ymm2, %ymm0
+        vfmadd213ps %ymm7, %ymm0, %ymm10
+
+/*
+ *  R
+ * sN = sM - RShifter
+ */
+        vsubps    %ymm7, %ymm10, %ymm9
+
+/*
+ *  G1,G2 2^N,2^(-N)
+ * iM now is an EXP(2^N)
+ */
+        vpslld    $23, %ymm10, %ymm12
+
+/* Check for overflow/underflow  */
+        vpcmpgtd  %ymm3, %ymm0, %ymm4
+        vpcmpeqd  %ymm3, %ymm0, %ymm5
+
+/* sR = sX - sN*Log2_hi */
+        vfnmadd231ps %ymm8, %ymm9, %ymm0
+        vpaddd    %ymm12, %ymm11, %ymm13
+        vpsubd    %ymm12, %ymm11, %ymm14
+        vpor      %ymm5, %ymm4, %ymm6
+
+/* sR = (sX - sN*Log2_hi) - sN*Log2_lo */
+        vfnmadd231ps _sLn2lo+__svml_scosh_data_internal(%rip), %ymm9, %ymm0
+
+/* sG1 = 2^(N-1)-2^(-N-1) */
+        vsubps    %ymm14, %ymm13, %ymm4
+
+/* sG2 = 2^(N-1)+2^(-N-1) */
+        vaddps    %ymm14, %ymm13, %ymm3
+
+/* sR2 = sR^2, shuffled */
+        vmulps    %ymm0, %ymm0, %ymm2
+        vfmadd213ps _sPC3+__svml_scosh_data_internal(%rip), %ymm2, %ymm15
+
+/* sSinh_r = r^2*(a3+r^2*a5) */
+        vmulps    %ymm15, %ymm2, %ymm13
+
+/* sSinh_r = r + r*(r^2*(a3+r^2*a5)) */
+        vfmadd213ps %ymm0, %ymm0, %ymm13
+
+/*
+ * cosh(X) = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2))
+ * sOut = (a4 + a6*sR2)
+ */
+        vmovups   _sPC6+__svml_scosh_data_internal(%rip), %ymm0
+        vfmadd213ps _sPC4+__svml_scosh_data_internal(%rip), %ymm2, %ymm0
+
+/* sOut = a2+sR2*(a4+a6*sR2) */
+        vfmadd213ps _sPC2+__svml_scosh_data_internal(%rip), %ymm2, %ymm0
+
+/* sOut = sR2*(a2+sR2*(a4+a6*sR2) */
+        vmulps    %ymm0, %ymm2, %ymm15
+
+/* sOut = sG2*sR2*(a2+sR2*(a4+a6*sR2) */
+        vmulps    %ymm15, %ymm3, %ymm14
+
+/* sOut = sG1*sinh(dR)+sG2*sR2*(a2+sR2*(a4+a6*sR2) */
+        vfmadd213ps %ymm14, %ymm13, %ymm4
+        vmovmskps %ymm6, %edx
+
+/* sOut = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2) */
+        vaddps    %ymm4, %ymm3, %ymm0
+
+/*  Ret H  */
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm1
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %ymm1, 32(%rsp)
+        vmovups   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      coshf@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 64(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN8v_coshf_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_scosh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(32)) VUINT32 _sInvLn2[8][1];
+        __declspec(align(32)) VUINT32 _sLn2hi[8][1];
+        __declspec(align(32)) VUINT32 _sLn2lo[8][1];
+        __declspec(align(32)) VUINT32 _sSign[8][1];
+        __declspec(align(32)) VUINT32 _sShifter[8][1];
+        __declspec(align(32)) VUINT32 _iDomainRange[8][1];
+        __declspec(align(32)) VUINT32 _sPC1[8][1];
+        __declspec(align(32)) VUINT32 _sPC2[8][1];
+        __declspec(align(32)) VUINT32 _sPC3[8][1];
+        __declspec(align(32)) VUINT32 _sPC4[8][1];
+        __declspec(align(32)) VUINT32 _sPC5[8][1];
+        __declspec(align(32)) VUINT32 _sPC6[8][1];
+        __declspec(align(32)) VUINT32 _iHalf[8][1];
+} __svml_scosh_data_internal;
+#endif
+__svml_scosh_data_internal:
+        .long 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B       /* _sInvLn2  */  //k=0
+        .align 32
+        .long 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000       /* _sLn2hi   */
+        .align 32
+        .long 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4       /* _sLn2lo   */
+        .align 32
+        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000       /* _sSign    */
+        .align 32
+        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000           /* _sShifter */
+        .align 32
+        .long 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E           /* _iDomainRange */
+        .align 32
+        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000         /* _sPC1=1  */
+        .align 32
+        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000         /* _sPC2  */
+        .align 32
+        .long 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57         /* _sPC3  */
+        .align 32
+        .long 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72         /* _sPC4  */
+        .align 32
+        .long 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461         /* _sPC5  */
+        .align 32
+        .long 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3         /* _sPC6  */
+        // Integer constants
+        .align 32
+        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000 /* _iHalf*/
+        .align 32
+        .type	__svml_scosh_data_internal,@object
+        .size	__svml_scosh_data_internal,.-__svml_scosh_data_internal
diff --git a/sysdeps/x86_64/fpu/svml_d_cosh2_core.S b/sysdeps/x86_64/fpu/svml_d_cosh2_core.S
new file mode 100644
index 0000000000..f95952cfe5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cosh2_core.S
@@ -0,0 +1,29 @@
+/* Function cosh vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN2v_cosh)
+WRAPPER_IMPL_SSE2 cosh
+END (_ZGVbN2v_cosh)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN2v_cosh)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_cosh4_core.S b/sysdeps/x86_64/fpu/svml_d_cosh4_core.S
new file mode 100644
index 0000000000..cc24d0fb6b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cosh4_core.S
@@ -0,0 +1,29 @@
+/* Function cosh vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN4v_cosh)
+WRAPPER_IMPL_AVX _ZGVbN2v_cosh
+END (_ZGVdN4v_cosh)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN4v_cosh)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_cosh4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_cosh4_core_avx.S
new file mode 100644
index 0000000000..4323f5e308
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cosh4_core_avx.S
@@ -0,0 +1,25 @@
+/* Function cosh vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN4v_cosh)
+WRAPPER_IMPL_AVX _ZGVbN2v_cosh
+END (_ZGVcN4v_cosh)
diff --git a/sysdeps/x86_64/fpu/svml_d_cosh8_core.S b/sysdeps/x86_64/fpu/svml_d_cosh8_core.S
new file mode 100644
index 0000000000..90ee1ca125
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cosh8_core.S
@@ -0,0 +1,25 @@
+/* Function cosh vectorized with AVX-512, wrapper to AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN8v_cosh)
+WRAPPER_IMPL_AVX512 _ZGVdN4v_cosh
+END (_ZGVeN8v_cosh)
diff --git a/sysdeps/x86_64/fpu/svml_s_coshf16_core.S b/sysdeps/x86_64/fpu/svml_s_coshf16_core.S
new file mode 100644
index 0000000000..fe243b8b94
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_coshf16_core.S
@@ -0,0 +1,25 @@
+/* Function coshf vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN16v_coshf)
+WRAPPER_IMPL_AVX512 _ZGVdN8v_coshf
+END (_ZGVeN16v_coshf)
diff --git a/sysdeps/x86_64/fpu/svml_s_coshf4_core.S b/sysdeps/x86_64/fpu/svml_s_coshf4_core.S
new file mode 100644
index 0000000000..b55ede6e38
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_coshf4_core.S
@@ -0,0 +1,29 @@
+/* Function coshf vectorized with SSE2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN4v_coshf)
+WRAPPER_IMPL_SSE2 coshf
+END (_ZGVbN4v_coshf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4v_coshf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_coshf8_core.S b/sysdeps/x86_64/fpu/svml_s_coshf8_core.S
new file mode 100644
index 0000000000..3ea02d0f19
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_coshf8_core.S
@@ -0,0 +1,29 @@
+/* Function coshf vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN8v_coshf)
+WRAPPER_IMPL_AVX _ZGVbN4v_coshf
+END (_ZGVdN8v_coshf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8v_coshf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_coshf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_coshf8_core_avx.S
new file mode 100644
index 0000000000..9b3002f7c9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_coshf8_core_avx.S
@@ -0,0 +1,25 @@
+/* Function coshf vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+        .text
+ENTRY (_ZGVcN8v_coshf)
+WRAPPER_IMPL_AVX _ZGVbN4v_coshf
+END (_ZGVcN8v_coshf)
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx.c
new file mode 100644
index 0000000000..1dd311a562
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-cosh.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx2.c
new file mode 100644
index 0000000000..1dd311a562
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx2.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-cosh.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx512f.c
new file mode 100644
index 0000000000..1dd311a562
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx512f.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-cosh.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cosh.c b/sysdeps/x86_64/fpu/test-double-libmvec-cosh.c
new file mode 100644
index 0000000000..cf49ec5d87
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-cosh.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE double
+#define LIBMVEC_FUNC cosh
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index 256e8f07c9..68c449e04a 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -33,6 +33,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVbN2v_asin)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVbN2vv_hypot)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVbN2v_exp2)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVbN2v_exp10)
+VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVbN2v_cosh)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index 9de1dab2c2..df67306373 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -36,6 +36,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVdN4v_asin)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVdN4vv_hypot)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVdN4v_exp2)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVdN4v_exp10)
+VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVdN4v_cosh)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m256i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index 43865ab099..1a6731098f 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -33,6 +33,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVcN4v_asin)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVcN4vv_hypot)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVcN4v_exp2)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVcN4v_exp10)
+VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVcN4v_cosh)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index 5dbdacf617..4cdfa918e8 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -33,6 +33,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVeN8v_asin)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVeN8vv_hypot)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVeN8v_exp2)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVeN8v_exp10)
+VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVeN8v_cosh)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx.c
new file mode 100644
index 0000000000..905dc3ca4a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-coshf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx2.c
new file mode 100644
index 0000000000..905dc3ca4a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx2.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-coshf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx512f.c
new file mode 100644
index 0000000000..905dc3ca4a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx512f.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-coshf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-coshf.c b/sysdeps/x86_64/fpu/test-float-libmvec-coshf.c
new file mode 100644
index 0000000000..94b899076b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-coshf.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE float
+#define LIBMVEC_FUNC coshf
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index c159c8f583..47a9862233 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -33,6 +33,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVeN16v_asinf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVeN16vv_hypotf)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVeN16v_exp2f)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVeN16v_exp10f)
+VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVeN16v_coshf)
 
 #define VEC_INT_TYPE __m512i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index c745ef744a..e7c5410e7b 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -33,6 +33,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVbN4v_asinf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVbN4vv_hypotf)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVbN4v_exp2f)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVbN4v_exp10f)
+VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVbN4v_coshf)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index c9226cf4dc..b8e9d48cd6 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -36,6 +36,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVdN8v_asinf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVdN8vv_hypotf)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVdN8v_exp2f)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVdN8v_exp10f)
+VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVdN8v_coshf)
 
 /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
 #undef VECTOR_WRAPPER_fFF
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index 92970c5ace..328c827b27 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -33,6 +33,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVcN8v_asinf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVcN8vv_hypotf)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVcN8v_exp2f)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVcN8v_exp10f)
+VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVcN8v_coshf)
 
 #define VEC_INT_TYPE __m128i
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 07/18] x86-64: Add vector expm1/expm1f implementation to libmvec
  2021-12-29  6:39 [PATCH v5 00/18] x86-64: Add vector math functions to libmvec Sunil K Pandey
                   ` (5 preceding siblings ...)
  2021-12-29  6:39 ` [PATCH v5 06/18] x86-64: Add vector cosh/coshf " Sunil K Pandey
@ 2021-12-29  6:39 ` Sunil K Pandey
  2021-12-29 21:25   ` H.J. Lu
  2021-12-29  6:39 ` [PATCH v5 08/18] x86-64: Add vector sinh/sinhf " Sunil K Pandey
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Sunil K Pandey @ 2021-12-29  6:39 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, andrey.kolesov, marius.cornea

Implement vectorized expm1/expm1f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector expm1/expm1f with regenerated ulps.
---
 bits/libm-simd-decl-stubs.h                   |  11 +
 math/bits/mathcalls.h                         |   2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
 sysdeps/x86/fpu/bits/math-vector.h            |   4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
 sysdeps/x86_64/fpu/Makeconfig                 |   1 +
 sysdeps/x86_64/fpu/Versions                   |   2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |  20 +
 .../fpu/multiarch/svml_d_expm12_core-sse2.S   |  20 +
 .../x86_64/fpu/multiarch/svml_d_expm12_core.c |  27 ++
 .../fpu/multiarch/svml_d_expm12_core_sse4.S   | 421 ++++++++++++++++++
 .../fpu/multiarch/svml_d_expm14_core-sse.S    |  20 +
 .../x86_64/fpu/multiarch/svml_d_expm14_core.c |  27 ++
 .../fpu/multiarch/svml_d_expm14_core_avx2.S   | 408 +++++++++++++++++
 .../fpu/multiarch/svml_d_expm18_core-avx2.S   |  20 +
 .../x86_64/fpu/multiarch/svml_d_expm18_core.c |  27 ++
 .../fpu/multiarch/svml_d_expm18_core_avx512.S | 334 ++++++++++++++
 .../fpu/multiarch/svml_s_expm1f16_core-avx2.S |  20 +
 .../fpu/multiarch/svml_s_expm1f16_core.c      |  28 ++
 .../multiarch/svml_s_expm1f16_core_avx512.S   | 281 ++++++++++++
 .../fpu/multiarch/svml_s_expm1f4_core-sse2.S  |  20 +
 .../fpu/multiarch/svml_s_expm1f4_core.c       |  28 ++
 .../fpu/multiarch/svml_s_expm1f4_core_sse4.S  | 358 +++++++++++++++
 .../fpu/multiarch/svml_s_expm1f8_core-sse.S   |  20 +
 .../fpu/multiarch/svml_s_expm1f8_core.c       |  28 ++
 .../fpu/multiarch/svml_s_expm1f8_core_avx2.S  | 351 +++++++++++++++
 sysdeps/x86_64/fpu/svml_d_expm12_core.S       |  29 ++
 sysdeps/x86_64/fpu/svml_d_expm14_core.S       |  29 ++
 sysdeps/x86_64/fpu/svml_d_expm14_core_avx.S   |  25 ++
 sysdeps/x86_64/fpu/svml_d_expm18_core.S       |  25 ++
 sysdeps/x86_64/fpu/svml_s_expm1f16_core.S     |  25 ++
 sysdeps/x86_64/fpu/svml_s_expm1f4_core.S      |  29 ++
 sysdeps/x86_64/fpu/svml_s_expm1f8_core.S      |  29 ++
 sysdeps/x86_64/fpu/svml_s_expm1f8_core_avx.S  |  25 ++
 .../fpu/test-double-libmvec-expm1-avx.c       |   1 +
 .../fpu/test-double-libmvec-expm1-avx2.c      |   1 +
 .../fpu/test-double-libmvec-expm1-avx512f.c   |   1 +
 .../x86_64/fpu/test-double-libmvec-expm1.c    |   3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
 .../fpu/test-float-libmvec-expm1f-avx.c       |   1 +
 .../fpu/test-float-libmvec-expm1f-avx2.c      |   1 +
 .../fpu/test-float-libmvec-expm1f-avx512f.c   |   1 +
 .../x86_64/fpu/test-float-libmvec-expm1f.c    |   3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
 50 files changed, 2725 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_expm12_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_expm14_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_expm14_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_expm18_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_expm1f16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_expm1f4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_expm1f8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_expm1f8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-expm1.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-expm1f.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
index 35c6ac57a8..28dc4a82c5 100644
--- a/bits/libm-simd-decl-stubs.h
+++ b/bits/libm-simd-decl-stubs.h
@@ -175,4 +175,15 @@
 #define __DECL_SIMD_coshf32x
 #define __DECL_SIMD_coshf64x
 #define __DECL_SIMD_coshf128x
+
+#define __DECL_SIMD_expm1
+#define __DECL_SIMD_expm1f
+#define __DECL_SIMD_expm1l
+#define __DECL_SIMD_expm1f16
+#define __DECL_SIMD_expm1f32
+#define __DECL_SIMD_expm1f64
+#define __DECL_SIMD_expm1f128
+#define __DECL_SIMD_expm1f32x
+#define __DECL_SIMD_expm1f64x
+#define __DECL_SIMD_expm1f128x
 #endif
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index 60a314f69e..c57adc8ace 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -116,7 +116,7 @@ __MATHCALL_VEC (exp10,, (_Mdouble_ __x));
 
 #if defined __USE_XOPEN_EXTENDED || defined __USE_ISOC99
 /* Return exp(X) - 1.  */
-__MATHCALL (expm1,, (_Mdouble_ __x));
+__MATHCALL_VEC (expm1,, (_Mdouble_ __x));
 
 /* Return log(1 + X).  */
 __MATHCALL (log1p,, (_Mdouble_ __x));
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index 4907680143..c9d3213bd3 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -52,6 +52,7 @@ GLIBC_2.35 _ZGVbN2v_atan F
 GLIBC_2.35 _ZGVbN2v_cosh F
 GLIBC_2.35 _ZGVbN2v_exp10 F
 GLIBC_2.35 _ZGVbN2v_exp2 F
+GLIBC_2.35 _ZGVbN2v_expm1 F
 GLIBC_2.35 _ZGVbN2vv_hypot F
 GLIBC_2.35 _ZGVbN4v_acosf F
 GLIBC_2.35 _ZGVbN4v_asinf F
@@ -59,6 +60,7 @@ GLIBC_2.35 _ZGVbN4v_atanf F
 GLIBC_2.35 _ZGVbN4v_coshf F
 GLIBC_2.35 _ZGVbN4v_exp10f F
 GLIBC_2.35 _ZGVbN4v_exp2f F
+GLIBC_2.35 _ZGVbN4v_expm1f F
 GLIBC_2.35 _ZGVbN4vv_hypotf F
 GLIBC_2.35 _ZGVcN4v_acos F
 GLIBC_2.35 _ZGVcN4v_asin F
@@ -66,6 +68,7 @@ GLIBC_2.35 _ZGVcN4v_atan F
 GLIBC_2.35 _ZGVcN4v_cosh F
 GLIBC_2.35 _ZGVcN4v_exp10 F
 GLIBC_2.35 _ZGVcN4v_exp2 F
+GLIBC_2.35 _ZGVcN4v_expm1 F
 GLIBC_2.35 _ZGVcN4vv_hypot F
 GLIBC_2.35 _ZGVcN8v_acosf F
 GLIBC_2.35 _ZGVcN8v_asinf F
@@ -73,6 +76,7 @@ GLIBC_2.35 _ZGVcN8v_atanf F
 GLIBC_2.35 _ZGVcN8v_coshf F
 GLIBC_2.35 _ZGVcN8v_exp10f F
 GLIBC_2.35 _ZGVcN8v_exp2f F
+GLIBC_2.35 _ZGVcN8v_expm1f F
 GLIBC_2.35 _ZGVcN8vv_hypotf F
 GLIBC_2.35 _ZGVdN4v_acos F
 GLIBC_2.35 _ZGVdN4v_asin F
@@ -80,6 +84,7 @@ GLIBC_2.35 _ZGVdN4v_atan F
 GLIBC_2.35 _ZGVdN4v_cosh F
 GLIBC_2.35 _ZGVdN4v_exp10 F
 GLIBC_2.35 _ZGVdN4v_exp2 F
+GLIBC_2.35 _ZGVdN4v_expm1 F
 GLIBC_2.35 _ZGVdN4vv_hypot F
 GLIBC_2.35 _ZGVdN8v_acosf F
 GLIBC_2.35 _ZGVdN8v_asinf F
@@ -87,6 +92,7 @@ GLIBC_2.35 _ZGVdN8v_atanf F
 GLIBC_2.35 _ZGVdN8v_coshf F
 GLIBC_2.35 _ZGVdN8v_exp10f F
 GLIBC_2.35 _ZGVdN8v_exp2f F
+GLIBC_2.35 _ZGVdN8v_expm1f F
 GLIBC_2.35 _ZGVdN8vv_hypotf F
 GLIBC_2.35 _ZGVeN16v_acosf F
 GLIBC_2.35 _ZGVeN16v_asinf F
@@ -94,6 +100,7 @@ GLIBC_2.35 _ZGVeN16v_atanf F
 GLIBC_2.35 _ZGVeN16v_coshf F
 GLIBC_2.35 _ZGVeN16v_exp10f F
 GLIBC_2.35 _ZGVeN16v_exp2f F
+GLIBC_2.35 _ZGVeN16v_expm1f F
 GLIBC_2.35 _ZGVeN16vv_hypotf F
 GLIBC_2.35 _ZGVeN8v_acos F
 GLIBC_2.35 _ZGVeN8v_asin F
@@ -101,4 +108,5 @@ GLIBC_2.35 _ZGVeN8v_atan F
 GLIBC_2.35 _ZGVeN8v_cosh F
 GLIBC_2.35 _ZGVeN8v_exp10 F
 GLIBC_2.35 _ZGVeN8v_exp2 F
+GLIBC_2.35 _ZGVeN8v_expm1 F
 GLIBC_2.35 _ZGVeN8vv_hypot F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index 708e81b3d0..e2f98e176f 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -86,6 +86,10 @@
 #  define __DECL_SIMD_cosh __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_coshf
 #  define __DECL_SIMD_coshf __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_expm1
+#  define __DECL_SIMD_expm1 __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_expm1f
+#  define __DECL_SIMD_expm1f __DECL_SIMD_x86_64
 
 # endif
 #endif
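With the __DECL_SIMD_expm1* stubs enabled here, math.h attaches the omp declare simd attribute to expm1/expm1f on x86-64, so a vectorizing compiler may replace calls in SIMD loops with the new _ZGV*_expm1* entry points.  A rough usage sketch (illustrative; exact flags depend on the GCC version, and -ffast-math or at least -fno-math-errno is typically required for the substitution):

  /* expm1-vec.c: gcc -O2 -ffast-math -fopenmp-simd expm1-vec.c -lmvec -lm */
  #include <math.h>

  void
  apply_expm1 (double *y, const double *x, int n)
  {
  #pragma omp simd
    for (int i = 0; i < n; i++)
      y[i] = expm1 (x[i]);    /* may be vectorized to _ZGVdN4v_expm1 etc.  */
  }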
diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
index 81d0238ebf..43233059f6 100644
--- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
@@ -42,6 +42,8 @@
 !GCC$ builtin (exp10f) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (cosh) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (coshf) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (expm1) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (expm1f) attributes simd (notinbranch) if('x86_64')
 
 !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
@@ -69,3 +71,5 @@
 !GCC$ builtin (exp10f) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (cosh) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (coshf) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (expm1) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (expm1f) attributes simd (notinbranch) if('x32')
diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
index 5bc2df134f..8de8214971 100644
--- a/sysdeps/x86_64/fpu/Makeconfig
+++ b/sysdeps/x86_64/fpu/Makeconfig
@@ -30,6 +30,7 @@ libmvec-funcs = \
   exp \
   exp10 \
   exp2 \
+  expm1 \
   hypot \
   log \
   pow \
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index 53346d16a2..58debb2dbe 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -20,6 +20,7 @@ libmvec {
     _ZGVbN2v_cosh; _ZGVcN4v_cosh; _ZGVdN4v_cosh; _ZGVeN8v_cosh;
     _ZGVbN2v_exp10; _ZGVcN4v_exp10; _ZGVdN4v_exp10; _ZGVeN8v_exp10;
     _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
+    _ZGVbN2v_expm1; _ZGVcN4v_expm1; _ZGVdN4v_expm1; _ZGVeN8v_expm1;
     _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
     _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
     _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
@@ -27,6 +28,7 @@ libmvec {
     _ZGVbN4v_coshf; _ZGVcN8v_coshf; _ZGVdN8v_coshf; _ZGVeN16v_coshf;
     _ZGVbN4v_exp10f; _ZGVcN8v_exp10f; _ZGVdN8v_exp10f; _ZGVeN16v_exp10f;
     _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
+    _ZGVbN4v_expm1f; _ZGVcN8v_expm1f; _ZGVdN8v_expm1f; _ZGVeN16v_expm1f;
     _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf;
   }
 }
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index ac70f15208..f05ece8c8a 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -1395,6 +1395,26 @@ float: 1
 float128: 3
 ldouble: 4
 
+Function: "expm1_vlen16":
+float: 1
+
+Function: "expm1_vlen2":
+double: 1
+
+Function: "expm1_vlen4":
+double: 1
+float: 1
+
+Function: "expm1_vlen4_avx2":
+double: 1
+
+Function: "expm1_vlen8":
+double: 1
+float: 1
+
+Function: "expm1_vlen8_avx2":
+float: 1
+
 Function: "gamma":
 double: 4
 float: 7
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core-sse2.S
new file mode 100644
index 0000000000..e8cb6faaca
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized expm1, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN2v_expm1 _ZGVbN2v_expm1_sse2
+#include "../svml_d_expm12_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core.c
new file mode 100644
index 0000000000..9c794e932e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized expm1, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN2v_expm1
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN2v_expm1, __GI__ZGVbN2v_expm1, __redirect__ZGVbN2v_expm1)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core_sse4.S
new file mode 100644
index 0000000000..db763e3856
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core_sse4.S
@@ -0,0 +1,421 @@
+/* Function expm1 vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    N = (int)(x*2^k/log(2.0)), R = x - N*log(2)/2^k
+ *    exp(x) = 2^(N/2^k) * poly(R) is computed in high-low parts
+ *    expm1(x) = exp(x)-1 is then obtained via multi-precision computation
+ *
+ *
+ */
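+
+/* For reference only: a rough scalar C sketch of the same scheme (not part
+   of the vector code; LOG2E_2K, L2H, L2L, TBL_MASK, Th_tbl, Tl_tbl and
+   poly() are schematic names standing in for the data-table fields below):
+
+     double n  = nearbyint (x * LOG2E_2K);     // N = (int)(x*2^k/log(2.0))
+     double r  = (x - n * L2H) - n * L2L;      // reduced argument R
+     double th = Th_tbl[(int) n & TBL_MASK];   // high part of 2^(N/2^k)
+     double tl = Tl_tbl[(int) n & TBL_MASK];   // low part of 2^(N/2^k)
+     // expm1(x) = (Th-1) + Tl + (Th+Tl)*(R + R^2*poly(R)); folding the -1
+     // into the table term avoids cancellation for small results.
+     return (th - 1.0 + tl) + (th + tl) * (r + r * r * poly (r));
+ */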
+
+/* Offsets for data table __svml_dexpm1_data_internal
+ */
+#define Expm1_HA_table                	0
+#define poly_coeff                    	2048
+#define Log2e                         	2112
+#define L2H                           	2128
+#define L2L                           	2144
+#define ExpAddConst                   	2160
+#define IndexMask                     	2176
+#define ExpMask                       	2192
+#define MOne                          	2208
+#define AbsMask                       	2224
+#define Threshold                     	2240
+#define L2                            	2256
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN2v_expm1_sse4)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $64, %rsp
+        movaps    %xmm0, %xmm2
+        movups    Log2e+__svml_dexpm1_data_internal(%rip), %xmm7
+        lea       __svml_dexpm1_data_internal(%rip), %rsi
+        mulpd     %xmm0, %xmm7
+        movups    .FLT_10(%rip), %xmm3
+        addpd     %xmm3, %xmm7
+        subpd     %xmm3, %xmm7
+
+/* argument reduction */
+        movups    L2H+__svml_dexpm1_data_internal(%rip), %xmm4
+        mulpd     %xmm7, %xmm4
+        movups    L2L+__svml_dexpm1_data_internal(%rip), %xmm5
+        mulpd     %xmm7, %xmm5
+        subpd     %xmm4, %xmm2
+        subpd     %xmm5, %xmm2
+
+/* polynomial */
+        movups    poly_coeff+__svml_dexpm1_data_internal(%rip), %xmm12
+        movaps    %xmm2, %xmm14
+        mulpd     %xmm2, %xmm12
+        mulpd     %xmm2, %xmm14
+        addpd     poly_coeff+16+__svml_dexpm1_data_internal(%rip), %xmm12
+        movups    ExpAddConst+__svml_dexpm1_data_internal(%rip), %xmm15
+        addpd     %xmm7, %xmm15
+        mulpd     %xmm14, %xmm12
+        movups    poly_coeff+32+__svml_dexpm1_data_internal(%rip), %xmm13
+        mulpd     %xmm2, %xmm13
+
+/* table lookup */
+        movdqu    IndexMask+__svml_dexpm1_data_internal(%rip), %xmm8
+        pand      %xmm15, %xmm8
+        movups    AbsMask+__svml_dexpm1_data_internal(%rip), %xmm1
+        pshufd    $2, %xmm8, %xmm9
+        movaps    %xmm1, %xmm6
+        movd      %xmm8, %eax
+        andps     %xmm0, %xmm6
+        movd      %xmm9, %ecx
+        andnps    %xmm0, %xmm1
+        movdqu    ExpMask+__svml_dexpm1_data_internal(%rip), %xmm11
+        pand      %xmm11, %xmm15
+        cmpnlepd  Threshold+__svml_dexpm1_data_internal(%rip), %xmm6
+        addpd     poly_coeff+48+__svml_dexpm1_data_internal(%rip), %xmm13
+        movmskpd  %xmm6, %edx
+        psllq     $41, %xmm15
+
+/* T-1 */
+        movups    MOne+__svml_dexpm1_data_internal(%rip), %xmm4
+        movslq    %eax, %rax
+        movslq    %ecx, %rcx
+        addpd     %xmm12, %xmm13
+        movups    (%rsi,%rax), %xmm3
+        movups    (%rsi,%rcx), %xmm10
+        movaps    %xmm3, %xmm6
+        unpckhpd  %xmm10, %xmm3
+
+/* Th1 = (Th-1) + Tl */
+        mulpd     %xmm15, %xmm3
+        mulpd     %xmm13, %xmm14
+        unpcklpd  %xmm10, %xmm6
+        orps      %xmm15, %xmm6
+        addpd     %xmm4, %xmm6
+        addpd     %xmm14, %xmm2
+        addpd     %xmm3, %xmm6
+
+/* T = Th+Tl */
+        movaps    %xmm6, %xmm5
+        subpd     %xmm4, %xmm5
+        mulpd     %xmm5, %xmm2
+        addpd     %xmm2, %xmm6
+        orps      %xmm1, %xmm6
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx xmm0 xmm6
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movaps    %xmm6, %xmm0
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm0, 32(%rsp)
+        movups    %xmm6, 48(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $2, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm6
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 xmm6
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      expm1@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 48(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVbN2v_expm1_sse4)
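+
+/* The special-value path above behaves like the following scalar loop
+   (illustrative C sketch only; `in', `out' and `mask' stand for the two
+   spilled input/result lanes and the range-check bits on the stack):
+
+     for (int i = 0; i < 2; i++)
+       if (mask & (1 << i))       // lane flagged by the range check
+         out[i] = expm1 (in[i]);  // defer to the scalar libm routine
+ */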
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_dexpm1_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(16)) VUINT32 Expm1_HA_table[(1<<8)][2];
+        __declspec(align(16)) VUINT32 poly_coeff[4][2][2];
+        __declspec(align(16)) VUINT32 Log2e[2][2];
+        __declspec(align(16)) VUINT32 L2H[2][2];
+        __declspec(align(16)) VUINT32 L2L[2][2];
+        __declspec(align(16)) VUINT32 ExpAddConst[2][2];
+        __declspec(align(16)) VUINT32 IndexMask[2][2];
+        __declspec(align(16)) VUINT32 ExpMask[2][2];
+        __declspec(align(16)) VUINT32 MOne[2][2];
+        __declspec(align(16)) VUINT32 AbsMask[2][2];
+        __declspec(align(16)) VUINT32 Threshold[2][2];
+        __declspec(align(16)) VUINT32 L2[2][2];
+} __svml_dexpm1_data_internal;
+#endif
+__svml_dexpm1_data_internal:
+        /* Expm1_HA_table */
+        .quad 0x0000000000000000, 0x0000000000000000
+        .quad 0x0000163da8000000, 0x3e3fb33356d84a67
+        .quad 0x00002c9a40000000, 0xbe3887f9f1190835
+        .quad 0x00004315e8000000, 0x3e1b9fe12f5ce3e7
+        .quad 0x000059b0d0000000, 0x3e48ac2ba1d73e2a
+        .quad 0x0000706b28000000, 0x3e3ddf6ddc6dc404
+        .quad 0x0000874518000000, 0x3e1d66f20230d7c9
+        .quad 0x00009e3ec8000000, 0x3e46379c1a290f03
+        .quad 0x0000b55870000000, 0xbe4833b784eb3a37
+        .quad 0x0000cc9228000000, 0x3e4b923fba03db83
+        .quad 0x0000e3ec30000000, 0x3e469e8d10103a17
+        .quad 0x0000fb66b0000000, 0xbdb2ce50dcdf6e22
+        .quad 0x00011301d0000000, 0x3df25b50a4ebbf1b
+        .quad 0x00012abdc0000000, 0x3e1b0c72fee4aeb5
+        .quad 0x0001429ab0000000, 0xbe356d2204cbefe7
+        .quad 0x00015a98c8000000, 0x3e24b1ca24901aae
+        .quad 0x000172b840000000, 0xbe4c15742919041c
+        .quad 0x00018af938000000, 0x3e2191bd3777ee17
+        .quad 0x0001a35be8000000, 0x3e4b7e5ba9e5b4c8
+        .quad 0x0001bbe088000000, 0xbe4fdd19632a70c7
+        .quad 0x0001d48730000000, 0x3e368b9aa7805b80
+        .quad 0x0001ed5020000000, 0x3e47e6c8e5c40d00
+        .quad 0x0002063b88000000, 0x3e18a3358ee3bac1
+        .quad 0x00021f4990000000, 0x3e37ddc962552fd3
+        .quad 0x0002387a70000000, 0xbe38a9dc7993e052
+        .quad 0x000251ce50000000, 0xbe135670329f5521
+        .quad 0x00026b4568000000, 0xbe40ec1916d42cc6
+        .quad 0x000284dfe0000000, 0x3e3f5638096cf15d
+        .quad 0x00029e9df8000000, 0xbe470108f69ed175
+        .quad 0x0002b87fd0000000, 0x3e2b5b31ffbbd48d
+        .quad 0x0002d285a8000000, 0xbe31bfcf4bff6e2b
+        .quad 0x0002ecafa8000000, 0x3e33e2f5611ca0f4
+        .quad 0x000306fe08000000, 0x3e418db8a96f46ad
+        .quad 0x0003217100000000, 0xbe4d993e76563187
+        .quad 0x00033c08b0000000, 0x3e4320b7fa64e431
+        .quad 0x000356c560000000, 0xbe1b5803cdae772e
+        .quad 0x000371a738000000, 0xbe28aac6ab1d7560
+        .quad 0x00038cae70000000, 0xbe47d13cd3d2b1a8
+        .quad 0x0003a7db38000000, 0xbe48d30048af21b7
+        .quad 0x0003c32dc0000000, 0x3e489d47242000f9
+        .quad 0x0003dea650000000, 0xbe4f6e5eee525f6f
+        .quad 0x0003fa4508000000, 0xbe4a9bff22fa047f
+        .quad 0x0004160a20000000, 0x3e3f72e29f84325c
+        .quad 0x000431f5d8000000, 0x3e350a896dc70444
+        .quad 0x00044e0860000000, 0x3e18624b40c4dbd0
+        .quad 0x00046a41f0000000, 0xbe4717fd446d7686
+        .quad 0x000486a2b8000000, 0xbe41f6197f61f2e2
+        .quad 0x0004a32af0000000, 0x3e2afa7bcce5b17a
+        .quad 0x0004bfdad8000000, 0xbe464eaec715e343
+        .quad 0x0004dcb298000000, 0x3e3fddd0d63b36ef
+        .quad 0x0004f9b278000000, 0xbe362d35952cc275
+        .quad 0x000516daa0000000, 0x3e467b320e0897a9
+        .quad 0x0005342b58000000, 0xbe362b07e20f57c4
+        .quad 0x000551a4c8000000, 0x3e42ec9076297631
+        .quad 0x00056f4738000000, 0xbe34ad8259913500
+        .quad 0x00058d12d8000000, 0xbe4b41c016d6a1ea
+        .quad 0x0005ab07e0000000, 0xbe45bd5eb539b67f
+        .quad 0x0005c92688000000, 0x3e42ca35b80e258e
+        .quad 0x0005e76f18000000, 0xbe4296f5bc8b20da
+        .quad 0x000605e1b8000000, 0x3e376dc08b076f59
+        .quad 0x0006247eb0000000, 0x3e0d2ac258f87d03
+        .quad 0x0006434638000000, 0xbe4999e701c483c7
+        .quad 0x0006623880000000, 0x3e42a91124893ecf
+        .quad 0x00068155d8000000, 0xbe4d9ab467bf1d47
+        .quad 0x0006a09e68000000, 0xbe380c4336f74d05
+        .quad 0x0006c01278000000, 0xbe47a12a08944ab3
+        .quad 0x0006dfb240000000, 0xbe4cd72e886ef8ea
+        .quad 0x0006ff7df8000000, 0x3e3519483cf87e1b
+        .quad 0x00071f75e8000000, 0x3e2d8bee7ba46e1e
+        .quad 0x00073f9a48000000, 0x3e24b02e77ab934a
+        .quad 0x00075feb58000000, 0xbe3bd98374091656
+        .quad 0x0007806950000000, 0xbe00d1604f328fec
+        .quad 0x0007a11470000000, 0x3e4f580c36bea881
+        .quad 0x0007c1ed00000000, 0x3e330c1327c49334
+        .quad 0x0007e2f338000000, 0xbe330b19defa2fd4
+        .quad 0x0008042758000000, 0xbe4e0f2f724f90cc
+        .quad 0x0008258998000000, 0x3e34cce128acf88b
+        .quad 0x0008471a48000000, 0xbe3dc385331ad094
+        .quad 0x000868d998000000, 0x3e4a2497640720ed
+        .quad 0x00088ac7d8000000, 0x3e38a669966530bd
+        .quad 0x0008ace540000000, 0x3e415506dadd3e2b
+        .quad 0x0008cf3218000000, 0xbe34abb7410d55e3
+        .quad 0x0008f1ae98000000, 0x3e31577362b98274
+        .quad 0x0009145b08000000, 0x3e4c8ffe2c4530da
+        .quad 0x00093737b0000000, 0x3e29b8bc9e8a0388
+        .quad 0x00095a44c8000000, 0x3e4e4290774da41b
+        .quad 0x00097d82a0000000, 0xbe00d8d83a30b6f8
+        .quad 0x0009a0f170000000, 0x3e2940f737462137
+        .quad 0x0009c49180000000, 0x3e451f8480e3e236
+        .quad 0x0009e86318000000, 0x3e3e323231824ca8
+        .quad 0x000a0c6678000000, 0x3e4aef2b2594d6d4
+        .quad 0x000a309bf0000000, 0xbe4dae966539f470
+        .quad 0x000a5503b0000000, 0x3e41f12ae45a1225
+        .quad 0x000a799e10000000, 0x3e49859ac3796fd9
+        .quad 0x000a9e6b58000000, 0xbe44301205e0a6de
+        .quad 0x000ac36bc0000000, 0xbe0606431f9234cb
+        .quad 0x000ae89f98000000, 0x3e35ad3ad5e8734d
+        .quad 0x000b0e0728000000, 0x3e38db66590842ad
+        .quad 0x000b33a2b8000000, 0x3e13c57ebdaff43a
+        .quad 0x000b597290000000, 0xbe40d536338e3bf7
+        .quad 0x000b7f76f0000000, 0x3e47daf237553d84
+        .quad 0x000ba5b030000000, 0x3e2420c930819679
+        .quad 0x000bcc1e90000000, 0x3e12f074891ee83d
+        .quad 0x000bf2c258000000, 0x3e4eb8f0442046b8
+        .quad 0x000c199be0000000, 0xbe43d56b1eeef9a7
+        .quad 0x000c40ab60000000, 0xbd87c2c975903ef8
+        .quad 0x000c67f130000000, 0xbe3a82eb4b5dec80
+        .quad 0x000c8f6d98000000, 0xbe4fc8c257729a1e
+        .quad 0x000cb720e0000000, 0xbe48837cb757e1a1
+        .quad 0x000cdf0b58000000, 0xbe4511e031dd83b5
+        .quad 0x000d072d48000000, 0x3e403c4bdc687918
+        .quad 0x000d2f8708000000, 0x3deb13e315bc2473
+        .quad 0x000d5818e0000000, 0xbe4822dbc6d12fd3
+        .quad 0x000d80e318000000, 0xbe3367c68447b063
+        .quad 0x000da9e600000000, 0x3e4ed9942b84600d
+        .quad 0x000dd321f0000000, 0x3e480da3025b4aef
+        .quad 0x000dfc9730000000, 0x3e4bdcdaf5cb4656
+        .quad 0x000e264618000000, 0xbe4852f6baf6c4f0
+        .quad 0x000e502ee8000000, 0xbe1d30027630bb40
+        .quad 0x000e7a51f8000000, 0x3e4e3a641a5aa459
+        .quad 0x000ea4afa0000000, 0x3e452486cc2c7b9d
+        .quad 0x000ecf4830000000, 0xbe438cc07b927e77
+        .quad 0x000efa1bf0000000, 0xbe39ea5d888e02de
+        .quad 0x000f252b38000000, 0xbe2288ad162f2d20
+        .quad 0x000f507658000000, 0x3e4b722a033a7c26
+        .quad 0x000f7bfdb0000000, 0xbe431a0f63b7625a
+        .quad 0x000fa7c180000000, 0x3e39e90d82e90a7e
+        .quad 0x000fd3c228000000, 0x3e4c7b8f884badd2
+        /*== poly_coeff[4] ==*/
+        .align 16
+        .quad 0x3f81111168877F38, 0x3f81111168877F38 /* coeff5 */
+        .quad 0x3fa55555C2A9C0F3, 0x3fa55555C2A9C0F3 /* coeff4 */
+        .quad 0x3fc555555555541D, 0x3fc555555555541D /* coeff3 */
+        .quad 0x3fdFFFFFFFFFFE5C, 0x3fdFFFFFFFFFFE5C /* coeff2 */
+        /*== Log2e ==*/
+        .align 16
+        .quad 0x40671547652B82FE, 0x40671547652B82FE
+        /*== L2H ==*/
+        .align 16
+        .quad 0x3f762e42fef80000, 0x3f762e42fef80000
+        /*== L2L ==*/
+        .align 16
+        .quad 0x3d41cf79abc9e3b4, 0x3d41cf79abc9e3b4
+        /*== ExpAddConst ==*/
+        .align 16
+        .quad 0x42f80000001ff800, 0x42f80000001ff800
+        /*== IndexMask ==*/
+        .align 16
+        .quad 0x00000000000007f0, 0x00000000000007f0
+        /*== ExpMask ==*/
+        .align 16
+        .quad 0x00000000003ff800, 0x00000000003ff800
+        /*== MOne ==*/
+        .align 16
+        .quad 0xbff0000000000000, 0xbff0000000000000
+        /*== AbsMask ==*/
+        .align 16
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff
+        /*== Threshold ==*/
+        .align 16
+        .quad 0x40861DA04CBAFE43, 0x40861DA04CBAFE43
+        /*== L2 ==*/
+        .align 16
+        .quad 0x3f762e42fefa39ef, 0x3f762e42fefa39ef
+        .align 16
+        .type	__svml_dexpm1_data_internal,@object
+        .size	__svml_dexpm1_data_internal,.-__svml_dexpm1_data_internal
+        .align 16
+
+.FLT_10:
+        .long	0x00000000,0x43380000,0x00000000,0x43380000
+        .type	.FLT_10,@object
+        .size	.FLT_10,16
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core-sse.S
new file mode 100644
index 0000000000..e7016708d0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized expm1, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN4v_expm1 _ZGVdN4v_expm1_sse_wrapper
+#include "../svml_d_expm14_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core.c
new file mode 100644
index 0000000000..4215d7dbaf
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized expm1, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN4v_expm1
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN4v_expm1, __GI__ZGVdN4v_expm1, __redirect__ZGVdN4v_expm1)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core_avx2.S
new file mode 100644
index 0000000000..c34f73a578
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core_avx2.S
@@ -0,0 +1,408 @@
+/* Function expm1 vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    N = (int)(x*2^k/log(2.0)), R = x - N*log(2)/2^k
+ *    exp(x) = 2^(N/2^k) * poly(R) is computed in high-low parts
+ *    expm1(x) = exp(x)-1 is then obtained via multi-precision computation
+ *
+ *
+ */
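+
+/* Note (illustrative only): the two fused negative-multiply-adds in the
+   argument reduction below compute, in scalar C terms,
+
+     r = fma (-n, L2H, x);   // vfnmadd213pd
+     r = fma (-n, L2L, r);   // vfnmadd231pd
+
+   i.e. R = x - N*log(2)/2^k split into high and low parts.  */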
+
+/* Offsets for data table __svml_dexpm1_data_internal
+ */
+#define Expm1_HA_table                	0
+#define poly_coeff                    	2048
+#define Log2e                         	2176
+#define L2H                           	2208
+#define L2L                           	2240
+#define ExpAddConst                   	2272
+#define IndexMask                     	2304
+#define ExpMask                       	2336
+#define MOne                          	2368
+#define AbsMask                       	2400
+#define Threshold                     	2432
+#define L2                            	2464
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN4v_expm1_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+        lea       __svml_dexpm1_data_internal(%rip), %r8
+        vmovapd   %ymm0, %ymm3
+        vmulpd    Log2e+__svml_dexpm1_data_internal(%rip), %ymm3, %ymm4
+
+/* argument reduction */
+        vmovupd   L2H+__svml_dexpm1_data_internal(%rip), %ymm2
+        vmovupd   AbsMask+__svml_dexpm1_data_internal(%rip), %ymm5
+        vroundpd  $0, %ymm4, %ymm8
+        vaddpd    ExpAddConst+__svml_dexpm1_data_internal(%rip), %ymm8, %ymm0
+        vfnmadd213pd %ymm3, %ymm8, %ymm2
+
+/* table lookup */
+        vandps    IndexMask+__svml_dexpm1_data_internal(%rip), %ymm0, %ymm9
+        vandpd    %ymm5, %ymm3, %ymm6
+        vcmpnle_uqpd Threshold+__svml_dexpm1_data_internal(%rip), %ymm6, %ymm7
+        vfnmadd231pd L2L+__svml_dexpm1_data_internal(%rip), %ymm8, %ymm2
+        vandnpd   %ymm3, %ymm5, %ymm1
+        vmovmskpd %ymm7, %eax
+        vmovupd   poly_coeff+64+__svml_dexpm1_data_internal(%rip), %ymm7
+        vmulpd    %ymm2, %ymm2, %ymm8
+        vfmadd213pd poly_coeff+96+__svml_dexpm1_data_internal(%rip), %ymm2, %ymm7
+        vandps    ExpMask+__svml_dexpm1_data_internal(%rip), %ymm0, %ymm0
+        vextractf128 $1, %ymm9, %xmm10
+        vmovd     %xmm9, %edx
+        vmovd     %xmm10, %esi
+        vpextrd   $2, %xmm9, %ecx
+        vpextrd   $2, %xmm10, %edi
+        movslq    %edx, %rdx
+        movslq    %ecx, %rcx
+        movslq    %esi, %rsi
+        movslq    %edi, %rdi
+        vmovupd   (%r8,%rdx), %xmm13
+        vmovupd   (%r8,%rcx), %xmm14
+        vmovupd   (%r8,%rsi), %xmm4
+        vmovupd   (%r8,%rdi), %xmm5
+        vunpcklpd %xmm14, %xmm13, %xmm11
+        vunpcklpd %xmm5, %xmm4, %xmm12
+        vpsllq    $41, %ymm0, %ymm10
+        vunpckhpd %xmm14, %xmm13, %xmm15
+        vunpckhpd %xmm5, %xmm4, %xmm13
+        vinsertf128 $1, %xmm12, %ymm11, %ymm6
+
+/* polynomial */
+        vmovupd   poly_coeff+__svml_dexpm1_data_internal(%rip), %ymm12
+
+/* T-1 */
+        vmovupd   MOne+__svml_dexpm1_data_internal(%rip), %ymm11
+        vfmadd213pd poly_coeff+32+__svml_dexpm1_data_internal(%rip), %ymm2, %ymm12
+        vfmadd213pd %ymm7, %ymm8, %ymm12
+        vorpd     %ymm10, %ymm6, %ymm9
+        vfmadd213pd %ymm2, %ymm8, %ymm12
+        vaddpd    %ymm11, %ymm9, %ymm2
+        vinsertf128 $1, %xmm13, %ymm15, %ymm14
+
+/* Th1 = (Th-1) + Tl */
+        vfmadd213pd %ymm2, %ymm10, %ymm14
+
+/* T = Th+Tl */
+        vsubpd    %ymm11, %ymm14, %ymm0
+        vfmadd213pd %ymm14, %ymm12, %ymm0
+        vorpd     %ymm1, %ymm0, %ymm0
+        testl     %eax, %eax
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm3
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovupd   %ymm3, 32(%rsp)
+        vmovupd   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0
+
+        xorl      %edx, %edx
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovupd   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      expm1@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 64(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN4v_expm1_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_dexpm1_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(32)) VUINT32 Expm1_HA_table[(1<<8)][2];
+        __declspec(align(32)) VUINT32 poly_coeff[4][4][2];
+        __declspec(align(32)) VUINT32 Log2e[4][2];
+        __declspec(align(32)) VUINT32 L2H[4][2];
+        __declspec(align(32)) VUINT32 L2L[4][2];
+        __declspec(align(32)) VUINT32 ExpAddConst[4][2];
+        __declspec(align(32)) VUINT32 IndexMask[4][2];
+        __declspec(align(32)) VUINT32 ExpMask[4][2];
+        __declspec(align(32)) VUINT32 MOne[4][2];
+        __declspec(align(32)) VUINT32 AbsMask[4][2];
+        __declspec(align(32)) VUINT32 Threshold[4][2];
+        __declspec(align(32)) VUINT32 L2[4][2];
+} __svml_dexpm1_data_internal;
+#endif
+__svml_dexpm1_data_internal:
+        /* Expm1_HA_table */
+        .quad 0x0000000000000000, 0x0000000000000000
+        .quad 0x0000163da8000000, 0x3e3fb33356d84a67
+        .quad 0x00002c9a40000000, 0xbe3887f9f1190835
+        .quad 0x00004315e8000000, 0x3e1b9fe12f5ce3e7
+        .quad 0x000059b0d0000000, 0x3e48ac2ba1d73e2a
+        .quad 0x0000706b28000000, 0x3e3ddf6ddc6dc404
+        .quad 0x0000874518000000, 0x3e1d66f20230d7c9
+        .quad 0x00009e3ec8000000, 0x3e46379c1a290f03
+        .quad 0x0000b55870000000, 0xbe4833b784eb3a37
+        .quad 0x0000cc9228000000, 0x3e4b923fba03db83
+        .quad 0x0000e3ec30000000, 0x3e469e8d10103a17
+        .quad 0x0000fb66b0000000, 0xbdb2ce50dcdf6e22
+        .quad 0x00011301d0000000, 0x3df25b50a4ebbf1b
+        .quad 0x00012abdc0000000, 0x3e1b0c72fee4aeb5
+        .quad 0x0001429ab0000000, 0xbe356d2204cbefe7
+        .quad 0x00015a98c8000000, 0x3e24b1ca24901aae
+        .quad 0x000172b840000000, 0xbe4c15742919041c
+        .quad 0x00018af938000000, 0x3e2191bd3777ee17
+        .quad 0x0001a35be8000000, 0x3e4b7e5ba9e5b4c8
+        .quad 0x0001bbe088000000, 0xbe4fdd19632a70c7
+        .quad 0x0001d48730000000, 0x3e368b9aa7805b80
+        .quad 0x0001ed5020000000, 0x3e47e6c8e5c40d00
+        .quad 0x0002063b88000000, 0x3e18a3358ee3bac1
+        .quad 0x00021f4990000000, 0x3e37ddc962552fd3
+        .quad 0x0002387a70000000, 0xbe38a9dc7993e052
+        .quad 0x000251ce50000000, 0xbe135670329f5521
+        .quad 0x00026b4568000000, 0xbe40ec1916d42cc6
+        .quad 0x000284dfe0000000, 0x3e3f5638096cf15d
+        .quad 0x00029e9df8000000, 0xbe470108f69ed175
+        .quad 0x0002b87fd0000000, 0x3e2b5b31ffbbd48d
+        .quad 0x0002d285a8000000, 0xbe31bfcf4bff6e2b
+        .quad 0x0002ecafa8000000, 0x3e33e2f5611ca0f4
+        .quad 0x000306fe08000000, 0x3e418db8a96f46ad
+        .quad 0x0003217100000000, 0xbe4d993e76563187
+        .quad 0x00033c08b0000000, 0x3e4320b7fa64e431
+        .quad 0x000356c560000000, 0xbe1b5803cdae772e
+        .quad 0x000371a738000000, 0xbe28aac6ab1d7560
+        .quad 0x00038cae70000000, 0xbe47d13cd3d2b1a8
+        .quad 0x0003a7db38000000, 0xbe48d30048af21b7
+        .quad 0x0003c32dc0000000, 0x3e489d47242000f9
+        .quad 0x0003dea650000000, 0xbe4f6e5eee525f6f
+        .quad 0x0003fa4508000000, 0xbe4a9bff22fa047f
+        .quad 0x0004160a20000000, 0x3e3f72e29f84325c
+        .quad 0x000431f5d8000000, 0x3e350a896dc70444
+        .quad 0x00044e0860000000, 0x3e18624b40c4dbd0
+        .quad 0x00046a41f0000000, 0xbe4717fd446d7686
+        .quad 0x000486a2b8000000, 0xbe41f6197f61f2e2
+        .quad 0x0004a32af0000000, 0x3e2afa7bcce5b17a
+        .quad 0x0004bfdad8000000, 0xbe464eaec715e343
+        .quad 0x0004dcb298000000, 0x3e3fddd0d63b36ef
+        .quad 0x0004f9b278000000, 0xbe362d35952cc275
+        .quad 0x000516daa0000000, 0x3e467b320e0897a9
+        .quad 0x0005342b58000000, 0xbe362b07e20f57c4
+        .quad 0x000551a4c8000000, 0x3e42ec9076297631
+        .quad 0x00056f4738000000, 0xbe34ad8259913500
+        .quad 0x00058d12d8000000, 0xbe4b41c016d6a1ea
+        .quad 0x0005ab07e0000000, 0xbe45bd5eb539b67f
+        .quad 0x0005c92688000000, 0x3e42ca35b80e258e
+        .quad 0x0005e76f18000000, 0xbe4296f5bc8b20da
+        .quad 0x000605e1b8000000, 0x3e376dc08b076f59
+        .quad 0x0006247eb0000000, 0x3e0d2ac258f87d03
+        .quad 0x0006434638000000, 0xbe4999e701c483c7
+        .quad 0x0006623880000000, 0x3e42a91124893ecf
+        .quad 0x00068155d8000000, 0xbe4d9ab467bf1d47
+        .quad 0x0006a09e68000000, 0xbe380c4336f74d05
+        .quad 0x0006c01278000000, 0xbe47a12a08944ab3
+        .quad 0x0006dfb240000000, 0xbe4cd72e886ef8ea
+        .quad 0x0006ff7df8000000, 0x3e3519483cf87e1b
+        .quad 0x00071f75e8000000, 0x3e2d8bee7ba46e1e
+        .quad 0x00073f9a48000000, 0x3e24b02e77ab934a
+        .quad 0x00075feb58000000, 0xbe3bd98374091656
+        .quad 0x0007806950000000, 0xbe00d1604f328fec
+        .quad 0x0007a11470000000, 0x3e4f580c36bea881
+        .quad 0x0007c1ed00000000, 0x3e330c1327c49334
+        .quad 0x0007e2f338000000, 0xbe330b19defa2fd4
+        .quad 0x0008042758000000, 0xbe4e0f2f724f90cc
+        .quad 0x0008258998000000, 0x3e34cce128acf88b
+        .quad 0x0008471a48000000, 0xbe3dc385331ad094
+        .quad 0x000868d998000000, 0x3e4a2497640720ed
+        .quad 0x00088ac7d8000000, 0x3e38a669966530bd
+        .quad 0x0008ace540000000, 0x3e415506dadd3e2b
+        .quad 0x0008cf3218000000, 0xbe34abb7410d55e3
+        .quad 0x0008f1ae98000000, 0x3e31577362b98274
+        .quad 0x0009145b08000000, 0x3e4c8ffe2c4530da
+        .quad 0x00093737b0000000, 0x3e29b8bc9e8a0388
+        .quad 0x00095a44c8000000, 0x3e4e4290774da41b
+        .quad 0x00097d82a0000000, 0xbe00d8d83a30b6f8
+        .quad 0x0009a0f170000000, 0x3e2940f737462137
+        .quad 0x0009c49180000000, 0x3e451f8480e3e236
+        .quad 0x0009e86318000000, 0x3e3e323231824ca8
+        .quad 0x000a0c6678000000, 0x3e4aef2b2594d6d4
+        .quad 0x000a309bf0000000, 0xbe4dae966539f470
+        .quad 0x000a5503b0000000, 0x3e41f12ae45a1225
+        .quad 0x000a799e10000000, 0x3e49859ac3796fd9
+        .quad 0x000a9e6b58000000, 0xbe44301205e0a6de
+        .quad 0x000ac36bc0000000, 0xbe0606431f9234cb
+        .quad 0x000ae89f98000000, 0x3e35ad3ad5e8734d
+        .quad 0x000b0e0728000000, 0x3e38db66590842ad
+        .quad 0x000b33a2b8000000, 0x3e13c57ebdaff43a
+        .quad 0x000b597290000000, 0xbe40d536338e3bf7
+        .quad 0x000b7f76f0000000, 0x3e47daf237553d84
+        .quad 0x000ba5b030000000, 0x3e2420c930819679
+        .quad 0x000bcc1e90000000, 0x3e12f074891ee83d
+        .quad 0x000bf2c258000000, 0x3e4eb8f0442046b8
+        .quad 0x000c199be0000000, 0xbe43d56b1eeef9a7
+        .quad 0x000c40ab60000000, 0xbd87c2c975903ef8
+        .quad 0x000c67f130000000, 0xbe3a82eb4b5dec80
+        .quad 0x000c8f6d98000000, 0xbe4fc8c257729a1e
+        .quad 0x000cb720e0000000, 0xbe48837cb757e1a1
+        .quad 0x000cdf0b58000000, 0xbe4511e031dd83b5
+        .quad 0x000d072d48000000, 0x3e403c4bdc687918
+        .quad 0x000d2f8708000000, 0x3deb13e315bc2473
+        .quad 0x000d5818e0000000, 0xbe4822dbc6d12fd3
+        .quad 0x000d80e318000000, 0xbe3367c68447b063
+        .quad 0x000da9e600000000, 0x3e4ed9942b84600d
+        .quad 0x000dd321f0000000, 0x3e480da3025b4aef
+        .quad 0x000dfc9730000000, 0x3e4bdcdaf5cb4656
+        .quad 0x000e264618000000, 0xbe4852f6baf6c4f0
+        .quad 0x000e502ee8000000, 0xbe1d30027630bb40
+        .quad 0x000e7a51f8000000, 0x3e4e3a641a5aa459
+        .quad 0x000ea4afa0000000, 0x3e452486cc2c7b9d
+        .quad 0x000ecf4830000000, 0xbe438cc07b927e77
+        .quad 0x000efa1bf0000000, 0xbe39ea5d888e02de
+        .quad 0x000f252b38000000, 0xbe2288ad162f2d20
+        .quad 0x000f507658000000, 0x3e4b722a033a7c26
+        .quad 0x000f7bfdb0000000, 0xbe431a0f63b7625a
+        .quad 0x000fa7c180000000, 0x3e39e90d82e90a7e
+        .quad 0x000fd3c228000000, 0x3e4c7b8f884badd2
+        /*== poly_coeff[4] ==*/
+        .align 32
+        .quad 0x3f81111168877F38, 0x3f81111168877F38, 0x3f81111168877F38, 0x3f81111168877F38 /* coeff5 */
+        .quad 0x3fa55555C2A9C0F3, 0x3fa55555C2A9C0F3, 0x3fa55555C2A9C0F3, 0x3fa55555C2A9C0F3 /* coeff4 */
+        .quad 0x3fc555555555541D, 0x3fc555555555541D, 0x3fc555555555541D, 0x3fc555555555541D /* coeff3 */
+        .quad 0x3fdFFFFFFFFFFE5C, 0x3fdFFFFFFFFFFE5C, 0x3fdFFFFFFFFFFE5C, 0x3fdFFFFFFFFFFE5C /* coeff2 */
+        /*== Log2e ==*/
+        .align 32
+        .quad 0x40671547652B82FE, 0x40671547652B82FE, 0x40671547652B82FE, 0x40671547652B82FE
+        /*== L2H ==*/
+        .align 32
+        .quad 0x3f762e42fef80000, 0x3f762e42fef80000, 0x3f762e42fef80000, 0x3f762e42fef80000
+        /*== L2L ==*/
+        .align 32
+        .quad 0x3d41cf79abc9e3b4, 0x3d41cf79abc9e3b4, 0x3d41cf79abc9e3b4, 0x3d41cf79abc9e3b4
+        /*== ExpAddConst ==*/
+        .align 32
+        .quad 0x42f80000001ff800, 0x42f80000001ff800, 0x42f80000001ff800, 0x42f80000001ff800
+        /*== IndexMask ==*/
+        .align 32
+        .quad 0x00000000000007f0, 0x00000000000007f0, 0x00000000000007f0, 0x00000000000007f0
+        /*== ExpMask ==*/
+        .align 32
+        .quad 0x00000000003ff800, 0x00000000003ff800, 0x00000000003ff800, 0x00000000003ff800
+        /*== MOne ==*/
+        .align 32
+        .quad 0xbff0000000000000, 0xbff0000000000000, 0xbff0000000000000, 0xbff0000000000000
+        /*== AbsMask ==*/
+        .align 32
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
+        /*== Threshold ==*/
+        .align 32
+        .quad 0x40861DA04CBAFE43, 0x40861DA04CBAFE43, 0x40861DA04CBAFE43, 0x40861DA04CBAFE43
+        /*== L2 ==*/
+        .align 32
+        .quad 0x3f762e42fefa39ef, 0x3f762e42fefa39ef, 0x3f762e42fefa39ef, 0x3f762e42fefa39ef
+        .align 32
+        .type	__svml_dexpm1_data_internal,@object
+        .size	__svml_dexpm1_data_internal,.-__svml_dexpm1_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core-avx2.S
new file mode 100644
index 0000000000..3b75d1de16
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized expm1, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN8v_expm1 _ZGVeN8v_expm1_avx2_wrapper
+#include "../svml_d_expm18_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core.c
new file mode 100644
index 0000000000..860edf6df5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized expm1, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN8v_expm1
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN8v_expm1, __GI__ZGVeN8v_expm1, __redirect__ZGVeN8v_expm1)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core_avx512.S
new file mode 100644
index 0000000000..64cee91abd
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core_avx512.S
@@ -0,0 +1,334 @@
+/* Function expm1 vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *   After computing exp(x) in high-low parts, an accurate computation is performed to obtain exp(x)-1
+ *   Typical exp() implementation, except that:
+ *    - tables are small (16 elements), allowing for fast gathers
+ *    - all arguments processed in the main path
+ *        - final VSCALEF assists branch-free design (correct overflow/underflow and special case responses)
+ *        - a VAND is used to ensure the reduced argument |R|<2, even for large inputs
+ *        - RZ mode used to avoid overflow to +/-Inf for x*log2(e); helps with special case handling
+ *
+ *
+ */
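+
+/* For reference only, a rough scalar C sketch of the branch-free scheme
+   (schematic names; round4(), lowbits4(), poly() and scalef() standing in
+   for VSCALEF are illustrations, not the actual vector code):
+
+     double z   = round4 (x * log2 (e));          // keep 4 fractional bits
+     int    idx = lowbits4 (z);                   // 16-entry table index
+     double r   = (x - z * L2H) - z * L2L;        // |R| kept < 2 via EMask
+     double th  = scalef (Exp_tbl_H[idx], fmax (z, -128.0));  // Th ~ 2^z
+     double tlr = Exp_tbl_L[idx] + r + r * poly (r);
+     double y   = (th - 1.0) + th * tlr;          // then OR back sign of x
+ */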
+
+/* Offsets for data table __svml_dexpm1_data_internal_avx512
+ */
+#define Exp_tbl_H                     	0
+#define Exp_tbl_L                     	128
+#define L2E                           	256
+#define Shifter                       	320
+#define Threshold                     	384
+#define SgnMask                       	448
+#define L2H                           	512
+#define L2L                           	576
+#define ZThres                        	640
+#define EMask                         	704
+#define poly_coeff7                   	768
+#define poly_coeff6                   	832
+#define poly_coeff5                   	896
+#define poly_coeff4                   	960
+#define poly_coeff3                   	1024
+#define poly_coeff2                   	1088
+#define One                           	1152
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN8v_expm1_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovups   L2E+__svml_dexpm1_data_internal_avx512(%rip), %zmm6
+        vmovups   Shifter+__svml_dexpm1_data_internal_avx512(%rip), %zmm4
+        vmovups   L2H+__svml_dexpm1_data_internal_avx512(%rip), %zmm11
+        vmovups   L2L+__svml_dexpm1_data_internal_avx512(%rip), %zmm5
+        vmovups   Threshold+__svml_dexpm1_data_internal_avx512(%rip), %zmm3
+        vmovups   poly_coeff5+__svml_dexpm1_data_internal_avx512(%rip), %zmm13
+        vmovups   poly_coeff4+__svml_dexpm1_data_internal_avx512(%rip), %zmm15
+
+/* polynomial */
+        vmovups   poly_coeff7+__svml_dexpm1_data_internal_avx512(%rip), %zmm12
+
+/* set Z0=max(Z0, -128.0) */
+        vmovups   ZThres+__svml_dexpm1_data_internal_avx512(%rip), %zmm8
+        vmovups   poly_coeff3+__svml_dexpm1_data_internal_avx512(%rip), %zmm14
+        vmovups   __svml_dexpm1_data_internal_avx512(%rip), %zmm9
+        vmovaps   %zmm0, %zmm2
+
+/* 2^(52-4)*1.5 + x * log2(e) */
+        vfmadd213pd {rn-sae}, %zmm4, %zmm2, %zmm6
+        vmovups   Exp_tbl_L+__svml_dexpm1_data_internal_avx512(%rip), %zmm0
+        vcmppd    $21, {sae}, %zmm3, %zmm2, %k0
+
+/* Z0 ~ x*log2(e), rounded to 4 fractional bits */
+        vsubpd    {rn-sae}, %zmm4, %zmm6, %zmm7
+        vpermt2pd Exp_tbl_H+64+__svml_dexpm1_data_internal_avx512(%rip), %zmm6, %zmm9
+        vpermt2pd Exp_tbl_L+64+__svml_dexpm1_data_internal_avx512(%rip), %zmm6, %zmm0
+        vandpd    SgnMask+__svml_dexpm1_data_internal_avx512(%rip), %zmm2, %zmm1
+
+/* R = x - Z0*log(2) */
+        vfnmadd213pd {rn-sae}, %zmm2, %zmm7, %zmm11
+        vmaxpd    {sae}, %zmm8, %zmm7, %zmm10
+        vfnmadd231pd {rn-sae}, %zmm7, %zmm5, %zmm11
+        kmovw     %k0, %edx
+
+/* ensure |R|<2 even for special cases */
+        vandpd    EMask+__svml_dexpm1_data_internal_avx512(%rip), %zmm11, %zmm3
+        vmovups   poly_coeff6+__svml_dexpm1_data_internal_avx512(%rip), %zmm11
+
+/* scale Th */
+        vscalefpd {rn-sae}, %zmm10, %zmm9, %zmm4
+        vfmadd231pd {rn-sae}, %zmm3, %zmm13, %zmm15
+        vfmadd231pd {rn-sae}, %zmm3, %zmm12, %zmm11
+        vmovups   poly_coeff2+__svml_dexpm1_data_internal_avx512(%rip), %zmm12
+        vmulpd    {rn-sae}, %zmm3, %zmm3, %zmm13
+        vfmadd231pd {rn-sae}, %zmm3, %zmm14, %zmm12
+        vfmadd213pd {rn-sae}, %zmm15, %zmm13, %zmm11
+        vfmadd213pd {rn-sae}, %zmm12, %zmm13, %zmm11
+
+/* Tlr + R+ R*Poly */
+        vfmadd213pd {rn-sae}, %zmm0, %zmm13, %zmm11
+
+/* Th - 1 */
+        vmovups   One+__svml_dexpm1_data_internal_avx512(%rip), %zmm0
+        vaddpd    {rn-sae}, %zmm3, %zmm11, %zmm14
+        vsubpd    {rn-sae}, %zmm0, %zmm4, %zmm15
+
+/* (Th-1)+Th*(Tlr + R+ R*Poly) */
+        vfmadd213pd {rn-sae}, %zmm15, %zmm14, %zmm4
+        vorpd     %zmm1, %zmm4, %zmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm2
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm2, 64(%rsp)
+        vmovups   %zmm0, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     64(%rsp,%r14,8), %xmm0
+        call      expm1@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 128(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN8v_expm1_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_dexpm1_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 Exp_tbl_H[16][2];
+        __declspec(align(64)) VUINT32 Exp_tbl_L[16][2];
+        __declspec(align(64)) VUINT32 L2E[8][2];
+        __declspec(align(64)) VUINT32 Shifter[8][2];
+        __declspec(align(64)) VUINT32 Threshold[8][2];
+        __declspec(align(64)) VUINT32 SgnMask[8][2];
+        __declspec(align(64)) VUINT32 L2H[8][2];
+        __declspec(align(64)) VUINT32 L2L[8][2];
+        __declspec(align(64)) VUINT32 ZThres[8][2];
+        __declspec(align(64)) VUINT32 EMask[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff7[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
+        __declspec(align(64)) VUINT32 One[8][2];
+    } __svml_dexpm1_data_internal_avx512;
+#endif
+__svml_dexpm1_data_internal_avx512:
+        /*== Exp_tbl_H ==*/
+        .quad 0x3ff0000000000000
+        .quad 0x3ff0b5586cf9890f
+        .quad 0x3ff172b83c7d517b
+        .quad 0x3ff2387a6e756238
+        .quad 0x3ff306fe0a31b715
+        .quad 0x3ff3dea64c123422
+        .quad 0x3ff4bfdad5362a27
+        .quad 0x3ff5ab07dd485429
+        .quad 0x3ff6a09e667f3bcd
+        .quad 0x3ff7a11473eb0187
+        .quad 0x3ff8ace5422aa0db
+        .quad 0x3ff9c49182a3f090
+        .quad 0x3ffae89f995ad3ad
+        .quad 0x3ffc199bdd85529c
+        .quad 0x3ffd5818dcfba487
+        .quad 0x3ffea4afa2a490da
+        /*== Exp_tbl_L ==*/
+        .align 64
+        .quad 0x0000000000000000
+        .quad 0x3c979aa65d837b6d
+        .quad 0xbc801b15eaa59348
+        .quad 0x3c968efde3a8a894
+        .quad 0x3c834d754db0abb6
+        .quad 0x3c859f48a72a4c6d
+        .quad 0x3c7690cebb7aafb0
+        .quad 0x3c9063e1e21c5409
+        .quad 0xbc93b3efbf5e2228
+        .quad 0xbc7b32dcb94da51d
+        .quad 0x3c8db72fc1f0eab4
+        .quad 0x3c71affc2b91ce27
+        .quad 0x3c8c1a7792cb3387
+        .quad 0x3c736eae30af0cb3
+        .quad 0x3c74a385a63d07a7
+        .quad 0xbc8ff7128fd391f0
+        /*== log2(e) ==*/
+        .align 64
+        .quad 0x3ff71547652B82FE, 0x3ff71547652B82FE, 0x3ff71547652B82FE, 0x3ff71547652B82FE, 0x3ff71547652B82FE, 0x3ff71547652B82FE, 0x3ff71547652B82FE, 0x3ff71547652B82FE
+        /*== Shifter=2^(52-4)*1.5 ==*/
+        .align 64
+        .quad 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0
+        /*== Threshold ==*/
+        .align 64
+        .quad 0x40861DA04CBAFE44, 0x40861DA04CBAFE44, 0x40861DA04CBAFE44, 0x40861DA04CBAFE44, 0x40861DA04CBAFE44, 0x40861DA04CBAFE44, 0x40861DA04CBAFE44, 0x40861DA04CBAFE44
+        /*== Sgn ==*/
+        .align 64
+        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000
+        /*== L2H = log(2)_high ==*/
+        .align 64
+        .quad 0x3fe62e42fefa39ef, 0x3fe62e42fefa39ef, 0x3fe62e42fefa39ef, 0x3fe62e42fefa39ef, 0x3fe62e42fefa39ef, 0x3fe62e42fefa39ef, 0x3fe62e42fefa39ef, 0x3fe62e42fefa39ef
+        /*== L2L = log(2)_low ==*/
+        .align 64
+        .quad 0x3c7abc9e3b39803f, 0x3c7abc9e3b39803f, 0x3c7abc9e3b39803f, 0x3c7abc9e3b39803f, 0x3c7abc9e3b39803f, 0x3c7abc9e3b39803f, 0x3c7abc9e3b39803f, 0x3c7abc9e3b39803f
+        /*== ZThres ==*/
+        .align 64
+        .quad 0xc060000000000000, 0xc060000000000000, 0xc060000000000000, 0xc060000000000000, 0xc060000000000000, 0xc060000000000000, 0xc060000000000000, 0xc060000000000000
+        /*== EMask ==*/
+        .align 64
+        .quad 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff
+        /*== poly_coeff7 ==*/
+        .align 64
+        .quad 0x3f2a020410303d8a, 0x3f2a020410303d8a, 0x3f2a020410303d8a, 0x3f2a020410303d8a, 0x3f2a020410303d8a, 0x3f2a020410303d8a, 0x3f2a020410303d8a, 0x3f2a020410303d8a
+        /*== poly_coeff6 ==*/
+        .align 64
+        .quad 0x3f56c1c38e164a2f, 0x3f56c1c38e164a2f, 0x3f56c1c38e164a2f, 0x3f56c1c38e164a2f, 0x3f56c1c38e164a2f, 0x3f56c1c38e164a2f, 0x3f56c1c38e164a2f, 0x3f56c1c38e164a2f
+        /*== poly_coeff5 ==*/
+        .align 64
+        .quad 0x3f81111110865214, 0x3f81111110865214, 0x3f81111110865214, 0x3f81111110865214, 0x3f81111110865214, 0x3f81111110865214, 0x3f81111110865214, 0x3f81111110865214
+        /*== poly_coeff4 ==*/
+        .align 64
+        .quad 0x3fa5555554ad3d06, 0x3fa5555554ad3d06, 0x3fa5555554ad3d06, 0x3fa5555554ad3d06, 0x3fa5555554ad3d06, 0x3fa5555554ad3d06, 0x3fa5555554ad3d06, 0x3fa5555554ad3d06
+        /*== poly_coeff3 ==*/
+        .align 64
+        .quad 0x3fc5555555555656, 0x3fc5555555555656, 0x3fc5555555555656, 0x3fc5555555555656, 0x3fc5555555555656, 0x3fc5555555555656, 0x3fc5555555555656, 0x3fc5555555555656
+        /*== poly_coeff2 ==*/
+        .align 64
+        .quad 0x3fe00000000000a2, 0x3fe00000000000a2, 0x3fe00000000000a2, 0x3fe00000000000a2, 0x3fe00000000000a2, 0x3fe00000000000a2, 0x3fe00000000000a2, 0x3fe00000000000a2
+        /*== One ==*/
+        .align 64
+        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
+        .align 64
+        .type	__svml_dexpm1_data_internal_avx512,@object
+        .size	__svml_dexpm1_data_internal_avx512,.-__svml_dexpm1_data_internal_avx512
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core-avx2.S
new file mode 100644
index 0000000000..a2a8699a05
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized expm1f.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN16v_expm1f _ZGVeN16v_expm1f_avx2_wrapper
+#include "../svml_s_expm1f16_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core.c
new file mode 100644
index 0000000000..8007d1e415
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized expm1f, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN16v_expm1f
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN16v_expm1f, __GI__ZGVeN16v_expm1f,
+	       __redirect__ZGVeN16v_expm1f)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core_avx512.S
new file mode 100644
index 0000000000..5b0dcde77f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core_avx512.S
@@ -0,0 +1,281 @@
+/* Function expm1f vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *   After computing exp(x) in high-low parts, an accurate computation is performed to obtain exp(x)-1
+ *   Typical exp() implementation, except that:
+ *    - tables are small (32 elements), allowing for fast gathers
+ *    - all arguments processed in the main path
+ *        - final VSCALEF assists branch-free design (correct overflow/underflow and special case responses)
+ *        - a VAND is used to ensure the reduced argument |R|<2, even for large inputs
+ *        - RZ mode used to avoid overflow to +/-Inf for x*log2(e); helps with special case handling
+ *
+ *
+ */
+
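For readers following the assembly below, a minimal scalar sketch of the main path just described may help.  It is illustrative only: the helper names are invented, the low-order table correction (Tlr) is folded into exp2f, and the shifter-constant rounding trick is replaced by an explicit lrintf.

  /* Scalar sketch of the expm1f main path (assumption: the real kernel adds
     the 2^(23-5)*1.5 shifter and reads a precomputed 32-entry 2^(j/32)
     table, then scales Th with VSCALEFPS; special cases are omitted).  */
  #include <math.h>
  #include <stdio.h>

  static float
  expm1f_sketch (float x)
  {
    const float log2e = 0x1.715476p+0f;   /* log2(e) */
    const float l2h = 0x1.62e430p-1f;     /* log(2), high part */
    const float l2l = -0x1.05c610p-29f;   /* log(2), low part */

    int m = (int) lrintf (x * log2e * 32.0f);   /* x*log2(e) in 1/32 steps */
    int j = m & 31;                             /* table index */
    int n = (m - j) / 32;                       /* binary exponent */
    float z0 = m * 0x1p-5f;                     /* Z0 ~ x*log2(e), rounded */

    float r = (x - z0 * l2h) - z0 * l2l;        /* R = x - Z0*log(2) */
    float th = ldexpf (exp2f (j * 0x1p-5f), n); /* scaled Th ~ 2^Z0 */

    /* exp(R)-1 ~ R + R^2/2 + R^3/6; expm1(x) = (Th-1) + Th*(exp(R)-1).  */
    float p = r + r * r * (0.5f + r * 0x1.555556p-3f);
    return (th - 1.0f) + th * p;
  }

  int
  main (void)
  {
    printf ("%a %a\n", expm1f_sketch (0.5f), expm1f (0.5f));
    return 0;
  }

Compiled with -lm, the two printed values should agree closely for moderate inputs; the vector kernel additionally keeps the low-part table entry (Tlr) and uses fused multiply-adds to control rounding error.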
+/* Offsets for data table __svml_sexpm1_data_internal_avx512
+ */
+#define Exp_tbl_H                     	0
+#define Exp_tbl_L                     	128
+#define L2E                           	256
+#define Shifter                       	320
+#define Threshold                     	384
+#define SgnMask                       	448
+#define L2H                           	512
+#define L2L                           	576
+#define EMask                         	640
+#define poly_coeff3                   	704
+#define poly_coeff2                   	768
+#define One                           	832
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN16v_expm1f_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovups   L2E+__svml_sexpm1_data_internal_avx512(%rip), %zmm5
+        vmovups   Shifter+__svml_sexpm1_data_internal_avx512(%rip), %zmm3
+        vmovups   L2H+__svml_sexpm1_data_internal_avx512(%rip), %zmm8
+        vmovups   L2L+__svml_sexpm1_data_internal_avx512(%rip), %zmm4
+        vmovups   __svml_sexpm1_data_internal_avx512(%rip), %zmm6
+
+/* polynomial */
+        vmovups   poly_coeff3+__svml_sexpm1_data_internal_avx512(%rip), %zmm9
+        vmovups   poly_coeff2+__svml_sexpm1_data_internal_avx512(%rip), %zmm12
+        vmovups   Exp_tbl_L+__svml_sexpm1_data_internal_avx512(%rip), %zmm11
+        vmovups   Threshold+__svml_sexpm1_data_internal_avx512(%rip), %zmm2
+
+/* Th - 1 */
+        vmovups   One+__svml_sexpm1_data_internal_avx512(%rip), %zmm14
+        vmovaps   %zmm0, %zmm1
+
+/* 2^(23-5)*1.5 + x * log2(e) */
+        vfmadd213ps {rn-sae}, %zmm3, %zmm1, %zmm5
+        vcmpps    $29, {sae}, %zmm2, %zmm1, %k0
+
+/* Z0 ~ x*log2(e), rounded to 5 fractional bits */
+        vsubps    {rn-sae}, %zmm3, %zmm5, %zmm7
+        vpermt2ps Exp_tbl_H+64+__svml_sexpm1_data_internal_avx512(%rip), %zmm5, %zmm6
+        vpermt2ps Exp_tbl_L+64+__svml_sexpm1_data_internal_avx512(%rip), %zmm5, %zmm11
+        vandps    SgnMask+__svml_sexpm1_data_internal_avx512(%rip), %zmm1, %zmm0
+
+/* R = x - Z0*log(2) */
+        vfnmadd213ps {rn-sae}, %zmm1, %zmm7, %zmm8
+
+/* scale Th */
+        vscalefps {rn-sae}, %zmm7, %zmm6, %zmm2
+        vfnmadd231ps {rn-sae}, %zmm7, %zmm4, %zmm8
+        kmovw     %k0, %edx
+
+/* ensure |R|<2 even for special cases */
+        vandps    EMask+__svml_sexpm1_data_internal_avx512(%rip), %zmm8, %zmm13
+        vsubps    {rn-sae}, %zmm14, %zmm2, %zmm8
+        vmulps    {rn-sae}, %zmm13, %zmm13, %zmm10
+        vfmadd231ps {rn-sae}, %zmm13, %zmm9, %zmm12
+
+/* Tlr + R + R^2*Poly */
+        vfmadd213ps {rn-sae}, %zmm11, %zmm10, %zmm12
+        vaddps    {rn-sae}, %zmm13, %zmm12, %zmm15
+
+/* (Th-1)+Th*(Tlr + R + R^2*Poly) */
+        vfmadd213ps {rn-sae}, %zmm8, %zmm15, %zmm2
+        vorps     %zmm0, %zmm2, %zmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm1, 64(%rsp)
+        vmovups   %zmm0, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $16, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     64(%rsp,%r14,4), %xmm0
+        call      expm1f@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 128(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN16v_expm1f_skx)
+
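The SPECIAL_VALUES_BRANCH / RANGEMASK_CHECK / SPECIAL_VALUES_LOOP / SCALAR_MATH_CALL sequence above follows the same pattern in every kernel of this series: spill the inputs and the vector result to the stack, then walk the range mask bit by bit and recompute the flagged lanes with the scalar libm routine.  A rough C rendering (names invented; 16 lanes for this AVX-512 float kernel):

  #include <math.h>

  /* Sketch of the special-inputs fallback: MASK has one bit per lane, set
     when that lane's input is outside the main path's supported range.  */
  static void
  special_values_fallback (const float *in, float *out, unsigned int mask,
                           int vlen)
  {
    for (int lane = 0; lane < vlen; lane++)
      if (mask & (1u << lane))
        out[lane] = expm1f (in[lane]);   /* the 'call expm1f@PLT' above */
  }

Keeping this path out of line leaves only the single testl/jne on the hot path.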
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_sexpm1_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 Exp_tbl_H[32][1];
+        __declspec(align(64)) VUINT32 Exp_tbl_L[32][1];
+        __declspec(align(64)) VUINT32 L2E[16][1];
+        __declspec(align(64)) VUINT32 Shifter[16][1];
+        __declspec(align(64)) VUINT32 Threshold[16][1];
+        __declspec(align(64)) VUINT32 SgnMask[16][1];
+        __declspec(align(64)) VUINT32 L2H[16][1];
+        __declspec(align(64)) VUINT32 L2L[16][1];
+        __declspec(align(64)) VUINT32 EMask[16][1];
+        __declspec(align(64)) VUINT32 poly_coeff3[16][1];
+        __declspec(align(64)) VUINT32 poly_coeff2[16][1];
+        __declspec(align(64)) VUINT32 One[16][1];
+    } __svml_sexpm1_data_internal_avx512;
+#endif
+__svml_sexpm1_data_internal_avx512:
+        /*== Exp_tbl_H ==*/
+        .long 0x3f800000, 0x3f82cd87, 0x3f85aac3, 0x3f88980f
+        .long 0x3f8b95c2, 0x3f8ea43a, 0x3f91c3d3, 0x3f94f4f0
+        .long 0x3f9837f0, 0x3f9b8d3a, 0x3f9ef532, 0x3fa27043
+        .long 0x3fa5fed7, 0x3fa9a15b, 0x3fad583f, 0x3fb123f6
+        .long 0x3fb504f3, 0x3fb8fbaf, 0x3fbd08a4, 0x3fc12c4d
+        .long 0x3fc5672a, 0x3fc9b9be, 0x3fce248c, 0x3fd2a81e
+        .long 0x3fd744fd, 0x3fdbfbb8, 0x3fe0ccdf, 0x3fe5b907
+        .long 0x3feac0c7, 0x3fefe4ba, 0x3ff5257d, 0x3ffa83b3
+        /*== Exp_tbl_L ==*/
+        .align 64
+        .long 0x00000000, 0xb34a3a0a, 0x3346cb6a, 0xb36ed17e
+        .long 0xb24e0611, 0xb3517dd9, 0x334b2482, 0xb31586de
+        .long 0x33092801, 0xb2e6f467, 0x331b85f2, 0x3099b6f1
+        .long 0xb3051aa8, 0xb2e2a0da, 0xb2006c56, 0xb3365942
+        .long 0x329302ae, 0x32c595dc, 0xb302e5a2, 0xb28e10a1
+        .long 0x31b3d0e5, 0xb31a472b, 0x31d1daf2, 0xb305bf64
+        .long 0xb27ce182, 0xb2f26443, 0xb1b4b0da, 0xb1da8a8f
+        .long 0xb1d290be, 0xb2d5b899, 0x31b0a147, 0xb2156afc
+        /*== log2(e) ==*/
+        .align 64
+        .long 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B
+        /*== Shifter=2^(23-5)*1.5 ==*/
+        .align 64
+        .long 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000
+        /*== Threshold ==*/
+        .align 64
+        .long 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B
+        /*== Sgn ==*/
+        .align 64
+        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000
+        /*== L2H = log(2)_high ==*/
+        .align 64
+        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
+        /*== L2L = log(2)_low ==*/
+        .align 64
+        .long 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308
+        /*== EMask ==*/
+        .align 64
+        .long 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff
+        /*== poly_coeff3 ==*/
+        .align 64
+        .long 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3
+        /*== poly_coeff2 ==*/
+        .align 64
+        .long 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6
+        /*== One ==*/
+        .align 64
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        .align 64
+        .type	__svml_sexpm1_data_internal_avx512,@object
+        .size	__svml_sexpm1_data_internal_avx512,.-__svml_sexpm1_data_internal_avx512
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core-sse2.S
new file mode 100644
index 0000000000..b4dbb77590
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized expm1f, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN4v_expm1f _ZGVbN4v_expm1f_sse2
+#include "../svml_s_expm1f4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core.c
new file mode 100644
index 0000000000..f8ef12511d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized expm1f, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN4v_expm1f
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN4v_expm1f, __GI__ZGVbN4v_expm1f,
+	       __redirect__ZGVbN4v_expm1f)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core_sse4.S
new file mode 100644
index 0000000000..18770f6dbb
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core_sse4.S
@@ -0,0 +1,358 @@
+/* Function expm1f vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    N = (int)(x*2^k/log(2.0)), R = x - N*log(2)/2^k
+ *    exp(x) = 2^(N/2^k) * poly(R) is computed in high-low parts
+ *    expm1(x) = exp(x)-1 is then obtained via multi-precision computation
+ *
+ *
+ */
+
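Apart from the table granularity this is the same structure as the AVX-512 variant earlier in this patch.  A compact scalar rendering of the reduction formula above (k = 6 is inferred from Log2e = 64*log2(e) and the 64-entry Expm1_HA_table; names are illustrative, and the high/low splits of log(2) and of the table are collapsed into single constants and exp2f):

  #include <math.h>

  /* Sketch of the classical 2^k reduction:
     expm1(x) = T*exp(R) - 1 = (T-1) + T*(exp(R)-1), with T = 2^(N/2^k).  */
  static float
  expm1f_reduction_sketch (float x)
  {
    const int k = 6;                   /* assumed from the data layout */
    const float ln2 = 0x1.62e43p-1f;

    int n = (int) lrintf (x * (1 << k) / ln2); /* N = (int)(x*2^k/log(2.0)) */
    float r = x - n * ln2 / (1 << k);          /* R = x - N*log(2)/2^k */
    float t = exp2f ((float) n / (1 << k));    /* a table lookup plus scaling
                                                  in the real kernel */
    float p = r + r * r * (0.5f + r * 0x1.555556p-3f); /* poly(R) ~ exp(R)-1 */
    return (t - 1.0f) + t * p;
  }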
+/* Offsets for data table __svml_sexpm1_data_internal
+ */
+#define Expm1_HA_table                	0
+#define poly_coeff                    	512
+#define Log2e                         	576
+#define L2H                           	592
+#define L2L                           	608
+#define ExpAddConst                   	624
+#define IndexMask                     	640
+#define ExpMask                       	656
+#define MOne                          	672
+#define AbsMask                       	688
+#define Threshold                     	704
+#define L2                            	720
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN4v_expm1f_sse4)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $64, %rsp
+        movaps    %xmm0, %xmm4
+        movups    Log2e+__svml_sexpm1_data_internal(%rip), %xmm9
+        lea       __svml_sexpm1_data_internal(%rip), %r8
+        mulps     %xmm0, %xmm9
+        movups    .FLT_10(%rip), %xmm5
+        movups    ExpAddConst+__svml_sexpm1_data_internal(%rip), %xmm2
+        addps     %xmm5, %xmm9
+
+/* argument reduction */
+        movups    L2H+__svml_sexpm1_data_internal(%rip), %xmm6
+        subps     %xmm5, %xmm9
+        mulps     %xmm9, %xmm6
+        addps     %xmm9, %xmm2
+
+/* table lookup */
+        movdqu    IndexMask+__svml_sexpm1_data_internal(%rip), %xmm12
+        subps     %xmm6, %xmm4
+        pand      %xmm2, %xmm12
+        movups    L2L+__svml_sexpm1_data_internal(%rip), %xmm7
+        movups    AbsMask+__svml_sexpm1_data_internal(%rip), %xmm3
+        pshufd    $1, %xmm12, %xmm10
+        movaps    %xmm3, %xmm8
+        mulps     %xmm9, %xmm7
+        andps     %xmm0, %xmm8
+        cmpnleps  Threshold+__svml_sexpm1_data_internal(%rip), %xmm8
+        movd      %xmm12, %edx
+        subps     %xmm7, %xmm4
+        movd      %xmm10, %ecx
+        movmskps  %xmm8, %eax
+        pshufd    $2, %xmm12, %xmm11
+        movaps    %xmm4, %xmm7
+        pshufd    $3, %xmm12, %xmm13
+        andnps    %xmm0, %xmm3
+        movd      %xmm11, %esi
+        movd      %xmm13, %edi
+
+/* polynomial */
+        movups    poly_coeff+__svml_sexpm1_data_internal(%rip), %xmm8
+        movdqu    ExpMask+__svml_sexpm1_data_internal(%rip), %xmm6
+        movslq    %edx, %rdx
+        pand      %xmm6, %xmm2
+        movslq    %ecx, %rcx
+        pslld     $14, %xmm2
+        movslq    %esi, %rsi
+        movslq    %edi, %rdi
+        movq      (%r8,%rdx), %xmm1
+        movq      (%r8,%rcx), %xmm14
+        movq      (%r8,%rsi), %xmm5
+        movq      (%r8,%rdi), %xmm15
+        unpcklps  %xmm14, %xmm1
+        mulps     %xmm4, %xmm8
+        movaps    %xmm1, %xmm10
+        mulps     %xmm4, %xmm7
+        addps     poly_coeff+16+__svml_sexpm1_data_internal(%rip), %xmm8
+        unpcklps  %xmm15, %xmm5
+        movlhps   %xmm5, %xmm10
+        shufps    $238, %xmm5, %xmm1
+        orps      %xmm2, %xmm10
+
+/* T-1 */
+        movups    MOne+__svml_sexpm1_data_internal(%rip), %xmm9
+        mulps     %xmm2, %xmm1
+        addps     %xmm9, %xmm10
+        mulps     %xmm7, %xmm8
+        addps     %xmm1, %xmm10
+        addps     %xmm8, %xmm4
+        movaps    %xmm10, %xmm1
+        subps     %xmm9, %xmm1
+        mulps     %xmm1, %xmm4
+        addps     %xmm4, %xmm10
+        orps      %xmm3, %xmm10
+        testl     %eax, %eax
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 eax xmm0 xmm10
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movaps    %xmm10, %xmm0
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm0, 32(%rsp)
+        movups    %xmm10, 48(%rsp)
+                                # LOE rbx r12 r13 r14 r15 eax
+
+        xorl      %edx, %edx
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm10
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 xmm10
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      expm1f@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 48(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVbN4v_expm1f_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_sexpm1_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(16)) VUINT32 Expm1_HA_table[(1<<7)][1];
+        __declspec(align(16)) VUINT32 poly_coeff[4][4][1];
+        __declspec(align(16)) VUINT32 Log2e[4][1];
+        __declspec(align(16)) VUINT32 L2H[4][1];
+        __declspec(align(16)) VUINT32 L2L[4][1];
+        __declspec(align(16)) VUINT32 ExpAddConst[4][1];
+        __declspec(align(16)) VUINT32 IndexMask[4][1];
+        __declspec(align(16)) VUINT32 ExpMask[4][1];
+        __declspec(align(16)) VUINT32 MOne[4][1];
+        __declspec(align(16)) VUINT32 AbsMask[4][1];
+        __declspec(align(16)) VUINT32 Threshold[4][1];
+        __declspec(align(16)) VUINT32 L2[4][1];
+} __svml_sexpm1_data_internal;
+#endif
+__svml_sexpm1_data_internal:
+        /* Expm1_HA_table */
+        .long 0x00000000, 0x00000000
+        .long 0x00016000, 0x391a3e78
+        .long 0x0002d000, 0xb89e59d5
+        .long 0x00044000, 0xb93ae78a
+        .long 0x0005b000, 0xb9279306
+        .long 0x00072000, 0xb79e6961
+        .long 0x0008a000, 0xb97e2fee
+        .long 0x000a1000, 0x391aaea9
+        .long 0x000b9000, 0x39383c7d
+        .long 0x000d2000, 0xb9241490
+        .long 0x000ea000, 0x39073169
+        .long 0x00103000, 0x386e218a
+        .long 0x0011c000, 0x38f4dceb
+        .long 0x00136000, 0xb93a9a1e
+        .long 0x0014f000, 0x391df520
+        .long 0x00169000, 0x3905a6e4
+        .long 0x00183000, 0x397e0a32
+        .long 0x0019e000, 0x370b2641
+        .long 0x001b9000, 0xb8b1918b
+        .long 0x001d4000, 0xb8132c6a
+        .long 0x001ef000, 0x39264c12
+        .long 0x0020b000, 0x37221f73
+        .long 0x00227000, 0x37060619
+        .long 0x00243000, 0x3922b5c1
+        .long 0x00260000, 0xb814ab27
+        .long 0x0027d000, 0xb89b12c6
+        .long 0x0029a000, 0x382d5a75
+        .long 0x002b8000, 0xb938c94b
+        .long 0x002d6000, 0xb97822b8
+        .long 0x002f4000, 0xb910ea53
+        .long 0x00312000, 0x38fd6075
+        .long 0x00331000, 0x38620955
+        .long 0x00350000, 0x391e667f
+        .long 0x00370000, 0xb89b8736
+        .long 0x00390000, 0xb90a1714
+        .long 0x003b0000, 0xb7a54ded
+        .long 0x003d1000, 0xb96b8c15
+        .long 0x003f1000, 0x397336cf
+        .long 0x00413000, 0xb8eccd66
+        .long 0x00434000, 0x39599b45
+        .long 0x00456000, 0x3965422b
+        .long 0x00479000, 0xb8a2cdd5
+        .long 0x0049c000, 0xb9484f32
+        .long 0x004bf000, 0xb8fac043
+        .long 0x004e2000, 0x391182a4
+        .long 0x00506000, 0x38ccf6bc
+        .long 0x0052b000, 0xb97c4dc2
+        .long 0x0054f000, 0x38d6aaf4
+        .long 0x00574000, 0x391f995b
+        .long 0x0059a000, 0xb8ba8f62
+        .long 0x005c0000, 0xb9090d05
+        .long 0x005e6000, 0x37f4825e
+        .long 0x0060d000, 0xb8c844f5
+        .long 0x00634000, 0xb76d1a83
+        .long 0x0065c000, 0xb95f2310
+        .long 0x00684000, 0xb952b5f8
+        .long 0x006ac000, 0x37c6e7dd
+        .long 0x006d5000, 0xb7cfe126
+        .long 0x006fe000, 0x3917337c
+        .long 0x00728000, 0x383b9e2d
+        .long 0x00752000, 0x392fa2a5
+        .long 0x0077d000, 0x37df730b
+        .long 0x007a8000, 0x38ecb6dd
+        .long 0x007d4000, 0xb879f986
+        /*== poly_coeff[4] ==*/
+        .align 16
+        .long 0x3e2AAABF, 0x3e2AAABF, 0x3e2AAABF, 0x3e2AAABF /* coeff3 */
+        .long 0x3f00000F, 0x3f00000F, 0x3f00000F, 0x3f00000F /* coeff2 */
+        /* 32 Byte Padding */
+        .zero 32
+        /*== Log2e ==*/
+        .align 16
+        .long 0x42B8AA3B, 0x42B8AA3B, 0x42B8AA3B, 0x42B8AA3B
+        /*== L2H ==*/
+        .align 16
+        .long 0x3c318000, 0x3c318000, 0x3c318000, 0x3c318000
+        /*== L2L ==*/
+        .align 16
+        .long 0xb65e8083, 0xb65e8083, 0xb65e8083, 0xb65e8083
+        /*== ExpAddConst ==*/
+        .align 16
+        .long 0x49f0fe00, 0x49f0fe00, 0x49f0fe00, 0x49f0fe00
+        /*== IndexMask ==*/
+        .align 16
+        .long 0x000001f8, 0x000001f8, 0x000001f8, 0x000001f8
+        /*== ExpMask ==*/
+        .align 16
+        .long 0x0001fe00, 0x0001fe00, 0x0001fe00, 0x0001fe00
+        /*== MOne ==*/
+        .align 16
+        .long 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000
+        /*== AbsMask ==*/
+        .align 16
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
+        /*== Threshold ==*/
+        .align 16
+        .long 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B // 86.643394
+        /*== L2 ==*/
+        .align 16
+        .long 0x3cb17218, 0x3cb17218, 0x3cb17218, 0x3cb17218
+        .align 16
+        .type	__svml_sexpm1_data_internal,@object
+        .size	__svml_sexpm1_data_internal,.-__svml_sexpm1_data_internal
+        .align 16
+
+.FLT_10:
+        .long	0x4b400000,0x4b400000,0x4b400000,0x4b400000
+        .type	.FLT_10,@object
+        .size	.FLT_10,16
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core-sse.S
new file mode 100644
index 0000000000..e34e4eb8d0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized expm1f, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN8v_expm1f _ZGVdN8v_expm1f_sse_wrapper
+#include "../svml_s_expm1f8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core.c
new file mode 100644
index 0000000000..7e8b57de30
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized expm1f, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN8v_expm1f
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN8v_expm1f, __GI__ZGVdN8v_expm1f,
+	       __redirect__ZGVdN8v_expm1f)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core_avx2.S
new file mode 100644
index 0000000000..8e65d692d6
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core_avx2.S
@@ -0,0 +1,351 @@
+/* Function expm1f vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    N = (int)(x*2^k/log(2.0)), R = x - N*log(2)/2^k
+ *    exp(x) = 2^(N/2^k) * poly(R) is computed in high-low parts
+ *    expm1(x) = exp(x)-1 is then obtained via multi-precision computation
+ *
+ *
+ */
+
+/* Offsets for data table __svml_sexpm1_data_internal
+ */
+#define Expm1_HA_table                	0
+#define poly_coeff                    	512
+#define Log2e                         	640
+#define L2H                           	672
+#define L2L                           	704
+#define ExpAddConst                   	736
+#define IndexMask                     	768
+#define ExpMask                       	800
+#define MOne                          	832
+#define AbsMask                       	864
+#define Threshold                     	896
+#define L2                            	928
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN8v_expm1f_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+        lea       __svml_sexpm1_data_internal(%rip), %rax
+        vmovaps   %ymm0, %ymm3
+        vmulps    Log2e+__svml_sexpm1_data_internal(%rip), %ymm3, %ymm4
+
+/* argument reduction */
+        vmovups   L2H+__svml_sexpm1_data_internal(%rip), %ymm2
+        vmovups   AbsMask+__svml_sexpm1_data_internal(%rip), %ymm5
+        vroundps  $0, %ymm4, %ymm8
+        vaddps    ExpAddConst+__svml_sexpm1_data_internal(%rip), %ymm8, %ymm0
+        vfnmadd213ps %ymm3, %ymm8, %ymm2
+
+/* table lookup */
+        vandps    IndexMask+__svml_sexpm1_data_internal(%rip), %ymm0, %ymm9
+        vandps    %ymm5, %ymm3, %ymm6
+        vcmpnle_uqps Threshold+__svml_sexpm1_data_internal(%rip), %ymm6, %ymm7
+        vfnmadd231ps L2L+__svml_sexpm1_data_internal(%rip), %ymm8, %ymm2
+        vandps    ExpMask+__svml_sexpm1_data_internal(%rip), %ymm0, %ymm0
+        vandnps   %ymm3, %ymm5, %ymm1
+        vpslld    $14, %ymm0, %ymm0
+        vmovmskps %ymm7, %edx
+        vmovd     %xmm9, %ecx
+        vextractf128 $1, %ymm9, %xmm10
+        movslq    %ecx, %rcx
+        vmovd     %xmm10, %r9d
+        vpextrd   $1, %xmm9, %esi
+        vpextrd   $2, %xmm9, %edi
+        vpextrd   $3, %xmm9, %r8d
+        vmovq     (%rax,%rcx), %xmm11
+        vpextrd   $1, %xmm10, %r10d
+        vpextrd   $2, %xmm10, %r11d
+        vpextrd   $3, %xmm10, %ecx
+        movslq    %esi, %rsi
+        movslq    %edi, %rdi
+        movslq    %r8d, %r8
+        movslq    %r9d, %r9
+        movslq    %r10d, %r10
+        movslq    %r11d, %r11
+        movslq    %ecx, %rcx
+        vmovq     (%rax,%rsi), %xmm13
+        vmovq     (%rax,%rdi), %xmm12
+        vmovq     (%rax,%r8), %xmm14
+        vmovq     (%rax,%r9), %xmm15
+        vmovq     (%rax,%r10), %xmm5
+        vmovq     (%rax,%r11), %xmm4
+        vmovq     (%rax,%rcx), %xmm6
+        vunpcklps %xmm12, %xmm11, %xmm7
+        vunpcklps %xmm14, %xmm13, %xmm8
+        vunpcklps %xmm4, %xmm15, %xmm15
+        vunpcklps %xmm6, %xmm5, %xmm9
+        vmulps    %ymm2, %ymm2, %ymm13
+        vinsertf128 $1, %xmm15, %ymm7, %ymm10
+        vinsertf128 $1, %xmm9, %ymm8, %ymm11
+        vunpcklps %ymm11, %ymm10, %ymm12
+        vorps     %ymm0, %ymm12, %ymm14
+
+/* polynomial */
+        vmovups   poly_coeff+__svml_sexpm1_data_internal(%rip), %ymm12
+        vfmadd213ps poly_coeff+32+__svml_sexpm1_data_internal(%rip), %ymm2, %ymm12
+        vfmadd213ps %ymm2, %ymm13, %ymm12
+
+/* T-1 */
+        vmovups   MOne+__svml_sexpm1_data_internal(%rip), %ymm13
+        vaddps    %ymm13, %ymm14, %ymm2
+        vunpckhps %ymm11, %ymm10, %ymm4
+        vfmadd213ps %ymm2, %ymm0, %ymm4
+        vsubps    %ymm13, %ymm4, %ymm0
+        vfmadd213ps %ymm4, %ymm12, %ymm0
+        vorps     %ymm1, %ymm0, %ymm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm3
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %ymm3, 32(%rsp)
+        vmovups   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      expm1f@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 64(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN8v_expm1f_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_sexpm1_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(32)) VUINT32 Expm1_HA_table[(1<<7)][1];
+        __declspec(align(32)) VUINT32 poly_coeff[4][8][1];
+        __declspec(align(32)) VUINT32 Log2e[8][1];
+        __declspec(align(32)) VUINT32 L2H[8][1];
+        __declspec(align(32)) VUINT32 L2L[8][1];
+        __declspec(align(32)) VUINT32 ExpAddConst[8][1];
+        __declspec(align(32)) VUINT32 IndexMask[8][1];
+        __declspec(align(32)) VUINT32 ExpMask[8][1];
+        __declspec(align(32)) VUINT32 MOne[8][1];
+        __declspec(align(32)) VUINT32 AbsMask[8][1];
+        __declspec(align(32)) VUINT32 Threshold[8][1];
+        __declspec(align(32)) VUINT32 L2[8][1];
+} __svml_sexpm1_data_internal;
+#endif
+__svml_sexpm1_data_internal:
+        /* Expm1_HA_table */
+        .long 0x00000000, 0x00000000
+        .long 0x00016000, 0x391a3e78
+        .long 0x0002d000, 0xb89e59d5
+        .long 0x00044000, 0xb93ae78a
+        .long 0x0005b000, 0xb9279306
+        .long 0x00072000, 0xb79e6961
+        .long 0x0008a000, 0xb97e2fee
+        .long 0x000a1000, 0x391aaea9
+        .long 0x000b9000, 0x39383c7d
+        .long 0x000d2000, 0xb9241490
+        .long 0x000ea000, 0x39073169
+        .long 0x00103000, 0x386e218a
+        .long 0x0011c000, 0x38f4dceb
+        .long 0x00136000, 0xb93a9a1e
+        .long 0x0014f000, 0x391df520
+        .long 0x00169000, 0x3905a6e4
+        .long 0x00183000, 0x397e0a32
+        .long 0x0019e000, 0x370b2641
+        .long 0x001b9000, 0xb8b1918b
+        .long 0x001d4000, 0xb8132c6a
+        .long 0x001ef000, 0x39264c12
+        .long 0x0020b000, 0x37221f73
+        .long 0x00227000, 0x37060619
+        .long 0x00243000, 0x3922b5c1
+        .long 0x00260000, 0xb814ab27
+        .long 0x0027d000, 0xb89b12c6
+        .long 0x0029a000, 0x382d5a75
+        .long 0x002b8000, 0xb938c94b
+        .long 0x002d6000, 0xb97822b8
+        .long 0x002f4000, 0xb910ea53
+        .long 0x00312000, 0x38fd6075
+        .long 0x00331000, 0x38620955
+        .long 0x00350000, 0x391e667f
+        .long 0x00370000, 0xb89b8736
+        .long 0x00390000, 0xb90a1714
+        .long 0x003b0000, 0xb7a54ded
+        .long 0x003d1000, 0xb96b8c15
+        .long 0x003f1000, 0x397336cf
+        .long 0x00413000, 0xb8eccd66
+        .long 0x00434000, 0x39599b45
+        .long 0x00456000, 0x3965422b
+        .long 0x00479000, 0xb8a2cdd5
+        .long 0x0049c000, 0xb9484f32
+        .long 0x004bf000, 0xb8fac043
+        .long 0x004e2000, 0x391182a4
+        .long 0x00506000, 0x38ccf6bc
+        .long 0x0052b000, 0xb97c4dc2
+        .long 0x0054f000, 0x38d6aaf4
+        .long 0x00574000, 0x391f995b
+        .long 0x0059a000, 0xb8ba8f62
+        .long 0x005c0000, 0xb9090d05
+        .long 0x005e6000, 0x37f4825e
+        .long 0x0060d000, 0xb8c844f5
+        .long 0x00634000, 0xb76d1a83
+        .long 0x0065c000, 0xb95f2310
+        .long 0x00684000, 0xb952b5f8
+        .long 0x006ac000, 0x37c6e7dd
+        .long 0x006d5000, 0xb7cfe126
+        .long 0x006fe000, 0x3917337c
+        .long 0x00728000, 0x383b9e2d
+        .long 0x00752000, 0x392fa2a5
+        .long 0x0077d000, 0x37df730b
+        .long 0x007a8000, 0x38ecb6dd
+        .long 0x007d4000, 0xb879f986
+        /*== poly_coeff[4] ==*/
+        .align 32
+        .long 0x3e2AAABF, 0x3e2AAABF, 0x3e2AAABF, 0x3e2AAABF, 0x3e2AAABF, 0x3e2AAABF, 0x3e2AAABF, 0x3e2AAABF /* coeff3 */
+        .long 0x3f00000F, 0x3f00000F, 0x3f00000F, 0x3f00000F, 0x3f00000F, 0x3f00000F, 0x3f00000F, 0x3f00000F /* coeff2 */
+        /* 64 Byte Padding */
+        .zero 64
+        /*== Log2e ==*/
+        .align 32
+        .long 0x42B8AA3B, 0x42B8AA3B, 0x42B8AA3B, 0x42B8AA3B, 0x42B8AA3B, 0x42B8AA3B, 0x42B8AA3B, 0x42B8AA3B
+        /*== L2H ==*/
+        .align 32
+        .long 0x3c318000, 0x3c318000, 0x3c318000, 0x3c318000, 0x3c318000, 0x3c318000, 0x3c318000, 0x3c318000
+        /*== L2L ==*/
+        .align 32
+        .long 0xb65e8083, 0xb65e8083, 0xb65e8083, 0xb65e8083, 0xb65e8083, 0xb65e8083, 0xb65e8083, 0xb65e8083
+        /*== ExpAddConst ==*/
+        .align 32
+        .long 0x49f0fe00, 0x49f0fe00, 0x49f0fe00, 0x49f0fe00, 0x49f0fe00, 0x49f0fe00, 0x49f0fe00, 0x49f0fe00
+        /*== IndexMask ==*/
+        .align 32
+        .long 0x000001f8, 0x000001f8, 0x000001f8, 0x000001f8, 0x000001f8, 0x000001f8, 0x000001f8, 0x000001f8
+        /*== ExpMask ==*/
+        .align 32
+        .long 0x0001fe00, 0x0001fe00, 0x0001fe00, 0x0001fe00, 0x0001fe00, 0x0001fe00, 0x0001fe00, 0x0001fe00
+        /*== MOne ==*/
+        .align 32
+        .long 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000
+        /*== AbsMask ==*/
+        .align 32
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
+        /*== Threshold ==*/
+        .align 32
+        .long 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B // 86.643394
+        /*== L2 ==*/
+        .align 32
+        .long 0x3cb17218, 0x3cb17218, 0x3cb17218, 0x3cb17218, 0x3cb17218, 0x3cb17218, 0x3cb17218, 0x3cb17218
+        .align 32
+        .type	__svml_sexpm1_data_internal,@object
+        .size	__svml_sexpm1_data_internal,.-__svml_sexpm1_data_internal
diff --git a/sysdeps/x86_64/fpu/svml_d_expm12_core.S b/sysdeps/x86_64/fpu/svml_d_expm12_core.S
new file mode 100644
index 0000000000..a725d614bd
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_expm12_core.S
@@ -0,0 +1,29 @@
+/* Function expm1 vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN2v_expm1)
+WRAPPER_IMPL_SSE2 expm1
+END (_ZGVbN2v_expm1)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN2v_expm1)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_expm14_core.S b/sysdeps/x86_64/fpu/svml_d_expm14_core.S
new file mode 100644
index 0000000000..1027def883
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_expm14_core.S
@@ -0,0 +1,29 @@
+/* Function expm1 vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN4v_expm1)
+WRAPPER_IMPL_AVX _ZGVbN2v_expm1
+END (_ZGVdN4v_expm1)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN4v_expm1)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_expm14_core_avx.S b/sysdeps/x86_64/fpu/svml_d_expm14_core_avx.S
new file mode 100644
index 0000000000..3a34262241
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_expm14_core_avx.S
@@ -0,0 +1,25 @@
+/* Function expm1 vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN4v_expm1)
+WRAPPER_IMPL_AVX _ZGVbN2v_expm1
+END (_ZGVcN4v_expm1)
diff --git a/sysdeps/x86_64/fpu/svml_d_expm18_core.S b/sysdeps/x86_64/fpu/svml_d_expm18_core.S
new file mode 100644
index 0000000000..fa97595665
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_expm18_core.S
@@ -0,0 +1,25 @@
+/* Function expm1 vectorized with AVX-512, wrapper to AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN8v_expm1)
+WRAPPER_IMPL_AVX512 _ZGVdN4v_expm1
+END (_ZGVeN8v_expm1)
diff --git a/sysdeps/x86_64/fpu/svml_s_expm1f16_core.S b/sysdeps/x86_64/fpu/svml_s_expm1f16_core.S
new file mode 100644
index 0000000000..b7423632a9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_expm1f16_core.S
@@ -0,0 +1,25 @@
+/* Function expm1f vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN16v_expm1f)
+WRAPPER_IMPL_AVX512 _ZGVdN8v_expm1f
+END (_ZGVeN16v_expm1f)
diff --git a/sysdeps/x86_64/fpu/svml_s_expm1f4_core.S b/sysdeps/x86_64/fpu/svml_s_expm1f4_core.S
new file mode 100644
index 0000000000..334a49133a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_expm1f4_core.S
@@ -0,0 +1,29 @@
+/* Function expm1f vectorized with SSE2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN4v_expm1f)
+WRAPPER_IMPL_SSE2 expm1f
+END (_ZGVbN4v_expm1f)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4v_expm1f)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_expm1f8_core.S b/sysdeps/x86_64/fpu/svml_s_expm1f8_core.S
new file mode 100644
index 0000000000..10589574a5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_expm1f8_core.S
@@ -0,0 +1,29 @@
+/* Function expm1f vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN8v_expm1f)
+WRAPPER_IMPL_AVX _ZGVbN4v_expm1f
+END (_ZGVdN8v_expm1f)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8v_expm1f)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_expm1f8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_expm1f8_core_avx.S
new file mode 100644
index 0000000000..4161113615
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_expm1f8_core_avx.S
@@ -0,0 +1,25 @@
+/* Function expm1f vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+        .text
+ENTRY (_ZGVcN8v_expm1f)
+WRAPPER_IMPL_AVX _ZGVbN4v_expm1f
+END (_ZGVcN8v_expm1f)
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx.c
new file mode 100644
index 0000000000..3e59cb7141
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-expm1.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx2.c
new file mode 100644
index 0000000000..3e59cb7141
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx2.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-expm1.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx512f.c
new file mode 100644
index 0000000000..3e59cb7141
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx512f.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-expm1.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-expm1.c b/sysdeps/x86_64/fpu/test-double-libmvec-expm1.c
new file mode 100644
index 0000000000..33806a78c8
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-expm1.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE double
+#define LIBMVEC_FUNC expm1
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index 68c449e04a..0222f9f5b8 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -34,6 +34,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVbN2vv_hypot)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVbN2v_exp2)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVbN2v_exp10)
 VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVbN2v_cosh)
+VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVbN2v_expm1)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index df67306373..1aad9faf9c 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -37,6 +37,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVdN4vv_hypot)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVdN4v_exp2)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVdN4v_exp10)
 VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVdN4v_cosh)
+VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVdN4v_expm1)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m256i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index 1a6731098f..e404bf899d 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -34,6 +34,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVcN4vv_hypot)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVcN4v_exp2)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVcN4v_exp10)
 VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVcN4v_cosh)
+VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVcN4v_expm1)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index 4cdfa918e8..2b4de59343 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -34,6 +34,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVeN8vv_hypot)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVeN8v_exp2)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVeN8v_exp10)
 VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVeN8v_cosh)
+VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVeN8v_expm1)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx.c
new file mode 100644
index 0000000000..67e31f9666
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-expm1f.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx2.c
new file mode 100644
index 0000000000..67e31f9666
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx2.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-expm1f.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx512f.c
new file mode 100644
index 0000000000..67e31f9666
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx512f.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-expm1f.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-expm1f.c b/sysdeps/x86_64/fpu/test-float-libmvec-expm1f.c
new file mode 100644
index 0000000000..aa9871a39d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-expm1f.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE float
+#define LIBMVEC_FUNC expm1f
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index 47a9862233..9a4a1b84a9 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -34,6 +34,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVeN16vv_hypotf)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVeN16v_exp2f)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVeN16v_exp10f)
 VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVeN16v_coshf)
+VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVeN16v_expm1f)
 
 #define VEC_INT_TYPE __m512i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index e7c5410e7b..eb4e36d0e2 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -34,6 +34,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVbN4vv_hypotf)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVbN4v_exp2f)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVbN4v_exp10f)
 VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVbN4v_coshf)
+VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVbN4v_expm1f)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index b8e9d48cd6..d8adab59e6 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -37,6 +37,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVdN8vv_hypotf)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVdN8v_exp2f)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVdN8v_exp10f)
 VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVdN8v_coshf)
+VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVdN8v_expm1f)
 
 /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
 #undef VECTOR_WRAPPER_fFF
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index 328c827b27..e6e1a90c72 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -34,6 +34,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVcN8vv_hypotf)
 VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVcN8v_exp2f)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVcN8v_exp10f)
 VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVcN8v_coshf)
+VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVcN8v_expm1f)
 
 #define VEC_INT_TYPE __m128i
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 08/18] x86-64: Add vector sinh/sinhf implementation to libmvec
  2021-12-29  6:39 [PATCH v5 00/18] x86-64: Add vector math functions to libmvec Sunil K Pandey
                   ` (6 preceding siblings ...)
  2021-12-29  6:39 ` [PATCH v5 07/18] x86-64: Add vector expm1/expm1f " Sunil K Pandey
@ 2021-12-29  6:39 ` Sunil K Pandey
  2021-12-29 21:26   ` H.J. Lu
  2021-12-29  6:39 ` [PATCH v5 09/18] x86-64: Add vector cbrt/cbrtf " Sunil K Pandey
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Sunil K Pandey @ 2021-12-29  6:39 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, andrey.kolesov, marius.cornea

Implement vectorized sinh/sinhf for libmvec, with SSE, AVX, AVX2 and
AVX512 versions as per the vector ABI.  The patch also contains
accuracy and ABI tests for vector sinh/sinhf with regenerated ulps.
---
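Note (illustration only, not part of the commit): a minimal sketch of how
these entry points are reached from C.  It assumes GCC with -O2, -ffast-math
and -fopenmp-simd (exact options may vary by compiler version) together with
the glibc 2.35 <math.h> SIMD declarations; the wrapper name vector_sinh is
hypothetical.

    #include <math.h>

    /* With sinh declared SIMD-enabled via <bits/math-vector.h>, the
       compiler may replace the scalar call below with one of the vector
       variants added here (_ZGVbN2v_sinh, _ZGVdN4v_sinh or _ZGVeN8v_sinh),
       depending on the selected -march.  */
    void
    vector_sinh (double *restrict out, const double *restrict in, int n)
    {
    #pragma omp simd
      for (int i = 0; i < n; i++)
        out[i] = sinh (in[i]);
    }
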
 bits/libm-simd-decl-stubs.h                   |  11 +
 math/bits/mathcalls.h                         |   2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
 sysdeps/x86/fpu/bits/math-vector.h            |   4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
 sysdeps/x86_64/fpu/Makeconfig                 |   1 +
 sysdeps/x86_64/fpu/Versions                   |   2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |  20 +
 .../fpu/multiarch/svml_d_sinh2_core-sse2.S    |  20 +
 .../x86_64/fpu/multiarch/svml_d_sinh2_core.c  |  27 +
 .../fpu/multiarch/svml_d_sinh2_core_sse4.S    | 456 +++++++++++++++++
 .../fpu/multiarch/svml_d_sinh4_core-sse.S     |  20 +
 .../x86_64/fpu/multiarch/svml_d_sinh4_core.c  |  27 +
 .../fpu/multiarch/svml_d_sinh4_core_avx2.S    | 470 ++++++++++++++++++
 .../fpu/multiarch/svml_d_sinh8_core-avx2.S    |  20 +
 .../x86_64/fpu/multiarch/svml_d_sinh8_core.c  |  27 +
 .../fpu/multiarch/svml_d_sinh8_core_avx512.S  | 461 +++++++++++++++++
 .../fpu/multiarch/svml_s_sinhf16_core-avx2.S  |  20 +
 .../fpu/multiarch/svml_s_sinhf16_core.c       |  28 ++
 .../multiarch/svml_s_sinhf16_core_avx512.S    | 318 ++++++++++++
 .../fpu/multiarch/svml_s_sinhf4_core-sse2.S   |  20 +
 .../x86_64/fpu/multiarch/svml_s_sinhf4_core.c |  28 ++
 .../fpu/multiarch/svml_s_sinhf4_core_sse4.S   | 308 ++++++++++++
 .../fpu/multiarch/svml_s_sinhf8_core-sse.S    |  20 +
 .../x86_64/fpu/multiarch/svml_s_sinhf8_core.c |  28 ++
 .../fpu/multiarch/svml_s_sinhf8_core_avx2.S   | 309 ++++++++++++
 sysdeps/x86_64/fpu/svml_d_sinh2_core.S        |  29 ++
 sysdeps/x86_64/fpu/svml_d_sinh4_core.S        |  29 ++
 sysdeps/x86_64/fpu/svml_d_sinh4_core_avx.S    |  25 +
 sysdeps/x86_64/fpu/svml_d_sinh8_core.S        |  25 +
 sysdeps/x86_64/fpu/svml_s_sinhf16_core.S      |  25 +
 sysdeps/x86_64/fpu/svml_s_sinhf4_core.S       |  29 ++
 sysdeps/x86_64/fpu/svml_s_sinhf8_core.S       |  29 ++
 sysdeps/x86_64/fpu/svml_s_sinhf8_core_avx.S   |  25 +
 .../x86_64/fpu/test-double-libmvec-sinh-avx.c |   1 +
 .../fpu/test-double-libmvec-sinh-avx2.c       |   1 +
 .../fpu/test-double-libmvec-sinh-avx512f.c    |   1 +
 sysdeps/x86_64/fpu/test-double-libmvec-sinh.c |   3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-libmvec-sinhf-avx.c |   1 +
 .../fpu/test-float-libmvec-sinhf-avx2.c       |   1 +
 .../fpu/test-float-libmvec-sinhf-avx512f.c    |   1 +
 sysdeps/x86_64/fpu/test-float-libmvec-sinhf.c |   3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
 50 files changed, 2894 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_sinh2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_sinh4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_sinh4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_sinh8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_sinhf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_sinhf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_sinhf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_sinhf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-sinh.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-sinhf.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
index 28dc4a82c5..6347320521 100644
--- a/bits/libm-simd-decl-stubs.h
+++ b/bits/libm-simd-decl-stubs.h
@@ -186,4 +186,15 @@
 #define __DECL_SIMD_expm1f32x
 #define __DECL_SIMD_expm1f64x
 #define __DECL_SIMD_expm1f128x
+
+#define __DECL_SIMD_sinh
+#define __DECL_SIMD_sinhf
+#define __DECL_SIMD_sinhl
+#define __DECL_SIMD_sinhf16
+#define __DECL_SIMD_sinhf32
+#define __DECL_SIMD_sinhf64
+#define __DECL_SIMD_sinhf128
+#define __DECL_SIMD_sinhf32x
+#define __DECL_SIMD_sinhf64x
+#define __DECL_SIMD_sinhf128x
 #endif
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index c57adc8ace..673b3a93ba 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -70,7 +70,7 @@ __MATHCALL (tan,, (_Mdouble_ __x));
 /* Hyperbolic cosine of X.  */
 __MATHCALL_VEC (cosh,, (_Mdouble_ __x));
 /* Hyperbolic sine of X.  */
-__MATHCALL (sinh,, (_Mdouble_ __x));
+__MATHCALL_VEC (sinh,, (_Mdouble_ __x));
 /* Hyperbolic tangent of X.  */
 __MATHCALL (tanh,, (_Mdouble_ __x));
 
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index c9d3213bd3..f9d7b085ab 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -53,6 +53,7 @@ GLIBC_2.35 _ZGVbN2v_cosh F
 GLIBC_2.35 _ZGVbN2v_exp10 F
 GLIBC_2.35 _ZGVbN2v_exp2 F
 GLIBC_2.35 _ZGVbN2v_expm1 F
+GLIBC_2.35 _ZGVbN2v_sinh F
 GLIBC_2.35 _ZGVbN2vv_hypot F
 GLIBC_2.35 _ZGVbN4v_acosf F
 GLIBC_2.35 _ZGVbN4v_asinf F
@@ -61,6 +62,7 @@ GLIBC_2.35 _ZGVbN4v_coshf F
 GLIBC_2.35 _ZGVbN4v_exp10f F
 GLIBC_2.35 _ZGVbN4v_exp2f F
 GLIBC_2.35 _ZGVbN4v_expm1f F
+GLIBC_2.35 _ZGVbN4v_sinhf F
 GLIBC_2.35 _ZGVbN4vv_hypotf F
 GLIBC_2.35 _ZGVcN4v_acos F
 GLIBC_2.35 _ZGVcN4v_asin F
@@ -69,6 +71,7 @@ GLIBC_2.35 _ZGVcN4v_cosh F
 GLIBC_2.35 _ZGVcN4v_exp10 F
 GLIBC_2.35 _ZGVcN4v_exp2 F
 GLIBC_2.35 _ZGVcN4v_expm1 F
+GLIBC_2.35 _ZGVcN4v_sinh F
 GLIBC_2.35 _ZGVcN4vv_hypot F
 GLIBC_2.35 _ZGVcN8v_acosf F
 GLIBC_2.35 _ZGVcN8v_asinf F
@@ -77,6 +80,7 @@ GLIBC_2.35 _ZGVcN8v_coshf F
 GLIBC_2.35 _ZGVcN8v_exp10f F
 GLIBC_2.35 _ZGVcN8v_exp2f F
 GLIBC_2.35 _ZGVcN8v_expm1f F
+GLIBC_2.35 _ZGVcN8v_sinhf F
 GLIBC_2.35 _ZGVcN8vv_hypotf F
 GLIBC_2.35 _ZGVdN4v_acos F
 GLIBC_2.35 _ZGVdN4v_asin F
@@ -85,6 +89,7 @@ GLIBC_2.35 _ZGVdN4v_cosh F
 GLIBC_2.35 _ZGVdN4v_exp10 F
 GLIBC_2.35 _ZGVdN4v_exp2 F
 GLIBC_2.35 _ZGVdN4v_expm1 F
+GLIBC_2.35 _ZGVdN4v_sinh F
 GLIBC_2.35 _ZGVdN4vv_hypot F
 GLIBC_2.35 _ZGVdN8v_acosf F
 GLIBC_2.35 _ZGVdN8v_asinf F
@@ -93,6 +98,7 @@ GLIBC_2.35 _ZGVdN8v_coshf F
 GLIBC_2.35 _ZGVdN8v_exp10f F
 GLIBC_2.35 _ZGVdN8v_exp2f F
 GLIBC_2.35 _ZGVdN8v_expm1f F
+GLIBC_2.35 _ZGVdN8v_sinhf F
 GLIBC_2.35 _ZGVdN8vv_hypotf F
 GLIBC_2.35 _ZGVeN16v_acosf F
 GLIBC_2.35 _ZGVeN16v_asinf F
@@ -101,6 +107,7 @@ GLIBC_2.35 _ZGVeN16v_coshf F
 GLIBC_2.35 _ZGVeN16v_exp10f F
 GLIBC_2.35 _ZGVeN16v_exp2f F
 GLIBC_2.35 _ZGVeN16v_expm1f F
+GLIBC_2.35 _ZGVeN16v_sinhf F
 GLIBC_2.35 _ZGVeN16vv_hypotf F
 GLIBC_2.35 _ZGVeN8v_acos F
 GLIBC_2.35 _ZGVeN8v_asin F
@@ -109,4 +116,5 @@ GLIBC_2.35 _ZGVeN8v_cosh F
 GLIBC_2.35 _ZGVeN8v_exp10 F
 GLIBC_2.35 _ZGVeN8v_exp2 F
 GLIBC_2.35 _ZGVeN8v_expm1 F
+GLIBC_2.35 _ZGVeN8v_sinh F
 GLIBC_2.35 _ZGVeN8vv_hypot F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index e2f98e176f..51a41cfebc 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -90,6 +90,10 @@
 #  define __DECL_SIMD_expm1 __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_expm1f
 #  define __DECL_SIMD_expm1f __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_sinh
+#  define __DECL_SIMD_sinh __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_sinhf
+#  define __DECL_SIMD_sinhf __DECL_SIMD_x86_64
 
 # endif
 #endif
diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
index 43233059f6..91e9b4fc83 100644
--- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
@@ -44,6 +44,8 @@
 !GCC$ builtin (coshf) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (expm1) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (expm1f) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (sinh) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (sinhf) attributes simd (notinbranch) if('x86_64')
 
 !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
@@ -73,3 +75,5 @@
 !GCC$ builtin (coshf) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (expm1) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (expm1f) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (sinh) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (sinhf) attributes simd (notinbranch) if('x32')
diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
index 8de8214971..81e9fc95b2 100644
--- a/sysdeps/x86_64/fpu/Makeconfig
+++ b/sysdeps/x86_64/fpu/Makeconfig
@@ -36,6 +36,7 @@ libmvec-funcs = \
   pow \
   sin \
   sincos \
+  sinh \
 
 # Define libmvec function for benchtests directory.
 libmvec-bench-funcs = \
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index 58debb2dbe..2710446d12 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -21,6 +21,7 @@ libmvec {
     _ZGVbN2v_exp10; _ZGVcN4v_exp10; _ZGVdN4v_exp10; _ZGVeN8v_exp10;
     _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
     _ZGVbN2v_expm1; _ZGVcN4v_expm1; _ZGVdN4v_expm1; _ZGVeN8v_expm1;
+    _ZGVbN2v_sinh; _ZGVcN4v_sinh; _ZGVdN4v_sinh; _ZGVeN8v_sinh;
     _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
     _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
     _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
@@ -29,6 +30,7 @@ libmvec {
     _ZGVbN4v_exp10f; _ZGVcN8v_exp10f; _ZGVdN8v_exp10f; _ZGVeN16v_exp10f;
     _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
     _ZGVbN4v_expm1f; _ZGVcN8v_expm1f; _ZGVdN8v_expm1f; _ZGVeN16v_expm1f;
+    _ZGVbN4v_sinhf; _ZGVcN8v_sinhf; _ZGVdN8v_sinhf; _ZGVeN16v_sinhf;
     _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf;
   }
 }
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index f05ece8c8a..f4b313119d 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -1840,6 +1840,26 @@ float: 3
 float128: 4
 ldouble: 5
 
+Function: "sinh_vlen16":
+float: 1
+
+Function: "sinh_vlen2":
+double: 2
+
+Function: "sinh_vlen4":
+double: 2
+float: 1
+
+Function: "sinh_vlen4_avx2":
+double: 2
+
+Function: "sinh_vlen8":
+double: 2
+float: 1
+
+Function: "sinh_vlen8_avx2":
+float: 1
+
 Function: "tan":
 float: 1
 float128: 1
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core-sse2.S
new file mode 100644
index 0000000000..ca12ad6678
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized sinh, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN2v_sinh _ZGVbN2v_sinh_sse2
+#include "../svml_d_sinh2_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core.c
new file mode 100644
index 0000000000..c0344b2902
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized sinh, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN2v_sinh
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN2v_sinh, __GI__ZGVbN2v_sinh, __redirect__ZGVbN2v_sinh)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core_sse4.S
new file mode 100644
index 0000000000..80d19e9dba
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core_sse4.S
@@ -0,0 +1,456 @@
+/* Function sinh vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute sinh(x) as (exp(x)-exp(-x))/2,
+ *   where exp is calculated as
+ *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
+ *
+ *   Special cases:
+ *
+ *   sinh(NaN) = quiet NaN, and raise invalid exception
+ *   sinh(INF) = that INF
+ *   sinh(x)   = x for subnormals
+ *   sinh(x) overflows for big x and returns MAXLOG+log(2)
+ *
+ */
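+
+/* For reference, a rough scalar outline of the reduction used below (an
+ * illustrative sketch only, not a precise specification; the names refer
+ * to the data table fields defined in this file):
+ *
+ *   m = |x| * _dbInvLn2 + _dbShifter;  // low mantissa bits of m hold
+ *                                      // round (|x| * 2^7 / ln2)
+ *   j = m & _lIndexMask;               // table index, 0 .. 127
+ *   r = |x| - (m - _dbShifter) * ln2;  // ln2 split into _dbLn2hi/_dbLn2lo
+ *
+ *   so |x| ~= (N + j/128) * ln2 + r, where N is formed from the remaining
+ *   bits of m.  _dbT[j] stores 2^(j/128-1) - 2^(-j/128-1) and 2^(-j/128-1);
+ *   scaling these by 2^N and 2^(-N) and combining them with the small
+ *   polynomials in r (_dPC2 .. _dPC5) reconstructs
+ *   (exp(|x|) - exp(-|x|)) / 2, and the sign bit saved via _dSign is
+ *   OR-ed back into the result.
+ */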
+
+/* Offsets for data table __svml_dsinh_data_internal
+ */
+#define _dbInvLn2                     	0
+#define _dbLn2hi                      	16
+#define _dbLn2lo                      	32
+#define _dSign                        	48
+#define _dbT                          	64
+#define _dbShifter                    	2112
+#define _iDomainRange                 	2128
+#define _dPC2                         	2144
+#define _dPC3                         	2160
+#define _dPC4                         	2176
+#define _dPC5                         	2192
+#define _lIndexMask                   	2208
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN2v_sinh_sse4)
+        subq      $72, %rsp
+        cfi_def_cfa_offset(80)
+        movaps    %xmm0, %xmm2
+
+/*  Abs argument  */
+        movups    _dSign+__svml_dsinh_data_internal(%rip), %xmm0
+        lea       _dbT+8+__svml_dsinh_data_internal(%rip), %rsi
+        andps     %xmm2, %xmm0
+        movaps    %xmm0, %xmm1
+
+/*
+ *  Load argument
+ * dM = x*2^K/log(2) + RShifter
+ */
+        movups    _dbInvLn2+__svml_dsinh_data_internal(%rip), %xmm10
+        pxor      %xmm2, %xmm1
+        mulpd     %xmm1, %xmm10
+        movups    _dbShifter+__svml_dsinh_data_internal(%rip), %xmm5
+        addpd     %xmm5, %xmm10
+
+/*
+ *  R
+ * dN = dM - RShifter
+ */
+        movaps    %xmm10, %xmm7
+        subpd     %xmm5, %xmm7
+
+/* dR = dX - dN*Log2_hi/2^K */
+        movups    _dbLn2hi+__svml_dsinh_data_internal(%rip), %xmm6
+        mulpd     %xmm7, %xmm6
+
+/* dR = (dX - dN*Log2_hi/2^K) - dN*Log2_lo/2^K */
+        movups    _dbLn2lo+__svml_dsinh_data_internal(%rip), %xmm8
+        mulpd     %xmm7, %xmm8
+
+/*
+ * Check for overflow\underflow
+ *
+ */
+        pshufd    $221, %xmm1, %xmm4
+        subpd     %xmm6, %xmm1
+        subpd     %xmm8, %xmm1
+
+/* VLOAD_CONST( D, dPC[0],         TAB._dPC1 ); */
+        movq      _iDomainRange+__svml_dsinh_data_internal(%rip), %xmm3
+        pcmpgtd   %xmm3, %xmm4
+
+/* dR2 = dR^2 */
+        movaps    %xmm1, %xmm3
+        mulpd     %xmm1, %xmm3
+        movmskps  %xmm4, %edx
+
+/*
+ * sinh(r) = r*((a1=1)+r^2*(a3+r^2*a5)) = r + r*(r^2*(a3+r^2*a5)) ....
+ * dSinh_r = (a3+r^2*a5)
+ */
+        movups    _dPC5+__svml_dsinh_data_internal(%rip), %xmm12
+
+/*
+ * poly(r) = (dG2+dG1)+dG3*sinh(dR)+dG1*sinh(dR)+(dG1+dG2)*dR2*(a2 +a4*dR2)
+ * dOut = (a2 +a4*dR2)
+ */
+        movups    _dPC4+__svml_dsinh_data_internal(%rip), %xmm13
+        mulpd     %xmm3, %xmm12
+        mulpd     %xmm3, %xmm13
+        addpd     _dPC3+__svml_dsinh_data_internal(%rip), %xmm12
+        addpd     _dPC2+__svml_dsinh_data_internal(%rip), %xmm13
+
+/* dSinh_r = r^2*(a3+r^2*a5) */
+        mulpd     %xmm3, %xmm12
+
+/* dOut = dR2*(a2 +a4*dR2) */
+        mulpd     %xmm13, %xmm3
+
+/* dSinh_r = r + r*(r^2*(a3+r^2*a5)) */
+        mulpd     %xmm1, %xmm12
+
+/*
+ *  Index and lookup
+ * j
+ */
+        movups    _lIndexMask+__svml_dsinh_data_internal(%rip), %xmm9
+        andps     %xmm10, %xmm9
+        movd      %xmm9, %eax
+
+/* split j and N */
+        pxor      %xmm9, %xmm10
+
+/*
+ *  G1,G2,G3: dTdif,dTn * 2^N,2^(-N)
+ * lM now is an EXP(2^N)
+ */
+        psllq     $45, %xmm10
+
+/*  */
+        movaps    %xmm10, %xmm4
+        pextrw    $4, %xmm9, %ecx
+        addpd     %xmm12, %xmm1
+        shll      $4, %eax
+        shll      $4, %ecx
+        movq      (%rax,%rsi), %xmm11
+        movhpd    (%rcx,%rsi), %xmm11
+        paddq     %xmm11, %xmm4
+
+/*  */
+        psubq     %xmm10, %xmm11
+
+/* dG3 = dTn*2^N + dTn*2^-N */
+        movdqa    %xmm4, %xmm14
+        addpd     %xmm11, %xmm14
+
+/* dG2 = dTn*2^N - dTn*2^-N */
+        subpd     %xmm11, %xmm4
+        movq      -8(%rax,%rsi), %xmm15
+        movhpd    -8(%rcx,%rsi), %xmm15
+        paddq     %xmm10, %xmm15
+
+/* dG2 += dG1 */
+        addpd     %xmm15, %xmm4
+
+/* dG1 += dG3 */
+        addpd     %xmm14, %xmm15
+
+/* dOut = dG2*dR2*(a2 +a4*dR2) */
+        mulpd     %xmm4, %xmm3
+
+/* dOut = dG1*sinh(dR)+dG2*dR2*(a2 +a4*dR2) */
+        mulpd     %xmm15, %xmm1
+        addpd     %xmm1, %xmm3
+
+/* dOut = dG2 + dG1*sinh(dR)+dG2*dR2*(a2 +a4*dR2) */
+        addpd     %xmm3, %xmm4
+
+/*  Ret H  */
+        orps      %xmm4, %xmm0
+        andl      $3, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm2
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        addq      $72, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(80)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm2, 32(%rsp)
+        movups    %xmm0, 48(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -64)
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -72)
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $2, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -64)
+        cfi_offset(13, -72)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      sinh@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 48(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN2v_sinh_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_dsinh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(16)) VUINT32 _dbInvLn2[2][2];
+        __declspec(align(16)) VUINT32 _dbLn2hi[2][2];
+        __declspec(align(16)) VUINT32 _dbLn2lo[2][2];
+        __declspec(align(16)) VUINT32 _dSign[2][2];                //0x8000000000000000
+        __declspec(align(16)) VUINT32 _dbT[(1<<7)][2][2]; //precalc poly coeff
+        __declspec(align(16)) VUINT32 _dbShifter[2][2];
+        __declspec(align(16)) VUINT32 _iDomainRange[4][1];
+        __declspec(align(16)) VUINT32 _dPC2[2][2];
+        __declspec(align(16)) VUINT32 _dPC3[2][2];
+        __declspec(align(16)) VUINT32 _dPC4[2][2];
+        __declspec(align(16)) VUINT32 _dPC5[2][2];
+        __declspec(align(16)) VUINT32 _lIndexMask[2][2];
+} __svml_dsinh_data_internal;
+#endif
+__svml_dsinh_data_internal:
+        .quad 0x3FF71547652B82FE, 0x3FF71547652B82FE /* _dbInvLn2 = 1/log(2) */
+        .align 16
+        .quad 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000 /* _dbLn2hi  = log(2) hi*/
+        .align 16
+        .quad 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A /* _dbLn2lo  = log(2) lo*/
+        .align 16
+        .quad 0x8000000000000000, 0x8000000000000000 /* _dSign */
+        //_dbT
+        .align 16
+        .quad 0x0000000000000000, 0x3FE0000000000000  //2^( 0 /128-1) - 2^(- 0 /128-1), 2^(- 0 /128-1)
+        .quad 0x3F762E4A19BD1E74, 0x3FDFD3C22B8F71F1  //2^( 1 /128-1) - 2^(- 1 /128-1), 2^(- 1 /128-1)
+        .quad 0x3F862E5F6A0DFD36, 0x3FDFA7C1819E90D8  //2^( 2 /128-1) - 2^(- 2 /128-1), 2^(- 2 /128-1)
+        .quad 0x3F90A2E234040F5F, 0x3FDF7BFDAD9CBE14  //2^( 3 /128-1) - 2^(- 3 /128-1), 2^(- 3 /128-1)
+        .quad 0x3F962EB4ABCC5A81, 0x3FDF50765B6E4540  //2^( 4 /128-1) - 2^(- 4 /128-1), 2^(- 4 /128-1)
+        .quad 0x3F9BBAB1C5033244, 0x3FDF252B376BBA97  //2^( 5 /128-1) - 2^(- 5 /128-1), 2^(- 5 /128-1)
+        .quad 0x3FA0A372144EEB45, 0x3FDEFA1BEE615A27  //2^( 6 /128-1) - 2^(- 6 /128-1), 2^(- 6 /128-1)
+        .quad 0x3FA369AB3FFBF8B0, 0x3FDECF482D8E67F1  //2^( 7 /128-1) - 2^(- 7 /128-1), 2^(- 7 /128-1)
+        .quad 0x3FA63009BA740A2A, 0x3FDEA4AFA2A490DA  //2^( 8 /128-1) - 2^(- 8 /128-1), 2^(- 8 /128-1)
+        .quad 0x3FA8F692D8EA1B5A, 0x3FDE7A51FBC74C83  //2^( 9 /128-1) - 2^(- 9 /128-1), 2^(- 9 /128-1)
+        .quad 0x3FABBD4BF0E31A6F, 0x3FDE502EE78B3FF6  //2^( 10 /128-1) - 2^(- 10 /128-1), 2^(- 10 /128-1)
+        .quad 0x3FAE843A5840286A, 0x3FDE264614F5A129  //2^( 11 /128-1) - 2^(- 11 /128-1), 2^(- 11 /128-1)
+        .quad 0x3FB0A5B1B2A46D0A, 0x3FDDFC97337B9B5F  //2^( 12 /128-1) - 2^(- 12 /128-1), 2^(- 12 /128-1)
+        .quad 0x3FB20966375ABCDF, 0x3FDDD321F301B460  //2^( 13 /128-1) - 2^(- 13 /128-1), 2^(- 13 /128-1)
+        .quad 0x3FB36D3D65DCA4E8, 0x3FDDA9E603DB3285  //2^( 14 /128-1) - 2^(- 14 /128-1), 2^(- 14 /128-1)
+        .quad 0x3FB4D139EA06642A, 0x3FDD80E316C98398  //2^( 15 /128-1) - 2^(- 15 /128-1), 2^(- 15 /128-1)
+        .quad 0x3FB6355E6FFBF9BA, 0x3FDD5818DCFBA487  //2^( 16 /128-1) - 2^(- 16 /128-1), 2^(- 16 /128-1)
+        .quad 0x3FB799ADA42E4788, 0x3FDD2F87080D89F2  //2^( 17 /128-1) - 2^(- 17 /128-1), 2^(- 17 /128-1)
+        .quad 0x3FB8FE2A336035BC, 0x3FDD072D4A07897C  //2^( 18 /128-1) - 2^(- 18 /128-1), 2^(- 18 /128-1)
+        .quad 0x3FBA62D6CAABD6B6, 0x3FDCDF0B555DC3FA  //2^( 19 /128-1) - 2^(- 19 /128-1), 2^(- 19 /128-1)
+        .quad 0x3FBBC7B617878BAF, 0x3FDCB720DCEF9069  //2^( 20 /128-1) - 2^(- 20 /128-1), 2^(- 20 /128-1)
+        .quad 0x3FBD2CCAC7CB2A11, 0x3FDC8F6D9406E7B5  //2^( 21 /128-1) - 2^(- 21 /128-1), 2^(- 21 /128-1)
+        .quad 0x3FBE921789B52185, 0x3FDC67F12E57D14B  //2^( 22 /128-1) - 2^(- 22 /128-1), 2^(- 22 /128-1)
+        .quad 0x3FBFF79F0BEFA2C7, 0x3FDC40AB5FFFD07A  //2^( 23 /128-1) - 2^(- 23 /128-1), 2^(- 23 /128-1)
+        .quad 0x3FC0AEB1FECAE3A9, 0x3FDC199BDD85529C  //2^( 24 /128-1) - 2^(- 24 /128-1), 2^(- 24 /128-1)
+        .quad 0x3FC161B4871C5CEC, 0x3FDBF2C25BD71E09  //2^( 25 /128-1) - 2^(- 25 /128-1), 2^(- 25 /128-1)
+        .quad 0x3FC214D876F26FD0, 0x3FDBCC1E904BC1D2  //2^( 26 /128-1) - 2^(- 26 /128-1), 2^(- 26 /128-1)
+        .quad 0x3FC2C81F2693816F, 0x3FDBA5B030A1064A  //2^( 27 /128-1) - 2^(- 27 /128-1), 2^(- 27 /128-1)
+        .quad 0x3FC37B89EE88BEF7, 0x3FDB7F76F2FB5E47  //2^( 28 /128-1) - 2^(- 28 /128-1), 2^(- 28 /128-1)
+        .quad 0x3FC42F1A27A0B3CD, 0x3FDB59728DE5593A  //2^( 29 /128-1) - 2^(- 29 /128-1), 2^(- 29 /128-1)
+        .quad 0x3FC4E2D12AF1E037, 0x3FDB33A2B84F15FB  //2^( 30 /128-1) - 2^(- 30 /128-1), 2^(- 30 /128-1)
+        .quad 0x3FC596B051DD508D, 0x3FDB0E07298DB666  //2^( 31 /128-1) - 2^(- 31 /128-1), 2^(- 31 /128-1)
+        .quad 0x3FC64AB8F61134FA, 0x3FDAE89F995AD3AD  //2^( 32 /128-1) - 2^(- 32 /128-1), 2^(- 32 /128-1)
+        .quad 0x3FC6FEEC718B79D1, 0x3FDAC36BBFD3F37A  //2^( 33 /128-1) - 2^(- 33 /128-1), 2^(- 33 /128-1)
+        .quad 0x3FC7B34C1E9C607F, 0x3FDA9E6B5579FDBF  //2^( 34 /128-1) - 2^(- 34 /128-1), 2^(- 34 /128-1)
+        .quad 0x3FC867D957E91912, 0x3FDA799E1330B358  //2^( 35 /128-1) - 2^(- 35 /128-1), 2^(- 35 /128-1)
+        .quad 0x3FC91C95786E5C72, 0x3FDA5503B23E255D  //2^( 36 /128-1) - 2^(- 36 /128-1), 2^(- 36 /128-1)
+        .quad 0x3FC9D181DB83072F, 0x3FDA309BEC4A2D33  //2^( 37 /128-1) - 2^(- 37 /128-1), 2^(- 37 /128-1)
+        .quad 0x3FCA869FDCDAB512, 0x3FDA0C667B5DE565  //2^( 38 /128-1) - 2^(- 38 /128-1), 2^(- 38 /128-1)
+        .quad 0x3FCB3BF0D8885D4C, 0x3FD9E86319E32323  //2^( 39 /128-1) - 2^(- 39 /128-1), 2^(- 39 /128-1)
+        .quad 0x3FCBF1762B00EF69, 0x3FD9C49182A3F090  //2^( 40 /128-1) - 2^(- 40 /128-1), 2^(- 40 /128-1)
+        .quad 0x3FCCA731311DF0FB, 0x3FD9A0F170CA07BA  //2^( 41 /128-1) - 2^(- 41 /128-1), 2^(- 41 /128-1)
+        .quad 0x3FCD5D2348201C09, 0x3FD97D829FDE4E50  //2^( 42 /128-1) - 2^(- 42 /128-1), 2^(- 42 /128-1)
+        .quad 0x3FCE134DCDB1FE3E, 0x3FD95A44CBC8520F  //2^( 43 /128-1) - 2^(- 43 /128-1), 2^(- 43 /128-1)
+        .quad 0x3FCEC9B21FEA98EA, 0x3FD93737B0CDC5E5  //2^( 44 /128-1) - 2^(- 44 /128-1), 2^(- 44 /128-1)
+        .quad 0x3FCF80519D5001D3, 0x3FD9145B0B91FFC6  //2^( 45 /128-1) - 2^(- 45 /128-1), 2^(- 45 /128-1)
+        .quad 0x3FD01B96D26D026A, 0x3FD8F1AE99157736  //2^( 46 /128-1) - 2^(- 46 /128-1), 2^(- 46 /128-1)
+        .quad 0x3FD07723CAFA6331, 0x3FD8CF3216B5448C  //2^( 47 /128-1) - 2^(- 47 /128-1), 2^(- 47 /128-1)
+        .quad 0x3FD0D2D06841B373, 0x3FD8ACE5422AA0DB  //2^( 48 /128-1) - 2^(- 48 /128-1), 2^(- 48 /128-1)
+        .quad 0x3FD12E9D5A715381, 0x3FD88AC7D98A6699  //2^( 49 /128-1) - 2^(- 49 /128-1), 2^(- 49 /128-1)
+        .quad 0x3FD18A8B51F5C661, 0x3FD868D99B4492ED  //2^( 50 /128-1) - 2^(- 50 /128-1), 2^(- 50 /128-1)
+        .quad 0x3FD1E69AFF7B04D7, 0x3FD8471A4623C7AD  //2^( 51 /128-1) - 2^(- 51 /128-1), 2^(- 51 /128-1)
+        .quad 0x3FD242CD13EDD0F1, 0x3FD82589994CCE13  //2^( 52 /128-1) - 2^(- 52 /128-1), 2^(- 52 /128-1)
+        .quad 0x3FD29F22407D0A0C, 0x3FD80427543E1A12  //2^( 53 /128-1) - 2^(- 53 /128-1), 2^(- 53 /128-1)
+        .quad 0x3FD2FB9B369B0153, 0x3FD7E2F336CF4E62  //2^( 54 /128-1) - 2^(- 54 /128-1), 2^(- 54 /128-1)
+        .quad 0x3FD35838A7FECEC8, 0x3FD7C1ED0130C132  //2^( 55 /128-1) - 2^(- 55 /128-1), 2^(- 55 /128-1)
+        .quad 0x3FD3B4FB46A5A6CC, 0x3FD7A11473EB0187  //2^( 56 /128-1) - 2^(- 56 /128-1), 2^(- 56 /128-1)
+        .quad 0x3FD411E3C4D4302F, 0x3FD780694FDE5D3F  //2^( 57 /128-1) - 2^(- 57 /128-1), 2^(- 57 /128-1)
+        .quad 0x3FD46EF2D517DAC8, 0x3FD75FEB564267C9  //2^( 58 /128-1) - 2^(- 58 /128-1), 2^(- 58 /128-1)
+        .quad 0x3FD4CC292A48369E, 0x3FD73F9A48A58174  //2^( 59 /128-1) - 2^(- 59 /128-1), 2^(- 59 /128-1)
+        .quad 0x3FD5298777884B96, 0x3FD71F75E8EC5F74  //2^( 60 /128-1) - 2^(- 60 /128-1), 2^(- 60 /128-1)
+        .quad 0x3FD5870E7047F1BC, 0x3FD6FF7DF9519484  //2^( 61 /128-1) - 2^(- 61 /128-1), 2^(- 61 /128-1)
+        .quad 0x3FD5E4BEC8452A1A, 0x3FD6DFB23C651A2F  //2^( 62 /128-1) - 2^(- 62 /128-1), 2^(- 62 /128-1)
+        .quad 0x3FD64299338D7827, 0x3FD6C012750BDABF  //2^( 63 /128-1) - 2^(- 63 /128-1), 2^(- 63 /128-1)
+        .quad 0x3FD6A09E667F3BCD, 0x3FD6A09E667F3BCD  //2^( 64 /128-1) - 2^(- 64 /128-1), 2^(- 64 /128-1)
+        .quad 0x3FD6FECF15CB0C0B, 0x3FD68155D44CA973  //2^( 65 /128-1) - 2^(- 65 /128-1), 2^(- 65 /128-1)
+        .quad 0x3FD75D2BF6751239, 0x3FD6623882552225  //2^( 66 /128-1) - 2^(- 66 /128-1), 2^(- 66 /128-1)
+        .quad 0x3FD7BBB5BDD665E8, 0x3FD6434634CCC320  //2^( 67 /128-1) - 2^(- 67 /128-1), 2^(- 67 /128-1)
+        .quad 0x3FD81A6D219E6963, 0x3FD6247EB03A5585  //2^( 68 /128-1) - 2^(- 68 /128-1), 2^(- 68 /128-1)
+        .quad 0x3FD87952D7D426DF, 0x3FD605E1B976DC09  //2^( 69 /128-1) - 2^(- 69 /128-1), 2^(- 69 /128-1)
+        .quad 0x3FD8D86796D7AE49, 0x3FD5E76F15AD2148  //2^( 70 /128-1) - 2^(- 70 /128-1), 2^(- 70 /128-1)
+        .quad 0x3FD937AC156373C8, 0x3FD5C9268A5946B7  //2^( 71 /128-1) - 2^(- 71 /128-1), 2^(- 71 /128-1)
+        .quad 0x3FD997210A8DAEE4, 0x3FD5AB07DD485429  //2^( 72 /128-1) - 2^(- 72 /128-1), 2^(- 72 /128-1)
+        .quad 0x3FD9F6C72DC9BA68, 0x3FD58D12D497C7FD  //2^( 73 /128-1) - 2^(- 73 /128-1), 2^(- 73 /128-1)
+        .quad 0x3FDA569F36E974EA, 0x3FD56F4736B527DA  //2^( 74 /128-1) - 2^(- 74 /128-1), 2^(- 74 /128-1)
+        .quad 0x3FDAB6A9DE1EA215, 0x3FD551A4CA5D920F  //2^( 75 /128-1) - 2^(- 75 /128-1), 2^(- 75 /128-1)
+        .quad 0x3FDB16E7DBFC4CA3, 0x3FD5342B569D4F82  //2^( 76 /128-1) - 2^(- 76 /128-1), 2^(- 76 /128-1)
+        .quad 0x3FDB7759E9782918, 0x3FD516DAA2CF6642  //2^( 77 /128-1) - 2^(- 77 /128-1), 2^(- 77 /128-1)
+        .quad 0x3FDBD800BFEBF932, 0x3FD4F9B2769D2CA7  //2^( 78 /128-1) - 2^(- 78 /128-1), 2^(- 78 /128-1)
+        .quad 0x3FDC38DD1916F025, 0x3FD4DCB299FDDD0D  //2^( 79 /128-1) - 2^(- 79 /128-1), 2^(- 79 /128-1)
+        .quad 0x3FDC99EFAF1F1790, 0x3FD4BFDAD5362A27  //2^( 80 /128-1) - 2^(- 80 /128-1), 2^(- 80 /128-1)
+        .quad 0x3FDCFB393C92B539, 0x3FD4A32AF0D7D3DE  //2^( 81 /128-1) - 2^(- 81 /128-1), 2^(- 81 /128-1)
+        .quad 0x3FDD5CBA7C69B19C, 0x3FD486A2B5C13CD0  //2^( 82 /128-1) - 2^(- 82 /128-1), 2^(- 82 /128-1)
+        .quad 0x3FDDBE742A06FF34, 0x3FD46A41ED1D0057  //2^( 83 /128-1) - 2^(- 83 /128-1), 2^(- 83 /128-1)
+        .quad 0x3FDE2067013A029D, 0x3FD44E086061892D  //2^( 84 /128-1) - 2^(- 84 /128-1), 2^(- 84 /128-1)
+        .quad 0x3FDE8293BE3FFB87, 0x3FD431F5D950A897  //2^( 85 /128-1) - 2^(- 85 /128-1), 2^(- 85 /128-1)
+        .quad 0x3FDEE4FB1DC56E75, 0x3FD4160A21F72E2A  //2^( 86 /128-1) - 2^(- 86 /128-1), 2^(- 86 /128-1)
+        .quad 0x3FDF479DDCE78F58, 0x3FD3FA4504AC801C  //2^( 87 /128-1) - 2^(- 87 /128-1), 2^(- 87 /128-1)
+        .quad 0x3FDFAA7CB935ACFE, 0x3FD3DEA64C123422  //2^( 88 /128-1) - 2^(- 88 /128-1), 2^(- 88 /128-1)
+        .quad 0x3FE006CC38594EB1, 0x3FD3C32DC313A8E5  //2^( 89 /128-1) - 2^(- 89 /128-1), 2^(- 89 /128-1)
+        .quad 0x3FE03878E0EB1569, 0x3FD3A7DB34E59FF7  //2^( 90 /128-1) - 2^(- 90 /128-1), 2^(- 90 /128-1)
+        .quad 0x3FE06A44B5C74101, 0x3FD38CAE6D05D866  //2^( 91 /128-1) - 2^(- 91 /128-1), 2^(- 91 /128-1)
+        .quad 0x3FE09C3016A0D077, 0x3FD371A7373AA9CB  //2^( 92 /128-1) - 2^(- 92 /128-1), 2^(- 92 /128-1)
+        .quad 0x3FE0CE3B63676360, 0x3FD356C55F929FF1  //2^( 93 /128-1) - 2^(- 93 /128-1), 2^(- 93 /128-1)
+        .quad 0x3FE10066FC47F240, 0x3FD33C08B26416FF  //2^( 94 /128-1) - 2^(- 94 /128-1), 2^(- 94 /128-1)
+        .quad 0x3FE132B341AD8761, 0x3FD32170FC4CD831  //2^( 95 /128-1) - 2^(- 95 /128-1), 2^(- 95 /128-1)
+        .quad 0x3FE165209441F823, 0x3FD306FE0A31B715  //2^( 96 /128-1) - 2^(- 96 /128-1), 2^(- 96 /128-1)
+        .quad 0x3FE197AF54EE9EBB, 0x3FD2ECAFA93E2F56  //2^( 97 /128-1) - 2^(- 97 /128-1), 2^(- 97 /128-1)
+        .quad 0x3FE1CA5FE4DD1475, 0x3FD2D285A6E4030B  //2^( 98 /128-1) - 2^(- 98 /128-1), 2^(- 98 /128-1)
+        .quad 0x3FE1FD32A577EC72, 0x3FD2B87FD0DAD990  //2^( 99 /128-1) - 2^(- 99 /128-1), 2^(- 99 /128-1)
+        .quad 0x3FE23027F86B6ED6, 0x3FD29E9DF51FDEE1  //2^( 100 /128-1) - 2^(- 100 /128-1), 2^(- 100 /128-1)
+        .quad 0x3FE263403FA65489, 0x3FD284DFE1F56381  //2^( 101 /128-1) - 2^(- 101 /128-1), 2^(- 101 /128-1)
+        .quad 0x3FE2967BDD5A8364, 0x3FD26B4565E27CDD  //2^( 102 /128-1) - 2^(- 102 /128-1), 2^(- 102 /128-1)
+        .quad 0x3FE2C9DB33FDCAE9, 0x3FD251CE4FB2A63F  //2^( 103 /128-1) - 2^(- 103 /128-1), 2^(- 103 /128-1)
+        .quad 0x3FE2FD5EA64AA180, 0x3FD2387A6E756238  //2^( 104 /128-1) - 2^(- 104 /128-1), 2^(- 104 /128-1)
+        .quad 0x3FE331069740E22F, 0x3FD21F49917DDC96  //2^( 105 /128-1) - 2^(- 105 /128-1), 2^(- 105 /128-1)
+        .quad 0x3FE364D36A268AE0, 0x3FD2063B88628CD6  //2^( 106 /128-1) - 2^(- 106 /128-1), 2^(- 106 /128-1)
+        .quad 0x3FE398C582887B27, 0x3FD1ED5022FCD91D  //2^( 107 /128-1) - 2^(- 107 /128-1), 2^(- 107 /128-1)
+        .quad 0x3FE3CCDD443B3394, 0x3FD1D4873168B9AA  //2^( 108 /128-1) - 2^(- 108 /128-1), 2^(- 108 /128-1)
+        .quad 0x3FE4011B135B9590, 0x3FD1BBE084045CD4  //2^( 109 /128-1) - 2^(- 109 /128-1), 2^(- 109 /128-1)
+        .quad 0x3FE4357F544FA3C1, 0x3FD1A35BEB6FCB75  //2^( 110 /128-1) - 2^(- 110 /128-1), 2^(- 110 /128-1)
+        .quad 0x3FE46A0A6BC742FD, 0x3FD18AF9388C8DEA  //2^( 111 /128-1) - 2^(- 111 /128-1), 2^(- 111 /128-1)
+        .quad 0x3FE49EBCBEBCFBCA, 0x3FD172B83C7D517B  //2^( 112 /128-1) - 2^(- 112 /128-1), 2^(- 112 /128-1)
+        .quad 0x3FE4D396B276BC6F, 0x3FD15A98C8A58E51  //2^( 113 /128-1) - 2^(- 113 /128-1), 2^(- 113 /128-1)
+        .quad 0x3FE50898AC869B96, 0x3FD1429AAEA92DE0  //2^( 114 /128-1) - 2^(- 114 /128-1), 2^(- 114 /128-1)
+        .quad 0x3FE53DC312CB9B7A, 0x3FD12ABDC06C31CC  //2^( 115 /128-1) - 2^(- 115 /128-1), 2^(- 115 /128-1)
+        .quad 0x3FE573164B726DB6, 0x3FD11301D0125B51  //2^( 116 /128-1) - 2^(- 116 /128-1), 2^(- 116 /128-1)
+        .quad 0x3FE5A892BCF6379B, 0x3FD0FB66AFFED31B  //2^( 117 /128-1) - 2^(- 117 /128-1), 2^(- 117 /128-1)
+        .quad 0x3FE5DE38CE215725, 0x3FD0E3EC32D3D1A2  //2^( 118 /128-1) - 2^(- 118 /128-1), 2^(- 118 /128-1)
+        .quad 0x3FE61408E60E2888, 0x3FD0CC922B7247F7  //2^( 119 /128-1) - 2^(- 119 /128-1), 2^(- 119 /128-1)
+        .quad 0x3FE64A036C27CC52, 0x3FD0B5586CF9890F  //2^( 120 /128-1) - 2^(- 120 /128-1), 2^(- 120 /128-1)
+        .quad 0x3FE68028C82AEE2F, 0x3FD09E3ECAC6F383  //2^( 121 /128-1) - 2^(- 121 /128-1), 2^(- 121 /128-1)
+        .quad 0x3FE6B67962268C43, 0x3FD0874518759BC8  //2^( 122 /128-1) - 2^(- 122 /128-1), 2^(- 122 /128-1)
+        .quad 0x3FE6ECF5A27CBF28, 0x3FD0706B29DDF6DE  //2^( 123 /128-1) - 2^(- 123 /128-1), 2^(- 123 /128-1)
+        .quad 0x3FE7239DF1E38286, 0x3FD059B0D3158574  //2^( 124 /128-1) - 2^(- 124 /128-1), 2^(- 124 /128-1)
+        .quad 0x3FE75A72B9657E51, 0x3FD04315E86E7F85  //2^( 125 /128-1) - 2^(- 125 /128-1), 2^(- 125 /128-1)
+        .quad 0x3FE791746262D0A8, 0x3FD02C9A3E778061  //2^( 126 /128-1) - 2^(- 126 /128-1), 2^(- 126 /128-1)
+        .quad 0x3FE7C8A35691D856, 0x3FD0163DA9FB3335 //2^( 127 /128-1) - 2^(- 127 /128-1), 2^(- 127 /128-1)
+        .align 16
+        .quad 0x42C8000000000000, 0x42C8000000000000 /* _dbShifter = 1.5 * 2^(52-k)*/
+        .align 16
+        .long 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99         /* _iDomainRange 0x40861d9ac12a3e85 =(1021*2^K-0.5)*log(2)/2^K -needed for quick exp*/
+        .align 16
+        .quad 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD /* _dPC2 */
+        .align 16
+        .quad 0x3FC55555555554AD, 0x3FC55555555554AD /* _dPC3 */
+        .align 16
+        .quad 0x3FA55555CF16D299, 0x3FA55555CF16D299 /* _dPC4 */
+        .align 16
+        .quad 0x3F8111115712F425, 0x3F8111115712F425 /* _dPC5 */
+        .align 16
+        .quad 0x000000000000007f, 0x000000000000007f /* _lIndexMask */
+        .align 16
+        .type	__svml_dsinh_data_internal,@object
+        .size	__svml_dsinh_data_internal,.-__svml_dsinh_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core-sse.S
new file mode 100644
index 0000000000..ae531575fe
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized sinh, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN4v_sinh _ZGVdN4v_sinh_sse_wrapper
+#include "../svml_d_sinh4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core.c
new file mode 100644
index 0000000000..bdf10b664b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized sinh, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN4v_sinh
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN4v_sinh, __GI__ZGVdN4v_sinh, __redirect__ZGVdN4v_sinh)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core_avx2.S
new file mode 100644
index 0000000000..27b50d31a8
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core_avx2.S
@@ -0,0 +1,470 @@
+/* Function sinh vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute sinh(x) as (exp(x)-exp(-x))/2,
+ *   where exp is calculated as
+ *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
+ *
+ *   Special cases:
+ *
+ *   sinh(NaN) = quiet NaN, and raise invalid exception
+ *   sinh(INF) = that INF
+ *   sinh(x)   = x for subnormals
+ *   sinh(x) overflows for big x and returns MAXLOG+log(2)
+ *
+ */
+
+/* Offsets for data table __svml_dsinh_data_internal
+ */
+#define _dbInvLn2                     	0
+#define _dbLn2hi                      	32
+#define _dbLn2lo                      	64
+#define _dSign                        	96
+#define _dbT                          	128
+#define _dbShifter                    	2176
+#define _iDomainRange                 	2208
+#define _dPC2                         	2240
+#define _dPC3                         	2272
+#define _dPC4                         	2304
+#define _dPC5                         	2336
+#define _lIndexMask                   	2368
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN4v_sinh_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+        lea       _dbT+8+__svml_dsinh_data_internal(%rip), %r8
+        vmovupd   _dbShifter+__svml_dsinh_data_internal(%rip), %ymm12
+
+/*
+ *  Load argument
+ * dM = x*2^K/log(2) + RShifter
+ */
+        vmovupd   _dbInvLn2+__svml_dsinh_data_internal(%rip), %ymm5
+        vmovupd   _dbLn2hi+__svml_dsinh_data_internal(%rip), %ymm13
+        vmovapd   %ymm0, %ymm8
+
+/*
+ * VLOAD_CONST( D, dPC[0],         TAB._dPC1 );
+ *  Abs argument
+ */
+        vandpd    _dSign+__svml_dsinh_data_internal(%rip), %ymm8, %ymm7
+        vxorpd    %ymm8, %ymm7, %ymm6
+        vfmadd213pd %ymm12, %ymm6, %ymm5
+
+/*
+ *  R
+ * dN = dM - RShifter
+ */
+        vsubpd    %ymm12, %ymm5, %ymm3
+
+/*
+ *  Index and lookup
+ * j
+ */
+        vandps    _lIndexMask+__svml_dsinh_data_internal(%rip), %ymm5, %ymm4
+
+/*
+ * Check for overflow\underflow
+ *
+ */
+        vextractf128 $1, %ymm6, %xmm9
+        vshufps   $221, %xmm9, %xmm6, %xmm10
+
+/* dR = dX - dN*Log2_hi/2^K */
+        vfnmadd231pd %ymm13, %ymm3, %ymm6
+        vpcmpgtd  _iDomainRange+__svml_dsinh_data_internal(%rip), %xmm10, %xmm11
+        vmovmskps %xmm11, %eax
+
+/* dR = (dX - dN*Log2_hi/2^K) - dN*Log2_lo/2^K */
+        vfnmadd231pd _dbLn2lo+__svml_dsinh_data_internal(%rip), %ymm3, %ymm6
+        vextractf128 $1, %ymm4, %xmm0
+        vmovd     %xmm4, %edx
+        vmovd     %xmm0, %esi
+        shll      $4, %edx
+        vpextrd   $2, %xmm4, %ecx
+
+/* split j and N */
+        vxorps    %ymm4, %ymm5, %ymm3
+        shll      $4, %esi
+        vpextrd   $2, %xmm0, %edi
+        shll      $4, %ecx
+
+/*
+ *  G1,G2,G3: dTdif,dTn * 2^N,2^(-N)
+ * lM now is an EXP(2^N)
+ */
+        vpsllq    $45, %ymm3, %ymm4
+        vmovq     (%rdx,%r8), %xmm14
+        vmovq     (%rsi,%r8), %xmm1
+        vmovhpd   (%rcx,%r8), %xmm14, %xmm15
+        shll      $4, %edi
+        vmovhpd   (%rdi,%r8), %xmm1, %xmm2
+
+/* dR2 = dR^2 */
+        vmulpd    %ymm6, %ymm6, %ymm1
+        vmovq     -8(%rdx,%r8), %xmm9
+        vmovq     -8(%rsi,%r8), %xmm11
+        vmovhpd   -8(%rcx,%r8), %xmm9, %xmm10
+        vmovhpd   -8(%rdi,%r8), %xmm11, %xmm12
+        vinsertf128 $1, %xmm2, %ymm15, %ymm2
+
+/*  */
+        vpaddq    %ymm4, %ymm2, %ymm5
+
+/*  */
+        vpsubq    %ymm4, %ymm2, %ymm14
+
+/* dG3 = dTn*2^N + dTn*2^-N */
+        vaddpd    %ymm14, %ymm5, %ymm2
+
+/* dG2 = dTn*2^N - dTn*2^-N */
+        vsubpd    %ymm14, %ymm5, %ymm14
+
+/*
+ * sinh(r) = r*((a1=1)+r^2*(a3+r^2*a5)) = r + r*(r^2*(a3+r^2*a5)) ....
+ * dSinh_r = (a3+r^2*a5)
+ */
+        vmovupd   _dPC5+__svml_dsinh_data_internal(%rip), %ymm5
+        vfmadd213pd _dPC3+__svml_dsinh_data_internal(%rip), %ymm1, %ymm5
+        vinsertf128 $1, %xmm12, %ymm10, %ymm13
+        vpaddq    %ymm4, %ymm13, %ymm0
+
+/* dSinh_r = r^2*(a3+r^2*a5) */
+        vmulpd    %ymm5, %ymm1, %ymm4
+
+/* dG2 += dG1 */
+        vaddpd    %ymm14, %ymm0, %ymm3
+
+/* dG1 += dG3 */
+        vaddpd    %ymm2, %ymm0, %ymm0
+
+/* dSinh_r = r + r*(r^2*(a3+r^2*a5)) */
+        vfmadd213pd %ymm6, %ymm6, %ymm4
+
+/*
+ * poly(r) = (dG2+dG1)+dG3*sinh(dR)+dG1*sinh(dR)+(dG1+dG2)*dR2*(a2 +a4*dR2)
+ * dOut = (a2 +a4*dR2)
+ */
+        vmovupd   _dPC4+__svml_dsinh_data_internal(%rip), %ymm6
+        vfmadd213pd _dPC2+__svml_dsinh_data_internal(%rip), %ymm1, %ymm6
+
+/* dOut = dR2*(a2 +a4*dR2) */
+        vmulpd    %ymm6, %ymm1, %ymm1
+
+/* dOut = dG2*dR2*(a2 +a4*dR2) */
+        vmulpd    %ymm3, %ymm1, %ymm6
+
+/* dOut = dG1*sinh(dR)+dG2*dR2*(a2 +a4*dR2) */
+        vfmadd213pd %ymm6, %ymm0, %ymm4
+
+/* dOut = dG2 + dG1*sinh(dR)+dG2*dR2*(a2 +a4*dR2) */
+        vaddpd    %ymm4, %ymm3, %ymm5
+
+/*  Ret H  */
+        vorpd     %ymm5, %ymm7, %ymm0
+        testl     %eax, %eax
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm8
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovupd   %ymm8, 32(%rsp)
+        vmovupd   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0
+
+        xorl      %edx, %edx
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovupd   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      sinh@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 64(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
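+
+/* Hedged sketch (scalar C, illustration only) of the special-value fallback
+   implemented above.  The helper name fixup_lanes is hypothetical; the kernel
+   does the same work on the vectors spilled to 32(%rsp)/64(%rsp) and the
+   range mask kept in %r13d, calling sinh@PLT one flagged lane at a time.
+
+     #include <math.h>
+
+     static void fixup_lanes (const double in[4], double out[4],
+                              unsigned int mask)
+     {
+       for (int i = 0; i < 4; i++)       // one mask bit per vector lane
+         if (mask & (1u << i))           // lane flagged by the range check
+           out[i] = sinh (in[i]);        // recompute with the scalar libm
+     }
+ */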
+END(_ZGVdN4v_sinh_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_dsinh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(32)) VUINT32 _dbInvLn2[4][2];
+        __declspec(align(32)) VUINT32 _dbLn2hi[4][2];
+        __declspec(align(32)) VUINT32 _dbLn2lo[4][2];
+        __declspec(align(32)) VUINT32 _dSign[4][2];                //0x8000000000000000
+        __declspec(align(32)) VUINT32 _dbT[(1<<7)][2][2]; //precalc poly coeff
+        __declspec(align(32)) VUINT32 _dbShifter[4][2];
+        __declspec(align(32)) VUINT32 _iDomainRange[8][1];
+        __declspec(align(32)) VUINT32 _dPC2[4][2];
+        __declspec(align(32)) VUINT32 _dPC3[4][2];
+        __declspec(align(32)) VUINT32 _dPC4[4][2];
+        __declspec(align(32)) VUINT32 _dPC5[4][2];
+        __declspec(align(32)) VUINT32 _lIndexMask[4][2];
+} __svml_dsinh_data_internal;
+#endif
+__svml_dsinh_data_internal:
+        .quad 0x3FF71547652B82FE, 0x3FF71547652B82FE, 0x3FF71547652B82FE, 0x3FF71547652B82FE /* _dbInvLn2 = 1/log(2) */
+        .align 32
+        .quad 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000 /* _dbLn2hi  = log(2) hi*/
+        .align 32
+        .quad 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A /* _dbLn2lo  = log(2) lo*/
+        .align 32
+        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000 /* _dSign */
+        //_dbT
+        .align 32
+        .quad 0x0000000000000000, 0x3FE0000000000000  //2^( 0 /128-1) - 2^(- 0 /128-1), 2^(- 0 /128-1)
+        .quad 0x3F762E4A19BD1E74, 0x3FDFD3C22B8F71F1  //2^( 1 /128-1) - 2^(- 1 /128-1), 2^(- 1 /128-1)
+        .quad 0x3F862E5F6A0DFD36, 0x3FDFA7C1819E90D8  //2^( 2 /128-1) - 2^(- 2 /128-1), 2^(- 2 /128-1)
+        .quad 0x3F90A2E234040F5F, 0x3FDF7BFDAD9CBE14  //2^( 3 /128-1) - 2^(- 3 /128-1), 2^(- 3 /128-1)
+        .quad 0x3F962EB4ABCC5A81, 0x3FDF50765B6E4540  //2^( 4 /128-1) - 2^(- 4 /128-1), 2^(- 4 /128-1)
+        .quad 0x3F9BBAB1C5033244, 0x3FDF252B376BBA97  //2^( 5 /128-1) - 2^(- 5 /128-1), 2^(- 5 /128-1)
+        .quad 0x3FA0A372144EEB45, 0x3FDEFA1BEE615A27  //2^( 6 /128-1) - 2^(- 6 /128-1), 2^(- 6 /128-1)
+        .quad 0x3FA369AB3FFBF8B0, 0x3FDECF482D8E67F1  //2^( 7 /128-1) - 2^(- 7 /128-1), 2^(- 7 /128-1)
+        .quad 0x3FA63009BA740A2A, 0x3FDEA4AFA2A490DA  //2^( 8 /128-1) - 2^(- 8 /128-1), 2^(- 8 /128-1)
+        .quad 0x3FA8F692D8EA1B5A, 0x3FDE7A51FBC74C83  //2^( 9 /128-1) - 2^(- 9 /128-1), 2^(- 9 /128-1)
+        .quad 0x3FABBD4BF0E31A6F, 0x3FDE502EE78B3FF6  //2^( 10 /128-1) - 2^(- 10 /128-1), 2^(- 10 /128-1)
+        .quad 0x3FAE843A5840286A, 0x3FDE264614F5A129  //2^( 11 /128-1) - 2^(- 11 /128-1), 2^(- 11 /128-1)
+        .quad 0x3FB0A5B1B2A46D0A, 0x3FDDFC97337B9B5F  //2^( 12 /128-1) - 2^(- 12 /128-1), 2^(- 12 /128-1)
+        .quad 0x3FB20966375ABCDF, 0x3FDDD321F301B460  //2^( 13 /128-1) - 2^(- 13 /128-1), 2^(- 13 /128-1)
+        .quad 0x3FB36D3D65DCA4E8, 0x3FDDA9E603DB3285  //2^( 14 /128-1) - 2^(- 14 /128-1), 2^(- 14 /128-1)
+        .quad 0x3FB4D139EA06642A, 0x3FDD80E316C98398  //2^( 15 /128-1) - 2^(- 15 /128-1), 2^(- 15 /128-1)
+        .quad 0x3FB6355E6FFBF9BA, 0x3FDD5818DCFBA487  //2^( 16 /128-1) - 2^(- 16 /128-1), 2^(- 16 /128-1)
+        .quad 0x3FB799ADA42E4788, 0x3FDD2F87080D89F2  //2^( 17 /128-1) - 2^(- 17 /128-1), 2^(- 17 /128-1)
+        .quad 0x3FB8FE2A336035BC, 0x3FDD072D4A07897C  //2^( 18 /128-1) - 2^(- 18 /128-1), 2^(- 18 /128-1)
+        .quad 0x3FBA62D6CAABD6B6, 0x3FDCDF0B555DC3FA  //2^( 19 /128-1) - 2^(- 19 /128-1), 2^(- 19 /128-1)
+        .quad 0x3FBBC7B617878BAF, 0x3FDCB720DCEF9069  //2^( 20 /128-1) - 2^(- 20 /128-1), 2^(- 20 /128-1)
+        .quad 0x3FBD2CCAC7CB2A11, 0x3FDC8F6D9406E7B5  //2^( 21 /128-1) - 2^(- 21 /128-1), 2^(- 21 /128-1)
+        .quad 0x3FBE921789B52185, 0x3FDC67F12E57D14B  //2^( 22 /128-1) - 2^(- 22 /128-1), 2^(- 22 /128-1)
+        .quad 0x3FBFF79F0BEFA2C7, 0x3FDC40AB5FFFD07A  //2^( 23 /128-1) - 2^(- 23 /128-1), 2^(- 23 /128-1)
+        .quad 0x3FC0AEB1FECAE3A9, 0x3FDC199BDD85529C  //2^( 24 /128-1) - 2^(- 24 /128-1), 2^(- 24 /128-1)
+        .quad 0x3FC161B4871C5CEC, 0x3FDBF2C25BD71E09  //2^( 25 /128-1) - 2^(- 25 /128-1), 2^(- 25 /128-1)
+        .quad 0x3FC214D876F26FD0, 0x3FDBCC1E904BC1D2  //2^( 26 /128-1) - 2^(- 26 /128-1), 2^(- 26 /128-1)
+        .quad 0x3FC2C81F2693816F, 0x3FDBA5B030A1064A  //2^( 27 /128-1) - 2^(- 27 /128-1), 2^(- 27 /128-1)
+        .quad 0x3FC37B89EE88BEF7, 0x3FDB7F76F2FB5E47  //2^( 28 /128-1) - 2^(- 28 /128-1), 2^(- 28 /128-1)
+        .quad 0x3FC42F1A27A0B3CD, 0x3FDB59728DE5593A  //2^( 29 /128-1) - 2^(- 29 /128-1), 2^(- 29 /128-1)
+        .quad 0x3FC4E2D12AF1E037, 0x3FDB33A2B84F15FB  //2^( 30 /128-1) - 2^(- 30 /128-1), 2^(- 30 /128-1)
+        .quad 0x3FC596B051DD508D, 0x3FDB0E07298DB666  //2^( 31 /128-1) - 2^(- 31 /128-1), 2^(- 31 /128-1)
+        .quad 0x3FC64AB8F61134FA, 0x3FDAE89F995AD3AD  //2^( 32 /128-1) - 2^(- 32 /128-1), 2^(- 32 /128-1)
+        .quad 0x3FC6FEEC718B79D1, 0x3FDAC36BBFD3F37A  //2^( 33 /128-1) - 2^(- 33 /128-1), 2^(- 33 /128-1)
+        .quad 0x3FC7B34C1E9C607F, 0x3FDA9E6B5579FDBF  //2^( 34 /128-1) - 2^(- 34 /128-1), 2^(- 34 /128-1)
+        .quad 0x3FC867D957E91912, 0x3FDA799E1330B358  //2^( 35 /128-1) - 2^(- 35 /128-1), 2^(- 35 /128-1)
+        .quad 0x3FC91C95786E5C72, 0x3FDA5503B23E255D  //2^( 36 /128-1) - 2^(- 36 /128-1), 2^(- 36 /128-1)
+        .quad 0x3FC9D181DB83072F, 0x3FDA309BEC4A2D33  //2^( 37 /128-1) - 2^(- 37 /128-1), 2^(- 37 /128-1)
+        .quad 0x3FCA869FDCDAB512, 0x3FDA0C667B5DE565  //2^( 38 /128-1) - 2^(- 38 /128-1), 2^(- 38 /128-1)
+        .quad 0x3FCB3BF0D8885D4C, 0x3FD9E86319E32323  //2^( 39 /128-1) - 2^(- 39 /128-1), 2^(- 39 /128-1)
+        .quad 0x3FCBF1762B00EF69, 0x3FD9C49182A3F090  //2^( 40 /128-1) - 2^(- 40 /128-1), 2^(- 40 /128-1)
+        .quad 0x3FCCA731311DF0FB, 0x3FD9A0F170CA07BA  //2^( 41 /128-1) - 2^(- 41 /128-1), 2^(- 41 /128-1)
+        .quad 0x3FCD5D2348201C09, 0x3FD97D829FDE4E50  //2^( 42 /128-1) - 2^(- 42 /128-1), 2^(- 42 /128-1)
+        .quad 0x3FCE134DCDB1FE3E, 0x3FD95A44CBC8520F  //2^( 43 /128-1) - 2^(- 43 /128-1), 2^(- 43 /128-1)
+        .quad 0x3FCEC9B21FEA98EA, 0x3FD93737B0CDC5E5  //2^( 44 /128-1) - 2^(- 44 /128-1), 2^(- 44 /128-1)
+        .quad 0x3FCF80519D5001D3, 0x3FD9145B0B91FFC6  //2^( 45 /128-1) - 2^(- 45 /128-1), 2^(- 45 /128-1)
+        .quad 0x3FD01B96D26D026A, 0x3FD8F1AE99157736  //2^( 46 /128-1) - 2^(- 46 /128-1), 2^(- 46 /128-1)
+        .quad 0x3FD07723CAFA6331, 0x3FD8CF3216B5448C  //2^( 47 /128-1) - 2^(- 47 /128-1), 2^(- 47 /128-1)
+        .quad 0x3FD0D2D06841B373, 0x3FD8ACE5422AA0DB  //2^( 48 /128-1) - 2^(- 48 /128-1), 2^(- 48 /128-1)
+        .quad 0x3FD12E9D5A715381, 0x3FD88AC7D98A6699  //2^( 49 /128-1) - 2^(- 49 /128-1), 2^(- 49 /128-1)
+        .quad 0x3FD18A8B51F5C661, 0x3FD868D99B4492ED  //2^( 50 /128-1) - 2^(- 50 /128-1), 2^(- 50 /128-1)
+        .quad 0x3FD1E69AFF7B04D7, 0x3FD8471A4623C7AD  //2^( 51 /128-1) - 2^(- 51 /128-1), 2^(- 51 /128-1)
+        .quad 0x3FD242CD13EDD0F1, 0x3FD82589994CCE13  //2^( 52 /128-1) - 2^(- 52 /128-1), 2^(- 52 /128-1)
+        .quad 0x3FD29F22407D0A0C, 0x3FD80427543E1A12  //2^( 53 /128-1) - 2^(- 53 /128-1), 2^(- 53 /128-1)
+        .quad 0x3FD2FB9B369B0153, 0x3FD7E2F336CF4E62  //2^( 54 /128-1) - 2^(- 54 /128-1), 2^(- 54 /128-1)
+        .quad 0x3FD35838A7FECEC8, 0x3FD7C1ED0130C132  //2^( 55 /128-1) - 2^(- 55 /128-1), 2^(- 55 /128-1)
+        .quad 0x3FD3B4FB46A5A6CC, 0x3FD7A11473EB0187  //2^( 56 /128-1) - 2^(- 56 /128-1), 2^(- 56 /128-1)
+        .quad 0x3FD411E3C4D4302F, 0x3FD780694FDE5D3F  //2^( 57 /128-1) - 2^(- 57 /128-1), 2^(- 57 /128-1)
+        .quad 0x3FD46EF2D517DAC8, 0x3FD75FEB564267C9  //2^( 58 /128-1) - 2^(- 58 /128-1), 2^(- 58 /128-1)
+        .quad 0x3FD4CC292A48369E, 0x3FD73F9A48A58174  //2^( 59 /128-1) - 2^(- 59 /128-1), 2^(- 59 /128-1)
+        .quad 0x3FD5298777884B96, 0x3FD71F75E8EC5F74  //2^( 60 /128-1) - 2^(- 60 /128-1), 2^(- 60 /128-1)
+        .quad 0x3FD5870E7047F1BC, 0x3FD6FF7DF9519484  //2^( 61 /128-1) - 2^(- 61 /128-1), 2^(- 61 /128-1)
+        .quad 0x3FD5E4BEC8452A1A, 0x3FD6DFB23C651A2F  //2^( 62 /128-1) - 2^(- 62 /128-1), 2^(- 62 /128-1)
+        .quad 0x3FD64299338D7827, 0x3FD6C012750BDABF  //2^( 63 /128-1) - 2^(- 63 /128-1), 2^(- 63 /128-1)
+        .quad 0x3FD6A09E667F3BCD, 0x3FD6A09E667F3BCD  //2^( 64 /128-1) - 2^(- 64 /128-1), 2^(- 64 /128-1)
+        .quad 0x3FD6FECF15CB0C0B, 0x3FD68155D44CA973  //2^( 65 /128-1) - 2^(- 65 /128-1), 2^(- 65 /128-1)
+        .quad 0x3FD75D2BF6751239, 0x3FD6623882552225  //2^( 66 /128-1) - 2^(- 66 /128-1), 2^(- 66 /128-1)
+        .quad 0x3FD7BBB5BDD665E8, 0x3FD6434634CCC320  //2^( 67 /128-1) - 2^(- 67 /128-1), 2^(- 67 /128-1)
+        .quad 0x3FD81A6D219E6963, 0x3FD6247EB03A5585  //2^( 68 /128-1) - 2^(- 68 /128-1), 2^(- 68 /128-1)
+        .quad 0x3FD87952D7D426DF, 0x3FD605E1B976DC09  //2^( 69 /128-1) - 2^(- 69 /128-1), 2^(- 69 /128-1)
+        .quad 0x3FD8D86796D7AE49, 0x3FD5E76F15AD2148  //2^( 70 /128-1) - 2^(- 70 /128-1), 2^(- 70 /128-1)
+        .quad 0x3FD937AC156373C8, 0x3FD5C9268A5946B7  //2^( 71 /128-1) - 2^(- 71 /128-1), 2^(- 71 /128-1)
+        .quad 0x3FD997210A8DAEE4, 0x3FD5AB07DD485429  //2^( 72 /128-1) - 2^(- 72 /128-1), 2^(- 72 /128-1)
+        .quad 0x3FD9F6C72DC9BA68, 0x3FD58D12D497C7FD  //2^( 73 /128-1) - 2^(- 73 /128-1), 2^(- 73 /128-1)
+        .quad 0x3FDA569F36E974EA, 0x3FD56F4736B527DA  //2^( 74 /128-1) - 2^(- 74 /128-1), 2^(- 74 /128-1)
+        .quad 0x3FDAB6A9DE1EA215, 0x3FD551A4CA5D920F  //2^( 75 /128-1) - 2^(- 75 /128-1), 2^(- 75 /128-1)
+        .quad 0x3FDB16E7DBFC4CA3, 0x3FD5342B569D4F82  //2^( 76 /128-1) - 2^(- 76 /128-1), 2^(- 76 /128-1)
+        .quad 0x3FDB7759E9782918, 0x3FD516DAA2CF6642  //2^( 77 /128-1) - 2^(- 77 /128-1), 2^(- 77 /128-1)
+        .quad 0x3FDBD800BFEBF932, 0x3FD4F9B2769D2CA7  //2^( 78 /128-1) - 2^(- 78 /128-1), 2^(- 78 /128-1)
+        .quad 0x3FDC38DD1916F025, 0x3FD4DCB299FDDD0D  //2^( 79 /128-1) - 2^(- 79 /128-1), 2^(- 79 /128-1)
+        .quad 0x3FDC99EFAF1F1790, 0x3FD4BFDAD5362A27  //2^( 80 /128-1) - 2^(- 80 /128-1), 2^(- 80 /128-1)
+        .quad 0x3FDCFB393C92B539, 0x3FD4A32AF0D7D3DE  //2^( 81 /128-1) - 2^(- 81 /128-1), 2^(- 81 /128-1)
+        .quad 0x3FDD5CBA7C69B19C, 0x3FD486A2B5C13CD0  //2^( 82 /128-1) - 2^(- 82 /128-1), 2^(- 82 /128-1)
+        .quad 0x3FDDBE742A06FF34, 0x3FD46A41ED1D0057  //2^( 83 /128-1) - 2^(- 83 /128-1), 2^(- 83 /128-1)
+        .quad 0x3FDE2067013A029D, 0x3FD44E086061892D  //2^( 84 /128-1) - 2^(- 84 /128-1), 2^(- 84 /128-1)
+        .quad 0x3FDE8293BE3FFB87, 0x3FD431F5D950A897  //2^( 85 /128-1) - 2^(- 85 /128-1), 2^(- 85 /128-1)
+        .quad 0x3FDEE4FB1DC56E75, 0x3FD4160A21F72E2A  //2^( 86 /128-1) - 2^(- 86 /128-1), 2^(- 86 /128-1)
+        .quad 0x3FDF479DDCE78F58, 0x3FD3FA4504AC801C  //2^( 87 /128-1) - 2^(- 87 /128-1), 2^(- 87 /128-1)
+        .quad 0x3FDFAA7CB935ACFE, 0x3FD3DEA64C123422  //2^( 88 /128-1) - 2^(- 88 /128-1), 2^(- 88 /128-1)
+        .quad 0x3FE006CC38594EB1, 0x3FD3C32DC313A8E5  //2^( 89 /128-1) - 2^(- 89 /128-1), 2^(- 89 /128-1)
+        .quad 0x3FE03878E0EB1569, 0x3FD3A7DB34E59FF7  //2^( 90 /128-1) - 2^(- 90 /128-1), 2^(- 90 /128-1)
+        .quad 0x3FE06A44B5C74101, 0x3FD38CAE6D05D866  //2^( 91 /128-1) - 2^(- 91 /128-1), 2^(- 91 /128-1)
+        .quad 0x3FE09C3016A0D077, 0x3FD371A7373AA9CB  //2^( 92 /128-1) - 2^(- 92 /128-1), 2^(- 92 /128-1)
+        .quad 0x3FE0CE3B63676360, 0x3FD356C55F929FF1  //2^( 93 /128-1) - 2^(- 93 /128-1), 2^(- 93 /128-1)
+        .quad 0x3FE10066FC47F240, 0x3FD33C08B26416FF  //2^( 94 /128-1) - 2^(- 94 /128-1), 2^(- 94 /128-1)
+        .quad 0x3FE132B341AD8761, 0x3FD32170FC4CD831  //2^( 95 /128-1) - 2^(- 95 /128-1), 2^(- 95 /128-1)
+        .quad 0x3FE165209441F823, 0x3FD306FE0A31B715  //2^( 96 /128-1) - 2^(- 96 /128-1), 2^(- 96 /128-1)
+        .quad 0x3FE197AF54EE9EBB, 0x3FD2ECAFA93E2F56  //2^( 97 /128-1) - 2^(- 97 /128-1), 2^(- 97 /128-1)
+        .quad 0x3FE1CA5FE4DD1475, 0x3FD2D285A6E4030B  //2^( 98 /128-1) - 2^(- 98 /128-1), 2^(- 98 /128-1)
+        .quad 0x3FE1FD32A577EC72, 0x3FD2B87FD0DAD990  //2^( 99 /128-1) - 2^(- 99 /128-1), 2^(- 99 /128-1)
+        .quad 0x3FE23027F86B6ED6, 0x3FD29E9DF51FDEE1  //2^( 100 /128-1) - 2^(- 100 /128-1), 2^(- 100 /128-1)
+        .quad 0x3FE263403FA65489, 0x3FD284DFE1F56381  //2^( 101 /128-1) - 2^(- 101 /128-1), 2^(- 101 /128-1)
+        .quad 0x3FE2967BDD5A8364, 0x3FD26B4565E27CDD  //2^( 102 /128-1) - 2^(- 102 /128-1), 2^(- 102 /128-1)
+        .quad 0x3FE2C9DB33FDCAE9, 0x3FD251CE4FB2A63F  //2^( 103 /128-1) - 2^(- 103 /128-1), 2^(- 103 /128-1)
+        .quad 0x3FE2FD5EA64AA180, 0x3FD2387A6E756238  //2^( 104 /128-1) - 2^(- 104 /128-1), 2^(- 104 /128-1)
+        .quad 0x3FE331069740E22F, 0x3FD21F49917DDC96  //2^( 105 /128-1) - 2^(- 105 /128-1), 2^(- 105 /128-1)
+        .quad 0x3FE364D36A268AE0, 0x3FD2063B88628CD6  //2^( 106 /128-1) - 2^(- 106 /128-1), 2^(- 106 /128-1)
+        .quad 0x3FE398C582887B27, 0x3FD1ED5022FCD91D  //2^( 107 /128-1) - 2^(- 107 /128-1), 2^(- 107 /128-1)
+        .quad 0x3FE3CCDD443B3394, 0x3FD1D4873168B9AA  //2^( 108 /128-1) - 2^(- 108 /128-1), 2^(- 108 /128-1)
+        .quad 0x3FE4011B135B9590, 0x3FD1BBE084045CD4  //2^( 109 /128-1) - 2^(- 109 /128-1), 2^(- 109 /128-1)
+        .quad 0x3FE4357F544FA3C1, 0x3FD1A35BEB6FCB75  //2^( 110 /128-1) - 2^(- 110 /128-1), 2^(- 110 /128-1)
+        .quad 0x3FE46A0A6BC742FD, 0x3FD18AF9388C8DEA  //2^( 111 /128-1) - 2^(- 111 /128-1), 2^(- 111 /128-1)
+        .quad 0x3FE49EBCBEBCFBCA, 0x3FD172B83C7D517B  //2^( 112 /128-1) - 2^(- 112 /128-1), 2^(- 112 /128-1)
+        .quad 0x3FE4D396B276BC6F, 0x3FD15A98C8A58E51  //2^( 113 /128-1) - 2^(- 113 /128-1), 2^(- 113 /128-1)
+        .quad 0x3FE50898AC869B96, 0x3FD1429AAEA92DE0  //2^( 114 /128-1) - 2^(- 114 /128-1), 2^(- 114 /128-1)
+        .quad 0x3FE53DC312CB9B7A, 0x3FD12ABDC06C31CC  //2^( 115 /128-1) - 2^(- 115 /128-1), 2^(- 115 /128-1)
+        .quad 0x3FE573164B726DB6, 0x3FD11301D0125B51  //2^( 116 /128-1) - 2^(- 116 /128-1), 2^(- 116 /128-1)
+        .quad 0x3FE5A892BCF6379B, 0x3FD0FB66AFFED31B  //2^( 117 /128-1) - 2^(- 117 /128-1), 2^(- 117 /128-1)
+        .quad 0x3FE5DE38CE215725, 0x3FD0E3EC32D3D1A2  //2^( 118 /128-1) - 2^(- 118 /128-1), 2^(- 118 /128-1)
+        .quad 0x3FE61408E60E2888, 0x3FD0CC922B7247F7  //2^( 119 /128-1) - 2^(- 119 /128-1), 2^(- 119 /128-1)
+        .quad 0x3FE64A036C27CC52, 0x3FD0B5586CF9890F  //2^( 120 /128-1) - 2^(- 120 /128-1), 2^(- 120 /128-1)
+        .quad 0x3FE68028C82AEE2F, 0x3FD09E3ECAC6F383  //2^( 121 /128-1) - 2^(- 121 /128-1), 2^(- 121 /128-1)
+        .quad 0x3FE6B67962268C43, 0x3FD0874518759BC8  //2^( 122 /128-1) - 2^(- 122 /128-1), 2^(- 122 /128-1)
+        .quad 0x3FE6ECF5A27CBF28, 0x3FD0706B29DDF6DE  //2^( 123 /128-1) - 2^(- 123 /128-1), 2^(- 123 /128-1)
+        .quad 0x3FE7239DF1E38286, 0x3FD059B0D3158574  //2^( 124 /128-1) - 2^(- 124 /128-1), 2^(- 124 /128-1)
+        .quad 0x3FE75A72B9657E51, 0x3FD04315E86E7F85  //2^( 125 /128-1) - 2^(- 125 /128-1), 2^(- 125 /128-1)
+        .quad 0x3FE791746262D0A8, 0x3FD02C9A3E778061  //2^( 126 /128-1) - 2^(- 126 /128-1), 2^(- 126 /128-1)
+        .quad 0x3FE7C8A35691D856, 0x3FD0163DA9FB3335 //2^( 127 /128-1) - 2^(- 127 /128-1), 2^(- 127 /128-1)
+        .align 32
+        .quad 0x42C8000000000000, 0x42C8000000000000, 0x42C8000000000000, 0x42C8000000000000 /* _dbShifter = 1.5 * 2^(52-k)*/
+        .align 32
+        .long 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99         /* _iDomainRange 0x40861d9ac12a3e85 =(1021*2^K-0.5)*log(2)/2^K -needed for quick exp*/
+        .align 32
+        .quad 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD /* _dPC2 */
+        .align 32
+        .quad 0x3FC55555555554AD, 0x3FC55555555554AD, 0x3FC55555555554AD, 0x3FC55555555554AD /* _dPC3 */
+        .align 32
+        .quad 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299 /* _dPC4 */
+        .align 32
+        .quad 0x3F8111115712F425, 0x3F8111115712F425, 0x3F8111115712F425, 0x3F8111115712F425 /* _dPC5 */
+        .align 32
+        .quad 0x000000000000007f, 0x000000000000007f, 0x000000000000007f, 0x000000000000007f /* _lIndexMask */
+        .align 32
+        .type	__svml_dsinh_data_internal,@object
+        .size	__svml_dsinh_data_internal,.-__svml_dsinh_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core-avx2.S
new file mode 100644
index 0000000000..d767d25080
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized sinh, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN8v_sinh _ZGVeN8v_sinh_avx2_wrapper
+#include "../svml_d_sinh8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core.c
new file mode 100644
index 0000000000..427d07bce2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized sinh, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN8v_sinh
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN8v_sinh, __GI__ZGVeN8v_sinh, __redirect__ZGVeN8v_sinh)
+  __attribute__ ((visibility ("hidden")));
+#endif
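+
+/* Usage sketch, illustrative only and not part of this file.  With GCC,
+   a loop such as the one below, compiled with something like
+   -O2 -fopenmp-simd -ffast-math -mavx512f, may be vectorized into calls to
+   _ZGVeN8v_sinh, which the IFUNC above resolves at run time.
+
+     #include <math.h>
+
+     void vsinh (const double *x, double *y, int n)
+     {
+       #pragma omp simd
+       for (int i = 0; i < n; i++)
+         y[i] = sinh (x[i]);
+     }
+ */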
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core_avx512.S
new file mode 100644
index 0000000000..d057d6c7eb
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core_avx512.S
@@ -0,0 +1,461 @@
+/* Function sinh vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute sinh(x) as (exp(x)-exp(-x))/2,
+ *   where exp is calculated as
+ *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
+ *
+ *   Special cases:
+ *
+ *   sinh(NaN) = quiet NaN, and raise invalid exception
+ *   sinh(INF) = that INF
+ *   sinh(x)   = x for subnormals
+ *   sinh(x) overflows for big x and returns MAXLOG+log(2)
+ *
+ */
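+
+/* Reference sketch (scalar C, illustration only) of the reduction described
+   above; the name sinh_ref and its use of exp/exp2 are assumptions made for
+   clarity.  The vector code below avoids exp entirely: the table _dbT stores
+   pairs 2^(j/128-1)-2^(-j/128-1) and 2^(-j/128-1), and the 2^N scale is
+   applied by adding N into the exponent field.
+
+     #include <math.h>
+
+     static double sinh_ref (double x)
+     {
+       const double ln2 = 0x1.62e42fefa39efp-1;
+       double ax = fabs (x);
+       long long n = llround (ax * 128.0 / ln2); // ax ~= n*ln2/128 + r
+       int j = (int) (n & 127);                  // table index
+       int N = (int) (n >> 7);                   // power-of-two scale
+       double r = ax - n * (ln2 / 128.0);        // reduced argument
+       double e = ldexp (exp2 (j / 128.0) * exp (r), N);  // exp(ax)
+       return copysign (0.5 * (e - 1.0 / e), x); // (exp(ax)-exp(-ax))/2
+     }
+ */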
+
+/* Offsets for data table __svml_dsinh_data_internal
+ */
+#define _dbInvLn2                     	0
+#define _dbLn2hi                      	64
+#define _dbLn2lo                      	128
+#define _dSign                        	192
+#define _dbT                          	256
+#define _dbShifter                    	2304
+#define _iDomainRange                 	2368
+#define _dPC2                         	2432
+#define _dPC3                         	2496
+#define _dPC4                         	2560
+#define _dPC5                         	2624
+#define _lIndexMask                   	2688
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN8v_sinh_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        lea       _dbT+8+__svml_dsinh_data_internal(%rip), %rax
+        vmovaps   %zmm0, %zmm8
+
+/*  Abs argument  */
+        vandpd    _dSign+__svml_dsinh_data_internal(%rip), %zmm8, %zmm7
+        vmovups   _dbShifter+__svml_dsinh_data_internal(%rip), %zmm13
+
+/*
+ *  Load argument
+ * dM = x*2^K/log(2) + RShifter
+ */
+        vmovups   _dbInvLn2+__svml_dsinh_data_internal(%rip), %zmm12
+        vmovups   _dbLn2hi+__svml_dsinh_data_internal(%rip), %zmm14
+        vmovups   _dPC5+__svml_dsinh_data_internal(%rip), %zmm6
+
+/* Polynomial coefficient a4 (_dPC4) */
+        vmovups   _dPC4+__svml_dsinh_data_internal(%rip), %zmm4
+        vxorpd    %zmm8, %zmm7, %zmm5
+        kxnorw    %k0, %k0, %k1
+        kxnorw    %k0, %k0, %k2
+        vfmadd213pd {rn-sae}, %zmm13, %zmm5, %zmm12
+
+/*
+ * Check for overflow/underflow
+ *
+ */
+        vpsrlq    $32, %zmm5, %zmm9
+
+/*
+ *  R
+ * dN = dM - RShifter
+ */
+        vsubpd    {rn-sae}, %zmm13, %zmm12, %zmm2
+        vpmovqd   %zmm9, %ymm10
+        vmovups   _dbLn2lo+__svml_dsinh_data_internal(%rip), %zmm9
+
+/* dR = dX - dN*Log2_hi/2^K */
+        vfnmadd231pd {rn-sae}, %zmm14, %zmm2, %zmm5
+
+/*
+ * sinh(r) = r*((a1=1)+r^2*(a3+r^2*a5)) = r + r*(r^2*(a3+r^2*a5)) ....
+ * dSinh_r = (a3+r^2*a5)
+ */
+        vmovups   _dPC3+__svml_dsinh_data_internal(%rip), %zmm14
+
+/* dR = (dX - dN*Log2_hi/2^K) - dN*Log2_lo/2^K */
+        vfnmadd231pd {rn-sae}, %zmm9, %zmm2, %zmm5
+        vpcmpgtd  _iDomainRange+__svml_dsinh_data_internal(%rip), %ymm10, %ymm11
+        vmovmskps %ymm11, %edx
+
+/* dR2 = dR^2 */
+        vmulpd    {rn-sae}, %zmm5, %zmm5, %zmm2
+        vfmadd231pd {rn-sae}, %zmm2, %zmm6, %zmm14
+
+/*
+ *  Index and lookup
+ * j
+ */
+        vpandq    _lIndexMask+__svml_dsinh_data_internal(%rip), %zmm12, %zmm15
+        vpsllq    $4, %zmm15, %zmm1
+        vpmovqd   %zmm1, %ymm0
+        vpxord    %zmm11, %zmm11, %zmm11
+        vpxord    %zmm10, %zmm10, %zmm10
+        vgatherdpd (%rax,%ymm0), %zmm11{%k1}
+        vgatherdpd -8(%rax,%ymm0), %zmm10{%k2}
+
+/* split j and N */
+        vpxorq    %zmm15, %zmm12, %zmm3
+
+/*
+ *  G1,G2,G3: dTdif,dTn * 2^N,2^(-N)
+ * lM now is an EXP(2^N)
+ */
+        vpsllq    $45, %zmm3, %zmm3
+        vpaddq    %zmm3, %zmm10, %zmm1
+
+/* dTn*2^N */
+        vpaddq    %zmm3, %zmm11, %zmm12
+
+/* dTn*2^-N */
+        vpsubq    %zmm3, %zmm11, %zmm13
+
+/* dSinh_r = r^2*(a3+r^2*a5) */
+        vmulpd    {rn-sae}, %zmm2, %zmm14, %zmm3
+
+/* dG2 = dTn*2^N - dTn*2^-N */
+        vsubpd    {rn-sae}, %zmm13, %zmm12, %zmm15
+
+/* dG3 = dTn*2^N + dTn*2^-N */
+        vaddpd    {rn-sae}, %zmm13, %zmm12, %zmm0
+
+/* dSinh_r = r + r*(r^2*(a3+r^2*a5)) */
+        vfmadd213pd {rn-sae}, %zmm5, %zmm5, %zmm3
+
+/*
+ * poly(r) = (dG2+dG1)+dG3*sinh(dR)+dG1*sinh(dR)+(dG1+dG2)*dR2*(a2 +a4*dR2)
+ * dOut = (a2 +a4*dR2)
+ */
+        vmovups   _dPC2+__svml_dsinh_data_internal(%rip), %zmm5
+
+/* dG1 += dG3 */
+        vaddpd    {rn-sae}, %zmm0, %zmm1, %zmm6
+        vfmadd231pd {rn-sae}, %zmm2, %zmm4, %zmm5
+
+/* dOut = dR2*(a2 +a4*dR2) */
+        vmulpd    {rn-sae}, %zmm2, %zmm5, %zmm4
+
+/* dG2 += dG1 */
+        vaddpd    {rn-sae}, %zmm15, %zmm1, %zmm2
+
+/* dOut = dG2*dR2*(a2 +a4*dR2) */
+        vmulpd    {rn-sae}, %zmm2, %zmm4, %zmm4
+
+/* dOut = dG1*sinh(dR)+dG2*dR2*(a2 +a4*dR2) */
+        vfmadd213pd {rn-sae}, %zmm4, %zmm6, %zmm3
+
+/* dOut = dG2 + dG1*sinh(dR)+dG2*dR2*(a2 +a4*dR2) */
+        vaddpd    {rn-sae}, %zmm2, %zmm3, %zmm0
+
+/*  Ret H  */
+        vorpd     %zmm0, %zmm7, %zmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm8
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm8, 64(%rsp)
+        vmovups   %zmm0, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     64(%rsp,%r14,8), %xmm0
+        call      sinh@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 128(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN8v_sinh_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_dsinh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(64)) VUINT32 _dbInvLn2[8][2];
+        __declspec(align(64)) VUINT32 _dbLn2hi[8][2];
+        __declspec(align(64)) VUINT32 _dbLn2lo[8][2];
+        __declspec(align(64)) VUINT32 _dSign[8][2];                //0x8000000000000000
+        __declspec(align(64)) VUINT32 _dbT[(1<<7)][2][2]; //precalc poly coeff
+        __declspec(align(64)) VUINT32 _dbShifter[8][2];
+        __declspec(align(64)) VUINT32 _iDomainRange[16][1];
+        __declspec(align(64)) VUINT32 _dPC2[8][2];
+        __declspec(align(64)) VUINT32 _dPC3[8][2];
+        __declspec(align(64)) VUINT32 _dPC4[8][2];
+        __declspec(align(64)) VUINT32 _dPC5[8][2];
+        __declspec(align(64)) VUINT32 _lIndexMask[8][2];
+} __svml_dsinh_data_internal;
+#endif
+__svml_dsinh_data_internal:
+        .quad 0x3FF71547652B82FE, 0x3FF71547652B82FE, 0x3FF71547652B82FE, 0x3FF71547652B82FE, 0x3FF71547652B82FE, 0x3FF71547652B82FE, 0x3FF71547652B82FE, 0x3FF71547652B82FE /* _dbInvLn2 = 1/log(2) */
+        .align 64
+        .quad 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000 /* _dbLn2hi  = log(2) hi*/
+        .align 64
+        .quad 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A /* _dbLn2lo  = log(2) lo*/
+        .align 64
+        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000 /* _dSign */
+        //_dbT
+        .align 64
+        .quad 0x0000000000000000, 0x3FE0000000000000  //2^( 0 /128-1) - 2^(- 0 /128-1), 2^(- 0 /128-1)
+        .quad 0x3F762E4A19BD1E74, 0x3FDFD3C22B8F71F1  //2^( 1 /128-1) - 2^(- 1 /128-1), 2^(- 1 /128-1)
+        .quad 0x3F862E5F6A0DFD36, 0x3FDFA7C1819E90D8  //2^( 2 /128-1) - 2^(- 2 /128-1), 2^(- 2 /128-1)
+        .quad 0x3F90A2E234040F5F, 0x3FDF7BFDAD9CBE14  //2^( 3 /128-1) - 2^(- 3 /128-1), 2^(- 3 /128-1)
+        .quad 0x3F962EB4ABCC5A81, 0x3FDF50765B6E4540  //2^( 4 /128-1) - 2^(- 4 /128-1), 2^(- 4 /128-1)
+        .quad 0x3F9BBAB1C5033244, 0x3FDF252B376BBA97  //2^( 5 /128-1) - 2^(- 5 /128-1), 2^(- 5 /128-1)
+        .quad 0x3FA0A372144EEB45, 0x3FDEFA1BEE615A27  //2^( 6 /128-1) - 2^(- 6 /128-1), 2^(- 6 /128-1)
+        .quad 0x3FA369AB3FFBF8B0, 0x3FDECF482D8E67F1  //2^( 7 /128-1) - 2^(- 7 /128-1), 2^(- 7 /128-1)
+        .quad 0x3FA63009BA740A2A, 0x3FDEA4AFA2A490DA  //2^( 8 /128-1) - 2^(- 8 /128-1), 2^(- 8 /128-1)
+        .quad 0x3FA8F692D8EA1B5A, 0x3FDE7A51FBC74C83  //2^( 9 /128-1) - 2^(- 9 /128-1), 2^(- 9 /128-1)
+        .quad 0x3FABBD4BF0E31A6F, 0x3FDE502EE78B3FF6  //2^( 10 /128-1) - 2^(- 10 /128-1), 2^(- 10 /128-1)
+        .quad 0x3FAE843A5840286A, 0x3FDE264614F5A129  //2^( 11 /128-1) - 2^(- 11 /128-1), 2^(- 11 /128-1)
+        .quad 0x3FB0A5B1B2A46D0A, 0x3FDDFC97337B9B5F  //2^( 12 /128-1) - 2^(- 12 /128-1), 2^(- 12 /128-1)
+        .quad 0x3FB20966375ABCDF, 0x3FDDD321F301B460  //2^( 13 /128-1) - 2^(- 13 /128-1), 2^(- 13 /128-1)
+        .quad 0x3FB36D3D65DCA4E8, 0x3FDDA9E603DB3285  //2^( 14 /128-1) - 2^(- 14 /128-1), 2^(- 14 /128-1)
+        .quad 0x3FB4D139EA06642A, 0x3FDD80E316C98398  //2^( 15 /128-1) - 2^(- 15 /128-1), 2^(- 15 /128-1)
+        .quad 0x3FB6355E6FFBF9BA, 0x3FDD5818DCFBA487  //2^( 16 /128-1) - 2^(- 16 /128-1), 2^(- 16 /128-1)
+        .quad 0x3FB799ADA42E4788, 0x3FDD2F87080D89F2  //2^( 17 /128-1) - 2^(- 17 /128-1), 2^(- 17 /128-1)
+        .quad 0x3FB8FE2A336035BC, 0x3FDD072D4A07897C  //2^( 18 /128-1) - 2^(- 18 /128-1), 2^(- 18 /128-1)
+        .quad 0x3FBA62D6CAABD6B6, 0x3FDCDF0B555DC3FA  //2^( 19 /128-1) - 2^(- 19 /128-1), 2^(- 19 /128-1)
+        .quad 0x3FBBC7B617878BAF, 0x3FDCB720DCEF9069  //2^( 20 /128-1) - 2^(- 20 /128-1), 2^(- 20 /128-1)
+        .quad 0x3FBD2CCAC7CB2A11, 0x3FDC8F6D9406E7B5  //2^( 21 /128-1) - 2^(- 21 /128-1), 2^(- 21 /128-1)
+        .quad 0x3FBE921789B52185, 0x3FDC67F12E57D14B  //2^( 22 /128-1) - 2^(- 22 /128-1), 2^(- 22 /128-1)
+        .quad 0x3FBFF79F0BEFA2C7, 0x3FDC40AB5FFFD07A  //2^( 23 /128-1) - 2^(- 23 /128-1), 2^(- 23 /128-1)
+        .quad 0x3FC0AEB1FECAE3A9, 0x3FDC199BDD85529C  //2^( 24 /128-1) - 2^(- 24 /128-1), 2^(- 24 /128-1)
+        .quad 0x3FC161B4871C5CEC, 0x3FDBF2C25BD71E09  //2^( 25 /128-1) - 2^(- 25 /128-1), 2^(- 25 /128-1)
+        .quad 0x3FC214D876F26FD0, 0x3FDBCC1E904BC1D2  //2^( 26 /128-1) - 2^(- 26 /128-1), 2^(- 26 /128-1)
+        .quad 0x3FC2C81F2693816F, 0x3FDBA5B030A1064A  //2^( 27 /128-1) - 2^(- 27 /128-1), 2^(- 27 /128-1)
+        .quad 0x3FC37B89EE88BEF7, 0x3FDB7F76F2FB5E47  //2^( 28 /128-1) - 2^(- 28 /128-1), 2^(- 28 /128-1)
+        .quad 0x3FC42F1A27A0B3CD, 0x3FDB59728DE5593A  //2^( 29 /128-1) - 2^(- 29 /128-1), 2^(- 29 /128-1)
+        .quad 0x3FC4E2D12AF1E037, 0x3FDB33A2B84F15FB  //2^( 30 /128-1) - 2^(- 30 /128-1), 2^(- 30 /128-1)
+        .quad 0x3FC596B051DD508D, 0x3FDB0E07298DB666  //2^( 31 /128-1) - 2^(- 31 /128-1), 2^(- 31 /128-1)
+        .quad 0x3FC64AB8F61134FA, 0x3FDAE89F995AD3AD  //2^( 32 /128-1) - 2^(- 32 /128-1), 2^(- 32 /128-1)
+        .quad 0x3FC6FEEC718B79D1, 0x3FDAC36BBFD3F37A  //2^( 33 /128-1) - 2^(- 33 /128-1), 2^(- 33 /128-1)
+        .quad 0x3FC7B34C1E9C607F, 0x3FDA9E6B5579FDBF  //2^( 34 /128-1) - 2^(- 34 /128-1), 2^(- 34 /128-1)
+        .quad 0x3FC867D957E91912, 0x3FDA799E1330B358  //2^( 35 /128-1) - 2^(- 35 /128-1), 2^(- 35 /128-1)
+        .quad 0x3FC91C95786E5C72, 0x3FDA5503B23E255D  //2^( 36 /128-1) - 2^(- 36 /128-1), 2^(- 36 /128-1)
+        .quad 0x3FC9D181DB83072F, 0x3FDA309BEC4A2D33  //2^( 37 /128-1) - 2^(- 37 /128-1), 2^(- 37 /128-1)
+        .quad 0x3FCA869FDCDAB512, 0x3FDA0C667B5DE565  //2^( 38 /128-1) - 2^(- 38 /128-1), 2^(- 38 /128-1)
+        .quad 0x3FCB3BF0D8885D4C, 0x3FD9E86319E32323  //2^( 39 /128-1) - 2^(- 39 /128-1), 2^(- 39 /128-1)
+        .quad 0x3FCBF1762B00EF69, 0x3FD9C49182A3F090  //2^( 40 /128-1) - 2^(- 40 /128-1), 2^(- 40 /128-1)
+        .quad 0x3FCCA731311DF0FB, 0x3FD9A0F170CA07BA  //2^( 41 /128-1) - 2^(- 41 /128-1), 2^(- 41 /128-1)
+        .quad 0x3FCD5D2348201C09, 0x3FD97D829FDE4E50  //2^( 42 /128-1) - 2^(- 42 /128-1), 2^(- 42 /128-1)
+        .quad 0x3FCE134DCDB1FE3E, 0x3FD95A44CBC8520F  //2^( 43 /128-1) - 2^(- 43 /128-1), 2^(- 43 /128-1)
+        .quad 0x3FCEC9B21FEA98EA, 0x3FD93737B0CDC5E5  //2^( 44 /128-1) - 2^(- 44 /128-1), 2^(- 44 /128-1)
+        .quad 0x3FCF80519D5001D3, 0x3FD9145B0B91FFC6  //2^( 45 /128-1) - 2^(- 45 /128-1), 2^(- 45 /128-1)
+        .quad 0x3FD01B96D26D026A, 0x3FD8F1AE99157736  //2^( 46 /128-1) - 2^(- 46 /128-1), 2^(- 46 /128-1)
+        .quad 0x3FD07723CAFA6331, 0x3FD8CF3216B5448C  //2^( 47 /128-1) - 2^(- 47 /128-1), 2^(- 47 /128-1)
+        .quad 0x3FD0D2D06841B373, 0x3FD8ACE5422AA0DB  //2^( 48 /128-1) - 2^(- 48 /128-1), 2^(- 48 /128-1)
+        .quad 0x3FD12E9D5A715381, 0x3FD88AC7D98A6699  //2^( 49 /128-1) - 2^(- 49 /128-1), 2^(- 49 /128-1)
+        .quad 0x3FD18A8B51F5C661, 0x3FD868D99B4492ED  //2^( 50 /128-1) - 2^(- 50 /128-1), 2^(- 50 /128-1)
+        .quad 0x3FD1E69AFF7B04D7, 0x3FD8471A4623C7AD  //2^( 51 /128-1) - 2^(- 51 /128-1), 2^(- 51 /128-1)
+        .quad 0x3FD242CD13EDD0F1, 0x3FD82589994CCE13  //2^( 52 /128-1) - 2^(- 52 /128-1), 2^(- 52 /128-1)
+        .quad 0x3FD29F22407D0A0C, 0x3FD80427543E1A12  //2^( 53 /128-1) - 2^(- 53 /128-1), 2^(- 53 /128-1)
+        .quad 0x3FD2FB9B369B0153, 0x3FD7E2F336CF4E62  //2^( 54 /128-1) - 2^(- 54 /128-1), 2^(- 54 /128-1)
+        .quad 0x3FD35838A7FECEC8, 0x3FD7C1ED0130C132  //2^( 55 /128-1) - 2^(- 55 /128-1), 2^(- 55 /128-1)
+        .quad 0x3FD3B4FB46A5A6CC, 0x3FD7A11473EB0187  //2^( 56 /128-1) - 2^(- 56 /128-1), 2^(- 56 /128-1)
+        .quad 0x3FD411E3C4D4302F, 0x3FD780694FDE5D3F  //2^( 57 /128-1) - 2^(- 57 /128-1), 2^(- 57 /128-1)
+        .quad 0x3FD46EF2D517DAC8, 0x3FD75FEB564267C9  //2^( 58 /128-1) - 2^(- 58 /128-1), 2^(- 58 /128-1)
+        .quad 0x3FD4CC292A48369E, 0x3FD73F9A48A58174  //2^( 59 /128-1) - 2^(- 59 /128-1), 2^(- 59 /128-1)
+        .quad 0x3FD5298777884B96, 0x3FD71F75E8EC5F74  //2^( 60 /128-1) - 2^(- 60 /128-1), 2^(- 60 /128-1)
+        .quad 0x3FD5870E7047F1BC, 0x3FD6FF7DF9519484  //2^( 61 /128-1) - 2^(- 61 /128-1), 2^(- 61 /128-1)
+        .quad 0x3FD5E4BEC8452A1A, 0x3FD6DFB23C651A2F  //2^( 62 /128-1) - 2^(- 62 /128-1), 2^(- 62 /128-1)
+        .quad 0x3FD64299338D7827, 0x3FD6C012750BDABF  //2^( 63 /128-1) - 2^(- 63 /128-1), 2^(- 63 /128-1)
+        .quad 0x3FD6A09E667F3BCD, 0x3FD6A09E667F3BCD  //2^( 64 /128-1) - 2^(- 64 /128-1), 2^(- 64 /128-1)
+        .quad 0x3FD6FECF15CB0C0B, 0x3FD68155D44CA973  //2^( 65 /128-1) - 2^(- 65 /128-1), 2^(- 65 /128-1)
+        .quad 0x3FD75D2BF6751239, 0x3FD6623882552225  //2^( 66 /128-1) - 2^(- 66 /128-1), 2^(- 66 /128-1)
+        .quad 0x3FD7BBB5BDD665E8, 0x3FD6434634CCC320  //2^( 67 /128-1) - 2^(- 67 /128-1), 2^(- 67 /128-1)
+        .quad 0x3FD81A6D219E6963, 0x3FD6247EB03A5585  //2^( 68 /128-1) - 2^(- 68 /128-1), 2^(- 68 /128-1)
+        .quad 0x3FD87952D7D426DF, 0x3FD605E1B976DC09  //2^( 69 /128-1) - 2^(- 69 /128-1), 2^(- 69 /128-1)
+        .quad 0x3FD8D86796D7AE49, 0x3FD5E76F15AD2148  //2^( 70 /128-1) - 2^(- 70 /128-1), 2^(- 70 /128-1)
+        .quad 0x3FD937AC156373C8, 0x3FD5C9268A5946B7  //2^( 71 /128-1) - 2^(- 71 /128-1), 2^(- 71 /128-1)
+        .quad 0x3FD997210A8DAEE4, 0x3FD5AB07DD485429  //2^( 72 /128-1) - 2^(- 72 /128-1), 2^(- 72 /128-1)
+        .quad 0x3FD9F6C72DC9BA68, 0x3FD58D12D497C7FD  //2^( 73 /128-1) - 2^(- 73 /128-1), 2^(- 73 /128-1)
+        .quad 0x3FDA569F36E974EA, 0x3FD56F4736B527DA  //2^( 74 /128-1) - 2^(- 74 /128-1), 2^(- 74 /128-1)
+        .quad 0x3FDAB6A9DE1EA215, 0x3FD551A4CA5D920F  //2^( 75 /128-1) - 2^(- 75 /128-1), 2^(- 75 /128-1)
+        .quad 0x3FDB16E7DBFC4CA3, 0x3FD5342B569D4F82  //2^( 76 /128-1) - 2^(- 76 /128-1), 2^(- 76 /128-1)
+        .quad 0x3FDB7759E9782918, 0x3FD516DAA2CF6642  //2^( 77 /128-1) - 2^(- 77 /128-1), 2^(- 77 /128-1)
+        .quad 0x3FDBD800BFEBF932, 0x3FD4F9B2769D2CA7  //2^( 78 /128-1) - 2^(- 78 /128-1), 2^(- 78 /128-1)
+        .quad 0x3FDC38DD1916F025, 0x3FD4DCB299FDDD0D  //2^( 79 /128-1) - 2^(- 79 /128-1), 2^(- 79 /128-1)
+        .quad 0x3FDC99EFAF1F1790, 0x3FD4BFDAD5362A27  //2^( 80 /128-1) - 2^(- 80 /128-1), 2^(- 80 /128-1)
+        .quad 0x3FDCFB393C92B539, 0x3FD4A32AF0D7D3DE  //2^( 81 /128-1) - 2^(- 81 /128-1), 2^(- 81 /128-1)
+        .quad 0x3FDD5CBA7C69B19C, 0x3FD486A2B5C13CD0  //2^( 82 /128-1) - 2^(- 82 /128-1), 2^(- 82 /128-1)
+        .quad 0x3FDDBE742A06FF34, 0x3FD46A41ED1D0057  //2^( 83 /128-1) - 2^(- 83 /128-1), 2^(- 83 /128-1)
+        .quad 0x3FDE2067013A029D, 0x3FD44E086061892D  //2^( 84 /128-1) - 2^(- 84 /128-1), 2^(- 84 /128-1)
+        .quad 0x3FDE8293BE3FFB87, 0x3FD431F5D950A897  //2^( 85 /128-1) - 2^(- 85 /128-1), 2^(- 85 /128-1)
+        .quad 0x3FDEE4FB1DC56E75, 0x3FD4160A21F72E2A  //2^( 86 /128-1) - 2^(- 86 /128-1), 2^(- 86 /128-1)
+        .quad 0x3FDF479DDCE78F58, 0x3FD3FA4504AC801C  //2^( 87 /128-1) - 2^(- 87 /128-1), 2^(- 87 /128-1)
+        .quad 0x3FDFAA7CB935ACFE, 0x3FD3DEA64C123422  //2^( 88 /128-1) - 2^(- 88 /128-1), 2^(- 88 /128-1)
+        .quad 0x3FE006CC38594EB1, 0x3FD3C32DC313A8E5  //2^( 89 /128-1) - 2^(- 89 /128-1), 2^(- 89 /128-1)
+        .quad 0x3FE03878E0EB1569, 0x3FD3A7DB34E59FF7  //2^( 90 /128-1) - 2^(- 90 /128-1), 2^(- 90 /128-1)
+        .quad 0x3FE06A44B5C74101, 0x3FD38CAE6D05D866  //2^( 91 /128-1) - 2^(- 91 /128-1), 2^(- 91 /128-1)
+        .quad 0x3FE09C3016A0D077, 0x3FD371A7373AA9CB  //2^( 92 /128-1) - 2^(- 92 /128-1), 2^(- 92 /128-1)
+        .quad 0x3FE0CE3B63676360, 0x3FD356C55F929FF1  //2^( 93 /128-1) - 2^(- 93 /128-1), 2^(- 93 /128-1)
+        .quad 0x3FE10066FC47F240, 0x3FD33C08B26416FF  //2^( 94 /128-1) - 2^(- 94 /128-1), 2^(- 94 /128-1)
+        .quad 0x3FE132B341AD8761, 0x3FD32170FC4CD831  //2^( 95 /128-1) - 2^(- 95 /128-1), 2^(- 95 /128-1)
+        .quad 0x3FE165209441F823, 0x3FD306FE0A31B715  //2^( 96 /128-1) - 2^(- 96 /128-1), 2^(- 96 /128-1)
+        .quad 0x3FE197AF54EE9EBB, 0x3FD2ECAFA93E2F56  //2^( 97 /128-1) - 2^(- 97 /128-1), 2^(- 97 /128-1)
+        .quad 0x3FE1CA5FE4DD1475, 0x3FD2D285A6E4030B  //2^( 98 /128-1) - 2^(- 98 /128-1), 2^(- 98 /128-1)
+        .quad 0x3FE1FD32A577EC72, 0x3FD2B87FD0DAD990  //2^( 99 /128-1) - 2^(- 99 /128-1), 2^(- 99 /128-1)
+        .quad 0x3FE23027F86B6ED6, 0x3FD29E9DF51FDEE1  //2^( 100 /128-1) - 2^(- 100 /128-1), 2^(- 100 /128-1)
+        .quad 0x3FE263403FA65489, 0x3FD284DFE1F56381  //2^( 101 /128-1) - 2^(- 101 /128-1), 2^(- 101 /128-1)
+        .quad 0x3FE2967BDD5A8364, 0x3FD26B4565E27CDD  //2^( 102 /128-1) - 2^(- 102 /128-1), 2^(- 102 /128-1)
+        .quad 0x3FE2C9DB33FDCAE9, 0x3FD251CE4FB2A63F  //2^( 103 /128-1) - 2^(- 103 /128-1), 2^(- 103 /128-1)
+        .quad 0x3FE2FD5EA64AA180, 0x3FD2387A6E756238  //2^( 104 /128-1) - 2^(- 104 /128-1), 2^(- 104 /128-1)
+        .quad 0x3FE331069740E22F, 0x3FD21F49917DDC96  //2^( 105 /128-1) - 2^(- 105 /128-1), 2^(- 105 /128-1)
+        .quad 0x3FE364D36A268AE0, 0x3FD2063B88628CD6  //2^( 106 /128-1) - 2^(- 106 /128-1), 2^(- 106 /128-1)
+        .quad 0x3FE398C582887B27, 0x3FD1ED5022FCD91D  //2^( 107 /128-1) - 2^(- 107 /128-1), 2^(- 107 /128-1)
+        .quad 0x3FE3CCDD443B3394, 0x3FD1D4873168B9AA  //2^( 108 /128-1) - 2^(- 108 /128-1), 2^(- 108 /128-1)
+        .quad 0x3FE4011B135B9590, 0x3FD1BBE084045CD4  //2^( 109 /128-1) - 2^(- 109 /128-1), 2^(- 109 /128-1)
+        .quad 0x3FE4357F544FA3C1, 0x3FD1A35BEB6FCB75  //2^( 110 /128-1) - 2^(- 110 /128-1), 2^(- 110 /128-1)
+        .quad 0x3FE46A0A6BC742FD, 0x3FD18AF9388C8DEA  //2^( 111 /128-1) - 2^(- 111 /128-1), 2^(- 111 /128-1)
+        .quad 0x3FE49EBCBEBCFBCA, 0x3FD172B83C7D517B  //2^( 112 /128-1) - 2^(- 112 /128-1), 2^(- 112 /128-1)
+        .quad 0x3FE4D396B276BC6F, 0x3FD15A98C8A58E51  //2^( 113 /128-1) - 2^(- 113 /128-1), 2^(- 113 /128-1)
+        .quad 0x3FE50898AC869B96, 0x3FD1429AAEA92DE0  //2^( 114 /128-1) - 2^(- 114 /128-1), 2^(- 114 /128-1)
+        .quad 0x3FE53DC312CB9B7A, 0x3FD12ABDC06C31CC  //2^( 115 /128-1) - 2^(- 115 /128-1), 2^(- 115 /128-1)
+        .quad 0x3FE573164B726DB6, 0x3FD11301D0125B51  //2^( 116 /128-1) - 2^(- 116 /128-1), 2^(- 116 /128-1)
+        .quad 0x3FE5A892BCF6379B, 0x3FD0FB66AFFED31B  //2^( 117 /128-1) - 2^(- 117 /128-1), 2^(- 117 /128-1)
+        .quad 0x3FE5DE38CE215725, 0x3FD0E3EC32D3D1A2  //2^( 118 /128-1) - 2^(- 118 /128-1), 2^(- 118 /128-1)
+        .quad 0x3FE61408E60E2888, 0x3FD0CC922B7247F7  //2^( 119 /128-1) - 2^(- 119 /128-1), 2^(- 119 /128-1)
+        .quad 0x3FE64A036C27CC52, 0x3FD0B5586CF9890F  //2^( 120 /128-1) - 2^(- 120 /128-1), 2^(- 120 /128-1)
+        .quad 0x3FE68028C82AEE2F, 0x3FD09E3ECAC6F383  //2^( 121 /128-1) - 2^(- 121 /128-1), 2^(- 121 /128-1)
+        .quad 0x3FE6B67962268C43, 0x3FD0874518759BC8  //2^( 122 /128-1) - 2^(- 122 /128-1), 2^(- 122 /128-1)
+        .quad 0x3FE6ECF5A27CBF28, 0x3FD0706B29DDF6DE  //2^( 123 /128-1) - 2^(- 123 /128-1), 2^(- 123 /128-1)
+        .quad 0x3FE7239DF1E38286, 0x3FD059B0D3158574  //2^( 124 /128-1) - 2^(- 124 /128-1), 2^(- 124 /128-1)
+        .quad 0x3FE75A72B9657E51, 0x3FD04315E86E7F85  //2^( 125 /128-1) - 2^(- 125 /128-1), 2^(- 125 /128-1)
+        .quad 0x3FE791746262D0A8, 0x3FD02C9A3E778061  //2^( 126 /128-1) - 2^(- 126 /128-1), 2^(- 126 /128-1)
+        .quad 0x3FE7C8A35691D856, 0x3FD0163DA9FB3335 //2^( 127 /128-1) - 2^(- 127 /128-1), 2^(- 127 /128-1)
+        .align 64
+        .quad 0x42C8000000000000, 0x42C8000000000000, 0x42C8000000000000, 0x42C8000000000000, 0x42C8000000000000, 0x42C8000000000000, 0x42C8000000000000, 0x42C8000000000000 /* _dbShifter = 1.5 * 2^(52-k)*/
+        .align 64
+        .long 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99         /* _iDomainRange 0x40861d9ac12a3e85 =(1021*2^K-0.5)*log(2)/2^K -needed for quick exp*/
+        .align 64
+        .quad 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD /* _dPC2 */
+        .align 64
+        .quad 0x3FC55555555554AD, 0x3FC55555555554AD, 0x3FC55555555554AD, 0x3FC55555555554AD, 0x3FC55555555554AD, 0x3FC55555555554AD, 0x3FC55555555554AD, 0x3FC55555555554AD /* _dPC3 */
+        .align 64
+        .quad 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299 /* _dPC4 */
+        .align 64
+        .quad 0x3F8111115712F425, 0x3F8111115712F425, 0x3F8111115712F425, 0x3F8111115712F425, 0x3F8111115712F425, 0x3F8111115712F425, 0x3F8111115712F425, 0x3F8111115712F425 /* _dPC5 */
+        .align 64
+        .quad 0x000000000000007f, 0x000000000000007f, 0x000000000000007f, 0x000000000000007f, 0x000000000000007f, 0x000000000000007f, 0x000000000000007f, 0x000000000000007f /* _lIndexMask */
+        .align 64
+        .type	__svml_dsinh_data_internal,@object
+        .size	__svml_dsinh_data_internal,.-__svml_dsinh_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core-avx2.S
new file mode 100644
index 0000000000..06525b7b37
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized sinhf.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN16v_sinhf _ZGVeN16v_sinhf_avx2_wrapper
+#include "../svml_s_sinhf16_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core.c
new file mode 100644
index 0000000000..6a954caa37
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized sinhf, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN16v_sinhf
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN16v_sinhf, __GI__ZGVeN16v_sinhf,
+	       __redirect__ZGVeN16v_sinhf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core_avx512.S
new file mode 100644
index 0000000000..1119c00259
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core_avx512.S
@@ -0,0 +1,318 @@
+/* Function sinhf vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute sinh(x) as (exp(x)-exp(-x))/2,
+ *   where exp is calculated as
+ *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
+ *
+ *   Special cases:
+ *
+ *   sinh(NaN) = quiet NaN, and raise invalid exception
+ *   sinh(INF) = that INF
+ *   sinh(x)   = x for subnormals
+ *   sinh(x) overflows for big x and returns MAXLOG+log(2)
+ *
+ */
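+
+/* Hedged scalar sketch (C) of the float path above, illustration only; the
+   name sinhf_ref is hypothetical.  No table is used here: N = rint(|x|/ln2),
+   r = |x| - N*ln2, and the kernel builds 2^(N-1) and 2^(-N-1) directly by
+   adding N<<23 to the bits of 0.5f (_iHalf), then combines them with small
+   sinh/cosh polynomials in r.
+
+     #include <math.h>
+
+     static float sinhf_ref (float x)
+     {
+       const float ln2 = 0x1.62e43p-1f;
+       float ax = fabsf (x);
+       int N = (int) lrintf (ax / ln2);
+       float r = ax - N * ln2;                           // reduced argument
+       float g1 = ldexpf (1.0f, N - 1) + ldexpf (1.0f, -N - 1);
+       float g2 = ldexpf (1.0f, N - 1) - ldexpf (1.0f, -N - 1);
+       // sinh(x) = 2^(N-1)*e^r - 2^(-N-1)*e^-r = g2*cosh(r) + g1*sinh(r)
+       return copysignf (g2 * coshf (r) + g1 * sinhf (r), x);
+     }
+ */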
+
+/* Offsets for data table __svml_ssinh_data_internal
+ */
+#define _sInvLn2                      	0
+#define _sLn2hi                       	64
+#define _sLn2lo                       	128
+#define _sSign                        	192
+#define _sShifter                     	256
+#define _iDomainRange                 	320
+#define _sPC1                         	384
+#define _sPC2                         	448
+#define _sPC3                         	512
+#define _sPC4                         	576
+#define _sPC5                         	640
+#define _sPC6                         	704
+#define _iHalf                        	768
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN16v_sinhf_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovaps   %zmm0, %zmm5
+
+/*
+ *  Implementation
+ *  Abs argument
+ */
+        vandps    _sSign+__svml_ssinh_data_internal(%rip), %zmm5, %zmm4
+
+/*
+ * Check for overflow/underflow
+ * (mask compare; possibly faster than a GE compare)
+ */
+        vpternlogd $255, %zmm6, %zmm6, %zmm6
+        vmovups   _sShifter+__svml_ssinh_data_internal(%rip), %zmm7
+
+/*
+ *  Load argument
+ * dM = x/log(2) + RShifter
+ */
+        vmovups   _sInvLn2+__svml_ssinh_data_internal(%rip), %zmm11
+        vmovups   _sLn2hi+__svml_ssinh_data_internal(%rip), %zmm8
+        vmovups   _sLn2lo+__svml_ssinh_data_internal(%rip), %zmm10
+        vmovups   _iHalf+__svml_ssinh_data_internal(%rip), %zmm12
+        vmovups   _sPC5+__svml_ssinh_data_internal(%rip), %zmm0
+        vmovups   _sPC6+__svml_ssinh_data_internal(%rip), %zmm3
+
+/* x^2 */
+        vmovups   _sPC2+__svml_ssinh_data_internal(%rip), %zmm2
+        vxorps    %zmm5, %zmm4, %zmm1
+        vfmadd213ps {rn-sae}, %zmm7, %zmm1, %zmm11
+        vpcmpd    $2, _iDomainRange+__svml_ssinh_data_internal(%rip), %zmm1, %k1
+
+/*
+ *  G1,G2 2^N,2^(-N)
+ * iM now is an EXP(2^N)
+ */
+        vpslld    $23, %zmm11, %zmm13
+
+/*
+ *  R
+ * sN = sM - RShifter
+ */
+        vsubps    {rn-sae}, %zmm7, %zmm11, %zmm9
+        vpaddd    %zmm13, %zmm12, %zmm14
+        vpsubd    %zmm13, %zmm12, %zmm15
+
+/* sG1 = 2^(N-1)+2^(-N-1) */
+        vaddps    {rn-sae}, %zmm15, %zmm14, %zmm7
+        vpandnd   %zmm1, %zmm1, %zmm6{%k1}
+
+/* sR = sX - sN*Log2_hi */
+        vfnmadd231ps {rn-sae}, %zmm8, %zmm9, %zmm1
+        vptestmd  %zmm6, %zmm6, %k0
+
+/* sG2 = 2^(N-1)-2^(-N-1) */
+        vsubps    {rn-sae}, %zmm15, %zmm14, %zmm8
+
+/* sR = (sX - sN*Log2_hi) - sN*Log2_lo */
+        vfnmadd231ps {rn-sae}, %zmm10, %zmm9, %zmm1
+
+/*
+ * sinh(r) = r*((a1=1)+r^2*(a3+r^2*(a5+r^2*a7))) = r + r*(r^2*(a3+r^2*(a5+r^2*a7))) ....
+ * sSinh_r = (a3+r^2*a5)
+ */
+        vmovups   _sPC3+__svml_ssinh_data_internal(%rip), %zmm14
+        kmovw     %k0, %edx
+
+/* sR2 = sR^2 */
+        vmulps    {rn-sae}, %zmm1, %zmm1, %zmm6
+        vfmadd231ps {rn-sae}, %zmm6, %zmm0, %zmm14
+
+/* sSinh_r = r^2*(a3+r^2*a5) */
+        vmulps    {rn-sae}, %zmm6, %zmm14, %zmm0
+
+/* sSinh_r = r + r*(r^2*(a3+r^2*a5)) */
+        vfmadd213ps {rn-sae}, %zmm1, %zmm1, %zmm0
+
+/*
+ * sinh(X) = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2))
+ * sOut = (a4 +a6*sR2)
+ */
+        vmovups   _sPC4+__svml_ssinh_data_internal(%rip), %zmm1
+        vfmadd231ps {rn-sae}, %zmm6, %zmm3, %zmm1
+
+/* sOut = a2+sR2*(a4+a6*sR2) */
+        vfmadd213ps {rn-sae}, %zmm2, %zmm6, %zmm1
+
+/* sOut = sR2*(a2+sR2*(a4+a6*sR2)) */
+        vmulps    {rn-sae}, %zmm6, %zmm1, %zmm2
+
+/* sOut = sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
+        vmulps    {rn-sae}, %zmm8, %zmm2, %zmm3
+
+/* sOut = sG1*sinh(dR)+sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
+        vfmadd213ps {rn-sae}, %zmm3, %zmm0, %zmm7
+
+/* sOut = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
+        vaddps    {rn-sae}, %zmm8, %zmm7, %zmm9
+
+/*  Ret H  */
+        vorps     %zmm9, %zmm4, %zmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm5
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm5, 64(%rsp)
+        vmovups   %zmm0, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $16, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     64(%rsp,%r14,4), %xmm0
+        call      sinhf@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 128(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN16v_sinhf_skx)
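+
+/*
+ * The special-values path above is a per-lane fallback: every lane set
+ * in the range mask is recomputed with the scalar sinhf.  A minimal C
+ * sketch of the same control flow, with illustrative names (not part
+ * of the build):
+ *
+ *   #include <math.h>
+ *
+ *   void
+ *   sinhf_special_lanes (float dst[16], const float src[16],
+ *                        unsigned int mask)
+ *   {
+ *     for (int i = 0; i < 16; i++)    // cmpl  $16, %r12d
+ *       if (mask & (1u << i))         // btl   %r12d, %r13d
+ *         dst[i] = sinhf (src[i]);    // call  sinhf@PLT
+ *   }
+ */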
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_ssinh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(64)) VUINT32 _sInvLn2[16][1];
+        __declspec(align(64)) VUINT32 _sLn2hi[16][1];
+        __declspec(align(64)) VUINT32 _sLn2lo[16][1];
+        __declspec(align(64)) VUINT32 _sSign[16][1];
+        __declspec(align(64)) VUINT32 _sShifter[16][1];
+        __declspec(align(64)) VUINT32 _iDomainRange[16][1];
+        __declspec(align(64)) VUINT32 _sPC1[16][1];
+        __declspec(align(64)) VUINT32 _sPC2[16][1];
+        __declspec(align(64)) VUINT32 _sPC3[16][1];
+        __declspec(align(64)) VUINT32 _sPC4[16][1];
+        __declspec(align(64)) VUINT32 _sPC5[16][1];
+        __declspec(align(64)) VUINT32 _sPC6[16][1];
+        __declspec(align(64)) VUINT32 _iHalf[16][1];
+} __svml_ssinh_data_internal;
+#endif
+__svml_ssinh_data_internal:
+        .long 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B           /* _sInvLn2  */  //k=0
+        .align 64
+        .long 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000           /* _sLn2hi   */
+        .align 64
+        .long 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4           /* _sLn2lo   */
+        .align 64
+        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000           /* _sSign    */
+        .align 64
+        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000           /* _sShifter */
+        .align 64
+        .long 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E           /* _iDomainRange */
+        .align 64
+        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000         /* _sPC1=1  */
+        .align 64
+        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000         /* _sPC2  */
+        .align 64
+        .long 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57         /* _sPC3  */
+        .align 64
+        .long 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72         /* _sPC4  */
+        .align 64
+        .long 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461         /* _sPC5  */
+        .align 64
+        .long 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3         /* _sPC6  */
+        // Integer constants
+        .align 64
+        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000 /* _iHalf*/
+        .align 64
+        .type	__svml_ssinh_data_internal,@object
+        .size	__svml_ssinh_data_internal,.-__svml_ssinh_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core-sse2.S
new file mode 100644
index 0000000000..1b31095fe1
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized sinhf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN4v_sinhf _ZGVbN4v_sinhf_sse2
+#include "../svml_s_sinhf4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core.c
new file mode 100644
index 0000000000..9d4297c2c9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized sinhf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN4v_sinhf
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN4v_sinhf, __GI__ZGVbN4v_sinhf,
+	       __redirect__ZGVbN4v_sinhf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core_sse4.S
new file mode 100644
index 0000000000..82d6f55d33
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core_sse4.S
@@ -0,0 +1,308 @@
+/* Function sinhf vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute sinh(x) as (exp(x)-exp(-x))/2,
+ *   where exp is calculated as
+ *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
+ *
+ *   Special cases:
+ *
+ *   sinh(NaN) = quiet NaN, and raise invalid exception
+ *   sinh(INF) = that INF
+ *   sinh(x)   = x for subnormals
+ *   sinh(x) overflows for |x| above MAXLOG+log(2) and returns +/-INF
+ *
+ */
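+
+/*
+ * A scalar C model of the vector path below.  The hex-float constants
+ * are the _sInvLn2/_sLn2hi/_sLn2lo table values; plain Taylor terms
+ * stand in for the tuned _sPC* coefficients (illustrative sketch only,
+ * not part of the build):
+ *
+ *   #include <math.h>
+ *
+ *   float
+ *   sinhf_model (float x)
+ *   {
+ *     float ax = fabsf (x);
+ *     // N = round (|x|/ln2), r = |x| - N*ln2 split in two pieces.
+ *     float n = rintf (ax * 0x1.715476p+0f);
+ *     float r = ax - n * 0x1.62ep-1f - n * 0x1.0bfbe8p-15f;
+ *     // 2^(N-1)+2^(-N-1) and 2^(N-1)-2^(-N-1).
+ *     float g1 = ldexpf (0.5f, (int) n) + ldexpf (0.5f, -(int) n);
+ *     float g2 = ldexpf (0.5f, (int) n) - ldexpf (0.5f, -(int) n);
+ *     float r2 = r * r;
+ *     float sinh_r = r + r * r2 * (1.0f / 6 + r2 * (1.0f / 120));
+ *     float cosh_r_m1 = r2 * (0.5f + r2 * (1.0f / 24 + r2 * (1.0f / 720)));
+ *     return copysignf (g2 + g1 * sinh_r + g2 * cosh_r_m1, x);
+ *   }
+ *
+ * Lanes with |x| above _iDomainRange are recomputed with the scalar
+ * sinhf in the special-values path.
+ */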
+
+/* Offsets for data table __svml_ssinh_data_internal
+ */
+#define _sInvLn2                      	0
+#define _sLn2hi                       	16
+#define _sLn2lo                       	32
+#define _sSign                        	48
+#define _sShifter                     	64
+#define _iDomainRange                 	80
+#define _sPC1                         	96
+#define _sPC2                         	112
+#define _sPC3                         	128
+#define _sPC4                         	144
+#define _sPC5                         	160
+#define _sPC6                         	176
+#define _iHalf                        	192
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN4v_sinhf_sse4)
+        subq      $72, %rsp
+        cfi_def_cfa_offset(80)
+
+/*
+ *  Implementation
+ *  Abs argument
+ */
+        movups    _sSign+__svml_ssinh_data_internal(%rip), %xmm14
+        andps     %xmm0, %xmm14
+        movaps    %xmm14, %xmm10
+
+/*
+ *  Load argument
+ * dM = x/log(2) + RShifter
+ */
+        movups    _sInvLn2+__svml_ssinh_data_internal(%rip), %xmm7
+        pxor      %xmm0, %xmm10
+        mulps     %xmm10, %xmm7
+
+/*
+ * Check for overflow/underflow.
+ * Faster than a GE compare?
+ */
+        movaps    %xmm10, %xmm1
+        movups    _sShifter+__svml_ssinh_data_internal(%rip), %xmm2
+
+/* sR = sX - sN*Log2_hi */
+        movups    _sLn2hi+__svml_ssinh_data_internal(%rip), %xmm3
+        addps     %xmm2, %xmm7
+
+/*
+ *  R
+ * sN = sM - RShifter
+ */
+        movaps    %xmm7, %xmm4
+
+/*
+ *  G1,G2 2^N,2^(-N)
+ * iM now holds the exponent field of 2^N
+ */
+        pslld     $23, %xmm7
+
+/* sR = (sX - sN*Log2_hi) - sN*Log2_lo */
+        movups    _sLn2lo+__svml_ssinh_data_internal(%rip), %xmm5
+        subps     %xmm2, %xmm4
+        mulps     %xmm4, %xmm3
+        mulps     %xmm4, %xmm5
+        subps     %xmm3, %xmm10
+
+/*
+ * sinh(r) = r*((a1=1)+r^2*(a3+r^2*(a5+r^2*a7))) = r + r*(r^2*(a3+r^2*(a5+r^2*a7)))
+ * sSinh_r = (a3+r^2*a5)
+ */
+        movups    _sPC5+__svml_ssinh_data_internal(%rip), %xmm8
+        subps     %xmm5, %xmm10
+
+/* sR2 = sR^2 */
+        movaps    %xmm10, %xmm12
+        mulps     %xmm10, %xmm12
+
+/*
+ * sinh(X) = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2))
+ * sOut = (a4 + a6*sR2)
+ */
+        movups    _sPC6+__svml_ssinh_data_internal(%rip), %xmm9
+        mulps     %xmm12, %xmm8
+        mulps     %xmm12, %xmm9
+        addps     _sPC3+__svml_ssinh_data_internal(%rip), %xmm8
+        addps     _sPC4+__svml_ssinh_data_internal(%rip), %xmm9
+
+/* sSinh_r = r^2*(a3+r^2*a5) */
+        mulps     %xmm12, %xmm8
+
+/* sOut = a2+sR2*(a4+a6*sR2) */
+        mulps     %xmm12, %xmm9
+
+/* sSinh_r = r + r*(r^2*(a3+r^2*a5)) */
+        mulps     %xmm10, %xmm8
+        addps     _sPC2+__svml_ssinh_data_internal(%rip), %xmm9
+        addps     %xmm8, %xmm10
+
+/* sOut = sR2*(a2+sR2*(a4+a6*sR2)) */
+        mulps     %xmm9, %xmm12
+        movdqu    _iHalf+__svml_ssinh_data_internal(%rip), %xmm6
+        movdqa    %xmm6, %xmm13
+        psubd     %xmm7, %xmm6
+        paddd     %xmm7, %xmm13
+
+/* sG1 = 2^(N-1)+2^(-N-1) */
+        movdqa    %xmm13, %xmm11
+
+/* sG2 = 2^(N-1)-2^(-N-1) */
+        subps     %xmm6, %xmm13
+        addps     %xmm6, %xmm11
+
+/* sOut = sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
+        mulps     %xmm13, %xmm12
+
+/* sOut = sG1*sinh(dR)+sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
+        mulps     %xmm10, %xmm11
+        pcmpgtd   _iDomainRange+__svml_ssinh_data_internal(%rip), %xmm1
+        addps     %xmm11, %xmm12
+        movmskps  %xmm1, %edx
+
+/* sOut = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
+        addps     %xmm12, %xmm13
+
+/*  Ret H  */
+        orps      %xmm13, %xmm14
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm14
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movaps    %xmm14, %xmm0
+        addq      $72, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(80)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm0, 32(%rsp)
+        movups    %xmm14, 48(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 edx
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -64)
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -72)
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm14
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -64)
+        cfi_offset(13, -72)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm14
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      sinhf@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 48(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN4v_sinhf_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_ssinh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(16)) VUINT32 _sInvLn2[4][1];
+        __declspec(align(16)) VUINT32 _sLn2hi[4][1];
+        __declspec(align(16)) VUINT32 _sLn2lo[4][1];
+        __declspec(align(16)) VUINT32 _sSign[4][1];
+        __declspec(align(16)) VUINT32 _sShifter[4][1];
+        __declspec(align(16)) VUINT32 _iDomainRange[4][1];
+        __declspec(align(16)) VUINT32 _sPC1[4][1];
+        __declspec(align(16)) VUINT32 _sPC2[4][1];
+        __declspec(align(16)) VUINT32 _sPC3[4][1];
+        __declspec(align(16)) VUINT32 _sPC4[4][1];
+        __declspec(align(16)) VUINT32 _sPC5[4][1];
+        __declspec(align(16)) VUINT32 _sPC6[4][1];
+        __declspec(align(16)) VUINT32 _iHalf[4][1];
+} __svml_ssinh_data_internal;
+#endif
+__svml_ssinh_data_internal:
+        .long 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B           /* _sInvLn2  */  //k=0
+        .align 16
+        .long 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000           /* _sLn2hi   */
+        .align 16
+        .long 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4           /* _sLn2lo   */
+        .align 16
+        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000           /* _sSign    */
+        .align 16
+        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000           /* _sShifter */
+        .align 16
+        .long 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E           /* _iDomainRange */
+        .align 16
+        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000         /* _sPC1=1  */
+        .align 16
+        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000         /* _sPC2  */
+        .align 16
+        .long 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57         /* _sPC3  */
+        .align 16
+        .long 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72         /* _sPC4  */
+        .align 16
+        .long 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461         /* _sPC5  */
+        .align 16
+        .long 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3         /* _sPC6  */
+        // Integer constants
+        .align 16
+        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000 /* _iHalf*/
+        .align 16
+        .type	__svml_ssinh_data_internal,@object
+        .size	__svml_ssinh_data_internal,.-__svml_ssinh_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core-sse.S
new file mode 100644
index 0000000000..d3c9c607a0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized sinhf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN8v_sinhf _ZGVdN8v_sinhf_sse_wrapper
+#include "../svml_s_sinhf8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core.c
new file mode 100644
index 0000000000..2a2e21e742
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized sinhf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN8v_sinhf
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN8v_sinhf, __GI__ZGVdN8v_sinhf,
+	       __redirect__ZGVdN8v_sinhf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core_avx2.S
new file mode 100644
index 0000000000..ea13fb60d4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core_avx2.S
@@ -0,0 +1,309 @@
+/* Function sinhf vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute sinh(x) as (exp(x)-exp(-x))/2,
+ *   where exp is calculated as
+ *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
+ *
+ *   Special cases:
+ *
+ *   sinh(NaN) = quiet NaN, and raise invalid exception
+ *   sinh(INF) = that INF
+ *   sinh(x)   = x for subnormals
+ *   sinh(x) overflows for |x| above MAXLOG+log(2) and returns +/-INF
+ *
+ */
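+
+/*
+ * The reconstruction used below follows from the addition formula for
+ * sinh: writing |x| = N*ln2 + r,
+ *
+ *   sinh(|x|) = sinh(N*ln2)*cosh(r) + cosh(N*ln2)*sinh(r)
+ *             = sG2*cosh(r) + sG1*sinh(r)
+ *             = sG2 + sG1*sinh(r) + sG2*(cosh(r)-1)
+ *
+ * since sG1 = 2^(N-1)+2^(-N-1) = cosh(N*ln2) and
+ *       sG2 = 2^(N-1)-2^(-N-1) = sinh(N*ln2);
+ * cosh(r)-1 is evaluated as sR2*(a2+sR2*(a4+a6*sR2)) and sinh(r) as
+ * r+r*sR2*(a3+sR2*a5).
+ */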
+
+/* Offsets for data table __svml_ssinh_data_internal
+ */
+#define _sInvLn2                      	0
+#define _sLn2hi                       	32
+#define _sLn2lo                       	64
+#define _sSign                        	96
+#define _sShifter                     	128
+#define _iDomainRange                 	160
+#define _sPC1                         	192
+#define _sPC2                         	224
+#define _sPC3                         	256
+#define _sPC4                         	288
+#define _sPC5                         	320
+#define _sPC6                         	352
+#define _iHalf                        	384
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN8v_sinhf_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+        vmovups   _sInvLn2+__svml_ssinh_data_internal(%rip), %ymm7
+        vmovups   _sShifter+__svml_ssinh_data_internal(%rip), %ymm4
+        vmovups   _sLn2hi+__svml_ssinh_data_internal(%rip), %ymm5
+
+/*
+ * sinh(X) = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2))
+ * sOut = (a4 + a6*sR2)
+ */
+        vmovups   _sPC6+__svml_ssinh_data_internal(%rip), %ymm14
+
+/*
+ * sinh(r) = r*((a1=1)+r^2*(a3+r^2*(a5+r^2*a7))) = r + r*(r^2*(a3+r^2*(a5+r^2*a7)))
+ * sSinh_r = (a3+r^2*a5)
+ */
+        vmovups   _sPC5+__svml_ssinh_data_internal(%rip), %ymm12
+        vmovups   _iHalf+__svml_ssinh_data_internal(%rip), %ymm8
+        vmovaps   %ymm0, %ymm2
+
+/*
+ *  Implementation
+ *  Abs argument
+ */
+        vandps    _sSign+__svml_ssinh_data_internal(%rip), %ymm2, %ymm1
+        vxorps    %ymm2, %ymm1, %ymm0
+
+/*
+ *  Load argument
+ * dM = x/log(2) + RShifter
+ */
+        vfmadd213ps %ymm4, %ymm0, %ymm7
+
+/*
+ *  R
+ * sN = sM - RShifter
+ */
+        vsubps    %ymm4, %ymm7, %ymm6
+
+/*
+ *  G1,G2 2^N,2^(-N)
+ * iM now holds the exponent field of 2^N
+ */
+        vpslld    $23, %ymm7, %ymm9
+
+/*
+ * Check for overflow/underflow.
+ * Faster than a GE compare?
+ */
+        vpcmpgtd  _iDomainRange+__svml_ssinh_data_internal(%rip), %ymm0, %ymm3
+
+/* sR = sX - sN*Log2_hi */
+        vfnmadd231ps %ymm5, %ymm6, %ymm0
+        vpaddd    %ymm9, %ymm8, %ymm10
+        vpsubd    %ymm9, %ymm8, %ymm11
+
+/* sR = (sX - sN*Log2_hi) - sN*Log2_lo */
+        vfnmadd231ps _sLn2lo+__svml_ssinh_data_internal(%rip), %ymm6, %ymm0
+
+/* sR2 = sR^2 */
+        vmulps    %ymm0, %ymm0, %ymm13
+        vfmadd213ps _sPC4+__svml_ssinh_data_internal(%rip), %ymm13, %ymm14
+        vfmadd213ps _sPC3+__svml_ssinh_data_internal(%rip), %ymm13, %ymm12
+
+/* sOut = a2+sR2*(a4+a6*sR2) */
+        vfmadd213ps _sPC2+__svml_ssinh_data_internal(%rip), %ymm13, %ymm14
+
+/* sSinh_r = r^2*(a3+r^2*a5) */
+        vmulps    %ymm12, %ymm13, %ymm12
+
+/* sOut = sR2*(a2+sR2*(a4+a6*sR2)) */
+        vmulps    %ymm14, %ymm13, %ymm15
+
+/* sSinh_r = r + r*(r^2*(a3+r^2*a5)) */
+        vfmadd213ps %ymm0, %ymm0, %ymm12
+        vmovmskps %ymm3, %edx
+
+/* sG1 = 2^(N-1)+2^(-N-1) */
+        vaddps    %ymm11, %ymm10, %ymm3
+
+/* sG2 = 2^(N-1)-2^(-N-1) */
+        vsubps    %ymm11, %ymm10, %ymm10
+
+/* sOut = sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
+        vmulps    %ymm15, %ymm10, %ymm0
+
+/* sOut = sG1*sinh(dR)+sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
+        vfmadd213ps %ymm0, %ymm12, %ymm3
+
+/* sOut = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
+        vaddps    %ymm3, %ymm10, %ymm4
+
+/*  Ret H  */
+        vorps     %ymm4, %ymm1, %ymm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm2
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %ymm2, 32(%rsp)
+        vmovups   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      sinhf@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 64(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN8v_sinhf_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_ssinh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(32)) VUINT32 _sInvLn2[8][1];
+        __declspec(align(32)) VUINT32 _sLn2hi[8][1];
+        __declspec(align(32)) VUINT32 _sLn2lo[8][1];
+        __declspec(align(32)) VUINT32 _sSign[8][1];
+        __declspec(align(32)) VUINT32 _sShifter[8][1];
+        __declspec(align(32)) VUINT32 _iDomainRange[8][1];
+        __declspec(align(32)) VUINT32 _sPC1[8][1];
+        __declspec(align(32)) VUINT32 _sPC2[8][1];
+        __declspec(align(32)) VUINT32 _sPC3[8][1];
+        __declspec(align(32)) VUINT32 _sPC4[8][1];
+        __declspec(align(32)) VUINT32 _sPC5[8][1];
+        __declspec(align(32)) VUINT32 _sPC6[8][1];
+        __declspec(align(32)) VUINT32 _iHalf[8][1];
+} __svml_ssinh_data_internal;
+#endif
+__svml_ssinh_data_internal:
+        .long 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B           /* _sInvLn2  */  //k=0
+        .align 32
+        .long 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000           /* _sLn2hi   */
+        .align 32
+        .long 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4           /* _sLn2lo   */
+        .align 32
+        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000           /* _sSign    */
+        .align 32
+        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000           /* _sShifter */
+        .align 32
+        .long 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E           /* _iDomainRange */
+        .align 32
+        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000         /* _sPC1=1  */
+        .align 32
+        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000         /* _sPC2  */
+        .align 32
+        .long 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57         /* _sPC3  */
+        .align 32
+        .long 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72         /* _sPC4  */
+        .align 32
+        .long 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461         /* _sPC5  */
+        .align 32
+        .long 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3         /* _sPC6  */
+        // Integer constants
+        .align 32
+        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000 /* _iHalf*/
+        .align 32
+        .type	__svml_ssinh_data_internal,@object
+        .size	__svml_ssinh_data_internal,.-__svml_ssinh_data_internal
diff --git a/sysdeps/x86_64/fpu/svml_d_sinh2_core.S b/sysdeps/x86_64/fpu/svml_d_sinh2_core.S
new file mode 100644
index 0000000000..91bda7318c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_sinh2_core.S
@@ -0,0 +1,29 @@
+/* Function sinh vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN2v_sinh)
+WRAPPER_IMPL_SSE2 sinh
+END (_ZGVbN2v_sinh)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN2v_sinh)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_sinh4_core.S b/sysdeps/x86_64/fpu/svml_d_sinh4_core.S
new file mode 100644
index 0000000000..7b8091946a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_sinh4_core.S
@@ -0,0 +1,29 @@
+/* Function sinh vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN4v_sinh)
+WRAPPER_IMPL_AVX _ZGVbN2v_sinh
+END (_ZGVdN4v_sinh)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN4v_sinh)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_sinh4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_sinh4_core_avx.S
new file mode 100644
index 0000000000..f773bf110c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_sinh4_core_avx.S
@@ -0,0 +1,25 @@
+/* Function sinh vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN4v_sinh)
+WRAPPER_IMPL_AVX _ZGVbN2v_sinh
+END (_ZGVcN4v_sinh)
diff --git a/sysdeps/x86_64/fpu/svml_d_sinh8_core.S b/sysdeps/x86_64/fpu/svml_d_sinh8_core.S
new file mode 100644
index 0000000000..153a18429c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_sinh8_core.S
@@ -0,0 +1,25 @@
+/* Function sinh vectorized with AVX-512, wrapper to AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN8v_sinh)
+WRAPPER_IMPL_AVX512 _ZGVdN4v_sinh
+END (_ZGVeN8v_sinh)
diff --git a/sysdeps/x86_64/fpu/svml_s_sinhf16_core.S b/sysdeps/x86_64/fpu/svml_s_sinhf16_core.S
new file mode 100644
index 0000000000..f8dc7da336
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_sinhf16_core.S
@@ -0,0 +1,25 @@
+/* Function sinhf vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN16v_sinhf)
+WRAPPER_IMPL_AVX512 _ZGVdN8v_sinhf
+END (_ZGVeN16v_sinhf)
diff --git a/sysdeps/x86_64/fpu/svml_s_sinhf4_core.S b/sysdeps/x86_64/fpu/svml_s_sinhf4_core.S
new file mode 100644
index 0000000000..d065d03eb6
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_sinhf4_core.S
@@ -0,0 +1,29 @@
+/* Function sinhf vectorized with SSE2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN4v_sinhf)
+WRAPPER_IMPL_SSE2 sinhf
+END (_ZGVbN4v_sinhf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4v_sinhf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_sinhf8_core.S b/sysdeps/x86_64/fpu/svml_s_sinhf8_core.S
new file mode 100644
index 0000000000..1194699a76
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_sinhf8_core.S
@@ -0,0 +1,29 @@
+/* Function sinhf vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN8v_sinhf)
+WRAPPER_IMPL_AVX _ZGVbN4v_sinhf
+END (_ZGVdN8v_sinhf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8v_sinhf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_sinhf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_sinhf8_core_avx.S
new file mode 100644
index 0000000000..82c6b9b239
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_sinhf8_core_avx.S
@@ -0,0 +1,25 @@
+/* Function sinhf vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+        .text
+ENTRY (_ZGVcN8v_sinhf)
+WRAPPER_IMPL_AVX _ZGVbN4v_sinhf
+END (_ZGVcN8v_sinhf)
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx.c
new file mode 100644
index 0000000000..55aa36d866
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-sinh.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx2.c
new file mode 100644
index 0000000000..55aa36d866
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx2.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-sinh.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx512f.c
new file mode 100644
index 0000000000..55aa36d866
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx512f.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-sinh.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-sinh.c b/sysdeps/x86_64/fpu/test-double-libmvec-sinh.c
new file mode 100644
index 0000000000..82dcaf745d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-sinh.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE double
+#define LIBMVEC_FUNC sinh
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index 0222f9f5b8..db136cc901 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVbN2v_exp2)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVbN2v_exp10)
 VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVbN2v_cosh)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVbN2v_expm1)
+VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVbN2v_sinh)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index 1aad9faf9c..5fc09ac8c0 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVdN4v_exp2)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVdN4v_exp10)
 VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVdN4v_cosh)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVdN4v_expm1)
+VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVdN4v_sinh)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m256i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index e404bf899d..26ef7fb365 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVcN4v_exp2)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVcN4v_exp10)
 VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVcN4v_cosh)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVcN4v_expm1)
+VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVcN4v_sinh)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index 2b4de59343..c7055fca76 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVeN8v_exp2)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVeN8v_exp10)
 VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVeN8v_cosh)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVeN8v_expm1)
+VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVeN8v_sinh)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx.c
new file mode 100644
index 0000000000..93986945f3
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-sinhf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx2.c
new file mode 100644
index 0000000000..93986945f3
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx2.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-sinhf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx512f.c
new file mode 100644
index 0000000000..93986945f3
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx512f.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-sinhf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-sinhf.c b/sysdeps/x86_64/fpu/test-float-libmvec-sinhf.c
new file mode 100644
index 0000000000..fb1f3c5c48
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-sinhf.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE float
+#define LIBMVEC_FUNC sinhf
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index 9a4a1b84a9..d353bcb0f2 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVeN16v_exp2f)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVeN16v_exp10f)
 VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVeN16v_coshf)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVeN16v_expm1f)
+VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVeN16v_sinhf)
 
 #define VEC_INT_TYPE __m512i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index eb4e36d0e2..5e59117626 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVbN4v_exp2f)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVbN4v_exp10f)
 VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVbN4v_coshf)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVbN4v_expm1f)
+VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVbN4v_sinhf)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index d8adab59e6..e884a5f4df 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVdN8v_exp2f)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVdN8v_exp10f)
 VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVdN8v_coshf)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVdN8v_expm1f)
+VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVdN8v_sinhf)
 
 /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
 #undef VECTOR_WRAPPER_fFF
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index e6e1a90c72..95910d39e9 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVcN8v_exp2f)
 VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVcN8v_exp10f)
 VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVcN8v_coshf)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVcN8v_expm1f)
+VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVcN8v_sinhf)
 
 #define VEC_INT_TYPE __m128i
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 09/18] x86-64: Add vector cbrt/cbrtf implementation to libmvec
  2021-12-29  6:39 [PATCH v5 00/18] x86-64: Add vector math functions to libmvec Sunil K Pandey
                   ` (7 preceding siblings ...)
  2021-12-29  6:39 ` [PATCH v5 08/18] x86-64: Add vector sinh/sinhf " Sunil K Pandey
@ 2021-12-29  6:39 ` Sunil K Pandey
  2021-12-29 21:25   ` H.J. Lu
  2021-12-29  6:39 ` [PATCH v5 10/18] x86-64: Add vector atan2/atan2f " Sunil K Pandey
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Sunil K Pandey @ 2021-12-29  6:39 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, andrey.kolesov, marius.cornea

Implement vectorized cbrt/cbrtf, with SSE, AVX, AVX2 and AVX512
versions, for libmvec as per the vector ABI.  The patch also adds
accuracy and ABI tests for vector cbrt/cbrtf with regenerated ulps.
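
For reference, a minimal sketch of how these entry points are reached
from C, assuming GCC with glibc's bits/math-vector.h declarations and
-ffast-math (vec_cbrt is just an example name; the variant actually
picked depends on the target ISA and the vectorizer):

  #include <math.h>

  void
  vec_cbrt (double *restrict out, const double *restrict in, int n)
  {
    /* With e.g. gcc -O3 -ffast-math -march=haswell this loop is
       expected to call _ZGVdN4v_cbrt rather than scalar cbrt.  */
    for (int i = 0; i < n; i++)
      out[i] = cbrt (in[i]);
  }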
---
 bits/libm-simd-decl-stubs.h                   |  11 +
 math/bits/mathcalls.h                         |   2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
 sysdeps/x86/fpu/bits/math-vector.h            |   4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
 sysdeps/x86_64/fpu/Makeconfig                 |   1 +
 sysdeps/x86_64/fpu/Versions                   |   2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |  20 +
 .../fpu/multiarch/svml_d_cbrt2_core-sse2.S    |  20 +
 .../x86_64/fpu/multiarch/svml_d_cbrt2_core.c  |  27 +
 .../fpu/multiarch/svml_d_cbrt2_core_sse4.S    | 467 ++++++++++++++++
 .../fpu/multiarch/svml_d_cbrt4_core-sse.S     |  20 +
 .../x86_64/fpu/multiarch/svml_d_cbrt4_core.c  |  27 +
 .../fpu/multiarch/svml_d_cbrt4_core_avx2.S    | 505 +++++++++++++++++
 .../fpu/multiarch/svml_d_cbrt8_core-avx2.S    |  20 +
 .../x86_64/fpu/multiarch/svml_d_cbrt8_core.c  |  27 +
 .../fpu/multiarch/svml_d_cbrt8_core_avx512.S  | 253 +++++++++
 .../fpu/multiarch/svml_s_cbrtf16_core-avx2.S  |  20 +
 .../fpu/multiarch/svml_s_cbrtf16_core.c       |  28 +
 .../multiarch/svml_s_cbrtf16_core_avx512.S    | 235 ++++++++
 .../fpu/multiarch/svml_s_cbrtf4_core-sse2.S   |  20 +
 .../x86_64/fpu/multiarch/svml_s_cbrtf4_core.c |  28 +
 .../fpu/multiarch/svml_s_cbrtf4_core_sse4.S   | 490 +++++++++++++++++
 .../fpu/multiarch/svml_s_cbrtf8_core-sse.S    |  20 +
 .../x86_64/fpu/multiarch/svml_s_cbrtf8_core.c |  28 +
 .../fpu/multiarch/svml_s_cbrtf8_core_avx2.S   | 509 ++++++++++++++++++
 sysdeps/x86_64/fpu/svml_d_cbrt2_core.S        |  29 +
 sysdeps/x86_64/fpu/svml_d_cbrt4_core.S        |  29 +
 sysdeps/x86_64/fpu/svml_d_cbrt4_core_avx.S    |  25 +
 sysdeps/x86_64/fpu/svml_d_cbrt8_core.S        |  25 +
 sysdeps/x86_64/fpu/svml_s_cbrtf16_core.S      |  25 +
 sysdeps/x86_64/fpu/svml_s_cbrtf4_core.S       |  29 +
 sysdeps/x86_64/fpu/svml_s_cbrtf8_core.S       |  29 +
 sysdeps/x86_64/fpu/svml_s_cbrtf8_core_avx.S   |  25 +
 .../x86_64/fpu/test-double-libmvec-cbrt-avx.c |   1 +
 .../fpu/test-double-libmvec-cbrt-avx2.c       |   1 +
 .../fpu/test-double-libmvec-cbrt-avx512f.c    |   1 +
 sysdeps/x86_64/fpu/test-double-libmvec-cbrt.c |   3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-libmvec-cbrtf-avx.c |   1 +
 .../fpu/test-float-libmvec-cbrtf-avx2.c       |   1 +
 .../fpu/test-float-libmvec-cbrtf-avx512f.c    |   1 +
 sysdeps/x86_64/fpu/test-float-libmvec-cbrtf.c |   3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
 50 files changed, 3031 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_cbrt2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_cbrt4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_cbrt4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_cbrt8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_cbrtf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_cbrtf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_cbrtf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_cbrtf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cbrt.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-cbrtf.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
index 6347320521..7f1304ed1d 100644
--- a/bits/libm-simd-decl-stubs.h
+++ b/bits/libm-simd-decl-stubs.h
@@ -197,4 +197,15 @@
 #define __DECL_SIMD_sinhf32x
 #define __DECL_SIMD_sinhf64x
 #define __DECL_SIMD_sinhf128x
+
+#define __DECL_SIMD_cbrt
+#define __DECL_SIMD_cbrtf
+#define __DECL_SIMD_cbrtl
+#define __DECL_SIMD_cbrtf16
+#define __DECL_SIMD_cbrtf32
+#define __DECL_SIMD_cbrtf64
+#define __DECL_SIMD_cbrtf128
+#define __DECL_SIMD_cbrtf32x
+#define __DECL_SIMD_cbrtf64x
+#define __DECL_SIMD_cbrtf128x
 #endif
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index 673b3a93ba..26d18f0135 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -149,7 +149,7 @@ __MATHCALL_VEC (hypot,, (_Mdouble_ __x, _Mdouble_ __y));
 
 #if defined __USE_XOPEN_EXTENDED || defined __USE_ISOC99
 /* Return the cube root of X.  */
-__MATHCALL (cbrt,, (_Mdouble_ __x));
+__MATHCALL_VEC (cbrt,, (_Mdouble_ __x));
 #endif
 
 
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index f9d7b085ab..a6558d9810 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -49,6 +49,7 @@ GLIBC_2.22 _ZGVeN8vvv_sincos F
 GLIBC_2.35 _ZGVbN2v_acos F
 GLIBC_2.35 _ZGVbN2v_asin F
 GLIBC_2.35 _ZGVbN2v_atan F
+GLIBC_2.35 _ZGVbN2v_cbrt F
 GLIBC_2.35 _ZGVbN2v_cosh F
 GLIBC_2.35 _ZGVbN2v_exp10 F
 GLIBC_2.35 _ZGVbN2v_exp2 F
@@ -58,6 +59,7 @@ GLIBC_2.35 _ZGVbN2vv_hypot F
 GLIBC_2.35 _ZGVbN4v_acosf F
 GLIBC_2.35 _ZGVbN4v_asinf F
 GLIBC_2.35 _ZGVbN4v_atanf F
+GLIBC_2.35 _ZGVbN4v_cbrtf F
 GLIBC_2.35 _ZGVbN4v_coshf F
 GLIBC_2.35 _ZGVbN4v_exp10f F
 GLIBC_2.35 _ZGVbN4v_exp2f F
@@ -67,6 +69,7 @@ GLIBC_2.35 _ZGVbN4vv_hypotf F
 GLIBC_2.35 _ZGVcN4v_acos F
 GLIBC_2.35 _ZGVcN4v_asin F
 GLIBC_2.35 _ZGVcN4v_atan F
+GLIBC_2.35 _ZGVcN4v_cbrt F
 GLIBC_2.35 _ZGVcN4v_cosh F
 GLIBC_2.35 _ZGVcN4v_exp10 F
 GLIBC_2.35 _ZGVcN4v_exp2 F
@@ -76,6 +79,7 @@ GLIBC_2.35 _ZGVcN4vv_hypot F
 GLIBC_2.35 _ZGVcN8v_acosf F
 GLIBC_2.35 _ZGVcN8v_asinf F
 GLIBC_2.35 _ZGVcN8v_atanf F
+GLIBC_2.35 _ZGVcN8v_cbrtf F
 GLIBC_2.35 _ZGVcN8v_coshf F
 GLIBC_2.35 _ZGVcN8v_exp10f F
 GLIBC_2.35 _ZGVcN8v_exp2f F
@@ -85,6 +89,7 @@ GLIBC_2.35 _ZGVcN8vv_hypotf F
 GLIBC_2.35 _ZGVdN4v_acos F
 GLIBC_2.35 _ZGVdN4v_asin F
 GLIBC_2.35 _ZGVdN4v_atan F
+GLIBC_2.35 _ZGVdN4v_cbrt F
 GLIBC_2.35 _ZGVdN4v_cosh F
 GLIBC_2.35 _ZGVdN4v_exp10 F
 GLIBC_2.35 _ZGVdN4v_exp2 F
@@ -94,6 +99,7 @@ GLIBC_2.35 _ZGVdN4vv_hypot F
 GLIBC_2.35 _ZGVdN8v_acosf F
 GLIBC_2.35 _ZGVdN8v_asinf F
 GLIBC_2.35 _ZGVdN8v_atanf F
+GLIBC_2.35 _ZGVdN8v_cbrtf F
 GLIBC_2.35 _ZGVdN8v_coshf F
 GLIBC_2.35 _ZGVdN8v_exp10f F
 GLIBC_2.35 _ZGVdN8v_exp2f F
@@ -103,6 +109,7 @@ GLIBC_2.35 _ZGVdN8vv_hypotf F
 GLIBC_2.35 _ZGVeN16v_acosf F
 GLIBC_2.35 _ZGVeN16v_asinf F
 GLIBC_2.35 _ZGVeN16v_atanf F
+GLIBC_2.35 _ZGVeN16v_cbrtf F
 GLIBC_2.35 _ZGVeN16v_coshf F
 GLIBC_2.35 _ZGVeN16v_exp10f F
 GLIBC_2.35 _ZGVeN16v_exp2f F
@@ -112,6 +119,7 @@ GLIBC_2.35 _ZGVeN16vv_hypotf F
 GLIBC_2.35 _ZGVeN8v_acos F
 GLIBC_2.35 _ZGVeN8v_asin F
 GLIBC_2.35 _ZGVeN8v_atan F
+GLIBC_2.35 _ZGVeN8v_cbrt F
 GLIBC_2.35 _ZGVeN8v_cosh F
 GLIBC_2.35 _ZGVeN8v_exp10 F
 GLIBC_2.35 _ZGVeN8v_exp2 F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index 51a41cfebc..dcd45934ab 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -94,6 +94,10 @@
 #  define __DECL_SIMD_sinh __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_sinhf
 #  define __DECL_SIMD_sinhf __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_cbrt
+#  define __DECL_SIMD_cbrt __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_cbrtf
+#  define __DECL_SIMD_cbrtf __DECL_SIMD_x86_64
 
 # endif
 #endif
diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
index 91e9b4fc83..dfb5f13ea3 100644
--- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
@@ -46,6 +46,8 @@
 !GCC$ builtin (expm1f) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (sinh) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (sinhf) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (cbrt) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (cbrtf) attributes simd (notinbranch) if('x86_64')
 
 !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
@@ -77,3 +79,5 @@
 !GCC$ builtin (expm1f) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (sinh) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (sinhf) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (cbrt) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (cbrtf) attributes simd (notinbranch) if('x32')
diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
index 81e9fc95b2..dde737c0d6 100644
--- a/sysdeps/x86_64/fpu/Makeconfig
+++ b/sysdeps/x86_64/fpu/Makeconfig
@@ -25,6 +25,7 @@ libmvec-funcs = \
   acos \
   asin \
   atan \
+  cbrt \
   cos \
   cosh \
   exp \
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index 2710446d12..b70aeb3e2f 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -17,6 +17,7 @@ libmvec {
     _ZGVbN2v_acos; _ZGVcN4v_acos; _ZGVdN4v_acos; _ZGVeN8v_acos;
     _ZGVbN2v_asin; _ZGVcN4v_asin; _ZGVdN4v_asin; _ZGVeN8v_asin;
     _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan;
+    _ZGVbN2v_cbrt; _ZGVcN4v_cbrt; _ZGVdN4v_cbrt; _ZGVeN8v_cbrt;
     _ZGVbN2v_cosh; _ZGVcN4v_cosh; _ZGVdN4v_cosh; _ZGVeN8v_cosh;
     _ZGVbN2v_exp10; _ZGVcN4v_exp10; _ZGVdN4v_exp10; _ZGVeN8v_exp10;
     _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
@@ -26,6 +27,7 @@ libmvec {
     _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
     _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
     _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf;
+    _ZGVbN4v_cbrtf; _ZGVcN8v_cbrtf; _ZGVdN8v_cbrtf; _ZGVeN16v_cbrtf;
     _ZGVbN4v_coshf; _ZGVcN8v_coshf; _ZGVdN8v_coshf; _ZGVeN16v_coshf;
     _ZGVbN4v_exp10f; _ZGVcN8v_exp10f; _ZGVdN8v_exp10f; _ZGVeN16v_exp10f;
     _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index f4b313119d..e039a993df 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -583,6 +583,26 @@ float: 1
 float128: 1
 ldouble: 1
 
+Function: "cbrt_vlen16":
+float: 1
+
+Function: "cbrt_vlen2":
+double: 1
+
+Function: "cbrt_vlen4":
+double: 1
+float: 2
+
+Function: "cbrt_vlen4_avx2":
+double: 1
+
+Function: "cbrt_vlen8":
+double: 1
+float: 2
+
+Function: "cbrt_vlen8_avx2":
+float: 2
+
 Function: Real part of "ccos":
 double: 1
 float: 1
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core-sse2.S
new file mode 100644
index 0000000000..60f4c46a11
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized cbrt, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN2v_cbrt _ZGVbN2v_cbrt_sse2
+#include "../svml_d_cbrt2_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core.c
new file mode 100644
index 0000000000..07390b7150
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized cbrt, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN2v_cbrt
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN2v_cbrt, __GI__ZGVbN2v_cbrt, __redirect__ZGVbN2v_cbrt)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core_sse4.S
new file mode 100644
index 0000000000..72ecb25e05
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core_sse4.S
@@ -0,0 +1,467 @@
+/* Function cbrt vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   x=2^{3*k+j} * 1.b1 b2 ... b5 b6 ... b52
+ *   Let r=(x*2^{-3k-j} - 1.b1 b2 ... b5 1)* rcp[b1 b2 ..b5],
+ *   where rcp[b1 b2 .. b5]=1/(1.b1 b2 b3 b4 b5 1) in double precision
+ *   cbrt(2^j * 1. b1 b2 .. b5 1) is approximated as T[j][b1..b5]+D[j][b1..b5]
+ *   (T stores the high 53 bits, D stores the low order bits)
+ *   Result=2^k*T+(2^k*T*r)*P+2^k*D
+ *   where P=p1+p2*r+..+p8*r^7
+ *
+ */
+
+/* Offsets for data table __svml_dcbrt_data_internal
+ */
+#define _dRcp                         	0
+#define _dCbrtHiLo                    	256
+#define _dA7                          	1024
+#define _dA6                          	1040
+#define _dA5                          	1056
+#define _dA4                          	1072
+#define _dA3                          	1088
+#define _dA2                          	1104
+#define _dA1                          	1120
+#define _dNeg65Div64                  	1136
+#define _dSgnf6Mask                   	1152
+#define _dNegOne                      	1168
+#define _dMantissaMask                	1184
+#define _lExpHiMask                   	1200
+#define _lExpLoMask                   	1216
+#define _l1556                        	1232
+#define _iRcpIndexMask                	1248
+#define _iAbsMask                     	1264
+#define _iSignMask                    	1280
+#define _iBias                        	1296
+#define _iSub                         	1312
+#define _iCmp                         	1328
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN2v_cbrt_sse4)
+        subq      $72, %rsp
+        cfi_def_cfa_offset(80)
+
+/* Calculate CbrtIndex */
+        movaps    %xmm0, %xmm10
+        psrlq     $52, %xmm10
+
+/* Load 1/(1+iRcpIndex/32+1/64) reciprocal table value */
+        lea       __svml_dcbrt_data_internal(%rip), %r8
+        pand      _lExpLoMask+__svml_dcbrt_data_internal(%rip), %xmm10
+        movdqu    _l1556+__svml_dcbrt_data_internal(%rip), %xmm9
+        pmuludq   %xmm10, %xmm9
+
+/* If the exponent field is zero - go to callout to process denormals */
+        movq      _iAbsMask+__svml_dcbrt_data_internal(%rip), %xmm7
+
+/* Calculate Rcp table index */
+        movq      _iRcpIndexMask+__svml_dcbrt_data_internal(%rip), %xmm13
+
+/* Get iX - high part of argument */
+        pshufd    $221, %xmm0, %xmm4
+
+/*
+ * Declarations
+ * Load constants
+ */
+        movq      _iSignMask+__svml_dcbrt_data_internal(%rip), %xmm1
+        pand      %xmm4, %xmm7
+        pand      %xmm4, %xmm13
+
+/* Compute 2^k */
+        psrld     $20, %xmm4
+        movq      _iBias+__svml_dcbrt_data_internal(%rip), %xmm2
+        pand      %xmm1, %xmm4
+        pshufd    $136, %xmm9, %xmm15
+        por       %xmm2, %xmm4
+        psrld     $14, %xmm15
+        psrld     $12, %xmm13
+        paddd     %xmm15, %xmm4
+        pxor      %xmm2, %xmm2
+        pslld     $20, %xmm4
+        movdqa    %xmm15, %xmm11
+        movd      %xmm13, %edx
+        paddd     %xmm15, %xmm11
+        pshufd    $1, %xmm13, %xmm8
+        punpckldq %xmm4, %xmm2
+
+/*
+ * VAND( L, l2k, = l2k, lExpHiMask );
+ * Argument reduction Z
+ */
+        movups    _dMantissaMask+__svml_dcbrt_data_internal(%rip), %xmm1
+        movups    _dSgnf6Mask+__svml_dcbrt_data_internal(%rip), %xmm4
+        andps     %xmm0, %xmm1
+        movd      %xmm8, %ecx
+        andps     %xmm0, %xmm4
+        orps      _dNegOne+__svml_dcbrt_data_internal(%rip), %xmm1
+        orps      _dNeg65Div64+__svml_dcbrt_data_internal(%rip), %xmm4
+        movslq    %edx, %rdx
+        subpd     %xmm4, %xmm1
+        movslq    %ecx, %rcx
+        movsd     (%r8,%rdx), %xmm3
+        movq      _iSub+__svml_dcbrt_data_internal(%rip), %xmm5
+        psubd     %xmm5, %xmm7
+        movhpd    (%r8,%rcx), %xmm3
+        mulpd     %xmm1, %xmm3
+
+/* Polynomial */
+        movups    _dA7+__svml_dcbrt_data_internal(%rip), %xmm5
+        mulpd     %xmm3, %xmm5
+        addpd     _dA6+__svml_dcbrt_data_internal(%rip), %xmm5
+        mulpd     %xmm3, %xmm5
+        addpd     _dA5+__svml_dcbrt_data_internal(%rip), %xmm5
+        mulpd     %xmm3, %xmm5
+        addpd     _dA4+__svml_dcbrt_data_internal(%rip), %xmm5
+        mulpd     %xmm3, %xmm5
+        addpd     _dA3+__svml_dcbrt_data_internal(%rip), %xmm5
+        pshufd    $136, %xmm10, %xmm12
+        psubd     %xmm15, %xmm12
+        psubd     %xmm11, %xmm12
+        mulpd     %xmm3, %xmm5
+        pslld     $8, %xmm12
+        paddd     %xmm12, %xmm13
+
+/* Load cbrt(2^j*(1+iRcpIndex/32+1/64)) Hi & Lo values */
+        movd      %xmm13, %esi
+        pshufd    $1, %xmm13, %xmm14
+        movq      _iCmp+__svml_dcbrt_data_internal(%rip), %xmm6
+        movd      %xmm14, %edi
+        pcmpgtd   %xmm6, %xmm7
+        movmskps  %xmm7, %eax
+        addpd     _dA2+__svml_dcbrt_data_internal(%rip), %xmm5
+        movslq    %esi, %rsi
+        movslq    %edi, %rdi
+        mulpd     %xmm3, %xmm5
+        movsd     256(%r8,%rsi), %xmm6
+        movhpd    256(%r8,%rdi), %xmm6
+
+/* THi*2^k, TLo*2^k */
+        mulpd     %xmm2, %xmm6
+        addpd     _dA1+__svml_dcbrt_data_internal(%rip), %xmm5
+
+/* THi*2^k*Z */
+        mulpd     %xmm6, %xmm3
+
+/* Final reconstruction */
+        mulpd     %xmm3, %xmm5
+        addpd     %xmm5, %xmm6
+        andl      $3, %eax
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 eax xmm0 xmm6
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movaps    %xmm6, %xmm0
+        addq      $72, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(80)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm0, 32(%rsp)
+        movups    %xmm6, 48(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 eax xmm6
+
+        xorl      %edx, %edx
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -64)
+        movl      %edx, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -72)
+        movl      %eax, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $2, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm6
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -64)
+        cfi_offset(13, -72)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm6
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      cbrt@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 48(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN2v_cbrt_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_dcbrt_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(16)) VUINT32 _dRcp[32][2];
+        __declspec(align(16)) VUINT32 _dCbrtHiLo[96][2];
+        __declspec(align(16)) VUINT32 _dA7[2][2];
+        __declspec(align(16)) VUINT32 _dA6[2][2];
+        __declspec(align(16)) VUINT32 _dA5[2][2];
+        __declspec(align(16)) VUINT32 _dA4[2][2];
+        __declspec(align(16)) VUINT32 _dA3[2][2];
+        __declspec(align(16)) VUINT32 _dA2[2][2];
+        __declspec(align(16)) VUINT32 _dA1[2][2];
+        __declspec(align(16)) VUINT32 _dNeg65Div64[2][2];
+        __declspec(align(16)) VUINT32 _dSgnf6Mask[2][2];
+        __declspec(align(16)) VUINT32 _dNegOne[2][2];
+        __declspec(align(16)) VUINT32 _dMantissaMask[2][2];
+        __declspec(align(16)) VUINT32 _lExpHiMask[2][2];
+        __declspec(align(16)) VUINT32 _lExpLoMask[2][2];
+        __declspec(align(16)) VUINT32 _l1556[2][2];
+        __declspec(align(16)) VUINT32 _iRcpIndexMask[4][1];
+        __declspec(align(16)) VUINT32 _iAbsMask[4][1];
+        __declspec(align(16)) VUINT32 _iSignMask[4][1];
+        __declspec(align(16)) VUINT32 _iBias[4][1];
+        __declspec(align(16)) VUINT32 _iSub[4][1];
+        __declspec(align(16)) VUINT32 _iCmp[4][1];
+} __svml_dcbrt_data_internal;
+#endif
+__svml_dcbrt_data_internal:
+        /*== _dRcp ==*/
+        .quad 0xBFEF81F81F81F820  /* (1/(1+0/32+1/64)) = -.984615 */
+        .quad 0xBFEE9131ABF0B767  /* (1/(1+1/32+1/64)) = -.955224 */
+        .quad 0xBFEDAE6076B981DB  /* (1/(1+2/32+1/64)) = -.927536 */
+        .quad 0xBFECD85689039B0B  /* (1/(1+3/32+1/64)) = -.901408 */
+        .quad 0xBFEC0E070381C0E0  /* (1/(1+4/32+1/64)) = -.876712 */
+        .quad 0xBFEB4E81B4E81B4F  /* (1/(1+5/32+1/64)) = -.853333 */
+        .quad 0xBFEA98EF606A63BE  /* (1/(1+6/32+1/64)) = -.831169 */
+        .quad 0xBFE9EC8E951033D9  /* (1/(1+7/32+1/64)) = -.810127 */
+        .quad 0xBFE948B0FCD6E9E0  /* (1/(1+8/32+1/64)) = -.790123 */
+        .quad 0xBFE8ACB90F6BF3AA  /* (1/(1+9/32+1/64)) = -.771084 */
+        .quad 0xBFE8181818181818  /* (1/(1+10/32+1/64)) = -.752941 */
+        .quad 0xBFE78A4C8178A4C8  /* (1/(1+11/32+1/64)) = -.735632 */
+        .quad 0xBFE702E05C0B8170  /* (1/(1+12/32+1/64)) = -.719101 */
+        .quad 0xBFE6816816816817  /* (1/(1+13/32+1/64)) = -.703297 */
+        .quad 0xBFE6058160581606  /* (1/(1+14/32+1/64)) = -.688172 */
+        .quad 0xBFE58ED2308158ED  /* (1/(1+15/32+1/64)) = -.673684 */
+        .quad 0xBFE51D07EAE2F815  /* (1/(1+16/32+1/64)) = -.659794 */
+        .quad 0xBFE4AFD6A052BF5B  /* (1/(1+17/32+1/64)) = -.646465 */
+        .quad 0xBFE446F86562D9FB  /* (1/(1+18/32+1/64)) = -.633663 */
+        .quad 0xBFE3E22CBCE4A902  /* (1/(1+19/32+1/64)) = -.621359 */
+        .quad 0xBFE3813813813814  /* (1/(1+20/32+1/64)) = -.609524 */
+        .quad 0xBFE323E34A2B10BF  /* (1/(1+21/32+1/64)) = -.598131 */
+        .quad 0xBFE2C9FB4D812CA0  /* (1/(1+22/32+1/64)) = -.587156 */
+        .quad 0xBFE27350B8812735  /* (1/(1+23/32+1/64)) = -.576577 */
+        .quad 0xBFE21FB78121FB78  /* (1/(1+24/32+1/64)) = -.566372 */
+        .quad 0xBFE1CF06ADA2811D  /* (1/(1+25/32+1/64)) = -.556522 */
+        .quad 0xBFE1811811811812  /* (1/(1+26/32+1/64)) = -.547009 */
+        .quad 0xBFE135C81135C811  /* (1/(1+27/32+1/64)) = -.537815 */
+        .quad 0xBFE0ECF56BE69C90  /* (1/(1+28/32+1/64)) = -.528926 */
+        .quad 0xBFE0A6810A6810A7  /* (1/(1+29/32+1/64)) = -.520325 */
+        .quad 0xBFE0624DD2F1A9FC  /* (1/(1+30/32+1/64)) = -.512    */
+        .quad 0xBFE0204081020408  /* (1/(1+31/32+1/64)) = -.503937 */
+        /*== _dCbrtHiLo ==*/
+        .align 16
+        .quad 0x3FF01539221D4C97    /* HI((2^0*(1+0/32+1/64))^(1/3)) = 1.005181 */
+        .quad 0x3FF03F06771A2E33    /* HI((2^0*(1+1/32+1/64))^(1/3)) = 1.015387 */
+        .quad 0x3FF06800E629D671    /* HI((2^0*(1+2/32+1/64))^(1/3)) = 1.025391 */
+        .quad 0x3FF090328731DEB2    /* HI((2^0*(1+3/32+1/64))^(1/3)) = 1.035204 */
+        .quad 0x3FF0B7A4B1BD64AC    /* HI((2^0*(1+4/32+1/64))^(1/3)) = 1.044835 */
+        .quad 0x3FF0DE601024FB87    /* HI((2^0*(1+5/32+1/64))^(1/3)) = 1.054291 */
+        .quad 0x3FF1046CB0597000    /* HI((2^0*(1+6/32+1/64))^(1/3)) = 1.06358  */
+        .quad 0x3FF129D212A9BA9B    /* HI((2^0*(1+7/32+1/64))^(1/3)) = 1.07271  */
+        .quad 0x3FF14E9736CDAF38    /* HI((2^0*(1+8/32+1/64))^(1/3)) = 1.081687 */
+        .quad 0x3FF172C2A772F507    /* HI((2^0*(1+9/32+1/64))^(1/3)) = 1.090518 */
+        .quad 0x3FF1965A848001D3    /* HI((2^0*(1+10/32+1/64))^(1/3)) = 1.099207 */
+        .quad 0x3FF1B9648C38C55D    /* HI((2^0*(1+11/32+1/64))^(1/3)) = 1.107762 */
+        .quad 0x3FF1DBE6236A0C45    /* HI((2^0*(1+12/32+1/64))^(1/3)) = 1.116186 */
+        .quad 0x3FF1FDE45CBB1F9F    /* HI((2^0*(1+13/32+1/64))^(1/3)) = 1.124485 */
+        .quad 0x3FF21F63FF409042    /* HI((2^0*(1+14/32+1/64))^(1/3)) = 1.132664 */
+        .quad 0x3FF240698C6746E5    /* HI((2^0*(1+15/32+1/64))^(1/3)) = 1.140726 */
+        .quad 0x3FF260F9454BB99B    /* HI((2^0*(1+16/32+1/64))^(1/3)) = 1.148675 */
+        .quad 0x3FF281172F8E7073    /* HI((2^0*(1+17/32+1/64))^(1/3)) = 1.156516 */
+        .quad 0x3FF2A0C719B4B6D0    /* HI((2^0*(1+18/32+1/64))^(1/3)) = 1.164252 */
+        .quad 0x3FF2C00C9F2263EC    /* HI((2^0*(1+19/32+1/64))^(1/3)) = 1.171887 */
+        .quad 0x3FF2DEEB2BB7FB78    /* HI((2^0*(1+20/32+1/64))^(1/3)) = 1.179423 */
+        .quad 0x3FF2FD65FF1EFBBC    /* HI((2^0*(1+21/32+1/64))^(1/3)) = 1.186865 */
+        .quad 0x3FF31B802FCCF6A2    /* HI((2^0*(1+22/32+1/64))^(1/3)) = 1.194214 */
+        .quad 0x3FF3393CADC50708    /* HI((2^0*(1+23/32+1/64))^(1/3)) = 1.201474 */
+        .quad 0x3FF3569E451E4C2A    /* HI((2^0*(1+24/32+1/64))^(1/3)) = 1.208647 */
+        .quad 0x3FF373A7A0554CDE    /* HI((2^0*(1+25/32+1/64))^(1/3)) = 1.215736 */
+        .quad 0x3FF3905B4A6D76CE    /* HI((2^0*(1+26/32+1/64))^(1/3)) = 1.222743 */
+        .quad 0x3FF3ACBBB0E756B6    /* HI((2^0*(1+27/32+1/64))^(1/3)) = 1.229671 */
+        .quad 0x3FF3C8CB258FA340    /* HI((2^0*(1+28/32+1/64))^(1/3)) = 1.236522 */
+        .quad 0x3FF3E48BE02AC0CE    /* HI((2^0*(1+29/32+1/64))^(1/3)) = 1.243297 */
+        .quad 0x3FF4000000000000    /* HI((2^0*(1+30/32+1/64))^(1/3)) = 1.25     */
+        .quad 0x3FF41B298D47800E    /* HI((2^0*(1+31/32+1/64))^(1/3)) = 1.256631 */
+        .quad 0x3FF443604B34D9B2    /* HI((2^1*(1+0/32+1/64))^(1/3)) = 1.266449 */
+        .quad 0x3FF4780B20906571    /* HI((2^1*(1+1/32+1/64))^(1/3)) = 1.279307 */
+        .quad 0x3FF4ABAC3EE06706    /* HI((2^1*(1+2/32+1/64))^(1/3)) = 1.291912 */
+        .quad 0x3FF4DE505DA66B8D    /* HI((2^1*(1+3/32+1/64))^(1/3)) = 1.304276 */
+        .quad 0x3FF51003420A5C07    /* HI((2^1*(1+4/32+1/64))^(1/3)) = 1.316409 */
+        .quad 0x3FF540CFD6FD11C1    /* HI((2^1*(1+5/32+1/64))^(1/3)) = 1.328323 */
+        .quad 0x3FF570C04260716B    /* HI((2^1*(1+6/32+1/64))^(1/3)) = 1.340027 */
+        .quad 0x3FF59FDDF7A45F38    /* HI((2^1*(1+7/32+1/64))^(1/3)) = 1.35153  */
+        .quad 0x3FF5CE31C83539DF    /* HI((2^1*(1+8/32+1/64))^(1/3)) = 1.36284  */
+        .quad 0x3FF5FBC3F20966A4    /* HI((2^1*(1+9/32+1/64))^(1/3)) = 1.373966 */
+        .quad 0x3FF6289C2C8F1B70    /* HI((2^1*(1+10/32+1/64))^(1/3)) = 1.384915 */
+        .quad 0x3FF654C1B4316DCF    /* HI((2^1*(1+11/32+1/64))^(1/3)) = 1.395693 */
+        .quad 0x3FF6803B54A34E44    /* HI((2^1*(1+12/32+1/64))^(1/3)) = 1.406307 */
+        .quad 0x3FF6AB0F72182659    /* HI((2^1*(1+13/32+1/64))^(1/3)) = 1.416763 */
+        .quad 0x3FF6D544118C08BC    /* HI((2^1*(1+14/32+1/64))^(1/3)) = 1.427067 */
+        .quad 0x3FF6FEDEE0388D4A    /* HI((2^1*(1+15/32+1/64))^(1/3)) = 1.437224 */
+        .quad 0x3FF727E53A4F645E    /* HI((2^1*(1+16/32+1/64))^(1/3)) = 1.44724  */
+        .quad 0x3FF7505C31104114    /* HI((2^1*(1+17/32+1/64))^(1/3)) = 1.457119 */
+        .quad 0x3FF77848904CD549    /* HI((2^1*(1+18/32+1/64))^(1/3)) = 1.466866 */
+        .quad 0x3FF79FAEE36B2534    /* HI((2^1*(1+19/32+1/64))^(1/3)) = 1.476485 */
+        .quad 0x3FF7C69379F4605B    /* HI((2^1*(1+20/32+1/64))^(1/3)) = 1.48598  */
+        .quad 0x3FF7ECFA6BBCA391    /* HI((2^1*(1+21/32+1/64))^(1/3)) = 1.495356 */
+        .quad 0x3FF812E79CAE7EB9    /* HI((2^1*(1+22/32+1/64))^(1/3)) = 1.504615 */
+        .quad 0x3FF8385EC043C71D    /* HI((2^1*(1+23/32+1/64))^(1/3)) = 1.513762 */
+        .quad 0x3FF85D635CB41B9D    /* HI((2^1*(1+24/32+1/64))^(1/3)) = 1.5228   */
+        .quad 0x3FF881F8CDE083DB    /* HI((2^1*(1+25/32+1/64))^(1/3)) = 1.531731 */
+        .quad 0x3FF8A6224802B8A8    /* HI((2^1*(1+26/32+1/64))^(1/3)) = 1.54056  */
+        .quad 0x3FF8C9E2DA25E5E4    /* HI((2^1*(1+27/32+1/64))^(1/3)) = 1.549289 */
+        .quad 0x3FF8ED3D706E1010    /* HI((2^1*(1+28/32+1/64))^(1/3)) = 1.55792  */
+        .quad 0x3FF91034D632B6DF    /* HI((2^1*(1+29/32+1/64))^(1/3)) = 1.566457 */
+        .quad 0x3FF932CBB7F0CF2D    /* HI((2^1*(1+30/32+1/64))^(1/3)) = 1.574901 */
+        .quad 0x3FF95504A517BF3A    /* HI((2^1*(1+31/32+1/64))^(1/3)) = 1.583256 */
+        .quad 0x3FF987AF34F8BB19    /* HI((2^2*(1+0/32+1/64))^(1/3)) = 1.595626 */
+        .quad 0x3FF9CA0A8337B317    /* HI((2^2*(1+1/32+1/64))^(1/3)) = 1.611826 */
+        .quad 0x3FFA0B1709CC13D5    /* HI((2^2*(1+2/32+1/64))^(1/3)) = 1.627708 */
+        .quad 0x3FFA4AE4CE6419ED    /* HI((2^2*(1+3/32+1/64))^(1/3)) = 1.643285 */
+        .quad 0x3FFA8982A5567031    /* HI((2^2*(1+4/32+1/64))^(1/3)) = 1.658572 */
+        .quad 0x3FFAC6FE500AB570    /* HI((2^2*(1+5/32+1/64))^(1/3)) = 1.673582 */
+        .quad 0x3FFB036497A15A17    /* HI((2^2*(1+6/32+1/64))^(1/3)) = 1.688328 */
+        .quad 0x3FFB3EC164671755    /* HI((2^2*(1+7/32+1/64))^(1/3)) = 1.702821 */
+        .quad 0x3FFB791FD288C46F    /* HI((2^2*(1+8/32+1/64))^(1/3)) = 1.717071 */
+        .quad 0x3FFBB28A44693BE4    /* HI((2^2*(1+9/32+1/64))^(1/3)) = 1.731089 */
+        .quad 0x3FFBEB0A72EB6E31    /* HI((2^2*(1+10/32+1/64))^(1/3)) = 1.744883 */
+        .quad 0x3FFC22A97BF5F697    /* HI((2^2*(1+11/32+1/64))^(1/3)) = 1.758462 */
+        .quad 0x3FFC596FEF6AF983    /* HI((2^2*(1+12/32+1/64))^(1/3)) = 1.771835 */
+        .quad 0x3FFC8F65DAC655A3    /* HI((2^2*(1+13/32+1/64))^(1/3)) = 1.785009 */
+        .quad 0x3FFCC492D38CE8D9    /* HI((2^2*(1+14/32+1/64))^(1/3)) = 1.797992 */
+        .quad 0x3FFCF8FE00B19367    /* HI((2^2*(1+15/32+1/64))^(1/3)) = 1.810789 */
+        .quad 0x3FFD2CAE230F8709    /* HI((2^2*(1+16/32+1/64))^(1/3)) = 1.823408 */
+        .quad 0x3FFD5FA99D15208F    /* HI((2^2*(1+17/32+1/64))^(1/3)) = 1.835855 */
+        .quad 0x3FFD91F679B6E505    /* HI((2^2*(1+18/32+1/64))^(1/3)) = 1.848135 */
+        .quad 0x3FFDC39A72BF2302    /* HI((2^2*(1+19/32+1/64))^(1/3)) = 1.860255 */
+        .quad 0x3FFDF49AF68C1570    /* HI((2^2*(1+20/32+1/64))^(1/3)) = 1.872218 */
+        .quad 0x3FFE24FD2D4C23B8    /* HI((2^2*(1+21/32+1/64))^(1/3)) = 1.884031 */
+        .quad 0x3FFE54C5FDC5EC73    /* HI((2^2*(1+22/32+1/64))^(1/3)) = 1.895697 */
+        .quad 0x3FFE83FA11B81DBB    /* HI((2^2*(1+23/32+1/64))^(1/3)) = 1.907221 */
+        .quad 0x3FFEB29DD9DBAF25    /* HI((2^2*(1+24/32+1/64))^(1/3)) = 1.918608 */
+        .quad 0x3FFEE0B59191D374    /* HI((2^2*(1+25/32+1/64))^(1/3)) = 1.929861 */
+        .quad 0x3FFF0E454245E4BF    /* HI((2^2*(1+26/32+1/64))^(1/3)) = 1.940984 */
+        .quad 0x3FFF3B50C68A9DD3    /* HI((2^2*(1+27/32+1/64))^(1/3)) = 1.951981 */
+        .quad 0x3FFF67DBCCF922DC    /* HI((2^2*(1+28/32+1/64))^(1/3)) = 1.962856 */
+        .quad 0x3FFF93E9DAD7A4A6    /* HI((2^2*(1+29/32+1/64))^(1/3)) = 1.973612 */
+        .quad 0x3FFFBF7E4E8CC9CB    /* HI((2^2*(1+30/32+1/64))^(1/3)) = 1.984251 */
+        .quad 0x3FFFEA9C61E47CD3    /* HI((2^2*(1+31/32+1/64))^(1/3)) = 1.994778 */
+        .align 16
+        .quad 0x3F93750AD588F115, 0x3F93750AD588F115      /* _dA7 */
+        .align 16
+        .quad 0xBF98090D6221A247, 0xBF98090D6221A247      /* _dA6 */
+        .align 16
+        .quad 0x3F9EE7113506AC12, 0x3F9EE7113506AC12      /* _dA5 */
+        .align 16
+        .quad 0xBFA511E8D2B3183B, 0xBFA511E8D2B3183B      /* _dA4 */
+        .align 16
+        .quad 0x3FAF9ADD3C0CA458, 0x3FAF9ADD3C0CA458      /* _dA3 */
+        .align 16
+        .quad 0xBFBC71C71C71C71C, 0xBFBC71C71C71C71C      /* _dA2 */
+        .align 16
+        .quad 0x3FD5555555555555, 0x3FD5555555555555      /* _dA1 */
+        .align 16
+        .quad 0xBFF0400000000000, 0xBFF0400000000000        /* _dNeg65Div64 */
+        .align 16
+        .quad 0x000FC00000000000, 0x000FC00000000000        /* _dSgnf6Mask */
+        .align 16
+        .quad 0xBFF0000000000000, 0xBFF0000000000000        /* _dNegOne */
+        .align 16
+        .quad 0x000FFFFFFFFFFFFF, 0x000FFFFFFFFFFFFF        /* _dMantissaMask */
+        .align 16
+        .quad 0xFFF0000000000000, 0xFFF0000000000000        /* _lExpHiMask */
+        .align 16
+        .quad 0x00000000000007FF, 0x00000000000007FF        /* _lExpLoMask */
+        .align 16
+        .quad 0x0000000000001556, 0x0000000000001556        /* _l1556 */
+        .align 16
+        .long 0x000F8000, 0x000F8000, 0x000F8000, 0x000F8000    /* _iRcpIndexMask */
+        .align 16
+        .long 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF    /* _iAbsMask */
+        .align 16
+        .long 0x00000800, 0x00000800, 0x00000800, 0x00000800    /* _iSignMask */
+        .align 16
+        .long 0x000002AA, 0x000002AA, 0x000002AA, 0x000002AA    /* _iBias */
+        .align 16
+        .long 0x80100000, 0x80100000, 0x80100000, 0x80100000    /* _iSub */
+        .align 16
+        .long 0xffdfffff, 0xffdfffff, 0xffdfffff, 0xffdfffff    /* _iCmp */
+        .align 16
+        .type	__svml_dcbrt_data_internal,@object
+        .size	__svml_dcbrt_data_internal,.-__svml_dcbrt_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core-sse.S
new file mode 100644
index 0000000000..3b54f31fbc
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized cbrt, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN4v_cbrt _ZGVdN4v_cbrt_sse_wrapper
+#include "../svml_d_cbrt4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core.c
new file mode 100644
index 0000000000..0b135877aa
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized cbrt, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN4v_cbrt
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN4v_cbrt, __GI__ZGVdN4v_cbrt, __redirect__ZGVdN4v_cbrt)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core_avx2.S
new file mode 100644
index 0000000000..2223c5309f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core_avx2.S
@@ -0,0 +1,505 @@
+/* Function cbrt vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   x=2^{3*k+j} * 1.b1 b2 ... b5 b6 ... b52
+ *   Let r=(x*2^{-3k-j} - 1.b1 b2 ... b5 1)* rcp[b1 b2 ..b5],
+ *   where rcp[b1 b2 .. b5]=1/(1.b1 b2 b3 b4 b5 1) in double precision
+ *   cbrt(2^j * 1. b1 b2 .. b5 1) is approximated as T[j][b1..b5]+D[j][b1..b5]
+ *   (T stores the high 53 bits, D stores the low order bits)
+ *   Result=2^k*T+(2^k*T*r)*P+2^k*D
+ *   where P=p1+p2*r+..+p8*r^7
+ *
+ */
+
+/* Offsets for data table __svml_dcbrt_data_internal
+ */
+#define _dRcp                         	0
+#define _dCbrtHiLo                    	256
+#define _dA7                          	1024
+#define _dA6                          	1056
+#define _dA5                          	1088
+#define _dA4                          	1120
+#define _dA3                          	1152
+#define _dA2                          	1184
+#define _dA1                          	1216
+#define _dNeg65Div64                  	1248
+#define _dSgnf6Mask                   	1280
+#define _dNegOne                      	1312
+#define _dMantissaMask                	1344
+#define _lExpHiMask                   	1376
+#define _lExpLoMask                   	1408
+#define _l1556                        	1440
+#define _iRcpIndexMask                	1472
+#define _iAbsMask                     	1504
+#define _iSignMask                    	1536
+#define _iBias                        	1568
+#define _iSub                         	1600
+#define _iCmp                         	1632
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN4v_cbrt_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+
+/* Load 1/(1+iRcpIndex/32+1/64) reciprocal table value */
+        lea       __svml_dcbrt_data_internal(%rip), %rax
+        vmovapd   %ymm0, %ymm5
+
+/*
+ * Declarations
+ * Load constants
+ * Get iX - high part of argument
+ */
+        vextractf128 $1, %ymm5, %xmm6
+
+/* Calculate CbrtIndex */
+        vpsrlq    $52, %ymm5, %ymm15
+        vshufps   $221, %xmm6, %xmm5, %xmm4
+
+/* Calculate Rcp table index */
+        vandps    _iRcpIndexMask+__svml_dcbrt_data_internal(%rip), %xmm4, %xmm10
+        vpsrld    $12, %xmm10, %xmm3
+        vmovd     %xmm3, %ecx
+
+/* If the exponent field is zero - go to callout to process denormals */
+        vandps    _iAbsMask+__svml_dcbrt_data_internal(%rip), %xmm4, %xmm7
+
+/* Compute 2^k */
+        vpsrld    $20, %xmm4, %xmm4
+        vpsubd    _iSub+__svml_dcbrt_data_internal(%rip), %xmm7, %xmm8
+        vandps    _lExpLoMask+__svml_dcbrt_data_internal(%rip), %ymm15, %ymm0
+        vpmuludq  _l1556+__svml_dcbrt_data_internal(%rip), %ymm0, %ymm6
+        vpextrd   $2, %xmm3, %edi
+        movslq    %ecx, %rcx
+        vpextrd   $1, %xmm3, %esi
+        movslq    %edi, %rdi
+        vpextrd   $3, %xmm3, %r8d
+        movslq    %esi, %rsi
+        movslq    %r8d, %r8
+        vpcmpgtd  _iCmp+__svml_dcbrt_data_internal(%rip), %xmm8, %xmm9
+        vmovsd    (%rax,%rcx), %xmm11
+        vmovmskps %xmm9, %edx
+        vmovsd    (%rax,%rdi), %xmm13
+        vmovhpd   (%rax,%rsi), %xmm11, %xmm12
+        vmovhpd   (%rax,%r8), %xmm13, %xmm14
+        vextractf128 $1, %ymm6, %xmm7
+        vshufps   $136, %xmm7, %xmm6, %xmm8
+        vmovups   __VUNPACK_ODD_ind1.613.0.1(%rip), %ymm7
+        vextractf128 $1, %ymm0, %xmm1
+        vshufps   $136, %xmm1, %xmm0, %xmm9
+        vpsrld    $14, %xmm8, %xmm1
+        vpsubd    %xmm1, %xmm9, %xmm10
+        vpaddd    %xmm1, %xmm1, %xmm11
+
+/*
+ * VAND( L, l2k, = l2k, lExpHiMask );
+ * Argument reduction Z
+ */
+        vandpd    _dMantissaMask+__svml_dcbrt_data_internal(%rip), %ymm5, %ymm9
+        vinsertf128 $1, %xmm14, %ymm12, %ymm2
+        vpsubd    %xmm11, %xmm10, %xmm12
+        vpslld    $8, %xmm12, %xmm13
+        vpaddd    %xmm13, %xmm3, %xmm15
+
+/* Load cbrt(2^j*(1+iRcpIndex/32+1/64)) Hi & Lo values */
+        vmovd     %xmm15, %r9d
+        vpextrd   $2, %xmm15, %r11d
+        movslq    %r9d, %r9
+        vpextrd   $1, %xmm15, %r10d
+        movslq    %r11d, %r11
+        vpextrd   $3, %xmm15, %ecx
+        movslq    %r10d, %r10
+        movslq    %ecx, %rcx
+        vmovsd    256(%rax,%r9), %xmm3
+        vmovsd    256(%rax,%r11), %xmm0
+        vandpd    _dSgnf6Mask+__svml_dcbrt_data_internal(%rip), %ymm5, %ymm10
+        vmovhpd   256(%rax,%r10), %xmm3, %xmm14
+        vmovhpd   256(%rax,%rcx), %xmm0, %xmm3
+        vorpd     _dNegOne+__svml_dcbrt_data_internal(%rip), %ymm9, %ymm11
+        vorpd     _dNeg65Div64+__svml_dcbrt_data_internal(%rip), %ymm10, %ymm12
+        vsubpd    %ymm12, %ymm11, %ymm13
+        vmulpd    %ymm13, %ymm2, %ymm2
+        vinsertf128 $1, %xmm3, %ymm14, %ymm0
+        vpand     _iSignMask+__svml_dcbrt_data_internal(%rip), %xmm4, %xmm3
+        vpor      _iBias+__svml_dcbrt_data_internal(%rip), %xmm3, %xmm4
+        vpaddd    %xmm1, %xmm4, %xmm1
+        vpslld    $20, %xmm1, %xmm6
+
+/* Polynomial */
+        vmovupd   _dA7+__svml_dcbrt_data_internal(%rip), %ymm1
+        vfmadd213pd _dA6+__svml_dcbrt_data_internal(%rip), %ymm2, %ymm1
+        vfmadd213pd _dA5+__svml_dcbrt_data_internal(%rip), %ymm2, %ymm1
+        vfmadd213pd _dA4+__svml_dcbrt_data_internal(%rip), %ymm2, %ymm1
+        vfmadd213pd _dA3+__svml_dcbrt_data_internal(%rip), %ymm2, %ymm1
+        vfmadd213pd _dA2+__svml_dcbrt_data_internal(%rip), %ymm2, %ymm1
+        vfmadd213pd _dA1+__svml_dcbrt_data_internal(%rip), %ymm2, %ymm1
+        vpermps   %ymm6, %ymm7, %ymm8
+        vandps    __VUNPACK_ODD_mask.613.0.1(%rip), %ymm8, %ymm14
+
+/* THi*2^k, TLo*2^k */
+        vmulpd    %ymm14, %ymm0, %ymm0
+
+/* THi*2^k*Z */
+        vmulpd    %ymm0, %ymm2, %ymm2
+
+/* Final reconstruction */
+        vmulpd    %ymm2, %ymm1, %ymm3
+        vaddpd    %ymm3, %ymm0, %ymm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm5
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovupd   %ymm5, 32(%rsp)
+        vmovupd   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovupd   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      cbrt@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 64(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN4v_cbrt_avx2)
+        .section .rodata, "a"
+        .align 32
+
+__VUNPACK_ODD_ind1.613.0.1:
+	.rept	3
+        .long	0
+	.endr
+        .long	1
+        .long	0
+        .long	2
+        .long	0
+        .long	3
+        .align 32
+
+__VUNPACK_ODD_mask.613.0.1:
+        .long	0
+        .long	-1
+        .long	0
+        .long	-1
+        .long	0
+        .long	-1
+        .long	0
+        .long	-1
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_dcbrt_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(32)) VUINT32 _dRcp[32][2];
+        __declspec(align(32)) VUINT32 _dCbrtHiLo[96][2];
+        __declspec(align(32)) VUINT32 _dA7[4][2];
+        __declspec(align(32)) VUINT32 _dA6[4][2];
+        __declspec(align(32)) VUINT32 _dA5[4][2];
+        __declspec(align(32)) VUINT32 _dA4[4][2];
+        __declspec(align(32)) VUINT32 _dA3[4][2];
+        __declspec(align(32)) VUINT32 _dA2[4][2];
+        __declspec(align(32)) VUINT32 _dA1[4][2];
+        __declspec(align(32)) VUINT32 _dNeg65Div64[4][2];
+        __declspec(align(32)) VUINT32 _dSgnf6Mask[4][2];
+        __declspec(align(32)) VUINT32 _dNegOne[4][2];
+        __declspec(align(32)) VUINT32 _dMantissaMask[4][2];
+        __declspec(align(32)) VUINT32 _lExpHiMask[4][2];
+        __declspec(align(32)) VUINT32 _lExpLoMask[4][2];
+        __declspec(align(32)) VUINT32 _l1556[4][2];
+        __declspec(align(32)) VUINT32 _iRcpIndexMask[8][1];
+        __declspec(align(32)) VUINT32 _iAbsMask[8][1];
+        __declspec(align(32)) VUINT32 _iSignMask[8][1];
+        __declspec(align(32)) VUINT32 _iBias[8][1];
+        __declspec(align(32)) VUINT32 _iSub[8][1];
+        __declspec(align(32)) VUINT32 _iCmp[8][1];
+} __svml_dcbrt_data_internal;
+#endif
+__svml_dcbrt_data_internal:
+        /*== _dRcp ==*/
+        .quad 0xBFEF81F81F81F820  /* (1/(1+0/32+1/64)) = -.984615 */
+        .quad 0xBFEE9131ABF0B767  /* (1/(1+1/32+1/64)) = -.955224 */
+        .quad 0xBFEDAE6076B981DB  /* (1/(1+2/32+1/64)) = -.927536 */
+        .quad 0xBFECD85689039B0B  /* (1/(1+3/32+1/64)) = -.901408 */
+        .quad 0xBFEC0E070381C0E0  /* (1/(1+4/32+1/64)) = -.876712 */
+        .quad 0xBFEB4E81B4E81B4F  /* (1/(1+5/32+1/64)) = -.853333 */
+        .quad 0xBFEA98EF606A63BE  /* (1/(1+6/32+1/64)) = -.831169 */
+        .quad 0xBFE9EC8E951033D9  /* (1/(1+7/32+1/64)) = -.810127 */
+        .quad 0xBFE948B0FCD6E9E0  /* (1/(1+8/32+1/64)) = -.790123 */
+        .quad 0xBFE8ACB90F6BF3AA  /* (1/(1+9/32+1/64)) = -.771084 */
+        .quad 0xBFE8181818181818  /* (1/(1+10/32+1/64)) = -.752941 */
+        .quad 0xBFE78A4C8178A4C8  /* (1/(1+11/32+1/64)) = -.735632 */
+        .quad 0xBFE702E05C0B8170  /* (1/(1+12/32+1/64)) = -.719101 */
+        .quad 0xBFE6816816816817  /* (1/(1+13/32+1/64)) = -.703297 */
+        .quad 0xBFE6058160581606  /* (1/(1+14/32+1/64)) = -.688172 */
+        .quad 0xBFE58ED2308158ED  /* (1/(1+15/32+1/64)) = -.673684 */
+        .quad 0xBFE51D07EAE2F815  /* (1/(1+16/32+1/64)) = -.659794 */
+        .quad 0xBFE4AFD6A052BF5B  /* (1/(1+17/32+1/64)) = -.646465 */
+        .quad 0xBFE446F86562D9FB  /* (1/(1+18/32+1/64)) = -.633663 */
+        .quad 0xBFE3E22CBCE4A902  /* (1/(1+19/32+1/64)) = -.621359 */
+        .quad 0xBFE3813813813814  /* (1/(1+20/32+1/64)) = -.609524 */
+        .quad 0xBFE323E34A2B10BF  /* (1/(1+21/32+1/64)) = -.598131 */
+        .quad 0xBFE2C9FB4D812CA0  /* (1/(1+22/32+1/64)) = -.587156 */
+        .quad 0xBFE27350B8812735  /* (1/(1+23/32+1/64)) = -.576577 */
+        .quad 0xBFE21FB78121FB78  /* (1/(1+24/32+1/64)) = -.566372 */
+        .quad 0xBFE1CF06ADA2811D  /* (1/(1+25/32+1/64)) = -.556522 */
+        .quad 0xBFE1811811811812  /* (1/(1+26/32+1/64)) = -.547009 */
+        .quad 0xBFE135C81135C811  /* (1/(1+27/32+1/64)) = -.537815 */
+        .quad 0xBFE0ECF56BE69C90  /* (1/(1+28/32+1/64)) = -.528926 */
+        .quad 0xBFE0A6810A6810A7  /* (1/(1+29/32+1/64)) = -.520325 */
+        .quad 0xBFE0624DD2F1A9FC  /* (1/(1+30/32+1/64)) = -.512    */
+        .quad 0xBFE0204081020408  /* (1/(1+31/32+1/64)) = -.503937 */
+        /*== _dCbrtHiLo ==*/
+        .align 32
+        .quad 0x3FF01539221D4C97    /* HI((2^0*(1+0/32+1/64))^(1/3)) = 1.005181 */
+        .quad 0x3FF03F06771A2E33    /* HI((2^0*(1+1/32+1/64))^(1/3)) = 1.015387 */
+        .quad 0x3FF06800E629D671    /* HI((2^0*(1+2/32+1/64))^(1/3)) = 1.025391 */
+        .quad 0x3FF090328731DEB2    /* HI((2^0*(1+3/32+1/64))^(1/3)) = 1.035204 */
+        .quad 0x3FF0B7A4B1BD64AC    /* HI((2^0*(1+4/32+1/64))^(1/3)) = 1.044835 */
+        .quad 0x3FF0DE601024FB87    /* HI((2^0*(1+5/32+1/64))^(1/3)) = 1.054291 */
+        .quad 0x3FF1046CB0597000    /* HI((2^0*(1+6/32+1/64))^(1/3)) = 1.06358  */
+        .quad 0x3FF129D212A9BA9B    /* HI((2^0*(1+7/32+1/64))^(1/3)) = 1.07271  */
+        .quad 0x3FF14E9736CDAF38    /* HI((2^0*(1+8/32+1/64))^(1/3)) = 1.081687 */
+        .quad 0x3FF172C2A772F507    /* HI((2^0*(1+9/32+1/64))^(1/3)) = 1.090518 */
+        .quad 0x3FF1965A848001D3    /* HI((2^0*(1+10/32+1/64))^(1/3)) = 1.099207 */
+        .quad 0x3FF1B9648C38C55D    /* HI((2^0*(1+11/32+1/64))^(1/3)) = 1.107762 */
+        .quad 0x3FF1DBE6236A0C45    /* HI((2^0*(1+12/32+1/64))^(1/3)) = 1.116186 */
+        .quad 0x3FF1FDE45CBB1F9F    /* HI((2^0*(1+13/32+1/64))^(1/3)) = 1.124485 */
+        .quad 0x3FF21F63FF409042    /* HI((2^0*(1+14/32+1/64))^(1/3)) = 1.132664 */
+        .quad 0x3FF240698C6746E5    /* HI((2^0*(1+15/32+1/64))^(1/3)) = 1.140726 */
+        .quad 0x3FF260F9454BB99B    /* HI((2^0*(1+16/32+1/64))^(1/3)) = 1.148675 */
+        .quad 0x3FF281172F8E7073    /* HI((2^0*(1+17/32+1/64))^(1/3)) = 1.156516 */
+        .quad 0x3FF2A0C719B4B6D0    /* HI((2^0*(1+18/32+1/64))^(1/3)) = 1.164252 */
+        .quad 0x3FF2C00C9F2263EC    /* HI((2^0*(1+19/32+1/64))^(1/3)) = 1.171887 */
+        .quad 0x3FF2DEEB2BB7FB78    /* HI((2^0*(1+20/32+1/64))^(1/3)) = 1.179423 */
+        .quad 0x3FF2FD65FF1EFBBC    /* HI((2^0*(1+21/32+1/64))^(1/3)) = 1.186865 */
+        .quad 0x3FF31B802FCCF6A2    /* HI((2^0*(1+22/32+1/64))^(1/3)) = 1.194214 */
+        .quad 0x3FF3393CADC50708    /* HI((2^0*(1+23/32+1/64))^(1/3)) = 1.201474 */
+        .quad 0x3FF3569E451E4C2A    /* HI((2^0*(1+24/32+1/64))^(1/3)) = 1.208647 */
+        .quad 0x3FF373A7A0554CDE    /* HI((2^0*(1+25/32+1/64))^(1/3)) = 1.215736 */
+        .quad 0x3FF3905B4A6D76CE    /* HI((2^0*(1+26/32+1/64))^(1/3)) = 1.222743 */
+        .quad 0x3FF3ACBBB0E756B6    /* HI((2^0*(1+27/32+1/64))^(1/3)) = 1.229671 */
+        .quad 0x3FF3C8CB258FA340    /* HI((2^0*(1+28/32+1/64))^(1/3)) = 1.236522 */
+        .quad 0x3FF3E48BE02AC0CE    /* HI((2^0*(1+29/32+1/64))^(1/3)) = 1.243297 */
+        .quad 0x3FF4000000000000    /* HI((2^0*(1+30/32+1/64))^(1/3)) = 1.25     */
+        .quad 0x3FF41B298D47800E    /* HI((2^0*(1+31/32+1/64))^(1/3)) = 1.256631 */
+        .quad 0x3FF443604B34D9B2    /* HI((2^1*(1+0/32+1/64))^(1/3)) = 1.266449 */
+        .quad 0x3FF4780B20906571    /* HI((2^1*(1+1/32+1/64))^(1/3)) = 1.279307 */
+        .quad 0x3FF4ABAC3EE06706    /* HI((2^1*(1+2/32+1/64))^(1/3)) = 1.291912 */
+        .quad 0x3FF4DE505DA66B8D    /* HI((2^1*(1+3/32+1/64))^(1/3)) = 1.304276 */
+        .quad 0x3FF51003420A5C07    /* HI((2^1*(1+4/32+1/64))^(1/3)) = 1.316409 */
+        .quad 0x3FF540CFD6FD11C1    /* HI((2^1*(1+5/32+1/64))^(1/3)) = 1.328323 */
+        .quad 0x3FF570C04260716B    /* HI((2^1*(1+6/32+1/64))^(1/3)) = 1.340027 */
+        .quad 0x3FF59FDDF7A45F38    /* HI((2^1*(1+7/32+1/64))^(1/3)) = 1.35153  */
+        .quad 0x3FF5CE31C83539DF    /* HI((2^1*(1+8/32+1/64))^(1/3)) = 1.36284  */
+        .quad 0x3FF5FBC3F20966A4    /* HI((2^1*(1+9/32+1/64))^(1/3)) = 1.373966 */
+        .quad 0x3FF6289C2C8F1B70    /* HI((2^1*(1+10/32+1/64))^(1/3)) = 1.384915 */
+        .quad 0x3FF654C1B4316DCF    /* HI((2^1*(1+11/32+1/64))^(1/3)) = 1.395693 */
+        .quad 0x3FF6803B54A34E44    /* HI((2^1*(1+12/32+1/64))^(1/3)) = 1.406307 */
+        .quad 0x3FF6AB0F72182659    /* HI((2^1*(1+13/32+1/64))^(1/3)) = 1.416763 */
+        .quad 0x3FF6D544118C08BC    /* HI((2^1*(1+14/32+1/64))^(1/3)) = 1.427067 */
+        .quad 0x3FF6FEDEE0388D4A    /* HI((2^1*(1+15/32+1/64))^(1/3)) = 1.437224 */
+        .quad 0x3FF727E53A4F645E    /* HI((2^1*(1+16/32+1/64))^(1/3)) = 1.44724  */
+        .quad 0x3FF7505C31104114    /* HI((2^1*(1+17/32+1/64))^(1/3)) = 1.457119 */
+        .quad 0x3FF77848904CD549    /* HI((2^1*(1+18/32+1/64))^(1/3)) = 1.466866 */
+        .quad 0x3FF79FAEE36B2534    /* HI((2^1*(1+19/32+1/64))^(1/3)) = 1.476485 */
+        .quad 0x3FF7C69379F4605B    /* HI((2^1*(1+20/32+1/64))^(1/3)) = 1.48598  */
+        .quad 0x3FF7ECFA6BBCA391    /* HI((2^1*(1+21/32+1/64))^(1/3)) = 1.495356 */
+        .quad 0x3FF812E79CAE7EB9    /* HI((2^1*(1+22/32+1/64))^(1/3)) = 1.504615 */
+        .quad 0x3FF8385EC043C71D    /* HI((2^1*(1+23/32+1/64))^(1/3)) = 1.513762 */
+        .quad 0x3FF85D635CB41B9D    /* HI((2^1*(1+24/32+1/64))^(1/3)) = 1.5228   */
+        .quad 0x3FF881F8CDE083DB    /* HI((2^1*(1+25/32+1/64))^(1/3)) = 1.531731 */
+        .quad 0x3FF8A6224802B8A8    /* HI((2^1*(1+26/32+1/64))^(1/3)) = 1.54056  */
+        .quad 0x3FF8C9E2DA25E5E4    /* HI((2^1*(1+27/32+1/64))^(1/3)) = 1.549289 */
+        .quad 0x3FF8ED3D706E1010    /* HI((2^1*(1+28/32+1/64))^(1/3)) = 1.55792  */
+        .quad 0x3FF91034D632B6DF    /* HI((2^1*(1+29/32+1/64))^(1/3)) = 1.566457 */
+        .quad 0x3FF932CBB7F0CF2D    /* HI((2^1*(1+30/32+1/64))^(1/3)) = 1.574901 */
+        .quad 0x3FF95504A517BF3A    /* HI((2^1*(1+31/32+1/64))^(1/3)) = 1.583256 */
+        .quad 0x3FF987AF34F8BB19    /* HI((2^2*(1+0/32+1/64))^(1/3)) = 1.595626 */
+        .quad 0x3FF9CA0A8337B317    /* HI((2^2*(1+1/32+1/64))^(1/3)) = 1.611826 */
+        .quad 0x3FFA0B1709CC13D5    /* HI((2^2*(1+2/32+1/64))^(1/3)) = 1.627708 */
+        .quad 0x3FFA4AE4CE6419ED    /* HI((2^2*(1+3/32+1/64))^(1/3)) = 1.643285 */
+        .quad 0x3FFA8982A5567031    /* HI((2^2*(1+4/32+1/64))^(1/3)) = 1.658572 */
+        .quad 0x3FFAC6FE500AB570    /* HI((2^2*(1+5/32+1/64))^(1/3)) = 1.673582 */
+        .quad 0x3FFB036497A15A17    /* HI((2^2*(1+6/32+1/64))^(1/3)) = 1.688328 */
+        .quad 0x3FFB3EC164671755    /* HI((2^2*(1+7/32+1/64))^(1/3)) = 1.702821 */
+        .quad 0x3FFB791FD288C46F    /* HI((2^2*(1+8/32+1/64))^(1/3)) = 1.717071 */
+        .quad 0x3FFBB28A44693BE4    /* HI((2^2*(1+9/32+1/64))^(1/3)) = 1.731089 */
+        .quad 0x3FFBEB0A72EB6E31    /* HI((2^2*(1+10/32+1/64))^(1/3)) = 1.744883 */
+        .quad 0x3FFC22A97BF5F697    /* HI((2^2*(1+11/32+1/64))^(1/3)) = 1.758462 */
+        .quad 0x3FFC596FEF6AF983    /* HI((2^2*(1+12/32+1/64))^(1/3)) = 1.771835 */
+        .quad 0x3FFC8F65DAC655A3    /* HI((2^2*(1+13/32+1/64))^(1/3)) = 1.785009 */
+        .quad 0x3FFCC492D38CE8D9    /* HI((2^2*(1+14/32+1/64))^(1/3)) = 1.797992 */
+        .quad 0x3FFCF8FE00B19367    /* HI((2^2*(1+15/32+1/64))^(1/3)) = 1.810789 */
+        .quad 0x3FFD2CAE230F8709    /* HI((2^2*(1+16/32+1/64))^(1/3)) = 1.823408 */
+        .quad 0x3FFD5FA99D15208F    /* HI((2^2*(1+17/32+1/64))^(1/3)) = 1.835855 */
+        .quad 0x3FFD91F679B6E505    /* HI((2^2*(1+18/32+1/64))^(1/3)) = 1.848135 */
+        .quad 0x3FFDC39A72BF2302    /* HI((2^2*(1+19/32+1/64))^(1/3)) = 1.860255 */
+        .quad 0x3FFDF49AF68C1570    /* HI((2^2*(1+20/32+1/64))^(1/3)) = 1.872218 */
+        .quad 0x3FFE24FD2D4C23B8    /* HI((2^2*(1+21/32+1/64))^(1/3)) = 1.884031 */
+        .quad 0x3FFE54C5FDC5EC73    /* HI((2^2*(1+22/32+1/64))^(1/3)) = 1.895697 */
+        .quad 0x3FFE83FA11B81DBB    /* HI((2^2*(1+23/32+1/64))^(1/3)) = 1.907221 */
+        .quad 0x3FFEB29DD9DBAF25    /* HI((2^2*(1+24/32+1/64))^(1/3)) = 1.918608 */
+        .quad 0x3FFEE0B59191D374    /* HI((2^2*(1+25/32+1/64))^(1/3)) = 1.929861 */
+        .quad 0x3FFF0E454245E4BF    /* HI((2^2*(1+26/32+1/64))^(1/3)) = 1.940984 */
+        .quad 0x3FFF3B50C68A9DD3    /* HI((2^2*(1+27/32+1/64))^(1/3)) = 1.951981 */
+        .quad 0x3FFF67DBCCF922DC    /* HI((2^2*(1+28/32+1/64))^(1/3)) = 1.962856 */
+        .quad 0x3FFF93E9DAD7A4A6    /* HI((2^2*(1+29/32+1/64))^(1/3)) = 1.973612 */
+        .quad 0x3FFFBF7E4E8CC9CB    /* HI((2^2*(1+30/32+1/64))^(1/3)) = 1.984251 */
+        .quad 0x3FFFEA9C61E47CD3    /* HI((2^2*(1+31/32+1/64))^(1/3)) = 1.994778 */
+        .align 32
+        .quad 0x3F93750AD588F115, 0x3F93750AD588F115, 0x3F93750AD588F115, 0x3F93750AD588F115      /* _dA7 */
+        .align 32
+        .quad 0xBF98090D6221A247, 0xBF98090D6221A247, 0xBF98090D6221A247, 0xBF98090D6221A247      /* _dA6 */
+        .align 32
+        .quad 0x3F9EE7113506AC12, 0x3F9EE7113506AC12, 0x3F9EE7113506AC12, 0x3F9EE7113506AC12      /* _dA5 */
+        .align 32
+        .quad 0xBFA511E8D2B3183B, 0xBFA511E8D2B3183B, 0xBFA511E8D2B3183B, 0xBFA511E8D2B3183B      /* _dA4 */
+        .align 32
+        .quad 0x3FAF9ADD3C0CA458, 0x3FAF9ADD3C0CA458, 0x3FAF9ADD3C0CA458, 0x3FAF9ADD3C0CA458      /* _dA3 */
+        .align 32
+        .quad 0xBFBC71C71C71C71C, 0xBFBC71C71C71C71C, 0xBFBC71C71C71C71C, 0xBFBC71C71C71C71C      /* _dA2 */
+        .align 32
+        .quad 0x3FD5555555555555, 0x3FD5555555555555, 0x3FD5555555555555, 0x3FD5555555555555      /* _dA1 */
+        .align 32
+        .quad 0xBFF0400000000000, 0xBFF0400000000000, 0xBFF0400000000000, 0xBFF0400000000000        /* _dNeg65Div64 */
+        .align 32
+        .quad 0x000FC00000000000, 0x000FC00000000000, 0x000FC00000000000, 0x000FC00000000000        /* _dSgnf6Mask */
+        .align 32
+        .quad 0xBFF0000000000000, 0xBFF0000000000000, 0xBFF0000000000000, 0xBFF0000000000000        /* _dNegOne */
+        .align 32
+        .quad 0x000FFFFFFFFFFFFF, 0x000FFFFFFFFFFFFF, 0x000FFFFFFFFFFFFF, 0x000FFFFFFFFFFFFF        /* _dMantissaMask */
+        .align 32
+        .quad 0xFFF0000000000000, 0xFFF0000000000000, 0xFFF0000000000000, 0xFFF0000000000000        /* _lExpHiMask */
+        .align 32
+        .quad 0x00000000000007FF, 0x00000000000007FF, 0x00000000000007FF, 0x00000000000007FF        /* _lExpLoMask */
+        .align 32
+        .quad 0x0000000000001556, 0x0000000000001556, 0x0000000000001556, 0x0000000000001556        /* _l1556 */
+        .align 32
+        .long 0x000F8000, 0x000F8000, 0x000F8000, 0x000F8000, 0x000F8000, 0x000F8000, 0x000F8000, 0x000F8000    /* _iRcpIndexMask */
+        .align 32
+        .long 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF    /* _iAbsMask */
+        .align 32
+        .long 0x00000800, 0x00000800, 0x00000800, 0x00000800, 0x00000800, 0x00000800, 0x00000800, 0x00000800    /* _iSignMask */
+        .align 32
+        .long 0x000002AA, 0x000002AA, 0x000002AA, 0x000002AA, 0x000002AA, 0x000002AA, 0x000002AA, 0x000002AA    /* _iBias */
+        .align 32
+        .long 0x80100000, 0x80100000, 0x80100000, 0x80100000, 0x80100000, 0x80100000, 0x80100000, 0x80100000    /* _iSub */
+        .align 32
+        .long 0xffdfffff, 0xffdfffff, 0xffdfffff, 0xffdfffff, 0xffdfffff, 0xffdfffff, 0xffdfffff, 0xffdfffff    /* _iCmp */
+        .align 32
+        .type	__svml_dcbrt_data_internal,@object
+        .size	__svml_dcbrt_data_internal,.-__svml_dcbrt_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core-avx2.S
new file mode 100644
index 0000000000..3831e582ce
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized cbrt, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN8v_cbrt _ZGVeN8v_cbrt_avx2_wrapper
+#include "../svml_d_cbrt8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core.c
new file mode 100644
index 0000000000..28c147216f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized cbrt, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN8v_cbrt
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN8v_cbrt, __GI__ZGVeN8v_cbrt, __redirect__ZGVeN8v_cbrt)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core_avx512.S
new file mode 100644
index 0000000000..b9c071b54c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core_avx512.S
@@ -0,0 +1,253 @@
+/* Function cbrt vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   x=2^{3*k+j} * 1.b1 b2 ... b5 b6 ... b52
+ *   Let r=(x*2^{-3k-j} - 1.b1 b2 ... b5 1)* rcp[b1 b2 ..b5],
+ *   where rcp[b1 b2 .. b5]=1/(1.b1 b2 b3 b4 b5 1) in double precision
+ *   cbrt(2^j * 1. b1 b2 .. b5 1) is approximated as T[j][b1..b5]+D[j][b1..b5]
+ *   (T stores the high 53 bits, D stores the low order bits)
+ *   Result=2^k*T+(2^k*T*r)*P+2^k*D
+ *   where P=p1+p2*r+..+p8*r^7
+ *
+ */
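+
+/* Scalar sketch of the same scheme (illustration only, not built; the
+ * name cbrt_sketch is invented, and a Newton refinement stands in for
+ * the table + polynomial core used by the vector code).  The point is
+ * the exponent split e = 3*k + j and the final scaling by 2^k:
+ *
+ *   #include <math.h>
+ *
+ *   static double
+ *   cbrt_sketch (double x)
+ *   {
+ *     if (x == 0.0 || !isfinite (x))
+ *       return x;                      // 0, +/-Inf, NaN returned as-is
+ *     int e;
+ *     double m = frexp (fabs (x), &e); // |x| = m * 2^e, m in [0.5, 1)
+ *     int k = e / 3, j = e - 3 * k;
+ *     if (j < 0)                       // keep j in {0, 1, 2}
+ *       {
+ *         j += 3;
+ *         k -= 1;
+ *       }
+ *     double t = ldexp (m, j);         // reduced argument in [0.5, 4)
+ *     double y = t < 1.0 ? 1.0 : t;    // crude guess >= cbrt(t)
+ *     for (int i = 0; i < 8; i++)      // Newton: y -= (y^3 - t)/(3*y^2)
+ *       y -= (y * y * y - t) / (3.0 * y * y);
+ *     return copysign (ldexp (y, k), x);
+ *   }
+ */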
+
+/* Offsets for data table __svml_dcbrt_data_internal_avx512
+ */
+#define etbl_H                        	0
+#define etbl_L                        	64
+#define cbrt_tbl_H                    	128
+#define BiasL                         	256
+#define SZero                         	320
+#define OneThird                      	384
+#define Bias3                         	448
+#define Three                         	512
+#define One                           	576
+#define poly_coeff10                  	640
+#define poly_coeff9                   	704
+#define poly_coeff8                   	768
+#define poly_coeff7                   	832
+#define poly_coeff6                   	896
+#define poly_coeff5                   	960
+#define poly_coeff4                   	1024
+#define poly_coeff3                   	1088
+#define poly_coeff2                   	1152
+#define poly_coeff1                   	1216
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN8v_cbrt_skx)
+        vgetmantpd $0, {sae}, %zmm0, %zmm14
+
+/* GetExp(x) */
+        vgetexppd {sae}, %zmm0, %zmm7
+        vmovups   BiasL+__svml_dcbrt_data_internal_avx512(%rip), %zmm8
+
+/* exponent/3 */
+        vmovups   OneThird+__svml_dcbrt_data_internal_avx512(%rip), %zmm9
+        vmovups   Bias3+__svml_dcbrt_data_internal_avx512(%rip), %zmm10
+
+/* Reduced argument: R = DblRcp*Mantissa - 1 */
+        vmovups   One+__svml_dcbrt_data_internal_avx512(%rip), %zmm2
+
+/* exponent%3 (to be used as index) */
+        vmovups   Three+__svml_dcbrt_data_internal_avx512(%rip), %zmm11
+
+/* DblRcp ~ 1/Mantissa */
+        vrcp14pd  %zmm14, %zmm13
+        vaddpd    {rn-sae}, %zmm8, %zmm7, %zmm12
+        vandpd    SZero+__svml_dcbrt_data_internal_avx512(%rip), %zmm0, %zmm6
+
+/* round DblRcp to 3 fractional bits (RN mode, no Precision exception) */
+        vrndscalepd $72, {sae}, %zmm13, %zmm15
+        vfmsub231pd {rn-sae}, %zmm12, %zmm9, %zmm10
+
+/* polynomial */
+        vmovups   poly_coeff10+__svml_dcbrt_data_internal_avx512(%rip), %zmm0
+        vmovups   poly_coeff8+__svml_dcbrt_data_internal_avx512(%rip), %zmm7
+        vmovups   poly_coeff7+__svml_dcbrt_data_internal_avx512(%rip), %zmm9
+        vfmsub231pd {rn-sae}, %zmm15, %zmm14, %zmm2
+        vrndscalepd $9, {sae}, %zmm10, %zmm5
+
+/* Table lookup */
+        vmovups   cbrt_tbl_H+__svml_dcbrt_data_internal_avx512(%rip), %zmm10
+        vmovups   poly_coeff6+__svml_dcbrt_data_internal_avx512(%rip), %zmm8
+        vmovups   poly_coeff3+__svml_dcbrt_data_internal_avx512(%rip), %zmm13
+        vfmadd231pd {rn-sae}, %zmm2, %zmm7, %zmm9
+        vfnmadd231pd {rn-sae}, %zmm5, %zmm11, %zmm12
+        vmovups   poly_coeff5+__svml_dcbrt_data_internal_avx512(%rip), %zmm11
+        vmovups   poly_coeff1+__svml_dcbrt_data_internal_avx512(%rip), %zmm14
+
+/* Prepare table index */
+        vpsrlq    $49, %zmm15, %zmm1
+
+/* Table lookup: 2^(exponent%3) */
+        vpermpd   __svml_dcbrt_data_internal_avx512(%rip), %zmm12, %zmm4
+        vpermpd   etbl_L+__svml_dcbrt_data_internal_avx512(%rip), %zmm12, %zmm3
+        vpermt2pd cbrt_tbl_H+64+__svml_dcbrt_data_internal_avx512(%rip), %zmm1, %zmm10
+        vmovups   poly_coeff9+__svml_dcbrt_data_internal_avx512(%rip), %zmm1
+        vfmadd231pd {rn-sae}, %zmm2, %zmm8, %zmm11
+        vmovups   poly_coeff2+__svml_dcbrt_data_internal_avx512(%rip), %zmm12
+        vscalefpd {rn-sae}, %zmm5, %zmm10, %zmm15
+        vfmadd231pd {rn-sae}, %zmm2, %zmm0, %zmm1
+        vmovups   poly_coeff4+__svml_dcbrt_data_internal_avx512(%rip), %zmm5
+        vfmadd231pd {rn-sae}, %zmm2, %zmm12, %zmm14
+        vmulpd    {rn-sae}, %zmm2, %zmm2, %zmm0
+        vfmadd231pd {rn-sae}, %zmm2, %zmm5, %zmm13
+
+/* Sh*R */
+        vmulpd    {rn-sae}, %zmm2, %zmm4, %zmm2
+        vfmadd213pd {rn-sae}, %zmm9, %zmm0, %zmm1
+        vfmadd213pd {rn-sae}, %zmm11, %zmm0, %zmm1
+        vfmadd213pd {rn-sae}, %zmm13, %zmm0, %zmm1
+        vfmadd213pd {rn-sae}, %zmm14, %zmm0, %zmm1
+
+/* Sl + (Sh*R)*Poly */
+        vfmadd213pd {rn-sae}, %zmm3, %zmm1, %zmm2
+
+/*
+ * branch-free
+ * scaled_Th*(Sh+Sl+Sh*R*Poly)
+ */
+        vaddpd    {rn-sae}, %zmm4, %zmm2, %zmm3
+        vmulpd    {rn-sae}, %zmm15, %zmm3, %zmm4
+        vorpd     %zmm6, %zmm4, %zmm0
+        ret
+
+END(_ZGVeN8v_cbrt_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_dcbrt_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 etbl_H[8][2];
+        __declspec(align(64)) VUINT32 etbl_L[8][2];
+        __declspec(align(64)) VUINT32 cbrt_tbl_H[16][2];
+        __declspec(align(64)) VUINT32 BiasL[8][2];
+        __declspec(align(64)) VUINT32 SZero[8][2];
+        __declspec(align(64)) VUINT32 OneThird[8][2];
+        __declspec(align(64)) VUINT32 Bias3[8][2];
+        __declspec(align(64)) VUINT32 Three[8][2];
+        __declspec(align(64)) VUINT32 One[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff10[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff9[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff8[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff7[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff1[8][2];
+    } __svml_dcbrt_data_internal_avx512;
+#endif
+__svml_dcbrt_data_internal_avx512:
+        /*== etbl_H ==*/
+        .quad 0x3ff0000000000000
+        .quad 0x3ff428a2f98d728b
+        .quad 0x3ff965fea53d6e3d
+        .quad 0x0000000000000000
+        .quad 0xbff0000000000000
+        .quad 0xbff428a2f98d728b
+        .quad 0xbff965fea53d6e3d
+        .quad 0x0000000000000000
+        /*== etbl_L ==*/
+        .align 64
+        .quad 0x0000000000000000
+        .quad 0xbc7ddc22548ea41e
+        .quad 0xbc9f53e999952f09
+        .quad 0x0000000000000000
+        .quad 0x0000000000000000
+        .quad 0x3c7ddc22548ea41e
+        .quad 0x3c9f53e999952f09
+        .quad 0x0000000000000000
+        /*== cbrt_tbl_H ==*/
+        .align 64
+        .quad 0x3ff428a2f98d728b
+        .quad 0x3ff361f35ca116ff
+        .quad 0x3ff2b6b5edf6b54a
+        .quad 0x3ff220e6dd675180
+        .quad 0x3ff19c3b38e975a8
+        .quad 0x3ff12589c21fb842
+        .quad 0x3ff0ba6ee5f9aad4
+        .quad 0x3ff059123d3a9848
+        .quad 0x3ff0000000000000
+        .quad 0x0000000000000000
+        .quad 0x0000000000000000
+        .quad 0x0000000000000000
+        .quad 0x0000000000000000
+        .quad 0x0000000000000000
+        .quad 0x0000000000000000
+        .quad 0x0000000000000000
+        /*== BiasL ==*/
+        .align 64
+        .quad 0x4338000000000000, 0x4338000000000000, 0x4338000000000000, 0x4338000000000000, 0x4338000000000000, 0x4338000000000000, 0x4338000000000000, 0x4338000000000000
+        /*== SZero ==*/
+        .align 64
+        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000
+        /*== OneThird ==*/
+        .align 64
+        .quad 0x3fd5555555555556, 0x3fd5555555555556, 0x3fd5555555555556, 0x3fd5555555555556, 0x3fd5555555555556, 0x3fd5555555555556, 0x3fd5555555555556, 0x3fd5555555555556
+        /*== Bias3 ==*/
+        .align 64
+        .quad 0x4320000000000000, 0x4320000000000000, 0x4320000000000000, 0x4320000000000000, 0x4320000000000000, 0x4320000000000000, 0x4320000000000000, 0x4320000000000000
+        /*== Three ==*/
+        .align 64
+        .quad 0x4008000000000000, 0x4008000000000000, 0x4008000000000000, 0x4008000000000000, 0x4008000000000000, 0x4008000000000000, 0x4008000000000000, 0x4008000000000000
+        /*== One ==*/
+        .align 64
+        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
+        /*== poly_coeff10 ==*/
+        .align 64
+        .quad 0xbf882e3b6adeca62, 0xbf882e3b6adeca62, 0xbf882e3b6adeca62, 0xbf882e3b6adeca62, 0xbf882e3b6adeca62, 0xbf882e3b6adeca62, 0xbf882e3b6adeca62, 0xbf882e3b6adeca62
+        /*== poly_coeff9 ==*/
+        .align 64
+        .quad 0x3f8bda24bae48875, 0x3f8bda24bae48875, 0x3f8bda24bae48875, 0x3f8bda24bae48875, 0x3f8bda24bae48875, 0x3f8bda24bae48875, 0x3f8bda24bae48875, 0x3f8bda24bae48875
+        /*== poly_coeff8 ==*/
+        .align 64
+        .quad 0xbf9036b87c71d55f, 0xbf9036b87c71d55f, 0xbf9036b87c71d55f, 0xbf9036b87c71d55f, 0xbf9036b87c71d55f, 0xbf9036b87c71d55f, 0xbf9036b87c71d55f, 0xbf9036b87c71d55f
+        /*== poly_coeff7 ==*/
+        .align 64
+        .quad 0x3f9374ed9398b914, 0x3f9374ed9398b914, 0x3f9374ed9398b914, 0x3f9374ed9398b914, 0x3f9374ed9398b914, 0x3f9374ed9398b914, 0x3f9374ed9398b914, 0x3f9374ed9398b914
+        /*== poly_coeff6 ==*/
+        .align 64
+        .quad 0xbf98090d77f2468e, 0xbf98090d77f2468e, 0xbf98090d77f2468e, 0xbf98090d77f2468e, 0xbf98090d77f2468e, 0xbf98090d77f2468e, 0xbf98090d77f2468e, 0xbf98090d77f2468e
+        /*== poly_coeff5 ==*/
+        .align 64
+        .quad 0x3f9ee71141dcf569, 0x3f9ee71141dcf569, 0x3f9ee71141dcf569, 0x3f9ee71141dcf569, 0x3f9ee71141dcf569, 0x3f9ee71141dcf569, 0x3f9ee71141dcf569, 0x3f9ee71141dcf569
+        /*== poly_coeff4 ==*/
+        .align 64
+        .quad 0xbfa511e8d2b0363e, 0xbfa511e8d2b0363e, 0xbfa511e8d2b0363e, 0xbfa511e8d2b0363e, 0xbfa511e8d2b0363e, 0xbfa511e8d2b0363e, 0xbfa511e8d2b0363e, 0xbfa511e8d2b0363e
+        /*== poly_coeff3 ==*/
+        .align 64
+        .quad 0x3faf9add3c0b7e31, 0x3faf9add3c0b7e31, 0x3faf9add3c0b7e31, 0x3faf9add3c0b7e31, 0x3faf9add3c0b7e31, 0x3faf9add3c0b7e31, 0x3faf9add3c0b7e31, 0x3faf9add3c0b7e31
+        /*== poly_coeff2 ==*/
+        .align 64
+        .quad 0xbfbc71c71c71c741, 0xbfbc71c71c71c741, 0xbfbc71c71c71c741, 0xbfbc71c71c71c741, 0xbfbc71c71c71c741, 0xbfbc71c71c71c741, 0xbfbc71c71c71c741, 0xbfbc71c71c71c741
+        /*== poly_coeff1 ==*/
+        .align 64
+        .quad 0x3fd5555555555557, 0x3fd5555555555557, 0x3fd5555555555557, 0x3fd5555555555557, 0x3fd5555555555557, 0x3fd5555555555557, 0x3fd5555555555557, 0x3fd5555555555557
+        .align 64
+        .type	__svml_dcbrt_data_internal_avx512,@object
+        .size	__svml_dcbrt_data_internal_avx512,.-__svml_dcbrt_data_internal_avx512
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core-avx2.S
new file mode 100644
index 0000000000..faa847fba6
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized cbrtf, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN16v_cbrtf _ZGVeN16v_cbrtf_avx2_wrapper
+#include "../svml_s_cbrtf16_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core.c
new file mode 100644
index 0000000000..785a68cc0d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized cbrtf, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN16v_cbrtf
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN16v_cbrtf, __GI__ZGVeN16v_cbrtf,
+	       __redirect__ZGVeN16v_cbrtf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core_avx512.S
new file mode 100644
index 0000000000..55b017682b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core_avx512.S
@@ -0,0 +1,235 @@
+/* Function cbrtf vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *     x=2^{3*k+j} * 1.b1 b2 ... b5 b6 ... b52
+ *     Let r=(x*2^{-3k-j} - 1.b1 b2 ... b5 1)* rcp[b1 b2 ..b5],
+ *     where rcp[b1 b2 .. b5]=1/(1.b1 b2 b3 b4 b5 1) in single precision
+ *     cbrtf(2^j * 1. b1 b2 .. b5 1) is approximated as T[j][b1..b5]+D[j][b1..b5]
+ *     (T stores the high 24 bits, D stores the low order bits)
+ *     Result=2^k*T+(2^k*T*r)*P+2^k*D
+ *      where P=p1+p2*r+..
+ *
+ */
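+
+/* Note: the etbl_H/etbl_L entries below appear to hold 2^(j/3) for
+ * j = exponent mod 3 (1.0, 2^(1/3) ~= 1.2599, 2^(2/3) ~= 1.5874), with
+ * etbl_L carrying the low-order correction bits; they are selected via
+ * vpermps once the exponent has been split into 3*k + j.
+ */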
+
+/* Offsets for data table __svml_scbrt_data_internal_avx512
+ */
+#define etbl_H                        	0
+#define etbl_L                        	64
+#define cbrt_tbl_H                    	128
+#define BiasL                         	256
+#define SZero                         	320
+#define OneThird                      	384
+#define Bias3                         	448
+#define Three                         	512
+#define One                           	576
+#define poly_coeff3                   	640
+#define poly_coeff2                   	704
+#define poly_coeff1                   	768
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN16v_cbrtf_skx)
+        vgetmantps $0, {sae}, %zmm0, %zmm8
+
+/* GetExp(x) */
+        vgetexpps {sae}, %zmm0, %zmm1
+        vmovups   BiasL+__svml_scbrt_data_internal_avx512(%rip), %zmm2
+
+/* exponent/3 */
+        vmovups   OneThird+__svml_scbrt_data_internal_avx512(%rip), %zmm3
+        vmovups   Bias3+__svml_scbrt_data_internal_avx512(%rip), %zmm4
+        vmovups   One+__svml_scbrt_data_internal_avx512(%rip), %zmm15
+
+/* exponent%3 (to be used as index) */
+        vmovups   Three+__svml_scbrt_data_internal_avx512(%rip), %zmm5
+
+/* polynomial */
+        vmovups   poly_coeff3+__svml_scbrt_data_internal_avx512(%rip), %zmm11
+        vmovups   poly_coeff1+__svml_scbrt_data_internal_avx512(%rip), %zmm14
+
+/* Table lookup */
+        vmovups   cbrt_tbl_H+__svml_scbrt_data_internal_avx512(%rip), %zmm12
+
+/* DblRcp ~ 1/Mantissa */
+        vrcp14ps  %zmm8, %zmm7
+        vaddps    {rn-sae}, %zmm2, %zmm1, %zmm6
+        vandps    SZero+__svml_scbrt_data_internal_avx512(%rip), %zmm0, %zmm0
+
+/* round DblRcp to 3 fractional bits (RN mode, no Precision exception) */
+        vrndscaleps $88, {sae}, %zmm7, %zmm9
+        vfmsub231ps {rn-sae}, %zmm6, %zmm3, %zmm4
+        vmovups   poly_coeff2+__svml_scbrt_data_internal_avx512(%rip), %zmm7
+
+/* Reduced argument: R = DblRcp*Mantissa - 1 */
+        vfmsub231ps {rn-sae}, %zmm9, %zmm8, %zmm15
+        vrndscaleps $9, {sae}, %zmm4, %zmm13
+
+/* Prepare table index */
+        vpsrld    $19, %zmm9, %zmm10
+        vfmadd231ps {rn-sae}, %zmm15, %zmm11, %zmm7
+        vfnmadd231ps {rn-sae}, %zmm13, %zmm5, %zmm6
+        vpermt2ps cbrt_tbl_H+64+__svml_scbrt_data_internal_avx512(%rip), %zmm10, %zmm12
+        vfmadd213ps {rn-sae}, %zmm14, %zmm15, %zmm7
+        vscalefps {rn-sae}, %zmm13, %zmm12, %zmm2
+
+/* Table lookup: 2^(exponent%3) */
+        vpermps   __svml_scbrt_data_internal_avx512(%rip), %zmm6, %zmm1
+        vpermps   etbl_L+__svml_scbrt_data_internal_avx512(%rip), %zmm6, %zmm6
+
+/* Sh*R */
+        vmulps    {rn-sae}, %zmm15, %zmm1, %zmm14
+
+/* Sl + (Sh*R)*Poly */
+        vfmadd213ps {rn-sae}, %zmm6, %zmm7, %zmm14
+
+/*
+ * branch-free
+ * scaled_Th*(Sh+Sl+Sh*R*Poly)
+ */
+        vaddps    {rn-sae}, %zmm1, %zmm14, %zmm15
+        vmulps    {rn-sae}, %zmm2, %zmm15, %zmm3
+        vorps     %zmm0, %zmm3, %zmm0
+        ret
+
+END(_ZGVeN16v_cbrtf_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_scbrt_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 etbl_H[16][1];
+        __declspec(align(64)) VUINT32 etbl_L[16][1];
+        __declspec(align(64)) VUINT32 cbrt_tbl_H[32][1];
+        __declspec(align(64)) VUINT32 BiasL[16][1];
+        __declspec(align(64)) VUINT32 SZero[16][1];
+        __declspec(align(64)) VUINT32 OneThird[16][1];
+        __declspec(align(64)) VUINT32 Bias3[16][1];
+        __declspec(align(64)) VUINT32 Three[16][1];
+        __declspec(align(64)) VUINT32 One[16][1];
+        __declspec(align(64)) VUINT32 poly_coeff3[16][1];
+        __declspec(align(64)) VUINT32 poly_coeff2[16][1];
+        __declspec(align(64)) VUINT32 poly_coeff1[16][1];
+    } __svml_scbrt_data_internal_avx512;
+#endif
+__svml_scbrt_data_internal_avx512:
+        /*== etbl_H ==*/
+        .long 0x3f800000
+        .long 0x3fa14518
+        .long 0x3fcb2ff5
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        /*== etbl_L ==*/
+        .align 64
+        .long 0x00000000
+        .long 0xb2ce51af
+        .long 0x32a7adc8
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        /*== cbrt_tbl_H ==*/
+        .align 64
+        .long 0x3fa14518
+        .long 0x3f9e0b2b
+        .long 0x3f9b0f9b
+        .long 0x3f984a9a
+        .long 0x3f95b5af
+        .long 0x3f934b6c
+        .long 0x3f910737
+        .long 0x3f8ee526
+        .long 0x3f8ce1da
+        .long 0x3f8afa6a
+        .long 0x3f892c4e
+        .long 0x3f87754e
+        .long 0x3f85d377
+        .long 0x3f844510
+        .long 0x3f82c892
+        .long 0x3f815c9f
+        .long 0x3f800000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        .long 0x00000000
+        /*== BiasL ==*/
+        .align 64
+        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000
+        /*== SZero ==*/
+        .align 64
+        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000
+        /*== OneThird ==*/
+        .align 64
+        .long 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab
+        /*== Bias3 ==*/
+        .align 64
+        .long 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000
+        /*== Three ==*/
+        .align 64
+        .long 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000
+        /*== One ==*/
+        .align 64
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== poly_coeff3 ==*/
+        .align 64
+        .long 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c
+        /*== poly_coeff2 ==*/
+        .align 64
+        .long 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363
+        /*== poly_coeff1 ==*/
+        .align 64
+        .long 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa
+        .align 64
+        .type	__svml_scbrt_data_internal_avx512,@object
+        .size	__svml_scbrt_data_internal_avx512,.-__svml_scbrt_data_internal_avx512
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core-sse2.S
new file mode 100644
index 0000000000..76fc254e7a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized cbrtf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN4v_cbrtf _ZGVbN4v_cbrtf_sse2
+#include "../svml_s_cbrtf4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core.c
new file mode 100644
index 0000000000..564a549b39
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized cbrtf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN4v_cbrtf
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN4v_cbrtf, __GI__ZGVbN4v_cbrtf,
+	       __redirect__ZGVbN4v_cbrtf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core_sse4.S
new file mode 100644
index 0000000000..af42dd5164
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core_sse4.S
@@ -0,0 +1,490 @@
+/* Function cbrtf vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *     x=2^{3*k+j} * 1.b1 b2 ... b5 b6 ... b52
+ *     Let r=(x*2^{-3k-j} - 1.b1 b2 ... b5 1)* rcp[b1 b2 ..b5],
+ *     where rcp[b1 b2 .. b5]=1/(1.b1 b2 b3 b4 b5 1) in single precision
+ *     cbrtf(2^j * 1. b1 b2 .. b5 1) is approximated as T[j][b1..b5]+D[j][b1..b5]
+ *     (T stores the high 24 bits, D stores the low order bits)
+ *     Result=2^k*T+(2^k*T*r)*P+2^k*D
+ *      where P=p1+p2*r+..
+ *
+ */
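+
+/* For instance, x = 16.0f = 2^4 * 1.0 gives 3*k + j = 4, i.e. k = 1 and
+ * j = 1, so the result is 2^1 * (2^1 * 1.0)^(1/3) ~= 2 * 1.2599 ~= 2.5198,
+ * which is cbrtf (16.0f).
+ */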
+
+/* Offsets for data table __svml_scbrt_data_internal
+ */
+#define _sRcp                         	0
+#define _sCbrtHL                      	128
+#define _sP2                          	512
+#define _sP1                          	528
+#define _sMantissaMask                	544
+#define _sMantissaMask1               	560
+#define _sExpMask                     	576
+#define _sExpMask1                    	592
+#define _iRcpIndexMask                	608
+#define _iBExpMask                    	624
+#define _iSignMask                    	640
+#define _iBias                        	656
+#define _iOne                         	672
+#define _i555                         	688
+#define _iAbsMask                     	704
+#define _iSubConst                    	720
+#define _iCmpConst                    	736
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN4v_cbrtf_sse4)
+        subq      $72, %rsp
+        cfi_def_cfa_offset(80)
+
+/*
+ * Load constants
+ * Reciprocal index calculation
+ */
+        movaps    %xmm0, %xmm2
+        movdqu    _iRcpIndexMask+__svml_scbrt_data_internal(%rip), %xmm3
+        psrld     $16, %xmm2
+        pand      %xmm2, %xmm3
+
+/* Load reciprocal value */
+        lea       __svml_scbrt_data_internal(%rip), %rdx
+        pshufd    $1, %xmm3, %xmm5
+
+/* Get signed biased exponent */
+        psrld     $7, %xmm2
+        movd      %xmm3, %eax
+        movd      %xmm5, %ecx
+
+/* Get absolute biased exponent */
+        movdqu    _iBExpMask+__svml_scbrt_data_internal(%rip), %xmm15
+
+/*
+ * Calculate exponent/3
+ * i555Exp=(2^{12}-1)/3*exponent
+ */
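+
+/* 0x555 = (2^12-1)/3, so multiplying the biased exponent by it and then
+ * shifting right by 12 (at the "Get K" step below) approximates a
+ * divide-by-3 of the exponent without an integer division instruction.
+ */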
+        movdqu    _i555+__svml_scbrt_data_internal(%rip), %xmm14
+        pand      %xmm2, %xmm15
+        movslq    %eax, %rax
+        movdqa    %xmm14, %xmm5
+        movslq    %ecx, %rcx
+        psrlq     $32, %xmm14
+        pmuludq   %xmm15, %xmm5
+        movd      (%rdx,%rax), %xmm4
+        movd      (%rdx,%rcx), %xmm6
+        punpckldq %xmm6, %xmm4
+        movdqa    %xmm15, %xmm6
+        psrlq     $32, %xmm15
+        pmuludq   %xmm14, %xmm15
+        pshufd    $2, %xmm3, %xmm7
+        psllq     $32, %xmm15
+        pshufd    $3, %xmm3, %xmm8
+        movd      %xmm7, %esi
+        movd      %xmm8, %edi
+
+/* Argument reduction */
+        movups    _sMantissaMask+__svml_scbrt_data_internal(%rip), %xmm12
+        movups    _sMantissaMask1+__svml_scbrt_data_internal(%rip), %xmm11
+        andps     %xmm0, %xmm12
+        pand      .FLT_17(%rip), %xmm5
+        andps     %xmm0, %xmm11
+        movslq    %esi, %rsi
+        por       %xmm15, %xmm5
+        movslq    %edi, %rdi
+
+/* Get K (exponent=3*k+j) */
+        psrld     $12, %xmm5
+        orps      _sExpMask+__svml_scbrt_data_internal(%rip), %xmm12
+        orps      _sExpMask1+__svml_scbrt_data_internal(%rip), %xmm11
+        psubd     _iOne+__svml_scbrt_data_internal(%rip), %xmm6
+
+/* r=y-y` */
+        subps     %xmm11, %xmm12
+
+/* Get J */
+        psubd     %xmm5, %xmm6
+        movdqu    _iAbsMask+__svml_scbrt_data_internal(%rip), %xmm1
+        psubd     %xmm5, %xmm6
+        movd      (%rdx,%rsi), %xmm10
+        pand      %xmm0, %xmm1
+        movd      (%rdx,%rdi), %xmm9
+        psubd     %xmm5, %xmm6
+        punpckldq %xmm9, %xmm10
+
+/* Get 128*J */
+        pslld     $7, %xmm6
+        punpcklqdq %xmm10, %xmm4
+
+/*
+ * iCbrtIndex=4*l+128*j
+ * Zero index if callout expected
+ */
+        paddd     %xmm6, %xmm3
+        psubd     _iSubConst+__svml_scbrt_data_internal(%rip), %xmm1
+        pcmpgtd   _iCmpConst+__svml_scbrt_data_internal(%rip), %xmm1
+
+/* r=(y-y`)*rcp_table(y`) */
+        mulps     %xmm12, %xmm4
+        movmskps  %xmm1, %eax
+
+/* Biased exponent-1 */
+        movdqu    _iSignMask+__svml_scbrt_data_internal(%rip), %xmm13
+        pandn     %xmm3, %xmm1
+
+/*
+ * Add 2/3*(bias-1)+1 to (k+1/3*(bias-1))
+ * Attach sign to exponent
+ */
+        movdqu    _iBias+__svml_scbrt_data_internal(%rip), %xmm12
+        pand      %xmm13, %xmm2
+        paddd     %xmm5, %xmm12
+
+/* Load Cbrt table Hi & Lo values */
+        movd      %xmm1, %r8d
+        por       %xmm2, %xmm12
+        pshufd    $1, %xmm1, %xmm2
+        pslld     $23, %xmm12
+        pshufd    $2, %xmm1, %xmm7
+        pshufd    $3, %xmm1, %xmm1
+        movd      %xmm2, %r9d
+        movd      %xmm7, %r10d
+        movd      %xmm1, %r11d
+
+/* Polynomial:    p1+r*p2 */
+        movups    _sP2+__svml_scbrt_data_internal(%rip), %xmm11
+        mulps     %xmm4, %xmm11
+        movslq    %r8d, %r8
+        addps     _sP1+__svml_scbrt_data_internal(%rip), %xmm11
+        movslq    %r9d, %r9
+        movslq    %r10d, %r10
+        movslq    %r11d, %r11
+        movd      128(%rdx,%r8), %xmm10
+        movd      128(%rdx,%r9), %xmm3
+        movd      128(%rdx,%r10), %xmm9
+        movd      128(%rdx,%r11), %xmm8
+        punpckldq %xmm3, %xmm10
+        punpckldq %xmm8, %xmm9
+        punpcklqdq %xmm9, %xmm10
+
+/* sCbrtHi *= 2^k */
+        mulps     %xmm10, %xmm12
+
+/* T`*r */
+        mulps     %xmm12, %xmm4
+
+/* (T`*r)*P */
+        mulps     %xmm4, %xmm11
+
+/*
+ * T`*r*P+D`
+ * result = T`+(T`*r*P+D`)
+ */
+        addps     %xmm11, %xmm12
+        testl     %eax, %eax
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 eax xmm0 xmm12
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movaps    %xmm12, %xmm0
+        addq      $72, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(80)
+
+/* Branch to process
+ * special inputs
+ */
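+
+/* The mask in eax (from movmskps above) has one bit set per lane whose
+ * input needs the fallback path.  The inputs and the fast-path results
+ * are spilled to the stack; each flagged lane is then recomputed with a
+ * call to the scalar cbrtf and patched into the result vector before
+ * returning.
+ */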
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm0, 32(%rsp)
+        movups    %xmm12, 48(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 eax
+
+        xorl      %edx, %edx
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -64)
+        movl      %edx, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -72)
+        movl      %eax, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm12
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -64)
+        cfi_offset(13, -72)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm12
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      cbrtf@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 48(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN4v_cbrtf_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_scbrt_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(16)) VUINT32 _sRcp[32][1];
+        __declspec(align(16)) VUINT32 _sCbrtHL[96][1];
+        __declspec(align(16)) VUINT32 _sP2[4][1];
+        __declspec(align(16)) VUINT32 _sP1[4][1];
+        __declspec(align(16)) VUINT32 _sMantissaMask[4][1];
+        __declspec(align(16)) VUINT32 _sMantissaMask1[4][1];
+        __declspec(align(16)) VUINT32 _sExpMask[4][1];
+        __declspec(align(16)) VUINT32 _sExpMask1[4][1];
+        __declspec(align(16)) VUINT32 _iRcpIndexMask[4][1];
+        __declspec(align(16)) VUINT32 _iBExpMask[4][1];
+        __declspec(align(16)) VUINT32 _iSignMask[4][1];
+        __declspec(align(16)) VUINT32 _iBias[4][1];
+        __declspec(align(16)) VUINT32 _iOne[4][1];
+        __declspec(align(16)) VUINT32 _i555[4][1];
+        __declspec(align(16)) VUINT32 _iAbsMask[4][1];
+        __declspec(align(16)) VUINT32 _iSubConst[4][1];
+        __declspec(align(16)) VUINT32 _iCmpConst[4][1];
+} __svml_scbrt_data_internal;
+#endif
+__svml_scbrt_data_internal:
+        /*== _sRcp ==*/
+        .long 0xBF7C0FC1  /* (1/(1+0/32+1/64)) = -.984615 */
+        .long 0xBF74898D  /* (1/(1+1/32+1/64)) = -.955224 */
+        .long 0xBF6D7304  /* (1/(1+2/32+1/64)) = -.927536 */
+        .long 0xBF66C2B4  /* (1/(1+3/32+1/64)) = -.901408 */
+        .long 0xBF607038  /* (1/(1+4/32+1/64)) = -.876712 */
+        .long 0xBF5A740E  /* (1/(1+5/32+1/64)) = -.853333 */
+        .long 0xBF54C77B  /* (1/(1+6/32+1/64)) = -.831169 */
+        .long 0xBF4F6475  /* (1/(1+7/32+1/64)) = -.810127 */
+        .long 0xBF4A4588  /* (1/(1+8/32+1/64)) = -.790123 */
+        .long 0xBF4565C8  /* (1/(1+9/32+1/64)) = -.771084 */
+        .long 0xBF40C0C1  /* (1/(1+10/32+1/64)) = -.752941 */
+        .long 0xBF3C5264  /* (1/(1+11/32+1/64)) = -.735632 */
+        .long 0xBF381703  /* (1/(1+12/32+1/64)) = -.719101 */
+        .long 0xBF340B41  /* (1/(1+13/32+1/64)) = -.703297 */
+        .long 0xBF302C0B  /* (1/(1+14/32+1/64)) = -.688172 */
+        .long 0xBF2C7692  /* (1/(1+15/32+1/64)) = -.673684 */
+        .long 0xBF28E83F  /* (1/(1+16/32+1/64)) = -.659794 */
+        .long 0xBF257EB5  /* (1/(1+17/32+1/64)) = -.646465 */
+        .long 0xBF2237C3  /* (1/(1+18/32+1/64)) = -.633663 */
+        .long 0xBF1F1166  /* (1/(1+19/32+1/64)) = -.621359 */
+        .long 0xBF1C09C1  /* (1/(1+20/32+1/64)) = -.609524 */
+        .long 0xBF191F1A  /* (1/(1+21/32+1/64)) = -.598131 */
+        .long 0xBF164FDA  /* (1/(1+22/32+1/64)) = -.587156 */
+        .long 0xBF139A86  /* (1/(1+23/32+1/64)) = -.576577 */
+        .long 0xBF10FDBC  /* (1/(1+24/32+1/64)) = -.566372 */
+        .long 0xBF0E7835  /* (1/(1+25/32+1/64)) = -.556522 */
+        .long 0xBF0C08C1  /* (1/(1+26/32+1/64)) = -.547009 */
+        .long 0xBF09AE41  /* (1/(1+27/32+1/64)) = -.537815 */
+        .long 0xBF0767AB  /* (1/(1+28/32+1/64)) = -.528926 */
+        .long 0xBF053408  /* (1/(1+29/32+1/64)) = -.520325 */
+        .long 0xBF03126F  /* (1/(1+30/32+1/64)) = -.512    */
+        .long 0xBF010204  /* (1/(1+31/32+1/64)) = -.503937 */
+        /*== _sCbrtHL ==*/
+        .align 16
+        .long 0x3F80A9C9    /* HI((2^0*(1+0/32+1/64))^(1/3)) = 1.005181 */
+        .long 0x3F81F833    /* HI((2^0*(1+1/32+1/64))^(1/3)) = 1.015387 */
+        .long 0x3F834007    /* HI((2^0*(1+2/32+1/64))^(1/3)) = 1.025391 */
+        .long 0x3F848194    /* HI((2^0*(1+3/32+1/64))^(1/3)) = 1.035204 */
+        .long 0x3F85BD25    /* HI((2^0*(1+4/32+1/64))^(1/3)) = 1.044835 */
+        .long 0x3F86F300    /* HI((2^0*(1+5/32+1/64))^(1/3)) = 1.054291 */
+        .long 0x3F882365    /* HI((2^0*(1+6/32+1/64))^(1/3)) = 1.06358  */
+        .long 0x3F894E90    /* HI((2^0*(1+7/32+1/64))^(1/3)) = 1.07271  */
+        .long 0x3F8A74B9    /* HI((2^0*(1+8/32+1/64))^(1/3)) = 1.081687 */
+        .long 0x3F8B9615    /* HI((2^0*(1+9/32+1/64))^(1/3)) = 1.090518 */
+        .long 0x3F8CB2D4    /* HI((2^0*(1+10/32+1/64))^(1/3)) = 1.099207 */
+        .long 0x3F8DCB24    /* HI((2^0*(1+11/32+1/64))^(1/3)) = 1.107762 */
+        .long 0x3F8EDF31    /* HI((2^0*(1+12/32+1/64))^(1/3)) = 1.116186 */
+        .long 0x3F8FEF22    /* HI((2^0*(1+13/32+1/64))^(1/3)) = 1.124485 */
+        .long 0x3F90FB1F    /* HI((2^0*(1+14/32+1/64))^(1/3)) = 1.132664 */
+        .long 0x3F92034C    /* HI((2^0*(1+15/32+1/64))^(1/3)) = 1.140726 */
+        .long 0x3F9307CA    /* HI((2^0*(1+16/32+1/64))^(1/3)) = 1.148675 */
+        .long 0x3F9408B9    /* HI((2^0*(1+17/32+1/64))^(1/3)) = 1.156516 */
+        .long 0x3F950638    /* HI((2^0*(1+18/32+1/64))^(1/3)) = 1.164252 */
+        .long 0x3F960064    /* HI((2^0*(1+19/32+1/64))^(1/3)) = 1.171887 */
+        .long 0x3F96F759    /* HI((2^0*(1+20/32+1/64))^(1/3)) = 1.179423 */
+        .long 0x3F97EB2F    /* HI((2^0*(1+21/32+1/64))^(1/3)) = 1.186865 */
+        .long 0x3F98DC01    /* HI((2^0*(1+22/32+1/64))^(1/3)) = 1.194214 */
+        .long 0x3F99C9E5    /* HI((2^0*(1+23/32+1/64))^(1/3)) = 1.201474 */
+        .long 0x3F9AB4F2    /* HI((2^0*(1+24/32+1/64))^(1/3)) = 1.208647 */
+        .long 0x3F9B9D3D    /* HI((2^0*(1+25/32+1/64))^(1/3)) = 1.215736 */
+        .long 0x3F9C82DA    /* HI((2^0*(1+26/32+1/64))^(1/3)) = 1.222743 */
+        .long 0x3F9D65DD    /* HI((2^0*(1+27/32+1/64))^(1/3)) = 1.229671 */
+        .long 0x3F9E4659    /* HI((2^0*(1+28/32+1/64))^(1/3)) = 1.236522 */
+        .long 0x3F9F245F    /* HI((2^0*(1+29/32+1/64))^(1/3)) = 1.243297 */
+        .long 0x3FA00000    /* HI((2^0*(1+30/32+1/64))^(1/3)) = 1.25     */
+        .long 0x3FA0D94C    /* HI((2^0*(1+31/32+1/64))^(1/3)) = 1.256631 */
+        .long 0x3FA21B02    /* HI((2^1*(1+0/32+1/64))^(1/3)) = 1.266449 */
+        .long 0x3FA3C059    /* HI((2^1*(1+1/32+1/64))^(1/3)) = 1.279307 */
+        .long 0x3FA55D61    /* HI((2^1*(1+2/32+1/64))^(1/3)) = 1.291912 */
+        .long 0x3FA6F282    /* HI((2^1*(1+3/32+1/64))^(1/3)) = 1.304276 */
+        .long 0x3FA8801A    /* HI((2^1*(1+4/32+1/64))^(1/3)) = 1.316409 */
+        .long 0x3FAA067E    /* HI((2^1*(1+5/32+1/64))^(1/3)) = 1.328323 */
+        .long 0x3FAB8602    /* HI((2^1*(1+6/32+1/64))^(1/3)) = 1.340027 */
+        .long 0x3FACFEEF    /* HI((2^1*(1+7/32+1/64))^(1/3)) = 1.35153  */
+        .long 0x3FAE718E    /* HI((2^1*(1+8/32+1/64))^(1/3)) = 1.36284  */
+        .long 0x3FAFDE1F    /* HI((2^1*(1+9/32+1/64))^(1/3)) = 1.373966 */
+        .long 0x3FB144E1    /* HI((2^1*(1+10/32+1/64))^(1/3)) = 1.384915 */
+        .long 0x3FB2A60D    /* HI((2^1*(1+11/32+1/64))^(1/3)) = 1.395692 */
+        .long 0x3FB401DA    /* HI((2^1*(1+12/32+1/64))^(1/3)) = 1.406307 */
+        .long 0x3FB5587B    /* HI((2^1*(1+13/32+1/64))^(1/3)) = 1.416763 */
+        .long 0x3FB6AA20    /* HI((2^1*(1+14/32+1/64))^(1/3)) = 1.427067 */
+        .long 0x3FB7F6F7    /* HI((2^1*(1+15/32+1/64))^(1/3)) = 1.437224 */
+        .long 0x3FB93F29    /* HI((2^1*(1+16/32+1/64))^(1/3)) = 1.44724  */
+        .long 0x3FBA82E1    /* HI((2^1*(1+17/32+1/64))^(1/3)) = 1.457119 */
+        .long 0x3FBBC244    /* HI((2^1*(1+18/32+1/64))^(1/3)) = 1.466866 */
+        .long 0x3FBCFD77    /* HI((2^1*(1+19/32+1/64))^(1/3)) = 1.476485 */
+        .long 0x3FBE349B    /* HI((2^1*(1+20/32+1/64))^(1/3)) = 1.48598  */
+        .long 0x3FBF67D3    /* HI((2^1*(1+21/32+1/64))^(1/3)) = 1.495356 */
+        .long 0x3FC0973C    /* HI((2^1*(1+22/32+1/64))^(1/3)) = 1.504615 */
+        .long 0x3FC1C2F6    /* HI((2^1*(1+23/32+1/64))^(1/3)) = 1.513762 */
+        .long 0x3FC2EB1A    /* HI((2^1*(1+24/32+1/64))^(1/3)) = 1.5228   */
+        .long 0x3FC40FC6    /* HI((2^1*(1+25/32+1/64))^(1/3)) = 1.531731 */
+        .long 0x3FC53112    /* HI((2^1*(1+26/32+1/64))^(1/3)) = 1.54056  */
+        .long 0x3FC64F16    /* HI((2^1*(1+27/32+1/64))^(1/3)) = 1.549289 */
+        .long 0x3FC769EB    /* HI((2^1*(1+28/32+1/64))^(1/3)) = 1.55792  */
+        .long 0x3FC881A6    /* HI((2^1*(1+29/32+1/64))^(1/3)) = 1.566457 */
+        .long 0x3FC9965D    /* HI((2^1*(1+30/32+1/64))^(1/3)) = 1.574901 */
+        .long 0x3FCAA825    /* HI((2^1*(1+31/32+1/64))^(1/3)) = 1.583256 */
+        .long 0x3FCC3D79    /* HI((2^2*(1+0/32+1/64))^(1/3)) = 1.595626 */
+        .long 0x3FCE5054    /* HI((2^2*(1+1/32+1/64))^(1/3)) = 1.611826 */
+        .long 0x3FD058B8    /* HI((2^2*(1+2/32+1/64))^(1/3)) = 1.627707 */
+        .long 0x3FD25726    /* HI((2^2*(1+3/32+1/64))^(1/3)) = 1.643285 */
+        .long 0x3FD44C15    /* HI((2^2*(1+4/32+1/64))^(1/3)) = 1.658572 */
+        .long 0x3FD637F2    /* HI((2^2*(1+5/32+1/64))^(1/3)) = 1.673582 */
+        .long 0x3FD81B24    /* HI((2^2*(1+6/32+1/64))^(1/3)) = 1.688328 */
+        .long 0x3FD9F60B    /* HI((2^2*(1+7/32+1/64))^(1/3)) = 1.702821 */
+        .long 0x3FDBC8FE    /* HI((2^2*(1+8/32+1/64))^(1/3)) = 1.717071 */
+        .long 0x3FDD9452    /* HI((2^2*(1+9/32+1/64))^(1/3)) = 1.731089 */
+        .long 0x3FDF5853    /* HI((2^2*(1+10/32+1/64))^(1/3)) = 1.744883 */
+        .long 0x3FE1154B    /* HI((2^2*(1+11/32+1/64))^(1/3)) = 1.758462 */
+        .long 0x3FE2CB7F    /* HI((2^2*(1+12/32+1/64))^(1/3)) = 1.771835 */
+        .long 0x3FE47B2E    /* HI((2^2*(1+13/32+1/64))^(1/3)) = 1.785009 */
+        .long 0x3FE62496    /* HI((2^2*(1+14/32+1/64))^(1/3)) = 1.797992 */
+        .long 0x3FE7C7F0    /* HI((2^2*(1+15/32+1/64))^(1/3)) = 1.810789 */
+        .long 0x3FE96571    /* HI((2^2*(1+16/32+1/64))^(1/3)) = 1.823408 */
+        .long 0x3FEAFD4C    /* HI((2^2*(1+17/32+1/64))^(1/3)) = 1.835855 */
+        .long 0x3FEC8FB3    /* HI((2^2*(1+18/32+1/64))^(1/3)) = 1.848135 */
+        .long 0x3FEE1CD3    /* HI((2^2*(1+19/32+1/64))^(1/3)) = 1.860255 */
+        .long 0x3FEFA4D7    /* HI((2^2*(1+20/32+1/64))^(1/3)) = 1.872218 */
+        .long 0x3FF127E9    /* HI((2^2*(1+21/32+1/64))^(1/3)) = 1.88403  */
+        .long 0x3FF2A62F    /* HI((2^2*(1+22/32+1/64))^(1/3)) = 1.895697 */
+        .long 0x3FF41FD0    /* HI((2^2*(1+23/32+1/64))^(1/3)) = 1.907221 */
+        .long 0x3FF594EE    /* HI((2^2*(1+24/32+1/64))^(1/3)) = 1.918607 */
+        .long 0x3FF705AC    /* HI((2^2*(1+25/32+1/64))^(1/3)) = 1.929861 */
+        .long 0x3FF8722A    /* HI((2^2*(1+26/32+1/64))^(1/3)) = 1.940984 */
+        .long 0x3FF9DA86    /* HI((2^2*(1+27/32+1/64))^(1/3)) = 1.951981 */
+        .long 0x3FFB3EDE    /* HI((2^2*(1+28/32+1/64))^(1/3)) = 1.962856 */
+        .long 0x3FFC9F4E    /* HI((2^2*(1+29/32+1/64))^(1/3)) = 1.973612 */
+        .long 0x3FFDFBF2    /* HI((2^2*(1+30/32+1/64))^(1/3)) = 1.984251 */
+        .long 0x3FFF54E3    /* HI((2^2*(1+31/32+1/64))^(1/3)) = 1.994778 */
+        .align 16
+        .long 0xBDE3A962, 0xBDE3A962, 0xBDE3A962, 0xBDE3A962  /* _sP2 */
+        .align 16
+        .long 0x3EAAAC91, 0x3EAAAC91, 0x3EAAAC91, 0x3EAAAC91  /* _sP1 */
+        .align 16
+        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff  /* _sMantissaMask (EXP_MSK3) */
+        .align 16
+        .long 0x007e0000, 0x007e0000, 0x007e0000, 0x007e0000  /* _sMantissaMask1 (SIG_MASK) */
+        .align 16
+        .long 0xBF800000, 0xBF800000, 0xBF800000, 0xBF800000  /* _sExpMask  (EXP_MASK) */
+        .align 16
+        .long 0xBF820000, 0xBF820000, 0xBF820000, 0xBF820000  /* _sExpMask1 (EXP_MASK2) */
+        .align 16
+        .long 0x0000007c, 0x0000007c, 0x0000007c, 0x0000007c  /* _iRcpIndexMask */
+        .align 16
+        .long 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff  /* _iBExpMask */
+        .align 16
+        .long 0x00000100, 0x00000100, 0x00000100, 0x00000100  /* _iSignMask */
+        .align 16
+        .long 0x00000055, 0x00000055, 0x00000055, 0x00000055  /* _iBias */
+        .align 16
+        .long 0x00000001, 0x00000001, 0x00000001, 0x00000001  /* _iOne */
+        .align 16
+        .long 0x00000555, 0x00000555, 0x00000555, 0x00000555  /* _i555 */
+        .align 16
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff  /* _iAbsMask */
+        .align 16
+        .long 0x80800000, 0x80800000, 0x80800000, 0x80800000  /* _iSubConst */
+        .align 16
+        .long 0xFEFFFFFF, 0xFEFFFFFF, 0xFEFFFFFF, 0xFEFFFFFF  /* _iCmpConst */
+        .align 16
+        .type	__svml_scbrt_data_internal,@object
+        .size	__svml_scbrt_data_internal,.-__svml_scbrt_data_internal
+        .align 16
+
+.FLT_17:
+        .long	0xffffffff,0x00000000,0xffffffff,0x00000000
+        .type	.FLT_17,@object
+        .size	.FLT_17,16
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core-sse.S
new file mode 100644
index 0000000000..8eaa457fa6
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized cbrtf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN8v_cbrtf _ZGVdN8v_cbrtf_sse_wrapper
+#include "../svml_s_cbrtf8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core.c
new file mode 100644
index 0000000000..089d28461f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized cbrtf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN8v_cbrtf
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN8v_cbrtf, __GI__ZGVdN8v_cbrtf,
+	       __redirect__ZGVdN8v_cbrtf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core_avx2.S
new file mode 100644
index 0000000000..acd20d9db8
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core_avx2.S
@@ -0,0 +1,509 @@
+/* Function cbrtf vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *     x=2^{3*k+j} * 1.b1 b2 ... b5 b6 ... b52
+ *     Let r=(x*2^{-3k-j} - 1.b1 b2 ... b5 1)* rcp[b1 b2 ..b5],
+ *     where rcp[b1 b2 .. b5]=1/(1.b1 b2 b3 b4 b5 1) in single precision
+ *     cbrtf(2^j * 1. b1 b2 .. b5 1) is approximated as T[j][b1..b5]+D[j][b1..b5]
+ *     (T stores the high 24 bits, D stores the low order bits)
+ *     Result=2^k*T+(2^k*T*r)*P+2^k*D
+ *      where P=p1+p2*r+..
+ *
+ */
+
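For readers less familiar with this family of kernels, the k/j split can be
modelled in scalar C.  The following is only a sketch: it keeps the e = 3k + j
decomposition described above but substitutes plain Newton refinement for the
rcp/cbrt tables and the polynomial, and defers special inputs to the scalar
cbrtf; the helper and test values are made up for illustration.

#include <math.h>
#include <stdio.h>

/* Scalar model of the exponent split: x = 2^(3k+j) * m, the cube root
   of the 2^(3k) part is exactly 2^k, and cbrt(2^j * m) is refined with
   Newton steps.  The vector kernel instead indexes the rcp/cbrt tables
   with j and the leading mantissa bits and applies a short polynomial.  */
static float
cbrtf_sketch (float x)
{
  if (x == 0.0f || !isfinite (x))
    return cbrtf (x);               /* special inputs: defer to scalar libm */
  float ax = fabsf (x);
  int e;
  float m = frexpf (ax, &e);        /* ax = m * 2^e, m in [0.5, 1) */
  int k = e / 3, j = e - 3 * k;     /* e = 3k + j; the kernel does /3 as *0x555 >> 12 */
  if (j < 0)                        /* keep j in {0, 1, 2} for negative e */
    {
      j += 3;
      k -= 1;
    }
  m = ldexpf (m, j);                /* fold 2^j into the reduced argument */
  float y = 1.0f;                   /* crude seed; the kernel starts from the table value */
  for (int i = 0; i < 6; i++)       /* Newton: y -= (y^3 - m) / (3*y^2) */
    y -= (y * y * y - m) / (3.0f * y * y);
  return copysignf (ldexpf (y, k), x);
}

int
main (void)
{
  for (float x = -8.0f; x <= 8.0f; x += 1.75f)
    printf ("cbrtf(%g): sketch %g, libm %g\n", x, cbrtf_sketch (x), cbrtf (x));
  return 0;
}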
+/* Offsets for data table __svml_scbrt_data_internal
+ */
+#define _sRcp                         	0
+#define _sCbrtHL                      	128
+#define _sP2                          	512
+#define _sP1                          	544
+#define _sMantissaMask                	576
+#define _sMantissaMask1               	608
+#define _sExpMask                     	640
+#define _sExpMask1                    	672
+#define _iRcpIndexMask                	704
+#define _iBExpMask                    	736
+#define _iSignMask                    	768
+#define _iBias                        	800
+#define _iOne                         	832
+#define _i555                         	864
+#define _iAbsMask                     	896
+#define _iSubConst                    	928
+#define _iCmpConst                    	960
+
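The byte offsets above follow directly from the table layout documented
further down in this file (32 _sRcp words, 96 _sCbrtHL words, then one
32-byte row per broadcast constant).  A compile-time cross-check of the
first few offsets, written against a simplified C view of that layout
(the __declspec(align(32)) annotations in the commented typedef do not
change these particular offsets):

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified view of __svml_scbrt_data_internal, enough to check the
   displacements used by the kernel above.  Illustrative only; the real
   table is defined in assembly.  */
struct scbrt_layout
{
  uint32_t sRcp[32];        /* 128 bytes -> _sCbrtHL at 128 */
  uint32_t sCbrtHL[96];     /* 384 bytes -> _sP2 at 512     */
  uint32_t sP2[8];          /* 32 bytes  -> _sP1 at 544     */
  uint32_t sP1[8];
};

static_assert (offsetof (struct scbrt_layout, sCbrtHL) == 128, "_sCbrtHL");
static_assert (offsetof (struct scbrt_layout, sP2) == 512, "_sP2");
static_assert (offsetof (struct scbrt_layout, sP1) == 544, "_sP1");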
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN8v_cbrtf_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+
+/* Load reciprocal value */
+        lea       __svml_scbrt_data_internal(%rip), %rdx
+        vmovaps   %ymm0, %ymm5
+
+/*
+ * Load constants
+ * Reciprocal index calculation
+ */
+        vpsrld    $16, %ymm5, %ymm3
+        vpand     _iRcpIndexMask+__svml_scbrt_data_internal(%rip), %ymm3, %ymm4
+        vextractf128 $1, %ymm4, %xmm15
+        vmovd     %xmm4, %eax
+        vmovd     %xmm15, %r8d
+        vpextrd   $1, %xmm15, %r9d
+        vpextrd   $2, %xmm15, %r10d
+        vpextrd   $3, %xmm15, %r11d
+        movslq    %r8d, %r8
+        movslq    %r9d, %r9
+        movslq    %r10d, %r10
+        movslq    %r11d, %r11
+        vpextrd   $1, %xmm4, %ecx
+        vpextrd   $2, %xmm4, %esi
+        vpextrd   $3, %xmm4, %edi
+        movslq    %eax, %rax
+        movslq    %ecx, %rcx
+        movslq    %esi, %rsi
+        movslq    %edi, %rdi
+        vmovd     (%rdx,%r8), %xmm13
+        vmovd     (%rdx,%r9), %xmm14
+        vmovd     (%rdx,%r10), %xmm1
+        vmovd     (%rdx,%r11), %xmm0
+        vpunpckldq %xmm14, %xmm13, %xmm2
+        vpunpckldq %xmm0, %xmm1, %xmm13
+
+/* Get signed biased exponent */
+        vpsrld    $7, %ymm3, %ymm0
+        vmovd     (%rdx,%rax), %xmm6
+        vmovd     (%rdx,%rcx), %xmm7
+        vmovd     (%rdx,%rsi), %xmm8
+        vmovd     (%rdx,%rdi), %xmm9
+        vpunpckldq %xmm7, %xmm6, %xmm10
+        vpunpckldq %xmm9, %xmm8, %xmm11
+        vpunpcklqdq %xmm11, %xmm10, %xmm12
+        vpunpcklqdq %xmm13, %xmm2, %xmm6
+        vandps    _iAbsMask+__svml_scbrt_data_internal(%rip), %ymm5, %ymm3
+
+/* Argument reduction */
+        vandps    _sMantissaMask+__svml_scbrt_data_internal(%rip), %ymm5, %ymm8
+        vandps    _sMantissaMask1+__svml_scbrt_data_internal(%rip), %ymm5, %ymm9
+        vpsubd    _iSubConst+__svml_scbrt_data_internal(%rip), %ymm3, %ymm7
+        vorps     _sExpMask+__svml_scbrt_data_internal(%rip), %ymm8, %ymm10
+        vorps     _sExpMask1+__svml_scbrt_data_internal(%rip), %ymm9, %ymm11
+
+/* r=y-y` */
+        vsubps    %ymm11, %ymm10, %ymm15
+
+/* Biased exponent-1 */
+        vpand     _iSignMask+__svml_scbrt_data_internal(%rip), %ymm0, %ymm8
+        vpcmpgtd  _iCmpConst+__svml_scbrt_data_internal(%rip), %ymm7, %ymm2
+        vmovmskps %ymm2, %eax
+        vinsertf128 $1, %xmm6, %ymm12, %ymm14
+
+/* Get absolute biased exponent */
+        vpand     _iBExpMask+__svml_scbrt_data_internal(%rip), %ymm0, %ymm6
+
+/* r=(y-y`)*rcp_table(y`) */
+        vmulps    %ymm15, %ymm14, %ymm1
+        vpsubd    _iOne+__svml_scbrt_data_internal(%rip), %ymm6, %ymm10
+
+/*
+ * Calculate exponent/3
+ * i555Exp=(2^{12}-1)/3*exponent
+ */
+        vpmulld   _i555+__svml_scbrt_data_internal(%rip), %ymm6, %ymm3
+
+/* Get K (exponent=3*k+j) */
+        vpsrld    $12, %ymm3, %ymm13
+
+/* Get J */
+        vpsubd    %ymm13, %ymm10, %ymm11
+
+/* Add 2/3*(bias-1)+1 to (k+1/3*(bias-1)) */
+        vpaddd    _iBias+__svml_scbrt_data_internal(%rip), %ymm13, %ymm7
+        vpsubd    %ymm13, %ymm11, %ymm12
+
+/* Attach sign to exponent */
+        vpor      %ymm8, %ymm7, %ymm9
+        vpsubd    %ymm13, %ymm12, %ymm14
+        vpslld    $23, %ymm9, %ymm0
+
+/* Get 128*J */
+        vpslld    $7, %ymm14, %ymm15
+
+/* iCbrtIndex=4*l+128*j */
+        vpaddd    %ymm15, %ymm4, %ymm4
+
+/* Zero index if callout expected */
+        vpandn    %ymm4, %ymm2, %ymm4
+
+/* Load Cbrt table Hi & Lo values */
+        vmovd     %xmm4, %ecx
+        vextractf128 $1, %ymm4, %xmm13
+        vpextrd   $1, %xmm4, %esi
+        movslq    %ecx, %rcx
+        movslq    %esi, %rsi
+        vmovd     %xmm13, %r9d
+        vmovd     128(%rdx,%rcx), %xmm2
+        vpextrd   $2, %xmm4, %edi
+        vpextrd   $3, %xmm4, %r8d
+        vmovd     128(%rdx,%rsi), %xmm3
+        vpextrd   $1, %xmm13, %r10d
+        vpextrd   $2, %xmm13, %ecx
+        vpextrd   $3, %xmm13, %esi
+        movslq    %edi, %rdi
+        movslq    %r8d, %r8
+        movslq    %r9d, %r9
+        movslq    %r10d, %r10
+        movslq    %ecx, %rcx
+        movslq    %esi, %rsi
+        vmovd     128(%rdx,%rdi), %xmm6
+        vmovd     128(%rdx,%r8), %xmm7
+        vmovd     128(%rdx,%r9), %xmm11
+        vmovd     128(%rdx,%r10), %xmm12
+        vmovd     128(%rdx,%rcx), %xmm14
+        vmovd     128(%rdx,%rsi), %xmm15
+        vpunpckldq %xmm3, %xmm2, %xmm8
+        vpunpckldq %xmm7, %xmm6, %xmm9
+        vpunpckldq %xmm12, %xmm11, %xmm4
+        vpunpckldq %xmm15, %xmm14, %xmm11
+        vpunpcklqdq %xmm9, %xmm8, %xmm10
+        vpunpcklqdq %xmm11, %xmm4, %xmm2
+        vinsertf128 $1, %xmm2, %ymm10, %ymm3
+
+/* sCbrtHi *= 2^k */
+        vmulps    %ymm3, %ymm0, %ymm2
+
+/* Polynomial:    p1+r*(p2+r*(p3+r*p4)) */
+        vmovups   _sP2+__svml_scbrt_data_internal(%rip), %ymm0
+        vfmadd213ps _sP1+__svml_scbrt_data_internal(%rip), %ymm1, %ymm0
+
+/* T`*r */
+        vmulps    %ymm2, %ymm1, %ymm1
+
+/* (T`*r)*P */
+        vmulps    %ymm1, %ymm0, %ymm0
+
+/*
+ * T`*r*P+D`
+ * result = T`+(T`*r*P+D`)
+ */
+        vaddps    %ymm0, %ymm2, %ymm0
+        testl     %eax, %eax
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm5
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %ymm5, 32(%rsp)
+        vmovups   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0
+
+        xorl      %edx, %edx
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      cbrtf@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 64(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN8v_cbrtf_avx2)
+
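The SPECIAL_VALUES_BRANCH / RANGEMASK_CHECK / SCALAR_MATH_CALL sequence above
reduces to a per-lane fixup loop.  A scalar model of it follows; the mask is
the vmovmskps result, the arrays stand in for the vectors spilled at 32(%rsp)
and 64(%rsp), and the helper name is made up for illustration.

#include <math.h>

/* Scalar model of the special-input fallback: the fast path has already
   stored a result for every lane; each lane flagged in the range mask is
   then recomputed with the scalar cbrtf.  */
static void
cbrtf8_fixup_special_lanes (const float in[8], float out[8], unsigned int mask)
{
  for (int lane = 0; lane < 8; lane++)      /* SPECIAL_VALUES_LOOP  */
    if (mask & (1u << lane))                /* RANGEMASK_CHECK: btl */
      out[lane] = cbrtf (in[lane]);         /* SCALAR_MATH_CALL     */
}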
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_scbrt_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(32)) VUINT32 _sRcp[32][1];
+        __declspec(align(32)) VUINT32 _sCbrtHL[96][1];
+        __declspec(align(32)) VUINT32 _sP2[8][1];
+        __declspec(align(32)) VUINT32 _sP1[8][1];
+        __declspec(align(32)) VUINT32 _sMantissaMask[8][1];
+        __declspec(align(32)) VUINT32 _sMantissaMask1[8][1];
+        __declspec(align(32)) VUINT32 _sExpMask[8][1];
+        __declspec(align(32)) VUINT32 _sExpMask1[8][1];
+        __declspec(align(32)) VUINT32 _iRcpIndexMask[8][1];
+        __declspec(align(32)) VUINT32 _iBExpMask[8][1];
+        __declspec(align(32)) VUINT32 _iSignMask[8][1];
+        __declspec(align(32)) VUINT32 _iBias[8][1];
+        __declspec(align(32)) VUINT32 _iOne[8][1];
+        __declspec(align(32)) VUINT32 _i555[8][1];
+        __declspec(align(32)) VUINT32 _iAbsMask[8][1];
+        __declspec(align(32)) VUINT32 _iSubConst[8][1];
+        __declspec(align(32)) VUINT32 _iCmpConst[8][1];
+} __svml_scbrt_data_internal;
+#endif
+__svml_scbrt_data_internal:
+        /*== _sRcp ==*/
+        .long 0xBF7C0FC1  /* (1/(1+0/32+1/64)) = -.984615 */
+        .long 0xBF74898D  /* (1/(1+1/32+1/64)) = -.955224 */
+        .long 0xBF6D7304  /* (1/(1+2/32+1/64)) = -.927536 */
+        .long 0xBF66C2B4  /* (1/(1+3/32+1/64)) = -.901408 */
+        .long 0xBF607038  /* (1/(1+4/32+1/64)) = -.876712 */
+        .long 0xBF5A740E  /* (1/(1+5/32+1/64)) = -.853333 */
+        .long 0xBF54C77B  /* (1/(1+6/32+1/64)) = -.831169 */
+        .long 0xBF4F6475  /* (1/(1+7/32+1/64)) = -.810127 */
+        .long 0xBF4A4588  /* (1/(1+8/32+1/64)) = -.790123 */
+        .long 0xBF4565C8  /* (1/(1+9/32+1/64)) = -.771084 */
+        .long 0xBF40C0C1  /* (1/(1+10/32+1/64)) = -.752941 */
+        .long 0xBF3C5264  /* (1/(1+11/32+1/64)) = -.735632 */
+        .long 0xBF381703  /* (1/(1+12/32+1/64)) = -.719101 */
+        .long 0xBF340B41  /* (1/(1+13/32+1/64)) = -.703297 */
+        .long 0xBF302C0B  /* (1/(1+14/32+1/64)) = -.688172 */
+        .long 0xBF2C7692  /* (1/(1+15/32+1/64)) = -.673684 */
+        .long 0xBF28E83F  /* (1/(1+16/32+1/64)) = -.659794 */
+        .long 0xBF257EB5  /* (1/(1+17/32+1/64)) = -.646465 */
+        .long 0xBF2237C3  /* (1/(1+18/32+1/64)) = -.633663 */
+        .long 0xBF1F1166  /* (1/(1+19/32+1/64)) = -.621359 */
+        .long 0xBF1C09C1  /* (1/(1+20/32+1/64)) = -.609524 */
+        .long 0xBF191F1A  /* (1/(1+21/32+1/64)) = -.598131 */
+        .long 0xBF164FDA  /* (1/(1+22/32+1/64)) = -.587156 */
+        .long 0xBF139A86  /* (1/(1+23/32+1/64)) = -.576577 */
+        .long 0xBF10FDBC  /* (1/(1+24/32+1/64)) = -.566372 */
+        .long 0xBF0E7835  /* (1/(1+25/32+1/64)) = -.556522 */
+        .long 0xBF0C08C1  /* (1/(1+26/32+1/64)) = -.547009 */
+        .long 0xBF09AE41  /* (1/(1+27/32+1/64)) = -.537815 */
+        .long 0xBF0767AB  /* (1/(1+28/32+1/64)) = -.528926 */
+        .long 0xBF053408  /* (1/(1+29/32+1/64)) = -.520325 */
+        .long 0xBF03126F  /* (1/(1+30/32+1/64)) = -.512    */
+        .long 0xBF010204  /* (1/(1+31/32+1/64)) = -.503937 */
+        /*== _sCbrtHL ==*/
+        .align 32
+        .long 0x3F80A9C9    /* HI((2^0*(1+0/32+1/64))^(1/3)) = 1.005181 */
+        .long 0x3F81F833    /* HI((2^0*(1+1/32+1/64))^(1/3)) = 1.015387 */
+        .long 0x3F834007    /* HI((2^0*(1+2/32+1/64))^(1/3)) = 1.025391 */
+        .long 0x3F848194    /* HI((2^0*(1+3/32+1/64))^(1/3)) = 1.035204 */
+        .long 0x3F85BD25    /* HI((2^0*(1+4/32+1/64))^(1/3)) = 1.044835 */
+        .long 0x3F86F300    /* HI((2^0*(1+5/32+1/64))^(1/3)) = 1.054291 */
+        .long 0x3F882365    /* HI((2^0*(1+6/32+1/64))^(1/3)) = 1.06358  */
+        .long 0x3F894E90    /* HI((2^0*(1+7/32+1/64))^(1/3)) = 1.07271  */
+        .long 0x3F8A74B9    /* HI((2^0*(1+8/32+1/64))^(1/3)) = 1.081687 */
+        .long 0x3F8B9615    /* HI((2^0*(1+9/32+1/64))^(1/3)) = 1.090518 */
+        .long 0x3F8CB2D4    /* HI((2^0*(1+10/32+1/64))^(1/3)) = 1.099207 */
+        .long 0x3F8DCB24    /* HI((2^0*(1+11/32+1/64))^(1/3)) = 1.107762 */
+        .long 0x3F8EDF31    /* HI((2^0*(1+12/32+1/64))^(1/3)) = 1.116186 */
+        .long 0x3F8FEF22    /* HI((2^0*(1+13/32+1/64))^(1/3)) = 1.124485 */
+        .long 0x3F90FB1F    /* HI((2^0*(1+14/32+1/64))^(1/3)) = 1.132664 */
+        .long 0x3F92034C    /* HI((2^0*(1+15/32+1/64))^(1/3)) = 1.140726 */
+        .long 0x3F9307CA    /* HI((2^0*(1+16/32+1/64))^(1/3)) = 1.148675 */
+        .long 0x3F9408B9    /* HI((2^0*(1+17/32+1/64))^(1/3)) = 1.156516 */
+        .long 0x3F950638    /* HI((2^0*(1+18/32+1/64))^(1/3)) = 1.164252 */
+        .long 0x3F960064    /* HI((2^0*(1+19/32+1/64))^(1/3)) = 1.171887 */
+        .long 0x3F96F759    /* HI((2^0*(1+20/32+1/64))^(1/3)) = 1.179423 */
+        .long 0x3F97EB2F    /* HI((2^0*(1+21/32+1/64))^(1/3)) = 1.186865 */
+        .long 0x3F98DC01    /* HI((2^0*(1+22/32+1/64))^(1/3)) = 1.194214 */
+        .long 0x3F99C9E5    /* HI((2^0*(1+23/32+1/64))^(1/3)) = 1.201474 */
+        .long 0x3F9AB4F2    /* HI((2^0*(1+24/32+1/64))^(1/3)) = 1.208647 */
+        .long 0x3F9B9D3D    /* HI((2^0*(1+25/32+1/64))^(1/3)) = 1.215736 */
+        .long 0x3F9C82DA    /* HI((2^0*(1+26/32+1/64))^(1/3)) = 1.222743 */
+        .long 0x3F9D65DD    /* HI((2^0*(1+27/32+1/64))^(1/3)) = 1.229671 */
+        .long 0x3F9E4659    /* HI((2^0*(1+28/32+1/64))^(1/3)) = 1.236522 */
+        .long 0x3F9F245F    /* HI((2^0*(1+29/32+1/64))^(1/3)) = 1.243297 */
+        .long 0x3FA00000    /* HI((2^0*(1+30/32+1/64))^(1/3)) = 1.25     */
+        .long 0x3FA0D94C    /* HI((2^0*(1+31/32+1/64))^(1/3)) = 1.256631 */
+        .long 0x3FA21B02    /* HI((2^1*(1+0/32+1/64))^(1/3)) = 1.266449 */
+        .long 0x3FA3C059    /* HI((2^1*(1+1/32+1/64))^(1/3)) = 1.279307 */
+        .long 0x3FA55D61    /* HI((2^1*(1+2/32+1/64))^(1/3)) = 1.291912 */
+        .long 0x3FA6F282    /* HI((2^1*(1+3/32+1/64))^(1/3)) = 1.304276 */
+        .long 0x3FA8801A    /* HI((2^1*(1+4/32+1/64))^(1/3)) = 1.316409 */
+        .long 0x3FAA067E    /* HI((2^1*(1+5/32+1/64))^(1/3)) = 1.328323 */
+        .long 0x3FAB8602    /* HI((2^1*(1+6/32+1/64))^(1/3)) = 1.340027 */
+        .long 0x3FACFEEF    /* HI((2^1*(1+7/32+1/64))^(1/3)) = 1.35153  */
+        .long 0x3FAE718E    /* HI((2^1*(1+8/32+1/64))^(1/3)) = 1.36284  */
+        .long 0x3FAFDE1F    /* HI((2^1*(1+9/32+1/64))^(1/3)) = 1.373966 */
+        .long 0x3FB144E1    /* HI((2^1*(1+10/32+1/64))^(1/3)) = 1.384915 */
+        .long 0x3FB2A60D    /* HI((2^1*(1+11/32+1/64))^(1/3)) = 1.395692 */
+        .long 0x3FB401DA    /* HI((2^1*(1+12/32+1/64))^(1/3)) = 1.406307 */
+        .long 0x3FB5587B    /* HI((2^1*(1+13/32+1/64))^(1/3)) = 1.416763 */
+        .long 0x3FB6AA20    /* HI((2^1*(1+14/32+1/64))^(1/3)) = 1.427067 */
+        .long 0x3FB7F6F7    /* HI((2^1*(1+15/32+1/64))^(1/3)) = 1.437224 */
+        .long 0x3FB93F29    /* HI((2^1*(1+16/32+1/64))^(1/3)) = 1.44724  */
+        .long 0x3FBA82E1    /* HI((2^1*(1+17/32+1/64))^(1/3)) = 1.457119 */
+        .long 0x3FBBC244    /* HI((2^1*(1+18/32+1/64))^(1/3)) = 1.466866 */
+        .long 0x3FBCFD77    /* HI((2^1*(1+19/32+1/64))^(1/3)) = 1.476485 */
+        .long 0x3FBE349B    /* HI((2^1*(1+20/32+1/64))^(1/3)) = 1.48598  */
+        .long 0x3FBF67D3    /* HI((2^1*(1+21/32+1/64))^(1/3)) = 1.495356 */
+        .long 0x3FC0973C    /* HI((2^1*(1+22/32+1/64))^(1/3)) = 1.504615 */
+        .long 0x3FC1C2F6    /* HI((2^1*(1+23/32+1/64))^(1/3)) = 1.513762 */
+        .long 0x3FC2EB1A    /* HI((2^1*(1+24/32+1/64))^(1/3)) = 1.5228   */
+        .long 0x3FC40FC6    /* HI((2^1*(1+25/32+1/64))^(1/3)) = 1.531731 */
+        .long 0x3FC53112    /* HI((2^1*(1+26/32+1/64))^(1/3)) = 1.54056  */
+        .long 0x3FC64F16    /* HI((2^1*(1+27/32+1/64))^(1/3)) = 1.549289 */
+        .long 0x3FC769EB    /* HI((2^1*(1+28/32+1/64))^(1/3)) = 1.55792  */
+        .long 0x3FC881A6    /* HI((2^1*(1+29/32+1/64))^(1/3)) = 1.566457 */
+        .long 0x3FC9965D    /* HI((2^1*(1+30/32+1/64))^(1/3)) = 1.574901 */
+        .long 0x3FCAA825    /* HI((2^1*(1+31/32+1/64))^(1/3)) = 1.583256 */
+        .long 0x3FCC3D79    /* HI((2^2*(1+0/32+1/64))^(1/3)) = 1.595626 */
+        .long 0x3FCE5054    /* HI((2^2*(1+1/32+1/64))^(1/3)) = 1.611826 */
+        .long 0x3FD058B8    /* HI((2^2*(1+2/32+1/64))^(1/3)) = 1.627707 */
+        .long 0x3FD25726    /* HI((2^2*(1+3/32+1/64))^(1/3)) = 1.643285 */
+        .long 0x3FD44C15    /* HI((2^2*(1+4/32+1/64))^(1/3)) = 1.658572 */
+        .long 0x3FD637F2    /* HI((2^2*(1+5/32+1/64))^(1/3)) = 1.673582 */
+        .long 0x3FD81B24    /* HI((2^2*(1+6/32+1/64))^(1/3)) = 1.688328 */
+        .long 0x3FD9F60B    /* HI((2^2*(1+7/32+1/64))^(1/3)) = 1.702821 */
+        .long 0x3FDBC8FE    /* HI((2^2*(1+8/32+1/64))^(1/3)) = 1.717071 */
+        .long 0x3FDD9452    /* HI((2^2*(1+9/32+1/64))^(1/3)) = 1.731089 */
+        .long 0x3FDF5853    /* HI((2^2*(1+10/32+1/64))^(1/3)) = 1.744883 */
+        .long 0x3FE1154B    /* HI((2^2*(1+11/32+1/64))^(1/3)) = 1.758462 */
+        .long 0x3FE2CB7F    /* HI((2^2*(1+12/32+1/64))^(1/3)) = 1.771835 */
+        .long 0x3FE47B2E    /* HI((2^2*(1+13/32+1/64))^(1/3)) = 1.785009 */
+        .long 0x3FE62496    /* HI((2^2*(1+14/32+1/64))^(1/3)) = 1.797992 */
+        .long 0x3FE7C7F0    /* HI((2^2*(1+15/32+1/64))^(1/3)) = 1.810789 */
+        .long 0x3FE96571    /* HI((2^2*(1+16/32+1/64))^(1/3)) = 1.823408 */
+        .long 0x3FEAFD4C    /* HI((2^2*(1+17/32+1/64))^(1/3)) = 1.835855 */
+        .long 0x3FEC8FB3    /* HI((2^2*(1+18/32+1/64))^(1/3)) = 1.848135 */
+        .long 0x3FEE1CD3    /* HI((2^2*(1+19/32+1/64))^(1/3)) = 1.860255 */
+        .long 0x3FEFA4D7    /* HI((2^2*(1+20/32+1/64))^(1/3)) = 1.872218 */
+        .long 0x3FF127E9    /* HI((2^2*(1+21/32+1/64))^(1/3)) = 1.88403  */
+        .long 0x3FF2A62F    /* HI((2^2*(1+22/32+1/64))^(1/3)) = 1.895697 */
+        .long 0x3FF41FD0    /* HI((2^2*(1+23/32+1/64))^(1/3)) = 1.907221 */
+        .long 0x3FF594EE    /* HI((2^2*(1+24/32+1/64))^(1/3)) = 1.918607 */
+        .long 0x3FF705AC    /* HI((2^2*(1+25/32+1/64))^(1/3)) = 1.929861 */
+        .long 0x3FF8722A    /* HI((2^2*(1+26/32+1/64))^(1/3)) = 1.940984 */
+        .long 0x3FF9DA86    /* HI((2^2*(1+27/32+1/64))^(1/3)) = 1.951981 */
+        .long 0x3FFB3EDE    /* HI((2^2*(1+28/32+1/64))^(1/3)) = 1.962856 */
+        .long 0x3FFC9F4E    /* HI((2^2*(1+29/32+1/64))^(1/3)) = 1.973612 */
+        .long 0x3FFDFBF2    /* HI((2^2*(1+30/32+1/64))^(1/3)) = 1.984251 */
+        .long 0x3FFF54E3    /* HI((2^2*(1+31/32+1/64))^(1/3)) = 1.994778 */
+        .align 32
+        .long 0xBDE3A962, 0xBDE3A962, 0xBDE3A962, 0xBDE3A962, 0xBDE3A962, 0xBDE3A962, 0xBDE3A962, 0xBDE3A962  /* _sP2 */
+        .align 32
+        .long 0x3EAAAC91, 0x3EAAAC91, 0x3EAAAC91, 0x3EAAAC91, 0x3EAAAC91, 0x3EAAAC91, 0x3EAAAC91, 0x3EAAAC91  /* _sP1 */
+        .align 32
+        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff  /* _sMantissaMask (EXP_MSK3) */
+        .align 32
+        .long 0x007e0000, 0x007e0000, 0x007e0000, 0x007e0000, 0x007e0000, 0x007e0000, 0x007e0000, 0x007e0000  /* _sMantissaMask1 (SIG_MASK) */
+        .align 32
+        .long 0xBF800000, 0xBF800000, 0xBF800000, 0xBF800000, 0xBF800000, 0xBF800000, 0xBF800000, 0xBF800000  /* _sExpMask  (EXP_MASK) */
+        .align 32
+        .long 0xBF820000, 0xBF820000, 0xBF820000, 0xBF820000, 0xBF820000, 0xBF820000, 0xBF820000, 0xBF820000  /* _sExpMask1 (EXP_MASK2) */
+        .align 32
+        .long 0x0000007c, 0x0000007c, 0x0000007c, 0x0000007c, 0x0000007c, 0x0000007c, 0x0000007c, 0x0000007c  /* _iRcpIndexMask */
+        .align 32
+        .long 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff  /* _iBExpMask */
+        .align 32
+        .long 0x00000100, 0x00000100, 0x00000100, 0x00000100, 0x00000100, 0x00000100, 0x00000100, 0x00000100  /* _iSignMask */
+        .align 32
+        .long 0x00000055, 0x00000055, 0x00000055, 0x00000055, 0x00000055, 0x00000055, 0x00000055, 0x00000055  /* _iBias */
+        .align 32
+        .long 0x00000001, 0x00000001, 0x00000001, 0x00000001, 0x00000001, 0x00000001, 0x00000001, 0x00000001  /* _iOne */
+        .align 32
+        .long 0x00000555, 0x00000555, 0x00000555, 0x00000555, 0x00000555, 0x00000555, 0x00000555, 0x00000555  /* _i555 */
+        .align 32
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff  /* _iAbsMask */
+        .align 32
+        .long 0x80800000, 0x80800000, 0x80800000, 0x80800000, 0x80800000, 0x80800000, 0x80800000, 0x80800000  /* _iSubConst */
+        .align 32
+        .long 0xFEFFFFFF, 0xFEFFFFFF, 0xFEFFFFFF, 0xFEFFFFFF, 0xFEFFFFFF, 0xFEFFFFFF, 0xFEFFFFFF, 0xFEFFFFFF  /* _iCmpConst */
+        .align 32
+        .type	__svml_scbrt_data_internal,@object
+        .size	__svml_scbrt_data_internal,.-__svml_scbrt_data_internal
diff --git a/sysdeps/x86_64/fpu/svml_d_cbrt2_core.S b/sysdeps/x86_64/fpu/svml_d_cbrt2_core.S
new file mode 100644
index 0000000000..4bf546564b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cbrt2_core.S
@@ -0,0 +1,29 @@
+/* Function cbrt vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN2v_cbrt)
+WRAPPER_IMPL_SSE2 cbrt
+END (_ZGVbN2v_cbrt)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN2v_cbrt)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_cbrt4_core.S b/sysdeps/x86_64/fpu/svml_d_cbrt4_core.S
new file mode 100644
index 0000000000..e6d1003e27
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cbrt4_core.S
@@ -0,0 +1,29 @@
+/* Function cbrt vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN4v_cbrt)
+WRAPPER_IMPL_AVX _ZGVbN2v_cbrt
+END (_ZGVdN4v_cbrt)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN4v_cbrt)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_cbrt4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_cbrt4_core_avx.S
new file mode 100644
index 0000000000..70632869ac
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cbrt4_core_avx.S
@@ -0,0 +1,25 @@
+/* Function cbrt vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN4v_cbrt)
+WRAPPER_IMPL_AVX _ZGVbN2v_cbrt
+END (_ZGVcN4v_cbrt)
diff --git a/sysdeps/x86_64/fpu/svml_d_cbrt8_core.S b/sysdeps/x86_64/fpu/svml_d_cbrt8_core.S
new file mode 100644
index 0000000000..37571673a7
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cbrt8_core.S
@@ -0,0 +1,25 @@
+/* Function cbrt vectorized with AVX-512, wrapper to AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN8v_cbrt)
+WRAPPER_IMPL_AVX512 _ZGVdN4v_cbrt
+END (_ZGVeN8v_cbrt)
diff --git a/sysdeps/x86_64/fpu/svml_s_cbrtf16_core.S b/sysdeps/x86_64/fpu/svml_s_cbrtf16_core.S
new file mode 100644
index 0000000000..1be6294026
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_cbrtf16_core.S
@@ -0,0 +1,25 @@
+/* Function cbrtf vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN16v_cbrtf)
+WRAPPER_IMPL_AVX512 _ZGVdN8v_cbrtf
+END (_ZGVeN16v_cbrtf)
diff --git a/sysdeps/x86_64/fpu/svml_s_cbrtf4_core.S b/sysdeps/x86_64/fpu/svml_s_cbrtf4_core.S
new file mode 100644
index 0000000000..2469a100f4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_cbrtf4_core.S
@@ -0,0 +1,29 @@
+/* Function cbrtf vectorized with SSE2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN4v_cbrtf)
+WRAPPER_IMPL_SSE2 cbrtf
+END (_ZGVbN4v_cbrtf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4v_cbrtf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_cbrtf8_core.S b/sysdeps/x86_64/fpu/svml_s_cbrtf8_core.S
new file mode 100644
index 0000000000..efedc22323
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_cbrtf8_core.S
@@ -0,0 +1,29 @@
+/* Function cbrtf vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN8v_cbrtf)
+WRAPPER_IMPL_AVX _ZGVbN4v_cbrtf
+END (_ZGVdN8v_cbrtf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8v_cbrtf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_cbrtf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_cbrtf8_core_avx.S
new file mode 100644
index 0000000000..b5acc62426
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_cbrtf8_core_avx.S
@@ -0,0 +1,25 @@
+/* Function cbrtf vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+        .text
+ENTRY (_ZGVcN8v_cbrtf)
+WRAPPER_IMPL_AVX _ZGVbN4v_cbrtf
+END (_ZGVcN8v_cbrtf)
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx.c
new file mode 100644
index 0000000000..c8bc643c99
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-cbrt.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx2.c
new file mode 100644
index 0000000000..c8bc643c99
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx2.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-cbrt.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx512f.c
new file mode 100644
index 0000000000..c8bc643c99
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx512f.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-cbrt.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cbrt.c b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt.c
new file mode 100644
index 0000000000..fb3684b18c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE double
+#define LIBMVEC_FUNC cbrt
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index db136cc901..b1981ac7e4 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -36,6 +36,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVbN2v_exp10)
 VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVbN2v_cosh)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVbN2v_expm1)
 VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVbN2v_sinh)
+VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVbN2v_cbrt)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index 5fc09ac8c0..47915a7e59 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVdN4v_exp10)
 VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVdN4v_cosh)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVdN4v_expm1)
 VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVdN4v_sinh)
+VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVdN4v_cbrt)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m256i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index 26ef7fb365..5cd5049807 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -36,6 +36,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVcN4v_exp10)
 VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVcN4v_cosh)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVcN4v_expm1)
 VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVcN4v_sinh)
+VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVcN4v_cbrt)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index c7055fca76..83970739ab 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -36,6 +36,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVeN8v_exp10)
 VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVeN8v_cosh)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVeN8v_expm1)
 VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVeN8v_sinh)
+VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVeN8v_cbrt)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx.c
new file mode 100644
index 0000000000..59b8d77f71
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-cbrtf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx2.c
new file mode 100644
index 0000000000..59b8d77f71
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx2.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-cbrtf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx512f.c
new file mode 100644
index 0000000000..59b8d77f71
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx512f.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-cbrtf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf.c b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf.c
new file mode 100644
index 0000000000..3a06ba79e0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE float
+#define LIBMVEC_FUNC cbrtf
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index d353bcb0f2..0420f11c28 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -36,6 +36,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVeN16v_exp10f)
 VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVeN16v_coshf)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVeN16v_expm1f)
 VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVeN16v_sinhf)
+VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVeN16v_cbrtf)
 
 #define VEC_INT_TYPE __m512i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index 5e59117626..c8f7580265 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -36,6 +36,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVbN4v_exp10f)
 VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVbN4v_coshf)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVbN4v_expm1f)
 VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVbN4v_sinhf)
+VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVbN4v_cbrtf)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index e884a5f4df..b581796b88 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVdN8v_exp10f)
 VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVdN8v_coshf)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVdN8v_expm1f)
 VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVdN8v_sinhf)
+VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVdN8v_cbrtf)
 
 /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
 #undef VECTOR_WRAPPER_fFF
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index 95910d39e9..f16789e5ff 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -36,6 +36,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVcN8v_exp10f)
 VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVcN8v_coshf)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVcN8v_expm1f)
 VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVcN8v_sinhf)
+VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVcN8v_cbrtf)
 
 #define VEC_INT_TYPE __m128i
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 10/18] x86-64: Add vector atan2/atan2f implementation to libmvec
  2021-12-29  6:39 [PATCH v5 00/18] x86-64: Add vector math functions to libmvec Sunil K Pandey
                   ` (8 preceding siblings ...)
  2021-12-29  6:39 ` [PATCH v5 09/18] x86-64: Add vector cbrt/cbrtf " Sunil K Pandey
@ 2021-12-29  6:39 ` Sunil K Pandey
  2021-12-29 21:26   ` H.J. Lu
  2021-12-29  6:39 ` [PATCH v5 11/18] x86-64: Add vector log10/log10f " Sunil K Pandey
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Sunil K Pandey @ 2021-12-29  6:39 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, andrey.kolesov, marius.cornea

Implement vectorized atan2/atan2f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector atan2/atan2f with regenerated ulps.
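
For context, these entry points are reached through compiler
auto-vectorization of scalar atan2/atan2f calls once the SIMD declarations
added in sysdeps/x86/fpu/bits/math-vector.h are visible.  A minimal
consumer-side sketch; the flags are one plausible GCC combination and are
not part of this patch.

#include <math.h>

/* With -O2 -ftree-loop-vectorize -ffast-math -mavx2, GCC can turn this
   loop into calls to _ZGVdN4vv_atan2 (and _ZGVdN8vv_atan2f for a float
   variant) provided by the new libmvec entry points.  */
void
polar_angles (const double *y, const double *x, double *phi, int n)
{
  for (int i = 0; i < n; i++)
    phi[i] = atan2 (y[i], x[i]);
}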
---
 bits/libm-simd-decl-stubs.h                   |  11 +
 math/bits/mathcalls.h                         |   2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
 sysdeps/x86/fpu/bits/math-vector.h            |   4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
 sysdeps/x86_64/fpu/Makeconfig                 |   1 +
 sysdeps/x86_64/fpu/Versions                   |   2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |  20 +
 .../fpu/multiarch/svml_d_atan22_core-sse2.S   |  20 +
 .../x86_64/fpu/multiarch/svml_d_atan22_core.c |  28 ++
 .../fpu/multiarch/svml_d_atan22_core_sse4.S   | 471 +++++++++++++++++
 .../fpu/multiarch/svml_d_atan24_core-sse.S    |  20 +
 .../x86_64/fpu/multiarch/svml_d_atan24_core.c |  28 ++
 .../fpu/multiarch/svml_d_atan24_core_avx2.S   | 451 +++++++++++++++++
 .../fpu/multiarch/svml_d_atan28_core-avx2.S   |  20 +
 .../x86_64/fpu/multiarch/svml_d_atan28_core.c |  28 ++
 .../fpu/multiarch/svml_d_atan28_core_avx512.S | 475 ++++++++++++++++++
 .../fpu/multiarch/svml_s_atan2f16_core-avx2.S |  20 +
 .../fpu/multiarch/svml_s_atan2f16_core.c      |  28 ++
 .../multiarch/svml_s_atan2f16_core_avx512.S   | 399 +++++++++++++++
 .../fpu/multiarch/svml_s_atan2f4_core-sse2.S  |  20 +
 .../fpu/multiarch/svml_s_atan2f4_core.c       |  28 ++
 .../fpu/multiarch/svml_s_atan2f4_core_sse4.S  | 384 ++++++++++++++
 .../fpu/multiarch/svml_s_atan2f8_core-sse.S   |  20 +
 .../fpu/multiarch/svml_s_atan2f8_core.c       |  28 ++
 .../fpu/multiarch/svml_s_atan2f8_core_avx2.S  | 362 +++++++++++++
 sysdeps/x86_64/fpu/svml_d_atan22_core.S       |  29 ++
 sysdeps/x86_64/fpu/svml_d_atan24_core.S       |  29 ++
 sysdeps/x86_64/fpu/svml_d_atan24_core_avx.S   |  25 +
 sysdeps/x86_64/fpu/svml_d_atan28_core.S       |  25 +
 sysdeps/x86_64/fpu/svml_s_atan2f16_core.S     |  25 +
 sysdeps/x86_64/fpu/svml_s_atan2f4_core.S      |  29 ++
 sysdeps/x86_64/fpu/svml_s_atan2f8_core.S      |  29 ++
 sysdeps/x86_64/fpu/svml_s_atan2f8_core_avx.S  |  25 +
 .../fpu/test-double-libmvec-atan2-avx.c       |   1 +
 .../fpu/test-double-libmvec-atan2-avx2.c      |   1 +
 .../fpu/test-double-libmvec-atan2-avx512f.c   |   1 +
 .../x86_64/fpu/test-double-libmvec-atan2.c    |   3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
 .../fpu/test-float-libmvec-atan2f-avx.c       |   1 +
 .../fpu/test-float-libmvec-atan2f-avx2.c      |   1 +
 .../fpu/test-float-libmvec-atan2f-avx512f.c   |   1 +
 .../x86_64/fpu/test-float-libmvec-atan2f.c    |   3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
 50 files changed, 3117 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atan22_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atan24_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atan24_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atan28_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atan2f16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atan2f4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atan2f8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atan2f8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atan2f.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
index 7f1304ed1d..31878bf4ed 100644
--- a/bits/libm-simd-decl-stubs.h
+++ b/bits/libm-simd-decl-stubs.h
@@ -208,4 +208,15 @@
 #define __DECL_SIMD_cbrtf32x
 #define __DECL_SIMD_cbrtf64x
 #define __DECL_SIMD_cbrtf128x
+
+#define __DECL_SIMD_atan2
+#define __DECL_SIMD_atan2f
+#define __DECL_SIMD_atan2l
+#define __DECL_SIMD_atan2f16
+#define __DECL_SIMD_atan2f32
+#define __DECL_SIMD_atan2f64
+#define __DECL_SIMD_atan2f128
+#define __DECL_SIMD_atan2f32x
+#define __DECL_SIMD_atan2f64x
+#define __DECL_SIMD_atan2f128x
 #endif
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index 26d18f0135..1bd4911993 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -56,7 +56,7 @@ __MATHCALL_VEC (asin,, (_Mdouble_ __x));
 /* Arc tangent of X.  */
 __MATHCALL_VEC (atan,, (_Mdouble_ __x));
 /* Arc tangent of Y/X.  */
-__MATHCALL (atan2,, (_Mdouble_ __y, _Mdouble_ __x));
+__MATHCALL_VEC (atan2,, (_Mdouble_ __y, _Mdouble_ __x));
 
 /* Cosine of X.  */
 __MATHCALL_VEC (cos,, (_Mdouble_ __x));
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index a6558d9810..2b3b8d3886 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -55,6 +55,7 @@ GLIBC_2.35 _ZGVbN2v_exp10 F
 GLIBC_2.35 _ZGVbN2v_exp2 F
 GLIBC_2.35 _ZGVbN2v_expm1 F
 GLIBC_2.35 _ZGVbN2v_sinh F
+GLIBC_2.35 _ZGVbN2vv_atan2 F
 GLIBC_2.35 _ZGVbN2vv_hypot F
 GLIBC_2.35 _ZGVbN4v_acosf F
 GLIBC_2.35 _ZGVbN4v_asinf F
@@ -65,6 +66,7 @@ GLIBC_2.35 _ZGVbN4v_exp10f F
 GLIBC_2.35 _ZGVbN4v_exp2f F
 GLIBC_2.35 _ZGVbN4v_expm1f F
 GLIBC_2.35 _ZGVbN4v_sinhf F
+GLIBC_2.35 _ZGVbN4vv_atan2f F
 GLIBC_2.35 _ZGVbN4vv_hypotf F
 GLIBC_2.35 _ZGVcN4v_acos F
 GLIBC_2.35 _ZGVcN4v_asin F
@@ -75,6 +77,7 @@ GLIBC_2.35 _ZGVcN4v_exp10 F
 GLIBC_2.35 _ZGVcN4v_exp2 F
 GLIBC_2.35 _ZGVcN4v_expm1 F
 GLIBC_2.35 _ZGVcN4v_sinh F
+GLIBC_2.35 _ZGVcN4vv_atan2 F
 GLIBC_2.35 _ZGVcN4vv_hypot F
 GLIBC_2.35 _ZGVcN8v_acosf F
 GLIBC_2.35 _ZGVcN8v_asinf F
@@ -85,6 +88,7 @@ GLIBC_2.35 _ZGVcN8v_exp10f F
 GLIBC_2.35 _ZGVcN8v_exp2f F
 GLIBC_2.35 _ZGVcN8v_expm1f F
 GLIBC_2.35 _ZGVcN8v_sinhf F
+GLIBC_2.35 _ZGVcN8vv_atan2f F
 GLIBC_2.35 _ZGVcN8vv_hypotf F
 GLIBC_2.35 _ZGVdN4v_acos F
 GLIBC_2.35 _ZGVdN4v_asin F
@@ -95,6 +99,7 @@ GLIBC_2.35 _ZGVdN4v_exp10 F
 GLIBC_2.35 _ZGVdN4v_exp2 F
 GLIBC_2.35 _ZGVdN4v_expm1 F
 GLIBC_2.35 _ZGVdN4v_sinh F
+GLIBC_2.35 _ZGVdN4vv_atan2 F
 GLIBC_2.35 _ZGVdN4vv_hypot F
 GLIBC_2.35 _ZGVdN8v_acosf F
 GLIBC_2.35 _ZGVdN8v_asinf F
@@ -105,6 +110,7 @@ GLIBC_2.35 _ZGVdN8v_exp10f F
 GLIBC_2.35 _ZGVdN8v_exp2f F
 GLIBC_2.35 _ZGVdN8v_expm1f F
 GLIBC_2.35 _ZGVdN8v_sinhf F
+GLIBC_2.35 _ZGVdN8vv_atan2f F
 GLIBC_2.35 _ZGVdN8vv_hypotf F
 GLIBC_2.35 _ZGVeN16v_acosf F
 GLIBC_2.35 _ZGVeN16v_asinf F
@@ -115,6 +121,7 @@ GLIBC_2.35 _ZGVeN16v_exp10f F
 GLIBC_2.35 _ZGVeN16v_exp2f F
 GLIBC_2.35 _ZGVeN16v_expm1f F
 GLIBC_2.35 _ZGVeN16v_sinhf F
+GLIBC_2.35 _ZGVeN16vv_atan2f F
 GLIBC_2.35 _ZGVeN16vv_hypotf F
 GLIBC_2.35 _ZGVeN8v_acos F
 GLIBC_2.35 _ZGVeN8v_asin F
@@ -125,4 +132,5 @@ GLIBC_2.35 _ZGVeN8v_exp10 F
 GLIBC_2.35 _ZGVeN8v_exp2 F
 GLIBC_2.35 _ZGVeN8v_expm1 F
 GLIBC_2.35 _ZGVeN8v_sinh F
+GLIBC_2.35 _ZGVeN8vv_atan2 F
 GLIBC_2.35 _ZGVeN8vv_hypot F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index dcd45934ab..62f2890ab3 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -98,6 +98,10 @@
 #  define __DECL_SIMD_cbrt __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_cbrtf
 #  define __DECL_SIMD_cbrtf __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_atan2
+#  define __DECL_SIMD_atan2 __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_atan2f
+#  define __DECL_SIMD_atan2f __DECL_SIMD_x86_64
 
 # endif
 #endif
diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
index dfb5f13ea3..2269b74d50 100644
--- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
@@ -48,6 +48,8 @@
 !GCC$ builtin (sinhf) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (cbrt) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (cbrtf) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (atan2) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (atan2f) attributes simd (notinbranch) if('x86_64')
 
 !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
@@ -81,3 +83,5 @@
 !GCC$ builtin (sinhf) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (cbrt) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (cbrtf) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (atan2) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (atan2f) attributes simd (notinbranch) if('x32')
diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
index dde737c0d6..96a40856fa 100644
--- a/sysdeps/x86_64/fpu/Makeconfig
+++ b/sysdeps/x86_64/fpu/Makeconfig
@@ -25,6 +25,7 @@ libmvec-funcs = \
   acos \
   asin \
   atan \
+  atan2 \
   cbrt \
   cos \
   cosh \
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index b70aeb3e2f..f58c98eb45 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -23,6 +23,7 @@ libmvec {
     _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
     _ZGVbN2v_expm1; _ZGVcN4v_expm1; _ZGVdN4v_expm1; _ZGVeN8v_expm1;
     _ZGVbN2v_sinh; _ZGVcN4v_sinh; _ZGVdN4v_sinh; _ZGVeN8v_sinh;
+    _ZGVbN2vv_atan2; _ZGVcN4vv_atan2; _ZGVdN4vv_atan2; _ZGVeN8vv_atan2;
     _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
     _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
     _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
@@ -33,6 +34,7 @@ libmvec {
     _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
     _ZGVbN4v_expm1f; _ZGVcN8v_expm1f; _ZGVdN8v_expm1f; _ZGVeN16v_expm1f;
     _ZGVbN4v_sinhf; _ZGVcN8v_sinhf; _ZGVdN8v_sinhf; _ZGVeN16v_sinhf;
+    _ZGVbN4vv_atan2f; _ZGVcN8vv_atan2f; _ZGVdN8vv_atan2f; _ZGVeN16vv_atan2f;
     _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf;
   }
 }
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index e039a993df..6f59c61756 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -166,6 +166,26 @@ float: 2
 float128: 2
 ldouble: 1
 
+Function: "atan2_vlen16":
+float: 2
+
+Function: "atan2_vlen2":
+double: 1
+
+Function: "atan2_vlen4":
+double: 1
+float: 2
+
+Function: "atan2_vlen4_avx2":
+double: 1
+
+Function: "atan2_vlen8":
+double: 1
+float: 2
+
+Function: "atan2_vlen8_avx2":
+float: 2
+
 Function: "atan_downward":
 double: 1
 float: 2
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core-sse2.S
new file mode 100644
index 0000000000..6c3ad05a6c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized atan2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN2vv_atan2 _ZGVbN2vv_atan2_sse2
+#include "../svml_d_atan22_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core.c
new file mode 100644
index 0000000000..43f1ee7f33
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized atan2, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN2vv_atan2
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN2vv_atan2, __GI__ZGVbN2vv_atan2,
+	       __redirect__ZGVbN2vv_atan2)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core_sse4.S
new file mode 100644
index 0000000000..5c0d0fd17f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core_sse4.S
@@ -0,0 +1,471 @@
+/* Function atan2 vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
+ *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
+ *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
+ *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
+ *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
+ *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
+ *
+ *
+ */
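+
+/* A rough scalar C model of how the kernel below applies this reduction
+   to atan2 (commentary only, not part of the build; atan2_model is an
+   illustrative name and the call to the scalar atan stands in for the
+   dA00..dA19 polynomial):
+
+       #include <math.h>
+
+       static double
+       atan2_model (double y, double x)
+       {
+         double ay = fabs (y), ax = fabs (x);
+         // 1) if |y| <  |x|: a = |y|,  b = |x|, pio2 = 0
+         // 2) if |y| >= |x|: a = -|x|, b = |y|, pio2 = pi/2
+         int swap = !(ay < ax);
+         double a = swap ? -ax : ay;
+         double b = swap ? ay : ax;
+         double r = atan (a / b) + (swap ? M_PI_2 : 0.0);
+         if (x < 0.0)
+           r = M_PI - r;           // quadrant correction for negative x
+         return copysign (r, y);   // the result takes the sign of y
+       }
+
+   The vector code performs the same selection with masks and blends,
+   evaluates the polynomial on all lanes at once, and fixes up zero,
+   NaN and otherwise out-of-range lanes on the special-value paths.  */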
+
+/* Offsets for data table __svml_datan2_data_internal
+ */
+#define dPI                           	0
+#define dPIO2                         	16
+#define dA19                          	32
+#define dA18                          	48
+#define dA17                          	64
+#define dA16                          	80
+#define dA15                          	96
+#define dA14                          	112
+#define dA13                          	128
+#define dA12                          	144
+#define dA11                          	160
+#define dA10                          	176
+#define dA09                          	192
+#define dA08                          	208
+#define dA07                          	224
+#define dA06                          	240
+#define dA05                          	256
+#define dA04                          	272
+#define dA03                          	288
+#define dA02                          	304
+#define dA01                          	320
+#define dA00                          	336
+#define dSIGN_MASK                    	352
+#define iCHK_WORK_SUB                 	368
+#define iCHK_WORK_CMP                 	384
+#define dABS_MASK                     	400
+#define dZERO                         	416
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN2vv_atan2_sse4)
+        subq      $88, %rsp
+        cfi_def_cfa_offset(96)
+        movaps    %xmm0, %xmm8
+
+/*
+ * #define NO_VECTOR_ZERO_ATAN2_ARGS
+ *  Declarations
+ * Variables
+ * Constants
+ *  The end of declarations
+ *  Implementation
+ * Get r0~=1/B
+ * Cannot be replaced by VQRCP(D, dR0, dB);
+ * Argument Absolute values
+ */
+        movups    dABS_MASK+__svml_datan2_data_internal(%rip), %xmm4
+        movaps    %xmm1, %xmm9
+        movaps    %xmm4, %xmm1
+        andps     %xmm8, %xmm4
+        andps     %xmm9, %xmm1
+        movaps    %xmm4, %xmm2
+        cmpnltpd  %xmm1, %xmm2
+
+/* Argument signs */
+        movups    dSIGN_MASK+__svml_datan2_data_internal(%rip), %xmm3
+        movaps    %xmm2, %xmm0
+        movups    dPIO2+__svml_datan2_data_internal(%rip), %xmm5
+        movaps    %xmm3, %xmm7
+        movaps    %xmm3, %xmm6
+
+/*
+ * 1) If y<x then a= y, b=x, PIO2=0
+ * 2) If y>x then a=-x, b=y, PIO2=Pi/2
+ */
+        orps      %xmm1, %xmm3
+        movaps    %xmm2, %xmm10
+        andps     %xmm2, %xmm5
+        andnps    %xmm4, %xmm0
+        andps     %xmm2, %xmm3
+        andnps    %xmm1, %xmm10
+        andps     %xmm4, %xmm2
+        orps      %xmm3, %xmm0
+        orps      %xmm2, %xmm10
+        divpd     %xmm10, %xmm0
+        movq      iCHK_WORK_SUB+__svml_datan2_data_internal(%rip), %xmm11
+
+/* if x<0, dPI = Pi, else dPI =0 */
+        movaps    %xmm9, %xmm3
+
+/* Check if y and x are on main path. */
+        pshufd    $221, %xmm1, %xmm12
+        andps     %xmm9, %xmm7
+        psubd     %xmm11, %xmm12
+        andps     %xmm8, %xmm6
+        movq      iCHK_WORK_CMP+__svml_datan2_data_internal(%rip), %xmm13
+        xorl      %edx, %edx
+        movups    %xmm4, 16(%rsp)
+        xorl      %eax, %eax
+        pshufd    $221, %xmm4, %xmm14
+        movdqa    %xmm12, %xmm4
+        pcmpgtd   %xmm13, %xmm4
+        pcmpeqd   %xmm13, %xmm12
+        por       %xmm12, %xmm4
+
+/* Polynomial. */
+        movaps    %xmm0, %xmm12
+        mulpd     %xmm0, %xmm12
+        cmplepd   dZERO+__svml_datan2_data_internal(%rip), %xmm3
+        psubd     %xmm11, %xmm14
+        movdqa    %xmm14, %xmm15
+        pcmpeqd   %xmm13, %xmm14
+        pcmpgtd   %xmm13, %xmm15
+        por       %xmm14, %xmm15
+        movaps    %xmm12, %xmm14
+        mulpd     %xmm12, %xmm14
+        por       %xmm15, %xmm4
+        movaps    %xmm14, %xmm15
+        mulpd     %xmm14, %xmm15
+        movmskps  %xmm4, %ecx
+        movups    %xmm10, (%rsp)
+        movups    dA19+__svml_datan2_data_internal(%rip), %xmm10
+        mulpd     %xmm15, %xmm10
+        movups    dA18+__svml_datan2_data_internal(%rip), %xmm13
+        movups    dA17+__svml_datan2_data_internal(%rip), %xmm11
+        addpd     dA15+__svml_datan2_data_internal(%rip), %xmm10
+        mulpd     %xmm15, %xmm13
+        mulpd     %xmm15, %xmm11
+        mulpd     %xmm15, %xmm10
+        addpd     dA14+__svml_datan2_data_internal(%rip), %xmm13
+        addpd     dA13+__svml_datan2_data_internal(%rip), %xmm11
+        addpd     dA11+__svml_datan2_data_internal(%rip), %xmm10
+        mulpd     %xmm15, %xmm13
+        mulpd     %xmm15, %xmm11
+        mulpd     %xmm15, %xmm10
+        addpd     dA10+__svml_datan2_data_internal(%rip), %xmm13
+        addpd     dA09+__svml_datan2_data_internal(%rip), %xmm11
+        addpd     dA07+__svml_datan2_data_internal(%rip), %xmm10
+        mulpd     %xmm15, %xmm13
+        mulpd     %xmm15, %xmm11
+        mulpd     %xmm15, %xmm10
+        addpd     dA06+__svml_datan2_data_internal(%rip), %xmm13
+        addpd     dA05+__svml_datan2_data_internal(%rip), %xmm11
+        addpd     dA03+__svml_datan2_data_internal(%rip), %xmm10
+        mulpd     %xmm15, %xmm13
+        mulpd     %xmm15, %xmm11
+        mulpd     %xmm12, %xmm10
+        addpd     dA02+__svml_datan2_data_internal(%rip), %xmm13
+        addpd     dA01+__svml_datan2_data_internal(%rip), %xmm11
+        addpd     %xmm10, %xmm13
+        mulpd     %xmm11, %xmm12
+        mulpd     %xmm13, %xmm14
+        movups    dA16+__svml_datan2_data_internal(%rip), %xmm2
+        mulpd     %xmm15, %xmm2
+        addpd     dA12+__svml_datan2_data_internal(%rip), %xmm2
+        mulpd     %xmm15, %xmm2
+        addpd     dA08+__svml_datan2_data_internal(%rip), %xmm2
+        mulpd     %xmm15, %xmm2
+        addpd     dA04+__svml_datan2_data_internal(%rip), %xmm2
+
+/* A00=1.0, account for it later  VQFMA(D, dP4, dP4, dR8, dA00); */
+        mulpd     %xmm2, %xmm15
+        addpd     %xmm12, %xmm15
+        addpd     %xmm14, %xmm15
+
+/*
+ * Reconstruction.
+ * dP=(R+R*dP) + dPIO2
+ */
+        mulpd     %xmm0, %xmm15
+        addpd     %xmm15, %xmm0
+        addpd     %xmm5, %xmm0
+        andps     __svml_datan2_data_internal(%rip), %xmm3
+        orps      %xmm7, %xmm0
+        addpd     %xmm3, %xmm0
+
+/*  Special branch for fast (vector) processing of zero arguments  */
+        movups    16(%rsp), %xmm11
+        orps      %xmm6, %xmm0
+        testb     $3, %cl
+
+/* Go to auxiliary branch */
+        jne       L(AUX_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 eax edx xmm0 xmm1 xmm3 xmm4 xmm5 xmm6 xmm7 xmm8 xmm9 xmm11
+
+/* Return from auxiliary branch
+ * for out of main path inputs
+ */
+
+L(AUX_BRANCH_RETURN):
+/*
+ *  Special branch for fast (vector) processing of zero arguments
+ *  The end of implementation
+ */
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 eax edx xmm0 xmm8 xmm9
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        addq      $88, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(96)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm8, 32(%rsp)
+        movups    %xmm9, 48(%rsp)
+        movups    %xmm0, 64(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 eax edx xmm0
+
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -80)
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -88)
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -96)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $2, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    64(%rsp), %xmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -80)
+        cfi_offset(13, -88)
+        cfi_offset(14, -96)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        movsd     48(%rsp,%r14,8), %xmm1
+        call      atan2@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 64(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+        cfi_restore(12)
+        cfi_restore(13)
+        cfi_restore(14)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Auxiliary branch
+ * for out of main path inputs
+ */
+
+L(AUX_BRANCH):
+/* Check if at least one of X or Y is zero: iAXAYZERO */
+        movups    dZERO+__svml_datan2_data_internal(%rip), %xmm2
+
+/* Check if both X & Y are not NaNs:  iXYnotNAN */
+        movaps    %xmm9, %xmm12
+        movaps    %xmm8, %xmm10
+        cmpordpd  %xmm9, %xmm12
+        cmpordpd  %xmm8, %xmm10
+        cmpeqpd   %xmm2, %xmm1
+        cmpeqpd   %xmm2, %xmm11
+        andps     %xmm10, %xmm12
+        orps      %xmm11, %xmm1
+        pshufd    $221, %xmm1, %xmm1
+        pshufd    $221, %xmm12, %xmm11
+
+/* Check if at least one of X or Y is zero and not NaN: iAXAYZEROnotNAN */
+        pand      %xmm11, %xmm1
+
+/* Exclude from previous callout mask zero (and not NaN) arguments */
+        movdqa    %xmm1, %xmm13
+        pandn     %xmm4, %xmm13
+
+/*
+ *  Path for zero arguments (at least one of both)
+ * Check if both args are zeros (den. is zero)
+ */
+        movups    (%rsp), %xmm4
+        cmpeqpd   %xmm2, %xmm4
+
+/* Go to callout */
+        movmskps  %xmm13, %edx
+
+/* Set sPIO2 to zero if den. is zero */
+        movaps    %xmm4, %xmm15
+        andps     %xmm2, %xmm4
+        andnps    %xmm5, %xmm15
+        andl      $3, %edx
+        orps      %xmm4, %xmm15
+        pshufd    $221, %xmm9, %xmm5
+        orps      %xmm7, %xmm15
+
+/* Res = sign(Y)*(X<0)?(PIO2+PI):PIO2 */
+        pshufd    $221, %xmm2, %xmm7
+        pcmpgtd   %xmm5, %xmm7
+        pshufd    $80, %xmm7, %xmm14
+        andps     %xmm3, %xmm14
+        addpd     %xmm14, %xmm15
+
+/* Merge results from main and spec path */
+        pshufd    $80, %xmm1, %xmm3
+        orps      %xmm6, %xmm15
+        movdqa    %xmm3, %xmm6
+        andps     %xmm3, %xmm15
+        andnps    %xmm0, %xmm6
+        movaps    %xmm6, %xmm0
+        orps      %xmm15, %xmm0
+
+/* Return to main vector processing path */
+        jmp       L(AUX_BRANCH_RETURN)
+                                # LOE rbx rbp r12 r13 r14 r15 eax edx xmm0 xmm8 xmm9
+END(_ZGVbN2vv_atan2_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_datan2_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(16)) VUINT32 dPI[2][2];
+        __declspec(align(16)) VUINT32 dPIO2[2][2];
+        __declspec(align(16)) VUINT32 dA19[2][2];
+        __declspec(align(16)) VUINT32 dA18[2][2];
+        __declspec(align(16)) VUINT32 dA17[2][2];
+        __declspec(align(16)) VUINT32 dA16[2][2];
+        __declspec(align(16)) VUINT32 dA15[2][2];
+        __declspec(align(16)) VUINT32 dA14[2][2];
+        __declspec(align(16)) VUINT32 dA13[2][2];
+        __declspec(align(16)) VUINT32 dA12[2][2];
+        __declspec(align(16)) VUINT32 dA11[2][2];
+        __declspec(align(16)) VUINT32 dA10[2][2];
+        __declspec(align(16)) VUINT32 dA09[2][2];
+        __declspec(align(16)) VUINT32 dA08[2][2];
+        __declspec(align(16)) VUINT32 dA07[2][2];
+        __declspec(align(16)) VUINT32 dA06[2][2];
+        __declspec(align(16)) VUINT32 dA05[2][2];
+        __declspec(align(16)) VUINT32 dA04[2][2];
+        __declspec(align(16)) VUINT32 dA03[2][2];
+        __declspec(align(16)) VUINT32 dA02[2][2];
+        __declspec(align(16)) VUINT32 dA01[2][2];
+        __declspec(align(16)) VUINT32 dA00[2][2];
+        __declspec(align(16)) VUINT32 dSIGN_MASK[2][2];
+        __declspec(align(16)) VUINT32 iCHK_WORK_SUB[4][1];
+        __declspec(align(16)) VUINT32 iCHK_WORK_CMP[4][1];
+        __declspec(align(16)) VUINT32 dABS_MASK[2][2];
+        __declspec(align(16)) VUINT32 dZERO[2][2];
+} __svml_datan2_data_internal;
+#endif
+__svml_datan2_data_internal:
+        .quad 0x400921FB54442D18, 0x400921FB54442D18 //dPI
+        .align 16
+        .quad 0x3FF921FB54442D18, 0x3FF921FB54442D18 //dPIO2
+        .align 16
+        .quad 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3 // dA19
+        .align 16
+        .quad 0x3F2CED0A36665209, 0x3F2CED0A36665209 // dA18
+        .align 16
+        .quad 0xBF52E67C93954C23, 0xBF52E67C93954C23 // dA17
+        .align 16
+        .quad 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3 // dA16
+        .align 16
+        .quad 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD // dA15
+        .align 16
+        .quad 0x3F914F4C661116A5, 0x3F914F4C661116A5 // dA14
+        .align 16
+        .quad 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C // dA13
+        .align 16
+        .quad 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F // dA12
+        .align 16
+        .quad 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC // dA11
+        .align 16
+        .quad 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B // dA10
+        .align 16
+        .quad 0xBFAAD261EAA09954, 0xBFAAD261EAA09954 // dA09
+        .align 16
+        .quad 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF // dA08
+        .align 16
+        .quad 0xBFB11084009435E0, 0xBFB11084009435E0 // dA07
+        .align 16
+        .quad 0x3FB3B12A49295651, 0x3FB3B12A49295651 // dA06
+        .align 16
+        .quad 0xBFB745D009BADA94, 0xBFB745D009BADA94 // dA05
+        .align 16
+        .quad 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5 // dA04
+        .align 16
+        .quad 0xBFC2492491EE55C7, 0xBFC2492491EE55C7 // dA03
+        .align 16
+        .quad 0x3FC999999997EE34, 0x3FC999999997EE34 // dA02
+        .align 16
+        .quad 0xBFD55555555553C5, 0xBFD55555555553C5 // dA01
+        .align 16
+        .quad 0x3FF0000000000000, 0x3FF0000000000000 // dA00
+        .align 16
+        .quad 0x8000000000000000, 0x8000000000000000 //dSIGN_MASK
+        .align 16
+        .long 0x80300000, 0x80300000, 0x80300000, 0x80300000 //iCHK_WORK_SUB
+        .align 16
+        .long 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000 //iCHK_WORK_CMP
+        .align 16
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff //dABS_MASK
+        .align 16
+        .quad 0x0000000000000000, 0x0000000000000000 //dZERO
+        .align 16
+        .type	__svml_datan2_data_internal,@object
+        .size	__svml_datan2_data_internal,.-__svml_datan2_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core-sse.S
new file mode 100644
index 0000000000..0db843a088
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized atan2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN4vv_atan2 _ZGVdN4vv_atan2_sse_wrapper
+#include "../svml_d_atan24_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core.c
new file mode 100644
index 0000000000..c2e2611584
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized atan2, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN4vv_atan2
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN4vv_atan2, __GI__ZGVdN4vv_atan2,
+	       __redirect__ZGVdN4vv_atan2)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core_avx2.S
new file mode 100644
index 0000000000..cdf780715b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core_avx2.S
@@ -0,0 +1,451 @@
+/* Function atan2 vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
+ *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
+ *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
+ *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
+ *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
+ *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
+ *
+ *
+ */
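+
+/* The polynomial below is evaluated as four interleaved Horner chains in
+   u = t^4 (with t = s*s) that are then recombined, roughly as in this
+   scalar C sketch (commentary only; a[k] stands for the dA<k> table
+   entries and a[0] == 1.0):
+
+       static double
+       atan_poly_model (double s, const double a[20])
+       {
+         double t = s * s;
+         double u = t * t * t * t;   // u = t^4
+         double c3 = (((a[19] * u + a[15]) * u + a[11]) * u + a[7]) * u + a[3];
+         double c2 = (((a[18] * u + a[14]) * u + a[10]) * u + a[6]) * u + a[2];
+         double c1 = (((a[17] * u + a[13]) * u + a[9]) * u + a[5]) * u + a[1];
+         double c0 = ((a[16] * u + a[12]) * u + a[8]) * u + a[4];
+         double p = (c3 * t + c2) * (t * t) + (c1 * t + c0 * u);
+         return s * p + s;           // atan(s) ~= s + s*p, a[0] folded in here
+       }
+
+   Splitting the evaluation this way keeps several independent FMA chains
+   in flight instead of one long serial Horner recurrence.  */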
+
+/* Offsets for data table __svml_datan2_data_internal
+ */
+#define dPI                           	0
+#define dPIO2                         	32
+#define dA19                          	64
+#define dA18                          	96
+#define dA17                          	128
+#define dA16                          	160
+#define dA15                          	192
+#define dA14                          	224
+#define dA13                          	256
+#define dA12                          	288
+#define dA11                          	320
+#define dA10                          	352
+#define dA09                          	384
+#define dA08                          	416
+#define dA07                          	448
+#define dA06                          	480
+#define dA05                          	512
+#define dA04                          	544
+#define dA03                          	576
+#define dA02                          	608
+#define dA01                          	640
+#define dA00                          	672
+#define dSIGN_MASK                    	704
+#define iCHK_WORK_SUB                 	736
+#define iCHK_WORK_CMP                 	768
+#define dABS_MASK                     	800
+#define dZERO                         	832
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN4vv_atan2_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $128, %rsp
+        xorl      %edx, %edx
+
+/*
+ * #define NO_VECTOR_ZERO_ATAN2_ARGS
+ *  Declarations
+ * Variables
+ * Constants
+ *  The end of declarations
+ *  Implementation
+ * Get r0~=1/B
+ * Cannot be replaced by VQRCP(D, dR0, dB);
+ * Argument Absolute values
+ */
+        vmovupd   dABS_MASK+__svml_datan2_data_internal(%rip), %ymm5
+
+/* Argument signs */
+        vmovupd   dSIGN_MASK+__svml_datan2_data_internal(%rip), %ymm4
+        vmovups   iCHK_WORK_SUB+__svml_datan2_data_internal(%rip), %xmm13
+        vmovupd   %ymm0, (%rsp)
+        vmovapd   %ymm1, %ymm8
+        vandpd    %ymm5, %ymm8, %ymm2
+        vandpd    %ymm5, %ymm0, %ymm1
+        vcmpnlt_uqpd %ymm2, %ymm1, %ymm15
+
+/*
+ * 1) If y<x then a= y, b=x, PIO2=0
+ * 2) If y>x then a=-x, b=y, PIO2=Pi/2
+ */
+        vorpd     %ymm4, %ymm2, %ymm6
+        vblendvpd %ymm15, %ymm6, %ymm1, %ymm3
+        vblendvpd %ymm15, %ymm1, %ymm2, %ymm6
+        vdivpd    %ymm6, %ymm3, %ymm14
+        vmovups   iCHK_WORK_CMP+__svml_datan2_data_internal(%rip), %xmm3
+        vmovupd   %ymm6, 32(%rsp)
+        vandpd    %ymm4, %ymm0, %ymm7
+        vandpd    %ymm4, %ymm8, %ymm5
+        vandpd    dPIO2+__svml_datan2_data_internal(%rip), %ymm15, %ymm4
+
+/* Check if y and x are on main path. */
+        vextractf128 $1, %ymm2, %xmm9
+        vextractf128 $1, %ymm1, %xmm10
+        vshufps   $221, %xmm9, %xmm2, %xmm11
+        vshufps   $221, %xmm10, %xmm1, %xmm12
+        vpsubd    %xmm13, %xmm11, %xmm0
+        vpsubd    %xmm13, %xmm12, %xmm9
+        vpcmpgtd  %xmm3, %xmm0, %xmm15
+        vpcmpeqd  %xmm3, %xmm0, %xmm6
+        vpcmpgtd  %xmm3, %xmm9, %xmm10
+        vpcmpeqd  %xmm3, %xmm9, %xmm3
+        vpor      %xmm6, %xmm15, %xmm11
+        vpor      %xmm3, %xmm10, %xmm12
+
+/* Polynomial. */
+        vmulpd    %ymm14, %ymm14, %ymm10
+        vpor      %xmm12, %xmm11, %xmm3
+        vmovupd   dA18+__svml_datan2_data_internal(%rip), %ymm9
+        vmovupd   dA17+__svml_datan2_data_internal(%rip), %ymm12
+        vmovupd   dA16+__svml_datan2_data_internal(%rip), %ymm15
+        vmulpd    %ymm10, %ymm10, %ymm11
+
+/* if x<0, dPI = Pi, else dPI =0 */
+        vcmple_oqpd dZERO+__svml_datan2_data_internal(%rip), %ymm8, %ymm13
+        vmovmskps %xmm3, %eax
+        vmulpd    %ymm11, %ymm11, %ymm0
+        vandpd    __svml_datan2_data_internal(%rip), %ymm13, %ymm6
+        vmovupd   dA19+__svml_datan2_data_internal(%rip), %ymm13
+        vfmadd213pd dA14+__svml_datan2_data_internal(%rip), %ymm0, %ymm9
+        vfmadd213pd dA13+__svml_datan2_data_internal(%rip), %ymm0, %ymm12
+        vfmadd213pd dA12+__svml_datan2_data_internal(%rip), %ymm0, %ymm15
+        vfmadd213pd dA15+__svml_datan2_data_internal(%rip), %ymm0, %ymm13
+        vfmadd213pd dA10+__svml_datan2_data_internal(%rip), %ymm0, %ymm9
+        vfmadd213pd dA09+__svml_datan2_data_internal(%rip), %ymm0, %ymm12
+        vfmadd213pd dA08+__svml_datan2_data_internal(%rip), %ymm0, %ymm15
+        vfmadd213pd dA11+__svml_datan2_data_internal(%rip), %ymm0, %ymm13
+        vfmadd213pd dA06+__svml_datan2_data_internal(%rip), %ymm0, %ymm9
+        vfmadd213pd dA05+__svml_datan2_data_internal(%rip), %ymm0, %ymm12
+        vfmadd213pd dA04+__svml_datan2_data_internal(%rip), %ymm0, %ymm15
+        vfmadd213pd dA07+__svml_datan2_data_internal(%rip), %ymm0, %ymm13
+        vfmadd213pd dA02+__svml_datan2_data_internal(%rip), %ymm0, %ymm9
+        vfmadd213pd dA01+__svml_datan2_data_internal(%rip), %ymm0, %ymm12
+        vfmadd213pd dA03+__svml_datan2_data_internal(%rip), %ymm0, %ymm13
+
+/* A00=1.0, account for it later  VQFMA(D, dP4, dP4, dR8, dA00); */
+        vmulpd    %ymm15, %ymm0, %ymm0
+        vfmadd213pd %ymm9, %ymm10, %ymm13
+        vfmadd213pd %ymm0, %ymm10, %ymm12
+        vfmadd213pd %ymm12, %ymm11, %ymm13
+
+/*
+ * Reconstruction.
+ * dP=(R+R*dP) + dPIO2
+ */
+        vfmadd213pd %ymm14, %ymm14, %ymm13
+        vaddpd    %ymm13, %ymm4, %ymm14
+        vorpd     %ymm5, %ymm14, %ymm0
+        vaddpd    %ymm0, %ymm6, %ymm9
+        vorpd     %ymm7, %ymm9, %ymm0
+
+/*  Special branch for fast (vector) processing of zero arguments  */
+        testl     %eax, %eax
+
+/* Go to auxiliary branch */
+        jne       L(AUX_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx xmm3 ymm0 ymm1 ymm2 ymm4 ymm5 ymm6 ymm7 ymm8
+
+/* Return from auxiliary branch
+ * for out of main path inputs
+ */
+
+L(AUX_BRANCH_RETURN):
+/*
+ *  Special branch for fast (vector) processing of zero arguments
+ *  The end of implementation
+ */
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm8
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovupd   (%rsp), %ymm1
+        vmovupd   %ymm8, 64(%rsp)
+        vmovupd   %ymm0, 96(%rsp)
+        vmovupd   %ymm1, 32(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -112; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovupd   96(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -112; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        movsd     64(%rsp,%r14,8), %xmm1
+        call      atan2@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 96(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+        cfi_restore(12)
+        cfi_restore(13)
+        cfi_restore(14)
+                                # LOE rbx r15 r12d r13d
+
+/* Auxiliary branch
+ * for out of main path inputs
+ */
+
+L(AUX_BRANCH):
+        vmovupd   (%rsp), %ymm11
+
+/* Check if at least one of X or Y is zero: iAXAYZERO */
+        vmovupd   dZERO+__svml_datan2_data_internal(%rip), %ymm10
+
+/* Check if both X & Y are not NaNs:  iXYnotNAN */
+        vcmpordpd %ymm8, %ymm8, %ymm12
+        vcmpordpd %ymm11, %ymm11, %ymm13
+        vcmpeqpd  %ymm10, %ymm2, %ymm2
+        vcmpeqpd  %ymm10, %ymm1, %ymm1
+        vandpd    %ymm13, %ymm12, %ymm14
+        vorpd     %ymm1, %ymm2, %ymm2
+        vextractf128 $1, %ymm14, %xmm15
+        vextractf128 $1, %ymm2, %xmm11
+        vshufps   $221, %xmm15, %xmm14, %xmm9
+        vshufps   $221, %xmm11, %xmm2, %xmm12
+
+/*
+ *  Path for zero arguments (at least one of both)
+ * Check if both args are zeros (den. is zero)
+ */
+        vcmpeqpd  32(%rsp), %ymm10, %ymm2
+
+/* Check if at least one of X or Y is zero and not NaN: iAXAYZEROnotNAN */
+        vpand     %xmm9, %xmm12, %xmm1
+
+/* Exclude from previous callout mask zero (and not NaN) arguments */
+        vpandn    %xmm3, %xmm1, %xmm3
+
+/* Go to callout */
+        vmovmskps %xmm3, %edx
+
+/* Set sPIO2 to zero if den. is zero */
+        vblendvpd %ymm2, %ymm10, %ymm4, %ymm4
+        vorpd     %ymm5, %ymm4, %ymm5
+
+/* Res = sign(Y)*(X<0)?(PIO2+PI):PIO2 */
+        vextractf128 $1, %ymm10, %xmm2
+        vextractf128 $1, %ymm8, %xmm3
+        vshufps   $221, %xmm2, %xmm10, %xmm4
+        vshufps   $221, %xmm3, %xmm8, %xmm9
+        vpcmpgtd  %xmm9, %xmm4, %xmm12
+        vpshufd   $80, %xmm12, %xmm11
+        vpshufd   $250, %xmm12, %xmm13
+        vinsertf128 $1, %xmm13, %ymm11, %ymm14
+        vandpd    %ymm6, %ymm14, %ymm6
+        vaddpd    %ymm6, %ymm5, %ymm2
+        vorpd     %ymm7, %ymm2, %ymm2
+
+/* Merge results from main and spec path */
+        vpshufd   $80, %xmm1, %xmm7
+        vpshufd   $250, %xmm1, %xmm1
+        vinsertf128 $1, %xmm1, %ymm7, %ymm3
+        vblendvpd %ymm3, %ymm2, %ymm0, %ymm0
+
+/* Return to main vector processing path */
+        jmp       L(AUX_BRANCH_RETURN)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm8
+END(_ZGVdN4vv_atan2_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_datan2_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(32)) VUINT32 dPI[4][2];
+        __declspec(align(32)) VUINT32 dPIO2[4][2];
+        __declspec(align(32)) VUINT32 dA19[4][2];
+        __declspec(align(32)) VUINT32 dA18[4][2];
+        __declspec(align(32)) VUINT32 dA17[4][2];
+        __declspec(align(32)) VUINT32 dA16[4][2];
+        __declspec(align(32)) VUINT32 dA15[4][2];
+        __declspec(align(32)) VUINT32 dA14[4][2];
+        __declspec(align(32)) VUINT32 dA13[4][2];
+        __declspec(align(32)) VUINT32 dA12[4][2];
+        __declspec(align(32)) VUINT32 dA11[4][2];
+        __declspec(align(32)) VUINT32 dA10[4][2];
+        __declspec(align(32)) VUINT32 dA09[4][2];
+        __declspec(align(32)) VUINT32 dA08[4][2];
+        __declspec(align(32)) VUINT32 dA07[4][2];
+        __declspec(align(32)) VUINT32 dA06[4][2];
+        __declspec(align(32)) VUINT32 dA05[4][2];
+        __declspec(align(32)) VUINT32 dA04[4][2];
+        __declspec(align(32)) VUINT32 dA03[4][2];
+        __declspec(align(32)) VUINT32 dA02[4][2];
+        __declspec(align(32)) VUINT32 dA01[4][2];
+        __declspec(align(32)) VUINT32 dA00[4][2];
+        __declspec(align(32)) VUINT32 dSIGN_MASK[4][2];
+        __declspec(align(32)) VUINT32 iCHK_WORK_SUB[8][1];
+        __declspec(align(32)) VUINT32 iCHK_WORK_CMP[8][1];
+        __declspec(align(32)) VUINT32 dABS_MASK[4][2];
+        __declspec(align(32)) VUINT32 dZERO[4][2];
+} __svml_datan2_data_internal;
+#endif
+__svml_datan2_data_internal:
+        .quad 0x400921FB54442D18, 0x400921FB54442D18, 0x400921FB54442D18, 0x400921FB54442D18 //dPI
+        .align 32
+        .quad 0x3FF921FB54442D18, 0x3FF921FB54442D18, 0x3FF921FB54442D18, 0x3FF921FB54442D18 //dPIO2
+        .align 32
+        .quad 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3 // dA19
+        .align 32
+        .quad 0x3F2CED0A36665209, 0x3F2CED0A36665209, 0x3F2CED0A36665209, 0x3F2CED0A36665209 // dA18
+        .align 32
+        .quad 0xBF52E67C93954C23, 0xBF52E67C93954C23, 0xBF52E67C93954C23, 0xBF52E67C93954C23 // dA17
+        .align 32
+        .quad 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3 // dA16
+        .align 32
+        .quad 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD // dA15
+        .align 32
+        .quad 0x3F914F4C661116A5, 0x3F914F4C661116A5, 0x3F914F4C661116A5, 0x3F914F4C661116A5 // dA14
+        .align 32
+        .quad 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C // dA13
+        .align 32
+        .quad 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F // dA12
+        .align 32
+        .quad 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC // dA11
+        .align 32
+        .quad 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B // dA10
+        .align 32
+        .quad 0xBFAAD261EAA09954, 0xBFAAD261EAA09954, 0xBFAAD261EAA09954, 0xBFAAD261EAA09954 // dA09
+        .align 32
+        .quad 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF // dA08
+        .align 32
+        .quad 0xBFB11084009435E0, 0xBFB11084009435E0, 0xBFB11084009435E0, 0xBFB11084009435E0 // dA07
+        .align 32
+        .quad 0x3FB3B12A49295651, 0x3FB3B12A49295651, 0x3FB3B12A49295651, 0x3FB3B12A49295651 // dA06
+        .align 32
+        .quad 0xBFB745D009BADA94, 0xBFB745D009BADA94, 0xBFB745D009BADA94, 0xBFB745D009BADA94 // dA05
+        .align 32
+        .quad 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5 // dA04
+        .align 32
+        .quad 0xBFC2492491EE55C7, 0xBFC2492491EE55C7, 0xBFC2492491EE55C7, 0xBFC2492491EE55C7 // dA03
+        .align 32
+        .quad 0x3FC999999997EE34, 0x3FC999999997EE34, 0x3FC999999997EE34, 0x3FC999999997EE34 // dA02
+        .align 32
+        .quad 0xBFD55555555553C5, 0xBFD55555555553C5, 0xBFD55555555553C5, 0xBFD55555555553C5 // dA01
+        .align 32
+        .quad 0x3FF0000000000000, 0x3FF0000000000000, 0x3FF0000000000000, 0x3FF0000000000000 // dA00
+        .align 32
+        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000 //dSIGN_MASK
+        .align 32
+        .long 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000 //iCHK_WORK_SUB
+        .align 32
+        .long 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000 //iCHK_WORK_CMP
+        .align 32
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff //dABS_MASK
+        .align 32
+        .quad 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000 //dZERO
+        .align 32
+        .type	__svml_datan2_data_internal,@object
+        .size	__svml_datan2_data_internal,.-__svml_datan2_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core-avx2.S
new file mode 100644
index 0000000000..a8d34a6143
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized atan2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN8vv_atan2 _ZGVeN8vv_atan2_avx2_wrapper
+#include "../svml_d_atan28_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core.c
new file mode 100644
index 0000000000..a0897e9cf0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized atan2, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN8vv_atan2
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN8vv_atan2, __GI__ZGVeN8vv_atan2,
+	       __redirect__ZGVeN8vv_atan2)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core_avx512.S
new file mode 100644
index 0000000000..6d18f5f757
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core_avx512.S
@@ -0,0 +1,475 @@
+/* Function atan2 vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
+ *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
+ *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
+ *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
+ *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
+ *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
+ *
+ *
+ */
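+
+/* Lanes flagged as out of the main path and not already fixed up by the
+   vector zero-argument branch are reprocessed one at a time through the
+   scalar atan2, driven by a lane bitmask.  In rough C terms (commentary
+   only; the array and mask names are illustrative):
+
+       #include <math.h>
+
+       static void
+       atan2_callout_model (const double y[8], const double x[8],
+                            double r[8], unsigned int mask)
+       {
+         for (int k = 0; k < 8; k++)
+           if ((mask >> k) & 1)      // bit k set: lane k needs the callout
+             r[k] = atan2 (y[k], x[k]);
+       }
+
+   The assembly below spills the input and result vectors to the stack,
+   walks the mask with bt/jc, and calls atan2@PLT for each flagged lane.  */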
+
+/* Offsets for data table __svml_datan2_data_internal
+ */
+#define dPI                           	0
+#define dPIO2                         	64
+#define dA19                          	128
+#define dA18                          	192
+#define dA17                          	256
+#define dA16                          	320
+#define dA15                          	384
+#define dA14                          	448
+#define dA13                          	512
+#define dA12                          	576
+#define dA11                          	640
+#define dA10                          	704
+#define dA09                          	768
+#define dA08                          	832
+#define dA07                          	896
+#define dA06                          	960
+#define dA05                          	1024
+#define dA04                          	1088
+#define dA03                          	1152
+#define dA02                          	1216
+#define dA01                          	1280
+#define dA00                          	1344
+#define dSIGN_MASK                    	1408
+#define iCHK_WORK_SUB                 	1472
+#define iCHK_WORK_CMP                 	1536
+#define dABS_MASK                     	1600
+#define dZERO                         	1664
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN8vv_atan2_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $256, %rsp
+        xorl      %edx, %edx
+
+/*
+ * #define NO_VECTOR_ZERO_ATAN2_ARGS
+ *  Declarations
+ * Variables
+ * Constants
+ *  The end of declarations
+ *  Implementation
+ * Get r0~=1/B
+ * Cannot be replaced by VQRCP(D, dR0, dB);
+ * Argument Absolute values
+ */
+        vmovups   dABS_MASK+__svml_datan2_data_internal(%rip), %zmm4
+
+/* Argument signs */
+        vmovups   dSIGN_MASK+__svml_datan2_data_internal(%rip), %zmm6
+
+/*
+ * 1) If y<x then a= y, b=x, PIO2=0
+ * 2) If y>x then a=-x, b=y, PIO2=Pi/2
+ */
+        vmovups   dPIO2+__svml_datan2_data_internal(%rip), %zmm3
+        vandpd    %zmm4, %zmm0, %zmm11
+        vmovaps   %zmm1, %zmm7
+        vandpd    %zmm4, %zmm7, %zmm2
+        vandpd    %zmm6, %zmm7, %zmm5
+        vandpd    %zmm6, %zmm0, %zmm4
+        vorpd     %zmm6, %zmm2, %zmm12
+        vcmppd    $17, {sae}, %zmm2, %zmm11, %k1
+        vmovdqu   iCHK_WORK_CMP+__svml_datan2_data_internal(%rip), %ymm6
+        vmovups   %zmm11, 64(%rsp)
+
+/* Check if y and x are on main path. */
+        vpsrlq    $32, %zmm2, %zmm9
+        vblendmpd %zmm11, %zmm12, %zmm13{%k1}
+        vblendmpd %zmm2, %zmm11, %zmm15{%k1}
+        vpsrlq    $32, %zmm11, %zmm8
+        vmovdqu   iCHK_WORK_SUB+__svml_datan2_data_internal(%rip), %ymm12
+        vdivpd    {rn-sae}, %zmm15, %zmm13, %zmm1
+        vmovups   %zmm15, (%rsp)
+        vpmovqd   %zmm9, %ymm14
+        vpmovqd   %zmm8, %ymm10
+        vxorpd    %zmm3, %zmm3, %zmm3{%k1}
+        vpsubd    %ymm12, %ymm14, %ymm13
+        vpsubd    %ymm12, %ymm10, %ymm9
+
+/* Polynomial. */
+        vmulpd    {rn-sae}, %zmm1, %zmm1, %zmm12
+        vpcmpgtd  %ymm6, %ymm13, %ymm15
+        vpcmpeqd  %ymm6, %ymm13, %ymm11
+        vmulpd    {rn-sae}, %zmm12, %zmm12, %zmm13
+        vpor      %ymm11, %ymm15, %ymm8
+        vmovups   dA19+__svml_datan2_data_internal(%rip), %zmm11
+        vmovups   dA15+__svml_datan2_data_internal(%rip), %zmm15
+        vpcmpgtd  %ymm6, %ymm9, %ymm14
+        vpcmpeqd  %ymm6, %ymm9, %ymm6
+        vpor      %ymm6, %ymm14, %ymm10
+        vmulpd    {rn-sae}, %zmm13, %zmm13, %zmm14
+        vmovups   dA18+__svml_datan2_data_internal(%rip), %zmm9
+        vpor      %ymm10, %ymm8, %ymm6
+        vmovups   dA17+__svml_datan2_data_internal(%rip), %zmm10
+        vfmadd231pd {rn-sae}, %zmm14, %zmm11, %zmm15
+        vmovups   dA14+__svml_datan2_data_internal(%rip), %zmm11
+        vmovups   dA12+__svml_datan2_data_internal(%rip), %zmm8
+        vfmadd231pd {rn-sae}, %zmm14, %zmm9, %zmm11
+        vmovups   dA13+__svml_datan2_data_internal(%rip), %zmm9
+        vfmadd231pd {rn-sae}, %zmm14, %zmm10, %zmm9
+        vmovups   dA16+__svml_datan2_data_internal(%rip), %zmm10
+        vfmadd231pd {rn-sae}, %zmm14, %zmm10, %zmm8
+        vmovups   dA11+__svml_datan2_data_internal(%rip), %zmm10
+        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm15
+        vmovups   dA10+__svml_datan2_data_internal(%rip), %zmm10
+        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm11
+        vmovups   dA09+__svml_datan2_data_internal(%rip), %zmm10
+        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm9
+        vmovups   dA08+__svml_datan2_data_internal(%rip), %zmm10
+        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm8
+        vmovups   dA07+__svml_datan2_data_internal(%rip), %zmm10
+        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm15
+        vmovups   dA06+__svml_datan2_data_internal(%rip), %zmm10
+        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm11
+        vmovups   dA05+__svml_datan2_data_internal(%rip), %zmm10
+        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm9
+        vmovups   dA04+__svml_datan2_data_internal(%rip), %zmm10
+        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm8
+        vmovups   dA03+__svml_datan2_data_internal(%rip), %zmm10
+
+/* A00=1.0, account for it later  VQFMA(D, dP4, dP4, dR8, dA00); */
+        vmulpd    {rn-sae}, %zmm14, %zmm8, %zmm8
+        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm15
+        vmovups   dA02+__svml_datan2_data_internal(%rip), %zmm10
+        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm11
+        vmovups   dA01+__svml_datan2_data_internal(%rip), %zmm10
+        vfmadd213pd {rn-sae}, %zmm11, %zmm12, %zmm15
+        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm9
+        vfmadd213pd {rn-sae}, %zmm8, %zmm12, %zmm9
+        vmovups   __svml_datan2_data_internal(%rip), %zmm8
+        vfmadd213pd {rn-sae}, %zmm9, %zmm13, %zmm15
+
+/*
+ * Reconstruction.
+ * dP=(R+R*dP) + dPIO2
+ */
+        vfmadd213pd {rn-sae}, %zmm1, %zmm1, %zmm15
+        vaddpd    {rn-sae}, %zmm3, %zmm15, %zmm1
+        vorpd     %zmm5, %zmm1, %zmm9
+
+/* if x<0, dPI = Pi, else dPI =0 */
+        vmovups   dZERO+__svml_datan2_data_internal(%rip), %zmm1
+        vcmppd    $18, {sae}, %zmm1, %zmm7, %k2
+        vaddpd    {rn-sae}, %zmm8, %zmm9, %zmm9{%k2}
+        vmovmskps %ymm6, %eax
+        vorpd     %zmm4, %zmm9, %zmm11
+
+/*  Special branch for fast (vector) processing of zero arguments  */
+        vmovups   64(%rsp), %zmm9
+        testl     %eax, %eax
+
+/* Go to auxiliary branch */
+        jne       L(AUX_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx ymm6 zmm0 zmm2 zmm3 zmm4 zmm5 zmm7 zmm9 zmm11
+
+/* Return from auxiliary branch
+ * for out of main path inputs
+ */
+
+L(AUX_BRANCH_RETURN):
+/*
+ *  Special branch for fast (vector) processing of zero arguments
+ *  The end of implementation
+ */
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm7 zmm11
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        vmovaps   %zmm11, %zmm0
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm0, 64(%rsp)
+        vmovups   %zmm7, 128(%rsp)
+        vmovups   %zmm11, 192(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm11
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -240; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -248; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -256; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   192(%rsp), %zmm11
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -240; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -248; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -256; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm11
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     64(%rsp,%r14,8), %xmm0
+        movsd     128(%rsp,%r14,8), %xmm1
+        call      atan2@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 192(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+        cfi_restore(12)
+        cfi_restore(13)
+        cfi_restore(14)
+                                # LOE rbx r15 r12d r13d
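+
+/* The special-values path above amounts to the following, in illustrative
+   C (the spilled vectors live at 64(%rsp), 128(%rsp) and 192(%rsp), and
+   edx carries one mask bit per lane; names are descriptive only):
+
+     for (int i = 0; i < 8; i++)
+       if (mask & (1 << i))
+         result[i] = atan2 (y[i], x[i]);   // scalar call through the PLT
+ */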
+
+/* Auxiliary branch
+ * for out of main path inputs
+ */
+
+L(AUX_BRANCH):
+/* Check if at least one of X or Y is zero: iAXAYZERO */
+        vmovups   dZERO+__svml_datan2_data_internal(%rip), %zmm8
+
+/* Check if both X & Y are not NaNs:  iXYnotNAN */
+        vcmppd    $3, {sae}, %zmm7, %zmm7, %k1
+        vcmppd    $3, {sae}, %zmm0, %zmm0, %k2
+        vcmppd    $4, {sae}, %zmm8, %zmm2, %k3
+        vcmppd    $4, {sae}, %zmm8, %zmm9, %k4
+
+/* Res = sign(Y)*(X<0)?(PIO2+PI):PIO2 */
+        vpcmpgtq  %zmm7, %zmm8, %k6
+        vpternlogd $0xff, %zmm1, %zmm1, %zmm10
+        vmovaps   %zmm10, %zmm15
+        vmovaps   %zmm10, %zmm12
+        vmovaps   %zmm10, %zmm13
+        vpandnq   %zmm2, %zmm2, %zmm15{%k3}
+        vmovaps   %zmm10, %zmm2
+        vpandnq   %zmm7, %zmm7, %zmm12{%k1}
+        vpandnq   %zmm0, %zmm0, %zmm13{%k2}
+        vpandnq   %zmm9, %zmm9, %zmm2{%k4}
+        vandpd    %zmm13, %zmm12, %zmm14
+        vorpd     %zmm2, %zmm15, %zmm9
+        vpsrlq    $32, %zmm14, %zmm1
+        vpsrlq    $32, %zmm9, %zmm2
+        vpmovqd   %zmm1, %ymm1
+        vpmovqd   %zmm2, %ymm9
+
+/* Check if at least one of X or Y is zero and not NaN: iAXAYZEROnotNAN */
+        vpand     %ymm1, %ymm9, %ymm2
+
+/*
+ *  Path for zero arguments (at least one of both)
+ * Check if both args are zeros (den. is zero)
+ */
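+
+/* For reference, the results this branch has to reproduce for zero
+   arguments (C99 Annex F):
+     atan2 (+-0, x > 0 or +0) = +-0      atan2 (+-0, x < 0 or -0) = +-pi
+     atan2 (y > 0, +-0) = +pi/2          atan2 (y < 0, +-0) = -pi/2
+   NaN arguments are not handled here; they stay in the scalar callout
+   mask.  */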
+        vmovups   (%rsp), %zmm1
+
+/* Exclude from previous callout mask zero (and not NaN) arguments */
+        vpandn    %ymm6, %ymm2, %ymm6
+        vcmppd    $4, {sae}, %zmm8, %zmm1, %k5
+
+/* Go to callout */
+        vmovmskps %ymm6, %edx
+        vpandnq   %zmm1, %zmm1, %zmm10{%k5}
+
+/* Set sPIO2 to zero if den. is zero */
+        vpandnq   %zmm3, %zmm10, %zmm3
+        vpandq    %zmm10, %zmm8, %zmm1
+        vporq     %zmm1, %zmm3, %zmm3
+        vorpd     %zmm5, %zmm3, %zmm1
+        vmovups   __svml_datan2_data_internal(%rip), %zmm5
+        vaddpd    {rn-sae}, %zmm5, %zmm1, %zmm1{%k6}
+        vorpd     %zmm4, %zmm1, %zmm1
+
+/* Merge results from main and spec path */
+        vpmovzxdq %ymm2, %zmm4
+        vpsllq    $32, %zmm4, %zmm2
+        vpord     %zmm2, %zmm4, %zmm3
+        vpandnq   %zmm11, %zmm3, %zmm11
+        vpandq    %zmm3, %zmm1, %zmm1
+        vporq     %zmm1, %zmm11, %zmm11
+
+/* Return to main vector processing path */
+        jmp       L(AUX_BRANCH_RETURN)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm7 zmm11
+END(_ZGVeN8vv_atan2_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_datan2_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 dPI[8][2];
+        __declspec(align(64)) VUINT32 dPIO2[8][2];
+        __declspec(align(64)) VUINT32 dA19[8][2];
+        __declspec(align(64)) VUINT32 dA18[8][2];
+        __declspec(align(64)) VUINT32 dA17[8][2];
+        __declspec(align(64)) VUINT32 dA16[8][2];
+        __declspec(align(64)) VUINT32 dA15[8][2];
+        __declspec(align(64)) VUINT32 dA14[8][2];
+        __declspec(align(64)) VUINT32 dA13[8][2];
+        __declspec(align(64)) VUINT32 dA12[8][2];
+        __declspec(align(64)) VUINT32 dA11[8][2];
+        __declspec(align(64)) VUINT32 dA10[8][2];
+        __declspec(align(64)) VUINT32 dA09[8][2];
+        __declspec(align(64)) VUINT32 dA08[8][2];
+        __declspec(align(64)) VUINT32 dA07[8][2];
+        __declspec(align(64)) VUINT32 dA06[8][2];
+        __declspec(align(64)) VUINT32 dA05[8][2];
+        __declspec(align(64)) VUINT32 dA04[8][2];
+        __declspec(align(64)) VUINT32 dA03[8][2];
+        __declspec(align(64)) VUINT32 dA02[8][2];
+        __declspec(align(64)) VUINT32 dA01[8][2];
+        __declspec(align(64)) VUINT32 dA00[8][2];
+        __declspec(align(64)) VUINT32 dSIGN_MASK[8][2];
+        __declspec(align(64)) VUINT32 iCHK_WORK_SUB[16][1];
+        __declspec(align(64)) VUINT32 iCHK_WORK_CMP[16][1];
+        __declspec(align(64)) VUINT32 dABS_MASK[8][2];
+        __declspec(align(64)) VUINT32 dZERO[8][2];
+} __svml_datan2_data_internal;
+#endif
+__svml_datan2_data_internal:
+        .quad 0x400921FB54442D18, 0x400921FB54442D18, 0x400921FB54442D18, 0x400921FB54442D18, 0x400921FB54442D18, 0x400921FB54442D18, 0x400921FB54442D18, 0x400921FB54442D18 //dPI
+        .align 64
+        .quad 0x3FF921FB54442D18, 0x3FF921FB54442D18, 0x3FF921FB54442D18, 0x3FF921FB54442D18, 0x3FF921FB54442D18, 0x3FF921FB54442D18, 0x3FF921FB54442D18, 0x3FF921FB54442D18 //dPIO2
+        .align 64
+        .quad 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3 // dA19
+        .align 64
+        .quad 0x3F2CED0A36665209, 0x3F2CED0A36665209, 0x3F2CED0A36665209, 0x3F2CED0A36665209, 0x3F2CED0A36665209, 0x3F2CED0A36665209, 0x3F2CED0A36665209, 0x3F2CED0A36665209 // dA18
+        .align 64
+        .quad 0xBF52E67C93954C23, 0xBF52E67C93954C23, 0xBF52E67C93954C23, 0xBF52E67C93954C23, 0xBF52E67C93954C23, 0xBF52E67C93954C23, 0xBF52E67C93954C23, 0xBF52E67C93954C23 // dA17
+        .align 64
+        .quad 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3 // dA16
+        .align 64
+        .quad 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD // dA15
+        .align 64
+        .quad 0x3F914F4C661116A5, 0x3F914F4C661116A5, 0x3F914F4C661116A5, 0x3F914F4C661116A5, 0x3F914F4C661116A5, 0x3F914F4C661116A5, 0x3F914F4C661116A5, 0x3F914F4C661116A5 // dA14
+        .align 64
+        .quad 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C // dA13
+        .align 64
+        .quad 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F // dA12
+        .align 64
+        .quad 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC // dA11
+        .align 64
+        .quad 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B // dA10
+        .align 64
+        .quad 0xBFAAD261EAA09954, 0xBFAAD261EAA09954, 0xBFAAD261EAA09954, 0xBFAAD261EAA09954, 0xBFAAD261EAA09954, 0xBFAAD261EAA09954, 0xBFAAD261EAA09954, 0xBFAAD261EAA09954 // dA09
+        .align 64
+        .quad 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF // dA08
+        .align 64
+        .quad 0xBFB11084009435E0, 0xBFB11084009435E0, 0xBFB11084009435E0, 0xBFB11084009435E0, 0xBFB11084009435E0, 0xBFB11084009435E0, 0xBFB11084009435E0, 0xBFB11084009435E0 // dA07
+        .align 64
+        .quad 0x3FB3B12A49295651, 0x3FB3B12A49295651, 0x3FB3B12A49295651, 0x3FB3B12A49295651, 0x3FB3B12A49295651, 0x3FB3B12A49295651, 0x3FB3B12A49295651, 0x3FB3B12A49295651 // dA06
+        .align 64
+        .quad 0xBFB745D009BADA94, 0xBFB745D009BADA94, 0xBFB745D009BADA94, 0xBFB745D009BADA94, 0xBFB745D009BADA94, 0xBFB745D009BADA94, 0xBFB745D009BADA94, 0xBFB745D009BADA94 // dA05
+        .align 64
+        .quad 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5 // dA04
+        .align 64
+        .quad 0xBFC2492491EE55C7, 0xBFC2492491EE55C7, 0xBFC2492491EE55C7, 0xBFC2492491EE55C7, 0xBFC2492491EE55C7, 0xBFC2492491EE55C7, 0xBFC2492491EE55C7, 0xBFC2492491EE55C7 // dA03
+        .align 64
+        .quad 0x3FC999999997EE34, 0x3FC999999997EE34, 0x3FC999999997EE34, 0x3FC999999997EE34, 0x3FC999999997EE34, 0x3FC999999997EE34, 0x3FC999999997EE34, 0x3FC999999997EE34 // dA02
+        .align 64
+        .quad 0xBFD55555555553C5, 0xBFD55555555553C5, 0xBFD55555555553C5, 0xBFD55555555553C5, 0xBFD55555555553C5, 0xBFD55555555553C5, 0xBFD55555555553C5, 0xBFD55555555553C5 // dA01
+        .align 64
+        .quad 0x3FF0000000000000, 0x3FF0000000000000, 0x3FF0000000000000, 0x3FF0000000000000, 0x3FF0000000000000, 0x3FF0000000000000, 0x3FF0000000000000, 0x3FF0000000000000 // dA00
+        .align 64
+        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000 //dSIGN_MASK
+        .align 64
+        .long 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000 //iCHK_WORK_SUB
+        .align 64
+        .long 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000 //iCHK_WORK_CMP
+        .align 64
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff //dABS_MASK
+        .align 64
+        .quad 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000 //dZERO
+        .align 64
+        .type	__svml_datan2_data_internal,@object
+        .size	__svml_datan2_data_internal,.-__svml_datan2_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core-avx2.S
new file mode 100644
index 0000000000..a2a76e8bfd
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized atan2f.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN16vv_atan2f _ZGVeN16vv_atan2f_avx2_wrapper
+#include "../svml_s_atan2f16_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core.c
new file mode 100644
index 0000000000..6fa806414d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized atan2f, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN16vv_atan2f
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN16vv_atan2f, __GI__ZGVeN16vv_atan2f,
+	       __redirect__ZGVeN16vv_atan2f)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core_avx512.S
new file mode 100644
index 0000000000..f3477cc8e6
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core_avx512.S
@@ -0,0 +1,399 @@
+/* Function atan2f vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
+ *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
+ *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
+ *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
+ *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
+ *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
+ *
+ *
+ */
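+
+/* For reference, a scalar C sketch of the main path implemented below
+   (zero, NaN and other special inputs take the auxiliary and callout
+   paths instead).  sPC0..sPC8 stand for the polynomial coefficients in
+   the data table at the end of this file; all other names are
+   illustrative only:
+
+     float atan2f_main_path (float y, float x)
+     {
+       float ay = fabsf (y), ax = fabsf (x);
+       float a   = (ay < ax) ?  ay  : -ax;
+       float b   = (ay < ax) ?  ax  :  ay;
+       float add = (ay < ax) ? 0.0f : (float) M_PI_2;
+       float s = a / b, s2 = s * s, s4 = s2 * s2;
+       float even = ((((sPC8 * s4 + sPC6) * s4 + sPC4) * s4 + sPC2) * s4) + sPC0;
+       float odd  =  (((sPC7 * s4 + sPC5) * s4 + sPC3) * s4) + sPC1;
+       float r = (even + s2 * odd) * s + add;   // ~= atan(|y| / |x|)
+       if (x < 0.0f)
+         r = (float) M_PI - r;                  // quadrant correction
+       return copysignf (r, y);                 // restore the sign of y
+     }
+ */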
+
+/* Offsets for data table __svml_satan2_data_internal
+ */
+#define sZERO                         	0
+#define sONE                          	64
+#define sSIGN_MASK                    	128
+#define sABS_MASK                     	192
+#define sPIO2                         	256
+#define sPI                           	320
+#define sPC8                          	384
+#define sPC7                          	448
+#define sPC6                          	512
+#define sPC5                          	576
+#define sPC4                          	640
+#define sPC3                          	704
+#define sPC2                          	768
+#define sPC1                          	832
+#define sPC0                          	896
+#define iCHK_WORK_SUB                 	960
+#define iCHK_WORK_CMP                 	1024
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN16vv_atan2f_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $256, %rsp
+        xorl      %edx, %edx
+
+/*
+ * #define NO_VECTOR_ZERO_ATAN2_ARGS
+ *  Declarations
+ * Variables
+ * Constants
+ *  The end of declarations
+ *  Implementation
+ * Arguments signs
+ */
+        vmovups   sABS_MASK+__svml_satan2_data_internal(%rip), %zmm6
+        vmovups   sONE+__svml_satan2_data_internal(%rip), %zmm3
+
+/* Testing on working interval. */
+        vmovups   iCHK_WORK_SUB+__svml_satan2_data_internal(%rip), %zmm9
+        vmovups   iCHK_WORK_CMP+__svml_satan2_data_internal(%rip), %zmm14
+
+/*
+ * 1) If y<x then a= y, b=x, PIO2=0
+ * 2) If y>x then a=-x, b=y, PIO2=Pi/2
+ */
+        vmovups   sPIO2+__svml_satan2_data_internal(%rip), %zmm4
+        vpternlogd $255, %zmm13, %zmm13, %zmm13
+        vmovaps   %zmm1, %zmm8
+        vandps    %zmm6, %zmm8, %zmm2
+        vandps    %zmm6, %zmm0, %zmm1
+        vorps     sSIGN_MASK+__svml_satan2_data_internal(%rip), %zmm2, %zmm5
+        vpsubd    %zmm9, %zmm2, %zmm10
+        vpsubd    %zmm9, %zmm1, %zmm12
+        vxorps    %zmm2, %zmm8, %zmm7
+        vxorps    %zmm1, %zmm0, %zmm6
+        vcmpps    $17, {sae}, %zmm2, %zmm1, %k1
+        vpcmpgtd  %zmm10, %zmm14, %k2
+        vpcmpgtd  %zmm12, %zmm14, %k3
+        vmovups   sPC6+__svml_satan2_data_internal(%rip), %zmm14
+        vblendmps %zmm1, %zmm5, %zmm11{%k1}
+        vblendmps %zmm2, %zmm1, %zmm5{%k1}
+        vxorps    %zmm4, %zmm4, %zmm4{%k1}
+
+/*
+ * Division a/b.
+ * Enabled when FMA is available and
+ * performance is better with NR iteration
+ */
+        vrcp14ps  %zmm5, %zmm15
+        vfnmadd231ps {rn-sae}, %zmm5, %zmm15, %zmm3
+        vfmadd213ps {rn-sae}, %zmm15, %zmm3, %zmm15
+        vmulps    {rn-sae}, %zmm15, %zmm11, %zmm3
+        vfnmadd231ps {rn-sae}, %zmm5, %zmm3, %zmm11
+        vfmadd213ps {rn-sae}, %zmm3, %zmm11, %zmm15
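+
+/* The six instructions above compute s = a / b without a divide; a rough
+   C model, with illustrative names only, is
+
+     float r0  = rcp14 (b);         // vrcp14ps estimate, ~2^-14 rel. error
+     float e   = 1.0f - b * r0;     // estimate residual (one FMA)
+     float r1  = r0 + r0 * e;       // Newton-Raphson refined reciprocal
+     float q0  = a * r1;            // first quotient
+     float rem = a - b * q0;        // quotient residual (one FMA)
+     float s   = q0 + rem * r1;     // corrected quotient a/b
+
+   where rcp14 () stands for the hardware reciprocal approximation.  */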
+        vmovups   sPC8+__svml_satan2_data_internal(%rip), %zmm11
+        vpternlogd $255, %zmm3, %zmm3, %zmm3
+
+/* Polynomial. */
+        vmulps    {rn-sae}, %zmm15, %zmm15, %zmm9
+        vpandnd   %zmm10, %zmm10, %zmm13{%k2}
+        vmulps    {rn-sae}, %zmm9, %zmm9, %zmm10
+        vfmadd231ps {rn-sae}, %zmm10, %zmm11, %zmm14
+        vmovups   sPC5+__svml_satan2_data_internal(%rip), %zmm11
+        vpandnd   %zmm12, %zmm12, %zmm3{%k3}
+        vpord     %zmm3, %zmm13, %zmm3
+        vmovups   sPC4+__svml_satan2_data_internal(%rip), %zmm13
+        vmovups   sPC7+__svml_satan2_data_internal(%rip), %zmm12
+        vptestmd  %zmm3, %zmm3, %k0
+        vfmadd213ps {rn-sae}, %zmm13, %zmm10, %zmm14
+        vfmadd231ps {rn-sae}, %zmm10, %zmm12, %zmm11
+        vmovups   sPC3+__svml_satan2_data_internal(%rip), %zmm12
+        vmovups   sPC2+__svml_satan2_data_internal(%rip), %zmm13
+
+/*  Special branch for fast (vector) processing of zero arguments  */
+        kortestw  %k0, %k0
+        vfmadd213ps {rn-sae}, %zmm12, %zmm10, %zmm11
+        vmovups   sPC1+__svml_satan2_data_internal(%rip), %zmm12
+        vfmadd213ps {rn-sae}, %zmm13, %zmm10, %zmm14
+        vmovups   sPC0+__svml_satan2_data_internal(%rip), %zmm13
+        vfmadd213ps {rn-sae}, %zmm12, %zmm10, %zmm11
+        vfmadd213ps {rn-sae}, %zmm13, %zmm10, %zmm14
+        vfmadd213ps {rn-sae}, %zmm14, %zmm9, %zmm11
+
+/* Reconstruction. */
+        vfmadd213ps {rn-sae}, %zmm4, %zmm15, %zmm11
+
+/* if x<0, sPI = Pi, else sPI =0 */
+        vmovups   __svml_satan2_data_internal(%rip), %zmm15
+        vorps     %zmm7, %zmm11, %zmm9
+        vcmpps    $18, {sae}, %zmm15, %zmm8, %k4
+        vmovups   sPI+__svml_satan2_data_internal(%rip), %zmm11
+        vaddps    {rn-sae}, %zmm11, %zmm9, %zmm9{%k4}
+        vorps     %zmm6, %zmm9, %zmm10
+
+/* Go to auxiliary branch */
+        jne       L(AUX_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1 zmm2 zmm3 zmm4 zmm5 zmm6 zmm7 zmm8 zmm10 zmm11
+
+/* Return from auxiliary branch
+ * for out of main path inputs
+ */
+
+L(AUX_BRANCH_RETURN):
+/*
+ *  Special branch for fast (vector) processing of zero arguments
+ *  The end of implementation
+ */
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm8 zmm10
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        vmovaps   %zmm10, %zmm0
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm0, 64(%rsp)
+        vmovups   %zmm8, 128(%rsp)
+        vmovups   %zmm10, 192(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm10
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -240; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -248; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -256; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $16, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   192(%rsp), %zmm10
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -240; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -248; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -256; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm10
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     64(%rsp,%r14,4), %xmm0
+        movss     128(%rsp,%r14,4), %xmm1
+        call      atan2f@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 192(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+        cfi_restore(12)
+        cfi_restore(13)
+        cfi_restore(14)
+                                # LOE rbx r15 r12d r13d
+
+/* Auxiliary branch
+ * for out of main path inputs
+ */
+
+L(AUX_BRANCH):
+/* Check if at least one of X or Y is zero: iAXAYZERO */
+        vmovups   __svml_satan2_data_internal(%rip), %zmm9
+
+/* Check if both X & Y are not NaNs:  iXYnotNAN */
+        vcmpps    $3, {sae}, %zmm8, %zmm8, %k1
+        vcmpps    $3, {sae}, %zmm0, %zmm0, %k2
+        vpcmpd    $4, %zmm9, %zmm2, %k3
+        vpcmpd    $4, %zmm9, %zmm1, %k4
+
+/*
+ *  Path for zero arguments (at least one of both)
+ * Check if both args are zeros (den. is zero)
+ */
+        vcmpps    $4, {sae}, %zmm9, %zmm5, %k5
+
+/* Res = sign(Y)*(X<0)?(PIO2+PI):PIO2 */
+        vpcmpgtd  %zmm8, %zmm9, %k6
+        vpternlogd $255, %zmm14, %zmm14, %zmm14
+        vpternlogd $255, %zmm12, %zmm12, %zmm12
+        vpternlogd $255, %zmm13, %zmm13, %zmm13
+        vpandnd   %zmm2, %zmm2, %zmm14{%k3}
+        vpternlogd $255, %zmm2, %zmm2, %zmm2
+        vpandnd   %zmm1, %zmm1, %zmm2{%k4}
+        vpord     %zmm2, %zmm14, %zmm15
+        vpternlogd $255, %zmm2, %zmm2, %zmm2
+        vpandnd   %zmm5, %zmm5, %zmm2{%k5}
+
+/* Set sPIO2 to zero if den. is zero */
+        vpandnd   %zmm4, %zmm2, %zmm4
+        vpandd    %zmm2, %zmm9, %zmm5
+        vpord     %zmm5, %zmm4, %zmm2
+        vorps     %zmm7, %zmm2, %zmm7
+        vaddps    {rn-sae}, %zmm11, %zmm7, %zmm7{%k6}
+        vorps     %zmm6, %zmm7, %zmm6
+        vpandnd   %zmm8, %zmm8, %zmm12{%k1}
+        vpandnd   %zmm0, %zmm0, %zmm13{%k2}
+        vandps    %zmm13, %zmm12, %zmm12
+
+/* Check if at least one of X or Y is zero and not NaN: iAXAYZEROnotNAN */
+        vpandd    %zmm12, %zmm15, %zmm1
+
+/* Exclude from previous callout mask zero (and not NaN) arguments */
+        vpandnd   %zmm3, %zmm1, %zmm3
+
+/* Go to callout */
+        vptestmd  %zmm3, %zmm3, %k0
+        kmovw     %k0, %edx
+
+/* Merge results from main and spec path */
+        vpandnd   %zmm10, %zmm1, %zmm10
+        vpandd    %zmm1, %zmm6, %zmm11
+        vpord     %zmm11, %zmm10, %zmm10
+
+/* Return to main vector processing path */
+        jmp       L(AUX_BRANCH_RETURN)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm8 zmm10
+END(_ZGVeN16vv_atan2f_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_satan2_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 sZERO[16][1];
+        __declspec(align(64)) VUINT32 sONE[16][1];
+        __declspec(align(64)) VUINT32 sSIGN_MASK[16][1];
+        __declspec(align(64)) VUINT32 sABS_MASK[16][1];
+        __declspec(align(64)) VUINT32 sPIO2[16][1];
+        __declspec(align(64)) VUINT32 sPI[16][1];
+        __declspec(align(64)) VUINT32 sPC8[16][1];
+        __declspec(align(64)) VUINT32 sPC7[16][1];
+        __declspec(align(64)) VUINT32 sPC6[16][1];
+        __declspec(align(64)) VUINT32 sPC5[16][1];
+        __declspec(align(64)) VUINT32 sPC4[16][1];
+        __declspec(align(64)) VUINT32 sPC3[16][1];
+        __declspec(align(64)) VUINT32 sPC2[16][1];
+        __declspec(align(64)) VUINT32 sPC1[16][1];
+        __declspec(align(64)) VUINT32 sPC0[16][1];
+        __declspec(align(64)) VUINT32 iCHK_WORK_SUB[16][1];
+        __declspec(align(64)) VUINT32 iCHK_WORK_CMP[16][1];
+} __svml_satan2_data_internal;
+#endif
+__svml_satan2_data_internal:
+        .long 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 // sZERO
+        .align 64
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000 // sONE
+        .align 64
+        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000 // sSIGN_MASK
+        .align 64
+        .long 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF // sABS_MASK
+        .align 64
+        .long 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB // sPIO2
+        .align 64
+        .long 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB // sPI
+        .align 64
+        .long 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0 // sA08
+        .align 64
+        .long 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631 // sA07
+        .align 64
+        .long 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384 // sA06
+        .align 64
+        .long 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629 // sA05
+        .align 64
+        .long 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474 // sA04
+        .align 64
+        .long 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8 // sA03
+        .align 64
+        .long 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F // sA02
+        .align 64
+        .long 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49 // sA01
+        .align 64
+        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000 // sA00
+        .align 64
+        .long 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000 //iCHK_WORK_SUB
+        .align 64
+        .long 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000 //iCHK_WORK_CMP
+        .align 64
+        .type	__svml_satan2_data_internal,@object
+        .size	__svml_satan2_data_internal,.-__svml_satan2_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core-sse2.S
new file mode 100644
index 0000000000..d1a67facf1
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized atan2f.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN4vv_atan2f _ZGVbN4vv_atan2f_sse2
+#include "../svml_s_atan2f4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core.c
new file mode 100644
index 0000000000..ee882b0557
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized atan2f, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN4vv_atan2f
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN4vv_atan2f, __GI__ZGVbN4vv_atan2f,
+	       __redirect__ZGVbN4vv_atan2f)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core_sse4.S
new file mode 100644
index 0000000000..e4fbe82501
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core_sse4.S
@@ -0,0 +1,384 @@
+/* Function atan2f vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
+ *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
+ *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
+ *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
+ *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
+ *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
+ *
+ *
+ */
+
+/* Offsets for data table __svml_satan2_data_internal
+ */
+#define sZERO                         	0
+#define sSIGN_MASK                    	16
+#define sABS_MASK                     	32
+#define sPIO2                         	48
+#define sPI                           	64
+#define sPC8                          	80
+#define sPC7                          	96
+#define sPC6                          	112
+#define sPC5                          	128
+#define sPC4                          	144
+#define sPC3                          	160
+#define sPC2                          	176
+#define sPC1                          	192
+#define sPC0                          	208
+#define iCHK_WORK_SUB                 	224
+#define iCHK_WORK_CMP                 	240
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN4vv_atan2f_sse4)
+        subq      $88, %rsp
+        cfi_def_cfa_offset(96)
+        movaps    %xmm0, %xmm12
+
+/*
+ * #define NO_VECTOR_ZERO_ATAN2_ARGS
+ *  Declarations
+ * Variables
+ * Constants
+ *  The end of declarations
+ *  Implementation
+ * Arguments signs
+ */
+        movups    sABS_MASK+__svml_satan2_data_internal(%rip), %xmm10
+        movaps    %xmm1, %xmm13
+        movaps    %xmm10, %xmm11
+        andps     %xmm12, %xmm10
+        andps     %xmm13, %xmm11
+        movaps    %xmm10, %xmm7
+        cmpltps   %xmm11, %xmm7
+
+/*
+ * 1) If y<x then a= y, b=x, PIO2=0
+ * 2) If y>x then a=-x, b=y, PIO2=Pi/2
+ */
+        movups    sSIGN_MASK+__svml_satan2_data_internal(%rip), %xmm6
+        movaps    %xmm7, %xmm0
+        orps      %xmm11, %xmm6
+        movaps    %xmm10, %xmm4
+        andnps    %xmm6, %xmm0
+        movaps    %xmm7, %xmm6
+        movaps    %xmm11, %xmm5
+        andps     %xmm7, %xmm4
+        andnps    %xmm10, %xmm6
+        andps     %xmm7, %xmm5
+        orps      %xmm4, %xmm0
+        orps      %xmm5, %xmm6
+
+/* Division a/b. */
+        divps     %xmm6, %xmm0
+
+/* Testing on working interval. */
+        movdqu    iCHK_WORK_SUB+__svml_satan2_data_internal(%rip), %xmm14
+        movaps    %xmm11, %xmm15
+        movaps    %xmm10, %xmm3
+        psubd     %xmm14, %xmm15
+        psubd     %xmm14, %xmm3
+        movdqa    %xmm15, %xmm1
+        movdqu    iCHK_WORK_CMP+__svml_satan2_data_internal(%rip), %xmm2
+        movdqa    %xmm3, %xmm14
+        pcmpgtd   %xmm2, %xmm1
+        pcmpeqd   %xmm2, %xmm15
+        pcmpgtd   %xmm2, %xmm14
+        pcmpeqd   %xmm2, %xmm3
+        por       %xmm15, %xmm1
+        por       %xmm3, %xmm14
+        por       %xmm14, %xmm1
+
+/* Polynomial. */
+        movaps    %xmm0, %xmm14
+        mulps     %xmm0, %xmm14
+        movaps    %xmm13, %xmm4
+        movmskps  %xmm1, %ecx
+        movaps    %xmm14, %xmm15
+        movaps    %xmm11, %xmm9
+        mulps     %xmm14, %xmm15
+        pxor      %xmm13, %xmm9
+        movups    sPC8+__svml_satan2_data_internal(%rip), %xmm2
+        movaps    %xmm10, %xmm8
+        mulps     %xmm15, %xmm2
+        pxor      %xmm12, %xmm8
+        movups    sPC7+__svml_satan2_data_internal(%rip), %xmm3
+        xorl      %edx, %edx
+        mulps     %xmm15, %xmm3
+        addps     sPC6+__svml_satan2_data_internal(%rip), %xmm2
+        mulps     %xmm15, %xmm2
+        addps     sPC5+__svml_satan2_data_internal(%rip), %xmm3
+        mulps     %xmm15, %xmm3
+        addps     sPC4+__svml_satan2_data_internal(%rip), %xmm2
+        mulps     %xmm15, %xmm2
+        addps     sPC3+__svml_satan2_data_internal(%rip), %xmm3
+        mulps     %xmm15, %xmm3
+        addps     sPC2+__svml_satan2_data_internal(%rip), %xmm2
+        mulps     %xmm2, %xmm15
+        addps     sPC1+__svml_satan2_data_internal(%rip), %xmm3
+        mulps     %xmm3, %xmm14
+        addps     sPC0+__svml_satan2_data_internal(%rip), %xmm15
+
+/* if x<0, sPI = Pi, else sPI =0 */
+        movups    __svml_satan2_data_internal(%rip), %xmm5
+        xorl      %eax, %eax
+        andnps    sPIO2+__svml_satan2_data_internal(%rip), %xmm7
+        addps     %xmm14, %xmm15
+        cmpleps   %xmm5, %xmm4
+
+/* Reconstruction. */
+        mulps     %xmm15, %xmm0
+        andps     sPI+__svml_satan2_data_internal(%rip), %xmm4
+        addps     %xmm7, %xmm0
+        orps      %xmm9, %xmm0
+        addps     %xmm4, %xmm0
+        orps      %xmm8, %xmm0
+
+/*  Special branch for fast (vector) processing of zero arguments  */
+        testl     %ecx, %ecx
+
+/* Go to auxiliary branch */
+        jne       L(AUX_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 eax edx xmm0 xmm1 xmm4 xmm5 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13
+
+/* Return from auxiliary branch
+ * for out of main path inputs
+ */
+
+L(AUX_BRANCH_RETURN):
+/*
+ *  Special branch for fast (vector) processing of zero arguments
+ *  The end of implementation
+ */
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 eax edx xmm0 xmm12 xmm13
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        addq      $88, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(96)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm12, 32(%rsp)
+        movups    %xmm13, 48(%rsp)
+        movups    %xmm0, 64(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 eax edx xmm0
+
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -80)
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -88)
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -96)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    64(%rsp), %xmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -80)
+        cfi_offset(13, -88)
+        cfi_offset(14, -96)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        movss     48(%rsp,%r14,4), %xmm1
+        call      atan2f@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 64(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+        cfi_restore(12)
+        cfi_restore(13)
+        cfi_restore(14)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Auxiliary branch
+ * for out of main path inputs
+ */
+
+L(AUX_BRANCH):
+/* Check if both X & Y are not NaNs:  iXYnotNAN */
+        movaps    %xmm13, %xmm3
+        movaps    %xmm12, %xmm2
+        cmpordps  %xmm13, %xmm3
+        cmpordps  %xmm12, %xmm2
+
+/*
+ *  Path for zero arguments (at least one of both)
+ * Check if both args are zeros (den. is zero)
+ */
+        cmpeqps   %xmm5, %xmm6
+
+/* Check if at least one of X or Y is zero: iAXAYZERO */
+        pcmpeqd   %xmm5, %xmm11
+        pcmpeqd   %xmm5, %xmm10
+        andps     %xmm2, %xmm3
+        por       %xmm10, %xmm11
+
+/* Check if at least one of X or Y is zero and not NaN: iAXAYZEROnotNAN */
+        andps     %xmm3, %xmm11
+
+/* Exclude from previous callout mask zero (and not NaN) arguments */
+        movaps    %xmm11, %xmm10
+        pandn     %xmm1, %xmm10
+
+/* Set sPIO2 to zero if den. is zero */
+        movaps    %xmm6, %xmm1
+        andnps    %xmm7, %xmm1
+        andps     %xmm5, %xmm6
+        orps      %xmm6, %xmm1
+
+/* Res = sign(Y)*(X<0)?(PIO2+PI):PIO2 */
+        pcmpgtd   %xmm13, %xmm5
+        orps      %xmm9, %xmm1
+        andps     %xmm4, %xmm5
+
+/* Merge results from main and spec path */
+        movaps    %xmm11, %xmm4
+        addps     %xmm5, %xmm1
+
+/* Go to callout */
+        movmskps  %xmm10, %edx
+        orps      %xmm8, %xmm1
+        andnps    %xmm0, %xmm4
+        andps     %xmm11, %xmm1
+        movaps    %xmm4, %xmm0
+        orps      %xmm1, %xmm0
+
+/* Return to main vector processing path */
+        jmp       L(AUX_BRANCH_RETURN)
+                                # LOE rbx rbp r12 r13 r14 r15 eax edx xmm0 xmm12 xmm13
+END(_ZGVbN4vv_atan2f_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_satan2_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(16)) VUINT32 sZERO[4][1];
+        __declspec(align(16)) VUINT32 sSIGN_MASK[4][1];
+        __declspec(align(16)) VUINT32 sABS_MASK[4][1];
+        __declspec(align(16)) VUINT32 sPIO2[4][1];
+        __declspec(align(16)) VUINT32 sPI[4][1];
+        __declspec(align(16)) VUINT32 sPC8[4][1];
+        __declspec(align(16)) VUINT32 sPC7[4][1];
+        __declspec(align(16)) VUINT32 sPC6[4][1];
+        __declspec(align(16)) VUINT32 sPC5[4][1];
+        __declspec(align(16)) VUINT32 sPC4[4][1];
+        __declspec(align(16)) VUINT32 sPC3[4][1];
+        __declspec(align(16)) VUINT32 sPC2[4][1];
+        __declspec(align(16)) VUINT32 sPC1[4][1];
+        __declspec(align(16)) VUINT32 sPC0[4][1];
+        __declspec(align(16)) VUINT32 iCHK_WORK_SUB[4][1];
+        __declspec(align(16)) VUINT32 iCHK_WORK_CMP[4][1];
+} __svml_satan2_data_internal;
+#endif
+__svml_satan2_data_internal:
+        .long 0x00000000, 0x00000000, 0x00000000, 0x00000000 // sZERO
+        .align 16
+        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000 // sSIGN_MASK
+        .align 16
+        .long 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF // sABS_MASK
+        .align 16
+        .long 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB // sPIO2
+        .align 16
+        .long 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB // sPI
+        .align 16
+        .long 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0 // sA08
+        .align 16
+        .long 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631 // sA07
+        .align 16
+        .long 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384 // sA06
+        .align 16
+        .long 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629 // sA05
+        .align 16
+        .long 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474 // sA04
+        .align 16
+        .long 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8 // sA03
+        .align 16
+        .long 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F // sA02
+        .align 16
+        .long 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49 // sA01
+        .align 16
+        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000 // sA00
+        .align 16
+        .long 0x81000000, 0x81000000, 0x81000000, 0x81000000 //iCHK_WORK_SUB
+        .align 16
+        .long 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000 //iCHK_WORK_CMP
+        .align 16
+        .type	__svml_satan2_data_internal,@object
+        .size	__svml_satan2_data_internal,.-__svml_satan2_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core-sse.S
new file mode 100644
index 0000000000..21b1d3ff63
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized atan2f.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN8vv_atan2f _ZGVdN8vv_atan2f_sse_wrapper
+#include "../svml_s_atan2f8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core.c
new file mode 100644
index 0000000000..7e02050983
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized atan2f, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN8vv_atan2f
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN8vv_atan2f, __GI__ZGVdN8vv_atan2f,
+	       __redirect__ZGVdN8vv_atan2f)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core_avx2.S
new file mode 100644
index 0000000000..2e6e5eb71c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core_avx2.S
@@ -0,0 +1,362 @@
+/* Function atan2f vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
+ *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
+ *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
+ *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
+ *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
+ *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
+ *
+ *
+ */
+
+/* Offsets for data table __svml_satan2_data_internal
+ */
+#define sZERO                         	0
+#define sSIGN_MASK                    	32
+#define sABS_MASK                     	64
+#define sPIO2                         	96
+#define sPI                           	128
+#define sPC8                          	160
+#define sPC7                          	192
+#define sPC6                          	224
+#define sPC5                          	256
+#define sPC4                          	288
+#define sPC3                          	320
+#define sPC2                          	352
+#define sPC1                          	384
+#define sPC0                          	416
+#define iCHK_WORK_SUB                 	448
+#define iCHK_WORK_CMP                 	480
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN8vv_atan2f_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $128, %rsp
+        xorl      %edx, %edx
+
+/*
+ * #define NO_VECTOR_ZERO_ATAN2_ARGS
+ *  Declarations
+ * Variables
+ * Constants
+ *  The end of declarations
+ *  Implementation
+ * Arguments signs
+ */
+        vmovups   sABS_MASK+__svml_satan2_data_internal(%rip), %ymm2
+
+/* Testing on working interval. */
+        vmovups   iCHK_WORK_SUB+__svml_satan2_data_internal(%rip), %ymm15
+        vmovups   iCHK_WORK_CMP+__svml_satan2_data_internal(%rip), %ymm9
+
+/* if x<0, sPI = Pi, else sPI =0 */
+        vmovups   __svml_satan2_data_internal(%rip), %ymm5
+        vmovaps   %ymm1, %ymm7
+        vandps    %ymm2, %ymm7, %ymm13
+        vandps    %ymm2, %ymm0, %ymm12
+        vcmplt_oqps %ymm13, %ymm12, %ymm4
+        vcmple_oqps %ymm5, %ymm7, %ymm6
+        vpsubd    %ymm15, %ymm13, %ymm10
+        vpsubd    %ymm15, %ymm12, %ymm8
+
+/*
+ * 1) If y<x then a= y, b=x, PIO2=0
+ * 2) If y>x then a=-x, b=y, PIO2=Pi/2
+ */
+        vorps     sSIGN_MASK+__svml_satan2_data_internal(%rip), %ymm13, %ymm3
+        vblendvps %ymm4, %ymm12, %ymm3, %ymm14
+        vblendvps %ymm4, %ymm13, %ymm12, %ymm3
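+
+/* Per lane, the blend pair above selects (illustrative C only):
+
+     a = (ay < ax) ?  ay : -ax;    // %ymm14
+     b = (ay < ax) ?  ax :  ay;    // %ymm3
+
+   with ay = |y| (%ymm12), ax = |x| (%ymm13) and the compare mask in
+   %ymm4.  */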
+
+/* Division a/b. */
+        vdivps    %ymm3, %ymm14, %ymm11
+        vpcmpgtd  %ymm9, %ymm10, %ymm14
+        vpcmpeqd  %ymm9, %ymm10, %ymm15
+        vpor      %ymm15, %ymm14, %ymm10
+        vmovups   sPC7+__svml_satan2_data_internal(%rip), %ymm15
+        vpcmpgtd  %ymm9, %ymm8, %ymm14
+        vpcmpeqd  %ymm9, %ymm8, %ymm8
+        vpor      %ymm8, %ymm14, %ymm9
+        vmovups   sPC8+__svml_satan2_data_internal(%rip), %ymm14
+        vpor      %ymm9, %ymm10, %ymm10
+
+/* Polynomial. */
+        vmulps    %ymm11, %ymm11, %ymm9
+        vmulps    %ymm9, %ymm9, %ymm8
+        vfmadd213ps sPC6+__svml_satan2_data_internal(%rip), %ymm8, %ymm14
+        vfmadd213ps sPC5+__svml_satan2_data_internal(%rip), %ymm8, %ymm15
+        vfmadd213ps sPC4+__svml_satan2_data_internal(%rip), %ymm8, %ymm14
+        vfmadd213ps sPC3+__svml_satan2_data_internal(%rip), %ymm8, %ymm15
+        vfmadd213ps sPC2+__svml_satan2_data_internal(%rip), %ymm8, %ymm14
+        vfmadd213ps sPC1+__svml_satan2_data_internal(%rip), %ymm8, %ymm15
+        vfmadd213ps sPC0+__svml_satan2_data_internal(%rip), %ymm8, %ymm14
+        vfmadd213ps %ymm14, %ymm9, %ymm15
+        vandnps   sPIO2+__svml_satan2_data_internal(%rip), %ymm4, %ymm4
+
+/* Reconstruction. */
+        vfmadd213ps %ymm4, %ymm11, %ymm15
+        vxorps    %ymm13, %ymm7, %ymm1
+        vandps    sPI+__svml_satan2_data_internal(%rip), %ymm6, %ymm6
+        vorps     %ymm1, %ymm15, %ymm11
+        vaddps    %ymm11, %ymm6, %ymm8
+        vmovmskps %ymm10, %eax
+        vxorps    %ymm12, %ymm0, %ymm2
+        vorps     %ymm2, %ymm8, %ymm9
+
+/*  Special branch for fast (vector) processing of zero arguments  */
+        testl     %eax, %eax
+
+/* Go to auxiliary branch */
+        jne       L(AUX_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm1 ymm2 ymm3 ymm4 ymm5 ymm6 ymm7 ymm9 ymm10 ymm12 ymm13
+
+/* Return from auxiliary branch
+ * for out of main path inputs
+ */
+
+L(AUX_BRANCH_RETURN):
+/*
+ *  Special branch for fast (vector) processing of zero arguments
+ *  The end of implementation
+ */
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm7 ymm9
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        vmovaps   %ymm9, %ymm0
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %ymm0, 32(%rsp)
+        vmovups   %ymm7, 64(%rsp)
+        vmovups   %ymm9, 96(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx ymm9
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -112; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   96(%rsp), %ymm9
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -112; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm9
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        movss     64(%rsp,%r14,4), %xmm1
+        call      atan2f@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 96(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+        cfi_restore(12)
+        cfi_restore(13)
+        cfi_restore(14)
+                                # LOE rbx r15 r12d r13d
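+
+/* A rough C-level sketch of the fallback above (illustration only;
+ * arg1/arg2/res name the spilled first and second vector arguments at
+ * 32(%rsp)/64(%rsp) and the result vector at 96(%rsp)):
+ *
+ *   for (int i = 0; i < 8; i++)
+ *     if (range_mask & (1u << i))
+ *       res[i] = atan2f (arg1[i], arg2[i]);
+ */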
+
+/* Auxiliary branch
+ * for out of main path inputs
+ */
+
+L(AUX_BRANCH):
+/* Check if at least one of X or Y is zero: iAXAYZERO */
+        vpcmpeqd  %ymm5, %ymm13, %ymm13
+        vpcmpeqd  %ymm5, %ymm12, %ymm12
+
+/* Check if both X & Y are not NaNs:  iXYnotNAN */
+        vcmpordps %ymm7, %ymm7, %ymm11
+        vcmpordps %ymm0, %ymm0, %ymm14
+
+/*
+ * Path for zero arguments (at least one of the two)
+ * Check if both args are zero (den. is zero)
+ */
+        vcmpeqps  %ymm5, %ymm3, %ymm3
+        vpor      %ymm12, %ymm13, %ymm15
+
+/* Set sPIO2 to zero if den. is zero */
+        vblendvps %ymm3, %ymm5, %ymm4, %ymm4
+        vandps    %ymm14, %ymm11, %ymm8
+
+/* Check if at least one of X or Y is zero and neither is NaN: iAXAYZEROnotNAN */
+        vpand     %ymm8, %ymm15, %ymm8
+
+/* Res = sign(Y)*(X<0)?(PIO2+PI):PIO2 */
+        vpcmpgtd  %ymm7, %ymm5, %ymm5
+        vorps     %ymm1, %ymm4, %ymm1
+        vandps    %ymm6, %ymm5, %ymm6
+        vaddps    %ymm6, %ymm1, %ymm1
+
+/* Exclude zero (and not NaN) arguments from the previous callout mask */
+        vpandn    %ymm10, %ymm8, %ymm10
+        vorps     %ymm2, %ymm1, %ymm2
+
+/* Go to callout */
+        vmovmskps %ymm10, %edx
+
+/* Merge results from main and spec path */
+        vblendvps %ymm8, %ymm2, %ymm9, %ymm9
+
+/* Return to main vector processing path */
+        jmp       L(AUX_BRANCH_RETURN)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm7 ymm9
+END(_ZGVdN8vv_atan2f_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_satan2_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(32)) VUINT32 sZERO[8][1];
+        __declspec(align(32)) VUINT32 sSIGN_MASK[8][1];
+        __declspec(align(32)) VUINT32 sABS_MASK[8][1];
+        __declspec(align(32)) VUINT32 sPIO2[8][1];
+        __declspec(align(32)) VUINT32 sPI[8][1];
+        __declspec(align(32)) VUINT32 sPC8[8][1];
+        __declspec(align(32)) VUINT32 sPC7[8][1];
+        __declspec(align(32)) VUINT32 sPC6[8][1];
+        __declspec(align(32)) VUINT32 sPC5[8][1];
+        __declspec(align(32)) VUINT32 sPC4[8][1];
+        __declspec(align(32)) VUINT32 sPC3[8][1];
+        __declspec(align(32)) VUINT32 sPC2[8][1];
+        __declspec(align(32)) VUINT32 sPC1[8][1];
+        __declspec(align(32)) VUINT32 sPC0[8][1];
+        __declspec(align(32)) VUINT32 iCHK_WORK_SUB[8][1];
+        __declspec(align(32)) VUINT32 iCHK_WORK_CMP[8][1];
+} __svml_satan2_data_internal;
+#endif
+__svml_satan2_data_internal:
+        .long 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 // sZERO
+        .align 32
+        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000 // sSIGN_MASK
+        .align 32
+        .long 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF // sABS_MASK
+        .align 32
+        .long 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB // sPIO2
+        .align 32
+        .long 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB // sPI
+        .align 32
+        .long 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0 // sPC8
+        .align 32
+        .long 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631 // sPC7
+        .align 32
+        .long 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384 // sPC6
+        .align 32
+        .long 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629 // sPC5
+        .align 32
+        .long 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474 // sPC4
+        .align 32
+        .long 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8 // sPC3
+        .align 32
+        .long 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F // sPC2
+        .align 32
+        .long 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49 // sPC1
+        .align 32
+        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000 // sPC0
+        .align 32
+        .long 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000 // iCHK_WORK_SUB
+        .align 32
+        .long 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000 // iCHK_WORK_CMP
+        .align 32
+        .type	__svml_satan2_data_internal,@object
+        .size	__svml_satan2_data_internal,.-__svml_satan2_data_internal
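+
+/* A rough scalar sketch of the main path above, for finite, nonzero x and y
+ * (special inputs are handled by the branches above).  c[0..8] stand for the
+ * single-precision values of sPC0..sPC8 from the table; they are used here
+ * only for illustration:
+ *
+ *   float ax = fabsf (x), ay = fabsf (y);
+ *   float r = fminf (ax, ay) / fmaxf (ax, ay);
+ *   float r2 = r * r, p = c[8];
+ *   for (int i = 7; i >= 0; i--)        // Horner in r^2
+ *     p = p * r2 + c[i];
+ *   float a = ay > ax ? (float) M_PI_2 - r * p : r * p;
+ *   if (x < 0.0f)
+ *     a = (float) M_PI - a;
+ *   return copysignf (a, y);            // atan2f (y, x)
+ */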
diff --git a/sysdeps/x86_64/fpu/svml_d_atan22_core.S b/sysdeps/x86_64/fpu/svml_d_atan22_core.S
new file mode 100644
index 0000000000..f3089e70f9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_atan22_core.S
@@ -0,0 +1,29 @@
+/* Function atan2 vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN2vv_atan2)
+WRAPPER_IMPL_SSE2_ff atan2
+END (_ZGVbN2vv_atan2)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN2vv_atan2)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_atan24_core.S b/sysdeps/x86_64/fpu/svml_d_atan24_core.S
new file mode 100644
index 0000000000..8a163d12d2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_atan24_core.S
@@ -0,0 +1,29 @@
+/* Function atan2 vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN4vv_atan2)
+WRAPPER_IMPL_AVX_ff _ZGVbN2vv_atan2
+END (_ZGVdN4vv_atan2)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN4vv_atan2)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_atan24_core_avx.S b/sysdeps/x86_64/fpu/svml_d_atan24_core_avx.S
new file mode 100644
index 0000000000..0ee5ae8faf
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_atan24_core_avx.S
@@ -0,0 +1,25 @@
+/* Function atan2 vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN4vv_atan2)
+WRAPPER_IMPL_AVX_ff _ZGVbN2vv_atan2
+END (_ZGVcN4vv_atan2)
diff --git a/sysdeps/x86_64/fpu/svml_d_atan28_core.S b/sysdeps/x86_64/fpu/svml_d_atan28_core.S
new file mode 100644
index 0000000000..b85f696686
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_atan28_core.S
@@ -0,0 +1,25 @@
+/* Function atan2 vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN8vv_atan2)
+WRAPPER_IMPL_AVX512_ff _ZGVdN4vv_atan2
+END (_ZGVeN8vv_atan2)
diff --git a/sysdeps/x86_64/fpu/svml_s_atan2f16_core.S b/sysdeps/x86_64/fpu/svml_s_atan2f16_core.S
new file mode 100644
index 0000000000..25acb31dfb
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_atan2f16_core.S
@@ -0,0 +1,25 @@
+/* Function atan2f vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN16vv_atan2f)
+WRAPPER_IMPL_AVX512_ff _ZGVdN8vv_atan2f
+END (_ZGVeN16vv_atan2f)
diff --git a/sysdeps/x86_64/fpu/svml_s_atan2f4_core.S b/sysdeps/x86_64/fpu/svml_s_atan2f4_core.S
new file mode 100644
index 0000000000..bc99f0ba10
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_atan2f4_core.S
@@ -0,0 +1,29 @@
+/* Function atan2f vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN4vv_atan2f)
+WRAPPER_IMPL_SSE2_ff atan2f
+END (_ZGVbN4vv_atan2f)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4vv_atan2f)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_atan2f8_core.S b/sysdeps/x86_64/fpu/svml_s_atan2f8_core.S
new file mode 100644
index 0000000000..bfcdb3c372
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_atan2f8_core.S
@@ -0,0 +1,29 @@
+/* Function atan2f vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN8vv_atan2f)
+WRAPPER_IMPL_AVX_ff _ZGVbN4vv_atan2f
+END (_ZGVdN8vv_atan2f)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8vv_atan2f)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_atan2f8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_atan2f8_core_avx.S
new file mode 100644
index 0000000000..1aa8d05822
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_atan2f8_core_avx.S
@@ -0,0 +1,25 @@
+/* Function atan2f vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN8vv_atan2f)
+WRAPPER_IMPL_AVX_ff _ZGVbN4vv_atan2f
+END (_ZGVcN8vv_atan2f)
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx.c
new file mode 100644
index 0000000000..e423bce25b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-atan2.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx2.c
new file mode 100644
index 0000000000..e423bce25b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx2.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-atan2.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx512f.c
new file mode 100644
index 0000000000..e423bce25b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx512f.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-atan2.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atan2.c b/sysdeps/x86_64/fpu/test-double-libmvec-atan2.c
new file mode 100644
index 0000000000..d0aa626d95
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-atan2.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE double
+#define LIBMVEC_FUNC atan2
+#include "test-vector-abi-arg2.h"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index b1981ac7e4..37a7a1c777 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -37,6 +37,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVbN2v_cosh)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVbN2v_expm1)
 VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVbN2v_sinh)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVbN2v_cbrt)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVbN2vv_atan2)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index 47915a7e59..4313f67e06 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVdN4v_cosh)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVdN4v_expm1)
 VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVdN4v_sinh)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVdN4v_cbrt)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVdN4vv_atan2)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m256i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index 5cd5049807..4b8b00f16d 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -37,6 +37,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVcN4v_cosh)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVcN4v_expm1)
 VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVcN4v_sinh)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVcN4v_cbrt)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVcN4vv_atan2)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index 83970739ab..d06522a407 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -37,6 +37,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVeN8v_cosh)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVeN8v_expm1)
 VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVeN8v_sinh)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVeN8v_cbrt)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVeN8vv_atan2)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx.c
new file mode 100644
index 0000000000..5c7e2c9ad5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-atan2f.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx2.c
new file mode 100644
index 0000000000..5c7e2c9ad5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx2.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-atan2f.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx512f.c
new file mode 100644
index 0000000000..5c7e2c9ad5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx512f.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-atan2f.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atan2f.c b/sysdeps/x86_64/fpu/test-float-libmvec-atan2f.c
new file mode 100644
index 0000000000..beb5c745cb
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-atan2f.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE float
+#define LIBMVEC_FUNC atan2f
+#include "test-vector-abi-arg2.h"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index 0420f11c28..0bd631bf9a 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -37,6 +37,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVeN16v_coshf)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVeN16v_expm1f)
 VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVeN16v_sinhf)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVeN16v_cbrtf)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVeN16vv_atan2f)
 
 #define VEC_INT_TYPE __m512i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index c8f7580265..1018398bd3 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -37,6 +37,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVbN4v_coshf)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVbN4v_expm1f)
 VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVbN4v_sinhf)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVbN4v_cbrtf)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVbN4vv_atan2f)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index b581796b88..42ea28f30f 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVdN8v_coshf)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVdN8v_expm1f)
 VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVdN8v_sinhf)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVdN8v_cbrtf)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVdN8vv_atan2f)
 
 /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
 #undef VECTOR_WRAPPER_fFF
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index f16789e5ff..70a0216a07 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -37,6 +37,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVcN8v_coshf)
 VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVcN8v_expm1f)
 VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVcN8v_sinhf)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVcN8v_cbrtf)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVcN8vv_atan2f)
 
 #define VEC_INT_TYPE __m128i
 
-- 
2.31.1



* [PATCH v5 11/18] x86-64: Add vector log10/log10f implementation to libmvec
  2021-12-29  6:39 [PATCH v5 00/18] x86-64: Add vector math functions to libmvec Sunil K Pandey
                   ` (9 preceding siblings ...)
  2021-12-29  6:39 ` [PATCH v5 10/18] x86-64: Add vector atan2/atan2f " Sunil K Pandey
@ 2021-12-29  6:39 ` Sunil K Pandey
  2021-12-29 21:26   ` H.J. Lu
  2021-12-29  6:39 ` [PATCH v5 12/18] x86-64: Add vector log2/log2f " Sunil K Pandey
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Sunil K Pandey @ 2021-12-29  6:39 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, andrey.kolesov, marius.cornea

Implement vectorized log10/log10f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector log10/log10f with regenerated ulps.
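
With a GCC and glibc that support libmvec, a loop like the sketch below
can be auto-vectorized into calls to the new _ZGVbN2v_log10,
_ZGVdN4v_log10, ... entry points, for example when built with something
like "-O3 -mavx2 -ffast-math" (the exact options depend on the compiler
version):

    #include <math.h>

    void
    vector_log10 (double *out, const double *in, int n)
    {
      for (int i = 0; i < n; i++)
        out[i] = log10 (in[i]);
    }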
---
 bits/libm-simd-decl-stubs.h                   |   11 +
 math/bits/mathcalls.h                         |    2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
 sysdeps/x86/fpu/bits/math-vector.h            |    4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
 sysdeps/x86_64/fpu/Makeconfig                 |    1 +
 sysdeps/x86_64/fpu/Versions                   |    2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |   20 +
 .../fpu/multiarch/svml_d_log102_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_log102_core.c |   27 +
 .../fpu/multiarch/svml_d_log102_core_sse4.S   | 1089 +++++++++++++++++
 .../fpu/multiarch/svml_d_log104_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_log104_core.c |   27 +
 .../fpu/multiarch/svml_d_log104_core_avx2.S   | 1074 ++++++++++++++++
 .../fpu/multiarch/svml_d_log108_core-avx2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_log108_core.c |   27 +
 .../fpu/multiarch/svml_d_log108_core_avx512.S |  299 +++++
 .../fpu/multiarch/svml_s_log10f16_core-avx2.S |   20 +
 .../fpu/multiarch/svml_s_log10f16_core.c      |   28 +
 .../multiarch/svml_s_log10f16_core_avx512.S   |  238 ++++
 .../fpu/multiarch/svml_s_log10f4_core-sse2.S  |   20 +
 .../fpu/multiarch/svml_s_log10f4_core.c       |   28 +
 .../fpu/multiarch/svml_s_log10f4_core_sse4.S  |  243 ++++
 .../fpu/multiarch/svml_s_log10f8_core-sse.S   |   20 +
 .../fpu/multiarch/svml_s_log10f8_core.c       |   28 +
 .../fpu/multiarch/svml_s_log10f8_core_avx2.S  |  243 ++++
 sysdeps/x86_64/fpu/svml_d_log102_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_log104_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_log104_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_d_log108_core.S       |   25 +
 sysdeps/x86_64/fpu/svml_s_log10f16_core.S     |   25 +
 sysdeps/x86_64/fpu/svml_s_log10f4_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_log10f8_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_log10f8_core_avx.S  |   25 +
 .../fpu/test-double-libmvec-log10-avx.c       |    1 +
 .../fpu/test-double-libmvec-log10-avx2.c      |    1 +
 .../fpu/test-double-libmvec-log10-avx512f.c   |    1 +
 .../x86_64/fpu/test-double-libmvec-log10.c    |    3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
 .../fpu/test-float-libmvec-log10f-avx.c       |    1 +
 .../fpu/test-float-libmvec-log10f-avx2.c      |    1 +
 .../fpu/test-float-libmvec-log10f-avx512f.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-log10f.c    |    3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
 50 files changed, 3758 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log102_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log102_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log102_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log104_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log104_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log104_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log108_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log108_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log108_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log102_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log104_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log104_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log108_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log10f16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log10f4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log10f8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log10f8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log10-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log10-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log10-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log10.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log10f.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
index 31878bf4ed..4ad584c227 100644
--- a/bits/libm-simd-decl-stubs.h
+++ b/bits/libm-simd-decl-stubs.h
@@ -219,4 +219,15 @@
 #define __DECL_SIMD_atan2f32x
 #define __DECL_SIMD_atan2f64x
 #define __DECL_SIMD_atan2f128x
+
+#define __DECL_SIMD_log10
+#define __DECL_SIMD_log10f
+#define __DECL_SIMD_log10l
+#define __DECL_SIMD_log10f16
+#define __DECL_SIMD_log10f32
+#define __DECL_SIMD_log10f64
+#define __DECL_SIMD_log10f128
+#define __DECL_SIMD_log10f32x
+#define __DECL_SIMD_log10f64x
+#define __DECL_SIMD_log10f128x
 #endif
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index 1bd4911993..f21384758a 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -104,7 +104,7 @@ __MATHCALL (ldexp,, (_Mdouble_ __x, int __exponent));
 __MATHCALL_VEC (log,, (_Mdouble_ __x));
 
 /* Base-ten logarithm of X.  */
-__MATHCALL (log10,, (_Mdouble_ __x));
+__MATHCALL_VEC (log10,, (_Mdouble_ __x));
 
 /* Break VALUE into integral and fractional parts.  */
 __MATHCALL (modf,, (_Mdouble_ __x, _Mdouble_ *__iptr)) __nonnull ((2));
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index 2b3b8d3886..8108a2a189 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -54,6 +54,7 @@ GLIBC_2.35 _ZGVbN2v_cosh F
 GLIBC_2.35 _ZGVbN2v_exp10 F
 GLIBC_2.35 _ZGVbN2v_exp2 F
 GLIBC_2.35 _ZGVbN2v_expm1 F
+GLIBC_2.35 _ZGVbN2v_log10 F
 GLIBC_2.35 _ZGVbN2v_sinh F
 GLIBC_2.35 _ZGVbN2vv_atan2 F
 GLIBC_2.35 _ZGVbN2vv_hypot F
@@ -65,6 +66,7 @@ GLIBC_2.35 _ZGVbN4v_coshf F
 GLIBC_2.35 _ZGVbN4v_exp10f F
 GLIBC_2.35 _ZGVbN4v_exp2f F
 GLIBC_2.35 _ZGVbN4v_expm1f F
+GLIBC_2.35 _ZGVbN4v_log10f F
 GLIBC_2.35 _ZGVbN4v_sinhf F
 GLIBC_2.35 _ZGVbN4vv_atan2f F
 GLIBC_2.35 _ZGVbN4vv_hypotf F
@@ -76,6 +78,7 @@ GLIBC_2.35 _ZGVcN4v_cosh F
 GLIBC_2.35 _ZGVcN4v_exp10 F
 GLIBC_2.35 _ZGVcN4v_exp2 F
 GLIBC_2.35 _ZGVcN4v_expm1 F
+GLIBC_2.35 _ZGVcN4v_log10 F
 GLIBC_2.35 _ZGVcN4v_sinh F
 GLIBC_2.35 _ZGVcN4vv_atan2 F
 GLIBC_2.35 _ZGVcN4vv_hypot F
@@ -87,6 +90,7 @@ GLIBC_2.35 _ZGVcN8v_coshf F
 GLIBC_2.35 _ZGVcN8v_exp10f F
 GLIBC_2.35 _ZGVcN8v_exp2f F
 GLIBC_2.35 _ZGVcN8v_expm1f F
+GLIBC_2.35 _ZGVcN8v_log10f F
 GLIBC_2.35 _ZGVcN8v_sinhf F
 GLIBC_2.35 _ZGVcN8vv_atan2f F
 GLIBC_2.35 _ZGVcN8vv_hypotf F
@@ -98,6 +102,7 @@ GLIBC_2.35 _ZGVdN4v_cosh F
 GLIBC_2.35 _ZGVdN4v_exp10 F
 GLIBC_2.35 _ZGVdN4v_exp2 F
 GLIBC_2.35 _ZGVdN4v_expm1 F
+GLIBC_2.35 _ZGVdN4v_log10 F
 GLIBC_2.35 _ZGVdN4v_sinh F
 GLIBC_2.35 _ZGVdN4vv_atan2 F
 GLIBC_2.35 _ZGVdN4vv_hypot F
@@ -109,6 +114,7 @@ GLIBC_2.35 _ZGVdN8v_coshf F
 GLIBC_2.35 _ZGVdN8v_exp10f F
 GLIBC_2.35 _ZGVdN8v_exp2f F
 GLIBC_2.35 _ZGVdN8v_expm1f F
+GLIBC_2.35 _ZGVdN8v_log10f F
 GLIBC_2.35 _ZGVdN8v_sinhf F
 GLIBC_2.35 _ZGVdN8vv_atan2f F
 GLIBC_2.35 _ZGVdN8vv_hypotf F
@@ -120,6 +126,7 @@ GLIBC_2.35 _ZGVeN16v_coshf F
 GLIBC_2.35 _ZGVeN16v_exp10f F
 GLIBC_2.35 _ZGVeN16v_exp2f F
 GLIBC_2.35 _ZGVeN16v_expm1f F
+GLIBC_2.35 _ZGVeN16v_log10f F
 GLIBC_2.35 _ZGVeN16v_sinhf F
 GLIBC_2.35 _ZGVeN16vv_atan2f F
 GLIBC_2.35 _ZGVeN16vv_hypotf F
@@ -131,6 +138,7 @@ GLIBC_2.35 _ZGVeN8v_cosh F
 GLIBC_2.35 _ZGVeN8v_exp10 F
 GLIBC_2.35 _ZGVeN8v_exp2 F
 GLIBC_2.35 _ZGVeN8v_expm1 F
+GLIBC_2.35 _ZGVeN8v_log10 F
 GLIBC_2.35 _ZGVeN8v_sinh F
 GLIBC_2.35 _ZGVeN8vv_atan2 F
 GLIBC_2.35 _ZGVeN8vv_hypot F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index 62f2890ab3..64e80ada7a 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -102,6 +102,10 @@
 #  define __DECL_SIMD_atan2 __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_atan2f
 #  define __DECL_SIMD_atan2f __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_log10
+#  define __DECL_SIMD_log10 __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_log10f
+#  define __DECL_SIMD_log10f __DECL_SIMD_x86_64
 
 # endif
 #endif
diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
index 2269b74d50..f5050c68af 100644
--- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
@@ -50,6 +50,8 @@
 !GCC$ builtin (cbrtf) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (atan2) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (atan2f) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (log10) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (log10f) attributes simd (notinbranch) if('x86_64')
 
 !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
@@ -85,3 +87,5 @@
 !GCC$ builtin (cbrtf) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (atan2) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (atan2f) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (log10) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (log10f) attributes simd (notinbranch) if('x32')
diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
index 96a40856fa..ba37044e9d 100644
--- a/sysdeps/x86_64/fpu/Makeconfig
+++ b/sysdeps/x86_64/fpu/Makeconfig
@@ -35,6 +35,7 @@ libmvec-funcs = \
   expm1 \
   hypot \
   log \
+  log10 \
   pow \
   sin \
   sincos \
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index f58c98eb45..8beaf0736f 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -22,6 +22,7 @@ libmvec {
     _ZGVbN2v_exp10; _ZGVcN4v_exp10; _ZGVdN4v_exp10; _ZGVeN8v_exp10;
     _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
     _ZGVbN2v_expm1; _ZGVcN4v_expm1; _ZGVdN4v_expm1; _ZGVeN8v_expm1;
+    _ZGVbN2v_log10; _ZGVcN4v_log10; _ZGVdN4v_log10; _ZGVeN8v_log10;
     _ZGVbN2v_sinh; _ZGVcN4v_sinh; _ZGVdN4v_sinh; _ZGVeN8v_sinh;
     _ZGVbN2vv_atan2; _ZGVcN4vv_atan2; _ZGVdN4vv_atan2; _ZGVeN8vv_atan2;
     _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
@@ -33,6 +34,7 @@ libmvec {
     _ZGVbN4v_exp10f; _ZGVcN8v_exp10f; _ZGVdN8v_exp10f; _ZGVeN16v_exp10f;
     _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
     _ZGVbN4v_expm1f; _ZGVcN8v_expm1f; _ZGVdN8v_expm1f; _ZGVeN16v_expm1f;
+    _ZGVbN4v_log10f; _ZGVcN8v_log10f; _ZGVdN8v_log10f; _ZGVeN16v_log10f;
     _ZGVbN4v_sinhf; _ZGVcN8v_sinhf; _ZGVdN8v_sinhf; _ZGVeN16v_sinhf;
     _ZGVbN4vv_atan2f; _ZGVcN8vv_atan2f; _ZGVdN8vv_atan2f; _ZGVeN16vv_atan2f;
     _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf;
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 6f59c61756..b0cd9d60ea 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -1641,6 +1641,26 @@ float: 2
 float128: 1
 ldouble: 1
 
+Function: "log10_vlen16":
+float: 1
+
+Function: "log10_vlen2":
+double: 1
+
+Function: "log10_vlen4":
+double: 1
+float: 1
+
+Function: "log10_vlen4_avx2":
+double: 1
+
+Function: "log10_vlen8":
+double: 1
+float: 1
+
+Function: "log10_vlen8_avx2":
+float: 1
+
 Function: "log1p":
 double: 1
 float: 1
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log102_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log102_core-sse2.S
new file mode 100644
index 0000000000..e654db6d6c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log102_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized log10, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN2v_log10 _ZGVbN2v_log10_sse2
+#include "../svml_d_log102_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log102_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log102_core.c
new file mode 100644
index 0000000000..1c775f33b6
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log102_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized log10, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN2v_log10
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN2v_log10, __GI__ZGVbN2v_log10, __redirect__ZGVbN2v_log10)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log102_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log102_core_sse4.S
new file mode 100644
index 0000000000..33372f576f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log102_core_sse4.S
@@ -0,0 +1,1089 @@
+/* Function log10 vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
+ *    R = Rcp*mantissa(x) - 1.0
+ *    log10(x) = k*log10(2.0) - log10(Rcp) + poly_approximation(R)
+ *       log10(Rcp) is tabulated
+ *
+ *
+ */
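+
+/* Worked example of the decomposition above (illustration only):
+ *   x = 6.0 = 2^2 * 1.5, so k = 2 and mantissa(x) = 1.5; with
+ *   Rcp ~ 1/1.5 rounded to 1+9 mantissa bits, R = Rcp*1.5 - 1.0 is small,
+ *   and log10(6.0) = 2*log10(2.0) - log10(Rcp) + log10(1.0+R) ~ 0.7781513,
+ *   where the last term is what poly_approximation(R) computes.
+ */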
+
+/* Offsets for data table __svml_dlog10_data_internal
+ */
+#define Log_HA_table                  	0
+#define Log_LA_table                  	4112
+#define poly_coeff                    	8224
+#define ExpMask                       	8304
+#define Two10                         	8320
+#define MinNorm                       	8336
+#define MaxNorm                       	8352
+#define HalfMask                      	8368
+#define One                           	8384
+#define Threshold                     	8400
+#define Bias                          	8416
+#define Bias1                         	8432
+#define L2                            	8448
+
+/* Lookup bias for data table __svml_dlog10_data_internal.  */
+#define Table_Lookup_Bias               -0x406ff0
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN2v_log10_sse4)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $64, %rsp
+
+/* exponent bits */
+        movaps    %xmm0, %xmm5
+
+/* preserve mantissa, set input exponent to 2^(-10) */
+        movups    ExpMask+__svml_dlog10_data_internal(%rip), %xmm1
+        psrlq     $20, %xmm5
+        andps     %xmm0, %xmm1
+        lea       Table_Lookup_Bias+__svml_dlog10_data_internal(%rip), %rsi
+        orps      Two10+__svml_dlog10_data_internal(%rip), %xmm1
+
+/* check range */
+        movaps    %xmm0, %xmm8
+
+/* reciprocal approximation good to at least 11 bits */
+        cvtpd2ps  %xmm1, %xmm2
+        cmpltpd   MinNorm+__svml_dlog10_data_internal(%rip), %xmm8
+        movlhps   %xmm2, %xmm2
+        movaps    %xmm0, %xmm7
+        rcpps     %xmm2, %xmm3
+        cmpnlepd  MaxNorm+__svml_dlog10_data_internal(%rip), %xmm7
+        cvtps2pd  %xmm3, %xmm12
+
+/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
+        movups    .FLT_12(%rip), %xmm4
+        orps      %xmm7, %xmm8
+        addpd     %xmm4, %xmm12
+
+/* combine and get argument value range mask */
+        movmskpd  %xmm8, %edx
+
+/* argument reduction */
+        movups    HalfMask+__svml_dlog10_data_internal(%rip), %xmm9
+        subpd     %xmm4, %xmm12
+        andps     %xmm1, %xmm9
+
+/*
+ * prepare table index
+ * table lookup
+ */
+        movaps    %xmm12, %xmm10
+        subpd     %xmm9, %xmm1
+        mulpd     %xmm12, %xmm9
+        mulpd     %xmm12, %xmm1
+        subpd     One+__svml_dlog10_data_internal(%rip), %xmm9
+        addpd     %xmm9, %xmm1
+
+/* polynomial */
+        movups    poly_coeff+__svml_dlog10_data_internal(%rip), %xmm14
+        psrlq     $40, %xmm10
+        mulpd     %xmm1, %xmm14
+        movd      %xmm10, %eax
+        pshufd    $2, %xmm10, %xmm11
+        movaps    %xmm1, %xmm10
+        movups    poly_coeff+32+__svml_dlog10_data_internal(%rip), %xmm15
+        mulpd     %xmm1, %xmm10
+        addpd     poly_coeff+16+__svml_dlog10_data_internal(%rip), %xmm14
+        mulpd     %xmm1, %xmm15
+        mulpd     %xmm10, %xmm14
+        addpd     poly_coeff+48+__svml_dlog10_data_internal(%rip), %xmm15
+        movd      %xmm11, %ecx
+
+/* exponent*log10(2.0) */
+        movups    Threshold+__svml_dlog10_data_internal(%rip), %xmm13
+        addpd     %xmm14, %xmm15
+        cmpltpd   %xmm12, %xmm13
+        mulpd     %xmm15, %xmm10
+        pshufd    $221, %xmm5, %xmm6
+        movups    poly_coeff+64+__svml_dlog10_data_internal(%rip), %xmm11
+
+/* biased exponent in DP format */
+        cvtdq2pd  %xmm6, %xmm3
+        mulpd     %xmm1, %xmm11
+        andps     Bias+__svml_dlog10_data_internal(%rip), %xmm13
+        orps      Bias1+__svml_dlog10_data_internal(%rip), %xmm13
+        subpd     %xmm13, %xmm3
+        addpd     %xmm10, %xmm11
+        mulpd     L2+__svml_dlog10_data_internal(%rip), %xmm3
+        movslq    %eax, %rax
+        movslq    %ecx, %rcx
+        movsd     (%rsi,%rax), %xmm2
+        movhpd    (%rsi,%rcx), %xmm2
+
+/* reconstruction */
+        addpd     %xmm11, %xmm2
+        addpd     %xmm2, %xmm3
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx xmm0 xmm3
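+
+/* Mathematically, the main path above amounts to the following sketch
+ * (round_to_1p9_bits and poly are hypothetical helpers, shown only to make
+ * the reduction explicit):
+ *
+ *   int k; double m = frexp (x, &k);       // x = m * 2^k, m in [0.5, 1)
+ *   double rcp = round_to_1p9_bits (1.0 / m);
+ *   double R = rcp * m - 1.0;
+ *   return k * log10 (2.0) - log10 (rcp) + poly (R);
+ *
+ * where -log10 (rcp) comes from the lookup table and poly (R) approximates
+ * log10 (1.0 + R).
+ */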
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movaps    %xmm3, %xmm0
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm0, 32(%rsp)
+        movups    %xmm3, 48(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $2, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm3
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 xmm3
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      log10@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 48(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVbN2v_log10_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_dlog10_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(16)) VUINT32 Log_HA_table[(1<<9)+2][2];
+        __declspec(align(16)) VUINT32 Log_LA_table[(1<<9)+1][2];
+        __declspec(align(16)) VUINT32 poly_coeff[5][2][2];
+        __declspec(align(16)) VUINT32 ExpMask[2][2];
+        __declspec(align(16)) VUINT32 Two10[2][2];
+        __declspec(align(16)) VUINT32 MinNorm[2][2];
+        __declspec(align(16)) VUINT32 MaxNorm[2][2];
+        __declspec(align(16)) VUINT32 HalfMask[2][2];
+        __declspec(align(16)) VUINT32 One[2][2];
+        __declspec(align(16)) VUINT32 Threshold[2][2];
+        __declspec(align(16)) VUINT32 Bias[2][2];
+        __declspec(align(16)) VUINT32 Bias1[2][2];
+        __declspec(align(16)) VUINT32 L2[2][2];
+} __svml_dlog10_data_internal;
+#endif
+__svml_dlog10_data_internal:
+        /* Log_HA_table */
+        .quad 0xc0733a7146f6b080, 0xbe1e707ce619c200
+        .quad 0xc0733a7547771970, 0xbe1e79c6c06d6f51
+        .quad 0xc0733a7945aacb70, 0xbe1e78e225fad29c
+        .quad 0xc0733a7d41946970, 0xbe1e76d607f9693b
+        .quad 0xc0733a813b3691f0, 0xbe1e7704b3e0685b
+        .quad 0xc0733a853293df00, 0xbe1e79c1216a27fa
+        .quad 0xc0733a8927aee660, 0xbe1e76dce5734a81
+        .quad 0xc0733a8d1a8a3920, 0xbe1e782ee2ca4dba
+        .quad 0xc0733a910b286430, 0xbe1e7812d1a0a61f
+        .quad 0xc0733a94f98bf010, 0xbe1e77e1b5ecbc61
+        .quad 0xc0733a98e5b76100, 0xbe1e76635cac1586
+        .quad 0xc0733a9ccfad36f0, 0xbe1e7638f7968f32
+        .quad 0xc0733aa0b76feda0, 0xbe1e7840ee76e365
+        .quad 0xc0733aa49d01fcb0, 0xbe1e79f3fd01907e
+        .quad 0xc0733aa88065d7a0, 0xbe1e77bbb3a9c38a
+        .quad 0xc0733aac619dedb0, 0xbe1e7742719bf41d
+        .quad 0xc0733ab040acaa20, 0xbe1e79bcedaf79cb
+        .quad 0xc0733ab41d947450, 0xbe1e762d63cb7ca0
+        .quad 0xc0733ab7f857af50, 0xbe1e77a07be83403
+        .quad 0xc0733abbd0f8ba80, 0xbe1e7763ff836ad0
+        .quad 0xc0733abfa779f130, 0xbe1e7737720ead39
+        .quad 0xc0733ac37bddaad0, 0xbe1e7776a08e55e7
+        .quad 0xc0733ac74e263af0, 0xbe1e793e3c52dd36
+        .quad 0xc0733acb1e55f160, 0xbe1e788a94695051
+        .quad 0xc0733aceec6f1a10, 0xbe1e76508114a813
+        .quad 0xc0733ad2b873fd20, 0xbe1e76909457d23e
+        .quad 0xc0733ad68266df10, 0xbe1e7664a24f9ca4
+        .quad 0xc0733ada4a4a0090, 0xbe1e7a07b3d44b18
+        .quad 0xc0733ade101f9ee0, 0xbe1e76d87594704d
+        .quad 0xc0733ae1d3e9f340, 0xbe1e79563595a182
+        .quad 0xc0733ae595ab33b0, 0xbe1e771880c3c6ab
+        .quad 0xc0733ae955659250, 0xbe1e78c171f517d4
+        .quad 0xc0733aed131b3df0, 0xbe1e77eac3874666
+        .quad 0xc0733af0cece61b0, 0xbe1e790db479d8f6
+        .quad 0xc0733af488812550, 0xbe1e7965d1aa5c90
+        .quad 0xc0733af84035ad10, 0xbe1e78ceb398ba47
+        .quad 0xc0733afbf5ee19c0, 0xbe1e779cc0dcb5aa
+        .quad 0xc0733affa9ac88c0, 0xbe1e7871053953ed
+        .quad 0xc0733b035b731420, 0xbe1e7a082cffa71a
+        .quad 0xc0733b070b43d2a0, 0xbe1e7904b4382fad
+        .quad 0xc0733b0ab920d790, 0xbe1e79b458d0b4f3
+        .quad 0xc0733b0e650c3310, 0xbe1e79d0ded414c6
+        .quad 0xc0733b120f07f200, 0xbe1e763c357a1943
+        .quad 0xc0733b15b7161dd0, 0xbe1e78b80ba6daaa
+        .quad 0xc0733b195d38bd00, 0xbe1e7998e23b8ffd
+        .quad 0xc0733b1d0171d2c0, 0xbe1e7974aa65ee8c
+        .quad 0xc0733b20a3c35f20, 0xbe1e76ccfde752ab
+        .quad 0xc0733b24442f5ef0, 0xbe1e77b4ff19debb
+        .quad 0xc0733b27e2b7cc10, 0xbe1e7772ee478542
+        .quad 0xc0733b2b7f5e9d30, 0xbe1e781d81b58b44
+        .quad 0xc0733b2f1a25c600, 0xbe1e78350d967565
+        .quad 0xc0733b32b30f3720, 0xbe1e783888e48152
+        .quad 0xc0733b364a1cde30, 0xbe1e78367bf7c111
+        .quad 0xc0733b39df50a5d0, 0xbe1e7959e57ca47d
+        .quad 0xc0733b3d72ac75c0, 0xbe1e777322423222
+        .quad 0xc0733b41043232b0, 0xbe1e767ce42a60aa
+        .quad 0xc0733b4493e3be70, 0xbe1e781d445aea19
+        .quad 0xc0733b4821c2f800, 0xbe1e7922fca18e18
+        .quad 0xc0733b4badd1bb80, 0xbe1e76fed3d40647
+        .quad 0xc0733b4f3811e210, 0xbe1e793948c9eabc
+        .quad 0xc0733b52c0854240, 0xbe1e76e487656b8c
+        .quad 0xc0733b56472daf90, 0xbe1e780ab2f71223
+        .quad 0xc0733b59cc0cfaf0, 0xbe1e77189120b09c
+        .quad 0xc0733b5d4f24f270, 0xbe1e7644a0343a12
+        .quad 0xc0733b60d0776160, 0xbe1e78f2a3e4733d
+        .quad 0xc0733b6450061080, 0xbe1e7913b2f73ae5
+        .quad 0xc0733b67cdd2c5c0, 0xbe1e7882d08393b5
+        .quad 0xc0733b6b49df4470, 0xbe1e765e1b209979
+        .quad 0xc0733b6ec42d4d20, 0xbe1e785c9c4620d4
+        .quad 0xc0733b75b394f240, 0xbe1e78878cd0e956
+        .quad 0xc0733b7c9c178630, 0xbe1e789a4112d90b
+        .quad 0xc0733b837dc2b0f0, 0xbe1e79050b8a1766
+        .quad 0xc0733b8a58a3f220, 0xbe1e7790dffc47aa
+        .quad 0xc0733b912cc8a180, 0xbe1e77174593b06a
+        .quad 0xc0733b97fa3defb0, 0xbe1e7677de2d2ecc
+        .quad 0xc0733b9ec110e6b0, 0xbe1e76cff477ca18
+        .quad 0xc0733ba5814e6a80, 0xbe1e78f8644dec7b
+        .quad 0xc0733bac3b0339d0, 0xbe1e764e1361788d
+        .quad 0xc0733bb2ee3bee30, 0xbe1e78c913e738de
+        .quad 0xc0733bb99b04fd30, 0xbe1e76666f5bddaa
+        .quad 0xc0733bc0416ab850, 0xbe1e77e87cbd8ab6
+        .quad 0xc0733bc6e1794e10, 0xbe1e76f18ba1c966
+        .quad 0xc0733bcd7b3cca10, 0xbe1e777c9461b8db
+        .quad 0xc0733bd40ec115d0, 0xbe1e78b78526ffac
+        .quad 0xc0733bda9c11f920, 0xbe1e7942abecfede
+        .quad 0xc0733be1233b1aa0, 0xbe1e76d8a684fd8c
+        .quad 0xc0733be7a4480010, 0xbe1e79622b539ac9
+        .quad 0xc0733bee1f440f30, 0xbe1e7978e7cc20ea
+        .quad 0xc0733bf4943a8de0, 0xbe1e765c9c9de825
+        .quad 0xc0733bfb0336a290, 0xbe1e775d8b138ee2
+        .quad 0xc0733c016c435500, 0xbe1e78bf33465c2f
+        .quad 0xc0733c07cf6b8e80, 0xbe1e78164f7cc441
+        .quad 0xc0733c0e2cba1a50, 0xbe1e7824e64d0b23
+        .quad 0xc0733c148439a630, 0xbe1e78373ae7dd81
+        .quad 0xc0733c1ad5f4c2c0, 0xbe1e7704513e0afe
+        .quad 0xc0733c2121f5e3d0, 0xbe1e7914aa84200f
+        .quad 0xc0733c2768476110, 0xbe1e76b1cde25cf6
+        .quad 0xc0733c2da8f37600, 0xbe1e796120e3862d
+        .quad 0xc0733c33e40442e0, 0xbe1e78ec836d7e7b
+        .quad 0xc0733c3a1983cca0, 0xbe1e77fb13b7dabb
+        .quad 0xc0733c40497bfd70, 0xbe1e783c6fcb2404
+        .quad 0xc0733c4673f6a530, 0xbe1e7628bb93dce8
+        .quad 0xc0733c4c98fd7990, 0xbe1e7857a47b5001
+        .quad 0xc0733c52b89a16d0, 0xbe1e76708dc2831f
+        .quad 0xc0733c58d2d5ffa0, 0xbe1e77b6038651f1
+        .quad 0xc0733c5ee7ba9de0, 0xbe1e792e855bb5b2
+        .quad 0xc0733c64f75142d0, 0xbe1e776cacd5c105
+        .quad 0xc0733c6b01a32740, 0xbe1e77f8a8011315
+        .quad 0xc0733c7106b96c30, 0xbe1e765cf3efcfde
+        .quad 0xc0733c77069d1ad0, 0xbe1e78d837d2efac
+        .quad 0xc0733c7d01572530, 0xbe1e78b615cf772c
+        .quad 0xc0733c82f6f06640, 0xbe1e7650bbbd7a25
+        .quad 0xc0733c88e771a220, 0xbe1e78bcf3495872
+        .quad 0xc0733c8ed2e386c0, 0xbe1e792266832e84
+        .quad 0xc0733c94b94eabd0, 0xbe1e79c1c3c2ca52
+        .quad 0xc0733c9a9abb9340, 0xbe1e78aa61e5807d
+        .quad 0xc0733ca07732a970, 0xbe1e7620fc4cf156
+        .quad 0xc0733ca64ebc4570, 0xbe1e76b914a832c5
+        .quad 0xc0733cac2160a970, 0xbe1e79227f72020e
+        .quad 0xc0733cb1ef280300, 0xbe1e77ac972cc008
+        .quad 0xc0733cb7b81a6b10, 0xbe1e798089be41f4
+        .quad 0xc0733cbd7c3fe6a0, 0xbe1e77942ae037fe
+        .quad 0xc0733cc33ba06690, 0xbe1e7956ae6463d9
+        .quad 0xc0733cc8f643c850, 0xbe1e7918a50c7942
+        .quad 0xc0733cceac31d5d0, 0xbe1e78308eeab604
+        .quad 0xc0733cd45d7245e0, 0xbe1e76dd4ea88445
+        .quad 0xc0733cda0a0cbc60, 0xbe1e77e7c1aa5909
+        .quad 0xc0733cdfb208caa0, 0xbe1e7804b9d20e54
+        .quad 0xc0733ce5556def70, 0xbe1e78f88e99d49c
+        .quad 0xc0733ceaf4439780, 0xbe1e787d74682d68
+        .quad 0xc0733cf08e911d80, 0xbe1e76edc24fe6e7
+        .quad 0xc0733cf6245dca50, 0xbe1e79b347ec86d2
+        .quad 0xc0733cfbb5b0d580, 0xbe1e797cceb2c39b
+        .quad 0xc0733d0142916530, 0xbe1e783adbdc6aa1
+        .quad 0xc0733d06cb068e70, 0xbe1e76e4c20e3d9e
+        .quad 0xc0733d0c4f175570, 0xbe1e77070bf3cf61
+        .quad 0xc0733d11cecaadc0, 0xbe1e781c43502734
+        .quad 0xc0733d174a277a80, 0xbe1e78b11268ea72
+        .quad 0xc0733d1cc1348e90, 0xbe1e7754b83bfc7d
+        .quad 0xc0733d2233f8acb0, 0xbe1e7756c29bf5e9
+        .quad 0xc0733d27a27a87d0, 0xbe1e7952fc1d9333
+        .quad 0xc0733d2d0cc0c350, 0xbe1e778c76ae6077
+        .quad 0xc0733d3272d1f2e0, 0xbe1e7a1896ba8f43
+        .quad 0xc0733d37d4b49b30, 0xbe1e76dafdf432d8
+        .quad 0xc0733d3d326f3180, 0xbe1e795330184013
+        .quad 0xc0733d428c081c80, 0xbe1e763cc774d30f
+        .quad 0xc0733d47e185b3d0, 0xbe1e77030a779c0a
+        .quad 0xc0733d4d32ee40b0, 0xbe1e7908af2a2d7e
+        .quad 0xc0733d528047fe00, 0xbe1e78c4953b797d
+        .quad 0xc0733d57c9991850, 0xbe1e78b43b096579
+        .quad 0xc0733d5d0ee7ae30, 0xbe1e7824ae0a4804
+        .quad 0xc0733d625039d040, 0xbe1e79d2b2fbb740
+        .quad 0xc0733d678d958190, 0xbe1e7662de59a1a6
+        .quad 0xc0733d6cc700b760, 0xbe1e76b251d59aaa
+        .quad 0xc0733d71fc8159b0, 0xbe1e7a00cfd1f487
+        .quad 0xc0733d772e1d4360, 0xbe1e77f4d246167e
+        .quad 0xc0733d7c5bda4200, 0xbe1e767a4ee8e6fc
+        .quad 0xc0733d8185be1640, 0xbe1e777ccf0a8aed
+        .quad 0xc0733d86abce7420, 0xbe1e767d7e279ada
+        .quad 0xc0733d8bce1102d0, 0xbe1e7a05cef4bb90
+        .quad 0xc0733d90ec8b5d40, 0xbe1e78f75369be5b
+        .quad 0xc0733d96074311d0, 0xbe1e77b9612e8c8a
+        .quad 0xc0733d9b1e3da2b0, 0xbe1e794518b9adeb
+        .quad 0xc0733da031808620, 0xbe1e7810626fb934
+        .quad 0xc0733da541112650, 0xbe1e76d87223fa6d
+        .quad 0xc0733daa4cf4e1a0, 0xbe1e794c5e7ca3b5
+        .quad 0xc0733daf55310af0, 0xbe1e789856ef816f
+        .quad 0xc0733db459cae970, 0xbe1e77d2004effbd
+        .quad 0xc0733db95ac7b8f0, 0xbe1e78467d31eb9c
+        .quad 0xc0733dbe582caa00, 0xbe1e79aaa4e25787
+        .quad 0xc0733dc351fee220, 0xbe1e762de8f107bf
+        .quad 0xc0733dc848437b90, 0xbe1e7670670a63fe
+        .quad 0xc0733dcd3aff85d0, 0xbe1e795ca237c6cc
+        .quad 0xc0733dd22a3805b0, 0xbe1e77e55c53c1d9
+        .quad 0xc0733dd715f1f520, 0xbe1e78a806213ac4
+        .quad 0xc0733ddbfe3243b0, 0xbe1e77743a2bc615
+        .quad 0xc0733de0e2fdd660, 0xbe1e78b8b45b0b7d
+        .quad 0xc0733de5c4598800, 0xbe1e78d635f2f4b9
+        .quad 0xc0733deaa24a2920, 0xbe1e7758c396a11e
+        .quad 0xc0733def7cd48020, 0xbe1e7a17a8cc454c
+        .quad 0xc0733df453fd49a0, 0xbe1e783caa73f616
+        .quad 0xc0733df927c93820, 0xbe1e7932cfa29664
+        .quad 0xc0733dfdf83cf490, 0xbe1e777d265c72a6
+        .quad 0xc0733e02c55d1e10, 0xbe1e7775e7c03c60
+        .quad 0xc0733e078f2e4a40, 0xbe1e79f65d52d232
+        .quad 0xc0733e0c55b50570, 0xbe1e76e7e7464b4e
+        .quad 0xc0733e1118f5d250, 0xbe1e77be81cad877
+        .quad 0xc0733e15d8f52a80, 0xbe1e79dd25b5fb3a
+        .quad 0xc0733e1a95b77e80, 0xbe1e78e45f1418ef
+        .quad 0xc0733e1f4f4135a0, 0xbe1e78eb7289505b
+        .quad 0xc0733e240596ae50, 0xbe1e78a468c07cad
+        .quad 0xc0733e28b8bc3e20, 0xbe1e776b558a4009
+        .quad 0xc0733e2d68b631d0, 0xbe1e77412eb9941e
+        .quad 0xc0733e321588cd80, 0xbe1e76b2853f845e
+        .quad 0xc0733e36bf384cb0, 0xbe1e76aa7184273c
+        .quad 0xc0733e3b65c8e260, 0xbe1e7832027f78fa
+        .quad 0xc0733e40093eb930, 0xbe1e7a1c7da131f5
+        .quad 0xc0733e44a99df380, 0xbe1e76a0bc2ae4bc
+        .quad 0xc0733e4946eaab30, 0xbe1e78dff13b6f5d
+        .quad 0xc0733e4de128f250, 0xbe1e765a226dea2c
+        .quad 0xc0733e52785cd290, 0xbe1e78509b989111
+        .quad 0xc0733e570c8a4de0, 0xbe1e7916a4e9803d
+        .quad 0xc0733e5b9db55e30, 0xbe1e7950c15758cc
+        .quad 0xc0733e602be1f5a0, 0xbe1e7922ba1ad420
+        .quad 0xc0733e64b713fe90, 0xbe1e794cbaabcef6
+        .quad 0xc0733e693f4f5bc0, 0xbe1e7837bf883fed
+        .quad 0xc0733e6dc497e850, 0xbe1e76f198ddbbdf
+        .quad 0xc0733e7246f177d0, 0xbe1e7a18c1067764
+        .quad 0xc0733e76c65fd6a0, 0xbe1e76b845a8fd9d
+        .quad 0xc0733e7b42e6c970, 0xbe1e7714012df506
+        .quad 0xc0733e7fbc8a0de0, 0xbe1e7765612922cd
+        .quad 0xc0733e84334d5a50, 0xbe1e7688f5424a00
+        .quad 0xc0733e88a7345df0, 0xbe1e769d011f6663
+        .quad 0xc0733e8d1842c0e0, 0xbe1e79914acbfaf7
+        .quad 0xc0733e91867c2460, 0xbe1e79a85e189bd7
+        .quad 0xc0733e95f1e422a0, 0xbe1e79ea7c726432
+        .quad 0xc0733e9a5a7e4f10, 0xbe1e768a6fbb8e6e
+        .quad 0xc0733e9ec04e3620, 0xbe1e793c75bcc9fc
+        .quad 0xc0733ea323575dd0, 0xbe1e797f78da13d4
+        .quad 0xc0733ea7839d4550, 0xbe1e78d8c9cda978
+        .quad 0xc0733eabe1236540, 0xbe1e77028d480fff
+        .quad 0xc0733eb03bed2fa0, 0xbe1e7a0d0f74ff7c
+        .quad 0xc0733eb493fe1040, 0xbe1e76732e8a35fb
+        .quad 0xc0733eb8e9596c30, 0xbe1e77220caeabeb
+        .quad 0xc0733ebd3c02a260, 0xbe1e797438b645ef
+        .quad 0xc0733ec18bfd0b80, 0xbe1e79207c5fd6e8
+        .quad 0xc0733ec5d94bf9f0, 0xbe1e781c7df8f946
+        .quad 0xc0733eca23f2b9f0, 0xbe1e76736284e2db
+        .quad 0xc0733ece6bf49190, 0xbe1e7a109cc0c3f5
+        .quad 0xc0733ed2b154c120, 0xbe1e767f14a16d50
+        .quad 0xc0733ed6f4168290, 0xbe1e789cd22acaf0
+        .quad 0xc0733edb343d0a40, 0xbe1e764355ca28ad
+        .quad 0xc0733edf71cb8660, 0xbe1e79e4c7a81c45
+        .quad 0xc0733ee3acc51fb0, 0xbe1e761e26b644c2
+        .quad 0xc0733ee7e52cf8c0, 0xbe1e793e9f8fbdd3
+        .quad 0xc0733eec1b062ed0, 0xbe1e78c432991c20
+        .quad 0xc0733ef04e53d940, 0xbe1e78cdd025f4d8
+        .quad 0xc0733ef47f1909f0, 0xbe1e778310c6446e
+        .quad 0xc0733ef8ad58cd20, 0xbe1e7871af3d6e17
+        .quad 0xc0733efcd91629b0, 0xbe1e77e0e906f697
+        .quad 0xc0733f01025420f0, 0xbe1e7a1ae9b27892
+        .quad 0xc0733f052915af00, 0xbe1e76ac64c88f9d
+        .quad 0xc0733f094d5dca60, 0xbe1e779a815589c4
+        .quad 0xc0733f0d6f2f6480, 0xbe1e788f39a4864c
+        .quad 0xc0733f118e8d6980, 0xbe1e79fc51263525
+        .quad 0xc0733f15ab7ac060, 0xbe1e783501f19e90
+        .quad 0xc0733f19c5fa4ae0, 0xbe1e767e82c327ab
+        .quad 0xc0733f1dde0ee5a0, 0xbe1e7a1785d66123
+        .quad 0xc0733f21f3bb6870, 0xbe1e7936d07203da
+        .quad 0xc0733f260702a5e0, 0xbe1e7a010a7ac699
+        .quad 0xc0733f2a17e76bb0, 0xbe1e7975e4e16312
+        .quad 0xc0733f2e266c82b0, 0xbe1e7654b5422330
+        .quad 0xc0733f323294aeb0, 0xbe1e77f8a4909d35
+        .quad 0xc0733f363c62aee0, 0xbe1e792c8e30d226
+        .quad 0xc0733f3a43d93da0, 0xbe1e76f6ac67a1ff
+        .quad 0xc0733f3e48fb1070, 0xbe1e775c2e97715a
+        .quad 0xc0733f424bcad840, 0xbe1e781cd54ae100
+        /*== Log_LA_table ==*/
+        .align 16
+        .quad 0x0000000000000000
+        .quad 0xbf4bc48a867884b7
+        .quad 0xbf5bbd9e9482af09
+        .quad 0xbf64c9096b94befd
+        .quad 0xbf6bafd47221ed26
+        .quad 0xbf714999e2ad8ea6
+        .quad 0xbf74b99563d2a1bd
+        .quad 0xbf7827de6b310350
+        .quad 0xbf7b9476a4fcd10f
+        .quad 0xbf7eff5fbaf25781
+        .quad 0xbf81344daa2d7553
+        .quad 0xbf82e8158b08d957
+        .quad 0xbf849b0851443684
+        .quad 0xbf864d26cce610dd
+        .quad 0xbf87fe71ccc4e6b0
+        .quad 0xbf89aeea1e897fdf
+        .quad 0xbf8b5e908eb13790
+        .quad 0xbf8d0d65e890405a
+        .quad 0xbf8ebb6af653e2ee
+        .quad 0xbf90345040825bad
+        .quad 0xbf910a83a8446c78
+        .quad 0xbf91e05015d30a71
+        .quad 0xbf92b5b5ec0209d3
+        .quad 0xbf938ab58d173e91
+        .quad 0xbf945f4f5acb8be0
+        .quad 0xbf953383b64bf13f
+        .quad 0xbf960753003a94ef
+        .quad 0xbf96dabd98afcc05
+        .quad 0xbf97adc3df3b1ff8
+        .quad 0xbf98806632e451d0
+        .quad 0xbf9952a4f22c5ae9
+        .quad 0xbf9a24807b0e6b5c
+        .quad 0xbf9af5f92b00e610
+        .quad 0xbf9bc70f5ef65a77
+        .quad 0xbf9c97c3735e7c0a
+        .quad 0xbf9d6815c4271775
+        .quad 0xbf9e3806acbd058f
+        .quad 0xbf9f0796880d1c19
+        .quad 0xbf9fd6c5b0851c4c
+        .quad 0xbfa052ca400a4f9b
+        .quad 0xbfa0ba01a8170000
+        .quad 0xbfa121093ce3a205
+        .quad 0xbfa187e12aad8077
+        .quad 0xbfa1ee899d74a03e
+        .quad 0xbfa25502c0fc314c
+        .quad 0xbfa2bb4cc0cafe8d
+        .quad 0xbfa32167c82bdcda
+        .quad 0xbfa38754022e18e2
+        .quad 0xbfa3ed1199a5e425
+        .quad 0xbfa452a0b92cc0ec
+        .quad 0xbfa4b8018b21ed4f
+        .quad 0xbfa51d3439aacd4a
+        .quad 0xbfa58238eeb353da
+        .quad 0xbfa5e70fd3ee6b34
+        .quad 0xbfa64bb912d65c07
+        .quad 0xbfa6b034d4ad33df
+        .quad 0xbfa71483427d2a99
+        .quad 0xbfa778a4851906f3
+        .quad 0xbfa7dc98c51c8242
+        .quad 0xbfa840602aecab3d
+        .quad 0xbfa8a3fadeb847f4
+        .quad 0xbfa90769087836e4
+        .quad 0xbfa96aaacfefcf3c
+        .quad 0xbfa9cdc05cad4042
+        .quad 0xbfaa30a9d609efea
+        .quad 0xbfaa9367632ad897
+        .quad 0xbfaaf5f92b00e610
+        .quad 0xbfab585f544951a4
+        .quad 0xbfabba9a058dfd84
+        .quad 0xbfac1ca96525cf56
+        .quad 0xbfac7e8d993509f9
+        .quad 0xbface046c7ada68d
+        .quad 0xbfad41d5164facb4
+        .quad 0xbfada338aaa98a0c
+        .quad 0xbfae0471aa1868f5
+        .quad 0xbfae658039c88690
+        .quad 0xbfaec6647eb58808
+        .quad 0xbfaf271e9daacf20
+        .quad 0xbfaf87aebb43ce06
+        .quad 0xbfafe814fbec5a77
+        .quad 0xbfb02428c1f08016
+        .quad 0xbfb054323b97a948
+        .quad 0xbfb08426fcdb1ee7
+        .quad 0xbfb0b40717932b96
+        .quad 0xbfb0e3d29d81165e
+        .quad 0xbfb11389a04f4a2e
+        .quad 0xbfb1432c31917d08
+        .quad 0xbfb172ba62c4d6de
+        .quad 0xbfb1a23445501816
+        .quad 0xbfb1d199ea83bfbe
+        .quad 0xbfb200eb639a3173
+        .quad 0xbfb23028c1b7daed
+        .quad 0xbfb25f5215eb594a
+        .quad 0xbfb28e67712d9dfc
+        .quad 0xbfb2bd68e4621371
+        .quad 0xbfb2ec568056c16f
+        .quad 0xbfb31b3055c47118
+        .quad 0xbfb349f6754ed0b4
+        .quad 0xbfb378a8ef84971e
+        .quad 0xbfb3a747d4dfa6f5
+        .quad 0xbfb3d5d335c53179
+        .quad 0xbfb4044b2285d925
+        .quad 0xbfb432afab5dd3ff
+        .quad 0xbfb46100e0750da1
+        .quad 0xbfb48f3ed1df48fb
+        .quad 0xbfb4bd698f9c41cf
+        .quad 0xbfb4eb812997cde4
+        .quad 0xbfb51985afa9fdfd
+        .quad 0xbfb5477731973e85
+        .quad 0xbfb57555bf1077f5
+        .quad 0xbfb5a32167b32f02
+        .quad 0xbfb5d0da3b09a47e
+        .quad 0xbfb5fe80488af4fd
+        .quad 0xbfb62c139f9b3837
+        .quad 0xbfb659944f8ba02d
+        .quad 0xbfb68702679a980a
+        .quad 0xbfb6b45df6f3e2c9
+        .quad 0xbfb6e1a70cb0b99a
+        .quad 0xbfb70eddb7d7ea07
+        .quad 0xbfb73c02075df3e5
+        .quad 0xbfb769140a2526fd
+        .quad 0xbfb79613cefdc07d
+        .quad 0xbfb7c30164a60836
+        .quad 0xbfb7efdcd9ca6d8f
+        .quad 0xbfb81ca63d05a44a
+        .quad 0xbfb8495d9ce0c10c
+        .quad 0xbfb8760307d355ab
+        .quad 0xbfb8a2968c438d41
+        .quad 0xbfb8cf183886480d
+        .quad 0xbfb8fb881adf3713
+        .quad 0xbfb927e64180f790
+        .quad 0xbfb95432ba8d2e2f
+        .quad 0xbfb9806d9414a209
+        .quad 0xbfb9ac96dc175776
+        .quad 0xbfb9d8aea084aa9c
+        .quad 0xbfba04b4ef3b69d8
+        .quad 0xbfba30a9d609efea
+        .quad 0xbfba5c8d62ae3dec
+        .quad 0xbfba885fa2d6151e
+        .quad 0xbfbab420a41f1076
+        .quad 0xbfbadfd07416be07
+        .quad 0xbfbb0b6f203ab82c
+        .quad 0xbfbb36fcb5f8be8a
+        .quad 0xbfbb627942aecedd
+        .quad 0xbfbb8de4d3ab3d98
+        .quad 0xbfbbb93f762cce4f
+        .quad 0xbfbbe4893762cbf7
+        .quad 0xbfbc0fc2246d20f5
+        .quad 0xbfbc3aea4a5c6eff
+        .quad 0xbfbc6601b63226cb
+        .quad 0xbfbc910874e09f98
+        .quad 0xbfbcbbfe934b2e81
+        .quad 0xbfbce6e41e463da5
+        .quad 0xbfbd11b92297632b
+        .quad 0xbfbd3c7dacf5780b
+        .quad 0xbfbd6731ca08aeb9
+        .quad 0xbfbd91d5866aa99c
+        .quad 0xbfbdbc68eea6915b
+        .quad 0xbfbde6ec0f392b05
+        .quad 0xbfbe115ef490ee07
+        .quad 0xbfbe3bc1ab0e19fe
+        .quad 0xbfbe66143f02cc5d
+        .quad 0xbfbe9056bcb315e8
+        .quad 0xbfbeba893055100b
+        .quad 0xbfbee4aba610f204
+        .quad 0xbfbf0ebe2a0125eb
+        .quad 0xbfbf38c0c8325d86
+        .quad 0xbfbf62b38ca3a706
+        .quad 0xbfbf8c9683468191
+        .quad 0xbfbfb669b7fef1a8
+        .quad 0xbfbfe02d36a3956d
+        .quad 0xbfc004f0857edc5c
+        .quad 0xbfc019c2a064b486
+        .quad 0xbfc02e8cf1dac4b8
+        .quad 0xbfc0434f7fb1f307
+        .quad 0xbfc0580a4fb4a3df
+        .quad 0xbfc06cbd67a6c3b6
+        .quad 0xbfc08168cd45d0a9
+        .quad 0xbfc0960c8648e406
+        .quad 0xbfc0aaa89860bbcf
+        .quad 0xbfc0bf3d0937c41c
+        .quad 0xbfc0d3c9de722078
+        .quad 0xbfc0e84f1dadb526
+        .quad 0xbfc0fccccc823059
+        .quad 0xbfc11142f0811357
+        .quad 0xbfc125b18f35bb8e
+        .quad 0xbfc13a18ae256b99
+        .quad 0xbfc14e7852cf5430
+        .quad 0xbfc162d082ac9d10
+        .quad 0xbfc1772143306dc6
+        .quad 0xbfc18b6a99c7f679
+        .quad 0xbfc19fac8bda7897
+        .quad 0xbfc1b3e71ec94f7b
+        .quad 0xbfc1c81a57eff8fd
+        .quad 0xbfc1dc463ca41df8
+        .quad 0xbfc1f06ad2359abd
+        .quad 0xbfc204881dee8777
+        .quad 0xbfc2189e25134081
+        .quad 0xbfc22cacece26ead
+        .quad 0xbfc240b47a950f79
+        .quad 0xbfc254b4d35e7d3c
+        .quad 0xbfc268adfc6c773e
+        .quad 0xbfc27c9ffae729c1
+        .quad 0xbfc2908ad3f13603
+        .quad 0xbfc2a46e8ca7ba2a
+        .quad 0xbfc2b84b2a225923
+        .quad 0xbfc2cc20b1734279
+        .quad 0xbfc2dfef27a73a18
+        .quad 0xbfc2f3b691c5a001
+        .quad 0xbfc30776f4d077f7
+        .quad 0xbfc31b3055c47118
+        .quad 0xbfc32ee2b998ed6e
+        .quad 0xbfc3428e2540096d
+        .quad 0x3fc331f403985097
+        .quad 0x3fc31e56798a910a
+        .quad 0x3fc30abfd8f333b6
+        .quad 0x3fc2f7301cf4e87b
+        .quad 0x3fc2e3a740b7800f
+        .quad 0x3fc2d0253f67e4cb
+        .quad 0x3fc2bcaa14381386
+        .quad 0x3fc2a935ba5f1479
+        .quad 0x3fc295c82d18f434
+        .quad 0x3fc2826167a6bc9c
+        .quad 0x3fc26f01654e6df6
+        .quad 0x3fc25ba8215af7fc
+        .quad 0x3fc24855971c3307
+        .quad 0x3fc23509c1e6d937
+        .quad 0x3fc221c49d147fb3
+        .quad 0x3fc20e8624038fed
+        .quad 0x3fc1fb4e521740f4
+        .quad 0x3fc1e81d22b790d4
+        .quad 0x3fc1d4f291513e01
+        .quad 0x3fc1c1ce9955c0c6
+        .quad 0x3fc1aeb1363b44c8
+        .quad 0x3fc19b9a637ca295
+        .quad 0x3fc1888a1c995931
+        .quad 0x3fc175805d1587c1
+        .quad 0x3fc1627d2079e731
+        .quad 0x3fc14f806253c3ed
+        .quad 0x3fc13c8a1e34f7a0
+        .quad 0x3fc1299a4fb3e306
+        .quad 0x3fc116b0f26b67bb
+        .quad 0x3fc103ce01fae223
+        .quad 0x3fc0f0f17a062353
+        .quad 0x3fc0de1b56356b04
+        .quad 0x3fc0cb4b9235619a
+        .quad 0x3fc0b88229b71227
+        .quad 0x3fc0a5bf186fe483
+        .quad 0x3fc093025a19976c
+        .quad 0x3fc0804bea723aa9
+        .quad 0x3fc06d9bc53c2941
+        .quad 0x3fc05af1e63e03b4
+        .quad 0x3fc0484e4942aa43
+        .quad 0x3fc035b0ea19373b
+        .quad 0x3fc02319c494f951
+        .quad 0x3fc01088d48d6e03
+        .quad 0x3fbffbfc2bbc7803
+        .quad 0x3fbfd6f308ce5b52
+        .quad 0x3fbfb1f6381856f4
+        .quad 0x3fbf8d05b16a6d47
+        .quad 0x3fbf68216c9cc727
+        .quad 0x3fbf4349618fa91a
+        .quad 0x3fbf1e7d882b689a
+        .quad 0x3fbef9bdd860616b
+        .quad 0x3fbed50a4a26eafc
+        .quad 0x3fbeb062d57f4de8
+        .quad 0x3fbe8bc77271b97a
+        .quad 0x3fbe6738190e394c
+        .quad 0x3fbe42b4c16caaf3
+        .quad 0x3fbe1e3d63acb3ba
+        .quad 0x3fbdf9d1f7f5b674
+        .quad 0x3fbdd5727676c959
+        .quad 0x3fbdb11ed766abf4
+        .quad 0x3fbd8cd71303bd26
+        .quad 0x3fbd689b2193f133
+        .quad 0x3fbd446afb64c7e5
+        .quad 0x3fbd204698cb42bd
+        .quad 0x3fbcfc2df223db2d
+        .quad 0x3fbcd820ffd278f3
+        .quad 0x3fbcb41fba42686d
+        .quad 0x3fbc902a19e65111
+        .quad 0x3fbc6c4017382bea
+        .quad 0x3fbc4861aab93a23
+        .quad 0x3fbc248eccf1fba6
+        .quad 0x3fbc00c7767225cb
+        .quad 0x3fbbdd0b9fd09a10
+        .quad 0x3fbbb95b41ab5ce6
+        .quad 0x3fbb95b654a78c87
+        .quad 0x3fbb721cd17157e3
+        .quad 0x3fbb4e8eb0bbf58f
+        .quad 0x3fbb2b0beb419ad0
+        .quad 0x3fbb079479c372ad
+        .quad 0x3fbae4285509950b
+        .quad 0x3fbac0c775e2fde6
+        .quad 0x3fba9d71d5258484
+        .quad 0x3fba7a276badd2c8
+        .quad 0x3fba56e8325f5c87
+        .quad 0x3fba33b4222456f1
+        .quad 0x3fba108b33edb005
+        .quad 0x3fb9ed6d60b30612
+        .quad 0x3fb9ca5aa1729f45
+        .quad 0x3fb9a752ef316149
+        .quad 0x3fb9845642fac8f0
+        .quad 0x3fb9616495e0e1e8
+        .quad 0x3fb93e7de0fc3e80
+        .quad 0x3fb91ba21d6bef77
+        .quad 0x3fb8f8d144557bdf
+        .quad 0x3fb8d60b4ee4d901
+        .quad 0x3fb8b350364c6257
+        .quad 0x3fb8909ff3c4d191
+        .quad 0x3fb86dfa808d36a0
+        .quad 0x3fb84b5fd5eaefd8
+        .quad 0x3fb828cfed29a215
+        .quad 0x3fb8064abf9b30f1
+        .quad 0x3fb7e3d04697b704
+        .quad 0x3fb7c1607b7d7e32
+        .quad 0x3fb79efb57b0f803
+        .quad 0x3fb77ca0d49cb608
+        .quad 0x3fb75a50ebb1624a
+        .quad 0x3fb7380b9665b7c8
+        .quad 0x3fb715d0ce367afc
+        .quad 0x3fb6f3a08ca67270
+        .quad 0x3fb6d17acb3e5f5e
+        .quad 0x3fb6af5f838cf654
+        .quad 0x3fb68d4eaf26d7ee
+        .quad 0x3fb66b4847a68997
+        .quad 0x3fb6494c46ac6e4d
+        .quad 0x3fb6275aa5debf81
+        .quad 0x3fb605735ee985f1
+        .quad 0x3fb5e3966b7e9295
+        .quad 0x3fb5c1c3c5557799
+        .quad 0x3fb59ffb662b815c
+        .quad 0x3fb57e3d47c3af7b
+        .quad 0x3fb55c8963e6adeb
+        .quad 0x3fb53adfb462ce16
+        .quad 0x3fb51940330c000b
+        .quad 0x3fb4f7aad9bbcbaf
+        .quad 0x3fb4d61fa2514a00
+        .quad 0x3fb4b49e86b11e5f
+        .quad 0x3fb4932780c56fe2
+        .quad 0x3fb471ba8a7de2b7
+        .quad 0x3fb450579dcf9186
+        .quad 0x3fb42efeb4b506e9
+        .quad 0x3fb40dafc92e36e2
+        .quad 0x3fb3ec6ad5407868
+        .quad 0x3fb3cb2fd2f67ef1
+        .quad 0x3fb3a9febc60540a
+        .quad 0x3fb388d78b9350ff
+        .quad 0x3fb367ba3aaa1883
+        .quad 0x3fb346a6c3c49066
+        .quad 0x3fb3259d2107db54
+        .quad 0x3fb3049d4c9e52a0
+        .quad 0x3fb2e3a740b7800f
+        .quad 0x3fb2c2baf78817b7
+        .quad 0x3fb2a1d86b49f1e2
+        .quad 0x3fb280ff963c04fc
+        .quad 0x3fb2603072a25f82
+        .quad 0x3fb23f6afac6220a
+        .quad 0x3fb21eaf28f57941
+        .quad 0x3fb1fdfcf7839804
+        .quad 0x3fb1dd5460c8b16f
+        .quad 0x3fb1bcb55f21f307
+        .quad 0x3fb19c1fecf17ee0
+        .quad 0x3fb17b94049e65d0
+        .quad 0x3fb15b11a094a1aa
+        .quad 0x3fb13a98bb450f81
+        .quad 0x3fb11a294f2569f6
+        .quad 0x3fb0f9c356b04389
+        .quad 0x3fb0d966cc6500fa
+        .quad 0x3fb0b913aac7d3a7
+        .quad 0x3fb098c9ec61b3ff
+        .quad 0x3fb078898bc05bf4
+        .quad 0x3fb0585283764178
+        .quad 0x3fb03824ce1a9101
+        .quad 0x3fb0180066492817
+        .quad 0x3fafefca8d451fd6
+        .quad 0x3fafafa6d397efdb
+        .quad 0x3faf6f9594de60f0
+        .quad 0x3faf2f96c6754aee
+        .quad 0x3faeefaa5dc2b239
+        .quad 0x3faeafd05035bd3b
+        .quad 0x3fae70089346a9e6
+        .quad 0x3fae30531c76c34a
+        .quad 0x3fadf0afe1505738
+        .quad 0x3fadb11ed766abf4
+        .quad 0x3fad719ff455f5f7
+        .quad 0x3fad32332dc34dbd
+        .quad 0x3facf2d8795ca5a5
+        .quad 0x3facb38fccd8bfdb
+        .quad 0x3fac74591df72456
+        .quad 0x3fac3534628016dd
+        .quad 0x3fabf62190448d22
+        .quad 0x3fabb7209d1e24e5
+        .quad 0x3fab78317eef1a29
+        .quad 0x3fab39542ba23d73
+        .quad 0x3faafa88992aea19
+        .quad 0x3faabbcebd84fca0
+        .quad 0x3faa7d268eb4c924
+        .quad 0x3faa3e9002c711d2
+        .quad 0x3faa000b0fd0fd6b
+        .quad 0x3fa9c197abf00dd7
+        .quad 0x3fa98335cd4a16c3
+        .quad 0x3fa944e56a0d3450
+        .quad 0x3fa906a6786fc1cb
+        .quad 0x3fa8c878eeb05074
+        .quad 0x3fa88a5cc3159e53
+        .quad 0x3fa84c51ebee8d15
+        .quad 0x3fa80e585f9218fc
+        .quad 0x3fa7d070145f4fd7
+        .quad 0x3fa7929900bd4809
+        .quad 0x3fa754d31b1b179c
+        .quad 0x3fa7171e59efcb5f
+        .quad 0x3fa6d97ab3ba5e10
+        .quad 0x3fa69be81f01af99
+        .quad 0x3fa65e6692547c4e
+        .quad 0x3fa620f604495440
+        .quad 0x3fa5e3966b7e9295
+        .quad 0x3fa5a647be9a54f6
+        .quad 0x3fa56909f44a72fe
+        .quad 0x3fa52bdd034475b8
+        .quad 0x3fa4eec0e2458f30
+        .quad 0x3fa4b1b588129203
+        .quad 0x3fa474baeb77e904
+        .quad 0x3fa437d103498eec
+        .quad 0x3fa3faf7c663060e
+        .quad 0x3fa3be2f2ba7501f
+        .quad 0x3fa381772a00e604
+        .quad 0x3fa344cfb861afae
+        .quad 0x3fa30838cdc2fbfd
+        .quad 0x3fa2cbb2612578b4
+        .quad 0x3fa28f3c69912a74
+        .quad 0x3fa252d6de1564c1
+        .quad 0x3fa21681b5c8c213
+        .quad 0x3fa1da3ce7c91bf8
+        .quad 0x3fa19e086b3b8333
+        .quad 0x3fa161e4374c37f4
+        .quad 0x3fa125d0432ea20e
+        .quad 0x3fa0e9cc861d4944
+        .quad 0x3fa0add8f759cd95
+        .quad 0x3fa071f58e2cdf9b
+        .quad 0x3fa0362241e638ec
+        .quad 0x3f9ff4be13b92920
+        .quad 0x3f9f7d57badb4ee8
+        .quad 0x3f9f061167fc31e8
+        .quad 0x3f9e8eeb09f2f6cb
+        .quad 0x3f9e17e48fa48962
+        .quad 0x3f9da0fde8038de9
+        .quad 0x3f9d2a3702105259
+        .quad 0x3f9cb38fccd8bfdb
+        .quad 0x3f9c3d0837784c41
+        .quad 0x3f9bc6a03117eb97
+        .quad 0x3f9b5057a8ee01ce
+        .quad 0x3f9ada2e8e3e546f
+        .quad 0x3f9a6424d059fc68
+        .quad 0x3f99ee3a5e9f57e8
+        .quad 0x3f99786f2879fc53
+        .quad 0x3f9902c31d62a843
+        .quad 0x3f988d362cdf359e
+        .quad 0x3f9817c846828bbd
+        .quad 0x3f97a27959ec91aa
+        .quad 0x3f972d4956ca2067
+        .quad 0x3f96b8382cd4f551
+        .quad 0x3f964345cbd3a491
+        .quad 0x3f95ce7223998b98
+        .quad 0x3f9559bd2406c3ba
+        .quad 0x3f94e526bd0814d1
+        .quad 0x3f9470aede96e7f2
+        .quad 0x3f93fc5578b93a38
+        .quad 0x3f93881a7b818f9e
+        .quad 0x3f9313fdd70ee5e8
+        .quad 0x3f929fff7b8ca79d
+        .quad 0x3f922c1f59329f1b
+        .quad 0x3f91b85d6044e9ae
+        .quad 0x3f9144b98113eac0
+        .quad 0x3f90d133abfc3f1b
+        .quad 0x3f905dcbd166b033
+        .quad 0x3f8fd503c3904f1d
+        .quad 0x3f8eeeab9b43445d
+        .quad 0x3f8e088f0b004827
+        .quad 0x3f8d22adf3f9579d
+        .quad 0x3f8c3d0837784c41
+        .quad 0x3f8b579db6dec358
+        .quad 0x3f8a726e53a6056e
+        .quad 0x3f898d79ef5eedf0
+        .quad 0x3f88a8c06bb1d2f4
+        .quad 0x3f87c441aa5e6d15
+        .quad 0x3f86dffd8d3bbf70
+        .quad 0x3f85fbf3f637ffc5
+        .quad 0x3f851824c7587eb0
+        .quad 0x3f84348fe2b99002
+        .quad 0x3f8351352a8e733f
+        .quad 0x3f826e1481213c2e
+        .quad 0x3f818b2dc8d2bb91
+        .quad 0x3f80a880e41a67f6
+        .quad 0x3f7f8c1b6b0c8d4e
+        .quad 0x3f7dc7a83f75a96d
+        .quad 0x3f7c03a80ae5e054
+        .quad 0x3f7a401a92ff827e
+        .quad 0x3f787cff9d9147a5
+        .quad 0x3f76ba56f09621bc
+        .quad 0x3f74f8205235102d
+        .quad 0x3f73365b88c0f347
+        .quad 0x3f7175085ab85ff0
+        .quad 0x3f6f684d1d8ae702
+        .quad 0x3f6be76bd77b4fc3
+        .quad 0x3f68676c71434fb9
+        .quad 0x3f64e84e793a474a
+        .quad 0x3f616a117e0d4b30
+        .quad 0x3f5bd96a1d7d9cbc
+        .quad 0x3f54e071754c98ba
+        .quad 0x3f4bd27045bfd025
+        .quad 0x3f3bcef518e29612
+        .quad 0x8000000000000000
+        /*== poly_coeff[5] ==*/
+        .align 16
+        .quad 0x3fb63C65231FBD16, 0x3fb63C65231FBD16 /* coeff5 */
+        .quad 0xbfbBCB7D4EFBE80B, 0xbfbBCB7D4EFBE80B /* coeff4 */
+        .quad 0x3fc287A7636F341E, 0x3fc287A7636F341E /* coeff3 */
+        .quad 0xbfcBCB7B1526DE36, 0xbfcBCB7B1526DE36 /* coeff2 */
+        .quad 0x3fdBCB7B1526E50E, 0x3fdBCB7B1526E50E /* coeff1 */
+        /*== ExpMask ==*/
+        .align 16
+        .quad 0x000fffffffffffff, 0x000fffffffffffff
+        /*== Two10 ==*/
+        .align 16
+        .quad 0x3f50000000000000, 0x3f50000000000000
+        /*== MinNorm ==*/
+        .align 16
+        .quad 0x0010000000000000, 0x0010000000000000
+        /*== MaxNorm ==*/
+        .align 16
+        .quad 0x7fefffffffffffff, 0x7fefffffffffffff
+        /*== HalfMask ==*/
+        .align 16
+        .quad 0xfffffffffc000000, 0xfffffffffc000000
+        /*== One ==*/
+        .align 16
+        .quad 0x3ff0000000000000, 0x3ff0000000000000
+        /*== Threshold ==*/
+        .align 16
+        .quad 0x4086a00000000000, 0x4086a00000000000
+        /*== Bias ==*/
+        .align 16
+        .quad 0x408ff80000000000, 0x408ff80000000000
+        /*== Bias1 ==*/
+        .align 16
+        .quad 0x408ff00000000000, 0x408ff00000000000
+        /*== L2 ==*/
+        .align 16
+        .quad 0x3fd34413509f79ff, 0x3fd34413509f79ff
+        .align 16
+        .type	__svml_dlog10_data_internal,@object
+        .size	__svml_dlog10_data_internal,.-__svml_dlog10_data_internal
+        .space 48, 0x00
+        .align 16
+
+.FLT_12:
+        .long	0x00000000,0x43380000,0x00000000,0x43380000
+        .type	.FLT_12,@object
+        .size	.FLT_12,16
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log104_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log104_core-sse.S
new file mode 100644
index 0000000000..0a101666f5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log104_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized log10, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN4v_log10 _ZGVdN4v_log10_sse_wrapper
+#include "../svml_d_log104_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log104_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log104_core.c
new file mode 100644
index 0000000000..48c63cfb3d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log104_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized log10, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN4v_log10
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN4v_log10, __GI__ZGVdN4v_log10, __redirect__ZGVdN4v_log10)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log104_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log104_core_avx2.S
new file mode 100644
index 0000000000..df23926562
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log104_core_avx2.S
@@ -0,0 +1,1074 @@
+/* Function log10 vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
+ *    R = Rcp*x - 1.0
+ *    log10(x) = k*log10(2.0) - log10(Rcp) + poly_approximation(R)
+ *       log10(Rcp) is tabulated
+ *
+ *
+ */
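+
+/* For reference, a scalar C sketch of the same reduction (illustrative only,
+ * not part of the kernel: it ignores the special inputs that the code below
+ * routes to the scalar log10 fallback, replaces the Log_LA_table lookup with
+ * a call to log10(), and truncates the polynomial to three Taylor terms):
+ *
+ *    #include <math.h>
+ *    #include <stdint.h>
+ *    #include <string.h>
+ *
+ *    static double
+ *    log10_reduction_sketch (double x)
+ *    {
+ *      uint64_t ix;
+ *      memcpy (&ix, &x, sizeof ix);
+ *      int k = (int) (ix >> 52) - 1023;            // unbiased exponent
+ *      ix = (ix & 0x000fffffffffffffULL) | 0x3ff0000000000000ULL;
+ *      double m;
+ *      memcpy (&m, &ix, sizeof m);                 // mantissa in [1, 2)
+ *
+ *      // reciprocal approximation rounded to ~9 significant bits
+ *      double rcp = nearbyint ((1.0 / m) * 0x1p9) * 0x1p-9;
+ *      double r = rcp * m - 1.0;                   // small argument R
+ *
+ *      const double c1 = 0.4342944819032518;       // 1/ln(10)
+ *      double poly = r * (c1 + r * (-c1 / 2.0 + r * (c1 / 3.0)));
+ *
+ *      return k * 0.3010299956639812               // k*log10(2.0)
+ *             - log10 (rcp)                        // tabulated in the kernel
+ *             + poly;
+ *    }
+ */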
+
+/* Offsets for data table __svml_dlog10_data_internal
+ */
+#define Log_HA_table                  	0
+#define Log_LA_table                  	4128
+#define poly_coeff                    	8256
+#define ExpMask                       	8416
+#define Two10                         	8448
+#define MinNorm                       	8480
+#define MaxNorm                       	8512
+#define HalfMask                      	8544
+#define One                           	8576
+#define Threshold                     	8608
+#define Bias                          	8640
+#define Bias1                         	8672
+#define L2                            	8704
+
+/* Lookup bias for data table __svml_dlog10_data_internal.  */
+#define Table_Lookup_Bias               -0x406fe0
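+/* Informal note: the table index below is the bit pattern of the rounded
+   reciprocal shifted right by 40 bits.  For a reciprocal in [2^9, 2^10)
+   that pattern is 0x408000 + 8*j, where j is the 9-bit table index, so
+   adding -0x406fe0 to the table base in effect leaves 0x1020 + 8*j,
+   that is, Log_LA_table plus one 8-byte entry per index.  */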
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN4v_log10_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+        lea       Table_Lookup_Bias+__svml_dlog10_data_internal(%rip), %r8
+        vmovapd   %ymm0, %ymm3
+
+/* preserve mantissa, set input exponent to 2^(-10) */
+        vandpd    ExpMask+__svml_dlog10_data_internal(%rip), %ymm3, %ymm4
+        vorpd     Two10+__svml_dlog10_data_internal(%rip), %ymm4, %ymm2
+
+/* reciprocal approximation good to at least 11 bits */
+        vcvtpd2ps %ymm2, %xmm5
+
+/* exponent bits */
+        vpsrlq    $20, %ymm3, %ymm7
+        vmovupd   One+__svml_dlog10_data_internal(%rip), %ymm14
+        vrcpps    %xmm5, %xmm6
+
+/* check range */
+        vcmplt_oqpd MinNorm+__svml_dlog10_data_internal(%rip), %ymm3, %ymm11
+        vcmpnle_uqpd MaxNorm+__svml_dlog10_data_internal(%rip), %ymm3, %ymm12
+        vcvtps2pd %xmm6, %ymm9
+
+/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
+        vroundpd  $0, %ymm9, %ymm1
+
+/* exponent*log10(2.0) */
+        vmovupd   Threshold+__svml_dlog10_data_internal(%rip), %ymm9
+
+/*
+ * prepare table index
+ * table lookup
+ */
+        vpsrlq    $40, %ymm1, %ymm15
+
+/* argument reduction */
+        vfmsub213pd %ymm14, %ymm1, %ymm2
+        vcmplt_oqpd %ymm1, %ymm9, %ymm1
+        vorpd     %ymm12, %ymm11, %ymm13
+        vmovupd   poly_coeff+64+__svml_dlog10_data_internal(%rip), %ymm12
+        vfmadd213pd poly_coeff+96+__svml_dlog10_data_internal(%rip), %ymm2, %ymm12
+
+/* combine and get argument value range mask */
+        vmovmskpd %ymm13, %eax
+        vmulpd    %ymm2, %ymm2, %ymm13
+        vextractf128 $1, %ymm7, %xmm8
+        vshufps   $221, %xmm8, %xmm7, %xmm10
+
+/* biased exponent in DP format */
+        vcvtdq2pd %xmm10, %ymm0
+        vandpd    Bias+__svml_dlog10_data_internal(%rip), %ymm1, %ymm10
+        vorpd     Bias1+__svml_dlog10_data_internal(%rip), %ymm10, %ymm11
+        vsubpd    %ymm11, %ymm0, %ymm0
+        vmulpd    L2+__svml_dlog10_data_internal(%rip), %ymm0, %ymm1
+
+/* polynomial */
+        vmovupd   poly_coeff+__svml_dlog10_data_internal(%rip), %ymm0
+        vfmadd213pd poly_coeff+32+__svml_dlog10_data_internal(%rip), %ymm2, %ymm0
+        vmulpd    poly_coeff+128+__svml_dlog10_data_internal(%rip), %ymm2, %ymm2
+        vfmadd213pd %ymm12, %ymm13, %ymm0
+        vfmadd213pd %ymm2, %ymm13, %ymm0
+        vextractf128 $1, %ymm15, %xmm6
+        vmovd     %xmm15, %edx
+        vmovd     %xmm6, %esi
+        movslq    %edx, %rdx
+        vpextrd   $2, %xmm15, %ecx
+        movslq    %esi, %rsi
+        vpextrd   $2, %xmm6, %edi
+        movslq    %ecx, %rcx
+        movslq    %edi, %rdi
+        vmovsd    (%r8,%rdx), %xmm4
+        vmovsd    (%r8,%rsi), %xmm7
+        vmovhpd   (%r8,%rcx), %xmm4, %xmm5
+        vmovhpd   (%r8,%rdi), %xmm7, %xmm8
+        vinsertf128 $1, %xmm8, %ymm5, %ymm14
+
+/* reconstruction */
+        vaddpd    %ymm0, %ymm14, %ymm2
+        vaddpd    %ymm2, %ymm1, %ymm0
+        testl     %eax, %eax
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm3
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
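+
+/* Roughly, in C (a sketch of the flow, not the emitted code):
+ *
+ *    for (int i = 0; i < 4; i++)
+ *      if (mask & (1 << i))
+ *        y[i] = log10 (x[i]);
+ *
+ * with the inputs spilled to 32(%rsp) and the results to 64(%rsp).  */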
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovupd   %ymm3, 32(%rsp)
+        vmovupd   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0
+
+        xorl      %edx, %edx
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovupd   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      log10@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 64(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN4v_log10_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_dlog10_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(32)) VUINT32 Log_HA_table[(1<<9)+2][2];
+        __declspec(align(32)) VUINT32 Log_LA_table[(1<<9)+1][2];
+        __declspec(align(32)) VUINT32 poly_coeff[5][4][2];
+        __declspec(align(32)) VUINT32 ExpMask[4][2];
+        __declspec(align(32)) VUINT32 Two10[4][2];
+        __declspec(align(32)) VUINT32 MinNorm[4][2];
+        __declspec(align(32)) VUINT32 MaxNorm[4][2];
+        __declspec(align(32)) VUINT32 HalfMask[4][2];
+        __declspec(align(32)) VUINT32 One[4][2];
+        __declspec(align(32)) VUINT32 Threshold[4][2];
+        __declspec(align(32)) VUINT32 Bias[4][2];
+        __declspec(align(32)) VUINT32 Bias1[4][2];
+        __declspec(align(32)) VUINT32 L2[4][2];
+} __svml_dlog10_data_internal;
+#endif
+__svml_dlog10_data_internal:
+        /* Log_HA_table */
+        .quad 0xc0733a7146f6b080, 0xbe1e707ce619c200
+        .quad 0xc0733a7547771970, 0xbe1e79c6c06d6f51
+        .quad 0xc0733a7945aacb70, 0xbe1e78e225fad29c
+        .quad 0xc0733a7d41946970, 0xbe1e76d607f9693b
+        .quad 0xc0733a813b3691f0, 0xbe1e7704b3e0685b
+        .quad 0xc0733a853293df00, 0xbe1e79c1216a27fa
+        .quad 0xc0733a8927aee660, 0xbe1e76dce5734a81
+        .quad 0xc0733a8d1a8a3920, 0xbe1e782ee2ca4dba
+        .quad 0xc0733a910b286430, 0xbe1e7812d1a0a61f
+        .quad 0xc0733a94f98bf010, 0xbe1e77e1b5ecbc61
+        .quad 0xc0733a98e5b76100, 0xbe1e76635cac1586
+        .quad 0xc0733a9ccfad36f0, 0xbe1e7638f7968f32
+        .quad 0xc0733aa0b76feda0, 0xbe1e7840ee76e365
+        .quad 0xc0733aa49d01fcb0, 0xbe1e79f3fd01907e
+        .quad 0xc0733aa88065d7a0, 0xbe1e77bbb3a9c38a
+        .quad 0xc0733aac619dedb0, 0xbe1e7742719bf41d
+        .quad 0xc0733ab040acaa20, 0xbe1e79bcedaf79cb
+        .quad 0xc0733ab41d947450, 0xbe1e762d63cb7ca0
+        .quad 0xc0733ab7f857af50, 0xbe1e77a07be83403
+        .quad 0xc0733abbd0f8ba80, 0xbe1e7763ff836ad0
+        .quad 0xc0733abfa779f130, 0xbe1e7737720ead39
+        .quad 0xc0733ac37bddaad0, 0xbe1e7776a08e55e7
+        .quad 0xc0733ac74e263af0, 0xbe1e793e3c52dd36
+        .quad 0xc0733acb1e55f160, 0xbe1e788a94695051
+        .quad 0xc0733aceec6f1a10, 0xbe1e76508114a813
+        .quad 0xc0733ad2b873fd20, 0xbe1e76909457d23e
+        .quad 0xc0733ad68266df10, 0xbe1e7664a24f9ca4
+        .quad 0xc0733ada4a4a0090, 0xbe1e7a07b3d44b18
+        .quad 0xc0733ade101f9ee0, 0xbe1e76d87594704d
+        .quad 0xc0733ae1d3e9f340, 0xbe1e79563595a182
+        .quad 0xc0733ae595ab33b0, 0xbe1e771880c3c6ab
+        .quad 0xc0733ae955659250, 0xbe1e78c171f517d4
+        .quad 0xc0733aed131b3df0, 0xbe1e77eac3874666
+        .quad 0xc0733af0cece61b0, 0xbe1e790db479d8f6
+        .quad 0xc0733af488812550, 0xbe1e7965d1aa5c90
+        .quad 0xc0733af84035ad10, 0xbe1e78ceb398ba47
+        .quad 0xc0733afbf5ee19c0, 0xbe1e779cc0dcb5aa
+        .quad 0xc0733affa9ac88c0, 0xbe1e7871053953ed
+        .quad 0xc0733b035b731420, 0xbe1e7a082cffa71a
+        .quad 0xc0733b070b43d2a0, 0xbe1e7904b4382fad
+        .quad 0xc0733b0ab920d790, 0xbe1e79b458d0b4f3
+        .quad 0xc0733b0e650c3310, 0xbe1e79d0ded414c6
+        .quad 0xc0733b120f07f200, 0xbe1e763c357a1943
+        .quad 0xc0733b15b7161dd0, 0xbe1e78b80ba6daaa
+        .quad 0xc0733b195d38bd00, 0xbe1e7998e23b8ffd
+        .quad 0xc0733b1d0171d2c0, 0xbe1e7974aa65ee8c
+        .quad 0xc0733b20a3c35f20, 0xbe1e76ccfde752ab
+        .quad 0xc0733b24442f5ef0, 0xbe1e77b4ff19debb
+        .quad 0xc0733b27e2b7cc10, 0xbe1e7772ee478542
+        .quad 0xc0733b2b7f5e9d30, 0xbe1e781d81b58b44
+        .quad 0xc0733b2f1a25c600, 0xbe1e78350d967565
+        .quad 0xc0733b32b30f3720, 0xbe1e783888e48152
+        .quad 0xc0733b364a1cde30, 0xbe1e78367bf7c111
+        .quad 0xc0733b39df50a5d0, 0xbe1e7959e57ca47d
+        .quad 0xc0733b3d72ac75c0, 0xbe1e777322423222
+        .quad 0xc0733b41043232b0, 0xbe1e767ce42a60aa
+        .quad 0xc0733b4493e3be70, 0xbe1e781d445aea19
+        .quad 0xc0733b4821c2f800, 0xbe1e7922fca18e18
+        .quad 0xc0733b4badd1bb80, 0xbe1e76fed3d40647
+        .quad 0xc0733b4f3811e210, 0xbe1e793948c9eabc
+        .quad 0xc0733b52c0854240, 0xbe1e76e487656b8c
+        .quad 0xc0733b56472daf90, 0xbe1e780ab2f71223
+        .quad 0xc0733b59cc0cfaf0, 0xbe1e77189120b09c
+        .quad 0xc0733b5d4f24f270, 0xbe1e7644a0343a12
+        .quad 0xc0733b60d0776160, 0xbe1e78f2a3e4733d
+        .quad 0xc0733b6450061080, 0xbe1e7913b2f73ae5
+        .quad 0xc0733b67cdd2c5c0, 0xbe1e7882d08393b5
+        .quad 0xc0733b6b49df4470, 0xbe1e765e1b209979
+        .quad 0xc0733b6ec42d4d20, 0xbe1e785c9c4620d4
+        .quad 0xc0733b75b394f240, 0xbe1e78878cd0e956
+        .quad 0xc0733b7c9c178630, 0xbe1e789a4112d90b
+        .quad 0xc0733b837dc2b0f0, 0xbe1e79050b8a1766
+        .quad 0xc0733b8a58a3f220, 0xbe1e7790dffc47aa
+        .quad 0xc0733b912cc8a180, 0xbe1e77174593b06a
+        .quad 0xc0733b97fa3defb0, 0xbe1e7677de2d2ecc
+        .quad 0xc0733b9ec110e6b0, 0xbe1e76cff477ca18
+        .quad 0xc0733ba5814e6a80, 0xbe1e78f8644dec7b
+        .quad 0xc0733bac3b0339d0, 0xbe1e764e1361788d
+        .quad 0xc0733bb2ee3bee30, 0xbe1e78c913e738de
+        .quad 0xc0733bb99b04fd30, 0xbe1e76666f5bddaa
+        .quad 0xc0733bc0416ab850, 0xbe1e77e87cbd8ab6
+        .quad 0xc0733bc6e1794e10, 0xbe1e76f18ba1c966
+        .quad 0xc0733bcd7b3cca10, 0xbe1e777c9461b8db
+        .quad 0xc0733bd40ec115d0, 0xbe1e78b78526ffac
+        .quad 0xc0733bda9c11f920, 0xbe1e7942abecfede
+        .quad 0xc0733be1233b1aa0, 0xbe1e76d8a684fd8c
+        .quad 0xc0733be7a4480010, 0xbe1e79622b539ac9
+        .quad 0xc0733bee1f440f30, 0xbe1e7978e7cc20ea
+        .quad 0xc0733bf4943a8de0, 0xbe1e765c9c9de825
+        .quad 0xc0733bfb0336a290, 0xbe1e775d8b138ee2
+        .quad 0xc0733c016c435500, 0xbe1e78bf33465c2f
+        .quad 0xc0733c07cf6b8e80, 0xbe1e78164f7cc441
+        .quad 0xc0733c0e2cba1a50, 0xbe1e7824e64d0b23
+        .quad 0xc0733c148439a630, 0xbe1e78373ae7dd81
+        .quad 0xc0733c1ad5f4c2c0, 0xbe1e7704513e0afe
+        .quad 0xc0733c2121f5e3d0, 0xbe1e7914aa84200f
+        .quad 0xc0733c2768476110, 0xbe1e76b1cde25cf6
+        .quad 0xc0733c2da8f37600, 0xbe1e796120e3862d
+        .quad 0xc0733c33e40442e0, 0xbe1e78ec836d7e7b
+        .quad 0xc0733c3a1983cca0, 0xbe1e77fb13b7dabb
+        .quad 0xc0733c40497bfd70, 0xbe1e783c6fcb2404
+        .quad 0xc0733c4673f6a530, 0xbe1e7628bb93dce8
+        .quad 0xc0733c4c98fd7990, 0xbe1e7857a47b5001
+        .quad 0xc0733c52b89a16d0, 0xbe1e76708dc2831f
+        .quad 0xc0733c58d2d5ffa0, 0xbe1e77b6038651f1
+        .quad 0xc0733c5ee7ba9de0, 0xbe1e792e855bb5b2
+        .quad 0xc0733c64f75142d0, 0xbe1e776cacd5c105
+        .quad 0xc0733c6b01a32740, 0xbe1e77f8a8011315
+        .quad 0xc0733c7106b96c30, 0xbe1e765cf3efcfde
+        .quad 0xc0733c77069d1ad0, 0xbe1e78d837d2efac
+        .quad 0xc0733c7d01572530, 0xbe1e78b615cf772c
+        .quad 0xc0733c82f6f06640, 0xbe1e7650bbbd7a25
+        .quad 0xc0733c88e771a220, 0xbe1e78bcf3495872
+        .quad 0xc0733c8ed2e386c0, 0xbe1e792266832e84
+        .quad 0xc0733c94b94eabd0, 0xbe1e79c1c3c2ca52
+        .quad 0xc0733c9a9abb9340, 0xbe1e78aa61e5807d
+        .quad 0xc0733ca07732a970, 0xbe1e7620fc4cf156
+        .quad 0xc0733ca64ebc4570, 0xbe1e76b914a832c5
+        .quad 0xc0733cac2160a970, 0xbe1e79227f72020e
+        .quad 0xc0733cb1ef280300, 0xbe1e77ac972cc008
+        .quad 0xc0733cb7b81a6b10, 0xbe1e798089be41f4
+        .quad 0xc0733cbd7c3fe6a0, 0xbe1e77942ae037fe
+        .quad 0xc0733cc33ba06690, 0xbe1e7956ae6463d9
+        .quad 0xc0733cc8f643c850, 0xbe1e7918a50c7942
+        .quad 0xc0733cceac31d5d0, 0xbe1e78308eeab604
+        .quad 0xc0733cd45d7245e0, 0xbe1e76dd4ea88445
+        .quad 0xc0733cda0a0cbc60, 0xbe1e77e7c1aa5909
+        .quad 0xc0733cdfb208caa0, 0xbe1e7804b9d20e54
+        .quad 0xc0733ce5556def70, 0xbe1e78f88e99d49c
+        .quad 0xc0733ceaf4439780, 0xbe1e787d74682d68
+        .quad 0xc0733cf08e911d80, 0xbe1e76edc24fe6e7
+        .quad 0xc0733cf6245dca50, 0xbe1e79b347ec86d2
+        .quad 0xc0733cfbb5b0d580, 0xbe1e797cceb2c39b
+        .quad 0xc0733d0142916530, 0xbe1e783adbdc6aa1
+        .quad 0xc0733d06cb068e70, 0xbe1e76e4c20e3d9e
+        .quad 0xc0733d0c4f175570, 0xbe1e77070bf3cf61
+        .quad 0xc0733d11cecaadc0, 0xbe1e781c43502734
+        .quad 0xc0733d174a277a80, 0xbe1e78b11268ea72
+        .quad 0xc0733d1cc1348e90, 0xbe1e7754b83bfc7d
+        .quad 0xc0733d2233f8acb0, 0xbe1e7756c29bf5e9
+        .quad 0xc0733d27a27a87d0, 0xbe1e7952fc1d9333
+        .quad 0xc0733d2d0cc0c350, 0xbe1e778c76ae6077
+        .quad 0xc0733d3272d1f2e0, 0xbe1e7a1896ba8f43
+        .quad 0xc0733d37d4b49b30, 0xbe1e76dafdf432d8
+        .quad 0xc0733d3d326f3180, 0xbe1e795330184013
+        .quad 0xc0733d428c081c80, 0xbe1e763cc774d30f
+        .quad 0xc0733d47e185b3d0, 0xbe1e77030a779c0a
+        .quad 0xc0733d4d32ee40b0, 0xbe1e7908af2a2d7e
+        .quad 0xc0733d528047fe00, 0xbe1e78c4953b797d
+        .quad 0xc0733d57c9991850, 0xbe1e78b43b096579
+        .quad 0xc0733d5d0ee7ae30, 0xbe1e7824ae0a4804
+        .quad 0xc0733d625039d040, 0xbe1e79d2b2fbb740
+        .quad 0xc0733d678d958190, 0xbe1e7662de59a1a6
+        .quad 0xc0733d6cc700b760, 0xbe1e76b251d59aaa
+        .quad 0xc0733d71fc8159b0, 0xbe1e7a00cfd1f487
+        .quad 0xc0733d772e1d4360, 0xbe1e77f4d246167e
+        .quad 0xc0733d7c5bda4200, 0xbe1e767a4ee8e6fc
+        .quad 0xc0733d8185be1640, 0xbe1e777ccf0a8aed
+        .quad 0xc0733d86abce7420, 0xbe1e767d7e279ada
+        .quad 0xc0733d8bce1102d0, 0xbe1e7a05cef4bb90
+        .quad 0xc0733d90ec8b5d40, 0xbe1e78f75369be5b
+        .quad 0xc0733d96074311d0, 0xbe1e77b9612e8c8a
+        .quad 0xc0733d9b1e3da2b0, 0xbe1e794518b9adeb
+        .quad 0xc0733da031808620, 0xbe1e7810626fb934
+        .quad 0xc0733da541112650, 0xbe1e76d87223fa6d
+        .quad 0xc0733daa4cf4e1a0, 0xbe1e794c5e7ca3b5
+        .quad 0xc0733daf55310af0, 0xbe1e789856ef816f
+        .quad 0xc0733db459cae970, 0xbe1e77d2004effbd
+        .quad 0xc0733db95ac7b8f0, 0xbe1e78467d31eb9c
+        .quad 0xc0733dbe582caa00, 0xbe1e79aaa4e25787
+        .quad 0xc0733dc351fee220, 0xbe1e762de8f107bf
+        .quad 0xc0733dc848437b90, 0xbe1e7670670a63fe
+        .quad 0xc0733dcd3aff85d0, 0xbe1e795ca237c6cc
+        .quad 0xc0733dd22a3805b0, 0xbe1e77e55c53c1d9
+        .quad 0xc0733dd715f1f520, 0xbe1e78a806213ac4
+        .quad 0xc0733ddbfe3243b0, 0xbe1e77743a2bc615
+        .quad 0xc0733de0e2fdd660, 0xbe1e78b8b45b0b7d
+        .quad 0xc0733de5c4598800, 0xbe1e78d635f2f4b9
+        .quad 0xc0733deaa24a2920, 0xbe1e7758c396a11e
+        .quad 0xc0733def7cd48020, 0xbe1e7a17a8cc454c
+        .quad 0xc0733df453fd49a0, 0xbe1e783caa73f616
+        .quad 0xc0733df927c93820, 0xbe1e7932cfa29664
+        .quad 0xc0733dfdf83cf490, 0xbe1e777d265c72a6
+        .quad 0xc0733e02c55d1e10, 0xbe1e7775e7c03c60
+        .quad 0xc0733e078f2e4a40, 0xbe1e79f65d52d232
+        .quad 0xc0733e0c55b50570, 0xbe1e76e7e7464b4e
+        .quad 0xc0733e1118f5d250, 0xbe1e77be81cad877
+        .quad 0xc0733e15d8f52a80, 0xbe1e79dd25b5fb3a
+        .quad 0xc0733e1a95b77e80, 0xbe1e78e45f1418ef
+        .quad 0xc0733e1f4f4135a0, 0xbe1e78eb7289505b
+        .quad 0xc0733e240596ae50, 0xbe1e78a468c07cad
+        .quad 0xc0733e28b8bc3e20, 0xbe1e776b558a4009
+        .quad 0xc0733e2d68b631d0, 0xbe1e77412eb9941e
+        .quad 0xc0733e321588cd80, 0xbe1e76b2853f845e
+        .quad 0xc0733e36bf384cb0, 0xbe1e76aa7184273c
+        .quad 0xc0733e3b65c8e260, 0xbe1e7832027f78fa
+        .quad 0xc0733e40093eb930, 0xbe1e7a1c7da131f5
+        .quad 0xc0733e44a99df380, 0xbe1e76a0bc2ae4bc
+        .quad 0xc0733e4946eaab30, 0xbe1e78dff13b6f5d
+        .quad 0xc0733e4de128f250, 0xbe1e765a226dea2c
+        .quad 0xc0733e52785cd290, 0xbe1e78509b989111
+        .quad 0xc0733e570c8a4de0, 0xbe1e7916a4e9803d
+        .quad 0xc0733e5b9db55e30, 0xbe1e7950c15758cc
+        .quad 0xc0733e602be1f5a0, 0xbe1e7922ba1ad420
+        .quad 0xc0733e64b713fe90, 0xbe1e794cbaabcef6
+        .quad 0xc0733e693f4f5bc0, 0xbe1e7837bf883fed
+        .quad 0xc0733e6dc497e850, 0xbe1e76f198ddbbdf
+        .quad 0xc0733e7246f177d0, 0xbe1e7a18c1067764
+        .quad 0xc0733e76c65fd6a0, 0xbe1e76b845a8fd9d
+        .quad 0xc0733e7b42e6c970, 0xbe1e7714012df506
+        .quad 0xc0733e7fbc8a0de0, 0xbe1e7765612922cd
+        .quad 0xc0733e84334d5a50, 0xbe1e7688f5424a00
+        .quad 0xc0733e88a7345df0, 0xbe1e769d011f6663
+        .quad 0xc0733e8d1842c0e0, 0xbe1e79914acbfaf7
+        .quad 0xc0733e91867c2460, 0xbe1e79a85e189bd7
+        .quad 0xc0733e95f1e422a0, 0xbe1e79ea7c726432
+        .quad 0xc0733e9a5a7e4f10, 0xbe1e768a6fbb8e6e
+        .quad 0xc0733e9ec04e3620, 0xbe1e793c75bcc9fc
+        .quad 0xc0733ea323575dd0, 0xbe1e797f78da13d4
+        .quad 0xc0733ea7839d4550, 0xbe1e78d8c9cda978
+        .quad 0xc0733eabe1236540, 0xbe1e77028d480fff
+        .quad 0xc0733eb03bed2fa0, 0xbe1e7a0d0f74ff7c
+        .quad 0xc0733eb493fe1040, 0xbe1e76732e8a35fb
+        .quad 0xc0733eb8e9596c30, 0xbe1e77220caeabeb
+        .quad 0xc0733ebd3c02a260, 0xbe1e797438b645ef
+        .quad 0xc0733ec18bfd0b80, 0xbe1e79207c5fd6e8
+        .quad 0xc0733ec5d94bf9f0, 0xbe1e781c7df8f946
+        .quad 0xc0733eca23f2b9f0, 0xbe1e76736284e2db
+        .quad 0xc0733ece6bf49190, 0xbe1e7a109cc0c3f5
+        .quad 0xc0733ed2b154c120, 0xbe1e767f14a16d50
+        .quad 0xc0733ed6f4168290, 0xbe1e789cd22acaf0
+        .quad 0xc0733edb343d0a40, 0xbe1e764355ca28ad
+        .quad 0xc0733edf71cb8660, 0xbe1e79e4c7a81c45
+        .quad 0xc0733ee3acc51fb0, 0xbe1e761e26b644c2
+        .quad 0xc0733ee7e52cf8c0, 0xbe1e793e9f8fbdd3
+        .quad 0xc0733eec1b062ed0, 0xbe1e78c432991c20
+        .quad 0xc0733ef04e53d940, 0xbe1e78cdd025f4d8
+        .quad 0xc0733ef47f1909f0, 0xbe1e778310c6446e
+        .quad 0xc0733ef8ad58cd20, 0xbe1e7871af3d6e17
+        .quad 0xc0733efcd91629b0, 0xbe1e77e0e906f697
+        .quad 0xc0733f01025420f0, 0xbe1e7a1ae9b27892
+        .quad 0xc0733f052915af00, 0xbe1e76ac64c88f9d
+        .quad 0xc0733f094d5dca60, 0xbe1e779a815589c4
+        .quad 0xc0733f0d6f2f6480, 0xbe1e788f39a4864c
+        .quad 0xc0733f118e8d6980, 0xbe1e79fc51263525
+        .quad 0xc0733f15ab7ac060, 0xbe1e783501f19e90
+        .quad 0xc0733f19c5fa4ae0, 0xbe1e767e82c327ab
+        .quad 0xc0733f1dde0ee5a0, 0xbe1e7a1785d66123
+        .quad 0xc0733f21f3bb6870, 0xbe1e7936d07203da
+        .quad 0xc0733f260702a5e0, 0xbe1e7a010a7ac699
+        .quad 0xc0733f2a17e76bb0, 0xbe1e7975e4e16312
+        .quad 0xc0733f2e266c82b0, 0xbe1e7654b5422330
+        .quad 0xc0733f323294aeb0, 0xbe1e77f8a4909d35
+        .quad 0xc0733f363c62aee0, 0xbe1e792c8e30d226
+        .quad 0xc0733f3a43d93da0, 0xbe1e76f6ac67a1ff
+        .quad 0xc0733f3e48fb1070, 0xbe1e775c2e97715a
+        .quad 0xc0733f424bcad840, 0xbe1e781cd54ae100
+        /*== Log_LA_table ==*/
+        .align 32
+        .quad 0x0000000000000000
+        .quad 0xbf4bc48a867884b7
+        .quad 0xbf5bbd9e9482af09
+        .quad 0xbf64c9096b94befd
+        .quad 0xbf6bafd47221ed26
+        .quad 0xbf714999e2ad8ea6
+        .quad 0xbf74b99563d2a1bd
+        .quad 0xbf7827de6b310350
+        .quad 0xbf7b9476a4fcd10f
+        .quad 0xbf7eff5fbaf25781
+        .quad 0xbf81344daa2d7553
+        .quad 0xbf82e8158b08d957
+        .quad 0xbf849b0851443684
+        .quad 0xbf864d26cce610dd
+        .quad 0xbf87fe71ccc4e6b0
+        .quad 0xbf89aeea1e897fdf
+        .quad 0xbf8b5e908eb13790
+        .quad 0xbf8d0d65e890405a
+        .quad 0xbf8ebb6af653e2ee
+        .quad 0xbf90345040825bad
+        .quad 0xbf910a83a8446c78
+        .quad 0xbf91e05015d30a71
+        .quad 0xbf92b5b5ec0209d3
+        .quad 0xbf938ab58d173e91
+        .quad 0xbf945f4f5acb8be0
+        .quad 0xbf953383b64bf13f
+        .quad 0xbf960753003a94ef
+        .quad 0xbf96dabd98afcc05
+        .quad 0xbf97adc3df3b1ff8
+        .quad 0xbf98806632e451d0
+        .quad 0xbf9952a4f22c5ae9
+        .quad 0xbf9a24807b0e6b5c
+        .quad 0xbf9af5f92b00e610
+        .quad 0xbf9bc70f5ef65a77
+        .quad 0xbf9c97c3735e7c0a
+        .quad 0xbf9d6815c4271775
+        .quad 0xbf9e3806acbd058f
+        .quad 0xbf9f0796880d1c19
+        .quad 0xbf9fd6c5b0851c4c
+        .quad 0xbfa052ca400a4f9b
+        .quad 0xbfa0ba01a8170000
+        .quad 0xbfa121093ce3a205
+        .quad 0xbfa187e12aad8077
+        .quad 0xbfa1ee899d74a03e
+        .quad 0xbfa25502c0fc314c
+        .quad 0xbfa2bb4cc0cafe8d
+        .quad 0xbfa32167c82bdcda
+        .quad 0xbfa38754022e18e2
+        .quad 0xbfa3ed1199a5e425
+        .quad 0xbfa452a0b92cc0ec
+        .quad 0xbfa4b8018b21ed4f
+        .quad 0xbfa51d3439aacd4a
+        .quad 0xbfa58238eeb353da
+        .quad 0xbfa5e70fd3ee6b34
+        .quad 0xbfa64bb912d65c07
+        .quad 0xbfa6b034d4ad33df
+        .quad 0xbfa71483427d2a99
+        .quad 0xbfa778a4851906f3
+        .quad 0xbfa7dc98c51c8242
+        .quad 0xbfa840602aecab3d
+        .quad 0xbfa8a3fadeb847f4
+        .quad 0xbfa90769087836e4
+        .quad 0xbfa96aaacfefcf3c
+        .quad 0xbfa9cdc05cad4042
+        .quad 0xbfaa30a9d609efea
+        .quad 0xbfaa9367632ad897
+        .quad 0xbfaaf5f92b00e610
+        .quad 0xbfab585f544951a4
+        .quad 0xbfabba9a058dfd84
+        .quad 0xbfac1ca96525cf56
+        .quad 0xbfac7e8d993509f9
+        .quad 0xbface046c7ada68d
+        .quad 0xbfad41d5164facb4
+        .quad 0xbfada338aaa98a0c
+        .quad 0xbfae0471aa1868f5
+        .quad 0xbfae658039c88690
+        .quad 0xbfaec6647eb58808
+        .quad 0xbfaf271e9daacf20
+        .quad 0xbfaf87aebb43ce06
+        .quad 0xbfafe814fbec5a77
+        .quad 0xbfb02428c1f08016
+        .quad 0xbfb054323b97a948
+        .quad 0xbfb08426fcdb1ee7
+        .quad 0xbfb0b40717932b96
+        .quad 0xbfb0e3d29d81165e
+        .quad 0xbfb11389a04f4a2e
+        .quad 0xbfb1432c31917d08
+        .quad 0xbfb172ba62c4d6de
+        .quad 0xbfb1a23445501816
+        .quad 0xbfb1d199ea83bfbe
+        .quad 0xbfb200eb639a3173
+        .quad 0xbfb23028c1b7daed
+        .quad 0xbfb25f5215eb594a
+        .quad 0xbfb28e67712d9dfc
+        .quad 0xbfb2bd68e4621371
+        .quad 0xbfb2ec568056c16f
+        .quad 0xbfb31b3055c47118
+        .quad 0xbfb349f6754ed0b4
+        .quad 0xbfb378a8ef84971e
+        .quad 0xbfb3a747d4dfa6f5
+        .quad 0xbfb3d5d335c53179
+        .quad 0xbfb4044b2285d925
+        .quad 0xbfb432afab5dd3ff
+        .quad 0xbfb46100e0750da1
+        .quad 0xbfb48f3ed1df48fb
+        .quad 0xbfb4bd698f9c41cf
+        .quad 0xbfb4eb812997cde4
+        .quad 0xbfb51985afa9fdfd
+        .quad 0xbfb5477731973e85
+        .quad 0xbfb57555bf1077f5
+        .quad 0xbfb5a32167b32f02
+        .quad 0xbfb5d0da3b09a47e
+        .quad 0xbfb5fe80488af4fd
+        .quad 0xbfb62c139f9b3837
+        .quad 0xbfb659944f8ba02d
+        .quad 0xbfb68702679a980a
+        .quad 0xbfb6b45df6f3e2c9
+        .quad 0xbfb6e1a70cb0b99a
+        .quad 0xbfb70eddb7d7ea07
+        .quad 0xbfb73c02075df3e5
+        .quad 0xbfb769140a2526fd
+        .quad 0xbfb79613cefdc07d
+        .quad 0xbfb7c30164a60836
+        .quad 0xbfb7efdcd9ca6d8f
+        .quad 0xbfb81ca63d05a44a
+        .quad 0xbfb8495d9ce0c10c
+        .quad 0xbfb8760307d355ab
+        .quad 0xbfb8a2968c438d41
+        .quad 0xbfb8cf183886480d
+        .quad 0xbfb8fb881adf3713
+        .quad 0xbfb927e64180f790
+        .quad 0xbfb95432ba8d2e2f
+        .quad 0xbfb9806d9414a209
+        .quad 0xbfb9ac96dc175776
+        .quad 0xbfb9d8aea084aa9c
+        .quad 0xbfba04b4ef3b69d8
+        .quad 0xbfba30a9d609efea
+        .quad 0xbfba5c8d62ae3dec
+        .quad 0xbfba885fa2d6151e
+        .quad 0xbfbab420a41f1076
+        .quad 0xbfbadfd07416be07
+        .quad 0xbfbb0b6f203ab82c
+        .quad 0xbfbb36fcb5f8be8a
+        .quad 0xbfbb627942aecedd
+        .quad 0xbfbb8de4d3ab3d98
+        .quad 0xbfbbb93f762cce4f
+        .quad 0xbfbbe4893762cbf7
+        .quad 0xbfbc0fc2246d20f5
+        .quad 0xbfbc3aea4a5c6eff
+        .quad 0xbfbc6601b63226cb
+        .quad 0xbfbc910874e09f98
+        .quad 0xbfbcbbfe934b2e81
+        .quad 0xbfbce6e41e463da5
+        .quad 0xbfbd11b92297632b
+        .quad 0xbfbd3c7dacf5780b
+        .quad 0xbfbd6731ca08aeb9
+        .quad 0xbfbd91d5866aa99c
+        .quad 0xbfbdbc68eea6915b
+        .quad 0xbfbde6ec0f392b05
+        .quad 0xbfbe115ef490ee07
+        .quad 0xbfbe3bc1ab0e19fe
+        .quad 0xbfbe66143f02cc5d
+        .quad 0xbfbe9056bcb315e8
+        .quad 0xbfbeba893055100b
+        .quad 0xbfbee4aba610f204
+        .quad 0xbfbf0ebe2a0125eb
+        .quad 0xbfbf38c0c8325d86
+        .quad 0xbfbf62b38ca3a706
+        .quad 0xbfbf8c9683468191
+        .quad 0xbfbfb669b7fef1a8
+        .quad 0xbfbfe02d36a3956d
+        .quad 0xbfc004f0857edc5c
+        .quad 0xbfc019c2a064b486
+        .quad 0xbfc02e8cf1dac4b8
+        .quad 0xbfc0434f7fb1f307
+        .quad 0xbfc0580a4fb4a3df
+        .quad 0xbfc06cbd67a6c3b6
+        .quad 0xbfc08168cd45d0a9
+        .quad 0xbfc0960c8648e406
+        .quad 0xbfc0aaa89860bbcf
+        .quad 0xbfc0bf3d0937c41c
+        .quad 0xbfc0d3c9de722078
+        .quad 0xbfc0e84f1dadb526
+        .quad 0xbfc0fccccc823059
+        .quad 0xbfc11142f0811357
+        .quad 0xbfc125b18f35bb8e
+        .quad 0xbfc13a18ae256b99
+        .quad 0xbfc14e7852cf5430
+        .quad 0xbfc162d082ac9d10
+        .quad 0xbfc1772143306dc6
+        .quad 0xbfc18b6a99c7f679
+        .quad 0xbfc19fac8bda7897
+        .quad 0xbfc1b3e71ec94f7b
+        .quad 0xbfc1c81a57eff8fd
+        .quad 0xbfc1dc463ca41df8
+        .quad 0xbfc1f06ad2359abd
+        .quad 0xbfc204881dee8777
+        .quad 0xbfc2189e25134081
+        .quad 0xbfc22cacece26ead
+        .quad 0xbfc240b47a950f79
+        .quad 0xbfc254b4d35e7d3c
+        .quad 0xbfc268adfc6c773e
+        .quad 0xbfc27c9ffae729c1
+        .quad 0xbfc2908ad3f13603
+        .quad 0xbfc2a46e8ca7ba2a
+        .quad 0xbfc2b84b2a225923
+        .quad 0xbfc2cc20b1734279
+        .quad 0xbfc2dfef27a73a18
+        .quad 0xbfc2f3b691c5a001
+        .quad 0xbfc30776f4d077f7
+        .quad 0xbfc31b3055c47118
+        .quad 0xbfc32ee2b998ed6e
+        .quad 0xbfc3428e2540096d
+        .quad 0x3fc331f403985097
+        .quad 0x3fc31e56798a910a
+        .quad 0x3fc30abfd8f333b6
+        .quad 0x3fc2f7301cf4e87b
+        .quad 0x3fc2e3a740b7800f
+        .quad 0x3fc2d0253f67e4cb
+        .quad 0x3fc2bcaa14381386
+        .quad 0x3fc2a935ba5f1479
+        .quad 0x3fc295c82d18f434
+        .quad 0x3fc2826167a6bc9c
+        .quad 0x3fc26f01654e6df6
+        .quad 0x3fc25ba8215af7fc
+        .quad 0x3fc24855971c3307
+        .quad 0x3fc23509c1e6d937
+        .quad 0x3fc221c49d147fb3
+        .quad 0x3fc20e8624038fed
+        .quad 0x3fc1fb4e521740f4
+        .quad 0x3fc1e81d22b790d4
+        .quad 0x3fc1d4f291513e01
+        .quad 0x3fc1c1ce9955c0c6
+        .quad 0x3fc1aeb1363b44c8
+        .quad 0x3fc19b9a637ca295
+        .quad 0x3fc1888a1c995931
+        .quad 0x3fc175805d1587c1
+        .quad 0x3fc1627d2079e731
+        .quad 0x3fc14f806253c3ed
+        .quad 0x3fc13c8a1e34f7a0
+        .quad 0x3fc1299a4fb3e306
+        .quad 0x3fc116b0f26b67bb
+        .quad 0x3fc103ce01fae223
+        .quad 0x3fc0f0f17a062353
+        .quad 0x3fc0de1b56356b04
+        .quad 0x3fc0cb4b9235619a
+        .quad 0x3fc0b88229b71227
+        .quad 0x3fc0a5bf186fe483
+        .quad 0x3fc093025a19976c
+        .quad 0x3fc0804bea723aa9
+        .quad 0x3fc06d9bc53c2941
+        .quad 0x3fc05af1e63e03b4
+        .quad 0x3fc0484e4942aa43
+        .quad 0x3fc035b0ea19373b
+        .quad 0x3fc02319c494f951
+        .quad 0x3fc01088d48d6e03
+        .quad 0x3fbffbfc2bbc7803
+        .quad 0x3fbfd6f308ce5b52
+        .quad 0x3fbfb1f6381856f4
+        .quad 0x3fbf8d05b16a6d47
+        .quad 0x3fbf68216c9cc727
+        .quad 0x3fbf4349618fa91a
+        .quad 0x3fbf1e7d882b689a
+        .quad 0x3fbef9bdd860616b
+        .quad 0x3fbed50a4a26eafc
+        .quad 0x3fbeb062d57f4de8
+        .quad 0x3fbe8bc77271b97a
+        .quad 0x3fbe6738190e394c
+        .quad 0x3fbe42b4c16caaf3
+        .quad 0x3fbe1e3d63acb3ba
+        .quad 0x3fbdf9d1f7f5b674
+        .quad 0x3fbdd5727676c959
+        .quad 0x3fbdb11ed766abf4
+        .quad 0x3fbd8cd71303bd26
+        .quad 0x3fbd689b2193f133
+        .quad 0x3fbd446afb64c7e5
+        .quad 0x3fbd204698cb42bd
+        .quad 0x3fbcfc2df223db2d
+        .quad 0x3fbcd820ffd278f3
+        .quad 0x3fbcb41fba42686d
+        .quad 0x3fbc902a19e65111
+        .quad 0x3fbc6c4017382bea
+        .quad 0x3fbc4861aab93a23
+        .quad 0x3fbc248eccf1fba6
+        .quad 0x3fbc00c7767225cb
+        .quad 0x3fbbdd0b9fd09a10
+        .quad 0x3fbbb95b41ab5ce6
+        .quad 0x3fbb95b654a78c87
+        .quad 0x3fbb721cd17157e3
+        .quad 0x3fbb4e8eb0bbf58f
+        .quad 0x3fbb2b0beb419ad0
+        .quad 0x3fbb079479c372ad
+        .quad 0x3fbae4285509950b
+        .quad 0x3fbac0c775e2fde6
+        .quad 0x3fba9d71d5258484
+        .quad 0x3fba7a276badd2c8
+        .quad 0x3fba56e8325f5c87
+        .quad 0x3fba33b4222456f1
+        .quad 0x3fba108b33edb005
+        .quad 0x3fb9ed6d60b30612
+        .quad 0x3fb9ca5aa1729f45
+        .quad 0x3fb9a752ef316149
+        .quad 0x3fb9845642fac8f0
+        .quad 0x3fb9616495e0e1e8
+        .quad 0x3fb93e7de0fc3e80
+        .quad 0x3fb91ba21d6bef77
+        .quad 0x3fb8f8d144557bdf
+        .quad 0x3fb8d60b4ee4d901
+        .quad 0x3fb8b350364c6257
+        .quad 0x3fb8909ff3c4d191
+        .quad 0x3fb86dfa808d36a0
+        .quad 0x3fb84b5fd5eaefd8
+        .quad 0x3fb828cfed29a215
+        .quad 0x3fb8064abf9b30f1
+        .quad 0x3fb7e3d04697b704
+        .quad 0x3fb7c1607b7d7e32
+        .quad 0x3fb79efb57b0f803
+        .quad 0x3fb77ca0d49cb608
+        .quad 0x3fb75a50ebb1624a
+        .quad 0x3fb7380b9665b7c8
+        .quad 0x3fb715d0ce367afc
+        .quad 0x3fb6f3a08ca67270
+        .quad 0x3fb6d17acb3e5f5e
+        .quad 0x3fb6af5f838cf654
+        .quad 0x3fb68d4eaf26d7ee
+        .quad 0x3fb66b4847a68997
+        .quad 0x3fb6494c46ac6e4d
+        .quad 0x3fb6275aa5debf81
+        .quad 0x3fb605735ee985f1
+        .quad 0x3fb5e3966b7e9295
+        .quad 0x3fb5c1c3c5557799
+        .quad 0x3fb59ffb662b815c
+        .quad 0x3fb57e3d47c3af7b
+        .quad 0x3fb55c8963e6adeb
+        .quad 0x3fb53adfb462ce16
+        .quad 0x3fb51940330c000b
+        .quad 0x3fb4f7aad9bbcbaf
+        .quad 0x3fb4d61fa2514a00
+        .quad 0x3fb4b49e86b11e5f
+        .quad 0x3fb4932780c56fe2
+        .quad 0x3fb471ba8a7de2b7
+        .quad 0x3fb450579dcf9186
+        .quad 0x3fb42efeb4b506e9
+        .quad 0x3fb40dafc92e36e2
+        .quad 0x3fb3ec6ad5407868
+        .quad 0x3fb3cb2fd2f67ef1
+        .quad 0x3fb3a9febc60540a
+        .quad 0x3fb388d78b9350ff
+        .quad 0x3fb367ba3aaa1883
+        .quad 0x3fb346a6c3c49066
+        .quad 0x3fb3259d2107db54
+        .quad 0x3fb3049d4c9e52a0
+        .quad 0x3fb2e3a740b7800f
+        .quad 0x3fb2c2baf78817b7
+        .quad 0x3fb2a1d86b49f1e2
+        .quad 0x3fb280ff963c04fc
+        .quad 0x3fb2603072a25f82
+        .quad 0x3fb23f6afac6220a
+        .quad 0x3fb21eaf28f57941
+        .quad 0x3fb1fdfcf7839804
+        .quad 0x3fb1dd5460c8b16f
+        .quad 0x3fb1bcb55f21f307
+        .quad 0x3fb19c1fecf17ee0
+        .quad 0x3fb17b94049e65d0
+        .quad 0x3fb15b11a094a1aa
+        .quad 0x3fb13a98bb450f81
+        .quad 0x3fb11a294f2569f6
+        .quad 0x3fb0f9c356b04389
+        .quad 0x3fb0d966cc6500fa
+        .quad 0x3fb0b913aac7d3a7
+        .quad 0x3fb098c9ec61b3ff
+        .quad 0x3fb078898bc05bf4
+        .quad 0x3fb0585283764178
+        .quad 0x3fb03824ce1a9101
+        .quad 0x3fb0180066492817
+        .quad 0x3fafefca8d451fd6
+        .quad 0x3fafafa6d397efdb
+        .quad 0x3faf6f9594de60f0
+        .quad 0x3faf2f96c6754aee
+        .quad 0x3faeefaa5dc2b239
+        .quad 0x3faeafd05035bd3b
+        .quad 0x3fae70089346a9e6
+        .quad 0x3fae30531c76c34a
+        .quad 0x3fadf0afe1505738
+        .quad 0x3fadb11ed766abf4
+        .quad 0x3fad719ff455f5f7
+        .quad 0x3fad32332dc34dbd
+        .quad 0x3facf2d8795ca5a5
+        .quad 0x3facb38fccd8bfdb
+        .quad 0x3fac74591df72456
+        .quad 0x3fac3534628016dd
+        .quad 0x3fabf62190448d22
+        .quad 0x3fabb7209d1e24e5
+        .quad 0x3fab78317eef1a29
+        .quad 0x3fab39542ba23d73
+        .quad 0x3faafa88992aea19
+        .quad 0x3faabbcebd84fca0
+        .quad 0x3faa7d268eb4c924
+        .quad 0x3faa3e9002c711d2
+        .quad 0x3faa000b0fd0fd6b
+        .quad 0x3fa9c197abf00dd7
+        .quad 0x3fa98335cd4a16c3
+        .quad 0x3fa944e56a0d3450
+        .quad 0x3fa906a6786fc1cb
+        .quad 0x3fa8c878eeb05074
+        .quad 0x3fa88a5cc3159e53
+        .quad 0x3fa84c51ebee8d15
+        .quad 0x3fa80e585f9218fc
+        .quad 0x3fa7d070145f4fd7
+        .quad 0x3fa7929900bd4809
+        .quad 0x3fa754d31b1b179c
+        .quad 0x3fa7171e59efcb5f
+        .quad 0x3fa6d97ab3ba5e10
+        .quad 0x3fa69be81f01af99
+        .quad 0x3fa65e6692547c4e
+        .quad 0x3fa620f604495440
+        .quad 0x3fa5e3966b7e9295
+        .quad 0x3fa5a647be9a54f6
+        .quad 0x3fa56909f44a72fe
+        .quad 0x3fa52bdd034475b8
+        .quad 0x3fa4eec0e2458f30
+        .quad 0x3fa4b1b588129203
+        .quad 0x3fa474baeb77e904
+        .quad 0x3fa437d103498eec
+        .quad 0x3fa3faf7c663060e
+        .quad 0x3fa3be2f2ba7501f
+        .quad 0x3fa381772a00e604
+        .quad 0x3fa344cfb861afae
+        .quad 0x3fa30838cdc2fbfd
+        .quad 0x3fa2cbb2612578b4
+        .quad 0x3fa28f3c69912a74
+        .quad 0x3fa252d6de1564c1
+        .quad 0x3fa21681b5c8c213
+        .quad 0x3fa1da3ce7c91bf8
+        .quad 0x3fa19e086b3b8333
+        .quad 0x3fa161e4374c37f4
+        .quad 0x3fa125d0432ea20e
+        .quad 0x3fa0e9cc861d4944
+        .quad 0x3fa0add8f759cd95
+        .quad 0x3fa071f58e2cdf9b
+        .quad 0x3fa0362241e638ec
+        .quad 0x3f9ff4be13b92920
+        .quad 0x3f9f7d57badb4ee8
+        .quad 0x3f9f061167fc31e8
+        .quad 0x3f9e8eeb09f2f6cb
+        .quad 0x3f9e17e48fa48962
+        .quad 0x3f9da0fde8038de9
+        .quad 0x3f9d2a3702105259
+        .quad 0x3f9cb38fccd8bfdb
+        .quad 0x3f9c3d0837784c41
+        .quad 0x3f9bc6a03117eb97
+        .quad 0x3f9b5057a8ee01ce
+        .quad 0x3f9ada2e8e3e546f
+        .quad 0x3f9a6424d059fc68
+        .quad 0x3f99ee3a5e9f57e8
+        .quad 0x3f99786f2879fc53
+        .quad 0x3f9902c31d62a843
+        .quad 0x3f988d362cdf359e
+        .quad 0x3f9817c846828bbd
+        .quad 0x3f97a27959ec91aa
+        .quad 0x3f972d4956ca2067
+        .quad 0x3f96b8382cd4f551
+        .quad 0x3f964345cbd3a491
+        .quad 0x3f95ce7223998b98
+        .quad 0x3f9559bd2406c3ba
+        .quad 0x3f94e526bd0814d1
+        .quad 0x3f9470aede96e7f2
+        .quad 0x3f93fc5578b93a38
+        .quad 0x3f93881a7b818f9e
+        .quad 0x3f9313fdd70ee5e8
+        .quad 0x3f929fff7b8ca79d
+        .quad 0x3f922c1f59329f1b
+        .quad 0x3f91b85d6044e9ae
+        .quad 0x3f9144b98113eac0
+        .quad 0x3f90d133abfc3f1b
+        .quad 0x3f905dcbd166b033
+        .quad 0x3f8fd503c3904f1d
+        .quad 0x3f8eeeab9b43445d
+        .quad 0x3f8e088f0b004827
+        .quad 0x3f8d22adf3f9579d
+        .quad 0x3f8c3d0837784c41
+        .quad 0x3f8b579db6dec358
+        .quad 0x3f8a726e53a6056e
+        .quad 0x3f898d79ef5eedf0
+        .quad 0x3f88a8c06bb1d2f4
+        .quad 0x3f87c441aa5e6d15
+        .quad 0x3f86dffd8d3bbf70
+        .quad 0x3f85fbf3f637ffc5
+        .quad 0x3f851824c7587eb0
+        .quad 0x3f84348fe2b99002
+        .quad 0x3f8351352a8e733f
+        .quad 0x3f826e1481213c2e
+        .quad 0x3f818b2dc8d2bb91
+        .quad 0x3f80a880e41a67f6
+        .quad 0x3f7f8c1b6b0c8d4e
+        .quad 0x3f7dc7a83f75a96d
+        .quad 0x3f7c03a80ae5e054
+        .quad 0x3f7a401a92ff827e
+        .quad 0x3f787cff9d9147a5
+        .quad 0x3f76ba56f09621bc
+        .quad 0x3f74f8205235102d
+        .quad 0x3f73365b88c0f347
+        .quad 0x3f7175085ab85ff0
+        .quad 0x3f6f684d1d8ae702
+        .quad 0x3f6be76bd77b4fc3
+        .quad 0x3f68676c71434fb9
+        .quad 0x3f64e84e793a474a
+        .quad 0x3f616a117e0d4b30
+        .quad 0x3f5bd96a1d7d9cbc
+        .quad 0x3f54e071754c98ba
+        .quad 0x3f4bd27045bfd025
+        .quad 0x3f3bcef518e29612
+        .quad 0x8000000000000000
+        /*== poly_coeff[5] ==*/
+        .align 32
+        .quad 0x3fb63C65231FBD16, 0x3fb63C65231FBD16, 0x3fb63C65231FBD16, 0x3fb63C65231FBD16 /* coeff5 */
+        .quad 0xbfbBCB7D4EFBE80B, 0xbfbBCB7D4EFBE80B, 0xbfbBCB7D4EFBE80B, 0xbfbBCB7D4EFBE80B /* coeff4 */
+        .quad 0x3fc287A7636F341E, 0x3fc287A7636F341E, 0x3fc287A7636F341E, 0x3fc287A7636F341E /* coeff3 */
+        .quad 0xbfcBCB7B1526DE36, 0xbfcBCB7B1526DE36, 0xbfcBCB7B1526DE36, 0xbfcBCB7B1526DE36 /* coeff2 */
+        .quad 0x3fdBCB7B1526E50E, 0x3fdBCB7B1526E50E, 0x3fdBCB7B1526E50E, 0x3fdBCB7B1526E50E /* coeff1 */
+        /*== ExpMask ==*/
+        .align 32
+        .quad 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff
+        /*== Two10 ==*/
+        .align 32
+        .quad 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000
+        /*== MinNorm ==*/
+        .align 32
+        .quad 0x0010000000000000, 0x0010000000000000, 0x0010000000000000, 0x0010000000000000
+        /*== MaxNorm ==*/
+        .align 32
+        .quad 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff
+        /*== HalfMask ==*/
+        .align 32
+        .quad 0xfffffffffc000000, 0xfffffffffc000000, 0xfffffffffc000000, 0xfffffffffc000000
+        /*== One ==*/
+        .align 32
+        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
+        /*== Threshold ==*/
+        .align 32
+        .quad 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000
+        /*== Bias ==*/
+        .align 32
+        .quad 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000
+        /*== Bias1 ==*/
+        .align 32
+        .quad 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000
+        /*== L2 ==*/
+        .align 32
+        .quad 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff
+        .align 32
+        .type	__svml_dlog10_data_internal,@object
+        .size	__svml_dlog10_data_internal,.-__svml_dlog10_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log108_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log108_core-avx2.S
new file mode 100644
index 0000000000..3432e7cffe
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log108_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized log10, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN8v_log10 _ZGVeN8v_log10_avx2_wrapper
+#include "../svml_d_log108_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log108_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log108_core.c
new file mode 100644
index 0000000000..273a0d4739
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log108_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized log10, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN8v_log10
+#include "ifunc-mathvec-avx512-skx.h"
+
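+/* The selector from ifunc-mathvec-avx512-skx.h is expected to return the
+   AVX-512 (_skx) implementation when the required CPU features are
+   available at run time, and the _avx2_wrapper fallback otherwise.  */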
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN8v_log10, __GI__ZGVeN8v_log10, __redirect__ZGVeN8v_log10)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log108_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log108_core_avx512.S
new file mode 100644
index 0000000000..0799f99eba
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log108_core_avx512.S
@@ -0,0 +1,299 @@
+/* Function log10 vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
+ *    R = Rcp*x - 1.0
+ *    log10(x) = k*log10(2.0) - log10(Rcp) + poly_approximation(R)
+ *       log10(Rcp) is tabulated
+ *
+ *
+ */
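+
+/* Illustrative scalar sketch of the formula above (not the implementation;
+ * short_rcp(), log10_rcp_tbl() and poly() are assumed helper names used
+ * only for this outline):
+ *
+ *      int k;
+ *      double m   = frexp (x, &k);       // x = m * 2^k, m in [0.5, 1)
+ *      double rcp = short_rcp (m);       // Rcp ~ 1/m, a few fractional bits
+ *      double r   = rcp * m - 1.0;       // reduced argument R
+ *      return k * log10 (2.0) - log10_rcp_tbl (rcp) + poly (r);
+ */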
+
+/* Offsets for data table __svml_dlog10_data_internal_avx512
+ */
+#define Log_tbl                       	0
+#define One                           	128
+#define C075                          	192
+#define poly_coeff9                   	256
+#define poly_coeff8                   	320
+#define poly_coeff7                   	384
+#define poly_coeff6                   	448
+#define poly_coeff5                   	512
+#define poly_coeff4                   	576
+#define poly_coeff3                   	640
+#define poly_coeff2                   	704
+#define poly_coeff1                   	768
+#define L2                            	832
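+
+/* Each macro above is the byte offset of the corresponding field in the
+   data table emitted at the end of this file, e.g.
+   poly_coeff5+__svml_dlog10_data_internal_avx512(%rip) addresses the
+   broadcast coeff5 constants at byte offset 512.  */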
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN8v_log10_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovaps   %zmm0, %zmm7
+        vgetmantpd $8, {sae}, %zmm7, %zmm6
+        vmovups   One+__svml_dlog10_data_internal_avx512(%rip), %zmm3
+        vmovups   poly_coeff5+__svml_dlog10_data_internal_avx512(%rip), %zmm12
+        vmovups   poly_coeff3+__svml_dlog10_data_internal_avx512(%rip), %zmm13
+
+/* Start polynomial evaluation */
+        vmovups   poly_coeff9+__svml_dlog10_data_internal_avx512(%rip), %zmm10
+        vmovups   poly_coeff8+__svml_dlog10_data_internal_avx512(%rip), %zmm1
+        vmovups   poly_coeff7+__svml_dlog10_data_internal_avx512(%rip), %zmm11
+        vmovups   poly_coeff6+__svml_dlog10_data_internal_avx512(%rip), %zmm14
+
+/* Prepare exponent correction: DblRcp<0.75? */
+        vmovups   C075+__svml_dlog10_data_internal_avx512(%rip), %zmm2
+
+/* Table lookup */
+        vmovups   __svml_dlog10_data_internal_avx512(%rip), %zmm5
+
+/* GetExp(x) */
+        vgetexppd {sae}, %zmm7, %zmm0
+
+/* DblRcp ~ 1/Mantissa */
+        vrcp14pd  %zmm6, %zmm8
+
+/* x<=0? */
+        vfpclasspd $94, %zmm7, %k0
+
+/* round DblRcp to 4 fractional bits (RN mode, no Precision exception) */
+        vrndscalepd $88, {sae}, %zmm8, %zmm4
+        vmovups   poly_coeff4+__svml_dlog10_data_internal_avx512(%rip), %zmm8
+        kmovw     %k0, %edx
+
+/* Reduced argument: R = DblRcp*Mantissa - 1 */
+        vfmsub213pd {rn-sae}, %zmm3, %zmm4, %zmm6
+        vcmppd    $17, {sae}, %zmm2, %zmm4, %k1
+        vfmadd231pd {rn-sae}, %zmm6, %zmm12, %zmm8
+        vmovups   poly_coeff2+__svml_dlog10_data_internal_avx512(%rip), %zmm12
+        vfmadd231pd {rn-sae}, %zmm6, %zmm10, %zmm1
+        vfmadd231pd {rn-sae}, %zmm6, %zmm11, %zmm14
+        vmovups   poly_coeff1+__svml_dlog10_data_internal_avx512(%rip), %zmm2
+
+/* R^2 */
+        vmulpd    {rn-sae}, %zmm6, %zmm6, %zmm15
+        vfmadd231pd {rn-sae}, %zmm6, %zmm13, %zmm12
+
+/* Prepare table index */
+        vpsrlq    $48, %zmm4, %zmm9
+
+/* add 1 to Expon if DblRcp<0.75 */
+        vaddpd    {rn-sae}, %zmm3, %zmm0, %zmm0{%k1}
+        vmulpd    {rn-sae}, %zmm15, %zmm15, %zmm13
+        vfmadd213pd {rn-sae}, %zmm14, %zmm15, %zmm1
+        vfmadd213pd {rn-sae}, %zmm12, %zmm15, %zmm8
+        vpermt2pd Log_tbl+64+__svml_dlog10_data_internal_avx512(%rip), %zmm9, %zmm5
+
+/* polynomial */
+        vfmadd213pd {rn-sae}, %zmm8, %zmm13, %zmm1
+        vfmadd213pd {rn-sae}, %zmm2, %zmm6, %zmm1
+        vfmadd213pd {rn-sae}, %zmm5, %zmm1, %zmm6
+        vmovups   L2+__svml_dlog10_data_internal_avx512(%rip), %zmm1
+        vfmadd213pd {rn-sae}, %zmm6, %zmm1, %zmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm7
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
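+/* In effect this branch runs the scalar loop below over the copies of
+ * %zmm7 (input) and %zmm0 (result) that it spills to the stack, with one
+ * bit of %edx flagging each special lane (array names are illustrative):
+ *
+ *      for (int i = 0; i < 8; i++)
+ *        if (mask & (1 << i))
+ *          result[i] = log10 (input[i]);   // scalar call through the PLT
+ *
+ * Every special-input path in this patch follows the same structure, with
+ * the lane count and the scalar callee adjusted per function.
+ */
+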
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm7, 64(%rsp)
+        vmovups   %zmm0, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     64(%rsp,%r14,8), %xmm0
+        call      log10@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 128(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN8v_log10_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_dlog10_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 Log_tbl[16][2];
+        __declspec(align(64)) VUINT32 One[8][2];
+        __declspec(align(64)) VUINT32 C075[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff9[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff8[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff7[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff1[8][2];
+        __declspec(align(64)) VUINT32 L2[8][2];
+   } __svml_dlog10_data_internal_avx512;
+#endif
+__svml_dlog10_data_internal_avx512:
+        /*== Log_tbl ==*/
+        .quad 0x0000000000000000
+        .quad 0xbf9af5f92b00e610
+        .quad 0xbfaa30a9d609efea
+        .quad 0xbfb31b3055c47118
+        .quad 0xbfb8cf183886480d
+        .quad 0xbfbe3bc1ab0e19fe
+        .quad 0xbfc1b3e71ec94f7b
+        .quad 0xbfc42c7e7fe3fc02
+        .quad 0x3fbffbfc2bbc7803
+        .quad 0x3fbb721cd17157e3
+        .quad 0x3fb715d0ce367afc
+        .quad 0x3fb2e3a740b7800f
+        .quad 0x3fadb11ed766abf4
+        .quad 0x3fa5e3966b7e9295
+        .quad 0x3f9cb38fccd8bfdb
+        .quad 0x3f8c3d0837784c41
+        /*== One ==*/
+        .align 64
+        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
+        /*== 0.75 ==*/
+        .align 64
+        .quad 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000
+        /*== poly_coeff9 ==*/
+        .align 64
+        .quad 0x3fa8c2d828480370, 0x3fa8c2d828480370, 0x3fa8c2d828480370, 0x3fa8c2d828480370, 0x3fa8c2d828480370, 0x3fa8c2d828480370, 0x3fa8c2d828480370, 0x3fa8c2d828480370
+        /*== poly_coeff8 ==*/
+        .align 64
+        .quad 0xbfabd80d96029814, 0xbfabd80d96029814, 0xbfabd80d96029814, 0xbfabd80d96029814, 0xbfabd80d96029814, 0xbfabd80d96029814, 0xbfabd80d96029814, 0xbfabd80d96029814
+        /*== poly_coeff7 ==*/
+        .align 64
+        .quad 0x3fafc3f6f38b58a2, 0x3fafc3f6f38b58a2, 0x3fafc3f6f38b58a2, 0x3fafc3f6f38b58a2, 0x3fafc3f6f38b58a2, 0x3fafc3f6f38b58a2, 0x3fafc3f6f38b58a2, 0x3fafc3f6f38b58a2
+        /*== poly_coeff6 ==*/
+        .align 64
+        .quad 0xbfb287a63464dc80, 0xbfb287a63464dc80, 0xbfb287a63464dc80, 0xbfb287a63464dc80, 0xbfb287a63464dc80, 0xbfb287a63464dc80, 0xbfb287a63464dc80, 0xbfb287a63464dc80
+        /*== poly_coeff5 ==*/
+        .align 64
+        .quad 0x3fb63c62777f27d9, 0x3fb63c62777f27d9, 0x3fb63c62777f27d9, 0x3fb63c62777f27d9, 0x3fb63c62777f27d9, 0x3fb63c62777f27d9, 0x3fb63c62777f27d9, 0x3fb63c62777f27d9
+        /*== poly_coeff4 ==*/
+        .align 64
+        .quad 0xbfbbcb7b153c06a3, 0xbfbbcb7b153c06a3, 0xbfbbcb7b153c06a3, 0xbfbbcb7b153c06a3, 0xbfbbcb7b153c06a3, 0xbfbbcb7b153c06a3, 0xbfbbcb7b153c06a3, 0xbfbbcb7b153c06a3
+        /*== poly_coeff3 ==*/
+        .align 64
+        .quad 0x3fc287a7636f428c, 0x3fc287a7636f428c, 0x3fc287a7636f428c, 0x3fc287a7636f428c, 0x3fc287a7636f428c, 0x3fc287a7636f428c, 0x3fc287a7636f428c, 0x3fc287a7636f428c
+        /*== poly_coeff2 ==*/
+        .align 64
+        .quad 0xbfcbcb7b1526e4db, 0xbfcbcb7b1526e4db, 0xbfcbcb7b1526e4db, 0xbfcbcb7b1526e4db, 0xbfcbcb7b1526e4db, 0xbfcbcb7b1526e4db, 0xbfcbcb7b1526e4db, 0xbfcbcb7b1526e4db
+        /*== poly_coeff1 ==*/
+        .align 64
+        .quad 0x3fdbcb7b1526e50e, 0x3fdbcb7b1526e50e, 0x3fdbcb7b1526e50e, 0x3fdbcb7b1526e50e, 0x3fdbcb7b1526e50e, 0x3fdbcb7b1526e50e, 0x3fdbcb7b1526e50e, 0x3fdbcb7b1526e50e
+        /*== L2 ==*/
+        .align 64
+        .quad 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff
+        .align 64
+        .type	__svml_dlog10_data_internal_avx512,@object
+        .size	__svml_dlog10_data_internal_avx512,.-__svml_dlog10_data_internal_avx512
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core-avx2.S
new file mode 100644
index 0000000000..e389e2eca1
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized log10f, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN16v_log10f _ZGVeN16v_log10f_avx2_wrapper
+#include "../svml_s_log10f16_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core.c
new file mode 100644
index 0000000000..274fc7e0ff
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized log10f, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN16v_log10f
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN16v_log10f, __GI__ZGVeN16v_log10f,
+	       __redirect__ZGVeN16v_log10f)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core_avx512.S
new file mode 100644
index 0000000000..3dffd662ab
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core_avx512.S
@@ -0,0 +1,238 @@
+/* Function log10f vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
+ *    R = Rcp*x - 1.0
+ *    log10(x) = k*log10(2.0) - log10(Rcp) + poly_approximation(R)
+ *       log10(Rcp) is tabulated
+ *
+ *
+ */
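+
+/* For this single-precision path the reduced argument is simply
+ * R = mant(x) - 1 and the four polynomial coefficients are selected per
+ * lane with vpermps, indexed by the leading mantissa bits.  A rough scalar
+ * reading (bits(), getmant(), getexp() and the c1..c4 arrays are
+ * illustrative names, not part of the implementation):
+ *
+ *      float m = getmant (x);                  // normalized mantissa
+ *      float k = getexp (x) - getexp (m);      // exponent of x
+ *      int   i = (bits (m) >> 19) & 0xf;       // sub-interval index
+ *      float r = m - 1.0f;
+ *      float p = ((c4[i] * r + c3[i]) * r + c2[i]) * r + c1[i];
+ *      return r * p + k * L2;                  // L2 = log10(2) from the table
+ */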
+
+/* Offsets for data table __svml_slog10_data_internal_avx512
+ */
+#define One                           	0
+#define coeff4                        	64
+#define coeff3                        	128
+#define coeff2                        	192
+#define coeff1                        	256
+#define L2                            	320
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN16v_log10f_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vgetmantps $11, {sae}, %zmm0, %zmm3
+        vmovups   __svml_slog10_data_internal_avx512(%rip), %zmm1
+        vgetexpps {sae}, %zmm0, %zmm5
+        vmovups   L2+__svml_slog10_data_internal_avx512(%rip), %zmm10
+        vpsrld    $19, %zmm3, %zmm7
+        vgetexpps {sae}, %zmm3, %zmm6
+        vsubps    {rn-sae}, %zmm1, %zmm3, %zmm11
+        vpermps   coeff4+__svml_slog10_data_internal_avx512(%rip), %zmm7, %zmm1
+        vpermps   coeff3+__svml_slog10_data_internal_avx512(%rip), %zmm7, %zmm2
+        vsubps    {rn-sae}, %zmm6, %zmm5, %zmm9
+        vpermps   coeff2+__svml_slog10_data_internal_avx512(%rip), %zmm7, %zmm4
+        vpermps   coeff1+__svml_slog10_data_internal_avx512(%rip), %zmm7, %zmm8
+
+/* x<=0? */
+        vfpclassps $94, %zmm0, %k0
+        vfmadd213ps {rn-sae}, %zmm2, %zmm11, %zmm1
+        vmulps    {rn-sae}, %zmm10, %zmm9, %zmm12
+        vfmadd213ps {rn-sae}, %zmm4, %zmm11, %zmm1
+        kmovw     %k0, %edx
+        vfmadd213ps {rn-sae}, %zmm8, %zmm11, %zmm1
+        vfmadd213ps {rn-sae}, %zmm12, %zmm11, %zmm1
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        vmovaps   %zmm1, %zmm0
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm0, 64(%rsp)
+        vmovups   %zmm1, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm1
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $16, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm1
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm1
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     64(%rsp,%r14,4), %xmm0
+        call      log10f@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 128(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN16v_log10f_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_slog10_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 One[16][1];
+        __declspec(align(64)) VUINT32 coeff4[16][1];
+        __declspec(align(64)) VUINT32 coeff3[16][1];
+        __declspec(align(64)) VUINT32 coeff2[16][1];
+        __declspec(align(64)) VUINT32 coeff1[16][1];
+        __declspec(align(64)) VUINT32 L2[16][1];
+    } __svml_slog10_data_internal_avx512;
+#endif
+__svml_slog10_data_internal_avx512:
+        /*== One ==*/
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        // c4
+        .align 64
+        .long 0xbdc9ae9b, 0xbda6fcf4
+        .long 0xbd8bac76, 0xbd6bca30
+        .long 0xbd48a99b, 0xbd2c0a9f
+        .long 0xbd1480db, 0xbd00faf2
+        .long 0xbe823aa9, 0xbe656348
+        .long 0xbe4afbb9, 0xbe346895
+        .long 0xbe20ffff, 0xbe103a0b
+        .long 0xbe01a91c, 0xbde9e84e
+        // c3
+        .align 64
+        .long 0x3e13d888, 0x3e10a87c
+        .long 0x3e0b95c3, 0x3e057f0b
+        .long 0x3dfde038, 0x3df080d9
+        .long 0x3de34c1e, 0x3dd68333
+        .long 0x3dac6e8e, 0x3dd54a51
+        .long 0x3df30f40, 0x3e04235d
+        .long 0x3e0b7033, 0x3e102c90
+        .long 0x3e12ebad, 0x3e141ff8
+        // c2
+        .align 64
+        .long 0xbe5e5a9b, 0xbe5e2677
+        .long 0xbe5d83f5, 0xbe5c6016
+        .long 0xbe5abd0b, 0xbe58a6fd
+        .long 0xbe562e02, 0xbe5362f8
+        .long 0xbe68e27c, 0xbe646747
+        .long 0xbe619a73, 0xbe5ff05a
+        .long 0xbe5f0570, 0xbe5e92d0
+        .long 0xbe5e662b, 0xbe5e5c08
+        // c1
+        .align 64
+        .long 0x3ede5bd8, 0x3ede5b45
+        .long 0x3ede57d8, 0x3ede4eb1
+        .long 0x3ede3d37, 0x3ede2166
+        .long 0x3eddf9d9, 0x3eddc5bb
+        .long 0x3ede08ed, 0x3ede32e7
+        .long 0x3ede4967, 0x3ede5490
+        .long 0x3ede597f, 0x3ede5b50
+        .long 0x3ede5bca, 0x3ede5bd9
+        /*== L2 ==*/
+        .align 64
+        .long 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b
+        .align 64
+        .type	__svml_slog10_data_internal_avx512,@object
+        .size	__svml_slog10_data_internal_avx512,.-__svml_slog10_data_internal_avx512
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core-sse2.S
new file mode 100644
index 0000000000..bb1cdee37e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized log10f, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN4v_log10f _ZGVbN4v_log10f_sse2
+#include "../svml_s_log10f4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core.c
new file mode 100644
index 0000000000..67e9e71a76
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized log10f, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN4v_log10f
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN4v_log10f, __GI__ZGVbN4v_log10f,
+	       __redirect__ZGVbN4v_log10f)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core_sse4.S
new file mode 100644
index 0000000000..88b3535d5c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core_sse4.S
@@ -0,0 +1,243 @@
+/* Function log10f vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
+ *    R = Rcp*x - 1.0
+ *    log10(x) = k*log10(2.0) - log10(Rcp) + poly_approximation(R)
+ *       log10(Rcp) is tabulated
+ *
+ *
+ */
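+
+/* Rough scalar outline of the reduction used below (bits()/to_float() stand
+ * for reinterpreting a float as its 32-bit pattern and back; poly() is the
+ * 9-coefficient sPoly evaluation):
+ *
+ *      uint32_t u = bits (x) - bits (2.0f / 3.0f);        // iBrkValue
+ *      int      n = (int32_t) u >> 23;                    // exponent part
+ *      float    m = to_float ((u & 0x007fffff)            // iOffExpoMask
+ *                             + bits (2.0f / 3.0f));      // m in [2/3, 4/3)
+ *      float    r = m - 1.0f;
+ *      return n * L2H + (n * L2L + r * poly (r));         // poly ~ log10(1+r)/r
+ *
+ * L2H (0x3e9a2100) and L2L (0xb64AF600) split log10(2) into a head with
+ * zeroed trailing mantissa bits, so n*L2H is exact, plus a small correction.
+ */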
+
+/* Offsets for data table __svml_slog10_data_internal
+ */
+#define MinNorm                       	0
+#define MaxNorm                       	16
+#define L2H                           	32
+#define L2L                           	48
+#define iBrkValue                     	64
+#define iOffExpoMask                  	80
+#define One                           	96
+#define sPoly                         	112
+#define L2                            	256
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN4v_log10f_sse4)
+        subq      $72, %rsp
+        cfi_def_cfa_offset(80)
+        movaps    %xmm0, %xmm1
+
+/* reduction: compute r,n */
+        movdqu    iBrkValue+__svml_slog10_data_internal(%rip), %xmm2
+        movaps    %xmm0, %xmm4
+        movdqu    iOffExpoMask+__svml_slog10_data_internal(%rip), %xmm10
+        psubd     %xmm2, %xmm1
+        pand      %xmm1, %xmm10
+        psrad     $23, %xmm1
+        paddd     %xmm2, %xmm10
+        movaps    %xmm0, %xmm3
+        movups    sPoly+__svml_slog10_data_internal(%rip), %xmm5
+        movups    sPoly+32+__svml_slog10_data_internal(%rip), %xmm6
+        movups    sPoly+64+__svml_slog10_data_internal(%rip), %xmm7
+        movups    sPoly+96+__svml_slog10_data_internal(%rip), %xmm9
+        cvtdq2ps  %xmm1, %xmm12
+        cmpltps   MinNorm+__svml_slog10_data_internal(%rip), %xmm4
+        cmpnleps  MaxNorm+__svml_slog10_data_internal(%rip), %xmm3
+        subps     One+__svml_slog10_data_internal(%rip), %xmm10
+        mulps     %xmm10, %xmm5
+        movaps    %xmm10, %xmm8
+        mulps     %xmm10, %xmm6
+        mulps     %xmm10, %xmm8
+        addps     sPoly+16+__svml_slog10_data_internal(%rip), %xmm5
+        mulps     %xmm10, %xmm7
+        addps     sPoly+48+__svml_slog10_data_internal(%rip), %xmm6
+        mulps     %xmm10, %xmm9
+        mulps     %xmm8, %xmm5
+        addps     sPoly+80+__svml_slog10_data_internal(%rip), %xmm7
+        addps     sPoly+112+__svml_slog10_data_internal(%rip), %xmm9
+        addps     %xmm5, %xmm6
+        mulps     %xmm8, %xmm6
+        orps      %xmm3, %xmm4
+
+/* combine and get argument value range mask */
+        movmskps  %xmm4, %edx
+        movups    L2L+__svml_slog10_data_internal(%rip), %xmm1
+        addps     %xmm6, %xmm7
+        mulps     %xmm12, %xmm1
+        mulps     %xmm7, %xmm8
+        movups    L2H+__svml_slog10_data_internal(%rip), %xmm11
+        addps     %xmm8, %xmm9
+        mulps     %xmm11, %xmm12
+        mulps     %xmm10, %xmm9
+        addps     sPoly+128+__svml_slog10_data_internal(%rip), %xmm9
+        mulps     %xmm9, %xmm10
+        addps     %xmm10, %xmm1
+        addps     %xmm12, %xmm1
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm1
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movaps    %xmm1, %xmm0
+        addq      $72, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(80)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm0, 32(%rsp)
+        movups    %xmm1, 48(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 edx
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -64)
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -72)
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm1
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -64)
+        cfi_offset(13, -72)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm1
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      log10f@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 48(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN4v_log10f_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_slog10_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(16)) VUINT32 MinNorm[4][1];
+        __declspec(align(16)) VUINT32 MaxNorm[4][1];
+        __declspec(align(16)) VUINT32 L2H[4][1];
+        __declspec(align(16)) VUINT32 L2L[4][1];
+        __declspec(align(16)) VUINT32 iBrkValue[4][1];
+        __declspec(align(16)) VUINT32 iOffExpoMask[4][1];
+        __declspec(align(16)) VUINT32 One[4][1];
+        __declspec(align(16)) VUINT32 sPoly[9][4][1];
+        __declspec(align(16)) VUINT32 L2[4][1];
+} __svml_slog10_data_internal;
+#endif
+__svml_slog10_data_internal:
+        /*== MinNorm ==*/
+        .long 0x00800000, 0x00800000, 0x00800000, 0x00800000
+        /*== MaxNorm ==*/
+        .align 16
+        .long 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff
+        /*== L2H ==*/
+        .align 16
+        .long 0x3e9a2100, 0x3e9a2100, 0x3e9a2100, 0x3e9a2100
+        /*== L2L ==*/
+        .align 16
+        .long 0xb64AF600, 0xb64AF600, 0xb64AF600, 0xb64AF600
+        /*== iBrkValue = SP 2/3 ==*/
+        .align 16
+        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
+        /*== iOffExpoMask = SP significand mask ==*/
+        .align 16
+        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
+        /*== sOne = SP 1.0 ==*/
+        .align 16
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== spoly[9] ==*/
+        .align 16
+        .long 0x3d8063B4, 0x3d8063B4, 0x3d8063B4, 0x3d8063B4 /* coeff9 */
+        .long 0xbd890073, 0xbd890073, 0xbd890073, 0xbd890073 /* coeff8 */
+        .long 0x3d775317, 0x3d775317, 0x3d775317, 0x3d775317 /* coeff7 */
+        .long 0xbd91FB27, 0xbd91FB27, 0xbd91FB27, 0xbd91FB27 /* coeff6 */
+        .long 0x3dB20B96, 0x3dB20B96, 0x3dB20B96, 0x3dB20B96 /* coeff5 */
+        .long 0xbdDE6E20, 0xbdDE6E20, 0xbdDE6E20, 0xbdDE6E20 /* coeff4 */
+        .long 0x3e143CE5, 0x3e143CE5, 0x3e143CE5, 0x3e143CE5 /* coeff3 */
+        .long 0xbe5E5BC5, 0xbe5E5BC5, 0xbe5E5BC5, 0xbe5E5BC5 /* coeff2 */
+        .long 0x3eDE5BD9, 0x3eDE5BD9, 0x3eDE5BD9, 0x3eDE5BD9 /* coeff1 */
+        /*== L2 ==*/
+        .align 16
+        .long 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b
+        .align 16
+        .type	__svml_slog10_data_internal,@object
+        .size	__svml_slog10_data_internal,.-__svml_slog10_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core-sse.S
new file mode 100644
index 0000000000..e3467e5c90
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized log10f, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN8v_log10f _ZGVdN8v_log10f_sse_wrapper
+#include "../svml_s_log10f8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core.c
new file mode 100644
index 0000000000..bfd3ef6554
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized log10f, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN8v_log10f
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN8v_log10f, __GI__ZGVdN8v_log10f,
+	       __redirect__ZGVdN8v_log10f)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core_avx2.S
new file mode 100644
index 0000000000..58e26342e7
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core_avx2.S
@@ -0,0 +1,243 @@
+/* Function log10f vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
+ *    R = Rcp*x - 1.0
+ *    log10(x) = k*log10(2.0) - log10(Rcp) + poly_approximation(R)
+ *       log10(Rcp) is tabulated
+ *
+ *
+ */
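For readers following the data flow, here is a minimal scalar C sketch of the
reduction-plus-polynomial scheme this kernel implements.  The bit patterns are
the same values spelled out in __svml_slog10_data_internal further down, but
the helper names, the plain Horner ordering and the absence of special-input
handling (out-of-range lanes go through the scalar fallback in the real code)
are assumptions for exposition only, not a drop-in equivalent.

#include <stdint.h>
#include <string.h>

static inline float
asfloat (uint32_t u)		/* hypothetical helper, exposition only */
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

static float
log10f_sketch (float x)		/* hypothetical name, exposition only */
{
  uint32_t ix;
  memcpy (&ix, &x, sizeof ix);

  /* Reduce: x = m * 2^n with m in [2/3, 4/3).  Subtract iBrkValue
     (2/3), keep the significand bits, add iBrkValue back; the signed
     shift mirrors vpsrad.  */
  uint32_t t = ix - 0x3f2aaaab;
  int32_t n = (int32_t) t >> 23;
  float m = asfloat ((t & 0x007fffff) + 0x3f2aaaab);
  float r = m - 1.0f;

  /* sPoly coefficients, coeff9 down to coeff1, evaluated by Horner
     (the kernel uses a pairwise FMA ordering instead).  */
  static const uint32_t c[9] = {
    0x3d8063B4, 0xbd890073, 0x3d775317, 0xbd91FB27, 0x3dB20B96,
    0xbdDE6E20, 0x3e143CE5, 0xbe5E5BC5, 0x3eDE5BD9
  };
  float p = asfloat (c[0]);
  for (int i = 1; i < 9; i++)
    p = p * r + asfloat (c[i]);

  /* log10(x) ~= n*log10(2) + r*P(r), with log10(2) split into L2H
     (high part) and L2L (low correction).  */
  return (float) n * asfloat (0x3e9a2100)
	 + (r * p + (float) n * asfloat (0xb64AF600));
}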
+
+/* Offsets for data table __svml_slog10_data_internal
+ */
+#define MinNorm                       	0
+#define MaxNorm                       	32
+#define L2H                           	64
+#define L2L                           	96
+#define iBrkValue                     	128
+#define iOffExpoMask                  	160
+#define One                           	192
+#define sPoly                         	224
+#define L2                            	512
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN8v_log10f_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+
+/* reduction: compute r,n */
+        vmovups   iBrkValue+__svml_slog10_data_internal(%rip), %ymm4
+        vmovups   sPoly+__svml_slog10_data_internal(%rip), %ymm15
+        vmovups   sPoly+64+__svml_slog10_data_internal(%rip), %ymm9
+        vmovups   sPoly+128+__svml_slog10_data_internal(%rip), %ymm10
+        vmovups   sPoly+192+__svml_slog10_data_internal(%rip), %ymm12
+        vpsubd    %ymm4, %ymm0, %ymm1
+        vcmplt_oqps MinNorm+__svml_slog10_data_internal(%rip), %ymm0, %ymm5
+        vcmpnle_uqps MaxNorm+__svml_slog10_data_internal(%rip), %ymm0, %ymm6
+        vpand     iOffExpoMask+__svml_slog10_data_internal(%rip), %ymm1, %ymm3
+        vpsrad    $23, %ymm1, %ymm2
+        vpaddd    %ymm4, %ymm3, %ymm8
+        vcvtdq2ps %ymm2, %ymm1
+        vsubps    One+__svml_slog10_data_internal(%rip), %ymm8, %ymm13
+        vmulps    L2L+__svml_slog10_data_internal(%rip), %ymm1, %ymm14
+        vfmadd213ps sPoly+32+__svml_slog10_data_internal(%rip), %ymm13, %ymm15
+        vfmadd213ps sPoly+96+__svml_slog10_data_internal(%rip), %ymm13, %ymm9
+        vmulps    %ymm13, %ymm13, %ymm11
+        vfmadd213ps sPoly+160+__svml_slog10_data_internal(%rip), %ymm13, %ymm10
+        vfmadd213ps sPoly+224+__svml_slog10_data_internal(%rip), %ymm13, %ymm12
+        vfmadd213ps %ymm9, %ymm11, %ymm15
+        vfmadd213ps %ymm10, %ymm11, %ymm15
+        vfmadd213ps %ymm12, %ymm11, %ymm15
+        vfmadd213ps sPoly+256+__svml_slog10_data_internal(%rip), %ymm13, %ymm15
+        vfmadd213ps %ymm14, %ymm13, %ymm15
+        vorps     %ymm6, %ymm5, %ymm7
+
+/* combine and get argument value range mask */
+        vmovmskps %ymm7, %edx
+        vfmadd132ps L2H+__svml_slog10_data_internal(%rip), %ymm15, %ymm1
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm1
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        vmovaps   %ymm1, %ymm0
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %ymm0, 32(%rsp)
+        vmovups   %ymm1, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx ymm1
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   64(%rsp), %ymm1
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm1
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      log10f@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 64(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN8v_log10f_avx2)
+
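The special-input path above stores the argument and result vectors to the
stack, walks the 8-bit range mask one lane at a time, and calls the scalar
log10f through the PLT for every set bit, patching that lane of the result.
Roughly, it behaves like this hypothetical C loop (function and parameter
names invented for exposition):

#include <math.h>

/* Hypothetical equivalent of L(SPECIAL_VALUES_BRANCH): redo every
   lane whose range-mask bit is set with the scalar log10f.  */
static void
log10f_special_lanes (const float in[8], float out[8], unsigned int mask)
{
  for (unsigned int lane = 0; lane < 8; lane++)
    if (mask & (1u << lane))
      out[lane] = log10f (in[lane]);
}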
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_slog10_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(32)) VUINT32 MinNorm[8][1];
+        __declspec(align(32)) VUINT32 MaxNorm[8][1];
+        __declspec(align(32)) VUINT32 L2H[8][1];
+        __declspec(align(32)) VUINT32 L2L[8][1];
+        __declspec(align(32)) VUINT32 iBrkValue[8][1];
+        __declspec(align(32)) VUINT32 iOffExpoMask[8][1];
+        __declspec(align(32)) VUINT32 One[8][1];
+        __declspec(align(32)) VUINT32 sPoly[9][8][1];
+        __declspec(align(32)) VUINT32 L2[8][1];
+} __svml_slog10_data_internal;
+#endif
+__svml_slog10_data_internal:
+        /*== MinNorm ==*/
+        .long 0x00800000, 0x00800000, 0x00800000, 0x00800000, 0x00800000, 0x00800000, 0x00800000, 0x00800000
+        /*== MaxNorm ==*/
+        .align 32
+        .long 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff
+        /*== L2H ==*/
+        .align 32
+        .long 0x3e9a2100, 0x3e9a2100, 0x3e9a2100, 0x3e9a2100, 0x3e9a2100, 0x3e9a2100, 0x3e9a2100, 0x3e9a2100
+        /*== L2L ==*/
+        .align 32
+        .long 0xb64AF600, 0xb64AF600, 0xb64AF600, 0xb64AF600, 0xb64AF600, 0xb64AF600, 0xb64AF600, 0xb64AF600
+        /*== iBrkValue = SP 2/3 ==*/
+        .align 32
+        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
+        /*== iOffExpoMask = SP significand mask ==*/
+        .align 32
+        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
+        /*== sOne = SP 1.0 ==*/
+        .align 32
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== spoly[9] ==*/
+        .align 32
+        .long 0x3d8063B4, 0x3d8063B4, 0x3d8063B4, 0x3d8063B4, 0x3d8063B4, 0x3d8063B4, 0x3d8063B4, 0x3d8063B4 /* coeff9 */
+        .long 0xbd890073, 0xbd890073, 0xbd890073, 0xbd890073, 0xbd890073, 0xbd890073, 0xbd890073, 0xbd890073 /* coeff8 */
+        .long 0x3d775317, 0x3d775317, 0x3d775317, 0x3d775317, 0x3d775317, 0x3d775317, 0x3d775317, 0x3d775317 /* coeff7 */
+        .long 0xbd91FB27, 0xbd91FB27, 0xbd91FB27, 0xbd91FB27, 0xbd91FB27, 0xbd91FB27, 0xbd91FB27, 0xbd91FB27 /* coeff6 */
+        .long 0x3dB20B96, 0x3dB20B96, 0x3dB20B96, 0x3dB20B96, 0x3dB20B96, 0x3dB20B96, 0x3dB20B96, 0x3dB20B96 /* coeff5 */
+        .long 0xbdDE6E20, 0xbdDE6E20, 0xbdDE6E20, 0xbdDE6E20, 0xbdDE6E20, 0xbdDE6E20, 0xbdDE6E20, 0xbdDE6E20 /* coeff4 */
+        .long 0x3e143CE5, 0x3e143CE5, 0x3e143CE5, 0x3e143CE5, 0x3e143CE5, 0x3e143CE5, 0x3e143CE5, 0x3e143CE5 /* coeff3 */
+        .long 0xbe5E5BC5, 0xbe5E5BC5, 0xbe5E5BC5, 0xbe5E5BC5, 0xbe5E5BC5, 0xbe5E5BC5, 0xbe5E5BC5, 0xbe5E5BC5 /* coeff2 */
+        .long 0x3eDE5BD9, 0x3eDE5BD9, 0x3eDE5BD9, 0x3eDE5BD9, 0x3eDE5BD9, 0x3eDE5BD9, 0x3eDE5BD9, 0x3eDE5BD9 /* coeff1 */
+        /*== L2 ==*/
+        .align 32
+        .long 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b
+        .align 32
+        .type	__svml_slog10_data_internal,@object
+        .size	__svml_slog10_data_internal,.-__svml_slog10_data_internal
diff --git a/sysdeps/x86_64/fpu/svml_d_log102_core.S b/sysdeps/x86_64/fpu/svml_d_log102_core.S
new file mode 100644
index 0000000000..3d0c058ac2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_log102_core.S
@@ -0,0 +1,29 @@
+/* Function log10 vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN2v_log10)
+WRAPPER_IMPL_SSE2 log10
+END (_ZGVbN2v_log10)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN2v_log10)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_log104_core.S b/sysdeps/x86_64/fpu/svml_d_log104_core.S
new file mode 100644
index 0000000000..9e32c62c0e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_log104_core.S
@@ -0,0 +1,29 @@
+/* Function log10 vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN4v_log10)
+WRAPPER_IMPL_AVX _ZGVbN2v_log10
+END (_ZGVdN4v_log10)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN4v_log10)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_log104_core_avx.S b/sysdeps/x86_64/fpu/svml_d_log104_core_avx.S
new file mode 100644
index 0000000000..2b073b16f9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_log104_core_avx.S
@@ -0,0 +1,25 @@
+/* Function log10 vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN4v_log10)
+WRAPPER_IMPL_AVX _ZGVbN2v_log10
+END (_ZGVcN4v_log10)
diff --git a/sysdeps/x86_64/fpu/svml_d_log108_core.S b/sysdeps/x86_64/fpu/svml_d_log108_core.S
new file mode 100644
index 0000000000..853d791f2d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_log108_core.S
@@ -0,0 +1,25 @@
+/* Function log10 vectorized with AVX-512, wrapper to AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN8v_log10)
+WRAPPER_IMPL_AVX512 _ZGVdN4v_log10
+END (_ZGVeN8v_log10)
diff --git a/sysdeps/x86_64/fpu/svml_s_log10f16_core.S b/sysdeps/x86_64/fpu/svml_s_log10f16_core.S
new file mode 100644
index 0000000000..769603c92d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_log10f16_core.S
@@ -0,0 +1,25 @@
+/* Function log10f vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN16v_log10f)
+WRAPPER_IMPL_AVX512 _ZGVdN8v_log10f
+END (_ZGVeN16v_log10f)
diff --git a/sysdeps/x86_64/fpu/svml_s_log10f4_core.S b/sysdeps/x86_64/fpu/svml_s_log10f4_core.S
new file mode 100644
index 0000000000..523525409b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_log10f4_core.S
@@ -0,0 +1,29 @@
+/* Function log10f vectorized with SSE2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN4v_log10f)
+WRAPPER_IMPL_SSE2 log10f
+END (_ZGVbN4v_log10f)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4v_log10f)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_log10f8_core.S b/sysdeps/x86_64/fpu/svml_s_log10f8_core.S
new file mode 100644
index 0000000000..630ec76b7f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_log10f8_core.S
@@ -0,0 +1,29 @@
+/* Function log10f vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN8v_log10f)
+WRAPPER_IMPL_AVX _ZGVbN4v_log10f
+END (_ZGVdN8v_log10f)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8v_log10f)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_log10f8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_log10f8_core_avx.S
new file mode 100644
index 0000000000..374208cb2c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_log10f8_core_avx.S
@@ -0,0 +1,25 @@
+/* Function log10f vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+        .text
+ENTRY (_ZGVcN8v_log10f)
+WRAPPER_IMPL_AVX _ZGVbN4v_log10f
+END (_ZGVcN8v_log10f)
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log10-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-log10-avx.c
new file mode 100644
index 0000000000..770fd725e0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-log10-avx.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-log10.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log10-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-log10-avx2.c
new file mode 100644
index 0000000000..770fd725e0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-log10-avx2.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-log10.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log10-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-log10-avx512f.c
new file mode 100644
index 0000000000..770fd725e0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-log10-avx512f.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-log10.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log10.c b/sysdeps/x86_64/fpu/test-double-libmvec-log10.c
new file mode 100644
index 0000000000..cb1ab36819
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-log10.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE double
+#define LIBMVEC_FUNC log10
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index 37a7a1c777..3dce136dfc 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVbN2v_expm1)
 VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVbN2v_sinh)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVbN2v_cbrt)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVbN2vv_atan2)
+VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVbN2v_log10)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index 4313f67e06..1852625897 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -41,6 +41,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVdN4v_expm1)
 VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVdN4v_sinh)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVdN4v_cbrt)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVdN4vv_atan2)
+VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVdN4v_log10)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m256i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index 4b8b00f16d..cf9ea35ffe 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVcN4v_expm1)
 VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVcN4v_sinh)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVcN4v_cbrt)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVcN4vv_atan2)
+VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVcN4v_log10)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index d06522a407..b6457ea032 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVeN8v_expm1)
 VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVeN8v_sinh)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVeN8v_cbrt)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVeN8vv_atan2)
+VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVeN8v_log10)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx.c
new file mode 100644
index 0000000000..04f017f1e2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-log10f.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx2.c
new file mode 100644
index 0000000000..04f017f1e2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx2.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-log10f.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx512f.c
new file mode 100644
index 0000000000..04f017f1e2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx512f.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-log10f.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log10f.c b/sysdeps/x86_64/fpu/test-float-libmvec-log10f.c
new file mode 100644
index 0000000000..682ce1e239
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-log10f.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE float
+#define LIBMVEC_FUNC log10f
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index 0bd631bf9a..272e754e1b 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVeN16v_expm1f)
 VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVeN16v_sinhf)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVeN16v_cbrtf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVeN16vv_atan2f)
+VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVeN16v_log10f)
 
 #define VEC_INT_TYPE __m512i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index 1018398bd3..b892258b99 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVbN4v_expm1f)
 VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVbN4v_sinhf)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVbN4v_cbrtf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVbN4vv_atan2f)
+VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVbN4v_log10f)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index 42ea28f30f..1c6ead71e1 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -41,6 +41,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVdN8v_expm1f)
 VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVdN8v_sinhf)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVdN8v_cbrtf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVdN8vv_atan2f)
+VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVdN8v_log10f)
 
 /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
 #undef VECTOR_WRAPPER_fFF
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index 70a0216a07..71f5d8d7b6 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVcN8v_expm1f)
 VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVcN8v_sinhf)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVcN8v_cbrtf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVcN8vv_atan2f)
+VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVcN8v_log10f)
 
 #define VEC_INT_TYPE __m128i
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 12/18] x86-64: Add vector log2/log2f implementation to libmvec
  2021-12-29  6:39 [PATCH v5 00/18] x86-64: Add vector math functions to libmvec Sunil K Pandey
                   ` (10 preceding siblings ...)
  2021-12-29  6:39 ` [PATCH v5 11/18] x86-64: Add vector log10/log10f " Sunil K Pandey
@ 2021-12-29  6:39 ` Sunil K Pandey
  2021-12-29 21:26   ` H.J. Lu
  2021-12-29  6:39 ` [PATCH v5 13/18] x86-64: Add vector log1p/log1pf " Sunil K Pandey
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Sunil K Pandey @ 2021-12-29  6:39 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, andrey.kolesov, marius.cornea

Implement vectorized log2/log2f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector log2/log2f with regenerated ulps.
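
As a usage sketch, assuming GCC 6 or later with -ffast-math (which is what
turns the __DECL_SIMD_* declarations below into simd attributes), a loop like
the one here is expected to be vectorized into calls to the new entry points,
e.g. _ZGVdN4v_log2 for AVX2; the file name, function name and exact flags are
illustrative only:

/* Hypothetical example: compiled with something like
   "gcc -O3 -ffast-math -mavx2 -c log2loop.c", this loop is expected
   to call _ZGVdN4v_log2 instead of scalar log2.  */
#include <math.h>

void
log2_loop (double *restrict y, const double *restrict x, int n)
{
  for (int i = 0; i < n; i++)
    y[i] = log2 (x[i]);
}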
---
 bits/libm-simd-decl-stubs.h                   |   11 +
 math/bits/mathcalls.h                         |    2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
 sysdeps/x86/fpu/bits/math-vector.h            |    4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
 sysdeps/x86_64/fpu/Makeconfig                 |    1 +
 sysdeps/x86_64/fpu/Versions                   |    2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |   20 +
 .../fpu/multiarch/svml_d_log22_core-sse2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_log22_core.c  |   27 +
 .../fpu/multiarch/svml_d_log22_core_sse4.S    | 1339 +++++++++++++++++
 .../fpu/multiarch/svml_d_log24_core-sse.S     |   20 +
 .../x86_64/fpu/multiarch/svml_d_log24_core.c  |   27 +
 .../fpu/multiarch/svml_d_log24_core_avx2.S    | 1324 ++++++++++++++++
 .../fpu/multiarch/svml_d_log28_core-avx2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_log28_core.c  |   27 +
 .../fpu/multiarch/svml_d_log28_core_avx512.S  |  293 ++++
 .../fpu/multiarch/svml_s_log2f16_core-avx2.S  |   20 +
 .../fpu/multiarch/svml_s_log2f16_core.c       |   28 +
 .../multiarch/svml_s_log2f16_core_avx512.S    |  231 +++
 .../fpu/multiarch/svml_s_log2f4_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_s_log2f4_core.c |   28 +
 .../fpu/multiarch/svml_s_log2f4_core_sse4.S   |  223 +++
 .../fpu/multiarch/svml_s_log2f8_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_s_log2f8_core.c |   28 +
 .../fpu/multiarch/svml_s_log2f8_core_avx2.S   |  226 +++
 sysdeps/x86_64/fpu/svml_d_log22_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_log24_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_log24_core_avx.S    |   25 +
 sysdeps/x86_64/fpu/svml_d_log28_core.S        |   25 +
 sysdeps/x86_64/fpu/svml_s_log2f16_core.S      |   25 +
 sysdeps/x86_64/fpu/svml_s_log2f4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_log2f8_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_log2f8_core_avx.S   |   25 +
 .../x86_64/fpu/test-double-libmvec-log2-avx.c |    1 +
 .../fpu/test-double-libmvec-log2-avx2.c       |    1 +
 .../fpu/test-double-libmvec-log2-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-double-libmvec-log2.c |    3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-log2f-avx.c |    1 +
 .../fpu/test-float-libmvec-log2f-avx2.c       |    1 +
 .../fpu/test-float-libmvec-log2f-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-float-libmvec-log2f.c |    3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
 50 files changed, 4208 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log22_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log22_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log22_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log24_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log24_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log24_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log28_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log28_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log28_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log22_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log24_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log24_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log28_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log2f16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log2f4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log2f8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log2f8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log2-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log2-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log2-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log2f.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
index 4ad584c227..73252615ca 100644
--- a/bits/libm-simd-decl-stubs.h
+++ b/bits/libm-simd-decl-stubs.h
@@ -230,4 +230,15 @@
 #define __DECL_SIMD_log10f32x
 #define __DECL_SIMD_log10f64x
 #define __DECL_SIMD_log10f128x
+
+#define __DECL_SIMD_log2
+#define __DECL_SIMD_log2f
+#define __DECL_SIMD_log2l
+#define __DECL_SIMD_log2f16
+#define __DECL_SIMD_log2f32
+#define __DECL_SIMD_log2f64
+#define __DECL_SIMD_log2f128
+#define __DECL_SIMD_log2f32x
+#define __DECL_SIMD_log2f64x
+#define __DECL_SIMD_log2f128x
 #endif
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index f21384758a..bfe52a4666 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -130,7 +130,7 @@ __MATHCALL (logb,, (_Mdouble_ __x));
 __MATHCALL_VEC (exp2,, (_Mdouble_ __x));
 
 /* Compute base-2 logarithm of X.  */
-__MATHCALL (log2,, (_Mdouble_ __x));
+__MATHCALL_VEC (log2,, (_Mdouble_ __x));
 #endif
 
 
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index 8108a2a189..fa8b016c5d 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -55,6 +55,7 @@ GLIBC_2.35 _ZGVbN2v_exp10 F
 GLIBC_2.35 _ZGVbN2v_exp2 F
 GLIBC_2.35 _ZGVbN2v_expm1 F
 GLIBC_2.35 _ZGVbN2v_log10 F
+GLIBC_2.35 _ZGVbN2v_log2 F
 GLIBC_2.35 _ZGVbN2v_sinh F
 GLIBC_2.35 _ZGVbN2vv_atan2 F
 GLIBC_2.35 _ZGVbN2vv_hypot F
@@ -67,6 +68,7 @@ GLIBC_2.35 _ZGVbN4v_exp10f F
 GLIBC_2.35 _ZGVbN4v_exp2f F
 GLIBC_2.35 _ZGVbN4v_expm1f F
 GLIBC_2.35 _ZGVbN4v_log10f F
+GLIBC_2.35 _ZGVbN4v_log2f F
 GLIBC_2.35 _ZGVbN4v_sinhf F
 GLIBC_2.35 _ZGVbN4vv_atan2f F
 GLIBC_2.35 _ZGVbN4vv_hypotf F
@@ -79,6 +81,7 @@ GLIBC_2.35 _ZGVcN4v_exp10 F
 GLIBC_2.35 _ZGVcN4v_exp2 F
 GLIBC_2.35 _ZGVcN4v_expm1 F
 GLIBC_2.35 _ZGVcN4v_log10 F
+GLIBC_2.35 _ZGVcN4v_log2 F
 GLIBC_2.35 _ZGVcN4v_sinh F
 GLIBC_2.35 _ZGVcN4vv_atan2 F
 GLIBC_2.35 _ZGVcN4vv_hypot F
@@ -91,6 +94,7 @@ GLIBC_2.35 _ZGVcN8v_exp10f F
 GLIBC_2.35 _ZGVcN8v_exp2f F
 GLIBC_2.35 _ZGVcN8v_expm1f F
 GLIBC_2.35 _ZGVcN8v_log10f F
+GLIBC_2.35 _ZGVcN8v_log2f F
 GLIBC_2.35 _ZGVcN8v_sinhf F
 GLIBC_2.35 _ZGVcN8vv_atan2f F
 GLIBC_2.35 _ZGVcN8vv_hypotf F
@@ -103,6 +107,7 @@ GLIBC_2.35 _ZGVdN4v_exp10 F
 GLIBC_2.35 _ZGVdN4v_exp2 F
 GLIBC_2.35 _ZGVdN4v_expm1 F
 GLIBC_2.35 _ZGVdN4v_log10 F
+GLIBC_2.35 _ZGVdN4v_log2 F
 GLIBC_2.35 _ZGVdN4v_sinh F
 GLIBC_2.35 _ZGVdN4vv_atan2 F
 GLIBC_2.35 _ZGVdN4vv_hypot F
@@ -115,6 +120,7 @@ GLIBC_2.35 _ZGVdN8v_exp10f F
 GLIBC_2.35 _ZGVdN8v_exp2f F
 GLIBC_2.35 _ZGVdN8v_expm1f F
 GLIBC_2.35 _ZGVdN8v_log10f F
+GLIBC_2.35 _ZGVdN8v_log2f F
 GLIBC_2.35 _ZGVdN8v_sinhf F
 GLIBC_2.35 _ZGVdN8vv_atan2f F
 GLIBC_2.35 _ZGVdN8vv_hypotf F
@@ -127,6 +133,7 @@ GLIBC_2.35 _ZGVeN16v_exp10f F
 GLIBC_2.35 _ZGVeN16v_exp2f F
 GLIBC_2.35 _ZGVeN16v_expm1f F
 GLIBC_2.35 _ZGVeN16v_log10f F
+GLIBC_2.35 _ZGVeN16v_log2f F
 GLIBC_2.35 _ZGVeN16v_sinhf F
 GLIBC_2.35 _ZGVeN16vv_atan2f F
 GLIBC_2.35 _ZGVeN16vv_hypotf F
@@ -139,6 +146,7 @@ GLIBC_2.35 _ZGVeN8v_exp10 F
 GLIBC_2.35 _ZGVeN8v_exp2 F
 GLIBC_2.35 _ZGVeN8v_expm1 F
 GLIBC_2.35 _ZGVeN8v_log10 F
+GLIBC_2.35 _ZGVeN8v_log2 F
 GLIBC_2.35 _ZGVeN8v_sinh F
 GLIBC_2.35 _ZGVeN8vv_atan2 F
 GLIBC_2.35 _ZGVeN8vv_hypot F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index 64e80ada7a..59d284a10a 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -106,6 +106,10 @@
 #  define __DECL_SIMD_log10 __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_log10f
 #  define __DECL_SIMD_log10f __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_log2
+#  define __DECL_SIMD_log2 __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_log2f
+#  define __DECL_SIMD_log2f __DECL_SIMD_x86_64
 
 # endif
 #endif
diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
index f5050c68af..a2ca9a203f 100644
--- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
@@ -52,6 +52,8 @@
 !GCC$ builtin (atan2f) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (log10) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (log10f) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (log2) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (log2f) attributes simd (notinbranch) if('x86_64')
 
 !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
@@ -89,3 +91,5 @@
 !GCC$ builtin (atan2f) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (log10) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (log10f) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (log2) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (log2f) attributes simd (notinbranch) if('x32')
diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
index ba37044e9d..8d6d0915af 100644
--- a/sysdeps/x86_64/fpu/Makeconfig
+++ b/sysdeps/x86_64/fpu/Makeconfig
@@ -36,6 +36,7 @@ libmvec-funcs = \
   hypot \
   log \
   log10 \
+  log2 \
   pow \
   sin \
   sincos \
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index 8beaf0736f..1b48c2d642 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -23,6 +23,7 @@ libmvec {
     _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
     _ZGVbN2v_expm1; _ZGVcN4v_expm1; _ZGVdN4v_expm1; _ZGVeN8v_expm1;
     _ZGVbN2v_log10; _ZGVcN4v_log10; _ZGVdN4v_log10; _ZGVeN8v_log10;
+    _ZGVbN2v_log2; _ZGVcN4v_log2; _ZGVdN4v_log2; _ZGVeN8v_log2;
     _ZGVbN2v_sinh; _ZGVcN4v_sinh; _ZGVdN4v_sinh; _ZGVeN8v_sinh;
     _ZGVbN2vv_atan2; _ZGVcN4vv_atan2; _ZGVdN4vv_atan2; _ZGVeN8vv_atan2;
     _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
@@ -35,6 +36,7 @@ libmvec {
     _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
     _ZGVbN4v_expm1f; _ZGVcN8v_expm1f; _ZGVdN8v_expm1f; _ZGVeN16v_expm1f;
     _ZGVbN4v_log10f; _ZGVcN8v_log10f; _ZGVdN8v_log10f; _ZGVeN16v_log10f;
+    _ZGVbN4v_log2f; _ZGVcN8v_log2f; _ZGVdN8v_log2f; _ZGVeN16v_log2f;
     _ZGVbN4v_sinhf; _ZGVcN8v_sinhf; _ZGVdN8v_sinhf; _ZGVeN16v_sinhf;
     _ZGVbN4vv_atan2f; _ZGVcN8vv_atan2f; _ZGVdN8vv_atan2f; _ZGVeN16vv_atan2f;
     _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf;
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index b0cd9d60ea..3b7f3cee6f 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -1709,6 +1709,26 @@ float: 3
 float128: 1
 ldouble: 1
 
+Function: "log2_vlen16":
+float: 1
+
+Function: "log2_vlen2":
+double: 1
+
+Function: "log2_vlen4":
+double: 1
+float: 1
+
+Function: "log2_vlen4_avx2":
+double: 1
+
+Function: "log2_vlen8":
+double: 1
+float: 1
+
+Function: "log2_vlen8_avx2":
+float: 1
+
 Function: "log_downward":
 float: 2
 float128: 1
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log22_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log22_core-sse2.S
new file mode 100644
index 0000000000..e0833a174b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log22_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized log2, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN2v_log2 _ZGVbN2v_log2_sse2
+#include "../svml_d_log22_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log22_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log22_core.c
new file mode 100644
index 0000000000..6d0b5a03ca
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log22_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized log2, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN2v_log2
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN2v_log2, __GI__ZGVbN2v_log2, __redirect__ZGVbN2v_log2)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log22_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log22_core_sse4.S
new file mode 100644
index 0000000000..22c12fdfea
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log22_core_sse4.S
@@ -0,0 +1,1339 @@
+/* Function log2 vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
+ *    R = Rcp*x - 1.0
+ *    log2(x) = k - log2(Rcp) + poly_approximation(R)
+ *       log2(Rcp) is tabulated
+ *
+ *
+ */
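A rough scalar illustration of this scheme, valid for finite positive inputs
only (special inputs take the scalar fallback): the kernel derives Rcp with
rcpps plus a rounding trick, reads log2(Rcp) from Log_LA_table and evaluates
the poly_coeff minimax polynomial with FMAs, whereas this sketch recomputes
both on the fly, so it only shows where each piece fits, not the accuracy of
the real code.

#include <math.h>

/* Exposition-only sketch of the short-reciprocal scheme.  */
static double
log2_sketch (double x)
{
  int k;
  double m = frexp (x, &k);	/* x = m * 2^k, m in [0.5, 1) */

  /* Coarsely rounded short reciprocal of the mantissa.  */
  double rcp = ldexp (nearbyint (ldexp (1.0 / m, 8)), -8);
  double r = rcp * m - 1.0;	/* small residual */

  /* log2(x) = k - log2(Rcp) + log2(1 + r); the last term is what the
     real polynomial approximates (a short series is used here).  */
  double poly = (r - r * r / 2 + r * r * r / 3) * 1.4426950408889634;
  return (double) k - log2 (rcp) + poly;
}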
+
+/* Offsets for data table __svml_dlog2_data_internal
+ */
+#define Log_HA_table                  	0
+#define Log_LA_table                  	8208
+#define poly_coeff                    	12320
+#define ExpMask                       	12400
+#define Two10                         	12416
+#define MinNorm                       	12432
+#define MaxNorm                       	12448
+#define HalfMask                      	12464
+#define One                           	12480
+#define Threshold                     	12496
+#define Bias                          	12512
+#define Bias1                         	12528
+
+/* Lookup bias for data table __svml_dlog2_data_internal.  */
+#define Table_Lookup_Bias               -0x405ff0
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN2v_log2_sse4)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $64, %rsp
+
+/* exponent bits */
+        movaps    %xmm0, %xmm5
+
+/* preserve mantissa, set input exponent to 2^(-10) */
+        movups    ExpMask+__svml_dlog2_data_internal(%rip), %xmm1
+        psrlq     $20, %xmm5
+        andps     %xmm0, %xmm1
+        lea       Table_Lookup_Bias+__svml_dlog2_data_internal(%rip), %rsi
+        orps      Two10+__svml_dlog2_data_internal(%rip), %xmm1
+
+/* check range */
+        movaps    %xmm0, %xmm8
+
+/* reciprocal approximation good to at least 11 bits */
+        cvtpd2ps  %xmm1, %xmm2
+        cmpltpd   MinNorm+__svml_dlog2_data_internal(%rip), %xmm8
+        movlhps   %xmm2, %xmm2
+        movaps    %xmm0, %xmm7
+        rcpps     %xmm2, %xmm3
+        cmpnlepd  MaxNorm+__svml_dlog2_data_internal(%rip), %xmm7
+        cvtps2pd  %xmm3, %xmm12
+
+/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
+        movups    .FLT_11(%rip), %xmm4
+        orps      %xmm7, %xmm8
+        addpd     %xmm4, %xmm12
+
+/* combine and get argument value range mask */
+        movmskpd  %xmm8, %edx
+
+/* argument reduction */
+        movups    HalfMask+__svml_dlog2_data_internal(%rip), %xmm9
+        subpd     %xmm4, %xmm12
+        andps     %xmm1, %xmm9
+
+/*
+ * prepare table index
+ * table lookup
+ */
+        movaps    %xmm12, %xmm10
+        subpd     %xmm9, %xmm1
+        mulpd     %xmm12, %xmm9
+        mulpd     %xmm12, %xmm1
+        subpd     One+__svml_dlog2_data_internal(%rip), %xmm9
+        addpd     %xmm9, %xmm1
+
+/* polynomial */
+        movups    poly_coeff+__svml_dlog2_data_internal(%rip), %xmm14
+        psrlq     $40, %xmm10
+        mulpd     %xmm1, %xmm14
+        movd      %xmm10, %eax
+        pshufd    $2, %xmm10, %xmm11
+        movaps    %xmm1, %xmm10
+        movups    poly_coeff+32+__svml_dlog2_data_internal(%rip), %xmm15
+        mulpd     %xmm1, %xmm10
+        addpd     poly_coeff+16+__svml_dlog2_data_internal(%rip), %xmm14
+        mulpd     %xmm1, %xmm15
+        mulpd     %xmm10, %xmm14
+        addpd     poly_coeff+48+__svml_dlog2_data_internal(%rip), %xmm15
+        movd      %xmm11, %ecx
+        movups    poly_coeff+64+__svml_dlog2_data_internal(%rip), %xmm11
+        addpd     %xmm14, %xmm15
+        mulpd     %xmm1, %xmm11
+        mulpd     %xmm15, %xmm10
+
+/* exponent */
+        movups    Threshold+__svml_dlog2_data_internal(%rip), %xmm13
+        cmpltpd   %xmm12, %xmm13
+        addpd     %xmm10, %xmm11
+        pshufd    $221, %xmm5, %xmm6
+
+/* biased exponent in DP format */
+        cvtdq2pd  %xmm6, %xmm3
+        movslq    %eax, %rax
+        movslq    %ecx, %rcx
+        andps     Bias+__svml_dlog2_data_internal(%rip), %xmm13
+        orps      Bias1+__svml_dlog2_data_internal(%rip), %xmm13
+        movsd     (%rsi,%rax), %xmm2
+        movhpd    (%rsi,%rcx), %xmm2
+        subpd     %xmm13, %xmm3
+
+/* reconstruction */
+        addpd     %xmm11, %xmm2
+        addpd     %xmm2, %xmm3
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx xmm0 xmm3
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movaps    %xmm3, %xmm0
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm0, 32(%rsp)
+        movups    %xmm3, 48(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $2, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm3
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 xmm3
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      log2@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 48(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVbN2v_log2_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_dlog2_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(16)) VUINT32 Log_HA_table[(1<<10)+2][2];
+        __declspec(align(16)) VUINT32 Log_LA_table[(1<<9)+1][2];
+        __declspec(align(16)) VUINT32 poly_coeff[5][2][2];
+        __declspec(align(16)) VUINT32 ExpMask[2][2];
+        __declspec(align(16)) VUINT32 Two10[2][2];
+        __declspec(align(16)) VUINT32 MinNorm[2][2];
+        __declspec(align(16)) VUINT32 MaxNorm[2][2];
+        __declspec(align(16)) VUINT32 HalfMask[2][2];
+        __declspec(align(16)) VUINT32 One[2][2];
+        __declspec(align(16)) VUINT32 Threshold[2][2];
+        __declspec(align(16)) VUINT32 Bias[2][2];
+        __declspec(align(16)) VUINT32 Bias1[2][2];
+} __svml_dlog2_data_internal;
+#endif
+__svml_dlog2_data_internal:
+        /* Log_HA_table */
+        .quad 0xc08ff00000000000, 0x0000000000000000
+        .quad 0xc08ff0040038c920, 0x3d52bfc81744e999
+        .quad 0xc08ff007ff0f0190, 0xbd59b2cedc63c895
+        .quad 0xc08ff00bfc839e88, 0xbd28e365e6741d71
+        .quad 0xc08ff00ff8979428, 0x3d4027998f69a77d
+        .quad 0xc08ff013f34bd5a0, 0x3d5dd2cb33fe6a89
+        .quad 0xc08ff017eca15518, 0xbd526514cdf2c019
+        .quad 0xc08ff01be49903d8, 0xbd44bfeeba165e04
+        .quad 0xc08ff01fdb33d218, 0xbd3fa79ee110cec3
+        .quad 0xc08ff023d072af20, 0xbd4eebb642c7fd60
+        .quad 0xc08ff027c4568948, 0x3d429b13d7093443
+        .quad 0xc08ff02bb6e04de8, 0x3d50f346bd36551e
+        .quad 0xc08ff02fa810e968, 0xbd5020bb662f1536
+        .quad 0xc08ff03397e94750, 0x3d5de76b56340995
+        .quad 0xc08ff037866a5218, 0x3d58065ff3304090
+        .quad 0xc08ff03b7394f360, 0x3d561fc9322fb785
+        .quad 0xc08ff03f5f6a13d0, 0x3d0abecd17d0d778
+        .quad 0xc08ff04349ea9b28, 0xbd588f3ad0ce4d44
+        .quad 0xc08ff04733177040, 0xbd4454ba4ac5f44d
+        .quad 0xc08ff04b1af178f8, 0xbd556f78faaa0887
+        .quad 0xc08ff04f01799a58, 0x3d49db8976de7469
+        .quad 0xc08ff052e6b0b868, 0xbd5cdb6fce17ef00
+        .quad 0xc08ff056ca97b668, 0xbd576de8c0412f09
+        .quad 0xc08ff05aad2f76a0, 0x3d30142c7ec6475c
+        .quad 0xc08ff05e8e78da70, 0xbd1e685afc26de72
+        .quad 0xc08ff0626e74c260, 0xbd40b64c954078a3
+        .quad 0xc08ff0664d240e10, 0xbd5fcde393462d7d
+        .quad 0xc08ff06a2a879c48, 0xbd537245eeeecc53
+        .quad 0xc08ff06e06a04ae8, 0x3d4ac306eb47b436
+        .quad 0xc08ff071e16ef6e8, 0xbd5a1fd9d3758f6b
+        .quad 0xc08ff075baf47c80, 0x3d2401fbaaa67e3c
+        .quad 0xc08ff0799331b6f0, 0x3d4f8dbef47a4d53
+        .quad 0xc08ff07d6a2780a8, 0x3d51215e0abb42d1
+        .quad 0xc08ff0813fd6b340, 0x3d57ce6249eddb35
+        .quad 0xc08ff08514402770, 0xbd38a803c7083a25
+        .quad 0xc08ff088e764b528, 0x3d42218beba5073e
+        .quad 0xc08ff08cb9453370, 0x3d447b66f1c6248f
+        .quad 0xc08ff09089e27880, 0xbd53d9297847e995
+        .quad 0xc08ff094593d59c8, 0xbd12b6979cc77aa9
+        .quad 0xc08ff0982756abd0, 0xbd55308545ecd702
+        .quad 0xc08ff09bf42f4260, 0xbd578fa97c3b936f
+        .quad 0xc08ff09fbfc7f068, 0xbd41828408ce869d
+        .quad 0xc08ff0a38a218808, 0x3d555da6ce7251a6
+        .quad 0xc08ff0a7533cda88, 0xbd41f3cd14bfcb02
+        .quad 0xc08ff0ab1b1ab878, 0xbd1f028da6bf1852
+        .quad 0xc08ff0aee1bbf188, 0xbd4cf04de3267f54
+        .quad 0xc08ff0b2a72154a8, 0xbd4556e47019db10
+        .quad 0xc08ff0b66b4baff8, 0x3d1e7ba00b15fbe4
+        .quad 0xc08ff0ba2e3bd0d0, 0x3d5bfde1c52c2f28
+        .quad 0xc08ff0bdeff283b8, 0x3d48d63fe20ee5d6
+        .quad 0xc08ff0c1b0709480, 0x3d57f551980838ff
+        .quad 0xc08ff0c56fb6ce20, 0xbd4189091f293c81
+        .quad 0xc08ff0c92dc5fae0, 0x3d4d549f05f06169
+        .quad 0xc08ff0ccea9ee428, 0xbd5982466074e1e3
+        .quad 0xc08ff0d0a64252b8, 0xbd5d30a6b16c0e4b
+        .quad 0xc08ff0d460b10e80, 0xbd3138bf3b51a201
+        .quad 0xc08ff0d819ebdea8, 0xbd454e680c0801d6
+        .quad 0xc08ff0dbd1f389a8, 0x3d584db361385926
+        .quad 0xc08ff0df88c8d520, 0xbd564f2252a82c03
+        .quad 0xc08ff0e33e6c8610, 0xbd5c78c35ed5d034
+        .quad 0xc08ff0e6f2df60a8, 0xbd52eb9f29ca3d75
+        .quad 0xc08ff0eaa6222860, 0x3d5340c0c01b5ff8
+        .quad 0xc08ff0ee58359fe8, 0x3d10c2acaffa64b6
+        .quad 0xc08ff0f2091a8948, 0xbd3fced311301ebe
+        .quad 0xc08ff0f5b8d1a5c8, 0x3d41ee5d591af30b
+        .quad 0xc08ff0f9675bb5f0, 0x3d4873546b0e668c
+        .quad 0xc08ff0fd14b97998, 0x3d5a99928177a119
+        .quad 0xc08ff100c0ebafd8, 0x3d378ead132adcac
+        .quad 0xc08ff1046bf31720, 0x3d51a538bc597d48
+        .quad 0xc08ff10815d06d18, 0xbd540ee2f35efd7e
+        .quad 0xc08ff10bbe846ec8, 0xbd59cf94753adacc
+        .quad 0xc08ff10f660fd878, 0xbd5201a3d6862895
+        .quad 0xc08ff1130c7365c0, 0x3d383e25d0822d03
+        .quad 0xc08ff116b1afd180, 0xbd0b7389bbea8f7b
+        .quad 0xc08ff11a55c5d5f0, 0xbd4df278087a6617
+        .quad 0xc08ff11df8b62c98, 0xbd48daeb8ec01e26
+        .quad 0xc08ff1219a818e50, 0x3d57c9312e0a14da
+        .quad 0xc08ff1253b28b330, 0xbd5f0fbc0e4d507e
+        .quad 0xc08ff128daac52c8, 0xbd222afdee008687
+        .quad 0xc08ff12c790d23d8, 0x3d17c71747bcef8b
+        .quad 0xc08ff130164bdc88, 0x3d5d69cfd051af50
+        .quad 0xc08ff133b2693248, 0x3d59dff064e9433a
+        .quad 0xc08ff1374d65d9e8, 0x3d4f71a30db3240b
+        .quad 0xc08ff13ae7428788, 0xbd5e56afa9524606
+        .quad 0xc08ff13e7fffeeb0, 0xbd44acd84e6f8518
+        .quad 0xc08ff142179ec228, 0xbd519845ade5e121
+        .quad 0xc08ff145ae1fb420, 0xbd5b3b4a38ddec70
+        .quad 0xc08ff14943837620, 0xbd5ea4bb5bc137c7
+        .quad 0xc08ff14cd7cab910, 0x3d5610f3bf8eb6ce
+        .quad 0xc08ff1506af62d20, 0x3d57b1170d6184cf
+        .quad 0xc08ff153fd0681f0, 0x3d5791a688a3660e
+        .quad 0xc08ff1578dfc6678, 0x3d5d41ecf8abac2e
+        .quad 0xc08ff15b1dd88908, 0x3cf0bd995d64d573
+        .quad 0xc08ff15eac9b9758, 0xbd5e3653cd796d01
+        .quad 0xc08ff1623a463e80, 0xbd597573005ef2d8
+        .quad 0xc08ff165c6d92af0, 0xbd4ee222d6439c41
+        .quad 0xc08ff16952550880, 0x3d5913b845e75950
+        .quad 0xc08ff16cdcba8258, 0xbd558e7ba239077e
+        .quad 0xc08ff170660a4328, 0x3d5a0e174a2cae66
+        .quad 0xc08ff173ee44f4d8, 0x3d22b8db103db712
+        .quad 0xc08ff177756b40d8, 0x3d5cc610480853c4
+        .quad 0xc08ff17afb7dcfe0, 0xbd304a8bc84e5c0f
+        .quad 0xc08ff17e807d4a28, 0x3d3639d185da5f7d
+        .quad 0xc08ff182046a5738, 0xbd534705d06d788f
+        .quad 0xc08ff18587459e10, 0xbd540d25b28a51fd
+        .quad 0xc08ff189090fc510, 0xbd02d804afa7080a
+        .quad 0xc08ff18c89c97200, 0x3d5f2a5d305818ba
+        .quad 0xc08ff19009734a08, 0xbd3a602e9d05c3e4
+        .quad 0xc08ff193880df1d0, 0xbd533d6fdcd54875
+        .quad 0xc08ff197059a0d60, 0x3d24eaf0a9490202
+        .quad 0xc08ff19a82184020, 0xbd5685666d98eb59
+        .quad 0xc08ff19dfd892cf8, 0xbd509f8745f0868b
+        .quad 0xc08ff1a177ed7630, 0xbd2dcba340a9d268
+        .quad 0xc08ff1a4f145bd80, 0x3d4916fcd0331266
+        .quad 0xc08ff1a86992a408, 0xbd548cd033a49073
+        .quad 0xc08ff1abe0d4ca68, 0xbd5252f40e5df1a2
+        .quad 0xc08ff1af570cd0a0, 0xbd541d623bd02248
+        .quad 0xc08ff1b2cc3b5628, 0xbd258dc48235c071
+        .quad 0xc08ff1b64060f9e0, 0xbd4b4bd8f02ed3f2
+        .quad 0xc08ff1b9b37e5a28, 0x3d4e8d20a88cd0a2
+        .quad 0xc08ff1bd259414c0, 0x3d3b669b6380bc55
+        .quad 0xc08ff1c096a2c6e8, 0xbd45d54159d51094
+        .quad 0xc08ff1c406ab0d58, 0x3d59f684ffbca44d
+        .quad 0xc08ff1c775ad8428, 0x3d543b1b1d508399
+        .quad 0xc08ff1cae3aac6f8, 0x3d5c30953a12fc6e
+        .quad 0xc08ff1ce50a370d0, 0xbd1763b04f9aad5f
+        .quad 0xc08ff1d1bc981c40, 0x3d573c6fa54f46c2
+        .quad 0xc08ff1d527896338, 0x3d48ccfb9ffd7455
+        .quad 0xc08ff1d89177df30, 0x3d42756f80d6f7ce
+        .quad 0xc08ff1dbfa642910, 0xbd3c2bfbc353c5a5
+        .quad 0xc08ff1df624ed940, 0x3d1d6064f5dc380b
+        .quad 0xc08ff1e2c9388798, 0x3ce327c6b30711cf
+        .quad 0xc08ff1e62f21cb70, 0x3d140aa9546525bc
+        .quad 0xc08ff1e9940b3b98, 0xbd15c1ff43c21863
+        .quad 0xc08ff1ecf7f56e60, 0x3d590ba680120498
+        .quad 0xc08ff1f05ae0f988, 0x3d5390c6b62dff50
+        .quad 0xc08ff1f3bcce7258, 0x3d4da0c90878457f
+        .quad 0xc08ff1f71dbe6d90, 0x3d30697edc85b98c
+        .quad 0xc08ff1fa7db17f70, 0x3d04d81188510a79
+        .quad 0xc08ff1fddca83bb0, 0xbd5f2ddc983ce25c
+        .quad 0xc08ff2013aa33598, 0x3d46c22f0fae6844
+        .quad 0xc08ff20497a2ffd0, 0xbd53359b714c3d03
+        .quad 0xc08ff207f3a82ca0, 0xbd4aefaa5524f88b
+        .quad 0xc08ff20b4eb34dc0, 0x3d39bf4a4a73d01d
+        .quad 0xc08ff20ea8c4f468, 0x3d44217befdb12e6
+        .quad 0xc08ff21201ddb158, 0x3d5219b281d4b6f8
+        .quad 0xc08ff21559fe14c8, 0xbd5e3b123373d370
+        .quad 0xc08ff218b126ae88, 0xbd59b525a6edc3cb
+        .quad 0xc08ff21c07580dd8, 0xbd4b494e7737c4dc
+        .quad 0xc08ff21f5c92c180, 0xbd3989b7d67e3e54
+        .quad 0xc08ff222b0d757d0, 0x3d486c8f098ad3cf
+        .quad 0xc08ff22604265e98, 0x3d5254956d8e15b2
+        .quad 0xc08ff22956806330, 0x3d3f14730a362959
+        .quad 0xc08ff22ca7e5f278, 0xbd40e8ed02e32ea1
+        .quad 0xc08ff22ff85798d8, 0xbd40fb2b9b1e0261
+        .quad 0xc08ff23347d5e238, 0xbd5bfeb1e13c8bc3
+        .quad 0xc08ff23696615a18, 0x3d5b891f041e037b
+        .quad 0xc08ff239e3fa8b60, 0xbd36255027582bb9
+        .quad 0xc08ff23d30a200a8, 0x3d56bb5a92a55361
+        .quad 0xc08ff2407c5843f0, 0xbd31902fb4417244
+        .quad 0xc08ff243c71dded8, 0xbd5a8a7c3c4a2cc6
+        .quad 0xc08ff24710f35a88, 0xbd23be1be6941016
+        .quad 0xc08ff24a59d93fa8, 0x3d55c85afafa1d46
+        .quad 0xc08ff24da1d01668, 0xbd5b4b05a0adcbf1
+        .quad 0xc08ff250e8d866a0, 0x3d134d191476f74b
+        .quad 0xc08ff2542ef2b798, 0x3d5e78ce963395e1
+        .quad 0xc08ff257741f9028, 0x3d3f9219a8f57c17
+        .quad 0xc08ff25ab85f76c8, 0x3d5cfc6f47ac691b
+        .quad 0xc08ff25dfbb2f168, 0x3d4ab3b720b5ca71
+        .quad 0xc08ff2613e1a8598, 0x3d54a4ab99feb71a
+        .quad 0xc08ff2647f96b868, 0xbd42daa69d79d724
+        .quad 0xc08ff267c0280e88, 0xbd344d9115018f45
+        .quad 0xc08ff26affcf0c28, 0xbd56673e143d2ac0
+        .quad 0xc08ff26e3e8c3518, 0x3d3aac889e91c638
+        .quad 0xc08ff2717c600ca8, 0x3d4cf65b41d006e7
+        .quad 0xc08ff274b94b15c0, 0xbd4c821320391e76
+        .quad 0xc08ff277f54dd2e8, 0x3d51abd6e2ddc2a1
+        .quad 0xc08ff27b3068c620, 0xbd2f1bdd1264e703
+        .quad 0xc08ff27e6a9c7110, 0xbd58437b4f032f15
+        .quad 0xc08ff281a3e954f0, 0xbd4f8e063b069a7d
+        .quad 0xc08ff284dc4ff288, 0x3d5276d0723a662a
+        .quad 0xc08ff28813d0ca28, 0xbd5731f7c6d8f6eb
+        .quad 0xc08ff28b4a6c5bd0, 0xbd58b587f08307ec
+        .quad 0xc08ff28e80232708, 0x3d57f19a7a352baf
+        .quad 0xc08ff291b4f5aae0, 0x3d570d99aff32790
+        .quad 0xc08ff294e8e46610, 0x3d4efafaad4f59db
+        .quad 0xc08ff2981befd6e0, 0xbd41eb1728371564
+        .quad 0xc08ff29b4e187b38, 0x3d458465b4e080d7
+        .quad 0xc08ff29e7f5ed088, 0x3d46acb4a035a820
+        .quad 0xc08ff2a1afc353e0, 0xbd39fc68238dd5d3
+        .quad 0xc08ff2a4df4681f0, 0x3d526d90c6750dde
+        .quad 0xc08ff2a80de8d6f0, 0x3d48505c598278fd
+        .quad 0xc08ff2ab3baacec0, 0x3d520fece8e148e8
+        .quad 0xc08ff2ae688ce4d0, 0x3d14f7bf38646243
+        .quad 0xc08ff2b1948f9430, 0xbd5aa5f693a627df
+        .quad 0xc08ff2b4bfb35790, 0xbd4725d8e6280861
+        .quad 0xc08ff2b7e9f8a930, 0x3d482e0765d44bda
+        .quad 0xc08ff2bb136002e8, 0xbd523d745da75cde
+        .quad 0xc08ff2be3be9de40, 0xbd32e50b4191ef73
+        .quad 0xc08ff2c16396b448, 0xbd490856dfe073b2
+        .quad 0xc08ff2c48a66fdb8, 0xbd512b526137db4d
+        .quad 0xc08ff2c7b05b32e8, 0x3d5bfcdc71b36585
+        .quad 0xc08ff2cad573cbb8, 0xbd2c24f2afddb377
+        .quad 0xc08ff2cdf9b13fc0, 0xbd5ea60d06da12f6
+        .quad 0xc08ff2d11d140630, 0xbd582f2f9e256dc5
+        .quad 0xc08ff2d43f9c95d0, 0xbd4411c269523864
+        .quad 0xc08ff2d7614b6508, 0xbd41107eeb7e1093
+        .quad 0xc08ff2da8220e9e8, 0x3d5a4aa491710eda
+        .quad 0xc08ff2dda21d9a10, 0x3d46e50a14550378
+        .quad 0xc08ff2e0c141ead0, 0xbd4881e3bd846de9
+        .quad 0xc08ff2e3df8e5118, 0xbd46d93437bd399d
+        .quad 0xc08ff2e6fd034170, 0xbd5b4ef1e9713a4c
+        .quad 0xc08ff2ea19a13010, 0x3d4a0e31ed25b3ef
+        .quad 0xc08ff2ed356890b8, 0xbd5a7a560db90113
+        .quad 0xc08ff2f05059d6f0, 0x3d51f5bb5f9072c9
+        .quad 0xc08ff2f36a7575c0, 0x3d5ed5225350a585
+        .quad 0xc08ff2f683bbdfe0, 0xbd1c9363d9e745db
+        .quad 0xc08ff2f99c2d87b8, 0x3d329c788e376e0d
+        .quad 0xc08ff2fcb3cadf40, 0xbd59eb5d29918de0
+        .quad 0xc08ff2ffca945828, 0xbd4a86aac097a06b
+        .quad 0xc08ff302e08a63b8, 0x3d541c2c97e8b4d1
+        .quad 0xc08ff305f5ad72d8, 0x3d43c95dec31821b
+        .quad 0xc08ff30909fdf620, 0xbd590abed3d72738
+        .quad 0xc08ff30c1d7c5dd8, 0x3d4caefdad90e913
+        .quad 0xc08ff30f302919d0, 0xbd4f7ed5e1dcb170
+        .quad 0xc08ff312420499a0, 0x3d3c590edf8c3407
+        .quad 0xc08ff315530f4c70, 0x3d5477d46ce838e1
+        .quad 0xc08ff3186349a118, 0x3d5e4b00c511fa78
+        .quad 0xc08ff31b72b40610, 0xbd54333e5a0c1658
+        .quad 0xc08ff31e814ee990, 0x3d25300b88bfa10a
+        .quad 0xc08ff3218f1ab958, 0xbd5bfbd520249ed7
+        .quad 0xc08ff3249c17e2f0, 0x3d436b1cdba645b7
+        .quad 0xc08ff327a846d368, 0xbd5cb667c2f86eaa
+        .quad 0xc08ff32ab3a7f7a0, 0x3d5334d06a920d5f
+        .quad 0xc08ff32dbe3bbbf8, 0xbd5407602ab64243
+        .quad 0xc08ff330c8028ca0, 0xbd52b12c9cc82316
+        .quad 0xc08ff333d0fcd560, 0x3d158d7dd801324b
+        .quad 0xc08ff336d92b01a8, 0xbd38b55deae69564
+        .quad 0xc08ff339e08d7ca0, 0x3d4a92d51dc43d43
+        .quad 0xc08ff33ce724b110, 0x3d5455afbb5de008
+        .quad 0xc08ff33fecf10970, 0x3d3b65694b6f87fb
+        .quad 0xc08ff342f1f2efe8, 0xbd3afb8ccc1260eb
+        .quad 0xc08ff345f62ace50, 0x3d59c98f7ec71b79
+        .quad 0xc08ff348f9990e18, 0xbd5238294ff3846d
+        .quad 0xc08ff34bfc3e1880, 0x3d4deba7087bbf7b
+        .quad 0xc08ff34efe1a5650, 0xbd573e25d2d308e5
+        .quad 0xc08ff351ff2e3020, 0xbd44bc302ffa76fb
+        .quad 0xc08ff354ff7a0e20, 0xbd2cad65891df000
+        .quad 0xc08ff357fefe5838, 0x3d4b4fe326c05a8a
+        .quad 0xc08ff35afdbb75f8, 0x3d0fb5680f67649b
+        .quad 0xc08ff35dfbb1cea8, 0xbd4af509a9977e57
+        .quad 0xc08ff360f8e1c940, 0x3cea69221cfb0ad6
+        .quad 0xc08ff363f54bcc60, 0x3d3d116c159fead5
+        .quad 0xc08ff366f0f03e58, 0xbd5e64e8bff70d5e
+        .quad 0xc08ff369ebcf8538, 0xbd5cc32ce5effb96
+        .quad 0xc08ff36ce5ea06b8, 0x3d57bbe811e4fbda
+        .quad 0xc08ff36fdf402830, 0xbcf46d4595033678
+        .quad 0xc08ff372d7d24ec8, 0x3d4c4bbec857b9fc
+        .quad 0xc08ff375cfa0df40, 0xbd59d3f339613a2d
+        .quad 0xc08ff378c6ac3e28, 0x3d58408e1bcb4e24
+        .quad 0xc08ff37bbcf4cfa0, 0x3d5fdb793dc8e643
+        .quad 0xc08ff37eb27af788, 0xbd5f0d884b401f1e
+        .quad 0xc08ff381a73f1988, 0xbd5a7ed37e2c50b4
+        .quad 0xc08ff3849b4198e8, 0x3d5b14c1f630b2af
+        .quad 0xc08ff3878e82d898, 0x3d505a9abef02aff
+        .quad 0xc08ff38a81033b50, 0xbd4a9bbd51a7d1c4
+        .quad 0xc08ff38d72c32380, 0x3d4783623464f80e
+        .quad 0xc08ff39063c2f338, 0xbd0e2d78f68abcc7
+        .quad 0xc08ff39354030c50, 0x3d3e604763e782cb
+        .quad 0xc08ff3964383d048, 0xbd4514f0840b6f59
+        .quad 0xc08ff3993245a060, 0xbd5488753d6035a4
+        .quad 0xc08ff39c2048dd90, 0x3d5ccc099b5ff97d
+        .quad 0xc08ff39f0d8de870, 0x3d454ada83325c69
+        .quad 0xc08ff3a1fa152168, 0x3d1e4b27fb754eb1
+        .quad 0xc08ff3a4e5dee890, 0x3d58c67819ead583
+        .quad 0xc08ff3a7d0eb9da8, 0xbd536d02e85d644b
+        .quad 0xc08ff3aabb3ba048, 0x3d5f510ab9e7c184
+        .quad 0xc08ff3ada4cf4f98, 0x3d557bc5b296d5f5
+        .quad 0xc08ff3b08da70a90, 0xbd48893b8f7f52c9
+        .quad 0xc08ff3b375c32fe8, 0x3d5ca0b69a37d601
+        .quad 0xc08ff3b65d241df0, 0xbd519c57fff86872
+        .quad 0xc08ff3b943ca32d8, 0x3d048da0e3a8c3c3
+        .quad 0xc08ff3bc29b5cc68, 0xbd5dd05e06ec07d0
+        .quad 0xc08ff3bf0ee74840, 0x3d56c52a5c8015db
+        .quad 0xc08ff3c1f35f0398, 0x3d54e1dba9930bed
+        .quad 0xc08ff3c4d71d5b78, 0x3d2c5f679a7932b7
+        .quad 0xc08ff3c7ba22aca0, 0xbd3f77628aa1aed8
+        .quad 0xc08ff3cd7e03ac60, 0xbd5cc8a22f1d8591
+        .quad 0xc08ff3d33f04e360, 0x3d4ae09463e13f6f
+        .quad 0xc08ff3d8fd292dc8, 0x3d42736efbec3922
+        .quad 0xc08ff3deb8736390, 0xbce0324f8d149b09
+        .quad 0xc08ff3e470e65870, 0xbd52089e4b8dd900
+        .quad 0xc08ff3ea2684dbf0, 0xbd5f8e9d5dea127f
+        .quad 0xc08ff3efd951b970, 0xbd4b60d79db026b1
+        .quad 0xc08ff3f5894fb828, 0x3d45ff1d6cea2c52
+        .quad 0xc08ff3fb36819b38, 0x3d5d56022cd7f5b2
+        .quad 0xc08ff400e0ea21a8, 0xbd58d63f09907b27
+        .quad 0xc08ff406888c0690, 0xbd4ce6ea362f7ce0
+        .quad 0xc08ff40c2d6a00f0, 0x3d519fc9ad2ef3ab
+        .quad 0xc08ff411cf86c3c8, 0xbd55fc89e7b55f20
+        .quad 0xc08ff4176ee4fe40, 0xbd53229ca791d9be
+        .quad 0xc08ff41d0b875b88, 0x3d5e7733e6fb23d1
+        .quad 0xc08ff422a57082e0, 0x3d5871413696b637
+        .quad 0xc08ff4283ca317c0, 0x3d4b118aa7f493b9
+        .quad 0xc08ff42dd121b9c8, 0x3d4bdf3692763b50
+        .quad 0xc08ff43362ef04c8, 0x3d4867e17476dd63
+        .quad 0xc08ff438f20d90c8, 0xbd5d49b741c778f3
+        .quad 0xc08ff43e7e7ff228, 0x3d59ac35724f01e3
+        .quad 0xc08ff4440848b968, 0xbd5251ccdc49432d
+        .quad 0xc08ff4498f6a7388, 0x3d56cf153ebc9f07
+        .quad 0xc08ff44f13e7a9b8, 0x3d503b7a697a659c
+        .quad 0xc08ff45495c2e198, 0xbd5fa03da8acd872
+        .quad 0xc08ff45a14fe9d38, 0xbd5e6cfb0b5c38fc
+        .quad 0xc08ff45f919d5b08, 0x3d468b1f1269f1cf
+        .quad 0xc08ff4650ba195e0, 0xbd313a3a8f72c0f3
+        .quad 0xc08ff46a830dc528, 0x3d205d31eb8d2bd4
+        .quad 0xc08ff46ff7e45cb8, 0xbd56cb8ddf5d4a90
+        .quad 0xc08ff4756a27cd00, 0x3d272c2d46acdcbf
+        .quad 0xc08ff47ad9da82e8, 0xbd4946efab7a989d
+        .quad 0xc08ff48046fee800, 0xbd23fabe48cf933c
+        .quad 0xc08ff485b1976268, 0x3d4f03b099d80f79
+        .quad 0xc08ff48b19a654e0, 0x3d4fe0c35ab7e9b5
+        .quad 0xc08ff4907f2e1ed0, 0xbd54b4843f34fe09
+        .quad 0xc08ff495e2311c58, 0xbd5dfa6541236a64
+        .quad 0xc08ff49b42b1a648, 0x3d56fd2c8c418cbb
+        .quad 0xc08ff4a0a0b21218, 0x3d5e687ef208418a
+        .quad 0xc08ff4a5fc34b210, 0x3d4a671ce14c5521
+        .quad 0xc08ff4ab553bd540, 0x3d419d0202e3cd96
+        .quad 0xc08ff4b0abc9c780, 0x3d576b941a895781
+        .quad 0xc08ff4b5ffe0d170, 0xbd4ea96d88cd1a30
+        .quad 0xc08ff4bb518338a0, 0x3d4d6b405bd43ba6
+        .quad 0xc08ff4c0a0b33f60, 0xbcf03382150a56b7
+        .quad 0xc08ff4c5ed7324f8, 0xbd400df96beb0937
+        .quad 0xc08ff4cb37c52590, 0xbd5c161714cdebd5
+        .quad 0xc08ff4d07fab7a48, 0xbd333e8eda1a8e79
+        .quad 0xc08ff4d5c5285928, 0x3d53aba20381d59f
+        .quad 0xc08ff4db083df530, 0xbd45e9b07af4e77c
+        .quad 0xc08ff4e048ee7e70, 0xbd533cfdb78a8c41
+        .quad 0xc08ff4e5873c21f0, 0xbd5d9b87f4d283f2
+        .quad 0xc08ff4eac32909c8, 0xbd53a677deee97fa
+        .quad 0xc08ff4effcb75d18, 0xbd5afd9f5dedc208
+        .quad 0xc08ff4f533e94020, 0x3ce9dd794d20ab77
+        .quad 0xc08ff4fa68c0d428, 0xbd5eeae84ba1cbf1
+        .quad 0xc08ff4ff9b4037b0, 0xbd4f4451587282c8
+        .quad 0xc08ff504cb698648, 0xbd4a1fa15087e717
+        .quad 0xc08ff509f93ed8b0, 0xbd5f2f0042b9331a
+        .quad 0xc08ff50f24c244e0, 0xbd2c2389f8e86341
+        .quad 0xc08ff5144df5ddf0, 0xbd556fcb7b48f200
+        .quad 0xc08ff51974dbb448, 0x3d43ba060aa69038
+        .quad 0xc08ff51e9975d578, 0x3d477ef38ca20229
+        .quad 0xc08ff523bbc64c60, 0x3d49bcaf1aa4168a
+        .quad 0xc08ff528dbcf2120, 0xbd51c5609b60687e
+        .quad 0xc08ff52df9925930, 0xbd51691708d22ce7
+        .quad 0xc08ff5331511f750, 0x3d30d05c98ecb3d1
+        .quad 0xc08ff5382e4ffb90, 0xbd423adb056dd244
+        .quad 0xc08ff53d454e6368, 0xbd3663607042da50
+        .quad 0xc08ff5425a0f29a8, 0x3d42655d3c6187a6
+        .quad 0xc08ff5476c944680, 0xbd028c958ae09d20
+        .quad 0xc08ff54c7cdfaf90, 0xbd436eaf17756653
+        .quad 0xc08ff5518af357e8, 0x3d5fbbbee66f8d24
+        .quad 0xc08ff55696d12ff0, 0xbd5d93b389497880
+        .quad 0xc08ff55ba07b25b0, 0xbd43ff8ff777f337
+        .quad 0xc08ff560a7f32488, 0xbcf3568803ec82a4
+        .quad 0xc08ff565ad3b1560, 0xbd50c83eba5cc7ea
+        .quad 0xc08ff56ab054deb0, 0x3d5becc2411500b7
+        .quad 0xc08ff56fb1426458, 0xbd5dac964ffa8b83
+        .quad 0xc08ff574b00587f0, 0x3d1d82f6cc82e69f
+        .quad 0xc08ff579aca02878, 0xbd34767c0d40542c
+        .quad 0xc08ff57ea7142298, 0xbd52d28e996ed2ce
+        .quad 0xc08ff5839f635090, 0xbd432a85d337086d
+        .quad 0xc08ff588958f8a38, 0x3d512b06ec20c7fd
+        .quad 0xc08ff58d899aa500, 0xbd47e2147555e10b
+        .quad 0xc08ff5927b867410, 0xbd4d84480a1b301d
+        .quad 0xc08ff5976b54c830, 0x3d5622146f3a51bd
+        .quad 0xc08ff59c59076fc8, 0x3d46d485c5f9c392
+        .quad 0xc08ff5a144a03700, 0xbd4562714549f4fd
+        .quad 0xc08ff5a62e20e7b8, 0x3d541ab67e365a63
+        .quad 0xc08ff5ab158b4970, 0xbd5b0855668b2369
+        .quad 0xc08ff5affae12188, 0x3d27de1bc2ed4dd8
+        .quad 0xc08ff5b4de243300, 0x3d40f2592d5ed454
+        .quad 0xc08ff5b9bf563ea8, 0xbd4ee2f8ba7b3e9e
+        .quad 0xc08ff5be9e790320, 0xbd3c2214335c2164
+        .quad 0xc08ff5c37b8e3cc8, 0x3d30745623ab1fd9
+        .quad 0xc08ff5c85697a5d0, 0xbd326c8fb0ffde38
+        .quad 0xc08ff5cd2f96f640, 0xbd4c83277493b0bc
+        .quad 0xc08ff5d2068de3f8, 0x3d39bb1655e6e5ba
+        .quad 0xc08ff5d6db7e22a8, 0x3d403170b47a5559
+        .quad 0xc08ff5dbae6963e8, 0x3d5801ddf1edc325
+        .quad 0xc08ff5e07f515728, 0x3d4b2704c46fe064
+        .quad 0xc08ff5e54e37a9c8, 0x3d5a16e99ed6cd83
+        .quad 0xc08ff5ea1b1e0700, 0xbd5353a3ac18c62f
+        .quad 0xc08ff5eee6061810, 0x3d567c69c189f21a
+        .quad 0xc08ff5f3aef18400, 0xbd50dd3220e0b0f2
+        .quad 0xc08ff5f875e1eff0, 0xbd3ab64d80638db2
+        .quad 0xc08ff5fd3ad8fee0, 0x3d3ec753439035aa
+        .quad 0xc08ff601fdd851c8, 0xbd5e10415f5f5e74
+        .quad 0xc08ff606bee187b0, 0xbd55f1048b113fae
+        .quad 0xc08ff60b7df63d90, 0x3d1e94e4107406c8
+        .quad 0xc08ff6103b180e60, 0xbd4e2eb5d0c36eb5
+        .quad 0xc08ff614f6489330, 0x3d43ec5c714f709a
+        .quad 0xc08ff619af896308, 0x3d519ec459b62a08
+        .quad 0xc08ff61e66dc1300, 0xbd5b93d09dd6161d
+        .quad 0xc08ff6231c423658, 0x3d5d72b849dd56be
+        .quad 0xc08ff627cfbd5e38, 0xbd276b7e32659173
+        .quad 0xc08ff62c814f1a08, 0x3d4fd918f2e7a6b9
+        .quad 0xc08ff63130f8f730, 0x3d5609ba1dcc4c97
+        .quad 0xc08ff635debc8138, 0xbd55cab233dbd84c
+        .quad 0xc08ff63a8a9b41d8, 0xbd56778ab7aaabc9
+        .quad 0xc08ff63f3496c0e0, 0x3d5b2791da49c370
+        .quad 0xc08ff643dcb08438, 0x3d583063ef145f9c
+        .quad 0xc08ff64882ea1000, 0xbd484e9cab375fb6
+        .quad 0xc08ff64d2744e688, 0xbd5c430c95c374aa
+        .quad 0xc08ff651c9c28848, 0xbd57a16d78490bb3
+        .quad 0xc08ff6566a6473e8, 0xbd445d70374ea9ec
+        .quad 0xc08ff65b092c2648, 0x3d5c9729142b9d4b
+        .quad 0xc08ff65fa61b1a70, 0xbd4aaa179d032405
+        .quad 0xc08ff6644132c9c0, 0xbd2a3ea300d173de
+        .quad 0xc08ff668da74abc0, 0x3d57809438efb010
+        .quad 0xc08ff66d71e23630, 0xbd5e9156720951d6
+        .quad 0xc08ff672077cdd30, 0xbd5bab62e8462035
+        .quad 0xc08ff6769b461310, 0xbd05113545431443
+        .quad 0xc08ff67b2d3f4868, 0x3d5105eb0607e59b
+        .quad 0xc08ff67fbd69ec18, 0xbd5e657842b37dc0
+        .quad 0xc08ff6844bc76b68, 0x3d4ad1849705bc4c
+        .quad 0xc08ff688d85931c8, 0xbd508b6f92b6e0d6
+        .quad 0xc08ff68d6320a920, 0x3d48683cceb5fdfc
+        .quad 0xc08ff691ec1f3990, 0xbd2c25ee290acbf5
+        .quad 0xc08ff696735649a8, 0x3d58904932cd46d0
+        .quad 0xc08ff69af8c73e38, 0xbd5c964167f0bfeb
+        .quad 0xc08ff69f7c737a90, 0xbd43d66937fa06a9
+        .quad 0xc08ff6a3fe5c6040, 0xbd54bc302ffa76fb
+        .quad 0xc08ff6a87e834f50, 0x3d4609b1487f87a3
+        .quad 0xc08ff6acfce9a618, 0xbd42c0d9af0400b1
+        .quad 0xc08ff6b17990c170, 0x3d549a63973d262d
+        .quad 0xc08ff6b5f479fc80, 0xbd28cde894aa0641
+        .quad 0xc08ff6ba6da6b0f0, 0xbd5acef617609a34
+        .quad 0xc08ff6bee51836d8, 0x3d4abb9ff3cf80b8
+        .quad 0xc08ff6c35acfe4a8, 0xbd53dcfa1b7697f3
+        .quad 0xc08ff6c7cecf0f68, 0x3d5bcdf4aea18a55
+        .quad 0xc08ff6cc41170a70, 0x3d3cad29d4324038
+        .quad 0xc08ff6d0b1a927b0, 0x3d56945f9cc2a565
+        .quad 0xc08ff6d52086b780, 0x3d5d20dfc1c668a7
+        .quad 0xc08ff6d98db108b8, 0x3d37f20a9bcbbe04
+        .quad 0xc08ff6ddf92968b8, 0x3d1e0824a6e3a4d2
+        .quad 0xc08ff6e262f12358, 0xbd469f07bf6322c7
+        .quad 0xc08ff6e6cb0982f8, 0xbd5cc593afdbfaef
+        .quad 0xc08ff6eb3173d080, 0xbd5ee68d555d7122
+        .quad 0xc08ff6ef96315360, 0xbd144ee1d6a39124
+        .quad 0xc08ff6f3f9435188, 0xbd40f2cb308bcd25
+        .quad 0xc08ff6f85aab0f80, 0xbd5fd98ced08a73c
+        .quad 0xc08ff6fcba69d068, 0x3d54f2f2a1ea8606
+        .quad 0xc08ff7011880d5d0, 0xbd57818234572db7
+        .quad 0xc08ff70574f16008, 0x3d52429e823a9a83
+        .quad 0xc08ff709cfbcadd0, 0x3d5d6dc9bb81476c
+        .quad 0xc08ff70e28e3fc90, 0x3d57d189e116bcb2
+        .quad 0xc08ff71280688848, 0x3d0e18992809fd6d
+        .quad 0xc08ff716d64b8b98, 0xbd3b48ac92b8549a
+        .quad 0xc08ff71b2a8e3fb8, 0xbd4dcfa48040893b
+        .quad 0xc08ff71f7d31dc88, 0x3d58d945b8e53ef1
+        .quad 0xc08ff723ce379878, 0x3d4f80faef3e15ee
+        .quad 0xc08ff7281da0a8b0, 0x3d53edc0fd40d18f
+        .quad 0xc08ff72c6b6e40f0, 0xbd4bcac66e0be72f
+        .quad 0xc08ff730b7a193b0, 0xbd44fcf96e2ec967
+        .quad 0xc08ff735023bd208, 0x3d57e2ff34b08d86
+        .quad 0xc08ff7394b3e2bb0, 0xbd4caedfb10b98dd
+        .quad 0xc08ff73d92a9cf28, 0xbd55db1083e5ac6a
+        .quad 0xc08ff741d87fe990, 0xbd580e83e6d54ed6
+        .quad 0xc08ff7461cc1a6c0, 0x3d1688c83e1b0cba
+        .quad 0xc08ff74a5f703138, 0xbd52c398c872b701
+        .quad 0xc08ff74ea08cb240, 0xbd49aabc3683b259
+        .quad 0xc08ff752e01851d0, 0x3d5ccba8de72495b
+        .quad 0xc08ff7571e143688, 0xbd5981cf630f5793
+        .quad 0xc08ff75b5a8185e8, 0xbd4f235844e01ebd
+        .quad 0xc08ff75f95616410, 0xbd5047de7ba8ec62
+        .quad 0xc08ff763ceb4f3f0, 0x3d5fa55e004d6562
+        .quad 0xc08ff768067d5720, 0xbd49f386e521a80e
+        .quad 0xc08ff76c3cbbae20, 0x3d3693551e62fe83
+        .quad 0xc08ff77071711818, 0x3d4ba63b30b6c42c
+        .quad 0xc08ff774a49eb300, 0x3d4c26523d32f573
+        .quad 0xc08ff778d6459b98, 0x3d3b65e70806143a
+        .quad 0xc08ff77d0666ed68, 0xbd5796d9c9f2c2cb
+        .quad 0xc08ff7813503c2d0, 0x3d33267b004b912b
+        .quad 0xc08ff785621d34e8, 0x3d1d5d8a23e33341
+        .quad 0xc08ff7898db45ba8, 0x3d46c95233e60f40
+        .quad 0xc08ff78db7ca4dd0, 0x3d362865acc8f43f
+        .quad 0xc08ff791e06020f8, 0xbd10e8203e161511
+        .quad 0xc08ff7960776e988, 0xbd5cafe4f4467eaa
+        .quad 0xc08ff79a2d0fbac8, 0xbd520fddea9ea0cd
+        .quad 0xc08ff79e512ba6d0, 0x3d5c53d3778dae46
+        .quad 0xc08ff7a273cbbe80, 0xbd5f0f6f88490367
+        .quad 0xc08ff7a694f111c0, 0x3d5601aa3f55ec11
+        .quad 0xc08ff7aab49caf20, 0xbd4f1a8a2328a4c4
+        .quad 0xc08ff7aed2cfa438, 0xbd4a3d5341c07d0e
+        .quad 0xc08ff7b2ef8afd68, 0xbd5f4a1f4c525f31
+        .quad 0xc08ff7b70acfc600, 0xbd4d594d77b3d775
+        .quad 0xc08ff7bb249f0828, 0x3d2aef47e37e953b
+        .quad 0xc08ff7bf3cf9ccf0, 0x3d501803b47dfba2
+        .quad 0xc08ff7c353e11c50, 0x3d5ed5ec84e5745e
+        .quad 0xc08ff7c76955fd20, 0xbd3de249bc9e7f96
+        .quad 0xc08ff7cb7d597538, 0x3d5b5794341d1fdf
+        .quad 0xc08ff7cf8fec8938, 0xbd519dbd08276359
+        .quad 0xc08ff7d3a1103cd0, 0xbd450129b8038848
+        .quad 0xc08ff7d7b0c59288, 0x3d348f00d3bb30fd
+        .quad 0xc08ff7dbbf0d8bd8, 0xbd43529025720d8a
+        .quad 0xc08ff7dfcbe92938, 0x3d5abdaa2b1955d7
+        .quad 0xc08ff7e3d75969f8, 0xbd4e8837d4588a98
+        .quad 0xc08ff7e7e15f4c80, 0x3d57a782a6df5a1f
+        .quad 0xc08ff7ebe9fbce08, 0x3d304ba3eaa96bf1
+        .quad 0xc08ff7eff12fead8, 0xbd47aab17b868a60
+        .quad 0xc08ff7f3f6fc9e28, 0xbd5bd858693ba90a
+        .quad 0xc08ff7f7fb62e230, 0x3d26abb2c547789a
+        .quad 0xc08ff7fbfe63b010, 0xbd59d383d543b3f5
+        .quad 0xc08ff80000000000, 0x8000000000000000
+        /*== Log_LA_table ==*/
+        .align 16
+        .quad 0x0000000000000000
+        .quad 0xbf670f83ff0a7565
+        .quad 0xbf7709c46d7aac77
+        .quad 0xbf8143068125dd0e
+        .quad 0xbf86fe50b6ef0851
+        .quad 0xbf8cb6c3abd14559
+        .quad 0xbf91363117a97b0c
+        .quad 0xbf940f9786685d29
+        .quad 0xbf96e79685c2d22a
+        .quad 0xbf99be2f7749acc2
+        .quad 0xbf9c9363ba850f86
+        .quad 0xbf9f6734acf8695a
+        .quad 0xbfa11cd1d5133413
+        .quad 0xbfa2855905ca70f6
+        .quad 0xbfa3ed3094685a26
+        .quad 0xbfa554592bb8cd58
+        .quad 0xbfa6bad3758efd87
+        .quad 0xbfa820a01ac754cb
+        .quad 0xbfa985bfc3495194
+        .quad 0xbfaaea3316095f72
+        .quad 0xbfac4dfab90aab5f
+        .quad 0xbfadb1175160f3b0
+        .quad 0xbfaf1389833253a0
+        .quad 0xbfb03aa8f8dc854c
+        .quad 0xbfb0eb389fa29f9b
+        .quad 0xbfb19b74069f5f0a
+        .quad 0xbfb24b5b7e135a3d
+        .quad 0xbfb2faef55ccb372
+        .quad 0xbfb3aa2fdd27f1c3
+        .quad 0xbfb4591d6310d85a
+        .quad 0xbfb507b836033bb7
+        .quad 0xbfb5b600a40bd4f3
+        .quad 0xbfb663f6fac91316
+        .quad 0xbfb7119b876bea86
+        .quad 0xbfb7beee96b8a281
+        .quad 0xbfb86bf07507a0c7
+        .quad 0xbfb918a16e46335b
+        .quad 0xbfb9c501cdf75872
+        .quad 0xbfba7111df348494
+        .quad 0xbfbb1cd1ecae66e7
+        .quad 0xbfbbc84240adabba
+        .quad 0xbfbc73632513bd4f
+        .quad 0xbfbd1e34e35b82da
+        .quad 0xbfbdc8b7c49a1ddb
+        .quad 0xbfbe72ec117fa5b2
+        .quad 0xbfbf1cd21257e18c
+        .quad 0xbfbfc66a0f0b00a5
+        .quad 0xbfc037da278f2870
+        .quad 0xbfc08c588cda79e4
+        .quad 0xbfc0e0b05ac848ed
+        .quad 0xbfc134e1b489062e
+        .quad 0xbfc188ecbd1d16be
+        .quad 0xbfc1dcd197552b7b
+        .quad 0xbfc2309065d29791
+        .quad 0xbfc284294b07a640
+        .quad 0xbfc2d79c6937efdd
+        .quad 0xbfc32ae9e278ae1a
+        .quad 0xbfc37e11d8b10f89
+        .quad 0xbfc3d1146d9a8a64
+        .quad 0xbfc423f1c2c12ea2
+        .quad 0xbfc476a9f983f74d
+        .quad 0xbfc4c93d33151b24
+        .quad 0xbfc51bab907a5c8a
+        .quad 0xbfc56df5328d58c5
+        .quad 0xbfc5c01a39fbd688
+        .quad 0xbfc6121ac74813cf
+        .quad 0xbfc663f6fac91316
+        .quad 0xbfc6b5aef4aae7dc
+        .quad 0xbfc70742d4ef027f
+        .quad 0xbfc758b2bb6c7b76
+        .quad 0xbfc7a9fec7d05ddf
+        .quad 0xbfc7fb27199df16d
+        .quad 0xbfc84c2bd02f03b3
+        .quad 0xbfc89d0d0ab430cd
+        .quad 0xbfc8edcae8352b6c
+        .quad 0xbfc93e6587910444
+        .quad 0xbfc98edd077e70df
+        .quad 0xbfc9df31868c11d5
+        .quad 0xbfca2f632320b86b
+        .quad 0xbfca7f71fb7bab9d
+        .quad 0xbfcacf5e2db4ec94
+        .quad 0xbfcb1f27d7bd7a80
+        .quad 0xbfcb6ecf175f95e9
+        .quad 0xbfcbbe540a3f036f
+        .quad 0xbfcc0db6cdd94dee
+        .quad 0xbfcc5cf77f860826
+        .quad 0xbfccac163c770dc9
+        .quad 0xbfccfb1321b8c400
+        .quad 0xbfcd49ee4c325970
+        .quad 0xbfcd98a7d8a605a7
+        .quad 0xbfcde73fe3b1480f
+        .quad 0xbfce35b689cd2655
+        .quad 0xbfce840be74e6a4d
+        .quad 0xbfced2401865df52
+        .quad 0xbfcf205339208f27
+        .quad 0xbfcf6e456567fe55
+        .quad 0xbfcfbc16b902680a
+        .quad 0xbfd004e3a7c97cbd
+        .quad 0xbfd02baba24d0664
+        .quad 0xbfd0526359bab1b3
+        .quad 0xbfd0790adbb03009
+        .quad 0xbfd09fa235ba2020
+        .quad 0xbfd0c62975542a8f
+        .quad 0xbfd0eca0a7e91e0b
+        .quad 0xbfd11307dad30b76
+        .quad 0xbfd1395f1b5b61a6
+        .quad 0xbfd15fa676bb08ff
+        .quad 0xbfd185ddfa1a7ed0
+        .quad 0xbfd1ac05b291f070
+        .quad 0xbfd1d21dad295632
+        .quad 0xbfd1f825f6d88e13
+        .quad 0xbfd21e1e9c877639
+        .quad 0xbfd24407ab0e073a
+        .quad 0xbfd269e12f346e2c
+        .quad 0xbfd28fab35b32683
+        .quad 0xbfd2b565cb3313b6
+        .quad 0xbfd2db10fc4d9aaf
+        .quad 0xbfd300acd58cbb10
+        .quad 0xbfd32639636b2836
+        .quad 0xbfd34bb6b2546218
+        .quad 0xbfd37124cea4cded
+        .quad 0xbfd39683c4a9ce9a
+        .quad 0xbfd3bbd3a0a1dcfb
+        .quad 0xbfd3e1146ebc9ff2
+        .quad 0xbfd406463b1b0449
+        .quad 0xbfd42b6911cf5465
+        .quad 0xbfd4507cfedd4fc4
+        .quad 0xbfd475820e3a4251
+        .quad 0xbfd49a784bcd1b8b
+        .quad 0xbfd4bf5fc36e8577
+        .quad 0xbfd4e43880e8fb6a
+        .quad 0xbfd509028ff8e0a2
+        .quad 0xbfd52dbdfc4c96b3
+        .quad 0xbfd5526ad18493ce
+        .quad 0xbfd577091b3378cb
+        .quad 0xbfd59b98e4de271c
+        .quad 0xbfd5c01a39fbd688
+        .quad 0xbfd5e48d25f62ab9
+        .quad 0xbfd608f1b42948ae
+        .quad 0xbfd62d47efe3ebee
+        .quad 0xbfd6518fe4677ba7
+        .quad 0xbfd675c99ce81f92
+        .quad 0xbfd699f5248cd4b8
+        .quad 0xbfd6be12866f820d
+        .quad 0xbfd6e221cd9d0cde
+        .quad 0xbfd7062305156d1d
+        .quad 0xbfd72a1637cbc183
+        .quad 0xbfd74dfb70a66388
+        .quad 0xbfd771d2ba7efb3c
+        .quad 0xbfd7959c202292f1
+        .quad 0xbfd7b957ac51aac4
+        .quad 0xbfd7dd0569c04bff
+        .quad 0xbfd800a563161c54
+        .quad 0xbfd82437a2ee70f7
+        .quad 0xbfd847bc33d8618e
+        .quad 0xbfd86b332056db01
+        .quad 0xbfd88e9c72e0b226
+        .quad 0xbfd8b1f835e0b642
+        .quad 0xbfd8d54673b5c372
+        .quad 0xbfd8f88736b2d4e8
+        .quad 0xbfd91bba891f1709
+        .quad 0xbfd93ee07535f967
+        .quad 0xbfd961f90527409c
+        .quad 0xbfd98504431717fc
+        .quad 0xbfd9a802391e232f
+        .quad 0xbfd9caf2f1498fa4
+        .quad 0xbfd9edd6759b25e0
+        .quad 0xbfda10acd0095ab4
+        .quad 0xbfda33760a7f6051
+        .quad 0xbfda56322edd3731
+        .quad 0xbfda78e146f7bef4
+        .quad 0xbfda9b835c98c70a
+        .quad 0xbfdabe18797f1f49
+        .quad 0xbfdae0a0a75ea862
+        .quad 0xbfdb031befe06434
+        .quad 0xbfdb258a5ca28608
+        .quad 0xbfdb47ebf73882a1
+        .quad 0xbfdb6a40c92b203f
+        .quad 0xbfdb8c88dbf8867a
+        .quad 0xbfdbaec439144dfd
+        .quad 0xbfdbd0f2e9e79031
+        .quad 0xbfdbf314f7d0f6ba
+        .quad 0xbfdc152a6c24cae6
+        .quad 0xbfdc3733502d04f8
+        .quad 0xbfdc592fad295b56
+        .quad 0xbfdc7b1f8c4f51a4
+        .quad 0xbfdc9d02f6ca47b4
+        .quad 0xbfdcbed9f5bb886a
+        .quad 0xbfdce0a4923a587d
+        .quad 0xbfdd0262d554051c
+        .quad 0xbfdd2414c80bf27d
+        .quad 0xbfdd45ba735baa4f
+        .quad 0xbfdd6753e032ea0f
+        .quad 0xbfdd88e11777b149
+        .quad 0xbfddaa6222064fb9
+        .quad 0xbfddcbd708b17359
+        .quad 0xbfdded3fd442364c
+        .quad 0xbfde0e9c8d782cbd
+        .quad 0xbfde2fed3d097298
+        .quad 0xbfde5131eba2b931
+        .quad 0xbfde726aa1e754d2
+        .quad 0xbfde939768714a32
+        .quad 0xbfdeb4b847d15bce
+        .quad 0xbfded5cd488f1732
+        .quad 0xbfdef6d67328e220
+        .quad 0xbfdf17d3d01407af
+        .quad 0xbfdf38c567bcc541
+        .quad 0xbfdf59ab4286576c
+        .quad 0xbfdf7a8568cb06cf
+        .quad 0xbfdf9b53e2dc34c4
+        .quad 0xbfdfbc16b902680a
+        .quad 0xbfdfdccdf37d594c
+        .quad 0xbfdffd799a83ff9b
+        .quad 0x3fdfe1e649bb6335
+        .quad 0x3fdfc151b11b3640
+        .quad 0x3fdfa0c8937e7d5d
+        .quad 0x3fdf804ae8d0cd02
+        .quad 0x3fdf5fd8a9063e35
+        .quad 0x3fdf3f71cc1b629c
+        .quad 0x3fdf1f164a15389a
+        .quad 0x3fdefec61b011f85
+        .quad 0x3fdede8136f4cbf1
+        .quad 0x3fdebe47960e3c08
+        .quad 0x3fde9e193073ac06
+        .quad 0x3fde7df5fe538ab3
+        .quad 0x3fde5dddf7e46e0a
+        .quad 0x3fde3dd1156507de
+        .quad 0x3fde1dcf4f1c1a9e
+        .quad 0x3fddfdd89d586e2b
+        .quad 0x3fddddecf870c4c1
+        .quad 0x3fddbe0c58c3cff2
+        .quad 0x3fdd9e36b6b825b1
+        .quad 0x3fdd7e6c0abc3579
+        .quad 0x3fdd5eac4d463d7e
+        .quad 0x3fdd3ef776d43ff4
+        .quad 0x3fdd1f4d7febf868
+        .quad 0x3fdcffae611ad12b
+        .quad 0x3fdce01a12f5d8d1
+        .quad 0x3fdcc0908e19b7bd
+        .quad 0x3fdca111cb2aa5c5
+        .quad 0x3fdc819dc2d45fe4
+        .quad 0x3fdc62346dca1dfe
+        .quad 0x3fdc42d5c4c688b4
+        .quad 0x3fdc2381c08baf4f
+        .quad 0x3fdc043859e2fdb3
+        .quad 0x3fdbe4f9899d326e
+        .quad 0x3fdbc5c5489254cc
+        .quad 0x3fdba69b8fa1ab02
+        .quad 0x3fdb877c57b1b070
+        .quad 0x3fdb686799b00be3
+        .quad 0x3fdb495d4e9185f7
+        .quad 0x3fdb2a5d6f51ff83
+        .quad 0x3fdb0b67f4f46810
+        .quad 0x3fdaec7cd882b46c
+        .quad 0x3fdacd9c130dd53f
+        .quad 0x3fdaaec59dadadbe
+        .quad 0x3fda8ff971810a5e
+        .quad 0x3fda713787ad97a5
+        .quad 0x3fda527fd95fd8ff
+        .quad 0x3fda33d25fcb1fac
+        .quad 0x3fda152f142981b4
+        .quad 0x3fd9f695efbbd0ef
+        .quad 0x3fd9d806ebc9921c
+        .quad 0x3fd9b98201a0f405
+        .quad 0x3fd99b072a96c6b2
+        .quad 0x3fd97c96600672ad
+        .quad 0x3fd95e2f9b51f04e
+        .quad 0x3fd93fd2d5e1bf1d
+        .quad 0x3fd921800924dd3b
+        .quad 0x3fd903372e90bee4
+        .quad 0x3fd8e4f83fa145ee
+        .quad 0x3fd8c6c335d8b966
+        .quad 0x3fd8a8980abfbd32
+        .quad 0x3fd88a76b7e549c6
+        .quad 0x3fd86c5f36dea3dc
+        .quad 0x3fd84e5181475449
+        .quad 0x3fd8304d90c11fd3
+        .quad 0x3fd812535ef3ff19
+        .quad 0x3fd7f462e58e1688
+        .quad 0x3fd7d67c1e43ae5c
+        .quad 0x3fd7b89f02cf2aad
+        .quad 0x3fd79acb8cf10390
+        .quad 0x3fd77d01b66fbd37
+        .quad 0x3fd75f417917e02c
+        .quad 0x3fd7418acebbf18f
+        .quad 0x3fd723ddb1346b65
+        .quad 0x3fd7063a1a5fb4f2
+        .quad 0x3fd6e8a004221b1f
+        .quad 0x3fd6cb0f6865c8ea
+        .quad 0x3fd6ad88411abfea
+        .quad 0x3fd6900a8836d0d5
+        .quad 0x3fd6729637b59418
+        .quad 0x3fd6552b49986277
+        .quad 0x3fd637c9b7e64dc2
+        .quad 0x3fd61a717cac1983
+        .quad 0x3fd5fd2291fc33cf
+        .quad 0x3fd5dfdcf1eeae0e
+        .quad 0x3fd5c2a096a135dc
+        .quad 0x3fd5a56d7a370ded
+        .quad 0x3fd5884396d90702
+        .quad 0x3fd56b22e6b578e5
+        .quad 0x3fd54e0b64003b70
+        .quad 0x3fd530fd08f29fa7
+        .quad 0x3fd513f7cfcb68ce
+        .quad 0x3fd4f6fbb2cec598
+        .quad 0x3fd4da08ac46495a
+        .quad 0x3fd4bd1eb680e548
+        .quad 0x3fd4a03dcbd2e1be
+        .quad 0x3fd48365e695d797
+        .quad 0x3fd466970128a987
+        .quad 0x3fd449d115ef7d87
+        .quad 0x3fd42d141f53b646
+        .quad 0x3fd4106017c3eca3
+        .quad 0x3fd3f3b4f9b3e939
+        .quad 0x3fd3d712bf9c9def
+        .quad 0x3fd3ba7963fc1f8f
+        .quad 0x3fd39de8e1559f6f
+        .quad 0x3fd3816132316520
+        .quad 0x3fd364e2511cc821
+        .quad 0x3fd3486c38aa29a8
+        .quad 0x3fd32bfee370ee68
+        .quad 0x3fd30f9a4c0d786d
+        .quad 0x3fd2f33e6d2120f2
+        .quad 0x3fd2d6eb4152324f
+        .quad 0x3fd2baa0c34be1ec
+        .quad 0x3fd29e5eedbe4a35
+        .quad 0x3fd28225bb5e64a4
+        .quad 0x3fd265f526e603cb
+        .quad 0x3fd249cd2b13cd6c
+        .quad 0x3fd22dadc2ab3497
+        .quad 0x3fd21196e87473d1
+        .quad 0x3fd1f588973c8747
+        .quad 0x3fd1d982c9d52708
+        .quad 0x3fd1bd857b14c146
+        .quad 0x3fd1a190a5d674a0
+        .quad 0x3fd185a444fa0a7b
+        .quad 0x3fd169c05363f158
+        .quad 0x3fd14de4cbfd373e
+        .quad 0x3fd13211a9b38424
+        .quad 0x3fd11646e7791469
+        .quad 0x3fd0fa848044b351
+        .quad 0x3fd0deca6f11b58b
+        .quad 0x3fd0c318aedff3c0
+        .quad 0x3fd0a76f3ab3c52c
+        .quad 0x3fd08bce0d95fa38
+        .quad 0x3fd070352293d724
+        .quad 0x3fd054a474bf0eb7
+        .quad 0x3fd0391bff2dbcf3
+        .quad 0x3fd01d9bbcfa61d4
+        .quad 0x3fd00223a943dc19
+        .quad 0x3fcfcd677e5ac81d
+        .quad 0x3fcf9697f3bd0ccf
+        .quad 0x3fcf5fd8a9063e35
+        .quad 0x3fcf29299496a889
+        .quad 0x3fcef28aacd72231
+        .quad 0x3fcebbfbe83901a6
+        .quad 0x3fce857d3d361368
+        .quad 0x3fce4f0ea2509008
+        .quad 0x3fce18b00e13123d
+        .quad 0x3fcde26177108d03
+        .quad 0x3fcdac22d3e441d3
+        .quad 0x3fcd75f41b31b6dd
+        .quad 0x3fcd3fd543a4ad5c
+        .quad 0x3fcd09c643f117f0
+        .quad 0x3fccd3c712d31109
+        .quad 0x3fcc9dd7a70ed160
+        .quad 0x3fcc67f7f770a67e
+        .quad 0x3fcc3227facce950
+        .quad 0x3fcbfc67a7fff4cc
+        .quad 0x3fcbc6b6f5ee1c9b
+        .quad 0x3fcb9115db83a3dd
+        .quad 0x3fcb5b844fb4b3ef
+        .quad 0x3fcb2602497d5346
+        .quad 0x3fcaf08fbfe15c51
+        .quad 0x3fcabb2ca9ec7472
+        .quad 0x3fca85d8feb202f7
+        .quad 0x3fca5094b54d2828
+        .quad 0x3fca1b5fc4e0b465
+        .quad 0x3fc9e63a24971f46
+        .quad 0x3fc9b123cba27ed3
+        .quad 0x3fc97c1cb13c7ec1
+        .quad 0x3fc94724cca657be
+        .quad 0x3fc9123c1528c6ce
+        .quad 0x3fc8dd62821404a9
+        .quad 0x3fc8a8980abfbd32
+        .quad 0x3fc873dca68b06f4
+        .quad 0x3fc83f304cdc5aa7
+        .quad 0x3fc80a92f5218acc
+        .quad 0x3fc7d60496cfbb4c
+        .quad 0x3fc7a18529635926
+        .quad 0x3fc76d14a4601225
+        .quad 0x3fc738b2ff50ccad
+        .quad 0x3fc7046031c79f85
+        .quad 0x3fc6d01c335dc9b5
+        .quad 0x3fc69be6fbb3aa6f
+        .quad 0x3fc667c08270b905
+        .quad 0x3fc633a8bf437ce1
+        .quad 0x3fc5ff9fa9e18595
+        .quad 0x3fc5cba53a0762ed
+        .quad 0x3fc597b967789d12
+        .quad 0x3fc563dc29ffacb2
+        .quad 0x3fc5300d796df33a
+        .quad 0x3fc4fc4d4d9bb313
+        .quad 0x3fc4c89b9e6807f5
+        .quad 0x3fc494f863b8df35
+        .quad 0x3fc46163957af02e
+        .quad 0x3fc42ddd2ba1b4a9
+        .quad 0x3fc3fa651e276158
+        .quad 0x3fc3c6fb650cde51
+        .quad 0x3fc3939ff859bf9f
+        .quad 0x3fc36052d01c3dd7
+        .quad 0x3fc32d13e4692eb7
+        .quad 0x3fc2f9e32d5bfdd1
+        .quad 0x3fc2c6c0a316a540
+        .quad 0x3fc293ac3dc1a668
+        .quad 0x3fc260a5f58c02bd
+        .quad 0x3fc22dadc2ab3497
+        .quad 0x3fc1fac39d5b280c
+        .quad 0x3fc1c7e77dde33dc
+        .quad 0x3fc195195c7d125b
+        .quad 0x3fc162593186da70
+        .quad 0x3fc12fa6f550f896
+        .quad 0x3fc0fd02a03727ea
+        .quad 0x3fc0ca6c2a9b6b41
+        .quad 0x3fc097e38ce60649
+        .quad 0x3fc06568bf8576b3
+        .quad 0x3fc032fbbaee6d65
+        .quad 0x3fc0009c779bc7b5
+        .quad 0x3fbf9c95dc1d1165
+        .quad 0x3fbf380e2d9ba4df
+        .quad 0x3fbed3a1d4cdbebb
+        .quad 0x3fbe6f50c2d9f754
+        .quad 0x3fbe0b1ae8f2fd56
+        .quad 0x3fbda700385788a2
+        .quad 0x3fbd4300a2524d41
+        .quad 0x3fbcdf1c1839ee74
+        .quad 0x3fbc7b528b70f1c5
+        .quad 0x3fbc17a3ed65b23c
+        .quad 0x3fbbb4102f925394
+        .quad 0x3fbb5097437cb58e
+        .quad 0x3fbaed391ab6674e
+        .quad 0x3fba89f5a6dc9acc
+        .quad 0x3fba26ccd9981853
+        .quad 0x3fb9c3bea49d3214
+        .quad 0x3fb960caf9abb7ca
+        .quad 0x3fb8fdf1ca8eea6a
+        .quad 0x3fb89b33091d6fe8
+        .quad 0x3fb8388ea739470a
+        .quad 0x3fb7d60496cfbb4c
+        .quad 0x3fb77394c9d958d5
+        .quad 0x3fb7113f3259e07a
+        .quad 0x3fb6af03c2603bd0
+        .quad 0x3fb64ce26c067157
+        .quad 0x3fb5eadb217198a3
+        .quad 0x3fb588edd4d1ceaa
+        .quad 0x3fb5271a78622a0f
+        .quad 0x3fb4c560fe68af88
+        .quad 0x3fb463c15936464e
+        .quad 0x3fb4023b7b26ac9e
+        .quad 0x3fb3a0cf56a06c4b
+        .quad 0x3fb33f7cde14cf5a
+        .quad 0x3fb2de4403ffd4b3
+        .quad 0x3fb27d24bae824db
+        .quad 0x3fb21c1ef55f06c2
+        .quad 0x3fb1bb32a600549d
+        .quad 0x3fb15a5fbf7270ce
+        .quad 0x3fb0f9a634663add
+        .quad 0x3fb09905f797047c
+        .quad 0x3fb0387efbca869e
+        .quad 0x3fafb02267a1ad2d
+        .quad 0x3faeef792508b69d
+        .quad 0x3fae2f02159384fe
+        .quad 0x3fad6ebd1f1febfe
+        .quad 0x3facaeaa27a02241
+        .quad 0x3fabeec9151aac2e
+        .quad 0x3fab2f19cdaa46dc
+        .quad 0x3faa6f9c377dd31b
+        .quad 0x3fa9b05038d84095
+        .quad 0x3fa8f135b8107912
+        .quad 0x3fa8324c9b914bc7
+        .quad 0x3fa77394c9d958d5
+        .quad 0x3fa6b50e297afcce
+        .quad 0x3fa5f6b8a11c3c61
+        .quad 0x3fa538941776b01e
+        .quad 0x3fa47aa07357704f
+        .quad 0x3fa3bcdd9b9f00f3
+        .quad 0x3fa2ff4b77413dcb
+        .quad 0x3fa241e9ed454683
+        .quad 0x3fa184b8e4c56af8
+        .quad 0x3fa0c7b844ef1795
+        .quad 0x3fa00ae7f502c1c4
+        .quad 0x3f9e9c8fb8a7a900
+        .quad 0x3f9d23afc49139f9
+        .quad 0x3f9bab2fdcb46ec7
+        .quad 0x3f9a330fd028f75f
+        .quad 0x3f98bb4f6e2bd536
+        .quad 0x3f9743ee861f3556
+        .quad 0x3f95ccece78a4a9e
+        .quad 0x3f94564a62192834
+        .quad 0x3f92e006c59c9c29
+        .quad 0x3f916a21e20a0a45
+        .quad 0x3f8fe9370ef68e1b
+        .quad 0x3f8cfee70c5ce5dc
+        .quad 0x3f8a15535d0bab34
+        .quad 0x3f872c7ba20f7327
+        .quad 0x3f84445f7cbc8fd2
+        .quad 0x3f815cfe8eaec830
+        .quad 0x3f7cecb0f3922091
+        .quad 0x3f7720d9c06a835f
+        .quad 0x3f715676c8c7a8c1
+        .quad 0x3f671b0ea42e5fda
+        .quad 0x3f57182a894b69c6
+        .quad 0x8000000000000000
+        /*== poly_coeff[5] ==*/
+        .align 16
+        .quad 0x3fd2776E996DA1D2, 0x3fd2776E996DA1D2 /* coeff5 */
+        .quad 0xbfd715494C3E7C9B, 0xbfd715494C3E7C9B /* coeff4 */
+        .quad 0x3fdEC709DC39E926, 0x3fdEC709DC39E926 /* coeff3 */
+        .quad 0xbfe71547652B7CF8, 0xbfe71547652B7CF8 /* coeff2 */
+        .quad 0x3ff71547652B82FE, 0x3ff71547652B82FE /* coeff1 */
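+        /* These are (close to) the leading terms of the expansion
+           log2(1+r) = r/ln2 - r^2/(2*ln2) + r^3/(3*ln2) - ...;
+           coeff1 is 0x3ff71547652B82FE, the double nearest to
+           log2(e) = 1/ln2.  */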
+        /*== ExpMask ==*/
+        .align 16
+        .quad 0x000fffffffffffff, 0x000fffffffffffff
+        /*== Two10 ==*/
+        .align 16
+        .quad 0x3f50000000000000, 0x3f50000000000000
+        /*== MinNorm ==*/
+        .align 16
+        .quad 0x0010000000000000, 0x0010000000000000
+        /*== MaxNorm ==*/
+        .align 16
+        .quad 0x7fefffffffffffff, 0x7fefffffffffffff
+        /*== HalfMask ==*/
+        .align 16
+        .quad 0xfffffffffc000000, 0xfffffffffc000000
+        /*== One ==*/
+        .align 16
+        .quad 0x3ff0000000000000, 0x3ff0000000000000
+        /*== Threshold ==*/
+        .align 16
+        .quad 0x4086a00000000000, 0x4086a00000000000
+        /*== Bias ==*/
+        .align 16
+        .quad 0x408ff80000000000, 0x408ff80000000000
+        /*== Bias1 ==*/
+        .align 16
+        .quad 0x408ff00000000000, 0x408ff00000000000
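+        /* In decimal: Threshold is 724.0 (about 512*sqrt(2)), Bias is
+           1023.0 and Bias1 is 1022.0; the Threshold compare selects which
+           of the two gets subtracted from the biased exponent.  */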
+        .align 16
+        .type	__svml_dlog2_data_internal,@object
+        .size	__svml_dlog2_data_internal,.-__svml_dlog2_data_internal
+        .space 80, 0x00
+        .align 16
+
+.FLT_11:
+        .long	0x00000000,0x43380000,0x00000000,0x43380000
+        .type	.FLT_11,@object
+        .size	.FLT_11,16
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log24_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log24_core-sse.S
new file mode 100644
index 0000000000..882ee276f2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log24_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized log2, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN4v_log2 _ZGVdN4v_log2_sse_wrapper
+#include "../svml_d_log24_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log24_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log24_core.c
new file mode 100644
index 0000000000..7678090d11
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log24_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized log2, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN4v_log2
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN4v_log2, __GI__ZGVdN4v_log2, __redirect__ZGVdN4v_log2)
+  __attribute__ ((visibility ("hidden")));
+#endif
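+
+/* Usage sketch (illustrative only; the function name and the exact
+   compiler flags are examples, not requirements): a loop such as
+
+     #include <math.h>
+     void vlog2 (const double *x, double *y, int n)
+     {
+       for (int i = 0; i < n; i++)
+         y[i] = log2 (x[i]);
+     }
+
+   built with something like -O2 -ftree-loop-vectorize -ffast-math
+   -march=haswell and linked against libmvec may be auto-vectorized into
+   calls to _ZGVdN4v_log2, which the ifunc above resolves to the AVX2
+   implementation when the CPU supports it.  */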
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log24_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log24_core_avx2.S
new file mode 100644
index 0000000000..b4ead42eae
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log24_core_avx2.S
@@ -0,0 +1,1324 @@
+/* Function log2 vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
+ *    R = Rcp*x - 1.0
+ *    log2(x) = k - log2(Rcp) + poly_approximation(R)
+ *       log2(Rcp) is tabulated
+ *
+ */
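+/* A scalar C sketch of the same scheme (for illustration only: it skips
+   the special-input handling and uses a crude 9-bit reciprocal plus a
+   first-order term where the code below uses a lookup table and a
+   degree-5 polynomial):
+
+     #include <math.h>
+
+     double log2_sketch (double x)      // x assumed positive and normal
+     {
+       int k;
+       double m = 2.0 * frexp (x, &k);             // x = m * 2^(k-1), m in [1,2)
+       double rcp = nearbyint (512.0 / m) / 512.0; // reciprocal, ~9 mantissa bits
+       double r = rcp * m - 1.0;                   // reduced argument, |r| < 2^-8
+       // log2(x) = (k-1) - log2(rcp) + log2(1+r); -log2(rcp) is tabulated
+       // in the real code, and log2(1+r) ~= r/ln2 to first order.
+       return (k - 1) - log2 (rcp) + r * 1.4426950408889634;
+     }
+ */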
+
+/* Offsets for data table __svml_dlog2_data_internal
+ */
+#define Log_HA_table                  	0
+#define Log_LA_table                  	8224
+#define poly_coeff                    	12352
+#define ExpMask                       	12512
+#define Two10                         	12544
+#define MinNorm                       	12576
+#define MaxNorm                       	12608
+#define HalfMask                      	12640
+#define One                           	12672
+#define Threshold                     	12704
+#define Bias                          	12736
+#define Bias1                         	12768
+
+/* Lookup bias for data table __svml_dlog2_data_internal.  */
+#define Table_Lookup_Bias               -0x405fe0
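+/* The rounded reciprocal lies in [2^9, 2^10], so shifting its bits right
+   by 40 gives 0x408000 + 8*index (0x409000 for exactly 2^10).  The bias
+   cancels the constant part: -0x405fe0 + 0x408000 = 0x2020 = 8224 =
+   Log_LA_table, so (%r8, index) addresses the table entry directly, with
+   no extra displacement.  */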
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN4v_log2_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+        lea       Table_Lookup_Bias+__svml_dlog2_data_internal(%rip), %r8
+        vmovapd   %ymm0, %ymm3
+
+/* preserve mantissa, set input exponent to 2^(-10) */
+        vandpd    ExpMask+__svml_dlog2_data_internal(%rip), %ymm3, %ymm4
+        vorpd     Two10+__svml_dlog2_data_internal(%rip), %ymm4, %ymm2
+
+/* reciprocal approximation good to at least 11 bits */
+        vcvtpd2ps %ymm2, %xmm5
+
+/* exponent bits */
+        vpsrlq    $20, %ymm3, %ymm7
+        vmovupd   One+__svml_dlog2_data_internal(%rip), %ymm14
+        vrcpps    %xmm5, %xmm6
+
+/* check range */
+        vcmplt_oqpd MinNorm+__svml_dlog2_data_internal(%rip), %ymm3, %ymm11
+        vcmpnle_uqpd MaxNorm+__svml_dlog2_data_internal(%rip), %ymm3, %ymm12
+        vcvtps2pd %xmm6, %ymm9
+
+/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
+        vroundpd  $0, %ymm9, %ymm1
+
+/* exponent */
+        vmovupd   Threshold+__svml_dlog2_data_internal(%rip), %ymm9
+
+/*
+ * prepare table index
+ * table lookup
+ */
+        vpsrlq    $40, %ymm1, %ymm15
+
+/* argument reduction */
+        vfmsub213pd %ymm14, %ymm1, %ymm2
+
+/* polynomial */
+        vmovupd   poly_coeff+__svml_dlog2_data_internal(%rip), %ymm14
+        vcmplt_oqpd %ymm1, %ymm9, %ymm1
+        vfmadd213pd poly_coeff+32+__svml_dlog2_data_internal(%rip), %ymm2, %ymm14
+        vorpd     %ymm12, %ymm11, %ymm13
+        vmulpd    %ymm2, %ymm2, %ymm12
+
+/* combine and get argument value range mask */
+        vmovmskpd %ymm13, %eax
+        vextractf128 $1, %ymm7, %xmm8
+        vshufps   $221, %xmm8, %xmm7, %xmm10
+
+/* biased exponent in DP format */
+        vcvtdq2pd %xmm10, %ymm0
+        vandpd    Bias+__svml_dlog2_data_internal(%rip), %ymm1, %ymm10
+        vorpd     Bias1+__svml_dlog2_data_internal(%rip), %ymm10, %ymm11
+        vsubpd    %ymm11, %ymm0, %ymm1
+        vmovupd   poly_coeff+64+__svml_dlog2_data_internal(%rip), %ymm0
+        vfmadd213pd poly_coeff+96+__svml_dlog2_data_internal(%rip), %ymm2, %ymm0
+        vmulpd    poly_coeff+128+__svml_dlog2_data_internal(%rip), %ymm2, %ymm2
+        vfmadd213pd %ymm0, %ymm12, %ymm14
+        vfmadd213pd %ymm2, %ymm12, %ymm14
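+/* ymm14 now holds p(r) = ((c5*r + c4)*r^2 + (c3*r + c2))*r^2 + c1*r.  */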
+        vextractf128 $1, %ymm15, %xmm6
+        vmovd     %xmm15, %edx
+        vmovd     %xmm6, %esi
+        movslq    %edx, %rdx
+        vpextrd   $2, %xmm15, %ecx
+        movslq    %esi, %rsi
+        vpextrd   $2, %xmm6, %edi
+        movslq    %ecx, %rcx
+        movslq    %edi, %rdi
+        vmovsd    (%r8,%rdx), %xmm4
+        vmovsd    (%r8,%rsi), %xmm7
+        vmovhpd   (%r8,%rcx), %xmm4, %xmm5
+        vmovhpd   (%r8,%rdi), %xmm7, %xmm8
+        vinsertf128 $1, %xmm8, %ymm5, %ymm13
+
+/* reconstruction */
+        vaddpd    %ymm14, %ymm13, %ymm0
+        vaddpd    %ymm0, %ymm1, %ymm0
+        testl     %eax, %eax
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm3
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovupd   %ymm3, 32(%rsp)
+        vmovupd   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0
+
+        xorl      %edx, %edx
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovupd   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      log2@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 64(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN4v_log2_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_dlog2_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(32)) VUINT32 Log_HA_table[(1<<10)+2][2];
+        __declspec(align(32)) VUINT32 Log_LA_table[(1<<9)+1][2];
+        __declspec(align(32)) VUINT32 poly_coeff[5][4][2];
+        __declspec(align(32)) VUINT32 ExpMask[4][2];
+        __declspec(align(32)) VUINT32 Two10[4][2];
+        __declspec(align(32)) VUINT32 MinNorm[4][2];
+        __declspec(align(32)) VUINT32 MaxNorm[4][2];
+        __declspec(align(32)) VUINT32 HalfMask[4][2];
+        __declspec(align(32)) VUINT32 One[4][2];
+        __declspec(align(32)) VUINT32 Threshold[4][2];
+        __declspec(align(32)) VUINT32 Bias[4][2];
+        __declspec(align(32)) VUINT32 Bias1[4][2];
+} __svml_dlog2_data_internal;
+#endif
+__svml_dlog2_data_internal:
+        /*== Log_HA_table ==*/
+        .quad 0xc08ff00000000000, 0x0000000000000000
+        .quad 0xc08ff0040038c920, 0x3d52bfc81744e999
+        .quad 0xc08ff007ff0f0190, 0xbd59b2cedc63c895
+        .quad 0xc08ff00bfc839e88, 0xbd28e365e6741d71
+        .quad 0xc08ff00ff8979428, 0x3d4027998f69a77d
+        .quad 0xc08ff013f34bd5a0, 0x3d5dd2cb33fe6a89
+        .quad 0xc08ff017eca15518, 0xbd526514cdf2c019
+        .quad 0xc08ff01be49903d8, 0xbd44bfeeba165e04
+        .quad 0xc08ff01fdb33d218, 0xbd3fa79ee110cec3
+        .quad 0xc08ff023d072af20, 0xbd4eebb642c7fd60
+        .quad 0xc08ff027c4568948, 0x3d429b13d7093443
+        .quad 0xc08ff02bb6e04de8, 0x3d50f346bd36551e
+        .quad 0xc08ff02fa810e968, 0xbd5020bb662f1536
+        .quad 0xc08ff03397e94750, 0x3d5de76b56340995
+        .quad 0xc08ff037866a5218, 0x3d58065ff3304090
+        .quad 0xc08ff03b7394f360, 0x3d561fc9322fb785
+        .quad 0xc08ff03f5f6a13d0, 0x3d0abecd17d0d778
+        .quad 0xc08ff04349ea9b28, 0xbd588f3ad0ce4d44
+        .quad 0xc08ff04733177040, 0xbd4454ba4ac5f44d
+        .quad 0xc08ff04b1af178f8, 0xbd556f78faaa0887
+        .quad 0xc08ff04f01799a58, 0x3d49db8976de7469
+        .quad 0xc08ff052e6b0b868, 0xbd5cdb6fce17ef00
+        .quad 0xc08ff056ca97b668, 0xbd576de8c0412f09
+        .quad 0xc08ff05aad2f76a0, 0x3d30142c7ec6475c
+        .quad 0xc08ff05e8e78da70, 0xbd1e685afc26de72
+        .quad 0xc08ff0626e74c260, 0xbd40b64c954078a3
+        .quad 0xc08ff0664d240e10, 0xbd5fcde393462d7d
+        .quad 0xc08ff06a2a879c48, 0xbd537245eeeecc53
+        .quad 0xc08ff06e06a04ae8, 0x3d4ac306eb47b436
+        .quad 0xc08ff071e16ef6e8, 0xbd5a1fd9d3758f6b
+        .quad 0xc08ff075baf47c80, 0x3d2401fbaaa67e3c
+        .quad 0xc08ff0799331b6f0, 0x3d4f8dbef47a4d53
+        .quad 0xc08ff07d6a2780a8, 0x3d51215e0abb42d1
+        .quad 0xc08ff0813fd6b340, 0x3d57ce6249eddb35
+        .quad 0xc08ff08514402770, 0xbd38a803c7083a25
+        .quad 0xc08ff088e764b528, 0x3d42218beba5073e
+        .quad 0xc08ff08cb9453370, 0x3d447b66f1c6248f
+        .quad 0xc08ff09089e27880, 0xbd53d9297847e995
+        .quad 0xc08ff094593d59c8, 0xbd12b6979cc77aa9
+        .quad 0xc08ff0982756abd0, 0xbd55308545ecd702
+        .quad 0xc08ff09bf42f4260, 0xbd578fa97c3b936f
+        .quad 0xc08ff09fbfc7f068, 0xbd41828408ce869d
+        .quad 0xc08ff0a38a218808, 0x3d555da6ce7251a6
+        .quad 0xc08ff0a7533cda88, 0xbd41f3cd14bfcb02
+        .quad 0xc08ff0ab1b1ab878, 0xbd1f028da6bf1852
+        .quad 0xc08ff0aee1bbf188, 0xbd4cf04de3267f54
+        .quad 0xc08ff0b2a72154a8, 0xbd4556e47019db10
+        .quad 0xc08ff0b66b4baff8, 0x3d1e7ba00b15fbe4
+        .quad 0xc08ff0ba2e3bd0d0, 0x3d5bfde1c52c2f28
+        .quad 0xc08ff0bdeff283b8, 0x3d48d63fe20ee5d6
+        .quad 0xc08ff0c1b0709480, 0x3d57f551980838ff
+        .quad 0xc08ff0c56fb6ce20, 0xbd4189091f293c81
+        .quad 0xc08ff0c92dc5fae0, 0x3d4d549f05f06169
+        .quad 0xc08ff0ccea9ee428, 0xbd5982466074e1e3
+        .quad 0xc08ff0d0a64252b8, 0xbd5d30a6b16c0e4b
+        .quad 0xc08ff0d460b10e80, 0xbd3138bf3b51a201
+        .quad 0xc08ff0d819ebdea8, 0xbd454e680c0801d6
+        .quad 0xc08ff0dbd1f389a8, 0x3d584db361385926
+        .quad 0xc08ff0df88c8d520, 0xbd564f2252a82c03
+        .quad 0xc08ff0e33e6c8610, 0xbd5c78c35ed5d034
+        .quad 0xc08ff0e6f2df60a8, 0xbd52eb9f29ca3d75
+        .quad 0xc08ff0eaa6222860, 0x3d5340c0c01b5ff8
+        .quad 0xc08ff0ee58359fe8, 0x3d10c2acaffa64b6
+        .quad 0xc08ff0f2091a8948, 0xbd3fced311301ebe
+        .quad 0xc08ff0f5b8d1a5c8, 0x3d41ee5d591af30b
+        .quad 0xc08ff0f9675bb5f0, 0x3d4873546b0e668c
+        .quad 0xc08ff0fd14b97998, 0x3d5a99928177a119
+        .quad 0xc08ff100c0ebafd8, 0x3d378ead132adcac
+        .quad 0xc08ff1046bf31720, 0x3d51a538bc597d48
+        .quad 0xc08ff10815d06d18, 0xbd540ee2f35efd7e
+        .quad 0xc08ff10bbe846ec8, 0xbd59cf94753adacc
+        .quad 0xc08ff10f660fd878, 0xbd5201a3d6862895
+        .quad 0xc08ff1130c7365c0, 0x3d383e25d0822d03
+        .quad 0xc08ff116b1afd180, 0xbd0b7389bbea8f7b
+        .quad 0xc08ff11a55c5d5f0, 0xbd4df278087a6617
+        .quad 0xc08ff11df8b62c98, 0xbd48daeb8ec01e26
+        .quad 0xc08ff1219a818e50, 0x3d57c9312e0a14da
+        .quad 0xc08ff1253b28b330, 0xbd5f0fbc0e4d507e
+        .quad 0xc08ff128daac52c8, 0xbd222afdee008687
+        .quad 0xc08ff12c790d23d8, 0x3d17c71747bcef8b
+        .quad 0xc08ff130164bdc88, 0x3d5d69cfd051af50
+        .quad 0xc08ff133b2693248, 0x3d59dff064e9433a
+        .quad 0xc08ff1374d65d9e8, 0x3d4f71a30db3240b
+        .quad 0xc08ff13ae7428788, 0xbd5e56afa9524606
+        .quad 0xc08ff13e7fffeeb0, 0xbd44acd84e6f8518
+        .quad 0xc08ff142179ec228, 0xbd519845ade5e121
+        .quad 0xc08ff145ae1fb420, 0xbd5b3b4a38ddec70
+        .quad 0xc08ff14943837620, 0xbd5ea4bb5bc137c7
+        .quad 0xc08ff14cd7cab910, 0x3d5610f3bf8eb6ce
+        .quad 0xc08ff1506af62d20, 0x3d57b1170d6184cf
+        .quad 0xc08ff153fd0681f0, 0x3d5791a688a3660e
+        .quad 0xc08ff1578dfc6678, 0x3d5d41ecf8abac2e
+        .quad 0xc08ff15b1dd88908, 0x3cf0bd995d64d573
+        .quad 0xc08ff15eac9b9758, 0xbd5e3653cd796d01
+        .quad 0xc08ff1623a463e80, 0xbd597573005ef2d8
+        .quad 0xc08ff165c6d92af0, 0xbd4ee222d6439c41
+        .quad 0xc08ff16952550880, 0x3d5913b845e75950
+        .quad 0xc08ff16cdcba8258, 0xbd558e7ba239077e
+        .quad 0xc08ff170660a4328, 0x3d5a0e174a2cae66
+        .quad 0xc08ff173ee44f4d8, 0x3d22b8db103db712
+        .quad 0xc08ff177756b40d8, 0x3d5cc610480853c4
+        .quad 0xc08ff17afb7dcfe0, 0xbd304a8bc84e5c0f
+        .quad 0xc08ff17e807d4a28, 0x3d3639d185da5f7d
+        .quad 0xc08ff182046a5738, 0xbd534705d06d788f
+        .quad 0xc08ff18587459e10, 0xbd540d25b28a51fd
+        .quad 0xc08ff189090fc510, 0xbd02d804afa7080a
+        .quad 0xc08ff18c89c97200, 0x3d5f2a5d305818ba
+        .quad 0xc08ff19009734a08, 0xbd3a602e9d05c3e4
+        .quad 0xc08ff193880df1d0, 0xbd533d6fdcd54875
+        .quad 0xc08ff197059a0d60, 0x3d24eaf0a9490202
+        .quad 0xc08ff19a82184020, 0xbd5685666d98eb59
+        .quad 0xc08ff19dfd892cf8, 0xbd509f8745f0868b
+        .quad 0xc08ff1a177ed7630, 0xbd2dcba340a9d268
+        .quad 0xc08ff1a4f145bd80, 0x3d4916fcd0331266
+        .quad 0xc08ff1a86992a408, 0xbd548cd033a49073
+        .quad 0xc08ff1abe0d4ca68, 0xbd5252f40e5df1a2
+        .quad 0xc08ff1af570cd0a0, 0xbd541d623bd02248
+        .quad 0xc08ff1b2cc3b5628, 0xbd258dc48235c071
+        .quad 0xc08ff1b64060f9e0, 0xbd4b4bd8f02ed3f2
+        .quad 0xc08ff1b9b37e5a28, 0x3d4e8d20a88cd0a2
+        .quad 0xc08ff1bd259414c0, 0x3d3b669b6380bc55
+        .quad 0xc08ff1c096a2c6e8, 0xbd45d54159d51094
+        .quad 0xc08ff1c406ab0d58, 0x3d59f684ffbca44d
+        .quad 0xc08ff1c775ad8428, 0x3d543b1b1d508399
+        .quad 0xc08ff1cae3aac6f8, 0x3d5c30953a12fc6e
+        .quad 0xc08ff1ce50a370d0, 0xbd1763b04f9aad5f
+        .quad 0xc08ff1d1bc981c40, 0x3d573c6fa54f46c2
+        .quad 0xc08ff1d527896338, 0x3d48ccfb9ffd7455
+        .quad 0xc08ff1d89177df30, 0x3d42756f80d6f7ce
+        .quad 0xc08ff1dbfa642910, 0xbd3c2bfbc353c5a5
+        .quad 0xc08ff1df624ed940, 0x3d1d6064f5dc380b
+        .quad 0xc08ff1e2c9388798, 0x3ce327c6b30711cf
+        .quad 0xc08ff1e62f21cb70, 0x3d140aa9546525bc
+        .quad 0xc08ff1e9940b3b98, 0xbd15c1ff43c21863
+        .quad 0xc08ff1ecf7f56e60, 0x3d590ba680120498
+        .quad 0xc08ff1f05ae0f988, 0x3d5390c6b62dff50
+        .quad 0xc08ff1f3bcce7258, 0x3d4da0c90878457f
+        .quad 0xc08ff1f71dbe6d90, 0x3d30697edc85b98c
+        .quad 0xc08ff1fa7db17f70, 0x3d04d81188510a79
+        .quad 0xc08ff1fddca83bb0, 0xbd5f2ddc983ce25c
+        .quad 0xc08ff2013aa33598, 0x3d46c22f0fae6844
+        .quad 0xc08ff20497a2ffd0, 0xbd53359b714c3d03
+        .quad 0xc08ff207f3a82ca0, 0xbd4aefaa5524f88b
+        .quad 0xc08ff20b4eb34dc0, 0x3d39bf4a4a73d01d
+        .quad 0xc08ff20ea8c4f468, 0x3d44217befdb12e6
+        .quad 0xc08ff21201ddb158, 0x3d5219b281d4b6f8
+        .quad 0xc08ff21559fe14c8, 0xbd5e3b123373d370
+        .quad 0xc08ff218b126ae88, 0xbd59b525a6edc3cb
+        .quad 0xc08ff21c07580dd8, 0xbd4b494e7737c4dc
+        .quad 0xc08ff21f5c92c180, 0xbd3989b7d67e3e54
+        .quad 0xc08ff222b0d757d0, 0x3d486c8f098ad3cf
+        .quad 0xc08ff22604265e98, 0x3d5254956d8e15b2
+        .quad 0xc08ff22956806330, 0x3d3f14730a362959
+        .quad 0xc08ff22ca7e5f278, 0xbd40e8ed02e32ea1
+        .quad 0xc08ff22ff85798d8, 0xbd40fb2b9b1e0261
+        .quad 0xc08ff23347d5e238, 0xbd5bfeb1e13c8bc3
+        .quad 0xc08ff23696615a18, 0x3d5b891f041e037b
+        .quad 0xc08ff239e3fa8b60, 0xbd36255027582bb9
+        .quad 0xc08ff23d30a200a8, 0x3d56bb5a92a55361
+        .quad 0xc08ff2407c5843f0, 0xbd31902fb4417244
+        .quad 0xc08ff243c71dded8, 0xbd5a8a7c3c4a2cc6
+        .quad 0xc08ff24710f35a88, 0xbd23be1be6941016
+        .quad 0xc08ff24a59d93fa8, 0x3d55c85afafa1d46
+        .quad 0xc08ff24da1d01668, 0xbd5b4b05a0adcbf1
+        .quad 0xc08ff250e8d866a0, 0x3d134d191476f74b
+        .quad 0xc08ff2542ef2b798, 0x3d5e78ce963395e1
+        .quad 0xc08ff257741f9028, 0x3d3f9219a8f57c17
+        .quad 0xc08ff25ab85f76c8, 0x3d5cfc6f47ac691b
+        .quad 0xc08ff25dfbb2f168, 0x3d4ab3b720b5ca71
+        .quad 0xc08ff2613e1a8598, 0x3d54a4ab99feb71a
+        .quad 0xc08ff2647f96b868, 0xbd42daa69d79d724
+        .quad 0xc08ff267c0280e88, 0xbd344d9115018f45
+        .quad 0xc08ff26affcf0c28, 0xbd56673e143d2ac0
+        .quad 0xc08ff26e3e8c3518, 0x3d3aac889e91c638
+        .quad 0xc08ff2717c600ca8, 0x3d4cf65b41d006e7
+        .quad 0xc08ff274b94b15c0, 0xbd4c821320391e76
+        .quad 0xc08ff277f54dd2e8, 0x3d51abd6e2ddc2a1
+        .quad 0xc08ff27b3068c620, 0xbd2f1bdd1264e703
+        .quad 0xc08ff27e6a9c7110, 0xbd58437b4f032f15
+        .quad 0xc08ff281a3e954f0, 0xbd4f8e063b069a7d
+        .quad 0xc08ff284dc4ff288, 0x3d5276d0723a662a
+        .quad 0xc08ff28813d0ca28, 0xbd5731f7c6d8f6eb
+        .quad 0xc08ff28b4a6c5bd0, 0xbd58b587f08307ec
+        .quad 0xc08ff28e80232708, 0x3d57f19a7a352baf
+        .quad 0xc08ff291b4f5aae0, 0x3d570d99aff32790
+        .quad 0xc08ff294e8e46610, 0x3d4efafaad4f59db
+        .quad 0xc08ff2981befd6e0, 0xbd41eb1728371564
+        .quad 0xc08ff29b4e187b38, 0x3d458465b4e080d7
+        .quad 0xc08ff29e7f5ed088, 0x3d46acb4a035a820
+        .quad 0xc08ff2a1afc353e0, 0xbd39fc68238dd5d3
+        .quad 0xc08ff2a4df4681f0, 0x3d526d90c6750dde
+        .quad 0xc08ff2a80de8d6f0, 0x3d48505c598278fd
+        .quad 0xc08ff2ab3baacec0, 0x3d520fece8e148e8
+        .quad 0xc08ff2ae688ce4d0, 0x3d14f7bf38646243
+        .quad 0xc08ff2b1948f9430, 0xbd5aa5f693a627df
+        .quad 0xc08ff2b4bfb35790, 0xbd4725d8e6280861
+        .quad 0xc08ff2b7e9f8a930, 0x3d482e0765d44bda
+        .quad 0xc08ff2bb136002e8, 0xbd523d745da75cde
+        .quad 0xc08ff2be3be9de40, 0xbd32e50b4191ef73
+        .quad 0xc08ff2c16396b448, 0xbd490856dfe073b2
+        .quad 0xc08ff2c48a66fdb8, 0xbd512b526137db4d
+        .quad 0xc08ff2c7b05b32e8, 0x3d5bfcdc71b36585
+        .quad 0xc08ff2cad573cbb8, 0xbd2c24f2afddb377
+        .quad 0xc08ff2cdf9b13fc0, 0xbd5ea60d06da12f6
+        .quad 0xc08ff2d11d140630, 0xbd582f2f9e256dc5
+        .quad 0xc08ff2d43f9c95d0, 0xbd4411c269523864
+        .quad 0xc08ff2d7614b6508, 0xbd41107eeb7e1093
+        .quad 0xc08ff2da8220e9e8, 0x3d5a4aa491710eda
+        .quad 0xc08ff2dda21d9a10, 0x3d46e50a14550378
+        .quad 0xc08ff2e0c141ead0, 0xbd4881e3bd846de9
+        .quad 0xc08ff2e3df8e5118, 0xbd46d93437bd399d
+        .quad 0xc08ff2e6fd034170, 0xbd5b4ef1e9713a4c
+        .quad 0xc08ff2ea19a13010, 0x3d4a0e31ed25b3ef
+        .quad 0xc08ff2ed356890b8, 0xbd5a7a560db90113
+        .quad 0xc08ff2f05059d6f0, 0x3d51f5bb5f9072c9
+        .quad 0xc08ff2f36a7575c0, 0x3d5ed5225350a585
+        .quad 0xc08ff2f683bbdfe0, 0xbd1c9363d9e745db
+        .quad 0xc08ff2f99c2d87b8, 0x3d329c788e376e0d
+        .quad 0xc08ff2fcb3cadf40, 0xbd59eb5d29918de0
+        .quad 0xc08ff2ffca945828, 0xbd4a86aac097a06b
+        .quad 0xc08ff302e08a63b8, 0x3d541c2c97e8b4d1
+        .quad 0xc08ff305f5ad72d8, 0x3d43c95dec31821b
+        .quad 0xc08ff30909fdf620, 0xbd590abed3d72738
+        .quad 0xc08ff30c1d7c5dd8, 0x3d4caefdad90e913
+        .quad 0xc08ff30f302919d0, 0xbd4f7ed5e1dcb170
+        .quad 0xc08ff312420499a0, 0x3d3c590edf8c3407
+        .quad 0xc08ff315530f4c70, 0x3d5477d46ce838e1
+        .quad 0xc08ff3186349a118, 0x3d5e4b00c511fa78
+        .quad 0xc08ff31b72b40610, 0xbd54333e5a0c1658
+        .quad 0xc08ff31e814ee990, 0x3d25300b88bfa10a
+        .quad 0xc08ff3218f1ab958, 0xbd5bfbd520249ed7
+        .quad 0xc08ff3249c17e2f0, 0x3d436b1cdba645b7
+        .quad 0xc08ff327a846d368, 0xbd5cb667c2f86eaa
+        .quad 0xc08ff32ab3a7f7a0, 0x3d5334d06a920d5f
+        .quad 0xc08ff32dbe3bbbf8, 0xbd5407602ab64243
+        .quad 0xc08ff330c8028ca0, 0xbd52b12c9cc82316
+        .quad 0xc08ff333d0fcd560, 0x3d158d7dd801324b
+        .quad 0xc08ff336d92b01a8, 0xbd38b55deae69564
+        .quad 0xc08ff339e08d7ca0, 0x3d4a92d51dc43d43
+        .quad 0xc08ff33ce724b110, 0x3d5455afbb5de008
+        .quad 0xc08ff33fecf10970, 0x3d3b65694b6f87fb
+        .quad 0xc08ff342f1f2efe8, 0xbd3afb8ccc1260eb
+        .quad 0xc08ff345f62ace50, 0x3d59c98f7ec71b79
+        .quad 0xc08ff348f9990e18, 0xbd5238294ff3846d
+        .quad 0xc08ff34bfc3e1880, 0x3d4deba7087bbf7b
+        .quad 0xc08ff34efe1a5650, 0xbd573e25d2d308e5
+        .quad 0xc08ff351ff2e3020, 0xbd44bc302ffa76fb
+        .quad 0xc08ff354ff7a0e20, 0xbd2cad65891df000
+        .quad 0xc08ff357fefe5838, 0x3d4b4fe326c05a8a
+        .quad 0xc08ff35afdbb75f8, 0x3d0fb5680f67649b
+        .quad 0xc08ff35dfbb1cea8, 0xbd4af509a9977e57
+        .quad 0xc08ff360f8e1c940, 0x3cea69221cfb0ad6
+        .quad 0xc08ff363f54bcc60, 0x3d3d116c159fead5
+        .quad 0xc08ff366f0f03e58, 0xbd5e64e8bff70d5e
+        .quad 0xc08ff369ebcf8538, 0xbd5cc32ce5effb96
+        .quad 0xc08ff36ce5ea06b8, 0x3d57bbe811e4fbda
+        .quad 0xc08ff36fdf402830, 0xbcf46d4595033678
+        .quad 0xc08ff372d7d24ec8, 0x3d4c4bbec857b9fc
+        .quad 0xc08ff375cfa0df40, 0xbd59d3f339613a2d
+        .quad 0xc08ff378c6ac3e28, 0x3d58408e1bcb4e24
+        .quad 0xc08ff37bbcf4cfa0, 0x3d5fdb793dc8e643
+        .quad 0xc08ff37eb27af788, 0xbd5f0d884b401f1e
+        .quad 0xc08ff381a73f1988, 0xbd5a7ed37e2c50b4
+        .quad 0xc08ff3849b4198e8, 0x3d5b14c1f630b2af
+        .quad 0xc08ff3878e82d898, 0x3d505a9abef02aff
+        .quad 0xc08ff38a81033b50, 0xbd4a9bbd51a7d1c4
+        .quad 0xc08ff38d72c32380, 0x3d4783623464f80e
+        .quad 0xc08ff39063c2f338, 0xbd0e2d78f68abcc7
+        .quad 0xc08ff39354030c50, 0x3d3e604763e782cb
+        .quad 0xc08ff3964383d048, 0xbd4514f0840b6f59
+        .quad 0xc08ff3993245a060, 0xbd5488753d6035a4
+        .quad 0xc08ff39c2048dd90, 0x3d5ccc099b5ff97d
+        .quad 0xc08ff39f0d8de870, 0x3d454ada83325c69
+        .quad 0xc08ff3a1fa152168, 0x3d1e4b27fb754eb1
+        .quad 0xc08ff3a4e5dee890, 0x3d58c67819ead583
+        .quad 0xc08ff3a7d0eb9da8, 0xbd536d02e85d644b
+        .quad 0xc08ff3aabb3ba048, 0x3d5f510ab9e7c184
+        .quad 0xc08ff3ada4cf4f98, 0x3d557bc5b296d5f5
+        .quad 0xc08ff3b08da70a90, 0xbd48893b8f7f52c9
+        .quad 0xc08ff3b375c32fe8, 0x3d5ca0b69a37d601
+        .quad 0xc08ff3b65d241df0, 0xbd519c57fff86872
+        .quad 0xc08ff3b943ca32d8, 0x3d048da0e3a8c3c3
+        .quad 0xc08ff3bc29b5cc68, 0xbd5dd05e06ec07d0
+        .quad 0xc08ff3bf0ee74840, 0x3d56c52a5c8015db
+        .quad 0xc08ff3c1f35f0398, 0x3d54e1dba9930bed
+        .quad 0xc08ff3c4d71d5b78, 0x3d2c5f679a7932b7
+        .quad 0xc08ff3c7ba22aca0, 0xbd3f77628aa1aed8
+        .quad 0xc08ff3cd7e03ac60, 0xbd5cc8a22f1d8591
+        .quad 0xc08ff3d33f04e360, 0x3d4ae09463e13f6f
+        .quad 0xc08ff3d8fd292dc8, 0x3d42736efbec3922
+        .quad 0xc08ff3deb8736390, 0xbce0324f8d149b09
+        .quad 0xc08ff3e470e65870, 0xbd52089e4b8dd900
+        .quad 0xc08ff3ea2684dbf0, 0xbd5f8e9d5dea127f
+        .quad 0xc08ff3efd951b970, 0xbd4b60d79db026b1
+        .quad 0xc08ff3f5894fb828, 0x3d45ff1d6cea2c52
+        .quad 0xc08ff3fb36819b38, 0x3d5d56022cd7f5b2
+        .quad 0xc08ff400e0ea21a8, 0xbd58d63f09907b27
+        .quad 0xc08ff406888c0690, 0xbd4ce6ea362f7ce0
+        .quad 0xc08ff40c2d6a00f0, 0x3d519fc9ad2ef3ab
+        .quad 0xc08ff411cf86c3c8, 0xbd55fc89e7b55f20
+        .quad 0xc08ff4176ee4fe40, 0xbd53229ca791d9be
+        .quad 0xc08ff41d0b875b88, 0x3d5e7733e6fb23d1
+        .quad 0xc08ff422a57082e0, 0x3d5871413696b637
+        .quad 0xc08ff4283ca317c0, 0x3d4b118aa7f493b9
+        .quad 0xc08ff42dd121b9c8, 0x3d4bdf3692763b50
+        .quad 0xc08ff43362ef04c8, 0x3d4867e17476dd63
+        .quad 0xc08ff438f20d90c8, 0xbd5d49b741c778f3
+        .quad 0xc08ff43e7e7ff228, 0x3d59ac35724f01e3
+        .quad 0xc08ff4440848b968, 0xbd5251ccdc49432d
+        .quad 0xc08ff4498f6a7388, 0x3d56cf153ebc9f07
+        .quad 0xc08ff44f13e7a9b8, 0x3d503b7a697a659c
+        .quad 0xc08ff45495c2e198, 0xbd5fa03da8acd872
+        .quad 0xc08ff45a14fe9d38, 0xbd5e6cfb0b5c38fc
+        .quad 0xc08ff45f919d5b08, 0x3d468b1f1269f1cf
+        .quad 0xc08ff4650ba195e0, 0xbd313a3a8f72c0f3
+        .quad 0xc08ff46a830dc528, 0x3d205d31eb8d2bd4
+        .quad 0xc08ff46ff7e45cb8, 0xbd56cb8ddf5d4a90
+        .quad 0xc08ff4756a27cd00, 0x3d272c2d46acdcbf
+        .quad 0xc08ff47ad9da82e8, 0xbd4946efab7a989d
+        .quad 0xc08ff48046fee800, 0xbd23fabe48cf933c
+        .quad 0xc08ff485b1976268, 0x3d4f03b099d80f79
+        .quad 0xc08ff48b19a654e0, 0x3d4fe0c35ab7e9b5
+        .quad 0xc08ff4907f2e1ed0, 0xbd54b4843f34fe09
+        .quad 0xc08ff495e2311c58, 0xbd5dfa6541236a64
+        .quad 0xc08ff49b42b1a648, 0x3d56fd2c8c418cbb
+        .quad 0xc08ff4a0a0b21218, 0x3d5e687ef208418a
+        .quad 0xc08ff4a5fc34b210, 0x3d4a671ce14c5521
+        .quad 0xc08ff4ab553bd540, 0x3d419d0202e3cd96
+        .quad 0xc08ff4b0abc9c780, 0x3d576b941a895781
+        .quad 0xc08ff4b5ffe0d170, 0xbd4ea96d88cd1a30
+        .quad 0xc08ff4bb518338a0, 0x3d4d6b405bd43ba6
+        .quad 0xc08ff4c0a0b33f60, 0xbcf03382150a56b7
+        .quad 0xc08ff4c5ed7324f8, 0xbd400df96beb0937
+        .quad 0xc08ff4cb37c52590, 0xbd5c161714cdebd5
+        .quad 0xc08ff4d07fab7a48, 0xbd333e8eda1a8e79
+        .quad 0xc08ff4d5c5285928, 0x3d53aba20381d59f
+        .quad 0xc08ff4db083df530, 0xbd45e9b07af4e77c
+        .quad 0xc08ff4e048ee7e70, 0xbd533cfdb78a8c41
+        .quad 0xc08ff4e5873c21f0, 0xbd5d9b87f4d283f2
+        .quad 0xc08ff4eac32909c8, 0xbd53a677deee97fa
+        .quad 0xc08ff4effcb75d18, 0xbd5afd9f5dedc208
+        .quad 0xc08ff4f533e94020, 0x3ce9dd794d20ab77
+        .quad 0xc08ff4fa68c0d428, 0xbd5eeae84ba1cbf1
+        .quad 0xc08ff4ff9b4037b0, 0xbd4f4451587282c8
+        .quad 0xc08ff504cb698648, 0xbd4a1fa15087e717
+        .quad 0xc08ff509f93ed8b0, 0xbd5f2f0042b9331a
+        .quad 0xc08ff50f24c244e0, 0xbd2c2389f8e86341
+        .quad 0xc08ff5144df5ddf0, 0xbd556fcb7b48f200
+        .quad 0xc08ff51974dbb448, 0x3d43ba060aa69038
+        .quad 0xc08ff51e9975d578, 0x3d477ef38ca20229
+        .quad 0xc08ff523bbc64c60, 0x3d49bcaf1aa4168a
+        .quad 0xc08ff528dbcf2120, 0xbd51c5609b60687e
+        .quad 0xc08ff52df9925930, 0xbd51691708d22ce7
+        .quad 0xc08ff5331511f750, 0x3d30d05c98ecb3d1
+        .quad 0xc08ff5382e4ffb90, 0xbd423adb056dd244
+        .quad 0xc08ff53d454e6368, 0xbd3663607042da50
+        .quad 0xc08ff5425a0f29a8, 0x3d42655d3c6187a6
+        .quad 0xc08ff5476c944680, 0xbd028c958ae09d20
+        .quad 0xc08ff54c7cdfaf90, 0xbd436eaf17756653
+        .quad 0xc08ff5518af357e8, 0x3d5fbbbee66f8d24
+        .quad 0xc08ff55696d12ff0, 0xbd5d93b389497880
+        .quad 0xc08ff55ba07b25b0, 0xbd43ff8ff777f337
+        .quad 0xc08ff560a7f32488, 0xbcf3568803ec82a4
+        .quad 0xc08ff565ad3b1560, 0xbd50c83eba5cc7ea
+        .quad 0xc08ff56ab054deb0, 0x3d5becc2411500b7
+        .quad 0xc08ff56fb1426458, 0xbd5dac964ffa8b83
+        .quad 0xc08ff574b00587f0, 0x3d1d82f6cc82e69f
+        .quad 0xc08ff579aca02878, 0xbd34767c0d40542c
+        .quad 0xc08ff57ea7142298, 0xbd52d28e996ed2ce
+        .quad 0xc08ff5839f635090, 0xbd432a85d337086d
+        .quad 0xc08ff588958f8a38, 0x3d512b06ec20c7fd
+        .quad 0xc08ff58d899aa500, 0xbd47e2147555e10b
+        .quad 0xc08ff5927b867410, 0xbd4d84480a1b301d
+        .quad 0xc08ff5976b54c830, 0x3d5622146f3a51bd
+        .quad 0xc08ff59c59076fc8, 0x3d46d485c5f9c392
+        .quad 0xc08ff5a144a03700, 0xbd4562714549f4fd
+        .quad 0xc08ff5a62e20e7b8, 0x3d541ab67e365a63
+        .quad 0xc08ff5ab158b4970, 0xbd5b0855668b2369
+        .quad 0xc08ff5affae12188, 0x3d27de1bc2ed4dd8
+        .quad 0xc08ff5b4de243300, 0x3d40f2592d5ed454
+        .quad 0xc08ff5b9bf563ea8, 0xbd4ee2f8ba7b3e9e
+        .quad 0xc08ff5be9e790320, 0xbd3c2214335c2164
+        .quad 0xc08ff5c37b8e3cc8, 0x3d30745623ab1fd9
+        .quad 0xc08ff5c85697a5d0, 0xbd326c8fb0ffde38
+        .quad 0xc08ff5cd2f96f640, 0xbd4c83277493b0bc
+        .quad 0xc08ff5d2068de3f8, 0x3d39bb1655e6e5ba
+        .quad 0xc08ff5d6db7e22a8, 0x3d403170b47a5559
+        .quad 0xc08ff5dbae6963e8, 0x3d5801ddf1edc325
+        .quad 0xc08ff5e07f515728, 0x3d4b2704c46fe064
+        .quad 0xc08ff5e54e37a9c8, 0x3d5a16e99ed6cd83
+        .quad 0xc08ff5ea1b1e0700, 0xbd5353a3ac18c62f
+        .quad 0xc08ff5eee6061810, 0x3d567c69c189f21a
+        .quad 0xc08ff5f3aef18400, 0xbd50dd3220e0b0f2
+        .quad 0xc08ff5f875e1eff0, 0xbd3ab64d80638db2
+        .quad 0xc08ff5fd3ad8fee0, 0x3d3ec753439035aa
+        .quad 0xc08ff601fdd851c8, 0xbd5e10415f5f5e74
+        .quad 0xc08ff606bee187b0, 0xbd55f1048b113fae
+        .quad 0xc08ff60b7df63d90, 0x3d1e94e4107406c8
+        .quad 0xc08ff6103b180e60, 0xbd4e2eb5d0c36eb5
+        .quad 0xc08ff614f6489330, 0x3d43ec5c714f709a
+        .quad 0xc08ff619af896308, 0x3d519ec459b62a08
+        .quad 0xc08ff61e66dc1300, 0xbd5b93d09dd6161d
+        .quad 0xc08ff6231c423658, 0x3d5d72b849dd56be
+        .quad 0xc08ff627cfbd5e38, 0xbd276b7e32659173
+        .quad 0xc08ff62c814f1a08, 0x3d4fd918f2e7a6b9
+        .quad 0xc08ff63130f8f730, 0x3d5609ba1dcc4c97
+        .quad 0xc08ff635debc8138, 0xbd55cab233dbd84c
+        .quad 0xc08ff63a8a9b41d8, 0xbd56778ab7aaabc9
+        .quad 0xc08ff63f3496c0e0, 0x3d5b2791da49c370
+        .quad 0xc08ff643dcb08438, 0x3d583063ef145f9c
+        .quad 0xc08ff64882ea1000, 0xbd484e9cab375fb6
+        .quad 0xc08ff64d2744e688, 0xbd5c430c95c374aa
+        .quad 0xc08ff651c9c28848, 0xbd57a16d78490bb3
+        .quad 0xc08ff6566a6473e8, 0xbd445d70374ea9ec
+        .quad 0xc08ff65b092c2648, 0x3d5c9729142b9d4b
+        .quad 0xc08ff65fa61b1a70, 0xbd4aaa179d032405
+        .quad 0xc08ff6644132c9c0, 0xbd2a3ea300d173de
+        .quad 0xc08ff668da74abc0, 0x3d57809438efb010
+        .quad 0xc08ff66d71e23630, 0xbd5e9156720951d6
+        .quad 0xc08ff672077cdd30, 0xbd5bab62e8462035
+        .quad 0xc08ff6769b461310, 0xbd05113545431443
+        .quad 0xc08ff67b2d3f4868, 0x3d5105eb0607e59b
+        .quad 0xc08ff67fbd69ec18, 0xbd5e657842b37dc0
+        .quad 0xc08ff6844bc76b68, 0x3d4ad1849705bc4c
+        .quad 0xc08ff688d85931c8, 0xbd508b6f92b6e0d6
+        .quad 0xc08ff68d6320a920, 0x3d48683cceb5fdfc
+        .quad 0xc08ff691ec1f3990, 0xbd2c25ee290acbf5
+        .quad 0xc08ff696735649a8, 0x3d58904932cd46d0
+        .quad 0xc08ff69af8c73e38, 0xbd5c964167f0bfeb
+        .quad 0xc08ff69f7c737a90, 0xbd43d66937fa06a9
+        .quad 0xc08ff6a3fe5c6040, 0xbd54bc302ffa76fb
+        .quad 0xc08ff6a87e834f50, 0x3d4609b1487f87a3
+        .quad 0xc08ff6acfce9a618, 0xbd42c0d9af0400b1
+        .quad 0xc08ff6b17990c170, 0x3d549a63973d262d
+        .quad 0xc08ff6b5f479fc80, 0xbd28cde894aa0641
+        .quad 0xc08ff6ba6da6b0f0, 0xbd5acef617609a34
+        .quad 0xc08ff6bee51836d8, 0x3d4abb9ff3cf80b8
+        .quad 0xc08ff6c35acfe4a8, 0xbd53dcfa1b7697f3
+        .quad 0xc08ff6c7cecf0f68, 0x3d5bcdf4aea18a55
+        .quad 0xc08ff6cc41170a70, 0x3d3cad29d4324038
+        .quad 0xc08ff6d0b1a927b0, 0x3d56945f9cc2a565
+        .quad 0xc08ff6d52086b780, 0x3d5d20dfc1c668a7
+        .quad 0xc08ff6d98db108b8, 0x3d37f20a9bcbbe04
+        .quad 0xc08ff6ddf92968b8, 0x3d1e0824a6e3a4d2
+        .quad 0xc08ff6e262f12358, 0xbd469f07bf6322c7
+        .quad 0xc08ff6e6cb0982f8, 0xbd5cc593afdbfaef
+        .quad 0xc08ff6eb3173d080, 0xbd5ee68d555d7122
+        .quad 0xc08ff6ef96315360, 0xbd144ee1d6a39124
+        .quad 0xc08ff6f3f9435188, 0xbd40f2cb308bcd25
+        .quad 0xc08ff6f85aab0f80, 0xbd5fd98ced08a73c
+        .quad 0xc08ff6fcba69d068, 0x3d54f2f2a1ea8606
+        .quad 0xc08ff7011880d5d0, 0xbd57818234572db7
+        .quad 0xc08ff70574f16008, 0x3d52429e823a9a83
+        .quad 0xc08ff709cfbcadd0, 0x3d5d6dc9bb81476c
+        .quad 0xc08ff70e28e3fc90, 0x3d57d189e116bcb2
+        .quad 0xc08ff71280688848, 0x3d0e18992809fd6d
+        .quad 0xc08ff716d64b8b98, 0xbd3b48ac92b8549a
+        .quad 0xc08ff71b2a8e3fb8, 0xbd4dcfa48040893b
+        .quad 0xc08ff71f7d31dc88, 0x3d58d945b8e53ef1
+        .quad 0xc08ff723ce379878, 0x3d4f80faef3e15ee
+        .quad 0xc08ff7281da0a8b0, 0x3d53edc0fd40d18f
+        .quad 0xc08ff72c6b6e40f0, 0xbd4bcac66e0be72f
+        .quad 0xc08ff730b7a193b0, 0xbd44fcf96e2ec967
+        .quad 0xc08ff735023bd208, 0x3d57e2ff34b08d86
+        .quad 0xc08ff7394b3e2bb0, 0xbd4caedfb10b98dd
+        .quad 0xc08ff73d92a9cf28, 0xbd55db1083e5ac6a
+        .quad 0xc08ff741d87fe990, 0xbd580e83e6d54ed6
+        .quad 0xc08ff7461cc1a6c0, 0x3d1688c83e1b0cba
+        .quad 0xc08ff74a5f703138, 0xbd52c398c872b701
+        .quad 0xc08ff74ea08cb240, 0xbd49aabc3683b259
+        .quad 0xc08ff752e01851d0, 0x3d5ccba8de72495b
+        .quad 0xc08ff7571e143688, 0xbd5981cf630f5793
+        .quad 0xc08ff75b5a8185e8, 0xbd4f235844e01ebd
+        .quad 0xc08ff75f95616410, 0xbd5047de7ba8ec62
+        .quad 0xc08ff763ceb4f3f0, 0x3d5fa55e004d6562
+        .quad 0xc08ff768067d5720, 0xbd49f386e521a80e
+        .quad 0xc08ff76c3cbbae20, 0x3d3693551e62fe83
+        .quad 0xc08ff77071711818, 0x3d4ba63b30b6c42c
+        .quad 0xc08ff774a49eb300, 0x3d4c26523d32f573
+        .quad 0xc08ff778d6459b98, 0x3d3b65e70806143a
+        .quad 0xc08ff77d0666ed68, 0xbd5796d9c9f2c2cb
+        .quad 0xc08ff7813503c2d0, 0x3d33267b004b912b
+        .quad 0xc08ff785621d34e8, 0x3d1d5d8a23e33341
+        .quad 0xc08ff7898db45ba8, 0x3d46c95233e60f40
+        .quad 0xc08ff78db7ca4dd0, 0x3d362865acc8f43f
+        .quad 0xc08ff791e06020f8, 0xbd10e8203e161511
+        .quad 0xc08ff7960776e988, 0xbd5cafe4f4467eaa
+        .quad 0xc08ff79a2d0fbac8, 0xbd520fddea9ea0cd
+        .quad 0xc08ff79e512ba6d0, 0x3d5c53d3778dae46
+        .quad 0xc08ff7a273cbbe80, 0xbd5f0f6f88490367
+        .quad 0xc08ff7a694f111c0, 0x3d5601aa3f55ec11
+        .quad 0xc08ff7aab49caf20, 0xbd4f1a8a2328a4c4
+        .quad 0xc08ff7aed2cfa438, 0xbd4a3d5341c07d0e
+        .quad 0xc08ff7b2ef8afd68, 0xbd5f4a1f4c525f31
+        .quad 0xc08ff7b70acfc600, 0xbd4d594d77b3d775
+        .quad 0xc08ff7bb249f0828, 0x3d2aef47e37e953b
+        .quad 0xc08ff7bf3cf9ccf0, 0x3d501803b47dfba2
+        .quad 0xc08ff7c353e11c50, 0x3d5ed5ec84e5745e
+        .quad 0xc08ff7c76955fd20, 0xbd3de249bc9e7f96
+        .quad 0xc08ff7cb7d597538, 0x3d5b5794341d1fdf
+        .quad 0xc08ff7cf8fec8938, 0xbd519dbd08276359
+        .quad 0xc08ff7d3a1103cd0, 0xbd450129b8038848
+        .quad 0xc08ff7d7b0c59288, 0x3d348f00d3bb30fd
+        .quad 0xc08ff7dbbf0d8bd8, 0xbd43529025720d8a
+        .quad 0xc08ff7dfcbe92938, 0x3d5abdaa2b1955d7
+        .quad 0xc08ff7e3d75969f8, 0xbd4e8837d4588a98
+        .quad 0xc08ff7e7e15f4c80, 0x3d57a782a6df5a1f
+        .quad 0xc08ff7ebe9fbce08, 0x3d304ba3eaa96bf1
+        .quad 0xc08ff7eff12fead8, 0xbd47aab17b868a60
+        .quad 0xc08ff7f3f6fc9e28, 0xbd5bd858693ba90a
+        .quad 0xc08ff7f7fb62e230, 0x3d26abb2c547789a
+        .quad 0xc08ff7fbfe63b010, 0xbd59d383d543b3f5
+        .quad 0xc08ff80000000000, 0x8000000000000000
+        /*== Log_LA_table ==*/
+        .align 32
+        .quad 0x0000000000000000
+        .quad 0xbf670f83ff0a7565
+        .quad 0xbf7709c46d7aac77
+        .quad 0xbf8143068125dd0e
+        .quad 0xbf86fe50b6ef0851
+        .quad 0xbf8cb6c3abd14559
+        .quad 0xbf91363117a97b0c
+        .quad 0xbf940f9786685d29
+        .quad 0xbf96e79685c2d22a
+        .quad 0xbf99be2f7749acc2
+        .quad 0xbf9c9363ba850f86
+        .quad 0xbf9f6734acf8695a
+        .quad 0xbfa11cd1d5133413
+        .quad 0xbfa2855905ca70f6
+        .quad 0xbfa3ed3094685a26
+        .quad 0xbfa554592bb8cd58
+        .quad 0xbfa6bad3758efd87
+        .quad 0xbfa820a01ac754cb
+        .quad 0xbfa985bfc3495194
+        .quad 0xbfaaea3316095f72
+        .quad 0xbfac4dfab90aab5f
+        .quad 0xbfadb1175160f3b0
+        .quad 0xbfaf1389833253a0
+        .quad 0xbfb03aa8f8dc854c
+        .quad 0xbfb0eb389fa29f9b
+        .quad 0xbfb19b74069f5f0a
+        .quad 0xbfb24b5b7e135a3d
+        .quad 0xbfb2faef55ccb372
+        .quad 0xbfb3aa2fdd27f1c3
+        .quad 0xbfb4591d6310d85a
+        .quad 0xbfb507b836033bb7
+        .quad 0xbfb5b600a40bd4f3
+        .quad 0xbfb663f6fac91316
+        .quad 0xbfb7119b876bea86
+        .quad 0xbfb7beee96b8a281
+        .quad 0xbfb86bf07507a0c7
+        .quad 0xbfb918a16e46335b
+        .quad 0xbfb9c501cdf75872
+        .quad 0xbfba7111df348494
+        .quad 0xbfbb1cd1ecae66e7
+        .quad 0xbfbbc84240adabba
+        .quad 0xbfbc73632513bd4f
+        .quad 0xbfbd1e34e35b82da
+        .quad 0xbfbdc8b7c49a1ddb
+        .quad 0xbfbe72ec117fa5b2
+        .quad 0xbfbf1cd21257e18c
+        .quad 0xbfbfc66a0f0b00a5
+        .quad 0xbfc037da278f2870
+        .quad 0xbfc08c588cda79e4
+        .quad 0xbfc0e0b05ac848ed
+        .quad 0xbfc134e1b489062e
+        .quad 0xbfc188ecbd1d16be
+        .quad 0xbfc1dcd197552b7b
+        .quad 0xbfc2309065d29791
+        .quad 0xbfc284294b07a640
+        .quad 0xbfc2d79c6937efdd
+        .quad 0xbfc32ae9e278ae1a
+        .quad 0xbfc37e11d8b10f89
+        .quad 0xbfc3d1146d9a8a64
+        .quad 0xbfc423f1c2c12ea2
+        .quad 0xbfc476a9f983f74d
+        .quad 0xbfc4c93d33151b24
+        .quad 0xbfc51bab907a5c8a
+        .quad 0xbfc56df5328d58c5
+        .quad 0xbfc5c01a39fbd688
+        .quad 0xbfc6121ac74813cf
+        .quad 0xbfc663f6fac91316
+        .quad 0xbfc6b5aef4aae7dc
+        .quad 0xbfc70742d4ef027f
+        .quad 0xbfc758b2bb6c7b76
+        .quad 0xbfc7a9fec7d05ddf
+        .quad 0xbfc7fb27199df16d
+        .quad 0xbfc84c2bd02f03b3
+        .quad 0xbfc89d0d0ab430cd
+        .quad 0xbfc8edcae8352b6c
+        .quad 0xbfc93e6587910444
+        .quad 0xbfc98edd077e70df
+        .quad 0xbfc9df31868c11d5
+        .quad 0xbfca2f632320b86b
+        .quad 0xbfca7f71fb7bab9d
+        .quad 0xbfcacf5e2db4ec94
+        .quad 0xbfcb1f27d7bd7a80
+        .quad 0xbfcb6ecf175f95e9
+        .quad 0xbfcbbe540a3f036f
+        .quad 0xbfcc0db6cdd94dee
+        .quad 0xbfcc5cf77f860826
+        .quad 0xbfccac163c770dc9
+        .quad 0xbfccfb1321b8c400
+        .quad 0xbfcd49ee4c325970
+        .quad 0xbfcd98a7d8a605a7
+        .quad 0xbfcde73fe3b1480f
+        .quad 0xbfce35b689cd2655
+        .quad 0xbfce840be74e6a4d
+        .quad 0xbfced2401865df52
+        .quad 0xbfcf205339208f27
+        .quad 0xbfcf6e456567fe55
+        .quad 0xbfcfbc16b902680a
+        .quad 0xbfd004e3a7c97cbd
+        .quad 0xbfd02baba24d0664
+        .quad 0xbfd0526359bab1b3
+        .quad 0xbfd0790adbb03009
+        .quad 0xbfd09fa235ba2020
+        .quad 0xbfd0c62975542a8f
+        .quad 0xbfd0eca0a7e91e0b
+        .quad 0xbfd11307dad30b76
+        .quad 0xbfd1395f1b5b61a6
+        .quad 0xbfd15fa676bb08ff
+        .quad 0xbfd185ddfa1a7ed0
+        .quad 0xbfd1ac05b291f070
+        .quad 0xbfd1d21dad295632
+        .quad 0xbfd1f825f6d88e13
+        .quad 0xbfd21e1e9c877639
+        .quad 0xbfd24407ab0e073a
+        .quad 0xbfd269e12f346e2c
+        .quad 0xbfd28fab35b32683
+        .quad 0xbfd2b565cb3313b6
+        .quad 0xbfd2db10fc4d9aaf
+        .quad 0xbfd300acd58cbb10
+        .quad 0xbfd32639636b2836
+        .quad 0xbfd34bb6b2546218
+        .quad 0xbfd37124cea4cded
+        .quad 0xbfd39683c4a9ce9a
+        .quad 0xbfd3bbd3a0a1dcfb
+        .quad 0xbfd3e1146ebc9ff2
+        .quad 0xbfd406463b1b0449
+        .quad 0xbfd42b6911cf5465
+        .quad 0xbfd4507cfedd4fc4
+        .quad 0xbfd475820e3a4251
+        .quad 0xbfd49a784bcd1b8b
+        .quad 0xbfd4bf5fc36e8577
+        .quad 0xbfd4e43880e8fb6a
+        .quad 0xbfd509028ff8e0a2
+        .quad 0xbfd52dbdfc4c96b3
+        .quad 0xbfd5526ad18493ce
+        .quad 0xbfd577091b3378cb
+        .quad 0xbfd59b98e4de271c
+        .quad 0xbfd5c01a39fbd688
+        .quad 0xbfd5e48d25f62ab9
+        .quad 0xbfd608f1b42948ae
+        .quad 0xbfd62d47efe3ebee
+        .quad 0xbfd6518fe4677ba7
+        .quad 0xbfd675c99ce81f92
+        .quad 0xbfd699f5248cd4b8
+        .quad 0xbfd6be12866f820d
+        .quad 0xbfd6e221cd9d0cde
+        .quad 0xbfd7062305156d1d
+        .quad 0xbfd72a1637cbc183
+        .quad 0xbfd74dfb70a66388
+        .quad 0xbfd771d2ba7efb3c
+        .quad 0xbfd7959c202292f1
+        .quad 0xbfd7b957ac51aac4
+        .quad 0xbfd7dd0569c04bff
+        .quad 0xbfd800a563161c54
+        .quad 0xbfd82437a2ee70f7
+        .quad 0xbfd847bc33d8618e
+        .quad 0xbfd86b332056db01
+        .quad 0xbfd88e9c72e0b226
+        .quad 0xbfd8b1f835e0b642
+        .quad 0xbfd8d54673b5c372
+        .quad 0xbfd8f88736b2d4e8
+        .quad 0xbfd91bba891f1709
+        .quad 0xbfd93ee07535f967
+        .quad 0xbfd961f90527409c
+        .quad 0xbfd98504431717fc
+        .quad 0xbfd9a802391e232f
+        .quad 0xbfd9caf2f1498fa4
+        .quad 0xbfd9edd6759b25e0
+        .quad 0xbfda10acd0095ab4
+        .quad 0xbfda33760a7f6051
+        .quad 0xbfda56322edd3731
+        .quad 0xbfda78e146f7bef4
+        .quad 0xbfda9b835c98c70a
+        .quad 0xbfdabe18797f1f49
+        .quad 0xbfdae0a0a75ea862
+        .quad 0xbfdb031befe06434
+        .quad 0xbfdb258a5ca28608
+        .quad 0xbfdb47ebf73882a1
+        .quad 0xbfdb6a40c92b203f
+        .quad 0xbfdb8c88dbf8867a
+        .quad 0xbfdbaec439144dfd
+        .quad 0xbfdbd0f2e9e79031
+        .quad 0xbfdbf314f7d0f6ba
+        .quad 0xbfdc152a6c24cae6
+        .quad 0xbfdc3733502d04f8
+        .quad 0xbfdc592fad295b56
+        .quad 0xbfdc7b1f8c4f51a4
+        .quad 0xbfdc9d02f6ca47b4
+        .quad 0xbfdcbed9f5bb886a
+        .quad 0xbfdce0a4923a587d
+        .quad 0xbfdd0262d554051c
+        .quad 0xbfdd2414c80bf27d
+        .quad 0xbfdd45ba735baa4f
+        .quad 0xbfdd6753e032ea0f
+        .quad 0xbfdd88e11777b149
+        .quad 0xbfddaa6222064fb9
+        .quad 0xbfddcbd708b17359
+        .quad 0xbfdded3fd442364c
+        .quad 0xbfde0e9c8d782cbd
+        .quad 0xbfde2fed3d097298
+        .quad 0xbfde5131eba2b931
+        .quad 0xbfde726aa1e754d2
+        .quad 0xbfde939768714a32
+        .quad 0xbfdeb4b847d15bce
+        .quad 0xbfded5cd488f1732
+        .quad 0xbfdef6d67328e220
+        .quad 0xbfdf17d3d01407af
+        .quad 0xbfdf38c567bcc541
+        .quad 0xbfdf59ab4286576c
+        .quad 0xbfdf7a8568cb06cf
+        .quad 0xbfdf9b53e2dc34c4
+        .quad 0xbfdfbc16b902680a
+        .quad 0xbfdfdccdf37d594c
+        .quad 0xbfdffd799a83ff9b
+        .quad 0x3fdfe1e649bb6335
+        .quad 0x3fdfc151b11b3640
+        .quad 0x3fdfa0c8937e7d5d
+        .quad 0x3fdf804ae8d0cd02
+        .quad 0x3fdf5fd8a9063e35
+        .quad 0x3fdf3f71cc1b629c
+        .quad 0x3fdf1f164a15389a
+        .quad 0x3fdefec61b011f85
+        .quad 0x3fdede8136f4cbf1
+        .quad 0x3fdebe47960e3c08
+        .quad 0x3fde9e193073ac06
+        .quad 0x3fde7df5fe538ab3
+        .quad 0x3fde5dddf7e46e0a
+        .quad 0x3fde3dd1156507de
+        .quad 0x3fde1dcf4f1c1a9e
+        .quad 0x3fddfdd89d586e2b
+        .quad 0x3fddddecf870c4c1
+        .quad 0x3fddbe0c58c3cff2
+        .quad 0x3fdd9e36b6b825b1
+        .quad 0x3fdd7e6c0abc3579
+        .quad 0x3fdd5eac4d463d7e
+        .quad 0x3fdd3ef776d43ff4
+        .quad 0x3fdd1f4d7febf868
+        .quad 0x3fdcffae611ad12b
+        .quad 0x3fdce01a12f5d8d1
+        .quad 0x3fdcc0908e19b7bd
+        .quad 0x3fdca111cb2aa5c5
+        .quad 0x3fdc819dc2d45fe4
+        .quad 0x3fdc62346dca1dfe
+        .quad 0x3fdc42d5c4c688b4
+        .quad 0x3fdc2381c08baf4f
+        .quad 0x3fdc043859e2fdb3
+        .quad 0x3fdbe4f9899d326e
+        .quad 0x3fdbc5c5489254cc
+        .quad 0x3fdba69b8fa1ab02
+        .quad 0x3fdb877c57b1b070
+        .quad 0x3fdb686799b00be3
+        .quad 0x3fdb495d4e9185f7
+        .quad 0x3fdb2a5d6f51ff83
+        .quad 0x3fdb0b67f4f46810
+        .quad 0x3fdaec7cd882b46c
+        .quad 0x3fdacd9c130dd53f
+        .quad 0x3fdaaec59dadadbe
+        .quad 0x3fda8ff971810a5e
+        .quad 0x3fda713787ad97a5
+        .quad 0x3fda527fd95fd8ff
+        .quad 0x3fda33d25fcb1fac
+        .quad 0x3fda152f142981b4
+        .quad 0x3fd9f695efbbd0ef
+        .quad 0x3fd9d806ebc9921c
+        .quad 0x3fd9b98201a0f405
+        .quad 0x3fd99b072a96c6b2
+        .quad 0x3fd97c96600672ad
+        .quad 0x3fd95e2f9b51f04e
+        .quad 0x3fd93fd2d5e1bf1d
+        .quad 0x3fd921800924dd3b
+        .quad 0x3fd903372e90bee4
+        .quad 0x3fd8e4f83fa145ee
+        .quad 0x3fd8c6c335d8b966
+        .quad 0x3fd8a8980abfbd32
+        .quad 0x3fd88a76b7e549c6
+        .quad 0x3fd86c5f36dea3dc
+        .quad 0x3fd84e5181475449
+        .quad 0x3fd8304d90c11fd3
+        .quad 0x3fd812535ef3ff19
+        .quad 0x3fd7f462e58e1688
+        .quad 0x3fd7d67c1e43ae5c
+        .quad 0x3fd7b89f02cf2aad
+        .quad 0x3fd79acb8cf10390
+        .quad 0x3fd77d01b66fbd37
+        .quad 0x3fd75f417917e02c
+        .quad 0x3fd7418acebbf18f
+        .quad 0x3fd723ddb1346b65
+        .quad 0x3fd7063a1a5fb4f2
+        .quad 0x3fd6e8a004221b1f
+        .quad 0x3fd6cb0f6865c8ea
+        .quad 0x3fd6ad88411abfea
+        .quad 0x3fd6900a8836d0d5
+        .quad 0x3fd6729637b59418
+        .quad 0x3fd6552b49986277
+        .quad 0x3fd637c9b7e64dc2
+        .quad 0x3fd61a717cac1983
+        .quad 0x3fd5fd2291fc33cf
+        .quad 0x3fd5dfdcf1eeae0e
+        .quad 0x3fd5c2a096a135dc
+        .quad 0x3fd5a56d7a370ded
+        .quad 0x3fd5884396d90702
+        .quad 0x3fd56b22e6b578e5
+        .quad 0x3fd54e0b64003b70
+        .quad 0x3fd530fd08f29fa7
+        .quad 0x3fd513f7cfcb68ce
+        .quad 0x3fd4f6fbb2cec598
+        .quad 0x3fd4da08ac46495a
+        .quad 0x3fd4bd1eb680e548
+        .quad 0x3fd4a03dcbd2e1be
+        .quad 0x3fd48365e695d797
+        .quad 0x3fd466970128a987
+        .quad 0x3fd449d115ef7d87
+        .quad 0x3fd42d141f53b646
+        .quad 0x3fd4106017c3eca3
+        .quad 0x3fd3f3b4f9b3e939
+        .quad 0x3fd3d712bf9c9def
+        .quad 0x3fd3ba7963fc1f8f
+        .quad 0x3fd39de8e1559f6f
+        .quad 0x3fd3816132316520
+        .quad 0x3fd364e2511cc821
+        .quad 0x3fd3486c38aa29a8
+        .quad 0x3fd32bfee370ee68
+        .quad 0x3fd30f9a4c0d786d
+        .quad 0x3fd2f33e6d2120f2
+        .quad 0x3fd2d6eb4152324f
+        .quad 0x3fd2baa0c34be1ec
+        .quad 0x3fd29e5eedbe4a35
+        .quad 0x3fd28225bb5e64a4
+        .quad 0x3fd265f526e603cb
+        .quad 0x3fd249cd2b13cd6c
+        .quad 0x3fd22dadc2ab3497
+        .quad 0x3fd21196e87473d1
+        .quad 0x3fd1f588973c8747
+        .quad 0x3fd1d982c9d52708
+        .quad 0x3fd1bd857b14c146
+        .quad 0x3fd1a190a5d674a0
+        .quad 0x3fd185a444fa0a7b
+        .quad 0x3fd169c05363f158
+        .quad 0x3fd14de4cbfd373e
+        .quad 0x3fd13211a9b38424
+        .quad 0x3fd11646e7791469
+        .quad 0x3fd0fa848044b351
+        .quad 0x3fd0deca6f11b58b
+        .quad 0x3fd0c318aedff3c0
+        .quad 0x3fd0a76f3ab3c52c
+        .quad 0x3fd08bce0d95fa38
+        .quad 0x3fd070352293d724
+        .quad 0x3fd054a474bf0eb7
+        .quad 0x3fd0391bff2dbcf3
+        .quad 0x3fd01d9bbcfa61d4
+        .quad 0x3fd00223a943dc19
+        .quad 0x3fcfcd677e5ac81d
+        .quad 0x3fcf9697f3bd0ccf
+        .quad 0x3fcf5fd8a9063e35
+        .quad 0x3fcf29299496a889
+        .quad 0x3fcef28aacd72231
+        .quad 0x3fcebbfbe83901a6
+        .quad 0x3fce857d3d361368
+        .quad 0x3fce4f0ea2509008
+        .quad 0x3fce18b00e13123d
+        .quad 0x3fcde26177108d03
+        .quad 0x3fcdac22d3e441d3
+        .quad 0x3fcd75f41b31b6dd
+        .quad 0x3fcd3fd543a4ad5c
+        .quad 0x3fcd09c643f117f0
+        .quad 0x3fccd3c712d31109
+        .quad 0x3fcc9dd7a70ed160
+        .quad 0x3fcc67f7f770a67e
+        .quad 0x3fcc3227facce950
+        .quad 0x3fcbfc67a7fff4cc
+        .quad 0x3fcbc6b6f5ee1c9b
+        .quad 0x3fcb9115db83a3dd
+        .quad 0x3fcb5b844fb4b3ef
+        .quad 0x3fcb2602497d5346
+        .quad 0x3fcaf08fbfe15c51
+        .quad 0x3fcabb2ca9ec7472
+        .quad 0x3fca85d8feb202f7
+        .quad 0x3fca5094b54d2828
+        .quad 0x3fca1b5fc4e0b465
+        .quad 0x3fc9e63a24971f46
+        .quad 0x3fc9b123cba27ed3
+        .quad 0x3fc97c1cb13c7ec1
+        .quad 0x3fc94724cca657be
+        .quad 0x3fc9123c1528c6ce
+        .quad 0x3fc8dd62821404a9
+        .quad 0x3fc8a8980abfbd32
+        .quad 0x3fc873dca68b06f4
+        .quad 0x3fc83f304cdc5aa7
+        .quad 0x3fc80a92f5218acc
+        .quad 0x3fc7d60496cfbb4c
+        .quad 0x3fc7a18529635926
+        .quad 0x3fc76d14a4601225
+        .quad 0x3fc738b2ff50ccad
+        .quad 0x3fc7046031c79f85
+        .quad 0x3fc6d01c335dc9b5
+        .quad 0x3fc69be6fbb3aa6f
+        .quad 0x3fc667c08270b905
+        .quad 0x3fc633a8bf437ce1
+        .quad 0x3fc5ff9fa9e18595
+        .quad 0x3fc5cba53a0762ed
+        .quad 0x3fc597b967789d12
+        .quad 0x3fc563dc29ffacb2
+        .quad 0x3fc5300d796df33a
+        .quad 0x3fc4fc4d4d9bb313
+        .quad 0x3fc4c89b9e6807f5
+        .quad 0x3fc494f863b8df35
+        .quad 0x3fc46163957af02e
+        .quad 0x3fc42ddd2ba1b4a9
+        .quad 0x3fc3fa651e276158
+        .quad 0x3fc3c6fb650cde51
+        .quad 0x3fc3939ff859bf9f
+        .quad 0x3fc36052d01c3dd7
+        .quad 0x3fc32d13e4692eb7
+        .quad 0x3fc2f9e32d5bfdd1
+        .quad 0x3fc2c6c0a316a540
+        .quad 0x3fc293ac3dc1a668
+        .quad 0x3fc260a5f58c02bd
+        .quad 0x3fc22dadc2ab3497
+        .quad 0x3fc1fac39d5b280c
+        .quad 0x3fc1c7e77dde33dc
+        .quad 0x3fc195195c7d125b
+        .quad 0x3fc162593186da70
+        .quad 0x3fc12fa6f550f896
+        .quad 0x3fc0fd02a03727ea
+        .quad 0x3fc0ca6c2a9b6b41
+        .quad 0x3fc097e38ce60649
+        .quad 0x3fc06568bf8576b3
+        .quad 0x3fc032fbbaee6d65
+        .quad 0x3fc0009c779bc7b5
+        .quad 0x3fbf9c95dc1d1165
+        .quad 0x3fbf380e2d9ba4df
+        .quad 0x3fbed3a1d4cdbebb
+        .quad 0x3fbe6f50c2d9f754
+        .quad 0x3fbe0b1ae8f2fd56
+        .quad 0x3fbda700385788a2
+        .quad 0x3fbd4300a2524d41
+        .quad 0x3fbcdf1c1839ee74
+        .quad 0x3fbc7b528b70f1c5
+        .quad 0x3fbc17a3ed65b23c
+        .quad 0x3fbbb4102f925394
+        .quad 0x3fbb5097437cb58e
+        .quad 0x3fbaed391ab6674e
+        .quad 0x3fba89f5a6dc9acc
+        .quad 0x3fba26ccd9981853
+        .quad 0x3fb9c3bea49d3214
+        .quad 0x3fb960caf9abb7ca
+        .quad 0x3fb8fdf1ca8eea6a
+        .quad 0x3fb89b33091d6fe8
+        .quad 0x3fb8388ea739470a
+        .quad 0x3fb7d60496cfbb4c
+        .quad 0x3fb77394c9d958d5
+        .quad 0x3fb7113f3259e07a
+        .quad 0x3fb6af03c2603bd0
+        .quad 0x3fb64ce26c067157
+        .quad 0x3fb5eadb217198a3
+        .quad 0x3fb588edd4d1ceaa
+        .quad 0x3fb5271a78622a0f
+        .quad 0x3fb4c560fe68af88
+        .quad 0x3fb463c15936464e
+        .quad 0x3fb4023b7b26ac9e
+        .quad 0x3fb3a0cf56a06c4b
+        .quad 0x3fb33f7cde14cf5a
+        .quad 0x3fb2de4403ffd4b3
+        .quad 0x3fb27d24bae824db
+        .quad 0x3fb21c1ef55f06c2
+        .quad 0x3fb1bb32a600549d
+        .quad 0x3fb15a5fbf7270ce
+        .quad 0x3fb0f9a634663add
+        .quad 0x3fb09905f797047c
+        .quad 0x3fb0387efbca869e
+        .quad 0x3fafb02267a1ad2d
+        .quad 0x3faeef792508b69d
+        .quad 0x3fae2f02159384fe
+        .quad 0x3fad6ebd1f1febfe
+        .quad 0x3facaeaa27a02241
+        .quad 0x3fabeec9151aac2e
+        .quad 0x3fab2f19cdaa46dc
+        .quad 0x3faa6f9c377dd31b
+        .quad 0x3fa9b05038d84095
+        .quad 0x3fa8f135b8107912
+        .quad 0x3fa8324c9b914bc7
+        .quad 0x3fa77394c9d958d5
+        .quad 0x3fa6b50e297afcce
+        .quad 0x3fa5f6b8a11c3c61
+        .quad 0x3fa538941776b01e
+        .quad 0x3fa47aa07357704f
+        .quad 0x3fa3bcdd9b9f00f3
+        .quad 0x3fa2ff4b77413dcb
+        .quad 0x3fa241e9ed454683
+        .quad 0x3fa184b8e4c56af8
+        .quad 0x3fa0c7b844ef1795
+        .quad 0x3fa00ae7f502c1c4
+        .quad 0x3f9e9c8fb8a7a900
+        .quad 0x3f9d23afc49139f9
+        .quad 0x3f9bab2fdcb46ec7
+        .quad 0x3f9a330fd028f75f
+        .quad 0x3f98bb4f6e2bd536
+        .quad 0x3f9743ee861f3556
+        .quad 0x3f95ccece78a4a9e
+        .quad 0x3f94564a62192834
+        .quad 0x3f92e006c59c9c29
+        .quad 0x3f916a21e20a0a45
+        .quad 0x3f8fe9370ef68e1b
+        .quad 0x3f8cfee70c5ce5dc
+        .quad 0x3f8a15535d0bab34
+        .quad 0x3f872c7ba20f7327
+        .quad 0x3f84445f7cbc8fd2
+        .quad 0x3f815cfe8eaec830
+        .quad 0x3f7cecb0f3922091
+        .quad 0x3f7720d9c06a835f
+        .quad 0x3f715676c8c7a8c1
+        .quad 0x3f671b0ea42e5fda
+        .quad 0x3f57182a894b69c6
+        .quad 0x8000000000000000
+        /*== poly_coeff[5] ==*/
+        .align 32
+        .quad 0x3fd2776E996DA1D2, 0x3fd2776E996DA1D2, 0x3fd2776E996DA1D2, 0x3fd2776E996DA1D2 /* coeff5 */
+        .quad 0xbfd715494C3E7C9B, 0xbfd715494C3E7C9B, 0xbfd715494C3E7C9B, 0xbfd715494C3E7C9B /* coeff4 */
+        .quad 0x3fdEC709DC39E926, 0x3fdEC709DC39E926, 0x3fdEC709DC39E926, 0x3fdEC709DC39E926 /* coeff3 */
+        .quad 0xbfe71547652B7CF8, 0xbfe71547652B7CF8, 0xbfe71547652B7CF8, 0xbfe71547652B7CF8 /* coeff2 */
+        .quad 0x3ff71547652B82FE, 0x3ff71547652B82FE, 0x3ff71547652B82FE, 0x3ff71547652B82FE /* coeff1 */
+        /*== ExpMask ==*/
+        .align 32
+        .quad 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff
+        /*== Two10 ==*/
+        .align 32
+        .quad 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000
+        /*== MinNorm ==*/
+        .align 32
+        .quad 0x0010000000000000, 0x0010000000000000, 0x0010000000000000, 0x0010000000000000
+        /*== MaxNorm ==*/
+        .align 32
+        .quad 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff
+        /*== HalfMask ==*/
+        .align 32
+        .quad 0xfffffffffc000000, 0xfffffffffc000000, 0xfffffffffc000000, 0xfffffffffc000000
+        /*== One ==*/
+        .align 32
+        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
+        /*== Threshold ==*/
+        .align 32
+        .quad 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000
+        /*== Bias ==*/
+        .align 32
+        .quad 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000
+        /*== Bias1 ==*/
+        .align 32
+        .quad 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000
+        .align 32
+        .type	__svml_dlog2_data_internal,@object
+        .size	__svml_dlog2_data_internal,.-__svml_dlog2_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log28_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log28_core-avx2.S
new file mode 100644
index 0000000000..804de5fe0c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log28_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized log2, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN8v_log2 _ZGVeN8v_log2_avx2_wrapper
+#include "../svml_d_log28_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log28_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log28_core.c
new file mode 100644
index 0000000000..bd55abecc7
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log28_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized log2, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN8v_log2
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN8v_log2, __GI__ZGVeN8v_log2, __redirect__ZGVeN8v_log2)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log28_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log28_core_avx512.S
new file mode 100644
index 0000000000..211a78f315
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log28_core_avx512.S
@@ -0,0 +1,293 @@
+/* Function log2 vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
+ *    R = Rcp*x - 1.0
+ *    log2(x) = k - log2(Rcp) + poly_approximation(R)
+ *       log2(Rcp) is tabulated
+ *
+ *
+ */
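
As a rough scalar sketch of this reduction (illustrative only, not part of
the patch: frexp and nearbyint stand in for vgetmantpd/vgetexppd and
vrndscalepd, a libm call stands in for the tabulated -log2(Rcp) values, and
a short Taylor series replaces the longer polynomial whose coefficients live
in the data table):

    #include <math.h>

    static double
    log2_sketch (double x)
    {
      int k;
      double m = frexp (x, &k);              /* x = m * 2^k, m in [0.5, 1)  */
      double rcp = 1.0 / m;                  /* short reciprocal approximation  */
      rcp = nearbyint (rcp * 16.0) / 16.0;   /* few fraction bits -> table index  */
      double r = rcp * m - 1.0;              /* reduced argument, |r| small  */
      /* log2(1 + r) via a short Taylor series; the vector code evaluates a
         higher-degree polynomial with FMAs.  */
      double p = r * (1.4426950408889634
                      + r * (-0.7213475204444817
                             + r * 0.4808983469629878));
      return (double) k - log2 (rcp) + p;    /* -log2(rcp) is tabulated below  */
    }

Rounding the reciprocal to a few fraction bits is what makes -log2(Rcp)
tabulable; the table lookup in the code below (vpermt2pd) does exactly that
indexing.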
+
+/* Offsets for data table __svml_dlog2_data_internal_avx512
+ */
+#define Log_tbl                       	0
+#define One                           	128
+#define C075                          	192
+#define poly_coeff9                   	256
+#define poly_coeff8                   	320
+#define poly_coeff7                   	384
+#define poly_coeff6                   	448
+#define poly_coeff5                   	512
+#define poly_coeff4                   	576
+#define poly_coeff3                   	640
+#define poly_coeff2                   	704
+#define poly_coeff1                   	768
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN8v_log2_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovaps   %zmm0, %zmm7
+        vgetmantpd $8, {sae}, %zmm7, %zmm6
+        vmovups   One+__svml_dlog2_data_internal_avx512(%rip), %zmm2
+        vmovups   poly_coeff5+__svml_dlog2_data_internal_avx512(%rip), %zmm12
+        vmovups   poly_coeff3+__svml_dlog2_data_internal_avx512(%rip), %zmm13
+
+/* Start polynomial evaluation */
+        vmovups   poly_coeff9+__svml_dlog2_data_internal_avx512(%rip), %zmm10
+        vmovups   poly_coeff8+__svml_dlog2_data_internal_avx512(%rip), %zmm0
+        vmovups   poly_coeff7+__svml_dlog2_data_internal_avx512(%rip), %zmm11
+        vmovups   poly_coeff6+__svml_dlog2_data_internal_avx512(%rip), %zmm14
+
+/* Prepare exponent correction: DblRcp<0.75? */
+        vmovups   C075+__svml_dlog2_data_internal_avx512(%rip), %zmm1
+
+/* Table lookup */
+        vmovups   __svml_dlog2_data_internal_avx512(%rip), %zmm4
+
+/* GetExp(x) */
+        vgetexppd {sae}, %zmm7, %zmm5
+
+/* DblRcp ~ 1/Mantissa */
+        vrcp14pd  %zmm6, %zmm8
+
+/* x<=0? */
+        vfpclasspd $94, %zmm7, %k0
+
+/* round DblRcp to 4 fractional bits (RN mode, no Precision exception) */
+        vrndscalepd $88, {sae}, %zmm8, %zmm3
+        vmovups   poly_coeff4+__svml_dlog2_data_internal_avx512(%rip), %zmm8
+        kmovw     %k0, %edx
+
+/* Reduced argument: R = DblRcp*Mantissa - 1 */
+        vfmsub213pd {rn-sae}, %zmm2, %zmm3, %zmm6
+        vcmppd    $17, {sae}, %zmm1, %zmm3, %k1
+        vfmadd231pd {rn-sae}, %zmm6, %zmm12, %zmm8
+        vmovups   poly_coeff2+__svml_dlog2_data_internal_avx512(%rip), %zmm12
+        vfmadd231pd {rn-sae}, %zmm6, %zmm10, %zmm0
+        vfmadd231pd {rn-sae}, %zmm6, %zmm11, %zmm14
+        vmovups   poly_coeff1+__svml_dlog2_data_internal_avx512(%rip), %zmm1
+
+/* R^2 */
+        vmulpd    {rn-sae}, %zmm6, %zmm6, %zmm15
+        vfmadd231pd {rn-sae}, %zmm6, %zmm13, %zmm12
+
+/* Prepare table index */
+        vpsrlq    $48, %zmm3, %zmm9
+
+/* add 1 to Expon if DblRcp<0.75 */
+        vaddpd    {rn-sae}, %zmm2, %zmm5, %zmm5{%k1}
+        vmulpd    {rn-sae}, %zmm15, %zmm15, %zmm13
+        vfmadd213pd {rn-sae}, %zmm14, %zmm15, %zmm0
+        vfmadd213pd {rn-sae}, %zmm12, %zmm15, %zmm8
+        vpermt2pd Log_tbl+64+__svml_dlog2_data_internal_avx512(%rip), %zmm9, %zmm4
+
+/* polynomial */
+        vfmadd213pd {rn-sae}, %zmm8, %zmm13, %zmm0
+        vfmadd213pd {rn-sae}, %zmm1, %zmm6, %zmm0
+        vfmadd213pd {rn-sae}, %zmm4, %zmm0, %zmm6
+        vaddpd    {rn-sae}, %zmm6, %zmm5, %zmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm7
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm7, 64(%rsp)
+        vmovups   %zmm0, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     64(%rsp,%r14,8), %xmm0
+        call      log2@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 128(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN8v_log2_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_dlog2_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 Log_tbl[16][2];
+        __declspec(align(64)) VUINT32 One[8][2];
+        __declspec(align(64)) VUINT32 C075[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff9[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff8[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff7[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff1[8][2];
+   } __svml_dlog2_data_internal_avx512;
+#endif
+__svml_dlog2_data_internal_avx512:
+        /*== Log_tbl ==*/
+        .quad 0x0000000000000000
+        .quad 0xbfb663f6fac91316
+        .quad 0xbfc5c01a39fbd688
+        .quad 0xbfcfbc16b902680a
+        .quad 0xbfd49a784bcd1b8b
+        .quad 0xbfd91bba891f1709
+        .quad 0xbfdd6753e032ea0f
+        .quad 0xbfe0c10500d63aa6
+        .quad 0x3fda8ff971810a5e
+        .quad 0x3fd6cb0f6865c8ea
+        .quad 0x3fd32bfee370ee68
+        .quad 0x3fcf5fd8a9063e35
+        .quad 0x3fc8a8980abfbd32
+        .quad 0x3fc22dadc2ab3497
+        .quad 0x3fb7d60496cfbb4c
+        .quad 0x3fa77394c9d958d5
+        /*== One ==*/
+        .align 64
+        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
+        /*== C075 0.75 ==*/
+        .align 64
+        .quad 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000
+        /*== poly_coeff9 ==*/
+        .align 64
+        .quad 0x3fc4904bda0e1d12, 0x3fc4904bda0e1d12, 0x3fc4904bda0e1d12, 0x3fc4904bda0e1d12, 0x3fc4904bda0e1d12, 0x3fc4904bda0e1d12, 0x3fc4904bda0e1d12, 0x3fc4904bda0e1d12
+        /*== poly_coeff8 ==*/
+        .align 64
+        .quad 0xbfc71fb84deb5cce, 0xbfc71fb84deb5cce, 0xbfc71fb84deb5cce, 0xbfc71fb84deb5cce, 0xbfc71fb84deb5cce, 0xbfc71fb84deb5cce, 0xbfc71fb84deb5cce, 0xbfc71fb84deb5cce
+        /*== poly_coeff7 ==*/
+        .align 64
+        .quad 0x3fca617351818613, 0x3fca617351818613, 0x3fca617351818613, 0x3fca617351818613, 0x3fca617351818613, 0x3fca617351818613, 0x3fca617351818613, 0x3fca617351818613
+        /*== poly_coeff6 ==*/
+        .align 64
+        .quad 0xbfcec707e4e3144c, 0xbfcec707e4e3144c, 0xbfcec707e4e3144c, 0xbfcec707e4e3144c, 0xbfcec707e4e3144c, 0xbfcec707e4e3144c, 0xbfcec707e4e3144c, 0xbfcec707e4e3144c
+        /*== poly_coeff5 ==*/
+        .align 64
+        .quad 0x3fd2776c5114d91a, 0x3fd2776c5114d91a, 0x3fd2776c5114d91a, 0x3fd2776c5114d91a, 0x3fd2776c5114d91a, 0x3fd2776c5114d91a, 0x3fd2776c5114d91a, 0x3fd2776c5114d91a
+        /*== poly_coeff4 ==*/
+        .align 64
+        .quad 0xbfd71547653d0f8d, 0xbfd71547653d0f8d, 0xbfd71547653d0f8d, 0xbfd71547653d0f8d, 0xbfd71547653d0f8d, 0xbfd71547653d0f8d, 0xbfd71547653d0f8d, 0xbfd71547653d0f8d
+        /*== poly_coeff3 ==*/
+        .align 64
+        .quad 0x3fdec709dc3a029f, 0x3fdec709dc3a029f, 0x3fdec709dc3a029f, 0x3fdec709dc3a029f, 0x3fdec709dc3a029f, 0x3fdec709dc3a029f, 0x3fdec709dc3a029f, 0x3fdec709dc3a029f
+        /*== poly_coeff2 ==*/
+        .align 64
+        .quad 0xbfe71547652b82d4, 0xbfe71547652b82d4, 0xbfe71547652b82d4, 0xbfe71547652b82d4, 0xbfe71547652b82d4, 0xbfe71547652b82d4, 0xbfe71547652b82d4, 0xbfe71547652b82d4
+        /*== poly_coeff1 ==*/
+        .align 64
+        .quad 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe
+        .align 64
+        .type	__svml_dlog2_data_internal_avx512,@object
+        .size	__svml_dlog2_data_internal_avx512,.-__svml_dlog2_data_internal_avx512
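
For reference, the main path of the _ZGVeN8v_log2_skx kernel above can be
modeled in scalar C roughly as follows.  This is an illustrative sketch
only, not code from this patch: the rounding step and the truncated
polynomial are simplified stand-ins for the vrcp14pd/vrndscalepd sequence
and the full poly_coeff1..poly_coeff9 table entries.

/* Scalar model: split x into mantissa m and exponent k, round a short
   reciprocal of m to a few fractional bits, then
   log2(x) = k - log2(rcp) + poly(rcp*m - 1),
   where -log2(rcp) comes from Log_tbl in the real kernel.  Special
   inputs (x <= 0, NaN) are left to the scalar fallback, as above.  */
#include <math.h>

static double
log2_model (double x)
{
  int k;
  double m = 2.0 * frexp (x, &k);           /* m in [1, 2)              */
  k -= 1;                                   /* x = m * 2^k              */
  double rcp = nearbyint (16.0 / m) / 16.0; /* 1/m with 4 fraction bits */
  double r = rcp * m - 1.0;                 /* reduced argument         */
  /* Truncated log2(1 + r) polynomial; the constants are poly_coeff1,
     poly_coeff2 and poly_coeff3 from the data table above.  */
  double p = r * (0x1.71547652b82fep+0
                  + r * (-0x1.71547652b82d4p-1
                         + r * 0x1.ec709dc3a029fp-2));
  return (double) k - log2 (rcp) + p;       /* -log2(rcp) is tabulated  */
}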
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core-avx2.S
new file mode 100644
index 0000000000..234bf4750b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized log2f.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN16v_log2f _ZGVeN16v_log2f_avx2_wrapper
+#include "../svml_s_log2f16_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core.c
new file mode 100644
index 0000000000..abf4f04988
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized log2f, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN16v_log2f
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN16v_log2f, __GI__ZGVeN16v_log2f,
+	       __redirect__ZGVeN16v_log2f)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core_avx512.S
new file mode 100644
index 0000000000..c3a5aceef4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core_avx512.S
@@ -0,0 +1,231 @@
+/* Function log2f vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
+ *    R = Rcp*x - 1.0
+ *    log2(x) = k - log2(Rcp) + poly_approximation(R)
+ *       log2(Rcp) is tabulated
+ *
+ *
+ */
+
+/* Offsets for data table __svml_slog2_data_internal_avx512
+ */
+#define One                           	0
+#define coeff4                        	64
+#define coeff3                        	128
+#define coeff2                        	192
+#define coeff1                        	256
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN16v_log2f_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vgetmantps $11, {sae}, %zmm0, %zmm3
+        vmovups   __svml_slog2_data_internal_avx512(%rip), %zmm1
+        vgetexpps {sae}, %zmm0, %zmm5
+
+/* x<=0? */
+        vfpclassps $94, %zmm0, %k0
+        vsubps    {rn-sae}, %zmm1, %zmm3, %zmm9
+        vpsrld    $19, %zmm3, %zmm7
+        vgetexpps {sae}, %zmm3, %zmm6
+        vpermps   coeff4+__svml_slog2_data_internal_avx512(%rip), %zmm7, %zmm1
+        vpermps   coeff3+__svml_slog2_data_internal_avx512(%rip), %zmm7, %zmm2
+        vpermps   coeff2+__svml_slog2_data_internal_avx512(%rip), %zmm7, %zmm4
+        vpermps   coeff1+__svml_slog2_data_internal_avx512(%rip), %zmm7, %zmm8
+        vsubps    {rn-sae}, %zmm6, %zmm5, %zmm10
+        vfmadd213ps {rn-sae}, %zmm2, %zmm9, %zmm1
+        kmovw     %k0, %edx
+        vfmadd213ps {rn-sae}, %zmm4, %zmm9, %zmm1
+        vfmadd213ps {rn-sae}, %zmm8, %zmm9, %zmm1
+        vfmadd213ps {rn-sae}, %zmm10, %zmm9, %zmm1
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        vmovaps   %zmm1, %zmm0
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm0, 64(%rsp)
+        vmovups   %zmm1, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm1
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $16, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm1
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm1
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     64(%rsp,%r14,4), %xmm0
+        call      log2f@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 128(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN16v_log2f_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_slog2_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 One[16][1];
+        __declspec(align(64)) VUINT32 coeff4[16][1];
+        __declspec(align(64)) VUINT32 coeff3[16][1];
+        __declspec(align(64)) VUINT32 coeff2[16][1];
+        __declspec(align(64)) VUINT32 coeff1[16][1];
+    } __svml_slog2_data_internal_avx512;
+#endif
+__svml_slog2_data_internal_avx512:
+        /*== One ==*/
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        // c4
+        .align 64
+        .long 0xbea77e4a, 0xbe8aae3d
+        .long 0xbe67fe32, 0xbe43d1b6
+        .long 0xbe26a589, 0xbe0ee09b
+        .long 0xbdf6a8a1, 0xbdd63b49
+        .long 0xbf584e51, 0xbf3e80a1
+        .long 0xbf2892f0, 0xbf15d377
+        .long 0xbf05b525, 0xbeef8e30
+        .long 0xbed75c8f, 0xbec24184
+        // c3
+        .align 64
+        .long 0x3ef5910c, 0x3ef045a1
+        .long 0x3ee7d87e, 0x3eddbb84
+        .long 0x3ed2d6df, 0x3ec7bbd2
+        .long 0x3ebcc42f, 0x3eb22616
+        .long 0x3e8f3399, 0x3eb1223e
+        .long 0x3ec9db4a, 0x3edb7a09
+        .long 0x3ee79a1a, 0x3eef77cb
+        .long 0x3ef407a4, 0x3ef607b4
+        // c2
+        .align 64
+        .long 0xbf38a934, 0xbf387de6
+        .long 0xbf37f6f0, 0xbf37048b
+        .long 0xbf35a88a, 0xbf33ed04
+        .long 0xbf31df56, 0xbf2f8d82
+        .long 0xbf416814, 0xbf3daf58
+        .long 0xbf3b5c08, 0xbf39fa2a
+        .long 0xbf393713, 0xbf38d7e1
+        .long 0xbf38b2cd, 0xbf38aa62
+        // c1
+        .align 64
+        .long 0x3fb8aa3b, 0x3fb8a9c0
+        .long 0x3fb8a6e8, 0x3fb89f4e
+        .long 0x3fb890cb, 0x3fb879b1
+        .long 0x3fb858d8, 0x3fb82d90
+        .long 0x3fb8655e, 0x3fb8883a
+        .long 0x3fb89aea, 0x3fb8a42f
+        .long 0x3fb8a848, 0x3fb8a9c9
+        .long 0x3fb8aa2f, 0x3fb8aa3b
+        .align 64
+        .type	__svml_slog2_data_internal_avx512,@object
+        .size	__svml_slog2_data_internal_avx512,.-__svml_slog2_data_internal_avx512
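
The SPECIAL_VALUES_BRANCH/RANGEMASK_CHECK/SCALAR_MATH_CALL code above (and
in the other kernels of this patch) follows one scheme: spill the input
and the vector result to the stack, then recompute only the lanes flagged
in the range mask with the scalar libm routine.  A minimal C sketch of
that loop, as a plain-C stand-in rather than the actual wrapper code:

#include <math.h>

/* mask holds one bit per lane (set by vfpclassps for x <= 0 / NaN);
   flagged lanes are recomputed with the scalar call, which also raises
   the expected FP exceptions and sets errno.  */
static void
log2f_fixup_lanes (const float in[16], float out[16], unsigned int mask)
{
  for (unsigned int i = 0; i < 16; i++)
    if (mask & (1u << i))
      out[i] = log2f (in[i]);
}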
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core-sse2.S
new file mode 100644
index 0000000000..dd0e763ac9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized log2f, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN4v_log2f _ZGVbN4v_log2f_sse2
+#include "../svml_s_log2f4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core.c
new file mode 100644
index 0000000000..1eb68d9f52
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized log2f, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN4v_log2f
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN4v_log2f, __GI__ZGVbN4v_log2f,
+	       __redirect__ZGVbN4v_log2f)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core_sse4.S
new file mode 100644
index 0000000000..a45ea919f4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core_sse4.S
@@ -0,0 +1,223 @@
+/* Function log2f vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
+ *    R = Rcp*x - 1.0
+ *    log2(x) = k - log2(Rcp) + poly_approximation(R)
+ *       log2(Rcp) is tabulated
+ *
+ *
+ */
+
+/* Offsets for data table __svml_slog2_data_internal
+ */
+#define MinNorm                       	0
+#define MaxNorm                       	16
+#define iBrkValue                     	32
+#define iOffExpoMask                  	48
+#define One                           	64
+#define sPoly                         	80
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN4v_log2f_sse4)
+        subq      $72, %rsp
+        cfi_def_cfa_offset(80)
+        movaps    %xmm0, %xmm1
+
+/* reduction: compute r,n */
+        movdqu    iBrkValue+__svml_slog2_data_internal(%rip), %xmm2
+        movaps    %xmm0, %xmm4
+        movdqu    iOffExpoMask+__svml_slog2_data_internal(%rip), %xmm10
+        psubd     %xmm2, %xmm1
+        pand      %xmm1, %xmm10
+        movaps    %xmm0, %xmm3
+        paddd     %xmm2, %xmm10
+        psrad     $23, %xmm1
+        movups    sPoly+__svml_slog2_data_internal(%rip), %xmm5
+        movups    sPoly+32+__svml_slog2_data_internal(%rip), %xmm6
+        movups    sPoly+64+__svml_slog2_data_internal(%rip), %xmm7
+        movups    sPoly+96+__svml_slog2_data_internal(%rip), %xmm9
+        cmpltps   MinNorm+__svml_slog2_data_internal(%rip), %xmm4
+        cmpnleps  MaxNorm+__svml_slog2_data_internal(%rip), %xmm3
+        cvtdq2ps  %xmm1, %xmm1
+        subps     One+__svml_slog2_data_internal(%rip), %xmm10
+        mulps     %xmm10, %xmm5
+        movaps    %xmm10, %xmm8
+        mulps     %xmm10, %xmm6
+        mulps     %xmm10, %xmm8
+        addps     sPoly+16+__svml_slog2_data_internal(%rip), %xmm5
+        mulps     %xmm10, %xmm7
+        addps     sPoly+48+__svml_slog2_data_internal(%rip), %xmm6
+        mulps     %xmm10, %xmm9
+        mulps     %xmm8, %xmm5
+        addps     sPoly+80+__svml_slog2_data_internal(%rip), %xmm7
+        addps     sPoly+112+__svml_slog2_data_internal(%rip), %xmm9
+        addps     %xmm5, %xmm6
+        mulps     %xmm8, %xmm6
+        orps      %xmm3, %xmm4
+
+/* combine and get argument value range mask */
+        movmskps  %xmm4, %edx
+        addps     %xmm6, %xmm7
+        mulps     %xmm7, %xmm8
+        addps     %xmm8, %xmm9
+        mulps     %xmm10, %xmm9
+        addps     sPoly+128+__svml_slog2_data_internal(%rip), %xmm9
+        mulps     %xmm9, %xmm10
+        addps     %xmm10, %xmm1
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm1
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movaps    %xmm1, %xmm0
+        addq      $72, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(80)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm0, 32(%rsp)
+        movups    %xmm1, 48(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 edx
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -64)
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -72)
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm1
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -64)
+        cfi_offset(13, -72)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm1
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      log2f@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 48(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN4v_log2f_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_slog2_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(16)) VUINT32 MinNorm[4][1];
+        __declspec(align(16)) VUINT32 MaxNorm[4][1];
+        __declspec(align(16)) VUINT32 iBrkValue[4][1];
+        __declspec(align(16)) VUINT32 iOffExpoMask[4][1];
+        __declspec(align(16)) VUINT32 One[4][1];
+        __declspec(align(16)) VUINT32 sPoly[9][4][1];
+} __svml_slog2_data_internal;
+#endif
+__svml_slog2_data_internal:
+        /*== MinNorm ==*/
+        .long 0x00800000, 0x00800000, 0x00800000, 0x00800000
+        /*== MaxNorm ==*/
+        .align 16
+        .long 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff
+        /*== iBrkValue = SP 2/3 ==*/
+        .align 16
+        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
+        /*== iOffExpoMask = SP significand mask ==*/
+        .align 16
+        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
+        /*== sOne = SP 1.0 ==*/
+        .align 16
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== spoly[9] ==*/
+        .align 16
+        .long 0x3e554012, 0x3e554012, 0x3e554012, 0x3e554012 /* coeff9 */
+        .long 0xbe638E14, 0xbe638E14, 0xbe638E14, 0xbe638E14 /* coeff8 */
+        .long 0x3e4D660B, 0x3e4D660B, 0x3e4D660B, 0x3e4D660B /* coeff7 */
+        .long 0xbe727824, 0xbe727824, 0xbe727824, 0xbe727824 /* coeff6 */
+        .long 0x3e93DD07, 0x3e93DD07, 0x3e93DD07, 0x3e93DD07 /* coeff5 */
+        .long 0xbeB8B969, 0xbeB8B969, 0xbeB8B969, 0xbeB8B969 /* coeff4 */
+        .long 0x3eF637C0, 0x3eF637C0, 0x3eF637C0, 0x3eF637C0 /* coeff3 */
+        .long 0xbf38AA2B, 0xbf38AA2B, 0xbf38AA2B, 0xbf38AA2B /* coeff2 */
+        .long 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B /* coeff1 */
+        .align 16
+        .type	__svml_slog2_data_internal,@object
+        .size	__svml_slog2_data_internal,.-__svml_slog2_data_internal
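
For reference, the iBrkValue/iOffExpoMask reduction used above (and in
the AVX2 variant below) can be modeled in scalar C roughly as follows.
This is an illustrative sketch under the assumption of finite positive
normal inputs; the polynomial is truncated, with its constants taken
from the sPoly table above.

#include <stdint.h>
#include <string.h>

static float
log2f_model (float x)
{
  const uint32_t brk = 0x3f2aaaab;          /* iBrkValue = SP 2/3       */
  uint32_t ix, t, im;
  memcpy (&ix, &x, sizeof ix);
  t = ix - brk;
  int32_t n = (int32_t) t >> 23;            /* exponent so m lands near 1 */
  im = (t & 0x007fffff) + brk;              /* mantissa in ~[2/3, 4/3)  */
  float m, r, p;
  memcpy (&m, &im, sizeof m);
  r = m - 1.0f;                             /* reduced argument         */
  /* Truncated log2(1 + r) polynomial; coeff1 = log2(e) as in sPoly.    */
  p = r * (0x1.715476p+0f
           + r * (-0x1.715456p-1f + r * 0x1.ec6f8p-2f));
  return (float) n + p;                     /* log2(x) = n + log2(m)    */
}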
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core-sse.S
new file mode 100644
index 0000000000..ec4b70568d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized log2f, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN8v_log2f _ZGVdN8v_log2f_sse_wrapper
+#include "../svml_s_log2f8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core.c
new file mode 100644
index 0000000000..b3e958021a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized log2f, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN8v_log2f
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN8v_log2f, __GI__ZGVdN8v_log2f,
+	       __redirect__ZGVdN8v_log2f)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core_avx2.S
new file mode 100644
index 0000000000..bc0cb5081a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core_avx2.S
@@ -0,0 +1,226 @@
+/* Function log2f vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
+ *    R = Rcp*x - 1.0
+ *    log2(x) = k - log2(Rcp) + poly_approximation(R)
+ *       log2(Rcp) is tabulated
+ *
+ *
+ */
+
+/* Offsets for data table __svml_slog2_data_internal
+ */
+#define MinNorm                       	0
+#define MaxNorm                       	32
+#define iBrkValue                     	64
+#define iOffExpoMask                  	96
+#define One                           	128
+#define sPoly                         	160
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN8v_log2f_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+
+/* reduction: compute r,n */
+        vmovups   iBrkValue+__svml_slog2_data_internal(%rip), %ymm4
+        vmovups   sPoly+64+__svml_slog2_data_internal(%rip), %ymm9
+        vmovups   sPoly+128+__svml_slog2_data_internal(%rip), %ymm10
+        vmovups   sPoly+192+__svml_slog2_data_internal(%rip), %ymm12
+        vpsubd    %ymm4, %ymm0, %ymm1
+        vcmplt_oqps MinNorm+__svml_slog2_data_internal(%rip), %ymm0, %ymm5
+        vcmpnle_uqps MaxNorm+__svml_slog2_data_internal(%rip), %ymm0, %ymm6
+        vpand     iOffExpoMask+__svml_slog2_data_internal(%rip), %ymm1, %ymm3
+        vpsrad    $23, %ymm1, %ymm2
+        vmovups   sPoly+__svml_slog2_data_internal(%rip), %ymm1
+        vpaddd    %ymm4, %ymm3, %ymm8
+        vcvtdq2ps %ymm2, %ymm14
+        vsubps    One+__svml_slog2_data_internal(%rip), %ymm8, %ymm13
+        vfmadd213ps sPoly+32+__svml_slog2_data_internal(%rip), %ymm13, %ymm1
+        vfmadd213ps sPoly+96+__svml_slog2_data_internal(%rip), %ymm13, %ymm9
+        vmulps    %ymm13, %ymm13, %ymm11
+        vfmadd213ps sPoly+160+__svml_slog2_data_internal(%rip), %ymm13, %ymm10
+        vfmadd213ps sPoly+224+__svml_slog2_data_internal(%rip), %ymm13, %ymm12
+        vfmadd213ps %ymm9, %ymm11, %ymm1
+        vfmadd213ps %ymm10, %ymm11, %ymm1
+        vfmadd213ps %ymm12, %ymm11, %ymm1
+        vfmadd213ps sPoly+256+__svml_slog2_data_internal(%rip), %ymm13, %ymm1
+        vorps     %ymm6, %ymm5, %ymm7
+
+/* combine and get argument value range mask */
+        vmovmskps %ymm7, %edx
+        vfmadd213ps %ymm14, %ymm13, %ymm1
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm1
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        vmovaps   %ymm1, %ymm0
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %ymm0, 32(%rsp)
+        vmovups   %ymm1, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx ymm1
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   64(%rsp), %ymm1
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm1
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      log2f@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 64(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN8v_log2f_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_slog2_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(32)) VUINT32 MinNorm[8][1];
+        __declspec(align(32)) VUINT32 MaxNorm[8][1];
+        __declspec(align(32)) VUINT32 iBrkValue[8][1];
+        __declspec(align(32)) VUINT32 iOffExpoMask[8][1];
+        __declspec(align(32)) VUINT32 One[8][1];
+        __declspec(align(32)) VUINT32 sPoly[9][8][1];
+} __svml_slog2_data_internal;
+#endif
+__svml_slog2_data_internal:
+        /*== MinNorm ==*/
+        .long 0x00800000, 0x00800000, 0x00800000, 0x00800000, 0x00800000, 0x00800000, 0x00800000, 0x00800000
+        /*== MaxNorm ==*/
+        .align 32
+        .long 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff
+        /*== iBrkValue = SP 2/3 ==*/
+        .align 32
+        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
+        /*== iOffExpoMask = SP significand mask ==*/
+        .align 32
+        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
+        /*== sOne = SP 1.0 ==*/
+        .align 32
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== spoly[9] ==*/
+        .align 32
+        .long 0x3e554012, 0x3e554012, 0x3e554012, 0x3e554012, 0x3e554012, 0x3e554012, 0x3e554012, 0x3e554012 /* coeff9 */
+        .long 0xbe638E14, 0xbe638E14, 0xbe638E14, 0xbe638E14, 0xbe638E14, 0xbe638E14, 0xbe638E14, 0xbe638E14 /* coeff8 */
+        .long 0x3e4D660B, 0x3e4D660B, 0x3e4D660B, 0x3e4D660B, 0x3e4D660B, 0x3e4D660B, 0x3e4D660B, 0x3e4D660B /* coeff7 */
+        .long 0xbe727824, 0xbe727824, 0xbe727824, 0xbe727824, 0xbe727824, 0xbe727824, 0xbe727824, 0xbe727824 /* coeff6 */
+        .long 0x3e93DD07, 0x3e93DD07, 0x3e93DD07, 0x3e93DD07, 0x3e93DD07, 0x3e93DD07, 0x3e93DD07, 0x3e93DD07 /* coeff5 */
+        .long 0xbeB8B969, 0xbeB8B969, 0xbeB8B969, 0xbeB8B969, 0xbeB8B969, 0xbeB8B969, 0xbeB8B969, 0xbeB8B969 /* coeff4 */
+        .long 0x3eF637C0, 0x3eF637C0, 0x3eF637C0, 0x3eF637C0, 0x3eF637C0, 0x3eF637C0, 0x3eF637C0, 0x3eF637C0 /* coeff3 */
+        .long 0xbf38AA2B, 0xbf38AA2B, 0xbf38AA2B, 0xbf38AA2B, 0xbf38AA2B, 0xbf38AA2B, 0xbf38AA2B, 0xbf38AA2B /* coeff2 */
+        .long 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B /* coeff1 */
+        .align 32
+        .type	__svml_slog2_data_internal,@object
+        .size	__svml_slog2_data_internal,.-__svml_slog2_data_internal
diff --git a/sysdeps/x86_64/fpu/svml_d_log22_core.S b/sysdeps/x86_64/fpu/svml_d_log22_core.S
new file mode 100644
index 0000000000..f181a62c7d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_log22_core.S
@@ -0,0 +1,29 @@
+/* Function log2 vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN2v_log2)
+WRAPPER_IMPL_SSE2 log2
+END (_ZGVbN2v_log2)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN2v_log2)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_log24_core.S b/sysdeps/x86_64/fpu/svml_d_log24_core.S
new file mode 100644
index 0000000000..b0a5aa9532
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_log24_core.S
@@ -0,0 +1,29 @@
+/* Function log2 vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN4v_log2)
+WRAPPER_IMPL_AVX _ZGVbN2v_log2
+END (_ZGVdN4v_log2)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN4v_log2)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_log24_core_avx.S b/sysdeps/x86_64/fpu/svml_d_log24_core_avx.S
new file mode 100644
index 0000000000..9a56cfed61
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_log24_core_avx.S
@@ -0,0 +1,25 @@
+/* Function log2 vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN4v_log2)
+WRAPPER_IMPL_AVX _ZGVbN2v_log2
+END (_ZGVcN4v_log2)
diff --git a/sysdeps/x86_64/fpu/svml_d_log28_core.S b/sysdeps/x86_64/fpu/svml_d_log28_core.S
new file mode 100644
index 0000000000..443cbfd578
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_log28_core.S
@@ -0,0 +1,25 @@
+/* Function log2 vectorized with AVX-512, wrapper to AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN8v_log2)
+WRAPPER_IMPL_AVX512 _ZGVdN4v_log2
+END (_ZGVeN8v_log2)
diff --git a/sysdeps/x86_64/fpu/svml_s_log2f16_core.S b/sysdeps/x86_64/fpu/svml_s_log2f16_core.S
new file mode 100644
index 0000000000..6cf265fd33
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_log2f16_core.S
@@ -0,0 +1,25 @@
+/* Function log2f vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN16v_log2f)
+WRAPPER_IMPL_AVX512 _ZGVdN8v_log2f
+END (_ZGVeN16v_log2f)
diff --git a/sysdeps/x86_64/fpu/svml_s_log2f4_core.S b/sysdeps/x86_64/fpu/svml_s_log2f4_core.S
new file mode 100644
index 0000000000..024ba9b8c5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_log2f4_core.S
@@ -0,0 +1,29 @@
+/* Function log2f vectorized with SSE2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN4v_log2f)
+WRAPPER_IMPL_SSE2 log2f
+END (_ZGVbN4v_log2f)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4v_log2f)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_log2f8_core.S b/sysdeps/x86_64/fpu/svml_s_log2f8_core.S
new file mode 100644
index 0000000000..5705590563
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_log2f8_core.S
@@ -0,0 +1,29 @@
+/* Function log2f vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN8v_log2f)
+WRAPPER_IMPL_AVX _ZGVbN4v_log2f
+END (_ZGVdN8v_log2f)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8v_log2f)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_log2f8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_log2f8_core_avx.S
new file mode 100644
index 0000000000..38602c475e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_log2f8_core_avx.S
@@ -0,0 +1,25 @@
+/* Function log2f vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+        .text
+ENTRY (_ZGVcN8v_log2f)
+WRAPPER_IMPL_AVX _ZGVbN4v_log2f
+END (_ZGVcN8v_log2f)
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log2-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-log2-avx.c
new file mode 100644
index 0000000000..95d8e4bbd8
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-log2-avx.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-log2.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log2-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-log2-avx2.c
new file mode 100644
index 0000000000..95d8e4bbd8
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-log2-avx2.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-log2.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log2-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-log2-avx512f.c
new file mode 100644
index 0000000000..95d8e4bbd8
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-log2-avx512f.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-log2.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log2.c b/sysdeps/x86_64/fpu/test-double-libmvec-log2.c
new file mode 100644
index 0000000000..326b6f1171
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-log2.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE double
+#define LIBMVEC_FUNC log2
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index 3dce136dfc..08c91ff634 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVbN2v_sinh)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVbN2v_cbrt)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVbN2vv_atan2)
 VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVbN2v_log10)
+VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVbN2v_log2)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index 1852625897..a2fb0de309 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -42,6 +42,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVdN4v_sinh)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVdN4v_cbrt)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVdN4vv_atan2)
 VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVdN4v_log10)
+VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVdN4v_log2)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m256i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index cf9ea35ffe..dc65a4ee25 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVcN4v_sinh)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVcN4v_cbrt)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVcN4vv_atan2)
 VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVcN4v_log10)
+VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVcN4v_log2)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index b6457ea032..253ee8c906 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVeN8v_sinh)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVeN8v_cbrt)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVeN8vv_atan2)
 VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVeN8v_log10)
+VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVeN8v_log2)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx.c
new file mode 100644
index 0000000000..c88b3fc5a9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-log2f.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx2.c
new file mode 100644
index 0000000000..c88b3fc5a9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx2.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-log2f.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx512f.c
new file mode 100644
index 0000000000..c88b3fc5a9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx512f.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-log2f.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log2f.c b/sysdeps/x86_64/fpu/test-float-libmvec-log2f.c
new file mode 100644
index 0000000000..afba03d1e2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-log2f.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE float
+#define LIBMVEC_FUNC log2f
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index 272e754e1b..1c7db5146c 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVeN16v_sinhf)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVeN16v_cbrtf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVeN16vv_atan2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVeN16v_log10f)
+VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVeN16v_log2f)
 
 #define VEC_INT_TYPE __m512i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index b892258b99..8ec51603b3 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVbN4v_sinhf)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVbN4v_cbrtf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVbN4vv_atan2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVbN4v_log10f)
+VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVbN4v_log2f)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index 1c6ead71e1..1cb4553c7a 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -42,6 +42,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVdN8v_sinhf)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVdN8v_cbrtf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVdN8vv_atan2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVdN8v_log10f)
+VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVdN8v_log2f)
 
 /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
 #undef VECTOR_WRAPPER_fFF
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index 71f5d8d7b6..6ecc1792bb 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVcN8v_sinhf)
 VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVcN8v_cbrtf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVcN8vv_atan2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVcN8v_log10f)
+VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVcN8v_log2f)
 
 #define VEC_INT_TYPE __m128i
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 13/18] x86-64: Add vector log1p/log1pf implementation to libmvec
  2021-12-29  6:39 [PATCH v5 00/18] x86-64: Add vector math functions to libmvec Sunil K Pandey
                   ` (11 preceding siblings ...)
  2021-12-29  6:39 ` [PATCH v5 12/18] x86-64: Add vector log2/log2f " Sunil K Pandey
@ 2021-12-29  6:39 ` Sunil K Pandey
  2021-12-29 21:26   ` H.J. Lu
  2021-12-29  6:39 ` [PATCH v5 14/18] x86-64: Add vector atanh/atanhf " Sunil K Pandey
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Sunil K Pandey @ 2021-12-29  6:39 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, andrey.kolesov, marius.cornea

Implement vectorized log1p/log1pf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector log1p/log1pf with regenerated ulps.
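
For reference, a minimal usage sketch (illustrative only, not part of
the patch; the compiler flags and the selected variant are assumptions
and depend on GCC version and target ISA):

  /* e.g. gcc -O3 -ffast-math -march=x86-64-v3 test.c -lmvec -lm
     With -ffast-math, the simd declaration installed via
     <bits/math-vector.h> lets the auto-vectorizer replace the loop
     body with a libmvec call such as _ZGVdN4v_log1p (4 doubles per
     call on AVX2).  */
  #include <math.h>

  void
  vec_log1p (double *out, const double *in, int n)
  {
    for (int i = 0; i < n; i++)
      out[i] = log1p (in[i]);
  }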
---
 bits/libm-simd-decl-stubs.h                   |   11 +
 math/bits/mathcalls.h                         |    2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
 sysdeps/x86/fpu/bits/math-vector.h            |    4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
 sysdeps/x86_64/fpu/Makeconfig                 |    1 +
 sysdeps/x86_64/fpu/Versions                   |    2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |   20 +
 .../fpu/multiarch/svml_d_log1p2_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_log1p2_core.c |   27 +
 .../fpu/multiarch/svml_d_log1p2_core_sse4.S   | 1398 +++++++++++++++++
 .../fpu/multiarch/svml_d_log1p4_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_log1p4_core.c |   27 +
 .../fpu/multiarch/svml_d_log1p4_core_avx2.S   | 1383 ++++++++++++++++
 .../fpu/multiarch/svml_d_log1p8_core-avx2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_log1p8_core.c |   27 +
 .../fpu/multiarch/svml_d_log1p8_core_avx512.S |  317 ++++
 .../fpu/multiarch/svml_s_log1pf16_core-avx2.S |   20 +
 .../fpu/multiarch/svml_s_log1pf16_core.c      |   28 +
 .../multiarch/svml_s_log1pf16_core_avx512.S   |  271 ++++
 .../fpu/multiarch/svml_s_log1pf4_core-sse2.S  |   20 +
 .../fpu/multiarch/svml_s_log1pf4_core.c       |   28 +
 .../fpu/multiarch/svml_s_log1pf4_core_sse4.S  |  252 +++
 .../fpu/multiarch/svml_s_log1pf8_core-sse.S   |   20 +
 .../fpu/multiarch/svml_s_log1pf8_core.c       |   28 +
 .../fpu/multiarch/svml_s_log1pf8_core_avx2.S  |  254 +++
 sysdeps/x86_64/fpu/svml_d_log1p2_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_log1p4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_d_log1p8_core.S       |   25 +
 sysdeps/x86_64/fpu/svml_s_log1pf16_core.S     |   25 +
 sysdeps/x86_64/fpu/svml_s_log1pf4_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_log1pf8_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S  |   25 +
 .../fpu/test-double-libmvec-log1p-avx.c       |    1 +
 .../fpu/test-double-libmvec-log1p-avx2.c      |    1 +
 .../fpu/test-double-libmvec-log1p-avx512f.c   |    1 +
 .../x86_64/fpu/test-double-libmvec-log1p.c    |    3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
 .../fpu/test-float-libmvec-log1pf-avx.c       |    1 +
 .../fpu/test-float-libmvec-log1pf-avx2.c      |    1 +
 .../fpu/test-float-libmvec-log1pf-avx512f.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-log1pf.c    |    3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
 50 files changed, 4447 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log1p2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log1p4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_log1p8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log1pf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log1pf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log1pf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log1p.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log1pf.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
index 73252615ca..845246fab9 100644
--- a/bits/libm-simd-decl-stubs.h
+++ b/bits/libm-simd-decl-stubs.h
@@ -241,4 +241,15 @@
 #define __DECL_SIMD_log2f32x
 #define __DECL_SIMD_log2f64x
 #define __DECL_SIMD_log2f128x
+
+#define __DECL_SIMD_log1p
+#define __DECL_SIMD_log1pf
+#define __DECL_SIMD_log1pl
+#define __DECL_SIMD_log1pf16
+#define __DECL_SIMD_log1pf32
+#define __DECL_SIMD_log1pf64
+#define __DECL_SIMD_log1pf128
+#define __DECL_SIMD_log1pf32x
+#define __DECL_SIMD_log1pf64x
+#define __DECL_SIMD_log1pf128x
 #endif
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index bfe52a4666..aa4bc61aa4 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -119,7 +119,7 @@ __MATHCALL_VEC (exp10,, (_Mdouble_ __x));
 __MATHCALL_VEC (expm1,, (_Mdouble_ __x));
 
 /* Return log(1 + X).  */
-__MATHCALL (log1p,, (_Mdouble_ __x));
+__MATHCALL_VEC (log1p,, (_Mdouble_ __x));
 
 /* Return the base 2 signed integral exponent of X.  */
 __MATHCALL (logb,, (_Mdouble_ __x));
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index fa8b016c5d..68b940606a 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -55,6 +55,7 @@ GLIBC_2.35 _ZGVbN2v_exp10 F
 GLIBC_2.35 _ZGVbN2v_exp2 F
 GLIBC_2.35 _ZGVbN2v_expm1 F
 GLIBC_2.35 _ZGVbN2v_log10 F
+GLIBC_2.35 _ZGVbN2v_log1p F
 GLIBC_2.35 _ZGVbN2v_log2 F
 GLIBC_2.35 _ZGVbN2v_sinh F
 GLIBC_2.35 _ZGVbN2vv_atan2 F
@@ -68,6 +69,7 @@ GLIBC_2.35 _ZGVbN4v_exp10f F
 GLIBC_2.35 _ZGVbN4v_exp2f F
 GLIBC_2.35 _ZGVbN4v_expm1f F
 GLIBC_2.35 _ZGVbN4v_log10f F
+GLIBC_2.35 _ZGVbN4v_log1pf F
 GLIBC_2.35 _ZGVbN4v_log2f F
 GLIBC_2.35 _ZGVbN4v_sinhf F
 GLIBC_2.35 _ZGVbN4vv_atan2f F
@@ -81,6 +83,7 @@ GLIBC_2.35 _ZGVcN4v_exp10 F
 GLIBC_2.35 _ZGVcN4v_exp2 F
 GLIBC_2.35 _ZGVcN4v_expm1 F
 GLIBC_2.35 _ZGVcN4v_log10 F
+GLIBC_2.35 _ZGVcN4v_log1p F
 GLIBC_2.35 _ZGVcN4v_log2 F
 GLIBC_2.35 _ZGVcN4v_sinh F
 GLIBC_2.35 _ZGVcN4vv_atan2 F
@@ -94,6 +97,7 @@ GLIBC_2.35 _ZGVcN8v_exp10f F
 GLIBC_2.35 _ZGVcN8v_exp2f F
 GLIBC_2.35 _ZGVcN8v_expm1f F
 GLIBC_2.35 _ZGVcN8v_log10f F
+GLIBC_2.35 _ZGVcN8v_log1pf F
 GLIBC_2.35 _ZGVcN8v_log2f F
 GLIBC_2.35 _ZGVcN8v_sinhf F
 GLIBC_2.35 _ZGVcN8vv_atan2f F
@@ -107,6 +111,7 @@ GLIBC_2.35 _ZGVdN4v_exp10 F
 GLIBC_2.35 _ZGVdN4v_exp2 F
 GLIBC_2.35 _ZGVdN4v_expm1 F
 GLIBC_2.35 _ZGVdN4v_log10 F
+GLIBC_2.35 _ZGVdN4v_log1p F
 GLIBC_2.35 _ZGVdN4v_log2 F
 GLIBC_2.35 _ZGVdN4v_sinh F
 GLIBC_2.35 _ZGVdN4vv_atan2 F
@@ -120,6 +125,7 @@ GLIBC_2.35 _ZGVdN8v_exp10f F
 GLIBC_2.35 _ZGVdN8v_exp2f F
 GLIBC_2.35 _ZGVdN8v_expm1f F
 GLIBC_2.35 _ZGVdN8v_log10f F
+GLIBC_2.35 _ZGVdN8v_log1pf F
 GLIBC_2.35 _ZGVdN8v_log2f F
 GLIBC_2.35 _ZGVdN8v_sinhf F
 GLIBC_2.35 _ZGVdN8vv_atan2f F
@@ -133,6 +139,7 @@ GLIBC_2.35 _ZGVeN16v_exp10f F
 GLIBC_2.35 _ZGVeN16v_exp2f F
 GLIBC_2.35 _ZGVeN16v_expm1f F
 GLIBC_2.35 _ZGVeN16v_log10f F
+GLIBC_2.35 _ZGVeN16v_log1pf F
 GLIBC_2.35 _ZGVeN16v_log2f F
 GLIBC_2.35 _ZGVeN16v_sinhf F
 GLIBC_2.35 _ZGVeN16vv_atan2f F
@@ -146,6 +153,7 @@ GLIBC_2.35 _ZGVeN8v_exp10 F
 GLIBC_2.35 _ZGVeN8v_exp2 F
 GLIBC_2.35 _ZGVeN8v_expm1 F
 GLIBC_2.35 _ZGVeN8v_log10 F
+GLIBC_2.35 _ZGVeN8v_log1p F
 GLIBC_2.35 _ZGVeN8v_log2 F
 GLIBC_2.35 _ZGVeN8v_sinh F
 GLIBC_2.35 _ZGVeN8vv_atan2 F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index 59d284a10a..14c9db3bb3 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -110,6 +110,10 @@
 #  define __DECL_SIMD_log2 __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_log2f
 #  define __DECL_SIMD_log2f __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_log1p
+#  define __DECL_SIMD_log1p __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_log1pf
+#  define __DECL_SIMD_log1pf __DECL_SIMD_x86_64
 
 # endif
 #endif
diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
index a2ca9a203f..3dca196432 100644
--- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
@@ -54,6 +54,8 @@
 !GCC$ builtin (log10f) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (log2) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (log2f) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (log1p) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (log1pf) attributes simd (notinbranch) if('x86_64')
 
 !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
@@ -93,3 +95,5 @@
 !GCC$ builtin (log10f) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (log2) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (log2f) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (log1p) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (log1pf) attributes simd (notinbranch) if('x32')
diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
index 8d6d0915af..378cb06d37 100644
--- a/sysdeps/x86_64/fpu/Makeconfig
+++ b/sysdeps/x86_64/fpu/Makeconfig
@@ -36,6 +36,7 @@ libmvec-funcs = \
   hypot \
   log \
   log10 \
+  log1p \
   log2 \
   pow \
   sin \
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index 1b48c2d642..155fb115f3 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -23,6 +23,7 @@ libmvec {
     _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
     _ZGVbN2v_expm1; _ZGVcN4v_expm1; _ZGVdN4v_expm1; _ZGVeN8v_expm1;
     _ZGVbN2v_log10; _ZGVcN4v_log10; _ZGVdN4v_log10; _ZGVeN8v_log10;
+    _ZGVbN2v_log1p; _ZGVcN4v_log1p; _ZGVdN4v_log1p; _ZGVeN8v_log1p;
     _ZGVbN2v_log2; _ZGVcN4v_log2; _ZGVdN4v_log2; _ZGVeN8v_log2;
     _ZGVbN2v_sinh; _ZGVcN4v_sinh; _ZGVdN4v_sinh; _ZGVeN8v_sinh;
     _ZGVbN2vv_atan2; _ZGVcN4vv_atan2; _ZGVdN4vv_atan2; _ZGVeN8vv_atan2;
@@ -36,6 +37,7 @@ libmvec {
     _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
     _ZGVbN4v_expm1f; _ZGVcN8v_expm1f; _ZGVdN8v_expm1f; _ZGVeN16v_expm1f;
     _ZGVbN4v_log10f; _ZGVcN8v_log10f; _ZGVdN8v_log10f; _ZGVeN16v_log10f;
+    _ZGVbN4v_log1pf; _ZGVcN8v_log1pf; _ZGVdN8v_log1pf; _ZGVeN16v_log1pf;
     _ZGVbN4v_log2f; _ZGVcN8v_log2f; _ZGVdN8v_log2f; _ZGVeN16v_log2f;
     _ZGVbN4v_sinhf; _ZGVcN8v_sinhf; _ZGVdN8v_sinhf; _ZGVeN16v_sinhf;
     _ZGVbN4vv_atan2f; _ZGVcN8vv_atan2f; _ZGVdN8vv_atan2f; _ZGVeN16vv_atan2f;
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 3b7f3cee6f..a2b15a795b 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -1685,6 +1685,26 @@ float: 2
 float128: 2
 ldouble: 3
 
+Function: "log1p_vlen16":
+float: 2
+
+Function: "log1p_vlen2":
+double: 1
+
+Function: "log1p_vlen4":
+double: 1
+float: 2
+
+Function: "log1p_vlen4_avx2":
+double: 1
+
+Function: "log1p_vlen8":
+double: 1
+float: 2
+
+Function: "log1p_vlen8_avx2":
+float: 2
+
 Function: "log2":
 double: 2
 float: 1
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core-sse2.S
new file mode 100644
index 0000000000..8004088346
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized log1p, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN2v_log1p _ZGVbN2v_log1p_sse2
+#include "../svml_d_log1p2_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core.c
new file mode 100644
index 0000000000..35ca620aba
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized log1p, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN2v_log1p
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN2v_log1p, __GI__ZGVbN2v_log1p, __redirect__ZGVbN2v_log1p)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core_sse4.S
new file mode 100644
index 0000000000..9d3f0647b4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core_sse4.S
@@ -0,0 +1,1398 @@
+/* Function log1p vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
+ *    Get short reciprocal approximation Rcp ~ 1/xh
+ *    R = (Rcp*xh - 1.0) + Rcp*xl
+ *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
+ *       log(Rcp) is tabulated
+ *
+ */
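+
+/* Illustrative scalar model of the steps above (a sketch only, not
+   assembled; split/rcp_short/log_tbl/poly are hypothetical helpers):
+
+     split (1.0 + x, &xh, &xl, &k);      // 1+x = 2^k*(xh+xl), xh in [1,2)
+     Rcp = rcp_short (xh);               // ~11-bit reciprocal, 1+9 mantissa bits
+     R   = (Rcp * xh - 1.0) + Rcp * xl;  // reduced argument
+     return k * log (2.0) - log_tbl (Rcp) + poly (R);
+ */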
+
+/* Offsets for data table __svml_dlog1p_data_internal
+ */
+#define Log_HA_table                  	0
+#define Log_LA_table                  	8208
+#define poly_coeff                    	12320
+#define ExpMask                       	12384
+#define Two10                         	12400
+#define MinLog1p                      	12416
+#define MaxLog1p                      	12432
+#define One                           	12448
+#define SgnMask                       	12464
+#define XThreshold                    	12480
+#define XhMask                        	12496
+#define Threshold                     	12512
+#define Bias                          	12528
+#define Bias1                         	12544
+#define ExpMask0                      	12560
+#define ExpMask2                      	12576
+#define L2                            	12592
+
+/* Lookup bias for data table __svml_dlog1p_data_internal.  */
+#define Table_Lookup_Bias               -0x405ff0
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN2v_log1p_sse4)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $64, %rsp
+        movaps    %xmm0, %xmm7
+
+/* SgnMask used by all accuracies */
+        movups    SgnMask+__svml_dlog1p_data_internal(%rip), %xmm6
+        lea       Table_Lookup_Bias+__svml_dlog1p_data_internal(%rip), %rsi
+        movaps    %xmm6, %xmm8
+        movaps    %xmm7, %xmm15
+        movups    One+__svml_dlog1p_data_internal(%rip), %xmm0
+        andps     %xmm7, %xmm8
+        cmpltpd   XThreshold+__svml_dlog1p_data_internal(%rip), %xmm8
+        cmpnlepd  MaxLog1p+__svml_dlog1p_data_internal(%rip), %xmm15
+        movaps    %xmm0, %xmm4
+
+/* compute 1+x as high, low parts */
+        movaps    %xmm0, %xmm9
+        addpd     %xmm7, %xmm4
+        maxpd     %xmm7, %xmm9
+        orps      XhMask+__svml_dlog1p_data_internal(%rip), %xmm8
+        movaps    %xmm0, %xmm5
+
+/* preserve mantissa, set input exponent to 2^(-10) */
+        movups    ExpMask+__svml_dlog1p_data_internal(%rip), %xmm3
+        andps     %xmm8, %xmm4
+        andps     %xmm4, %xmm3
+
+/* check range */
+        movaps    %xmm7, %xmm8
+        orps      Two10+__svml_dlog1p_data_internal(%rip), %xmm3
+
+/* Compute SignMask for all accuracies, including EP */
+        andnps    %xmm7, %xmm6
+
+/* reciprocal approximation good to at least 11 bits */
+        cvtpd2ps  %xmm3, %xmm10
+        minpd     %xmm7, %xmm5
+        subpd     %xmm4, %xmm9
+        cmpltpd   MinLog1p+__svml_dlog1p_data_internal(%rip), %xmm8
+        addpd     %xmm9, %xmm5
+        movlhps   %xmm10, %xmm10
+        orps      %xmm15, %xmm8
+        rcpps     %xmm10, %xmm11
+
+/* combine and get argument value range mask */
+        movmskpd  %xmm8, %edx
+
+/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
+        movups    .FLT_16(%rip), %xmm13
+
+/* exponent of X needed to scale Xl */
+        movdqu    ExpMask0+__svml_dlog1p_data_internal(%rip), %xmm12
+        cvtps2pd  %xmm11, %xmm1
+        addpd     %xmm13, %xmm1
+        subpd     %xmm13, %xmm1
+
+/* 2^ (-10-exp(X) ) */
+        movdqu    ExpMask2+__svml_dlog1p_data_internal(%rip), %xmm2
+        pand      %xmm4, %xmm12
+        psubq     %xmm12, %xmm2
+        mulpd     %xmm1, %xmm3
+
+/* scale DblRcp */
+        mulpd     %xmm1, %xmm2
+        subpd     %xmm0, %xmm3
+
+/*
+ * argument reduction
+ * VQFMS( D, R, X, DblRcp1, One );
+ */
+        mulpd     %xmm2, %xmm5
+        addpd     %xmm5, %xmm3
+
+/* exponent*log(2.0) */
+        movups    Threshold+__svml_dlog1p_data_internal(%rip), %xmm10
+
+/* exponent bits */
+        psrlq     $20, %xmm4
+        pshufd    $221, %xmm4, %xmm14
+
+/*
+ * prepare table index
+ * table lookup
+ */
+        movaps    %xmm1, %xmm4
+        cmpltpd   %xmm1, %xmm10
+
+/* biased exponent in DP format */
+        cvtdq2pd  %xmm14, %xmm0
+
+/* polynomial */
+        movups    poly_coeff+__svml_dlog1p_data_internal(%rip), %xmm1
+        movaps    %xmm3, %xmm5
+        mulpd     %xmm3, %xmm1
+        mulpd     %xmm3, %xmm5
+        addpd     poly_coeff+16+__svml_dlog1p_data_internal(%rip), %xmm1
+        movups    poly_coeff+32+__svml_dlog1p_data_internal(%rip), %xmm2
+        psrlq     $40, %xmm4
+        mulpd     %xmm3, %xmm2
+        mulpd     %xmm5, %xmm1
+        addpd     poly_coeff+48+__svml_dlog1p_data_internal(%rip), %xmm2
+        movd      %xmm4, %eax
+        andps     Bias+__svml_dlog1p_data_internal(%rip), %xmm10
+        addpd     %xmm1, %xmm2
+
+/* reconstruction */
+        mulpd     %xmm2, %xmm5
+        orps      Bias1+__svml_dlog1p_data_internal(%rip), %xmm10
+        pshufd    $2, %xmm4, %xmm9
+        subpd     %xmm10, %xmm0
+        addpd     %xmm5, %xmm3
+        movd      %xmm9, %ecx
+        mulpd     L2+__svml_dlog1p_data_internal(%rip), %xmm0
+        movslq    %eax, %rax
+        movslq    %ecx, %rcx
+        movsd     (%rsi,%rax), %xmm11
+        movhpd    (%rsi,%rcx), %xmm11
+        addpd     %xmm3, %xmm11
+        addpd     %xmm11, %xmm0
+
+/* OR in the Sign of input argument to produce correct log1p(-0) */
+        orps      %xmm6, %xmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx xmm0 xmm7
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm7, 32(%rsp)
+        movups    %xmm0, 48(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $2, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 xmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      log1p@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 48(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVbN2v_log1p_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_dlog1p_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(16)) VUINT32 Log_HA_table[(1<<10)+2][2];
+        __declspec(align(16)) VUINT32 Log_LA_table[(1<<9)+1][2];
+        __declspec(align(16)) VUINT32 poly_coeff[4][2][2];
+        __declspec(align(16)) VUINT32 ExpMask[2][2];
+        __declspec(align(16)) VUINT32 Two10[2][2];
+        __declspec(align(16)) VUINT32 MinLog1p[2][2];
+        __declspec(align(16)) VUINT32 MaxLog1p[2][2];
+        __declspec(align(16)) VUINT32 One[2][2];
+        __declspec(align(16)) VUINT32 SgnMask[2][2];
+        __declspec(align(16)) VUINT32 XThreshold[2][2];
+        __declspec(align(16)) VUINT32 XhMask[2][2];
+        __declspec(align(16)) VUINT32 Threshold[2][2];
+        __declspec(align(16)) VUINT32 Bias[2][2];
+        __declspec(align(16)) VUINT32 Bias1[2][2];
+        __declspec(align(16)) VUINT32 ExpMask0[2][2];
+        __declspec(align(16)) VUINT32 ExpMask2[2][2];
+        __declspec(align(16)) VUINT32 L2[2][2];
+} __svml_dlog1p_data_internal;
+#endif
+__svml_dlog1p_data_internal:
+        /* Log_HA_table */
+        .quad 0xc086232bdd7a8300, 0xbe1ce91eef3fb100
+        .quad 0xc086232fdc7ad828, 0xbe1cefcffda73b6a
+        .quad 0xc0862333d97d2ba0, 0xbe1cef406748f1ff
+        .quad 0xc0862337d48378e0, 0xbe1cef2a9429925a
+        .quad 0xc086233bcd8fb878, 0xbe1cf138d17ebecb
+        .quad 0xc086233fc4a3e018, 0xbe1ceff2dbbbb29e
+        .quad 0xc0862343b9c1e270, 0xbe1cf1a42aae437b
+        .quad 0xc0862347acebaf68, 0xbe1cef3b152048af
+        .quad 0xc086234b9e2333f0, 0xbe1cef20e127805e
+        .quad 0xc086234f8d6a5a30, 0xbe1cf00ad6052cf4
+        .quad 0xc08623537ac30980, 0xbe1cefc4642ee597
+        .quad 0xc0862357662f2660, 0xbe1cf1f277d36e16
+        .quad 0xc086235b4fb092a0, 0xbe1ceed009e8d8e6
+        .quad 0xc086235f37492d28, 0xbe1cf1e4038cb362
+        .quad 0xc08623631cfad250, 0xbe1cf0b0873b8557
+        .quad 0xc086236700c75b98, 0xbe1cf15bb3227c0b
+        .quad 0xc086236ae2b09fe0, 0xbe1cf151ef8ca9ed
+        .quad 0xc086236ec2b87358, 0xbe1cefe1dc2cd2ed
+        .quad 0xc0862372a0e0a780, 0xbe1cf0d1eec5454f
+        .quad 0xc08623767d2b0b48, 0xbe1ceeefd570bbce
+        .quad 0xc086237a57996af0, 0xbe1cee99ae91b3a7
+        .quad 0xc086237e302d9028, 0xbe1cf0412830fbd1
+        .quad 0xc086238206e94218, 0xbe1ceee898588610
+        .quad 0xc0862385dbce4548, 0xbe1cee9a1fbcaaea
+        .quad 0xc0862389aede5bc0, 0xbe1ceed8e7cc1ad6
+        .quad 0xc086238d801b4500, 0xbe1cf10c8d059da6
+        .quad 0xc08623914f86be18, 0xbe1ceee6c63a8165
+        .quad 0xc08623951d228180, 0xbe1cf0c3592d2ff1
+        .quad 0xc0862398e8f04758, 0xbe1cf0026cc4cb1b
+        .quad 0xc086239cb2f1c538, 0xbe1cf15d48d8e670
+        .quad 0xc08623a07b28ae60, 0xbe1cef359363787c
+        .quad 0xc08623a44196b390, 0xbe1cefdf1ab2e82c
+        .quad 0xc08623a8063d8338, 0xbe1cefe43c02aa84
+        .quad 0xc08623abc91ec960, 0xbe1cf044f5ae35b7
+        .quad 0xc08623af8a3c2fb8, 0xbe1cf0b0b4001e1b
+        .quad 0xc08623b349975d98, 0xbe1cf1bae76dfbcf
+        .quad 0xc08623b70731f810, 0xbe1cef0a72e13a62
+        .quad 0xc08623bac30da1c8, 0xbe1cf184007d2b6b
+        .quad 0xc08623be7d2bfb40, 0xbe1cf16f4b239e98
+        .quad 0xc08623c2358ea2a0, 0xbe1cf0976acada87
+        .quad 0xc08623c5ec3733d0, 0xbe1cf066318a16ff
+        .quad 0xc08623c9a1274880, 0xbe1ceffaa7148798
+        .quad 0xc08623cd54607820, 0xbe1cf23ab02e9b6e
+        .quad 0xc08623d105e45800, 0xbe1cefdfef7d4fde
+        .quad 0xc08623d4b5b47b20, 0xbe1cf17fece44f2b
+        .quad 0xc08623d863d27270, 0xbe1cf18f907d0d7c
+        .quad 0xc08623dc103fccb0, 0xbe1cee61fe072c98
+        .quad 0xc08623dfbafe1668, 0xbe1cf022dd891e2f
+        .quad 0xc08623e3640eda20, 0xbe1ceecc1daf4358
+        .quad 0xc08623e70b73a028, 0xbe1cf0173c4fa380
+        .quad 0xc08623eab12deec8, 0xbe1cf16a2150c2f4
+        .quad 0xc08623ee553f4a30, 0xbe1cf1bf980b1f4b
+        .quad 0xc08623f1f7a93480, 0xbe1cef8b731663c2
+        .quad 0xc08623f5986d2dc0, 0xbe1cee9a664d7ef4
+        .quad 0xc08623f9378cb3f0, 0xbe1cf1eda2af6400
+        .quad 0xc08623fcd5094320, 0xbe1cf1923f9d68d7
+        .quad 0xc086240070e45548, 0xbe1cf0747cd3e03a
+        .quad 0xc08624040b1f6260, 0xbe1cf22ee855bd6d
+        .quad 0xc0862407a3bbe078, 0xbe1cf0d57360c00b
+        .quad 0xc086240b3abb4398, 0xbe1ceebc815cd575
+        .quad 0xc086240ed01efdd0, 0xbe1cf03bfb970951
+        .quad 0xc086241263e87f50, 0xbe1cf16e74768529
+        .quad 0xc0862415f6193658, 0xbe1cefec64b8becb
+        .quad 0xc086241986b28f30, 0xbe1cf0838d210baa
+        .quad 0xc086241d15b5f448, 0xbe1cf0ea86e75b11
+        .quad 0xc0862420a324ce28, 0xbe1cf1708d11d805
+        .quad 0xc08624242f008380, 0xbe1ceea988c5a417
+        .quad 0xc0862427b94a7910, 0xbe1cef166a7bbca5
+        .quad 0xc086242b420411d0, 0xbe1cf0c9d9e86a38
+        .quad 0xc086242ec92eaee8, 0xbe1cef0946455411
+        .quad 0xc08624324ecbaf98, 0xbe1cefea60907739
+        .quad 0xc0862435d2dc7160, 0xbe1cf1ed0934ce42
+        .quad 0xc086243955624ff8, 0xbe1cf191ba746c7d
+        .quad 0xc086243cd65ea548, 0xbe1ceeec78cf2a7e
+        .quad 0xc086244055d2c968, 0xbe1cef345284c119
+        .quad 0xc0862443d3c012b8, 0xbe1cf24f77355219
+        .quad 0xc08624475027d5e8, 0xbe1cf05bf087e114
+        .quad 0xc086244acb0b65d0, 0xbe1cef3504a32189
+        .quad 0xc086244e446c1398, 0xbe1ceff54b2a406f
+        .quad 0xc0862451bc4b2eb8, 0xbe1cf0757d54ed4f
+        .quad 0xc086245532aa04f0, 0xbe1cf0c8099fdfd5
+        .quad 0xc0862458a789e250, 0xbe1cf0b173796a31
+        .quad 0xc086245c1aec1138, 0xbe1cf11d8734540d
+        .quad 0xc086245f8cd1da60, 0xbe1cf1916a723ceb
+        .quad 0xc0862462fd3c84d8, 0xbe1cf19a911e1da7
+        .quad 0xc08624666c2d5608, 0xbe1cf23a9ef72e4f
+        .quad 0xc0862469d9a591c0, 0xbe1cef503d947663
+        .quad 0xc086246d45a67a18, 0xbe1cf0fceeb1a0b2
+        .quad 0xc0862470b0314fa8, 0xbe1cf107e27e4fbc
+        .quad 0xc086247419475160, 0xbe1cf03dd9922331
+        .quad 0xc086247780e9bc98, 0xbe1cefce1a10e129
+        .quad 0xc086247ae719cd18, 0xbe1ceea47f73c4f6
+        .quad 0xc086247e4bd8bd10, 0xbe1ceec0ac56d100
+        .quad 0xc0862481af27c528, 0xbe1cee8a6593278a
+        .quad 0xc086248511081c70, 0xbe1cf2231dd9dec7
+        .quad 0xc0862488717af888, 0xbe1cf0b4b8ed7da8
+        .quad 0xc086248bd0818d68, 0xbe1cf1bd8d835002
+        .quad 0xc086248f2e1d0d98, 0xbe1cf259acc107f4
+        .quad 0xc08624928a4eaa20, 0xbe1cee897636b00c
+        .quad 0xc0862495e5179270, 0xbe1cee757f20c326
+        .quad 0xc08624993e78f490, 0xbe1cefafd3aa54a4
+        .quad 0xc086249c9673fd10, 0xbe1cee7298d38b97
+        .quad 0xc086249fed09d6f8, 0xbe1ceedc158d4ceb
+        .quad 0xc08624a3423babe0, 0xbe1cf2282987cb2e
+        .quad 0xc08624a6960aa400, 0xbe1cefe7381ecc4b
+        .quad 0xc08624a9e877e600, 0xbe1cef328dbbce80
+        .quad 0xc08624ad39849728, 0xbe1cefde45f3cc71
+        .quad 0xc08624b08931db58, 0xbe1cefa8b89433b9
+        .quad 0xc08624b3d780d500, 0xbe1cef6773c0b139
+        .quad 0xc08624b72472a528, 0xbe1cf031c931c11f
+        .quad 0xc08624ba70086b78, 0xbe1cf088f49275e7
+        .quad 0xc08624bdba434630, 0xbe1cf17de0eaa86d
+        .quad 0xc08624c103245238, 0xbe1cefd492f1ba75
+        .quad 0xc08624c44aacab08, 0xbe1cf1253e154466
+        .quad 0xc08624c790dd6ad0, 0xbe1cf0fb09ee6d55
+        .quad 0xc08624cad5b7aa58, 0xbe1cf1f08dd048fe
+        .quad 0xc08624ce193c8120, 0xbe1ceeca0809697f
+        .quad 0xc08624d15b6d0538, 0xbe1cef8d5662d968
+        .quad 0xc08624d49c4a4b78, 0xbe1cee97b556ed78
+        .quad 0xc08624d7dbd56750, 0xbe1cf1b14b6acb75
+        .quad 0xc08624db1a0f6b00, 0xbe1cef1e860623f2
+        .quad 0xc08624de56f96758, 0xbe1ceeaf4d156f3d
+        .quad 0xc08624e192946bf0, 0xbe1ceecc12b400ed
+        .quad 0xc08624e4cce18710, 0xbe1cf180c40c794f
+        .quad 0xc08624e805e1c5c8, 0xbe1cf185a08f7f65
+        .quad 0xc08624eb3d9633d8, 0xbe1cef45fc924078
+        .quad 0xc08624ee73ffdbb0, 0xbe1cf1e4f457f32a
+        .quad 0xc08624f1a91fc6a0, 0xbe1cf040147b8a5a
+        .quad 0xc08624f4dcf6fc98, 0xbe1cf1effca0dfb2
+        .quad 0xc08624f80f868468, 0xbe1cf0470146e5bc
+        .quad 0xc08624fb40cf6390, 0xbe1cef4dd186e501
+        .quad 0xc08624fe70d29e60, 0xbe1ceebe257f66c7
+        .quad 0xc08625019f9137f0, 0xbe1ceefb7a1c395c
+        .quad 0xc0862504cd0c3220, 0xbe1cf209dedfed8c
+        .quad 0xc0862507f9448db0, 0xbe1cf082da464994
+        .quad 0xc086250b243b4a18, 0xbe1cee88694a73cf
+        .quad 0xc086250e4df165a0, 0xbe1cf0b61e8f0531
+        .quad 0xc08625117667dd78, 0xbe1cf1106599c962
+        .quad 0xc08625149d9fad98, 0xbe1ceff1ee88af1f
+        .quad 0xc0862517c399d0c8, 0xbe1cf0f746994ef6
+        .quad 0xc086251ae85740b8, 0xbe1cefe8a1d077e4
+        .quad 0xc086251e0bd8f5e0, 0xbe1cf1a1da036092
+        .quad 0xc08625212e1fe7a8, 0xbe1cf0f8a7786fcd
+        .quad 0xc08625244f2d0c48, 0xbe1cefa1174a07a7
+        .quad 0xc08625276f0158d8, 0xbe1cef1043aa5b25
+        .quad 0xc086252a8d9dc150, 0xbe1cf15d521c169d
+        .quad 0xc086252dab033898, 0xbe1cf220bba8861f
+        .quad 0xc0862530c732b078, 0xbe1cef51e310eae2
+        .quad 0xc0862533e22d1988, 0xbe1cf222fcedd8ae
+        .quad 0xc0862536fbf36370, 0xbe1cefdb4da4bda8
+        .quad 0xc086253a14867ca0, 0xbe1ceeafc1112171
+        .quad 0xc086253d2be75280, 0xbe1cee99dfb4b408
+        .quad 0xc08625404216d160, 0xbe1cf22d2536f06b
+        .quad 0xc08625435715e498, 0xbe1cef6abbf2e268
+        .quad 0xc08625466ae57648, 0xbe1cf093a14789f5
+        .quad 0xc08625497d866fa0, 0xbe1cf0f93655603c
+        .quad 0xc086254c8ef9b8b8, 0xbe1cf1cc40c9aafc
+        .quad 0xc086254f9f4038a8, 0xbe1ceeea5f4e9157
+        .quad 0xc0862552ae5ad568, 0xbe1cefa9f52d4997
+        .quad 0xc0862555bc4a7400, 0xbe1cefa490a638ff
+        .quad 0xc0862558c90ff868, 0xbe1cef7fcf797d6f
+        .quad 0xc086255bd4ac4590, 0xbe1cf1b4c51113c9
+        .quad 0xc086255edf203d78, 0xbe1cef55e5b4a55d
+        .quad 0xc0862561e86cc100, 0xbe1cf0d37a25f9dc
+        .quad 0xc0862564f092b028, 0xbe1ceebe9efc19d9
+        .quad 0xc0862567f792e9d8, 0xbe1cee8ad30a57b5
+        .quad 0xc086256afd6e4c08, 0xbe1cef4e1817b90b
+        .quad 0xc086256e0225b3b8, 0xbe1cee7fa9229996
+        .quad 0xc086257105b9fce0, 0xbe1cf0b54963d945
+        .quad 0xc0862574082c0298, 0xbe1cee5f2f3c7995
+        .quad 0xc0862577097c9ee0, 0xbe1cf0828e303a2c
+        .quad 0xc086257a09acaae0, 0xbe1cf172c3078947
+        .quad 0xc086257d08bcfec0, 0xbe1cf189252afa22
+        .quad 0xc086258006ae71b8, 0xbe1cefdb80426923
+        .quad 0xc08625830381da08, 0xbe1ceef1391a0372
+        .quad 0xc0862585ff380d00, 0xbe1cf17720c78d13
+        .quad 0xc0862588f9d1df18, 0xbe1ceef1f9027d83
+        .quad 0xc086258bf35023b8, 0xbe1cf06fac99dec9
+        .quad 0xc086258eebb3ad78, 0xbe1cf1373eeb45c0
+        .quad 0xc0862591e2fd4e00, 0xbe1cef777536bb81
+        .quad 0xc0862594d92dd600, 0xbe1cf0f43ca40766
+        .quad 0xc0862597ce461558, 0xbe1cefb2cfc6766b
+        .quad 0xc086259ac246daf0, 0xbe1ceea49e64ffa2
+        .quad 0xc086259db530f4c8, 0xbe1cf250fa457dec
+        .quad 0xc08625a0a7053018, 0xbe1cf17d8bb2a44e
+        .quad 0xc08625a397c45918, 0xbe1cf1d5906d54b7
+        .quad 0xc08625a6876f3b30, 0xbe1cf08fe7b31780
+        .quad 0xc08625a97606a0e0, 0xbe1cef13edfc9d11
+        .quad 0xc08625ac638b53c8, 0xbe1cef9d2b107219
+        .quad 0xc08625af4ffe1cb0, 0xbe1cf1ddd4ff6160
+        .quad 0xc08625b23b5fc390, 0xbe1cefa02a996495
+        .quad 0xc08625b525b10f68, 0xbe1cf166a7e37ee5
+        .quad 0xc08625b80ef2c680, 0xbe1cef0b171068a5
+        .quad 0xc08625baf725ae28, 0xbe1cf05c80779283
+        .quad 0xc08625bdde4a8af0, 0xbe1cf1bbfbffb889
+        .quad 0xc08625c0c4622090, 0xbe1cf0b8666c0124
+        .quad 0xc08625c3a96d31e0, 0xbe1cf0a8fcf47a86
+        .quad 0xc08625c68d6c80f0, 0xbe1cef46e18cb092
+        .quad 0xc08625c97060cef0, 0xbe1cf1458a350efb
+        .quad 0xc08625cc524adc58, 0xbe1ceeea1dadce12
+        .quad 0xc08625cf332b68b0, 0xbe1cf0a1bfdc44c7
+        .quad 0xc08625d2130332d0, 0xbe1cef96d02da73e
+        .quad 0xc08625d4f1d2f8a8, 0xbe1cf2451c3c7701
+        .quad 0xc08625d7cf9b7778, 0xbe1cf10d08f83812
+        .quad 0xc08625daac5d6ba0, 0xbe1ceec5b4895c5e
+        .quad 0xc08625dd881990b0, 0xbe1cf14e1325c5e4
+        .quad 0xc08625e062d0a188, 0xbe1cf21d0904be12
+        .quad 0xc08625e33c835838, 0xbe1ceed0839bcf21
+        .quad 0xc08625e615326df0, 0xbe1cf1bb944889d2
+        .quad 0xc08625e8ecde9b48, 0xbe1cee738e85eece
+        .quad 0xc08625ebc38897e0, 0xbe1cf25c2bc6ef12
+        .quad 0xc08625ee99311ac8, 0xbe1cf132b70a41ad
+        .quad 0xc08625f16dd8da28, 0xbe1cf1984236a6e3
+        .quad 0xc08625f441808b78, 0xbe1cf19ae74998f9
+        .quad 0xc08625f71428e370, 0xbe1cef3e175d61a1
+        .quad 0xc08625f9e5d295f8, 0xbe1cf101f9868fd9
+        .quad 0xc08625fcb67e5658, 0xbe1cee69db83dcd2
+        .quad 0xc08625ff862cd6f8, 0xbe1cf081b636af51
+        .quad 0xc086260254dec9a8, 0xbe1cee62c7d59b3e
+        .quad 0xc08626052294df58, 0xbe1cf1b745c57716
+        .quad 0xc0862607ef4fc868, 0xbe1cef3d2800ea23
+        .quad 0xc086260abb103458, 0xbe1cef480ff1acd2
+        .quad 0xc086260d85d6d200, 0xbe1cf2424c9a17ef
+        .quad 0xc08626104fa44f90, 0xbe1cf12cfde90fd5
+        .quad 0xc086261318795a68, 0xbe1cf21f590dd5b6
+        .quad 0xc0862615e0569f48, 0xbe1cf0c50f9cd28a
+        .quad 0xc0862618a73cca30, 0xbe1ceedbdb520545
+        .quad 0xc086261b6d2c8668, 0xbe1cf0b030396011
+        .quad 0xc086261e32267e98, 0xbe1cf19917010e96
+        .quad 0xc0862620f62b5cb0, 0xbe1cf07331355985
+        .quad 0xc0862623b93bc9e8, 0xbe1cf01ae921a1c3
+        .quad 0xc08626267b586ed0, 0xbe1cefe5cf0dbf0c
+        .quad 0xc08626293c81f348, 0xbe1cf01b258aeb50
+        .quad 0xc086262bfcb8fe88, 0xbe1cee6b9e7f4c68
+        .quad 0xc086262ebbfe3710, 0xbe1cee684a9b21c9
+        .quad 0xc08626317a5242b8, 0xbe1cf1f8bcde9a8b
+        .quad 0xc086263437b5c6c0, 0xbe1cf1d063d36238
+        .quad 0xc0862636f42967a8, 0xbe1cf1e31a19075e
+        .quad 0xc0862639afadc950, 0xbe1cf1d8efdf7e7d
+        .quad 0xc086263c6a438ef0, 0xbe1cf1812ee72dba
+        .quad 0xc086263f23eb5b18, 0xbe1cf1449a9a2279
+        .quad 0xc0862641dca5cfb8, 0xbe1cee96edce5085
+        .quad 0xc086264494738e08, 0xbe1cf06797bd03b2
+        .quad 0xc08626474b5536b8, 0xbe1cef91b9b7ffc1
+        .quad 0xc086264a014b69c0, 0xbe1cef4b6721278f
+        .quad 0xc086264cb656c678, 0xbe1cf1942925eb4a
+        .quad 0xc086264f6a77eba8, 0xbe1cefa2c7bc2e39
+        .quad 0xc08626521daf7758, 0xbe1cf252595aceb3
+        .quad 0xc0862654cffe0718, 0xbe1cee8e9ae47ec2
+        .quad 0xc0862657816437a8, 0xbe1cf1bf913828fa
+        .quad 0xc086265a31e2a558, 0xbe1cf23475d6b366
+        .quad 0xc086265ce179ebc8, 0xbe1cef8df00a922b
+        .quad 0xc086265f902aa5f0, 0xbe1cef279bfa43e0
+        .quad 0xc08626623df56e38, 0xbe1cf080e10b8365
+        .quad 0xc0862664eadade70, 0xbe1cf1a518f9b544
+        .quad 0xc086266796db8fd0, 0xbe1cef9308fed9e9
+        .quad 0xc086266a41f81ae8, 0xbe1ceea3ae6b19c9
+        .quad 0xc086266cec3117b8, 0xbe1ceef06003d4c2
+        .quad 0xc086266f95871da8, 0xbe1cf0b8457ffb0c
+        .quad 0xc08626723dfac390, 0xbe1cf0c526745ad6
+        .quad 0xc0862674e58c9fa8, 0xbe1cf0cf91ff7b5d
+        .quad 0xc08626778c3d4798, 0xbe1cefe260819380
+        .quad 0xc086267a320d5070, 0xbe1ceebd90aa27a3
+        .quad 0xc086267cd6fd4ea8, 0xbe1cf0388121dffa
+        .quad 0xc086267f7b0dd630, 0xbe1cf1a3881435f1
+        .quad 0xc08626821e3f7a68, 0xbe1cef28e9d9ac52
+        .quad 0xc0862684c092ce08, 0xbe1cf02d300062dd
+        .quad 0xc086268762086350, 0xbe1cefaee1edfa35
+        .quad 0xc086268a02a0cbe0, 0xbe1cf0a5a052e936
+        .quad 0xc086268ca25c98d8, 0xbe1cee60a4a497ed
+        .quad 0xc086268f413c5ab0, 0xbe1cf0e4a5d0cf49
+        .quad 0xc0862691df40a170, 0xbe1cf149235a4e6e
+        .quad 0xc08626947c69fc80, 0xbe1cf215180b9fcc
+        .quad 0xc086269718b8fac8, 0xbe1cef9b156a9840
+        .quad 0xc0862699b42e2a90, 0xbe1cf054c91441be
+        .quad 0xc086269c4eca19a8, 0xbe1cf13ded26512c
+        .quad 0xc086269ee88d5550, 0xbe1cf22ea4d8ac06
+        .quad 0xc08626a181786a40, 0xbe1cf2354666ee2e
+        .quad 0xc08626a4198be4a8, 0xbe1cefef936752b3
+        .quad 0xc08626a6b0c85020, 0xbe1cf1e360a9db68
+        .quad 0xc08626a9472e37d8, 0xbe1ceed6aeb812c5
+        .quad 0xc08626abdcbe2650, 0xbe1cf227340b4986
+        .quad 0xc08626ae7178a5b0, 0xbe1cf0215a0cbe0d
+        .quad 0xc08626b1055e3f70, 0xbe1cf256adf0ae26
+        .quad 0xc08626b3986f7ca8, 0xbe1ceff3c67aed06
+        .quad 0xc08626b62aace5c8, 0xbe1cf2159fb93652
+        .quad 0xc08626b8bc1702e0, 0xbe1cf01e6dbd1c7f
+        .quad 0xc08626bb4cae5b60, 0xbe1cf009e75d1c0c
+        .quad 0xc08626bddc737648, 0xbe1ceec10a020e73
+        .quad 0xc08626c06b66da08, 0xbe1cf06d5783eee7
+        .quad 0xc08626c2f9890ca0, 0xbe1cf0cb8f169ffe
+        .quad 0xc08626c586da9388, 0xbe1cef7de2452430
+        .quad 0xc08626c8135bf3b0, 0xbe1cf05da6f783ae
+        .quad 0xc08626ca9f0db198, 0xbe1cefcc877d681d
+        .quad 0xc08626cd29f05138, 0xbe1cef0531954ab3
+        .quad 0xc08626cfb4045608, 0xbe1cf06b8565ea3d
+        .quad 0xc08626d23d4a4310, 0xbe1cefdc455d9d7e
+        .quad 0xc08626d4c5c29ad0, 0xbe1ceefc47e8fa64
+        .quad 0xc08626d74d6ddf48, 0xbe1cf1872bf033f2
+        .quad 0xc08626d9d44c9210, 0xbe1cf19d91087f9d
+        .quad 0xc08626dc5a5f3438, 0xbe1cf012d444c6ab
+        .quad 0xc08626dedfa64650, 0xbe1cf0ba528ee153
+        .quad 0xc08626e164224880, 0xbe1ceeb431709788
+        .quad 0xc08626e3e7d3ba60, 0xbe1cf0b9af31a6a5
+        .quad 0xc08626e66abb1b28, 0xbe1cf168fb2e135b
+        .quad 0xc08626e8ecd8e990, 0xbe1cef9097461c93
+        .quad 0xc08626eb6e2da3d0, 0xbe1cee7a434735d8
+        .quad 0xc08626edeeb9c7a8, 0xbe1cf235732b86f2
+        .quad 0xc08626f06e7dd280, 0xbe1cefe1510b89e6
+        .quad 0xc08626f2ed7a4120, 0xbe1cf1f64b9b80ef
+        .quad 0xc08626f56baf9000, 0xbe1cf08f320ca339
+        .quad 0xc08626f7e91e3b08, 0xbe1cf1b1de2808a1
+        .quad 0xc08626fa65c6bdc0, 0xbe1cf1976d778b28
+        .quad 0xc08626fce1a99338, 0xbe1ceef40a4f076f
+        .quad 0xc08626ff5cc73600, 0xbe1cef3e45869ce3
+        .quad 0xc0862701d7202048, 0xbe1ceef601b4c9d6
+        .quad 0xc086270450b4cbc0, 0xbe1cf1eaf0b57fd6
+        .quad 0xc0862706c985b1c0, 0xbe1cef82a44990f3
+        .quad 0xc086270941934b10, 0xbe1ceefe32981f2c
+        .quad 0xc086270bb8de1018, 0xbe1cefbf6f5a0445
+        .quad 0xc086270e2f6678d0, 0xbe1cf18dba75792c
+        .quad 0xc0862710a52cfcc8, 0xbe1cf0da64ce995f
+        .quad 0xc08627131a321318, 0xbe1cef04ac0fb802
+        .quad 0xc08627158e763268, 0xbe1cee9d4e2ad9bd
+        .quad 0xc086271801f9d0f8, 0xbe1cefa9b55407b5
+        .quad 0xc086271a74bd64a0, 0xbe1cefe6bd329570
+        .quad 0xc086271ce6c162c8, 0xbe1cef0b1205dc85
+        .quad 0xc086271f58064068, 0xbe1cef092a785e3f
+        .quad 0xc0862721c88c7210, 0xbe1cf050dcdaac30
+        .quad 0xc086272438546be8, 0xbe1cf210907ded8b
+        .quad 0xc0862726a75ea1b8, 0xbe1cee760be44f99
+        .quad 0xc086272915ab86c0, 0xbe1ceeeee07c2bcc
+        .quad 0xc086272b833b8df0, 0xbe1cf06874992df5
+        .quad 0xc086272df00f29d0, 0xbe1cef8fac5d4899
+        .quad 0xc08627305c26cc70, 0xbe1cf1103241cc99
+        .quad 0xc0862732c782e788, 0xbe1cf1d35fef83fe
+        .quad 0xc08627353223ec68, 0xbe1cef3ec8133e1d
+        .quad 0xc08627379c0a4be8, 0xbe1cef7261daccd8
+        .quad 0xc086273a05367688, 0xbe1cf18656c50806
+        .quad 0xc086273c6da8dc68, 0xbe1cf1c8736e049a
+        .quad 0xc086273ed561ed38, 0xbe1cf1f93bff4911
+        .quad 0xc08627413c621848, 0xbe1cf188a4ea680c
+        .quad 0xc0862743a2a9cc80, 0xbe1cf1d270930c80
+        .quad 0xc086274608397868, 0xbe1cf25a328c28e2
+        .quad 0xc08627486d118a28, 0xbe1cf106f90aa3b8
+        .quad 0xc086274ad1326f80, 0xbe1cee5e9d2e885a
+        .quad 0xc086274d349c95c0, 0xbe1cf1c0bac27228
+        .quad 0xc086274f975069f8, 0xbe1cf1a1500f9b1c
+        .quad 0xc0862751f94e58c0, 0xbe1cefc30663ac44
+        .quad 0xc08627545a96ce48, 0xbe1cf17123e427a2
+        .quad 0xc0862756bb2a3678, 0xbe1cefb92749fea4
+        .quad 0xc08627591b08fcc0, 0xbe1cefa40e1ea74a
+        .quad 0xc086275b7a338c40, 0xbe1cee6f4612c3e9
+        .quad 0xc086275dd8aa4fa8, 0xbe1cf1c54a053627
+        .quad 0xc0862760366db168, 0xbe1ceff5eb503d9e
+        .quad 0xc0862762937e1b70, 0xbe1cf02e47f10cee
+        .quad 0xc0862764efdbf768, 0xbe1ceeb06e1d0dad
+        .quad 0xc08627674b87ae88, 0xbe1cf10aadd6dba5
+        .quad 0xc0862769a681a9c0, 0xbe1cf24e9913d30f
+        .quad 0xc086276c00ca51a0, 0xbe1cef47b301e312
+        .quad 0xc086276e5a620e48, 0xbe1ceeb1cefc2e85
+        .quad 0xc0862770b3494788, 0xbe1cf16f1fbbe011
+        .quad 0xc08627730b8064e8, 0xbe1ceebdf75174c7
+        .quad 0xc08627756307cd70, 0xbe1cf06e3871a0da
+        .quad 0xc0862777b9dfe7f0, 0xbe1cef16799fd554
+        .quad 0xc086277a10091ac0, 0xbe1cf248dabf5377
+        .quad 0xc086277c6583cc00, 0xbe1cf0c78d92a2cd
+        .quad 0xc086277eba506158, 0xbe1cf0b911b029f0
+        .quad 0xc08627810e6f4028, 0xbe1cefdc24719766
+        .quad 0xc086278361e0cd70, 0xbe1cefbb6562b7e7
+        .quad 0xc0862785b4a56dd8, 0xbe1cf1e0afb349ec
+        .quad 0xc086278806bd85c0, 0xbe1cf008292e52fc
+        .quad 0xc086278a58297918, 0xbe1cf053073872bf
+        .quad 0xc086278ca8e9ab88, 0xbe1cf17a0a55a947
+        .quad 0xc086278ef8fe8068, 0xbe1ceeffb0b60234
+        .quad 0xc086279148685aa0, 0xbe1cf162204794a8
+        .quad 0xc086279397279ce0, 0xbe1cf24cc8cb48ac
+        .quad 0xc0862795e53ca978, 0xbe1cf0c9be68d5c3
+        .quad 0xc086279832a7e258, 0xbe1cf172cd3d7388
+        .quad 0xc086279a7f69a930, 0xbe1ceea2465fbce5
+        .quad 0xc086279ccb825f40, 0xbe1cf0a386d2500f
+        .quad 0xc086279f16f26590, 0xbe1cf1e338ddc18a
+        .quad 0xc08627a161ba1cd0, 0xbe1cef1f5049867f
+        .quad 0xc08627a3abd9e548, 0xbe1cef96c1ea8b1f
+        .quad 0xc08627a5f5521f00, 0xbe1cf138f6fd3c26
+        .quad 0xc08627a83e2329b0, 0xbe1cf0d4fcbfdf3a
+        .quad 0xc08627aa864d64b0, 0xbe1cf24870c12c81
+        .quad 0xc08627accdd12f18, 0xbe1cf0ae2a56348d
+        .quad 0xc08627af14aee7a0, 0xbe1cee8ca1a9b893
+        .quad 0xc08627b15ae6eca8, 0xbe1cf20414d637b0
+        .quad 0xc08627b3a0799c60, 0xbe1cf0fc6b7b12d8
+        .quad 0xc08627b5e5675488, 0xbe1cf152d93c4a00
+        .quad 0xc08627b829b072a0, 0xbe1cf1073f9b77c2
+        .quad 0xc08627ba6d5553d8, 0xbe1cee694f97d5a4
+        .quad 0xc08627bcb0565500, 0xbe1cf0456b8239d7
+        .quad 0xc08627bef2b3d2b0, 0xbe1cf211497127e3
+        .quad 0xc08627c1346e2930, 0xbe1cf01856c0384d
+        .quad 0xc08627c37585b468, 0xbe1cefa7dd05479e
+        .quad 0xc08627c5b5fad000, 0xbe1cef3ae8e50b93
+        .quad 0xc08627c7f5cdd750, 0xbe1ceea5f32fdd3a
+        .quad 0xc08627ca34ff2560, 0xbe1cef424caeb8d9
+        .quad 0xc08627cc738f14f0, 0xbe1cf0194d07a81f
+        .quad 0xc08627ceb17e0070, 0xbe1cf20f452000c1
+        .quad 0xc08627d0eecc4210, 0xbe1cf00e356218e4
+        .quad 0xc08627d32b7a33a0, 0xbe1cef30484b4bcb
+        .quad 0xc08627d567882eb0, 0xbe1ceeea11a6641b
+        .quad 0xc08627d7a2f68c80, 0xbe1cf13492d5bd7b
+        .quad 0xc08627d9ddc5a618, 0xbe1ceeb7048fad96
+        .quad 0xc08627dc17f5d418, 0xbe1ceef0666f0477
+        .quad 0xc08627de51876ee8, 0xbe1cf060d4b8b5c2
+        .quad 0xc08627e08a7acea8, 0xbe1cf0b2a4b6ff8c
+        .quad 0xc08627e2c2d04b28, 0xbe1cf0e34809a875
+        .quad 0xc08627e4fa883bf0, 0xbe1cf16bf74a3522
+        .quad 0xc08627e731a2f848, 0xbe1cee6a24623d57
+        .quad 0xc08627e96820d718, 0xbe1cefc7b4f1528e
+        .quad 0xc08627eb9e022f18, 0xbe1cf163051f3548
+        .quad 0xc08627edd34756b8, 0xbe1cef36b3366305
+        .quad 0xc08627f007f0a408, 0xbe1cf18134625550
+        .quad 0xc08627f23bfe6cf0, 0xbe1cf0ec32ec1a11
+        .quad 0xc08627f46f710700, 0xbe1ceeb3b64f3edc
+        .quad 0xc08627f6a248c778, 0xbe1cf0cd15805bc8
+        .quad 0xc08627f8d4860368, 0xbe1cf20db3bddebe
+        .quad 0xc08627fb06290f90, 0xbe1cf25188430e25
+        .quad 0xc08627fd37324070, 0xbe1ceea1713490f9
+        .quad 0xc08627ff67a1ea28, 0xbe1cf159521d234c
+        .quad 0xc0862801977860b8, 0xbe1cf24dfe50783b
+        .quad 0xc0862803c6b5f7d0, 0xbe1ceef2ef89a60b
+        .quad 0xc0862805f55b02c8, 0xbe1cee7fc919d62c
+        .quad 0xc08628082367d4c0, 0xbe1cf215a7fb513a
+        .quad 0xc086280a50dcc0a8, 0xbe1cf0e4401c5ed4
+        .quad 0xc086280c7dba1910, 0xbe1cf04ec734d256
+        .quad 0xc086280eaa003050, 0xbe1cf010ad787fea
+        .quad 0xc0862810d5af5880, 0xbe1cee622478393d
+        .quad 0xc086281300c7e368, 0xbe1cf01c7482564f
+        .quad 0xc08628152b4a22a0, 0xbe1cf0de20d33536
+        .quad 0xc086281755366778, 0xbe1cef2edae5837d
+        .quad 0xc08628197e8d02f0, 0xbe1cf0a345318cc9
+        .quad 0xc086281ba74e45d8, 0xbe1cf20085aa34b8
+        .quad 0xc086281dcf7a80c0, 0xbe1cef5fa845ad83
+        .quad 0xc086281ff71203e0, 0xbe1cf050d1df69c4
+        .quad 0xc08628221e151f48, 0xbe1ceffe43c035b9
+        .quad 0xc0862824448422b8, 0xbe1cf14f3018d3c2
+        .quad 0xc08628266a5f5dc0, 0xbe1cef0a5fbae83d
+        .quad 0xc08628288fa71f98, 0xbe1ceff8a95b72a1
+        .quad 0xc086282ab45bb750, 0xbe1cef073aa9849b
+        .quad 0xc086282cd87d73a8, 0xbe1cef69b3835c02
+        .quad 0xc086282efc0ca328, 0xbe1cf0bc139379a9
+        .quad 0xc08628311f099420, 0xbe1cef247a9ec596
+        .quad 0xc086283341749490, 0xbe1cef74bbcc488a
+        .quad 0xc0862835634df248, 0xbe1cef4bc42e7b8e
+        .quad 0xc08628378495fad0, 0xbe1cf136d4d5a810
+        .quad 0xc0862839a54cfb80, 0xbe1cf0d290b24dd8
+        .quad 0xc086283bc5734168, 0xbe1ceeebde8e0065
+        .quad 0xc086283de5091950, 0xbe1cf1a09f60aa1e
+        .quad 0xc0862840040ecfe0, 0xbe1cf0803947a234
+        .quad 0xc08628422284b168, 0xbe1cf0abf7638127
+        .quad 0xc0862844406b0a08, 0xbe1cf0f73ee12058
+        .quad 0xc08628465dc225a0, 0xbe1cf2079971b26c
+        .quad 0xc08628487a8a4fe0, 0xbe1cee74957564b1
+        .quad 0xc086284a96c3d420, 0xbe1ceee77c1b7d43
+        .quad 0xc086284cb26efd90, 0xbe1cf23addba6e09
+        .quad 0xc086284ecd8c1730, 0xbe1cf199f4a1da60
+        .quad 0xc0862850e81b6bb0, 0xbe1cf09fdea81393
+        .quad 0xc0862853021d4588, 0xbe1cf176adb417f7
+        .quad 0xc08628551b91ef00, 0xbe1cf0f64f84a8da
+        .quad 0xc08628573479b220, 0xbe1ceec34cf49523
+        .quad 0xc08628594cd4d8a8, 0xbe1cf16d60fbe0bb
+        .quad 0xc086285b64a3ac40, 0xbe1cee8de7acfc7b
+        .quad 0xc086285d7be67630, 0xbe1ceee6256cce8d
+        .quad 0xc086285f929d7fa0, 0xbe1cee7d66a3d8a5
+        .quad 0xc0862861a8c91170, 0xbe1cf0bef8265792
+        .quad 0xc0862863be697458, 0xbe1cf097f890c6f8
+        .quad 0xc0862865d37ef0c8, 0xbe1cf09502d5c3fc
+        .quad 0xc0862867e809cf00, 0xbe1ceeffb239dac7
+        .quad 0xc0862869fc0a56f8, 0xbe1cf1fbfff95c98
+        .quad 0xc086286c0f80d090, 0xbe1cefa57ad3eef7
+        .quad 0xc086286e226d8348, 0xbe1cf22c58b9183d
+        .quad 0xc086287034d0b690, 0xbe1ceff262d0a248
+        .quad 0xc086287246aab180, 0xbe1cefa7bc194186
+        .quad 0xc086287457fbbb08, 0xbe1cf06782d784d9
+        .quad 0xc086287668c419e0, 0xbe1cf1d44d0eaa07
+        .quad 0xc086287879041490, 0xbe1cf034803c8a48
+        .quad 0xc086287a88bbf158, 0xbe1cf08e84916b6f
+        .quad 0xc086287c97ebf650, 0xbe1cf0c4d3dc1bc7
+        .quad 0xc086287ea6946958, 0xbe1cefb1e4625943
+        .quad 0xc0862880b4b59010, 0xbe1cf143efdd1fd0
+        .quad 0xc0862882c24faff8, 0xbe1cee9896d016da
+        .quad 0xc0862884cf630e38, 0xbe1cf2186072f2cc
+        .quad 0xc0862886dbefeff0, 0xbe1cef9217633d34
+        .quad 0xc0862888e7f699e0, 0xbe1cf05603549486
+        .quad 0xc086288af37750b0, 0xbe1cef50fff513d3
+        .quad 0xc086288cfe7258c0, 0xbe1cf127713b32d0
+        .quad 0xc086288f08e7f650, 0xbe1cf05015520f3d
+        .quad 0xc086289112d86d58, 0xbe1cf12eb458b26f
+        .quad 0xc08628931c4401a8, 0xbe1cf22eae2887ed
+        .quad 0xc0862895252af6e0, 0xbe1cefdd6656dd2d
+        .quad 0xc08628972d8d9058, 0xbe1cf1048ea4e646
+        .quad 0xc0862899356c1150, 0xbe1ceec4501167e9
+        .quad 0xc086289b3cc6bcb8, 0xbe1cf0ad52becc3f
+        .quad 0xc086289d439dd568, 0xbe1cf0daa4e00e35
+        .quad 0xc086289f49f19df8, 0xbe1cf00b80de8d6a
+        .quad 0xc08628a14fc258c8, 0xbe1cf1bcf2ea8464
+        .quad 0xc08628a355104818, 0xbe1cf0435e2782b0
+        .quad 0xc08628a559dbade0, 0xbe1cf0e3e1a5f56c
+        .quad 0xc08628a75e24cbf8, 0xbe1cefed9d5a721d
+        .quad 0xc08628a961ebe3f8, 0xbe1cf0d2d74321e2
+        .quad 0xc08628ab65313750, 0xbe1cf24200eb55e9
+        .quad 0xc08628ad67f50740, 0xbe1cf23e9d7cf979
+        .quad 0xc08628af6a3794d0, 0xbe1cf23a088f421c
+        .quad 0xc08628b16bf920e0, 0xbe1cef2c1de1ab32
+        .quad 0xc08628b36d39ec08, 0xbe1cf1abc231f7b2
+        .quad 0xc08628b56dfa36d0, 0xbe1cf2074d5ba303
+        .quad 0xc08628b76e3a4180, 0xbe1cf05cd5eed880
+        /* Log_LA_table */
+        .align 16
+        .quad 0x8000000000000000
+        .quad 0xbf5ff802a9ab10e6
+        .quad 0xbf6ff00aa2b10bc0
+        .quad 0xbf77ee11ebd82e94
+        .quad 0xbf7fe02a6b106789
+        .quad 0xbf83e7295d25a7d9
+        .quad 0xbf87dc475f810a77
+        .quad 0xbf8bcf712c74384c
+        .quad 0xbf8fc0a8b0fc03e4
+        .quad 0xbf91d7f7eb9eebe7
+        .quad 0xbf93cea44346a575
+        .quad 0xbf95c45a51b8d389
+        .quad 0xbf97b91b07d5b11b
+        .quad 0xbf99ace7551cc514
+        .quad 0xbf9b9fc027af9198
+        .quad 0xbf9d91a66c543cc4
+        .quad 0xbf9f829b0e783300
+        .quad 0xbfa0b94f7c196176
+        .quad 0xbfa1b0d98923d980
+        .quad 0xbfa2a7ec2214e873
+        .quad 0xbfa39e87b9febd60
+        .quad 0xbfa494acc34d911c
+        .quad 0xbfa58a5bafc8e4d5
+        .quad 0xbfa67f94f094bd98
+        .quad 0xbfa77458f632dcfc
+        .quad 0xbfa868a83083f6cf
+        .quad 0xbfa95c830ec8e3eb
+        .quad 0xbfaa4fe9ffa3d235
+        .quad 0xbfab42dd711971bf
+        .quad 0xbfac355dd0921f2d
+        .quad 0xbfad276b8adb0b52
+        .quad 0xbfae19070c276016
+        .quad 0xbfaf0a30c01162a6
+        .quad 0xbfaffae9119b9303
+        .quad 0xbfb075983598e471
+        .quad 0xbfb0ed839b5526fe
+        .quad 0xbfb16536eea37ae1
+        .quad 0xbfb1dcb263db1944
+        .quad 0xbfb253f62f0a1417
+        .quad 0xbfb2cb0283f5de1f
+        .quad 0xbfb341d7961bd1d1
+        .quad 0xbfb3b87598b1b6ee
+        .quad 0xbfb42edcbea646f0
+        .quad 0xbfb4a50d3aa1b040
+        .quad 0xbfb51b073f06183f
+        .quad 0xbfb590cafdf01c28
+        .quad 0xbfb60658a93750c4
+        .quad 0xbfb67bb0726ec0fc
+        .quad 0xbfb6f0d28ae56b4c
+        .quad 0xbfb765bf23a6be13
+        .quad 0xbfb7da766d7b12cd
+        .quad 0xbfb84ef898e8282a
+        .quad 0xbfb8c345d6319b21
+        .quad 0xbfb9375e55595ede
+        .quad 0xbfb9ab42462033ad
+        .quad 0xbfba1ef1d8061cd4
+        .quad 0xbfba926d3a4ad563
+        .quad 0xbfbb05b49bee43fe
+        .quad 0xbfbb78c82bb0eda1
+        .quad 0xbfbbeba818146765
+        .quad 0xbfbc5e548f5bc743
+        .quad 0xbfbcd0cdbf8c13e1
+        .quad 0xbfbd4313d66cb35d
+        .quad 0xbfbdb5270187d927
+        .quad 0xbfbe27076e2af2e6
+        .quad 0xbfbe98b549671467
+        .quad 0xbfbf0a30c01162a6
+        .quad 0xbfbf7b79fec37ddf
+        .quad 0xbfbfec9131dbeabb
+        .quad 0xbfc02ebb42bf3d4b
+        .quad 0xbfc0671512ca596e
+        .quad 0xbfc09f561ee719c3
+        .quad 0xbfc0d77e7cd08e59
+        .quad 0xbfc10f8e422539b1
+        .quad 0xbfc14785846742ac
+        .quad 0xbfc17f6458fca611
+        .quad 0xbfc1b72ad52f67a0
+        .quad 0xbfc1eed90e2dc2c3
+        .quad 0xbfc2266f190a5acb
+        .quad 0xbfc25ded0abc6ad2
+        .quad 0xbfc29552f81ff523
+        .quad 0xbfc2cca0f5f5f251
+        .quad 0xbfc303d718e47fd3
+        .quad 0xbfc33af575770e4f
+        .quad 0xbfc371fc201e8f74
+        .quad 0xbfc3a8eb2d31a376
+        .quad 0xbfc3dfc2b0ecc62a
+        .quad 0xbfc41682bf727bc0
+        .quad 0xbfc44d2b6ccb7d1e
+        .quad 0xbfc483bccce6e3dd
+        .quad 0xbfc4ba36f39a55e5
+        .quad 0xbfc4f099f4a230b2
+        .quad 0xbfc526e5e3a1b438
+        .quad 0xbfc55d1ad4232d6f
+        .quad 0xbfc59338d9982086
+        .quad 0xbfc5c940075972b9
+        .quad 0xbfc5ff3070a793d4
+        .quad 0xbfc6350a28aaa758
+        .quad 0xbfc66acd4272ad51
+        .quad 0xbfc6a079d0f7aad2
+        .quad 0xbfc6d60fe719d21d
+        .quad 0xbfc70b8f97a1aa75
+        .quad 0xbfc740f8f54037a5
+        .quad 0xbfc7764c128f2127
+        .quad 0xbfc7ab890210d909
+        .quad 0xbfc7e0afd630c274
+        .quad 0xbfc815c0a14357eb
+        .quad 0xbfc84abb75865139
+        .quad 0xbfc87fa06520c911
+        .quad 0xbfc8b46f8223625b
+        .quad 0xbfc8e928de886d41
+        .quad 0xbfc91dcc8c340bde
+        .quad 0xbfc9525a9cf456b4
+        .quad 0xbfc986d3228180ca
+        .quad 0xbfc9bb362e7dfb83
+        .quad 0xbfc9ef83d2769a34
+        .quad 0xbfca23bc1fe2b563
+        .quad 0xbfca57df28244dcd
+        .quad 0xbfca8becfc882f19
+        .quad 0xbfcabfe5ae46124c
+        .quad 0xbfcaf3c94e80bff3
+        .quad 0xbfcb2797ee46320c
+        .quad 0xbfcb5b519e8fb5a4
+        .quad 0xbfcb8ef670420c3b
+        .quad 0xbfcbc286742d8cd6
+        .quad 0xbfcbf601bb0e44e2
+        .quad 0xbfcc2968558c18c1
+        .quad 0xbfcc5cba543ae425
+        .quad 0xbfcc8ff7c79a9a22
+        .quad 0xbfccc320c0176502
+        .quad 0xbfccf6354e09c5dc
+        .quad 0xbfcd293581b6b3e7
+        .quad 0xbfcd5c216b4fbb91
+        .quad 0xbfcd8ef91af31d5e
+        .quad 0xbfcdc1bca0abec7d
+        .quad 0xbfcdf46c0c722d2f
+        .quad 0xbfce27076e2af2e6
+        .quad 0xbfce598ed5a87e2f
+        .quad 0xbfce8c0252aa5a60
+        .quad 0xbfcebe61f4dd7b0b
+        .quad 0xbfcef0adcbdc5936
+        .quad 0xbfcf22e5e72f105d
+        .quad 0xbfcf550a564b7b37
+        .quad 0xbfcf871b28955045
+        .quad 0xbfcfb9186d5e3e2b
+        .quad 0xbfcfeb0233e607cc
+        .quad 0xbfd00e6c45ad501d
+        .quad 0xbfd0274dc16c232f
+        .quad 0xbfd0402594b4d041
+        .quad 0xbfd058f3c703ebc6
+        .quad 0xbfd071b85fcd590d
+        .quad 0xbfd08a73667c57af
+        .quad 0xbfd0a324e27390e3
+        .quad 0xbfd0bbccdb0d24bd
+        .quad 0xbfd0d46b579ab74b
+        .quad 0xbfd0ed005f657da4
+        .quad 0xbfd1058bf9ae4ad5
+        .quad 0xbfd11e0e2dad9cb7
+        .quad 0xbfd136870293a8b0
+        .quad 0xbfd14ef67f88685a
+        .quad 0xbfd1675cababa60e
+        .quad 0xbfd17fb98e15095d
+        .quad 0xbfd1980d2dd4236f
+        .quad 0xbfd1b05791f07b49
+        .quad 0xbfd1c898c16999fb
+        .quad 0xbfd1e0d0c33716be
+        .quad 0xbfd1f8ff9e48a2f3
+        .quad 0xbfd211255986160c
+        .quad 0xbfd22941fbcf7966
+        .quad 0xbfd241558bfd1404
+        .quad 0xbfd2596010df763a
+        .quad 0xbfd27161913f853d
+        .quad 0xbfd2895a13de86a3
+        .quad 0xbfd2a1499f762bc9
+        .quad 0xbfd2b9303ab89d25
+        .quad 0xbfd2d10dec508583
+        .quad 0xbfd2e8e2bae11d31
+        .quad 0xbfd300aead06350c
+        .quad 0xbfd31871c9544185
+        .quad 0xbfd3302c16586588
+        .quad 0xbfd347dd9a987d55
+        .quad 0xbfd35f865c93293e
+        .quad 0xbfd3772662bfd85b
+        .quad 0xbfd38ebdb38ed321
+        .quad 0xbfd3a64c556945ea
+        .quad 0xbfd3bdd24eb14b6a
+        .quad 0xbfd3d54fa5c1f710
+        .quad 0xbfd3ecc460ef5f50
+        .quad 0xbfd404308686a7e4
+        .quad 0xbfd41b941cce0bee
+        .quad 0xbfd432ef2a04e814
+        .quad 0xbfd44a41b463c47c
+        .quad 0xbfd4618bc21c5ec2
+        .quad 0xbfd478cd5959b3d9
+        .quad 0xbfd49006804009d1
+        .quad 0xbfd4a7373cecf997
+        .quad 0xbfd4be5f957778a1
+        .quad 0xbfd4d57f8fefe27f
+        .quad 0xbfd4ec973260026a
+        .quad 0xbfd503a682cb1cb3
+        .quad 0xbfd51aad872df82d
+        .quad 0xbfd531ac457ee77e
+        .quad 0xbfd548a2c3add263
+        .quad 0xbfd55f9107a43ee2
+        .quad 0xbfd5767717455a6c
+        .quad 0xbfd58d54f86e02f2
+        .quad 0xbfd5a42ab0f4cfe2
+        .quad 0xbfd5baf846aa1b19
+        .quad 0xbfd5d1bdbf5809ca
+        .quad 0xbfd5e87b20c2954a
+        .quad 0xbfd5ff3070a793d4
+        .quad 0xbfd615ddb4bec13c
+        .quad 0xbfd62c82f2b9c795
+        .quad 0x3fd61965cdb02c1f
+        .quad 0x3fd602d08af091ec
+        .quad 0x3fd5ec433d5c35ae
+        .quad 0x3fd5d5bddf595f30
+        .quad 0x3fd5bf406b543db2
+        .quad 0x3fd5a8cadbbedfa1
+        .quad 0x3fd5925d2b112a59
+        .quad 0x3fd57bf753c8d1fb
+        .quad 0x3fd565995069514c
+        .quad 0x3fd54f431b7be1a9
+        .quad 0x3fd538f4af8f72fe
+        .quad 0x3fd522ae0738a3d8
+        .quad 0x3fd50c6f1d11b97c
+        .quad 0x3fd4f637ebba9810
+        .quad 0x3fd4e0086dd8baca
+        .quad 0x3fd4c9e09e172c3c
+        .quad 0x3fd4b3c077267e9a
+        .quad 0x3fd49da7f3bcc41f
+        .quad 0x3fd487970e958770
+        .quad 0x3fd4718dc271c41b
+        .quad 0x3fd45b8c0a17df13
+        .quad 0x3fd44591e0539f49
+        .quad 0x3fd42f9f3ff62642
+        .quad 0x3fd419b423d5e8c7
+        .quad 0x3fd403d086cea79c
+        .quad 0x3fd3edf463c1683e
+        .quad 0x3fd3d81fb5946dba
+        .quad 0x3fd3c25277333184
+        .quad 0x3fd3ac8ca38e5c5f
+        .quad 0x3fd396ce359bbf54
+        .quad 0x3fd3811728564cb2
+        .quad 0x3fd36b6776be1117
+        .quad 0x3fd355bf1bd82c8b
+        .quad 0x3fd3401e12aecba1
+        .quad 0x3fd32a84565120a8
+        .quad 0x3fd314f1e1d35ce4
+        .quad 0x3fd2ff66b04ea9d4
+        .quad 0x3fd2e9e2bce12286
+        .quad 0x3fd2d46602adccee
+        .quad 0x3fd2bef07cdc9354
+        .quad 0x3fd2a982269a3dbf
+        .quad 0x3fd2941afb186b7c
+        .quad 0x3fd27ebaf58d8c9d
+        .quad 0x3fd269621134db92
+        .quad 0x3fd25410494e56c7
+        .quad 0x3fd23ec5991eba49
+        .quad 0x3fd22981fbef797b
+        .quad 0x3fd214456d0eb8d4
+        .quad 0x3fd1ff0fe7cf47a7
+        .quad 0x3fd1e9e1678899f4
+        .quad 0x3fd1d4b9e796c245
+        .quad 0x3fd1bf99635a6b95
+        .quad 0x3fd1aa7fd638d33f
+        .quad 0x3fd1956d3b9bc2fa
+        .quad 0x3fd180618ef18adf
+        .quad 0x3fd16b5ccbacfb73
+        .quad 0x3fd1565eed455fc3
+        .quad 0x3fd14167ef367783
+        .quad 0x3fd12c77cd00713b
+        .quad 0x3fd1178e8227e47c
+        .quad 0x3fd102ac0a35cc1c
+        .quad 0x3fd0edd060b78081
+        .quad 0x3fd0d8fb813eb1ef
+        .quad 0x3fd0c42d676162e3
+        .quad 0x3fd0af660eb9e279
+        .quad 0x3fd09aa572e6c6d4
+        .quad 0x3fd085eb8f8ae797
+        .quad 0x3fd07138604d5862
+        .quad 0x3fd05c8be0d9635a
+        .quad 0x3fd047e60cde83b8
+        .quad 0x3fd03346e0106062
+        .quad 0x3fd01eae5626c691
+        .quad 0x3fd00a1c6adda473
+        .quad 0x3fcfeb2233ea07cd
+        .quad 0x3fcfc218be620a5e
+        .quad 0x3fcf991c6cb3b379
+        .quad 0x3fcf702d36777df0
+        .quad 0x3fcf474b134df229
+        .quad 0x3fcf1e75fadf9bde
+        .quad 0x3fcef5ade4dcffe6
+        .quad 0x3fceccf2c8fe920a
+        .quad 0x3fcea4449f04aaf5
+        .quad 0x3fce7ba35eb77e2a
+        .quad 0x3fce530effe71012
+        .quad 0x3fce2a877a6b2c12
+        .quad 0x3fce020cc6235ab5
+        .quad 0x3fcdd99edaf6d7e9
+        .quad 0x3fcdb13db0d48940
+        .quad 0x3fcd88e93fb2f450
+        .quad 0x3fcd60a17f903515
+        .quad 0x3fcd38666871f465
+        .quad 0x3fcd1037f2655e7b
+        .quad 0x3fcce816157f1988
+        .quad 0x3fccc000c9db3c52
+        .quad 0x3fcc97f8079d44ec
+        .quad 0x3fcc6ffbc6f00f71
+        .quad 0x3fcc480c0005ccd1
+        .quad 0x3fcc2028ab17f9b4
+        .quad 0x3fcbf851c067555f
+        .quad 0x3fcbd087383bd8ad
+        .quad 0x3fcba8c90ae4ad19
+        .quad 0x3fcb811730b823d2
+        .quad 0x3fcb5971a213acdb
+        .quad 0x3fcb31d8575bce3d
+        .quad 0x3fcb0a4b48fc1b46
+        .quad 0x3fcae2ca6f672bd4
+        .quad 0x3fcabb55c31693ad
+        .quad 0x3fca93ed3c8ad9e3
+        .quad 0x3fca6c90d44b704e
+        .quad 0x3fca454082e6ab05
+        .quad 0x3fca1dfc40f1b7f1
+        .quad 0x3fc9f6c407089664
+        .quad 0x3fc9cf97cdce0ec3
+        .quad 0x3fc9a8778debaa38
+        .quad 0x3fc981634011aa75
+        .quad 0x3fc95a5adcf7017f
+        .quad 0x3fc9335e5d594989
+        .quad 0x3fc90c6db9fcbcd9
+        .quad 0x3fc8e588ebac2dbf
+        .quad 0x3fc8beafeb38fe8c
+        .quad 0x3fc897e2b17b19a5
+        .quad 0x3fc871213750e994
+        .quad 0x3fc84a6b759f512f
+        .quad 0x3fc823c16551a3c2
+        .quad 0x3fc7fd22ff599d4f
+        .quad 0x3fc7d6903caf5ad0
+        .quad 0x3fc7b0091651528c
+        .quad 0x3fc7898d85444c73
+        .quad 0x3fc7631d82935a86
+        .quad 0x3fc73cb9074fd14d
+        .quad 0x3fc716600c914054
+        .quad 0x3fc6f0128b756abc
+        .quad 0x3fc6c9d07d203fc7
+        .quad 0x3fc6a399dabbd383
+        .quad 0x3fc67d6e9d785771
+        .quad 0x3fc6574ebe8c133a
+        .quad 0x3fc6313a37335d76
+        .quad 0x3fc60b3100b09476
+        .quad 0x3fc5e533144c1719
+        .quad 0x3fc5bf406b543db2
+        .quad 0x3fc59958ff1d52f1
+        .quad 0x3fc5737cc9018cdd
+        .quad 0x3fc54dabc26105d2
+        .quad 0x3fc527e5e4a1b58d
+        .quad 0x3fc5022b292f6a45
+        .quad 0x3fc4dc7b897bc1c8
+        .quad 0x3fc4b6d6fefe22a4
+        .quad 0x3fc4913d8333b561
+        .quad 0x3fc46baf0f9f5db7
+        .quad 0x3fc4462b9dc9b3dc
+        .quad 0x3fc420b32740fdd4
+        .quad 0x3fc3fb45a59928cc
+        .quad 0x3fc3d5e3126bc27f
+        .quad 0x3fc3b08b6757f2a9
+        .quad 0x3fc38b3e9e027479
+        .quad 0x3fc365fcb0159016
+        .quad 0x3fc340c59741142e
+        .quad 0x3fc31b994d3a4f85
+        .quad 0x3fc2f677cbbc0a96
+        .quad 0x3fc2d1610c86813a
+        .quad 0x3fc2ac55095f5c59
+        .quad 0x3fc28753bc11aba5
+        .quad 0x3fc2625d1e6ddf57
+        .quad 0x3fc23d712a49c202
+        .quad 0x3fc2188fd9807263
+        .quad 0x3fc1f3b925f25d41
+        .quad 0x3fc1ceed09853752
+        .quad 0x3fc1aa2b7e23f72a
+        .quad 0x3fc185747dbecf34
+        .quad 0x3fc160c8024b27b1
+        .quad 0x3fc13c2605c398c3
+        .quad 0x3fc1178e8227e47c
+        .quad 0x3fc0f301717cf0fb
+        .quad 0x3fc0ce7ecdccc28d
+        .quad 0x3fc0aa06912675d5
+        .quad 0x3fc08598b59e3a07
+        .quad 0x3fc06135354d4b18
+        .quad 0x3fc03cdc0a51ec0d
+        .quad 0x3fc0188d2ecf6140
+        .quad 0x3fbfe89139dbd566
+        .quad 0x3fbfa01c9db57ce2
+        .quad 0x3fbf57bc7d9005db
+        .quad 0x3fbf0f70cdd992e3
+        .quad 0x3fbec739830a1120
+        .quad 0x3fbe7f1691a32d3e
+        .quad 0x3fbe3707ee30487b
+        .quad 0x3fbdef0d8d466db9
+        .quad 0x3fbda727638446a2
+        .quad 0x3fbd5f55659210e2
+        .quad 0x3fbd179788219364
+        .quad 0x3fbccfedbfee13a8
+        .quad 0x3fbc885801bc4b23
+        .quad 0x3fbc40d6425a5cb1
+        .quad 0x3fbbf968769fca11
+        .quad 0x3fbbb20e936d6974
+        .quad 0x3fbb6ac88dad5b1c
+        .quad 0x3fbb23965a52ff00
+        .quad 0x3fbadc77ee5aea8c
+        .quad 0x3fba956d3ecade63
+        .quad 0x3fba4e7640b1bc38
+        .quad 0x3fba0792e9277cac
+        .quad 0x3fb9c0c32d4d2548
+        .quad 0x3fb97a07024cbe74
+        .quad 0x3fb9335e5d594989
+        .quad 0x3fb8ecc933aeb6e8
+        .quad 0x3fb8a6477a91dc29
+        .quad 0x3fb85fd927506a48
+        .quad 0x3fb8197e2f40e3f0
+        .quad 0x3fb7d33687c293c9
+        .quad 0x3fb78d02263d82d3
+        .quad 0x3fb746e100226ed9
+        .quad 0x3fb700d30aeac0e1
+        .quad 0x3fb6bad83c1883b6
+        .quad 0x3fb674f089365a7a
+        .quad 0x3fb62f1be7d77743
+        .quad 0x3fb5e95a4d9791cb
+        .quad 0x3fb5a3abb01ade25
+        .quad 0x3fb55e10050e0384
+        .quad 0x3fb518874226130a
+        .quad 0x3fb4d3115d207eac
+        .quad 0x3fb48dae4bc31018
+        .quad 0x3fb4485e03dbdfad
+        .quad 0x3fb403207b414b7f
+        .quad 0x3fb3bdf5a7d1ee64
+        .quad 0x3fb378dd7f749714
+        .quad 0x3fb333d7f8183f4b
+        .quad 0x3fb2eee507b40301
+        .quad 0x3fb2aa04a44717a5
+        .quad 0x3fb26536c3d8c369
+        .quad 0x3fb2207b5c78549e
+        .quad 0x3fb1dbd2643d190b
+        .quad 0x3fb1973bd1465567
+        .quad 0x3fb152b799bb3cc9
+        .quad 0x3fb10e45b3cae831
+        .quad 0x3fb0c9e615ac4e17
+        .quad 0x3fb08598b59e3a07
+        .quad 0x3fb0415d89e74444
+        .quad 0x3faffa6911ab9301
+        .quad 0x3faf723b517fc523
+        .quad 0x3faeea31c006b87c
+        .quad 0x3fae624c4a0b5e1b
+        .quad 0x3fadda8adc67ee4e
+        .quad 0x3fad52ed6405d86f
+        .quad 0x3faccb73cdddb2cc
+        .quad 0x3fac441e06f72a9e
+        .quad 0x3fabbcebfc68f420
+        .quad 0x3fab35dd9b58baad
+        .quad 0x3faaaef2d0fb10fc
+        .quad 0x3faa282b8a936171
+        .quad 0x3fa9a187b573de7c
+        .quad 0x3fa91b073efd7314
+        .quad 0x3fa894aa149fb343
+        .quad 0x3fa80e7023d8ccc4
+        .quad 0x3fa788595a3577ba
+        .quad 0x3fa70265a550e777
+        .quad 0x3fa67c94f2d4bb58
+        .quad 0x3fa5f6e73078efb8
+        .quad 0x3fa5715c4c03ceef
+        .quad 0x3fa4ebf43349e26f
+        .quad 0x3fa466aed42de3ea
+        .quad 0x3fa3e18c1ca0ae92
+        .quad 0x3fa35c8bfaa1306b
+        .quad 0x3fa2d7ae5c3c5bae
+        .quad 0x3fa252f32f8d183f
+        .quad 0x3fa1ce5a62bc353a
+        .quad 0x3fa149e3e4005a8d
+        .quad 0x3fa0c58fa19dfaaa
+        .quad 0x3fa0415d89e74444
+        .quad 0x3f9f7a9b16782856
+        .quad 0x3f9e72bf2813ce51
+        .quad 0x3f9d6b2725979802
+        .quad 0x3f9c63d2ec14aaf2
+        .quad 0x3f9b5cc258b718e6
+        .quad 0x3f9a55f548c5c43f
+        .quad 0x3f994f6b99a24475
+        .quad 0x3f98492528c8cabf
+        .quad 0x3f974321d3d006d3
+        .quad 0x3f963d6178690bd6
+        .quad 0x3f9537e3f45f3565
+        .quad 0x3f9432a925980cc1
+        .quad 0x3f932db0ea132e22
+        .quad 0x3f9228fb1fea2e28
+        .quad 0x3f912487a5507f70
+        .quad 0x3f90205658935847
+        .quad 0x3f8e38ce3033310c
+        .quad 0x3f8c317384c75f06
+        .quad 0x3f8a2a9c6c170462
+        .quad 0x3f882448a388a2aa
+        .quad 0x3f861e77e8b53fc6
+        .quad 0x3f841929f96832f0
+        .quad 0x3f82145e939ef1e9
+        .quad 0x3f8010157588de71
+        .quad 0x3f7c189cbb0e27fb
+        .quad 0x3f78121214586b54
+        .quad 0x3f740c8a747878e2
+        .quad 0x3f70080559588b35
+        .quad 0x3f680904828985c0
+        .quad 0x3f60040155d5889e
+        .quad 0x3f50020055655889
+        .quad 0x0000000000000000
+        /*== poly_coeff[4] ==*/
+        .align 16
+        .quad 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A /* coeff4 */
+        .quad 0xbfd0000148058EE1, 0xbfd0000148058EE1 /* coeff3 */
+        .quad 0x3fd55555555543C5, 0x3fd55555555543C5 /* coeff2 */
+        .quad 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F /* coeff1 */
+        /*== ExpMask ==*/
+        .align 16
+        .quad 0x000fffffffffffff, 0x000fffffffffffff
+        /*== Two10 ==*/
+        .align 16
+        .quad 0x3f50000000000000, 0x3f50000000000000
+        /*== MinLog1p = -1+2^(-53) ==*/
+        .align 16
+        .quad 0xbfefffffffffffff, 0xbfefffffffffffff
+        /*== MaxLog1p ==*/
+        .align 16
+        .quad 0x7f3ffffffffff000, 0x7f3ffffffffff000
+        /*== One ==*/
+        .align 16
+        .quad 0x3ff0000000000000, 0x3ff0000000000000
+        /*== SgnMask ==*/
+        .align 16
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff
+        /*== XThreshold ==*/
+        .align 16
+        .quad 0x3e00000000000000, 0x3e00000000000000
+        /*== XhMask ==*/
+        .align 16
+        .quad 0xfffffffffffffc00, 0xfffffffffffffc00
+        /*== Threshold ==*/
+        .align 16
+        .quad 0x4086a00000000000, 0x4086a00000000000
+        /*== Bias ==*/
+        .align 16
+        .quad 0x408ff80000000000, 0x408ff80000000000
+        /*== Bias1 ==*/
+        .align 16
+        .quad 0x408ff00000000000, 0x408ff00000000000
+        /*== ExpMask ==*/
+        .align 16
+        .quad 0x7ff0000000000000, 0x7ff0000000000000
+        /*== ExpMask2 ==*/
+        .align 16
+        .quad 0x7f40000000000000, 0x7f40000000000000
+        /*== L2L ==*/
+        .align 16
+        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
+        .align 16
+        .type	__svml_dlog1p_data_internal,@object
+        .size	__svml_dlog1p_data_internal,.-__svml_dlog1p_data_internal
+        .space 96, 0x00
+        .align 16
+
+.FLT_16:
+        .long	0x00000000,0x43380000,0x00000000,0x43380000
+        .type	.FLT_16,@object
+        .size	.FLT_16,16
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core-sse.S
new file mode 100644
index 0000000000..ec01af680c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized log1p, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN4v_log1p _ZGVdN4v_log1p_sse_wrapper
+#include "../svml_d_log1p4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core.c
new file mode 100644
index 0000000000..808f3224ef
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized log1p, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN4v_log1p
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN4v_log1p, __GI__ZGVdN4v_log1p, __redirect__ZGVdN4v_log1p)
+  __attribute__ ((visibility ("hidden")));
+#endif
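
Note (not part of the patch, just a usage sketch for context): with glibc's
OpenMP SIMD declarations for <math.h>, a plain log1p loop can be
auto-vectorized by GCC into calls to the _ZGVdN4v_log1p symbol selected by
this ifunc.  A minimal sketch, assuming a typical AVX2 build; the flags are
illustrative, not something this patch requires:

    /* Build with, e.g.: gcc -O3 -ffast-math -fopenmp-simd -mavx2  */
    #include <math.h>
    #include <stddef.h>

    void
    vector_log1p (double *restrict out, const double *restrict in, size_t n)
    {
    #pragma omp simd
      for (size_t i = 0; i < n; i++)
        out[i] = log1p (in[i]);   /* may be vectorized to _ZGVdN4v_log1p  */
    }
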
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core_avx2.S
new file mode 100644
index 0000000000..548538b0ec
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core_avx2.S
@@ -0,0 +1,1383 @@
+/* Function log1p vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
+ *    Get short reciprocal approximation Rcp ~ 1/xh
+ *    R = (Rcp*xh - 1.0) + Rcp*xl
+ *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
+ *       log(Rcp) is tabulated
+ *
+ *
+ */
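+
+/* A scalar C sketch of the reduction above, for reference only; the
+ * helpers rcp_approx (the vrcpps step), log_rcp (the Log_LA_table
+ * lookup) and poly (the poly_coeff evaluation) are hypothetical names,
+ * and special cases (x <= -1, NaN/Inf, huge x) go through the scalar
+ * fallback further down:
+ *
+ *   double log1p_sketch (double x)
+ *   {
+ *     double hi = fmax (1.0, x), lo = fmin (1.0, x);
+ *     double s  = hi + lo;               // rounded 1+x
+ *     double sl = (hi - s) + lo;         // rounding error of 1+x
+ *     int k;
+ *     double xh = 2.0 * frexp (s, &k);   // xh in [1,2)
+ *     k -= 1;                            // now s = 2^k * xh
+ *     double xl = ldexp (sl, -k);        // low part on the same scale
+ *     double rcp = rcp_approx (xh);      // ~11-bit reciprocal of xh
+ *     double r = (rcp * xh - 1.0) + rcp * xl;
+ *     return k * M_LN2 - log_rcp (rcp) + poly (r);
+ *   }
+ */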
+
+/* Offsets for data table __svml_dlog1p_data_internal
+ */
+#define Log_HA_table                  	0
+#define Log_LA_table                  	8224
+#define poly_coeff                    	12352
+#define ExpMask                       	12480
+#define Two10                         	12512
+#define MinLog1p                      	12544
+#define MaxLog1p                      	12576
+#define One                           	12608
+#define SgnMask                       	12640
+#define XThreshold                    	12672
+#define XhMask                        	12704
+#define Threshold                     	12736
+#define Bias                          	12768
+#define Bias1                         	12800
+#define ExpMask0                      	12832
+#define ExpMask2                      	12864
+#define L2                            	12896
+
+/* Lookup bias for data table __svml_dlog1p_data_internal.  */
+#define Table_Lookup_Bias               -0x405fe0
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN4v_log1p_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+        lea       Table_Lookup_Bias+__svml_dlog1p_data_internal(%rip), %r8
+
+/* SgnMask used by all accuracies */
+        vmovupd   SgnMask+__svml_dlog1p_data_internal(%rip), %ymm12
+        vmovupd   One+__svml_dlog1p_data_internal(%rip), %ymm7
+
+/* 2^(-10-exp(X)) */
+        vmovupd   ExpMask2+__svml_dlog1p_data_internal(%rip), %ymm3
+        vmovapd   %ymm0, %ymm9
+        vandpd    %ymm12, %ymm9, %ymm10
+        vcmplt_oqpd XThreshold+__svml_dlog1p_data_internal(%rip), %ymm10, %ymm11
+        vaddpd    %ymm7, %ymm9, %ymm13
+
+/* compute 1+x as high, low parts */
+        vmaxpd    %ymm9, %ymm7, %ymm15
+        vminpd    %ymm9, %ymm7, %ymm6
+        vorpd     XhMask+__svml_dlog1p_data_internal(%rip), %ymm11, %ymm14
+        vandpd    %ymm14, %ymm13, %ymm4
+
+/* preserve mantissa, set input exponent to 2^(-10) */
+        vandpd    ExpMask+__svml_dlog1p_data_internal(%rip), %ymm4, %ymm5
+        vorpd     Two10+__svml_dlog1p_data_internal(%rip), %ymm5, %ymm5
+
+/* reciprocal approximation good to at least 11 bits */
+        vcvtpd2ps %ymm5, %xmm2
+        vsubpd    %ymm4, %ymm15, %ymm0
+
+/* check range */
+        vcmplt_oqpd MinLog1p+__svml_dlog1p_data_internal(%rip), %ymm9, %ymm15
+        vrcpps    %xmm2, %xmm1
+        vaddpd    %ymm0, %ymm6, %ymm6
+        vcmpnle_uqpd MaxLog1p+__svml_dlog1p_data_internal(%rip), %ymm9, %ymm0
+        vcvtps2pd %xmm1, %ymm11
+
+/* exponent of X needed to scale Xl */
+        vandps    ExpMask0+__svml_dlog1p_data_internal(%rip), %ymm4, %ymm10
+        vpsubq    %ymm10, %ymm3, %ymm13
+
+/* exponent bits */
+        vpsrlq    $20, %ymm4, %ymm4
+
+/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
+        vroundpd  $0, %ymm11, %ymm3
+
+/* scale DblRcp */
+        vmulpd    %ymm13, %ymm3, %ymm2
+
+/* exponent*log(2.0) */
+        vmovupd   Threshold+__svml_dlog1p_data_internal(%rip), %ymm13
+        vfmsub213pd %ymm7, %ymm3, %ymm5
+
+/* Compute SignMask for all accuracies, including EP */
+        vandnpd   %ymm9, %ymm12, %ymm8
+        vorpd     %ymm0, %ymm15, %ymm7
+
+/*
+ * prepare table index
+ * table lookup
+ */
+        vpsrlq    $40, %ymm3, %ymm0
+
+/*
+ * argument reduction
+ * VQFMS( D, R, X, DblRcp1, One );
+ */
+        vfmadd213pd %ymm5, %ymm2, %ymm6
+        vmovupd   poly_coeff+64+__svml_dlog1p_data_internal(%rip), %ymm2
+        vcmplt_oqpd %ymm3, %ymm13, %ymm3
+        vmulpd    %ymm6, %ymm6, %ymm5
+        vfmadd213pd poly_coeff+96+__svml_dlog1p_data_internal(%rip), %ymm6, %ymm2
+
+/* combine and get argument value range mask */
+        vmovmskpd %ymm7, %eax
+        vextractf128 $1, %ymm4, %xmm12
+        vshufps   $221, %xmm12, %xmm4, %xmm14
+
+/* biased exponent in DP format */
+        vcvtdq2pd %xmm14, %ymm1
+        vandpd    Bias+__svml_dlog1p_data_internal(%rip), %ymm3, %ymm14
+        vorpd     Bias1+__svml_dlog1p_data_internal(%rip), %ymm14, %ymm15
+        vsubpd    %ymm15, %ymm1, %ymm1
+        vmulpd    L2+__svml_dlog1p_data_internal(%rip), %ymm1, %ymm3
+
+/* polynomial */
+        vmovupd   poly_coeff+__svml_dlog1p_data_internal(%rip), %ymm1
+        vfmadd213pd poly_coeff+32+__svml_dlog1p_data_internal(%rip), %ymm6, %ymm1
+        vfmadd213pd %ymm2, %ymm5, %ymm1
+
+/* reconstruction */
+        vfmadd213pd %ymm6, %ymm5, %ymm1
+        vextractf128 $1, %ymm0, %xmm10
+        vmovd     %xmm0, %edx
+        vmovd     %xmm10, %esi
+        movslq    %edx, %rdx
+        vpextrd   $2, %xmm0, %ecx
+        movslq    %esi, %rsi
+        vpextrd   $2, %xmm10, %edi
+        movslq    %ecx, %rcx
+        movslq    %edi, %rdi
+        vmovsd    (%r8,%rdx), %xmm4
+        vmovsd    (%r8,%rsi), %xmm11
+        vmovhpd   (%r8,%rcx), %xmm4, %xmm7
+        vmovhpd   (%r8,%rdi), %xmm11, %xmm12
+        vinsertf128 $1, %xmm12, %ymm7, %ymm0
+        vaddpd    %ymm1, %ymm0, %ymm6
+        vaddpd    %ymm6, %ymm3, %ymm0
+
+/* OR in the Sign of input argument to produce correct log1p(-0) */
+        vorpd     %ymm8, %ymm0, %ymm0
+        testl     %eax, %eax
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm9
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
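+
+/* A sketch of what this path computes (not the generated code): vx/vr
+ * stand for the input/result vectors spilled at 32(%rsp)/64(%rsp) and
+ * mask for the vmovmskpd result kept in %eax:
+ *
+ *   for (int i = 0; i < 4; i++)
+ *     if (mask & (1 << i))
+ *       vr[i] = log1p (vx[i]);
+ */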
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovupd   %ymm9, 32(%rsp)
+        vmovupd   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0
+
+        xorl      %edx, %edx
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovupd   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      log1p@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 64(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN4v_log1p_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_dlog1p_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(32)) VUINT32 Log_HA_table[(1<<10)+2][2];
+        __declspec(align(32)) VUINT32 Log_LA_table[(1<<9)+1][2];
+        __declspec(align(32)) VUINT32 poly_coeff[4][4][2];
+        __declspec(align(32)) VUINT32 ExpMask[4][2];
+        __declspec(align(32)) VUINT32 Two10[4][2];
+        __declspec(align(32)) VUINT32 MinLog1p[4][2];
+        __declspec(align(32)) VUINT32 MaxLog1p[4][2];
+        __declspec(align(32)) VUINT32 One[4][2];
+        __declspec(align(32)) VUINT32 SgnMask[4][2];
+        __declspec(align(32)) VUINT32 XThreshold[4][2];
+        __declspec(align(32)) VUINT32 XhMask[4][2];
+        __declspec(align(32)) VUINT32 Threshold[4][2];
+        __declspec(align(32)) VUINT32 Bias[4][2];
+        __declspec(align(32)) VUINT32 Bias1[4][2];
+        __declspec(align(32)) VUINT32 ExpMask0[4][2];
+        __declspec(align(32)) VUINT32 ExpMask2[4][2];
+        __declspec(align(32)) VUINT32 L2[4][2];
+} __svml_dlog1p_data_internal;
+#endif
+__svml_dlog1p_data_internal:
+        /* Log_HA_table */
+        .quad 0xc086232bdd7a8300, 0xbe1ce91eef3fb100
+        .quad 0xc086232fdc7ad828, 0xbe1cefcffda73b6a
+        .quad 0xc0862333d97d2ba0, 0xbe1cef406748f1ff
+        .quad 0xc0862337d48378e0, 0xbe1cef2a9429925a
+        .quad 0xc086233bcd8fb878, 0xbe1cf138d17ebecb
+        .quad 0xc086233fc4a3e018, 0xbe1ceff2dbbbb29e
+        .quad 0xc0862343b9c1e270, 0xbe1cf1a42aae437b
+        .quad 0xc0862347acebaf68, 0xbe1cef3b152048af
+        .quad 0xc086234b9e2333f0, 0xbe1cef20e127805e
+        .quad 0xc086234f8d6a5a30, 0xbe1cf00ad6052cf4
+        .quad 0xc08623537ac30980, 0xbe1cefc4642ee597
+        .quad 0xc0862357662f2660, 0xbe1cf1f277d36e16
+        .quad 0xc086235b4fb092a0, 0xbe1ceed009e8d8e6
+        .quad 0xc086235f37492d28, 0xbe1cf1e4038cb362
+        .quad 0xc08623631cfad250, 0xbe1cf0b0873b8557
+        .quad 0xc086236700c75b98, 0xbe1cf15bb3227c0b
+        .quad 0xc086236ae2b09fe0, 0xbe1cf151ef8ca9ed
+        .quad 0xc086236ec2b87358, 0xbe1cefe1dc2cd2ed
+        .quad 0xc0862372a0e0a780, 0xbe1cf0d1eec5454f
+        .quad 0xc08623767d2b0b48, 0xbe1ceeefd570bbce
+        .quad 0xc086237a57996af0, 0xbe1cee99ae91b3a7
+        .quad 0xc086237e302d9028, 0xbe1cf0412830fbd1
+        .quad 0xc086238206e94218, 0xbe1ceee898588610
+        .quad 0xc0862385dbce4548, 0xbe1cee9a1fbcaaea
+        .quad 0xc0862389aede5bc0, 0xbe1ceed8e7cc1ad6
+        .quad 0xc086238d801b4500, 0xbe1cf10c8d059da6
+        .quad 0xc08623914f86be18, 0xbe1ceee6c63a8165
+        .quad 0xc08623951d228180, 0xbe1cf0c3592d2ff1
+        .quad 0xc0862398e8f04758, 0xbe1cf0026cc4cb1b
+        .quad 0xc086239cb2f1c538, 0xbe1cf15d48d8e670
+        .quad 0xc08623a07b28ae60, 0xbe1cef359363787c
+        .quad 0xc08623a44196b390, 0xbe1cefdf1ab2e82c
+        .quad 0xc08623a8063d8338, 0xbe1cefe43c02aa84
+        .quad 0xc08623abc91ec960, 0xbe1cf044f5ae35b7
+        .quad 0xc08623af8a3c2fb8, 0xbe1cf0b0b4001e1b
+        .quad 0xc08623b349975d98, 0xbe1cf1bae76dfbcf
+        .quad 0xc08623b70731f810, 0xbe1cef0a72e13a62
+        .quad 0xc08623bac30da1c8, 0xbe1cf184007d2b6b
+        .quad 0xc08623be7d2bfb40, 0xbe1cf16f4b239e98
+        .quad 0xc08623c2358ea2a0, 0xbe1cf0976acada87
+        .quad 0xc08623c5ec3733d0, 0xbe1cf066318a16ff
+        .quad 0xc08623c9a1274880, 0xbe1ceffaa7148798
+        .quad 0xc08623cd54607820, 0xbe1cf23ab02e9b6e
+        .quad 0xc08623d105e45800, 0xbe1cefdfef7d4fde
+        .quad 0xc08623d4b5b47b20, 0xbe1cf17fece44f2b
+        .quad 0xc08623d863d27270, 0xbe1cf18f907d0d7c
+        .quad 0xc08623dc103fccb0, 0xbe1cee61fe072c98
+        .quad 0xc08623dfbafe1668, 0xbe1cf022dd891e2f
+        .quad 0xc08623e3640eda20, 0xbe1ceecc1daf4358
+        .quad 0xc08623e70b73a028, 0xbe1cf0173c4fa380
+        .quad 0xc08623eab12deec8, 0xbe1cf16a2150c2f4
+        .quad 0xc08623ee553f4a30, 0xbe1cf1bf980b1f4b
+        .quad 0xc08623f1f7a93480, 0xbe1cef8b731663c2
+        .quad 0xc08623f5986d2dc0, 0xbe1cee9a664d7ef4
+        .quad 0xc08623f9378cb3f0, 0xbe1cf1eda2af6400
+        .quad 0xc08623fcd5094320, 0xbe1cf1923f9d68d7
+        .quad 0xc086240070e45548, 0xbe1cf0747cd3e03a
+        .quad 0xc08624040b1f6260, 0xbe1cf22ee855bd6d
+        .quad 0xc0862407a3bbe078, 0xbe1cf0d57360c00b
+        .quad 0xc086240b3abb4398, 0xbe1ceebc815cd575
+        .quad 0xc086240ed01efdd0, 0xbe1cf03bfb970951
+        .quad 0xc086241263e87f50, 0xbe1cf16e74768529
+        .quad 0xc0862415f6193658, 0xbe1cefec64b8becb
+        .quad 0xc086241986b28f30, 0xbe1cf0838d210baa
+        .quad 0xc086241d15b5f448, 0xbe1cf0ea86e75b11
+        .quad 0xc0862420a324ce28, 0xbe1cf1708d11d805
+        .quad 0xc08624242f008380, 0xbe1ceea988c5a417
+        .quad 0xc0862427b94a7910, 0xbe1cef166a7bbca5
+        .quad 0xc086242b420411d0, 0xbe1cf0c9d9e86a38
+        .quad 0xc086242ec92eaee8, 0xbe1cef0946455411
+        .quad 0xc08624324ecbaf98, 0xbe1cefea60907739
+        .quad 0xc0862435d2dc7160, 0xbe1cf1ed0934ce42
+        .quad 0xc086243955624ff8, 0xbe1cf191ba746c7d
+        .quad 0xc086243cd65ea548, 0xbe1ceeec78cf2a7e
+        .quad 0xc086244055d2c968, 0xbe1cef345284c119
+        .quad 0xc0862443d3c012b8, 0xbe1cf24f77355219
+        .quad 0xc08624475027d5e8, 0xbe1cf05bf087e114
+        .quad 0xc086244acb0b65d0, 0xbe1cef3504a32189
+        .quad 0xc086244e446c1398, 0xbe1ceff54b2a406f
+        .quad 0xc0862451bc4b2eb8, 0xbe1cf0757d54ed4f
+        .quad 0xc086245532aa04f0, 0xbe1cf0c8099fdfd5
+        .quad 0xc0862458a789e250, 0xbe1cf0b173796a31
+        .quad 0xc086245c1aec1138, 0xbe1cf11d8734540d
+        .quad 0xc086245f8cd1da60, 0xbe1cf1916a723ceb
+        .quad 0xc0862462fd3c84d8, 0xbe1cf19a911e1da7
+        .quad 0xc08624666c2d5608, 0xbe1cf23a9ef72e4f
+        .quad 0xc0862469d9a591c0, 0xbe1cef503d947663
+        .quad 0xc086246d45a67a18, 0xbe1cf0fceeb1a0b2
+        .quad 0xc0862470b0314fa8, 0xbe1cf107e27e4fbc
+        .quad 0xc086247419475160, 0xbe1cf03dd9922331
+        .quad 0xc086247780e9bc98, 0xbe1cefce1a10e129
+        .quad 0xc086247ae719cd18, 0xbe1ceea47f73c4f6
+        .quad 0xc086247e4bd8bd10, 0xbe1ceec0ac56d100
+        .quad 0xc0862481af27c528, 0xbe1cee8a6593278a
+        .quad 0xc086248511081c70, 0xbe1cf2231dd9dec7
+        .quad 0xc0862488717af888, 0xbe1cf0b4b8ed7da8
+        .quad 0xc086248bd0818d68, 0xbe1cf1bd8d835002
+        .quad 0xc086248f2e1d0d98, 0xbe1cf259acc107f4
+        .quad 0xc08624928a4eaa20, 0xbe1cee897636b00c
+        .quad 0xc0862495e5179270, 0xbe1cee757f20c326
+        .quad 0xc08624993e78f490, 0xbe1cefafd3aa54a4
+        .quad 0xc086249c9673fd10, 0xbe1cee7298d38b97
+        .quad 0xc086249fed09d6f8, 0xbe1ceedc158d4ceb
+        .quad 0xc08624a3423babe0, 0xbe1cf2282987cb2e
+        .quad 0xc08624a6960aa400, 0xbe1cefe7381ecc4b
+        .quad 0xc08624a9e877e600, 0xbe1cef328dbbce80
+        .quad 0xc08624ad39849728, 0xbe1cefde45f3cc71
+        .quad 0xc08624b08931db58, 0xbe1cefa8b89433b9
+        .quad 0xc08624b3d780d500, 0xbe1cef6773c0b139
+        .quad 0xc08624b72472a528, 0xbe1cf031c931c11f
+        .quad 0xc08624ba70086b78, 0xbe1cf088f49275e7
+        .quad 0xc08624bdba434630, 0xbe1cf17de0eaa86d
+        .quad 0xc08624c103245238, 0xbe1cefd492f1ba75
+        .quad 0xc08624c44aacab08, 0xbe1cf1253e154466
+        .quad 0xc08624c790dd6ad0, 0xbe1cf0fb09ee6d55
+        .quad 0xc08624cad5b7aa58, 0xbe1cf1f08dd048fe
+        .quad 0xc08624ce193c8120, 0xbe1ceeca0809697f
+        .quad 0xc08624d15b6d0538, 0xbe1cef8d5662d968
+        .quad 0xc08624d49c4a4b78, 0xbe1cee97b556ed78
+        .quad 0xc08624d7dbd56750, 0xbe1cf1b14b6acb75
+        .quad 0xc08624db1a0f6b00, 0xbe1cef1e860623f2
+        .quad 0xc08624de56f96758, 0xbe1ceeaf4d156f3d
+        .quad 0xc08624e192946bf0, 0xbe1ceecc12b400ed
+        .quad 0xc08624e4cce18710, 0xbe1cf180c40c794f
+        .quad 0xc08624e805e1c5c8, 0xbe1cf185a08f7f65
+        .quad 0xc08624eb3d9633d8, 0xbe1cef45fc924078
+        .quad 0xc08624ee73ffdbb0, 0xbe1cf1e4f457f32a
+        .quad 0xc08624f1a91fc6a0, 0xbe1cf040147b8a5a
+        .quad 0xc08624f4dcf6fc98, 0xbe1cf1effca0dfb2
+        .quad 0xc08624f80f868468, 0xbe1cf0470146e5bc
+        .quad 0xc08624fb40cf6390, 0xbe1cef4dd186e501
+        .quad 0xc08624fe70d29e60, 0xbe1ceebe257f66c7
+        .quad 0xc08625019f9137f0, 0xbe1ceefb7a1c395c
+        .quad 0xc0862504cd0c3220, 0xbe1cf209dedfed8c
+        .quad 0xc0862507f9448db0, 0xbe1cf082da464994
+        .quad 0xc086250b243b4a18, 0xbe1cee88694a73cf
+        .quad 0xc086250e4df165a0, 0xbe1cf0b61e8f0531
+        .quad 0xc08625117667dd78, 0xbe1cf1106599c962
+        .quad 0xc08625149d9fad98, 0xbe1ceff1ee88af1f
+        .quad 0xc0862517c399d0c8, 0xbe1cf0f746994ef6
+        .quad 0xc086251ae85740b8, 0xbe1cefe8a1d077e4
+        .quad 0xc086251e0bd8f5e0, 0xbe1cf1a1da036092
+        .quad 0xc08625212e1fe7a8, 0xbe1cf0f8a7786fcd
+        .quad 0xc08625244f2d0c48, 0xbe1cefa1174a07a7
+        .quad 0xc08625276f0158d8, 0xbe1cef1043aa5b25
+        .quad 0xc086252a8d9dc150, 0xbe1cf15d521c169d
+        .quad 0xc086252dab033898, 0xbe1cf220bba8861f
+        .quad 0xc0862530c732b078, 0xbe1cef51e310eae2
+        .quad 0xc0862533e22d1988, 0xbe1cf222fcedd8ae
+        .quad 0xc0862536fbf36370, 0xbe1cefdb4da4bda8
+        .quad 0xc086253a14867ca0, 0xbe1ceeafc1112171
+        .quad 0xc086253d2be75280, 0xbe1cee99dfb4b408
+        .quad 0xc08625404216d160, 0xbe1cf22d2536f06b
+        .quad 0xc08625435715e498, 0xbe1cef6abbf2e268
+        .quad 0xc08625466ae57648, 0xbe1cf093a14789f5
+        .quad 0xc08625497d866fa0, 0xbe1cf0f93655603c
+        .quad 0xc086254c8ef9b8b8, 0xbe1cf1cc40c9aafc
+        .quad 0xc086254f9f4038a8, 0xbe1ceeea5f4e9157
+        .quad 0xc0862552ae5ad568, 0xbe1cefa9f52d4997
+        .quad 0xc0862555bc4a7400, 0xbe1cefa490a638ff
+        .quad 0xc0862558c90ff868, 0xbe1cef7fcf797d6f
+        .quad 0xc086255bd4ac4590, 0xbe1cf1b4c51113c9
+        .quad 0xc086255edf203d78, 0xbe1cef55e5b4a55d
+        .quad 0xc0862561e86cc100, 0xbe1cf0d37a25f9dc
+        .quad 0xc0862564f092b028, 0xbe1ceebe9efc19d9
+        .quad 0xc0862567f792e9d8, 0xbe1cee8ad30a57b5
+        .quad 0xc086256afd6e4c08, 0xbe1cef4e1817b90b
+        .quad 0xc086256e0225b3b8, 0xbe1cee7fa9229996
+        .quad 0xc086257105b9fce0, 0xbe1cf0b54963d945
+        .quad 0xc0862574082c0298, 0xbe1cee5f2f3c7995
+        .quad 0xc0862577097c9ee0, 0xbe1cf0828e303a2c
+        .quad 0xc086257a09acaae0, 0xbe1cf172c3078947
+        .quad 0xc086257d08bcfec0, 0xbe1cf189252afa22
+        .quad 0xc086258006ae71b8, 0xbe1cefdb80426923
+        .quad 0xc08625830381da08, 0xbe1ceef1391a0372
+        .quad 0xc0862585ff380d00, 0xbe1cf17720c78d13
+        .quad 0xc0862588f9d1df18, 0xbe1ceef1f9027d83
+        .quad 0xc086258bf35023b8, 0xbe1cf06fac99dec9
+        .quad 0xc086258eebb3ad78, 0xbe1cf1373eeb45c0
+        .quad 0xc0862591e2fd4e00, 0xbe1cef777536bb81
+        .quad 0xc0862594d92dd600, 0xbe1cf0f43ca40766
+        .quad 0xc0862597ce461558, 0xbe1cefb2cfc6766b
+        .quad 0xc086259ac246daf0, 0xbe1ceea49e64ffa2
+        .quad 0xc086259db530f4c8, 0xbe1cf250fa457dec
+        .quad 0xc08625a0a7053018, 0xbe1cf17d8bb2a44e
+        .quad 0xc08625a397c45918, 0xbe1cf1d5906d54b7
+        .quad 0xc08625a6876f3b30, 0xbe1cf08fe7b31780
+        .quad 0xc08625a97606a0e0, 0xbe1cef13edfc9d11
+        .quad 0xc08625ac638b53c8, 0xbe1cef9d2b107219
+        .quad 0xc08625af4ffe1cb0, 0xbe1cf1ddd4ff6160
+        .quad 0xc08625b23b5fc390, 0xbe1cefa02a996495
+        .quad 0xc08625b525b10f68, 0xbe1cf166a7e37ee5
+        .quad 0xc08625b80ef2c680, 0xbe1cef0b171068a5
+        .quad 0xc08625baf725ae28, 0xbe1cf05c80779283
+        .quad 0xc08625bdde4a8af0, 0xbe1cf1bbfbffb889
+        .quad 0xc08625c0c4622090, 0xbe1cf0b8666c0124
+        .quad 0xc08625c3a96d31e0, 0xbe1cf0a8fcf47a86
+        .quad 0xc08625c68d6c80f0, 0xbe1cef46e18cb092
+        .quad 0xc08625c97060cef0, 0xbe1cf1458a350efb
+        .quad 0xc08625cc524adc58, 0xbe1ceeea1dadce12
+        .quad 0xc08625cf332b68b0, 0xbe1cf0a1bfdc44c7
+        .quad 0xc08625d2130332d0, 0xbe1cef96d02da73e
+        .quad 0xc08625d4f1d2f8a8, 0xbe1cf2451c3c7701
+        .quad 0xc08625d7cf9b7778, 0xbe1cf10d08f83812
+        .quad 0xc08625daac5d6ba0, 0xbe1ceec5b4895c5e
+        .quad 0xc08625dd881990b0, 0xbe1cf14e1325c5e4
+        .quad 0xc08625e062d0a188, 0xbe1cf21d0904be12
+        .quad 0xc08625e33c835838, 0xbe1ceed0839bcf21
+        .quad 0xc08625e615326df0, 0xbe1cf1bb944889d2
+        .quad 0xc08625e8ecde9b48, 0xbe1cee738e85eece
+        .quad 0xc08625ebc38897e0, 0xbe1cf25c2bc6ef12
+        .quad 0xc08625ee99311ac8, 0xbe1cf132b70a41ad
+        .quad 0xc08625f16dd8da28, 0xbe1cf1984236a6e3
+        .quad 0xc08625f441808b78, 0xbe1cf19ae74998f9
+        .quad 0xc08625f71428e370, 0xbe1cef3e175d61a1
+        .quad 0xc08625f9e5d295f8, 0xbe1cf101f9868fd9
+        .quad 0xc08625fcb67e5658, 0xbe1cee69db83dcd2
+        .quad 0xc08625ff862cd6f8, 0xbe1cf081b636af51
+        .quad 0xc086260254dec9a8, 0xbe1cee62c7d59b3e
+        .quad 0xc08626052294df58, 0xbe1cf1b745c57716
+        .quad 0xc0862607ef4fc868, 0xbe1cef3d2800ea23
+        .quad 0xc086260abb103458, 0xbe1cef480ff1acd2
+        .quad 0xc086260d85d6d200, 0xbe1cf2424c9a17ef
+        .quad 0xc08626104fa44f90, 0xbe1cf12cfde90fd5
+        .quad 0xc086261318795a68, 0xbe1cf21f590dd5b6
+        .quad 0xc0862615e0569f48, 0xbe1cf0c50f9cd28a
+        .quad 0xc0862618a73cca30, 0xbe1ceedbdb520545
+        .quad 0xc086261b6d2c8668, 0xbe1cf0b030396011
+        .quad 0xc086261e32267e98, 0xbe1cf19917010e96
+        .quad 0xc0862620f62b5cb0, 0xbe1cf07331355985
+        .quad 0xc0862623b93bc9e8, 0xbe1cf01ae921a1c3
+        .quad 0xc08626267b586ed0, 0xbe1cefe5cf0dbf0c
+        .quad 0xc08626293c81f348, 0xbe1cf01b258aeb50
+        .quad 0xc086262bfcb8fe88, 0xbe1cee6b9e7f4c68
+        .quad 0xc086262ebbfe3710, 0xbe1cee684a9b21c9
+        .quad 0xc08626317a5242b8, 0xbe1cf1f8bcde9a8b
+        .quad 0xc086263437b5c6c0, 0xbe1cf1d063d36238
+        .quad 0xc0862636f42967a8, 0xbe1cf1e31a19075e
+        .quad 0xc0862639afadc950, 0xbe1cf1d8efdf7e7d
+        .quad 0xc086263c6a438ef0, 0xbe1cf1812ee72dba
+        .quad 0xc086263f23eb5b18, 0xbe1cf1449a9a2279
+        .quad 0xc0862641dca5cfb8, 0xbe1cee96edce5085
+        .quad 0xc086264494738e08, 0xbe1cf06797bd03b2
+        .quad 0xc08626474b5536b8, 0xbe1cef91b9b7ffc1
+        .quad 0xc086264a014b69c0, 0xbe1cef4b6721278f
+        .quad 0xc086264cb656c678, 0xbe1cf1942925eb4a
+        .quad 0xc086264f6a77eba8, 0xbe1cefa2c7bc2e39
+        .quad 0xc08626521daf7758, 0xbe1cf252595aceb3
+        .quad 0xc0862654cffe0718, 0xbe1cee8e9ae47ec2
+        .quad 0xc0862657816437a8, 0xbe1cf1bf913828fa
+        .quad 0xc086265a31e2a558, 0xbe1cf23475d6b366
+        .quad 0xc086265ce179ebc8, 0xbe1cef8df00a922b
+        .quad 0xc086265f902aa5f0, 0xbe1cef279bfa43e0
+        .quad 0xc08626623df56e38, 0xbe1cf080e10b8365
+        .quad 0xc0862664eadade70, 0xbe1cf1a518f9b544
+        .quad 0xc086266796db8fd0, 0xbe1cef9308fed9e9
+        .quad 0xc086266a41f81ae8, 0xbe1ceea3ae6b19c9
+        .quad 0xc086266cec3117b8, 0xbe1ceef06003d4c2
+        .quad 0xc086266f95871da8, 0xbe1cf0b8457ffb0c
+        .quad 0xc08626723dfac390, 0xbe1cf0c526745ad6
+        .quad 0xc0862674e58c9fa8, 0xbe1cf0cf91ff7b5d
+        .quad 0xc08626778c3d4798, 0xbe1cefe260819380
+        .quad 0xc086267a320d5070, 0xbe1ceebd90aa27a3
+        .quad 0xc086267cd6fd4ea8, 0xbe1cf0388121dffa
+        .quad 0xc086267f7b0dd630, 0xbe1cf1a3881435f1
+        .quad 0xc08626821e3f7a68, 0xbe1cef28e9d9ac52
+        .quad 0xc0862684c092ce08, 0xbe1cf02d300062dd
+        .quad 0xc086268762086350, 0xbe1cefaee1edfa35
+        .quad 0xc086268a02a0cbe0, 0xbe1cf0a5a052e936
+        .quad 0xc086268ca25c98d8, 0xbe1cee60a4a497ed
+        .quad 0xc086268f413c5ab0, 0xbe1cf0e4a5d0cf49
+        .quad 0xc0862691df40a170, 0xbe1cf149235a4e6e
+        .quad 0xc08626947c69fc80, 0xbe1cf215180b9fcc
+        .quad 0xc086269718b8fac8, 0xbe1cef9b156a9840
+        .quad 0xc0862699b42e2a90, 0xbe1cf054c91441be
+        .quad 0xc086269c4eca19a8, 0xbe1cf13ded26512c
+        .quad 0xc086269ee88d5550, 0xbe1cf22ea4d8ac06
+        .quad 0xc08626a181786a40, 0xbe1cf2354666ee2e
+        .quad 0xc08626a4198be4a8, 0xbe1cefef936752b3
+        .quad 0xc08626a6b0c85020, 0xbe1cf1e360a9db68
+        .quad 0xc08626a9472e37d8, 0xbe1ceed6aeb812c5
+        .quad 0xc08626abdcbe2650, 0xbe1cf227340b4986
+        .quad 0xc08626ae7178a5b0, 0xbe1cf0215a0cbe0d
+        .quad 0xc08626b1055e3f70, 0xbe1cf256adf0ae26
+        .quad 0xc08626b3986f7ca8, 0xbe1ceff3c67aed06
+        .quad 0xc08626b62aace5c8, 0xbe1cf2159fb93652
+        .quad 0xc08626b8bc1702e0, 0xbe1cf01e6dbd1c7f
+        .quad 0xc08626bb4cae5b60, 0xbe1cf009e75d1c0c
+        .quad 0xc08626bddc737648, 0xbe1ceec10a020e73
+        .quad 0xc08626c06b66da08, 0xbe1cf06d5783eee7
+        .quad 0xc08626c2f9890ca0, 0xbe1cf0cb8f169ffe
+        .quad 0xc08626c586da9388, 0xbe1cef7de2452430
+        .quad 0xc08626c8135bf3b0, 0xbe1cf05da6f783ae
+        .quad 0xc08626ca9f0db198, 0xbe1cefcc877d681d
+        .quad 0xc08626cd29f05138, 0xbe1cef0531954ab3
+        .quad 0xc08626cfb4045608, 0xbe1cf06b8565ea3d
+        .quad 0xc08626d23d4a4310, 0xbe1cefdc455d9d7e
+        .quad 0xc08626d4c5c29ad0, 0xbe1ceefc47e8fa64
+        .quad 0xc08626d74d6ddf48, 0xbe1cf1872bf033f2
+        .quad 0xc08626d9d44c9210, 0xbe1cf19d91087f9d
+        .quad 0xc08626dc5a5f3438, 0xbe1cf012d444c6ab
+        .quad 0xc08626dedfa64650, 0xbe1cf0ba528ee153
+        .quad 0xc08626e164224880, 0xbe1ceeb431709788
+        .quad 0xc08626e3e7d3ba60, 0xbe1cf0b9af31a6a5
+        .quad 0xc08626e66abb1b28, 0xbe1cf168fb2e135b
+        .quad 0xc08626e8ecd8e990, 0xbe1cef9097461c93
+        .quad 0xc08626eb6e2da3d0, 0xbe1cee7a434735d8
+        .quad 0xc08626edeeb9c7a8, 0xbe1cf235732b86f2
+        .quad 0xc08626f06e7dd280, 0xbe1cefe1510b89e6
+        .quad 0xc08626f2ed7a4120, 0xbe1cf1f64b9b80ef
+        .quad 0xc08626f56baf9000, 0xbe1cf08f320ca339
+        .quad 0xc08626f7e91e3b08, 0xbe1cf1b1de2808a1
+        .quad 0xc08626fa65c6bdc0, 0xbe1cf1976d778b28
+        .quad 0xc08626fce1a99338, 0xbe1ceef40a4f076f
+        .quad 0xc08626ff5cc73600, 0xbe1cef3e45869ce3
+        .quad 0xc0862701d7202048, 0xbe1ceef601b4c9d6
+        .quad 0xc086270450b4cbc0, 0xbe1cf1eaf0b57fd6
+        .quad 0xc0862706c985b1c0, 0xbe1cef82a44990f3
+        .quad 0xc086270941934b10, 0xbe1ceefe32981f2c
+        .quad 0xc086270bb8de1018, 0xbe1cefbf6f5a0445
+        .quad 0xc086270e2f6678d0, 0xbe1cf18dba75792c
+        .quad 0xc0862710a52cfcc8, 0xbe1cf0da64ce995f
+        .quad 0xc08627131a321318, 0xbe1cef04ac0fb802
+        .quad 0xc08627158e763268, 0xbe1cee9d4e2ad9bd
+        .quad 0xc086271801f9d0f8, 0xbe1cefa9b55407b5
+        .quad 0xc086271a74bd64a0, 0xbe1cefe6bd329570
+        .quad 0xc086271ce6c162c8, 0xbe1cef0b1205dc85
+        .quad 0xc086271f58064068, 0xbe1cef092a785e3f
+        .quad 0xc0862721c88c7210, 0xbe1cf050dcdaac30
+        .quad 0xc086272438546be8, 0xbe1cf210907ded8b
+        .quad 0xc0862726a75ea1b8, 0xbe1cee760be44f99
+        .quad 0xc086272915ab86c0, 0xbe1ceeeee07c2bcc
+        .quad 0xc086272b833b8df0, 0xbe1cf06874992df5
+        .quad 0xc086272df00f29d0, 0xbe1cef8fac5d4899
+        .quad 0xc08627305c26cc70, 0xbe1cf1103241cc99
+        .quad 0xc0862732c782e788, 0xbe1cf1d35fef83fe
+        .quad 0xc08627353223ec68, 0xbe1cef3ec8133e1d
+        .quad 0xc08627379c0a4be8, 0xbe1cef7261daccd8
+        .quad 0xc086273a05367688, 0xbe1cf18656c50806
+        .quad 0xc086273c6da8dc68, 0xbe1cf1c8736e049a
+        .quad 0xc086273ed561ed38, 0xbe1cf1f93bff4911
+        .quad 0xc08627413c621848, 0xbe1cf188a4ea680c
+        .quad 0xc0862743a2a9cc80, 0xbe1cf1d270930c80
+        .quad 0xc086274608397868, 0xbe1cf25a328c28e2
+        .quad 0xc08627486d118a28, 0xbe1cf106f90aa3b8
+        .quad 0xc086274ad1326f80, 0xbe1cee5e9d2e885a
+        .quad 0xc086274d349c95c0, 0xbe1cf1c0bac27228
+        .quad 0xc086274f975069f8, 0xbe1cf1a1500f9b1c
+        .quad 0xc0862751f94e58c0, 0xbe1cefc30663ac44
+        .quad 0xc08627545a96ce48, 0xbe1cf17123e427a2
+        .quad 0xc0862756bb2a3678, 0xbe1cefb92749fea4
+        .quad 0xc08627591b08fcc0, 0xbe1cefa40e1ea74a
+        .quad 0xc086275b7a338c40, 0xbe1cee6f4612c3e9
+        .quad 0xc086275dd8aa4fa8, 0xbe1cf1c54a053627
+        .quad 0xc0862760366db168, 0xbe1ceff5eb503d9e
+        .quad 0xc0862762937e1b70, 0xbe1cf02e47f10cee
+        .quad 0xc0862764efdbf768, 0xbe1ceeb06e1d0dad
+        .quad 0xc08627674b87ae88, 0xbe1cf10aadd6dba5
+        .quad 0xc0862769a681a9c0, 0xbe1cf24e9913d30f
+        .quad 0xc086276c00ca51a0, 0xbe1cef47b301e312
+        .quad 0xc086276e5a620e48, 0xbe1ceeb1cefc2e85
+        .quad 0xc0862770b3494788, 0xbe1cf16f1fbbe011
+        .quad 0xc08627730b8064e8, 0xbe1ceebdf75174c7
+        .quad 0xc08627756307cd70, 0xbe1cf06e3871a0da
+        .quad 0xc0862777b9dfe7f0, 0xbe1cef16799fd554
+        .quad 0xc086277a10091ac0, 0xbe1cf248dabf5377
+        .quad 0xc086277c6583cc00, 0xbe1cf0c78d92a2cd
+        .quad 0xc086277eba506158, 0xbe1cf0b911b029f0
+        .quad 0xc08627810e6f4028, 0xbe1cefdc24719766
+        .quad 0xc086278361e0cd70, 0xbe1cefbb6562b7e7
+        .quad 0xc0862785b4a56dd8, 0xbe1cf1e0afb349ec
+        .quad 0xc086278806bd85c0, 0xbe1cf008292e52fc
+        .quad 0xc086278a58297918, 0xbe1cf053073872bf
+        .quad 0xc086278ca8e9ab88, 0xbe1cf17a0a55a947
+        .quad 0xc086278ef8fe8068, 0xbe1ceeffb0b60234
+        .quad 0xc086279148685aa0, 0xbe1cf162204794a8
+        .quad 0xc086279397279ce0, 0xbe1cf24cc8cb48ac
+        .quad 0xc0862795e53ca978, 0xbe1cf0c9be68d5c3
+        .quad 0xc086279832a7e258, 0xbe1cf172cd3d7388
+        .quad 0xc086279a7f69a930, 0xbe1ceea2465fbce5
+        .quad 0xc086279ccb825f40, 0xbe1cf0a386d2500f
+        .quad 0xc086279f16f26590, 0xbe1cf1e338ddc18a
+        .quad 0xc08627a161ba1cd0, 0xbe1cef1f5049867f
+        .quad 0xc08627a3abd9e548, 0xbe1cef96c1ea8b1f
+        .quad 0xc08627a5f5521f00, 0xbe1cf138f6fd3c26
+        .quad 0xc08627a83e2329b0, 0xbe1cf0d4fcbfdf3a
+        .quad 0xc08627aa864d64b0, 0xbe1cf24870c12c81
+        .quad 0xc08627accdd12f18, 0xbe1cf0ae2a56348d
+        .quad 0xc08627af14aee7a0, 0xbe1cee8ca1a9b893
+        .quad 0xc08627b15ae6eca8, 0xbe1cf20414d637b0
+        .quad 0xc08627b3a0799c60, 0xbe1cf0fc6b7b12d8
+        .quad 0xc08627b5e5675488, 0xbe1cf152d93c4a00
+        .quad 0xc08627b829b072a0, 0xbe1cf1073f9b77c2
+        .quad 0xc08627ba6d5553d8, 0xbe1cee694f97d5a4
+        .quad 0xc08627bcb0565500, 0xbe1cf0456b8239d7
+        .quad 0xc08627bef2b3d2b0, 0xbe1cf211497127e3
+        .quad 0xc08627c1346e2930, 0xbe1cf01856c0384d
+        .quad 0xc08627c37585b468, 0xbe1cefa7dd05479e
+        .quad 0xc08627c5b5fad000, 0xbe1cef3ae8e50b93
+        .quad 0xc08627c7f5cdd750, 0xbe1ceea5f32fdd3a
+        .quad 0xc08627ca34ff2560, 0xbe1cef424caeb8d9
+        .quad 0xc08627cc738f14f0, 0xbe1cf0194d07a81f
+        .quad 0xc08627ceb17e0070, 0xbe1cf20f452000c1
+        .quad 0xc08627d0eecc4210, 0xbe1cf00e356218e4
+        .quad 0xc08627d32b7a33a0, 0xbe1cef30484b4bcb
+        .quad 0xc08627d567882eb0, 0xbe1ceeea11a6641b
+        .quad 0xc08627d7a2f68c80, 0xbe1cf13492d5bd7b
+        .quad 0xc08627d9ddc5a618, 0xbe1ceeb7048fad96
+        .quad 0xc08627dc17f5d418, 0xbe1ceef0666f0477
+        .quad 0xc08627de51876ee8, 0xbe1cf060d4b8b5c2
+        .quad 0xc08627e08a7acea8, 0xbe1cf0b2a4b6ff8c
+        .quad 0xc08627e2c2d04b28, 0xbe1cf0e34809a875
+        .quad 0xc08627e4fa883bf0, 0xbe1cf16bf74a3522
+        .quad 0xc08627e731a2f848, 0xbe1cee6a24623d57
+        .quad 0xc08627e96820d718, 0xbe1cefc7b4f1528e
+        .quad 0xc08627eb9e022f18, 0xbe1cf163051f3548
+        .quad 0xc08627edd34756b8, 0xbe1cef36b3366305
+        .quad 0xc08627f007f0a408, 0xbe1cf18134625550
+        .quad 0xc08627f23bfe6cf0, 0xbe1cf0ec32ec1a11
+        .quad 0xc08627f46f710700, 0xbe1ceeb3b64f3edc
+        .quad 0xc08627f6a248c778, 0xbe1cf0cd15805bc8
+        .quad 0xc08627f8d4860368, 0xbe1cf20db3bddebe
+        .quad 0xc08627fb06290f90, 0xbe1cf25188430e25
+        .quad 0xc08627fd37324070, 0xbe1ceea1713490f9
+        .quad 0xc08627ff67a1ea28, 0xbe1cf159521d234c
+        .quad 0xc0862801977860b8, 0xbe1cf24dfe50783b
+        .quad 0xc0862803c6b5f7d0, 0xbe1ceef2ef89a60b
+        .quad 0xc0862805f55b02c8, 0xbe1cee7fc919d62c
+        .quad 0xc08628082367d4c0, 0xbe1cf215a7fb513a
+        .quad 0xc086280a50dcc0a8, 0xbe1cf0e4401c5ed4
+        .quad 0xc086280c7dba1910, 0xbe1cf04ec734d256
+        .quad 0xc086280eaa003050, 0xbe1cf010ad787fea
+        .quad 0xc0862810d5af5880, 0xbe1cee622478393d
+        .quad 0xc086281300c7e368, 0xbe1cf01c7482564f
+        .quad 0xc08628152b4a22a0, 0xbe1cf0de20d33536
+        .quad 0xc086281755366778, 0xbe1cef2edae5837d
+        .quad 0xc08628197e8d02f0, 0xbe1cf0a345318cc9
+        .quad 0xc086281ba74e45d8, 0xbe1cf20085aa34b8
+        .quad 0xc086281dcf7a80c0, 0xbe1cef5fa845ad83
+        .quad 0xc086281ff71203e0, 0xbe1cf050d1df69c4
+        .quad 0xc08628221e151f48, 0xbe1ceffe43c035b9
+        .quad 0xc0862824448422b8, 0xbe1cf14f3018d3c2
+        .quad 0xc08628266a5f5dc0, 0xbe1cef0a5fbae83d
+        .quad 0xc08628288fa71f98, 0xbe1ceff8a95b72a1
+        .quad 0xc086282ab45bb750, 0xbe1cef073aa9849b
+        .quad 0xc086282cd87d73a8, 0xbe1cef69b3835c02
+        .quad 0xc086282efc0ca328, 0xbe1cf0bc139379a9
+        .quad 0xc08628311f099420, 0xbe1cef247a9ec596
+        .quad 0xc086283341749490, 0xbe1cef74bbcc488a
+        .quad 0xc0862835634df248, 0xbe1cef4bc42e7b8e
+        .quad 0xc08628378495fad0, 0xbe1cf136d4d5a810
+        .quad 0xc0862839a54cfb80, 0xbe1cf0d290b24dd8
+        .quad 0xc086283bc5734168, 0xbe1ceeebde8e0065
+        .quad 0xc086283de5091950, 0xbe1cf1a09f60aa1e
+        .quad 0xc0862840040ecfe0, 0xbe1cf0803947a234
+        .quad 0xc08628422284b168, 0xbe1cf0abf7638127
+        .quad 0xc0862844406b0a08, 0xbe1cf0f73ee12058
+        .quad 0xc08628465dc225a0, 0xbe1cf2079971b26c
+        .quad 0xc08628487a8a4fe0, 0xbe1cee74957564b1
+        .quad 0xc086284a96c3d420, 0xbe1ceee77c1b7d43
+        .quad 0xc086284cb26efd90, 0xbe1cf23addba6e09
+        .quad 0xc086284ecd8c1730, 0xbe1cf199f4a1da60
+        .quad 0xc0862850e81b6bb0, 0xbe1cf09fdea81393
+        .quad 0xc0862853021d4588, 0xbe1cf176adb417f7
+        .quad 0xc08628551b91ef00, 0xbe1cf0f64f84a8da
+        .quad 0xc08628573479b220, 0xbe1ceec34cf49523
+        .quad 0xc08628594cd4d8a8, 0xbe1cf16d60fbe0bb
+        .quad 0xc086285b64a3ac40, 0xbe1cee8de7acfc7b
+        .quad 0xc086285d7be67630, 0xbe1ceee6256cce8d
+        .quad 0xc086285f929d7fa0, 0xbe1cee7d66a3d8a5
+        .quad 0xc0862861a8c91170, 0xbe1cf0bef8265792
+        .quad 0xc0862863be697458, 0xbe1cf097f890c6f8
+        .quad 0xc0862865d37ef0c8, 0xbe1cf09502d5c3fc
+        .quad 0xc0862867e809cf00, 0xbe1ceeffb239dac7
+        .quad 0xc0862869fc0a56f8, 0xbe1cf1fbfff95c98
+        .quad 0xc086286c0f80d090, 0xbe1cefa57ad3eef7
+        .quad 0xc086286e226d8348, 0xbe1cf22c58b9183d
+        .quad 0xc086287034d0b690, 0xbe1ceff262d0a248
+        .quad 0xc086287246aab180, 0xbe1cefa7bc194186
+        .quad 0xc086287457fbbb08, 0xbe1cf06782d784d9
+        .quad 0xc086287668c419e0, 0xbe1cf1d44d0eaa07
+        .quad 0xc086287879041490, 0xbe1cf034803c8a48
+        .quad 0xc086287a88bbf158, 0xbe1cf08e84916b6f
+        .quad 0xc086287c97ebf650, 0xbe1cf0c4d3dc1bc7
+        .quad 0xc086287ea6946958, 0xbe1cefb1e4625943
+        .quad 0xc0862880b4b59010, 0xbe1cf143efdd1fd0
+        .quad 0xc0862882c24faff8, 0xbe1cee9896d016da
+        .quad 0xc0862884cf630e38, 0xbe1cf2186072f2cc
+        .quad 0xc0862886dbefeff0, 0xbe1cef9217633d34
+        .quad 0xc0862888e7f699e0, 0xbe1cf05603549486
+        .quad 0xc086288af37750b0, 0xbe1cef50fff513d3
+        .quad 0xc086288cfe7258c0, 0xbe1cf127713b32d0
+        .quad 0xc086288f08e7f650, 0xbe1cf05015520f3d
+        .quad 0xc086289112d86d58, 0xbe1cf12eb458b26f
+        .quad 0xc08628931c4401a8, 0xbe1cf22eae2887ed
+        .quad 0xc0862895252af6e0, 0xbe1cefdd6656dd2d
+        .quad 0xc08628972d8d9058, 0xbe1cf1048ea4e646
+        .quad 0xc0862899356c1150, 0xbe1ceec4501167e9
+        .quad 0xc086289b3cc6bcb8, 0xbe1cf0ad52becc3f
+        .quad 0xc086289d439dd568, 0xbe1cf0daa4e00e35
+        .quad 0xc086289f49f19df8, 0xbe1cf00b80de8d6a
+        .quad 0xc08628a14fc258c8, 0xbe1cf1bcf2ea8464
+        .quad 0xc08628a355104818, 0xbe1cf0435e2782b0
+        .quad 0xc08628a559dbade0, 0xbe1cf0e3e1a5f56c
+        .quad 0xc08628a75e24cbf8, 0xbe1cefed9d5a721d
+        .quad 0xc08628a961ebe3f8, 0xbe1cf0d2d74321e2
+        .quad 0xc08628ab65313750, 0xbe1cf24200eb55e9
+        .quad 0xc08628ad67f50740, 0xbe1cf23e9d7cf979
+        .quad 0xc08628af6a3794d0, 0xbe1cf23a088f421c
+        .quad 0xc08628b16bf920e0, 0xbe1cef2c1de1ab32
+        .quad 0xc08628b36d39ec08, 0xbe1cf1abc231f7b2
+        .quad 0xc08628b56dfa36d0, 0xbe1cf2074d5ba303
+        .quad 0xc08628b76e3a4180, 0xbe1cf05cd5eed880
+        /*== Log_LA_table ==*/
+        .align 32
+        .quad 0x8000000000000000
+        .quad 0xbf5ff802a9ab10e6
+        .quad 0xbf6ff00aa2b10bc0
+        .quad 0xbf77ee11ebd82e94
+        .quad 0xbf7fe02a6b106789
+        .quad 0xbf83e7295d25a7d9
+        .quad 0xbf87dc475f810a77
+        .quad 0xbf8bcf712c74384c
+        .quad 0xbf8fc0a8b0fc03e4
+        .quad 0xbf91d7f7eb9eebe7
+        .quad 0xbf93cea44346a575
+        .quad 0xbf95c45a51b8d389
+        .quad 0xbf97b91b07d5b11b
+        .quad 0xbf99ace7551cc514
+        .quad 0xbf9b9fc027af9198
+        .quad 0xbf9d91a66c543cc4
+        .quad 0xbf9f829b0e783300
+        .quad 0xbfa0b94f7c196176
+        .quad 0xbfa1b0d98923d980
+        .quad 0xbfa2a7ec2214e873
+        .quad 0xbfa39e87b9febd60
+        .quad 0xbfa494acc34d911c
+        .quad 0xbfa58a5bafc8e4d5
+        .quad 0xbfa67f94f094bd98
+        .quad 0xbfa77458f632dcfc
+        .quad 0xbfa868a83083f6cf
+        .quad 0xbfa95c830ec8e3eb
+        .quad 0xbfaa4fe9ffa3d235
+        .quad 0xbfab42dd711971bf
+        .quad 0xbfac355dd0921f2d
+        .quad 0xbfad276b8adb0b52
+        .quad 0xbfae19070c276016
+        .quad 0xbfaf0a30c01162a6
+        .quad 0xbfaffae9119b9303
+        .quad 0xbfb075983598e471
+        .quad 0xbfb0ed839b5526fe
+        .quad 0xbfb16536eea37ae1
+        .quad 0xbfb1dcb263db1944
+        .quad 0xbfb253f62f0a1417
+        .quad 0xbfb2cb0283f5de1f
+        .quad 0xbfb341d7961bd1d1
+        .quad 0xbfb3b87598b1b6ee
+        .quad 0xbfb42edcbea646f0
+        .quad 0xbfb4a50d3aa1b040
+        .quad 0xbfb51b073f06183f
+        .quad 0xbfb590cafdf01c28
+        .quad 0xbfb60658a93750c4
+        .quad 0xbfb67bb0726ec0fc
+        .quad 0xbfb6f0d28ae56b4c
+        .quad 0xbfb765bf23a6be13
+        .quad 0xbfb7da766d7b12cd
+        .quad 0xbfb84ef898e8282a
+        .quad 0xbfb8c345d6319b21
+        .quad 0xbfb9375e55595ede
+        .quad 0xbfb9ab42462033ad
+        .quad 0xbfba1ef1d8061cd4
+        .quad 0xbfba926d3a4ad563
+        .quad 0xbfbb05b49bee43fe
+        .quad 0xbfbb78c82bb0eda1
+        .quad 0xbfbbeba818146765
+        .quad 0xbfbc5e548f5bc743
+        .quad 0xbfbcd0cdbf8c13e1
+        .quad 0xbfbd4313d66cb35d
+        .quad 0xbfbdb5270187d927
+        .quad 0xbfbe27076e2af2e6
+        .quad 0xbfbe98b549671467
+        .quad 0xbfbf0a30c01162a6
+        .quad 0xbfbf7b79fec37ddf
+        .quad 0xbfbfec9131dbeabb
+        .quad 0xbfc02ebb42bf3d4b
+        .quad 0xbfc0671512ca596e
+        .quad 0xbfc09f561ee719c3
+        .quad 0xbfc0d77e7cd08e59
+        .quad 0xbfc10f8e422539b1
+        .quad 0xbfc14785846742ac
+        .quad 0xbfc17f6458fca611
+        .quad 0xbfc1b72ad52f67a0
+        .quad 0xbfc1eed90e2dc2c3
+        .quad 0xbfc2266f190a5acb
+        .quad 0xbfc25ded0abc6ad2
+        .quad 0xbfc29552f81ff523
+        .quad 0xbfc2cca0f5f5f251
+        .quad 0xbfc303d718e47fd3
+        .quad 0xbfc33af575770e4f
+        .quad 0xbfc371fc201e8f74
+        .quad 0xbfc3a8eb2d31a376
+        .quad 0xbfc3dfc2b0ecc62a
+        .quad 0xbfc41682bf727bc0
+        .quad 0xbfc44d2b6ccb7d1e
+        .quad 0xbfc483bccce6e3dd
+        .quad 0xbfc4ba36f39a55e5
+        .quad 0xbfc4f099f4a230b2
+        .quad 0xbfc526e5e3a1b438
+        .quad 0xbfc55d1ad4232d6f
+        .quad 0xbfc59338d9982086
+        .quad 0xbfc5c940075972b9
+        .quad 0xbfc5ff3070a793d4
+        .quad 0xbfc6350a28aaa758
+        .quad 0xbfc66acd4272ad51
+        .quad 0xbfc6a079d0f7aad2
+        .quad 0xbfc6d60fe719d21d
+        .quad 0xbfc70b8f97a1aa75
+        .quad 0xbfc740f8f54037a5
+        .quad 0xbfc7764c128f2127
+        .quad 0xbfc7ab890210d909
+        .quad 0xbfc7e0afd630c274
+        .quad 0xbfc815c0a14357eb
+        .quad 0xbfc84abb75865139
+        .quad 0xbfc87fa06520c911
+        .quad 0xbfc8b46f8223625b
+        .quad 0xbfc8e928de886d41
+        .quad 0xbfc91dcc8c340bde
+        .quad 0xbfc9525a9cf456b4
+        .quad 0xbfc986d3228180ca
+        .quad 0xbfc9bb362e7dfb83
+        .quad 0xbfc9ef83d2769a34
+        .quad 0xbfca23bc1fe2b563
+        .quad 0xbfca57df28244dcd
+        .quad 0xbfca8becfc882f19
+        .quad 0xbfcabfe5ae46124c
+        .quad 0xbfcaf3c94e80bff3
+        .quad 0xbfcb2797ee46320c
+        .quad 0xbfcb5b519e8fb5a4
+        .quad 0xbfcb8ef670420c3b
+        .quad 0xbfcbc286742d8cd6
+        .quad 0xbfcbf601bb0e44e2
+        .quad 0xbfcc2968558c18c1
+        .quad 0xbfcc5cba543ae425
+        .quad 0xbfcc8ff7c79a9a22
+        .quad 0xbfccc320c0176502
+        .quad 0xbfccf6354e09c5dc
+        .quad 0xbfcd293581b6b3e7
+        .quad 0xbfcd5c216b4fbb91
+        .quad 0xbfcd8ef91af31d5e
+        .quad 0xbfcdc1bca0abec7d
+        .quad 0xbfcdf46c0c722d2f
+        .quad 0xbfce27076e2af2e6
+        .quad 0xbfce598ed5a87e2f
+        .quad 0xbfce8c0252aa5a60
+        .quad 0xbfcebe61f4dd7b0b
+        .quad 0xbfcef0adcbdc5936
+        .quad 0xbfcf22e5e72f105d
+        .quad 0xbfcf550a564b7b37
+        .quad 0xbfcf871b28955045
+        .quad 0xbfcfb9186d5e3e2b
+        .quad 0xbfcfeb0233e607cc
+        .quad 0xbfd00e6c45ad501d
+        .quad 0xbfd0274dc16c232f
+        .quad 0xbfd0402594b4d041
+        .quad 0xbfd058f3c703ebc6
+        .quad 0xbfd071b85fcd590d
+        .quad 0xbfd08a73667c57af
+        .quad 0xbfd0a324e27390e3
+        .quad 0xbfd0bbccdb0d24bd
+        .quad 0xbfd0d46b579ab74b
+        .quad 0xbfd0ed005f657da4
+        .quad 0xbfd1058bf9ae4ad5
+        .quad 0xbfd11e0e2dad9cb7
+        .quad 0xbfd136870293a8b0
+        .quad 0xbfd14ef67f88685a
+        .quad 0xbfd1675cababa60e
+        .quad 0xbfd17fb98e15095d
+        .quad 0xbfd1980d2dd4236f
+        .quad 0xbfd1b05791f07b49
+        .quad 0xbfd1c898c16999fb
+        .quad 0xbfd1e0d0c33716be
+        .quad 0xbfd1f8ff9e48a2f3
+        .quad 0xbfd211255986160c
+        .quad 0xbfd22941fbcf7966
+        .quad 0xbfd241558bfd1404
+        .quad 0xbfd2596010df763a
+        .quad 0xbfd27161913f853d
+        .quad 0xbfd2895a13de86a3
+        .quad 0xbfd2a1499f762bc9
+        .quad 0xbfd2b9303ab89d25
+        .quad 0xbfd2d10dec508583
+        .quad 0xbfd2e8e2bae11d31
+        .quad 0xbfd300aead06350c
+        .quad 0xbfd31871c9544185
+        .quad 0xbfd3302c16586588
+        .quad 0xbfd347dd9a987d55
+        .quad 0xbfd35f865c93293e
+        .quad 0xbfd3772662bfd85b
+        .quad 0xbfd38ebdb38ed321
+        .quad 0xbfd3a64c556945ea
+        .quad 0xbfd3bdd24eb14b6a
+        .quad 0xbfd3d54fa5c1f710
+        .quad 0xbfd3ecc460ef5f50
+        .quad 0xbfd404308686a7e4
+        .quad 0xbfd41b941cce0bee
+        .quad 0xbfd432ef2a04e814
+        .quad 0xbfd44a41b463c47c
+        .quad 0xbfd4618bc21c5ec2
+        .quad 0xbfd478cd5959b3d9
+        .quad 0xbfd49006804009d1
+        .quad 0xbfd4a7373cecf997
+        .quad 0xbfd4be5f957778a1
+        .quad 0xbfd4d57f8fefe27f
+        .quad 0xbfd4ec973260026a
+        .quad 0xbfd503a682cb1cb3
+        .quad 0xbfd51aad872df82d
+        .quad 0xbfd531ac457ee77e
+        .quad 0xbfd548a2c3add263
+        .quad 0xbfd55f9107a43ee2
+        .quad 0xbfd5767717455a6c
+        .quad 0xbfd58d54f86e02f2
+        .quad 0xbfd5a42ab0f4cfe2
+        .quad 0xbfd5baf846aa1b19
+        .quad 0xbfd5d1bdbf5809ca
+        .quad 0xbfd5e87b20c2954a
+        .quad 0xbfd5ff3070a793d4
+        .quad 0xbfd615ddb4bec13c
+        .quad 0xbfd62c82f2b9c795
+        .quad 0x3fd61965cdb02c1f
+        .quad 0x3fd602d08af091ec
+        .quad 0x3fd5ec433d5c35ae
+        .quad 0x3fd5d5bddf595f30
+        .quad 0x3fd5bf406b543db2
+        .quad 0x3fd5a8cadbbedfa1
+        .quad 0x3fd5925d2b112a59
+        .quad 0x3fd57bf753c8d1fb
+        .quad 0x3fd565995069514c
+        .quad 0x3fd54f431b7be1a9
+        .quad 0x3fd538f4af8f72fe
+        .quad 0x3fd522ae0738a3d8
+        .quad 0x3fd50c6f1d11b97c
+        .quad 0x3fd4f637ebba9810
+        .quad 0x3fd4e0086dd8baca
+        .quad 0x3fd4c9e09e172c3c
+        .quad 0x3fd4b3c077267e9a
+        .quad 0x3fd49da7f3bcc41f
+        .quad 0x3fd487970e958770
+        .quad 0x3fd4718dc271c41b
+        .quad 0x3fd45b8c0a17df13
+        .quad 0x3fd44591e0539f49
+        .quad 0x3fd42f9f3ff62642
+        .quad 0x3fd419b423d5e8c7
+        .quad 0x3fd403d086cea79c
+        .quad 0x3fd3edf463c1683e
+        .quad 0x3fd3d81fb5946dba
+        .quad 0x3fd3c25277333184
+        .quad 0x3fd3ac8ca38e5c5f
+        .quad 0x3fd396ce359bbf54
+        .quad 0x3fd3811728564cb2
+        .quad 0x3fd36b6776be1117
+        .quad 0x3fd355bf1bd82c8b
+        .quad 0x3fd3401e12aecba1
+        .quad 0x3fd32a84565120a8
+        .quad 0x3fd314f1e1d35ce4
+        .quad 0x3fd2ff66b04ea9d4
+        .quad 0x3fd2e9e2bce12286
+        .quad 0x3fd2d46602adccee
+        .quad 0x3fd2bef07cdc9354
+        .quad 0x3fd2a982269a3dbf
+        .quad 0x3fd2941afb186b7c
+        .quad 0x3fd27ebaf58d8c9d
+        .quad 0x3fd269621134db92
+        .quad 0x3fd25410494e56c7
+        .quad 0x3fd23ec5991eba49
+        .quad 0x3fd22981fbef797b
+        .quad 0x3fd214456d0eb8d4
+        .quad 0x3fd1ff0fe7cf47a7
+        .quad 0x3fd1e9e1678899f4
+        .quad 0x3fd1d4b9e796c245
+        .quad 0x3fd1bf99635a6b95
+        .quad 0x3fd1aa7fd638d33f
+        .quad 0x3fd1956d3b9bc2fa
+        .quad 0x3fd180618ef18adf
+        .quad 0x3fd16b5ccbacfb73
+        .quad 0x3fd1565eed455fc3
+        .quad 0x3fd14167ef367783
+        .quad 0x3fd12c77cd00713b
+        .quad 0x3fd1178e8227e47c
+        .quad 0x3fd102ac0a35cc1c
+        .quad 0x3fd0edd060b78081
+        .quad 0x3fd0d8fb813eb1ef
+        .quad 0x3fd0c42d676162e3
+        .quad 0x3fd0af660eb9e279
+        .quad 0x3fd09aa572e6c6d4
+        .quad 0x3fd085eb8f8ae797
+        .quad 0x3fd07138604d5862
+        .quad 0x3fd05c8be0d9635a
+        .quad 0x3fd047e60cde83b8
+        .quad 0x3fd03346e0106062
+        .quad 0x3fd01eae5626c691
+        .quad 0x3fd00a1c6adda473
+        .quad 0x3fcfeb2233ea07cd
+        .quad 0x3fcfc218be620a5e
+        .quad 0x3fcf991c6cb3b379
+        .quad 0x3fcf702d36777df0
+        .quad 0x3fcf474b134df229
+        .quad 0x3fcf1e75fadf9bde
+        .quad 0x3fcef5ade4dcffe6
+        .quad 0x3fceccf2c8fe920a
+        .quad 0x3fcea4449f04aaf5
+        .quad 0x3fce7ba35eb77e2a
+        .quad 0x3fce530effe71012
+        .quad 0x3fce2a877a6b2c12
+        .quad 0x3fce020cc6235ab5
+        .quad 0x3fcdd99edaf6d7e9
+        .quad 0x3fcdb13db0d48940
+        .quad 0x3fcd88e93fb2f450
+        .quad 0x3fcd60a17f903515
+        .quad 0x3fcd38666871f465
+        .quad 0x3fcd1037f2655e7b
+        .quad 0x3fcce816157f1988
+        .quad 0x3fccc000c9db3c52
+        .quad 0x3fcc97f8079d44ec
+        .quad 0x3fcc6ffbc6f00f71
+        .quad 0x3fcc480c0005ccd1
+        .quad 0x3fcc2028ab17f9b4
+        .quad 0x3fcbf851c067555f
+        .quad 0x3fcbd087383bd8ad
+        .quad 0x3fcba8c90ae4ad19
+        .quad 0x3fcb811730b823d2
+        .quad 0x3fcb5971a213acdb
+        .quad 0x3fcb31d8575bce3d
+        .quad 0x3fcb0a4b48fc1b46
+        .quad 0x3fcae2ca6f672bd4
+        .quad 0x3fcabb55c31693ad
+        .quad 0x3fca93ed3c8ad9e3
+        .quad 0x3fca6c90d44b704e
+        .quad 0x3fca454082e6ab05
+        .quad 0x3fca1dfc40f1b7f1
+        .quad 0x3fc9f6c407089664
+        .quad 0x3fc9cf97cdce0ec3
+        .quad 0x3fc9a8778debaa38
+        .quad 0x3fc981634011aa75
+        .quad 0x3fc95a5adcf7017f
+        .quad 0x3fc9335e5d594989
+        .quad 0x3fc90c6db9fcbcd9
+        .quad 0x3fc8e588ebac2dbf
+        .quad 0x3fc8beafeb38fe8c
+        .quad 0x3fc897e2b17b19a5
+        .quad 0x3fc871213750e994
+        .quad 0x3fc84a6b759f512f
+        .quad 0x3fc823c16551a3c2
+        .quad 0x3fc7fd22ff599d4f
+        .quad 0x3fc7d6903caf5ad0
+        .quad 0x3fc7b0091651528c
+        .quad 0x3fc7898d85444c73
+        .quad 0x3fc7631d82935a86
+        .quad 0x3fc73cb9074fd14d
+        .quad 0x3fc716600c914054
+        .quad 0x3fc6f0128b756abc
+        .quad 0x3fc6c9d07d203fc7
+        .quad 0x3fc6a399dabbd383
+        .quad 0x3fc67d6e9d785771
+        .quad 0x3fc6574ebe8c133a
+        .quad 0x3fc6313a37335d76
+        .quad 0x3fc60b3100b09476
+        .quad 0x3fc5e533144c1719
+        .quad 0x3fc5bf406b543db2
+        .quad 0x3fc59958ff1d52f1
+        .quad 0x3fc5737cc9018cdd
+        .quad 0x3fc54dabc26105d2
+        .quad 0x3fc527e5e4a1b58d
+        .quad 0x3fc5022b292f6a45
+        .quad 0x3fc4dc7b897bc1c8
+        .quad 0x3fc4b6d6fefe22a4
+        .quad 0x3fc4913d8333b561
+        .quad 0x3fc46baf0f9f5db7
+        .quad 0x3fc4462b9dc9b3dc
+        .quad 0x3fc420b32740fdd4
+        .quad 0x3fc3fb45a59928cc
+        .quad 0x3fc3d5e3126bc27f
+        .quad 0x3fc3b08b6757f2a9
+        .quad 0x3fc38b3e9e027479
+        .quad 0x3fc365fcb0159016
+        .quad 0x3fc340c59741142e
+        .quad 0x3fc31b994d3a4f85
+        .quad 0x3fc2f677cbbc0a96
+        .quad 0x3fc2d1610c86813a
+        .quad 0x3fc2ac55095f5c59
+        .quad 0x3fc28753bc11aba5
+        .quad 0x3fc2625d1e6ddf57
+        .quad 0x3fc23d712a49c202
+        .quad 0x3fc2188fd9807263
+        .quad 0x3fc1f3b925f25d41
+        .quad 0x3fc1ceed09853752
+        .quad 0x3fc1aa2b7e23f72a
+        .quad 0x3fc185747dbecf34
+        .quad 0x3fc160c8024b27b1
+        .quad 0x3fc13c2605c398c3
+        .quad 0x3fc1178e8227e47c
+        .quad 0x3fc0f301717cf0fb
+        .quad 0x3fc0ce7ecdccc28d
+        .quad 0x3fc0aa06912675d5
+        .quad 0x3fc08598b59e3a07
+        .quad 0x3fc06135354d4b18
+        .quad 0x3fc03cdc0a51ec0d
+        .quad 0x3fc0188d2ecf6140
+        .quad 0x3fbfe89139dbd566
+        .quad 0x3fbfa01c9db57ce2
+        .quad 0x3fbf57bc7d9005db
+        .quad 0x3fbf0f70cdd992e3
+        .quad 0x3fbec739830a1120
+        .quad 0x3fbe7f1691a32d3e
+        .quad 0x3fbe3707ee30487b
+        .quad 0x3fbdef0d8d466db9
+        .quad 0x3fbda727638446a2
+        .quad 0x3fbd5f55659210e2
+        .quad 0x3fbd179788219364
+        .quad 0x3fbccfedbfee13a8
+        .quad 0x3fbc885801bc4b23
+        .quad 0x3fbc40d6425a5cb1
+        .quad 0x3fbbf968769fca11
+        .quad 0x3fbbb20e936d6974
+        .quad 0x3fbb6ac88dad5b1c
+        .quad 0x3fbb23965a52ff00
+        .quad 0x3fbadc77ee5aea8c
+        .quad 0x3fba956d3ecade63
+        .quad 0x3fba4e7640b1bc38
+        .quad 0x3fba0792e9277cac
+        .quad 0x3fb9c0c32d4d2548
+        .quad 0x3fb97a07024cbe74
+        .quad 0x3fb9335e5d594989
+        .quad 0x3fb8ecc933aeb6e8
+        .quad 0x3fb8a6477a91dc29
+        .quad 0x3fb85fd927506a48
+        .quad 0x3fb8197e2f40e3f0
+        .quad 0x3fb7d33687c293c9
+        .quad 0x3fb78d02263d82d3
+        .quad 0x3fb746e100226ed9
+        .quad 0x3fb700d30aeac0e1
+        .quad 0x3fb6bad83c1883b6
+        .quad 0x3fb674f089365a7a
+        .quad 0x3fb62f1be7d77743
+        .quad 0x3fb5e95a4d9791cb
+        .quad 0x3fb5a3abb01ade25
+        .quad 0x3fb55e10050e0384
+        .quad 0x3fb518874226130a
+        .quad 0x3fb4d3115d207eac
+        .quad 0x3fb48dae4bc31018
+        .quad 0x3fb4485e03dbdfad
+        .quad 0x3fb403207b414b7f
+        .quad 0x3fb3bdf5a7d1ee64
+        .quad 0x3fb378dd7f749714
+        .quad 0x3fb333d7f8183f4b
+        .quad 0x3fb2eee507b40301
+        .quad 0x3fb2aa04a44717a5
+        .quad 0x3fb26536c3d8c369
+        .quad 0x3fb2207b5c78549e
+        .quad 0x3fb1dbd2643d190b
+        .quad 0x3fb1973bd1465567
+        .quad 0x3fb152b799bb3cc9
+        .quad 0x3fb10e45b3cae831
+        .quad 0x3fb0c9e615ac4e17
+        .quad 0x3fb08598b59e3a07
+        .quad 0x3fb0415d89e74444
+        .quad 0x3faffa6911ab9301
+        .quad 0x3faf723b517fc523
+        .quad 0x3faeea31c006b87c
+        .quad 0x3fae624c4a0b5e1b
+        .quad 0x3fadda8adc67ee4e
+        .quad 0x3fad52ed6405d86f
+        .quad 0x3faccb73cdddb2cc
+        .quad 0x3fac441e06f72a9e
+        .quad 0x3fabbcebfc68f420
+        .quad 0x3fab35dd9b58baad
+        .quad 0x3faaaef2d0fb10fc
+        .quad 0x3faa282b8a936171
+        .quad 0x3fa9a187b573de7c
+        .quad 0x3fa91b073efd7314
+        .quad 0x3fa894aa149fb343
+        .quad 0x3fa80e7023d8ccc4
+        .quad 0x3fa788595a3577ba
+        .quad 0x3fa70265a550e777
+        .quad 0x3fa67c94f2d4bb58
+        .quad 0x3fa5f6e73078efb8
+        .quad 0x3fa5715c4c03ceef
+        .quad 0x3fa4ebf43349e26f
+        .quad 0x3fa466aed42de3ea
+        .quad 0x3fa3e18c1ca0ae92
+        .quad 0x3fa35c8bfaa1306b
+        .quad 0x3fa2d7ae5c3c5bae
+        .quad 0x3fa252f32f8d183f
+        .quad 0x3fa1ce5a62bc353a
+        .quad 0x3fa149e3e4005a8d
+        .quad 0x3fa0c58fa19dfaaa
+        .quad 0x3fa0415d89e74444
+        .quad 0x3f9f7a9b16782856
+        .quad 0x3f9e72bf2813ce51
+        .quad 0x3f9d6b2725979802
+        .quad 0x3f9c63d2ec14aaf2
+        .quad 0x3f9b5cc258b718e6
+        .quad 0x3f9a55f548c5c43f
+        .quad 0x3f994f6b99a24475
+        .quad 0x3f98492528c8cabf
+        .quad 0x3f974321d3d006d3
+        .quad 0x3f963d6178690bd6
+        .quad 0x3f9537e3f45f3565
+        .quad 0x3f9432a925980cc1
+        .quad 0x3f932db0ea132e22
+        .quad 0x3f9228fb1fea2e28
+        .quad 0x3f912487a5507f70
+        .quad 0x3f90205658935847
+        .quad 0x3f8e38ce3033310c
+        .quad 0x3f8c317384c75f06
+        .quad 0x3f8a2a9c6c170462
+        .quad 0x3f882448a388a2aa
+        .quad 0x3f861e77e8b53fc6
+        .quad 0x3f841929f96832f0
+        .quad 0x3f82145e939ef1e9
+        .quad 0x3f8010157588de71
+        .quad 0x3f7c189cbb0e27fb
+        .quad 0x3f78121214586b54
+        .quad 0x3f740c8a747878e2
+        .quad 0x3f70080559588b35
+        .quad 0x3f680904828985c0
+        .quad 0x3f60040155d5889e
+        .quad 0x3f50020055655889
+        .quad 0x0000000000000000
+        /*== poly_coeff[4] ==*/
+        .align 32
+        .quad 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A /* coeff4 */
+        .quad 0xbfd0000148058EE1, 0xbfd0000148058EE1, 0xbfd0000148058EE1, 0xbfd0000148058EE1 /* coeff3 */
+        .quad 0x3fd55555555543C5, 0x3fd55555555543C5, 0x3fd55555555543C5, 0x3fd55555555543C5 /* coeff2 */
+        .quad 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F /* coeff1 */
+        /*== ExpMask ==*/
+        .align 32
+        .quad 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff
+        /*== Two10 ==*/
+        .align 32
+        .quad 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000
+        /*== MinLog1p = -1+2^(-53) ==*/
+        .align 32
+        .quad 0xbfefffffffffffff, 0xbfefffffffffffff, 0xbfefffffffffffff, 0xbfefffffffffffff
+        /*== MaxLog1p ==*/
+        .align 32
+        .quad 0x7f3ffffffffff000, 0x7f3ffffffffff000, 0x7f3ffffffffff000, 0x7f3ffffffffff000
+        /*== One ==*/
+        .align 32
+        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
+        /*== SgnMask ==*/
+        .align 32
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
+        /*== XThreshold ==*/
+        .align 32
+        .quad 0x3e00000000000000, 0x3e00000000000000, 0x3e00000000000000, 0x3e00000000000000
+        /*== XhMask ==*/
+        .align 32
+        .quad 0xfffffffffffffc00, 0xfffffffffffffc00, 0xfffffffffffffc00, 0xfffffffffffffc00
+        /*== Threshold ==*/
+        .align 32
+        .quad 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000
+        /*== Bias ==*/
+        .align 32
+        .quad 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000
+        /*== Bias1 ==*/
+        .align 32
+        .quad 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000
+        /*== ExpMask ==*/
+        .align 32
+        .quad 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000
+        /*== ExpMask2 ==*/
+        .align 32
+        .quad 0x7f40000000000000, 0x7f40000000000000, 0x7f40000000000000, 0x7f40000000000000
+        /*== L2L ==*/
+        .align 32
+        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
+        .align 32
+        .type	__svml_dlog1p_data_internal,@object
+        .size	__svml_dlog1p_data_internal,.-__svml_dlog1p_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core-avx2.S
new file mode 100644
index 0000000000..ca174a5f52
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized log1p, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN8v_log1p _ZGVeN8v_log1p_avx2_wrapper
+#include "../svml_d_log1p8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core.c
new file mode 100644
index 0000000000..0aa35ec8c5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized log1p, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN8v_log1p
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN8v_log1p, __GI__ZGVeN8v_log1p, __redirect__ZGVeN8v_log1p)
+  __attribute__ ((visibility ("hidden")));
+#endif
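
For readers unfamiliar with the ifunc-mathvec-* helpers, the dispatch that
libc_ifunc_redirected sets up can be modelled roughly as below.  This is a
sketch only: the vector typedef, the resolver name and the feature flag are
illustrative placeholders, not the actual ifunc-mathvec-avx512-skx.h
internals (kernel symbol names are the ones added by this patch).

/* Rough model of the load-time selection between the AVX-512 kernel and
   the AVX2 wrapper for the 8-lane double log1p entry point.
   Build with -mavx512f so the 512-bit vector ABI is available.  */
typedef double v8df __attribute__ ((vector_size (64)));

extern v8df _ZGVeN8v_log1p_skx (v8df);
extern v8df _ZGVeN8v_log1p_avx2_wrapper (v8df);

static v8df (*resolve_log1p8 (int cpu_has_skx_features)) (v8df)
{
  /* The real selector checks the required AVX-512 CPU features;
     the flag here stands in for that check.  */
  return cpu_has_skx_features
         ? _ZGVeN8v_log1p_skx
         : _ZGVeN8v_log1p_avx2_wrapper;
}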
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core_avx512.S
new file mode 100644
index 0000000000..5e38ff8d39
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core_avx512.S
@@ -0,0 +1,317 @@
+/* Function log1p vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
+ *    Get short reciprocal approximation Rcp ~ 1/xh
+ *    R = (Rcp*xh - 1.0) + Rcp*xl
+ *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
+ *       log(Rcp) is tabulated
+ *
+ *
+ */
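
A scalar C sketch of this reduction may help when reading the vector code
below; it is an illustration under stated simplifications, not the kernel
itself.  The 4-bit reciprocal rounding follows the comment in the kernel,
while the direct log() call and the short polynomial stand in for the
tabulated -log(Rcp) values (Log_tbl) and the poly_coeff9..2 evaluation.

#include <math.h>

/* Scalar model of the AVX-512 log1p main path; special inputs
   (x <= -1, NaN, Inf) are left to the scalar fallback path.  */
static double
log1p_sketch (double x)
{
  double s = 1.0 + x;              /* xh + xl collapsed for brevity */
  int k;
  double m = 2.0 * frexp (s, &k);  /* mantissa in [1,2) */
  k -= 1;                          /* s = 2^k * m */

  /* Short reciprocal approximation, rounded to 4 fractional bits
     (vrcp14pd + vrndscalepd in the kernel).  */
  double rcp = nearbyint (16.0 / m) / 16.0;
  double r = rcp * m - 1.0;        /* reduced argument */

  /* -log(rcp) is tabulated in Log_tbl; computed directly here.  */
  double log_rcp = -log (rcp);

  /* Stand-in for the polynomial approximating log(1+r).  */
  double poly = r - 0.5 * r * r + (1.0 / 3.0) * r * r * r;

  /* k*L2 - log(Rcp) + poly(R); the hex literal is the L2 table value.  */
  return k * 0x1.62e42fefa39efp-1 + log_rcp + poly;
}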
+
+/* Offsets for data table __svml_dlog1p_data_internal_avx512
+ */
+#define Log_tbl                       	0
+#define One                           	128
+#define SgnMask                       	192
+#define C075                          	256
+#define poly_coeff9                   	320
+#define poly_coeff8                   	384
+#define poly_coeff7                   	448
+#define poly_coeff6                   	512
+#define poly_coeff5                   	576
+#define poly_coeff4                   	640
+#define poly_coeff3                   	704
+#define poly_coeff2                   	768
+#define L2                            	832
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN8v_log1p_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovups   One+__svml_dlog1p_data_internal_avx512(%rip), %zmm7
+        vmovups   SgnMask+__svml_dlog1p_data_internal_avx512(%rip), %zmm14
+        vmovaps   %zmm0, %zmm9
+        vaddpd    {rn-sae}, %zmm9, %zmm7, %zmm11
+        vandpd    %zmm14, %zmm9, %zmm8
+
+/* compute 1+x as high, low parts */
+        vmaxpd    {sae}, %zmm9, %zmm7, %zmm10
+        vminpd    {sae}, %zmm9, %zmm7, %zmm12
+
+/* GetMant(x), normalized to [1,2) for x>=0, NaN for x<0 */
+        vgetmantpd $8, {sae}, %zmm11, %zmm6
+
+/* GetExp(x) */
+        vgetexppd {sae}, %zmm11, %zmm5
+        vsubpd    {rn-sae}, %zmm10, %zmm11, %zmm13
+
+/* DblRcp ~ 1/Mantissa */
+        vrcp14pd  %zmm6, %zmm15
+
+/* Start polynomial evaluation */
+        vmovups   poly_coeff9+__svml_dlog1p_data_internal_avx512(%rip), %zmm10
+        vmovups   poly_coeff7+__svml_dlog1p_data_internal_avx512(%rip), %zmm11
+
+/* Xl */
+        vsubpd    {rn-sae}, %zmm13, %zmm12, %zmm2
+        vxorpd    %zmm14, %zmm5, %zmm3
+
+/* round DblRcp to 4 fractional bits (RN mode, no Precision exception) */
+        vrndscalepd $88, {sae}, %zmm15, %zmm4
+        vmovups   poly_coeff5+__svml_dlog1p_data_internal_avx512(%rip), %zmm12
+        vmovups   poly_coeff6+__svml_dlog1p_data_internal_avx512(%rip), %zmm14
+        vmovups   poly_coeff3+__svml_dlog1p_data_internal_avx512(%rip), %zmm13
+
+/* Xl*2^(-Expon) */
+        vscalefpd {rn-sae}, %zmm3, %zmm2, %zmm1
+
+/* Reduced argument: R = DblRcp*(Mantissa+Xl) - 1 */
+        vfmsub213pd {rn-sae}, %zmm7, %zmm4, %zmm6
+        vmovups   __svml_dlog1p_data_internal_avx512(%rip), %zmm3
+
+/*
+ * Table lookup
+ * Prepare exponent correction: DblRcp<0.75?
+ */
+        vmovups   C075+__svml_dlog1p_data_internal_avx512(%rip), %zmm2
+
+/* Prepare table index */
+        vpsrlq    $48, %zmm4, %zmm0
+        vfmadd231pd {rn-sae}, %zmm4, %zmm1, %zmm6
+        vmovups   poly_coeff8+__svml_dlog1p_data_internal_avx512(%rip), %zmm1
+        vcmppd    $17, {sae}, %zmm2, %zmm4, %k1
+        vcmppd    $4, {sae}, %zmm6, %zmm6, %k0
+        vfmadd231pd {rn-sae}, %zmm6, %zmm10, %zmm1
+        vmovups   poly_coeff4+__svml_dlog1p_data_internal_avx512(%rip), %zmm10
+        vfmadd231pd {rn-sae}, %zmm6, %zmm11, %zmm14
+        vmovups   L2+__svml_dlog1p_data_internal_avx512(%rip), %zmm4
+        vpermt2pd Log_tbl+64+__svml_dlog1p_data_internal_avx512(%rip), %zmm0, %zmm3
+
+/* add 1 to Expon if DblRcp<0.75 */
+        vaddpd    {rn-sae}, %zmm7, %zmm5, %zmm5{%k1}
+
+/* R^2 */
+        vmulpd    {rn-sae}, %zmm6, %zmm6, %zmm0
+        vfmadd231pd {rn-sae}, %zmm6, %zmm12, %zmm10
+        vmovups   poly_coeff2+__svml_dlog1p_data_internal_avx512(%rip), %zmm12
+        vmulpd    {rn-sae}, %zmm0, %zmm0, %zmm15
+        vfmadd231pd {rn-sae}, %zmm6, %zmm13, %zmm12
+        vfmadd213pd {rn-sae}, %zmm14, %zmm0, %zmm1
+        kmovw     %k0, %edx
+        vfmadd213pd {rn-sae}, %zmm12, %zmm0, %zmm10
+
+/* polynomial */
+        vfmadd213pd {rn-sae}, %zmm10, %zmm15, %zmm1
+        vfmadd213pd {rn-sae}, %zmm6, %zmm0, %zmm1
+        vaddpd    {rn-sae}, %zmm1, %zmm3, %zmm6
+        vfmadd213pd {rn-sae}, %zmm6, %zmm4, %zmm5
+        vorpd     %zmm8, %zmm5, %zmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm9
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm9, 64(%rsp)
+        vmovups   %zmm0, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     64(%rsp,%r14,8), %xmm0
+        call      log1p@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 128(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN8v_log1p_skx)
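
The special-values machinery above (spill the inputs and the vector result
to the stack, test the range-mask bits, recompute flagged lanes with scalar
log1p via the PLT) can be modelled in C roughly as follows; VLEN and the
array-based interface are illustrative simplifications of the stack frame.

#include <math.h>

#define VLEN 8   /* eight doubles per ZMM register */

/* Rough model of L(SPECIAL_VALUES_LOOP)/L(SCALAR_MATH_CALL): each lane
   whose bit is set in the range mask is recomputed with scalar log1p.  */
static void
fixup_special_lanes (const double in[VLEN], double out[VLEN],
                     unsigned int range_mask)
{
  for (int lane = 0; lane < VLEN; lane++)      /* incl %r12d; cmpl $8 */
    if (range_mask & (1u << lane))             /* btl %r12d, %r13d */
      out[lane] = log1p (in[lane]);            /* call log1p@PLT */
}

The 16-lane single-precision kernel further below follows the same pattern,
calling log1pf for each flagged lane.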
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_dlog1p_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 Log_tbl[16][2];
+        __declspec(align(64)) VUINT32 One[8][2];
+        __declspec(align(64)) VUINT32 SgnMask[8][2];
+        __declspec(align(64)) VUINT32 C075[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff9[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff8[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff7[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
+        __declspec(align(64)) VUINT32 L2[8][2];
+   } __svml_dlog1p_data_internal_avx512;
+#endif
+__svml_dlog1p_data_internal_avx512:
+        /*== Log_tbl ==*/
+        .quad 0x0000000000000000
+        .quad 0xbfaf0a30c01162a6
+        .quad 0xbfbe27076e2af2e6
+        .quad 0xbfc5ff3070a793d4
+        .quad 0xbfcc8ff7c79a9a22
+        .quad 0xbfd1675cababa60e
+        .quad 0xbfd4618bc21c5ec2
+        .quad 0xbfd739d7f6bbd007
+        .quad 0x3fd269621134db92
+        .quad 0x3fcf991c6cb3b379
+        .quad 0x3fca93ed3c8ad9e3
+        .quad 0x3fc5bf406b543db2
+        .quad 0x3fc1178e8227e47c
+        .quad 0x3fb9335e5d594989
+        .quad 0x3fb08598b59e3a07
+        .quad 0x3fa0415d89e74444
+        /*== One ==*/
+        .align 64
+        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
+        /*== SgnMask ==*/
+        .align 64
+        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000
+        /*== C075 0.75 ==*/
+        .align 64
+        .quad 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000
+        /*== poly_coeff9 ==*/
+        .align 64
+        .quad 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70
+        /*== poly_coeff8 ==*/
+        .align 64
+        .quad 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62
+        /*== poly_coeff7 ==*/
+        .align 64
+        .quad 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF
+        /*== poly_coeff6 ==*/
+        .align 64
+        .quad 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06
+        /*== poly_coeff5 ==*/
+        .align 64
+        .quad 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C
+        /*== poly_coeff4 ==*/
+        .align 64
+        .quad 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD
+        /*== poly_coeff3 ==*/
+        .align 64
+        .quad 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466
+        /*== poly_coeff2 ==*/
+        .align 64
+        .quad 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6
+        /*== L2 = log(2) ==*/
+        .align 64
+        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
+        .align 64
+        .type	__svml_dlog1p_data_internal_avx512,@object
+        .size	__svml_dlog1p_data_internal_avx512,.-__svml_dlog1p_data_internal_avx512
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core-avx2.S
new file mode 100644
index 0000000000..3c0a0a01a2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized log1pf, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN16v_log1pf _ZGVeN16v_log1pf_avx2_wrapper
+#include "../svml_s_log1pf16_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core.c
new file mode 100644
index 0000000000..9af1320547
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized log1pf, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN16v_log1pf
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN16v_log1pf, __GI__ZGVeN16v_log1pf,
+	       __redirect__ZGVeN16v_log1pf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core_avx512.S
new file mode 100644
index 0000000000..78b2fe417f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core_avx512.S
@@ -0,0 +1,271 @@
+/* Function log1pf vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    1+x = 2^n*(m + ml) is computed in high-low parts; the high part m
+ *    is renormalized around the break point 2/3 so that the reduced
+ *    argument R = (m - 1.0) + ml*2^(-n) stays small
+ *    log1p(x) = n*log(2.0) + poly(R), where poly(R) ~ log(1+R)
+ */
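
Unlike the double-precision kernel, the single-precision code reduces
around the break point 2/3 instead of using a reciprocal table.  A scalar
C sketch of that reduction follows; it uses the iBrkValue, iOffExpoMask
and sLn2 constants from the data table below, omits the high/low split and
the special-input handling, and substitutes a shortened polynomial for the
sPoly_1..sPoly_8 evaluation.

#include <math.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of the log1pf main path.  */
static float
log1pf_sketch (float x)
{
  float s = 1.0f + x;                     /* high part of 1+x */

  uint32_t ix;
  memcpy (&ix, &s, sizeof ix);
  uint32_t in = ix - 0x3f2aaaabu;         /* iBrkValue: single-precision 2/3 */
  int32_t n = (int32_t) in >> 23;         /* scale exponent (two's-complement
                                             narrowing assumed) */
  uint32_t im = (in & 0x007fffffu)        /* iOffExpoMask */
                + 0x3f2aaaabu;
  float m;
  memcpy (&m, &im, sizeof m);             /* m in [2/3, 4/3), s = 2^n * m */

  float r = m - 1.0f;                     /* reduced argument */
  /* Shortened stand-in for the degree-8 sPoly evaluation.  */
  float p = r * r * (-0.5f + r * (1.0f / 3.0f));

  return (float) n * 0x1.62e43p-1f + (r + p);   /* n*sLn2 + log(1+r) */
}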
+
+/* Offsets for data table __svml_slog1p_data_internal
+ */
+#define SgnMask                       	0
+#define sOne                          	64
+#define sPoly_1                       	128
+#define sPoly_2                       	192
+#define sPoly_3                       	256
+#define sPoly_4                       	320
+#define sPoly_5                       	384
+#define sPoly_6                       	448
+#define sPoly_7                       	512
+#define sPoly_8                       	576
+#define iHiDelta                      	640
+#define iLoRange                      	704
+#define iBrkValue                     	768
+#define iOffExpoMask                  	832
+#define sLn2                          	896
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN16v_log1pf_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovups   sOne+__svml_slog1p_data_internal(%rip), %zmm2
+
+/* reduction: compute r,n */
+        vmovups   iBrkValue+__svml_slog1p_data_internal(%rip), %zmm12
+        vmovups   SgnMask+__svml_slog1p_data_internal(%rip), %zmm4
+        vmovaps   %zmm0, %zmm3
+
+/* compute 1+x as high, low parts */
+        vmaxps    {sae}, %zmm3, %zmm2, %zmm5
+        vminps    {sae}, %zmm3, %zmm2, %zmm7
+        vandnps   %zmm3, %zmm4, %zmm1
+        vpternlogd $255, %zmm4, %zmm4, %zmm4
+        vaddps    {rn-sae}, %zmm7, %zmm5, %zmm9
+        vpsubd    %zmm12, %zmm9, %zmm10
+        vsubps    {rn-sae}, %zmm9, %zmm5, %zmm6
+
+/* check argument value ranges */
+        vpaddd    iHiDelta+__svml_slog1p_data_internal(%rip), %zmm9, %zmm8
+        vpsrad    $23, %zmm10, %zmm13
+        vmovups   sPoly_5+__svml_slog1p_data_internal(%rip), %zmm9
+        vpcmpd    $5, iLoRange+__svml_slog1p_data_internal(%rip), %zmm8, %k1
+        vpslld    $23, %zmm13, %zmm14
+        vaddps    {rn-sae}, %zmm7, %zmm6, %zmm15
+        vcvtdq2ps {rn-sae}, %zmm13, %zmm0
+        vpsubd    %zmm14, %zmm2, %zmm13
+        vmovups   sPoly_8+__svml_slog1p_data_internal(%rip), %zmm7
+        vmovups   sPoly_1+__svml_slog1p_data_internal(%rip), %zmm14
+        vmulps    {rn-sae}, %zmm13, %zmm15, %zmm6
+        vpandd    iOffExpoMask+__svml_slog1p_data_internal(%rip), %zmm10, %zmm11
+        vpaddd    %zmm12, %zmm11, %zmm5
+        vmovups   sPoly_4+__svml_slog1p_data_internal(%rip), %zmm10
+        vmovups   sPoly_3+__svml_slog1p_data_internal(%rip), %zmm11
+        vmovups   sPoly_2+__svml_slog1p_data_internal(%rip), %zmm12
+
+/* polynomial evaluation */
+        vsubps    {rn-sae}, %zmm2, %zmm5, %zmm2
+        vaddps    {rn-sae}, %zmm6, %zmm2, %zmm15
+        vmovups   sPoly_7+__svml_slog1p_data_internal(%rip), %zmm2
+        vfmadd231ps {rn-sae}, %zmm15, %zmm7, %zmm2
+        vpandnd   %zmm8, %zmm8, %zmm4{%k1}
+        vmovups   sPoly_6+__svml_slog1p_data_internal(%rip), %zmm8
+
+/* combine and get argument value range mask */
+        vptestmd  %zmm4, %zmm4, %k0
+        vfmadd213ps {rn-sae}, %zmm8, %zmm15, %zmm2
+        kmovw     %k0, %edx
+        vfmadd213ps {rn-sae}, %zmm9, %zmm15, %zmm2
+        vfmadd213ps {rn-sae}, %zmm10, %zmm15, %zmm2
+        vfmadd213ps {rn-sae}, %zmm11, %zmm15, %zmm2
+        vfmadd213ps {rn-sae}, %zmm12, %zmm15, %zmm2
+        vfmadd213ps {rn-sae}, %zmm14, %zmm15, %zmm2
+        vmulps    {rn-sae}, %zmm15, %zmm2, %zmm4
+        vfmadd213ps {rn-sae}, %zmm15, %zmm15, %zmm4
+
+/* final reconstruction */
+        vmovups   sLn2+__svml_slog1p_data_internal(%rip), %zmm15
+        vfmadd213ps {rn-sae}, %zmm4, %zmm15, %zmm0
+        vorps     %zmm1, %zmm0, %zmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm3
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm3, 64(%rsp)
+        vmovups   %zmm0, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $16, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     64(%rsp,%r14,4), %xmm0
+        call      log1pf@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 128(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN16v_log1pf_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_slog1p_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 SgnMask[16][1];
+        __declspec(align(64)) VUINT32 sOne[16][1];
+        __declspec(align(64)) VUINT32 sPoly[8][16][1];
+        __declspec(align(64)) VUINT32 iHiDelta[16][1];
+        __declspec(align(64)) VUINT32 iLoRange[16][1];
+        __declspec(align(64)) VUINT32 iBrkValue[16][1];
+        __declspec(align(64)) VUINT32 iOffExpoMask[16][1];
+        __declspec(align(64)) VUINT32 sLn2[16][1];
+} __svml_slog1p_data_internal;
+#endif
+__svml_slog1p_data_internal:
+        /*== SgnMask ==*/
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
+        /*== sOne = SP 1.0 ==*/
+        .align 64
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== sPoly[] = SP polynomial ==*/
+        .align 64
+        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
+        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
+        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
+        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
+        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
+        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
+        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
+        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
+        /*== iHiDelta = SP 80000000-7f000000 ==*/
+        .align 64
+        .long 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000
+        /*== iLoRange = SP 00800000+iHiDelta ==*/
+        .align 64
+        .long 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000
+        /*== iBrkValue = SP 2/3 ==*/
+        .align 64
+        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
+        /*== iOffExpoMask = SP significand mask ==*/
+        .align 64
+        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
+        /*== sLn2 = SP ln(2) ==*/
+        .align 64
+        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
+        .align 64
+        .type	__svml_slog1p_data_internal,@object
+        .size	__svml_slog1p_data_internal,.-__svml_slog1p_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core-sse2.S
new file mode 100644
index 0000000000..913c8290c8
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized log1pf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN4v_log1pf _ZGVbN4v_log1pf_sse2
+#include "../svml_s_log1pf4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core.c
new file mode 100644
index 0000000000..b6aff48023
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized log1pf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN4v_log1pf
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN4v_log1pf, __GI__ZGVbN4v_log1pf,
+	       __redirect__ZGVbN4v_log1pf)
+  __attribute__ ((visibility ("hidden")));
+#endif
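
[Editorial note, not part of the patch] The core.c file above binds _ZGVbN4v_log1pf to the SSE4.1 or SSE2 build at symbol-resolution time through glibc's internal libc_ifunc_redirected/IFUNC_SELECTOR macros.  For readers unfamiliar with IFUNCs, the same idea written as a standalone sketch with the generic GCC ifunc attribute and made-up names looks like this:

/* Editorial sketch only; function names are illustrative, and the
   patch itself uses glibc's internal macros instead of this code.  */
#include <stdio.h>

static float log1pf_ref_sse2 (float x) { return x; }	/* stand-in body  */
static float log1pf_ref_sse4 (float x) { return x; }	/* stand-in body  */

/* The resolver runs once, when the symbol is bound, and returns the
   pointer that all later calls will use.  */
static float (*resolve_log1pf_ref (void)) (float)
{
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("sse4.1")
	 ? log1pf_ref_sse4 : log1pf_ref_sse2;
}

float log1pf_ref (float) __attribute__ ((ifunc ("resolve_log1pf_ref")));

int
main (void)
{
  printf ("%f\n", log1pf_ref (0.5f));
  return 0;
}
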
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core_sse4.S
new file mode 100644
index 0000000000..ef1bae58c0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core_sse4.S
@@ -0,0 +1,252 @@
+/* Function log1pf vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
+ *    Get short reciprocal approximation Rcp ~ 1/xh
+ *    R = (Rcp*xh - 1.0) + Rcp*xl
+ *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
+ *       log(Rcp) is tabulated
+ *
+ *
+ */
+
+/* Offsets for data table __svml_slog1p_data_internal
+ */
+#define SgnMask                       	0
+#define sOne                          	16
+#define sPoly                         	32
+#define iHiDelta                      	160
+#define iLoRange                      	176
+#define iBrkValue                     	192
+#define iOffExpoMask                  	208
+#define sLn2                          	224
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN4v_log1pf_sse4)
+        subq      $72, %rsp
+        cfi_def_cfa_offset(80)
+        movups    sOne+__svml_slog1p_data_internal(%rip), %xmm7
+
+/* compute 1+x as high, low parts */
+        movaps    %xmm7, %xmm1
+        movaps    %xmm7, %xmm5
+        maxps     %xmm0, %xmm1
+        minps     %xmm0, %xmm5
+        movaps    %xmm1, %xmm4
+
+/* check argument value ranges */
+        movdqu    iHiDelta+__svml_slog1p_data_internal(%rip), %xmm2
+        addps     %xmm5, %xmm4
+
+/* reduction: compute r,n */
+        movdqu    iBrkValue+__svml_slog1p_data_internal(%rip), %xmm3
+        paddd     %xmm4, %xmm2
+        movdqu    iOffExpoMask+__svml_slog1p_data_internal(%rip), %xmm8
+        subps     %xmm4, %xmm1
+        psubd     %xmm3, %xmm4
+        addps     %xmm1, %xmm5
+        pand      %xmm4, %xmm8
+        psrad     $23, %xmm4
+        cvtdq2ps  %xmm4, %xmm10
+        pslld     $23, %xmm4
+        movaps    %xmm7, %xmm1
+        paddd     %xmm3, %xmm8
+        psubd     %xmm4, %xmm1
+        mulps     %xmm5, %xmm1
+
+/* polynomial evaluation */
+        subps     %xmm7, %xmm8
+
+/* final reconstruction */
+        mulps     sLn2+__svml_slog1p_data_internal(%rip), %xmm10
+        addps     %xmm8, %xmm1
+        movups    sPoly+112+__svml_slog1p_data_internal(%rip), %xmm9
+        mulps     %xmm1, %xmm9
+        movdqu    iLoRange+__svml_slog1p_data_internal(%rip), %xmm6
+        pcmpgtd   %xmm2, %xmm6
+        addps     sPoly+96+__svml_slog1p_data_internal(%rip), %xmm9
+
+/* combine and get argument value range mask */
+        movmskps  %xmm6, %edx
+        movups    SgnMask+__svml_slog1p_data_internal(%rip), %xmm11
+        mulps     %xmm1, %xmm9
+        andnps    %xmm0, %xmm11
+        addps     sPoly+80+__svml_slog1p_data_internal(%rip), %xmm9
+        mulps     %xmm1, %xmm9
+        addps     sPoly+64+__svml_slog1p_data_internal(%rip), %xmm9
+        mulps     %xmm1, %xmm9
+        addps     sPoly+48+__svml_slog1p_data_internal(%rip), %xmm9
+        mulps     %xmm1, %xmm9
+        addps     sPoly+32+__svml_slog1p_data_internal(%rip), %xmm9
+        mulps     %xmm1, %xmm9
+        addps     sPoly+16+__svml_slog1p_data_internal(%rip), %xmm9
+        mulps     %xmm1, %xmm9
+        addps     sPoly+__svml_slog1p_data_internal(%rip), %xmm9
+        mulps     %xmm1, %xmm9
+        mulps     %xmm1, %xmm9
+        addps     %xmm9, %xmm1
+        addps     %xmm10, %xmm1
+        orps      %xmm11, %xmm1
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm1
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movaps    %xmm1, %xmm0
+        addq      $72, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(80)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm0, 32(%rsp)
+        movups    %xmm1, 48(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 edx
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -64)
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -72)
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm1
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -64)
+        cfi_offset(13, -72)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm1
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      log1pf@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 48(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN4v_log1pf_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_slog1p_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(16)) VUINT32 SgnMask[4][1];
+        __declspec(align(16)) VUINT32 sOne[4][1];
+        __declspec(align(16)) VUINT32 sPoly[8][4][1];
+        __declspec(align(16)) VUINT32 iHiDelta[4][1];
+        __declspec(align(16)) VUINT32 iLoRange[4][1];
+        __declspec(align(16)) VUINT32 iBrkValue[4][1];
+        __declspec(align(16)) VUINT32 iOffExpoMask[4][1];
+        __declspec(align(16)) VUINT32 sLn2[4][1];
+} __svml_slog1p_data_internal;
+#endif
+__svml_slog1p_data_internal:
+        /*== SgnMask ==*/
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
+        /*== sOne = SP 1.0 ==*/
+        .align 16
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== sPoly[] = SP polynomial ==*/
+        .align 16
+        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
+        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
+        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
+        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
+        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
+        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
+        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
+        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
+        /*== iHiDelta = SP 80000000-7f000000 ==*/
+        .align 16
+        .long 0x01000000, 0x01000000, 0x01000000, 0x01000000
+        /*== iLoRange = SP 00800000+iHiDelta ==*/
+        .align 16
+        .long 0x01800000, 0x01800000, 0x01800000, 0x01800000
+        /*== iBrkValue = SP 2/3 ==*/
+        .align 16
+        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
+        /*== iOffExpoMask = SP significand mask ==*/
+        .align 16
+        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
+        /*== sLn2 = SP ln(2) ==*/
+        .align 16
+        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
+        .align 16
+        .type	__svml_slog1p_data_internal,@object
+        .size	__svml_slog1p_data_internal,.-__svml_slog1p_data_internal
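
[Editorial note, not part of the patch] As a reading aid for the SSE4 kernel above (and the AVX2 one below, which uses the same scheme with FMAs): the reduction and polynomial can be modeled in scalar C roughly as follows.  This is only a sketch; the coefficients are the sPoly values and the bit tricks mirror iBrkValue/iHiDelta/iLoRange, but the names are made up and flagged inputs are simply routed to scalar log1pf, the same way the SPECIAL_VALUES_BRANCH does per lane.

#include <math.h>
#include <stdint.h>
#include <string.h>

static float
log1pf_model (float x)
{
  if (x == 0.0f)
    return x;			/* keep the sign of -0; the vector code
				   ORs the sign of x back in at the end  */

  /* Compute 1+x as an exact high/low pair (Fast2Sum, |hi| >= |lo|).  */
  float hi = fmaxf (x, 1.0f), lo = fminf (x, 1.0f);
  float s = hi + lo;
  float t = lo + (hi - s);	/* rounding error of hi + lo  */

  uint32_t bits;
  memcpy (&bits, &s, sizeof bits);

  /* Special inputs: 1+x <= 0, denormal, huge, Inf or NaN.  Mirrors the
     iHiDelta/iLoRange signed compare; such lanes use scalar log1pf.  */
  if ((int32_t) (bits + 0x01000000u) < 0x01800000)
    return log1pf (x);

  /* Reduce s = 2^k * m with m in [2/3, 4/3), around iBrkValue = 2/3.  */
  uint32_t off = bits - 0x3f2aaaabu;
  int k = (int32_t) off >> 23;	/* arithmetic shift, like psrad  */
  uint32_t mbits = (off & 0x007fffffu) + 0x3f2aaaabu;
  float m, scale;
  memcpy (&m, &mbits, sizeof m);

  uint32_t sbits = 0x3f800000u - ((uint32_t) k << 23);	/* 2^-k  */
  memcpy (&scale, &sbits, sizeof scale);

  float r = (m - 1.0f) + t * scale;

  /* log (1+r) ~ r + r^2 * poly (r), coefficients from sPoly.  */
  float p = 0x1.1b09dap-3f;		/* P7  */
  p = p * r - 0x1.35b3c6p-3f;		/* P6  */
  p = p * r + 0x1.1f9624p-3f;		/* P5  */
  p = p * r - 0x1.515a6ep-3f;		/* P4  */
  p = p * r + 0x1.99c320p-3f;		/* P3  */
  p = p * r - 0x1.000b1cp-2f;		/* P2  */
  p = p * r + 0x1.555528p-2f;		/* P1  */
  p = p * r - 0.5f;			/* P0  */

  return (r + (p * r) * r) + (float) k * 0x1.62e430p-1f;	/* + k*ln2  */
}
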
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core-sse.S
new file mode 100644
index 0000000000..c0b97d89e6
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized log1pf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN8v_log1pf _ZGVdN8v_log1pf_sse_wrapper
+#include "../svml_s_log1pf8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core.c
new file mode 100644
index 0000000000..a2bbe37129
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized log1pf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN8v_log1pf
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN8v_log1pf, __GI__ZGVdN8v_log1pf,
+	       __redirect__ZGVdN8v_log1pf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core_avx2.S
new file mode 100644
index 0000000000..957dc23e3f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core_avx2.S
@@ -0,0 +1,254 @@
+/* Function log1pf vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
+ *    Get short reciprocal approximation Rcp ~ 1/xh
+ *    R = (Rcp*xh - 1.0) + Rcp*xl
+ *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
+ *       log(Rcp) is tabulated
+ *
+ *
+ */
+
+/* Offsets for data table __svml_slog1p_data_internal
+ */
+#define SgnMask                       	0
+#define sOne                          	32
+#define sPoly                         	64
+#define iHiDelta                      	320
+#define iLoRange                      	352
+#define iBrkValue                     	384
+#define iOffExpoMask                  	416
+#define sLn2                          	448
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN8v_log1pf_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+        vmovups   sOne+__svml_slog1p_data_internal(%rip), %ymm2
+
+/* reduction: compute r,n */
+        vmovups   iBrkValue+__svml_slog1p_data_internal(%rip), %ymm13
+        vmovups   SgnMask+__svml_slog1p_data_internal(%rip), %ymm4
+        vmovups   iLoRange+__svml_slog1p_data_internal(%rip), %ymm8
+        vmovaps   %ymm0, %ymm3
+
+/* compute 1+x as high, low parts */
+        vmaxps    %ymm3, %ymm2, %ymm5
+        vminps    %ymm3, %ymm2, %ymm6
+        vaddps    %ymm6, %ymm5, %ymm10
+        vpsubd    %ymm13, %ymm10, %ymm11
+
+/* check argument value ranges */
+        vpaddd    iHiDelta+__svml_slog1p_data_internal(%rip), %ymm10, %ymm9
+        vsubps    %ymm10, %ymm5, %ymm7
+        vpsrad    $23, %ymm11, %ymm14
+        vpand     iOffExpoMask+__svml_slog1p_data_internal(%rip), %ymm11, %ymm12
+        vpslld    $23, %ymm14, %ymm15
+        vcvtdq2ps %ymm14, %ymm0
+        vpsubd    %ymm15, %ymm2, %ymm14
+        vandnps   %ymm3, %ymm4, %ymm1
+        vaddps    %ymm7, %ymm6, %ymm4
+        vpaddd    %ymm13, %ymm12, %ymm6
+        vmulps    %ymm4, %ymm14, %ymm7
+
+/* polynomial evaluation */
+        vsubps    %ymm2, %ymm6, %ymm2
+        vpcmpgtd  %ymm9, %ymm8, %ymm5
+        vmovups   sPoly+224+__svml_slog1p_data_internal(%rip), %ymm8
+        vaddps    %ymm2, %ymm7, %ymm9
+        vfmadd213ps sPoly+192+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
+        vfmadd213ps sPoly+160+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
+        vfmadd213ps sPoly+128+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
+        vfmadd213ps sPoly+96+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
+        vfmadd213ps sPoly+64+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
+        vfmadd213ps sPoly+32+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
+        vfmadd213ps sPoly+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
+        vmulps    %ymm8, %ymm9, %ymm10
+        vfmadd213ps %ymm9, %ymm9, %ymm10
+
+/* final reconstruction */
+        vfmadd132ps sLn2+__svml_slog1p_data_internal(%rip), %ymm10, %ymm0
+
+/* combine and get argument value range mask */
+        vmovmskps %ymm5, %edx
+        vorps     %ymm1, %ymm0, %ymm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm3
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %ymm3, 32(%rsp)
+        vmovups   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      log1pf@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 64(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN8v_log1pf_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_slog1p_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(32)) VUINT32 SgnMask[8][1];
+        __declspec(align(32)) VUINT32 sOne[8][1];
+        __declspec(align(32)) VUINT32 sPoly[8][8][1];
+        __declspec(align(32)) VUINT32 iHiDelta[8][1];
+        __declspec(align(32)) VUINT32 iLoRange[8][1];
+        __declspec(align(32)) VUINT32 iBrkValue[8][1];
+        __declspec(align(32)) VUINT32 iOffExpoMask[8][1];
+        __declspec(align(32)) VUINT32 sLn2[8][1];
+} __svml_slog1p_data_internal;
+#endif
+__svml_slog1p_data_internal:
+        /*== SgnMask ==*/
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
+        /*== sOne = SP 1.0 ==*/
+        .align 32
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== sPoly[] = SP polynomial ==*/
+        .align 32
+        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
+        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
+        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
+        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
+        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
+        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
+        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
+        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
+        /*== iHiDelta = SP 80000000-7f000000 ==*/
+        .align 32
+        .long 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000
+        /*== iLoRange = SP 00800000+iHiDelta ==*/
+        .align 32
+        .long 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000
+        /*== iBrkValue = SP 2/3 ==*/
+        .align 32
+        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
+        /*== iOffExpoMask = SP significand mask ==*/
+        .align 32
+        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
+        /*== sLn2 = SP ln(2) ==*/
+        .align 32
+        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
+        .align 32
+        .type	__svml_slog1p_data_internal,@object
+        .size	__svml_slog1p_data_internal,.-__svml_slog1p_data_internal
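
[Editorial note, not part of the patch] The SPECIAL_VALUES_BRANCH / RANGEMASK_CHECK / SCALAR_MATH_CALL scaffolding is the same in both kernels: the fast path leaves a per-lane bit mask in %edx, and flagged lanes are recomputed one at a time with the scalar libm routine after the input and result vectors are spilled to the stack.  In scalar C the loop amounts to the sketch below (VLEN and the function name are illustrative; the AVX2 float kernel has 8 lanes, the SSE4 one 4):

#include <math.h>

#define VLEN 8			/* lanes in the AVX2 float kernel  */

static void
fixup_special_lanes (const float in[VLEN], float out[VLEN],
		     unsigned int range_mask)
{
  /* Each set bit marks a lane whose fast-path result is not valid;
     only those lanes pay for a scalar libm call.  */
  for (int lane = 0; lane < VLEN; lane++)
    if (range_mask & (1u << lane))
      out[lane] = log1pf (in[lane]);
}

In the assembly, %r13d holds the mask, %r12d the lane index, and the callee-saved registers spilled around this loop are what the extra cfi_escape/DW_CFA_expression annotations describe.
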
diff --git a/sysdeps/x86_64/fpu/svml_d_log1p2_core.S b/sysdeps/x86_64/fpu/svml_d_log1p2_core.S
new file mode 100644
index 0000000000..e3f01717d9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_log1p2_core.S
@@ -0,0 +1,29 @@
+/* Function log1p vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN2v_log1p)
+WRAPPER_IMPL_SSE2 log1p
+END (_ZGVbN2v_log1p)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN2v_log1p)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_log1p4_core.S b/sysdeps/x86_64/fpu/svml_d_log1p4_core.S
new file mode 100644
index 0000000000..49beb96183
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_log1p4_core.S
@@ -0,0 +1,29 @@
+/* Function log1p vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN4v_log1p)
+WRAPPER_IMPL_AVX _ZGVbN2v_log1p
+END (_ZGVdN4v_log1p)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN4v_log1p)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S
new file mode 100644
index 0000000000..8b89768b7c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S
@@ -0,0 +1,25 @@
+/* Function log1p vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN4v_log1p)
+WRAPPER_IMPL_AVX _ZGVbN2v_log1p
+END (_ZGVcN4v_log1p)
diff --git a/sysdeps/x86_64/fpu/svml_d_log1p8_core.S b/sysdeps/x86_64/fpu/svml_d_log1p8_core.S
new file mode 100644
index 0000000000..54b4d4ede8
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_log1p8_core.S
@@ -0,0 +1,25 @@
+/* Function log1p vectorized with AVX-512, wrapper to AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN8v_log1p)
+WRAPPER_IMPL_AVX512 _ZGVdN4v_log1p
+END (_ZGVeN8v_log1p)
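
[Editorial note, not part of the patch] The plain (non-multiarch) entry points in these files are thin wrappers: WRAPPER_IMPL_AVX and WRAPPER_IMPL_AVX512, from the existing svml_{d,s}_wrapper_impl.h headers, build a wider vector function out of two calls to the next narrower one.  Roughly, and with illustrative names standing in for the real symbols, the AVX-512-over-AVX2 double case does:

/* Sketch of the split-and-recombine idea; the real wrappers are
   assembly macros, not this C.  Requires AVX-512F to compile.  */
#include <immintrin.h>

extern __m256d narrow_kernel (__m256d);	/* stands in for _ZGVdN4v_log1p  */

__m512d
wide_wrapper (__m512d x)		/* stands in for _ZGVeN8v_log1p  */
{
  __m256d lo = _mm512_castpd512_pd256 (x);
  __m256d hi = _mm512_extractf64x4_pd (x, 1);
  __m256d rlo = narrow_kernel (lo);
  __m256d rhi = narrow_kernel (hi);
  return _mm512_insertf64x4 (_mm512_castpd256_pd512 (rlo), rhi, 1);
}
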
diff --git a/sysdeps/x86_64/fpu/svml_s_log1pf16_core.S b/sysdeps/x86_64/fpu/svml_s_log1pf16_core.S
new file mode 100644
index 0000000000..2c953d00fb
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_log1pf16_core.S
@@ -0,0 +1,25 @@
+/* Function log1pf vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN16v_log1pf)
+WRAPPER_IMPL_AVX512 _ZGVdN8v_log1pf
+END (_ZGVeN16v_log1pf)
diff --git a/sysdeps/x86_64/fpu/svml_s_log1pf4_core.S b/sysdeps/x86_64/fpu/svml_s_log1pf4_core.S
new file mode 100644
index 0000000000..6f68762eaa
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_log1pf4_core.S
@@ -0,0 +1,29 @@
+/* Function log1pf vectorized with SSE2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN4v_log1pf)
+WRAPPER_IMPL_SSE2 log1pf
+END (_ZGVbN4v_log1pf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4v_log1pf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_log1pf8_core.S b/sysdeps/x86_64/fpu/svml_s_log1pf8_core.S
new file mode 100644
index 0000000000..74f81283b1
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_log1pf8_core.S
@@ -0,0 +1,29 @@
+/* Function log1pf vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN8v_log1pf)
+WRAPPER_IMPL_AVX _ZGVbN4v_log1pf
+END (_ZGVdN8v_log1pf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8v_log1pf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S
new file mode 100644
index 0000000000..f33be0e904
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S
@@ -0,0 +1,25 @@
+/* Function log1pf vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN8v_log1pf)
+WRAPPER_IMPL_AVX _ZGVbN4v_log1pf
+END (_ZGVcN8v_log1pf)
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx.c
new file mode 100644
index 0000000000..18aa6aaeaa
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-log1p.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx2.c
new file mode 100644
index 0000000000..18aa6aaeaa
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx2.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-log1p.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx512f.c
new file mode 100644
index 0000000000..18aa6aaeaa
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx512f.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-log1p.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log1p.c b/sysdeps/x86_64/fpu/test-double-libmvec-log1p.c
new file mode 100644
index 0000000000..40937f987a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-log1p.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE double
+#define LIBMVEC_FUNC log1p
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index 08c91ff634..38359b05e3 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVbN2v_cbrt)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVbN2vv_atan2)
 VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVbN2v_log10)
 VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVbN2v_log2)
+VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVbN2v_log1p)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index a2fb0de309..17701e7731 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -43,6 +43,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVdN4v_cbrt)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVdN4vv_atan2)
 VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVdN4v_log10)
 VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVdN4v_log2)
+VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVdN4v_log1p)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m256i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index dc65a4ee25..bba62b2446 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVcN4v_cbrt)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVcN4vv_atan2)
 VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVcN4v_log10)
 VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVcN4v_log2)
+VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVcN4v_log1p)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index 253ee8c906..8a04e13a07 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVeN8v_cbrt)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVeN8vv_atan2)
 VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVeN8v_log10)
 VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVeN8v_log2)
+VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVeN8v_log1p)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx.c
new file mode 100644
index 0000000000..3395decaf4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-log1pf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx2.c
new file mode 100644
index 0000000000..3395decaf4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx2.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-log1pf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx512f.c
new file mode 100644
index 0000000000..3395decaf4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx512f.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-log1pf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log1pf.c b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf.c
new file mode 100644
index 0000000000..1b36069ded
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE float
+#define LIBMVEC_FUNC log1pf
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index 1c7db5146c..706f52c618 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVeN16v_cbrtf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVeN16vv_atan2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVeN16v_log10f)
 VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVeN16v_log2f)
+VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVeN16v_log1pf)
 
 #define VEC_INT_TYPE __m512i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index 8ec51603b3..ceace4c53a 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVbN4v_cbrtf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVbN4vv_atan2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVbN4v_log10f)
 VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVbN4v_log2f)
+VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVbN4v_log1pf)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index 1cb4553c7a..06a4753409 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -43,6 +43,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVdN8v_cbrtf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVdN8vv_atan2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVdN8v_log10f)
 VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVdN8v_log2f)
+VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVdN8v_log1pf)
 
 /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
 #undef VECTOR_WRAPPER_fFF
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index 6ecc1792bb..a87e5298e0 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVcN8v_cbrtf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVcN8vv_atan2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVcN8v_log10f)
 VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVcN8v_log2f)
+VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVcN8v_log1pf)
 
 #define VEC_INT_TYPE __m128i
 
-- 
2.31.1



* [PATCH v5 14/18] x86-64: Add vector atanh/atanhf implementation to libmvec
  2021-12-29  6:39 [PATCH v5 00/18] x86-64: Add vector math functions to libmvec Sunil K Pandey
                   ` (12 preceding siblings ...)
  2021-12-29  6:39 ` [PATCH v5 13/18] x86-64: Add vector log1p/log1pf " Sunil K Pandey
@ 2021-12-29  6:39 ` Sunil K Pandey
  2021-12-29 21:26   ` H.J. Lu
  2021-12-29  6:39 ` [PATCH v5 15/18] x86-64: Add vector acosh/acoshf " Sunil K Pandey
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Sunil K Pandey @ 2021-12-29  6:39 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, andrey.kolesov, marius.cornea

Implement vectorized atanh/atanhf, providing SSE, AVX, AVX2 and AVX512
versions for libmvec as per the vector ABI.  The patch also contains
accuracy and ABI tests for vector atanh/atanhf with regenerated ulps.
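
[Editorial note, not part of the patch] All of the kernels below follow the identity documented in their ALGORITHM DESCRIPTION comments, atanh(x) = 0.5 * log((1 + x)/(1 - x)), with |x| >= 1, NaN and Inf handled on a special path.  A scalar reference with the same special-case behavior is simply the sketch below; the vector kernels get their accuracy from the tabulated log (Log_HA_table/Log_LA_table) and careful splitting rather than from this naive form.

#include <math.h>

static double
atanh_ref (double x)
{
  if (x == 0.0)
    return x;		/* preserve atanh (-0) == -0  */
  /* The division produces the documented special cases by itself:
     +-1 -> +-Inf, |x| > 1 -> NaN, NaN/Inf -> NaN.  */
  return 0.5 * log ((1.0 + x) / (1.0 - x));
}
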
---
 bits/libm-simd-decl-stubs.h                   |   11 +
 math/bits/mathcalls.h                         |    2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
 sysdeps/x86/fpu/bits/math-vector.h            |    4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
 sysdeps/x86_64/fpu/Makeconfig                 |    1 +
 sysdeps/x86_64/fpu/Versions                   |    2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |   20 +
 .../fpu/multiarch/svml_d_atanh2_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_atanh2_core.c |   27 +
 .../fpu/multiarch/svml_d_atanh2_core_sse4.S   | 1519 +++++++++++++++++
 .../fpu/multiarch/svml_d_atanh4_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_atanh4_core.c |   27 +
 .../fpu/multiarch/svml_d_atanh4_core_avx2.S   | 1479 ++++++++++++++++
 .../fpu/multiarch/svml_d_atanh8_core-avx2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_atanh8_core.c |   27 +
 .../fpu/multiarch/svml_d_atanh8_core_avx512.S |  401 +++++
 .../fpu/multiarch/svml_s_atanhf16_core-avx2.S |   20 +
 .../fpu/multiarch/svml_s_atanhf16_core.c      |   28 +
 .../multiarch/svml_s_atanhf16_core_avx512.S   |  393 +++++
 .../fpu/multiarch/svml_s_atanhf4_core-sse2.S  |   20 +
 .../fpu/multiarch/svml_s_atanhf4_core.c       |   28 +
 .../fpu/multiarch/svml_s_atanhf4_core_sse4.S  |  361 ++++
 .../fpu/multiarch/svml_s_atanhf8_core-sse.S   |   20 +
 .../fpu/multiarch/svml_s_atanhf8_core.c       |   28 +
 .../fpu/multiarch/svml_s_atanhf8_core_avx2.S  |  335 ++++
 sysdeps/x86_64/fpu/svml_d_atanh2_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_atanh4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_atanh4_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_d_atanh8_core.S       |   25 +
 sysdeps/x86_64/fpu/svml_s_atanhf16_core.S     |   25 +
 sysdeps/x86_64/fpu/svml_s_atanhf4_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_atanhf8_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_atanhf8_core_avx.S  |   25 +
 .../fpu/test-double-libmvec-atanh-avx.c       |    1 +
 .../fpu/test-double-libmvec-atanh-avx2.c      |    1 +
 .../fpu/test-double-libmvec-atanh-avx512f.c   |    1 +
 .../x86_64/fpu/test-double-libmvec-atanh.c    |    3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
 .../fpu/test-float-libmvec-atanhf-avx.c       |    1 +
 .../fpu/test-float-libmvec-atanhf-avx2.c      |    1 +
 .../fpu/test-float-libmvec-atanhf-avx512f.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-atanhf.c    |    3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
 50 files changed, 5060 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atanh2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atanh4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atanh4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_atanh8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atanhf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atanhf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atanhf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_atanhf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atanh.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanhf.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
index 845246fab9..bb7380a446 100644
--- a/bits/libm-simd-decl-stubs.h
+++ b/bits/libm-simd-decl-stubs.h
@@ -252,4 +252,15 @@
 #define __DECL_SIMD_log1pf32x
 #define __DECL_SIMD_log1pf64x
 #define __DECL_SIMD_log1pf128x
+
+#define __DECL_SIMD_atanh
+#define __DECL_SIMD_atanhf
+#define __DECL_SIMD_atanhl
+#define __DECL_SIMD_atanhf16
+#define __DECL_SIMD_atanhf32
+#define __DECL_SIMD_atanhf64
+#define __DECL_SIMD_atanhf128
+#define __DECL_SIMD_atanhf32x
+#define __DECL_SIMD_atanhf64x
+#define __DECL_SIMD_atanhf128x
 #endif
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index aa4bc61aa4..04dd9c5d1b 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -86,7 +86,7 @@ __MATHCALL (acosh,, (_Mdouble_ __x));
 /* Hyperbolic arc sine of X.  */
 __MATHCALL (asinh,, (_Mdouble_ __x));
 /* Hyperbolic arc tangent of X.  */
-__MATHCALL (atanh,, (_Mdouble_ __x));
+__MATHCALL_VEC (atanh,, (_Mdouble_ __x));
 #endif
 
 /* Exponential and logarithmic functions.  */
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index 68b940606a..2d389912b1 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -49,6 +49,7 @@ GLIBC_2.22 _ZGVeN8vvv_sincos F
 GLIBC_2.35 _ZGVbN2v_acos F
 GLIBC_2.35 _ZGVbN2v_asin F
 GLIBC_2.35 _ZGVbN2v_atan F
+GLIBC_2.35 _ZGVbN2v_atanh F
 GLIBC_2.35 _ZGVbN2v_cbrt F
 GLIBC_2.35 _ZGVbN2v_cosh F
 GLIBC_2.35 _ZGVbN2v_exp10 F
@@ -63,6 +64,7 @@ GLIBC_2.35 _ZGVbN2vv_hypot F
 GLIBC_2.35 _ZGVbN4v_acosf F
 GLIBC_2.35 _ZGVbN4v_asinf F
 GLIBC_2.35 _ZGVbN4v_atanf F
+GLIBC_2.35 _ZGVbN4v_atanhf F
 GLIBC_2.35 _ZGVbN4v_cbrtf F
 GLIBC_2.35 _ZGVbN4v_coshf F
 GLIBC_2.35 _ZGVbN4v_exp10f F
@@ -77,6 +79,7 @@ GLIBC_2.35 _ZGVbN4vv_hypotf F
 GLIBC_2.35 _ZGVcN4v_acos F
 GLIBC_2.35 _ZGVcN4v_asin F
 GLIBC_2.35 _ZGVcN4v_atan F
+GLIBC_2.35 _ZGVcN4v_atanh F
 GLIBC_2.35 _ZGVcN4v_cbrt F
 GLIBC_2.35 _ZGVcN4v_cosh F
 GLIBC_2.35 _ZGVcN4v_exp10 F
@@ -91,6 +94,7 @@ GLIBC_2.35 _ZGVcN4vv_hypot F
 GLIBC_2.35 _ZGVcN8v_acosf F
 GLIBC_2.35 _ZGVcN8v_asinf F
 GLIBC_2.35 _ZGVcN8v_atanf F
+GLIBC_2.35 _ZGVcN8v_atanhf F
 GLIBC_2.35 _ZGVcN8v_cbrtf F
 GLIBC_2.35 _ZGVcN8v_coshf F
 GLIBC_2.35 _ZGVcN8v_exp10f F
@@ -105,6 +109,7 @@ GLIBC_2.35 _ZGVcN8vv_hypotf F
 GLIBC_2.35 _ZGVdN4v_acos F
 GLIBC_2.35 _ZGVdN4v_asin F
 GLIBC_2.35 _ZGVdN4v_atan F
+GLIBC_2.35 _ZGVdN4v_atanh F
 GLIBC_2.35 _ZGVdN4v_cbrt F
 GLIBC_2.35 _ZGVdN4v_cosh F
 GLIBC_2.35 _ZGVdN4v_exp10 F
@@ -119,6 +124,7 @@ GLIBC_2.35 _ZGVdN4vv_hypot F
 GLIBC_2.35 _ZGVdN8v_acosf F
 GLIBC_2.35 _ZGVdN8v_asinf F
 GLIBC_2.35 _ZGVdN8v_atanf F
+GLIBC_2.35 _ZGVdN8v_atanhf F
 GLIBC_2.35 _ZGVdN8v_cbrtf F
 GLIBC_2.35 _ZGVdN8v_coshf F
 GLIBC_2.35 _ZGVdN8v_exp10f F
@@ -133,6 +139,7 @@ GLIBC_2.35 _ZGVdN8vv_hypotf F
 GLIBC_2.35 _ZGVeN16v_acosf F
 GLIBC_2.35 _ZGVeN16v_asinf F
 GLIBC_2.35 _ZGVeN16v_atanf F
+GLIBC_2.35 _ZGVeN16v_atanhf F
 GLIBC_2.35 _ZGVeN16v_cbrtf F
 GLIBC_2.35 _ZGVeN16v_coshf F
 GLIBC_2.35 _ZGVeN16v_exp10f F
@@ -147,6 +154,7 @@ GLIBC_2.35 _ZGVeN16vv_hypotf F
 GLIBC_2.35 _ZGVeN8v_acos F
 GLIBC_2.35 _ZGVeN8v_asin F
 GLIBC_2.35 _ZGVeN8v_atan F
+GLIBC_2.35 _ZGVeN8v_atanh F
 GLIBC_2.35 _ZGVeN8v_cbrt F
 GLIBC_2.35 _ZGVeN8v_cosh F
 GLIBC_2.35 _ZGVeN8v_exp10 F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index 14c9db3bb3..4937b6811f 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -114,6 +114,10 @@
 #  define __DECL_SIMD_log1p __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_log1pf
 #  define __DECL_SIMD_log1pf __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_atanh
+#  define __DECL_SIMD_atanh __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_atanhf
+#  define __DECL_SIMD_atanhf __DECL_SIMD_x86_64
 
 # endif
 #endif
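
[Editorial note, not part of the patch] The math-vector.h and libm-simd-decl-stubs.h changes are what let the compiler actually use the new entry points: with fast-math and OpenMP SIMD enabled, a loop over atanh can be vectorized into calls to the _ZGVbN2v_/_ZGVdN4v_/_ZGVeN8v_ symbols added to the abilist.  A small usage sketch (the compiler options and the exact symbol chosen are illustrative and depend on target and GCC version):

/* Built with something like: gcc -O2 -ffast-math -fopenmp-simd  */
#include <math.h>

void
apply_atanh (double *restrict out, const double *restrict in, int n)
{
  /* With the "omp declare simd" declaration that math-vector.h
     attaches to atanh, GCC may call _ZGVdN4v_atanh (or another
     vector variant) here instead of scalar atanh.  */
#pragma omp simd
  for (int i = 0; i < n; i++)
    out[i] = atanh (in[i]);
}
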
diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
index 3dca196432..da39c08ba9 100644
--- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
@@ -56,6 +56,8 @@
 !GCC$ builtin (log2f) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (log1p) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (log1pf) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (atanh) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (atanhf) attributes simd (notinbranch) if('x86_64')
 
 !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
@@ -97,3 +99,5 @@
 !GCC$ builtin (log2f) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (log1p) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (log1pf) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (atanh) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (atanhf) attributes simd (notinbranch) if('x32')
diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
index 378cb06d37..de87544259 100644
--- a/sysdeps/x86_64/fpu/Makeconfig
+++ b/sysdeps/x86_64/fpu/Makeconfig
@@ -26,6 +26,7 @@ libmvec-funcs = \
   asin \
   atan \
   atan2 \
+  atanh \
   cbrt \
   cos \
   cosh \
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index 155fb115f3..df0ea83711 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -17,6 +17,7 @@ libmvec {
     _ZGVbN2v_acos; _ZGVcN4v_acos; _ZGVdN4v_acos; _ZGVeN8v_acos;
     _ZGVbN2v_asin; _ZGVcN4v_asin; _ZGVdN4v_asin; _ZGVeN8v_asin;
     _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan;
+    _ZGVbN2v_atanh; _ZGVcN4v_atanh; _ZGVdN4v_atanh; _ZGVeN8v_atanh;
     _ZGVbN2v_cbrt; _ZGVcN4v_cbrt; _ZGVdN4v_cbrt; _ZGVeN8v_cbrt;
     _ZGVbN2v_cosh; _ZGVcN4v_cosh; _ZGVdN4v_cosh; _ZGVeN8v_cosh;
     _ZGVbN2v_exp10; _ZGVcN4v_exp10; _ZGVdN4v_exp10; _ZGVeN8v_exp10;
@@ -31,6 +32,7 @@ libmvec {
     _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
     _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
     _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf;
+    _ZGVbN4v_atanhf; _ZGVcN8v_atanhf; _ZGVdN8v_atanhf; _ZGVeN16v_atanhf;
     _ZGVbN4v_cbrtf; _ZGVcN8v_cbrtf; _ZGVdN8v_cbrtf; _ZGVeN16v_cbrtf;
     _ZGVbN4v_coshf; _ZGVcN8v_coshf; _ZGVdN8v_coshf; _ZGVeN16v_coshf;
     _ZGVbN4v_exp10f; _ZGVcN8v_exp10f; _ZGVdN8v_exp10f; _ZGVeN16v_exp10f;
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index a2b15a795b..09a46190b6 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -248,6 +248,26 @@ float: 3
 float128: 4
 ldouble: 5
 
+Function: "atanh_vlen16":
+float: 1
+
+Function: "atanh_vlen2":
+double: 1
+
+Function: "atanh_vlen4":
+double: 1
+float: 1
+
+Function: "atanh_vlen4_avx2":
+double: 1
+
+Function: "atanh_vlen8":
+double: 1
+float: 1
+
+Function: "atanh_vlen8_avx2":
+float: 1
+
 Function: "cabs":
 double: 1
 float128: 1
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core-sse2.S
new file mode 100644
index 0000000000..b154ab8649
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized atanh, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN2v_atanh _ZGVbN2v_atanh_sse2
+#include "../svml_d_atanh2_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core.c
new file mode 100644
index 0000000000..138190e568
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized atanh, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN2v_atanh
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN2v_atanh, __GI__ZGVbN2v_atanh, __redirect__ZGVbN2v_atanh)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core_sse4.S
new file mode 100644
index 0000000000..7e70b036f7
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core_sse4.S
@@ -0,0 +1,1519 @@
+/* Function atanh vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute atanh(x) as 0.5 * log((1 + x)/(1 - x))
+ *
+ *   Special cases:
+ *
+ *   atanh(0)  = 0
+ *   atanh(+1) = +INF
+ *   atanh(-1) = -INF
+ *   atanh(x)  = NaN if |x| > 1, or if x is a NaN or INF
+ *
+ */
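+
+/*
+ * For reference, a minimal scalar C sketch of the same formulation
+ * (illustrative only: atanh_ref is not a name used by this patch and the
+ * vector code below does not call it; it needs <math.h>):
+ *
+ *   static double atanh_ref (double x)
+ *   {
+ *     if (isnan (x) || fabs (x) > 1.0)
+ *       return NAN;                       // NaN, INF and |x| > 1 -> NaN
+ *     if (fabs (x) == 1.0)
+ *       return copysign (INFINITY, x);    // atanh (+-1) = +-INF
+ *     if (x == 0.0)
+ *       return x;                         // atanh (+-0) = +-0
+ *     return 0.5 * log1p (2.0 * x / (1.0 - x));   // == 0.5 * log ((1 + x) / (1 - x))
+ *   }
+ */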
+
+/* Offsets for data table __svml_datanh_data_internal
+ */
+#define Log_HA_table                  	0
+#define Log_LA_table                  	8208
+#define poly_coeff                    	12320
+#define ExpMask                       	12384
+#define Two10                         	12400
+#define MinLog1p                      	12416
+#define MaxLog1p                      	12432
+#define One                           	12448
+#define SgnMask                       	12464
+#define XThreshold                    	12480
+#define XhMask                        	12496
+#define Threshold                     	12512
+#define Bias                          	12528
+#define Bias1                         	12544
+#define ExpMask0                      	12560
+#define ExpMask2                      	12576
+#define L2                            	12592
+#define dHalf                         	12608
+#define dSign                         	12624
+#define dTopMask12                    	12640
+#define dTopMask41                    	12656
+#define TinyRange                     	12672
+
+/* Lookup bias for data table __svml_datanh_data_internal.  */
+#define Table_Lookup_Bias               -0x405ff0
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN2v_atanh_sse4)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $64, %rsp
+        movaps    %xmm0, %xmm12
+        movups    SgnMask+__svml_datanh_data_internal(%rip), %xmm7
+        lea       Table_Lookup_Bias+__svml_datanh_data_internal(%rip), %rsi
+
+/* Load the constant 1 and a sign mask */
+        movups    One+__svml_datanh_data_internal(%rip), %xmm11
+
+/* Strip off the sign, so treat X as positive until right at the end */
+        movaps    %xmm7, %xmm14
+        andps     %xmm12, %xmm14
+        movaps    %xmm11, %xmm15
+        subpd     %xmm14, %xmm15
+        movups    dTopMask41+__svml_datanh_data_internal(%rip), %xmm2
+        movaps    %xmm11, %xmm5
+        movaps    %xmm2, %xmm0
+
+/*
+ * Compute V = 2 * X trivially, and UHi + U_lo = 1 - X in two pieces,
+ * the upper part UHi being <= 41 bits long. Then we have
+ * atanh(X) = 1/2 * log((1 + X) / (1 - X)) = 1/2 * log1p(V / (UHi + ULo)).
+ */
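+/*
+ * (Why log1p applies: (1 + X) / (1 - X) = 1 + 2 * X / (1 - X), hence
+ * log((1 + X) / (1 - X)) = log1p(2 * X / (1 - X)) = log1p(V / (UHi + ULo)).)
+ */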
+        movaps    %xmm14, %xmm6
+        andps     %xmm15, %xmm0
+
+/*
+ * Check whether |X| < 1, in which case we use the main function.
+ * Otherwise set the rangemask so that the callout will get used.
+ * Note that this will also use the callout for NaNs since not(NaN < 1).
+ */
+        movaps    %xmm14, %xmm13
+
+/*
+ * Now compute R = 1/(UHi+ULo) * (1 - E) and the error term E
+ * The first FMR is exact (we force R to 12 bits just in case it
+ * isn't already, to make absolutely sure), and since E is ~ 2^-12,
+ * the rounding error in the other one is acceptable.
+ */
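+/*
+ * (Equivalently: with E = 1 - R * UHi - R * ULo, 1/(UHi + ULo) = R / (1 - E)
+ * = R * (1 + E + E^2 + ...); since |E| is roughly 2^-12, truncating the
+ * series after E^5 below leaves an error well below double precision.)
+ */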
+        cvtpd2ps  %xmm0, %xmm1
+        subpd     %xmm15, %xmm5
+        addpd     %xmm14, %xmm6
+        subpd     %xmm0, %xmm15
+        cmpnltpd  %xmm11, %xmm13
+        subpd     %xmm14, %xmm5
+        movmskpd  %xmm13, %edx
+        movlhps   %xmm1, %xmm1
+        movaps    %xmm14, %xmm9
+        rcpps     %xmm1, %xmm4
+        addpd     %xmm15, %xmm5
+        cmpltpd   TinyRange+__svml_datanh_data_internal(%rip), %xmm9
+        cvtps2pd  %xmm4, %xmm14
+        andps     dTopMask12+__svml_datanh_data_internal(%rip), %xmm14
+        movaps    %xmm11, %xmm13
+        mulpd     %xmm14, %xmm0
+        mulpd     %xmm14, %xmm5
+        subpd     %xmm0, %xmm13
+
+/*
+ * Split V as well into upper 41 bits and lower part, so that we can get
+ * a preliminary quotient estimate without rounding error.
+ */
+        andps     %xmm6, %xmm2
+
+/*
+ * Now we feed into the log1p code, using H in place of _VARG1 and
+ * later incorporating L into the reduced argument.
+ * compute 1+x as high, low parts
+ */
+        movaps    %xmm11, %xmm0
+        subpd     %xmm5, %xmm13
+        subpd     %xmm2, %xmm6
+
+/* Hence get initial quotient estimate QHi + QLo = R * VHi + R * VLo */
+        mulpd     %xmm14, %xmm2
+        mulpd     %xmm6, %xmm14
+
+/*
+ * Compute D = E + E^2 + E^3 + E^4 + E^5
+ * = E + (E + E^2) (E + E * E^2)
+ */
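+/*
+ * (Check: (E + E^2) * (E + E * E^2) = E^2 + E^3 + E^4 + E^5, so adding E
+ * back gives the degree-5 truncation using three multiplies and three adds.)
+ */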
+        movaps    %xmm13, %xmm6
+        movaps    %xmm13, %xmm3
+        mulpd     %xmm13, %xmm6
+        mulpd     %xmm6, %xmm3
+        addpd     %xmm13, %xmm6
+        addpd     %xmm13, %xmm3
+        mulpd     %xmm3, %xmm6
+        addpd     %xmm6, %xmm13
+
+/*
+ * Compute R * (VHi + VLo) * (1 + E + E^2 + E^3 + E^4 + E^5)
+ * = R *  (VHi + VLo) * (1 + D)
+ * = QHi + (QHi * D + QLo + QLo * D)
+ */
+        movaps    %xmm13, %xmm1
+        movaps    %xmm11, %xmm5
+        mulpd     %xmm14, %xmm13
+        mulpd     %xmm2, %xmm1
+        addpd     %xmm13, %xmm14
+        addpd     %xmm14, %xmm1
+
+/*
+ * Now finally accumulate the high and low parts of the
+ * argument to log1p, H + L, with a final compensated summation.
+ */
+        addpd     %xmm1, %xmm2
+        maxpd     %xmm2, %xmm0
+        minpd     %xmm2, %xmm5
+        andps     %xmm7, %xmm2
+        movaps    %xmm0, %xmm4
+        cmpltpd   XThreshold+__svml_datanh_data_internal(%rip), %xmm2
+        addpd     %xmm5, %xmm4
+        orps      XhMask+__svml_datanh_data_internal(%rip), %xmm2
+        movaps    %xmm12, %xmm10
+
+/* preserve mantissa, set input exponent to 2^(-10) */
+        movups    ExpMask+__svml_datanh_data_internal(%rip), %xmm7
+        andps     %xmm2, %xmm4
+        andps     %xmm4, %xmm7
+
+/* exponent bits */
+        movaps    %xmm4, %xmm6
+        orps      Two10+__svml_datanh_data_internal(%rip), %xmm7
+        psrlq     $20, %xmm6
+
+/* reciprocal approximation good to at least 11 bits */
+        cvtpd2ps  %xmm7, %xmm1
+        subpd     %xmm4, %xmm0
+        mulpd     %xmm12, %xmm10
+        addpd     %xmm0, %xmm5
+        addpd     %xmm12, %xmm10
+        movlhps   %xmm1, %xmm1
+        rcpps     %xmm1, %xmm15
+        cvtps2pd  %xmm15, %xmm3
+
+/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
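+/* (Presumably the usual add/subtract-a-large-constant idiom: the reciprocal
+   of the 2^(-10)-scaled mantissa lies in (2^9, 2^10], so rounding it to the
+   nearest integer via .FLT_21 leaves at most 1+9 significant mantissa bits.) */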
+        movups    .FLT_21(%rip), %xmm1
+        addpd     %xmm1, %xmm3
+        subpd     %xmm1, %xmm3
+
+/* exponent of X needed to scale Xl */
+        movdqu    ExpMask0+__svml_datanh_data_internal(%rip), %xmm0
+
+/*
+ * prepare table index
+ * table lookup
+ */
+        movaps    %xmm3, %xmm13
+
+/* 2^ (-10-exp(X) ) */
+        movdqu    ExpMask2+__svml_datanh_data_internal(%rip), %xmm2
+        pand      %xmm4, %xmm0
+        psubq     %xmm0, %xmm2
+
+/* scale DblRcp */
+        mulpd     %xmm3, %xmm2
+
+/* argument reduction */
+        mulpd     %xmm2, %xmm4
+        mulpd     %xmm2, %xmm5
+        subpd     %xmm11, %xmm4
+        addpd     %xmm5, %xmm4
+
+/* polynomial */
+        movups    poly_coeff+__svml_datanh_data_internal(%rip), %xmm11
+        psrlq     $40, %xmm13
+        mulpd     %xmm4, %xmm11
+        movd      %xmm13, %eax
+        pshufd    $221, %xmm6, %xmm7
+
+/* exponent*log(2.0) */
+        movups    Threshold+__svml_datanh_data_internal(%rip), %xmm6
+        cmpltpd   %xmm3, %xmm6
+        addpd     poly_coeff+16+__svml_datanh_data_internal(%rip), %xmm11
+
+/* biased exponent in DP format */
+        cvtdq2pd  %xmm7, %xmm1
+        movaps    %xmm4, %xmm3
+        mulpd     %xmm4, %xmm3
+        movups    poly_coeff+32+__svml_datanh_data_internal(%rip), %xmm2
+        mulpd     %xmm4, %xmm2
+        mulpd     %xmm3, %xmm11
+        addpd     poly_coeff+48+__svml_datanh_data_internal(%rip), %xmm2
+        addpd     %xmm11, %xmm2
+
+/* reconstruction */
+        mulpd     %xmm2, %xmm3
+        andps     Bias+__svml_datanh_data_internal(%rip), %xmm6
+        orps      Bias1+__svml_datanh_data_internal(%rip), %xmm6
+        pshufd    $2, %xmm13, %xmm14
+        subpd     %xmm6, %xmm1
+        addpd     %xmm3, %xmm4
+        movd      %xmm14, %ecx
+        mulpd     L2+__svml_datanh_data_internal(%rip), %xmm1
+        movslq    %eax, %rax
+        movslq    %ecx, %rcx
+
+/* Record the sign for eventual reincorporation. */
+        movups    dSign+__svml_datanh_data_internal(%rip), %xmm8
+        andps     %xmm12, %xmm8
+        movsd     (%rsi,%rax), %xmm0
+
+/* Or the sign bit in with the tiny result to handle atanh(-0) correctly */
+        orps      %xmm8, %xmm10
+        movhpd    (%rsi,%rcx), %xmm0
+        andps     %xmm9, %xmm10
+        addpd     %xmm4, %xmm0
+        addpd     %xmm0, %xmm1
+
+/* Finally, halve the result and reincorporate the sign */
+        movups    dHalf+__svml_datanh_data_internal(%rip), %xmm4
+        movaps    %xmm9, %xmm0
+        pxor      %xmm8, %xmm4
+        mulpd     %xmm1, %xmm4
+        andnps    %xmm4, %xmm0
+        orps      %xmm10, %xmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx xmm0 xmm12
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm12, 32(%rsp)
+        movups    %xmm0, 48(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $2, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 xmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      atanh@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 48(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVbN2v_atanh_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_datanh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(16)) VUINT32 Log_HA_table[(1<<10)+2][2];
+        __declspec(align(16)) VUINT32 Log_LA_table[(1<<9)+1][2];
+        __declspec(align(16)) VUINT32 poly_coeff[4][2][2];
+        __declspec(align(16)) VUINT32 ExpMask[2][2];
+        __declspec(align(16)) VUINT32 Two10[2][2];
+        __declspec(align(16)) VUINT32 MinLog1p[2][2];
+        __declspec(align(16)) VUINT32 MaxLog1p[2][2];
+        __declspec(align(16)) VUINT32 One[2][2];
+        __declspec(align(16)) VUINT32 SgnMask[2][2];
+        __declspec(align(16)) VUINT32 XThreshold[2][2];
+        __declspec(align(16)) VUINT32 XhMask[2][2];
+        __declspec(align(16)) VUINT32 Threshold[2][2];
+        __declspec(align(16)) VUINT32 Bias[2][2];
+        __declspec(align(16)) VUINT32 Bias1[2][2];
+        __declspec(align(16)) VUINT32 ExpMask0[2][2];
+        __declspec(align(16)) VUINT32 ExpMask2[2][2];
+        __declspec(align(16)) VUINT32 L2[2][2];
+        __declspec(align(16)) VUINT32 dHalf[2][2];
+        __declspec(align(16)) VUINT32 dSign[2][2];
+        __declspec(align(16)) VUINT32 dTopMask12[2][2];
+        __declspec(align(16)) VUINT32 dTopMask41[2][2];
+        __declspec(align(16)) VUINT32 TinyRange[2][2];
+} __svml_datanh_data_internal;
+#endif
+__svml_datanh_data_internal:
+        /* Log_HA_table */
+        .quad 0xc086232bdd7a8300, 0xbe1ce91eef3fb100
+        .quad 0xc086232fdc7ad828, 0xbe1cefcffda73b6a
+        .quad 0xc0862333d97d2ba0, 0xbe1cef406748f1ff
+        .quad 0xc0862337d48378e0, 0xbe1cef2a9429925a
+        .quad 0xc086233bcd8fb878, 0xbe1cf138d17ebecb
+        .quad 0xc086233fc4a3e018, 0xbe1ceff2dbbbb29e
+        .quad 0xc0862343b9c1e270, 0xbe1cf1a42aae437b
+        .quad 0xc0862347acebaf68, 0xbe1cef3b152048af
+        .quad 0xc086234b9e2333f0, 0xbe1cef20e127805e
+        .quad 0xc086234f8d6a5a30, 0xbe1cf00ad6052cf4
+        .quad 0xc08623537ac30980, 0xbe1cefc4642ee597
+        .quad 0xc0862357662f2660, 0xbe1cf1f277d36e16
+        .quad 0xc086235b4fb092a0, 0xbe1ceed009e8d8e6
+        .quad 0xc086235f37492d28, 0xbe1cf1e4038cb362
+        .quad 0xc08623631cfad250, 0xbe1cf0b0873b8557
+        .quad 0xc086236700c75b98, 0xbe1cf15bb3227c0b
+        .quad 0xc086236ae2b09fe0, 0xbe1cf151ef8ca9ed
+        .quad 0xc086236ec2b87358, 0xbe1cefe1dc2cd2ed
+        .quad 0xc0862372a0e0a780, 0xbe1cf0d1eec5454f
+        .quad 0xc08623767d2b0b48, 0xbe1ceeefd570bbce
+        .quad 0xc086237a57996af0, 0xbe1cee99ae91b3a7
+        .quad 0xc086237e302d9028, 0xbe1cf0412830fbd1
+        .quad 0xc086238206e94218, 0xbe1ceee898588610
+        .quad 0xc0862385dbce4548, 0xbe1cee9a1fbcaaea
+        .quad 0xc0862389aede5bc0, 0xbe1ceed8e7cc1ad6
+        .quad 0xc086238d801b4500, 0xbe1cf10c8d059da6
+        .quad 0xc08623914f86be18, 0xbe1ceee6c63a8165
+        .quad 0xc08623951d228180, 0xbe1cf0c3592d2ff1
+        .quad 0xc0862398e8f04758, 0xbe1cf0026cc4cb1b
+        .quad 0xc086239cb2f1c538, 0xbe1cf15d48d8e670
+        .quad 0xc08623a07b28ae60, 0xbe1cef359363787c
+        .quad 0xc08623a44196b390, 0xbe1cefdf1ab2e82c
+        .quad 0xc08623a8063d8338, 0xbe1cefe43c02aa84
+        .quad 0xc08623abc91ec960, 0xbe1cf044f5ae35b7
+        .quad 0xc08623af8a3c2fb8, 0xbe1cf0b0b4001e1b
+        .quad 0xc08623b349975d98, 0xbe1cf1bae76dfbcf
+        .quad 0xc08623b70731f810, 0xbe1cef0a72e13a62
+        .quad 0xc08623bac30da1c8, 0xbe1cf184007d2b6b
+        .quad 0xc08623be7d2bfb40, 0xbe1cf16f4b239e98
+        .quad 0xc08623c2358ea2a0, 0xbe1cf0976acada87
+        .quad 0xc08623c5ec3733d0, 0xbe1cf066318a16ff
+        .quad 0xc08623c9a1274880, 0xbe1ceffaa7148798
+        .quad 0xc08623cd54607820, 0xbe1cf23ab02e9b6e
+        .quad 0xc08623d105e45800, 0xbe1cefdfef7d4fde
+        .quad 0xc08623d4b5b47b20, 0xbe1cf17fece44f2b
+        .quad 0xc08623d863d27270, 0xbe1cf18f907d0d7c
+        .quad 0xc08623dc103fccb0, 0xbe1cee61fe072c98
+        .quad 0xc08623dfbafe1668, 0xbe1cf022dd891e2f
+        .quad 0xc08623e3640eda20, 0xbe1ceecc1daf4358
+        .quad 0xc08623e70b73a028, 0xbe1cf0173c4fa380
+        .quad 0xc08623eab12deec8, 0xbe1cf16a2150c2f4
+        .quad 0xc08623ee553f4a30, 0xbe1cf1bf980b1f4b
+        .quad 0xc08623f1f7a93480, 0xbe1cef8b731663c2
+        .quad 0xc08623f5986d2dc0, 0xbe1cee9a664d7ef4
+        .quad 0xc08623f9378cb3f0, 0xbe1cf1eda2af6400
+        .quad 0xc08623fcd5094320, 0xbe1cf1923f9d68d7
+        .quad 0xc086240070e45548, 0xbe1cf0747cd3e03a
+        .quad 0xc08624040b1f6260, 0xbe1cf22ee855bd6d
+        .quad 0xc0862407a3bbe078, 0xbe1cf0d57360c00b
+        .quad 0xc086240b3abb4398, 0xbe1ceebc815cd575
+        .quad 0xc086240ed01efdd0, 0xbe1cf03bfb970951
+        .quad 0xc086241263e87f50, 0xbe1cf16e74768529
+        .quad 0xc0862415f6193658, 0xbe1cefec64b8becb
+        .quad 0xc086241986b28f30, 0xbe1cf0838d210baa
+        .quad 0xc086241d15b5f448, 0xbe1cf0ea86e75b11
+        .quad 0xc0862420a324ce28, 0xbe1cf1708d11d805
+        .quad 0xc08624242f008380, 0xbe1ceea988c5a417
+        .quad 0xc0862427b94a7910, 0xbe1cef166a7bbca5
+        .quad 0xc086242b420411d0, 0xbe1cf0c9d9e86a38
+        .quad 0xc086242ec92eaee8, 0xbe1cef0946455411
+        .quad 0xc08624324ecbaf98, 0xbe1cefea60907739
+        .quad 0xc0862435d2dc7160, 0xbe1cf1ed0934ce42
+        .quad 0xc086243955624ff8, 0xbe1cf191ba746c7d
+        .quad 0xc086243cd65ea548, 0xbe1ceeec78cf2a7e
+        .quad 0xc086244055d2c968, 0xbe1cef345284c119
+        .quad 0xc0862443d3c012b8, 0xbe1cf24f77355219
+        .quad 0xc08624475027d5e8, 0xbe1cf05bf087e114
+        .quad 0xc086244acb0b65d0, 0xbe1cef3504a32189
+        .quad 0xc086244e446c1398, 0xbe1ceff54b2a406f
+        .quad 0xc0862451bc4b2eb8, 0xbe1cf0757d54ed4f
+        .quad 0xc086245532aa04f0, 0xbe1cf0c8099fdfd5
+        .quad 0xc0862458a789e250, 0xbe1cf0b173796a31
+        .quad 0xc086245c1aec1138, 0xbe1cf11d8734540d
+        .quad 0xc086245f8cd1da60, 0xbe1cf1916a723ceb
+        .quad 0xc0862462fd3c84d8, 0xbe1cf19a911e1da7
+        .quad 0xc08624666c2d5608, 0xbe1cf23a9ef72e4f
+        .quad 0xc0862469d9a591c0, 0xbe1cef503d947663
+        .quad 0xc086246d45a67a18, 0xbe1cf0fceeb1a0b2
+        .quad 0xc0862470b0314fa8, 0xbe1cf107e27e4fbc
+        .quad 0xc086247419475160, 0xbe1cf03dd9922331
+        .quad 0xc086247780e9bc98, 0xbe1cefce1a10e129
+        .quad 0xc086247ae719cd18, 0xbe1ceea47f73c4f6
+        .quad 0xc086247e4bd8bd10, 0xbe1ceec0ac56d100
+        .quad 0xc0862481af27c528, 0xbe1cee8a6593278a
+        .quad 0xc086248511081c70, 0xbe1cf2231dd9dec7
+        .quad 0xc0862488717af888, 0xbe1cf0b4b8ed7da8
+        .quad 0xc086248bd0818d68, 0xbe1cf1bd8d835002
+        .quad 0xc086248f2e1d0d98, 0xbe1cf259acc107f4
+        .quad 0xc08624928a4eaa20, 0xbe1cee897636b00c
+        .quad 0xc0862495e5179270, 0xbe1cee757f20c326
+        .quad 0xc08624993e78f490, 0xbe1cefafd3aa54a4
+        .quad 0xc086249c9673fd10, 0xbe1cee7298d38b97
+        .quad 0xc086249fed09d6f8, 0xbe1ceedc158d4ceb
+        .quad 0xc08624a3423babe0, 0xbe1cf2282987cb2e
+        .quad 0xc08624a6960aa400, 0xbe1cefe7381ecc4b
+        .quad 0xc08624a9e877e600, 0xbe1cef328dbbce80
+        .quad 0xc08624ad39849728, 0xbe1cefde45f3cc71
+        .quad 0xc08624b08931db58, 0xbe1cefa8b89433b9
+        .quad 0xc08624b3d780d500, 0xbe1cef6773c0b139
+        .quad 0xc08624b72472a528, 0xbe1cf031c931c11f
+        .quad 0xc08624ba70086b78, 0xbe1cf088f49275e7
+        .quad 0xc08624bdba434630, 0xbe1cf17de0eaa86d
+        .quad 0xc08624c103245238, 0xbe1cefd492f1ba75
+        .quad 0xc08624c44aacab08, 0xbe1cf1253e154466
+        .quad 0xc08624c790dd6ad0, 0xbe1cf0fb09ee6d55
+        .quad 0xc08624cad5b7aa58, 0xbe1cf1f08dd048fe
+        .quad 0xc08624ce193c8120, 0xbe1ceeca0809697f
+        .quad 0xc08624d15b6d0538, 0xbe1cef8d5662d968
+        .quad 0xc08624d49c4a4b78, 0xbe1cee97b556ed78
+        .quad 0xc08624d7dbd56750, 0xbe1cf1b14b6acb75
+        .quad 0xc08624db1a0f6b00, 0xbe1cef1e860623f2
+        .quad 0xc08624de56f96758, 0xbe1ceeaf4d156f3d
+        .quad 0xc08624e192946bf0, 0xbe1ceecc12b400ed
+        .quad 0xc08624e4cce18710, 0xbe1cf180c40c794f
+        .quad 0xc08624e805e1c5c8, 0xbe1cf185a08f7f65
+        .quad 0xc08624eb3d9633d8, 0xbe1cef45fc924078
+        .quad 0xc08624ee73ffdbb0, 0xbe1cf1e4f457f32a
+        .quad 0xc08624f1a91fc6a0, 0xbe1cf040147b8a5a
+        .quad 0xc08624f4dcf6fc98, 0xbe1cf1effca0dfb2
+        .quad 0xc08624f80f868468, 0xbe1cf0470146e5bc
+        .quad 0xc08624fb40cf6390, 0xbe1cef4dd186e501
+        .quad 0xc08624fe70d29e60, 0xbe1ceebe257f66c7
+        .quad 0xc08625019f9137f0, 0xbe1ceefb7a1c395c
+        .quad 0xc0862504cd0c3220, 0xbe1cf209dedfed8c
+        .quad 0xc0862507f9448db0, 0xbe1cf082da464994
+        .quad 0xc086250b243b4a18, 0xbe1cee88694a73cf
+        .quad 0xc086250e4df165a0, 0xbe1cf0b61e8f0531
+        .quad 0xc08625117667dd78, 0xbe1cf1106599c962
+        .quad 0xc08625149d9fad98, 0xbe1ceff1ee88af1f
+        .quad 0xc0862517c399d0c8, 0xbe1cf0f746994ef6
+        .quad 0xc086251ae85740b8, 0xbe1cefe8a1d077e4
+        .quad 0xc086251e0bd8f5e0, 0xbe1cf1a1da036092
+        .quad 0xc08625212e1fe7a8, 0xbe1cf0f8a7786fcd
+        .quad 0xc08625244f2d0c48, 0xbe1cefa1174a07a7
+        .quad 0xc08625276f0158d8, 0xbe1cef1043aa5b25
+        .quad 0xc086252a8d9dc150, 0xbe1cf15d521c169d
+        .quad 0xc086252dab033898, 0xbe1cf220bba8861f
+        .quad 0xc0862530c732b078, 0xbe1cef51e310eae2
+        .quad 0xc0862533e22d1988, 0xbe1cf222fcedd8ae
+        .quad 0xc0862536fbf36370, 0xbe1cefdb4da4bda8
+        .quad 0xc086253a14867ca0, 0xbe1ceeafc1112171
+        .quad 0xc086253d2be75280, 0xbe1cee99dfb4b408
+        .quad 0xc08625404216d160, 0xbe1cf22d2536f06b
+        .quad 0xc08625435715e498, 0xbe1cef6abbf2e268
+        .quad 0xc08625466ae57648, 0xbe1cf093a14789f5
+        .quad 0xc08625497d866fa0, 0xbe1cf0f93655603c
+        .quad 0xc086254c8ef9b8b8, 0xbe1cf1cc40c9aafc
+        .quad 0xc086254f9f4038a8, 0xbe1ceeea5f4e9157
+        .quad 0xc0862552ae5ad568, 0xbe1cefa9f52d4997
+        .quad 0xc0862555bc4a7400, 0xbe1cefa490a638ff
+        .quad 0xc0862558c90ff868, 0xbe1cef7fcf797d6f
+        .quad 0xc086255bd4ac4590, 0xbe1cf1b4c51113c9
+        .quad 0xc086255edf203d78, 0xbe1cef55e5b4a55d
+        .quad 0xc0862561e86cc100, 0xbe1cf0d37a25f9dc
+        .quad 0xc0862564f092b028, 0xbe1ceebe9efc19d9
+        .quad 0xc0862567f792e9d8, 0xbe1cee8ad30a57b5
+        .quad 0xc086256afd6e4c08, 0xbe1cef4e1817b90b
+        .quad 0xc086256e0225b3b8, 0xbe1cee7fa9229996
+        .quad 0xc086257105b9fce0, 0xbe1cf0b54963d945
+        .quad 0xc0862574082c0298, 0xbe1cee5f2f3c7995
+        .quad 0xc0862577097c9ee0, 0xbe1cf0828e303a2c
+        .quad 0xc086257a09acaae0, 0xbe1cf172c3078947
+        .quad 0xc086257d08bcfec0, 0xbe1cf189252afa22
+        .quad 0xc086258006ae71b8, 0xbe1cefdb80426923
+        .quad 0xc08625830381da08, 0xbe1ceef1391a0372
+        .quad 0xc0862585ff380d00, 0xbe1cf17720c78d13
+        .quad 0xc0862588f9d1df18, 0xbe1ceef1f9027d83
+        .quad 0xc086258bf35023b8, 0xbe1cf06fac99dec9
+        .quad 0xc086258eebb3ad78, 0xbe1cf1373eeb45c0
+        .quad 0xc0862591e2fd4e00, 0xbe1cef777536bb81
+        .quad 0xc0862594d92dd600, 0xbe1cf0f43ca40766
+        .quad 0xc0862597ce461558, 0xbe1cefb2cfc6766b
+        .quad 0xc086259ac246daf0, 0xbe1ceea49e64ffa2
+        .quad 0xc086259db530f4c8, 0xbe1cf250fa457dec
+        .quad 0xc08625a0a7053018, 0xbe1cf17d8bb2a44e
+        .quad 0xc08625a397c45918, 0xbe1cf1d5906d54b7
+        .quad 0xc08625a6876f3b30, 0xbe1cf08fe7b31780
+        .quad 0xc08625a97606a0e0, 0xbe1cef13edfc9d11
+        .quad 0xc08625ac638b53c8, 0xbe1cef9d2b107219
+        .quad 0xc08625af4ffe1cb0, 0xbe1cf1ddd4ff6160
+        .quad 0xc08625b23b5fc390, 0xbe1cefa02a996495
+        .quad 0xc08625b525b10f68, 0xbe1cf166a7e37ee5
+        .quad 0xc08625b80ef2c680, 0xbe1cef0b171068a5
+        .quad 0xc08625baf725ae28, 0xbe1cf05c80779283
+        .quad 0xc08625bdde4a8af0, 0xbe1cf1bbfbffb889
+        .quad 0xc08625c0c4622090, 0xbe1cf0b8666c0124
+        .quad 0xc08625c3a96d31e0, 0xbe1cf0a8fcf47a86
+        .quad 0xc08625c68d6c80f0, 0xbe1cef46e18cb092
+        .quad 0xc08625c97060cef0, 0xbe1cf1458a350efb
+        .quad 0xc08625cc524adc58, 0xbe1ceeea1dadce12
+        .quad 0xc08625cf332b68b0, 0xbe1cf0a1bfdc44c7
+        .quad 0xc08625d2130332d0, 0xbe1cef96d02da73e
+        .quad 0xc08625d4f1d2f8a8, 0xbe1cf2451c3c7701
+        .quad 0xc08625d7cf9b7778, 0xbe1cf10d08f83812
+        .quad 0xc08625daac5d6ba0, 0xbe1ceec5b4895c5e
+        .quad 0xc08625dd881990b0, 0xbe1cf14e1325c5e4
+        .quad 0xc08625e062d0a188, 0xbe1cf21d0904be12
+        .quad 0xc08625e33c835838, 0xbe1ceed0839bcf21
+        .quad 0xc08625e615326df0, 0xbe1cf1bb944889d2
+        .quad 0xc08625e8ecde9b48, 0xbe1cee738e85eece
+        .quad 0xc08625ebc38897e0, 0xbe1cf25c2bc6ef12
+        .quad 0xc08625ee99311ac8, 0xbe1cf132b70a41ad
+        .quad 0xc08625f16dd8da28, 0xbe1cf1984236a6e3
+        .quad 0xc08625f441808b78, 0xbe1cf19ae74998f9
+        .quad 0xc08625f71428e370, 0xbe1cef3e175d61a1
+        .quad 0xc08625f9e5d295f8, 0xbe1cf101f9868fd9
+        .quad 0xc08625fcb67e5658, 0xbe1cee69db83dcd2
+        .quad 0xc08625ff862cd6f8, 0xbe1cf081b636af51
+        .quad 0xc086260254dec9a8, 0xbe1cee62c7d59b3e
+        .quad 0xc08626052294df58, 0xbe1cf1b745c57716
+        .quad 0xc0862607ef4fc868, 0xbe1cef3d2800ea23
+        .quad 0xc086260abb103458, 0xbe1cef480ff1acd2
+        .quad 0xc086260d85d6d200, 0xbe1cf2424c9a17ef
+        .quad 0xc08626104fa44f90, 0xbe1cf12cfde90fd5
+        .quad 0xc086261318795a68, 0xbe1cf21f590dd5b6
+        .quad 0xc0862615e0569f48, 0xbe1cf0c50f9cd28a
+        .quad 0xc0862618a73cca30, 0xbe1ceedbdb520545
+        .quad 0xc086261b6d2c8668, 0xbe1cf0b030396011
+        .quad 0xc086261e32267e98, 0xbe1cf19917010e96
+        .quad 0xc0862620f62b5cb0, 0xbe1cf07331355985
+        .quad 0xc0862623b93bc9e8, 0xbe1cf01ae921a1c3
+        .quad 0xc08626267b586ed0, 0xbe1cefe5cf0dbf0c
+        .quad 0xc08626293c81f348, 0xbe1cf01b258aeb50
+        .quad 0xc086262bfcb8fe88, 0xbe1cee6b9e7f4c68
+        .quad 0xc086262ebbfe3710, 0xbe1cee684a9b21c9
+        .quad 0xc08626317a5242b8, 0xbe1cf1f8bcde9a8b
+        .quad 0xc086263437b5c6c0, 0xbe1cf1d063d36238
+        .quad 0xc0862636f42967a8, 0xbe1cf1e31a19075e
+        .quad 0xc0862639afadc950, 0xbe1cf1d8efdf7e7d
+        .quad 0xc086263c6a438ef0, 0xbe1cf1812ee72dba
+        .quad 0xc086263f23eb5b18, 0xbe1cf1449a9a2279
+        .quad 0xc0862641dca5cfb8, 0xbe1cee96edce5085
+        .quad 0xc086264494738e08, 0xbe1cf06797bd03b2
+        .quad 0xc08626474b5536b8, 0xbe1cef91b9b7ffc1
+        .quad 0xc086264a014b69c0, 0xbe1cef4b6721278f
+        .quad 0xc086264cb656c678, 0xbe1cf1942925eb4a
+        .quad 0xc086264f6a77eba8, 0xbe1cefa2c7bc2e39
+        .quad 0xc08626521daf7758, 0xbe1cf252595aceb3
+        .quad 0xc0862654cffe0718, 0xbe1cee8e9ae47ec2
+        .quad 0xc0862657816437a8, 0xbe1cf1bf913828fa
+        .quad 0xc086265a31e2a558, 0xbe1cf23475d6b366
+        .quad 0xc086265ce179ebc8, 0xbe1cef8df00a922b
+        .quad 0xc086265f902aa5f0, 0xbe1cef279bfa43e0
+        .quad 0xc08626623df56e38, 0xbe1cf080e10b8365
+        .quad 0xc0862664eadade70, 0xbe1cf1a518f9b544
+        .quad 0xc086266796db8fd0, 0xbe1cef9308fed9e9
+        .quad 0xc086266a41f81ae8, 0xbe1ceea3ae6b19c9
+        .quad 0xc086266cec3117b8, 0xbe1ceef06003d4c2
+        .quad 0xc086266f95871da8, 0xbe1cf0b8457ffb0c
+        .quad 0xc08626723dfac390, 0xbe1cf0c526745ad6
+        .quad 0xc0862674e58c9fa8, 0xbe1cf0cf91ff7b5d
+        .quad 0xc08626778c3d4798, 0xbe1cefe260819380
+        .quad 0xc086267a320d5070, 0xbe1ceebd90aa27a3
+        .quad 0xc086267cd6fd4ea8, 0xbe1cf0388121dffa
+        .quad 0xc086267f7b0dd630, 0xbe1cf1a3881435f1
+        .quad 0xc08626821e3f7a68, 0xbe1cef28e9d9ac52
+        .quad 0xc0862684c092ce08, 0xbe1cf02d300062dd
+        .quad 0xc086268762086350, 0xbe1cefaee1edfa35
+        .quad 0xc086268a02a0cbe0, 0xbe1cf0a5a052e936
+        .quad 0xc086268ca25c98d8, 0xbe1cee60a4a497ed
+        .quad 0xc086268f413c5ab0, 0xbe1cf0e4a5d0cf49
+        .quad 0xc0862691df40a170, 0xbe1cf149235a4e6e
+        .quad 0xc08626947c69fc80, 0xbe1cf215180b9fcc
+        .quad 0xc086269718b8fac8, 0xbe1cef9b156a9840
+        .quad 0xc0862699b42e2a90, 0xbe1cf054c91441be
+        .quad 0xc086269c4eca19a8, 0xbe1cf13ded26512c
+        .quad 0xc086269ee88d5550, 0xbe1cf22ea4d8ac06
+        .quad 0xc08626a181786a40, 0xbe1cf2354666ee2e
+        .quad 0xc08626a4198be4a8, 0xbe1cefef936752b3
+        .quad 0xc08626a6b0c85020, 0xbe1cf1e360a9db68
+        .quad 0xc08626a9472e37d8, 0xbe1ceed6aeb812c5
+        .quad 0xc08626abdcbe2650, 0xbe1cf227340b4986
+        .quad 0xc08626ae7178a5b0, 0xbe1cf0215a0cbe0d
+        .quad 0xc08626b1055e3f70, 0xbe1cf256adf0ae26
+        .quad 0xc08626b3986f7ca8, 0xbe1ceff3c67aed06
+        .quad 0xc08626b62aace5c8, 0xbe1cf2159fb93652
+        .quad 0xc08626b8bc1702e0, 0xbe1cf01e6dbd1c7f
+        .quad 0xc08626bb4cae5b60, 0xbe1cf009e75d1c0c
+        .quad 0xc08626bddc737648, 0xbe1ceec10a020e73
+        .quad 0xc08626c06b66da08, 0xbe1cf06d5783eee7
+        .quad 0xc08626c2f9890ca0, 0xbe1cf0cb8f169ffe
+        .quad 0xc08626c586da9388, 0xbe1cef7de2452430
+        .quad 0xc08626c8135bf3b0, 0xbe1cf05da6f783ae
+        .quad 0xc08626ca9f0db198, 0xbe1cefcc877d681d
+        .quad 0xc08626cd29f05138, 0xbe1cef0531954ab3
+        .quad 0xc08626cfb4045608, 0xbe1cf06b8565ea3d
+        .quad 0xc08626d23d4a4310, 0xbe1cefdc455d9d7e
+        .quad 0xc08626d4c5c29ad0, 0xbe1ceefc47e8fa64
+        .quad 0xc08626d74d6ddf48, 0xbe1cf1872bf033f2
+        .quad 0xc08626d9d44c9210, 0xbe1cf19d91087f9d
+        .quad 0xc08626dc5a5f3438, 0xbe1cf012d444c6ab
+        .quad 0xc08626dedfa64650, 0xbe1cf0ba528ee153
+        .quad 0xc08626e164224880, 0xbe1ceeb431709788
+        .quad 0xc08626e3e7d3ba60, 0xbe1cf0b9af31a6a5
+        .quad 0xc08626e66abb1b28, 0xbe1cf168fb2e135b
+        .quad 0xc08626e8ecd8e990, 0xbe1cef9097461c93
+        .quad 0xc08626eb6e2da3d0, 0xbe1cee7a434735d8
+        .quad 0xc08626edeeb9c7a8, 0xbe1cf235732b86f2
+        .quad 0xc08626f06e7dd280, 0xbe1cefe1510b89e6
+        .quad 0xc08626f2ed7a4120, 0xbe1cf1f64b9b80ef
+        .quad 0xc08626f56baf9000, 0xbe1cf08f320ca339
+        .quad 0xc08626f7e91e3b08, 0xbe1cf1b1de2808a1
+        .quad 0xc08626fa65c6bdc0, 0xbe1cf1976d778b28
+        .quad 0xc08626fce1a99338, 0xbe1ceef40a4f076f
+        .quad 0xc08626ff5cc73600, 0xbe1cef3e45869ce3
+        .quad 0xc0862701d7202048, 0xbe1ceef601b4c9d6
+        .quad 0xc086270450b4cbc0, 0xbe1cf1eaf0b57fd6
+        .quad 0xc0862706c985b1c0, 0xbe1cef82a44990f3
+        .quad 0xc086270941934b10, 0xbe1ceefe32981f2c
+        .quad 0xc086270bb8de1018, 0xbe1cefbf6f5a0445
+        .quad 0xc086270e2f6678d0, 0xbe1cf18dba75792c
+        .quad 0xc0862710a52cfcc8, 0xbe1cf0da64ce995f
+        .quad 0xc08627131a321318, 0xbe1cef04ac0fb802
+        .quad 0xc08627158e763268, 0xbe1cee9d4e2ad9bd
+        .quad 0xc086271801f9d0f8, 0xbe1cefa9b55407b5
+        .quad 0xc086271a74bd64a0, 0xbe1cefe6bd329570
+        .quad 0xc086271ce6c162c8, 0xbe1cef0b1205dc85
+        .quad 0xc086271f58064068, 0xbe1cef092a785e3f
+        .quad 0xc0862721c88c7210, 0xbe1cf050dcdaac30
+        .quad 0xc086272438546be8, 0xbe1cf210907ded8b
+        .quad 0xc0862726a75ea1b8, 0xbe1cee760be44f99
+        .quad 0xc086272915ab86c0, 0xbe1ceeeee07c2bcc
+        .quad 0xc086272b833b8df0, 0xbe1cf06874992df5
+        .quad 0xc086272df00f29d0, 0xbe1cef8fac5d4899
+        .quad 0xc08627305c26cc70, 0xbe1cf1103241cc99
+        .quad 0xc0862732c782e788, 0xbe1cf1d35fef83fe
+        .quad 0xc08627353223ec68, 0xbe1cef3ec8133e1d
+        .quad 0xc08627379c0a4be8, 0xbe1cef7261daccd8
+        .quad 0xc086273a05367688, 0xbe1cf18656c50806
+        .quad 0xc086273c6da8dc68, 0xbe1cf1c8736e049a
+        .quad 0xc086273ed561ed38, 0xbe1cf1f93bff4911
+        .quad 0xc08627413c621848, 0xbe1cf188a4ea680c
+        .quad 0xc0862743a2a9cc80, 0xbe1cf1d270930c80
+        .quad 0xc086274608397868, 0xbe1cf25a328c28e2
+        .quad 0xc08627486d118a28, 0xbe1cf106f90aa3b8
+        .quad 0xc086274ad1326f80, 0xbe1cee5e9d2e885a
+        .quad 0xc086274d349c95c0, 0xbe1cf1c0bac27228
+        .quad 0xc086274f975069f8, 0xbe1cf1a1500f9b1c
+        .quad 0xc0862751f94e58c0, 0xbe1cefc30663ac44
+        .quad 0xc08627545a96ce48, 0xbe1cf17123e427a2
+        .quad 0xc0862756bb2a3678, 0xbe1cefb92749fea4
+        .quad 0xc08627591b08fcc0, 0xbe1cefa40e1ea74a
+        .quad 0xc086275b7a338c40, 0xbe1cee6f4612c3e9
+        .quad 0xc086275dd8aa4fa8, 0xbe1cf1c54a053627
+        .quad 0xc0862760366db168, 0xbe1ceff5eb503d9e
+        .quad 0xc0862762937e1b70, 0xbe1cf02e47f10cee
+        .quad 0xc0862764efdbf768, 0xbe1ceeb06e1d0dad
+        .quad 0xc08627674b87ae88, 0xbe1cf10aadd6dba5
+        .quad 0xc0862769a681a9c0, 0xbe1cf24e9913d30f
+        .quad 0xc086276c00ca51a0, 0xbe1cef47b301e312
+        .quad 0xc086276e5a620e48, 0xbe1ceeb1cefc2e85
+        .quad 0xc0862770b3494788, 0xbe1cf16f1fbbe011
+        .quad 0xc08627730b8064e8, 0xbe1ceebdf75174c7
+        .quad 0xc08627756307cd70, 0xbe1cf06e3871a0da
+        .quad 0xc0862777b9dfe7f0, 0xbe1cef16799fd554
+        .quad 0xc086277a10091ac0, 0xbe1cf248dabf5377
+        .quad 0xc086277c6583cc00, 0xbe1cf0c78d92a2cd
+        .quad 0xc086277eba506158, 0xbe1cf0b911b029f0
+        .quad 0xc08627810e6f4028, 0xbe1cefdc24719766
+        .quad 0xc086278361e0cd70, 0xbe1cefbb6562b7e7
+        .quad 0xc0862785b4a56dd8, 0xbe1cf1e0afb349ec
+        .quad 0xc086278806bd85c0, 0xbe1cf008292e52fc
+        .quad 0xc086278a58297918, 0xbe1cf053073872bf
+        .quad 0xc086278ca8e9ab88, 0xbe1cf17a0a55a947
+        .quad 0xc086278ef8fe8068, 0xbe1ceeffb0b60234
+        .quad 0xc086279148685aa0, 0xbe1cf162204794a8
+        .quad 0xc086279397279ce0, 0xbe1cf24cc8cb48ac
+        .quad 0xc0862795e53ca978, 0xbe1cf0c9be68d5c3
+        .quad 0xc086279832a7e258, 0xbe1cf172cd3d7388
+        .quad 0xc086279a7f69a930, 0xbe1ceea2465fbce5
+        .quad 0xc086279ccb825f40, 0xbe1cf0a386d2500f
+        .quad 0xc086279f16f26590, 0xbe1cf1e338ddc18a
+        .quad 0xc08627a161ba1cd0, 0xbe1cef1f5049867f
+        .quad 0xc08627a3abd9e548, 0xbe1cef96c1ea8b1f
+        .quad 0xc08627a5f5521f00, 0xbe1cf138f6fd3c26
+        .quad 0xc08627a83e2329b0, 0xbe1cf0d4fcbfdf3a
+        .quad 0xc08627aa864d64b0, 0xbe1cf24870c12c81
+        .quad 0xc08627accdd12f18, 0xbe1cf0ae2a56348d
+        .quad 0xc08627af14aee7a0, 0xbe1cee8ca1a9b893
+        .quad 0xc08627b15ae6eca8, 0xbe1cf20414d637b0
+        .quad 0xc08627b3a0799c60, 0xbe1cf0fc6b7b12d8
+        .quad 0xc08627b5e5675488, 0xbe1cf152d93c4a00
+        .quad 0xc08627b829b072a0, 0xbe1cf1073f9b77c2
+        .quad 0xc08627ba6d5553d8, 0xbe1cee694f97d5a4
+        .quad 0xc08627bcb0565500, 0xbe1cf0456b8239d7
+        .quad 0xc08627bef2b3d2b0, 0xbe1cf211497127e3
+        .quad 0xc08627c1346e2930, 0xbe1cf01856c0384d
+        .quad 0xc08627c37585b468, 0xbe1cefa7dd05479e
+        .quad 0xc08627c5b5fad000, 0xbe1cef3ae8e50b93
+        .quad 0xc08627c7f5cdd750, 0xbe1ceea5f32fdd3a
+        .quad 0xc08627ca34ff2560, 0xbe1cef424caeb8d9
+        .quad 0xc08627cc738f14f0, 0xbe1cf0194d07a81f
+        .quad 0xc08627ceb17e0070, 0xbe1cf20f452000c1
+        .quad 0xc08627d0eecc4210, 0xbe1cf00e356218e4
+        .quad 0xc08627d32b7a33a0, 0xbe1cef30484b4bcb
+        .quad 0xc08627d567882eb0, 0xbe1ceeea11a6641b
+        .quad 0xc08627d7a2f68c80, 0xbe1cf13492d5bd7b
+        .quad 0xc08627d9ddc5a618, 0xbe1ceeb7048fad96
+        .quad 0xc08627dc17f5d418, 0xbe1ceef0666f0477
+        .quad 0xc08627de51876ee8, 0xbe1cf060d4b8b5c2
+        .quad 0xc08627e08a7acea8, 0xbe1cf0b2a4b6ff8c
+        .quad 0xc08627e2c2d04b28, 0xbe1cf0e34809a875
+        .quad 0xc08627e4fa883bf0, 0xbe1cf16bf74a3522
+        .quad 0xc08627e731a2f848, 0xbe1cee6a24623d57
+        .quad 0xc08627e96820d718, 0xbe1cefc7b4f1528e
+        .quad 0xc08627eb9e022f18, 0xbe1cf163051f3548
+        .quad 0xc08627edd34756b8, 0xbe1cef36b3366305
+        .quad 0xc08627f007f0a408, 0xbe1cf18134625550
+        .quad 0xc08627f23bfe6cf0, 0xbe1cf0ec32ec1a11
+        .quad 0xc08627f46f710700, 0xbe1ceeb3b64f3edc
+        .quad 0xc08627f6a248c778, 0xbe1cf0cd15805bc8
+        .quad 0xc08627f8d4860368, 0xbe1cf20db3bddebe
+        .quad 0xc08627fb06290f90, 0xbe1cf25188430e25
+        .quad 0xc08627fd37324070, 0xbe1ceea1713490f9
+        .quad 0xc08627ff67a1ea28, 0xbe1cf159521d234c
+        .quad 0xc0862801977860b8, 0xbe1cf24dfe50783b
+        .quad 0xc0862803c6b5f7d0, 0xbe1ceef2ef89a60b
+        .quad 0xc0862805f55b02c8, 0xbe1cee7fc919d62c
+        .quad 0xc08628082367d4c0, 0xbe1cf215a7fb513a
+        .quad 0xc086280a50dcc0a8, 0xbe1cf0e4401c5ed4
+        .quad 0xc086280c7dba1910, 0xbe1cf04ec734d256
+        .quad 0xc086280eaa003050, 0xbe1cf010ad787fea
+        .quad 0xc0862810d5af5880, 0xbe1cee622478393d
+        .quad 0xc086281300c7e368, 0xbe1cf01c7482564f
+        .quad 0xc08628152b4a22a0, 0xbe1cf0de20d33536
+        .quad 0xc086281755366778, 0xbe1cef2edae5837d
+        .quad 0xc08628197e8d02f0, 0xbe1cf0a345318cc9
+        .quad 0xc086281ba74e45d8, 0xbe1cf20085aa34b8
+        .quad 0xc086281dcf7a80c0, 0xbe1cef5fa845ad83
+        .quad 0xc086281ff71203e0, 0xbe1cf050d1df69c4
+        .quad 0xc08628221e151f48, 0xbe1ceffe43c035b9
+        .quad 0xc0862824448422b8, 0xbe1cf14f3018d3c2
+        .quad 0xc08628266a5f5dc0, 0xbe1cef0a5fbae83d
+        .quad 0xc08628288fa71f98, 0xbe1ceff8a95b72a1
+        .quad 0xc086282ab45bb750, 0xbe1cef073aa9849b
+        .quad 0xc086282cd87d73a8, 0xbe1cef69b3835c02
+        .quad 0xc086282efc0ca328, 0xbe1cf0bc139379a9
+        .quad 0xc08628311f099420, 0xbe1cef247a9ec596
+        .quad 0xc086283341749490, 0xbe1cef74bbcc488a
+        .quad 0xc0862835634df248, 0xbe1cef4bc42e7b8e
+        .quad 0xc08628378495fad0, 0xbe1cf136d4d5a810
+        .quad 0xc0862839a54cfb80, 0xbe1cf0d290b24dd8
+        .quad 0xc086283bc5734168, 0xbe1ceeebde8e0065
+        .quad 0xc086283de5091950, 0xbe1cf1a09f60aa1e
+        .quad 0xc0862840040ecfe0, 0xbe1cf0803947a234
+        .quad 0xc08628422284b168, 0xbe1cf0abf7638127
+        .quad 0xc0862844406b0a08, 0xbe1cf0f73ee12058
+        .quad 0xc08628465dc225a0, 0xbe1cf2079971b26c
+        .quad 0xc08628487a8a4fe0, 0xbe1cee74957564b1
+        .quad 0xc086284a96c3d420, 0xbe1ceee77c1b7d43
+        .quad 0xc086284cb26efd90, 0xbe1cf23addba6e09
+        .quad 0xc086284ecd8c1730, 0xbe1cf199f4a1da60
+        .quad 0xc0862850e81b6bb0, 0xbe1cf09fdea81393
+        .quad 0xc0862853021d4588, 0xbe1cf176adb417f7
+        .quad 0xc08628551b91ef00, 0xbe1cf0f64f84a8da
+        .quad 0xc08628573479b220, 0xbe1ceec34cf49523
+        .quad 0xc08628594cd4d8a8, 0xbe1cf16d60fbe0bb
+        .quad 0xc086285b64a3ac40, 0xbe1cee8de7acfc7b
+        .quad 0xc086285d7be67630, 0xbe1ceee6256cce8d
+        .quad 0xc086285f929d7fa0, 0xbe1cee7d66a3d8a5
+        .quad 0xc0862861a8c91170, 0xbe1cf0bef8265792
+        .quad 0xc0862863be697458, 0xbe1cf097f890c6f8
+        .quad 0xc0862865d37ef0c8, 0xbe1cf09502d5c3fc
+        .quad 0xc0862867e809cf00, 0xbe1ceeffb239dac7
+        .quad 0xc0862869fc0a56f8, 0xbe1cf1fbfff95c98
+        .quad 0xc086286c0f80d090, 0xbe1cefa57ad3eef7
+        .quad 0xc086286e226d8348, 0xbe1cf22c58b9183d
+        .quad 0xc086287034d0b690, 0xbe1ceff262d0a248
+        .quad 0xc086287246aab180, 0xbe1cefa7bc194186
+        .quad 0xc086287457fbbb08, 0xbe1cf06782d784d9
+        .quad 0xc086287668c419e0, 0xbe1cf1d44d0eaa07
+        .quad 0xc086287879041490, 0xbe1cf034803c8a48
+        .quad 0xc086287a88bbf158, 0xbe1cf08e84916b6f
+        .quad 0xc086287c97ebf650, 0xbe1cf0c4d3dc1bc7
+        .quad 0xc086287ea6946958, 0xbe1cefb1e4625943
+        .quad 0xc0862880b4b59010, 0xbe1cf143efdd1fd0
+        .quad 0xc0862882c24faff8, 0xbe1cee9896d016da
+        .quad 0xc0862884cf630e38, 0xbe1cf2186072f2cc
+        .quad 0xc0862886dbefeff0, 0xbe1cef9217633d34
+        .quad 0xc0862888e7f699e0, 0xbe1cf05603549486
+        .quad 0xc086288af37750b0, 0xbe1cef50fff513d3
+        .quad 0xc086288cfe7258c0, 0xbe1cf127713b32d0
+        .quad 0xc086288f08e7f650, 0xbe1cf05015520f3d
+        .quad 0xc086289112d86d58, 0xbe1cf12eb458b26f
+        .quad 0xc08628931c4401a8, 0xbe1cf22eae2887ed
+        .quad 0xc0862895252af6e0, 0xbe1cefdd6656dd2d
+        .quad 0xc08628972d8d9058, 0xbe1cf1048ea4e646
+        .quad 0xc0862899356c1150, 0xbe1ceec4501167e9
+        .quad 0xc086289b3cc6bcb8, 0xbe1cf0ad52becc3f
+        .quad 0xc086289d439dd568, 0xbe1cf0daa4e00e35
+        .quad 0xc086289f49f19df8, 0xbe1cf00b80de8d6a
+        .quad 0xc08628a14fc258c8, 0xbe1cf1bcf2ea8464
+        .quad 0xc08628a355104818, 0xbe1cf0435e2782b0
+        .quad 0xc08628a559dbade0, 0xbe1cf0e3e1a5f56c
+        .quad 0xc08628a75e24cbf8, 0xbe1cefed9d5a721d
+        .quad 0xc08628a961ebe3f8, 0xbe1cf0d2d74321e2
+        .quad 0xc08628ab65313750, 0xbe1cf24200eb55e9
+        .quad 0xc08628ad67f50740, 0xbe1cf23e9d7cf979
+        .quad 0xc08628af6a3794d0, 0xbe1cf23a088f421c
+        .quad 0xc08628b16bf920e0, 0xbe1cef2c1de1ab32
+        .quad 0xc08628b36d39ec08, 0xbe1cf1abc231f7b2
+        .quad 0xc08628b56dfa36d0, 0xbe1cf2074d5ba303
+        .quad 0xc08628b76e3a4180, 0xbe1cf05cd5eed880
+        /* Log_LA_table */
+        .align 16
+        .quad 0x8000000000000000
+        .quad 0xbf5ff802a9ab10e6
+        .quad 0xbf6ff00aa2b10bc0
+        .quad 0xbf77ee11ebd82e94
+        .quad 0xbf7fe02a6b106789
+        .quad 0xbf83e7295d25a7d9
+        .quad 0xbf87dc475f810a77
+        .quad 0xbf8bcf712c74384c
+        .quad 0xbf8fc0a8b0fc03e4
+        .quad 0xbf91d7f7eb9eebe7
+        .quad 0xbf93cea44346a575
+        .quad 0xbf95c45a51b8d389
+        .quad 0xbf97b91b07d5b11b
+        .quad 0xbf99ace7551cc514
+        .quad 0xbf9b9fc027af9198
+        .quad 0xbf9d91a66c543cc4
+        .quad 0xbf9f829b0e783300
+        .quad 0xbfa0b94f7c196176
+        .quad 0xbfa1b0d98923d980
+        .quad 0xbfa2a7ec2214e873
+        .quad 0xbfa39e87b9febd60
+        .quad 0xbfa494acc34d911c
+        .quad 0xbfa58a5bafc8e4d5
+        .quad 0xbfa67f94f094bd98
+        .quad 0xbfa77458f632dcfc
+        .quad 0xbfa868a83083f6cf
+        .quad 0xbfa95c830ec8e3eb
+        .quad 0xbfaa4fe9ffa3d235
+        .quad 0xbfab42dd711971bf
+        .quad 0xbfac355dd0921f2d
+        .quad 0xbfad276b8adb0b52
+        .quad 0xbfae19070c276016
+        .quad 0xbfaf0a30c01162a6
+        .quad 0xbfaffae9119b9303
+        .quad 0xbfb075983598e471
+        .quad 0xbfb0ed839b5526fe
+        .quad 0xbfb16536eea37ae1
+        .quad 0xbfb1dcb263db1944
+        .quad 0xbfb253f62f0a1417
+        .quad 0xbfb2cb0283f5de1f
+        .quad 0xbfb341d7961bd1d1
+        .quad 0xbfb3b87598b1b6ee
+        .quad 0xbfb42edcbea646f0
+        .quad 0xbfb4a50d3aa1b040
+        .quad 0xbfb51b073f06183f
+        .quad 0xbfb590cafdf01c28
+        .quad 0xbfb60658a93750c4
+        .quad 0xbfb67bb0726ec0fc
+        .quad 0xbfb6f0d28ae56b4c
+        .quad 0xbfb765bf23a6be13
+        .quad 0xbfb7da766d7b12cd
+        .quad 0xbfb84ef898e8282a
+        .quad 0xbfb8c345d6319b21
+        .quad 0xbfb9375e55595ede
+        .quad 0xbfb9ab42462033ad
+        .quad 0xbfba1ef1d8061cd4
+        .quad 0xbfba926d3a4ad563
+        .quad 0xbfbb05b49bee43fe
+        .quad 0xbfbb78c82bb0eda1
+        .quad 0xbfbbeba818146765
+        .quad 0xbfbc5e548f5bc743
+        .quad 0xbfbcd0cdbf8c13e1
+        .quad 0xbfbd4313d66cb35d
+        .quad 0xbfbdb5270187d927
+        .quad 0xbfbe27076e2af2e6
+        .quad 0xbfbe98b549671467
+        .quad 0xbfbf0a30c01162a6
+        .quad 0xbfbf7b79fec37ddf
+        .quad 0xbfbfec9131dbeabb
+        .quad 0xbfc02ebb42bf3d4b
+        .quad 0xbfc0671512ca596e
+        .quad 0xbfc09f561ee719c3
+        .quad 0xbfc0d77e7cd08e59
+        .quad 0xbfc10f8e422539b1
+        .quad 0xbfc14785846742ac
+        .quad 0xbfc17f6458fca611
+        .quad 0xbfc1b72ad52f67a0
+        .quad 0xbfc1eed90e2dc2c3
+        .quad 0xbfc2266f190a5acb
+        .quad 0xbfc25ded0abc6ad2
+        .quad 0xbfc29552f81ff523
+        .quad 0xbfc2cca0f5f5f251
+        .quad 0xbfc303d718e47fd3
+        .quad 0xbfc33af575770e4f
+        .quad 0xbfc371fc201e8f74
+        .quad 0xbfc3a8eb2d31a376
+        .quad 0xbfc3dfc2b0ecc62a
+        .quad 0xbfc41682bf727bc0
+        .quad 0xbfc44d2b6ccb7d1e
+        .quad 0xbfc483bccce6e3dd
+        .quad 0xbfc4ba36f39a55e5
+        .quad 0xbfc4f099f4a230b2
+        .quad 0xbfc526e5e3a1b438
+        .quad 0xbfc55d1ad4232d6f
+        .quad 0xbfc59338d9982086
+        .quad 0xbfc5c940075972b9
+        .quad 0xbfc5ff3070a793d4
+        .quad 0xbfc6350a28aaa758
+        .quad 0xbfc66acd4272ad51
+        .quad 0xbfc6a079d0f7aad2
+        .quad 0xbfc6d60fe719d21d
+        .quad 0xbfc70b8f97a1aa75
+        .quad 0xbfc740f8f54037a5
+        .quad 0xbfc7764c128f2127
+        .quad 0xbfc7ab890210d909
+        .quad 0xbfc7e0afd630c274
+        .quad 0xbfc815c0a14357eb
+        .quad 0xbfc84abb75865139
+        .quad 0xbfc87fa06520c911
+        .quad 0xbfc8b46f8223625b
+        .quad 0xbfc8e928de886d41
+        .quad 0xbfc91dcc8c340bde
+        .quad 0xbfc9525a9cf456b4
+        .quad 0xbfc986d3228180ca
+        .quad 0xbfc9bb362e7dfb83
+        .quad 0xbfc9ef83d2769a34
+        .quad 0xbfca23bc1fe2b563
+        .quad 0xbfca57df28244dcd
+        .quad 0xbfca8becfc882f19
+        .quad 0xbfcabfe5ae46124c
+        .quad 0xbfcaf3c94e80bff3
+        .quad 0xbfcb2797ee46320c
+        .quad 0xbfcb5b519e8fb5a4
+        .quad 0xbfcb8ef670420c3b
+        .quad 0xbfcbc286742d8cd6
+        .quad 0xbfcbf601bb0e44e2
+        .quad 0xbfcc2968558c18c1
+        .quad 0xbfcc5cba543ae425
+        .quad 0xbfcc8ff7c79a9a22
+        .quad 0xbfccc320c0176502
+        .quad 0xbfccf6354e09c5dc
+        .quad 0xbfcd293581b6b3e7
+        .quad 0xbfcd5c216b4fbb91
+        .quad 0xbfcd8ef91af31d5e
+        .quad 0xbfcdc1bca0abec7d
+        .quad 0xbfcdf46c0c722d2f
+        .quad 0xbfce27076e2af2e6
+        .quad 0xbfce598ed5a87e2f
+        .quad 0xbfce8c0252aa5a60
+        .quad 0xbfcebe61f4dd7b0b
+        .quad 0xbfcef0adcbdc5936
+        .quad 0xbfcf22e5e72f105d
+        .quad 0xbfcf550a564b7b37
+        .quad 0xbfcf871b28955045
+        .quad 0xbfcfb9186d5e3e2b
+        .quad 0xbfcfeb0233e607cc
+        .quad 0xbfd00e6c45ad501d
+        .quad 0xbfd0274dc16c232f
+        .quad 0xbfd0402594b4d041
+        .quad 0xbfd058f3c703ebc6
+        .quad 0xbfd071b85fcd590d
+        .quad 0xbfd08a73667c57af
+        .quad 0xbfd0a324e27390e3
+        .quad 0xbfd0bbccdb0d24bd
+        .quad 0xbfd0d46b579ab74b
+        .quad 0xbfd0ed005f657da4
+        .quad 0xbfd1058bf9ae4ad5
+        .quad 0xbfd11e0e2dad9cb7
+        .quad 0xbfd136870293a8b0
+        .quad 0xbfd14ef67f88685a
+        .quad 0xbfd1675cababa60e
+        .quad 0xbfd17fb98e15095d
+        .quad 0xbfd1980d2dd4236f
+        .quad 0xbfd1b05791f07b49
+        .quad 0xbfd1c898c16999fb
+        .quad 0xbfd1e0d0c33716be
+        .quad 0xbfd1f8ff9e48a2f3
+        .quad 0xbfd211255986160c
+        .quad 0xbfd22941fbcf7966
+        .quad 0xbfd241558bfd1404
+        .quad 0xbfd2596010df763a
+        .quad 0xbfd27161913f853d
+        .quad 0xbfd2895a13de86a3
+        .quad 0xbfd2a1499f762bc9
+        .quad 0xbfd2b9303ab89d25
+        .quad 0xbfd2d10dec508583
+        .quad 0xbfd2e8e2bae11d31
+        .quad 0xbfd300aead06350c
+        .quad 0xbfd31871c9544185
+        .quad 0xbfd3302c16586588
+        .quad 0xbfd347dd9a987d55
+        .quad 0xbfd35f865c93293e
+        .quad 0xbfd3772662bfd85b
+        .quad 0xbfd38ebdb38ed321
+        .quad 0xbfd3a64c556945ea
+        .quad 0xbfd3bdd24eb14b6a
+        .quad 0xbfd3d54fa5c1f710
+        .quad 0xbfd3ecc460ef5f50
+        .quad 0xbfd404308686a7e4
+        .quad 0xbfd41b941cce0bee
+        .quad 0xbfd432ef2a04e814
+        .quad 0xbfd44a41b463c47c
+        .quad 0xbfd4618bc21c5ec2
+        .quad 0xbfd478cd5959b3d9
+        .quad 0xbfd49006804009d1
+        .quad 0xbfd4a7373cecf997
+        .quad 0xbfd4be5f957778a1
+        .quad 0xbfd4d57f8fefe27f
+        .quad 0xbfd4ec973260026a
+        .quad 0xbfd503a682cb1cb3
+        .quad 0xbfd51aad872df82d
+        .quad 0xbfd531ac457ee77e
+        .quad 0xbfd548a2c3add263
+        .quad 0xbfd55f9107a43ee2
+        .quad 0xbfd5767717455a6c
+        .quad 0xbfd58d54f86e02f2
+        .quad 0xbfd5a42ab0f4cfe2
+        .quad 0xbfd5baf846aa1b19
+        .quad 0xbfd5d1bdbf5809ca
+        .quad 0xbfd5e87b20c2954a
+        .quad 0xbfd5ff3070a793d4
+        .quad 0xbfd615ddb4bec13c
+        .quad 0xbfd62c82f2b9c795
+        .quad 0x3fd61965cdb02c1f
+        .quad 0x3fd602d08af091ec
+        .quad 0x3fd5ec433d5c35ae
+        .quad 0x3fd5d5bddf595f30
+        .quad 0x3fd5bf406b543db2
+        .quad 0x3fd5a8cadbbedfa1
+        .quad 0x3fd5925d2b112a59
+        .quad 0x3fd57bf753c8d1fb
+        .quad 0x3fd565995069514c
+        .quad 0x3fd54f431b7be1a9
+        .quad 0x3fd538f4af8f72fe
+        .quad 0x3fd522ae0738a3d8
+        .quad 0x3fd50c6f1d11b97c
+        .quad 0x3fd4f637ebba9810
+        .quad 0x3fd4e0086dd8baca
+        .quad 0x3fd4c9e09e172c3c
+        .quad 0x3fd4b3c077267e9a
+        .quad 0x3fd49da7f3bcc41f
+        .quad 0x3fd487970e958770
+        .quad 0x3fd4718dc271c41b
+        .quad 0x3fd45b8c0a17df13
+        .quad 0x3fd44591e0539f49
+        .quad 0x3fd42f9f3ff62642
+        .quad 0x3fd419b423d5e8c7
+        .quad 0x3fd403d086cea79c
+        .quad 0x3fd3edf463c1683e
+        .quad 0x3fd3d81fb5946dba
+        .quad 0x3fd3c25277333184
+        .quad 0x3fd3ac8ca38e5c5f
+        .quad 0x3fd396ce359bbf54
+        .quad 0x3fd3811728564cb2
+        .quad 0x3fd36b6776be1117
+        .quad 0x3fd355bf1bd82c8b
+        .quad 0x3fd3401e12aecba1
+        .quad 0x3fd32a84565120a8
+        .quad 0x3fd314f1e1d35ce4
+        .quad 0x3fd2ff66b04ea9d4
+        .quad 0x3fd2e9e2bce12286
+        .quad 0x3fd2d46602adccee
+        .quad 0x3fd2bef07cdc9354
+        .quad 0x3fd2a982269a3dbf
+        .quad 0x3fd2941afb186b7c
+        .quad 0x3fd27ebaf58d8c9d
+        .quad 0x3fd269621134db92
+        .quad 0x3fd25410494e56c7
+        .quad 0x3fd23ec5991eba49
+        .quad 0x3fd22981fbef797b
+        .quad 0x3fd214456d0eb8d4
+        .quad 0x3fd1ff0fe7cf47a7
+        .quad 0x3fd1e9e1678899f4
+        .quad 0x3fd1d4b9e796c245
+        .quad 0x3fd1bf99635a6b95
+        .quad 0x3fd1aa7fd638d33f
+        .quad 0x3fd1956d3b9bc2fa
+        .quad 0x3fd180618ef18adf
+        .quad 0x3fd16b5ccbacfb73
+        .quad 0x3fd1565eed455fc3
+        .quad 0x3fd14167ef367783
+        .quad 0x3fd12c77cd00713b
+        .quad 0x3fd1178e8227e47c
+        .quad 0x3fd102ac0a35cc1c
+        .quad 0x3fd0edd060b78081
+        .quad 0x3fd0d8fb813eb1ef
+        .quad 0x3fd0c42d676162e3
+        .quad 0x3fd0af660eb9e279
+        .quad 0x3fd09aa572e6c6d4
+        .quad 0x3fd085eb8f8ae797
+        .quad 0x3fd07138604d5862
+        .quad 0x3fd05c8be0d9635a
+        .quad 0x3fd047e60cde83b8
+        .quad 0x3fd03346e0106062
+        .quad 0x3fd01eae5626c691
+        .quad 0x3fd00a1c6adda473
+        .quad 0x3fcfeb2233ea07cd
+        .quad 0x3fcfc218be620a5e
+        .quad 0x3fcf991c6cb3b379
+        .quad 0x3fcf702d36777df0
+        .quad 0x3fcf474b134df229
+        .quad 0x3fcf1e75fadf9bde
+        .quad 0x3fcef5ade4dcffe6
+        .quad 0x3fceccf2c8fe920a
+        .quad 0x3fcea4449f04aaf5
+        .quad 0x3fce7ba35eb77e2a
+        .quad 0x3fce530effe71012
+        .quad 0x3fce2a877a6b2c12
+        .quad 0x3fce020cc6235ab5
+        .quad 0x3fcdd99edaf6d7e9
+        .quad 0x3fcdb13db0d48940
+        .quad 0x3fcd88e93fb2f450
+        .quad 0x3fcd60a17f903515
+        .quad 0x3fcd38666871f465
+        .quad 0x3fcd1037f2655e7b
+        .quad 0x3fcce816157f1988
+        .quad 0x3fccc000c9db3c52
+        .quad 0x3fcc97f8079d44ec
+        .quad 0x3fcc6ffbc6f00f71
+        .quad 0x3fcc480c0005ccd1
+        .quad 0x3fcc2028ab17f9b4
+        .quad 0x3fcbf851c067555f
+        .quad 0x3fcbd087383bd8ad
+        .quad 0x3fcba8c90ae4ad19
+        .quad 0x3fcb811730b823d2
+        .quad 0x3fcb5971a213acdb
+        .quad 0x3fcb31d8575bce3d
+        .quad 0x3fcb0a4b48fc1b46
+        .quad 0x3fcae2ca6f672bd4
+        .quad 0x3fcabb55c31693ad
+        .quad 0x3fca93ed3c8ad9e3
+        .quad 0x3fca6c90d44b704e
+        .quad 0x3fca454082e6ab05
+        .quad 0x3fca1dfc40f1b7f1
+        .quad 0x3fc9f6c407089664
+        .quad 0x3fc9cf97cdce0ec3
+        .quad 0x3fc9a8778debaa38
+        .quad 0x3fc981634011aa75
+        .quad 0x3fc95a5adcf7017f
+        .quad 0x3fc9335e5d594989
+        .quad 0x3fc90c6db9fcbcd9
+        .quad 0x3fc8e588ebac2dbf
+        .quad 0x3fc8beafeb38fe8c
+        .quad 0x3fc897e2b17b19a5
+        .quad 0x3fc871213750e994
+        .quad 0x3fc84a6b759f512f
+        .quad 0x3fc823c16551a3c2
+        .quad 0x3fc7fd22ff599d4f
+        .quad 0x3fc7d6903caf5ad0
+        .quad 0x3fc7b0091651528c
+        .quad 0x3fc7898d85444c73
+        .quad 0x3fc7631d82935a86
+        .quad 0x3fc73cb9074fd14d
+        .quad 0x3fc716600c914054
+        .quad 0x3fc6f0128b756abc
+        .quad 0x3fc6c9d07d203fc7
+        .quad 0x3fc6a399dabbd383
+        .quad 0x3fc67d6e9d785771
+        .quad 0x3fc6574ebe8c133a
+        .quad 0x3fc6313a37335d76
+        .quad 0x3fc60b3100b09476
+        .quad 0x3fc5e533144c1719
+        .quad 0x3fc5bf406b543db2
+        .quad 0x3fc59958ff1d52f1
+        .quad 0x3fc5737cc9018cdd
+        .quad 0x3fc54dabc26105d2
+        .quad 0x3fc527e5e4a1b58d
+        .quad 0x3fc5022b292f6a45
+        .quad 0x3fc4dc7b897bc1c8
+        .quad 0x3fc4b6d6fefe22a4
+        .quad 0x3fc4913d8333b561
+        .quad 0x3fc46baf0f9f5db7
+        .quad 0x3fc4462b9dc9b3dc
+        .quad 0x3fc420b32740fdd4
+        .quad 0x3fc3fb45a59928cc
+        .quad 0x3fc3d5e3126bc27f
+        .quad 0x3fc3b08b6757f2a9
+        .quad 0x3fc38b3e9e027479
+        .quad 0x3fc365fcb0159016
+        .quad 0x3fc340c59741142e
+        .quad 0x3fc31b994d3a4f85
+        .quad 0x3fc2f677cbbc0a96
+        .quad 0x3fc2d1610c86813a
+        .quad 0x3fc2ac55095f5c59
+        .quad 0x3fc28753bc11aba5
+        .quad 0x3fc2625d1e6ddf57
+        .quad 0x3fc23d712a49c202
+        .quad 0x3fc2188fd9807263
+        .quad 0x3fc1f3b925f25d41
+        .quad 0x3fc1ceed09853752
+        .quad 0x3fc1aa2b7e23f72a
+        .quad 0x3fc185747dbecf34
+        .quad 0x3fc160c8024b27b1
+        .quad 0x3fc13c2605c398c3
+        .quad 0x3fc1178e8227e47c
+        .quad 0x3fc0f301717cf0fb
+        .quad 0x3fc0ce7ecdccc28d
+        .quad 0x3fc0aa06912675d5
+        .quad 0x3fc08598b59e3a07
+        .quad 0x3fc06135354d4b18
+        .quad 0x3fc03cdc0a51ec0d
+        .quad 0x3fc0188d2ecf6140
+        .quad 0x3fbfe89139dbd566
+        .quad 0x3fbfa01c9db57ce2
+        .quad 0x3fbf57bc7d9005db
+        .quad 0x3fbf0f70cdd992e3
+        .quad 0x3fbec739830a1120
+        .quad 0x3fbe7f1691a32d3e
+        .quad 0x3fbe3707ee30487b
+        .quad 0x3fbdef0d8d466db9
+        .quad 0x3fbda727638446a2
+        .quad 0x3fbd5f55659210e2
+        .quad 0x3fbd179788219364
+        .quad 0x3fbccfedbfee13a8
+        .quad 0x3fbc885801bc4b23
+        .quad 0x3fbc40d6425a5cb1
+        .quad 0x3fbbf968769fca11
+        .quad 0x3fbbb20e936d6974
+        .quad 0x3fbb6ac88dad5b1c
+        .quad 0x3fbb23965a52ff00
+        .quad 0x3fbadc77ee5aea8c
+        .quad 0x3fba956d3ecade63
+        .quad 0x3fba4e7640b1bc38
+        .quad 0x3fba0792e9277cac
+        .quad 0x3fb9c0c32d4d2548
+        .quad 0x3fb97a07024cbe74
+        .quad 0x3fb9335e5d594989
+        .quad 0x3fb8ecc933aeb6e8
+        .quad 0x3fb8a6477a91dc29
+        .quad 0x3fb85fd927506a48
+        .quad 0x3fb8197e2f40e3f0
+        .quad 0x3fb7d33687c293c9
+        .quad 0x3fb78d02263d82d3
+        .quad 0x3fb746e100226ed9
+        .quad 0x3fb700d30aeac0e1
+        .quad 0x3fb6bad83c1883b6
+        .quad 0x3fb674f089365a7a
+        .quad 0x3fb62f1be7d77743
+        .quad 0x3fb5e95a4d9791cb
+        .quad 0x3fb5a3abb01ade25
+        .quad 0x3fb55e10050e0384
+        .quad 0x3fb518874226130a
+        .quad 0x3fb4d3115d207eac
+        .quad 0x3fb48dae4bc31018
+        .quad 0x3fb4485e03dbdfad
+        .quad 0x3fb403207b414b7f
+        .quad 0x3fb3bdf5a7d1ee64
+        .quad 0x3fb378dd7f749714
+        .quad 0x3fb333d7f8183f4b
+        .quad 0x3fb2eee507b40301
+        .quad 0x3fb2aa04a44717a5
+        .quad 0x3fb26536c3d8c369
+        .quad 0x3fb2207b5c78549e
+        .quad 0x3fb1dbd2643d190b
+        .quad 0x3fb1973bd1465567
+        .quad 0x3fb152b799bb3cc9
+        .quad 0x3fb10e45b3cae831
+        .quad 0x3fb0c9e615ac4e17
+        .quad 0x3fb08598b59e3a07
+        .quad 0x3fb0415d89e74444
+        .quad 0x3faffa6911ab9301
+        .quad 0x3faf723b517fc523
+        .quad 0x3faeea31c006b87c
+        .quad 0x3fae624c4a0b5e1b
+        .quad 0x3fadda8adc67ee4e
+        .quad 0x3fad52ed6405d86f
+        .quad 0x3faccb73cdddb2cc
+        .quad 0x3fac441e06f72a9e
+        .quad 0x3fabbcebfc68f420
+        .quad 0x3fab35dd9b58baad
+        .quad 0x3faaaef2d0fb10fc
+        .quad 0x3faa282b8a936171
+        .quad 0x3fa9a187b573de7c
+        .quad 0x3fa91b073efd7314
+        .quad 0x3fa894aa149fb343
+        .quad 0x3fa80e7023d8ccc4
+        .quad 0x3fa788595a3577ba
+        .quad 0x3fa70265a550e777
+        .quad 0x3fa67c94f2d4bb58
+        .quad 0x3fa5f6e73078efb8
+        .quad 0x3fa5715c4c03ceef
+        .quad 0x3fa4ebf43349e26f
+        .quad 0x3fa466aed42de3ea
+        .quad 0x3fa3e18c1ca0ae92
+        .quad 0x3fa35c8bfaa1306b
+        .quad 0x3fa2d7ae5c3c5bae
+        .quad 0x3fa252f32f8d183f
+        .quad 0x3fa1ce5a62bc353a
+        .quad 0x3fa149e3e4005a8d
+        .quad 0x3fa0c58fa19dfaaa
+        .quad 0x3fa0415d89e74444
+        .quad 0x3f9f7a9b16782856
+        .quad 0x3f9e72bf2813ce51
+        .quad 0x3f9d6b2725979802
+        .quad 0x3f9c63d2ec14aaf2
+        .quad 0x3f9b5cc258b718e6
+        .quad 0x3f9a55f548c5c43f
+        .quad 0x3f994f6b99a24475
+        .quad 0x3f98492528c8cabf
+        .quad 0x3f974321d3d006d3
+        .quad 0x3f963d6178690bd6
+        .quad 0x3f9537e3f45f3565
+        .quad 0x3f9432a925980cc1
+        .quad 0x3f932db0ea132e22
+        .quad 0x3f9228fb1fea2e28
+        .quad 0x3f912487a5507f70
+        .quad 0x3f90205658935847
+        .quad 0x3f8e38ce3033310c
+        .quad 0x3f8c317384c75f06
+        .quad 0x3f8a2a9c6c170462
+        .quad 0x3f882448a388a2aa
+        .quad 0x3f861e77e8b53fc6
+        .quad 0x3f841929f96832f0
+        .quad 0x3f82145e939ef1e9
+        .quad 0x3f8010157588de71
+        .quad 0x3f7c189cbb0e27fb
+        .quad 0x3f78121214586b54
+        .quad 0x3f740c8a747878e2
+        .quad 0x3f70080559588b35
+        .quad 0x3f680904828985c0
+        .quad 0x3f60040155d5889e
+        .quad 0x3f50020055655889
+        .quad 0x0000000000000000
+        /*== poly_coeff[4] ==*/
+        .align 16
+        .quad 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A /* coeff4 */
+        .quad 0xbfd0000148058EE1, 0xbfd0000148058EE1 /* coeff3 */
+        .quad 0x3fd55555555543C5, 0x3fd55555555543C5 /* coeff2 */
+        .quad 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F /* coeff1 */
+        /*== ExpMask ==*/
+        .align 16
+        .quad 0x000fffffffffffff, 0x000fffffffffffff
+        /*== Two10 ==*/
+        .align 16
+        .quad 0x3f50000000000000, 0x3f50000000000000
+        /*== MinLog1p = -1+2^(-53) ==*/
+        .align 16
+        .quad 0xbfefffffffffffff, 0xbfefffffffffffff
+        /*== MaxLog1p ==*/
+        .align 16
+        .quad 0x7f3ffffffffff000, 0x7f3ffffffffff000
+        /*== One ==*/
+        .align 16
+        .quad 0x3ff0000000000000, 0x3ff0000000000000
+        /*== SgnMask ==*/
+        .align 16
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff
+        /*== XThreshold ==*/
+        .align 16
+        .quad 0x3e00000000000000, 0x3e00000000000000
+        /*== XhMask ==*/
+        .align 16
+        .quad 0xfffffffffffffc00, 0xfffffffffffffc00
+        /*== Threshold ==*/
+        .align 16
+        .quad 0x4086a00000000000, 0x4086a00000000000
+        /*== Bias ==*/
+        .align 16
+        .quad 0x408ff80000000000, 0x408ff80000000000
+        /*== Bias1 ==*/
+        .align 16
+        .quad 0x408ff00000000000, 0x408ff00000000000
+        /*== ExpMask0 ==*/
+        .align 16
+        .quad 0x7ff0000000000000, 0x7ff0000000000000
+        /*== ExpMask2 ==*/
+        .align 16
+        .quad 0x7f40000000000000, 0x7f40000000000000
+        /*== L2 ==*/
+        .align 16
+        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
+        /*== dHalf ==*/
+        .align 16
+        .quad 0x3FE0000000000000, 0x3FE0000000000000
+        /*== dSign ==*/
+        .align 16
+        .quad 0x8000000000000000, 0x8000000000000000
+        /*== dTopMask12 ==*/
+        .align 16
+        .quad 0xFFFFFE0000000000, 0xFFFFFE0000000000
+        /*== dTopMask41 ==*/
+        .align 16
+        .quad 0xFFFFFFFFFFFFF000, 0xFFFFFFFFFFFFF000
+        /*== dTinyRange ==*/
+        .align 16
+        .quad 0x0350000000000000, 0x0350000000000000
+        .align 16
+        .type	__svml_datanh_data_internal,@object
+        .size	__svml_datanh_data_internal,.-__svml_datanh_data_internal
+        .align 16
+
+.FLT_21:
+        .long	0x00000000,0x43380000,0x00000000,0x43380000
+        .type	.FLT_21,@object
+        .size	.FLT_21,16
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core-sse.S
new file mode 100644
index 0000000000..a39cbb7595
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized atanh, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN4v_atanh _ZGVdN4v_atanh_sse_wrapper
+#include "../svml_d_atanh4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core.c
new file mode 100644
index 0000000000..e8ef343ae7
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized atanh, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN4v_atanh
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN4v_atanh, __GI__ZGVdN4v_atanh, __redirect__ZGVdN4v_atanh)
+  __attribute__ ((visibility ("hidden")));
+#endif
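As a minimal usage sketch (assuming GCC's OpenMP-SIMD libmvec support and AVX2; the flags shown are illustrative and the variant actually reached depends on the toolchain and on the ifunc selector above), a loop such as the following can be auto-vectorized into calls to _ZGVdN4v_atanh:

    #include <math.h>

    /* Build e.g. with: gcc -O2 -fopenmp-simd -mavx2 -ffast-math vec.c -lmvec -lm
       (hypothetical flags; exact requirements depend on the toolchain).  */
    void
    vec_atanh (double *restrict out, const double *restrict in, int n)
    {
    #pragma omp simd
      for (int i = 0; i < n; i++)
        out[i] = atanh (in[i]);
    }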
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core_avx2.S
new file mode 100644
index 0000000000..1230029da2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core_avx2.S
@@ -0,0 +1,1479 @@
+/* Function atanh vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute atanh(x) as 0.5 * log((1 + x)/(1 - x))
+ *
+ *   Special cases:
+ *
+ *   atanh(0)  = 0
+ *   atanh(+1) = +INF
+ *   atanh(-1) = -INF
+ *   atanh(x)  = NaN if |x| > 1, or if x is a NaN or INF
+ *
+ */
+
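In scalar terms, the reduction above amounts to the following reference model (a sketch only, with illustrative names; the actual code works on |x|, splits 1 - x into high and low parts, and defers |x| >= 1, NaN and tiny inputs to dedicated paths):

    #include <math.h>

    /* Scalar model of the per-lane computation; not the code in this file.  */
    static double
    atanh_ref (double x)
    {
      double ax = fabs (x);
      if (!(ax < 1.0))              /* |x| >= 1 or NaN: scalar callout.  */
        return atanh (x);
      double v = 2.0 * ax;          /* V = 2 * X.  */
      double u = 1.0 - ax;          /* UHi + ULo = 1 - X in the real code.  */
      return copysign (0.5 * log1p (v / u), x);
    }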
+/* Offsets for data table __svml_datanh_data_internal
+ */
+#define Log_HA_table                  	0
+#define Log_LA_table                  	8224
+#define poly_coeff                    	12352
+#define ExpMask                       	12480
+#define Two10                         	12512
+#define MinLog1p                      	12544
+#define MaxLog1p                      	12576
+#define One                           	12608
+#define SgnMask                       	12640
+#define XThreshold                    	12672
+#define XhMask                        	12704
+#define Threshold                     	12736
+#define Bias                          	12768
+#define Bias1                         	12800
+#define ExpMask0                      	12832
+#define ExpMask2                      	12864
+#define L2                            	12896
+#define dHalf                         	12928
+#define dSign                         	12960
+#define dTopMask12                    	12992
+#define dTopMask41                    	13024
+#define TinyRange                     	13056
+
+/* Lookup bias for data table __svml_datanh_data_internal.  */
+#define Table_Lookup_Bias               -0x405fe0
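The bias folds the raw bit pattern of the table index into a byte offset: the rounded reciprocal lies roughly in [2^9, 2^10], so shifting its IEEE-754 bits right by 40 gives 0x408000 + 8k, and adding -0x405fe0 maps the smallest such index straight onto Log_LA_table (byte offset 8224).  A small stand-alone check of that reading (hypothetical, not part of the patch):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int
    main (void)
    {
      /* Smallest rounded reciprocal the lookup can see is about 2^9.  */
      double rcp = 512.0;
      uint64_t bits;
      memcpy (&bits, &rcp, sizeof bits);
      long idx = (long) (bits >> 40);                /* 0x408000 */
      printf ("raw %#lx -> table offset %ld\n", idx, idx - 0x405fe0);  /* 8224 */
      return 0;
    }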
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN4v_atanh_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+        lea       Table_Lookup_Bias+__svml_datanh_data_internal(%rip), %r8
+        vmovupd   SgnMask+__svml_datanh_data_internal(%rip), %ymm7
+
+/* Load the constant 1 and a sign mask */
+        vmovupd   One+__svml_datanh_data_internal(%rip), %ymm11
+        vmovapd   %ymm0, %ymm12
+
+/* Strip off the sign, so treat X as positive until right at the end */
+        vandpd    %ymm7, %ymm12, %ymm0
+        vsubpd    %ymm0, %ymm11, %ymm6
+
+/*
+ * Check whether |X| < 1, in which case we use the main function.
+ * Otherwise set the rangemask so that the callout will get used.
+ * Note that this will also use the callout for NaNs since not(NaN < 1).
+ */
+        vcmpnlt_uqpd %ymm11, %ymm0, %ymm13
+        vcmplt_oqpd TinyRange+__svml_datanh_data_internal(%rip), %ymm0, %ymm10
+        vsubpd    %ymm6, %ymm11, %ymm15
+
+/*
+ * Compute V = 2 * X trivially, and UHi + ULo = 1 - X in two pieces,
+ * the upper part UHi being <= 41 bits long. Then we have
+ * atanh(X) = 1/2 * log((1 + X) / (1 - X)) = 1/2 * log1p(V / (UHi + ULo)).
+ */
+        vaddpd    %ymm0, %ymm0, %ymm3
+        vcvtpd2ps %ymm6, %xmm5
+        vsubpd    %ymm0, %ymm15, %ymm1
+        vrcpps    %xmm5, %xmm4
+        vmovapd   %ymm12, %ymm14
+        vfmadd213pd %ymm12, %ymm12, %ymm14
+        vcvtps2pd %xmm4, %ymm2
+
+/* Record the sign for eventual reincorporation. */
+        vandpd    dSign+__svml_datanh_data_internal(%rip), %ymm12, %ymm9
+
+/* Or the sign bit in with the tiny result to handle atanh(-0) correctly */
+        vorpd     %ymm9, %ymm14, %ymm8
+        vandpd    dTopMask12+__svml_datanh_data_internal(%rip), %ymm2, %ymm14
+
+/* No need to split dU when FMA is available */
+        vfnmadd213pd %ymm11, %ymm14, %ymm6
+        vfnmadd231pd %ymm14, %ymm1, %ymm6
+
+/*
+ * Compute D = E + E^2 + E^3 + E^4 + E^5
+ * = E + (E + E^2) (E + E * E^2)
+ * Only saves when FMA is available
+ */
+        vmovapd   %ymm11, %ymm0
+        vmovapd   %ymm6, %ymm5
+        vfmadd231pd %ymm6, %ymm6, %ymm0
+        vfmadd213pd %ymm6, %ymm6, %ymm5
+        vfmadd213pd %ymm11, %ymm0, %ymm5
+        vmovmskpd %ymm13, %eax
+
+/*
+ * Split V as well into upper 41 bits and lower part, so that we can get
+ * a preliminary quotient estimate without rounding error.
+ */
+        vandpd    dTopMask41+__svml_datanh_data_internal(%rip), %ymm3, %ymm13
+        vsubpd    %ymm13, %ymm3, %ymm15
+
+/* Hence get initial quotient estimate QHi + QLo = R * VHi + R * VLo */
+        vmulpd    %ymm13, %ymm14, %ymm2
+        vmulpd    %ymm5, %ymm6, %ymm0
+        vmulpd    %ymm15, %ymm14, %ymm4
+
+/* 2^(-10-exp(X)) */
+        vmovupd   ExpMask2+__svml_datanh_data_internal(%rip), %ymm15
+
+/*
+ * Compute R * (VHi + VLo) * (1 + E + E^2 + E^3 + E^4 + E^5)
+ * = R *  (VHi + VLo) * (1 + D)
+ * = QHi + (QHi * D + QLo + QLo * D)
+ */
+        vmulpd    %ymm0, %ymm2, %ymm6
+        vfmadd213pd %ymm4, %ymm4, %ymm0
+        vaddpd    %ymm0, %ymm6, %ymm5
+
+/*
+ * Now finally accumulate the high and low parts of the
+ * argument to log1p, H + L, with a final compensated summation.
+ */
+        vaddpd    %ymm5, %ymm2, %ymm4
+
+/*
+ * Now we feed into the log1p code, using H in place of _VARG1 and
+ * later incorporating L into the reduced argument.
+ * compute 1+x as high, low parts
+ */
+        vmaxpd    %ymm4, %ymm11, %ymm1
+        vminpd    %ymm4, %ymm11, %ymm3
+        vandpd    %ymm7, %ymm4, %ymm7
+        vcmplt_oqpd XThreshold+__svml_datanh_data_internal(%rip), %ymm7, %ymm0
+        vaddpd    %ymm3, %ymm1, %ymm5
+        vorpd     XhMask+__svml_datanh_data_internal(%rip), %ymm0, %ymm4
+        vandpd    %ymm4, %ymm5, %ymm5
+
+/* preserve mantissa, set input exponent to 2^(-10) */
+        vandpd    ExpMask+__svml_datanh_data_internal(%rip), %ymm5, %ymm6
+        vorpd     Two10+__svml_datanh_data_internal(%rip), %ymm6, %ymm7
+
+/* reciprocal approximation good to at least 11 bits */
+        vcvtpd2ps %ymm7, %xmm13
+        vsubpd    %ymm5, %ymm1, %ymm2
+        vrcpps    %xmm13, %xmm14
+        vaddpd    %ymm2, %ymm3, %ymm4
+        vcvtps2pd %xmm14, %ymm3
+
+/* exponent bits */
+        vpsrlq    $20, %ymm5, %ymm2
+
+/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
+        vroundpd  $0, %ymm3, %ymm3
+
+/*
+ * prepare table index
+ * table lookup
+ */
+        vpsrlq    $40, %ymm3, %ymm13
+
+/* exponent of X needed to scale Xl */
+        vandps    ExpMask0+__svml_datanh_data_internal(%rip), %ymm5, %ymm0
+        vpsubq    %ymm0, %ymm15, %ymm6
+
+/* Finally, halve the result and reincorporate the sign */
+        vxorpd    dHalf+__svml_datanh_data_internal(%rip), %ymm9, %ymm9
+        vmovd     %xmm13, %edx
+        vextractf128 $1, %ymm13, %xmm0
+        movslq    %edx, %rdx
+        vpextrd   $2, %xmm13, %ecx
+        movslq    %ecx, %rcx
+        vmovd     %xmm0, %esi
+        vmovsd    (%r8,%rdx), %xmm14
+        vmovhpd   (%r8,%rcx), %xmm14, %xmm15
+
+/* exponent*log(2.0) */
+        vmovupd   Threshold+__svml_datanh_data_internal(%rip), %ymm14
+        movslq    %esi, %rsi
+        vpextrd   $2, %xmm0, %edi
+        movslq    %edi, %rdi
+        vextractf128 $1, %ymm2, %xmm1
+        vshufps   $221, %xmm1, %xmm2, %xmm7
+
+/* scale DblRcp */
+        vmulpd    %ymm6, %ymm3, %ymm2
+        vmovsd    (%r8,%rsi), %xmm6
+
+/* biased exponent in DP format */
+        vcvtdq2pd %xmm7, %ymm1
+        vmovhpd   (%r8,%rdi), %xmm6, %xmm7
+        vcmplt_oqpd %ymm3, %ymm14, %ymm3
+
+/* argument reduction */
+        vfmsub213pd %ymm11, %ymm2, %ymm5
+        vmulpd    %ymm2, %ymm4, %ymm11
+        vmovupd   poly_coeff+64+__svml_datanh_data_internal(%rip), %ymm2
+        vaddpd    %ymm11, %ymm5, %ymm5
+        vandpd    Bias+__svml_datanh_data_internal(%rip), %ymm3, %ymm3
+        vorpd     Bias1+__svml_datanh_data_internal(%rip), %ymm3, %ymm6
+        vsubpd    %ymm6, %ymm1, %ymm1
+        vfmadd213pd poly_coeff+96+__svml_datanh_data_internal(%rip), %ymm5, %ymm2
+        vmulpd    %ymm5, %ymm5, %ymm4
+        vmulpd    L2+__svml_datanh_data_internal(%rip), %ymm1, %ymm3
+
+/* polynomial */
+        vmovupd   poly_coeff+__svml_datanh_data_internal(%rip), %ymm1
+        vfmadd213pd poly_coeff+32+__svml_datanh_data_internal(%rip), %ymm5, %ymm1
+        vfmadd213pd %ymm2, %ymm4, %ymm1
+
+/* reconstruction */
+        vfmadd213pd %ymm5, %ymm4, %ymm1
+        vinsertf128 $1, %xmm7, %ymm15, %ymm0
+        vaddpd    %ymm1, %ymm0, %ymm0
+        vaddpd    %ymm0, %ymm3, %ymm6
+        vmulpd    %ymm6, %ymm9, %ymm0
+        vblendvpd %ymm10, %ymm8, %ymm0, %ymm0
+        testl     %eax, %eax
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm12
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovupd   %ymm12, 32(%rsp)
+        vmovupd   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0
+
+        xorl      %edx, %edx
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovupd   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      atanh@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 64(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN4v_atanh_avx2)
+
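The special-value path above stores the original arguments and the vector result on the stack, then walks the rangemask one lane at a time and repairs every flagged lane with the scalar atanh.  In C it is roughly the following (a sketch with illustrative names, not the code itself):

    #include <math.h>

    /* Model of L(SPECIAL_VALUES_BRANCH)/L(SPECIAL_VALUES_LOOP): redo any lane
       whose rangemask bit is set (|x| >= 1 or NaN) with the scalar routine.  */
    static void
    fixup_special_lanes (double result[4], const double arg[4], unsigned mask)
    {
      for (int lane = 0; lane < 4; lane++)
        if (mask & (1u << lane))
          result[lane] = atanh (arg[lane]);
    }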
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_datanh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(32)) VUINT32 Log_HA_table[(1<<10)+2][2];
+        __declspec(align(32)) VUINT32 Log_LA_table[(1<<9)+1][2];
+        __declspec(align(32)) VUINT32 poly_coeff[4][4][2];
+        __declspec(align(32)) VUINT32 ExpMask[4][2];
+        __declspec(align(32)) VUINT32 Two10[4][2];
+        __declspec(align(32)) VUINT32 MinLog1p[4][2];
+        __declspec(align(32)) VUINT32 MaxLog1p[4][2];
+        __declspec(align(32)) VUINT32 One[4][2];
+        __declspec(align(32)) VUINT32 SgnMask[4][2];
+        __declspec(align(32)) VUINT32 XThreshold[4][2];
+        __declspec(align(32)) VUINT32 XhMask[4][2];
+        __declspec(align(32)) VUINT32 Threshold[4][2];
+        __declspec(align(32)) VUINT32 Bias[4][2];
+        __declspec(align(32)) VUINT32 Bias1[4][2];
+        __declspec(align(32)) VUINT32 ExpMask0[4][2];
+        __declspec(align(32)) VUINT32 ExpMask2[4][2];
+        __declspec(align(32)) VUINT32 L2[4][2];
+        __declspec(align(32)) VUINT32 dHalf[4][2];
+        __declspec(align(32)) VUINT32 dSign[4][2];
+        __declspec(align(32)) VUINT32 dTopMask12[4][2];
+        __declspec(align(32)) VUINT32 dTopMask41[4][2];
+        __declspec(align(32)) VUINT32 TinyRange[4][2];
+} __svml_datanh_data_internal;
+#endif
+__svml_datanh_data_internal:
+        /* Log_HA_table */
+        .quad 0xc086232bdd7a8300, 0xbe1ce91eef3fb100
+        .quad 0xc086232fdc7ad828, 0xbe1cefcffda73b6a
+        .quad 0xc0862333d97d2ba0, 0xbe1cef406748f1ff
+        .quad 0xc0862337d48378e0, 0xbe1cef2a9429925a
+        .quad 0xc086233bcd8fb878, 0xbe1cf138d17ebecb
+        .quad 0xc086233fc4a3e018, 0xbe1ceff2dbbbb29e
+        .quad 0xc0862343b9c1e270, 0xbe1cf1a42aae437b
+        .quad 0xc0862347acebaf68, 0xbe1cef3b152048af
+        .quad 0xc086234b9e2333f0, 0xbe1cef20e127805e
+        .quad 0xc086234f8d6a5a30, 0xbe1cf00ad6052cf4
+        .quad 0xc08623537ac30980, 0xbe1cefc4642ee597
+        .quad 0xc0862357662f2660, 0xbe1cf1f277d36e16
+        .quad 0xc086235b4fb092a0, 0xbe1ceed009e8d8e6
+        .quad 0xc086235f37492d28, 0xbe1cf1e4038cb362
+        .quad 0xc08623631cfad250, 0xbe1cf0b0873b8557
+        .quad 0xc086236700c75b98, 0xbe1cf15bb3227c0b
+        .quad 0xc086236ae2b09fe0, 0xbe1cf151ef8ca9ed
+        .quad 0xc086236ec2b87358, 0xbe1cefe1dc2cd2ed
+        .quad 0xc0862372a0e0a780, 0xbe1cf0d1eec5454f
+        .quad 0xc08623767d2b0b48, 0xbe1ceeefd570bbce
+        .quad 0xc086237a57996af0, 0xbe1cee99ae91b3a7
+        .quad 0xc086237e302d9028, 0xbe1cf0412830fbd1
+        .quad 0xc086238206e94218, 0xbe1ceee898588610
+        .quad 0xc0862385dbce4548, 0xbe1cee9a1fbcaaea
+        .quad 0xc0862389aede5bc0, 0xbe1ceed8e7cc1ad6
+        .quad 0xc086238d801b4500, 0xbe1cf10c8d059da6
+        .quad 0xc08623914f86be18, 0xbe1ceee6c63a8165
+        .quad 0xc08623951d228180, 0xbe1cf0c3592d2ff1
+        .quad 0xc0862398e8f04758, 0xbe1cf0026cc4cb1b
+        .quad 0xc086239cb2f1c538, 0xbe1cf15d48d8e670
+        .quad 0xc08623a07b28ae60, 0xbe1cef359363787c
+        .quad 0xc08623a44196b390, 0xbe1cefdf1ab2e82c
+        .quad 0xc08623a8063d8338, 0xbe1cefe43c02aa84
+        .quad 0xc08623abc91ec960, 0xbe1cf044f5ae35b7
+        .quad 0xc08623af8a3c2fb8, 0xbe1cf0b0b4001e1b
+        .quad 0xc08623b349975d98, 0xbe1cf1bae76dfbcf
+        .quad 0xc08623b70731f810, 0xbe1cef0a72e13a62
+        .quad 0xc08623bac30da1c8, 0xbe1cf184007d2b6b
+        .quad 0xc08623be7d2bfb40, 0xbe1cf16f4b239e98
+        .quad 0xc08623c2358ea2a0, 0xbe1cf0976acada87
+        .quad 0xc08623c5ec3733d0, 0xbe1cf066318a16ff
+        .quad 0xc08623c9a1274880, 0xbe1ceffaa7148798
+        .quad 0xc08623cd54607820, 0xbe1cf23ab02e9b6e
+        .quad 0xc08623d105e45800, 0xbe1cefdfef7d4fde
+        .quad 0xc08623d4b5b47b20, 0xbe1cf17fece44f2b
+        .quad 0xc08623d863d27270, 0xbe1cf18f907d0d7c
+        .quad 0xc08623dc103fccb0, 0xbe1cee61fe072c98
+        .quad 0xc08623dfbafe1668, 0xbe1cf022dd891e2f
+        .quad 0xc08623e3640eda20, 0xbe1ceecc1daf4358
+        .quad 0xc08623e70b73a028, 0xbe1cf0173c4fa380
+        .quad 0xc08623eab12deec8, 0xbe1cf16a2150c2f4
+        .quad 0xc08623ee553f4a30, 0xbe1cf1bf980b1f4b
+        .quad 0xc08623f1f7a93480, 0xbe1cef8b731663c2
+        .quad 0xc08623f5986d2dc0, 0xbe1cee9a664d7ef4
+        .quad 0xc08623f9378cb3f0, 0xbe1cf1eda2af6400
+        .quad 0xc08623fcd5094320, 0xbe1cf1923f9d68d7
+        .quad 0xc086240070e45548, 0xbe1cf0747cd3e03a
+        .quad 0xc08624040b1f6260, 0xbe1cf22ee855bd6d
+        .quad 0xc0862407a3bbe078, 0xbe1cf0d57360c00b
+        .quad 0xc086240b3abb4398, 0xbe1ceebc815cd575
+        .quad 0xc086240ed01efdd0, 0xbe1cf03bfb970951
+        .quad 0xc086241263e87f50, 0xbe1cf16e74768529
+        .quad 0xc0862415f6193658, 0xbe1cefec64b8becb
+        .quad 0xc086241986b28f30, 0xbe1cf0838d210baa
+        .quad 0xc086241d15b5f448, 0xbe1cf0ea86e75b11
+        .quad 0xc0862420a324ce28, 0xbe1cf1708d11d805
+        .quad 0xc08624242f008380, 0xbe1ceea988c5a417
+        .quad 0xc0862427b94a7910, 0xbe1cef166a7bbca5
+        .quad 0xc086242b420411d0, 0xbe1cf0c9d9e86a38
+        .quad 0xc086242ec92eaee8, 0xbe1cef0946455411
+        .quad 0xc08624324ecbaf98, 0xbe1cefea60907739
+        .quad 0xc0862435d2dc7160, 0xbe1cf1ed0934ce42
+        .quad 0xc086243955624ff8, 0xbe1cf191ba746c7d
+        .quad 0xc086243cd65ea548, 0xbe1ceeec78cf2a7e
+        .quad 0xc086244055d2c968, 0xbe1cef345284c119
+        .quad 0xc0862443d3c012b8, 0xbe1cf24f77355219
+        .quad 0xc08624475027d5e8, 0xbe1cf05bf087e114
+        .quad 0xc086244acb0b65d0, 0xbe1cef3504a32189
+        .quad 0xc086244e446c1398, 0xbe1ceff54b2a406f
+        .quad 0xc0862451bc4b2eb8, 0xbe1cf0757d54ed4f
+        .quad 0xc086245532aa04f0, 0xbe1cf0c8099fdfd5
+        .quad 0xc0862458a789e250, 0xbe1cf0b173796a31
+        .quad 0xc086245c1aec1138, 0xbe1cf11d8734540d
+        .quad 0xc086245f8cd1da60, 0xbe1cf1916a723ceb
+        .quad 0xc0862462fd3c84d8, 0xbe1cf19a911e1da7
+        .quad 0xc08624666c2d5608, 0xbe1cf23a9ef72e4f
+        .quad 0xc0862469d9a591c0, 0xbe1cef503d947663
+        .quad 0xc086246d45a67a18, 0xbe1cf0fceeb1a0b2
+        .quad 0xc0862470b0314fa8, 0xbe1cf107e27e4fbc
+        .quad 0xc086247419475160, 0xbe1cf03dd9922331
+        .quad 0xc086247780e9bc98, 0xbe1cefce1a10e129
+        .quad 0xc086247ae719cd18, 0xbe1ceea47f73c4f6
+        .quad 0xc086247e4bd8bd10, 0xbe1ceec0ac56d100
+        .quad 0xc0862481af27c528, 0xbe1cee8a6593278a
+        .quad 0xc086248511081c70, 0xbe1cf2231dd9dec7
+        .quad 0xc0862488717af888, 0xbe1cf0b4b8ed7da8
+        .quad 0xc086248bd0818d68, 0xbe1cf1bd8d835002
+        .quad 0xc086248f2e1d0d98, 0xbe1cf259acc107f4
+        .quad 0xc08624928a4eaa20, 0xbe1cee897636b00c
+        .quad 0xc0862495e5179270, 0xbe1cee757f20c326
+        .quad 0xc08624993e78f490, 0xbe1cefafd3aa54a4
+        .quad 0xc086249c9673fd10, 0xbe1cee7298d38b97
+        .quad 0xc086249fed09d6f8, 0xbe1ceedc158d4ceb
+        .quad 0xc08624a3423babe0, 0xbe1cf2282987cb2e
+        .quad 0xc08624a6960aa400, 0xbe1cefe7381ecc4b
+        .quad 0xc08624a9e877e600, 0xbe1cef328dbbce80
+        .quad 0xc08624ad39849728, 0xbe1cefde45f3cc71
+        .quad 0xc08624b08931db58, 0xbe1cefa8b89433b9
+        .quad 0xc08624b3d780d500, 0xbe1cef6773c0b139
+        .quad 0xc08624b72472a528, 0xbe1cf031c931c11f
+        .quad 0xc08624ba70086b78, 0xbe1cf088f49275e7
+        .quad 0xc08624bdba434630, 0xbe1cf17de0eaa86d
+        .quad 0xc08624c103245238, 0xbe1cefd492f1ba75
+        .quad 0xc08624c44aacab08, 0xbe1cf1253e154466
+        .quad 0xc08624c790dd6ad0, 0xbe1cf0fb09ee6d55
+        .quad 0xc08624cad5b7aa58, 0xbe1cf1f08dd048fe
+        .quad 0xc08624ce193c8120, 0xbe1ceeca0809697f
+        .quad 0xc08624d15b6d0538, 0xbe1cef8d5662d968
+        .quad 0xc08624d49c4a4b78, 0xbe1cee97b556ed78
+        .quad 0xc08624d7dbd56750, 0xbe1cf1b14b6acb75
+        .quad 0xc08624db1a0f6b00, 0xbe1cef1e860623f2
+        .quad 0xc08624de56f96758, 0xbe1ceeaf4d156f3d
+        .quad 0xc08624e192946bf0, 0xbe1ceecc12b400ed
+        .quad 0xc08624e4cce18710, 0xbe1cf180c40c794f
+        .quad 0xc08624e805e1c5c8, 0xbe1cf185a08f7f65
+        .quad 0xc08624eb3d9633d8, 0xbe1cef45fc924078
+        .quad 0xc08624ee73ffdbb0, 0xbe1cf1e4f457f32a
+        .quad 0xc08624f1a91fc6a0, 0xbe1cf040147b8a5a
+        .quad 0xc08624f4dcf6fc98, 0xbe1cf1effca0dfb2
+        .quad 0xc08624f80f868468, 0xbe1cf0470146e5bc
+        .quad 0xc08624fb40cf6390, 0xbe1cef4dd186e501
+        .quad 0xc08624fe70d29e60, 0xbe1ceebe257f66c7
+        .quad 0xc08625019f9137f0, 0xbe1ceefb7a1c395c
+        .quad 0xc0862504cd0c3220, 0xbe1cf209dedfed8c
+        .quad 0xc0862507f9448db0, 0xbe1cf082da464994
+        .quad 0xc086250b243b4a18, 0xbe1cee88694a73cf
+        .quad 0xc086250e4df165a0, 0xbe1cf0b61e8f0531
+        .quad 0xc08625117667dd78, 0xbe1cf1106599c962
+        .quad 0xc08625149d9fad98, 0xbe1ceff1ee88af1f
+        .quad 0xc0862517c399d0c8, 0xbe1cf0f746994ef6
+        .quad 0xc086251ae85740b8, 0xbe1cefe8a1d077e4
+        .quad 0xc086251e0bd8f5e0, 0xbe1cf1a1da036092
+        .quad 0xc08625212e1fe7a8, 0xbe1cf0f8a7786fcd
+        .quad 0xc08625244f2d0c48, 0xbe1cefa1174a07a7
+        .quad 0xc08625276f0158d8, 0xbe1cef1043aa5b25
+        .quad 0xc086252a8d9dc150, 0xbe1cf15d521c169d
+        .quad 0xc086252dab033898, 0xbe1cf220bba8861f
+        .quad 0xc0862530c732b078, 0xbe1cef51e310eae2
+        .quad 0xc0862533e22d1988, 0xbe1cf222fcedd8ae
+        .quad 0xc0862536fbf36370, 0xbe1cefdb4da4bda8
+        .quad 0xc086253a14867ca0, 0xbe1ceeafc1112171
+        .quad 0xc086253d2be75280, 0xbe1cee99dfb4b408
+        .quad 0xc08625404216d160, 0xbe1cf22d2536f06b
+        .quad 0xc08625435715e498, 0xbe1cef6abbf2e268
+        .quad 0xc08625466ae57648, 0xbe1cf093a14789f5
+        .quad 0xc08625497d866fa0, 0xbe1cf0f93655603c
+        .quad 0xc086254c8ef9b8b8, 0xbe1cf1cc40c9aafc
+        .quad 0xc086254f9f4038a8, 0xbe1ceeea5f4e9157
+        .quad 0xc0862552ae5ad568, 0xbe1cefa9f52d4997
+        .quad 0xc0862555bc4a7400, 0xbe1cefa490a638ff
+        .quad 0xc0862558c90ff868, 0xbe1cef7fcf797d6f
+        .quad 0xc086255bd4ac4590, 0xbe1cf1b4c51113c9
+        .quad 0xc086255edf203d78, 0xbe1cef55e5b4a55d
+        .quad 0xc0862561e86cc100, 0xbe1cf0d37a25f9dc
+        .quad 0xc0862564f092b028, 0xbe1ceebe9efc19d9
+        .quad 0xc0862567f792e9d8, 0xbe1cee8ad30a57b5
+        .quad 0xc086256afd6e4c08, 0xbe1cef4e1817b90b
+        .quad 0xc086256e0225b3b8, 0xbe1cee7fa9229996
+        .quad 0xc086257105b9fce0, 0xbe1cf0b54963d945
+        .quad 0xc0862574082c0298, 0xbe1cee5f2f3c7995
+        .quad 0xc0862577097c9ee0, 0xbe1cf0828e303a2c
+        .quad 0xc086257a09acaae0, 0xbe1cf172c3078947
+        .quad 0xc086257d08bcfec0, 0xbe1cf189252afa22
+        .quad 0xc086258006ae71b8, 0xbe1cefdb80426923
+        .quad 0xc08625830381da08, 0xbe1ceef1391a0372
+        .quad 0xc0862585ff380d00, 0xbe1cf17720c78d13
+        .quad 0xc0862588f9d1df18, 0xbe1ceef1f9027d83
+        .quad 0xc086258bf35023b8, 0xbe1cf06fac99dec9
+        .quad 0xc086258eebb3ad78, 0xbe1cf1373eeb45c0
+        .quad 0xc0862591e2fd4e00, 0xbe1cef777536bb81
+        .quad 0xc0862594d92dd600, 0xbe1cf0f43ca40766
+        .quad 0xc0862597ce461558, 0xbe1cefb2cfc6766b
+        .quad 0xc086259ac246daf0, 0xbe1ceea49e64ffa2
+        .quad 0xc086259db530f4c8, 0xbe1cf250fa457dec
+        .quad 0xc08625a0a7053018, 0xbe1cf17d8bb2a44e
+        .quad 0xc08625a397c45918, 0xbe1cf1d5906d54b7
+        .quad 0xc08625a6876f3b30, 0xbe1cf08fe7b31780
+        .quad 0xc08625a97606a0e0, 0xbe1cef13edfc9d11
+        .quad 0xc08625ac638b53c8, 0xbe1cef9d2b107219
+        .quad 0xc08625af4ffe1cb0, 0xbe1cf1ddd4ff6160
+        .quad 0xc08625b23b5fc390, 0xbe1cefa02a996495
+        .quad 0xc08625b525b10f68, 0xbe1cf166a7e37ee5
+        .quad 0xc08625b80ef2c680, 0xbe1cef0b171068a5
+        .quad 0xc08625baf725ae28, 0xbe1cf05c80779283
+        .quad 0xc08625bdde4a8af0, 0xbe1cf1bbfbffb889
+        .quad 0xc08625c0c4622090, 0xbe1cf0b8666c0124
+        .quad 0xc08625c3a96d31e0, 0xbe1cf0a8fcf47a86
+        .quad 0xc08625c68d6c80f0, 0xbe1cef46e18cb092
+        .quad 0xc08625c97060cef0, 0xbe1cf1458a350efb
+        .quad 0xc08625cc524adc58, 0xbe1ceeea1dadce12
+        .quad 0xc08625cf332b68b0, 0xbe1cf0a1bfdc44c7
+        .quad 0xc08625d2130332d0, 0xbe1cef96d02da73e
+        .quad 0xc08625d4f1d2f8a8, 0xbe1cf2451c3c7701
+        .quad 0xc08625d7cf9b7778, 0xbe1cf10d08f83812
+        .quad 0xc08625daac5d6ba0, 0xbe1ceec5b4895c5e
+        .quad 0xc08625dd881990b0, 0xbe1cf14e1325c5e4
+        .quad 0xc08625e062d0a188, 0xbe1cf21d0904be12
+        .quad 0xc08625e33c835838, 0xbe1ceed0839bcf21
+        .quad 0xc08625e615326df0, 0xbe1cf1bb944889d2
+        .quad 0xc08625e8ecde9b48, 0xbe1cee738e85eece
+        .quad 0xc08625ebc38897e0, 0xbe1cf25c2bc6ef12
+        .quad 0xc08625ee99311ac8, 0xbe1cf132b70a41ad
+        .quad 0xc08625f16dd8da28, 0xbe1cf1984236a6e3
+        .quad 0xc08625f441808b78, 0xbe1cf19ae74998f9
+        .quad 0xc08625f71428e370, 0xbe1cef3e175d61a1
+        .quad 0xc08625f9e5d295f8, 0xbe1cf101f9868fd9
+        .quad 0xc08625fcb67e5658, 0xbe1cee69db83dcd2
+        .quad 0xc08625ff862cd6f8, 0xbe1cf081b636af51
+        .quad 0xc086260254dec9a8, 0xbe1cee62c7d59b3e
+        .quad 0xc08626052294df58, 0xbe1cf1b745c57716
+        .quad 0xc0862607ef4fc868, 0xbe1cef3d2800ea23
+        .quad 0xc086260abb103458, 0xbe1cef480ff1acd2
+        .quad 0xc086260d85d6d200, 0xbe1cf2424c9a17ef
+        .quad 0xc08626104fa44f90, 0xbe1cf12cfde90fd5
+        .quad 0xc086261318795a68, 0xbe1cf21f590dd5b6
+        .quad 0xc0862615e0569f48, 0xbe1cf0c50f9cd28a
+        .quad 0xc0862618a73cca30, 0xbe1ceedbdb520545
+        .quad 0xc086261b6d2c8668, 0xbe1cf0b030396011
+        .quad 0xc086261e32267e98, 0xbe1cf19917010e96
+        .quad 0xc0862620f62b5cb0, 0xbe1cf07331355985
+        .quad 0xc0862623b93bc9e8, 0xbe1cf01ae921a1c3
+        .quad 0xc08626267b586ed0, 0xbe1cefe5cf0dbf0c
+        .quad 0xc08626293c81f348, 0xbe1cf01b258aeb50
+        .quad 0xc086262bfcb8fe88, 0xbe1cee6b9e7f4c68
+        .quad 0xc086262ebbfe3710, 0xbe1cee684a9b21c9
+        .quad 0xc08626317a5242b8, 0xbe1cf1f8bcde9a8b
+        .quad 0xc086263437b5c6c0, 0xbe1cf1d063d36238
+        .quad 0xc0862636f42967a8, 0xbe1cf1e31a19075e
+        .quad 0xc0862639afadc950, 0xbe1cf1d8efdf7e7d
+        .quad 0xc086263c6a438ef0, 0xbe1cf1812ee72dba
+        .quad 0xc086263f23eb5b18, 0xbe1cf1449a9a2279
+        .quad 0xc0862641dca5cfb8, 0xbe1cee96edce5085
+        .quad 0xc086264494738e08, 0xbe1cf06797bd03b2
+        .quad 0xc08626474b5536b8, 0xbe1cef91b9b7ffc1
+        .quad 0xc086264a014b69c0, 0xbe1cef4b6721278f
+        .quad 0xc086264cb656c678, 0xbe1cf1942925eb4a
+        .quad 0xc086264f6a77eba8, 0xbe1cefa2c7bc2e39
+        .quad 0xc08626521daf7758, 0xbe1cf252595aceb3
+        .quad 0xc0862654cffe0718, 0xbe1cee8e9ae47ec2
+        .quad 0xc0862657816437a8, 0xbe1cf1bf913828fa
+        .quad 0xc086265a31e2a558, 0xbe1cf23475d6b366
+        .quad 0xc086265ce179ebc8, 0xbe1cef8df00a922b
+        .quad 0xc086265f902aa5f0, 0xbe1cef279bfa43e0
+        .quad 0xc08626623df56e38, 0xbe1cf080e10b8365
+        .quad 0xc0862664eadade70, 0xbe1cf1a518f9b544
+        .quad 0xc086266796db8fd0, 0xbe1cef9308fed9e9
+        .quad 0xc086266a41f81ae8, 0xbe1ceea3ae6b19c9
+        .quad 0xc086266cec3117b8, 0xbe1ceef06003d4c2
+        .quad 0xc086266f95871da8, 0xbe1cf0b8457ffb0c
+        .quad 0xc08626723dfac390, 0xbe1cf0c526745ad6
+        .quad 0xc0862674e58c9fa8, 0xbe1cf0cf91ff7b5d
+        .quad 0xc08626778c3d4798, 0xbe1cefe260819380
+        .quad 0xc086267a320d5070, 0xbe1ceebd90aa27a3
+        .quad 0xc086267cd6fd4ea8, 0xbe1cf0388121dffa
+        .quad 0xc086267f7b0dd630, 0xbe1cf1a3881435f1
+        .quad 0xc08626821e3f7a68, 0xbe1cef28e9d9ac52
+        .quad 0xc0862684c092ce08, 0xbe1cf02d300062dd
+        .quad 0xc086268762086350, 0xbe1cefaee1edfa35
+        .quad 0xc086268a02a0cbe0, 0xbe1cf0a5a052e936
+        .quad 0xc086268ca25c98d8, 0xbe1cee60a4a497ed
+        .quad 0xc086268f413c5ab0, 0xbe1cf0e4a5d0cf49
+        .quad 0xc0862691df40a170, 0xbe1cf149235a4e6e
+        .quad 0xc08626947c69fc80, 0xbe1cf215180b9fcc
+        .quad 0xc086269718b8fac8, 0xbe1cef9b156a9840
+        .quad 0xc0862699b42e2a90, 0xbe1cf054c91441be
+        .quad 0xc086269c4eca19a8, 0xbe1cf13ded26512c
+        .quad 0xc086269ee88d5550, 0xbe1cf22ea4d8ac06
+        .quad 0xc08626a181786a40, 0xbe1cf2354666ee2e
+        .quad 0xc08626a4198be4a8, 0xbe1cefef936752b3
+        .quad 0xc08626a6b0c85020, 0xbe1cf1e360a9db68
+        .quad 0xc08626a9472e37d8, 0xbe1ceed6aeb812c5
+        .quad 0xc08626abdcbe2650, 0xbe1cf227340b4986
+        .quad 0xc08626ae7178a5b0, 0xbe1cf0215a0cbe0d
+        .quad 0xc08626b1055e3f70, 0xbe1cf256adf0ae26
+        .quad 0xc08626b3986f7ca8, 0xbe1ceff3c67aed06
+        .quad 0xc08626b62aace5c8, 0xbe1cf2159fb93652
+        .quad 0xc08626b8bc1702e0, 0xbe1cf01e6dbd1c7f
+        .quad 0xc08626bb4cae5b60, 0xbe1cf009e75d1c0c
+        .quad 0xc08626bddc737648, 0xbe1ceec10a020e73
+        .quad 0xc08626c06b66da08, 0xbe1cf06d5783eee7
+        .quad 0xc08626c2f9890ca0, 0xbe1cf0cb8f169ffe
+        .quad 0xc08626c586da9388, 0xbe1cef7de2452430
+        .quad 0xc08626c8135bf3b0, 0xbe1cf05da6f783ae
+        .quad 0xc08626ca9f0db198, 0xbe1cefcc877d681d
+        .quad 0xc08626cd29f05138, 0xbe1cef0531954ab3
+        .quad 0xc08626cfb4045608, 0xbe1cf06b8565ea3d
+        .quad 0xc08626d23d4a4310, 0xbe1cefdc455d9d7e
+        .quad 0xc08626d4c5c29ad0, 0xbe1ceefc47e8fa64
+        .quad 0xc08626d74d6ddf48, 0xbe1cf1872bf033f2
+        .quad 0xc08626d9d44c9210, 0xbe1cf19d91087f9d
+        .quad 0xc08626dc5a5f3438, 0xbe1cf012d444c6ab
+        .quad 0xc08626dedfa64650, 0xbe1cf0ba528ee153
+        .quad 0xc08626e164224880, 0xbe1ceeb431709788
+        .quad 0xc08626e3e7d3ba60, 0xbe1cf0b9af31a6a5
+        .quad 0xc08626e66abb1b28, 0xbe1cf168fb2e135b
+        .quad 0xc08626e8ecd8e990, 0xbe1cef9097461c93
+        .quad 0xc08626eb6e2da3d0, 0xbe1cee7a434735d8
+        .quad 0xc08626edeeb9c7a8, 0xbe1cf235732b86f2
+        .quad 0xc08626f06e7dd280, 0xbe1cefe1510b89e6
+        .quad 0xc08626f2ed7a4120, 0xbe1cf1f64b9b80ef
+        .quad 0xc08626f56baf9000, 0xbe1cf08f320ca339
+        .quad 0xc08626f7e91e3b08, 0xbe1cf1b1de2808a1
+        .quad 0xc08626fa65c6bdc0, 0xbe1cf1976d778b28
+        .quad 0xc08626fce1a99338, 0xbe1ceef40a4f076f
+        .quad 0xc08626ff5cc73600, 0xbe1cef3e45869ce3
+        .quad 0xc0862701d7202048, 0xbe1ceef601b4c9d6
+        .quad 0xc086270450b4cbc0, 0xbe1cf1eaf0b57fd6
+        .quad 0xc0862706c985b1c0, 0xbe1cef82a44990f3
+        .quad 0xc086270941934b10, 0xbe1ceefe32981f2c
+        .quad 0xc086270bb8de1018, 0xbe1cefbf6f5a0445
+        .quad 0xc086270e2f6678d0, 0xbe1cf18dba75792c
+        .quad 0xc0862710a52cfcc8, 0xbe1cf0da64ce995f
+        .quad 0xc08627131a321318, 0xbe1cef04ac0fb802
+        .quad 0xc08627158e763268, 0xbe1cee9d4e2ad9bd
+        .quad 0xc086271801f9d0f8, 0xbe1cefa9b55407b5
+        .quad 0xc086271a74bd64a0, 0xbe1cefe6bd329570
+        .quad 0xc086271ce6c162c8, 0xbe1cef0b1205dc85
+        .quad 0xc086271f58064068, 0xbe1cef092a785e3f
+        .quad 0xc0862721c88c7210, 0xbe1cf050dcdaac30
+        .quad 0xc086272438546be8, 0xbe1cf210907ded8b
+        .quad 0xc0862726a75ea1b8, 0xbe1cee760be44f99
+        .quad 0xc086272915ab86c0, 0xbe1ceeeee07c2bcc
+        .quad 0xc086272b833b8df0, 0xbe1cf06874992df5
+        .quad 0xc086272df00f29d0, 0xbe1cef8fac5d4899
+        .quad 0xc08627305c26cc70, 0xbe1cf1103241cc99
+        .quad 0xc0862732c782e788, 0xbe1cf1d35fef83fe
+        .quad 0xc08627353223ec68, 0xbe1cef3ec8133e1d
+        .quad 0xc08627379c0a4be8, 0xbe1cef7261daccd8
+        .quad 0xc086273a05367688, 0xbe1cf18656c50806
+        .quad 0xc086273c6da8dc68, 0xbe1cf1c8736e049a
+        .quad 0xc086273ed561ed38, 0xbe1cf1f93bff4911
+        .quad 0xc08627413c621848, 0xbe1cf188a4ea680c
+        .quad 0xc0862743a2a9cc80, 0xbe1cf1d270930c80
+        .quad 0xc086274608397868, 0xbe1cf25a328c28e2
+        .quad 0xc08627486d118a28, 0xbe1cf106f90aa3b8
+        .quad 0xc086274ad1326f80, 0xbe1cee5e9d2e885a
+        .quad 0xc086274d349c95c0, 0xbe1cf1c0bac27228
+        .quad 0xc086274f975069f8, 0xbe1cf1a1500f9b1c
+        .quad 0xc0862751f94e58c0, 0xbe1cefc30663ac44
+        .quad 0xc08627545a96ce48, 0xbe1cf17123e427a2
+        .quad 0xc0862756bb2a3678, 0xbe1cefb92749fea4
+        .quad 0xc08627591b08fcc0, 0xbe1cefa40e1ea74a
+        .quad 0xc086275b7a338c40, 0xbe1cee6f4612c3e9
+        .quad 0xc086275dd8aa4fa8, 0xbe1cf1c54a053627
+        .quad 0xc0862760366db168, 0xbe1ceff5eb503d9e
+        .quad 0xc0862762937e1b70, 0xbe1cf02e47f10cee
+        .quad 0xc0862764efdbf768, 0xbe1ceeb06e1d0dad
+        .quad 0xc08627674b87ae88, 0xbe1cf10aadd6dba5
+        .quad 0xc0862769a681a9c0, 0xbe1cf24e9913d30f
+        .quad 0xc086276c00ca51a0, 0xbe1cef47b301e312
+        .quad 0xc086276e5a620e48, 0xbe1ceeb1cefc2e85
+        .quad 0xc0862770b3494788, 0xbe1cf16f1fbbe011
+        .quad 0xc08627730b8064e8, 0xbe1ceebdf75174c7
+        .quad 0xc08627756307cd70, 0xbe1cf06e3871a0da
+        .quad 0xc0862777b9dfe7f0, 0xbe1cef16799fd554
+        .quad 0xc086277a10091ac0, 0xbe1cf248dabf5377
+        .quad 0xc086277c6583cc00, 0xbe1cf0c78d92a2cd
+        .quad 0xc086277eba506158, 0xbe1cf0b911b029f0
+        .quad 0xc08627810e6f4028, 0xbe1cefdc24719766
+        .quad 0xc086278361e0cd70, 0xbe1cefbb6562b7e7
+        .quad 0xc0862785b4a56dd8, 0xbe1cf1e0afb349ec
+        .quad 0xc086278806bd85c0, 0xbe1cf008292e52fc
+        .quad 0xc086278a58297918, 0xbe1cf053073872bf
+        .quad 0xc086278ca8e9ab88, 0xbe1cf17a0a55a947
+        .quad 0xc086278ef8fe8068, 0xbe1ceeffb0b60234
+        .quad 0xc086279148685aa0, 0xbe1cf162204794a8
+        .quad 0xc086279397279ce0, 0xbe1cf24cc8cb48ac
+        .quad 0xc0862795e53ca978, 0xbe1cf0c9be68d5c3
+        .quad 0xc086279832a7e258, 0xbe1cf172cd3d7388
+        .quad 0xc086279a7f69a930, 0xbe1ceea2465fbce5
+        .quad 0xc086279ccb825f40, 0xbe1cf0a386d2500f
+        .quad 0xc086279f16f26590, 0xbe1cf1e338ddc18a
+        .quad 0xc08627a161ba1cd0, 0xbe1cef1f5049867f
+        .quad 0xc08627a3abd9e548, 0xbe1cef96c1ea8b1f
+        .quad 0xc08627a5f5521f00, 0xbe1cf138f6fd3c26
+        .quad 0xc08627a83e2329b0, 0xbe1cf0d4fcbfdf3a
+        .quad 0xc08627aa864d64b0, 0xbe1cf24870c12c81
+        .quad 0xc08627accdd12f18, 0xbe1cf0ae2a56348d
+        .quad 0xc08627af14aee7a0, 0xbe1cee8ca1a9b893
+        .quad 0xc08627b15ae6eca8, 0xbe1cf20414d637b0
+        .quad 0xc08627b3a0799c60, 0xbe1cf0fc6b7b12d8
+        .quad 0xc08627b5e5675488, 0xbe1cf152d93c4a00
+        .quad 0xc08627b829b072a0, 0xbe1cf1073f9b77c2
+        .quad 0xc08627ba6d5553d8, 0xbe1cee694f97d5a4
+        .quad 0xc08627bcb0565500, 0xbe1cf0456b8239d7
+        .quad 0xc08627bef2b3d2b0, 0xbe1cf211497127e3
+        .quad 0xc08627c1346e2930, 0xbe1cf01856c0384d
+        .quad 0xc08627c37585b468, 0xbe1cefa7dd05479e
+        .quad 0xc08627c5b5fad000, 0xbe1cef3ae8e50b93
+        .quad 0xc08627c7f5cdd750, 0xbe1ceea5f32fdd3a
+        .quad 0xc08627ca34ff2560, 0xbe1cef424caeb8d9
+        .quad 0xc08627cc738f14f0, 0xbe1cf0194d07a81f
+        .quad 0xc08627ceb17e0070, 0xbe1cf20f452000c1
+        .quad 0xc08627d0eecc4210, 0xbe1cf00e356218e4
+        .quad 0xc08627d32b7a33a0, 0xbe1cef30484b4bcb
+        .quad 0xc08627d567882eb0, 0xbe1ceeea11a6641b
+        .quad 0xc08627d7a2f68c80, 0xbe1cf13492d5bd7b
+        .quad 0xc08627d9ddc5a618, 0xbe1ceeb7048fad96
+        .quad 0xc08627dc17f5d418, 0xbe1ceef0666f0477
+        .quad 0xc08627de51876ee8, 0xbe1cf060d4b8b5c2
+        .quad 0xc08627e08a7acea8, 0xbe1cf0b2a4b6ff8c
+        .quad 0xc08627e2c2d04b28, 0xbe1cf0e34809a875
+        .quad 0xc08627e4fa883bf0, 0xbe1cf16bf74a3522
+        .quad 0xc08627e731a2f848, 0xbe1cee6a24623d57
+        .quad 0xc08627e96820d718, 0xbe1cefc7b4f1528e
+        .quad 0xc08627eb9e022f18, 0xbe1cf163051f3548
+        .quad 0xc08627edd34756b8, 0xbe1cef36b3366305
+        .quad 0xc08627f007f0a408, 0xbe1cf18134625550
+        .quad 0xc08627f23bfe6cf0, 0xbe1cf0ec32ec1a11
+        .quad 0xc08627f46f710700, 0xbe1ceeb3b64f3edc
+        .quad 0xc08627f6a248c778, 0xbe1cf0cd15805bc8
+        .quad 0xc08627f8d4860368, 0xbe1cf20db3bddebe
+        .quad 0xc08627fb06290f90, 0xbe1cf25188430e25
+        .quad 0xc08627fd37324070, 0xbe1ceea1713490f9
+        .quad 0xc08627ff67a1ea28, 0xbe1cf159521d234c
+        .quad 0xc0862801977860b8, 0xbe1cf24dfe50783b
+        .quad 0xc0862803c6b5f7d0, 0xbe1ceef2ef89a60b
+        .quad 0xc0862805f55b02c8, 0xbe1cee7fc919d62c
+        .quad 0xc08628082367d4c0, 0xbe1cf215a7fb513a
+        .quad 0xc086280a50dcc0a8, 0xbe1cf0e4401c5ed4
+        .quad 0xc086280c7dba1910, 0xbe1cf04ec734d256
+        .quad 0xc086280eaa003050, 0xbe1cf010ad787fea
+        .quad 0xc0862810d5af5880, 0xbe1cee622478393d
+        .quad 0xc086281300c7e368, 0xbe1cf01c7482564f
+        .quad 0xc08628152b4a22a0, 0xbe1cf0de20d33536
+        .quad 0xc086281755366778, 0xbe1cef2edae5837d
+        .quad 0xc08628197e8d02f0, 0xbe1cf0a345318cc9
+        .quad 0xc086281ba74e45d8, 0xbe1cf20085aa34b8
+        .quad 0xc086281dcf7a80c0, 0xbe1cef5fa845ad83
+        .quad 0xc086281ff71203e0, 0xbe1cf050d1df69c4
+        .quad 0xc08628221e151f48, 0xbe1ceffe43c035b9
+        .quad 0xc0862824448422b8, 0xbe1cf14f3018d3c2
+        .quad 0xc08628266a5f5dc0, 0xbe1cef0a5fbae83d
+        .quad 0xc08628288fa71f98, 0xbe1ceff8a95b72a1
+        .quad 0xc086282ab45bb750, 0xbe1cef073aa9849b
+        .quad 0xc086282cd87d73a8, 0xbe1cef69b3835c02
+        .quad 0xc086282efc0ca328, 0xbe1cf0bc139379a9
+        .quad 0xc08628311f099420, 0xbe1cef247a9ec596
+        .quad 0xc086283341749490, 0xbe1cef74bbcc488a
+        .quad 0xc0862835634df248, 0xbe1cef4bc42e7b8e
+        .quad 0xc08628378495fad0, 0xbe1cf136d4d5a810
+        .quad 0xc0862839a54cfb80, 0xbe1cf0d290b24dd8
+        .quad 0xc086283bc5734168, 0xbe1ceeebde8e0065
+        .quad 0xc086283de5091950, 0xbe1cf1a09f60aa1e
+        .quad 0xc0862840040ecfe0, 0xbe1cf0803947a234
+        .quad 0xc08628422284b168, 0xbe1cf0abf7638127
+        .quad 0xc0862844406b0a08, 0xbe1cf0f73ee12058
+        .quad 0xc08628465dc225a0, 0xbe1cf2079971b26c
+        .quad 0xc08628487a8a4fe0, 0xbe1cee74957564b1
+        .quad 0xc086284a96c3d420, 0xbe1ceee77c1b7d43
+        .quad 0xc086284cb26efd90, 0xbe1cf23addba6e09
+        .quad 0xc086284ecd8c1730, 0xbe1cf199f4a1da60
+        .quad 0xc0862850e81b6bb0, 0xbe1cf09fdea81393
+        .quad 0xc0862853021d4588, 0xbe1cf176adb417f7
+        .quad 0xc08628551b91ef00, 0xbe1cf0f64f84a8da
+        .quad 0xc08628573479b220, 0xbe1ceec34cf49523
+        .quad 0xc08628594cd4d8a8, 0xbe1cf16d60fbe0bb
+        .quad 0xc086285b64a3ac40, 0xbe1cee8de7acfc7b
+        .quad 0xc086285d7be67630, 0xbe1ceee6256cce8d
+        .quad 0xc086285f929d7fa0, 0xbe1cee7d66a3d8a5
+        .quad 0xc0862861a8c91170, 0xbe1cf0bef8265792
+        .quad 0xc0862863be697458, 0xbe1cf097f890c6f8
+        .quad 0xc0862865d37ef0c8, 0xbe1cf09502d5c3fc
+        .quad 0xc0862867e809cf00, 0xbe1ceeffb239dac7
+        .quad 0xc0862869fc0a56f8, 0xbe1cf1fbfff95c98
+        .quad 0xc086286c0f80d090, 0xbe1cefa57ad3eef7
+        .quad 0xc086286e226d8348, 0xbe1cf22c58b9183d
+        .quad 0xc086287034d0b690, 0xbe1ceff262d0a248
+        .quad 0xc086287246aab180, 0xbe1cefa7bc194186
+        .quad 0xc086287457fbbb08, 0xbe1cf06782d784d9
+        .quad 0xc086287668c419e0, 0xbe1cf1d44d0eaa07
+        .quad 0xc086287879041490, 0xbe1cf034803c8a48
+        .quad 0xc086287a88bbf158, 0xbe1cf08e84916b6f
+        .quad 0xc086287c97ebf650, 0xbe1cf0c4d3dc1bc7
+        .quad 0xc086287ea6946958, 0xbe1cefb1e4625943
+        .quad 0xc0862880b4b59010, 0xbe1cf143efdd1fd0
+        .quad 0xc0862882c24faff8, 0xbe1cee9896d016da
+        .quad 0xc0862884cf630e38, 0xbe1cf2186072f2cc
+        .quad 0xc0862886dbefeff0, 0xbe1cef9217633d34
+        .quad 0xc0862888e7f699e0, 0xbe1cf05603549486
+        .quad 0xc086288af37750b0, 0xbe1cef50fff513d3
+        .quad 0xc086288cfe7258c0, 0xbe1cf127713b32d0
+        .quad 0xc086288f08e7f650, 0xbe1cf05015520f3d
+        .quad 0xc086289112d86d58, 0xbe1cf12eb458b26f
+        .quad 0xc08628931c4401a8, 0xbe1cf22eae2887ed
+        .quad 0xc0862895252af6e0, 0xbe1cefdd6656dd2d
+        .quad 0xc08628972d8d9058, 0xbe1cf1048ea4e646
+        .quad 0xc0862899356c1150, 0xbe1ceec4501167e9
+        .quad 0xc086289b3cc6bcb8, 0xbe1cf0ad52becc3f
+        .quad 0xc086289d439dd568, 0xbe1cf0daa4e00e35
+        .quad 0xc086289f49f19df8, 0xbe1cf00b80de8d6a
+        .quad 0xc08628a14fc258c8, 0xbe1cf1bcf2ea8464
+        .quad 0xc08628a355104818, 0xbe1cf0435e2782b0
+        .quad 0xc08628a559dbade0, 0xbe1cf0e3e1a5f56c
+        .quad 0xc08628a75e24cbf8, 0xbe1cefed9d5a721d
+        .quad 0xc08628a961ebe3f8, 0xbe1cf0d2d74321e2
+        .quad 0xc08628ab65313750, 0xbe1cf24200eb55e9
+        .quad 0xc08628ad67f50740, 0xbe1cf23e9d7cf979
+        .quad 0xc08628af6a3794d0, 0xbe1cf23a088f421c
+        .quad 0xc08628b16bf920e0, 0xbe1cef2c1de1ab32
+        .quad 0xc08628b36d39ec08, 0xbe1cf1abc231f7b2
+        .quad 0xc08628b56dfa36d0, 0xbe1cf2074d5ba303
+        .quad 0xc08628b76e3a4180, 0xbe1cf05cd5eed880
+        /*== Log_LA_table ==*/
+        .align 32
+        .quad 0x8000000000000000
+        .quad 0xbf5ff802a9ab10e6
+        .quad 0xbf6ff00aa2b10bc0
+        .quad 0xbf77ee11ebd82e94
+        .quad 0xbf7fe02a6b106789
+        .quad 0xbf83e7295d25a7d9
+        .quad 0xbf87dc475f810a77
+        .quad 0xbf8bcf712c74384c
+        .quad 0xbf8fc0a8b0fc03e4
+        .quad 0xbf91d7f7eb9eebe7
+        .quad 0xbf93cea44346a575
+        .quad 0xbf95c45a51b8d389
+        .quad 0xbf97b91b07d5b11b
+        .quad 0xbf99ace7551cc514
+        .quad 0xbf9b9fc027af9198
+        .quad 0xbf9d91a66c543cc4
+        .quad 0xbf9f829b0e783300
+        .quad 0xbfa0b94f7c196176
+        .quad 0xbfa1b0d98923d980
+        .quad 0xbfa2a7ec2214e873
+        .quad 0xbfa39e87b9febd60
+        .quad 0xbfa494acc34d911c
+        .quad 0xbfa58a5bafc8e4d5
+        .quad 0xbfa67f94f094bd98
+        .quad 0xbfa77458f632dcfc
+        .quad 0xbfa868a83083f6cf
+        .quad 0xbfa95c830ec8e3eb
+        .quad 0xbfaa4fe9ffa3d235
+        .quad 0xbfab42dd711971bf
+        .quad 0xbfac355dd0921f2d
+        .quad 0xbfad276b8adb0b52
+        .quad 0xbfae19070c276016
+        .quad 0xbfaf0a30c01162a6
+        .quad 0xbfaffae9119b9303
+        .quad 0xbfb075983598e471
+        .quad 0xbfb0ed839b5526fe
+        .quad 0xbfb16536eea37ae1
+        .quad 0xbfb1dcb263db1944
+        .quad 0xbfb253f62f0a1417
+        .quad 0xbfb2cb0283f5de1f
+        .quad 0xbfb341d7961bd1d1
+        .quad 0xbfb3b87598b1b6ee
+        .quad 0xbfb42edcbea646f0
+        .quad 0xbfb4a50d3aa1b040
+        .quad 0xbfb51b073f06183f
+        .quad 0xbfb590cafdf01c28
+        .quad 0xbfb60658a93750c4
+        .quad 0xbfb67bb0726ec0fc
+        .quad 0xbfb6f0d28ae56b4c
+        .quad 0xbfb765bf23a6be13
+        .quad 0xbfb7da766d7b12cd
+        .quad 0xbfb84ef898e8282a
+        .quad 0xbfb8c345d6319b21
+        .quad 0xbfb9375e55595ede
+        .quad 0xbfb9ab42462033ad
+        .quad 0xbfba1ef1d8061cd4
+        .quad 0xbfba926d3a4ad563
+        .quad 0xbfbb05b49bee43fe
+        .quad 0xbfbb78c82bb0eda1
+        .quad 0xbfbbeba818146765
+        .quad 0xbfbc5e548f5bc743
+        .quad 0xbfbcd0cdbf8c13e1
+        .quad 0xbfbd4313d66cb35d
+        .quad 0xbfbdb5270187d927
+        .quad 0xbfbe27076e2af2e6
+        .quad 0xbfbe98b549671467
+        .quad 0xbfbf0a30c01162a6
+        .quad 0xbfbf7b79fec37ddf
+        .quad 0xbfbfec9131dbeabb
+        .quad 0xbfc02ebb42bf3d4b
+        .quad 0xbfc0671512ca596e
+        .quad 0xbfc09f561ee719c3
+        .quad 0xbfc0d77e7cd08e59
+        .quad 0xbfc10f8e422539b1
+        .quad 0xbfc14785846742ac
+        .quad 0xbfc17f6458fca611
+        .quad 0xbfc1b72ad52f67a0
+        .quad 0xbfc1eed90e2dc2c3
+        .quad 0xbfc2266f190a5acb
+        .quad 0xbfc25ded0abc6ad2
+        .quad 0xbfc29552f81ff523
+        .quad 0xbfc2cca0f5f5f251
+        .quad 0xbfc303d718e47fd3
+        .quad 0xbfc33af575770e4f
+        .quad 0xbfc371fc201e8f74
+        .quad 0xbfc3a8eb2d31a376
+        .quad 0xbfc3dfc2b0ecc62a
+        .quad 0xbfc41682bf727bc0
+        .quad 0xbfc44d2b6ccb7d1e
+        .quad 0xbfc483bccce6e3dd
+        .quad 0xbfc4ba36f39a55e5
+        .quad 0xbfc4f099f4a230b2
+        .quad 0xbfc526e5e3a1b438
+        .quad 0xbfc55d1ad4232d6f
+        .quad 0xbfc59338d9982086
+        .quad 0xbfc5c940075972b9
+        .quad 0xbfc5ff3070a793d4
+        .quad 0xbfc6350a28aaa758
+        .quad 0xbfc66acd4272ad51
+        .quad 0xbfc6a079d0f7aad2
+        .quad 0xbfc6d60fe719d21d
+        .quad 0xbfc70b8f97a1aa75
+        .quad 0xbfc740f8f54037a5
+        .quad 0xbfc7764c128f2127
+        .quad 0xbfc7ab890210d909
+        .quad 0xbfc7e0afd630c274
+        .quad 0xbfc815c0a14357eb
+        .quad 0xbfc84abb75865139
+        .quad 0xbfc87fa06520c911
+        .quad 0xbfc8b46f8223625b
+        .quad 0xbfc8e928de886d41
+        .quad 0xbfc91dcc8c340bde
+        .quad 0xbfc9525a9cf456b4
+        .quad 0xbfc986d3228180ca
+        .quad 0xbfc9bb362e7dfb83
+        .quad 0xbfc9ef83d2769a34
+        .quad 0xbfca23bc1fe2b563
+        .quad 0xbfca57df28244dcd
+        .quad 0xbfca8becfc882f19
+        .quad 0xbfcabfe5ae46124c
+        .quad 0xbfcaf3c94e80bff3
+        .quad 0xbfcb2797ee46320c
+        .quad 0xbfcb5b519e8fb5a4
+        .quad 0xbfcb8ef670420c3b
+        .quad 0xbfcbc286742d8cd6
+        .quad 0xbfcbf601bb0e44e2
+        .quad 0xbfcc2968558c18c1
+        .quad 0xbfcc5cba543ae425
+        .quad 0xbfcc8ff7c79a9a22
+        .quad 0xbfccc320c0176502
+        .quad 0xbfccf6354e09c5dc
+        .quad 0xbfcd293581b6b3e7
+        .quad 0xbfcd5c216b4fbb91
+        .quad 0xbfcd8ef91af31d5e
+        .quad 0xbfcdc1bca0abec7d
+        .quad 0xbfcdf46c0c722d2f
+        .quad 0xbfce27076e2af2e6
+        .quad 0xbfce598ed5a87e2f
+        .quad 0xbfce8c0252aa5a60
+        .quad 0xbfcebe61f4dd7b0b
+        .quad 0xbfcef0adcbdc5936
+        .quad 0xbfcf22e5e72f105d
+        .quad 0xbfcf550a564b7b37
+        .quad 0xbfcf871b28955045
+        .quad 0xbfcfb9186d5e3e2b
+        .quad 0xbfcfeb0233e607cc
+        .quad 0xbfd00e6c45ad501d
+        .quad 0xbfd0274dc16c232f
+        .quad 0xbfd0402594b4d041
+        .quad 0xbfd058f3c703ebc6
+        .quad 0xbfd071b85fcd590d
+        .quad 0xbfd08a73667c57af
+        .quad 0xbfd0a324e27390e3
+        .quad 0xbfd0bbccdb0d24bd
+        .quad 0xbfd0d46b579ab74b
+        .quad 0xbfd0ed005f657da4
+        .quad 0xbfd1058bf9ae4ad5
+        .quad 0xbfd11e0e2dad9cb7
+        .quad 0xbfd136870293a8b0
+        .quad 0xbfd14ef67f88685a
+        .quad 0xbfd1675cababa60e
+        .quad 0xbfd17fb98e15095d
+        .quad 0xbfd1980d2dd4236f
+        .quad 0xbfd1b05791f07b49
+        .quad 0xbfd1c898c16999fb
+        .quad 0xbfd1e0d0c33716be
+        .quad 0xbfd1f8ff9e48a2f3
+        .quad 0xbfd211255986160c
+        .quad 0xbfd22941fbcf7966
+        .quad 0xbfd241558bfd1404
+        .quad 0xbfd2596010df763a
+        .quad 0xbfd27161913f853d
+        .quad 0xbfd2895a13de86a3
+        .quad 0xbfd2a1499f762bc9
+        .quad 0xbfd2b9303ab89d25
+        .quad 0xbfd2d10dec508583
+        .quad 0xbfd2e8e2bae11d31
+        .quad 0xbfd300aead06350c
+        .quad 0xbfd31871c9544185
+        .quad 0xbfd3302c16586588
+        .quad 0xbfd347dd9a987d55
+        .quad 0xbfd35f865c93293e
+        .quad 0xbfd3772662bfd85b
+        .quad 0xbfd38ebdb38ed321
+        .quad 0xbfd3a64c556945ea
+        .quad 0xbfd3bdd24eb14b6a
+        .quad 0xbfd3d54fa5c1f710
+        .quad 0xbfd3ecc460ef5f50
+        .quad 0xbfd404308686a7e4
+        .quad 0xbfd41b941cce0bee
+        .quad 0xbfd432ef2a04e814
+        .quad 0xbfd44a41b463c47c
+        .quad 0xbfd4618bc21c5ec2
+        .quad 0xbfd478cd5959b3d9
+        .quad 0xbfd49006804009d1
+        .quad 0xbfd4a7373cecf997
+        .quad 0xbfd4be5f957778a1
+        .quad 0xbfd4d57f8fefe27f
+        .quad 0xbfd4ec973260026a
+        .quad 0xbfd503a682cb1cb3
+        .quad 0xbfd51aad872df82d
+        .quad 0xbfd531ac457ee77e
+        .quad 0xbfd548a2c3add263
+        .quad 0xbfd55f9107a43ee2
+        .quad 0xbfd5767717455a6c
+        .quad 0xbfd58d54f86e02f2
+        .quad 0xbfd5a42ab0f4cfe2
+        .quad 0xbfd5baf846aa1b19
+        .quad 0xbfd5d1bdbf5809ca
+        .quad 0xbfd5e87b20c2954a
+        .quad 0xbfd5ff3070a793d4
+        .quad 0xbfd615ddb4bec13c
+        .quad 0xbfd62c82f2b9c795
+        .quad 0x3fd61965cdb02c1f
+        .quad 0x3fd602d08af091ec
+        .quad 0x3fd5ec433d5c35ae
+        .quad 0x3fd5d5bddf595f30
+        .quad 0x3fd5bf406b543db2
+        .quad 0x3fd5a8cadbbedfa1
+        .quad 0x3fd5925d2b112a59
+        .quad 0x3fd57bf753c8d1fb
+        .quad 0x3fd565995069514c
+        .quad 0x3fd54f431b7be1a9
+        .quad 0x3fd538f4af8f72fe
+        .quad 0x3fd522ae0738a3d8
+        .quad 0x3fd50c6f1d11b97c
+        .quad 0x3fd4f637ebba9810
+        .quad 0x3fd4e0086dd8baca
+        .quad 0x3fd4c9e09e172c3c
+        .quad 0x3fd4b3c077267e9a
+        .quad 0x3fd49da7f3bcc41f
+        .quad 0x3fd487970e958770
+        .quad 0x3fd4718dc271c41b
+        .quad 0x3fd45b8c0a17df13
+        .quad 0x3fd44591e0539f49
+        .quad 0x3fd42f9f3ff62642
+        .quad 0x3fd419b423d5e8c7
+        .quad 0x3fd403d086cea79c
+        .quad 0x3fd3edf463c1683e
+        .quad 0x3fd3d81fb5946dba
+        .quad 0x3fd3c25277333184
+        .quad 0x3fd3ac8ca38e5c5f
+        .quad 0x3fd396ce359bbf54
+        .quad 0x3fd3811728564cb2
+        .quad 0x3fd36b6776be1117
+        .quad 0x3fd355bf1bd82c8b
+        .quad 0x3fd3401e12aecba1
+        .quad 0x3fd32a84565120a8
+        .quad 0x3fd314f1e1d35ce4
+        .quad 0x3fd2ff66b04ea9d4
+        .quad 0x3fd2e9e2bce12286
+        .quad 0x3fd2d46602adccee
+        .quad 0x3fd2bef07cdc9354
+        .quad 0x3fd2a982269a3dbf
+        .quad 0x3fd2941afb186b7c
+        .quad 0x3fd27ebaf58d8c9d
+        .quad 0x3fd269621134db92
+        .quad 0x3fd25410494e56c7
+        .quad 0x3fd23ec5991eba49
+        .quad 0x3fd22981fbef797b
+        .quad 0x3fd214456d0eb8d4
+        .quad 0x3fd1ff0fe7cf47a7
+        .quad 0x3fd1e9e1678899f4
+        .quad 0x3fd1d4b9e796c245
+        .quad 0x3fd1bf99635a6b95
+        .quad 0x3fd1aa7fd638d33f
+        .quad 0x3fd1956d3b9bc2fa
+        .quad 0x3fd180618ef18adf
+        .quad 0x3fd16b5ccbacfb73
+        .quad 0x3fd1565eed455fc3
+        .quad 0x3fd14167ef367783
+        .quad 0x3fd12c77cd00713b
+        .quad 0x3fd1178e8227e47c
+        .quad 0x3fd102ac0a35cc1c
+        .quad 0x3fd0edd060b78081
+        .quad 0x3fd0d8fb813eb1ef
+        .quad 0x3fd0c42d676162e3
+        .quad 0x3fd0af660eb9e279
+        .quad 0x3fd09aa572e6c6d4
+        .quad 0x3fd085eb8f8ae797
+        .quad 0x3fd07138604d5862
+        .quad 0x3fd05c8be0d9635a
+        .quad 0x3fd047e60cde83b8
+        .quad 0x3fd03346e0106062
+        .quad 0x3fd01eae5626c691
+        .quad 0x3fd00a1c6adda473
+        .quad 0x3fcfeb2233ea07cd
+        .quad 0x3fcfc218be620a5e
+        .quad 0x3fcf991c6cb3b379
+        .quad 0x3fcf702d36777df0
+        .quad 0x3fcf474b134df229
+        .quad 0x3fcf1e75fadf9bde
+        .quad 0x3fcef5ade4dcffe6
+        .quad 0x3fceccf2c8fe920a
+        .quad 0x3fcea4449f04aaf5
+        .quad 0x3fce7ba35eb77e2a
+        .quad 0x3fce530effe71012
+        .quad 0x3fce2a877a6b2c12
+        .quad 0x3fce020cc6235ab5
+        .quad 0x3fcdd99edaf6d7e9
+        .quad 0x3fcdb13db0d48940
+        .quad 0x3fcd88e93fb2f450
+        .quad 0x3fcd60a17f903515
+        .quad 0x3fcd38666871f465
+        .quad 0x3fcd1037f2655e7b
+        .quad 0x3fcce816157f1988
+        .quad 0x3fccc000c9db3c52
+        .quad 0x3fcc97f8079d44ec
+        .quad 0x3fcc6ffbc6f00f71
+        .quad 0x3fcc480c0005ccd1
+        .quad 0x3fcc2028ab17f9b4
+        .quad 0x3fcbf851c067555f
+        .quad 0x3fcbd087383bd8ad
+        .quad 0x3fcba8c90ae4ad19
+        .quad 0x3fcb811730b823d2
+        .quad 0x3fcb5971a213acdb
+        .quad 0x3fcb31d8575bce3d
+        .quad 0x3fcb0a4b48fc1b46
+        .quad 0x3fcae2ca6f672bd4
+        .quad 0x3fcabb55c31693ad
+        .quad 0x3fca93ed3c8ad9e3
+        .quad 0x3fca6c90d44b704e
+        .quad 0x3fca454082e6ab05
+        .quad 0x3fca1dfc40f1b7f1
+        .quad 0x3fc9f6c407089664
+        .quad 0x3fc9cf97cdce0ec3
+        .quad 0x3fc9a8778debaa38
+        .quad 0x3fc981634011aa75
+        .quad 0x3fc95a5adcf7017f
+        .quad 0x3fc9335e5d594989
+        .quad 0x3fc90c6db9fcbcd9
+        .quad 0x3fc8e588ebac2dbf
+        .quad 0x3fc8beafeb38fe8c
+        .quad 0x3fc897e2b17b19a5
+        .quad 0x3fc871213750e994
+        .quad 0x3fc84a6b759f512f
+        .quad 0x3fc823c16551a3c2
+        .quad 0x3fc7fd22ff599d4f
+        .quad 0x3fc7d6903caf5ad0
+        .quad 0x3fc7b0091651528c
+        .quad 0x3fc7898d85444c73
+        .quad 0x3fc7631d82935a86
+        .quad 0x3fc73cb9074fd14d
+        .quad 0x3fc716600c914054
+        .quad 0x3fc6f0128b756abc
+        .quad 0x3fc6c9d07d203fc7
+        .quad 0x3fc6a399dabbd383
+        .quad 0x3fc67d6e9d785771
+        .quad 0x3fc6574ebe8c133a
+        .quad 0x3fc6313a37335d76
+        .quad 0x3fc60b3100b09476
+        .quad 0x3fc5e533144c1719
+        .quad 0x3fc5bf406b543db2
+        .quad 0x3fc59958ff1d52f1
+        .quad 0x3fc5737cc9018cdd
+        .quad 0x3fc54dabc26105d2
+        .quad 0x3fc527e5e4a1b58d
+        .quad 0x3fc5022b292f6a45
+        .quad 0x3fc4dc7b897bc1c8
+        .quad 0x3fc4b6d6fefe22a4
+        .quad 0x3fc4913d8333b561
+        .quad 0x3fc46baf0f9f5db7
+        .quad 0x3fc4462b9dc9b3dc
+        .quad 0x3fc420b32740fdd4
+        .quad 0x3fc3fb45a59928cc
+        .quad 0x3fc3d5e3126bc27f
+        .quad 0x3fc3b08b6757f2a9
+        .quad 0x3fc38b3e9e027479
+        .quad 0x3fc365fcb0159016
+        .quad 0x3fc340c59741142e
+        .quad 0x3fc31b994d3a4f85
+        .quad 0x3fc2f677cbbc0a96
+        .quad 0x3fc2d1610c86813a
+        .quad 0x3fc2ac55095f5c59
+        .quad 0x3fc28753bc11aba5
+        .quad 0x3fc2625d1e6ddf57
+        .quad 0x3fc23d712a49c202
+        .quad 0x3fc2188fd9807263
+        .quad 0x3fc1f3b925f25d41
+        .quad 0x3fc1ceed09853752
+        .quad 0x3fc1aa2b7e23f72a
+        .quad 0x3fc185747dbecf34
+        .quad 0x3fc160c8024b27b1
+        .quad 0x3fc13c2605c398c3
+        .quad 0x3fc1178e8227e47c
+        .quad 0x3fc0f301717cf0fb
+        .quad 0x3fc0ce7ecdccc28d
+        .quad 0x3fc0aa06912675d5
+        .quad 0x3fc08598b59e3a07
+        .quad 0x3fc06135354d4b18
+        .quad 0x3fc03cdc0a51ec0d
+        .quad 0x3fc0188d2ecf6140
+        .quad 0x3fbfe89139dbd566
+        .quad 0x3fbfa01c9db57ce2
+        .quad 0x3fbf57bc7d9005db
+        .quad 0x3fbf0f70cdd992e3
+        .quad 0x3fbec739830a1120
+        .quad 0x3fbe7f1691a32d3e
+        .quad 0x3fbe3707ee30487b
+        .quad 0x3fbdef0d8d466db9
+        .quad 0x3fbda727638446a2
+        .quad 0x3fbd5f55659210e2
+        .quad 0x3fbd179788219364
+        .quad 0x3fbccfedbfee13a8
+        .quad 0x3fbc885801bc4b23
+        .quad 0x3fbc40d6425a5cb1
+        .quad 0x3fbbf968769fca11
+        .quad 0x3fbbb20e936d6974
+        .quad 0x3fbb6ac88dad5b1c
+        .quad 0x3fbb23965a52ff00
+        .quad 0x3fbadc77ee5aea8c
+        .quad 0x3fba956d3ecade63
+        .quad 0x3fba4e7640b1bc38
+        .quad 0x3fba0792e9277cac
+        .quad 0x3fb9c0c32d4d2548
+        .quad 0x3fb97a07024cbe74
+        .quad 0x3fb9335e5d594989
+        .quad 0x3fb8ecc933aeb6e8
+        .quad 0x3fb8a6477a91dc29
+        .quad 0x3fb85fd927506a48
+        .quad 0x3fb8197e2f40e3f0
+        .quad 0x3fb7d33687c293c9
+        .quad 0x3fb78d02263d82d3
+        .quad 0x3fb746e100226ed9
+        .quad 0x3fb700d30aeac0e1
+        .quad 0x3fb6bad83c1883b6
+        .quad 0x3fb674f089365a7a
+        .quad 0x3fb62f1be7d77743
+        .quad 0x3fb5e95a4d9791cb
+        .quad 0x3fb5a3abb01ade25
+        .quad 0x3fb55e10050e0384
+        .quad 0x3fb518874226130a
+        .quad 0x3fb4d3115d207eac
+        .quad 0x3fb48dae4bc31018
+        .quad 0x3fb4485e03dbdfad
+        .quad 0x3fb403207b414b7f
+        .quad 0x3fb3bdf5a7d1ee64
+        .quad 0x3fb378dd7f749714
+        .quad 0x3fb333d7f8183f4b
+        .quad 0x3fb2eee507b40301
+        .quad 0x3fb2aa04a44717a5
+        .quad 0x3fb26536c3d8c369
+        .quad 0x3fb2207b5c78549e
+        .quad 0x3fb1dbd2643d190b
+        .quad 0x3fb1973bd1465567
+        .quad 0x3fb152b799bb3cc9
+        .quad 0x3fb10e45b3cae831
+        .quad 0x3fb0c9e615ac4e17
+        .quad 0x3fb08598b59e3a07
+        .quad 0x3fb0415d89e74444
+        .quad 0x3faffa6911ab9301
+        .quad 0x3faf723b517fc523
+        .quad 0x3faeea31c006b87c
+        .quad 0x3fae624c4a0b5e1b
+        .quad 0x3fadda8adc67ee4e
+        .quad 0x3fad52ed6405d86f
+        .quad 0x3faccb73cdddb2cc
+        .quad 0x3fac441e06f72a9e
+        .quad 0x3fabbcebfc68f420
+        .quad 0x3fab35dd9b58baad
+        .quad 0x3faaaef2d0fb10fc
+        .quad 0x3faa282b8a936171
+        .quad 0x3fa9a187b573de7c
+        .quad 0x3fa91b073efd7314
+        .quad 0x3fa894aa149fb343
+        .quad 0x3fa80e7023d8ccc4
+        .quad 0x3fa788595a3577ba
+        .quad 0x3fa70265a550e777
+        .quad 0x3fa67c94f2d4bb58
+        .quad 0x3fa5f6e73078efb8
+        .quad 0x3fa5715c4c03ceef
+        .quad 0x3fa4ebf43349e26f
+        .quad 0x3fa466aed42de3ea
+        .quad 0x3fa3e18c1ca0ae92
+        .quad 0x3fa35c8bfaa1306b
+        .quad 0x3fa2d7ae5c3c5bae
+        .quad 0x3fa252f32f8d183f
+        .quad 0x3fa1ce5a62bc353a
+        .quad 0x3fa149e3e4005a8d
+        .quad 0x3fa0c58fa19dfaaa
+        .quad 0x3fa0415d89e74444
+        .quad 0x3f9f7a9b16782856
+        .quad 0x3f9e72bf2813ce51
+        .quad 0x3f9d6b2725979802
+        .quad 0x3f9c63d2ec14aaf2
+        .quad 0x3f9b5cc258b718e6
+        .quad 0x3f9a55f548c5c43f
+        .quad 0x3f994f6b99a24475
+        .quad 0x3f98492528c8cabf
+        .quad 0x3f974321d3d006d3
+        .quad 0x3f963d6178690bd6
+        .quad 0x3f9537e3f45f3565
+        .quad 0x3f9432a925980cc1
+        .quad 0x3f932db0ea132e22
+        .quad 0x3f9228fb1fea2e28
+        .quad 0x3f912487a5507f70
+        .quad 0x3f90205658935847
+        .quad 0x3f8e38ce3033310c
+        .quad 0x3f8c317384c75f06
+        .quad 0x3f8a2a9c6c170462
+        .quad 0x3f882448a388a2aa
+        .quad 0x3f861e77e8b53fc6
+        .quad 0x3f841929f96832f0
+        .quad 0x3f82145e939ef1e9
+        .quad 0x3f8010157588de71
+        .quad 0x3f7c189cbb0e27fb
+        .quad 0x3f78121214586b54
+        .quad 0x3f740c8a747878e2
+        .quad 0x3f70080559588b35
+        .quad 0x3f680904828985c0
+        .quad 0x3f60040155d5889e
+        .quad 0x3f50020055655889
+        .quad 0x0000000000000000
+        /*== poly_coeff[4] ==*/
+        .align 32
+        .quad 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A /* coeff4 */
+        .quad 0xbfd0000148058EE1, 0xbfd0000148058EE1, 0xbfd0000148058EE1, 0xbfd0000148058EE1 /* coeff3 */
+        .quad 0x3fd55555555543C5, 0x3fd55555555543C5, 0x3fd55555555543C5, 0x3fd55555555543C5 /* coeff2 */
+        .quad 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F /* coeff1 */
+        /*== ExpMask ==*/
+        .align 32
+        .quad 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff
+        /*== Two10 ==*/
+        .align 32
+        .quad 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000
+        /*== MinLog1p = -1+2^(-53) ==*/
+        .align 32
+        .quad 0xbfefffffffffffff, 0xbfefffffffffffff, 0xbfefffffffffffff, 0xbfefffffffffffff
+        /*== MaxLog1p ==*/
+        .align 32
+        .quad 0x7f3ffffffffff000, 0x7f3ffffffffff000, 0x7f3ffffffffff000, 0x7f3ffffffffff000
+        /*== One ==*/
+        .align 32
+        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
+        /*== SgnMask ==*/
+        .align 32
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
+        /*== XThreshold ==*/
+        .align 32
+        .quad 0x3e00000000000000, 0x3e00000000000000, 0x3e00000000000000, 0x3e00000000000000
+        /*== XhMask ==*/
+        .align 32
+        .quad 0xfffffffffffffc00, 0xfffffffffffffc00, 0xfffffffffffffc00, 0xfffffffffffffc00
+        /*== Threshold ==*/
+        .align 32
+        .quad 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000
+        /*== Bias ==*/
+        .align 32
+        .quad 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000
+        /*== Bias1 ==*/
+        .align 32
+        .quad 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000
+        /*== ExpMask ==*/
+        .align 32
+        .quad 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000
+        /*== ExpMask2 ==*/
+        .align 32
+        .quad 0x7f40000000000000, 0x7f40000000000000, 0x7f40000000000000, 0x7f40000000000000
+        /*== L2L ==*/
+        .align 32
+        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
+        /*== dHalf ==*/
+        .align 32
+        .quad 0x3FE0000000000000, 0x3FE0000000000000, 0x3FE0000000000000, 0x3FE0000000000000
+        /*== dSign ==*/
+        .align 32
+        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000
+        /*== dTopMask12 ==*/
+        .align 32
+        .quad 0xFFFFFE0000000000, 0xFFFFFE0000000000, 0xFFFFFE0000000000, 0xFFFFFE0000000000
+        /*== dTopMask41 ==*/
+        .align 32
+        .quad 0xFFFFFFFFFFFFF000, 0xFFFFFFFFFFFFF000, 0xFFFFFFFFFFFFF000, 0xFFFFFFFFFFFFF000
+        /*== dTinyRange ==*/
+        .align 32
+        .quad 0x0350000000000000, 0x0350000000000000, 0x0350000000000000, 0x0350000000000000
+        .align 32
+        .type	__svml_datanh_data_internal,@object
+        .size	__svml_datanh_data_internal,.-__svml_datanh_data_internal
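
For reference, the coeff4..coeff1 constants in the table above are close to
the log1p() series terms 1/5, -1/4, 1/3 and -1/2.  One common way to evaluate
such a polynomial on a small reduced argument R is sketched below in scalar C;
the function name is illustrative.

    /* ~ log1p (R) for small R, using the coefficient values quoted above.  */
    static double
    log1p_poly (double R)
    {
      const double c1 = -0x1.FFFFFFFFFF81Fp-2;   /* ~ -1/2 */
      const double c2 =  0x1.55555555543C5p-2;   /* ~  1/3 */
      const double c3 = -0x1.0000148058EE1p-2;   /* ~ -1/4 */
      const double c4 =  0x1.9999CACDB4D0Ap-3;   /* ~  1/5 */
      return R + R * R * (c1 + R * (c2 + R * (c3 + R * c4)));
    }
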
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core-avx2.S
new file mode 100644
index 0000000000..675ebd2fd6
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized atanh, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN8v_atanh _ZGVeN8v_atanh_avx2_wrapper
+#include "../svml_d_atanh8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core.c
new file mode 100644
index 0000000000..4da8e20fad
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized atanh, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN8v_atanh
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN8v_atanh, __GI__ZGVeN8v_atanh, __redirect__ZGVeN8v_atanh)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core_avx512.S
new file mode 100644
index 0000000000..ef600c073a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core_avx512.S
@@ -0,0 +1,401 @@
+/* Function atanh vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute atanh(x) as 0.5 * log((1 + x)/(1 - x))
+ *   using a small lookup table that maps to AVX-512 permute instructions
+ *
+ *   Special cases:
+ *
+ *   atanh(0)  = 0
+ *   atanh(+1) = +INF
+ *   atanh(-1) = -INF
+ *   atanh(x)  = NaN if |x| > 1, or if x is a NaN or INF
+ *
+ */
+
+/* Offsets for data table __svml_datanh_data_internal_avx512
+ */
+#define Log_tbl_H                     	0
+#define Log_tbl_L                     	128
+#define One                           	256
+#define AbsMask                       	320
+#define AddB5                         	384
+#define RcpBitMask                    	448
+#define poly_coeff8                   	512
+#define poly_coeff7                   	576
+#define poly_coeff6                   	640
+#define poly_coeff5                   	704
+#define poly_coeff4                   	768
+#define poly_coeff3                   	832
+#define poly_coeff2                   	896
+#define poly_coeff1                   	960
+#define poly_coeff0                   	1024
+#define Half                          	1088
+#define L2H                           	1152
+#define L2L                           	1216
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN8v_atanh_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovups   One+__svml_datanh_data_internal_avx512(%rip), %zmm15
+
+/* round reciprocals to 1+4b mantissas */
+        vmovups   AddB5+__svml_datanh_data_internal_avx512(%rip), %zmm6
+        vmovups   RcpBitMask+__svml_datanh_data_internal_avx512(%rip), %zmm9
+        vmovaps   %zmm0, %zmm2
+        vandpd    AbsMask+__svml_datanh_data_internal_avx512(%rip), %zmm2, %zmm13
+
+/* 1+y */
+        vaddpd    {rn-sae}, %zmm15, %zmm13, %zmm0
+
+/* 1-y */
+        vsubpd    {rn-sae}, %zmm13, %zmm15, %zmm4
+        vxorpd    %zmm13, %zmm2, %zmm1
+
+/* Yp_high */
+        vsubpd    {rn-sae}, %zmm15, %zmm0, %zmm7
+
+/* -Ym_high */
+        vsubpd    {rn-sae}, %zmm15, %zmm4, %zmm12
+
+/* RcpP ~ 1/Yp */
+        vrcp14pd  %zmm0, %zmm3
+
+/* RcpM ~ 1/Ym */
+        vrcp14pd  %zmm4, %zmm5
+
+/* input outside (-1, 1) ? */
+        vcmppd    $21, {sae}, %zmm15, %zmm13, %k0
+        vpaddq    %zmm6, %zmm3, %zmm11
+        vpaddq    %zmm6, %zmm5, %zmm10
+
+/* Yp_low */
+        vsubpd    {rn-sae}, %zmm7, %zmm13, %zmm8
+        vandpd    %zmm9, %zmm11, %zmm14
+        vandpd    %zmm9, %zmm10, %zmm3
+
+/* Ym_low */
+        vaddpd    {rn-sae}, %zmm12, %zmm13, %zmm12
+
+/* Reduced argument: Rp = (RcpP*Yp - 1)+RcpP*Yp_low */
+        vfmsub213pd {rn-sae}, %zmm15, %zmm14, %zmm0
+
+/* Reduced argument: Rm = (RcpM*Ym - 1)+RcpM*Ym_low */
+        vfmsub231pd {rn-sae}, %zmm3, %zmm4, %zmm15
+
+/* exponents */
+        vgetexppd {sae}, %zmm14, %zmm5
+        vgetexppd {sae}, %zmm3, %zmm4
+
+/* Table lookups */
+        vmovups   __svml_datanh_data_internal_avx512(%rip), %zmm9
+        vmovups   Log_tbl_H+64+__svml_datanh_data_internal_avx512(%rip), %zmm13
+        vmovups   Log_tbl_L+__svml_datanh_data_internal_avx512(%rip), %zmm7
+        vfmadd231pd {rn-sae}, %zmm14, %zmm8, %zmm0
+        vfnmadd231pd {rn-sae}, %zmm3, %zmm12, %zmm15
+
+/* Prepare table index */
+        vpsrlq    $48, %zmm14, %zmm11
+        vpsrlq    $48, %zmm3, %zmm8
+        vmovups   Log_tbl_L+64+__svml_datanh_data_internal_avx512(%rip), %zmm14
+
+/* polynomials */
+        vmovups   poly_coeff8+__svml_datanh_data_internal_avx512(%rip), %zmm3
+
+/* Km-Kp */
+        vsubpd    {rn-sae}, %zmm5, %zmm4, %zmm5
+        vmovups   poly_coeff7+__svml_datanh_data_internal_avx512(%rip), %zmm4
+        kmovw     %k0, %edx
+        vmovaps   %zmm11, %zmm10
+        vmovaps   %zmm4, %zmm6
+        vpermi2pd %zmm13, %zmm9, %zmm10
+        vpermi2pd %zmm14, %zmm7, %zmm11
+        vpermt2pd %zmm13, %zmm8, %zmm9
+        vpermt2pd %zmm14, %zmm8, %zmm7
+        vmovups   poly_coeff6+__svml_datanh_data_internal_avx512(%rip), %zmm8
+        vfmadd231pd {rn-sae}, %zmm0, %zmm3, %zmm6
+        vfmadd231pd {rn-sae}, %zmm15, %zmm3, %zmm4
+        vmovups   poly_coeff3+__svml_datanh_data_internal_avx512(%rip), %zmm13
+        vmovups   poly_coeff2+__svml_datanh_data_internal_avx512(%rip), %zmm14
+        vfmadd213pd {rn-sae}, %zmm8, %zmm0, %zmm6
+        vfmadd213pd {rn-sae}, %zmm8, %zmm15, %zmm4
+        vmovups   poly_coeff0+__svml_datanh_data_internal_avx512(%rip), %zmm8
+        vsubpd    {rn-sae}, %zmm11, %zmm7, %zmm12
+
+/* table values */
+        vsubpd    {rn-sae}, %zmm10, %zmm9, %zmm3
+        vmovups   poly_coeff5+__svml_datanh_data_internal_avx512(%rip), %zmm7
+        vmovups   poly_coeff4+__svml_datanh_data_internal_avx512(%rip), %zmm9
+
+/* K*L2H + Th */
+        vmovups   L2H+__svml_datanh_data_internal_avx512(%rip), %zmm10
+
+/* K*L2L + Tl */
+        vmovups   L2L+__svml_datanh_data_internal_avx512(%rip), %zmm11
+        vfmadd213pd {rn-sae}, %zmm7, %zmm0, %zmm6
+        vfmadd213pd {rn-sae}, %zmm7, %zmm15, %zmm4
+        vmovups   poly_coeff1+__svml_datanh_data_internal_avx512(%rip), %zmm7
+        vfmadd231pd {rn-sae}, %zmm5, %zmm10, %zmm3
+        vfmadd213pd {rn-sae}, %zmm12, %zmm11, %zmm5
+        vfmadd213pd {rn-sae}, %zmm9, %zmm0, %zmm6
+        vfmadd213pd {rn-sae}, %zmm9, %zmm15, %zmm4
+        vfmadd213pd {rn-sae}, %zmm13, %zmm0, %zmm6
+        vfmadd213pd {rn-sae}, %zmm13, %zmm15, %zmm4
+        vfmadd213pd {rn-sae}, %zmm14, %zmm0, %zmm6
+        vfmadd213pd {rn-sae}, %zmm14, %zmm15, %zmm4
+        vfmadd213pd {rn-sae}, %zmm7, %zmm0, %zmm6
+        vfmadd213pd {rn-sae}, %zmm7, %zmm15, %zmm4
+        vfmadd213pd {rn-sae}, %zmm8, %zmm0, %zmm6
+        vfmadd213pd {rn-sae}, %zmm8, %zmm15, %zmm4
+
+/* (K*L2L + Tl) + Rp*PolyP */
+        vfmadd213pd {rn-sae}, %zmm5, %zmm0, %zmm6
+        vorpd     Half+__svml_datanh_data_internal_avx512(%rip), %zmm1, %zmm0
+
+/* (K*L2L + Tl) + Rp*PolyP -Rm*PolyM */
+        vfnmadd213pd {rn-sae}, %zmm6, %zmm15, %zmm4
+        vaddpd    {rn-sae}, %zmm4, %zmm3, %zmm1
+        vmulpd    {rn-sae}, %zmm0, %zmm1, %zmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm2
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm2, 64(%rsp)
+        vmovups   %zmm0, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     64(%rsp,%r14,8), %xmm0
+        call      atanh@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 128(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN8v_atanh_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_datanh_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 Log_tbl_H[16][2];
+        __declspec(align(64)) VUINT32 Log_tbl_L[16][2];
+        __declspec(align(64)) VUINT32 One[8][2];
+        __declspec(align(64)) VUINT32 AbsMask[8][2];
+        __declspec(align(64)) VUINT32 AddB5[8][2];
+        __declspec(align(64)) VUINT32 RcpBitMask[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff8[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff7[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff1[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff0[8][2];
+        __declspec(align(64)) VUINT32 Half[8][2];
+        __declspec(align(64)) VUINT32 L2H[8][2];
+        __declspec(align(64)) VUINT32 L2L[8][2];
+    } __svml_datanh_data_internal_avx512;
+#endif
+__svml_datanh_data_internal_avx512:
+        /*== Log_tbl_H ==*/
+        .quad 0x0000000000000000
+        .quad 0x3faf0a30c0100000
+        .quad 0x3fbe27076e2a0000
+        .quad 0x3fc5ff3070a80000
+        .quad 0x3fcc8ff7c79b0000
+        .quad 0x3fd1675cabab8000
+        .quad 0x3fd4618bc21c8000
+        .quad 0x3fd739d7f6bc0000
+        .quad 0x3fd9f323ecbf8000
+        .quad 0x3fdc8ff7c79a8000
+        .quad 0x3fdf128f5faf0000
+        .quad 0x3fe0be72e4254000
+        .quad 0x3fe1e85f5e704000
+        .quad 0x3fe307d7334f0000
+        .quad 0x3fe41d8fe8468000
+        .quad 0x3fe52a2d265bc000
+        /*== Log_tbl_L ==*/
+        .align 64
+        .quad 0x0000000000000000
+        .quad 0x3d662a6617cc9717
+        .quad 0x3d6e5cbd3d50fffc
+        .quad 0xbd6b0b0de3077d7e
+        .quad 0xbd697794f689f843
+        .quad 0x3d630701ce63eab9
+        .quad 0xbd609ec17a426426
+        .quad 0xbd67fcb18ed9d603
+        .quad 0x3d584bf2b68d766f
+        .quad 0x3d5a21ac25d81ef3
+        .quad 0x3d3bb2cd720ec44c
+        .quad 0xbd657d49676844cc
+        .quad 0x3d1a07bd8b34be7c
+        .quad 0x3d60be1fb590a1f5
+        .quad 0xbd5aa33736867a17
+        .quad 0x3d46abb9df22bc57
+        /*== One ==*/
+        .align 64
+        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
+        /*== AbsMask ==*/
+        .align 64
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
+        /*== AddB5 ==*/
+        .align 64
+        .quad 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000
+        /*== RcpBitMask ==*/
+        .align 64
+        .quad 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000
+        /*== poly_coeff8 ==*/
+        .align 64
+        .quad 0x3fbc81dd40d38142, 0x3fbc81dd40d38142, 0x3fbc81dd40d38142, 0x3fbc81dd40d38142, 0x3fbc81dd40d38142, 0x3fbc81dd40d38142, 0x3fbc81dd40d38142, 0x3fbc81dd40d38142
+        /*== poly_coeff7 ==*/
+        .align 64
+        .quad 0xbfc0073cb82e8b70, 0xbfc0073cb82e8b70, 0xbfc0073cb82e8b70, 0xbfc0073cb82e8b70, 0xbfc0073cb82e8b70, 0xbfc0073cb82e8b70, 0xbfc0073cb82e8b70, 0xbfc0073cb82e8b70
+        /*== poly_coeff6 ==*/
+        .align 64
+        .quad 0x3fc2492298ffdae8, 0x3fc2492298ffdae8, 0x3fc2492298ffdae8, 0x3fc2492298ffdae8, 0x3fc2492298ffdae8, 0x3fc2492298ffdae8, 0x3fc2492298ffdae8, 0x3fc2492298ffdae8
+        /*== poly_coeff5 ==*/
+        .align 64
+        .quad 0xbfc55553f871e5c5, 0xbfc55553f871e5c5, 0xbfc55553f871e5c5, 0xbfc55553f871e5c5, 0xbfc55553f871e5c5, 0xbfc55553f871e5c5, 0xbfc55553f871e5c5, 0xbfc55553f871e5c5
+        /*== poly_coeff4 ==*/
+        .align 64
+        .quad 0x3fc9999999cd394a, 0x3fc9999999cd394a, 0x3fc9999999cd394a, 0x3fc9999999cd394a, 0x3fc9999999cd394a, 0x3fc9999999cd394a, 0x3fc9999999cd394a, 0x3fc9999999cd394a
+        /*== poly_coeff3 ==*/
+        .align 64
+        .quad 0xbfd00000000c2a01, 0xbfd00000000c2a01, 0xbfd00000000c2a01, 0xbfd00000000c2a01, 0xbfd00000000c2a01, 0xbfd00000000c2a01, 0xbfd00000000c2a01, 0xbfd00000000c2a01
+        /*== poly_coeff2 ==*/
+        .align 64
+        .quad 0x3fd5555555555462, 0x3fd5555555555462, 0x3fd5555555555462, 0x3fd5555555555462, 0x3fd5555555555462, 0x3fd5555555555462, 0x3fd5555555555462, 0x3fd5555555555462
+        /*== poly_coeff1 ==*/
+        .align 64
+        .quad 0xbfdfffffffffffc5, 0xbfdfffffffffffc5, 0xbfdfffffffffffc5, 0xbfdfffffffffffc5, 0xbfdfffffffffffc5, 0xbfdfffffffffffc5, 0xbfdfffffffffffc5, 0xbfdfffffffffffc5
+        /*== poly_coeff0 ==*/
+        .align 64
+        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
+        /*== Half ==*/
+        .align 64
+        .quad 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000
+        /*== L2H = log(2)_high ==*/
+        .align 64
+        .quad 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000
+        /*== L2L = log(2)_low ==*/
+        .align 64
+        .quad 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000
+        .align 64
+        .type	__svml_datanh_data_internal_avx512,@object
+        .size	__svml_datanh_data_internal_avx512,.-__svml_datanh_data_internal_avx512
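
For reference, the AVX-512 kernel above is a vector form of the identity
atanh(x) = 0.5 * log((1 + x)/(1 - x)); lanes flagged by the range mask
(|x| >= 1 or NaN) are redone one at a time through the scalar atanh call.
A rough scalar C model follows; the function and variable names are
illustrative only.

    #include <math.h>

    /* Scalar sketch of what one vector lane computes.  */
    static double
    atanh_ref (double x)
    {
      double y = fabs (x);
      /* 0.5 * (log (1 + y) - log (1 - y)), with the sign of x restored.  */
      return copysign (0.5 * (log1p (y) - log1p (-y)), x);
    }

    /* Sketch of the whole-vector flow: compute every lane, then redo the
       lanes marked special with the libm scalar routine, as in
       L(SPECIAL_VALUES_LOOP) above.  */
    static void
    atanh8_ref (const double src[8], double dst[8])
    {
      unsigned int special = 0;         /* plays the role of the k0 mask */
      for (int i = 0; i < 8; i++)
        {
          if (!(fabs (src[i]) < 1.0))   /* |x| >= 1, or x is NaN */
            special |= 1u << i;
          dst[i] = atanh_ref (src[i]);
        }
      for (int i = 0; i < 8; i++)
        if (special & (1u << i))
          dst[i] = atanh (src[i]);
    }
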
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core-avx2.S
new file mode 100644
index 0000000000..1af3662f65
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized atanhf, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN16v_atanhf _ZGVeN16v_atanhf_avx2_wrapper
+#include "../svml_s_atanhf16_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core.c
new file mode 100644
index 0000000000..4b1190f0eb
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized atanhf, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN16v_atanhf
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN16v_atanhf, __GI__ZGVeN16v_atanhf,
+	       __redirect__ZGVeN16v_atanhf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core_avx512.S
new file mode 100644
index 0000000000..6c5f6a54fa
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core_avx512.S
@@ -0,0 +1,393 @@
+/* Function atanhf vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute atanh(x) as 0.5 * log((1 + x)/(1 - x))
+ *   using a small lookup table that maps to AVX-512 permute instructions
+ *
+ *   Special cases:
+ *
+ *   atanh(0)  = 0
+ *   atanh(+1) = +INF
+ *   atanh(-1) = -INF
+ *   atanh(x)  = NaN if |x| > 1, or if x is a NaN or INF
+ *
+ */
+
+/* Offsets for data table __svml_satanh_data_internal_avx512
+ */
+#define Log_tbl_H                     	0
+#define Log_tbl_L                     	128
+#define One                           	256
+#define AbsMask                       	320
+#define AddB5                         	384
+#define RcpBitMask                    	448
+#define poly_coeff3                   	512
+#define poly_coeff2                   	576
+#define poly_coeff1                   	640
+#define poly_coeff0                   	704
+#define Half                          	768
+#define L2H                           	832
+#define L2L                           	896
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN16v_atanhf_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovups   One+__svml_satanh_data_internal_avx512(%rip), %zmm4
+
+/* round reciprocals to 1+5b mantissas */
+        vmovups   AddB5+__svml_satanh_data_internal_avx512(%rip), %zmm14
+        vmovups   RcpBitMask+__svml_satanh_data_internal_avx512(%rip), %zmm1
+        vmovaps   %zmm0, %zmm11
+        vandps    AbsMask+__svml_satanh_data_internal_avx512(%rip), %zmm11, %zmm6
+
+/* 1+y */
+        vaddps    {rn-sae}, %zmm4, %zmm6, %zmm9
+
+/* 1-y */
+        vsubps    {rn-sae}, %zmm6, %zmm4, %zmm8
+        vxorps    %zmm6, %zmm11, %zmm10
+
+/* Yp_high */
+        vsubps    {rn-sae}, %zmm4, %zmm9, %zmm2
+
+/* -Ym_high */
+        vsubps    {rn-sae}, %zmm4, %zmm8, %zmm5
+
+/* RcpP ~ 1/Yp */
+        vrcp14ps  %zmm9, %zmm12
+
+/* RcpM ~ 1/Ym */
+        vrcp14ps  %zmm8, %zmm13
+
+/* input outside (-1, 1) ? */
+        vcmpps    $21, {sae}, %zmm4, %zmm6, %k0
+        vpaddd    %zmm14, %zmm12, %zmm15
+        vpaddd    %zmm14, %zmm13, %zmm0
+
+/* Yp_low */
+        vsubps    {rn-sae}, %zmm2, %zmm6, %zmm3
+        vandps    %zmm1, %zmm15, %zmm7
+        vandps    %zmm1, %zmm0, %zmm12
+
+/* Ym_low */
+        vaddps    {rn-sae}, %zmm5, %zmm6, %zmm5
+
+/* Reduced argument: Rp = (RcpP*Yp - 1)+RcpP*Yp_low */
+        vfmsub213ps {rn-sae}, %zmm4, %zmm7, %zmm9
+
+/* Reduced argument: Rm = (RcpM*Ym - 1)+RcpM*Ym_low */
+        vfmsub231ps {rn-sae}, %zmm12, %zmm8, %zmm4
+        vmovups   Log_tbl_L+__svml_satanh_data_internal_avx512(%rip), %zmm8
+        vmovups   Log_tbl_L+64+__svml_satanh_data_internal_avx512(%rip), %zmm13
+
+/* exponents */
+        vgetexpps {sae}, %zmm7, %zmm15
+        vfmadd231ps {rn-sae}, %zmm7, %zmm3, %zmm9
+
+/* Table lookups */
+        vmovups   __svml_satanh_data_internal_avx512(%rip), %zmm6
+        vgetexpps {sae}, %zmm12, %zmm14
+        vfnmadd231ps {rn-sae}, %zmm12, %zmm5, %zmm4
+
+/* Prepare table index */
+        vpsrld    $18, %zmm7, %zmm3
+        vpsrld    $18, %zmm12, %zmm2
+        vmovups   Log_tbl_H+64+__svml_satanh_data_internal_avx512(%rip), %zmm7
+        vmovups   poly_coeff1+__svml_satanh_data_internal_avx512(%rip), %zmm12
+
+/* Km-Kp */
+        vsubps    {rn-sae}, %zmm15, %zmm14, %zmm1
+        kmovw     %k0, %edx
+        vmovaps   %zmm3, %zmm0
+        vpermi2ps %zmm13, %zmm8, %zmm3
+        vpermt2ps %zmm13, %zmm2, %zmm8
+        vpermi2ps %zmm7, %zmm6, %zmm0
+        vpermt2ps %zmm7, %zmm2, %zmm6
+        vsubps    {rn-sae}, %zmm3, %zmm8, %zmm5
+
+/* K*L2H + Th */
+        vmovups   L2H+__svml_satanh_data_internal_avx512(%rip), %zmm2
+
+/* K*L2L + Tl */
+        vmovups   L2L+__svml_satanh_data_internal_avx512(%rip), %zmm3
+
+/* polynomials */
+        vmovups   poly_coeff3+__svml_satanh_data_internal_avx512(%rip), %zmm7
+        vmovups   poly_coeff0+__svml_satanh_data_internal_avx512(%rip), %zmm13
+
+/* table values */
+        vsubps    {rn-sae}, %zmm0, %zmm6, %zmm0
+        vfmadd231ps {rn-sae}, %zmm1, %zmm2, %zmm0
+        vfmadd213ps {rn-sae}, %zmm5, %zmm3, %zmm1
+        vmovups   poly_coeff2+__svml_satanh_data_internal_avx512(%rip), %zmm3
+        vmovaps   %zmm3, %zmm2
+        vfmadd231ps {rn-sae}, %zmm9, %zmm7, %zmm2
+        vfmadd231ps {rn-sae}, %zmm4, %zmm7, %zmm3
+        vfmadd213ps {rn-sae}, %zmm12, %zmm9, %zmm2
+        vfmadd213ps {rn-sae}, %zmm12, %zmm4, %zmm3
+        vfmadd213ps {rn-sae}, %zmm13, %zmm9, %zmm2
+        vfmadd213ps {rn-sae}, %zmm13, %zmm4, %zmm3
+
+/* (K*L2L + Tl) + Rp*PolyP */
+        vfmadd213ps {rn-sae}, %zmm1, %zmm9, %zmm2
+        vorps     Half+__svml_satanh_data_internal_avx512(%rip), %zmm10, %zmm9
+
+/* (K*L2L + Tl) + Rp*PolyP -Rm*PolyM */
+        vfnmadd213ps {rn-sae}, %zmm2, %zmm4, %zmm3
+        vaddps    {rn-sae}, %zmm3, %zmm0, %zmm4
+        vmulps    {rn-sae}, %zmm9, %zmm4, %zmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm11
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm11, 64(%rsp)
+        vmovups   %zmm0, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $16, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     64(%rsp,%r14,4), %xmm0
+        call      atanhf@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 128(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN16v_atanhf_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_satanh_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 Log_tbl_H[32][1];
+        __declspec(align(64)) VUINT32 Log_tbl_L[32][1];
+        __declspec(align(64)) VUINT32 One[16][1];
+        __declspec(align(64)) VUINT32 AbsMask[16][1];
+        __declspec(align(64)) VUINT32 AddB5[16][1];
+        __declspec(align(64)) VUINT32 RcpBitMask[16][1];
+        __declspec(align(64)) VUINT32 poly_coeff3[16][1];
+        __declspec(align(64)) VUINT32 poly_coeff2[16][1];
+        __declspec(align(64)) VUINT32 poly_coeff1[16][1];
+        __declspec(align(64)) VUINT32 poly_coeff0[16][1];
+        __declspec(align(64)) VUINT32 Half[16][1];
+        __declspec(align(64)) VUINT32 L2H[16][1];
+        __declspec(align(64)) VUINT32 L2L[16][1];
+    } __svml_satanh_data_internal_avx512;
+#endif
+__svml_satanh_data_internal_avx512:
+        /*== Log_tbl_H ==*/
+        .long 0x00000000
+        .long 0x3cfc0000
+        .long 0x3d780000
+        .long 0x3db78000
+        .long 0x3df10000
+        .long 0x3e14c000
+        .long 0x3e300000
+        .long 0x3e4a8000
+        .long 0x3e648000
+        .long 0x3e7dc000
+        .long 0x3e8b4000
+        .long 0x3e974000
+        .long 0x3ea30000
+        .long 0x3eae8000
+        .long 0x3eb9c000
+        .long 0x3ec4e000
+        .long 0x3ecfa000
+        .long 0x3eda2000
+        .long 0x3ee48000
+        .long 0x3eeea000
+        .long 0x3ef8a000
+        .long 0x3f013000
+        .long 0x3f05f000
+        .long 0x3f0aa000
+        .long 0x3f0f4000
+        .long 0x3f13d000
+        .long 0x3f184000
+        .long 0x3f1ca000
+        .long 0x3f20f000
+        .long 0x3f252000
+        .long 0x3f295000
+        .long 0x3f2d7000
+        /*== Log_tbl_L ==*/
+        .align 64
+        .long 0x00000000
+        .long 0x3726c39e
+        .long 0x38a30c01
+        .long 0x37528ae5
+        .long 0x38e0edc5
+        .long 0xb8ab41f8
+        .long 0xb7cf8f58
+        .long 0x3896a73d
+        .long 0xb5838656
+        .long 0x380c36af
+        .long 0xb8235454
+        .long 0x3862bae1
+        .long 0x38c5e10e
+        .long 0x38dedfac
+        .long 0x38ebfb5e
+        .long 0xb8e63c9f
+        .long 0xb85c1340
+        .long 0x38777bcd
+        .long 0xb6038656
+        .long 0x37d40984
+        .long 0xb8b85028
+        .long 0xb8ad5a5a
+        .long 0x3865c84a
+        .long 0x38c3d2f5
+        .long 0x383ebce1
+        .long 0xb8a1ed76
+        .long 0xb7a332c4
+        .long 0xb779654f
+        .long 0xb8602f73
+        .long 0x38f85db0
+        .long 0x37b4996f
+        .long 0xb8bfb3ca
+        /*== One ==*/
+        .align 64
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== AbsMask ==*/
+        .align 64
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
+        /*== AddB5 ==*/
+        .align 64
+        .long 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000
+        /*== RcpBitMask ==*/
+        .align 64
+        .long 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000
+        /*== poly_coeff3 ==*/
+        .align 64
+        .long 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810
+        /*== poly_coeff2 ==*/
+        .align 64
+        .long 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e
+        /*== poly_coeff1 ==*/
+        .align 64
+        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000
+        /*== poly_coeff0 ==*/
+        .align 64
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== Half ==*/
+        .align 64
+        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000
+        /*== L2H = log(2)_high ==*/
+        .align 64
+        .long 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000
+        /*== L2L = log(2)_low ==*/
+        .align 64
+        .long 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4
+        .align 64
+        .type	__svml_satanh_data_internal_avx512,@object
+        .size	__svml_satanh_data_internal_avx512,.-__svml_satanh_data_internal_avx512
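
The single-precision kernel above follows the same 0.5 * log((1 + x)/(1 - x))
scheme; each log() is reduced through a reciprocal rounded to a 1+5-bit
mantissa (the AddB5/RcpBitMask step), a 32-entry table for the mantissa part
and a short polynomial for the remainder.  A scalar model of that reduction
is sketched below; the helper names are illustrative and logf() stands in
for the table lookup.

    #include <math.h>
    #include <stdint.h>
    #include <string.h>

    /* log (y) for finite y > 0, reduced the same way as the vector code:
       r ~ 1/y rounded to a short mantissa, so R = r * y - 1 is small.  */
    static float
    log_via_short_rcp (float y)
    {
      uint32_t bits;
      float r = 1.0f / y;               /* stands in for vrcp14ps */

      /* Keep the sign, exponent and only 5 mantissa bits of the reciprocal,
         rounding at the cut (the AddB5 and RcpBitMask constants above).  */
      memcpy (&bits, &r, sizeof bits);
      bits = (bits + 0x00020000u) & 0xfffc0000u;
      memcpy (&r, &bits, sizeof r);

      float R = r * y - 1.0f;           /* reduced argument */
      int k;
      float m = frexpf (r, &k);         /* r = m * 2^k, 0.5 <= m < 1 */

      /* log (y) = log1p (R) - log (r)
                 = log1p (R) - (k - 1) * ln2 - log (2 * m).  */
      float poly = R * (1.0f + R * (-0.5f + R * (1.0f / 3.0f)));
      return poly - (k - 1) * 0.69314718f - logf (2.0f * m);
    }
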
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core-sse2.S
new file mode 100644
index 0000000000..b750092887
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized atanhf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN4v_atanhf _ZGVbN4v_atanhf_sse2
+#include "../svml_s_atanhf4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core.c
new file mode 100644
index 0000000000..46624c48cd
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized atanhf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN4v_atanhf
+#include "ifunc-mathvec-sse4_1.h"
+
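+/* Resolve _ZGVbN4v_atanhf at load time: the selector from
+   ifunc-mathvec-sse4_1.h picks the SSE4.1 implementation when the CPU
+   supports it and falls back to the SSE2 wrapper otherwise.  */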
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN4v_atanhf, __GI__ZGVbN4v_atanhf,
+	       __redirect__ZGVbN4v_atanhf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core_sse4.S
new file mode 100644
index 0000000000..77e46cb5b9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core_sse4.S
@@ -0,0 +1,361 @@
+/* Function atanhf vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute atanh(x) as 0.5 * log((1 + x)/(1 - x))
+ *
+ *   Special cases:
+ *
+ *   atanh(0)  = 0
+ *   atanh(+1) = +INF
+ *   atanh(-1) = -INF
+ *   atanh(x)  = NaN if |x| > 1, or if x is a NaN or INF
+ *
+ */
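+/*
+ * The implementation below uses the equivalent form
+ *   atanh(x) = 0.5 * log1p(2*x / (1 - x)),
+ * so the bulk of the code is a vectorized log1p evaluation.
+ * Sanity check: atanh(0.5) = 0.5 * log(3) ~= 0.5493061.
+ */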
+
+/* Offsets for data table __svml_satanh_data_internal
+ */
+#define SgnMask                       	0
+#define sOne                          	16
+#define sPoly                         	32
+#define iBrkValue                     	160
+#define iOffExpoMask                  	176
+#define sHalf                         	192
+#define sSign                         	208
+#define sTopMask12                    	224
+#define TinyRange                     	240
+#define sLn2                          	256
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN4v_atanhf_sse4)
+        subq      $72, %rsp
+        cfi_def_cfa_offset(80)
+        movaps    %xmm0, %xmm5
+
+/* Load constants including One = 1 */
+        movups    sOne+__svml_satanh_data_internal(%rip), %xmm4
+        movaps    %xmm5, %xmm3
+
+/* Strip off the sign, so treat X as positive until right at the end */
+        movups    SgnMask+__svml_satanh_data_internal(%rip), %xmm7
+        movaps    %xmm4, %xmm8
+        andps     %xmm5, %xmm7
+        movaps    %xmm4, %xmm10
+        movups    sTopMask12+__svml_satanh_data_internal(%rip), %xmm11
+        movaps    %xmm4, %xmm14
+        movaps    %xmm11, %xmm9
+
+/*
+ * Compute V = 2 * X trivially, and UHi + ULo = 1 - X in two pieces,
+ * the upper part UHi being <= 12 bits long. Then we have
+ * atanh(X) = 1/2 * log((1 + X) / (1 - X)) = 1/2 * log1p(V / (UHi + ULo)).
+ */
+        movaps    %xmm7, %xmm12
+
+/*
+ * Check whether |X| < 1, in which case we use the main function.
+ * Otherwise set the rangemask so that the callout will get used.
+ * Note that this will also use the callout for NaNs since not(NaN < 1).
+ */
+        movaps    %xmm7, %xmm6
+        movaps    %xmm7, %xmm2
+        cmpnltps  %xmm4, %xmm6
+        cmpltps   TinyRange+__svml_satanh_data_internal(%rip), %xmm2
+        mulps     %xmm5, %xmm3
+        subps     %xmm7, %xmm8
+        addps     %xmm7, %xmm12
+        movmskps  %xmm6, %edx
+        subps     %xmm8, %xmm10
+        addps     %xmm5, %xmm3
+        subps     %xmm7, %xmm10
+        andps     %xmm8, %xmm9
+
+/*
+ * Now we feed into the log1p code, using H in place of _VARG1 and
+ * later incorporating L into the reduced argument.
+ * compute 1+x as high, low parts
+ */
+        movaps    %xmm4, %xmm7
+
+/*
+ * Now compute R = 1/(UHi+ULo) * (1 - E) and the error term E
+ * The first FMR is exact (we force R to 12 bits just in case it
+ * isn't already, to make absolutely sure), and since E is ~ 2^-12,
+ * the rounding error in the other one is acceptable.
+ */
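+/*
+ * Illustrative sketch of the refinement: with R ~= 1/UHi from RCPPS and
+ * E = 1 - R*(UHi + ULo), we have 1/(UHi + ULo) = R / (1 - E)
+ * = R * (1 + E + E^2 + ...); since |E| ~ 2^-12, the series is truncated
+ * after the E^2 term below.
+ */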
+        rcpps     %xmm9, %xmm15
+        subps     %xmm9, %xmm8
+        andps     %xmm11, %xmm15
+
+/*
+ * Split V as well into upper 12 bits and lower part, so that we can get
+ * a preliminary quotient estimate without rounding error.
+ */
+        andps     %xmm12, %xmm11
+        mulps     %xmm15, %xmm9
+        addps     %xmm8, %xmm10
+        subps     %xmm11, %xmm12
+
+/* Hence get initial quotient estimate QHi + QLo = R * VHi + R * VLo */
+        mulps     %xmm15, %xmm11
+        mulps     %xmm15, %xmm10
+        subps     %xmm9, %xmm14
+        mulps     %xmm12, %xmm15
+        subps     %xmm10, %xmm14
+
+/* Compute D = E + E^2 */
+        movaps    %xmm14, %xmm13
+        movaps    %xmm4, %xmm8
+        mulps     %xmm14, %xmm13
+
+/* reduction: compute r,n */
+        movdqu    iBrkValue+__svml_satanh_data_internal(%rip), %xmm9
+        addps     %xmm13, %xmm14
+
+/*
+ * Compute R * (VHi + VLo) * (1 + E + E^2)
+ * = R *  (VHi + VLo) * (1 + D)
+ * = QHi + (QHi * D + QLo + QLo * D)
+ */
+        movaps    %xmm14, %xmm0
+        mulps     %xmm15, %xmm14
+        mulps     %xmm11, %xmm0
+        addps     %xmm14, %xmm15
+        movdqu    iOffExpoMask+__svml_satanh_data_internal(%rip), %xmm12
+        movaps    %xmm4, %xmm14
+
+/* Record the sign for eventual reincorporation. */
+        movups    sSign+__svml_satanh_data_internal(%rip), %xmm1
+        addps     %xmm15, %xmm0
+
+/*
+ * Now finally accumulate the high and low parts of the
+ * argument to log1p, H + L, with a final compensated summation.
+ */
+        movaps    %xmm0, %xmm6
+        andps     %xmm5, %xmm1
+
+/* Or the sign bit in with the tiny result to handle atanh(-0) correctly */
+        orps      %xmm1, %xmm3
+        addps     %xmm11, %xmm6
+        maxps     %xmm6, %xmm7
+        minps     %xmm6, %xmm8
+        subps     %xmm6, %xmm11
+        movaps    %xmm7, %xmm10
+        andps     %xmm2, %xmm3
+        addps     %xmm8, %xmm10
+        addps     %xmm11, %xmm0
+        subps     %xmm10, %xmm7
+        psubd     %xmm9, %xmm10
+        addps     %xmm7, %xmm8
+        pand      %xmm10, %xmm12
+        psrad     $23, %xmm10
+        cvtdq2ps  %xmm10, %xmm13
+        addps     %xmm8, %xmm0
+
+/* final reconstruction */
+        mulps     sLn2+__svml_satanh_data_internal(%rip), %xmm13
+        pslld     $23, %xmm10
+        paddd     %xmm9, %xmm12
+        psubd     %xmm10, %xmm14
+
+/* polynomial evaluation */
+        subps     %xmm4, %xmm12
+        mulps     %xmm0, %xmm14
+        movups    sPoly+112+__svml_satanh_data_internal(%rip), %xmm0
+        addps     %xmm12, %xmm14
+        mulps     %xmm14, %xmm0
+
+/* Finally, halve the result and reincorporate the sign */
+        movups    sHalf+__svml_satanh_data_internal(%rip), %xmm4
+        pxor      %xmm1, %xmm4
+        addps     sPoly+96+__svml_satanh_data_internal(%rip), %xmm0
+        mulps     %xmm14, %xmm0
+        addps     sPoly+80+__svml_satanh_data_internal(%rip), %xmm0
+        mulps     %xmm14, %xmm0
+        addps     sPoly+64+__svml_satanh_data_internal(%rip), %xmm0
+        mulps     %xmm14, %xmm0
+        addps     sPoly+48+__svml_satanh_data_internal(%rip), %xmm0
+        mulps     %xmm14, %xmm0
+        addps     sPoly+32+__svml_satanh_data_internal(%rip), %xmm0
+        mulps     %xmm14, %xmm0
+        addps     sPoly+16+__svml_satanh_data_internal(%rip), %xmm0
+        mulps     %xmm14, %xmm0
+        addps     sPoly+__svml_satanh_data_internal(%rip), %xmm0
+        mulps     %xmm14, %xmm0
+        mulps     %xmm14, %xmm0
+        addps     %xmm0, %xmm14
+        movaps    %xmm2, %xmm0
+        addps     %xmm13, %xmm14
+        mulps     %xmm14, %xmm4
+        andnps    %xmm4, %xmm0
+        orps      %xmm3, %xmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm5
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        addq      $72, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(80)
+
+/* Branch to process
+ * special inputs
+ */
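+/* (Each lane whose bit is set in the range mask - |x| >= 1 or NaN - is
+   reprocessed below by calling the scalar atanhf on the saved input and
+   overwriting that lane of the saved vector result.)  */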
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm5, 32(%rsp)
+        movups    %xmm0, 48(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 edx
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -64)
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -72)
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -64)
+        cfi_offset(13, -72)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      atanhf@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 48(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN4v_atanhf_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_satanh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(16)) VUINT32 SgnMask[4][1];
+        __declspec(align(16)) VUINT32 sOne[4][1];
+        __declspec(align(16)) VUINT32 sPoly[8][4][1];
+        __declspec(align(16)) VUINT32 iBrkValue[4][1];
+        __declspec(align(16)) VUINT32 iOffExpoMask[4][1];
+        __declspec(align(16)) VUINT32 sHalf[4][1];
+        __declspec(align(16)) VUINT32 sSign[4][1];
+        __declspec(align(16)) VUINT32 sTopMask12[4][1];
+        __declspec(align(16)) VUINT32 TinyRange[4][1];
+        __declspec(align(16)) VUINT32 sLn2[4][1];
+} __svml_satanh_data_internal;
+#endif
+__svml_satanh_data_internal:
+        /*== SgnMask ==*/
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
+        /*== sOne = SP 1.0 ==*/
+        .align 16
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== sPoly[] = SP polynomial ==*/
+        .align 16
+        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
+        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
+        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
+        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
+        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
+        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
+        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
+        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
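+        /* The eight coefficients above (P0..P7) approximate
+           log1p(r) ~= r + r^2*(P0 + P1*r + ... + P7*r^7)
+           on the reduced interval r in roughly [-1/3, 1/3).  */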
+        /*== iBrkValue = SP 2/3 ==*/
+        .align 16
+        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
+        /*== iOffExpoMask = SP significand mask ==*/
+        .align 16
+        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
+        /*== sHalf ==*/
+        .align 16
+        .long 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000
+        /*== sSign ==*/
+        .align 16
+        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000
+        /*== sTopMask12 ==*/
+        .align 16
+        .long 0xFFFFF000, 0xFFFFF000, 0xFFFFF000, 0xFFFFF000
+        /*== TinyRange ==*/
+        .align 16
+        .long 0x0C000000, 0x0C000000, 0x0C000000, 0x0C000000
+        /*== sLn2 = SP ln(2) ==*/
+        .align 16
+        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
+        .align 16
+        .type	__svml_satanh_data_internal,@object
+        .size	__svml_satanh_data_internal,.-__svml_satanh_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core-sse.S
new file mode 100644
index 0000000000..b293bd5b41
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized atanhf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN8v_atanhf _ZGVdN8v_atanhf_sse_wrapper
+#include "../svml_s_atanhf8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core.c
new file mode 100644
index 0000000000..3df8d66c94
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized atanhf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN8v_atanhf
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN8v_atanhf, __GI__ZGVdN8v_atanhf,
+	       __redirect__ZGVdN8v_atanhf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core_avx2.S
new file mode 100644
index 0000000000..00225207a8
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core_avx2.S
@@ -0,0 +1,335 @@
+/* Function atanhf vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute atanh(x) as 0.5 * log((1 + x)/(1 - x))
+ *
+ *   Special cases:
+ *
+ *   atanh(0)  = 0
+ *   atanh(+1) = +INF
+ *   atanh(-1) = -INF
+ *   atanh(x)  = NaN if |x| > 1, or if x is a NaN or INF
+ *
+ */
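+/*
+ * As in the SSE4 variant, the log1p argument 1 + V/U is reduced by writing
+ * it as 2^n * m with m in roughly [2/3, 4/3) (iBrkValue below is 2/3), so
+ * that log1p(V/U) = n*ln2 + log(m); log(m) is evaluated by the sPoly
+ * polynomial in r = m - 1, and n*ln2 is added back via sLn2 at the end.
+ */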
+
+/* Offsets for data table __svml_satanh_data_internal
+ */
+#define SgnMask                       	0
+#define sOne                          	32
+#define sPoly                         	64
+#define iBrkValue                     	320
+#define iOffExpoMask                  	352
+#define sHalf                         	384
+#define sSign                         	416
+#define sTopMask12                    	448
+#define TinyRange                     	480
+#define sLn2                          	512
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN8v_atanhf_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+
+/* Load constants including One = 1 */
+        vmovups   sOne+__svml_satanh_data_internal(%rip), %ymm5
+        vmovups   sTopMask12+__svml_satanh_data_internal(%rip), %ymm13
+        vmovaps   %ymm0, %ymm6
+
+/* Strip off the sign, so treat X as positive until right at the end */
+        vandps    SgnMask+__svml_satanh_data_internal(%rip), %ymm6, %ymm10
+        vsubps    %ymm10, %ymm5, %ymm1
+
+/*
+ * Compute V = 2 * X trivially, and UHi + ULo = 1 - X in two pieces,
+ * the upper part UHi being <= 12 bits long. Then we have
+ * atanh(X) = 1/2 * log((1 + X) / (1 - X)) = 1/2 * log1p(V / (UHi + ULo)).
+ */
+        vaddps    %ymm10, %ymm10, %ymm14
+
+/*
+ * Check whether |X| < 1, in which case we use the main function.
+ * Otherwise set the rangemask so that the callout will get used.
+ * Note that this will also use the callout for NaNs since not(NaN < 1).
+ */
+        vcmpnlt_uqps %ymm5, %ymm10, %ymm7
+        vsubps    %ymm1, %ymm5, %ymm9
+        vcmplt_oqps TinyRange+__svml_satanh_data_internal(%rip), %ymm10, %ymm4
+        vrcpps    %ymm1, %ymm11
+        vsubps    %ymm10, %ymm9, %ymm12
+        vandps    %ymm13, %ymm11, %ymm0
+
+/* No need to split sU when FMA is available */
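+/* (vfnmadd213ps evaluates 1 - R*U with a single rounding, so the explicit
+   12-bit UHi/ULo split used by the non-FMA SSE4 variant is not needed.)  */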
+        vfnmadd213ps %ymm5, %ymm0, %ymm1
+        vmovaps   %ymm6, %ymm8
+        vfmadd213ps %ymm6, %ymm6, %ymm8
+        vfnmadd231ps %ymm0, %ymm12, %ymm1
+
+/*
+ * Split V as well into upper 12 bits and lower part, so that we can get
+ * a preliminary quotient estimate without rounding error.
+ */
+        vandps    %ymm13, %ymm14, %ymm15
+        vmovmskps %ymm7, %edx
+        vsubps    %ymm15, %ymm14, %ymm7
+
+/* Hence get initial quotient estimate QHi + QLo = R * VHi + R * VLo */
+        vmulps    %ymm15, %ymm0, %ymm10
+
+/* Compute D = E + E^2 */
+        vfmadd213ps %ymm1, %ymm1, %ymm1
+
+/* Record the sign for eventual reincorporation. */
+        vandps    sSign+__svml_satanh_data_internal(%rip), %ymm6, %ymm3
+
+/* Or the sign bit in with the tiny result to handle atanh(-0) correctly */
+        vorps     %ymm3, %ymm8, %ymm2
+        vmulps    %ymm7, %ymm0, %ymm8
+
+/*
+ * Compute R * (VHi + VLo) * (1 + E + E^2)
+ * = R *  (VHi + VLo) * (1 + D)
+ * = QHi + (QHi * D + QLo + QLo * D)
+ */
+        vmulps    %ymm1, %ymm10, %ymm9
+        vfmadd213ps %ymm8, %ymm8, %ymm1
+        vaddps    %ymm1, %ymm9, %ymm1
+
+/* reduction: compute r,n */
+        vmovups   iBrkValue+__svml_satanh_data_internal(%rip), %ymm9
+
+/*
+ * Now finally accumulate the high and low parts of the
+ * argument to log1p, H + L, with a final compensated summation.
+ */
+        vaddps    %ymm1, %ymm10, %ymm12
+        vsubps    %ymm12, %ymm10, %ymm11
+
+/*
+ * Now we feed into the log1p code, using H in place of _VARG1 and
+ * later incorporating L into the reduced argument.
+ * compute 1+x as high, low parts
+ */
+        vmaxps    %ymm12, %ymm5, %ymm13
+        vminps    %ymm12, %ymm5, %ymm14
+        vaddps    %ymm11, %ymm1, %ymm0
+        vaddps    %ymm14, %ymm13, %ymm1
+        vpsubd    %ymm9, %ymm1, %ymm7
+        vsubps    %ymm1, %ymm13, %ymm15
+        vpsrad    $23, %ymm7, %ymm10
+        vpand     iOffExpoMask+__svml_satanh_data_internal(%rip), %ymm7, %ymm8
+        vaddps    %ymm15, %ymm14, %ymm13
+        vpslld    $23, %ymm10, %ymm11
+        vpaddd    %ymm9, %ymm8, %ymm15
+        vaddps    %ymm13, %ymm0, %ymm14
+        vcvtdq2ps %ymm10, %ymm0
+        vpsubd    %ymm11, %ymm5, %ymm12
+
+/* polynomial evaluation */
+        vsubps    %ymm5, %ymm15, %ymm5
+        vmulps    %ymm14, %ymm12, %ymm1
+        vaddps    %ymm5, %ymm1, %ymm5
+        vmovups   sPoly+224+__svml_satanh_data_internal(%rip), %ymm1
+        vfmadd213ps sPoly+192+__svml_satanh_data_internal(%rip), %ymm5, %ymm1
+        vfmadd213ps sPoly+160+__svml_satanh_data_internal(%rip), %ymm5, %ymm1
+        vfmadd213ps sPoly+128+__svml_satanh_data_internal(%rip), %ymm5, %ymm1
+        vfmadd213ps sPoly+96+__svml_satanh_data_internal(%rip), %ymm5, %ymm1
+        vfmadd213ps sPoly+64+__svml_satanh_data_internal(%rip), %ymm5, %ymm1
+        vfmadd213ps sPoly+32+__svml_satanh_data_internal(%rip), %ymm5, %ymm1
+        vfmadd213ps sPoly+__svml_satanh_data_internal(%rip), %ymm5, %ymm1
+        vmulps    %ymm1, %ymm5, %ymm7
+        vfmadd213ps %ymm5, %ymm5, %ymm7
+
+/* final reconstruction */
+        vfmadd132ps sLn2+__svml_satanh_data_internal(%rip), %ymm7, %ymm0
+
+/* Finally, halve the result and reincorporate the sign */
+        vxorps    sHalf+__svml_satanh_data_internal(%rip), %ymm3, %ymm3
+        vmulps    %ymm0, %ymm3, %ymm0
+        vblendvps %ymm4, %ymm2, %ymm0, %ymm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm6
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %ymm6, 32(%rsp)
+        vmovups   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      atanhf@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 64(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN8v_atanhf_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_satanh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(32)) VUINT32 SgnMask[8][1];
+        __declspec(align(32)) VUINT32 sOne[8][1];
+        __declspec(align(32)) VUINT32 sPoly[8][8][1];
+        __declspec(align(32)) VUINT32 iBrkValue[8][1];
+        __declspec(align(32)) VUINT32 iOffExpoMask[8][1];
+        __declspec(align(32)) VUINT32 sHalf[8][1];
+        __declspec(align(32)) VUINT32 sSign[8][1];
+        __declspec(align(32)) VUINT32 sTopMask12[8][1];
+        __declspec(align(32)) VUINT32 TinyRange[8][1];
+        __declspec(align(32)) VUINT32 sLn2[8][1];
+} __svml_satanh_data_internal;
+#endif
+__svml_satanh_data_internal:
+        /*== SgnMask ==*/
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
+        /*== sOne = SP 1.0 ==*/
+        .align 32
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== sPoly[] = SP polynomial ==*/
+        .align 32
+        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
+        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
+        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
+        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
+        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
+        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
+        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
+        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
+        /*== iBrkValue = SP 2/3 ==*/
+        .align 32
+        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
+        /*== iOffExpoMask = SP significand mask ==*/
+        .align 32
+        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
+        /*== sHalf ==*/
+        .align 32
+        .long 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000
+        /*== sSign ==*/
+        .align 32
+        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000
+        /*== sTopMask12 ==*/
+        .align 32
+        .long 0xFFFFF000, 0xFFFFF000, 0xFFFFF000, 0xFFFFF000, 0xFFFFF000, 0xFFFFF000, 0xFFFFF000, 0xFFFFF000
+        /*== TinyRange ==*/
+        .align 32
+        .long 0x0C000000, 0x0C000000, 0x0C000000, 0x0C000000, 0x0C000000, 0x0C000000, 0x0C000000, 0x0C000000
+        /*== sLn2 = SP ln(2) ==*/
+        .align 32
+        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
+        .align 32
+        .type	__svml_satanh_data_internal,@object
+        .size	__svml_satanh_data_internal,.-__svml_satanh_data_internal
diff --git a/sysdeps/x86_64/fpu/svml_d_atanh2_core.S b/sysdeps/x86_64/fpu/svml_d_atanh2_core.S
new file mode 100644
index 0000000000..36f549ddd9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_atanh2_core.S
@@ -0,0 +1,29 @@
+/* Function atanh vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN2v_atanh)
+WRAPPER_IMPL_SSE2 atanh
+END (_ZGVbN2v_atanh)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN2v_atanh)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_atanh4_core.S b/sysdeps/x86_64/fpu/svml_d_atanh4_core.S
new file mode 100644
index 0000000000..6d6d11e85e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_atanh4_core.S
@@ -0,0 +1,29 @@
+/* Function atanh vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN4v_atanh)
+WRAPPER_IMPL_AVX _ZGVbN2v_atanh
+END (_ZGVdN4v_atanh)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN4v_atanh)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_atanh4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_atanh4_core_avx.S
new file mode 100644
index 0000000000..b4cfa275c8
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_atanh4_core_avx.S
@@ -0,0 +1,25 @@
+/* Function atanh vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN4v_atanh)
+WRAPPER_IMPL_AVX _ZGVbN2v_atanh
+END (_ZGVcN4v_atanh)
diff --git a/sysdeps/x86_64/fpu/svml_d_atanh8_core.S b/sysdeps/x86_64/fpu/svml_d_atanh8_core.S
new file mode 100644
index 0000000000..b31a6a72a1
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_atanh8_core.S
@@ -0,0 +1,25 @@
+/* Function atanh vectorized with AVX-512, wrapper to AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN8v_atanh)
+WRAPPER_IMPL_AVX512 _ZGVdN4v_atanh
+END (_ZGVeN8v_atanh)
diff --git a/sysdeps/x86_64/fpu/svml_s_atanhf16_core.S b/sysdeps/x86_64/fpu/svml_s_atanhf16_core.S
new file mode 100644
index 0000000000..2ea61888e7
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_atanhf16_core.S
@@ -0,0 +1,25 @@
+/* Function atanhf vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN16v_atanhf)
+WRAPPER_IMPL_AVX512 _ZGVdN8v_atanhf
+END (_ZGVeN16v_atanhf)
diff --git a/sysdeps/x86_64/fpu/svml_s_atanhf4_core.S b/sysdeps/x86_64/fpu/svml_s_atanhf4_core.S
new file mode 100644
index 0000000000..6904cc388a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_atanhf4_core.S
@@ -0,0 +1,29 @@
+/* Function atanhf vectorized with SSE2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN4v_atanhf)
+WRAPPER_IMPL_SSE2 atanhf
+END (_ZGVbN4v_atanhf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4v_atanhf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_atanhf8_core.S b/sysdeps/x86_64/fpu/svml_s_atanhf8_core.S
new file mode 100644
index 0000000000..31d695fb5d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_atanhf8_core.S
@@ -0,0 +1,29 @@
+/* Function atanhf vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN8v_atanhf)
+WRAPPER_IMPL_AVX _ZGVbN4v_atanhf
+END (_ZGVdN8v_atanhf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8v_atanhf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_atanhf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_atanhf8_core_avx.S
new file mode 100644
index 0000000000..6c24eaf45c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_atanhf8_core_avx.S
@@ -0,0 +1,25 @@
+/* Function atanhf vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+        .text
+ENTRY (_ZGVcN8v_atanhf)
+WRAPPER_IMPL_AVX _ZGVbN4v_atanhf
+END (_ZGVcN8v_atanhf)
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx.c
new file mode 100644
index 0000000000..0bdeec7851
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-atanh.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx2.c
new file mode 100644
index 0000000000..0bdeec7851
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx2.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-atanh.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx512f.c
new file mode 100644
index 0000000000..0bdeec7851
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx512f.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-atanh.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atanh.c b/sysdeps/x86_64/fpu/test-double-libmvec-atanh.c
new file mode 100644
index 0000000000..41dd8e7af3
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-atanh.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE double
+#define LIBMVEC_FUNC atanh
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index 38359b05e3..04a4fe654b 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -41,6 +41,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVbN2vv_atan2)
 VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVbN2v_log10)
 VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVbN2v_log2)
 VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVbN2v_log1p)
+VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVbN2v_atanh)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index 17701e7731..f9ac2fad5d 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -44,6 +44,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVdN4vv_atan2)
 VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVdN4v_log10)
 VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVdN4v_log2)
 VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVdN4v_log1p)
+VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVdN4v_atanh)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m256i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index bba62b2446..185801fa82 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -41,6 +41,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVcN4vv_atan2)
 VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVcN4v_log10)
 VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVcN4v_log2)
 VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVcN4v_log1p)
+VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVcN4v_atanh)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index 8a04e13a07..1cc8aaecbf 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -41,6 +41,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVeN8vv_atan2)
 VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVeN8v_log10)
 VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVeN8v_log2)
 VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVeN8v_log1p)
+VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVeN8v_atanh)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx.c
new file mode 100644
index 0000000000..6f89ae70f2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-atanhf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx2.c
new file mode 100644
index 0000000000..6f89ae70f2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx2.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-atanhf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx512f.c
new file mode 100644
index 0000000000..6f89ae70f2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx512f.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-atanhf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atanhf.c b/sysdeps/x86_64/fpu/test-float-libmvec-atanhf.c
new file mode 100644
index 0000000000..33a022adb8
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-atanhf.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE float
+#define LIBMVEC_FUNC atanhf
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index 706f52c618..b5d76d80e0 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -41,6 +41,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVeN16vv_atan2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVeN16v_log10f)
 VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVeN16v_log2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVeN16v_log1pf)
+VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVeN16v_atanhf)
 
 #define VEC_INT_TYPE __m512i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index ceace4c53a..c1df6a03c1 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -41,6 +41,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVbN4vv_atan2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVbN4v_log10f)
 VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVbN4v_log2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVbN4v_log1pf)
+VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVbN4v_atanhf)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index 06a4753409..f4c646683f 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -44,6 +44,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVdN8vv_atan2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVdN8v_log10f)
 VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVdN8v_log2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVdN8v_log1pf)
+VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVdN8v_atanhf)
 
 /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
 #undef VECTOR_WRAPPER_fFF
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index a87e5298e0..a6acd3ffca 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -41,6 +41,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVcN8vv_atan2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVcN8v_log10f)
 VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVcN8v_log2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVcN8v_log1pf)
+VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVcN8v_atanhf)
 
 #define VEC_INT_TYPE __m128i
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 15/18] x86-64: Add vector acosh/acoshf implementation to libmvec
  2021-12-29  6:39 [PATCH v5 00/18] x86-64: Add vector math functions to libmvec Sunil K Pandey
                   ` (13 preceding siblings ...)
  2021-12-29  6:39 ` [PATCH v5 14/18] x86-64: Add vector atanh/atanhf " Sunil K Pandey
@ 2021-12-29  6:39 ` Sunil K Pandey
  2021-12-29 21:26   ` H.J. Lu
  2021-12-29  6:39 ` [PATCH v5 16/18] x86-64: Add vector erf/erff " Sunil K Pandey
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Sunil K Pandey @ 2021-12-29  6:39 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, andrey.kolesov, marius.cornea

Implement vectorized acosh/acoshf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector acosh/acoshf with regenerated ulps.
---
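Illustrative use of the new entry points (not part of the patch): with the
__DECL_SIMD_acosh declarations added here, a loop such as the sketch below
can be auto-vectorized by GCC into calls to the _ZGV*_acosh/_ZGV*_acoshf
symbols when built with suitable flags (for example -O2 -ffast-math
-fopenmp-simd and an appropriate -march); the function name, array size and
flags are placeholders, not requirements of this patch.

    /* Hypothetical caller, for illustration only.  */
    #include <math.h>

    #define N 1024
    double in[N], out[N];

    void
    vec_acosh_example (void)
    {
      /* With acosh declared SIMD-enabled via math-vector.h, the compiler
         may emit e.g. _ZGVdN4v_acosh or _ZGVeN8v_acosh for this loop
         instead of scalar acosh calls.  */
    #pragma omp simd
      for (int i = 0; i < N; i++)
        out[i] = acosh (in[i]);
    }
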
 bits/libm-simd-decl-stubs.h                   |   11 +
 math/bits/mathcalls.h                         |    2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
 sysdeps/x86/fpu/bits/math-vector.h            |    4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
 sysdeps/x86_64/fpu/Makeconfig                 |    1 +
 sysdeps/x86_64/fpu/Versions                   |    2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |   20 +
 .../fpu/multiarch/svml_d_acosh2_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_acosh2_core.c |   27 +
 .../fpu/multiarch/svml_d_acosh2_core_sse4.S   | 1469 ++++++++++++++++
 .../fpu/multiarch/svml_d_acosh4_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_acosh4_core.c |   27 +
 .../fpu/multiarch/svml_d_acosh4_core_avx2.S   | 1536 +++++++++++++++++
 .../fpu/multiarch/svml_d_acosh8_core-avx2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_acosh8_core.c |   27 +
 .../fpu/multiarch/svml_d_acosh8_core_avx512.S |  480 ++++++
 .../fpu/multiarch/svml_s_acoshf16_core-avx2.S |   20 +
 .../fpu/multiarch/svml_s_acoshf16_core.c      |   28 +
 .../multiarch/svml_s_acoshf16_core_avx512.S   |  449 +++++
 .../fpu/multiarch/svml_s_acoshf4_core-sse2.S  |   20 +
 .../fpu/multiarch/svml_s_acoshf4_core.c       |   28 +
 .../fpu/multiarch/svml_s_acoshf4_core_sse4.S  |  389 +++++
 .../fpu/multiarch/svml_s_acoshf8_core-sse.S   |   20 +
 .../fpu/multiarch/svml_s_acoshf8_core.c       |   28 +
 .../fpu/multiarch/svml_s_acoshf8_core_avx2.S  |  370 ++++
 sysdeps/x86_64/fpu/svml_d_acosh2_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_acosh4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_acosh4_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_d_acosh8_core.S       |   25 +
 sysdeps/x86_64/fpu/svml_s_acoshf16_core.S     |   25 +
 sysdeps/x86_64/fpu/svml_s_acoshf4_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_acoshf8_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_acoshf8_core_avx.S  |   25 +
 .../fpu/test-double-libmvec-acosh-avx.c       |    1 +
 .../fpu/test-double-libmvec-acosh-avx2.c      |    1 +
 .../fpu/test-double-libmvec-acosh-avx512f.c   |    1 +
 .../x86_64/fpu/test-double-libmvec-acosh.c    |    3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
 .../fpu/test-float-libmvec-acoshf-avx.c       |    1 +
 .../fpu/test-float-libmvec-acoshf-avx2.c      |    1 +
 .../fpu/test-float-libmvec-acoshf-avx512f.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-acoshf.c    |    3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
 50 files changed, 5265 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_acosh2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_acosh4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_acosh4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_acosh8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_acoshf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_acoshf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_acoshf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_acoshf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acosh.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acoshf.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
index bb7380a446..b17bf78cd9 100644
--- a/bits/libm-simd-decl-stubs.h
+++ b/bits/libm-simd-decl-stubs.h
@@ -263,4 +263,15 @@
 #define __DECL_SIMD_atanhf32x
 #define __DECL_SIMD_atanhf64x
 #define __DECL_SIMD_atanhf128x
+
+#define __DECL_SIMD_acosh
+#define __DECL_SIMD_acoshf
+#define __DECL_SIMD_acoshl
+#define __DECL_SIMD_acoshf16
+#define __DECL_SIMD_acoshf32
+#define __DECL_SIMD_acoshf64
+#define __DECL_SIMD_acoshf128
+#define __DECL_SIMD_acoshf32x
+#define __DECL_SIMD_acoshf64x
+#define __DECL_SIMD_acoshf128x
 #endif
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index 04dd9c5d1b..bc37973c41 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -82,7 +82,7 @@ __MATHDECL_VEC (void,sincos,,
 
 #if defined __USE_XOPEN_EXTENDED || defined __USE_ISOC99
 /* Hyperbolic arc cosine of X.  */
-__MATHCALL (acosh,, (_Mdouble_ __x));
+__MATHCALL_VEC (acosh,, (_Mdouble_ __x));
 /* Hyperbolic arc sine of X.  */
 __MATHCALL (asinh,, (_Mdouble_ __x));
 /* Hyperbolic arc tangent of X.  */
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index 2d389912b1..e9d6ade70a 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -47,6 +47,7 @@ GLIBC_2.22 _ZGVeN8v_sin F
 GLIBC_2.22 _ZGVeN8vv_pow F
 GLIBC_2.22 _ZGVeN8vvv_sincos F
 GLIBC_2.35 _ZGVbN2v_acos F
+GLIBC_2.35 _ZGVbN2v_acosh F
 GLIBC_2.35 _ZGVbN2v_asin F
 GLIBC_2.35 _ZGVbN2v_atan F
 GLIBC_2.35 _ZGVbN2v_atanh F
@@ -62,6 +63,7 @@ GLIBC_2.35 _ZGVbN2v_sinh F
 GLIBC_2.35 _ZGVbN2vv_atan2 F
 GLIBC_2.35 _ZGVbN2vv_hypot F
 GLIBC_2.35 _ZGVbN4v_acosf F
+GLIBC_2.35 _ZGVbN4v_acoshf F
 GLIBC_2.35 _ZGVbN4v_asinf F
 GLIBC_2.35 _ZGVbN4v_atanf F
 GLIBC_2.35 _ZGVbN4v_atanhf F
@@ -77,6 +79,7 @@ GLIBC_2.35 _ZGVbN4v_sinhf F
 GLIBC_2.35 _ZGVbN4vv_atan2f F
 GLIBC_2.35 _ZGVbN4vv_hypotf F
 GLIBC_2.35 _ZGVcN4v_acos F
+GLIBC_2.35 _ZGVcN4v_acosh F
 GLIBC_2.35 _ZGVcN4v_asin F
 GLIBC_2.35 _ZGVcN4v_atan F
 GLIBC_2.35 _ZGVcN4v_atanh F
@@ -92,6 +95,7 @@ GLIBC_2.35 _ZGVcN4v_sinh F
 GLIBC_2.35 _ZGVcN4vv_atan2 F
 GLIBC_2.35 _ZGVcN4vv_hypot F
 GLIBC_2.35 _ZGVcN8v_acosf F
+GLIBC_2.35 _ZGVcN8v_acoshf F
 GLIBC_2.35 _ZGVcN8v_asinf F
 GLIBC_2.35 _ZGVcN8v_atanf F
 GLIBC_2.35 _ZGVcN8v_atanhf F
@@ -107,6 +111,7 @@ GLIBC_2.35 _ZGVcN8v_sinhf F
 GLIBC_2.35 _ZGVcN8vv_atan2f F
 GLIBC_2.35 _ZGVcN8vv_hypotf F
 GLIBC_2.35 _ZGVdN4v_acos F
+GLIBC_2.35 _ZGVdN4v_acosh F
 GLIBC_2.35 _ZGVdN4v_asin F
 GLIBC_2.35 _ZGVdN4v_atan F
 GLIBC_2.35 _ZGVdN4v_atanh F
@@ -122,6 +127,7 @@ GLIBC_2.35 _ZGVdN4v_sinh F
 GLIBC_2.35 _ZGVdN4vv_atan2 F
 GLIBC_2.35 _ZGVdN4vv_hypot F
 GLIBC_2.35 _ZGVdN8v_acosf F
+GLIBC_2.35 _ZGVdN8v_acoshf F
 GLIBC_2.35 _ZGVdN8v_asinf F
 GLIBC_2.35 _ZGVdN8v_atanf F
 GLIBC_2.35 _ZGVdN8v_atanhf F
@@ -137,6 +143,7 @@ GLIBC_2.35 _ZGVdN8v_sinhf F
 GLIBC_2.35 _ZGVdN8vv_atan2f F
 GLIBC_2.35 _ZGVdN8vv_hypotf F
 GLIBC_2.35 _ZGVeN16v_acosf F
+GLIBC_2.35 _ZGVeN16v_acoshf F
 GLIBC_2.35 _ZGVeN16v_asinf F
 GLIBC_2.35 _ZGVeN16v_atanf F
 GLIBC_2.35 _ZGVeN16v_atanhf F
@@ -152,6 +159,7 @@ GLIBC_2.35 _ZGVeN16v_sinhf F
 GLIBC_2.35 _ZGVeN16vv_atan2f F
 GLIBC_2.35 _ZGVeN16vv_hypotf F
 GLIBC_2.35 _ZGVeN8v_acos F
+GLIBC_2.35 _ZGVeN8v_acosh F
 GLIBC_2.35 _ZGVeN8v_asin F
 GLIBC_2.35 _ZGVeN8v_atan F
 GLIBC_2.35 _ZGVeN8v_atanh F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index 4937b6811f..4ad12a33e5 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -118,6 +118,10 @@
 #  define __DECL_SIMD_atanh __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_atanhf
 #  define __DECL_SIMD_atanhf __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_acosh
+#  define __DECL_SIMD_acosh __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_acoshf
+#  define __DECL_SIMD_acoshf __DECL_SIMD_x86_64
 
 # endif
 #endif
diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
index da39c08ba9..503547d3e4 100644
--- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
@@ -58,6 +58,8 @@
 !GCC$ builtin (log1pf) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (atanh) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (atanhf) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (acosh) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (acoshf) attributes simd (notinbranch) if('x86_64')
 
 !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
@@ -101,3 +103,5 @@
 !GCC$ builtin (log1pf) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (atanh) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (atanhf) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (acosh) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (acoshf) attributes simd (notinbranch) if('x32')
diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
index de87544259..7b90b3d049 100644
--- a/sysdeps/x86_64/fpu/Makeconfig
+++ b/sysdeps/x86_64/fpu/Makeconfig
@@ -23,6 +23,7 @@ postclean-generated += libmvec.mk
 # Define for both math and mathvec directories.
 libmvec-funcs = \
   acos \
+  acosh \
   asin \
   atan \
   atan2 \
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index df0ea83711..fd5e5923a1 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -15,6 +15,7 @@ libmvec {
   }
   GLIBC_2.35 {
     _ZGVbN2v_acos; _ZGVcN4v_acos; _ZGVdN4v_acos; _ZGVeN8v_acos;
+    _ZGVbN2v_acosh; _ZGVcN4v_acosh; _ZGVdN4v_acosh; _ZGVeN8v_acosh;
     _ZGVbN2v_asin; _ZGVcN4v_asin; _ZGVdN4v_asin; _ZGVeN8v_asin;
     _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan;
     _ZGVbN2v_atanh; _ZGVcN4v_atanh; _ZGVdN4v_atanh; _ZGVeN8v_atanh;
@@ -30,6 +31,7 @@ libmvec {
     _ZGVbN2vv_atan2; _ZGVcN4vv_atan2; _ZGVdN4vv_atan2; _ZGVeN8vv_atan2;
     _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
     _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
+    _ZGVbN4v_acoshf; _ZGVcN8v_acoshf; _ZGVdN8v_acoshf; _ZGVeN16v_acoshf;
     _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
     _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf;
     _ZGVbN4v_atanhf; _ZGVcN8v_atanhf; _ZGVdN8v_atanhf; _ZGVeN16v_atanhf;
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 09a46190b6..b2aa8fc56e 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -69,6 +69,26 @@ float: 2
 float128: 3
 ldouble: 3
 
+Function: "acosh_vlen16":
+float: 1
+
+Function: "acosh_vlen2":
+double: 2
+
+Function: "acosh_vlen4":
+double: 2
+float: 1
+
+Function: "acosh_vlen4_avx2":
+double: 2
+
+Function: "acosh_vlen8":
+double: 1
+float: 1
+
+Function: "acosh_vlen8_avx2":
+float: 2
+
 Function: "asin":
 double: 1
 float: 1
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core-sse2.S
new file mode 100644
index 0000000000..28620a03a9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized acosh, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN2v_acosh _ZGVbN2v_acosh_sse2
+#include "../svml_d_acosh2_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core.c
new file mode 100644
index 0000000000..8a41507326
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized acosh, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN2v_acosh
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN2v_acosh, __GI__ZGVbN2v_acosh, __redirect__ZGVbN2v_acosh)
+  __attribute__ ((visibility ("hidden")));
+#endif
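+
+/* Usage note (illustrative only): user code never calls the _sse2/_sse4
+   variants directly.  With GCC >= 6, a loop such as the following
+   (hypothetical example, not part of glibc)
+
+     // gcc -O3 -ffast-math example.c
+     #include <math.h>
+     void
+     map_acosh (double *y, const double *x, int n)
+     {
+       for (int i = 0; i < n; i++)
+         y[i] = acosh (x[i]);
+     }
+
+   can be auto-vectorized into calls to _ZGVbN2v_acosh (or a wider variant,
+   depending on -march), and the IFUNC selector above then picks the SSE4.1
+   or SSE2 implementation at load time.  */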
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core_sse4.S
new file mode 100644
index 0000000000..6455f57ce7
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core_sse4.S
@@ -0,0 +1,1469 @@
+/* Function acosh vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute acosh(x) as log(x + sqrt(x*x - 1))
+ *
+ *   Special cases:
+ *
+ *   acosh(NaN)  = quiet NaN, and raise invalid exception
+ *   acosh(-INF) = NaN
+ *   acosh(+INF) = +INF
+ *   acosh(x)    = NaN if x < 1
+ *   acosh(1)    = +0
+ *
+ */
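+
+/* For reference, a minimal scalar C sketch of the formula above (illustrative
+   only; acosh_ref is a made-up helper, not part of glibc).  It follows the
+   special-case table above (NaN in, NaN out; +INF in, +INF out; exactly 1 in,
+   +0 out), but the naive x*x overflows once x exceeds roughly 2^512, which is
+   why the vector code below rescales large inputs (see dBigThreshold/XScale)
+   before taking the log:
+
+     #include <math.h>
+
+     static double
+     acosh_ref (double x)   // illustrative helper only
+     {
+       if (x < 1.0)
+         return (x - x) / (x - x);   // NaN (invalid) for x < 1 or x = -Inf
+       return log (x + sqrt (x * x - 1.0));
+     }
+*/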
+
+/* Offsets for data table __svml_dacosh_data_internal
+ */
+#define Log_HA_table                  	0
+#define Log_LA_table                  	8208
+#define poly_coeff                    	12320
+#define ExpMask                       	12384
+#define Two10                         	12400
+#define MinLog1p                      	12416
+#define MaxLog1p                      	12432
+#define One                           	12448
+#define SgnMask                       	12464
+#define XThreshold                    	12480
+#define XhMask                        	12496
+#define Threshold                     	12512
+#define Bias                          	12528
+#define Bias1                         	12544
+#define ExpMask0                      	12560
+#define ExpMask2                      	12576
+#define L2                            	12592
+#define dBigThreshold                 	12608
+#define dLargestFinite                	12624
+#define dThirtyOne                    	12640
+#define XScale                        	12656
+
+/* Lookup bias for data table __svml_dacosh_data_internal.  */
+#define Table_Lookup_Bias               -0x405ff0
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN2v_acosh_sse4)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $64, %rsp
+        movaps    %xmm0, %xmm7
+
+/* Load the constant 1.0, which is reused throughout the computation below */
+        movups    One+__svml_dacosh_data_internal(%rip), %xmm6
+
+/* Compute U = X - 1 and V = X + 1, naively first. */
+        movaps    %xmm7, %xmm11
+        movaps    %xmm6, %xmm10
+
+/*
+ * Now we feed into the log1p code, using H in place of _VARG1 and
+ * also adding L into Xl.
+ * compute 1+x as high, low parts
+ */
+        movaps    %xmm6, %xmm14
+        subpd     %xmm6, %xmm11
+        addpd     %xmm7, %xmm10
+
+/* For low-accuracy versions, naivety is harmless */
+        mulpd     %xmm11, %xmm10
+
+/* dH = [X + sqrt(X^2 - 1)] - 1 */
+        sqrtpd    %xmm10, %xmm13
+        addpd     %xmm11, %xmm13
+        maxpd     %xmm13, %xmm14
+        movaps    %xmm6, %xmm4
+
+/*
+ * The following computation can go wrong for very large X, e.g.
+ * the X^2 - 1 = U * V can overflow. But for large X we have
+ * acosh(X) / log(2 X) - 1 =~= 1/(4 * X^2), so for X >= 2^30
+ * we can just later stick X back into the log and tweak up the exponent.
+ * Actually we scale X by 2^-30 and tweak the exponent up by 31,
+ * to stay in the safe range for the later log computation.
+ * Compute a flag now telling us when to do this.
+ */
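+/*
+ * (Concretely: log(X + sqrt(X*X - 1)) = log(2*X) - 1/(4*X*X) - ..., so at
+ * X = 2^30 the dropped correction is only 2^-62 against a result of about
+ * 21.5, while log(2^-30 * X) + 31*log(2) is identically log(2*X).)
+ */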
+        movaps    %xmm7, %xmm5
+        minpd     %xmm13, %xmm4
+        cmpltpd   dBigThreshold+__svml_dacosh_data_internal(%rip), %xmm5
+        movups    SgnMask+__svml_dacosh_data_internal(%rip), %xmm12
+        movaps    %xmm14, %xmm0
+
+/* Now multiplex to the case X = 2^-30 * input, Xl = dL = 0 in the "big" case. */
+        movups    XScale+__svml_dacosh_data_internal(%rip), %xmm15
+        andps     %xmm12, %xmm13
+        mulpd     %xmm7, %xmm15
+        cmpltpd   XThreshold+__svml_dacosh_data_internal(%rip), %xmm13
+        addpd     %xmm4, %xmm0
+        orps      XhMask+__svml_dacosh_data_internal(%rip), %xmm13
+        movaps    %xmm5, %xmm3
+        andps     %xmm13, %xmm0
+        andnps    %xmm15, %xmm3
+        subpd     %xmm0, %xmm14
+        andps     %xmm5, %xmm0
+
+/*
+ * Check that 1 < X < +inf; otherwise go to the callout function.
+ * We need the callout for X = 1 to avoid division by zero below.
+ * This test ensures that callout handles NaN and either infinity.
+ */
+        movaps    %xmm7, %xmm9
+
+/* Now resume the main code. */
+        movups    ExpMask+__svml_dacosh_data_internal(%rip), %xmm1
+        orps      %xmm0, %xmm3
+
+/* preserve mantissa, set input exponent to 2^(-10) */
+        andps     %xmm3, %xmm1
+        movaps    %xmm6, %xmm8
+        orps      Two10+__svml_dacosh_data_internal(%rip), %xmm1
+
+/* exponent bits */
+        movaps    %xmm3, %xmm11
+
+/* reciprocal approximation good to at least 11 bits */
+        cvtpd2ps  %xmm1, %xmm2
+        cmpnlepd  dLargestFinite+__svml_dacosh_data_internal(%rip), %xmm9
+        cmpnltpd  %xmm7, %xmm8
+        addpd     %xmm14, %xmm4
+        movlhps   %xmm2, %xmm2
+        orps      %xmm8, %xmm9
+        rcpps     %xmm2, %xmm8
+        movmskpd  %xmm9, %edx
+
+/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
+        movups    .FLT_20(%rip), %xmm10
+        andps     %xmm5, %xmm4
+
+/* exponent of X needed to scale Xl */
+        movdqu    ExpMask0+__svml_dacosh_data_internal(%rip), %xmm9
+        psrlq     $20, %xmm11
+        cvtps2pd  %xmm8, %xmm1
+        addpd     %xmm10, %xmm1
+        subpd     %xmm10, %xmm1
+
+/* 2^ (-10-exp(X) ) */
+        movdqu    ExpMask2+__svml_dacosh_data_internal(%rip), %xmm2
+        pand      %xmm3, %xmm9
+        psubq     %xmm9, %xmm2
+
+/* scale DblRcp */
+        mulpd     %xmm1, %xmm2
+
+/* argument reduction */
+        mulpd     %xmm2, %xmm3
+        mulpd     %xmm2, %xmm4
+        subpd     %xmm6, %xmm3
+        movaps    %xmm3, %xmm2
+        movaps    %xmm5, %xmm0
+        addpd     %xmm4, %xmm2
+        pshufd    $221, %xmm11, %xmm12
+        movaps    %xmm2, %xmm6
+
+/* biased exponent in DP format */
+        cvtdq2pd  %xmm12, %xmm14
+        subpd     %xmm3, %xmm6
+
+/* polynomial */
+        movups    poly_coeff+__svml_dacosh_data_internal(%rip), %xmm3
+        lea       Table_Lookup_Bias+__svml_dacosh_data_internal(%rip), %rsi
+        mulpd     %xmm2, %xmm3
+        subpd     %xmm6, %xmm4
+        addpd     poly_coeff+16+__svml_dacosh_data_internal(%rip), %xmm3
+
+/* Add 31 to the exponent in the "large" case to get log(2 * input) */
+        movups    dThirtyOne+__svml_dacosh_data_internal(%rip), %xmm13
+
+/* exponent*log(2.0) */
+        movups    Threshold+__svml_dacosh_data_internal(%rip), %xmm8
+        addpd     %xmm14, %xmm13
+        cmpltpd   %xmm1, %xmm8
+        andps     %xmm5, %xmm14
+
+/*
+ * prepare table index
+ * table lookup
+ */
+        movaps    %xmm1, %xmm5
+        movaps    %xmm2, %xmm1
+        andnps    %xmm13, %xmm0
+        mulpd     %xmm2, %xmm1
+        movups    poly_coeff+32+__svml_dacosh_data_internal(%rip), %xmm6
+        psrlq     $40, %xmm5
+        mulpd     %xmm2, %xmm6
+        mulpd     %xmm1, %xmm3
+        addpd     poly_coeff+48+__svml_dacosh_data_internal(%rip), %xmm6
+        movd      %xmm5, %eax
+        andps     Bias+__svml_dacosh_data_internal(%rip), %xmm8
+        orps      %xmm14, %xmm0
+        addpd     %xmm3, %xmm6
+
+/*
+ * reconstruction
+ * VQFMA( D, R, P, R2, R );
+ */
+        mulpd     %xmm6, %xmm1
+        addpd     %xmm1, %xmm4
+        orps      Bias1+__svml_dacosh_data_internal(%rip), %xmm8
+        pshufd    $2, %xmm5, %xmm15
+        subpd     %xmm8, %xmm0
+        addpd     %xmm4, %xmm2
+        movd      %xmm15, %ecx
+        mulpd     L2+__svml_dacosh_data_internal(%rip), %xmm0
+        movslq    %eax, %rax
+        movslq    %ecx, %rcx
+        movsd     (%rsi,%rax), %xmm9
+        movhpd    (%rsi,%rcx), %xmm9
+        addpd     %xmm2, %xmm9
+        addpd     %xmm9, %xmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx xmm0 xmm7
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm7, 32(%rsp)
+        movups    %xmm0, 48(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $2, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 xmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      acosh@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 48(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVbN2v_acosh_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_dacosh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(16)) VUINT32 Log_HA_table[(1<<10)+2][2];
+        __declspec(align(16)) VUINT32 Log_LA_table[(1<<9)+1][2];
+        __declspec(align(16)) VUINT32 poly_coeff[4][2][2];
+        __declspec(align(16)) VUINT32 ExpMask[2][2];
+        __declspec(align(16)) VUINT32 Two10[2][2];
+        __declspec(align(16)) VUINT32 MinLog1p[2][2];
+        __declspec(align(16)) VUINT32 MaxLog1p[2][2];
+        __declspec(align(16)) VUINT32 One[2][2];
+        __declspec(align(16)) VUINT32 SgnMask[2][2];
+        __declspec(align(16)) VUINT32 XThreshold[2][2];
+        __declspec(align(16)) VUINT32 XhMask[2][2];
+        __declspec(align(16)) VUINT32 Threshold[2][2];
+        __declspec(align(16)) VUINT32 Bias[2][2];
+        __declspec(align(16)) VUINT32 Bias1[2][2];
+        __declspec(align(16)) VUINT32 ExpMask0[2][2];
+        __declspec(align(16)) VUINT32 ExpMask2[2][2];
+        __declspec(align(16)) VUINT32 L2[2][2];
+        __declspec(align(16)) VUINT32 dBigThreshold[2][2];
+        __declspec(align(16)) VUINT32 dLargestFinite[2][2];
+        __declspec(align(16)) VUINT32 dThirtyOne[2][2];
+        __declspec(align(16)) VUINT32 XScale[2][2];
+} __svml_dacosh_data_internal;
+#endif
+__svml_dacosh_data_internal:
+        /* Log_HA_table */
+        .quad 0xc086232bdd7a8300, 0xbe1ce91eef3fb100
+        .quad 0xc086232fdc7ad828, 0xbe1cefcffda73b6a
+        .quad 0xc0862333d97d2ba0, 0xbe1cef406748f1ff
+        .quad 0xc0862337d48378e0, 0xbe1cef2a9429925a
+        .quad 0xc086233bcd8fb878, 0xbe1cf138d17ebecb
+        .quad 0xc086233fc4a3e018, 0xbe1ceff2dbbbb29e
+        .quad 0xc0862343b9c1e270, 0xbe1cf1a42aae437b
+        .quad 0xc0862347acebaf68, 0xbe1cef3b152048af
+        .quad 0xc086234b9e2333f0, 0xbe1cef20e127805e
+        .quad 0xc086234f8d6a5a30, 0xbe1cf00ad6052cf4
+        .quad 0xc08623537ac30980, 0xbe1cefc4642ee597
+        .quad 0xc0862357662f2660, 0xbe1cf1f277d36e16
+        .quad 0xc086235b4fb092a0, 0xbe1ceed009e8d8e6
+        .quad 0xc086235f37492d28, 0xbe1cf1e4038cb362
+        .quad 0xc08623631cfad250, 0xbe1cf0b0873b8557
+        .quad 0xc086236700c75b98, 0xbe1cf15bb3227c0b
+        .quad 0xc086236ae2b09fe0, 0xbe1cf151ef8ca9ed
+        .quad 0xc086236ec2b87358, 0xbe1cefe1dc2cd2ed
+        .quad 0xc0862372a0e0a780, 0xbe1cf0d1eec5454f
+        .quad 0xc08623767d2b0b48, 0xbe1ceeefd570bbce
+        .quad 0xc086237a57996af0, 0xbe1cee99ae91b3a7
+        .quad 0xc086237e302d9028, 0xbe1cf0412830fbd1
+        .quad 0xc086238206e94218, 0xbe1ceee898588610
+        .quad 0xc0862385dbce4548, 0xbe1cee9a1fbcaaea
+        .quad 0xc0862389aede5bc0, 0xbe1ceed8e7cc1ad6
+        .quad 0xc086238d801b4500, 0xbe1cf10c8d059da6
+        .quad 0xc08623914f86be18, 0xbe1ceee6c63a8165
+        .quad 0xc08623951d228180, 0xbe1cf0c3592d2ff1
+        .quad 0xc0862398e8f04758, 0xbe1cf0026cc4cb1b
+        .quad 0xc086239cb2f1c538, 0xbe1cf15d48d8e670
+        .quad 0xc08623a07b28ae60, 0xbe1cef359363787c
+        .quad 0xc08623a44196b390, 0xbe1cefdf1ab2e82c
+        .quad 0xc08623a8063d8338, 0xbe1cefe43c02aa84
+        .quad 0xc08623abc91ec960, 0xbe1cf044f5ae35b7
+        .quad 0xc08623af8a3c2fb8, 0xbe1cf0b0b4001e1b
+        .quad 0xc08623b349975d98, 0xbe1cf1bae76dfbcf
+        .quad 0xc08623b70731f810, 0xbe1cef0a72e13a62
+        .quad 0xc08623bac30da1c8, 0xbe1cf184007d2b6b
+        .quad 0xc08623be7d2bfb40, 0xbe1cf16f4b239e98
+        .quad 0xc08623c2358ea2a0, 0xbe1cf0976acada87
+        .quad 0xc08623c5ec3733d0, 0xbe1cf066318a16ff
+        .quad 0xc08623c9a1274880, 0xbe1ceffaa7148798
+        .quad 0xc08623cd54607820, 0xbe1cf23ab02e9b6e
+        .quad 0xc08623d105e45800, 0xbe1cefdfef7d4fde
+        .quad 0xc08623d4b5b47b20, 0xbe1cf17fece44f2b
+        .quad 0xc08623d863d27270, 0xbe1cf18f907d0d7c
+        .quad 0xc08623dc103fccb0, 0xbe1cee61fe072c98
+        .quad 0xc08623dfbafe1668, 0xbe1cf022dd891e2f
+        .quad 0xc08623e3640eda20, 0xbe1ceecc1daf4358
+        .quad 0xc08623e70b73a028, 0xbe1cf0173c4fa380
+        .quad 0xc08623eab12deec8, 0xbe1cf16a2150c2f4
+        .quad 0xc08623ee553f4a30, 0xbe1cf1bf980b1f4b
+        .quad 0xc08623f1f7a93480, 0xbe1cef8b731663c2
+        .quad 0xc08623f5986d2dc0, 0xbe1cee9a664d7ef4
+        .quad 0xc08623f9378cb3f0, 0xbe1cf1eda2af6400
+        .quad 0xc08623fcd5094320, 0xbe1cf1923f9d68d7
+        .quad 0xc086240070e45548, 0xbe1cf0747cd3e03a
+        .quad 0xc08624040b1f6260, 0xbe1cf22ee855bd6d
+        .quad 0xc0862407a3bbe078, 0xbe1cf0d57360c00b
+        .quad 0xc086240b3abb4398, 0xbe1ceebc815cd575
+        .quad 0xc086240ed01efdd0, 0xbe1cf03bfb970951
+        .quad 0xc086241263e87f50, 0xbe1cf16e74768529
+        .quad 0xc0862415f6193658, 0xbe1cefec64b8becb
+        .quad 0xc086241986b28f30, 0xbe1cf0838d210baa
+        .quad 0xc086241d15b5f448, 0xbe1cf0ea86e75b11
+        .quad 0xc0862420a324ce28, 0xbe1cf1708d11d805
+        .quad 0xc08624242f008380, 0xbe1ceea988c5a417
+        .quad 0xc0862427b94a7910, 0xbe1cef166a7bbca5
+        .quad 0xc086242b420411d0, 0xbe1cf0c9d9e86a38
+        .quad 0xc086242ec92eaee8, 0xbe1cef0946455411
+        .quad 0xc08624324ecbaf98, 0xbe1cefea60907739
+        .quad 0xc0862435d2dc7160, 0xbe1cf1ed0934ce42
+        .quad 0xc086243955624ff8, 0xbe1cf191ba746c7d
+        .quad 0xc086243cd65ea548, 0xbe1ceeec78cf2a7e
+        .quad 0xc086244055d2c968, 0xbe1cef345284c119
+        .quad 0xc0862443d3c012b8, 0xbe1cf24f77355219
+        .quad 0xc08624475027d5e8, 0xbe1cf05bf087e114
+        .quad 0xc086244acb0b65d0, 0xbe1cef3504a32189
+        .quad 0xc086244e446c1398, 0xbe1ceff54b2a406f
+        .quad 0xc0862451bc4b2eb8, 0xbe1cf0757d54ed4f
+        .quad 0xc086245532aa04f0, 0xbe1cf0c8099fdfd5
+        .quad 0xc0862458a789e250, 0xbe1cf0b173796a31
+        .quad 0xc086245c1aec1138, 0xbe1cf11d8734540d
+        .quad 0xc086245f8cd1da60, 0xbe1cf1916a723ceb
+        .quad 0xc0862462fd3c84d8, 0xbe1cf19a911e1da7
+        .quad 0xc08624666c2d5608, 0xbe1cf23a9ef72e4f
+        .quad 0xc0862469d9a591c0, 0xbe1cef503d947663
+        .quad 0xc086246d45a67a18, 0xbe1cf0fceeb1a0b2
+        .quad 0xc0862470b0314fa8, 0xbe1cf107e27e4fbc
+        .quad 0xc086247419475160, 0xbe1cf03dd9922331
+        .quad 0xc086247780e9bc98, 0xbe1cefce1a10e129
+        .quad 0xc086247ae719cd18, 0xbe1ceea47f73c4f6
+        .quad 0xc086247e4bd8bd10, 0xbe1ceec0ac56d100
+        .quad 0xc0862481af27c528, 0xbe1cee8a6593278a
+        .quad 0xc086248511081c70, 0xbe1cf2231dd9dec7
+        .quad 0xc0862488717af888, 0xbe1cf0b4b8ed7da8
+        .quad 0xc086248bd0818d68, 0xbe1cf1bd8d835002
+        .quad 0xc086248f2e1d0d98, 0xbe1cf259acc107f4
+        .quad 0xc08624928a4eaa20, 0xbe1cee897636b00c
+        .quad 0xc0862495e5179270, 0xbe1cee757f20c326
+        .quad 0xc08624993e78f490, 0xbe1cefafd3aa54a4
+        .quad 0xc086249c9673fd10, 0xbe1cee7298d38b97
+        .quad 0xc086249fed09d6f8, 0xbe1ceedc158d4ceb
+        .quad 0xc08624a3423babe0, 0xbe1cf2282987cb2e
+        .quad 0xc08624a6960aa400, 0xbe1cefe7381ecc4b
+        .quad 0xc08624a9e877e600, 0xbe1cef328dbbce80
+        .quad 0xc08624ad39849728, 0xbe1cefde45f3cc71
+        .quad 0xc08624b08931db58, 0xbe1cefa8b89433b9
+        .quad 0xc08624b3d780d500, 0xbe1cef6773c0b139
+        .quad 0xc08624b72472a528, 0xbe1cf031c931c11f
+        .quad 0xc08624ba70086b78, 0xbe1cf088f49275e7
+        .quad 0xc08624bdba434630, 0xbe1cf17de0eaa86d
+        .quad 0xc08624c103245238, 0xbe1cefd492f1ba75
+        .quad 0xc08624c44aacab08, 0xbe1cf1253e154466
+        .quad 0xc08624c790dd6ad0, 0xbe1cf0fb09ee6d55
+        .quad 0xc08624cad5b7aa58, 0xbe1cf1f08dd048fe
+        .quad 0xc08624ce193c8120, 0xbe1ceeca0809697f
+        .quad 0xc08624d15b6d0538, 0xbe1cef8d5662d968
+        .quad 0xc08624d49c4a4b78, 0xbe1cee97b556ed78
+        .quad 0xc08624d7dbd56750, 0xbe1cf1b14b6acb75
+        .quad 0xc08624db1a0f6b00, 0xbe1cef1e860623f2
+        .quad 0xc08624de56f96758, 0xbe1ceeaf4d156f3d
+        .quad 0xc08624e192946bf0, 0xbe1ceecc12b400ed
+        .quad 0xc08624e4cce18710, 0xbe1cf180c40c794f
+        .quad 0xc08624e805e1c5c8, 0xbe1cf185a08f7f65
+        .quad 0xc08624eb3d9633d8, 0xbe1cef45fc924078
+        .quad 0xc08624ee73ffdbb0, 0xbe1cf1e4f457f32a
+        .quad 0xc08624f1a91fc6a0, 0xbe1cf040147b8a5a
+        .quad 0xc08624f4dcf6fc98, 0xbe1cf1effca0dfb2
+        .quad 0xc08624f80f868468, 0xbe1cf0470146e5bc
+        .quad 0xc08624fb40cf6390, 0xbe1cef4dd186e501
+        .quad 0xc08624fe70d29e60, 0xbe1ceebe257f66c7
+        .quad 0xc08625019f9137f0, 0xbe1ceefb7a1c395c
+        .quad 0xc0862504cd0c3220, 0xbe1cf209dedfed8c
+        .quad 0xc0862507f9448db0, 0xbe1cf082da464994
+        .quad 0xc086250b243b4a18, 0xbe1cee88694a73cf
+        .quad 0xc086250e4df165a0, 0xbe1cf0b61e8f0531
+        .quad 0xc08625117667dd78, 0xbe1cf1106599c962
+        .quad 0xc08625149d9fad98, 0xbe1ceff1ee88af1f
+        .quad 0xc0862517c399d0c8, 0xbe1cf0f746994ef6
+        .quad 0xc086251ae85740b8, 0xbe1cefe8a1d077e4
+        .quad 0xc086251e0bd8f5e0, 0xbe1cf1a1da036092
+        .quad 0xc08625212e1fe7a8, 0xbe1cf0f8a7786fcd
+        .quad 0xc08625244f2d0c48, 0xbe1cefa1174a07a7
+        .quad 0xc08625276f0158d8, 0xbe1cef1043aa5b25
+        .quad 0xc086252a8d9dc150, 0xbe1cf15d521c169d
+        .quad 0xc086252dab033898, 0xbe1cf220bba8861f
+        .quad 0xc0862530c732b078, 0xbe1cef51e310eae2
+        .quad 0xc0862533e22d1988, 0xbe1cf222fcedd8ae
+        .quad 0xc0862536fbf36370, 0xbe1cefdb4da4bda8
+        .quad 0xc086253a14867ca0, 0xbe1ceeafc1112171
+        .quad 0xc086253d2be75280, 0xbe1cee99dfb4b408
+        .quad 0xc08625404216d160, 0xbe1cf22d2536f06b
+        .quad 0xc08625435715e498, 0xbe1cef6abbf2e268
+        .quad 0xc08625466ae57648, 0xbe1cf093a14789f5
+        .quad 0xc08625497d866fa0, 0xbe1cf0f93655603c
+        .quad 0xc086254c8ef9b8b8, 0xbe1cf1cc40c9aafc
+        .quad 0xc086254f9f4038a8, 0xbe1ceeea5f4e9157
+        .quad 0xc0862552ae5ad568, 0xbe1cefa9f52d4997
+        .quad 0xc0862555bc4a7400, 0xbe1cefa490a638ff
+        .quad 0xc0862558c90ff868, 0xbe1cef7fcf797d6f
+        .quad 0xc086255bd4ac4590, 0xbe1cf1b4c51113c9
+        .quad 0xc086255edf203d78, 0xbe1cef55e5b4a55d
+        .quad 0xc0862561e86cc100, 0xbe1cf0d37a25f9dc
+        .quad 0xc0862564f092b028, 0xbe1ceebe9efc19d9
+        .quad 0xc0862567f792e9d8, 0xbe1cee8ad30a57b5
+        .quad 0xc086256afd6e4c08, 0xbe1cef4e1817b90b
+        .quad 0xc086256e0225b3b8, 0xbe1cee7fa9229996
+        .quad 0xc086257105b9fce0, 0xbe1cf0b54963d945
+        .quad 0xc0862574082c0298, 0xbe1cee5f2f3c7995
+        .quad 0xc0862577097c9ee0, 0xbe1cf0828e303a2c
+        .quad 0xc086257a09acaae0, 0xbe1cf172c3078947
+        .quad 0xc086257d08bcfec0, 0xbe1cf189252afa22
+        .quad 0xc086258006ae71b8, 0xbe1cefdb80426923
+        .quad 0xc08625830381da08, 0xbe1ceef1391a0372
+        .quad 0xc0862585ff380d00, 0xbe1cf17720c78d13
+        .quad 0xc0862588f9d1df18, 0xbe1ceef1f9027d83
+        .quad 0xc086258bf35023b8, 0xbe1cf06fac99dec9
+        .quad 0xc086258eebb3ad78, 0xbe1cf1373eeb45c0
+        .quad 0xc0862591e2fd4e00, 0xbe1cef777536bb81
+        .quad 0xc0862594d92dd600, 0xbe1cf0f43ca40766
+        .quad 0xc0862597ce461558, 0xbe1cefb2cfc6766b
+        .quad 0xc086259ac246daf0, 0xbe1ceea49e64ffa2
+        .quad 0xc086259db530f4c8, 0xbe1cf250fa457dec
+        .quad 0xc08625a0a7053018, 0xbe1cf17d8bb2a44e
+        .quad 0xc08625a397c45918, 0xbe1cf1d5906d54b7
+        .quad 0xc08625a6876f3b30, 0xbe1cf08fe7b31780
+        .quad 0xc08625a97606a0e0, 0xbe1cef13edfc9d11
+        .quad 0xc08625ac638b53c8, 0xbe1cef9d2b107219
+        .quad 0xc08625af4ffe1cb0, 0xbe1cf1ddd4ff6160
+        .quad 0xc08625b23b5fc390, 0xbe1cefa02a996495
+        .quad 0xc08625b525b10f68, 0xbe1cf166a7e37ee5
+        .quad 0xc08625b80ef2c680, 0xbe1cef0b171068a5
+        .quad 0xc08625baf725ae28, 0xbe1cf05c80779283
+        .quad 0xc08625bdde4a8af0, 0xbe1cf1bbfbffb889
+        .quad 0xc08625c0c4622090, 0xbe1cf0b8666c0124
+        .quad 0xc08625c3a96d31e0, 0xbe1cf0a8fcf47a86
+        .quad 0xc08625c68d6c80f0, 0xbe1cef46e18cb092
+        .quad 0xc08625c97060cef0, 0xbe1cf1458a350efb
+        .quad 0xc08625cc524adc58, 0xbe1ceeea1dadce12
+        .quad 0xc08625cf332b68b0, 0xbe1cf0a1bfdc44c7
+        .quad 0xc08625d2130332d0, 0xbe1cef96d02da73e
+        .quad 0xc08625d4f1d2f8a8, 0xbe1cf2451c3c7701
+        .quad 0xc08625d7cf9b7778, 0xbe1cf10d08f83812
+        .quad 0xc08625daac5d6ba0, 0xbe1ceec5b4895c5e
+        .quad 0xc08625dd881990b0, 0xbe1cf14e1325c5e4
+        .quad 0xc08625e062d0a188, 0xbe1cf21d0904be12
+        .quad 0xc08625e33c835838, 0xbe1ceed0839bcf21
+        .quad 0xc08625e615326df0, 0xbe1cf1bb944889d2
+        .quad 0xc08625e8ecde9b48, 0xbe1cee738e85eece
+        .quad 0xc08625ebc38897e0, 0xbe1cf25c2bc6ef12
+        .quad 0xc08625ee99311ac8, 0xbe1cf132b70a41ad
+        .quad 0xc08625f16dd8da28, 0xbe1cf1984236a6e3
+        .quad 0xc08625f441808b78, 0xbe1cf19ae74998f9
+        .quad 0xc08625f71428e370, 0xbe1cef3e175d61a1
+        .quad 0xc08625f9e5d295f8, 0xbe1cf101f9868fd9
+        .quad 0xc08625fcb67e5658, 0xbe1cee69db83dcd2
+        .quad 0xc08625ff862cd6f8, 0xbe1cf081b636af51
+        .quad 0xc086260254dec9a8, 0xbe1cee62c7d59b3e
+        .quad 0xc08626052294df58, 0xbe1cf1b745c57716
+        .quad 0xc0862607ef4fc868, 0xbe1cef3d2800ea23
+        .quad 0xc086260abb103458, 0xbe1cef480ff1acd2
+        .quad 0xc086260d85d6d200, 0xbe1cf2424c9a17ef
+        .quad 0xc08626104fa44f90, 0xbe1cf12cfde90fd5
+        .quad 0xc086261318795a68, 0xbe1cf21f590dd5b6
+        .quad 0xc0862615e0569f48, 0xbe1cf0c50f9cd28a
+        .quad 0xc0862618a73cca30, 0xbe1ceedbdb520545
+        .quad 0xc086261b6d2c8668, 0xbe1cf0b030396011
+        .quad 0xc086261e32267e98, 0xbe1cf19917010e96
+        .quad 0xc0862620f62b5cb0, 0xbe1cf07331355985
+        .quad 0xc0862623b93bc9e8, 0xbe1cf01ae921a1c3
+        .quad 0xc08626267b586ed0, 0xbe1cefe5cf0dbf0c
+        .quad 0xc08626293c81f348, 0xbe1cf01b258aeb50
+        .quad 0xc086262bfcb8fe88, 0xbe1cee6b9e7f4c68
+        .quad 0xc086262ebbfe3710, 0xbe1cee684a9b21c9
+        .quad 0xc08626317a5242b8, 0xbe1cf1f8bcde9a8b
+        .quad 0xc086263437b5c6c0, 0xbe1cf1d063d36238
+        .quad 0xc0862636f42967a8, 0xbe1cf1e31a19075e
+        .quad 0xc0862639afadc950, 0xbe1cf1d8efdf7e7d
+        .quad 0xc086263c6a438ef0, 0xbe1cf1812ee72dba
+        .quad 0xc086263f23eb5b18, 0xbe1cf1449a9a2279
+        .quad 0xc0862641dca5cfb8, 0xbe1cee96edce5085
+        .quad 0xc086264494738e08, 0xbe1cf06797bd03b2
+        .quad 0xc08626474b5536b8, 0xbe1cef91b9b7ffc1
+        .quad 0xc086264a014b69c0, 0xbe1cef4b6721278f
+        .quad 0xc086264cb656c678, 0xbe1cf1942925eb4a
+        .quad 0xc086264f6a77eba8, 0xbe1cefa2c7bc2e39
+        .quad 0xc08626521daf7758, 0xbe1cf252595aceb3
+        .quad 0xc0862654cffe0718, 0xbe1cee8e9ae47ec2
+        .quad 0xc0862657816437a8, 0xbe1cf1bf913828fa
+        .quad 0xc086265a31e2a558, 0xbe1cf23475d6b366
+        .quad 0xc086265ce179ebc8, 0xbe1cef8df00a922b
+        .quad 0xc086265f902aa5f0, 0xbe1cef279bfa43e0
+        .quad 0xc08626623df56e38, 0xbe1cf080e10b8365
+        .quad 0xc0862664eadade70, 0xbe1cf1a518f9b544
+        .quad 0xc086266796db8fd0, 0xbe1cef9308fed9e9
+        .quad 0xc086266a41f81ae8, 0xbe1ceea3ae6b19c9
+        .quad 0xc086266cec3117b8, 0xbe1ceef06003d4c2
+        .quad 0xc086266f95871da8, 0xbe1cf0b8457ffb0c
+        .quad 0xc08626723dfac390, 0xbe1cf0c526745ad6
+        .quad 0xc0862674e58c9fa8, 0xbe1cf0cf91ff7b5d
+        .quad 0xc08626778c3d4798, 0xbe1cefe260819380
+        .quad 0xc086267a320d5070, 0xbe1ceebd90aa27a3
+        .quad 0xc086267cd6fd4ea8, 0xbe1cf0388121dffa
+        .quad 0xc086267f7b0dd630, 0xbe1cf1a3881435f1
+        .quad 0xc08626821e3f7a68, 0xbe1cef28e9d9ac52
+        .quad 0xc0862684c092ce08, 0xbe1cf02d300062dd
+        .quad 0xc086268762086350, 0xbe1cefaee1edfa35
+        .quad 0xc086268a02a0cbe0, 0xbe1cf0a5a052e936
+        .quad 0xc086268ca25c98d8, 0xbe1cee60a4a497ed
+        .quad 0xc086268f413c5ab0, 0xbe1cf0e4a5d0cf49
+        .quad 0xc0862691df40a170, 0xbe1cf149235a4e6e
+        .quad 0xc08626947c69fc80, 0xbe1cf215180b9fcc
+        .quad 0xc086269718b8fac8, 0xbe1cef9b156a9840
+        .quad 0xc0862699b42e2a90, 0xbe1cf054c91441be
+        .quad 0xc086269c4eca19a8, 0xbe1cf13ded26512c
+        .quad 0xc086269ee88d5550, 0xbe1cf22ea4d8ac06
+        .quad 0xc08626a181786a40, 0xbe1cf2354666ee2e
+        .quad 0xc08626a4198be4a8, 0xbe1cefef936752b3
+        .quad 0xc08626a6b0c85020, 0xbe1cf1e360a9db68
+        .quad 0xc08626a9472e37d8, 0xbe1ceed6aeb812c5
+        .quad 0xc08626abdcbe2650, 0xbe1cf227340b4986
+        .quad 0xc08626ae7178a5b0, 0xbe1cf0215a0cbe0d
+        .quad 0xc08626b1055e3f70, 0xbe1cf256adf0ae26
+        .quad 0xc08626b3986f7ca8, 0xbe1ceff3c67aed06
+        .quad 0xc08626b62aace5c8, 0xbe1cf2159fb93652
+        .quad 0xc08626b8bc1702e0, 0xbe1cf01e6dbd1c7f
+        .quad 0xc08626bb4cae5b60, 0xbe1cf009e75d1c0c
+        .quad 0xc08626bddc737648, 0xbe1ceec10a020e73
+        .quad 0xc08626c06b66da08, 0xbe1cf06d5783eee7
+        .quad 0xc08626c2f9890ca0, 0xbe1cf0cb8f169ffe
+        .quad 0xc08626c586da9388, 0xbe1cef7de2452430
+        .quad 0xc08626c8135bf3b0, 0xbe1cf05da6f783ae
+        .quad 0xc08626ca9f0db198, 0xbe1cefcc877d681d
+        .quad 0xc08626cd29f05138, 0xbe1cef0531954ab3
+        .quad 0xc08626cfb4045608, 0xbe1cf06b8565ea3d
+        .quad 0xc08626d23d4a4310, 0xbe1cefdc455d9d7e
+        .quad 0xc08626d4c5c29ad0, 0xbe1ceefc47e8fa64
+        .quad 0xc08626d74d6ddf48, 0xbe1cf1872bf033f2
+        .quad 0xc08626d9d44c9210, 0xbe1cf19d91087f9d
+        .quad 0xc08626dc5a5f3438, 0xbe1cf012d444c6ab
+        .quad 0xc08626dedfa64650, 0xbe1cf0ba528ee153
+        .quad 0xc08626e164224880, 0xbe1ceeb431709788
+        .quad 0xc08626e3e7d3ba60, 0xbe1cf0b9af31a6a5
+        .quad 0xc08626e66abb1b28, 0xbe1cf168fb2e135b
+        .quad 0xc08626e8ecd8e990, 0xbe1cef9097461c93
+        .quad 0xc08626eb6e2da3d0, 0xbe1cee7a434735d8
+        .quad 0xc08626edeeb9c7a8, 0xbe1cf235732b86f2
+        .quad 0xc08626f06e7dd280, 0xbe1cefe1510b89e6
+        .quad 0xc08626f2ed7a4120, 0xbe1cf1f64b9b80ef
+        .quad 0xc08626f56baf9000, 0xbe1cf08f320ca339
+        .quad 0xc08626f7e91e3b08, 0xbe1cf1b1de2808a1
+        .quad 0xc08626fa65c6bdc0, 0xbe1cf1976d778b28
+        .quad 0xc08626fce1a99338, 0xbe1ceef40a4f076f
+        .quad 0xc08626ff5cc73600, 0xbe1cef3e45869ce3
+        .quad 0xc0862701d7202048, 0xbe1ceef601b4c9d6
+        .quad 0xc086270450b4cbc0, 0xbe1cf1eaf0b57fd6
+        .quad 0xc0862706c985b1c0, 0xbe1cef82a44990f3
+        .quad 0xc086270941934b10, 0xbe1ceefe32981f2c
+        .quad 0xc086270bb8de1018, 0xbe1cefbf6f5a0445
+        .quad 0xc086270e2f6678d0, 0xbe1cf18dba75792c
+        .quad 0xc0862710a52cfcc8, 0xbe1cf0da64ce995f
+        .quad 0xc08627131a321318, 0xbe1cef04ac0fb802
+        .quad 0xc08627158e763268, 0xbe1cee9d4e2ad9bd
+        .quad 0xc086271801f9d0f8, 0xbe1cefa9b55407b5
+        .quad 0xc086271a74bd64a0, 0xbe1cefe6bd329570
+        .quad 0xc086271ce6c162c8, 0xbe1cef0b1205dc85
+        .quad 0xc086271f58064068, 0xbe1cef092a785e3f
+        .quad 0xc0862721c88c7210, 0xbe1cf050dcdaac30
+        .quad 0xc086272438546be8, 0xbe1cf210907ded8b
+        .quad 0xc0862726a75ea1b8, 0xbe1cee760be44f99
+        .quad 0xc086272915ab86c0, 0xbe1ceeeee07c2bcc
+        .quad 0xc086272b833b8df0, 0xbe1cf06874992df5
+        .quad 0xc086272df00f29d0, 0xbe1cef8fac5d4899
+        .quad 0xc08627305c26cc70, 0xbe1cf1103241cc99
+        .quad 0xc0862732c782e788, 0xbe1cf1d35fef83fe
+        .quad 0xc08627353223ec68, 0xbe1cef3ec8133e1d
+        .quad 0xc08627379c0a4be8, 0xbe1cef7261daccd8
+        .quad 0xc086273a05367688, 0xbe1cf18656c50806
+        .quad 0xc086273c6da8dc68, 0xbe1cf1c8736e049a
+        .quad 0xc086273ed561ed38, 0xbe1cf1f93bff4911
+        .quad 0xc08627413c621848, 0xbe1cf188a4ea680c
+        .quad 0xc0862743a2a9cc80, 0xbe1cf1d270930c80
+        .quad 0xc086274608397868, 0xbe1cf25a328c28e2
+        .quad 0xc08627486d118a28, 0xbe1cf106f90aa3b8
+        .quad 0xc086274ad1326f80, 0xbe1cee5e9d2e885a
+        .quad 0xc086274d349c95c0, 0xbe1cf1c0bac27228
+        .quad 0xc086274f975069f8, 0xbe1cf1a1500f9b1c
+        .quad 0xc0862751f94e58c0, 0xbe1cefc30663ac44
+        .quad 0xc08627545a96ce48, 0xbe1cf17123e427a2
+        .quad 0xc0862756bb2a3678, 0xbe1cefb92749fea4
+        .quad 0xc08627591b08fcc0, 0xbe1cefa40e1ea74a
+        .quad 0xc086275b7a338c40, 0xbe1cee6f4612c3e9
+        .quad 0xc086275dd8aa4fa8, 0xbe1cf1c54a053627
+        .quad 0xc0862760366db168, 0xbe1ceff5eb503d9e
+        .quad 0xc0862762937e1b70, 0xbe1cf02e47f10cee
+        .quad 0xc0862764efdbf768, 0xbe1ceeb06e1d0dad
+        .quad 0xc08627674b87ae88, 0xbe1cf10aadd6dba5
+        .quad 0xc0862769a681a9c0, 0xbe1cf24e9913d30f
+        .quad 0xc086276c00ca51a0, 0xbe1cef47b301e312
+        .quad 0xc086276e5a620e48, 0xbe1ceeb1cefc2e85
+        .quad 0xc0862770b3494788, 0xbe1cf16f1fbbe011
+        .quad 0xc08627730b8064e8, 0xbe1ceebdf75174c7
+        .quad 0xc08627756307cd70, 0xbe1cf06e3871a0da
+        .quad 0xc0862777b9dfe7f0, 0xbe1cef16799fd554
+        .quad 0xc086277a10091ac0, 0xbe1cf248dabf5377
+        .quad 0xc086277c6583cc00, 0xbe1cf0c78d92a2cd
+        .quad 0xc086277eba506158, 0xbe1cf0b911b029f0
+        .quad 0xc08627810e6f4028, 0xbe1cefdc24719766
+        .quad 0xc086278361e0cd70, 0xbe1cefbb6562b7e7
+        .quad 0xc0862785b4a56dd8, 0xbe1cf1e0afb349ec
+        .quad 0xc086278806bd85c0, 0xbe1cf008292e52fc
+        .quad 0xc086278a58297918, 0xbe1cf053073872bf
+        .quad 0xc086278ca8e9ab88, 0xbe1cf17a0a55a947
+        .quad 0xc086278ef8fe8068, 0xbe1ceeffb0b60234
+        .quad 0xc086279148685aa0, 0xbe1cf162204794a8
+        .quad 0xc086279397279ce0, 0xbe1cf24cc8cb48ac
+        .quad 0xc0862795e53ca978, 0xbe1cf0c9be68d5c3
+        .quad 0xc086279832a7e258, 0xbe1cf172cd3d7388
+        .quad 0xc086279a7f69a930, 0xbe1ceea2465fbce5
+        .quad 0xc086279ccb825f40, 0xbe1cf0a386d2500f
+        .quad 0xc086279f16f26590, 0xbe1cf1e338ddc18a
+        .quad 0xc08627a161ba1cd0, 0xbe1cef1f5049867f
+        .quad 0xc08627a3abd9e548, 0xbe1cef96c1ea8b1f
+        .quad 0xc08627a5f5521f00, 0xbe1cf138f6fd3c26
+        .quad 0xc08627a83e2329b0, 0xbe1cf0d4fcbfdf3a
+        .quad 0xc08627aa864d64b0, 0xbe1cf24870c12c81
+        .quad 0xc08627accdd12f18, 0xbe1cf0ae2a56348d
+        .quad 0xc08627af14aee7a0, 0xbe1cee8ca1a9b893
+        .quad 0xc08627b15ae6eca8, 0xbe1cf20414d637b0
+        .quad 0xc08627b3a0799c60, 0xbe1cf0fc6b7b12d8
+        .quad 0xc08627b5e5675488, 0xbe1cf152d93c4a00
+        .quad 0xc08627b829b072a0, 0xbe1cf1073f9b77c2
+        .quad 0xc08627ba6d5553d8, 0xbe1cee694f97d5a4
+        .quad 0xc08627bcb0565500, 0xbe1cf0456b8239d7
+        .quad 0xc08627bef2b3d2b0, 0xbe1cf211497127e3
+        .quad 0xc08627c1346e2930, 0xbe1cf01856c0384d
+        .quad 0xc08627c37585b468, 0xbe1cefa7dd05479e
+        .quad 0xc08627c5b5fad000, 0xbe1cef3ae8e50b93
+        .quad 0xc08627c7f5cdd750, 0xbe1ceea5f32fdd3a
+        .quad 0xc08627ca34ff2560, 0xbe1cef424caeb8d9
+        .quad 0xc08627cc738f14f0, 0xbe1cf0194d07a81f
+        .quad 0xc08627ceb17e0070, 0xbe1cf20f452000c1
+        .quad 0xc08627d0eecc4210, 0xbe1cf00e356218e4
+        .quad 0xc08627d32b7a33a0, 0xbe1cef30484b4bcb
+        .quad 0xc08627d567882eb0, 0xbe1ceeea11a6641b
+        .quad 0xc08627d7a2f68c80, 0xbe1cf13492d5bd7b
+        .quad 0xc08627d9ddc5a618, 0xbe1ceeb7048fad96
+        .quad 0xc08627dc17f5d418, 0xbe1ceef0666f0477
+        .quad 0xc08627de51876ee8, 0xbe1cf060d4b8b5c2
+        .quad 0xc08627e08a7acea8, 0xbe1cf0b2a4b6ff8c
+        .quad 0xc08627e2c2d04b28, 0xbe1cf0e34809a875
+        .quad 0xc08627e4fa883bf0, 0xbe1cf16bf74a3522
+        .quad 0xc08627e731a2f848, 0xbe1cee6a24623d57
+        .quad 0xc08627e96820d718, 0xbe1cefc7b4f1528e
+        .quad 0xc08627eb9e022f18, 0xbe1cf163051f3548
+        .quad 0xc08627edd34756b8, 0xbe1cef36b3366305
+        .quad 0xc08627f007f0a408, 0xbe1cf18134625550
+        .quad 0xc08627f23bfe6cf0, 0xbe1cf0ec32ec1a11
+        .quad 0xc08627f46f710700, 0xbe1ceeb3b64f3edc
+        .quad 0xc08627f6a248c778, 0xbe1cf0cd15805bc8
+        .quad 0xc08627f8d4860368, 0xbe1cf20db3bddebe
+        .quad 0xc08627fb06290f90, 0xbe1cf25188430e25
+        .quad 0xc08627fd37324070, 0xbe1ceea1713490f9
+        .quad 0xc08627ff67a1ea28, 0xbe1cf159521d234c
+        .quad 0xc0862801977860b8, 0xbe1cf24dfe50783b
+        .quad 0xc0862803c6b5f7d0, 0xbe1ceef2ef89a60b
+        .quad 0xc0862805f55b02c8, 0xbe1cee7fc919d62c
+        .quad 0xc08628082367d4c0, 0xbe1cf215a7fb513a
+        .quad 0xc086280a50dcc0a8, 0xbe1cf0e4401c5ed4
+        .quad 0xc086280c7dba1910, 0xbe1cf04ec734d256
+        .quad 0xc086280eaa003050, 0xbe1cf010ad787fea
+        .quad 0xc0862810d5af5880, 0xbe1cee622478393d
+        .quad 0xc086281300c7e368, 0xbe1cf01c7482564f
+        .quad 0xc08628152b4a22a0, 0xbe1cf0de20d33536
+        .quad 0xc086281755366778, 0xbe1cef2edae5837d
+        .quad 0xc08628197e8d02f0, 0xbe1cf0a345318cc9
+        .quad 0xc086281ba74e45d8, 0xbe1cf20085aa34b8
+        .quad 0xc086281dcf7a80c0, 0xbe1cef5fa845ad83
+        .quad 0xc086281ff71203e0, 0xbe1cf050d1df69c4
+        .quad 0xc08628221e151f48, 0xbe1ceffe43c035b9
+        .quad 0xc0862824448422b8, 0xbe1cf14f3018d3c2
+        .quad 0xc08628266a5f5dc0, 0xbe1cef0a5fbae83d
+        .quad 0xc08628288fa71f98, 0xbe1ceff8a95b72a1
+        .quad 0xc086282ab45bb750, 0xbe1cef073aa9849b
+        .quad 0xc086282cd87d73a8, 0xbe1cef69b3835c02
+        .quad 0xc086282efc0ca328, 0xbe1cf0bc139379a9
+        .quad 0xc08628311f099420, 0xbe1cef247a9ec596
+        .quad 0xc086283341749490, 0xbe1cef74bbcc488a
+        .quad 0xc0862835634df248, 0xbe1cef4bc42e7b8e
+        .quad 0xc08628378495fad0, 0xbe1cf136d4d5a810
+        .quad 0xc0862839a54cfb80, 0xbe1cf0d290b24dd8
+        .quad 0xc086283bc5734168, 0xbe1ceeebde8e0065
+        .quad 0xc086283de5091950, 0xbe1cf1a09f60aa1e
+        .quad 0xc0862840040ecfe0, 0xbe1cf0803947a234
+        .quad 0xc08628422284b168, 0xbe1cf0abf7638127
+        .quad 0xc0862844406b0a08, 0xbe1cf0f73ee12058
+        .quad 0xc08628465dc225a0, 0xbe1cf2079971b26c
+        .quad 0xc08628487a8a4fe0, 0xbe1cee74957564b1
+        .quad 0xc086284a96c3d420, 0xbe1ceee77c1b7d43
+        .quad 0xc086284cb26efd90, 0xbe1cf23addba6e09
+        .quad 0xc086284ecd8c1730, 0xbe1cf199f4a1da60
+        .quad 0xc0862850e81b6bb0, 0xbe1cf09fdea81393
+        .quad 0xc0862853021d4588, 0xbe1cf176adb417f7
+        .quad 0xc08628551b91ef00, 0xbe1cf0f64f84a8da
+        .quad 0xc08628573479b220, 0xbe1ceec34cf49523
+        .quad 0xc08628594cd4d8a8, 0xbe1cf16d60fbe0bb
+        .quad 0xc086285b64a3ac40, 0xbe1cee8de7acfc7b
+        .quad 0xc086285d7be67630, 0xbe1ceee6256cce8d
+        .quad 0xc086285f929d7fa0, 0xbe1cee7d66a3d8a5
+        .quad 0xc0862861a8c91170, 0xbe1cf0bef8265792
+        .quad 0xc0862863be697458, 0xbe1cf097f890c6f8
+        .quad 0xc0862865d37ef0c8, 0xbe1cf09502d5c3fc
+        .quad 0xc0862867e809cf00, 0xbe1ceeffb239dac7
+        .quad 0xc0862869fc0a56f8, 0xbe1cf1fbfff95c98
+        .quad 0xc086286c0f80d090, 0xbe1cefa57ad3eef7
+        .quad 0xc086286e226d8348, 0xbe1cf22c58b9183d
+        .quad 0xc086287034d0b690, 0xbe1ceff262d0a248
+        .quad 0xc086287246aab180, 0xbe1cefa7bc194186
+        .quad 0xc086287457fbbb08, 0xbe1cf06782d784d9
+        .quad 0xc086287668c419e0, 0xbe1cf1d44d0eaa07
+        .quad 0xc086287879041490, 0xbe1cf034803c8a48
+        .quad 0xc086287a88bbf158, 0xbe1cf08e84916b6f
+        .quad 0xc086287c97ebf650, 0xbe1cf0c4d3dc1bc7
+        .quad 0xc086287ea6946958, 0xbe1cefb1e4625943
+        .quad 0xc0862880b4b59010, 0xbe1cf143efdd1fd0
+        .quad 0xc0862882c24faff8, 0xbe1cee9896d016da
+        .quad 0xc0862884cf630e38, 0xbe1cf2186072f2cc
+        .quad 0xc0862886dbefeff0, 0xbe1cef9217633d34
+        .quad 0xc0862888e7f699e0, 0xbe1cf05603549486
+        .quad 0xc086288af37750b0, 0xbe1cef50fff513d3
+        .quad 0xc086288cfe7258c0, 0xbe1cf127713b32d0
+        .quad 0xc086288f08e7f650, 0xbe1cf05015520f3d
+        .quad 0xc086289112d86d58, 0xbe1cf12eb458b26f
+        .quad 0xc08628931c4401a8, 0xbe1cf22eae2887ed
+        .quad 0xc0862895252af6e0, 0xbe1cefdd6656dd2d
+        .quad 0xc08628972d8d9058, 0xbe1cf1048ea4e646
+        .quad 0xc0862899356c1150, 0xbe1ceec4501167e9
+        .quad 0xc086289b3cc6bcb8, 0xbe1cf0ad52becc3f
+        .quad 0xc086289d439dd568, 0xbe1cf0daa4e00e35
+        .quad 0xc086289f49f19df8, 0xbe1cf00b80de8d6a
+        .quad 0xc08628a14fc258c8, 0xbe1cf1bcf2ea8464
+        .quad 0xc08628a355104818, 0xbe1cf0435e2782b0
+        .quad 0xc08628a559dbade0, 0xbe1cf0e3e1a5f56c
+        .quad 0xc08628a75e24cbf8, 0xbe1cefed9d5a721d
+        .quad 0xc08628a961ebe3f8, 0xbe1cf0d2d74321e2
+        .quad 0xc08628ab65313750, 0xbe1cf24200eb55e9
+        .quad 0xc08628ad67f50740, 0xbe1cf23e9d7cf979
+        .quad 0xc08628af6a3794d0, 0xbe1cf23a088f421c
+        .quad 0xc08628b16bf920e0, 0xbe1cef2c1de1ab32
+        .quad 0xc08628b36d39ec08, 0xbe1cf1abc231f7b2
+        .quad 0xc08628b56dfa36d0, 0xbe1cf2074d5ba303
+        .quad 0xc08628b76e3a4180, 0xbe1cf05cd5eed880
+        /*== Log_LA_table ==*/
+        .align 16
+        .quad 0x8000000000000000
+        .quad 0xbf5ff802a9ab10e6
+        .quad 0xbf6ff00aa2b10bc0
+        .quad 0xbf77ee11ebd82e94
+        .quad 0xbf7fe02a6b106789
+        .quad 0xbf83e7295d25a7d9
+        .quad 0xbf87dc475f810a77
+        .quad 0xbf8bcf712c74384c
+        .quad 0xbf8fc0a8b0fc03e4
+        .quad 0xbf91d7f7eb9eebe7
+        .quad 0xbf93cea44346a575
+        .quad 0xbf95c45a51b8d389
+        .quad 0xbf97b91b07d5b11b
+        .quad 0xbf99ace7551cc514
+        .quad 0xbf9b9fc027af9198
+        .quad 0xbf9d91a66c543cc4
+        .quad 0xbf9f829b0e783300
+        .quad 0xbfa0b94f7c196176
+        .quad 0xbfa1b0d98923d980
+        .quad 0xbfa2a7ec2214e873
+        .quad 0xbfa39e87b9febd60
+        .quad 0xbfa494acc34d911c
+        .quad 0xbfa58a5bafc8e4d5
+        .quad 0xbfa67f94f094bd98
+        .quad 0xbfa77458f632dcfc
+        .quad 0xbfa868a83083f6cf
+        .quad 0xbfa95c830ec8e3eb
+        .quad 0xbfaa4fe9ffa3d235
+        .quad 0xbfab42dd711971bf
+        .quad 0xbfac355dd0921f2d
+        .quad 0xbfad276b8adb0b52
+        .quad 0xbfae19070c276016
+        .quad 0xbfaf0a30c01162a6
+        .quad 0xbfaffae9119b9303
+        .quad 0xbfb075983598e471
+        .quad 0xbfb0ed839b5526fe
+        .quad 0xbfb16536eea37ae1
+        .quad 0xbfb1dcb263db1944
+        .quad 0xbfb253f62f0a1417
+        .quad 0xbfb2cb0283f5de1f
+        .quad 0xbfb341d7961bd1d1
+        .quad 0xbfb3b87598b1b6ee
+        .quad 0xbfb42edcbea646f0
+        .quad 0xbfb4a50d3aa1b040
+        .quad 0xbfb51b073f06183f
+        .quad 0xbfb590cafdf01c28
+        .quad 0xbfb60658a93750c4
+        .quad 0xbfb67bb0726ec0fc
+        .quad 0xbfb6f0d28ae56b4c
+        .quad 0xbfb765bf23a6be13
+        .quad 0xbfb7da766d7b12cd
+        .quad 0xbfb84ef898e8282a
+        .quad 0xbfb8c345d6319b21
+        .quad 0xbfb9375e55595ede
+        .quad 0xbfb9ab42462033ad
+        .quad 0xbfba1ef1d8061cd4
+        .quad 0xbfba926d3a4ad563
+        .quad 0xbfbb05b49bee43fe
+        .quad 0xbfbb78c82bb0eda1
+        .quad 0xbfbbeba818146765
+        .quad 0xbfbc5e548f5bc743
+        .quad 0xbfbcd0cdbf8c13e1
+        .quad 0xbfbd4313d66cb35d
+        .quad 0xbfbdb5270187d927
+        .quad 0xbfbe27076e2af2e6
+        .quad 0xbfbe98b549671467
+        .quad 0xbfbf0a30c01162a6
+        .quad 0xbfbf7b79fec37ddf
+        .quad 0xbfbfec9131dbeabb
+        .quad 0xbfc02ebb42bf3d4b
+        .quad 0xbfc0671512ca596e
+        .quad 0xbfc09f561ee719c3
+        .quad 0xbfc0d77e7cd08e59
+        .quad 0xbfc10f8e422539b1
+        .quad 0xbfc14785846742ac
+        .quad 0xbfc17f6458fca611
+        .quad 0xbfc1b72ad52f67a0
+        .quad 0xbfc1eed90e2dc2c3
+        .quad 0xbfc2266f190a5acb
+        .quad 0xbfc25ded0abc6ad2
+        .quad 0xbfc29552f81ff523
+        .quad 0xbfc2cca0f5f5f251
+        .quad 0xbfc303d718e47fd3
+        .quad 0xbfc33af575770e4f
+        .quad 0xbfc371fc201e8f74
+        .quad 0xbfc3a8eb2d31a376
+        .quad 0xbfc3dfc2b0ecc62a
+        .quad 0xbfc41682bf727bc0
+        .quad 0xbfc44d2b6ccb7d1e
+        .quad 0xbfc483bccce6e3dd
+        .quad 0xbfc4ba36f39a55e5
+        .quad 0xbfc4f099f4a230b2
+        .quad 0xbfc526e5e3a1b438
+        .quad 0xbfc55d1ad4232d6f
+        .quad 0xbfc59338d9982086
+        .quad 0xbfc5c940075972b9
+        .quad 0xbfc5ff3070a793d4
+        .quad 0xbfc6350a28aaa758
+        .quad 0xbfc66acd4272ad51
+        .quad 0xbfc6a079d0f7aad2
+        .quad 0xbfc6d60fe719d21d
+        .quad 0xbfc70b8f97a1aa75
+        .quad 0xbfc740f8f54037a5
+        .quad 0xbfc7764c128f2127
+        .quad 0xbfc7ab890210d909
+        .quad 0xbfc7e0afd630c274
+        .quad 0xbfc815c0a14357eb
+        .quad 0xbfc84abb75865139
+        .quad 0xbfc87fa06520c911
+        .quad 0xbfc8b46f8223625b
+        .quad 0xbfc8e928de886d41
+        .quad 0xbfc91dcc8c340bde
+        .quad 0xbfc9525a9cf456b4
+        .quad 0xbfc986d3228180ca
+        .quad 0xbfc9bb362e7dfb83
+        .quad 0xbfc9ef83d2769a34
+        .quad 0xbfca23bc1fe2b563
+        .quad 0xbfca57df28244dcd
+        .quad 0xbfca8becfc882f19
+        .quad 0xbfcabfe5ae46124c
+        .quad 0xbfcaf3c94e80bff3
+        .quad 0xbfcb2797ee46320c
+        .quad 0xbfcb5b519e8fb5a4
+        .quad 0xbfcb8ef670420c3b
+        .quad 0xbfcbc286742d8cd6
+        .quad 0xbfcbf601bb0e44e2
+        .quad 0xbfcc2968558c18c1
+        .quad 0xbfcc5cba543ae425
+        .quad 0xbfcc8ff7c79a9a22
+        .quad 0xbfccc320c0176502
+        .quad 0xbfccf6354e09c5dc
+        .quad 0xbfcd293581b6b3e7
+        .quad 0xbfcd5c216b4fbb91
+        .quad 0xbfcd8ef91af31d5e
+        .quad 0xbfcdc1bca0abec7d
+        .quad 0xbfcdf46c0c722d2f
+        .quad 0xbfce27076e2af2e6
+        .quad 0xbfce598ed5a87e2f
+        .quad 0xbfce8c0252aa5a60
+        .quad 0xbfcebe61f4dd7b0b
+        .quad 0xbfcef0adcbdc5936
+        .quad 0xbfcf22e5e72f105d
+        .quad 0xbfcf550a564b7b37
+        .quad 0xbfcf871b28955045
+        .quad 0xbfcfb9186d5e3e2b
+        .quad 0xbfcfeb0233e607cc
+        .quad 0xbfd00e6c45ad501d
+        .quad 0xbfd0274dc16c232f
+        .quad 0xbfd0402594b4d041
+        .quad 0xbfd058f3c703ebc6
+        .quad 0xbfd071b85fcd590d
+        .quad 0xbfd08a73667c57af
+        .quad 0xbfd0a324e27390e3
+        .quad 0xbfd0bbccdb0d24bd
+        .quad 0xbfd0d46b579ab74b
+        .quad 0xbfd0ed005f657da4
+        .quad 0xbfd1058bf9ae4ad5
+        .quad 0xbfd11e0e2dad9cb7
+        .quad 0xbfd136870293a8b0
+        .quad 0xbfd14ef67f88685a
+        .quad 0xbfd1675cababa60e
+        .quad 0xbfd17fb98e15095d
+        .quad 0xbfd1980d2dd4236f
+        .quad 0xbfd1b05791f07b49
+        .quad 0xbfd1c898c16999fb
+        .quad 0xbfd1e0d0c33716be
+        .quad 0xbfd1f8ff9e48a2f3
+        .quad 0xbfd211255986160c
+        .quad 0xbfd22941fbcf7966
+        .quad 0xbfd241558bfd1404
+        .quad 0xbfd2596010df763a
+        .quad 0xbfd27161913f853d
+        .quad 0xbfd2895a13de86a3
+        .quad 0xbfd2a1499f762bc9
+        .quad 0xbfd2b9303ab89d25
+        .quad 0xbfd2d10dec508583
+        .quad 0xbfd2e8e2bae11d31
+        .quad 0xbfd300aead06350c
+        .quad 0xbfd31871c9544185
+        .quad 0xbfd3302c16586588
+        .quad 0xbfd347dd9a987d55
+        .quad 0xbfd35f865c93293e
+        .quad 0xbfd3772662bfd85b
+        .quad 0xbfd38ebdb38ed321
+        .quad 0xbfd3a64c556945ea
+        .quad 0xbfd3bdd24eb14b6a
+        .quad 0xbfd3d54fa5c1f710
+        .quad 0xbfd3ecc460ef5f50
+        .quad 0xbfd404308686a7e4
+        .quad 0xbfd41b941cce0bee
+        .quad 0xbfd432ef2a04e814
+        .quad 0xbfd44a41b463c47c
+        .quad 0xbfd4618bc21c5ec2
+        .quad 0xbfd478cd5959b3d9
+        .quad 0xbfd49006804009d1
+        .quad 0xbfd4a7373cecf997
+        .quad 0xbfd4be5f957778a1
+        .quad 0xbfd4d57f8fefe27f
+        .quad 0xbfd4ec973260026a
+        .quad 0xbfd503a682cb1cb3
+        .quad 0xbfd51aad872df82d
+        .quad 0xbfd531ac457ee77e
+        .quad 0xbfd548a2c3add263
+        .quad 0xbfd55f9107a43ee2
+        .quad 0xbfd5767717455a6c
+        .quad 0xbfd58d54f86e02f2
+        .quad 0xbfd5a42ab0f4cfe2
+        .quad 0xbfd5baf846aa1b19
+        .quad 0xbfd5d1bdbf5809ca
+        .quad 0xbfd5e87b20c2954a
+        .quad 0xbfd5ff3070a793d4
+        .quad 0xbfd615ddb4bec13c
+        .quad 0xbfd62c82f2b9c795
+        .quad 0x3fd61965cdb02c1f
+        .quad 0x3fd602d08af091ec
+        .quad 0x3fd5ec433d5c35ae
+        .quad 0x3fd5d5bddf595f30
+        .quad 0x3fd5bf406b543db2
+        .quad 0x3fd5a8cadbbedfa1
+        .quad 0x3fd5925d2b112a59
+        .quad 0x3fd57bf753c8d1fb
+        .quad 0x3fd565995069514c
+        .quad 0x3fd54f431b7be1a9
+        .quad 0x3fd538f4af8f72fe
+        .quad 0x3fd522ae0738a3d8
+        .quad 0x3fd50c6f1d11b97c
+        .quad 0x3fd4f637ebba9810
+        .quad 0x3fd4e0086dd8baca
+        .quad 0x3fd4c9e09e172c3c
+        .quad 0x3fd4b3c077267e9a
+        .quad 0x3fd49da7f3bcc41f
+        .quad 0x3fd487970e958770
+        .quad 0x3fd4718dc271c41b
+        .quad 0x3fd45b8c0a17df13
+        .quad 0x3fd44591e0539f49
+        .quad 0x3fd42f9f3ff62642
+        .quad 0x3fd419b423d5e8c7
+        .quad 0x3fd403d086cea79c
+        .quad 0x3fd3edf463c1683e
+        .quad 0x3fd3d81fb5946dba
+        .quad 0x3fd3c25277333184
+        .quad 0x3fd3ac8ca38e5c5f
+        .quad 0x3fd396ce359bbf54
+        .quad 0x3fd3811728564cb2
+        .quad 0x3fd36b6776be1117
+        .quad 0x3fd355bf1bd82c8b
+        .quad 0x3fd3401e12aecba1
+        .quad 0x3fd32a84565120a8
+        .quad 0x3fd314f1e1d35ce4
+        .quad 0x3fd2ff66b04ea9d4
+        .quad 0x3fd2e9e2bce12286
+        .quad 0x3fd2d46602adccee
+        .quad 0x3fd2bef07cdc9354
+        .quad 0x3fd2a982269a3dbf
+        .quad 0x3fd2941afb186b7c
+        .quad 0x3fd27ebaf58d8c9d
+        .quad 0x3fd269621134db92
+        .quad 0x3fd25410494e56c7
+        .quad 0x3fd23ec5991eba49
+        .quad 0x3fd22981fbef797b
+        .quad 0x3fd214456d0eb8d4
+        .quad 0x3fd1ff0fe7cf47a7
+        .quad 0x3fd1e9e1678899f4
+        .quad 0x3fd1d4b9e796c245
+        .quad 0x3fd1bf99635a6b95
+        .quad 0x3fd1aa7fd638d33f
+        .quad 0x3fd1956d3b9bc2fa
+        .quad 0x3fd180618ef18adf
+        .quad 0x3fd16b5ccbacfb73
+        .quad 0x3fd1565eed455fc3
+        .quad 0x3fd14167ef367783
+        .quad 0x3fd12c77cd00713b
+        .quad 0x3fd1178e8227e47c
+        .quad 0x3fd102ac0a35cc1c
+        .quad 0x3fd0edd060b78081
+        .quad 0x3fd0d8fb813eb1ef
+        .quad 0x3fd0c42d676162e3
+        .quad 0x3fd0af660eb9e279
+        .quad 0x3fd09aa572e6c6d4
+        .quad 0x3fd085eb8f8ae797
+        .quad 0x3fd07138604d5862
+        .quad 0x3fd05c8be0d9635a
+        .quad 0x3fd047e60cde83b8
+        .quad 0x3fd03346e0106062
+        .quad 0x3fd01eae5626c691
+        .quad 0x3fd00a1c6adda473
+        .quad 0x3fcfeb2233ea07cd
+        .quad 0x3fcfc218be620a5e
+        .quad 0x3fcf991c6cb3b379
+        .quad 0x3fcf702d36777df0
+        .quad 0x3fcf474b134df229
+        .quad 0x3fcf1e75fadf9bde
+        .quad 0x3fcef5ade4dcffe6
+        .quad 0x3fceccf2c8fe920a
+        .quad 0x3fcea4449f04aaf5
+        .quad 0x3fce7ba35eb77e2a
+        .quad 0x3fce530effe71012
+        .quad 0x3fce2a877a6b2c12
+        .quad 0x3fce020cc6235ab5
+        .quad 0x3fcdd99edaf6d7e9
+        .quad 0x3fcdb13db0d48940
+        .quad 0x3fcd88e93fb2f450
+        .quad 0x3fcd60a17f903515
+        .quad 0x3fcd38666871f465
+        .quad 0x3fcd1037f2655e7b
+        .quad 0x3fcce816157f1988
+        .quad 0x3fccc000c9db3c52
+        .quad 0x3fcc97f8079d44ec
+        .quad 0x3fcc6ffbc6f00f71
+        .quad 0x3fcc480c0005ccd1
+        .quad 0x3fcc2028ab17f9b4
+        .quad 0x3fcbf851c067555f
+        .quad 0x3fcbd087383bd8ad
+        .quad 0x3fcba8c90ae4ad19
+        .quad 0x3fcb811730b823d2
+        .quad 0x3fcb5971a213acdb
+        .quad 0x3fcb31d8575bce3d
+        .quad 0x3fcb0a4b48fc1b46
+        .quad 0x3fcae2ca6f672bd4
+        .quad 0x3fcabb55c31693ad
+        .quad 0x3fca93ed3c8ad9e3
+        .quad 0x3fca6c90d44b704e
+        .quad 0x3fca454082e6ab05
+        .quad 0x3fca1dfc40f1b7f1
+        .quad 0x3fc9f6c407089664
+        .quad 0x3fc9cf97cdce0ec3
+        .quad 0x3fc9a8778debaa38
+        .quad 0x3fc981634011aa75
+        .quad 0x3fc95a5adcf7017f
+        .quad 0x3fc9335e5d594989
+        .quad 0x3fc90c6db9fcbcd9
+        .quad 0x3fc8e588ebac2dbf
+        .quad 0x3fc8beafeb38fe8c
+        .quad 0x3fc897e2b17b19a5
+        .quad 0x3fc871213750e994
+        .quad 0x3fc84a6b759f512f
+        .quad 0x3fc823c16551a3c2
+        .quad 0x3fc7fd22ff599d4f
+        .quad 0x3fc7d6903caf5ad0
+        .quad 0x3fc7b0091651528c
+        .quad 0x3fc7898d85444c73
+        .quad 0x3fc7631d82935a86
+        .quad 0x3fc73cb9074fd14d
+        .quad 0x3fc716600c914054
+        .quad 0x3fc6f0128b756abc
+        .quad 0x3fc6c9d07d203fc7
+        .quad 0x3fc6a399dabbd383
+        .quad 0x3fc67d6e9d785771
+        .quad 0x3fc6574ebe8c133a
+        .quad 0x3fc6313a37335d76
+        .quad 0x3fc60b3100b09476
+        .quad 0x3fc5e533144c1719
+        .quad 0x3fc5bf406b543db2
+        .quad 0x3fc59958ff1d52f1
+        .quad 0x3fc5737cc9018cdd
+        .quad 0x3fc54dabc26105d2
+        .quad 0x3fc527e5e4a1b58d
+        .quad 0x3fc5022b292f6a45
+        .quad 0x3fc4dc7b897bc1c8
+        .quad 0x3fc4b6d6fefe22a4
+        .quad 0x3fc4913d8333b561
+        .quad 0x3fc46baf0f9f5db7
+        .quad 0x3fc4462b9dc9b3dc
+        .quad 0x3fc420b32740fdd4
+        .quad 0x3fc3fb45a59928cc
+        .quad 0x3fc3d5e3126bc27f
+        .quad 0x3fc3b08b6757f2a9
+        .quad 0x3fc38b3e9e027479
+        .quad 0x3fc365fcb0159016
+        .quad 0x3fc340c59741142e
+        .quad 0x3fc31b994d3a4f85
+        .quad 0x3fc2f677cbbc0a96
+        .quad 0x3fc2d1610c86813a
+        .quad 0x3fc2ac55095f5c59
+        .quad 0x3fc28753bc11aba5
+        .quad 0x3fc2625d1e6ddf57
+        .quad 0x3fc23d712a49c202
+        .quad 0x3fc2188fd9807263
+        .quad 0x3fc1f3b925f25d41
+        .quad 0x3fc1ceed09853752
+        .quad 0x3fc1aa2b7e23f72a
+        .quad 0x3fc185747dbecf34
+        .quad 0x3fc160c8024b27b1
+        .quad 0x3fc13c2605c398c3
+        .quad 0x3fc1178e8227e47c
+        .quad 0x3fc0f301717cf0fb
+        .quad 0x3fc0ce7ecdccc28d
+        .quad 0x3fc0aa06912675d5
+        .quad 0x3fc08598b59e3a07
+        .quad 0x3fc06135354d4b18
+        .quad 0x3fc03cdc0a51ec0d
+        .quad 0x3fc0188d2ecf6140
+        .quad 0x3fbfe89139dbd566
+        .quad 0x3fbfa01c9db57ce2
+        .quad 0x3fbf57bc7d9005db
+        .quad 0x3fbf0f70cdd992e3
+        .quad 0x3fbec739830a1120
+        .quad 0x3fbe7f1691a32d3e
+        .quad 0x3fbe3707ee30487b
+        .quad 0x3fbdef0d8d466db9
+        .quad 0x3fbda727638446a2
+        .quad 0x3fbd5f55659210e2
+        .quad 0x3fbd179788219364
+        .quad 0x3fbccfedbfee13a8
+        .quad 0x3fbc885801bc4b23
+        .quad 0x3fbc40d6425a5cb1
+        .quad 0x3fbbf968769fca11
+        .quad 0x3fbbb20e936d6974
+        .quad 0x3fbb6ac88dad5b1c
+        .quad 0x3fbb23965a52ff00
+        .quad 0x3fbadc77ee5aea8c
+        .quad 0x3fba956d3ecade63
+        .quad 0x3fba4e7640b1bc38
+        .quad 0x3fba0792e9277cac
+        .quad 0x3fb9c0c32d4d2548
+        .quad 0x3fb97a07024cbe74
+        .quad 0x3fb9335e5d594989
+        .quad 0x3fb8ecc933aeb6e8
+        .quad 0x3fb8a6477a91dc29
+        .quad 0x3fb85fd927506a48
+        .quad 0x3fb8197e2f40e3f0
+        .quad 0x3fb7d33687c293c9
+        .quad 0x3fb78d02263d82d3
+        .quad 0x3fb746e100226ed9
+        .quad 0x3fb700d30aeac0e1
+        .quad 0x3fb6bad83c1883b6
+        .quad 0x3fb674f089365a7a
+        .quad 0x3fb62f1be7d77743
+        .quad 0x3fb5e95a4d9791cb
+        .quad 0x3fb5a3abb01ade25
+        .quad 0x3fb55e10050e0384
+        .quad 0x3fb518874226130a
+        .quad 0x3fb4d3115d207eac
+        .quad 0x3fb48dae4bc31018
+        .quad 0x3fb4485e03dbdfad
+        .quad 0x3fb403207b414b7f
+        .quad 0x3fb3bdf5a7d1ee64
+        .quad 0x3fb378dd7f749714
+        .quad 0x3fb333d7f8183f4b
+        .quad 0x3fb2eee507b40301
+        .quad 0x3fb2aa04a44717a5
+        .quad 0x3fb26536c3d8c369
+        .quad 0x3fb2207b5c78549e
+        .quad 0x3fb1dbd2643d190b
+        .quad 0x3fb1973bd1465567
+        .quad 0x3fb152b799bb3cc9
+        .quad 0x3fb10e45b3cae831
+        .quad 0x3fb0c9e615ac4e17
+        .quad 0x3fb08598b59e3a07
+        .quad 0x3fb0415d89e74444
+        .quad 0x3faffa6911ab9301
+        .quad 0x3faf723b517fc523
+        .quad 0x3faeea31c006b87c
+        .quad 0x3fae624c4a0b5e1b
+        .quad 0x3fadda8adc67ee4e
+        .quad 0x3fad52ed6405d86f
+        .quad 0x3faccb73cdddb2cc
+        .quad 0x3fac441e06f72a9e
+        .quad 0x3fabbcebfc68f420
+        .quad 0x3fab35dd9b58baad
+        .quad 0x3faaaef2d0fb10fc
+        .quad 0x3faa282b8a936171
+        .quad 0x3fa9a187b573de7c
+        .quad 0x3fa91b073efd7314
+        .quad 0x3fa894aa149fb343
+        .quad 0x3fa80e7023d8ccc4
+        .quad 0x3fa788595a3577ba
+        .quad 0x3fa70265a550e777
+        .quad 0x3fa67c94f2d4bb58
+        .quad 0x3fa5f6e73078efb8
+        .quad 0x3fa5715c4c03ceef
+        .quad 0x3fa4ebf43349e26f
+        .quad 0x3fa466aed42de3ea
+        .quad 0x3fa3e18c1ca0ae92
+        .quad 0x3fa35c8bfaa1306b
+        .quad 0x3fa2d7ae5c3c5bae
+        .quad 0x3fa252f32f8d183f
+        .quad 0x3fa1ce5a62bc353a
+        .quad 0x3fa149e3e4005a8d
+        .quad 0x3fa0c58fa19dfaaa
+        .quad 0x3fa0415d89e74444
+        .quad 0x3f9f7a9b16782856
+        .quad 0x3f9e72bf2813ce51
+        .quad 0x3f9d6b2725979802
+        .quad 0x3f9c63d2ec14aaf2
+        .quad 0x3f9b5cc258b718e6
+        .quad 0x3f9a55f548c5c43f
+        .quad 0x3f994f6b99a24475
+        .quad 0x3f98492528c8cabf
+        .quad 0x3f974321d3d006d3
+        .quad 0x3f963d6178690bd6
+        .quad 0x3f9537e3f45f3565
+        .quad 0x3f9432a925980cc1
+        .quad 0x3f932db0ea132e22
+        .quad 0x3f9228fb1fea2e28
+        .quad 0x3f912487a5507f70
+        .quad 0x3f90205658935847
+        .quad 0x3f8e38ce3033310c
+        .quad 0x3f8c317384c75f06
+        .quad 0x3f8a2a9c6c170462
+        .quad 0x3f882448a388a2aa
+        .quad 0x3f861e77e8b53fc6
+        .quad 0x3f841929f96832f0
+        .quad 0x3f82145e939ef1e9
+        .quad 0x3f8010157588de71
+        .quad 0x3f7c189cbb0e27fb
+        .quad 0x3f78121214586b54
+        .quad 0x3f740c8a747878e2
+        .quad 0x3f70080559588b35
+        .quad 0x3f680904828985c0
+        .quad 0x3f60040155d5889e
+        .quad 0x3f50020055655889
+        .quad 0x0000000000000000
+        /*== poly_coeff[4] ==*/
+        .align 16
+        .quad 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A /* coeff4 */
+        .quad 0xbfd0000148058EE1, 0xbfd0000148058EE1 /* coeff3 */
+        .quad 0x3fd55555555543C5, 0x3fd55555555543C5 /* coeff2 */
+        .quad 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F /* coeff1 */
+        /*== ExpMask ==*/
+        .align 16
+        .quad 0x000fffffffffffff, 0x000fffffffffffff
+        /*== Two10 ==*/
+        .align 16
+        .quad 0x3f50000000000000, 0x3f50000000000000
+        /*== MinLog1p = -1+2^(-53) ==*/
+        .align 16
+        .quad 0xbfefffffffffffff, 0xbfefffffffffffff
+        /*== MaxLog1p ==*/
+        .align 16
+        .quad 0x7f3ffffffffff000, 0x7f3ffffffffff000
+        /*== One ==*/
+        .align 16
+        .quad 0x3ff0000000000000, 0x3ff0000000000000
+        /*== SgnMask ==*/
+        .align 16
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff
+        /*== XThreshold ==*/
+        .align 16
+        .quad 0x3e00000000000000, 0x3e00000000000000
+        /*== XhMask ==*/
+        .align 16
+        .quad 0xfffffffffffffc00, 0xfffffffffffffc00
+        /*== Threshold ==*/
+        .align 16
+        .quad 0x4086a00000000000, 0x4086a00000000000
+        /*== Bias ==*/
+        .align 16
+        .quad 0x408ff80000000000, 0x408ff80000000000
+        /*== Bias1 ==*/
+        .align 16
+        .quad 0x408ff00000000000, 0x408ff00000000000
+        /*== ExpMask ==*/
+        .align 16
+        .quad 0x7ff0000000000000, 0x7ff0000000000000
+        /*== ExpMask2 ==*/
+        .align 16
+        .quad 0x7f40000000000000, 0x7f40000000000000
+        /*== L2L ==*/
+        .align 16
+        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
+        /*== dBigThreshold ==*/
+        .align 16
+        .quad 0x41D0000000000000, 0x41D0000000000000
+        /*== dLargestFinite ==*/
+        .align 16
+        .quad 0x7FEFFFFFFFFFFFFF, 0x7FEFFFFFFFFFFFFF
+        /*== dThirtyOne ==*/
+        .align 16
+        .quad 0x403F000000000000, 0x403F000000000000
+        /*== XScale ==*/
+        .align 16
+        .quad 0x3E10000000000000, 0x3E10000000000000
+        .align 16
+        .type	__svml_dacosh_data_internal,@object
+        .size	__svml_dacosh_data_internal,.-__svml_dacosh_data_internal
+        .align 16
+
+.FLT_20:
+        .long	0x00000000,0x43380000,0x00000000,0x43380000
+        .type	.FLT_20,@object
+        .size	.FLT_20,16
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core-sse.S
new file mode 100644
index 0000000000..cc524d4813
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized acosh, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN4v_acosh _ZGVdN4v_acosh_sse_wrapper
+#include "../svml_d_acosh4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core.c
new file mode 100644
index 0000000000..bb07c44f4b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized acosh, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN4v_acosh
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN4v_acosh, __GI__ZGVdN4v_acosh, __redirect__ZGVdN4v_acosh)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core_avx2.S
new file mode 100644
index 0000000000..18f278d899
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core_avx2.S
@@ -0,0 +1,1536 @@
+/* Function acosh vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute acosh(x) as log(x + sqrt(x*x - 1))
+ *
+ *   Special cases:
+ *
+ *   acosh(NaN)  = quiet NaN, and raise invalid exception
+ *   acosh(-INF) = NaN
+ *   acosh(+INF) = +INF
+ *   acosh(x)    = NaN if x < 1
+ *   acosh(1)    = +0
+ *
+ */
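+
+/* For reference only: a rough scalar C sketch of the formula above.
+ * It is purely illustrative (the name acosh_ref is not part of this
+ * patch) and omits the special-case handling listed above:
+ *
+ *   #include <math.h>
+ *
+ *   static double
+ *   acosh_ref (double x)
+ *   {
+ *     return log (x + sqrt (x * x - 1.0));
+ *   }
+ */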
+
+/* Offsets for data table __svml_dacosh_data_internal
+ */
+#define Log_HA_table                  	0
+#define Log_LA_table                  	8224
+#define poly_coeff                    	12352
+#define ExpMask                       	12480
+#define Two10                         	12512
+#define MinLog1p                      	12544
+#define MaxLog1p                      	12576
+#define One                           	12608
+#define SgnMask                       	12640
+#define XThreshold                    	12672
+#define XhMask                        	12704
+#define Threshold                     	12736
+#define Bias                          	12768
+#define Bias1                         	12800
+#define ExpMask0                      	12832
+#define ExpMask2                      	12864
+#define L2                            	12896
+#define dBigThreshold                 	12928
+#define dC1                           	12960
+#define dC2                           	12992
+#define dC3                           	13024
+#define dC4                           	13056
+#define dC5                           	13088
+#define dLargestFinite                	13120
+#define dThirtyOne                    	13152
+#define dTopMask12                    	13184
+#define dTopMask29                    	13216
+#define XScale                        	13248
+
+/* Lookup bias for data table __svml_dacosh_data_internal.  */
+#define Table_Lookup_Bias               -0x405fe0
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN4v_acosh_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+        lea       Table_Lookup_Bias+__svml_dacosh_data_internal(%rip), %r8
+
+/* Load the constant 1 */
+        vmovupd   One+__svml_dacosh_data_internal(%rip), %ymm8
+
+/*
+ * Now       1 / (1 + d)
+ * = 1 / (1 + (sqrt(1 - e) - 1))
+ * = 1 / sqrt(1 - e)
+ * = 1 + 1/2 * e + 3/8 * e^2 + 5/16 * e^3 + 35/128 * e^4 +
+ * 63/256 * e^5 + 231/1024 * e^6 + ....
+ * So compute the first five nonconstant terms of that, so that
+ * we have a relative correction (1 + Corr) to apply to S etc.
+ * C1 = 1/2
+ * C2 = 3/8
+ * C3 = 5/16
+ * C4 = 35/128
+ * C5 = 63/256
+ */
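+
+/* The correction below is evaluated in Horner form with FMAs:
+ *   Corr = e * (C1 + e * (C2 + e * (C3 + e * (C4 + e * C5))))
+ * where e = -(2 * d + d^2) is computed further down.
+ */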
+        vmovupd   dC5+__svml_dacosh_data_internal(%rip), %ymm3
+        vmovapd   %ymm0, %ymm9
+        vmovapd   %ymm8, %ymm13
+        vfmsub231pd %ymm9, %ymm9, %ymm13
+
+/*
+ * Check that 1 < X < +inf; otherwise go to the callout function.
+ * We need the callout for X = 1 to avoid division by zero below.
+ * This test ensures that the callout handles NaN and either infinity.
+ */
+        vcmpnle_uqpd dLargestFinite+__svml_dacosh_data_internal(%rip), %ymm9, %ymm10
+        vcmpngt_uqpd %ymm8, %ymm9, %ymm11
+
+/* dU is needed later on */
+        vsubpd    %ymm8, %ymm9, %ymm6
+
+/*
+ * The following computation can go wrong for very large X, e.g.
+ * the X^2 - 1 = U * V can overflow. But for large X we have
+ * acosh(X) / log(2 X) - 1 =~= 1/(4 * X^2), so for X >= 2^30
+ * we can just later stick X back into the log and tweak up the exponent.
+ * Actually we scale X by 2^-30 and tweak the exponent up by 31,
+ * to stay in the safe range for the later log computation.
+ * Compute a flag now telling us when to do this.
+ */
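+
+/* Equivalently, for X >= 2^30 the result is taken as
+ *   log(2 * X) = log(X * 2^-30) + 31 * log(2),
+ * which is why XScale below is 2^-30 and dThirtyOne is 31.0.
+ */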
+        vcmplt_oqpd dBigThreshold+__svml_dacosh_data_internal(%rip), %ymm9, %ymm7
+
+/*
+ * do the same thing but with NR iteration
+ * Finally, express Y + W = U * V accurately where Y has <= 29 bits
+ */
+        vandpd    dTopMask29+__svml_dacosh_data_internal(%rip), %ymm13, %ymm5
+
+/*
+ * Compute R = 1/sqrt(Y + W) * (1 + d)
+ * Force R to <= 12 significant bits in case it isn't already
+ * This means that R * Y and R^2 * Y are exactly representable.
+ */
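+
+/* R is obtained from the ~12-bit vrsqrtps estimate of (float) (Y + W),
+ * widened back to double and truncated with dTopMask12.
+ */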
+        vcvtpd2ps %ymm5, %xmm14
+        vsubpd    %ymm5, %ymm13, %ymm4
+        vrsqrtps  %xmm14, %xmm15
+        vcvtps2pd %xmm15, %ymm0
+        vandpd    dTopMask12+__svml_dacosh_data_internal(%rip), %ymm0, %ymm2
+        vorpd     %ymm11, %ymm10, %ymm12
+
+/*
+ * Compute S = (Y/sqrt(Y + W)) * (1 + d)
+ * and T = (W/sqrt(Y + W)) * (1 + d)
+ * so that S + T = sqrt(Y + W) * (1 + d)
+ * S is exact, and the rounding error in T is OK.
+ */
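+
+/* In terms of R this is simply S = R * Y and T = R * W; R * Y is exact
+ * because R has at most 12 significant bits and Y at most 29.
+ */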
+        vmulpd    %ymm2, %ymm5, %ymm10
+        vmulpd    %ymm4, %ymm2, %ymm11
+
+/*
+ * Compute e = -(2 * d + d^2)
+ * The first FMR is exact, and the rounding error in the other is acceptable
+ * since d and e are ~ 2^-12
+ */
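+
+/* Concretely e = 1 - R * S - R * T = 1 - R^2 * (Y + W); with
+ * R = (1 + d)/sqrt(Y + W) this is 1 - (1 + d)^2 = -(2 * d + d^2).
+ */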
+        vmovapd   %ymm8, %ymm1
+        vfnmadd231pd %ymm10, %ymm2, %ymm1
+
+/*
+ * For low-accuracy versions, the computation can be done
+ * just as U + ((S + T) + (S + T) * Corr)
+ */
+        vaddpd    %ymm11, %ymm10, %ymm13
+        vfnmadd231pd %ymm11, %ymm2, %ymm1
+        vfmadd213pd dC4+__svml_dacosh_data_internal(%rip), %ymm1, %ymm3
+        vfmadd213pd dC3+__svml_dacosh_data_internal(%rip), %ymm1, %ymm3
+        vfmadd213pd dC2+__svml_dacosh_data_internal(%rip), %ymm1, %ymm3
+        vfmadd213pd dC1+__svml_dacosh_data_internal(%rip), %ymm1, %ymm3
+        vmovmskpd %ymm12, %eax
+        vmulpd    %ymm3, %ymm1, %ymm12
+
+/* Now multiplex to the case X = 2^-30 * input, Xl = dL = 0 in the "big" case. */
+        vmulpd    XScale+__svml_dacosh_data_internal(%rip), %ymm9, %ymm3
+        vfmadd213pd %ymm13, %ymm12, %ymm13
+        vaddpd    %ymm13, %ymm6, %ymm6
+
+/*
+ * Now we feed into the log1p code, using H in place of _VARG1 and
+ * also adding L into Xl.
+ * compute 1+x as high, low parts
+ */
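+
+/* The split is the usual high/low decomposition: with A = max(1, x)
+ * and B = min(1, x),
+ *   Xh = (A + B) truncated by XhMask (skipped when |x| < XThreshold),
+ *   Xl = (A - Xh) + B.
+ */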
+        vmaxpd    %ymm6, %ymm8, %ymm4
+        vminpd    %ymm6, %ymm8, %ymm2
+        vandpd    SgnMask+__svml_dacosh_data_internal(%rip), %ymm6, %ymm14
+        vcmplt_oqpd XThreshold+__svml_dacosh_data_internal(%rip), %ymm14, %ymm15
+        vaddpd    %ymm2, %ymm4, %ymm0
+        vorpd     XhMask+__svml_dacosh_data_internal(%rip), %ymm15, %ymm5
+        vandpd    %ymm5, %ymm0, %ymm6
+        vblendvpd %ymm7, %ymm6, %ymm3, %ymm5
+        vsubpd    %ymm6, %ymm4, %ymm1
+
+/* 2^(-10-exp(X)) */
+        vmovupd   ExpMask2+__svml_dacosh_data_internal(%rip), %ymm15
+        vaddpd    %ymm1, %ymm2, %ymm10
+
+/* exponent bits */
+        vpsrlq    $20, %ymm5, %ymm2
+
+/*
+ * Now resume the main code.
+ * preserve mantissa, set input exponent to 2^(-10)
+ */
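+
+/* From here on this is a standard table-based log: the (possibly
+ * rescaled) argument is treated as 2^k * m, a ~10-bit reciprocal of m
+ * is formed and rounded, the reduced argument is r = x * rcp - 1, and
+ * the result is assembled roughly as k * log(2) + table(rcp) + poly(r).
+ */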
+        vandpd    ExpMask+__svml_dacosh_data_internal(%rip), %ymm5, %ymm11
+        vorpd     Two10+__svml_dacosh_data_internal(%rip), %ymm11, %ymm12
+
+/* reciprocal approximation good to at least 11 bits */
+        vcvtpd2ps %ymm12, %xmm13
+        vrcpps    %xmm13, %xmm14
+
+/* exponent*log(2.0) */
+        vmovupd   Threshold+__svml_dacosh_data_internal(%rip), %ymm13
+        vcvtps2pd %xmm14, %ymm3
+        vandpd    %ymm7, %ymm10, %ymm4
+
+/* exponent of X needed to scale Xl */
+        vandps    ExpMask0+__svml_dacosh_data_internal(%rip), %ymm5, %ymm0
+        vpsubq    %ymm0, %ymm15, %ymm6
+
+/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
+        vroundpd  $0, %ymm3, %ymm3
+        vextractf128 $1, %ymm2, %xmm1
+        vshufps   $221, %xmm1, %xmm2, %xmm10
+
+/* biased exponent in DP format */
+        vcvtdq2pd %xmm10, %ymm12
+
+/* scale DblRcp */
+        vmulpd    %ymm6, %ymm3, %ymm2
+
+/* Add 31 to the exponent in the "large" case to get log(2 * input) */
+        vaddpd    dThirtyOne+__svml_dacosh_data_internal(%rip), %ymm12, %ymm11
+
+/* argument reduction */
+        vfmsub213pd %ymm8, %ymm2, %ymm5
+        vmulpd    %ymm2, %ymm4, %ymm8
+        vmovupd   poly_coeff+64+__svml_dacosh_data_internal(%rip), %ymm2
+        vblendvpd %ymm7, %ymm12, %ymm11, %ymm1
+
+/*
+ * prepare table index
+ * table lookup
+ */
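+
+/* The index is taken from the high bits of the rounded reciprocal
+ * (vpsrlq $40) and used directly as a byte offset from %r8, which
+ * already includes Table_Lookup_Bias.
+ */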
+        vpsrlq    $40, %ymm3, %ymm7
+        vcmplt_oqpd %ymm3, %ymm13, %ymm3
+        vandpd    Bias+__svml_dacosh_data_internal(%rip), %ymm3, %ymm14
+        vorpd     Bias1+__svml_dacosh_data_internal(%rip), %ymm14, %ymm15
+        vsubpd    %ymm15, %ymm1, %ymm1
+
+/* polynomial */
+        vmovupd   poly_coeff+__svml_dacosh_data_internal(%rip), %ymm3
+        vmovd     %xmm7, %edx
+        vextractf128 $1, %ymm7, %xmm10
+        vpextrd   $2, %xmm7, %ecx
+        vmulpd    L2+__svml_dacosh_data_internal(%rip), %ymm1, %ymm7
+        vaddpd    %ymm8, %ymm5, %ymm1
+        vmovd     %xmm10, %esi
+        vsubpd    %ymm5, %ymm1, %ymm5
+        vfmadd213pd poly_coeff+32+__svml_dacosh_data_internal(%rip), %ymm1, %ymm3
+        vfmadd213pd poly_coeff+96+__svml_dacosh_data_internal(%rip), %ymm1, %ymm2
+        vsubpd    %ymm5, %ymm8, %ymm4
+        vmulpd    %ymm1, %ymm1, %ymm8
+        vfmadd213pd %ymm2, %ymm8, %ymm3
+        movslq    %edx, %rdx
+        movslq    %esi, %rsi
+        vpextrd   $2, %xmm10, %edi
+        movslq    %ecx, %rcx
+        movslq    %edi, %rdi
+
+/*
+ * reconstruction
+ * VQFMA( D, R, P, R2, R );
+ */
+        vfmadd213pd %ymm4, %ymm8, %ymm3
+        vmovsd    (%r8,%rdx), %xmm0
+        vmovsd    (%r8,%rsi), %xmm11
+        vmovhpd   (%r8,%rcx), %xmm0, %xmm6
+        vmovhpd   (%r8,%rdi), %xmm11, %xmm12
+        vinsertf128 $1, %xmm12, %ymm6, %ymm0
+        vaddpd    %ymm3, %ymm1, %ymm6
+        vaddpd    %ymm6, %ymm0, %ymm0
+        vaddpd    %ymm0, %ymm7, %ymm0
+        testl     %eax, %eax
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm9
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovupd   %ymm9, 32(%rsp)
+        vmovupd   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0
+
+        xorl      %edx, %edx
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovupd   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      acosh@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 64(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN4v_acosh_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_dacosh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(32)) VUINT32 Log_HA_table[(1<<10)+2][2];
+        __declspec(align(32)) VUINT32 Log_LA_table[(1<<9)+1][2];
+        __declspec(align(32)) VUINT32 poly_coeff[4][4][2];
+        __declspec(align(32)) VUINT32 ExpMask[4][2];
+        __declspec(align(32)) VUINT32 Two10[4][2];
+        __declspec(align(32)) VUINT32 MinLog1p[4][2];
+        __declspec(align(32)) VUINT32 MaxLog1p[4][2];
+        __declspec(align(32)) VUINT32 One[4][2];
+        __declspec(align(32)) VUINT32 SgnMask[4][2];
+        __declspec(align(32)) VUINT32 XThreshold[4][2];
+        __declspec(align(32)) VUINT32 XhMask[4][2];
+        __declspec(align(32)) VUINT32 Threshold[4][2];
+        __declspec(align(32)) VUINT32 Bias[4][2];
+        __declspec(align(32)) VUINT32 Bias1[4][2];
+        __declspec(align(32)) VUINT32 ExpMask0[4][2];
+        __declspec(align(32)) VUINT32 ExpMask2[4][2];
+        __declspec(align(32)) VUINT32 L2[4][2];
+        __declspec(align(32)) VUINT32 dBigThreshold[4][2];
+        __declspec(align(32)) VUINT32 dC1[4][2];
+        __declspec(align(32)) VUINT32 dC2[4][2];
+        __declspec(align(32)) VUINT32 dC3[4][2];
+        __declspec(align(32)) VUINT32 dC4[4][2];
+        __declspec(align(32)) VUINT32 dC5[4][2];
+        __declspec(align(32)) VUINT32 dLargestFinite[4][2];
+        __declspec(align(32)) VUINT32 dThirtyOne[4][2];
+        __declspec(align(32)) VUINT32 dTopMask12[4][2];
+        __declspec(align(32)) VUINT32 dTopMask29[4][2];
+        __declspec(align(32)) VUINT32 XScale[4][2];
+} __svml_dacosh_data_internal;
+#endif
+__svml_dacosh_data_internal:
+        /*== Log_HA_table ==*/
+        .quad 0xc086232bdd7a8300, 0xbe1ce91eef3fb100
+        .quad 0xc086232fdc7ad828, 0xbe1cefcffda73b6a
+        .quad 0xc0862333d97d2ba0, 0xbe1cef406748f1ff
+        .quad 0xc0862337d48378e0, 0xbe1cef2a9429925a
+        .quad 0xc086233bcd8fb878, 0xbe1cf138d17ebecb
+        .quad 0xc086233fc4a3e018, 0xbe1ceff2dbbbb29e
+        .quad 0xc0862343b9c1e270, 0xbe1cf1a42aae437b
+        .quad 0xc0862347acebaf68, 0xbe1cef3b152048af
+        .quad 0xc086234b9e2333f0, 0xbe1cef20e127805e
+        .quad 0xc086234f8d6a5a30, 0xbe1cf00ad6052cf4
+        .quad 0xc08623537ac30980, 0xbe1cefc4642ee597
+        .quad 0xc0862357662f2660, 0xbe1cf1f277d36e16
+        .quad 0xc086235b4fb092a0, 0xbe1ceed009e8d8e6
+        .quad 0xc086235f37492d28, 0xbe1cf1e4038cb362
+        .quad 0xc08623631cfad250, 0xbe1cf0b0873b8557
+        .quad 0xc086236700c75b98, 0xbe1cf15bb3227c0b
+        .quad 0xc086236ae2b09fe0, 0xbe1cf151ef8ca9ed
+        .quad 0xc086236ec2b87358, 0xbe1cefe1dc2cd2ed
+        .quad 0xc0862372a0e0a780, 0xbe1cf0d1eec5454f
+        .quad 0xc08623767d2b0b48, 0xbe1ceeefd570bbce
+        .quad 0xc086237a57996af0, 0xbe1cee99ae91b3a7
+        .quad 0xc086237e302d9028, 0xbe1cf0412830fbd1
+        .quad 0xc086238206e94218, 0xbe1ceee898588610
+        .quad 0xc0862385dbce4548, 0xbe1cee9a1fbcaaea
+        .quad 0xc0862389aede5bc0, 0xbe1ceed8e7cc1ad6
+        .quad 0xc086238d801b4500, 0xbe1cf10c8d059da6
+        .quad 0xc08623914f86be18, 0xbe1ceee6c63a8165
+        .quad 0xc08623951d228180, 0xbe1cf0c3592d2ff1
+        .quad 0xc0862398e8f04758, 0xbe1cf0026cc4cb1b
+        .quad 0xc086239cb2f1c538, 0xbe1cf15d48d8e670
+        .quad 0xc08623a07b28ae60, 0xbe1cef359363787c
+        .quad 0xc08623a44196b390, 0xbe1cefdf1ab2e82c
+        .quad 0xc08623a8063d8338, 0xbe1cefe43c02aa84
+        .quad 0xc08623abc91ec960, 0xbe1cf044f5ae35b7
+        .quad 0xc08623af8a3c2fb8, 0xbe1cf0b0b4001e1b
+        .quad 0xc08623b349975d98, 0xbe1cf1bae76dfbcf
+        .quad 0xc08623b70731f810, 0xbe1cef0a72e13a62
+        .quad 0xc08623bac30da1c8, 0xbe1cf184007d2b6b
+        .quad 0xc08623be7d2bfb40, 0xbe1cf16f4b239e98
+        .quad 0xc08623c2358ea2a0, 0xbe1cf0976acada87
+        .quad 0xc08623c5ec3733d0, 0xbe1cf066318a16ff
+        .quad 0xc08623c9a1274880, 0xbe1ceffaa7148798
+        .quad 0xc08623cd54607820, 0xbe1cf23ab02e9b6e
+        .quad 0xc08623d105e45800, 0xbe1cefdfef7d4fde
+        .quad 0xc08623d4b5b47b20, 0xbe1cf17fece44f2b
+        .quad 0xc08623d863d27270, 0xbe1cf18f907d0d7c
+        .quad 0xc08623dc103fccb0, 0xbe1cee61fe072c98
+        .quad 0xc08623dfbafe1668, 0xbe1cf022dd891e2f
+        .quad 0xc08623e3640eda20, 0xbe1ceecc1daf4358
+        .quad 0xc08623e70b73a028, 0xbe1cf0173c4fa380
+        .quad 0xc08623eab12deec8, 0xbe1cf16a2150c2f4
+        .quad 0xc08623ee553f4a30, 0xbe1cf1bf980b1f4b
+        .quad 0xc08623f1f7a93480, 0xbe1cef8b731663c2
+        .quad 0xc08623f5986d2dc0, 0xbe1cee9a664d7ef4
+        .quad 0xc08623f9378cb3f0, 0xbe1cf1eda2af6400
+        .quad 0xc08623fcd5094320, 0xbe1cf1923f9d68d7
+        .quad 0xc086240070e45548, 0xbe1cf0747cd3e03a
+        .quad 0xc08624040b1f6260, 0xbe1cf22ee855bd6d
+        .quad 0xc0862407a3bbe078, 0xbe1cf0d57360c00b
+        .quad 0xc086240b3abb4398, 0xbe1ceebc815cd575
+        .quad 0xc086240ed01efdd0, 0xbe1cf03bfb970951
+        .quad 0xc086241263e87f50, 0xbe1cf16e74768529
+        .quad 0xc0862415f6193658, 0xbe1cefec64b8becb
+        .quad 0xc086241986b28f30, 0xbe1cf0838d210baa
+        .quad 0xc086241d15b5f448, 0xbe1cf0ea86e75b11
+        .quad 0xc0862420a324ce28, 0xbe1cf1708d11d805
+        .quad 0xc08624242f008380, 0xbe1ceea988c5a417
+        .quad 0xc0862427b94a7910, 0xbe1cef166a7bbca5
+        .quad 0xc086242b420411d0, 0xbe1cf0c9d9e86a38
+        .quad 0xc086242ec92eaee8, 0xbe1cef0946455411
+        .quad 0xc08624324ecbaf98, 0xbe1cefea60907739
+        .quad 0xc0862435d2dc7160, 0xbe1cf1ed0934ce42
+        .quad 0xc086243955624ff8, 0xbe1cf191ba746c7d
+        .quad 0xc086243cd65ea548, 0xbe1ceeec78cf2a7e
+        .quad 0xc086244055d2c968, 0xbe1cef345284c119
+        .quad 0xc0862443d3c012b8, 0xbe1cf24f77355219
+        .quad 0xc08624475027d5e8, 0xbe1cf05bf087e114
+        .quad 0xc086244acb0b65d0, 0xbe1cef3504a32189
+        .quad 0xc086244e446c1398, 0xbe1ceff54b2a406f
+        .quad 0xc0862451bc4b2eb8, 0xbe1cf0757d54ed4f
+        .quad 0xc086245532aa04f0, 0xbe1cf0c8099fdfd5
+        .quad 0xc0862458a789e250, 0xbe1cf0b173796a31
+        .quad 0xc086245c1aec1138, 0xbe1cf11d8734540d
+        .quad 0xc086245f8cd1da60, 0xbe1cf1916a723ceb
+        .quad 0xc0862462fd3c84d8, 0xbe1cf19a911e1da7
+        .quad 0xc08624666c2d5608, 0xbe1cf23a9ef72e4f
+        .quad 0xc0862469d9a591c0, 0xbe1cef503d947663
+        .quad 0xc086246d45a67a18, 0xbe1cf0fceeb1a0b2
+        .quad 0xc0862470b0314fa8, 0xbe1cf107e27e4fbc
+        .quad 0xc086247419475160, 0xbe1cf03dd9922331
+        .quad 0xc086247780e9bc98, 0xbe1cefce1a10e129
+        .quad 0xc086247ae719cd18, 0xbe1ceea47f73c4f6
+        .quad 0xc086247e4bd8bd10, 0xbe1ceec0ac56d100
+        .quad 0xc0862481af27c528, 0xbe1cee8a6593278a
+        .quad 0xc086248511081c70, 0xbe1cf2231dd9dec7
+        .quad 0xc0862488717af888, 0xbe1cf0b4b8ed7da8
+        .quad 0xc086248bd0818d68, 0xbe1cf1bd8d835002
+        .quad 0xc086248f2e1d0d98, 0xbe1cf259acc107f4
+        .quad 0xc08624928a4eaa20, 0xbe1cee897636b00c
+        .quad 0xc0862495e5179270, 0xbe1cee757f20c326
+        .quad 0xc08624993e78f490, 0xbe1cefafd3aa54a4
+        .quad 0xc086249c9673fd10, 0xbe1cee7298d38b97
+        .quad 0xc086249fed09d6f8, 0xbe1ceedc158d4ceb
+        .quad 0xc08624a3423babe0, 0xbe1cf2282987cb2e
+        .quad 0xc08624a6960aa400, 0xbe1cefe7381ecc4b
+        .quad 0xc08624a9e877e600, 0xbe1cef328dbbce80
+        .quad 0xc08624ad39849728, 0xbe1cefde45f3cc71
+        .quad 0xc08624b08931db58, 0xbe1cefa8b89433b9
+        .quad 0xc08624b3d780d500, 0xbe1cef6773c0b139
+        .quad 0xc08624b72472a528, 0xbe1cf031c931c11f
+        .quad 0xc08624ba70086b78, 0xbe1cf088f49275e7
+        .quad 0xc08624bdba434630, 0xbe1cf17de0eaa86d
+        .quad 0xc08624c103245238, 0xbe1cefd492f1ba75
+        .quad 0xc08624c44aacab08, 0xbe1cf1253e154466
+        .quad 0xc08624c790dd6ad0, 0xbe1cf0fb09ee6d55
+        .quad 0xc08624cad5b7aa58, 0xbe1cf1f08dd048fe
+        .quad 0xc08624ce193c8120, 0xbe1ceeca0809697f
+        .quad 0xc08624d15b6d0538, 0xbe1cef8d5662d968
+        .quad 0xc08624d49c4a4b78, 0xbe1cee97b556ed78
+        .quad 0xc08624d7dbd56750, 0xbe1cf1b14b6acb75
+        .quad 0xc08624db1a0f6b00, 0xbe1cef1e860623f2
+        .quad 0xc08624de56f96758, 0xbe1ceeaf4d156f3d
+        .quad 0xc08624e192946bf0, 0xbe1ceecc12b400ed
+        .quad 0xc08624e4cce18710, 0xbe1cf180c40c794f
+        .quad 0xc08624e805e1c5c8, 0xbe1cf185a08f7f65
+        .quad 0xc08624eb3d9633d8, 0xbe1cef45fc924078
+        .quad 0xc08624ee73ffdbb0, 0xbe1cf1e4f457f32a
+        .quad 0xc08624f1a91fc6a0, 0xbe1cf040147b8a5a
+        .quad 0xc08624f4dcf6fc98, 0xbe1cf1effca0dfb2
+        .quad 0xc08624f80f868468, 0xbe1cf0470146e5bc
+        .quad 0xc08624fb40cf6390, 0xbe1cef4dd186e501
+        .quad 0xc08624fe70d29e60, 0xbe1ceebe257f66c7
+        .quad 0xc08625019f9137f0, 0xbe1ceefb7a1c395c
+        .quad 0xc0862504cd0c3220, 0xbe1cf209dedfed8c
+        .quad 0xc0862507f9448db0, 0xbe1cf082da464994
+        .quad 0xc086250b243b4a18, 0xbe1cee88694a73cf
+        .quad 0xc086250e4df165a0, 0xbe1cf0b61e8f0531
+        .quad 0xc08625117667dd78, 0xbe1cf1106599c962
+        .quad 0xc08625149d9fad98, 0xbe1ceff1ee88af1f
+        .quad 0xc0862517c399d0c8, 0xbe1cf0f746994ef6
+        .quad 0xc086251ae85740b8, 0xbe1cefe8a1d077e4
+        .quad 0xc086251e0bd8f5e0, 0xbe1cf1a1da036092
+        .quad 0xc08625212e1fe7a8, 0xbe1cf0f8a7786fcd
+        .quad 0xc08625244f2d0c48, 0xbe1cefa1174a07a7
+        .quad 0xc08625276f0158d8, 0xbe1cef1043aa5b25
+        .quad 0xc086252a8d9dc150, 0xbe1cf15d521c169d
+        .quad 0xc086252dab033898, 0xbe1cf220bba8861f
+        .quad 0xc0862530c732b078, 0xbe1cef51e310eae2
+        .quad 0xc0862533e22d1988, 0xbe1cf222fcedd8ae
+        .quad 0xc0862536fbf36370, 0xbe1cefdb4da4bda8
+        .quad 0xc086253a14867ca0, 0xbe1ceeafc1112171
+        .quad 0xc086253d2be75280, 0xbe1cee99dfb4b408
+        .quad 0xc08625404216d160, 0xbe1cf22d2536f06b
+        .quad 0xc08625435715e498, 0xbe1cef6abbf2e268
+        .quad 0xc08625466ae57648, 0xbe1cf093a14789f5
+        .quad 0xc08625497d866fa0, 0xbe1cf0f93655603c
+        .quad 0xc086254c8ef9b8b8, 0xbe1cf1cc40c9aafc
+        .quad 0xc086254f9f4038a8, 0xbe1ceeea5f4e9157
+        .quad 0xc0862552ae5ad568, 0xbe1cefa9f52d4997
+        .quad 0xc0862555bc4a7400, 0xbe1cefa490a638ff
+        .quad 0xc0862558c90ff868, 0xbe1cef7fcf797d6f
+        .quad 0xc086255bd4ac4590, 0xbe1cf1b4c51113c9
+        .quad 0xc086255edf203d78, 0xbe1cef55e5b4a55d
+        .quad 0xc0862561e86cc100, 0xbe1cf0d37a25f9dc
+        .quad 0xc0862564f092b028, 0xbe1ceebe9efc19d9
+        .quad 0xc0862567f792e9d8, 0xbe1cee8ad30a57b5
+        .quad 0xc086256afd6e4c08, 0xbe1cef4e1817b90b
+        .quad 0xc086256e0225b3b8, 0xbe1cee7fa9229996
+        .quad 0xc086257105b9fce0, 0xbe1cf0b54963d945
+        .quad 0xc0862574082c0298, 0xbe1cee5f2f3c7995
+        .quad 0xc0862577097c9ee0, 0xbe1cf0828e303a2c
+        .quad 0xc086257a09acaae0, 0xbe1cf172c3078947
+        .quad 0xc086257d08bcfec0, 0xbe1cf189252afa22
+        .quad 0xc086258006ae71b8, 0xbe1cefdb80426923
+        .quad 0xc08625830381da08, 0xbe1ceef1391a0372
+        .quad 0xc0862585ff380d00, 0xbe1cf17720c78d13
+        .quad 0xc0862588f9d1df18, 0xbe1ceef1f9027d83
+        .quad 0xc086258bf35023b8, 0xbe1cf06fac99dec9
+        .quad 0xc086258eebb3ad78, 0xbe1cf1373eeb45c0
+        .quad 0xc0862591e2fd4e00, 0xbe1cef777536bb81
+        .quad 0xc0862594d92dd600, 0xbe1cf0f43ca40766
+        .quad 0xc0862597ce461558, 0xbe1cefb2cfc6766b
+        .quad 0xc086259ac246daf0, 0xbe1ceea49e64ffa2
+        .quad 0xc086259db530f4c8, 0xbe1cf250fa457dec
+        .quad 0xc08625a0a7053018, 0xbe1cf17d8bb2a44e
+        .quad 0xc08625a397c45918, 0xbe1cf1d5906d54b7
+        .quad 0xc08625a6876f3b30, 0xbe1cf08fe7b31780
+        .quad 0xc08625a97606a0e0, 0xbe1cef13edfc9d11
+        .quad 0xc08625ac638b53c8, 0xbe1cef9d2b107219
+        .quad 0xc08625af4ffe1cb0, 0xbe1cf1ddd4ff6160
+        .quad 0xc08625b23b5fc390, 0xbe1cefa02a996495
+        .quad 0xc08625b525b10f68, 0xbe1cf166a7e37ee5
+        .quad 0xc08625b80ef2c680, 0xbe1cef0b171068a5
+        .quad 0xc08625baf725ae28, 0xbe1cf05c80779283
+        .quad 0xc08625bdde4a8af0, 0xbe1cf1bbfbffb889
+        .quad 0xc08625c0c4622090, 0xbe1cf0b8666c0124
+        .quad 0xc08625c3a96d31e0, 0xbe1cf0a8fcf47a86
+        .quad 0xc08625c68d6c80f0, 0xbe1cef46e18cb092
+        .quad 0xc08625c97060cef0, 0xbe1cf1458a350efb
+        .quad 0xc08625cc524adc58, 0xbe1ceeea1dadce12
+        .quad 0xc08625cf332b68b0, 0xbe1cf0a1bfdc44c7
+        .quad 0xc08625d2130332d0, 0xbe1cef96d02da73e
+        .quad 0xc08625d4f1d2f8a8, 0xbe1cf2451c3c7701
+        .quad 0xc08625d7cf9b7778, 0xbe1cf10d08f83812
+        .quad 0xc08625daac5d6ba0, 0xbe1ceec5b4895c5e
+        .quad 0xc08625dd881990b0, 0xbe1cf14e1325c5e4
+        .quad 0xc08625e062d0a188, 0xbe1cf21d0904be12
+        .quad 0xc08625e33c835838, 0xbe1ceed0839bcf21
+        .quad 0xc08625e615326df0, 0xbe1cf1bb944889d2
+        .quad 0xc08625e8ecde9b48, 0xbe1cee738e85eece
+        .quad 0xc08625ebc38897e0, 0xbe1cf25c2bc6ef12
+        .quad 0xc08625ee99311ac8, 0xbe1cf132b70a41ad
+        .quad 0xc08625f16dd8da28, 0xbe1cf1984236a6e3
+        .quad 0xc08625f441808b78, 0xbe1cf19ae74998f9
+        .quad 0xc08625f71428e370, 0xbe1cef3e175d61a1
+        .quad 0xc08625f9e5d295f8, 0xbe1cf101f9868fd9
+        .quad 0xc08625fcb67e5658, 0xbe1cee69db83dcd2
+        .quad 0xc08625ff862cd6f8, 0xbe1cf081b636af51
+        .quad 0xc086260254dec9a8, 0xbe1cee62c7d59b3e
+        .quad 0xc08626052294df58, 0xbe1cf1b745c57716
+        .quad 0xc0862607ef4fc868, 0xbe1cef3d2800ea23
+        .quad 0xc086260abb103458, 0xbe1cef480ff1acd2
+        .quad 0xc086260d85d6d200, 0xbe1cf2424c9a17ef
+        .quad 0xc08626104fa44f90, 0xbe1cf12cfde90fd5
+        .quad 0xc086261318795a68, 0xbe1cf21f590dd5b6
+        .quad 0xc0862615e0569f48, 0xbe1cf0c50f9cd28a
+        .quad 0xc0862618a73cca30, 0xbe1ceedbdb520545
+        .quad 0xc086261b6d2c8668, 0xbe1cf0b030396011
+        .quad 0xc086261e32267e98, 0xbe1cf19917010e96
+        .quad 0xc0862620f62b5cb0, 0xbe1cf07331355985
+        .quad 0xc0862623b93bc9e8, 0xbe1cf01ae921a1c3
+        .quad 0xc08626267b586ed0, 0xbe1cefe5cf0dbf0c
+        .quad 0xc08626293c81f348, 0xbe1cf01b258aeb50
+        .quad 0xc086262bfcb8fe88, 0xbe1cee6b9e7f4c68
+        .quad 0xc086262ebbfe3710, 0xbe1cee684a9b21c9
+        .quad 0xc08626317a5242b8, 0xbe1cf1f8bcde9a8b
+        .quad 0xc086263437b5c6c0, 0xbe1cf1d063d36238
+        .quad 0xc0862636f42967a8, 0xbe1cf1e31a19075e
+        .quad 0xc0862639afadc950, 0xbe1cf1d8efdf7e7d
+        .quad 0xc086263c6a438ef0, 0xbe1cf1812ee72dba
+        .quad 0xc086263f23eb5b18, 0xbe1cf1449a9a2279
+        .quad 0xc0862641dca5cfb8, 0xbe1cee96edce5085
+        .quad 0xc086264494738e08, 0xbe1cf06797bd03b2
+        .quad 0xc08626474b5536b8, 0xbe1cef91b9b7ffc1
+        .quad 0xc086264a014b69c0, 0xbe1cef4b6721278f
+        .quad 0xc086264cb656c678, 0xbe1cf1942925eb4a
+        .quad 0xc086264f6a77eba8, 0xbe1cefa2c7bc2e39
+        .quad 0xc08626521daf7758, 0xbe1cf252595aceb3
+        .quad 0xc0862654cffe0718, 0xbe1cee8e9ae47ec2
+        .quad 0xc0862657816437a8, 0xbe1cf1bf913828fa
+        .quad 0xc086265a31e2a558, 0xbe1cf23475d6b366
+        .quad 0xc086265ce179ebc8, 0xbe1cef8df00a922b
+        .quad 0xc086265f902aa5f0, 0xbe1cef279bfa43e0
+        .quad 0xc08626623df56e38, 0xbe1cf080e10b8365
+        .quad 0xc0862664eadade70, 0xbe1cf1a518f9b544
+        .quad 0xc086266796db8fd0, 0xbe1cef9308fed9e9
+        .quad 0xc086266a41f81ae8, 0xbe1ceea3ae6b19c9
+        .quad 0xc086266cec3117b8, 0xbe1ceef06003d4c2
+        .quad 0xc086266f95871da8, 0xbe1cf0b8457ffb0c
+        .quad 0xc08626723dfac390, 0xbe1cf0c526745ad6
+        .quad 0xc0862674e58c9fa8, 0xbe1cf0cf91ff7b5d
+        .quad 0xc08626778c3d4798, 0xbe1cefe260819380
+        .quad 0xc086267a320d5070, 0xbe1ceebd90aa27a3
+        .quad 0xc086267cd6fd4ea8, 0xbe1cf0388121dffa
+        .quad 0xc086267f7b0dd630, 0xbe1cf1a3881435f1
+        .quad 0xc08626821e3f7a68, 0xbe1cef28e9d9ac52
+        .quad 0xc0862684c092ce08, 0xbe1cf02d300062dd
+        .quad 0xc086268762086350, 0xbe1cefaee1edfa35
+        .quad 0xc086268a02a0cbe0, 0xbe1cf0a5a052e936
+        .quad 0xc086268ca25c98d8, 0xbe1cee60a4a497ed
+        .quad 0xc086268f413c5ab0, 0xbe1cf0e4a5d0cf49
+        .quad 0xc0862691df40a170, 0xbe1cf149235a4e6e
+        .quad 0xc08626947c69fc80, 0xbe1cf215180b9fcc
+        .quad 0xc086269718b8fac8, 0xbe1cef9b156a9840
+        .quad 0xc0862699b42e2a90, 0xbe1cf054c91441be
+        .quad 0xc086269c4eca19a8, 0xbe1cf13ded26512c
+        .quad 0xc086269ee88d5550, 0xbe1cf22ea4d8ac06
+        .quad 0xc08626a181786a40, 0xbe1cf2354666ee2e
+        .quad 0xc08626a4198be4a8, 0xbe1cefef936752b3
+        .quad 0xc08626a6b0c85020, 0xbe1cf1e360a9db68
+        .quad 0xc08626a9472e37d8, 0xbe1ceed6aeb812c5
+        .quad 0xc08626abdcbe2650, 0xbe1cf227340b4986
+        .quad 0xc08626ae7178a5b0, 0xbe1cf0215a0cbe0d
+        .quad 0xc08626b1055e3f70, 0xbe1cf256adf0ae26
+        .quad 0xc08626b3986f7ca8, 0xbe1ceff3c67aed06
+        .quad 0xc08626b62aace5c8, 0xbe1cf2159fb93652
+        .quad 0xc08626b8bc1702e0, 0xbe1cf01e6dbd1c7f
+        .quad 0xc08626bb4cae5b60, 0xbe1cf009e75d1c0c
+        .quad 0xc08626bddc737648, 0xbe1ceec10a020e73
+        .quad 0xc08626c06b66da08, 0xbe1cf06d5783eee7
+        .quad 0xc08626c2f9890ca0, 0xbe1cf0cb8f169ffe
+        .quad 0xc08626c586da9388, 0xbe1cef7de2452430
+        .quad 0xc08626c8135bf3b0, 0xbe1cf05da6f783ae
+        .quad 0xc08626ca9f0db198, 0xbe1cefcc877d681d
+        .quad 0xc08626cd29f05138, 0xbe1cef0531954ab3
+        .quad 0xc08626cfb4045608, 0xbe1cf06b8565ea3d
+        .quad 0xc08626d23d4a4310, 0xbe1cefdc455d9d7e
+        .quad 0xc08626d4c5c29ad0, 0xbe1ceefc47e8fa64
+        .quad 0xc08626d74d6ddf48, 0xbe1cf1872bf033f2
+        .quad 0xc08626d9d44c9210, 0xbe1cf19d91087f9d
+        .quad 0xc08626dc5a5f3438, 0xbe1cf012d444c6ab
+        .quad 0xc08626dedfa64650, 0xbe1cf0ba528ee153
+        .quad 0xc08626e164224880, 0xbe1ceeb431709788
+        .quad 0xc08626e3e7d3ba60, 0xbe1cf0b9af31a6a5
+        .quad 0xc08626e66abb1b28, 0xbe1cf168fb2e135b
+        .quad 0xc08626e8ecd8e990, 0xbe1cef9097461c93
+        .quad 0xc08626eb6e2da3d0, 0xbe1cee7a434735d8
+        .quad 0xc08626edeeb9c7a8, 0xbe1cf235732b86f2
+        .quad 0xc08626f06e7dd280, 0xbe1cefe1510b89e6
+        .quad 0xc08626f2ed7a4120, 0xbe1cf1f64b9b80ef
+        .quad 0xc08626f56baf9000, 0xbe1cf08f320ca339
+        .quad 0xc08626f7e91e3b08, 0xbe1cf1b1de2808a1
+        .quad 0xc08626fa65c6bdc0, 0xbe1cf1976d778b28
+        .quad 0xc08626fce1a99338, 0xbe1ceef40a4f076f
+        .quad 0xc08626ff5cc73600, 0xbe1cef3e45869ce3
+        .quad 0xc0862701d7202048, 0xbe1ceef601b4c9d6
+        .quad 0xc086270450b4cbc0, 0xbe1cf1eaf0b57fd6
+        .quad 0xc0862706c985b1c0, 0xbe1cef82a44990f3
+        .quad 0xc086270941934b10, 0xbe1ceefe32981f2c
+        .quad 0xc086270bb8de1018, 0xbe1cefbf6f5a0445
+        .quad 0xc086270e2f6678d0, 0xbe1cf18dba75792c
+        .quad 0xc0862710a52cfcc8, 0xbe1cf0da64ce995f
+        .quad 0xc08627131a321318, 0xbe1cef04ac0fb802
+        .quad 0xc08627158e763268, 0xbe1cee9d4e2ad9bd
+        .quad 0xc086271801f9d0f8, 0xbe1cefa9b55407b5
+        .quad 0xc086271a74bd64a0, 0xbe1cefe6bd329570
+        .quad 0xc086271ce6c162c8, 0xbe1cef0b1205dc85
+        .quad 0xc086271f58064068, 0xbe1cef092a785e3f
+        .quad 0xc0862721c88c7210, 0xbe1cf050dcdaac30
+        .quad 0xc086272438546be8, 0xbe1cf210907ded8b
+        .quad 0xc0862726a75ea1b8, 0xbe1cee760be44f99
+        .quad 0xc086272915ab86c0, 0xbe1ceeeee07c2bcc
+        .quad 0xc086272b833b8df0, 0xbe1cf06874992df5
+        .quad 0xc086272df00f29d0, 0xbe1cef8fac5d4899
+        .quad 0xc08627305c26cc70, 0xbe1cf1103241cc99
+        .quad 0xc0862732c782e788, 0xbe1cf1d35fef83fe
+        .quad 0xc08627353223ec68, 0xbe1cef3ec8133e1d
+        .quad 0xc08627379c0a4be8, 0xbe1cef7261daccd8
+        .quad 0xc086273a05367688, 0xbe1cf18656c50806
+        .quad 0xc086273c6da8dc68, 0xbe1cf1c8736e049a
+        .quad 0xc086273ed561ed38, 0xbe1cf1f93bff4911
+        .quad 0xc08627413c621848, 0xbe1cf188a4ea680c
+        .quad 0xc0862743a2a9cc80, 0xbe1cf1d270930c80
+        .quad 0xc086274608397868, 0xbe1cf25a328c28e2
+        .quad 0xc08627486d118a28, 0xbe1cf106f90aa3b8
+        .quad 0xc086274ad1326f80, 0xbe1cee5e9d2e885a
+        .quad 0xc086274d349c95c0, 0xbe1cf1c0bac27228
+        .quad 0xc086274f975069f8, 0xbe1cf1a1500f9b1c
+        .quad 0xc0862751f94e58c0, 0xbe1cefc30663ac44
+        .quad 0xc08627545a96ce48, 0xbe1cf17123e427a2
+        .quad 0xc0862756bb2a3678, 0xbe1cefb92749fea4
+        .quad 0xc08627591b08fcc0, 0xbe1cefa40e1ea74a
+        .quad 0xc086275b7a338c40, 0xbe1cee6f4612c3e9
+        .quad 0xc086275dd8aa4fa8, 0xbe1cf1c54a053627
+        .quad 0xc0862760366db168, 0xbe1ceff5eb503d9e
+        .quad 0xc0862762937e1b70, 0xbe1cf02e47f10cee
+        .quad 0xc0862764efdbf768, 0xbe1ceeb06e1d0dad
+        .quad 0xc08627674b87ae88, 0xbe1cf10aadd6dba5
+        .quad 0xc0862769a681a9c0, 0xbe1cf24e9913d30f
+        .quad 0xc086276c00ca51a0, 0xbe1cef47b301e312
+        .quad 0xc086276e5a620e48, 0xbe1ceeb1cefc2e85
+        .quad 0xc0862770b3494788, 0xbe1cf16f1fbbe011
+        .quad 0xc08627730b8064e8, 0xbe1ceebdf75174c7
+        .quad 0xc08627756307cd70, 0xbe1cf06e3871a0da
+        .quad 0xc0862777b9dfe7f0, 0xbe1cef16799fd554
+        .quad 0xc086277a10091ac0, 0xbe1cf248dabf5377
+        .quad 0xc086277c6583cc00, 0xbe1cf0c78d92a2cd
+        .quad 0xc086277eba506158, 0xbe1cf0b911b029f0
+        .quad 0xc08627810e6f4028, 0xbe1cefdc24719766
+        .quad 0xc086278361e0cd70, 0xbe1cefbb6562b7e7
+        .quad 0xc0862785b4a56dd8, 0xbe1cf1e0afb349ec
+        .quad 0xc086278806bd85c0, 0xbe1cf008292e52fc
+        .quad 0xc086278a58297918, 0xbe1cf053073872bf
+        .quad 0xc086278ca8e9ab88, 0xbe1cf17a0a55a947
+        .quad 0xc086278ef8fe8068, 0xbe1ceeffb0b60234
+        .quad 0xc086279148685aa0, 0xbe1cf162204794a8
+        .quad 0xc086279397279ce0, 0xbe1cf24cc8cb48ac
+        .quad 0xc0862795e53ca978, 0xbe1cf0c9be68d5c3
+        .quad 0xc086279832a7e258, 0xbe1cf172cd3d7388
+        .quad 0xc086279a7f69a930, 0xbe1ceea2465fbce5
+        .quad 0xc086279ccb825f40, 0xbe1cf0a386d2500f
+        .quad 0xc086279f16f26590, 0xbe1cf1e338ddc18a
+        .quad 0xc08627a161ba1cd0, 0xbe1cef1f5049867f
+        .quad 0xc08627a3abd9e548, 0xbe1cef96c1ea8b1f
+        .quad 0xc08627a5f5521f00, 0xbe1cf138f6fd3c26
+        .quad 0xc08627a83e2329b0, 0xbe1cf0d4fcbfdf3a
+        .quad 0xc08627aa864d64b0, 0xbe1cf24870c12c81
+        .quad 0xc08627accdd12f18, 0xbe1cf0ae2a56348d
+        .quad 0xc08627af14aee7a0, 0xbe1cee8ca1a9b893
+        .quad 0xc08627b15ae6eca8, 0xbe1cf20414d637b0
+        .quad 0xc08627b3a0799c60, 0xbe1cf0fc6b7b12d8
+        .quad 0xc08627b5e5675488, 0xbe1cf152d93c4a00
+        .quad 0xc08627b829b072a0, 0xbe1cf1073f9b77c2
+        .quad 0xc08627ba6d5553d8, 0xbe1cee694f97d5a4
+        .quad 0xc08627bcb0565500, 0xbe1cf0456b8239d7
+        .quad 0xc08627bef2b3d2b0, 0xbe1cf211497127e3
+        .quad 0xc08627c1346e2930, 0xbe1cf01856c0384d
+        .quad 0xc08627c37585b468, 0xbe1cefa7dd05479e
+        .quad 0xc08627c5b5fad000, 0xbe1cef3ae8e50b93
+        .quad 0xc08627c7f5cdd750, 0xbe1ceea5f32fdd3a
+        .quad 0xc08627ca34ff2560, 0xbe1cef424caeb8d9
+        .quad 0xc08627cc738f14f0, 0xbe1cf0194d07a81f
+        .quad 0xc08627ceb17e0070, 0xbe1cf20f452000c1
+        .quad 0xc08627d0eecc4210, 0xbe1cf00e356218e4
+        .quad 0xc08627d32b7a33a0, 0xbe1cef30484b4bcb
+        .quad 0xc08627d567882eb0, 0xbe1ceeea11a6641b
+        .quad 0xc08627d7a2f68c80, 0xbe1cf13492d5bd7b
+        .quad 0xc08627d9ddc5a618, 0xbe1ceeb7048fad96
+        .quad 0xc08627dc17f5d418, 0xbe1ceef0666f0477
+        .quad 0xc08627de51876ee8, 0xbe1cf060d4b8b5c2
+        .quad 0xc08627e08a7acea8, 0xbe1cf0b2a4b6ff8c
+        .quad 0xc08627e2c2d04b28, 0xbe1cf0e34809a875
+        .quad 0xc08627e4fa883bf0, 0xbe1cf16bf74a3522
+        .quad 0xc08627e731a2f848, 0xbe1cee6a24623d57
+        .quad 0xc08627e96820d718, 0xbe1cefc7b4f1528e
+        .quad 0xc08627eb9e022f18, 0xbe1cf163051f3548
+        .quad 0xc08627edd34756b8, 0xbe1cef36b3366305
+        .quad 0xc08627f007f0a408, 0xbe1cf18134625550
+        .quad 0xc08627f23bfe6cf0, 0xbe1cf0ec32ec1a11
+        .quad 0xc08627f46f710700, 0xbe1ceeb3b64f3edc
+        .quad 0xc08627f6a248c778, 0xbe1cf0cd15805bc8
+        .quad 0xc08627f8d4860368, 0xbe1cf20db3bddebe
+        .quad 0xc08627fb06290f90, 0xbe1cf25188430e25
+        .quad 0xc08627fd37324070, 0xbe1ceea1713490f9
+        .quad 0xc08627ff67a1ea28, 0xbe1cf159521d234c
+        .quad 0xc0862801977860b8, 0xbe1cf24dfe50783b
+        .quad 0xc0862803c6b5f7d0, 0xbe1ceef2ef89a60b
+        .quad 0xc0862805f55b02c8, 0xbe1cee7fc919d62c
+        .quad 0xc08628082367d4c0, 0xbe1cf215a7fb513a
+        .quad 0xc086280a50dcc0a8, 0xbe1cf0e4401c5ed4
+        .quad 0xc086280c7dba1910, 0xbe1cf04ec734d256
+        .quad 0xc086280eaa003050, 0xbe1cf010ad787fea
+        .quad 0xc0862810d5af5880, 0xbe1cee622478393d
+        .quad 0xc086281300c7e368, 0xbe1cf01c7482564f
+        .quad 0xc08628152b4a22a0, 0xbe1cf0de20d33536
+        .quad 0xc086281755366778, 0xbe1cef2edae5837d
+        .quad 0xc08628197e8d02f0, 0xbe1cf0a345318cc9
+        .quad 0xc086281ba74e45d8, 0xbe1cf20085aa34b8
+        .quad 0xc086281dcf7a80c0, 0xbe1cef5fa845ad83
+        .quad 0xc086281ff71203e0, 0xbe1cf050d1df69c4
+        .quad 0xc08628221e151f48, 0xbe1ceffe43c035b9
+        .quad 0xc0862824448422b8, 0xbe1cf14f3018d3c2
+        .quad 0xc08628266a5f5dc0, 0xbe1cef0a5fbae83d
+        .quad 0xc08628288fa71f98, 0xbe1ceff8a95b72a1
+        .quad 0xc086282ab45bb750, 0xbe1cef073aa9849b
+        .quad 0xc086282cd87d73a8, 0xbe1cef69b3835c02
+        .quad 0xc086282efc0ca328, 0xbe1cf0bc139379a9
+        .quad 0xc08628311f099420, 0xbe1cef247a9ec596
+        .quad 0xc086283341749490, 0xbe1cef74bbcc488a
+        .quad 0xc0862835634df248, 0xbe1cef4bc42e7b8e
+        .quad 0xc08628378495fad0, 0xbe1cf136d4d5a810
+        .quad 0xc0862839a54cfb80, 0xbe1cf0d290b24dd8
+        .quad 0xc086283bc5734168, 0xbe1ceeebde8e0065
+        .quad 0xc086283de5091950, 0xbe1cf1a09f60aa1e
+        .quad 0xc0862840040ecfe0, 0xbe1cf0803947a234
+        .quad 0xc08628422284b168, 0xbe1cf0abf7638127
+        .quad 0xc0862844406b0a08, 0xbe1cf0f73ee12058
+        .quad 0xc08628465dc225a0, 0xbe1cf2079971b26c
+        .quad 0xc08628487a8a4fe0, 0xbe1cee74957564b1
+        .quad 0xc086284a96c3d420, 0xbe1ceee77c1b7d43
+        .quad 0xc086284cb26efd90, 0xbe1cf23addba6e09
+        .quad 0xc086284ecd8c1730, 0xbe1cf199f4a1da60
+        .quad 0xc0862850e81b6bb0, 0xbe1cf09fdea81393
+        .quad 0xc0862853021d4588, 0xbe1cf176adb417f7
+        .quad 0xc08628551b91ef00, 0xbe1cf0f64f84a8da
+        .quad 0xc08628573479b220, 0xbe1ceec34cf49523
+        .quad 0xc08628594cd4d8a8, 0xbe1cf16d60fbe0bb
+        .quad 0xc086285b64a3ac40, 0xbe1cee8de7acfc7b
+        .quad 0xc086285d7be67630, 0xbe1ceee6256cce8d
+        .quad 0xc086285f929d7fa0, 0xbe1cee7d66a3d8a5
+        .quad 0xc0862861a8c91170, 0xbe1cf0bef8265792
+        .quad 0xc0862863be697458, 0xbe1cf097f890c6f8
+        .quad 0xc0862865d37ef0c8, 0xbe1cf09502d5c3fc
+        .quad 0xc0862867e809cf00, 0xbe1ceeffb239dac7
+        .quad 0xc0862869fc0a56f8, 0xbe1cf1fbfff95c98
+        .quad 0xc086286c0f80d090, 0xbe1cefa57ad3eef7
+        .quad 0xc086286e226d8348, 0xbe1cf22c58b9183d
+        .quad 0xc086287034d0b690, 0xbe1ceff262d0a248
+        .quad 0xc086287246aab180, 0xbe1cefa7bc194186
+        .quad 0xc086287457fbbb08, 0xbe1cf06782d784d9
+        .quad 0xc086287668c419e0, 0xbe1cf1d44d0eaa07
+        .quad 0xc086287879041490, 0xbe1cf034803c8a48
+        .quad 0xc086287a88bbf158, 0xbe1cf08e84916b6f
+        .quad 0xc086287c97ebf650, 0xbe1cf0c4d3dc1bc7
+        .quad 0xc086287ea6946958, 0xbe1cefb1e4625943
+        .quad 0xc0862880b4b59010, 0xbe1cf143efdd1fd0
+        .quad 0xc0862882c24faff8, 0xbe1cee9896d016da
+        .quad 0xc0862884cf630e38, 0xbe1cf2186072f2cc
+        .quad 0xc0862886dbefeff0, 0xbe1cef9217633d34
+        .quad 0xc0862888e7f699e0, 0xbe1cf05603549486
+        .quad 0xc086288af37750b0, 0xbe1cef50fff513d3
+        .quad 0xc086288cfe7258c0, 0xbe1cf127713b32d0
+        .quad 0xc086288f08e7f650, 0xbe1cf05015520f3d
+        .quad 0xc086289112d86d58, 0xbe1cf12eb458b26f
+        .quad 0xc08628931c4401a8, 0xbe1cf22eae2887ed
+        .quad 0xc0862895252af6e0, 0xbe1cefdd6656dd2d
+        .quad 0xc08628972d8d9058, 0xbe1cf1048ea4e646
+        .quad 0xc0862899356c1150, 0xbe1ceec4501167e9
+        .quad 0xc086289b3cc6bcb8, 0xbe1cf0ad52becc3f
+        .quad 0xc086289d439dd568, 0xbe1cf0daa4e00e35
+        .quad 0xc086289f49f19df8, 0xbe1cf00b80de8d6a
+        .quad 0xc08628a14fc258c8, 0xbe1cf1bcf2ea8464
+        .quad 0xc08628a355104818, 0xbe1cf0435e2782b0
+        .quad 0xc08628a559dbade0, 0xbe1cf0e3e1a5f56c
+        .quad 0xc08628a75e24cbf8, 0xbe1cefed9d5a721d
+        .quad 0xc08628a961ebe3f8, 0xbe1cf0d2d74321e2
+        .quad 0xc08628ab65313750, 0xbe1cf24200eb55e9
+        .quad 0xc08628ad67f50740, 0xbe1cf23e9d7cf979
+        .quad 0xc08628af6a3794d0, 0xbe1cf23a088f421c
+        .quad 0xc08628b16bf920e0, 0xbe1cef2c1de1ab32
+        .quad 0xc08628b36d39ec08, 0xbe1cf1abc231f7b2
+        .quad 0xc08628b56dfa36d0, 0xbe1cf2074d5ba303
+        .quad 0xc08628b76e3a4180, 0xbe1cf05cd5eed880
+        /*== Log_LA_table ==*/
+        .align 32
+        .quad 0x8000000000000000
+        .quad 0xbf5ff802a9ab10e6
+        .quad 0xbf6ff00aa2b10bc0
+        .quad 0xbf77ee11ebd82e94
+        .quad 0xbf7fe02a6b106789
+        .quad 0xbf83e7295d25a7d9
+        .quad 0xbf87dc475f810a77
+        .quad 0xbf8bcf712c74384c
+        .quad 0xbf8fc0a8b0fc03e4
+        .quad 0xbf91d7f7eb9eebe7
+        .quad 0xbf93cea44346a575
+        .quad 0xbf95c45a51b8d389
+        .quad 0xbf97b91b07d5b11b
+        .quad 0xbf99ace7551cc514
+        .quad 0xbf9b9fc027af9198
+        .quad 0xbf9d91a66c543cc4
+        .quad 0xbf9f829b0e783300
+        .quad 0xbfa0b94f7c196176
+        .quad 0xbfa1b0d98923d980
+        .quad 0xbfa2a7ec2214e873
+        .quad 0xbfa39e87b9febd60
+        .quad 0xbfa494acc34d911c
+        .quad 0xbfa58a5bafc8e4d5
+        .quad 0xbfa67f94f094bd98
+        .quad 0xbfa77458f632dcfc
+        .quad 0xbfa868a83083f6cf
+        .quad 0xbfa95c830ec8e3eb
+        .quad 0xbfaa4fe9ffa3d235
+        .quad 0xbfab42dd711971bf
+        .quad 0xbfac355dd0921f2d
+        .quad 0xbfad276b8adb0b52
+        .quad 0xbfae19070c276016
+        .quad 0xbfaf0a30c01162a6
+        .quad 0xbfaffae9119b9303
+        .quad 0xbfb075983598e471
+        .quad 0xbfb0ed839b5526fe
+        .quad 0xbfb16536eea37ae1
+        .quad 0xbfb1dcb263db1944
+        .quad 0xbfb253f62f0a1417
+        .quad 0xbfb2cb0283f5de1f
+        .quad 0xbfb341d7961bd1d1
+        .quad 0xbfb3b87598b1b6ee
+        .quad 0xbfb42edcbea646f0
+        .quad 0xbfb4a50d3aa1b040
+        .quad 0xbfb51b073f06183f
+        .quad 0xbfb590cafdf01c28
+        .quad 0xbfb60658a93750c4
+        .quad 0xbfb67bb0726ec0fc
+        .quad 0xbfb6f0d28ae56b4c
+        .quad 0xbfb765bf23a6be13
+        .quad 0xbfb7da766d7b12cd
+        .quad 0xbfb84ef898e8282a
+        .quad 0xbfb8c345d6319b21
+        .quad 0xbfb9375e55595ede
+        .quad 0xbfb9ab42462033ad
+        .quad 0xbfba1ef1d8061cd4
+        .quad 0xbfba926d3a4ad563
+        .quad 0xbfbb05b49bee43fe
+        .quad 0xbfbb78c82bb0eda1
+        .quad 0xbfbbeba818146765
+        .quad 0xbfbc5e548f5bc743
+        .quad 0xbfbcd0cdbf8c13e1
+        .quad 0xbfbd4313d66cb35d
+        .quad 0xbfbdb5270187d927
+        .quad 0xbfbe27076e2af2e6
+        .quad 0xbfbe98b549671467
+        .quad 0xbfbf0a30c01162a6
+        .quad 0xbfbf7b79fec37ddf
+        .quad 0xbfbfec9131dbeabb
+        .quad 0xbfc02ebb42bf3d4b
+        .quad 0xbfc0671512ca596e
+        .quad 0xbfc09f561ee719c3
+        .quad 0xbfc0d77e7cd08e59
+        .quad 0xbfc10f8e422539b1
+        .quad 0xbfc14785846742ac
+        .quad 0xbfc17f6458fca611
+        .quad 0xbfc1b72ad52f67a0
+        .quad 0xbfc1eed90e2dc2c3
+        .quad 0xbfc2266f190a5acb
+        .quad 0xbfc25ded0abc6ad2
+        .quad 0xbfc29552f81ff523
+        .quad 0xbfc2cca0f5f5f251
+        .quad 0xbfc303d718e47fd3
+        .quad 0xbfc33af575770e4f
+        .quad 0xbfc371fc201e8f74
+        .quad 0xbfc3a8eb2d31a376
+        .quad 0xbfc3dfc2b0ecc62a
+        .quad 0xbfc41682bf727bc0
+        .quad 0xbfc44d2b6ccb7d1e
+        .quad 0xbfc483bccce6e3dd
+        .quad 0xbfc4ba36f39a55e5
+        .quad 0xbfc4f099f4a230b2
+        .quad 0xbfc526e5e3a1b438
+        .quad 0xbfc55d1ad4232d6f
+        .quad 0xbfc59338d9982086
+        .quad 0xbfc5c940075972b9
+        .quad 0xbfc5ff3070a793d4
+        .quad 0xbfc6350a28aaa758
+        .quad 0xbfc66acd4272ad51
+        .quad 0xbfc6a079d0f7aad2
+        .quad 0xbfc6d60fe719d21d
+        .quad 0xbfc70b8f97a1aa75
+        .quad 0xbfc740f8f54037a5
+        .quad 0xbfc7764c128f2127
+        .quad 0xbfc7ab890210d909
+        .quad 0xbfc7e0afd630c274
+        .quad 0xbfc815c0a14357eb
+        .quad 0xbfc84abb75865139
+        .quad 0xbfc87fa06520c911
+        .quad 0xbfc8b46f8223625b
+        .quad 0xbfc8e928de886d41
+        .quad 0xbfc91dcc8c340bde
+        .quad 0xbfc9525a9cf456b4
+        .quad 0xbfc986d3228180ca
+        .quad 0xbfc9bb362e7dfb83
+        .quad 0xbfc9ef83d2769a34
+        .quad 0xbfca23bc1fe2b563
+        .quad 0xbfca57df28244dcd
+        .quad 0xbfca8becfc882f19
+        .quad 0xbfcabfe5ae46124c
+        .quad 0xbfcaf3c94e80bff3
+        .quad 0xbfcb2797ee46320c
+        .quad 0xbfcb5b519e8fb5a4
+        .quad 0xbfcb8ef670420c3b
+        .quad 0xbfcbc286742d8cd6
+        .quad 0xbfcbf601bb0e44e2
+        .quad 0xbfcc2968558c18c1
+        .quad 0xbfcc5cba543ae425
+        .quad 0xbfcc8ff7c79a9a22
+        .quad 0xbfccc320c0176502
+        .quad 0xbfccf6354e09c5dc
+        .quad 0xbfcd293581b6b3e7
+        .quad 0xbfcd5c216b4fbb91
+        .quad 0xbfcd8ef91af31d5e
+        .quad 0xbfcdc1bca0abec7d
+        .quad 0xbfcdf46c0c722d2f
+        .quad 0xbfce27076e2af2e6
+        .quad 0xbfce598ed5a87e2f
+        .quad 0xbfce8c0252aa5a60
+        .quad 0xbfcebe61f4dd7b0b
+        .quad 0xbfcef0adcbdc5936
+        .quad 0xbfcf22e5e72f105d
+        .quad 0xbfcf550a564b7b37
+        .quad 0xbfcf871b28955045
+        .quad 0xbfcfb9186d5e3e2b
+        .quad 0xbfcfeb0233e607cc
+        .quad 0xbfd00e6c45ad501d
+        .quad 0xbfd0274dc16c232f
+        .quad 0xbfd0402594b4d041
+        .quad 0xbfd058f3c703ebc6
+        .quad 0xbfd071b85fcd590d
+        .quad 0xbfd08a73667c57af
+        .quad 0xbfd0a324e27390e3
+        .quad 0xbfd0bbccdb0d24bd
+        .quad 0xbfd0d46b579ab74b
+        .quad 0xbfd0ed005f657da4
+        .quad 0xbfd1058bf9ae4ad5
+        .quad 0xbfd11e0e2dad9cb7
+        .quad 0xbfd136870293a8b0
+        .quad 0xbfd14ef67f88685a
+        .quad 0xbfd1675cababa60e
+        .quad 0xbfd17fb98e15095d
+        .quad 0xbfd1980d2dd4236f
+        .quad 0xbfd1b05791f07b49
+        .quad 0xbfd1c898c16999fb
+        .quad 0xbfd1e0d0c33716be
+        .quad 0xbfd1f8ff9e48a2f3
+        .quad 0xbfd211255986160c
+        .quad 0xbfd22941fbcf7966
+        .quad 0xbfd241558bfd1404
+        .quad 0xbfd2596010df763a
+        .quad 0xbfd27161913f853d
+        .quad 0xbfd2895a13de86a3
+        .quad 0xbfd2a1499f762bc9
+        .quad 0xbfd2b9303ab89d25
+        .quad 0xbfd2d10dec508583
+        .quad 0xbfd2e8e2bae11d31
+        .quad 0xbfd300aead06350c
+        .quad 0xbfd31871c9544185
+        .quad 0xbfd3302c16586588
+        .quad 0xbfd347dd9a987d55
+        .quad 0xbfd35f865c93293e
+        .quad 0xbfd3772662bfd85b
+        .quad 0xbfd38ebdb38ed321
+        .quad 0xbfd3a64c556945ea
+        .quad 0xbfd3bdd24eb14b6a
+        .quad 0xbfd3d54fa5c1f710
+        .quad 0xbfd3ecc460ef5f50
+        .quad 0xbfd404308686a7e4
+        .quad 0xbfd41b941cce0bee
+        .quad 0xbfd432ef2a04e814
+        .quad 0xbfd44a41b463c47c
+        .quad 0xbfd4618bc21c5ec2
+        .quad 0xbfd478cd5959b3d9
+        .quad 0xbfd49006804009d1
+        .quad 0xbfd4a7373cecf997
+        .quad 0xbfd4be5f957778a1
+        .quad 0xbfd4d57f8fefe27f
+        .quad 0xbfd4ec973260026a
+        .quad 0xbfd503a682cb1cb3
+        .quad 0xbfd51aad872df82d
+        .quad 0xbfd531ac457ee77e
+        .quad 0xbfd548a2c3add263
+        .quad 0xbfd55f9107a43ee2
+        .quad 0xbfd5767717455a6c
+        .quad 0xbfd58d54f86e02f2
+        .quad 0xbfd5a42ab0f4cfe2
+        .quad 0xbfd5baf846aa1b19
+        .quad 0xbfd5d1bdbf5809ca
+        .quad 0xbfd5e87b20c2954a
+        .quad 0xbfd5ff3070a793d4
+        .quad 0xbfd615ddb4bec13c
+        .quad 0xbfd62c82f2b9c795
+        .quad 0x3fd61965cdb02c1f
+        .quad 0x3fd602d08af091ec
+        .quad 0x3fd5ec433d5c35ae
+        .quad 0x3fd5d5bddf595f30
+        .quad 0x3fd5bf406b543db2
+        .quad 0x3fd5a8cadbbedfa1
+        .quad 0x3fd5925d2b112a59
+        .quad 0x3fd57bf753c8d1fb
+        .quad 0x3fd565995069514c
+        .quad 0x3fd54f431b7be1a9
+        .quad 0x3fd538f4af8f72fe
+        .quad 0x3fd522ae0738a3d8
+        .quad 0x3fd50c6f1d11b97c
+        .quad 0x3fd4f637ebba9810
+        .quad 0x3fd4e0086dd8baca
+        .quad 0x3fd4c9e09e172c3c
+        .quad 0x3fd4b3c077267e9a
+        .quad 0x3fd49da7f3bcc41f
+        .quad 0x3fd487970e958770
+        .quad 0x3fd4718dc271c41b
+        .quad 0x3fd45b8c0a17df13
+        .quad 0x3fd44591e0539f49
+        .quad 0x3fd42f9f3ff62642
+        .quad 0x3fd419b423d5e8c7
+        .quad 0x3fd403d086cea79c
+        .quad 0x3fd3edf463c1683e
+        .quad 0x3fd3d81fb5946dba
+        .quad 0x3fd3c25277333184
+        .quad 0x3fd3ac8ca38e5c5f
+        .quad 0x3fd396ce359bbf54
+        .quad 0x3fd3811728564cb2
+        .quad 0x3fd36b6776be1117
+        .quad 0x3fd355bf1bd82c8b
+        .quad 0x3fd3401e12aecba1
+        .quad 0x3fd32a84565120a8
+        .quad 0x3fd314f1e1d35ce4
+        .quad 0x3fd2ff66b04ea9d4
+        .quad 0x3fd2e9e2bce12286
+        .quad 0x3fd2d46602adccee
+        .quad 0x3fd2bef07cdc9354
+        .quad 0x3fd2a982269a3dbf
+        .quad 0x3fd2941afb186b7c
+        .quad 0x3fd27ebaf58d8c9d
+        .quad 0x3fd269621134db92
+        .quad 0x3fd25410494e56c7
+        .quad 0x3fd23ec5991eba49
+        .quad 0x3fd22981fbef797b
+        .quad 0x3fd214456d0eb8d4
+        .quad 0x3fd1ff0fe7cf47a7
+        .quad 0x3fd1e9e1678899f4
+        .quad 0x3fd1d4b9e796c245
+        .quad 0x3fd1bf99635a6b95
+        .quad 0x3fd1aa7fd638d33f
+        .quad 0x3fd1956d3b9bc2fa
+        .quad 0x3fd180618ef18adf
+        .quad 0x3fd16b5ccbacfb73
+        .quad 0x3fd1565eed455fc3
+        .quad 0x3fd14167ef367783
+        .quad 0x3fd12c77cd00713b
+        .quad 0x3fd1178e8227e47c
+        .quad 0x3fd102ac0a35cc1c
+        .quad 0x3fd0edd060b78081
+        .quad 0x3fd0d8fb813eb1ef
+        .quad 0x3fd0c42d676162e3
+        .quad 0x3fd0af660eb9e279
+        .quad 0x3fd09aa572e6c6d4
+        .quad 0x3fd085eb8f8ae797
+        .quad 0x3fd07138604d5862
+        .quad 0x3fd05c8be0d9635a
+        .quad 0x3fd047e60cde83b8
+        .quad 0x3fd03346e0106062
+        .quad 0x3fd01eae5626c691
+        .quad 0x3fd00a1c6adda473
+        .quad 0x3fcfeb2233ea07cd
+        .quad 0x3fcfc218be620a5e
+        .quad 0x3fcf991c6cb3b379
+        .quad 0x3fcf702d36777df0
+        .quad 0x3fcf474b134df229
+        .quad 0x3fcf1e75fadf9bde
+        .quad 0x3fcef5ade4dcffe6
+        .quad 0x3fceccf2c8fe920a
+        .quad 0x3fcea4449f04aaf5
+        .quad 0x3fce7ba35eb77e2a
+        .quad 0x3fce530effe71012
+        .quad 0x3fce2a877a6b2c12
+        .quad 0x3fce020cc6235ab5
+        .quad 0x3fcdd99edaf6d7e9
+        .quad 0x3fcdb13db0d48940
+        .quad 0x3fcd88e93fb2f450
+        .quad 0x3fcd60a17f903515
+        .quad 0x3fcd38666871f465
+        .quad 0x3fcd1037f2655e7b
+        .quad 0x3fcce816157f1988
+        .quad 0x3fccc000c9db3c52
+        .quad 0x3fcc97f8079d44ec
+        .quad 0x3fcc6ffbc6f00f71
+        .quad 0x3fcc480c0005ccd1
+        .quad 0x3fcc2028ab17f9b4
+        .quad 0x3fcbf851c067555f
+        .quad 0x3fcbd087383bd8ad
+        .quad 0x3fcba8c90ae4ad19
+        .quad 0x3fcb811730b823d2
+        .quad 0x3fcb5971a213acdb
+        .quad 0x3fcb31d8575bce3d
+        .quad 0x3fcb0a4b48fc1b46
+        .quad 0x3fcae2ca6f672bd4
+        .quad 0x3fcabb55c31693ad
+        .quad 0x3fca93ed3c8ad9e3
+        .quad 0x3fca6c90d44b704e
+        .quad 0x3fca454082e6ab05
+        .quad 0x3fca1dfc40f1b7f1
+        .quad 0x3fc9f6c407089664
+        .quad 0x3fc9cf97cdce0ec3
+        .quad 0x3fc9a8778debaa38
+        .quad 0x3fc981634011aa75
+        .quad 0x3fc95a5adcf7017f
+        .quad 0x3fc9335e5d594989
+        .quad 0x3fc90c6db9fcbcd9
+        .quad 0x3fc8e588ebac2dbf
+        .quad 0x3fc8beafeb38fe8c
+        .quad 0x3fc897e2b17b19a5
+        .quad 0x3fc871213750e994
+        .quad 0x3fc84a6b759f512f
+        .quad 0x3fc823c16551a3c2
+        .quad 0x3fc7fd22ff599d4f
+        .quad 0x3fc7d6903caf5ad0
+        .quad 0x3fc7b0091651528c
+        .quad 0x3fc7898d85444c73
+        .quad 0x3fc7631d82935a86
+        .quad 0x3fc73cb9074fd14d
+        .quad 0x3fc716600c914054
+        .quad 0x3fc6f0128b756abc
+        .quad 0x3fc6c9d07d203fc7
+        .quad 0x3fc6a399dabbd383
+        .quad 0x3fc67d6e9d785771
+        .quad 0x3fc6574ebe8c133a
+        .quad 0x3fc6313a37335d76
+        .quad 0x3fc60b3100b09476
+        .quad 0x3fc5e533144c1719
+        .quad 0x3fc5bf406b543db2
+        .quad 0x3fc59958ff1d52f1
+        .quad 0x3fc5737cc9018cdd
+        .quad 0x3fc54dabc26105d2
+        .quad 0x3fc527e5e4a1b58d
+        .quad 0x3fc5022b292f6a45
+        .quad 0x3fc4dc7b897bc1c8
+        .quad 0x3fc4b6d6fefe22a4
+        .quad 0x3fc4913d8333b561
+        .quad 0x3fc46baf0f9f5db7
+        .quad 0x3fc4462b9dc9b3dc
+        .quad 0x3fc420b32740fdd4
+        .quad 0x3fc3fb45a59928cc
+        .quad 0x3fc3d5e3126bc27f
+        .quad 0x3fc3b08b6757f2a9
+        .quad 0x3fc38b3e9e027479
+        .quad 0x3fc365fcb0159016
+        .quad 0x3fc340c59741142e
+        .quad 0x3fc31b994d3a4f85
+        .quad 0x3fc2f677cbbc0a96
+        .quad 0x3fc2d1610c86813a
+        .quad 0x3fc2ac55095f5c59
+        .quad 0x3fc28753bc11aba5
+        .quad 0x3fc2625d1e6ddf57
+        .quad 0x3fc23d712a49c202
+        .quad 0x3fc2188fd9807263
+        .quad 0x3fc1f3b925f25d41
+        .quad 0x3fc1ceed09853752
+        .quad 0x3fc1aa2b7e23f72a
+        .quad 0x3fc185747dbecf34
+        .quad 0x3fc160c8024b27b1
+        .quad 0x3fc13c2605c398c3
+        .quad 0x3fc1178e8227e47c
+        .quad 0x3fc0f301717cf0fb
+        .quad 0x3fc0ce7ecdccc28d
+        .quad 0x3fc0aa06912675d5
+        .quad 0x3fc08598b59e3a07
+        .quad 0x3fc06135354d4b18
+        .quad 0x3fc03cdc0a51ec0d
+        .quad 0x3fc0188d2ecf6140
+        .quad 0x3fbfe89139dbd566
+        .quad 0x3fbfa01c9db57ce2
+        .quad 0x3fbf57bc7d9005db
+        .quad 0x3fbf0f70cdd992e3
+        .quad 0x3fbec739830a1120
+        .quad 0x3fbe7f1691a32d3e
+        .quad 0x3fbe3707ee30487b
+        .quad 0x3fbdef0d8d466db9
+        .quad 0x3fbda727638446a2
+        .quad 0x3fbd5f55659210e2
+        .quad 0x3fbd179788219364
+        .quad 0x3fbccfedbfee13a8
+        .quad 0x3fbc885801bc4b23
+        .quad 0x3fbc40d6425a5cb1
+        .quad 0x3fbbf968769fca11
+        .quad 0x3fbbb20e936d6974
+        .quad 0x3fbb6ac88dad5b1c
+        .quad 0x3fbb23965a52ff00
+        .quad 0x3fbadc77ee5aea8c
+        .quad 0x3fba956d3ecade63
+        .quad 0x3fba4e7640b1bc38
+        .quad 0x3fba0792e9277cac
+        .quad 0x3fb9c0c32d4d2548
+        .quad 0x3fb97a07024cbe74
+        .quad 0x3fb9335e5d594989
+        .quad 0x3fb8ecc933aeb6e8
+        .quad 0x3fb8a6477a91dc29
+        .quad 0x3fb85fd927506a48
+        .quad 0x3fb8197e2f40e3f0
+        .quad 0x3fb7d33687c293c9
+        .quad 0x3fb78d02263d82d3
+        .quad 0x3fb746e100226ed9
+        .quad 0x3fb700d30aeac0e1
+        .quad 0x3fb6bad83c1883b6
+        .quad 0x3fb674f089365a7a
+        .quad 0x3fb62f1be7d77743
+        .quad 0x3fb5e95a4d9791cb
+        .quad 0x3fb5a3abb01ade25
+        .quad 0x3fb55e10050e0384
+        .quad 0x3fb518874226130a
+        .quad 0x3fb4d3115d207eac
+        .quad 0x3fb48dae4bc31018
+        .quad 0x3fb4485e03dbdfad
+        .quad 0x3fb403207b414b7f
+        .quad 0x3fb3bdf5a7d1ee64
+        .quad 0x3fb378dd7f749714
+        .quad 0x3fb333d7f8183f4b
+        .quad 0x3fb2eee507b40301
+        .quad 0x3fb2aa04a44717a5
+        .quad 0x3fb26536c3d8c369
+        .quad 0x3fb2207b5c78549e
+        .quad 0x3fb1dbd2643d190b
+        .quad 0x3fb1973bd1465567
+        .quad 0x3fb152b799bb3cc9
+        .quad 0x3fb10e45b3cae831
+        .quad 0x3fb0c9e615ac4e17
+        .quad 0x3fb08598b59e3a07
+        .quad 0x3fb0415d89e74444
+        .quad 0x3faffa6911ab9301
+        .quad 0x3faf723b517fc523
+        .quad 0x3faeea31c006b87c
+        .quad 0x3fae624c4a0b5e1b
+        .quad 0x3fadda8adc67ee4e
+        .quad 0x3fad52ed6405d86f
+        .quad 0x3faccb73cdddb2cc
+        .quad 0x3fac441e06f72a9e
+        .quad 0x3fabbcebfc68f420
+        .quad 0x3fab35dd9b58baad
+        .quad 0x3faaaef2d0fb10fc
+        .quad 0x3faa282b8a936171
+        .quad 0x3fa9a187b573de7c
+        .quad 0x3fa91b073efd7314
+        .quad 0x3fa894aa149fb343
+        .quad 0x3fa80e7023d8ccc4
+        .quad 0x3fa788595a3577ba
+        .quad 0x3fa70265a550e777
+        .quad 0x3fa67c94f2d4bb58
+        .quad 0x3fa5f6e73078efb8
+        .quad 0x3fa5715c4c03ceef
+        .quad 0x3fa4ebf43349e26f
+        .quad 0x3fa466aed42de3ea
+        .quad 0x3fa3e18c1ca0ae92
+        .quad 0x3fa35c8bfaa1306b
+        .quad 0x3fa2d7ae5c3c5bae
+        .quad 0x3fa252f32f8d183f
+        .quad 0x3fa1ce5a62bc353a
+        .quad 0x3fa149e3e4005a8d
+        .quad 0x3fa0c58fa19dfaaa
+        .quad 0x3fa0415d89e74444
+        .quad 0x3f9f7a9b16782856
+        .quad 0x3f9e72bf2813ce51
+        .quad 0x3f9d6b2725979802
+        .quad 0x3f9c63d2ec14aaf2
+        .quad 0x3f9b5cc258b718e6
+        .quad 0x3f9a55f548c5c43f
+        .quad 0x3f994f6b99a24475
+        .quad 0x3f98492528c8cabf
+        .quad 0x3f974321d3d006d3
+        .quad 0x3f963d6178690bd6
+        .quad 0x3f9537e3f45f3565
+        .quad 0x3f9432a925980cc1
+        .quad 0x3f932db0ea132e22
+        .quad 0x3f9228fb1fea2e28
+        .quad 0x3f912487a5507f70
+        .quad 0x3f90205658935847
+        .quad 0x3f8e38ce3033310c
+        .quad 0x3f8c317384c75f06
+        .quad 0x3f8a2a9c6c170462
+        .quad 0x3f882448a388a2aa
+        .quad 0x3f861e77e8b53fc6
+        .quad 0x3f841929f96832f0
+        .quad 0x3f82145e939ef1e9
+        .quad 0x3f8010157588de71
+        .quad 0x3f7c189cbb0e27fb
+        .quad 0x3f78121214586b54
+        .quad 0x3f740c8a747878e2
+        .quad 0x3f70080559588b35
+        .quad 0x3f680904828985c0
+        .quad 0x3f60040155d5889e
+        .quad 0x3f50020055655889
+        .quad 0x0000000000000000
+        /*== poly_coeff[4] ==*/
+        .align 32
+        .quad 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A /* coeff4 */
+        .quad 0xbfd0000148058EE1, 0xbfd0000148058EE1, 0xbfd0000148058EE1, 0xbfd0000148058EE1 /* coeff3 */
+        .quad 0x3fd55555555543C5, 0x3fd55555555543C5, 0x3fd55555555543C5, 0x3fd55555555543C5 /* coeff2 */
+        .quad 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F /* coeff1 */
+        /*== ExpMask ==*/
+        .align 32
+        .quad 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff
+        /*== Two10 ==*/
+        .align 32
+        .quad 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000
+        /*== MinLog1p = -1+2^(-53) ==*/
+        .align 32
+        .quad 0xbfefffffffffffff, 0xbfefffffffffffff, 0xbfefffffffffffff, 0xbfefffffffffffff
+        /*== MaxLog1p ==*/
+        .align 32
+        .quad 0x7f3ffffffffff000, 0x7f3ffffffffff000, 0x7f3ffffffffff000, 0x7f3ffffffffff000
+        /*== One ==*/
+        .align 32
+        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
+        /*== SgnMask ==*/
+        .align 32
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
+        /*== XThreshold ==*/
+        .align 32
+        .quad 0x3e00000000000000, 0x3e00000000000000, 0x3e00000000000000, 0x3e00000000000000
+        /*== XhMask ==*/
+        .align 32
+        .quad 0xfffffffffffffc00, 0xfffffffffffffc00, 0xfffffffffffffc00, 0xfffffffffffffc00
+        /*== Threshold ==*/
+        .align 32
+        .quad 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000
+        /*== Bias ==*/
+        .align 32
+        .quad 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000
+        /*== Bias1 ==*/
+        .align 32
+        .quad 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000
+        /*== ExpMask ==*/
+        .align 32
+        .quad 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000
+        /*== ExpMask2 ==*/
+        .align 32
+        .quad 0x7f40000000000000, 0x7f40000000000000, 0x7f40000000000000, 0x7f40000000000000
+        /*== L2L ==*/
+        .align 32
+        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
+        /*== dBigThreshold ==*/
+        .align 32
+        .quad 0x41D0000000000000, 0x41D0000000000000, 0x41D0000000000000, 0x41D0000000000000
+        /*== dC1 ==*/
+        .align 32
+        .quad 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000
+        /*== dC2 ==*/
+        .align 32
+        .quad 0x3fd7fffffffffffa, 0x3fd7fffffffffffa, 0x3fd7fffffffffffa, 0x3fd7fffffffffffa
+        /*== dC3 ==*/
+        .align 32
+        .quad 0x3fd3fffffffffffa, 0x3fd3fffffffffffa, 0x3fd3fffffffffffa, 0x3fd3fffffffffffa
+        /*== dC4 ==*/
+        .align 32
+        .quad 0x3fd1800013d9d428, 0x3fd1800013d9d428, 0x3fd1800013d9d428, 0x3fd1800013d9d428
+        /*== dC5 ==*/
+        .align 32
+        .quad 0x3fcf800025de102f, 0x3fcf800025de102f, 0x3fcf800025de102f, 0x3fcf800025de102f
+        /*== dLargestFinite ==*/
+        .align 32
+        .quad 0x7FEFFFFFFFFFFFFF, 0x7FEFFFFFFFFFFFFF, 0x7FEFFFFFFFFFFFFF, 0x7FEFFFFFFFFFFFFF
+        /*== dThirtyOne ==*/
+        .align 32
+        .quad 0x403F000000000000, 0x403F000000000000, 0x403F000000000000, 0x403F000000000000
+        /*== dTopMask12 ==*/
+        .align 32
+        .quad 0xFFFFFE0000000000, 0xFFFFFE0000000000, 0xFFFFFE0000000000, 0xFFFFFE0000000000
+        /*== dTopMask29 ==*/
+        .align 32
+        .quad 0xFFFFFFFFFF000000, 0xFFFFFFFFFF000000, 0xFFFFFFFFFF000000, 0xFFFFFFFFFF000000
+        /*== XScale ==*/
+        .align 32
+        .quad 0x3E10000000000000, 0x3E10000000000000, 0x3E10000000000000, 0x3E10000000000000
+        .align 32
+        .type	__svml_dacosh_data_internal,@object
+        .size	__svml_dacosh_data_internal,.-__svml_dacosh_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core-avx2.S
new file mode 100644
index 0000000000..48879787c1
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized acosh, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN8v_acosh _ZGVeN8v_acosh_avx2_wrapper
+#include "../svml_d_acosh8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core.c
new file mode 100644
index 0000000000..4322a5f707
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized acosh, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN8v_acosh
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN8v_acosh, __GI__ZGVeN8v_acosh, __redirect__ZGVeN8v_acosh)
+  __attribute__ ((visibility ("hidden")));
+#endif
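
The ifunc above resolves _ZGVeN8v_acosh to the SKX implementation when
the selector allows it and to the AVX2 wrapper otherwise.  As a usage
sketch (not part of this patch): a compiler that implements the x86_64
vector ABI may emit calls to this symbol when vectorizing a loop over
acosh, for example with GCC options along the lines of -O2 -ffast-math
-fopenmp-simd -march=skylake-avx512; whether the call is actually
emitted depends on the compiler and the options used.

  #include <math.h>

  void
  vector_acosh (const double *in, double *out, int n)
  {
    /* May be vectorized into _ZGVeN8v_acosh calls, 8 doubles at a time.  */
  #pragma omp simd
    for (int i = 0; i < n; i++)
      out[i] = acosh (in[i]);
  }
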
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core_avx512.S
new file mode 100644
index 0000000000..3199ef77e2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core_avx512.S
@@ -0,0 +1,480 @@
+/* Function acosh vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute acosh(x) as log(x + sqrt(x*x - 1))
+ *   using RSQRT instructions for starting the
+ *   square root approximation, and small table lookups for log
+ *   that map to AVX-512 permute instructions
+ *
+ *   Special cases:
+ *
+ *   acosh(NaN)  = quiet NaN, and raise invalid exception
+ *   acosh(-INF) = NaN
+ *   acosh(+INF) = +INF
+ *   acosh(x)    = NaN if x < 1
+ *   acosh(1)    = +0
+ *
+ */
+
+/* Offsets for data table __svml_dacosh_data_internal_avx512
+ */
+#define Log_tbl_H                     	0
+#define Log_tbl_L                     	128
+#define One                           	256
+#define SmallThreshold                	320
+#define Threshold                     	384
+#define LargeThreshold                	448
+#define ca2                           	512
+#define ca1                           	576
+#define c4s                           	640
+#define c3s                           	704
+#define c2s                           	768
+#define c1s                           	832
+#define AddB5                         	896
+#define RcpBitMask                    	960
+#define OneEighth                     	1024
+#define Four                          	1088
+#define poly_coeff9                   	1152
+#define poly_coeff8                   	1216
+#define poly_coeff7                   	1280
+#define poly_coeff6                   	1344
+#define poly_coeff5                   	1408
+#define poly_coeff4                   	1472
+#define poly_coeff3                   	1536
+#define poly_coeff2                   	1600
+#define poly_coeff1                   	1664
+#define L2H                           	1728
+#define L2L                           	1792
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN8v_acosh_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovups   One+__svml_dacosh_data_internal_avx512(%rip), %zmm5
+
+/* polynomial computation for small inputs */
+        vmovups   ca2+__svml_dacosh_data_internal_avx512(%rip), %zmm13
+        vmovups   ca1+__svml_dacosh_data_internal_avx512(%rip), %zmm14
+
+/*
+ * sqrt(x^2-1) ~ Sh + Sl + Sh*Eh*poly_s
+ * poly_s = c1+c2*Eh+c3*Eh^2
+ */
+        vmovups   c4s+__svml_dacosh_data_internal_avx512(%rip), %zmm1
+        vmovups   c2s+__svml_dacosh_data_internal_avx512(%rip), %zmm2
+        vmovups   c1s+__svml_dacosh_data_internal_avx512(%rip), %zmm6
+
+/* very large inputs ? */
+        vmovups   Threshold+__svml_dacosh_data_internal_avx512(%rip), %zmm15
+
+/* out of range inputs? */
+        vmovups   LargeThreshold+__svml_dacosh_data_internal_avx512(%rip), %zmm3
+
+/* not a very small input ? */
+        vmovups   SmallThreshold+__svml_dacosh_data_internal_avx512(%rip), %zmm10
+        vmovaps   %zmm0, %zmm12
+
+/* x^2 - 1 */
+        vmovaps   %zmm5, %zmm11
+        vfmsub231pd {rn-sae}, %zmm12, %zmm12, %zmm11
+        vcmppd    $21, {sae}, %zmm15, %zmm12, %k2
+        vcmppd    $22, {sae}, %zmm3, %zmm12, %k0
+        vcmppd    $18, {sae}, %zmm5, %zmm12, %k1
+        vrsqrt14pd %zmm11, %zmm4
+        vcmppd    $21, {sae}, %zmm10, %zmm11, %k3
+        vfmadd231pd {rn-sae}, %zmm11, %zmm13, %zmm14
+        vmovups   c3s+__svml_dacosh_data_internal_avx512(%rip), %zmm13
+
+/* Sh ~sqrt(-1+x^2) */
+        vmulpd    {rn-sae}, %zmm4, %zmm11, %zmm9
+        vmulpd    {rn-sae}, %zmm11, %zmm14, %zmm8
+
+/* Sh+x */
+        vaddpd    {rn-sae}, %zmm12, %zmm9, %zmm15
+
+/* Shh */
+        vsubpd    {rn-sae}, %zmm12, %zmm15, %zmm14
+
+/* (Yh*R0)_low */
+        vmovaps   %zmm11, %zmm0
+        korw      %k0, %k1, %k0
+
+/* rel. error term: Eh=1-Sh*R0 */
+        vmovaps   %zmm5, %zmm7
+        vfmsub213pd {rn-sae}, %zmm9, %zmm4, %zmm0
+        vfnmadd231pd {rn-sae}, %zmm9, %zmm4, %zmm7
+
+/* rel. error term: Eh=(1-Sh*R0)-Sl*R0 */
+        vfnmadd231pd {rn-sae}, %zmm0, %zmm4, %zmm7
+
+/* Shl */
+        vsubpd    {rn-sae}, %zmm14, %zmm9, %zmm4
+        vmovups   poly_coeff7+__svml_dacosh_data_internal_avx512(%rip), %zmm14
+        vfmadd231pd {rn-sae}, %zmm7, %zmm1, %zmm13
+        vfmadd213pd {rn-sae}, %zmm2, %zmm7, %zmm13
+        vfmadd213pd {rn-sae}, %zmm6, %zmm7, %zmm13
+
+/* Sh*Eh */
+        vmulpd    {rn-sae}, %zmm7, %zmm9, %zmm7
+
+/* Sl + Sh*Eh*poly_s */
+        vfmadd213pd {rn-sae}, %zmm0, %zmm13, %zmm7
+
+/* polynomials */
+        vmovups   poly_coeff9+__svml_dacosh_data_internal_avx512(%rip), %zmm13
+
+/* polynomial computation for small inputs */
+        vaddpd    {rn-sae}, %zmm7, %zmm9, %zmm0
+
+/* Xin0+Sl+Sh*Eh*poly_s ~ x+sqrt(x^2-1) */
+        vaddpd    {rn-sae}, %zmm7, %zmm15, %zmm6
+        vfmadd231pd {rn-sae}, %zmm0, %zmm8, %zmm0
+
+/* fixup for very large inputs */
+        vmovups   OneEighth+__svml_dacosh_data_internal_avx512(%rip), %zmm8
+
+/* Sl_high */
+        vsubpd    {rn-sae}, %zmm15, %zmm6, %zmm9
+        vmovups   poly_coeff6+__svml_dacosh_data_internal_avx512(%rip), %zmm15
+        vmulpd    {rn-sae}, %zmm8, %zmm12, %zmm6{%k2}
+
+/* Sl_l */
+        vsubpd    {rn-sae}, %zmm9, %zmm7, %zmm3
+        vrcp14pd  %zmm6, %zmm1
+
+/* Xin_low */
+        vaddpd    {rn-sae}, %zmm4, %zmm3, %zmm7
+
+/* Table lookups */
+        vmovups   __svml_dacosh_data_internal_avx512(%rip), %zmm3
+
+/* round reciprocal to 1+4b mantissas */
+        vpaddq    AddB5+__svml_dacosh_data_internal_avx512(%rip), %zmm1, %zmm2
+
+/* fixup for very large inputs */
+        vxorpd    %zmm7, %zmm7, %zmm7{%k2}
+        vmovups   poly_coeff8+__svml_dacosh_data_internal_avx512(%rip), %zmm1
+        vandpd    RcpBitMask+__svml_dacosh_data_internal_avx512(%rip), %zmm2, %zmm8
+        vmovups   Log_tbl_L+__svml_dacosh_data_internal_avx512(%rip), %zmm2
+
+/* Prepare table index */
+        vpsrlq    $48, %zmm8, %zmm9
+
+/* reduced argument for log(): (Rcp*Xin-1)+Rcp*Xin_low */
+        vfmsub231pd {rn-sae}, %zmm8, %zmm6, %zmm5
+
+/* exponents */
+        vgetexppd {sae}, %zmm8, %zmm4
+        vmovups   Four+__svml_dacosh_data_internal_avx512(%rip), %zmm6
+        vpermt2pd Log_tbl_H+64+__svml_dacosh_data_internal_avx512(%rip), %zmm9, %zmm3
+        vpermt2pd Log_tbl_L+64+__svml_dacosh_data_internal_avx512(%rip), %zmm9, %zmm2
+        vsubpd    {rn-sae}, %zmm6, %zmm4, %zmm4{%k2}
+        vfmadd231pd {rn-sae}, %zmm8, %zmm7, %zmm5
+        vmovups   poly_coeff5+__svml_dacosh_data_internal_avx512(%rip), %zmm6
+        vmovups   poly_coeff4+__svml_dacosh_data_internal_avx512(%rip), %zmm7
+
+/* -K*L2H + Th */
+        vmovups   L2H+__svml_dacosh_data_internal_avx512(%rip), %zmm8
+
+/* -K*L2L + Tl */
+        vmovups   L2L+__svml_dacosh_data_internal_avx512(%rip), %zmm9
+        vfmadd231pd {rn-sae}, %zmm5, %zmm13, %zmm1
+        vmovups   poly_coeff2+__svml_dacosh_data_internal_avx512(%rip), %zmm13
+        vfnmadd231pd {rn-sae}, %zmm4, %zmm8, %zmm3
+        vfnmadd213pd {rn-sae}, %zmm2, %zmm9, %zmm4
+        vfmadd213pd {rn-sae}, %zmm14, %zmm5, %zmm1
+        vmovups   poly_coeff3+__svml_dacosh_data_internal_avx512(%rip), %zmm2
+        vmovups   poly_coeff1+__svml_dacosh_data_internal_avx512(%rip), %zmm14
+        vfmadd213pd {rn-sae}, %zmm15, %zmm5, %zmm1
+
+/* R^2 */
+        vmulpd    {rn-sae}, %zmm5, %zmm5, %zmm15
+        vfmadd213pd {rn-sae}, %zmm6, %zmm5, %zmm1
+        vfmadd213pd {rn-sae}, %zmm7, %zmm5, %zmm1
+        vfmadd213pd {rn-sae}, %zmm2, %zmm5, %zmm1
+        vfmadd213pd {rn-sae}, %zmm13, %zmm5, %zmm1
+        vfmadd213pd {rn-sae}, %zmm14, %zmm5, %zmm1
+
+/* Tl + R^2*Poly */
+        vfmadd213pd {rn-sae}, %zmm4, %zmm15, %zmm1
+
+/* R+Tl + R^2*Poly */
+        vaddpd    {rn-sae}, %zmm5, %zmm1, %zmm5
+        vaddpd    {rn-sae}, %zmm5, %zmm3, %zmm0{%k3}
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 k0 zmm0 zmm12
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm12, 64(%rsp)
+        vmovups   %zmm0, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 k0 zmm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax k0
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        kmovd     %k0, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     64(%rsp,%r14,8), %xmm0
+        call      acosh@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 128(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN8v_acosh_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_dacosh_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 Log_tbl_H[16][2];
+        __declspec(align(64)) VUINT32 Log_tbl_L[16][2];
+        __declspec(align(64)) VUINT32 One[8][2];
+        __declspec(align(64)) VUINT32 SmallThreshold[8][2];
+        __declspec(align(64)) VUINT32 Threshold[8][2];
+        __declspec(align(64)) VUINT32 LargeThreshold[8][2];
+        __declspec(align(64)) VUINT32 ca2[8][2];
+        __declspec(align(64)) VUINT32 ca1[8][2];
+        __declspec(align(64)) VUINT32 c4s[8][2];
+        __declspec(align(64)) VUINT32 c3s[8][2];
+        __declspec(align(64)) VUINT32 c2s[8][2];
+        __declspec(align(64)) VUINT32 c1s[8][2];
+        __declspec(align(64)) VUINT32 AddB5[8][2];
+        __declspec(align(64)) VUINT32 RcpBitMask[8][2];
+        __declspec(align(64)) VUINT32 OneEighth[8][2];
+        __declspec(align(64)) VUINT32 Four[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff9[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff8[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff7[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff1[8][2];
+        __declspec(align(64)) VUINT32 L2H[8][2];
+        __declspec(align(64)) VUINT32 L2L[8][2];
+    } __svml_dacosh_data_internal_avx512;
+#endif
+__svml_dacosh_data_internal_avx512:
+        /*== Log_tbl_H ==*/
+        .quad 0x0000000000000000
+        .quad 0xbfaf0a30c0120000
+        .quad 0xbfbe27076e2b0000
+        .quad 0xbfc5ff3070a78000
+        .quad 0xbfcc8ff7c79a8000
+        .quad 0xbfd1675cababc000
+        .quad 0xbfd4618bc21c4000
+        .quad 0xbfd739d7f6bbc000
+        .quad 0xbfd9f323ecbf8000
+        .quad 0xbfdc8ff7c79a8000
+        .quad 0xbfdf128f5faf0000
+        .quad 0xbfe0be72e4252000
+        .quad 0xbfe1e85f5e704000
+        .quad 0xbfe307d7334f2000
+        .quad 0xbfe41d8fe8468000
+        .quad 0xbfe52a2d265bc000
+        /*== Log_tbl_L ==*/
+        .align 64
+        .quad 0x0000000000000000
+        .quad 0x3d53ab33d066d1d2
+        .quad 0x3d2a342c2af0003c
+        .quad 0xbd43d3c873e20a07
+        .quad 0xbd4a21ac25d81ef3
+        .quad 0x3d59f1fc63382a8f
+        .quad 0xbd5ec27d0b7b37b3
+        .quad 0xbd50069ce24c53fb
+        .quad 0xbd584bf2b68d766f
+        .quad 0xbd5a21ac25d81ef3
+        .quad 0xbd3bb2cd720ec44c
+        .quad 0xbd55056d312f7668
+        .quad 0xbd1a07bd8b34be7c
+        .quad 0x3d5e83c094debc15
+        .quad 0x3d5aa33736867a17
+        .quad 0xbd46abb9df22bc57
+        /*== One ==*/
+        .align 64
+        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
+        /*== SmallThreshold ==*/
+        .align 64
+        .quad 0x3ef0000000000000, 0x3ef0000000000000, 0x3ef0000000000000, 0x3ef0000000000000, 0x3ef0000000000000, 0x3ef0000000000000, 0x3ef0000000000000, 0x3ef0000000000000
+        /*== Threshold ==*/
+        .align 64
+        .quad 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000
+        /*== LargeThreshold ==*/
+        .align 64
+        .quad 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff
+        /*== ca2 ==*/
+        .align 64
+        .quad 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7
+        /*== ca1 ==*/
+        .align 64
+        .quad 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e
+        /*== c4s ==*/
+        .align 64
+        .quad 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612
+        /*== c3s ==*/
+        .align 64
+        .quad 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000
+        /*== c2s ==*/
+        .align 64
+        .quad 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000
+        /*== c1s ==*/
+        .align 64
+        .quad 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000
+        /*== AddB5 ==*/
+        .align 64
+        .quad 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000
+        /*== RcpBitMask ==*/
+        .align 64
+        .quad 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000
+        /*== OneEighth ==*/
+        .align 64
+        .quad 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000
+        /*== Four ==*/
+        .align 64
+        .quad 0x4010000000000000, 0x4010000000000000, 0x4010000000000000, 0x4010000000000000, 0x4010000000000000, 0x4010000000000000, 0x4010000000000000, 0x4010000000000000
+        /*== poly_coeff9 ==*/
+        .align 64
+        .quad 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368
+        /*== poly_coeff8 ==*/
+        .align 64
+        .quad 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778
+        /*== poly_coeff7 ==*/
+        .align 64
+        .quad 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9
+        /*== poly_coeff6 ==*/
+        .align 64
+        .quad 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1
+        /*== poly_coeff5 ==*/
+        .align 64
+        .quad 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736
+        /*== poly_coeff4 ==*/
+        .align 64
+        .quad 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af
+        /*== poly_coeff3 ==*/
+        .align 64
+        .quad 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65
+        /*== poly_coeff2 ==*/
+        .align 64
+        .quad 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1
+        /*== poly_coeff1 ==*/
+        .align 64
+        .quad 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000
+        /*== L2H = log(2)_high ==*/
+        .align 64
+        .quad 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000
+        /*== L2L = log(2)_low ==*/
+        .align 64
+        .quad 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000
+        .align 64
+        .type	__svml_dacosh_data_internal_avx512,@object
+        .size	__svml_dacosh_data_internal_avx512,.-__svml_dacosh_data_internal_avx512
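
For reference, the formulation in the kernel's algorithm comment maps to
the scalar C sketch below (the helper name is invented for illustration;
the vector code above replaces sqrt with an rsqrt-seeded refinement and
log with the table-driven reduction):

  #include <math.h>

  /* Scalar model of acosh(x) = log(x + sqrt(x*x - 1)) with the special
     cases listed in the algorithm comment.  Illustration only.  */
  double
  acosh_ref (double x)
  {
    if (!(x >= 1.0))           /* NaN input, or x < 1 (including -INF).  */
      return NAN;              /* libm raises FE_INVALID for x < 1.  */
    if (isinf (x))             /* acosh(+INF) = +INF.  */
      return x;
    return log (x + sqrt (x * x - 1.0));    /* acosh(1) = +0.  */
  }

Computing x*x directly overflows for very large finite x; the "fixup for
very large inputs" path in the kernel (the OneEighth scaling under %k2
together with the Four exponent adjustment) is there to cover that range.
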
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core-avx2.S
new file mode 100644
index 0000000000..a54c6863c5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized acoshf, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN16v_acoshf _ZGVeN16v_acoshf_avx2_wrapper
+#include "../svml_s_acoshf16_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core.c
new file mode 100644
index 0000000000..8109b73ebf
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized acoshf, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN16v_acoshf
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN16v_acoshf, __GI__ZGVeN16v_acoshf,
+	       __redirect__ZGVeN16v_acoshf)
+  __attribute__ ((visibility ("hidden")));
+#endif
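
Both AVX-512 kernels in this patch use the same special-inputs protocol:
the lane mask accumulated in %k0 is moved to a general register, and
L(SPECIAL_VALUES_LOOP) walks its bits, calling the scalar libm routine
for each flagged element and patching the result back into the saved
vector.  A rough C model of that loop for the 16-lane float kernel that
follows (the helper name is hypothetical):

  #include <math.h>

  /* Model of L(RANGEMASK_CHECK)/L(SCALAR_MATH_CALL): 'mask' holds one
     bit per lane; flagged lanes are recomputed with scalar acoshf.  */
  void
  fixup_special_lanes (const float *in, float *out, unsigned int mask)
  {
    for (int i = 0; i < 16; i++)
      if (mask & (1u << i))
        out[i] = acoshf (in[i]);
  }
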
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core_avx512.S
new file mode 100644
index 0000000000..688ca38669
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core_avx512.S
@@ -0,0 +1,449 @@
+/* Function acoshf vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute acosh(x) as log(x + sqrt(x*x - 1))
+ *   using RSQRT instructions for starting the
+ *   square root approximation, and small table lookups for log
+ *   that map to AVX-512 permute instructions
+ *
+ *   Special cases:
+ *
+ *   acosh(NaN)  = quiet NaN, and raise invalid exception
+ *   acosh(-INF) = NaN
+ *   acosh(+INF) = +INF
+ *   acosh(x)    = NaN if x < 1
+ *   acosh(1)    = +0
+ *
+ */
+
+/* Offsets for data table __svml_sacosh_data_internal_avx512
+ */
+#define Log_tbl_H                     	0
+#define Log_tbl_L                     	128
+#define One                           	256
+#define SmallThreshold                	320
+#define Threshold                     	384
+#define LargeThreshold                	448
+#define ca1                           	512
+#define c2s                           	576
+#define c1s                           	640
+#define AddB5                         	704
+#define RcpBitMask                    	768
+#define OneEighth                     	832
+#define Four                          	896
+#define poly_coeff3                   	960
+#define poly_coeff2                   	1024
+#define poly_coeff1                   	1088
+#define L2H                           	1152
+#define L2L                           	1216
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN16v_acoshf_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovups   One+__svml_sacosh_data_internal_avx512(%rip), %zmm1
+
+/*
+ * sqrt(x^2-1) ~ Sh + Sl + Sh*Eh*poly_s
+ * poly_s = c1+c2*Eh
+ */
+        vmovups   c2s+__svml_sacosh_data_internal_avx512(%rip), %zmm13
+        vmovups   c1s+__svml_sacosh_data_internal_avx512(%rip), %zmm15
+
+/* polynomial computation for small inputs */
+        vmovups   ca1+__svml_sacosh_data_internal_avx512(%rip), %zmm9
+
+/* very large inputs ? */
+        vmovups   Threshold+__svml_sacosh_data_internal_avx512(%rip), %zmm10
+
+/* out of range inputs? */
+        vmovups   LargeThreshold+__svml_sacosh_data_internal_avx512(%rip), %zmm11
+
+/* not a very small input ? */
+        vmovups   SmallThreshold+__svml_sacosh_data_internal_avx512(%rip), %zmm6
+        vmovaps   %zmm0, %zmm8
+
+/* x^2 - 1 */
+        vmovaps   %zmm1, %zmm7
+        vfmsub231ps {rn-sae}, %zmm8, %zmm8, %zmm7
+        vcmpps    $21, {sae}, %zmm10, %zmm8, %k2
+        vcmpps    $22, {sae}, %zmm11, %zmm8, %k0
+        vcmpps    $18, {sae}, %zmm1, %zmm8, %k1
+        vrsqrt14ps %zmm7, %zmm12
+        vcmpps    $21, {sae}, %zmm6, %zmm7, %k3
+        vmulps    {rn-sae}, %zmm9, %zmm7, %zmm4
+
+/* Sh ~sqrt(-1+x^2) */
+        vmulps    {rn-sae}, %zmm12, %zmm7, %zmm5
+
+/* Sh+x */
+        vaddps    {rn-sae}, %zmm8, %zmm5, %zmm9
+
+/* (Yh*R0)_low */
+        vmovaps   %zmm7, %zmm0
+        korw      %k0, %k1, %k0
+
+/* rel. error term: Eh=1-Sh*R0 */
+        vmovaps   %zmm1, %zmm14
+        vfmsub213ps {rn-sae}, %zmm5, %zmm12, %zmm0
+        vfnmadd231ps {rn-sae}, %zmm5, %zmm12, %zmm14
+
+/* rel. error term: Eh=(1-Sh*R0)-Sl*R0 */
+        vfnmadd231ps {rn-sae}, %zmm0, %zmm12, %zmm14
+
+/* Sh*Eh */
+        vmulps    {rn-sae}, %zmm14, %zmm5, %zmm3
+        vfmadd231ps {rn-sae}, %zmm14, %zmm13, %zmm15
+
+/* Sl + Sh*Eh*poly_s */
+        vfmadd213ps {rn-sae}, %zmm0, %zmm15, %zmm3
+
+/* Shh */
+        vsubps    {rn-sae}, %zmm8, %zmm9, %zmm15
+
+/* polynomial computation for small inputs */
+        vaddps    {rn-sae}, %zmm3, %zmm5, %zmm0
+
+/* Xin0+Sl+Sh*Eh*poly_s ~ x+sqrt(x^2-1) */
+        vaddps    {rn-sae}, %zmm3, %zmm9, %zmm2
+
+/* Shl */
+        vsubps    {rn-sae}, %zmm15, %zmm5, %zmm10
+        vfmadd231ps {rn-sae}, %zmm0, %zmm4, %zmm0
+
+/* fixup for very large inputs */
+        vmovups   OneEighth+__svml_sacosh_data_internal_avx512(%rip), %zmm4
+
+/* Sl_high */
+        vsubps    {rn-sae}, %zmm9, %zmm2, %zmm5
+
+/* polynomial */
+        vmovups   poly_coeff3+__svml_sacosh_data_internal_avx512(%rip), %zmm9
+        vmulps    {rn-sae}, %zmm4, %zmm8, %zmm2{%k2}
+
+/* -K*L2L + Tl */
+        vmovups   L2L+__svml_sacosh_data_internal_avx512(%rip), %zmm4
+
+/* Sl_l */
+        vsubps    {rn-sae}, %zmm5, %zmm3, %zmm3
+        vrcp14ps  %zmm2, %zmm11
+        vmovups   Log_tbl_L+__svml_sacosh_data_internal_avx512(%rip), %zmm5
+
+/* Xin_low */
+        vaddps    {rn-sae}, %zmm10, %zmm3, %zmm13
+
+/* round reciprocal to 1+4b mantissas */
+        vpaddd    AddB5+__svml_sacosh_data_internal_avx512(%rip), %zmm11, %zmm12
+        vmovups   poly_coeff1+__svml_sacosh_data_internal_avx512(%rip), %zmm10
+        vandps    RcpBitMask+__svml_sacosh_data_internal_avx512(%rip), %zmm12, %zmm14
+
+/* fixup for very large inputs */
+        vxorps    %zmm13, %zmm13, %zmm13{%k2}
+
+/* reduced argument for log(): (Rcp*Xin-1)+Rcp*Xin_low */
+        vfmsub231ps {rn-sae}, %zmm14, %zmm2, %zmm1
+
+/* exponents */
+        vgetexpps {sae}, %zmm14, %zmm12
+        vmovups   Four+__svml_sacosh_data_internal_avx512(%rip), %zmm2
+
+/* Prepare table index */
+        vpsrld    $18, %zmm14, %zmm3
+        vfmadd231ps {rn-sae}, %zmm14, %zmm13, %zmm1
+        vmovups   poly_coeff2+__svml_sacosh_data_internal_avx512(%rip), %zmm13
+
+/* Table lookups */
+        vmovups   __svml_sacosh_data_internal_avx512(%rip), %zmm14
+        vsubps    {rn-sae}, %zmm2, %zmm12, %zmm12{%k2}
+        vpermt2ps Log_tbl_L+64+__svml_sacosh_data_internal_avx512(%rip), %zmm3, %zmm5
+        vpermt2ps Log_tbl_H+64+__svml_sacosh_data_internal_avx512(%rip), %zmm3, %zmm14
+
+/* R^2 */
+        vmulps    {rn-sae}, %zmm1, %zmm1, %zmm11
+
+/* -K*L2H + Th */
+        vmovups   L2H+__svml_sacosh_data_internal_avx512(%rip), %zmm2
+        vfmadd231ps {rn-sae}, %zmm1, %zmm9, %zmm13
+        vfnmadd231ps {rn-sae}, %zmm12, %zmm2, %zmm14
+        vfnmadd213ps {rn-sae}, %zmm5, %zmm4, %zmm12
+        vfmadd213ps {rn-sae}, %zmm10, %zmm1, %zmm13
+
+/* Tl + R^2*Poly */
+        vfmadd213ps {rn-sae}, %zmm12, %zmm11, %zmm13
+
+/* R+Tl + R^2*Poly */
+        vaddps    {rn-sae}, %zmm1, %zmm13, %zmm1
+        vaddps    {rn-sae}, %zmm1, %zmm14, %zmm0{%k3}
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 k0 zmm0 zmm8
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm8, 64(%rsp)
+        vmovups   %zmm0, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 k0 zmm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax k0
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        kmovd     %k0, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $16, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     64(%rsp,%r14,4), %xmm0
+        call      acoshf@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 128(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN16v_acoshf_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_sacosh_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 Log_tbl_H[32][1];
+        __declspec(align(64)) VUINT32 Log_tbl_L[32][1];
+        __declspec(align(64)) VUINT32 One[16][1];
+        __declspec(align(64)) VUINT32 SmallThreshold[16][1];
+        __declspec(align(64)) VUINT32 Threshold[16][1];
+        __declspec(align(64)) VUINT32 LargeThreshold[16][1];
+        __declspec(align(64)) VUINT32 ca1[16][1];
+        __declspec(align(64)) VUINT32 c2s[16][1];
+        __declspec(align(64)) VUINT32 c1s[16][1];
+        __declspec(align(64)) VUINT32 AddB5[16][1];
+        __declspec(align(64)) VUINT32 RcpBitMask[16][1];
+        __declspec(align(64)) VUINT32 OneEighth[16][1];
+        __declspec(align(64)) VUINT32 Four[16][1];
+        __declspec(align(64)) VUINT32 poly_coeff3[16][1];
+        __declspec(align(64)) VUINT32 poly_coeff2[16][1];
+        __declspec(align(64)) VUINT32 poly_coeff1[16][1];
+        __declspec(align(64)) VUINT32 L2H[16][1];
+        __declspec(align(64)) VUINT32 L2L[16][1];
+    } __svml_sacosh_data_internal_avx512;
+#endif
+__svml_sacosh_data_internal_avx512:
+        /*== Log_tbl_H ==*/
+        .long 0x00000000
+        .long 0xbcfc0000
+        .long 0xbd788000
+        .long 0xbdb78000
+        .long 0xbdf14000
+        .long 0xbe14a000
+        .long 0xbe300000
+        .long 0xbe4aa000
+        .long 0xbe648000
+        .long 0xbe7dc000
+        .long 0xbe8b4000
+        .long 0xbe974000
+        .long 0xbea31000
+        .long 0xbeae9000
+        .long 0xbeb9d000
+        .long 0xbec4d000
+        .long 0xbecfa000
+        .long 0xbeda2000
+        .long 0xbee48000
+        .long 0xbeeea000
+        .long 0xbef89000
+        .long 0xbf012800
+        .long 0xbf05f000
+        .long 0xbf0aa800
+        .long 0xbf0f4000
+        .long 0xbf13c800
+        .long 0xbf184000
+        .long 0xbf1ca000
+        .long 0xbf20f000
+        .long 0xbf252800
+        .long 0xbf295000
+        .long 0xbf2d6800
+        /*== Log_tbl_L ==*/
+        .align 64
+        .long 0x80000000
+        .long 0xb726c39e
+        .long 0x3839e7fe
+        .long 0xb7528ae5
+        .long 0x377891d5
+        .long 0xb8297c10
+        .long 0x37cf8f58
+        .long 0x3852b186
+        .long 0x35838656
+        .long 0xb80c36af
+        .long 0x38235454
+        .long 0xb862bae1
+        .long 0x37e87bc7
+        .long 0x37848150
+        .long 0x37202511
+        .long 0xb74e1b05
+        .long 0x385c1340
+        .long 0xb8777bcd
+        .long 0x36038656
+        .long 0xb7d40984
+        .long 0xb80f5faf
+        .long 0xb8254b4c
+        .long 0xb865c84a
+        .long 0x37f0b42d
+        .long 0xb83ebce1
+        .long 0xb83c2513
+        .long 0x37a332c4
+        .long 0x3779654f
+        .long 0x38602f73
+        .long 0x367449f8
+        .long 0xb7b4996f
+        .long 0xb800986b
+        /*== One ==*/
+        .align 64
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== SmallThreshold ==*/
+        .align 64
+        .long 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000
+        /*== Threshold ==*/
+        .align 64
+        .long 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000
+        /*== LargeThreshold ==*/
+        .align 64
+        .long 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff
+        /*== ca1 ==*/
+        .align 64
+        .long 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE
+        /*== c2s ==*/
+        .align 64
+        .long 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000
+        /*== c1s ==*/
+        .align 64
+        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000
+        /*== AddB5 ==*/
+        .align 64
+        .long 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000
+        /*== RcpBitMask ==*/
+        .align 64
+        .long 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000
+        /*== OneEighth ==*/
+        .align 64
+        .long 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000
+        /*== Four ==*/
+        .align 64
+        .long 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000
+        /*== poly_coeff3 ==*/
+        .align 64
+        .long 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810
+        /*== poly_coeff2 ==*/
+        .align 64
+        .long 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e
+        /*== poly_coeff1 ==*/
+        .align 64
+        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000
+        /*== L2H = log(2)_high ==*/
+        .align 64
+        .long 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000
+        /*== L2L = log(2)_low ==*/
+        .align 64
+        .long 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4
+        .align 64
+        .type	__svml_sacosh_data_internal_avx512,@object
+        .size	__svml_sacosh_data_internal_avx512,.-__svml_sacosh_data_internal_avx512
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core-sse2.S
new file mode 100644
index 0000000000..d789ec1d47
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized acoshf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN4v_acoshf _ZGVbN4v_acoshf_sse2
+#include "../svml_s_acoshf4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core.c
new file mode 100644
index 0000000000..b2d9101c47
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized acoshf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN4v_acoshf
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN4v_acoshf, __GI__ZGVbN4v_acoshf,
+	       __redirect__ZGVbN4v_acoshf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core_sse4.S
new file mode 100644
index 0000000000..e897ea304f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core_sse4.S
@@ -0,0 +1,389 @@
+/* Function acoshf vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute acosh(x) as log(x + sqrt(x*x - 1))
+ *
+ *   Special cases:
+ *
+ *   acosh(NaN)  = quiet NaN, and raise invalid exception
+ *   acosh(-INF) = NaN
+ *   acosh(+INF) = +INF
+ *   acosh(x)    = NaN if x < 1
+ *   acosh(1)    = +0
+ *
+ */
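+
+/* For reference only (not part of the vector code): a minimal scalar C
+   sketch of the approach, assuming <math.h> and <float.h>; the function
+   name is illustrative and the large-input rescaling described further
+   below is omitted:
+
+     float acoshf_sketch (float x)
+     {
+       if (!(x > 1.0f) || x > FLT_MAX)   // x <= 1, NaN or +Inf: special case
+         return acoshf (x);              // defer to the scalar routine
+       float u = x - 1.0f, v = x + 1.0f; // so that u*v = x*x - 1
+       return log1pf (u + sqrtf (u * v));
+     }
+ */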
+
+/* Offsets for data table __svml_sacosh_data_internal
+ */
+#define sOne                          	0
+#define sPoly                         	16
+#define iBrkValue                     	144
+#define iOffExpoMask                  	160
+#define sBigThreshold                 	176
+#define sC2                           	192
+#define sC3                           	208
+#define sHalf                         	224
+#define sLargestFinite                	240
+#define sThirtyOne                    	256
+#define sTopMask8                     	272
+#define XScale                        	288
+#define sLn2                          	304
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN4v_acoshf_sse4)
+        subq      $72, %rsp
+        cfi_def_cfa_offset(80)
+
+/* Compute U = X - 1 and V = X + 1, naively first. */
+        movaps    %xmm0, %xmm12
+
+/* Load constants, always including One = 1 */
+        movups    sOne+__svml_sacosh_data_internal(%rip), %xmm2
+
+/*
+ * Check that 1 < X < +inf; otherwise go to the callout function.
+ * We need the callout for X = 1 to avoid division by zero below.
+ * This test ensures that callout handles NaN and either infinity.
+ */
+        movaps    %xmm0, %xmm4
+        movaps    %xmm2, %xmm9
+
+/*
+ * Compute e = -(2 * d + d^2)
+ * The first FMR is exact, and the rounding error in the other is acceptable
+ * since d and e are ~ 2^-8
+ */
+        movaps    %xmm2, %xmm10
+
+/* Finally, express Y + W = U * V accurately where Y has <= 8 bits */
+        movups    sTopMask8+__svml_sacosh_data_internal(%rip), %xmm5
+
+/*
+ * Now we feed into the log1p code, using H in place of _VARG1 and
+ * also adding L into Xl.
+ * compute 1+x as high, low parts
+ */
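+
+/* A minimal sketch of the high/low split done below with maxps/minps
+   (Fast2Sum; exact because the larger-magnitude operand comes first),
+   where big/small stand for max(1, Xin)/min(1, Xin):
+
+     float hi = big + small;           // rounded sum
+     float lo = (big - hi) + small;    // rounding error of the sum
+ */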
+        movaps    %xmm2, %xmm13
+        movaps    %xmm5, %xmm11
+        movaps    %xmm2, %xmm3
+
+/*
+ * Now       1 / (1 + d)
+ * = 1 / (1 + (sqrt(1 - e) - 1))
+ * = 1 / sqrt(1 - e)
+ * = 1 + 1/2 * e + 3/8 * e^2 + 5/16 * e^3 + 35/128 * e^4 + ...
+ * So compute the first three nonconstant terms of that, so that
+ * we have a relative correction (1 + Corr) to apply to S etc.
+ * C1 = 1/2
+ * C2 = 3/8
+ * C3 = 5/16
+ */
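+
+/* In scalar form the correction computed below is simply
+   (sHalf = 0.5, sC2 = 0.375, sC3 = 0.3125 in the data table):
+
+     float corr = e * (0.5f + e * (0.375f + e * 0.3125f));  // ~ 1/sqrt(1-e) - 1
+ */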
+        movups    sC3+__svml_sacosh_data_internal(%rip), %xmm8
+
+/*
+ * The following computation can go wrong for very large X, e.g.
+ * the X^2 - 1 = U * V can overflow. But for large X we have
+ * acosh(X) / log(2 X) - 1 =~= 1/(4 * X^2), so for X >= 2^30
+ * we can just later stick X back into the log and tweak up the exponent.
+ * Actually we scale X by 2^-30 and tweak the exponent up by 31,
+ * to stay in the safe range for the later log computation.
+ * Compute a flag now telling us when to do this.
+ */
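+
+/* Scalar illustration of that fixup (sBigThreshold = 2^30,
+   XScale = 2^-30, sThirtyOne = 31 in the data table):
+
+     if (x >= 0x1p30f)
+       result = logf (x * 0x1p-30f) + 31.0f * logf (2.0f);  // == logf (2*x) ~= acoshf (x)
+ */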
+        movaps    %xmm0, %xmm1
+        cmpnleps  sLargestFinite+__svml_sacosh_data_internal(%rip), %xmm4
+        cmpltps   sBigThreshold+__svml_sacosh_data_internal(%rip), %xmm1
+        cmpnltps  %xmm0, %xmm3
+        subps     %xmm2, %xmm12
+        addps     %xmm0, %xmm9
+
+/* For low-accuracy versions, naivety is harmless */
+        mulps     %xmm12, %xmm9
+        orps      %xmm3, %xmm4
+        movmskps  %xmm4, %edx
+        andps     %xmm9, %xmm11
+        movaps    %xmm1, %xmm3
+
+/*
+ * Compute R = 1/sqrt(Y + W) * (1 + d)
+ * Force R to <= 8 significant bits.
+ * This means that R * Y and R^2 * Y are exactly representable.
+ */
+        rsqrtps   %xmm11, %xmm7
+        subps     %xmm11, %xmm9
+        andps     %xmm5, %xmm7
+        movaps    %xmm2, %xmm4
+
+/*
+ * Compute S = (Y/sqrt(Y + W)) * (1 + d)
+ * and T = (W/sqrt(Y + W)) * (1 + d)
+ * so that S + T = sqrt(Y + W) * (1 + d)
+ * S is exact, and the rounding error in T is OK.
+ */
+        mulps     %xmm7, %xmm11
+        movaps    %xmm7, %xmm6
+        mulps     %xmm7, %xmm9
+        mulps     %xmm11, %xmm6
+        mulps     %xmm9, %xmm7
+
+/*
+ * For low-accuracy versions, the computation can be done
+ * just as U + ((S + T) + (S + T) * Corr)
+ */
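+
+/* i.e., in scalar form:
+
+     xin = U + ((S + T) + (S + T) * corr);  // ~ (x - 1) + sqrt (x*x - 1), fed to log1p
+ */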
+        addps     %xmm9, %xmm11
+        subps     %xmm6, %xmm10
+        movaps    %xmm2, %xmm9
+        subps     %xmm7, %xmm10
+        mulps     %xmm10, %xmm8
+
+/* Now multiplex to the case X = 2^-30 * input, Xl = 0 in the "big" case. */
+        movups    XScale+__svml_sacosh_data_internal(%rip), %xmm14
+        mulps     %xmm0, %xmm14
+        addps     sC2+__svml_sacosh_data_internal(%rip), %xmm8
+        mulps     %xmm10, %xmm8
+        andnps    %xmm14, %xmm3
+
+/*
+ * Now resume the main code.
+ * reduction: compute r,n
+ */
+        movdqu    iBrkValue+__svml_sacosh_data_internal(%rip), %xmm14
+        movdqu    iOffExpoMask+__svml_sacosh_data_internal(%rip), %xmm5
+
+/* Add 31 to the exponent in the "large" case to get log(2 * input) */
+        movups    sThirtyOne+__svml_sacosh_data_internal(%rip), %xmm6
+        addps     sHalf+__svml_sacosh_data_internal(%rip), %xmm8
+        mulps     %xmm8, %xmm10
+        movaps    %xmm1, %xmm8
+        mulps     %xmm11, %xmm10
+        addps     %xmm10, %xmm11
+        addps     %xmm11, %xmm12
+        maxps     %xmm12, %xmm13
+        minps     %xmm12, %xmm9
+        movaps    %xmm13, %xmm15
+        addps     %xmm9, %xmm15
+        subps     %xmm15, %xmm13
+        andps     %xmm1, %xmm15
+        orps      %xmm15, %xmm3
+        addps     %xmm13, %xmm9
+        psubd     %xmm14, %xmm3
+        andps     %xmm1, %xmm9
+        pand      %xmm3, %xmm5
+        psrad     $23, %xmm3
+        cvtdq2ps  %xmm3, %xmm7
+        pslld     $23, %xmm3
+        paddd     %xmm14, %xmm5
+        psubd     %xmm3, %xmm4
+
+/* polynomial evaluation */
+        subps     %xmm2, %xmm5
+        mulps     %xmm4, %xmm9
+        addps     %xmm7, %xmm6
+        movups    sPoly+112+__svml_sacosh_data_internal(%rip), %xmm2
+        andnps    %xmm6, %xmm8
+        andps     %xmm1, %xmm7
+        addps     %xmm5, %xmm9
+        mulps     %xmm9, %xmm2
+        orps      %xmm7, %xmm8
+
+/* final reconstruction */
+        mulps     sLn2+__svml_sacosh_data_internal(%rip), %xmm8
+        addps     sPoly+96+__svml_sacosh_data_internal(%rip), %xmm2
+        mulps     %xmm9, %xmm2
+        addps     sPoly+80+__svml_sacosh_data_internal(%rip), %xmm2
+        mulps     %xmm9, %xmm2
+        addps     sPoly+64+__svml_sacosh_data_internal(%rip), %xmm2
+        mulps     %xmm9, %xmm2
+        addps     sPoly+48+__svml_sacosh_data_internal(%rip), %xmm2
+        mulps     %xmm9, %xmm2
+        addps     sPoly+32+__svml_sacosh_data_internal(%rip), %xmm2
+        mulps     %xmm9, %xmm2
+        addps     sPoly+16+__svml_sacosh_data_internal(%rip), %xmm2
+        mulps     %xmm9, %xmm2
+        addps     sPoly+__svml_sacosh_data_internal(%rip), %xmm2
+        mulps     %xmm9, %xmm2
+        mulps     %xmm9, %xmm2
+        addps     %xmm2, %xmm9
+        addps     %xmm8, %xmm9
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm9
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movaps    %xmm9, %xmm0
+        addq      $72, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(80)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm0, 32(%rsp)
+        movups    %xmm9, 48(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 edx
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -64)
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -72)
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm9
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -64)
+        cfi_offset(13, -72)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm9
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      acoshf@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 48(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN4v_acoshf_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_sacosh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(16)) VUINT32 sOne[4][1];
+        __declspec(align(16)) VUINT32 sPoly[8][4][1];
+        __declspec(align(16)) VUINT32 iBrkValue[4][1];
+        __declspec(align(16)) VUINT32 iOffExpoMask[4][1];
+        __declspec(align(16)) VUINT32 sBigThreshold[4][1];
+        __declspec(align(16)) VUINT32 sC2[4][1];
+        __declspec(align(16)) VUINT32 sC3[4][1];
+        __declspec(align(16)) VUINT32 sHalf[4][1];
+        __declspec(align(16)) VUINT32 sLargestFinite[4][1];
+        __declspec(align(16)) VUINT32 sThirtyOne[4][1];
+        __declspec(align(16)) VUINT32 sTopMask8[4][1];
+        __declspec(align(16)) VUINT32 XScale[4][1];
+        __declspec(align(16)) VUINT32 sLn2[4][1];
+} __svml_sacosh_data_internal;
+#endif
+__svml_sacosh_data_internal:
+        /*== sOne = SP 1.0 ==*/
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== sPoly[] = SP polynomial ==*/
+        .align 16
+        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
+        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
+        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
+        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
+        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
+        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
+        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
+        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
+        /*== iBrkValue = SP 2/3 ==*/
+        .align 16
+        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
+        /*== iOffExpoMask = SP significand mask ==*/
+        .align 16
+        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
+        /*== sBigThreshold ==*/
+        .align 16
+        .long 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000
+        /*== sC2 ==*/
+        .align 16
+        .long 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000
+        /*== sC3 ==*/
+        .align 16
+        .long 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000
+        /*== sHalf ==*/
+        .align 16
+        .long 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000
+        /*== sLargestFinite ==*/
+        .align 16
+        .long 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF
+        /*== sThirtyOne ==*/
+        .align 16
+        .long 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000
+        /*== sTopMask8 ==*/
+        .align 16
+        .long 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000
+        /*== XScale ==*/
+        .align 16
+        .long 0x30800000, 0x30800000, 0x30800000, 0x30800000
+        /*== sLn2 = SP ln(2) ==*/
+        .align 16
+        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
+        .align 16
+        .type	__svml_sacosh_data_internal,@object
+        .size	__svml_sacosh_data_internal,.-__svml_sacosh_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core-sse.S
new file mode 100644
index 0000000000..cb97d291c5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized acoshf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN8v_acoshf _ZGVdN8v_acoshf_sse_wrapper
+#include "../svml_s_acoshf8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core.c
new file mode 100644
index 0000000000..db71194cd0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized acoshf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN8v_acoshf
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN8v_acoshf, __GI__ZGVdN8v_acoshf,
+	       __redirect__ZGVdN8v_acoshf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core_avx2.S
new file mode 100644
index 0000000000..1d847fcd40
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core_avx2.S
@@ -0,0 +1,370 @@
+/* Function acoshf vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute acosh(x) as log(x + sqrt(x*x - 1))
+ *
+ *   Special cases:
+ *
+ *   acosh(NaN)  = quiet NaN, and raise invalid exception
+ *   acosh(-INF) = NaN
+ *   acosh(+INF) = +INF
+ *   acosh(x)    = NaN if x < 1
+ *   acosh(1)    = +0
+ *
+ */
+
+/* Offsets for data table __svml_sacosh_data_internal
+ */
+#define sOne                          	0
+#define sPoly                         	32
+#define iBrkValue                     	288
+#define iOffExpoMask                  	320
+#define sBigThreshold                 	352
+#define sC2                           	384
+#define sC3                           	416
+#define sHalf                         	448
+#define sLargestFinite                	480
+#define sThirtyOne                    	512
+#define sTopMask8                     	544
+#define XScale                        	576
+#define sLn2                          	608
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN8v_acoshf_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+
+/* Load constants, always including One = 1 */
+        vmovups   sOne+__svml_sacosh_data_internal(%rip), %ymm2
+
+/* Finally, express Y + W = U * V accurately where Y has <= 8 bits */
+        vmovups   sTopMask8+__svml_sacosh_data_internal(%rip), %ymm9
+
+/*
+ * Now       1 / (1 + d)
+ * = 1 / (1 + (sqrt(1 - e) - 1))
+ * = 1 / sqrt(1 - e)
+ * = 1 + 1/2 * e + 3/8 * e^2 + 5/16 * e^3 + 35/128 * e^4 + ...
+ * So compute the first three nonconstant terms of that, so that
+ * we have a relative correction (1 + Corr) to apply to S etc.
+ * C1 = 1/2
+ * C2 = 3/8
+ * C3 = 5/16
+ */
+        vmovups   sC3+__svml_sacosh_data_internal(%rip), %ymm14
+        vmovaps   %ymm0, %ymm3
+        vmovaps   %ymm2, %ymm7
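+
+/* ymm7 = x*x - 1 in a single fused multiply-subtract (scalar
+   equivalent: fmaf (x, x, -1.0f)); the SSE4 variant forms the same
+   quantity as U*V = (x-1)*(x+1) instead.  */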
+        vfmsub231ps %ymm3, %ymm3, %ymm7
+
+/*
+ * Check that 1 < X < +inf; otherwise go to the callout function.
+ * We need the callout for X = 1 to avoid division by zero below.
+ * This test ensures that callout handles NaN and either infinity.
+ */
+        vcmpnle_uqps sLargestFinite+__svml_sacosh_data_internal(%rip), %ymm3, %ymm4
+        vcmpngt_uqps %ymm2, %ymm3, %ymm5
+
+/*
+ * The following computation can go wrong for very large X, e.g.
+ * the X^2 - 1 = U * V can overflow. But for large X we have
+ * acosh(X) / log(2 X) - 1 =~= 1/(4 * X^2), so for X >= 2^30
+ * we can just later stick X back into the log and tweak up the exponent.
+ * Actually we scale X by 2^-30 and tweak the exponent up by 31,
+ * to stay in the safe range for the later log computation.
+ * Compute a flag now telling us when to do this.
+ */
+        vcmplt_oqps sBigThreshold+__svml_sacosh_data_internal(%rip), %ymm3, %ymm1
+        vandps    %ymm9, %ymm7, %ymm10
+
+/*
+ * Compute R = 1/sqrt(Y + W) * (1 + d)
+ * Force R to <= 8 significant bits.
+ * This means that R * Y and R^2 * Y are exactly representable.
+ */
+        vrsqrtps  %ymm10, %ymm8
+        vsubps    %ymm10, %ymm7, %ymm11
+        vandps    %ymm9, %ymm8, %ymm12
+
+/*
+ * Compute S = (Y/sqrt(Y + W)) * (1 + d)
+ * and T = (W/sqrt(Y + W)) * (1 + d)
+ * so that S + T = sqrt(Y + W) * (1 + d)
+ * S is exact, and the rounding error in T is OK.
+ */
+        vmulps    %ymm12, %ymm10, %ymm15
+        vmulps    %ymm11, %ymm12, %ymm0
+
+/* Now multiplex to the case X = 2^-30 * input, Xl = 0 in the "big" case. */
+        vmulps    XScale+__svml_sacosh_data_internal(%rip), %ymm3, %ymm11
+
+/*
+ * Compute e = -(2 * d + d^2)
+ * The first FMR is exact, and the rounding error in the other is acceptable
+ * since d and e are ~ 2^-8
+ */
+        vmovaps   %ymm2, %ymm13
+        vfnmadd231ps %ymm15, %ymm12, %ymm13
+        vfnmadd231ps %ymm0, %ymm12, %ymm13
+        vfmadd213ps sC2+__svml_sacosh_data_internal(%rip), %ymm13, %ymm14
+        vfmadd213ps sHalf+__svml_sacosh_data_internal(%rip), %ymm13, %ymm14
+        vmulps    %ymm14, %ymm13, %ymm7
+        vorps     %ymm5, %ymm4, %ymm6
+
+/*
+ * For low-accuracy versions, the computation can be done
+ * just as U + ((S + T) + (S + T) * Corr)
+ */
+        vaddps    %ymm0, %ymm15, %ymm5
+
+/* sU is needed later on */
+        vsubps    %ymm2, %ymm3, %ymm4
+        vfmadd213ps %ymm5, %ymm7, %ymm5
+        vmovmskps %ymm6, %edx
+        vaddps    %ymm5, %ymm4, %ymm6
+
+/*
+ * Now resume the main code.
+ * reduction: compute r,n
+ */
+        vmovups   iBrkValue+__svml_sacosh_data_internal(%rip), %ymm4
+
+/*
+ * Now we feed into the log1p code, using H in place of _VARG1 and
+ * also adding L into Xl.
+ * compute 1+x as high, low parts
+ */
+        vmaxps    %ymm6, %ymm2, %ymm8
+        vminps    %ymm6, %ymm2, %ymm9
+        vaddps    %ymm9, %ymm8, %ymm12
+        vblendvps %ymm1, %ymm12, %ymm11, %ymm14
+        vsubps    %ymm12, %ymm8, %ymm10
+        vpsubd    %ymm4, %ymm14, %ymm15
+        vaddps    %ymm10, %ymm9, %ymm13
+        vpand     iOffExpoMask+__svml_sacosh_data_internal(%rip), %ymm15, %ymm14
+        vpsrad    $23, %ymm15, %ymm15
+        vpaddd    %ymm4, %ymm14, %ymm8
+        vpslld    $23, %ymm15, %ymm5
+        vmovups   sPoly+224+__svml_sacosh_data_internal(%rip), %ymm4
+        vcvtdq2ps %ymm15, %ymm0
+        vpsubd    %ymm5, %ymm2, %ymm7
+
+/* polynomial evaluation */
+        vsubps    %ymm2, %ymm8, %ymm2
+
+/* Add 31 to the exponent in the "large" case to get log(2 * input) */
+        vaddps    sThirtyOne+__svml_sacosh_data_internal(%rip), %ymm0, %ymm5
+        vandps    %ymm1, %ymm13, %ymm6
+        vmulps    %ymm7, %ymm6, %ymm9
+        vblendvps %ymm1, %ymm0, %ymm5, %ymm0
+        vaddps    %ymm2, %ymm9, %ymm2
+        vfmadd213ps sPoly+192+__svml_sacosh_data_internal(%rip), %ymm2, %ymm4
+        vfmadd213ps sPoly+160+__svml_sacosh_data_internal(%rip), %ymm2, %ymm4
+        vfmadd213ps sPoly+128+__svml_sacosh_data_internal(%rip), %ymm2, %ymm4
+        vfmadd213ps sPoly+96+__svml_sacosh_data_internal(%rip), %ymm2, %ymm4
+        vfmadd213ps sPoly+64+__svml_sacosh_data_internal(%rip), %ymm2, %ymm4
+        vfmadd213ps sPoly+32+__svml_sacosh_data_internal(%rip), %ymm2, %ymm4
+        vfmadd213ps sPoly+__svml_sacosh_data_internal(%rip), %ymm2, %ymm4
+        vmulps    %ymm4, %ymm2, %ymm6
+        vfmadd213ps %ymm2, %ymm2, %ymm6
+
+/* final reconstruction */
+        vfmadd132ps sLn2+__svml_sacosh_data_internal(%rip), %ymm6, %ymm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm3
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %ymm3, 32(%rsp)
+        vmovups   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      acoshf@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 64(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN8v_acoshf_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_sacosh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(32)) VUINT32 sOne[8][1];
+        __declspec(align(32)) VUINT32 sPoly[8][8][1];
+        __declspec(align(32)) VUINT32 iBrkValue[8][1];
+        __declspec(align(32)) VUINT32 iOffExpoMask[8][1];
+        __declspec(align(32)) VUINT32 sBigThreshold[8][1];
+        __declspec(align(32)) VUINT32 sC2[8][1];
+        __declspec(align(32)) VUINT32 sC3[8][1];
+        __declspec(align(32)) VUINT32 sHalf[8][1];
+        __declspec(align(32)) VUINT32 sLargestFinite[8][1];
+        __declspec(align(32)) VUINT32 sThirtyOne[8][1];
+        __declspec(align(32)) VUINT32 sTopMask8[8][1];
+        __declspec(align(32)) VUINT32 XScale[8][1];
+        __declspec(align(32)) VUINT32 sLn2[8][1];
+} __svml_sacosh_data_internal;
+#endif
+__svml_sacosh_data_internal:
+        /*== sOne = SP 1.0 ==*/
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== sPoly[] = SP polynomial ==*/
+        .align 32
+        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
+        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
+        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
+        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
+        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
+        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
+        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
+        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
+        /*== iBrkValue = SP 2/3 ==*/
+        .align 32
+        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
+        /*== iOffExpoMask = SP significand mask ==*/
+        .align 32
+        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
+        /*== sBigThreshold ==*/
+        .align 32
+        .long 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000
+        /*== sC2 ==*/
+        .align 32
+        .long 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000
+        /*== sC3 ==*/
+        .align 32
+        .long 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000
+        /*== sHalf ==*/
+        .align 32
+        .long 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000
+        /*== sLargestFinite ==*/
+        .align 32
+        .long 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF
+        /*== sThirtyOne ==*/
+        .align 32
+        .long 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000
+        /*== sTopMask8 ==*/
+        .align 32
+        .long 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000
+        /*== XScale ==*/
+        .align 32
+        .long 0x30800000, 0x30800000, 0x30800000, 0x30800000, 0x30800000, 0x30800000, 0x30800000, 0x30800000
+        /*== sLn2 = SP ln(2) ==*/
+        .align 32
+        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
+        .align 32
+        .type	__svml_sacosh_data_internal,@object
+        .size	__svml_sacosh_data_internal,.-__svml_sacosh_data_internal
diff --git a/sysdeps/x86_64/fpu/svml_d_acosh2_core.S b/sysdeps/x86_64/fpu/svml_d_acosh2_core.S
new file mode 100644
index 0000000000..42bd5c1b5d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_acosh2_core.S
@@ -0,0 +1,29 @@
+/* Function acosh vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN2v_acosh)
+WRAPPER_IMPL_SSE2 acosh
+END (_ZGVbN2v_acosh)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN2v_acosh)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_acosh4_core.S b/sysdeps/x86_64/fpu/svml_d_acosh4_core.S
new file mode 100644
index 0000000000..433192bae1
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_acosh4_core.S
@@ -0,0 +1,29 @@
+/* Function acosh vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN4v_acosh)
+WRAPPER_IMPL_AVX _ZGVbN2v_acosh
+END (_ZGVdN4v_acosh)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN4v_acosh)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_acosh4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_acosh4_core_avx.S
new file mode 100644
index 0000000000..9e60289c45
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_acosh4_core_avx.S
@@ -0,0 +1,25 @@
+/* Function acosh vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN4v_acosh)
+WRAPPER_IMPL_AVX _ZGVbN2v_acosh
+END (_ZGVcN4v_acosh)
diff --git a/sysdeps/x86_64/fpu/svml_d_acosh8_core.S b/sysdeps/x86_64/fpu/svml_d_acosh8_core.S
new file mode 100644
index 0000000000..ef1f8b3426
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_acosh8_core.S
@@ -0,0 +1,25 @@
+/* Function acosh vectorized with AVX-512, wrapper to AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN8v_acosh)
+WRAPPER_IMPL_AVX512 _ZGVdN4v_acosh
+END (_ZGVeN8v_acosh)
diff --git a/sysdeps/x86_64/fpu/svml_s_acoshf16_core.S b/sysdeps/x86_64/fpu/svml_s_acoshf16_core.S
new file mode 100644
index 0000000000..41c0241492
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_acoshf16_core.S
@@ -0,0 +1,25 @@
+/* Function acoshf vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN16v_acoshf)
+WRAPPER_IMPL_AVX512 _ZGVdN8v_acoshf
+END (_ZGVeN16v_acoshf)
diff --git a/sysdeps/x86_64/fpu/svml_s_acoshf4_core.S b/sysdeps/x86_64/fpu/svml_s_acoshf4_core.S
new file mode 100644
index 0000000000..2ef7f428c0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_acoshf4_core.S
@@ -0,0 +1,29 @@
+/* Function acoshf vectorized with SSE2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN4v_acoshf)
+WRAPPER_IMPL_SSE2 acoshf
+END (_ZGVbN4v_acoshf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4v_acoshf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_acoshf8_core.S b/sysdeps/x86_64/fpu/svml_s_acoshf8_core.S
new file mode 100644
index 0000000000..40f1066ce2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_acoshf8_core.S
@@ -0,0 +1,29 @@
+/* Function acoshf vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN8v_acoshf)
+WRAPPER_IMPL_AVX _ZGVbN4v_acoshf
+END (_ZGVdN8v_acoshf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8v_acoshf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_acoshf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_acoshf8_core_avx.S
new file mode 100644
index 0000000000..b44a9ed28b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_acoshf8_core_avx.S
@@ -0,0 +1,25 @@
+/* Function acoshf vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+        .text
+ENTRY (_ZGVcN8v_acoshf)
+WRAPPER_IMPL_AVX _ZGVbN4v_acoshf
+END (_ZGVcN8v_acoshf)
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx.c
new file mode 100644
index 0000000000..331c6d71cc
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-acosh.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx2.c
new file mode 100644
index 0000000000..331c6d71cc
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx2.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-acosh.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx512f.c
new file mode 100644
index 0000000000..331c6d71cc
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx512f.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-acosh.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-acosh.c b/sysdeps/x86_64/fpu/test-double-libmvec-acosh.c
new file mode 100644
index 0000000000..19b5997414
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-acosh.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE double
+#define LIBMVEC_FUNC acosh
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index 04a4fe654b..db7ae3e7a6 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -42,6 +42,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVbN2v_log10)
 VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVbN2v_log2)
 VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVbN2v_log1p)
 VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVbN2v_atanh)
+VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVbN2v_acosh)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index f9ac2fad5d..269ae38f67 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -45,6 +45,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVdN4v_log10)
 VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVdN4v_log2)
 VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVdN4v_log1p)
 VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVdN4v_atanh)
+VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVdN4v_acosh)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m256i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index 185801fa82..d95b960a45 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -42,6 +42,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVcN4v_log10)
 VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVcN4v_log2)
 VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVcN4v_log1p)
 VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVcN4v_atanh)
+VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVcN4v_acosh)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index 1cc8aaecbf..a22f08b5f8 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -42,6 +42,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVeN8v_log10)
 VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVeN8v_log2)
 VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVeN8v_log1p)
 VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVeN8v_atanh)
+VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVeN8v_acosh)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx.c
new file mode 100644
index 0000000000..7d75108bc0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-acoshf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx2.c
new file mode 100644
index 0000000000..7d75108bc0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx2.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-acoshf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx512f.c
new file mode 100644
index 0000000000..7d75108bc0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx512f.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-acoshf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-acoshf.c b/sysdeps/x86_64/fpu/test-float-libmvec-acoshf.c
new file mode 100644
index 0000000000..f8b536df2e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-acoshf.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE float
+#define LIBMVEC_FUNC acoshf
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index b5d76d80e0..7982ae2c84 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -42,6 +42,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVeN16v_log10f)
 VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVeN16v_log2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVeN16v_log1pf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVeN16v_atanhf)
+VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVeN16v_acoshf)
 
 #define VEC_INT_TYPE __m512i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index c1df6a03c1..bdfcbea2cd 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -42,6 +42,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVbN4v_log10f)
 VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVbN4v_log2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVbN4v_log1pf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVbN4v_atanhf)
+VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVbN4v_acoshf)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index f4c646683f..7b3ba81441 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -45,6 +45,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVdN8v_log10f)
 VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVdN8v_log2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVdN8v_log1pf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVdN8v_atanhf)
+VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVdN8v_acoshf)
 
 /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
 #undef VECTOR_WRAPPER_fFF
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index a6acd3ffca..a13d2e4ca1 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -42,6 +42,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVcN8v_log10f)
 VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVcN8v_log2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVcN8v_log1pf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVcN8v_atanhf)
+VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVcN8v_acoshf)
 
 #define VEC_INT_TYPE __m128i
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 16/18] x86-64: Add vector erf/erff implementation to libmvec
  2021-12-29  6:39 [PATCH v5 00/18] x86-64: Add vector math functions to libmvec Sunil K Pandey
                   ` (14 preceding siblings ...)
  2021-12-29  6:39 ` [PATCH v5 15/18] x86-64: Add vector acosh/acoshf " Sunil K Pandey
@ 2021-12-29  6:39 ` Sunil K Pandey
  2021-12-29 21:27   ` H.J. Lu
  2021-12-29  6:39 ` [PATCH v5 17/18] x86-64: Add vector tanh/tanhf " Sunil K Pandey
  2021-12-29  6:40 ` [PATCH v5 18/18] x86-64: Add vector asinh/asinhf " Sunil K Pandey
  17 siblings, 1 reply; 40+ messages in thread
From: Sunil K Pandey @ 2021-12-29  6:39 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, andrey.kolesov, marius.cornea

Implement vectorized erf/erff for libmvec, with SSE, AVX, AVX2 and AVX512
versions, as required by the vector ABI.  The patch also contains accuracy
and ABI tests for vector erf/erff, with regenerated ulps.
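
Editorial aside, not part of the patch: with these entry points exported, a
plain loop over erf can be auto-vectorized into libmvec calls.  The sketch
below assumes glibc 2.35 or later on x86-64 and vectorization-friendly
flags, for example -O2 -ftree-loop-vectorize -fopenmp-simd; which variant
gets called, e.g. _ZGVbN2v_erf, _ZGVdN4v_erf or _ZGVeN8v_erf, depends on the
compiler and the -march/-mavx* settings.

  #include <math.h>

  /* The compiler may replace the scalar erf calls in this loop with a
     libmvec variant such as _ZGVdN4v_erf once erf is declared SIMD in
     math-vector.h and loop vectorization is enabled.  */
  void
  erf_array (double *dst, const double *src, int n)
  {
    for (int i = 0; i < n; i++)
      dst[i] = erf (src[i]);
  }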
---
 bits/libm-simd-decl-stubs.h                   |  11 +
 math/bits/mathcalls.h                         |   2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
 sysdeps/x86/fpu/bits/math-vector.h            |   4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
 sysdeps/x86_64/fpu/Makeconfig                 |   1 +
 sysdeps/x86_64/fpu/Versions                   |   2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |  20 +
 .../fpu/multiarch/svml_d_erf2_core-sse2.S     |  20 +
 .../x86_64/fpu/multiarch/svml_d_erf2_core.c   |  27 +
 .../fpu/multiarch/svml_d_erf2_core_sse4.S     | 987 ++++++++++++++++++
 .../fpu/multiarch/svml_d_erf4_core-sse.S      |  20 +
 .../x86_64/fpu/multiarch/svml_d_erf4_core.c   |  27 +
 .../fpu/multiarch/svml_d_erf4_core_avx2.S     | 984 +++++++++++++++++
 .../fpu/multiarch/svml_d_erf8_core-avx2.S     |  20 +
 .../x86_64/fpu/multiarch/svml_d_erf8_core.c   |  27 +
 .../fpu/multiarch/svml_d_erf8_core_avx512.S   | 983 +++++++++++++++++
 .../fpu/multiarch/svml_s_erff16_core-avx2.S   |  20 +
 .../x86_64/fpu/multiarch/svml_s_erff16_core.c |  28 +
 .../fpu/multiarch/svml_s_erff16_core_avx512.S | 185 ++++
 .../fpu/multiarch/svml_s_erff4_core-sse2.S    |  20 +
 .../x86_64/fpu/multiarch/svml_s_erff4_core.c  |  28 +
 .../fpu/multiarch/svml_s_erff4_core_sse4.S    | 664 ++++++++++++
 .../fpu/multiarch/svml_s_erff8_core-sse.S     |  20 +
 .../x86_64/fpu/multiarch/svml_s_erff8_core.c  |  28 +
 .../fpu/multiarch/svml_s_erff8_core_avx2.S    | 669 ++++++++++++
 sysdeps/x86_64/fpu/svml_d_erf2_core.S         |  29 +
 sysdeps/x86_64/fpu/svml_d_erf4_core.S         |  29 +
 sysdeps/x86_64/fpu/svml_d_erf4_core_avx.S     |  25 +
 sysdeps/x86_64/fpu/svml_d_erf8_core.S         |  25 +
 sysdeps/x86_64/fpu/svml_s_erff16_core.S       |  25 +
 sysdeps/x86_64/fpu/svml_s_erff4_core.S        |  29 +
 sysdeps/x86_64/fpu/svml_s_erff8_core.S        |  29 +
 sysdeps/x86_64/fpu/svml_s_erff8_core_avx.S    |  25 +
 .../x86_64/fpu/test-double-libmvec-erf-avx.c  |   1 +
 .../x86_64/fpu/test-double-libmvec-erf-avx2.c |   1 +
 .../fpu/test-double-libmvec-erf-avx512f.c     |   1 +
 sysdeps/x86_64/fpu/test-double-libmvec-erf.c  |   3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-libmvec-erff-avx.c  |   1 +
 .../x86_64/fpu/test-float-libmvec-erff-avx2.c |   1 +
 .../fpu/test-float-libmvec-erff-avx512f.c     |   1 +
 sysdeps/x86_64/fpu/test-float-libmvec-erff.c  |   3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
 50 files changed, 5044 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_erf2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_erf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_erf4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_erf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_erff16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_erff4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_erff8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_erff8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erf.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erff-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erff-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erff-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erff.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
index b17bf78cd9..33d480031b 100644
--- a/bits/libm-simd-decl-stubs.h
+++ b/bits/libm-simd-decl-stubs.h
@@ -274,4 +274,15 @@
 #define __DECL_SIMD_acoshf32x
 #define __DECL_SIMD_acoshf64x
 #define __DECL_SIMD_acoshf128x
+
+#define __DECL_SIMD_erf
+#define __DECL_SIMD_erff
+#define __DECL_SIMD_erfl
+#define __DECL_SIMD_erff16
+#define __DECL_SIMD_erff32
+#define __DECL_SIMD_erff64
+#define __DECL_SIMD_erff128
+#define __DECL_SIMD_erff32x
+#define __DECL_SIMD_erff64x
+#define __DECL_SIMD_erff128x
 #endif
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index bc37973c41..a5b6c4457f 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -228,7 +228,7 @@ __MATHCALL (yn,, (int, _Mdouble_));
 
 #if defined __USE_XOPEN || defined __USE_ISOC99
 /* Error and gamma functions.  */
-__MATHCALL (erf,, (_Mdouble_));
+__MATHCALL_VEC (erf,, (_Mdouble_));
 __MATHCALL (erfc,, (_Mdouble_));
 __MATHCALL (lgamma,, (_Mdouble_));
 #endif
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index e9d6ade70a..5525c8a0d6 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -53,6 +53,7 @@ GLIBC_2.35 _ZGVbN2v_atan F
 GLIBC_2.35 _ZGVbN2v_atanh F
 GLIBC_2.35 _ZGVbN2v_cbrt F
 GLIBC_2.35 _ZGVbN2v_cosh F
+GLIBC_2.35 _ZGVbN2v_erf F
 GLIBC_2.35 _ZGVbN2v_exp10 F
 GLIBC_2.35 _ZGVbN2v_exp2 F
 GLIBC_2.35 _ZGVbN2v_expm1 F
@@ -69,6 +70,7 @@ GLIBC_2.35 _ZGVbN4v_atanf F
 GLIBC_2.35 _ZGVbN4v_atanhf F
 GLIBC_2.35 _ZGVbN4v_cbrtf F
 GLIBC_2.35 _ZGVbN4v_coshf F
+GLIBC_2.35 _ZGVbN4v_erff F
 GLIBC_2.35 _ZGVbN4v_exp10f F
 GLIBC_2.35 _ZGVbN4v_exp2f F
 GLIBC_2.35 _ZGVbN4v_expm1f F
@@ -85,6 +87,7 @@ GLIBC_2.35 _ZGVcN4v_atan F
 GLIBC_2.35 _ZGVcN4v_atanh F
 GLIBC_2.35 _ZGVcN4v_cbrt F
 GLIBC_2.35 _ZGVcN4v_cosh F
+GLIBC_2.35 _ZGVcN4v_erf F
 GLIBC_2.35 _ZGVcN4v_exp10 F
 GLIBC_2.35 _ZGVcN4v_exp2 F
 GLIBC_2.35 _ZGVcN4v_expm1 F
@@ -101,6 +104,7 @@ GLIBC_2.35 _ZGVcN8v_atanf F
 GLIBC_2.35 _ZGVcN8v_atanhf F
 GLIBC_2.35 _ZGVcN8v_cbrtf F
 GLIBC_2.35 _ZGVcN8v_coshf F
+GLIBC_2.35 _ZGVcN8v_erff F
 GLIBC_2.35 _ZGVcN8v_exp10f F
 GLIBC_2.35 _ZGVcN8v_exp2f F
 GLIBC_2.35 _ZGVcN8v_expm1f F
@@ -117,6 +121,7 @@ GLIBC_2.35 _ZGVdN4v_atan F
 GLIBC_2.35 _ZGVdN4v_atanh F
 GLIBC_2.35 _ZGVdN4v_cbrt F
 GLIBC_2.35 _ZGVdN4v_cosh F
+GLIBC_2.35 _ZGVdN4v_erf F
 GLIBC_2.35 _ZGVdN4v_exp10 F
 GLIBC_2.35 _ZGVdN4v_exp2 F
 GLIBC_2.35 _ZGVdN4v_expm1 F
@@ -133,6 +138,7 @@ GLIBC_2.35 _ZGVdN8v_atanf F
 GLIBC_2.35 _ZGVdN8v_atanhf F
 GLIBC_2.35 _ZGVdN8v_cbrtf F
 GLIBC_2.35 _ZGVdN8v_coshf F
+GLIBC_2.35 _ZGVdN8v_erff F
 GLIBC_2.35 _ZGVdN8v_exp10f F
 GLIBC_2.35 _ZGVdN8v_exp2f F
 GLIBC_2.35 _ZGVdN8v_expm1f F
@@ -149,6 +155,7 @@ GLIBC_2.35 _ZGVeN16v_atanf F
 GLIBC_2.35 _ZGVeN16v_atanhf F
 GLIBC_2.35 _ZGVeN16v_cbrtf F
 GLIBC_2.35 _ZGVeN16v_coshf F
+GLIBC_2.35 _ZGVeN16v_erff F
 GLIBC_2.35 _ZGVeN16v_exp10f F
 GLIBC_2.35 _ZGVeN16v_exp2f F
 GLIBC_2.35 _ZGVeN16v_expm1f F
@@ -165,6 +172,7 @@ GLIBC_2.35 _ZGVeN8v_atan F
 GLIBC_2.35 _ZGVeN8v_atanh F
 GLIBC_2.35 _ZGVeN8v_cbrt F
 GLIBC_2.35 _ZGVeN8v_cosh F
+GLIBC_2.35 _ZGVeN8v_erf F
 GLIBC_2.35 _ZGVeN8v_exp10 F
 GLIBC_2.35 _ZGVeN8v_exp2 F
 GLIBC_2.35 _ZGVeN8v_expm1 F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index 4ad12a33e5..ea0deb31c1 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -122,6 +122,10 @@
 #  define __DECL_SIMD_acosh __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_acoshf
 #  define __DECL_SIMD_acoshf __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_erf
+#  define __DECL_SIMD_erf __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_erff
+#  define __DECL_SIMD_erff __DECL_SIMD_x86_64
 
 # endif
 #endif
diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
index 503547d3e4..42addd9a25 100644
--- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
@@ -60,6 +60,8 @@
 !GCC$ builtin (atanhf) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (acosh) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (acoshf) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (erf) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (erff) attributes simd (notinbranch) if('x86_64')
 
 !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
@@ -105,3 +107,5 @@
 !GCC$ builtin (atanhf) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (acosh) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (acoshf) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (erf) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (erff) attributes simd (notinbranch) if('x32')
diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
index 7b90b3d049..2b89a1bba3 100644
--- a/sysdeps/x86_64/fpu/Makeconfig
+++ b/sysdeps/x86_64/fpu/Makeconfig
@@ -31,6 +31,7 @@ libmvec-funcs = \
   cbrt \
   cos \
   cosh \
+  erf \
   exp \
   exp10 \
   exp2 \
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index fd5e5923a1..2fcdef6944 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -21,6 +21,7 @@ libmvec {
     _ZGVbN2v_atanh; _ZGVcN4v_atanh; _ZGVdN4v_atanh; _ZGVeN8v_atanh;
     _ZGVbN2v_cbrt; _ZGVcN4v_cbrt; _ZGVdN4v_cbrt; _ZGVeN8v_cbrt;
     _ZGVbN2v_cosh; _ZGVcN4v_cosh; _ZGVdN4v_cosh; _ZGVeN8v_cosh;
+    _ZGVbN2v_erf; _ZGVcN4v_erf; _ZGVdN4v_erf; _ZGVeN8v_erf;
     _ZGVbN2v_exp10; _ZGVcN4v_exp10; _ZGVdN4v_exp10; _ZGVeN8v_exp10;
     _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
     _ZGVbN2v_expm1; _ZGVcN4v_expm1; _ZGVdN4v_expm1; _ZGVeN8v_expm1;
@@ -37,6 +38,7 @@ libmvec {
     _ZGVbN4v_atanhf; _ZGVcN8v_atanhf; _ZGVdN8v_atanhf; _ZGVeN16v_atanhf;
     _ZGVbN4v_cbrtf; _ZGVcN8v_cbrtf; _ZGVdN8v_cbrtf; _ZGVeN16v_cbrtf;
     _ZGVbN4v_coshf; _ZGVcN8v_coshf; _ZGVdN8v_coshf; _ZGVeN16v_coshf;
+    _ZGVbN4v_erff; _ZGVcN8v_erff; _ZGVdN8v_erff; _ZGVeN16v_erff;
     _ZGVbN4v_exp10f; _ZGVcN8v_exp10f; _ZGVdN8v_exp10f; _ZGVeN16v_exp10f;
     _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
     _ZGVbN4v_expm1f; _ZGVcN8v_expm1f; _ZGVdN8v_expm1f; _ZGVeN16v_expm1f;
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index b2aa8fc56e..929de0e786 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -1298,6 +1298,26 @@ float: 1
 float128: 2
 ldouble: 1
 
+Function: "erf_vlen16":
+float: 1
+
+Function: "erf_vlen2":
+double: 1
+
+Function: "erf_vlen4":
+double: 1
+float: 2
+
+Function: "erf_vlen4_avx2":
+double: 1
+
+Function: "erf_vlen8":
+double: 1
+float: 2
+
+Function: "erf_vlen8_avx2":
+float: 2
+
 Function: "erfc":
 double: 5
 float: 3
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core-sse2.S
new file mode 100644
index 0000000000..2b5735ebb3
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized erf, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN2v_erf _ZGVbN2v_erf_sse2
+#include "../svml_d_erf2_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core.c
new file mode 100644
index 0000000000..74757be88f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized erf, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN2v_erf
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN2v_erf, __GI__ZGVbN2v_erf, __redirect__ZGVbN2v_erf)
+  __attribute__ ((visibility ("hidden")));
+#endif
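Editorial aside, not part of the patch: svml_d_erf2_core.c above binds
_ZGVbN2v_erf to either the SSE2 or the SSE4.1 kernel at load time through a
GNU indirect function.  A minimal standalone sketch of the same mechanism,
using hypothetical my_erf* names (this is not how ifunc-mathvec-sse4_1.h is
actually written):

  static double
  my_erf_sse2 (double x)
  {
    return x;   /* placeholder body */
  }

  static double
  my_erf_sse4 (double x)
  {
    return x;   /* placeholder body */
  }

  /* The resolver runs while relocations are processed and returns the
     implementation that my_erf should be bound to.  */
  static double (*resolve_my_erf (void)) (double)
  {
    __builtin_cpu_init ();
    return __builtin_cpu_supports ("sse4.1") ? my_erf_sse4 : my_erf_sse2;
  }

  double my_erf (double) __attribute__ ((ifunc ("resolve_my_erf")));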
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core_sse4.S
new file mode 100644
index 0000000000..c164748bbe
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core_sse4.S
@@ -0,0 +1,987 @@
+/* Function erf vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Basic formula is
+ *    erf(x) ~ erf(x0) +
+ *              exp(-x0*x0)*D*(1+c0+T*P1(T)+D^2*P3(T)+D^4*P5(T)+D^6*p7+D^8*p9)
+ *   where D=x-x0, T=x0*D
+ *   x0 is x rounded to a specified number of fractional bits (in this case 7),
+ *    except that x0=0 for |x|<3.5/128.0 (using x0=0 for first 4 table entries)
+ *
+ *   Data table packs both erf(x0)_high and a few bits of erf(x0)_low in one
+ *   entry (in place of redundant exponent bits)
+ *
+ */
+
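Editorial aside, not part of the assembly source: the scalar model below
(hypothetical names, build with e.g. "gcc erf_sketch.c -lm") illustrates the
table-plus-correction scheme described in the comment above.  It keeps only
the leading exp(-x0*x0)*D term, fills the table with calls to libm's erf/exp
instead of the precomputed constants in __svml_derf_data_internal, and does
not special-case |x|<3.5/128; the real kernel layers the higher-order
polynomial terms P1/P3/P5 on top of this and does everything branch-free on
vectors.

  #include <math.h>
  #include <stdio.h>

  /* x0 = i/128 for i in [0, 767]; 767/128 = 5.9921875 mirrors _MaxThreshold.  */
  #define STEPS 768

  static double tbl_erf[STEPS];   /* erf (x0)                     */
  static double tbl_exp[STEPS];   /* (2/sqrt(pi)) * exp (-x0*x0)  */

  static void
  init_table (void)
  {
    for (int i = 0; i < STEPS; i++)
      {
        double x0 = i / 128.0;
        tbl_erf[i] = erf (x0);
        tbl_exp[i] = 2.0 / sqrt (M_PI) * exp (-x0 * x0);
      }
  }

  /* First-order version of the documented expansion:
     erf(x) ~ erf(x0) + (2/sqrt(pi))*exp(-x0*x0) * D, with D = x - x0 and
     x0 = |x| rounded to 7 fractional bits.  */
  static double
  erf_approx (double x)
  {
    double ax = fabs (x);
    double max = (STEPS - 1) / 128.0;
    if (ax > max)
      ax = max;                 /* erf(x) rounds to +-1 beyond the threshold */
    int i = (int) (ax * 128.0 + 0.5);
    double d = ax - i / 128.0;
    return copysign (tbl_erf[i] + tbl_exp[i] * d, x);
  }

  int
  main (void)
  {
    init_table ();
    for (double x = -3.0; x <= 3.0; x += 0.375)
      printf ("x=% .4f  approx=% .9f  libm=% .9f\n", x, erf_approx (x), erf (x));
    return 0;
  }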
+/* Offsets for data table __svml_derf_data_internal
+ */
+#define _erf_tbl                      	0
+#define _AbsMask                      	12288
+#define _MaxThreshold                 	12304
+#define _SRound                       	12320
+#define _U2Threshold                  	12336
+#define _poly1_0                      	12352
+#define _poly1_1                      	12368
+#define _poly3_0                      	12384
+#define _poly3_1                      	12400
+#define _poly5_0                      	12416
+#define _poly5_1                      	12432
+#define _poly1_2                      	12448
+#define _poly3_2                      	12464
+#define _poly1_3                      	12480
+#define _poly3_3                      	12496
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN2v_erf_sse4)
+/*
+ * vector gather: erf(x0),
+ * second value is exp(-x0*x0)
+ */
+        lea       __svml_derf_data_internal(%rip), %rcx
+        movups    _AbsMask+__svml_derf_data_internal(%rip), %xmm5
+        andps     %xmm0, %xmm5
+
+/*
+ * erf(x) rounds to 1.0 for x>_MaxThreshold (5.9921875)
+ * can compute all results in the main path
+ */
+        movaps    %xmm5, %xmm9
+
+/* save sign */
+        pxor      %xmm5, %xmm0
+        minpd     _MaxThreshold+__svml_derf_data_internal(%rip), %xmm9
+        movups    _SRound+__svml_derf_data_internal(%rip), %xmm1
+        movaps    %xmm1, %xmm2
+        addpd     %xmm9, %xmm2
+        movaps    %xmm2, %xmm8
+        psllq     $4, %xmm2
+        subpd     %xmm1, %xmm8
+        movd      %xmm2, %eax
+        movups    _U2Threshold+__svml_derf_data_internal(%rip), %xmm11
+        cmpltpd   %xmm9, %xmm11
+        subpd     %xmm8, %xmm9
+        mulpd     %xmm9, %xmm8
+
+/*
+ * _LA_ polynomial computation
+ * Start polynomial evaluation
+ */
+        movups    _poly1_0+__svml_derf_data_internal(%rip), %xmm7
+        andps     %xmm9, %xmm11
+        mulpd     %xmm8, %xmm7
+
+/* D2 = Diff^2 */
+        mulpd     %xmm11, %xmm11
+        addpd     _poly1_1+__svml_derf_data_internal(%rip), %xmm7
+
+/* NaN fixup */
+        minpd     %xmm5, %xmm9
+        mulpd     %xmm8, %xmm7
+        movups    _poly3_0+__svml_derf_data_internal(%rip), %xmm6
+
+/* T^2 */
+        movaps    %xmm8, %xmm12
+        mulpd     %xmm8, %xmm6
+        addpd     _poly1_2+__svml_derf_data_internal(%rip), %xmm7
+        addpd     _poly3_1+__svml_derf_data_internal(%rip), %xmm6
+        mulpd     %xmm8, %xmm12
+        mulpd     %xmm8, %xmm6
+        mulpd     %xmm8, %xmm7
+        addpd     _poly3_2+__svml_derf_data_internal(%rip), %xmm6
+        addpd     _poly1_3+__svml_derf_data_internal(%rip), %xmm7
+        mulpd     %xmm8, %xmm6
+
+/* P1 = T^2*P1 - T */
+        mulpd     %xmm7, %xmm12
+        movups    _poly5_0+__svml_derf_data_internal(%rip), %xmm10
+
+/* Sign | Diff */
+        pxor      %xmm0, %xmm9
+        mulpd     %xmm8, %xmm10
+        subpd     %xmm8, %xmm12
+        addpd     _poly5_1+__svml_derf_data_internal(%rip), %xmm10
+        mulpd     %xmm11, %xmm10
+        addpd     _poly3_3+__svml_derf_data_internal(%rip), %xmm10
+        addpd     %xmm6, %xmm10
+        pshufd    $2, %xmm2, %xmm3
+        movd      %xmm3, %edx
+
+/* P1 + P3*D2 */
+        mulpd     %xmm10, %xmm11
+        movslq    %eax, %rax
+        movslq    %edx, %rdx
+        addpd     %xmm11, %xmm12
+        movups    (%rcx,%rax), %xmm13
+        movups    (%rcx,%rdx), %xmm4
+        movaps    %xmm13, %xmm14
+        unpckhpd  %xmm4, %xmm13
+
+/* exp_h(x0) * Diff */
+        mulpd     %xmm9, %xmm13
+
+/*
+ * branch-free
+ * low part of result: exp_h(x0) * Diff*(1+P1)
+ */
+        mulpd     %xmm13, %xmm12
+        addpd     %xmm12, %xmm13
+        unpcklpd  %xmm4, %xmm14
+
+/* Sign | _Erf_H */
+        pxor      %xmm0, %xmm14
+
+/* Final result */
+        addpd     %xmm13, %xmm14
+
+/* Fix erf(-0) = -0 */
+        orps      %xmm14, %xmm0
+        ret
+
+END(_ZGVbN2v_erf_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_derf_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(16)) VUINT32 _erf_tbl[6*128*2][2];
+        __declspec(align(16)) VUINT32 _AbsMask[2][2];
+        __declspec(align(16)) VUINT32 _MaxThreshold[2][2];
+        __declspec(align(16)) VUINT32 _SRound[2][2];
+        __declspec(align(16)) VUINT32 _U2Threshold[2][2];
+        __declspec(align(16)) VUINT32 _poly1_0[2][2];
+        __declspec(align(16)) VUINT32 _poly1_1[2][2];
+        __declspec(align(16)) VUINT32 _poly3_0[2][2];
+        __declspec(align(16)) VUINT32 _poly3_1[2][2];
+        __declspec(align(16)) VUINT32 _poly5_0[2][2];
+        __declspec(align(16)) VUINT32 _poly5_1[2][2];
+        __declspec(align(16)) VUINT32 _poly1_2[2][2];
+        __declspec(align(16)) VUINT32 _poly3_2[2][2];
+        __declspec(align(16)) VUINT32 _poly1_3[2][2];
+        __declspec(align(16)) VUINT32 _poly3_3[2][2];
+} __svml_derf_data_internal;
+#endif
+__svml_derf_data_internal:
+        /*== _erf_tbl ==*/
+        .quad 0x0000000000000000, 0x3ff20dd750429b6d
+        .quad 0x3f820dbf3deb1340, 0x3ff20d8f1975c85d
+        .quad 0x3f920d77083f17a0, 0x3ff20cb67bd452c7
+        .quad 0x3f9b137e0cf584dc, 0x3ff20b4d8bac36c1
+        .quad 0x3fa20c5645dd2538, 0x3ff209546ad13ccf
+        .quad 0x3fa68e5d3bbc9526, 0x3ff206cb4897b148
+        .quad 0x3fab0fafef135745, 0x3ff203b261cd0053
+        .quad 0x3faf902a77bd3821, 0x3ff2000a00ae3804
+        .quad 0x3fb207d480e90658, 0x3ff1fbd27cdc72d3
+        .quad 0x3fb44703e87e8593, 0x3ff1f70c3b4f2cc8
+        .quad 0x3fb68591a1e83b5d, 0x3ff1f1b7ae44867f
+        .quad 0x3fb8c36beb8a8d23, 0x3ff1ebd5552f795b
+        .quad 0x3fbb0081148a873a, 0x3ff1e565bca400d4
+        .quad 0x3fbd3cbf7e70a4b3, 0x3ff1de697e413d29
+        .quad 0x3fbf78159ec8bb50, 0x3ff1d6e14099944a
+        .quad 0x3fc0d939005f65e5, 0x3ff1cecdb718d61c
+        .quad 0x3fc1f5e1a35c3b89, 0x3ff1c62fa1e869b6
+        .quad 0x3fc311fc15f56d14, 0x3ff1bd07cdd189ac
+        .quad 0x3fc42d7fc2f64959, 0x3ff1b357141d95d5
+        .quad 0x3fc548642321d7c6, 0x3ff1a91e5a748165
+        .quad 0x3fc662a0bdf7a89f, 0x3ff19e5e92b964ab
+        .quad 0x3fc77c2d2a765f9e, 0x3ff19318bae53a04
+        .quad 0x3fc895010fdbdbfd, 0x3ff1874ddcdfce24
+        .quad 0x3fc9ad142662e14d, 0x3ff17aff0e56ec10
+        .quad 0x3fcac45e37fe2526, 0x3ff16e2d7093cd8c
+        .quad 0x3fcbdad72110a648, 0x3ff160da304ed92f
+        .quad 0x3fccf076d1233237, 0x3ff153068581b781
+        .quad 0x3fce05354b96ff36, 0x3ff144b3b337c90c
+        .quad 0x3fcf190aa85540e2, 0x3ff135e3075d076b
+        .quad 0x3fd015f78a3dcf3d, 0x3ff12695da8b5bde
+        .quad 0x3fd09eed6982b948, 0x3ff116cd8fd67618
+        .quad 0x3fd127631eb8de32, 0x3ff1068b94962e5e
+        .quad 0x3fd1af54e232d609, 0x3ff0f5d1602f7e41
+        .quad 0x3fd236bef825d9a2, 0x3ff0e4a073dc1b91
+        .quad 0x3fd2bd9db0f7827f, 0x3ff0d2fa5a70c168
+        .quad 0x3fd343ed6989b7d9, 0x3ff0c0e0a8223359
+        .quad 0x3fd3c9aa8b84beda, 0x3ff0ae54fa490723
+        .quad 0x3fd44ed18d9f6462, 0x3ff09b58f724416b
+        .quad 0x3fd4d35ef3e5372e, 0x3ff087ee4d9ad247
+        .quad 0x3fd5574f4ffac98e, 0x3ff07416b4fbfe7c
+        .quad 0x3fd5da9f415ff23f, 0x3ff05fd3ecbec298
+        .quad 0x3fd65d4b75b00471, 0x3ff04b27bc403d30
+        .quad 0x3fd6df50a8dff772, 0x3ff03613f2812daf
+        .quad 0x3fd760aba57a76bf, 0x3ff0209a65e29545
+        .quad 0x3fd7e15944d9d3e4, 0x3ff00abcf3e187a9
+        .quad 0x3fd861566f5fd3c0, 0x3fefe8fb01a47307
+        .quad 0x3fd8e0a01cab516b, 0x3fefbbbbef34b4b2
+        .quad 0x3fd95f3353cbb146, 0x3fef8dc092d58ff8
+        .quad 0x3fd9dd0d2b721f39, 0x3fef5f0cdaf15313
+        .quad 0x3fda5a2aca209394, 0x3fef2fa4c16c0019
+        .quad 0x3fdad68966569a87, 0x3feeff8c4b1375db
+        .quad 0x3fdb522646bbda68, 0x3feecec7870ebca8
+        .quad 0x3fdbccfec24855b8, 0x3fee9d5a8e4c934e
+        .quad 0x3fdc4710406a65fc, 0x3fee6b4982f158b9
+        .quad 0x3fdcc058392a6d2d, 0x3fee38988fc46e72
+        .quad 0x3fdd38d4354c3bd0, 0x3fee054be79d3042
+        .quad 0x3fddb081ce6e2a48, 0x3fedd167c4cf9d2a
+        .quad 0x3fde275eaf25e458, 0x3fed9cf06898cdaf
+        .quad 0x3fde9d68931ae650, 0x3fed67ea1a8b5368
+        .quad 0x3fdf129d471eabb1, 0x3fed325927fb9d89
+        .quad 0x3fdf86faa9428f9d, 0x3fecfc41e36c7df9
+        .quad 0x3fdffa7ea8eb5fd0, 0x3fecc5a8a3fbea40
+        .quad 0x3fe03693a371519c, 0x3fec8e91c4d01368
+        .quad 0x3fe06f794ab2cae7, 0x3fec5701a484ef9d
+        .quad 0x3fe0a7ef5c18edd2, 0x3fec1efca49a5011
+        .quad 0x3fe0dff4f247f6c6, 0x3febe68728e29d5e
+        .quad 0x3fe1178930ada115, 0x3febada596f25436
+        .quad 0x3fe14eab43841b55, 0x3feb745c55905bf8
+        .quad 0x3fe1855a5fd3dd50, 0x3feb3aafcc27502e
+        .quad 0x3fe1bb95c3746199, 0x3feb00a46237d5be
+        .quad 0x3fe1f15cb50bc4de, 0x3feac63e7ecc1411
+        .quad 0x3fe226ae840d4d70, 0x3fea8b8287ec6a09
+        .quad 0x3fe25b8a88b6dd7f, 0x3fea5074e2157620
+        .quad 0x3fe28ff0240d52cd, 0x3fea1519efaf889e
+        .quad 0x3fe2c3debfd7d6c1, 0x3fe9d97610879642
+        .quad 0x3fe2f755ce9a21f4, 0x3fe99d8da149c13f
+        .quad 0x3fe32a54cb8db67b, 0x3fe96164fafd8de3
+        .quad 0x3fe35cdb3a9a144d, 0x3fe925007283d7aa
+        .quad 0x3fe38ee8a84beb71, 0x3fe8e86458169af8
+        .quad 0x3fe3c07ca9cb4f9e, 0x3fe8ab94f6caa71d
+        .quad 0x3fe3f196dcd0f135, 0x3fe86e9694134b9e
+        .quad 0x3fe42236e79a5fa6, 0x3fe8316d6f48133d
+        .quad 0x3fe4525c78dd5966, 0x3fe7f41dc12c9e89
+        .quad 0x3fe4820747ba2dc2, 0x3fe7b6abbb7aaf19
+        .quad 0x3fe4b13713ad3513, 0x3fe7791b886e7403
+        .quad 0x3fe4dfeba47f63cc, 0x3fe73b714a552763
+        .quad 0x3fe50e24ca35fd2c, 0x3fe6fdb11b1e0c34
+        .quad 0x3fe53be25d016a4f, 0x3fe6bfdf0beddaf5
+        .quad 0x3fe569243d2b3a9b, 0x3fe681ff24b4ab04
+        .quad 0x3fe595ea53035283, 0x3fe6441563c665d4
+        .quad 0x3fe5c2348ecc4dc3, 0x3fe60625bd75d07b
+        .quad 0x3fe5ee02e8a71a53, 0x3fe5c8341bb23767
+        .quad 0x3fe61955607dd15d, 0x3fe58a445da7c74c
+        .quad 0x3fe6442bfdedd397, 0x3fe54c5a57629db0
+        .quad 0x3fe66e86d0312e82, 0x3fe50e79d1749ac9
+        .quad 0x3fe69865ee075011, 0x3fe4d0a6889dfd9f
+        .quad 0x3fe6c1c9759d0e5f, 0x3fe492e42d78d2c5
+        .quad 0x3fe6eab18c74091b, 0x3fe4553664273d24
+        .quad 0x3fe7131e5f496a5a, 0x3fe417a0c4049fd0
+        .quad 0x3fe73b1021fc0cb8, 0x3fe3da26d759aef5
+        .quad 0x3fe762870f720c6f, 0x3fe39ccc1b136d5a
+        .quad 0x3fe78983697dc96f, 0x3fe35f93fe7d1b3d
+        .quad 0x3fe7b00578c26037, 0x3fe32281e2fd1a92
+        .quad 0x3fe7d60d8c979f7b, 0x3fe2e5991bd4cbfc
+        .quad 0x3fe7fb9bfaed8078, 0x3fe2a8dcede3673b
+        .quad 0x3fe820b1202f27fb, 0x3fe26c508f6bd0ff
+        .quad 0x3fe8454d5f25760d, 0x3fe22ff727dd6f7b
+        .quad 0x3fe8697120d92a4a, 0x3fe1f3d3cf9ffe5a
+        .quad 0x3fe88d1cd474a2e0, 0x3fe1b7e98fe26217
+        .quad 0x3fe8b050ef253c37, 0x3fe17c3b626c7a12
+        .quad 0x3fe8d30debfc572e, 0x3fe140cc3173f007
+        .quad 0x3fe8f5544bd00c04, 0x3fe1059ed7740313
+        .quad 0x3fe91724951b8fc6, 0x3fe0cab61f084b93
+        .quad 0x3fe9387f53df5238, 0x3fe09014c2ca74da
+        .quad 0x3fe959651980da31, 0x3fe055bd6d32e8d7
+        .quad 0x3fe979d67caa6631, 0x3fe01bb2b87c6968
+        .quad 0x3fe999d4192a5715, 0x3fdfc3ee5d1524b0
+        .quad 0x3fe9b95e8fd26aba, 0x3fdf511a91a67d2a
+        .quad 0x3fe9d8768656cc42, 0x3fdedeeee0959518
+        .quad 0x3fe9f71ca72cffb6, 0x3fde6d6ffaa65a25
+        .quad 0x3fea1551a16aaeaf, 0x3fddfca26f5bbf88
+        .quad 0x3fea331628a45b92, 0x3fdd8c8aace11e63
+        .quad 0x3fea506af4cc00f4, 0x3fdd1d2cfff91594
+        .quad 0x3fea6d50c20fa293, 0x3fdcae8d93f1d7b7
+        .quad 0x3fea89c850b7d54d, 0x3fdc40b0729ed548
+        .quad 0x3feaa5d265064366, 0x3fdbd3998457afdb
+        .quad 0x3feac16fc7143263, 0x3fdb674c8ffc6283
+        .quad 0x3feadca142b10f98, 0x3fdafbcd3afe8ab6
+        .quad 0x3feaf767a741088b, 0x3fda911f096fbc26
+        .quad 0x3feb11c3c79bb424, 0x3fda27455e14c93c
+        .quad 0x3feb2bb679ead19c, 0x3fd9be437a7de946
+        .quad 0x3feb4540978921ee, 0x3fd9561c7f23a47b
+        .quad 0x3feb5e62fce16095, 0x3fd8eed36b886d93
+        .quad 0x3feb771e894d602e, 0x3fd8886b1e5ecfd1
+        .quad 0x3feb8f741ef54f83, 0x3fd822e655b417e7
+        .quad 0x3feba764a2af2b78, 0x3fd7be47af1f5d89
+        .quad 0x3febbef0fbde6221, 0x3fd75a91a7f4d2ed
+        .quad 0x3febd61a1453ab44, 0x3fd6f7c69d7d3ef8
+        .quad 0x3febece0d82d1a5c, 0x3fd695e8cd31867e
+        .quad 0x3fec034635b66e23, 0x3fd634fa54fa285f
+        .quad 0x3fec194b1d49a184, 0x3fd5d4fd33729015
+        .quad 0x3fec2ef0812fc1bd, 0x3fd575f3483021c3
+        .quad 0x3fec443755820d64, 0x3fd517de540ce2a3
+        .quad 0x3fec5920900b5fd1, 0x3fd4babff975a04c
+        .quad 0x3fec6dad2829ec62, 0x3fd45e99bcbb7915
+        .quad 0x3fec81de16b14cef, 0x3fd4036d0468a7a2
+        .quad 0x3fec95b455cce69d, 0x3fd3a93b1998736c
+        .quad 0x3feca930e0e2a825, 0x3fd35005285227f1
+        .quad 0x3fecbc54b476248d, 0x3fd2f7cc3fe6f423
+        .quad 0x3feccf20ce0c0d27, 0x3fd2a09153529381
+        .quad 0x3fece1962c0e0d8b, 0x3fd24a55399ea239
+        .quad 0x3fecf3b5cdaf0c39, 0x3fd1f518ae487dc8
+        .quad 0x3fed0580b2cfd249, 0x3fd1a0dc51a9934d
+        .quad 0x3fed16f7dbe41ca0, 0x3fd14da0a961fd14
+        .quad 0x3fed281c49d818d0, 0x3fd0fb6620c550af
+        .quad 0x3fed38eefdf64fdd, 0x3fd0aa2d09497f2b
+        .quad 0x3fed4970f9ce00d9, 0x3fd059f59af7a906
+        .quad 0x3fed59a33f19ed42, 0x3fd00abff4dec7a3
+        .quad 0x3fed6986cfa798e7, 0x3fcf79183b101c5b
+        .quad 0x3fed791cad3eff01, 0x3fcedeb406d9c825
+        .quad 0x3fed8865d98abe01, 0x3fce4652fadcb6b2
+        .quad 0x3fed97635600bb89, 0x3fcdaff4969c0b04
+        .quad 0x3feda61623cb41e0, 0x3fcd1b982c501370
+        .quad 0x3fedb47f43b2980d, 0x3fcc893ce1dcbef7
+        .quad 0x3fedc29fb60715af, 0x3fcbf8e1b1ca2279
+        .quad 0x3fedd0787a8bb39d, 0x3fcb6a856c3ed54f
+        .quad 0x3fedde0a90611a0d, 0x3fcade26b7fbed95
+        .quad 0x3fedeb56f5f12d28, 0x3fca53c4135a6526
+        .quad 0x3fedf85ea8db188e, 0x3fc9cb5bd549b111
+        .quad 0x3fee0522a5dfda73, 0x3fc944ec2e4f5630
+        .quad 0x3fee11a3e8cf4eb8, 0x3fc8c07329874652
+        .quad 0x3fee1de36c75ba58, 0x3fc83deeada4d25a
+        .quad 0x3fee29e22a89d766, 0x3fc7bd5c7df3fe9c
+        .quad 0x3fee35a11b9b61ce, 0x3fc73eba3b5b07b7
+        .quad 0x3fee4121370224cc, 0x3fc6c205655be720
+        .quad 0x3fee4c6372cd8927, 0x3fc6473b5b15a7a1
+        .quad 0x3fee5768c3b4a3fc, 0x3fc5ce595c455b0a
+        .quad 0x3fee62321d06c5e0, 0x3fc5575c8a468362
+        .quad 0x3fee6cc0709c8a0d, 0x3fc4e241e912c305
+        .quad 0x3fee7714aec96534, 0x3fc46f066040a832
+        .quad 0x3fee812fc64db369, 0x3fc3fda6bc016994
+        .quad 0x3fee8b12a44944a8, 0x3fc38e1fae1d6a9d
+        .quad 0x3fee94be342e6743, 0x3fc3206dceef5f87
+        .quad 0x3fee9e335fb56f87, 0x3fc2b48d9e5dea1c
+        .quad 0x3feea7730ed0bbb9, 0x3fc24a7b84d38971
+        .quad 0x3feeb07e27a133aa, 0x3fc1e233d434b813
+        .quad 0x3feeb9558e6b42ce, 0x3fc17bb2c8d41535
+        .quad 0x3feec1fa258c4bea, 0x3fc116f48a6476cc
+        .quad 0x3feeca6ccd709544, 0x3fc0b3f52ce8c383
+        .quad 0x3feed2ae6489ac1e, 0x3fc052b0b1a174ea
+        .quad 0x3feedabfc7453e63, 0x3fbfe6460fef4680
+        .quad 0x3feee2a1d004692c, 0x3fbf2a901ccafb37
+        .quad 0x3feeea5557137ae0, 0x3fbe723726b824a9
+        .quad 0x3feef1db32a2277c, 0x3fbdbd32ac4c99b0
+        .quad 0x3feef93436bc2daa, 0x3fbd0b7a0f921e7c
+        .quad 0x3fef006135426b26, 0x3fbc5d0497c09e74
+        .quad 0x3fef0762fde45ee6, 0x3fbbb1c972f23e50
+        .quad 0x3fef0e3a5e1a1788, 0x3fbb09bfb7d11a84
+        .quad 0x3fef14e8211e8c55, 0x3fba64de673e8837
+        .quad 0x3fef1b6d0fea5f4d, 0x3fb9c31c6df3b1b8
+        .quad 0x3fef21c9f12f0677, 0x3fb92470a61b6965
+        .quad 0x3fef27ff89525acf, 0x3fb888d1d8e510a3
+        .quad 0x3fef2e0e9a6a8b09, 0x3fb7f036c0107294
+        .quad 0x3fef33f7e43a706b, 0x3fb75a96077274ba
+        .quad 0x3fef39bc242e43e6, 0x3fb6c7e64e7281cb
+        .quad 0x3fef3f5c1558b19e, 0x3fb6381e2980956b
+        .quad 0x3fef44d870704911, 0x3fb5ab342383d178
+        .quad 0x3fef4a31ebcd47df, 0x3fb5211ebf41880b
+        .quad 0x3fef4f693b67bd77, 0x3fb499d478bca735
+        .quad 0x3fef547f10d60597, 0x3fb4154bc68d75c3
+        .quad 0x3fef59741b4b97cf, 0x3fb3937b1b31925a
+        .quad 0x3fef5e4907982a07, 0x3fb31458e6542847
+        .quad 0x3fef62fe80272419, 0x3fb297db960e4f63
+        .quad 0x3fef67952cff6282, 0x3fb21df9981f8e53
+        .quad 0x3fef6c0db3c34641, 0x3fb1a6a95b1e786f
+        .quad 0x3fef7068b7b10fd9, 0x3fb131e14fa1625d
+        .quad 0x3fef74a6d9a38383, 0x3fb0bf97e95f2a64
+        .quad 0x3fef78c8b812d498, 0x3fb04fc3a0481321
+        .quad 0x3fef7cceef15d631, 0x3fafc4b5e32d6259
+        .quad 0x3fef80ba18636f07, 0x3faeeea8c1b1db94
+        .quad 0x3fef848acb544e95, 0x3fae1d4cf1e2450a
+        .quad 0x3fef88419ce4e184, 0x3fad508f9a1ea64f
+        .quad 0x3fef8bdf1fb78370, 0x3fac885df3451a07
+        .quad 0x3fef8f63e416ebff, 0x3fabc4a54a84e834
+        .quad 0x3fef92d077f8d56d, 0x3fab055303221015
+        .quad 0x3fef96256700da8e, 0x3faa4a549829587e
+        .quad 0x3fef99633a838a57, 0x3fa993979e14fffe
+        .quad 0x3fef9c8a7989af0d, 0x3fa8e109c4622913
+        .quad 0x3fef9f9ba8d3c733, 0x3fa83298d717210e
+        .quad 0x3fefa2974addae45, 0x3fa78832c03aa2b1
+        .quad 0x3fefa57ddfe27376, 0x3fa6e1c5893c380b
+        .quad 0x3fefa84fe5e05c8d, 0x3fa63f3f5c4de13b
+        .quad 0x3fefab0dd89d1309, 0x3fa5a08e85af27e0
+        .quad 0x3fefadb831a9f9c3, 0x3fa505a174e9c929
+        .quad 0x3fefb04f6868a944, 0x3fa46e66be002240
+        .quad 0x3fefb2d3f20f9101, 0x3fa3dacd1a8d8cce
+        .quad 0x3fefb54641aebbc9, 0x3fa34ac36ad8dafe
+        .quad 0x3fefb7a6c834b5a2, 0x3fa2be38b6d92415
+        .quad 0x3fefb9f5f4739170, 0x3fa2351c2f2d1449
+        .quad 0x3fefbc3433260ca5, 0x3fa1af5d2e04f3f6
+        .quad 0x3fefbe61eef4cf6a, 0x3fa12ceb37ff9bc3
+        .quad 0x3fefc07f907bc794, 0x3fa0adb5fcfa8c75
+        .quad 0x3fefc28d7e4f9cd0, 0x3fa031ad58d56279
+        .quad 0x3fefc48c1d033c7a, 0x3f9f7182a851bca2
+        .quad 0x3fefc67bcf2d7b8f, 0x3f9e85c449e377f3
+        .quad 0x3fefc85cf56ecd38, 0x3f9da0005e5f28df
+        .quad 0x3fefca2fee770c79, 0x3f9cc0180af00a8b
+        .quad 0x3fefcbf5170b578b, 0x3f9be5ecd2fcb5f9
+        .quad 0x3fefcdacca0bfb73, 0x3f9b1160991ff737
+        .quad 0x3fefcf57607a6e7c, 0x3f9a4255a00b9f03
+        .quad 0x3fefd0f5317f582f, 0x3f9978ae8b55ce1b
+        .quad 0x3fefd2869270a56f, 0x3f98b44e6031383e
+        .quad 0x3fefd40bd6d7a785, 0x3f97f5188610ddc8
+        .quad 0x3fefd58550773cb5, 0x3f973af0c737bb45
+        .quad 0x3fefd6f34f52013a, 0x3f9685bb5134ef13
+        .quad 0x3fefd85621b0876d, 0x3f95d55cb54cd53a
+        .quad 0x3fefd9ae142795e3, 0x3f9529b9e8cf9a1e
+        .quad 0x3fefdafb719e6a69, 0x3f9482b8455dc491
+        .quad 0x3fefdc3e835500b3, 0x3f93e03d891b37de
+        .quad 0x3fefdd7790ea5bc0, 0x3f93422fd6d12e2b
+        .quad 0x3fefdea6e062d0c9, 0x3f92a875b5ffab56
+        .quad 0x3fefdfccb62e52d3, 0x3f9212f612dee7fb
+        .quad 0x3fefe0e9552ebdd6, 0x3f9181983e5133dd
+        .quad 0x3fefe1fcfebe2083, 0x3f90f443edc5ce49
+        .quad 0x3fefe307f2b503d0, 0x3f906ae13b0d3255
+        .quad 0x3fefe40a6f70af4b, 0x3f8fcab1483ea7fc
+        .quad 0x3fefe504b1d9696c, 0x3f8ec72615a894c4
+        .quad 0x3fefe5f6f568b301, 0x3f8dcaf3691fc448
+        .quad 0x3fefe6e1742f7cf6, 0x3f8cd5ec93c12432
+        .quad 0x3fefe7c466dc57a1, 0x3f8be7e5ac24963b
+        .quad 0x3fefe8a004c19ae6, 0x3f8b00b38d6b3575
+        .quad 0x3fefe97483db8670, 0x3f8a202bd6372dce
+        .quad 0x3fefea4218d6594a, 0x3f894624e78e0faf
+        .quad 0x3fefeb08f7146046, 0x3f887275e3a6869e
+        .quad 0x3fefebc950b3fa75, 0x3f87a4f6aca256cb
+        .quad 0x3fefec835695932e, 0x3f86dd7fe3358230
+        .quad 0x3fefed37386190fb, 0x3f861beae53b72b7
+        .quad 0x3fefede5248e38f4, 0x3f856011cc3b036d
+        .quad 0x3fefee8d486585ee, 0x3f84a9cf6bda3f4c
+        .quad 0x3fefef2fd00af31a, 0x3f83f8ff5042a88e
+        .quad 0x3fefefcce6813974, 0x3f834d7dbc76d7e5
+        .quad 0x3feff064b5afffbe, 0x3f82a727a89a3f14
+        .quad 0x3feff0f766697c76, 0x3f8205dac02bd6b9
+        .quad 0x3feff18520700971, 0x3f81697560347b26
+        .quad 0x3feff20e0a7ba8c2, 0x3f80d1d69569b82d
+        .quad 0x3feff2924a3f7a83, 0x3f803ede1a45bfee
+        .quad 0x3feff312046f2339, 0x3f7f60d8aa2a88f2
+        .quad 0x3feff38d5cc4227f, 0x3f7e4cc4abf7d065
+        .quad 0x3feff404760319b4, 0x3f7d4143a9dfe965
+        .quad 0x3feff47772010262, 0x3f7c3e1a5f5c077c
+        .quad 0x3feff4e671a85425, 0x3f7b430ecf4a83a8
+        .quad 0x3feff55194fe19df, 0x3f7a4fe83fb9db25
+        .quad 0x3feff5b8fb26f5f6, 0x3f79646f35a76624
+        .quad 0x3feff61cc26c1578, 0x3f78806d70b2fc36
+        .quad 0x3feff67d08401202, 0x3f77a3ade6c8b3e5
+        .quad 0x3feff6d9e943c231, 0x3f76cdfcbfc1e263
+        .quad 0x3feff733814af88c, 0x3f75ff2750fe7820
+        .quad 0x3feff789eb6130c9, 0x3f7536fc18f7ce5c
+        .quad 0x3feff7dd41ce2b4d, 0x3f74754abacdf1dc
+        .quad 0x3feff82d9e1a76d8, 0x3f73b9e3f9d06e3f
+        .quad 0x3feff87b1913e853, 0x3f730499b503957f
+        .quad 0x3feff8c5cad200a5, 0x3f72553ee2a336bf
+        .quad 0x3feff90dcaba4096, 0x3f71aba78ba3af89
+        .quad 0x3feff9532f846ab0, 0x3f7107a8c7323a6e
+        .quad 0x3feff9960f3eb327, 0x3f706918b6355624
+        .quad 0x3feff9d67f51ddba, 0x3f6f9f9cfd9c3035
+        .quad 0x3feffa14948549a7, 0x3f6e77448fb66bb9
+        .quad 0x3feffa506302ebae, 0x3f6d58da68fd1170
+        .quad 0x3feffa89fe5b3625, 0x3f6c4412bf4b8f0b
+        .quad 0x3feffac17988ef4b, 0x3f6b38a3af2e55b4
+        .quad 0x3feffaf6e6f4f5c0, 0x3f6a3645330550ff
+        .quad 0x3feffb2a5879f35e, 0x3f693cb11a30d765
+        .quad 0x3feffb5bdf67fe6f, 0x3f684ba3004a50d0
+        .quad 0x3feffb8b8c88295f, 0x3f6762d84469c18f
+        .quad 0x3feffbb970200110, 0x3f66821000795a03
+        .quad 0x3feffbe599f4f9d9, 0x3f65a90b00981d93
+        .quad 0x3feffc10194fcb64, 0x3f64d78bba8ca5fd
+        .quad 0x3feffc38fcffbb7c, 0x3f640d564548fad7
+        .quad 0x3feffc60535dd7f5, 0x3f634a305080681f
+        .quad 0x3feffc862a501fd7, 0x3f628de11c5031eb
+        .quad 0x3feffcaa8f4c9bea, 0x3f61d83170fbf6fb
+        .quad 0x3feffccd8f5c66d1, 0x3f6128eb96be8798
+        .quad 0x3feffcef371ea4d7, 0x3f607fdb4dafea5f
+        .quad 0x3feffd0f92cb6ba7, 0x3f5fb99b8b8279e1
+        .quad 0x3feffd2eae369a07, 0x3f5e7f232d9e2630
+        .quad 0x3feffd4c94d29fdb, 0x3f5d4fed7195d7e8
+        .quad 0x3feffd6951b33686, 0x3f5c2b9cf7f893bf
+        .quad 0x3feffd84ef9009ee, 0x3f5b11d702b3deb2
+        .quad 0x3feffd9f78c7524a, 0x3f5a024365f771bd
+        .quad 0x3feffdb8f7605ee7, 0x3f58fc8c794b03b5
+        .quad 0x3feffdd1750e1220, 0x3f58005f08d6f1ef
+        .quad 0x3feffde8fb314ebf, 0x3f570d6a46e07dda
+        .quad 0x3feffdff92db56e5, 0x3f56235fbd7a4345
+        .quad 0x3feffe1544d01ccb, 0x3f5541f340697987
+        .quad 0x3feffe2a1988857c, 0x3f5468dadf4080ab
+        .quad 0x3feffe3e19349dc7, 0x3f5397ced7af2b15
+        .quad 0x3feffe514bbdc197, 0x3f52ce898809244e
+        .quad 0x3feffe63b8c8b5f7, 0x3f520cc76202c5fb
+        .quad 0x3feffe7567b7b5e1, 0x3f515246dda49d47
+        .quad 0x3feffe865fac722b, 0x3f509ec86c75d497
+        .quad 0x3feffe96a78a04a9, 0x3f4fe41cd9bb4eee
+        .quad 0x3feffea645f6d6da, 0x3f4e97ba3b77f306
+        .quad 0x3feffeb5415e7c44, 0x3f4d57f524723822
+        .quad 0x3feffec39ff380b9, 0x3f4c245d4b99847a
+        .quad 0x3feffed167b12ac2, 0x3f4afc85e0f82e12
+        .quad 0x3feffede9e5d3262, 0x3f49e005769dbc1d
+        .quad 0x3feffeeb49896c6d, 0x3f48ce75e9f6f8a0
+        .quad 0x3feffef76e956a9f, 0x3f47c7744d9378f7
+        .quad 0x3fefff0312b010b5, 0x3f46caa0d3582fe9
+        .quad 0x3fefff0e3ad91ec2, 0x3f45d79eb71e893b
+        .quad 0x3fefff18ebe2b0e1, 0x3f44ee1429bf7cc0
+        .quad 0x3fefff232a72b48e, 0x3f440daa3c89f5b6
+        .quad 0x3fefff2cfb0453d9, 0x3f43360ccd23db3a
+        .quad 0x3fefff3661e9569d, 0x3f4266ea71d4f71a
+        .quad 0x3fefff3f634b79f9, 0x3f419ff4663ae9df
+        .quad 0x3fefff48032dbe40, 0x3f40e0de78654d1e
+        .quad 0x3fefff50456dab8c, 0x3f40295ef6591848
+        .quad 0x3fefff582dc48d30, 0x3f3ef25d37f49fe1
+        .quad 0x3fefff5fbfc8a439, 0x3f3da01102b5f851
+        .quad 0x3fefff66feee5129, 0x3f3c5b5412dcafad
+        .quad 0x3fefff6dee89352e, 0x3f3b23a5a23e4210
+        .quad 0x3fefff7491cd4af6, 0x3f39f8893d8fd1c1
+        .quad 0x3fefff7aebcff755, 0x3f38d986a4187285
+        .quad 0x3fefff80ff8911fd, 0x3f37c629a822bc9e
+        .quad 0x3fefff86cfd3e657, 0x3f36be02102b3520
+        .quad 0x3fefff8c5f702ccf, 0x3f35c0a378c90bca
+        .quad 0x3fefff91b102fca8, 0x3f34cda5374ea275
+        .quad 0x3fefff96c717b695, 0x3f33e4a23d1f4703
+        .quad 0x3fefff9ba420e834, 0x3f330538fbb77ecd
+        .quad 0x3fefffa04a7928b1, 0x3f322f0b496539be
+        .quad 0x3fefffa4bc63ee9a, 0x3f3161be46ad3b50
+        .quad 0x3fefffa8fc0e5f33, 0x3f309cfa445b00ff
+        .quad 0x3fefffad0b901755, 0x3f2fc0d55470cf51
+        .quad 0x3fefffb0ecebee1b, 0x3f2e577bbcd49935
+        .quad 0x3fefffb4a210b172, 0x3f2cfd4a5adec5c0
+        .quad 0x3fefffb82cd9dcbf, 0x3f2bb1a9657ce465
+        .quad 0x3fefffbb8f1049c6, 0x3f2a740684026555
+        .quad 0x3fefffbeca6adbe9, 0x3f2943d4a1d1ed39
+        .quad 0x3fefffc1e08f25f5, 0x3f28208bc334a6a5
+        .quad 0x3fefffc4d3120aa1, 0x3f2709a8db59f25c
+        .quad 0x3fefffc7a37857d2, 0x3f25feada379d8b7
+        .quad 0x3fefffca53375ce3, 0x3f24ff207314a102
+        .quad 0x3fefffcce3b57bff, 0x3f240a8c1949f75e
+        .quad 0x3fefffcf564ab6b7, 0x3f23207fb7420eb9
+        .quad 0x3fefffd1ac4135f9, 0x3f22408e9ba3327f
+        .quad 0x3fefffd3e6d5cd87, 0x3f216a501f0e42ca
+        .quad 0x3fefffd607387b07, 0x3f209d5f819c9e29
+        .quad 0x3fefffd80e8ce0da, 0x3f1fb2b792b40a22
+        .quad 0x3fefffd9fdeabcce, 0x3f1e3bcf436a1a95
+        .quad 0x3fefffdbd65e5ad0, 0x3f1cd55277c18d05
+        .quad 0x3fefffdd98e903b2, 0x3f1b7e94604479dc
+        .quad 0x3fefffdf46816833, 0x3f1a36eec00926dd
+        .quad 0x3fefffe0e0140857, 0x3f18fdc1b2dcf7b9
+        .quad 0x3fefffe26683972a, 0x3f17d2737527c3f9
+        .quad 0x3fefffe3daa95b18, 0x3f16b4702d7d5849
+        .quad 0x3fefffe53d558ae9, 0x3f15a329b7d30748
+        .quad 0x3fefffe68f4fa777, 0x3f149e17724f4d41
+        .quad 0x3fefffe7d156d244, 0x3f13a4b60ba9aa4e
+        .quad 0x3fefffe904222101, 0x3f12b6875310f785
+        .quad 0x3fefffea2860ee1e, 0x3f11d312098e9dba
+        .quad 0x3fefffeb3ebb267b, 0x3f10f9e1b4dd36df
+        .quad 0x3fefffec47d19457, 0x3f102a8673a94692
+        .quad 0x3fefffed443e2787, 0x3f0ec929a665b449
+        .quad 0x3fefffee34943b15, 0x3f0d4f4b4c8e09ed
+        .quad 0x3fefffef1960d85d, 0x3f0be6abbb10a5aa
+        .quad 0x3fefffeff32af7af, 0x3f0a8e8cc1fadef6
+        .quad 0x3feffff0c273bea2, 0x3f094637d5bacfdb
+        .quad 0x3feffff187b6bc0e, 0x3f080cfdc72220cf
+        .quad 0x3feffff2436a21dc, 0x3f06e2367dc27f95
+        .quad 0x3feffff2f5fefcaa, 0x3f05c540b4936fd2
+        .quad 0x3feffff39fe16963, 0x3f04b581b8d170fc
+        .quad 0x3feffff44178c8d2, 0x3f03b2652b06c2b2
+        .quad 0x3feffff4db27f146, 0x3f02bb5cc22e5db6
+        .quad 0x3feffff56d4d5e5e, 0x3f01cfe010e2052d
+        .quad 0x3feffff5f8435efc, 0x3f00ef6c4c84a0fe
+        .quad 0x3feffff67c604180, 0x3f001984165a5f36
+        .quad 0x3feffff6f9f67e55, 0x3efe9b5e8d00ce77
+        .quad 0x3feffff77154e0d6, 0x3efd16f5716c6c1a
+        .quad 0x3feffff7e2c6aea2, 0x3efba4f035d60e03
+        .quad 0x3feffff84e93cd75, 0x3efa447b7b03f045
+        .quad 0x3feffff8b500e77c, 0x3ef8f4ccca7fc90d
+        .quad 0x3feffff9164f8e46, 0x3ef7b5223dac7336
+        .quad 0x3feffff972be5c59, 0x3ef684c227fcacef
+        .quad 0x3feffff9ca891572, 0x3ef562fac4329b48
+        .quad 0x3feffffa1de8c582, 0x3ef44f21e49054f2
+        .quad 0x3feffffa6d13de73, 0x3ef34894a5e24657
+        .quad 0x3feffffab83e54b8, 0x3ef24eb7254ccf83
+        .quad 0x3feffffaff99bac4, 0x3ef160f438c70913
+        .quad 0x3feffffb43555b5f, 0x3ef07ebd2a2d2844
+        .quad 0x3feffffb839e52f3, 0x3eef4f12e9ab070a
+        .quad 0x3feffffbc09fa7cd, 0x3eedb5ad0b27805c
+        .quad 0x3feffffbfa82616b, 0x3eec304efa2c6f4e
+        .quad 0x3feffffc316d9ed0, 0x3eeabe09e9144b5e
+        .quad 0x3feffffc6586abf6, 0x3ee95df988e76644
+        .quad 0x3feffffc96f1165e, 0x3ee80f439b4ee04b
+        .quad 0x3feffffcc5cec0c1, 0x3ee6d11788a69c64
+        .quad 0x3feffffcf23ff5fc, 0x3ee5a2adfa0b4bc4
+        .quad 0x3feffffd1c637b2b, 0x3ee4834877429b8f
+        .quad 0x3feffffd4456a10d, 0x3ee37231085c7d9a
+        .quad 0x3feffffd6a3554a1, 0x3ee26eb9daed6f7e
+        .quad 0x3feffffd8e1a2f22, 0x3ee1783ceac28910
+        .quad 0x3feffffdb01e8546, 0x3ee08e1badf0fced
+        .quad 0x3feffffdd05a75ea, 0x3edf5f7d88472604
+        .quad 0x3feffffdeee4f810, 0x3eddb92b5212fb8d
+        .quad 0x3feffffe0bd3e852, 0x3edc282cd3957eda
+        .quad 0x3feffffe273c15b7, 0x3edaab7abace48dc
+        .quad 0x3feffffe41314e06, 0x3ed94219bfcb4928
+        .quad 0x3feffffe59c6698b, 0x3ed7eb1a2075864e
+        .quad 0x3feffffe710d565e, 0x3ed6a597219a93da
+        .quad 0x3feffffe8717232d, 0x3ed570b69502f313
+        .quad 0x3feffffe9bf4098c, 0x3ed44ba864670882
+        .quad 0x3feffffeafb377d5, 0x3ed335a62115bce2
+        .quad 0x3feffffec2641a9e, 0x3ed22df298214423
+        .quad 0x3feffffed413e5b7, 0x3ed133d96ae7e0dd
+        .quad 0x3feffffee4d01cd6, 0x3ed046aeabcfcdec
+        .quad 0x3feffffef4a55bd4, 0x3ececb9cfe1d8642
+        .quad 0x3fefffff039f9e8f, 0x3ecd21397ead99cb
+        .quad 0x3fefffff11ca4876, 0x3ecb8d094c86d374
+        .quad 0x3fefffff1f302bc1, 0x3eca0df0f0c626dc
+        .quad 0x3fefffff2bdb904d, 0x3ec8a2e269750a39
+        .quad 0x3fefffff37d63a36, 0x3ec74adc8f4064d3
+        .quad 0x3fefffff43297019, 0x3ec604ea819f007c
+        .quad 0x3fefffff4dde0118, 0x3ec4d0231928c6f9
+        .quad 0x3fefffff57fc4a95, 0x3ec3aba85fe22e20
+        .quad 0x3fefffff618c3da6, 0x3ec296a70f414053
+        .quad 0x3fefffff6a956450, 0x3ec1905613b3abf2
+        .quad 0x3fefffff731ee681, 0x3ec097f6156f32c5
+        .quad 0x3fefffff7b2f8ed6, 0x3ebf59a20caf6695
+        .quad 0x3fefffff82cdcf1b, 0x3ebd9c73698fb1dc
+        .quad 0x3fefffff89ffc4aa, 0x3ebbf716c6168bae
+        .quad 0x3fefffff90cb3c81, 0x3eba6852c6b58392
+        .quad 0x3fefffff9735b73b, 0x3eb8eefd70594a89
+        .quad 0x3fefffff9d446ccc, 0x3eb789fb715aae95
+        .quad 0x3fefffffa2fc5015, 0x3eb6383f726a8e04
+        .quad 0x3fefffffa8621251, 0x3eb4f8c96f26a26a
+        .quad 0x3fefffffad7a2652, 0x3eb3caa61607f920
+        .quad 0x3fefffffb248c39d, 0x3eb2acee2f5ecdb8
+        .quad 0x3fefffffb6d1e95d, 0x3eb19ec60b1242ed
+        .quad 0x3fefffffbb196132, 0x3eb09f5cf4dd2877
+        .quad 0x3fefffffbf22c1e2, 0x3eaf5bd95d8730d8
+        .quad 0x3fefffffc2f171e3, 0x3ead9371e2ff7c35
+        .quad 0x3fefffffc688a9cf, 0x3eabe41de54d155a
+        .quad 0x3fefffffc9eb76ac, 0x3eaa4c89e08ef4f3
+        .quad 0x3fefffffcd1cbc28, 0x3ea8cb738399b12c
+        .quad 0x3fefffffd01f36af, 0x3ea75fa8dbc84bec
+        .quad 0x3fefffffd2f57d68, 0x3ea608078a70dcbc
+        .quad 0x3fefffffd5a2041f, 0x3ea4c37c0394d094
+        .quad 0x3fefffffd8271d12, 0x3ea39100d5687bfe
+        .quad 0x3fefffffda86faa9, 0x3ea26f9df8519bd7
+        .quad 0x3fefffffdcc3b117, 0x3ea15e6827001f18
+        .quad 0x3fefffffdedf37ed, 0x3ea05c803e4831c1
+        .quad 0x3fefffffe0db6b91, 0x3e9ed22548cffd35
+        .quad 0x3fefffffe2ba0ea5, 0x3e9d06ad6ecdf971
+        .quad 0x3fefffffe47ccb60, 0x3e9b551c847fbc96
+        .quad 0x3fefffffe62534d4, 0x3e99bc09f112b494
+        .quad 0x3fefffffe7b4c81e, 0x3e983a1ff0aa239d
+        .quad 0x3fefffffe92ced93, 0x3e96ce1aa3fd7bdd
+        .quad 0x3fefffffea8ef9cf, 0x3e9576c72b514859
+        .quad 0x3fefffffebdc2ec6, 0x3e943302cc4a0da8
+        .quad 0x3fefffffed15bcba, 0x3e9301ba221dc9bb
+        .quad 0x3fefffffee3cc32c, 0x3e91e1e857adc568
+        .quad 0x3fefffffef5251c2, 0x3e90d2966b1746f7
+        .quad 0x3feffffff0576917, 0x3e8fa5b4f49cc6b2
+        .quad 0x3feffffff14cfb92, 0x3e8dc3ae30b55c16
+        .quad 0x3feffffff233ee1d, 0x3e8bfd7555a3bd68
+        .quad 0x3feffffff30d18e8, 0x3e8a517d9e61628a
+        .quad 0x3feffffff3d9480f, 0x3e88be4f8f6c951f
+        .quad 0x3feffffff4993c46, 0x3e874287ded49339
+        .quad 0x3feffffff54dab72, 0x3e85dcd669f2cd34
+        .quad 0x3feffffff5f74141, 0x3e848bfd38302871
+        .quad 0x3feffffff6969fb8, 0x3e834ecf8a3c124a
+        .quad 0x3feffffff72c5fb6, 0x3e822430f521cbcf
+        .quad 0x3feffffff7b91176, 0x3e810b1488aeb235
+        .quad 0x3feffffff83d3d07, 0x3e80027c00a263a6
+        .quad 0x3feffffff8b962be, 0x3e7e12ee004efc37
+        .quad 0x3feffffff92dfba2, 0x3e7c3e44ae32b16b
+        .quad 0x3feffffff99b79d2, 0x3e7a854ea14102a8
+        .quad 0x3feffffffa0248e8, 0x3e78e6761569f45d
+        .quad 0x3feffffffa62ce54, 0x3e77603bac345f65
+        .quad 0x3feffffffabd69b4, 0x3e75f1353cdad001
+        .quad 0x3feffffffb127525, 0x3e74980cb3c80949
+        .quad 0x3feffffffb624592, 0x3e73537f00b6ad4d
+        .quad 0x3feffffffbad2aff, 0x3e72225b12bffc68
+        .quad 0x3feffffffbf370cd, 0x3e710380e1adb7e9
+        .quad 0x3feffffffc355dfd, 0x3e6febc107d5efaa
+        .quad 0x3feffffffc733572, 0x3e6df0f2a0ee6947
+        .quad 0x3feffffffcad3626, 0x3e6c14b2188bcee4
+        .quad 0x3feffffffce39b67, 0x3e6a553644f7f07d
+        .quad 0x3feffffffd169d0c, 0x3e68b0cfce0579e0
+        .quad 0x3feffffffd466fa5, 0x3e6725e7c5dd20f7
+        .quad 0x3feffffffd7344aa, 0x3e65b2fe547a1340
+        .quad 0x3feffffffd9d4aab, 0x3e6456a974e92e93
+        .quad 0x3feffffffdc4ad7a, 0x3e630f93c3699078
+        .quad 0x3feffffffde9964e, 0x3e61dc7b5b978cf8
+        .quad 0x3feffffffe0c2bf0, 0x3e60bc30c5d52f15
+        .quad 0x3feffffffe2c92db, 0x3e5f5b2be65a0c7f
+        .quad 0x3feffffffe4aed5e, 0x3e5d5f3a8dea7357
+        .quad 0x3feffffffe675bbd, 0x3e5b82915b03515b
+        .quad 0x3feffffffe81fc4e, 0x3e59c3517e789488
+        .quad 0x3feffffffe9aeb97, 0x3e581fb7df06136e
+        .quad 0x3feffffffeb24467, 0x3e56961b8d641d06
+        .quad 0x3feffffffec81ff2, 0x3e5524ec4d916cae
+        .quad 0x3feffffffedc95e7, 0x3e53cab1343d18d1
+        .quad 0x3feffffffeefbc85, 0x3e52860757487a01
+        .quad 0x3fefffffff01a8b6, 0x3e5155a09065d4f7
+        .quad 0x3fefffffff126e1e, 0x3e50384250e4c9fc
+        .quad 0x3fefffffff221f30, 0x3e4e59890b926c78
+        .quad 0x3fefffffff30cd3f, 0x3e4c642116a8a9e3
+        .quad 0x3fefffffff3e8892, 0x3e4a8e405e651ab6
+        .quad 0x3fefffffff4b606f, 0x3e48d5f98114f872
+        .quad 0x3fefffffff57632d, 0x3e47397c5a66e307
+        .quad 0x3fefffffff629e44, 0x3e45b71456c5a4c4
+        .quad 0x3fefffffff6d1e56, 0x3e444d26de513197
+        .quad 0x3fefffffff76ef3f, 0x3e42fa31d6371537
+        .quad 0x3fefffffff801c1f, 0x3e41bcca373b7b43
+        .quad 0x3fefffffff88af67, 0x3e40939ab853339f
+        .quad 0x3fefffffff90b2e3, 0x3e3efac5187b2863
+        .quad 0x3fefffffff982fc1, 0x3e3cf1e86235d0e7
+        .quad 0x3fefffffff9f2e9f, 0x3e3b0a68a2128bab
+        .quad 0x3fefffffffa5b790, 0x3e39423165bc4444
+        .quad 0x3fefffffffabd229, 0x3e37974e743dea3d
+        .quad 0x3fefffffffb18582, 0x3e3607e9eacd1050
+        .quad 0x3fefffffffb6d844, 0x3e34924a74dec729
+        .quad 0x3fefffffffbbd0aa, 0x3e3334d19e0c2160
+        .quad 0x3fefffffffc0748f, 0x3e31edfa3c5f5cca
+        .quad 0x3fefffffffc4c96c, 0x3e30bc56f1b54701
+        .quad 0x3fefffffffc8d462, 0x3e2f3d2185e047d9
+        .quad 0x3fefffffffcc9a41, 0x3e2d26cb87945e87
+        .quad 0x3fefffffffd01f89, 0x3e2b334fac4b9f99
+        .quad 0x3fefffffffd36871, 0x3e296076f7918d1c
+        .quad 0x3fefffffffd678ed, 0x3e27ac2d72fc2c63
+        .quad 0x3fefffffffd954ae, 0x3e2614801550319e
+        .quad 0x3fefffffffdbff2a, 0x3e24979ac8b28927
+        .quad 0x3fefffffffde7ba0, 0x3e2333c68e2d0548
+        .quad 0x3fefffffffe0cd16, 0x3e21e767bce37dd7
+        .quad 0x3fefffffffe2f664, 0x3e20b0fc5b6d05a0
+        .quad 0x3fefffffffe4fa30, 0x3e1f1e3523b41d7d
+        .quad 0x3fefffffffe6daf7, 0x3e1d00de6608effe
+        .quad 0x3fefffffffe89b0c, 0x3e1b0778b7b3301b
+        .quad 0x3fefffffffea3c9a, 0x3e192fb04ec0f6cf
+        .quad 0x3fefffffffebc1a9, 0x3e177756ec9f78fa
+        .quad 0x3fefffffffed2c21, 0x3e15dc61922d5a06
+        .quad 0x3fefffffffee7dc8, 0x3e145ce65699ff6d
+        .quad 0x3fefffffffefb847, 0x3e12f71a5f159970
+        .quad 0x3feffffffff0dd2b, 0x3e11a94ff571654f
+        .quad 0x3feffffffff1ede9, 0x3e1071f4bbea09ec
+        .quad 0x3feffffffff2ebda, 0x3e0e9f1ff8ddd774
+        .quad 0x3feffffffff3d843, 0x3e0c818223a202c7
+        .quad 0x3feffffffff4b453, 0x3e0a887bd2b4404d
+        .quad 0x3feffffffff58126, 0x3e08b1a336c5eb6b
+        .quad 0x3feffffffff63fc3, 0x3e06fab63324088a
+        .quad 0x3feffffffff6f121, 0x3e056197e30205ba
+        .quad 0x3feffffffff79626, 0x3e03e44e45301b92
+        .quad 0x3feffffffff82fab, 0x3e0281000bfe4c3f
+        .quad 0x3feffffffff8be77, 0x3e0135f28f2d50b4
+        .quad 0x3feffffffff94346, 0x3e000187dded5975
+        .quad 0x3feffffffff9bec8, 0x3dfdc479de0ef001
+        .quad 0x3feffffffffa319f, 0x3dfbad4fdad3caa1
+        .quad 0x3feffffffffa9c63, 0x3df9baed3ed27ab8
+        .quad 0x3feffffffffaffa4, 0x3df7ead9ce4285bb
+        .quad 0x3feffffffffb5be5, 0x3df63ac6b4edc88e
+        .quad 0x3feffffffffbb1a2, 0x3df4a88be2a6390c
+        .quad 0x3feffffffffc014e, 0x3df332259185f1a0
+        .quad 0x3feffffffffc4b56, 0x3df1d5b1f3793044
+        .quad 0x3feffffffffc901c, 0x3df0916f04b6e18b
+        .quad 0x3feffffffffccfff, 0x3deec77101de6926
+        .quad 0x3feffffffffd0b56, 0x3dec960bf23153e0
+        .quad 0x3feffffffffd4271, 0x3dea8bd20fc65ef7
+        .quad 0x3feffffffffd759d, 0x3de8a61745ec7d1d
+        .quad 0x3feffffffffda520, 0x3de6e25d0e756261
+        .quad 0x3feffffffffdd13c, 0x3de53e4f7d1666cb
+        .quad 0x3feffffffffdfa2d, 0x3de3b7c27a7ddb0e
+        .quad 0x3feffffffffe202d, 0x3de24caf2c32af14
+        .quad 0x3feffffffffe4371, 0x3de0fb3186804d0f
+        .quad 0x3feffffffffe642a, 0x3ddf830c0bb41fd7
+        .quad 0x3feffffffffe8286, 0x3ddd3c0f1a91c846
+        .quad 0x3feffffffffe9eb0, 0x3ddb1e5acf351d87
+        .quad 0x3feffffffffeb8d0, 0x3dd92712d259ce66
+        .quad 0x3feffffffffed10a, 0x3dd7538c60a04476
+        .quad 0x3feffffffffee782, 0x3dd5a14b04b47879
+        .quad 0x3feffffffffefc57, 0x3dd40dfd87456f4c
+        .quad 0x3fefffffffff0fa7, 0x3dd2977b1172b9d5
+        .quad 0x3fefffffffff218f, 0x3dd13bc07e891491
+        .quad 0x3fefffffffff3227, 0x3dcff1dbb4300811
+        .quad 0x3fefffffffff4188, 0x3dcd9a880f306bd8
+        .quad 0x3fefffffffff4fc9, 0x3dcb6e45220b55e0
+        .quad 0x3fefffffffff5cfd, 0x3dc96a0b33f2c4da
+        .quad 0x3fefffffffff6939, 0x3dc78b07e9e924ac
+        .quad 0x3fefffffffff748e, 0x3dc5ce9ab1670dd2
+        .quad 0x3fefffffffff7f0d, 0x3dc4325167006bb0
+        .quad 0x3fefffffffff88c5, 0x3dc2b3e53538ff3f
+        .quad 0x3fefffffffff91c6, 0x3dc15137a7f44864
+        .quad 0x3fefffffffff9a1b, 0x3dc0084ff125639d
+        .quad 0x3fefffffffffa1d2, 0x3dbdaeb0b7311ec7
+        .quad 0x3fefffffffffa8f6, 0x3dbb7937d1c40c53
+        .quad 0x3fefffffffffaf92, 0x3db96d082f59ab06
+        .quad 0x3fefffffffffb5b0, 0x3db7872d9fa10aad
+        .quad 0x3fefffffffffbb58, 0x3db5c4e8e37bc7d0
+        .quad 0x3fefffffffffc095, 0x3db423ac0df49a40
+        .quad 0x3fefffffffffc56d, 0x3db2a117230ad284
+        .quad 0x3fefffffffffc9e8, 0x3db13af4f04f9998
+        .quad 0x3fefffffffffce0d, 0x3dafde703724e560
+        .quad 0x3fefffffffffd1e1, 0x3dad77f0c82e7641
+        .quad 0x3fefffffffffd56c, 0x3dab3ee02611d7dd
+        .quad 0x3fefffffffffd8b3, 0x3da92ff33023d5bd
+        .quad 0x3fefffffffffdbba, 0x3da7481a9e69f53f
+        .quad 0x3fefffffffffde86, 0x3da5847eda620959
+        .quad 0x3fefffffffffe11d, 0x3da3e27c1fcc74bd
+        .quad 0x3fefffffffffe380, 0x3da25f9ee0b923dc
+        .quad 0x3fefffffffffe5b6, 0x3da0f9a068653200
+        .quad 0x3fefffffffffe7c0, 0x3d9f5cc7718082b0
+        .quad 0x3fefffffffffe9a2, 0x3d9cf7e53d6a2ca5
+        .quad 0x3fefffffffffeb60, 0x3d9ac0f5f3229372
+        .quad 0x3fefffffffffecfb, 0x3d98b498644847ea
+        .quad 0x3fefffffffffee77, 0x3d96cfa9bcca59dc
+        .quad 0x3fefffffffffefd6, 0x3d950f411d4fd2cd
+        .quad 0x3feffffffffff11a, 0x3d9370ab8327af5e
+        .quad 0x3feffffffffff245, 0x3d91f167f88c6b6e
+        .quad 0x3feffffffffff359, 0x3d908f24085d4597
+        .quad 0x3feffffffffff457, 0x3d8e8f70e181d61a
+        .quad 0x3feffffffffff542, 0x3d8c324c20e337dc
+        .quad 0x3feffffffffff61b, 0x3d8a03261574b54e
+        .quad 0x3feffffffffff6e3, 0x3d87fe903cdf5855
+        .quad 0x3feffffffffff79b, 0x3d86215c58da3450
+        .quad 0x3feffffffffff845, 0x3d846897d4b69fc6
+        .quad 0x3feffffffffff8e2, 0x3d82d1877d731b7b
+        .quad 0x3feffffffffff973, 0x3d8159a386b11517
+        .quad 0x3feffffffffff9f8, 0x3d7ffd27ae9393ce
+        .quad 0x3feffffffffffa73, 0x3d7d7c593130dd0b
+        .quad 0x3feffffffffffae4, 0x3d7b2cd607c79bcf
+        .quad 0x3feffffffffffb4c, 0x3d790ae4d3405651
+        .quad 0x3feffffffffffbad, 0x3d771312dd1759e2
+        .quad 0x3feffffffffffc05, 0x3d75422ef5d8949d
+        .quad 0x3feffffffffffc57, 0x3d739544b0ecc957
+        .quad 0x3feffffffffffca2, 0x3d720997f73e73dd
+        .quad 0x3feffffffffffce7, 0x3d709ca0eaacd277
+        .quad 0x3feffffffffffd27, 0x3d6e9810295890ec
+        .quad 0x3feffffffffffd62, 0x3d6c2b45b5aa4a1d
+        .quad 0x3feffffffffffd98, 0x3d69eee068fa7596
+        .quad 0x3feffffffffffdca, 0x3d67df2b399c10a8
+        .quad 0x3feffffffffffdf8, 0x3d65f8b87a31bd85
+        .quad 0x3feffffffffffe22, 0x3d64385c96e9a2d9
+        .quad 0x3feffffffffffe49, 0x3d629b2933ef4cbc
+        .quad 0x3feffffffffffe6c, 0x3d611e68a6378f8a
+        .quad 0x3feffffffffffe8d, 0x3d5f7f338086a86b
+        .quad 0x3feffffffffffeab, 0x3d5cf8d7d9ce040a
+        .quad 0x3feffffffffffec7, 0x3d5aa577251ae485
+        .quad 0x3feffffffffffee1, 0x3d58811d739efb5f
+        .quad 0x3feffffffffffef8, 0x3d568823e52970be
+        .quad 0x3fefffffffffff0e, 0x3d54b72ae68e8b4c
+        .quad 0x3fefffffffffff22, 0x3d530b14dbe876bc
+        .quad 0x3fefffffffffff34, 0x3d5181012ef86610
+        .quad 0x3fefffffffffff45, 0x3d501647ba798745
+        .quad 0x3fefffffffffff54, 0x3d4d90e917701675
+        .quad 0x3fefffffffffff62, 0x3d4b2a87e86d0c8a
+        .quad 0x3fefffffffffff6f, 0x3d48f53dcb377293
+        .quad 0x3fefffffffffff7b, 0x3d46ed2f2515e933
+        .quad 0x3fefffffffffff86, 0x3d450ecc9ed47f19
+        .quad 0x3fefffffffffff90, 0x3d4356cd5ce7799e
+        .quad 0x3fefffffffffff9a, 0x3d41c229a587ab78
+        .quad 0x3fefffffffffffa2, 0x3d404e15ecc7f3f6
+        .quad 0x3fefffffffffffaa, 0x3d3deffc7e6a6017
+        .quad 0x3fefffffffffffb1, 0x3d3b7b040832f310
+        .quad 0x3fefffffffffffb8, 0x3d3938e021f36d76
+        .quad 0x3fefffffffffffbe, 0x3d37258610b3b233
+        .quad 0x3fefffffffffffc3, 0x3d353d3bfc82a909
+        .quad 0x3fefffffffffffc8, 0x3d337c92babdc2fd
+        .quad 0x3fefffffffffffcd, 0x3d31e06010120f6a
+        .quad 0x3fefffffffffffd1, 0x3d3065b9616170d4
+        .quad 0x3fefffffffffffd5, 0x3d2e13dd96b3753b
+        .quad 0x3fefffffffffffd9, 0x3d2b950d32467392
+        .quad 0x3fefffffffffffdc, 0x3d294a72263259a5
+        .quad 0x3fefffffffffffdf, 0x3d272fd93e036cdc
+        .quad 0x3fefffffffffffe2, 0x3d254164576929ab
+        .quad 0x3fefffffffffffe4, 0x3d237b83c521fe96
+        .quad 0x3fefffffffffffe7, 0x3d21daf033182e96
+        .quad 0x3fefffffffffffe9, 0x3d205ca50205d26a
+        .quad 0x3fefffffffffffeb, 0x3d1dfbb6235639fa
+        .quad 0x3fefffffffffffed, 0x3d1b7807e294781f
+        .quad 0x3fefffffffffffee, 0x3d19298add70a734
+        .quad 0x3feffffffffffff0, 0x3d170beaf9c7ffb6
+        .quad 0x3feffffffffffff1, 0x3d151b2cd6709222
+        .quad 0x3feffffffffffff3, 0x3d1353a6cf7f7fff
+        .quad 0x3feffffffffffff4, 0x3d11b1fa8cbe84a7
+        .quad 0x3feffffffffffff5, 0x3d10330f0fd69921
+        .quad 0x3feffffffffffff6, 0x3d0da81670f96f9b
+        .quad 0x3feffffffffffff7, 0x3d0b24a16b4d09aa
+        .quad 0x3feffffffffffff7, 0x3d08d6eeb6efdbd6
+        .quad 0x3feffffffffffff8, 0x3d06ba91ac734786
+        .quad 0x3feffffffffffff9, 0x3d04cb7966770ab5
+        .quad 0x3feffffffffffff9, 0x3d0305e9721d0981
+        .quad 0x3feffffffffffffa, 0x3d01667311fff70a
+        .quad 0x3feffffffffffffb, 0x3cffd3de10d62855
+        .quad 0x3feffffffffffffb, 0x3cfd1aefbcd48d0c
+        .quad 0x3feffffffffffffb, 0x3cfa9cc93c25aca9
+        .quad 0x3feffffffffffffc, 0x3cf85487ee3ea735
+        .quad 0x3feffffffffffffc, 0x3cf63daf8b4b1e0c
+        .quad 0x3feffffffffffffd, 0x3cf45421e69a6ca1
+        .quad 0x3feffffffffffffd, 0x3cf294175802d99a
+        .quad 0x3feffffffffffffd, 0x3cf0fa17bf41068f
+        .quad 0x3feffffffffffffd, 0x3cef05e82aae2bb9
+        .quad 0x3feffffffffffffe, 0x3cec578101b29058
+        .quad 0x3feffffffffffffe, 0x3ce9e39dc5dd2f7c
+        .quad 0x3feffffffffffffe, 0x3ce7a553a728bbf2
+        .quad 0x3feffffffffffffe, 0x3ce5982008db1304
+        .quad 0x3feffffffffffffe, 0x3ce3b7e00422e51b
+        .quad 0x3feffffffffffffe, 0x3ce200c898d9ee3e
+        .quad 0x3fefffffffffffff, 0x3ce06f5f7eb65a56
+        .quad 0x3fefffffffffffff, 0x3cde00e9148a1d25
+        .quad 0x3fefffffffffffff, 0x3cdb623734024e92
+        .quad 0x3fefffffffffffff, 0x3cd8fd4e01891bf8
+        .quad 0x3fefffffffffffff, 0x3cd6cd44c7470d89
+        .quad 0x3fefffffffffffff, 0x3cd4cd9c04158cd7
+        .quad 0x3fefffffffffffff, 0x3cd2fa34bf5c8344
+        .quad 0x3fefffffffffffff, 0x3cd14f4890ff2461
+        .quad 0x3fefffffffffffff, 0x3ccf92c49dfa4df5
+        .quad 0x3fefffffffffffff, 0x3ccccaaea71ab0df
+        .quad 0x3fefffffffffffff, 0x3cca40829f001197
+        .quad 0x3ff0000000000000, 0x3cc7eef13b59e96c
+        .quad 0x3ff0000000000000, 0x3cc5d11e1a252bf5
+        .quad 0x3ff0000000000000, 0x3cc3e296303b2297
+        .quad 0x3ff0000000000000, 0x3cc21f47009f43ce
+        .quad 0x3ff0000000000000, 0x3cc083768c5e4542
+        .quad 0x3ff0000000000000, 0x3cbe1777d831265f
+        .quad 0x3ff0000000000000, 0x3cbb69f10b0191b5
+        .quad 0x3ff0000000000000, 0x3cb8f8a3a05b5b53
+        .quad 0x3ff0000000000000, 0x3cb6be573c40c8e7
+        .quad 0x3ff0000000000000, 0x3cb4b645ba991fdb
+        .align 16
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff  /* _AbsMask */
+        .align 16
+        .quad 0x4017f80000000000, 0x4017f80000000000  /* _MaxThreshold = 6.0 - 1.0/128.0 */
+        .align 16
+        .quad 0x42c0000000000000, 0x42c0000000000000  /* SRound */
+        .align 16
+        .quad 0x2ff0000000000000, 0x2ff0000000000000  /* _U2Threshold */
+        .align 16
+        .quad 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5  /* _poly_1_0 */
+        .align 16
+        .quad 0x3fc1111235a363b1, 0x3fc1111235a363b1  /* _poly_1_1 */
+        .align 16
+        .quad 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57  /* _poly_3_0 */
+        .align 16
+        .quad 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8  /* _poly_3_1 */
+        .align 16
+        .quad 0xbfc5555800001B4F, 0xbfc5555800001B4F  /* _poly_5_0 */
+        .align 16
+        .quad 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122  /* _poly_5_1 */
+        .align 16
+        .quad 0xbfd55555555547f6, 0xbfd55555555547f6  /* _poly_1_2 */
+        .align 16
+        .quad 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd  /* _poly_3_2 */
+        .align 16
+        .quad 0x3fe5555555554b0c, 0x3fe5555555554b0c  /* _poly_1_3 */
+        .align 16
+        .quad 0xbfd5555555555555, 0xbfd5555555555555  /* _poly_3_3 */
+        .align 16
+        .type	__svml_derf_data_internal,@object
+        .size	__svml_derf_data_internal,.-__svml_derf_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core-sse.S
new file mode 100644
index 0000000000..704785738f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized erf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN4v_erf _ZGVdN4v_erf_sse_wrapper
+#include "../svml_d_erf4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core.c
new file mode 100644
index 0000000000..0647917209
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized erf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN4v_erf
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN4v_erf, __GI__ZGVdN4v_erf, __redirect__ZGVdN4v_erf)
+  __attribute__ ((visibility ("hidden")));
+#endif
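For illustration only (not part of the patch): the dispatched symbol follows
the x86-64 vector function ABI ('d' = AVX2 variant, 'N4' = 4 double lanes,
'v' = one vector argument), so it can also be declared and called directly for
a quick test.  Sketch below assumes building with -mavx2 and linking against
libmvec (-lmvec); normally the compiler emits such calls itself when it
vectorizes a loop over erf():

    #include <immintrin.h>
    #include <stdio.h>

    /* Vector-ABI name resolved through the IFUNC above at load time.  */
    __m256d _ZGVdN4v_erf (__m256d);

    int
    main (void)
    {
      __m256d x = _mm256_set_pd (2.0, 1.0, 0.5, -0.5);
      double y[4];
      _mm256_storeu_pd (y, _ZGVdN4v_erf (x));
      for (int i = 0; i < 4; i++)
        printf ("%g\n", y[i]);
      return 0;
    }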
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core_avx2.S
new file mode 100644
index 0000000000..bd7226cd5c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core_avx2.S
@@ -0,0 +1,984 @@
+/* Function erf vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Basic formula is
+ *    erf(x) ~ erf(x0) +
+ *              exp(-x0*x0)*D*(1+c0+T*P1(T)+D^2*P3(T)+D^4*P5(T)+D^6*p7+D^8*p9)
+ *   where D=x-x0, T=x0*D
+ *   x0 is x rounded to a specified number of fractional bits (in this case 7),
+ *    except that x0=0 for |x|<3.5/128.0 (using x0=0 for first 4 table entries)
+ *
+ *   Data table packs both erf(x0)_high and a few bits of erf(x0)_low in one
+ *   entry (in place of redundant exponent bits)
+ *
+ */
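For illustration only (not part of the patch): a minimal scalar sketch of the
table-driven scheme described above.  Assumptions: only the leading D term is
kept (the real kernel adds the D^2..D^8 polynomial corrections held in the
_poly* constants), and erf()/exp() stand in for the two table columns, which
hold erf(x0) and 2/sqrt(pi)*exp(-x0*x0) -- compare the first table entry,
0x3ff20dd750429b6d == 0x1.20dd750429b6dp+0 == 2/sqrt(pi):

    #include <math.h>
    #include <stdio.h>

    static double
    erf_sketch (double x)
    {
      double a = fabs (x);
      if (a > 6.0 - 1.0 / 128.0)           /* _MaxThreshold: erf ~= 1 beyond */
        a = 6.0 - 1.0 / 128.0;
      double x0 = nearbyint (a * 128.0) / 128.0;  /* 7 fractional bits */
      double d = a - x0;                          /* D = x - x0 */
      double erf_hi = erf (x0);                   /* table column 1 */
      double exp_hi = 0x1.20dd750429b6dp+0 * exp (-x0 * x0); /* column 2 */
      return copysign (erf_hi + exp_hi * d, x);   /* erf is odd; fixes -0 too */
    }

    int
    main (void)
    {
      printf ("%a vs %a\n", erf_sketch (0.8), erf (0.8));
      return 0;
    }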
+
+/* Offsets for data table __svml_derf_data_internal
+ */
+#define _erf_tbl                      	0
+#define _AbsMask                      	12288
+#define _MaxThreshold                 	12320
+#define _SRound                       	12352
+#define _U2Threshold                  	12384
+#define _poly1_0                      	12416
+#define _poly1_1                      	12448
+#define _poly3_0                      	12480
+#define _poly3_1                      	12512
+#define _poly5_0                      	12544
+#define _poly5_1                      	12576
+#define _poly1_2                      	12608
+#define _poly3_2                      	12640
+#define _poly1_3                      	12672
+#define _poly3_3                      	12704
+#define _Mask32                       	12736
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN4v_erf_avx2)
+/*
+ * vector gather: erf(x0),
+ * second value is exp(-x0*x0)
+ */
+        lea       __svml_derf_data_internal(%rip), %rdi
+        vmovupd   _SRound+__svml_derf_data_internal(%rip), %ymm6
+        vandpd    _AbsMask+__svml_derf_data_internal(%rip), %ymm0, %ymm5
+
+/*
+ * erf(x) rounds to 1.0 for x>_MaxThreshold (5.9921875)
+ * can compute all results in the main path
+ */
+        vminpd    _MaxThreshold+__svml_derf_data_internal(%rip), %ymm5, %ymm7
+        vaddpd    %ymm6, %ymm7, %ymm10
+        vcmpgt_oqpd _U2Threshold+__svml_derf_data_internal(%rip), %ymm7, %ymm9
+        vpsllq    $4, %ymm10, %ymm11
+        vsubpd    %ymm6, %ymm10, %ymm8
+        vandps    _Mask32+__svml_derf_data_internal(%rip), %ymm11, %ymm12
+        vsubpd    %ymm8, %ymm7, %ymm3
+        vmulpd    %ymm3, %ymm8, %ymm2
+        vandpd    %ymm9, %ymm3, %ymm1
+
+/* NaN fixup */
+        vminpd    %ymm5, %ymm3, %ymm3
+
+/* save sign */
+        vxorpd    %ymm0, %ymm5, %ymm4
+
+/* T^2 */
+        vmulpd    %ymm2, %ymm2, %ymm5
+        vextractf128 $1, %ymm12, %xmm13
+        vmovd     %xmm12, %eax
+        vmovd     %xmm13, %ecx
+        vpextrd   $2, %xmm12, %edx
+        vpextrd   $2, %xmm13, %esi
+        movslq    %eax, %rax
+        movslq    %edx, %rdx
+        movslq    %ecx, %rcx
+        movslq    %esi, %rsi
+
+/* Sign | Diff */
+        vxorpd    %ymm4, %ymm3, %ymm12
+
+/*
+ * _LA_ polynomial computation
+ * Start polynomial evaluation
+ */
+        vmovupd   _poly1_0+__svml_derf_data_internal(%rip), %ymm3
+        vmovupd   (%rdi,%rax), %xmm6
+        vmovupd   (%rdi,%rdx), %xmm7
+        vmovupd   (%rdi,%rcx), %xmm8
+        vmovupd   (%rdi,%rsi), %xmm9
+        vunpcklpd %xmm7, %xmm6, %xmm14
+        vunpcklpd %xmm9, %xmm8, %xmm15
+
+/* D2 = Diff^2 */
+        vmulpd    %ymm1, %ymm1, %ymm13
+        vfmadd213pd _poly1_1+__svml_derf_data_internal(%rip), %ymm2, %ymm3
+        vmovupd   _poly5_0+__svml_derf_data_internal(%rip), %ymm1
+        vunpckhpd %xmm9, %xmm8, %xmm10
+        vfmadd213pd _poly1_2+__svml_derf_data_internal(%rip), %ymm2, %ymm3
+        vfmadd213pd _poly5_1+__svml_derf_data_internal(%rip), %ymm2, %ymm1
+        vfmadd213pd _poly1_3+__svml_derf_data_internal(%rip), %ymm2, %ymm3
+        vfmadd213pd _poly3_3+__svml_derf_data_internal(%rip), %ymm13, %ymm1
+
+/* P1 = T^2*P1 - T */
+        vfmsub213pd %ymm2, %ymm5, %ymm3
+        vinsertf128 $1, %xmm15, %ymm14, %ymm0
+        vunpckhpd %xmm7, %xmm6, %xmm14
+        vmovupd   _poly3_0+__svml_derf_data_internal(%rip), %ymm6
+        vfmadd213pd _poly3_1+__svml_derf_data_internal(%rip), %ymm2, %ymm6
+        vfmadd213pd _poly3_2+__svml_derf_data_internal(%rip), %ymm2, %ymm6
+        vfmadd213pd %ymm1, %ymm2, %ymm6
+
+/* P1 + P3*D2 */
+        vfmadd213pd %ymm3, %ymm13, %ymm6
+
+/* Sign | _Erf_H */
+        vxorpd    %ymm4, %ymm0, %ymm0
+        vinsertf128 $1, %xmm10, %ymm14, %ymm11
+
+/* exp_h(x0) * Diff */
+        vmulpd    %ymm12, %ymm11, %ymm2
+
+/*
+ * branch-free
+ * low part of result: exp_h(x0) * Diff*(1+P1)
+ */
+        vfmadd213pd %ymm2, %ymm2, %ymm6
+
+/* Final result */
+        vaddpd    %ymm6, %ymm0, %ymm15
+
+/* Fix erf(-0) = -0 */
+        vorpd     %ymm4, %ymm15, %ymm0
+        ret
+
+END(_ZGVdN4v_erf_avx2)
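For illustration only (not part of the patch): AVX2 has no gather that loads
16-byte (erf_hi, exp_hi) pairs, so the kernel above extracts the four byte
offsets into GPRs, performs four 128-bit loads, and reassembles the two result
vectors with vunpcklpd/vunpckhpd plus vinsertf128.  A rough intrinsics
equivalent of that step, assuming the offsets are already scaled by 16 (as the
vpsllq $4 above produces); the function name is invented for the sketch:

    #include <immintrin.h>

    static void
    gather_pairs (const double *tbl, const long off[4],
                  __m256d *erf_hi, __m256d *exp_hi)
    {
      const char *base = (const char *) tbl;
      __m128d p0 = _mm_loadu_pd ((const double *) (base + off[0]));
      __m128d p1 = _mm_loadu_pd ((const double *) (base + off[1]));
      __m128d p2 = _mm_loadu_pd ((const double *) (base + off[2]));
      __m128d p3 = _mm_loadu_pd ((const double *) (base + off[3]));
      __m128d lo01 = _mm_unpacklo_pd (p0, p1);   /* erf_hi lanes 0,1 */
      __m128d lo23 = _mm_unpacklo_pd (p2, p3);   /* erf_hi lanes 2,3 */
      __m128d hi01 = _mm_unpackhi_pd (p0, p1);   /* exp_hi lanes 0,1 */
      __m128d hi23 = _mm_unpackhi_pd (p2, p3);   /* exp_hi lanes 2,3 */
      *erf_hi = _mm256_insertf128_pd (_mm256_castpd128_pd256 (lo01), lo23, 1);
      *exp_hi = _mm256_insertf128_pd (_mm256_castpd128_pd256 (hi01), hi23, 1);
    }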
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_derf_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(32)) VUINT32 _erf_tbl[6*128*2][2];
+        __declspec(align(32)) VUINT32 _AbsMask[4][2];
+        __declspec(align(32)) VUINT32 _MaxThreshold[4][2];
+        __declspec(align(32)) VUINT32 _SRound[4][2];
+        __declspec(align(32)) VUINT32 _U2Threshold[4][2];
+        __declspec(align(32)) VUINT32 _poly1_0[4][2];
+        __declspec(align(32)) VUINT32 _poly1_1[4][2];
+        __declspec(align(32)) VUINT32 _poly3_0[4][2];
+        __declspec(align(32)) VUINT32 _poly3_1[4][2];
+        __declspec(align(32)) VUINT32 _poly5_0[4][2];
+        __declspec(align(32)) VUINT32 _poly5_1[4][2];
+        __declspec(align(32)) VUINT32 _poly1_2[4][2];
+        __declspec(align(32)) VUINT32 _poly3_2[4][2];
+        __declspec(align(32)) VUINT32 _poly1_3[4][2];
+        __declspec(align(32)) VUINT32 _poly3_3[4][2];
+        __declspec(align(32)) VUINT32 _Mask32[4][2];
+} __svml_derf_data_internal;
+#endif
+__svml_derf_data_internal:
+        /*== _erf_tbl ==*/
+        .quad 0x0000000000000000, 0x3ff20dd750429b6d
+        .quad 0x3f820dbf3deb1340, 0x3ff20d8f1975c85d
+        .quad 0x3f920d77083f17a0, 0x3ff20cb67bd452c7
+        .quad 0x3f9b137e0cf584dc, 0x3ff20b4d8bac36c1
+        .quad 0x3fa20c5645dd2538, 0x3ff209546ad13ccf
+        .quad 0x3fa68e5d3bbc9526, 0x3ff206cb4897b148
+        .quad 0x3fab0fafef135745, 0x3ff203b261cd0053
+        .quad 0x3faf902a77bd3821, 0x3ff2000a00ae3804
+        .quad 0x3fb207d480e90658, 0x3ff1fbd27cdc72d3
+        .quad 0x3fb44703e87e8593, 0x3ff1f70c3b4f2cc8
+        .quad 0x3fb68591a1e83b5d, 0x3ff1f1b7ae44867f
+        .quad 0x3fb8c36beb8a8d23, 0x3ff1ebd5552f795b
+        .quad 0x3fbb0081148a873a, 0x3ff1e565bca400d4
+        .quad 0x3fbd3cbf7e70a4b3, 0x3ff1de697e413d29
+        .quad 0x3fbf78159ec8bb50, 0x3ff1d6e14099944a
+        .quad 0x3fc0d939005f65e5, 0x3ff1cecdb718d61c
+        .quad 0x3fc1f5e1a35c3b89, 0x3ff1c62fa1e869b6
+        .quad 0x3fc311fc15f56d14, 0x3ff1bd07cdd189ac
+        .quad 0x3fc42d7fc2f64959, 0x3ff1b357141d95d5
+        .quad 0x3fc548642321d7c6, 0x3ff1a91e5a748165
+        .quad 0x3fc662a0bdf7a89f, 0x3ff19e5e92b964ab
+        .quad 0x3fc77c2d2a765f9e, 0x3ff19318bae53a04
+        .quad 0x3fc895010fdbdbfd, 0x3ff1874ddcdfce24
+        .quad 0x3fc9ad142662e14d, 0x3ff17aff0e56ec10
+        .quad 0x3fcac45e37fe2526, 0x3ff16e2d7093cd8c
+        .quad 0x3fcbdad72110a648, 0x3ff160da304ed92f
+        .quad 0x3fccf076d1233237, 0x3ff153068581b781
+        .quad 0x3fce05354b96ff36, 0x3ff144b3b337c90c
+        .quad 0x3fcf190aa85540e2, 0x3ff135e3075d076b
+        .quad 0x3fd015f78a3dcf3d, 0x3ff12695da8b5bde
+        .quad 0x3fd09eed6982b948, 0x3ff116cd8fd67618
+        .quad 0x3fd127631eb8de32, 0x3ff1068b94962e5e
+        .quad 0x3fd1af54e232d609, 0x3ff0f5d1602f7e41
+        .quad 0x3fd236bef825d9a2, 0x3ff0e4a073dc1b91
+        .quad 0x3fd2bd9db0f7827f, 0x3ff0d2fa5a70c168
+        .quad 0x3fd343ed6989b7d9, 0x3ff0c0e0a8223359
+        .quad 0x3fd3c9aa8b84beda, 0x3ff0ae54fa490723
+        .quad 0x3fd44ed18d9f6462, 0x3ff09b58f724416b
+        .quad 0x3fd4d35ef3e5372e, 0x3ff087ee4d9ad247
+        .quad 0x3fd5574f4ffac98e, 0x3ff07416b4fbfe7c
+        .quad 0x3fd5da9f415ff23f, 0x3ff05fd3ecbec298
+        .quad 0x3fd65d4b75b00471, 0x3ff04b27bc403d30
+        .quad 0x3fd6df50a8dff772, 0x3ff03613f2812daf
+        .quad 0x3fd760aba57a76bf, 0x3ff0209a65e29545
+        .quad 0x3fd7e15944d9d3e4, 0x3ff00abcf3e187a9
+        .quad 0x3fd861566f5fd3c0, 0x3fefe8fb01a47307
+        .quad 0x3fd8e0a01cab516b, 0x3fefbbbbef34b4b2
+        .quad 0x3fd95f3353cbb146, 0x3fef8dc092d58ff8
+        .quad 0x3fd9dd0d2b721f39, 0x3fef5f0cdaf15313
+        .quad 0x3fda5a2aca209394, 0x3fef2fa4c16c0019
+        .quad 0x3fdad68966569a87, 0x3feeff8c4b1375db
+        .quad 0x3fdb522646bbda68, 0x3feecec7870ebca8
+        .quad 0x3fdbccfec24855b8, 0x3fee9d5a8e4c934e
+        .quad 0x3fdc4710406a65fc, 0x3fee6b4982f158b9
+        .quad 0x3fdcc058392a6d2d, 0x3fee38988fc46e72
+        .quad 0x3fdd38d4354c3bd0, 0x3fee054be79d3042
+        .quad 0x3fddb081ce6e2a48, 0x3fedd167c4cf9d2a
+        .quad 0x3fde275eaf25e458, 0x3fed9cf06898cdaf
+        .quad 0x3fde9d68931ae650, 0x3fed67ea1a8b5368
+        .quad 0x3fdf129d471eabb1, 0x3fed325927fb9d89
+        .quad 0x3fdf86faa9428f9d, 0x3fecfc41e36c7df9
+        .quad 0x3fdffa7ea8eb5fd0, 0x3fecc5a8a3fbea40
+        .quad 0x3fe03693a371519c, 0x3fec8e91c4d01368
+        .quad 0x3fe06f794ab2cae7, 0x3fec5701a484ef9d
+        .quad 0x3fe0a7ef5c18edd2, 0x3fec1efca49a5011
+        .quad 0x3fe0dff4f247f6c6, 0x3febe68728e29d5e
+        .quad 0x3fe1178930ada115, 0x3febada596f25436
+        .quad 0x3fe14eab43841b55, 0x3feb745c55905bf8
+        .quad 0x3fe1855a5fd3dd50, 0x3feb3aafcc27502e
+        .quad 0x3fe1bb95c3746199, 0x3feb00a46237d5be
+        .quad 0x3fe1f15cb50bc4de, 0x3feac63e7ecc1411
+        .quad 0x3fe226ae840d4d70, 0x3fea8b8287ec6a09
+        .quad 0x3fe25b8a88b6dd7f, 0x3fea5074e2157620
+        .quad 0x3fe28ff0240d52cd, 0x3fea1519efaf889e
+        .quad 0x3fe2c3debfd7d6c1, 0x3fe9d97610879642
+        .quad 0x3fe2f755ce9a21f4, 0x3fe99d8da149c13f
+        .quad 0x3fe32a54cb8db67b, 0x3fe96164fafd8de3
+        .quad 0x3fe35cdb3a9a144d, 0x3fe925007283d7aa
+        .quad 0x3fe38ee8a84beb71, 0x3fe8e86458169af8
+        .quad 0x3fe3c07ca9cb4f9e, 0x3fe8ab94f6caa71d
+        .quad 0x3fe3f196dcd0f135, 0x3fe86e9694134b9e
+        .quad 0x3fe42236e79a5fa6, 0x3fe8316d6f48133d
+        .quad 0x3fe4525c78dd5966, 0x3fe7f41dc12c9e89
+        .quad 0x3fe4820747ba2dc2, 0x3fe7b6abbb7aaf19
+        .quad 0x3fe4b13713ad3513, 0x3fe7791b886e7403
+        .quad 0x3fe4dfeba47f63cc, 0x3fe73b714a552763
+        .quad 0x3fe50e24ca35fd2c, 0x3fe6fdb11b1e0c34
+        .quad 0x3fe53be25d016a4f, 0x3fe6bfdf0beddaf5
+        .quad 0x3fe569243d2b3a9b, 0x3fe681ff24b4ab04
+        .quad 0x3fe595ea53035283, 0x3fe6441563c665d4
+        .quad 0x3fe5c2348ecc4dc3, 0x3fe60625bd75d07b
+        .quad 0x3fe5ee02e8a71a53, 0x3fe5c8341bb23767
+        .quad 0x3fe61955607dd15d, 0x3fe58a445da7c74c
+        .quad 0x3fe6442bfdedd397, 0x3fe54c5a57629db0
+        .quad 0x3fe66e86d0312e82, 0x3fe50e79d1749ac9
+        .quad 0x3fe69865ee075011, 0x3fe4d0a6889dfd9f
+        .quad 0x3fe6c1c9759d0e5f, 0x3fe492e42d78d2c5
+        .quad 0x3fe6eab18c74091b, 0x3fe4553664273d24
+        .quad 0x3fe7131e5f496a5a, 0x3fe417a0c4049fd0
+        .quad 0x3fe73b1021fc0cb8, 0x3fe3da26d759aef5
+        .quad 0x3fe762870f720c6f, 0x3fe39ccc1b136d5a
+        .quad 0x3fe78983697dc96f, 0x3fe35f93fe7d1b3d
+        .quad 0x3fe7b00578c26037, 0x3fe32281e2fd1a92
+        .quad 0x3fe7d60d8c979f7b, 0x3fe2e5991bd4cbfc
+        .quad 0x3fe7fb9bfaed8078, 0x3fe2a8dcede3673b
+        .quad 0x3fe820b1202f27fb, 0x3fe26c508f6bd0ff
+        .quad 0x3fe8454d5f25760d, 0x3fe22ff727dd6f7b
+        .quad 0x3fe8697120d92a4a, 0x3fe1f3d3cf9ffe5a
+        .quad 0x3fe88d1cd474a2e0, 0x3fe1b7e98fe26217
+        .quad 0x3fe8b050ef253c37, 0x3fe17c3b626c7a12
+        .quad 0x3fe8d30debfc572e, 0x3fe140cc3173f007
+        .quad 0x3fe8f5544bd00c04, 0x3fe1059ed7740313
+        .quad 0x3fe91724951b8fc6, 0x3fe0cab61f084b93
+        .quad 0x3fe9387f53df5238, 0x3fe09014c2ca74da
+        .quad 0x3fe959651980da31, 0x3fe055bd6d32e8d7
+        .quad 0x3fe979d67caa6631, 0x3fe01bb2b87c6968
+        .quad 0x3fe999d4192a5715, 0x3fdfc3ee5d1524b0
+        .quad 0x3fe9b95e8fd26aba, 0x3fdf511a91a67d2a
+        .quad 0x3fe9d8768656cc42, 0x3fdedeeee0959518
+        .quad 0x3fe9f71ca72cffb6, 0x3fde6d6ffaa65a25
+        .quad 0x3fea1551a16aaeaf, 0x3fddfca26f5bbf88
+        .quad 0x3fea331628a45b92, 0x3fdd8c8aace11e63
+        .quad 0x3fea506af4cc00f4, 0x3fdd1d2cfff91594
+        .quad 0x3fea6d50c20fa293, 0x3fdcae8d93f1d7b7
+        .quad 0x3fea89c850b7d54d, 0x3fdc40b0729ed548
+        .quad 0x3feaa5d265064366, 0x3fdbd3998457afdb
+        .quad 0x3feac16fc7143263, 0x3fdb674c8ffc6283
+        .quad 0x3feadca142b10f98, 0x3fdafbcd3afe8ab6
+        .quad 0x3feaf767a741088b, 0x3fda911f096fbc26
+        .quad 0x3feb11c3c79bb424, 0x3fda27455e14c93c
+        .quad 0x3feb2bb679ead19c, 0x3fd9be437a7de946
+        .quad 0x3feb4540978921ee, 0x3fd9561c7f23a47b
+        .quad 0x3feb5e62fce16095, 0x3fd8eed36b886d93
+        .quad 0x3feb771e894d602e, 0x3fd8886b1e5ecfd1
+        .quad 0x3feb8f741ef54f83, 0x3fd822e655b417e7
+        .quad 0x3feba764a2af2b78, 0x3fd7be47af1f5d89
+        .quad 0x3febbef0fbde6221, 0x3fd75a91a7f4d2ed
+        .quad 0x3febd61a1453ab44, 0x3fd6f7c69d7d3ef8
+        .quad 0x3febece0d82d1a5c, 0x3fd695e8cd31867e
+        .quad 0x3fec034635b66e23, 0x3fd634fa54fa285f
+        .quad 0x3fec194b1d49a184, 0x3fd5d4fd33729015
+        .quad 0x3fec2ef0812fc1bd, 0x3fd575f3483021c3
+        .quad 0x3fec443755820d64, 0x3fd517de540ce2a3
+        .quad 0x3fec5920900b5fd1, 0x3fd4babff975a04c
+        .quad 0x3fec6dad2829ec62, 0x3fd45e99bcbb7915
+        .quad 0x3fec81de16b14cef, 0x3fd4036d0468a7a2
+        .quad 0x3fec95b455cce69d, 0x3fd3a93b1998736c
+        .quad 0x3feca930e0e2a825, 0x3fd35005285227f1
+        .quad 0x3fecbc54b476248d, 0x3fd2f7cc3fe6f423
+        .quad 0x3feccf20ce0c0d27, 0x3fd2a09153529381
+        .quad 0x3fece1962c0e0d8b, 0x3fd24a55399ea239
+        .quad 0x3fecf3b5cdaf0c39, 0x3fd1f518ae487dc8
+        .quad 0x3fed0580b2cfd249, 0x3fd1a0dc51a9934d
+        .quad 0x3fed16f7dbe41ca0, 0x3fd14da0a961fd14
+        .quad 0x3fed281c49d818d0, 0x3fd0fb6620c550af
+        .quad 0x3fed38eefdf64fdd, 0x3fd0aa2d09497f2b
+        .quad 0x3fed4970f9ce00d9, 0x3fd059f59af7a906
+        .quad 0x3fed59a33f19ed42, 0x3fd00abff4dec7a3
+        .quad 0x3fed6986cfa798e7, 0x3fcf79183b101c5b
+        .quad 0x3fed791cad3eff01, 0x3fcedeb406d9c825
+        .quad 0x3fed8865d98abe01, 0x3fce4652fadcb6b2
+        .quad 0x3fed97635600bb89, 0x3fcdaff4969c0b04
+        .quad 0x3feda61623cb41e0, 0x3fcd1b982c501370
+        .quad 0x3fedb47f43b2980d, 0x3fcc893ce1dcbef7
+        .quad 0x3fedc29fb60715af, 0x3fcbf8e1b1ca2279
+        .quad 0x3fedd0787a8bb39d, 0x3fcb6a856c3ed54f
+        .quad 0x3fedde0a90611a0d, 0x3fcade26b7fbed95
+        .quad 0x3fedeb56f5f12d28, 0x3fca53c4135a6526
+        .quad 0x3fedf85ea8db188e, 0x3fc9cb5bd549b111
+        .quad 0x3fee0522a5dfda73, 0x3fc944ec2e4f5630
+        .quad 0x3fee11a3e8cf4eb8, 0x3fc8c07329874652
+        .quad 0x3fee1de36c75ba58, 0x3fc83deeada4d25a
+        .quad 0x3fee29e22a89d766, 0x3fc7bd5c7df3fe9c
+        .quad 0x3fee35a11b9b61ce, 0x3fc73eba3b5b07b7
+        .quad 0x3fee4121370224cc, 0x3fc6c205655be720
+        .quad 0x3fee4c6372cd8927, 0x3fc6473b5b15a7a1
+        .quad 0x3fee5768c3b4a3fc, 0x3fc5ce595c455b0a
+        .quad 0x3fee62321d06c5e0, 0x3fc5575c8a468362
+        .quad 0x3fee6cc0709c8a0d, 0x3fc4e241e912c305
+        .quad 0x3fee7714aec96534, 0x3fc46f066040a832
+        .quad 0x3fee812fc64db369, 0x3fc3fda6bc016994
+        .quad 0x3fee8b12a44944a8, 0x3fc38e1fae1d6a9d
+        .quad 0x3fee94be342e6743, 0x3fc3206dceef5f87
+        .quad 0x3fee9e335fb56f87, 0x3fc2b48d9e5dea1c
+        .quad 0x3feea7730ed0bbb9, 0x3fc24a7b84d38971
+        .quad 0x3feeb07e27a133aa, 0x3fc1e233d434b813
+        .quad 0x3feeb9558e6b42ce, 0x3fc17bb2c8d41535
+        .quad 0x3feec1fa258c4bea, 0x3fc116f48a6476cc
+        .quad 0x3feeca6ccd709544, 0x3fc0b3f52ce8c383
+        .quad 0x3feed2ae6489ac1e, 0x3fc052b0b1a174ea
+        .quad 0x3feedabfc7453e63, 0x3fbfe6460fef4680
+        .quad 0x3feee2a1d004692c, 0x3fbf2a901ccafb37
+        .quad 0x3feeea5557137ae0, 0x3fbe723726b824a9
+        .quad 0x3feef1db32a2277c, 0x3fbdbd32ac4c99b0
+        .quad 0x3feef93436bc2daa, 0x3fbd0b7a0f921e7c
+        .quad 0x3fef006135426b26, 0x3fbc5d0497c09e74
+        .quad 0x3fef0762fde45ee6, 0x3fbbb1c972f23e50
+        .quad 0x3fef0e3a5e1a1788, 0x3fbb09bfb7d11a84
+        .quad 0x3fef14e8211e8c55, 0x3fba64de673e8837
+        .quad 0x3fef1b6d0fea5f4d, 0x3fb9c31c6df3b1b8
+        .quad 0x3fef21c9f12f0677, 0x3fb92470a61b6965
+        .quad 0x3fef27ff89525acf, 0x3fb888d1d8e510a3
+        .quad 0x3fef2e0e9a6a8b09, 0x3fb7f036c0107294
+        .quad 0x3fef33f7e43a706b, 0x3fb75a96077274ba
+        .quad 0x3fef39bc242e43e6, 0x3fb6c7e64e7281cb
+        .quad 0x3fef3f5c1558b19e, 0x3fb6381e2980956b
+        .quad 0x3fef44d870704911, 0x3fb5ab342383d178
+        .quad 0x3fef4a31ebcd47df, 0x3fb5211ebf41880b
+        .quad 0x3fef4f693b67bd77, 0x3fb499d478bca735
+        .quad 0x3fef547f10d60597, 0x3fb4154bc68d75c3
+        .quad 0x3fef59741b4b97cf, 0x3fb3937b1b31925a
+        .quad 0x3fef5e4907982a07, 0x3fb31458e6542847
+        .quad 0x3fef62fe80272419, 0x3fb297db960e4f63
+        .quad 0x3fef67952cff6282, 0x3fb21df9981f8e53
+        .quad 0x3fef6c0db3c34641, 0x3fb1a6a95b1e786f
+        .quad 0x3fef7068b7b10fd9, 0x3fb131e14fa1625d
+        .quad 0x3fef74a6d9a38383, 0x3fb0bf97e95f2a64
+        .quad 0x3fef78c8b812d498, 0x3fb04fc3a0481321
+        .quad 0x3fef7cceef15d631, 0x3fafc4b5e32d6259
+        .quad 0x3fef80ba18636f07, 0x3faeeea8c1b1db94
+        .quad 0x3fef848acb544e95, 0x3fae1d4cf1e2450a
+        .quad 0x3fef88419ce4e184, 0x3fad508f9a1ea64f
+        .quad 0x3fef8bdf1fb78370, 0x3fac885df3451a07
+        .quad 0x3fef8f63e416ebff, 0x3fabc4a54a84e834
+        .quad 0x3fef92d077f8d56d, 0x3fab055303221015
+        .quad 0x3fef96256700da8e, 0x3faa4a549829587e
+        .quad 0x3fef99633a838a57, 0x3fa993979e14fffe
+        .quad 0x3fef9c8a7989af0d, 0x3fa8e109c4622913
+        .quad 0x3fef9f9ba8d3c733, 0x3fa83298d717210e
+        .quad 0x3fefa2974addae45, 0x3fa78832c03aa2b1
+        .quad 0x3fefa57ddfe27376, 0x3fa6e1c5893c380b
+        .quad 0x3fefa84fe5e05c8d, 0x3fa63f3f5c4de13b
+        .quad 0x3fefab0dd89d1309, 0x3fa5a08e85af27e0
+        .quad 0x3fefadb831a9f9c3, 0x3fa505a174e9c929
+        .quad 0x3fefb04f6868a944, 0x3fa46e66be002240
+        .quad 0x3fefb2d3f20f9101, 0x3fa3dacd1a8d8cce
+        .quad 0x3fefb54641aebbc9, 0x3fa34ac36ad8dafe
+        .quad 0x3fefb7a6c834b5a2, 0x3fa2be38b6d92415
+        .quad 0x3fefb9f5f4739170, 0x3fa2351c2f2d1449
+        .quad 0x3fefbc3433260ca5, 0x3fa1af5d2e04f3f6
+        .quad 0x3fefbe61eef4cf6a, 0x3fa12ceb37ff9bc3
+        .quad 0x3fefc07f907bc794, 0x3fa0adb5fcfa8c75
+        .quad 0x3fefc28d7e4f9cd0, 0x3fa031ad58d56279
+        .quad 0x3fefc48c1d033c7a, 0x3f9f7182a851bca2
+        .quad 0x3fefc67bcf2d7b8f, 0x3f9e85c449e377f3
+        .quad 0x3fefc85cf56ecd38, 0x3f9da0005e5f28df
+        .quad 0x3fefca2fee770c79, 0x3f9cc0180af00a8b
+        .quad 0x3fefcbf5170b578b, 0x3f9be5ecd2fcb5f9
+        .quad 0x3fefcdacca0bfb73, 0x3f9b1160991ff737
+        .quad 0x3fefcf57607a6e7c, 0x3f9a4255a00b9f03
+        .quad 0x3fefd0f5317f582f, 0x3f9978ae8b55ce1b
+        .quad 0x3fefd2869270a56f, 0x3f98b44e6031383e
+        .quad 0x3fefd40bd6d7a785, 0x3f97f5188610ddc8
+        .quad 0x3fefd58550773cb5, 0x3f973af0c737bb45
+        .quad 0x3fefd6f34f52013a, 0x3f9685bb5134ef13
+        .quad 0x3fefd85621b0876d, 0x3f95d55cb54cd53a
+        .quad 0x3fefd9ae142795e3, 0x3f9529b9e8cf9a1e
+        .quad 0x3fefdafb719e6a69, 0x3f9482b8455dc491
+        .quad 0x3fefdc3e835500b3, 0x3f93e03d891b37de
+        .quad 0x3fefdd7790ea5bc0, 0x3f93422fd6d12e2b
+        .quad 0x3fefdea6e062d0c9, 0x3f92a875b5ffab56
+        .quad 0x3fefdfccb62e52d3, 0x3f9212f612dee7fb
+        .quad 0x3fefe0e9552ebdd6, 0x3f9181983e5133dd
+        .quad 0x3fefe1fcfebe2083, 0x3f90f443edc5ce49
+        .quad 0x3fefe307f2b503d0, 0x3f906ae13b0d3255
+        .quad 0x3fefe40a6f70af4b, 0x3f8fcab1483ea7fc
+        .quad 0x3fefe504b1d9696c, 0x3f8ec72615a894c4
+        .quad 0x3fefe5f6f568b301, 0x3f8dcaf3691fc448
+        .quad 0x3fefe6e1742f7cf6, 0x3f8cd5ec93c12432
+        .quad 0x3fefe7c466dc57a1, 0x3f8be7e5ac24963b
+        .quad 0x3fefe8a004c19ae6, 0x3f8b00b38d6b3575
+        .quad 0x3fefe97483db8670, 0x3f8a202bd6372dce
+        .quad 0x3fefea4218d6594a, 0x3f894624e78e0faf
+        .quad 0x3fefeb08f7146046, 0x3f887275e3a6869e
+        .quad 0x3fefebc950b3fa75, 0x3f87a4f6aca256cb
+        .quad 0x3fefec835695932e, 0x3f86dd7fe3358230
+        .quad 0x3fefed37386190fb, 0x3f861beae53b72b7
+        .quad 0x3fefede5248e38f4, 0x3f856011cc3b036d
+        .quad 0x3fefee8d486585ee, 0x3f84a9cf6bda3f4c
+        .quad 0x3fefef2fd00af31a, 0x3f83f8ff5042a88e
+        .quad 0x3fefefcce6813974, 0x3f834d7dbc76d7e5
+        .quad 0x3feff064b5afffbe, 0x3f82a727a89a3f14
+        .quad 0x3feff0f766697c76, 0x3f8205dac02bd6b9
+        .quad 0x3feff18520700971, 0x3f81697560347b26
+        .quad 0x3feff20e0a7ba8c2, 0x3f80d1d69569b82d
+        .quad 0x3feff2924a3f7a83, 0x3f803ede1a45bfee
+        .quad 0x3feff312046f2339, 0x3f7f60d8aa2a88f2
+        .quad 0x3feff38d5cc4227f, 0x3f7e4cc4abf7d065
+        .quad 0x3feff404760319b4, 0x3f7d4143a9dfe965
+        .quad 0x3feff47772010262, 0x3f7c3e1a5f5c077c
+        .quad 0x3feff4e671a85425, 0x3f7b430ecf4a83a8
+        .quad 0x3feff55194fe19df, 0x3f7a4fe83fb9db25
+        .quad 0x3feff5b8fb26f5f6, 0x3f79646f35a76624
+        .quad 0x3feff61cc26c1578, 0x3f78806d70b2fc36
+        .quad 0x3feff67d08401202, 0x3f77a3ade6c8b3e5
+        .quad 0x3feff6d9e943c231, 0x3f76cdfcbfc1e263
+        .quad 0x3feff733814af88c, 0x3f75ff2750fe7820
+        .quad 0x3feff789eb6130c9, 0x3f7536fc18f7ce5c
+        .quad 0x3feff7dd41ce2b4d, 0x3f74754abacdf1dc
+        .quad 0x3feff82d9e1a76d8, 0x3f73b9e3f9d06e3f
+        .quad 0x3feff87b1913e853, 0x3f730499b503957f
+        .quad 0x3feff8c5cad200a5, 0x3f72553ee2a336bf
+        .quad 0x3feff90dcaba4096, 0x3f71aba78ba3af89
+        .quad 0x3feff9532f846ab0, 0x3f7107a8c7323a6e
+        .quad 0x3feff9960f3eb327, 0x3f706918b6355624
+        .quad 0x3feff9d67f51ddba, 0x3f6f9f9cfd9c3035
+        .quad 0x3feffa14948549a7, 0x3f6e77448fb66bb9
+        .quad 0x3feffa506302ebae, 0x3f6d58da68fd1170
+        .quad 0x3feffa89fe5b3625, 0x3f6c4412bf4b8f0b
+        .quad 0x3feffac17988ef4b, 0x3f6b38a3af2e55b4
+        .quad 0x3feffaf6e6f4f5c0, 0x3f6a3645330550ff
+        .quad 0x3feffb2a5879f35e, 0x3f693cb11a30d765
+        .quad 0x3feffb5bdf67fe6f, 0x3f684ba3004a50d0
+        .quad 0x3feffb8b8c88295f, 0x3f6762d84469c18f
+        .quad 0x3feffbb970200110, 0x3f66821000795a03
+        .quad 0x3feffbe599f4f9d9, 0x3f65a90b00981d93
+        .quad 0x3feffc10194fcb64, 0x3f64d78bba8ca5fd
+        .quad 0x3feffc38fcffbb7c, 0x3f640d564548fad7
+        .quad 0x3feffc60535dd7f5, 0x3f634a305080681f
+        .quad 0x3feffc862a501fd7, 0x3f628de11c5031eb
+        .quad 0x3feffcaa8f4c9bea, 0x3f61d83170fbf6fb
+        .quad 0x3feffccd8f5c66d1, 0x3f6128eb96be8798
+        .quad 0x3feffcef371ea4d7, 0x3f607fdb4dafea5f
+        .quad 0x3feffd0f92cb6ba7, 0x3f5fb99b8b8279e1
+        .quad 0x3feffd2eae369a07, 0x3f5e7f232d9e2630
+        .quad 0x3feffd4c94d29fdb, 0x3f5d4fed7195d7e8
+        .quad 0x3feffd6951b33686, 0x3f5c2b9cf7f893bf
+        .quad 0x3feffd84ef9009ee, 0x3f5b11d702b3deb2
+        .quad 0x3feffd9f78c7524a, 0x3f5a024365f771bd
+        .quad 0x3feffdb8f7605ee7, 0x3f58fc8c794b03b5
+        .quad 0x3feffdd1750e1220, 0x3f58005f08d6f1ef
+        .quad 0x3feffde8fb314ebf, 0x3f570d6a46e07dda
+        .quad 0x3feffdff92db56e5, 0x3f56235fbd7a4345
+        .quad 0x3feffe1544d01ccb, 0x3f5541f340697987
+        .quad 0x3feffe2a1988857c, 0x3f5468dadf4080ab
+        .quad 0x3feffe3e19349dc7, 0x3f5397ced7af2b15
+        .quad 0x3feffe514bbdc197, 0x3f52ce898809244e
+        .quad 0x3feffe63b8c8b5f7, 0x3f520cc76202c5fb
+        .quad 0x3feffe7567b7b5e1, 0x3f515246dda49d47
+        .quad 0x3feffe865fac722b, 0x3f509ec86c75d497
+        .quad 0x3feffe96a78a04a9, 0x3f4fe41cd9bb4eee
+        .quad 0x3feffea645f6d6da, 0x3f4e97ba3b77f306
+        .quad 0x3feffeb5415e7c44, 0x3f4d57f524723822
+        .quad 0x3feffec39ff380b9, 0x3f4c245d4b99847a
+        .quad 0x3feffed167b12ac2, 0x3f4afc85e0f82e12
+        .quad 0x3feffede9e5d3262, 0x3f49e005769dbc1d
+        .quad 0x3feffeeb49896c6d, 0x3f48ce75e9f6f8a0
+        .quad 0x3feffef76e956a9f, 0x3f47c7744d9378f7
+        .quad 0x3fefff0312b010b5, 0x3f46caa0d3582fe9
+        .quad 0x3fefff0e3ad91ec2, 0x3f45d79eb71e893b
+        .quad 0x3fefff18ebe2b0e1, 0x3f44ee1429bf7cc0
+        .quad 0x3fefff232a72b48e, 0x3f440daa3c89f5b6
+        .quad 0x3fefff2cfb0453d9, 0x3f43360ccd23db3a
+        .quad 0x3fefff3661e9569d, 0x3f4266ea71d4f71a
+        .quad 0x3fefff3f634b79f9, 0x3f419ff4663ae9df
+        .quad 0x3fefff48032dbe40, 0x3f40e0de78654d1e
+        .quad 0x3fefff50456dab8c, 0x3f40295ef6591848
+        .quad 0x3fefff582dc48d30, 0x3f3ef25d37f49fe1
+        .quad 0x3fefff5fbfc8a439, 0x3f3da01102b5f851
+        .quad 0x3fefff66feee5129, 0x3f3c5b5412dcafad
+        .quad 0x3fefff6dee89352e, 0x3f3b23a5a23e4210
+        .quad 0x3fefff7491cd4af6, 0x3f39f8893d8fd1c1
+        .quad 0x3fefff7aebcff755, 0x3f38d986a4187285
+        .quad 0x3fefff80ff8911fd, 0x3f37c629a822bc9e
+        .quad 0x3fefff86cfd3e657, 0x3f36be02102b3520
+        .quad 0x3fefff8c5f702ccf, 0x3f35c0a378c90bca
+        .quad 0x3fefff91b102fca8, 0x3f34cda5374ea275
+        .quad 0x3fefff96c717b695, 0x3f33e4a23d1f4703
+        .quad 0x3fefff9ba420e834, 0x3f330538fbb77ecd
+        .quad 0x3fefffa04a7928b1, 0x3f322f0b496539be
+        .quad 0x3fefffa4bc63ee9a, 0x3f3161be46ad3b50
+        .quad 0x3fefffa8fc0e5f33, 0x3f309cfa445b00ff
+        .quad 0x3fefffad0b901755, 0x3f2fc0d55470cf51
+        .quad 0x3fefffb0ecebee1b, 0x3f2e577bbcd49935
+        .quad 0x3fefffb4a210b172, 0x3f2cfd4a5adec5c0
+        .quad 0x3fefffb82cd9dcbf, 0x3f2bb1a9657ce465
+        .quad 0x3fefffbb8f1049c6, 0x3f2a740684026555
+        .quad 0x3fefffbeca6adbe9, 0x3f2943d4a1d1ed39
+        .quad 0x3fefffc1e08f25f5, 0x3f28208bc334a6a5
+        .quad 0x3fefffc4d3120aa1, 0x3f2709a8db59f25c
+        .quad 0x3fefffc7a37857d2, 0x3f25feada379d8b7
+        .quad 0x3fefffca53375ce3, 0x3f24ff207314a102
+        .quad 0x3fefffcce3b57bff, 0x3f240a8c1949f75e
+        .quad 0x3fefffcf564ab6b7, 0x3f23207fb7420eb9
+        .quad 0x3fefffd1ac4135f9, 0x3f22408e9ba3327f
+        .quad 0x3fefffd3e6d5cd87, 0x3f216a501f0e42ca
+        .quad 0x3fefffd607387b07, 0x3f209d5f819c9e29
+        .quad 0x3fefffd80e8ce0da, 0x3f1fb2b792b40a22
+        .quad 0x3fefffd9fdeabcce, 0x3f1e3bcf436a1a95
+        .quad 0x3fefffdbd65e5ad0, 0x3f1cd55277c18d05
+        .quad 0x3fefffdd98e903b2, 0x3f1b7e94604479dc
+        .quad 0x3fefffdf46816833, 0x3f1a36eec00926dd
+        .quad 0x3fefffe0e0140857, 0x3f18fdc1b2dcf7b9
+        .quad 0x3fefffe26683972a, 0x3f17d2737527c3f9
+        .quad 0x3fefffe3daa95b18, 0x3f16b4702d7d5849
+        .quad 0x3fefffe53d558ae9, 0x3f15a329b7d30748
+        .quad 0x3fefffe68f4fa777, 0x3f149e17724f4d41
+        .quad 0x3fefffe7d156d244, 0x3f13a4b60ba9aa4e
+        .quad 0x3fefffe904222101, 0x3f12b6875310f785
+        .quad 0x3fefffea2860ee1e, 0x3f11d312098e9dba
+        .quad 0x3fefffeb3ebb267b, 0x3f10f9e1b4dd36df
+        .quad 0x3fefffec47d19457, 0x3f102a8673a94692
+        .quad 0x3fefffed443e2787, 0x3f0ec929a665b449
+        .quad 0x3fefffee34943b15, 0x3f0d4f4b4c8e09ed
+        .quad 0x3fefffef1960d85d, 0x3f0be6abbb10a5aa
+        .quad 0x3fefffeff32af7af, 0x3f0a8e8cc1fadef6
+        .quad 0x3feffff0c273bea2, 0x3f094637d5bacfdb
+        .quad 0x3feffff187b6bc0e, 0x3f080cfdc72220cf
+        .quad 0x3feffff2436a21dc, 0x3f06e2367dc27f95
+        .quad 0x3feffff2f5fefcaa, 0x3f05c540b4936fd2
+        .quad 0x3feffff39fe16963, 0x3f04b581b8d170fc
+        .quad 0x3feffff44178c8d2, 0x3f03b2652b06c2b2
+        .quad 0x3feffff4db27f146, 0x3f02bb5cc22e5db6
+        .quad 0x3feffff56d4d5e5e, 0x3f01cfe010e2052d
+        .quad 0x3feffff5f8435efc, 0x3f00ef6c4c84a0fe
+        .quad 0x3feffff67c604180, 0x3f001984165a5f36
+        .quad 0x3feffff6f9f67e55, 0x3efe9b5e8d00ce77
+        .quad 0x3feffff77154e0d6, 0x3efd16f5716c6c1a
+        .quad 0x3feffff7e2c6aea2, 0x3efba4f035d60e03
+        .quad 0x3feffff84e93cd75, 0x3efa447b7b03f045
+        .quad 0x3feffff8b500e77c, 0x3ef8f4ccca7fc90d
+        .quad 0x3feffff9164f8e46, 0x3ef7b5223dac7336
+        .quad 0x3feffff972be5c59, 0x3ef684c227fcacef
+        .quad 0x3feffff9ca891572, 0x3ef562fac4329b48
+        .quad 0x3feffffa1de8c582, 0x3ef44f21e49054f2
+        .quad 0x3feffffa6d13de73, 0x3ef34894a5e24657
+        .quad 0x3feffffab83e54b8, 0x3ef24eb7254ccf83
+        .quad 0x3feffffaff99bac4, 0x3ef160f438c70913
+        .quad 0x3feffffb43555b5f, 0x3ef07ebd2a2d2844
+        .quad 0x3feffffb839e52f3, 0x3eef4f12e9ab070a
+        .quad 0x3feffffbc09fa7cd, 0x3eedb5ad0b27805c
+        .quad 0x3feffffbfa82616b, 0x3eec304efa2c6f4e
+        .quad 0x3feffffc316d9ed0, 0x3eeabe09e9144b5e
+        .quad 0x3feffffc6586abf6, 0x3ee95df988e76644
+        .quad 0x3feffffc96f1165e, 0x3ee80f439b4ee04b
+        .quad 0x3feffffcc5cec0c1, 0x3ee6d11788a69c64
+        .quad 0x3feffffcf23ff5fc, 0x3ee5a2adfa0b4bc4
+        .quad 0x3feffffd1c637b2b, 0x3ee4834877429b8f
+        .quad 0x3feffffd4456a10d, 0x3ee37231085c7d9a
+        .quad 0x3feffffd6a3554a1, 0x3ee26eb9daed6f7e
+        .quad 0x3feffffd8e1a2f22, 0x3ee1783ceac28910
+        .quad 0x3feffffdb01e8546, 0x3ee08e1badf0fced
+        .quad 0x3feffffdd05a75ea, 0x3edf5f7d88472604
+        .quad 0x3feffffdeee4f810, 0x3eddb92b5212fb8d
+        .quad 0x3feffffe0bd3e852, 0x3edc282cd3957eda
+        .quad 0x3feffffe273c15b7, 0x3edaab7abace48dc
+        .quad 0x3feffffe41314e06, 0x3ed94219bfcb4928
+        .quad 0x3feffffe59c6698b, 0x3ed7eb1a2075864e
+        .quad 0x3feffffe710d565e, 0x3ed6a597219a93da
+        .quad 0x3feffffe8717232d, 0x3ed570b69502f313
+        .quad 0x3feffffe9bf4098c, 0x3ed44ba864670882
+        .quad 0x3feffffeafb377d5, 0x3ed335a62115bce2
+        .quad 0x3feffffec2641a9e, 0x3ed22df298214423
+        .quad 0x3feffffed413e5b7, 0x3ed133d96ae7e0dd
+        .quad 0x3feffffee4d01cd6, 0x3ed046aeabcfcdec
+        .quad 0x3feffffef4a55bd4, 0x3ececb9cfe1d8642
+        .quad 0x3fefffff039f9e8f, 0x3ecd21397ead99cb
+        .quad 0x3fefffff11ca4876, 0x3ecb8d094c86d374
+        .quad 0x3fefffff1f302bc1, 0x3eca0df0f0c626dc
+        .quad 0x3fefffff2bdb904d, 0x3ec8a2e269750a39
+        .quad 0x3fefffff37d63a36, 0x3ec74adc8f4064d3
+        .quad 0x3fefffff43297019, 0x3ec604ea819f007c
+        .quad 0x3fefffff4dde0118, 0x3ec4d0231928c6f9
+        .quad 0x3fefffff57fc4a95, 0x3ec3aba85fe22e20
+        .quad 0x3fefffff618c3da6, 0x3ec296a70f414053
+        .quad 0x3fefffff6a956450, 0x3ec1905613b3abf2
+        .quad 0x3fefffff731ee681, 0x3ec097f6156f32c5
+        .quad 0x3fefffff7b2f8ed6, 0x3ebf59a20caf6695
+        .quad 0x3fefffff82cdcf1b, 0x3ebd9c73698fb1dc
+        .quad 0x3fefffff89ffc4aa, 0x3ebbf716c6168bae
+        .quad 0x3fefffff90cb3c81, 0x3eba6852c6b58392
+        .quad 0x3fefffff9735b73b, 0x3eb8eefd70594a89
+        .quad 0x3fefffff9d446ccc, 0x3eb789fb715aae95
+        .quad 0x3fefffffa2fc5015, 0x3eb6383f726a8e04
+        .quad 0x3fefffffa8621251, 0x3eb4f8c96f26a26a
+        .quad 0x3fefffffad7a2652, 0x3eb3caa61607f920
+        .quad 0x3fefffffb248c39d, 0x3eb2acee2f5ecdb8
+        .quad 0x3fefffffb6d1e95d, 0x3eb19ec60b1242ed
+        .quad 0x3fefffffbb196132, 0x3eb09f5cf4dd2877
+        .quad 0x3fefffffbf22c1e2, 0x3eaf5bd95d8730d8
+        .quad 0x3fefffffc2f171e3, 0x3ead9371e2ff7c35
+        .quad 0x3fefffffc688a9cf, 0x3eabe41de54d155a
+        .quad 0x3fefffffc9eb76ac, 0x3eaa4c89e08ef4f3
+        .quad 0x3fefffffcd1cbc28, 0x3ea8cb738399b12c
+        .quad 0x3fefffffd01f36af, 0x3ea75fa8dbc84bec
+        .quad 0x3fefffffd2f57d68, 0x3ea608078a70dcbc
+        .quad 0x3fefffffd5a2041f, 0x3ea4c37c0394d094
+        .quad 0x3fefffffd8271d12, 0x3ea39100d5687bfe
+        .quad 0x3fefffffda86faa9, 0x3ea26f9df8519bd7
+        .quad 0x3fefffffdcc3b117, 0x3ea15e6827001f18
+        .quad 0x3fefffffdedf37ed, 0x3ea05c803e4831c1
+        .quad 0x3fefffffe0db6b91, 0x3e9ed22548cffd35
+        .quad 0x3fefffffe2ba0ea5, 0x3e9d06ad6ecdf971
+        .quad 0x3fefffffe47ccb60, 0x3e9b551c847fbc96
+        .quad 0x3fefffffe62534d4, 0x3e99bc09f112b494
+        .quad 0x3fefffffe7b4c81e, 0x3e983a1ff0aa239d
+        .quad 0x3fefffffe92ced93, 0x3e96ce1aa3fd7bdd
+        .quad 0x3fefffffea8ef9cf, 0x3e9576c72b514859
+        .quad 0x3fefffffebdc2ec6, 0x3e943302cc4a0da8
+        .quad 0x3fefffffed15bcba, 0x3e9301ba221dc9bb
+        .quad 0x3fefffffee3cc32c, 0x3e91e1e857adc568
+        .quad 0x3fefffffef5251c2, 0x3e90d2966b1746f7
+        .quad 0x3feffffff0576917, 0x3e8fa5b4f49cc6b2
+        .quad 0x3feffffff14cfb92, 0x3e8dc3ae30b55c16
+        .quad 0x3feffffff233ee1d, 0x3e8bfd7555a3bd68
+        .quad 0x3feffffff30d18e8, 0x3e8a517d9e61628a
+        .quad 0x3feffffff3d9480f, 0x3e88be4f8f6c951f
+        .quad 0x3feffffff4993c46, 0x3e874287ded49339
+        .quad 0x3feffffff54dab72, 0x3e85dcd669f2cd34
+        .quad 0x3feffffff5f74141, 0x3e848bfd38302871
+        .quad 0x3feffffff6969fb8, 0x3e834ecf8a3c124a
+        .quad 0x3feffffff72c5fb6, 0x3e822430f521cbcf
+        .quad 0x3feffffff7b91176, 0x3e810b1488aeb235
+        .quad 0x3feffffff83d3d07, 0x3e80027c00a263a6
+        .quad 0x3feffffff8b962be, 0x3e7e12ee004efc37
+        .quad 0x3feffffff92dfba2, 0x3e7c3e44ae32b16b
+        .quad 0x3feffffff99b79d2, 0x3e7a854ea14102a8
+        .quad 0x3feffffffa0248e8, 0x3e78e6761569f45d
+        .quad 0x3feffffffa62ce54, 0x3e77603bac345f65
+        .quad 0x3feffffffabd69b4, 0x3e75f1353cdad001
+        .quad 0x3feffffffb127525, 0x3e74980cb3c80949
+        .quad 0x3feffffffb624592, 0x3e73537f00b6ad4d
+        .quad 0x3feffffffbad2aff, 0x3e72225b12bffc68
+        .quad 0x3feffffffbf370cd, 0x3e710380e1adb7e9
+        .quad 0x3feffffffc355dfd, 0x3e6febc107d5efaa
+        .quad 0x3feffffffc733572, 0x3e6df0f2a0ee6947
+        .quad 0x3feffffffcad3626, 0x3e6c14b2188bcee4
+        .quad 0x3feffffffce39b67, 0x3e6a553644f7f07d
+        .quad 0x3feffffffd169d0c, 0x3e68b0cfce0579e0
+        .quad 0x3feffffffd466fa5, 0x3e6725e7c5dd20f7
+        .quad 0x3feffffffd7344aa, 0x3e65b2fe547a1340
+        .quad 0x3feffffffd9d4aab, 0x3e6456a974e92e93
+        .quad 0x3feffffffdc4ad7a, 0x3e630f93c3699078
+        .quad 0x3feffffffde9964e, 0x3e61dc7b5b978cf8
+        .quad 0x3feffffffe0c2bf0, 0x3e60bc30c5d52f15
+        .quad 0x3feffffffe2c92db, 0x3e5f5b2be65a0c7f
+        .quad 0x3feffffffe4aed5e, 0x3e5d5f3a8dea7357
+        .quad 0x3feffffffe675bbd, 0x3e5b82915b03515b
+        .quad 0x3feffffffe81fc4e, 0x3e59c3517e789488
+        .quad 0x3feffffffe9aeb97, 0x3e581fb7df06136e
+        .quad 0x3feffffffeb24467, 0x3e56961b8d641d06
+        .quad 0x3feffffffec81ff2, 0x3e5524ec4d916cae
+        .quad 0x3feffffffedc95e7, 0x3e53cab1343d18d1
+        .quad 0x3feffffffeefbc85, 0x3e52860757487a01
+        .quad 0x3fefffffff01a8b6, 0x3e5155a09065d4f7
+        .quad 0x3fefffffff126e1e, 0x3e50384250e4c9fc
+        .quad 0x3fefffffff221f30, 0x3e4e59890b926c78
+        .quad 0x3fefffffff30cd3f, 0x3e4c642116a8a9e3
+        .quad 0x3fefffffff3e8892, 0x3e4a8e405e651ab6
+        .quad 0x3fefffffff4b606f, 0x3e48d5f98114f872
+        .quad 0x3fefffffff57632d, 0x3e47397c5a66e307
+        .quad 0x3fefffffff629e44, 0x3e45b71456c5a4c4
+        .quad 0x3fefffffff6d1e56, 0x3e444d26de513197
+        .quad 0x3fefffffff76ef3f, 0x3e42fa31d6371537
+        .quad 0x3fefffffff801c1f, 0x3e41bcca373b7b43
+        .quad 0x3fefffffff88af67, 0x3e40939ab853339f
+        .quad 0x3fefffffff90b2e3, 0x3e3efac5187b2863
+        .quad 0x3fefffffff982fc1, 0x3e3cf1e86235d0e7
+        .quad 0x3fefffffff9f2e9f, 0x3e3b0a68a2128bab
+        .quad 0x3fefffffffa5b790, 0x3e39423165bc4444
+        .quad 0x3fefffffffabd229, 0x3e37974e743dea3d
+        .quad 0x3fefffffffb18582, 0x3e3607e9eacd1050
+        .quad 0x3fefffffffb6d844, 0x3e34924a74dec729
+        .quad 0x3fefffffffbbd0aa, 0x3e3334d19e0c2160
+        .quad 0x3fefffffffc0748f, 0x3e31edfa3c5f5cca
+        .quad 0x3fefffffffc4c96c, 0x3e30bc56f1b54701
+        .quad 0x3fefffffffc8d462, 0x3e2f3d2185e047d9
+        .quad 0x3fefffffffcc9a41, 0x3e2d26cb87945e87
+        .quad 0x3fefffffffd01f89, 0x3e2b334fac4b9f99
+        .quad 0x3fefffffffd36871, 0x3e296076f7918d1c
+        .quad 0x3fefffffffd678ed, 0x3e27ac2d72fc2c63
+        .quad 0x3fefffffffd954ae, 0x3e2614801550319e
+        .quad 0x3fefffffffdbff2a, 0x3e24979ac8b28927
+        .quad 0x3fefffffffde7ba0, 0x3e2333c68e2d0548
+        .quad 0x3fefffffffe0cd16, 0x3e21e767bce37dd7
+        .quad 0x3fefffffffe2f664, 0x3e20b0fc5b6d05a0
+        .quad 0x3fefffffffe4fa30, 0x3e1f1e3523b41d7d
+        .quad 0x3fefffffffe6daf7, 0x3e1d00de6608effe
+        .quad 0x3fefffffffe89b0c, 0x3e1b0778b7b3301b
+        .quad 0x3fefffffffea3c9a, 0x3e192fb04ec0f6cf
+        .quad 0x3fefffffffebc1a9, 0x3e177756ec9f78fa
+        .quad 0x3fefffffffed2c21, 0x3e15dc61922d5a06
+        .quad 0x3fefffffffee7dc8, 0x3e145ce65699ff6d
+        .quad 0x3fefffffffefb847, 0x3e12f71a5f159970
+        .quad 0x3feffffffff0dd2b, 0x3e11a94ff571654f
+        .quad 0x3feffffffff1ede9, 0x3e1071f4bbea09ec
+        .quad 0x3feffffffff2ebda, 0x3e0e9f1ff8ddd774
+        .quad 0x3feffffffff3d843, 0x3e0c818223a202c7
+        .quad 0x3feffffffff4b453, 0x3e0a887bd2b4404d
+        .quad 0x3feffffffff58126, 0x3e08b1a336c5eb6b
+        .quad 0x3feffffffff63fc3, 0x3e06fab63324088a
+        .quad 0x3feffffffff6f121, 0x3e056197e30205ba
+        .quad 0x3feffffffff79626, 0x3e03e44e45301b92
+        .quad 0x3feffffffff82fab, 0x3e0281000bfe4c3f
+        .quad 0x3feffffffff8be77, 0x3e0135f28f2d50b4
+        .quad 0x3feffffffff94346, 0x3e000187dded5975
+        .quad 0x3feffffffff9bec8, 0x3dfdc479de0ef001
+        .quad 0x3feffffffffa319f, 0x3dfbad4fdad3caa1
+        .quad 0x3feffffffffa9c63, 0x3df9baed3ed27ab8
+        .quad 0x3feffffffffaffa4, 0x3df7ead9ce4285bb
+        .quad 0x3feffffffffb5be5, 0x3df63ac6b4edc88e
+        .quad 0x3feffffffffbb1a2, 0x3df4a88be2a6390c
+        .quad 0x3feffffffffc014e, 0x3df332259185f1a0
+        .quad 0x3feffffffffc4b56, 0x3df1d5b1f3793044
+        .quad 0x3feffffffffc901c, 0x3df0916f04b6e18b
+        .quad 0x3feffffffffccfff, 0x3deec77101de6926
+        .quad 0x3feffffffffd0b56, 0x3dec960bf23153e0
+        .quad 0x3feffffffffd4271, 0x3dea8bd20fc65ef7
+        .quad 0x3feffffffffd759d, 0x3de8a61745ec7d1d
+        .quad 0x3feffffffffda520, 0x3de6e25d0e756261
+        .quad 0x3feffffffffdd13c, 0x3de53e4f7d1666cb
+        .quad 0x3feffffffffdfa2d, 0x3de3b7c27a7ddb0e
+        .quad 0x3feffffffffe202d, 0x3de24caf2c32af14
+        .quad 0x3feffffffffe4371, 0x3de0fb3186804d0f
+        .quad 0x3feffffffffe642a, 0x3ddf830c0bb41fd7
+        .quad 0x3feffffffffe8286, 0x3ddd3c0f1a91c846
+        .quad 0x3feffffffffe9eb0, 0x3ddb1e5acf351d87
+        .quad 0x3feffffffffeb8d0, 0x3dd92712d259ce66
+        .quad 0x3feffffffffed10a, 0x3dd7538c60a04476
+        .quad 0x3feffffffffee782, 0x3dd5a14b04b47879
+        .quad 0x3feffffffffefc57, 0x3dd40dfd87456f4c
+        .quad 0x3fefffffffff0fa7, 0x3dd2977b1172b9d5
+        .quad 0x3fefffffffff218f, 0x3dd13bc07e891491
+        .quad 0x3fefffffffff3227, 0x3dcff1dbb4300811
+        .quad 0x3fefffffffff4188, 0x3dcd9a880f306bd8
+        .quad 0x3fefffffffff4fc9, 0x3dcb6e45220b55e0
+        .quad 0x3fefffffffff5cfd, 0x3dc96a0b33f2c4da
+        .quad 0x3fefffffffff6939, 0x3dc78b07e9e924ac
+        .quad 0x3fefffffffff748e, 0x3dc5ce9ab1670dd2
+        .quad 0x3fefffffffff7f0d, 0x3dc4325167006bb0
+        .quad 0x3fefffffffff88c5, 0x3dc2b3e53538ff3f
+        .quad 0x3fefffffffff91c6, 0x3dc15137a7f44864
+        .quad 0x3fefffffffff9a1b, 0x3dc0084ff125639d
+        .quad 0x3fefffffffffa1d2, 0x3dbdaeb0b7311ec7
+        .quad 0x3fefffffffffa8f6, 0x3dbb7937d1c40c53
+        .quad 0x3fefffffffffaf92, 0x3db96d082f59ab06
+        .quad 0x3fefffffffffb5b0, 0x3db7872d9fa10aad
+        .quad 0x3fefffffffffbb58, 0x3db5c4e8e37bc7d0
+        .quad 0x3fefffffffffc095, 0x3db423ac0df49a40
+        .quad 0x3fefffffffffc56d, 0x3db2a117230ad284
+        .quad 0x3fefffffffffc9e8, 0x3db13af4f04f9998
+        .quad 0x3fefffffffffce0d, 0x3dafde703724e560
+        .quad 0x3fefffffffffd1e1, 0x3dad77f0c82e7641
+        .quad 0x3fefffffffffd56c, 0x3dab3ee02611d7dd
+        .quad 0x3fefffffffffd8b3, 0x3da92ff33023d5bd
+        .quad 0x3fefffffffffdbba, 0x3da7481a9e69f53f
+        .quad 0x3fefffffffffde86, 0x3da5847eda620959
+        .quad 0x3fefffffffffe11d, 0x3da3e27c1fcc74bd
+        .quad 0x3fefffffffffe380, 0x3da25f9ee0b923dc
+        .quad 0x3fefffffffffe5b6, 0x3da0f9a068653200
+        .quad 0x3fefffffffffe7c0, 0x3d9f5cc7718082b0
+        .quad 0x3fefffffffffe9a2, 0x3d9cf7e53d6a2ca5
+        .quad 0x3fefffffffffeb60, 0x3d9ac0f5f3229372
+        .quad 0x3fefffffffffecfb, 0x3d98b498644847ea
+        .quad 0x3fefffffffffee77, 0x3d96cfa9bcca59dc
+        .quad 0x3fefffffffffefd6, 0x3d950f411d4fd2cd
+        .quad 0x3feffffffffff11a, 0x3d9370ab8327af5e
+        .quad 0x3feffffffffff245, 0x3d91f167f88c6b6e
+        .quad 0x3feffffffffff359, 0x3d908f24085d4597
+        .quad 0x3feffffffffff457, 0x3d8e8f70e181d61a
+        .quad 0x3feffffffffff542, 0x3d8c324c20e337dc
+        .quad 0x3feffffffffff61b, 0x3d8a03261574b54e
+        .quad 0x3feffffffffff6e3, 0x3d87fe903cdf5855
+        .quad 0x3feffffffffff79b, 0x3d86215c58da3450
+        .quad 0x3feffffffffff845, 0x3d846897d4b69fc6
+        .quad 0x3feffffffffff8e2, 0x3d82d1877d731b7b
+        .quad 0x3feffffffffff973, 0x3d8159a386b11517
+        .quad 0x3feffffffffff9f8, 0x3d7ffd27ae9393ce
+        .quad 0x3feffffffffffa73, 0x3d7d7c593130dd0b
+        .quad 0x3feffffffffffae4, 0x3d7b2cd607c79bcf
+        .quad 0x3feffffffffffb4c, 0x3d790ae4d3405651
+        .quad 0x3feffffffffffbad, 0x3d771312dd1759e2
+        .quad 0x3feffffffffffc05, 0x3d75422ef5d8949d
+        .quad 0x3feffffffffffc57, 0x3d739544b0ecc957
+        .quad 0x3feffffffffffca2, 0x3d720997f73e73dd
+        .quad 0x3feffffffffffce7, 0x3d709ca0eaacd277
+        .quad 0x3feffffffffffd27, 0x3d6e9810295890ec
+        .quad 0x3feffffffffffd62, 0x3d6c2b45b5aa4a1d
+        .quad 0x3feffffffffffd98, 0x3d69eee068fa7596
+        .quad 0x3feffffffffffdca, 0x3d67df2b399c10a8
+        .quad 0x3feffffffffffdf8, 0x3d65f8b87a31bd85
+        .quad 0x3feffffffffffe22, 0x3d64385c96e9a2d9
+        .quad 0x3feffffffffffe49, 0x3d629b2933ef4cbc
+        .quad 0x3feffffffffffe6c, 0x3d611e68a6378f8a
+        .quad 0x3feffffffffffe8d, 0x3d5f7f338086a86b
+        .quad 0x3feffffffffffeab, 0x3d5cf8d7d9ce040a
+        .quad 0x3feffffffffffec7, 0x3d5aa577251ae485
+        .quad 0x3feffffffffffee1, 0x3d58811d739efb5f
+        .quad 0x3feffffffffffef8, 0x3d568823e52970be
+        .quad 0x3fefffffffffff0e, 0x3d54b72ae68e8b4c
+        .quad 0x3fefffffffffff22, 0x3d530b14dbe876bc
+        .quad 0x3fefffffffffff34, 0x3d5181012ef86610
+        .quad 0x3fefffffffffff45, 0x3d501647ba798745
+        .quad 0x3fefffffffffff54, 0x3d4d90e917701675
+        .quad 0x3fefffffffffff62, 0x3d4b2a87e86d0c8a
+        .quad 0x3fefffffffffff6f, 0x3d48f53dcb377293
+        .quad 0x3fefffffffffff7b, 0x3d46ed2f2515e933
+        .quad 0x3fefffffffffff86, 0x3d450ecc9ed47f19
+        .quad 0x3fefffffffffff90, 0x3d4356cd5ce7799e
+        .quad 0x3fefffffffffff9a, 0x3d41c229a587ab78
+        .quad 0x3fefffffffffffa2, 0x3d404e15ecc7f3f6
+        .quad 0x3fefffffffffffaa, 0x3d3deffc7e6a6017
+        .quad 0x3fefffffffffffb1, 0x3d3b7b040832f310
+        .quad 0x3fefffffffffffb8, 0x3d3938e021f36d76
+        .quad 0x3fefffffffffffbe, 0x3d37258610b3b233
+        .quad 0x3fefffffffffffc3, 0x3d353d3bfc82a909
+        .quad 0x3fefffffffffffc8, 0x3d337c92babdc2fd
+        .quad 0x3fefffffffffffcd, 0x3d31e06010120f6a
+        .quad 0x3fefffffffffffd1, 0x3d3065b9616170d4
+        .quad 0x3fefffffffffffd5, 0x3d2e13dd96b3753b
+        .quad 0x3fefffffffffffd9, 0x3d2b950d32467392
+        .quad 0x3fefffffffffffdc, 0x3d294a72263259a5
+        .quad 0x3fefffffffffffdf, 0x3d272fd93e036cdc
+        .quad 0x3fefffffffffffe2, 0x3d254164576929ab
+        .quad 0x3fefffffffffffe4, 0x3d237b83c521fe96
+        .quad 0x3fefffffffffffe7, 0x3d21daf033182e96
+        .quad 0x3fefffffffffffe9, 0x3d205ca50205d26a
+        .quad 0x3fefffffffffffeb, 0x3d1dfbb6235639fa
+        .quad 0x3fefffffffffffed, 0x3d1b7807e294781f
+        .quad 0x3fefffffffffffee, 0x3d19298add70a734
+        .quad 0x3feffffffffffff0, 0x3d170beaf9c7ffb6
+        .quad 0x3feffffffffffff1, 0x3d151b2cd6709222
+        .quad 0x3feffffffffffff3, 0x3d1353a6cf7f7fff
+        .quad 0x3feffffffffffff4, 0x3d11b1fa8cbe84a7
+        .quad 0x3feffffffffffff5, 0x3d10330f0fd69921
+        .quad 0x3feffffffffffff6, 0x3d0da81670f96f9b
+        .quad 0x3feffffffffffff7, 0x3d0b24a16b4d09aa
+        .quad 0x3feffffffffffff7, 0x3d08d6eeb6efdbd6
+        .quad 0x3feffffffffffff8, 0x3d06ba91ac734786
+        .quad 0x3feffffffffffff9, 0x3d04cb7966770ab5
+        .quad 0x3feffffffffffff9, 0x3d0305e9721d0981
+        .quad 0x3feffffffffffffa, 0x3d01667311fff70a
+        .quad 0x3feffffffffffffb, 0x3cffd3de10d62855
+        .quad 0x3feffffffffffffb, 0x3cfd1aefbcd48d0c
+        .quad 0x3feffffffffffffb, 0x3cfa9cc93c25aca9
+        .quad 0x3feffffffffffffc, 0x3cf85487ee3ea735
+        .quad 0x3feffffffffffffc, 0x3cf63daf8b4b1e0c
+        .quad 0x3feffffffffffffd, 0x3cf45421e69a6ca1
+        .quad 0x3feffffffffffffd, 0x3cf294175802d99a
+        .quad 0x3feffffffffffffd, 0x3cf0fa17bf41068f
+        .quad 0x3feffffffffffffd, 0x3cef05e82aae2bb9
+        .quad 0x3feffffffffffffe, 0x3cec578101b29058
+        .quad 0x3feffffffffffffe, 0x3ce9e39dc5dd2f7c
+        .quad 0x3feffffffffffffe, 0x3ce7a553a728bbf2
+        .quad 0x3feffffffffffffe, 0x3ce5982008db1304
+        .quad 0x3feffffffffffffe, 0x3ce3b7e00422e51b
+        .quad 0x3feffffffffffffe, 0x3ce200c898d9ee3e
+        .quad 0x3fefffffffffffff, 0x3ce06f5f7eb65a56
+        .quad 0x3fefffffffffffff, 0x3cde00e9148a1d25
+        .quad 0x3fefffffffffffff, 0x3cdb623734024e92
+        .quad 0x3fefffffffffffff, 0x3cd8fd4e01891bf8
+        .quad 0x3fefffffffffffff, 0x3cd6cd44c7470d89
+        .quad 0x3fefffffffffffff, 0x3cd4cd9c04158cd7
+        .quad 0x3fefffffffffffff, 0x3cd2fa34bf5c8344
+        .quad 0x3fefffffffffffff, 0x3cd14f4890ff2461
+        .quad 0x3fefffffffffffff, 0x3ccf92c49dfa4df5
+        .quad 0x3fefffffffffffff, 0x3ccccaaea71ab0df
+        .quad 0x3fefffffffffffff, 0x3cca40829f001197
+        .quad 0x3ff0000000000000, 0x3cc7eef13b59e96c
+        .quad 0x3ff0000000000000, 0x3cc5d11e1a252bf5
+        .quad 0x3ff0000000000000, 0x3cc3e296303b2297
+        .quad 0x3ff0000000000000, 0x3cc21f47009f43ce
+        .quad 0x3ff0000000000000, 0x3cc083768c5e4542
+        .quad 0x3ff0000000000000, 0x3cbe1777d831265f
+        .quad 0x3ff0000000000000, 0x3cbb69f10b0191b5
+        .quad 0x3ff0000000000000, 0x3cb8f8a3a05b5b53
+        .quad 0x3ff0000000000000, 0x3cb6be573c40c8e7
+        .quad 0x3ff0000000000000, 0x3cb4b645ba991fdb
+        .align 32
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff  /* _AbsMask */
+        .align 32
+        .quad 0x4017f80000000000, 0x4017f80000000000, 0x4017f80000000000, 0x4017f80000000000  /* _MaxThreshold = 6.0 - 1.0/128.0 */
+        .align 32
+        .quad 0x42c0000000000000, 0x42c0000000000000, 0x42c0000000000000, 0x42c0000000000000  /* SRound */
+        .align 32
+        .quad 0x2ff0000000000000, 0x2ff0000000000000, 0x2ff0000000000000, 0x2ff0000000000000  /* _U2Threshold */
+        .align 32
+        .quad 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5  /* _poly_1_0 */
+        .align 32
+        .quad 0x3fc1111235a363b1, 0x3fc1111235a363b1, 0x3fc1111235a363b1, 0x3fc1111235a363b1  /* _poly_1_1 */
+        .align 32
+        .quad 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57  /* _poly_3_0 */
+        .align 32
+        .quad 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8  /* _poly_3_1 */
+        .align 32
+        .quad 0xbfc5555800001B4F, 0xbfc5555800001B4F, 0xbfc5555800001B4F, 0xbfc5555800001B4F  /* _poly_5_0 */
+        .align 32
+        .quad 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122  /* _poly_5_1 */
+        .align 32
+        .quad 0xbfd55555555547f6, 0xbfd55555555547f6, 0xbfd55555555547f6, 0xbfd55555555547f6  /* _poly_1_2 */
+        .align 32
+        .quad 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd  /* _poly_3_2 */
+        .align 32
+        .quad 0x3fe5555555554b0c, 0x3fe5555555554b0c, 0x3fe5555555554b0c, 0x3fe5555555554b0c  /* _poly_1_3 */
+        .align 32
+        .quad 0xbfd5555555555555, 0xbfd5555555555555, 0xbfd5555555555555, 0xbfd5555555555555  /* _poly_3_3 */
+        .align 32
+        .quad 0x00000000ffffffff, 0x00000000ffffffff, 0x00000000ffffffff, 0x00000000ffffffff  /* _Mask32 */
+        .align 32
+        .type	__svml_derf_data_internal,@object
+        .size	__svml_derf_data_internal,.-__svml_derf_data_internal
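For reference, the SRound and _MaxThreshold constants above encode a small
amount of arithmetic: 0x42c0000000000000 is 2^45, so adding it to the clamped
|x| and subtracting it back keeps exactly 7 fractional bits of |x|
(2^45 * 2^-52 == 2^-7 == 1/128), and 0x4017f80000000000 is 6.0 - 1.0/128.0.
A minimal stand-alone check, assuming IEEE-754 binary64 doubles (illustrative
only, not part of the patch):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

static double
from_bits (uint64_t u)
{
  double d;
  /* Reinterpret the 64-bit pattern as a double; assumes binary64.  */
  memcpy (&d, &u, sizeof d);
  return d;
}

int
main (void)
{
  /* Prints 0x1p+45 and 5.9921875 respectively.  */
  printf ("SRound        = %a\n", from_bits (0x42c0000000000000ULL));
  printf ("_MaxThreshold = %.7f\n", from_bits (0x4017f80000000000ULL));
  return 0;
}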
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core-avx2.S
new file mode 100644
index 0000000000..3456142289
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized erf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN8v_erf _ZGVeN8v_erf_avx2_wrapper
+#include "../svml_d_erf8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core.c
new file mode 100644
index 0000000000..78e4a852c6
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized erf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN8v_erf
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN8v_erf, __GI__ZGVeN8v_erf, __redirect__ZGVeN8v_erf)
+  __attribute__ ((visibility ("hidden")));
+#endif
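The ifunc file above follows the usual libmvec multiarch pattern: the public
_ZGVeN8v_erf symbol is bound at load time either to the AVX-512 kernel or to
the AVX2 wrapper, with the actual CPU-feature checks living in
ifunc-mathvec-avx512-skx.h.  A self-contained model of that dispatch using
the generic GNU ifunc attribute (the names, the avx512dq test and the stub
bodies below are placeholders, not the glibc-internal macros):

#include <math.h>
#include <stdio.h>

typedef double erf_fn (double);

/* Stand-ins for _ZGVeN8v_erf_skx and _ZGVeN8v_erf_avx2_wrapper.  */
static double erf_impl_skx (double x)  { return erf (x); }
static double erf_impl_avx2 (double x) { return erf (x); }

/* Runs once at symbol-resolution time and picks an implementation.  */
static erf_fn *
resolve_vec_erf (void)
{
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("avx512dq") ? erf_impl_skx : erf_impl_avx2;
}

double vec_erf (double x) __attribute__ ((ifunc ("resolve_vec_erf")));

int
main (void)
{
  printf ("vec_erf (1.0) = %f\n", vec_erf (1.0));
  return 0;
}

(Build as, e.g., gcc -O2 model.c -lm on an ELF target that supports ifunc.)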
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core_avx512.S
new file mode 100644
index 0000000000..38f373102a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core_avx512.S
@@ -0,0 +1,983 @@
+/* Function erf vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Basic formula is
+ *    erf(x) ~ erf(x0)
+ *              + exp(-x0*x0)*D*(1+c0+T*P1(T)+D^2*P3(T)+D^4*P5(T)+D^6*p7+D^8*p9)
+ *   where D=x-x0, T=x0*D
+ *   x0 is x rounded to a specified number of fractional bits (in this case 7),
+ *    except that x0=0 for |x|<3.5/128.0 (using x0=0 for first 4 table entries)
+ *
+ *   Data table packs both erf(x0)_high and a few bits of erf(x0)_low in one
+ *   entry (in place of redundant exponent bits)
+ *
+ */
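As a reading aid for the assembly below, here is a minimal scalar sketch of
this scheme (not the kernel itself): it recomputes the two per-entry table
values with libm instead of reading _erf_tbl, and it substitutes a low-order
Taylor truncation for the fitted _poly* coefficients.  Compile without
-ffast-math so the 2^45 shifter trick is not folded away, and link with -lm:

#include <math.h>
#include <stdio.h>

static double
erf_sketch (double x)
{
  double ax = fmin (fabs (x), 5.9921875);   /* clamp to _MaxThreshold */
  double x0 = (ax + 0x1p45) - 0x1p45;       /* round to 7 fractional bits */
  double d = ax - x0;                       /* D in the comment above */
  double t = x0 * d;                        /* T in the comment above */
  /* One _erf_tbl entry holds erf(x0) and 2/sqrt(pi)*exp(-x0*x0).  */
  double erf_hi = erf (x0);
  double exp_hi = 1.12837916709551257 * exp (-x0 * x0);
  /* Low-order stand-in for (1 + c0 + T*P1(T) + D^2*P3(T) + ...).  */
  double corr = 1.0 - t + (2.0 * t * t - d * d) / 3.0;
  return copysign (erf_hi + exp_hi * d * corr, x);   /* keeps erf(-0) == -0 */
}

int
main (void)
{
  for (double x = -2.0; x <= 2.0; x += 0.25)
    printf ("% .2f  % .17g  % .17g\n", x, erf_sketch (x), erf (x));
  return 0;
}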
+
+/* Offsets for data table __svml_derf_data_internal
+ */
+#define _erf_tbl                      	0
+#define _AbsMask                      	12288
+#define _MaxThreshold                 	12352
+#define _SRound                       	12416
+#define _U2Threshold                  	12480
+#define _poly1_0                      	12544
+#define _poly1_1                      	12608
+#define _poly3_0                      	12672
+#define _poly3_1                      	12736
+#define _poly5_0                      	12800
+#define _poly5_1                      	12864
+#define _poly1_2                      	12928
+#define _poly3_2                      	12992
+#define _poly1_3                      	13056
+#define _poly3_3                      	13120
+#define _Mask32                       	13184
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN8v_erf_skx)
+/*
+ * vector gather: erf(x0),
+ * second value is exp(-x0*x0)
+ */
+        lea       __svml_derf_data_internal(%rip), %rax
+
+/*
+ * erf(x) rounds to 1.0 for x>_MaxThreshold (5.9921875)
+ * can compute all results in the main path
+ */
+        vmovups   _MaxThreshold+__svml_derf_data_internal(%rip), %zmm9
+        vmovups   _SRound+__svml_derf_data_internal(%rip), %zmm11
+        vmovups   _U2Threshold+__svml_derf_data_internal(%rip), %zmm10
+        vandpd    _AbsMask+__svml_derf_data_internal(%rip), %zmm0, %zmm7
+        vpternlogd $0xff, %zmm1, %zmm1, %zmm14
+        kxnorw    %k0, %k0, %k3
+        kxnorw    %k0, %k0, %k2
+        vminpd    {sae}, %zmm9, %zmm7, %zmm12
+
+/* save sign */
+        vxorpd    %zmm0, %zmm7, %zmm8
+        vaddpd    {rn-sae}, %zmm11, %zmm12, %zmm15
+        vcmppd    $26, {sae}, %zmm10, %zmm12, %k1
+
+/*
+ * _LA_ polynomial computation
+ * Start polynomial evaluation
+ */
+        vmovups   _poly1_0+__svml_derf_data_internal(%rip), %zmm10
+        vpsllq    $4, %zmm15, %zmm3
+        vsubpd    {rn-sae}, %zmm11, %zmm15, %zmm13
+        vmovups   _poly3_0+__svml_derf_data_internal(%rip), %zmm11
+        vmovups   _poly3_3+__svml_derf_data_internal(%rip), %zmm15
+        vsubpd    {rn-sae}, %zmm13, %zmm12, %zmm1
+        vmulpd    {rn-sae}, %zmm1, %zmm13, %zmm6
+
+/* NaN fixup */
+        vminpd    {sae}, %zmm7, %zmm1, %zmm7
+        vmovups   _poly1_2+__svml_derf_data_internal(%rip), %zmm13
+        vpandq    _Mask32+__svml_derf_data_internal(%rip), %zmm3, %zmm2
+        vpmovqd   %zmm2, %ymm0
+        vmovups   _poly1_1+__svml_derf_data_internal(%rip), %zmm2
+        vfmadd231pd {rn-sae}, %zmm6, %zmm10, %zmm2
+        vfmadd213pd {rn-sae}, %zmm13, %zmm6, %zmm2
+        vpxord    %zmm4, %zmm4, %zmm4
+        vgatherdpd 8(%rax,%ymm0), %zmm4{%k3}
+        vpxord    %zmm5, %zmm5, %zmm5
+        vgatherdpd (%rax,%ymm0), %zmm5{%k2}
+        vmovups   _poly3_1+__svml_derf_data_internal(%rip), %zmm0
+
+/* Sign | _Erf_H */
+        vxorpd    %zmm8, %zmm5, %zmm5
+        vfmadd231pd {rn-sae}, %zmm6, %zmm11, %zmm0
+        vpandnq   %zmm12, %zmm12, %zmm14{%k1}
+        vandpd    %zmm14, %zmm1, %zmm9
+
+/* Sign | Diff */
+        vxorpd    %zmm8, %zmm7, %zmm1
+        vmovups   _poly5_0+__svml_derf_data_internal(%rip), %zmm12
+        vmovups   _poly5_1+__svml_derf_data_internal(%rip), %zmm7
+        vmovups   _poly3_2+__svml_derf_data_internal(%rip), %zmm14
+
+/* D2 = Diff^2 */
+        vmulpd    {rn-sae}, %zmm9, %zmm9, %zmm3
+
+/* T^2 */
+        vmulpd    {rn-sae}, %zmm6, %zmm6, %zmm9
+
+/* exp_h(x0) * Diff */
+        vmulpd    {rn-sae}, %zmm1, %zmm4, %zmm4
+        vfmadd231pd {rn-sae}, %zmm6, %zmm12, %zmm7
+        vmovups   _poly1_3+__svml_derf_data_internal(%rip), %zmm12
+        vfmadd213pd {rn-sae}, %zmm14, %zmm6, %zmm0
+        vfmadd213pd {rn-sae}, %zmm15, %zmm3, %zmm7
+        vfmadd213pd {rn-sae}, %zmm12, %zmm6, %zmm2
+        vfmadd213pd {rn-sae}, %zmm7, %zmm6, %zmm0
+
+/* P1 = T^2*P1 - T */
+        vfmsub213pd {rn-sae}, %zmm6, %zmm9, %zmm2
+
+/* P1 + P3*D2 */
+        vfmadd213pd {rn-sae}, %zmm2, %zmm3, %zmm0
+
+/*
+ * branch-free
+ * low part of result: exp_h(x0) * Diff*(1+P1)
+ */
+        vfmadd213pd {rn-sae}, %zmm4, %zmm4, %zmm0
+
+/* Final result */
+        vaddpd    {rn-sae}, %zmm5, %zmm0, %zmm6
+
+/* Fix erf(-0) = -0 */
+        vorpd     %zmm8, %zmm6, %zmm0
+        ret
+
+END(_ZGVeN8v_erf_skx)
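For context (outside the patch itself): callers normally reach this entry
point through compiler auto-vectorization rather than by hand.  With the SIMD
declarations this series adds to bits/math-vector.h and
math-vector-fortran.h, a plain erf() loop can be turned into calls to
_ZGVeN8v_erf, which the ifunc above then resolves to the SKX kernel on
AVX-512 hardware.  One plausible invocation (exact behavior depends on the
GCC version and on -mprefer-vector-width):

/* e.g. gcc -O2 -march=skylake-avx512 -mprefer-vector-width=512
         -ffast-math -fopenmp-simd vec_loop.c -lmvec -lm  */
#include <math.h>

void
vector_erf_loop (const double *in, double *out, int n)
{
  #pragma omp simd
  for (int i = 0; i < n; i++)
    out[i] = erf (in[i]);
}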
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_derf_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(64)) VUINT32 _erf_tbl[6*128*2][2];
+        __declspec(align(64)) VUINT32 _AbsMask[8][2];
+        __declspec(align(64)) VUINT32 _MaxThreshold[8][2];
+        __declspec(align(64)) VUINT32 _SRound[8][2];
+        __declspec(align(64)) VUINT32 _U2Threshold[8][2];
+        __declspec(align(64)) VUINT32 _poly1_0[8][2];
+        __declspec(align(64)) VUINT32 _poly1_1[8][2];
+        __declspec(align(64)) VUINT32 _poly3_0[8][2];
+        __declspec(align(64)) VUINT32 _poly3_1[8][2];
+        __declspec(align(64)) VUINT32 _poly5_0[8][2];
+        __declspec(align(64)) VUINT32 _poly5_1[8][2];
+        __declspec(align(64)) VUINT32 _poly1_2[8][2];
+        __declspec(align(64)) VUINT32 _poly3_2[8][2];
+        __declspec(align(64)) VUINT32 _poly1_3[8][2];
+        __declspec(align(64)) VUINT32 _poly3_3[8][2];
+        __declspec(align(64)) VUINT32 _Mask32[8][2];
+} __svml_derf_data_internal;
+#endif
+__svml_derf_data_internal:
+        /*== _erf_tbl ==*/
+        .quad 0x0000000000000000, 0x3ff20dd750429b6d
+        .quad 0x3f820dbf3deb1340, 0x3ff20d8f1975c85d
+        .quad 0x3f920d77083f17a0, 0x3ff20cb67bd452c7
+        .quad 0x3f9b137e0cf584dc, 0x3ff20b4d8bac36c1
+        .quad 0x3fa20c5645dd2538, 0x3ff209546ad13ccf
+        .quad 0x3fa68e5d3bbc9526, 0x3ff206cb4897b148
+        .quad 0x3fab0fafef135745, 0x3ff203b261cd0053
+        .quad 0x3faf902a77bd3821, 0x3ff2000a00ae3804
+        .quad 0x3fb207d480e90658, 0x3ff1fbd27cdc72d3
+        .quad 0x3fb44703e87e8593, 0x3ff1f70c3b4f2cc8
+        .quad 0x3fb68591a1e83b5d, 0x3ff1f1b7ae44867f
+        .quad 0x3fb8c36beb8a8d23, 0x3ff1ebd5552f795b
+        .quad 0x3fbb0081148a873a, 0x3ff1e565bca400d4
+        .quad 0x3fbd3cbf7e70a4b3, 0x3ff1de697e413d29
+        .quad 0x3fbf78159ec8bb50, 0x3ff1d6e14099944a
+        .quad 0x3fc0d939005f65e5, 0x3ff1cecdb718d61c
+        .quad 0x3fc1f5e1a35c3b89, 0x3ff1c62fa1e869b6
+        .quad 0x3fc311fc15f56d14, 0x3ff1bd07cdd189ac
+        .quad 0x3fc42d7fc2f64959, 0x3ff1b357141d95d5
+        .quad 0x3fc548642321d7c6, 0x3ff1a91e5a748165
+        .quad 0x3fc662a0bdf7a89f, 0x3ff19e5e92b964ab
+        .quad 0x3fc77c2d2a765f9e, 0x3ff19318bae53a04
+        .quad 0x3fc895010fdbdbfd, 0x3ff1874ddcdfce24
+        .quad 0x3fc9ad142662e14d, 0x3ff17aff0e56ec10
+        .quad 0x3fcac45e37fe2526, 0x3ff16e2d7093cd8c
+        .quad 0x3fcbdad72110a648, 0x3ff160da304ed92f
+        .quad 0x3fccf076d1233237, 0x3ff153068581b781
+        .quad 0x3fce05354b96ff36, 0x3ff144b3b337c90c
+        .quad 0x3fcf190aa85540e2, 0x3ff135e3075d076b
+        .quad 0x3fd015f78a3dcf3d, 0x3ff12695da8b5bde
+        .quad 0x3fd09eed6982b948, 0x3ff116cd8fd67618
+        .quad 0x3fd127631eb8de32, 0x3ff1068b94962e5e
+        .quad 0x3fd1af54e232d609, 0x3ff0f5d1602f7e41
+        .quad 0x3fd236bef825d9a2, 0x3ff0e4a073dc1b91
+        .quad 0x3fd2bd9db0f7827f, 0x3ff0d2fa5a70c168
+        .quad 0x3fd343ed6989b7d9, 0x3ff0c0e0a8223359
+        .quad 0x3fd3c9aa8b84beda, 0x3ff0ae54fa490723
+        .quad 0x3fd44ed18d9f6462, 0x3ff09b58f724416b
+        .quad 0x3fd4d35ef3e5372e, 0x3ff087ee4d9ad247
+        .quad 0x3fd5574f4ffac98e, 0x3ff07416b4fbfe7c
+        .quad 0x3fd5da9f415ff23f, 0x3ff05fd3ecbec298
+        .quad 0x3fd65d4b75b00471, 0x3ff04b27bc403d30
+        .quad 0x3fd6df50a8dff772, 0x3ff03613f2812daf
+        .quad 0x3fd760aba57a76bf, 0x3ff0209a65e29545
+        .quad 0x3fd7e15944d9d3e4, 0x3ff00abcf3e187a9
+        .quad 0x3fd861566f5fd3c0, 0x3fefe8fb01a47307
+        .quad 0x3fd8e0a01cab516b, 0x3fefbbbbef34b4b2
+        .quad 0x3fd95f3353cbb146, 0x3fef8dc092d58ff8
+        .quad 0x3fd9dd0d2b721f39, 0x3fef5f0cdaf15313
+        .quad 0x3fda5a2aca209394, 0x3fef2fa4c16c0019
+        .quad 0x3fdad68966569a87, 0x3feeff8c4b1375db
+        .quad 0x3fdb522646bbda68, 0x3feecec7870ebca8
+        .quad 0x3fdbccfec24855b8, 0x3fee9d5a8e4c934e
+        .quad 0x3fdc4710406a65fc, 0x3fee6b4982f158b9
+        .quad 0x3fdcc058392a6d2d, 0x3fee38988fc46e72
+        .quad 0x3fdd38d4354c3bd0, 0x3fee054be79d3042
+        .quad 0x3fddb081ce6e2a48, 0x3fedd167c4cf9d2a
+        .quad 0x3fde275eaf25e458, 0x3fed9cf06898cdaf
+        .quad 0x3fde9d68931ae650, 0x3fed67ea1a8b5368
+        .quad 0x3fdf129d471eabb1, 0x3fed325927fb9d89
+        .quad 0x3fdf86faa9428f9d, 0x3fecfc41e36c7df9
+        .quad 0x3fdffa7ea8eb5fd0, 0x3fecc5a8a3fbea40
+        .quad 0x3fe03693a371519c, 0x3fec8e91c4d01368
+        .quad 0x3fe06f794ab2cae7, 0x3fec5701a484ef9d
+        .quad 0x3fe0a7ef5c18edd2, 0x3fec1efca49a5011
+        .quad 0x3fe0dff4f247f6c6, 0x3febe68728e29d5e
+        .quad 0x3fe1178930ada115, 0x3febada596f25436
+        .quad 0x3fe14eab43841b55, 0x3feb745c55905bf8
+        .quad 0x3fe1855a5fd3dd50, 0x3feb3aafcc27502e
+        .quad 0x3fe1bb95c3746199, 0x3feb00a46237d5be
+        .quad 0x3fe1f15cb50bc4de, 0x3feac63e7ecc1411
+        .quad 0x3fe226ae840d4d70, 0x3fea8b8287ec6a09
+        .quad 0x3fe25b8a88b6dd7f, 0x3fea5074e2157620
+        .quad 0x3fe28ff0240d52cd, 0x3fea1519efaf889e
+        .quad 0x3fe2c3debfd7d6c1, 0x3fe9d97610879642
+        .quad 0x3fe2f755ce9a21f4, 0x3fe99d8da149c13f
+        .quad 0x3fe32a54cb8db67b, 0x3fe96164fafd8de3
+        .quad 0x3fe35cdb3a9a144d, 0x3fe925007283d7aa
+        .quad 0x3fe38ee8a84beb71, 0x3fe8e86458169af8
+        .quad 0x3fe3c07ca9cb4f9e, 0x3fe8ab94f6caa71d
+        .quad 0x3fe3f196dcd0f135, 0x3fe86e9694134b9e
+        .quad 0x3fe42236e79a5fa6, 0x3fe8316d6f48133d
+        .quad 0x3fe4525c78dd5966, 0x3fe7f41dc12c9e89
+        .quad 0x3fe4820747ba2dc2, 0x3fe7b6abbb7aaf19
+        .quad 0x3fe4b13713ad3513, 0x3fe7791b886e7403
+        .quad 0x3fe4dfeba47f63cc, 0x3fe73b714a552763
+        .quad 0x3fe50e24ca35fd2c, 0x3fe6fdb11b1e0c34
+        .quad 0x3fe53be25d016a4f, 0x3fe6bfdf0beddaf5
+        .quad 0x3fe569243d2b3a9b, 0x3fe681ff24b4ab04
+        .quad 0x3fe595ea53035283, 0x3fe6441563c665d4
+        .quad 0x3fe5c2348ecc4dc3, 0x3fe60625bd75d07b
+        .quad 0x3fe5ee02e8a71a53, 0x3fe5c8341bb23767
+        .quad 0x3fe61955607dd15d, 0x3fe58a445da7c74c
+        .quad 0x3fe6442bfdedd397, 0x3fe54c5a57629db0
+        .quad 0x3fe66e86d0312e82, 0x3fe50e79d1749ac9
+        .quad 0x3fe69865ee075011, 0x3fe4d0a6889dfd9f
+        .quad 0x3fe6c1c9759d0e5f, 0x3fe492e42d78d2c5
+        .quad 0x3fe6eab18c74091b, 0x3fe4553664273d24
+        .quad 0x3fe7131e5f496a5a, 0x3fe417a0c4049fd0
+        .quad 0x3fe73b1021fc0cb8, 0x3fe3da26d759aef5
+        .quad 0x3fe762870f720c6f, 0x3fe39ccc1b136d5a
+        .quad 0x3fe78983697dc96f, 0x3fe35f93fe7d1b3d
+        .quad 0x3fe7b00578c26037, 0x3fe32281e2fd1a92
+        .quad 0x3fe7d60d8c979f7b, 0x3fe2e5991bd4cbfc
+        .quad 0x3fe7fb9bfaed8078, 0x3fe2a8dcede3673b
+        .quad 0x3fe820b1202f27fb, 0x3fe26c508f6bd0ff
+        .quad 0x3fe8454d5f25760d, 0x3fe22ff727dd6f7b
+        .quad 0x3fe8697120d92a4a, 0x3fe1f3d3cf9ffe5a
+        .quad 0x3fe88d1cd474a2e0, 0x3fe1b7e98fe26217
+        .quad 0x3fe8b050ef253c37, 0x3fe17c3b626c7a12
+        .quad 0x3fe8d30debfc572e, 0x3fe140cc3173f007
+        .quad 0x3fe8f5544bd00c04, 0x3fe1059ed7740313
+        .quad 0x3fe91724951b8fc6, 0x3fe0cab61f084b93
+        .quad 0x3fe9387f53df5238, 0x3fe09014c2ca74da
+        .quad 0x3fe959651980da31, 0x3fe055bd6d32e8d7
+        .quad 0x3fe979d67caa6631, 0x3fe01bb2b87c6968
+        .quad 0x3fe999d4192a5715, 0x3fdfc3ee5d1524b0
+        .quad 0x3fe9b95e8fd26aba, 0x3fdf511a91a67d2a
+        .quad 0x3fe9d8768656cc42, 0x3fdedeeee0959518
+        .quad 0x3fe9f71ca72cffb6, 0x3fde6d6ffaa65a25
+        .quad 0x3fea1551a16aaeaf, 0x3fddfca26f5bbf88
+        .quad 0x3fea331628a45b92, 0x3fdd8c8aace11e63
+        .quad 0x3fea506af4cc00f4, 0x3fdd1d2cfff91594
+        .quad 0x3fea6d50c20fa293, 0x3fdcae8d93f1d7b7
+        .quad 0x3fea89c850b7d54d, 0x3fdc40b0729ed548
+        .quad 0x3feaa5d265064366, 0x3fdbd3998457afdb
+        .quad 0x3feac16fc7143263, 0x3fdb674c8ffc6283
+        .quad 0x3feadca142b10f98, 0x3fdafbcd3afe8ab6
+        .quad 0x3feaf767a741088b, 0x3fda911f096fbc26
+        .quad 0x3feb11c3c79bb424, 0x3fda27455e14c93c
+        .quad 0x3feb2bb679ead19c, 0x3fd9be437a7de946
+        .quad 0x3feb4540978921ee, 0x3fd9561c7f23a47b
+        .quad 0x3feb5e62fce16095, 0x3fd8eed36b886d93
+        .quad 0x3feb771e894d602e, 0x3fd8886b1e5ecfd1
+        .quad 0x3feb8f741ef54f83, 0x3fd822e655b417e7
+        .quad 0x3feba764a2af2b78, 0x3fd7be47af1f5d89
+        .quad 0x3febbef0fbde6221, 0x3fd75a91a7f4d2ed
+        .quad 0x3febd61a1453ab44, 0x3fd6f7c69d7d3ef8
+        .quad 0x3febece0d82d1a5c, 0x3fd695e8cd31867e
+        .quad 0x3fec034635b66e23, 0x3fd634fa54fa285f
+        .quad 0x3fec194b1d49a184, 0x3fd5d4fd33729015
+        .quad 0x3fec2ef0812fc1bd, 0x3fd575f3483021c3
+        .quad 0x3fec443755820d64, 0x3fd517de540ce2a3
+        .quad 0x3fec5920900b5fd1, 0x3fd4babff975a04c
+        .quad 0x3fec6dad2829ec62, 0x3fd45e99bcbb7915
+        .quad 0x3fec81de16b14cef, 0x3fd4036d0468a7a2
+        .quad 0x3fec95b455cce69d, 0x3fd3a93b1998736c
+        .quad 0x3feca930e0e2a825, 0x3fd35005285227f1
+        .quad 0x3fecbc54b476248d, 0x3fd2f7cc3fe6f423
+        .quad 0x3feccf20ce0c0d27, 0x3fd2a09153529381
+        .quad 0x3fece1962c0e0d8b, 0x3fd24a55399ea239
+        .quad 0x3fecf3b5cdaf0c39, 0x3fd1f518ae487dc8
+        .quad 0x3fed0580b2cfd249, 0x3fd1a0dc51a9934d
+        .quad 0x3fed16f7dbe41ca0, 0x3fd14da0a961fd14
+        .quad 0x3fed281c49d818d0, 0x3fd0fb6620c550af
+        .quad 0x3fed38eefdf64fdd, 0x3fd0aa2d09497f2b
+        .quad 0x3fed4970f9ce00d9, 0x3fd059f59af7a906
+        .quad 0x3fed59a33f19ed42, 0x3fd00abff4dec7a3
+        .quad 0x3fed6986cfa798e7, 0x3fcf79183b101c5b
+        .quad 0x3fed791cad3eff01, 0x3fcedeb406d9c825
+        .quad 0x3fed8865d98abe01, 0x3fce4652fadcb6b2
+        .quad 0x3fed97635600bb89, 0x3fcdaff4969c0b04
+        .quad 0x3feda61623cb41e0, 0x3fcd1b982c501370
+        .quad 0x3fedb47f43b2980d, 0x3fcc893ce1dcbef7
+        .quad 0x3fedc29fb60715af, 0x3fcbf8e1b1ca2279
+        .quad 0x3fedd0787a8bb39d, 0x3fcb6a856c3ed54f
+        .quad 0x3fedde0a90611a0d, 0x3fcade26b7fbed95
+        .quad 0x3fedeb56f5f12d28, 0x3fca53c4135a6526
+        .quad 0x3fedf85ea8db188e, 0x3fc9cb5bd549b111
+        .quad 0x3fee0522a5dfda73, 0x3fc944ec2e4f5630
+        .quad 0x3fee11a3e8cf4eb8, 0x3fc8c07329874652
+        .quad 0x3fee1de36c75ba58, 0x3fc83deeada4d25a
+        .quad 0x3fee29e22a89d766, 0x3fc7bd5c7df3fe9c
+        .quad 0x3fee35a11b9b61ce, 0x3fc73eba3b5b07b7
+        .quad 0x3fee4121370224cc, 0x3fc6c205655be720
+        .quad 0x3fee4c6372cd8927, 0x3fc6473b5b15a7a1
+        .quad 0x3fee5768c3b4a3fc, 0x3fc5ce595c455b0a
+        .quad 0x3fee62321d06c5e0, 0x3fc5575c8a468362
+        .quad 0x3fee6cc0709c8a0d, 0x3fc4e241e912c305
+        .quad 0x3fee7714aec96534, 0x3fc46f066040a832
+        .quad 0x3fee812fc64db369, 0x3fc3fda6bc016994
+        .quad 0x3fee8b12a44944a8, 0x3fc38e1fae1d6a9d
+        .quad 0x3fee94be342e6743, 0x3fc3206dceef5f87
+        .quad 0x3fee9e335fb56f87, 0x3fc2b48d9e5dea1c
+        .quad 0x3feea7730ed0bbb9, 0x3fc24a7b84d38971
+        .quad 0x3feeb07e27a133aa, 0x3fc1e233d434b813
+        .quad 0x3feeb9558e6b42ce, 0x3fc17bb2c8d41535
+        .quad 0x3feec1fa258c4bea, 0x3fc116f48a6476cc
+        .quad 0x3feeca6ccd709544, 0x3fc0b3f52ce8c383
+        .quad 0x3feed2ae6489ac1e, 0x3fc052b0b1a174ea
+        .quad 0x3feedabfc7453e63, 0x3fbfe6460fef4680
+        .quad 0x3feee2a1d004692c, 0x3fbf2a901ccafb37
+        .quad 0x3feeea5557137ae0, 0x3fbe723726b824a9
+        .quad 0x3feef1db32a2277c, 0x3fbdbd32ac4c99b0
+        .quad 0x3feef93436bc2daa, 0x3fbd0b7a0f921e7c
+        .quad 0x3fef006135426b26, 0x3fbc5d0497c09e74
+        .quad 0x3fef0762fde45ee6, 0x3fbbb1c972f23e50
+        .quad 0x3fef0e3a5e1a1788, 0x3fbb09bfb7d11a84
+        .quad 0x3fef14e8211e8c55, 0x3fba64de673e8837
+        .quad 0x3fef1b6d0fea5f4d, 0x3fb9c31c6df3b1b8
+        .quad 0x3fef21c9f12f0677, 0x3fb92470a61b6965
+        .quad 0x3fef27ff89525acf, 0x3fb888d1d8e510a3
+        .quad 0x3fef2e0e9a6a8b09, 0x3fb7f036c0107294
+        .quad 0x3fef33f7e43a706b, 0x3fb75a96077274ba
+        .quad 0x3fef39bc242e43e6, 0x3fb6c7e64e7281cb
+        .quad 0x3fef3f5c1558b19e, 0x3fb6381e2980956b
+        .quad 0x3fef44d870704911, 0x3fb5ab342383d178
+        .quad 0x3fef4a31ebcd47df, 0x3fb5211ebf41880b
+        .quad 0x3fef4f693b67bd77, 0x3fb499d478bca735
+        .quad 0x3fef547f10d60597, 0x3fb4154bc68d75c3
+        .quad 0x3fef59741b4b97cf, 0x3fb3937b1b31925a
+        .quad 0x3fef5e4907982a07, 0x3fb31458e6542847
+        .quad 0x3fef62fe80272419, 0x3fb297db960e4f63
+        .quad 0x3fef67952cff6282, 0x3fb21df9981f8e53
+        .quad 0x3fef6c0db3c34641, 0x3fb1a6a95b1e786f
+        .quad 0x3fef7068b7b10fd9, 0x3fb131e14fa1625d
+        .quad 0x3fef74a6d9a38383, 0x3fb0bf97e95f2a64
+        .quad 0x3fef78c8b812d498, 0x3fb04fc3a0481321
+        .quad 0x3fef7cceef15d631, 0x3fafc4b5e32d6259
+        .quad 0x3fef80ba18636f07, 0x3faeeea8c1b1db94
+        .quad 0x3fef848acb544e95, 0x3fae1d4cf1e2450a
+        .quad 0x3fef88419ce4e184, 0x3fad508f9a1ea64f
+        .quad 0x3fef8bdf1fb78370, 0x3fac885df3451a07
+        .quad 0x3fef8f63e416ebff, 0x3fabc4a54a84e834
+        .quad 0x3fef92d077f8d56d, 0x3fab055303221015
+        .quad 0x3fef96256700da8e, 0x3faa4a549829587e
+        .quad 0x3fef99633a838a57, 0x3fa993979e14fffe
+        .quad 0x3fef9c8a7989af0d, 0x3fa8e109c4622913
+        .quad 0x3fef9f9ba8d3c733, 0x3fa83298d717210e
+        .quad 0x3fefa2974addae45, 0x3fa78832c03aa2b1
+        .quad 0x3fefa57ddfe27376, 0x3fa6e1c5893c380b
+        .quad 0x3fefa84fe5e05c8d, 0x3fa63f3f5c4de13b
+        .quad 0x3fefab0dd89d1309, 0x3fa5a08e85af27e0
+        .quad 0x3fefadb831a9f9c3, 0x3fa505a174e9c929
+        .quad 0x3fefb04f6868a944, 0x3fa46e66be002240
+        .quad 0x3fefb2d3f20f9101, 0x3fa3dacd1a8d8cce
+        .quad 0x3fefb54641aebbc9, 0x3fa34ac36ad8dafe
+        .quad 0x3fefb7a6c834b5a2, 0x3fa2be38b6d92415
+        .quad 0x3fefb9f5f4739170, 0x3fa2351c2f2d1449
+        .quad 0x3fefbc3433260ca5, 0x3fa1af5d2e04f3f6
+        .quad 0x3fefbe61eef4cf6a, 0x3fa12ceb37ff9bc3
+        .quad 0x3fefc07f907bc794, 0x3fa0adb5fcfa8c75
+        .quad 0x3fefc28d7e4f9cd0, 0x3fa031ad58d56279
+        .quad 0x3fefc48c1d033c7a, 0x3f9f7182a851bca2
+        .quad 0x3fefc67bcf2d7b8f, 0x3f9e85c449e377f3
+        .quad 0x3fefc85cf56ecd38, 0x3f9da0005e5f28df
+        .quad 0x3fefca2fee770c79, 0x3f9cc0180af00a8b
+        .quad 0x3fefcbf5170b578b, 0x3f9be5ecd2fcb5f9
+        .quad 0x3fefcdacca0bfb73, 0x3f9b1160991ff737
+        .quad 0x3fefcf57607a6e7c, 0x3f9a4255a00b9f03
+        .quad 0x3fefd0f5317f582f, 0x3f9978ae8b55ce1b
+        .quad 0x3fefd2869270a56f, 0x3f98b44e6031383e
+        .quad 0x3fefd40bd6d7a785, 0x3f97f5188610ddc8
+        .quad 0x3fefd58550773cb5, 0x3f973af0c737bb45
+        .quad 0x3fefd6f34f52013a, 0x3f9685bb5134ef13
+        .quad 0x3fefd85621b0876d, 0x3f95d55cb54cd53a
+        .quad 0x3fefd9ae142795e3, 0x3f9529b9e8cf9a1e
+        .quad 0x3fefdafb719e6a69, 0x3f9482b8455dc491
+        .quad 0x3fefdc3e835500b3, 0x3f93e03d891b37de
+        .quad 0x3fefdd7790ea5bc0, 0x3f93422fd6d12e2b
+        .quad 0x3fefdea6e062d0c9, 0x3f92a875b5ffab56
+        .quad 0x3fefdfccb62e52d3, 0x3f9212f612dee7fb
+        .quad 0x3fefe0e9552ebdd6, 0x3f9181983e5133dd
+        .quad 0x3fefe1fcfebe2083, 0x3f90f443edc5ce49
+        .quad 0x3fefe307f2b503d0, 0x3f906ae13b0d3255
+        .quad 0x3fefe40a6f70af4b, 0x3f8fcab1483ea7fc
+        .quad 0x3fefe504b1d9696c, 0x3f8ec72615a894c4
+        .quad 0x3fefe5f6f568b301, 0x3f8dcaf3691fc448
+        .quad 0x3fefe6e1742f7cf6, 0x3f8cd5ec93c12432
+        .quad 0x3fefe7c466dc57a1, 0x3f8be7e5ac24963b
+        .quad 0x3fefe8a004c19ae6, 0x3f8b00b38d6b3575
+        .quad 0x3fefe97483db8670, 0x3f8a202bd6372dce
+        .quad 0x3fefea4218d6594a, 0x3f894624e78e0faf
+        .quad 0x3fefeb08f7146046, 0x3f887275e3a6869e
+        .quad 0x3fefebc950b3fa75, 0x3f87a4f6aca256cb
+        .quad 0x3fefec835695932e, 0x3f86dd7fe3358230
+        .quad 0x3fefed37386190fb, 0x3f861beae53b72b7
+        .quad 0x3fefede5248e38f4, 0x3f856011cc3b036d
+        .quad 0x3fefee8d486585ee, 0x3f84a9cf6bda3f4c
+        .quad 0x3fefef2fd00af31a, 0x3f83f8ff5042a88e
+        .quad 0x3fefefcce6813974, 0x3f834d7dbc76d7e5
+        .quad 0x3feff064b5afffbe, 0x3f82a727a89a3f14
+        .quad 0x3feff0f766697c76, 0x3f8205dac02bd6b9
+        .quad 0x3feff18520700971, 0x3f81697560347b26
+        .quad 0x3feff20e0a7ba8c2, 0x3f80d1d69569b82d
+        .quad 0x3feff2924a3f7a83, 0x3f803ede1a45bfee
+        .quad 0x3feff312046f2339, 0x3f7f60d8aa2a88f2
+        .quad 0x3feff38d5cc4227f, 0x3f7e4cc4abf7d065
+        .quad 0x3feff404760319b4, 0x3f7d4143a9dfe965
+        .quad 0x3feff47772010262, 0x3f7c3e1a5f5c077c
+        .quad 0x3feff4e671a85425, 0x3f7b430ecf4a83a8
+        .quad 0x3feff55194fe19df, 0x3f7a4fe83fb9db25
+        .quad 0x3feff5b8fb26f5f6, 0x3f79646f35a76624
+        .quad 0x3feff61cc26c1578, 0x3f78806d70b2fc36
+        .quad 0x3feff67d08401202, 0x3f77a3ade6c8b3e5
+        .quad 0x3feff6d9e943c231, 0x3f76cdfcbfc1e263
+        .quad 0x3feff733814af88c, 0x3f75ff2750fe7820
+        .quad 0x3feff789eb6130c9, 0x3f7536fc18f7ce5c
+        .quad 0x3feff7dd41ce2b4d, 0x3f74754abacdf1dc
+        .quad 0x3feff82d9e1a76d8, 0x3f73b9e3f9d06e3f
+        .quad 0x3feff87b1913e853, 0x3f730499b503957f
+        .quad 0x3feff8c5cad200a5, 0x3f72553ee2a336bf
+        .quad 0x3feff90dcaba4096, 0x3f71aba78ba3af89
+        .quad 0x3feff9532f846ab0, 0x3f7107a8c7323a6e
+        .quad 0x3feff9960f3eb327, 0x3f706918b6355624
+        .quad 0x3feff9d67f51ddba, 0x3f6f9f9cfd9c3035
+        .quad 0x3feffa14948549a7, 0x3f6e77448fb66bb9
+        .quad 0x3feffa506302ebae, 0x3f6d58da68fd1170
+        .quad 0x3feffa89fe5b3625, 0x3f6c4412bf4b8f0b
+        .quad 0x3feffac17988ef4b, 0x3f6b38a3af2e55b4
+        .quad 0x3feffaf6e6f4f5c0, 0x3f6a3645330550ff
+        .quad 0x3feffb2a5879f35e, 0x3f693cb11a30d765
+        .quad 0x3feffb5bdf67fe6f, 0x3f684ba3004a50d0
+        .quad 0x3feffb8b8c88295f, 0x3f6762d84469c18f
+        .quad 0x3feffbb970200110, 0x3f66821000795a03
+        .quad 0x3feffbe599f4f9d9, 0x3f65a90b00981d93
+        .quad 0x3feffc10194fcb64, 0x3f64d78bba8ca5fd
+        .quad 0x3feffc38fcffbb7c, 0x3f640d564548fad7
+        .quad 0x3feffc60535dd7f5, 0x3f634a305080681f
+        .quad 0x3feffc862a501fd7, 0x3f628de11c5031eb
+        .quad 0x3feffcaa8f4c9bea, 0x3f61d83170fbf6fb
+        .quad 0x3feffccd8f5c66d1, 0x3f6128eb96be8798
+        .quad 0x3feffcef371ea4d7, 0x3f607fdb4dafea5f
+        .quad 0x3feffd0f92cb6ba7, 0x3f5fb99b8b8279e1
+        .quad 0x3feffd2eae369a07, 0x3f5e7f232d9e2630
+        .quad 0x3feffd4c94d29fdb, 0x3f5d4fed7195d7e8
+        .quad 0x3feffd6951b33686, 0x3f5c2b9cf7f893bf
+        .quad 0x3feffd84ef9009ee, 0x3f5b11d702b3deb2
+        .quad 0x3feffd9f78c7524a, 0x3f5a024365f771bd
+        .quad 0x3feffdb8f7605ee7, 0x3f58fc8c794b03b5
+        .quad 0x3feffdd1750e1220, 0x3f58005f08d6f1ef
+        .quad 0x3feffde8fb314ebf, 0x3f570d6a46e07dda
+        .quad 0x3feffdff92db56e5, 0x3f56235fbd7a4345
+        .quad 0x3feffe1544d01ccb, 0x3f5541f340697987
+        .quad 0x3feffe2a1988857c, 0x3f5468dadf4080ab
+        .quad 0x3feffe3e19349dc7, 0x3f5397ced7af2b15
+        .quad 0x3feffe514bbdc197, 0x3f52ce898809244e
+        .quad 0x3feffe63b8c8b5f7, 0x3f520cc76202c5fb
+        .quad 0x3feffe7567b7b5e1, 0x3f515246dda49d47
+        .quad 0x3feffe865fac722b, 0x3f509ec86c75d497
+        .quad 0x3feffe96a78a04a9, 0x3f4fe41cd9bb4eee
+        .quad 0x3feffea645f6d6da, 0x3f4e97ba3b77f306
+        .quad 0x3feffeb5415e7c44, 0x3f4d57f524723822
+        .quad 0x3feffec39ff380b9, 0x3f4c245d4b99847a
+        .quad 0x3feffed167b12ac2, 0x3f4afc85e0f82e12
+        .quad 0x3feffede9e5d3262, 0x3f49e005769dbc1d
+        .quad 0x3feffeeb49896c6d, 0x3f48ce75e9f6f8a0
+        .quad 0x3feffef76e956a9f, 0x3f47c7744d9378f7
+        .quad 0x3fefff0312b010b5, 0x3f46caa0d3582fe9
+        .quad 0x3fefff0e3ad91ec2, 0x3f45d79eb71e893b
+        .quad 0x3fefff18ebe2b0e1, 0x3f44ee1429bf7cc0
+        .quad 0x3fefff232a72b48e, 0x3f440daa3c89f5b6
+        .quad 0x3fefff2cfb0453d9, 0x3f43360ccd23db3a
+        .quad 0x3fefff3661e9569d, 0x3f4266ea71d4f71a
+        .quad 0x3fefff3f634b79f9, 0x3f419ff4663ae9df
+        .quad 0x3fefff48032dbe40, 0x3f40e0de78654d1e
+        .quad 0x3fefff50456dab8c, 0x3f40295ef6591848
+        .quad 0x3fefff582dc48d30, 0x3f3ef25d37f49fe1
+        .quad 0x3fefff5fbfc8a439, 0x3f3da01102b5f851
+        .quad 0x3fefff66feee5129, 0x3f3c5b5412dcafad
+        .quad 0x3fefff6dee89352e, 0x3f3b23a5a23e4210
+        .quad 0x3fefff7491cd4af6, 0x3f39f8893d8fd1c1
+        .quad 0x3fefff7aebcff755, 0x3f38d986a4187285
+        .quad 0x3fefff80ff8911fd, 0x3f37c629a822bc9e
+        .quad 0x3fefff86cfd3e657, 0x3f36be02102b3520
+        .quad 0x3fefff8c5f702ccf, 0x3f35c0a378c90bca
+        .quad 0x3fefff91b102fca8, 0x3f34cda5374ea275
+        .quad 0x3fefff96c717b695, 0x3f33e4a23d1f4703
+        .quad 0x3fefff9ba420e834, 0x3f330538fbb77ecd
+        .quad 0x3fefffa04a7928b1, 0x3f322f0b496539be
+        .quad 0x3fefffa4bc63ee9a, 0x3f3161be46ad3b50
+        .quad 0x3fefffa8fc0e5f33, 0x3f309cfa445b00ff
+        .quad 0x3fefffad0b901755, 0x3f2fc0d55470cf51
+        .quad 0x3fefffb0ecebee1b, 0x3f2e577bbcd49935
+        .quad 0x3fefffb4a210b172, 0x3f2cfd4a5adec5c0
+        .quad 0x3fefffb82cd9dcbf, 0x3f2bb1a9657ce465
+        .quad 0x3fefffbb8f1049c6, 0x3f2a740684026555
+        .quad 0x3fefffbeca6adbe9, 0x3f2943d4a1d1ed39
+        .quad 0x3fefffc1e08f25f5, 0x3f28208bc334a6a5
+        .quad 0x3fefffc4d3120aa1, 0x3f2709a8db59f25c
+        .quad 0x3fefffc7a37857d2, 0x3f25feada379d8b7
+        .quad 0x3fefffca53375ce3, 0x3f24ff207314a102
+        .quad 0x3fefffcce3b57bff, 0x3f240a8c1949f75e
+        .quad 0x3fefffcf564ab6b7, 0x3f23207fb7420eb9
+        .quad 0x3fefffd1ac4135f9, 0x3f22408e9ba3327f
+        .quad 0x3fefffd3e6d5cd87, 0x3f216a501f0e42ca
+        .quad 0x3fefffd607387b07, 0x3f209d5f819c9e29
+        .quad 0x3fefffd80e8ce0da, 0x3f1fb2b792b40a22
+        .quad 0x3fefffd9fdeabcce, 0x3f1e3bcf436a1a95
+        .quad 0x3fefffdbd65e5ad0, 0x3f1cd55277c18d05
+        .quad 0x3fefffdd98e903b2, 0x3f1b7e94604479dc
+        .quad 0x3fefffdf46816833, 0x3f1a36eec00926dd
+        .quad 0x3fefffe0e0140857, 0x3f18fdc1b2dcf7b9
+        .quad 0x3fefffe26683972a, 0x3f17d2737527c3f9
+        .quad 0x3fefffe3daa95b18, 0x3f16b4702d7d5849
+        .quad 0x3fefffe53d558ae9, 0x3f15a329b7d30748
+        .quad 0x3fefffe68f4fa777, 0x3f149e17724f4d41
+        .quad 0x3fefffe7d156d244, 0x3f13a4b60ba9aa4e
+        .quad 0x3fefffe904222101, 0x3f12b6875310f785
+        .quad 0x3fefffea2860ee1e, 0x3f11d312098e9dba
+        .quad 0x3fefffeb3ebb267b, 0x3f10f9e1b4dd36df
+        .quad 0x3fefffec47d19457, 0x3f102a8673a94692
+        .quad 0x3fefffed443e2787, 0x3f0ec929a665b449
+        .quad 0x3fefffee34943b15, 0x3f0d4f4b4c8e09ed
+        .quad 0x3fefffef1960d85d, 0x3f0be6abbb10a5aa
+        .quad 0x3fefffeff32af7af, 0x3f0a8e8cc1fadef6
+        .quad 0x3feffff0c273bea2, 0x3f094637d5bacfdb
+        .quad 0x3feffff187b6bc0e, 0x3f080cfdc72220cf
+        .quad 0x3feffff2436a21dc, 0x3f06e2367dc27f95
+        .quad 0x3feffff2f5fefcaa, 0x3f05c540b4936fd2
+        .quad 0x3feffff39fe16963, 0x3f04b581b8d170fc
+        .quad 0x3feffff44178c8d2, 0x3f03b2652b06c2b2
+        .quad 0x3feffff4db27f146, 0x3f02bb5cc22e5db6
+        .quad 0x3feffff56d4d5e5e, 0x3f01cfe010e2052d
+        .quad 0x3feffff5f8435efc, 0x3f00ef6c4c84a0fe
+        .quad 0x3feffff67c604180, 0x3f001984165a5f36
+        .quad 0x3feffff6f9f67e55, 0x3efe9b5e8d00ce77
+        .quad 0x3feffff77154e0d6, 0x3efd16f5716c6c1a
+        .quad 0x3feffff7e2c6aea2, 0x3efba4f035d60e03
+        .quad 0x3feffff84e93cd75, 0x3efa447b7b03f045
+        .quad 0x3feffff8b500e77c, 0x3ef8f4ccca7fc90d
+        .quad 0x3feffff9164f8e46, 0x3ef7b5223dac7336
+        .quad 0x3feffff972be5c59, 0x3ef684c227fcacef
+        .quad 0x3feffff9ca891572, 0x3ef562fac4329b48
+        .quad 0x3feffffa1de8c582, 0x3ef44f21e49054f2
+        .quad 0x3feffffa6d13de73, 0x3ef34894a5e24657
+        .quad 0x3feffffab83e54b8, 0x3ef24eb7254ccf83
+        .quad 0x3feffffaff99bac4, 0x3ef160f438c70913
+        .quad 0x3feffffb43555b5f, 0x3ef07ebd2a2d2844
+        .quad 0x3feffffb839e52f3, 0x3eef4f12e9ab070a
+        .quad 0x3feffffbc09fa7cd, 0x3eedb5ad0b27805c
+        .quad 0x3feffffbfa82616b, 0x3eec304efa2c6f4e
+        .quad 0x3feffffc316d9ed0, 0x3eeabe09e9144b5e
+        .quad 0x3feffffc6586abf6, 0x3ee95df988e76644
+        .quad 0x3feffffc96f1165e, 0x3ee80f439b4ee04b
+        .quad 0x3feffffcc5cec0c1, 0x3ee6d11788a69c64
+        .quad 0x3feffffcf23ff5fc, 0x3ee5a2adfa0b4bc4
+        .quad 0x3feffffd1c637b2b, 0x3ee4834877429b8f
+        .quad 0x3feffffd4456a10d, 0x3ee37231085c7d9a
+        .quad 0x3feffffd6a3554a1, 0x3ee26eb9daed6f7e
+        .quad 0x3feffffd8e1a2f22, 0x3ee1783ceac28910
+        .quad 0x3feffffdb01e8546, 0x3ee08e1badf0fced
+        .quad 0x3feffffdd05a75ea, 0x3edf5f7d88472604
+        .quad 0x3feffffdeee4f810, 0x3eddb92b5212fb8d
+        .quad 0x3feffffe0bd3e852, 0x3edc282cd3957eda
+        .quad 0x3feffffe273c15b7, 0x3edaab7abace48dc
+        .quad 0x3feffffe41314e06, 0x3ed94219bfcb4928
+        .quad 0x3feffffe59c6698b, 0x3ed7eb1a2075864e
+        .quad 0x3feffffe710d565e, 0x3ed6a597219a93da
+        .quad 0x3feffffe8717232d, 0x3ed570b69502f313
+        .quad 0x3feffffe9bf4098c, 0x3ed44ba864670882
+        .quad 0x3feffffeafb377d5, 0x3ed335a62115bce2
+        .quad 0x3feffffec2641a9e, 0x3ed22df298214423
+        .quad 0x3feffffed413e5b7, 0x3ed133d96ae7e0dd
+        .quad 0x3feffffee4d01cd6, 0x3ed046aeabcfcdec
+        .quad 0x3feffffef4a55bd4, 0x3ececb9cfe1d8642
+        .quad 0x3fefffff039f9e8f, 0x3ecd21397ead99cb
+        .quad 0x3fefffff11ca4876, 0x3ecb8d094c86d374
+        .quad 0x3fefffff1f302bc1, 0x3eca0df0f0c626dc
+        .quad 0x3fefffff2bdb904d, 0x3ec8a2e269750a39
+        .quad 0x3fefffff37d63a36, 0x3ec74adc8f4064d3
+        .quad 0x3fefffff43297019, 0x3ec604ea819f007c
+        .quad 0x3fefffff4dde0118, 0x3ec4d0231928c6f9
+        .quad 0x3fefffff57fc4a95, 0x3ec3aba85fe22e20
+        .quad 0x3fefffff618c3da6, 0x3ec296a70f414053
+        .quad 0x3fefffff6a956450, 0x3ec1905613b3abf2
+        .quad 0x3fefffff731ee681, 0x3ec097f6156f32c5
+        .quad 0x3fefffff7b2f8ed6, 0x3ebf59a20caf6695
+        .quad 0x3fefffff82cdcf1b, 0x3ebd9c73698fb1dc
+        .quad 0x3fefffff89ffc4aa, 0x3ebbf716c6168bae
+        .quad 0x3fefffff90cb3c81, 0x3eba6852c6b58392
+        .quad 0x3fefffff9735b73b, 0x3eb8eefd70594a89
+        .quad 0x3fefffff9d446ccc, 0x3eb789fb715aae95
+        .quad 0x3fefffffa2fc5015, 0x3eb6383f726a8e04
+        .quad 0x3fefffffa8621251, 0x3eb4f8c96f26a26a
+        .quad 0x3fefffffad7a2652, 0x3eb3caa61607f920
+        .quad 0x3fefffffb248c39d, 0x3eb2acee2f5ecdb8
+        .quad 0x3fefffffb6d1e95d, 0x3eb19ec60b1242ed
+        .quad 0x3fefffffbb196132, 0x3eb09f5cf4dd2877
+        .quad 0x3fefffffbf22c1e2, 0x3eaf5bd95d8730d8
+        .quad 0x3fefffffc2f171e3, 0x3ead9371e2ff7c35
+        .quad 0x3fefffffc688a9cf, 0x3eabe41de54d155a
+        .quad 0x3fefffffc9eb76ac, 0x3eaa4c89e08ef4f3
+        .quad 0x3fefffffcd1cbc28, 0x3ea8cb738399b12c
+        .quad 0x3fefffffd01f36af, 0x3ea75fa8dbc84bec
+        .quad 0x3fefffffd2f57d68, 0x3ea608078a70dcbc
+        .quad 0x3fefffffd5a2041f, 0x3ea4c37c0394d094
+        .quad 0x3fefffffd8271d12, 0x3ea39100d5687bfe
+        .quad 0x3fefffffda86faa9, 0x3ea26f9df8519bd7
+        .quad 0x3fefffffdcc3b117, 0x3ea15e6827001f18
+        .quad 0x3fefffffdedf37ed, 0x3ea05c803e4831c1
+        .quad 0x3fefffffe0db6b91, 0x3e9ed22548cffd35
+        .quad 0x3fefffffe2ba0ea5, 0x3e9d06ad6ecdf971
+        .quad 0x3fefffffe47ccb60, 0x3e9b551c847fbc96
+        .quad 0x3fefffffe62534d4, 0x3e99bc09f112b494
+        .quad 0x3fefffffe7b4c81e, 0x3e983a1ff0aa239d
+        .quad 0x3fefffffe92ced93, 0x3e96ce1aa3fd7bdd
+        .quad 0x3fefffffea8ef9cf, 0x3e9576c72b514859
+        .quad 0x3fefffffebdc2ec6, 0x3e943302cc4a0da8
+        .quad 0x3fefffffed15bcba, 0x3e9301ba221dc9bb
+        .quad 0x3fefffffee3cc32c, 0x3e91e1e857adc568
+        .quad 0x3fefffffef5251c2, 0x3e90d2966b1746f7
+        .quad 0x3feffffff0576917, 0x3e8fa5b4f49cc6b2
+        .quad 0x3feffffff14cfb92, 0x3e8dc3ae30b55c16
+        .quad 0x3feffffff233ee1d, 0x3e8bfd7555a3bd68
+        .quad 0x3feffffff30d18e8, 0x3e8a517d9e61628a
+        .quad 0x3feffffff3d9480f, 0x3e88be4f8f6c951f
+        .quad 0x3feffffff4993c46, 0x3e874287ded49339
+        .quad 0x3feffffff54dab72, 0x3e85dcd669f2cd34
+        .quad 0x3feffffff5f74141, 0x3e848bfd38302871
+        .quad 0x3feffffff6969fb8, 0x3e834ecf8a3c124a
+        .quad 0x3feffffff72c5fb6, 0x3e822430f521cbcf
+        .quad 0x3feffffff7b91176, 0x3e810b1488aeb235
+        .quad 0x3feffffff83d3d07, 0x3e80027c00a263a6
+        .quad 0x3feffffff8b962be, 0x3e7e12ee004efc37
+        .quad 0x3feffffff92dfba2, 0x3e7c3e44ae32b16b
+        .quad 0x3feffffff99b79d2, 0x3e7a854ea14102a8
+        .quad 0x3feffffffa0248e8, 0x3e78e6761569f45d
+        .quad 0x3feffffffa62ce54, 0x3e77603bac345f65
+        .quad 0x3feffffffabd69b4, 0x3e75f1353cdad001
+        .quad 0x3feffffffb127525, 0x3e74980cb3c80949
+        .quad 0x3feffffffb624592, 0x3e73537f00b6ad4d
+        .quad 0x3feffffffbad2aff, 0x3e72225b12bffc68
+        .quad 0x3feffffffbf370cd, 0x3e710380e1adb7e9
+        .quad 0x3feffffffc355dfd, 0x3e6febc107d5efaa
+        .quad 0x3feffffffc733572, 0x3e6df0f2a0ee6947
+        .quad 0x3feffffffcad3626, 0x3e6c14b2188bcee4
+        .quad 0x3feffffffce39b67, 0x3e6a553644f7f07d
+        .quad 0x3feffffffd169d0c, 0x3e68b0cfce0579e0
+        .quad 0x3feffffffd466fa5, 0x3e6725e7c5dd20f7
+        .quad 0x3feffffffd7344aa, 0x3e65b2fe547a1340
+        .quad 0x3feffffffd9d4aab, 0x3e6456a974e92e93
+        .quad 0x3feffffffdc4ad7a, 0x3e630f93c3699078
+        .quad 0x3feffffffde9964e, 0x3e61dc7b5b978cf8
+        .quad 0x3feffffffe0c2bf0, 0x3e60bc30c5d52f15
+        .quad 0x3feffffffe2c92db, 0x3e5f5b2be65a0c7f
+        .quad 0x3feffffffe4aed5e, 0x3e5d5f3a8dea7357
+        .quad 0x3feffffffe675bbd, 0x3e5b82915b03515b
+        .quad 0x3feffffffe81fc4e, 0x3e59c3517e789488
+        .quad 0x3feffffffe9aeb97, 0x3e581fb7df06136e
+        .quad 0x3feffffffeb24467, 0x3e56961b8d641d06
+        .quad 0x3feffffffec81ff2, 0x3e5524ec4d916cae
+        .quad 0x3feffffffedc95e7, 0x3e53cab1343d18d1
+        .quad 0x3feffffffeefbc85, 0x3e52860757487a01
+        .quad 0x3fefffffff01a8b6, 0x3e5155a09065d4f7
+        .quad 0x3fefffffff126e1e, 0x3e50384250e4c9fc
+        .quad 0x3fefffffff221f30, 0x3e4e59890b926c78
+        .quad 0x3fefffffff30cd3f, 0x3e4c642116a8a9e3
+        .quad 0x3fefffffff3e8892, 0x3e4a8e405e651ab6
+        .quad 0x3fefffffff4b606f, 0x3e48d5f98114f872
+        .quad 0x3fefffffff57632d, 0x3e47397c5a66e307
+        .quad 0x3fefffffff629e44, 0x3e45b71456c5a4c4
+        .quad 0x3fefffffff6d1e56, 0x3e444d26de513197
+        .quad 0x3fefffffff76ef3f, 0x3e42fa31d6371537
+        .quad 0x3fefffffff801c1f, 0x3e41bcca373b7b43
+        .quad 0x3fefffffff88af67, 0x3e40939ab853339f
+        .quad 0x3fefffffff90b2e3, 0x3e3efac5187b2863
+        .quad 0x3fefffffff982fc1, 0x3e3cf1e86235d0e7
+        .quad 0x3fefffffff9f2e9f, 0x3e3b0a68a2128bab
+        .quad 0x3fefffffffa5b790, 0x3e39423165bc4444
+        .quad 0x3fefffffffabd229, 0x3e37974e743dea3d
+        .quad 0x3fefffffffb18582, 0x3e3607e9eacd1050
+        .quad 0x3fefffffffb6d844, 0x3e34924a74dec729
+        .quad 0x3fefffffffbbd0aa, 0x3e3334d19e0c2160
+        .quad 0x3fefffffffc0748f, 0x3e31edfa3c5f5cca
+        .quad 0x3fefffffffc4c96c, 0x3e30bc56f1b54701
+        .quad 0x3fefffffffc8d462, 0x3e2f3d2185e047d9
+        .quad 0x3fefffffffcc9a41, 0x3e2d26cb87945e87
+        .quad 0x3fefffffffd01f89, 0x3e2b334fac4b9f99
+        .quad 0x3fefffffffd36871, 0x3e296076f7918d1c
+        .quad 0x3fefffffffd678ed, 0x3e27ac2d72fc2c63
+        .quad 0x3fefffffffd954ae, 0x3e2614801550319e
+        .quad 0x3fefffffffdbff2a, 0x3e24979ac8b28927
+        .quad 0x3fefffffffde7ba0, 0x3e2333c68e2d0548
+        .quad 0x3fefffffffe0cd16, 0x3e21e767bce37dd7
+        .quad 0x3fefffffffe2f664, 0x3e20b0fc5b6d05a0
+        .quad 0x3fefffffffe4fa30, 0x3e1f1e3523b41d7d
+        .quad 0x3fefffffffe6daf7, 0x3e1d00de6608effe
+        .quad 0x3fefffffffe89b0c, 0x3e1b0778b7b3301b
+        .quad 0x3fefffffffea3c9a, 0x3e192fb04ec0f6cf
+        .quad 0x3fefffffffebc1a9, 0x3e177756ec9f78fa
+        .quad 0x3fefffffffed2c21, 0x3e15dc61922d5a06
+        .quad 0x3fefffffffee7dc8, 0x3e145ce65699ff6d
+        .quad 0x3fefffffffefb847, 0x3e12f71a5f159970
+        .quad 0x3feffffffff0dd2b, 0x3e11a94ff571654f
+        .quad 0x3feffffffff1ede9, 0x3e1071f4bbea09ec
+        .quad 0x3feffffffff2ebda, 0x3e0e9f1ff8ddd774
+        .quad 0x3feffffffff3d843, 0x3e0c818223a202c7
+        .quad 0x3feffffffff4b453, 0x3e0a887bd2b4404d
+        .quad 0x3feffffffff58126, 0x3e08b1a336c5eb6b
+        .quad 0x3feffffffff63fc3, 0x3e06fab63324088a
+        .quad 0x3feffffffff6f121, 0x3e056197e30205ba
+        .quad 0x3feffffffff79626, 0x3e03e44e45301b92
+        .quad 0x3feffffffff82fab, 0x3e0281000bfe4c3f
+        .quad 0x3feffffffff8be77, 0x3e0135f28f2d50b4
+        .quad 0x3feffffffff94346, 0x3e000187dded5975
+        .quad 0x3feffffffff9bec8, 0x3dfdc479de0ef001
+        .quad 0x3feffffffffa319f, 0x3dfbad4fdad3caa1
+        .quad 0x3feffffffffa9c63, 0x3df9baed3ed27ab8
+        .quad 0x3feffffffffaffa4, 0x3df7ead9ce4285bb
+        .quad 0x3feffffffffb5be5, 0x3df63ac6b4edc88e
+        .quad 0x3feffffffffbb1a2, 0x3df4a88be2a6390c
+        .quad 0x3feffffffffc014e, 0x3df332259185f1a0
+        .quad 0x3feffffffffc4b56, 0x3df1d5b1f3793044
+        .quad 0x3feffffffffc901c, 0x3df0916f04b6e18b
+        .quad 0x3feffffffffccfff, 0x3deec77101de6926
+        .quad 0x3feffffffffd0b56, 0x3dec960bf23153e0
+        .quad 0x3feffffffffd4271, 0x3dea8bd20fc65ef7
+        .quad 0x3feffffffffd759d, 0x3de8a61745ec7d1d
+        .quad 0x3feffffffffda520, 0x3de6e25d0e756261
+        .quad 0x3feffffffffdd13c, 0x3de53e4f7d1666cb
+        .quad 0x3feffffffffdfa2d, 0x3de3b7c27a7ddb0e
+        .quad 0x3feffffffffe202d, 0x3de24caf2c32af14
+        .quad 0x3feffffffffe4371, 0x3de0fb3186804d0f
+        .quad 0x3feffffffffe642a, 0x3ddf830c0bb41fd7
+        .quad 0x3feffffffffe8286, 0x3ddd3c0f1a91c846
+        .quad 0x3feffffffffe9eb0, 0x3ddb1e5acf351d87
+        .quad 0x3feffffffffeb8d0, 0x3dd92712d259ce66
+        .quad 0x3feffffffffed10a, 0x3dd7538c60a04476
+        .quad 0x3feffffffffee782, 0x3dd5a14b04b47879
+        .quad 0x3feffffffffefc57, 0x3dd40dfd87456f4c
+        .quad 0x3fefffffffff0fa7, 0x3dd2977b1172b9d5
+        .quad 0x3fefffffffff218f, 0x3dd13bc07e891491
+        .quad 0x3fefffffffff3227, 0x3dcff1dbb4300811
+        .quad 0x3fefffffffff4188, 0x3dcd9a880f306bd8
+        .quad 0x3fefffffffff4fc9, 0x3dcb6e45220b55e0
+        .quad 0x3fefffffffff5cfd, 0x3dc96a0b33f2c4da
+        .quad 0x3fefffffffff6939, 0x3dc78b07e9e924ac
+        .quad 0x3fefffffffff748e, 0x3dc5ce9ab1670dd2
+        .quad 0x3fefffffffff7f0d, 0x3dc4325167006bb0
+        .quad 0x3fefffffffff88c5, 0x3dc2b3e53538ff3f
+        .quad 0x3fefffffffff91c6, 0x3dc15137a7f44864
+        .quad 0x3fefffffffff9a1b, 0x3dc0084ff125639d
+        .quad 0x3fefffffffffa1d2, 0x3dbdaeb0b7311ec7
+        .quad 0x3fefffffffffa8f6, 0x3dbb7937d1c40c53
+        .quad 0x3fefffffffffaf92, 0x3db96d082f59ab06
+        .quad 0x3fefffffffffb5b0, 0x3db7872d9fa10aad
+        .quad 0x3fefffffffffbb58, 0x3db5c4e8e37bc7d0
+        .quad 0x3fefffffffffc095, 0x3db423ac0df49a40
+        .quad 0x3fefffffffffc56d, 0x3db2a117230ad284
+        .quad 0x3fefffffffffc9e8, 0x3db13af4f04f9998
+        .quad 0x3fefffffffffce0d, 0x3dafde703724e560
+        .quad 0x3fefffffffffd1e1, 0x3dad77f0c82e7641
+        .quad 0x3fefffffffffd56c, 0x3dab3ee02611d7dd
+        .quad 0x3fefffffffffd8b3, 0x3da92ff33023d5bd
+        .quad 0x3fefffffffffdbba, 0x3da7481a9e69f53f
+        .quad 0x3fefffffffffde86, 0x3da5847eda620959
+        .quad 0x3fefffffffffe11d, 0x3da3e27c1fcc74bd
+        .quad 0x3fefffffffffe380, 0x3da25f9ee0b923dc
+        .quad 0x3fefffffffffe5b6, 0x3da0f9a068653200
+        .quad 0x3fefffffffffe7c0, 0x3d9f5cc7718082b0
+        .quad 0x3fefffffffffe9a2, 0x3d9cf7e53d6a2ca5
+        .quad 0x3fefffffffffeb60, 0x3d9ac0f5f3229372
+        .quad 0x3fefffffffffecfb, 0x3d98b498644847ea
+        .quad 0x3fefffffffffee77, 0x3d96cfa9bcca59dc
+        .quad 0x3fefffffffffefd6, 0x3d950f411d4fd2cd
+        .quad 0x3feffffffffff11a, 0x3d9370ab8327af5e
+        .quad 0x3feffffffffff245, 0x3d91f167f88c6b6e
+        .quad 0x3feffffffffff359, 0x3d908f24085d4597
+        .quad 0x3feffffffffff457, 0x3d8e8f70e181d61a
+        .quad 0x3feffffffffff542, 0x3d8c324c20e337dc
+        .quad 0x3feffffffffff61b, 0x3d8a03261574b54e
+        .quad 0x3feffffffffff6e3, 0x3d87fe903cdf5855
+        .quad 0x3feffffffffff79b, 0x3d86215c58da3450
+        .quad 0x3feffffffffff845, 0x3d846897d4b69fc6
+        .quad 0x3feffffffffff8e2, 0x3d82d1877d731b7b
+        .quad 0x3feffffffffff973, 0x3d8159a386b11517
+        .quad 0x3feffffffffff9f8, 0x3d7ffd27ae9393ce
+        .quad 0x3feffffffffffa73, 0x3d7d7c593130dd0b
+        .quad 0x3feffffffffffae4, 0x3d7b2cd607c79bcf
+        .quad 0x3feffffffffffb4c, 0x3d790ae4d3405651
+        .quad 0x3feffffffffffbad, 0x3d771312dd1759e2
+        .quad 0x3feffffffffffc05, 0x3d75422ef5d8949d
+        .quad 0x3feffffffffffc57, 0x3d739544b0ecc957
+        .quad 0x3feffffffffffca2, 0x3d720997f73e73dd
+        .quad 0x3feffffffffffce7, 0x3d709ca0eaacd277
+        .quad 0x3feffffffffffd27, 0x3d6e9810295890ec
+        .quad 0x3feffffffffffd62, 0x3d6c2b45b5aa4a1d
+        .quad 0x3feffffffffffd98, 0x3d69eee068fa7596
+        .quad 0x3feffffffffffdca, 0x3d67df2b399c10a8
+        .quad 0x3feffffffffffdf8, 0x3d65f8b87a31bd85
+        .quad 0x3feffffffffffe22, 0x3d64385c96e9a2d9
+        .quad 0x3feffffffffffe49, 0x3d629b2933ef4cbc
+        .quad 0x3feffffffffffe6c, 0x3d611e68a6378f8a
+        .quad 0x3feffffffffffe8d, 0x3d5f7f338086a86b
+        .quad 0x3feffffffffffeab, 0x3d5cf8d7d9ce040a
+        .quad 0x3feffffffffffec7, 0x3d5aa577251ae485
+        .quad 0x3feffffffffffee1, 0x3d58811d739efb5f
+        .quad 0x3feffffffffffef8, 0x3d568823e52970be
+        .quad 0x3fefffffffffff0e, 0x3d54b72ae68e8b4c
+        .quad 0x3fefffffffffff22, 0x3d530b14dbe876bc
+        .quad 0x3fefffffffffff34, 0x3d5181012ef86610
+        .quad 0x3fefffffffffff45, 0x3d501647ba798745
+        .quad 0x3fefffffffffff54, 0x3d4d90e917701675
+        .quad 0x3fefffffffffff62, 0x3d4b2a87e86d0c8a
+        .quad 0x3fefffffffffff6f, 0x3d48f53dcb377293
+        .quad 0x3fefffffffffff7b, 0x3d46ed2f2515e933
+        .quad 0x3fefffffffffff86, 0x3d450ecc9ed47f19
+        .quad 0x3fefffffffffff90, 0x3d4356cd5ce7799e
+        .quad 0x3fefffffffffff9a, 0x3d41c229a587ab78
+        .quad 0x3fefffffffffffa2, 0x3d404e15ecc7f3f6
+        .quad 0x3fefffffffffffaa, 0x3d3deffc7e6a6017
+        .quad 0x3fefffffffffffb1, 0x3d3b7b040832f310
+        .quad 0x3fefffffffffffb8, 0x3d3938e021f36d76
+        .quad 0x3fefffffffffffbe, 0x3d37258610b3b233
+        .quad 0x3fefffffffffffc3, 0x3d353d3bfc82a909
+        .quad 0x3fefffffffffffc8, 0x3d337c92babdc2fd
+        .quad 0x3fefffffffffffcd, 0x3d31e06010120f6a
+        .quad 0x3fefffffffffffd1, 0x3d3065b9616170d4
+        .quad 0x3fefffffffffffd5, 0x3d2e13dd96b3753b
+        .quad 0x3fefffffffffffd9, 0x3d2b950d32467392
+        .quad 0x3fefffffffffffdc, 0x3d294a72263259a5
+        .quad 0x3fefffffffffffdf, 0x3d272fd93e036cdc
+        .quad 0x3fefffffffffffe2, 0x3d254164576929ab
+        .quad 0x3fefffffffffffe4, 0x3d237b83c521fe96
+        .quad 0x3fefffffffffffe7, 0x3d21daf033182e96
+        .quad 0x3fefffffffffffe9, 0x3d205ca50205d26a
+        .quad 0x3fefffffffffffeb, 0x3d1dfbb6235639fa
+        .quad 0x3fefffffffffffed, 0x3d1b7807e294781f
+        .quad 0x3fefffffffffffee, 0x3d19298add70a734
+        .quad 0x3feffffffffffff0, 0x3d170beaf9c7ffb6
+        .quad 0x3feffffffffffff1, 0x3d151b2cd6709222
+        .quad 0x3feffffffffffff3, 0x3d1353a6cf7f7fff
+        .quad 0x3feffffffffffff4, 0x3d11b1fa8cbe84a7
+        .quad 0x3feffffffffffff5, 0x3d10330f0fd69921
+        .quad 0x3feffffffffffff6, 0x3d0da81670f96f9b
+        .quad 0x3feffffffffffff7, 0x3d0b24a16b4d09aa
+        .quad 0x3feffffffffffff7, 0x3d08d6eeb6efdbd6
+        .quad 0x3feffffffffffff8, 0x3d06ba91ac734786
+        .quad 0x3feffffffffffff9, 0x3d04cb7966770ab5
+        .quad 0x3feffffffffffff9, 0x3d0305e9721d0981
+        .quad 0x3feffffffffffffa, 0x3d01667311fff70a
+        .quad 0x3feffffffffffffb, 0x3cffd3de10d62855
+        .quad 0x3feffffffffffffb, 0x3cfd1aefbcd48d0c
+        .quad 0x3feffffffffffffb, 0x3cfa9cc93c25aca9
+        .quad 0x3feffffffffffffc, 0x3cf85487ee3ea735
+        .quad 0x3feffffffffffffc, 0x3cf63daf8b4b1e0c
+        .quad 0x3feffffffffffffd, 0x3cf45421e69a6ca1
+        .quad 0x3feffffffffffffd, 0x3cf294175802d99a
+        .quad 0x3feffffffffffffd, 0x3cf0fa17bf41068f
+        .quad 0x3feffffffffffffd, 0x3cef05e82aae2bb9
+        .quad 0x3feffffffffffffe, 0x3cec578101b29058
+        .quad 0x3feffffffffffffe, 0x3ce9e39dc5dd2f7c
+        .quad 0x3feffffffffffffe, 0x3ce7a553a728bbf2
+        .quad 0x3feffffffffffffe, 0x3ce5982008db1304
+        .quad 0x3feffffffffffffe, 0x3ce3b7e00422e51b
+        .quad 0x3feffffffffffffe, 0x3ce200c898d9ee3e
+        .quad 0x3fefffffffffffff, 0x3ce06f5f7eb65a56
+        .quad 0x3fefffffffffffff, 0x3cde00e9148a1d25
+        .quad 0x3fefffffffffffff, 0x3cdb623734024e92
+        .quad 0x3fefffffffffffff, 0x3cd8fd4e01891bf8
+        .quad 0x3fefffffffffffff, 0x3cd6cd44c7470d89
+        .quad 0x3fefffffffffffff, 0x3cd4cd9c04158cd7
+        .quad 0x3fefffffffffffff, 0x3cd2fa34bf5c8344
+        .quad 0x3fefffffffffffff, 0x3cd14f4890ff2461
+        .quad 0x3fefffffffffffff, 0x3ccf92c49dfa4df5
+        .quad 0x3fefffffffffffff, 0x3ccccaaea71ab0df
+        .quad 0x3fefffffffffffff, 0x3cca40829f001197
+        .quad 0x3ff0000000000000, 0x3cc7eef13b59e96c
+        .quad 0x3ff0000000000000, 0x3cc5d11e1a252bf5
+        .quad 0x3ff0000000000000, 0x3cc3e296303b2297
+        .quad 0x3ff0000000000000, 0x3cc21f47009f43ce
+        .quad 0x3ff0000000000000, 0x3cc083768c5e4542
+        .quad 0x3ff0000000000000, 0x3cbe1777d831265f
+        .quad 0x3ff0000000000000, 0x3cbb69f10b0191b5
+        .quad 0x3ff0000000000000, 0x3cb8f8a3a05b5b53
+        .quad 0x3ff0000000000000, 0x3cb6be573c40c8e7
+        .quad 0x3ff0000000000000, 0x3cb4b645ba991fdb
+        .align 64
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff  /* _AbsMask */
+        .align 64
+        .quad 0x4017f80000000000, 0x4017f80000000000, 0x4017f80000000000, 0x4017f80000000000, 0x4017f80000000000, 0x4017f80000000000, 0x4017f80000000000, 0x4017f80000000000  /* _MaxThreshold = 6.0 - 1.0/128.0 */
+        .align 64
+        .quad 0x42c0000000000000, 0x42c0000000000000, 0x42c0000000000000, 0x42c0000000000000, 0x42c0000000000000, 0x42c0000000000000, 0x42c0000000000000, 0x42c0000000000000  /* SRound */
+        .align 64
+        .quad 0x2ff0000000000000, 0x2ff0000000000000, 0x2ff0000000000000, 0x2ff0000000000000, 0x2ff0000000000000, 0x2ff0000000000000, 0x2ff0000000000000, 0x2ff0000000000000  /* _U2Threshold */
+        .align 64
+        .quad 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5  /* _poly_1_0 */
+        .align 64
+        .quad 0x3fc1111235a363b1, 0x3fc1111235a363b1, 0x3fc1111235a363b1, 0x3fc1111235a363b1, 0x3fc1111235a363b1, 0x3fc1111235a363b1, 0x3fc1111235a363b1, 0x3fc1111235a363b1  /* _poly_1_1 */
+        .align 64
+        .quad 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57  /* _poly_3_0 */
+        .align 64
+        .quad 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8  /* _poly_3_1 */
+        .align 64
+        .quad 0xbfc5555800001B4F, 0xbfc5555800001B4F, 0xbfc5555800001B4F, 0xbfc5555800001B4F, 0xbfc5555800001B4F, 0xbfc5555800001B4F, 0xbfc5555800001B4F, 0xbfc5555800001B4F  /* _poly_5_0 */
+        .align 64
+        .quad 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122  /* _poly_5_1 */
+        .align 64
+        .quad 0xbfd55555555547f6, 0xbfd55555555547f6, 0xbfd55555555547f6, 0xbfd55555555547f6, 0xbfd55555555547f6, 0xbfd55555555547f6, 0xbfd55555555547f6, 0xbfd55555555547f6  /* _poly_1_2 */
+        .align 64
+        .quad 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd  /* _poly_3_2 */
+        .align 64
+        .quad 0x3fe5555555554b0c, 0x3fe5555555554b0c, 0x3fe5555555554b0c, 0x3fe5555555554b0c, 0x3fe5555555554b0c, 0x3fe5555555554b0c, 0x3fe5555555554b0c, 0x3fe5555555554b0c  /* _poly_1_3 */
+        .align 64
+        .quad 0xbfd5555555555555, 0xbfd5555555555555, 0xbfd5555555555555, 0xbfd5555555555555, 0xbfd5555555555555, 0xbfd5555555555555, 0xbfd5555555555555, 0xbfd5555555555555  /* _poly_3_3 */
+        .align 64
+        .quad 0x00000000ffffffff, 0x00000000ffffffff, 0x00000000ffffffff, 0x00000000ffffffff, 0x00000000ffffffff, 0x00000000ffffffff, 0x00000000ffffffff, 0x00000000ffffffff  /* _Mask32 */
+        .align 64
+        .type	__svml_derf_data_internal,@object
+        .size	__svml_derf_data_internal,.-__svml_derf_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core-avx2.S
new file mode 100644
index 0000000000..852a247f83
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized erff.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN16v_erff _ZGVeN16v_erff_avx2_wrapper
+#include "../svml_s_erff16_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core.c
new file mode 100644
index 0000000000..5714eaf023
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized erff, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN16v_erff
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN16v_erff, __GI__ZGVeN16v_erff,
+	       __redirect__ZGVeN16v_erff)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core_avx512.S
new file mode 100644
index 0000000000..5cdc8a77f7
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core_avx512.S
@@ -0,0 +1,185 @@
+/* Function erff vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   erf(x) is computed as a simple polynomial, evaluated in higher
+ *   (double) precision, with no lookup table:
+ *
+ *     R = P0 + x^2*(P1 + x^2*(P2 + .... x^2*P12));
+ *     erf(x) = R * R * x;
+ *
+ *   Special cases:
+ *
+ *   erf(0)    = 0
+ *   erf(+INF) = +1
+ *   erf(-INF) = -1
+ *   erf(QNaN) = QNaN
+ *   erf(SNaN) = QNaN
+ *
+ */
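+
+/* For reference, a scalar sketch of the same scheme in C (illustrative
+   only, not part of the build): p[] stands for the _gf_la_poly_0 ..
+   _gf_la_poly_12 coefficients defined below, special values and the
+   _gf_MaxThreshold_LA cut-off are not handled, and evaluation order and
+   rounding differ from the vector code.
+
+     static float
+     erff_poly_sketch (float x, const double p[13])
+     {
+       double t = (double) x * (double) x;
+       double r = p[12];
+       for (int i = 11; i >= 0; i--)
+         r = r * t + p[i];              // R = P0 + t*(P1 + ... + t*P12)
+       return (float) (r * r) * x;      // erf(x) = R * R * x
+     }
+
+   In the vector code below, lanes whose x*x exceeds _gf_MaxThreshold_LA
+   instead keep copysign (1.0, x) via the k1 merge mask.  */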
+
+/* Offsets for data table __svml_serf_data_internal
+ */
+#define _AbsMask                      	0
+#define _One                          	64
+#define _gf_MaxThreshold_LA           	128
+#define _gf_la_poly_0                 	192
+#define _gf_la_poly_1                 	256
+#define _gf_la_poly_2                 	320
+#define _gf_la_poly_3                 	384
+#define _gf_la_poly_4                 	448
+#define _gf_la_poly_5                 	512
+#define _gf_la_poly_6                 	576
+#define _gf_la_poly_7                 	640
+#define _gf_la_poly_8                 	704
+#define _gf_la_poly_9                 	768
+#define _gf_la_poly_10                	832
+#define _gf_la_poly_11                	896
+#define _gf_la_poly_12                	960
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN16v_erff_skx)
+        vmovaps   %zmm0, %zmm8
+        vmulps    {rn-sae}, %zmm8, %zmm8, %zmm11
+        vmovups   _gf_la_poly_11+__svml_serf_data_internal(%rip), %zmm15
+        vmovups   _gf_la_poly_12+__svml_serf_data_internal(%rip), %zmm10
+        vmovups   _gf_la_poly_10+__svml_serf_data_internal(%rip), %zmm9
+        vmovups   _gf_la_poly_9+__svml_serf_data_internal(%rip), %zmm7
+        vmovups   _gf_la_poly_8+__svml_serf_data_internal(%rip), %zmm0
+        vmovups   _gf_la_poly_7+__svml_serf_data_internal(%rip), %zmm1
+        vmovups   _gf_la_poly_6+__svml_serf_data_internal(%rip), %zmm2
+        vmovups   _gf_la_poly_5+__svml_serf_data_internal(%rip), %zmm3
+        vmovups   _gf_la_poly_4+__svml_serf_data_internal(%rip), %zmm4
+        vmovups   _gf_la_poly_3+__svml_serf_data_internal(%rip), %zmm5
+        vmovups   _gf_la_poly_2+__svml_serf_data_internal(%rip), %zmm6
+        vextractf32x8 $1, %zmm8, %ymm13
+        vcvtps2pd {sae}, %ymm8, %zmm12
+        vcvtps2pd {sae}, %ymm13, %zmm14
+        vmulpd    {rn-sae}, %zmm12, %zmm12, %zmm12
+        vmulpd    {rn-sae}, %zmm14, %zmm14, %zmm13
+
+/* R = P0 + x^2*(P1 + x^2*(P2 + .... x^2*P12)); */
+        vmovaps   %zmm15, %zmm14
+        vfmadd231pd {rn-sae}, %zmm12, %zmm10, %zmm14
+        vfmadd231pd {rn-sae}, %zmm13, %zmm10, %zmm15
+        vmovups   _gf_la_poly_1+__svml_serf_data_internal(%rip), %zmm10
+        vfmadd213pd {rn-sae}, %zmm9, %zmm12, %zmm14
+        vfmadd231pd {rn-sae}, %zmm13, %zmm15, %zmm9
+        vfmadd213pd {rn-sae}, %zmm7, %zmm12, %zmm14
+        vfmadd231pd {rn-sae}, %zmm13, %zmm9, %zmm7
+        vfmadd213pd {rn-sae}, %zmm0, %zmm12, %zmm14
+        vfmadd231pd {rn-sae}, %zmm13, %zmm7, %zmm0
+        vmovups   _gf_MaxThreshold_LA+__svml_serf_data_internal(%rip), %zmm7
+        vfmadd213pd {rn-sae}, %zmm1, %zmm12, %zmm14
+        vfmadd231pd {rn-sae}, %zmm13, %zmm0, %zmm1
+        vmovups   _gf_la_poly_0+__svml_serf_data_internal(%rip), %zmm0
+        vcmpps    $22, {sae}, %zmm11, %zmm7, %k1
+        vfmadd213pd {rn-sae}, %zmm2, %zmm12, %zmm14
+        vfmadd231pd {rn-sae}, %zmm13, %zmm1, %zmm2
+        vfmadd213pd {rn-sae}, %zmm3, %zmm12, %zmm14
+        vfmadd231pd {rn-sae}, %zmm13, %zmm2, %zmm3
+        vfmadd213pd {rn-sae}, %zmm4, %zmm12, %zmm14
+        vfmadd231pd {rn-sae}, %zmm13, %zmm3, %zmm4
+        vfmadd213pd {rn-sae}, %zmm5, %zmm12, %zmm14
+        vfmadd231pd {rn-sae}, %zmm13, %zmm4, %zmm5
+        vfmadd213pd {rn-sae}, %zmm6, %zmm12, %zmm14
+        vfmadd231pd {rn-sae}, %zmm13, %zmm5, %zmm6
+        vmovups   _AbsMask+__svml_serf_data_internal(%rip), %zmm5
+        vfmadd213pd {rn-sae}, %zmm10, %zmm12, %zmm14
+        vfmadd231pd {rn-sae}, %zmm13, %zmm6, %zmm10
+        vandnps   %zmm8, %zmm5, %zmm6
+        vfmadd213pd {rn-sae}, %zmm0, %zmm14, %zmm12
+        vfmadd213pd {rn-sae}, %zmm0, %zmm10, %zmm13
+        vorps     _One+__svml_serf_data_internal(%rip), %zmm6, %zmm0
+        vmulpd    {rn-sae}, %zmm12, %zmm12, %zmm1
+        vmulpd    {rn-sae}, %zmm13, %zmm13, %zmm3
+        vcvtpd2ps {rn-sae}, %zmm1, %ymm2
+        vcvtpd2ps {rn-sae}, %zmm3, %ymm4
+        vinsertf32x8 $1, %ymm4, %zmm2, %zmm9
+
+/* erf(x) = R * R * x; */
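+/* The {%k1} merge below (my reading): lanes where x*x stayed under
+   _gf_MaxThreshold_LA, or where x is NaN, receive R*R*x; all other
+   lanes keep the copysign (1.0, x) value prepared in %zmm0 above.  */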
+        vmulps    {rn-sae}, %zmm8, %zmm9, %zmm0{%k1}
+        ret
+
+END(_ZGVeN16v_erff_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_serf_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(64)) VUINT32 _AbsMask[16][1];
+        __declspec(align(64)) VUINT32 _One[16][1];
+        __declspec(align(64)) VUINT32 _gf_MaxThreshold_LA[16][1];
+        __declspec(align(64)) VUINT32 _gf_la_poly_0[8][2];
+        __declspec(align(64)) VUINT32 _gf_la_poly_1[8][2];
+        __declspec(align(64)) VUINT32 _gf_la_poly_2[8][2];
+        __declspec(align(64)) VUINT32 _gf_la_poly_3[8][2];
+        __declspec(align(64)) VUINT32 _gf_la_poly_4[8][2];
+        __declspec(align(64)) VUINT32 _gf_la_poly_5[8][2];
+        __declspec(align(64)) VUINT32 _gf_la_poly_6[8][2];
+        __declspec(align(64)) VUINT32 _gf_la_poly_7[8][2];
+        __declspec(align(64)) VUINT32 _gf_la_poly_8[8][2];
+        __declspec(align(64)) VUINT32 _gf_la_poly_9[8][2];
+        __declspec(align(64)) VUINT32 _gf_la_poly_10[8][2];
+        __declspec(align(64)) VUINT32 _gf_la_poly_11[8][2];
+        __declspec(align(64)) VUINT32 _gf_la_poly_12[8][2];
+} __svml_serf_data_internal;
+#endif
+__svml_serf_data_internal:
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff  /* _AbsMask */
+        .align 64
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000  /* _One */
+        .align 64
+        .long 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a          /* _gf_MaxThreshold_LA */
+        .align 64
+        .quad 0x3ff0fefbd933b903, 0x3ff0fefbd933b903, 0x3ff0fefbd933b903, 0x3ff0fefbd933b903, 0x3ff0fefbd933b903, 0x3ff0fefbd933b903, 0x3ff0fefbd933b903, 0x3ff0fefbd933b903  /* _gf_la_poly_0 */
+        .align 64
+        .quad 0xbfc6a948101e6367, 0xbfc6a948101e6367, 0xbfc6a948101e6367, 0xbfc6a948101e6367, 0xbfc6a948101e6367, 0xbfc6a948101e6367, 0xbfc6a948101e6367, 0xbfc6a948101e6367  /* _gf_la_poly_1 */
+        .align 64
+        .quad 0x3fa3a334ce602c6b, 0x3fa3a334ce602c6b, 0x3fa3a334ce602c6b, 0x3fa3a334ce602c6b, 0x3fa3a334ce602c6b, 0x3fa3a334ce602c6b, 0x3fa3a334ce602c6b, 0x3fa3a334ce602c6b  /* _gf_la_poly_2 */
+        .align 64
+        .quad 0xbf799309ea0c81dc, 0xbf799309ea0c81dc, 0xbf799309ea0c81dc, 0xbf799309ea0c81dc, 0xbf799309ea0c81dc, 0xbf799309ea0c81dc, 0xbf799309ea0c81dc, 0xbf799309ea0c81dc  /* _gf_la_poly_3 */
+        .align 64
+        .quad 0x3f476df64a40e392, 0x3f476df64a40e392, 0x3f476df64a40e392, 0x3f476df64a40e392, 0x3f476df64a40e392, 0x3f476df64a40e392, 0x3f476df64a40e392, 0x3f476df64a40e392  /* _gf_la_poly_4 */
+        .align 64
+        .quad 0xbf0a5216b9508ede, 0xbf0a5216b9508ede, 0xbf0a5216b9508ede, 0xbf0a5216b9508ede, 0xbf0a5216b9508ede, 0xbf0a5216b9508ede, 0xbf0a5216b9508ede, 0xbf0a5216b9508ede  /* _gf_la_poly_5 */
+        .align 64
+        .quad 0x3ea5794b95c8e8a0, 0x3ea5794b95c8e8a0, 0x3ea5794b95c8e8a0, 0x3ea5794b95c8e8a0, 0x3ea5794b95c8e8a0, 0x3ea5794b95c8e8a0, 0x3ea5794b95c8e8a0, 0x3ea5794b95c8e8a0  /* _gf_la_poly_6 */
+        .align 64
+        .quad 0x3e94b6c0b485f30f, 0x3e94b6c0b485f30f, 0x3e94b6c0b485f30f, 0x3e94b6c0b485f30f, 0x3e94b6c0b485f30f, 0x3e94b6c0b485f30f, 0x3e94b6c0b485f30f, 0x3e94b6c0b485f30f  /* _gf_la_poly_7 */
+        .align 64
+        .quad 0xbe65806ce17f0523, 0xbe65806ce17f0523, 0xbe65806ce17f0523, 0xbe65806ce17f0523, 0xbe65806ce17f0523, 0xbe65806ce17f0523, 0xbe65806ce17f0523, 0xbe65806ce17f0523  /* _gf_la_poly_8 */
+        .align 64
+        .quad 0x3e2715640470db47, 0x3e2715640470db47, 0x3e2715640470db47, 0x3e2715640470db47, 0x3e2715640470db47, 0x3e2715640470db47, 0x3e2715640470db47, 0x3e2715640470db47  /* _gf_la_poly_9 */
+        .align 64
+        .quad 0xbdddcb2653d80f03, 0xbdddcb2653d80f03, 0xbdddcb2653d80f03, 0xbdddcb2653d80f03, 0xbdddcb2653d80f03, 0xbdddcb2653d80f03, 0xbdddcb2653d80f03, 0xbdddcb2653d80f03  /* _gf_la_poly_10 */
+        .align 64
+        .quad 0x3d85eadfc762d3eb, 0x3d85eadfc762d3eb, 0x3d85eadfc762d3eb, 0x3d85eadfc762d3eb, 0x3d85eadfc762d3eb, 0x3d85eadfc762d3eb, 0x3d85eadfc762d3eb, 0x3d85eadfc762d3eb  /* _gf_la_poly_11 */
+        .align 64
+        .quad 0xbd1c668a2871f0f1, 0xbd1c668a2871f0f1, 0xbd1c668a2871f0f1, 0xbd1c668a2871f0f1, 0xbd1c668a2871f0f1, 0xbd1c668a2871f0f1, 0xbd1c668a2871f0f1, 0xbd1c668a2871f0f1  /* _gf_la_poly_12 */
+        .align 64
+        .type	__svml_serf_data_internal,@object
+        .size	__svml_serf_data_internal,.-__svml_serf_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core-sse2.S
new file mode 100644
index 0000000000..651fd267a5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized erff, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN4v_erff _ZGVbN4v_erff_sse2
+#include "../svml_s_erff4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core.c
new file mode 100644
index 0000000000..02286a68c6
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized erff, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN4v_erff
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN4v_erff, __GI__ZGVbN4v_erff,
+	       __redirect__ZGVbN4v_erff)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core_sse4.S
new file mode 100644
index 0000000000..5c052f5921
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core_sse4.S
@@ -0,0 +1,664 @@
+/* Function erff vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Basic formula is
+ *    erf(x) ~ erf(x0) +
+ *              + exp(-x0*x0)*D*(1+c0+T*P1(T)+D^2*P3(T)+D^4*p5)
+ *   where D=x-x0, T=x0*D
+ *   x0 is x rounded to a specified number of fractional bits (in this case 8),
+ *    except that x0=0 for |x|<3.5/256.0 (using x0=0 for first 4 table entries)
+ *
+ *   Data table packs both erf(x0)_high and a few bits of erf(x0)_low in one
+ *   entry (in place of redundant exponent bits)
+ *
+ */
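+
+/* For reference, a scalar sketch of this scheme in C (illustrative only,
+   not part of the build).  tbl stands for the _erf_tbl pairs defined
+   below: tbl[k][0] holds erf (x0) and tbl[k][1] holds
+   2.0/sqrt(pi)*exp (-x0*x0) (with the packed low bits mentioned above),
+   where x0 appears to be k/128.0f (entry 1 matches erf (1/128)).  Only
+   the leading 1 - T term of the correction polynomial is kept and
+   special values are not handled, so this is not bit-for-bit what the
+   kernel below computes.
+
+     #include <math.h>
+
+     static float
+     erff_tbl_sketch (float x, const float tbl[][2])
+     {
+       float ax = fabsf (x);
+       if (ax > 3.9296875f)                 // _MaxThreshold
+         ax = 3.9296875f;
+       int k = (int) (ax * 128.0f + 0.5f);  // nearest table node
+       float x0 = (float) k / 128.0f;
+       float d = ax - x0;                   // D
+       float t = x0 * d;                    // T = x0*D
+       float r = tbl[k][0] + tbl[k][1] * d * (1.0f - t);
+       return copysignf (r, x);             // restore the sign of x
+     }  */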
+
+/* Offsets for data table __svml_serf_data_internal
+ */
+#define _erf_tbl                      	0
+#define _AbsMask                      	4032
+#define _MaxThreshold                 	4048
+#define _SRound                       	4064
+#define _U2Threshold                  	4080
+#define _poly3_0                      	4096
+
+/* Lookup bias for data table __svml_serf_data_internal.  */
+#define Table_Lookup_Bias               -0x3c000000
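+
+/* How the bias is used (my reading of the index computation below):
+   _SRound is 2^16, so after the clamp the float bits of x + _SRound are
+   0x47800000 + k, where k is x rounded to the nearest multiple of 1/128.
+   Shifting left by 3 turns that into 0x3c000000 + 8*k, and 8*k is the
+   byte offset of table entry k (two 32-bit words per entry).  Biasing
+   the base pointer by Table_Lookup_Bias cancels the 0x3c000000 term, so
+   the shifted bits are used directly as load offsets.  */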
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN4v_erff_sse4)
+        lea       Table_Lookup_Bias+__svml_serf_data_internal(%rip), %rdi
+        movups    _AbsMask+__svml_serf_data_internal(%rip), %xmm9
+        andps     %xmm0, %xmm9
+
+/*
+ * erf(x) rounds to 1.0 (in single precision) for x > _MaxThreshold
+ * (3.9296875), so all results can be computed in the main path
+ */
+        movaps    %xmm9, %xmm12
+
+/* save sign */
+        pxor      %xmm9, %xmm0
+        minps     _MaxThreshold+__svml_serf_data_internal(%rip), %xmm12
+
+/*
+ * vector gather:
+ * erf(x0), exp(-x0*x0)*2.0/sqrt(pi)
+ */
+        movups    _SRound+__svml_serf_data_internal(%rip), %xmm1
+        movaps    %xmm1, %xmm4
+        movups    _U2Threshold+__svml_serf_data_internal(%rip), %xmm11
+        addps     %xmm12, %xmm4
+        cmpltps   %xmm12, %xmm11
+        movaps    %xmm4, %xmm10
+        pslld     $3, %xmm4
+        pshufd    $1, %xmm4, %xmm2
+        subps     %xmm1, %xmm10
+        movd      %xmm4, %eax
+        movd      %xmm2, %edx
+        pshufd    $2, %xmm4, %xmm3
+        subps     %xmm10, %xmm12
+        movd      %xmm3, %ecx
+        andps     %xmm12, %xmm11
+
+/* D2 = Diff^2 */
+        mulps     %xmm11, %xmm11
+        mulps     %xmm12, %xmm10
+
+/* NaN fixup */
+        minps     %xmm9, %xmm12
+
+/*
+ * Start polynomial evaluation
+ * P1
+ */
+        mulps     _poly3_0+__svml_serf_data_internal(%rip), %xmm11
+        pshufd    $3, %xmm4, %xmm5
+        subps     %xmm10, %xmm11
+        movd      %xmm5, %esi
+
+/*
+ * branch-free
+ * (exp_h(x0) * Diff) * (poly + 1.0)
+ */
+        mulps     %xmm12, %xmm11
+        movslq    %eax, %rax
+        addps     %xmm11, %xmm12
+        movslq    %edx, %rdx
+        movslq    %ecx, %rcx
+        movslq    %esi, %rsi
+        movq      (%rdi,%rax), %xmm13
+        movq      (%rdi,%rdx), %xmm6
+        movq      (%rdi,%rcx), %xmm8
+        movq      (%rdi,%rsi), %xmm7
+        unpcklps  %xmm6, %xmm13
+        unpcklps  %xmm7, %xmm8
+        movaps    %xmm13, %xmm14
+        shufps    $238, %xmm8, %xmm13
+
+/* Final result */
+        mulps     %xmm12, %xmm13
+        movlhps   %xmm8, %xmm14
+        addps     %xmm13, %xmm14
+
+/* set sign */
+        orps      %xmm14, %xmm0
+        ret
+
+END(_ZGVbN4v_erff_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_serf_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(16)) VUINT32 _erf_tbl[1008][1];
+        __declspec(align(16)) VUINT32 _AbsMask[4][1];
+        __declspec(align(16)) VUINT32 _MaxThreshold[4][1];
+        __declspec(align(16)) VUINT32 _SRound[4][1];
+        __declspec(align(16)) VUINT32 _U2Threshold[4][1];
+        __declspec(align(16)) VUINT32 _poly3_0[4][1];
+} __svml_serf_data_internal;
+#endif
+__svml_serf_data_internal:
+        /*== _erf_tbl ==*/
+        .long 0x00000000, 0x3f906ebb
+        .long 0x3c106dfa, 0x3f906c79
+        .long 0x3c906bb8, 0x3f9065b4
+        .long 0x3cd89bf0, 0x3f905a6c
+        .long 0x3d1062b2, 0x3f904aa3
+        .long 0x3d3472ea, 0x3f90365a
+        .long 0x3d587d7f, 0x3f901d93
+        .long 0x3d7c8154, 0x3f900050
+        .long 0x3d903ea4, 0x3f8fde94
+        .long 0x3da2381f, 0x3f8fb862
+        .long 0x3db42c8d, 0x3f8f8dbd
+        .long 0x3dc61b5f, 0x3f8f5eab
+        .long 0x3dd80409, 0x3f8f2b2e
+        .long 0x3de9e5fc, 0x3f8ef34c
+        .long 0x3dfbc0ad, 0x3f8eb70a
+        .long 0x3e06c9c8, 0x3f8e766e
+        .long 0x3e0faf0d, 0x3f8e317d
+        .long 0x3e188fe1, 0x3f8de83e
+        .long 0x3e216bfe, 0x3f8d9ab9
+        .long 0x3e2a4321, 0x3f8d48f3
+        .long 0x3e331506, 0x3f8cf2f5
+        .long 0x3e3be169, 0x3f8c98c6
+        .long 0x3e44a808, 0x3f8c3a6f
+        .long 0x3e4d68a1, 0x3f8bd7f8
+        .long 0x3e5622f2, 0x3f8b716c
+        .long 0x3e5ed6b9, 0x3f8b06d2
+        .long 0x3e6783b7, 0x3f8a9834
+        .long 0x3e7029aa, 0x3f8a259e
+        .long 0x3e78c855, 0x3f89af18
+        .long 0x3e80afbc, 0x3f8934af
+        .long 0x3e84f76b, 0x3f88b66c
+        .long 0x3e893b19, 0x3f88345d
+        .long 0x3e8d7aa7, 0x3f87ae8b
+        .long 0x3e91b5f8, 0x3f872504
+        .long 0x3e95ecee, 0x3f8697d3
+        .long 0x3e9a1f6b, 0x3f860705
+        .long 0x3e9e4d54, 0x3f8572a8
+        .long 0x3ea2768c, 0x3f84dac8
+        .long 0x3ea69af8, 0x3f843f72
+        .long 0x3eaaba7a, 0x3f83a0b6
+        .long 0x3eaed4fa, 0x3f82fe9f
+        .long 0x3eb2ea5c, 0x3f82593e
+        .long 0x3eb6fa85, 0x3f81b0a0
+        .long 0x3ebb055d, 0x3f8104d3
+        .long 0x3ebf0aca, 0x3f8055e8
+        .long 0x3ec30ab3, 0x3f7f47d8
+        .long 0x3ec70501, 0x3f7ddddf
+        .long 0x3ecaf99b, 0x3f7c6e05
+        .long 0x3ecee869, 0x3f7af867
+        .long 0x3ed2d156, 0x3f797d26
+        .long 0x3ed6b44b, 0x3f77fc62
+        .long 0x3eda9132, 0x3f76763c
+        .long 0x3ede67f6, 0x3f74ead4
+        .long 0x3ee23882, 0x3f735a4c
+        .long 0x3ee602c2, 0x3f71c4c4
+        .long 0x3ee9c6a2, 0x3f702a5f
+        .long 0x3eed840e, 0x3f6e8b3e
+        .long 0x3ef13af5, 0x3f6ce783
+        .long 0x3ef4eb45, 0x3f6b3f51
+        .long 0x3ef894ea, 0x3f6992c9
+        .long 0x3efc37d5, 0x3f67e20f
+        .long 0x3effd3f5, 0x3f662d45
+        .long 0x3f01b49d, 0x3f64748e
+        .long 0x3f037bca, 0x3f62b80d
+        .long 0x3f053f7b, 0x3f60f7e5
+        .long 0x3f06ffa8, 0x3f5f3439
+        .long 0x3f08bc4a, 0x3f5d6d2d
+        .long 0x3f0a755a, 0x3f5ba2e3
+        .long 0x3f0c2ad3, 0x3f59d57e
+        .long 0x3f0ddcae, 0x3f580523
+        .long 0x3f0f8ae6, 0x3f5631f4
+        .long 0x3f113574, 0x3f545c14
+        .long 0x3f12dc54, 0x3f5283a7
+        .long 0x3f147f81, 0x3f50a8cf
+        .long 0x3f161ef6, 0x3f4ecbb1
+        .long 0x3f17baae, 0x3f4cec6d
+        .long 0x3f1952a6, 0x3f4b0b28
+        .long 0x3f1ae6da, 0x3f492804
+        .long 0x3f1c7745, 0x3f474323
+        .long 0x3f1e03e5, 0x3f455ca8
+        .long 0x3f1f8cb7, 0x3f4374b5
+        .long 0x3f2111b7, 0x3f418b6b
+        .long 0x3f2292e4, 0x3f3fa0ee
+        .long 0x3f24103a, 0x3f3db55e
+        .long 0x3f2589b9, 0x3f3bc8dc
+        .long 0x3f26ff5d, 0x3f39db8a
+        .long 0x3f287126, 0x3f37ed89
+        .long 0x3f29df13, 0x3f35fef8
+        .long 0x3f2b4922, 0x3f340ff9
+        .long 0x3f2caf53, 0x3f3220ab
+        .long 0x3f2e11a4, 0x3f30312e
+        .long 0x3f2f7017, 0x3f2e41a1
+        .long 0x3f30caab, 0x3f2c5223
+        .long 0x3f322160, 0x3f2a62d3
+        .long 0x3f337437, 0x3f2873cf
+        .long 0x3f34c32f, 0x3f268534
+        .long 0x3f360e4c, 0x3f249721
+        .long 0x3f37558c, 0x3f22a9b3
+        .long 0x3f3898f3, 0x3f20bd06
+        .long 0x3f39d881, 0x3f1ed137
+        .long 0x3f3b1438, 0x3f1ce661
+        .long 0x3f3c4c1b, 0x3f1afca0
+        .long 0x3f3d802c, 0x3f19140f
+        .long 0x3f3eb06c, 0x3f172cc9
+        .long 0x3f3fdce0, 0x3f1546e7
+        .long 0x3f410589, 0x3f136284
+        .long 0x3f422a6b, 0x3f117fb9
+        .long 0x3f434b89, 0x3f0f9e9e
+        .long 0x3f4468e7, 0x3f0dbf4c
+        .long 0x3f458287, 0x3f0be1db
+        .long 0x3f46986f, 0x3f0a0662
+        .long 0x3f47aaa2, 0x3f082cf7
+        .long 0x3f48b925, 0x3f0655b1
+        .long 0x3f49c3fb, 0x3f0480a6
+        .long 0x3f4acb29, 0x3f02adeb
+        .long 0x3f4bceb4, 0x3f00dd96
+        .long 0x3f4ccea1, 0x3efe1f73
+        .long 0x3f4dcaf4, 0x3efa88d5
+        .long 0x3f4ec3b4, 0x3ef6f777
+        .long 0x3f4fb8e5, 0x3ef36b80
+        .long 0x3f50aa8d, 0x3eefe513
+        .long 0x3f5198b1, 0x3eec6455
+        .long 0x3f528358, 0x3ee8e968
+        .long 0x3f536a86, 0x3ee5746d
+        .long 0x3f544e43, 0x3ee20584
+        .long 0x3f552e93, 0x3ede9ccc
+        .long 0x3f560b7e, 0x3edb3a64
+        .long 0x3f56e50a, 0x3ed7de6a
+        .long 0x3f57bb3d, 0x3ed488f8
+        .long 0x3f588e1e, 0x3ed13a2b
+        .long 0x3f595db4, 0x3ecdf21c
+        .long 0x3f5a2a05, 0x3ecab0e4
+        .long 0x3f5af318, 0x3ec7769b
+        .long 0x3f5bb8f4, 0x3ec44359
+        .long 0x3f5c7ba1, 0x3ec11733
+        .long 0x3f5d3b25, 0x3ebdf23d
+        .long 0x3f5df788, 0x3ebad48d
+        .long 0x3f5eb0d1, 0x3eb7be35
+        .long 0x3f5f6707, 0x3eb4af46
+        .long 0x3f601a32, 0x3eb1a7d3
+        .long 0x3f60ca59, 0x3eaea7ea
+        .long 0x3f617784, 0x3eabaf9a
+        .long 0x3f6221bb, 0x3ea8bef3
+        .long 0x3f62c905, 0x3ea5d600
+        .long 0x3f636d69, 0x3ea2f4ce
+        .long 0x3f640ef1, 0x3ea01b68
+        .long 0x3f64ada3, 0x3e9d49d9
+        .long 0x3f654987, 0x3e9a8029
+        .long 0x3f65e2a6, 0x3e97be62
+        .long 0x3f667906, 0x3e95048b
+        .long 0x3f670cb1, 0x3e9252aa
+        .long 0x3f679dae, 0x3e8fa8c5
+        .long 0x3f682c06, 0x3e8d06e3
+        .long 0x3f68b7bf, 0x3e8a6d05
+        .long 0x3f6940e2, 0x3e87db31
+        .long 0x3f69c778, 0x3e855168
+        .long 0x3f6a4b88, 0x3e82cfad
+        .long 0x3f6acd1a, 0x3e805600
+        .long 0x3f6b4c36, 0x3e7bc8c2
+        .long 0x3f6bc8e5, 0x3e76f5a0
+        .long 0x3f6c432f, 0x3e723298
+        .long 0x3f6cbb1b, 0x3e6d7fa5
+        .long 0x3f6d30b1, 0x3e68dcc1
+        .long 0x3f6da3fa, 0x3e6449e7
+        .long 0x3f6e14fe, 0x3e5fc70e
+        .long 0x3f6e83c4, 0x3e5b542b
+        .long 0x3f6ef055, 0x3e56f136
+        .long 0x3f6f5ab8, 0x3e529e21
+        .long 0x3f6fc2f5, 0x3e4e5adf
+        .long 0x3f702915, 0x3e4a2761
+        .long 0x3f708d1f, 0x3e460399
+        .long 0x3f70ef1b, 0x3e41ef75
+        .long 0x3f714f11, 0x3e3deae4
+        .long 0x3f71ad09, 0x3e39f5d2
+        .long 0x3f72090a, 0x3e36102b
+        .long 0x3f72631c, 0x3e3239db
+        .long 0x3f72bb46, 0x3e2e72cb
+        .long 0x3f731191, 0x3e2abae4
+        .long 0x3f736604, 0x3e27120f
+        .long 0x3f73b8a5, 0x3e237833
+        .long 0x3f74097e, 0x3e1fed36
+        .long 0x3f745895, 0x3e1c70fd
+        .long 0x3f74a5f2, 0x3e19036e
+        .long 0x3f74f19b, 0x3e15a46d
+        .long 0x3f753b98, 0x3e1253dc
+        .long 0x3f7583f1, 0x3e0f119f
+        .long 0x3f75caac, 0x3e0bdd96
+        .long 0x3f760fd1, 0x3e08b7a4
+        .long 0x3f765366, 0x3e059fa9
+        .long 0x3f769573, 0x3e029586
+        .long 0x3f76d5fe, 0x3dff3230
+        .long 0x3f77150f, 0x3df95481
+        .long 0x3f7752ab, 0x3df391b9
+        .long 0x3f778eda, 0x3dede995
+        .long 0x3f77c9a2, 0x3de85bd0
+        .long 0x3f78030a, 0x3de2e825
+        .long 0x3f783b18, 0x3ddd8e4c
+        .long 0x3f7871d3, 0x3dd84dfe
+        .long 0x3f78a741, 0x3dd326f3
+        .long 0x3f78db68, 0x3dce18e3
+        .long 0x3f790e50, 0x3dc92385
+        .long 0x3f793ffc, 0x3dc4468f
+        .long 0x3f797075, 0x3dbf81b6
+        .long 0x3f799fbf, 0x3dbad4b0
+        .long 0x3f79cde1, 0x3db63f32
+        .long 0x3f79fae1, 0x3db1c0f1
+        .long 0x3f7a26c4, 0x3dad59a1
+        .long 0x3f7a518f, 0x3da908f6
+        .long 0x3f7a7b4a, 0x3da4cea4
+        .long 0x3f7aa3f9, 0x3da0aa5e
+        .long 0x3f7acba1, 0x3d9c9bd9
+        .long 0x3f7af248, 0x3d98a2c7
+        .long 0x3f7b17f4, 0x3d94bedd
+        .long 0x3f7b3ca9, 0x3d90efcd
+        .long 0x3f7b606e, 0x3d8d354b
+        .long 0x3f7b8346, 0x3d898f0a
+        .long 0x3f7ba537, 0x3d85fcbf
+        .long 0x3f7bc646, 0x3d827e1d
+        .long 0x3f7be677, 0x3d7e25af
+        .long 0x3f7c05d1, 0x3d777546
+        .long 0x3f7c2456, 0x3d70ea68
+        .long 0x3f7c420d, 0x3d6a847d
+        .long 0x3f7c5ef9, 0x3d6442f0
+        .long 0x3f7c7b1f, 0x3d5e252a
+        .long 0x3f7c9684, 0x3d582a98
+        .long 0x3f7cb12b, 0x3d5252a5
+        .long 0x3f7ccb1a, 0x3d4c9cbd
+        .long 0x3f7ce454, 0x3d47084e
+        .long 0x3f7cfcdd, 0x3d4194c7
+        .long 0x3f7d14ba, 0x3d3c4196
+        .long 0x3f7d2bef, 0x3d370e2c
+        .long 0x3f7d427f, 0x3d31f9fb
+        .long 0x3f7d586f, 0x3d2d0474
+        .long 0x3f7d6dc2, 0x3d282d0c
+        .long 0x3f7d827b, 0x3d237336
+        .long 0x3f7d96a0, 0x3d1ed669
+        .long 0x3f7daa32, 0x3d1a561b
+        .long 0x3f7dbd36, 0x3d15f1c6
+        .long 0x3f7dcfb0, 0x3d11a8e1
+        .long 0x3f7de1a2, 0x3d0d7ae9
+        .long 0x3f7df30f, 0x3d09675a
+        .long 0x3f7e03fd, 0x3d056db0
+        .long 0x3f7e146c, 0x3d018d6b
+        .long 0x3f7e2461, 0x3cfb8c15
+        .long 0x3f7e33de, 0x3cf42e22
+        .long 0x3f7e42e8, 0x3ced0003
+        .long 0x3f7e517f, 0x3ce600c0
+        .long 0x3f7e5fa9, 0x3cdf2f67
+        .long 0x3f7e6d66, 0x3cd88b05
+        .long 0x3f7e7abb, 0x3cd212ad
+        .long 0x3f7e87aa, 0x3ccbc574
+        .long 0x3f7e9435, 0x3cc5a273
+        .long 0x3f7ea05f, 0x3cbfa8c4
+        .long 0x3f7eac2b, 0x3cb9d786
+        .long 0x3f7eb79a, 0x3cb42ddb
+        .long 0x3f7ec2b1, 0x3caeaae6
+        .long 0x3f7ecd71, 0x3ca94dcf
+        .long 0x3f7ed7dc, 0x3ca415c2
+        .long 0x3f7ee1f4, 0x3c9f01ec
+        .long 0x3f7eebbd, 0x3c9a117f
+        .long 0x3f7ef537, 0x3c9543ae
+        .long 0x3f7efe66, 0x3c9097b1
+        .long 0x3f7f074b, 0x3c8c0cc2
+        .long 0x3f7f0fe8, 0x3c87a21f
+        .long 0x3f7f1840, 0x3c83570a
+        .long 0x3f7f2053, 0x3c7e558a
+        .long 0x3f7f2826, 0x3c763931
+        .long 0x3f7f2fb8, 0x3c6e579b
+        .long 0x3f7f370c, 0x3c66af65
+        .long 0x3f7f3e23, 0x3c5f3f2d
+        .long 0x3f7f4500, 0x3c58059c
+        .long 0x3f7f4ba4, 0x3c51015f
+        .long 0x3f7f5211, 0x3c4a3127
+        .long 0x3f7f5848, 0x3c4393af
+        .long 0x3f7f5e4b, 0x3c3d27b5
+        .long 0x3f7f641b, 0x3c36ebff
+        .long 0x3f7f69ba, 0x3c30df57
+        .long 0x3f7f6f29, 0x3c2b008e
+        .long 0x3f7f746a, 0x3c254e7b
+        .long 0x3f7f797f, 0x3c1fc7fb
+        .long 0x3f7f7e67, 0x3c1a6bee
+        .long 0x3f7f8326, 0x3c15393d
+        .long 0x3f7f87bb, 0x3c102ed6
+        .long 0x3f7f8c29, 0x3c0b4bab
+        .long 0x3f7f9070, 0x3c068eb5
+        .long 0x3f7f9492, 0x3c01f6f1
+        .long 0x3f7f9890, 0x3bfb06c5
+        .long 0x3f7f9c6b, 0x3bf26625
+        .long 0x3f7fa024, 0x3bea0a1d
+        .long 0x3f7fa3bc, 0x3be1f0d3
+        .long 0x3f7fa734, 0x3bda1876
+        .long 0x3f7faa8d, 0x3bd27f42
+        .long 0x3f7fadc8, 0x3bcb237a
+        .long 0x3f7fb0e6, 0x3bc4036c
+        .long 0x3f7fb3e8, 0x3bbd1d6f
+        .long 0x3f7fb6cf, 0x3bb66fe6
+        .long 0x3f7fb99c, 0x3baff93b
+        .long 0x3f7fbc4f, 0x3ba9b7e1
+        .long 0x3f7fbeea, 0x3ba3aa56
+        .long 0x3f7fc16d, 0x3b9dcf20
+        .long 0x3f7fc3d9, 0x3b9824ce
+        .long 0x3f7fc62e, 0x3b92a9f7
+        .long 0x3f7fc86e, 0x3b8d5d3c
+        .long 0x3f7fca99, 0x3b883d46
+        .long 0x3f7fccb0, 0x3b8348c6
+        .long 0x3f7fceb4, 0x3b7cfce8
+        .long 0x3f7fd0a5, 0x3b73ba24
+        .long 0x3f7fd283, 0x3b6ac6d3
+        .long 0x3f7fd450, 0x3b622096
+        .long 0x3f7fd60c, 0x3b59c51d
+        .long 0x3f7fd7b7, 0x3b51b22a
+        .long 0x3f7fd953, 0x3b49e589
+        .long 0x3f7fdadf, 0x3b425d18
+        .long 0x3f7fdc5c, 0x3b3b16c2
+        .long 0x3f7fddcc, 0x3b341080
+        .long 0x3f7fdf2d, 0x3b2d4858
+        .long 0x3f7fe081, 0x3b26bc5e
+        .long 0x3f7fe1c8, 0x3b206ab2
+        .long 0x3f7fe303, 0x3b1a5183
+        .long 0x3f7fe431, 0x3b146f09
+        .long 0x3f7fe554, 0x3b0ec18c
+        .long 0x3f7fe66c, 0x3b09475d
+        .long 0x3f7fe77a, 0x3b03feda
+        .long 0x3f7fe87d, 0x3afdccdc
+        .long 0x3f7fe975, 0x3af3f919
+        .long 0x3f7fea65, 0x3aea7f6c
+        .long 0x3f7feb4b, 0x3ae15ce8
+        .long 0x3f7fec27, 0x3ad88eb8
+        .long 0x3f7fecfc, 0x3ad0121b
+        .long 0x3f7fedc8, 0x3ac7e464
+        .long 0x3f7fee8c, 0x3ac002f8
+        .long 0x3f7fef48, 0x3ab86b52
+        .long 0x3f7feffd, 0x3ab11afe
+        .long 0x3f7ff0aa, 0x3aaa0f9a
+        .long 0x3f7ff151, 0x3aa346d7
+        .long 0x3f7ff1f1, 0x3a9cbe77
+        .long 0x3f7ff28a, 0x3a96744c
+        .long 0x3f7ff31e, 0x3a90663b
+        .long 0x3f7ff3ab, 0x3a8a9237
+        .long 0x3f7ff433, 0x3a84f643
+        .long 0x3f7ff4b5, 0x3a7f20e7
+        .long 0x3f7ff532, 0x3a74bdd2
+        .long 0x3f7ff5aa, 0x3a6abfa9
+        .long 0x3f7ff61d, 0x3a6122ea
+        .long 0x3f7ff68b, 0x3a57e42f
+        .long 0x3f7ff6f5, 0x3a4f002c
+        .long 0x3f7ff75a, 0x3a4673af
+        .long 0x3f7ff7bb, 0x3a3e3ba2
+        .long 0x3f7ff819, 0x3a365507
+        .long 0x3f7ff872, 0x3a2ebcf6
+        .long 0x3f7ff8c7, 0x3a2770a1
+        .long 0x3f7ff919, 0x3a206d52
+        .long 0x3f7ff968, 0x3a19b066
+        .long 0x3f7ff9b3, 0x3a133754
+        .long 0x3f7ff9fb, 0x3a0cffa3
+        .long 0x3f7ffa40, 0x3a0706f4
+        .long 0x3f7ffa82, 0x3a014af8
+        .long 0x3f7ffac1, 0x39f792ea
+        .long 0x3f7ffafe, 0x39ed0088
+        .long 0x3f7ffb38, 0x39e2daa1
+        .long 0x3f7ffb6f, 0x39d91d2d
+        .long 0x3f7ffba5, 0x39cfc44a
+        .long 0x3f7ffbd7, 0x39c6cc35
+        .long 0x3f7ffc08, 0x39be314d
+        .long 0x3f7ffc36, 0x39b5f011
+        .long 0x3f7ffc63, 0x39ae051c
+        .long 0x3f7ffc8e, 0x39a66d2a
+        .long 0x3f7ffcb6, 0x399f2512
+        .long 0x3f7ffcdd, 0x399829c8
+        .long 0x3f7ffd02, 0x3991785a
+        .long 0x3f7ffd26, 0x398b0df2
+        .long 0x3f7ffd48, 0x3984e7d2
+        .long 0x3f7ffd68, 0x397e06ab
+        .long 0x3f7ffd87, 0x3972bbde
+        .long 0x3f7ffda5, 0x3967ea53
+        .long 0x3f7ffdc1, 0x395d8d4b
+        .long 0x3f7ffddc, 0x3953a034
+        .long 0x3f7ffdf6, 0x394a1ea5
+        .long 0x3f7ffe0f, 0x3941045e
+        .long 0x3f7ffe27, 0x39384d47
+        .long 0x3f7ffe3d, 0x392ff56d
+        .long 0x3f7ffe53, 0x3927f904
+        .long 0x3f7ffe67, 0x39205461
+        .long 0x3f7ffe7b, 0x391903fe
+        .long 0x3f7ffe8d, 0x39120475
+        .long 0x3f7ffe9f, 0x390b5281
+        .long 0x3f7ffeb0, 0x3904eafc
+        .long 0x3f7ffec0, 0x38fd95bd
+        .long 0x3f7ffed0, 0x38f1de7a
+        .long 0x3f7ffedf, 0x38e6aa94
+        .long 0x3f7ffeed, 0x38dbf4a3
+        .long 0x3f7ffefa, 0x38d1b776
+        .long 0x3f7fff07, 0x38c7ee0e
+        .long 0x3f7fff13, 0x38be939c
+        .long 0x3f7fff1f, 0x38b5a381
+        .long 0x3f7fff2a, 0x38ad194e
+        .long 0x3f7fff34, 0x38a4f0bc
+        .long 0x3f7fff3f, 0x389d25b0
+        .long 0x3f7fff48, 0x3895b43b
+        .long 0x3f7fff51, 0x388e9890
+        .long 0x3f7fff5a, 0x3887cf0e
+        .long 0x3f7fff62, 0x38815434
+        .long 0x3f7fff6a, 0x3876494d
+        .long 0x3f7fff72, 0x386a7a5a
+        .long 0x3f7fff79, 0x385f355e
+        .long 0x3f7fff80, 0x38547466
+        .long 0x3f7fff86, 0x384a31bf
+        .long 0x3f7fff8c, 0x384067ee
+        .long 0x3f7fff92, 0x383711b4
+        .long 0x3f7fff98, 0x382e2a06
+        .long 0x3f7fff9d, 0x3825ac0e
+        .long 0x3f7fffa2, 0x381d9329
+        .long 0x3f7fffa7, 0x3815dae6
+        .long 0x3f7fffab, 0x380e7f01
+        .long 0x3f7fffb0, 0x38077b62
+        .long 0x3f7fffb4, 0x3800cc21
+        .long 0x3f7fffb8, 0x37f4daf4
+        .long 0x3f7fffbc, 0x37e8b7ac
+        .long 0x3f7fffbf, 0x37dd2782
+        .long 0x3f7fffc2, 0x37d223dc
+        .long 0x3f7fffc6, 0x37c7a666
+        .long 0x3f7fffc9, 0x37bda912
+        .long 0x3f7fffcc, 0x37b42611
+        .long 0x3f7fffce, 0x37ab17d6
+        .long 0x3f7fffd1, 0x37a2790f
+        .long 0x3f7fffd3, 0x379a44a5
+        .long 0x3f7fffd6, 0x379275b9
+        .long 0x3f7fffd8, 0x378b07a2
+        .long 0x3f7fffda, 0x3783f5e9
+        .long 0x3f7fffdc, 0x377a7897
+        .long 0x3f7fffde, 0x376dad68
+        .long 0x3f7fffe0, 0x37618278
+        .long 0x3f7fffe2, 0x3755f04f
+        .long 0x3f7fffe3, 0x374aefcc
+        .long 0x3f7fffe5, 0x37407a1d
+        .long 0x3f7fffe6, 0x373688bc
+        .long 0x3f7fffe8, 0x372d1570
+        .long 0x3f7fffe9, 0x37241a44
+        .long 0x3f7fffea, 0x371b9188
+        .long 0x3f7fffeb, 0x371375cf
+        .long 0x3f7fffec, 0x370bc1e7
+        .long 0x3f7fffee, 0x370470dd
+        .long 0x3f7fffef, 0x36fafbec
+        .long 0x3f7fffef, 0x36edc95b
+        .long 0x3f7ffff0, 0x36e14167
+        .long 0x3f7ffff1, 0x36d55bd6
+        .long 0x3f7ffff2, 0x36ca10ce
+        .long 0x3f7ffff3, 0x36bf58d1
+        .long 0x3f7ffff4, 0x36b52cb9
+        .long 0x3f7ffff4, 0x36ab85b5
+        .long 0x3f7ffff5, 0x36a25d43
+        .long 0x3f7ffff5, 0x3699ad31
+        .long 0x3f7ffff6, 0x36916f95
+        .long 0x3f7ffff7, 0x36899ecb
+        .long 0x3f7ffff7, 0x36823575
+        .long 0x3f7ffff8, 0x36765ce8
+        .long 0x3f7ffff8, 0x366909cc
+        .long 0x3f7ffff9, 0x365c684a
+        .long 0x3f7ffff9, 0x36506f88
+        .long 0x3f7ffff9, 0x36451713
+        .long 0x3f7ffffa, 0x363a56e4
+        .long 0x3f7ffffa, 0x36302754
+        .long 0x3f7ffffa, 0x36268119
+        .long 0x3f7ffffb, 0x361d5d43
+        .long 0x3f7ffffb, 0x3614b538
+        .long 0x3f7ffffb, 0x360c82b1
+        .long 0x3f7ffffc, 0x3604bfb1
+        .long 0x3f7ffffc, 0x35facd10
+        .long 0x3f7ffffc, 0x35ece39b
+        .long 0x3f7ffffc, 0x35dfb8b6
+        .long 0x3f7ffffd, 0x35d34296
+        .long 0x3f7ffffd, 0x35c777ec
+        .long 0x3f7ffffd, 0x35bc4fdc
+        .long 0x3f7ffffd, 0x35b1c1fc
+        .long 0x3f7ffffd, 0x35a7c64b
+        .long 0x3f7ffffd, 0x359e5531
+        .long 0x3f7ffffe, 0x35956771
+        .long 0x3f7ffffe, 0x358cf630
+        .long 0x3f7ffffe, 0x3584fae8
+        .long 0x3f7ffffe, 0x357adecb
+        .long 0x3f7ffffe, 0x356c9b8f
+        .long 0x3f7ffffe, 0x355f20ef
+        .long 0x3f7ffffe, 0x3552644f
+        .long 0x3f7ffffe, 0x35465b9c
+        .long 0x3f7fffff, 0x353afd47
+        .long 0x3f7fffff, 0x3530403c
+        .long 0x3f7fffff, 0x35261be0
+        .long 0x3f7fffff, 0x351c8807
+        .long 0x3f7fffff, 0x35137cf0
+        .long 0x3f7fffff, 0x350af341
+        .long 0x3f7fffff, 0x3502e402
+        .long 0x3f7fffff, 0x34f6912a
+        .long 0x3f7fffff, 0x34e8356b
+        .long 0x3f7fffff, 0x34daa8e4
+        .long 0x3f7fffff, 0x34cde050
+        .long 0x3f7fffff, 0x34c1d100
+        .long 0x3f7fffff, 0x34b670d5
+        .long 0x3f7fffff, 0x34abb639
+        .long 0x3f7fffff, 0x34a19816
+        .long 0x3f7fffff, 0x34980dd1
+        .long 0x3f7fffff, 0x348f0f43
+        .long 0x3f7fffff, 0x348694b3
+        .long 0x3f800000, 0x347d2da8
+        .long 0x3f800000, 0x346e1d72
+        .align 16
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff  /* _AbsMask */
+        .align 16
+        .long 0x407b8000, 0x407b8000, 0x407b8000, 0x407b8000  /* _MaxThreshold */
+        .align 16
+        .long 0x47800000, 0x47800000, 0x47800000, 0x47800000  /* _SRound */
+        .align 16
+        .long 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000  /* _U2Threshold */
+        .align 16
+        .long 0xbeaaaade, 0xbeaaaade, 0xbeaaaade, 0xbeaaaade  /* _poly3_0 */
+        .align 16
+        .type	__svml_serf_data_internal,@object
+        .size	__svml_serf_data_internal,.-__svml_serf_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core-sse.S
new file mode 100644
index 0000000000..4b939f8c55
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized erff, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN8v_erff _ZGVdN8v_erff_sse_wrapper
+#include "../svml_s_erff8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core.c
new file mode 100644
index 0000000000..50f5901db1
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized erff, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN8v_erff
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN8v_erff, __GI__ZGVdN8v_erff,
+	       __redirect__ZGVdN8v_erff)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core_avx2.S
new file mode 100644
index 0000000000..4cd82b45e9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core_avx2.S
@@ -0,0 +1,669 @@
+/* Function erff vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Basic formula is
+ *    erf(x) ~ erf(x0) +
+ *              + exp(-x0*x0)*D*(1+c0+T*P1(T)+D^2*P3(T)+D^4*p5)
+ *   where D=x-x0, T=x0*D
+ *   x0 is x rounded to a specified number of fractional bits (in this case 8),
+ *    except that x0=0 for |x|<3.5/256.0 (using x0=0 for first 4 table entries)
+ *
+ *   Data table packs both erf(x0)_high and a few bits of erf(x0)_low in one
+ *   entry (in place of redundant exponent bits)
+ *
+ */
+
+/* Offsets for data table __svml_serf_data_internal
+ */
+#define _erf_tbl                      	0
+#define _AbsMask                      	4032
+#define _MaxThreshold                 	4064
+#define _SRound                       	4096
+#define _U2Threshold                  	4128
+#define _poly3_0                      	4160
+
+/* Lookup bias for data table __svml_serf_data_internal.  */
+#define Table_Lookup_Bias               -0x3c000000
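+
+/* Worked example of the biased lookup (my reading; same scheme as the
+   SSE4 version): for x = 1.0, x + _SRound = 65537.0 has float bits
+   0x47800080 = 0x47800000 + 128; shifted left by 3 that is 0x3c000400,
+   and adding it to the base address biased by Table_Lookup_Bias yields
+   __svml_serf_data_internal + 0x400, the 8-byte entry for x0 = 1.0
+   (index 128).  */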
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN8v_erff_avx2)
+        lea       Table_Lookup_Bias+__svml_serf_data_internal(%rip), %rax
+
+/*
+ * vector gather:
+ * erf(x0), exp(-x0*x0)*2.0/sqrt(pi)
+ */
+        vmovups   _SRound+__svml_serf_data_internal(%rip), %ymm7
+        vandps    _AbsMask+__svml_serf_data_internal(%rip), %ymm0, %ymm6
+
+/*
+ * erf(x) rounds to 1.0 (in single precision) for x > _MaxThreshold
+ * (3.9296875), so all results can be computed in the main path
+ */
+        vminps    _MaxThreshold+__svml_serf_data_internal(%rip), %ymm6, %ymm8
+        vaddps    %ymm7, %ymm8, %ymm10
+        vcmpgt_oqps _U2Threshold+__svml_serf_data_internal(%rip), %ymm8, %ymm9
+        vpslld    $3, %ymm10, %ymm11
+        vsubps    %ymm7, %ymm10, %ymm4
+        vsubps    %ymm4, %ymm8, %ymm3
+        vandps    %ymm9, %ymm3, %ymm2
+
+/* NaN fixup */
+        vminps    %ymm6, %ymm3, %ymm3
+
+/* D2 = Diff^2 */
+        vmulps    %ymm2, %ymm2, %ymm2
+
+/* save sign */
+        vxorps    %ymm0, %ymm6, %ymm5
+        vmovd     %xmm11, %edx
+        vextractf128 $1, %ymm11, %xmm12
+        vpextrd   $2, %xmm11, %esi
+        movslq    %edx, %rdx
+        movslq    %esi, %rsi
+        vmovd     %xmm12, %r8d
+        vmovq     (%rax,%rdx), %xmm13
+        vmovq     (%rax,%rsi), %xmm14
+        vunpcklps %xmm14, %xmm13, %xmm10
+        vmovups   _poly3_0+__svml_serf_data_internal(%rip), %ymm14
+        vpextrd   $1, %xmm11, %ecx
+        vpextrd   $3, %xmm11, %edi
+        vpextrd   $1, %xmm12, %r9d
+        vpextrd   $2, %xmm12, %r10d
+        vpextrd   $3, %xmm12, %r11d
+
+/*
+ * Start polynomial evaluation
+ * P1
+ */
+        vfmsub231ps %ymm14, %ymm3, %ymm4
+        movslq    %ecx, %rcx
+        movslq    %edi, %rdi
+        movslq    %r8d, %r8
+        movslq    %r9d, %r9
+        movslq    %r10d, %r10
+        movslq    %r11d, %r11
+        vmovq     (%rax,%rcx), %xmm1
+        vmovq     (%rax,%rdi), %xmm15
+
+/*
+ * branch-free
+ * (exp_h(x0) * Diff) * (poly + 1.0)
+ */
+        vfmadd213ps %ymm3, %ymm2, %ymm4
+        vmovq     (%rax,%r8), %xmm7
+        vmovq     (%rax,%r9), %xmm0
+        vmovq     (%rax,%r10), %xmm8
+        vmovq     (%rax,%r11), %xmm9
+        vunpcklps %xmm15, %xmm1, %xmm11
+        vunpcklps %xmm8, %xmm7, %xmm1
+        vunpcklps %xmm9, %xmm0, %xmm0
+        vinsertf128 $1, %xmm1, %ymm10, %ymm12
+        vinsertf128 $1, %xmm0, %ymm11, %ymm13
+        vunpcklps %ymm13, %ymm12, %ymm0
+        vunpckhps %ymm13, %ymm12, %ymm15
+
+/* Final result */
+        vfmadd213ps %ymm0, %ymm15, %ymm4
+
+/* set sign */
+        vorps     %ymm5, %ymm4, %ymm0
+        ret
+
+END(_ZGVdN8v_erff_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_serf_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(32)) VUINT32 _erf_tbl[1008][1];
+        __declspec(align(32)) VUINT32 _AbsMask[8][1];
+        __declspec(align(32)) VUINT32 _MaxThreshold[8][1];
+        __declspec(align(32)) VUINT32 _SRound[8][1];
+        __declspec(align(32)) VUINT32 _U2Threshold[8][1];
+        __declspec(align(32)) VUINT32 _poly3_0[8][1];
+} __svml_serf_data_internal;
+#endif
+__svml_serf_data_internal:
+        /*== _erf_tbl ==*/
+        .long 0x00000000, 0x3f906ebb
+        .long 0x3c106dfa, 0x3f906c79
+        .long 0x3c906bb8, 0x3f9065b4
+        .long 0x3cd89bf0, 0x3f905a6c
+        .long 0x3d1062b2, 0x3f904aa3
+        .long 0x3d3472ea, 0x3f90365a
+        .long 0x3d587d7f, 0x3f901d93
+        .long 0x3d7c8154, 0x3f900050
+        .long 0x3d903ea4, 0x3f8fde94
+        .long 0x3da2381f, 0x3f8fb862
+        .long 0x3db42c8d, 0x3f8f8dbd
+        .long 0x3dc61b5f, 0x3f8f5eab
+        .long 0x3dd80409, 0x3f8f2b2e
+        .long 0x3de9e5fc, 0x3f8ef34c
+        .long 0x3dfbc0ad, 0x3f8eb70a
+        .long 0x3e06c9c8, 0x3f8e766e
+        .long 0x3e0faf0d, 0x3f8e317d
+        .long 0x3e188fe1, 0x3f8de83e
+        .long 0x3e216bfe, 0x3f8d9ab9
+        .long 0x3e2a4321, 0x3f8d48f3
+        .long 0x3e331506, 0x3f8cf2f5
+        .long 0x3e3be169, 0x3f8c98c6
+        .long 0x3e44a808, 0x3f8c3a6f
+        .long 0x3e4d68a1, 0x3f8bd7f8
+        .long 0x3e5622f2, 0x3f8b716c
+        .long 0x3e5ed6b9, 0x3f8b06d2
+        .long 0x3e6783b7, 0x3f8a9834
+        .long 0x3e7029aa, 0x3f8a259e
+        .long 0x3e78c855, 0x3f89af18
+        .long 0x3e80afbc, 0x3f8934af
+        .long 0x3e84f76b, 0x3f88b66c
+        .long 0x3e893b19, 0x3f88345d
+        .long 0x3e8d7aa7, 0x3f87ae8b
+        .long 0x3e91b5f8, 0x3f872504
+        .long 0x3e95ecee, 0x3f8697d3
+        .long 0x3e9a1f6b, 0x3f860705
+        .long 0x3e9e4d54, 0x3f8572a8
+        .long 0x3ea2768c, 0x3f84dac8
+        .long 0x3ea69af8, 0x3f843f72
+        .long 0x3eaaba7a, 0x3f83a0b6
+        .long 0x3eaed4fa, 0x3f82fe9f
+        .long 0x3eb2ea5c, 0x3f82593e
+        .long 0x3eb6fa85, 0x3f81b0a0
+        .long 0x3ebb055d, 0x3f8104d3
+        .long 0x3ebf0aca, 0x3f8055e8
+        .long 0x3ec30ab3, 0x3f7f47d8
+        .long 0x3ec70501, 0x3f7ddddf
+        .long 0x3ecaf99b, 0x3f7c6e05
+        .long 0x3ecee869, 0x3f7af867
+        .long 0x3ed2d156, 0x3f797d26
+        .long 0x3ed6b44b, 0x3f77fc62
+        .long 0x3eda9132, 0x3f76763c
+        .long 0x3ede67f6, 0x3f74ead4
+        .long 0x3ee23882, 0x3f735a4c
+        .long 0x3ee602c2, 0x3f71c4c4
+        .long 0x3ee9c6a2, 0x3f702a5f
+        .long 0x3eed840e, 0x3f6e8b3e
+        .long 0x3ef13af5, 0x3f6ce783
+        .long 0x3ef4eb45, 0x3f6b3f51
+        .long 0x3ef894ea, 0x3f6992c9
+        .long 0x3efc37d5, 0x3f67e20f
+        .long 0x3effd3f5, 0x3f662d45
+        .long 0x3f01b49d, 0x3f64748e
+        .long 0x3f037bca, 0x3f62b80d
+        .long 0x3f053f7b, 0x3f60f7e5
+        .long 0x3f06ffa8, 0x3f5f3439
+        .long 0x3f08bc4a, 0x3f5d6d2d
+        .long 0x3f0a755a, 0x3f5ba2e3
+        .long 0x3f0c2ad3, 0x3f59d57e
+        .long 0x3f0ddcae, 0x3f580523
+        .long 0x3f0f8ae6, 0x3f5631f4
+        .long 0x3f113574, 0x3f545c14
+        .long 0x3f12dc54, 0x3f5283a7
+        .long 0x3f147f81, 0x3f50a8cf
+        .long 0x3f161ef6, 0x3f4ecbb1
+        .long 0x3f17baae, 0x3f4cec6d
+        .long 0x3f1952a6, 0x3f4b0b28
+        .long 0x3f1ae6da, 0x3f492804
+        .long 0x3f1c7745, 0x3f474323
+        .long 0x3f1e03e5, 0x3f455ca8
+        .long 0x3f1f8cb7, 0x3f4374b5
+        .long 0x3f2111b7, 0x3f418b6b
+        .long 0x3f2292e4, 0x3f3fa0ee
+        .long 0x3f24103a, 0x3f3db55e
+        .long 0x3f2589b9, 0x3f3bc8dc
+        .long 0x3f26ff5d, 0x3f39db8a
+        .long 0x3f287126, 0x3f37ed89
+        .long 0x3f29df13, 0x3f35fef8
+        .long 0x3f2b4922, 0x3f340ff9
+        .long 0x3f2caf53, 0x3f3220ab
+        .long 0x3f2e11a4, 0x3f30312e
+        .long 0x3f2f7017, 0x3f2e41a1
+        .long 0x3f30caab, 0x3f2c5223
+        .long 0x3f322160, 0x3f2a62d3
+        .long 0x3f337437, 0x3f2873cf
+        .long 0x3f34c32f, 0x3f268534
+        .long 0x3f360e4c, 0x3f249721
+        .long 0x3f37558c, 0x3f22a9b3
+        .long 0x3f3898f3, 0x3f20bd06
+        .long 0x3f39d881, 0x3f1ed137
+        .long 0x3f3b1438, 0x3f1ce661
+        .long 0x3f3c4c1b, 0x3f1afca0
+        .long 0x3f3d802c, 0x3f19140f
+        .long 0x3f3eb06c, 0x3f172cc9
+        .long 0x3f3fdce0, 0x3f1546e7
+        .long 0x3f410589, 0x3f136284
+        .long 0x3f422a6b, 0x3f117fb9
+        .long 0x3f434b89, 0x3f0f9e9e
+        .long 0x3f4468e7, 0x3f0dbf4c
+        .long 0x3f458287, 0x3f0be1db
+        .long 0x3f46986f, 0x3f0a0662
+        .long 0x3f47aaa2, 0x3f082cf7
+        .long 0x3f48b925, 0x3f0655b1
+        .long 0x3f49c3fb, 0x3f0480a6
+        .long 0x3f4acb29, 0x3f02adeb
+        .long 0x3f4bceb4, 0x3f00dd96
+        .long 0x3f4ccea1, 0x3efe1f73
+        .long 0x3f4dcaf4, 0x3efa88d5
+        .long 0x3f4ec3b4, 0x3ef6f777
+        .long 0x3f4fb8e5, 0x3ef36b80
+        .long 0x3f50aa8d, 0x3eefe513
+        .long 0x3f5198b1, 0x3eec6455
+        .long 0x3f528358, 0x3ee8e968
+        .long 0x3f536a86, 0x3ee5746d
+        .long 0x3f544e43, 0x3ee20584
+        .long 0x3f552e93, 0x3ede9ccc
+        .long 0x3f560b7e, 0x3edb3a64
+        .long 0x3f56e50a, 0x3ed7de6a
+        .long 0x3f57bb3d, 0x3ed488f8
+        .long 0x3f588e1e, 0x3ed13a2b
+        .long 0x3f595db4, 0x3ecdf21c
+        .long 0x3f5a2a05, 0x3ecab0e4
+        .long 0x3f5af318, 0x3ec7769b
+        .long 0x3f5bb8f4, 0x3ec44359
+        .long 0x3f5c7ba1, 0x3ec11733
+        .long 0x3f5d3b25, 0x3ebdf23d
+        .long 0x3f5df788, 0x3ebad48d
+        .long 0x3f5eb0d1, 0x3eb7be35
+        .long 0x3f5f6707, 0x3eb4af46
+        .long 0x3f601a32, 0x3eb1a7d3
+        .long 0x3f60ca59, 0x3eaea7ea
+        .long 0x3f617784, 0x3eabaf9a
+        .long 0x3f6221bb, 0x3ea8bef3
+        .long 0x3f62c905, 0x3ea5d600
+        .long 0x3f636d69, 0x3ea2f4ce
+        .long 0x3f640ef1, 0x3ea01b68
+        .long 0x3f64ada3, 0x3e9d49d9
+        .long 0x3f654987, 0x3e9a8029
+        .long 0x3f65e2a6, 0x3e97be62
+        .long 0x3f667906, 0x3e95048b
+        .long 0x3f670cb1, 0x3e9252aa
+        .long 0x3f679dae, 0x3e8fa8c5
+        .long 0x3f682c06, 0x3e8d06e3
+        .long 0x3f68b7bf, 0x3e8a6d05
+        .long 0x3f6940e2, 0x3e87db31
+        .long 0x3f69c778, 0x3e855168
+        .long 0x3f6a4b88, 0x3e82cfad
+        .long 0x3f6acd1a, 0x3e805600
+        .long 0x3f6b4c36, 0x3e7bc8c2
+        .long 0x3f6bc8e5, 0x3e76f5a0
+        .long 0x3f6c432f, 0x3e723298
+        .long 0x3f6cbb1b, 0x3e6d7fa5
+        .long 0x3f6d30b1, 0x3e68dcc1
+        .long 0x3f6da3fa, 0x3e6449e7
+        .long 0x3f6e14fe, 0x3e5fc70e
+        .long 0x3f6e83c4, 0x3e5b542b
+        .long 0x3f6ef055, 0x3e56f136
+        .long 0x3f6f5ab8, 0x3e529e21
+        .long 0x3f6fc2f5, 0x3e4e5adf
+        .long 0x3f702915, 0x3e4a2761
+        .long 0x3f708d1f, 0x3e460399
+        .long 0x3f70ef1b, 0x3e41ef75
+        .long 0x3f714f11, 0x3e3deae4
+        .long 0x3f71ad09, 0x3e39f5d2
+        .long 0x3f72090a, 0x3e36102b
+        .long 0x3f72631c, 0x3e3239db
+        .long 0x3f72bb46, 0x3e2e72cb
+        .long 0x3f731191, 0x3e2abae4
+        .long 0x3f736604, 0x3e27120f
+        .long 0x3f73b8a5, 0x3e237833
+        .long 0x3f74097e, 0x3e1fed36
+        .long 0x3f745895, 0x3e1c70fd
+        .long 0x3f74a5f2, 0x3e19036e
+        .long 0x3f74f19b, 0x3e15a46d
+        .long 0x3f753b98, 0x3e1253dc
+        .long 0x3f7583f1, 0x3e0f119f
+        .long 0x3f75caac, 0x3e0bdd96
+        .long 0x3f760fd1, 0x3e08b7a4
+        .long 0x3f765366, 0x3e059fa9
+        .long 0x3f769573, 0x3e029586
+        .long 0x3f76d5fe, 0x3dff3230
+        .long 0x3f77150f, 0x3df95481
+        .long 0x3f7752ab, 0x3df391b9
+        .long 0x3f778eda, 0x3dede995
+        .long 0x3f77c9a2, 0x3de85bd0
+        .long 0x3f78030a, 0x3de2e825
+        .long 0x3f783b18, 0x3ddd8e4c
+        .long 0x3f7871d3, 0x3dd84dfe
+        .long 0x3f78a741, 0x3dd326f3
+        .long 0x3f78db68, 0x3dce18e3
+        .long 0x3f790e50, 0x3dc92385
+        .long 0x3f793ffc, 0x3dc4468f
+        .long 0x3f797075, 0x3dbf81b6
+        .long 0x3f799fbf, 0x3dbad4b0
+        .long 0x3f79cde1, 0x3db63f32
+        .long 0x3f79fae1, 0x3db1c0f1
+        .long 0x3f7a26c4, 0x3dad59a1
+        .long 0x3f7a518f, 0x3da908f6
+        .long 0x3f7a7b4a, 0x3da4cea4
+        .long 0x3f7aa3f9, 0x3da0aa5e
+        .long 0x3f7acba1, 0x3d9c9bd9
+        .long 0x3f7af248, 0x3d98a2c7
+        .long 0x3f7b17f4, 0x3d94bedd
+        .long 0x3f7b3ca9, 0x3d90efcd
+        .long 0x3f7b606e, 0x3d8d354b
+        .long 0x3f7b8346, 0x3d898f0a
+        .long 0x3f7ba537, 0x3d85fcbf
+        .long 0x3f7bc646, 0x3d827e1d
+        .long 0x3f7be677, 0x3d7e25af
+        .long 0x3f7c05d1, 0x3d777546
+        .long 0x3f7c2456, 0x3d70ea68
+        .long 0x3f7c420d, 0x3d6a847d
+        .long 0x3f7c5ef9, 0x3d6442f0
+        .long 0x3f7c7b1f, 0x3d5e252a
+        .long 0x3f7c9684, 0x3d582a98
+        .long 0x3f7cb12b, 0x3d5252a5
+        .long 0x3f7ccb1a, 0x3d4c9cbd
+        .long 0x3f7ce454, 0x3d47084e
+        .long 0x3f7cfcdd, 0x3d4194c7
+        .long 0x3f7d14ba, 0x3d3c4196
+        .long 0x3f7d2bef, 0x3d370e2c
+        .long 0x3f7d427f, 0x3d31f9fb
+        .long 0x3f7d586f, 0x3d2d0474
+        .long 0x3f7d6dc2, 0x3d282d0c
+        .long 0x3f7d827b, 0x3d237336
+        .long 0x3f7d96a0, 0x3d1ed669
+        .long 0x3f7daa32, 0x3d1a561b
+        .long 0x3f7dbd36, 0x3d15f1c6
+        .long 0x3f7dcfb0, 0x3d11a8e1
+        .long 0x3f7de1a2, 0x3d0d7ae9
+        .long 0x3f7df30f, 0x3d09675a
+        .long 0x3f7e03fd, 0x3d056db0
+        .long 0x3f7e146c, 0x3d018d6b
+        .long 0x3f7e2461, 0x3cfb8c15
+        .long 0x3f7e33de, 0x3cf42e22
+        .long 0x3f7e42e8, 0x3ced0003
+        .long 0x3f7e517f, 0x3ce600c0
+        .long 0x3f7e5fa9, 0x3cdf2f67
+        .long 0x3f7e6d66, 0x3cd88b05
+        .long 0x3f7e7abb, 0x3cd212ad
+        .long 0x3f7e87aa, 0x3ccbc574
+        .long 0x3f7e9435, 0x3cc5a273
+        .long 0x3f7ea05f, 0x3cbfa8c4
+        .long 0x3f7eac2b, 0x3cb9d786
+        .long 0x3f7eb79a, 0x3cb42ddb
+        .long 0x3f7ec2b1, 0x3caeaae6
+        .long 0x3f7ecd71, 0x3ca94dcf
+        .long 0x3f7ed7dc, 0x3ca415c2
+        .long 0x3f7ee1f4, 0x3c9f01ec
+        .long 0x3f7eebbd, 0x3c9a117f
+        .long 0x3f7ef537, 0x3c9543ae
+        .long 0x3f7efe66, 0x3c9097b1
+        .long 0x3f7f074b, 0x3c8c0cc2
+        .long 0x3f7f0fe8, 0x3c87a21f
+        .long 0x3f7f1840, 0x3c83570a
+        .long 0x3f7f2053, 0x3c7e558a
+        .long 0x3f7f2826, 0x3c763931
+        .long 0x3f7f2fb8, 0x3c6e579b
+        .long 0x3f7f370c, 0x3c66af65
+        .long 0x3f7f3e23, 0x3c5f3f2d
+        .long 0x3f7f4500, 0x3c58059c
+        .long 0x3f7f4ba4, 0x3c51015f
+        .long 0x3f7f5211, 0x3c4a3127
+        .long 0x3f7f5848, 0x3c4393af
+        .long 0x3f7f5e4b, 0x3c3d27b5
+        .long 0x3f7f641b, 0x3c36ebff
+        .long 0x3f7f69ba, 0x3c30df57
+        .long 0x3f7f6f29, 0x3c2b008e
+        .long 0x3f7f746a, 0x3c254e7b
+        .long 0x3f7f797f, 0x3c1fc7fb
+        .long 0x3f7f7e67, 0x3c1a6bee
+        .long 0x3f7f8326, 0x3c15393d
+        .long 0x3f7f87bb, 0x3c102ed6
+        .long 0x3f7f8c29, 0x3c0b4bab
+        .long 0x3f7f9070, 0x3c068eb5
+        .long 0x3f7f9492, 0x3c01f6f1
+        .long 0x3f7f9890, 0x3bfb06c5
+        .long 0x3f7f9c6b, 0x3bf26625
+        .long 0x3f7fa024, 0x3bea0a1d
+        .long 0x3f7fa3bc, 0x3be1f0d3
+        .long 0x3f7fa734, 0x3bda1876
+        .long 0x3f7faa8d, 0x3bd27f42
+        .long 0x3f7fadc8, 0x3bcb237a
+        .long 0x3f7fb0e6, 0x3bc4036c
+        .long 0x3f7fb3e8, 0x3bbd1d6f
+        .long 0x3f7fb6cf, 0x3bb66fe6
+        .long 0x3f7fb99c, 0x3baff93b
+        .long 0x3f7fbc4f, 0x3ba9b7e1
+        .long 0x3f7fbeea, 0x3ba3aa56
+        .long 0x3f7fc16d, 0x3b9dcf20
+        .long 0x3f7fc3d9, 0x3b9824ce
+        .long 0x3f7fc62e, 0x3b92a9f7
+        .long 0x3f7fc86e, 0x3b8d5d3c
+        .long 0x3f7fca99, 0x3b883d46
+        .long 0x3f7fccb0, 0x3b8348c6
+        .long 0x3f7fceb4, 0x3b7cfce8
+        .long 0x3f7fd0a5, 0x3b73ba24
+        .long 0x3f7fd283, 0x3b6ac6d3
+        .long 0x3f7fd450, 0x3b622096
+        .long 0x3f7fd60c, 0x3b59c51d
+        .long 0x3f7fd7b7, 0x3b51b22a
+        .long 0x3f7fd953, 0x3b49e589
+        .long 0x3f7fdadf, 0x3b425d18
+        .long 0x3f7fdc5c, 0x3b3b16c2
+        .long 0x3f7fddcc, 0x3b341080
+        .long 0x3f7fdf2d, 0x3b2d4858
+        .long 0x3f7fe081, 0x3b26bc5e
+        .long 0x3f7fe1c8, 0x3b206ab2
+        .long 0x3f7fe303, 0x3b1a5183
+        .long 0x3f7fe431, 0x3b146f09
+        .long 0x3f7fe554, 0x3b0ec18c
+        .long 0x3f7fe66c, 0x3b09475d
+        .long 0x3f7fe77a, 0x3b03feda
+        .long 0x3f7fe87d, 0x3afdccdc
+        .long 0x3f7fe975, 0x3af3f919
+        .long 0x3f7fea65, 0x3aea7f6c
+        .long 0x3f7feb4b, 0x3ae15ce8
+        .long 0x3f7fec27, 0x3ad88eb8
+        .long 0x3f7fecfc, 0x3ad0121b
+        .long 0x3f7fedc8, 0x3ac7e464
+        .long 0x3f7fee8c, 0x3ac002f8
+        .long 0x3f7fef48, 0x3ab86b52
+        .long 0x3f7feffd, 0x3ab11afe
+        .long 0x3f7ff0aa, 0x3aaa0f9a
+        .long 0x3f7ff151, 0x3aa346d7
+        .long 0x3f7ff1f1, 0x3a9cbe77
+        .long 0x3f7ff28a, 0x3a96744c
+        .long 0x3f7ff31e, 0x3a90663b
+        .long 0x3f7ff3ab, 0x3a8a9237
+        .long 0x3f7ff433, 0x3a84f643
+        .long 0x3f7ff4b5, 0x3a7f20e7
+        .long 0x3f7ff532, 0x3a74bdd2
+        .long 0x3f7ff5aa, 0x3a6abfa9
+        .long 0x3f7ff61d, 0x3a6122ea
+        .long 0x3f7ff68b, 0x3a57e42f
+        .long 0x3f7ff6f5, 0x3a4f002c
+        .long 0x3f7ff75a, 0x3a4673af
+        .long 0x3f7ff7bb, 0x3a3e3ba2
+        .long 0x3f7ff819, 0x3a365507
+        .long 0x3f7ff872, 0x3a2ebcf6
+        .long 0x3f7ff8c7, 0x3a2770a1
+        .long 0x3f7ff919, 0x3a206d52
+        .long 0x3f7ff968, 0x3a19b066
+        .long 0x3f7ff9b3, 0x3a133754
+        .long 0x3f7ff9fb, 0x3a0cffa3
+        .long 0x3f7ffa40, 0x3a0706f4
+        .long 0x3f7ffa82, 0x3a014af8
+        .long 0x3f7ffac1, 0x39f792ea
+        .long 0x3f7ffafe, 0x39ed0088
+        .long 0x3f7ffb38, 0x39e2daa1
+        .long 0x3f7ffb6f, 0x39d91d2d
+        .long 0x3f7ffba5, 0x39cfc44a
+        .long 0x3f7ffbd7, 0x39c6cc35
+        .long 0x3f7ffc08, 0x39be314d
+        .long 0x3f7ffc36, 0x39b5f011
+        .long 0x3f7ffc63, 0x39ae051c
+        .long 0x3f7ffc8e, 0x39a66d2a
+        .long 0x3f7ffcb6, 0x399f2512
+        .long 0x3f7ffcdd, 0x399829c8
+        .long 0x3f7ffd02, 0x3991785a
+        .long 0x3f7ffd26, 0x398b0df2
+        .long 0x3f7ffd48, 0x3984e7d2
+        .long 0x3f7ffd68, 0x397e06ab
+        .long 0x3f7ffd87, 0x3972bbde
+        .long 0x3f7ffda5, 0x3967ea53
+        .long 0x3f7ffdc1, 0x395d8d4b
+        .long 0x3f7ffddc, 0x3953a034
+        .long 0x3f7ffdf6, 0x394a1ea5
+        .long 0x3f7ffe0f, 0x3941045e
+        .long 0x3f7ffe27, 0x39384d47
+        .long 0x3f7ffe3d, 0x392ff56d
+        .long 0x3f7ffe53, 0x3927f904
+        .long 0x3f7ffe67, 0x39205461
+        .long 0x3f7ffe7b, 0x391903fe
+        .long 0x3f7ffe8d, 0x39120475
+        .long 0x3f7ffe9f, 0x390b5281
+        .long 0x3f7ffeb0, 0x3904eafc
+        .long 0x3f7ffec0, 0x38fd95bd
+        .long 0x3f7ffed0, 0x38f1de7a
+        .long 0x3f7ffedf, 0x38e6aa94
+        .long 0x3f7ffeed, 0x38dbf4a3
+        .long 0x3f7ffefa, 0x38d1b776
+        .long 0x3f7fff07, 0x38c7ee0e
+        .long 0x3f7fff13, 0x38be939c
+        .long 0x3f7fff1f, 0x38b5a381
+        .long 0x3f7fff2a, 0x38ad194e
+        .long 0x3f7fff34, 0x38a4f0bc
+        .long 0x3f7fff3f, 0x389d25b0
+        .long 0x3f7fff48, 0x3895b43b
+        .long 0x3f7fff51, 0x388e9890
+        .long 0x3f7fff5a, 0x3887cf0e
+        .long 0x3f7fff62, 0x38815434
+        .long 0x3f7fff6a, 0x3876494d
+        .long 0x3f7fff72, 0x386a7a5a
+        .long 0x3f7fff79, 0x385f355e
+        .long 0x3f7fff80, 0x38547466
+        .long 0x3f7fff86, 0x384a31bf
+        .long 0x3f7fff8c, 0x384067ee
+        .long 0x3f7fff92, 0x383711b4
+        .long 0x3f7fff98, 0x382e2a06
+        .long 0x3f7fff9d, 0x3825ac0e
+        .long 0x3f7fffa2, 0x381d9329
+        .long 0x3f7fffa7, 0x3815dae6
+        .long 0x3f7fffab, 0x380e7f01
+        .long 0x3f7fffb0, 0x38077b62
+        .long 0x3f7fffb4, 0x3800cc21
+        .long 0x3f7fffb8, 0x37f4daf4
+        .long 0x3f7fffbc, 0x37e8b7ac
+        .long 0x3f7fffbf, 0x37dd2782
+        .long 0x3f7fffc2, 0x37d223dc
+        .long 0x3f7fffc6, 0x37c7a666
+        .long 0x3f7fffc9, 0x37bda912
+        .long 0x3f7fffcc, 0x37b42611
+        .long 0x3f7fffce, 0x37ab17d6
+        .long 0x3f7fffd1, 0x37a2790f
+        .long 0x3f7fffd3, 0x379a44a5
+        .long 0x3f7fffd6, 0x379275b9
+        .long 0x3f7fffd8, 0x378b07a2
+        .long 0x3f7fffda, 0x3783f5e9
+        .long 0x3f7fffdc, 0x377a7897
+        .long 0x3f7fffde, 0x376dad68
+        .long 0x3f7fffe0, 0x37618278
+        .long 0x3f7fffe2, 0x3755f04f
+        .long 0x3f7fffe3, 0x374aefcc
+        .long 0x3f7fffe5, 0x37407a1d
+        .long 0x3f7fffe6, 0x373688bc
+        .long 0x3f7fffe8, 0x372d1570
+        .long 0x3f7fffe9, 0x37241a44
+        .long 0x3f7fffea, 0x371b9188
+        .long 0x3f7fffeb, 0x371375cf
+        .long 0x3f7fffec, 0x370bc1e7
+        .long 0x3f7fffee, 0x370470dd
+        .long 0x3f7fffef, 0x36fafbec
+        .long 0x3f7fffef, 0x36edc95b
+        .long 0x3f7ffff0, 0x36e14167
+        .long 0x3f7ffff1, 0x36d55bd6
+        .long 0x3f7ffff2, 0x36ca10ce
+        .long 0x3f7ffff3, 0x36bf58d1
+        .long 0x3f7ffff4, 0x36b52cb9
+        .long 0x3f7ffff4, 0x36ab85b5
+        .long 0x3f7ffff5, 0x36a25d43
+        .long 0x3f7ffff5, 0x3699ad31
+        .long 0x3f7ffff6, 0x36916f95
+        .long 0x3f7ffff7, 0x36899ecb
+        .long 0x3f7ffff7, 0x36823575
+        .long 0x3f7ffff8, 0x36765ce8
+        .long 0x3f7ffff8, 0x366909cc
+        .long 0x3f7ffff9, 0x365c684a
+        .long 0x3f7ffff9, 0x36506f88
+        .long 0x3f7ffff9, 0x36451713
+        .long 0x3f7ffffa, 0x363a56e4
+        .long 0x3f7ffffa, 0x36302754
+        .long 0x3f7ffffa, 0x36268119
+        .long 0x3f7ffffb, 0x361d5d43
+        .long 0x3f7ffffb, 0x3614b538
+        .long 0x3f7ffffb, 0x360c82b1
+        .long 0x3f7ffffc, 0x3604bfb1
+        .long 0x3f7ffffc, 0x35facd10
+        .long 0x3f7ffffc, 0x35ece39b
+        .long 0x3f7ffffc, 0x35dfb8b6
+        .long 0x3f7ffffd, 0x35d34296
+        .long 0x3f7ffffd, 0x35c777ec
+        .long 0x3f7ffffd, 0x35bc4fdc
+        .long 0x3f7ffffd, 0x35b1c1fc
+        .long 0x3f7ffffd, 0x35a7c64b
+        .long 0x3f7ffffd, 0x359e5531
+        .long 0x3f7ffffe, 0x35956771
+        .long 0x3f7ffffe, 0x358cf630
+        .long 0x3f7ffffe, 0x3584fae8
+        .long 0x3f7ffffe, 0x357adecb
+        .long 0x3f7ffffe, 0x356c9b8f
+        .long 0x3f7ffffe, 0x355f20ef
+        .long 0x3f7ffffe, 0x3552644f
+        .long 0x3f7ffffe, 0x35465b9c
+        .long 0x3f7fffff, 0x353afd47
+        .long 0x3f7fffff, 0x3530403c
+        .long 0x3f7fffff, 0x35261be0
+        .long 0x3f7fffff, 0x351c8807
+        .long 0x3f7fffff, 0x35137cf0
+        .long 0x3f7fffff, 0x350af341
+        .long 0x3f7fffff, 0x3502e402
+        .long 0x3f7fffff, 0x34f6912a
+        .long 0x3f7fffff, 0x34e8356b
+        .long 0x3f7fffff, 0x34daa8e4
+        .long 0x3f7fffff, 0x34cde050
+        .long 0x3f7fffff, 0x34c1d100
+        .long 0x3f7fffff, 0x34b670d5
+        .long 0x3f7fffff, 0x34abb639
+        .long 0x3f7fffff, 0x34a19816
+        .long 0x3f7fffff, 0x34980dd1
+        .long 0x3f7fffff, 0x348f0f43
+        .long 0x3f7fffff, 0x348694b3
+        .long 0x3f800000, 0x347d2da8
+        .long 0x3f800000, 0x346e1d72
+        .align 32
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff  /* _AbsMask */
+        .align 32
+        .long 0x407b8000, 0x407b8000, 0x407b8000, 0x407b8000, 0x407b8000, 0x407b8000, 0x407b8000, 0x407b8000  /* _MaxThreshold */
+        .align 32
+        .long 0x47800000, 0x47800000, 0x47800000, 0x47800000, 0x47800000, 0x47800000, 0x47800000, 0x47800000  /* _SRound */
+        .align 32
+        .long 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000  /* _U2Threshold */
+        .align 32
+        .long 0xbeaaaade, 0xbeaaaade, 0xbeaaaade, 0xbeaaaade, 0xbeaaaade, 0xbeaaaade, 0xbeaaaade, 0xbeaaaade  /* _poly3_0 */
+        .align 32
+        .type	__svml_serf_data_internal,@object
+        .size	__svml_serf_data_internal,.-__svml_serf_data_internal
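To make the table-driven scheme above easier to follow, here is a rough scalar model in C of what the AVX2 kernel computes per lane: the table pairs hold { erf(x0), (2/sqrt(pi))*exp(-x0*x0) } on the grid x0 = i/128, the _SRound bias extracts the grid index, and a short Taylor-style correction in Diff = |x| - x0 is applied.  This is only a sketch: the function name, the explicit table parameter, the fminf clamp and the 0x3ff index mask are illustrative choices, and the kernel's NaN fixup and the _U2Threshold masking of the correction term are omitted.

#include <math.h>
#include <stdint.h>
#include <string.h>

/* tbl[i] holds the _erf_tbl pair { erf (x0), (2/sqrt(pi)) * exp (-x0*x0) }
   for x0 = i/128, as laid out in the data section above.  */
static float
erff_sketch (float x, const float tbl[][2])
{
  float ax = fabsf (x);                    /* _AbsMask: work on |x|.  */
  float t = fminf (ax, 3.9296875f);        /* _MaxThreshold = 503/128.  */
  float biased = t + 0x1p16f;              /* _SRound: round t to the 1/128 grid.  */
  uint32_t bits;
  memcpy (&bits, &biased, sizeof (bits));
  uint32_t i = bits & 0x3ff;               /* Low mantissa bits give the grid index.  */
  float x0 = biased - 0x1p16f;             /* Nearest grid point.  */
  float d = t - x0;                        /* Diff.  */
  float d2 = d * d;                        /* D2 = Diff^2.  */
  /* Taylor-style correction around x0; -0.33333334f stands in for _poly3_0.  */
  float poly = d + d2 * (-0.33333334f * d - x0);
  float r = tbl[i][0] + tbl[i][1] * poly;  /* erf_h + exp_h * poly.  */
  return copysignf (r, x);                 /* erf is odd: restore the sign.  */
}

In the vector code above the same lookup is done by scaling the index by 8 (vpslld $3) and gathering the table pairs with vmovq loads, and the saved sign is reapplied with vorps at the end.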
diff --git a/sysdeps/x86_64/fpu/svml_d_erf2_core.S b/sysdeps/x86_64/fpu/svml_d_erf2_core.S
new file mode 100644
index 0000000000..6ef30af2bd
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_erf2_core.S
@@ -0,0 +1,29 @@
+/* Function erf vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN2v_erf)
+WRAPPER_IMPL_SSE2 erf
+END (_ZGVbN2v_erf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN2v_erf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_erf4_core.S b/sysdeps/x86_64/fpu/svml_d_erf4_core.S
new file mode 100644
index 0000000000..2ca8dfe92e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_erf4_core.S
@@ -0,0 +1,29 @@
+/* Function erf vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN4v_erf)
+WRAPPER_IMPL_AVX _ZGVbN2v_erf
+END (_ZGVdN4v_erf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN4v_erf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_erf4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_erf4_core_avx.S
new file mode 100644
index 0000000000..264ff09459
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_erf4_core_avx.S
@@ -0,0 +1,25 @@
+/* Function erf vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN4v_erf)
+WRAPPER_IMPL_AVX _ZGVbN2v_erf
+END (_ZGVcN4v_erf)
diff --git a/sysdeps/x86_64/fpu/svml_d_erf8_core.S b/sysdeps/x86_64/fpu/svml_d_erf8_core.S
new file mode 100644
index 0000000000..de8c2a48bb
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_erf8_core.S
@@ -0,0 +1,25 @@
+/* Function erf vectorized with AVX-512, wrapper to AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN8v_erf)
+WRAPPER_IMPL_AVX512 _ZGVdN4v_erf
+END (_ZGVeN8v_erf)
diff --git a/sysdeps/x86_64/fpu/svml_s_erff16_core.S b/sysdeps/x86_64/fpu/svml_s_erff16_core.S
new file mode 100644
index 0000000000..2c5037a0ec
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_erff16_core.S
@@ -0,0 +1,25 @@
+/* Function erff vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN16v_erff)
+WRAPPER_IMPL_AVX512 _ZGVdN8v_erff
+END (_ZGVeN16v_erff)
diff --git a/sysdeps/x86_64/fpu/svml_s_erff4_core.S b/sysdeps/x86_64/fpu/svml_s_erff4_core.S
new file mode 100644
index 0000000000..0f58bb7aaf
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_erff4_core.S
@@ -0,0 +1,29 @@
+/* Function erff vectorized with SSE2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN4v_erff)
+WRAPPER_IMPL_SSE2 erff
+END (_ZGVbN4v_erff)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4v_erff)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_erff8_core.S b/sysdeps/x86_64/fpu/svml_s_erff8_core.S
new file mode 100644
index 0000000000..a9f287c420
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_erff8_core.S
@@ -0,0 +1,29 @@
+/* Function erff vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN8v_erff)
+WRAPPER_IMPL_AVX _ZGVbN4v_erff
+END (_ZGVdN8v_erff)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8v_erff)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_erff8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_erff8_core_avx.S
new file mode 100644
index 0000000000..ca5a8048e8
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_erff8_core_avx.S
@@ -0,0 +1,25 @@
+/* Function erff vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN8v_erff)
+WRAPPER_IMPL_AVX _ZGVbN4v_erff
+END (_ZGVcN8v_erff)
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-erf-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-erf-avx.c
new file mode 100644
index 0000000000..a2eceefc9b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-erf-avx.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-erf.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-erf-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-erf-avx2.c
new file mode 100644
index 0000000000..a2eceefc9b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-erf-avx2.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-erf.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-erf-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-erf-avx512f.c
new file mode 100644
index 0000000000..a2eceefc9b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-erf-avx512f.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-erf.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-erf.c b/sysdeps/x86_64/fpu/test-double-libmvec-erf.c
new file mode 100644
index 0000000000..c1ded24b1d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-erf.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE double
+#define LIBMVEC_FUNC erf
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index db7ae3e7a6..9d91ccfe51 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -43,6 +43,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVbN2v_log2)
 VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVbN2v_log1p)
 VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVbN2v_atanh)
 VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVbN2v_acosh)
+VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVbN2v_erf)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index 269ae38f67..9e86d5fef8 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -46,6 +46,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVdN4v_log2)
 VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVdN4v_log1p)
 VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVdN4v_atanh)
 VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVdN4v_acosh)
+VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVdN4v_erf)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m256i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index d95b960a45..0f4ef00de4 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -43,6 +43,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVcN4v_log2)
 VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVcN4v_log1p)
 VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVcN4v_atanh)
 VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVcN4v_acosh)
+VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVcN4v_erf)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index a22f08b5f8..975dff85af 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -43,6 +43,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVeN8v_log2)
 VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVeN8v_log1p)
 VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVeN8v_atanh)
 VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVeN8v_acosh)
+VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVeN8v_erf)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-erff-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-erff-avx.c
new file mode 100644
index 0000000000..8cdf4dc069
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-erff-avx.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-erff.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-erff-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-erff-avx2.c
new file mode 100644
index 0000000000..8cdf4dc069
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-erff-avx2.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-erff.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-erff-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-erff-avx512f.c
new file mode 100644
index 0000000000..8cdf4dc069
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-erff-avx512f.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-erff.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-erff.c b/sysdeps/x86_64/fpu/test-float-libmvec-erff.c
new file mode 100644
index 0000000000..ba83826ab9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-erff.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE float
+#define LIBMVEC_FUNC erff
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index 7982ae2c84..2b1e27391a 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -43,6 +43,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVeN16v_log2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVeN16v_log1pf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVeN16v_atanhf)
 VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVeN16v_acoshf)
+VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVeN16v_erff)
 
 #define VEC_INT_TYPE __m512i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index bdfcbea2cd..78428bf517 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -43,6 +43,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVbN4v_log2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVbN4v_log1pf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVbN4v_atanhf)
 VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVbN4v_acoshf)
+VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVbN4v_erff)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index 7b3ba81441..dadd4e6ca0 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -46,6 +46,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVdN8v_log2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVdN8v_log1pf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVdN8v_atanhf)
 VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVdN8v_acoshf)
+VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVdN8v_erff)
 
 /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
 #undef VECTOR_WRAPPER_fFF
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index a13d2e4ca1..7b2d583e54 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -43,6 +43,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVcN8v_log2f)
 VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVcN8v_log1pf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVcN8v_atanhf)
 VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVcN8v_acoshf)
+VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVcN8v_erff)
 
 #define VEC_INT_TYPE __m128i
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 17/18] x86-64: Add vector tanh/tanhf implementation to libmvec
  2021-12-29  6:39 [PATCH v5 00/18] x86-64: Add vector math functions to libmvec Sunil K Pandey
                   ` (15 preceding siblings ...)
  2021-12-29  6:39 ` [PATCH v5 16/18] x86-64: Add vector erf/erff " Sunil K Pandey
@ 2021-12-29  6:39 ` Sunil K Pandey
  2021-12-29 21:26   ` H.J. Lu
  2022-01-29  1:33   ` Noah Goldstein
  2021-12-29  6:40 ` [PATCH v5 18/18] x86-64: Add vector asinh/asinhf " Sunil K Pandey
  17 siblings, 2 replies; 40+ messages in thread
From: Sunil K Pandey @ 2021-12-29  6:39 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, andrey.kolesov, marius.cornea

Implement vectorized tanh/tanhf with SSE, AVX, AVX2 and AVX512
versions for libmvec, as per the vector ABI.  The patch also contains
accuracy and ABI tests for vector tanh/tanhf with regenerated ulps.
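
For illustration, with the __DECL_SIMD_tanh/__DECL_SIMD_tanhf declarations
added below, a compiler that honours the OpenMP declare-simd attributes can
turn a plain scalar loop into calls to the new vector entry points.  The
exact flags and the selected variant depend on the compiler and target;
roughly, -O2 -ffast-math -fopenmp-simd plus a suitable -march= option and
linking against libmvec are needed.

#include <math.h>

/* The scalar tanh call below may be vectorized into calls such as
   _ZGVdN4v_tanh (AVX2) or _ZGVeN8v_tanh (AVX-512) via the SIMD
   declarations this patch installs in the math headers.  */
void
tanh_array (const double *in, double *out, int n)
{
#pragma omp simd
  for (int i = 0; i < n; i++)
    out[i] = tanh (in[i]);
}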
---
 bits/libm-simd-decl-stubs.h                   |   11 +
 math/bits/mathcalls.h                         |    2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
 sysdeps/x86/fpu/bits/math-vector.h            |    4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
 sysdeps/x86_64/fpu/Makeconfig                 |    1 +
 sysdeps/x86_64/fpu/Versions                   |    2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |   15 +
 .../fpu/multiarch/svml_d_tanh2_core-sse2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_tanh2_core.c  |   27 +
 .../fpu/multiarch/svml_d_tanh2_core_sse4.S    | 1272 ++++++++++++++++
 .../fpu/multiarch/svml_d_tanh4_core-sse.S     |   20 +
 .../x86_64/fpu/multiarch/svml_d_tanh4_core.c  |   27 +
 .../fpu/multiarch/svml_d_tanh4_core_avx2.S    | 1279 +++++++++++++++++
 .../fpu/multiarch/svml_d_tanh8_core-avx2.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_tanh8_core.c  |   27 +
 .../fpu/multiarch/svml_d_tanh8_core_avx512.S  |  472 ++++++
 .../fpu/multiarch/svml_s_tanhf16_core-avx2.S  |   20 +
 .../fpu/multiarch/svml_s_tanhf16_core.c       |   28 +
 .../multiarch/svml_s_tanhf16_core_avx512.S    |  381 +++++
 .../fpu/multiarch/svml_s_tanhf4_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_s_tanhf4_core.c |   28 +
 .../fpu/multiarch/svml_s_tanhf4_core_sse4.S   |  832 +++++++++++
 .../fpu/multiarch/svml_s_tanhf8_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_s_tanhf8_core.c |   28 +
 .../fpu/multiarch/svml_s_tanhf8_core_avx2.S   |  844 +++++++++++
 sysdeps/x86_64/fpu/svml_d_tanh2_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_tanh4_core.S        |   29 +
 sysdeps/x86_64/fpu/svml_d_tanh4_core_avx.S    |   25 +
 sysdeps/x86_64/fpu/svml_d_tanh8_core.S        |   25 +
 sysdeps/x86_64/fpu/svml_s_tanhf16_core.S      |   25 +
 sysdeps/x86_64/fpu/svml_s_tanhf4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_tanhf8_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_s_tanhf8_core_avx.S   |   25 +
 .../x86_64/fpu/test-double-libmvec-tanh-avx.c |    1 +
 .../fpu/test-double-libmvec-tanh-avx2.c       |    1 +
 .../fpu/test-double-libmvec-tanh-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-double-libmvec-tanh.c |    3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-tanhf-avx.c |    1 +
 .../fpu/test-float-libmvec-tanhf-avx2.c       |    1 +
 .../fpu/test-float-libmvec-tanhf-avx512f.c    |    1 +
 sysdeps/x86_64/fpu/test-float-libmvec-tanhf.c |    3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
 50 files changed, 5647 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_tanh2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_tanh4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_tanh4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_tanh8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_tanhf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_tanhf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_tanhf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_tanhf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tanh.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanhf.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
index 33d480031b..21f1a43232 100644
--- a/bits/libm-simd-decl-stubs.h
+++ b/bits/libm-simd-decl-stubs.h
@@ -285,4 +285,15 @@
 #define __DECL_SIMD_erff32x
 #define __DECL_SIMD_erff64x
 #define __DECL_SIMD_erff128x
+
+#define __DECL_SIMD_tanh
+#define __DECL_SIMD_tanhf
+#define __DECL_SIMD_tanhl
+#define __DECL_SIMD_tanhf16
+#define __DECL_SIMD_tanhf32
+#define __DECL_SIMD_tanhf64
+#define __DECL_SIMD_tanhf128
+#define __DECL_SIMD_tanhf32x
+#define __DECL_SIMD_tanhf64x
+#define __DECL_SIMD_tanhf128x
 #endif
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index a5b6c4457f..3d1c2056d5 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -72,7 +72,7 @@ __MATHCALL_VEC (cosh,, (_Mdouble_ __x));
 /* Hyperbolic sine of X.  */
 __MATHCALL_VEC (sinh,, (_Mdouble_ __x));
 /* Hyperbolic tangent of X.  */
-__MATHCALL (tanh,, (_Mdouble_ __x));
+__MATHCALL_VEC (tanh,, (_Mdouble_ __x));
 
 #ifdef __USE_GNU
 /* Cosine and sine of X.  */
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index 5525c8a0d6..e178cef683 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -61,6 +61,7 @@ GLIBC_2.35 _ZGVbN2v_log10 F
 GLIBC_2.35 _ZGVbN2v_log1p F
 GLIBC_2.35 _ZGVbN2v_log2 F
 GLIBC_2.35 _ZGVbN2v_sinh F
+GLIBC_2.35 _ZGVbN2v_tanh F
 GLIBC_2.35 _ZGVbN2vv_atan2 F
 GLIBC_2.35 _ZGVbN2vv_hypot F
 GLIBC_2.35 _ZGVbN4v_acosf F
@@ -78,6 +79,7 @@ GLIBC_2.35 _ZGVbN4v_log10f F
 GLIBC_2.35 _ZGVbN4v_log1pf F
 GLIBC_2.35 _ZGVbN4v_log2f F
 GLIBC_2.35 _ZGVbN4v_sinhf F
+GLIBC_2.35 _ZGVbN4v_tanhf F
 GLIBC_2.35 _ZGVbN4vv_atan2f F
 GLIBC_2.35 _ZGVbN4vv_hypotf F
 GLIBC_2.35 _ZGVcN4v_acos F
@@ -95,6 +97,7 @@ GLIBC_2.35 _ZGVcN4v_log10 F
 GLIBC_2.35 _ZGVcN4v_log1p F
 GLIBC_2.35 _ZGVcN4v_log2 F
 GLIBC_2.35 _ZGVcN4v_sinh F
+GLIBC_2.35 _ZGVcN4v_tanh F
 GLIBC_2.35 _ZGVcN4vv_atan2 F
 GLIBC_2.35 _ZGVcN4vv_hypot F
 GLIBC_2.35 _ZGVcN8v_acosf F
@@ -112,6 +115,7 @@ GLIBC_2.35 _ZGVcN8v_log10f F
 GLIBC_2.35 _ZGVcN8v_log1pf F
 GLIBC_2.35 _ZGVcN8v_log2f F
 GLIBC_2.35 _ZGVcN8v_sinhf F
+GLIBC_2.35 _ZGVcN8v_tanhf F
 GLIBC_2.35 _ZGVcN8vv_atan2f F
 GLIBC_2.35 _ZGVcN8vv_hypotf F
 GLIBC_2.35 _ZGVdN4v_acos F
@@ -129,6 +133,7 @@ GLIBC_2.35 _ZGVdN4v_log10 F
 GLIBC_2.35 _ZGVdN4v_log1p F
 GLIBC_2.35 _ZGVdN4v_log2 F
 GLIBC_2.35 _ZGVdN4v_sinh F
+GLIBC_2.35 _ZGVdN4v_tanh F
 GLIBC_2.35 _ZGVdN4vv_atan2 F
 GLIBC_2.35 _ZGVdN4vv_hypot F
 GLIBC_2.35 _ZGVdN8v_acosf F
@@ -146,6 +151,7 @@ GLIBC_2.35 _ZGVdN8v_log10f F
 GLIBC_2.35 _ZGVdN8v_log1pf F
 GLIBC_2.35 _ZGVdN8v_log2f F
 GLIBC_2.35 _ZGVdN8v_sinhf F
+GLIBC_2.35 _ZGVdN8v_tanhf F
 GLIBC_2.35 _ZGVdN8vv_atan2f F
 GLIBC_2.35 _ZGVdN8vv_hypotf F
 GLIBC_2.35 _ZGVeN16v_acosf F
@@ -163,6 +169,7 @@ GLIBC_2.35 _ZGVeN16v_log10f F
 GLIBC_2.35 _ZGVeN16v_log1pf F
 GLIBC_2.35 _ZGVeN16v_log2f F
 GLIBC_2.35 _ZGVeN16v_sinhf F
+GLIBC_2.35 _ZGVeN16v_tanhf F
 GLIBC_2.35 _ZGVeN16vv_atan2f F
 GLIBC_2.35 _ZGVeN16vv_hypotf F
 GLIBC_2.35 _ZGVeN8v_acos F
@@ -180,5 +187,6 @@ GLIBC_2.35 _ZGVeN8v_log10 F
 GLIBC_2.35 _ZGVeN8v_log1p F
 GLIBC_2.35 _ZGVeN8v_log2 F
 GLIBC_2.35 _ZGVeN8v_sinh F
+GLIBC_2.35 _ZGVeN8v_tanh F
 GLIBC_2.35 _ZGVeN8vv_atan2 F
 GLIBC_2.35 _ZGVeN8vv_hypot F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index ea0deb31c1..3c657f6108 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -126,6 +126,10 @@
 #  define __DECL_SIMD_erf __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_erff
 #  define __DECL_SIMD_erff __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_tanh
+#  define __DECL_SIMD_tanh __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_tanhf
+#  define __DECL_SIMD_tanhf __DECL_SIMD_x86_64
 
 # endif
 #endif
diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
index 42addd9a25..c7f81945fe 100644
--- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
@@ -62,6 +62,8 @@
 !GCC$ builtin (acoshf) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (erf) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (erff) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (tanh) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (tanhf) attributes simd (notinbranch) if('x86_64')
 
 !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
@@ -109,3 +111,5 @@
 !GCC$ builtin (acoshf) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (erf) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (erff) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (tanh) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (tanhf) attributes simd (notinbranch) if('x32')
diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
index 2b89a1bba3..26df8d47bf 100644
--- a/sysdeps/x86_64/fpu/Makeconfig
+++ b/sysdeps/x86_64/fpu/Makeconfig
@@ -45,6 +45,7 @@ libmvec-funcs = \
   sin \
   sincos \
   sinh \
+  tanh \
 
 # Define libmvec function for benchtests directory.
 libmvec-bench-funcs = \
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index 2fcdef6944..adcbe0fefb 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -29,6 +29,7 @@ libmvec {
     _ZGVbN2v_log1p; _ZGVcN4v_log1p; _ZGVdN4v_log1p; _ZGVeN8v_log1p;
     _ZGVbN2v_log2; _ZGVcN4v_log2; _ZGVdN4v_log2; _ZGVeN8v_log2;
     _ZGVbN2v_sinh; _ZGVcN4v_sinh; _ZGVdN4v_sinh; _ZGVeN8v_sinh;
+    _ZGVbN2v_tanh; _ZGVcN4v_tanh; _ZGVdN4v_tanh; _ZGVeN8v_tanh;
     _ZGVbN2vv_atan2; _ZGVcN4vv_atan2; _ZGVdN4vv_atan2; _ZGVeN8vv_atan2;
     _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
     _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
@@ -46,6 +47,7 @@ libmvec {
     _ZGVbN4v_log1pf; _ZGVcN8v_log1pf; _ZGVdN8v_log1pf; _ZGVeN16v_log1pf;
     _ZGVbN4v_log2f; _ZGVcN8v_log2f; _ZGVdN8v_log2f; _ZGVeN16v_log2f;
     _ZGVbN4v_sinhf; _ZGVcN8v_sinhf; _ZGVdN8v_sinhf; _ZGVeN16v_sinhf;
+    _ZGVbN4v_tanhf; _ZGVcN8v_tanhf; _ZGVdN8v_tanhf; _ZGVeN16v_tanhf;
     _ZGVbN4vv_atan2f; _ZGVcN8vv_atan2f; _ZGVdN8vv_atan2f; _ZGVeN16vv_atan2f;
     _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf;
   }
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 929de0e786..bfaad7acef 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -2067,6 +2067,21 @@ float: 3
 float128: 3
 ldouble: 4
 
+Function: "tanh_vlen16":
+float: 1
+
+Function: "tanh_vlen2":
+double: 1
+
+Function: "tanh_vlen4":
+double: 1
+
+Function: "tanh_vlen4_avx2":
+double: 1
+
+Function: "tanh_vlen8":
+double: 1
+
 Function: "tgamma":
 double: 9
 float: 8
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core-sse2.S
new file mode 100644
index 0000000000..35b065fe55
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized tanh, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN2v_tanh _ZGVbN2v_tanh_sse2
+#include "../svml_d_tanh2_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core.c
new file mode 100644
index 0000000000..d2e63bdc56
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized tanh, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN2v_tanh
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN2v_tanh, __GI__ZGVbN2v_tanh, __redirect__ZGVbN2v_tanh)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core_sse4.S
new file mode 100644
index 0000000000..35bbb5b04c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core_sse4.S
@@ -0,0 +1,1272 @@
+/* Function tanh vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   NOTE: Since the hyperbolic tangent function is odd
+ *         (tanh(x) = -tanh(-x)), the algorithm below deals with the
+ *         absolute value of the argument |x|: tanh(x) = sign(x) * tanh(|x|)
+ *
+ *   We use a table lookup method to compute tanh(|x|).
+ *   The basic idea is to split the input range into a number of subintervals
+ *   and to approximate tanh(.) with a polynomial on each of them.
+ *
+ *   IEEE SPECIAL CONDITIONS:
+ *   x = [+,-]0, r = [+,-]0
+ *   x = +Inf,   r = +1
+ *   x = -Inf,   r = -1
+ *   x = QNaN,   r = QNaN
+ *   x = SNaN,   r = QNaN
+ *
+ *
+ *   ALGORITHM DETAILS
+ *   We handle special values in a callout function, aside from the main
+ *   path computations.  "Special" inputs for this algorithm are:
+ *   INF, NAN, |x| > HUGE_THRESHOLD
+ *
+ *
+ *   Main path computations are organized as follows:
+ *   We split the interval [0, SATURATION_THRESHOLD) into a number of
+ *   subintervals.  On each subinterval we approximate tanh(.) with a
+ *   minimax polynomial of pre-defined degree.  The polynomial coefficients
+ *   are computed beforehand and stored in a table.  We also use
+ *
+ *       y := |x| + B,
+ *
+ *   where B depends on the subinterval and is used to bring the argument
+ *   closer to zero.
+ *   We also add a large fake interval [SATURATION_THRESHOLD, HUGE_THRESHOLD]
+ *   whose stored coefficients are 1.0 + 0.0*y + 0.0*y^2 ... - just to
+ *   preserve the main path computation logic while returning 1.0 for all
+ *   arguments in that range.
+ *
+ *   Hence the reconstruction looks as follows:
+ *   we extract the polynomial and range reduction coefficients
+ *        (Pj and B) corresponding to the subinterval to which |x| belongs,
+ *        and return
+ *
+ *       r := sign(x) * (P0 + P1 * y + ... + Pn * y^n)
+ *
+ *   NOTE: we use a multiprecision technique to multiply and sum the first
+ *         K terms of the polynomial, so Pj, j = 0..K are each stored in
+ *         the table as a pair of target-precision numbers (Pj and PLj) to
+ *         achieve wider-than-target precision.
+ *
+ *
+ */
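+
+/* A rough scalar sketch of the scheme above, for illustration only: the
+   interval count, the polynomial degree NCOEF and the interval_of()
+   helper are placeholders rather than the real _dbP layout, and the
+   special-value callout and the split (Pj, PLj) coefficient pairs are
+   omitted.
+
+       #include <math.h>
+
+       #define NCOEF 8                      // hypothetical degree + 1
+       struct tanh_interval { double B; double P[NCOEF]; };
+
+       double
+       tanh_sketch (double x, const struct tanh_interval *tbl,
+                    int (*interval_of) (double))
+       {
+         double ax = fabs (x);              // tanh is odd: work on |x|
+         const struct tanh_interval *t = &tbl[interval_of (ax)];
+         double y = ax + t->B;              // shift the argument toward zero
+         double r = t->P[NCOEF - 1];        // Horner evaluation of the polynomial
+         for (int j = NCOEF - 2; j >= 0; j--)
+           r = r * y + t->P[j];
+         return copysign (r, x);            // r := sign(x) * (P0 + P1*y + ...)
+       }
+*/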
+
+/* Offsets for data table __svml_dtanh_data_internal
+ */
+#define _dbP                          	0
+#define _dbSignMask                   	7680
+#define _dbAbsMask                    	7696
+#define _iExpMantMask                 	7712
+#define _iExpMask                     	7728
+#define _iMinIdxOfsMask               	7744
+#define _iMaxIdxMask                  	7760
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN2v_tanh_sse4)
+        subq      $72, %rsp
+        cfi_def_cfa_offset(80)
+        movaps    %xmm0, %xmm13
+        movq      _iExpMantMask+__svml_dtanh_data_internal(%rip), %xmm14
+        lea       _dbP+96+__svml_dtanh_data_internal(%rip), %rsi
+        pshufd    $221, %xmm13, %xmm8
+
+/* if VMIN, VMAX is defined for I type */
+        pxor      %xmm10, %xmm10
+        movq      _iMinIdxOfsMask+__svml_dtanh_data_internal(%rip), %xmm9
+
+/* Here huge arguments, INF and NaNs are filtered out to the callout. */
+        pand      %xmm14, %xmm8
+        movdqa    %xmm8, %xmm11
+        psubd     %xmm9, %xmm8
+        movq      _iMaxIdxMask+__svml_dtanh_data_internal(%rip), %xmm5
+        movdqa    %xmm8, %xmm6
+        movdqa    %xmm8, %xmm7
+        pcmpgtd   %xmm5, %xmm6
+        pcmpgtd   %xmm10, %xmm7
+        movdqa    %xmm6, %xmm3
+        pand      %xmm7, %xmm8
+        andps     %xmm6, %xmm5
+        andnps    %xmm8, %xmm3
+        orps      %xmm5, %xmm3
+
+/*
+ * VSHRIMM( I, iIndex, = iIndex, (17 - 4) );
+ * VGATHER_MATRIX( L2D, p, TAB._dbP, iIndex, 0, T_ITEM_SIZE, T_ITEM_GRAN, 13, 0, 0 );
+ */
+        psrld     $10, %xmm3
+        movd      %xmm3, %eax
+        pshufd    $1, %xmm3, %xmm4
+
+/*  Constant loading  */
+        movq      _iExpMask+__svml_dtanh_data_internal(%rip), %xmm15
+        movd      %xmm4, %ecx
+        pcmpgtd   %xmm15, %xmm11
+        movmskps  %xmm11, %edx
+        movups    _dbAbsMask+__svml_dtanh_data_internal(%rip), %xmm0
+        movups    _dbSignMask+__svml_dtanh_data_internal(%rip), %xmm12
+        andps     %xmm13, %xmm0
+        movslq    %eax, %rax
+        andps     %xmm13, %xmm12
+        movslq    %ecx, %rcx
+        movups    %xmm13, (%rsp)
+        movups    -96(%rax,%rsi), %xmm11
+        movups    -96(%rcx,%rsi), %xmm2
+        movups    -80(%rax,%rsi), %xmm9
+        movups    -48(%rax,%rsi), %xmm5
+        movaps    %xmm9, %xmm10
+        movups    -32(%rax,%rsi), %xmm3
+        movaps    %xmm5, %xmm6
+        movaps    %xmm3, %xmm4
+        unpckhpd  %xmm2, %xmm11
+        movups    -80(%rcx,%rsi), %xmm13
+        movups    -48(%rcx,%rsi), %xmm15
+        movups    -32(%rcx,%rsi), %xmm1
+        movups    -64(%rax,%rsi), %xmm7
+        movups    -16(%rax,%rsi), %xmm2
+        movaps    %xmm7, %xmm8
+        unpcklpd  %xmm13, %xmm10
+        unpckhpd  %xmm13, %xmm9
+        movups    -64(%rcx,%rsi), %xmm14
+        movups    -16(%rcx,%rsi), %xmm13
+        unpcklpd  %xmm15, %xmm6
+        unpckhpd  %xmm15, %xmm5
+        unpcklpd  %xmm1, %xmm4
+        unpckhpd  %xmm1, %xmm3
+        movaps    %xmm2, %xmm1
+        movups    (%rax,%rsi), %xmm15
+        unpcklpd  %xmm14, %xmm8
+        unpckhpd  %xmm14, %xmm7
+        unpcklpd  %xmm13, %xmm1
+        unpckhpd  %xmm13, %xmm2
+        movaps    %xmm15, %xmm13
+        movups    (%rcx,%rsi), %xmm14
+        unpcklpd  %xmm14, %xmm13
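+
+/* Reconstruction: xmm0 = |x|, xmm12 = sign of x, xmm13 = per-lane B,
+ * xmm11 = per-lane PH0, and xmm1..xmm10 hold the per-lane polynomial
+ * coefficients.  The sequence below forms y = |x| + B, evaluates the
+ * degree-10 polynomial by Horner's scheme, adds PH0 and ORs the sign
+ * of x back into the result.
+ */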
+        addpd     %xmm13, %xmm0
+        mulpd     %xmm0, %xmm2
+        addpd     %xmm1, %xmm2
+        mulpd     %xmm0, %xmm2
+        addpd     %xmm3, %xmm2
+        mulpd     %xmm0, %xmm2
+        addpd     %xmm4, %xmm2
+        mulpd     %xmm0, %xmm2
+        addpd     %xmm5, %xmm2
+        mulpd     %xmm0, %xmm2
+        addpd     %xmm6, %xmm2
+        mulpd     %xmm0, %xmm2
+        addpd     %xmm7, %xmm2
+        mulpd     %xmm0, %xmm2
+        addpd     %xmm8, %xmm2
+        mulpd     %xmm0, %xmm2
+        addpd     %xmm9, %xmm2
+        mulpd     %xmm0, %xmm2
+        addpd     %xmm10, %xmm2
+        mulpd     %xmm2, %xmm0
+        addpd     %xmm11, %xmm0
+        orps      %xmm12, %xmm0
+        andl      $3, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        addq      $72, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(80)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    (%rsp), %xmm1
+        movups    %xmm1, 32(%rsp)
+        movups    %xmm0, 48(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -64)
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -72)
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $2, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -64)
+        cfi_offset(13, -72)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      tanh@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 48(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN2v_tanh_sse4)
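+
+/* The special-input path above is, in effect (sketch only; in, out and mask
+ * are illustrative names for the saved argument, the saved result and the
+ * 2-bit range mask kept in %r13d):
+ *
+ *     for (int i = 0; i < 2; i++)
+ *       if (mask & (1 << i))
+ *         out[i] = tanh (in[i]);
+ */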
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_dtanh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(16)) VUINT32 _dbP[60*16][2];
+        __declspec(align(16)) VUINT32 _dbSignMask[2][2];
+        __declspec(align(16)) VUINT32 _dbAbsMask[2][2];
+        __declspec(align(16)) VUINT32 _iExpMantMask[4][1];
+        __declspec(align(16)) VUINT32 _iExpMask[4][1];
+        __declspec(align(16)) VUINT32 _iMinIdxOfsMask[4][1];
+        __declspec(align(16)) VUINT32 _iMaxIdxMask[4][1];
+} __svml_dtanh_data_internal;
+#endif
+__svml_dtanh_data_internal:
+        /* Polynomial coefficients */
+        .quad 0x0000000000000000   /* PL0 = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* PH0 = +0.000000000000000000000e-01 */
+        .quad 0x3FF0000000000000   /* P1  = +1.000000000000000014103e+00 */
+        .quad 0xBD197DEAD79668D3   /* P2  = -2.264132406596103056796e-14 */
+        .quad 0xBFD555555553AF3C   /* P3  = -3.333333333273349741024e-01 */
+        .quad 0xBE052F7CCA134846   /* P4  = -6.165791385711493738399e-10 */
+        .quad 0x3FC11111563849D6   /* P5  = +1.333333655353061107201e-01 */
+        .quad 0xBEB038623673FFB2   /* P6  = -9.668021563879858950855e-07 */
+        .quad 0xBFAB9F685E64022E   /* P7  = -5.395055916051593179252e-02 */
+        .quad 0xBF2A54E2B28F2207   /* P8  = -2.008940439550829012647e-04 */
+        .quad 0x3F97CFB9328A230E   /* P9  = +2.325333949059698582189e-02 */
+        .quad 0xBF75CA6D61723E02   /* P10 = -5.320002811586290441790e-03 */
+        .quad 0x0000000000000000   /* B = +0        */
+        .quad 0x3FF0000000000000   /* A = +1.0      */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C3708A564FAD29A   /* PL0 = +1.248663375337163807466e-18 */
+        .quad 0x3FC0E6973998DA48   /* PH0 = +1.320370703922029154143e-01 */
+        .quad 0x3FEF712EB25C0888   /* P1  = +9.825662120422444519229e-01 */
+        .quad 0xBFC09B296F7C1EA9   /* P2  = -1.297351641044220078331e-01 */
+        .quad 0xBFD3DD77541EDDA7   /* P3  = -3.103922196855485849143e-01 */
+        .quad 0x3FB58FFCF4309615   /* P4  = +8.422833406128689275566e-02 */
+        .quad 0x3FBD3ABE845DCF49   /* P5  = +1.141776154670967208833e-01 */
+        .quad 0xBFA791DF538C37FA   /* P6  = -4.603479285115947936529e-02 */
+        .quad 0xBFA4F872F69CD6E8   /* P7  = -4.095801601799370195284e-02 */
+        .quad 0x3F9772E49EF6412B   /* P8  = +2.289921970583567527179e-02 */
+        .quad 0x3F8CBC0807393909   /* P9  = +1.403051635784581776625e-02 */
+        .quad 0xBF85F06A30F93319   /* P10 = -1.071246110873285040939e-02 */
+        .quad 0xBFC1000000000000   /* B = -.132813 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6004EE5739DEAC   /* PL0 = +6.947247374112211856530e-18 */
+        .quad 0x3FC2DC968E6E0D62   /* PH0 = +1.473568149050193398786e-01 */
+        .quad 0x3FEF4E1E606D96DF   /* P1  = +9.782859691010478680677e-01 */
+        .quad 0xBFC273BD70994AB9   /* P2  = -1.441571044730005866646e-01 */
+        .quad 0xBFD382B548270D2C   /* P3  = -3.048527912726111386771e-01 */
+        .quad 0x3FB7CD2D582A6B29   /* P4  = +9.297450449450351894400e-02 */
+        .quad 0x3FBC1278CCCBF0DB   /* P5  = +1.096568584434324642303e-01 */
+        .quad 0xBFA9C7F5115B86A1   /* P6  = -5.035367810138536095866e-02 */
+        .quad 0xBFA371C21BAF618E   /* P7  = -3.797728145554222910481e-02 */
+        .quad 0x3F9958943F68417E   /* P8  = +2.475196492201935923783e-02 */
+        .quad 0x3F8930D5CFFD4152   /* P9  = +1.230017701132682667572e-02 */
+        .quad 0xBF875CF7ADD31B76   /* P10 = -1.140779017658897660092e-02 */
+        .quad 0xBFC3000000000000   /* B = -.148438 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C7EABE24E052A1F   /* PL0 = +2.660321779421749543501e-17 */
+        .quad 0x3FC4D04783618C71   /* PH0 = +1.626061812886266111366e-01 */
+        .quad 0x3FEF2765AF97A4B3   /* P1  = +9.735592298067302883212e-01 */
+        .quad 0xBFC443654205FEA5   /* P2  = -1.583067486171689074207e-01 */
+        .quad 0xBFD31F2E208A5B97   /* P3  = -2.987780874040536844467e-01 */
+        .quad 0x3FB9F235BD339878   /* P4  = +1.013520800512156573576e-01 */
+        .quad 0x3FBAD0B0DFCCA141   /* P5  = +1.047468706498238100104e-01 */
+        .quad 0xBFABD1B9600E608E   /* P6  = -5.433444306908184548967e-02 */
+        .quad 0xBFA1CEBEAF07DB58   /* P7  = -3.478046309094534453598e-02 */
+        .quad 0x3F9AFC9FB1D8EFD2   /* P8  = +2.635430834764902126383e-02 */
+        .quad 0x3F8573444F1AB502   /* P9  = +1.047376028449287564018e-02 */
+        .quad 0xBF8874FBC8F24406   /* P10 = -1.194187838544459322219e-02 */
+        .quad 0xBFC5000000000000   /* B = -.164063 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C7FB199D361A790   /* PL0 = +2.748994907060158996213e-17 */
+        .quad 0x3FC6C170259E21F7   /* PH0 = +1.777782615356639783766e-01 */
+        .quad 0x3FEEFD17479F7C65   /* P1  = +9.683948897253570478266e-01 */
+        .quad 0xBFC609530FE4DF8D   /* P2  = -1.721595599753950294577e-01 */
+        .quad 0xBFD2B3465D71B4DE   /* P3  = -2.921920692959484052676e-01 */
+        .quad 0x3FBBFD2D34AC509B   /* P4  = +1.093319181057403192166e-01 */
+        .quad 0x3FB9778C3C16A0FE   /* P5  = +9.948040453912551395183e-02 */
+        .quad 0xBFADAC4D9E63C665   /* P6  = -5.795519407719210697372e-02 */
+        .quad 0xBFA0139CCAD02D60   /* P7  = -3.139963126894929339124e-02 */
+        .quad 0x3F9C5BF43BA6F19D   /* P8  = +2.769452680671379432854e-02 */
+        .quad 0x3F8190B703350341   /* P9  = +8.576803002712575184772e-03 */
+        .quad 0xBF8936606782858A   /* P10 = -1.231074634444230850234e-02 */
+        .quad 0xBFC7000000000000   /* B = -.179688 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6A917CA3624D50   /* PL0 = +1.152216693509785660691e-17 */
+        .quad 0x3FC8AFD7B974FABB   /* PH0 = +1.928662925292508878439e-01 */
+        .quad 0x3FEECF47624A5D03   /* P1  = +9.628025932060214187231e-01 */
+        .quad 0xBFC7C4C2CB4FDE4D   /* P2  = -1.856921665891938814679e-01 */
+        .quad 0xBFD23F69CB2C1F9D   /* P3  = -2.851204380135586155453e-01 */
+        .quad 0x3FBDEC5703A03814   /* P4  = +1.168875106670557712458e-01 */
+        .quad 0x3FB8095003D0CF15   /* P5  = +9.389209836154706616487e-02 */
+        .quad 0xBFAF554B47B10CBB   /* P6  = -6.119761705533607365968e-02 */
+        .quad 0xBF9C89743FE7BC1B   /* P7  = -2.786809577986213853937e-02 */
+        .quad 0x3F9D74725B746E7C   /* P8  = +2.876452143855921824991e-02 */
+        .quad 0x3F7B2D8AFB70B88C   /* P9  = +6.635229968237631511880e-03 */
+        .quad 0xBF89A0A2883EF6CB   /* P10 = -1.251341799058582545252e-02 */
+        .quad 0xBFC9000000000000   /* B = -.195313 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C7608279E8609CB   /* PL0 = +1.910958764623660748269e-17 */
+        .quad 0x3FCA9B46D2DDC5E3   /* PH0 = +2.078636674519166172015e-01 */
+        .quad 0x3FEE9E0BB72A01A1   /* P1  = +9.567926957534390123919e-01 */
+        .quad 0xBFC974FAD10C5330   /* P2  = -1.988824387305156976885e-01 */
+        .quad 0xBFD1C40ACCBA4044   /* P3  = -2.775904654781735703430e-01 */
+        .quad 0x3FBFBE24E2987853   /* P4  = +1.239951184474830487522e-01 */
+        .quad 0x3FB6885B4345E47F   /* P5  = +8.801813499839460539687e-02 */
+        .quad 0xBFB06563D5670584   /* P6  = -6.404708824176991770896e-02 */
+        .quad 0xBF98CD1D620DF6E2   /* P7  = -2.421995078065365147772e-02 */
+        .quad 0x3F9E44EF3E844D21   /* P8  = +2.955983943054463683119e-02 */
+        .quad 0x3F7325FA0148CAAE   /* P9  = +4.674889165971292322643e-03 */
+        .quad 0xBF89B4C8556C2D92   /* P10 = -1.255184660614964011319e-02 */
+        .quad 0xBFCB000000000000   /* B = -.210938 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6F19DAA20F51D5   /* PL0 = +1.348790537832000351176e-17 */
+        .quad 0x3FCC83876CA98E15   /* PH0 = +2.227639465883021474557e-01 */
+        .quad 0x3FEE697B662D07CD   /* P1  = +9.503762241004040620296e-01 */
+        .quad 0xBFCB194C7ED76ACF   /* P2  = -2.117095584242946953999e-01 */
+        .quad 0xBFD141A19E419762   /* P3  = -2.696308179350720680191e-01 */
+        .quad 0x3FC0B89C64BC7B98   /* P4  = +1.306338779331468503007e-01 */
+        .quad 0x3FB4F721150BBFC5   /* P5  = +8.189589275184434216748e-02 */
+        .quad 0xBFB105AAFAB87898   /* P6  = -6.649273511036069461061e-02 */
+        .quad 0xBF94FB3B31248C01   /* P7  = -2.048962104266749732921e-02 */
+        .quad 0x3F9ECD31E588709C   /* P8  = +3.007963145692880855964e-02 */
+        .quad 0x3F664A91A335C105   /* P9  = +2.721104095762541127495e-03 */
+        .quad 0xBF89754E32E1E26E   /* P10 = -1.243077366619723806134e-02 */
+        .quad 0xBFCD000000000000   /* B = -.226563 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6AC6C889D8111D   /* PL0 = +1.161245469312620769170e-17 */
+        .quad 0x3FCE6864FE55A3D0   /* PH0 = +2.375608674877001114112e-01 */
+        .quad 0x3FEE31AEE116B82B   /* P1  = +9.435648342384913826391e-01 */
+        .quad 0xBFCCB114B69E808B   /* P2  = -2.241540805525839833707e-01 */
+        .quad 0xBFD0B8AB913BA99D   /* P3  = -2.612713735858507980441e-01 */
+        .quad 0x3FC1823322BED48A   /* P4  = +1.367858810096190233514e-01 */
+        .quad 0x3FB35822B7929893   /* P5  = +7.556359273675842651653e-02 */
+        .quad 0xBFB18B03CC78D2DA   /* P6  = -6.852744810096158580830e-02 */
+        .quad 0xBF911CCC3C8D5E5D   /* P7  = -1.671141738492420009734e-02 */
+        .quad 0x3F9F0DEC2D99B12F   /* P8  = +3.032654789278515819797e-02 */
+        .quad 0x3F4A28398B4EBD98   /* P9  = +7.982521989244205404918e-04 */
+        .quad 0xBF88E60CB2FAB9A4   /* P10 = -1.215753480150000985458e-02 */
+        .quad 0xBFCF000000000000   /* B = -.242188 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C89D2B6774FB61D   /* PL0 = +4.479593208720169247958e-17 */
+        .quad 0x3FD09C744F539BE4   /* PH0 = +2.595492148088267558848e-01 */
+        .quad 0x3FEDD823B0400D42   /* P1  = +9.326342050921214825882e-01 */
+        .quad 0xBFCEFBF7FF305FCC   /* P2  = -2.420644756355144687086e-01 */
+        .quad 0xBFCFC01DC4F24A41   /* P3  = -2.480504237797323303990e-01 */
+        .quad 0x3FC291A2C26D5548   /* P4  = +1.450694512701977626753e-01 */
+        .quad 0x3FB0D562E672D188   /* P5  = +6.575601698097532991976e-02 */
+        .quad 0xBFB2201ECC119E06   /* P6  = -7.080261690281738261872e-02 */
+        .quad 0xBF8695D50F778D31   /* P7  = -1.102796987010509974642e-02 */
+        .quad 0x3F9EEC8CFBC031A0   /* P8  = +3.019924437107734972427e-02 */
+        .quad 0xBF6030F0A4D3660A   /* P9  = -1.976461417694923328722e-03 */
+        .quad 0xBF87845288A4AEF5   /* P10 = -1.148285369398347838494e-02 */
+        .quad 0xBFD1000000000000   /* B = -.265625 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C8B6AAB614D1C8D   /* PL0 = +4.756035418366735312727e-17 */
+        .quad 0x3FD275F7E1CF7F63   /* PH0 = +2.884502129727392616410e-01 */
+        .quad 0x3FED56658F74C9CC   /* P1  = +9.167964746359813351341e-01 */
+        .quad 0xBFD0ECC045EBD596   /* P2  = -2.644501383614054083635e-01 */
+        .quad 0xBFCD5A4BDE179180   /* P3  = -2.293181261476426808811e-01 */
+        .quad 0x3FC3C00047D34767   /* P4  = +1.542969084462655120552e-01 */
+        .quad 0x3FAAC7CE84FD609F   /* P5  = +5.230565427217581251974e-02 */
+        .quad 0xBFB288948D2E8B43   /* P6  = -7.239654967137902384931e-02 */
+        .quad 0xBF6D6605AAD5A1C0   /* P7  = -3.588687008847041164896e-03 */
+        .quad 0x3F9DDB0790848E97   /* P8  = +2.915584392134337382866e-02 */
+        .quad 0xBF75FDE291BAD5B4   /* P9  = -5.369076763306269573660e-03 */
+        .quad 0xBF84CEA5C52E0A78   /* P10 = -1.015977390284671071888e-02 */
+        .quad 0xBFD3000000000000   /* B = -.296875 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C7139A81C8A6ECF   /* PL0 = +1.494049799478574591322e-17 */
+        .quad 0x3FD4470650036407   /* PH0 = +3.168350011233659890841e-01 */
+        .quad 0x3FECC9A69DFDDD48   /* P1  = +8.996155820631566629678e-01 */
+        .quad 0xBFD23DED3A37A09F   /* P2  = -2.850297039535778028925e-01 */
+        .quad 0xBFCAD302395D51C1   /* P3  = -2.095644741153943890185e-01 */
+        .quad 0x3FC4A8FE3F309C22   /* P4  = +1.614072617096278705115e-01 */
+        .quad 0x3FA3D161188AA436   /* P5  = +3.870681213931741151586e-02 */
+        .quad 0xBFB288CFE5494E98   /* P6  = -7.240008685885823969403e-02 */
+        .quad 0x3F6C7903EED8D334   /* P7  = +3.475673371918475361081e-03 */
+        .quad 0x3F9BE023CDFB02F6   /* P8  = +2.722221321778569498033e-02 */
+        .quad 0xBF80F8296F2C3A95   /* P9  = -8.285831170295390358336e-03 */
+        .quad 0xBF8152DF4790049B   /* P10 = -8.458847400108650973189e-03 */
+        .quad 0xBFD5000000000000   /* B = -.328125 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C7751FE0FEE8335   /* PL0 = +2.022712113430213599928e-17 */
+        .quad 0x3FD60EF7120502A9   /* PH0 = +3.446633983585721261456e-01 */
+        .quad 0x3FEC32D951E56E6F   /* P1  = +8.812071418319202070776e-01 */
+        .quad 0xBFD370255FC004F8   /* P2  = -3.037198481616338996824e-01 */
+        .quad 0xBFC832F0EBC6BB41   /* P3  = -1.890545989276351359107e-01 */
+        .quad 0x3FC54C99A0FF432F   /* P4  = +1.664001499289269127540e-01 */
+        .quad 0x3F99DAC0CC283C18   /* P5  = +2.524853941036661688369e-02 */
+        .quad 0xBFB227B3896A026D   /* P6  = -7.091829399906553280461e-02 */
+        .quad 0x3F84663364E1FB19   /* P7  = +9.960557476231411602383e-03 */
+        .quad 0x3F9922D70DE07C57   /* P8  = +2.454696676442965935283e-02 */
+        .quad 0xBF85C4A4EB6F86BC   /* P9  = -1.062897532932837635222e-02 */
+        .quad 0xBF7AAB61214FFE17   /* P10 = -6.511096396024671890972e-03 */
+        .quad 0xBFD7000000000000   /* B = -.359375 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3BFE67F266843B2C   /* PL0 = +1.030196791298162288777e-19 */
+        .quad 0x3FD7CD3115FC0F16   /* PH0 = +3.718989100163850869407e-01 */
+        .quad 0x3FEB92F96CCC2C5B   /* P1  = +8.616912007286247079761e-01 */
+        .quad 0xBFD4827320135092   /* P2  = -3.204620183216856200247e-01 */
+        .quad 0xBFC582B15550168A   /* P3  = -1.680509249273891977521e-01 */
+        .quad 0x3FC5AC3B9A2E4C31   /* P4  = +1.693186285816366254244e-01 */
+        .quad 0x3F88FA599FCADAFB   /* P5  = +1.219625491044728129762e-02 */
+        .quad 0xBFB16EC8F5CA169E   /* P6  = -6.809669495313605642174e-02 */
+        .quad 0x3F90140EFC748BBE   /* P7  = +1.570151725639922719844e-02 */
+        .quad 0x3F95CFC49C1A28DC   /* P8  = +2.130038454792147768770e-02 */
+        .quad 0xBF8946ED8B1BF454   /* P9  = -1.234231549050882816697e-02 */
+        .quad 0xBF7239E55C1DD50F   /* P10 = -4.449745117985472755606e-03 */
+        .quad 0xBFD9000000000000   /* B = -.390625 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6412330191189C   /* PL0 = +8.704448096175471149661e-18 */
+        .quad 0x3FD9812B3B03F0A5   /* PH0 = +3.985088421175169703936e-01 */
+        .quad 0x3FEAEB08C3C0E84D   /* P1  = +8.411907027541559254748e-01 */
+        .quad 0xBFD57446B1BC46CF   /* P2  = -3.352219329545790787820e-01 */
+        .quad 0xBFC2CA9ABC0444AD   /* P3  = -1.468079965639267634401e-01 */
+        .quad 0x3FC5CA95F9460D18   /* P4  = +1.702449290424759093710e-01 */
+        .quad 0xBF2C2DAA35DD05C3   /* P5  = -2.149839664813813012186e-04 */
+        .quad 0xBFB069A516EEB75D   /* P6  = -6.411201295733578195472e-02 */
+        .quad 0x3F9512716416FDC7   /* P7  = +2.057816670798986720058e-02 */
+        .quad 0x3F921630CB1319A3   /* P8  = +1.766277541607908852593e-02 */
+        .quad 0xBF8B76DA2EC99526   /* P9  = -1.341028647693549562145e-02 */
+        .quad 0xBF63A97474A161E4   /* P10 = -2.400138332671485493040e-03 */
+        .quad 0xBFDB000000000000   /* B = -.421875 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C89B79F5783381C   /* PL0 = +4.461236087774530799537e-17 */
+        .quad 0x3FDB2A6C993B829D   /* PH0 = +4.244643684778937609003e-01 */
+        .quad 0x3FEA3C0C1FBA328C   /* P1  = +8.198299998926627915155e-01 */
+        .quad 0xBFD6457212F78DE0   /* P2  = -3.479886231636708581604e-01 */
+        .quad 0xBFC0129BDA380A66   /* P3  = -1.255678954622282824818e-01 */
+        .quad 0x3FC5AB77F388FBDE   /* P4  = +1.692953051696965507089e-01 */
+        .quad 0xBF8822F3A6CADB7C   /* P5  = -1.178541519889874597783e-02 */
+        .quad 0xBFAE4A876370A4BD   /* P6  = -5.916236008517603590739e-02 */
+        .quad 0x3F991A89BC3B7710   /* P7  = +2.451529704455085335710e-02 */
+        .quad 0x3F8C4A4328204D4B   /* P8  = +1.381351915555364098800e-02 */
+        .quad 0xBF8C5F921D01EC0B   /* P9  = -1.385416174911393178490e-02 */
+        .quad 0xBF3EE844C5B79FB8   /* P10 = -4.716079617694784908234e-04 */
+        .quad 0xBFDD000000000000   /* B = -.453125 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C73FA437AD7AD87   /* PL0 = +1.732779905745858845932e-17 */
+        .quad 0x3FDCC88C9902CF45   /* PH0 = +4.497405523536495697279e-01 */
+        .quad 0x3FE9870845162D1D   /* P1  = +7.977334355686341748810e-01 */
+        .quad 0xBFD6F62358F73DA8   /* P2  = -3.587730759436120677668e-01 */
+        .quad 0xBFBAC4345D675FE1   /* P3  = -1.045563438450467661101e-01 */
+        .quad 0x3FC5539DA8287019   /* P4  = +1.666142531474868131862e-01 */
+        .quad 0xBF96E3E0DC04A09F   /* P5  = -2.235366194614185212822e-02 */
+        .quad 0xBFAB5EC7147C207D   /* P6  = -5.345747113284546871398e-02 */
+        .quad 0x3F9C24166FFA7A58   /* P7  = +2.748141344511120915667e-02 */
+        .quad 0x3F8451B907819844   /* P8  = +9.921498815128277696693e-03 */
+        .quad 0xBF8C1C6D19191FCB   /* P9  = -1.372609360545586670239e-02 */
+        .quad 0x3F547372DF72E35A   /* P10 = +1.248228245272117756098e-03 */
+        .quad 0xBFDF000000000000   /* B = -.484375 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C848FE06EE49950   /* PL0 = +3.566941590788961528958e-17 */
+        .quad 0x3FDF20211A36475D   /* PH0 = +4.863360172249622803697e-01 */
+        .quad 0x3FE86E67E6B80AC2   /* P1  = +7.634772783497611574659e-01 */
+        .quad 0xBFD7C37C55474D9B   /* P2  = -3.713064987943767913461e-01 */
+        .quad 0xBFB2EBF15F3CB036   /* P3  = -7.391270232318521952684e-02 */
+        .quad 0x3FC4718C8EF6E3AA   /* P4  = +1.597152422016539530950e-01 */
+        .quad 0xBFA277F8394E9B07   /* P5  = -3.607154559658991932071e-02 */
+        .quad 0xBFA680312AB207E3   /* P6  = -4.394677778419955009224e-02 */
+        .quad 0x3F9EDC9A8B57E286   /* P7  = +3.013841128810892143223e-02 */
+        .quad 0x3F71B8C5E648EAF6   /* P8  = +4.326603932492947851719e-03 */
+        .quad 0xBF89DB218356730C   /* P9  = -1.262499029217558458029e-02 */
+        .quad 0x3F6B05728E6EBC8E   /* P10 = +3.298496001171330815865e-03 */
+        .quad 0xBFE1000000000000   /* B = -.53125  */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C8429831EDD94DE   /* PL0 = +3.497576705878673192147e-17 */
+        .quad 0x3FE10AF47E0BF610   /* PH0 = +5.325872861719194162333e-01 */
+        .quad 0x3FE6EC5879F87EEE   /* P1  = +7.163507826080299761242e-01 */
+        .quad 0xBFD86AD001BFE200   /* P2  = -3.815193192563413204129e-01 */
+        .quad 0xBFA239045B661385   /* P3  = -3.559125533778398983564e-02 */
+        .quad 0x3FC2B4572D9CC147   /* P4  = +1.461285565105845078038e-01 */
+        .quad 0xBFA99F4F01740705   /* P5  = -5.004355328311586406115e-02 */
+        .quad 0xBF9F449C484F4879   /* P6  = -3.053516570418721511214e-02 */
+        .quad 0x3F9F5F42169D7DDE   /* P7  = +3.063681853325116830798e-02 */
+        .quad 0xBF6111B1BA632A97   /* P8  = -2.083632588527460989469e-03 */
+        .quad 0xBF84725FBE5B6E61   /* P9  = -9.983776089419639342530e-03 */
+        .quad 0x3F7438A2986CFA9C   /* P10 = +4.936823976832951342488e-03 */
+        .quad 0xBFE3000000000000   /* B = -.59375  */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6BE9160BFB3505   /* PL0 = +1.210424670976053242391e-17 */
+        .quad 0x3FE26D76F73233C7   /* PH0 = +5.758623912857893101247e-01 */
+        .quad 0x3FE56363B5B93937   /* P1  = +6.683825063026124740752e-01 */
+        .quad 0xBFD8A2244B27297E   /* P2  = -3.848963483730115724200e-01 */
+        .quad 0xBF52CA2F101EEF63   /* P3  = -1.146837196286797844817e-03 */
+        .quad 0x3FC081BC342243AD   /* P4  = +1.289592032012739958675e-01 */
+        .quad 0xBFAE38DB4A932344   /* P5  = -5.902753148399722719732e-02 */
+        .quad 0xBF91F814D4AE90C6   /* P6  = -1.754791782481459457885e-02 */
+        .quad 0x3F9D056AE193C4F3   /* P7  = +2.834097863973723355792e-02 */
+        .quad 0xBF7BD0B502D8F3A0   /* P8  = -6.790835451792626336974e-03 */
+        .quad 0xBF7B763F7BB8AE2F   /* P9  = -6.704566938008179114124e-03 */
+        .quad 0x3F76036F42D9AB69   /* P10 = +5.374369252971835729099e-03 */
+        .quad 0xBFE5000000000000   /* B = -.65625  */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C8B64AF0450486E   /* PL0 = +4.751979286662385162741e-17 */
+        .quad 0x3FE3B75F8BCB742D   /* PH0 = +6.161344271055263499548e-01 */
+        .quad 0x3FE3DA23BC12369F   /* P1  = +6.203783677353447780947e-01 */
+        .quad 0xBFD8768FF4B46416   /* P2  = -3.822364701932782367281e-01 */
+        .quad 0x3F9D67CB8AD9CB1A   /* P3  = +2.871625933625941117406e-02 */
+        .quad 0x3FBC168CB7827DF4   /* P4  = +1.097190807363331305006e-01 */
+        .quad 0xBFB03A2B83C9272E   /* P5  = -6.338760344911228324430e-02 */
+        .quad 0xBF789FEB595297DC   /* P6  = -6.011885959344067548074e-03 */
+        .quad 0x3F98BD01B4C335E7   /* P7  = +2.415850320612902513532e-02 */
+        .quad 0xBF83BADC303D6535   /* P8  = -9.633751127398152979976e-03 */
+        .quad 0xBF6C54E7A1C1E3F3   /* P9  = -3.458454519258407989501e-03 */
+        .quad 0x3F7408394B7EF3E7   /* P10 = +4.890655334688332484537e-03 */
+        .quad 0xBFE7000000000000   /* B = -.71875  */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6A48557F6E0D3E   /* PL0 = +1.139824111505584215867e-17 */
+        .quad 0x3FE4E8D895B010DC   /* PH0 = +6.534235881413468227663e-01 */
+        .quad 0x3FE25652FAAF8A73   /* P1  = +5.730376144604875448991e-01 */
+        .quad 0xBFD7F6C3A57C444B   /* P2  = -3.744362941807295084434e-01 */
+        .quad 0x3FAB7866E3F99EBE   /* P3  = +5.365296872042567001598e-02 */
+        .quad 0x3FB6FA1DF47CCD40   /* P4  = +8.975398272450707099784e-02 */
+        .quad 0xBFB05508D3741B8E   /* P5  = -6.379752314033580026840e-02 */
+        .quad 0x3F6C3EFDF7BB279C   /* P6  = +3.448005705512137236209e-03 */
+        .quad 0x3F9372BADD6D3E27   /* P7  = +1.899234749299530050806e-02 */
+        .quad 0xBF860FD5AE65F3DA   /* P8  = -1.077238977881649471165e-02 */
+        .quad 0xBF47266FFB07E628   /* P9  = -7.064863949032872448118e-04 */
+        .quad 0x3F6F9763992C2A05   /* P10 = +3.856367614735181120799e-03 */
+        .quad 0xBFE9000000000000   /* B = -.78125  */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6BB6A2B194E3AB   /* PL0 = +1.201878007209462528697e-17 */
+        .quad 0x3FE602609AAE7C22   /* PH0 = +6.877902051090851731630e-01 */
+        .quad 0x3FE0DCBAFE191C7F   /* P1  = +5.269446337560025312137e-01 */
+        .quad 0xBFD732028428A9FB   /* P2  = -3.624273577321727538225e-01 */
+        .quad 0x3FB2D92389BE065B   /* P3  = +7.362577545975439796588e-02 */
+        .quad 0x3FB1F6A9C8C49993   /* P4  = +7.017003203927733370937e-02 */
+        .quad 0xBFAF47C0B50B56EE   /* P5  = -6.109430513394707378526e-02 */
+        .quad 0x3F85A8EDD1356223   /* P6  = +1.057611269668352068104e-02 */
+        .quad 0x3F8BE05C5CD1B4FA   /* P7  = +1.361152799855823798207e-02 */
+        .quad 0xBF85A0EFE4552F76   /* P8  = -1.056086936537046752272e-02 */
+        .quad 0x3F559F2A6A356194   /* P9  = +1.319686337259627831943e-03 */
+        .quad 0x3F6576F5E989208D   /* P10 = +2.620201394425042596201e-03 */
+        .quad 0xBFEB000000000000   /* B = -.84375  */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C80328BD86C8B74   /* PL0 = +2.809809047161267929701e-17 */
+        .quad 0x3FE704BB1B7FCB81   /* PH0 = +7.193275010198335595035e-01 */
+        .quad 0x3FDEE264AAD6C40C   /* P1  = +4.825679462765613089739e-01 */
+        .quad 0xBFD637493CE659F1   /* P2  = -3.471243948673921548357e-01 */
+        .quad 0x3FB6BE3A3DEE6F4A   /* P3  = +8.884014141079635303208e-02 */
+        .quad 0x3FAA85EB6470AC0F   /* P4  = +5.180297471118688523488e-02 */
+        .quad 0xBFACC0146EA4858D   /* P5  = -5.615295267694895314457e-02 */
+        .quad 0x3F8F8FB683CDDAC5   /* P6  = +1.541082944616557159055e-02 */
+        .quad 0x3F819515DEE2CB91   /* P7  = +8.585139145315585602547e-03 */
+        .quad 0xBF834E45E6AF9EA1   /* P8  = -9.426637747267209169415e-03 */
+        .quad 0x3F65250F197CA56D   /* P9  = +2.581147662472352252568e-03 */
+        .quad 0x3F57A766026D036C   /* P10 = +1.443719500187702367690e-03 */
+        .quad 0xBFED000000000000   /* B = -.90625  */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C716F7EEF7B61AD   /* PL0 = +1.512291215142578135651e-17 */
+        .quad 0x3FE7F0E1A4CD846E   /* PH0 = +7.481544703297353660076e-01 */
+        .quad 0x3FDC2D4CC872DC09   /* P1  = +4.402648885256331012598e-01 */
+        .quad 0xBFD514A99F92ED53   /* P2  = -3.293861444796750250530e-01 */
+        .quad 0x3FB9846A6CF2F337   /* P3  = +9.967675361526749494844e-02 */
+        .quad 0x3FA20896939AB161   /* P4  = +3.522177268800664413493e-02 */
+        .quad 0xBFA97E801F31EE0D   /* P5  = -4.979324703978358553405e-02 */
+        .quad 0x3F92A11F47B82085   /* P6  = +1.819275737037219740638e-02 */
+        .quad 0x3F717D70FE289C34   /* P7  = +4.270020845559097605514e-03 */
+        .quad 0xBF7FDCF1D3F6CE2D   /* P8  = -7.779068604054678540132e-03 */
+        .quad 0x3F69F607E81AF6B6   /* P9  = +3.169074480722534625181e-03 */
+        .quad 0x3F3F925C80D0F889   /* P10 = +4.817462766516585511824e-04 */
+        .quad 0xBFEF000000000000   /* B = -.96875  */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C931A11D7E8606E   /* PL0 = +6.627280241435322692188e-17 */
+        .quad 0x3FE92BFB370D9B71   /* PH0 = +7.866188121086975515439e-01 */
+        .quad 0x3FD866160E454111   /* P1  = +3.812308444367014680480e-01 */
+        .quad 0xBFD33149F3801DBA   /* P2  = -2.998833539899937679796e-01 */
+        .quad 0x3FBBDB6D4C949899   /* P3  = +1.088169395412442909023e-01 */
+        .quad 0x3F8D6AB2A74B9343   /* P4  = +1.436366627735597372494e-02 */
+        .quad 0xBFA404D1047C5D72   /* P5  = -3.909924678571997970917e-02 */
+        .quad 0x3F93C47D9ACCD919   /* P6  = +1.930423981976856424661e-02 */
+        .quad 0xBF41B755642CFF1B   /* P7  = -5.406538915408738478158e-04 */
+        .quad 0xBF74B5301AA1E788   /* P8  = -5.055606752756853900641e-03 */
+        .quad 0x3F69A84C5B2A3E68   /* P9  = +3.132008679422249529120e-03 */
+        .quad 0xBF3CF47830328C11   /* P10 = -4.418176105877589308931e-04 */
+        .quad 0xBFF1000000000000   /* B = -1.0625   */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C884D471B8FD396   /* PL0 = +4.215701792312937090514e-17 */
+        .quad 0x3FEA8DBCBC31897A   /* PH0 = +8.298019099859594849278e-01 */
+        .quad 0x3FD3EE730537C8EA   /* P1  = +3.114287901836535219818e-01 */
+        .quad 0xBFD08A05AD27CE32   /* P2  = -2.584242049190123217982e-01 */
+        .quad 0x3FBC5255406F84B6   /* P3  = +1.106313021005175045399e-01 */
+        .quad 0xBF772FA2F633AA5E   /* P4  = -5.660664147607434209241e-03 */
+        .quad 0xBF99DD8E4C473FC4   /* P5  = -2.525923100057504533247e-02 */
+        .quad 0x3F9183C935B6495D   /* P6  = +1.710428610165003372069e-02 */
+        .quad 0xBF70471A3A591480   /* P7  = -3.974058583087303228038e-03 */
+        .quad 0xBF603DDD4DEBB9A4   /* P8  = -1.982624278176818987264e-03 */
+        .quad 0x3F62591E44D3C17F   /* P9  = +2.239760512218135956425e-03 */
+        .quad 0xBF4C195D3A9B1AB4   /* P10 = -8.575158328419569430544e-04 */
+        .quad 0xBFF3000000000000   /* B = -1.1875   */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C90DD1C9BFF7F64   /* PL0 = +5.850777430004479798187e-17 */
+        .quad 0x3FEBAD50A4A68BC1   /* PH0 = +8.649066177207417327466e-01 */
+        .quad 0x3FD01FBA72CEE1A5   /* P1  = +2.519365426228666233893e-01 */
+        .quad 0xBFCBE432F647C4D6   /* P2  = -2.179015829602010702633e-01 */
+        .quad 0x3FBABF92B6E5AC73   /* P3  = +1.044856735731387955105e-01 */
+        .quad 0xBF922983AA24E217   /* P4  = -1.773648954369563555378e-02 */
+        .quad 0xBF8C72214C14E23A   /* P5  = -1.388956082756564056328e-02 */
+        .quad 0x3F8ACB4D1F388E8B   /* P6  = +1.308307887581540972153e-02 */
+        .quad 0xBF740EF8B4A2EE3B   /* P7  = -4.897090441029978580995e-03 */
+        .quad 0xBF0EA9F30C8DC900   /* P8  = -5.848668076326342477133e-05 */
+        .quad 0x3F53CC40D18713AE   /* P9  = +1.208365725788622757410e-03 */
+        .quad 0xBF4848B86029CBA1   /* P10 = -7.410908004444779592485e-04 */
+        .quad 0xBFF5000000000000   /* B = -1.3125   */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C8FB61781D22681   /* PL0 = +5.501032995458057064843e-17 */
+        .quad 0x3FEC950A3340C8BF   /* PH0 = +8.931933404003514764824e-01 */
+        .quad 0x3FC9E1DFFD385423   /* P1  = +2.022056566644617586005e-01 */
+        .quad 0xBFC71E2FF88EBA23   /* P2  = -1.806087459239772032583e-01 */
+        .quad 0x3FB80AEBD07AB5BA   /* P3  = +9.391664352252506838449e-02 */
+        .quad 0xBF98404E27EAE6ED   /* P4  = -2.368280523908243895884e-02 */
+        .quad 0xBF772DA520B5006E   /* P5  = -5.658764868087568802107e-03 */
+        .quad 0x3F824C9268AF9423   /* P6  = +8.935111827620250551925e-03 */
+        .quad 0xBF722AE76D206AE3   /* P7  = -4.435447701349490160113e-03 */
+        .quad 0x3F4B807F56298D5E   /* P8  = +8.392926941493230644497e-04 */
+        .quad 0x3F3D71027DF95D2A   /* P9  = +4.492407879061627603159e-04 */
+        .quad 0xBF3EBD17676755FB   /* P10 = -4.690343988874298905483e-04 */
+        .quad 0xBFF7000000000000   /* B = -1.4375   */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C95393C63CE8224   /* PL0 = +7.363407705201031038415e-17 */
+        .quad 0x3FED4E6F464286B0   /* PH0 = +9.158245441687622445670e-01 */
+        .quad 0x3FC4A45842B7DE1E   /* P1  = +1.612654042980787191461e-01 */
+        .quad 0xBFC2E7885AFDD3D0   /* P2  = -1.476908153814791087327e-01 */
+        .quad 0x3FB4DD6DD51D3FEB   /* P3  = +8.150373890862254580204e-02 */
+        .quad 0xBF9A05D3ADAB489C   /* P4  = -2.541285274021075503042e-02 */
+        .quad 0xBF3459B643B4995C   /* P5  = -3.105230313899165257622e-04 */
+        .quad 0x3F766B30745F2E3A   /* P6  = +5.473317409222350365811e-03 */
+        .quad 0xBF6C2C891E555BDF   /* P7  = -3.439204988051155730940e-03 */
+        .quad 0x3F5194F30D6C576D   /* P8  = +1.073109966176012791522e-03 */
+        .quad 0x3EF4DBB43C3132A2   /* P9  = +1.989194766975849961365e-05 */
+        .quad 0xBF2E45EBAB3C15A0   /* P10 = -2.309656316514087783666e-04 */
+        .quad 0xBFF9000000000000   /* B = -1.5625   */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C75111669651DAA   /* PL0 = +1.827249135453834384396e-17 */
+        .quad 0x3FEDE1EB5937518F   /* PH0 = +9.338280432225917193634e-01 */
+        .quad 0x3FC06129C7C8EBB1   /* P1  = +1.279651856910653382507e-01 */
+        .quad 0xBFBE9763041064E1   /* P2  = -1.194974789545031421774e-01 */
+        .quad 0x3FB1A5B9F9113928   /* P3  = +6.893503504509068635308e-02 */
+        .quad 0xBF992145039F9AFE   /* P4  = -2.454097590080105816526e-02 */
+        .quad 0x3F66CB116EA49C89   /* P5  = +2.782377288116648315142e-03 */
+        .quad 0x3F67F972FDF30001   /* P6  = +2.926563829163342740100e-03 */
+        .quad 0xBF63A7B5975F02F3   /* P7  = -2.399305983061922438601e-03 */
+        .quad 0x3F4FDE7B8777F4C8   /* P8  = +9.725669069095216373599e-04 */
+        .quad 0xBF25918876626BA4   /* P9  = -1.645545082212515656240e-04 */
+        .quad 0xBF1495123C991F00   /* P10 = -7.851527984669912693674e-05 */
+        .quad 0xBFFB000000000000   /* B = -1.6875   */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C9F29A5B7426D27   /* PL0 = +1.081172820484012446345e-16 */
+        .quad 0x3FEE56B6F3EFABFC   /* PH0 = +9.480852856044061915952e-01 */
+        .quad 0x3FB9E3EFD94BB9FC   /* P1  = +1.011342912204113371518e-01 */
+        .quad 0xBFB88BD9760FECA7   /* P2  = -9.588393337610288420285e-02 */
+        .quad 0x3FAD48A0350B3ACF   /* P3  = +5.719471595295077387313e-02 */
+        .quad 0xBF96CC6A5110F129   /* P4  = -2.226415748394675367257e-02 */
+        .quad 0x3F71934687170384   /* P5  = +4.290843485649345772606e-03 */
+        .quad 0x3F5407BAF73B3DF9   /* P6  = +1.222546180475235334287e-03 */
+        .quad 0xBF591B626C0646DD   /* P7  = -1.532407870488964407324e-03 */
+        .quad 0x3F48B0E1DD283558   /* P8  = +7.535078860329375669277e-04 */
+        .quad 0xBF2B322292840D2B   /* P9  = -2.074877932117605962646e-04 */
+        .quad 0xBE99E4061120C741   /* P10 = -3.858017559892704559672e-07 */
+        .quad 0xBFFD000000000000   /* B = -1.8125   */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6AF8C2041C67CD   /* PL0 = +1.169711482626385762338e-17 */
+        .quad 0x3FEEB2DFEDD5EC93   /* PH0 = +9.593352933146824801369e-01 */
+        .quad 0x3FB465A205CFB638   /* P1  = +7.967579500083210999681e-02 */
+        .quad 0xBFB3914BF68D39FF   /* P2  = -7.643580216720378576778e-02 */
+        .quad 0x3FA7F21A08C5C734   /* P3  = +4.676896435820623621673e-02 */
+        .quad 0xBF93DA9560EA9960   /* P4  = -1.938851741820124550772e-02 */
+        .quad 0x3F73953FEC62820E   /* P5  = +4.781007481284861359820e-03 */
+        .quad 0x3F2749D5E1273E3C   /* P6  = +1.776765426044646108071e-04 */
+        .quad 0xBF4D46B0B498CE5A   /* P7  = -8.934367007839658352859e-04 */
+        .quad 0x3F4153D680E1F4C4   /* P8  = +5.287930851093571206574e-04 */
+        .quad 0xBF28477014ECA6A2   /* P9  = -1.852344816708944640949e-04 */
+        .quad 0x3EFFAC54E07CEB4B   /* P10 = +3.020588886147182143902e-05 */
+        .quad 0xBFFF000000000000   /* B = -1.9375   */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C7A8AF2BB2231F2   /* PL0 = +2.302217989249372577466e-17 */
+        .quad 0x3FEF1994DF724FC8   /* PH0 = +9.718727459135090285258e-01 */
+        .quad 0x3FAC65B1BC0C9D58   /* P1  = +5.546336575053583942603e-02 */
+        .quad 0xBFAB9937BDA747C8   /* P2  = -5.390333356957871365599e-02 */
+        .quad 0x3FA15B42D9EF931C   /* P3  = +3.389939222669210777241e-02 */
+        .quad 0xBF8EACD8E8507A3C   /* P4  = -1.497811755149058215502e-02 */
+        .quad 0x3F7263A15721C682   /* P5  = +4.489546046998806349050e-03 */
+        .quad 0xBF42A032ACDC3B32   /* P6  = -5.684134900735048121829e-04 */
+        .quad 0xBF3431E79B5AD185   /* P7  = -3.081503340170088810438e-04 */
+        .quad 0x3F31B51667C7DF5E   /* P8  = +2.701930714290502424828e-04 */
+        .quad 0xBF1F8709579250AD   /* P9  = -1.202678157759563704341e-04 */
+        .quad 0x3F01ED8ED1BF9595   /* P10 = +3.419487094883790833778e-05 */
+        .quad 0xC001000000000000   /* B = -2.125    */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C86F3F7C3DAFC55   /* PL0 = +3.981710680748877459333e-17 */
+        .quad 0x3FEF73776B2AA2DB   /* PH0 = +9.828450291725759901951e-01 */
+        .quad 0x3FA16A7FC4D7B900   /* P1  = +3.401564863075812007064e-02 */
+        .quad 0xBFA11E03803AD621   /* P2  = -3.343211117082156940532e-02 */
+        .quad 0x3F9609591597297F   /* P3  = +2.152003473546803654658e-02 */
+        .quad 0xBF847E74ED9BBB0C   /* P4  = -1.000682211039596246436e-02 */
+        .quad 0x3F6BFF771725CD65   /* P5  = +3.417713736035987187864e-03 */
+        .quad 0xBF491D1FF73C18FA   /* P6  = -7.664114077392807421000e-04 */
+        .quad 0x3EF53EE467B51DC5   /* P7  = +2.026145237479599375099e-05 */
+        .quad 0x3F160135BE0D94A0   /* P8  = +8.394136922403255700685e-05 */
+        .quad 0xBF0B32CB1D276A40   /* P9  = -5.187685350778849443841e-05 */
+        .quad 0x3EF4DAF70C12D555   /* P10 = +1.988919462255396826584e-05 */
+        .quad 0xC003000000000000   /* B = -2.375    */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C19DBF4E2E5B7DC   /* PL0 = +3.504575836708380670219e-19 */
+        .quad 0x3FEFAA7934B75EBD   /* PH0 = +9.895597486128832054320e-01 */
+        .quad 0x3F9545200830A42C   /* P1  = +2.077150392520736492125e-02 */
+        .quad 0xBF950C46D285F6BC   /* P2  = -2.055464420253970271376e-02 */
+        .quad 0x3F8B79F5BFC6513F   /* P3  = +1.341621390819425058164e-02 */
+        .quad 0xBF7A50ADAD777898   /* P4  = -6.424597194806612772505e-03 */
+        .quad 0x3F633A19BE8255E3   /* P5  = +2.347040444940816227383e-03 */
+        .quad 0xBF44E609BC2557B7   /* P6  = -6.377742322836087134324e-04 */
+        .quad 0x3F1AFCBAD60EAACD   /* P7  = +1.029480968230231421206e-04 */
+        .quad 0x3EE80476AC34A8EF   /* P8  = +1.145240583485084317660e-05 */
+        .quad 0xBEF278E23DE463E9   /* P9  = -1.761646478213091821804e-05 */
+        .quad 0x3EE209FAF377264D   /* P10 = +8.601658563106529694651e-06 */
+        .quad 0xC005000000000000   /* B = -2.625    */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C979D62702C631C   /* PL0 = +8.193023793215066385979e-17 */
+        .quad 0x3FEFCC04CDBCDC4B   /* PH0 = +9.936546343150295390600e-01 */
+        .quad 0x3F89E87D088D269A   /* P1  = +1.265046770426474576547e-02 */
+        .quad 0xBF89BE6721012B80   /* P2  = -1.257019586059526836624e-02 */
+        .quad 0x3F80F1C13E8D39D3   /* P3  = +8.273610803056031004326e-03 */
+        .quad 0xBF7082DBC9602757   /* P4  = -4.031046430108839563004e-03 */
+        .quad 0x3F590BE9BD4E0A11   /* P5  = +1.528719197467002507978e-03 */
+        .quad 0xBF3DCC2BEF6D0283   /* P6  = -4.546744598208711809986e-04 */
+        .quad 0x3F1A08065C4A8E85   /* P7  = +9.930170842636406837764e-05 */
+        .quad 0xBEE528117D0410F3   /* P8  = -1.008821337267942266431e-05 */
+        .quad 0xBED0BE73A44FF565   /* P9  = -3.992069257383521775961e-06 */
+        .quad 0x3EC9B0C11E342E38   /* P10 = +3.062539904901699218737e-06 */
+        .quad 0xC007000000000000   /* B = -2.875    */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C804B931AD7A3CC   /* PL0 = +2.826768921701616830245e-17 */
+        .quad 0x3FEFE06EB0688212   /* PH0 = +9.961465306733450209009e-01 */
+        .quad 0x3F7F81BD8876224D   /* P1  = +7.692089427458426472642e-03 */
+        .quad 0xBF7F62A8C699A963   /* P2  = -7.662448196791823756776e-03 */
+        .quad 0x3F74C31E2B2A6A28   /* P3  = +5.068891378551522166321e-03 */
+        .quad 0xBF6470D537F16227   /* P4  = -2.495209162173734080001e-03 */
+        .quad 0x3F4FAEEF61C89673   /* P5  = +9.668988091717359455754e-04 */
+        .quad 0xBF33C5E80B349783   /* P6  = -3.017131341088651514023e-04 */
+        .quad 0x3F138F3D31037A6B   /* P7  = +7.461367590931028650557e-05 */
+        .quad 0xBEEB3C780996FFE3   /* P8  = -1.298723536791163711556e-05 */
+        .quad 0x3E9D0C75BC8BFEFC   /* P9  = +4.328589367358221917138e-07 */
+        .quad 0x3EAC3865227764D4   /* P10 = +8.410302755848104487452e-07 */
+        .quad 0xC009000000000000   /* B = -3.125    */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C5B978B202749F9   /* PL0 = +5.983054034451594408315e-18 */
+        .quad 0x3FEFECD6B7EA3128   /* PH0 = +9.976609794698889643882e-01 */
+        .quad 0x3F73238B786137FE   /* P1  = +4.672570043181776968058e-03 */
+        .quad 0xBF731815ACEA072E   /* P2  = -4.661640805922390930706e-03 */
+        .quad 0x3F6956F0816D5AEE   /* P3  = +3.093213784647877798933e-03 */
+        .quad 0xBF591A16286C4885   /* P4  = -1.532098425461232453877e-03 */
+        .quad 0x3F43B3E3A00C6096   /* P5  = +6.012784434430592468442e-04 */
+        .quad 0xBF29441B2A56DEC7   /* P6  = -1.927645836710038499293e-04 */
+        .quad 0x3F0A99C3A2E857B6   /* P7  = +5.073669705184196724674e-05 */
+        .quad 0xBEE61CB034DDC151   /* P8  = -1.054385361573597042258e-05 */
+        .quad 0x3EB792BBC76D6107   /* P9  = +1.405070887824641788698e-06 */
+        .quad 0x3E761472362A16F0   /* P10 = +8.225391704739515383837e-08 */
+        .quad 0xC00B000000000000   /* B = -3.375    */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C9C290AFCBDE00D   /* PL0 = +9.770074992945060684926e-17 */
+        .quad 0x3FEFF45F6D36133A   /* PH0 = +9.985806592017987259879e-01 */
+        .quad 0x3F673CEC093032DE   /* P1  = +2.836667068100913999228e-03 */
+        .quad 0xBF67347A7CD844D5   /* P2  = -2.832640870800243808078e-03 */
+        .quad 0x3F5EDA25530355DB   /* P3  = +1.883064698679040793627e-03 */
+        .quad 0xBF4EAD3BBABC1BA9   /* P4  = -9.361783645268534848806e-04 */
+        .quad 0x3F3842E61CD35432   /* P5  = +3.701984213198588740338e-04 */
+        .quad 0xBF1F9AB7FD1A3DDD   /* P6  = -1.205611036090218544867e-04 */
+        .quad 0x3F0136C154EA3DED   /* P7  = +3.283288480304320224929e-05 */
+        .quad 0xBEDF12807F721E66   /* P8  = -7.408207230892235753013e-06 */
+        .quad 0x3EB5B53687AD5112   /* P9  = +1.293889481520047941659e-06 */
+        .quad 0xBE801E90FBFED147   /* P10 = -1.200988872775447204019e-07 */
+        .quad 0xC00D000000000000   /* B = -3.625    */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C9E323294294877   /* PL0 = +1.047637125334028950603e-16 */
+        .quad 0x3FEFF8F21CDAAA62   /* PH0 = +9.991388858373506653976e-01 */
+        .quad 0x3F5C3470628813F2   /* P1  = +1.721486807697344658108e-03 */
+        .quad 0xBF5C2E38AC6FF8D2   /* P2  = -1.720004411026422324849e-03 */
+        .quad 0x3F52C13234626F43   /* P3  = +1.144694354969070234454e-03 */
+        .quad 0xBF42B0A47DF47BB4   /* P4  = -5.703738387728891173354e-04 */
+        .quad 0x3F2DB2889E32FBFD   /* P5  = +2.265731592156760387344e-04 */
+        .quad 0xBF1385FBD54C5A55   /* P6  = -7.447576110695385196414e-05 */
+        .quad 0x3EF5AFA812C6984E   /* P7  = +2.068153223579892541184e-05 */
+        .quad 0xBED47097C188A03C   /* P8  = -4.873231795467276043290e-06 */
+        .quad 0x3EAFF2B982F7EE8C   /* P9  = +9.521288628073486288914e-07 */
+        .quad 0xBE828EC5B57D424D   /* P10 = -1.382656715739529384702e-07 */
+        .quad 0xC00F000000000000   /* B = -3.875    */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C9BA40DA6983BEC   /* PL0 = +9.589840482158163453169e-17 */
+        .quad 0x3FEFFCAAC3F20E65   /* PH0 = +9.995931460438894911036e-01 */
+        .quad 0x3F4AA87CF664754C   /* P1  = +8.135423820793490331956e-04 */
+        .quad 0xBF4AA5B62919E224   /* P2  = -8.132113891426467676310e-04 */
+        .quad 0x3F41C01B53B0B312   /* P3  = +5.416997368051531710388e-04 */
+        .quad 0xBF31B8B54D091751   /* P4  = -2.704088811110632606347e-04 */
+        .quad 0x3F1C431305954ECC   /* P5  = +1.078110084525254933728e-04 */
+        .quad 0xBF02B7DEAD0D44E6   /* P6  = -3.570221236393906131126e-05 */
+        .quad 0x3EE51C6EFF109EA9   /* P7  = +1.006654199116272154479e-05 */
+        .quad 0xBEC48CFB08072D17   /* P8  = -2.449834994621594976610e-06 */
+        .quad 0x3EA1585EC59CAE34   /* P9  = +5.169271261920604503617e-07 */
+        .quad 0xBE78832BAF950BA9   /* P10 = -9.131575131209528255629e-08 */
+        .quad 0xC011000000000000   /* B = -4.25     */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C8FBF237F4AFE10   /* PL0 = +5.507163370275307643966e-17 */
+        .quad 0x3FEFFEC61279A3A4   /* PH0 = +9.998503075449787225182e-01 */
+        .quad 0x3F339E78281A00EA   /* P1  = +2.993625022114214863645e-04 */
+        .quad 0xBF339DB7B072AD62   /* P2  = -2.993176899035080028902e-04 */
+        .quad 0x3F2A259E658EF4E4   /* P3  = +1.994853835451177669594e-04 */
+        .quad 0xBF1A219C312B10BA   /* P4  = -9.968295880030927192162e-05 */
+        .quad 0x3F04E146B4F5F4B7   /* P5  = +3.982541113154699160876e-05 */
+        .quad 0xBEEBC5F137088210   /* P6  = -1.324329943580649487333e-05 */
+        .quad 0x3ECF96736E300B00   /* P7  = +3.765547135882256916132e-06 */
+        .quad 0xBEAF4874840B91EB   /* P8  = -9.323068824421825762292e-07 */
+        .quad 0x3E8B6AB2B5C8FD3F   /* P9  = +2.042709991312793245971e-07 */
+        .quad 0xBE650BCCE62FD2B7   /* P10 = -3.920140725219944650830e-08 */
+        .quad 0xC013000000000000   /* B = -4.75     */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C9C869C85471703   /* PL0 = +9.896883942603146946483e-17 */
+        .quad 0x3FEFFF8C81C6DC33   /* PH0 = +9.999449286177707341139e-01 */
+        .quad 0x3F1CDF5A2E4D7C69   /* P1  = +1.101397316012206760643e-04 */
+        .quad 0xBF1CDEF1F9BE63BE   /* P2  = -1.101336660539594564027e-04 */
+        .quad 0x3F133EC10C83AAA0   /* P3  = +7.341435696487731017506e-05 */
+        .quad 0xBF033DAB325FAACB   /* P4  = -3.669909192168459445238e-05 */
+        .quad 0x3EEEC598FA98BAD8   /* P5  = +1.467316890843338172161e-05 */
+        .quad 0xBED47F1A15BA368E   /* P6  = -4.886744445221253126882e-06 */
+        .quad 0x3EB761FBE7D201C1   /* P7  = +1.393720509029845064726e-06 */
+        .quad 0xBE974CD75A43BF6B   /* P8  = -3.471994551992448536007e-07 */
+        .quad 0x3E74B02965BBF8DC   /* P9  = +7.706929621914905669946e-08 */
+        .quad 0xBE504EF4E3892A66   /* P10 = -1.518840362012570189110e-08 */
+        .quad 0xC015000000000000   /* B = -5.25     */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C643810400471B0   /* PL0 = +8.768592603904887599187e-18 */
+        .quad 0x3FEFFFD583014825   /* PH0 = +9.999797400180382433987e-01 */
+        .quad 0x3F053E71416C43CA   /* P1  = +4.051955345663706869871e-05 */
+        .quad 0xBF053E550C7C8CC9   /* P2  = -4.051873253121394012080e-05 */
+        .quad 0x3EFC52D0D90D4843   /* P3  = +2.701139380018752534477e-05 */
+        .quad 0xBEEC523A6ADBE142   /* P4  = -1.350460237457883558350e-05 */
+        .quad 0x3ED6A73E22D844B3   /* P5  = +5.400965660055565196396e-06 */
+        .quad 0xBEBE31D10F23ACD0   /* P6  = -1.799738182979224868919e-06 */
+        .quad 0x3EA13E14264DEAB2   /* P7  = +5.138663935333241981438e-07 */
+        .quad 0xBE81385ABB98EDCC   /* P8  = -1.282999997786486835638e-07 */
+        .quad 0x3E5EB9164593E0B6   /* P9  = +2.861301981891537161158e-08 */
+        .quad 0xBE387218CFE7772E   /* P10 = -5.691705994073124478195e-09 */
+        .quad 0xC017000000000000   /* B = -5.75     */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C92530433F4C703   /* PL0 = +6.357512739163799046861e-17 */
+        .quad 0x3FEFFFF05E8D3191   /* PH0 = +9.999925467214315633058e-01 */
+        .quad 0x3EEF42DDFA52B575   /* P1  = +1.490650158538873335176e-05 */
+        .quad 0xBEEF42CEB54212AA   /* P2  = -1.490639048307961378200e-05 */
+        .quad 0x3EE4D7201CBCB853   /* P3  = +9.937445518550804010127e-06 */
+        .quad 0xBED4D6F764B66C37   /* P4  = -4.968574624976280456686e-06 */
+        .quad 0x3EC0ABB806EBDE71   /* P5  = +1.987311456171617620608e-06 */
+        .quad 0xBEA6399CF854F876   /* P6  = -6.623581475862682369330e-07 */
+        .quad 0x3E8964B91728D7C9   /* P7  = +1.891959403186505598965e-07 */
+        .quad 0xBE6961A0528444D6   /* P8  = -4.727645325404986954168e-08 */
+        .quad 0x3E46AE3B0814EE00   /* P9  = +1.056147192151514779549e-08 */
+        .quad 0xBE221B8194DACD16   /* P10 = -2.107984154277957626641e-09 */
+        .quad 0xC019000000000000   /* B = -6.25     */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C7BB5622CE1A79E   /* PL0 = +2.403331811901679167526e-17 */
+        .quad 0x3FEFFFFA3FF22708   /* PH0 = +9.999972580855862602789e-01 */
+        .quad 0x3ED7003552D53503   /* P1  = +5.483821309338170039906e-06 */
+        .quad 0xBED7003130C1AB92   /* P2  = -5.483806273169366545037e-06 */
+        .quad 0x3ECEAAE13B699C45   /* P3  = +3.655850800133043324271e-06 */
+        .quad 0xBEBEAACB305F3D07   /* P4  = -1.827905351959291114416e-06 */
+        .quad 0x3EA8887F5F9C87EF   /* P5  = +7.311461438267648556646e-07 */
+        .quad 0xBE905AD08DF8454F   /* P6  = -2.437046884027860662692e-07 */
+        .quad 0x3E72B068300B703F   /* P7  = +6.962228483613086736676e-08 */
+        .quad 0xBE52AF921A71C058   /* P8  = -1.740252888706390465423e-08 */
+        .quad 0x3E30B53EAA35300D   /* P9  = +3.890131469838137725119e-09 */
+        .quad 0xBE0AB60CDAD7E22E   /* P10 = -7.773963050435300060566e-10 */
+        .quad 0xC01B000000000000   /* B = -6.75     */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C8BD1ACF80D7256   /* PL0 = +4.825835138930451121169e-17 */
+        .quad 0x3FEFFFFDE2760A41   /* PH0 = +9.999989913051835488389e-01 */
+        .quad 0x3EC0EC4F1EC27E55   /* P1  = +2.017388615341105998718e-06 */
+        .quad 0xBEC0EC4E005E6EAC   /* P2  = -2.017386580411626200507e-06 */
+        .quad 0x3EB6906504BC4610   /* P3  = +1.344921673533307001969e-06 */
+        .quad 0xBEA6905F0D52C8B5   /* P4  = -6.724581235377781360384e-07 */
+        .quad 0x3E920D0F5CCE152B   /* P5  = +2.689810941136721216499e-07 */
+        .quad 0xBE7811505B10E753   /* P6  = -8.965891741619763761543e-08 */
+        .quad 0x3E5B811EE4F9B8EE   /* P7  = +2.561544781706659619288e-08 */
+        .quad 0xBE3B80ABC067E840   /* P8  = -6.403452884688571158579e-09 */
+        .quad 0x3E1898E394E09335   /* P9  = +1.431746793613569087489e-09 */
+        .quad 0xBDF3ABB5BA711DB7   /* P10 = -2.862469657501951918569e-10 */
+        .quad 0xC01D000000000000   /* B = -7.25     */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C8AE01DB39A3791   /* PL0 = +4.662147961093911873193e-17 */
+        .quad 0x3FEFFFFF38C76668   /* PH0 = +9.999996289217962797125e-01 */
+        .quad 0x3EA8E712E56E1188   /* P1  = +7.421562696484951529573e-07 */
+        .quad 0xBEA8E7124A650791   /* P2  = -7.421559942504648535596e-07 */
+        .quad 0x3EA09A0B62D8EF94   /* P3  = +4.947702955735978541097e-07 */
+        .quad 0xBE909A09C56C2107   /* P4  = -2.473847805916120382218e-07 */
+        .quad 0x3E7A900A90A54A6E   /* P5  = +9.895362410487317236618e-08 */
+        .quad 0xBE61B5557BB449B6   /* P6  = -3.298434544432568302770e-08 */
+        .quad 0x3E443CC74732CDCA   /* P7  = +9.423781066565733462466e-09 */
+        .quad 0xBE243CA8AA8D6E54   /* P8  = -2.355890888986360997159e-09 */
+        .quad 0x3E0219C341E0D1B4   /* P9  = +5.267978308406275552691e-10 */
+        .quad 0xBDDCF49A10950F13   /* P10 = -1.053394074620716018815e-10 */
+        .quad 0xC01F000000000000   /* B = -7.75     */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C75CB18F3775414   /* PL0 = +1.890271747518592444083e-17 */
+        .quad 0x3FEFFFFFD38C39F0   /* PH0 = +9.999999172012490333827e-01 */
+        .quad 0x3E8639E2F89493BB   /* P1  = +1.655974950855472979393e-07 */
+        .quad 0xBE8639E2D9B29562   /* P2  = -1.655974813708346974914e-07 */
+        .quad 0x3E7DA2836A1F706E   /* P3  = +1.103982989742589616541e-07 */
+        .quad 0xBE6DA282C6733DAE   /* P4  = -5.519913131581509871840e-08 */
+        .quad 0x3E57B53A278851FD   /* P5  = +2.207971980430773309147e-08 */
+        .quad 0xBE3F9C4A72536E22   /* P6  = -7.359895614149337484810e-09 */
+        .quad 0x3E220E81FBE19CDD   /* P7  = +2.102073153607135257714e-09 */
+        .quad 0xBE020E8875ADA8D8   /* P8  = -5.255211642212584097407e-10 */
+        .quad 0x3DE07634328384FC   /* P9  = +1.197748786062966341989e-10 */
+        .quad 0xBDBA54078E3C351F   /* P10 = -2.394539505021488953905e-11 */
+        .quad 0xC021000000000000   /* B = -8.5      */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C98B78738B0EDEF   /* PL0 = +8.575399788039081964921e-17 */
+        .quad 0x3FEFFFFFF9FBEA40   /* PH0 = +9.999999887944071019774e-01 */
+        .quad 0x3E581056FAC28C46   /* P1  = +2.241118550516412682327e-08 */
+        .quad 0xBE581056F63A4351   /* P2  = -2.241118525356742542550e-08 */
+        .quad 0x3E500AE49533790A   /* P3  = +1.494078933911655875521e-08 */
+        .quad 0xBE400AE489ACBA90   /* P4  = -7.470394349637968945652e-09 */
+        .quad 0x3E29AB0D59A1967B   /* P5  = +2.988168557255271725494e-09 */
+        .quad 0xBE111CB32D6EEF2B   /* P6  = -9.960558400070350772418e-10 */
+        .quad 0x3DF38CBADF396908   /* P7  = +2.844859618921805216353e-10 */
+        .quad 0xBDD38CC7B92CECD3   /* P8  = -7.112220386749926320915e-11 */
+        .quad 0x3DB1D2BBE2705032   /* P9  = +1.621008722427575444686e-11 */
+        .quad 0xBD8C8199294E6380   /* P10 = -3.240784656869469020111e-12 */
+        .quad 0xC023000000000000   /* B = -9.5      */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C8EEEC16618B984   /* PL0 = +5.365957423487855307906e-17 */
+        .quad 0x3FEFFFFFFF2F9279   /* PH0 = +9.999999984834878619111e-01 */
+        .quad 0x3E2A0DB0D052B148   /* P1  = +3.033024167396880687734e-09 */
+        .quad 0xBE2A0DB0CFA6AB71   /* P2  = -3.033024162734192808028e-09 */
+        .quad 0x3E215E75D53A3105   /* P3  = +2.022016035353114070618e-09 */
+        .quad 0xBE115E75D40AA47F   /* P4  = -1.011008013562702155050e-09 */
+        .quad 0x3DFBCA5CDC12ED1C   /* P5  = +4.044047007631481841556e-10 */
+        .quad 0xBDE286E85704FC22   /* P6  = -1.348015410318274576187e-10 */
+        .quad 0x3DC52A8925354517   /* P7  = +3.850101197145027796396e-11 */
+        .quad 0xBDA52A97EA3F5F4A   /* P8  = -9.625355478142550638468e-12 */
+        .quad 0x3D834C011A2AC0F7   /* P9  = +2.193802608697321032841e-12 */
+        .quad 0xBD5EDD05BDCB3A62   /* P10 = -4.385948508419928563300e-13 */
+        .quad 0xC025000000000000   /* B = -10.5      */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6BD8B474BBF792   /* PL0 = +1.207649585364892639612e-17 */
+        .quad 0x3FEFFFFFFFE3CAD8   /* PH0 = +9.999999997947623953110e-01 */
+        .quad 0x3DFC3527E43C565F   /* P1  = +4.104751852963940338559e-10 */
+        .quad 0xBDFC3527E420F415   /* P2  = -4.104751852036136216697e-10 */
+        .quad 0x3DF2CE1A8D806DAD   /* P3  = +2.736501142887952919489e-10 */
+        .quad 0xBDE2CE1A8DDF690A   /* P4  = -1.368250573053032426141e-10 */
+        .quad 0x3DCE169832D8BD68   /* P5  = +5.473022586854025789680e-11 */
+        .quad 0xBDB40F0FE853DA5B   /* P6  = -1.824340550195944358477e-11 */
+        .quad 0x3D96EA8D930D31A1   /* P7  = +5.210545794901128943676e-12 */
+        .quad 0xBD76EA9DB0D09839   /* P8  = -1.302650427355019556441e-12 */
+        .quad 0x3D54E474FD4303A1   /* P9  = +2.968990047962355000258e-13 */
+        .quad 0xBD30B526CA2B228A   /* P10 = -5.935740124899435401321e-14 */
+        .quad 0xC027000000000000   /* B = -11.5      */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C56E8953D525FD5   /* PL0 = +4.967494994909661698725e-18 */
+        .quad 0x3FEFFFFFFFFC2EB9   /* PH0 = +9.999999999722241073030e-01 */
+        .quad 0x3DCE8A37A48016C2   /* P1  = +5.555177547354687971427e-11 */
+        .quad 0xBDCE8A37A479B7D4   /* P2  = -5.555177547084873157964e-11 */
+        .quad 0x3DC45C250CFA9C16   /* P3  = +3.703451575129414499553e-11 */
+        .quad 0xBDB45C250D9F8467   /* P4  = -1.851725791056759260154e-11 */
+        .quad 0x3DA049BB33CBD4E9   /* P5  = +7.406930640558963265190e-12 */
+        .quad 0xBD85B7A407C422C1   /* P6  = -2.468976464832073512208e-12 */
+        .quad 0x3D68CF9CED2B3FD5   /* P7  = +7.051706989348171774536e-13 */
+        .quad 0xBD48CFAE64C352B3   /* P8  = -1.762945685274427023683e-13 */
+        .quad 0x3D269EAE08690D52   /* P9  = +4.018091287355461204663e-14 */
+        .quad 0xBD0216CBEAFFF5AA   /* P10 = -8.033151495672990022322e-15 */
+        .quad 0xC029000000000000   /* B = -12.5      */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C8ACF1392B106D3   /* PL0 = +4.650601502940921454330e-17 */
+        .quad 0x3FEFFFFFFFFF7BBD   /* PH0 = +9.999999999962408958609e-01 */
+        .quad 0x3DA088529889B316   /* P1  = +7.518115268189742464885e-12 */
+        .quad 0xBDA088529887F4C4   /* P2  = -7.518115268005149164680e-12 */
+        .quad 0x3D960B18BF1DF711   /* P3  = +5.012076679213679703380e-12 */
+        .quad 0xBD860B18BFD99A48   /* P4  = -2.506038344573564868987e-12 */
+        .quad 0x3D71A27E7CA64143   /* P5  = +1.002419056539285288454e-12 */
+        .quad 0xBD5783530EA76D91   /* P6  = -3.341396294294381580191e-13 */
+        .quad 0x3D3ADCC75CBD2A03   /* P7  = +9.543447641637910477850e-14 */
+        .quad 0xBD1ADCDA46BE5F17   /* P8  = -2.385887543769010971872e-14 */
+        .quad 0x3CF87D77650BE5B8   /* P9  = +5.437895260471143131391e-15 */
+        .quad 0xBCD395AE6E74C6D2   /* P10 = -1.087168847335561258239e-15 */
+        .quad 0xC02B000000000000   /* B = -13.5      */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C97A8A295292858   /* PL0 = +8.208271151146829171896e-17 */
+        .quad 0x3FEFFFFFFFFFEE19   /* PH0 = +9.999999999994911847878e-01 */
+        .quad 0x3D71E642BB008F95   /* P1  = +1.017466259229268282255e-12 */
+        .quad 0xBD71E642BAFEEC54   /* P2  = -1.017466259207593392022e-12 */
+        .quad 0x3D67DDAE41647741   /* P3  = +6.783108169938233581038e-13 */
+        .quad 0xBD57DDAE4230F34B   /* P4  = -3.391554091734942426856e-13 */
+        .quad 0x3D4317C33FAE2536   /* P5  = +1.356626669455791324801e-13 */
+        .quad 0xBD2975040D3E26B9   /* P6  = -4.522088139411435138867e-14 */
+        .quad 0x3D0D155DCD0F0AFB   /* P7  = +1.291565189902030307333e-14 */
+        .quad 0xBCED157247832B20   /* P8  = -3.228947666403019234175e-15 */
+        .quad 0x3CCA83D70F607C28   /* P9  = +7.359390959466796619024e-16 */
+        .quad 0xBCA5343952C1E19E   /* P10 = -1.471323041436694087188e-16 */
+        .quad 0xC02D000000000000   /* B = -14.5      */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C9B7876CBC5306E   /* PL0 = +9.530765996816607711732e-17 */
+        .quad 0x3FEFFFFFFFFFFD93   /* PH0 = +9.999999999999310551502e-01 */
+        .quad 0x3D436121E2640D76   /* P1  = +1.376990843765503869546e-13 */
+        .quad 0xBD436121E26250EA   /* P2  = -1.376990843736775811281e-13 */
+        .quad 0x3D39D6D7CA259186   /* P3  = +9.179938654047876451320e-14 */
+        .quad 0xBD29D6D7CB0327CE   /* P4  = -4.589969336188563660531e-14 */
+        .quad 0x3D14ABE4DC31244A   /* P5  = +1.835994545584345768382e-14 */
+        .quad 0xBCFB8FDB82AB6BB7   /* P6  = -6.119980791767901275443e-15 */
+        .quad 0x3CDF7CF757491B60   /* P7  = +1.747943407988343076526e-15 */
+        .quad 0xBCBF7D0D833640FB   /* P8  = -4.369905470133249448357e-16 */
+        .quad 0x3C9CB512F6BDC754   /* P9  = +9.959852600692493655511e-17 */
+        .quad 0xBC76F50AB1B0E9BA   /* P10 = -1.991219205936492089091e-17 */
+        .quad 0xC02F000000000000   /* B = -15.5      */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6FFE15D5F78543   /* PL0 = +1.387454417328248962819e-17 */
+        .quad 0x3FEFFFFFFFFFFFE1   /* PH0 = +9.999999999999965583086e-01 */
+        .quad 0x3CFEE00288B99C26   /* P1  = +6.855635762864742358597e-15 */
+        .quad 0xBCFEE0027D060EE2   /* P2  = -6.855635607998342735403e-15 */
+        .quad 0x3CF4954AA23148A2   /* P3  = +4.570381865813341696777e-15 */
+        .quad 0xBCE4954B5DAD3010   /* P4  = -2.285192173571711474199e-15 */
+        .quad 0x3CD07883DD8793BD   /* P5  = +9.143109661358222028007e-16 */
+        .quad 0xBCB5F5F4BB87ADCF   /* P6  = -3.047668447080103869032e-16 */
+        .quad 0x3C98F1A905097685   /* P7  = +8.654183371862458774513e-17 */
+        .quad 0xBC78F2D585007222   /* P8  = -2.163943551222030413627e-17 */
+        .quad 0x3C58A37CC5082B5F   /* P9  = +5.342649626494471588064e-18 */
+        .quad 0xBC33AE7917F94D17   /* P10 = -1.066938163384541013918e-18 */
+        .quad 0xC031000000000000   /* B = -17        */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C91BF1D80474F0F   /* PL0 = +6.157069264461989135096e-17 */
+        .quad 0x3FEFFFFFFFFFFFFE   /* PH0 = +9.999999999999997779554e-01 */
+        .quad 0x3CB72071400E6275   /* P1  = +3.209478247225075961360e-16 */
+        .quad 0xBCB72071400A9F37   /* P2  = -3.209478247103497434502e-16 */
+        .quad 0x3CAED5EC39A77629   /* P3  = +2.139652050028423711308e-16 */
+        .quad 0xBC9ED5EC3B530600   /* P4  = -1.069826028468029104719e-16 */
+        .quad 0x3C88AB2BFED159DE   /* P5  = +4.279326904335078988705e-17 */
+        .quad 0xBC70721D1220B3FC   /* P6  = -1.426441958074916244382e-17 */
+        .quad 0x3C52C96049721FB8   /* P7  = +4.073700029965821523731e-18 */
+        .quad 0xBC32C971215735DC   /* P8  = -1.018438939975201710113e-18 */
+        .quad 0x3C112EF658AB41A9   /* P9  = +2.328791246104218830028e-19 */
+        .quad 0xBBEB7B598C6AD3DE   /* P10 = -4.655603964908654142787e-20 */
+        .quad 0xC03287E0C98F84E5   /* B = -18.530774 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* PL0 = +0.000000000000000000000e-01 */
+        .quad 0x3FF0000000000000   /* PH0 = +1.000000000000000000000e+00 */
+        .quad 0x0000000000000000   /* P1  = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* P2  = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* P3  = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* P4  = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* P5  = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* P6  = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* P7  = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* P8  = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* P9  = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* P10 = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* B = +0        */
+        .quad 0x0000000000000000   /* A = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .align 16
+        .quad 0x8000000000000000, 0x8000000000000000   /* _dbSignMask       */
+        .align 16
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff   /* _dbAbsMask        */
+        .align 16
+        .long 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000           /* _iExpMantMask     */
+        .align 16
+        .long 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000           /* _iExpMask         */
+        .align 16
+        .long 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000           /* _iMinIdxOfsMask   */
+        .align 16
+        .long 0x00760000, 0x00760000, 0x00760000, 0x00760000           /* _iMaxIdxMask      */
+        .align 16
+        .type	__svml_dtanh_data_internal,@object
+        .size	__svml_dtanh_data_internal,.-__svml_dtanh_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core-sse.S
new file mode 100644
index 0000000000..80e85c47ec
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized tanh, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN4v_tanh _ZGVdN4v_tanh_sse_wrapper
+#include "../svml_d_tanh4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core.c
new file mode 100644
index 0000000000..a26e62052b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized tanh, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN4v_tanh
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN4v_tanh, __GI__ZGVdN4v_tanh, __redirect__ZGVdN4v_tanh)
+  __attribute__ ((visibility ("hidden")));
+#endif
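
For context, the dispatch pattern used by these svml_*_core.c files can be sketched
with the generic GNU IFUNC mechanism.  The sketch below is illustrative only and does
not use the glibc-internal ifunc-mathvec-avx2.h helpers; the names tanh4, tanh4_avx2
and tanh4_sse are hypothetical stand-ins for the _ZGVdN4v_tanh variants.

    /* Minimal IFUNC dispatch sketch (hypothetical names, not glibc internals).  */
    void tanh4_avx2 (const double *in, double *out);   /* AVX2 kernel (assumed) */
    void tanh4_sse (const double *in, double *out);    /* SSE fallback (assumed) */

    typedef void (*tanh4_fn) (const double *, double *);

    static tanh4_fn
    tanh4_resolver (void)
    {
      __builtin_cpu_init ();                 /* make CPU feature bits available */
      if (__builtin_cpu_supports ("avx2"))
        return tanh4_avx2;                   /* prefer the AVX2 implementation */
      return tanh4_sse;                      /* otherwise fall back to SSE */
    }

    /* The exported symbol is bound to the resolver's choice at load time.  */
    void tanh4 (const double *in, double *out)
      __attribute__ ((ifunc ("tanh4_resolver")));
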
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core_avx2.S
new file mode 100644
index 0000000000..53dda241e4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core_avx2.S
@@ -0,0 +1,1279 @@
+/* Function tanh vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   NOTE: Since the hyperbolic tangent function is odd
+ *         (tanh(x) = -tanh(-x)), the algorithm below deals with the absolute
+ *         value of the argument |x|: tanh(x) = sign(x) * tanh(|x|)
+ *
+ *   We use a table lookup method to compute tanh(|x|).
+ *   The basic idea is to split the input range into a number of subintervals
+ *   and to approximate tanh(.) with a polynomial on each of them.
+ *
+ *   IEEE SPECIAL CONDITIONS:
+ *   x = [+,-]0, r = [+,-]0
+ *   x = +Inf,   r = +1
+ *   x = -Inf,   r = -1
+ *   x = QNaN,   r = QNaN
+ *   x = SNaN,   r = QNaN
+ *
+ *
+ *   ALGORITHM DETAILS
+ *   We handle special values in a callout function, aside from main path
+ *   computations. "Special" for this algorithm are:
+ *   INF, NAN, |x| > HUGE_THRESHOLD
+ *
+ *
+ *   Main path computations are organized as follows:
+ *   Concretely, we split the interval [0, SATURATION_THRESHOLD)
+ *   into a number of subintervals.  On each subinterval we approximate
+ *   tanh(.) with a minimax polynomial of pre-defined degree.  The polynomial
+ *   coefficients are computed beforehand and stored in a table.  We also use
+ *
+ *       y := |x| + B,
+ *
+ *   where B depends on the subinterval and is used to bring the argument
+ *   closer to zero.
+ *   We also add a large fake interval [SATURATION_THRESHOLD, HUGE_THRESHOLD]
+ *   whose stored coefficients are 1.0 + 0.0*y + 0.0*y^2 ... - this preserves
+ *   the main path computation logic while returning 1.0 for all arguments
+ *   in that range.
+ *
+ *   Hence the reconstruction looks as follows:
+ *   we extract the polynomial and range reduction coefficients
+ *        (Pj and B) corresponding to the subinterval to which |x| belongs,
+ *        and return
+ *
+ *       r := sign(x) * (P0 + P1 * y + ... + Pn * y^n)
+ *
+ *   NOTE: we use a multiprecision technique to multiply and sum the first
+ *         K terms of the polynomial.  So Pj, j = 0..K are each stored in the
+ *         table as a pair of target precision numbers (Pj and PLj) to
+ *         achieve wider than target precision.
+ *
+ *
+ */
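
To make the reconstruction above concrete, here is a rough scalar C sketch of the
main-path evaluation.  The struct layout and helper names are hypothetical (the
assembly below indexes __svml_dtanh_data_internal by the exponent/mantissa bits of
|x| and evaluates the polynomial with FMAs); it only illustrates the y := |x| + B
shift and the Horner sum of P0..P10.

    #include <math.h>

    /* One table row per subinterval (hypothetical mirror of the layout below).  */
    struct tanh_row
    {
      double pl0, ph0;   /* P0 split into low/high parts */
      double p[10];      /* P1 .. P10 */
      double b, a;       /* range-reduction shift B and scale A (A is +1 here) */
      double pad[2];     /* alignment */
    };

    static double
    tanh_eval (double x, const struct tanh_row *row)
    {
      double y = fabs (x) + row->b;          /* y := |x| + B */
      double r = row->p[9];                  /* start Horner at P10 */
      for (int i = 8; i >= 0; i--)
        r = r * y + row->p[i];               /* accumulate P10 .. P1 */
      r = r * y + (row->ph0 + row->pl0);     /* add P0 = PH0 + PL0 */
      return copysign (r, x);                /* tanh(x) = sign(x) * tanh(|x|) */
    }
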
+
+/* Offsets for data table __svml_dtanh_data_internal
+ */
+#define _dbP                          	0
+#define _dbSignMask                   	7680
+#define _dbAbsMask                    	7712
+#define _iExpMantMask                 	7744
+#define _iExpMask                     	7776
+#define _iMinIdxOfsMask               	7808
+#define _iMaxIdxMask                  	7840
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN4v_tanh_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+        lea       _dbP+96+__svml_dtanh_data_internal(%rip), %r8
+        vmovupd   %ymm0, (%rsp)
+
+/* if VMIN, VMAX is defined for I type */
+        vpxor     %xmm11, %xmm11, %xmm11
+
+/*  Constant loading  */
+        vmovups   _iMaxIdxMask+__svml_dtanh_data_internal(%rip), %xmm8
+        vandpd    _dbAbsMask+__svml_dtanh_data_internal(%rip), %ymm0, %ymm1
+        vandpd    _dbSignMask+__svml_dtanh_data_internal(%rip), %ymm0, %ymm2
+        vextractf128 $1, %ymm0, %xmm15
+        vshufps   $221, %xmm15, %xmm0, %xmm14
+
+/* Here huge arguments, INFs and NaNs are filtered out to the callout path. */
+        vpand     _iExpMantMask+__svml_dtanh_data_internal(%rip), %xmm14, %xmm12
+        vpsubd    _iMinIdxOfsMask+__svml_dtanh_data_internal(%rip), %xmm12, %xmm9
+        vpcmpgtd  %xmm11, %xmm9, %xmm10
+        vpcmpgtd  %xmm8, %xmm9, %xmm0
+        vpand     %xmm10, %xmm9, %xmm7
+        blendvps  %xmm0, %xmm8, %xmm7
+
+/*
+ * VSHRIMM( I, iIndex, = iIndex, (17 - 4) );
+ * VGATHER_MATRIX( L2D, p, TAB._dbP, iIndex, 0, T_ITEM_SIZE, T_ITEM_GRAN, 13, 0, 0 );
+ */
+        vpsrld    $10, %xmm7, %xmm6
+        vmovd     %xmm6, %edx
+        vpcmpgtd  _iExpMask+__svml_dtanh_data_internal(%rip), %xmm12, %xmm13
+        vmovmskps %xmm13, %eax
+        vpextrd   $1, %xmm6, %ecx
+        movslq    %edx, %rdx
+        movslq    %ecx, %rcx
+        vpextrd   $2, %xmm6, %esi
+        vpextrd   $3, %xmm6, %edi
+        movslq    %esi, %rsi
+        movslq    %edi, %rdi
+        vmovupd   -96(%rdx,%r8), %xmm3
+        vmovupd   -96(%rcx,%r8), %xmm4
+        vmovupd   -80(%rcx,%r8), %xmm13
+        vmovupd   -64(%rcx,%r8), %xmm9
+        vmovupd   -80(%rdx,%r8), %xmm14
+        vmovupd   -64(%rdx,%r8), %xmm10
+        vmovupd   -48(%rdx,%r8), %xmm6
+        vinsertf128 $1, -96(%rsi,%r8), %ymm3, %ymm0
+        vinsertf128 $1, -96(%rdi,%r8), %ymm4, %ymm15
+        vmovupd   -48(%rcx,%r8), %xmm3
+        vunpckhpd %ymm15, %ymm0, %ymm0
+        vinsertf128 $1, -80(%rsi,%r8), %ymm14, %ymm12
+        vinsertf128 $1, -64(%rsi,%r8), %ymm10, %ymm8
+        vinsertf128 $1, -80(%rdi,%r8), %ymm13, %ymm11
+        vinsertf128 $1, -64(%rdi,%r8), %ymm9, %ymm7
+        vunpcklpd %ymm11, %ymm12, %ymm15
+        vunpckhpd %ymm11, %ymm12, %ymm14
+        vunpcklpd %ymm7, %ymm8, %ymm13
+        vunpckhpd %ymm7, %ymm8, %ymm12
+        vmovupd   -32(%rdx,%r8), %xmm9
+        vmovupd   -32(%rcx,%r8), %xmm8
+        vinsertf128 $1, -48(%rsi,%r8), %ymm6, %ymm4
+        vinsertf128 $1, -48(%rdi,%r8), %ymm3, %ymm5
+        vunpcklpd %ymm5, %ymm4, %ymm11
+        vunpckhpd %ymm5, %ymm4, %ymm10
+        vmovupd   -16(%rdx,%r8), %xmm3
+        vmovupd   -16(%rcx,%r8), %xmm4
+        vinsertf128 $1, -32(%rsi,%r8), %ymm9, %ymm7
+        vinsertf128 $1, -32(%rdi,%r8), %ymm8, %ymm6
+        vunpcklpd %ymm6, %ymm7, %ymm9
+        vunpckhpd %ymm6, %ymm7, %ymm8
+        vinsertf128 $1, -16(%rsi,%r8), %ymm3, %ymm5
+        vinsertf128 $1, -16(%rdi,%r8), %ymm4, %ymm6
+        vunpcklpd %ymm6, %ymm5, %ymm7
+        vunpckhpd %ymm6, %ymm5, %ymm6
+        vmovupd   (%rdx,%r8), %xmm3
+        vmovupd   (%rcx,%r8), %xmm5
+        vinsertf128 $1, (%rsi,%r8), %ymm3, %ymm4
+        vinsertf128 $1, (%rdi,%r8), %ymm5, %ymm5
+        vunpcklpd %ymm5, %ymm4, %ymm3
+        vaddpd    %ymm3, %ymm1, %ymm1
+        vfmadd213pd %ymm7, %ymm1, %ymm6
+        vfmadd213pd %ymm8, %ymm1, %ymm6
+        vfmadd213pd %ymm9, %ymm1, %ymm6
+        vfmadd213pd %ymm10, %ymm1, %ymm6
+        vfmadd213pd %ymm11, %ymm1, %ymm6
+        vfmadd213pd %ymm12, %ymm1, %ymm6
+        vfmadd213pd %ymm13, %ymm1, %ymm6
+        vfmadd213pd %ymm14, %ymm1, %ymm6
+        vfmadd213pd %ymm15, %ymm1, %ymm6
+        vfmadd213pd %ymm0, %ymm1, %ymm6
+        vorpd     %ymm2, %ymm6, %ymm0
+        testl     %eax, %eax
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovupd   (%rsp), %ymm1
+        vmovupd   %ymm0, 64(%rsp)
+        vmovupd   %ymm1, 32(%rsp)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0
+
+        xorl      %edx, %edx
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovupd   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      tanh@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 64(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN4v_tanh_avx2)
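
In C terms, the special-value handling above (the L(SPECIAL_VALUES_BRANCH) /
L(SCALAR_MATH_CALL) path) amounts to the following per-lane fallback; the buffer
and mask names are illustrative.

    #include <math.h>

    /* For every lane whose range-mask bit is set (INF/NaN/huge input), call the
       scalar tanh and patch its result into the vector output.  */
    static void
    tanh4_special_cases (const double src[4], double dst[4], unsigned int mask)
    {
      for (int lane = 0; lane < 4; lane++)
        if (mask & (1u << lane))
          dst[lane] = tanh (src[lane]);
    }
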
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_dtanh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(32)) VUINT32 _dbP[60*16][2];
+        __declspec(align(32)) VUINT32 _dbSignMask[4][2];
+        __declspec(align(32)) VUINT32 _dbAbsMask[4][2];
+        __declspec(align(32)) VUINT32 _iExpMantMask[8][1];
+        __declspec(align(32)) VUINT32 _iExpMask[8][1];
+        __declspec(align(32)) VUINT32 _iMinIdxOfsMask[8][1];
+        __declspec(align(32)) VUINT32 _iMaxIdxMask[8][1];
+} __svml_dtanh_data_internal;
+#endif
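
As a cross-check, the offset macros defined at the top of this file follow directly
from the layout documented above (60 table rows of 16 quadwords each, followed by
six 32-byte mask fields):

    _dbP             at 0,    size 60 * 16 * 8 = 7680 bytes
    _dbSignMask      at 7680  (= 0 + 7680)
    _dbAbsMask       at 7712  (= 7680 + 32)
    _iExpMantMask    at 7744  (= 7712 + 32)
    _iExpMask        at 7776  (= 7744 + 32)
    _iMinIdxOfsMask  at 7808  (= 7776 + 32)
    _iMaxIdxMask     at 7840  (= 7808 + 32)
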
+__svml_dtanh_data_internal:
+        /* Polynomial coefficients */
+        .quad 0x0000000000000000   /* PL0 = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* PH0 = +0.000000000000000000000e-01 */
+        .quad 0x3FF0000000000000   /* P1  = +1.000000000000000014103e+00 */
+        .quad 0xBD197DEAD79668D3   /* P2  = -2.264132406596103056796e-14 */
+        .quad 0xBFD555555553AF3C   /* P3  = -3.333333333273349741024e-01 */
+        .quad 0xBE052F7CCA134846   /* P4  = -6.165791385711493738399e-10 */
+        .quad 0x3FC11111563849D6   /* P5  = +1.333333655353061107201e-01 */
+        .quad 0xBEB038623673FFB2   /* P6  = -9.668021563879858950855e-07 */
+        .quad 0xBFAB9F685E64022E   /* P7  = -5.395055916051593179252e-02 */
+        .quad 0xBF2A54E2B28F2207   /* P8  = -2.008940439550829012647e-04 */
+        .quad 0x3F97CFB9328A230E   /* P9  = +2.325333949059698582189e-02 */
+        .quad 0xBF75CA6D61723E02   /* P10 = -5.320002811586290441790e-03 */
+        .quad 0x0000000000000000   /* B = +0        */
+        .quad 0x3FF0000000000000   /* A = +1.0      */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C3708A564FAD29A   /* PL0 = +1.248663375337163807466e-18 */
+        .quad 0x3FC0E6973998DA48   /* PH0 = +1.320370703922029154143e-01 */
+        .quad 0x3FEF712EB25C0888   /* P1  = +9.825662120422444519229e-01 */
+        .quad 0xBFC09B296F7C1EA9   /* P2  = -1.297351641044220078331e-01 */
+        .quad 0xBFD3DD77541EDDA7   /* P3  = -3.103922196855485849143e-01 */
+        .quad 0x3FB58FFCF4309615   /* P4  = +8.422833406128689275566e-02 */
+        .quad 0x3FBD3ABE845DCF49   /* P5  = +1.141776154670967208833e-01 */
+        .quad 0xBFA791DF538C37FA   /* P6  = -4.603479285115947936529e-02 */
+        .quad 0xBFA4F872F69CD6E8   /* P7  = -4.095801601799370195284e-02 */
+        .quad 0x3F9772E49EF6412B   /* P8  = +2.289921970583567527179e-02 */
+        .quad 0x3F8CBC0807393909   /* P9  = +1.403051635784581776625e-02 */
+        .quad 0xBF85F06A30F93319   /* P10 = -1.071246110873285040939e-02 */
+        .quad 0xBFC1000000000000   /* B = -.132813 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6004EE5739DEAC   /* PL0 = +6.947247374112211856530e-18 */
+        .quad 0x3FC2DC968E6E0D62   /* PH0 = +1.473568149050193398786e-01 */
+        .quad 0x3FEF4E1E606D96DF   /* P1  = +9.782859691010478680677e-01 */
+        .quad 0xBFC273BD70994AB9   /* P2  = -1.441571044730005866646e-01 */
+        .quad 0xBFD382B548270D2C   /* P3  = -3.048527912726111386771e-01 */
+        .quad 0x3FB7CD2D582A6B29   /* P4  = +9.297450449450351894400e-02 */
+        .quad 0x3FBC1278CCCBF0DB   /* P5  = +1.096568584434324642303e-01 */
+        .quad 0xBFA9C7F5115B86A1   /* P6  = -5.035367810138536095866e-02 */
+        .quad 0xBFA371C21BAF618E   /* P7  = -3.797728145554222910481e-02 */
+        .quad 0x3F9958943F68417E   /* P8  = +2.475196492201935923783e-02 */
+        .quad 0x3F8930D5CFFD4152   /* P9  = +1.230017701132682667572e-02 */
+        .quad 0xBF875CF7ADD31B76   /* P10 = -1.140779017658897660092e-02 */
+        .quad 0xBFC3000000000000   /* B = -.148438 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C7EABE24E052A1F   /* PL0 = +2.660321779421749543501e-17 */
+        .quad 0x3FC4D04783618C71   /* PH0 = +1.626061812886266111366e-01 */
+        .quad 0x3FEF2765AF97A4B3   /* P1  = +9.735592298067302883212e-01 */
+        .quad 0xBFC443654205FEA5   /* P2  = -1.583067486171689074207e-01 */
+        .quad 0xBFD31F2E208A5B97   /* P3  = -2.987780874040536844467e-01 */
+        .quad 0x3FB9F235BD339878   /* P4  = +1.013520800512156573576e-01 */
+        .quad 0x3FBAD0B0DFCCA141   /* P5  = +1.047468706498238100104e-01 */
+        .quad 0xBFABD1B9600E608E   /* P6  = -5.433444306908184548967e-02 */
+        .quad 0xBFA1CEBEAF07DB58   /* P7  = -3.478046309094534453598e-02 */
+        .quad 0x3F9AFC9FB1D8EFD2   /* P8  = +2.635430834764902126383e-02 */
+        .quad 0x3F8573444F1AB502   /* P9  = +1.047376028449287564018e-02 */
+        .quad 0xBF8874FBC8F24406   /* P10 = -1.194187838544459322219e-02 */
+        .quad 0xBFC5000000000000   /* B = -.164063 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C7FB199D361A790   /* PL0 = +2.748994907060158996213e-17 */
+        .quad 0x3FC6C170259E21F7   /* PH0 = +1.777782615356639783766e-01 */
+        .quad 0x3FEEFD17479F7C65   /* P1  = +9.683948897253570478266e-01 */
+        .quad 0xBFC609530FE4DF8D   /* P2  = -1.721595599753950294577e-01 */
+        .quad 0xBFD2B3465D71B4DE   /* P3  = -2.921920692959484052676e-01 */
+        .quad 0x3FBBFD2D34AC509B   /* P4  = +1.093319181057403192166e-01 */
+        .quad 0x3FB9778C3C16A0FE   /* P5  = +9.948040453912551395183e-02 */
+        .quad 0xBFADAC4D9E63C665   /* P6  = -5.795519407719210697372e-02 */
+        .quad 0xBFA0139CCAD02D60   /* P7  = -3.139963126894929339124e-02 */
+        .quad 0x3F9C5BF43BA6F19D   /* P8  = +2.769452680671379432854e-02 */
+        .quad 0x3F8190B703350341   /* P9  = +8.576803002712575184772e-03 */
+        .quad 0xBF8936606782858A   /* P10 = -1.231074634444230850234e-02 */
+        .quad 0xBFC7000000000000   /* B = -.179688 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6A917CA3624D50   /* PL0 = +1.152216693509785660691e-17 */
+        .quad 0x3FC8AFD7B974FABB   /* PH0 = +1.928662925292508878439e-01 */
+        .quad 0x3FEECF47624A5D03   /* P1  = +9.628025932060214187231e-01 */
+        .quad 0xBFC7C4C2CB4FDE4D   /* P2  = -1.856921665891938814679e-01 */
+        .quad 0xBFD23F69CB2C1F9D   /* P3  = -2.851204380135586155453e-01 */
+        .quad 0x3FBDEC5703A03814   /* P4  = +1.168875106670557712458e-01 */
+        .quad 0x3FB8095003D0CF15   /* P5  = +9.389209836154706616487e-02 */
+        .quad 0xBFAF554B47B10CBB   /* P6  = -6.119761705533607365968e-02 */
+        .quad 0xBF9C89743FE7BC1B   /* P7  = -2.786809577986213853937e-02 */
+        .quad 0x3F9D74725B746E7C   /* P8  = +2.876452143855921824991e-02 */
+        .quad 0x3F7B2D8AFB70B88C   /* P9  = +6.635229968237631511880e-03 */
+        .quad 0xBF89A0A2883EF6CB   /* P10 = -1.251341799058582545252e-02 */
+        .quad 0xBFC9000000000000   /* B = -.195313 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C7608279E8609CB   /* PL0 = +1.910958764623660748269e-17 */
+        .quad 0x3FCA9B46D2DDC5E3   /* PH0 = +2.078636674519166172015e-01 */
+        .quad 0x3FEE9E0BB72A01A1   /* P1  = +9.567926957534390123919e-01 */
+        .quad 0xBFC974FAD10C5330   /* P2  = -1.988824387305156976885e-01 */
+        .quad 0xBFD1C40ACCBA4044   /* P3  = -2.775904654781735703430e-01 */
+        .quad 0x3FBFBE24E2987853   /* P4  = +1.239951184474830487522e-01 */
+        .quad 0x3FB6885B4345E47F   /* P5  = +8.801813499839460539687e-02 */
+        .quad 0xBFB06563D5670584   /* P6  = -6.404708824176991770896e-02 */
+        .quad 0xBF98CD1D620DF6E2   /* P7  = -2.421995078065365147772e-02 */
+        .quad 0x3F9E44EF3E844D21   /* P8  = +2.955983943054463683119e-02 */
+        .quad 0x3F7325FA0148CAAE   /* P9  = +4.674889165971292322643e-03 */
+        .quad 0xBF89B4C8556C2D92   /* P10 = -1.255184660614964011319e-02 */
+        .quad 0xBFCB000000000000   /* B = -.210938 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6F19DAA20F51D5   /* PL0 = +1.348790537832000351176e-17 */
+        .quad 0x3FCC83876CA98E15   /* PH0 = +2.227639465883021474557e-01 */
+        .quad 0x3FEE697B662D07CD   /* P1  = +9.503762241004040620296e-01 */
+        .quad 0xBFCB194C7ED76ACF   /* P2  = -2.117095584242946953999e-01 */
+        .quad 0xBFD141A19E419762   /* P3  = -2.696308179350720680191e-01 */
+        .quad 0x3FC0B89C64BC7B98   /* P4  = +1.306338779331468503007e-01 */
+        .quad 0x3FB4F721150BBFC5   /* P5  = +8.189589275184434216748e-02 */
+        .quad 0xBFB105AAFAB87898   /* P6  = -6.649273511036069461061e-02 */
+        .quad 0xBF94FB3B31248C01   /* P7  = -2.048962104266749732921e-02 */
+        .quad 0x3F9ECD31E588709C   /* P8  = +3.007963145692880855964e-02 */
+        .quad 0x3F664A91A335C105   /* P9  = +2.721104095762541127495e-03 */
+        .quad 0xBF89754E32E1E26E   /* P10 = -1.243077366619723806134e-02 */
+        .quad 0xBFCD000000000000   /* B = -.226563 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6AC6C889D8111D   /* PL0 = +1.161245469312620769170e-17 */
+        .quad 0x3FCE6864FE55A3D0   /* PH0 = +2.375608674877001114112e-01 */
+        .quad 0x3FEE31AEE116B82B   /* P1  = +9.435648342384913826391e-01 */
+        .quad 0xBFCCB114B69E808B   /* P2  = -2.241540805525839833707e-01 */
+        .quad 0xBFD0B8AB913BA99D   /* P3  = -2.612713735858507980441e-01 */
+        .quad 0x3FC1823322BED48A   /* P4  = +1.367858810096190233514e-01 */
+        .quad 0x3FB35822B7929893   /* P5  = +7.556359273675842651653e-02 */
+        .quad 0xBFB18B03CC78D2DA   /* P6  = -6.852744810096158580830e-02 */
+        .quad 0xBF911CCC3C8D5E5D   /* P7  = -1.671141738492420009734e-02 */
+        .quad 0x3F9F0DEC2D99B12F   /* P8  = +3.032654789278515819797e-02 */
+        .quad 0x3F4A28398B4EBD98   /* P9  = +7.982521989244205404918e-04 */
+        .quad 0xBF88E60CB2FAB9A4   /* P10 = -1.215753480150000985458e-02 */
+        .quad 0xBFCF000000000000   /* B = -.242188 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C89D2B6774FB61D   /* PL0 = +4.479593208720169247958e-17 */
+        .quad 0x3FD09C744F539BE4   /* PH0 = +2.595492148088267558848e-01 */
+        .quad 0x3FEDD823B0400D42   /* P1  = +9.326342050921214825882e-01 */
+        .quad 0xBFCEFBF7FF305FCC   /* P2  = -2.420644756355144687086e-01 */
+        .quad 0xBFCFC01DC4F24A41   /* P3  = -2.480504237797323303990e-01 */
+        .quad 0x3FC291A2C26D5548   /* P4  = +1.450694512701977626753e-01 */
+        .quad 0x3FB0D562E672D188   /* P5  = +6.575601698097532991976e-02 */
+        .quad 0xBFB2201ECC119E06   /* P6  = -7.080261690281738261872e-02 */
+        .quad 0xBF8695D50F778D31   /* P7  = -1.102796987010509974642e-02 */
+        .quad 0x3F9EEC8CFBC031A0   /* P8  = +3.019924437107734972427e-02 */
+        .quad 0xBF6030F0A4D3660A   /* P9  = -1.976461417694923328722e-03 */
+        .quad 0xBF87845288A4AEF5   /* P10 = -1.148285369398347838494e-02 */
+        .quad 0xBFD1000000000000   /* B = -.265625 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C8B6AAB614D1C8D   /* PL0 = +4.756035418366735312727e-17 */
+        .quad 0x3FD275F7E1CF7F63   /* PH0 = +2.884502129727392616410e-01 */
+        .quad 0x3FED56658F74C9CC   /* P1  = +9.167964746359813351341e-01 */
+        .quad 0xBFD0ECC045EBD596   /* P2  = -2.644501383614054083635e-01 */
+        .quad 0xBFCD5A4BDE179180   /* P3  = -2.293181261476426808811e-01 */
+        .quad 0x3FC3C00047D34767   /* P4  = +1.542969084462655120552e-01 */
+        .quad 0x3FAAC7CE84FD609F   /* P5  = +5.230565427217581251974e-02 */
+        .quad 0xBFB288948D2E8B43   /* P6  = -7.239654967137902384931e-02 */
+        .quad 0xBF6D6605AAD5A1C0   /* P7  = -3.588687008847041164896e-03 */
+        .quad 0x3F9DDB0790848E97   /* P8  = +2.915584392134337382866e-02 */
+        .quad 0xBF75FDE291BAD5B4   /* P9  = -5.369076763306269573660e-03 */
+        .quad 0xBF84CEA5C52E0A78   /* P10 = -1.015977390284671071888e-02 */
+        .quad 0xBFD3000000000000   /* B = -.296875 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C7139A81C8A6ECF   /* PL0 = +1.494049799478574591322e-17 */
+        .quad 0x3FD4470650036407   /* PH0 = +3.168350011233659890841e-01 */
+        .quad 0x3FECC9A69DFDDD48   /* P1  = +8.996155820631566629678e-01 */
+        .quad 0xBFD23DED3A37A09F   /* P2  = -2.850297039535778028925e-01 */
+        .quad 0xBFCAD302395D51C1   /* P3  = -2.095644741153943890185e-01 */
+        .quad 0x3FC4A8FE3F309C22   /* P4  = +1.614072617096278705115e-01 */
+        .quad 0x3FA3D161188AA436   /* P5  = +3.870681213931741151586e-02 */
+        .quad 0xBFB288CFE5494E98   /* P6  = -7.240008685885823969403e-02 */
+        .quad 0x3F6C7903EED8D334   /* P7  = +3.475673371918475361081e-03 */
+        .quad 0x3F9BE023CDFB02F6   /* P8  = +2.722221321778569498033e-02 */
+        .quad 0xBF80F8296F2C3A95   /* P9  = -8.285831170295390358336e-03 */
+        .quad 0xBF8152DF4790049B   /* P10 = -8.458847400108650973189e-03 */
+        .quad 0xBFD5000000000000   /* B = -.328125 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C7751FE0FEE8335   /* PL0 = +2.022712113430213599928e-17 */
+        .quad 0x3FD60EF7120502A9   /* PH0 = +3.446633983585721261456e-01 */
+        .quad 0x3FEC32D951E56E6F   /* P1  = +8.812071418319202070776e-01 */
+        .quad 0xBFD370255FC004F8   /* P2  = -3.037198481616338996824e-01 */
+        .quad 0xBFC832F0EBC6BB41   /* P3  = -1.890545989276351359107e-01 */
+        .quad 0x3FC54C99A0FF432F   /* P4  = +1.664001499289269127540e-01 */
+        .quad 0x3F99DAC0CC283C18   /* P5  = +2.524853941036661688369e-02 */
+        .quad 0xBFB227B3896A026D   /* P6  = -7.091829399906553280461e-02 */
+        .quad 0x3F84663364E1FB19   /* P7  = +9.960557476231411602383e-03 */
+        .quad 0x3F9922D70DE07C57   /* P8  = +2.454696676442965935283e-02 */
+        .quad 0xBF85C4A4EB6F86BC   /* P9  = -1.062897532932837635222e-02 */
+        .quad 0xBF7AAB61214FFE17   /* P10 = -6.511096396024671890972e-03 */
+        .quad 0xBFD7000000000000   /* B = -.359375 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3BFE67F266843B2C   /* PL0 = +1.030196791298162288777e-19 */
+        .quad 0x3FD7CD3115FC0F16   /* PH0 = +3.718989100163850869407e-01 */
+        .quad 0x3FEB92F96CCC2C5B   /* P1  = +8.616912007286247079761e-01 */
+        .quad 0xBFD4827320135092   /* P2  = -3.204620183216856200247e-01 */
+        .quad 0xBFC582B15550168A   /* P3  = -1.680509249273891977521e-01 */
+        .quad 0x3FC5AC3B9A2E4C31   /* P4  = +1.693186285816366254244e-01 */
+        .quad 0x3F88FA599FCADAFB   /* P5  = +1.219625491044728129762e-02 */
+        .quad 0xBFB16EC8F5CA169E   /* P6  = -6.809669495313605642174e-02 */
+        .quad 0x3F90140EFC748BBE   /* P7  = +1.570151725639922719844e-02 */
+        .quad 0x3F95CFC49C1A28DC   /* P8  = +2.130038454792147768770e-02 */
+        .quad 0xBF8946ED8B1BF454   /* P9  = -1.234231549050882816697e-02 */
+        .quad 0xBF7239E55C1DD50F   /* P10 = -4.449745117985472755606e-03 */
+        .quad 0xBFD9000000000000   /* B = -.390625 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6412330191189C   /* PL0 = +8.704448096175471149661e-18 */
+        .quad 0x3FD9812B3B03F0A5   /* PH0 = +3.985088421175169703936e-01 */
+        .quad 0x3FEAEB08C3C0E84D   /* P1  = +8.411907027541559254748e-01 */
+        .quad 0xBFD57446B1BC46CF   /* P2  = -3.352219329545790787820e-01 */
+        .quad 0xBFC2CA9ABC0444AD   /* P3  = -1.468079965639267634401e-01 */
+        .quad 0x3FC5CA95F9460D18   /* P4  = +1.702449290424759093710e-01 */
+        .quad 0xBF2C2DAA35DD05C3   /* P5  = -2.149839664813813012186e-04 */
+        .quad 0xBFB069A516EEB75D   /* P6  = -6.411201295733578195472e-02 */
+        .quad 0x3F9512716416FDC7   /* P7  = +2.057816670798986720058e-02 */
+        .quad 0x3F921630CB1319A3   /* P8  = +1.766277541607908852593e-02 */
+        .quad 0xBF8B76DA2EC99526   /* P9  = -1.341028647693549562145e-02 */
+        .quad 0xBF63A97474A161E4   /* P10 = -2.400138332671485493040e-03 */
+        .quad 0xBFDB000000000000   /* B = -.421875 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C89B79F5783381C   /* PL0 = +4.461236087774530799537e-17 */
+        .quad 0x3FDB2A6C993B829D   /* PH0 = +4.244643684778937609003e-01 */
+        .quad 0x3FEA3C0C1FBA328C   /* P1  = +8.198299998926627915155e-01 */
+        .quad 0xBFD6457212F78DE0   /* P2  = -3.479886231636708581604e-01 */
+        .quad 0xBFC0129BDA380A66   /* P3  = -1.255678954622282824818e-01 */
+        .quad 0x3FC5AB77F388FBDE   /* P4  = +1.692953051696965507089e-01 */
+        .quad 0xBF8822F3A6CADB7C   /* P5  = -1.178541519889874597783e-02 */
+        .quad 0xBFAE4A876370A4BD   /* P6  = -5.916236008517603590739e-02 */
+        .quad 0x3F991A89BC3B7710   /* P7  = +2.451529704455085335710e-02 */
+        .quad 0x3F8C4A4328204D4B   /* P8  = +1.381351915555364098800e-02 */
+        .quad 0xBF8C5F921D01EC0B   /* P9  = -1.385416174911393178490e-02 */
+        .quad 0xBF3EE844C5B79FB8   /* P10 = -4.716079617694784908234e-04 */
+        .quad 0xBFDD000000000000   /* B = -.453125 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C73FA437AD7AD87   /* PL0 = +1.732779905745858845932e-17 */
+        .quad 0x3FDCC88C9902CF45   /* PH0 = +4.497405523536495697279e-01 */
+        .quad 0x3FE9870845162D1D   /* P1  = +7.977334355686341748810e-01 */
+        .quad 0xBFD6F62358F73DA8   /* P2  = -3.587730759436120677668e-01 */
+        .quad 0xBFBAC4345D675FE1   /* P3  = -1.045563438450467661101e-01 */
+        .quad 0x3FC5539DA8287019   /* P4  = +1.666142531474868131862e-01 */
+        .quad 0xBF96E3E0DC04A09F   /* P5  = -2.235366194614185212822e-02 */
+        .quad 0xBFAB5EC7147C207D   /* P6  = -5.345747113284546871398e-02 */
+        .quad 0x3F9C24166FFA7A58   /* P7  = +2.748141344511120915667e-02 */
+        .quad 0x3F8451B907819844   /* P8  = +9.921498815128277696693e-03 */
+        .quad 0xBF8C1C6D19191FCB   /* P9  = -1.372609360545586670239e-02 */
+        .quad 0x3F547372DF72E35A   /* P10 = +1.248228245272117756098e-03 */
+        .quad 0xBFDF000000000000   /* B = -.484375 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C848FE06EE49950   /* PL0 = +3.566941590788961528958e-17 */
+        .quad 0x3FDF20211A36475D   /* PH0 = +4.863360172249622803697e-01 */
+        .quad 0x3FE86E67E6B80AC2   /* P1  = +7.634772783497611574659e-01 */
+        .quad 0xBFD7C37C55474D9B   /* P2  = -3.713064987943767913461e-01 */
+        .quad 0xBFB2EBF15F3CB036   /* P3  = -7.391270232318521952684e-02 */
+        .quad 0x3FC4718C8EF6E3AA   /* P4  = +1.597152422016539530950e-01 */
+        .quad 0xBFA277F8394E9B07   /* P5  = -3.607154559658991932071e-02 */
+        .quad 0xBFA680312AB207E3   /* P6  = -4.394677778419955009224e-02 */
+        .quad 0x3F9EDC9A8B57E286   /* P7  = +3.013841128810892143223e-02 */
+        .quad 0x3F71B8C5E648EAF6   /* P8  = +4.326603932492947851719e-03 */
+        .quad 0xBF89DB218356730C   /* P9  = -1.262499029217558458029e-02 */
+        .quad 0x3F6B05728E6EBC8E   /* P10 = +3.298496001171330815865e-03 */
+        .quad 0xBFE1000000000000   /* B = -.53125  */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C8429831EDD94DE   /* PL0 = +3.497576705878673192147e-17 */
+        .quad 0x3FE10AF47E0BF610   /* PH0 = +5.325872861719194162333e-01 */
+        .quad 0x3FE6EC5879F87EEE   /* P1  = +7.163507826080299761242e-01 */
+        .quad 0xBFD86AD001BFE200   /* P2  = -3.815193192563413204129e-01 */
+        .quad 0xBFA239045B661385   /* P3  = -3.559125533778398983564e-02 */
+        .quad 0x3FC2B4572D9CC147   /* P4  = +1.461285565105845078038e-01 */
+        .quad 0xBFA99F4F01740705   /* P5  = -5.004355328311586406115e-02 */
+        .quad 0xBF9F449C484F4879   /* P6  = -3.053516570418721511214e-02 */
+        .quad 0x3F9F5F42169D7DDE   /* P7  = +3.063681853325116830798e-02 */
+        .quad 0xBF6111B1BA632A97   /* P8  = -2.083632588527460989469e-03 */
+        .quad 0xBF84725FBE5B6E61   /* P9  = -9.983776089419639342530e-03 */
+        .quad 0x3F7438A2986CFA9C   /* P10 = +4.936823976832951342488e-03 */
+        .quad 0xBFE3000000000000   /* B = -.59375  */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6BE9160BFB3505   /* PL0 = +1.210424670976053242391e-17 */
+        .quad 0x3FE26D76F73233C7   /* PH0 = +5.758623912857893101247e-01 */
+        .quad 0x3FE56363B5B93937   /* P1  = +6.683825063026124740752e-01 */
+        .quad 0xBFD8A2244B27297E   /* P2  = -3.848963483730115724200e-01 */
+        .quad 0xBF52CA2F101EEF63   /* P3  = -1.146837196286797844817e-03 */
+        .quad 0x3FC081BC342243AD   /* P4  = +1.289592032012739958675e-01 */
+        .quad 0xBFAE38DB4A932344   /* P5  = -5.902753148399722719732e-02 */
+        .quad 0xBF91F814D4AE90C6   /* P6  = -1.754791782481459457885e-02 */
+        .quad 0x3F9D056AE193C4F3   /* P7  = +2.834097863973723355792e-02 */
+        .quad 0xBF7BD0B502D8F3A0   /* P8  = -6.790835451792626336974e-03 */
+        .quad 0xBF7B763F7BB8AE2F   /* P9  = -6.704566938008179114124e-03 */
+        .quad 0x3F76036F42D9AB69   /* P10 = +5.374369252971835729099e-03 */
+        .quad 0xBFE5000000000000   /* B = -.65625  */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C8B64AF0450486E   /* PL0 = +4.751979286662385162741e-17 */
+        .quad 0x3FE3B75F8BCB742D   /* PH0 = +6.161344271055263499548e-01 */
+        .quad 0x3FE3DA23BC12369F   /* P1  = +6.203783677353447780947e-01 */
+        .quad 0xBFD8768FF4B46416   /* P2  = -3.822364701932782367281e-01 */
+        .quad 0x3F9D67CB8AD9CB1A   /* P3  = +2.871625933625941117406e-02 */
+        .quad 0x3FBC168CB7827DF4   /* P4  = +1.097190807363331305006e-01 */
+        .quad 0xBFB03A2B83C9272E   /* P5  = -6.338760344911228324430e-02 */
+        .quad 0xBF789FEB595297DC   /* P6  = -6.011885959344067548074e-03 */
+        .quad 0x3F98BD01B4C335E7   /* P7  = +2.415850320612902513532e-02 */
+        .quad 0xBF83BADC303D6535   /* P8  = -9.633751127398152979976e-03 */
+        .quad 0xBF6C54E7A1C1E3F3   /* P9  = -3.458454519258407989501e-03 */
+        .quad 0x3F7408394B7EF3E7   /* P10 = +4.890655334688332484537e-03 */
+        .quad 0xBFE7000000000000   /* B = -.71875  */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6A48557F6E0D3E   /* PL0 = +1.139824111505584215867e-17 */
+        .quad 0x3FE4E8D895B010DC   /* PH0 = +6.534235881413468227663e-01 */
+        .quad 0x3FE25652FAAF8A73   /* P1  = +5.730376144604875448991e-01 */
+        .quad 0xBFD7F6C3A57C444B   /* P2  = -3.744362941807295084434e-01 */
+        .quad 0x3FAB7866E3F99EBE   /* P3  = +5.365296872042567001598e-02 */
+        .quad 0x3FB6FA1DF47CCD40   /* P4  = +8.975398272450707099784e-02 */
+        .quad 0xBFB05508D3741B8E   /* P5  = -6.379752314033580026840e-02 */
+        .quad 0x3F6C3EFDF7BB279C   /* P6  = +3.448005705512137236209e-03 */
+        .quad 0x3F9372BADD6D3E27   /* P7  = +1.899234749299530050806e-02 */
+        .quad 0xBF860FD5AE65F3DA   /* P8  = -1.077238977881649471165e-02 */
+        .quad 0xBF47266FFB07E628   /* P9  = -7.064863949032872448118e-04 */
+        .quad 0x3F6F9763992C2A05   /* P10 = +3.856367614735181120799e-03 */
+        .quad 0xBFE9000000000000   /* B = -.78125  */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6BB6A2B194E3AB   /* PL0 = +1.201878007209462528697e-17 */
+        .quad 0x3FE602609AAE7C22   /* PH0 = +6.877902051090851731630e-01 */
+        .quad 0x3FE0DCBAFE191C7F   /* P1  = +5.269446337560025312137e-01 */
+        .quad 0xBFD732028428A9FB   /* P2  = -3.624273577321727538225e-01 */
+        .quad 0x3FB2D92389BE065B   /* P3  = +7.362577545975439796588e-02 */
+        .quad 0x3FB1F6A9C8C49993   /* P4  = +7.017003203927733370937e-02 */
+        .quad 0xBFAF47C0B50B56EE   /* P5  = -6.109430513394707378526e-02 */
+        .quad 0x3F85A8EDD1356223   /* P6  = +1.057611269668352068104e-02 */
+        .quad 0x3F8BE05C5CD1B4FA   /* P7  = +1.361152799855823798207e-02 */
+        .quad 0xBF85A0EFE4552F76   /* P8  = -1.056086936537046752272e-02 */
+        .quad 0x3F559F2A6A356194   /* P9  = +1.319686337259627831943e-03 */
+        .quad 0x3F6576F5E989208D   /* P10 = +2.620201394425042596201e-03 */
+        .quad 0xBFEB000000000000   /* B = -.84375  */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C80328BD86C8B74   /* PL0 = +2.809809047161267929701e-17 */
+        .quad 0x3FE704BB1B7FCB81   /* PH0 = +7.193275010198335595035e-01 */
+        .quad 0x3FDEE264AAD6C40C   /* P1  = +4.825679462765613089739e-01 */
+        .quad 0xBFD637493CE659F1   /* P2  = -3.471243948673921548357e-01 */
+        .quad 0x3FB6BE3A3DEE6F4A   /* P3  = +8.884014141079635303208e-02 */
+        .quad 0x3FAA85EB6470AC0F   /* P4  = +5.180297471118688523488e-02 */
+        .quad 0xBFACC0146EA4858D   /* P5  = -5.615295267694895314457e-02 */
+        .quad 0x3F8F8FB683CDDAC5   /* P6  = +1.541082944616557159055e-02 */
+        .quad 0x3F819515DEE2CB91   /* P7  = +8.585139145315585602547e-03 */
+        .quad 0xBF834E45E6AF9EA1   /* P8  = -9.426637747267209169415e-03 */
+        .quad 0x3F65250F197CA56D   /* P9  = +2.581147662472352252568e-03 */
+        .quad 0x3F57A766026D036C   /* P10 = +1.443719500187702367690e-03 */
+        .quad 0xBFED000000000000   /* B = -.90625  */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C716F7EEF7B61AD   /* PL0 = +1.512291215142578135651e-17 */
+        .quad 0x3FE7F0E1A4CD846E   /* PH0 = +7.481544703297353660076e-01 */
+        .quad 0x3FDC2D4CC872DC09   /* P1  = +4.402648885256331012598e-01 */
+        .quad 0xBFD514A99F92ED53   /* P2  = -3.293861444796750250530e-01 */
+        .quad 0x3FB9846A6CF2F337   /* P3  = +9.967675361526749494844e-02 */
+        .quad 0x3FA20896939AB161   /* P4  = +3.522177268800664413493e-02 */
+        .quad 0xBFA97E801F31EE0D   /* P5  = -4.979324703978358553405e-02 */
+        .quad 0x3F92A11F47B82085   /* P6  = +1.819275737037219740638e-02 */
+        .quad 0x3F717D70FE289C34   /* P7  = +4.270020845559097605514e-03 */
+        .quad 0xBF7FDCF1D3F6CE2D   /* P8  = -7.779068604054678540132e-03 */
+        .quad 0x3F69F607E81AF6B6   /* P9  = +3.169074480722534625181e-03 */
+        .quad 0x3F3F925C80D0F889   /* P10 = +4.817462766516585511824e-04 */
+        .quad 0xBFEF000000000000   /* B = -.96875  */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C931A11D7E8606E   /* PL0 = +6.627280241435322692188e-17 */
+        .quad 0x3FE92BFB370D9B71   /* PH0 = +7.866188121086975515439e-01 */
+        .quad 0x3FD866160E454111   /* P1  = +3.812308444367014680480e-01 */
+        .quad 0xBFD33149F3801DBA   /* P2  = -2.998833539899937679796e-01 */
+        .quad 0x3FBBDB6D4C949899   /* P3  = +1.088169395412442909023e-01 */
+        .quad 0x3F8D6AB2A74B9343   /* P4  = +1.436366627735597372494e-02 */
+        .quad 0xBFA404D1047C5D72   /* P5  = -3.909924678571997970917e-02 */
+        .quad 0x3F93C47D9ACCD919   /* P6  = +1.930423981976856424661e-02 */
+        .quad 0xBF41B755642CFF1B   /* P7  = -5.406538915408738478158e-04 */
+        .quad 0xBF74B5301AA1E788   /* P8  = -5.055606752756853900641e-03 */
+        .quad 0x3F69A84C5B2A3E68   /* P9  = +3.132008679422249529120e-03 */
+        .quad 0xBF3CF47830328C11   /* P10 = -4.418176105877589308931e-04 */
+        .quad 0xBFF1000000000000   /* B = -1.0625   */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C884D471B8FD396   /* PL0 = +4.215701792312937090514e-17 */
+        .quad 0x3FEA8DBCBC31897A   /* PH0 = +8.298019099859594849278e-01 */
+        .quad 0x3FD3EE730537C8EA   /* P1  = +3.114287901836535219818e-01 */
+        .quad 0xBFD08A05AD27CE32   /* P2  = -2.584242049190123217982e-01 */
+        .quad 0x3FBC5255406F84B6   /* P3  = +1.106313021005175045399e-01 */
+        .quad 0xBF772FA2F633AA5E   /* P4  = -5.660664147607434209241e-03 */
+        .quad 0xBF99DD8E4C473FC4   /* P5  = -2.525923100057504533247e-02 */
+        .quad 0x3F9183C935B6495D   /* P6  = +1.710428610165003372069e-02 */
+        .quad 0xBF70471A3A591480   /* P7  = -3.974058583087303228038e-03 */
+        .quad 0xBF603DDD4DEBB9A4   /* P8  = -1.982624278176818987264e-03 */
+        .quad 0x3F62591E44D3C17F   /* P9  = +2.239760512218135956425e-03 */
+        .quad 0xBF4C195D3A9B1AB4   /* P10 = -8.575158328419569430544e-04 */
+        .quad 0xBFF3000000000000   /* B = -1.1875   */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C90DD1C9BFF7F64   /* PL0 = +5.850777430004479798187e-17 */
+        .quad 0x3FEBAD50A4A68BC1   /* PH0 = +8.649066177207417327466e-01 */
+        .quad 0x3FD01FBA72CEE1A5   /* P1  = +2.519365426228666233893e-01 */
+        .quad 0xBFCBE432F647C4D6   /* P2  = -2.179015829602010702633e-01 */
+        .quad 0x3FBABF92B6E5AC73   /* P3  = +1.044856735731387955105e-01 */
+        .quad 0xBF922983AA24E217   /* P4  = -1.773648954369563555378e-02 */
+        .quad 0xBF8C72214C14E23A   /* P5  = -1.388956082756564056328e-02 */
+        .quad 0x3F8ACB4D1F388E8B   /* P6  = +1.308307887581540972153e-02 */
+        .quad 0xBF740EF8B4A2EE3B   /* P7  = -4.897090441029978580995e-03 */
+        .quad 0xBF0EA9F30C8DC900   /* P8  = -5.848668076326342477133e-05 */
+        .quad 0x3F53CC40D18713AE   /* P9  = +1.208365725788622757410e-03 */
+        .quad 0xBF4848B86029CBA1   /* P10 = -7.410908004444779592485e-04 */
+        .quad 0xBFF5000000000000   /* B = -1.3125   */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C8FB61781D22681   /* PL0 = +5.501032995458057064843e-17 */
+        .quad 0x3FEC950A3340C8BF   /* PH0 = +8.931933404003514764824e-01 */
+        .quad 0x3FC9E1DFFD385423   /* P1  = +2.022056566644617586005e-01 */
+        .quad 0xBFC71E2FF88EBA23   /* P2  = -1.806087459239772032583e-01 */
+        .quad 0x3FB80AEBD07AB5BA   /* P3  = +9.391664352252506838449e-02 */
+        .quad 0xBF98404E27EAE6ED   /* P4  = -2.368280523908243895884e-02 */
+        .quad 0xBF772DA520B5006E   /* P5  = -5.658764868087568802107e-03 */
+        .quad 0x3F824C9268AF9423   /* P6  = +8.935111827620250551925e-03 */
+        .quad 0xBF722AE76D206AE3   /* P7  = -4.435447701349490160113e-03 */
+        .quad 0x3F4B807F56298D5E   /* P8  = +8.392926941493230644497e-04 */
+        .quad 0x3F3D71027DF95D2A   /* P9  = +4.492407879061627603159e-04 */
+        .quad 0xBF3EBD17676755FB   /* P10 = -4.690343988874298905483e-04 */
+        .quad 0xBFF7000000000000   /* B = -1.4375   */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C95393C63CE8224   /* PL0 = +7.363407705201031038415e-17 */
+        .quad 0x3FED4E6F464286B0   /* PH0 = +9.158245441687622445670e-01 */
+        .quad 0x3FC4A45842B7DE1E   /* P1  = +1.612654042980787191461e-01 */
+        .quad 0xBFC2E7885AFDD3D0   /* P2  = -1.476908153814791087327e-01 */
+        .quad 0x3FB4DD6DD51D3FEB   /* P3  = +8.150373890862254580204e-02 */
+        .quad 0xBF9A05D3ADAB489C   /* P4  = -2.541285274021075503042e-02 */
+        .quad 0xBF3459B643B4995C   /* P5  = -3.105230313899165257622e-04 */
+        .quad 0x3F766B30745F2E3A   /* P6  = +5.473317409222350365811e-03 */
+        .quad 0xBF6C2C891E555BDF   /* P7  = -3.439204988051155730940e-03 */
+        .quad 0x3F5194F30D6C576D   /* P8  = +1.073109966176012791522e-03 */
+        .quad 0x3EF4DBB43C3132A2   /* P9  = +1.989194766975849961365e-05 */
+        .quad 0xBF2E45EBAB3C15A0   /* P10 = -2.309656316514087783666e-04 */
+        .quad 0xBFF9000000000000   /* B = -1.5625   */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C75111669651DAA   /* PL0 = +1.827249135453834384396e-17 */
+        .quad 0x3FEDE1EB5937518F   /* PH0 = +9.338280432225917193634e-01 */
+        .quad 0x3FC06129C7C8EBB1   /* P1  = +1.279651856910653382507e-01 */
+        .quad 0xBFBE9763041064E1   /* P2  = -1.194974789545031421774e-01 */
+        .quad 0x3FB1A5B9F9113928   /* P3  = +6.893503504509068635308e-02 */
+        .quad 0xBF992145039F9AFE   /* P4  = -2.454097590080105816526e-02 */
+        .quad 0x3F66CB116EA49C89   /* P5  = +2.782377288116648315142e-03 */
+        .quad 0x3F67F972FDF30001   /* P6  = +2.926563829163342740100e-03 */
+        .quad 0xBF63A7B5975F02F3   /* P7  = -2.399305983061922438601e-03 */
+        .quad 0x3F4FDE7B8777F4C8   /* P8  = +9.725669069095216373599e-04 */
+        .quad 0xBF25918876626BA4   /* P9  = -1.645545082212515656240e-04 */
+        .quad 0xBF1495123C991F00   /* P10 = -7.851527984669912693674e-05 */
+        .quad 0xBFFB000000000000   /* B = -1.6875   */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C9F29A5B7426D27   /* PL0 = +1.081172820484012446345e-16 */
+        .quad 0x3FEE56B6F3EFABFC   /* PH0 = +9.480852856044061915952e-01 */
+        .quad 0x3FB9E3EFD94BB9FC   /* P1  = +1.011342912204113371518e-01 */
+        .quad 0xBFB88BD9760FECA7   /* P2  = -9.588393337610288420285e-02 */
+        .quad 0x3FAD48A0350B3ACF   /* P3  = +5.719471595295077387313e-02 */
+        .quad 0xBF96CC6A5110F129   /* P4  = -2.226415748394675367257e-02 */
+        .quad 0x3F71934687170384   /* P5  = +4.290843485649345772606e-03 */
+        .quad 0x3F5407BAF73B3DF9   /* P6  = +1.222546180475235334287e-03 */
+        .quad 0xBF591B626C0646DD   /* P7  = -1.532407870488964407324e-03 */
+        .quad 0x3F48B0E1DD283558   /* P8  = +7.535078860329375669277e-04 */
+        .quad 0xBF2B322292840D2B   /* P9  = -2.074877932117605962646e-04 */
+        .quad 0xBE99E4061120C741   /* P10 = -3.858017559892704559672e-07 */
+        .quad 0xBFFD000000000000   /* B = -1.8125   */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6AF8C2041C67CD   /* PL0 = +1.169711482626385762338e-17 */
+        .quad 0x3FEEB2DFEDD5EC93   /* PH0 = +9.593352933146824801369e-01 */
+        .quad 0x3FB465A205CFB638   /* P1  = +7.967579500083210999681e-02 */
+        .quad 0xBFB3914BF68D39FF   /* P2  = -7.643580216720378576778e-02 */
+        .quad 0x3FA7F21A08C5C734   /* P3  = +4.676896435820623621673e-02 */
+        .quad 0xBF93DA9560EA9960   /* P4  = -1.938851741820124550772e-02 */
+        .quad 0x3F73953FEC62820E   /* P5  = +4.781007481284861359820e-03 */
+        .quad 0x3F2749D5E1273E3C   /* P6  = +1.776765426044646108071e-04 */
+        .quad 0xBF4D46B0B498CE5A   /* P7  = -8.934367007839658352859e-04 */
+        .quad 0x3F4153D680E1F4C4   /* P8  = +5.287930851093571206574e-04 */
+        .quad 0xBF28477014ECA6A2   /* P9  = -1.852344816708944640949e-04 */
+        .quad 0x3EFFAC54E07CEB4B   /* P10 = +3.020588886147182143902e-05 */
+        .quad 0xBFFF000000000000   /* B = -1.9375   */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C7A8AF2BB2231F2   /* PL0 = +2.302217989249372577466e-17 */
+        .quad 0x3FEF1994DF724FC8   /* PH0 = +9.718727459135090285258e-01 */
+        .quad 0x3FAC65B1BC0C9D58   /* P1  = +5.546336575053583942603e-02 */
+        .quad 0xBFAB9937BDA747C8   /* P2  = -5.390333356957871365599e-02 */
+        .quad 0x3FA15B42D9EF931C   /* P3  = +3.389939222669210777241e-02 */
+        .quad 0xBF8EACD8E8507A3C   /* P4  = -1.497811755149058215502e-02 */
+        .quad 0x3F7263A15721C682   /* P5  = +4.489546046998806349050e-03 */
+        .quad 0xBF42A032ACDC3B32   /* P6  = -5.684134900735048121829e-04 */
+        .quad 0xBF3431E79B5AD185   /* P7  = -3.081503340170088810438e-04 */
+        .quad 0x3F31B51667C7DF5E   /* P8  = +2.701930714290502424828e-04 */
+        .quad 0xBF1F8709579250AD   /* P9  = -1.202678157759563704341e-04 */
+        .quad 0x3F01ED8ED1BF9595   /* P10 = +3.419487094883790833778e-05 */
+        .quad 0xC001000000000000   /* B = -2.125    */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C86F3F7C3DAFC55   /* PL0 = +3.981710680748877459333e-17 */
+        .quad 0x3FEF73776B2AA2DB   /* PH0 = +9.828450291725759901951e-01 */
+        .quad 0x3FA16A7FC4D7B900   /* P1  = +3.401564863075812007064e-02 */
+        .quad 0xBFA11E03803AD621   /* P2  = -3.343211117082156940532e-02 */
+        .quad 0x3F9609591597297F   /* P3  = +2.152003473546803654658e-02 */
+        .quad 0xBF847E74ED9BBB0C   /* P4  = -1.000682211039596246436e-02 */
+        .quad 0x3F6BFF771725CD65   /* P5  = +3.417713736035987187864e-03 */
+        .quad 0xBF491D1FF73C18FA   /* P6  = -7.664114077392807421000e-04 */
+        .quad 0x3EF53EE467B51DC5   /* P7  = +2.026145237479599375099e-05 */
+        .quad 0x3F160135BE0D94A0   /* P8  = +8.394136922403255700685e-05 */
+        .quad 0xBF0B32CB1D276A40   /* P9  = -5.187685350778849443841e-05 */
+        .quad 0x3EF4DAF70C12D555   /* P10 = +1.988919462255396826584e-05 */
+        .quad 0xC003000000000000   /* B = -2.375    */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C19DBF4E2E5B7DC   /* PL0 = +3.504575836708380670219e-19 */
+        .quad 0x3FEFAA7934B75EBD   /* PH0 = +9.895597486128832054320e-01 */
+        .quad 0x3F9545200830A42C   /* P1  = +2.077150392520736492125e-02 */
+        .quad 0xBF950C46D285F6BC   /* P2  = -2.055464420253970271376e-02 */
+        .quad 0x3F8B79F5BFC6513F   /* P3  = +1.341621390819425058164e-02 */
+        .quad 0xBF7A50ADAD777898   /* P4  = -6.424597194806612772505e-03 */
+        .quad 0x3F633A19BE8255E3   /* P5  = +2.347040444940816227383e-03 */
+        .quad 0xBF44E609BC2557B7   /* P6  = -6.377742322836087134324e-04 */
+        .quad 0x3F1AFCBAD60EAACD   /* P7  = +1.029480968230231421206e-04 */
+        .quad 0x3EE80476AC34A8EF   /* P8  = +1.145240583485084317660e-05 */
+        .quad 0xBEF278E23DE463E9   /* P9  = -1.761646478213091821804e-05 */
+        .quad 0x3EE209FAF377264D   /* P10 = +8.601658563106529694651e-06 */
+        .quad 0xC005000000000000   /* B = -2.625    */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C979D62702C631C   /* PL0 = +8.193023793215066385979e-17 */
+        .quad 0x3FEFCC04CDBCDC4B   /* PH0 = +9.936546343150295390600e-01 */
+        .quad 0x3F89E87D088D269A   /* P1  = +1.265046770426474576547e-02 */
+        .quad 0xBF89BE6721012B80   /* P2  = -1.257019586059526836624e-02 */
+        .quad 0x3F80F1C13E8D39D3   /* P3  = +8.273610803056031004326e-03 */
+        .quad 0xBF7082DBC9602757   /* P4  = -4.031046430108839563004e-03 */
+        .quad 0x3F590BE9BD4E0A11   /* P5  = +1.528719197467002507978e-03 */
+        .quad 0xBF3DCC2BEF6D0283   /* P6  = -4.546744598208711809986e-04 */
+        .quad 0x3F1A08065C4A8E85   /* P7  = +9.930170842636406837764e-05 */
+        .quad 0xBEE528117D0410F3   /* P8  = -1.008821337267942266431e-05 */
+        .quad 0xBED0BE73A44FF565   /* P9  = -3.992069257383521775961e-06 */
+        .quad 0x3EC9B0C11E342E38   /* P10 = +3.062539904901699218737e-06 */
+        .quad 0xC007000000000000   /* B = -2.875    */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C804B931AD7A3CC   /* PL0 = +2.826768921701616830245e-17 */
+        .quad 0x3FEFE06EB0688212   /* PH0 = +9.961465306733450209009e-01 */
+        .quad 0x3F7F81BD8876224D   /* P1  = +7.692089427458426472642e-03 */
+        .quad 0xBF7F62A8C699A963   /* P2  = -7.662448196791823756776e-03 */
+        .quad 0x3F74C31E2B2A6A28   /* P3  = +5.068891378551522166321e-03 */
+        .quad 0xBF6470D537F16227   /* P4  = -2.495209162173734080001e-03 */
+        .quad 0x3F4FAEEF61C89673   /* P5  = +9.668988091717359455754e-04 */
+        .quad 0xBF33C5E80B349783   /* P6  = -3.017131341088651514023e-04 */
+        .quad 0x3F138F3D31037A6B   /* P7  = +7.461367590931028650557e-05 */
+        .quad 0xBEEB3C780996FFE3   /* P8  = -1.298723536791163711556e-05 */
+        .quad 0x3E9D0C75BC8BFEFC   /* P9  = +4.328589367358221917138e-07 */
+        .quad 0x3EAC3865227764D4   /* P10 = +8.410302755848104487452e-07 */
+        .quad 0xC009000000000000   /* B = -3.125    */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C5B978B202749F9   /* PL0 = +5.983054034451594408315e-18 */
+        .quad 0x3FEFECD6B7EA3128   /* PH0 = +9.976609794698889643882e-01 */
+        .quad 0x3F73238B786137FE   /* P1  = +4.672570043181776968058e-03 */
+        .quad 0xBF731815ACEA072E   /* P2  = -4.661640805922390930706e-03 */
+        .quad 0x3F6956F0816D5AEE   /* P3  = +3.093213784647877798933e-03 */
+        .quad 0xBF591A16286C4885   /* P4  = -1.532098425461232453877e-03 */
+        .quad 0x3F43B3E3A00C6096   /* P5  = +6.012784434430592468442e-04 */
+        .quad 0xBF29441B2A56DEC7   /* P6  = -1.927645836710038499293e-04 */
+        .quad 0x3F0A99C3A2E857B6   /* P7  = +5.073669705184196724674e-05 */
+        .quad 0xBEE61CB034DDC151   /* P8  = -1.054385361573597042258e-05 */
+        .quad 0x3EB792BBC76D6107   /* P9  = +1.405070887824641788698e-06 */
+        .quad 0x3E761472362A16F0   /* P10 = +8.225391704739515383837e-08 */
+        .quad 0xC00B000000000000   /* B = -3.375    */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C9C290AFCBDE00D   /* PL0 = +9.770074992945060684926e-17 */
+        .quad 0x3FEFF45F6D36133A   /* PH0 = +9.985806592017987259879e-01 */
+        .quad 0x3F673CEC093032DE   /* P1  = +2.836667068100913999228e-03 */
+        .quad 0xBF67347A7CD844D5   /* P2  = -2.832640870800243808078e-03 */
+        .quad 0x3F5EDA25530355DB   /* P3  = +1.883064698679040793627e-03 */
+        .quad 0xBF4EAD3BBABC1BA9   /* P4  = -9.361783645268534848806e-04 */
+        .quad 0x3F3842E61CD35432   /* P5  = +3.701984213198588740338e-04 */
+        .quad 0xBF1F9AB7FD1A3DDD   /* P6  = -1.205611036090218544867e-04 */
+        .quad 0x3F0136C154EA3DED   /* P7  = +3.283288480304320224929e-05 */
+        .quad 0xBEDF12807F721E66   /* P8  = -7.408207230892235753013e-06 */
+        .quad 0x3EB5B53687AD5112   /* P9  = +1.293889481520047941659e-06 */
+        .quad 0xBE801E90FBFED147   /* P10 = -1.200988872775447204019e-07 */
+        .quad 0xC00D000000000000   /* B = -3.625    */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C9E323294294877   /* PL0 = +1.047637125334028950603e-16 */
+        .quad 0x3FEFF8F21CDAAA62   /* PH0 = +9.991388858373506653976e-01 */
+        .quad 0x3F5C3470628813F2   /* P1  = +1.721486807697344658108e-03 */
+        .quad 0xBF5C2E38AC6FF8D2   /* P2  = -1.720004411026422324849e-03 */
+        .quad 0x3F52C13234626F43   /* P3  = +1.144694354969070234454e-03 */
+        .quad 0xBF42B0A47DF47BB4   /* P4  = -5.703738387728891173354e-04 */
+        .quad 0x3F2DB2889E32FBFD   /* P5  = +2.265731592156760387344e-04 */
+        .quad 0xBF1385FBD54C5A55   /* P6  = -7.447576110695385196414e-05 */
+        .quad 0x3EF5AFA812C6984E   /* P7  = +2.068153223579892541184e-05 */
+        .quad 0xBED47097C188A03C   /* P8  = -4.873231795467276043290e-06 */
+        .quad 0x3EAFF2B982F7EE8C   /* P9  = +9.521288628073486288914e-07 */
+        .quad 0xBE828EC5B57D424D   /* P10 = -1.382656715739529384702e-07 */
+        .quad 0xC00F000000000000   /* B = -3.875    */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C9BA40DA6983BEC   /* PL0 = +9.589840482158163453169e-17 */
+        .quad 0x3FEFFCAAC3F20E65   /* PH0 = +9.995931460438894911036e-01 */
+        .quad 0x3F4AA87CF664754C   /* P1  = +8.135423820793490331956e-04 */
+        .quad 0xBF4AA5B62919E224   /* P2  = -8.132113891426467676310e-04 */
+        .quad 0x3F41C01B53B0B312   /* P3  = +5.416997368051531710388e-04 */
+        .quad 0xBF31B8B54D091751   /* P4  = -2.704088811110632606347e-04 */
+        .quad 0x3F1C431305954ECC   /* P5  = +1.078110084525254933728e-04 */
+        .quad 0xBF02B7DEAD0D44E6   /* P6  = -3.570221236393906131126e-05 */
+        .quad 0x3EE51C6EFF109EA9   /* P7  = +1.006654199116272154479e-05 */
+        .quad 0xBEC48CFB08072D17   /* P8  = -2.449834994621594976610e-06 */
+        .quad 0x3EA1585EC59CAE34   /* P9  = +5.169271261920604503617e-07 */
+        .quad 0xBE78832BAF950BA9   /* P10 = -9.131575131209528255629e-08 */
+        .quad 0xC011000000000000   /* B = -4.25     */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C8FBF237F4AFE10   /* PL0 = +5.507163370275307643966e-17 */
+        .quad 0x3FEFFEC61279A3A4   /* PH0 = +9.998503075449787225182e-01 */
+        .quad 0x3F339E78281A00EA   /* P1  = +2.993625022114214863645e-04 */
+        .quad 0xBF339DB7B072AD62   /* P2  = -2.993176899035080028902e-04 */
+        .quad 0x3F2A259E658EF4E4   /* P3  = +1.994853835451177669594e-04 */
+        .quad 0xBF1A219C312B10BA   /* P4  = -9.968295880030927192162e-05 */
+        .quad 0x3F04E146B4F5F4B7   /* P5  = +3.982541113154699160876e-05 */
+        .quad 0xBEEBC5F137088210   /* P6  = -1.324329943580649487333e-05 */
+        .quad 0x3ECF96736E300B00   /* P7  = +3.765547135882256916132e-06 */
+        .quad 0xBEAF4874840B91EB   /* P8  = -9.323068824421825762292e-07 */
+        .quad 0x3E8B6AB2B5C8FD3F   /* P9  = +2.042709991312793245971e-07 */
+        .quad 0xBE650BCCE62FD2B7   /* P10 = -3.920140725219944650830e-08 */
+        .quad 0xC013000000000000   /* B = -4.75     */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C9C869C85471703   /* PL0 = +9.896883942603146946483e-17 */
+        .quad 0x3FEFFF8C81C6DC33   /* PH0 = +9.999449286177707341139e-01 */
+        .quad 0x3F1CDF5A2E4D7C69   /* P1  = +1.101397316012206760643e-04 */
+        .quad 0xBF1CDEF1F9BE63BE   /* P2  = -1.101336660539594564027e-04 */
+        .quad 0x3F133EC10C83AAA0   /* P3  = +7.341435696487731017506e-05 */
+        .quad 0xBF033DAB325FAACB   /* P4  = -3.669909192168459445238e-05 */
+        .quad 0x3EEEC598FA98BAD8   /* P5  = +1.467316890843338172161e-05 */
+        .quad 0xBED47F1A15BA368E   /* P6  = -4.886744445221253126882e-06 */
+        .quad 0x3EB761FBE7D201C1   /* P7  = +1.393720509029845064726e-06 */
+        .quad 0xBE974CD75A43BF6B   /* P8  = -3.471994551992448536007e-07 */
+        .quad 0x3E74B02965BBF8DC   /* P9  = +7.706929621914905669946e-08 */
+        .quad 0xBE504EF4E3892A66   /* P10 = -1.518840362012570189110e-08 */
+        .quad 0xC015000000000000   /* B = -5.25     */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C643810400471B0   /* PL0 = +8.768592603904887599187e-18 */
+        .quad 0x3FEFFFD583014825   /* PH0 = +9.999797400180382433987e-01 */
+        .quad 0x3F053E71416C43CA   /* P1  = +4.051955345663706869871e-05 */
+        .quad 0xBF053E550C7C8CC9   /* P2  = -4.051873253121394012080e-05 */
+        .quad 0x3EFC52D0D90D4843   /* P3  = +2.701139380018752534477e-05 */
+        .quad 0xBEEC523A6ADBE142   /* P4  = -1.350460237457883558350e-05 */
+        .quad 0x3ED6A73E22D844B3   /* P5  = +5.400965660055565196396e-06 */
+        .quad 0xBEBE31D10F23ACD0   /* P6  = -1.799738182979224868919e-06 */
+        .quad 0x3EA13E14264DEAB2   /* P7  = +5.138663935333241981438e-07 */
+        .quad 0xBE81385ABB98EDCC   /* P8  = -1.282999997786486835638e-07 */
+        .quad 0x3E5EB9164593E0B6   /* P9  = +2.861301981891537161158e-08 */
+        .quad 0xBE387218CFE7772E   /* P10 = -5.691705994073124478195e-09 */
+        .quad 0xC017000000000000   /* B = -5.75     */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C92530433F4C703   /* PL0 = +6.357512739163799046861e-17 */
+        .quad 0x3FEFFFF05E8D3191   /* PH0 = +9.999925467214315633058e-01 */
+        .quad 0x3EEF42DDFA52B575   /* P1  = +1.490650158538873335176e-05 */
+        .quad 0xBEEF42CEB54212AA   /* P2  = -1.490639048307961378200e-05 */
+        .quad 0x3EE4D7201CBCB853   /* P3  = +9.937445518550804010127e-06 */
+        .quad 0xBED4D6F764B66C37   /* P4  = -4.968574624976280456686e-06 */
+        .quad 0x3EC0ABB806EBDE71   /* P5  = +1.987311456171617620608e-06 */
+        .quad 0xBEA6399CF854F876   /* P6  = -6.623581475862682369330e-07 */
+        .quad 0x3E8964B91728D7C9   /* P7  = +1.891959403186505598965e-07 */
+        .quad 0xBE6961A0528444D6   /* P8  = -4.727645325404986954168e-08 */
+        .quad 0x3E46AE3B0814EE00   /* P9  = +1.056147192151514779549e-08 */
+        .quad 0xBE221B8194DACD16   /* P10 = -2.107984154277957626641e-09 */
+        .quad 0xC019000000000000   /* B = -6.25     */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C7BB5622CE1A79E   /* PL0 = +2.403331811901679167526e-17 */
+        .quad 0x3FEFFFFA3FF22708   /* PH0 = +9.999972580855862602789e-01 */
+        .quad 0x3ED7003552D53503   /* P1  = +5.483821309338170039906e-06 */
+        .quad 0xBED7003130C1AB92   /* P2  = -5.483806273169366545037e-06 */
+        .quad 0x3ECEAAE13B699C45   /* P3  = +3.655850800133043324271e-06 */
+        .quad 0xBEBEAACB305F3D07   /* P4  = -1.827905351959291114416e-06 */
+        .quad 0x3EA8887F5F9C87EF   /* P5  = +7.311461438267648556646e-07 */
+        .quad 0xBE905AD08DF8454F   /* P6  = -2.437046884027860662692e-07 */
+        .quad 0x3E72B068300B703F   /* P7  = +6.962228483613086736676e-08 */
+        .quad 0xBE52AF921A71C058   /* P8  = -1.740252888706390465423e-08 */
+        .quad 0x3E30B53EAA35300D   /* P9  = +3.890131469838137725119e-09 */
+        .quad 0xBE0AB60CDAD7E22E   /* P10 = -7.773963050435300060566e-10 */
+        .quad 0xC01B000000000000   /* B = -6.75     */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C8BD1ACF80D7256   /* PL0 = +4.825835138930451121169e-17 */
+        .quad 0x3FEFFFFDE2760A41   /* PH0 = +9.999989913051835488389e-01 */
+        .quad 0x3EC0EC4F1EC27E55   /* P1  = +2.017388615341105998718e-06 */
+        .quad 0xBEC0EC4E005E6EAC   /* P2  = -2.017386580411626200507e-06 */
+        .quad 0x3EB6906504BC4610   /* P3  = +1.344921673533307001969e-06 */
+        .quad 0xBEA6905F0D52C8B5   /* P4  = -6.724581235377781360384e-07 */
+        .quad 0x3E920D0F5CCE152B   /* P5  = +2.689810941136721216499e-07 */
+        .quad 0xBE7811505B10E753   /* P6  = -8.965891741619763761543e-08 */
+        .quad 0x3E5B811EE4F9B8EE   /* P7  = +2.561544781706659619288e-08 */
+        .quad 0xBE3B80ABC067E840   /* P8  = -6.403452884688571158579e-09 */
+        .quad 0x3E1898E394E09335   /* P9  = +1.431746793613569087489e-09 */
+        .quad 0xBDF3ABB5BA711DB7   /* P10 = -2.862469657501951918569e-10 */
+        .quad 0xC01D000000000000   /* B = -7.25     */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C8AE01DB39A3791   /* PL0 = +4.662147961093911873193e-17 */
+        .quad 0x3FEFFFFF38C76668   /* PH0 = +9.999996289217962797125e-01 */
+        .quad 0x3EA8E712E56E1188   /* P1  = +7.421562696484951529573e-07 */
+        .quad 0xBEA8E7124A650791   /* P2  = -7.421559942504648535596e-07 */
+        .quad 0x3EA09A0B62D8EF94   /* P3  = +4.947702955735978541097e-07 */
+        .quad 0xBE909A09C56C2107   /* P4  = -2.473847805916120382218e-07 */
+        .quad 0x3E7A900A90A54A6E   /* P5  = +9.895362410487317236618e-08 */
+        .quad 0xBE61B5557BB449B6   /* P6  = -3.298434544432568302770e-08 */
+        .quad 0x3E443CC74732CDCA   /* P7  = +9.423781066565733462466e-09 */
+        .quad 0xBE243CA8AA8D6E54   /* P8  = -2.355890888986360997159e-09 */
+        .quad 0x3E0219C341E0D1B4   /* P9  = +5.267978308406275552691e-10 */
+        .quad 0xBDDCF49A10950F13   /* P10 = -1.053394074620716018815e-10 */
+        .quad 0xC01F000000000000   /* B = -7.75     */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C75CB18F3775414   /* PL0 = +1.890271747518592444083e-17 */
+        .quad 0x3FEFFFFFD38C39F0   /* PH0 = +9.999999172012490333827e-01 */
+        .quad 0x3E8639E2F89493BB   /* P1  = +1.655974950855472979393e-07 */
+        .quad 0xBE8639E2D9B29562   /* P2  = -1.655974813708346974914e-07 */
+        .quad 0x3E7DA2836A1F706E   /* P3  = +1.103982989742589616541e-07 */
+        .quad 0xBE6DA282C6733DAE   /* P4  = -5.519913131581509871840e-08 */
+        .quad 0x3E57B53A278851FD   /* P5  = +2.207971980430773309147e-08 */
+        .quad 0xBE3F9C4A72536E22   /* P6  = -7.359895614149337484810e-09 */
+        .quad 0x3E220E81FBE19CDD   /* P7  = +2.102073153607135257714e-09 */
+        .quad 0xBE020E8875ADA8D8   /* P8  = -5.255211642212584097407e-10 */
+        .quad 0x3DE07634328384FC   /* P9  = +1.197748786062966341989e-10 */
+        .quad 0xBDBA54078E3C351F   /* P10 = -2.394539505021488953905e-11 */
+        .quad 0xC021000000000000   /* B = -8.5      */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C98B78738B0EDEF   /* PL0 = +8.575399788039081964921e-17 */
+        .quad 0x3FEFFFFFF9FBEA40   /* PH0 = +9.999999887944071019774e-01 */
+        .quad 0x3E581056FAC28C46   /* P1  = +2.241118550516412682327e-08 */
+        .quad 0xBE581056F63A4351   /* P2  = -2.241118525356742542550e-08 */
+        .quad 0x3E500AE49533790A   /* P3  = +1.494078933911655875521e-08 */
+        .quad 0xBE400AE489ACBA90   /* P4  = -7.470394349637968945652e-09 */
+        .quad 0x3E29AB0D59A1967B   /* P5  = +2.988168557255271725494e-09 */
+        .quad 0xBE111CB32D6EEF2B   /* P6  = -9.960558400070350772418e-10 */
+        .quad 0x3DF38CBADF396908   /* P7  = +2.844859618921805216353e-10 */
+        .quad 0xBDD38CC7B92CECD3   /* P8  = -7.112220386749926320915e-11 */
+        .quad 0x3DB1D2BBE2705032   /* P9  = +1.621008722427575444686e-11 */
+        .quad 0xBD8C8199294E6380   /* P10 = -3.240784656869469020111e-12 */
+        .quad 0xC023000000000000   /* B = -9.5      */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C8EEEC16618B984   /* PL0 = +5.365957423487855307906e-17 */
+        .quad 0x3FEFFFFFFF2F9279   /* PH0 = +9.999999984834878619111e-01 */
+        .quad 0x3E2A0DB0D052B148   /* P1  = +3.033024167396880687734e-09 */
+        .quad 0xBE2A0DB0CFA6AB71   /* P2  = -3.033024162734192808028e-09 */
+        .quad 0x3E215E75D53A3105   /* P3  = +2.022016035353114070618e-09 */
+        .quad 0xBE115E75D40AA47F   /* P4  = -1.011008013562702155050e-09 */
+        .quad 0x3DFBCA5CDC12ED1C   /* P5  = +4.044047007631481841556e-10 */
+        .quad 0xBDE286E85704FC22   /* P6  = -1.348015410318274576187e-10 */
+        .quad 0x3DC52A8925354517   /* P7  = +3.850101197145027796396e-11 */
+        .quad 0xBDA52A97EA3F5F4A   /* P8  = -9.625355478142550638468e-12 */
+        .quad 0x3D834C011A2AC0F7   /* P9  = +2.193802608697321032841e-12 */
+        .quad 0xBD5EDD05BDCB3A62   /* P10 = -4.385948508419928563300e-13 */
+        .quad 0xC025000000000000   /* B = -10.5      */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6BD8B474BBF792   /* PL0 = +1.207649585364892639612e-17 */
+        .quad 0x3FEFFFFFFFE3CAD8   /* PH0 = +9.999999997947623953110e-01 */
+        .quad 0x3DFC3527E43C565F   /* P1  = +4.104751852963940338559e-10 */
+        .quad 0xBDFC3527E420F415   /* P2  = -4.104751852036136216697e-10 */
+        .quad 0x3DF2CE1A8D806DAD   /* P3  = +2.736501142887952919489e-10 */
+        .quad 0xBDE2CE1A8DDF690A   /* P4  = -1.368250573053032426141e-10 */
+        .quad 0x3DCE169832D8BD68   /* P5  = +5.473022586854025789680e-11 */
+        .quad 0xBDB40F0FE853DA5B   /* P6  = -1.824340550195944358477e-11 */
+        .quad 0x3D96EA8D930D31A1   /* P7  = +5.210545794901128943676e-12 */
+        .quad 0xBD76EA9DB0D09839   /* P8  = -1.302650427355019556441e-12 */
+        .quad 0x3D54E474FD4303A1   /* P9  = +2.968990047962355000258e-13 */
+        .quad 0xBD30B526CA2B228A   /* P10 = -5.935740124899435401321e-14 */
+        .quad 0xC027000000000000   /* B = -11.5      */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C56E8953D525FD5   /* PL0 = +4.967494994909661698725e-18 */
+        .quad 0x3FEFFFFFFFFC2EB9   /* PH0 = +9.999999999722241073030e-01 */
+        .quad 0x3DCE8A37A48016C2   /* P1  = +5.555177547354687971427e-11 */
+        .quad 0xBDCE8A37A479B7D4   /* P2  = -5.555177547084873157964e-11 */
+        .quad 0x3DC45C250CFA9C16   /* P3  = +3.703451575129414499553e-11 */
+        .quad 0xBDB45C250D9F8467   /* P4  = -1.851725791056759260154e-11 */
+        .quad 0x3DA049BB33CBD4E9   /* P5  = +7.406930640558963265190e-12 */
+        .quad 0xBD85B7A407C422C1   /* P6  = -2.468976464832073512208e-12 */
+        .quad 0x3D68CF9CED2B3FD5   /* P7  = +7.051706989348171774536e-13 */
+        .quad 0xBD48CFAE64C352B3   /* P8  = -1.762945685274427023683e-13 */
+        .quad 0x3D269EAE08690D52   /* P9  = +4.018091287355461204663e-14 */
+        .quad 0xBD0216CBEAFFF5AA   /* P10 = -8.033151495672990022322e-15 */
+        .quad 0xC029000000000000   /* B = -12.5      */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C8ACF1392B106D3   /* PL0 = +4.650601502940921454330e-17 */
+        .quad 0x3FEFFFFFFFFF7BBD   /* PH0 = +9.999999999962408958609e-01 */
+        .quad 0x3DA088529889B316   /* P1  = +7.518115268189742464885e-12 */
+        .quad 0xBDA088529887F4C4   /* P2  = -7.518115268005149164680e-12 */
+        .quad 0x3D960B18BF1DF711   /* P3  = +5.012076679213679703380e-12 */
+        .quad 0xBD860B18BFD99A48   /* P4  = -2.506038344573564868987e-12 */
+        .quad 0x3D71A27E7CA64143   /* P5  = +1.002419056539285288454e-12 */
+        .quad 0xBD5783530EA76D91   /* P6  = -3.341396294294381580191e-13 */
+        .quad 0x3D3ADCC75CBD2A03   /* P7  = +9.543447641637910477850e-14 */
+        .quad 0xBD1ADCDA46BE5F17   /* P8  = -2.385887543769010971872e-14 */
+        .quad 0x3CF87D77650BE5B8   /* P9  = +5.437895260471143131391e-15 */
+        .quad 0xBCD395AE6E74C6D2   /* P10 = -1.087168847335561258239e-15 */
+        .quad 0xC02B000000000000   /* B = -13.5      */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C97A8A295292858   /* PL0 = +8.208271151146829171896e-17 */
+        .quad 0x3FEFFFFFFFFFEE19   /* PH0 = +9.999999999994911847878e-01 */
+        .quad 0x3D71E642BB008F95   /* P1  = +1.017466259229268282255e-12 */
+        .quad 0xBD71E642BAFEEC54   /* P2  = -1.017466259207593392022e-12 */
+        .quad 0x3D67DDAE41647741   /* P3  = +6.783108169938233581038e-13 */
+        .quad 0xBD57DDAE4230F34B   /* P4  = -3.391554091734942426856e-13 */
+        .quad 0x3D4317C33FAE2536   /* P5  = +1.356626669455791324801e-13 */
+        .quad 0xBD2975040D3E26B9   /* P6  = -4.522088139411435138867e-14 */
+        .quad 0x3D0D155DCD0F0AFB   /* P7  = +1.291565189902030307333e-14 */
+        .quad 0xBCED157247832B20   /* P8  = -3.228947666403019234175e-15 */
+        .quad 0x3CCA83D70F607C28   /* P9  = +7.359390959466796619024e-16 */
+        .quad 0xBCA5343952C1E19E   /* P10 = -1.471323041436694087188e-16 */
+        .quad 0xC02D000000000000   /* B = -14.5      */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C9B7876CBC5306E   /* PL0 = +9.530765996816607711732e-17 */
+        .quad 0x3FEFFFFFFFFFFD93   /* PH0 = +9.999999999999310551502e-01 */
+        .quad 0x3D436121E2640D76   /* P1  = +1.376990843765503869546e-13 */
+        .quad 0xBD436121E26250EA   /* P2  = -1.376990843736775811281e-13 */
+        .quad 0x3D39D6D7CA259186   /* P3  = +9.179938654047876451320e-14 */
+        .quad 0xBD29D6D7CB0327CE   /* P4  = -4.589969336188563660531e-14 */
+        .quad 0x3D14ABE4DC31244A   /* P5  = +1.835994545584345768382e-14 */
+        .quad 0xBCFB8FDB82AB6BB7   /* P6  = -6.119980791767901275443e-15 */
+        .quad 0x3CDF7CF757491B60   /* P7  = +1.747943407988343076526e-15 */
+        .quad 0xBCBF7D0D833640FB   /* P8  = -4.369905470133249448357e-16 */
+        .quad 0x3C9CB512F6BDC754   /* P9  = +9.959852600692493655511e-17 */
+        .quad 0xBC76F50AB1B0E9BA   /* P10 = -1.991219205936492089091e-17 */
+        .quad 0xC02F000000000000   /* B = -15.5      */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C6FFE15D5F78543   /* PL0 = +1.387454417328248962819e-17 */
+        .quad 0x3FEFFFFFFFFFFFE1   /* PH0 = +9.999999999999965583086e-01 */
+        .quad 0x3CFEE00288B99C26   /* P1  = +6.855635762864742358597e-15 */
+        .quad 0xBCFEE0027D060EE2   /* P2  = -6.855635607998342735403e-15 */
+        .quad 0x3CF4954AA23148A2   /* P3  = +4.570381865813341696777e-15 */
+        .quad 0xBCE4954B5DAD3010   /* P4  = -2.285192173571711474199e-15 */
+        .quad 0x3CD07883DD8793BD   /* P5  = +9.143109661358222028007e-16 */
+        .quad 0xBCB5F5F4BB87ADCF   /* P6  = -3.047668447080103869032e-16 */
+        .quad 0x3C98F1A905097685   /* P7  = +8.654183371862458774513e-17 */
+        .quad 0xBC78F2D585007222   /* P8  = -2.163943551222030413627e-17 */
+        .quad 0x3C58A37CC5082B5F   /* P9  = +5.342649626494471588064e-18 */
+        .quad 0xBC33AE7917F94D17   /* P10 = -1.066938163384541013918e-18 */
+        .quad 0xC031000000000000   /* B = -17        */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x3C91BF1D80474F0F   /* PL0 = +6.157069264461989135096e-17 */
+        .quad 0x3FEFFFFFFFFFFFFE   /* PH0 = +9.999999999999997779554e-01 */
+        .quad 0x3CB72071400E6275   /* P1  = +3.209478247225075961360e-16 */
+        .quad 0xBCB72071400A9F37   /* P2  = -3.209478247103497434502e-16 */
+        .quad 0x3CAED5EC39A77629   /* P3  = +2.139652050028423711308e-16 */
+        .quad 0xBC9ED5EC3B530600   /* P4  = -1.069826028468029104719e-16 */
+        .quad 0x3C88AB2BFED159DE   /* P5  = +4.279326904335078988705e-17 */
+        .quad 0xBC70721D1220B3FC   /* P6  = -1.426441958074916244382e-17 */
+        .quad 0x3C52C96049721FB8   /* P7  = +4.073700029965821523731e-18 */
+        .quad 0xBC32C971215735DC   /* P8  = -1.018438939975201710113e-18 */
+        .quad 0x3C112EF658AB41A9   /* P9  = +2.328791246104218830028e-19 */
+        .quad 0xBBEB7B598C6AD3DE   /* P10 = -4.655603964908654142787e-20 */
+        .quad 0xC03287E0C98F84E5   /* B = -18.530774 */
+        .quad 0x3FF0000000000000   /* A = +1        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* PL0 = +0.000000000000000000000e-01 */
+        .quad 0x3FF0000000000000   /* PH0 = +1.000000000000000000000e+00 */
+        .quad 0x0000000000000000   /* P1  = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* P2  = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* P3  = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* P4  = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* P5  = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* P6  = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* P7  = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* P8  = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* P9  = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* P10 = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000   /* B = +0        */
+        .quad 0x0000000000000000   /* A = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .quad 0x0000000000000000   /* Align value = +0        */
+        .align 32
+        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000   /* _dbSignMask       */
+        .align 32
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff   /* _dbAbsMask        */
+        .align 32
+        .long 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000           /* _iExpMantMask     */
+        .align 32
+        .long 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000           /* _iExpMask         */
+        .align 32
+        .long 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000           /* _iMinIdxOfsMask   */
+        .align 32
+        .long 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000           /* _iMaxIdxMask      */
+        .align 32
+        .type	__svml_dtanh_data_internal,@object
+        .size	__svml_dtanh_data_internal,.-__svml_dtanh_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core-avx2.S
new file mode 100644
index 0000000000..92fb24a640
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized tanh, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN8v_tanh _ZGVeN8v_tanh_avx2_wrapper
+#include "../svml_d_tanh8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core.c
new file mode 100644
index 0000000000..495cb1f4fc
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized tanh, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN8v_tanh
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN8v_tanh, __GI__ZGVeN8v_tanh, __redirect__ZGVeN8v_tanh)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core_avx512.S
new file mode 100644
index 0000000000..01fc22ba6f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core_avx512.S
@@ -0,0 +1,472 @@
+/* Function tanh vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   NOTE: Since the hyperbolic tangent function is odd
+ *         (tanh(x) = -tanh(-x)), the algorithm below works with the
+ *         absolute value of the argument |x|: tanh(x) = sign(x) * tanh(|x|).
+ *
+ *   We use a table lookup method to compute tanh(|x|).
+ *   The basic idea is to split the input range into a number of subintervals
+ *   and to approximate tanh(.) with a polynomial on each of them.
+ *
+ *   IEEE SPECIAL CONDITIONS:
+ *   x = [+,-]0, r = [+,-]0
+ *   x = +Inf,   r = +1
+ *   x = -Inf,   r = -1
+ *   x = QNaN,   r = QNaN
+ *   x = SNaN,   r = QNaN
+ *
+ *
+ *   ALGORITHM DETAILS
+ *   Special values are handled in a callout function, outside the main
+ *   path computations.  "Special" inputs for this algorithm are:
+ *   INF, NaN, |x| > HUGE_THRESHOLD
+ *
+ *
+ *   Main path computations are organized as follows:
+ *   We split the interval [0, SATURATION_THRESHOLD)
+ *   into a number of subintervals.  On each subinterval we approximate
+ *   tanh(.) with a minimax polynomial of pre-defined degree.  Polynomial
+ *   coefficients are computed beforehand and stored in a table.  We also use
+ *
+ *       y := |x| + B,
+ *
+ *   where B depends on the subinterval and is used to bring the argument
+ *   closer to zero.
+ *   We also add a large fake interval [SATURATION_THRESHOLD, HUGE_THRESHOLD]
+ *   whose stored coefficients encode 1.0 + 0.0*y + 0.0*y^2 + ..., just to
+ *   preserve the main path computation logic while returning 1.0 there.
+ *
+ *   Hence the reconstruction looks as follows:
+ *   we extract the polynomial and range reduction coefficients
+ *        (Pj and B) corresponding to the subinterval to which |x| belongs,
+ *        and return
+ *
+ *       r := sign(x) * (P0 + P1 * y + ... + Pn * y^n)
+ *
+ *   NOTE: we use a multiprecision technique to multiply and sum the first
+ *         K terms of the polynomial.  Pj, j = 0..K, are therefore stored
+ *         in the table as pairs of target-precision numbers (Pj and PLj)
+ *         to achieve wider than target precision.
+ *
+ *
+ */
+
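+/* For illustration only (not used by the code below): a minimal scalar C
+ * sketch of the table-lookup scheme described above, assuming a hypothetical
+ * per-subinterval row layout and helpers (tanh_row, NUM_ROWS, select_row)
+ * plus <math.h>; the vector code below instead gathers per-lane coefficients
+ * with vpermt2pd from per-coefficient arrays.
+ *
+ *     struct tanh_row { double PL0, PH0, P[10], B; };
+ *     extern const struct tanh_row table[NUM_ROWS];
+ *
+ *     double
+ *     tanh_sketch (double x)
+ *     {
+ *       double ax = fabs (x);
+ *       const struct tanh_row *r = &table[select_row (ax)]; // pick subinterval
+ *       double y = ax + r->B;             // range reduction
+ *       double p = r->P[9];               // Horner: from P10 down to P1
+ *       for (int i = 8; i >= 0; i--)
+ *         p = p * y + r->P[i];
+ *       p = (p * y + r->PH0) + r->PL0;    // split P0 = PH0 + PL0 added last
+ *       return copysign (p, x);           // tanh is odd: restore the sign
+ *     }
+ */
+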
+/* Offsets for data table __svml_dtanh_data_internal
+ */
+#define _dC                           	0
+#define _dP0                          	128
+#define _dP1                          	256
+#define _dP2                          	384
+#define _dP3                          	512
+#define _dP4                          	640
+#define _dP5                          	768
+#define _dP6                          	896
+#define _dP7                          	1024
+#define _dP8                          	1152
+#define _dP9                          	1280
+#define _dP10                         	1408
+#define _dP11                         	1536
+#define _dP12                         	1664
+#define _dP13                         	1792
+#define _dP14                         	1920
+#define _dP15                         	2048
+#define _dP16                         	2176
+#define _dP17                         	2304
+#define _iExpMantMask_UISA            	2432
+#define _iMinIdxOfsMask_UISA          	2496
+#define _iMaxIdxMask_UISA             	2560
+#define _dbSignMask                   	2624
+#define _dbAbsMask                    	2688
+#define _iExpMantMask                 	2752
+#define _iExpMask                     	2816
+#define _iMinIdxOfsMask               	2880
+#define _iMaxIdxMask                  	2944
+
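+/* Note on layout: each _dC/_dPx entry above is 128 bytes (16 doubles, one
+   per subinterval), split into the two 64-byte halves addressed as _dPx and
+   _dPx+64 by the vpermt2pd gathers below; the remaining mask entries are
+   64-byte integer vectors.  */
+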
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN8v_tanh_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $320, %rsp
+        vpsrlq    $32, %zmm0, %zmm4
+        vmovups   %zmm0, (%rsp)
+        vmovups   __svml_dtanh_data_internal(%rip), %zmm14
+        vmovups   _dP0+__svml_dtanh_data_internal(%rip), %zmm15
+        vpmovqd   %zmm4, %ymm5
+
+/*  Constant loading  */
+        vandpd    _dbAbsMask+__svml_dtanh_data_internal(%rip), %zmm0, %zmm13
+        vandpd    _dbSignMask+__svml_dtanh_data_internal(%rip), %zmm0, %zmm3
+
+/* Here huge arguments, INF and NaNs are filtered out to callout. */
+        vpand     _iExpMantMask_UISA+__svml_dtanh_data_internal(%rip), %ymm5, %ymm7
+        vmovups   _dP2+__svml_dtanh_data_internal(%rip), %zmm0
+        vmovups   _dP16+__svml_dtanh_data_internal(%rip), %zmm4
+        vmovups   _dP15+__svml_dtanh_data_internal(%rip), %zmm5
+        vmovups   %zmm3, 64(%rsp)
+        vmovups   _dP3+__svml_dtanh_data_internal(%rip), %zmm3
+        vpsubd    _iMinIdxOfsMask_UISA+__svml_dtanh_data_internal(%rip), %ymm7, %ymm8
+
+/* if VMIN, VMAX is defined for I type */
+        vxorps    %ymm9, %ymm9, %ymm9
+        vpmaxsd   %ymm9, %ymm8, %ymm10
+        vpminsd   _iMaxIdxMask_UISA+__svml_dtanh_data_internal(%rip), %ymm10, %ymm11
+        vpsrld    $19, %ymm11, %ymm12
+        vmovups   _dP12+__svml_dtanh_data_internal(%rip), %zmm8
+        vmovups   _dP11+__svml_dtanh_data_internal(%rip), %zmm9
+        vmovups   _dP10+__svml_dtanh_data_internal(%rip), %zmm10
+        vmovups   _dP9+__svml_dtanh_data_internal(%rip), %zmm11
+        vpmovzxdq %ymm12, %zmm2
+        vmovups   _dP8+__svml_dtanh_data_internal(%rip), %zmm12
+        vpermt2pd _dP2+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm0
+        vpermt2pd _dC+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm14
+        vpermt2pd _dP16+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm4
+        vpermt2pd _dP15+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm5
+        vsubpd    {rn-sae}, %zmm14, %zmm13, %zmm1
+        vpermt2pd _dP12+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm8
+        vpermt2pd _dP11+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm9
+        vpermt2pd _dP10+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm10
+        vpermt2pd _dP9+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm11
+        vpermt2pd _dP8+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm12
+        vpermt2pd _dP3+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm3
+        vpermt2pd _dP0+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm15
+        vmovups   %zmm0, 192(%rsp)
+        vmovups   _dP17+__svml_dtanh_data_internal(%rip), %zmm0
+        vmovups   _dP7+__svml_dtanh_data_internal(%rip), %zmm13
+        vmovups   _dP6+__svml_dtanh_data_internal(%rip), %zmm14
+        vmovups   %zmm3, 256(%rsp)
+        vmovups   _dP5+__svml_dtanh_data_internal(%rip), %zmm3
+        vmovups   %zmm15, 128(%rsp)
+        vmovups   _dP4+__svml_dtanh_data_internal(%rip), %zmm15
+        vpermt2pd _dP17+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm0
+        vpermt2pd _dP7+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm13
+        vpermt2pd _dP6+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm14
+        vpermt2pd _dP5+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm3
+        vpermt2pd _dP4+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm15
+        vfmadd213pd {rn-sae}, %zmm4, %zmm1, %zmm0
+        vpcmpgtd  _iExpMask+__svml_dtanh_data_internal(%rip), %ymm7, %ymm6
+        vmovmskps %ymm6, %edx
+        vmovups   _dP14+__svml_dtanh_data_internal(%rip), %zmm6
+        vfmadd213pd {rn-sae}, %zmm5, %zmm1, %zmm0
+        vmovups   _dP13+__svml_dtanh_data_internal(%rip), %zmm7
+        vpermt2pd _dP14+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm6
+        vpermt2pd _dP13+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm7
+        vfmadd213pd {rn-sae}, %zmm6, %zmm1, %zmm0
+        vmovups   256(%rsp), %zmm2
+        vfmadd213pd {rn-sae}, %zmm7, %zmm1, %zmm0
+        vfmadd213pd {rn-sae}, %zmm8, %zmm1, %zmm0
+        vfmadd213pd {rn-sae}, %zmm9, %zmm1, %zmm0
+        vfmadd213pd {rn-sae}, %zmm10, %zmm1, %zmm0
+        vfmadd213pd {rn-sae}, %zmm11, %zmm1, %zmm0
+        vfmadd213pd {rn-sae}, %zmm12, %zmm1, %zmm0
+        vfmadd213pd {rn-sae}, %zmm13, %zmm1, %zmm0
+        vfmadd213pd {rn-sae}, %zmm14, %zmm1, %zmm0
+        vfmadd213pd {rn-sae}, %zmm3, %zmm1, %zmm0
+        vmovups   128(%rsp), %zmm3
+        vfmadd213pd {rn-sae}, %zmm15, %zmm1, %zmm0
+        vfmadd213pd {rn-sae}, %zmm2, %zmm1, %zmm0
+        vmovups   192(%rsp), %zmm2
+        vfmadd213pd {rn-sae}, %zmm2, %zmm1, %zmm0
+        vfmadd213pd {rn-sae}, %zmm3, %zmm1, %zmm0
+        vorpd     64(%rsp), %zmm0, %zmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   (%rsp), %zmm1
+        vmovups   %zmm0, 128(%rsp)
+        vmovups   %zmm1, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -304; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -312; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xfe, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -320; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -304; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -312; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xfe, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -320; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     64(%rsp,%r14,8), %xmm0
+        call      tanh@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 128(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN8v_tanh_skx)
+
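+/* The special-value callout above is, in effect, the following scalar C
+ * sketch (illustrative only; 'mask' stands for the range mask kept in
+ * %r13d and 'input'/'result' for the vectors saved at 64(%rsp)/128(%rsp)):
+ *
+ *     #include <math.h>
+ *
+ *     static void
+ *     tanh_callout (unsigned int mask, const double input[8], double result[8])
+ *     {
+ *       for (int i = 0; i < 8; i++)       // one bit per vector lane
+ *         if (mask & (1u << i))           // lane needs the scalar path
+ *           result[i] = tanh (input[i]);  // scalar call via tanh@PLT
+ *     }
+ */
+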
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_dtanh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(64)) VUINT32 _dC[16][2];
+        __declspec(align(64)) VUINT32 _dP0[16][2];
+        __declspec(align(64)) VUINT32 _dP1[16][2];
+        __declspec(align(64)) VUINT32 _dP2[16][2];
+        __declspec(align(64)) VUINT32 _dP3[16][2];
+        __declspec(align(64)) VUINT32 _dP4[16][2];
+        __declspec(align(64)) VUINT32 _dP5[16][2];
+        __declspec(align(64)) VUINT32 _dP6[16][2];
+        __declspec(align(64)) VUINT32 _dP7[16][2];
+        __declspec(align(64)) VUINT32 _dP8[16][2];
+        __declspec(align(64)) VUINT32 _dP9[16][2];
+        __declspec(align(64)) VUINT32 _dP10[16][2];
+        __declspec(align(64)) VUINT32 _dP11[16][2];
+        __declspec(align(64)) VUINT32 _dP12[16][2];
+        __declspec(align(64)) VUINT32 _dP13[16][2];
+        __declspec(align(64)) VUINT32 _dP14[16][2];
+        __declspec(align(64)) VUINT32 _dP15[16][2];
+        __declspec(align(64)) VUINT32 _dP16[16][2];
+        __declspec(align(64)) VUINT32 _dP17[16][2];
+        __declspec(align(64)) VUINT32 _iExpMantMask_UISA[16][1];
+        __declspec(align(64)) VUINT32 _iMinIdxOfsMask_UISA[16][1];
+        __declspec(align(64)) VUINT32 _iMaxIdxMask_UISA[16][1];
+        __declspec(align(64)) VUINT32 _dbSignMask[8][2];
+        __declspec(align(64)) VUINT32 _dbAbsMask[8][2];
+        __declspec(align(64)) VUINT32 _iExpMantMask[16][1];
+        __declspec(align(64)) VUINT32 _iExpMask[16][1];
+        __declspec(align(64)) VUINT32 _iMinIdxOfsMask[16][1];
+        __declspec(align(64)) VUINT32 _iMaxIdxMask[16][1];
+} __svml_dtanh_data_internal;
+#endif
+__svml_dtanh_data_internal:
+        /*== _dC ==*/
+        .quad 0x0000000000000000, 0x3fcc000000000000, 0x3fd4000000000000, 0x3fdc000000000000
+        .quad 0x3fe4000000000000, 0x3fec000000000000, 0x3ff4000000000000, 0x3ffc000000000000
+        .quad 0x4004000000000000, 0x400c000000000000, 0x4014000000000000, 0x401c000000000000
+        .quad 0x4024000000000000, 0x402c000000000000, 0x4034000000000000, 0x0000000000000000
+        /*== p0 ==*/
+        .align 64
+        .quad 0x0000000000000000, 0x3fcb8fd0416a7c92, 0x3fd35f98a0ea650e, 0x3fda5729ee488037
+        .quad 0x3fe1bf47eabb8f95, 0x3fe686650b8c2015, 0x3feb2523bb6b2dee, 0x3fee1fbf97e33527
+        .quad 0x3fef9258260a71c2, 0x3feff112c63a9077, 0x3fefff419668df11, 0x3feffffc832750f2
+        .quad 0x3feffffffdc96f35, 0x3fefffffffffcf58, 0x3ff0000000000000, 0x3ff0000000000000
+        /*== p1 ==*/
+        .align 64
+        .quad 0x0000000000000000, 0x3c65e23ebcd3bcbe, 0xbc4c600bac3adf00, 0x3c6c44091785d040
+        .quad 0x3c8221d7a6e3674b, 0x3c69f89d2cf6b85c, 0x3c73b3e9ec0b8f1c, 0xbc7f8d4b0428aada
+        .quad 0xbc7c52d880cf43c0, 0x3c7dd36e37096480, 0x3c7b4f6380c442ca, 0xbc729755de470096
+        .quad 0x3c84cf852845efbd, 0x3c6fc4fb440a5378, 0xbc63981083b55870, 0x0000000000000000
+        /*== p2 ==*/
+        .align 64
+        .quad 0x3ff0000000000000, 0x3fee842ca3f08532, 0x3fed11574af58f1b, 0x3fea945b9c24e4f9
+        .quad 0x3fe6284c3374f815, 0x3fe02500a09f8d6e, 0x3fd1f25131e3a8c0, 0x3fbd22ca1c24a139
+        .quad 0x3f9b3afe1fba5c76, 0x3f6dd37d19b22b21, 0x3f27ccec13a9ef96, 0x3ecbe6c3f33250ae
+        .quad 0x3e41b4865394f75f, 0x3d8853f01bda5f28, 0x3c73953c0197ef58, 0x0000000000000000
+        /*== p3 ==*/
+        .align 64
+        .quad 0xbbf0b3ea3fdfaa19, 0xbfca48aaeb53bc21, 0xbfd19921f4329916, 0xbfd5e0f09bef8011
+        .quad 0xbfd893b59c35c882, 0xbfd6ba7cb7576538, 0xbfce7291743d7555, 0xbfbb6d85a01efb80
+        .quad 0xbf9addae58c7141a, 0xbf6dc59376c7aa19, 0xbf27cc5e74677410, 0xbecbe6c0e8b4cc87
+        .quad 0xbe41b486526b0565, 0xbd8853f01bef63a4, 0xbc73955be519be31, 0x0000000000000000
+        /*== p4 ==*/
+        .align 64
+        .quad 0xbfd5555555555555, 0xbfd183afc292ba11, 0xbfcc1a4b039c9bfa, 0xbfc16e1e6d8d0be6
+        .quad 0xbf92426c751e48a2, 0x3fb4f152b2bad124, 0x3fbbba40cbef72be, 0x3fb01ba038be6a3d
+        .quad 0x3f916df44871efc8, 0x3f63c6869dfc8870, 0x3f1fb9aef915d828, 0x3ec299d1e27c6e11
+        .quad 0x3e379b5ddcca334c, 0x3d8037f57bc62c9a, 0x3c6a2d4b50a2cff7, 0x0000000000000000
+        /*== p5 ==*/
+        .align 64
+        .quad 0xbce6863ee44ed636, 0x3fc04dcd0476c75e, 0x3fc43d3449a80f08, 0x3fc5c26f3699b7e7
+        .quad 0x3fc1a686f6ab2533, 0x3faf203c316ce730, 0xbf89c7a02788557c, 0xbf98157e26e0d541
+        .quad 0xbf807b55c1c7d278, 0xbf53a18d5843190f, 0xbf0fb6bbc89b1a5b, 0xbeb299c9c684a963
+        .quad 0xbe279b5dd4fb3d01, 0xbd7037f57ae72aa6, 0xbc5a2ca2bba78e86, 0x0000000000000000
+        /*== p6 ==*/
+        .align 64
+        .quad 0x3fc1111111112ab5, 0x3fb5c19efdfc08ad, 0x3fa74c98dc34fbac, 0xbf790d6a8eff0a77
+        .quad 0xbfac3c021789a786, 0xbfae2196b7326859, 0xbf93a7a011ff8c2a, 0x3f6e4709c7e8430e
+        .quad 0x3f67682afa611151, 0x3f3ef2ee77717cbf, 0x3ef95a4482f180b7, 0x3e9dc2c27da3b603
+        .quad 0x3e12e2afd9f7433e, 0x3d59f320348679ba, 0x3c44b61d9bbcc940, 0x0000000000000000
+        /*== p7 ==*/
+        .align 64
+        .quad 0xbda1ea19ddddb3b4, 0xbfb0b8df995ce4df, 0xbfb2955cf41e8164, 0xbfaf9d05c309f7c6
+        .quad 0xbf987d27ccff4291, 0x3f8b2ca62572b098, 0x3f8f1cf6c7f5b00a, 0x3f60379811e43dd5
+        .quad 0xbf4793826f78537e, 0xbf2405695e36240f, 0xbee0e08de39ce756, 0xbe83d709ba5f714e
+        .quad 0xbdf92e3fc5ee63e0, 0xbd414cc030f2110e, 0xbc2ba022e8d82a87, 0x0000000000000000
+        /*== p8 ==*/
+        .align 64
+        .quad 0xbfaba1ba1990520b, 0xbf96e37bba52f6fc, 0x3ecff7df18455399, 0x3f97362834d33a4e
+        .quad 0x3f9e7f8380184b45, 0x3f869543e7c420d4, 0xbf7326bd4914222a, 0xbf5fc15b0a9d98fa
+        .quad 0x3f14cffcfa69fbb6, 0x3f057e48e5b79d10, 0x3ec33b66d7d77264, 0x3e66ac4e578b9b10
+        .quad 0x3ddcc74b8d3d5c42, 0x3d23c589137f92b4, 0x3c107f8e2c8707a1, 0x0000000000000000
+        /*== p9 ==*/
+        .align 64
+        .quad 0xbe351ca7f096011f, 0x3f9eaaf3320c3851, 0x3f9cf823fe761fc1, 0x3f9022271754ff1f
+        .quad 0xbf731fe77c9c60af, 0xbf84a6046865ec7d, 0xbf4ca3f1f2b9192b, 0x3f4c77dee0afd227
+        .quad 0x3f04055bce68597a, 0xbee2bf0cb4a71647, 0xbea31eaafe73efd5, 0xbe46abb02c4368ed
+        .quad 0xbdbcc749ca8079dd, 0xbd03c5883836b9d2, 0xbbf07a5416264aec, 0x0000000000000000
+        /*== p10 ==*/
+        .align 64
+        .quad 0x3f9664f94e6ac14e, 0xbf94d3343bae39dd, 0xbf7bc748e60df843, 0xbf8c89372b43ba85
+        .quad 0xbf8129a092de747a, 0x3f60c85b4d538746, 0x3f5be9392199ec18, 0xbf2a0c68a4489f10
+        .quad 0xbf00462601dc2faa, 0x3eb7b6a219dea9f4, 0x3e80cbcc8d4c5c8a, 0x3e2425bb231a5e29
+        .quad 0x3d9992a4beac8662, 0x3ce191ba5ed3fb67, 0x3bc892450bad44c4, 0x0000000000000000
+        /*== p11 ==*/
+        .align 64
+        .quad 0xbea8c4c1fd7852fe, 0xbfccce16b1046f13, 0xbf81a16f224bb7b6, 0xbf62cbf00406bc09
+        .quad 0x3f75b29bb02cf69b, 0x3f607df0f9f90c17, 0xbf4b852a6e0758d5, 0xbf0078c63d1b8445
+        .quad 0x3eec12eadd55be7a, 0xbe6fa600f593181b, 0xbe5a3c935dce3f7d, 0xbe001c6d95e3ae96
+        .quad 0xbd74755a00ea1fd3, 0xbcbc1c6c063bb7ac, 0xbba3be9a4460fe00, 0x0000000000000000
+        /*== p12 ==*/
+        .align 64
+        .quad 0xbf822404577aa9dd, 0x403d8b07f7a82aa3, 0xbf9f44ab92fbab0a, 0x3fb2eac604473d6a
+        .quad 0x3f45f87d903aaac8, 0xbf5e104671036300, 0x3f19bc98ddf0f340, 0x3f0d4304bc9246e8
+        .quad 0xbed13c415f7b9d41, 0xbe722b8d9720cdb0, 0x3e322666d739bec0, 0x3dd76a553d7e7918
+        .quad 0x3d4de0fa59416a39, 0x3c948716cf3681b4, 0x3b873f9f2d2fda99, 0x0000000000000000
+        /*== p13 ==*/
+        .align 64
+        .quad 0xbefdd99a221ed573, 0x4070593a3735bab4, 0xbfccab654e44835e, 0x3fd13ed80037dbac
+        .quad 0xbf6045b9076cc487, 0x3f2085ee7e8ac170, 0x3f23524622610430, 0xbeff12a6626911b4
+        .quad 0x3eab9008bca408af, 0x3e634df71865f620, 0xbe05bb1bcf83ca73, 0xbdaf2ac143fb6762
+        .quad 0xbd23eae52a3dbf57, 0xbc6b5e3e9ca0955e, 0xbb5eca68e2c1ba2e, 0x0000000000000000
+        /*== p14 ==*/
+        .align 64
+        .quad 0x3f6e3be689423841, 0xc0d263511f5baac1, 0x40169f73b15ebe5c, 0xc025c1dd41cd6cb5
+        .quad 0xbf58fd89fe05e0d1, 0x3f73f7af01d5af7a, 0xbf1e40bdead17e6b, 0x3ee224cd6c4513e5
+        .quad 0xbe24b645e68eeaa3, 0xbe4abfebfb72bc83, 0x3dd51c38f8695ed3, 0x3d8313ac38c6832b
+        .quad 0x3cf7787935626685, 0x3c401ffc49c6bc29, 0xbabf0b21acfa52ab, 0x0000000000000000
+        /*== p15 ==*/
+        .align 64
+        .quad 0xbf2a1306713a4f3a, 0xc1045e509116b066, 0x4041fab9250984ce, 0xc0458d090ec3de95
+        .quad 0xbf74949d60113d63, 0x3f7c9fd6200d0ade, 0x3f02cd40e0ad0a9f, 0xbe858ab8e019f311
+        .quad 0xbe792fa6323b7cf8, 0x3e2df04d67876402, 0xbd95c72be95e4d2c, 0xbd55a89c30203106
+        .quad 0xbccad6b3bb9eff65, 0xbc12705ccd3dd884, 0xba8e0a4c47ae75f5, 0x0000000000000000
+        /*== p16 ==*/
+        .align 64
+        .quad 0xbf55d7e76dc56871, 0x41528c38809c90c7, 0xc076d57fb5190b02, 0x4085f09f888f8ada
+        .quad 0x3fa246332a2fcba5, 0xbfb29d851a896fcd, 0x3ed9065ae369b212, 0xbeb8e1ba4c98a030
+        .quad 0x3e6ffd0766ad4016, 0xbe0c63c29f505f5b, 0xbd7fab216b9e0e49, 0x3d2826b62056aa27
+        .quad 0x3ca313e31762f523, 0x3bea37aa21895319, 0x3ae5c7f1fd871496, 0x0000000000000000
+        /*== p17 ==*/
+        .align 64
+        .quad 0x3f35e67ab76a26e7, 0x41848ee0627d8206, 0xc0a216d618b489ec, 0x40a5b89107c8af4f
+        .quad 0x3fb69d8374520eda, 0xbfbded519f981716, 0xbef02d288b5b3371, 0x3eb290981209c1a6
+        .quad 0xbe567e924bf5ff6e, 0x3de3f7f7de6b0eb6, 0x3d69ed18bae3ebbc, 0xbcf7534c4f3dfa71
+        .quad 0xbc730b73f1eaff20, 0xbbba2cff8135d462, 0xbab5a71b5f7d9035, 0x0000000000000000
+        .align 64
+        .long 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000           /* _iExpMantMask_UISA     */
+        .align 64
+        .long 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000           /* _iMinIdxOfsMask_UISA   */
+        .align 64
+        .long 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000           /* _iMaxIdxMask_UISA      */
+        .align 64
+        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000   /* _dbSignMask       */
+        .align 64
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff   /* _dbAbsMask        */
+        .align 64
+        .long 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000           /* _iExpMantMask     */
+        .align 64
+        .long 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000           /* _iExpMask         */
+        .align 64
+        .long 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000           /* _iMinIdxOfsMask   */
+        .align 64
+        .long 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000           /* _iMaxIdxMask      */
+        .align 64
+        .type	__svml_dtanh_data_internal,@object
+        .size	__svml_dtanh_data_internal,.-__svml_dtanh_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core-avx2.S
new file mode 100644
index 0000000000..76bb22229e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized tanhf.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN16v_tanhf _ZGVeN16v_tanhf_avx2_wrapper
+#include "../svml_s_tanhf16_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core.c
new file mode 100644
index 0000000000..cec4c7ed74
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized tanhf, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN16v_tanhf
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN16v_tanhf, __GI__ZGVeN16v_tanhf,
+	       __redirect__ZGVeN16v_tanhf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core_avx512.S
new file mode 100644
index 0000000000..b6bdf97cc5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core_avx512.S
@@ -0,0 +1,381 @@
+/* Function tanhf vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   NOTE: Since the hyperbolic tangent function is odd
+ *         (tanh(x) = -tanh(-x)), the algorithm below deals with the absolute
+ *         value of the argument |x|: tanh(x) = sign(x) * tanh(|x|)
+ *
+ *   We use a table lookup method to compute tanh(|x|).
+ *   The basic idea is to split the input range into a number of subintervals
+ *   and to approximate tanh(.) with a polynomial on each of them.
+ *
+ *   IEEE SPECIAL CONDITIONS:
+ *   x = [+,-]0, r = [+,-]0
+ *   x = +Inf,   r = +1
+ *   x = -Inf,   r = -1
+ *   x = QNaN,   r = QNaN
+ *   x = SNaN,   r = QNaN
+ *
+ *
+ *   ALGORITHM DETAILS
+ *   We handle special values in a callout function, aside from main path
+ *   computations. "Special" for this algorithm are:
+ *   INF, NAN, |x| > HUGE_THRESHOLD
+ *
+ *
+ *   Main path computations are organized as follows:
+ *   we split the interval [0, SATURATION_THRESHOLD)
+ *   into a number of subintervals.  On each subinterval we approximate tanh(.)
+ *   with a minimax polynomial of pre-defined degree.  The polynomial
+ *   coefficients are computed beforehand and stored in a table.  We also use
+ *
+ *       y := |x| + B,
+ *
+ *   where B depends on the subinterval and is used to bring the argument
+ *   closer to zero.
+ *   We also add a large fake interval [SATURATION_THRESHOLD, HUGE_THRESHOLD]
+ *   whose coefficients encode 1.0 + 0.0*y + 0.0*y^2 ... - just to preserve
+ *   the main path computation logic while returning 1.0 for all its arguments.
+ *
+ *   Hence the reconstruction is as follows:
+ *   we extract the polynomial and range reduction coefficients
+ *        (Pj and B) corresponding to the subinterval to which |x| belongs,
+ *        and return
+ *
+ *       r := sign(x) * (P0 + P1 * y + ... + Pn * y^n)
+ *
+ *   NOTE: a multiprecision technique is used to multiply and sum the first
+ *         K terms of the polynomial, so each Pj, j = 0..K is stored in the
+ *         table as a pair of target-precision numbers (Pj and PLj) to
+ *         achieve wider-than-target precision.
+ *
+ *
+ */
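+
+/* For reference, the subinterval index computed below with
+ * vpandd/vpsubd/vpmaxsd/vpminsd/vpsrld corresponds to the following scalar
+ * C sketch (illustrative only; the constants are the _UISA mask values from
+ * the data table at the end of this file, and huge/NaN lanes are handled by
+ * the callout regardless of the index chosen here):
+ *
+ *     #include <stdint.h>
+ *     #include <string.h>
+ *
+ *     static int
+ *     tanhf_index (float x)
+ *     {
+ *       uint32_t bits;
+ *       memcpy (&bits, &x, sizeof (bits));              // raw float bits
+ *       int32_t t = (int32_t) (bits & 0x7fe00000) - 0x3d400000;
+ *       if (t < 0)
+ *         t = 0;                                        // clamp from below
+ *       if (t > 0x03e00000)
+ *         t = 0x03e00000;                               // clamp from above
+ *       return t >> 21;                                 // index in [0, 31]
+ *     }
+ */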
+
+/* Offsets for data table __svml_stanh_data_internal
+ */
+#define _sC                           	0
+#define _sP0                          	128
+#define _sP2                          	256
+#define _sP3                          	384
+#define _sP4                          	512
+#define _sP5                          	640
+#define _sP6                          	768
+#define _sP7                          	896
+#define _iExpMantMask_UISA            	1024
+#define _iMinIdxOfsMask_UISA          	1088
+#define _iMaxIdxMask_UISA             	1152
+#define _sSignMask                    	1216
+#define _sAbsMask                     	1280
+#define _iExpMantMask                 	1344
+#define _iExpMask                     	1408
+#define _iMinIdxOfsMask               	1472
+#define _iMaxIdxMask                  	1536
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN16v_tanhf_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovaps   %zmm0, %zmm1
+        vmovups   __svml_stanh_data_internal(%rip), %zmm9
+        vmovups   _sP6+__svml_stanh_data_internal(%rip), %zmm11
+        vmovups   _sP5+__svml_stanh_data_internal(%rip), %zmm12
+        vmovups   _sP4+__svml_stanh_data_internal(%rip), %zmm13
+        vmovups   _sP3+__svml_stanh_data_internal(%rip), %zmm14
+        vmovups   _sP2+__svml_stanh_data_internal(%rip), %zmm15
+        vpternlogd $255, %zmm2, %zmm2, %zmm2
+        vandps    _sAbsMask+__svml_stanh_data_internal(%rip), %zmm1, %zmm8
+        vandps    _sSignMask+__svml_stanh_data_internal(%rip), %zmm1, %zmm0
+
+/* Here huge arguments, INF and NaNs are filtered out to the callout. */
+        vpandd    _iExpMantMask_UISA+__svml_stanh_data_internal(%rip), %zmm1, %zmm3
+        vpsubd    _iMinIdxOfsMask_UISA+__svml_stanh_data_internal(%rip), %zmm3, %zmm4
+        vpcmpd    $2, _iExpMask+__svml_stanh_data_internal(%rip), %zmm3, %k1
+
+/*
+ *  Small-table-specific variables
+ *  Constant loading
+ */
+        vpxord    %zmm5, %zmm5, %zmm5
+
+/* if VMIN, VMAX is defined for I type */
+        vpmaxsd   %zmm5, %zmm4, %zmm6
+        vpminsd   _iMaxIdxMask_UISA+__svml_stanh_data_internal(%rip), %zmm6, %zmm7
+        vpsrld    $21, %zmm7, %zmm10
+        vmovups   _sP7+__svml_stanh_data_internal(%rip), %zmm4
+        vpermt2ps _sC+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm9
+        vpermt2ps _sP6+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm11
+        vpermt2ps _sP7+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm4
+        vpermt2ps _sP5+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm12
+        vpermt2ps _sP4+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm13
+        vpermt2ps _sP3+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm14
+        vpermt2ps _sP2+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm15
+        vpandnd   %zmm3, %zmm3, %zmm2{%k1}
+        vptestmd  %zmm2, %zmm2, %k0
+        vmovups   _sP0+__svml_stanh_data_internal(%rip), %zmm3
+        vsubps    {rn-sae}, %zmm9, %zmm8, %zmm2
+        kmovw     %k0, %edx
+        vfmadd213ps {rn-sae}, %zmm11, %zmm2, %zmm4
+        vpermt2ps _sP0+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm3
+        vfmadd213ps {rn-sae}, %zmm12, %zmm2, %zmm4
+        vfmadd213ps {rn-sae}, %zmm13, %zmm2, %zmm4
+        vfmadd213ps {rn-sae}, %zmm14, %zmm2, %zmm4
+        vfmadd213ps {rn-sae}, %zmm15, %zmm2, %zmm4
+        vfmadd213ps {rn-sae}, %zmm3, %zmm2, %zmm4
+        vorps     %zmm0, %zmm4, %zmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm1, 64(%rsp)
+        vmovups   %zmm0, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $16, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     64(%rsp,%r14,4), %xmm0
+        call      tanhf@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 128(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN16v_tanhf_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_stanh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(64)) VUINT32 _sC[32][1];
+        __declspec(align(64)) VUINT32 _sP0[32][1];
+        __declspec(align(64)) VUINT32 _sP2[32][1];
+        __declspec(align(64)) VUINT32 _sP3[32][1];
+        __declspec(align(64)) VUINT32 _sP4[32][1];
+        __declspec(align(64)) VUINT32 _sP5[32][1];
+        __declspec(align(64)) VUINT32 _sP6[32][1];
+        __declspec(align(64)) VUINT32 _sP7[32][1];
+        __declspec(align(64)) VUINT32 _iExpMantMask_UISA[16][1];
+        __declspec(align(64)) VUINT32 _iMinIdxOfsMask_UISA[16][1];
+        __declspec(align(64)) VUINT32 _iMaxIdxMask_UISA[16][1];
+        __declspec(align(64)) VUINT32 _sSignMask[16][1];
+        __declspec(align(64)) VUINT32 _sAbsMask[16][1];
+        __declspec(align(64)) VUINT32 _iExpMantMask[16][1];
+        __declspec(align(64)) VUINT32 _iExpMask[16][1];
+        __declspec(align(64)) VUINT32 _iMinIdxOfsMask[16][1];
+        __declspec(align(64)) VUINT32 _iMaxIdxMask[16][1];
+} __svml_stanh_data_internal;
+#endif
+__svml_stanh_data_internal:
+        /*== _sC ==*/
+        .long 0x00000000, 0x3d700000, 0x3d900000, 0x3db00000
+        .long 0x3dd00000, 0x3df00000, 0x3e100000, 0x3e300000
+        .long 0x3e500000, 0x3e700000, 0x3e900000, 0x3eb00000
+        .long 0x3ed00000, 0x3ef00000, 0x3f100000, 0x3f300000
+        .long 0x3f500000, 0x3f700000, 0x3f900000, 0x3fb00000
+        .long 0x3fd00000, 0x3ff00000, 0x40100000, 0x40300000
+        .long 0x40500000, 0x40700000, 0x40900000, 0x40b00000
+        .long 0x40d00000, 0x40f00000, 0x41100000, 0x00000000
+        /*== p0 ==*/
+        .align 64
+        .long 0x00000000, 0x3d6fb9c9, 0x3d8fc35f, 0x3daf9169
+        .long 0x3dcf49ab, 0x3deee849, 0x3e0f0ee8, 0x3e2e4984
+        .long 0x3e4d2f8e, 0x3e6bb32e, 0x3e8c51cd, 0x3ea96163
+        .long 0x3ec543f1, 0x3edfd735, 0x3f028438, 0x3f18abf0
+        .long 0x3f2bc480, 0x3f3bec1c, 0x3f4f2e5b, 0x3f613c53
+        .long 0x3f6ce37d, 0x3f743c4f, 0x3f7a5feb, 0x3f7dea85
+        .long 0x3f7f3b3d, 0x3f7fb78c, 0x3f7fefd4, 0x3f7ffdd0
+        .long 0x3f7fffb4, 0x3f7ffff6, 0x3f7fffff, 0x3f800000
+        /*== p2 ==*/
+        .align 64
+        .long 0x3f800000, 0x3f7f1f84, 0x3f7ebd11, 0x3f7e1e5f
+        .long 0x3f7d609f, 0x3f7c842d, 0x3f7b00e5, 0x3f789580
+        .long 0x3f75b8ad, 0x3f726fd9, 0x3f6cc59b, 0x3f63fb92
+        .long 0x3f59ff97, 0x3f4f11d7, 0x3f3d7573, 0x3f24f360
+        .long 0x3f0cbfe7, 0x3eec1a69, 0x3eb0a801, 0x3e6753a2
+        .long 0x3e132f1a, 0x3db7e7d3, 0x3d320845, 0x3c84d3d4
+        .long 0x3bc477b7, 0x3b10d3da, 0x3a01601e, 0x388c1a3b
+        .long 0x3717b0da, 0x35a43bce, 0x338306c6, 0x00000000
+        /*== p3 ==*/
+        .align 64
+        .long 0xb0343c7b, 0xbd6ee69d, 0xbd8f0da7, 0xbdae477d
+        .long 0xbdcd2a1f, 0xbdeba80d, 0xbe0c443b, 0xbe293cf3
+        .long 0xbe44f282, 0xbe5f3651, 0xbe81c7c0, 0xbe96d7ca
+        .long 0xbea7fb8e, 0xbeb50e9e, 0xbec12efe, 0xbec4be92
+        .long 0xbebce070, 0xbead510e, 0xbe8ef7d6, 0xbe4b8704
+        .long 0xbe083237, 0xbdaf7449, 0xbd2e1ec4, 0xbc83bf06
+        .long 0xbbc3e0b5, 0xbb10aadc, 0xba0157db, 0xb88c18f2
+        .long 0xb717b096, 0xb5a43bae, 0xb383012c, 0x00000000
+        /*== p4 ==*/
+        .align 64
+        .long 0xbeaaaaa5, 0xbeab0612, 0xbea7f01f, 0xbea4e120
+        .long 0xbea387b7, 0xbea15962, 0xbe9d57f7, 0xbe976b5a
+        .long 0xbe90230d, 0xbe880dff, 0xbe7479b3, 0xbe4c3d88
+        .long 0xbe212482, 0xbdeb8cba, 0xbd5e78ad, 0x3c6b5e6e
+        .long 0x3d839143, 0x3dc21ee1, 0x3de347af, 0x3dcbec96
+        .long 0x3d99ef2d, 0x3d542ea1, 0x3cdde701, 0x3c2cca67
+        .long 0x3b81cb27, 0x3ac073a1, 0x39ac3032, 0x383a94d9
+        .long 0x36ca081d, 0x355abd4c, 0x332b3cb6, 0x00000000
+        /*== p5 ==*/
+        .align 64
+        .long 0xb76dd6b9, 0xbe1c276d, 0x3c1dcf2f, 0x3dc1a78d
+        .long 0x3d96f985, 0x3da2b61b, 0x3dc13397, 0x3dd2f670
+        .long 0x3df48a0a, 0x3e06c5a8, 0x3e1a3aba, 0x3e27c405
+        .long 0x3e2e78d0, 0x3e2c3e44, 0x3e1d3097, 0x3df4a8f4
+        .long 0x3da38508, 0x3d31416a, 0x3b562657, 0xbcaeeac9
+        .long 0xbcce9419, 0xbcaaeac4, 0xbc49e7d0, 0xbba71ddd
+        .long 0xbb003b0e, 0xba3f9a05, 0xb92c08a7, 0xb7ba9232
+        .long 0xb64a0b0f, 0xb4dac169, 0xb2ab78ac, 0x00000000
+        /*== p6 ==*/
+        .align 64
+        .long 0x3e0910e9, 0x43761143, 0x4165ecdc, 0xc190f756
+        .long 0xc08c097d, 0xc02ba813, 0xbf7f6bda, 0x3f2b1dc0
+        .long 0x3ece105d, 0x3f426a94, 0xbadb0dc4, 0x3da43b17
+        .long 0xbd51ab88, 0xbcaea23d, 0xbd3b6d8d, 0xbd6caaad
+        .long 0xbd795bed, 0xbd5fddda, 0xbd038f3b, 0xbc1cad63
+        .long 0x3abb4766, 0x3b95f10b, 0x3b825873, 0x3afaea66
+        .long 0x3a49f878, 0x39996bf3, 0x388f3e6c, 0x371bb0e3
+        .long 0x35a8a5e6, 0x34369b17, 0x322487b0, 0x00000000
+        /*== p7 ==*/
+        .align 64
+        .long 0xbc0e2f66, 0x460bda12, 0x43d638ef, 0xc3e11c3e
+        .long 0xc2baa4e9, 0xc249da2d, 0xc1859b82, 0x40dd5b57
+        .long 0x40494640, 0x40c730a8, 0xbf0f160e, 0x3e30e76f
+        .long 0xbea81387, 0xbdb26a1c, 0xbd351e57, 0xbb4c01a0
+        .long 0x3c1d7bfb, 0x3c722cd1, 0x3c973f1c, 0x3c33a31b
+        .long 0x3b862ef4, 0x3a27b3d0, 0xba3b5907, 0xba0efc22
+        .long 0xb97f9f0f, 0xb8c8af50, 0xb7bdddfb, 0xb64f2950
+        .long 0xb4e085b1, 0xb3731dfa, 0xb15a1f04, 0x00000000
+        .align 64
+        .long 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000           /* _iExpMantMask_UISA     */
+        .align 64
+        .long 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000           /* _iMinIdxOfsMask_UISA   */
+        .align 64
+        .long 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000           /* _iMaxIdxMask_UISA      */
+        .align 64
+        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000           /* _sSignMask        */
+        .align 64
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff           /* _sAbsMask         */
+        .align 64
+        .long 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000           /* _iExpMantMask     */
+        .align 64
+        .long 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000           /* _iExpMask         */
+        .align 64
+        .long 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000           /* _iMinIdxOfsMask   */
+        .align 64
+        .long 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000           /* _iMaxIdxMask      */
+        .align 64
+        .type	__svml_stanh_data_internal,@object
+        .size	__svml_stanh_data_internal,.-__svml_stanh_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core-sse2.S
new file mode 100644
index 0000000000..cd290db337
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized tanhf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN4v_tanhf _ZGVbN4v_tanhf_sse2
+#include "../svml_s_tanhf4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core.c
new file mode 100644
index 0000000000..2dcb1f3676
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized tanhf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN4v_tanhf
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN4v_tanhf, __GI__ZGVbN4v_tanhf,
+	       __redirect__ZGVbN4v_tanhf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core_sse4.S
new file mode 100644
index 0000000000..3a0ce20473
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core_sse4.S
@@ -0,0 +1,832 @@
+/* Function tanhf vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   NOTE: Since the hyperbolic tangent function is odd
+ *         (tanh(x) = -tanh(-x)), the algorithm below deals with the absolute
+ *         value of the argument |x|: tanh(x) = sign(x) * tanh(|x|)
+ *
+ *   We use a table lookup method to compute tanh(|x|).
+ *   The basic idea is to split the input range into a number of subintervals
+ *   and to approximate tanh(.) with a polynomial on each of them.
+ *
+ *   IEEE SPECIAL CONDITIONS:
+ *   x = [+,-]0, r = [+,-]0
+ *   x = +Inf,   r = +1
+ *   x = -Inf,   r = -1
+ *   x = QNaN,   r = QNaN
+ *   x = SNaN,   r = QNaN
+ *
+ *
+ *   ALGORITHM DETAILS
+ *   We handle special values in a callout function, aside from main path
+ *   computations. "Special" for this algorithm are:
+ *   INF, NAN, |x| > HUGE_THRESHOLD
+ *
+ *
+ *   Main path computations are organized as follows:
+ *   we split the interval [0, SATURATION_THRESHOLD)
+ *   into a number of subintervals.  On each subinterval we approximate tanh(.)
+ *   with a minimax polynomial of pre-defined degree.  The polynomial
+ *   coefficients are computed beforehand and stored in a table.  We also use
+ *
+ *       y := |x| + B,
+ *
+ *   where B depends on the subinterval and is used to bring the argument
+ *   closer to zero.
+ *   We also add a large fake interval [SATURATION_THRESHOLD, HUGE_THRESHOLD]
+ *   whose coefficients encode 1.0 + 0.0*y + 0.0*y^2 ... - just to preserve
+ *   the main path computation logic while returning 1.0 for all its arguments.
+ *
+ *   Hence the reconstruction is as follows:
+ *   we extract the polynomial and range reduction coefficients
+ *        (Pj and B) corresponding to the subinterval to which |x| belongs,
+ *        and return
+ *
+ *       r := sign(x) * (P0 + P1 * y + ... + Pn * y^n)
+ *
+ *   NOTE: a multiprecision technique is used to multiply and sum the first
+ *         K terms of the polynomial, so each Pj, j = 0..K is stored in the
+ *         table as a pair of target-precision numbers (Pj and PLj) to
+ *         achieve wider-than-target precision.
+ *
+ *
+ */
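+
+/* For reference, each lane of the evaluation below corresponds to the
+ * following scalar C sketch, using the per-subinterval coefficients
+ * A00..A03 documented in the _dbP table at the end of this file
+ * (illustrative only; subinterval selection and the special-value callout
+ * are omitted):
+ *
+ *     #include <math.h>
+ *
+ *     static float
+ *     tanhf_eval (float x, const double a[4])    // a = {A00, A01, A02, A03}
+ *     {
+ *       double y = fabsf (x);                    // evaluate in double
+ *       double r = a[0] + y * (a[1] + y * (a[2] + y * a[3]));
+ *       return (float) copysign (r, (double) x); // reinstate sign(x)
+ *     }
+ */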
+
+/* Offsets for data table __svml_stanh_data_internal
+ */
+#define _dbP                          	0
+#define _sSignMask                    	4288
+#define _sAbsMask                     	4304
+#define _iExpMantMask                 	4320
+#define _iExpMask                     	4336
+#define _iMinIdxOfsMask               	4352
+#define _iMaxIdxMask                  	4368
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN4v_tanhf_sse4)
+        subq      $72, %rsp
+        cfi_def_cfa_offset(80)
+        movaps    %xmm0, %xmm5
+
+/* Here huge arguments, INF and NaNs are filtered out to the callout. */
+        movdqu    _iExpMantMask+__svml_stanh_data_internal(%rip), %xmm9
+        lea       _dbP+16+__svml_stanh_data_internal(%rip), %r8
+        pand      %xmm5, %xmm9
+
+/* if VMIN, VMAX is defined for I type */
+        pxor      %xmm7, %xmm7
+        movdqa    %xmm9, %xmm6
+        psubd     _iMinIdxOfsMask+__svml_stanh_data_internal(%rip), %xmm9
+
+/*
+ *  Small-table-specific variables
+ *  Constant loading
+ */
+        movdqu    _iMaxIdxMask+__svml_stanh_data_internal(%rip), %xmm10
+        movdqa    %xmm9, %xmm11
+        movdqa    %xmm9, %xmm8
+        pcmpgtd   %xmm10, %xmm11
+        pcmpgtd   %xmm7, %xmm8
+        movdqa    %xmm11, %xmm14
+        pand      %xmm8, %xmm9
+        andps     %xmm11, %xmm10
+        andnps    %xmm9, %xmm14
+        orps      %xmm10, %xmm14
+        psrld     $14, %xmm14
+        movd      %xmm14, %edx
+        pshufd    $1, %xmm14, %xmm12
+        pshufd    $2, %xmm14, %xmm13
+        movd      %xmm12, %ecx
+        pshufd    $3, %xmm14, %xmm15
+        movups    _sAbsMask+__svml_stanh_data_internal(%rip), %xmm3
+        movslq    %edx, %rdx
+        andps     %xmm5, %xmm3
+        movslq    %ecx, %rcx
+        pcmpgtd   _iExpMask+__svml_stanh_data_internal(%rip), %xmm6
+        movd      %xmm13, %esi
+        movups    -16(%rdx,%r8), %xmm2
+        movaps    %xmm2, %xmm0
+        movd      %xmm15, %edi
+        movmskps  %xmm6, %eax
+        movups    -16(%rcx,%r8), %xmm6
+        unpcklpd  %xmm6, %xmm0
+        unpckhpd  %xmm6, %xmm2
+        cvtps2pd  %xmm3, %xmm6
+        movhlps   %xmm3, %xmm3
+        cvtps2pd  %xmm3, %xmm3
+        movslq    %esi, %rsi
+        movslq    %edi, %rdi
+        movups    (%rcx,%r8), %xmm8
+        movups    (%rdx,%r8), %xmm12
+        movups    (%rsi,%r8), %xmm13
+        movaps    %xmm12, %xmm10
+        movups    (%rdi,%r8), %xmm9
+        movaps    %xmm13, %xmm11
+        unpckhpd  %xmm8, %xmm12
+        unpckhpd  %xmm9, %xmm13
+        mulpd     %xmm6, %xmm12
+        mulpd     %xmm3, %xmm13
+        unpcklpd  %xmm8, %xmm10
+        unpcklpd  %xmm9, %xmm11
+        addpd     %xmm10, %xmm12
+        addpd     %xmm11, %xmm13
+        mulpd     %xmm6, %xmm12
+        mulpd     %xmm3, %xmm13
+        addpd     %xmm2, %xmm12
+        movups    -16(%rsi,%r8), %xmm1
+        movups    -16(%rdi,%r8), %xmm7
+        movaps    %xmm1, %xmm14
+        unpckhpd  %xmm7, %xmm1
+        addpd     %xmm1, %xmm13
+        mulpd     %xmm12, %xmm6
+        mulpd     %xmm13, %xmm3
+        addpd     %xmm0, %xmm6
+        unpcklpd  %xmm7, %xmm14
+        addpd     %xmm14, %xmm3
+        cvtpd2ps  %xmm6, %xmm0
+        cvtpd2ps  %xmm3, %xmm1
+        movups    _sSignMask+__svml_stanh_data_internal(%rip), %xmm4
+        movlhps   %xmm1, %xmm0
+        andps     %xmm5, %xmm4
+        orps      %xmm4, %xmm0
+        testl     %eax, %eax
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 eax xmm0 xmm5
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        addq      $72, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(80)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm5, 32(%rsp)
+        movups    %xmm0, 48(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 eax
+
+        xorl      %edx, %edx
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -64)
+        movl      %edx, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -72)
+        movl      %eax, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -64)
+        cfi_offset(13, -72)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      tanhf@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 48(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN4v_tanhf_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_stanh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(16)) VUINT32 _dbP[(134*4)][2];
+        __declspec(align(16)) VUINT32 _sSignMask[4][1];
+        __declspec(align(16)) VUINT32 _sAbsMask[4][1];
+        __declspec(align(16)) VUINT32 _iExpMantMask[4][1];
+        __declspec(align(16)) VUINT32 _iExpMask[4][1];
+        __declspec(align(16)) VUINT32 _iMinIdxOfsMask[4][1];
+        __declspec(align(16)) VUINT32 _iMaxIdxMask[4][1];
+} __svml_stanh_data_internal;
+#endif
+__svml_stanh_data_internal:
+        /* Pol_000:  err=7.93e-09, x in [0.0000000; 0.0312500]. */
+        .quad 0x0000000000000000  /* A00 = +0.000000000000000000000e-01 */
+        .quad 0x3FF00000022C70EB  /* A01 = +1.000000008097283510367e+00 */
+        .quad 0xBED00E878CFFA194  /* A02 = -3.828228912518614443549e-06 */
+        .quad 0xBFD551766D0607A9  /* A03 = -3.330970825846813476723e-01 */
+        .quad 0xBE53D60CE3E4C297  /* A00 = -1.847383956330407336230e-08 */
+        .quad 0x3FF000024177CF5C  /* A01 = +1.000002151235967140508e+00 */
+        .quad 0xBF1758BC94A51A25  /* A02 = -8.906031613262943753568e-05 */
+        .quad 0xBFD53EAE67E0D4F0  /* A03 = -3.319507612644221339337e-01 */
+        .quad 0xBE5A9E47EF32D6FE  /* A00 = -2.479020984039698285657e-08 */
+        .quad 0x3FF00002DA983057  /* A01 = +1.000002721676556793895e+00 */
+        .quad 0xBF1BD953509E94AA  /* A02 = -1.062352277175377670507e-04 */
+        .quad 0xBFD53BDB562EEDD5  /* A03 = -3.317783681520414806876e-01 */
+        .quad 0xBE6191BBE496D294  /* A00 = -3.272532162914017685901e-08 */
+        .quad 0x3FF0000390492017  /* A01 = +1.000003398528866105366e+00 */
+        .quad 0xBF20727E814A57CE  /* A02 = -1.254825043772153972919e-04 */
+        .quad 0xBFD538DE060A6F22  /* A03 = -3.315959033004550748913e-01 */
+        .quad 0xBE66DAFA2A893A25  /* A00 = -4.257146219278012568149e-08 */
+        .quad 0x3FF0000465E08CD1  /* A01 = +1.000004194219219266770e+00 */
+        .quad 0xBF2341C765EF91B6  /* A02 = -1.469188600530365522261e-04 */
+        .quad 0xBFD535B6841FAF9E  /* A03 = -3.314033785124993469751e-01 */
+        .quad 0xBE6D5794E361E964  /* A00 = -5.465394929765249413434e-08 */
+        .quad 0x3FF000055EE2A0CB  /* A01 = +1.000005121846742950353e+00 */
+        .quad 0xBF265E6C77E66C8B  /* A02 = -1.706607253709506650304e-04 */
+        .quad 0xBFD53264DDCCEDA6  /* A03 = -3.312008062382240103361e-01 */
+        .quad 0xBE729C844D374A6E  /* A00 = -6.933284462462096107184e-08 */
+        .quad 0x3FF000067F019093  /* A01 = +1.000006195180536350264e+00 */
+        .quad 0xBF29CC5348D6DCE5  /* A02 = -1.968242326435338705130e-04 */
+        .quad 0xBFD52EE92121ED35  /* A03 = -3.309881995734998416658e-01 */
+        .quad 0xBE775AEA17EAA872  /* A00 = -8.700465590574974405858e-08 */
+        .quad 0x3FF00007CA1D66B8  /* A01 = +1.000007428656699559610e+00 */
+        .quad 0xBF2D8F5EB98A2637  /* A02 = -2.255252009216044881395e-04 */
+        .quad 0xBFD52B435CDF9128  /* A03 = -3.307655722585587376727e-01 */
+        .quad 0xBE7D04DA28C343F0  /* A00 = -1.081040272327705484794e-07 */
+        .quad 0x3FF000094443CCF5  /* A01 = +1.000008837375216730337e+00 */
+        .quad 0xBF30D5B76C947AE5  /* A02 = -2.568791210978817814332e-04 */
+        .quad 0xBFD52773A0776FAD  /* A03 = -3.305329386764651045105e-01 */
+        .quad 0xBE81DD77A12C51C7  /* A00 = -1.331054169875768625701e-07 */
+        .quad 0x3FF0000AF1AFD2DA  /* A01 = +1.000010437096696680470e+00 */
+        .quad 0xBF331230624C1680  /* A02 = -2.910011410651516805537e-04 */
+        .quad 0xBFD52379FC0B61DF  /* A03 = -3.302903138515186909352e-01 */
+        .quad 0xBE85D04EEEB3C435  /* A00 = -1.625247628488202841012e-07 */
+        .quad 0x3FF0000CD6C9B1F2  /* A01 = +1.000012244238970726684e+00 */
+        .quad 0xBF357F0742FADDD4  /* A02 = -3.280060509313874068243e-04 */
+        .quad 0xBFD51F56806D0E81  /* A03 = -3.300377134475880880338e-01 */
+        .quad 0xBE8A6E289B59681B  /* A00 = -1.969211333326924655065e-07 */
+        .quad 0x3FF0000EF8268F72  /* A01 = +1.000014275873550406715e+00 */
+        .quad 0xBF381E277A1B747A  /* A02 = -3.680082682942575423093e-04 */
+        .quad 0xBFD51B093F1D6FD4  /* A03 = -3.297751537663746734808e-01 */
+        .quad 0xBE8FCBC40EE9ABD5  /* A00 = -2.368983653301529373887e-07 */
+        .quad 0x3FF000115A883B6C  /* A01 = +1.000016549721943981410e+00 */
+        .quad 0xBF3AF17AC974B3D9  /* A02 = -4.111218235774406434303e-04 */
+        .quad 0xBFD516924A4C549C  /* A03 = -3.295026517456081105450e-01 */
+        .quad 0xBE92FFBC60A3F956  /* A00 = -2.831066871072026054144e-07 */
+        .quad 0x3FF0001402DCED8A  /* A01 = +1.000019084151832604590e+00 */
+        .quad 0xBF3DFAE9390C4801  /* A02 = -4.574603454311488280083e-04 */
+        .quad 0xBFD511F1B4D7DC3A  /* A03 = -3.292202249571719585575e-01 */
+        .quad 0xBE9690A22F96D5AD  /* A00 = -3.362443262393081632612e-07 */
+        .quad 0x3FF00016F63EFF5D  /* A01 = +1.000021898173108825247e+00 */
+        .quad 0xBF409E2C839605BB  /* A02 = -5.071370461992499986334e-04 */
+        .quad 0xBFD50D27924BEE00  /* A03 = -3.289278916051614487515e-01 */
+        .quad 0xBE9AA56C65E72A73  /* A00 = -3.970591019557469835586e-07 */
+        .quad 0x3FF0001A39F4A43E  /* A01 = +1.000025011433776978009e+00 */
+        .quad 0xBF425BD74C3D6667  /* A02 = -5.602647074553602319844e-04 */
+        .quad 0xBFD50833F6E1ABA2  /* A03 = -3.286256705238718156536e-01 */
+        .quad 0xBE9F4BD4FF1A83B0  /* A00 = -4.663500013744687071912e-07 */
+        .quad 0x3FF0001DD36F9EC2  /* A01 = +1.000028444215715683896e+00 */
+        .quad 0xBF44376634149405  /* A02 = -6.169556656102642569831e-04 */
+        .quad 0xBFD50316F77EDEE5  /* A03 = -3.283135811757190158922e-01 */
+        .quad 0xBEA3B625387BB079  /* A00 = -5.874486399249461304297e-07 */
+        .quad 0x3FF00023E14CFBA9  /* A01 = +1.000034217911642153709e+00 */
+        .quad 0xBF47392F923218D2  /* A02 = -7.087213783883111826306e-04 */
+        .quad 0xBFD4FB1FACDEB938  /* A03 = -3.278273761924483942209e-01 */
+        .quad 0xBEAA6E24F543500A  /* A00 = -7.876828740601738750574e-07 */
+        .quad 0x3FF0002D5C6E8412  /* A01 = +1.000043259679163742959e+00 */
+        .quad 0xBF4BAF02BD7FDD70  /* A02 = -8.448375110664940040861e-04 */
+        .quad 0xBFD4EFEE6527A7DE  /* A03 = -3.271442401734229177279e-01 */
+        .quad 0xBEB16E3EBE2157D0  /* A00 = -1.038947396133402500647e-06 */
+        .quad 0x3FF00038990FEE2F  /* A01 = +1.000053975962952312884e+00 */
+        .quad 0xBF50569481C574CB  /* A02 = -9.972048056490652716971e-04 */
+        .quad 0xBFD4E419278DA2B4  /* A03 = -3.264220129263251113372e-01 */
+        .quad 0xBEB6A7B6723165D4  /* A00 = -1.350350836279403750524e-06 */
+        .quad 0x3FF00045CAB4158E  /* A01 = +1.000066558657042303793e+00 */
+        .quad 0xBF531D7C9C849108  /* A02 = -1.166698160951775212202e-03 */
+        .quad 0xBFD4D7A0BB33B152  /* A03 = -3.256608799117844954552e-01 */
+        .quad 0xBEBD0EE2A8654AFD  /* A00 = -1.732000471561702711532e-06 */
+        .quad 0x3FF00055276F18D6  /* A01 = +1.000081209219890521211e+00 */
+        .quad 0xBF562FDBA3FB6C6C  /* A02 = -1.354183666925102939860e-03 */
+        .quad 0xBFD4CA85F1B93DB2  /* A03 = -3.248610363561638125773e-01 */
+        .quad 0xBEC269D4036A207E  /* A00 = -2.195047297096822741730e-06 */
+        .quad 0x3FF00066E7DA6E4E  /* A01 = +1.000098138500919997540e+00 */
+        .quad 0xBF5991499FC36B3A  /* A02 = -1.560518167983372759405e-03 */
+        .quad 0xBFD4BCC9A72283D6  /* A03 = -3.240226871658341556426e-01 */
+        .quad 0xBEC7154B6C09CFE1  /* A00 = -2.751729738565190291276e-06 */
+        .quad 0x3FF0007B47086B80  /* A01 = +1.000117566559055148900e+00 */
+        .quad 0xBF5D455433B4F8F4  /* A02 = -1.786548832412968197680e-03 */
+        .quad 0xBFD4AE6CC1BFE145  /* A03 = -3.231460468373550942722e-01 */
+        .quad 0xBECCA68CC64A0F8A  /* A00 = -3.415415948561670285790e-06 */
+        .quad 0x3FF00092827742F7  /* A01 = +1.000139722473418535387e+00 */
+        .quad 0xBF60A7BF15A527AF  /* A02 = -2.033112728132522705610e-03 */
+        .quad 0xBFD49F703214084C  /* A03 = -3.222313393636155876010e-01 */
+        .quad 0xBED19E68676B241B  /* A00 = -4.200644630977303616698e-06 */
+        .quad 0x3FF000ACDA037B26  /* A01 = +1.000164844146362863597e+00 */
+        .quad 0xBF62D99F836A02F8  /* A02 = -2.301036405072284102280e-03 */
+        .quad 0xBFD48FD4F2B91B28  /* A03 = -3.212787981359945810311e-01 */
+        .quad 0xBED57CF4B0C7AA54  /* A00 = -5.123164339408145209103e-06 */
+        .quad 0x3FF000CA8FD9E1A1  /* A01 = +1.000193178099017865534e+00 */
+        .quad 0xBF653A014548E686  /* A02 = -2.591135484433962181405e-03 */
+        .quad 0xBFD47F9C0844B38F  /* A03 = -3.202886658426046806447e-01 */
+        .quad 0xBEDA012B1B1A41E2  /* A00 = -6.199971197454598722328e-06 */
+        .quad 0x3FF000EBE868FDF4  /* A01 = +1.000224979259539459520e+00 */
+        .quad 0xBF67CA9427E0A544  /* A02 = -2.904214255086275467410e-03 */
+        .quad 0xBFD46EC6812ADB37  /* A03 = -3.192611943626845749655e-01 */
+        .quad 0xBEDF3EAC5BF12194  /* A00 = -7.449344990702664567927e-06 */
+        .quad 0x3FF001112A520784  /* A01 = +1.000260510744255704196e+00 */
+        .quad 0xBF6A8D01ABDA4DC4  /* A02 = -3.241065277345108255891e-03 */
+        .quad 0xBFD45D55759FFA4A  /* A03 = -3.181966446572103146551e-01 */
+        .quad 0xBEE2A541BC274267  /* A00 = -8.890883582164319970972e-06 */
+        .quad 0x3FF0013A9E5961F2  /* A01 = +1.000300043631906721231e+00 */
+        .quad 0xBF6D82ECD080C540  /* A02 = -3.602468994380686462264e-03 */
+        .quad 0xBFD44B4A0779C0AD  /* A03 = -3.170952866557950611259e-01 */
+        .quad 0xBEE61D97609A27F4  /* A00 = -1.054553560499505625520e-05 */
+        .quad 0x3FF001688F56A3AF  /* A01 = +1.000343856731187974773e+00 */
+        .quad 0xBF7056F8EFB683EC  /* A02 = -3.989193351487490407647e-03 */
+        .quad 0xBFD438A5620F0F74  /* A03 = -3.159573991399533543500e-01 */
+        .quad 0xBEEA145429EDD370  /* A00 = -1.243563138839952927732e-05 */
+        .quad 0x3FF0019B4A242A67  /* A01 = +1.000392236341804297339e+00 */
+        .quad 0xBF7207D31CA78D9B  /* A02 = -4.401993423445739288258e-03 */
+        .quad 0xBFD42568BA16E7CD  /* A03 = -3.147832696228050619602e-01 */
+        .quad 0xBEEE96370D52680F  /* A00 = -1.458491207477835326165e-05 */
+        .quad 0x3FF001D31D8E4115  /* A01 = +1.000445476009251821736e+00 */
+        .quad 0xBF73D4CC11EDC094  /* A02 = -4.841611050196221316400e-03 */
+        .quad 0xBFD411954D8664E7  /* A03 = -3.135731942252974469021e-01 */
+        .quad 0xBEF338C046215EF8  /* A00 = -1.833122622260562810219e-05 */
+        .quad 0x3FF00230C32C2EC1  /* A01 = +1.000534784691737621998e+00 */
+        .quad 0xBF76BD019BCC5DAF  /* A02 = -5.551344188254799492943e-03 */
+        .quad 0xBFD3F2C7156DC21E  /* A03 = -3.116929730668135389848e-01 */
+        .quad 0xBEF9B15EAE411EAE  /* A00 = -2.450261207822986676092e-05 */
+        .quad 0x3FF002C2DF057A4D  /* A01 = +1.000674124886830940184e+00 */
+        .quad 0xBF7B08CCD9AC1E30  /* A02 = -6.600189396301511801646e-03 */
+        .quad 0xBFD3C7A7A114FED8  /* A03 = -3.090609620157755976777e-01 */
+        .quad 0xBF00E36483C373B3  /* A00 = -3.221178528332122595812e-05 */
+        .quad 0x3FF0036F419480D7  /* A01 = +1.000838524028997644777e+00 */
+        .quad 0xBF7FD255D1777007  /* A02 = -7.768950679260206403087e-03 */
+        .quad 0xBFD39A453911D6CE  /* A03 = -3.062909180947429588215e-01 */
+        .quad 0xBF05DFA04DD12059  /* A00 = -4.172046622180685472624e-05 */
+        .quad 0x3FF00438B2A03D8D  /* A01 = +1.001030633695197069599e+00 */
+        .quad 0xBF828F8DBB4A9D10  /* A02 = -9.062869337255224921890e-03 */
+        .quad 0xBFD36AAB704697D9  /* A03 = -3.033856007044711255993e-01 */
+        .quad 0xBF0BF3E0C647DEFB  /* A00 = -5.331544597092331081714e-05 */
+        .quad 0x3FF005221063D36D  /* A01 = +1.001253189109060359741e+00 */
+        .quad 0xBF857A2CB3C96102  /* A02 = -1.048693584122917590862e-02 */
+        .quad 0xBFD338E65BBB4FEC  /* A03 = -3.003478904549854444639e-01 */
+        .quad 0xBF11A506ED7C9D31  /* A00 = -6.730894835681591541979e-05 */
+        .quad 0x3FF0062E4D0EA92A  /* A01 = +1.001508999829250345925e+00 */
+        .quad 0xBF88AB82C2761AF3  /* A02 = -1.204588085125866091241e-02 */
+        .quad 0xBFD305028D6BD206  /* A03 = -2.971807843271395688234e-01 */
+        .quad 0xBF1607C0922D9BF1  /* A00 = -8.403885708006799337092e-05 */
+        .quad 0x3FF007606C341961  /* A01 = +1.001800940198869449560e+00 */
+        .quad 0xBF8C25E6DA487BCF  /* A02 = -1.374416688582682892494e-02 */
+        .quad 0xBFD2CF0D0EE8F7B5  /* A03 = -2.938873906713255768075e-01 */
+        .quad 0xBF1B3A8480A0A16D  /* A00 = -1.038688061788578038307e-04 */
+        .quad 0x3FF008BB802D02D6  /* A01 = +1.002131939589323561535e+00 */
+        .quad 0xBF8FEB8AE99FD100  /* A02 = -1.558598065819483124983e-02 */
+        .quad 0xBFD297135BD0911B  /* A03 = -2.904709240558688843059e-01 */
+        .quad 0xBF20ABB9BDB75C65  /* A00 = -1.271881327357976163798e-04 */
+        .quad 0x3FF00A42A76D8CD1  /* A01 = +1.002504972472525901495e+00 */
+        .quad 0xBF91FF3D752BB9E6  /* A02 = -1.757522609380570560722e-02 */
+        .quad 0xBFD25D235C1F88B4  /* A03 = -2.869346999779154305799e-01 */
+        .quad 0xBF243D3254425461  /* A00 = -1.544116913733432829448e-04 */
+        .quad 0x3FF00BF909D1795E  /* A01 = +1.002923048355647051011e+00 */
+        .quad 0xBF94304E04D44942  /* A02 = -1.971551804042204897316e-02 */
+        .quad 0xBFD2214B5E61CFA6  /* A03 = -2.832821294498394371075e-01 */
+        .quad 0xBF286070011B61CE  /* A00 = -1.859795307186510085994e-04 */
+        .quad 0x3FF00DE1D5E1627E  /* A01 = +1.003389201612804537689e+00 */
+        .quad 0xBF9689D5F4163F59  /* A02 = -2.201017668045266231780e-02 */
+        .quad 0xBFD1E39A11C3B42C  /* A03 = -2.795167134743816728104e-01 */
+        .quad 0xBF2D250B366A79E8  /* A00 = -2.223564326486314902259e-04 */
+        .quad 0x3FF010003E134001  /* A01 = +1.003906481248123094829e+00 */
+        .quad 0xBF990C9FF91F6F81  /* A02 = -2.446222265267250853271e-02 */
+        .quad 0xBFD1A41E80084CDC  /* A03 = -2.756420374218586655246e-01 */
+        .quad 0xBF314DB5DDC2A30E  /* A00 = -2.640313157465248123865e-04 */
+        .quad 0x3FF012577608921B  /* A01 = +1.004477940624503018441e+00 */
+        .quad 0xBF9BB9626875B0C9  /* A02 = -2.707437288829409385849e-02 */
+        .quad 0xBFD162E80768A9D0  /* A03 = -2.716617653228725615122e-01 */
+        .quad 0xBF346A6133808864  /* A00 = -3.115165050094957730625e-04 */
+        .quad 0x3FF014EAAFCC88A3  /* A01 = +1.005106627192198898157e+00 */
+        .quad 0xBF9E90BEF9BF7419  /* A02 = -2.984903716411588595059e-02 */
+        .quad 0xBFD12006545F7FAD  /* A03 = -2.675796340899932457269e-01 */
+        .quad 0xBF37F180DC3848EA  /* A00 = -3.653468704395550778821e-04 */
+        .quad 0x3FF017BD19147861  /* A01 = +1.005795572250939295955e+00 */
+        .quad 0xBFA0C9A14C702E07  /* A02 = -3.278831537326359207851e-02 */
+        .quad 0xBFD0DB895B650092  /* A03 = -2.633994476818851682154e-01 */
+        .quad 0xBF3BEC6AAC6D7635  /* A00 = -4.260788377246944457107e-04 */
+        .quad 0x3FF01AD1D884E719  /* A01 = +1.006547780778822565040e+00 */
+        .quad 0xBFA260B2A1B1434A  /* A02 = -3.589399551186163439542e-02 */
+        .quad 0xBFD09581529E93D6  /* A03 = -2.591250712233067465817e-01 */
+        .quad 0xBF4164E26167882B  /* A00 = -5.308251737086202562063e-04 */
+        .quad 0x3FF01FEF14B62B81  /* A01 = +1.007796364693348545316e+00 */
+        .quad 0xBFA4EB014538AA42  /* A02 = -4.085544557559163403315e-02 */
+        .quad 0xBFD029D36FEAF41F  /* A03 = -2.525528519580024222613e-01 */
+        .quad 0xBF46F6FFF4E53DC8  /* A00 = -7.008313930700277652464e-04 */
+        .quad 0x3FF027CBB51CBBA0  /* A01 = +1.009715754956893363214e+00 */
+        .quad 0xBFA89DEC9FEC112E  /* A02 = -4.807986690687680864098e-02 */
+        .quad 0xBFCF2A99464D0DB4  /* A03 = -2.434875100390009317053e-01 */
+        .quad 0xBF4DCC9C4F66A4D9  /* A00 = -9.094012482836712945103e-04 */
+        .quad 0x3FF030E7CFCCD583  /* A01 = +1.011939822882909068014e+00 */
+        .quad 0xBFACAA3B95814081  /* A02 = -5.598627281199331645611e-02 */
+        .quad 0xBFCDF78F156BE7CF  /* A03 = -2.341173987004467604844e-01 */
+        .quad 0xBF5308ED74E5C7A6  /* A00 = -1.161796466103906435435e-03 */
+        .quad 0x3FF03B5986412ECB  /* A01 = +1.014489674026594512313e+00 */
+        .quad 0xBFB087EBA88DCC3F  /* A02 = -6.457398285947223148806e-02 */
+        .quad 0xBFCCBB9BD134862F  /* A03 = -2.244753619680052991736e-01 */
+        .quad 0xBF57FA23C00DF4B5  /* A00 = -1.463446533505758208674e-03 */
+        .quad 0x3FF0473558A1BCC0  /* A01 = +1.017384859292903342975e+00 */
+        .quad 0xBFB2E702BC6360EF  /* A02 = -7.383744334527241048871e-02 */
+        .quad 0xBFCB77D546379288  /* A03 = -2.145945160729250122955e-01 */
+        .quad 0xBF5DD12971557F71  /* A00 = -1.819887610814388068450e-03 */
+        .quad 0x3FF0548DDF5000A8  /* A01 = +1.020643112482540360020e+00 */
+        .quad 0xBFB571B63DA186E1  /* A02 = -8.376635555898871710045e-02 */
+        .quad 0xBFCA2D5202605148  /* A03 = -2.045080672838912594358e-01 */
+        .quad 0xBF6252B1AD5D4F17  /* A00 = -2.236697221556737096709e-03 */
+        .quad 0x3FF063738A910BF7  /* A01 = +1.024280110622155737232e+00 */
+        .quad 0xBFB8270C8E6B601B  /* A02 = -9.434584118878357184013e-02 */
+        .quad 0xBFC8DD27D950A07E  /* A03 = -1.942491351230763441116e-01 */
+        .quad 0xBF66470C91730CFC  /* A00 = -2.719425723258004842786e-03 */
+        .quad 0x3FF073F468FCF331  /* A01 = +1.028309259519300633556e+00 */
+        .quad 0xBFBB05C2952191E4  /* A02 = -1.055566419686964629854e-01 */
+        .quad 0xBFC7886A770DE2BD  /* A03 = -1.838505822486435070662e-01 */
+        .quad 0xBF6AD114AC8E98EC  /* A00 = -3.273525599485007861467e-03 */
+        .quad 0x3FF0861BF53E5226  /* A01 = +1.032741506559554434119e+00 */
+        .quad 0xBFBE0C4F9B461507  /* A02 = -1.173753503881763554650e-01 */
+        .quad 0xBFC6302A037CDE3A  /* A03 = -1.733448521642786954722e-01 */
+        .quad 0xBF6FFBDE2A6C2AF8  /* A00 = -3.904279630096648551207e-03 */
+        .quad 0x3FF099F2EB8E7DA3  /* A01 = +1.037585182326304034106e+00 */
+        .quad 0xBFC09C74D192DDF0  /* A02 = -1.297746680554463516444e-01 */
+        .quad 0xBFC4D571D8E3079F  /* A03 = -1.627638157861470424859e-01 */
+        .quad 0xBF72E8FDC0B952AA  /* A00 = -4.616728994353872309042e-03 */
+        .quad 0x3FF0AF7F273C9533  /* A01 = +1.042845872181101141152e+00 */
+        .quad 0xBFC244C512736F10  /* A02 = -1.427236881344176033792e-01 */
+        .quad 0xBFC379474F58B902  /* A03 = -1.521386277613104298645e-01 */
+        .quad 0xBF762EABAF17395B  /* A00 = -5.415602341101023557701e-03 */
+        .quad 0x3FF0C6C3886F63FB  /* A01 = +1.048526318502125631582e+00 */
+        .quad 0xBFC3FDF9918EA12A  /* A02 = -1.561881981590514389957e-01 */
+        .quad 0xBFC21CA89ECAB895  /* A03 = -1.414995932913753196036e-01 */
+        .quad 0xBF79D387CE5B2BAE  /* A00 = -6.305246822828998107258e-03 */
+        .quad 0x3FF0DFBFE2346376  /* A01 = +1.054626353847394337748e+00 */
+        .quad 0xBFC5C6DA43602620  /* A02 = -1.701309994680721970894e-01 */
+        .quad 0xBFC0C08BD8DB6631  /* A03 = -1.308760460731704100557e-01 */
+        .quad 0xBF7DDBA8E8DA9060  /* A00 = -7.289562037531366334164e-03 */
+        .quad 0x3FF0FA70F0D1B464  /* A01 = +1.061142864894713433443e+00 */
+        .quad 0xBFC79E18D92BAA7C  /* A02 = -1.845122394946264732241e-01 */
+        .quad 0xBFBECBBBF74C2669  /* A03 = -1.202962378266875381749e-01 */
+        .quad 0xBF81254E76EA25DA  /* A00 = -8.371937755572145950511e-03 */
+        .quad 0x3FF116D05835EBD0  /* A01 = +1.068069786618014660462e+00 */
+        .quad 0xBFC982539E2ED224  /* A02 = -1.992897531869327609755e-01 */
+        .quad 0xBFBC1B043C350159  /* A03 = -1.097872397413132278254e-01 */
+        .quad 0xBF8391ACBA863403  /* A00 = -9.555196230190082448686e-03 */
+        .quad 0x3FF134D4AA477FE2  /* A01 = +1.075398125794884141015e+00 */
+        .quad 0xBFCB7218609FEAFB  /* A02 = -2.144194099235717521079e-01 */
+        .quad 0xBFB970A16CB88329  /* A03 = -9.937485603633135211599e-02 */
+        .quad 0xBF87935088E48E8B  /* A00 = -1.151144902957603431692e-02 */
+        .quad 0x3FF1649892AD7DD3  /* A01 = +1.087059567413110938716e+00 */
+        .quad 0xBFCE6971DDE75409  /* A02 = -2.375929196847723912089e-01 */
+        .quad 0xBFB58291E88CB251  /* A03 = -8.402358939628952472223e-02 */
+        .quad 0xBF8DB3A62C325325  /* A00 = -1.450280973794233242702e-02 */
+        .quad 0x3FF1A9C900C6DEEA  /* A01 = +1.103951457056548068891e+00 */
+        .quad 0xBFD13DBC65B0E08E  /* A02 = -2.693930619311765140012e-01 */
+        .quad 0xBFB06696F62696D1  /* A03 = -6.406539449252625362252e-02 */
+        .quad 0xBF92583699F2E27A  /* A00 = -1.791463198307716858659e-02 */
+        .quad 0x3FF1F451B85AA9F0  /* A01 = +1.122148246892376022288e+00 */
+        .quad 0xBFD34FD5F8288180  /* A02 = -3.017477916164565954205e-01 */
+        .quad 0xBFA6FB692825B683  /* A03 = -4.488686194495718900788e-02 */
+        .quad 0xBF9641C26E673D6F  /* A00 = -2.173522757385398448959e-02 */
+        .quad 0x3FF24364DA5E2B07  /* A01 = +1.141453602790251542487e+00 */
+        .quad 0xBFD564A5A5EF5890  /* A02 = -3.342680092295120530821e-01 */
+        .quad 0xBF9B43712011A982  /* A03 = -2.662445791467283467968e-02 */
+        .quad 0xBF9A901038EC2F39  /* A00 = -2.594018313816024226548e-02 */
+        .quad 0x3FF2961356DFFEBA  /* A01 = +1.161639537196534011088e+00 */
+        .quad 0xBFD775EBB17198C7  /* A02 = -3.665723069046972759644e-01 */
+        .quad 0xBF833B1A926CD462  /* A03 = -9.390075295963199591975e-03 */
+        .quad 0xBF9F396A6A461B91  /* A00 = -3.049246095317987084727e-02 */
+        .quad 0x3FF2EB53BAEF534B  /* A01 = +1.182452898229899629357e+00 */
+        .quad 0xBFD97DABF8AD8BBD  /* A02 = -3.982953957076310058660e-01 */
+        .quad 0x3F7B8F6A3E0F8837  /* A03 = +6.728568086119371925713e-03 */
+        .quad 0xBFA21878590F8BAA  /* A00 = -3.534294211546946951064e-02 */
+        .quad 0x3FF34209790236E1  /* A01 = +1.203622315111197105253e+00 */
+        .quad 0xBFDB764C0E71BECB  /* A02 = -4.290952817018306997277e-01 */
+        .quad 0x3F962FE0C03F84C0  /* A03 = +2.166701482190513949888e-02 */
+        .quad 0xBFA4B36B9AD27ECC  /* A00 = -4.043136849327097492868e-02 */
+        .quad 0x3FF3990C5B12FC16  /* A01 = +1.224865298994477935679e+00 */
+        .quad 0xBFDD5AABB0D01390  /* A02 = -4.586590983092770912322e-01 */
+        .quad 0x3FA21DAF5CA162DB  /* A03 = +3.538272863142363083844e-02 */
+        .quad 0xBFA7645E4D7BF28B  /* A00 = -4.568762489177399105378e-02 */
+        .quad 0x3FF3EF2FD51C0D9F  /* A01 = +1.245895225962932562069e+00 */
+        .quad 0xBFDF26377E1B686E  /* A02 = -4.867075664057044503963e-01 */
+        .quad 0x3FA8803E756EE812  /* A03 = +4.785342391501513914509e-02 */
+        .quad 0xBFAA210925C64413  /* A00 = -5.103329263796054643398e-02 */
+        .quad 0x3FF44349F897D8E7  /* A01 = +1.266427966181760345066e+00 */
+        .quad 0xBFE06A7B02C6D8E2  /* A02 = -5.129981092675530707226e-01 */
+        .quad 0x3FAE3F194734F5D0  /* A03 = +5.907515520309980505687e-02 */
+        .quad 0xBFACDE48F8A19BBB  /* A00 = -5.638340029764018351832e-02 */
+        .quad 0x3FF49439D5466582  /* A01 = +1.286187966447272845727e+00 */
+        .quad 0xBFE131C7C1063DDC  /* A02 = -5.373266954429101183166e-01 */
+        .quad 0x3FB1ADEEC36AD805  /* A03 = +6.906025191241844940482e-02 */
+        .quad 0xBFAF905D8F585680  /* A00 = -6.164829611604449866036e-02 */
+        .quad 0x3FF4E0ED1FD27F99  /* A01 = +1.304913639360142818546e+00 */
+        .quad 0xBFE1E7A859DC1D3D  /* A02 = -5.595285182070380836095e-01 */
+        .quad 0x3FB3ED018E4642A1  /* A03 = +7.783517573831001679086e-02 */
+        .quad 0xBFB11595104160BA  /* A00 = -6.673556944713512906198e-02 */
+        .quad 0x3FF528650340490B  /* A01 = +1.322361958217302513319e+00 */
+        .quad 0xBFE28B14B40BC974  /* A02 = -5.794776455425521000109e-01 */
+        .quad 0x3FB5DF49F5BAF6D7  /* A03 = +8.543836831355676453281e-02 */
+        .quad 0xBFB2513A97344BA4  /* A00 = -7.155195418844911836587e-02 */
+        .quad 0x3FF569BA0DB5EE14  /* A01 = +1.338312200124055273420e+00 */
+        .quad 0xBFE31B53A8B67B20  /* A02 = -5.970857901737396389308e-01 */
+        .quad 0x3FB787F297BB0544  /* A03 = +9.191814617499455275507e-02 */
+        .quad 0xBFB37512E848FAFA  /* A00 = -7.600515528700305112331e-02 */
+        .quad 0x3FF5A41F33B403C8  /* A01 = +1.352568819013173495591e+00 */
+        .quad 0xBFE397F6EA9A58A5  /* A02 = -6.123003561103997904880e-01 */
+        .quad 0x3FB8EAA9FF25CA06  /* A03 = +9.733068923177520814782e-02 */
+        .quad 0xBFB47B3E603AFC5D  /* A00 = -8.000554894805263217439e-02 */
+        .quad 0x3FF5D6E3EDE40487  /* A01 = +1.364963464031718975988e+00 */
+        .quad 0xBFE400D5BCA6D631  /* A02 = -6.251019177058819709103e-01 */
+        .quad 0x3FBA0B830ED567FE  /* A03 = +1.017381583418739132707e-01 */
+        .quad 0xBFB5BBFE8AC90496  /* A00 = -8.489981544791400103200e-02 */
+        .quad 0x3FF612BA70107E95  /* A01 = +1.379572332145390989311e+00 */
+        .quad 0xBFE477EAF1FA7693  /* A02 = -6.396383978023599814478e-01 */
+        .quad 0x3FBB4784B7C08A95  /* A03 = +1.065600346196709652391e-01 */
+        .quad 0xBFB6D5D940743939  /* A00 = -8.920057128509463473254e-02 */
+        .quad 0x3FF644A8748F70CE  /* A01 = +1.391762214006166953340e+00 */
+        .quad 0xBFE4D646AB07EA37  /* A02 = -6.511567440459832267763e-01 */
+        .quad 0x3FBC354F4E1D5292  /* A03 = +1.101884427747086558913e-01 */
+        .quad 0xBFB7223D19E4F3D1  /* A00 = -9.036619074045339206069e-02 */
+        .quad 0x3FF6518FEB42B7FA  /* A01 = +1.394912642466350494175e+00 */
+        .quad 0xBFE4ED86CB87498C  /* A02 = -6.539949393430091184598e-01 */
+        .quad 0x3FBC6D29F28CCA9B  /* A03 = +1.110407082713131127205e-01 */
+        .quad 0xBFB6878652FF6312  /* A00 = -8.800544287022329936754e-02 */
+        .quad 0x3FF63948C302D040  /* A01 = +1.388985406648330922508e+00 */
+        .quad 0xBFE4C4E2E7904E17  /* A02 = -6.490339777687407218920e-01 */
+        .quad 0x3FBC127356CA1ABE  /* A03 = +1.096565329445224612481e-01 */
+        .quad 0xBFB4F5D18B0C91D6  /* A00 = -8.187589306596207427980e-02 */
+        .quad 0x3FF5FD27EB7DD0B8  /* A01 = +1.374305648697413673176e+00 */
+        .quad 0xBFE464E01A2B2FC6  /* A02 = -6.373138915164353601739e-01 */
+        .quad 0x3FBB460547674A30  /* A03 = +1.065371798825160976065e-01 */
+        .quad 0xBFB26642FA16A685  /* A00 = -7.187288861919156890412e-02 */
+        .quad 0x3FF59F9BEDE1C95A  /* A01 = +1.351467065073470141812e+00 */
+        .quad 0xBFE3D67920C8FBEA  /* A02 = -6.199308052381387046381e-01 */
+        .quad 0x3FBA24F6A8D3CBC1  /* A03 = +1.021265184570401413078e-01 */
+        .quad 0xBFADB5294794F097  /* A00 = -5.802277563859197656582e-02 */
+        .quad 0x3FF523EA7B9CF453  /* A01 = +1.321268542159732772845e+00 */
+        .quad 0xBFE322A8B55E35DB  /* A02 = -5.979808370918208160205e-01 */
+        .quad 0x3FB8C8673B1B3E37  /* A03 = +9.680791085269722928697e-02 */
+        .quad 0xBFA4B7D661965C6A  /* A00 = -4.046506825687219699450e-02 */
+        .quad 0x3FF48DE3E2CE3122  /* A01 = +1.284641157110919085227e+00 */
+        .quad 0xBFE251FED1A7F445  /* A02 = -5.725092024655472622285e-01 */
+        .quad 0x3FB745699FCABDB9  /* A03 = +9.090290213747821701507e-02 */
+        .quad 0xBF93E60456E4EE1D  /* A00 = -1.943213253365004902773e-02 */
+        .quad 0x3FF3E1A14E628A59  /* A01 = +1.242585474196536532432e+00 */
+        .quad 0xBFE16C5AB660E876  /* A02 = -5.444768488007543094653e-01 */
+        .quad 0x3FB5AD33AA8C188F  /* A03 = +8.467410005332197397987e-02 */
+        .quad 0x3F738C17C47C7961  /* A00 = +4.772274820224659853951e-03 */
+        .quad 0x3FF3234DDE3BD146  /* A01 = +1.196119182682268355933e+00 */
+        .quad 0xBFE078C0D77A9D3B  /* A02 = -5.147403915952176722826e-01 */
+        .quad 0x3FB40D74B3E276B8  /* A03 = +7.833032027925923568290e-02 */
+        .quad 0x3FA0474BECC689C7  /* A00 = +3.179394975019849550746e-02 */
+        .quad 0x3FF256FB4FA7D18A  /* A01 = +1.146235762743432307076e+00 */
+        .quad 0xBFDEFA8E3FB285E2  /* A02 = -4.840427038235174395098e-01 */
+        .quad 0x3FB270C007493D59  /* A03 = +7.203293016322244446403e-02 */
+        .quad 0x3FAF5BD51E479BDC  /* A00 = +6.124750132203590768931e-02 */
+        .quad 0x3FF18081D0B53BC5  /* A01 = +1.093873801484492647162e+00 */
+        .quad 0xBFDCFE2439BD0C03  /* A02 = -4.530115665294831006626e-01 */
+        .quad 0x3FB0DEFE5A45AFDD  /* A03 = +6.590261176978580437424e-02 */
+        .quad 0x3FB7BD5D2806EA26  /* A00 = +9.273321368429118805032e-02 */
+        .quad 0x3FF0A369E35B4440  /* A01 = +1.039895904647224256223e+00 */
+        .quad 0xBFDB04BC5C9951E7  /* A02 = -4.221640495573226181669e-01 */
+        .quad 0x3FAEBBBAA9D6DEEF  /* A03 = +6.002600978120919278380e-02 */
+        .quad 0x3FC01BE411098DBC  /* A00 = +1.258511622610124502941e-01 */
+        .quad 0x3FEF85BDABC031C1  /* A01 = +9.850757936961188621083e-01 */
+        .quad 0xBFD91521375097C2  /* A02 = -3.919146576102968682065e-01 */
+        .quad 0x3FABE26F0086D982  /* A03 = +5.446192628317005068883e-02 */
+        .quad 0x3FC481D7FF5776B9  /* A00 = +1.602125164781023347604e-01 */
+        .quad 0x3FEDC3506C1E7218  /* A01 = +9.300920592973538347792e-01 */
+        .quad 0xBFD7349A88DA7D4F  /* A02 = -3.625856720409119104964e-01 */
+        .quad 0x3FA936E2DFF8E2AE  /* A03 = +4.924687370334389358018e-02 */
+        .quad 0x3FC90471F96FA27A  /* A00 = +1.954481571149420671141e-01 */
+        .quad 0x3FEC0451601987A2  /* A01 = +8.755270840595026360376e-01 */
+        .quad 0xBFD5671CD4B898DC  /* A02 = -3.344184949259110251063e-01 */
+        .quad 0x3FA6BB9594603B67  /* A03 = +4.439990459660841243261e-02 */
+        .quad 0x3FCFD8ADB9ED944C  /* A00 = +2.488000066615846384011e-01 */
+        .quad 0x3FE978C073F6809A  /* A01 = +7.959902062321078108909e-01 */
+        .quad 0xBFD2DF7E00BCD5A9  /* A02 = -2.948908812716931060471e-01 */
+        .quad 0x3FA3614033D490B2  /* A03 = +3.785133965200894456959e-02 */
+        .quad 0x3FD4846A12AFE5A0  /* A00 = +3.205819303981005674586e-01 */
+        .quad 0x3FE63A1147D40472  /* A01 = +6.945883181471244061100e-01 */
+        .quad 0xBFCFA2268AD34450  /* A02 = -2.471359422548027318101e-01 */
+        .quad 0x3F9F150201D9FFE0  /* A03 = +3.035357605267552383310e-02 */
+        .quad 0x3FD9018641F82BEB  /* A00 = +3.907180446846598154131e-01 */
+        .quad 0x3FE33B7C220FFBDC  /* A01 = +6.010113396913498995389e-01 */
+        .quad 0xBFCA4E4187E29C86  /* A02 = -2.055131829740483584423e-01 */
+        .quad 0x3F98C30CED19F8F4  /* A03 = +2.418155858185229434287e-02 */
+        .quad 0x3FDD4B8255BEB078  /* A00 = +4.577337109901757905561e-01 */
+        .quad 0x3FE0858B19D3A49B  /* A01 = +5.163016800335243905451e-01 */
+        .quad 0xBFC5BC929EACE564  /* A02 = -1.698172831327539045176e-01 */
+        .quad 0x3F93A083CE57DE2B  /* A03 = +1.916700312537337677621e-02 */
+        .quad 0x3FE0A8E5E039295C  /* A00 = +5.206174258576470315063e-01 */
+        .quad 0x3FDC35E1234583FE  /* A01 = +4.407885403107342225937e-01 */
+        .quad 0xBFC1DE034E31AEB9  /* A02 = -1.395877963835710222629e-01 */
+        .quad 0x3F8EFDEBB3471BDC  /* A03 = +1.513275280821162888101e-02 */
+        .quad 0x3FE2851B603CB2A5  /* A00 = +5.787484054213406503564e-01 */
+        .quad 0x3FD7F4A44ABBB286  /* A01 = +3.743067483726821853551e-01 */
+        .quad 0xBFBD3EEB67087DE7  /* A02 = -1.142413260026767657385e-01 */
+        .quad 0x3F8864F38329E8BD  /* A03 = +1.191129917173260922836e-02 */
+        .quad 0x3FE437DBE3C34AC1  /* A00 = +6.318187187665317283702e-01 */
+        .quad 0x3FD43F6F789441B5  /* A01 = +3.163717916040938438194e-01 */
+        .quad 0xBFB7D92E7901B9A4  /* A02 = -9.315767721429907277653e-02 */
+        .quad 0x3F8327ED342308E1  /* A03 = +9.353497651663324544136e-03 */
+        .quad 0x3FE5C0977766D55C  /* A00 = +6.797597248138731451661e-01 */
+        .quad 0x3FD10B42A764D8F9  /* A01 = +2.663122782427219115142e-01 */
+        .quad 0xBFB3633351D3D70F  /* A02 = -7.573242900602060456716e-02 */
+        .quad 0x3F7E079E30FF899C  /* A03 = +7.331483779099558922843e-03 */
+        .quad 0x3FE7202CE08A88C4  /* A00 = +7.226776490754436288455e-01 */
+        .quad 0x3FCC973EB5662B01  /* A01 = +2.233656297433626314319e-01 */
+        .quad 0xBFAF70A455F9920B  /* A02 = -6.140626477716545211782e-02 */
+        .quad 0x3F77812411CE99B6  /* A03 = +5.738392731393584730859e-03 */
+        .quad 0x3FE85879424095B1  /* A00 = +7.608000082006382003286e-01 */
+        .quad 0x3FC7E73BD1674D84  /* A01 = +1.867441914060742336190e-01 */
+        .quad 0xBFA96F84E4BF333B  /* A02 = -4.967894832916504993525e-02 */
+        .quad 0x3F72606DDCA6E117  /* A03 = +4.486493251924870105662e-03 */
+        .quad 0x3FE96BFE4957F4DD  /* A00 = +7.944327766887472330737e-01 */
+        .quad 0x3FC3ED4780D25478  /* A01 = +1.556786898624158421711e-01 */
+        .quad 0xBFA489C5F9A56B58  /* A02 = -4.011362717093075458408e-02 */
+        .quad 0x3F6CB5DC17E9AD2A  /* A03 = +3.504686231556104931972e-03 */
+        .quad 0x3FEA5D9CB2F41234  /* A00 = +8.239272589858672724006e-01 */
+        .quad 0x3FC091A758374DCF  /* A01 = +1.294449978582705440555e-01 */
+        .quad 0xBFA08E436D4B5CE0  /* A02 = -3.233538350257858517978e-02 */
+        .quad 0x3F666997AD53E6B7  /* A03 = +2.735897297154145629133e-03 */
+        .quad 0x3FEB3060342CB850  /* A00 = +8.496552485501158713532e-01 */
+        .quad 0x3FBB7D30BBC7DC1B  /* A01 = +1.073790033768634993860e-01 */
+        .quad 0xBF9AA6BA3443D9E3  /* A02 = -2.602663940430173170060e-02 */
+        .quad 0x3F617CA764B7850B  /* A03 = +2.134634914668814050648e-03 */
+        .quad 0x3FEBE759A6A0C7B8  /* A00 = +8.719909910635044170135e-01 */
+        .quad 0x3FB6C10DE6A703FF  /* A01 = +8.888327485239243264115e-02 */
+        .quad 0xBF956C566D8BE1F6  /* A02 = -2.092108768099084498138e-02 */
+        .quad 0x3F5B46D1A4A59CF8  /* A03 = +1.664833764687232917079e-03 */
+        .quad 0x3FEC858494887A04  /* A00 = +8.912985707318630268503e-01 */
+        .quad 0x3FB2CC31F543394D  /* A01 = +7.342827070099140762682e-02 */
+        .quad 0xBF9133477FF69137  /* A02 = -1.679717749142747504343e-02 */
+        .quad 0x3F5544482FBB4DA5  /* A03 = +1.298017973501022466823e-03 */
+        .quad 0x3FED0DB59D0E32E9  /* A00 = +9.079235141267335551518e-01 */
+        .quad 0x3FAF006BAFFC6EF4  /* A01 = +6.055008433597022787787e-02 */
+        .quad 0xBF8B97146FA2B97A  /* A02 = -1.347175565419144252499e-02 */
+        .quad 0x3F5093B01F4CDC69  /* A03 = +1.011774057770665211434e-03 */
+        .quad 0x3FEDB487C3EC457C  /* A00 = +9.282873942012623835751e-01 */
+        .quad 0x3FA7390C09D0BD1D  /* A01 = +4.535710925881118044112e-02 */
+        .quad 0xBF83D9F7C3181106  /* A02 = -9.693084374710735778846e-03 */
+        .quad 0x3F46E34A0A3C0E64  /* A03 = +6.984817050299072134500e-04 */
+        .quad 0x3FEE5FFCB4E6EB00  /* A00 = +9.492171796076434020506e-01 */
+        .quad 0x3F9F4913ED00AADF  /* A01 = +3.055220731782070861526e-02 */
+        .quad 0xBF79670BD0E59B5C  /* A02 = -6.201788097633133961528e-03 */
+        .quad 0x3F3BC998EBCAF96D  /* A03 = +4.240034429975534616304e-04 */
+        .quad 0x3FEEDBA41E9542FE  /* A00 = +9.643116566968215064293e-01 */
+        .quad 0x3F94F5DD18D9C24D  /* A01 = +2.046914543319848858727e-02 */
+        .quad 0xBF7034896AA122B9  /* A02 = -3.956352980886528904192e-03 */
+        .quad 0x3F30DCCB47810B39  /* A03 = +2.573009765038273091199e-04 */
+        .quad 0x3FEF33F2882520ED  /* A00 = +9.750912341196716903724e-01 */
+        .quad 0x3F8BF37F2CF553FF  /* A01 = +1.364802699996836392315e-02 */
+        .quad 0xBF649F6F05A69619  /* A02 = -2.517430152880317534986e-03 */
+        .quad 0x3F247623C950AAC9  /* A03 = +1.561087307505231250044e-04 */
+        .quad 0x3FEF727757751741  /* A00 = +9.827229221489021115943e-01 */
+        .quad 0x3F828E67912C4400  /* A01 = +9.060677640748693306705e-03 */
+        .quad 0xBF5A2F51A806CC2C  /* A02 = -1.598195784123355826789e-03 */
+        .quad 0x3F18D35D7687E613  /* A03 = +9.470231965016282719549e-05 */
+        .quad 0x3FEF9E6325C5942A  /* A00 = +9.880843866091073568469e-01 */
+        .quad 0x3F788AB117618F76  /* A01 = +5.991641772286606867914e-03 */
+        .quad 0xBF5096EAB0B1EA89  /* A02 = -1.012543859160305046233e-03 */
+        .quad 0x3F0E1E50EC4435AB  /* A03 = +5.744633156910412119652e-05 */
+        .quad 0x3FEFBD0784049369  /* A00 = +9.918248728250605994461e-01 */
+        .quad 0x3F702BBD8294035F  /* A01 = +3.947963975634432264028e-03 */
+        .quad 0xBF44FB55E0F00593  /* A02 = -6.403130845457509273330e-04 */
+        .quad 0x3F0244DCD723230A  /* A03 = +3.484534217219031730379e-05 */
+        .quad 0x3FEFD245E2366A43  /* A00 = +9.944180887426415926811e-01 */
+        .quad 0x3F653D82EC088433  /* A01 = +2.592807490387838333795e-03 */
+        .quad 0xBF3A7DF75E013CB8  /* A02 = -4.042366908878036561859e-04 */
+        .quad 0x3EF6298E69F991CD  /* A03 = +2.113564425911141559972e-05 */
+        .quad 0x3FEFE0EAA508BC69  /* A00 = +9.962056372950317539861e-01 */
+        .quad 0x3F5BD0771AF3FDDA  /* A01 = +1.697651208644282514598e-03 */
+        .quad 0xBF30B2E1254DE571  /* A02 = -2.548026725928887099328e-04 */
+        .quad 0x3EEAE28B70EC0256  /* A03 = +1.281973848454955042307e-05 */
+        .quad 0x3FEFEAF5303D7F96  /* A00 = +9.974313680831865536192e-01 */
+        .quad 0x3F5229111365657E  /* A01 = +1.108423877289460134782e-03 */
+        .quad 0xBF250572D04DFE66  /* A02 = -1.603796628408704519168e-04 */
+        .quad 0x3EE04E89BB57C981  /* A03 = +7.775682983689149966743e-06 */
+        .quad 0x3FEFF1CF52F1CF44  /* A00 = +9.982678051005469122003e-01 */
+        .quad 0x3F47A71316147CEB  /* A01 = +7.218211359577819110842e-04 */
+        .quad 0xBF1A6D7604055719  /* A02 = -1.008132248946049582547e-04 */
+        .quad 0x3ED3C8047586A85C  /* A03 = +4.716233739913014633626e-06 */
+        .quad 0x3FEFF6770369EF69  /* A00 = +9.988360468555416149528e-01 */
+        .quad 0x3F3EBB261180FBF0  /* A01 = +4.689186039321105101130e-04 */
+        .quad 0xBF1097754FE19D7F  /* A02 = -6.329206004950480057066e-05 */
+        .quad 0x3EC7FEFF83BCA0A7  /* A03 = +2.860556404988488738366e-06 */
+        .quad 0x3FEFF99D42371AC4  /* A00 = +9.992204945818561334647e-01 */
+        .quad 0x3F33EB2AEC271F59  /* A01 = +3.039340773764907474054e-04 */
+        .quad 0xBF04CF18E0FC0D79  /* A02 = -3.968996690952969588805e-05 */
+        .quad 0x3EBD1BDBD6019BE9  /* A03 = +1.735021065507727833886e-06 */
+        .quad 0x3FEFFBBCA32B0D91  /* A00 = +9.994795977476532700123e-01 */
+        .quad 0x3F29C41E1615110A  /* A01 = +1.965796209707565346710e-04 */
+        .quad 0xBEFA11F93D9DCB5A  /* A02 = -2.486248909101414873235e-05 */
+        .quad 0x3EB1A7CA4546F7A7  /* A03 = +1.052345642723709228769e-06 */
+        .quad 0x3FEFFD298B8E8DE2  /* A00 = +9.996535993308806045121e-01 */
+        .quad 0x3F20A1C42D523C5B  /* A01 = +1.268913244172078754520e-04 */
+        .quad 0xBEF0507A364AFAE4  /* A02 = -1.555859070622834605755e-05 */
+        .quad 0x3EA56ACA17E7CDF4  /* A03 = +6.382806956848098872313e-07 */
+        .quad 0x3FEFFE1DC82BA5A3  /* A00 = +9.997700604991915929176e-01 */
+        .quad 0x3F156E73B90F1769  /* A01 = +8.175450626798714452801e-05 */
+        .quad 0xBEE4663579D0A09F  /* A02 = -9.727122057226747625365e-06 */
+        .quad 0x3E99FAF6FEC5D4C1  /* A03 = +3.871371052824002996020e-07 */
+        .quad 0x3FEFFEF8D0BB5E81  /* A00 = +9.998745037837154514548e-01 */
+        .quad 0x3F06686DA18D39C3  /* A01 = +4.273972098777251447726e-05 */
+        .quad 0xBED46BC298073E90  /* A02 = -4.868731025855742842491e-06 */
+        .quad 0x3E88E42286B9D0FD  /* A03 = +1.854535328530838170114e-07 */
+        .quad 0x3FEFFF8DBC68DDC7  /* A00 = +9.999455146670975791423e-01 */
+        .quad 0x3EF26B2953A80AF0  /* A01 = +1.756534514108903368909e-05 */
+        .quad 0xBEBFC4472D580F83  /* A02 = -1.893443529411295465239e-06 */
+        .quad 0x3E72505B4553D19F  /* A03 = +6.822456673547912277047e-08 */
+        .quad 0x3FEFFFCED1276609  /* A00 = +9.999765477215883935358e-01 */
+        .quad 0x3EDE1A94C7CC58F5  /* A01 = +7.177313020153979672606e-06 */
+        .quad 0xBEA8A2C988744E57  /* A02 = -7.342066660497443762363e-07 */
+        .quad 0x3E5AF30036BBBAF4  /* A03 = +2.509841882843541084885e-08 */
+        .quad 0x3FEFFFEAFE70FCFC  /* A00 = +9.999899835164849370983e-01 */
+        .quad 0x3EC879175E3549F5  /* A01 = +2.917410471128503564412e-06 */
+        .quad 0xBE930E36677D1813  /* A02 = -2.839493400307523115929e-07 */
+        .quad 0x3E43D4005B42D48F  /* A03 = +9.233192745401904898013e-09 */
+        /* Saturation entry: tanh(|x|) rounds to 1.0 in this range.  */
+        .quad 0x3FF0000000000000  /* A00 = +1.000000000000000000000e+00 */
+        .quad 0x0000000000000000  /* A01 = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000  /* A02 = +0.000000000000000000000e-01 */
+        .quad 0x0000000000000000  /* A03 = +0.000000000000000000000e-01 */
+        .align 16
+        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000           /* _sSignMask        */
+        .align 16
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff           /* _sAbsMask         */
+        .align 16
+        .long 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000           /* _iExpMantMask     */
+        .align 16
+        .long 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000           /* _iExpMask         */
+        .align 16
+        .long 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000           /* _iMinIdxOfsMask   */
+        .align 16
+        .long 0x04280000, 0x04280000, 0x04280000, 0x04280000           /* _iMaxIdxMask      */
+        .align 16
+        .type	__svml_stanh_data_internal,@object
+        .size	__svml_stanh_data_internal,.-__svml_stanh_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core-sse.S
new file mode 100644
index 0000000000..a56795e3cd
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized tanhf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN8v_tanhf _ZGVdN8v_tanhf_sse_wrapper
+#include "../svml_s_tanhf8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core.c
new file mode 100644
index 0000000000..fadcea36ab
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized tanhf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN8v_tanhf
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN8v_tanhf, __GI__ZGVdN8v_tanhf,
+	       __redirect__ZGVdN8v_tanhf)
+  __attribute__ ((visibility ("hidden")));
+#endif
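
For context, a call site normally reaches the IFUNC-selected vector entry
through the vector ABI rather than by name.  A hedged sketch of such a call
site, assuming GCC with OpenMP SIMD support and an AVX2 target (the exact
flags and the chosen _ZGV variant depend on the toolchain and options):

#include <math.h>

void
tanhf_array (const float *x, float *y, int n)
{
  /* With the declare-simd declarations provided via math.h, the compiler
     may vectorize this loop into calls such as _ZGVdN8v_tanhf.  */
  #pragma omp simd
  for (int i = 0; i < n; i++)
    y[i] = tanhf (x[i]);
}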
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core_avx2.S
new file mode 100644
index 0000000000..c19e6bf8b5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core_avx2.S
@@ -0,0 +1,844 @@
+/* Function tanhf vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   NOTE: Since the hyperbolic tangent function is odd
+ *         (tanh(x) = -tanh(-x)), below algorithm deals with the absolute
+ *         value of the argument |x|: tanh(x) = sign(x) * tanh(|x|)
+ *
+ *   We use a table lookup method to compute tanh(|x|).
+ *   The basic idea is to split the input range into a number of subintervals
+ *   and to approximate tanh(.) with a polynomial on each of them.
+ *
+ *   IEEE SPECIAL CONDITIONS:
+ *   x = [+,-]0, r = [+,-]0
+ *   x = +Inf,   r = +1
+ *   x = -Inf,   r = -1
+ *   x = QNaN,   r = QNaN
+ *   x = SNaN,   r = QNaN
+ *
+ *
+ *   ALGORITHM DETAILS
+ *   We handle special values in a callout function, apart from the main
+ *   path computations.  "Special" values for this algorithm are:
+ *   INF, NaN, |x| > HUGE_THRESHOLD
+ *
+ *
+ *   Main path computations are organized as follows:
+ *   we split the interval [0, SATURATION_THRESHOLD)
+ *   into a number of subintervals.  On each subinterval we approximate tanh(.)
+ *   with a minimax polynomial of pre-defined degree.  The polynomial
+ *   coefficients are computed beforehand and stored in the table.  We also use
+ *
+ *       y := |x| + B,
+ *
+ *   where B depends on the subinterval and is used to move the argument
+ *   closer to zero.
+ *   We also add a large fake interval [SATURATION_THRESHOLD, HUGE_THRESHOLD]
+ *   whose coefficients encode 1.0 + 0.0*y + 0.0*y^2 ..., just to preserve
+ *   the main path computation logic while returning 1.0 for all arguments.
+ *
+ *   Hence reconstruction looks as follows:
+ *   we extract the polynomial and range reduction coefficients
+ *        (Pj and B) corresponding to the subinterval to which |x| belongs,
+ *        and return
+ *
+ *       r := sign(x) * (P0 + P1 * y + ... + Pn * y^n)
+ *
+ *   NOTE: we use a multiprecision technique to multiply and sum the first
+ *         K terms of the polynomial, so Pj, j = 0..K are each stored in the
+ *         table as a pair of target-precision numbers (Pj and PLj) to
+ *         achieve wider-than-target precision.
+ *
+ *
+ */
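
For reference, the scheme above can be written as a scalar C sketch of the
main path only (special values still go to the callout).  The mask, bias,
clamp and shift constants are the ones stored in the __svml_stanh_data_internal
tables of this patch (_iExpMantMask = 0x7ff80000, _iMinIdxOfsMask = 0x3cf80000,
_iMaxIdxMask = 0x04280000, shift by 14, 32-byte entries); the table symbol and
the function name are illustrative stand-ins, and the float path appears to
evaluate the degree-3 polynomial directly in |x|, so treat this as a sketch of
the idea rather than the exact implementation:

#include <math.h>
#include <stdint.h>
#include <string.h>

/* 134 subintervals (including the saturation entry), A00..A03 each;
   assumed to mirror the _dbP table emitted in this file.  */
extern const double dbP[134][4];

static float
tanhf_sketch (float x)
{
  float ax = fabsf (x);
  uint32_t bits;
  memcpy (&bits, &ax, sizeof (bits));

  /* Index derivation: mask the exponent and top mantissa bits, subtract
     the minimum-index bias, clamp to the last entry, shift to a byte
     offset of a 32-byte table entry.  */
  int32_t ofs = (int32_t) (bits & 0x7ff80000u) - 0x3cf80000;
  if (ofs < 0)
    ofs = 0;
  if (ofs > 0x04280000)
    ofs = 0x04280000;
  const double *a = dbP[((uint32_t) ofs >> 14) / 32];

  /* Degree-3 Horner evaluation in double precision, sign restored last.  */
  double y = (double) ax;
  double p = ((a[3] * y + a[2]) * y + a[1]) * y + a[0];
  return copysignf ((float) p, x);
}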
+
+/* Offsets for data table __svml_stanh_data_internal
+ */
+#define _dbP                          	0
+#define _sSignMask                    	4288
+#define _sAbsMask                     	4320
+#define _iExpMantMask                 	4352
+#define _iExpMask                     	4384
+#define _iMinIdxOfsMask               	4416
+#define _iMaxIdxMask                  	4448
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN8v_tanhf_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        pushq     %r12
+        subq      $120, %rsp
+        lea       _dbP+16+__svml_stanh_data_internal(%rip), %r10
+        vmovaps   %ymm0, %ymm12
+
+/* Here huge arguments, INF and NaNs are filtered out to the callout path.  */
+        vpand     _iExpMantMask+__svml_stanh_data_internal(%rip), %ymm12, %ymm14
+
+/*
+ *  Small-table-specific variables:
+ *  constant loading.
+ */
+        vmovups   _iMaxIdxMask+__svml_stanh_data_internal(%rip), %ymm8
+        vpsubd    _iMinIdxOfsMask+__svml_stanh_data_internal(%rip), %ymm14, %ymm9
+
+/* Clamp the biased index to [0, _iMaxIdxMask] (integer VMIN/VMAX).  */
+        vxorps    %ymm15, %ymm15, %ymm15
+        vpcmpgtd  %ymm15, %ymm9, %ymm0
+        vpand     %ymm0, %ymm9, %ymm7
+        vpcmpgtd  %ymm8, %ymm9, %ymm6
+        vblendvps %ymm6, %ymm8, %ymm7, %ymm3
+        vpsrld    $14, %ymm3, %ymm1
+        vpcmpgtd  _iExpMask+__svml_stanh_data_internal(%rip), %ymm14, %ymm13
+        vmovmskps %ymm13, %r11d
+        vandps    _sAbsMask+__svml_stanh_data_internal(%rip), %ymm12, %ymm10
+        vandps    _sSignMask+__svml_stanh_data_internal(%rip), %ymm12, %ymm11
+        vextractf128 $1, %ymm1, %xmm2
+        vmovd     %xmm1, %r9d
+        vmovd     %xmm2, %ecx
+        vpextrd   $1, %xmm2, %edx
+        vpextrd   $1, %xmm1, %r8d
+        movslq    %r9d, %r9
+        movslq    %edx, %rdx
+        movslq    %r8d, %r8
+        vpextrd   $2, %xmm1, %edi
+        movslq    %ecx, %rcx
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -8; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf8, 0xff, 0xff, 0xff, 0x22
+        vpextrd   $3, %xmm2, %r12d
+        vpextrd   $3, %xmm1, %esi
+        vpextrd   $2, %xmm2, %eax
+        movslq    %edi, %rdi
+        movslq    %r12d, %r12
+        movslq    %esi, %rsi
+        movslq    %eax, %rax
+        vmovupd   -16(%r9,%r10), %xmm5
+        vmovupd   -16(%rdx,%r10), %xmm14
+        vmovupd   -16(%rcx,%r10), %xmm13
+        vmovupd   (%r9,%r10), %xmm1
+        vmovupd   (%r8,%r10), %xmm2
+        vmovupd   -16(%r8,%r10), %xmm4
+        vinsertf128 $1, -16(%rdi,%r10), %ymm5, %ymm15
+        vinsertf128 $1, -16(%r12,%r10), %ymm14, %ymm3
+        vinsertf128 $1, -16(%rax,%r10), %ymm13, %ymm6
+        vinsertf128 $1, (%rdi,%r10), %ymm1, %ymm5
+        vinsertf128 $1, (%rsi,%r10), %ymm2, %ymm14
+        vunpcklpd %ymm3, %ymm6, %ymm8
+        vunpckhpd %ymm3, %ymm6, %ymm6
+        vunpcklpd %ymm14, %ymm5, %ymm3
+        vunpckhpd %ymm14, %ymm5, %ymm2
+        vmovupd   (%rcx,%r10), %xmm13
+        vcvtps2pd %xmm10, %ymm5
+        vextractf128 $1, %ymm10, %xmm10
+        vfmadd213pd %ymm3, %ymm5, %ymm2
+        vinsertf128 $1, -16(%rsi,%r10), %ymm4, %ymm0
+        vmovupd   (%rdx,%r10), %xmm4
+        vunpcklpd %ymm0, %ymm15, %ymm9
+        vunpckhpd %ymm0, %ymm15, %ymm7
+        vfmadd213pd %ymm7, %ymm5, %ymm2
+        vfmadd213pd %ymm9, %ymm5, %ymm2
+        vinsertf128 $1, (%r12,%r10), %ymm4, %ymm0
+        vcvtps2pd %xmm10, %ymm4
+        vinsertf128 $1, (%rax,%r10), %ymm13, %ymm15
+        vunpcklpd %ymm0, %ymm15, %ymm1
+        vunpckhpd %ymm0, %ymm15, %ymm0
+        vfmadd213pd %ymm1, %ymm4, %ymm0
+        vcvtpd2ps %ymm2, %xmm1
+        vfmadd213pd %ymm6, %ymm4, %ymm0
+        vfmadd213pd %ymm8, %ymm4, %ymm0
+        vcvtpd2ps %ymm0, %xmm0
+        vinsertf128 $1, %xmm0, %ymm1, %ymm2
+        vorps     %ymm11, %ymm2, %ymm0
+        testl     %r11d, %r11d
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r13 r14 r15 r11d ymm0 ymm12
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        addq      $120, %rsp
+        cfi_restore(12)
+        popq      %r12
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -8; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf8, 0xff, 0xff, 0xff, 0x22
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %ymm12, 32(%rsp)
+        vmovups   %ymm0, 64(%rsp)
+                                # LOE rbx r13 r14 r15 r11d ymm0
+
+        xorl      %r12d, %r12d
+                                # LOE rbx r13 r14 r15 r11d r12d
+
+        vzeroupper
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
+        movl      %r11d, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      tanhf@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 64(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN8v_tanhf_avx2)
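
The special-value path above stores the input and result vectors to the stack
and recomputes only the flagged lanes with the scalar tanhf.  A minimal C
sketch of that callout loop (names are illustrative, not the actual labels):

#include <math.h>

static void
special_values_callout (const float in[8], float out[8], unsigned range_mask)
{
  for (int lane = 0; lane < 8; lane++)
    if (range_mask & (1u << lane))  /* INF, NaN or |x| above HUGE_THRESHOLD.  */
      out[lane] = tanhf (in[lane]);
}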
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_stanh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct
+{
+        __declspec(align(32)) VUINT32 _dbP[(134*4)][2];
+        __declspec(align(32)) VUINT32 _sSignMask[8][1];
+        __declspec(align(32)) VUINT32 _sAbsMask[8][1];
+        __declspec(align(32)) VUINT32 _iExpMantMask[8][1];
+        __declspec(align(32)) VUINT32 _iExpMask[8][1];
+        __declspec(align(32)) VUINT32 _iMinIdxOfsMask[8][1];
+        __declspec(align(32)) VUINT32 _iMaxIdxMask[8][1];
+} __svml_stanh_data_internal;
+#endif
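
The offset macros at the top of this file follow directly from this layout:
_dbP occupies 134 * 4 * 8 = 4288 bytes, and each mask that follows is one
32-byte (8-lane) vector constant.  A hedged sketch, mirroring the structure
above with GCC attribute syntax in place of __declspec (the type name is
illustrative), that double-checks those offsets at compile time:

#include <stddef.h>

typedef unsigned int VUINT32;
typedef struct
{
  VUINT32 _dbP[(134 * 4)][2];
  VUINT32 _sSignMask[8][1] __attribute__ ((aligned (32)));
  VUINT32 _sAbsMask[8][1] __attribute__ ((aligned (32)));
  VUINT32 _iExpMantMask[8][1] __attribute__ ((aligned (32)));
  VUINT32 _iExpMask[8][1] __attribute__ ((aligned (32)));
  VUINT32 _iMinIdxOfsMask[8][1] __attribute__ ((aligned (32)));
  VUINT32 _iMaxIdxMask[8][1] __attribute__ ((aligned (32)));
} stanh_layout_check;

_Static_assert (offsetof (stanh_layout_check, _sSignMask) == 4288,
                "_sSignMask offset");
_Static_assert (offsetof (stanh_layout_check, _iMaxIdxMask) == 4448,
                "_iMaxIdxMask offset");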
+__svml_stanh_data_internal:
+        /* Pol_000:  err=7.93e-09, x in [0.0000000; 0.0312500]. */
+        .quad 0x0000000000000000  /* A00 = +0.000000000000000000000e-01 */
+        .quad 0x3FF00000022C70EB  /* A01 = +1.000000008097283510367e+00 */
+        .quad 0xBED00E878CFFA194  /* A02 = -3.828228912518614443549e-06 */
+        .quad 0xBFD551766D0607A9  /* A03 = -3.330970825846813476723e-01 */
+        .quad 0xBE53D60CE3E4C297  /* A00 = -1.847383956330407336230e-08 */
+        .quad 0x3FF000024177CF5C  /* A01 = +1.000002151235967140508e+00 */
+        .quad 0xBF1758BC94A51A25  /* A02 = -8.906031613262943753568e-05 */
+        .quad 0xBFD53EAE67E0D4F0  /* A03 = -3.319507612644221339337e-01 */
+        .quad 0xBE5A9E47EF32D6FE  /* A00 = -2.479020984039698285657e-08 */
+        .quad 0x3FF00002DA983057  /* A01 = +1.000002721676556793895e+00 */
+        .quad 0xBF1BD953509E94AA  /* A02 = -1.062352277175377670507e-04 */
+        .quad 0xBFD53BDB562EEDD5  /* A03 = -3.317783681520414806876e-01 */
+        .quad 0xBE6191BBE496D294  /* A00 = -3.272532162914017685901e-08 */
+        .quad 0x3FF0000390492017  /* A01 = +1.000003398528866105366e+00 */
+        .quad 0xBF20727E814A57CE  /* A02 = -1.254825043772153972919e-04 */
+        .quad 0xBFD538DE060A6F22  /* A03 = -3.315959033004550748913e-01 */
+        .quad 0xBE66DAFA2A893A25  /* A00 = -4.257146219278012568149e-08 */
+        .quad 0x3FF0000465E08CD1  /* A01 = +1.000004194219219266770e+00 */
+        .quad 0xBF2341C765EF91B6  /* A02 = -1.469188600530365522261e-04 */
+        .quad 0xBFD535B6841FAF9E  /* A03 = -3.314033785124993469751e-01 */
+        .quad 0xBE6D5794E361E964  /* A00 = -5.465394929765249413434e-08 */
+        .quad 0x3FF000055EE2A0CB  /* A01 = +1.000005121846742950353e+00 */
+        .quad 0xBF265E6C77E66C8B  /* A02 = -1.706607253709506650304e-04 */
+        .quad 0xBFD53264DDCCEDA6  /* A03 = -3.312008062382240103361e-01 */
+        .quad 0xBE729C844D374A6E  /* A00 = -6.933284462462096107184e-08 */
+        .quad 0x3FF000067F019093  /* A01 = +1.000006195180536350264e+00 */
+        .quad 0xBF29CC5348D6DCE5  /* A02 = -1.968242326435338705130e-04 */
+        .quad 0xBFD52EE92121ED35  /* A03 = -3.309881995734998416658e-01 */
+        .quad 0xBE775AEA17EAA872  /* A00 = -8.700465590574974405858e-08 */
+        .quad 0x3FF00007CA1D66B8  /* A01 = +1.000007428656699559610e+00 */
+        .quad 0xBF2D8F5EB98A2637  /* A02 = -2.255252009216044881395e-04 */
+        .quad 0xBFD52B435CDF9128  /* A03 = -3.307655722585587376727e-01 */
+        .quad 0xBE7D04DA28C343F0  /* A00 = -1.081040272327705484794e-07 */
+        .quad 0x3FF000094443CCF5  /* A01 = +1.000008837375216730337e+00 */
+        .quad 0xBF30D5B76C947AE5  /* A02 = -2.568791210978817814332e-04 */
+        .quad 0xBFD52773A0776FAD  /* A03 = -3.305329386764651045105e-01 */
+        .quad 0xBE81DD77A12C51C7  /* A00 = -1.331054169875768625701e-07 */
+        .quad 0x3FF0000AF1AFD2DA  /* A01 = +1.000010437096696680470e+00 */
+        .quad 0xBF331230624C1680  /* A02 = -2.910011410651516805537e-04 */
+        .quad 0xBFD52379FC0B61DF  /* A03 = -3.302903138515186909352e-01 */
+        .quad 0xBE85D04EEEB3C435  /* A00 = -1.625247628488202841012e-07 */
+        .quad 0x3FF0000CD6C9B1F2  /* A01 = +1.000012244238970726684e+00 */
+        .quad 0xBF357F0742FADDD4  /* A02 = -3.280060509313874068243e-04 */
+        .quad 0xBFD51F56806D0E81  /* A03 = -3.300377134475880880338e-01 */
+        .quad 0xBE8A6E289B59681B  /* A00 = -1.969211333326924655065e-07 */
+        .quad 0x3FF0000EF8268F72  /* A01 = +1.000014275873550406715e+00 */
+        .quad 0xBF381E277A1B747A  /* A02 = -3.680082682942575423093e-04 */
+        .quad 0xBFD51B093F1D6FD4  /* A03 = -3.297751537663746734808e-01 */
+        .quad 0xBE8FCBC40EE9ABD5  /* A00 = -2.368983653301529373887e-07 */
+        .quad 0x3FF000115A883B6C  /* A01 = +1.000016549721943981410e+00 */
+        .quad 0xBF3AF17AC974B3D9  /* A02 = -4.111218235774406434303e-04 */
+        .quad 0xBFD516924A4C549C  /* A03 = -3.295026517456081105450e-01 */
+        .quad 0xBE92FFBC60A3F956  /* A00 = -2.831066871072026054144e-07 */
+        .quad 0x3FF0001402DCED8A  /* A01 = +1.000019084151832604590e+00 */
+        .quad 0xBF3DFAE9390C4801  /* A02 = -4.574603454311488280083e-04 */
+        .quad 0xBFD511F1B4D7DC3A  /* A03 = -3.292202249571719585575e-01 */
+        .quad 0xBE9690A22F96D5AD  /* A00 = -3.362443262393081632612e-07 */
+        .quad 0x3FF00016F63EFF5D  /* A01 = +1.000021898173108825247e+00 */
+        .quad 0xBF409E2C839605BB  /* A02 = -5.071370461992499986334e-04 */
+        .quad 0xBFD50D27924BEE00  /* A03 = -3.289278916051614487515e-01 */
+        .quad 0xBE9AA56C65E72A73  /* A00 = -3.970591019557469835586e-07 */
+        .quad 0x3FF0001A39F4A43E  /* A01 = +1.000025011433776978009e+00 */
+        .quad 0xBF425BD74C3D6667  /* A02 = -5.602647074553602319844e-04 */
+        .quad 0xBFD50833F6E1ABA2  /* A03 = -3.286256705238718156536e-01 */
+        .quad 0xBE9F4BD4FF1A83B0  /* A00 = -4.663500013744687071912e-07 */
+        .quad 0x3FF0001DD36F9EC2  /* A01 = +1.000028444215715683896e+00 */
+        .quad 0xBF44376634149405  /* A02 = -6.169556656102642569831e-04 */
+        .quad 0xBFD50316F77EDEE5  /* A03 = -3.283135811757190158922e-01 */
+        .quad 0xBEA3B625387BB079  /* A00 = -5.874486399249461304297e-07 */
+        .quad 0x3FF00023E14CFBA9  /* A01 = +1.000034217911642153709e+00 */
+        .quad 0xBF47392F923218D2  /* A02 = -7.087213783883111826306e-04 */
+        .quad 0xBFD4FB1FACDEB938  /* A03 = -3.278273761924483942209e-01 */
+        .quad 0xBEAA6E24F543500A  /* A00 = -7.876828740601738750574e-07 */
+        .quad 0x3FF0002D5C6E8412  /* A01 = +1.000043259679163742959e+00 */
+        .quad 0xBF4BAF02BD7FDD70  /* A02 = -8.448375110664940040861e-04 */
+        .quad 0xBFD4EFEE6527A7DE  /* A03 = -3.271442401734229177279e-01 */
+        .quad 0xBEB16E3EBE2157D0  /* A00 = -1.038947396133402500647e-06 */
+        .quad 0x3FF00038990FEE2F  /* A01 = +1.000053975962952312884e+00 */
+        .quad 0xBF50569481C574CB  /* A02 = -9.972048056490652716971e-04 */
+        .quad 0xBFD4E419278DA2B4  /* A03 = -3.264220129263251113372e-01 */
+        .quad 0xBEB6A7B6723165D4  /* A00 = -1.350350836279403750524e-06 */
+        .quad 0x3FF00045CAB4158E  /* A01 = +1.000066558657042303793e+00 */
+        .quad 0xBF531D7C9C849108  /* A02 = -1.166698160951775212202e-03 */
+        .quad 0xBFD4D7A0BB33B152  /* A03 = -3.256608799117844954552e-01 */
+        .quad 0xBEBD0EE2A8654AFD  /* A00 = -1.732000471561702711532e-06 */
+        .quad 0x3FF00055276F18D6  /* A01 = +1.000081209219890521211e+00 */
+        .quad 0xBF562FDBA3FB6C6C  /* A02 = -1.354183666925102939860e-03 */
+        .quad 0xBFD4CA85F1B93DB2  /* A03 = -3.248610363561638125773e-01 */
+        .quad 0xBEC269D4036A207E  /* A00 = -2.195047297096822741730e-06 */
+        .quad 0x3FF00066E7DA6E4E  /* A01 = +1.000098138500919997540e+00 */
+        .quad 0xBF5991499FC36B3A  /* A02 = -1.560518167983372759405e-03 */
+        .quad 0xBFD4BCC9A72283D6  /* A03 = -3.240226871658341556426e-01 */
+        .quad 0xBEC7154B6C09CFE1  /* A00 = -2.751729738565190291276e-06 */
+        .quad 0x3FF0007B47086B80  /* A01 = +1.000117566559055148900e+00 */
+        .quad 0xBF5D455433B4F8F4  /* A02 = -1.786548832412968197680e-03 */
+        .quad 0xBFD4AE6CC1BFE145  /* A03 = -3.231460468373550942722e-01 */
+        .quad 0xBECCA68CC64A0F8A  /* A00 = -3.415415948561670285790e-06 */
+        .quad 0x3FF00092827742F7  /* A01 = +1.000139722473418535387e+00 */
+        .quad 0xBF60A7BF15A527AF  /* A02 = -2.033112728132522705610e-03 */
+        .quad 0xBFD49F703214084C  /* A03 = -3.222313393636155876010e-01 */
+        .quad 0xBED19E68676B241B  /* A00 = -4.200644630977303616698e-06 */
+        .quad 0x3FF000ACDA037B26  /* A01 = +1.000164844146362863597e+00 */
+        .quad 0xBF62D99F836A02F8  /* A02 = -2.301036405072284102280e-03 */
+        .quad 0xBFD48FD4F2B91B28  /* A03 = -3.212787981359945810311e-01 */
+        .quad 0xBED57CF4B0C7AA54  /* A00 = -5.123164339408145209103e-06 */
+        .quad 0x3FF000CA8FD9E1A1  /* A01 = +1.000193178099017865534e+00 */
+        .quad 0xBF653A014548E686  /* A02 = -2.591135484433962181405e-03 */
+        .quad 0xBFD47F9C0844B38F  /* A03 = -3.202886658426046806447e-01 */
+        .quad 0xBEDA012B1B1A41E2  /* A00 = -6.199971197454598722328e-06 */
+        .quad 0x3FF000EBE868FDF4  /* A01 = +1.000224979259539459520e+00 */
+        .quad 0xBF67CA9427E0A544  /* A02 = -2.904214255086275467410e-03 */
+        .quad 0xBFD46EC6812ADB37  /* A03 = -3.192611943626845749655e-01 */
+        .quad 0xBEDF3EAC5BF12194  /* A00 = -7.449344990702664567927e-06 */
+        .quad 0x3FF001112A520784  /* A01 = +1.000260510744255704196e+00 */
+        .quad 0xBF6A8D01ABDA4DC4  /* A02 = -3.241065277345108255891e-03 */
+        .quad 0xBFD45D55759FFA4A  /* A03 = -3.181966446572103146551e-01 */
+        .quad 0xBEE2A541BC274267  /* A00 = -8.890883582164319970972e-06 */
+        .quad 0x3FF0013A9E5961F2  /* A01 = +1.000300043631906721231e+00 */
+        .quad 0xBF6D82ECD080C540  /* A02 = -3.602468994380686462264e-03 */
+        .quad 0xBFD44B4A0779C0AD  /* A03 = -3.170952866557950611259e-01 */
+        .quad 0xBEE61D97609A27F4  /* A00 = -1.054553560499505625520e-05 */
+        .quad 0x3FF001688F56A3AF  /* A01 = +1.000343856731187974773e+00 */
+        .quad 0xBF7056F8EFB683EC  /* A02 = -3.989193351487490407647e-03 */
+        .quad 0xBFD438A5620F0F74  /* A03 = -3.159573991399533543500e-01 */
+        .quad 0xBEEA145429EDD370  /* A00 = -1.243563138839952927732e-05 */
+        .quad 0x3FF0019B4A242A67  /* A01 = +1.000392236341804297339e+00 */
+        .quad 0xBF7207D31CA78D9B  /* A02 = -4.401993423445739288258e-03 */
+        .quad 0xBFD42568BA16E7CD  /* A03 = -3.147832696228050619602e-01 */
+        .quad 0xBEEE96370D52680F  /* A00 = -1.458491207477835326165e-05 */
+        .quad 0x3FF001D31D8E4115  /* A01 = +1.000445476009251821736e+00 */
+        .quad 0xBF73D4CC11EDC094  /* A02 = -4.841611050196221316400e-03 */
+        .quad 0xBFD411954D8664E7  /* A03 = -3.135731942252974469021e-01 */
+        .quad 0xBEF338C046215EF8  /* A00 = -1.833122622260562810219e-05 */
+        .quad 0x3FF00230C32C2EC1  /* A01 = +1.000534784691737621998e+00 */
+        .quad 0xBF76BD019BCC5DAF  /* A02 = -5.551344188254799492943e-03 */
+        .quad 0xBFD3F2C7156DC21E  /* A03 = -3.116929730668135389848e-01 */
+        .quad 0xBEF9B15EAE411EAE  /* A00 = -2.450261207822986676092e-05 */
+        .quad 0x3FF002C2DF057A4D  /* A01 = +1.000674124886830940184e+00 */
+        .quad 0xBF7B08CCD9AC1E30  /* A02 = -6.600189396301511801646e-03 */
+        .quad 0xBFD3C7A7A114FED8  /* A03 = -3.090609620157755976777e-01 */
+        .quad 0xBF00E36483C373B3  /* A00 = -3.221178528332122595812e-05 */
+        .quad 0x3FF0036F419480D7  /* A01 = +1.000838524028997644777e+00 */
+        .quad 0xBF7FD255D1777007  /* A02 = -7.768950679260206403087e-03 */
+        .quad 0xBFD39A453911D6CE  /* A03 = -3.062909180947429588215e-01 */
+        .quad 0xBF05DFA04DD12059  /* A00 = -4.172046622180685472624e-05 */
+        .quad 0x3FF00438B2A03D8D  /* A01 = +1.001030633695197069599e+00 */
+        .quad 0xBF828F8DBB4A9D10  /* A02 = -9.062869337255224921890e-03 */
+        .quad 0xBFD36AAB704697D9  /* A03 = -3.033856007044711255993e-01 */
+        .quad 0xBF0BF3E0C647DEFB  /* A00 = -5.331544597092331081714e-05 */
+        .quad 0x3FF005221063D36D  /* A01 = +1.001253189109060359741e+00 */
+        .quad 0xBF857A2CB3C96102  /* A02 = -1.048693584122917590862e-02 */
+        .quad 0xBFD338E65BBB4FEC  /* A03 = -3.003478904549854444639e-01 */
+        .quad 0xBF11A506ED7C9D31  /* A00 = -6.730894835681591541979e-05 */
+        .quad 0x3FF0062E4D0EA92A  /* A01 = +1.001508999829250345925e+00 */
+        .quad 0xBF88AB82C2761AF3  /* A02 = -1.204588085125866091241e-02 */
+        .quad 0xBFD305028D6BD206  /* A03 = -2.971807843271395688234e-01 */
+        .quad 0xBF1607C0922D9BF1  /* A00 = -8.403885708006799337092e-05 */
+        .quad 0x3FF007606C341961  /* A01 = +1.001800940198869449560e+00 */
+        .quad 0xBF8C25E6DA487BCF  /* A02 = -1.374416688582682892494e-02 */
+        .quad 0xBFD2CF0D0EE8F7B5  /* A03 = -2.938873906713255768075e-01 */
+        .quad 0xBF1B3A8480A0A16D  /* A00 = -1.038688061788578038307e-04 */
+        .quad 0x3FF008BB802D02D6  /* A01 = +1.002131939589323561535e+00 */
+        .quad 0xBF8FEB8AE99FD100  /* A02 = -1.558598065819483124983e-02 */
+        .quad 0xBFD297135BD0911B  /* A03 = -2.904709240558688843059e-01 */
+        .quad 0xBF20ABB9BDB75C65  /* A00 = -1.271881327357976163798e-04 */
+        .quad 0x3FF00A42A76D8CD1  /* A01 = +1.002504972472525901495e+00 */
+        .quad 0xBF91FF3D752BB9E6  /* A02 = -1.757522609380570560722e-02 */
+        .quad 0xBFD25D235C1F88B4  /* A03 = -2.869346999779154305799e-01 */
+        .quad 0xBF243D3254425461  /* A00 = -1.544116913733432829448e-04 */
+        .quad 0x3FF00BF909D1795E  /* A01 = +1.002923048355647051011e+00 */
+        .quad 0xBF94304E04D44942  /* A02 = -1.971551804042204897316e-02 */
+        .quad 0xBFD2214B5E61CFA6  /* A03 = -2.832821294498394371075e-01 */
+        .quad 0xBF286070011B61CE  /* A00 = -1.859795307186510085994e-04 */
+        .quad 0x3FF00DE1D5E1627E  /* A01 = +1.003389201612804537689e+00 */
+        .quad 0xBF9689D5F4163F59  /* A02 = -2.201017668045266231780e-02 */
+        .quad 0xBFD1E39A11C3B42C  /* A03 = -2.795167134743816728104e-01 */
+        .quad 0xBF2D250B366A79E8  /* A00 = -2.223564326486314902259e-04 */
+        .quad 0x3FF010003E134001  /* A01 = +1.003906481248123094829e+00 */
+        .quad 0xBF990C9FF91F6F81  /* A02 = -2.446222265267250853271e-02 */
+        .quad 0xBFD1A41E80084CDC  /* A03 = -2.756420374218586655246e-01 */
+        .quad 0xBF314DB5DDC2A30E  /* A00 = -2.640313157465248123865e-04 */
+        .quad 0x3FF012577608921B  /* A01 = +1.004477940624503018441e+00 */
+        .quad 0xBF9BB9626875B0C9  /* A02 = -2.707437288829409385849e-02 */
+        .quad 0xBFD162E80768A9D0  /* A03 = -2.716617653228725615122e-01 */
+        .quad 0xBF346A6133808864  /* A00 = -3.115165050094957730625e-04 */
+        .quad 0x3FF014EAAFCC88A3  /* A01 = +1.005106627192198898157e+00 */
+        .quad 0xBF9E90BEF9BF7419  /* A02 = -2.984903716411588595059e-02 */
+        .quad 0xBFD12006545F7FAD  /* A03 = -2.675796340899932457269e-01 */
+        .quad 0xBF37F180DC3848EA  /* A00 = -3.653468704395550778821e-04 */
+        .quad 0x3FF017BD19147861  /* A01 = +1.005795572250939295955e+00 */
+        .quad 0xBFA0C9A14C702E07  /* A02 = -3.278831537326359207851e-02 */
+        .quad 0xBFD0DB895B650092  /* A03 = -2.633994476818851682154e-01 */
+        .quad 0xBF3BEC6AAC6D7635  /* A00 = -4.260788377246944457107e-04 */
+        .quad 0x3FF01AD1D884E719  /* A01 = +1.006547780778822565040e+00 */
+        .quad 0xBFA260B2A1B1434A  /* A02 = -3.589399551186163439542e-02 */
+        .quad 0xBFD09581529E93D6  /* A03 = -2.591250712233067465817e-01 */
+        .quad 0xBF4164E26167882B  /* A00 = -5.308251737086202562063e-04 */
+        .quad 0x3FF01FEF14B62B81  /* A01 = +1.007796364693348545316e+00 */
+        .quad 0xBFA4EB014538AA42  /* A02 = -4.085544557559163403315e-02 */
+        .quad 0xBFD029D36FEAF41F  /* A03 = -2.525528519580024222613e-01 */
+        .quad 0xBF46F6FFF4E53DC8  /* A00 = -7.008313930700277652464e-04 */
+        .quad 0x3FF027CBB51CBBA0  /* A01 = +1.009715754956893363214e+00 */
+        .quad 0xBFA89DEC9FEC112E  /* A02 = -4.807986690687680864098e-02 */
+        .quad 0xBFCF2A99464D0DB4  /* A03 = -2.434875100390009317053e-01 */
+        .quad 0xBF4DCC9C4F66A4D9  /* A00 = -9.094012482836712945103e-04 */
+        .quad 0x3FF030E7CFCCD583  /* A01 = +1.011939822882909068014e+00 */
+        .quad 0xBFACAA3B95814081  /* A02 = -5.598627281199331645611e-02 */
+        .quad 0xBFCDF78F156BE7CF  /* A03 = -2.341173987004467604844e-01 */
+        .quad 0xBF5308ED74E5C7A6  /* A00 = -1.161796466103906435435e-03 */
+        .quad 0x3FF03B5986412ECB  /* A01 = +1.014489674026594512313e+00 */
+        .quad 0xBFB087EBA88DCC3F  /* A02 = -6.457398285947223148806e-02 */
+        .quad 0xBFCCBB9BD134862F  /* A03 = -2.244753619680052991736e-01 */
+        .quad 0xBF57FA23C00DF4B5  /* A00 = -1.463446533505758208674e-03 */
+        .quad 0x3FF0473558A1BCC0  /* A01 = +1.017384859292903342975e+00 */
+        .quad 0xBFB2E702BC6360EF  /* A02 = -7.383744334527241048871e-02 */
+        .quad 0xBFCB77D546379288  /* A03 = -2.145945160729250122955e-01 */
+        .quad 0xBF5DD12971557F71  /* A00 = -1.819887610814388068450e-03 */
+        .quad 0x3FF0548DDF5000A8  /* A01 = +1.020643112482540360020e+00 */
+        .quad 0xBFB571B63DA186E1  /* A02 = -8.376635555898871710045e-02 */
+        .quad 0xBFCA2D5202605148  /* A03 = -2.045080672838912594358e-01 */
+        .quad 0xBF6252B1AD5D4F17  /* A00 = -2.236697221556737096709e-03 */
+        .quad 0x3FF063738A910BF7  /* A01 = +1.024280110622155737232e+00 */
+        .quad 0xBFB8270C8E6B601B  /* A02 = -9.434584118878357184013e-02 */
+        .quad 0xBFC8DD27D950A07E  /* A03 = -1.942491351230763441116e-01 */
+        .quad 0xBF66470C91730CFC  /* A00 = -2.719425723258004842786e-03 */
+        .quad 0x3FF073F468FCF331  /* A01 = +1.028309259519300633556e+00 */
+        .quad 0xBFBB05C2952191E4  /* A02 = -1.055566419686964629854e-01 */
+        .quad 0xBFC7886A770DE2BD  /* A03 = -1.838505822486435070662e-01 */
+        .quad 0xBF6AD114AC8E98EC  /* A00 = -3.273525599485007861467e-03 */
+        .quad 0x3FF0861BF53E5226  /* A01 = +1.032741506559554434119e+00 */
+        .quad 0xBFBE0C4F9B461507  /* A02 = -1.173753503881763554650e-01 */
+        .quad 0xBFC6302A037CDE3A  /* A03 = -1.733448521642786954722e-01 */
+        .quad 0xBF6FFBDE2A6C2AF8  /* A00 = -3.904279630096648551207e-03 */
+        .quad 0x3FF099F2EB8E7DA3  /* A01 = +1.037585182326304034106e+00 */
+        .quad 0xBFC09C74D192DDF0  /* A02 = -1.297746680554463516444e-01 */
+        .quad 0xBFC4D571D8E3079F  /* A03 = -1.627638157861470424859e-01 */
+        .quad 0xBF72E8FDC0B952AA  /* A00 = -4.616728994353872309042e-03 */
+        .quad 0x3FF0AF7F273C9533  /* A01 = +1.042845872181101141152e+00 */
+        .quad 0xBFC244C512736F10  /* A02 = -1.427236881344176033792e-01 */
+        .quad 0xBFC379474F58B902  /* A03 = -1.521386277613104298645e-01 */
+        .quad 0xBF762EABAF17395B  /* A00 = -5.415602341101023557701e-03 */
+        .quad 0x3FF0C6C3886F63FB  /* A01 = +1.048526318502125631582e+00 */
+        .quad 0xBFC3FDF9918EA12A  /* A02 = -1.561881981590514389957e-01 */
+        .quad 0xBFC21CA89ECAB895  /* A03 = -1.414995932913753196036e-01 */
+        .quad 0xBF79D387CE5B2BAE  /* A00 = -6.305246822828998107258e-03 */
+        .quad 0x3FF0DFBFE2346376  /* A01 = +1.054626353847394337748e+00 */
+        .quad 0xBFC5C6DA43602620  /* A02 = -1.701309994680721970894e-01 */
+        .quad 0xBFC0C08BD8DB6631  /* A03 = -1.308760460731704100557e-01 */
+        .quad 0xBF7DDBA8E8DA9060  /* A00 = -7.289562037531366334164e-03 */
+        .quad 0x3FF0FA70F0D1B464  /* A01 = +1.061142864894713433443e+00 */
+        .quad 0xBFC79E18D92BAA7C  /* A02 = -1.845122394946264732241e-01 */
+        .quad 0xBFBECBBBF74C2669  /* A03 = -1.202962378266875381749e-01 */
+        .quad 0xBF81254E76EA25DA  /* A00 = -8.371937755572145950511e-03 */
+        .quad 0x3FF116D05835EBD0  /* A01 = +1.068069786618014660462e+00 */
+        .quad 0xBFC982539E2ED224  /* A02 = -1.992897531869327609755e-01 */
+        .quad 0xBFBC1B043C350159  /* A03 = -1.097872397413132278254e-01 */
+        .quad 0xBF8391ACBA863403  /* A00 = -9.555196230190082448686e-03 */
+        .quad 0x3FF134D4AA477FE2  /* A01 = +1.075398125794884141015e+00 */
+        .quad 0xBFCB7218609FEAFB  /* A02 = -2.144194099235717521079e-01 */
+        .quad 0xBFB970A16CB88329  /* A03 = -9.937485603633135211599e-02 */
+        .quad 0xBF87935088E48E8B  /* A00 = -1.151144902957603431692e-02 */
+        .quad 0x3FF1649892AD7DD3  /* A01 = +1.087059567413110938716e+00 */
+        .quad 0xBFCE6971DDE75409  /* A02 = -2.375929196847723912089e-01 */
+        .quad 0xBFB58291E88CB251  /* A03 = -8.402358939628952472223e-02 */
+        .quad 0xBF8DB3A62C325325  /* A00 = -1.450280973794233242702e-02 */
+        .quad 0x3FF1A9C900C6DEEA  /* A01 = +1.103951457056548068891e+00 */
+        .quad 0xBFD13DBC65B0E08E  /* A02 = -2.693930619311765140012e-01 */
+        .quad 0xBFB06696F62696D1  /* A03 = -6.406539449252625362252e-02 */
+        .quad 0xBF92583699F2E27A  /* A00 = -1.791463198307716858659e-02 */
+        .quad 0x3FF1F451B85AA9F0  /* A01 = +1.122148246892376022288e+00 */
+        .quad 0xBFD34FD5F8288180  /* A02 = -3.017477916164565954205e-01 */
+        .quad 0xBFA6FB692825B683  /* A03 = -4.488686194495718900788e-02 */
+        .quad 0xBF9641C26E673D6F  /* A00 = -2.173522757385398448959e-02 */
+        .quad 0x3FF24364DA5E2B07  /* A01 = +1.141453602790251542487e+00 */
+        .quad 0xBFD564A5A5EF5890  /* A02 = -3.342680092295120530821e-01 */
+        .quad 0xBF9B43712011A982  /* A03 = -2.662445791467283467968e-02 */
+        .quad 0xBF9A901038EC2F39  /* A00 = -2.594018313816024226548e-02 */
+        .quad 0x3FF2961356DFFEBA  /* A01 = +1.161639537196534011088e+00 */
+        .quad 0xBFD775EBB17198C7  /* A02 = -3.665723069046972759644e-01 */
+        .quad 0xBF833B1A926CD462  /* A03 = -9.390075295963199591975e-03 */
+        .quad 0xBF9F396A6A461B91  /* A00 = -3.049246095317987084727e-02 */
+        .quad 0x3FF2EB53BAEF534B  /* A01 = +1.182452898229899629357e+00 */
+        .quad 0xBFD97DABF8AD8BBD  /* A02 = -3.982953957076310058660e-01 */
+        .quad 0x3F7B8F6A3E0F8837  /* A03 = +6.728568086119371925713e-03 */
+        .quad 0xBFA21878590F8BAA  /* A00 = -3.534294211546946951064e-02 */
+        .quad 0x3FF34209790236E1  /* A01 = +1.203622315111197105253e+00 */
+        .quad 0xBFDB764C0E71BECB  /* A02 = -4.290952817018306997277e-01 */
+        .quad 0x3F962FE0C03F84C0  /* A03 = +2.166701482190513949888e-02 */
+        .quad 0xBFA4B36B9AD27ECC  /* A00 = -4.043136849327097492868e-02 */
+        .quad 0x3FF3990C5B12FC16  /* A01 = +1.224865298994477935679e+00 */
+        .quad 0xBFDD5AABB0D01390  /* A02 = -4.586590983092770912322e-01 */
+        .quad 0x3FA21DAF5CA162DB  /* A03 = +3.538272863142363083844e-02 */
+        .quad 0xBFA7645E4D7BF28B  /* A00 = -4.568762489177399105378e-02 */
+        .quad 0x3FF3EF2FD51C0D9F  /* A01 = +1.245895225962932562069e+00 */
+        .quad 0xBFDF26377E1B686E  /* A02 = -4.867075664057044503963e-01 */
+        .quad 0x3FA8803E756EE812  /* A03 = +4.785342391501513914509e-02 */
+        .quad 0xBFAA210925C64413  /* A00 = -5.103329263796054643398e-02 */
+        .quad 0x3FF44349F897D8E7  /* A01 = +1.266427966181760345066e+00 */
+        .quad 0xBFE06A7B02C6D8E2  /* A02 = -5.129981092675530707226e-01 */
+        .quad 0x3FAE3F194734F5D0  /* A03 = +5.907515520309980505687e-02 */
+        .quad 0xBFACDE48F8A19BBB  /* A00 = -5.638340029764018351832e-02 */
+        .quad 0x3FF49439D5466582  /* A01 = +1.286187966447272845727e+00 */
+        .quad 0xBFE131C7C1063DDC  /* A02 = -5.373266954429101183166e-01 */
+        .quad 0x3FB1ADEEC36AD805  /* A03 = +6.906025191241844940482e-02 */
+        .quad 0xBFAF905D8F585680  /* A00 = -6.164829611604449866036e-02 */
+        .quad 0x3FF4E0ED1FD27F99  /* A01 = +1.304913639360142818546e+00 */
+        .quad 0xBFE1E7A859DC1D3D  /* A02 = -5.595285182070380836095e-01 */
+        .quad 0x3FB3ED018E4642A1  /* A03 = +7.783517573831001679086e-02 */
+        .quad 0xBFB11595104160BA  /* A00 = -6.673556944713512906198e-02 */
+        .quad 0x3FF528650340490B  /* A01 = +1.322361958217302513319e+00 */
+        .quad 0xBFE28B14B40BC974  /* A02 = -5.794776455425521000109e-01 */
+        .quad 0x3FB5DF49F5BAF6D7  /* A03 = +8.543836831355676453281e-02 */
+        .quad 0xBFB2513A97344BA4  /* A00 = -7.155195418844911836587e-02 */
+        .quad 0x3FF569BA0DB5EE14  /* A01 = +1.338312200124055273420e+00 */
+        .quad 0xBFE31B53A8B67B20  /* A02 = -5.970857901737396389308e-01 */
+        .quad 0x3FB787F297BB0544  /* A03 = +9.191814617499455275507e-02 */
+        .quad 0xBFB37512E848FAFA  /* A00 = -7.600515528700305112331e-02 */
+        .quad 0x3FF5A41F33B403C8  /* A01 = +1.352568819013173495591e+00 */
+        .quad 0xBFE397F6EA9A58A5  /* A02 = -6.123003561103997904880e-01 */
+        .quad 0x3FB8EAA9FF25CA06  /* A03 = +9.733068923177520814782e-02 */
+        .quad 0xBFB47B3E603AFC5D  /* A00 = -8.000554894805263217439e-02 */
+        .quad 0x3FF5D6E3EDE40487  /* A01 = +1.364963464031718975988e+00 */
+        .quad 0xBFE400D5BCA6D631  /* A02 = -6.251019177058819709103e-01 */
+        .quad 0x3FBA0B830ED567FE  /* A03 = +1.017381583418739132707e-01 */
+        .quad 0xBFB5BBFE8AC90496  /* A00 = -8.489981544791400103200e-02 */
+        .quad 0x3FF612BA70107E95  /* A01 = +1.379572332145390989311e+00 */
+        .quad 0xBFE477EAF1FA7693  /* A02 = -6.396383978023599814478e-01 */
+        .quad 0x3FBB4784B7C08A95  /* A03 = +1.065600346196709652391e-01 */
+        .quad 0xBFB6D5D940743939  /* A00 = -8.920057128509463473254e-02 */
+        .quad 0x3FF644A8748F70CE  /* A01 = +1.391762214006166953340e+00 */
+        .quad 0xBFE4D646AB07EA37  /* A02 = -6.511567440459832267763e-01 */
+        .quad 0x3FBC354F4E1D5292  /* A03 = +1.101884427747086558913e-01 */
+        .quad 0xBFB7223D19E4F3D1  /* A00 = -9.036619074045339206069e-02 */
+        .quad 0x3FF6518FEB42B7FA  /* A01 = +1.394912642466350494175e+00 */
+        .quad 0xBFE4ED86CB87498C  /* A02 = -6.539949393430091184598e-01 */
+        .quad 0x3FBC6D29F28CCA9B  /* A03 = +1.110407082713131127205e-01 */
+        .quad 0xBFB6878652FF6312  /* A00 = -8.800544287022329936754e-02 */
+        .quad 0x3FF63948C302D040  /* A01 = +1.388985406648330922508e+00 */
+        .quad 0xBFE4C4E2E7904E17  /* A02 = -6.490339777687407218920e-01 */
+        .quad 0x3FBC127356CA1ABE  /* A03 = +1.096565329445224612481e-01 */
+        .quad 0xBFB4F5D18B0C91D6  /* A00 = -8.187589306596207427980e-02 */
+        .quad 0x3FF5FD27EB7DD0B8  /* A01 = +1.374305648697413673176e+00 */
+        .quad 0xBFE464E01A2B2FC6  /* A02 = -6.373138915164353601739e-01 */
+        .quad 0x3FBB460547674A30  /* A03 = +1.065371798825160976065e-01 */
+        .quad 0xBFB26642FA16A685  /* A00 = -7.187288861919156890412e-02 */
+        .quad 0x3FF59F9BEDE1C95A  /* A01 = +1.351467065073470141812e+00 */
+        .quad 0xBFE3D67920C8FBEA  /* A02 = -6.199308052381387046381e-01 */
+        .quad 0x3FBA24F6A8D3CBC1  /* A03 = +1.021265184570401413078e-01 */
+        .quad 0xBFADB5294794F097  /* A00 = -5.802277563859197656582e-02 */
+        .quad 0x3FF523EA7B9CF453  /* A01 = +1.321268542159732772845e+00 */
+        .quad 0xBFE322A8B55E35DB  /* A02 = -5.979808370918208160205e-01 */
+        .quad 0x3FB8C8673B1B3E37  /* A03 = +9.680791085269722928697e-02 */
+        .quad 0xBFA4B7D661965C6A  /* A00 = -4.046506825687219699450e-02 */
+        .quad 0x3FF48DE3E2CE3122  /* A01 = +1.284641157110919085227e+00 */
+        .quad 0xBFE251FED1A7F445  /* A02 = -5.725092024655472622285e-01 */
+        .quad 0x3FB745699FCABDB9  /* A03 = +9.090290213747821701507e-02 */
+        .quad 0xBF93E60456E4EE1D  /* A00 = -1.943213253365004902773e-02 */
+        .quad 0x3FF3E1A14E628A59  /* A01 = +1.242585474196536532432e+00 */
+        .quad 0xBFE16C5AB660E876  /* A02 = -5.444768488007543094653e-01 */
+        .quad 0x3FB5AD33AA8C188F  /* A03 = +8.467410005332197397987e-02 */
+        .quad 0x3F738C17C47C7961  /* A00 = +4.772274820224659853951e-03 */
+        .quad 0x3FF3234DDE3BD146  /* A01 = +1.196119182682268355933e+00 */
+        .quad 0xBFE078C0D77A9D3B  /* A02 = -5.147403915952176722826e-01 */
+        .quad 0x3FB40D74B3E276B8  /* A03 = +7.833032027925923568290e-02 */
+        .quad 0x3FA0474BECC689C7  /* A00 = +3.179394975019849550746e-02 */
+        .quad 0x3FF256FB4FA7D18A  /* A01 = +1.146235762743432307076e+00 */
+        .quad 0xBFDEFA8E3FB285E2  /* A02 = -4.840427038235174395098e-01 */
+        .quad 0x3FB270C007493D59  /* A03 = +7.203293016322244446403e-02 */
+        .quad 0x3FAF5BD51E479BDC  /* A00 = +6.124750132203590768931e-02 */
+        .quad 0x3FF18081D0B53BC5  /* A01 = +1.093873801484492647162e+00 */
+        .quad 0xBFDCFE2439BD0C03  /* A02 = -4.530115665294831006626e-01 */
+        .quad 0x3FB0DEFE5A45AFDD  /* A03 = +6.590261176978580437424e-02 */
+        .quad 0x3FB7BD5D2806EA26  /* A00 = +9.273321368429118805032e-02 */
+        .quad 0x3FF0A369E35B4440  /* A01 = +1.039895904647224256223e+00 */
+        .quad 0xBFDB04BC5C9951E7  /* A02 = -4.221640495573226181669e-01 */
+        .quad 0x3FAEBBBAA9D6DEEF  /* A03 = +6.002600978120919278380e-02 */
+        .quad 0x3FC01BE411098DBC  /* A00 = +1.258511622610124502941e-01 */
+        .quad 0x3FEF85BDABC031C1  /* A01 = +9.850757936961188621083e-01 */
+        .quad 0xBFD91521375097C2  /* A02 = -3.919146576102968682065e-01 */
+        .quad 0x3FABE26F0086D982  /* A03 = +5.446192628317005068883e-02 */
+        .quad 0x3FC481D7FF5776B9  /* A00 = +1.602125164781023347604e-01 */
+        .quad 0x3FEDC3506C1E7218  /* A01 = +9.300920592973538347792e-01 */
+        .quad 0xBFD7349A88DA7D4F  /* A02 = -3.625856720409119104964e-01 */
+        .quad 0x3FA936E2DFF8E2AE  /* A03 = +4.924687370334389358018e-02 */
+        .quad 0x3FC90471F96FA27A  /* A00 = +1.954481571149420671141e-01 */
+        .quad 0x3FEC0451601987A2  /* A01 = +8.755270840595026360376e-01 */
+        .quad 0xBFD5671CD4B898DC  /* A02 = -3.344184949259110251063e-01 */
+        .quad 0x3FA6BB9594603B67  /* A03 = +4.439990459660841243261e-02 */
+        .quad 0x3FCFD8ADB9ED944C  /* A00 = +2.488000066615846384011e-01 */
+        .quad 0x3FE978C073F6809A  /* A01 = +7.959902062321078108909e-01 */
+        .quad 0xBFD2DF7E00BCD5A9  /* A02 = -2.948908812716931060471e-01 */
+        .quad 0x3FA3614033D490B2  /* A03 = +3.785133965200894456959e-02 */
+        .quad 0x3FD4846A12AFE5A0  /* A00 = +3.205819303981005674586e-01 */
+        .quad 0x3FE63A1147D40472  /* A01 = +6.945883181471244061100e-01 */
+        .quad 0xBFCFA2268AD34450  /* A02 = -2.471359422548027318101e-01 */
+        .quad 0x3F9F150201D9FFE0  /* A03 = +3.035357605267552383310e-02 */
+        .quad 0x3FD9018641F82BEB  /* A00 = +3.907180446846598154131e-01 */
+        .quad 0x3FE33B7C220FFBDC  /* A01 = +6.010113396913498995389e-01 */
+        .quad 0xBFCA4E4187E29C86  /* A02 = -2.055131829740483584423e-01 */
+        .quad 0x3F98C30CED19F8F4  /* A03 = +2.418155858185229434287e-02 */
+        .quad 0x3FDD4B8255BEB078  /* A00 = +4.577337109901757905561e-01 */
+        .quad 0x3FE0858B19D3A49B  /* A01 = +5.163016800335243905451e-01 */
+        .quad 0xBFC5BC929EACE564  /* A02 = -1.698172831327539045176e-01 */
+        .quad 0x3F93A083CE57DE2B  /* A03 = +1.916700312537337677621e-02 */
+        .quad 0x3FE0A8E5E039295C  /* A00 = +5.206174258576470315063e-01 */
+        .quad 0x3FDC35E1234583FE  /* A01 = +4.407885403107342225937e-01 */
+        .quad 0xBFC1DE034E31AEB9  /* A02 = -1.395877963835710222629e-01 */
+        .quad 0x3F8EFDEBB3471BDC  /* A03 = +1.513275280821162888101e-02 */
+        .quad 0x3FE2851B603CB2A5  /* A00 = +5.787484054213406503564e-01 */
+        .quad 0x3FD7F4A44ABBB286  /* A01 = +3.743067483726821853551e-01 */
+        .quad 0xBFBD3EEB67087DE7  /* A02 = -1.142413260026767657385e-01 */
+        .quad 0x3F8864F38329E8BD  /* A03 = +1.191129917173260922836e-02 */
+        .quad 0x3FE437DBE3C34AC1  /* A00 = +6.318187187665317283702e-01 */
+        .quad 0x3FD43F6F789441B5  /* A01 = +3.163717916040938438194e-01 */
+        .quad 0xBFB7D92E7901B9A4  /* A02 = -9.315767721429907277653e-02 */
+        .quad 0x3F8327ED342308E1  /* A03 = +9.353497651663324544136e-03 */
+        .quad 0x3FE5C0977766D55C  /* A00 = +6.797597248138731451661e-01 */
+        .quad 0x3FD10B42A764D8F9  /* A01 = +2.663122782427219115142e-01 */
+        .quad 0xBFB3633351D3D70F  /* A02 = -7.573242900602060456716e-02 */
+        .quad 0x3F7E079E30FF899C  /* A03 = +7.331483779099558922843e-03 */
+        .quad 0x3FE7202CE08A88C4  /* A00 = +7.226776490754436288455e-01 */
+        .quad 0x3FCC973EB5662B01  /* A01 = +2.233656297433626314319e-01 */
+        .quad 0xBFAF70A455F9920B  /* A02 = -6.140626477716545211782e-02 */
+        .quad 0x3F77812411CE99B6  /* A03 = +5.738392731393584730859e-03 */
+        .quad 0x3FE85879424095B1  /* A00 = +7.608000082006382003286e-01 */
+        .quad 0x3FC7E73BD1674D84  /* A01 = +1.867441914060742336190e-01 */
+        .quad 0xBFA96F84E4BF333B  /* A02 = -4.967894832916504993525e-02 */
+        .quad 0x3F72606DDCA6E117  /* A03 = +4.486493251924870105662e-03 */
+        .quad 0x3FE96BFE4957F4DD  /* A00 = +7.944327766887472330737e-01 */
+        .quad 0x3FC3ED4780D25478  /* A01 = +1.556786898624158421711e-01 */
+        .quad 0xBFA489C5F9A56B58  /* A02 = -4.011362717093075458408e-02 */
+        .quad 0x3F6CB5DC17E9AD2A  /* A03 = +3.504686231556104931972e-03 */
+        .quad 0x3FEA5D9CB2F41234  /* A00 = +8.239272589858672724006e-01 */
+        .quad 0x3FC091A758374DCF  /* A01 = +1.294449978582705440555e-01 */
+        .quad 0xBFA08E436D4B5CE0  /* A02 = -3.233538350257858517978e-02 */
+        .quad 0x3F666997AD53E6B7  /* A03 = +2.735897297154145629133e-03 */
+        .quad 0x3FEB3060342CB850  /* A00 = +8.496552485501158713532e-01 */
+        .quad 0x3FBB7D30BBC7DC1B  /* A01 = +1.073790033768634993860e-01 */
+        .quad 0xBF9AA6BA3443D9E3  /* A02 = -2.602663940430173170060e-02 */
+        .quad 0x3F617CA764B7850B  /* A03 = +2.134634914668814050648e-03 */
+        .quad 0x3FEBE759A6A0C7B8  /* A00 = +8.719909910635044170135e-01 */
+        .quad 0x3FB6C10DE6A703FF  /* A01 = +8.888327485239243264115e-02 */
+        .quad 0xBF956C566D8BE1F6  /* A02 = -2.092108768099084498138e-02 */
+        .quad 0x3F5B46D1A4A59CF8  /* A03 = +1.664833764687232917079e-03 */
+        .quad 0x3FEC858494887A04  /* A00 = +8.912985707318630268503e-01 */
+        .quad 0x3FB2CC31F543394D  /* A01 = +7.342827070099140762682e-02 */
+        .quad 0xBF9133477FF69137  /* A02 = -1.679717749142747504343e-02 */
+        .quad 0x3F5544482FBB4DA5  /* A03 = +1.298017973501022466823e-03 */
+        .quad 0x3FED0DB59D0E32E9  /* A00 = +9.079235141267335551518e-01 */
+        .quad 0x3FAF006BAFFC6EF4  /* A01 = +6.055008433597022787787e-02 */
+        .quad 0xBF8B97146FA2B97A  /* A02 = -1.347175565419144252499e-02 */
+        .quad 0x3F5093B01F4CDC69  /* A03 = +1.011774057770665211434e-03 */
+        .quad 0x3FEDB487C3EC457C  /* A00 = +9.282873942012623835751e-01 */
+        .quad 0x3FA7390C09D0BD1D  /* A01 = +4.535710925881118044112e-02 */
+        .quad 0xBF83D9F7C3181106  /* A02 = -9.693084374710735778846e-03 */
+        .quad 0x3F46E34A0A3C0E64  /* A03 = +6.984817050299072134500e-04 */
+        .quad 0x3FEE5FFCB4E6EB00  /* A00 = +9.492171796076434020506e-01 */
+        .quad 0x3F9F4913ED00AADF  /* A01 = +3.055220731782070861526e-02 */
+        .quad 0xBF79670BD0E59B5C  /* A02 = -6.201788097633133961528e-03 */
+        .quad 0x3F3BC998EBCAF96D  /* A03 = +4.240034429975534616304e-04 */
+        .quad 0x3FEEDBA41E9542FE  /* A00 = +9.643116566968215064293e-01 */
+        .quad 0x3F94F5DD18D9C24D  /* A01 = +2.046914543319848858727e-02 */
+        .quad 0xBF7034896AA122B9  /* A02 = -3.956352980886528904192e-03 */
+        .quad 0x3F30DCCB47810B39  /* A03 = +2.573009765038273091199e-04 */
+        .quad 0x3FEF33F2882520ED  /* A00 = +9.750912341196716903724e-01 */
+        .quad 0x3F8BF37F2CF553FF  /* A01 = +1.364802699996836392315e-02 */
+        .quad 0xBF649F6F05A69619  /* A02 = -2.517430152880317534986e-03 */
+        .quad 0x3F247623C950AAC9  /* A03 = +1.561087307505231250044e-04 */
+        .quad 0x3FEF727757751741  /* A00 = +9.827229221489021115943e-01 */
+        .quad 0x3F828E67912C4400  /* A01 = +9.060677640748693306705e-03 */
+        .quad 0xBF5A2F51A806CC2C  /* A02 = -1.598195784123355826789e-03 */
+        .quad 0x3F18D35D7687E613  /* A03 = +9.470231965016282719549e-05 */
+        .quad 0x3FEF9E6325C5942A  /* A00 = +9.880843866091073568469e-01 */
+        .quad 0x3F788AB117618F76  /* A01 = +5.991641772286606867914e-03 */
+        .quad 0xBF5096EAB0B1EA89  /* A02 = -1.012543859160305046233e-03 */
+        .quad 0x3F0E1E50EC4435AB  /* A03 = +5.744633156910412119652e-05 */
+        .quad 0x3FEFBD0784049369  /* A00 = +9.918248728250605994461e-01 */
+        .quad 0x3F702BBD8294035F  /* A01 = +3.947963975634432264028e-03 */
+        .quad 0xBF44FB55E0F00593  /* A02 = -6.403130845457509273330e-04 */
+        .quad 0x3F0244DCD723230A  /* A03 = +3.484534217219031730379e-05 */
+        .quad 0x3FEFD245E2366A43  /* A00 = +9.944180887426415926811e-01 */
+        .quad 0x3F653D82EC088433  /* A01 = +2.592807490387838333795e-03 */
+        .quad 0xBF3A7DF75E013CB8  /* A02 = -4.042366908878036561859e-04 */
+        .quad 0x3EF6298E69F991CD  /* A03 = +2.113564425911141559972e-05 */
+        .quad 0x3FEFE0EAA508BC69  /* A00 = +9.962056372950317539861e-01 */
+        .quad 0x3F5BD0771AF3FDDA  /* A01 = +1.697651208644282514598e-03 */
+        .quad 0xBF30B2E1254DE571  /* A02 = -2.548026725928887099328e-04 */
+        .quad 0x3EEAE28B70EC0256  /* A03 = +1.281973848454955042307e-05 */
+        .quad 0x3FEFEAF5303D7F96  /* A00 = +9.974313680831865536192e-01 */
+        .quad 0x3F5229111365657E  /* A01 = +1.108423877289460134782e-03 */
+        .quad 0xBF250572D04DFE66  /* A02 = -1.603796628408704519168e-04 */
+        .quad 0x3EE04E89BB57C981  /* A03 = +7.775682983689149966743e-06 */
+        .quad 0x3FEFF1CF52F1CF44  /* A00 = +9.982678051005469122003e-01 */
+        .quad 0x3F47A71316147CEB  /* A01 = +7.218211359577819110842e-04 */
+        .quad 0xBF1A6D7604055719  /* A02 = -1.008132248946049582547e-04 */
+        .quad 0x3ED3C8047586A85C  /* A03 = +4.716233739913014633626e-06 */
+        .quad 0x3FEFF6770369EF69  /* A00 = +9.988360468555416149528e-01 */
+        .quad 0x3F3EBB261180FBF0  /* A01 = +4.689186039321105101130e-04 */
+        .quad 0xBF1097754FE19D7F  /* A02 = -6.329206004950480057066e-05 */
+        .quad 0x3EC7FEFF83BCA0A7  /* A03 = +2.860556404988488738366e-06 */
+        .quad 0x3FEFF99D42371AC4  /* A00 = +9.992204945818561334647e-01 */
+        .quad 0x3F33EB2AEC271F59  /* A01 = +3.039340773764907474054e-04 */
+        .quad 0xBF04CF18E0FC0D79  /* A02 = -3.968996690952969588805e-05 */
+        .quad 0x3EBD1BDBD6019BE9  /* A03 = +1.735021065507727833886e-06 */
+        .quad 0x3FEFFBBCA32B0D91  /* A00 = +9.994795977476532700123e-01 */
+        .quad 0x3F29C41E1615110A  /* A01 = +1.965796209707565346710e-04 */
+        .quad 0xBEFA11F93D9DCB5A  /* A02 = -2.486248909101414873235e-05 */
+        .quad 0x3EB1A7CA4546F7A7  /* A03 = +1.052345642723709228769e-06 */
+        .quad 0x3FEFFD298B8E8DE2  /* A00 = +9.996535993308806045121e-01 */
+        .quad 0x3F20A1C42D523C5B  /* A01 = +1.268913244172078754520e-04 */
+        .quad 0xBEF0507A364AFAE4  /* A02 = -1.555859070622834605755e-05 */
+        .quad 0x3EA56ACA17E7CDF4  /* A03 = +6.382806956848098872313e-07 */
+        .quad 0x3FEFFE1DC82BA5A3  /* A00 = +9.997700604991915929176e-01 */
+        .quad 0x3F156E73B90F1769  /* A01 = +8.175450626798714452801e-05 */
+        .quad 0xBEE4663579D0A09F  /* A02 = -9.727122057226747625365e-06 */
+        .quad 0x3E99FAF6FEC5D4C1  /* A03 = +3.871371052824002996020e-07 */
+        .quad 0x3FEFFEF8D0BB5E81  /* A00 = +9.998745037837154514548e-01 */
+        .quad 0x3F06686DA18D39C3  /* A01 = +4.273972098777251447726e-05 */
+        .quad 0xBED46BC298073E90  /* A02 = -4.868731025855742842491e-06 */
+        .quad 0x3E88E42286B9D0FD  /* A03 = +1.854535328530838170114e-07 */
+        .quad 0x3FEFFF8DBC68DDC7  /* A00 = +9.999455146670975791423e-01 */
+        .quad 0x3EF26B2953A80AF0  /* A01 = +1.756534514108903368909e-05 */
+        .quad 0xBEBFC4472D580F83  /* A02 = -1.893443529411295465239e-06 */
+        .quad 0x3E72505B4553D19F  /* A03 = +6.822456673547912277047e-08 */
+        .quad 0x3FEFFFCED1276609  /* A00 = +9.999765477215883935358e-01 */
+        .quad 0x3EDE1A94C7CC58F5  /* A01 = +7.177313020153979672606e-06 */
+        .quad 0xBEA8A2C988744E57  /* A02 = -7.342066660497443762363e-07 */
+        .quad 0x3E5AF30036BBBAF4  /* A03 = +2.509841882843541084885e-08 */
+        .quad 0x3FEFFFEAFE70FCFC  /* A00 = +9.999899835164849370983e-01 */
+        .quad 0x3EC879175E3549F5  /* A01 = +2.917410471128503564412e-06 */
+        .quad 0xBE930E36677D1813  /* A02 = -2.839493400307523115929e-07 */
+        .quad 0x3E43D4005B42D48F  /* A03 = +9.233192745401904898013e-09 */
+        .quad 0x3ff0000000000000
+        .quad 0x0000000000000000
+        .quad 0x0000000000000000
+        .quad 0x0000000000000000
+        .align 32
+        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000           /* _sSignMask        */
+        .align 32
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff           /* _sAbsMask         */
+        .align 32
+        .long 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000           /* _iExpMantMask     */
+        .align 32
+        .long 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000           /* _iExpMask         */
+        .align 32
+        .long 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000           /* _iMinIdxOfsMask   */
+        .align 32
+        .long 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000           /* _iMaxIdxMask      */
+        .align 32
+        .type	__svml_stanh_data_internal,@object
+        .size	__svml_stanh_data_internal,.-__svml_stanh_data_internal
diff --git a/sysdeps/x86_64/fpu/svml_d_tanh2_core.S b/sysdeps/x86_64/fpu/svml_d_tanh2_core.S
new file mode 100644
index 0000000000..c703131777
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_tanh2_core.S
@@ -0,0 +1,29 @@
+/* Function tanh vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN2v_tanh)
+WRAPPER_IMPL_SSE2 tanh
+END (_ZGVbN2v_tanh)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN2v_tanh)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_tanh4_core.S b/sysdeps/x86_64/fpu/svml_d_tanh4_core.S
new file mode 100644
index 0000000000..fb293f4dba
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_tanh4_core.S
@@ -0,0 +1,29 @@
+/* Function tanh vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN4v_tanh)
+WRAPPER_IMPL_AVX _ZGVbN2v_tanh
+END (_ZGVdN4v_tanh)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN4v_tanh)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_tanh4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_tanh4_core_avx.S
new file mode 100644
index 0000000000..5385a2c27c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_tanh4_core_avx.S
@@ -0,0 +1,25 @@
+/* Function tanh vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN4v_tanh)
+WRAPPER_IMPL_AVX _ZGVbN2v_tanh
+END (_ZGVcN4v_tanh)
diff --git a/sysdeps/x86_64/fpu/svml_d_tanh8_core.S b/sysdeps/x86_64/fpu/svml_d_tanh8_core.S
new file mode 100644
index 0000000000..9dafa7bb9a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_tanh8_core.S
@@ -0,0 +1,25 @@
+/* Function tanh vectorized with AVX-512, wrapper to AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN8v_tanh)
+WRAPPER_IMPL_AVX512 _ZGVdN4v_tanh
+END (_ZGVeN8v_tanh)
diff --git a/sysdeps/x86_64/fpu/svml_s_tanhf16_core.S b/sysdeps/x86_64/fpu/svml_s_tanhf16_core.S
new file mode 100644
index 0000000000..19d51365e8
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_tanhf16_core.S
@@ -0,0 +1,25 @@
+/* Function tanhf vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN16v_tanhf)
+WRAPPER_IMPL_AVX512 _ZGVdN8v_tanhf
+END (_ZGVeN16v_tanhf)
diff --git a/sysdeps/x86_64/fpu/svml_s_tanhf4_core.S b/sysdeps/x86_64/fpu/svml_s_tanhf4_core.S
new file mode 100644
index 0000000000..6b98950f84
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_tanhf4_core.S
@@ -0,0 +1,29 @@
+/* Function tanhf vectorized with SSE2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN4v_tanhf)
+WRAPPER_IMPL_SSE2 tanhf
+END (_ZGVbN4v_tanhf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4v_tanhf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_tanhf8_core.S b/sysdeps/x86_64/fpu/svml_s_tanhf8_core.S
new file mode 100644
index 0000000000..3ada061ae0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_tanhf8_core.S
@@ -0,0 +1,29 @@
+/* Function tanhf vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN8v_tanhf)
+WRAPPER_IMPL_AVX _ZGVbN4v_tanhf
+END (_ZGVdN8v_tanhf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8v_tanhf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_tanhf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_tanhf8_core_avx.S
new file mode 100644
index 0000000000..255d45952d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_tanhf8_core_avx.S
@@ -0,0 +1,25 @@
+/* Function tanhf vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN8v_tanhf)
+WRAPPER_IMPL_AVX _ZGVbN4v_tanhf
+END (_ZGVcN8v_tanhf)
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx.c
new file mode 100644
index 0000000000..a456c574e2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-tanh.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx2.c
new file mode 100644
index 0000000000..a456c574e2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx2.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-tanh.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx512f.c
new file mode 100644
index 0000000000..a456c574e2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx512f.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-tanh.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-tanh.c b/sysdeps/x86_64/fpu/test-double-libmvec-tanh.c
new file mode 100644
index 0000000000..4cb6a169d8
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-tanh.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE double
+#define LIBMVEC_FUNC tanh
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index 9d91ccfe51..f53bb6813e 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -44,6 +44,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVbN2v_log1p)
 VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVbN2v_atanh)
 VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVbN2v_acosh)
 VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVbN2v_erf)
+VECTOR_WRAPPER (WRAPPER_NAME (tanh), _ZGVbN2v_tanh)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index 9e86d5fef8..0452c3db38 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -47,6 +47,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVdN4v_log1p)
 VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVdN4v_atanh)
 VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVdN4v_acosh)
 VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVdN4v_erf)
+VECTOR_WRAPPER (WRAPPER_NAME (tanh), _ZGVdN4v_tanh)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m256i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index 0f4ef00de4..197d5afc88 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -44,6 +44,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVcN4v_log1p)
 VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVcN4v_atanh)
 VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVcN4v_acosh)
 VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVcN4v_erf)
+VECTOR_WRAPPER (WRAPPER_NAME (tanh), _ZGVcN4v_tanh)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index 975dff85af..e56ece640c 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -44,6 +44,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVeN8v_log1p)
 VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVeN8v_atanh)
 VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVeN8v_acosh)
 VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVeN8v_erf)
+VECTOR_WRAPPER (WRAPPER_NAME (tanh), _ZGVeN8v_tanh)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx.c
new file mode 100644
index 0000000000..254f9201aa
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-tanhf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx2.c
new file mode 100644
index 0000000000..254f9201aa
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx2.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-tanhf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx512f.c
new file mode 100644
index 0000000000..254f9201aa
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx512f.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-tanhf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-tanhf.c b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf.c
new file mode 100644
index 0000000000..9a61ee8f9c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE float
+#define LIBMVEC_FUNC tanhf
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index 2b1e27391a..abbebf9993 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -44,6 +44,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVeN16v_log1pf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVeN16v_atanhf)
 VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVeN16v_acoshf)
 VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVeN16v_erff)
+VECTOR_WRAPPER (WRAPPER_NAME (tanhf), _ZGVeN16v_tanhf)
 
 #define VEC_INT_TYPE __m512i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index 78428bf517..ae1c8b98c2 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -44,6 +44,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVbN4v_log1pf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVbN4v_atanhf)
 VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVbN4v_acoshf)
 VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVbN4v_erff)
+VECTOR_WRAPPER (WRAPPER_NAME (tanhf), _ZGVbN4v_tanhf)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index dadd4e6ca0..eb477a0371 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -47,6 +47,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVdN8v_log1pf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVdN8v_atanhf)
 VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVdN8v_acoshf)
 VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVdN8v_erff)
+VECTOR_WRAPPER (WRAPPER_NAME (tanhf), _ZGVdN8v_tanhf)
 
 /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
 #undef VECTOR_WRAPPER_fFF
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index 7b2d583e54..944f7f0a75 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -44,6 +44,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVcN8v_log1pf)
 VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVcN8v_atanhf)
 VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVcN8v_acoshf)
 VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVcN8v_erff)
+VECTOR_WRAPPER (WRAPPER_NAME (tanhf), _ZGVcN8v_tanhf)
 
 #define VEC_INT_TYPE __m128i
 
-- 
2.31.1



* [PATCH v5 18/18] x86-64: Add vector asinh/asinhf implementation to libmvec
  2021-12-29  6:39 [PATCH v5 00/18] x86-64: Add vector math functions to libmvec Sunil K Pandey
                   ` (16 preceding siblings ...)
  2021-12-29  6:39 ` [PATCH v5 17/18] x86-64: Add vector tanh/tanhf " Sunil K Pandey
@ 2021-12-29  6:40 ` Sunil K Pandey
  2021-12-29 21:27   ` H.J. Lu
  17 siblings, 1 reply; 40+ messages in thread
From: Sunil K Pandey @ 2021-12-29  6:40 UTC (permalink / raw)
  To: libc-alpha; +Cc: hjl.tools, andrey.kolesov, marius.cornea

Implement vectorized asinh/asinhf with SSE, AVX, AVX2 and AVX512
versions for libmvec, as per the vector ABI.  The patch also contains
accuracy and ABI tests for vector asinh/asinhf with regenerated ulps.
---
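Not part of the commit message: a minimal caller sketch for reviewers,
assuming GCC with libmvec support and a glibc that includes this
series.  The demo.c file name and the chosen flags are illustrative;
the vector variants (_ZGVbN2v_asinh, _ZGVdN4v_asinh, _ZGVeN8v_asinh
and the asinhf counterparts) are selected by the compiler through the
OpenMP SIMD declarations in math-vector.h rather than called directly.

/* demo.c - may be built, for example, with:
   gcc -O2 -ftree-loop-vectorize -fopenmp-simd -ffast-math demo.c -lmvec -lm
   (-lmvec may be implicit depending on the toolchain).  */
#include <math.h>
#include <stdio.h>

#define N 1024

int
main (void)
{
  double in[N], out[N];

  for (int i = 0; i < N; i++)
    in[i] = i * 0.01;

  /* With asinh declared as an OpenMP SIMD function, the compiler may
     replace the scalar calls in this loop with one of the new
     _ZGV*v_asinh entry points, depending on the target ISA.  */
#pragma omp simd
  for (int i = 0; i < N; i++)
    out[i] = asinh (in[i]);

  printf ("%f\n", out[N - 1]);
  return 0;
}
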
 bits/libm-simd-decl-stubs.h                   |   11 +
 math/bits/mathcalls.h                         |    2 +-
 .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
 sysdeps/x86/fpu/bits/math-vector.h            |    4 +
 .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
 sysdeps/x86_64/fpu/Makeconfig                 |    1 +
 sysdeps/x86_64/fpu/Versions                   |    2 +
 sysdeps/x86_64/fpu/libm-test-ulps             |   17 +
 .../fpu/multiarch/svml_d_asinh2_core-sse2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_asinh2_core.c |   27 +
 .../fpu/multiarch/svml_d_asinh2_core_sse4.S   | 1662 +++++++++++++++++
 .../fpu/multiarch/svml_d_asinh4_core-sse.S    |   20 +
 .../x86_64/fpu/multiarch/svml_d_asinh4_core.c |   27 +
 .../fpu/multiarch/svml_d_asinh4_core_avx2.S   | 1601 ++++++++++++++++
 .../fpu/multiarch/svml_d_asinh8_core-avx2.S   |   20 +
 .../x86_64/fpu/multiarch/svml_d_asinh8_core.c |   27 +
 .../fpu/multiarch/svml_d_asinh8_core_avx512.S |  510 +++++
 .../fpu/multiarch/svml_s_asinhf16_core-avx2.S |   20 +
 .../fpu/multiarch/svml_s_asinhf16_core.c      |   28 +
 .../multiarch/svml_s_asinhf16_core_avx512.S   |  476 +++++
 .../fpu/multiarch/svml_s_asinhf4_core-sse2.S  |   20 +
 .../fpu/multiarch/svml_s_asinhf4_core.c       |   28 +
 .../fpu/multiarch/svml_s_asinhf4_core_sse4.S  |  509 +++++
 .../fpu/multiarch/svml_s_asinhf8_core-sse.S   |   20 +
 .../fpu/multiarch/svml_s_asinhf8_core.c       |   28 +
 .../fpu/multiarch/svml_s_asinhf8_core_avx2.S  |  457 +++++
 sysdeps/x86_64/fpu/svml_d_asinh2_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_asinh4_core.S       |   29 +
 sysdeps/x86_64/fpu/svml_d_asinh4_core_avx.S   |   25 +
 sysdeps/x86_64/fpu/svml_d_asinh8_core.S       |   25 +
 sysdeps/x86_64/fpu/svml_s_asinhf16_core.S     |   25 +
 sysdeps/x86_64/fpu/svml_s_asinhf4_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_asinhf8_core.S      |   29 +
 sysdeps/x86_64/fpu/svml_s_asinhf8_core_avx.S  |   25 +
 .../fpu/test-double-libmvec-asinh-avx.c       |    1 +
 .../fpu/test-double-libmvec-asinh-avx2.c      |    1 +
 .../fpu/test-double-libmvec-asinh-avx512f.c   |    1 +
 .../x86_64/fpu/test-double-libmvec-asinh.c    |    3 +
 .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
 .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
 .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
 .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
 .../fpu/test-float-libmvec-asinhf-avx.c       |    1 +
 .../fpu/test-float-libmvec-asinhf-avx2.c      |    1 +
 .../fpu/test-float-libmvec-asinhf-avx512f.c   |    1 +
 .../x86_64/fpu/test-float-libmvec-asinhf.c    |    3 +
 .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
 .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
 .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
 .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
 50 files changed, 5784 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core-avx2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core-sse2.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core-sse.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_asinh2_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_asinh4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_asinh4_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_d_asinh8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_asinhf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_asinhf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_asinhf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_asinhf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asinh.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx2.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx512f.c
 create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinhf.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
index 21f1a43232..bcaddb7a0e 100644
--- a/bits/libm-simd-decl-stubs.h
+++ b/bits/libm-simd-decl-stubs.h
@@ -296,4 +296,15 @@
 #define __DECL_SIMD_tanhf32x
 #define __DECL_SIMD_tanhf64x
 #define __DECL_SIMD_tanhf128x
+
+#define __DECL_SIMD_asinh
+#define __DECL_SIMD_asinhf
+#define __DECL_SIMD_asinhl
+#define __DECL_SIMD_asinhf16
+#define __DECL_SIMD_asinhf32
+#define __DECL_SIMD_asinhf64
+#define __DECL_SIMD_asinhf128
+#define __DECL_SIMD_asinhf32x
+#define __DECL_SIMD_asinhf64x
+#define __DECL_SIMD_asinhf128x
 #endif
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index 3d1c2056d5..40e055e579 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -84,7 +84,7 @@ __MATHDECL_VEC (void,sincos,,
 /* Hyperbolic arc cosine of X.  */
 __MATHCALL_VEC (acosh,, (_Mdouble_ __x));
 /* Hyperbolic arc sine of X.  */
-__MATHCALL (asinh,, (_Mdouble_ __x));
+__MATHCALL_VEC (asinh,, (_Mdouble_ __x));
 /* Hyperbolic arc tangent of X.  */
 __MATHCALL_VEC (atanh,, (_Mdouble_ __x));
 #endif
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index e178cef683..df265d6a12 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -49,6 +49,7 @@ GLIBC_2.22 _ZGVeN8vvv_sincos F
 GLIBC_2.35 _ZGVbN2v_acos F
 GLIBC_2.35 _ZGVbN2v_acosh F
 GLIBC_2.35 _ZGVbN2v_asin F
+GLIBC_2.35 _ZGVbN2v_asinh F
 GLIBC_2.35 _ZGVbN2v_atan F
 GLIBC_2.35 _ZGVbN2v_atanh F
 GLIBC_2.35 _ZGVbN2v_cbrt F
@@ -67,6 +68,7 @@ GLIBC_2.35 _ZGVbN2vv_hypot F
 GLIBC_2.35 _ZGVbN4v_acosf F
 GLIBC_2.35 _ZGVbN4v_acoshf F
 GLIBC_2.35 _ZGVbN4v_asinf F
+GLIBC_2.35 _ZGVbN4v_asinhf F
 GLIBC_2.35 _ZGVbN4v_atanf F
 GLIBC_2.35 _ZGVbN4v_atanhf F
 GLIBC_2.35 _ZGVbN4v_cbrtf F
@@ -85,6 +87,7 @@ GLIBC_2.35 _ZGVbN4vv_hypotf F
 GLIBC_2.35 _ZGVcN4v_acos F
 GLIBC_2.35 _ZGVcN4v_acosh F
 GLIBC_2.35 _ZGVcN4v_asin F
+GLIBC_2.35 _ZGVcN4v_asinh F
 GLIBC_2.35 _ZGVcN4v_atan F
 GLIBC_2.35 _ZGVcN4v_atanh F
 GLIBC_2.35 _ZGVcN4v_cbrt F
@@ -103,6 +106,7 @@ GLIBC_2.35 _ZGVcN4vv_hypot F
 GLIBC_2.35 _ZGVcN8v_acosf F
 GLIBC_2.35 _ZGVcN8v_acoshf F
 GLIBC_2.35 _ZGVcN8v_asinf F
+GLIBC_2.35 _ZGVcN8v_asinhf F
 GLIBC_2.35 _ZGVcN8v_atanf F
 GLIBC_2.35 _ZGVcN8v_atanhf F
 GLIBC_2.35 _ZGVcN8v_cbrtf F
@@ -121,6 +125,7 @@ GLIBC_2.35 _ZGVcN8vv_hypotf F
 GLIBC_2.35 _ZGVdN4v_acos F
 GLIBC_2.35 _ZGVdN4v_acosh F
 GLIBC_2.35 _ZGVdN4v_asin F
+GLIBC_2.35 _ZGVdN4v_asinh F
 GLIBC_2.35 _ZGVdN4v_atan F
 GLIBC_2.35 _ZGVdN4v_atanh F
 GLIBC_2.35 _ZGVdN4v_cbrt F
@@ -139,6 +144,7 @@ GLIBC_2.35 _ZGVdN4vv_hypot F
 GLIBC_2.35 _ZGVdN8v_acosf F
 GLIBC_2.35 _ZGVdN8v_acoshf F
 GLIBC_2.35 _ZGVdN8v_asinf F
+GLIBC_2.35 _ZGVdN8v_asinhf F
 GLIBC_2.35 _ZGVdN8v_atanf F
 GLIBC_2.35 _ZGVdN8v_atanhf F
 GLIBC_2.35 _ZGVdN8v_cbrtf F
@@ -157,6 +163,7 @@ GLIBC_2.35 _ZGVdN8vv_hypotf F
 GLIBC_2.35 _ZGVeN16v_acosf F
 GLIBC_2.35 _ZGVeN16v_acoshf F
 GLIBC_2.35 _ZGVeN16v_asinf F
+GLIBC_2.35 _ZGVeN16v_asinhf F
 GLIBC_2.35 _ZGVeN16v_atanf F
 GLIBC_2.35 _ZGVeN16v_atanhf F
 GLIBC_2.35 _ZGVeN16v_cbrtf F
@@ -175,6 +182,7 @@ GLIBC_2.35 _ZGVeN16vv_hypotf F
 GLIBC_2.35 _ZGVeN8v_acos F
 GLIBC_2.35 _ZGVeN8v_acosh F
 GLIBC_2.35 _ZGVeN8v_asin F
+GLIBC_2.35 _ZGVeN8v_asinh F
 GLIBC_2.35 _ZGVeN8v_atan F
 GLIBC_2.35 _ZGVeN8v_atanh F
 GLIBC_2.35 _ZGVeN8v_cbrt F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index 3c657f6108..71b7d660db 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -130,6 +130,10 @@
 #  define __DECL_SIMD_tanh __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_tanhf
 #  define __DECL_SIMD_tanhf __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_asinh
+#  define __DECL_SIMD_asinh __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_asinhf
+#  define __DECL_SIMD_asinhf __DECL_SIMD_x86_64
 
 # endif
 #endif
diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
index c7f81945fe..4d3afdf753 100644
--- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
@@ -64,6 +64,8 @@
 !GCC$ builtin (erff) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (tanh) attributes simd (notinbranch) if('x86_64')
 !GCC$ builtin (tanhf) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (asinh) attributes simd (notinbranch) if('x86_64')
+!GCC$ builtin (asinhf) attributes simd (notinbranch) if('x86_64')
 
 !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
@@ -113,3 +115,5 @@
 !GCC$ builtin (erff) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (tanh) attributes simd (notinbranch) if('x32')
 !GCC$ builtin (tanhf) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (asinh) attributes simd (notinbranch) if('x32')
+!GCC$ builtin (asinhf) attributes simd (notinbranch) if('x32')
diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
index 26df8d47bf..2ff33c7dd8 100644
--- a/sysdeps/x86_64/fpu/Makeconfig
+++ b/sysdeps/x86_64/fpu/Makeconfig
@@ -25,6 +25,7 @@ libmvec-funcs = \
   acos \
   acosh \
   asin \
+  asinh \
   atan \
   atan2 \
   atanh \
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index adcbe0fefb..e6ead13085 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -17,6 +17,7 @@ libmvec {
     _ZGVbN2v_acos; _ZGVcN4v_acos; _ZGVdN4v_acos; _ZGVeN8v_acos;
     _ZGVbN2v_acosh; _ZGVcN4v_acosh; _ZGVdN4v_acosh; _ZGVeN8v_acosh;
     _ZGVbN2v_asin; _ZGVcN4v_asin; _ZGVdN4v_asin; _ZGVeN8v_asin;
+    _ZGVbN2v_asinh; _ZGVcN4v_asinh; _ZGVdN4v_asinh; _ZGVeN8v_asinh;
     _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan;
     _ZGVbN2v_atanh; _ZGVcN4v_atanh; _ZGVdN4v_atanh; _ZGVeN8v_atanh;
     _ZGVbN2v_cbrt; _ZGVcN4v_cbrt; _ZGVdN4v_cbrt; _ZGVeN8v_cbrt;
@@ -35,6 +36,7 @@ libmvec {
     _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
     _ZGVbN4v_acoshf; _ZGVcN8v_acoshf; _ZGVdN8v_acoshf; _ZGVeN16v_acoshf;
     _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
+    _ZGVbN4v_asinhf; _ZGVcN8v_asinhf; _ZGVdN8v_asinhf; _ZGVeN16v_asinhf;
     _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf;
     _ZGVbN4v_atanhf; _ZGVcN8v_atanhf; _ZGVdN8v_atanhf; _ZGVeN16v_atanhf;
     _ZGVbN4v_cbrtf; _ZGVcN8v_cbrtf; _ZGVdN8v_cbrtf; _ZGVeN16v_cbrtf;
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index bfaad7acef..71e9fced02 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -157,6 +157,23 @@ float: 3
 float128: 4
 ldouble: 5
 
+Function: "asinh_vlen2":
+double: 1
+
+Function: "asinh_vlen4":
+double: 1
+float: 1
+
+Function: "asinh_vlen4_avx2":
+double: 1
+
+Function: "asinh_vlen8":
+double: 1
+float: 1
+
+Function: "asinh_vlen8_avx2":
+float: 1
+
 Function: "atan":
 double: 1
 float: 1
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core-sse2.S
new file mode 100644
index 0000000000..ddd1c3ca24
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized asinh, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN2v_asinh _ZGVbN2v_asinh_sse2
+#include "../svml_d_asinh2_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core.c
new file mode 100644
index 0000000000..37452d0f92
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized asinh, vector length is 2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN2v_asinh
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN2v_asinh, __GI__ZGVbN2v_asinh, __redirect__ZGVbN2v_asinh)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core_sse4.S
new file mode 100644
index 0000000000..0fe130f20a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core_sse4.S
@@ -0,0 +1,1662 @@
+/* Function asinh vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute asinh(x) as log(x + sqrt(x*x + 1))
+ *
+ *   Special cases:
+ *
+ *   asinh(NaN) = quiet NaN, and raise invalid exception
+ *   asinh(+/-INF) = +/-INF (the input infinity is returned)
+ *   asinh(+/-0)   = +/-0 (the input zero is returned)
+ *
+ */
+
+/* Offsets for data table __svml_dasinh_data_internal
+ */
+#define Log_HA_table                  	0
+#define Log_LA_table                  	8208
+#define poly_coeff                    	12320
+#define ExpMask                       	12384
+#define Two10                         	12400
+#define MinLog1p                      	12416
+#define MaxLog1p                      	12432
+#define One                           	12448
+#define SgnMask                       	12464
+#define XThreshold                    	12480
+#define XhMask                        	12496
+#define Threshold                     	12512
+#define Bias                          	12528
+#define Bias1                         	12544
+#define ExpMask0                      	12560
+#define ExpMask2                      	12576
+#define L2                            	12592
+#define dBigThreshold                 	12608
+#define dC2                           	12624
+#define dC3                           	12640
+#define dC4                           	12656
+#define dC5                           	12672
+#define dHalf                         	12688
+#define dLargestFinite                	12704
+#define dLittleThreshold              	12720
+#define dSign                         	12736
+#define dThirtyOne                    	12752
+#define dTopMask12                    	12768
+#define dTopMask26                    	12784
+#define dTopMask29                    	12800
+#define XScale                        	12816
+
+/* Lookup bias for data table __svml_dasinh_data_internal.  */
+#define Table_Lookup_Bias               -0x405ff0
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN2v_asinh_sse4)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $64, %rsp
+        movaps    %xmm0, %xmm13
+
+/*
+ * Split X into high and low parts, XHi (<= 26 bits) and XLo (<= 27 bits)
+ * We could use either X or |X| here, but it doesn't seem to matter
+ */
+        movups    dTopMask26+__svml_dasinh_data_internal(%rip), %xmm15
+        movaps    %xmm13, %xmm7
+        andps     %xmm13, %xmm15
+        lea       Table_Lookup_Bias+__svml_dasinh_data_internal(%rip), %rsi
+
+/*
+ * Compute X^2 = (XHi + XLo)^2 = XHi^2 + XLo * (X + XHi)
+ * The two parts are shifted off by around 26 bits. So even though
+ * the low bit will not in general be exact, it's near enough
+ */
+        movaps    %xmm15, %xmm8
+        mulpd     %xmm15, %xmm8
+        subpd     %xmm15, %xmm7
+        addpd     %xmm13, %xmm15
+
+/* Load the constant 1 and a sign mask */
+        movups    One+__svml_dasinh_data_internal(%rip), %xmm12
+
+/*
+ * Finally, express Y + W = X^2 + 1 accurately where Y has <= 29 bits.
+ * If |X| <= 1 then |XHi| <= 1 and so |X2Hi| <= 1, so we can treat 1
+ * as the dominant component in the compensated summation. Otherwise,
+ * if |X| >= 1, then since X2Hi only has 52 significant bits, the basic
+ * addition will be exact anyway until we get to |X| >= 2^53. But by
+ * that time the log function is well-conditioned enough that the
+ * rounding error doesn't matter. Hence we can treat 1 as dominant even
+ * if it literally isn't.
+ */
+        movaps    %xmm12, %xmm3
+        movaps    %xmm12, %xmm5
+        addpd     %xmm8, %xmm3
+        mulpd     %xmm15, %xmm7
+        subpd     %xmm3, %xmm5
+        movups    dTopMask29+__svml_dasinh_data_internal(%rip), %xmm6
+        andps     %xmm3, %xmm6
+
+/*
+ * Compute R = 1/sqrt(Y + W) * (1 + d)
+ * Force R to <= 12 significant bits in case it isn't already
+ * This means that R * Y and R^2 * Y are exactly representable.
+ */
+        cvtpd2ps  %xmm6, %xmm1
+        addpd     %xmm8, %xmm5
+        subpd     %xmm6, %xmm3
+
+/*
+ * Unfortunately, we can still be in trouble if |X| <= 2^-10, since
+ * the absolute error 2^-(12+53)-ish in sqrt(1 + X^2) gets scaled up
+ * by 1/X and comes close to our threshold. Hence if |X| <= 2^-9,
+ * perform an alternative computation
+ * sqrt(1 + X^2) - 1 = X^2/2 - X^4/8 + X^6/16
+ * X2 = X^2
+ */
+        addpd     %xmm7, %xmm8
+        addpd     %xmm7, %xmm5
+        movlhps   %xmm1, %xmm1
+        rsqrtps   %xmm1, %xmm4
+        addpd     %xmm3, %xmm5
+        cvtps2pd  %xmm4, %xmm2
+        andps     dTopMask12+__svml_dasinh_data_internal(%rip), %xmm2
+
+/*
+ * Compute e = -(2 * d + d^2)
+ * The first FMR is exact, and the rounding error in the other is acceptable
+ * since d and e are ~ 2^-12
+ */
+        movaps    %xmm12, %xmm1
+
+/*
+ * Compute S = (Y/sqrt(Y + W)) * (1 + d)
+ * and T = (W/sqrt(Y + W)) * (1 + d)
+ * so that S + T = sqrt(Y + W) * (1 + d)
+ * S is exact, and the rounding error in T is OK.
+ */
+        mulpd     %xmm2, %xmm6
+        mulpd     %xmm2, %xmm5
+        movaps    %xmm2, %xmm0
+
+/*
+ * Obtain sqrt(1 + X^2) - 1 in two pieces
+ * sqrt(1 + X^2) - 1
+ * = sqrt(Y + W) - 1
+ * = (S + T) * (1 + Corr) - 1
+ * = [S - 1] + [T + (S + T) * Corr]
+ * We need a compensated summation for the last part. We treat S - 1
+ * as the larger part; it certainly is until about X < 2^-4, and in that
+ * case, the error is affordable since X dominates over sqrt(1 + X^2) - 1
+ * Final sum is dTmp5 (hi) + dTmp7 (lo)
+ */
+        movaps    %xmm6, %xmm3
+        mulpd     %xmm6, %xmm0
+        mulpd     %xmm5, %xmm2
+        subpd     %xmm0, %xmm1
+        addpd     %xmm5, %xmm3
+        subpd     %xmm12, %xmm6
+        subpd     %xmm2, %xmm1
+        movups    SgnMask+__svml_dasinh_data_internal(%rip), %xmm9
+        movaps    %xmm12, %xmm4
+
+/*
+ * Get the absolute value of the input, since we will exploit antisymmetry
+ * and mostly assume X >= 0 in the core computation
+ */
+        movaps    %xmm9, %xmm10
+        andps     %xmm13, %xmm10
+
+/*
+ * Check whether the input is finite, by checking |X| <= MaxFloat
+ * Otherwise set the rangemask so that the callout will get used.
+ * Note that this will also use the callout for NaNs since not(NaN <= MaxFloat)
+ */
+        movaps    %xmm10, %xmm14
+
+/*
+ * The following computation can go wrong for very large X, basically
+ * because X^2 overflows. But for large X we have
+ * asinh(X) / log(2 X) - 1 =~= 1/(4 * X^2), so for X >= 2^30
+ * we can just later stick X back into the log and tweak up the exponent.
+ * Actually we scale X by 2^-30 and tweak the exponent up by 31,
+ * to stay in the safe range for the later log computation.
+ * Compute a flag now telling us when to do this.
+ */
+        movaps    %xmm10, %xmm11
+        cmpnlepd  dLargestFinite+__svml_dasinh_data_internal(%rip), %xmm14
+        cmpltpd   dBigThreshold+__svml_dasinh_data_internal(%rip), %xmm11
+        movmskpd  %xmm14, %edx
+
+/*
+ * Now       1 / (1 + d)
+ * = 1 / (1 + (sqrt(1 - e) - 1))
+ * = 1 / sqrt(1 - e)
+ * = 1 + 1/2 * e + 3/8 * e^2 + 5/16 * e^3 + 35/128 * e^4 +
+ * 63/256 * e^5 + 231/1024 * e^6 + ....
+ * So compute the first five nonconstant terms of that, so that
+ * we have a relative correction (1 + Corr) to apply to S etc.
+ * C1 = 1/2
+ * C2 = 3/8
+ * C3 = 5/16
+ * C4 = 35/128
+ * C5 = 63/256
+ */
+        movups    dC5+__svml_dasinh_data_internal(%rip), %xmm14
+        movups    dHalf+__svml_dasinh_data_internal(%rip), %xmm15
+        mulpd     %xmm1, %xmm14
+
+/* dX2over2 = X^2/2 */
+        mulpd     %xmm15, %xmm8
+        addpd     dC4+__svml_dasinh_data_internal(%rip), %xmm14
+        mulpd     %xmm1, %xmm14
+        addpd     dC3+__svml_dasinh_data_internal(%rip), %xmm14
+        mulpd     %xmm1, %xmm14
+        addpd     dC2+__svml_dasinh_data_internal(%rip), %xmm14
+        mulpd     %xmm1, %xmm14
+        addpd     %xmm15, %xmm14
+        mulpd     %xmm14, %xmm1
+        mulpd     %xmm3, %xmm1
+        addpd     %xmm1, %xmm5
+        addpd     %xmm6, %xmm5
+
+/* dX4over4 = X^4/4 */
+        movaps    %xmm8, %xmm6
+
+/* dX46 = -X^4/4 + X^6/8 */
+        movaps    %xmm8, %xmm7
+        mulpd     %xmm8, %xmm6
+        mulpd     %xmm6, %xmm7
+        subpd     %xmm6, %xmm7
+
+/* dX46over2 = -X^4/8 + x^6/16 */
+        mulpd     %xmm7, %xmm15
+
+/* Now multiplex the two possible computations */
+        movaps    %xmm10, %xmm3
+        cmplepd   dLittleThreshold+__svml_dasinh_data_internal(%rip), %xmm3
+        addpd     %xmm15, %xmm8
+        movaps    %xmm3, %xmm1
+        andps     %xmm3, %xmm8
+        andnps    %xmm5, %xmm1
+
+/*
+ * Now we feed into the log1p code, using H in place of _VARG1 and
+ * also adding L into Xl.
+ * compute 1+x as high, low parts
+ */
+        movaps    %xmm12, %xmm5
+        orps      %xmm8, %xmm1
+        movaps    %xmm11, %xmm3
+
+/*
+ * Now do another compensated sum to add |X| + [sqrt(1 + X^2) - 1].
+ * It's always safe to assume |X| is larger.
+ * This is the final 2-part argument to the log1p function
+ */
+        addpd     %xmm10, %xmm1
+        maxpd     %xmm1, %xmm5
+        minpd     %xmm1, %xmm4
+
+/* Now multiplex to the case X = 2^-30 * |input|, Xl = dL = 0 in the "big" case. */
+        movups    XScale+__svml_dasinh_data_internal(%rip), %xmm8
+        andps     %xmm9, %xmm1
+        mulpd     %xmm8, %xmm10
+        cmpltpd   XThreshold+__svml_dasinh_data_internal(%rip), %xmm1
+        movaps    %xmm5, %xmm9
+        andnps    %xmm10, %xmm3
+        addpd     %xmm4, %xmm9
+        orps      XhMask+__svml_dasinh_data_internal(%rip), %xmm1
+        andps     %xmm1, %xmm9
+        subpd     %xmm9, %xmm5
+        andps     %xmm11, %xmm9
+
+/* Now resume the main code. */
+        movups    ExpMask+__svml_dasinh_data_internal(%rip), %xmm10
+        orps      %xmm9, %xmm3
+
+/* preserve mantissa, set input exponent to 2^(-10) */
+        andps     %xmm3, %xmm10
+
+/* exponent bits */
+        movaps    %xmm3, %xmm7
+        orps      Two10+__svml_dasinh_data_internal(%rip), %xmm10
+        psrlq     $20, %xmm7
+
+/* reciprocal approximation good to at least 11 bits */
+        cvtpd2ps  %xmm10, %xmm1
+        addpd     %xmm5, %xmm4
+        movlhps   %xmm1, %xmm1
+        andps     %xmm11, %xmm4
+        rcpps     %xmm1, %xmm0
+        cvtps2pd  %xmm0, %xmm0
+
+/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
+        movups    .FLT_30(%rip), %xmm6
+        movaps    %xmm11, %xmm1
+        addpd     %xmm6, %xmm0
+        subpd     %xmm6, %xmm0
+
+/* exponent of X needed to scale Xl */
+        movdqu    ExpMask0+__svml_dasinh_data_internal(%rip), %xmm5
+
+/* 2^ (-10-exp(X) ) */
+        movdqu    ExpMask2+__svml_dasinh_data_internal(%rip), %xmm2
+        pand      %xmm3, %xmm5
+        psubq     %xmm5, %xmm2
+
+/* scale DblRcp */
+        mulpd     %xmm0, %xmm2
+
+/* argument reduction */
+        mulpd     %xmm2, %xmm3
+        mulpd     %xmm2, %xmm4
+        subpd     %xmm12, %xmm3
+        addpd     %xmm4, %xmm3
+
+/* polynomial */
+        movups    poly_coeff+__svml_dasinh_data_internal(%rip), %xmm12
+        movaps    %xmm3, %xmm2
+        pshufd    $221, %xmm7, %xmm8
+        mulpd     %xmm3, %xmm12
+
+/* biased exponent in DP format */
+        cvtdq2pd  %xmm8, %xmm14
+        addpd     poly_coeff+16+__svml_dasinh_data_internal(%rip), %xmm12
+        mulpd     %xmm3, %xmm2
+
+/* Add 31 to the exponent in the "large" case to get log(2 * input) */
+        movups    dThirtyOne+__svml_dasinh_data_internal(%rip), %xmm9
+
+/* exponent*log(2.0) */
+        movups    Threshold+__svml_dasinh_data_internal(%rip), %xmm5
+        addpd     %xmm14, %xmm9
+        cmpltpd   %xmm0, %xmm5
+        mulpd     %xmm2, %xmm12
+        andps     %xmm11, %xmm14
+
+/*
+ * prepare table index
+ * table lookup
+ */
+        movaps    %xmm0, %xmm11
+        movups    poly_coeff+32+__svml_dasinh_data_internal(%rip), %xmm0
+        andnps    %xmm9, %xmm1
+        mulpd     %xmm3, %xmm0
+        addpd     poly_coeff+48+__svml_dasinh_data_internal(%rip), %xmm0
+        addpd     %xmm12, %xmm0
+
+/* reconstruction */
+        mulpd     %xmm0, %xmm2
+        andps     Bias+__svml_dasinh_data_internal(%rip), %xmm5
+        psrlq     $40, %xmm11
+        orps      Bias1+__svml_dasinh_data_internal(%rip), %xmm5
+        orps      %xmm14, %xmm1
+        movd      %xmm11, %eax
+        pshufd    $2, %xmm11, %xmm11
+
+/* Finally, reincorporate the original sign. */
+        movups    dSign+__svml_dasinh_data_internal(%rip), %xmm0
+        subpd     %xmm5, %xmm1
+        addpd     %xmm2, %xmm3
+        movd      %xmm11, %ecx
+        mulpd     L2+__svml_dasinh_data_internal(%rip), %xmm1
+        movslq    %eax, %rax
+        andps     %xmm13, %xmm0
+        movslq    %ecx, %rcx
+        movsd     (%rsi,%rax), %xmm6
+        movhpd    (%rsi,%rcx), %xmm6
+        addpd     %xmm3, %xmm6
+        addpd     %xmm6, %xmm1
+        pxor      %xmm1, %xmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx xmm0 xmm13
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm13, 32(%rsp)
+        movups    %xmm0, 48(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $2, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 xmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      asinh@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 48(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVbN2v_asinh_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_dasinh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(16)) VUINT32 Log_HA_table[(1<<10)+2][2];
+        __declspec(align(16)) VUINT32 Log_LA_table[(1<<9)+1][2];
+        __declspec(align(16)) VUINT32 poly_coeff[4][2][2];
+        __declspec(align(16)) VUINT32 ExpMask[2][2];
+        __declspec(align(16)) VUINT32 Two10[2][2];
+        __declspec(align(16)) VUINT32 MinLog1p[2][2];
+        __declspec(align(16)) VUINT32 MaxLog1p[2][2];
+        __declspec(align(16)) VUINT32 One[2][2];
+        __declspec(align(16)) VUINT32 SgnMask[2][2];
+        __declspec(align(16)) VUINT32 XThreshold[2][2];
+        __declspec(align(16)) VUINT32 XhMask[2][2];
+        __declspec(align(16)) VUINT32 Threshold[2][2];
+        __declspec(align(16)) VUINT32 Bias[2][2];
+        __declspec(align(16)) VUINT32 Bias1[2][2];
+        __declspec(align(16)) VUINT32 ExpMask0[2][2];
+        __declspec(align(16)) VUINT32 ExpMask2[2][2];
+        __declspec(align(16)) VUINT32 L2[2][2];
+        __declspec(align(16)) VUINT32 dBigThreshold[2][2];
+        __declspec(align(16)) VUINT32 dC2[2][2];
+        __declspec(align(16)) VUINT32 dC3[2][2];
+        __declspec(align(16)) VUINT32 dC4[2][2];
+        __declspec(align(16)) VUINT32 dC5[2][2];
+        __declspec(align(16)) VUINT32 dHalf[2][2];
+        __declspec(align(16)) VUINT32 dLargestFinite[2][2];
+        __declspec(align(16)) VUINT32 dLittleThreshold[2][2];
+        __declspec(align(16)) VUINT32 dSign[2][2];
+        __declspec(align(16)) VUINT32 dThirtyOne[2][2];
+        __declspec(align(16)) VUINT32 dTopMask12[2][2];
+        __declspec(align(16)) VUINT32 dTopMask26[2][2];
+        __declspec(align(16)) VUINT32 dTopMask29[2][2];
+        __declspec(align(16)) VUINT32 XScale[2][2];
+} __svml_dasinh_data_internal;
+#endif
+__svml_dasinh_data_internal:
+        /* Log_HA_table */
+        .quad 0xc086232bdd7a8300, 0xbe1ce91eef3fb100
+        .quad 0xc086232fdc7ad828, 0xbe1cefcffda73b6a
+        .quad 0xc0862333d97d2ba0, 0xbe1cef406748f1ff
+        .quad 0xc0862337d48378e0, 0xbe1cef2a9429925a
+        .quad 0xc086233bcd8fb878, 0xbe1cf138d17ebecb
+        .quad 0xc086233fc4a3e018, 0xbe1ceff2dbbbb29e
+        .quad 0xc0862343b9c1e270, 0xbe1cf1a42aae437b
+        .quad 0xc0862347acebaf68, 0xbe1cef3b152048af
+        .quad 0xc086234b9e2333f0, 0xbe1cef20e127805e
+        .quad 0xc086234f8d6a5a30, 0xbe1cf00ad6052cf4
+        .quad 0xc08623537ac30980, 0xbe1cefc4642ee597
+        .quad 0xc0862357662f2660, 0xbe1cf1f277d36e16
+        .quad 0xc086235b4fb092a0, 0xbe1ceed009e8d8e6
+        .quad 0xc086235f37492d28, 0xbe1cf1e4038cb362
+        .quad 0xc08623631cfad250, 0xbe1cf0b0873b8557
+        .quad 0xc086236700c75b98, 0xbe1cf15bb3227c0b
+        .quad 0xc086236ae2b09fe0, 0xbe1cf151ef8ca9ed
+        .quad 0xc086236ec2b87358, 0xbe1cefe1dc2cd2ed
+        .quad 0xc0862372a0e0a780, 0xbe1cf0d1eec5454f
+        .quad 0xc08623767d2b0b48, 0xbe1ceeefd570bbce
+        .quad 0xc086237a57996af0, 0xbe1cee99ae91b3a7
+        .quad 0xc086237e302d9028, 0xbe1cf0412830fbd1
+        .quad 0xc086238206e94218, 0xbe1ceee898588610
+        .quad 0xc0862385dbce4548, 0xbe1cee9a1fbcaaea
+        .quad 0xc0862389aede5bc0, 0xbe1ceed8e7cc1ad6
+        .quad 0xc086238d801b4500, 0xbe1cf10c8d059da6
+        .quad 0xc08623914f86be18, 0xbe1ceee6c63a8165
+        .quad 0xc08623951d228180, 0xbe1cf0c3592d2ff1
+        .quad 0xc0862398e8f04758, 0xbe1cf0026cc4cb1b
+        .quad 0xc086239cb2f1c538, 0xbe1cf15d48d8e670
+        .quad 0xc08623a07b28ae60, 0xbe1cef359363787c
+        .quad 0xc08623a44196b390, 0xbe1cefdf1ab2e82c
+        .quad 0xc08623a8063d8338, 0xbe1cefe43c02aa84
+        .quad 0xc08623abc91ec960, 0xbe1cf044f5ae35b7
+        .quad 0xc08623af8a3c2fb8, 0xbe1cf0b0b4001e1b
+        .quad 0xc08623b349975d98, 0xbe1cf1bae76dfbcf
+        .quad 0xc08623b70731f810, 0xbe1cef0a72e13a62
+        .quad 0xc08623bac30da1c8, 0xbe1cf184007d2b6b
+        .quad 0xc08623be7d2bfb40, 0xbe1cf16f4b239e98
+        .quad 0xc08623c2358ea2a0, 0xbe1cf0976acada87
+        .quad 0xc08623c5ec3733d0, 0xbe1cf066318a16ff
+        .quad 0xc08623c9a1274880, 0xbe1ceffaa7148798
+        .quad 0xc08623cd54607820, 0xbe1cf23ab02e9b6e
+        .quad 0xc08623d105e45800, 0xbe1cefdfef7d4fde
+        .quad 0xc08623d4b5b47b20, 0xbe1cf17fece44f2b
+        .quad 0xc08623d863d27270, 0xbe1cf18f907d0d7c
+        .quad 0xc08623dc103fccb0, 0xbe1cee61fe072c98
+        .quad 0xc08623dfbafe1668, 0xbe1cf022dd891e2f
+        .quad 0xc08623e3640eda20, 0xbe1ceecc1daf4358
+        .quad 0xc08623e70b73a028, 0xbe1cf0173c4fa380
+        .quad 0xc08623eab12deec8, 0xbe1cf16a2150c2f4
+        .quad 0xc08623ee553f4a30, 0xbe1cf1bf980b1f4b
+        .quad 0xc08623f1f7a93480, 0xbe1cef8b731663c2
+        .quad 0xc08623f5986d2dc0, 0xbe1cee9a664d7ef4
+        .quad 0xc08623f9378cb3f0, 0xbe1cf1eda2af6400
+        .quad 0xc08623fcd5094320, 0xbe1cf1923f9d68d7
+        .quad 0xc086240070e45548, 0xbe1cf0747cd3e03a
+        .quad 0xc08624040b1f6260, 0xbe1cf22ee855bd6d
+        .quad 0xc0862407a3bbe078, 0xbe1cf0d57360c00b
+        .quad 0xc086240b3abb4398, 0xbe1ceebc815cd575
+        .quad 0xc086240ed01efdd0, 0xbe1cf03bfb970951
+        .quad 0xc086241263e87f50, 0xbe1cf16e74768529
+        .quad 0xc0862415f6193658, 0xbe1cefec64b8becb
+        .quad 0xc086241986b28f30, 0xbe1cf0838d210baa
+        .quad 0xc086241d15b5f448, 0xbe1cf0ea86e75b11
+        .quad 0xc0862420a324ce28, 0xbe1cf1708d11d805
+        .quad 0xc08624242f008380, 0xbe1ceea988c5a417
+        .quad 0xc0862427b94a7910, 0xbe1cef166a7bbca5
+        .quad 0xc086242b420411d0, 0xbe1cf0c9d9e86a38
+        .quad 0xc086242ec92eaee8, 0xbe1cef0946455411
+        .quad 0xc08624324ecbaf98, 0xbe1cefea60907739
+        .quad 0xc0862435d2dc7160, 0xbe1cf1ed0934ce42
+        .quad 0xc086243955624ff8, 0xbe1cf191ba746c7d
+        .quad 0xc086243cd65ea548, 0xbe1ceeec78cf2a7e
+        .quad 0xc086244055d2c968, 0xbe1cef345284c119
+        .quad 0xc0862443d3c012b8, 0xbe1cf24f77355219
+        .quad 0xc08624475027d5e8, 0xbe1cf05bf087e114
+        .quad 0xc086244acb0b65d0, 0xbe1cef3504a32189
+        .quad 0xc086244e446c1398, 0xbe1ceff54b2a406f
+        .quad 0xc0862451bc4b2eb8, 0xbe1cf0757d54ed4f
+        .quad 0xc086245532aa04f0, 0xbe1cf0c8099fdfd5
+        .quad 0xc0862458a789e250, 0xbe1cf0b173796a31
+        .quad 0xc086245c1aec1138, 0xbe1cf11d8734540d
+        .quad 0xc086245f8cd1da60, 0xbe1cf1916a723ceb
+        .quad 0xc0862462fd3c84d8, 0xbe1cf19a911e1da7
+        .quad 0xc08624666c2d5608, 0xbe1cf23a9ef72e4f
+        .quad 0xc0862469d9a591c0, 0xbe1cef503d947663
+        .quad 0xc086246d45a67a18, 0xbe1cf0fceeb1a0b2
+        .quad 0xc0862470b0314fa8, 0xbe1cf107e27e4fbc
+        .quad 0xc086247419475160, 0xbe1cf03dd9922331
+        .quad 0xc086247780e9bc98, 0xbe1cefce1a10e129
+        .quad 0xc086247ae719cd18, 0xbe1ceea47f73c4f6
+        .quad 0xc086247e4bd8bd10, 0xbe1ceec0ac56d100
+        .quad 0xc0862481af27c528, 0xbe1cee8a6593278a
+        .quad 0xc086248511081c70, 0xbe1cf2231dd9dec7
+        .quad 0xc0862488717af888, 0xbe1cf0b4b8ed7da8
+        .quad 0xc086248bd0818d68, 0xbe1cf1bd8d835002
+        .quad 0xc086248f2e1d0d98, 0xbe1cf259acc107f4
+        .quad 0xc08624928a4eaa20, 0xbe1cee897636b00c
+        .quad 0xc0862495e5179270, 0xbe1cee757f20c326
+        .quad 0xc08624993e78f490, 0xbe1cefafd3aa54a4
+        .quad 0xc086249c9673fd10, 0xbe1cee7298d38b97
+        .quad 0xc086249fed09d6f8, 0xbe1ceedc158d4ceb
+        .quad 0xc08624a3423babe0, 0xbe1cf2282987cb2e
+        .quad 0xc08624a6960aa400, 0xbe1cefe7381ecc4b
+        .quad 0xc08624a9e877e600, 0xbe1cef328dbbce80
+        .quad 0xc08624ad39849728, 0xbe1cefde45f3cc71
+        .quad 0xc08624b08931db58, 0xbe1cefa8b89433b9
+        .quad 0xc08624b3d780d500, 0xbe1cef6773c0b139
+        .quad 0xc08624b72472a528, 0xbe1cf031c931c11f
+        .quad 0xc08624ba70086b78, 0xbe1cf088f49275e7
+        .quad 0xc08624bdba434630, 0xbe1cf17de0eaa86d
+        .quad 0xc08624c103245238, 0xbe1cefd492f1ba75
+        .quad 0xc08624c44aacab08, 0xbe1cf1253e154466
+        .quad 0xc08624c790dd6ad0, 0xbe1cf0fb09ee6d55
+        .quad 0xc08624cad5b7aa58, 0xbe1cf1f08dd048fe
+        .quad 0xc08624ce193c8120, 0xbe1ceeca0809697f
+        .quad 0xc08624d15b6d0538, 0xbe1cef8d5662d968
+        .quad 0xc08624d49c4a4b78, 0xbe1cee97b556ed78
+        .quad 0xc08624d7dbd56750, 0xbe1cf1b14b6acb75
+        .quad 0xc08624db1a0f6b00, 0xbe1cef1e860623f2
+        .quad 0xc08624de56f96758, 0xbe1ceeaf4d156f3d
+        .quad 0xc08624e192946bf0, 0xbe1ceecc12b400ed
+        .quad 0xc08624e4cce18710, 0xbe1cf180c40c794f
+        .quad 0xc08624e805e1c5c8, 0xbe1cf185a08f7f65
+        .quad 0xc08624eb3d9633d8, 0xbe1cef45fc924078
+        .quad 0xc08624ee73ffdbb0, 0xbe1cf1e4f457f32a
+        .quad 0xc08624f1a91fc6a0, 0xbe1cf040147b8a5a
+        .quad 0xc08624f4dcf6fc98, 0xbe1cf1effca0dfb2
+        .quad 0xc08624f80f868468, 0xbe1cf0470146e5bc
+        .quad 0xc08624fb40cf6390, 0xbe1cef4dd186e501
+        .quad 0xc08624fe70d29e60, 0xbe1ceebe257f66c7
+        .quad 0xc08625019f9137f0, 0xbe1ceefb7a1c395c
+        .quad 0xc0862504cd0c3220, 0xbe1cf209dedfed8c
+        .quad 0xc0862507f9448db0, 0xbe1cf082da464994
+        .quad 0xc086250b243b4a18, 0xbe1cee88694a73cf
+        .quad 0xc086250e4df165a0, 0xbe1cf0b61e8f0531
+        .quad 0xc08625117667dd78, 0xbe1cf1106599c962
+        .quad 0xc08625149d9fad98, 0xbe1ceff1ee88af1f
+        .quad 0xc0862517c399d0c8, 0xbe1cf0f746994ef6
+        .quad 0xc086251ae85740b8, 0xbe1cefe8a1d077e4
+        .quad 0xc086251e0bd8f5e0, 0xbe1cf1a1da036092
+        .quad 0xc08625212e1fe7a8, 0xbe1cf0f8a7786fcd
+        .quad 0xc08625244f2d0c48, 0xbe1cefa1174a07a7
+        .quad 0xc08625276f0158d8, 0xbe1cef1043aa5b25
+        .quad 0xc086252a8d9dc150, 0xbe1cf15d521c169d
+        .quad 0xc086252dab033898, 0xbe1cf220bba8861f
+        .quad 0xc0862530c732b078, 0xbe1cef51e310eae2
+        .quad 0xc0862533e22d1988, 0xbe1cf222fcedd8ae
+        .quad 0xc0862536fbf36370, 0xbe1cefdb4da4bda8
+        .quad 0xc086253a14867ca0, 0xbe1ceeafc1112171
+        .quad 0xc086253d2be75280, 0xbe1cee99dfb4b408
+        .quad 0xc08625404216d160, 0xbe1cf22d2536f06b
+        .quad 0xc08625435715e498, 0xbe1cef6abbf2e268
+        .quad 0xc08625466ae57648, 0xbe1cf093a14789f5
+        .quad 0xc08625497d866fa0, 0xbe1cf0f93655603c
+        .quad 0xc086254c8ef9b8b8, 0xbe1cf1cc40c9aafc
+        .quad 0xc086254f9f4038a8, 0xbe1ceeea5f4e9157
+        .quad 0xc0862552ae5ad568, 0xbe1cefa9f52d4997
+        .quad 0xc0862555bc4a7400, 0xbe1cefa490a638ff
+        .quad 0xc0862558c90ff868, 0xbe1cef7fcf797d6f
+        .quad 0xc086255bd4ac4590, 0xbe1cf1b4c51113c9
+        .quad 0xc086255edf203d78, 0xbe1cef55e5b4a55d
+        .quad 0xc0862561e86cc100, 0xbe1cf0d37a25f9dc
+        .quad 0xc0862564f092b028, 0xbe1ceebe9efc19d9
+        .quad 0xc0862567f792e9d8, 0xbe1cee8ad30a57b5
+        .quad 0xc086256afd6e4c08, 0xbe1cef4e1817b90b
+        .quad 0xc086256e0225b3b8, 0xbe1cee7fa9229996
+        .quad 0xc086257105b9fce0, 0xbe1cf0b54963d945
+        .quad 0xc0862574082c0298, 0xbe1cee5f2f3c7995
+        .quad 0xc0862577097c9ee0, 0xbe1cf0828e303a2c
+        .quad 0xc086257a09acaae0, 0xbe1cf172c3078947
+        .quad 0xc086257d08bcfec0, 0xbe1cf189252afa22
+        .quad 0xc086258006ae71b8, 0xbe1cefdb80426923
+        .quad 0xc08625830381da08, 0xbe1ceef1391a0372
+        .quad 0xc0862585ff380d00, 0xbe1cf17720c78d13
+        .quad 0xc0862588f9d1df18, 0xbe1ceef1f9027d83
+        .quad 0xc086258bf35023b8, 0xbe1cf06fac99dec9
+        .quad 0xc086258eebb3ad78, 0xbe1cf1373eeb45c0
+        .quad 0xc0862591e2fd4e00, 0xbe1cef777536bb81
+        .quad 0xc0862594d92dd600, 0xbe1cf0f43ca40766
+        .quad 0xc0862597ce461558, 0xbe1cefb2cfc6766b
+        .quad 0xc086259ac246daf0, 0xbe1ceea49e64ffa2
+        .quad 0xc086259db530f4c8, 0xbe1cf250fa457dec
+        .quad 0xc08625a0a7053018, 0xbe1cf17d8bb2a44e
+        .quad 0xc08625a397c45918, 0xbe1cf1d5906d54b7
+        .quad 0xc08625a6876f3b30, 0xbe1cf08fe7b31780
+        .quad 0xc08625a97606a0e0, 0xbe1cef13edfc9d11
+        .quad 0xc08625ac638b53c8, 0xbe1cef9d2b107219
+        .quad 0xc08625af4ffe1cb0, 0xbe1cf1ddd4ff6160
+        .quad 0xc08625b23b5fc390, 0xbe1cefa02a996495
+        .quad 0xc08625b525b10f68, 0xbe1cf166a7e37ee5
+        .quad 0xc08625b80ef2c680, 0xbe1cef0b171068a5
+        .quad 0xc08625baf725ae28, 0xbe1cf05c80779283
+        .quad 0xc08625bdde4a8af0, 0xbe1cf1bbfbffb889
+        .quad 0xc08625c0c4622090, 0xbe1cf0b8666c0124
+        .quad 0xc08625c3a96d31e0, 0xbe1cf0a8fcf47a86
+        .quad 0xc08625c68d6c80f0, 0xbe1cef46e18cb092
+        .quad 0xc08625c97060cef0, 0xbe1cf1458a350efb
+        .quad 0xc08625cc524adc58, 0xbe1ceeea1dadce12
+        .quad 0xc08625cf332b68b0, 0xbe1cf0a1bfdc44c7
+        .quad 0xc08625d2130332d0, 0xbe1cef96d02da73e
+        .quad 0xc08625d4f1d2f8a8, 0xbe1cf2451c3c7701
+        .quad 0xc08625d7cf9b7778, 0xbe1cf10d08f83812
+        .quad 0xc08625daac5d6ba0, 0xbe1ceec5b4895c5e
+        .quad 0xc08625dd881990b0, 0xbe1cf14e1325c5e4
+        .quad 0xc08625e062d0a188, 0xbe1cf21d0904be12
+        .quad 0xc08625e33c835838, 0xbe1ceed0839bcf21
+        .quad 0xc08625e615326df0, 0xbe1cf1bb944889d2
+        .quad 0xc08625e8ecde9b48, 0xbe1cee738e85eece
+        .quad 0xc08625ebc38897e0, 0xbe1cf25c2bc6ef12
+        .quad 0xc08625ee99311ac8, 0xbe1cf132b70a41ad
+        .quad 0xc08625f16dd8da28, 0xbe1cf1984236a6e3
+        .quad 0xc08625f441808b78, 0xbe1cf19ae74998f9
+        .quad 0xc08625f71428e370, 0xbe1cef3e175d61a1
+        .quad 0xc08625f9e5d295f8, 0xbe1cf101f9868fd9
+        .quad 0xc08625fcb67e5658, 0xbe1cee69db83dcd2
+        .quad 0xc08625ff862cd6f8, 0xbe1cf081b636af51
+        .quad 0xc086260254dec9a8, 0xbe1cee62c7d59b3e
+        .quad 0xc08626052294df58, 0xbe1cf1b745c57716
+        .quad 0xc0862607ef4fc868, 0xbe1cef3d2800ea23
+        .quad 0xc086260abb103458, 0xbe1cef480ff1acd2
+        .quad 0xc086260d85d6d200, 0xbe1cf2424c9a17ef
+        .quad 0xc08626104fa44f90, 0xbe1cf12cfde90fd5
+        .quad 0xc086261318795a68, 0xbe1cf21f590dd5b6
+        .quad 0xc0862615e0569f48, 0xbe1cf0c50f9cd28a
+        .quad 0xc0862618a73cca30, 0xbe1ceedbdb520545
+        .quad 0xc086261b6d2c8668, 0xbe1cf0b030396011
+        .quad 0xc086261e32267e98, 0xbe1cf19917010e96
+        .quad 0xc0862620f62b5cb0, 0xbe1cf07331355985
+        .quad 0xc0862623b93bc9e8, 0xbe1cf01ae921a1c3
+        .quad 0xc08626267b586ed0, 0xbe1cefe5cf0dbf0c
+        .quad 0xc08626293c81f348, 0xbe1cf01b258aeb50
+        .quad 0xc086262bfcb8fe88, 0xbe1cee6b9e7f4c68
+        .quad 0xc086262ebbfe3710, 0xbe1cee684a9b21c9
+        .quad 0xc08626317a5242b8, 0xbe1cf1f8bcde9a8b
+        .quad 0xc086263437b5c6c0, 0xbe1cf1d063d36238
+        .quad 0xc0862636f42967a8, 0xbe1cf1e31a19075e
+        .quad 0xc0862639afadc950, 0xbe1cf1d8efdf7e7d
+        .quad 0xc086263c6a438ef0, 0xbe1cf1812ee72dba
+        .quad 0xc086263f23eb5b18, 0xbe1cf1449a9a2279
+        .quad 0xc0862641dca5cfb8, 0xbe1cee96edce5085
+        .quad 0xc086264494738e08, 0xbe1cf06797bd03b2
+        .quad 0xc08626474b5536b8, 0xbe1cef91b9b7ffc1
+        .quad 0xc086264a014b69c0, 0xbe1cef4b6721278f
+        .quad 0xc086264cb656c678, 0xbe1cf1942925eb4a
+        .quad 0xc086264f6a77eba8, 0xbe1cefa2c7bc2e39
+        .quad 0xc08626521daf7758, 0xbe1cf252595aceb3
+        .quad 0xc0862654cffe0718, 0xbe1cee8e9ae47ec2
+        .quad 0xc0862657816437a8, 0xbe1cf1bf913828fa
+        .quad 0xc086265a31e2a558, 0xbe1cf23475d6b366
+        .quad 0xc086265ce179ebc8, 0xbe1cef8df00a922b
+        .quad 0xc086265f902aa5f0, 0xbe1cef279bfa43e0
+        .quad 0xc08626623df56e38, 0xbe1cf080e10b8365
+        .quad 0xc0862664eadade70, 0xbe1cf1a518f9b544
+        .quad 0xc086266796db8fd0, 0xbe1cef9308fed9e9
+        .quad 0xc086266a41f81ae8, 0xbe1ceea3ae6b19c9
+        .quad 0xc086266cec3117b8, 0xbe1ceef06003d4c2
+        .quad 0xc086266f95871da8, 0xbe1cf0b8457ffb0c
+        .quad 0xc08626723dfac390, 0xbe1cf0c526745ad6
+        .quad 0xc0862674e58c9fa8, 0xbe1cf0cf91ff7b5d
+        .quad 0xc08626778c3d4798, 0xbe1cefe260819380
+        .quad 0xc086267a320d5070, 0xbe1ceebd90aa27a3
+        .quad 0xc086267cd6fd4ea8, 0xbe1cf0388121dffa
+        .quad 0xc086267f7b0dd630, 0xbe1cf1a3881435f1
+        .quad 0xc08626821e3f7a68, 0xbe1cef28e9d9ac52
+        .quad 0xc0862684c092ce08, 0xbe1cf02d300062dd
+        .quad 0xc086268762086350, 0xbe1cefaee1edfa35
+        .quad 0xc086268a02a0cbe0, 0xbe1cf0a5a052e936
+        .quad 0xc086268ca25c98d8, 0xbe1cee60a4a497ed
+        .quad 0xc086268f413c5ab0, 0xbe1cf0e4a5d0cf49
+        .quad 0xc0862691df40a170, 0xbe1cf149235a4e6e
+        .quad 0xc08626947c69fc80, 0xbe1cf215180b9fcc
+        .quad 0xc086269718b8fac8, 0xbe1cef9b156a9840
+        .quad 0xc0862699b42e2a90, 0xbe1cf054c91441be
+        .quad 0xc086269c4eca19a8, 0xbe1cf13ded26512c
+        .quad 0xc086269ee88d5550, 0xbe1cf22ea4d8ac06
+        .quad 0xc08626a181786a40, 0xbe1cf2354666ee2e
+        .quad 0xc08626a4198be4a8, 0xbe1cefef936752b3
+        .quad 0xc08626a6b0c85020, 0xbe1cf1e360a9db68
+        .quad 0xc08626a9472e37d8, 0xbe1ceed6aeb812c5
+        .quad 0xc08626abdcbe2650, 0xbe1cf227340b4986
+        .quad 0xc08626ae7178a5b0, 0xbe1cf0215a0cbe0d
+        .quad 0xc08626b1055e3f70, 0xbe1cf256adf0ae26
+        .quad 0xc08626b3986f7ca8, 0xbe1ceff3c67aed06
+        .quad 0xc08626b62aace5c8, 0xbe1cf2159fb93652
+        .quad 0xc08626b8bc1702e0, 0xbe1cf01e6dbd1c7f
+        .quad 0xc08626bb4cae5b60, 0xbe1cf009e75d1c0c
+        .quad 0xc08626bddc737648, 0xbe1ceec10a020e73
+        .quad 0xc08626c06b66da08, 0xbe1cf06d5783eee7
+        .quad 0xc08626c2f9890ca0, 0xbe1cf0cb8f169ffe
+        .quad 0xc08626c586da9388, 0xbe1cef7de2452430
+        .quad 0xc08626c8135bf3b0, 0xbe1cf05da6f783ae
+        .quad 0xc08626ca9f0db198, 0xbe1cefcc877d681d
+        .quad 0xc08626cd29f05138, 0xbe1cef0531954ab3
+        .quad 0xc08626cfb4045608, 0xbe1cf06b8565ea3d
+        .quad 0xc08626d23d4a4310, 0xbe1cefdc455d9d7e
+        .quad 0xc08626d4c5c29ad0, 0xbe1ceefc47e8fa64
+        .quad 0xc08626d74d6ddf48, 0xbe1cf1872bf033f2
+        .quad 0xc08626d9d44c9210, 0xbe1cf19d91087f9d
+        .quad 0xc08626dc5a5f3438, 0xbe1cf012d444c6ab
+        .quad 0xc08626dedfa64650, 0xbe1cf0ba528ee153
+        .quad 0xc08626e164224880, 0xbe1ceeb431709788
+        .quad 0xc08626e3e7d3ba60, 0xbe1cf0b9af31a6a5
+        .quad 0xc08626e66abb1b28, 0xbe1cf168fb2e135b
+        .quad 0xc08626e8ecd8e990, 0xbe1cef9097461c93
+        .quad 0xc08626eb6e2da3d0, 0xbe1cee7a434735d8
+        .quad 0xc08626edeeb9c7a8, 0xbe1cf235732b86f2
+        .quad 0xc08626f06e7dd280, 0xbe1cefe1510b89e6
+        .quad 0xc08626f2ed7a4120, 0xbe1cf1f64b9b80ef
+        .quad 0xc08626f56baf9000, 0xbe1cf08f320ca339
+        .quad 0xc08626f7e91e3b08, 0xbe1cf1b1de2808a1
+        .quad 0xc08626fa65c6bdc0, 0xbe1cf1976d778b28
+        .quad 0xc08626fce1a99338, 0xbe1ceef40a4f076f
+        .quad 0xc08626ff5cc73600, 0xbe1cef3e45869ce3
+        .quad 0xc0862701d7202048, 0xbe1ceef601b4c9d6
+        .quad 0xc086270450b4cbc0, 0xbe1cf1eaf0b57fd6
+        .quad 0xc0862706c985b1c0, 0xbe1cef82a44990f3
+        .quad 0xc086270941934b10, 0xbe1ceefe32981f2c
+        .quad 0xc086270bb8de1018, 0xbe1cefbf6f5a0445
+        .quad 0xc086270e2f6678d0, 0xbe1cf18dba75792c
+        .quad 0xc0862710a52cfcc8, 0xbe1cf0da64ce995f
+        .quad 0xc08627131a321318, 0xbe1cef04ac0fb802
+        .quad 0xc08627158e763268, 0xbe1cee9d4e2ad9bd
+        .quad 0xc086271801f9d0f8, 0xbe1cefa9b55407b5
+        .quad 0xc086271a74bd64a0, 0xbe1cefe6bd329570
+        .quad 0xc086271ce6c162c8, 0xbe1cef0b1205dc85
+        .quad 0xc086271f58064068, 0xbe1cef092a785e3f
+        .quad 0xc0862721c88c7210, 0xbe1cf050dcdaac30
+        .quad 0xc086272438546be8, 0xbe1cf210907ded8b
+        .quad 0xc0862726a75ea1b8, 0xbe1cee760be44f99
+        .quad 0xc086272915ab86c0, 0xbe1ceeeee07c2bcc
+        .quad 0xc086272b833b8df0, 0xbe1cf06874992df5
+        .quad 0xc086272df00f29d0, 0xbe1cef8fac5d4899
+        .quad 0xc08627305c26cc70, 0xbe1cf1103241cc99
+        .quad 0xc0862732c782e788, 0xbe1cf1d35fef83fe
+        .quad 0xc08627353223ec68, 0xbe1cef3ec8133e1d
+        .quad 0xc08627379c0a4be8, 0xbe1cef7261daccd8
+        .quad 0xc086273a05367688, 0xbe1cf18656c50806
+        .quad 0xc086273c6da8dc68, 0xbe1cf1c8736e049a
+        .quad 0xc086273ed561ed38, 0xbe1cf1f93bff4911
+        .quad 0xc08627413c621848, 0xbe1cf188a4ea680c
+        .quad 0xc0862743a2a9cc80, 0xbe1cf1d270930c80
+        .quad 0xc086274608397868, 0xbe1cf25a328c28e2
+        .quad 0xc08627486d118a28, 0xbe1cf106f90aa3b8
+        .quad 0xc086274ad1326f80, 0xbe1cee5e9d2e885a
+        .quad 0xc086274d349c95c0, 0xbe1cf1c0bac27228
+        .quad 0xc086274f975069f8, 0xbe1cf1a1500f9b1c
+        .quad 0xc0862751f94e58c0, 0xbe1cefc30663ac44
+        .quad 0xc08627545a96ce48, 0xbe1cf17123e427a2
+        .quad 0xc0862756bb2a3678, 0xbe1cefb92749fea4
+        .quad 0xc08627591b08fcc0, 0xbe1cefa40e1ea74a
+        .quad 0xc086275b7a338c40, 0xbe1cee6f4612c3e9
+        .quad 0xc086275dd8aa4fa8, 0xbe1cf1c54a053627
+        .quad 0xc0862760366db168, 0xbe1ceff5eb503d9e
+        .quad 0xc0862762937e1b70, 0xbe1cf02e47f10cee
+        .quad 0xc0862764efdbf768, 0xbe1ceeb06e1d0dad
+        .quad 0xc08627674b87ae88, 0xbe1cf10aadd6dba5
+        .quad 0xc0862769a681a9c0, 0xbe1cf24e9913d30f
+        .quad 0xc086276c00ca51a0, 0xbe1cef47b301e312
+        .quad 0xc086276e5a620e48, 0xbe1ceeb1cefc2e85
+        .quad 0xc0862770b3494788, 0xbe1cf16f1fbbe011
+        .quad 0xc08627730b8064e8, 0xbe1ceebdf75174c7
+        .quad 0xc08627756307cd70, 0xbe1cf06e3871a0da
+        .quad 0xc0862777b9dfe7f0, 0xbe1cef16799fd554
+        .quad 0xc086277a10091ac0, 0xbe1cf248dabf5377
+        .quad 0xc086277c6583cc00, 0xbe1cf0c78d92a2cd
+        .quad 0xc086277eba506158, 0xbe1cf0b911b029f0
+        .quad 0xc08627810e6f4028, 0xbe1cefdc24719766
+        .quad 0xc086278361e0cd70, 0xbe1cefbb6562b7e7
+        .quad 0xc0862785b4a56dd8, 0xbe1cf1e0afb349ec
+        .quad 0xc086278806bd85c0, 0xbe1cf008292e52fc
+        .quad 0xc086278a58297918, 0xbe1cf053073872bf
+        .quad 0xc086278ca8e9ab88, 0xbe1cf17a0a55a947
+        .quad 0xc086278ef8fe8068, 0xbe1ceeffb0b60234
+        .quad 0xc086279148685aa0, 0xbe1cf162204794a8
+        .quad 0xc086279397279ce0, 0xbe1cf24cc8cb48ac
+        .quad 0xc0862795e53ca978, 0xbe1cf0c9be68d5c3
+        .quad 0xc086279832a7e258, 0xbe1cf172cd3d7388
+        .quad 0xc086279a7f69a930, 0xbe1ceea2465fbce5
+        .quad 0xc086279ccb825f40, 0xbe1cf0a386d2500f
+        .quad 0xc086279f16f26590, 0xbe1cf1e338ddc18a
+        .quad 0xc08627a161ba1cd0, 0xbe1cef1f5049867f
+        .quad 0xc08627a3abd9e548, 0xbe1cef96c1ea8b1f
+        .quad 0xc08627a5f5521f00, 0xbe1cf138f6fd3c26
+        .quad 0xc08627a83e2329b0, 0xbe1cf0d4fcbfdf3a
+        .quad 0xc08627aa864d64b0, 0xbe1cf24870c12c81
+        .quad 0xc08627accdd12f18, 0xbe1cf0ae2a56348d
+        .quad 0xc08627af14aee7a0, 0xbe1cee8ca1a9b893
+        .quad 0xc08627b15ae6eca8, 0xbe1cf20414d637b0
+        .quad 0xc08627b3a0799c60, 0xbe1cf0fc6b7b12d8
+        .quad 0xc08627b5e5675488, 0xbe1cf152d93c4a00
+        .quad 0xc08627b829b072a0, 0xbe1cf1073f9b77c2
+        .quad 0xc08627ba6d5553d8, 0xbe1cee694f97d5a4
+        .quad 0xc08627bcb0565500, 0xbe1cf0456b8239d7
+        .quad 0xc08627bef2b3d2b0, 0xbe1cf211497127e3
+        .quad 0xc08627c1346e2930, 0xbe1cf01856c0384d
+        .quad 0xc08627c37585b468, 0xbe1cefa7dd05479e
+        .quad 0xc08627c5b5fad000, 0xbe1cef3ae8e50b93
+        .quad 0xc08627c7f5cdd750, 0xbe1ceea5f32fdd3a
+        .quad 0xc08627ca34ff2560, 0xbe1cef424caeb8d9
+        .quad 0xc08627cc738f14f0, 0xbe1cf0194d07a81f
+        .quad 0xc08627ceb17e0070, 0xbe1cf20f452000c1
+        .quad 0xc08627d0eecc4210, 0xbe1cf00e356218e4
+        .quad 0xc08627d32b7a33a0, 0xbe1cef30484b4bcb
+        .quad 0xc08627d567882eb0, 0xbe1ceeea11a6641b
+        .quad 0xc08627d7a2f68c80, 0xbe1cf13492d5bd7b
+        .quad 0xc08627d9ddc5a618, 0xbe1ceeb7048fad96
+        .quad 0xc08627dc17f5d418, 0xbe1ceef0666f0477
+        .quad 0xc08627de51876ee8, 0xbe1cf060d4b8b5c2
+        .quad 0xc08627e08a7acea8, 0xbe1cf0b2a4b6ff8c
+        .quad 0xc08627e2c2d04b28, 0xbe1cf0e34809a875
+        .quad 0xc08627e4fa883bf0, 0xbe1cf16bf74a3522
+        .quad 0xc08627e731a2f848, 0xbe1cee6a24623d57
+        .quad 0xc08627e96820d718, 0xbe1cefc7b4f1528e
+        .quad 0xc08627eb9e022f18, 0xbe1cf163051f3548
+        .quad 0xc08627edd34756b8, 0xbe1cef36b3366305
+        .quad 0xc08627f007f0a408, 0xbe1cf18134625550
+        .quad 0xc08627f23bfe6cf0, 0xbe1cf0ec32ec1a11
+        .quad 0xc08627f46f710700, 0xbe1ceeb3b64f3edc
+        .quad 0xc08627f6a248c778, 0xbe1cf0cd15805bc8
+        .quad 0xc08627f8d4860368, 0xbe1cf20db3bddebe
+        .quad 0xc08627fb06290f90, 0xbe1cf25188430e25
+        .quad 0xc08627fd37324070, 0xbe1ceea1713490f9
+        .quad 0xc08627ff67a1ea28, 0xbe1cf159521d234c
+        .quad 0xc0862801977860b8, 0xbe1cf24dfe50783b
+        .quad 0xc0862803c6b5f7d0, 0xbe1ceef2ef89a60b
+        .quad 0xc0862805f55b02c8, 0xbe1cee7fc919d62c
+        .quad 0xc08628082367d4c0, 0xbe1cf215a7fb513a
+        .quad 0xc086280a50dcc0a8, 0xbe1cf0e4401c5ed4
+        .quad 0xc086280c7dba1910, 0xbe1cf04ec734d256
+        .quad 0xc086280eaa003050, 0xbe1cf010ad787fea
+        .quad 0xc0862810d5af5880, 0xbe1cee622478393d
+        .quad 0xc086281300c7e368, 0xbe1cf01c7482564f
+        .quad 0xc08628152b4a22a0, 0xbe1cf0de20d33536
+        .quad 0xc086281755366778, 0xbe1cef2edae5837d
+        .quad 0xc08628197e8d02f0, 0xbe1cf0a345318cc9
+        .quad 0xc086281ba74e45d8, 0xbe1cf20085aa34b8
+        .quad 0xc086281dcf7a80c0, 0xbe1cef5fa845ad83
+        .quad 0xc086281ff71203e0, 0xbe1cf050d1df69c4
+        .quad 0xc08628221e151f48, 0xbe1ceffe43c035b9
+        .quad 0xc0862824448422b8, 0xbe1cf14f3018d3c2
+        .quad 0xc08628266a5f5dc0, 0xbe1cef0a5fbae83d
+        .quad 0xc08628288fa71f98, 0xbe1ceff8a95b72a1
+        .quad 0xc086282ab45bb750, 0xbe1cef073aa9849b
+        .quad 0xc086282cd87d73a8, 0xbe1cef69b3835c02
+        .quad 0xc086282efc0ca328, 0xbe1cf0bc139379a9
+        .quad 0xc08628311f099420, 0xbe1cef247a9ec596
+        .quad 0xc086283341749490, 0xbe1cef74bbcc488a
+        .quad 0xc0862835634df248, 0xbe1cef4bc42e7b8e
+        .quad 0xc08628378495fad0, 0xbe1cf136d4d5a810
+        .quad 0xc0862839a54cfb80, 0xbe1cf0d290b24dd8
+        .quad 0xc086283bc5734168, 0xbe1ceeebde8e0065
+        .quad 0xc086283de5091950, 0xbe1cf1a09f60aa1e
+        .quad 0xc0862840040ecfe0, 0xbe1cf0803947a234
+        .quad 0xc08628422284b168, 0xbe1cf0abf7638127
+        .quad 0xc0862844406b0a08, 0xbe1cf0f73ee12058
+        .quad 0xc08628465dc225a0, 0xbe1cf2079971b26c
+        .quad 0xc08628487a8a4fe0, 0xbe1cee74957564b1
+        .quad 0xc086284a96c3d420, 0xbe1ceee77c1b7d43
+        .quad 0xc086284cb26efd90, 0xbe1cf23addba6e09
+        .quad 0xc086284ecd8c1730, 0xbe1cf199f4a1da60
+        .quad 0xc0862850e81b6bb0, 0xbe1cf09fdea81393
+        .quad 0xc0862853021d4588, 0xbe1cf176adb417f7
+        .quad 0xc08628551b91ef00, 0xbe1cf0f64f84a8da
+        .quad 0xc08628573479b220, 0xbe1ceec34cf49523
+        .quad 0xc08628594cd4d8a8, 0xbe1cf16d60fbe0bb
+        .quad 0xc086285b64a3ac40, 0xbe1cee8de7acfc7b
+        .quad 0xc086285d7be67630, 0xbe1ceee6256cce8d
+        .quad 0xc086285f929d7fa0, 0xbe1cee7d66a3d8a5
+        .quad 0xc0862861a8c91170, 0xbe1cf0bef8265792
+        .quad 0xc0862863be697458, 0xbe1cf097f890c6f8
+        .quad 0xc0862865d37ef0c8, 0xbe1cf09502d5c3fc
+        .quad 0xc0862867e809cf00, 0xbe1ceeffb239dac7
+        .quad 0xc0862869fc0a56f8, 0xbe1cf1fbfff95c98
+        .quad 0xc086286c0f80d090, 0xbe1cefa57ad3eef7
+        .quad 0xc086286e226d8348, 0xbe1cf22c58b9183d
+        .quad 0xc086287034d0b690, 0xbe1ceff262d0a248
+        .quad 0xc086287246aab180, 0xbe1cefa7bc194186
+        .quad 0xc086287457fbbb08, 0xbe1cf06782d784d9
+        .quad 0xc086287668c419e0, 0xbe1cf1d44d0eaa07
+        .quad 0xc086287879041490, 0xbe1cf034803c8a48
+        .quad 0xc086287a88bbf158, 0xbe1cf08e84916b6f
+        .quad 0xc086287c97ebf650, 0xbe1cf0c4d3dc1bc7
+        .quad 0xc086287ea6946958, 0xbe1cefb1e4625943
+        .quad 0xc0862880b4b59010, 0xbe1cf143efdd1fd0
+        .quad 0xc0862882c24faff8, 0xbe1cee9896d016da
+        .quad 0xc0862884cf630e38, 0xbe1cf2186072f2cc
+        .quad 0xc0862886dbefeff0, 0xbe1cef9217633d34
+        .quad 0xc0862888e7f699e0, 0xbe1cf05603549486
+        .quad 0xc086288af37750b0, 0xbe1cef50fff513d3
+        .quad 0xc086288cfe7258c0, 0xbe1cf127713b32d0
+        .quad 0xc086288f08e7f650, 0xbe1cf05015520f3d
+        .quad 0xc086289112d86d58, 0xbe1cf12eb458b26f
+        .quad 0xc08628931c4401a8, 0xbe1cf22eae2887ed
+        .quad 0xc0862895252af6e0, 0xbe1cefdd6656dd2d
+        .quad 0xc08628972d8d9058, 0xbe1cf1048ea4e646
+        .quad 0xc0862899356c1150, 0xbe1ceec4501167e9
+        .quad 0xc086289b3cc6bcb8, 0xbe1cf0ad52becc3f
+        .quad 0xc086289d439dd568, 0xbe1cf0daa4e00e35
+        .quad 0xc086289f49f19df8, 0xbe1cf00b80de8d6a
+        .quad 0xc08628a14fc258c8, 0xbe1cf1bcf2ea8464
+        .quad 0xc08628a355104818, 0xbe1cf0435e2782b0
+        .quad 0xc08628a559dbade0, 0xbe1cf0e3e1a5f56c
+        .quad 0xc08628a75e24cbf8, 0xbe1cefed9d5a721d
+        .quad 0xc08628a961ebe3f8, 0xbe1cf0d2d74321e2
+        .quad 0xc08628ab65313750, 0xbe1cf24200eb55e9
+        .quad 0xc08628ad67f50740, 0xbe1cf23e9d7cf979
+        .quad 0xc08628af6a3794d0, 0xbe1cf23a088f421c
+        .quad 0xc08628b16bf920e0, 0xbe1cef2c1de1ab32
+        .quad 0xc08628b36d39ec08, 0xbe1cf1abc231f7b2
+        .quad 0xc08628b56dfa36d0, 0xbe1cf2074d5ba303
+        .quad 0xc08628b76e3a4180, 0xbe1cf05cd5eed880
+        /*== Log_LA_table ==*/
+        .align 16
+        .quad 0x8000000000000000
+        .quad 0xbf5ff802a9ab10e6
+        .quad 0xbf6ff00aa2b10bc0
+        .quad 0xbf77ee11ebd82e94
+        .quad 0xbf7fe02a6b106789
+        .quad 0xbf83e7295d25a7d9
+        .quad 0xbf87dc475f810a77
+        .quad 0xbf8bcf712c74384c
+        .quad 0xbf8fc0a8b0fc03e4
+        .quad 0xbf91d7f7eb9eebe7
+        .quad 0xbf93cea44346a575
+        .quad 0xbf95c45a51b8d389
+        .quad 0xbf97b91b07d5b11b
+        .quad 0xbf99ace7551cc514
+        .quad 0xbf9b9fc027af9198
+        .quad 0xbf9d91a66c543cc4
+        .quad 0xbf9f829b0e783300
+        .quad 0xbfa0b94f7c196176
+        .quad 0xbfa1b0d98923d980
+        .quad 0xbfa2a7ec2214e873
+        .quad 0xbfa39e87b9febd60
+        .quad 0xbfa494acc34d911c
+        .quad 0xbfa58a5bafc8e4d5
+        .quad 0xbfa67f94f094bd98
+        .quad 0xbfa77458f632dcfc
+        .quad 0xbfa868a83083f6cf
+        .quad 0xbfa95c830ec8e3eb
+        .quad 0xbfaa4fe9ffa3d235
+        .quad 0xbfab42dd711971bf
+        .quad 0xbfac355dd0921f2d
+        .quad 0xbfad276b8adb0b52
+        .quad 0xbfae19070c276016
+        .quad 0xbfaf0a30c01162a6
+        .quad 0xbfaffae9119b9303
+        .quad 0xbfb075983598e471
+        .quad 0xbfb0ed839b5526fe
+        .quad 0xbfb16536eea37ae1
+        .quad 0xbfb1dcb263db1944
+        .quad 0xbfb253f62f0a1417
+        .quad 0xbfb2cb0283f5de1f
+        .quad 0xbfb341d7961bd1d1
+        .quad 0xbfb3b87598b1b6ee
+        .quad 0xbfb42edcbea646f0
+        .quad 0xbfb4a50d3aa1b040
+        .quad 0xbfb51b073f06183f
+        .quad 0xbfb590cafdf01c28
+        .quad 0xbfb60658a93750c4
+        .quad 0xbfb67bb0726ec0fc
+        .quad 0xbfb6f0d28ae56b4c
+        .quad 0xbfb765bf23a6be13
+        .quad 0xbfb7da766d7b12cd
+        .quad 0xbfb84ef898e8282a
+        .quad 0xbfb8c345d6319b21
+        .quad 0xbfb9375e55595ede
+        .quad 0xbfb9ab42462033ad
+        .quad 0xbfba1ef1d8061cd4
+        .quad 0xbfba926d3a4ad563
+        .quad 0xbfbb05b49bee43fe
+        .quad 0xbfbb78c82bb0eda1
+        .quad 0xbfbbeba818146765
+        .quad 0xbfbc5e548f5bc743
+        .quad 0xbfbcd0cdbf8c13e1
+        .quad 0xbfbd4313d66cb35d
+        .quad 0xbfbdb5270187d927
+        .quad 0xbfbe27076e2af2e6
+        .quad 0xbfbe98b549671467
+        .quad 0xbfbf0a30c01162a6
+        .quad 0xbfbf7b79fec37ddf
+        .quad 0xbfbfec9131dbeabb
+        .quad 0xbfc02ebb42bf3d4b
+        .quad 0xbfc0671512ca596e
+        .quad 0xbfc09f561ee719c3
+        .quad 0xbfc0d77e7cd08e59
+        .quad 0xbfc10f8e422539b1
+        .quad 0xbfc14785846742ac
+        .quad 0xbfc17f6458fca611
+        .quad 0xbfc1b72ad52f67a0
+        .quad 0xbfc1eed90e2dc2c3
+        .quad 0xbfc2266f190a5acb
+        .quad 0xbfc25ded0abc6ad2
+        .quad 0xbfc29552f81ff523
+        .quad 0xbfc2cca0f5f5f251
+        .quad 0xbfc303d718e47fd3
+        .quad 0xbfc33af575770e4f
+        .quad 0xbfc371fc201e8f74
+        .quad 0xbfc3a8eb2d31a376
+        .quad 0xbfc3dfc2b0ecc62a
+        .quad 0xbfc41682bf727bc0
+        .quad 0xbfc44d2b6ccb7d1e
+        .quad 0xbfc483bccce6e3dd
+        .quad 0xbfc4ba36f39a55e5
+        .quad 0xbfc4f099f4a230b2
+        .quad 0xbfc526e5e3a1b438
+        .quad 0xbfc55d1ad4232d6f
+        .quad 0xbfc59338d9982086
+        .quad 0xbfc5c940075972b9
+        .quad 0xbfc5ff3070a793d4
+        .quad 0xbfc6350a28aaa758
+        .quad 0xbfc66acd4272ad51
+        .quad 0xbfc6a079d0f7aad2
+        .quad 0xbfc6d60fe719d21d
+        .quad 0xbfc70b8f97a1aa75
+        .quad 0xbfc740f8f54037a5
+        .quad 0xbfc7764c128f2127
+        .quad 0xbfc7ab890210d909
+        .quad 0xbfc7e0afd630c274
+        .quad 0xbfc815c0a14357eb
+        .quad 0xbfc84abb75865139
+        .quad 0xbfc87fa06520c911
+        .quad 0xbfc8b46f8223625b
+        .quad 0xbfc8e928de886d41
+        .quad 0xbfc91dcc8c340bde
+        .quad 0xbfc9525a9cf456b4
+        .quad 0xbfc986d3228180ca
+        .quad 0xbfc9bb362e7dfb83
+        .quad 0xbfc9ef83d2769a34
+        .quad 0xbfca23bc1fe2b563
+        .quad 0xbfca57df28244dcd
+        .quad 0xbfca8becfc882f19
+        .quad 0xbfcabfe5ae46124c
+        .quad 0xbfcaf3c94e80bff3
+        .quad 0xbfcb2797ee46320c
+        .quad 0xbfcb5b519e8fb5a4
+        .quad 0xbfcb8ef670420c3b
+        .quad 0xbfcbc286742d8cd6
+        .quad 0xbfcbf601bb0e44e2
+        .quad 0xbfcc2968558c18c1
+        .quad 0xbfcc5cba543ae425
+        .quad 0xbfcc8ff7c79a9a22
+        .quad 0xbfccc320c0176502
+        .quad 0xbfccf6354e09c5dc
+        .quad 0xbfcd293581b6b3e7
+        .quad 0xbfcd5c216b4fbb91
+        .quad 0xbfcd8ef91af31d5e
+        .quad 0xbfcdc1bca0abec7d
+        .quad 0xbfcdf46c0c722d2f
+        .quad 0xbfce27076e2af2e6
+        .quad 0xbfce598ed5a87e2f
+        .quad 0xbfce8c0252aa5a60
+        .quad 0xbfcebe61f4dd7b0b
+        .quad 0xbfcef0adcbdc5936
+        .quad 0xbfcf22e5e72f105d
+        .quad 0xbfcf550a564b7b37
+        .quad 0xbfcf871b28955045
+        .quad 0xbfcfb9186d5e3e2b
+        .quad 0xbfcfeb0233e607cc
+        .quad 0xbfd00e6c45ad501d
+        .quad 0xbfd0274dc16c232f
+        .quad 0xbfd0402594b4d041
+        .quad 0xbfd058f3c703ebc6
+        .quad 0xbfd071b85fcd590d
+        .quad 0xbfd08a73667c57af
+        .quad 0xbfd0a324e27390e3
+        .quad 0xbfd0bbccdb0d24bd
+        .quad 0xbfd0d46b579ab74b
+        .quad 0xbfd0ed005f657da4
+        .quad 0xbfd1058bf9ae4ad5
+        .quad 0xbfd11e0e2dad9cb7
+        .quad 0xbfd136870293a8b0
+        .quad 0xbfd14ef67f88685a
+        .quad 0xbfd1675cababa60e
+        .quad 0xbfd17fb98e15095d
+        .quad 0xbfd1980d2dd4236f
+        .quad 0xbfd1b05791f07b49
+        .quad 0xbfd1c898c16999fb
+        .quad 0xbfd1e0d0c33716be
+        .quad 0xbfd1f8ff9e48a2f3
+        .quad 0xbfd211255986160c
+        .quad 0xbfd22941fbcf7966
+        .quad 0xbfd241558bfd1404
+        .quad 0xbfd2596010df763a
+        .quad 0xbfd27161913f853d
+        .quad 0xbfd2895a13de86a3
+        .quad 0xbfd2a1499f762bc9
+        .quad 0xbfd2b9303ab89d25
+        .quad 0xbfd2d10dec508583
+        .quad 0xbfd2e8e2bae11d31
+        .quad 0xbfd300aead06350c
+        .quad 0xbfd31871c9544185
+        .quad 0xbfd3302c16586588
+        .quad 0xbfd347dd9a987d55
+        .quad 0xbfd35f865c93293e
+        .quad 0xbfd3772662bfd85b
+        .quad 0xbfd38ebdb38ed321
+        .quad 0xbfd3a64c556945ea
+        .quad 0xbfd3bdd24eb14b6a
+        .quad 0xbfd3d54fa5c1f710
+        .quad 0xbfd3ecc460ef5f50
+        .quad 0xbfd404308686a7e4
+        .quad 0xbfd41b941cce0bee
+        .quad 0xbfd432ef2a04e814
+        .quad 0xbfd44a41b463c47c
+        .quad 0xbfd4618bc21c5ec2
+        .quad 0xbfd478cd5959b3d9
+        .quad 0xbfd49006804009d1
+        .quad 0xbfd4a7373cecf997
+        .quad 0xbfd4be5f957778a1
+        .quad 0xbfd4d57f8fefe27f
+        .quad 0xbfd4ec973260026a
+        .quad 0xbfd503a682cb1cb3
+        .quad 0xbfd51aad872df82d
+        .quad 0xbfd531ac457ee77e
+        .quad 0xbfd548a2c3add263
+        .quad 0xbfd55f9107a43ee2
+        .quad 0xbfd5767717455a6c
+        .quad 0xbfd58d54f86e02f2
+        .quad 0xbfd5a42ab0f4cfe2
+        .quad 0xbfd5baf846aa1b19
+        .quad 0xbfd5d1bdbf5809ca
+        .quad 0xbfd5e87b20c2954a
+        .quad 0xbfd5ff3070a793d4
+        .quad 0xbfd615ddb4bec13c
+        .quad 0xbfd62c82f2b9c795
+        .quad 0x3fd61965cdb02c1f
+        .quad 0x3fd602d08af091ec
+        .quad 0x3fd5ec433d5c35ae
+        .quad 0x3fd5d5bddf595f30
+        .quad 0x3fd5bf406b543db2
+        .quad 0x3fd5a8cadbbedfa1
+        .quad 0x3fd5925d2b112a59
+        .quad 0x3fd57bf753c8d1fb
+        .quad 0x3fd565995069514c
+        .quad 0x3fd54f431b7be1a9
+        .quad 0x3fd538f4af8f72fe
+        .quad 0x3fd522ae0738a3d8
+        .quad 0x3fd50c6f1d11b97c
+        .quad 0x3fd4f637ebba9810
+        .quad 0x3fd4e0086dd8baca
+        .quad 0x3fd4c9e09e172c3c
+        .quad 0x3fd4b3c077267e9a
+        .quad 0x3fd49da7f3bcc41f
+        .quad 0x3fd487970e958770
+        .quad 0x3fd4718dc271c41b
+        .quad 0x3fd45b8c0a17df13
+        .quad 0x3fd44591e0539f49
+        .quad 0x3fd42f9f3ff62642
+        .quad 0x3fd419b423d5e8c7
+        .quad 0x3fd403d086cea79c
+        .quad 0x3fd3edf463c1683e
+        .quad 0x3fd3d81fb5946dba
+        .quad 0x3fd3c25277333184
+        .quad 0x3fd3ac8ca38e5c5f
+        .quad 0x3fd396ce359bbf54
+        .quad 0x3fd3811728564cb2
+        .quad 0x3fd36b6776be1117
+        .quad 0x3fd355bf1bd82c8b
+        .quad 0x3fd3401e12aecba1
+        .quad 0x3fd32a84565120a8
+        .quad 0x3fd314f1e1d35ce4
+        .quad 0x3fd2ff66b04ea9d4
+        .quad 0x3fd2e9e2bce12286
+        .quad 0x3fd2d46602adccee
+        .quad 0x3fd2bef07cdc9354
+        .quad 0x3fd2a982269a3dbf
+        .quad 0x3fd2941afb186b7c
+        .quad 0x3fd27ebaf58d8c9d
+        .quad 0x3fd269621134db92
+        .quad 0x3fd25410494e56c7
+        .quad 0x3fd23ec5991eba49
+        .quad 0x3fd22981fbef797b
+        .quad 0x3fd214456d0eb8d4
+        .quad 0x3fd1ff0fe7cf47a7
+        .quad 0x3fd1e9e1678899f4
+        .quad 0x3fd1d4b9e796c245
+        .quad 0x3fd1bf99635a6b95
+        .quad 0x3fd1aa7fd638d33f
+        .quad 0x3fd1956d3b9bc2fa
+        .quad 0x3fd180618ef18adf
+        .quad 0x3fd16b5ccbacfb73
+        .quad 0x3fd1565eed455fc3
+        .quad 0x3fd14167ef367783
+        .quad 0x3fd12c77cd00713b
+        .quad 0x3fd1178e8227e47c
+        .quad 0x3fd102ac0a35cc1c
+        .quad 0x3fd0edd060b78081
+        .quad 0x3fd0d8fb813eb1ef
+        .quad 0x3fd0c42d676162e3
+        .quad 0x3fd0af660eb9e279
+        .quad 0x3fd09aa572e6c6d4
+        .quad 0x3fd085eb8f8ae797
+        .quad 0x3fd07138604d5862
+        .quad 0x3fd05c8be0d9635a
+        .quad 0x3fd047e60cde83b8
+        .quad 0x3fd03346e0106062
+        .quad 0x3fd01eae5626c691
+        .quad 0x3fd00a1c6adda473
+        .quad 0x3fcfeb2233ea07cd
+        .quad 0x3fcfc218be620a5e
+        .quad 0x3fcf991c6cb3b379
+        .quad 0x3fcf702d36777df0
+        .quad 0x3fcf474b134df229
+        .quad 0x3fcf1e75fadf9bde
+        .quad 0x3fcef5ade4dcffe6
+        .quad 0x3fceccf2c8fe920a
+        .quad 0x3fcea4449f04aaf5
+        .quad 0x3fce7ba35eb77e2a
+        .quad 0x3fce530effe71012
+        .quad 0x3fce2a877a6b2c12
+        .quad 0x3fce020cc6235ab5
+        .quad 0x3fcdd99edaf6d7e9
+        .quad 0x3fcdb13db0d48940
+        .quad 0x3fcd88e93fb2f450
+        .quad 0x3fcd60a17f903515
+        .quad 0x3fcd38666871f465
+        .quad 0x3fcd1037f2655e7b
+        .quad 0x3fcce816157f1988
+        .quad 0x3fccc000c9db3c52
+        .quad 0x3fcc97f8079d44ec
+        .quad 0x3fcc6ffbc6f00f71
+        .quad 0x3fcc480c0005ccd1
+        .quad 0x3fcc2028ab17f9b4
+        .quad 0x3fcbf851c067555f
+        .quad 0x3fcbd087383bd8ad
+        .quad 0x3fcba8c90ae4ad19
+        .quad 0x3fcb811730b823d2
+        .quad 0x3fcb5971a213acdb
+        .quad 0x3fcb31d8575bce3d
+        .quad 0x3fcb0a4b48fc1b46
+        .quad 0x3fcae2ca6f672bd4
+        .quad 0x3fcabb55c31693ad
+        .quad 0x3fca93ed3c8ad9e3
+        .quad 0x3fca6c90d44b704e
+        .quad 0x3fca454082e6ab05
+        .quad 0x3fca1dfc40f1b7f1
+        .quad 0x3fc9f6c407089664
+        .quad 0x3fc9cf97cdce0ec3
+        .quad 0x3fc9a8778debaa38
+        .quad 0x3fc981634011aa75
+        .quad 0x3fc95a5adcf7017f
+        .quad 0x3fc9335e5d594989
+        .quad 0x3fc90c6db9fcbcd9
+        .quad 0x3fc8e588ebac2dbf
+        .quad 0x3fc8beafeb38fe8c
+        .quad 0x3fc897e2b17b19a5
+        .quad 0x3fc871213750e994
+        .quad 0x3fc84a6b759f512f
+        .quad 0x3fc823c16551a3c2
+        .quad 0x3fc7fd22ff599d4f
+        .quad 0x3fc7d6903caf5ad0
+        .quad 0x3fc7b0091651528c
+        .quad 0x3fc7898d85444c73
+        .quad 0x3fc7631d82935a86
+        .quad 0x3fc73cb9074fd14d
+        .quad 0x3fc716600c914054
+        .quad 0x3fc6f0128b756abc
+        .quad 0x3fc6c9d07d203fc7
+        .quad 0x3fc6a399dabbd383
+        .quad 0x3fc67d6e9d785771
+        .quad 0x3fc6574ebe8c133a
+        .quad 0x3fc6313a37335d76
+        .quad 0x3fc60b3100b09476
+        .quad 0x3fc5e533144c1719
+        .quad 0x3fc5bf406b543db2
+        .quad 0x3fc59958ff1d52f1
+        .quad 0x3fc5737cc9018cdd
+        .quad 0x3fc54dabc26105d2
+        .quad 0x3fc527e5e4a1b58d
+        .quad 0x3fc5022b292f6a45
+        .quad 0x3fc4dc7b897bc1c8
+        .quad 0x3fc4b6d6fefe22a4
+        .quad 0x3fc4913d8333b561
+        .quad 0x3fc46baf0f9f5db7
+        .quad 0x3fc4462b9dc9b3dc
+        .quad 0x3fc420b32740fdd4
+        .quad 0x3fc3fb45a59928cc
+        .quad 0x3fc3d5e3126bc27f
+        .quad 0x3fc3b08b6757f2a9
+        .quad 0x3fc38b3e9e027479
+        .quad 0x3fc365fcb0159016
+        .quad 0x3fc340c59741142e
+        .quad 0x3fc31b994d3a4f85
+        .quad 0x3fc2f677cbbc0a96
+        .quad 0x3fc2d1610c86813a
+        .quad 0x3fc2ac55095f5c59
+        .quad 0x3fc28753bc11aba5
+        .quad 0x3fc2625d1e6ddf57
+        .quad 0x3fc23d712a49c202
+        .quad 0x3fc2188fd9807263
+        .quad 0x3fc1f3b925f25d41
+        .quad 0x3fc1ceed09853752
+        .quad 0x3fc1aa2b7e23f72a
+        .quad 0x3fc185747dbecf34
+        .quad 0x3fc160c8024b27b1
+        .quad 0x3fc13c2605c398c3
+        .quad 0x3fc1178e8227e47c
+        .quad 0x3fc0f301717cf0fb
+        .quad 0x3fc0ce7ecdccc28d
+        .quad 0x3fc0aa06912675d5
+        .quad 0x3fc08598b59e3a07
+        .quad 0x3fc06135354d4b18
+        .quad 0x3fc03cdc0a51ec0d
+        .quad 0x3fc0188d2ecf6140
+        .quad 0x3fbfe89139dbd566
+        .quad 0x3fbfa01c9db57ce2
+        .quad 0x3fbf57bc7d9005db
+        .quad 0x3fbf0f70cdd992e3
+        .quad 0x3fbec739830a1120
+        .quad 0x3fbe7f1691a32d3e
+        .quad 0x3fbe3707ee30487b
+        .quad 0x3fbdef0d8d466db9
+        .quad 0x3fbda727638446a2
+        .quad 0x3fbd5f55659210e2
+        .quad 0x3fbd179788219364
+        .quad 0x3fbccfedbfee13a8
+        .quad 0x3fbc885801bc4b23
+        .quad 0x3fbc40d6425a5cb1
+        .quad 0x3fbbf968769fca11
+        .quad 0x3fbbb20e936d6974
+        .quad 0x3fbb6ac88dad5b1c
+        .quad 0x3fbb23965a52ff00
+        .quad 0x3fbadc77ee5aea8c
+        .quad 0x3fba956d3ecade63
+        .quad 0x3fba4e7640b1bc38
+        .quad 0x3fba0792e9277cac
+        .quad 0x3fb9c0c32d4d2548
+        .quad 0x3fb97a07024cbe74
+        .quad 0x3fb9335e5d594989
+        .quad 0x3fb8ecc933aeb6e8
+        .quad 0x3fb8a6477a91dc29
+        .quad 0x3fb85fd927506a48
+        .quad 0x3fb8197e2f40e3f0
+        .quad 0x3fb7d33687c293c9
+        .quad 0x3fb78d02263d82d3
+        .quad 0x3fb746e100226ed9
+        .quad 0x3fb700d30aeac0e1
+        .quad 0x3fb6bad83c1883b6
+        .quad 0x3fb674f089365a7a
+        .quad 0x3fb62f1be7d77743
+        .quad 0x3fb5e95a4d9791cb
+        .quad 0x3fb5a3abb01ade25
+        .quad 0x3fb55e10050e0384
+        .quad 0x3fb518874226130a
+        .quad 0x3fb4d3115d207eac
+        .quad 0x3fb48dae4bc31018
+        .quad 0x3fb4485e03dbdfad
+        .quad 0x3fb403207b414b7f
+        .quad 0x3fb3bdf5a7d1ee64
+        .quad 0x3fb378dd7f749714
+        .quad 0x3fb333d7f8183f4b
+        .quad 0x3fb2eee507b40301
+        .quad 0x3fb2aa04a44717a5
+        .quad 0x3fb26536c3d8c369
+        .quad 0x3fb2207b5c78549e
+        .quad 0x3fb1dbd2643d190b
+        .quad 0x3fb1973bd1465567
+        .quad 0x3fb152b799bb3cc9
+        .quad 0x3fb10e45b3cae831
+        .quad 0x3fb0c9e615ac4e17
+        .quad 0x3fb08598b59e3a07
+        .quad 0x3fb0415d89e74444
+        .quad 0x3faffa6911ab9301
+        .quad 0x3faf723b517fc523
+        .quad 0x3faeea31c006b87c
+        .quad 0x3fae624c4a0b5e1b
+        .quad 0x3fadda8adc67ee4e
+        .quad 0x3fad52ed6405d86f
+        .quad 0x3faccb73cdddb2cc
+        .quad 0x3fac441e06f72a9e
+        .quad 0x3fabbcebfc68f420
+        .quad 0x3fab35dd9b58baad
+        .quad 0x3faaaef2d0fb10fc
+        .quad 0x3faa282b8a936171
+        .quad 0x3fa9a187b573de7c
+        .quad 0x3fa91b073efd7314
+        .quad 0x3fa894aa149fb343
+        .quad 0x3fa80e7023d8ccc4
+        .quad 0x3fa788595a3577ba
+        .quad 0x3fa70265a550e777
+        .quad 0x3fa67c94f2d4bb58
+        .quad 0x3fa5f6e73078efb8
+        .quad 0x3fa5715c4c03ceef
+        .quad 0x3fa4ebf43349e26f
+        .quad 0x3fa466aed42de3ea
+        .quad 0x3fa3e18c1ca0ae92
+        .quad 0x3fa35c8bfaa1306b
+        .quad 0x3fa2d7ae5c3c5bae
+        .quad 0x3fa252f32f8d183f
+        .quad 0x3fa1ce5a62bc353a
+        .quad 0x3fa149e3e4005a8d
+        .quad 0x3fa0c58fa19dfaaa
+        .quad 0x3fa0415d89e74444
+        .quad 0x3f9f7a9b16782856
+        .quad 0x3f9e72bf2813ce51
+        .quad 0x3f9d6b2725979802
+        .quad 0x3f9c63d2ec14aaf2
+        .quad 0x3f9b5cc258b718e6
+        .quad 0x3f9a55f548c5c43f
+        .quad 0x3f994f6b99a24475
+        .quad 0x3f98492528c8cabf
+        .quad 0x3f974321d3d006d3
+        .quad 0x3f963d6178690bd6
+        .quad 0x3f9537e3f45f3565
+        .quad 0x3f9432a925980cc1
+        .quad 0x3f932db0ea132e22
+        .quad 0x3f9228fb1fea2e28
+        .quad 0x3f912487a5507f70
+        .quad 0x3f90205658935847
+        .quad 0x3f8e38ce3033310c
+        .quad 0x3f8c317384c75f06
+        .quad 0x3f8a2a9c6c170462
+        .quad 0x3f882448a388a2aa
+        .quad 0x3f861e77e8b53fc6
+        .quad 0x3f841929f96832f0
+        .quad 0x3f82145e939ef1e9
+        .quad 0x3f8010157588de71
+        .quad 0x3f7c189cbb0e27fb
+        .quad 0x3f78121214586b54
+        .quad 0x3f740c8a747878e2
+        .quad 0x3f70080559588b35
+        .quad 0x3f680904828985c0
+        .quad 0x3f60040155d5889e
+        .quad 0x3f50020055655889
+        .quad 0x0000000000000000
+        /*== poly_coeff[4] ==*/
+        .align 16
+        .quad 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A /* coeff4 */
+        .quad 0xbfd0000148058EE1, 0xbfd0000148058EE1 /* coeff3 */
+        .quad 0x3fd55555555543C5, 0x3fd55555555543C5 /* coeff2 */
+        .quad 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F /* coeff1 */
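+        /* The four coefficients above are slightly tuned Taylor coefficients of
+           log(1 + r): coeff1 ~ -1/2, coeff2 ~ 1/3, coeff3 ~ -1/4, coeff4 ~ 1/5.  */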
+        /*== ExpMask ==*/
+        .align 16
+        .quad 0x000fffffffffffff, 0x000fffffffffffff
+        /*== Two10 ==*/
+        .align 16
+        .quad 0x3f50000000000000, 0x3f50000000000000
+        /*== MinLog1p = -1+2^(-53) ==*/
+        .align 16
+        .quad 0xbfefffffffffffff, 0xbfefffffffffffff
+        /*== MaxLog1p ==*/
+        .align 16
+        .quad 0x7f3ffffffffff000, 0x7f3ffffffffff000
+        /*== One ==*/
+        .align 16
+        .quad 0x3ff0000000000000, 0x3ff0000000000000
+        /*== SgnMask ==*/
+        .align 16
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff
+        /*== XThreshold ==*/
+        .align 16
+        .quad 0x3e00000000000000, 0x3e00000000000000
+        /*== XhMask ==*/
+        .align 16
+        .quad 0xfffffffffffffc00, 0xfffffffffffffc00
+        /*== Threshold ==*/
+        .align 16
+        .quad 0x4086a00000000000, 0x4086a00000000000
+        /*== Bias ==*/
+        .align 16
+        .quad 0x408ff80000000000, 0x408ff80000000000
+        /*== Bias1 ==*/
+        .align 16
+        .quad 0x408ff00000000000, 0x408ff00000000000
+        /*== ExpMask ==*/
+        .align 16
+        .quad 0x7ff0000000000000, 0x7ff0000000000000
+        /*== ExpMask2 ==*/
+        .align 16
+        .quad 0x7f40000000000000, 0x7f40000000000000
+        /*== L2L ==*/
+        .align 16
+        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
+        /*== dBigThreshold ==*/
+        .align 16
+        .quad 0x41D0000000000000, 0x41D0000000000000
+        /*== dC2 ==*/
+        .align 16
+        .quad 0x3FD8000000000000, 0x3FD8000000000000
+        /*== dC3 ==*/
+        .align 16
+        .quad 0x3FD4000000000000, 0x3FD4000000000000
+        /*== dC4 ==*/
+        .align 16
+        .quad 0x3FD1800000000000, 0x3FD1800000000000
+        /*== dC5 ==*/
+        .align 16
+        .quad 0x3FCF800000000000, 0x3FCF800000000000
+        /*== dHalf ==*/
+        .align 16
+        .quad 0x3FE0000000000000, 0x3FE0000000000000
+        /*== dLargestFinite ==*/
+        .align 16
+        .quad 0x7FEFFFFFFFFFFFFF, 0x7FEFFFFFFFFFFFFF
+        /*== dLittleThreshold ==*/
+        .align 16
+        .quad 0x3F60000000000000, 0x3F60000000000000
+        /*== dSign ==*/
+        .align 16
+        .quad 0x8000000000000000, 0x8000000000000000
+        /*== dThirtyOne ==*/
+        .align 16
+        .quad 0x403F000000000000, 0x403F000000000000
+        /*== dTopMask12 ==*/
+        .align 16
+        .quad 0xFFFFFE0000000000, 0xFFFFFE0000000000
+        /*== dTopMask26 ==*/
+        .align 16
+        .quad 0xFFFFFFFFF8000000, 0xFFFFFFFFF8000000
+        /*== dTopMask29 ==*/
+        .align 16
+        .quad 0xFFFFFFFFFF000000, 0xFFFFFFFFFF000000
+        /*== XScale ==*/
+        .align 16
+        .quad 0x3E10000000000000, 0x3E10000000000000
+        .align 16
+        .type	__svml_dasinh_data_internal,@object
+        .size	__svml_dasinh_data_internal,.-__svml_dasinh_data_internal
+        .align 16
+
+.FLT_30:
+        .long	0x00000000,0x43380000,0x00000000,0x43380000
+        .type	.FLT_30,@object
+        .size	.FLT_30,16
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core-sse.S
new file mode 100644
index 0000000000..903b5f0fb5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized asinh, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN4v_asinh _ZGVdN4v_asinh_sse_wrapper
+#include "../svml_d_asinh4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core.c
new file mode 100644
index 0000000000..e7acd032b5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized asinh, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN4v_asinh
+#include "ifunc-mathvec-avx2.h"
+
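+/* ifunc-mathvec-avx2.h supplies IFUNC_SELECTOR, an ifunc resolver that picks
+   the AVX2 implementation when the running CPU supports it and otherwise the
+   SSE wrapper built from svml_d_asinh4_core.S (see the -sse.S file above).  */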
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN4v_asinh, __GI__ZGVdN4v_asinh, __redirect__ZGVdN4v_asinh)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core_avx2.S
new file mode 100644
index 0000000000..d691d1ec6f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core_avx2.S
@@ -0,0 +1,1601 @@
+/* Function asinh vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute asinh(x) as log(x + sqrt(x*x + 1))
+ *
+ *   Special cases:
+ *
+ *   asinh(NaN) = quiet NaN, and raise invalid exception
+ *   asinh(+INF) = +INF, asinh(-INF) = -INF
+ *   asinh(+0)   = +0,   asinh(-0)   = -0
+ *
+ */
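+
+/* For reference, a scalar sketch of the same computation (illustrative only,
+ * not the code used here; it ignores the overflow of x*x for huge |x|, which
+ * the vector code below handles via the 2^-30 scaling described later):
+ *
+ *   #include <math.h>
+ *
+ *   double ref_asinh (double x)
+ *   {
+ *     double ax = fabs (x);
+ *     double t = ax * ax;
+ *     // sqrt(1 + x^2) - 1 == x^2 / (sqrt(1 + x^2) + 1), so build the
+ *     // argument |x| + (sqrt(1 + x^2) - 1) and feed it to log1p.
+ *     double r = log1p (ax + t / (sqrt (t + 1.0) + 1.0));
+ *     return copysign (r, x);     // asinh is odd: restore the input sign.
+ *   }
+ */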
+
+/* Offsets for data table __svml_dasinh_data_internal
+ */
+#define Log_HA_table                  	0
+#define Log_LA_table                  	8224
+#define poly_coeff                    	12352
+#define ExpMask                       	12480
+#define Two10                         	12512
+#define MinLog1p                      	12544
+#define MaxLog1p                      	12576
+#define One                           	12608
+#define SgnMask                       	12640
+#define XThreshold                    	12672
+#define XhMask                        	12704
+#define Threshold                     	12736
+#define Bias                          	12768
+#define Bias1                         	12800
+#define ExpMask0                      	12832
+#define ExpMask2                      	12864
+#define L2                            	12896
+#define dBigThreshold                 	12928
+#define dC2                           	12960
+#define dC3                           	12992
+#define dC4                           	13024
+#define dC5                           	13056
+#define dHalf                         	13088
+#define dLargestFinite                	13120
+#define dLittleThreshold              	13152
+#define dSign                         	13184
+#define dThirtyOne                    	13216
+#define dTopMask12                    	13248
+#define dTopMask29                    	13280
+#define XScale                        	13312
+
+/* Lookup bias for data table __svml_dasinh_data_internal.  */
+#define Table_Lookup_Bias               -0x405fe0
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN4v_asinh_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+        lea       Table_Lookup_Bias+__svml_dasinh_data_internal(%rip), %r8
+        vmovapd   %ymm0, %ymm13
+        vmovupd   SgnMask+__svml_dasinh_data_internal(%rip), %ymm9
+
+/* Load the constant 1 and a sign mask */
+        vmovupd   One+__svml_dasinh_data_internal(%rip), %ymm12
+
+/* No need to split X when FMA is available in hardware. */
+        vmulpd    %ymm13, %ymm13, %ymm8
+
+/*
+ * Get the absolute value of the input, since we will exploit antisymmetry
+ * and mostly assume X >= 0 in the core computation
+ */
+        vandpd    %ymm9, %ymm13, %ymm10
+
+/*
+ * Check whether the input is finite, by checking |X| <= MaxFloat
+ * Otherwise set the rangemask so that the callout will get used.
+ * Note that this will also use the callout for NaNs since not(NaN <= MaxFloat)
+ */
+        vcmpnle_uqpd dLargestFinite+__svml_dasinh_data_internal(%rip), %ymm10, %ymm14
+
+/*
+ * Finally, express Y + W = X^2 + 1 accurately where Y has <= 29 bits.
+ * If |X| <= 1 then |XHi| <= 1 and so |X2Hi| <= 1, so we can treat 1
+ * as the dominant component in the compensated summation. Otherwise,
+ * if |X| >= 1, then since X2Hi only has 52 significant bits, the basic
+ * addition will be exact anyway until we get to |X| >= 2^53. But by
+ * that time the log function is well-conditioned enough that the
+ * rounding error doesn't matter. Hence we can treat 1 as dominant even
+ * if it literally isn't.
+ */
+        vaddpd    %ymm8, %ymm12, %ymm5
+
+/*
+ * The following computation can go wrong for very large X, basically
+ * because X^2 overflows. But for large X we have
+ * asinh(X) / log(2 X) - 1 =~= 1/(4 * X^2), so for X >= 2^30
+ * we can just later stick X back into the log and tweak up the exponent.
+ * Actually we scale X by 2^-30 and tweak the exponent up by 31,
+ * to stay in the safe range for the later log computation.
+ * Compute a flag now telling us when do do this.
+ */
+        vcmplt_oqpd dBigThreshold+__svml_dasinh_data_internal(%rip), %ymm10, %ymm11
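+
+/* Worked out: XScale is 2^-30 and dThirtyOne is 31, so in the "big" case
+ * log(2^-30 * |X|) + 31 * log(2) = log(2^(31-30) * |X|) = log(2 * |X|),
+ * which is the large-|X| approximation of asinh(|X|) used here.
+ */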
+        vsubpd    %ymm5, %ymm12, %ymm15
+        vmovmskpd %ymm14, %eax
+        vandpd    dTopMask29+__svml_dasinh_data_internal(%rip), %ymm5, %ymm14
+
+/*
+ * Compute R = 1/sqrt(Y + W) * (1 + d)
+ * Force R to <= 12 significant bits in case it isn't already
+ * This means that R * Y and R^2 * Y are exactly representable.
+ */
+        vcvtpd2ps %ymm14, %xmm1
+        vaddpd    %ymm15, %ymm8, %ymm0
+        vsubpd    %ymm14, %ymm5, %ymm2
+        vrsqrtps  %xmm1, %xmm3
+        vmovapd   %ymm13, %ymm7
+        vfmsub213pd %ymm8, %ymm13, %ymm7
+        vcvtps2pd %xmm3, %ymm6
+        vaddpd    %ymm0, %ymm7, %ymm4
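+
+/* At this point Y = RN(1 + X2Hi) and the correction term is
+ * W0 = ((1 - Y) + X2Hi) + (X*X - X2Hi),
+ * i.e. a Fast2Sum with 1 as the dominant operand plus the FMA-recovered low
+ * half of X^2, so Y + W0 represents 1 + X*X to roughly twice the working
+ * precision.
+ */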
+
+/*
+ * Unfortunately, we can still be in trouble if |X| <= 2^-10, since
+ * the absolute error 2^-(12+53)-ish in sqrt(1 + X^2) gets scaled up
+ * by 1/X and comes close to our threshold. Hence if |X| <= 2^-9,
+ * perform an alternative computation
+ * sqrt(1 + X^2) - 1 = X^2/2 - X^4/8 + X^6/16
+ * X2 = X^2
+ */
+        vaddpd    %ymm7, %ymm8, %ymm7
+        vaddpd    %ymm2, %ymm4, %ymm15
+
+/*
+ * Now       1 / (1 + d)
+ * = 1 / (1 + (sqrt(1 - e) - 1))
+ * = 1 / sqrt(1 - e)
+ * = 1 + 1/2 * e + 3/8 * e^2 + 5/16 * e^3 + 35/128 * e^4 +
+ * 63/256 * e^5 + 231/1024 * e^6 + ....
+ * So compute the first five nonconstant terms of that, so that
+ * we have a relative correction (1 + Corr) to apply to S etc.
+ * C1 = 1/2
+ * C2 = 3/8
+ * C3 = 5/16
+ * C4 = 35/128
+ * C5 = 63/256
+ */
+        vmovupd   dC5+__svml_dasinh_data_internal(%rip), %ymm4
+        vandpd    dTopMask12+__svml_dasinh_data_internal(%rip), %ymm6, %ymm0
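+
+/* dTopMask12 keeps the sign, the 11 exponent bits and the top 11 explicit
+ * mantissa bits, i.e. 12 significant bits counting the implicit one; the
+ * dTopMask29 used above keeps 29 significant bits in the same way.
+ */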
+
+/*
+ * Compute S = (Y/sqrt(Y + W)) * (1 + d)
+ * and T = (W/sqrt(Y + W)) * (1 + d)
+ * so that S + T = sqrt(Y + W) * (1 + d)
+ * S is exact, and the rounding error in T is OK.
+ */
+        vmulpd    %ymm0, %ymm14, %ymm3
+        vmulpd    %ymm15, %ymm0, %ymm1
+        vmovupd   dHalf+__svml_dasinh_data_internal(%rip), %ymm6
+        vsubpd    %ymm12, %ymm3, %ymm14
+
+/*
+ * Obtain sqrt(1 + X^2) - 1 in two pieces
+ * sqrt(1 + X^2) - 1
+ * = sqrt(Y + W) - 1
+ * = (S + T) * (1 + Corr) - 1
+ * = [S - 1] + [T + (S + T) * Corr]
+ * We need a compensated summation for the last part. We treat S - 1
+ * as the larger part; it certainly is until about X < 2^-4, and in that
+ * case, the error is affordable since X dominates over sqrt(1 + X^2) - 1
+ * Final sum is dTmp5 (hi) + dTmp7 (lo)
+ */
+        vaddpd    %ymm1, %ymm3, %ymm2
+
+/*
+ * Compute e = -(2 * d + d^2)
+ * The first FMR is exact, and the rounding error in the other is acceptable
+ * since d and e are ~ 2^-12
+ */
+        vmovapd   %ymm12, %ymm5
+        vfnmadd231pd %ymm3, %ymm0, %ymm5
+        vfnmadd231pd %ymm1, %ymm0, %ymm5
+        vfmadd213pd dC4+__svml_dasinh_data_internal(%rip), %ymm5, %ymm4
+        vfmadd213pd dC3+__svml_dasinh_data_internal(%rip), %ymm5, %ymm4
+        vfmadd213pd dC2+__svml_dasinh_data_internal(%rip), %ymm5, %ymm4
+        vfmadd213pd %ymm6, %ymm5, %ymm4
+        vmulpd    %ymm4, %ymm5, %ymm0
+        vfmadd213pd %ymm1, %ymm2, %ymm0
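+
+/* The FMA chain above is Horner's rule for the correction series:
+ *   Corr = e * (C1 + e * (C2 + e * (C3 + e * (C4 + e * C5)))), with C1 = dHalf,
+ * and the final FMA folds it into the low piece as T + (S + T) * Corr.
+ */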
+
+/* Now multiplex the two possible computations */
+        vcmple_oqpd dLittleThreshold+__svml_dasinh_data_internal(%rip), %ymm10, %ymm2
+        vaddpd    %ymm14, %ymm0, %ymm15
+
+/* dX2over2 = X^2/2 */
+        vmulpd    %ymm7, %ymm6, %ymm0
+
+/* dX4over4 = X^4/4 */
+        vmulpd    %ymm0, %ymm0, %ymm8
+
+/* dX46 = -X^4/4 + X^6/8 */
+        vfmsub231pd %ymm0, %ymm8, %ymm8
+
+/* dX46over2 = -X^4/8 + x^6/16 */
+        vmulpd    %ymm8, %ymm6, %ymm5
+
+/* 2^(-10 - exp(X)) */
+        vmovupd   ExpMask2+__svml_dasinh_data_internal(%rip), %ymm8
+        vaddpd    %ymm5, %ymm0, %ymm4
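+
+/* Small-|X| result assembled: X^2/2 + (-X^4/8 + X^6/16), the truncated
+ * binomial series for sqrt(1 + X^2) - 1 quoted earlier.
+ */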
+        vblendvpd %ymm2, %ymm4, %ymm15, %ymm1
+
+/*
+ * Now do another compensated sum to add |X| + [sqrt(1 + X^2) - 1].
+ * It's always safe to assume |X| is larger.
+ * This is the final 2-part argument to the log1p function
+ */
+        vaddpd    %ymm1, %ymm10, %ymm3
+
+/* Now multiplex to the case X = 2^-30 * |input|, Xl = dL = 0 in the "big" case. */
+        vmulpd    XScale+__svml_dasinh_data_internal(%rip), %ymm10, %ymm10
+
+/*
+ * Now we feed into the log1p code, using H in place of _VARG1 and
+ * also adding L into Xl.
+ * compute 1+x as high, low parts
+ */
+        vmaxpd    %ymm3, %ymm12, %ymm6
+        vminpd    %ymm3, %ymm12, %ymm7
+        vandpd    %ymm9, %ymm3, %ymm9
+        vcmplt_oqpd XThreshold+__svml_dasinh_data_internal(%rip), %ymm9, %ymm0
+        vaddpd    %ymm7, %ymm6, %ymm5
+        vorpd     XhMask+__svml_dasinh_data_internal(%rip), %ymm0, %ymm4
+        vandpd    %ymm4, %ymm5, %ymm1
+        vblendvpd %ymm11, %ymm1, %ymm10, %ymm5
+        vsubpd    %ymm1, %ymm6, %ymm2
+
+/* exponent bits */
+        vpsrlq    $20, %ymm5, %ymm10
+        vaddpd    %ymm2, %ymm7, %ymm3
+
+/*
+ * Now resume the main code.
+ * preserve mantissa, set input exponent to 2^(-10)
+ */
+        vandpd    ExpMask+__svml_dasinh_data_internal(%rip), %ymm5, %ymm0
+        vorpd     Two10+__svml_dasinh_data_internal(%rip), %ymm0, %ymm2
+
+/* reciprocal approximation good to at least 11 bits */
+        vcvtpd2ps %ymm2, %xmm6
+        vrcpps    %xmm6, %xmm7
+        vcvtps2pd %xmm7, %ymm15
+
+/* exponent of X needed to scale Xl */
+        vandps    ExpMask0+__svml_dasinh_data_internal(%rip), %ymm5, %ymm9
+        vpsubq    %ymm9, %ymm8, %ymm0
+        vandpd    %ymm11, %ymm3, %ymm4
+
+/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
+        vroundpd  $0, %ymm15, %ymm3
+
+/* scale DblRcp */
+        vmulpd    %ymm0, %ymm3, %ymm2
+
+/* argument reduction */
+        vfmsub213pd %ymm12, %ymm2, %ymm5
+        vmulpd    %ymm2, %ymm4, %ymm12
+        vmovupd   poly_coeff+64+__svml_dasinh_data_internal(%rip), %ymm2
+        vaddpd    %ymm12, %ymm5, %ymm5
+        vfmadd213pd poly_coeff+96+__svml_dasinh_data_internal(%rip), %ymm5, %ymm2
+        vmulpd    %ymm5, %ymm5, %ymm4
+        vextractf128 $1, %ymm10, %xmm14
+        vshufps   $221, %xmm14, %xmm10, %xmm1
+
+/* biased exponent in DP format */
+        vcvtdq2pd %xmm1, %ymm7
+
+/* exponent*log(2.0) */
+        vmovupd   Threshold+__svml_dasinh_data_internal(%rip), %ymm10
+
+/* Add 31 to the exponent in the "large" case to get log(2 * input) */
+        vaddpd    dThirtyOne+__svml_dasinh_data_internal(%rip), %ymm7, %ymm6
+        vblendvpd %ymm11, %ymm7, %ymm6, %ymm1
+
+/*
+ * prepare table index
+ * table lookup
+ */
+        vpsrlq    $40, %ymm3, %ymm11
+        vcmplt_oqpd %ymm3, %ymm10, %ymm3
+        vandpd    Bias+__svml_dasinh_data_internal(%rip), %ymm3, %ymm14
+        vorpd     Bias1+__svml_dasinh_data_internal(%rip), %ymm14, %ymm15
+        vsubpd    %ymm15, %ymm1, %ymm1
+        vmulpd    L2+__svml_dasinh_data_internal(%rip), %ymm1, %ymm3
+
+/* polynomial */
+        vmovupd   poly_coeff+__svml_dasinh_data_internal(%rip), %ymm1
+        vfmadd213pd poly_coeff+32+__svml_dasinh_data_internal(%rip), %ymm5, %ymm1
+        vfmadd213pd %ymm2, %ymm4, %ymm1
+
+/* reconstruction */
+        vfmadd213pd %ymm5, %ymm4, %ymm1
+        vextractf128 $1, %ymm11, %xmm7
+        vmovd     %xmm11, %edx
+        vmovd     %xmm7, %esi
+        movslq    %edx, %rdx
+        vpextrd   $2, %xmm11, %ecx
+        movslq    %esi, %rsi
+        vpextrd   $2, %xmm7, %edi
+        movslq    %ecx, %rcx
+        movslq    %edi, %rdi
+        vmovsd    (%r8,%rdx), %xmm0
+        vmovsd    (%r8,%rsi), %xmm8
+        vmovhpd   (%r8,%rcx), %xmm0, %xmm6
+        vmovhpd   (%r8,%rdi), %xmm8, %xmm9
+        vinsertf128 $1, %xmm9, %ymm6, %ymm0
+        vaddpd    %ymm1, %ymm0, %ymm0
+        vaddpd    %ymm0, %ymm3, %ymm7
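+
+/* Reconstruction summary: with R the scaled reciprocal rounded to 1+9
+ * mantissa bits, r = R*Xh - 1 + R*Xl the reduced argument and k the
+ * bias-adjusted exponent, the value just computed is roughly
+ *   k * L2 + table(R) + poly(r),
+ * the usual table-driven evaluation of log1p.
+ */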
+
+/* Finally, reincorporate the original sign. */
+        vandpd    dSign+__svml_dasinh_data_internal(%rip), %ymm13, %ymm6
+        vxorpd    %ymm7, %ymm6, %ymm0
+        testl     %eax, %eax
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm13
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovupd   %ymm13, 32(%rsp)
+        vmovupd   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 eax ymm0
+
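+/* The input vector and the result were spilled to the stack above.  %eax is
+ * the 4-bit mask from vmovmskpd: each set bit marks a lane whose input was
+ * not finite (Inf or NaN).  The loop below walks the 4 lanes and, for every
+ * flagged one, recomputes that element with a call to the scalar asinh so it
+ * gets the standard result and exceptions, then reloads the vector result.
+ */
+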
+        xorl      %edx, %edx
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovupd   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     32(%rsp,%r14,8), %xmm0
+        call      asinh@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 64(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN4v_asinh_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_dasinh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(32)) VUINT32 Log_HA_table[(1<<10)+2][2];
+        __declspec(align(32)) VUINT32 Log_LA_table[(1<<9)+1][2];
+        __declspec(align(32)) VUINT32 poly_coeff[4][4][2];
+        __declspec(align(32)) VUINT32 ExpMask[4][2];
+        __declspec(align(32)) VUINT32 Two10[4][2];
+        __declspec(align(32)) VUINT32 MinLog1p[4][2];
+        __declspec(align(32)) VUINT32 MaxLog1p[4][2];
+        __declspec(align(32)) VUINT32 One[4][2];
+        __declspec(align(32)) VUINT32 SgnMask[4][2];
+        __declspec(align(32)) VUINT32 XThreshold[4][2];
+        __declspec(align(32)) VUINT32 XhMask[4][2];
+        __declspec(align(32)) VUINT32 Threshold[4][2];
+        __declspec(align(32)) VUINT32 Bias[4][2];
+        __declspec(align(32)) VUINT32 Bias1[4][2];
+        __declspec(align(32)) VUINT32 ExpMask0[4][2];
+        __declspec(align(32)) VUINT32 ExpMask2[4][2];
+        __declspec(align(32)) VUINT32 L2[4][2];
+        __declspec(align(32)) VUINT32 dBigThreshold[4][2];
+        __declspec(align(32)) VUINT32 dC2[4][2];
+        __declspec(align(32)) VUINT32 dC3[4][2];
+        __declspec(align(32)) VUINT32 dC4[4][2];
+        __declspec(align(32)) VUINT32 dC5[4][2];
+        __declspec(align(32)) VUINT32 dHalf[4][2];
+        __declspec(align(32)) VUINT32 dLargestFinite[4][2];
+        __declspec(align(32)) VUINT32 dLittleThreshold[4][2];
+        __declspec(align(32)) VUINT32 dSign[4][2];
+        __declspec(align(32)) VUINT32 dThirtyOne[4][2];
+        __declspec(align(32)) VUINT32 dTopMask12[4][2];
+        __declspec(align(32)) VUINT32 dTopMask29[4][2];
+        __declspec(align(32)) VUINT32 XScale[4][2];
+} __svml_dasinh_data_internal;
+#endif
+__svml_dasinh_data_internal:
+        /* Log_HA_table */
+        .quad 0xc086232bdd7a8300, 0xbe1ce91eef3fb100
+        .quad 0xc086232fdc7ad828, 0xbe1cefcffda73b6a
+        .quad 0xc0862333d97d2ba0, 0xbe1cef406748f1ff
+        .quad 0xc0862337d48378e0, 0xbe1cef2a9429925a
+        .quad 0xc086233bcd8fb878, 0xbe1cf138d17ebecb
+        .quad 0xc086233fc4a3e018, 0xbe1ceff2dbbbb29e
+        .quad 0xc0862343b9c1e270, 0xbe1cf1a42aae437b
+        .quad 0xc0862347acebaf68, 0xbe1cef3b152048af
+        .quad 0xc086234b9e2333f0, 0xbe1cef20e127805e
+        .quad 0xc086234f8d6a5a30, 0xbe1cf00ad6052cf4
+        .quad 0xc08623537ac30980, 0xbe1cefc4642ee597
+        .quad 0xc0862357662f2660, 0xbe1cf1f277d36e16
+        .quad 0xc086235b4fb092a0, 0xbe1ceed009e8d8e6
+        .quad 0xc086235f37492d28, 0xbe1cf1e4038cb362
+        .quad 0xc08623631cfad250, 0xbe1cf0b0873b8557
+        .quad 0xc086236700c75b98, 0xbe1cf15bb3227c0b
+        .quad 0xc086236ae2b09fe0, 0xbe1cf151ef8ca9ed
+        .quad 0xc086236ec2b87358, 0xbe1cefe1dc2cd2ed
+        .quad 0xc0862372a0e0a780, 0xbe1cf0d1eec5454f
+        .quad 0xc08623767d2b0b48, 0xbe1ceeefd570bbce
+        .quad 0xc086237a57996af0, 0xbe1cee99ae91b3a7
+        .quad 0xc086237e302d9028, 0xbe1cf0412830fbd1
+        .quad 0xc086238206e94218, 0xbe1ceee898588610
+        .quad 0xc0862385dbce4548, 0xbe1cee9a1fbcaaea
+        .quad 0xc0862389aede5bc0, 0xbe1ceed8e7cc1ad6
+        .quad 0xc086238d801b4500, 0xbe1cf10c8d059da6
+        .quad 0xc08623914f86be18, 0xbe1ceee6c63a8165
+        .quad 0xc08623951d228180, 0xbe1cf0c3592d2ff1
+        .quad 0xc0862398e8f04758, 0xbe1cf0026cc4cb1b
+        .quad 0xc086239cb2f1c538, 0xbe1cf15d48d8e670
+        .quad 0xc08623a07b28ae60, 0xbe1cef359363787c
+        .quad 0xc08623a44196b390, 0xbe1cefdf1ab2e82c
+        .quad 0xc08623a8063d8338, 0xbe1cefe43c02aa84
+        .quad 0xc08623abc91ec960, 0xbe1cf044f5ae35b7
+        .quad 0xc08623af8a3c2fb8, 0xbe1cf0b0b4001e1b
+        .quad 0xc08623b349975d98, 0xbe1cf1bae76dfbcf
+        .quad 0xc08623b70731f810, 0xbe1cef0a72e13a62
+        .quad 0xc08623bac30da1c8, 0xbe1cf184007d2b6b
+        .quad 0xc08623be7d2bfb40, 0xbe1cf16f4b239e98
+        .quad 0xc08623c2358ea2a0, 0xbe1cf0976acada87
+        .quad 0xc08623c5ec3733d0, 0xbe1cf066318a16ff
+        .quad 0xc08623c9a1274880, 0xbe1ceffaa7148798
+        .quad 0xc08623cd54607820, 0xbe1cf23ab02e9b6e
+        .quad 0xc08623d105e45800, 0xbe1cefdfef7d4fde
+        .quad 0xc08623d4b5b47b20, 0xbe1cf17fece44f2b
+        .quad 0xc08623d863d27270, 0xbe1cf18f907d0d7c
+        .quad 0xc08623dc103fccb0, 0xbe1cee61fe072c98
+        .quad 0xc08623dfbafe1668, 0xbe1cf022dd891e2f
+        .quad 0xc08623e3640eda20, 0xbe1ceecc1daf4358
+        .quad 0xc08623e70b73a028, 0xbe1cf0173c4fa380
+        .quad 0xc08623eab12deec8, 0xbe1cf16a2150c2f4
+        .quad 0xc08623ee553f4a30, 0xbe1cf1bf980b1f4b
+        .quad 0xc08623f1f7a93480, 0xbe1cef8b731663c2
+        .quad 0xc08623f5986d2dc0, 0xbe1cee9a664d7ef4
+        .quad 0xc08623f9378cb3f0, 0xbe1cf1eda2af6400
+        .quad 0xc08623fcd5094320, 0xbe1cf1923f9d68d7
+        .quad 0xc086240070e45548, 0xbe1cf0747cd3e03a
+        .quad 0xc08624040b1f6260, 0xbe1cf22ee855bd6d
+        .quad 0xc0862407a3bbe078, 0xbe1cf0d57360c00b
+        .quad 0xc086240b3abb4398, 0xbe1ceebc815cd575
+        .quad 0xc086240ed01efdd0, 0xbe1cf03bfb970951
+        .quad 0xc086241263e87f50, 0xbe1cf16e74768529
+        .quad 0xc0862415f6193658, 0xbe1cefec64b8becb
+        .quad 0xc086241986b28f30, 0xbe1cf0838d210baa
+        .quad 0xc086241d15b5f448, 0xbe1cf0ea86e75b11
+        .quad 0xc0862420a324ce28, 0xbe1cf1708d11d805
+        .quad 0xc08624242f008380, 0xbe1ceea988c5a417
+        .quad 0xc0862427b94a7910, 0xbe1cef166a7bbca5
+        .quad 0xc086242b420411d0, 0xbe1cf0c9d9e86a38
+        .quad 0xc086242ec92eaee8, 0xbe1cef0946455411
+        .quad 0xc08624324ecbaf98, 0xbe1cefea60907739
+        .quad 0xc0862435d2dc7160, 0xbe1cf1ed0934ce42
+        .quad 0xc086243955624ff8, 0xbe1cf191ba746c7d
+        .quad 0xc086243cd65ea548, 0xbe1ceeec78cf2a7e
+        .quad 0xc086244055d2c968, 0xbe1cef345284c119
+        .quad 0xc0862443d3c012b8, 0xbe1cf24f77355219
+        .quad 0xc08624475027d5e8, 0xbe1cf05bf087e114
+        .quad 0xc086244acb0b65d0, 0xbe1cef3504a32189
+        .quad 0xc086244e446c1398, 0xbe1ceff54b2a406f
+        .quad 0xc0862451bc4b2eb8, 0xbe1cf0757d54ed4f
+        .quad 0xc086245532aa04f0, 0xbe1cf0c8099fdfd5
+        .quad 0xc0862458a789e250, 0xbe1cf0b173796a31
+        .quad 0xc086245c1aec1138, 0xbe1cf11d8734540d
+        .quad 0xc086245f8cd1da60, 0xbe1cf1916a723ceb
+        .quad 0xc0862462fd3c84d8, 0xbe1cf19a911e1da7
+        .quad 0xc08624666c2d5608, 0xbe1cf23a9ef72e4f
+        .quad 0xc0862469d9a591c0, 0xbe1cef503d947663
+        .quad 0xc086246d45a67a18, 0xbe1cf0fceeb1a0b2
+        .quad 0xc0862470b0314fa8, 0xbe1cf107e27e4fbc
+        .quad 0xc086247419475160, 0xbe1cf03dd9922331
+        .quad 0xc086247780e9bc98, 0xbe1cefce1a10e129
+        .quad 0xc086247ae719cd18, 0xbe1ceea47f73c4f6
+        .quad 0xc086247e4bd8bd10, 0xbe1ceec0ac56d100
+        .quad 0xc0862481af27c528, 0xbe1cee8a6593278a
+        .quad 0xc086248511081c70, 0xbe1cf2231dd9dec7
+        .quad 0xc0862488717af888, 0xbe1cf0b4b8ed7da8
+        .quad 0xc086248bd0818d68, 0xbe1cf1bd8d835002
+        .quad 0xc086248f2e1d0d98, 0xbe1cf259acc107f4
+        .quad 0xc08624928a4eaa20, 0xbe1cee897636b00c
+        .quad 0xc0862495e5179270, 0xbe1cee757f20c326
+        .quad 0xc08624993e78f490, 0xbe1cefafd3aa54a4
+        .quad 0xc086249c9673fd10, 0xbe1cee7298d38b97
+        .quad 0xc086249fed09d6f8, 0xbe1ceedc158d4ceb
+        .quad 0xc08624a3423babe0, 0xbe1cf2282987cb2e
+        .quad 0xc08624a6960aa400, 0xbe1cefe7381ecc4b
+        .quad 0xc08624a9e877e600, 0xbe1cef328dbbce80
+        .quad 0xc08624ad39849728, 0xbe1cefde45f3cc71
+        .quad 0xc08624b08931db58, 0xbe1cefa8b89433b9
+        .quad 0xc08624b3d780d500, 0xbe1cef6773c0b139
+        .quad 0xc08624b72472a528, 0xbe1cf031c931c11f
+        .quad 0xc08624ba70086b78, 0xbe1cf088f49275e7
+        .quad 0xc08624bdba434630, 0xbe1cf17de0eaa86d
+        .quad 0xc08624c103245238, 0xbe1cefd492f1ba75
+        .quad 0xc08624c44aacab08, 0xbe1cf1253e154466
+        .quad 0xc08624c790dd6ad0, 0xbe1cf0fb09ee6d55
+        .quad 0xc08624cad5b7aa58, 0xbe1cf1f08dd048fe
+        .quad 0xc08624ce193c8120, 0xbe1ceeca0809697f
+        .quad 0xc08624d15b6d0538, 0xbe1cef8d5662d968
+        .quad 0xc08624d49c4a4b78, 0xbe1cee97b556ed78
+        .quad 0xc08624d7dbd56750, 0xbe1cf1b14b6acb75
+        .quad 0xc08624db1a0f6b00, 0xbe1cef1e860623f2
+        .quad 0xc08624de56f96758, 0xbe1ceeaf4d156f3d
+        .quad 0xc08624e192946bf0, 0xbe1ceecc12b400ed
+        .quad 0xc08624e4cce18710, 0xbe1cf180c40c794f
+        .quad 0xc08624e805e1c5c8, 0xbe1cf185a08f7f65
+        .quad 0xc08624eb3d9633d8, 0xbe1cef45fc924078
+        .quad 0xc08624ee73ffdbb0, 0xbe1cf1e4f457f32a
+        .quad 0xc08624f1a91fc6a0, 0xbe1cf040147b8a5a
+        .quad 0xc08624f4dcf6fc98, 0xbe1cf1effca0dfb2
+        .quad 0xc08624f80f868468, 0xbe1cf0470146e5bc
+        .quad 0xc08624fb40cf6390, 0xbe1cef4dd186e501
+        .quad 0xc08624fe70d29e60, 0xbe1ceebe257f66c7
+        .quad 0xc08625019f9137f0, 0xbe1ceefb7a1c395c
+        .quad 0xc0862504cd0c3220, 0xbe1cf209dedfed8c
+        .quad 0xc0862507f9448db0, 0xbe1cf082da464994
+        .quad 0xc086250b243b4a18, 0xbe1cee88694a73cf
+        .quad 0xc086250e4df165a0, 0xbe1cf0b61e8f0531
+        .quad 0xc08625117667dd78, 0xbe1cf1106599c962
+        .quad 0xc08625149d9fad98, 0xbe1ceff1ee88af1f
+        .quad 0xc0862517c399d0c8, 0xbe1cf0f746994ef6
+        .quad 0xc086251ae85740b8, 0xbe1cefe8a1d077e4
+        .quad 0xc086251e0bd8f5e0, 0xbe1cf1a1da036092
+        .quad 0xc08625212e1fe7a8, 0xbe1cf0f8a7786fcd
+        .quad 0xc08625244f2d0c48, 0xbe1cefa1174a07a7
+        .quad 0xc08625276f0158d8, 0xbe1cef1043aa5b25
+        .quad 0xc086252a8d9dc150, 0xbe1cf15d521c169d
+        .quad 0xc086252dab033898, 0xbe1cf220bba8861f
+        .quad 0xc0862530c732b078, 0xbe1cef51e310eae2
+        .quad 0xc0862533e22d1988, 0xbe1cf222fcedd8ae
+        .quad 0xc0862536fbf36370, 0xbe1cefdb4da4bda8
+        .quad 0xc086253a14867ca0, 0xbe1ceeafc1112171
+        .quad 0xc086253d2be75280, 0xbe1cee99dfb4b408
+        .quad 0xc08625404216d160, 0xbe1cf22d2536f06b
+        .quad 0xc08625435715e498, 0xbe1cef6abbf2e268
+        .quad 0xc08625466ae57648, 0xbe1cf093a14789f5
+        .quad 0xc08625497d866fa0, 0xbe1cf0f93655603c
+        .quad 0xc086254c8ef9b8b8, 0xbe1cf1cc40c9aafc
+        .quad 0xc086254f9f4038a8, 0xbe1ceeea5f4e9157
+        .quad 0xc0862552ae5ad568, 0xbe1cefa9f52d4997
+        .quad 0xc0862555bc4a7400, 0xbe1cefa490a638ff
+        .quad 0xc0862558c90ff868, 0xbe1cef7fcf797d6f
+        .quad 0xc086255bd4ac4590, 0xbe1cf1b4c51113c9
+        .quad 0xc086255edf203d78, 0xbe1cef55e5b4a55d
+        .quad 0xc0862561e86cc100, 0xbe1cf0d37a25f9dc
+        .quad 0xc0862564f092b028, 0xbe1ceebe9efc19d9
+        .quad 0xc0862567f792e9d8, 0xbe1cee8ad30a57b5
+        .quad 0xc086256afd6e4c08, 0xbe1cef4e1817b90b
+        .quad 0xc086256e0225b3b8, 0xbe1cee7fa9229996
+        .quad 0xc086257105b9fce0, 0xbe1cf0b54963d945
+        .quad 0xc0862574082c0298, 0xbe1cee5f2f3c7995
+        .quad 0xc0862577097c9ee0, 0xbe1cf0828e303a2c
+        .quad 0xc086257a09acaae0, 0xbe1cf172c3078947
+        .quad 0xc086257d08bcfec0, 0xbe1cf189252afa22
+        .quad 0xc086258006ae71b8, 0xbe1cefdb80426923
+        .quad 0xc08625830381da08, 0xbe1ceef1391a0372
+        .quad 0xc0862585ff380d00, 0xbe1cf17720c78d13
+        .quad 0xc0862588f9d1df18, 0xbe1ceef1f9027d83
+        .quad 0xc086258bf35023b8, 0xbe1cf06fac99dec9
+        .quad 0xc086258eebb3ad78, 0xbe1cf1373eeb45c0
+        .quad 0xc0862591e2fd4e00, 0xbe1cef777536bb81
+        .quad 0xc0862594d92dd600, 0xbe1cf0f43ca40766
+        .quad 0xc0862597ce461558, 0xbe1cefb2cfc6766b
+        .quad 0xc086259ac246daf0, 0xbe1ceea49e64ffa2
+        .quad 0xc086259db530f4c8, 0xbe1cf250fa457dec
+        .quad 0xc08625a0a7053018, 0xbe1cf17d8bb2a44e
+        .quad 0xc08625a397c45918, 0xbe1cf1d5906d54b7
+        .quad 0xc08625a6876f3b30, 0xbe1cf08fe7b31780
+        .quad 0xc08625a97606a0e0, 0xbe1cef13edfc9d11
+        .quad 0xc08625ac638b53c8, 0xbe1cef9d2b107219
+        .quad 0xc08625af4ffe1cb0, 0xbe1cf1ddd4ff6160
+        .quad 0xc08625b23b5fc390, 0xbe1cefa02a996495
+        .quad 0xc08625b525b10f68, 0xbe1cf166a7e37ee5
+        .quad 0xc08625b80ef2c680, 0xbe1cef0b171068a5
+        .quad 0xc08625baf725ae28, 0xbe1cf05c80779283
+        .quad 0xc08625bdde4a8af0, 0xbe1cf1bbfbffb889
+        .quad 0xc08625c0c4622090, 0xbe1cf0b8666c0124
+        .quad 0xc08625c3a96d31e0, 0xbe1cf0a8fcf47a86
+        .quad 0xc08625c68d6c80f0, 0xbe1cef46e18cb092
+        .quad 0xc08625c97060cef0, 0xbe1cf1458a350efb
+        .quad 0xc08625cc524adc58, 0xbe1ceeea1dadce12
+        .quad 0xc08625cf332b68b0, 0xbe1cf0a1bfdc44c7
+        .quad 0xc08625d2130332d0, 0xbe1cef96d02da73e
+        .quad 0xc08625d4f1d2f8a8, 0xbe1cf2451c3c7701
+        .quad 0xc08625d7cf9b7778, 0xbe1cf10d08f83812
+        .quad 0xc08625daac5d6ba0, 0xbe1ceec5b4895c5e
+        .quad 0xc08625dd881990b0, 0xbe1cf14e1325c5e4
+        .quad 0xc08625e062d0a188, 0xbe1cf21d0904be12
+        .quad 0xc08625e33c835838, 0xbe1ceed0839bcf21
+        .quad 0xc08625e615326df0, 0xbe1cf1bb944889d2
+        .quad 0xc08625e8ecde9b48, 0xbe1cee738e85eece
+        .quad 0xc08625ebc38897e0, 0xbe1cf25c2bc6ef12
+        .quad 0xc08625ee99311ac8, 0xbe1cf132b70a41ad
+        .quad 0xc08625f16dd8da28, 0xbe1cf1984236a6e3
+        .quad 0xc08625f441808b78, 0xbe1cf19ae74998f9
+        .quad 0xc08625f71428e370, 0xbe1cef3e175d61a1
+        .quad 0xc08625f9e5d295f8, 0xbe1cf101f9868fd9
+        .quad 0xc08625fcb67e5658, 0xbe1cee69db83dcd2
+        .quad 0xc08625ff862cd6f8, 0xbe1cf081b636af51
+        .quad 0xc086260254dec9a8, 0xbe1cee62c7d59b3e
+        .quad 0xc08626052294df58, 0xbe1cf1b745c57716
+        .quad 0xc0862607ef4fc868, 0xbe1cef3d2800ea23
+        .quad 0xc086260abb103458, 0xbe1cef480ff1acd2
+        .quad 0xc086260d85d6d200, 0xbe1cf2424c9a17ef
+        .quad 0xc08626104fa44f90, 0xbe1cf12cfde90fd5
+        .quad 0xc086261318795a68, 0xbe1cf21f590dd5b6
+        .quad 0xc0862615e0569f48, 0xbe1cf0c50f9cd28a
+        .quad 0xc0862618a73cca30, 0xbe1ceedbdb520545
+        .quad 0xc086261b6d2c8668, 0xbe1cf0b030396011
+        .quad 0xc086261e32267e98, 0xbe1cf19917010e96
+        .quad 0xc0862620f62b5cb0, 0xbe1cf07331355985
+        .quad 0xc0862623b93bc9e8, 0xbe1cf01ae921a1c3
+        .quad 0xc08626267b586ed0, 0xbe1cefe5cf0dbf0c
+        .quad 0xc08626293c81f348, 0xbe1cf01b258aeb50
+        .quad 0xc086262bfcb8fe88, 0xbe1cee6b9e7f4c68
+        .quad 0xc086262ebbfe3710, 0xbe1cee684a9b21c9
+        .quad 0xc08626317a5242b8, 0xbe1cf1f8bcde9a8b
+        .quad 0xc086263437b5c6c0, 0xbe1cf1d063d36238
+        .quad 0xc0862636f42967a8, 0xbe1cf1e31a19075e
+        .quad 0xc0862639afadc950, 0xbe1cf1d8efdf7e7d
+        .quad 0xc086263c6a438ef0, 0xbe1cf1812ee72dba
+        .quad 0xc086263f23eb5b18, 0xbe1cf1449a9a2279
+        .quad 0xc0862641dca5cfb8, 0xbe1cee96edce5085
+        .quad 0xc086264494738e08, 0xbe1cf06797bd03b2
+        .quad 0xc08626474b5536b8, 0xbe1cef91b9b7ffc1
+        .quad 0xc086264a014b69c0, 0xbe1cef4b6721278f
+        .quad 0xc086264cb656c678, 0xbe1cf1942925eb4a
+        .quad 0xc086264f6a77eba8, 0xbe1cefa2c7bc2e39
+        .quad 0xc08626521daf7758, 0xbe1cf252595aceb3
+        .quad 0xc0862654cffe0718, 0xbe1cee8e9ae47ec2
+        .quad 0xc0862657816437a8, 0xbe1cf1bf913828fa
+        .quad 0xc086265a31e2a558, 0xbe1cf23475d6b366
+        .quad 0xc086265ce179ebc8, 0xbe1cef8df00a922b
+        .quad 0xc086265f902aa5f0, 0xbe1cef279bfa43e0
+        .quad 0xc08626623df56e38, 0xbe1cf080e10b8365
+        .quad 0xc0862664eadade70, 0xbe1cf1a518f9b544
+        .quad 0xc086266796db8fd0, 0xbe1cef9308fed9e9
+        .quad 0xc086266a41f81ae8, 0xbe1ceea3ae6b19c9
+        .quad 0xc086266cec3117b8, 0xbe1ceef06003d4c2
+        .quad 0xc086266f95871da8, 0xbe1cf0b8457ffb0c
+        .quad 0xc08626723dfac390, 0xbe1cf0c526745ad6
+        .quad 0xc0862674e58c9fa8, 0xbe1cf0cf91ff7b5d
+        .quad 0xc08626778c3d4798, 0xbe1cefe260819380
+        .quad 0xc086267a320d5070, 0xbe1ceebd90aa27a3
+        .quad 0xc086267cd6fd4ea8, 0xbe1cf0388121dffa
+        .quad 0xc086267f7b0dd630, 0xbe1cf1a3881435f1
+        .quad 0xc08626821e3f7a68, 0xbe1cef28e9d9ac52
+        .quad 0xc0862684c092ce08, 0xbe1cf02d300062dd
+        .quad 0xc086268762086350, 0xbe1cefaee1edfa35
+        .quad 0xc086268a02a0cbe0, 0xbe1cf0a5a052e936
+        .quad 0xc086268ca25c98d8, 0xbe1cee60a4a497ed
+        .quad 0xc086268f413c5ab0, 0xbe1cf0e4a5d0cf49
+        .quad 0xc0862691df40a170, 0xbe1cf149235a4e6e
+        .quad 0xc08626947c69fc80, 0xbe1cf215180b9fcc
+        .quad 0xc086269718b8fac8, 0xbe1cef9b156a9840
+        .quad 0xc0862699b42e2a90, 0xbe1cf054c91441be
+        .quad 0xc086269c4eca19a8, 0xbe1cf13ded26512c
+        .quad 0xc086269ee88d5550, 0xbe1cf22ea4d8ac06
+        .quad 0xc08626a181786a40, 0xbe1cf2354666ee2e
+        .quad 0xc08626a4198be4a8, 0xbe1cefef936752b3
+        .quad 0xc08626a6b0c85020, 0xbe1cf1e360a9db68
+        .quad 0xc08626a9472e37d8, 0xbe1ceed6aeb812c5
+        .quad 0xc08626abdcbe2650, 0xbe1cf227340b4986
+        .quad 0xc08626ae7178a5b0, 0xbe1cf0215a0cbe0d
+        .quad 0xc08626b1055e3f70, 0xbe1cf256adf0ae26
+        .quad 0xc08626b3986f7ca8, 0xbe1ceff3c67aed06
+        .quad 0xc08626b62aace5c8, 0xbe1cf2159fb93652
+        .quad 0xc08626b8bc1702e0, 0xbe1cf01e6dbd1c7f
+        .quad 0xc08626bb4cae5b60, 0xbe1cf009e75d1c0c
+        .quad 0xc08626bddc737648, 0xbe1ceec10a020e73
+        .quad 0xc08626c06b66da08, 0xbe1cf06d5783eee7
+        .quad 0xc08626c2f9890ca0, 0xbe1cf0cb8f169ffe
+        .quad 0xc08626c586da9388, 0xbe1cef7de2452430
+        .quad 0xc08626c8135bf3b0, 0xbe1cf05da6f783ae
+        .quad 0xc08626ca9f0db198, 0xbe1cefcc877d681d
+        .quad 0xc08626cd29f05138, 0xbe1cef0531954ab3
+        .quad 0xc08626cfb4045608, 0xbe1cf06b8565ea3d
+        .quad 0xc08626d23d4a4310, 0xbe1cefdc455d9d7e
+        .quad 0xc08626d4c5c29ad0, 0xbe1ceefc47e8fa64
+        .quad 0xc08626d74d6ddf48, 0xbe1cf1872bf033f2
+        .quad 0xc08626d9d44c9210, 0xbe1cf19d91087f9d
+        .quad 0xc08626dc5a5f3438, 0xbe1cf012d444c6ab
+        .quad 0xc08626dedfa64650, 0xbe1cf0ba528ee153
+        .quad 0xc08626e164224880, 0xbe1ceeb431709788
+        .quad 0xc08626e3e7d3ba60, 0xbe1cf0b9af31a6a5
+        .quad 0xc08626e66abb1b28, 0xbe1cf168fb2e135b
+        .quad 0xc08626e8ecd8e990, 0xbe1cef9097461c93
+        .quad 0xc08626eb6e2da3d0, 0xbe1cee7a434735d8
+        .quad 0xc08626edeeb9c7a8, 0xbe1cf235732b86f2
+        .quad 0xc08626f06e7dd280, 0xbe1cefe1510b89e6
+        .quad 0xc08626f2ed7a4120, 0xbe1cf1f64b9b80ef
+        .quad 0xc08626f56baf9000, 0xbe1cf08f320ca339
+        .quad 0xc08626f7e91e3b08, 0xbe1cf1b1de2808a1
+        .quad 0xc08626fa65c6bdc0, 0xbe1cf1976d778b28
+        .quad 0xc08626fce1a99338, 0xbe1ceef40a4f076f
+        .quad 0xc08626ff5cc73600, 0xbe1cef3e45869ce3
+        .quad 0xc0862701d7202048, 0xbe1ceef601b4c9d6
+        .quad 0xc086270450b4cbc0, 0xbe1cf1eaf0b57fd6
+        .quad 0xc0862706c985b1c0, 0xbe1cef82a44990f3
+        .quad 0xc086270941934b10, 0xbe1ceefe32981f2c
+        .quad 0xc086270bb8de1018, 0xbe1cefbf6f5a0445
+        .quad 0xc086270e2f6678d0, 0xbe1cf18dba75792c
+        .quad 0xc0862710a52cfcc8, 0xbe1cf0da64ce995f
+        .quad 0xc08627131a321318, 0xbe1cef04ac0fb802
+        .quad 0xc08627158e763268, 0xbe1cee9d4e2ad9bd
+        .quad 0xc086271801f9d0f8, 0xbe1cefa9b55407b5
+        .quad 0xc086271a74bd64a0, 0xbe1cefe6bd329570
+        .quad 0xc086271ce6c162c8, 0xbe1cef0b1205dc85
+        .quad 0xc086271f58064068, 0xbe1cef092a785e3f
+        .quad 0xc0862721c88c7210, 0xbe1cf050dcdaac30
+        .quad 0xc086272438546be8, 0xbe1cf210907ded8b
+        .quad 0xc0862726a75ea1b8, 0xbe1cee760be44f99
+        .quad 0xc086272915ab86c0, 0xbe1ceeeee07c2bcc
+        .quad 0xc086272b833b8df0, 0xbe1cf06874992df5
+        .quad 0xc086272df00f29d0, 0xbe1cef8fac5d4899
+        .quad 0xc08627305c26cc70, 0xbe1cf1103241cc99
+        .quad 0xc0862732c782e788, 0xbe1cf1d35fef83fe
+        .quad 0xc08627353223ec68, 0xbe1cef3ec8133e1d
+        .quad 0xc08627379c0a4be8, 0xbe1cef7261daccd8
+        .quad 0xc086273a05367688, 0xbe1cf18656c50806
+        .quad 0xc086273c6da8dc68, 0xbe1cf1c8736e049a
+        .quad 0xc086273ed561ed38, 0xbe1cf1f93bff4911
+        .quad 0xc08627413c621848, 0xbe1cf188a4ea680c
+        .quad 0xc0862743a2a9cc80, 0xbe1cf1d270930c80
+        .quad 0xc086274608397868, 0xbe1cf25a328c28e2
+        .quad 0xc08627486d118a28, 0xbe1cf106f90aa3b8
+        .quad 0xc086274ad1326f80, 0xbe1cee5e9d2e885a
+        .quad 0xc086274d349c95c0, 0xbe1cf1c0bac27228
+        .quad 0xc086274f975069f8, 0xbe1cf1a1500f9b1c
+        .quad 0xc0862751f94e58c0, 0xbe1cefc30663ac44
+        .quad 0xc08627545a96ce48, 0xbe1cf17123e427a2
+        .quad 0xc0862756bb2a3678, 0xbe1cefb92749fea4
+        .quad 0xc08627591b08fcc0, 0xbe1cefa40e1ea74a
+        .quad 0xc086275b7a338c40, 0xbe1cee6f4612c3e9
+        .quad 0xc086275dd8aa4fa8, 0xbe1cf1c54a053627
+        .quad 0xc0862760366db168, 0xbe1ceff5eb503d9e
+        .quad 0xc0862762937e1b70, 0xbe1cf02e47f10cee
+        .quad 0xc0862764efdbf768, 0xbe1ceeb06e1d0dad
+        .quad 0xc08627674b87ae88, 0xbe1cf10aadd6dba5
+        .quad 0xc0862769a681a9c0, 0xbe1cf24e9913d30f
+        .quad 0xc086276c00ca51a0, 0xbe1cef47b301e312
+        .quad 0xc086276e5a620e48, 0xbe1ceeb1cefc2e85
+        .quad 0xc0862770b3494788, 0xbe1cf16f1fbbe011
+        .quad 0xc08627730b8064e8, 0xbe1ceebdf75174c7
+        .quad 0xc08627756307cd70, 0xbe1cf06e3871a0da
+        .quad 0xc0862777b9dfe7f0, 0xbe1cef16799fd554
+        .quad 0xc086277a10091ac0, 0xbe1cf248dabf5377
+        .quad 0xc086277c6583cc00, 0xbe1cf0c78d92a2cd
+        .quad 0xc086277eba506158, 0xbe1cf0b911b029f0
+        .quad 0xc08627810e6f4028, 0xbe1cefdc24719766
+        .quad 0xc086278361e0cd70, 0xbe1cefbb6562b7e7
+        .quad 0xc0862785b4a56dd8, 0xbe1cf1e0afb349ec
+        .quad 0xc086278806bd85c0, 0xbe1cf008292e52fc
+        .quad 0xc086278a58297918, 0xbe1cf053073872bf
+        .quad 0xc086278ca8e9ab88, 0xbe1cf17a0a55a947
+        .quad 0xc086278ef8fe8068, 0xbe1ceeffb0b60234
+        .quad 0xc086279148685aa0, 0xbe1cf162204794a8
+        .quad 0xc086279397279ce0, 0xbe1cf24cc8cb48ac
+        .quad 0xc0862795e53ca978, 0xbe1cf0c9be68d5c3
+        .quad 0xc086279832a7e258, 0xbe1cf172cd3d7388
+        .quad 0xc086279a7f69a930, 0xbe1ceea2465fbce5
+        .quad 0xc086279ccb825f40, 0xbe1cf0a386d2500f
+        .quad 0xc086279f16f26590, 0xbe1cf1e338ddc18a
+        .quad 0xc08627a161ba1cd0, 0xbe1cef1f5049867f
+        .quad 0xc08627a3abd9e548, 0xbe1cef96c1ea8b1f
+        .quad 0xc08627a5f5521f00, 0xbe1cf138f6fd3c26
+        .quad 0xc08627a83e2329b0, 0xbe1cf0d4fcbfdf3a
+        .quad 0xc08627aa864d64b0, 0xbe1cf24870c12c81
+        .quad 0xc08627accdd12f18, 0xbe1cf0ae2a56348d
+        .quad 0xc08627af14aee7a0, 0xbe1cee8ca1a9b893
+        .quad 0xc08627b15ae6eca8, 0xbe1cf20414d637b0
+        .quad 0xc08627b3a0799c60, 0xbe1cf0fc6b7b12d8
+        .quad 0xc08627b5e5675488, 0xbe1cf152d93c4a00
+        .quad 0xc08627b829b072a0, 0xbe1cf1073f9b77c2
+        .quad 0xc08627ba6d5553d8, 0xbe1cee694f97d5a4
+        .quad 0xc08627bcb0565500, 0xbe1cf0456b8239d7
+        .quad 0xc08627bef2b3d2b0, 0xbe1cf211497127e3
+        .quad 0xc08627c1346e2930, 0xbe1cf01856c0384d
+        .quad 0xc08627c37585b468, 0xbe1cefa7dd05479e
+        .quad 0xc08627c5b5fad000, 0xbe1cef3ae8e50b93
+        .quad 0xc08627c7f5cdd750, 0xbe1ceea5f32fdd3a
+        .quad 0xc08627ca34ff2560, 0xbe1cef424caeb8d9
+        .quad 0xc08627cc738f14f0, 0xbe1cf0194d07a81f
+        .quad 0xc08627ceb17e0070, 0xbe1cf20f452000c1
+        .quad 0xc08627d0eecc4210, 0xbe1cf00e356218e4
+        .quad 0xc08627d32b7a33a0, 0xbe1cef30484b4bcb
+        .quad 0xc08627d567882eb0, 0xbe1ceeea11a6641b
+        .quad 0xc08627d7a2f68c80, 0xbe1cf13492d5bd7b
+        .quad 0xc08627d9ddc5a618, 0xbe1ceeb7048fad96
+        .quad 0xc08627dc17f5d418, 0xbe1ceef0666f0477
+        .quad 0xc08627de51876ee8, 0xbe1cf060d4b8b5c2
+        .quad 0xc08627e08a7acea8, 0xbe1cf0b2a4b6ff8c
+        .quad 0xc08627e2c2d04b28, 0xbe1cf0e34809a875
+        .quad 0xc08627e4fa883bf0, 0xbe1cf16bf74a3522
+        .quad 0xc08627e731a2f848, 0xbe1cee6a24623d57
+        .quad 0xc08627e96820d718, 0xbe1cefc7b4f1528e
+        .quad 0xc08627eb9e022f18, 0xbe1cf163051f3548
+        .quad 0xc08627edd34756b8, 0xbe1cef36b3366305
+        .quad 0xc08627f007f0a408, 0xbe1cf18134625550
+        .quad 0xc08627f23bfe6cf0, 0xbe1cf0ec32ec1a11
+        .quad 0xc08627f46f710700, 0xbe1ceeb3b64f3edc
+        .quad 0xc08627f6a248c778, 0xbe1cf0cd15805bc8
+        .quad 0xc08627f8d4860368, 0xbe1cf20db3bddebe
+        .quad 0xc08627fb06290f90, 0xbe1cf25188430e25
+        .quad 0xc08627fd37324070, 0xbe1ceea1713490f9
+        .quad 0xc08627ff67a1ea28, 0xbe1cf159521d234c
+        .quad 0xc0862801977860b8, 0xbe1cf24dfe50783b
+        .quad 0xc0862803c6b5f7d0, 0xbe1ceef2ef89a60b
+        .quad 0xc0862805f55b02c8, 0xbe1cee7fc919d62c
+        .quad 0xc08628082367d4c0, 0xbe1cf215a7fb513a
+        .quad 0xc086280a50dcc0a8, 0xbe1cf0e4401c5ed4
+        .quad 0xc086280c7dba1910, 0xbe1cf04ec734d256
+        .quad 0xc086280eaa003050, 0xbe1cf010ad787fea
+        .quad 0xc0862810d5af5880, 0xbe1cee622478393d
+        .quad 0xc086281300c7e368, 0xbe1cf01c7482564f
+        .quad 0xc08628152b4a22a0, 0xbe1cf0de20d33536
+        .quad 0xc086281755366778, 0xbe1cef2edae5837d
+        .quad 0xc08628197e8d02f0, 0xbe1cf0a345318cc9
+        .quad 0xc086281ba74e45d8, 0xbe1cf20085aa34b8
+        .quad 0xc086281dcf7a80c0, 0xbe1cef5fa845ad83
+        .quad 0xc086281ff71203e0, 0xbe1cf050d1df69c4
+        .quad 0xc08628221e151f48, 0xbe1ceffe43c035b9
+        .quad 0xc0862824448422b8, 0xbe1cf14f3018d3c2
+        .quad 0xc08628266a5f5dc0, 0xbe1cef0a5fbae83d
+        .quad 0xc08628288fa71f98, 0xbe1ceff8a95b72a1
+        .quad 0xc086282ab45bb750, 0xbe1cef073aa9849b
+        .quad 0xc086282cd87d73a8, 0xbe1cef69b3835c02
+        .quad 0xc086282efc0ca328, 0xbe1cf0bc139379a9
+        .quad 0xc08628311f099420, 0xbe1cef247a9ec596
+        .quad 0xc086283341749490, 0xbe1cef74bbcc488a
+        .quad 0xc0862835634df248, 0xbe1cef4bc42e7b8e
+        .quad 0xc08628378495fad0, 0xbe1cf136d4d5a810
+        .quad 0xc0862839a54cfb80, 0xbe1cf0d290b24dd8
+        .quad 0xc086283bc5734168, 0xbe1ceeebde8e0065
+        .quad 0xc086283de5091950, 0xbe1cf1a09f60aa1e
+        .quad 0xc0862840040ecfe0, 0xbe1cf0803947a234
+        .quad 0xc08628422284b168, 0xbe1cf0abf7638127
+        .quad 0xc0862844406b0a08, 0xbe1cf0f73ee12058
+        .quad 0xc08628465dc225a0, 0xbe1cf2079971b26c
+        .quad 0xc08628487a8a4fe0, 0xbe1cee74957564b1
+        .quad 0xc086284a96c3d420, 0xbe1ceee77c1b7d43
+        .quad 0xc086284cb26efd90, 0xbe1cf23addba6e09
+        .quad 0xc086284ecd8c1730, 0xbe1cf199f4a1da60
+        .quad 0xc0862850e81b6bb0, 0xbe1cf09fdea81393
+        .quad 0xc0862853021d4588, 0xbe1cf176adb417f7
+        .quad 0xc08628551b91ef00, 0xbe1cf0f64f84a8da
+        .quad 0xc08628573479b220, 0xbe1ceec34cf49523
+        .quad 0xc08628594cd4d8a8, 0xbe1cf16d60fbe0bb
+        .quad 0xc086285b64a3ac40, 0xbe1cee8de7acfc7b
+        .quad 0xc086285d7be67630, 0xbe1ceee6256cce8d
+        .quad 0xc086285f929d7fa0, 0xbe1cee7d66a3d8a5
+        .quad 0xc0862861a8c91170, 0xbe1cf0bef8265792
+        .quad 0xc0862863be697458, 0xbe1cf097f890c6f8
+        .quad 0xc0862865d37ef0c8, 0xbe1cf09502d5c3fc
+        .quad 0xc0862867e809cf00, 0xbe1ceeffb239dac7
+        .quad 0xc0862869fc0a56f8, 0xbe1cf1fbfff95c98
+        .quad 0xc086286c0f80d090, 0xbe1cefa57ad3eef7
+        .quad 0xc086286e226d8348, 0xbe1cf22c58b9183d
+        .quad 0xc086287034d0b690, 0xbe1ceff262d0a248
+        .quad 0xc086287246aab180, 0xbe1cefa7bc194186
+        .quad 0xc086287457fbbb08, 0xbe1cf06782d784d9
+        .quad 0xc086287668c419e0, 0xbe1cf1d44d0eaa07
+        .quad 0xc086287879041490, 0xbe1cf034803c8a48
+        .quad 0xc086287a88bbf158, 0xbe1cf08e84916b6f
+        .quad 0xc086287c97ebf650, 0xbe1cf0c4d3dc1bc7
+        .quad 0xc086287ea6946958, 0xbe1cefb1e4625943
+        .quad 0xc0862880b4b59010, 0xbe1cf143efdd1fd0
+        .quad 0xc0862882c24faff8, 0xbe1cee9896d016da
+        .quad 0xc0862884cf630e38, 0xbe1cf2186072f2cc
+        .quad 0xc0862886dbefeff0, 0xbe1cef9217633d34
+        .quad 0xc0862888e7f699e0, 0xbe1cf05603549486
+        .quad 0xc086288af37750b0, 0xbe1cef50fff513d3
+        .quad 0xc086288cfe7258c0, 0xbe1cf127713b32d0
+        .quad 0xc086288f08e7f650, 0xbe1cf05015520f3d
+        .quad 0xc086289112d86d58, 0xbe1cf12eb458b26f
+        .quad 0xc08628931c4401a8, 0xbe1cf22eae2887ed
+        .quad 0xc0862895252af6e0, 0xbe1cefdd6656dd2d
+        .quad 0xc08628972d8d9058, 0xbe1cf1048ea4e646
+        .quad 0xc0862899356c1150, 0xbe1ceec4501167e9
+        .quad 0xc086289b3cc6bcb8, 0xbe1cf0ad52becc3f
+        .quad 0xc086289d439dd568, 0xbe1cf0daa4e00e35
+        .quad 0xc086289f49f19df8, 0xbe1cf00b80de8d6a
+        .quad 0xc08628a14fc258c8, 0xbe1cf1bcf2ea8464
+        .quad 0xc08628a355104818, 0xbe1cf0435e2782b0
+        .quad 0xc08628a559dbade0, 0xbe1cf0e3e1a5f56c
+        .quad 0xc08628a75e24cbf8, 0xbe1cefed9d5a721d
+        .quad 0xc08628a961ebe3f8, 0xbe1cf0d2d74321e2
+        .quad 0xc08628ab65313750, 0xbe1cf24200eb55e9
+        .quad 0xc08628ad67f50740, 0xbe1cf23e9d7cf979
+        .quad 0xc08628af6a3794d0, 0xbe1cf23a088f421c
+        .quad 0xc08628b16bf920e0, 0xbe1cef2c1de1ab32
+        .quad 0xc08628b36d39ec08, 0xbe1cf1abc231f7b2
+        .quad 0xc08628b56dfa36d0, 0xbe1cf2074d5ba303
+        .quad 0xc08628b76e3a4180, 0xbe1cf05cd5eed880
+        /*== Log_LA_table ==*/
+        .align 32
+        .quad 0x8000000000000000
+        .quad 0xbf5ff802a9ab10e6
+        .quad 0xbf6ff00aa2b10bc0
+        .quad 0xbf77ee11ebd82e94
+        .quad 0xbf7fe02a6b106789
+        .quad 0xbf83e7295d25a7d9
+        .quad 0xbf87dc475f810a77
+        .quad 0xbf8bcf712c74384c
+        .quad 0xbf8fc0a8b0fc03e4
+        .quad 0xbf91d7f7eb9eebe7
+        .quad 0xbf93cea44346a575
+        .quad 0xbf95c45a51b8d389
+        .quad 0xbf97b91b07d5b11b
+        .quad 0xbf99ace7551cc514
+        .quad 0xbf9b9fc027af9198
+        .quad 0xbf9d91a66c543cc4
+        .quad 0xbf9f829b0e783300
+        .quad 0xbfa0b94f7c196176
+        .quad 0xbfa1b0d98923d980
+        .quad 0xbfa2a7ec2214e873
+        .quad 0xbfa39e87b9febd60
+        .quad 0xbfa494acc34d911c
+        .quad 0xbfa58a5bafc8e4d5
+        .quad 0xbfa67f94f094bd98
+        .quad 0xbfa77458f632dcfc
+        .quad 0xbfa868a83083f6cf
+        .quad 0xbfa95c830ec8e3eb
+        .quad 0xbfaa4fe9ffa3d235
+        .quad 0xbfab42dd711971bf
+        .quad 0xbfac355dd0921f2d
+        .quad 0xbfad276b8adb0b52
+        .quad 0xbfae19070c276016
+        .quad 0xbfaf0a30c01162a6
+        .quad 0xbfaffae9119b9303
+        .quad 0xbfb075983598e471
+        .quad 0xbfb0ed839b5526fe
+        .quad 0xbfb16536eea37ae1
+        .quad 0xbfb1dcb263db1944
+        .quad 0xbfb253f62f0a1417
+        .quad 0xbfb2cb0283f5de1f
+        .quad 0xbfb341d7961bd1d1
+        .quad 0xbfb3b87598b1b6ee
+        .quad 0xbfb42edcbea646f0
+        .quad 0xbfb4a50d3aa1b040
+        .quad 0xbfb51b073f06183f
+        .quad 0xbfb590cafdf01c28
+        .quad 0xbfb60658a93750c4
+        .quad 0xbfb67bb0726ec0fc
+        .quad 0xbfb6f0d28ae56b4c
+        .quad 0xbfb765bf23a6be13
+        .quad 0xbfb7da766d7b12cd
+        .quad 0xbfb84ef898e8282a
+        .quad 0xbfb8c345d6319b21
+        .quad 0xbfb9375e55595ede
+        .quad 0xbfb9ab42462033ad
+        .quad 0xbfba1ef1d8061cd4
+        .quad 0xbfba926d3a4ad563
+        .quad 0xbfbb05b49bee43fe
+        .quad 0xbfbb78c82bb0eda1
+        .quad 0xbfbbeba818146765
+        .quad 0xbfbc5e548f5bc743
+        .quad 0xbfbcd0cdbf8c13e1
+        .quad 0xbfbd4313d66cb35d
+        .quad 0xbfbdb5270187d927
+        .quad 0xbfbe27076e2af2e6
+        .quad 0xbfbe98b549671467
+        .quad 0xbfbf0a30c01162a6
+        .quad 0xbfbf7b79fec37ddf
+        .quad 0xbfbfec9131dbeabb
+        .quad 0xbfc02ebb42bf3d4b
+        .quad 0xbfc0671512ca596e
+        .quad 0xbfc09f561ee719c3
+        .quad 0xbfc0d77e7cd08e59
+        .quad 0xbfc10f8e422539b1
+        .quad 0xbfc14785846742ac
+        .quad 0xbfc17f6458fca611
+        .quad 0xbfc1b72ad52f67a0
+        .quad 0xbfc1eed90e2dc2c3
+        .quad 0xbfc2266f190a5acb
+        .quad 0xbfc25ded0abc6ad2
+        .quad 0xbfc29552f81ff523
+        .quad 0xbfc2cca0f5f5f251
+        .quad 0xbfc303d718e47fd3
+        .quad 0xbfc33af575770e4f
+        .quad 0xbfc371fc201e8f74
+        .quad 0xbfc3a8eb2d31a376
+        .quad 0xbfc3dfc2b0ecc62a
+        .quad 0xbfc41682bf727bc0
+        .quad 0xbfc44d2b6ccb7d1e
+        .quad 0xbfc483bccce6e3dd
+        .quad 0xbfc4ba36f39a55e5
+        .quad 0xbfc4f099f4a230b2
+        .quad 0xbfc526e5e3a1b438
+        .quad 0xbfc55d1ad4232d6f
+        .quad 0xbfc59338d9982086
+        .quad 0xbfc5c940075972b9
+        .quad 0xbfc5ff3070a793d4
+        .quad 0xbfc6350a28aaa758
+        .quad 0xbfc66acd4272ad51
+        .quad 0xbfc6a079d0f7aad2
+        .quad 0xbfc6d60fe719d21d
+        .quad 0xbfc70b8f97a1aa75
+        .quad 0xbfc740f8f54037a5
+        .quad 0xbfc7764c128f2127
+        .quad 0xbfc7ab890210d909
+        .quad 0xbfc7e0afd630c274
+        .quad 0xbfc815c0a14357eb
+        .quad 0xbfc84abb75865139
+        .quad 0xbfc87fa06520c911
+        .quad 0xbfc8b46f8223625b
+        .quad 0xbfc8e928de886d41
+        .quad 0xbfc91dcc8c340bde
+        .quad 0xbfc9525a9cf456b4
+        .quad 0xbfc986d3228180ca
+        .quad 0xbfc9bb362e7dfb83
+        .quad 0xbfc9ef83d2769a34
+        .quad 0xbfca23bc1fe2b563
+        .quad 0xbfca57df28244dcd
+        .quad 0xbfca8becfc882f19
+        .quad 0xbfcabfe5ae46124c
+        .quad 0xbfcaf3c94e80bff3
+        .quad 0xbfcb2797ee46320c
+        .quad 0xbfcb5b519e8fb5a4
+        .quad 0xbfcb8ef670420c3b
+        .quad 0xbfcbc286742d8cd6
+        .quad 0xbfcbf601bb0e44e2
+        .quad 0xbfcc2968558c18c1
+        .quad 0xbfcc5cba543ae425
+        .quad 0xbfcc8ff7c79a9a22
+        .quad 0xbfccc320c0176502
+        .quad 0xbfccf6354e09c5dc
+        .quad 0xbfcd293581b6b3e7
+        .quad 0xbfcd5c216b4fbb91
+        .quad 0xbfcd8ef91af31d5e
+        .quad 0xbfcdc1bca0abec7d
+        .quad 0xbfcdf46c0c722d2f
+        .quad 0xbfce27076e2af2e6
+        .quad 0xbfce598ed5a87e2f
+        .quad 0xbfce8c0252aa5a60
+        .quad 0xbfcebe61f4dd7b0b
+        .quad 0xbfcef0adcbdc5936
+        .quad 0xbfcf22e5e72f105d
+        .quad 0xbfcf550a564b7b37
+        .quad 0xbfcf871b28955045
+        .quad 0xbfcfb9186d5e3e2b
+        .quad 0xbfcfeb0233e607cc
+        .quad 0xbfd00e6c45ad501d
+        .quad 0xbfd0274dc16c232f
+        .quad 0xbfd0402594b4d041
+        .quad 0xbfd058f3c703ebc6
+        .quad 0xbfd071b85fcd590d
+        .quad 0xbfd08a73667c57af
+        .quad 0xbfd0a324e27390e3
+        .quad 0xbfd0bbccdb0d24bd
+        .quad 0xbfd0d46b579ab74b
+        .quad 0xbfd0ed005f657da4
+        .quad 0xbfd1058bf9ae4ad5
+        .quad 0xbfd11e0e2dad9cb7
+        .quad 0xbfd136870293a8b0
+        .quad 0xbfd14ef67f88685a
+        .quad 0xbfd1675cababa60e
+        .quad 0xbfd17fb98e15095d
+        .quad 0xbfd1980d2dd4236f
+        .quad 0xbfd1b05791f07b49
+        .quad 0xbfd1c898c16999fb
+        .quad 0xbfd1e0d0c33716be
+        .quad 0xbfd1f8ff9e48a2f3
+        .quad 0xbfd211255986160c
+        .quad 0xbfd22941fbcf7966
+        .quad 0xbfd241558bfd1404
+        .quad 0xbfd2596010df763a
+        .quad 0xbfd27161913f853d
+        .quad 0xbfd2895a13de86a3
+        .quad 0xbfd2a1499f762bc9
+        .quad 0xbfd2b9303ab89d25
+        .quad 0xbfd2d10dec508583
+        .quad 0xbfd2e8e2bae11d31
+        .quad 0xbfd300aead06350c
+        .quad 0xbfd31871c9544185
+        .quad 0xbfd3302c16586588
+        .quad 0xbfd347dd9a987d55
+        .quad 0xbfd35f865c93293e
+        .quad 0xbfd3772662bfd85b
+        .quad 0xbfd38ebdb38ed321
+        .quad 0xbfd3a64c556945ea
+        .quad 0xbfd3bdd24eb14b6a
+        .quad 0xbfd3d54fa5c1f710
+        .quad 0xbfd3ecc460ef5f50
+        .quad 0xbfd404308686a7e4
+        .quad 0xbfd41b941cce0bee
+        .quad 0xbfd432ef2a04e814
+        .quad 0xbfd44a41b463c47c
+        .quad 0xbfd4618bc21c5ec2
+        .quad 0xbfd478cd5959b3d9
+        .quad 0xbfd49006804009d1
+        .quad 0xbfd4a7373cecf997
+        .quad 0xbfd4be5f957778a1
+        .quad 0xbfd4d57f8fefe27f
+        .quad 0xbfd4ec973260026a
+        .quad 0xbfd503a682cb1cb3
+        .quad 0xbfd51aad872df82d
+        .quad 0xbfd531ac457ee77e
+        .quad 0xbfd548a2c3add263
+        .quad 0xbfd55f9107a43ee2
+        .quad 0xbfd5767717455a6c
+        .quad 0xbfd58d54f86e02f2
+        .quad 0xbfd5a42ab0f4cfe2
+        .quad 0xbfd5baf846aa1b19
+        .quad 0xbfd5d1bdbf5809ca
+        .quad 0xbfd5e87b20c2954a
+        .quad 0xbfd5ff3070a793d4
+        .quad 0xbfd615ddb4bec13c
+        .quad 0xbfd62c82f2b9c795
+        .quad 0x3fd61965cdb02c1f
+        .quad 0x3fd602d08af091ec
+        .quad 0x3fd5ec433d5c35ae
+        .quad 0x3fd5d5bddf595f30
+        .quad 0x3fd5bf406b543db2
+        .quad 0x3fd5a8cadbbedfa1
+        .quad 0x3fd5925d2b112a59
+        .quad 0x3fd57bf753c8d1fb
+        .quad 0x3fd565995069514c
+        .quad 0x3fd54f431b7be1a9
+        .quad 0x3fd538f4af8f72fe
+        .quad 0x3fd522ae0738a3d8
+        .quad 0x3fd50c6f1d11b97c
+        .quad 0x3fd4f637ebba9810
+        .quad 0x3fd4e0086dd8baca
+        .quad 0x3fd4c9e09e172c3c
+        .quad 0x3fd4b3c077267e9a
+        .quad 0x3fd49da7f3bcc41f
+        .quad 0x3fd487970e958770
+        .quad 0x3fd4718dc271c41b
+        .quad 0x3fd45b8c0a17df13
+        .quad 0x3fd44591e0539f49
+        .quad 0x3fd42f9f3ff62642
+        .quad 0x3fd419b423d5e8c7
+        .quad 0x3fd403d086cea79c
+        .quad 0x3fd3edf463c1683e
+        .quad 0x3fd3d81fb5946dba
+        .quad 0x3fd3c25277333184
+        .quad 0x3fd3ac8ca38e5c5f
+        .quad 0x3fd396ce359bbf54
+        .quad 0x3fd3811728564cb2
+        .quad 0x3fd36b6776be1117
+        .quad 0x3fd355bf1bd82c8b
+        .quad 0x3fd3401e12aecba1
+        .quad 0x3fd32a84565120a8
+        .quad 0x3fd314f1e1d35ce4
+        .quad 0x3fd2ff66b04ea9d4
+        .quad 0x3fd2e9e2bce12286
+        .quad 0x3fd2d46602adccee
+        .quad 0x3fd2bef07cdc9354
+        .quad 0x3fd2a982269a3dbf
+        .quad 0x3fd2941afb186b7c
+        .quad 0x3fd27ebaf58d8c9d
+        .quad 0x3fd269621134db92
+        .quad 0x3fd25410494e56c7
+        .quad 0x3fd23ec5991eba49
+        .quad 0x3fd22981fbef797b
+        .quad 0x3fd214456d0eb8d4
+        .quad 0x3fd1ff0fe7cf47a7
+        .quad 0x3fd1e9e1678899f4
+        .quad 0x3fd1d4b9e796c245
+        .quad 0x3fd1bf99635a6b95
+        .quad 0x3fd1aa7fd638d33f
+        .quad 0x3fd1956d3b9bc2fa
+        .quad 0x3fd180618ef18adf
+        .quad 0x3fd16b5ccbacfb73
+        .quad 0x3fd1565eed455fc3
+        .quad 0x3fd14167ef367783
+        .quad 0x3fd12c77cd00713b
+        .quad 0x3fd1178e8227e47c
+        .quad 0x3fd102ac0a35cc1c
+        .quad 0x3fd0edd060b78081
+        .quad 0x3fd0d8fb813eb1ef
+        .quad 0x3fd0c42d676162e3
+        .quad 0x3fd0af660eb9e279
+        .quad 0x3fd09aa572e6c6d4
+        .quad 0x3fd085eb8f8ae797
+        .quad 0x3fd07138604d5862
+        .quad 0x3fd05c8be0d9635a
+        .quad 0x3fd047e60cde83b8
+        .quad 0x3fd03346e0106062
+        .quad 0x3fd01eae5626c691
+        .quad 0x3fd00a1c6adda473
+        .quad 0x3fcfeb2233ea07cd
+        .quad 0x3fcfc218be620a5e
+        .quad 0x3fcf991c6cb3b379
+        .quad 0x3fcf702d36777df0
+        .quad 0x3fcf474b134df229
+        .quad 0x3fcf1e75fadf9bde
+        .quad 0x3fcef5ade4dcffe6
+        .quad 0x3fceccf2c8fe920a
+        .quad 0x3fcea4449f04aaf5
+        .quad 0x3fce7ba35eb77e2a
+        .quad 0x3fce530effe71012
+        .quad 0x3fce2a877a6b2c12
+        .quad 0x3fce020cc6235ab5
+        .quad 0x3fcdd99edaf6d7e9
+        .quad 0x3fcdb13db0d48940
+        .quad 0x3fcd88e93fb2f450
+        .quad 0x3fcd60a17f903515
+        .quad 0x3fcd38666871f465
+        .quad 0x3fcd1037f2655e7b
+        .quad 0x3fcce816157f1988
+        .quad 0x3fccc000c9db3c52
+        .quad 0x3fcc97f8079d44ec
+        .quad 0x3fcc6ffbc6f00f71
+        .quad 0x3fcc480c0005ccd1
+        .quad 0x3fcc2028ab17f9b4
+        .quad 0x3fcbf851c067555f
+        .quad 0x3fcbd087383bd8ad
+        .quad 0x3fcba8c90ae4ad19
+        .quad 0x3fcb811730b823d2
+        .quad 0x3fcb5971a213acdb
+        .quad 0x3fcb31d8575bce3d
+        .quad 0x3fcb0a4b48fc1b46
+        .quad 0x3fcae2ca6f672bd4
+        .quad 0x3fcabb55c31693ad
+        .quad 0x3fca93ed3c8ad9e3
+        .quad 0x3fca6c90d44b704e
+        .quad 0x3fca454082e6ab05
+        .quad 0x3fca1dfc40f1b7f1
+        .quad 0x3fc9f6c407089664
+        .quad 0x3fc9cf97cdce0ec3
+        .quad 0x3fc9a8778debaa38
+        .quad 0x3fc981634011aa75
+        .quad 0x3fc95a5adcf7017f
+        .quad 0x3fc9335e5d594989
+        .quad 0x3fc90c6db9fcbcd9
+        .quad 0x3fc8e588ebac2dbf
+        .quad 0x3fc8beafeb38fe8c
+        .quad 0x3fc897e2b17b19a5
+        .quad 0x3fc871213750e994
+        .quad 0x3fc84a6b759f512f
+        .quad 0x3fc823c16551a3c2
+        .quad 0x3fc7fd22ff599d4f
+        .quad 0x3fc7d6903caf5ad0
+        .quad 0x3fc7b0091651528c
+        .quad 0x3fc7898d85444c73
+        .quad 0x3fc7631d82935a86
+        .quad 0x3fc73cb9074fd14d
+        .quad 0x3fc716600c914054
+        .quad 0x3fc6f0128b756abc
+        .quad 0x3fc6c9d07d203fc7
+        .quad 0x3fc6a399dabbd383
+        .quad 0x3fc67d6e9d785771
+        .quad 0x3fc6574ebe8c133a
+        .quad 0x3fc6313a37335d76
+        .quad 0x3fc60b3100b09476
+        .quad 0x3fc5e533144c1719
+        .quad 0x3fc5bf406b543db2
+        .quad 0x3fc59958ff1d52f1
+        .quad 0x3fc5737cc9018cdd
+        .quad 0x3fc54dabc26105d2
+        .quad 0x3fc527e5e4a1b58d
+        .quad 0x3fc5022b292f6a45
+        .quad 0x3fc4dc7b897bc1c8
+        .quad 0x3fc4b6d6fefe22a4
+        .quad 0x3fc4913d8333b561
+        .quad 0x3fc46baf0f9f5db7
+        .quad 0x3fc4462b9dc9b3dc
+        .quad 0x3fc420b32740fdd4
+        .quad 0x3fc3fb45a59928cc
+        .quad 0x3fc3d5e3126bc27f
+        .quad 0x3fc3b08b6757f2a9
+        .quad 0x3fc38b3e9e027479
+        .quad 0x3fc365fcb0159016
+        .quad 0x3fc340c59741142e
+        .quad 0x3fc31b994d3a4f85
+        .quad 0x3fc2f677cbbc0a96
+        .quad 0x3fc2d1610c86813a
+        .quad 0x3fc2ac55095f5c59
+        .quad 0x3fc28753bc11aba5
+        .quad 0x3fc2625d1e6ddf57
+        .quad 0x3fc23d712a49c202
+        .quad 0x3fc2188fd9807263
+        .quad 0x3fc1f3b925f25d41
+        .quad 0x3fc1ceed09853752
+        .quad 0x3fc1aa2b7e23f72a
+        .quad 0x3fc185747dbecf34
+        .quad 0x3fc160c8024b27b1
+        .quad 0x3fc13c2605c398c3
+        .quad 0x3fc1178e8227e47c
+        .quad 0x3fc0f301717cf0fb
+        .quad 0x3fc0ce7ecdccc28d
+        .quad 0x3fc0aa06912675d5
+        .quad 0x3fc08598b59e3a07
+        .quad 0x3fc06135354d4b18
+        .quad 0x3fc03cdc0a51ec0d
+        .quad 0x3fc0188d2ecf6140
+        .quad 0x3fbfe89139dbd566
+        .quad 0x3fbfa01c9db57ce2
+        .quad 0x3fbf57bc7d9005db
+        .quad 0x3fbf0f70cdd992e3
+        .quad 0x3fbec739830a1120
+        .quad 0x3fbe7f1691a32d3e
+        .quad 0x3fbe3707ee30487b
+        .quad 0x3fbdef0d8d466db9
+        .quad 0x3fbda727638446a2
+        .quad 0x3fbd5f55659210e2
+        .quad 0x3fbd179788219364
+        .quad 0x3fbccfedbfee13a8
+        .quad 0x3fbc885801bc4b23
+        .quad 0x3fbc40d6425a5cb1
+        .quad 0x3fbbf968769fca11
+        .quad 0x3fbbb20e936d6974
+        .quad 0x3fbb6ac88dad5b1c
+        .quad 0x3fbb23965a52ff00
+        .quad 0x3fbadc77ee5aea8c
+        .quad 0x3fba956d3ecade63
+        .quad 0x3fba4e7640b1bc38
+        .quad 0x3fba0792e9277cac
+        .quad 0x3fb9c0c32d4d2548
+        .quad 0x3fb97a07024cbe74
+        .quad 0x3fb9335e5d594989
+        .quad 0x3fb8ecc933aeb6e8
+        .quad 0x3fb8a6477a91dc29
+        .quad 0x3fb85fd927506a48
+        .quad 0x3fb8197e2f40e3f0
+        .quad 0x3fb7d33687c293c9
+        .quad 0x3fb78d02263d82d3
+        .quad 0x3fb746e100226ed9
+        .quad 0x3fb700d30aeac0e1
+        .quad 0x3fb6bad83c1883b6
+        .quad 0x3fb674f089365a7a
+        .quad 0x3fb62f1be7d77743
+        .quad 0x3fb5e95a4d9791cb
+        .quad 0x3fb5a3abb01ade25
+        .quad 0x3fb55e10050e0384
+        .quad 0x3fb518874226130a
+        .quad 0x3fb4d3115d207eac
+        .quad 0x3fb48dae4bc31018
+        .quad 0x3fb4485e03dbdfad
+        .quad 0x3fb403207b414b7f
+        .quad 0x3fb3bdf5a7d1ee64
+        .quad 0x3fb378dd7f749714
+        .quad 0x3fb333d7f8183f4b
+        .quad 0x3fb2eee507b40301
+        .quad 0x3fb2aa04a44717a5
+        .quad 0x3fb26536c3d8c369
+        .quad 0x3fb2207b5c78549e
+        .quad 0x3fb1dbd2643d190b
+        .quad 0x3fb1973bd1465567
+        .quad 0x3fb152b799bb3cc9
+        .quad 0x3fb10e45b3cae831
+        .quad 0x3fb0c9e615ac4e17
+        .quad 0x3fb08598b59e3a07
+        .quad 0x3fb0415d89e74444
+        .quad 0x3faffa6911ab9301
+        .quad 0x3faf723b517fc523
+        .quad 0x3faeea31c006b87c
+        .quad 0x3fae624c4a0b5e1b
+        .quad 0x3fadda8adc67ee4e
+        .quad 0x3fad52ed6405d86f
+        .quad 0x3faccb73cdddb2cc
+        .quad 0x3fac441e06f72a9e
+        .quad 0x3fabbcebfc68f420
+        .quad 0x3fab35dd9b58baad
+        .quad 0x3faaaef2d0fb10fc
+        .quad 0x3faa282b8a936171
+        .quad 0x3fa9a187b573de7c
+        .quad 0x3fa91b073efd7314
+        .quad 0x3fa894aa149fb343
+        .quad 0x3fa80e7023d8ccc4
+        .quad 0x3fa788595a3577ba
+        .quad 0x3fa70265a550e777
+        .quad 0x3fa67c94f2d4bb58
+        .quad 0x3fa5f6e73078efb8
+        .quad 0x3fa5715c4c03ceef
+        .quad 0x3fa4ebf43349e26f
+        .quad 0x3fa466aed42de3ea
+        .quad 0x3fa3e18c1ca0ae92
+        .quad 0x3fa35c8bfaa1306b
+        .quad 0x3fa2d7ae5c3c5bae
+        .quad 0x3fa252f32f8d183f
+        .quad 0x3fa1ce5a62bc353a
+        .quad 0x3fa149e3e4005a8d
+        .quad 0x3fa0c58fa19dfaaa
+        .quad 0x3fa0415d89e74444
+        .quad 0x3f9f7a9b16782856
+        .quad 0x3f9e72bf2813ce51
+        .quad 0x3f9d6b2725979802
+        .quad 0x3f9c63d2ec14aaf2
+        .quad 0x3f9b5cc258b718e6
+        .quad 0x3f9a55f548c5c43f
+        .quad 0x3f994f6b99a24475
+        .quad 0x3f98492528c8cabf
+        .quad 0x3f974321d3d006d3
+        .quad 0x3f963d6178690bd6
+        .quad 0x3f9537e3f45f3565
+        .quad 0x3f9432a925980cc1
+        .quad 0x3f932db0ea132e22
+        .quad 0x3f9228fb1fea2e28
+        .quad 0x3f912487a5507f70
+        .quad 0x3f90205658935847
+        .quad 0x3f8e38ce3033310c
+        .quad 0x3f8c317384c75f06
+        .quad 0x3f8a2a9c6c170462
+        .quad 0x3f882448a388a2aa
+        .quad 0x3f861e77e8b53fc6
+        .quad 0x3f841929f96832f0
+        .quad 0x3f82145e939ef1e9
+        .quad 0x3f8010157588de71
+        .quad 0x3f7c189cbb0e27fb
+        .quad 0x3f78121214586b54
+        .quad 0x3f740c8a747878e2
+        .quad 0x3f70080559588b35
+        .quad 0x3f680904828985c0
+        .quad 0x3f60040155d5889e
+        .quad 0x3f50020055655889
+        .quad 0x0000000000000000
+        /*== poly_coeff[4] ==*/
+        .align 32
+        .quad 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A /* coeff4 */
+        .quad 0xbfd0000148058EE1, 0xbfd0000148058EE1, 0xbfd0000148058EE1, 0xbfd0000148058EE1 /* coeff3 */
+        .quad 0x3fd55555555543C5, 0x3fd55555555543C5, 0x3fd55555555543C5, 0x3fd55555555543C5 /* coeff2 */
+        .quad 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F /* coeff1 */
+        /*== ExpMask ==*/
+        .align 32
+        .quad 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff
+        /*== Two10 ==*/
+        .align 32
+        .quad 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000
+        /*== MinLog1p = -1+2^(-53) ==*/
+        .align 32
+        .quad 0xbfefffffffffffff, 0xbfefffffffffffff, 0xbfefffffffffffff, 0xbfefffffffffffff
+        /*== MaxLog1p ==*/
+        .align 32
+        .quad 0x7f3ffffffffff000, 0x7f3ffffffffff000, 0x7f3ffffffffff000, 0x7f3ffffffffff000
+        /*== One ==*/
+        .align 32
+        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
+        /*== SgnMask ==*/
+        .align 32
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
+        /*== XThreshold ==*/
+        .align 32
+        .quad 0x3e00000000000000, 0x3e00000000000000, 0x3e00000000000000, 0x3e00000000000000
+        /*== XhMask ==*/
+        .align 32
+        .quad 0xfffffffffffffc00, 0xfffffffffffffc00, 0xfffffffffffffc00, 0xfffffffffffffc00
+        /*== Threshold ==*/
+        .align 32
+        .quad 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000
+        /*== Bias ==*/
+        .align 32
+        .quad 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000
+        /*== Bias1 ==*/
+        .align 32
+        .quad 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000
+        /*== ExpMask ==*/
+        .align 32
+        .quad 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000
+        /*== ExpMask2 ==*/
+        .align 32
+        .quad 0x7f40000000000000, 0x7f40000000000000, 0x7f40000000000000, 0x7f40000000000000
+        /*== L2L ==*/
+        .align 32
+        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
+        /*== dBigThreshold ==*/
+        .align 32
+        .quad 0x41D0000000000000, 0x41D0000000000000, 0x41D0000000000000, 0x41D0000000000000
+        /*== dC2 ==*/
+        .align 32
+        .quad 0x3FD8000000000000, 0x3FD8000000000000, 0x3FD8000000000000, 0x3FD8000000000000
+        /*== dC3 ==*/
+        .align 32
+        .quad 0x3FD4000000000000, 0x3FD4000000000000, 0x3FD4000000000000, 0x3FD4000000000000
+        /*== dC4 ==*/
+        .align 32
+        .quad 0x3FD1800000000000, 0x3FD1800000000000, 0x3FD1800000000000, 0x3FD1800000000000
+        /*== dC5 ==*/
+        .align 32
+        .quad 0x3FCF800000000000, 0x3FCF800000000000, 0x3FCF800000000000, 0x3FCF800000000000
+        /*== dHalf ==*/
+        .align 32
+        .quad 0x3FE0000000000000, 0x3FE0000000000000, 0x3FE0000000000000, 0x3FE0000000000000
+        /*== dLargestFinite ==*/
+        .align 32
+        .quad 0x7FEFFFFFFFFFFFFF, 0x7FEFFFFFFFFFFFFF, 0x7FEFFFFFFFFFFFFF, 0x7FEFFFFFFFFFFFFF
+        /*== dLittleThreshold ==*/
+        .align 32
+        .quad 0x3F60000000000000, 0x3F60000000000000, 0x3F60000000000000, 0x3F60000000000000
+        /*== dSign ==*/
+        .align 32
+        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000
+        /*== dThirtyOne ==*/
+        .align 32
+        .quad 0x403F000000000000, 0x403F000000000000, 0x403F000000000000, 0x403F000000000000
+        /*== dTopMask12 ==*/
+        .align 32
+        .quad 0xFFFFFE0000000000, 0xFFFFFE0000000000, 0xFFFFFE0000000000, 0xFFFFFE0000000000
+        /*== dTopMask29 ==*/
+        .align 32
+        .quad 0xFFFFFFFFFF000000, 0xFFFFFFFFFF000000, 0xFFFFFFFFFF000000, 0xFFFFFFFFFF000000
+        /*== XScale ==*/
+        .align 32
+        .quad 0x3E10000000000000, 0x3E10000000000000, 0x3E10000000000000, 0x3E10000000000000
+        .align 32
+        .type	__svml_dasinh_data_internal,@object
+        .size	__svml_dasinh_data_internal,.-__svml_dasinh_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core-avx2.S
new file mode 100644
index 0000000000..647c73292c
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized asinh, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN8v_asinh _ZGVeN8v_asinh_avx2_wrapper
+#include "../svml_d_asinh8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core.c
new file mode 100644
index 0000000000..45e5ab72a6
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core.c
@@ -0,0 +1,27 @@
+/* Multiple versions of vectorized asinh, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN8v_asinh
+#include "ifunc-mathvec-avx512-skx.h"
+
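+/* Resolve _ZGVeN8v_asinh at load time: the IFUNC selector picks the
+   AVX-512 (SKX) implementation when the CPU supports it and falls
+   back to the AVX2 wrapper otherwise.  */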
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN8v_asinh, __GI__ZGVeN8v_asinh, __redirect__ZGVeN8v_asinh)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core_avx512.S
new file mode 100644
index 0000000000..8100e8a50a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core_avx512.S
@@ -0,0 +1,510 @@
+/* Function asinh vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute asinh(x) as log(x + sqrt(x*x + 1))
+ *   using RSQRT instructions for starting the
+ *   square root approximation, and small table lookups for log
+ *   that map to AVX-512 permute instructions
+ *
+ *   Special cases:
+ *
+ *   asinh(NaN) = quiet NaN, and raise invalid exception
+ *   asinh(+/-INF) = +/-INF
+ *   asinh(+/-0)   = +/-0
+ *
+ */
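+
+/*
+ * For reference, a minimal scalar C sketch of the same formula (the
+ * helper name asinh_ref is illustrative only, not part of this patch).
+ * The vector code below follows the same outline, but seeds the square
+ * root with VRSQRT14PD and evaluates the logarithm via a permute-based
+ * table instead of calling sqrt() and log() directly:
+ *
+ *     #include <math.h>
+ *
+ *     static double
+ *     asinh_ref (double x)
+ *     {
+ *       double ax = fabs (x);
+ *       double r = log (ax + sqrt (ax * ax + 1.0));
+ *       return copysign (r, x);  // asinh is odd: restore the sign of x
+ *     }
+ */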
+
+/* Offsets for data table __svml_dasinh_data_internal_avx512
+ */
+#define Log_tbl_H                     	0
+#define Log_tbl_L                     	128
+#define One                           	256
+#define AbsMask                       	320
+#define SmallThreshold                	384
+#define Threshold                     	448
+#define LargeThreshold                	512
+#define ca2                           	576
+#define ca1                           	640
+#define c4s                           	704
+#define c3s                           	768
+#define c2s                           	832
+#define c1s                           	896
+#define AddB5                         	960
+#define RcpBitMask                    	1024
+#define OneEighth                     	1088
+#define Four                          	1152
+#define poly_coeff9                   	1216
+#define poly_coeff8                   	1280
+#define poly_coeff7                   	1344
+#define poly_coeff6                   	1408
+#define poly_coeff5                   	1472
+#define poly_coeff4                   	1536
+#define poly_coeff3                   	1600
+#define poly_coeff2                   	1664
+#define poly_coeff1                   	1728
+#define L2H                           	1792
+#define L2L                           	1856
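+
+/* Each offset above is the byte offset of the corresponding field in
+   __svml_dasinh_data_internal_avx512; the commented struct at the end
+   of this file documents the same layout.  */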
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN8v_asinh_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovaps   %zmm0, %zmm3
+
+/* x^2 */
+        vmulpd    {rn-sae}, %zmm3, %zmm3, %zmm14
+        vmovups   One+__svml_dasinh_data_internal_avx512(%rip), %zmm9
+
+/* polynomial computation for small inputs */
+        vmovups   ca2+__svml_dasinh_data_internal_avx512(%rip), %zmm10
+        vmovups   ca1+__svml_dasinh_data_internal_avx512(%rip), %zmm11
+
+/* not a very small input ? */
+        vmovups   SmallThreshold+__svml_dasinh_data_internal_avx512(%rip), %zmm0
+
+/* A=max(x^2, 1); */
+        vmaxpd    {sae}, %zmm14, %zmm9, %zmm4
+
+/* B=min(x^2, 1); */
+        vminpd    {sae}, %zmm14, %zmm9, %zmm5
+        vfmadd231pd {rn-sae}, %zmm14, %zmm10, %zmm11
+
+/* 1+x^2 */
+        vaddpd    {rn-sae}, %zmm9, %zmm14, %zmm8
+
+/* |input| */
+        vandpd    AbsMask+__svml_dasinh_data_internal_avx512(%rip), %zmm3, %zmm1
+        vrsqrt14pd %zmm8, %zmm6
+        vcmppd    $21, {sae}, %zmm0, %zmm1, %k2
+
+/* B_high */
+        vsubpd    {rn-sae}, %zmm4, %zmm8, %zmm7
+
+/* sign bit */
+        vxorpd    %zmm3, %zmm1, %zmm2
+        vmulpd    {rn-sae}, %zmm14, %zmm11, %zmm4
+
+/* B_low */
+        vsubpd    {rn-sae}, %zmm7, %zmm5, %zmm13
+        vmovups   c2s+__svml_dasinh_data_internal_avx512(%rip), %zmm5
+        vmovups   c1s+__svml_dasinh_data_internal_avx512(%rip), %zmm7
+
+/* polynomial computation for small inputs */
+        vfmadd213pd {rn-sae}, %zmm1, %zmm1, %zmm4
+
+/* (x^2)_low */
+        vmovaps   %zmm3, %zmm15
+        vfmsub213pd {rn-sae}, %zmm14, %zmm3, %zmm15
+
+/* Sh ~sqrt(1+x^2) */
+        vmulpd    {rn-sae}, %zmm6, %zmm8, %zmm14
+
+/* Yl = (x^2)_low + B_low */
+        vaddpd    {rn-sae}, %zmm15, %zmm13, %zmm13
+
+/* very large inputs ? */
+        vmovups   Threshold+__svml_dasinh_data_internal_avx512(%rip), %zmm15
+
+/* (Yh*R0)_low */
+        vfmsub213pd {rn-sae}, %zmm14, %zmm6, %zmm8
+        vcmppd    $21, {sae}, %zmm15, %zmm1, %k1
+
+/* Sl = (Yh*R0)_low+(R0*Yl) */
+        vfmadd213pd {rn-sae}, %zmm8, %zmm6, %zmm13
+        vmovups   LargeThreshold+__svml_dasinh_data_internal_avx512(%rip), %zmm8
+
+/* rel. error term: Eh=1-Sh*R0 */
+        vmovaps   %zmm9, %zmm12
+        vfnmadd231pd {rn-sae}, %zmm14, %zmm6, %zmm12
+        vcmppd    $22, {sae}, %zmm8, %zmm1, %k0
+
+/* rel. error term: Eh=(1-Sh*R0)-Sl*R0 */
+        vfnmadd231pd {rn-sae}, %zmm13, %zmm6, %zmm12
+
+/*
+ * sqrt(1+x^2) ~ Sh + Sl + Sh*Eh*poly_s
+ * poly_s = c1+c2*Eh+c3*Eh^2
+ */
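+/*
+ * Derivation of the refinement above: with Y = 1+x^2, R0 ~ 1/sqrt(Y)
+ * from VRSQRT14PD and S = Sh+Sl ~ Y*R0, the error term satisfies
+ * S*R0 = 1-Eh, hence
+ *     sqrt(Y) = S/sqrt(1-Eh) ~ S*(1 + Eh*(1/2 + 3/8*Eh + 5/16*Eh^2 + ...)),
+ * which is the Sh + Sl + Sh*Eh*poly_s form, with c1s = 1/2, c2s = 3/8,
+ * c3s ~ 5/16 and c4s ~ 35/128 taken from the data table below.
+ */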
+        vmovups   c4s+__svml_dasinh_data_internal_avx512(%rip), %zmm6
+        vmovups   c3s+__svml_dasinh_data_internal_avx512(%rip), %zmm8
+
+/* Sh*Eh */
+        vmulpd    {rn-sae}, %zmm12, %zmm14, %zmm11
+        vfmadd231pd {rn-sae}, %zmm12, %zmm6, %zmm8
+
+/* Sh+x */
+        vaddpd    {rn-sae}, %zmm1, %zmm14, %zmm6
+        kmovw     %k0, %edx
+        vfmadd213pd {rn-sae}, %zmm5, %zmm12, %zmm8
+        vfmadd213pd {rn-sae}, %zmm7, %zmm12, %zmm8
+
+/* Xh */
+        vsubpd    {rn-sae}, %zmm14, %zmm6, %zmm12
+
+/* Sl + Sh*Eh*poly_s */
+        vfmadd213pd {rn-sae}, %zmm13, %zmm8, %zmm11
+
+/* fixup for very large inputs */
+        vmovups   OneEighth+__svml_dasinh_data_internal_avx512(%rip), %zmm8
+
+/* Xl */
+        vsubpd    {rn-sae}, %zmm12, %zmm1, %zmm12
+
+/* Xin0+Sl+Sh*Eh*poly_s ~ x+sqrt(1+x^2) */
+        vaddpd    {rn-sae}, %zmm11, %zmm6, %zmm10
+
+/* Sl_high */
+        vsubpd    {rn-sae}, %zmm6, %zmm10, %zmm5
+        vmulpd    {rn-sae}, %zmm8, %zmm1, %zmm10{%k1}
+
+/* Table lookups */
+        vmovups   __svml_dasinh_data_internal_avx512(%rip), %zmm6
+
+/* Sl_l */
+        vsubpd    {rn-sae}, %zmm5, %zmm11, %zmm7
+        vrcp14pd  %zmm10, %zmm13
+
+/* Xin_low */
+        vaddpd    {rn-sae}, %zmm12, %zmm7, %zmm14
+        vmovups   Log_tbl_L+__svml_dasinh_data_internal_avx512(%rip), %zmm7
+        vmovups   poly_coeff6+__svml_dasinh_data_internal_avx512(%rip), %zmm12
+
+/* round reciprocal to 1+4b mantissas */
+        vpaddq    AddB5+__svml_dasinh_data_internal_avx512(%rip), %zmm13, %zmm11
+
+/* fixup for very large inputs */
+        vxorpd    %zmm14, %zmm14, %zmm14{%k1}
+        vmovups   poly_coeff5+__svml_dasinh_data_internal_avx512(%rip), %zmm13
+        vandpd    RcpBitMask+__svml_dasinh_data_internal_avx512(%rip), %zmm11, %zmm15
+        vmovups   poly_coeff7+__svml_dasinh_data_internal_avx512(%rip), %zmm11
+
+/* Prepare table index */
+        vpsrlq    $48, %zmm15, %zmm5
+
+/* reduced argument for log(): (Rcp*Xin-1)+Rcp*Xin_low */
+        vfmsub231pd {rn-sae}, %zmm15, %zmm10, %zmm9
+
+/* exponents */
+        vgetexppd {sae}, %zmm15, %zmm8
+        vmovups   Four+__svml_dasinh_data_internal_avx512(%rip), %zmm10
+        vpermt2pd Log_tbl_H+64+__svml_dasinh_data_internal_avx512(%rip), %zmm5, %zmm6
+        vpermt2pd Log_tbl_L+64+__svml_dasinh_data_internal_avx512(%rip), %zmm5, %zmm7
+        vsubpd    {rn-sae}, %zmm10, %zmm8, %zmm8{%k1}
+        vfmadd231pd {rn-sae}, %zmm15, %zmm14, %zmm9
+
+/* polynomials */
+        vmovups   poly_coeff9+__svml_dasinh_data_internal_avx512(%rip), %zmm10
+        vmovups   poly_coeff8+__svml_dasinh_data_internal_avx512(%rip), %zmm5
+        vmovups   poly_coeff4+__svml_dasinh_data_internal_avx512(%rip), %zmm14
+
+/* -K*L2H + Th */
+        vmovups   L2H+__svml_dasinh_data_internal_avx512(%rip), %zmm15
+        vfmadd231pd {rn-sae}, %zmm9, %zmm10, %zmm5
+
+/* -K*L2L + Tl */
+        vmovups   L2L+__svml_dasinh_data_internal_avx512(%rip), %zmm10
+        vfnmadd231pd {rn-sae}, %zmm8, %zmm15, %zmm6
+        vfmadd213pd {rn-sae}, %zmm11, %zmm9, %zmm5
+        vfnmadd213pd {rn-sae}, %zmm7, %zmm10, %zmm8
+        vmovups   poly_coeff3+__svml_dasinh_data_internal_avx512(%rip), %zmm7
+        vmovups   poly_coeff1+__svml_dasinh_data_internal_avx512(%rip), %zmm10
+
+/* R^2 */
+        vmulpd    {rn-sae}, %zmm9, %zmm9, %zmm11
+        vfmadd213pd {rn-sae}, %zmm12, %zmm9, %zmm5
+        vfmadd213pd {rn-sae}, %zmm13, %zmm9, %zmm5
+        vfmadd213pd {rn-sae}, %zmm14, %zmm9, %zmm5
+        vfmadd213pd {rn-sae}, %zmm7, %zmm9, %zmm5
+        vmovups   poly_coeff2+__svml_dasinh_data_internal_avx512(%rip), %zmm7
+        vfmadd213pd {rn-sae}, %zmm7, %zmm9, %zmm5
+        vfmadd213pd {rn-sae}, %zmm10, %zmm9, %zmm5
+
+/* Tl + R^2*Poly */
+        vfmadd213pd {rn-sae}, %zmm8, %zmm11, %zmm5
+
+/* R+Tl + R^2*Poly */
+        vaddpd    {rn-sae}, %zmm9, %zmm5, %zmm9
+        vaddpd    {rn-sae}, %zmm9, %zmm6, %zmm4{%k2}
+        vxorpd    %zmm2, %zmm4, %zmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm3
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm3, 64(%rsp)
+        vmovups   %zmm0, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movsd     64(%rsp,%r14,8), %xmm0
+        call      asinh@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movsd     %xmm0, 128(%rsp,%r14,8)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN8v_asinh_skx)
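+
+/*
+ * The special-value path above can be read as the following scalar C
+ * sketch (the helper name fixup_special_lanes is illustrative only,
+ * not part of this patch): each set bit in the range mask selects a
+ * lane whose spilled input is passed to the scalar asinh and whose
+ * result lane is overwritten with the scalar return value.
+ *
+ *     #include <math.h>
+ *
+ *     static void
+ *     fixup_special_lanes (double result[8], const double input[8],
+ *                          unsigned int mask)
+ *     {
+ *       for (int i = 0; i < 8; i++)
+ *         if (mask & (1u << i))
+ *           result[i] = asinh (input[i]);
+ *     }
+ */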
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_dasinh_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 Log_tbl_H[16][2];
+        __declspec(align(64)) VUINT32 Log_tbl_L[16][2];
+        __declspec(align(64)) VUINT32 One[8][2];
+        __declspec(align(64)) VUINT32 AbsMask[8][2];
+        __declspec(align(64)) VUINT32 SmallThreshold[8][2];
+        __declspec(align(64)) VUINT32 Threshold[8][2];
+        __declspec(align(64)) VUINT32 LargeThreshold[8][2];
+        __declspec(align(64)) VUINT32 ca2[8][2];
+        __declspec(align(64)) VUINT32 ca1[8][2];
+        __declspec(align(64)) VUINT32 c4s[8][2];
+        __declspec(align(64)) VUINT32 c3s[8][2];
+        __declspec(align(64)) VUINT32 c2s[8][2];
+        __declspec(align(64)) VUINT32 c1s[8][2];
+        __declspec(align(64)) VUINT32 AddB5[8][2];
+        __declspec(align(64)) VUINT32 RcpBitMask[8][2];
+        __declspec(align(64)) VUINT32 OneEighth[8][2];
+        __declspec(align(64)) VUINT32 Four[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff9[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff8[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff7[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
+        __declspec(align(64)) VUINT32 poly_coeff1[8][2];
+        __declspec(align(64)) VUINT32 L2H[8][2];
+        __declspec(align(64)) VUINT32 L2L[8][2];
+    } __svml_dasinh_data_internal_avx512;
+#endif
+__svml_dasinh_data_internal_avx512:
+        /*== Log_tbl_H ==*/
+        .quad 0x0000000000000000
+        .quad 0xbfaf0a30c0120000
+        .quad 0xbfbe27076e2b0000
+        .quad 0xbfc5ff3070a78000
+        .quad 0xbfcc8ff7c79a8000
+        .quad 0xbfd1675cababc000
+        .quad 0xbfd4618bc21c4000
+        .quad 0xbfd739d7f6bbc000
+        .quad 0xbfd9f323ecbf8000
+        .quad 0xbfdc8ff7c79a8000
+        .quad 0xbfdf128f5faf0000
+        .quad 0xbfe0be72e4252000
+        .quad 0xbfe1e85f5e704000
+        .quad 0xbfe307d7334f2000
+        .quad 0xbfe41d8fe8468000
+        .quad 0xbfe52a2d265bc000
+        /*== Log_tbl_L ==*/
+        .align 64
+        .quad 0x0000000000000000
+        .quad 0x3d53ab33d066d1d2
+        .quad 0x3d2a342c2af0003c
+        .quad 0xbd43d3c873e20a07
+        .quad 0xbd4a21ac25d81ef3
+        .quad 0x3d59f1fc63382a8f
+        .quad 0xbd5ec27d0b7b37b3
+        .quad 0xbd50069ce24c53fb
+        .quad 0xbd584bf2b68d766f
+        .quad 0xbd5a21ac25d81ef3
+        .quad 0xbd3bb2cd720ec44c
+        .quad 0xbd55056d312f7668
+        .quad 0xbd1a07bd8b34be7c
+        .quad 0x3d5e83c094debc15
+        .quad 0x3d5aa33736867a17
+        .quad 0xbd46abb9df22bc57
+        /*== One ==*/
+        .align 64
+        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
+        /*== AbsMask ==*/
+        .align 64
+        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
+        /*== SmallThreshold ==*/
+        .align 64
+        .quad 0x3f70000000000000, 0x3f70000000000000, 0x3f70000000000000, 0x3f70000000000000, 0x3f70000000000000, 0x3f70000000000000, 0x3f70000000000000, 0x3f70000000000000
+        /*== Threshold ==*/
+        .align 64
+        .quad 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000
+        /*== LargeThreshold ==*/
+        .align 64
+        .quad 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff
+        /*== ca2 ==*/
+        .align 64
+        .quad 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7
+        /*== ca1 ==*/
+        .align 64
+        .quad 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e
+        /*== c4s ==*/
+        .align 64
+        .quad 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612
+        /*== c3s ==*/
+        .align 64
+        .quad 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000
+        /*== c2s ==*/
+        .align 64
+        .quad 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000
+        /*== c1s ==*/
+        .align 64
+        .quad 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000
+        /*== AddB5 ==*/
+        .align 64
+        .quad 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000
+        /*== RcpBitMask ==*/
+        .align 64
+        .quad 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000
+        /*== OneEighth ==*/
+        .align 64
+        .quad 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000
+        /*== Four ==*/
+        .align 64
+        .quad 0x4010000000000000, 0x4010000000000000, 0x4010000000000000, 0x4010000000000000, 0x4010000000000000, 0x4010000000000000, 0x4010000000000000, 0x4010000000000000
+        /*== poly_coeff9 ==*/
+        .align 64
+        .quad 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368
+        /*== poly_coeff8 ==*/
+        .align 64
+        .quad 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778
+        /*== poly_coeff7 ==*/
+        .align 64
+        .quad 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9
+        /*== poly_coeff6 ==*/
+        .align 64
+        .quad 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1
+        /*== poly_coeff5 ==*/
+        .align 64
+        .quad 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736
+        /*== poly_coeff4 ==*/
+        .align 64
+        .quad 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af
+        /*== poly_coeff3 ==*/
+        .align 64
+        .quad 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65
+        /*== poly_coeff2 ==*/
+        .align 64
+        .quad 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1
+        /*== poly_coeff1 ==*/
+        .align 64
+        .quad 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000
+        /*== L2H = log(2)_high ==*/
+        .align 64
+        .quad 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000
+        /*== L2L = log(2)_low ==*/
+        .align 64
+        .quad 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000
+        .align 64
+        .type	__svml_dasinh_data_internal_avx512,@object
+        .size	__svml_dasinh_data_internal_avx512,.-__svml_dasinh_data_internal_avx512
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core-avx2.S
new file mode 100644
index 0000000000..7dfd95e400
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core-avx2.S
@@ -0,0 +1,20 @@
+/* AVX2 version of vectorized asinhf, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVeN16v_asinhf _ZGVeN16v_asinhf_avx2_wrapper
+#include "../svml_s_asinhf16_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core.c
new file mode 100644
index 0000000000..dc770a0e65
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized asinhf, vector length is 16.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVeN16v_asinhf
+#include "ifunc-mathvec-avx512-skx.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVeN16v_asinhf, __GI__ZGVeN16v_asinhf,
+	       __redirect__ZGVeN16v_asinhf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core_avx512.S
new file mode 100644
index 0000000000..fc6a8e7cd3
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core_avx512.S
@@ -0,0 +1,476 @@
+/* Function asinhf vectorized with AVX-512.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute asinh(x) as log(x + sqrt(x*x + 1))
+ *   using RSQRT instructions for starting the
+ *   square root approximation, and small table lookups for log
+ *   that map to AVX-512 permute instructions
+ *
+ *   Special cases:
+ *
+ *   asinh(NaN) = quiet NaN, and raise invalid exception
+ *   asinh(INF) = that INF
+ *   asinh(0)   = that 0
+ *
+ */
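+
+/*
+ * For reference only (not assembled): a minimal scalar C sketch of the
+ * same recipe, using standard <math.h> helpers (asinhf_ref is a
+ * hypothetical name).  The vector code below additionally routes special
+ * inputs to the scalar asinhf callout.
+ *
+ *   static float
+ *   asinhf_ref (float x)
+ *   {
+ *     float y = fabsf (x);
+ *     float r = logf (y + sqrtf (y * y + 1.0f));
+ *     return copysignf (r, x);
+ *   }
+ */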
+
+/* Offsets for data table __svml_sasinh_data_internal_avx512
+ */
+#define Log_tbl_H                     	0
+#define Log_tbl_L                     	128
+#define One                           	256
+#define AbsMask                       	320
+#define SmallThreshold                	384
+#define Threshold                     	448
+#define LargeThreshold                	512
+#define ca1                           	576
+#define c2s                           	640
+#define c1s                           	704
+#define AddB5                         	768
+#define RcpBitMask                    	832
+#define OneEighth                     	896
+#define Four                          	960
+#define poly_coeff3                   	1024
+#define poly_coeff2                   	1088
+#define poly_coeff1                   	1152
+#define L2H                           	1216
+#define L2L                           	1280
+
+#include <sysdep.h>
+
+        .text
+	.section .text.evex512,"ax",@progbits
+ENTRY(_ZGVeN16v_asinhf_skx)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-64, %rsp
+        subq      $192, %rsp
+        vmovaps   %zmm0, %zmm10
+
+/* x^2 */
+        vmulps    {rn-sae}, %zmm10, %zmm10, %zmm0
+        vmovups   One+__svml_sasinh_data_internal_avx512(%rip), %zmm2
+
+/* polynomial computation for small inputs */
+        vmovups   ca1+__svml_sasinh_data_internal_avx512(%rip), %zmm1
+
+/* not a very small input ? */
+        vmovups   SmallThreshold+__svml_sasinh_data_internal_avx512(%rip), %zmm11
+
+/* 1+x^2 */
+        vaddps    {rn-sae}, %zmm2, %zmm0, %zmm7
+
+/* |input| */
+        vandps    AbsMask+__svml_sasinh_data_internal_avx512(%rip), %zmm10, %zmm12
+
+/* A=max(x^2, 1); */
+        vmaxps    {sae}, %zmm0, %zmm2, %zmm14
+        vrsqrt14ps %zmm7, %zmm8
+
+/* B=min(x^2, 1); */
+        vminps    {sae}, %zmm0, %zmm2, %zmm15
+        vcmpps    $21, {sae}, %zmm11, %zmm12, %k2
+
+/* B_high */
+        vsubps    {rn-sae}, %zmm14, %zmm7, %zmm9
+
+/* sign bit */
+        vxorps    %zmm10, %zmm12, %zmm13
+
+/* Sh ~sqrt(1+x^2) */
+        vmulps    {rn-sae}, %zmm8, %zmm7, %zmm6
+        vmovups   LargeThreshold+__svml_sasinh_data_internal_avx512(%rip), %zmm14
+
+/* B_low */
+        vsubps    {rn-sae}, %zmm9, %zmm15, %zmm3
+
+/* Sh+x */
+        vaddps    {rn-sae}, %zmm12, %zmm6, %zmm15
+
+/* (Yh*R0)_low */
+        vfmsub213ps {rn-sae}, %zmm6, %zmm8, %zmm7
+        vmulps    {rn-sae}, %zmm1, %zmm0, %zmm9
+        vcmpps    $22, {sae}, %zmm14, %zmm12, %k0
+        vmovups   c1s+__svml_sasinh_data_internal_avx512(%rip), %zmm1
+
+/* polynomial computation for small inputs */
+        vfmadd213ps {rn-sae}, %zmm12, %zmm12, %zmm9
+        kmovw     %k0, %edx
+
+/* (x^2)_low */
+        vmovaps   %zmm10, %zmm4
+        vfmsub213ps {rn-sae}, %zmm0, %zmm10, %zmm4
+
+/* Yl = (x^2)_low + B_low */
+        vaddps    {rn-sae}, %zmm4, %zmm3, %zmm5
+
+/* rel. error term: Eh=1-Sh*R0 */
+        vmovaps   %zmm2, %zmm0
+        vfnmadd231ps {rn-sae}, %zmm6, %zmm8, %zmm0
+
+/* Sl = (Yh*R0)_low+(R0*Yl) */
+        vfmadd213ps {rn-sae}, %zmm7, %zmm8, %zmm5
+
+/* very large inputs ? */
+        vmovups   Threshold+__svml_sasinh_data_internal_avx512(%rip), %zmm7
+
+/* rel. error term: Eh=(1-Sh*R0)-Sl*R0 */
+        vfnmadd231ps {rn-sae}, %zmm5, %zmm8, %zmm0
+
+/* sqrt(1+x^2) ~ Sh + Sl + Sh*Eh*poly_s */
+        vmovups   c2s+__svml_sasinh_data_internal_avx512(%rip), %zmm8
+        vcmpps    $21, {sae}, %zmm7, %zmm12, %k1
+
+/* Sh*Eh */
+        vmulps    {rn-sae}, %zmm0, %zmm6, %zmm4
+        vfmadd231ps {rn-sae}, %zmm0, %zmm8, %zmm1
+
+/* Sl + Sh*Eh*poly_s */
+        vfmadd213ps {rn-sae}, %zmm5, %zmm1, %zmm4
+
+/* Xh */
+        vsubps    {rn-sae}, %zmm6, %zmm15, %zmm5
+
+/* fixup for very large inputs */
+        vmovups   OneEighth+__svml_sasinh_data_internal_avx512(%rip), %zmm6
+
+/* Xin0+Sl+Sh*Eh*poly_s ~ x+sqrt(1+x^2) */
+        vaddps    {rn-sae}, %zmm4, %zmm15, %zmm3
+
+/* Xl */
+        vsubps    {rn-sae}, %zmm5, %zmm12, %zmm5
+
+/* Sl_high */
+        vsubps    {rn-sae}, %zmm15, %zmm3, %zmm0
+        vmulps    {rn-sae}, %zmm6, %zmm12, %zmm3{%k1}
+
+/* -K*L2H + Th */
+        vmovups   L2H+__svml_sasinh_data_internal_avx512(%rip), %zmm15
+
+/* Sl_l */
+        vsubps    {rn-sae}, %zmm0, %zmm4, %zmm1
+        vrcp14ps  %zmm3, %zmm6
+
+/* Table lookups */
+        vmovups   __svml_sasinh_data_internal_avx512(%rip), %zmm0
+
+/* Xin_low */
+        vaddps    {rn-sae}, %zmm5, %zmm1, %zmm7
+
+/* round reciprocal to 1+4b mantissas */
+        vpaddd    AddB5+__svml_sasinh_data_internal_avx512(%rip), %zmm6, %zmm4
+        vmovups   poly_coeff1+__svml_sasinh_data_internal_avx512(%rip), %zmm5
+        vandps    RcpBitMask+__svml_sasinh_data_internal_avx512(%rip), %zmm4, %zmm8
+
+/* fixup for very large inputs */
+        vxorps    %zmm7, %zmm7, %zmm7{%k1}
+
+/* polynomial */
+        vmovups   poly_coeff3+__svml_sasinh_data_internal_avx512(%rip), %zmm4
+
+/* reduced argument for log(): (Rcp*Xin-1)+Rcp*Xin_low */
+        vfmsub231ps {rn-sae}, %zmm8, %zmm3, %zmm2
+        vmovups   Four+__svml_sasinh_data_internal_avx512(%rip), %zmm3
+
+/* exponents */
+        vgetexpps {sae}, %zmm8, %zmm1
+
+/* Prepare table index */
+        vpsrld    $18, %zmm8, %zmm14
+        vfmadd231ps {rn-sae}, %zmm8, %zmm7, %zmm2
+        vmovups   poly_coeff2+__svml_sasinh_data_internal_avx512(%rip), %zmm7
+        vsubps    {rn-sae}, %zmm3, %zmm1, %zmm1{%k1}
+        vpermt2ps Log_tbl_H+64+__svml_sasinh_data_internal_avx512(%rip), %zmm14, %zmm0
+        vmovups   Log_tbl_L+__svml_sasinh_data_internal_avx512(%rip), %zmm3
+        vfmadd231ps {rn-sae}, %zmm2, %zmm4, %zmm7
+        vfnmadd231ps {rn-sae}, %zmm1, %zmm15, %zmm0
+
+/* R^2 */
+        vmulps    {rn-sae}, %zmm2, %zmm2, %zmm6
+        vfmadd213ps {rn-sae}, %zmm5, %zmm2, %zmm7
+        vpermt2ps Log_tbl_L+64+__svml_sasinh_data_internal_avx512(%rip), %zmm14, %zmm3
+
+/* -K*L2L + Tl */
+        vmovups   L2L+__svml_sasinh_data_internal_avx512(%rip), %zmm14
+        vfnmadd213ps {rn-sae}, %zmm3, %zmm14, %zmm1
+
+/* Tl + R^2*Poly */
+        vfmadd213ps {rn-sae}, %zmm1, %zmm6, %zmm7
+
+/* R+Tl + R^2*Poly */
+        vaddps    {rn-sae}, %zmm2, %zmm7, %zmm2
+        vaddps    {rn-sae}, %zmm2, %zmm0, %zmm9{%k2}
+        vxorps    %zmm13, %zmm9, %zmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm10
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %zmm10, 64(%rsp)
+        vmovups   %zmm0, 128(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx zmm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $16, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   128(%rsp), %zmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 zmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     64(%rsp,%r14,4), %xmm0
+        call      asinhf@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 128(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVeN16v_asinhf_skx)
+
+        .section .rodata, "a"
+        .align 64
+
+#ifdef __svml_sasinh_data_internal_avx512_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(64)) VUINT32 Log_tbl_H[32][1];
+        __declspec(align(64)) VUINT32 Log_tbl_L[32][1];
+        __declspec(align(64)) VUINT32 One[16][1];
+        __declspec(align(64)) VUINT32 AbsMask[16][1];
+        __declspec(align(64)) VUINT32 SmallThreshold[16][1];
+        __declspec(align(64)) VUINT32 Threshold[16][1];
+        __declspec(align(64)) VUINT32 LargeThreshold[16][1];
+        __declspec(align(64)) VUINT32 ca1[16][1];
+        __declspec(align(64)) VUINT32 c2s[16][1];
+        __declspec(align(64)) VUINT32 c1s[16][1];
+        __declspec(align(64)) VUINT32 AddB5[16][1];
+        __declspec(align(64)) VUINT32 RcpBitMask[16][1];
+        __declspec(align(64)) VUINT32 OneEighth[16][1];
+        __declspec(align(64)) VUINT32 Four[16][1];
+        __declspec(align(64)) VUINT32 poly_coeff3[16][1];
+        __declspec(align(64)) VUINT32 poly_coeff2[16][1];
+        __declspec(align(64)) VUINT32 poly_coeff1[16][1];
+        __declspec(align(64)) VUINT32 L2H[16][1];
+        __declspec(align(64)) VUINT32 L2L[16][1];
+    } __svml_sasinh_data_internal_avx512;
+#endif
+__svml_sasinh_data_internal_avx512:
+        /*== Log_tbl_H ==*/
+        .long 0x00000000
+        .long 0xbcfc0000
+        .long 0xbd788000
+        .long 0xbdb78000
+        .long 0xbdf14000
+        .long 0xbe14a000
+        .long 0xbe300000
+        .long 0xbe4aa000
+        .long 0xbe648000
+        .long 0xbe7dc000
+        .long 0xbe8b4000
+        .long 0xbe974000
+        .long 0xbea31000
+        .long 0xbeae9000
+        .long 0xbeb9d000
+        .long 0xbec4d000
+        .long 0xbecfa000
+        .long 0xbeda2000
+        .long 0xbee48000
+        .long 0xbeeea000
+        .long 0xbef89000
+        .long 0xbf012800
+        .long 0xbf05f000
+        .long 0xbf0aa800
+        .long 0xbf0f4000
+        .long 0xbf13c800
+        .long 0xbf184000
+        .long 0xbf1ca000
+        .long 0xbf20f000
+        .long 0xbf252800
+        .long 0xbf295000
+        .long 0xbf2d6800
+        /*== Log_tbl_L ==*/
+        .align 64
+        .long 0x80000000
+        .long 0xb726c39e
+        .long 0x3839e7fe
+        .long 0xb7528ae5
+        .long 0x377891d5
+        .long 0xb8297c10
+        .long 0x37cf8f58
+        .long 0x3852b186
+        .long 0x35838656
+        .long 0xb80c36af
+        .long 0x38235454
+        .long 0xb862bae1
+        .long 0x37e87bc7
+        .long 0x37848150
+        .long 0x37202511
+        .long 0xb74e1b05
+        .long 0x385c1340
+        .long 0xb8777bcd
+        .long 0x36038656
+        .long 0xb7d40984
+        .long 0xb80f5faf
+        .long 0xb8254b4c
+        .long 0xb865c84a
+        .long 0x37f0b42d
+        .long 0xb83ebce1
+        .long 0xb83c2513
+        .long 0x37a332c4
+        .long 0x3779654f
+        .long 0x38602f73
+        .long 0x367449f8
+        .long 0xb7b4996f
+        .long 0xb800986b
+        /*== One ==*/
+        .align 64
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== AbsMask ==*/
+        .align 64
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
+        /*== SmallThreshold ==*/
+        .align 64
+        .long 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000
+        /*== Threshold ==*/
+        .align 64
+        .long 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000
+        /*== LargeThreshold ==*/
+        .align 64
+        .long 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff
+        /*== ca1 ==*/
+        .align 64
+        .long 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE
+        /*== c2s ==*/
+        .align 64
+        .long 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000
+        /*== c1s ==*/
+        .align 64
+        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000
+        /*== AddB5 ==*/
+        .align 64
+        .long 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000
+        /*== RcpBitMask ==*/
+        .align 64
+        .long 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000
+        /*== OneEighth ==*/
+        .align 64
+        .long 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000
+        /*== Four ==*/
+        .align 64
+        .long 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000
+        /*== poly_coeff3 ==*/
+        .align 64
+        .long 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810
+        /*== poly_coeff2 ==*/
+        .align 64
+        .long 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e
+        /*== poly_coeff1 ==*/
+        .align 64
+        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000
+        /*== L2H = log(2)_high ==*/
+        .align 64
+        .long 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000
+        /*== L2L = log(2)_low ==*/
+        .align 64
+        .long 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4
+        .align 64
+        .type	__svml_sasinh_data_internal_avx512,@object
+        .size	__svml_sasinh_data_internal_avx512,.-__svml_sasinh_data_internal_avx512
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core-sse2.S
new file mode 100644
index 0000000000..52e4d2f728
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core-sse2.S
@@ -0,0 +1,20 @@
+/* SSE2 version of vectorized asinhf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVbN4v_asinhf _ZGVbN4v_asinhf_sse2
+#include "../svml_s_asinhf4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core.c
new file mode 100644
index 0000000000..296d5754ae
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized asinhf, vector length is 4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVbN4v_asinhf
+#include "ifunc-mathvec-sse4_1.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVbN4v_asinhf, __GI__ZGVbN4v_asinhf,
+	       __redirect__ZGVbN4v_asinhf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core_sse4.S
new file mode 100644
index 0000000000..1eeeb4f5af
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core_sse4.S
@@ -0,0 +1,509 @@
+/* Function asinhf vectorized with SSE4.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute asinh(x) as log(x + sqrt(x*x + 1))
+ *
+ *   Special cases:
+ *
+ *   asinh(NaN) = quiet NaN, and raise invalid exception
+ *   asinh(INF) = that INF
+ *   asinh(0)   = that 0
+ *
+ */
+
+/* Offsets for data table __svml_sasinh_data_internal
+ */
+#define SgnMask                       	0
+#define sOne                          	16
+#define sPoly                         	32
+#define iBrkValue                     	160
+#define iOffExpoMask                  	176
+#define sBigThreshold                 	192
+#define sC2                           	208
+#define sC3                           	224
+#define sHalf                         	240
+#define sLargestFinite                	256
+#define sLittleThreshold              	272
+#define sSign                         	288
+#define sThirtyOne                    	304
+#define sTopMask11                    	320
+#define sTopMask8                     	336
+#define XScale                        	352
+#define sLn2                          	368
+
+#include <sysdep.h>
+
+        .text
+	.section .text.sse4,"ax",@progbits
+ENTRY(_ZGVbN4v_asinhf_sse4)
+        subq      $72, %rsp
+        cfi_def_cfa_offset(80)
+        movaps    %xmm0, %xmm8
+
+/*
+ * Split X into high and low parts, XHi (<= 11 bits) and XLo (<= 13 bits)
+ * We could use either X or |X| here, but it doesn't seem to matter
+ */
+        movups    sTopMask11+__svml_sasinh_data_internal(%rip), %xmm10
+        movaps    %xmm8, %xmm2
+        andps     %xmm8, %xmm10
+
+/*
+ * Compute X^2 = (XHi + XLo)^2 = XHi^2 + XLo * (X + XHi)
+ * The two parts are shifted off by around 11 bits. So even though
+ * the low bit will not in general be exact, it's near enough
+ */
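+
+/* The identity behind the split:
+ *   (XHi + XLo)^2 = XHi^2 + XLo*(2*XHi + XLo) = XHi^2 + XLo*(X + XHi)
+ */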
+        movaps    %xmm10, %xmm3
+        subps     %xmm10, %xmm2
+        mulps     %xmm10, %xmm3
+        addps     %xmm8, %xmm10
+
+/* Load the constant 1 and a sign mask */
+        movups    sOne+__svml_sasinh_data_internal(%rip), %xmm7
+
+/*
+ * Finally, express Y + W = X^2 + 1 accurately where Y has <= 8 bits.
+ * If |X| <= 1 then |XHi| <= 1 and so |X2Hi| <= 1, so we can treat 1
+ * as the dominant component in the compensated summation. Otherwise,
+ * if |X| >= 1, then since X2Hi only has 22 significant bits, the basic
+ * addition will be exact anyway until we get to |X| >= 2^24. But by
+ * that time the log function is well-conditioned enough that the
+ * rounding error doesn't matter. Hence we can treat 1 as dominant even
+ * if it literally isn't.
+ */
+        movaps    %xmm7, %xmm11
+        movaps    %xmm7, %xmm4
+        movups    sTopMask8+__svml_sasinh_data_internal(%rip), %xmm12
+        addps     %xmm3, %xmm11
+        mulps     %xmm10, %xmm2
+        subps     %xmm11, %xmm4
+        movaps    %xmm12, %xmm0
+        addps     %xmm3, %xmm4
+
+/*
+ * Unfortunately, we can still be in trouble if |X| <= 2^-5, since
+ * the absolute error 2^-(7+24)-ish in sqrt(1 + X^2) gets scaled up
+ * by 1/X and comes close to our threshold. Hence if |X| <= 2^-4,
+ * perform an alternative computation
+ * sqrt(1 + X^2) - 1 = X^2/2 - X^4/8 + X^6/16
+ * X2 = X^2
+ */
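+
+/* (The cutoff series above is the Taylor expansion
+ * sqrt(1 + t) = 1 + t/2 - t^2/8 + t^3/16 - ... with t = X^2,
+ * truncated after three terms.)
+ */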
+        addps     %xmm2, %xmm3
+        addps     %xmm2, %xmm4
+        andps     %xmm11, %xmm0
+
+/*
+ * Compute R = 1/sqrt(Y + W) * (1 + d)
+ * Force R to <= 8 significant bits.
+ * This means that R * Y and R^2 * Y are exactly representable.
+ */
+        rsqrtps   %xmm0, %xmm14
+        subps     %xmm0, %xmm11
+        andps     %xmm12, %xmm14
+        addps     %xmm11, %xmm4
+
+/*
+ * Compute S = (Y/sqrt(Y + W)) * (1 + d)
+ * and T = (W/sqrt(Y + W)) * (1 + d)
+ * so that S + T = sqrt(Y + W) * (1 + d)
+ * S is exact, and the rounding error in T is OK.
+ */
+        mulps     %xmm14, %xmm0
+        mulps     %xmm14, %xmm4
+
+/*
+ * Get the absolute value of the input, since we will exploit antisymmetry
+ * and mostly assume X >= 0 in the core computation
+ */
+        movups    SgnMask+__svml_sasinh_data_internal(%rip), %xmm6
+
+/*
+ * Compute e = -(2 * d + d^2)
+ * The first FMR is exact, and the rounding error in the other is acceptable
+ * since d and e are ~ 2^-8
+ */
+        movaps    %xmm14, %xmm13
+        andps     %xmm8, %xmm6
+
+/*
+ * Obtain sqrt(1 + X^2) - 1 in two pieces
+ * sqrt(1 + X^2) - 1
+ * = sqrt(Y + W) - 1
+ * = (S + T) * (1 + Corr) - 1
+ * = [S - 1] + [T + (S + T) * Corr]
+ * We need a compensated summation for the last part. We treat S - 1
+ * as the larger part; it certainly is until about X < 2^-4, and in that
+ * case, the error is affordable since X dominates over sqrt(1 + X^2) - 1
+ * Final sum is dTmp5 (hi) + dTmp7 (lo)
+ */
+        movaps    %xmm0, %xmm1
+
+/*
+ * Check whether the input is finite, by checking |X| <= MaxFloat
+ * Otherwise set the rangemask so that the callout will get used.
+ * Note that this will also use the callout for NaNs since not(NaN <= MaxFloat)
+ */
+        movaps    %xmm6, %xmm9
+
+/*
+ * The following computation can go wrong for very large X, basically
+ * because X^2 overflows. But for large X we have
+ * asinh(X) / log(2 X) - 1 =~= 1/(4 * X^2), so for X >= 2^30
+ * we can just later stick X back into the log and tweak up the exponent.
+ * Actually we scale X by 2^-30 and tweak the exponent up by 31,
+ * to stay in the safe range for the later log computation.
+ * Compute a flag now telling us when to do this.
+ */
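+
+/* The estimate above follows from sqrt(X^2 + 1) ~= X + 1/(2X) for large X:
+ *   asinh(X) ~= log(2X + 1/(2X)) = log(2X) + log(1 + 1/(4X^2))
+ *            ~= log(2X) + 1/(4X^2)
+ */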
+        movaps    %xmm6, %xmm5
+        cmpnleps  sLargestFinite+__svml_sasinh_data_internal(%rip), %xmm9
+        cmpltps   sBigThreshold+__svml_sasinh_data_internal(%rip), %xmm5
+        mulps     %xmm0, %xmm13
+        addps     %xmm4, %xmm1
+        subps     %xmm7, %xmm0
+        mulps     %xmm4, %xmm14
+        movmskps  %xmm9, %edx
+        movaps    %xmm7, %xmm9
+
+/*
+ * Now       1 / (1 + d)
+ * = 1 / (1 + (sqrt(1 - e) - 1))
+ * = 1 / sqrt(1 - e)
+ * = 1 + 1/2 * e + 3/8 * e^2 + 5/16 * e^3 + 35/128 * e^4 + ...
+ * So compute the first three nonconstant terms of that, so that
+ * we have a relative correction (1 + Corr) to apply to S etc.
+ * C1 = 1/2
+ * C2 = 3/8
+ * C3 = 5/16
+ */
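+
+/* The step above uses e = -(2*d + d^2) computed earlier:
+ * (1 + d)^2 = 1 + 2*d + d^2 = 1 - e, so 1 + d = sqrt(1 - e) and the
+ * series for 1/(1 + d) is the binomial expansion of (1 - e)^(-1/2).
+ */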
+        movups    sC3+__svml_sasinh_data_internal(%rip), %xmm15
+        subps     %xmm13, %xmm9
+        movups    sHalf+__svml_sasinh_data_internal(%rip), %xmm10
+        subps     %xmm14, %xmm9
+
+/* sX2over2 = X^2/2 */
+        mulps     %xmm10, %xmm3
+        mulps     %xmm9, %xmm15
+
+/* sX46 = -X^4/4 + X^6/8 */
+        movaps    %xmm3, %xmm2
+        movaps    %xmm3, %xmm12
+
+/*
+ * Now do another compensated sum to add |X| + [sqrt(1 + X^2) - 1].
+ * It's always safe to assume |X| is larger.
+ * This is the final 2-part argument to the log1p function
+ */
+        movaps    %xmm6, %xmm14
+        addps     sC2+__svml_sasinh_data_internal(%rip), %xmm15
+        mulps     %xmm9, %xmm15
+        addps     %xmm10, %xmm15
+        mulps     %xmm15, %xmm9
+        mulps     %xmm1, %xmm9
+
+/* Now multiplex to the case X = 2^-30 * input, Xl = sL = 0 in the "big" case. */
+        movups    XScale+__svml_sasinh_data_internal(%rip), %xmm15
+        addps     %xmm9, %xmm4
+        movaps    %xmm4, %xmm11
+        addps     %xmm0, %xmm11
+        subps     %xmm11, %xmm0
+        addps     %xmm0, %xmm4
+
+/* sX4over4 = X^4/4 */
+        movaps    %xmm3, %xmm0
+        mulps     %xmm3, %xmm0
+        mulps     %xmm0, %xmm2
+        subps     %xmm0, %xmm2
+
+/*
+ * Now we feed into the log1p code, using H in place of _VARG1 and
+ * also adding L into Xl.
+ * compute 1+x as high, low parts
+ */
+        movaps    %xmm7, %xmm0
+
+/* sX46over2 = -X^4/8 + x^6/16 */
+        mulps     %xmm2, %xmm10
+        movaps    %xmm7, %xmm2
+        addps     %xmm10, %xmm12
+        subps     %xmm12, %xmm3
+        addps     %xmm3, %xmm10
+
+/* Now multiplex the two possible computations */
+        movaps    %xmm6, %xmm3
+        cmpleps   sLittleThreshold+__svml_sasinh_data_internal(%rip), %xmm3
+        movaps    %xmm3, %xmm13
+        andps     %xmm3, %xmm12
+        andnps    %xmm11, %xmm13
+        movaps    %xmm3, %xmm1
+        orps      %xmm12, %xmm13
+        andnps    %xmm4, %xmm1
+        andps     %xmm3, %xmm10
+        movaps    %xmm6, %xmm4
+        orps      %xmm10, %xmm1
+        addps     %xmm13, %xmm14
+        mulps     %xmm15, %xmm6
+        maxps     %xmm14, %xmm0
+        minps     %xmm14, %xmm2
+        subps     %xmm14, %xmm4
+        movaps    %xmm0, %xmm3
+        addps     %xmm4, %xmm13
+        addps     %xmm2, %xmm3
+        addps     %xmm13, %xmm1
+        subps     %xmm3, %xmm0
+        movaps    %xmm5, %xmm4
+        andps     %xmm5, %xmm3
+        andnps    %xmm6, %xmm4
+        addps     %xmm0, %xmm2
+
+/*
+ * Now resume the main code.
+ * reduction: compute r,n
+ */
+        movdqu    iBrkValue+__svml_sasinh_data_internal(%rip), %xmm6
+        orps      %xmm3, %xmm4
+        psubd     %xmm6, %xmm4
+        movaps    %xmm7, %xmm0
+        addps     %xmm2, %xmm1
+        movdqu    iOffExpoMask+__svml_sasinh_data_internal(%rip), %xmm2
+        pand      %xmm4, %xmm2
+        psrad     $23, %xmm4
+        cvtdq2ps  %xmm4, %xmm3
+        pslld     $23, %xmm4
+        andps     %xmm5, %xmm1
+        paddd     %xmm6, %xmm2
+        psubd     %xmm4, %xmm0
+        mulps     %xmm0, %xmm1
+
+/* polynomial evaluation */
+        subps     %xmm7, %xmm2
+        movups    sPoly+112+__svml_sasinh_data_internal(%rip), %xmm7
+        addps     %xmm2, %xmm1
+        mulps     %xmm1, %xmm7
+        movaps    %xmm5, %xmm2
+
+/* Add 31 to the exponent in the "large" case to get log(2 * input) */
+        movups    sThirtyOne+__svml_sasinh_data_internal(%rip), %xmm0
+        addps     sPoly+96+__svml_sasinh_data_internal(%rip), %xmm7
+        addps     %xmm3, %xmm0
+        mulps     %xmm1, %xmm7
+        andnps    %xmm0, %xmm2
+        andps     %xmm5, %xmm3
+        orps      %xmm3, %xmm2
+        addps     sPoly+80+__svml_sasinh_data_internal(%rip), %xmm7
+
+/* final reconstruction */
+        mulps     sLn2+__svml_sasinh_data_internal(%rip), %xmm2
+        mulps     %xmm1, %xmm7
+
+/* Finally, reincorporate the original sign. */
+        movups    sSign+__svml_sasinh_data_internal(%rip), %xmm0
+        andps     %xmm8, %xmm0
+        addps     sPoly+64+__svml_sasinh_data_internal(%rip), %xmm7
+        mulps     %xmm1, %xmm7
+        addps     sPoly+48+__svml_sasinh_data_internal(%rip), %xmm7
+        mulps     %xmm1, %xmm7
+        addps     sPoly+32+__svml_sasinh_data_internal(%rip), %xmm7
+        mulps     %xmm1, %xmm7
+        addps     sPoly+16+__svml_sasinh_data_internal(%rip), %xmm7
+        mulps     %xmm1, %xmm7
+        addps     sPoly+__svml_sasinh_data_internal(%rip), %xmm7
+        mulps     %xmm1, %xmm7
+        mulps     %xmm1, %xmm7
+        addps     %xmm7, %xmm1
+        addps     %xmm2, %xmm1
+        pxor      %xmm1, %xmm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm8
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        addq      $72, %rsp
+        cfi_def_cfa_offset(8)
+        ret
+        cfi_def_cfa_offset(80)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        movups    %xmm8, 32(%rsp)
+        movups    %xmm0, 48(%rsp)
+                                # LOE rbx rbp r12 r13 r14 r15 edx
+
+        xorl      %eax, %eax
+        movq      %r12, 16(%rsp)
+        cfi_offset(12, -64)
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        cfi_offset(13, -72)
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx rbp r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $4, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx rbp r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        movups    48(%rsp), %xmm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        cfi_offset(12, -64)
+        cfi_offset(13, -72)
+        cfi_offset(14, -80)
+                                # LOE rbx rbp r12 r13 r14 r15 xmm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      asinhf@PLT
+                                # LOE rbx rbp r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 48(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx rbp r15 r12d r13d
+END(_ZGVbN4v_asinhf_sse4)
+
+        .section .rodata, "a"
+        .align 16
+
+#ifdef __svml_sasinh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(16)) VUINT32 SgnMask[4][1];
+        __declspec(align(16)) VUINT32 sOne[4][1];
+        __declspec(align(16)) VUINT32 sPoly[8][4][1];
+        __declspec(align(16)) VUINT32 iBrkValue[4][1];
+        __declspec(align(16)) VUINT32 iOffExpoMask[4][1];
+        __declspec(align(16)) VUINT32 sBigThreshold[4][1];
+        __declspec(align(16)) VUINT32 sC2[4][1];
+        __declspec(align(16)) VUINT32 sC3[4][1];
+        __declspec(align(16)) VUINT32 sHalf[4][1];
+        __declspec(align(16)) VUINT32 sLargestFinite[4][1];
+        __declspec(align(16)) VUINT32 sLittleThreshold[4][1];
+        __declspec(align(16)) VUINT32 sSign[4][1];
+        __declspec(align(16)) VUINT32 sThirtyOne[4][1];
+        __declspec(align(16)) VUINT32 sTopMask11[4][1];
+        __declspec(align(16)) VUINT32 sTopMask8[4][1];
+        __declspec(align(16)) VUINT32 XScale[4][1];
+        __declspec(align(16)) VUINT32 sLn2[4][1];
+} __svml_sasinh_data_internal;
+#endif
+__svml_sasinh_data_internal:
+        /*== SgnMask ==*/
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
+        /*== sOne = SP 1.0 ==*/
+        .align 16
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== sPoly[] = SP polynomial ==*/
+        .align 16
+        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
+        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
+        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
+        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
+        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
+        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
+        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
+        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
+        /*== iBrkValue = SP 2/3 ==*/
+        .align 16
+        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
+        /*== iOffExpoMask = SP significand mask ==*/
+        .align 16
+        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
+        /*== sBigThreshold ==*/
+        .align 16
+        .long 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000
+        /*== sC2 ==*/
+        .align 16
+        .long 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000
+        /*== sC3 ==*/
+        .align 16
+        .long 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000
+        /*== sHalf ==*/
+        .align 16
+        .long 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000
+        /*== sLargestFinite ==*/
+        .align 16
+        .long 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF
+        /*== sLittleThreshold ==*/
+        .align 16
+        .long 0x3D800000, 0x3D800000, 0x3D800000, 0x3D800000
+        /*== sSign ==*/
+        .align 16
+        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000
+        /*== sThirtyOne ==*/
+        .align 16
+        .long 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000
+        /*== sTopMask11 ==*/
+        .align 16
+        .long 0xFFFFE000, 0xFFFFE000, 0xFFFFE000, 0xFFFFE000
+        /*== sTopMask8 ==*/
+        .align 16
+        .long 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000
+        /*== XScale ==*/
+        .align 16
+        .long 0x30800000, 0x30800000, 0x30800000, 0x30800000
+        /*== sLn2 = SP ln(2) ==*/
+        .align 16
+        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
+        .align 16
+        .type	__svml_sasinh_data_internal,@object
+        .size	__svml_sasinh_data_internal,.-__svml_sasinh_data_internal
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core-sse.S
new file mode 100644
index 0000000000..1a0e113e94
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core-sse.S
@@ -0,0 +1,20 @@
+/* SSE version of vectorized asinhf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define _ZGVdN8v_asinhf _ZGVdN8v_asinhf_sse_wrapper
+#include "../svml_s_asinhf8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core.c
new file mode 100644
index 0000000000..d97097a394
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core.c
@@ -0,0 +1,28 @@
+/* Multiple versions of vectorized asinhf, vector length is 8.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define SYMBOL_NAME _ZGVdN8v_asinhf
+#include "ifunc-mathvec-avx2.h"
+
+libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
+
+#ifdef SHARED
+__hidden_ver1 (_ZGVdN8v_asinhf, __GI__ZGVdN8v_asinhf,
+	       __redirect__ZGVdN8v_asinhf)
+  __attribute__ ((visibility ("hidden")));
+#endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core_avx2.S
new file mode 100644
index 0000000000..a966f53773
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core_avx2.S
@@ -0,0 +1,457 @@
+/* Function asinhf vectorized with AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   https://www.gnu.org/licenses/.  */
+
+/*
+ * ALGORITHM DESCRIPTION:
+ *
+ *   Compute asinh(x) as log(x + sqrt(x*x + 1))
+ *
+ *   Special cases:
+ *
+ *   asinh(NaN) = quiet NaN, and raise invalid exception
+ *   asinh(INF) = that INF
+ *   asinh(0)   = that 0
+ *
+ */
+
+/* Offsets for data table __svml_sasinh_data_internal
+ */
+#define SgnMask                       	0
+#define sOne                          	32
+#define sPoly                         	64
+#define iBrkValue                     	320
+#define iOffExpoMask                  	352
+#define sBigThreshold                 	384
+#define sC2                           	416
+#define sC3                           	448
+#define sHalf                         	480
+#define sLargestFinite                	512
+#define sLittleThreshold              	544
+#define sSign                         	576
+#define sThirtyOne                    	608
+#define sTopMask8                     	640
+#define XScale                        	672
+#define sLn2                          	704
+
+#include <sysdep.h>
+
+        .text
+	.section .text.avx2,"ax",@progbits
+ENTRY(_ZGVdN8v_asinhf_avx2)
+        pushq     %rbp
+        cfi_def_cfa_offset(16)
+        movq      %rsp, %rbp
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+        andq      $-32, %rsp
+        subq      $96, %rsp
+        vmovaps   %ymm0, %ymm9
+
+/* Load the constant 1 and a sign mask */
+        vmovups   sOne+__svml_sasinh_data_internal(%rip), %ymm8
+
+/* No need to split X when FMA is available in hardware. */
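+/* (The low part of x*x is instead obtained in one step further down via
+ *  vfmsub213ps, i.e. x*x - round(x*x).) */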
+        vmulps    %ymm9, %ymm9, %ymm5
+        vmovups   sTopMask8+__svml_sasinh_data_internal(%rip), %ymm1
+
+/*
+ * Finally, express Y + W = X^2 + 1 accurately where Y has <= 8 bits.
+ * If |X| <= 1 then |XHi| <= 1 and so |X2Hi| <= 1, so we can treat 1
+ * as the dominant component in the compensated summation. Otherwise,
+ * if |X| >= 1, then since X2Hi only has 22 significant bits, the basic
+ * addition will be exact anyway until we get to |X| >= 2^24. But by
+ * that time the log function is well-conditioned enough that the
+ * rounding error doesn't matter. Hence we can treat 1 as dominant even
+ * if it literally isn't.
+ */
+        vaddps    %ymm5, %ymm8, %ymm13
+        vandps    %ymm1, %ymm13, %ymm2
+        vmovaps   %ymm9, %ymm4
+        vsubps    %ymm13, %ymm8, %ymm11
+        vsubps    %ymm2, %ymm13, %ymm15
+
+/*
+ * Compute R = 1/sqrt(Y + W) * (1 + d)
+ * Force R to <= 8 significant bits.
+ * This means that R * Y and R^2 * Y are exactly representable.
+ */
+        vrsqrtps  %ymm2, %ymm0
+        vfmsub213ps %ymm5, %ymm9, %ymm4
+        vaddps    %ymm11, %ymm5, %ymm12
+
+/*
+ * Get the absolute value of the input, since we will exploit antisymmetry
+ * and mostly assume X >= 0 in the core computation
+ */
+        vandps    SgnMask+__svml_sasinh_data_internal(%rip), %ymm9, %ymm6
+
+/*
+ * Check whether the input is finite, by checking |X| <= MaxFloat
+ * Otherwise set the rangemask so that the callout will get used.
+ * Note that this will also use the callout for NaNs since not(NaN <= MaxFloat)
+ */
+        vcmpnle_uqps sLargestFinite+__svml_sasinh_data_internal(%rip), %ymm6, %ymm10
+        vaddps    %ymm12, %ymm4, %ymm14
+
+/*
+ * Unfortunately, we can still be in trouble if |X| <= 2^-5, since
+ * the absolute error 2^-(7+24)-ish in sqrt(1 + X^2) gets scaled up
+ * by 1/X and comes close to our threshold. Hence if |X| <= 2^-4,
+ * perform an alternative computation
+ * sqrt(1 + X^2) - 1 = X^2/2 - X^4/8 + X^6/16
+ * X2 = X^2
+ */
+        vaddps    %ymm4, %ymm5, %ymm4
+
+/*
+ * The following computation can go wrong for very large X, basically
+ * because X^2 overflows. But for large X we have
+ * asinh(X) / log(2 X) - 1 =~= 1/(4 * X^2), so for X >= 2^30
+ * we can just later stick X back into the log and tweak up the exponent.
+ * Actually we scale X by 2^-30 and tweak the exponent up by 31,
+ * to stay in the safe range for the later log computation.
+ * Compute a flag now telling us when to do this.
+ */
+        vcmplt_oqps sBigThreshold+__svml_sasinh_data_internal(%rip), %ymm6, %ymm7
+        vaddps    %ymm15, %ymm14, %ymm3
+
+/*
+ * Now       1 / (1 + d)
+ * = 1 / (1 + (sqrt(1 - e) - 1))
+ * = 1 / sqrt(1 - e)
+ * = 1 + 1/2 * e + 3/8 * e^2 + 5/16 * e^3 + 35/128 * e^4 + ...
+ * So compute the first three nonconstant terms of that, so that
+ * we have a relative correction (1 + Corr) to apply to S etc.
+ * C1 = 1/2
+ * C2 = 3/8
+ * C3 = 5/16
+ */
+        vmovups   sC3+__svml_sasinh_data_internal(%rip), %ymm12
+        vmovmskps %ymm10, %edx
+        vandps    %ymm1, %ymm0, %ymm10
+
+/*
+ * Compute S = (Y/sqrt(Y + W)) * (1 + d)
+ * and T = (W/sqrt(Y + W)) * (1 + d)
+ * so that S + T = sqrt(Y + W) * (1 + d)
+ * S is exact, and the rounding error in T is OK.
+ */
+        vmulps    %ymm10, %ymm2, %ymm15
+        vmulps    %ymm3, %ymm10, %ymm14
+        vmovups   sHalf+__svml_sasinh_data_internal(%rip), %ymm3
+        vsubps    %ymm8, %ymm15, %ymm0
+
+/*
+ * Obtain sqrt(1 + X^2) - 1 in two pieces
+ * sqrt(1 + X^2) - 1
+ * = sqrt(Y + W) - 1
+ * = (S + T) * (1 + Corr) - 1
+ * = [S - 1] + [T + (S + T) * Corr]
+ * We need a compensated summation for the last part. We treat S - 1
+ * as the larger part; it certainly is until about X < 2^-4, and in that
+ * case, the error is affordable since X dominates over sqrt(1 + X^2) - 1
+ * Final sum is dTmp5 (hi) + dTmp7 (lo)
+ */
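
(Editorial sketch, not part of the patch: the two-piece reconstruction just described, with a simple compensated sum of the S - 1 and T + (S + T) * Corr pieces.  Invented names; fmaf stands in for the vector FMA.)

  #include <math.h>

  static void sqrt1px2_m1_pieces (float s, float t, float corr,
                                  float *hi, float *lo)
  {
    float big = s - 1.0f;                  /* dominant piece */
    float small = fmaf (s + t, corr, t);   /* T + (S + T) * Corr */
    float sum = big + small;
    *hi = sum;
    *lo = (big - sum) + small;             /* error of the final addition */
  }
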
+        vaddps    %ymm14, %ymm15, %ymm13
+
+/*
+ * Compute e = -(2 * d + d^2)
+ * The first FMR is exact, and the rounding error in the other is acceptable
+ * since d and e are ~ 2^-8
+ */
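
(Editorial sketch, not part of the patch: e = -(2 * d + d^2) obtained from the two negated FMAs below, where R ~ (1 + d)/sqrt(Y + W) and S = R * Y, T = R * W are already formed.)

  #include <math.h>

  static float compute_e (float r, float s, float t)
  {
    float e = fmaf (-r, s, 1.0f);   /* 1 - R*S; the comment above notes this is exact */
    return fmaf (-r, t, e);         /* - R*T; its rounding error is acceptable */
  }
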
+        vmovaps   %ymm8, %ymm11
+        vfnmadd231ps %ymm15, %ymm10, %ymm11
+        vfnmadd231ps %ymm14, %ymm10, %ymm11
+        vfmadd213ps sC2+__svml_sasinh_data_internal(%rip), %ymm11, %ymm12
+        vfmadd213ps %ymm3, %ymm11, %ymm12
+        vmulps    %ymm12, %ymm11, %ymm1
+
+/* Now multiplex the two possible computations */
+        vcmple_oqps sLittleThreshold+__svml_sasinh_data_internal(%rip), %ymm6, %ymm11
+        vfmadd213ps %ymm14, %ymm13, %ymm1
+        vaddps    %ymm0, %ymm1, %ymm2
+        vsubps    %ymm2, %ymm0, %ymm10
+
+/* sX2over2 = X^2/2 */
+        vmulps    %ymm4, %ymm3, %ymm0
+        vaddps    %ymm10, %ymm1, %ymm1
+
+/* sX4over4 = X^4/4 */
+        vmulps    %ymm0, %ymm0, %ymm5
+
+/* sX46 = -X^4/4 + X^6/8 */
+        vfmsub231ps %ymm0, %ymm5, %ymm5
+
+/* sX46over2 = -X^4/8 + X^6/16 */
+        vmulps    %ymm5, %ymm3, %ymm3
+        vaddps    %ymm3, %ymm0, %ymm5
+        vblendvps %ymm11, %ymm5, %ymm2, %ymm2
+        vsubps    %ymm5, %ymm0, %ymm4
+
+/*
+ * Now do another compensated sum to add |X| + [sqrt(1 + X^2) - 1].
+ * It's always safe to assume |X| is larger.
+ * This is the final 2-part argument to the log1p function
+ */
+        vaddps    %ymm2, %ymm6, %ymm14
+
+/*
+ * Now resume the main code.
+ * reduction: compute r,n
+ */
+        vmovups   iBrkValue+__svml_sasinh_data_internal(%rip), %ymm5
+        vaddps    %ymm4, %ymm3, %ymm10
+
+/*
+ * Now we feed into the log1p code, using H in place of _VARG1 and
+ * also adding L into Xl.
+ * compute 1+x as high, low parts
+ */
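
(Editorial sketch, not part of the patch: forming the high/low argument for the log1p stage by adding the larger addend first, which is what the vmaxps/vminps pair below arranges; both addends are nonnegative here.)

  #include <math.h>

  static void twosum_ordered (float a, float b, float *hi, float *lo)
  {
    float big = fmaxf (a, b);
    float small = fminf (a, b);
    float sum = big + small;
    *hi = sum;
    *lo = (big - sum) + small;   /* rounding error of the addition */
  }
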
+        vmaxps    %ymm14, %ymm8, %ymm15
+        vminps    %ymm14, %ymm8, %ymm0
+        vblendvps %ymm11, %ymm10, %ymm1, %ymm12
+        vsubps    %ymm14, %ymm6, %ymm1
+        vaddps    %ymm0, %ymm15, %ymm3
+
+/* Now multiplex to the case X = 2^-30 * input, Xl = sL = 0 in the "big" case. */
+        vmulps    XScale+__svml_sasinh_data_internal(%rip), %ymm6, %ymm6
+        vaddps    %ymm1, %ymm2, %ymm13
+        vsubps    %ymm3, %ymm15, %ymm15
+        vaddps    %ymm13, %ymm12, %ymm1
+        vaddps    %ymm15, %ymm0, %ymm2
+        vblendvps %ymm7, %ymm3, %ymm6, %ymm0
+        vaddps    %ymm2, %ymm1, %ymm4
+        vpsubd    %ymm5, %ymm0, %ymm1
+        vpsrad    $23, %ymm1, %ymm6
+        vpand     iOffExpoMask+__svml_sasinh_data_internal(%rip), %ymm1, %ymm2
+        vmovups   sPoly+224+__svml_sasinh_data_internal(%rip), %ymm1
+        vpslld    $23, %ymm6, %ymm10
+        vpaddd    %ymm5, %ymm2, %ymm13
+        vcvtdq2ps %ymm6, %ymm0
+        vpsubd    %ymm10, %ymm8, %ymm12
+
+/* polynomial evaluation */
+        vsubps    %ymm8, %ymm13, %ymm8
+
+/* Add 31 to the exponent in the "large" case to get log(2 * input) */
+        vaddps    sThirtyOne+__svml_sasinh_data_internal(%rip), %ymm0, %ymm3
+        vandps    %ymm7, %ymm4, %ymm11
+        vmulps    %ymm12, %ymm11, %ymm14
+        vblendvps %ymm7, %ymm0, %ymm3, %ymm0
+        vaddps    %ymm8, %ymm14, %ymm2
+        vfmadd213ps sPoly+192+__svml_sasinh_data_internal(%rip), %ymm2, %ymm1
+        vfmadd213ps sPoly+160+__svml_sasinh_data_internal(%rip), %ymm2, %ymm1
+        vfmadd213ps sPoly+128+__svml_sasinh_data_internal(%rip), %ymm2, %ymm1
+        vfmadd213ps sPoly+96+__svml_sasinh_data_internal(%rip), %ymm2, %ymm1
+        vfmadd213ps sPoly+64+__svml_sasinh_data_internal(%rip), %ymm2, %ymm1
+        vfmadd213ps sPoly+32+__svml_sasinh_data_internal(%rip), %ymm2, %ymm1
+        vfmadd213ps sPoly+__svml_sasinh_data_internal(%rip), %ymm2, %ymm1
+        vmulps    %ymm1, %ymm2, %ymm4
+        vfmadd213ps %ymm2, %ymm2, %ymm4
+
+/* final reconstruction */
+        vfmadd132ps sLn2+__svml_sasinh_data_internal(%rip), %ymm4, %ymm0
+
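
(Editorial sketch, not part of the patch: the polynomial evaluation and final reconstruction above in plain C, assuming r is the reduced argument, n the scaled exponent and p[] the eight sPoly coefficients from the data table.)

  #include <math.h>

  static float log1p_reconstruct (float r, float n, const float p[8])
  {
    float poly = p[7];
    for (int i = 6; i >= 0; i--)
      poly = fmaf (poly, r, p[i]);       /* Horner over P0..P7 */
    float t = fmaf (poly * r, r, r);     /* r + r^2 * P(r) */
    return fmaf (n, 0.693147182f, t);    /* n * ln(2) + ...  (sLn2) */
  }
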
+/* Finally, reincorporate the original sign. */
+        vandps    sSign+__svml_sasinh_data_internal(%rip), %ymm9, %ymm7
+        vxorps    %ymm0, %ymm7, %ymm0
+        testl     %edx, %edx
+
+/* Go to special inputs processing branch */
+        jne       L(SPECIAL_VALUES_BRANCH)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm9
+
+/* Restore registers
+ * and exit the function
+ */
+
+L(EXIT):
+        movq      %rbp, %rsp
+        popq      %rbp
+        cfi_def_cfa(7, 8)
+        cfi_restore(6)
+        ret
+        cfi_def_cfa(6, 16)
+        cfi_offset(6, -16)
+
+/* Branch to process
+ * special inputs
+ */
+
+L(SPECIAL_VALUES_BRANCH):
+        vmovups   %ymm9, 32(%rsp)
+        vmovups   %ymm0, 64(%rsp)
+                                # LOE rbx r12 r13 r14 r15 edx ymm0
+
+        xorl      %eax, %eax
+                                # LOE rbx r12 r13 r14 r15 eax edx
+
+        vzeroupper
+        movq      %r12, 16(%rsp)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        movl      %eax, %r12d
+        movq      %r13, 8(%rsp)
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        movl      %edx, %r13d
+        movq      %r14, (%rsp)
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r15 r12d r13d
+
+/* Range mask
+ * bits check
+ */
+
+L(RANGEMASK_CHECK):
+        btl       %r12d, %r13d
+
+/* Call scalar math function */
+        jc        L(SCALAR_MATH_CALL)
+                                # LOE rbx r15 r12d r13d
+
+/* Special inputs
+ * processing loop
+ */
+
+L(SPECIAL_VALUES_LOOP):
+        incl      %r12d
+        cmpl      $8, %r12d
+
+/* Check bits in range mask */
+        jl        L(RANGEMASK_CHECK)
+                                # LOE rbx r15 r12d r13d
+
+        movq      16(%rsp), %r12
+        cfi_restore(12)
+        movq      8(%rsp), %r13
+        cfi_restore(13)
+        movq      (%rsp), %r14
+        cfi_restore(14)
+        vmovups   64(%rsp), %ymm0
+
+/* Go to exit */
+        jmp       L(EXIT)
+        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
+        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
+        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
+                                # LOE rbx r12 r13 r14 r15 ymm0
+
+/* Scalar math function call
+ * to process special input
+ */
+
+L(SCALAR_MATH_CALL):
+        movl      %r12d, %r14d
+        movss     32(%rsp,%r14,4), %xmm0
+        call      asinhf@PLT
+                                # LOE rbx r14 r15 r12d r13d xmm0
+
+        movss     %xmm0, 64(%rsp,%r14,4)
+
+/* Process special inputs in loop */
+        jmp       L(SPECIAL_VALUES_LOOP)
+                                # LOE rbx r15 r12d r13d
+END(_ZGVdN8v_asinhf_avx2)
+
+        .section .rodata, "a"
+        .align 32
+
+#ifdef __svml_sasinh_data_internal_typedef
+typedef unsigned int VUINT32;
+typedef struct {
+        __declspec(align(32)) VUINT32 SgnMask[8][1];
+        __declspec(align(32)) VUINT32 sOne[8][1];
+        __declspec(align(32)) VUINT32 sPoly[8][8][1];
+        __declspec(align(32)) VUINT32 iBrkValue[8][1];
+        __declspec(align(32)) VUINT32 iOffExpoMask[8][1];
+        __declspec(align(32)) VUINT32 sBigThreshold[8][1];
+        __declspec(align(32)) VUINT32 sC2[8][1];
+        __declspec(align(32)) VUINT32 sC3[8][1];
+        __declspec(align(32)) VUINT32 sHalf[8][1];
+        __declspec(align(32)) VUINT32 sLargestFinite[8][1];
+        __declspec(align(32)) VUINT32 sLittleThreshold[8][1];
+        __declspec(align(32)) VUINT32 sSign[8][1];
+        __declspec(align(32)) VUINT32 sThirtyOne[8][1];
+        __declspec(align(32)) VUINT32 sTopMask8[8][1];
+        __declspec(align(32)) VUINT32 XScale[8][1];
+        __declspec(align(32)) VUINT32 sLn2[8][1];
+} __svml_sasinh_data_internal;
+#endif
+__svml_sasinh_data_internal:
+        /*== SgnMask ==*/
+        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
+        /*== sOne = SP 1.0 ==*/
+        .align 32
+        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
+        /*== sPoly[] = SP polynomial ==*/
+        .align 32
+        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
+        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
+        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
+        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
+        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
+        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
+        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
+        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
+        /*== iBrkValue = SP 2/3 ==*/
+        .align 32
+        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
+        /*== iOffExpoMask = SP significand mask ==*/
+        .align 32
+        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
+        /*== sBigThreshold ==*/
+        .align 32
+        .long 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000
+        /*== sC2 ==*/
+        .align 32
+        .long 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000
+        /*== sC3 ==*/
+        .align 32
+        .long 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000
+        /*== sHalf ==*/
+        .align 32
+        .long 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000
+        /*== sLargestFinite ==*/
+        .align 32
+        .long 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF
+        /*== sLittleThreshold ==*/
+        .align 32
+        .long 0x3D800000, 0x3D800000, 0x3D800000, 0x3D800000, 0x3D800000, 0x3D800000, 0x3D800000, 0x3D800000
+        /*== sSign ==*/
+        .align 32
+        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000
+        /*== sThirtyOne ==*/
+        .align 32
+        .long 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000
+        /*== sTopMask8 ==*/
+        .align 32
+        .long 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000
+        /*== XScale ==*/
+        .align 32
+        .long 0x30800000, 0x30800000, 0x30800000, 0x30800000, 0x30800000, 0x30800000, 0x30800000, 0x30800000
+        /*== sLn2 = SP ln(2) ==*/
+        .align 32
+        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
+        .align 32
+        .type	__svml_sasinh_data_internal,@object
+        .size	__svml_sasinh_data_internal,.-__svml_sasinh_data_internal
diff --git a/sysdeps/x86_64/fpu/svml_d_asinh2_core.S b/sysdeps/x86_64/fpu/svml_d_asinh2_core.S
new file mode 100644
index 0000000000..60e372238a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_asinh2_core.S
@@ -0,0 +1,29 @@
+/* Function asinh vectorized with SSE2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN2v_asinh)
+WRAPPER_IMPL_SSE2 asinh
+END (_ZGVbN2v_asinh)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN2v_asinh)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_asinh4_core.S b/sysdeps/x86_64/fpu/svml_d_asinh4_core.S
new file mode 100644
index 0000000000..c7350011e1
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_asinh4_core.S
@@ -0,0 +1,29 @@
+/* Function asinh vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN4v_asinh)
+WRAPPER_IMPL_AVX _ZGVbN2v_asinh
+END (_ZGVdN4v_asinh)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN4v_asinh)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_d_asinh4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_asinh4_core_avx.S
new file mode 100644
index 0000000000..83aaa8c3f1
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_asinh4_core_avx.S
@@ -0,0 +1,25 @@
+/* Function asinh vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVcN4v_asinh)
+WRAPPER_IMPL_AVX _ZGVbN2v_asinh
+END (_ZGVcN4v_asinh)
diff --git a/sysdeps/x86_64/fpu/svml_d_asinh8_core.S b/sysdeps/x86_64/fpu/svml_d_asinh8_core.S
new file mode 100644
index 0000000000..9597975ff6
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_asinh8_core.S
@@ -0,0 +1,25 @@
+/* Function asinh vectorized with AVX-512, wrapper to AVX2.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_d_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN8v_asinh)
+WRAPPER_IMPL_AVX512 _ZGVdN4v_asinh
+END (_ZGVeN8v_asinh)
diff --git a/sysdeps/x86_64/fpu/svml_s_asinhf16_core.S b/sysdeps/x86_64/fpu/svml_s_asinhf16_core.S
new file mode 100644
index 0000000000..5b3d405f2e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_asinhf16_core.S
@@ -0,0 +1,25 @@
+/* Function asinhf vectorized with AVX-512. Wrapper to AVX2 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVeN16v_asinhf)
+WRAPPER_IMPL_AVX512 _ZGVdN8v_asinhf
+END (_ZGVeN16v_asinhf)
diff --git a/sysdeps/x86_64/fpu/svml_s_asinhf4_core.S b/sysdeps/x86_64/fpu/svml_s_asinhf4_core.S
new file mode 100644
index 0000000000..af44fa5108
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_asinhf4_core.S
@@ -0,0 +1,29 @@
+/* Function asinhf vectorized with SSE2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVbN4v_asinhf)
+WRAPPER_IMPL_SSE2 asinhf
+END (_ZGVbN4v_asinhf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4v_asinhf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_asinhf8_core.S b/sysdeps/x86_64/fpu/svml_s_asinhf8_core.S
new file mode 100644
index 0000000000..3bd06d8032
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_asinhf8_core.S
@@ -0,0 +1,29 @@
+/* Function asinhf vectorized with AVX2, wrapper version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+	.text
+ENTRY (_ZGVdN8v_asinhf)
+WRAPPER_IMPL_AVX _ZGVbN4v_asinhf
+END (_ZGVdN8v_asinhf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8v_asinhf)
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_asinhf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_asinhf8_core_avx.S
new file mode 100644
index 0000000000..f79616c0bd
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_asinhf8_core_avx.S
@@ -0,0 +1,25 @@
+/* Function asinhf vectorized in AVX ISA as wrapper to SSE4 ISA version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
+
+        .text
+ENTRY (_ZGVcN8v_asinhf)
+WRAPPER_IMPL_AVX _ZGVbN4v_asinhf
+END (_ZGVcN8v_asinhf)
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx.c
new file mode 100644
index 0000000000..da03528700
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-asinh.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx2.c
new file mode 100644
index 0000000000..da03528700
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx2.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-asinh.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx512f.c
new file mode 100644
index 0000000000..da03528700
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx512f.c
@@ -0,0 +1 @@
+#include "test-double-libmvec-asinh.c"
diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-asinh.c b/sysdeps/x86_64/fpu/test-double-libmvec-asinh.c
new file mode 100644
index 0000000000..71e6b9f578
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-libmvec-asinh.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE double
+#define LIBMVEC_FUNC asinh
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
index f53bb6813e..76114772ba 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
@@ -45,6 +45,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVbN2v_atanh)
 VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVbN2v_acosh)
 VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVbN2v_erf)
 VECTOR_WRAPPER (WRAPPER_NAME (tanh), _ZGVbN2v_tanh)
+VECTOR_WRAPPER (WRAPPER_NAME (asinh), _ZGVbN2v_asinh)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
index 0452c3db38..1e0ee34975 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
@@ -48,6 +48,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVdN4v_atanh)
 VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVdN4v_acosh)
 VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVdN4v_erf)
 VECTOR_WRAPPER (WRAPPER_NAME (tanh), _ZGVdN4v_tanh)
+VECTOR_WRAPPER (WRAPPER_NAME (asinh), _ZGVdN4v_asinh)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m256i
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
index 197d5afc88..17c43a75d1 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
@@ -45,6 +45,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVcN4v_atanh)
 VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVcN4v_acosh)
 VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVcN4v_erf)
 VECTOR_WRAPPER (WRAPPER_NAME (tanh), _ZGVcN4v_tanh)
+VECTOR_WRAPPER (WRAPPER_NAME (asinh), _ZGVcN4v_asinh)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
index e56ece640c..1c6809e6e3 100644
--- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
@@ -45,6 +45,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVeN8v_atanh)
 VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVeN8v_acosh)
 VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVeN8v_erf)
 VECTOR_WRAPPER (WRAPPER_NAME (tanh), _ZGVeN8v_tanh)
+VECTOR_WRAPPER (WRAPPER_NAME (asinh), _ZGVeN8v_asinh)
 
 #ifndef __ILP32__
 # define VEC_INT_TYPE __m512i
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx.c
new file mode 100644
index 0000000000..77e1838bb4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-asinhf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx2.c
new file mode 100644
index 0000000000..77e1838bb4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx2.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-asinhf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx512f.c
new file mode 100644
index 0000000000..77e1838bb4
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx512f.c
@@ -0,0 +1 @@
+#include "test-float-libmvec-asinhf.c"
diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-asinhf.c b/sysdeps/x86_64/fpu/test-float-libmvec-asinhf.c
new file mode 100644
index 0000000000..3353754102
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-libmvec-asinhf.c
@@ -0,0 +1,3 @@
+#define LIBMVEC_TYPE float
+#define LIBMVEC_FUNC asinhf
+#include "test-vector-abi-arg1.h"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index abbebf9993..e8ab1885a7 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -45,6 +45,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVeN16v_atanhf)
 VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVeN16v_acoshf)
 VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVeN16v_erff)
 VECTOR_WRAPPER (WRAPPER_NAME (tanhf), _ZGVeN16v_tanhf)
+VECTOR_WRAPPER (WRAPPER_NAME (asinhf), _ZGVeN16v_asinhf)
 
 #define VEC_INT_TYPE __m512i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index ae1c8b98c2..a80c5387e4 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -45,6 +45,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVbN4v_atanhf)
 VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVbN4v_acoshf)
 VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVbN4v_erff)
 VECTOR_WRAPPER (WRAPPER_NAME (tanhf), _ZGVbN4v_tanhf)
+VECTOR_WRAPPER (WRAPPER_NAME (asinhf), _ZGVbN4v_asinhf)
 
 #define VEC_INT_TYPE __m128i
 
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index eb477a0371..c3d1d5936b 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -48,6 +48,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVdN8v_atanhf)
 VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVdN8v_acoshf)
 VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVdN8v_erff)
 VECTOR_WRAPPER (WRAPPER_NAME (tanhf), _ZGVdN8v_tanhf)
+VECTOR_WRAPPER (WRAPPER_NAME (asinhf), _ZGVdN8v_asinhf)
 
 /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
 #undef VECTOR_WRAPPER_fFF
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index 944f7f0a75..b7da0f523b 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -45,6 +45,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVcN8v_atanhf)
 VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVcN8v_acoshf)
 VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVcN8v_erff)
 VECTOR_WRAPPER (WRAPPER_NAME (tanhf), _ZGVcN8v_tanhf)
+VECTOR_WRAPPER (WRAPPER_NAME (asinhf), _ZGVcN8v_asinhf)
 
 #define VEC_INT_TYPE __m128i
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 03/18] x86-64: Add vector hypot/hypotf implementation to libmvec
  2021-12-29  6:39 ` [PATCH v5 03/18] x86-64: Add vector hypot/hypotf " Sunil K Pandey
@ 2021-12-29 21:24   ` H.J. Lu
  0 siblings, 0 replies; 40+ messages in thread
From: H.J. Lu @ 2021-12-29 21:24 UTC (permalink / raw)
  To: Sunil K Pandey; +Cc: libc-alpha, hjl.tools, andrey.kolesov, marius.cornea

On Tue, Dec 28, 2021 at 10:39:45PM -0800, Sunil K Pandey wrote:
> Implement vectorized hypot/hypotf containing SSE, AVX, AVX2 and
> AVX512 versions for libmvec as per vector ABI.  It also contains
> accuracy and ABI tests for vector hypot/hypotf with regenerated ulps.
> ---
>  bits/libm-simd-decl-stubs.h                   |  11 +
>  math/bits/mathcalls.h                         |   2 +-
>  .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
>  sysdeps/x86/fpu/bits/math-vector.h            |   4 +
>  .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
>  sysdeps/x86_64/fpu/Makeconfig                 |   1 +
>  sysdeps/x86_64/fpu/Versions                   |   2 +
>  sysdeps/x86_64/fpu/libm-test-ulps             |  20 ++
>  .../fpu/multiarch/svml_d_hypot2_core-sse2.S   |  20 ++
>  .../x86_64/fpu/multiarch/svml_d_hypot2_core.c |  28 ++
>  .../fpu/multiarch/svml_d_hypot2_core_sse4.S   | 279 +++++++++++++++++
>  .../fpu/multiarch/svml_d_hypot4_core-sse.S    |  20 ++
>  .../x86_64/fpu/multiarch/svml_d_hypot4_core.c |  28 ++
>  .../fpu/multiarch/svml_d_hypot4_core_avx2.S   | 289 ++++++++++++++++++
>  .../fpu/multiarch/svml_d_hypot8_core-avx2.S   |  20 ++
>  .../x86_64/fpu/multiarch/svml_d_hypot8_core.c |  28 ++
>  .../fpu/multiarch/svml_d_hypot8_core_avx512.S | 235 ++++++++++++++
>  .../fpu/multiarch/svml_s_hypotf16_core-avx2.S |  20 ++
>  .../fpu/multiarch/svml_s_hypotf16_core.c      |  28 ++
>  .../multiarch/svml_s_hypotf16_core_avx512.S   | 239 +++++++++++++++
>  .../fpu/multiarch/svml_s_hypotf4_core-sse2.S  |  20 ++
>  .../fpu/multiarch/svml_s_hypotf4_core.c       |  28 ++
>  .../fpu/multiarch/svml_s_hypotf4_core_sse4.S  | 265 ++++++++++++++++
>  .../fpu/multiarch/svml_s_hypotf8_core-sse.S   |  20 ++
>  .../fpu/multiarch/svml_s_hypotf8_core.c       |  28 ++
>  .../fpu/multiarch/svml_s_hypotf8_core_avx2.S  | 269 ++++++++++++++++
>  sysdeps/x86_64/fpu/svml_d_hypot2_core.S       |  29 ++
>  sysdeps/x86_64/fpu/svml_d_hypot4_core.S       |  29 ++
>  sysdeps/x86_64/fpu/svml_d_hypot4_core_avx.S   |  25 ++
>  sysdeps/x86_64/fpu/svml_d_hypot8_core.S       |  25 ++
>  sysdeps/x86_64/fpu/svml_s_hypotf16_core.S     |  25 ++
>  sysdeps/x86_64/fpu/svml_s_hypotf4_core.S      |  29 ++
>  sysdeps/x86_64/fpu/svml_s_hypotf8_core.S      |  29 ++
>  sysdeps/x86_64/fpu/svml_s_hypotf8_core_avx.S  |  25 ++
>  .../fpu/test-double-libmvec-hypot-avx.c       |   1 +
>  .../fpu/test-double-libmvec-hypot-avx2.c      |   1 +
>  .../fpu/test-double-libmvec-hypot-avx512f.c   |   1 +
>  .../x86_64/fpu/test-double-libmvec-hypot.c    |   3 +
>  .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
>  .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
>  .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
>  .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
>  .../fpu/test-float-libmvec-hypotf-avx.c       |   1 +
>  .../fpu/test-float-libmvec-hypotf-avx2.c      |   1 +
>  .../fpu/test-float-libmvec-hypotf-avx512f.c   |   1 +
>  .../x86_64/fpu/test-float-libmvec-hypotf.c    |   3 +
>  .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
>  .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
>  .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
>  .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
>  50 files changed, 2151 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_hypot2_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_hypot4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_hypot4_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_hypot8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_hypotf16_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_hypotf4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_hypotf8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_hypotf8_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-hypot.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-hypotf.c
> 
> diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
> index ae8ee882d0..adf65f6bc2 100644
> --- a/bits/libm-simd-decl-stubs.h
> +++ b/bits/libm-simd-decl-stubs.h
> @@ -131,4 +131,15 @@
>  #define __DECL_SIMD_asinf32x
>  #define __DECL_SIMD_asinf64x
>  #define __DECL_SIMD_asinf128x
> +
> +#define __DECL_SIMD_hypot
> +#define __DECL_SIMD_hypotf
> +#define __DECL_SIMD_hypotl
> +#define __DECL_SIMD_hypotf16
> +#define __DECL_SIMD_hypotf32
> +#define __DECL_SIMD_hypotf64
> +#define __DECL_SIMD_hypotf128
> +#define __DECL_SIMD_hypotf32x
> +#define __DECL_SIMD_hypotf64x
> +#define __DECL_SIMD_hypotf128x
>  #endif
> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index bb53b7021e..2ed820a0dc 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -144,7 +144,7 @@ __MATHCALL (sqrt,, (_Mdouble_ __x));
>  
>  #if defined __USE_XOPEN || defined __USE_ISOC99
>  /* Return `sqrt(X*X + Y*Y)'.  */
> -__MATHCALL (hypot,, (_Mdouble_ __x, _Mdouble_ __y));
> +__MATHCALL_VEC (hypot,, (_Mdouble_ __x, _Mdouble_ __y));
>  #endif
>  
>  #if defined __USE_XOPEN_EXTENDED || defined __USE_ISOC99
> diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> index ab03a07f92..12bb03245b 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> @@ -49,24 +49,32 @@ GLIBC_2.22 _ZGVeN8vvv_sincos F
>  GLIBC_2.35 _ZGVbN2v_acos F
>  GLIBC_2.35 _ZGVbN2v_asin F
>  GLIBC_2.35 _ZGVbN2v_atan F
> +GLIBC_2.35 _ZGVbN2vv_hypot F
>  GLIBC_2.35 _ZGVbN4v_acosf F
>  GLIBC_2.35 _ZGVbN4v_asinf F
>  GLIBC_2.35 _ZGVbN4v_atanf F
> +GLIBC_2.35 _ZGVbN4vv_hypotf F
>  GLIBC_2.35 _ZGVcN4v_acos F
>  GLIBC_2.35 _ZGVcN4v_asin F
>  GLIBC_2.35 _ZGVcN4v_atan F
> +GLIBC_2.35 _ZGVcN4vv_hypot F
>  GLIBC_2.35 _ZGVcN8v_acosf F
>  GLIBC_2.35 _ZGVcN8v_asinf F
>  GLIBC_2.35 _ZGVcN8v_atanf F
> +GLIBC_2.35 _ZGVcN8vv_hypotf F
>  GLIBC_2.35 _ZGVdN4v_acos F
>  GLIBC_2.35 _ZGVdN4v_asin F
>  GLIBC_2.35 _ZGVdN4v_atan F
> +GLIBC_2.35 _ZGVdN4vv_hypot F
>  GLIBC_2.35 _ZGVdN8v_acosf F
>  GLIBC_2.35 _ZGVdN8v_asinf F
>  GLIBC_2.35 _ZGVdN8v_atanf F
> +GLIBC_2.35 _ZGVdN8vv_hypotf F
>  GLIBC_2.35 _ZGVeN16v_acosf F
>  GLIBC_2.35 _ZGVeN16v_asinf F
>  GLIBC_2.35 _ZGVeN16v_atanf F
> +GLIBC_2.35 _ZGVeN16vv_hypotf F
>  GLIBC_2.35 _ZGVeN8v_acos F
>  GLIBC_2.35 _ZGVeN8v_asin F
>  GLIBC_2.35 _ZGVeN8v_atan F
> +GLIBC_2.35 _ZGVeN8vv_hypot F
> diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
> index 73cb8849ff..437977c5fd 100644
> --- a/sysdeps/x86/fpu/bits/math-vector.h
> +++ b/sysdeps/x86/fpu/bits/math-vector.h
> @@ -70,6 +70,10 @@
>  #  define __DECL_SIMD_asin __DECL_SIMD_x86_64
>  #  undef __DECL_SIMD_asinf
>  #  define __DECL_SIMD_asinf __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_hypot
> +#  define __DECL_SIMD_hypot __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_hypotf
> +#  define __DECL_SIMD_hypotf __DECL_SIMD_x86_64
>  
>  # endif
>  #endif
> diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> index 4552c2bdfa..cda31479a6 100644
> --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> @@ -34,6 +34,8 @@
>  !GCC$ builtin (atanf) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (asin) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (asinf) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (hypot) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (hypotf) attributes simd (notinbranch) if('x86_64')
>  
>  !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
> @@ -53,3 +55,5 @@
>  !GCC$ builtin (atanf) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (asin) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (asinf) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (hypot) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (hypotf) attributes simd (notinbranch) if('x32')
> diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> index e0eae0b196..7769a02731 100644
> --- a/sysdeps/x86_64/fpu/Makeconfig
> +++ b/sysdeps/x86_64/fpu/Makeconfig
> @@ -27,6 +27,7 @@ libmvec-funcs = \
>    atan \
>    cos \
>    exp \
> +  hypot \
>    log \
>    pow \
>    sin \
> diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> index 10baf869a5..e359e5dc2c 100644
> --- a/sysdeps/x86_64/fpu/Versions
> +++ b/sysdeps/x86_64/fpu/Versions
> @@ -17,8 +17,10 @@ libmvec {
>      _ZGVbN2v_acos; _ZGVcN4v_acos; _ZGVdN4v_acos; _ZGVeN8v_acos;
>      _ZGVbN2v_asin; _ZGVcN4v_asin; _ZGVdN4v_asin; _ZGVeN8v_asin;
>      _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan;
> +    _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
>      _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
>      _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
>      _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf;
> +    _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf;
>    }
>  }
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index ea0f833381..a7513ec94e 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -1375,6 +1375,26 @@ double: 1
>  float128: 1
>  ldouble: 1
>  
> +Function: "hypot_vlen16":
> +float: 1
> +
> +Function: "hypot_vlen2":
> +double: 1
> +
> +Function: "hypot_vlen4":
> +double: 1
> +float: 1
> +
> +Function: "hypot_vlen4_avx2":
> +double: 1
> +
> +Function: "hypot_vlen8":
> +double: 1
> +float: 1
> +
> +Function: "hypot_vlen8_avx2":
> +float: 1
> +
>  Function: "j0":
>  double: 3
>  float: 9
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core-sse2.S
> new file mode 100644
> index 0000000000..237e38459e
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized hypot.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN2vv_hypot _ZGVbN2vv_hypot_sse2
> +#include "../svml_d_hypot2_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core.c
> new file mode 100644
> index 0000000000..3f0865f05d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized hypot, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN2vv_hypot
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN2vv_hypot, __GI__ZGVbN2vv_hypot,
> +	       __redirect__ZGVbN2vv_hypot)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core_sse4.S
> new file mode 100644
> index 0000000000..931f34e5f2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot2_core_sse4.S
> @@ -0,0 +1,279 @@
> +/* Function hypot vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *      HIGH LEVEL OVERVIEW
> + *
> + *      Calculate z = (x*x+y*y)
> + *      Calculate reciprocal sqrt (z)
> + *      Calculate error = z*(rsqrt(z)*rsqrt(z)) - 1
> + *      Calculate fixing part p with a polynomial
> + *      Fix answer with sqrt(z) = z * rsqrt(z) + error * p * z
> + *
> + *      ALGORITHM DETAILS
> + *
> + *    Multiprecision branch for _HA_ only
> + *      Remove sign from both arguments
> + *      Find maximum (_x) and minimum (_y) (by abs value) between arguments
> + *      Split _x into _a and _b for multiprecision
> + *      If _x >> _y we will not split _y for multiprecision
> + *      all _y will be put into lower part (_d) and higher part (_c = 0)
> + *      Fixing _hilo_mask for the case _x >> _y
> + *      Split _y into _c and _d for multiprecision with fixed mask
> + *
> + *      compute Hi and Lo parts of _z = _x*_x + _y*_y
> + *
> + *      _zHi = _a*_a + _c*_c
> + *      _zLo = (_x + _a)*_b + _d*_y + _d*_c
> + *      _z = _zHi + _zLo
> + *
> + *    No multiprecision branch for _LA_ and _EP_
> + *      _z = _VARG1 * _VARG1 + _VARG2 * _VARG2
> + *
> + *    Check _z exponent to be within borders [3BC ; 441] else goto Callout
> + *
> + *    _s  ~ 1.0/sqrt(_z)
> + *    _s2 ~ 1.0/(sqrt(_z)*sqrt(_z)) ~ 1.0/_z = (1.0/_z + O)
> + *    _e[rror]  =  (1.0/_z + O) * _z - 1.0
> + *    calculate fixing part _p
> + *    _p = (((_POLY_C5*_e + _POLY_C4)*_e +_POLY_C3)*_e +_POLY_C2)*_e + _POLY_C1
> + *    some parts of the polynomial are skipped for lower flavors
> + *
> + *    result = _z * (1.0/sqrt(_z) + O) + _p * _e[rror] * _z
> + *
> + *
> + */
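
(Editorial sketch, not part of the quoted patch: the non-multiprecision _LA_/_EP_ path described above in plain C; the rsqrt estimate is stood in for by 1/sqrt, and c[] holds the assumed _POLY_C1.._POLY_C5 coefficients.)

  #include <math.h>

  static double hypot_sketch (double x, double y, const double c[5])
  {
    double z = x * x + y * y;
    double s = 1.0 / sqrt (z);                 /* ~ rsqrt (z) */
    double e = z * (s * s) - 1.0;              /* error of the estimate */
    double p = (((c[4] * e + c[3]) * e + c[2]) * e + c[1]) * e + c[0];
    return z * s + p * e * z;                  /* refined sqrt (z) */
  }
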
> +
> +/* Offsets for data table __svml_dhypot_data_internal
> + */
> +#define _dHiLoMask                    	0
> +#define _dAbsMask                     	16
> +#define _dOne                         	32
> +#define _POLY_C5                      	48
> +#define _POLY_C4                      	64
> +#define _POLY_C3                      	80
> +#define _POLY_C2                      	96
> +#define _POLY_C1                      	112
> +#define _LowBoundary                  	128
> +#define _HighBoundary                 	144
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN2vv_hypot_sse4)
> +        subq      $88, %rsp
> +        cfi_def_cfa_offset(96)
> +
> +/*
> + *  Defines
> + *  Implementation
> + * Multiprecision branch for _HA_ only
> + * _z = _VARG1 * _VARG1 + _VARG2 * _VARG2
> + */
> +        movaps    %xmm0, %xmm10
> +        movaps    %xmm1, %xmm2
> +        mulpd     %xmm0, %xmm10
> +        mulpd     %xmm1, %xmm2
> +        addpd     %xmm2, %xmm10
> +
> +/*
> + * _s  ~ 1.0/sqrt(_z)
> + * _s2 ~ 1.0/(sqrt(_z)*sqrt(_z)) ~ 1.0/_z
> + */
> +        cvtpd2ps  %xmm10, %xmm7
> +        movlhps   %xmm7, %xmm7
> +        rsqrtps   %xmm7, %xmm8
> +        cvtps2pd  %xmm8, %xmm11
> +        movaps    %xmm11, %xmm2
> +        mulpd     %xmm11, %xmm2
> +
> +/* _e[rror]  ~  (1.0/_z + O) * _z - 1.0 */
> +        mulpd     %xmm10, %xmm2
> +        subpd     _dOne+__svml_dhypot_data_internal(%rip), %xmm2
> +
> +/*
> + * calculate fixing part _p
> + * _p = (((_POLY_C5*_e + _POLY_C4)*_e +_POLY_C3)*_e +_POLY_C2)*_e + _POLY_C1
> + * some parts of the polynomial are skipped for lower flavors
> + */
> +        movups    _POLY_C4+__svml_dhypot_data_internal(%rip), %xmm9
> +        mulpd     %xmm2, %xmm9
> +        addpd     _POLY_C3+__svml_dhypot_data_internal(%rip), %xmm9
> +        mulpd     %xmm2, %xmm9
> +        addpd     _POLY_C2+__svml_dhypot_data_internal(%rip), %xmm9
> +        mulpd     %xmm2, %xmm9
> +        addpd     _POLY_C1+__svml_dhypot_data_internal(%rip), %xmm9
> +
> +/* result = _z * (1.0/sqrt(_z) + O) + _p * _e[rror] * _z */
> +        mulpd     %xmm9, %xmm2
> +        mulpd     %xmm11, %xmm2
> +        mulpd     %xmm10, %xmm11
> +        mulpd     %xmm10, %xmm2
> +
> +/* Check _z exponent to be within borders [3BC ; 441] else goto Callout */
> +        movq      _LowBoundary+__svml_dhypot_data_internal(%rip), %xmm5
> +        movq      _HighBoundary+__svml_dhypot_data_internal(%rip), %xmm3
> +        pshufd    $221, %xmm10, %xmm4
> +        pcmpgtd   %xmm4, %xmm5
> +        pcmpgtd   %xmm3, %xmm4
> +        por       %xmm4, %xmm5
> +        pshufd    $80, %xmm5, %xmm6
> +        movmskpd  %xmm6, %edx
> +        addpd     %xmm11, %xmm2
> +
> +/*  The end of implementation  */
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm1 xmm2
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movaps    %xmm2, %xmm0
> +        addq      $88, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(96)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm0, 32(%rsp)
> +        movups    %xmm1, 48(%rsp)
> +        movups    %xmm2, 64(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -80)
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -88)
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -96)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $2, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    64(%rsp), %xmm2
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -80)
> +        cfi_offset(13, -88)
> +        cfi_offset(14, -96)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm2
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        movsd     48(%rsp,%r14,8), %xmm1
> +        call      hypot@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 64(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN2vv_hypot_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_dhypot_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(16)) VUINT32 _dHiLoMask[2][2];
> +        __declspec(align(16)) VUINT32 _dAbsMask[2][2];
> +        __declspec(align(16)) VUINT32 _dOne[2][2];
> +        __declspec(align(16)) VUINT32 _POLY_C5[2][2];
> +        __declspec(align(16)) VUINT32 _POLY_C4[2][2];
> +        __declspec(align(16)) VUINT32 _POLY_C3[2][2];
> +        __declspec(align(16)) VUINT32 _POLY_C2[2][2];
> +        __declspec(align(16)) VUINT32 _POLY_C1[2][2];
> +        __declspec(align(16)) VUINT32 _LowBoundary[4][1];
> +        __declspec(align(16)) VUINT32 _HighBoundary[4][1];
> +} __svml_dhypot_data_internal;
> +#endif
> +__svml_dhypot_data_internal:
> +        /* legacy algorithm */
> +        .quad 0xffffc00000000000, 0xffffc00000000000       /* _dHiLoMask     */
> +        .align 16
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff       /* _dAbsMask      */
> +        .align 16
> +        .quad 0x3FF0000000000000, 0x3FF0000000000000       /* _dOne          */
> +        .align 16
> +        .quad 0xBFCF800000000000, 0xBFCF800000000000       /* _POLY_C5            */
> +        .align 16
> +        .quad 0x3FD1800000000000, 0x3FD1800000000000       /* _POLY_C4            */
> +        .align 16
> +        .quad 0xBFD4000000000000, 0xBFD4000000000000       /* _POLY_C3            */
> +        .align 16
> +        .quad 0x3FD8000000000000, 0x3FD8000000000000       /* _POLY_C2            */
> +        .align 16
> +        .quad 0xBFE0000000000000, 0xBFE0000000000000       /* _POLY_C1            */
> +        .align 16
> +        .long 0x3BC00000, 0x3BC00000, 0x3BC00000, 0x3BC00000       /* _LowBoundary   */
> +        .align 16
> +        .long 0x44100000, 0x44100000, 0x44100000, 0x44100000       /* _HighBoundary  */
> +        .align 16
> +        .type	__svml_dhypot_data_internal,@object
> +        .size	__svml_dhypot_data_internal,.-__svml_dhypot_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core-sse.S
> new file mode 100644
> index 0000000000..5e7c75c44c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized hypot.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN4vv_hypot _ZGVdN4vv_hypot_sse_wrapper
> +#include "../svml_d_hypot4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core.c
> new file mode 100644
> index 0000000000..06f34d35e1
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized hypot, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN4vv_hypot
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN4vv_hypot, __GI__ZGVdN4vv_hypot,
> +	       __redirect__ZGVdN4vv_hypot)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core_avx2.S
> new file mode 100644
> index 0000000000..45028ab7e9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot4_core_avx2.S
> @@ -0,0 +1,289 @@
> +/* Function hypot vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *      HIGH LEVEL OVERVIEW
> + *
> + *      Calculate z = (x*x+y*y)
> + *      Calculate reciprocal sqrt (z)
> + *      Calculate error = z*(rsqrt(z)*rsqrt(z)) - 1
> + *      Calculate fixing part p with a polynomial
> + *      Fix answer with sqrt(z) = z * rsqrt(z) + error * p * z
> + *
> + *      ALGORITHM DETAILS
> + *
> + *    Multiprecision branch for _HA_ only
> + *      Remove sign from both arguments
> + *      Find maximum (_x) and minimum (_y) (by abs value) between arguments
> + *      Split _x into _a and _b for multiprecision
> + *      If _x >> _y we will not split _y for multiprecision;
> + *      all _y will be put into lower part (_d) and higher part (_c = 0)
> + *      Fixing _hilo_mask for the case _x >> _y
> + *      Split _y into _c and _d for multiprecision with fixed mask
> + *
> + *      compute Hi and Lo parts of _z = _x*_x + _y*_y
> + *
> + *      _zHi = _a*_a + _c*_c
> + *      _zLo = (_x + _a)*_b + _d*_y + _d*_c
> + *      _z = _zHi + _zLo
> + *
> + *    No multiprecision branch for _LA_ and _EP_
> + *      _z = _VARG1 * _VARG1 + _VARG2 * _VARG2
> + *
> + *    Check _z exponent to be within borders [3BC ; 441] else goto Callout
> + *
> + *    _s  ~ 1.0/sqrt(_z)
> + *    _s2 ~ 1.0/(sqrt(_z)*sqrt(_z)) ~ 1.0/_z = (1.0/_z + O)
> + *    _e[rror]  =  (1.0/_z + O) * _z - 1.0
> + *    calculate fixing part _p
> + *    _p = (((_POLY_C5*_e + _POLY_C4)*_e +_POLY_C3)*_e +_POLY_C2)*_e + _POLY_C1
> + *    some parts of the polynomial are skipped for lower accuracy flavors
> + *
> + *    result = _z * (1.0/sqrt(_z) + O) + _p * _e[rror] * _z
> + *
> + *
> + */
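The comment above maps onto a short scalar computation.  Below is a minimal C
sketch of the fast path, assuming an exact 1/sqrt stands in for the hardware
rsqrt estimate (the function name and layout are illustrative only, and the
out-of-range callout and special cases are omitted); the polynomial
coefficients are the _POLY_C* table entries, i.e. the Taylor series of
1/sqrt(1+e):

    #include <math.h>

    /* Illustrative scalar model of the fast path: z = x*x + y*y, then
       correct an approximate rsqrt (z) with the polynomial in e.  */
    static double
    hypot_poly_sketch (double x, double y)
    {
      double z = x * x + y * y;
      double s = 1.0 / sqrt (z);      /* stand-in for the rsqrt estimate */
      double e = s * s * z - 1.0;     /* error of s*s relative to 1/z */
      /* p(e) ~ (1/sqrt(1+e) - 1)/e: -1/2, 3/8, -5/16, 35/128, ...  */
      double p = ((0.2734375 * e - 0.3125) * e + 0.375) * e - 0.5;
      return z * s * (1.0 + p * e);   /* ~ sqrt (z) = hypot (x, y) */
    }

With an exact estimate e is 0 and the result collapses to z/sqrt(z) = sqrt(z);
the polynomial only mops up the error of the low-precision rsqrt.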
> +
> +/* Offsets for data table __svml_dhypot_data_internal
> + */
> +#define _dHiLoMask                    	0
> +#define _dAbsMask                     	32
> +#define _dOne                         	64
> +#define _POLY_C5                      	96
> +#define _POLY_C4                      	128
> +#define _POLY_C3                      	160
> +#define _POLY_C2                      	192
> +#define _POLY_C1                      	224
> +#define _LowBoundary                  	256
> +#define _HighBoundary                 	288
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN4vv_hypot_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $128, %rsp
> +        vmovapd   %ymm1, %ymm2
> +        vmovapd   %ymm0, %ymm1
> +
> +/*
> + *  Defines
> + *  Implementation
> + * Multiprecision branch for _HA_ only
> + * _z = _VARG1 * _VARG1 + _VARG2 * _VARG2
> + */
> +        vmulpd    %ymm1, %ymm1, %ymm0
> +
> +/*
> + * calculate fixing part _p
> + * _p = (((_POLY_C5*_e + _POLY_C4)*_e +_POLY_C3)*_e +_POLY_C2)*_e + _POLY_C1
> + * some parts of the polynomial are skipped for lower accuracy flavors
> + */
> +        vmovupd   _POLY_C4+__svml_dhypot_data_internal(%rip), %ymm15
> +        vmovups   _LowBoundary+__svml_dhypot_data_internal(%rip), %xmm4
> +        vfmadd231pd %ymm2, %ymm2, %ymm0
> +
> +/*
> + * _s  ~ 1.0/sqrt(_z)
> + * _s2 ~ 1.0/(sqrt(_z)*sqrt(_z)) ~ 1.0/_z
> + */
> +        vcvtpd2ps %ymm0, %xmm12
> +
> +/* Check _z exponent to be within borders [3BC ; 441] else goto Callout */
> +        vextractf128 $1, %ymm0, %xmm3
> +        vrsqrtps  %xmm12, %xmm13
> +        vshufps   $221, %xmm3, %xmm0, %xmm5
> +        vcvtps2pd %xmm13, %ymm3
> +        vpcmpgtd  %xmm5, %xmm4, %xmm6
> +        vpcmpgtd  _HighBoundary+__svml_dhypot_data_internal(%rip), %xmm5, %xmm7
> +        vpor      %xmm7, %xmm6, %xmm9
> +        vpshufd   $80, %xmm9, %xmm8
> +        vmulpd    %ymm3, %ymm3, %ymm14
> +        vpshufd   $250, %xmm9, %xmm10
> +
> +/* _e[rror]  ~  (1.0/_z + O) * _z - 1.0 */
> +        vfmsub213pd _dOne+__svml_dhypot_data_internal(%rip), %ymm0, %ymm14
> +        vfmadd213pd _POLY_C3+__svml_dhypot_data_internal(%rip), %ymm14, %ymm15
> +        vfmadd213pd _POLY_C2+__svml_dhypot_data_internal(%rip), %ymm14, %ymm15
> +        vfmadd213pd _POLY_C1+__svml_dhypot_data_internal(%rip), %ymm14, %ymm15
> +
> +/* result = _z * (1.0/sqrt(_z) + O) + _p * _e[rror] * _z */
> +        vmulpd    %ymm15, %ymm14, %ymm14
> +        vmulpd    %ymm14, %ymm3, %ymm15
> +        vmulpd    %ymm15, %ymm0, %ymm4
> +        vfmadd213pd %ymm4, %ymm3, %ymm0
> +        vinsertf128 $1, %xmm10, %ymm8, %ymm11
> +        vmovmskpd %ymm11, %edx
> +
> +/*  The end of implementation  */
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm1 ymm2
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovupd   %ymm1, 32(%rsp)
> +        vmovupd   %ymm2, 64(%rsp)
> +        vmovupd   %ymm0, 96(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -112; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovupd   96(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -112; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        movsd     64(%rsp,%r14,8), %xmm1
> +        call      hypot@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 96(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN4vv_hypot_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_dhypot_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(32)) VUINT32 _dHiLoMask[4][2];
> +        __declspec(align(32)) VUINT32 _dAbsMask[4][2];
> +        __declspec(align(32)) VUINT32 _dOne[4][2];
> +        __declspec(align(32)) VUINT32 _POLY_C5[4][2];
> +        __declspec(align(32)) VUINT32 _POLY_C4[4][2];
> +        __declspec(align(32)) VUINT32 _POLY_C3[4][2];
> +        __declspec(align(32)) VUINT32 _POLY_C2[4][2];
> +        __declspec(align(32)) VUINT32 _POLY_C1[4][2];
> +        __declspec(align(32)) VUINT32 _LowBoundary[8][1];
> +        __declspec(align(32)) VUINT32 _HighBoundary[8][1];
> +} __svml_dhypot_data_internal;
> +#endif
> +__svml_dhypot_data_internal:
> +        /* legacy algorithm */
> +        .quad 0xffffc00000000000, 0xffffc00000000000, 0xffffc00000000000, 0xffffc00000000000       /* _dHiLoMask     */
> +        .align 32
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff       /* _dAbsMask      */
> +        .align 32
> +        .quad 0x3FF0000000000000, 0x3FF0000000000000, 0x3FF0000000000000, 0x3FF0000000000000       /* _dOne          */
> +        .align 32
> +        .quad 0xBFCF800000000000, 0xBFCF800000000000, 0xBFCF800000000000, 0xBFCF800000000000       /* _POLY_C5            */
> +        .align 32
> +        .quad 0x3FD1800000000000, 0x3FD1800000000000, 0x3FD1800000000000, 0x3FD1800000000000       /* _POLY_C4            */
> +        .align 32
> +        .quad 0xBFD4000000000000, 0xBFD4000000000000, 0xBFD4000000000000, 0xBFD4000000000000       /* _POLY_C3            */
> +        .align 32
> +        .quad 0x3FD8000000000000, 0x3FD8000000000000, 0x3FD8000000000000, 0x3FD8000000000000       /* _POLY_C2            */
> +        .align 32
> +        .quad 0xBFE0000000000000, 0xBFE0000000000000, 0xBFE0000000000000, 0xBFE0000000000000       /* _POLY_C1            */
> +        .align 32
> +        .long 0x3BC00000, 0x3BC00000, 0x3BC00000, 0x3BC00000, 0x3BC00000, 0x3BC00000, 0x3BC00000, 0x3BC00000       /* _LowBoundary   */
> +        .align 32
> +        .long 0x44100000, 0x44100000, 0x44100000, 0x44100000, 0x44100000, 0x44100000, 0x44100000, 0x44100000       /* _HighBoundary  */
> +        .align 32
> +        .type	__svml_dhypot_data_internal,@object
> +        .size	__svml_dhypot_data_internal,.-__svml_dhypot_data_internal
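All of the hypot/hypotf flavors in this patch share the same shape of
special-inputs handling: the vector arguments and the fast-path result are
spilled to the stack, and each lane whose bit is set in the range mask is
recomputed with the scalar libm routine (the hypot@PLT / hypotf@PLT call in
the assembly above).  A hypothetical C rendering of that loop, with
illustrative names only:

    #include <math.h>

    /* Lane-by-lane fallback over the range mask; vlen is the vector
       length (2, 4 or 8 doubles for the flavors in this patch).  */
    static void
    hypot_special_lanes_sketch (const double *x, const double *y,
                                double *res, unsigned int mask, int vlen)
    {
      for (int i = 0; i < vlen; i++)
        if (mask & (1u << i))            /* lane i failed the range check */
          res[i] = hypot (x[i], y[i]);   /* full scalar special-case handling */
    }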
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core-avx2.S
> new file mode 100644
> index 0000000000..a53e82cf9a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized hypot.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN8vv_hypot _ZGVeN8vv_hypot_avx2_wrapper
> +#include "../svml_d_hypot8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core.c
> new file mode 100644
> index 0000000000..6052c752c9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized hypot, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN8vv_hypot
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN8vv_hypot, __GI__ZGVeN8vv_hypot,
> +	       __redirect__ZGVeN8vv_hypot)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core_avx512.S
> new file mode 100644
> index 0000000000..1e5e716a8d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_hypot8_core_avx512.S
> @@ -0,0 +1,235 @@
> +/* Function hypot vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *      HIGH LEVEL OVERVIEW
> + *
> + *      Calculate z = (x*x+y*y)
> + *      Calculate reciprocal sqrt (z)
> + *      Calculate error = z*(rsqrt(z)*rsqrt(z)) - 1
> + *      Calculate fixing part p with a polynomial
> + *      Fix answer with sqrt(z) = z * rsqrt(z) + error * p * z
> + *
> + *      ALGORITHM DETAILS
> + *
> + *    Multiprecision branch for _HA_ only
> + *      Remove sign from both arguments
> + *      Find maximum (_x) and minimum (_y) (by abs value) between arguments
> + *      Split _x into _a and _b for multiprecision
> + *      If _x >> _y we will not split _y for multiprecision;
> + *      all _y will be put into lower part (_d) and higher part (_c = 0)
> + *      Fixing _hilo_mask for the case _x >> _y
> + *      Split _y into _c and _d for multiprecision with fixed mask
> + *
> + *      compute Hi and Lo parts of _z = _x*_x + _y*_y
> + *
> + *      _zHi = _a*_a + _c*_c
> + *      _zLo = (_x + _a)*_b + _d*_y + _d*_c
> + *      _z = _zHi + _zLo
> + *
> + *    No multiprecision branch for _LA_ and _EP_
> + *      _z = _VARG1 * _VARG1 + _VARG2 * _VARG2
> + *
> + *    Check _z exponent to be within borders [3BC ; 441] else goto Callout
> + *
> + *    _s  ~ 1.0/sqrt(_z)
> + *    _s2 ~ 1.0/(sqrt(_z)*sqrt(_z)) ~ 1.0/_z = (1.0/_z + O)
> + *    _e[rror]  =  (1.0/_z + O) * _z - 1.0
> + *    calculate fixing part _p
> + *    _p = (((_POLY_C5*_e + _POLY_C4)*_e +_POLY_C3)*_e +_POLY_C2)*_e + _POLY_C1
> + *    some parts of the polynomial are skipped for lower accuracy flavors
> + *
> + *    result = _z * (1.0/sqrt(_z) + O) + _p * _e[rror] * _z
> + *
> + *
> + */
> +
> +/* Offsets for data table __svml_dhypot_data_internal
> + */
> +#define _dAbsMask                     	0
> +#define _lExpBound_uisa               	64
> +#define _lExpBound                    	128
> +#define _dHalf                        	192
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN8vv_hypot_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $256, %rsp
> +        vgetexppd {sae}, %zmm0, %zmm2
> +        vgetexppd {sae}, %zmm1, %zmm3
> +        vmovups   _dHalf+__svml_dhypot_data_internal(%rip), %zmm9
> +        vmaxpd    {sae}, %zmm3, %zmm2, %zmm4
> +        vmulpd    {rn-sae}, %zmm0, %zmm0, %zmm2
> +        vandpd    _dAbsMask+__svml_dhypot_data_internal(%rip), %zmm4, %zmm5
> +        vfmadd231pd {rn-sae}, %zmm1, %zmm1, %zmm2
> +
> +/* Select exponent bound so that no scaling is needed */
> +        vpcmpq    $5, _lExpBound_uisa+__svml_dhypot_data_internal(%rip), %zmm5, %k0
> +        vrsqrt14pd %zmm2, %zmm6
> +        kmovw     %k0, %edx
> +        vmulpd    {rn-sae}, %zmm6, %zmm2, %zmm7
> +        vmulpd    {rn-sae}, %zmm6, %zmm9, %zmm8
> +        vfnmadd231pd {rn-sae}, %zmm7, %zmm8, %zmm9
> +        vfmadd231pd {rn-sae}, %zmm9, %zmm8, %zmm8
> +        vfmadd213pd {rn-sae}, %zmm7, %zmm7, %zmm9
> +        vfnmadd231pd {rn-sae}, %zmm9, %zmm9, %zmm2
> +        vfmadd213pd {rn-sae}, %zmm9, %zmm8, %zmm2
> +
> +/*  The end of implementation  */
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1 zmm2
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        vmovaps   %zmm2, %zmm0
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm0, 64(%rsp)
> +        vmovups   %zmm1, 128(%rsp)
> +        vmovups   %zmm2, 192(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm2
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -240; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -248; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -256; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   192(%rsp), %zmm2
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -240; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -248; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -256; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm2
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     64(%rsp,%r14,8), %xmm0
> +        movsd     128(%rsp,%r14,8), %xmm1
> +        call      hypot@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 192(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN8vv_hypot_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_dhypot_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(64)) VUINT32 _dAbsMask[8][2];
> +        __declspec(align(64)) VUINT32 _lExpBound_uisa[8][2];
> +        __declspec(align(64)) VUINT32 _lExpBound[8][2];
> +        __declspec(align(64)) VUINT32 _dHalf[8][2];
> +} __svml_dhypot_data_internal;
> +#endif
> +__svml_dhypot_data_internal:
> +        /* legacy algorithm */
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff       /* _dAbsMask      */
> +        /* fma based algorithm */
> +        .align 64
> +        .quad 0x407ff00000000000, 0x407ff00000000000, 0x407ff00000000000, 0x407ff00000000000, 0x407ff00000000000, 0x407ff00000000000, 0x407ff00000000000, 0x407ff00000000000       /* _lExpBound_uisa */
> +        .align 64
> +        .quad 0x404f800000000000, 0x404f800000000000, 0x404f800000000000, 0x404f800000000000, 0x404f800000000000, 0x404f800000000000, 0x404f800000000000, 0x404f800000000000       /* _lExpBound      */
> +        .align 64
> +        .quad 0x3FE0000000000000, 0x3FE0000000000000, 0x3FE0000000000000, 0x3FE0000000000000, 0x3FE0000000000000, 0x3FE0000000000000, 0x3FE0000000000000, 0x3FE0000000000000       /* _dHalf          */
> +        .align 64
> +        .type	__svml_dhypot_data_internal,@object
> +        .size	__svml_dhypot_data_internal,.-__svml_dhypot_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core-avx2.S
> new file mode 100644
> index 0000000000..a6ba40df4d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized hypotf.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN16vv_hypotf _ZGVeN16vv_hypotf_avx2_wrapper
> +#include "../svml_s_hypotf16_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core.c
> new file mode 100644
> index 0000000000..0c9eb6a364
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized hypotf, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN16vv_hypotf
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN16vv_hypotf, __GI__ZGVeN16vv_hypotf,
> +	       __redirect__ZGVeN16vv_hypotf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core_avx512.S
> new file mode 100644
> index 0000000000..46a156d136
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf16_core_avx512.S
> @@ -0,0 +1,239 @@
> +/* Function hypotf vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *      HIGH LEVEL OVERVIEW
> + *
> + *      Calculate z = (x*x+y*y)
> + *      Calculate reciprocal sqrt (z)
> + *      Make two NR (Newton-Raphson) iterations
> + *
> + *      ALGORITHM DETAILS
> + *
> + *    Multiprecision branch for _HA_ only
> + *      Remove sign from both arguments
> + *      Find maximum (_x) and minimum (_y) (by abs value) between arguments
> + *      Split _x into _a and _b for multiprecision
> + *      If _x >> _y we will not split _y for multiprecision;
> + *      all _y will be put into lower part (_d) and higher part (_c = 0)
> + *      Fixing _hilo_mask for the case _x >> _y
> + *      Split _y into _c and _d for multiprecision with fixed mask
> + *
> + *      compute Hi and Lo parts of _z = _x*_x + _y*_y
> + *
> + *      _zHi = _a*_a + _c*_c
> + *      _zLo = (_x + _a)*_b + _d*_y + _d*_c
> + *      _z = _zHi + _zLo
> + *
> + *    No multiprecision branch for _LA_ and _EP_
> + *      _z = _VARG1 * _VARG1 + _VARG2 * _VARG2
> + *
> + *    Check _z exponent to be within borders [1E3 ; 60A] else goto Callout
> + *
> + *    Compute reciprocal sqrt s0 ~ 1.0/sqrt(_z),
> + *      which, multiplied by _z, gives the final result for the _EP_ version.
> + *
> + *    First iteration (or zero iteration):
> + *       s =  z * s0
> + *       h = .5 * s0
> + *       d =  s *  h - .5
> + *
> + *    Second iteration:
> + *       h = d * h + h
> + *       s = s * d + s
> + *       d = s * s - z (in multiprecision for _HA_)
> + *
> + *    result = s - h * d
> + *
> + *    EP version of the function can be implemented as y[i]=sqrt(a[i]^2+b[i]^2)
> + *    with all intermediate operations done in target precision for i=1,..,n.
> + *    It can return result y[i]=0 in case a[i]^2 and b[i]^2 underflow in target
> + *    precision (for some i). It can return result y[i]=NAN in case
> + *    a[i]^2+b[i]^2 overflow in target precision, for some i. It can return
> + *    result y[i]=NAN in case a[i] or b[i] is infinite, for some i.
> + *
> + *
> + */
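A minimal scalar sketch of the two Newton-Raphson steps described above,
written with the sign convention the SSE4/AVX2 flavors use for the first
correction term (the AVX-512 flavor below appears to get away with a shorter
sequence because vrsqrt14ps is already accurate to about 14 bits).  The
function name is illustrative and the out-of-range callout is omitted:

    #include <math.h>

    /* Illustrative scalar model: refine s ~ sqrt (z) and h ~ 0.5/sqrt (z),
       then apply the final correction s - h*(s*s - z).  */
    static float
    hypotf_nr_sketch (float x, float y)
    {
      float z = x * x + y * y;
      float s0 = 1.0f / sqrtf (z);   /* stand-in for the rsqrt estimate */
      float s = z * s0;              /* ~ sqrt (z) */
      float h = 0.5f * s0;           /* ~ 1/(2*sqrt (z)) */
      float d = 0.5f - s * h;        /* residual of the estimate */
      h = d * h + h;                 /* second iteration */
      s = s * d + s;
      d = s * s - z;                 /* remaining error of s*s vs z */
      return s - h * d;              /* ~ sqrt (z) = hypotf (x, y) */
    }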
> +
> +/* Offsets for data table __svml_shypot_data_internal
> + */
> +#define _sAbsMask                     	0
> +#define _sHalf                        	64
> +#define _iExpBound                    	128
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN16vv_hypotf_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $256, %rsp
> +        vgetexpps {sae}, %zmm0, %zmm2
> +        vgetexpps {sae}, %zmm1, %zmm3
> +        vmovups   _sHalf+__svml_shypot_data_internal(%rip), %zmm6
> +        vmaxps    {sae}, %zmm3, %zmm2, %zmm4
> +        vmulps    {rn-sae}, %zmm0, %zmm0, %zmm2
> +        vandps    _sAbsMask+__svml_shypot_data_internal(%rip), %zmm4, %zmm5
> +        vfmadd231ps {rn-sae}, %zmm1, %zmm1, %zmm2
> +        vpcmpd    $5, _iExpBound+__svml_shypot_data_internal(%rip), %zmm5, %k0
> +        vrsqrt14ps %zmm2, %zmm7
> +        kmovw     %k0, %edx
> +        vmulps    {rn-sae}, %zmm7, %zmm2, %zmm9
> +        vmulps    {rn-sae}, %zmm7, %zmm6, %zmm8
> +        vfnmadd231ps {rn-sae}, %zmm9, %zmm9, %zmm2
> +        vfmadd213ps {rn-sae}, %zmm9, %zmm8, %zmm2
> +
> +/*
> + * VSCALEF( S, _VRES1, _VRES1, sExp );
> + *  The end of implementation
> + */
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1 zmm2
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        vmovaps   %zmm2, %zmm0
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm0, 64(%rsp)
> +        vmovups   %zmm1, 128(%rsp)
> +        vmovups   %zmm2, 192(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm2
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -240; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -248; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -256; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $16, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   192(%rsp), %zmm2
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -240; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -248; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -256; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm2
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     64(%rsp,%r14,4), %xmm0
> +        movss     128(%rsp,%r14,4), %xmm1
> +        call      hypotf@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 192(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN16vv_hypotf_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_shypot_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(64)) VUINT32 _sAbsMask[16][1];
> +        __declspec(align(64)) VUINT32 _sHalf[16][1];
> +        __declspec(align(64)) VUINT32 _iExpBound[16][1];
> +} __svml_shypot_data_internal;
> +#endif
> +__svml_shypot_data_internal:
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff  /* _sAbsMask      */
> +        .align 64
> +        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000  /* _sHalf         */
> +        /* fma based algorithm */
> +        .align 64
> +        .long 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000, 0x427C0000  /* _iExpBound     */
> +        .align 64
> +        .type	__svml_shypot_data_internal,@object
> +        .size	__svml_shypot_data_internal,.-__svml_shypot_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core-sse2.S
> new file mode 100644
> index 0000000000..5e9dd22d94
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized hypotf.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN4vv_hypotf _ZGVbN4vv_hypotf_sse2
> +#include "../svml_s_hypotf4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core.c
> new file mode 100644
> index 0000000000..91c9f5ca3f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized hypotf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN4vv_hypotf
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN4vv_hypotf, __GI__ZGVbN4vv_hypotf,
> +	       __redirect__ZGVbN4vv_hypotf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core_sse4.S
> new file mode 100644
> index 0000000000..a3f6d21ce1
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf4_core_sse4.S
> @@ -0,0 +1,265 @@
> +/* Function hypotf vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *      HIGH LEVEL OVERVIEW
> + *
> + *      Calculate z = (x*x+y*y)
> + *      Calculate reciprocal sqrt (z)
> + *      Make two NR (Newton-Raphson) iterations
> + *
> + *      ALGORITHM DETAILS
> + *
> + *    Multiprecision branch for _HA_ only
> + *      Remove sign from both arguments
> + *      Find maximum (_x) and minimum (_y) (by abs value) between arguments
> + *      Split _x into _a and _b for multiprecision
> + *      If _x >> _y we will not split _y for multiprecision;
> + *      all _y will be put into lower part (_d) and higher part (_c = 0)
> + *      Fixing _hilo_mask for the case _x >> _y
> + *      Split _y into _c and _d for multiprecision with fixed mask
> + *
> + *      compute Hi and Lo parts of _z = _x*_x + _y*_y
> + *
> + *      _zHi = _a*_a + _c*_c
> + *      _zLo = (_x + _a)*_b + _d*_y + _d*_c
> + *      _z = _zHi + _zLo
> + *
> + *    No multiprecision branch for _LA_ and _EP_
> + *      _z = _VARG1 * _VARG1 + _VARG2 * _VARG2
> + *
> + *    Check _z exponent to be within borders [1E3 ; 60A] else goto Callout
> + *
> + *    Compute reciprocal sqrt s0 ~ 1.0/sqrt(_z),
> + *      which, multiplied by _z, gives the final result for the _EP_ version.
> + *
> + *    First iteration (or zero iteration):
> + *       s =  z * s0
> + *       h = .5 * s0
> + *       d =  s *  h - .5
> + *
> + *    Second iteration:
> + *       h = d * h + h
> + *       s = s * d + s
> + *       d = s * s - z (in multiprecision for _HA_)
> + *
> + *    result = s - h * d
> + *
> + *    EP version of the function can be implemented as y[i]=sqrt(a[i]^2+b[i]^2)
> + *    with all intermediate operations done in target precision for i=1,..,n.
> + *    It can return result y[i]=0 in case a[i]^2 and b[i]^2 underflow in target
> + *    precision (for some i). It can return result y[i]=NAN in case
> + *    a[i]^2+b[i]^2 overflow in target precision, for some i. It can return
> + *    result y[i]=NAN in case a[i] or b[i] is infinite, for some i.
> + *
> + *
> + */
> +
> +/* Offsets for data table __svml_shypot_data_internal
> + */
> +#define _sHiLoMask                    	0
> +#define _sAbsMask                     	16
> +#define _sHalf                        	32
> +#define _LowBoundary                  	48
> +#define _HighBoundary                 	64
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN4vv_hypotf_sse4)
> +        subq      $88, %rsp
> +        cfi_def_cfa_offset(96)
> +
> +/*
> + *  Implementation
> + * Multiprecision branch for _HA_ only
> + * No multiprecision branch for _LA_
> + * _z = _VARG1 * _VARG1 + _VARG2 * _VARG2
> + */
> +        movaps    %xmm0, %xmm8
> +        movaps    %xmm1, %xmm2
> +        mulps     %xmm0, %xmm8
> +        mulps     %xmm1, %xmm2
> +
> +/*
> + *  Variables
> + *  Defines
> + *  Constants loading
> + */
> +        movups    _sHalf+__svml_shypot_data_internal(%rip), %xmm5
> +        addps     %xmm2, %xmm8
> +
> +/* _s0  ~ 1.0/sqrt(_z) */
> +        rsqrtps   %xmm8, %xmm10
> +
> +/* First iteration */
> +        movaps    %xmm10, %xmm2
> +        movaps    %xmm8, %xmm3
> +        mulps     %xmm8, %xmm2
> +        mulps     %xmm5, %xmm10
> +        movaps    %xmm2, %xmm6
> +        mulps     %xmm10, %xmm6
> +
> +/* Check _z exponent to be within borders [1E3 ; 60A] else goto Callout */
> +        movdqu    _LowBoundary+__svml_shypot_data_internal(%rip), %xmm4
> +        subps     %xmm6, %xmm5
> +
> +/* Second iteration */
> +        movaps    %xmm5, %xmm7
> +        pcmpgtd   %xmm8, %xmm4
> +        mulps     %xmm2, %xmm5
> +        mulps     %xmm10, %xmm7
> +        addps     %xmm5, %xmm2
> +        addps     %xmm7, %xmm10
> +
> +/* Finish second iteration in native precision for _LA_ */
> +        movaps    %xmm2, %xmm9
> +        mulps     %xmm2, %xmm9
> +        pcmpgtd   _HighBoundary+__svml_shypot_data_internal(%rip), %xmm3
> +        subps     %xmm8, %xmm9
> +        mulps     %xmm9, %xmm10
> +        por       %xmm3, %xmm4
> +        movmskps  %xmm4, %edx
> +        subps     %xmm10, %xmm2
> +
> +/*  The end of implementation  */
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm1 xmm2
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movaps    %xmm2, %xmm0
> +        addq      $88, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(96)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm0, 32(%rsp)
> +        movups    %xmm1, 48(%rsp)
> +        movups    %xmm2, 64(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -80)
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -88)
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -96)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    64(%rsp), %xmm2
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -80)
> +        cfi_offset(13, -88)
> +        cfi_offset(14, -96)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm2
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        movss     48(%rsp,%r14,4), %xmm1
> +        call      hypotf@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 64(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN4vv_hypotf_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_shypot_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(16)) VUINT32 _sHiLoMask[4][1];
> +        __declspec(align(16)) VUINT32 _sAbsMask[4][1];
> +        __declspec(align(16)) VUINT32 _sHalf[4][1];
> +        __declspec(align(16)) VUINT32 _LowBoundary[4][1];
> +        __declspec(align(16)) VUINT32 _HighBoundary[4][1];
> +} __svml_shypot_data_internal;
> +#endif
> +__svml_shypot_data_internal:
> +        /* legacy algorithm */
> +        .long 0xFFF80000, 0xFFF80000, 0xFFF80000, 0xFFF80000  /* _sHiLoMask     */
> +        .align 16
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff  /* _sAbsMask      */
> +        .align 16
> +        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000  /* _sHalf         */
> +        .align 16
> +        .long 0x1E300000, 0x1E300000, 0x1E300000, 0x1E300000  /* _LowBoundary   */
> +        .align 16
> +        .long 0x60A00000, 0x60A00000, 0x60A00000, 0x60A00000  /* _HighBoundary  */
> +        .align 16
> +        .type	__svml_shypot_data_internal,@object
> +        .size	__svml_shypot_data_internal,.-__svml_shypot_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core-sse.S
> new file mode 100644
> index 0000000000..d37556e331
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized hypotf.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN8vv_hypotf _ZGVdN8vv_hypotf_sse_wrapper
> +#include "../svml_s_hypotf8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core.c
> new file mode 100644
> index 0000000000..6cc497e73d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized hypotf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN8vv_hypotf
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN8vv_hypotf, __GI__ZGVdN8vv_hypotf,
> +	       __redirect__ZGVdN8vv_hypotf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core_avx2.S
> new file mode 100644
> index 0000000000..733022ff01
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_hypotf8_core_avx2.S
> @@ -0,0 +1,269 @@
> +/* Function hypotf vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *      HIGH LEVEL OVERVIEW
> + *
> + *      Calculate z = (x*x+y*y)
> + *      Calculate reciprocal sqrt (z)
> + *      Make two NR (Newton-Raphson) iterations
> + *
> + *      ALGORITHM DETAILS
> + *
> + *    Multiprecision branch for _HA_ only
> + *      Remove sign from both arguments
> + *      Find maximum (_x) and minimum (_y) (by abs value) between arguments
> + *      Split _x into _a and _b for multiprecision
> + *      If _x >> _y we will not split _y for multiprecision;
> + *      all _y will be put into lower part (_d) and higher part (_c = 0)
> + *      Fixing _hilo_mask for the case _x >> _y
> + *      Split _y into _c and _d for multiprecision with fixed mask
> + *
> + *      compute Hi and Lo parts of _z = _x*_x + _y*_y
> + *
> + *      _zHi = _a*_a + _c*_c
> + *      _zLo = (_x + _a)*_b + _d*_y + _d*_c
> + *      _z = _zHi + _zLo
> + *
> + *    No multiprecision branch for _LA_ and _EP_
> + *      _z = _VARG1 * _VARG1 + _VARG2 * _VARG2
> + *
> + *    Check _z exponent to be within borders [1E3 ; 60A] else goto Callout
> + *
> + *    Compute reciprocal sqrt s0 ~ 1.0/sqrt(_z),
> + *      which, multiplied by _z, gives the final result for the _EP_ version.
> + *
> + *    First iteration (or zero iteration):
> + *       s =  z * s0
> + *       h = .5 * s0
> + *       d =  s *  h - .5
> + *
> + *    Second iteration:
> + *       h = d * h + h
> + *       s = s * d + s
> + *       d = s * s - z (in multiprecision for _HA_)
> + *
> + *    result = s - h * d
> + *
> + *    EP version of the function can be implemented as y[i]=sqrt(a[i]^2+b[i]^2)
> + *    with all intermediate operations done in target precision for i=1,..,n.
> + *    It can return result y[i]=0 in case a[i]^2 and b[i]^2 underflow in target
> + *    precision (for some i). It can return result y[i]=NAN in case
> + *    a[i]^2+b[i]^2 overflow in target precision, for some i. It can return
> + *    result y[i]=NAN in case a[i] or b[i] is infinite, for some i.
> + *
> + *
> + */
> +
> +/* Offsets for data table __svml_shypot_data_internal
> + */
> +#define _sHiLoMask                    	0
> +#define _sAbsMask                     	32
> +#define _sHalf                        	64
> +#define _LowBoundary                  	96
> +#define _HighBoundary                 	128
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN8vv_hypotf_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $128, %rsp
> +
> +/*
> + *  Implementation
> + * Multiprecision branch for _HA_ only
> + * No multiprecision branch for _LA_
> + * _z = _VARG1 * _VARG1 + _VARG2 * _VARG2
> + */
> +        vmulps    %ymm0, %ymm0, %ymm8
> +
> +/*
> + *  Variables
> + *  Defines
> + *  Constants loading
> + */
> +        vmovups   _sHalf+__svml_shypot_data_internal(%rip), %ymm7
> +
> +/* Check _z exponent to be within borders [1E3 ; 60A] else goto Callout */
> +        vmovups   _LowBoundary+__svml_shypot_data_internal(%rip), %ymm2
> +        vfmadd231ps %ymm1, %ymm1, %ymm8
> +
> +/* _s0  ~ 1.0/sqrt(_z) */
> +        vrsqrtps  %ymm8, %ymm6
> +        vpcmpgtd  %ymm8, %ymm2, %ymm3
> +
> +/* First iteration */
> +        vmulps    %ymm8, %ymm6, %ymm9
> +        vmulps    %ymm7, %ymm6, %ymm2
> +        vfnmadd231ps %ymm9, %ymm2, %ymm7
> +        vfmadd213ps %ymm9, %ymm7, %ymm9
> +
> +/* Second iteration */
> +        vfmadd132ps %ymm7, %ymm2, %ymm2
> +        vpcmpgtd  _HighBoundary+__svml_shypot_data_internal(%rip), %ymm8, %ymm4
> +        vpor      %ymm4, %ymm3, %ymm5
> +
> +/* Finish second iteration in native precision for _LA_ */
> +        vfmsub231ps %ymm9, %ymm9, %ymm8
> +        vmovmskps %ymm5, %edx
> +        vfnmadd213ps %ymm9, %ymm8, %ymm2
> +
> +/*  The end of implementation  */
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm1 ymm2
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        vmovaps   %ymm2, %ymm0
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %ymm0, 32(%rsp)
> +        vmovups   %ymm1, 64(%rsp)
> +        vmovups   %ymm2, 96(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm2
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -112; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   96(%rsp), %ymm2
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -112; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm2
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        movss     64(%rsp,%r14,4), %xmm1
> +        call      hypotf@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 96(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN8vv_hypotf_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_shypot_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(32)) VUINT32 _sHiLoMask[8][1];
> +        __declspec(align(32)) VUINT32 _sAbsMask[8][1];
> +        __declspec(align(32)) VUINT32 _sHalf[8][1];
> +        __declspec(align(32)) VUINT32 _LowBoundary[8][1];
> +        __declspec(align(32)) VUINT32 _HighBoundary[8][1];
> +} __svml_shypot_data_internal;
> +#endif
> +__svml_shypot_data_internal:
> +        /* legacy algorithm */
> +        .long 0xFFF80000, 0xFFF80000, 0xFFF80000, 0xFFF80000, 0xFFF80000, 0xFFF80000, 0xFFF80000, 0xFFF80000  /* _sHiLoMask     */
> +        .align 32
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff  /* _sAbsMask      */
> +        .align 32
> +        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000  /* _sHalf         */
> +        .align 32
> +        .long 0x1E300000, 0x1E300000, 0x1E300000, 0x1E300000, 0x1E300000, 0x1E300000, 0x1E300000, 0x1E300000  /* _LowBoundary   */
> +        .align 32
> +        .long 0x60A00000, 0x60A00000, 0x60A00000, 0x60A00000, 0x60A00000, 0x60A00000, 0x60A00000, 0x60A00000  /* _HighBoundary  */
> +        .align 32
> +        .type	__svml_shypot_data_internal,@object
> +        .size	__svml_shypot_data_internal,.-__svml_shypot_data_internal
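
A minimal model of the special-value fallback wired up above (the
SPECIAL_VALUES_BRANCH / SCALAR_MATH_CALL path), assuming an 8-lane float
vector spilled to memory; process_special () is an illustrative name only:

#include <math.h>

#define VLEN 8

/* MASK carries one bit per lane (as produced by vmovmskps); every lane
   whose bit is set is recomputed with the scalar hypotf, which is what the
   bit-test loop above does before the results are reloaded.  */
static void
process_special (const float x[VLEN], const float y[VLEN],
		 float res[VLEN], unsigned int mask)
{
  for (int lane = 0; lane < VLEN; lane++)
    if (mask & (1u << lane))
      res[lane] = hypotf (x[lane], y[lane]);
}
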
> diff --git a/sysdeps/x86_64/fpu/svml_d_hypot2_core.S b/sysdeps/x86_64/fpu/svml_d_hypot2_core.S
> new file mode 100644
> index 0000000000..ea98f36324
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_hypot2_core.S
> @@ -0,0 +1,29 @@
> +/* Function hypot vectorized with SSE2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN2vv_hypot)
> +WRAPPER_IMPL_SSE2_ff hypot
> +END (_ZGVbN2vv_hypot)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN2vv_hypot)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_hypot4_core.S b/sysdeps/x86_64/fpu/svml_d_hypot4_core.S
> new file mode 100644
> index 0000000000..cedbbff2b6
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_hypot4_core.S
> @@ -0,0 +1,29 @@
> +/* Function hypot vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN4vv_hypot)
> +WRAPPER_IMPL_AVX_ff _ZGVbN2vv_hypot
> +END (_ZGVdN4vv_hypot)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN4vv_hypot)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_hypot4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_hypot4_core_avx.S
> new file mode 100644
> index 0000000000..e0fef5203d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_hypot4_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function hypot vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVcN4vv_hypot)
> +WRAPPER_IMPL_AVX_ff _ZGVbN2vv_hypot
> +END (_ZGVcN4vv_hypot)
> diff --git a/sysdeps/x86_64/fpu/svml_d_hypot8_core.S b/sysdeps/x86_64/fpu/svml_d_hypot8_core.S
> new file mode 100644
> index 0000000000..7588e4407b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_hypot8_core.S
> @@ -0,0 +1,25 @@
> +/* Function hypot vectorized with AVX-512. Wrapper to AVX2 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN8vv_hypot)
> +WRAPPER_IMPL_AVX512_ff _ZGVdN4vv_hypot
> +END (_ZGVeN8vv_hypot)
> diff --git a/sysdeps/x86_64/fpu/svml_s_hypotf16_core.S b/sysdeps/x86_64/fpu/svml_s_hypotf16_core.S
> new file mode 100644
> index 0000000000..06d421a926
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_hypotf16_core.S
> @@ -0,0 +1,25 @@
> +/* Function hypotf vectorized with AVX-512. Wrapper to AVX2 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN16vv_hypotf)
> +WRAPPER_IMPL_AVX512_ff _ZGVdN8vv_hypotf
> +END (_ZGVeN16vv_hypotf)
> diff --git a/sysdeps/x86_64/fpu/svml_s_hypotf4_core.S b/sysdeps/x86_64/fpu/svml_s_hypotf4_core.S
> new file mode 100644
> index 0000000000..7e8553cae4
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_hypotf4_core.S
> @@ -0,0 +1,29 @@
> +/* Function hypotf vectorized with SSE2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN4vv_hypotf)
> +WRAPPER_IMPL_SSE2_ff hypotf
> +END (_ZGVbN4vv_hypotf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN4vv_hypotf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_hypotf8_core.S b/sysdeps/x86_64/fpu/svml_s_hypotf8_core.S
> new file mode 100644
> index 0000000000..a9bf27370b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_hypotf8_core.S
> @@ -0,0 +1,29 @@
> +/* Function hypotf vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN8vv_hypotf)
> +WRAPPER_IMPL_AVX_ff _ZGVbN4vv_hypotf
> +END (_ZGVdN8vv_hypotf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN8vv_hypotf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_hypotf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_hypotf8_core_avx.S
> new file mode 100644
> index 0000000000..8b8008a7e9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_hypotf8_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function hypotf vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +        .text
> +ENTRY(_ZGVcN8vv_hypotf)
> +WRAPPER_IMPL_AVX_ff _ZGVbN4vv_hypotf
> +END(_ZGVcN8vv_hypotf)
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx.c
> new file mode 100644
> index 0000000000..c6a26a63e4
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-hypot.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx2.c
> new file mode 100644
> index 0000000000..c6a26a63e4
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx2.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-hypot.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx512f.c
> new file mode 100644
> index 0000000000..c6a26a63e4
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-hypot-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-hypot.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-hypot.c b/sysdeps/x86_64/fpu/test-double-libmvec-hypot.c
> new file mode 100644
> index 0000000000..c0f600a443
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-hypot.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE double
> +#define LIBMVEC_FUNC hypot
> +#include "test-vector-abi-arg2.h"
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> index 5746bb5be3..9bc9d1dafa 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> @@ -30,6 +30,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVbN2vv_pow)
>  VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVbN2v_acos)
>  VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVbN2v_atan)
>  VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVbN2v_asin)
> +VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVbN2vv_hypot)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> index 8d3d5493ed..c41994d90a 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> @@ -33,6 +33,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVdN4vv_pow)
>  VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVdN4v_acos)
>  VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVdN4v_atan)
>  VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVdN4v_asin)
> +VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVdN4vv_hypot)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m256i
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> index f43328f2ff..881f6c801a 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> @@ -30,6 +30,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVcN4vv_pow)
>  VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVcN4v_acos)
>  VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVcN4v_atan)
>  VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVcN4v_asin)
> +VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVcN4vv_hypot)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> index 8b566c199a..6fd106fe68 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> @@ -30,6 +30,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVeN8vv_pow)
>  VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVeN8v_acos)
>  VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVeN8v_atan)
>  VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVeN8v_asin)
> +VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVeN8vv_hypot)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m512i
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx.c
> new file mode 100644
> index 0000000000..97d11ad1d3
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-hypotf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx2.c
> new file mode 100644
> index 0000000000..97d11ad1d3
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx2.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-hypotf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx512f.c
> new file mode 100644
> index 0000000000..97d11ad1d3
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-hypotf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-hypotf.c b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf.c
> new file mode 100644
> index 0000000000..38776fa724
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-hypotf.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE float
> +#define LIBMVEC_FUNC hypotf
> +#include "test-vector-abi-arg2.h"
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> index 3d3218a310..4c2ea6ddfe 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> @@ -30,6 +30,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVeN16vv_powf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVeN16v_acosf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVeN16v_atanf)
>  VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVeN16v_asinf)
> +VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVeN16vv_hypotf)
>  
>  #define VEC_INT_TYPE __m512i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> index 7d75b9f60f..1d5d952d07 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> @@ -30,6 +30,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVbN4vv_powf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVbN4v_acosf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVbN4v_atanf)
>  VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVbN4v_asinf)
> +VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVbN4vv_hypotf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> index 405dde49bc..7a750f3781 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> @@ -33,6 +33,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVdN8vv_powf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVdN8v_acosf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVdN8v_atanf)
>  VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVdN8v_asinf)
> +VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVdN8vv_hypotf)
>  
>  /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
>  #undef VECTOR_WRAPPER_fFF
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> index 7558443f2e..af816a7789 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> @@ -30,6 +30,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVcN8vv_powf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVcN8v_acosf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVcN8v_atanf)
>  VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVcN8v_asinf)
> +VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVcN8vv_hypotf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> -- 
> 2.31.1
> 

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.


H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 04/18] x86-64: Add vector exp2/exp2f implementation to libmvec
  2021-12-29  6:39 ` [PATCH v5 04/18] x86-64: Add vector exp2/exp2f " Sunil K Pandey
@ 2021-12-29 21:25   ` H.J. Lu
  0 siblings, 0 replies; 40+ messages in thread
From: H.J. Lu @ 2021-12-29 21:25 UTC (permalink / raw)
  To: Sunil K Pandey; +Cc: libc-alpha, hjl.tools, andrey.kolesov, marius.cornea

On Tue, Dec 28, 2021 at 10:39:46PM -0800, Sunil K Pandey wrote:
> Implement vectorized exp2/exp2f containing SSE, AVX, AVX2 and
> AVX512 versions for libmvec as per vector ABI.  It also contains
> accuracy and ABI tests for vector exp2/exp2f with regenerated ulps.
> ---
>  bits/libm-simd-decl-stubs.h                   |  11 +
>  math/bits/mathcalls.h                         |   2 +-
>  .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
>  sysdeps/x86/fpu/bits/math-vector.h            |   4 +
>  .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
>  sysdeps/x86_64/fpu/Makeconfig                 |   1 +
>  sysdeps/x86_64/fpu/Versions                   |   2 +
>  sysdeps/x86_64/fpu/libm-test-ulps             |  20 +
>  .../fpu/multiarch/svml_d_exp22_core-sse2.S    |  20 +
>  .../x86_64/fpu/multiarch/svml_d_exp22_core.c  |  27 ++
>  .../fpu/multiarch/svml_d_exp22_core_sse4.S    | 325 +++++++++++++++++
>  .../fpu/multiarch/svml_d_exp24_core-sse.S     |  20 +
>  .../x86_64/fpu/multiarch/svml_d_exp24_core.c  |  27 ++
>  .../fpu/multiarch/svml_d_exp24_core_avx2.S    | 341 ++++++++++++++++++
>  .../fpu/multiarch/svml_d_exp28_core-avx2.S    |  20 +
>  .../x86_64/fpu/multiarch/svml_d_exp28_core.c  |  27 ++
>  .../fpu/multiarch/svml_d_exp28_core_avx512.S  | 301 ++++++++++++++++
>  .../fpu/multiarch/svml_s_exp2f16_core-avx2.S  |  20 +
>  .../fpu/multiarch/svml_s_exp2f16_core.c       |  28 ++
>  .../multiarch/svml_s_exp2f16_core_avx512.S    | 271 ++++++++++++++
>  .../fpu/multiarch/svml_s_exp2f4_core-sse2.S   |  20 +
>  .../x86_64/fpu/multiarch/svml_s_exp2f4_core.c |  28 ++
>  .../fpu/multiarch/svml_s_exp2f4_core_sse4.S   | 238 ++++++++++++
>  .../fpu/multiarch/svml_s_exp2f8_core-sse.S    |  20 +
>  .../x86_64/fpu/multiarch/svml_s_exp2f8_core.c |  28 ++
>  .../fpu/multiarch/svml_s_exp2f8_core_avx2.S   | 245 +++++++++++++
>  sysdeps/x86_64/fpu/svml_d_exp22_core.S        |  29 ++
>  sysdeps/x86_64/fpu/svml_d_exp24_core.S        |  29 ++
>  sysdeps/x86_64/fpu/svml_d_exp24_core_avx.S    |  25 ++
>  sysdeps/x86_64/fpu/svml_d_exp28_core.S        |  25 ++
>  sysdeps/x86_64/fpu/svml_s_exp2f16_core.S      |  25 ++
>  sysdeps/x86_64/fpu/svml_s_exp2f4_core.S       |  29 ++
>  sysdeps/x86_64/fpu/svml_s_exp2f8_core.S       |  29 ++
>  sysdeps/x86_64/fpu/svml_s_exp2f8_core_avx.S   |  25 ++
>  .../x86_64/fpu/test-double-libmvec-exp2-avx.c |   1 +
>  .../fpu/test-double-libmvec-exp2-avx2.c       |   1 +
>  .../fpu/test-double-libmvec-exp2-avx512f.c    |   1 +
>  sysdeps/x86_64/fpu/test-double-libmvec-exp2.c |   3 +
>  .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
>  .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
>  .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
>  .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
>  .../x86_64/fpu/test-float-libmvec-exp2f-avx.c |   1 +
>  .../fpu/test-float-libmvec-exp2f-avx2.c       |   1 +
>  .../fpu/test-float-libmvec-exp2f-avx512f.c    |   1 +
>  sysdeps/x86_64/fpu/test-float-libmvec-exp2f.c |   3 +
>  .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
>  .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
>  .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
>  .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
>  50 files changed, 2293 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_exp22_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_exp24_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_exp24_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_exp28_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_exp2f16_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_exp2f4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_exp2f8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_exp2f8_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp2f.c
> 
> diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
> index adf65f6bc2..36d6643eb9 100644
> --- a/bits/libm-simd-decl-stubs.h
> +++ b/bits/libm-simd-decl-stubs.h
> @@ -142,4 +142,15 @@
>  #define __DECL_SIMD_hypotf32x
>  #define __DECL_SIMD_hypotf64x
>  #define __DECL_SIMD_hypotf128x
> +
> +#define __DECL_SIMD_exp2
> +#define __DECL_SIMD_exp2f
> +#define __DECL_SIMD_exp2l
> +#define __DECL_SIMD_exp2f16
> +#define __DECL_SIMD_exp2f32
> +#define __DECL_SIMD_exp2f64
> +#define __DECL_SIMD_exp2f128
> +#define __DECL_SIMD_exp2f32x
> +#define __DECL_SIMD_exp2f64x
> +#define __DECL_SIMD_exp2f128x
>  #endif
> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index 2ed820a0dc..645088cbf3 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -127,7 +127,7 @@ __MATHCALL (logb,, (_Mdouble_ __x));
>  
>  #ifdef __USE_ISOC99
>  /* Compute base-2 exponential of X.  */
> -__MATHCALL (exp2,, (_Mdouble_ __x));
> +__MATHCALL_VEC (exp2,, (_Mdouble_ __x));
>  
>  /* Compute base-2 logarithm of X.  */
>  __MATHCALL (log2,, (_Mdouble_ __x));
> diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> index 12bb03245b..1717f2dee9 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> @@ -49,32 +49,40 @@ GLIBC_2.22 _ZGVeN8vvv_sincos F
>  GLIBC_2.35 _ZGVbN2v_acos F
>  GLIBC_2.35 _ZGVbN2v_asin F
>  GLIBC_2.35 _ZGVbN2v_atan F
> +GLIBC_2.35 _ZGVbN2v_exp2 F
>  GLIBC_2.35 _ZGVbN2vv_hypot F
>  GLIBC_2.35 _ZGVbN4v_acosf F
>  GLIBC_2.35 _ZGVbN4v_asinf F
>  GLIBC_2.35 _ZGVbN4v_atanf F
> +GLIBC_2.35 _ZGVbN4v_exp2f F
>  GLIBC_2.35 _ZGVbN4vv_hypotf F
>  GLIBC_2.35 _ZGVcN4v_acos F
>  GLIBC_2.35 _ZGVcN4v_asin F
>  GLIBC_2.35 _ZGVcN4v_atan F
> +GLIBC_2.35 _ZGVcN4v_exp2 F
>  GLIBC_2.35 _ZGVcN4vv_hypot F
>  GLIBC_2.35 _ZGVcN8v_acosf F
>  GLIBC_2.35 _ZGVcN8v_asinf F
>  GLIBC_2.35 _ZGVcN8v_atanf F
> +GLIBC_2.35 _ZGVcN8v_exp2f F
>  GLIBC_2.35 _ZGVcN8vv_hypotf F
>  GLIBC_2.35 _ZGVdN4v_acos F
>  GLIBC_2.35 _ZGVdN4v_asin F
>  GLIBC_2.35 _ZGVdN4v_atan F
> +GLIBC_2.35 _ZGVdN4v_exp2 F
>  GLIBC_2.35 _ZGVdN4vv_hypot F
>  GLIBC_2.35 _ZGVdN8v_acosf F
>  GLIBC_2.35 _ZGVdN8v_asinf F
>  GLIBC_2.35 _ZGVdN8v_atanf F
> +GLIBC_2.35 _ZGVdN8v_exp2f F
>  GLIBC_2.35 _ZGVdN8vv_hypotf F
>  GLIBC_2.35 _ZGVeN16v_acosf F
>  GLIBC_2.35 _ZGVeN16v_asinf F
>  GLIBC_2.35 _ZGVeN16v_atanf F
> +GLIBC_2.35 _ZGVeN16v_exp2f F
>  GLIBC_2.35 _ZGVeN16vv_hypotf F
>  GLIBC_2.35 _ZGVeN8v_acos F
>  GLIBC_2.35 _ZGVeN8v_asin F
>  GLIBC_2.35 _ZGVeN8v_atan F
> +GLIBC_2.35 _ZGVeN8v_exp2 F
>  GLIBC_2.35 _ZGVeN8vv_hypot F
> diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
> index 437977c5fd..c7a972521b 100644
> --- a/sysdeps/x86/fpu/bits/math-vector.h
> +++ b/sysdeps/x86/fpu/bits/math-vector.h
> @@ -74,6 +74,10 @@
>  #  define __DECL_SIMD_hypot __DECL_SIMD_x86_64
>  #  undef __DECL_SIMD_hypotf
>  #  define __DECL_SIMD_hypotf __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_exp2
> +#  define __DECL_SIMD_exp2 __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_exp2f
> +#  define __DECL_SIMD_exp2f __DECL_SIMD_x86_64
>  
>  # endif
>  #endif
> diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> index cda31479a6..0994e6dfac 100644
> --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> @@ -36,6 +36,8 @@
>  !GCC$ builtin (asinf) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (hypot) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (hypotf) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (exp2) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (exp2f) attributes simd (notinbranch) if('x86_64')
>  
>  !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
> @@ -57,3 +59,5 @@
>  !GCC$ builtin (asinf) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (hypot) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (hypotf) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (exp2) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (exp2f) attributes simd (notinbranch) if('x32')
> diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> index 7769a02731..03b2364417 100644
> --- a/sysdeps/x86_64/fpu/Makeconfig
> +++ b/sysdeps/x86_64/fpu/Makeconfig
> @@ -27,6 +27,7 @@ libmvec-funcs = \
>    atan \
>    cos \
>    exp \
> +  exp2 \
>    hypot \
>    log \
>    pow \
> diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> index e359e5dc2c..12b7ad1830 100644
> --- a/sysdeps/x86_64/fpu/Versions
> +++ b/sysdeps/x86_64/fpu/Versions
> @@ -17,10 +17,12 @@ libmvec {
>      _ZGVbN2v_acos; _ZGVcN4v_acos; _ZGVdN4v_acos; _ZGVeN8v_acos;
>      _ZGVbN2v_asin; _ZGVcN4v_asin; _ZGVdN4v_asin; _ZGVeN8v_asin;
>      _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan;
> +    _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
>      _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
>      _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
>      _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
>      _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf;
> +    _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
>      _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf;
>    }
>  }
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index a7513ec94e..bc4479ad39 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -1276,6 +1276,26 @@ float: 1
>  float128: 2
>  ldouble: 1
>  
> +Function: "exp2_vlen16":
> +float: 1
> +
> +Function: "exp2_vlen2":
> +double: 1
> +
> +Function: "exp2_vlen4":
> +double: 1
> +float: 1
> +
> +Function: "exp2_vlen4_avx2":
> +double: 1
> +
> +Function: "exp2_vlen8":
> +double: 1
> +float: 1
> +
> +Function: "exp2_vlen8_avx2":
> +float: 1
> +
>  Function: "exp_downward":
>  double: 1
>  float: 1
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core-sse2.S
> new file mode 100644
> index 0000000000..330260baaa
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized exp2, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN2v_exp2 _ZGVbN2v_exp2_sse2
> +#include "../svml_d_exp22_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core.c
> new file mode 100644
> index 0000000000..e0cf198030
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized exp2, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN2v_exp2
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN2v_exp2, __GI__ZGVbN2v_exp2, __redirect__ZGVbN2v_exp2)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core_sse4.S
> new file mode 100644
> index 0000000000..7388c242f6
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp22_core_sse4.S
> @@ -0,0 +1,325 @@
> +/* Function exp2 vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   exp2(x)  = 2^n * T[j] * (1 + P(y))
> + *   where
> + *        x = m*(1/K) + y,    y in [-1/K..1/K]
> + *        m = n*K + j,           m,n,j - signed integer, j in [-K/2..K/2]
> + *
> + *        values of 2^j/K are tabulated
> + *
> + *        P(y) is a minimax polynomial approximation of exp2(x)-1
> + *        on small interval [-1/K..1/K]
> + *
> + *  Special cases:
> + *
> + *   exp2(NaN)  = NaN
> + *   exp2(+INF) = +INF
> + *   exp2(-INF) = 0
> + *   exp2(x)    = 1 for subnormals
> + *   For IEEE double
> + *     if x >= 1024.0 then exp2(x) overflows
> + *     if x < -1076.0 then exp2(x) underflows
> + *
> + */
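
A hedged scalar sketch of the reduction just described, with K = 128 as in
the table below; the coefficients here are the plain Taylor coefficients of
2^y, standing in for the tuned minimax constants _dPC1.._dPC4, and
init_table () merely rebuilds the tabulated 2^(j/K) values (_dbT) at run
time for illustration:

#include <math.h>
#include <stdint.h>

#define K 128

static double T[K];

static void
init_table (void)
{
  for (int j = 0; j < K; j++)
    T[j] = exp2 ((double) j / K);	/* 2^(j/K), as in _dbT */
}

static double
exp2_reduced (double x)
{
  /* x = m/K + y, m = n*K + j, |y| <= 1/(2K).  */
  double m = nearbyint (x * K);
  double y = x - m / K;
  int64_t mi = (int64_t) m;
  int64_t n = mi >> 7;			/* mi / K (arithmetic shift) */
  int64_t j = mi & (K - 1);		/* mi mod K, in [0, K-1] */
  /* 2^y - 1 ~ y*ln2 + (y*ln2)^2/2 + ... on the small interval.  */
  double p = y * (0.693147180559945
		  + y * (0.240226506959101
			 + y * (0.055504108664822
				+ y * 0.009618129107629)));
  return ldexp (T[j] * (1.0 + p), (int) n);
}
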
> +
> +/* Offsets for data table __svml_dexp2_data_internal
> + */
> +#define _dbT                          	0
> +#define _dbShifter                    	1024
> +#define _dPC1                         	1040
> +#define _dPC2                         	1056
> +#define _dPC3                         	1072
> +#define _dPC4                         	1088
> +#define _lIndexMask                   	1104
> +#define _iAbsMask                     	1120
> +#define _iDomainRange                 	1136
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN2v_exp2_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +
> +/*  R  */
> +        movaps    %xmm0, %xmm7
> +        movups    _dbShifter+__svml_dexp2_data_internal(%rip), %xmm1
> +
> +/* out, basePtr, iIndex, iBaseOfs, iSize, iGran, iOfs */
> +        lea       __svml_dexp2_data_internal(%rip), %rsi
> +
> +/*  Load argument  */
> +        movaps    %xmm1, %xmm10
> +        addpd     %xmm0, %xmm10
> +        movaps    %xmm10, %xmm6
> +        subpd     %xmm1, %xmm6
> +        subpd     %xmm6, %xmm7
> +
> +/*
> + *  Polynomial
> + * poly(dN) = a1*dR+...+a4*dR^4
> + */
> +        movups    _dPC4+__svml_dexp2_data_internal(%rip), %xmm8
> +        mulpd     %xmm7, %xmm8
> +        addpd     _dPC3+__svml_dexp2_data_internal(%rip), %xmm8
> +        mulpd     %xmm7, %xmm8
> +        addpd     _dPC2+__svml_dexp2_data_internal(%rip), %xmm8
> +        movdqu    _lIndexMask+__svml_dexp2_data_internal(%rip), %xmm9
> +
> +/*  Index and lookup  */
> +        movdqa    %xmm9, %xmm5
> +        pandn     %xmm10, %xmm9
> +        pand      %xmm10, %xmm5
> +
> +/*  2^N  */
> +        psllq     $45, %xmm9
> +        movd      %xmm5, %eax
> +        movq      _iAbsMask+__svml_dexp2_data_internal(%rip), %xmm2
> +
> +/* Check for overflow/underflow  */
> +        pshufd    $221, %xmm0, %xmm4
> +        pextrw    $4, %xmm5, %ecx
> +
> +/* a1+...+a4*dR^3 ! */
> +        mulpd     %xmm7, %xmm8
> +        shll      $3, %eax
> +        pand      %xmm2, %xmm4
> +        shll      $3, %ecx
> +        movq      (%rsi,%rax), %xmm1
> +        movhpd    (%rsi,%rcx), %xmm1
> +
> +/* dR=dR*dT */
> +        mulpd     %xmm1, %xmm7
> +        addpd     _dPC1+__svml_dexp2_data_internal(%rip), %xmm8
> +
> +/*
> + *  Reconstruction
> + * exp2 = {2^N later}*(Tj+Tj*poly)
> + * dN = dT+dT*dR*(a1+...+a4*dR^3)
> + */
> +        mulpd     %xmm7, %xmm8
> +        addpd     %xmm8, %xmm1
> +        movq      _iDomainRange+__svml_dexp2_data_internal(%rip), %xmm3
> +        pcmpgtd   %xmm3, %xmm4
> +        movmskps  %xmm4, %edx
> +
> +/* quick 2^N */
> +        paddq     %xmm9, %xmm1
> +        andl      $3, %edx
> +
> +/*  Finish   */
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm1
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movaps    %xmm1, %xmm0
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm0, 32(%rsp)
> +        movups    %xmm1, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm1
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $2, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm1
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm1
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      exp2@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 48(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN2v_exp2_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_dexp2_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(16)) VUINT32 _dbT[(1<<7)][2];
> +        __declspec(align(16)) VUINT32 _dbShifter[2][2];
> +        __declspec(align(16)) VUINT32 _dPC1[2][2];
> +        __declspec(align(16)) VUINT32 _dPC2[2][2];
> +        __declspec(align(16)) VUINT32 _dPC3[2][2];
> +        __declspec(align(16)) VUINT32 _dPC4[2][2];
> +        __declspec(align(16)) VUINT32 _lIndexMask[2][2];
> +        __declspec(align(16)) VUINT32 _iAbsMask[4][1];
> +        __declspec(align(16)) VUINT32 _iDomainRange[4][1];
> +} __svml_dexp2_data_internal;
> +#endif
> +__svml_dexp2_data_internal:
> +        /*== _dbT ==*/
> +        .quad 0x3ff0000000000000, 0x3ff0163da9fb3335   /*2^( 0 /128),2^( 1 /128)*/
> +        .quad 0x3ff02c9a3e778061, 0x3ff04315e86e7f85   /*2^( 2 /128),2^( 3 /128)*/
> +        .quad 0x3ff059b0d3158574, 0x3ff0706b29ddf6de   /*2^( 4 /128),2^( 5 /128)*/
> +        .quad 0x3ff0874518759bc8, 0x3ff09e3ecac6f383   /*2^( 6 /128),2^( 7 /128)*/
> +        .quad 0x3ff0b5586cf9890f, 0x3ff0cc922b7247f7   /*2^( 8 /128),2^( 9 /128)*/
> +        .quad 0x3ff0e3ec32d3d1a2, 0x3ff0fb66affed31b   /*2^( 10 /128),2^( 11 /128)*/
> +        .quad 0x3ff11301d0125b51, 0x3ff12abdc06c31cc   /*2^( 12 /128),2^( 13 /128)*/
> +        .quad 0x3ff1429aaea92de0, 0x3ff15a98c8a58e51   /*2^( 14 /128),2^( 15 /128)*/
> +        .quad 0x3ff172b83c7d517b, 0x3ff18af9388c8dea   /*2^( 16 /128),2^( 17 /128)*/
> +        .quad 0x3ff1a35beb6fcb75, 0x3ff1bbe084045cd4   /*2^( 18 /128),2^( 19 /128)*/
> +        .quad 0x3ff1d4873168b9aa, 0x3ff1ed5022fcd91d   /*2^( 20 /128),2^( 21 /128)*/
> +        .quad 0x3ff2063b88628cd6, 0x3ff21f49917ddc96   /*2^( 22 /128),2^( 23 /128)*/
> +        .quad 0x3ff2387a6e756238, 0x3ff251ce4fb2a63f   /*2^( 24 /128),2^( 25 /128)*/
> +        .quad 0x3ff26b4565e27cdd, 0x3ff284dfe1f56381   /*2^( 26 /128),2^( 27 /128)*/
> +        .quad 0x3ff29e9df51fdee1, 0x3ff2b87fd0dad990   /*2^( 28 /128),2^( 29 /128)*/
> +        .quad 0x3ff2d285a6e4030b, 0x3ff2ecafa93e2f56   /*2^( 30 /128),2^( 31 /128)*/
> +        .quad 0x3ff306fe0a31b715, 0x3ff32170fc4cd831   /*2^( 32 /128),2^( 33 /128)*/
> +        .quad 0x3ff33c08b26416ff, 0x3ff356c55f929ff1   /*2^( 34 /128),2^( 35 /128)*/
> +        .quad 0x3ff371a7373aa9cb, 0x3ff38cae6d05d866   /*2^( 36 /128),2^( 37 /128)*/
> +        .quad 0x3ff3a7db34e59ff7, 0x3ff3c32dc313a8e5   /*2^( 38 /128),2^( 39 /128)*/
> +        .quad 0x3ff3dea64c123422, 0x3ff3fa4504ac801c   /*2^( 40 /128),2^( 41 /128)*/
> +        .quad 0x3ff4160a21f72e2a, 0x3ff431f5d950a897   /*2^( 42 /128),2^( 43 /128)*/
> +        .quad 0x3ff44e086061892d, 0x3ff46a41ed1d0057   /*2^( 44 /128),2^( 45 /128)*/
> +        .quad 0x3ff486a2b5c13cd0, 0x3ff4a32af0d7d3de   /*2^( 46 /128),2^( 47 /128)*/
> +        .quad 0x3ff4bfdad5362a27, 0x3ff4dcb299fddd0d   /*2^( 48 /128),2^( 49 /128)*/
> +        .quad 0x3ff4f9b2769d2ca7, 0x3ff516daa2cf6642   /*2^( 50 /128),2^( 51 /128)*/
> +        .quad 0x3ff5342b569d4f82, 0x3ff551a4ca5d920f   /*2^( 52 /128),2^( 53 /128)*/
> +        .quad 0x3ff56f4736b527da, 0x3ff58d12d497c7fd   /*2^( 54 /128),2^( 55 /128)*/
> +        .quad 0x3ff5ab07dd485429, 0x3ff5c9268a5946b7   /*2^( 56 /128),2^( 57 /128)*/
> +        .quad 0x3ff5e76f15ad2148, 0x3ff605e1b976dc09   /*2^( 58 /128),2^( 59 /128)*/
> +        .quad 0x3ff6247eb03a5585, 0x3ff6434634ccc320   /*2^( 60 /128),2^( 61 /128)*/
> +        .quad 0x3ff6623882552225, 0x3ff68155d44ca973   /*2^( 62 /128),2^( 63 /128)*/
> +        .quad 0x3ff6a09e667f3bcd, 0x3ff6c012750bdabf   /*2^( 64 /128),2^( 65 /128)*/
> +        .quad 0x3ff6dfb23c651a2f, 0x3ff6ff7df9519484   /*2^( 66 /128),2^( 67 /128)*/
> +        .quad 0x3ff71f75e8ec5f74, 0x3ff73f9a48a58174   /*2^( 68 /128),2^( 69 /128)*/
> +        .quad 0x3ff75feb564267c9, 0x3ff780694fde5d3f   /*2^( 70 /128),2^( 71 /128)*/
> +        .quad 0x3ff7a11473eb0187, 0x3ff7c1ed0130c132   /*2^( 72 /128),2^( 73 /128)*/
> +        .quad 0x3ff7e2f336cf4e62, 0x3ff80427543e1a12   /*2^( 74 /128),2^( 75 /128)*/
> +        .quad 0x3ff82589994cce13, 0x3ff8471a4623c7ad   /*2^( 76 /128),2^( 77 /128)*/
> +        .quad 0x3ff868d99b4492ed, 0x3ff88ac7d98a6699   /*2^( 78 /128),2^( 79 /128)*/
> +        .quad 0x3ff8ace5422aa0db, 0x3ff8cf3216b5448c   /*2^( 80 /128),2^( 81 /128)*/
> +        .quad 0x3ff8f1ae99157736, 0x3ff9145b0b91ffc6   /*2^( 82 /128),2^( 83 /128)*/
> +        .quad 0x3ff93737b0cdc5e5, 0x3ff95a44cbc8520f   /*2^( 84 /128),2^( 85 /128)*/
> +        .quad 0x3ff97d829fde4e50, 0x3ff9a0f170ca07ba   /*2^( 86 /128),2^( 87 /128)*/
> +        .quad 0x3ff9c49182a3f090, 0x3ff9e86319e32323   /*2^( 88 /128),2^( 89 /128)*/
> +        .quad 0x3ffa0c667b5de565, 0x3ffa309bec4a2d33   /*2^( 90 /128),2^( 91 /128)*/
> +        .quad 0x3ffa5503b23e255d, 0x3ffa799e1330b358   /*2^( 92 /128),2^( 93 /128)*/
> +        .quad 0x3ffa9e6b5579fdbf, 0x3ffac36bbfd3f37a   /*2^( 94 /128),2^( 95 /128)*/
> +        .quad 0x3ffae89f995ad3ad, 0x3ffb0e07298db666   /*2^( 96 /128),2^( 97 /128)*/
> +        .quad 0x3ffb33a2b84f15fb, 0x3ffb59728de5593a   /*2^( 98 /128),2^( 99 /128)*/
> +        .quad 0x3ffb7f76f2fb5e47, 0x3ffba5b030a1064a   /*2^( 100 /128),2^( 101 /128)*/
> +        .quad 0x3ffbcc1e904bc1d2, 0x3ffbf2c25bd71e09   /*2^( 102 /128),2^( 103 /128)*/
> +        .quad 0x3ffc199bdd85529c, 0x3ffc40ab5fffd07a   /*2^( 104 /128),2^( 105 /128)*/
> +        .quad 0x3ffc67f12e57d14b, 0x3ffc8f6d9406e7b5   /*2^( 106 /128),2^( 107 /128)*/
> +        .quad 0x3ffcb720dcef9069, 0x3ffcdf0b555dc3fa   /*2^( 108 /128),2^( 109 /128)*/
> +        .quad 0x3ffd072d4a07897c, 0x3ffd2f87080d89f2   /*2^( 110 /128),2^( 111 /128)*/
> +        .quad 0x3ffd5818dcfba487, 0x3ffd80e316c98398   /*2^( 112 /128),2^( 113 /128)*/
> +        .quad 0x3ffda9e603db3285, 0x3ffdd321f301b460   /*2^( 114 /128),2^( 115 /128)*/
> +        .quad 0x3ffdfc97337b9b5f, 0x3ffe264614f5a129   /*2^( 116 /128),2^( 117 /128)*/
> +        .quad 0x3ffe502ee78b3ff6, 0x3ffe7a51fbc74c83   /*2^( 118 /128),2^( 119 /128)*/
> +        .quad 0x3ffea4afa2a490da, 0x3ffecf482d8e67f1   /*2^( 120 /128),2^( 121 /128)*/
> +        .quad 0x3ffefa1bee615a27, 0x3fff252b376bba97   /*2^( 122 /128),2^( 123 /128)*/
> +        .quad 0x3fff50765b6e4540, 0x3fff7bfdad9cbe14   /*2^( 124 /128),2^( 125 /128)*/
> +        .quad 0x3fffa7c1819e90d8, 0x3fffd3c22b8f71f1 /*2^( 126 /128),2^( 127 /128)*/
> +        .align 16
> +        .quad 0x42c8000000000000, 0x42c8000000000000  /* _dbShifter: exponent 0x433-7=0x42c, i.e. the integer shifter scaled down by K = 2^7 */
> +        //log2(relerr) = -53.547756365162
> +        .align 16
> +        .quad 0x3fe62e42fefa3685, 0x3fe62e42fefa3685 /* _dPC1 */
> +        .align 16
> +        .quad 0x3fcebfbdff82ca48, 0x3fcebfbdff82ca48 /* _dPC2 */
> +        .align 16
> +        .quad 0x3fac6b09b180f045, 0x3fac6b09b180f045 /* _dPC3 */
> +        .align 16
> +        .quad 0x3f83b2ab5bb1268f, 0x3f83b2ab5bb1268f /* _dPC4 */
> +        .align 16
> +        .quad 0x000000000000007f, 0x000000000000007f          /* _lIndexMask =(2^K-1)*/
> +        .align 16
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff          /* _iAbsMask */
> +        .align 16
> +        .long 0x408fefff, 0x408fefff, 0x408fefff, 0x408fefff /* _iDomainRange */
> +        .align 16
> +        .type	__svml_dexp2_data_internal,@object
> +        .size	__svml_dexp2_data_internal,.-__svml_dexp2_data_internal
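
Two bit-level tricks in the code above deserve a note; the following is a
loose C model under round-to-nearest for in-range inputs, not the exact
register flow (the real code pulls the table index straight out of the
mantissa bits of the shifted sum rather than converting back to an integer):

#include <stdint.h>
#include <string.h>

/* Adding and subtracting the large _dbShifter constant (1.5 * 2^45) forces
   the sum to be rounded so that x is quantized to a multiple of 1/128.  */
static int64_t
round_to_128th (double x)
{
  const double shifter = 0x1.8p+45;	/* _dbShifter */
  double r = (x + shifter) - shifter;	/* r = m/128 with m = round (x*128) */
  return (int64_t) (r * 128.0);
}

/* "Quick 2^N": adding n directly to the biased-exponent field of t scales
   t by 2^n without another multiply, the effect of the psllq/paddq pair.  */
static double
quick_scale (double t, int64_t n)
{
  uint64_t bits;
  memcpy (&bits, &t, sizeof bits);
  bits += (uint64_t) n << 52;
  memcpy (&t, &bits, sizeof bits);
  return t;
}
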
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core-sse.S
> new file mode 100644
> index 0000000000..51c5de1100
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized exp2, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN4v_exp2 _ZGVdN4v_exp2_sse_wrapper
> +#include "../svml_d_exp24_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core.c
> new file mode 100644
> index 0000000000..bb979afde6
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized exp2, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN4v_exp2
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN4v_exp2, __GI__ZGVdN4v_exp2, __redirect__ZGVdN4v_exp2)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
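For context on how this dispatcher gets used: once the headers added earlier in the series declare the SIMD variant of exp2, a plain loop can be auto-vectorized into calls to _ZGVdN4v_exp2, which the ifunc above resolves to the AVX2 core or the SSE wrapper at load time. A minimal sketch (file name and flags are only indicative; a GCC new enough to see the new SIMD declarations is assumed):

  /* vexp2.c -- built with, e.g., gcc -O3 -ffast-math -march=haswell,
     the loop below is expected to become a call to _ZGVdN4v_exp2.  */
  #include <math.h>

  void
  vexp2 (double *restrict out, const double *restrict in, int n)
  {
    for (int i = 0; i < n; i++)
      out[i] = exp2 (in[i]);
  }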
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core_avx2.S
> new file mode 100644
> index 0000000000..6aaadafeeb
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp24_core_avx2.S
> @@ -0,0 +1,341 @@
> +/* Function exp2 vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   exp2(x)  = 2^n * T[j] * (1 + P(y))
> + *   where
> + *        x = m*(1/K) + y,    y in [-1/K..1/K]
> + *        m = n*K + j,           m,n,j - signed integer, j in [-K/2..K/2]
> + *
> + *        values of 2^j/K are tabulated
> + *
> + *        P(y) is a minimax polynomial approximation of exp2(x)-1
> + *        on small interval [-1/K..1/K]
> + *
> + *  Special cases:
> + *
> + *   exp2(NaN)  = NaN
> + *   exp2(+INF) = +INF
> + *   exp2(-INF) = 0
> + *   exp2(x)    = 1 for subnormals
> + *   For IEEE double
> + *     if x >= 1024.0 then exp2(x) overflows
> + *     if x < -1076.0 then exp2(x) underflows
> + *
> + */
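A scalar C model of this K = 128 decomposition may help when reading the vector code below. It is only a sketch: T[j] is recomputed with exp2 instead of being read from the _dbT table, and m is obtained with nearbyint where the vector code adds _dbShifter; the polynomial coefficients are the _dPC1.._dPC4 values from the data table.

  #include <math.h>

  static double
  exp2_model (double x)
  {
    const int K = 128;
    double m = nearbyint (x * K);              /* m = n*K + j */
    double y = x - m / K;                      /* reduced argument */
    int mi = (int) m;
    int j = mi & (K - 1);                      /* table index */
    int n = (mi - j) / K;                      /* power-of-two scale */
    double Tj = exp2 ((double) j / K);         /* stands in for _dbT[j] */
    double p = y * (0x1.62e42fefa3685p-1       /* _dPC1 */
               + y * (0x1.ebfbdff82ca48p-3     /* _dPC2 */
               + y * (0x1.c6b09b180f045p-5     /* _dPC3 */
               + y * 0x1.3b2ab5bb1268fp-7)));  /* _dPC4 */
    /* The vector code applies 2^n by adding n << 52 to the result bits
       (vpsllq $45 + vpaddq) instead of calling ldexp.  */
    return ldexp (Tj * (1.0 + p), n);
  }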
> +
> +/* Offsets for data table __svml_dexp2_data_internal
> + */
> +#define _dbT                          	0
> +#define _dbShifter                    	1024
> +#define _dPC1                         	1056
> +#define _dPC2                         	1088
> +#define _dPC3                         	1120
> +#define _dPC4                         	1152
> +#define _lIndexMask                   	1184
> +#define _iAbsMask                     	1216
> +#define _iDomainRange                 	1248
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN4v_exp2_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +
> +/* out, basePtr, iIndex, iBaseOfs, iSize, iGran, iOfs */
> +        lea       __svml_dexp2_data_internal(%rip), %r8
> +        vmovupd   _dbShifter+__svml_dexp2_data_internal(%rip), %ymm4
> +        vmovupd   _lIndexMask+__svml_dexp2_data_internal(%rip), %ymm3
> +        vmovapd   %ymm0, %ymm1
> +
> +/*  Load argument  */
> +        vaddpd    %ymm4, %ymm1, %ymm2
> +        vsubpd    %ymm4, %ymm2, %ymm0
> +
> +/*  Index and lookup  */
> +        vandps    %ymm3, %ymm2, %ymm9
> +        vpandn    %ymm2, %ymm3, %ymm2
> +
> +/*  2^N  */
> +        vpsllq    $45, %ymm2, %ymm3
> +
> +/*  R  */
> +        vsubpd    %ymm0, %ymm1, %ymm15
> +
> +/* Check for overflow/underflow  */
> +        vextractf128 $1, %ymm1, %xmm5
> +
> +/*
> + *  Polynomial
> + * poly(dN) = a1*dR+...+a4*dR^4
> + */
> +        vmovupd   _dPC4+__svml_dexp2_data_internal(%rip), %ymm0
> +        vshufps   $221, %xmm5, %xmm1, %xmm6
> +        vandps    _iAbsMask+__svml_dexp2_data_internal(%rip), %xmm6, %xmm7
> +        vpcmpgtd  _iDomainRange+__svml_dexp2_data_internal(%rip), %xmm7, %xmm8
> +        vfmadd213pd _dPC3+__svml_dexp2_data_internal(%rip), %ymm15, %ymm0
> +        vmovmskps %xmm8, %eax
> +        vfmadd213pd _dPC2+__svml_dexp2_data_internal(%rip), %ymm15, %ymm0
> +
> +/* a1+...+a4*dR^3 ! */
> +        vfmadd213pd _dPC1+__svml_dexp2_data_internal(%rip), %ymm15, %ymm0
> +        vextractf128 $1, %ymm9, %xmm12
> +        vmovd     %xmm9, %edx
> +        vmovd     %xmm12, %esi
> +        shll      $3, %edx
> +        vpextrd   $2, %xmm9, %ecx
> +        shll      $3, %esi
> +        vpextrd   $2, %xmm12, %edi
> +        shll      $3, %ecx
> +        vmovq     (%r8,%rdx), %xmm10
> +        shll      $3, %edi
> +        vmovq     (%r8,%rsi), %xmm13
> +        vmovhpd   (%r8,%rcx), %xmm10, %xmm11
> +        vmovhpd   (%r8,%rdi), %xmm13, %xmm14
> +        vinsertf128 $1, %xmm14, %ymm11, %ymm4
> +
> +/* dR=dR*dT */
> +        vmulpd    %ymm15, %ymm4, %ymm15
> +
> +/*
> + *  Reconstruction
> + * exp2 = {2^N later}*(Tj+Tj*poly)
> + * dN = dT+dT*dR*(a1+...+a4*dR^3)
> + */
> +        vfmadd213pd %ymm4, %ymm15, %ymm0
> +
> +/* quick 2^N */
> +        vpaddq    %ymm3, %ymm0, %ymm0
> +
> +/*  Finish   */
> +        testl     %eax, %eax
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm1
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovupd   %ymm1, 32(%rsp)
> +        vmovupd   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0
> +
> +        xorl      %edx, %edx
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovupd   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      exp2@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 64(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN4v_exp2_avx2)
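The L(SPECIAL_VALUES_BRANCH) / L(RANGEMASK_CHECK) / L(SCALAR_MATH_CALL) sequence above repeats in every function of this series. In C terms it amounts to roughly the loop below, with the spilled arguments and vector results standing in for the two arrays:

  #include <math.h>

  static void
  exp2_fixup_lanes (const double arg[4], double res[4], unsigned int mask)
  {
    for (unsigned int i = 0; i < 4; i++)
      if (mask & (1u << i))      /* btl %r12d, %r13d */
        res[i] = exp2 (arg[i]);  /* call exp2@PLT */
  }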
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_dexp2_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(32)) VUINT32 _dbT[(1<<7)][2];
> +        __declspec(align(32)) VUINT32 _dbShifter[4][2];
> +        __declspec(align(32)) VUINT32 _dPC1[4][2];
> +        __declspec(align(32)) VUINT32 _dPC2[4][2];
> +        __declspec(align(32)) VUINT32 _dPC3[4][2];
> +        __declspec(align(32)) VUINT32 _dPC4[4][2];
> +        __declspec(align(32)) VUINT32 _lIndexMask[4][2];
> +        __declspec(align(32)) VUINT32 _iAbsMask[8][1];
> +        __declspec(align(32)) VUINT32 _iDomainRange[8][1];
> +} __svml_dexp2_data_internal;
> +#endif
> +__svml_dexp2_data_internal:
> +        /*== _dbT ==*/
> +        .quad 0x3ff0000000000000, 0x3ff0163da9fb3335   /*2^( 0 /128),2^( 1 /128)*/
> +        .quad 0x3ff02c9a3e778061, 0x3ff04315e86e7f85   /*2^( 2 /128),2^( 3 /128)*/
> +        .quad 0x3ff059b0d3158574, 0x3ff0706b29ddf6de   /*2^( 4 /128),2^( 5 /128)*/
> +        .quad 0x3ff0874518759bc8, 0x3ff09e3ecac6f383   /*2^( 6 /128),2^( 7 /128)*/
> +        .quad 0x3ff0b5586cf9890f, 0x3ff0cc922b7247f7   /*2^( 8 /128),2^( 9 /128)*/
> +        .quad 0x3ff0e3ec32d3d1a2, 0x3ff0fb66affed31b   /*2^( 10 /128),2^( 11 /128)*/
> +        .quad 0x3ff11301d0125b51, 0x3ff12abdc06c31cc   /*2^( 12 /128),2^( 13 /128)*/
> +        .quad 0x3ff1429aaea92de0, 0x3ff15a98c8a58e51   /*2^( 14 /128),2^( 15 /128)*/
> +        .quad 0x3ff172b83c7d517b, 0x3ff18af9388c8dea   /*2^( 16 /128),2^( 17 /128)*/
> +        .quad 0x3ff1a35beb6fcb75, 0x3ff1bbe084045cd4   /*2^( 18 /128),2^( 19 /128)*/
> +        .quad 0x3ff1d4873168b9aa, 0x3ff1ed5022fcd91d   /*2^( 20 /128),2^( 21 /128)*/
> +        .quad 0x3ff2063b88628cd6, 0x3ff21f49917ddc96   /*2^( 22 /128),2^( 23 /128)*/
> +        .quad 0x3ff2387a6e756238, 0x3ff251ce4fb2a63f   /*2^( 24 /128),2^( 25 /128)*/
> +        .quad 0x3ff26b4565e27cdd, 0x3ff284dfe1f56381   /*2^( 26 /128),2^( 27 /128)*/
> +        .quad 0x3ff29e9df51fdee1, 0x3ff2b87fd0dad990   /*2^( 28 /128),2^( 29 /128)*/
> +        .quad 0x3ff2d285a6e4030b, 0x3ff2ecafa93e2f56   /*2^( 30 /128),2^( 31 /128)*/
> +        .quad 0x3ff306fe0a31b715, 0x3ff32170fc4cd831   /*2^( 32 /128),2^( 33 /128)*/
> +        .quad 0x3ff33c08b26416ff, 0x3ff356c55f929ff1   /*2^( 34 /128),2^( 35 /128)*/
> +        .quad 0x3ff371a7373aa9cb, 0x3ff38cae6d05d866   /*2^( 36 /128),2^( 37 /128)*/
> +        .quad 0x3ff3a7db34e59ff7, 0x3ff3c32dc313a8e5   /*2^( 38 /128),2^( 39 /128)*/
> +        .quad 0x3ff3dea64c123422, 0x3ff3fa4504ac801c   /*2^( 40 /128),2^( 41 /128)*/
> +        .quad 0x3ff4160a21f72e2a, 0x3ff431f5d950a897   /*2^( 42 /128),2^( 43 /128)*/
> +        .quad 0x3ff44e086061892d, 0x3ff46a41ed1d0057   /*2^( 44 /128),2^( 45 /128)*/
> +        .quad 0x3ff486a2b5c13cd0, 0x3ff4a32af0d7d3de   /*2^( 46 /128),2^( 47 /128)*/
> +        .quad 0x3ff4bfdad5362a27, 0x3ff4dcb299fddd0d   /*2^( 48 /128),2^( 49 /128)*/
> +        .quad 0x3ff4f9b2769d2ca7, 0x3ff516daa2cf6642   /*2^( 50 /128),2^( 51 /128)*/
> +        .quad 0x3ff5342b569d4f82, 0x3ff551a4ca5d920f   /*2^( 52 /128),2^( 53 /128)*/
> +        .quad 0x3ff56f4736b527da, 0x3ff58d12d497c7fd   /*2^( 54 /128),2^( 55 /128)*/
> +        .quad 0x3ff5ab07dd485429, 0x3ff5c9268a5946b7   /*2^( 56 /128),2^( 57 /128)*/
> +        .quad 0x3ff5e76f15ad2148, 0x3ff605e1b976dc09   /*2^( 58 /128),2^( 59 /128)*/
> +        .quad 0x3ff6247eb03a5585, 0x3ff6434634ccc320   /*2^( 60 /128),2^( 61 /128)*/
> +        .quad 0x3ff6623882552225, 0x3ff68155d44ca973   /*2^( 62 /128),2^( 63 /128)*/
> +        .quad 0x3ff6a09e667f3bcd, 0x3ff6c012750bdabf   /*2^( 64 /128),2^( 65 /128)*/
> +        .quad 0x3ff6dfb23c651a2f, 0x3ff6ff7df9519484   /*2^( 66 /128),2^( 67 /128)*/
> +        .quad 0x3ff71f75e8ec5f74, 0x3ff73f9a48a58174   /*2^( 68 /128),2^( 69 /128)*/
> +        .quad 0x3ff75feb564267c9, 0x3ff780694fde5d3f   /*2^( 70 /128),2^( 71 /128)*/
> +        .quad 0x3ff7a11473eb0187, 0x3ff7c1ed0130c132   /*2^( 72 /128),2^( 73 /128)*/
> +        .quad 0x3ff7e2f336cf4e62, 0x3ff80427543e1a12   /*2^( 74 /128),2^( 75 /128)*/
> +        .quad 0x3ff82589994cce13, 0x3ff8471a4623c7ad   /*2^( 76 /128),2^( 77 /128)*/
> +        .quad 0x3ff868d99b4492ed, 0x3ff88ac7d98a6699   /*2^( 78 /128),2^( 79 /128)*/
> +        .quad 0x3ff8ace5422aa0db, 0x3ff8cf3216b5448c   /*2^( 80 /128),2^( 81 /128)*/
> +        .quad 0x3ff8f1ae99157736, 0x3ff9145b0b91ffc6   /*2^( 82 /128),2^( 83 /128)*/
> +        .quad 0x3ff93737b0cdc5e5, 0x3ff95a44cbc8520f   /*2^( 84 /128),2^( 85 /128)*/
> +        .quad 0x3ff97d829fde4e50, 0x3ff9a0f170ca07ba   /*2^( 86 /128),2^( 87 /128)*/
> +        .quad 0x3ff9c49182a3f090, 0x3ff9e86319e32323   /*2^( 88 /128),2^( 89 /128)*/
> +        .quad 0x3ffa0c667b5de565, 0x3ffa309bec4a2d33   /*2^( 90 /128),2^( 91 /128)*/
> +        .quad 0x3ffa5503b23e255d, 0x3ffa799e1330b358   /*2^( 92 /128),2^( 93 /128)*/
> +        .quad 0x3ffa9e6b5579fdbf, 0x3ffac36bbfd3f37a   /*2^( 94 /128),2^( 95 /128)*/
> +        .quad 0x3ffae89f995ad3ad, 0x3ffb0e07298db666   /*2^( 96 /128),2^( 97 /128)*/
> +        .quad 0x3ffb33a2b84f15fb, 0x3ffb59728de5593a   /*2^( 98 /128),2^( 99 /128)*/
> +        .quad 0x3ffb7f76f2fb5e47, 0x3ffba5b030a1064a   /*2^( 100 /128),2^( 101 /128)*/
> +        .quad 0x3ffbcc1e904bc1d2, 0x3ffbf2c25bd71e09   /*2^( 102 /128),2^( 103 /128)*/
> +        .quad 0x3ffc199bdd85529c, 0x3ffc40ab5fffd07a   /*2^( 104 /128),2^( 105 /128)*/
> +        .quad 0x3ffc67f12e57d14b, 0x3ffc8f6d9406e7b5   /*2^( 106 /128),2^( 107 /128)*/
> +        .quad 0x3ffcb720dcef9069, 0x3ffcdf0b555dc3fa   /*2^( 108 /128),2^( 109 /128)*/
> +        .quad 0x3ffd072d4a07897c, 0x3ffd2f87080d89f2   /*2^( 110 /128),2^( 111 /128)*/
> +        .quad 0x3ffd5818dcfba487, 0x3ffd80e316c98398   /*2^( 112 /128),2^( 113 /128)*/
> +        .quad 0x3ffda9e603db3285, 0x3ffdd321f301b460   /*2^( 114 /128),2^( 115 /128)*/
> +        .quad 0x3ffdfc97337b9b5f, 0x3ffe264614f5a129   /*2^( 116 /128),2^( 117 /128)*/
> +        .quad 0x3ffe502ee78b3ff6, 0x3ffe7a51fbc74c83   /*2^( 118 /128),2^( 119 /128)*/
> +        .quad 0x3ffea4afa2a490da, 0x3ffecf482d8e67f1   /*2^( 120 /128),2^( 121 /128)*/
> +        .quad 0x3ffefa1bee615a27, 0x3fff252b376bba97   /*2^( 122 /128),2^( 123 /128)*/
> +        .quad 0x3fff50765b6e4540, 0x3fff7bfdad9cbe14   /*2^( 124 /128),2^( 125 /128)*/
> +        .quad 0x3fffa7c1819e90d8, 0x3fffd3c22b8f71f1 /*2^( 126 /128),2^( 127 /128)*/
> +        .align 32
> +        .quad 0x42c8000000000000, 0x42c8000000000000, 0x42c8000000000000, 0x42c8000000000000  /* _dbShifter - 0x433-7=0x42c, shifted right by K! */
> +        //log2(relerr) = -53.547756365162
> +        .align 32
> +        .quad 0x3fe62e42fefa3685, 0x3fe62e42fefa3685, 0x3fe62e42fefa3685, 0x3fe62e42fefa3685 /* _dPC1 */
> +        .align 32
> +        .quad 0x3fcebfbdff82ca48, 0x3fcebfbdff82ca48, 0x3fcebfbdff82ca48, 0x3fcebfbdff82ca48 /* _dPC2 */
> +        .align 32
> +        .quad 0x3fac6b09b180f045, 0x3fac6b09b180f045, 0x3fac6b09b180f045, 0x3fac6b09b180f045 /* _dPC3 */
> +        .align 32
> +        .quad 0x3f83b2ab5bb1268f, 0x3f83b2ab5bb1268f, 0x3f83b2ab5bb1268f, 0x3f83b2ab5bb1268f /* _dPC4 */
> +        .align 32
> +        .quad 0x000000000000007f, 0x000000000000007f, 0x000000000000007f, 0x000000000000007f          /* _lIndexMask =(2^K-1)*/
> +        .align 32
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff          /* _iAbsMask */
> +        .align 32
> +        .long 0x408fefff, 0x408fefff, 0x408fefff, 0x408fefff, 0x408fefff, 0x408fefff, 0x408fefff, 0x408fefff /* _iDomainRange */
> +        .align 32
> +        .type	__svml_dexp2_data_internal,@object
> +        .size	__svml_dexp2_data_internal,.-__svml_dexp2_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core-avx2.S
> new file mode 100644
> index 0000000000..c9c17f0aaa
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized exp2, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN8v_exp2 _ZGVeN8v_exp2_avx2_wrapper
> +#include "../svml_d_exp28_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core.c
> new file mode 100644
> index 0000000000..3be9e88e98
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized exp2, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN8v_exp2
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN8v_exp2, __GI__ZGVeN8v_exp2, __redirect__ZGVeN8v_exp2)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core_avx512.S
> new file mode 100644
> index 0000000000..90f21695f0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp28_core_avx512.S
> @@ -0,0 +1,301 @@
> +/* Function exp2 vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *     Double precision mantissa represented as: 1.b1b2b3 ... b52
> + *     Constant for double precision: S = 2^48 x 1.5
> + *
> + *     2^X = 2^Xo  x  2^{X-Xo}
> + *     2^X = 2^K  x  2^fo  x  2^{X-Xo}
> + *     2^X = 2^K  x  2^fo  x  2^r
> + *
> + *     2^K  --> Manual scaling
> + *     2^fo --> Table lookup
> + *     r    --> 1 + poly    (r = X - Xo)
> + *
> + *     Xo = K  +  fo
> + *     Xo = K  +  0.x1x2x3x4
> + *
> + *     r = X - Xo
> + *       = Vreduce(X, imm)
> + *       = X - VRndScale(X, imm),    where Xo = VRndScale(X, imm)
> + *
> + *     Rnd(S + X) = S + Xo,    where S is selected as S = 2^48 x 1.5
> + *         S + X = S + floor(X) + 0.x1x2x3x4
> + *     Rnd(S + X) = Rnd(2^48 x 1.5 + X)
> + *     (Note: 2^exp x 1.b1b2b3 ... b52,  2^{exp-52} = 2^-4 for exp=48)
> + *
> + *     exp2(x) =  2^K  x  2^fo  x (1 + poly(r)),   where 2^r = 1 + poly(r)
> + *
> + *     Scale back:
> + *     dest = src1 x 2^floor(src2)
> + *
> + *
> + */
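A scalar sketch of this reduce/scale scheme, for reference while reading the kernel (illustrative only: the 16-entry Frac_PowerD0 table is recomputed with exp2, floor stands in for the {rd-sae}/VReduce/VRndScale steps, and scalbn plays the role of vscalefpd; the coefficients are the poly_coeff1..6 values from the data table):

  #include <math.h>

  static double
  exp2_avx512_model (double x)
  {
    double xo = floor (x * 16.0) / 16.0;         /* Xo on the 2^-4 grid */
    double r = x - xo;                           /* reduced argument */
    int idx = ((int) (xo * 16.0)) & 0xf;         /* low bits of K_plus_f0 */
    double t = exp2 ((double) idx / 16.0);       /* Frac_PowerD0 entry */
    double p = r * (0x1.62e42fefa398bp-1         /* poly_coeff1 */
               + r * (0x1.ebfbdff84555ap-3       /* poly_coeff2 */
               + r * (0x1.c6b08d4ad86b9p-5       /* poly_coeff3 */
               + r * (0x1.3b2ad1b172252p-7       /* poly_coeff4 */
               + r * (0x1.5d7472713cd19p-10      /* poly_coeff5 */
               + r * 0x1.4a1d7f526371bp-13))))); /* poly_coeff6 */
    /* As in the vector code, the 2^floor(x) scaling comes last, so
       t * (1 + p) never under- or overflows prematurely.  */
    return scalbn (t * (1.0 + p), (int) floor (x));
  }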
> +
> +/* Offsets for data table __svml_dexp2_data_internal_avx512
> + */
> +#define Frac_PowerD0                  	0
> +#define poly_coeff1                   	128
> +#define poly_coeff2                   	192
> +#define poly_coeff3                   	256
> +#define poly_coeff4                   	320
> +#define poly_coeff5                   	384
> +#define poly_coeff6                   	448
> +#define add_const                     	512
> +#define AbsMask                       	576
> +#define Threshold                     	640
> +#define _lIndexMask                   	704
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN8v_exp2_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovups   poly_coeff5+__svml_dexp2_data_internal_avx512(%rip), %zmm14
> +        vmovups   poly_coeff6+__svml_dexp2_data_internal_avx512(%rip), %zmm6
> +
> +/*
> + * Reduced argument
> + * where VREDUCE is available
> + */
> +        vreducepd $65, {sae}, %zmm0, %zmm10
> +        vmovups   poly_coeff4+__svml_dexp2_data_internal_avx512(%rip), %zmm7
> +        vmovups   add_const+__svml_dexp2_data_internal_avx512(%rip), %zmm3
> +        vmovups   poly_coeff3+__svml_dexp2_data_internal_avx512(%rip), %zmm8
> +        vmovups   __svml_dexp2_data_internal_avx512(%rip), %zmm13
> +
> +/* c6*r   + c5 */
> +        vfmadd231pd {rn-sae}, %zmm10, %zmm6, %zmm14
> +        vmovups   poly_coeff2+__svml_dexp2_data_internal_avx512(%rip), %zmm9
> +        vmovups   Threshold+__svml_dexp2_data_internal_avx512(%rip), %zmm2
> +
> +/*
> + *
> + *  HA
> + * Variables and constants
> + * Load constants and vector(s)
> + */
> +        vmovups   poly_coeff1+__svml_dexp2_data_internal_avx512(%rip), %zmm11
> +
> +/* c6*r^2 + c5*r + c4 */
> +        vfmadd213pd {rn-sae}, %zmm7, %zmm10, %zmm14
> +
> +/*
> + * Integer form of K+0.b1b2b3b4 in lower bits - call K_plus_f0
> + * Mantissa of normalized double precision FP: 1.b1b2...b52
> + */
> +        vaddpd    {rd-sae}, %zmm3, %zmm0, %zmm4
> +        vandpd    AbsMask+__svml_dexp2_data_internal_avx512(%rip), %zmm0, %zmm1
> +
> +/* c6*r^3 + c5*r^2 + c4*r + c3 */
> +        vfmadd213pd {rn-sae}, %zmm8, %zmm10, %zmm14
> +        vcmppd    $29, {sae}, %zmm2, %zmm1, %k0
> +
> +/* c6*r^4 + c5*r^3 + c4*r^2 + c3*r + c2 */
> +        vfmadd213pd {rn-sae}, %zmm9, %zmm10, %zmm14
> +        kmovw     %k0, %edx
> +
> +/* c6*r^5 + c5*r^4 + c4*r^3 + c3*r^2 + c2*r + c1 */
> +        vfmadd213pd {rn-sae}, %zmm11, %zmm10, %zmm14
> +
> +/* Table value: 2^(0.b1b2b3b4) */
> +        vpandq    _lIndexMask+__svml_dexp2_data_internal_avx512(%rip), %zmm4, %zmm5
> +        vpermt2pd Frac_PowerD0+64+__svml_dexp2_data_internal_avx512(%rip), %zmm5, %zmm13
> +
> +/* T*r */
> +        vmulpd    {rn-sae}, %zmm10, %zmm13, %zmm12
> +
> +/* T + (T*r*(c6*r^5 + c5*r^4 + c4*r^3 + c3*r^2 + c2*r + c1)) */
> +        vfmadd213pd {rn-sae}, %zmm13, %zmm12, %zmm14
> +
> +/* Scaling placed at the end to avoid accuracy loss when T*r*scale underflows */
> +        vscalefpd {rn-sae}, %zmm0, %zmm14, %zmm1
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        vmovaps   %zmm1, %zmm0
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm0, 64(%rsp)
> +        vmovups   %zmm1, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm1
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm1
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm1
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     64(%rsp,%r14,8), %xmm0
> +        call      exp2@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 128(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN8v_exp2_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_dexp2_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 Frac_PowerD0[16][2];
> +        __declspec(align(64)) VUINT32 poly_coeff1[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
> +        __declspec(align(64)) VUINT32 add_const[8][2];
> +        __declspec(align(64)) VUINT32 AbsMask[8][2];
> +        __declspec(align(64)) VUINT32 Threshold[8][2];
> +        __declspec(align(64)) VUINT32 _lIndexMask[8][2];
> +} __svml_dexp2_data_internal_avx512;
> +#endif
> +__svml_dexp2_data_internal_avx512:
> +        /*== Frac_PowerD0 ==*/
> +        .quad 0x3FF0000000000000
> +        .quad 0x3FF0B5586CF9890F
> +        .quad 0x3FF172B83C7D517B
> +        .quad 0x3FF2387A6E756238
> +        .quad 0x3FF306FE0A31B715
> +        .quad 0x3FF3DEA64C123422
> +        .quad 0x3FF4BFDAD5362A27
> +        .quad 0x3FF5AB07DD485429
> +        .quad 0x3FF6A09E667F3BCD
> +        .quad 0x3FF7A11473EB0187
> +        .quad 0x3FF8ACE5422AA0DB
> +        .quad 0x3FF9C49182A3F090
> +        .quad 0x3FFAE89F995AD3AD
> +        .quad 0x3FFC199BDD85529C
> +        .quad 0x3FFD5818DCFBA487
> +        .quad 0x3FFEA4AFA2A490DA
> +        .align 64
> +        .quad 0x3FE62E42FEFA398B, 0x3FE62E42FEFA398B, 0x3FE62E42FEFA398B, 0x3FE62E42FEFA398B, 0x3FE62E42FEFA398B, 0x3FE62E42FEFA398B, 0x3FE62E42FEFA398B, 0x3FE62E42FEFA398B  /*== poly_coeff1 ==*/
> +        .align 64
> +        .quad 0x3FCEBFBDFF84555A, 0x3FCEBFBDFF84555A, 0x3FCEBFBDFF84555A, 0x3FCEBFBDFF84555A, 0x3FCEBFBDFF84555A, 0x3FCEBFBDFF84555A, 0x3FCEBFBDFF84555A, 0x3FCEBFBDFF84555A  /*== poly_coeff2 ==*/
> +        .align 64
> +        .quad 0x3FAC6B08D4AD86B9, 0x3FAC6B08D4AD86B9, 0x3FAC6B08D4AD86B9, 0x3FAC6B08D4AD86B9, 0x3FAC6B08D4AD86B9, 0x3FAC6B08D4AD86B9, 0x3FAC6B08D4AD86B9, 0x3FAC6B08D4AD86B9  /*== poly_coeff3 ==*/
> +        .align 64
> +        .quad 0x3F83B2AD1B172252, 0x3F83B2AD1B172252, 0x3F83B2AD1B172252, 0x3F83B2AD1B172252, 0x3F83B2AD1B172252, 0x3F83B2AD1B172252, 0x3F83B2AD1B172252, 0x3F83B2AD1B172252  /*== poly_coeff4 ==*/
> +        .align 64
> +        .quad 0x3F55D7472713CD19, 0x3F55D7472713CD19, 0x3F55D7472713CD19, 0x3F55D7472713CD19, 0x3F55D7472713CD19, 0x3F55D7472713CD19, 0x3F55D7472713CD19, 0x3F55D7472713CD19  /*== poly_coeff5 ==*/
> +        .align 64
> +        .quad 0x3F24A1D7F526371B, 0x3F24A1D7F526371B, 0x3F24A1D7F526371B, 0x3F24A1D7F526371B, 0x3F24A1D7F526371B, 0x3F24A1D7F526371B, 0x3F24A1D7F526371B, 0x3F24A1D7F526371B  /*== poly_coeff6 ==*/
> +        .align 64
> +        .quad 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000  /* add_const     */
> +        .align 64
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff  /* AbsMask       */
> +        .align 64
> +        .quad 0x408fefff00000000, 0x408fefff00000000, 0x408fefff00000000, 0x408fefff00000000, 0x408fefff00000000, 0x408fefff00000000, 0x408fefff00000000, 0x408fefff00000000  /* Threshold     */
> +        .align 64
> +        .quad 0x000000000000000F, 0x000000000000000F, 0x000000000000000F, 0x000000000000000F, 0x000000000000000F, 0x000000000000000F, 0x000000000000000F, 0x000000000000000F  /* _lIndexMask   */
> +        .align 64
> +        .type	__svml_dexp2_data_internal_avx512,@object
> +        .size	__svml_dexp2_data_internal_avx512,.-__svml_dexp2_data_internal_avx512
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core-avx2.S
> new file mode 100644
> index 0000000000..4daa687852
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized exp2f, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN16v_exp2f _ZGVeN16v_exp2f_avx2_wrapper
> +#include "../svml_s_exp2f16_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core.c
> new file mode 100644
> index 0000000000..e90d9d8684
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized exp2f, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN16v_exp2f
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN16v_exp2f, __GI__ZGVeN16v_exp2f,
> +	       __redirect__ZGVeN16v_exp2f)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core_avx512.S
> new file mode 100644
> index 0000000000..6b512159bc
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f16_core_avx512.S
> @@ -0,0 +1,271 @@
> +/* Function exp2f vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *     Single precision mantissa represented as: 1.b1b2b3 ... b23
> + *     Constant for single precision: S = 2^19 x 1.5
> + *
> + *     2^X = 2^Xo  x  2^{X-Xo}
> + *     2^X = 2^K  x  2^fo  x  2^{X-Xo}
> + *     2^X = 2^K  x  2^fo  x  2^r
> + *
> + *     2^K  --> Manual scaling
> + *     2^fo --> Table lookup
> + *     r    --> 1 + poly    (r = X - Xo)
> + *
> + *     Xo = K  +  fo
> + *     Xo = K  +  0.x1x2x3x4
> + *
> + *     r = X - Xo
> + *       = Vreduce(X, imm)
> + *       = X - VRndScale(X, imm),    where Xo = VRndScale(X, imm)
> + *
> + *     Rnd(S + X) = S + Xo,    where S is selected as S = 2^19 x 1.5
> + *         S + X = S + floor(X) + 0.x1x2x3x4
> + *     Rnd(S + X) = Rnd(2^19 x 1.5 + X)
> + *     (Note: 2^exp x 1.b1b2b3 ... b23,  2^{exp-23} = 2^-4 for exp=19)
> + *
> + *     exp2(x) =  2^K  x  2^fo  x (1 + poly(r)),   where 2^r = 1 + poly(r)
> + *
> + *     Scale back:
> + *     dest = src1 x 2^floor(src2)
> + *
> + *
> + */
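The add_const trick is the heart of the reduction here: adding S = 1.5*2^19 (0x49400000) to x places the integer 16*Xo in the low mantissa bits of the sum, and vpermps later consumes the bottom four of those bits as the table index. A sketch (assumes the default round-to-nearest mode rather than the {rd-sae} the code requests, and is valid for the in-range |x| <= 126 inputs):

  #include <stdint.h>
  #include <string.h>

  static int
  k_plus_f0 (float x)
  {
    float sum = 0x1.8p19f + x;  /* add_const: 1.5 * 2^19 */
    uint32_t bits;
    memcpy (&bits, &sum, sizeof bits);
    return (int) (bits & 0xf);  /* the 4-bit index vpermps consumes */
  }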
> +
> +/* Offsets for data table __svml_sexp2_data_internal_avx512
> + */
> +#define Frac_PowerS0                  	0
> +#define poly_coeff1                   	64
> +#define poly_coeff2                   	128
> +#define poly_coeff3                   	192
> +#define add_const                     	256
> +#define AbsMask                       	320
> +#define Threshold                     	384
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN16v_exp2f_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovups   add_const+__svml_sexp2_data_internal_avx512(%rip), %zmm3
> +
> +/*
> + * Reduced argument
> + * where VREDUCE is available
> + */
> +        vreduceps $65, {sae}, %zmm0, %zmm6
> +        vmovups   poly_coeff3+__svml_sexp2_data_internal_avx512(%rip), %zmm5
> +        vmovups   poly_coeff2+__svml_sexp2_data_internal_avx512(%rip), %zmm10
> +        vmovups   Threshold+__svml_sexp2_data_internal_avx512(%rip), %zmm2
> +
> +/*
> + *
> + *  HA
> + * Variables and constants
> + * Load constants and vector(s)
> + */
> +        vmovups   poly_coeff1+__svml_sexp2_data_internal_avx512(%rip), %zmm7
> +
> +/*
> + * Integer form of K+0.b1b2b3b4 in lower bits - call K_plus_f0
> + * Mantissa of normalized single precision FP: 1.b1b2...b23
> + */
> +        vaddps    {rd-sae}, %zmm3, %zmm0, %zmm4
> +        vandps    AbsMask+__svml_sexp2_data_internal_avx512(%rip), %zmm0, %zmm1
> +
> +/* c3*r   + c2 */
> +        vfmadd231ps {rn-sae}, %zmm6, %zmm5, %zmm10
> +        vcmpps    $30, {sae}, %zmm2, %zmm1, %k0
> +
> +/* c3*r^2 + c2*r + c1 */
> +        vfmadd213ps {rn-sae}, %zmm7, %zmm6, %zmm10
> +
> +/* Table value: 2^(0.b1b2b3b4) */
> +        vpermps   __svml_sexp2_data_internal_avx512(%rip), %zmm4, %zmm9
> +        kmovw     %k0, %edx
> +
> +/* T*r */
> +        vmulps    {rn-sae}, %zmm6, %zmm9, %zmm8
> +
> +/* T + (T*r*(c3*r^2 + c2*r + c1)) */
> +        vfmadd213ps {rn-sae}, %zmm9, %zmm8, %zmm10
> +
> +/* Scaling placed at the end to avoid accuracy loss when T*r*scale underflows */
> +        vscalefps {rn-sae}, %zmm0, %zmm10, %zmm1
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        vmovaps   %zmm1, %zmm0
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm0, 64(%rsp)
> +        vmovups   %zmm1, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm1
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $16, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm1
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm1
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     64(%rsp,%r14,4), %xmm0
> +        call      exp2f@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 128(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN16v_exp2f_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_sexp2_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 Frac_PowerS0[16][1];
> +        __declspec(align(64)) VUINT32 poly_coeff1[16][1];
> +        __declspec(align(64)) VUINT32 poly_coeff2[16][1];
> +        __declspec(align(64)) VUINT32 poly_coeff3[16][1];
> +        __declspec(align(64)) VUINT32 add_const[16][1];
> +        __declspec(align(64)) VUINT32 AbsMask[16][1];
> +        __declspec(align(64)) VUINT32 Threshold[16][1];
> +} __svml_sexp2_data_internal_avx512;
> +#endif
> +__svml_sexp2_data_internal_avx512:
> +        /*== Frac_PowerS0 ==*/
> +        .long 0x3F800000
> +        .long 0x3F85AAC3
> +        .long 0x3F8B95C2
> +        .long 0x3F91C3D3
> +        .long 0x3F9837F0
> +        .long 0x3F9EF532
> +        .long 0x3FA5FED7
> +        .long 0x3FAD583F
> +        .long 0x3FB504F3
> +        .long 0x3FBD08A4
> +        .long 0x3FC5672A
> +        .long 0x3FCE248C
> +        .long 0x3FD744FD
> +        .long 0x3FE0CCDF
> +        .long 0x3FEAC0C7
> +        .long 0x3FF5257D
> +        .align 64
> +        .long 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222, 0x3F317222  /*== poly_coeff1 ==*/
> +        .align 64
> +        .long 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B, 0x3E75F16B  /*== poly_coeff2 ==*/
> +        .align 64
> +        .long 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA, 0x3D6854CA  /*== poly_coeff3 ==*/
> +        .align 64
> +        .long 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000, 0x49400000   /* add_const */
> +        .align 64
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff   /* AbsMask   */
> +        .align 64
> +        .long 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000   /* Threshold=126.0 */
> +        .align 64
> +        .type	__svml_sexp2_data_internal_avx512,@object
> +        .size	__svml_sexp2_data_internal_avx512,.-__svml_sexp2_data_internal_avx512
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core-sse2.S
> new file mode 100644
> index 0000000000..0b3fec834c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized exp2f, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN4v_exp2f _ZGVbN4v_exp2f_sse2
> +#include "../svml_s_exp2f4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core.c
> new file mode 100644
> index 0000000000..db47118d97
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized exp2f, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN4v_exp2f
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN4v_exp2f, __GI__ZGVbN4v_exp2f,
> +	       __redirect__ZGVbN4v_exp2f)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core_sse4.S
> new file mode 100644
> index 0000000000..0d9f45d5c3
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f4_core_sse4.S
> @@ -0,0 +1,238 @@
> +/* Function exp2f vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   exp2(x)  = 2^n * T[j] * (1 + P(y))
> + *   where
> + *        x = m*(1/K) + y,    y in [-1/K..1/K]
> + *        m = n*K + j,           m,n,j - signed integer, j in [-K/2..K/2]
> + *
> + *        values of 2^j/K are tabulated
> + *
> + *        P(y) is a minimax polynomial approximation of exp2(x)-1
> + *        on small interval [-1/K..1/K]
> + *
> + *  Special cases:
> + *
> + *   exp2(NaN)  = NaN
> + *   exp2(+INF) = +INF
> + *   exp2(-INF) = 0
> + *   exp2(x)    = 1 for subnormals
> + *   For IEEE float
> + *     if x >= 128.0 then exp2f(x) overflows
> + *     if x < -151.0 then exp2f(x) underflows
> + *
> + */
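In this SSE4 kernel the table effectively degenerates to a single entry: N = Rnd(x) falls out of the _sShifter sum, pslld $23 moves it into the exponent position, and the final paddd multiplies the polynomial value by 2^N entirely in the integer domain. A sketch of that reconstruction step (scale_by_pow2 is just an illustrative name; the kernel's |x| <= 126 domain check is what keeps the exponent field from wrapping):

  #include <stdint.h>
  #include <string.h>

  static float
  scale_by_pow2 (float poly, int32_t n)
  {
    uint32_t bits;
    memcpy (&bits, &poly, sizeof bits);
    bits += (uint32_t) n << 23;   /* pslld $23 + paddd */
    memcpy (&poly, &bits, sizeof bits);
    return poly;
  }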
> +
> +/* Offsets for data table __svml_sexp2_data_internal
> + */
> +#define _sShifter                     	0
> +#define _sPC0                         	16
> +#define _sPC1                         	32
> +#define _sPC2                         	48
> +#define _sPC3                         	64
> +#define _sPC4                         	80
> +#define _sPC5                         	96
> +#define _sPC6                         	112
> +#define _iAbsMask                     	128
> +#define _iDomainRange                 	144
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN4v_exp2f_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +
> +/* Check for overflow/underflow  */
> +        movups    __svml_sexp2_data_internal(%rip), %xmm1
> +
> +/*  Implementation  */
> +        movaps    %xmm1, %xmm5
> +
> +/*  Polynomial  */
> +        movups    _sPC6+__svml_sexp2_data_internal(%rip), %xmm4
> +        addps     %xmm0, %xmm5
> +        movaps    %xmm5, %xmm3
> +
> +/*  2^N  */
> +        pslld     $23, %xmm5
> +
> +/* Check for overflow/underflow  */
> +        movdqu    _iAbsMask+__svml_sexp2_data_internal(%rip), %xmm2
> +        subps     %xmm1, %xmm3
> +
> +/*  R  */
> +        movaps    %xmm0, %xmm1
> +        pand      %xmm0, %xmm2
> +        pcmpgtd   _iDomainRange+__svml_sexp2_data_internal(%rip), %xmm2
> +        subps     %xmm3, %xmm1
> +        movmskps  %xmm2, %edx
> +        mulps     %xmm1, %xmm4
> +        addps     _sPC5+__svml_sexp2_data_internal(%rip), %xmm4
> +        mulps     %xmm1, %xmm4
> +        addps     _sPC4+__svml_sexp2_data_internal(%rip), %xmm4
> +        mulps     %xmm1, %xmm4
> +        addps     _sPC3+__svml_sexp2_data_internal(%rip), %xmm4
> +        mulps     %xmm1, %xmm4
> +        addps     _sPC2+__svml_sexp2_data_internal(%rip), %xmm4
> +        mulps     %xmm1, %xmm4
> +        addps     _sPC1+__svml_sexp2_data_internal(%rip), %xmm4
> +        mulps     %xmm4, %xmm1
> +        addps     _sPC0+__svml_sexp2_data_internal(%rip), %xmm1
> +
> +/*  Reconstruction  */
> +        paddd     %xmm5, %xmm1
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm1
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movaps    %xmm1, %xmm0
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm0, 32(%rsp)
> +        movups    %xmm1, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm1
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm1
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      exp2f@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 48(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN4v_exp2f_sse4)
> +
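> +/* The special-input path above can be read as the following loop over the
> + * movmskps lane mask (illustrative C sketch, not the patch code; the
> + * function and parameter names are made up for the example):
> + *
> + *   #include <math.h>
> + *
> + *   static void
> + *   handle_special_lanes (const float in[4], float out[4], int mask)
> + *   {
> + *     for (int lane = 0; lane < 4; lane++)   // L(SPECIAL_VALUES_LOOP)
> + *       if (mask & (1 << lane))              // btl %r12d, %r13d
> + *         out[lane] = exp2f (in[lane]);      // call exp2f@PLT
> + *   }
> + *
> + * Only lanes flagged in the range mask take the scalar call; the other
> + * lanes keep the vector result already saved at 48(%rsp).
> + */
> +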
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_sexp2_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(16)) VUINT32 _sShifter[4][1];
> +        __declspec(align(16)) VUINT32 _sPC0[4][1];
> +        __declspec(align(16)) VUINT32 _sPC1[4][1];
> +        __declspec(align(16)) VUINT32 _sPC2[4][1];
> +        __declspec(align(16)) VUINT32 _sPC3[4][1];
> +        __declspec(align(16)) VUINT32 _sPC4[4][1];
> +        __declspec(align(16)) VUINT32 _sPC5[4][1];
> +        __declspec(align(16)) VUINT32 _sPC6[4][1];
> +        __declspec(align(16)) VUINT32 _iAbsMask[4][1];
> +        __declspec(align(16)) VUINT32 _iDomainRange[4][1];
> +} __svml_sexp2_data_internal;
> +#endif
> +__svml_sexp2_data_internal:
> +        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000   /* _sShifter */
> +        .align 16
> +        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000   /* _sPC0  */
> +        .align 16
> +        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218   /* _sPC1  */
> +        .align 16
> +        .long 0x3e75fdef, 0x3e75fdef, 0x3e75fdef, 0x3e75fdef   /* _sPC2  */
> +        .align 16
> +        .long 0x3d6357cf, 0x3d6357cf, 0x3d6357cf, 0x3d6357cf   /* _sPC3  */
> +        .align 16
> +        .long 0x3c1d962c, 0x3c1d962c, 0x3c1d962c, 0x3c1d962c   /* _sPC4  */
> +        .align 16
> +        .long 0x3aaf7a51, 0x3aaf7a51, 0x3aaf7a51, 0x3aaf7a51   /* _sPC5  */
> +        .align 16
> +        .long 0x39213c8c, 0x39213c8c, 0x39213c8c, 0x39213c8c   /* _sPC6  */
> +        /* Common  */
> +        .align 16
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff   /* _iAbsMask */
> +        .align 16
> +        .long 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000   /* _iDomainRange=126.0 */
> +        .align 16
> +        .type	__svml_sexp2_data_internal,@object
> +        .size	__svml_sexp2_data_internal,.-__svml_sexp2_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core-sse.S
> new file mode 100644
> index 0000000000..4da2278ed8
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized exp2f, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN8v_exp2f _ZGVdN8v_exp2f_sse_wrapper
> +#include "../svml_s_exp2f8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core.c
> new file mode 100644
> index 0000000000..dc34671263
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized exp2f, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN8v_exp2f
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN8v_exp2f, __GI__ZGVdN8v_exp2f,
> +	       __redirect__ZGVdN8v_exp2f)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
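> +
> +/* Usage sketch (illustrative, not part of the build): with exp2f declared
> +   SIMD in <bits/math-vector.h>, GCC invoked with e.g. -O2 -mavx2
> +   -ffast-math -fopenmp-simd may turn the loop below into calls to
> +   _ZGVdN8v_exp2f, which the ifunc above resolves to the AVX2 kernel:
> +
> +     #include <math.h>
> +
> +     void
> +     vec_exp2 (float *restrict dst, const float *restrict src, int n)
> +     {
> +     #pragma omp simd
> +       for (int i = 0; i < n; i++)
> +         dst[i] = exp2f (src[i]);
> +     }
> + */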
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core_avx2.S
> new file mode 100644
> index 0000000000..aa7af4be79
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp2f8_core_avx2.S
> @@ -0,0 +1,245 @@
> +/* Function exp2f vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   exp2(x)  = 2^n * T[j] * (1 + P(y))
> + *   where
> + *        x = m*(1/K) + y,    y in [-1/K..1/K]
> + *        m = n*K + j,           m,n,j - signed integer, j in [-K/2..K/2]
> + *
> + *        values of 2^j/K are tabulated
> + *
> + *        P(y) is a minimax polynomial approximation of exp2(x)-1
> + *        on small interval [-1/K..1/K]
> + *
> + *  Special cases:
> + *
> + *   exp2(NaN)  = NaN
> + *   exp2(+INF) = +INF
> + *   exp2(-INF) = 0
> + *   exp2(x)    = 1 for subnormals
> + *   For IEEE float
> + *     if x >= 128.0 then exp2f(x) overflow
> + *     if x < -151.0 then exp2f(x) underflow
> + *
> + */
> +
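> +/* Illustrative C sketch of the Shifter-based reduction and the integer-add
> + * reconstruction used by the code below (a paraphrase, not the kernel:
> + * exp2f (r) stands in for the _sPC0.._sPC6 polynomial and special cases
> + * are ignored):
> + *
> + *   #include <math.h>
> + *   #include <stdint.h>
> + *   #include <string.h>
> + *
> + *   static float
> + *   exp2f_sketch (float x)
> + *   {
> + *     const float shifter = 0x1.8p23f;  // 1.5*2^23, the _sShifter value
> + *     float m = x + shifter;            // low mantissa bits of m hold N
> + *     float n = m - shifter;            // N = nearest integer to x
> + *     float r = x - n;                  // reduced argument, |r| <= 0.5
> + *     float p = exp2f (r);              // kernel: degree-6 Horner polynomial
> + *     uint32_t pb, mb;
> + *     memcpy (&pb, &p, sizeof pb);
> + *     memcpy (&mb, &m, sizeof mb);
> + *     pb += mb << 23;                   // vpslld $23 + vpaddd: scale by 2^N
> + *     memcpy (&p, &pb, sizeof p);
> + *     return p;
> + *   }
> + */
> +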
> +/* Offsets for data table __svml_sexp2_data_internal
> + */
> +#define _sShifter                     	0
> +#define _sPC0                         	32
> +#define _sPC1                         	64
> +#define _sPC2                         	96
> +#define _sPC3                         	128
> +#define _sPC4                         	160
> +#define _sPC5                         	192
> +#define _sPC6                         	224
> +#define _iAbsMask                     	256
> +#define _iDomainRange                 	288
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN8v_exp2f_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +        vmovups   __svml_sexp2_data_internal(%rip), %ymm1
> +
> +/* Check for overflow/underflow  */
> +        vmovups   _sPC6+__svml_sexp2_data_internal(%rip), %ymm7
> +
> +/*  Implementation  */
> +        vaddps    %ymm1, %ymm0, %ymm6
> +        vsubps    %ymm1, %ymm6, %ymm4
> +
> +/*  2^N  */
> +        vpslld    $23, %ymm6, %ymm8
> +
> +/*  R  */
> +        vsubps    %ymm4, %ymm0, %ymm5
> +
> +/*  Polynomial  */
> +        vfmadd213ps _sPC5+__svml_sexp2_data_internal(%rip), %ymm5, %ymm7
> +        vfmadd213ps _sPC4+__svml_sexp2_data_internal(%rip), %ymm5, %ymm7
> +        vfmadd213ps _sPC3+__svml_sexp2_data_internal(%rip), %ymm5, %ymm7
> +        vfmadd213ps _sPC2+__svml_sexp2_data_internal(%rip), %ymm5, %ymm7
> +        vfmadd213ps _sPC1+__svml_sexp2_data_internal(%rip), %ymm5, %ymm7
> +        vfmadd213ps _sPC0+__svml_sexp2_data_internal(%rip), %ymm5, %ymm7
> +
> +/* Check for overflow/underflow  */
> +        vandps    _iAbsMask+__svml_sexp2_data_internal(%rip), %ymm0, %ymm2
> +        vpcmpgtd  _iDomainRange+__svml_sexp2_data_internal(%rip), %ymm2, %ymm3
> +        vmovmskps %ymm3, %edx
> +
> +/*  Reconstruction  */
> +        vpaddd    %ymm8, %ymm7, %ymm1
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm1
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        vmovaps   %ymm1, %ymm0
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %ymm0, 32(%rsp)
> +        vmovups   %ymm1, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm1
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   64(%rsp), %ymm1
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm1
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      exp2f@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 64(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN8v_exp2f_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_sexp2_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(32)) VUINT32 _sShifter[8][1];
> +        __declspec(align(32)) VUINT32 _sPC0[8][1];
> +        __declspec(align(32)) VUINT32 _sPC1[8][1];
> +        __declspec(align(32)) VUINT32 _sPC2[8][1];
> +        __declspec(align(32)) VUINT32 _sPC3[8][1];
> +        __declspec(align(32)) VUINT32 _sPC4[8][1];
> +        __declspec(align(32)) VUINT32 _sPC5[8][1];
> +        __declspec(align(32)) VUINT32 _sPC6[8][1];
> +        __declspec(align(32)) VUINT32 _iAbsMask[8][1];
> +        __declspec(align(32)) VUINT32 _iDomainRange[8][1];
> +} __svml_sexp2_data_internal;
> +#endif
> +__svml_sexp2_data_internal:
> +        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000   /* _sShifter */
> +        .align 32
> +        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000   /* _sPC0  */
> +        .align 32
> +        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218   /* _sPC1  */
> +        .align 32
> +        .long 0x3e75fdef, 0x3e75fdef, 0x3e75fdef, 0x3e75fdef, 0x3e75fdef, 0x3e75fdef, 0x3e75fdef, 0x3e75fdef   /* _sPC2  */
> +        .align 32
> +        .long 0x3d6357cf, 0x3d6357cf, 0x3d6357cf, 0x3d6357cf, 0x3d6357cf, 0x3d6357cf, 0x3d6357cf, 0x3d6357cf   /* _sPC3  */
> +        .align 32
> +        .long 0x3c1d962c, 0x3c1d962c, 0x3c1d962c, 0x3c1d962c, 0x3c1d962c, 0x3c1d962c, 0x3c1d962c, 0x3c1d962c   /* _sPC4  */
> +        .align 32
> +        .long 0x3aaf7a51, 0x3aaf7a51, 0x3aaf7a51, 0x3aaf7a51, 0x3aaf7a51, 0x3aaf7a51, 0x3aaf7a51, 0x3aaf7a51   /* _sPC5  */
> +        .align 32
> +        .long 0x39213c8c, 0x39213c8c, 0x39213c8c, 0x39213c8c, 0x39213c8c, 0x39213c8c, 0x39213c8c, 0x39213c8c   /* _sPC6  */
> +        /* Common  */
> +        .align 32
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff   /* _iAbsMask */
> +        .align 32
> +        .long 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000, 0x42fc0000   /* _iDomainRange=126.0 */
> +        .align 32
> +        .type	__svml_sexp2_data_internal,@object
> +        .size	__svml_sexp2_data_internal,.-__svml_sexp2_data_internal
> diff --git a/sysdeps/x86_64/fpu/svml_d_exp22_core.S b/sysdeps/x86_64/fpu/svml_d_exp22_core.S
> new file mode 100644
> index 0000000000..f03080a977
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_exp22_core.S
> @@ -0,0 +1,29 @@
> +/* Function exp2 vectorized with SSE2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN2v_exp2)
> +WRAPPER_IMPL_SSE2 exp2
> +END (_ZGVbN2v_exp2)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN2v_exp2)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_exp24_core.S b/sysdeps/x86_64/fpu/svml_d_exp24_core.S
> new file mode 100644
> index 0000000000..40475c7a94
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_exp24_core.S
> @@ -0,0 +1,29 @@
> +/* Function exp2 vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN4v_exp2)
> +WRAPPER_IMPL_AVX _ZGVbN2v_exp2
> +END (_ZGVdN4v_exp2)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN4v_exp2)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_exp24_core_avx.S b/sysdeps/x86_64/fpu/svml_d_exp24_core_avx.S
> new file mode 100644
> index 0000000000..a7d22409df
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_exp24_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function exp2 vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVcN4v_exp2)
> +WRAPPER_IMPL_AVX _ZGVbN2v_exp2
> +END (_ZGVcN4v_exp2)
> diff --git a/sysdeps/x86_64/fpu/svml_d_exp28_core.S b/sysdeps/x86_64/fpu/svml_d_exp28_core.S
> new file mode 100644
> index 0000000000..f68aaed427
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_exp28_core.S
> @@ -0,0 +1,25 @@
> +/* Function exp2 vectorized with AVX-512, wrapper to AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN8v_exp2)
> +WRAPPER_IMPL_AVX512 _ZGVdN4v_exp2
> +END (_ZGVeN8v_exp2)
> diff --git a/sysdeps/x86_64/fpu/svml_s_exp2f16_core.S b/sysdeps/x86_64/fpu/svml_s_exp2f16_core.S
> new file mode 100644
> index 0000000000..8ba4e82272
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_exp2f16_core.S
> @@ -0,0 +1,25 @@
> +/* Function exp2f vectorized with AVX-512. Wrapper to AVX2 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN16v_exp2f)
> +WRAPPER_IMPL_AVX512 _ZGVdN8v_exp2f
> +END (_ZGVeN16v_exp2f)
> diff --git a/sysdeps/x86_64/fpu/svml_s_exp2f4_core.S b/sysdeps/x86_64/fpu/svml_s_exp2f4_core.S
> new file mode 100644
> index 0000000000..916f176dca
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_exp2f4_core.S
> @@ -0,0 +1,29 @@
> +/* Function exp2f vectorized with SSE2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN4v_exp2f)
> +WRAPPER_IMPL_SSE2 exp2f
> +END (_ZGVbN4v_exp2f)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN4v_exp2f)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_exp2f8_core.S b/sysdeps/x86_64/fpu/svml_s_exp2f8_core.S
> new file mode 100644
> index 0000000000..b8821b952b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_exp2f8_core.S
> @@ -0,0 +1,29 @@
> +/* Function exp2f vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN8v_exp2f)
> +WRAPPER_IMPL_AVX _ZGVbN4v_exp2f
> +END (_ZGVdN8v_exp2f)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN8v_exp2f)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_exp2f8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_exp2f8_core_avx.S
> new file mode 100644
> index 0000000000..ddaaf3b59a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_exp2f8_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function exp2f vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +        .text
> +ENTRY (_ZGVcN8v_exp2f)
> +WRAPPER_IMPL_AVX _ZGVbN4v_exp2f
> +END (_ZGVcN8v_exp2f)
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx.c
> new file mode 100644
> index 0000000000..341ec99724
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-exp2.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx2.c
> new file mode 100644
> index 0000000000..341ec99724
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx2.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-exp2.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx512f.c
> new file mode 100644
> index 0000000000..341ec99724
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp2-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-exp2.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp2.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp2.c
> new file mode 100644
> index 0000000000..b3b04f63e4
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp2.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE double
> +#define LIBMVEC_FUNC exp2
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> index 9bc9d1dafa..2f7172bd7b 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> @@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVbN2v_acos)
>  VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVbN2v_atan)
>  VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVbN2v_asin)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVbN2vv_hypot)
> +VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVbN2v_exp2)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> index c41994d90a..e2d519faac 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> @@ -34,6 +34,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVdN4v_acos)
>  VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVdN4v_atan)
>  VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVdN4v_asin)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVdN4vv_hypot)
> +VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVdN4v_exp2)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m256i
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> index 881f6c801a..1ce4d8b413 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> @@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVcN4v_acos)
>  VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVcN4v_atan)
>  VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVcN4v_asin)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVcN4vv_hypot)
> +VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVcN4v_exp2)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> index 6fd106fe68..6c87cec648 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> @@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVeN8v_acos)
>  VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVeN8v_atan)
>  VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVeN8v_asin)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVeN8vv_hypot)
> +VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVeN8v_exp2)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m512i
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx.c
> new file mode 100644
> index 0000000000..0281d386fb
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-exp2f.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx2.c
> new file mode 100644
> index 0000000000..0281d386fb
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx2.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-exp2f.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx512f.c
> new file mode 100644
> index 0000000000..0281d386fb
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-exp2f.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp2f.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f.c
> new file mode 100644
> index 0000000000..bf57661bee
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp2f.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE float
> +#define LIBMVEC_FUNC exp2f
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> index 4c2ea6ddfe..597d7d7598 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> @@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVeN16v_acosf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVeN16v_atanf)
>  VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVeN16v_asinf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVeN16vv_hypotf)
> +VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVeN16v_exp2f)
>  
>  #define VEC_INT_TYPE __m512i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> index 1d5d952d07..3500eec810 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> @@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVbN4v_acosf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVbN4v_atanf)
>  VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVbN4v_asinf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVbN4vv_hypotf)
> +VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVbN4v_exp2f)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> index 7a750f3781..921b9c65d6 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> @@ -34,6 +34,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVdN8v_acosf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVdN8v_atanf)
>  VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVdN8v_asinf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVdN8vv_hypotf)
> +VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVdN8v_exp2f)
>  
>  /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
>  #undef VECTOR_WRAPPER_fFF
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> index af816a7789..6cbcb57521 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> @@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVcN8v_acosf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVcN8v_atanf)
>  VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVcN8v_asinf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVcN8vv_hypotf)
> +VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVcN8v_exp2f)
>  
>  #define VEC_INT_TYPE __m128i
>  
> -- 
> 2.31.1
> 

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.


H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 02/18] x86-64: Add vector asin/asinf implementation to libmvec
  2021-12-29  6:39 ` [PATCH v5 02/18] x86-64: Add vector asin/asinf " Sunil K Pandey
@ 2021-12-29 21:25   ` H.J. Lu
  0 siblings, 0 replies; 40+ messages in thread
From: H.J. Lu @ 2021-12-29 21:25 UTC (permalink / raw)
  To: Sunil K Pandey; +Cc: libc-alpha, hjl.tools, andrey.kolesov, marius.cornea

On Tue, Dec 28, 2021 at 10:39:44PM -0800, Sunil K Pandey wrote:
> Implement vectorized asin/asinf containing SSE, AVX, AVX2 and
> AVX512 versions for libmvec as per vector ABI.  It also contains
> accuracy and ABI tests for vector asin/asinf with regenerated ulps.
> ---
>  bits/libm-simd-decl-stubs.h                   |  11 +
>  math/bits/mathcalls.h                         |   2 +-
>  .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
>  sysdeps/x86/fpu/bits/math-vector.h            |   4 +
>  .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
>  sysdeps/x86_64/fpu/Makeconfig                 |   1 +
>  sysdeps/x86_64/fpu/Versions                   |   2 +
>  sysdeps/x86_64/fpu/libm-test-ulps             |  20 ++
>  .../fpu/multiarch/svml_d_asin2_core-sse2.S    |  20 ++
>  .../x86_64/fpu/multiarch/svml_d_asin2_core.c  |  27 ++
>  .../fpu/multiarch/svml_d_asin2_core_sse4.S    | 288 +++++++++++++++++
>  .../fpu/multiarch/svml_d_asin4_core-sse.S     |  20 ++
>  .../x86_64/fpu/multiarch/svml_d_asin4_core.c  |  27 ++
>  .../fpu/multiarch/svml_d_asin4_core_avx2.S    | 273 ++++++++++++++++
>  .../fpu/multiarch/svml_d_asin8_core-avx2.S    |  20 ++
>  .../x86_64/fpu/multiarch/svml_d_asin8_core.c  |  27 ++
>  .../fpu/multiarch/svml_d_asin8_core_avx512.S  | 295 ++++++++++++++++++
>  .../fpu/multiarch/svml_s_asinf16_core-avx2.S  |  20 ++
>  .../fpu/multiarch/svml_s_asinf16_core.c       |  28 ++
>  .../multiarch/svml_s_asinf16_core_avx512.S    | 260 +++++++++++++++
>  .../fpu/multiarch/svml_s_asinf4_core-sse2.S   |  20 ++
>  .../x86_64/fpu/multiarch/svml_s_asinf4_core.c |  28 ++
>  .../fpu/multiarch/svml_s_asinf4_core_sse4.S   | 252 +++++++++++++++
>  .../fpu/multiarch/svml_s_asinf8_core-sse.S    |  20 ++
>  .../x86_64/fpu/multiarch/svml_s_asinf8_core.c |  28 ++
>  .../fpu/multiarch/svml_s_asinf8_core_avx2.S   | 249 +++++++++++++++
>  sysdeps/x86_64/fpu/svml_d_asin2_core.S        |  29 ++
>  sysdeps/x86_64/fpu/svml_d_asin4_core.S        |  29 ++
>  sysdeps/x86_64/fpu/svml_d_asin4_core_avx.S    |  25 ++
>  sysdeps/x86_64/fpu/svml_d_asin8_core.S        |  25 ++
>  sysdeps/x86_64/fpu/svml_s_asinf16_core.S      |  25 ++
>  sysdeps/x86_64/fpu/svml_s_asinf4_core.S       |  29 ++
>  sysdeps/x86_64/fpu/svml_s_asinf8_core.S       |  29 ++
>  sysdeps/x86_64/fpu/svml_s_asinf8_core_avx.S   |  25 ++
>  .../x86_64/fpu/test-double-libmvec-asin-avx.c |   1 +
>  .../fpu/test-double-libmvec-asin-avx2.c       |   1 +
>  .../fpu/test-double-libmvec-asin-avx512f.c    |   1 +
>  sysdeps/x86_64/fpu/test-double-libmvec-asin.c |   3 +
>  .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
>  .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
>  .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
>  .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
>  .../x86_64/fpu/test-float-libmvec-asinf-avx.c |   1 +
>  .../fpu/test-float-libmvec-asinf-avx2.c       |   1 +
>  .../fpu/test-float-libmvec-asinf-avx512f.c    |   1 +
>  sysdeps/x86_64/fpu/test-float-libmvec-asinf.c |   3 +
>  .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
>  .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
>  .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
>  .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
>  50 files changed, 2189 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_asin2_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_asin4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_asin4_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_asin8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_asinf16_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_asinf4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_asinf8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_asinf8_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asin-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asin-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asin-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asin.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinf.c
> 
> diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
> index b4647ca918..ae8ee882d0 100644
> --- a/bits/libm-simd-decl-stubs.h
> +++ b/bits/libm-simd-decl-stubs.h
> @@ -120,4 +120,15 @@
>  #define __DECL_SIMD_atanf32x
>  #define __DECL_SIMD_atanf64x
>  #define __DECL_SIMD_atanf128x
> +
> +#define __DECL_SIMD_asin
> +#define __DECL_SIMD_asinf
> +#define __DECL_SIMD_asinl
> +#define __DECL_SIMD_asinf16
> +#define __DECL_SIMD_asinf32
> +#define __DECL_SIMD_asinf64
> +#define __DECL_SIMD_asinf128
> +#define __DECL_SIMD_asinf32x
> +#define __DECL_SIMD_asinf64x
> +#define __DECL_SIMD_asinf128x
>  #endif
> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index 3e27c21f21..bb53b7021e 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -52,7 +52,7 @@
>  /* Arc cosine of X.  */
>  __MATHCALL_VEC (acos,, (_Mdouble_ __x));
>  /* Arc sine of X.  */
> -__MATHCALL (asin,, (_Mdouble_ __x));
> +__MATHCALL_VEC (asin,, (_Mdouble_ __x));
>  /* Arc tangent of X.  */
>  __MATHCALL_VEC (atan,, (_Mdouble_ __x));
>  /* Arc tangent of Y/X.  */
> diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> index a93258db6f..ab03a07f92 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> @@ -47,18 +47,26 @@ GLIBC_2.22 _ZGVeN8v_sin F
>  GLIBC_2.22 _ZGVeN8vv_pow F
>  GLIBC_2.22 _ZGVeN8vvv_sincos F
>  GLIBC_2.35 _ZGVbN2v_acos F
> +GLIBC_2.35 _ZGVbN2v_asin F
>  GLIBC_2.35 _ZGVbN2v_atan F
>  GLIBC_2.35 _ZGVbN4v_acosf F
> +GLIBC_2.35 _ZGVbN4v_asinf F
>  GLIBC_2.35 _ZGVbN4v_atanf F
>  GLIBC_2.35 _ZGVcN4v_acos F
> +GLIBC_2.35 _ZGVcN4v_asin F
>  GLIBC_2.35 _ZGVcN4v_atan F
>  GLIBC_2.35 _ZGVcN8v_acosf F
> +GLIBC_2.35 _ZGVcN8v_asinf F
>  GLIBC_2.35 _ZGVcN8v_atanf F
>  GLIBC_2.35 _ZGVdN4v_acos F
> +GLIBC_2.35 _ZGVdN4v_asin F
>  GLIBC_2.35 _ZGVdN4v_atan F
>  GLIBC_2.35 _ZGVdN8v_acosf F
> +GLIBC_2.35 _ZGVdN8v_asinf F
>  GLIBC_2.35 _ZGVdN8v_atanf F
>  GLIBC_2.35 _ZGVeN16v_acosf F
> +GLIBC_2.35 _ZGVeN16v_asinf F
>  GLIBC_2.35 _ZGVeN16v_atanf F
>  GLIBC_2.35 _ZGVeN8v_acos F
> +GLIBC_2.35 _ZGVeN8v_asin F
>  GLIBC_2.35 _ZGVeN8v_atan F
> diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
> index 1c0e5c5e35..73cb8849ff 100644
> --- a/sysdeps/x86/fpu/bits/math-vector.h
> +++ b/sysdeps/x86/fpu/bits/math-vector.h
> @@ -66,6 +66,10 @@
>  #  define __DECL_SIMD_atan __DECL_SIMD_x86_64
>  #  undef __DECL_SIMD_atanf
>  #  define __DECL_SIMD_atanf __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_asin
> +#  define __DECL_SIMD_asin __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_asinf
> +#  define __DECL_SIMD_asinf __DECL_SIMD_x86_64
>  
>  # endif
>  #endif
> diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> index ddcccb11d7..4552c2bdfa 100644
> --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> @@ -32,6 +32,8 @@
>  !GCC$ builtin (acosf) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (atan) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (atanf) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (asin) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (asinf) attributes simd (notinbranch) if('x86_64')
>  
>  !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
> @@ -49,3 +51,5 @@
>  !GCC$ builtin (acosf) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (atan) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (atanf) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (asin) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (asinf) attributes simd (notinbranch) if('x32')
> diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> index dae0887f13..e0eae0b196 100644
> --- a/sysdeps/x86_64/fpu/Makeconfig
> +++ b/sysdeps/x86_64/fpu/Makeconfig
> @@ -23,6 +23,7 @@ postclean-generated += libmvec.mk
>  # Define for both math and mathvec directories.
>  libmvec-funcs = \
>    acos \
> +  asin \
>    atan \
>    cos \
>    exp \
> diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> index 424f6d526e..10baf869a5 100644
> --- a/sysdeps/x86_64/fpu/Versions
> +++ b/sysdeps/x86_64/fpu/Versions
> @@ -15,8 +15,10 @@ libmvec {
>    }
>    GLIBC_2.35 {
>      _ZGVbN2v_acos; _ZGVcN4v_acos; _ZGVdN4v_acos; _ZGVeN8v_acos;
> +    _ZGVbN2v_asin; _ZGVcN4v_asin; _ZGVdN4v_asin; _ZGVeN8v_asin;
>      _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan;
>      _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
> +    _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
>      _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf;
>    }
>  }
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index 2e64e59803..ea0f833381 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -93,6 +93,26 @@ float: 1
>  float128: 2
>  ldouble: 1
>  
> +Function: "asin_vlen16":
> +float: 1
> +
> +Function: "asin_vlen2":
> +double: 1
> +
> +Function: "asin_vlen4":
> +double: 1
> +float: 1
> +
> +Function: "asin_vlen4_avx2":
> +double: 1
> +
> +Function: "asin_vlen8":
> +double: 1
> +float: 1
> +
> +Function: "asin_vlen8_avx2":
> +float: 1
> +
>  Function: "asinh":
>  double: 2
>  float: 2
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core-sse2.S
> new file mode 100644
> index 0000000000..57e1d41a7b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized asin, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN2v_asin _ZGVbN2v_asin_sse2
> +#include "../svml_d_asin2_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core.c
> new file mode 100644
> index 0000000000..e46c3af81e
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized asin, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN2v_asin
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN2v_asin, __GI__ZGVbN2v_asin, __redirect__ZGVbN2v_asin)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core_sse4.S
> new file mode 100644
> index 0000000000..a6f7a41623
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asin2_core_sse4.S
> @@ -0,0 +1,288 @@
> +/* Function asin vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *      SelMask = (|x| >= 0.5) ? 1 : 0;
> + *      R = SelMask ? sqrt(0.5 - 0.5*|x|) : |x|
> + *      asin(x) = (SelMask ? (Pi/2 - 2*Poly(R)) : Poly(R))*(-1)^sign(x)
> + *
> + */
> +
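> +/* Illustrative scalar sketch of the selection scheme above (a paraphrase,
> + * not the kernel: asin (r) stands in for the minimax polynomial Poly(R)
> + * and special inputs are ignored):
> + *
> + *   #include <math.h>
> + *
> + *   static double
> + *   asin_sketch (double x)
> + *   {
> + *     double ax = fabs (x);
> + *     int sel = ax >= 0.5;
> + *     double r = sel ? sqrt (0.5 - 0.5 * ax) : ax;
> + *     double p = asin (r);                      // Poly(R) in the kernel
> + *     double y = sel ? (M_PI_2 - 2.0 * p) : p;  // Pi2H - 2*Poly(R)
> + *     return copysign (y, x);                   // (-1)^sign(x)
> + *   }
> + */
> +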
> +/* Offsets for data table __svml_dasin_data_internal
> + */
> +#define AbsMask                       	0
> +#define OneHalf                       	16
> +#define SmallNorm                     	32
> +#define One                           	48
> +#define Two                           	64
> +#define sqrt_coeff                    	80
> +#define poly_coeff                    	144
> +#define Pi2H                          	336
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN2v_asin_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +        movaps    %xmm0, %xmm5
> +        movups    __svml_dasin_data_internal(%rip), %xmm3
> +        movups    OneHalf+__svml_dasin_data_internal(%rip), %xmm8
> +
> +/* x = |arg| */
> +        movaps    %xmm3, %xmm4
> +        andps     %xmm5, %xmm4
> +
> +/* Y = 0.5 - 0.5*x */
> +        movaps    %xmm8, %xmm6
> +        mulpd     %xmm4, %xmm6
> +        movaps    %xmm8, %xmm14
> +
> +/* x^2 */
> +        movaps    %xmm4, %xmm2
> +        subpd     %xmm6, %xmm14
> +        mulpd     %xmm4, %xmm2
> +
> +/* S ~ -2*sqrt(Y) */
> +        cvtpd2ps  %xmm14, %xmm9
> +        minpd     %xmm14, %xmm2
> +        movlhps   %xmm9, %xmm9
> +        movaps    %xmm14, %xmm15
> +        rsqrtps   %xmm9, %xmm10
> +        cmpltpd   SmallNorm+__svml_dasin_data_internal(%rip), %xmm15
> +        addpd     %xmm14, %xmm14
> +        cvtps2pd  %xmm10, %xmm11
> +        andnps    %xmm11, %xmm15
> +        movaps    %xmm4, %xmm1
> +        movaps    %xmm15, %xmm12
> +        andnps    %xmm5, %xmm3
> +        mulpd     %xmm15, %xmm12
> +        mulpd     %xmm14, %xmm15
> +        mulpd     %xmm12, %xmm14
> +        cmpnltpd  %xmm8, %xmm1
> +        subpd     Two+__svml_dasin_data_internal(%rip), %xmm14
> +
> +/* polynomial */
> +        movups    poly_coeff+__svml_dasin_data_internal(%rip), %xmm6
> +        movaps    %xmm2, %xmm12
> +        mulpd     %xmm2, %xmm6
> +        mulpd     %xmm2, %xmm12
> +        addpd     poly_coeff+16+__svml_dasin_data_internal(%rip), %xmm6
> +        movups    One+__svml_dasin_data_internal(%rip), %xmm7
> +        movaps    %xmm12, %xmm8
> +        cmpltpd   %xmm4, %xmm7
> +        mulpd     %xmm12, %xmm6
> +        movmskpd  %xmm7, %edx
> +        movups    poly_coeff+32+__svml_dasin_data_internal(%rip), %xmm9
> +        movaps    %xmm14, %xmm0
> +        movups    poly_coeff+64+__svml_dasin_data_internal(%rip), %xmm7
> +        mulpd     %xmm2, %xmm9
> +        mulpd     %xmm2, %xmm7
> +        addpd     poly_coeff+48+__svml_dasin_data_internal(%rip), %xmm9
> +        addpd     poly_coeff+80+__svml_dasin_data_internal(%rip), %xmm7
> +        mulpd     %xmm12, %xmm8
> +        mulpd     %xmm12, %xmm7
> +        addpd     %xmm6, %xmm9
> +        mulpd     %xmm15, %xmm0
> +        mulpd     %xmm8, %xmm9
> +        movups    poly_coeff+96+__svml_dasin_data_internal(%rip), %xmm10
> +        mulpd     %xmm2, %xmm10
> +        movups    sqrt_coeff+__svml_dasin_data_internal(%rip), %xmm13
> +        mulpd     %xmm14, %xmm13
> +        addpd     poly_coeff+112+__svml_dasin_data_internal(%rip), %xmm10
> +        addpd     sqrt_coeff+16+__svml_dasin_data_internal(%rip), %xmm13
> +        addpd     %xmm7, %xmm10
> +        mulpd     %xmm14, %xmm13
> +        addpd     %xmm9, %xmm10
> +        addpd     sqrt_coeff+32+__svml_dasin_data_internal(%rip), %xmm13
> +        mulpd     %xmm12, %xmm10
> +        mulpd     %xmm13, %xmm14
> +        movups    poly_coeff+128+__svml_dasin_data_internal(%rip), %xmm11
> +        mulpd     %xmm2, %xmm11
> +        addpd     sqrt_coeff+48+__svml_dasin_data_internal(%rip), %xmm14
> +        addpd     poly_coeff+144+__svml_dasin_data_internal(%rip), %xmm11
> +        mulpd     %xmm14, %xmm0
> +        addpd     %xmm10, %xmm11
> +        subpd     %xmm15, %xmm0
> +        mulpd     %xmm11, %xmm12
> +        movups    poly_coeff+160+__svml_dasin_data_internal(%rip), %xmm13
> +        movaps    %xmm1, %xmm14
> +        mulpd     %xmm2, %xmm13
> +        addpd     poly_coeff+176+__svml_dasin_data_internal(%rip), %xmm13
> +        addpd     %xmm12, %xmm13
> +        mulpd     %xmm13, %xmm2
> +        andnps    %xmm4, %xmm14
> +        andps     %xmm1, %xmm0
> +        orps      %xmm0, %xmm14
> +        mulpd     %xmm14, %xmm2
> +        addpd     %xmm2, %xmm14
> +        movups    Pi2H+__svml_dasin_data_internal(%rip), %xmm0
> +        andps     %xmm1, %xmm0
> +        addpd     %xmm14, %xmm0
> +        pxor      %xmm3, %xmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm5
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm5, 32(%rsp)
> +        movups    %xmm0, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $2, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      asin@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 48(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN2v_asin_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_dasin_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(16)) VUINT32 AbsMask[2][2];
> +        __declspec(align(16)) VUINT32 OneHalf[2][2];
> +        __declspec(align(16)) VUINT32 SmallNorm[2][2];
> +        __declspec(align(16)) VUINT32 One[2][2];
> +        __declspec(align(16)) VUINT32 Two[2][2];
> +        __declspec(align(16)) VUINT32 sqrt_coeff[4][2][2];
> +        __declspec(align(16)) VUINT32 poly_coeff[12][2][2];
> +        __declspec(align(16)) VUINT32 Pi2H[2][2];
> +} __svml_dasin_data_internal;
> +#endif
> +__svml_dasin_data_internal:
> +        /*== AbsMask ==*/
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff
> +        /*== OneHalf ==*/
> +        .align 16
> +        .quad 0x3fe0000000000000, 0x3fe0000000000000
> +        /*== SmallNorm ==*/
> +        .align 16
> +        .quad 0x3000000000000000, 0x3000000000000000
> +        /*== One ==*/
> +        .align 16
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== Two ==*/
> +        .align 16
> +        .quad 0x4000000000000000, 0x4000000000000000
> +        /*== sqrt_coeff[4] ==*/
> +        .align 16
> +        .quad 0xbf918000993B24C3, 0xbf918000993B24C3 /* sqrt_coeff4 */
> +        .quad 0x3fa400006F70D42D, 0x3fa400006F70D42D /* sqrt_coeff3 */
> +        .quad 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97 /* sqrt_coeff2 */
> +        .quad 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D /* sqrt_coeff1 */
> +        /*== poly_coeff[12] ==*/
> +        .align 16
> +        .quad 0x3fa07520C70EB909, 0x3fa07520C70EB909 /* poly_coeff12 */
> +        .quad 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED /* poly_coeff11 */
> +        .quad 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE /* poly_coeff10 */
> +        .quad 0x3f7A583395D45ED5, 0x3f7A583395D45ED5 /* poly_coeff9 */
> +        .quad 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6 /* poly_coeff8 */
> +        .quad 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57 /* poly_coeff7 */
> +        .quad 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E /* poly_coeff6 */
> +        .quad 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd /* poly_coeff5 */
> +        .quad 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE /* poly_coeff4 */
> +        .quad 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8 /* poly_coeff3 */
> +        .quad 0x3fb333333337E0DE, 0x3fb333333337E0DE /* poly_coeff2 */
> +        .quad 0x3fc555555555529C, 0x3fc555555555529C /* poly_coeff1 */
> +        /*== Pi2H ==*/
> +        .align 16
> +        .quad 0x3ff921fb54442d18, 0x3ff921fb54442d18
> +        .align 16
> +        .type	__svml_dasin_data_internal,@object
> +        .size	__svml_dasin_data_internal,.-__svml_dasin_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core-sse.S
> new file mode 100644
> index 0000000000..1006fddc59
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized asin, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN4v_asin _ZGVdN4v_asin_sse_wrapper
> +#include "../svml_d_asin4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core.c
> new file mode 100644
> index 0000000000..b896516f5e
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized asin, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN4v_asin
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN4v_asin, __GI__ZGVdN4v_asin, __redirect__ZGVdN4v_asin)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
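
(Usage note, not part of the patch: once this ifunc is in place, an
ordinary libm call in a vectorizable loop can reach the kernels through
the vector ABI.  The sketch below is indicative only; the exact flags
needed to enable libmvec vectorization depend on the compiler.)

/* Built with something like
     gcc -O3 -ffast-math -march=haswell demo.c -lmvec -lm
   the loop below may be vectorized into calls such as _ZGVdN4v_asin,
   which this file's ifunc then resolves to the AVX2 variant at run time.  */
#include <math.h>

void
asin_array (double *dst, const double *src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = asin (src[i]);
}
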
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core_avx2.S
> new file mode 100644
> index 0000000000..80467b616f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asin4_core_avx2.S
> @@ -0,0 +1,273 @@
> +/* Function asin vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *      SelMask = (|x| >= 0.5) ? 1 : 0;
> + *      R = SelMask ? sqrt(0.5 - 0.5*|x|) : |x|
> + *      asin(x) = (SelMask ? (Pi/2 - 2*Poly(R)) : Poly(R))*(-1)^sign(x)
> + *
> + */
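
(The same three-line scheme is shared by every asin/asinf kernel in this
patch.  As a plain-C reference it reads roughly as below; this is
illustrative only, since poly() here uses the first Taylor terms of asin,
whereas the kernels evaluate the poly_coeff/sqrt_coeff minimax
polynomials from the data table.)

#include <math.h>

/* Low-order stand-in for the table-driven polynomial Poly(R).  */
static double
poly (double r)
{
  double r2 = r * r;
  return r * (1.0 + r2 * (1.0 / 6.0 + r2 * (3.0 / 40.0)));
}

/* asin(x) for |x| <= 1: for |x| >= 0.5 it uses the identity
   asin(|x|) = pi/2 - 2*asin(sqrt((1 - |x|)/2)).  */
static double
asin_ref (double x)
{
  double ax = fabs (x);
  int sel = ax >= 0.5;                           /* SelMask */
  double r = sel ? sqrt (0.5 - 0.5 * ax) : ax;   /* R */
  double p = poly (r);
  double res = sel ? (M_PI_2 - 2.0 * p) : p;
  return copysign (res, x);                      /* (-1)^sign(x) */
}
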
> +
> +/* Offsets for data table __svml_dasin_data_internal
> + */
> +#define AbsMask                       	0
> +#define OneHalf                       	32
> +#define SmallNorm                     	64
> +#define One                           	96
> +#define Two                           	128
> +#define sqrt_coeff                    	160
> +#define poly_coeff                    	288
> +#define Pi2H                          	672
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN4v_asin_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +        vmovupd   __svml_dasin_data_internal(%rip), %ymm6
> +        vmovupd   OneHalf+__svml_dasin_data_internal(%rip), %ymm10
> +        vmovupd   One+__svml_dasin_data_internal(%rip), %ymm8
> +        vmovapd   %ymm0, %ymm5
> +
> +/* x = |arg| */
> +        vandpd    %ymm5, %ymm6, %ymm4
> +
> +/* Y = 0.5 - 0.5*x */
> +        vmovapd   %ymm10, %ymm15
> +        vfnmadd231pd %ymm4, %ymm10, %ymm15
> +
> +/* x^2 */
> +        vmulpd    %ymm4, %ymm4, %ymm7
> +        vcmplt_oqpd %ymm4, %ymm8, %ymm9
> +
> +/* S ~ -2*sqrt(Y) */
> +        vcmplt_oqpd SmallNorm+__svml_dasin_data_internal(%rip), %ymm15, %ymm13
> +        vminpd    %ymm15, %ymm7, %ymm2
> +        vaddpd    %ymm15, %ymm15, %ymm7
> +        vcmpnlt_uqpd %ymm10, %ymm4, %ymm1
> +        vcvtpd2ps %ymm15, %xmm11
> +        vmovupd   poly_coeff+64+__svml_dasin_data_internal(%rip), %ymm10
> +        vmulpd    %ymm2, %ymm2, %ymm15
> +        vrsqrtps  %xmm11, %xmm12
> +        vmovupd   poly_coeff+192+__svml_dasin_data_internal(%rip), %ymm11
> +        vfmadd213pd poly_coeff+96+__svml_dasin_data_internal(%rip), %ymm2, %ymm10
> +        vcvtps2pd %xmm12, %ymm14
> +        vmulpd    %ymm15, %ymm15, %ymm12
> +        vfmadd213pd poly_coeff+224+__svml_dasin_data_internal(%rip), %ymm2, %ymm11
> +        vandnpd   %ymm14, %ymm13, %ymm0
> +        vandnpd   %ymm5, %ymm6, %ymm3
> +        vmulpd    %ymm0, %ymm0, %ymm6
> +        vmovupd   poly_coeff+128+__svml_dasin_data_internal(%rip), %ymm13
> +        vmovupd   poly_coeff+256+__svml_dasin_data_internal(%rip), %ymm14
> +        vfmadd213pd poly_coeff+160+__svml_dasin_data_internal(%rip), %ymm2, %ymm13
> +        vfmadd213pd poly_coeff+288+__svml_dasin_data_internal(%rip), %ymm2, %ymm14
> +        vfmadd213pd %ymm11, %ymm15, %ymm13
> +        vmovmskpd %ymm9, %edx
> +        vmulpd    %ymm7, %ymm0, %ymm9
> +        vfmsub213pd Two+__svml_dasin_data_internal(%rip), %ymm6, %ymm7
> +
> +/* polynomial */
> +        vmovupd   poly_coeff+__svml_dasin_data_internal(%rip), %ymm6
> +        vmovupd   sqrt_coeff+__svml_dasin_data_internal(%rip), %ymm0
> +        vmulpd    %ymm7, %ymm9, %ymm8
> +        vfmadd213pd poly_coeff+32+__svml_dasin_data_internal(%rip), %ymm2, %ymm6
> +        vfmadd213pd sqrt_coeff+32+__svml_dasin_data_internal(%rip), %ymm7, %ymm0
> +        vfmadd213pd %ymm10, %ymm15, %ymm6
> +        vmovupd   poly_coeff+320+__svml_dasin_data_internal(%rip), %ymm10
> +        vfmadd213pd sqrt_coeff+64+__svml_dasin_data_internal(%rip), %ymm7, %ymm0
> +        vfmadd213pd %ymm13, %ymm12, %ymm6
> +        vfmadd213pd poly_coeff+352+__svml_dasin_data_internal(%rip), %ymm2, %ymm10
> +        vfmadd213pd sqrt_coeff+96+__svml_dasin_data_internal(%rip), %ymm7, %ymm0
> +        vfmadd213pd %ymm14, %ymm15, %ymm6
> +        vfmsub213pd %ymm9, %ymm8, %ymm0
> +        vfmadd213pd %ymm10, %ymm15, %ymm6
> +        vblendvpd %ymm1, %ymm0, %ymm4, %ymm4
> +        vmulpd    %ymm6, %ymm2, %ymm2
> +        vfmadd213pd %ymm4, %ymm4, %ymm2
> +        vandpd    Pi2H+__svml_dasin_data_internal(%rip), %ymm1, %ymm1
> +        vaddpd    %ymm2, %ymm1, %ymm0
> +        vxorpd    %ymm3, %ymm0, %ymm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm5
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovupd   %ymm5, 32(%rsp)
> +        vmovupd   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovupd   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      asin@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 64(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN4v_asin_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_dasin_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(32)) VUINT32 AbsMask[4][2];
> +        __declspec(align(32)) VUINT32 OneHalf[4][2];
> +        __declspec(align(32)) VUINT32 SmallNorm[4][2];
> +        __declspec(align(32)) VUINT32 One[4][2];
> +        __declspec(align(32)) VUINT32 Two[4][2];
> +        __declspec(align(32)) VUINT32 sqrt_coeff[4][4][2];
> +        __declspec(align(32)) VUINT32 poly_coeff[12][4][2];
> +        __declspec(align(32)) VUINT32 Pi2H[4][2];
> +} __svml_dasin_data_internal;
> +#endif
> +__svml_dasin_data_internal:
> +        /*== AbsMask ==*/
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
> +        /*== OneHalf ==*/
> +        .align 32
> +        .quad 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000
> +        /*== SmallNorm ==*/
> +        .align 32
> +        .quad 0x3000000000000000, 0x3000000000000000, 0x3000000000000000, 0x3000000000000000
> +        /*== One ==*/
> +        .align 32
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== Two ==*/
> +        .align 32
> +        .quad 0x4000000000000000, 0x4000000000000000, 0x4000000000000000, 0x4000000000000000
> +        /*== sqrt_coeff[4] ==*/
> +        .align 32
> +        .quad 0xbf918000993B24C3, 0xbf918000993B24C3, 0xbf918000993B24C3, 0xbf918000993B24C3 /* sqrt_coeff4 */
> +        .quad 0x3fa400006F70D42D, 0x3fa400006F70D42D, 0x3fa400006F70D42D, 0x3fa400006F70D42D /* sqrt_coeff3 */
> +        .quad 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97 /* sqrt_coeff2 */
> +        .quad 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D /* sqrt_coeff1 */
> +        /*== poly_coeff[12] ==*/
> +        .align 32
> +        .quad 0x3fa07520C70EB909, 0x3fa07520C70EB909, 0x3fa07520C70EB909, 0x3fa07520C70EB909 /* poly_coeff12 */
> +        .quad 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED /* poly_coeff11 */
> +        .quad 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE /* poly_coeff10 */
> +        .quad 0x3f7A583395D45ED5, 0x3f7A583395D45ED5, 0x3f7A583395D45ED5, 0x3f7A583395D45ED5 /* poly_coeff9 */
> +        .quad 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6 /* poly_coeff8 */
> +        .quad 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57 /* poly_coeff7 */
> +        .quad 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E /* poly_coeff6 */
> +        .quad 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd /* poly_coeff5 */
> +        .quad 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE /* poly_coeff4 */
> +        .quad 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8 /* poly_coeff3 */
> +        .quad 0x3fb333333337E0DE, 0x3fb333333337E0DE, 0x3fb333333337E0DE, 0x3fb333333337E0DE /* poly_coeff2 */
> +        .quad 0x3fc555555555529C, 0x3fc555555555529C, 0x3fc555555555529C, 0x3fc555555555529C /* poly_coeff1 */
> +        /*== Pi2H ==*/
> +        .align 32
> +        .quad 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18
> +        .align 32
> +        .type	__svml_dasin_data_internal,@object
> +        .size	__svml_dasin_data_internal,.-__svml_dasin_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core-avx2.S
> new file mode 100644
> index 0000000000..354a55dfaa
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized asin, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN8v_asin _ZGVeN8v_asin_avx2_wrapper
> +#include "../svml_d_asin8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core.c
> new file mode 100644
> index 0000000000..b03e4a2b9c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized asin, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN8v_asin
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN8v_asin, __GI__ZGVeN8v_asin, __redirect__ZGVeN8v_asin)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core_avx512.S
> new file mode 100644
> index 0000000000..b2fd8edb13
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asin8_core_avx512.S
> @@ -0,0 +1,295 @@
> +/* Function asin vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *      SelMask = (|x| >= 0.5) ? 1 : 0;
> + *      R = SelMask ? sqrt(0.5 - 0.5*|x|) : |x|
> + *      asin(x) = (SelMask ? (Pi/2 - 2*Poly(R)) : Poly(R))*(-1)^sign(x)
> + *
> + */
> +
> +/* Offsets for data table __svml_dasin_data_internal
> + */
> +#define AbsMask                       	0
> +#define OneHalf                       	64
> +#define SmallNorm                     	128
> +#define One                           	192
> +#define Two                           	256
> +#define sqrt_coeff_1                  	320
> +#define sqrt_coeff_2                  	384
> +#define sqrt_coeff_3                  	448
> +#define sqrt_coeff_4                  	512
> +#define poly_coeff_1                  	576
> +#define poly_coeff_2                  	640
> +#define poly_coeff_3                  	704
> +#define poly_coeff_4                  	768
> +#define poly_coeff_5                  	832
> +#define poly_coeff_6                  	896
> +#define poly_coeff_7                  	960
> +#define poly_coeff_8                  	1024
> +#define poly_coeff_9                  	1088
> +#define poly_coeff_10                 	1152
> +#define poly_coeff_11                 	1216
> +#define poly_coeff_12                 	1280
> +#define Pi2H                          	1344
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN8v_asin_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovups   OneHalf+__svml_dasin_data_internal(%rip), %zmm8
> +
> +/* S ~ -2*sqrt(Y) */
> +        vmovups   SmallNorm+__svml_dasin_data_internal(%rip), %zmm10
> +        vmovups   Two+__svml_dasin_data_internal(%rip), %zmm14
> +        vmovups   sqrt_coeff_1+__svml_dasin_data_internal(%rip), %zmm15
> +        vmovups   sqrt_coeff_2+__svml_dasin_data_internal(%rip), %zmm2
> +        vmovups   sqrt_coeff_3+__svml_dasin_data_internal(%rip), %zmm1
> +        vmovups   One+__svml_dasin_data_internal(%rip), %zmm9
> +        vmovaps   %zmm0, %zmm6
> +
> +/* x = |arg| */
> +        vandpd    __svml_dasin_data_internal(%rip), %zmm6, %zmm4
> +
> +/* Y = 0.5 - 0.5*x */
> +        vmovaps   %zmm8, %zmm11
> +        vfnmadd231pd {rn-sae}, %zmm4, %zmm8, %zmm11
> +
> +/* x^2 */
> +        vmulpd    {rn-sae}, %zmm4, %zmm4, %zmm7
> +        vrsqrt14pd %zmm11, %zmm12
> +        vcmppd    $17, {sae}, %zmm10, %zmm11, %k1
> +        vcmppd    $21, {sae}, %zmm8, %zmm4, %k2
> +        vcmppd    $17, {sae}, %zmm4, %zmm9, %k0
> +        vmovups   poly_coeff_5+__svml_dasin_data_internal(%rip), %zmm10
> +
> +/* polynomial */
> +        vmovups   poly_coeff_1+__svml_dasin_data_internal(%rip), %zmm8
> +        vmovups   poly_coeff_3+__svml_dasin_data_internal(%rip), %zmm9
> +        vminpd    {sae}, %zmm11, %zmm7, %zmm3
> +        vxorpd    %zmm12, %zmm12, %zmm12{%k1}
> +        vaddpd    {rn-sae}, %zmm11, %zmm11, %zmm0
> +        vxorpd    %zmm6, %zmm4, %zmm5
> +        vmulpd    {rn-sae}, %zmm12, %zmm12, %zmm13
> +        vmulpd    {rn-sae}, %zmm12, %zmm0, %zmm7
> +        vmovups   poly_coeff_7+__svml_dasin_data_internal(%rip), %zmm11
> +        vmovups   poly_coeff_4+__svml_dasin_data_internal(%rip), %zmm12
> +        vfmsub213pd {rn-sae}, %zmm14, %zmm13, %zmm0
> +        vmovups   sqrt_coeff_4+__svml_dasin_data_internal(%rip), %zmm13
> +        vfmadd231pd {rn-sae}, %zmm3, %zmm9, %zmm12
> +        vmovups   poly_coeff_11+__svml_dasin_data_internal(%rip), %zmm9
> +        vfmadd231pd {rn-sae}, %zmm0, %zmm15, %zmm2
> +        vmovups   poly_coeff_9+__svml_dasin_data_internal(%rip), %zmm15
> +        vmulpd    {rn-sae}, %zmm0, %zmm7, %zmm14
> +        vfmadd213pd {rn-sae}, %zmm1, %zmm0, %zmm2
> +        vmovups   poly_coeff_2+__svml_dasin_data_internal(%rip), %zmm1
> +        kmovw     %k0, %edx
> +        vfmadd213pd {rn-sae}, %zmm13, %zmm0, %zmm2
> +        vfmadd231pd {rn-sae}, %zmm3, %zmm8, %zmm1
> +        vmovups   poly_coeff_10+__svml_dasin_data_internal(%rip), %zmm8
> +        vmulpd    {rn-sae}, %zmm3, %zmm3, %zmm0
> +        vfmsub213pd {rn-sae}, %zmm7, %zmm14, %zmm2
> +        vmovups   poly_coeff_6+__svml_dasin_data_internal(%rip), %zmm7
> +        vfmadd231pd {rn-sae}, %zmm3, %zmm15, %zmm8
> +        vfmadd213pd {rn-sae}, %zmm12, %zmm0, %zmm1
> +        vblendmpd %zmm2, %zmm4, %zmm2{%k2}
> +        vfmadd231pd {rn-sae}, %zmm3, %zmm10, %zmm7
> +        vmovups   poly_coeff_8+__svml_dasin_data_internal(%rip), %zmm10
> +        vmovups   Pi2H+__svml_dasin_data_internal(%rip), %zmm4
> +        vfmadd231pd {rn-sae}, %zmm3, %zmm11, %zmm10
> +        vmovups   poly_coeff_12+__svml_dasin_data_internal(%rip), %zmm11
> +        vfmadd213pd {rn-sae}, %zmm10, %zmm0, %zmm7
> +        vfmadd231pd {rn-sae}, %zmm3, %zmm9, %zmm11
> +        vmulpd    {rn-sae}, %zmm0, %zmm0, %zmm10
> +        vfmadd213pd {rn-sae}, %zmm7, %zmm10, %zmm1
> +        vfmadd213pd {rn-sae}, %zmm8, %zmm0, %zmm1
> +        vfmadd213pd {rn-sae}, %zmm11, %zmm0, %zmm1
> +        vmulpd    {rn-sae}, %zmm3, %zmm1, %zmm3
> +        vfmadd213pd {rn-sae}, %zmm2, %zmm2, %zmm3
> +        vaddpd    {rn-sae}, %zmm4, %zmm3, %zmm3{%k2}
> +        vxorpd    %zmm5, %zmm3, %zmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm6
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm6, 64(%rsp)
> +        vmovups   %zmm0, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     64(%rsp,%r14,8), %xmm0
> +        call      asin@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 128(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN8v_asin_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_dasin_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 AbsMask[8][2];
> +        __declspec(align(64)) VUINT32 OneHalf[8][2];
> +        __declspec(align(64)) VUINT32 SmallNorm[8][2];
> +        __declspec(align(64)) VUINT32 One[8][2];
> +        __declspec(align(64)) VUINT32 Two[8][2];
> +        __declspec(align(64)) VUINT32 sqrt_coeff[4][8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff[12][8][2];
> +        __declspec(align(64)) VUINT32 Pi2H[8][2];
> +} __svml_dasin_data_internal;
> +#endif
> +__svml_dasin_data_internal:
> +        /*== AbsMask ==*/
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
> +        /*== OneHalf ==*/
> +        .align 64
> +        .quad 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000
> +        /*== SmallNorm ==*/
> +        .align 64
> +        .quad 0x3000000000000000, 0x3000000000000000, 0x3000000000000000, 0x3000000000000000, 0x3000000000000000, 0x3000000000000000, 0x3000000000000000, 0x3000000000000000
> +        /*== One ==*/
> +        .align 64
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== Two ==*/
> +        .align 64
> +        .quad 0x4000000000000000, 0x4000000000000000, 0x4000000000000000, 0x4000000000000000, 0x4000000000000000, 0x4000000000000000, 0x4000000000000000, 0x4000000000000000
> +        /*== sqrt_coeff[4] ==*/
> +        .align 64
> +        .quad 0xbf918000993B24C3, 0xbf918000993B24C3, 0xbf918000993B24C3, 0xbf918000993B24C3, 0xbf918000993B24C3, 0xbf918000993B24C3, 0xbf918000993B24C3, 0xbf918000993B24C3 /* sqrt_coeff4 */
> +        .quad 0x3fa400006F70D42D, 0x3fa400006F70D42D, 0x3fa400006F70D42D, 0x3fa400006F70D42D, 0x3fa400006F70D42D, 0x3fa400006F70D42D, 0x3fa400006F70D42D, 0x3fa400006F70D42D /* sqrt_coeff3 */
> +        .quad 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97, 0xbfb7FFFFFFFFFE97 /* sqrt_coeff2 */
> +        .quad 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D, 0x3fcFFFFFFFFFFF9D /* sqrt_coeff1 */
> +        /*== poly_coeff[12] ==*/
> +        .align 64
> +        .quad 0x3fa07520C70EB909, 0x3fa07520C70EB909, 0x3fa07520C70EB909, 0x3fa07520C70EB909, 0x3fa07520C70EB909, 0x3fa07520C70EB909, 0x3fa07520C70EB909, 0x3fa07520C70EB909 /* poly_coeff12 */
> +        .quad 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED, 0xbf90FB17F7DBB0ED /* poly_coeff11 */
> +        .quad 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE, 0x3f943F44BFBC3BAE /* poly_coeff10 */
> +        .quad 0x3f7A583395D45ED5, 0x3f7A583395D45ED5, 0x3f7A583395D45ED5, 0x3f7A583395D45ED5, 0x3f7A583395D45ED5, 0x3f7A583395D45ED5, 0x3f7A583395D45ED5, 0x3f7A583395D45ED5 /* poly_coeff9 */
> +        .quad 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6, 0x3f88F8DC2AFCCAD6 /* poly_coeff8 */
> +        .quad 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57, 0x3f8C6DBBCB88BD57 /* poly_coeff7 */
> +        .quad 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E, 0x3f91C6DCF538AD2E /* poly_coeff6 */
> +        .quad 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd, 0x3f96E89CEBDEFadd /* poly_coeff5 */
> +        .quad 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE, 0x3f9F1C72E13AD8BE /* poly_coeff4 */
> +        .quad 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8, 0x3fa6DB6DB3B445F8 /* poly_coeff3 */
> +        .quad 0x3fb333333337E0DE, 0x3fb333333337E0DE, 0x3fb333333337E0DE, 0x3fb333333337E0DE, 0x3fb333333337E0DE, 0x3fb333333337E0DE, 0x3fb333333337E0DE, 0x3fb333333337E0DE /* poly_coeff2 */
> +        .quad 0x3fc555555555529C, 0x3fc555555555529C, 0x3fc555555555529C, 0x3fc555555555529C, 0x3fc555555555529C, 0x3fc555555555529C, 0x3fc555555555529C, 0x3fc555555555529C /* poly_coeff1 */
> +        /*== Pi2H ==*/
> +        .align 64
> +        .quad 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18
> +        .align 64
> +        .type	__svml_dasin_data_internal,@object
> +        .size	__svml_dasin_data_internal,.-__svml_dasin_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core-avx2.S
> new file mode 100644
> index 0000000000..e0582f27d4
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized asinf.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN16v_asinf _ZGVeN16v_asinf_avx2_wrapper
> +#include "../svml_s_asinf16_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core.c
> new file mode 100644
> index 0000000000..4435055566
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized asinf, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN16v_asinf
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN16v_asinf, __GI__ZGVeN16v_asinf,
> +	       __redirect__ZGVeN16v_asinf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core_avx512.S
> new file mode 100644
> index 0000000000..7afdfd1317
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf16_core_avx512.S
> @@ -0,0 +1,260 @@
> +/* Function asinf vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *      SelMask = (|x| >= 0.5) ? 1 : 0;
> + *      R = SelMask ? sqrt(0.5 - 0.5*|x|) : |x|
> + *      asin(x) = (SelMask ? (Pi/2 - 2*Poly(R)) : Poly(R))*(-1)^sign(x)
> + *
> + *
> + */
> +
> +/* Offsets for data table __svml_sasin_data_internal
> + */
> +#define AbsMask                       	0
> +#define OneHalf                       	64
> +#define SmallNorm                     	128
> +#define One                           	192
> +#define Two                           	256
> +#define sqrt_coeff_1                  	320
> +#define sqrt_coeff_2                  	384
> +#define poly_coeff_1                  	448
> +#define poly_coeff_2                  	512
> +#define poly_coeff_3                  	576
> +#define poly_coeff_4                  	640
> +#define poly_coeff_5                  	704
> +#define Pi2H                          	768
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN16v_asinf_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovups   __svml_sasin_data_internal(%rip), %zmm4
> +        vmovups   OneHalf+__svml_sasin_data_internal(%rip), %zmm6
> +
> +/* SQ ~ -2*sqrt(Y) */
> +        vmovups   SmallNorm+__svml_sasin_data_internal(%rip), %zmm8
> +        vmovups   Two+__svml_sasin_data_internal(%rip), %zmm12
> +        vmovups   sqrt_coeff_1+__svml_sasin_data_internal(%rip), %zmm13
> +        vmovups   One+__svml_sasin_data_internal(%rip), %zmm7
> +        vmovaps   %zmm0, %zmm3
> +
> +/* x = |arg| */
> +        vandps    %zmm3, %zmm4, %zmm2
> +        vandnps   %zmm3, %zmm4, %zmm1
> +
> +/* x^2 */
> +        vmulps    {rn-sae}, %zmm2, %zmm2, %zmm5
> +        vcmpps    $17, {sae}, %zmm2, %zmm7, %k0
> +        vcmpps    $21, {sae}, %zmm6, %zmm2, %k2
> +        vmovups   poly_coeff_2+__svml_sasin_data_internal(%rip), %zmm7
> +        kmovw     %k0, %edx
> +
> +/* Y = 0.5 - 0.5*x */
> +        vmovaps   %zmm6, %zmm9
> +        vfnmadd231ps {rn-sae}, %zmm2, %zmm6, %zmm9
> +        vmovups   poly_coeff_5+__svml_sasin_data_internal(%rip), %zmm6
> +        vrsqrt14ps %zmm9, %zmm10
> +        vcmpps    $17, {sae}, %zmm8, %zmm9, %k1
> +        vminps    {sae}, %zmm9, %zmm5, %zmm0
> +        vmovups   sqrt_coeff_2+__svml_sasin_data_internal(%rip), %zmm8
> +        vmovups   poly_coeff_4+__svml_sasin_data_internal(%rip), %zmm5
> +        vxorps    %zmm10, %zmm10, %zmm10{%k1}
> +        vaddps    {rn-sae}, %zmm9, %zmm9, %zmm14
> +        vmulps    {rn-sae}, %zmm10, %zmm10, %zmm11
> +        vmulps    {rn-sae}, %zmm10, %zmm14, %zmm4
> +        vfmsub213ps {rn-sae}, %zmm12, %zmm11, %zmm14
> +        vmulps    {rn-sae}, %zmm14, %zmm4, %zmm15
> +        vfmadd231ps {rn-sae}, %zmm14, %zmm13, %zmm8
> +        vmovups   poly_coeff_3+__svml_sasin_data_internal(%rip), %zmm14
> +
> +/* polynomial */
> +        vmovups   poly_coeff_1+__svml_sasin_data_internal(%rip), %zmm13
> +        vfmsub213ps {rn-sae}, %zmm4, %zmm15, %zmm8
> +        vfmadd231ps {rn-sae}, %zmm0, %zmm14, %zmm5
> +        vfmadd231ps {rn-sae}, %zmm0, %zmm13, %zmm7
> +        vmulps    {rn-sae}, %zmm0, %zmm0, %zmm15
> +        vblendmps %zmm8, %zmm2, %zmm2{%k2}
> +        vfmadd213ps {rn-sae}, %zmm5, %zmm15, %zmm7
> +        vfmadd213ps {rn-sae}, %zmm6, %zmm0, %zmm7
> +        vmulps    {rn-sae}, %zmm0, %zmm7, %zmm9
> +        vmovups   Pi2H+__svml_sasin_data_internal(%rip), %zmm0
> +        vfmadd213ps {rn-sae}, %zmm2, %zmm2, %zmm9
> +        vaddps    {rn-sae}, %zmm0, %zmm9, %zmm9{%k2}
> +        vxorps    %zmm1, %zmm9, %zmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm3
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm3, 64(%rsp)
> +        vmovups   %zmm0, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $16, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     64(%rsp,%r14,4), %xmm0
> +        call      asinf@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 128(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN16v_asinf_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_sasin_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 AbsMask[16][1];
> +        __declspec(align(64)) VUINT32 OneHalf[16][1];
> +        __declspec(align(64)) VUINT32 SmallNorm[16][1];
> +        __declspec(align(64)) VUINT32 One[16][1];
> +        __declspec(align(64)) VUINT32 Two[16][1];
> +        __declspec(align(64)) VUINT32 sqrt_coeff[2][16][1];
> +        __declspec(align(64)) VUINT32 poly_coeff[5][16][1];
> +        __declspec(align(64)) VUINT32 Pi2H[16][1];
> +} __svml_sasin_data_internal;
> +#endif
> +__svml_sasin_data_internal:
> +        /*== AbsMask ==*/
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> +        /*== OneHalf ==*/
> +        .align 64
> +        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000
> +        /*== SmallNorm ==*/
> +        .align 64
> +        .long 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000
> +        /*== One ==*/
> +        .align 64
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== Two ==*/
> +        .align 64
> +        .long 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000
> +        /*== sqrt_coeff[2] ==*/
> +        .align 64
> +        .long 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004 /* sqrt_coeff2 */
> +        .long 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001 /* sqrt_coeff1 */
> +        /*== poly_coeff[5] ==*/
> +        .align 64
> +        .long 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07 /* poly_coeff5 */
> +        .long 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B /* poly_coeff4 */
> +        .long 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4 /* poly_coeff3 */
> +        .long 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12 /* poly_coeff2 */
> +        .long 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF /* poly_coeff1 */
> +        /*== Pi2H ==*/
> +        .align 64
> +        .long 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB
> +        .align 64
> +        .type	__svml_sasin_data_internal,@object
> +        .size	__svml_sasin_data_internal,.-__svml_sasin_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core-sse2.S
> new file mode 100644
> index 0000000000..b958db7795
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized asinf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN4v_asinf _ZGVbN4v_asinf_sse2
> +#include "../svml_s_asinf4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core.c
> new file mode 100644
> index 0000000000..5a7aa94264
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized asinf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN4v_asinf
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN4v_asinf, __GI__ZGVbN4v_asinf,
> +	       __redirect__ZGVbN4v_asinf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core_sse4.S
> new file mode 100644
> index 0000000000..ddcceeb7b9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf4_core_sse4.S
> @@ -0,0 +1,252 @@
> +/* Function asinf vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *      SelMask = (|x| >= 0.5) ? 1 : 0;
> + *      R = SelMask ? sqrt(0.5 - 0.5*|x|) : |x|
> + *      asin(x) = (SelMask ? (Pi/2 - 2*Poly(R)) : Poly(R))*(-1)^sign(x)
> + *
> + *
> + */
> +
> +/* Offsets for data table __svml_sasin_data_internal
> + */
> +#define AbsMask                       	0
> +#define OneHalf                       	16
> +#define SmallNorm                     	32
> +#define One                           	48
> +#define Two                           	64
> +#define sqrt_coeff                    	80
> +#define poly_coeff                    	112
> +#define Pi2H                          	192
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN4v_asinf_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +        movaps    %xmm0, %xmm2
> +        movups    __svml_sasin_data_internal(%rip), %xmm1
> +        movups    OneHalf+__svml_sasin_data_internal(%rip), %xmm5
> +
> +/* x = |arg| */
> +        movaps    %xmm1, %xmm0
> +        andps     %xmm2, %xmm0
> +
> +/* Y = 0.5 - 0.5*x */
> +        movaps    %xmm5, %xmm3
> +        mulps     %xmm0, %xmm3
> +        movaps    %xmm5, %xmm8
> +
> +/* x^2 */
> +        movaps    %xmm0, %xmm14
> +        movaps    %xmm0, %xmm15
> +        mulps     %xmm0, %xmm14
> +        subps     %xmm3, %xmm8
> +        cmpnltps  %xmm5, %xmm15
> +
> +/* SQ ~ -2*sqrt(Y) */
> +        rsqrtps   %xmm8, %xmm6
> +        minps     %xmm8, %xmm14
> +        movaps    %xmm8, %xmm9
> +        movaps    %xmm14, %xmm10
> +        cmpltps   SmallNorm+__svml_sasin_data_internal(%rip), %xmm9
> +        mulps     %xmm14, %xmm10
> +        addps     %xmm8, %xmm8
> +        andnps    %xmm6, %xmm9
> +        movaps    %xmm15, %xmm3
> +        movaps    %xmm9, %xmm7
> +        andnps    %xmm0, %xmm3
> +        mulps     %xmm9, %xmm7
> +        andnps    %xmm2, %xmm1
> +        mulps     %xmm8, %xmm9
> +        mulps     %xmm7, %xmm8
> +
> +/* polynomial */
> +        movups    poly_coeff+__svml_sasin_data_internal(%rip), %xmm11
> +        mulps     %xmm14, %xmm11
> +        subps     Two+__svml_sasin_data_internal(%rip), %xmm8
> +        movups    poly_coeff+32+__svml_sasin_data_internal(%rip), %xmm12
> +        mulps     %xmm14, %xmm12
> +        addps     poly_coeff+16+__svml_sasin_data_internal(%rip), %xmm11
> +        mulps     %xmm10, %xmm11
> +        addps     poly_coeff+48+__svml_sasin_data_internal(%rip), %xmm12
> +        movups    sqrt_coeff+__svml_sasin_data_internal(%rip), %xmm13
> +        addps     %xmm11, %xmm12
> +        mulps     %xmm8, %xmm13
> +        mulps     %xmm9, %xmm8
> +        mulps     %xmm14, %xmm12
> +        addps     sqrt_coeff+16+__svml_sasin_data_internal(%rip), %xmm13
> +        addps     poly_coeff+64+__svml_sasin_data_internal(%rip), %xmm12
> +        mulps     %xmm8, %xmm13
> +        mulps     %xmm12, %xmm14
> +        subps     %xmm9, %xmm13
> +        andps     %xmm15, %xmm13
> +        orps      %xmm13, %xmm3
> +        mulps     %xmm3, %xmm14
> +        movups    One+__svml_sasin_data_internal(%rip), %xmm4
> +        addps     %xmm14, %xmm3
> +        cmpltps   %xmm0, %xmm4
> +        movups    Pi2H+__svml_sasin_data_internal(%rip), %xmm0
> +        andps     %xmm15, %xmm0
> +        movmskps  %xmm4, %edx
> +        addps     %xmm3, %xmm0
> +        pxor      %xmm1, %xmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm2
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm2, 32(%rsp)
> +        movups    %xmm0, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      asinf@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 48(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN4v_asinf_sse4)
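
For reference, the special-value path of this function amounts to the per-lane fallback sketched below (illustrative names, not code from the patch): each bit set by movmskps marks a lane that is recomputed with the scalar asinf and written back over the spilled vector result.

#include <math.h>

/* Sketch of L(RANGEMASK_CHECK)/L(SPECIAL_VALUES_LOOP): mask holds the
   movmskps bits, arg and result stand for the two 16-byte stack
   spills.  */
static void
fixup_special_lanes (float result[4], const float arg[4], int mask)
{
  for (int i = 0; i < 4; i++)      /* incl %r12d; cmpl $4, %r12d */
    if (mask & (1 << i))           /* btl %r12d, %r13d           */
      result[i] = asinf (arg[i]);  /* call asinf@PLT             */
}
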
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_sasin_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(16)) VUINT32 AbsMask[4][1];
> +        __declspec(align(16)) VUINT32 OneHalf[4][1];
> +        __declspec(align(16)) VUINT32 SmallNorm[4][1];
> +        __declspec(align(16)) VUINT32 One[4][1];
> +        __declspec(align(16)) VUINT32 Two[4][1];
> +        __declspec(align(16)) VUINT32 sqrt_coeff[2][4][1];
> +        __declspec(align(16)) VUINT32 poly_coeff[5][4][1];
> +        __declspec(align(16)) VUINT32 Pi2H[4][1];
> +} __svml_sasin_data_internal;
> +#endif
> +__svml_sasin_data_internal:
> +        /*== AbsMask ==*/
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> +        /*== OneHalf ==*/
> +        .align 16
> +        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000
> +        /*== SmallNorm ==*/
> +        .align 16
> +        .long 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000
> +        /*== One ==*/
> +        .align 16
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== Two ==*/
> +        .align 16
> +        .long 0x40000000, 0x40000000, 0x40000000, 0x40000000
> +        /*== sqrt_coeff[2] ==*/
> +        .align 16
> +        .long 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004 /* sqrt_coeff2 */
> +        .long 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001 /* sqrt_coeff1 */
> +        /*== poly_coeff[5] ==*/
> +        .align 16
> +        .long 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07 /* poly_coeff5 */
> +        .long 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B /* poly_coeff4 */
> +        .long 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4 /* poly_coeff3 */
> +        .long 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12 /* poly_coeff2 */
> +        .long 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF /* poly_coeff1 */
> +        /*== Pi2H ==*/
> +        .align 16
> +        .long 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB
> +        .align 16
> +        .type	__svml_sasin_data_internal,@object
> +        .size	__svml_sasin_data_internal,.-__svml_sasin_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core-sse.S
> new file mode 100644
> index 0000000000..6273c919d6
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized asinf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN8v_asinf _ZGVdN8v_asinf_sse_wrapper
> +#include "../svml_s_asinf8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core.c
> new file mode 100644
> index 0000000000..946b25b43f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized asinf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN8v_asinf
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN8v_asinf, __GI__ZGVdN8v_asinf,
> +	       __redirect__ZGVdN8v_asinf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core_avx2.S
> new file mode 100644
> index 0000000000..89c156dbbb
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinf8_core_avx2.S
> @@ -0,0 +1,249 @@
> +/* Function asinf vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *      SelMask = (|x| >= 0.5) ? 1 : 0;
> + *      R = SelMask ? sqrt(0.5 - 0.5*|x|) : |x|
> + *      asin(x) = (SelMask ? (Pi/2 - 2*Poly(R)) : Poly(R))*(-1)^sign(x)
> + *
> + *
> + */
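
The reduction is the same as in the SSE4 variant above; what changes is that the SelMask selection is done branch-free with vcmpnlt_uqps and vblendvps. In intrinsics terms that step looks roughly like this sketch (operand names are illustrative):

#include <immintrin.h>

/* Rough intrinsics rendering of the selection done below with
   vcmpnlt_uqps and vblendvps; operand names are illustrative.  */
static __m256
asinf8_select (__m256 x_abs, __m256 half, __m256 poly_path, __m256 sqrt_path)
{
  __m256 selmask = _mm256_cmp_ps (x_abs, half, _CMP_NLT_UQ); /* |x| >= 0.5 */
  return _mm256_blendv_ps (poly_path, sqrt_path, selmask);
}
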
> +
> +/* Offsets for data table __svml_sasin_data_internal
> + */
> +#define AbsMask                       	0
> +#define OneHalf                       	32
> +#define SmallNorm                     	64
> +#define One                           	96
> +#define Two                           	128
> +#define sqrt_coeff                    	160
> +#define poly_coeff                    	224
> +#define Pi2H                          	384
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN8v_asinf_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +        vmovups   __svml_sasin_data_internal(%rip), %ymm5
> +        vmovups   OneHalf+__svml_sasin_data_internal(%rip), %ymm9
> +        vmovups   One+__svml_sasin_data_internal(%rip), %ymm6
> +        vmovaps   %ymm0, %ymm4
> +
> +/* x = |arg| */
> +        vandps    %ymm4, %ymm5, %ymm3
> +
> +/* Y = 0.5 - 0.5*x */
> +        vmovaps   %ymm9, %ymm12
> +        vfnmadd231ps %ymm3, %ymm9, %ymm12
> +
> +/* x^2 */
> +        vmulps    %ymm3, %ymm3, %ymm7
> +        vcmplt_oqps %ymm3, %ymm6, %ymm8
> +
> +/* SQ ~ -2*sqrt(Y) */
> +        vcmplt_oqps SmallNorm+__svml_sasin_data_internal(%rip), %ymm12, %ymm10
> +        vminps    %ymm12, %ymm7, %ymm1
> +        vaddps    %ymm12, %ymm12, %ymm15
> +        vcmpnlt_uqps %ymm9, %ymm3, %ymm0
> +        vrsqrtps  %ymm12, %ymm11
> +        vmovups   poly_coeff+64+__svml_sasin_data_internal(%rip), %ymm7
> +        vmulps    %ymm1, %ymm1, %ymm6
> +        vmovups   sqrt_coeff+__svml_sasin_data_internal(%rip), %ymm9
> +        vfmadd213ps poly_coeff+96+__svml_sasin_data_internal(%rip), %ymm1, %ymm7
> +        vmovmskps %ymm8, %edx
> +
> +/* polynomial */
> +        vmovups   poly_coeff+__svml_sasin_data_internal(%rip), %ymm8
> +        vandnps   %ymm11, %ymm10, %ymm13
> +        vmulps    %ymm13, %ymm13, %ymm14
> +        vfmadd213ps poly_coeff+32+__svml_sasin_data_internal(%rip), %ymm1, %ymm8
> +        vandnps   %ymm4, %ymm5, %ymm2
> +        vmulps    %ymm15, %ymm13, %ymm5
> +        vfmsub213ps Two+__svml_sasin_data_internal(%rip), %ymm14, %ymm15
> +        vfmadd213ps %ymm7, %ymm6, %ymm8
> +        vfmadd213ps sqrt_coeff+32+__svml_sasin_data_internal(%rip), %ymm15, %ymm9
> +        vmulps    %ymm15, %ymm5, %ymm15
> +        vfmadd213ps poly_coeff+128+__svml_sasin_data_internal(%rip), %ymm1, %ymm8
> +        vfmsub213ps %ymm5, %ymm15, %ymm9
> +        vmulps    %ymm8, %ymm1, %ymm1
> +        vblendvps %ymm0, %ymm9, %ymm3, %ymm3
> +        vfmadd213ps %ymm3, %ymm3, %ymm1
> +        vandps    Pi2H+__svml_sasin_data_internal(%rip), %ymm0, %ymm0
> +        vaddps    %ymm1, %ymm0, %ymm10
> +        vxorps    %ymm2, %ymm10, %ymm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm4
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %ymm4, 32(%rsp)
> +        vmovups   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      asinf@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 64(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN8v_asinf_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_sasin_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(32)) VUINT32 AbsMask[8][1];
> +        __declspec(align(32)) VUINT32 OneHalf[8][1];
> +        __declspec(align(32)) VUINT32 SmallNorm[8][1];
> +        __declspec(align(32)) VUINT32 One[8][1];
> +        __declspec(align(32)) VUINT32 Two[8][1];
> +        __declspec(align(32)) VUINT32 sqrt_coeff[2][8][1];
> +        __declspec(align(32)) VUINT32 poly_coeff[5][8][1];
> +        __declspec(align(32)) VUINT32 Pi2H[8][1];
> +} __svml_sasin_data_internal;
> +#endif
> +__svml_sasin_data_internal:
> +        /*== AbsMask ==*/
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> +        /*== OneHalf ==*/
> +        .align 32
> +        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000
> +        /*== SmallNorm ==*/
> +        .align 32
> +        .long 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000
> +        /*== One ==*/
> +        .align 32
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== Two ==*/
> +        .align 32
> +        .long 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000, 0x40000000
> +        /*== sqrt_coeff[2] ==*/
> +        .align 32
> +        .long 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004, 0xbdC00004 /* sqrt_coeff2 */
> +        .long 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001, 0x3e800001 /* sqrt_coeff1 */
> +        /*== poly_coeff[5] ==*/
> +        .align 32
> +        .long 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07, 0x3d2EDC07 /* poly_coeff5 */
> +        .long 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B, 0x3CC32A6B /* poly_coeff4 */
> +        .long 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4, 0x3d3A9AB4 /* poly_coeff3 */
> +        .long 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12, 0x3d997C12 /* poly_coeff2 */
> +        .long 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF, 0x3e2AAAFF /* poly_coeff1 */
> +        /*== Pi2H ==*/
> +        .align 32
> +        .long 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB
> +        .align 32
> +        .type	__svml_sasin_data_internal,@object
> +        .size	__svml_sasin_data_internal,.-__svml_sasin_data_internal
> diff --git a/sysdeps/x86_64/fpu/svml_d_asin2_core.S b/sysdeps/x86_64/fpu/svml_d_asin2_core.S
> new file mode 100644
> index 0000000000..8ff8bc58df
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_asin2_core.S
> @@ -0,0 +1,29 @@
> +/* Function asin vectorized with SSE2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN2v_asin)
> +WRAPPER_IMPL_SSE2 asin
> +END (_ZGVbN2v_asin)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN2v_asin)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_asin4_core.S b/sysdeps/x86_64/fpu/svml_d_asin4_core.S
> new file mode 100644
> index 0000000000..dbe33952bc
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_asin4_core.S
> @@ -0,0 +1,29 @@
> +/* Function asin vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN4v_asin)
> +WRAPPER_IMPL_AVX _ZGVbN2v_asin
> +END (_ZGVdN4v_asin)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN4v_asin)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_asin4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_asin4_core_avx.S
> new file mode 100644
> index 0000000000..513a31bde5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_asin4_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function asin vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVcN4v_asin)
> +WRAPPER_IMPL_AVX _ZGVbN2v_asin
> +END (_ZGVcN4v_asin)
> diff --git a/sysdeps/x86_64/fpu/svml_d_asin8_core.S b/sysdeps/x86_64/fpu/svml_d_asin8_core.S
> new file mode 100644
> index 0000000000..06694298cf
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_asin8_core.S
> @@ -0,0 +1,25 @@
> +/* Function asin vectorized with AVX-512, wrapper to AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN8v_asin)
> +WRAPPER_IMPL_AVX512 _ZGVdN4v_asin
> +END (_ZGVeN8v_asin)
> diff --git a/sysdeps/x86_64/fpu/svml_s_asinf16_core.S b/sysdeps/x86_64/fpu/svml_s_asinf16_core.S
> new file mode 100644
> index 0000000000..015d583e3f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_asinf16_core.S
> @@ -0,0 +1,25 @@
> +/* Function asinf vectorized with AVX-512. Wrapper to AVX2 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN16v_asinf)
> +WRAPPER_IMPL_AVX512 _ZGVdN8v_asinf
> +END (_ZGVeN16v_asinf)
> diff --git a/sysdeps/x86_64/fpu/svml_s_asinf4_core.S b/sysdeps/x86_64/fpu/svml_s_asinf4_core.S
> new file mode 100644
> index 0000000000..d80f06c16d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_asinf4_core.S
> @@ -0,0 +1,29 @@
> +/* Function asinf vectorized with SSE2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN4v_asinf)
> +WRAPPER_IMPL_SSE2 asinf
> +END (_ZGVbN4v_asinf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN4v_asinf)
> +#endif
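
WRAPPER_IMPL_SSE2 asinf is the pre-SSE4.1 fallback; conceptually it applies the scalar asinf lane by lane, along the lines of the sketch below (the real macro does the equivalent with stack spills, this is not its actual expansion):

#include <math.h>
#include <xmmintrin.h>

/* Conceptual equivalent of WRAPPER_IMPL_SSE2 asinf (a sketch, not the
   real macro expansion).  */
__m128
asinf4_scalar_fallback (__m128 x)
{
  float buf[4];
  _mm_storeu_ps (buf, x);
  for (int i = 0; i < 4; i++)
    buf[i] = asinf (buf[i]);
  return _mm_loadu_ps (buf);
}
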
> diff --git a/sysdeps/x86_64/fpu/svml_s_asinf8_core.S b/sysdeps/x86_64/fpu/svml_s_asinf8_core.S
> new file mode 100644
> index 0000000000..304ad0a7f5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_asinf8_core.S
> @@ -0,0 +1,29 @@
> +/* Function asinf vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN8v_asinf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_asinf
> +END (_ZGVdN8v_asinf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN8v_asinf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_asinf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_asinf8_core_avx.S
> new file mode 100644
> index 0000000000..a2f7dc112e
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_asinf8_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function asinf vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +        .text
> +ENTRY (_ZGVcN8v_asinf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_asinf
> +END (_ZGVcN8v_asinf)
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-asin-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-asin-avx.c
> new file mode 100644
> index 0000000000..e37cfdce58
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-asin-avx.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-asin.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-asin-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-asin-avx2.c
> new file mode 100644
> index 0000000000..e37cfdce58
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-asin-avx2.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-asin.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-asin-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-asin-avx512f.c
> new file mode 100644
> index 0000000000..e37cfdce58
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-asin-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-asin.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-asin.c b/sysdeps/x86_64/fpu/test-double-libmvec-asin.c
> new file mode 100644
> index 0000000000..d2e16e67f4
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-asin.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE double
> +#define LIBMVEC_FUNC asin
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> index 467c913990..5746bb5be3 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> @@ -29,6 +29,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVbN2v_exp)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVbN2vv_pow)
>  VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVbN2v_acos)
>  VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVbN2v_atan)
> +VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVbN2v_asin)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> index b72a7de84e..8d3d5493ed 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> @@ -32,6 +32,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVdN4v_exp)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVdN4vv_pow)
>  VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVdN4v_acos)
>  VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVdN4v_atan)
> +VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVdN4v_asin)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m256i
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> index d2434df21e..f43328f2ff 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> @@ -29,6 +29,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVcN4v_exp)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVcN4vv_pow)
>  VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVcN4v_acos)
>  VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVcN4v_atan)
> +VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVcN4v_asin)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> index f7aaf8159e..8b566c199a 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> @@ -29,6 +29,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVeN8v_exp)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVeN8vv_pow)
>  VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVeN8v_acos)
>  VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVeN8v_atan)
> +VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVeN8v_asin)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m512i
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx.c
> new file mode 100644
> index 0000000000..6aa8f5f370
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-asinf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx2.c
> new file mode 100644
> index 0000000000..6aa8f5f370
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx2.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-asinf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx512f.c
> new file mode 100644
> index 0000000000..6aa8f5f370
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-asinf-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-asinf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-asinf.c b/sysdeps/x86_64/fpu/test-float-libmvec-asinf.c
> new file mode 100644
> index 0000000000..2bbe2395a0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-asinf.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE float
> +#define LIBMVEC_FUNC asinf
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> index af769c56fa..3d3218a310 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> @@ -29,6 +29,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVeN16v_expf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVeN16vv_powf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVeN16v_acosf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVeN16v_atanf)
> +VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVeN16v_asinf)
>  
>  #define VEC_INT_TYPE __m512i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> index 76e61d2f1e..7d75b9f60f 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> @@ -29,6 +29,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVbN4v_expf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVbN4vv_powf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVbN4v_acosf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVbN4v_atanf)
> +VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVbN4v_asinf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> index 5e27eaaf29..405dde49bc 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> @@ -32,6 +32,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVdN8v_expf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVdN8vv_powf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVdN8v_acosf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVdN8v_atanf)
> +VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVdN8v_asinf)
>  
>  /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
>  #undef VECTOR_WRAPPER_fFF
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> index 28daf79aa9..7558443f2e 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> @@ -29,6 +29,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVcN8v_expf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVcN8vv_powf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVcN8v_acosf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVcN8v_atanf)
> +VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVcN8v_asinf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> -- 
> 2.31.1
> 

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.


H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 07/18] x86-64: Add vector expm1/expm1f implementation to libmvec
  2021-12-29  6:39 ` [PATCH v5 07/18] x86-64: Add vector expm1/expm1f " Sunil K Pandey
@ 2021-12-29 21:25   ` H.J. Lu
  0 siblings, 0 replies; 40+ messages in thread
From: H.J. Lu @ 2021-12-29 21:25 UTC (permalink / raw)
  To: Sunil K Pandey; +Cc: libc-alpha, hjl.tools, andrey.kolesov, marius.cornea

On Tue, Dec 28, 2021 at 10:39:49PM -0800, Sunil K Pandey wrote:
> Implement vectorized expm1/expm1f containing SSE, AVX, AVX2 and
> AVX512 versions for libmvec as per vector ABI.  It also contains
> accuracy and ABI tests for vector expm1/expm1f with regenerated ulps.
> ---
>  bits/libm-simd-decl-stubs.h                   |  11 +
>  math/bits/mathcalls.h                         |   2 +-
>  .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
>  sysdeps/x86/fpu/bits/math-vector.h            |   4 +
>  .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
>  sysdeps/x86_64/fpu/Makeconfig                 |   1 +
>  sysdeps/x86_64/fpu/Versions                   |   2 +
>  sysdeps/x86_64/fpu/libm-test-ulps             |  20 +
>  .../fpu/multiarch/svml_d_expm12_core-sse2.S   |  20 +
>  .../x86_64/fpu/multiarch/svml_d_expm12_core.c |  27 ++
>  .../fpu/multiarch/svml_d_expm12_core_sse4.S   | 421 ++++++++++++++++++
>  .../fpu/multiarch/svml_d_expm14_core-sse.S    |  20 +
>  .../x86_64/fpu/multiarch/svml_d_expm14_core.c |  27 ++
>  .../fpu/multiarch/svml_d_expm14_core_avx2.S   | 408 +++++++++++++++++
>  .../fpu/multiarch/svml_d_expm18_core-avx2.S   |  20 +
>  .../x86_64/fpu/multiarch/svml_d_expm18_core.c |  27 ++
>  .../fpu/multiarch/svml_d_expm18_core_avx512.S | 334 ++++++++++++++
>  .../fpu/multiarch/svml_s_expm1f16_core-avx2.S |  20 +
>  .../fpu/multiarch/svml_s_expm1f16_core.c      |  28 ++
>  .../multiarch/svml_s_expm1f16_core_avx512.S   | 281 ++++++++++++
>  .../fpu/multiarch/svml_s_expm1f4_core-sse2.S  |  20 +
>  .../fpu/multiarch/svml_s_expm1f4_core.c       |  28 ++
>  .../fpu/multiarch/svml_s_expm1f4_core_sse4.S  | 358 +++++++++++++++
>  .../fpu/multiarch/svml_s_expm1f8_core-sse.S   |  20 +
>  .../fpu/multiarch/svml_s_expm1f8_core.c       |  28 ++
>  .../fpu/multiarch/svml_s_expm1f8_core_avx2.S  | 351 +++++++++++++++
>  sysdeps/x86_64/fpu/svml_d_expm12_core.S       |  29 ++
>  sysdeps/x86_64/fpu/svml_d_expm14_core.S       |  29 ++
>  sysdeps/x86_64/fpu/svml_d_expm14_core_avx.S   |  25 ++
>  sysdeps/x86_64/fpu/svml_d_expm18_core.S       |  25 ++
>  sysdeps/x86_64/fpu/svml_s_expm1f16_core.S     |  25 ++
>  sysdeps/x86_64/fpu/svml_s_expm1f4_core.S      |  29 ++
>  sysdeps/x86_64/fpu/svml_s_expm1f8_core.S      |  29 ++
>  sysdeps/x86_64/fpu/svml_s_expm1f8_core_avx.S  |  25 ++
>  .../fpu/test-double-libmvec-expm1-avx.c       |   1 +
>  .../fpu/test-double-libmvec-expm1-avx2.c      |   1 +
>  .../fpu/test-double-libmvec-expm1-avx512f.c   |   1 +
>  .../x86_64/fpu/test-double-libmvec-expm1.c    |   3 +
>  .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
>  .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
>  .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
>  .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
>  .../fpu/test-float-libmvec-expm1f-avx.c       |   1 +
>  .../fpu/test-float-libmvec-expm1f-avx2.c      |   1 +
>  .../fpu/test-float-libmvec-expm1f-avx512f.c   |   1 +
>  .../x86_64/fpu/test-float-libmvec-expm1f.c    |   3 +
>  .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
>  .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
>  .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
>  .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
>  50 files changed, 2725 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_expm12_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_expm14_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_expm14_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_expm18_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_expm1f16_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_expm1f4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_expm1f8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_expm1f8_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-expm1.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-expm1f.c
> 
> diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
> index 35c6ac57a8..28dc4a82c5 100644
> --- a/bits/libm-simd-decl-stubs.h
> +++ b/bits/libm-simd-decl-stubs.h
> @@ -175,4 +175,15 @@
>  #define __DECL_SIMD_coshf32x
>  #define __DECL_SIMD_coshf64x
>  #define __DECL_SIMD_coshf128x
> +
> +#define __DECL_SIMD_expm1
> +#define __DECL_SIMD_expm1f
> +#define __DECL_SIMD_expm1l
> +#define __DECL_SIMD_expm1f16
> +#define __DECL_SIMD_expm1f32
> +#define __DECL_SIMD_expm1f64
> +#define __DECL_SIMD_expm1f128
> +#define __DECL_SIMD_expm1f32x
> +#define __DECL_SIMD_expm1f64x
> +#define __DECL_SIMD_expm1f128x
>  #endif
> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index 60a314f69e..c57adc8ace 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -116,7 +116,7 @@ __MATHCALL_VEC (exp10,, (_Mdouble_ __x));
>  
>  #if defined __USE_XOPEN_EXTENDED || defined __USE_ISOC99
>  /* Return exp(X) - 1.  */
> -__MATHCALL (expm1,, (_Mdouble_ __x));
> +__MATHCALL_VEC (expm1,, (_Mdouble_ __x));
>  
>  /* Return log(1 + X).  */
>  __MATHCALL (log1p,, (_Mdouble_ __x));
> diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> index 4907680143..c9d3213bd3 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> @@ -52,6 +52,7 @@ GLIBC_2.35 _ZGVbN2v_atan F
>  GLIBC_2.35 _ZGVbN2v_cosh F
>  GLIBC_2.35 _ZGVbN2v_exp10 F
>  GLIBC_2.35 _ZGVbN2v_exp2 F
> +GLIBC_2.35 _ZGVbN2v_expm1 F
>  GLIBC_2.35 _ZGVbN2vv_hypot F
>  GLIBC_2.35 _ZGVbN4v_acosf F
>  GLIBC_2.35 _ZGVbN4v_asinf F
> @@ -59,6 +60,7 @@ GLIBC_2.35 _ZGVbN4v_atanf F
>  GLIBC_2.35 _ZGVbN4v_coshf F
>  GLIBC_2.35 _ZGVbN4v_exp10f F
>  GLIBC_2.35 _ZGVbN4v_exp2f F
> +GLIBC_2.35 _ZGVbN4v_expm1f F
>  GLIBC_2.35 _ZGVbN4vv_hypotf F
>  GLIBC_2.35 _ZGVcN4v_acos F
>  GLIBC_2.35 _ZGVcN4v_asin F
> @@ -66,6 +68,7 @@ GLIBC_2.35 _ZGVcN4v_atan F
>  GLIBC_2.35 _ZGVcN4v_cosh F
>  GLIBC_2.35 _ZGVcN4v_exp10 F
>  GLIBC_2.35 _ZGVcN4v_exp2 F
> +GLIBC_2.35 _ZGVcN4v_expm1 F
>  GLIBC_2.35 _ZGVcN4vv_hypot F
>  GLIBC_2.35 _ZGVcN8v_acosf F
>  GLIBC_2.35 _ZGVcN8v_asinf F
> @@ -73,6 +76,7 @@ GLIBC_2.35 _ZGVcN8v_atanf F
>  GLIBC_2.35 _ZGVcN8v_coshf F
>  GLIBC_2.35 _ZGVcN8v_exp10f F
>  GLIBC_2.35 _ZGVcN8v_exp2f F
> +GLIBC_2.35 _ZGVcN8v_expm1f F
>  GLIBC_2.35 _ZGVcN8vv_hypotf F
>  GLIBC_2.35 _ZGVdN4v_acos F
>  GLIBC_2.35 _ZGVdN4v_asin F
> @@ -80,6 +84,7 @@ GLIBC_2.35 _ZGVdN4v_atan F
>  GLIBC_2.35 _ZGVdN4v_cosh F
>  GLIBC_2.35 _ZGVdN4v_exp10 F
>  GLIBC_2.35 _ZGVdN4v_exp2 F
> +GLIBC_2.35 _ZGVdN4v_expm1 F
>  GLIBC_2.35 _ZGVdN4vv_hypot F
>  GLIBC_2.35 _ZGVdN8v_acosf F
>  GLIBC_2.35 _ZGVdN8v_asinf F
> @@ -87,6 +92,7 @@ GLIBC_2.35 _ZGVdN8v_atanf F
>  GLIBC_2.35 _ZGVdN8v_coshf F
>  GLIBC_2.35 _ZGVdN8v_exp10f F
>  GLIBC_2.35 _ZGVdN8v_exp2f F
> +GLIBC_2.35 _ZGVdN8v_expm1f F
>  GLIBC_2.35 _ZGVdN8vv_hypotf F
>  GLIBC_2.35 _ZGVeN16v_acosf F
>  GLIBC_2.35 _ZGVeN16v_asinf F
> @@ -94,6 +100,7 @@ GLIBC_2.35 _ZGVeN16v_atanf F
>  GLIBC_2.35 _ZGVeN16v_coshf F
>  GLIBC_2.35 _ZGVeN16v_exp10f F
>  GLIBC_2.35 _ZGVeN16v_exp2f F
> +GLIBC_2.35 _ZGVeN16v_expm1f F
>  GLIBC_2.35 _ZGVeN16vv_hypotf F
>  GLIBC_2.35 _ZGVeN8v_acos F
>  GLIBC_2.35 _ZGVeN8v_asin F
> @@ -101,4 +108,5 @@ GLIBC_2.35 _ZGVeN8v_atan F
>  GLIBC_2.35 _ZGVeN8v_cosh F
>  GLIBC_2.35 _ZGVeN8v_exp10 F
>  GLIBC_2.35 _ZGVeN8v_exp2 F
> +GLIBC_2.35 _ZGVeN8v_expm1 F
>  GLIBC_2.35 _ZGVeN8vv_hypot F
> diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
> index 708e81b3d0..e2f98e176f 100644
> --- a/sysdeps/x86/fpu/bits/math-vector.h
> +++ b/sysdeps/x86/fpu/bits/math-vector.h
> @@ -86,6 +86,10 @@
>  #  define __DECL_SIMD_cosh __DECL_SIMD_x86_64
>  #  undef __DECL_SIMD_coshf
>  #  define __DECL_SIMD_coshf __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_expm1
> +#  define __DECL_SIMD_expm1 __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_expm1f
> +#  define __DECL_SIMD_expm1f __DECL_SIMD_x86_64
>  
>  # endif
>  #endif
> diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> index 81d0238ebf..43233059f6 100644
> --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> @@ -42,6 +42,8 @@
>  !GCC$ builtin (exp10f) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (cosh) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (coshf) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (expm1) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (expm1f) attributes simd (notinbranch) if('x86_64')
>  
>  !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
> @@ -69,3 +71,5 @@
>  !GCC$ builtin (exp10f) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cosh) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (coshf) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (expm1) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (expm1f) attributes simd (notinbranch) if('x32')
> diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> index 5bc2df134f..8de8214971 100644
> --- a/sysdeps/x86_64/fpu/Makeconfig
> +++ b/sysdeps/x86_64/fpu/Makeconfig
> @@ -30,6 +30,7 @@ libmvec-funcs = \
>    exp \
>    exp10 \
>    exp2 \
> +  expm1 \
>    hypot \
>    log \
>    pow \
> diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> index 53346d16a2..58debb2dbe 100644
> --- a/sysdeps/x86_64/fpu/Versions
> +++ b/sysdeps/x86_64/fpu/Versions
> @@ -20,6 +20,7 @@ libmvec {
>      _ZGVbN2v_cosh; _ZGVcN4v_cosh; _ZGVdN4v_cosh; _ZGVeN8v_cosh;
>      _ZGVbN2v_exp10; _ZGVcN4v_exp10; _ZGVdN4v_exp10; _ZGVeN8v_exp10;
>      _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
> +    _ZGVbN2v_expm1; _ZGVcN4v_expm1; _ZGVdN4v_expm1; _ZGVeN8v_expm1;
>      _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
>      _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
>      _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
> @@ -27,6 +28,7 @@ libmvec {
>      _ZGVbN4v_coshf; _ZGVcN8v_coshf; _ZGVdN8v_coshf; _ZGVeN16v_coshf;
>      _ZGVbN4v_exp10f; _ZGVcN8v_exp10f; _ZGVdN8v_exp10f; _ZGVeN16v_exp10f;
>      _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
> +    _ZGVbN4v_expm1f; _ZGVcN8v_expm1f; _ZGVdN8v_expm1f; _ZGVeN16v_expm1f;
>      _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf;
>    }
>  }
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index ac70f15208..f05ece8c8a 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -1395,6 +1395,26 @@ float: 1
>  float128: 3
>  ldouble: 4
>  
> +Function: "expm1_vlen16":
> +float: 1
> +
> +Function: "expm1_vlen2":
> +double: 1
> +
> +Function: "expm1_vlen4":
> +double: 1
> +float: 1
> +
> +Function: "expm1_vlen4_avx2":
> +double: 1
> +
> +Function: "expm1_vlen8":
> +double: 1
> +float: 1
> +
> +Function: "expm1_vlen8_avx2":
> +float: 1
> +
>  Function: "gamma":
>  double: 4
>  float: 7
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core-sse2.S
> new file mode 100644
> index 0000000000..e8cb6faaca
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized expm1, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN2v_expm1 _ZGVbN2v_expm1_sse2
> +#include "../svml_d_expm12_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core.c
> new file mode 100644
> index 0000000000..9c794e932e
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized expm1, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN2v_expm1
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN2v_expm1, __GI__ZGVbN2v_expm1, __redirect__ZGVbN2v_expm1)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core_sse4.S
> new file mode 100644
> index 0000000000..db763e3856
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_expm12_core_sse4.S
> @@ -0,0 +1,421 @@
> +/* Function expm1 vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    N = (int)(x*2^k/log(2.0)), R = x - N*log(2)/2^k
> + *    exp(x) = 2^(N/2^k) * poly(R) is computed in high-low parts
> + *    expm1(x) = exp(x)-1 is then obtained via multi-precision computation
> + *
> + *
> + */
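
With k = 0, the lookup table collapsed away and a placeholder polynomial, the reduction above boils down to the scalar sketch below; the constants are the usual high/low split of log(2), and the real code finishes with a multi-precision computation rather than a plain subtraction:

#include <math.h>

/* Simplified scalar sketch of the expm1 reduction: k == 0, no table,
   placeholder polynomial, no multi-precision final step.  */
static double
expm1_reduction_sketch (double x)
{
  const double invln2 = 1.4426950408889634;       /* 1/log(2)          */
  const double l2h = 6.93147180369123816490e-01;  /* log(2), high bits */
  const double l2l = 1.90821492927058770002e-10;  /* log(2), low bits  */
  double n = nearbyint (x * invln2);              /* N                 */
  double r = (x - n * l2h) - n * l2l;             /* R = x - N*log(2)  */
  double p = r + 0.5 * r * r + r * r * r / 6.0;   /* ~ exp(R) - 1      */
  return ldexp (1.0 + p, (int) n) - 1.0;          /* 2^N*exp(R) - 1    */
}
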
> +
> +/* Offsets for data table __svml_dexpm1_data_internal
> + */
> +#define Expm1_HA_table                	0
> +#define poly_coeff                    	2048
> +#define Log2e                         	2112
> +#define L2H                           	2128
> +#define L2L                           	2144
> +#define ExpAddConst                   	2160
> +#define IndexMask                     	2176
> +#define ExpMask                       	2192
> +#define MOne                          	2208
> +#define AbsMask                       	2224
> +#define Threshold                     	2240
> +#define L2                            	2256
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN2v_expm1_sse4)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $64, %rsp
> +        movaps    %xmm0, %xmm2
> +        movups    Log2e+__svml_dexpm1_data_internal(%rip), %xmm7
> +        lea       __svml_dexpm1_data_internal(%rip), %rsi
> +        mulpd     %xmm0, %xmm7
> +        movups    .FLT_10(%rip), %xmm3
> +        addpd     %xmm3, %xmm7
> +        subpd     %xmm3, %xmm7
> +
> +/* argument reduction */
> +        movups    L2H+__svml_dexpm1_data_internal(%rip), %xmm4
> +        mulpd     %xmm7, %xmm4
> +        movups    L2L+__svml_dexpm1_data_internal(%rip), %xmm5
> +        mulpd     %xmm7, %xmm5
> +        subpd     %xmm4, %xmm2
> +        subpd     %xmm5, %xmm2
> +
> +/* polynomial */
> +        movups    poly_coeff+__svml_dexpm1_data_internal(%rip), %xmm12
> +        movaps    %xmm2, %xmm14
> +        mulpd     %xmm2, %xmm12
> +        mulpd     %xmm2, %xmm14
> +        addpd     poly_coeff+16+__svml_dexpm1_data_internal(%rip), %xmm12
> +        movups    ExpAddConst+__svml_dexpm1_data_internal(%rip), %xmm15
> +        addpd     %xmm7, %xmm15
> +        mulpd     %xmm14, %xmm12
> +        movups    poly_coeff+32+__svml_dexpm1_data_internal(%rip), %xmm13
> +        mulpd     %xmm2, %xmm13
> +
> +/* table lookup */
> +        movdqu    IndexMask+__svml_dexpm1_data_internal(%rip), %xmm8
> +        pand      %xmm15, %xmm8
> +        movups    AbsMask+__svml_dexpm1_data_internal(%rip), %xmm1
> +        pshufd    $2, %xmm8, %xmm9
> +        movaps    %xmm1, %xmm6
> +        movd      %xmm8, %eax
> +        andps     %xmm0, %xmm6
> +        movd      %xmm9, %ecx
> +        andnps    %xmm0, %xmm1
> +        movdqu    ExpMask+__svml_dexpm1_data_internal(%rip), %xmm11
> +        pand      %xmm11, %xmm15
> +        cmpnlepd  Threshold+__svml_dexpm1_data_internal(%rip), %xmm6
> +        addpd     poly_coeff+48+__svml_dexpm1_data_internal(%rip), %xmm13
> +        movmskpd  %xmm6, %edx
> +        psllq     $41, %xmm15
> +
> +/* T-1 */
> +        movups    MOne+__svml_dexpm1_data_internal(%rip), %xmm4
> +        movslq    %eax, %rax
> +        movslq    %ecx, %rcx
> +        addpd     %xmm12, %xmm13
> +        movups    (%rsi,%rax), %xmm3
> +        movups    (%rsi,%rcx), %xmm10
> +        movaps    %xmm3, %xmm6
> +        unpckhpd  %xmm10, %xmm3
> +
> +/* Th1 = (Th-1) + Tl */
> +        mulpd     %xmm15, %xmm3
> +        mulpd     %xmm13, %xmm14
> +        unpcklpd  %xmm10, %xmm6
> +        orps      %xmm15, %xmm6
> +        addpd     %xmm4, %xmm6
> +        addpd     %xmm14, %xmm2
> +        addpd     %xmm3, %xmm6
> +
> +/* T = Th+Tl */
> +        movaps    %xmm6, %xmm5
> +        subpd     %xmm4, %xmm5
> +        mulpd     %xmm5, %xmm2
> +        addpd     %xmm2, %xmm6
> +        orps      %xmm1, %xmm6
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx xmm0 xmm6
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movaps    %xmm6, %xmm0
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm0, 32(%rsp)
> +        movups    %xmm6, 48(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $2, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm6
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 xmm6
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      expm1@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 48(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVbN2v_expm1_sse4)
> +
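The special-value path just above is roughly equivalent to the following C
sketch: lanes whose bit is set in the range mask (inputs beyond Threshold)
are recomputed one at a time with the scalar expm1, the others keep the
vector result.  The function and parameter names are made up for the sketch;
only expm1 itself is a real libm call.  The AVX2 and AVX-512 kernels below
use the same pattern with vlen 4 and 8.

#include <math.h>

static void
expm1_special_fallback (const double *vx, double *vr, int vlen,
                        unsigned int range_mask)
{
  for (int lane = 0; lane < vlen; lane++)   /* vlen is 2 for this kernel */
    if (range_mask & (1u << lane))          /* the btl %r12d, %r13d test */
      vr[lane] = expm1 (vx[lane]);          /* call expm1@PLT */
}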
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_dexpm1_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(16)) VUINT32 Expm1_HA_table[(1<<8)][2];
> +        __declspec(align(16)) VUINT32 poly_coeff[4][2][2];
> +        __declspec(align(16)) VUINT32 Log2e[2][2];
> +        __declspec(align(16)) VUINT32 L2H[2][2];
> +        __declspec(align(16)) VUINT32 L2L[2][2];
> +        __declspec(align(16)) VUINT32 ExpAddConst[2][2];
> +        __declspec(align(16)) VUINT32 IndexMask[2][2];
> +        __declspec(align(16)) VUINT32 ExpMask[2][2];
> +        __declspec(align(16)) VUINT32 MOne[2][2];
> +        __declspec(align(16)) VUINT32 AbsMask[2][2];
> +        __declspec(align(16)) VUINT32 Threshold[2][2];
> +        __declspec(align(16)) VUINT32 L2[2][2];
> +} __svml_dexpm1_data_internal;
> +#endif
> +__svml_dexpm1_data_internal:
> +        /* Expm1_HA_table */
> +        .quad 0x0000000000000000, 0x0000000000000000
> +        .quad 0x0000163da8000000, 0x3e3fb33356d84a67
> +        .quad 0x00002c9a40000000, 0xbe3887f9f1190835
> +        .quad 0x00004315e8000000, 0x3e1b9fe12f5ce3e7
> +        .quad 0x000059b0d0000000, 0x3e48ac2ba1d73e2a
> +        .quad 0x0000706b28000000, 0x3e3ddf6ddc6dc404
> +        .quad 0x0000874518000000, 0x3e1d66f20230d7c9
> +        .quad 0x00009e3ec8000000, 0x3e46379c1a290f03
> +        .quad 0x0000b55870000000, 0xbe4833b784eb3a37
> +        .quad 0x0000cc9228000000, 0x3e4b923fba03db83
> +        .quad 0x0000e3ec30000000, 0x3e469e8d10103a17
> +        .quad 0x0000fb66b0000000, 0xbdb2ce50dcdf6e22
> +        .quad 0x00011301d0000000, 0x3df25b50a4ebbf1b
> +        .quad 0x00012abdc0000000, 0x3e1b0c72fee4aeb5
> +        .quad 0x0001429ab0000000, 0xbe356d2204cbefe7
> +        .quad 0x00015a98c8000000, 0x3e24b1ca24901aae
> +        .quad 0x000172b840000000, 0xbe4c15742919041c
> +        .quad 0x00018af938000000, 0x3e2191bd3777ee17
> +        .quad 0x0001a35be8000000, 0x3e4b7e5ba9e5b4c8
> +        .quad 0x0001bbe088000000, 0xbe4fdd19632a70c7
> +        .quad 0x0001d48730000000, 0x3e368b9aa7805b80
> +        .quad 0x0001ed5020000000, 0x3e47e6c8e5c40d00
> +        .quad 0x0002063b88000000, 0x3e18a3358ee3bac1
> +        .quad 0x00021f4990000000, 0x3e37ddc962552fd3
> +        .quad 0x0002387a70000000, 0xbe38a9dc7993e052
> +        .quad 0x000251ce50000000, 0xbe135670329f5521
> +        .quad 0x00026b4568000000, 0xbe40ec1916d42cc6
> +        .quad 0x000284dfe0000000, 0x3e3f5638096cf15d
> +        .quad 0x00029e9df8000000, 0xbe470108f69ed175
> +        .quad 0x0002b87fd0000000, 0x3e2b5b31ffbbd48d
> +        .quad 0x0002d285a8000000, 0xbe31bfcf4bff6e2b
> +        .quad 0x0002ecafa8000000, 0x3e33e2f5611ca0f4
> +        .quad 0x000306fe08000000, 0x3e418db8a96f46ad
> +        .quad 0x0003217100000000, 0xbe4d993e76563187
> +        .quad 0x00033c08b0000000, 0x3e4320b7fa64e431
> +        .quad 0x000356c560000000, 0xbe1b5803cdae772e
> +        .quad 0x000371a738000000, 0xbe28aac6ab1d7560
> +        .quad 0x00038cae70000000, 0xbe47d13cd3d2b1a8
> +        .quad 0x0003a7db38000000, 0xbe48d30048af21b7
> +        .quad 0x0003c32dc0000000, 0x3e489d47242000f9
> +        .quad 0x0003dea650000000, 0xbe4f6e5eee525f6f
> +        .quad 0x0003fa4508000000, 0xbe4a9bff22fa047f
> +        .quad 0x0004160a20000000, 0x3e3f72e29f84325c
> +        .quad 0x000431f5d8000000, 0x3e350a896dc70444
> +        .quad 0x00044e0860000000, 0x3e18624b40c4dbd0
> +        .quad 0x00046a41f0000000, 0xbe4717fd446d7686
> +        .quad 0x000486a2b8000000, 0xbe41f6197f61f2e2
> +        .quad 0x0004a32af0000000, 0x3e2afa7bcce5b17a
> +        .quad 0x0004bfdad8000000, 0xbe464eaec715e343
> +        .quad 0x0004dcb298000000, 0x3e3fddd0d63b36ef
> +        .quad 0x0004f9b278000000, 0xbe362d35952cc275
> +        .quad 0x000516daa0000000, 0x3e467b320e0897a9
> +        .quad 0x0005342b58000000, 0xbe362b07e20f57c4
> +        .quad 0x000551a4c8000000, 0x3e42ec9076297631
> +        .quad 0x00056f4738000000, 0xbe34ad8259913500
> +        .quad 0x00058d12d8000000, 0xbe4b41c016d6a1ea
> +        .quad 0x0005ab07e0000000, 0xbe45bd5eb539b67f
> +        .quad 0x0005c92688000000, 0x3e42ca35b80e258e
> +        .quad 0x0005e76f18000000, 0xbe4296f5bc8b20da
> +        .quad 0x000605e1b8000000, 0x3e376dc08b076f59
> +        .quad 0x0006247eb0000000, 0x3e0d2ac258f87d03
> +        .quad 0x0006434638000000, 0xbe4999e701c483c7
> +        .quad 0x0006623880000000, 0x3e42a91124893ecf
> +        .quad 0x00068155d8000000, 0xbe4d9ab467bf1d47
> +        .quad 0x0006a09e68000000, 0xbe380c4336f74d05
> +        .quad 0x0006c01278000000, 0xbe47a12a08944ab3
> +        .quad 0x0006dfb240000000, 0xbe4cd72e886ef8ea
> +        .quad 0x0006ff7df8000000, 0x3e3519483cf87e1b
> +        .quad 0x00071f75e8000000, 0x3e2d8bee7ba46e1e
> +        .quad 0x00073f9a48000000, 0x3e24b02e77ab934a
> +        .quad 0x00075feb58000000, 0xbe3bd98374091656
> +        .quad 0x0007806950000000, 0xbe00d1604f328fec
> +        .quad 0x0007a11470000000, 0x3e4f580c36bea881
> +        .quad 0x0007c1ed00000000, 0x3e330c1327c49334
> +        .quad 0x0007e2f338000000, 0xbe330b19defa2fd4
> +        .quad 0x0008042758000000, 0xbe4e0f2f724f90cc
> +        .quad 0x0008258998000000, 0x3e34cce128acf88b
> +        .quad 0x0008471a48000000, 0xbe3dc385331ad094
> +        .quad 0x000868d998000000, 0x3e4a2497640720ed
> +        .quad 0x00088ac7d8000000, 0x3e38a669966530bd
> +        .quad 0x0008ace540000000, 0x3e415506dadd3e2b
> +        .quad 0x0008cf3218000000, 0xbe34abb7410d55e3
> +        .quad 0x0008f1ae98000000, 0x3e31577362b98274
> +        .quad 0x0009145b08000000, 0x3e4c8ffe2c4530da
> +        .quad 0x00093737b0000000, 0x3e29b8bc9e8a0388
> +        .quad 0x00095a44c8000000, 0x3e4e4290774da41b
> +        .quad 0x00097d82a0000000, 0xbe00d8d83a30b6f8
> +        .quad 0x0009a0f170000000, 0x3e2940f737462137
> +        .quad 0x0009c49180000000, 0x3e451f8480e3e236
> +        .quad 0x0009e86318000000, 0x3e3e323231824ca8
> +        .quad 0x000a0c6678000000, 0x3e4aef2b2594d6d4
> +        .quad 0x000a309bf0000000, 0xbe4dae966539f470
> +        .quad 0x000a5503b0000000, 0x3e41f12ae45a1225
> +        .quad 0x000a799e10000000, 0x3e49859ac3796fd9
> +        .quad 0x000a9e6b58000000, 0xbe44301205e0a6de
> +        .quad 0x000ac36bc0000000, 0xbe0606431f9234cb
> +        .quad 0x000ae89f98000000, 0x3e35ad3ad5e8734d
> +        .quad 0x000b0e0728000000, 0x3e38db66590842ad
> +        .quad 0x000b33a2b8000000, 0x3e13c57ebdaff43a
> +        .quad 0x000b597290000000, 0xbe40d536338e3bf7
> +        .quad 0x000b7f76f0000000, 0x3e47daf237553d84
> +        .quad 0x000ba5b030000000, 0x3e2420c930819679
> +        .quad 0x000bcc1e90000000, 0x3e12f074891ee83d
> +        .quad 0x000bf2c258000000, 0x3e4eb8f0442046b8
> +        .quad 0x000c199be0000000, 0xbe43d56b1eeef9a7
> +        .quad 0x000c40ab60000000, 0xbd87c2c975903ef8
> +        .quad 0x000c67f130000000, 0xbe3a82eb4b5dec80
> +        .quad 0x000c8f6d98000000, 0xbe4fc8c257729a1e
> +        .quad 0x000cb720e0000000, 0xbe48837cb757e1a1
> +        .quad 0x000cdf0b58000000, 0xbe4511e031dd83b5
> +        .quad 0x000d072d48000000, 0x3e403c4bdc687918
> +        .quad 0x000d2f8708000000, 0x3deb13e315bc2473
> +        .quad 0x000d5818e0000000, 0xbe4822dbc6d12fd3
> +        .quad 0x000d80e318000000, 0xbe3367c68447b063
> +        .quad 0x000da9e600000000, 0x3e4ed9942b84600d
> +        .quad 0x000dd321f0000000, 0x3e480da3025b4aef
> +        .quad 0x000dfc9730000000, 0x3e4bdcdaf5cb4656
> +        .quad 0x000e264618000000, 0xbe4852f6baf6c4f0
> +        .quad 0x000e502ee8000000, 0xbe1d30027630bb40
> +        .quad 0x000e7a51f8000000, 0x3e4e3a641a5aa459
> +        .quad 0x000ea4afa0000000, 0x3e452486cc2c7b9d
> +        .quad 0x000ecf4830000000, 0xbe438cc07b927e77
> +        .quad 0x000efa1bf0000000, 0xbe39ea5d888e02de
> +        .quad 0x000f252b38000000, 0xbe2288ad162f2d20
> +        .quad 0x000f507658000000, 0x3e4b722a033a7c26
> +        .quad 0x000f7bfdb0000000, 0xbe431a0f63b7625a
> +        .quad 0x000fa7c180000000, 0x3e39e90d82e90a7e
> +        .quad 0x000fd3c228000000, 0x3e4c7b8f884badd2
> +        /*== poly_coeff[4] ==*/
> +        .align 16
> +        .quad 0x3f81111168877F38, 0x3f81111168877F38 /* coeff5 */
> +        .quad 0x3fa55555C2A9C0F3, 0x3fa55555C2A9C0F3 /* coeff4 */
> +        .quad 0x3fc555555555541D, 0x3fc555555555541D /* coeff3 */
> +        .quad 0x3fdFFFFFFFFFFE5C, 0x3fdFFFFFFFFFFE5C /* coeff2 */
> +        /*== Log2e ==*/
> +        .align 16
> +        .quad 0x40671547652B82FE, 0x40671547652B82FE
> +        /*== L2H ==*/
> +        .align 16
> +        .quad 0x3f762e42fef80000, 0x3f762e42fef80000
> +        /*== L2L ==*/
> +        .align 16
> +        .quad 0x3d41cf79abc9e3b4, 0x3d41cf79abc9e3b4
> +        /*== ExpAddConst ==*/
> +        .align 16
> +        .quad 0x42f80000001ff800, 0x42f80000001ff800
> +        /*== IndexMask ==*/
> +        .align 16
> +        .quad 0x00000000000007f0, 0x00000000000007f0
> +        /*== ExpMask ==*/
> +        .align 16
> +        .quad 0x00000000003ff800, 0x00000000003ff800
> +        /*== MOne ==*/
> +        .align 16
> +        .quad 0xbff0000000000000, 0xbff0000000000000
> +        /*== AbsMask ==*/
> +        .align 16
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff
> +        /*== Threshold ==*/
> +        .align 16
> +        .quad 0x40861DA04CBAFE43, 0x40861DA04CBAFE43
> +        /*== L2 ==*/
> +        .align 16
> +        .quad 0x3f762e42fefa39ef, 0x3f762e42fefa39ef
> +        .align 16
> +        .type	__svml_dexpm1_data_internal,@object
> +        .size	__svml_dexpm1_data_internal,.-__svml_dexpm1_data_internal
> +        .align 16
> +
> +.FLT_10:
> +        .long	0x00000000,0x43380000,0x00000000,0x43380000
> +        .type	.FLT_10,@object
> +        .size	.FLT_10,16
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core-sse.S
> new file mode 100644
> index 0000000000..e7016708d0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized expm1, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN4v_expm1 _ZGVdN4v_expm1_sse_wrapper
> +#include "../svml_d_expm14_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core.c
> new file mode 100644
> index 0000000000..4215d7dbaf
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized expm1, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN4v_expm1
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN4v_expm1, __GI__ZGVdN4v_expm1, __redirect__ZGVdN4v_expm1)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core_avx2.S
> new file mode 100644
> index 0000000000..c34f73a578
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_expm14_core_avx2.S
> @@ -0,0 +1,408 @@
> +/* Function expm1 vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    N = (int)(x*2^k/log(2.0)), R = x - N*log(2)/2^k
> + *    exp(x) = 2^(N/2^k) * poly(R) is computed in high-low parts
> + *    expm1(x) = exp(x)-1 is then obtained via multi-precision computation
> + *
> + *
> + */
> +
> +/* Offsets for data table __svml_dexpm1_data_internal
> + */
> +#define Expm1_HA_table                	0
> +#define poly_coeff                    	2048
> +#define Log2e                         	2176
> +#define L2H                           	2208
> +#define L2L                           	2240
> +#define ExpAddConst                   	2272
> +#define IndexMask                     	2304
> +#define ExpMask                       	2336
> +#define MOne                          	2368
> +#define AbsMask                       	2400
> +#define Threshold                     	2432
> +#define L2                            	2464
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN4v_expm1_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +        lea       __svml_dexpm1_data_internal(%rip), %r8
> +        vmovapd   %ymm0, %ymm3
> +        vmulpd    Log2e+__svml_dexpm1_data_internal(%rip), %ymm3, %ymm4
> +
> +/* argument reduction */
> +        vmovupd   L2H+__svml_dexpm1_data_internal(%rip), %ymm2
> +        vmovupd   AbsMask+__svml_dexpm1_data_internal(%rip), %ymm5
> +        vroundpd  $0, %ymm4, %ymm8
> +        vaddpd    ExpAddConst+__svml_dexpm1_data_internal(%rip), %ymm8, %ymm0
> +        vfnmadd213pd %ymm3, %ymm8, %ymm2
> +
> +/* table lookup */
> +        vandps    IndexMask+__svml_dexpm1_data_internal(%rip), %ymm0, %ymm9
> +        vandpd    %ymm5, %ymm3, %ymm6
> +        vcmpnle_uqpd Threshold+__svml_dexpm1_data_internal(%rip), %ymm6, %ymm7
> +        vfnmadd231pd L2L+__svml_dexpm1_data_internal(%rip), %ymm8, %ymm2
> +        vandnpd   %ymm3, %ymm5, %ymm1
> +        vmovmskpd %ymm7, %eax
> +        vmovupd   poly_coeff+64+__svml_dexpm1_data_internal(%rip), %ymm7
> +        vmulpd    %ymm2, %ymm2, %ymm8
> +        vfmadd213pd poly_coeff+96+__svml_dexpm1_data_internal(%rip), %ymm2, %ymm7
> +        vandps    ExpMask+__svml_dexpm1_data_internal(%rip), %ymm0, %ymm0
> +        vextractf128 $1, %ymm9, %xmm10
> +        vmovd     %xmm9, %edx
> +        vmovd     %xmm10, %esi
> +        vpextrd   $2, %xmm9, %ecx
> +        vpextrd   $2, %xmm10, %edi
> +        movslq    %edx, %rdx
> +        movslq    %ecx, %rcx
> +        movslq    %esi, %rsi
> +        movslq    %edi, %rdi
> +        vmovupd   (%r8,%rdx), %xmm13
> +        vmovupd   (%r8,%rcx), %xmm14
> +        vmovupd   (%r8,%rsi), %xmm4
> +        vmovupd   (%r8,%rdi), %xmm5
> +        vunpcklpd %xmm14, %xmm13, %xmm11
> +        vunpcklpd %xmm5, %xmm4, %xmm12
> +        vpsllq    $41, %ymm0, %ymm10
> +        vunpckhpd %xmm14, %xmm13, %xmm15
> +        vunpckhpd %xmm5, %xmm4, %xmm13
> +        vinsertf128 $1, %xmm12, %ymm11, %ymm6
> +
> +/* polynomial */
> +        vmovupd   poly_coeff+__svml_dexpm1_data_internal(%rip), %ymm12
> +
> +/* T-1 */
> +        vmovupd   MOne+__svml_dexpm1_data_internal(%rip), %ymm11
> +        vfmadd213pd poly_coeff+32+__svml_dexpm1_data_internal(%rip), %ymm2, %ymm12
> +        vfmadd213pd %ymm7, %ymm8, %ymm12
> +        vorpd     %ymm10, %ymm6, %ymm9
> +        vfmadd213pd %ymm2, %ymm8, %ymm12
> +        vaddpd    %ymm11, %ymm9, %ymm2
> +        vinsertf128 $1, %xmm13, %ymm15, %ymm14
> +
> +/* Th1 = (Th-1) + Tl */
> +        vfmadd213pd %ymm2, %ymm10, %ymm14
> +
> +/* T = Th+Tl */
> +        vsubpd    %ymm11, %ymm14, %ymm0
> +        vfmadd213pd %ymm14, %ymm12, %ymm0
> +        vorpd     %ymm1, %ymm0, %ymm0
> +        testl     %eax, %eax
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm3
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovupd   %ymm3, 32(%rsp)
> +        vmovupd   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0
> +
> +        xorl      %edx, %edx
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovupd   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      expm1@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 64(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN4v_expm1_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_dexpm1_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(32)) VUINT32 Expm1_HA_table[(1<<8)][2];
> +        __declspec(align(32)) VUINT32 poly_coeff[4][4][2];
> +        __declspec(align(32)) VUINT32 Log2e[4][2];
> +        __declspec(align(32)) VUINT32 L2H[4][2];
> +        __declspec(align(32)) VUINT32 L2L[4][2];
> +        __declspec(align(32)) VUINT32 ExpAddConst[4][2];
> +        __declspec(align(32)) VUINT32 IndexMask[4][2];
> +        __declspec(align(32)) VUINT32 ExpMask[4][2];
> +        __declspec(align(32)) VUINT32 MOne[4][2];
> +        __declspec(align(32)) VUINT32 AbsMask[4][2];
> +        __declspec(align(32)) VUINT32 Threshold[4][2];
> +        __declspec(align(32)) VUINT32 L2[4][2];
> +} __svml_dexpm1_data_internal;
> +#endif
> +__svml_dexpm1_data_internal:
> +        /* Expm1_HA_table */
> +        .quad 0x0000000000000000, 0x0000000000000000
> +        .quad 0x0000163da8000000, 0x3e3fb33356d84a67
> +        .quad 0x00002c9a40000000, 0xbe3887f9f1190835
> +        .quad 0x00004315e8000000, 0x3e1b9fe12f5ce3e7
> +        .quad 0x000059b0d0000000, 0x3e48ac2ba1d73e2a
> +        .quad 0x0000706b28000000, 0x3e3ddf6ddc6dc404
> +        .quad 0x0000874518000000, 0x3e1d66f20230d7c9
> +        .quad 0x00009e3ec8000000, 0x3e46379c1a290f03
> +        .quad 0x0000b55870000000, 0xbe4833b784eb3a37
> +        .quad 0x0000cc9228000000, 0x3e4b923fba03db83
> +        .quad 0x0000e3ec30000000, 0x3e469e8d10103a17
> +        .quad 0x0000fb66b0000000, 0xbdb2ce50dcdf6e22
> +        .quad 0x00011301d0000000, 0x3df25b50a4ebbf1b
> +        .quad 0x00012abdc0000000, 0x3e1b0c72fee4aeb5
> +        .quad 0x0001429ab0000000, 0xbe356d2204cbefe7
> +        .quad 0x00015a98c8000000, 0x3e24b1ca24901aae
> +        .quad 0x000172b840000000, 0xbe4c15742919041c
> +        .quad 0x00018af938000000, 0x3e2191bd3777ee17
> +        .quad 0x0001a35be8000000, 0x3e4b7e5ba9e5b4c8
> +        .quad 0x0001bbe088000000, 0xbe4fdd19632a70c7
> +        .quad 0x0001d48730000000, 0x3e368b9aa7805b80
> +        .quad 0x0001ed5020000000, 0x3e47e6c8e5c40d00
> +        .quad 0x0002063b88000000, 0x3e18a3358ee3bac1
> +        .quad 0x00021f4990000000, 0x3e37ddc962552fd3
> +        .quad 0x0002387a70000000, 0xbe38a9dc7993e052
> +        .quad 0x000251ce50000000, 0xbe135670329f5521
> +        .quad 0x00026b4568000000, 0xbe40ec1916d42cc6
> +        .quad 0x000284dfe0000000, 0x3e3f5638096cf15d
> +        .quad 0x00029e9df8000000, 0xbe470108f69ed175
> +        .quad 0x0002b87fd0000000, 0x3e2b5b31ffbbd48d
> +        .quad 0x0002d285a8000000, 0xbe31bfcf4bff6e2b
> +        .quad 0x0002ecafa8000000, 0x3e33e2f5611ca0f4
> +        .quad 0x000306fe08000000, 0x3e418db8a96f46ad
> +        .quad 0x0003217100000000, 0xbe4d993e76563187
> +        .quad 0x00033c08b0000000, 0x3e4320b7fa64e431
> +        .quad 0x000356c560000000, 0xbe1b5803cdae772e
> +        .quad 0x000371a738000000, 0xbe28aac6ab1d7560
> +        .quad 0x00038cae70000000, 0xbe47d13cd3d2b1a8
> +        .quad 0x0003a7db38000000, 0xbe48d30048af21b7
> +        .quad 0x0003c32dc0000000, 0x3e489d47242000f9
> +        .quad 0x0003dea650000000, 0xbe4f6e5eee525f6f
> +        .quad 0x0003fa4508000000, 0xbe4a9bff22fa047f
> +        .quad 0x0004160a20000000, 0x3e3f72e29f84325c
> +        .quad 0x000431f5d8000000, 0x3e350a896dc70444
> +        .quad 0x00044e0860000000, 0x3e18624b40c4dbd0
> +        .quad 0x00046a41f0000000, 0xbe4717fd446d7686
> +        .quad 0x000486a2b8000000, 0xbe41f6197f61f2e2
> +        .quad 0x0004a32af0000000, 0x3e2afa7bcce5b17a
> +        .quad 0x0004bfdad8000000, 0xbe464eaec715e343
> +        .quad 0x0004dcb298000000, 0x3e3fddd0d63b36ef
> +        .quad 0x0004f9b278000000, 0xbe362d35952cc275
> +        .quad 0x000516daa0000000, 0x3e467b320e0897a9
> +        .quad 0x0005342b58000000, 0xbe362b07e20f57c4
> +        .quad 0x000551a4c8000000, 0x3e42ec9076297631
> +        .quad 0x00056f4738000000, 0xbe34ad8259913500
> +        .quad 0x00058d12d8000000, 0xbe4b41c016d6a1ea
> +        .quad 0x0005ab07e0000000, 0xbe45bd5eb539b67f
> +        .quad 0x0005c92688000000, 0x3e42ca35b80e258e
> +        .quad 0x0005e76f18000000, 0xbe4296f5bc8b20da
> +        .quad 0x000605e1b8000000, 0x3e376dc08b076f59
> +        .quad 0x0006247eb0000000, 0x3e0d2ac258f87d03
> +        .quad 0x0006434638000000, 0xbe4999e701c483c7
> +        .quad 0x0006623880000000, 0x3e42a91124893ecf
> +        .quad 0x00068155d8000000, 0xbe4d9ab467bf1d47
> +        .quad 0x0006a09e68000000, 0xbe380c4336f74d05
> +        .quad 0x0006c01278000000, 0xbe47a12a08944ab3
> +        .quad 0x0006dfb240000000, 0xbe4cd72e886ef8ea
> +        .quad 0x0006ff7df8000000, 0x3e3519483cf87e1b
> +        .quad 0x00071f75e8000000, 0x3e2d8bee7ba46e1e
> +        .quad 0x00073f9a48000000, 0x3e24b02e77ab934a
> +        .quad 0x00075feb58000000, 0xbe3bd98374091656
> +        .quad 0x0007806950000000, 0xbe00d1604f328fec
> +        .quad 0x0007a11470000000, 0x3e4f580c36bea881
> +        .quad 0x0007c1ed00000000, 0x3e330c1327c49334
> +        .quad 0x0007e2f338000000, 0xbe330b19defa2fd4
> +        .quad 0x0008042758000000, 0xbe4e0f2f724f90cc
> +        .quad 0x0008258998000000, 0x3e34cce128acf88b
> +        .quad 0x0008471a48000000, 0xbe3dc385331ad094
> +        .quad 0x000868d998000000, 0x3e4a2497640720ed
> +        .quad 0x00088ac7d8000000, 0x3e38a669966530bd
> +        .quad 0x0008ace540000000, 0x3e415506dadd3e2b
> +        .quad 0x0008cf3218000000, 0xbe34abb7410d55e3
> +        .quad 0x0008f1ae98000000, 0x3e31577362b98274
> +        .quad 0x0009145b08000000, 0x3e4c8ffe2c4530da
> +        .quad 0x00093737b0000000, 0x3e29b8bc9e8a0388
> +        .quad 0x00095a44c8000000, 0x3e4e4290774da41b
> +        .quad 0x00097d82a0000000, 0xbe00d8d83a30b6f8
> +        .quad 0x0009a0f170000000, 0x3e2940f737462137
> +        .quad 0x0009c49180000000, 0x3e451f8480e3e236
> +        .quad 0x0009e86318000000, 0x3e3e323231824ca8
> +        .quad 0x000a0c6678000000, 0x3e4aef2b2594d6d4
> +        .quad 0x000a309bf0000000, 0xbe4dae966539f470
> +        .quad 0x000a5503b0000000, 0x3e41f12ae45a1225
> +        .quad 0x000a799e10000000, 0x3e49859ac3796fd9
> +        .quad 0x000a9e6b58000000, 0xbe44301205e0a6de
> +        .quad 0x000ac36bc0000000, 0xbe0606431f9234cb
> +        .quad 0x000ae89f98000000, 0x3e35ad3ad5e8734d
> +        .quad 0x000b0e0728000000, 0x3e38db66590842ad
> +        .quad 0x000b33a2b8000000, 0x3e13c57ebdaff43a
> +        .quad 0x000b597290000000, 0xbe40d536338e3bf7
> +        .quad 0x000b7f76f0000000, 0x3e47daf237553d84
> +        .quad 0x000ba5b030000000, 0x3e2420c930819679
> +        .quad 0x000bcc1e90000000, 0x3e12f074891ee83d
> +        .quad 0x000bf2c258000000, 0x3e4eb8f0442046b8
> +        .quad 0x000c199be0000000, 0xbe43d56b1eeef9a7
> +        .quad 0x000c40ab60000000, 0xbd87c2c975903ef8
> +        .quad 0x000c67f130000000, 0xbe3a82eb4b5dec80
> +        .quad 0x000c8f6d98000000, 0xbe4fc8c257729a1e
> +        .quad 0x000cb720e0000000, 0xbe48837cb757e1a1
> +        .quad 0x000cdf0b58000000, 0xbe4511e031dd83b5
> +        .quad 0x000d072d48000000, 0x3e403c4bdc687918
> +        .quad 0x000d2f8708000000, 0x3deb13e315bc2473
> +        .quad 0x000d5818e0000000, 0xbe4822dbc6d12fd3
> +        .quad 0x000d80e318000000, 0xbe3367c68447b063
> +        .quad 0x000da9e600000000, 0x3e4ed9942b84600d
> +        .quad 0x000dd321f0000000, 0x3e480da3025b4aef
> +        .quad 0x000dfc9730000000, 0x3e4bdcdaf5cb4656
> +        .quad 0x000e264618000000, 0xbe4852f6baf6c4f0
> +        .quad 0x000e502ee8000000, 0xbe1d30027630bb40
> +        .quad 0x000e7a51f8000000, 0x3e4e3a641a5aa459
> +        .quad 0x000ea4afa0000000, 0x3e452486cc2c7b9d
> +        .quad 0x000ecf4830000000, 0xbe438cc07b927e77
> +        .quad 0x000efa1bf0000000, 0xbe39ea5d888e02de
> +        .quad 0x000f252b38000000, 0xbe2288ad162f2d20
> +        .quad 0x000f507658000000, 0x3e4b722a033a7c26
> +        .quad 0x000f7bfdb0000000, 0xbe431a0f63b7625a
> +        .quad 0x000fa7c180000000, 0x3e39e90d82e90a7e
> +        .quad 0x000fd3c228000000, 0x3e4c7b8f884badd2
> +        /*== poly_coeff[4] ==*/
> +        .align 32
> +        .quad 0x3f81111168877F38, 0x3f81111168877F38, 0x3f81111168877F38, 0x3f81111168877F38 /* coeff5 */
> +        .quad 0x3fa55555C2A9C0F3, 0x3fa55555C2A9C0F3, 0x3fa55555C2A9C0F3, 0x3fa55555C2A9C0F3 /* coeff4 */
> +        .quad 0x3fc555555555541D, 0x3fc555555555541D, 0x3fc555555555541D, 0x3fc555555555541D /* coeff3 */
> +        .quad 0x3fdFFFFFFFFFFE5C, 0x3fdFFFFFFFFFFE5C, 0x3fdFFFFFFFFFFE5C, 0x3fdFFFFFFFFFFE5C /* coeff2 */
> +        /*== Log2e ==*/
> +        .align 32
> +        .quad 0x40671547652B82FE, 0x40671547652B82FE, 0x40671547652B82FE, 0x40671547652B82FE
> +        /*== L2H ==*/
> +        .align 32
> +        .quad 0x3f762e42fef80000, 0x3f762e42fef80000, 0x3f762e42fef80000, 0x3f762e42fef80000
> +        /*== L2L ==*/
> +        .align 32
> +        .quad 0x3d41cf79abc9e3b4, 0x3d41cf79abc9e3b4, 0x3d41cf79abc9e3b4, 0x3d41cf79abc9e3b4
> +        /*== ExpAddConst ==*/
> +        .align 32
> +        .quad 0x42f80000001ff800, 0x42f80000001ff800, 0x42f80000001ff800, 0x42f80000001ff800
> +        /*== IndexMask ==*/
> +        .align 32
> +        .quad 0x00000000000007f0, 0x00000000000007f0, 0x00000000000007f0, 0x00000000000007f0
> +        /*== ExpMask ==*/
> +        .align 32
> +        .quad 0x00000000003ff800, 0x00000000003ff800, 0x00000000003ff800, 0x00000000003ff800
> +        /*== MOne ==*/
> +        .align 32
> +        .quad 0xbff0000000000000, 0xbff0000000000000, 0xbff0000000000000, 0xbff0000000000000
> +        /*== AbsMask ==*/
> +        .align 32
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
> +        /*== Threshold ==*/
> +        .align 32
> +        .quad 0x40861DA04CBAFE43, 0x40861DA04CBAFE43, 0x40861DA04CBAFE43, 0x40861DA04CBAFE43
> +        /*== L2 ==*/
> +        .align 32
> +        .quad 0x3f762e42fefa39ef, 0x3f762e42fefa39ef, 0x3f762e42fefa39ef, 0x3f762e42fefa39ef
> +        .align 32
> +        .type	__svml_dexpm1_data_internal,@object
> +        .size	__svml_dexpm1_data_internal,.-__svml_dexpm1_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core-avx2.S
> new file mode 100644
> index 0000000000..3b75d1de16
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized expm1, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN8v_expm1 _ZGVeN8v_expm1_avx2_wrapper
> +#include "../svml_d_expm18_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core.c
> new file mode 100644
> index 0000000000..860edf6df5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized expm1, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN8v_expm1
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN8v_expm1, __GI__ZGVeN8v_expm1, __redirect__ZGVeN8v_expm1)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core_avx512.S
> new file mode 100644
> index 0000000000..64cee91abd
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_expm18_core_avx512.S
> @@ -0,0 +1,334 @@
> +/* Function expm1 vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *   After computing exp(x) in high-low parts, an accurate computation is performed to obtain exp(x)-1.
> + *   Typical exp() implementation, except that:
> + *    - tables are small (16 elements), allowing for fast gathers
> + *    - all arguments processed in the main path
> + *    - final VSCALEF assists branch-free design (correct overflow/underflow and special case responses)
> + *    - a VAND is used to ensure the reduced argument |R|<2, even for large inputs
> + *    - RZ mode used to avoid overflow to +/-Inf for x*log2(e); helps with special case handling
> + *
> + *
> + */
> +
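As with the scalar sketch for the SSE4/AVX2 kernels, a rough C model of the
branch-free AVX-512 scheme may help; it is illustrative only (made-up helper
name, truncated Taylor terms instead of the tuned degree-7 polynomial, no
high/low arithmetic, and no clamping/masking of extreme inputs).

#include <math.h>

static double
expm1_avx512_model (double x)
{
  static const double ln2 = 0x1.62e42fefa39efp-1;
  static const double log2e = 0x1.71547652b82fep+0;

  /* Z0 ~ x*log2(e), rounded to 4 fractional bits (the Shifter trick).  */
  double z0 = round (x * log2e * 16.0) / 16.0;
  int ni = (int) (z0 * 16.0);      /* exact: z0 has 4 fractional bits */

  /* R = x - Z0*log(2); the kernel also clamps Z0 (ZThres) and masks R
     (EMask) so the reduced argument stays small for special inputs.  */
  double r = x - z0 * ln2;

  /* Th = 2^Z0: 16-entry table for the fractional part (vpermt2pd in the
     kernel), scale by the integer part (vscalefpd; ldexp stands in).  */
  int j = ni & 15;
  double th = ldexp (exp2 (j / 16.0), (ni - j) / 16);

  /* Truncated Taylor terms stand in for poly_coeff7..poly_coeff2.  */
  double poly = r * r * (0.5 + r * (1.0 / 6.0 + r * (1.0 / 24.0)));

  /* expm1(x) = (Th - 1) + Th*(R + poly), matching the comments above.  */
  return (th - 1.0) + th * (r + poly);
}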
> +/* Offsets for data table __svml_dexpm1_data_internal_avx512
> + */
> +#define Exp_tbl_H                     	0
> +#define Exp_tbl_L                     	128
> +#define L2E                           	256
> +#define Shifter                       	320
> +#define Threshold                     	384
> +#define SgnMask                       	448
> +#define L2H                           	512
> +#define L2L                           	576
> +#define ZThres                        	640
> +#define EMask                         	704
> +#define poly_coeff7                   	768
> +#define poly_coeff6                   	832
> +#define poly_coeff5                   	896
> +#define poly_coeff4                   	960
> +#define poly_coeff3                   	1024
> +#define poly_coeff2                   	1088
> +#define One                           	1152
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN8v_expm1_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovups   L2E+__svml_dexpm1_data_internal_avx512(%rip), %zmm6
> +        vmovups   Shifter+__svml_dexpm1_data_internal_avx512(%rip), %zmm4
> +        vmovups   L2H+__svml_dexpm1_data_internal_avx512(%rip), %zmm11
> +        vmovups   L2L+__svml_dexpm1_data_internal_avx512(%rip), %zmm5
> +        vmovups   Threshold+__svml_dexpm1_data_internal_avx512(%rip), %zmm3
> +        vmovups   poly_coeff5+__svml_dexpm1_data_internal_avx512(%rip), %zmm13
> +        vmovups   poly_coeff4+__svml_dexpm1_data_internal_avx512(%rip), %zmm15
> +
> +/* polynomial */
> +        vmovups   poly_coeff7+__svml_dexpm1_data_internal_avx512(%rip), %zmm12
> +
> +/* set Z0=max(Z0, -128.0) */
> +        vmovups   ZThres+__svml_dexpm1_data_internal_avx512(%rip), %zmm8
> +        vmovups   poly_coeff3+__svml_dexpm1_data_internal_avx512(%rip), %zmm14
> +        vmovups   __svml_dexpm1_data_internal_avx512(%rip), %zmm9
> +        vmovaps   %zmm0, %zmm2
> +
> +/* 2^(52-4)*1.5 + x * log2(e) */
> +        vfmadd213pd {rn-sae}, %zmm4, %zmm2, %zmm6
> +        vmovups   Exp_tbl_L+__svml_dexpm1_data_internal_avx512(%rip), %zmm0
> +        vcmppd    $21, {sae}, %zmm3, %zmm2, %k0
> +
> +/* Z0 ~ x*log2(e), rounded to 4 fractional bits */
> +        vsubpd    {rn-sae}, %zmm4, %zmm6, %zmm7
> +        vpermt2pd Exp_tbl_H+64+__svml_dexpm1_data_internal_avx512(%rip), %zmm6, %zmm9
> +        vpermt2pd Exp_tbl_L+64+__svml_dexpm1_data_internal_avx512(%rip), %zmm6, %zmm0
> +        vandpd    SgnMask+__svml_dexpm1_data_internal_avx512(%rip), %zmm2, %zmm1
> +
> +/* R = x - Z0*log(2) */
> +        vfnmadd213pd {rn-sae}, %zmm2, %zmm7, %zmm11
> +        vmaxpd    {sae}, %zmm8, %zmm7, %zmm10
> +        vfnmadd231pd {rn-sae}, %zmm7, %zmm5, %zmm11
> +        kmovw     %k0, %edx
> +
> +/* ensure |R|<2 even for special cases */
> +        vandpd    EMask+__svml_dexpm1_data_internal_avx512(%rip), %zmm11, %zmm3
> +        vmovups   poly_coeff6+__svml_dexpm1_data_internal_avx512(%rip), %zmm11
> +
> +/* scale Th */
> +        vscalefpd {rn-sae}, %zmm10, %zmm9, %zmm4
> +        vfmadd231pd {rn-sae}, %zmm3, %zmm13, %zmm15
> +        vfmadd231pd {rn-sae}, %zmm3, %zmm12, %zmm11
> +        vmovups   poly_coeff2+__svml_dexpm1_data_internal_avx512(%rip), %zmm12
> +        vmulpd    {rn-sae}, %zmm3, %zmm3, %zmm13
> +        vfmadd231pd {rn-sae}, %zmm3, %zmm14, %zmm12
> +        vfmadd213pd {rn-sae}, %zmm15, %zmm13, %zmm11
> +        vfmadd213pd {rn-sae}, %zmm12, %zmm13, %zmm11
> +
> +/* Tlr + R + R*Poly */
> +        vfmadd213pd {rn-sae}, %zmm0, %zmm13, %zmm11
> +
> +/* Th - 1 */
> +        vmovups   One+__svml_dexpm1_data_internal_avx512(%rip), %zmm0
> +        vaddpd    {rn-sae}, %zmm3, %zmm11, %zmm14
> +        vsubpd    {rn-sae}, %zmm0, %zmm4, %zmm15
> +
> +/* (Th-1) + Th*(Tlr + R + R*Poly) */
> +        vfmadd213pd {rn-sae}, %zmm15, %zmm14, %zmm4
> +        vorpd     %zmm1, %zmm4, %zmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm2
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm2, 64(%rsp)
> +        vmovups   %zmm0, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     64(%rsp,%r14,8), %xmm0
> +        call      expm1@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 128(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN8v_expm1_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_dexpm1_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 Exp_tbl_H[16][2];
> +        __declspec(align(64)) VUINT32 Exp_tbl_L[16][2];
> +        __declspec(align(64)) VUINT32 L2E[8][2];
> +        __declspec(align(64)) VUINT32 Shifter[8][2];
> +        __declspec(align(64)) VUINT32 Threshold[8][2];
> +        __declspec(align(64)) VUINT32 SgnMask[8][2];
> +        __declspec(align(64)) VUINT32 L2H[8][2];
> +        __declspec(align(64)) VUINT32 L2L[8][2];
> +        __declspec(align(64)) VUINT32 ZThres[8][2];
> +        __declspec(align(64)) VUINT32 EMask[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff7[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
> +        __declspec(align(64)) VUINT32 One[8][2];
> +    } __svml_dexpm1_data_internal_avx512;
> +#endif
> +__svml_dexpm1_data_internal_avx512:
> +        /*== Exp_tbl_H ==*/
> +        .quad 0x3ff0000000000000
> +        .quad 0x3ff0b5586cf9890f
> +        .quad 0x3ff172b83c7d517b
> +        .quad 0x3ff2387a6e756238
> +        .quad 0x3ff306fe0a31b715
> +        .quad 0x3ff3dea64c123422
> +        .quad 0x3ff4bfdad5362a27
> +        .quad 0x3ff5ab07dd485429
> +        .quad 0x3ff6a09e667f3bcd
> +        .quad 0x3ff7a11473eb0187
> +        .quad 0x3ff8ace5422aa0db
> +        .quad 0x3ff9c49182a3f090
> +        .quad 0x3ffae89f995ad3ad
> +        .quad 0x3ffc199bdd85529c
> +        .quad 0x3ffd5818dcfba487
> +        .quad 0x3ffea4afa2a490da
> +        /*== Exp_tbl_L ==*/
> +        .align 64
> +        .quad 0x0000000000000000
> +        .quad 0x3c979aa65d837b6d
> +        .quad 0xbc801b15eaa59348
> +        .quad 0x3c968efde3a8a894
> +        .quad 0x3c834d754db0abb6
> +        .quad 0x3c859f48a72a4c6d
> +        .quad 0x3c7690cebb7aafb0
> +        .quad 0x3c9063e1e21c5409
> +        .quad 0xbc93b3efbf5e2228
> +        .quad 0xbc7b32dcb94da51d
> +        .quad 0x3c8db72fc1f0eab4
> +        .quad 0x3c71affc2b91ce27
> +        .quad 0x3c8c1a7792cb3387
> +        .quad 0x3c736eae30af0cb3
> +        .quad 0x3c74a385a63d07a7
> +        .quad 0xbc8ff7128fd391f0
> +        /*== log2(e) ==*/
> +        .align 64
> +        .quad 0x3ff71547652B82FE, 0x3ff71547652B82FE, 0x3ff71547652B82FE, 0x3ff71547652B82FE, 0x3ff71547652B82FE, 0x3ff71547652B82FE, 0x3ff71547652B82FE, 0x3ff71547652B82FE
> +        /*== Shifter=2^(52-4)*1.5 ==*/
> +        .align 64
> +        .quad 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0
> +        /*== Threshold ==*/
> +        .align 64
> +        .quad 0x40861DA04CBAFE44, 0x40861DA04CBAFE44, 0x40861DA04CBAFE44, 0x40861DA04CBAFE44, 0x40861DA04CBAFE44, 0x40861DA04CBAFE44, 0x40861DA04CBAFE44, 0x40861DA04CBAFE44
> +        /*== Sgn ==*/
> +        .align 64
> +        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000
> +        /*== L2H = log(2)_high ==*/
> +        .align 64
> +        .quad 0x3fe62e42fefa39ef, 0x3fe62e42fefa39ef, 0x3fe62e42fefa39ef, 0x3fe62e42fefa39ef, 0x3fe62e42fefa39ef, 0x3fe62e42fefa39ef, 0x3fe62e42fefa39ef, 0x3fe62e42fefa39ef
> +        /*== L2L = log(2)_low ==*/
> +        .align 64
> +        .quad 0x3c7abc9e3b39803f, 0x3c7abc9e3b39803f, 0x3c7abc9e3b39803f, 0x3c7abc9e3b39803f, 0x3c7abc9e3b39803f, 0x3c7abc9e3b39803f, 0x3c7abc9e3b39803f, 0x3c7abc9e3b39803f
> +        /*== ZThres ==*/
> +        .align 64
> +        .quad 0xc060000000000000, 0xc060000000000000, 0xc060000000000000, 0xc060000000000000, 0xc060000000000000, 0xc060000000000000, 0xc060000000000000, 0xc060000000000000
> +        /*== EMask ==*/
> +        .align 64
> +        .quad 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff
> +        /*== poly_coeff7 ==*/
> +        .align 64
> +        .quad 0x3f2a020410303d8a, 0x3f2a020410303d8a, 0x3f2a020410303d8a, 0x3f2a020410303d8a, 0x3f2a020410303d8a, 0x3f2a020410303d8a, 0x3f2a020410303d8a, 0x3f2a020410303d8a
> +        /*== poly_coeff6 ==*/
> +        .align 64
> +        .quad 0x3f56c1c38e164a2f, 0x3f56c1c38e164a2f, 0x3f56c1c38e164a2f, 0x3f56c1c38e164a2f, 0x3f56c1c38e164a2f, 0x3f56c1c38e164a2f, 0x3f56c1c38e164a2f, 0x3f56c1c38e164a2f
> +        /*== poly_coeff5 ==*/
> +        .align 64
> +        .quad 0x3f81111110865214, 0x3f81111110865214, 0x3f81111110865214, 0x3f81111110865214, 0x3f81111110865214, 0x3f81111110865214, 0x3f81111110865214, 0x3f81111110865214
> +        /*== poly_coeff4 ==*/
> +        .align 64
> +        .quad 0x3fa5555554ad3d06, 0x3fa5555554ad3d06, 0x3fa5555554ad3d06, 0x3fa5555554ad3d06, 0x3fa5555554ad3d06, 0x3fa5555554ad3d06, 0x3fa5555554ad3d06, 0x3fa5555554ad3d06
> +        /*== poly_coeff3 ==*/
> +        .align 64
> +        .quad 0x3fc5555555555656, 0x3fc5555555555656, 0x3fc5555555555656, 0x3fc5555555555656, 0x3fc5555555555656, 0x3fc5555555555656, 0x3fc5555555555656, 0x3fc5555555555656
> +        /*== poly_coeff2 ==*/
> +        .align 64
> +        .quad 0x3fe00000000000a2, 0x3fe00000000000a2, 0x3fe00000000000a2, 0x3fe00000000000a2, 0x3fe00000000000a2, 0x3fe00000000000a2, 0x3fe00000000000a2, 0x3fe00000000000a2
> +        /*== One ==*/
> +        .align 64
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> +        .align 64
> +        .type	__svml_dexpm1_data_internal_avx512,@object
> +        .size	__svml_dexpm1_data_internal_avx512,.-__svml_dexpm1_data_internal_avx512
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core-avx2.S
> new file mode 100644
> index 0000000000..a2a8699a05
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized expm1f.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN16v_expm1f _ZGVeN16v_expm1f_avx2_wrapper
> +#include "../svml_s_expm1f16_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core.c
> new file mode 100644
> index 0000000000..8007d1e415
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized expm1f, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN16v_expm1f
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN16v_expm1f, __GI__ZGVeN16v_expm1f,
> +	       __redirect__ZGVeN16v_expm1f)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
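
As context for the ifunc dispatch above: callers never invoke the _ZGV* symbols by hand; the compiler emits them when it vectorizes a loop over the scalar routine. A minimal usage sketch, assuming GCC with -ffast-math and -fopenmp-simd on an AVX-512 target (the flags and the exact conditions for vectorization are illustrative, not part of this patch):

#include <math.h>

/* With OpenMP SIMD hints and fast-math, GCC may replace the scalar
   expm1f calls in this loop with the libmvec vector entry points
   declared via bits/math-vector.h (e.g. _ZGVeN16v_expm1f).  */
void
vexpm1 (float *restrict y, const float *restrict x, int n)
{
#pragma omp simd
  for (int i = 0; i < n; i++)
    y[i] = expm1f (x[i]);
}
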
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core_avx512.S
> new file mode 100644
> index 0000000000..5b0dcde77f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f16_core_avx512.S
> @@ -0,0 +1,281 @@
> +/* Function expm1f vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *   After computing exp(x) in high-low parts, an accurate computation is performed to obtain exp(x)-1
> + *   Typical exp() implementation, except that:
> + *    - tables are small (32 elements), allowing for fast gathers
> + *    - all arguments processed in the main path
> + *        - final VSCALEF assists branch-free design (correct overflow/underflow and special case responses)
> + *        - a VAND is used to ensure the reduced argument |R|<2, even for large inputs
> + *        - RZ mode used to avoid overflow to +/-Inf for x*log2(e); helps with special case handling
> + *
> + *
> + */
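
A scalar C model may make the vector code below easier to read; it mirrors the high-low split sketched above, but uses a plain 32-bin reduction and truncated Taylor coefficients instead of the tuned table and FMA sequence in the assembly (the helper name and constants are illustrative assumptions):

#include <math.h>

/* Rough scalar model of the scheme above: reduce x against a 32-bin
   grid of log(2), take Th ~ 2^(k/32) (table lookup + VSCALEF in the
   SIMD code), then form expm1 as (Th - 1) + Th*(e^R - 1) to avoid
   cancellation for small x.  */
static double
expm1_model (double x)
{
  double k = nearbyint (x * 0x1.71547652b82fep+0 * 32.0); /* x*log2(e)*32 */
  double r = x - k * (0x1.62e42fefa39efp-1 / 32.0);       /* R = x - k*ln2/32 */
  double th = exp2 (k / 32.0);                            /* Th, scaled table value */
  double em1r = r + r * r * (0.5 + r * (1.0 / 6.0));      /* e^R - 1, low order */
  return (th - 1.0) + th * em1r;
}
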
> +
> +/* Offsets for data table __svml_sexpm1_data_internal_avx512
> + */
> +#define Exp_tbl_H                     	0
> +#define Exp_tbl_L                     	128
> +#define L2E                           	256
> +#define Shifter                       	320
> +#define Threshold                     	384
> +#define SgnMask                       	448
> +#define L2H                           	512
> +#define L2L                           	576
> +#define EMask                         	640
> +#define poly_coeff3                   	704
> +#define poly_coeff2                   	768
> +#define One                           	832
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN16v_expm1f_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovups   L2E+__svml_sexpm1_data_internal_avx512(%rip), %zmm5
> +        vmovups   Shifter+__svml_sexpm1_data_internal_avx512(%rip), %zmm3
> +        vmovups   L2H+__svml_sexpm1_data_internal_avx512(%rip), %zmm8
> +        vmovups   L2L+__svml_sexpm1_data_internal_avx512(%rip), %zmm4
> +        vmovups   __svml_sexpm1_data_internal_avx512(%rip), %zmm6
> +
> +/* polynomial */
> +        vmovups   poly_coeff3+__svml_sexpm1_data_internal_avx512(%rip), %zmm9
> +        vmovups   poly_coeff2+__svml_sexpm1_data_internal_avx512(%rip), %zmm12
> +        vmovups   Exp_tbl_L+__svml_sexpm1_data_internal_avx512(%rip), %zmm11
> +        vmovups   Threshold+__svml_sexpm1_data_internal_avx512(%rip), %zmm2
> +
> +/* Th - 1 */
> +        vmovups   One+__svml_sexpm1_data_internal_avx512(%rip), %zmm14
> +        vmovaps   %zmm0, %zmm1
> +
> +/* 2^(23-5)*1.5 + x * log2(e) */
> +        vfmadd213ps {rn-sae}, %zmm3, %zmm1, %zmm5
> +        vcmpps    $29, {sae}, %zmm2, %zmm1, %k0
> +
> +/* Z0 ~ x*log2(e), rounded to 5 fractional bits */
> +        vsubps    {rn-sae}, %zmm3, %zmm5, %zmm7
> +        vpermt2ps Exp_tbl_H+64+__svml_sexpm1_data_internal_avx512(%rip), %zmm5, %zmm6
> +        vpermt2ps Exp_tbl_L+64+__svml_sexpm1_data_internal_avx512(%rip), %zmm5, %zmm11
> +        vandps    SgnMask+__svml_sexpm1_data_internal_avx512(%rip), %zmm1, %zmm0
> +
> +/* R = x - Z0*log(2) */
> +        vfnmadd213ps {rn-sae}, %zmm1, %zmm7, %zmm8
> +
> +/* scale Th */
> +        vscalefps {rn-sae}, %zmm7, %zmm6, %zmm2
> +        vfnmadd231ps {rn-sae}, %zmm7, %zmm4, %zmm8
> +        kmovw     %k0, %edx
> +
> +/* ensure |R|<2 even for special cases */
> +        vandps    EMask+__svml_sexpm1_data_internal_avx512(%rip), %zmm8, %zmm13
> +        vsubps    {rn-sae}, %zmm14, %zmm2, %zmm8
> +        vmulps    {rn-sae}, %zmm13, %zmm13, %zmm10
> +        vfmadd231ps {rn-sae}, %zmm13, %zmm9, %zmm12
> +
> +/* Tlr + R + R2*Poly */
> +        vfmadd213ps {rn-sae}, %zmm11, %zmm10, %zmm12
> +        vaddps    {rn-sae}, %zmm13, %zmm12, %zmm15
> +
> +/* (Th-1) + Th*(Tlr + R + R2*Poly) */
> +        vfmadd213ps {rn-sae}, %zmm8, %zmm15, %zmm2
> +        vorps     %zmm0, %zmm2, %zmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm1, 64(%rsp)
> +        vmovups   %zmm0, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $16, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     64(%rsp,%r14,4), %xmm0
> +        call      expm1f@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 128(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN16v_expm1f_skx)
> +
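
The special-input path used by this kernel (and by the SSE4/AVX2 kernels later in this patch) follows one pattern: spill the input and result vectors, walk the range-mask bits, and recompute each flagged lane with the scalar libm routine. A scalar C sketch of that loop, with the lane count as a parameter (names are illustrative, not the labels in the assembly):

#include <math.h>

/* Model of the SPECIAL_VALUES_LOOP / RANGEMASK_CHECK logic: for every
   bit set in the range mask, redo that lane with scalar expm1f and
   overwrite the fast-path result.  */
static void
fixup_special_lanes (float *result, const float *input,
                     unsigned int range_mask, int lanes)
{
  for (int i = 0; i < lanes; i++)
    if (range_mask & (1u << i))
      result[i] = expm1f (input[i]);
}
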
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_sexpm1_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 Exp_tbl_H[32][1];
> +        __declspec(align(64)) VUINT32 Exp_tbl_L[32][1];
> +        __declspec(align(64)) VUINT32 L2E[16][1];
> +        __declspec(align(64)) VUINT32 Shifter[16][1];
> +        __declspec(align(64)) VUINT32 Threshold[16][1];
> +        __declspec(align(64)) VUINT32 SgnMask[16][1];
> +        __declspec(align(64)) VUINT32 L2H[16][1];
> +        __declspec(align(64)) VUINT32 L2L[16][1];
> +        __declspec(align(64)) VUINT32 EMask[16][1];
> +        __declspec(align(64)) VUINT32 poly_coeff3[16][1];
> +        __declspec(align(64)) VUINT32 poly_coeff2[16][1];
> +        __declspec(align(64)) VUINT32 One[16][1];
> +    } __svml_sexpm1_data_internal_avx512;
> +#endif
> +__svml_sexpm1_data_internal_avx512:
> +        /*== Exp_tbl_H ==*/
> +        .long 0x3f800000, 0x3f82cd87, 0x3f85aac3, 0x3f88980f
> +        .long 0x3f8b95c2, 0x3f8ea43a, 0x3f91c3d3, 0x3f94f4f0
> +        .long 0x3f9837f0, 0x3f9b8d3a, 0x3f9ef532, 0x3fa27043
> +        .long 0x3fa5fed7, 0x3fa9a15b, 0x3fad583f, 0x3fb123f6
> +        .long 0x3fb504f3, 0x3fb8fbaf, 0x3fbd08a4, 0x3fc12c4d
> +        .long 0x3fc5672a, 0x3fc9b9be, 0x3fce248c, 0x3fd2a81e
> +        .long 0x3fd744fd, 0x3fdbfbb8, 0x3fe0ccdf, 0x3fe5b907
> +        .long 0x3feac0c7, 0x3fefe4ba, 0x3ff5257d, 0x3ffa83b3
> +        /*== Exp_tbl_L ==*/
> +        .align 64
> +        .long 0x00000000, 0xb34a3a0a, 0x3346cb6a, 0xb36ed17e
> +        .long 0xb24e0611, 0xb3517dd9, 0x334b2482, 0xb31586de
> +        .long 0x33092801, 0xb2e6f467, 0x331b85f2, 0x3099b6f1
> +        .long 0xb3051aa8, 0xb2e2a0da, 0xb2006c56, 0xb3365942
> +        .long 0x329302ae, 0x32c595dc, 0xb302e5a2, 0xb28e10a1
> +        .long 0x31b3d0e5, 0xb31a472b, 0x31d1daf2, 0xb305bf64
> +        .long 0xb27ce182, 0xb2f26443, 0xb1b4b0da, 0xb1da8a8f
> +        .long 0xb1d290be, 0xb2d5b899, 0x31b0a147, 0xb2156afc
> +        /*== log2(e) ==*/
> +        .align 64
> +        .long 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B
> +        /*== Shifter=2^(23-5)*1.5 ==*/
> +        .align 64
> +        .long 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000
> +        /*== Threshold ==*/
> +        .align 64
> +        .long 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B
> +        /*== Sgn ==*/
> +        .align 64
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000
> +        /*== L2H = log(2)_high ==*/
> +        .align 64
> +        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
> +        /*== L2L = log(2)_low ==*/
> +        .align 64
> +        .long 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308, 0xb102e308
> +        /*== EMask ==*/
> +        .align 64
> +        .long 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff, 0xbfffffff
> +        /*== poly_coeff3 ==*/
> +        .align 64
> +        .long 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3, 0x3e2AABF3
> +        /*== poly_coeff2 ==*/
> +        .align 64
> +        .long 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6, 0x3f0000F6
> +        /*== One ==*/
> +        .align 64
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        .align 64
> +        .type	__svml_sexpm1_data_internal_avx512,@object
> +        .size	__svml_sexpm1_data_internal_avx512,.-__svml_sexpm1_data_internal_avx512
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core-sse2.S
> new file mode 100644
> index 0000000000..b4dbb77590
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized expm1f, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN4v_expm1f _ZGVbN4v_expm1f_sse2
> +#include "../svml_s_expm1f4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core.c
> new file mode 100644
> index 0000000000..f8ef12511d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized expm1f, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN4v_expm1f
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN4v_expm1f, __GI__ZGVbN4v_expm1f,
> +	       __redirect__ZGVbN4v_expm1f)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core_sse4.S
> new file mode 100644
> index 0000000000..18770f6dbb
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f4_core_sse4.S
> @@ -0,0 +1,358 @@
> +/* Function expm1f vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    N = (int)(x*2^k/log(2.0)), R = x - N*log(2)/2^k
> + *    exp(x) = 2^(N/2^k) * poly(R) is computed in high-low parts
> + *    expm1(x) = exp(x)-1 is then obtained via multi-precision computation
> + *
> + *
> + */
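
The reduction in this variant differs from the AVX-512 one: a single integer N carries both the table index (low bits) and the binary exponent (high bits), which the kernel separates with IndexMask/ExpMask defined just below. A hedged scalar outline, assuming 2^k = 64 table entries purely for illustration (tbl[j] is assumed to hold 2^(j/64); the real kernel rebuilds the float exponent field instead of calling ldexp):

#include <math.h>

/* Scalar outline of the scheme described above.  */
static double
expm1_table_model (double x, const double tbl[64])
{
  const double ln2 = 0x1.62e42fefa39efp-1;
  double n  = nearbyint (x * 64.0 / ln2);          /* N = (int)(x*2^k/log(2.0)) */
  double r  = x - n * (ln2 / 64.0);                /* R = x - N*log(2)/2^k */
  int    ni = (int) n;
  double t  = ldexp (tbl[ni & 63], ni >> 6);       /* 2^(N/2^k) via exponent + table */
  double p  = r + r * r * (0.5 + r * (1.0 / 6.0)); /* poly(R) ~ exp(R) - 1 */
  return (t - 1.0) + t * p;                        /* expm1(x) = exp(x) - 1 */
}
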
> +
> +/* Offsets for data table __svml_sexpm1_data_internal
> + */
> +#define Expm1_HA_table                	0
> +#define poly_coeff                    	512
> +#define Log2e                         	576
> +#define L2H                           	592
> +#define L2L                           	608
> +#define ExpAddConst                   	624
> +#define IndexMask                     	640
> +#define ExpMask                       	656
> +#define MOne                          	672
> +#define AbsMask                       	688
> +#define Threshold                     	704
> +#define L2                            	720
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN4v_expm1f_sse4)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $64, %rsp
> +        movaps    %xmm0, %xmm4
> +        movups    Log2e+__svml_sexpm1_data_internal(%rip), %xmm9
> +        lea       __svml_sexpm1_data_internal(%rip), %r8
> +        mulps     %xmm0, %xmm9
> +        movups    .FLT_10(%rip), %xmm5
> +        movups    ExpAddConst+__svml_sexpm1_data_internal(%rip), %xmm2
> +        addps     %xmm5, %xmm9
> +
> +/* argument reduction */
> +        movups    L2H+__svml_sexpm1_data_internal(%rip), %xmm6
> +        subps     %xmm5, %xmm9
> +        mulps     %xmm9, %xmm6
> +        addps     %xmm9, %xmm2
> +
> +/* table lookup */
> +        movdqu    IndexMask+__svml_sexpm1_data_internal(%rip), %xmm12
> +        subps     %xmm6, %xmm4
> +        pand      %xmm2, %xmm12
> +        movups    L2L+__svml_sexpm1_data_internal(%rip), %xmm7
> +        movups    AbsMask+__svml_sexpm1_data_internal(%rip), %xmm3
> +        pshufd    $1, %xmm12, %xmm10
> +        movaps    %xmm3, %xmm8
> +        mulps     %xmm9, %xmm7
> +        andps     %xmm0, %xmm8
> +        cmpnleps  Threshold+__svml_sexpm1_data_internal(%rip), %xmm8
> +        movd      %xmm12, %edx
> +        subps     %xmm7, %xmm4
> +        movd      %xmm10, %ecx
> +        movmskps  %xmm8, %eax
> +        pshufd    $2, %xmm12, %xmm11
> +        movaps    %xmm4, %xmm7
> +        pshufd    $3, %xmm12, %xmm13
> +        andnps    %xmm0, %xmm3
> +        movd      %xmm11, %esi
> +        movd      %xmm13, %edi
> +
> +/* polynomial */
> +        movups    poly_coeff+__svml_sexpm1_data_internal(%rip), %xmm8
> +        movdqu    ExpMask+__svml_sexpm1_data_internal(%rip), %xmm6
> +        movslq    %edx, %rdx
> +        pand      %xmm6, %xmm2
> +        movslq    %ecx, %rcx
> +        pslld     $14, %xmm2
> +        movslq    %esi, %rsi
> +        movslq    %edi, %rdi
> +        movq      (%r8,%rdx), %xmm1
> +        movq      (%r8,%rcx), %xmm14
> +        movq      (%r8,%rsi), %xmm5
> +        movq      (%r8,%rdi), %xmm15
> +        unpcklps  %xmm14, %xmm1
> +        mulps     %xmm4, %xmm8
> +        movaps    %xmm1, %xmm10
> +        mulps     %xmm4, %xmm7
> +        addps     poly_coeff+16+__svml_sexpm1_data_internal(%rip), %xmm8
> +        unpcklps  %xmm15, %xmm5
> +        movlhps   %xmm5, %xmm10
> +        shufps    $238, %xmm5, %xmm1
> +        orps      %xmm2, %xmm10
> +
> +/* T-1 */
> +        movups    MOne+__svml_sexpm1_data_internal(%rip), %xmm9
> +        mulps     %xmm2, %xmm1
> +        addps     %xmm9, %xmm10
> +        mulps     %xmm7, %xmm8
> +        addps     %xmm1, %xmm10
> +        addps     %xmm8, %xmm4
> +        movaps    %xmm10, %xmm1
> +        subps     %xmm9, %xmm1
> +        mulps     %xmm1, %xmm4
> +        addps     %xmm4, %xmm10
> +        orps      %xmm3, %xmm10
> +        testl     %eax, %eax
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 eax xmm0 xmm10
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movaps    %xmm10, %xmm0
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm0, 32(%rsp)
> +        movups    %xmm10, 48(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 eax
> +
> +        xorl      %edx, %edx
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm10
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 xmm10
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      expm1f@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 48(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVbN4v_expm1f_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_sexpm1_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(16)) VUINT32 Expm1_HA_table[(1<<7)][1];
> +        __declspec(align(16)) VUINT32 poly_coeff[4][4][1];
> +        __declspec(align(16)) VUINT32 Log2e[4][1];
> +        __declspec(align(16)) VUINT32 L2H[4][1];
> +        __declspec(align(16)) VUINT32 L2L[4][1];
> +        __declspec(align(16)) VUINT32 ExpAddConst[4][1];
> +        __declspec(align(16)) VUINT32 IndexMask[4][1];
> +        __declspec(align(16)) VUINT32 ExpMask[4][1];
> +        __declspec(align(16)) VUINT32 MOne[4][1];
> +        __declspec(align(16)) VUINT32 AbsMask[4][1];
> +        __declspec(align(16)) VUINT32 Threshold[4][1];
> +        __declspec(align(16)) VUINT32 L2[4][1];
> +} __svml_sexpm1_data_internal;
> +#endif
> +__svml_sexpm1_data_internal:
> +        /* Expm1_HA_table */
> +        .long 0x00000000, 0x00000000
> +        .long 0x00016000, 0x391a3e78
> +        .long 0x0002d000, 0xb89e59d5
> +        .long 0x00044000, 0xb93ae78a
> +        .long 0x0005b000, 0xb9279306
> +        .long 0x00072000, 0xb79e6961
> +        .long 0x0008a000, 0xb97e2fee
> +        .long 0x000a1000, 0x391aaea9
> +        .long 0x000b9000, 0x39383c7d
> +        .long 0x000d2000, 0xb9241490
> +        .long 0x000ea000, 0x39073169
> +        .long 0x00103000, 0x386e218a
> +        .long 0x0011c000, 0x38f4dceb
> +        .long 0x00136000, 0xb93a9a1e
> +        .long 0x0014f000, 0x391df520
> +        .long 0x00169000, 0x3905a6e4
> +        .long 0x00183000, 0x397e0a32
> +        .long 0x0019e000, 0x370b2641
> +        .long 0x001b9000, 0xb8b1918b
> +        .long 0x001d4000, 0xb8132c6a
> +        .long 0x001ef000, 0x39264c12
> +        .long 0x0020b000, 0x37221f73
> +        .long 0x00227000, 0x37060619
> +        .long 0x00243000, 0x3922b5c1
> +        .long 0x00260000, 0xb814ab27
> +        .long 0x0027d000, 0xb89b12c6
> +        .long 0x0029a000, 0x382d5a75
> +        .long 0x002b8000, 0xb938c94b
> +        .long 0x002d6000, 0xb97822b8
> +        .long 0x002f4000, 0xb910ea53
> +        .long 0x00312000, 0x38fd6075
> +        .long 0x00331000, 0x38620955
> +        .long 0x00350000, 0x391e667f
> +        .long 0x00370000, 0xb89b8736
> +        .long 0x00390000, 0xb90a1714
> +        .long 0x003b0000, 0xb7a54ded
> +        .long 0x003d1000, 0xb96b8c15
> +        .long 0x003f1000, 0x397336cf
> +        .long 0x00413000, 0xb8eccd66
> +        .long 0x00434000, 0x39599b45
> +        .long 0x00456000, 0x3965422b
> +        .long 0x00479000, 0xb8a2cdd5
> +        .long 0x0049c000, 0xb9484f32
> +        .long 0x004bf000, 0xb8fac043
> +        .long 0x004e2000, 0x391182a4
> +        .long 0x00506000, 0x38ccf6bc
> +        .long 0x0052b000, 0xb97c4dc2
> +        .long 0x0054f000, 0x38d6aaf4
> +        .long 0x00574000, 0x391f995b
> +        .long 0x0059a000, 0xb8ba8f62
> +        .long 0x005c0000, 0xb9090d05
> +        .long 0x005e6000, 0x37f4825e
> +        .long 0x0060d000, 0xb8c844f5
> +        .long 0x00634000, 0xb76d1a83
> +        .long 0x0065c000, 0xb95f2310
> +        .long 0x00684000, 0xb952b5f8
> +        .long 0x006ac000, 0x37c6e7dd
> +        .long 0x006d5000, 0xb7cfe126
> +        .long 0x006fe000, 0x3917337c
> +        .long 0x00728000, 0x383b9e2d
> +        .long 0x00752000, 0x392fa2a5
> +        .long 0x0077d000, 0x37df730b
> +        .long 0x007a8000, 0x38ecb6dd
> +        .long 0x007d4000, 0xb879f986
> +        /*== poly_coeff[4] ==*/
> +        .align 16
> +        .long 0x3e2AAABF, 0x3e2AAABF, 0x3e2AAABF, 0x3e2AAABF /* coeff3 */
> +        .long 0x3f00000F, 0x3f00000F, 0x3f00000F, 0x3f00000F /* coeff2 */
> +        /* 32 Byte Padding */
> +        .zero 32
> +        /*== Log2e ==*/
> +        .align 16
> +        .long 0x42B8AA3B, 0x42B8AA3B, 0x42B8AA3B, 0x42B8AA3B
> +        /*== L2H ==*/
> +        .align 16
> +        .long 0x3c318000, 0x3c318000, 0x3c318000, 0x3c318000
> +        /*== L2L ==*/
> +        .align 16
> +        .long 0xb65e8083, 0xb65e8083, 0xb65e8083, 0xb65e8083
> +        /*== ExpAddConst ==*/
> +        .align 16
> +        .long 0x49f0fe00, 0x49f0fe00, 0x49f0fe00, 0x49f0fe00
> +        /*== IndexMask ==*/
> +        .align 16
> +        .long 0x000001f8, 0x000001f8, 0x000001f8, 0x000001f8
> +        /*== ExpMask ==*/
> +        .align 16
> +        .long 0x0001fe00, 0x0001fe00, 0x0001fe00, 0x0001fe00
> +        /*== MOne ==*/
> +        .align 16
> +        .long 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000
> +        /*== AbsMask ==*/
> +        .align 16
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> +        /*== Threshold ==*/
> +        .align 16
> +        .long 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B // 86.643394
> +        /*== L2 ==*/
> +        .align 16
> +        .long 0x3cb17218, 0x3cb17218, 0x3cb17218, 0x3cb17218
> +        .align 16
> +        .type	__svml_sexpm1_data_internal,@object
> +        .size	__svml_sexpm1_data_internal,.-__svml_sexpm1_data_internal
> +        .align 16
> +
> +.FLT_10:
> +        .long	0x4b400000,0x4b400000,0x4b400000,0x4b400000
> +        .type	.FLT_10,@object
> +        .size	.FLT_10,16
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core-sse.S
> new file mode 100644
> index 0000000000..e34e4eb8d0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized expm1f, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN8v_expm1f _ZGVdN8v_expm1f_sse_wrapper
> +#include "../svml_s_expm1f8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core.c
> new file mode 100644
> index 0000000000..7e8b57de30
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized expm1f, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN8v_expm1f
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN8v_expm1f, __GI__ZGVdN8v_expm1f,
> +	       __redirect__ZGVdN8v_expm1f)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core_avx2.S
> new file mode 100644
> index 0000000000..8e65d692d6
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_expm1f8_core_avx2.S
> @@ -0,0 +1,351 @@
> +/* Function expm1f vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    N = (int)(x*2^k/log(2.0)), R = x - N*log(2)/2^k
> + *    exp(x) = 2^(N/2^k) * poly(R) is computed in high-low parts
> + *    expm1(x) = exp(x)-1 is then obtained via multi-precision computation
> + *
> + *
> + */
> +
> +/* Offsets for data table __svml_sexpm1_data_internal
> + */
> +#define Expm1_HA_table                	0
> +#define poly_coeff                    	512
> +#define Log2e                         	640
> +#define L2H                           	672
> +#define L2L                           	704
> +#define ExpAddConst                   	736
> +#define IndexMask                     	768
> +#define ExpMask                       	800
> +#define MOne                          	832
> +#define AbsMask                       	864
> +#define Threshold                     	896
> +#define L2                            	928
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN8v_expm1f_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +        lea       __svml_sexpm1_data_internal(%rip), %rax
> +        vmovaps   %ymm0, %ymm3
> +        vmulps    Log2e+__svml_sexpm1_data_internal(%rip), %ymm3, %ymm4
> +
> +/* argument reduction */
> +        vmovups   L2H+__svml_sexpm1_data_internal(%rip), %ymm2
> +        vmovups   AbsMask+__svml_sexpm1_data_internal(%rip), %ymm5
> +        vroundps  $0, %ymm4, %ymm8
> +        vaddps    ExpAddConst+__svml_sexpm1_data_internal(%rip), %ymm8, %ymm0
> +        vfnmadd213ps %ymm3, %ymm8, %ymm2
> +
> +/* table lookup */
> +        vandps    IndexMask+__svml_sexpm1_data_internal(%rip), %ymm0, %ymm9
> +        vandps    %ymm5, %ymm3, %ymm6
> +        vcmpnle_uqps Threshold+__svml_sexpm1_data_internal(%rip), %ymm6, %ymm7
> +        vfnmadd231ps L2L+__svml_sexpm1_data_internal(%rip), %ymm8, %ymm2
> +        vandps    ExpMask+__svml_sexpm1_data_internal(%rip), %ymm0, %ymm0
> +        vandnps   %ymm3, %ymm5, %ymm1
> +        vpslld    $14, %ymm0, %ymm0
> +        vmovmskps %ymm7, %edx
> +        vmovd     %xmm9, %ecx
> +        vextractf128 $1, %ymm9, %xmm10
> +        movslq    %ecx, %rcx
> +        vmovd     %xmm10, %r9d
> +        vpextrd   $1, %xmm9, %esi
> +        vpextrd   $2, %xmm9, %edi
> +        vpextrd   $3, %xmm9, %r8d
> +        vmovq     (%rax,%rcx), %xmm11
> +        vpextrd   $1, %xmm10, %r10d
> +        vpextrd   $2, %xmm10, %r11d
> +        vpextrd   $3, %xmm10, %ecx
> +        movslq    %esi, %rsi
> +        movslq    %edi, %rdi
> +        movslq    %r8d, %r8
> +        movslq    %r9d, %r9
> +        movslq    %r10d, %r10
> +        movslq    %r11d, %r11
> +        movslq    %ecx, %rcx
> +        vmovq     (%rax,%rsi), %xmm13
> +        vmovq     (%rax,%rdi), %xmm12
> +        vmovq     (%rax,%r8), %xmm14
> +        vmovq     (%rax,%r9), %xmm15
> +        vmovq     (%rax,%r10), %xmm5
> +        vmovq     (%rax,%r11), %xmm4
> +        vmovq     (%rax,%rcx), %xmm6
> +        vunpcklps %xmm12, %xmm11, %xmm7
> +        vunpcklps %xmm14, %xmm13, %xmm8
> +        vunpcklps %xmm4, %xmm15, %xmm15
> +        vunpcklps %xmm6, %xmm5, %xmm9
> +        vmulps    %ymm2, %ymm2, %ymm13
> +        vinsertf128 $1, %xmm15, %ymm7, %ymm10
> +        vinsertf128 $1, %xmm9, %ymm8, %ymm11
> +        vunpcklps %ymm11, %ymm10, %ymm12
> +        vorps     %ymm0, %ymm12, %ymm14
> +
> +/* polynomial */
> +        vmovups   poly_coeff+__svml_sexpm1_data_internal(%rip), %ymm12
> +        vfmadd213ps poly_coeff+32+__svml_sexpm1_data_internal(%rip), %ymm2, %ymm12
> +        vfmadd213ps %ymm2, %ymm13, %ymm12
> +
> +/* T-1 */
> +        vmovups   MOne+__svml_sexpm1_data_internal(%rip), %ymm13
> +        vaddps    %ymm13, %ymm14, %ymm2
> +        vunpckhps %ymm11, %ymm10, %ymm4
> +        vfmadd213ps %ymm2, %ymm0, %ymm4
> +        vsubps    %ymm13, %ymm4, %ymm0
> +        vfmadd213ps %ymm4, %ymm12, %ymm0
> +        vorps     %ymm1, %ymm0, %ymm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm3
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %ymm3, 32(%rsp)
> +        vmovups   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      expm1f@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 64(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN8v_expm1f_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_sexpm1_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(32)) VUINT32 Expm1_HA_table[(1<<7)][1];
> +        __declspec(align(32)) VUINT32 poly_coeff[4][8][1];
> +        __declspec(align(32)) VUINT32 Log2e[8][1];
> +        __declspec(align(32)) VUINT32 L2H[8][1];
> +        __declspec(align(32)) VUINT32 L2L[8][1];
> +        __declspec(align(32)) VUINT32 ExpAddConst[8][1];
> +        __declspec(align(32)) VUINT32 IndexMask[8][1];
> +        __declspec(align(32)) VUINT32 ExpMask[8][1];
> +        __declspec(align(32)) VUINT32 MOne[8][1];
> +        __declspec(align(32)) VUINT32 AbsMask[8][1];
> +        __declspec(align(32)) VUINT32 Threshold[8][1];
> +        __declspec(align(32)) VUINT32 L2[8][1];
> +} __svml_sexpm1_data_internal;
> +#endif
> +__svml_sexpm1_data_internal:
> +        /* Expm1_HA_table */
> +        .long 0x00000000, 0x00000000
> +        .long 0x00016000, 0x391a3e78
> +        .long 0x0002d000, 0xb89e59d5
> +        .long 0x00044000, 0xb93ae78a
> +        .long 0x0005b000, 0xb9279306
> +        .long 0x00072000, 0xb79e6961
> +        .long 0x0008a000, 0xb97e2fee
> +        .long 0x000a1000, 0x391aaea9
> +        .long 0x000b9000, 0x39383c7d
> +        .long 0x000d2000, 0xb9241490
> +        .long 0x000ea000, 0x39073169
> +        .long 0x00103000, 0x386e218a
> +        .long 0x0011c000, 0x38f4dceb
> +        .long 0x00136000, 0xb93a9a1e
> +        .long 0x0014f000, 0x391df520
> +        .long 0x00169000, 0x3905a6e4
> +        .long 0x00183000, 0x397e0a32
> +        .long 0x0019e000, 0x370b2641
> +        .long 0x001b9000, 0xb8b1918b
> +        .long 0x001d4000, 0xb8132c6a
> +        .long 0x001ef000, 0x39264c12
> +        .long 0x0020b000, 0x37221f73
> +        .long 0x00227000, 0x37060619
> +        .long 0x00243000, 0x3922b5c1
> +        .long 0x00260000, 0xb814ab27
> +        .long 0x0027d000, 0xb89b12c6
> +        .long 0x0029a000, 0x382d5a75
> +        .long 0x002b8000, 0xb938c94b
> +        .long 0x002d6000, 0xb97822b8
> +        .long 0x002f4000, 0xb910ea53
> +        .long 0x00312000, 0x38fd6075
> +        .long 0x00331000, 0x38620955
> +        .long 0x00350000, 0x391e667f
> +        .long 0x00370000, 0xb89b8736
> +        .long 0x00390000, 0xb90a1714
> +        .long 0x003b0000, 0xb7a54ded
> +        .long 0x003d1000, 0xb96b8c15
> +        .long 0x003f1000, 0x397336cf
> +        .long 0x00413000, 0xb8eccd66
> +        .long 0x00434000, 0x39599b45
> +        .long 0x00456000, 0x3965422b
> +        .long 0x00479000, 0xb8a2cdd5
> +        .long 0x0049c000, 0xb9484f32
> +        .long 0x004bf000, 0xb8fac043
> +        .long 0x004e2000, 0x391182a4
> +        .long 0x00506000, 0x38ccf6bc
> +        .long 0x0052b000, 0xb97c4dc2
> +        .long 0x0054f000, 0x38d6aaf4
> +        .long 0x00574000, 0x391f995b
> +        .long 0x0059a000, 0xb8ba8f62
> +        .long 0x005c0000, 0xb9090d05
> +        .long 0x005e6000, 0x37f4825e
> +        .long 0x0060d000, 0xb8c844f5
> +        .long 0x00634000, 0xb76d1a83
> +        .long 0x0065c000, 0xb95f2310
> +        .long 0x00684000, 0xb952b5f8
> +        .long 0x006ac000, 0x37c6e7dd
> +        .long 0x006d5000, 0xb7cfe126
> +        .long 0x006fe000, 0x3917337c
> +        .long 0x00728000, 0x383b9e2d
> +        .long 0x00752000, 0x392fa2a5
> +        .long 0x0077d000, 0x37df730b
> +        .long 0x007a8000, 0x38ecb6dd
> +        .long 0x007d4000, 0xb879f986
> +        /*== poly_coeff[4] ==*/
> +        .align 32
> +        .long 0x3e2AAABF, 0x3e2AAABF, 0x3e2AAABF, 0x3e2AAABF, 0x3e2AAABF, 0x3e2AAABF, 0x3e2AAABF, 0x3e2AAABF /* coeff3 */
> +        .long 0x3f00000F, 0x3f00000F, 0x3f00000F, 0x3f00000F, 0x3f00000F, 0x3f00000F, 0x3f00000F, 0x3f00000F /* coeff2 */
> +        /* 64 Byte Padding */
> +        .zero 64
> +        /*== Log2e ==*/
> +        .align 32
> +        .long 0x42B8AA3B, 0x42B8AA3B, 0x42B8AA3B, 0x42B8AA3B, 0x42B8AA3B, 0x42B8AA3B, 0x42B8AA3B, 0x42B8AA3B
> +        /*== L2H ==*/
> +        .align 32
> +        .long 0x3c318000, 0x3c318000, 0x3c318000, 0x3c318000, 0x3c318000, 0x3c318000, 0x3c318000, 0x3c318000
> +        /*== L2L ==*/
> +        .align 32
> +        .long 0xb65e8083, 0xb65e8083, 0xb65e8083, 0xb65e8083, 0xb65e8083, 0xb65e8083, 0xb65e8083, 0xb65e8083
> +        /*== ExpAddConst ==*/
> +        .align 32
> +        .long 0x49f0fe00, 0x49f0fe00, 0x49f0fe00, 0x49f0fe00, 0x49f0fe00, 0x49f0fe00, 0x49f0fe00, 0x49f0fe00
> +        /*== IndexMask ==*/
> +        .align 32
> +        .long 0x000001f8, 0x000001f8, 0x000001f8, 0x000001f8, 0x000001f8, 0x000001f8, 0x000001f8, 0x000001f8
> +        /*== ExpMask ==*/
> +        .align 32
> +        .long 0x0001fe00, 0x0001fe00, 0x0001fe00, 0x0001fe00, 0x0001fe00, 0x0001fe00, 0x0001fe00, 0x0001fe00
> +        /*== MOne ==*/
> +        .align 32
> +        .long 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000
> +        /*== AbsMask ==*/
> +        .align 32
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> +        /*== Threshold ==*/
> +        .align 32
> +        .long 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B, 0x42AD496B // 86.643394
> +        /*== L2 ==*/
> +        .align 32
> +        .long 0x3cb17218, 0x3cb17218, 0x3cb17218, 0x3cb17218, 0x3cb17218, 0x3cb17218, 0x3cb17218, 0x3cb17218
> +        .align 32
> +        .type	__svml_sexpm1_data_internal,@object
> +        .size	__svml_sexpm1_data_internal,.-__svml_sexpm1_data_internal
> diff --git a/sysdeps/x86_64/fpu/svml_d_expm12_core.S b/sysdeps/x86_64/fpu/svml_d_expm12_core.S
> new file mode 100644
> index 0000000000..a725d614bd
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_expm12_core.S
> @@ -0,0 +1,29 @@
> +/* Function expm1 vectorized with SSE2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN2v_expm1)
> +WRAPPER_IMPL_SSE2 expm1
> +END (_ZGVbN2v_expm1)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN2v_expm1)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_expm14_core.S b/sysdeps/x86_64/fpu/svml_d_expm14_core.S
> new file mode 100644
> index 0000000000..1027def883
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_expm14_core.S
> @@ -0,0 +1,29 @@
> +/* Function expm1 vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN4v_expm1)
> +WRAPPER_IMPL_AVX _ZGVbN2v_expm1
> +END (_ZGVdN4v_expm1)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN4v_expm1)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_expm14_core_avx.S b/sysdeps/x86_64/fpu/svml_d_expm14_core_avx.S
> new file mode 100644
> index 0000000000..3a34262241
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_expm14_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function expm1 vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVcN4v_expm1)
> +WRAPPER_IMPL_AVX _ZGVbN2v_expm1
> +END (_ZGVcN4v_expm1)
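
These wrapper files do not re-implement the math: WRAPPER_IMPL_AVX (and the AVX-512 counterpart) run the narrower kernel on each half of the wide vector and merge the results. A conceptual C sketch of that pattern using GCC vector extensions, assuming the _ZGVbN2v_expm1 kernel is callable (this models the idea, not the actual macro expansion):

/* Split the 4-element vector, call the 2-element kernel twice, merge.  */
typedef double v2df __attribute__ ((vector_size (16)));
typedef double v4df __attribute__ ((vector_size (32)));

extern v2df _ZGVbN2v_expm1 (v2df);

v4df
wrapper_model_expm1_4 (v4df x)
{
  v2df lo = { x[0], x[1] };
  v2df hi = { x[2], x[3] };
  v2df rl = _ZGVbN2v_expm1 (lo);
  v2df rh = _ZGVbN2v_expm1 (hi);
  return (v4df) { rl[0], rl[1], rh[0], rh[1] };
}
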
> diff --git a/sysdeps/x86_64/fpu/svml_d_expm18_core.S b/sysdeps/x86_64/fpu/svml_d_expm18_core.S
> new file mode 100644
> index 0000000000..fa97595665
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_expm18_core.S
> @@ -0,0 +1,25 @@
> +/* Function expm1 vectorized with AVX-512, wrapper to AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN8v_expm1)
> +WRAPPER_IMPL_AVX512 _ZGVdN4v_expm1
> +END (_ZGVeN8v_expm1)
> diff --git a/sysdeps/x86_64/fpu/svml_s_expm1f16_core.S b/sysdeps/x86_64/fpu/svml_s_expm1f16_core.S
> new file mode 100644
> index 0000000000..b7423632a9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_expm1f16_core.S
> @@ -0,0 +1,25 @@
> +/* Function expm1f vectorized with AVX-512. Wrapper to AVX2 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN16v_expm1f)
> +WRAPPER_IMPL_AVX512 _ZGVdN8v_expm1f
> +END (_ZGVeN16v_expm1f)
> diff --git a/sysdeps/x86_64/fpu/svml_s_expm1f4_core.S b/sysdeps/x86_64/fpu/svml_s_expm1f4_core.S
> new file mode 100644
> index 0000000000..334a49133a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_expm1f4_core.S
> @@ -0,0 +1,29 @@
> +/* Function expm1f vectorized with SSE2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN4v_expm1f)
> +WRAPPER_IMPL_SSE2 expm1f
> +END (_ZGVbN4v_expm1f)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN4v_expm1f)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_expm1f8_core.S b/sysdeps/x86_64/fpu/svml_s_expm1f8_core.S
> new file mode 100644
> index 0000000000..10589574a5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_expm1f8_core.S
> @@ -0,0 +1,29 @@
> +/* Function expm1f vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN8v_expm1f)
> +WRAPPER_IMPL_AVX _ZGVbN4v_expm1f
> +END (_ZGVdN8v_expm1f)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN8v_expm1f)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_expm1f8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_expm1f8_core_avx.S
> new file mode 100644
> index 0000000000..4161113615
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_expm1f8_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function expm1f vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +        .text
> +ENTRY (_ZGVcN8v_expm1f)
> +WRAPPER_IMPL_AVX _ZGVbN4v_expm1f
> +END (_ZGVcN8v_expm1f)
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx.c
> new file mode 100644
> index 0000000000..3e59cb7141
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-expm1.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx2.c
> new file mode 100644
> index 0000000000..3e59cb7141
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx2.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-expm1.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx512f.c
> new file mode 100644
> index 0000000000..3e59cb7141
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-expm1-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-expm1.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-expm1.c b/sysdeps/x86_64/fpu/test-double-libmvec-expm1.c
> new file mode 100644
> index 0000000000..33806a78c8
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-expm1.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE double
> +#define LIBMVEC_FUNC expm1
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> index 68c449e04a..0222f9f5b8 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> @@ -34,6 +34,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVbN2vv_hypot)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVbN2v_exp2)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVbN2v_exp10)
>  VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVbN2v_cosh)
> +VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVbN2v_expm1)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> index df67306373..1aad9faf9c 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> @@ -37,6 +37,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVdN4vv_hypot)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVdN4v_exp2)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVdN4v_exp10)
>  VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVdN4v_cosh)
> +VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVdN4v_expm1)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m256i
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> index 1a6731098f..e404bf899d 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> @@ -34,6 +34,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVcN4vv_hypot)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVcN4v_exp2)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVcN4v_exp10)
>  VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVcN4v_cosh)
> +VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVcN4v_expm1)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> index 4cdfa918e8..2b4de59343 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> @@ -34,6 +34,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVeN8vv_hypot)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVeN8v_exp2)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVeN8v_exp10)
>  VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVeN8v_cosh)
> +VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVeN8v_expm1)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m512i
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx.c
> new file mode 100644
> index 0000000000..67e31f9666
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-expm1f.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx2.c
> new file mode 100644
> index 0000000000..67e31f9666
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx2.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-expm1f.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx512f.c
> new file mode 100644
> index 0000000000..67e31f9666
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-expm1f-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-expm1f.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-expm1f.c b/sysdeps/x86_64/fpu/test-float-libmvec-expm1f.c
> new file mode 100644
> index 0000000000..aa9871a39d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-expm1f.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE float
> +#define LIBMVEC_FUNC expm1f
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> index 47a9862233..9a4a1b84a9 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> @@ -34,6 +34,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVeN16vv_hypotf)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVeN16v_exp2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVeN16v_exp10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVeN16v_coshf)
> +VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVeN16v_expm1f)
>  
>  #define VEC_INT_TYPE __m512i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> index e7c5410e7b..eb4e36d0e2 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> @@ -34,6 +34,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVbN4vv_hypotf)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVbN4v_exp2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVbN4v_exp10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVbN4v_coshf)
> +VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVbN4v_expm1f)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> index b8e9d48cd6..d8adab59e6 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> @@ -37,6 +37,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVdN8vv_hypotf)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVdN8v_exp2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVdN8v_exp10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVdN8v_coshf)
> +VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVdN8v_expm1f)
>  
>  /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
>  #undef VECTOR_WRAPPER_fFF
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> index 328c827b27..e6e1a90c72 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> @@ -34,6 +34,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVcN8vv_hypotf)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVcN8v_exp2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVcN8v_exp10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVcN8v_coshf)
> +VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVcN8v_expm1f)
>  
>  #define VEC_INT_TYPE __m128i
>  
> -- 
> 2.31.1
> 

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.


H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 09/18] x86-64: Add vector cbrt/cbrtf implementation to libmvec
  2021-12-29  6:39 ` [PATCH v5 09/18] x86-64: Add vector cbrt/cbrtf " Sunil K Pandey
@ 2021-12-29 21:25   ` H.J. Lu
  0 siblings, 0 replies; 40+ messages in thread
From: H.J. Lu @ 2021-12-29 21:25 UTC (permalink / raw)
  To: Sunil K Pandey; +Cc: libc-alpha, hjl.tools, andrey.kolesov, marius.cornea

On Tue, Dec 28, 2021 at 10:39:51PM -0800, Sunil K Pandey wrote:
> Implement vectorized cbrt/cbrtf containing SSE, AVX, AVX2 and
> AVX512 versions for libmvec as per vector ABI.  It also contains
> accuracy and ABI tests for vector cbrt/cbrtf with regenerated ulps.
> ---
>  bits/libm-simd-decl-stubs.h                   |  11 +
>  math/bits/mathcalls.h                         |   2 +-
>  .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
>  sysdeps/x86/fpu/bits/math-vector.h            |   4 +
>  .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
>  sysdeps/x86_64/fpu/Makeconfig                 |   1 +
>  sysdeps/x86_64/fpu/Versions                   |   2 +
>  sysdeps/x86_64/fpu/libm-test-ulps             |  20 +
>  .../fpu/multiarch/svml_d_cbrt2_core-sse2.S    |  20 +
>  .../x86_64/fpu/multiarch/svml_d_cbrt2_core.c  |  27 +
>  .../fpu/multiarch/svml_d_cbrt2_core_sse4.S    | 467 ++++++++++++++++
>  .../fpu/multiarch/svml_d_cbrt4_core-sse.S     |  20 +
>  .../x86_64/fpu/multiarch/svml_d_cbrt4_core.c  |  27 +
>  .../fpu/multiarch/svml_d_cbrt4_core_avx2.S    | 505 +++++++++++++++++
>  .../fpu/multiarch/svml_d_cbrt8_core-avx2.S    |  20 +
>  .../x86_64/fpu/multiarch/svml_d_cbrt8_core.c  |  27 +
>  .../fpu/multiarch/svml_d_cbrt8_core_avx512.S  | 253 +++++++++
>  .../fpu/multiarch/svml_s_cbrtf16_core-avx2.S  |  20 +
>  .../fpu/multiarch/svml_s_cbrtf16_core.c       |  28 +
>  .../multiarch/svml_s_cbrtf16_core_avx512.S    | 235 ++++++++
>  .../fpu/multiarch/svml_s_cbrtf4_core-sse2.S   |  20 +
>  .../x86_64/fpu/multiarch/svml_s_cbrtf4_core.c |  28 +
>  .../fpu/multiarch/svml_s_cbrtf4_core_sse4.S   | 490 +++++++++++++++++
>  .../fpu/multiarch/svml_s_cbrtf8_core-sse.S    |  20 +
>  .../x86_64/fpu/multiarch/svml_s_cbrtf8_core.c |  28 +
>  .../fpu/multiarch/svml_s_cbrtf8_core_avx2.S   | 509 ++++++++++++++++++
>  sysdeps/x86_64/fpu/svml_d_cbrt2_core.S        |  29 +
>  sysdeps/x86_64/fpu/svml_d_cbrt4_core.S        |  29 +
>  sysdeps/x86_64/fpu/svml_d_cbrt4_core_avx.S    |  25 +
>  sysdeps/x86_64/fpu/svml_d_cbrt8_core.S        |  25 +
>  sysdeps/x86_64/fpu/svml_s_cbrtf16_core.S      |  25 +
>  sysdeps/x86_64/fpu/svml_s_cbrtf4_core.S       |  29 +
>  sysdeps/x86_64/fpu/svml_s_cbrtf8_core.S       |  29 +
>  sysdeps/x86_64/fpu/svml_s_cbrtf8_core_avx.S   |  25 +
>  .../x86_64/fpu/test-double-libmvec-cbrt-avx.c |   1 +
>  .../fpu/test-double-libmvec-cbrt-avx2.c       |   1 +
>  .../fpu/test-double-libmvec-cbrt-avx512f.c    |   1 +
>  sysdeps/x86_64/fpu/test-double-libmvec-cbrt.c |   3 +
>  .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
>  .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
>  .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
>  .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
>  .../x86_64/fpu/test-float-libmvec-cbrtf-avx.c |   1 +
>  .../fpu/test-float-libmvec-cbrtf-avx2.c       |   1 +
>  .../fpu/test-float-libmvec-cbrtf-avx512f.c    |   1 +
>  sysdeps/x86_64/fpu/test-float-libmvec-cbrtf.c |   3 +
>  .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
>  .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
>  .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
>  .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
>  50 files changed, 3031 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_cbrt2_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_cbrt4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_cbrt4_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_cbrt8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_cbrtf16_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_cbrtf4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_cbrtf8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_cbrtf8_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cbrt.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-cbrtf.c
> 
> diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
> index 6347320521..7f1304ed1d 100644
> --- a/bits/libm-simd-decl-stubs.h
> +++ b/bits/libm-simd-decl-stubs.h
> @@ -197,4 +197,15 @@
>  #define __DECL_SIMD_sinhf32x
>  #define __DECL_SIMD_sinhf64x
>  #define __DECL_SIMD_sinhf128x
> +
> +#define __DECL_SIMD_cbrt
> +#define __DECL_SIMD_cbrtf
> +#define __DECL_SIMD_cbrtl
> +#define __DECL_SIMD_cbrtf16
> +#define __DECL_SIMD_cbrtf32
> +#define __DECL_SIMD_cbrtf64
> +#define __DECL_SIMD_cbrtf128
> +#define __DECL_SIMD_cbrtf32x
> +#define __DECL_SIMD_cbrtf64x
> +#define __DECL_SIMD_cbrtf128x
>  #endif
> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index 673b3a93ba..26d18f0135 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -149,7 +149,7 @@ __MATHCALL_VEC (hypot,, (_Mdouble_ __x, _Mdouble_ __y));
>  
>  #if defined __USE_XOPEN_EXTENDED || defined __USE_ISOC99
>  /* Return the cube root of X.  */
> -__MATHCALL (cbrt,, (_Mdouble_ __x));
> +__MATHCALL_VEC (cbrt,, (_Mdouble_ __x));
>  #endif
>  
>  
> diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> index f9d7b085ab..a6558d9810 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> @@ -49,6 +49,7 @@ GLIBC_2.22 _ZGVeN8vvv_sincos F
>  GLIBC_2.35 _ZGVbN2v_acos F
>  GLIBC_2.35 _ZGVbN2v_asin F
>  GLIBC_2.35 _ZGVbN2v_atan F
> +GLIBC_2.35 _ZGVbN2v_cbrt F
>  GLIBC_2.35 _ZGVbN2v_cosh F
>  GLIBC_2.35 _ZGVbN2v_exp10 F
>  GLIBC_2.35 _ZGVbN2v_exp2 F
> @@ -58,6 +59,7 @@ GLIBC_2.35 _ZGVbN2vv_hypot F
>  GLIBC_2.35 _ZGVbN4v_acosf F
>  GLIBC_2.35 _ZGVbN4v_asinf F
>  GLIBC_2.35 _ZGVbN4v_atanf F
> +GLIBC_2.35 _ZGVbN4v_cbrtf F
>  GLIBC_2.35 _ZGVbN4v_coshf F
>  GLIBC_2.35 _ZGVbN4v_exp10f F
>  GLIBC_2.35 _ZGVbN4v_exp2f F
> @@ -67,6 +69,7 @@ GLIBC_2.35 _ZGVbN4vv_hypotf F
>  GLIBC_2.35 _ZGVcN4v_acos F
>  GLIBC_2.35 _ZGVcN4v_asin F
>  GLIBC_2.35 _ZGVcN4v_atan F
> +GLIBC_2.35 _ZGVcN4v_cbrt F
>  GLIBC_2.35 _ZGVcN4v_cosh F
>  GLIBC_2.35 _ZGVcN4v_exp10 F
>  GLIBC_2.35 _ZGVcN4v_exp2 F
> @@ -76,6 +79,7 @@ GLIBC_2.35 _ZGVcN4vv_hypot F
>  GLIBC_2.35 _ZGVcN8v_acosf F
>  GLIBC_2.35 _ZGVcN8v_asinf F
>  GLIBC_2.35 _ZGVcN8v_atanf F
> +GLIBC_2.35 _ZGVcN8v_cbrtf F
>  GLIBC_2.35 _ZGVcN8v_coshf F
>  GLIBC_2.35 _ZGVcN8v_exp10f F
>  GLIBC_2.35 _ZGVcN8v_exp2f F
> @@ -85,6 +89,7 @@ GLIBC_2.35 _ZGVcN8vv_hypotf F
>  GLIBC_2.35 _ZGVdN4v_acos F
>  GLIBC_2.35 _ZGVdN4v_asin F
>  GLIBC_2.35 _ZGVdN4v_atan F
> +GLIBC_2.35 _ZGVdN4v_cbrt F
>  GLIBC_2.35 _ZGVdN4v_cosh F
>  GLIBC_2.35 _ZGVdN4v_exp10 F
>  GLIBC_2.35 _ZGVdN4v_exp2 F
> @@ -94,6 +99,7 @@ GLIBC_2.35 _ZGVdN4vv_hypot F
>  GLIBC_2.35 _ZGVdN8v_acosf F
>  GLIBC_2.35 _ZGVdN8v_asinf F
>  GLIBC_2.35 _ZGVdN8v_atanf F
> +GLIBC_2.35 _ZGVdN8v_cbrtf F
>  GLIBC_2.35 _ZGVdN8v_coshf F
>  GLIBC_2.35 _ZGVdN8v_exp10f F
>  GLIBC_2.35 _ZGVdN8v_exp2f F
> @@ -103,6 +109,7 @@ GLIBC_2.35 _ZGVdN8vv_hypotf F
>  GLIBC_2.35 _ZGVeN16v_acosf F
>  GLIBC_2.35 _ZGVeN16v_asinf F
>  GLIBC_2.35 _ZGVeN16v_atanf F
> +GLIBC_2.35 _ZGVeN16v_cbrtf F
>  GLIBC_2.35 _ZGVeN16v_coshf F
>  GLIBC_2.35 _ZGVeN16v_exp10f F
>  GLIBC_2.35 _ZGVeN16v_exp2f F
> @@ -112,6 +119,7 @@ GLIBC_2.35 _ZGVeN16vv_hypotf F
>  GLIBC_2.35 _ZGVeN8v_acos F
>  GLIBC_2.35 _ZGVeN8v_asin F
>  GLIBC_2.35 _ZGVeN8v_atan F
> +GLIBC_2.35 _ZGVeN8v_cbrt F
>  GLIBC_2.35 _ZGVeN8v_cosh F
>  GLIBC_2.35 _ZGVeN8v_exp10 F
>  GLIBC_2.35 _ZGVeN8v_exp2 F
> diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
> index 51a41cfebc..dcd45934ab 100644
> --- a/sysdeps/x86/fpu/bits/math-vector.h
> +++ b/sysdeps/x86/fpu/bits/math-vector.h
> @@ -94,6 +94,10 @@
>  #  define __DECL_SIMD_sinh __DECL_SIMD_x86_64
>  #  undef __DECL_SIMD_sinhf
>  #  define __DECL_SIMD_sinhf __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_cbrt
> +#  define __DECL_SIMD_cbrt __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_cbrtf
> +#  define __DECL_SIMD_cbrtf __DECL_SIMD_x86_64
>  
>  # endif
>  #endif
> diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> index 91e9b4fc83..dfb5f13ea3 100644
> --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> @@ -46,6 +46,8 @@
>  !GCC$ builtin (expm1f) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (sinh) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (sinhf) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (cbrt) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (cbrtf) attributes simd (notinbranch) if('x86_64')
>  
>  !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
> @@ -77,3 +79,5 @@
>  !GCC$ builtin (expm1f) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (sinh) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (sinhf) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (cbrt) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (cbrtf) attributes simd (notinbranch) if('x32')
> diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> index 81e9fc95b2..dde737c0d6 100644
> --- a/sysdeps/x86_64/fpu/Makeconfig
> +++ b/sysdeps/x86_64/fpu/Makeconfig
> @@ -25,6 +25,7 @@ libmvec-funcs = \
>    acos \
>    asin \
>    atan \
> +  cbrt \
>    cos \
>    cosh \
>    exp \
> diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> index 2710446d12..b70aeb3e2f 100644
> --- a/sysdeps/x86_64/fpu/Versions
> +++ b/sysdeps/x86_64/fpu/Versions
> @@ -17,6 +17,7 @@ libmvec {
>      _ZGVbN2v_acos; _ZGVcN4v_acos; _ZGVdN4v_acos; _ZGVeN8v_acos;
>      _ZGVbN2v_asin; _ZGVcN4v_asin; _ZGVdN4v_asin; _ZGVeN8v_asin;
>      _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan;
> +    _ZGVbN2v_cbrt; _ZGVcN4v_cbrt; _ZGVdN4v_cbrt; _ZGVeN8v_cbrt;
>      _ZGVbN2v_cosh; _ZGVcN4v_cosh; _ZGVdN4v_cosh; _ZGVeN8v_cosh;
>      _ZGVbN2v_exp10; _ZGVcN4v_exp10; _ZGVdN4v_exp10; _ZGVeN8v_exp10;
>      _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
> @@ -26,6 +27,7 @@ libmvec {
>      _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
>      _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
>      _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf;
> +    _ZGVbN4v_cbrtf; _ZGVcN8v_cbrtf; _ZGVdN8v_cbrtf; _ZGVeN16v_cbrtf;
>      _ZGVbN4v_coshf; _ZGVcN8v_coshf; _ZGVdN8v_coshf; _ZGVeN16v_coshf;
>      _ZGVbN4v_exp10f; _ZGVcN8v_exp10f; _ZGVdN8v_exp10f; _ZGVeN16v_exp10f;
>      _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index f4b313119d..e039a993df 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -583,6 +583,26 @@ float: 1
>  float128: 1
>  ldouble: 1
>  
> +Function: "cbrt_vlen16":
> +float: 1
> +
> +Function: "cbrt_vlen2":
> +double: 1
> +
> +Function: "cbrt_vlen4":
> +double: 1
> +float: 2
> +
> +Function: "cbrt_vlen4_avx2":
> +double: 1
> +
> +Function: "cbrt_vlen8":
> +double: 1
> +float: 2
> +
> +Function: "cbrt_vlen8_avx2":
> +float: 2
> +
>  Function: Real part of "ccos":
>  double: 1
>  float: 1
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core-sse2.S
> new file mode 100644
> index 0000000000..60f4c46a11
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized cbrt, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN2v_cbrt _ZGVbN2v_cbrt_sse2
> +#include "../svml_d_cbrt2_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core.c
> new file mode 100644
> index 0000000000..07390b7150
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized cbrt, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN2v_cbrt
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN2v_cbrt, __GI__ZGVbN2v_cbrt, __redirect__ZGVbN2v_cbrt)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core_sse4.S
> new file mode 100644
> index 0000000000..72ecb25e05
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt2_core_sse4.S
> @@ -0,0 +1,467 @@
> +/* Function cbrt vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   x=2^{3*k+j} * 1.b1 b2 ... b5 b6 ... b52
> + *   Let r=(x*2^{-3k-j} - 1.b1 b2 ... b5 1)* rcp[b1 b2 ..b5],
> + *   where rcp[b1 b2 .. b5]=1/(1.b1 b2 b3 b4 b5 1) in double precision
> + *   cbrt(2^j * 1. b1 b2 .. b5 1) is approximated as T[j][b1..b5]+D[j][b1..b5]
> + *   (T stores the high 53 bits, D stores the low order bits)
> + *   Result=2^k*T+(2^k*T*r)*P+2^k*D
> + *   where P=p1+p2*r+..+p8*r^7
> + *
> + */
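
A rough scalar C rendering of the recipe above may help when reading the
SIMD code that follows.  It is illustrative only: the tables are generated
at run time with libm's cbrt, only the high part T is kept (there is no low
part D), and the correction polynomial is the plain binomial series for
(1+r)^(1/3), so the names and the accuracy are not those of the patch.

#include <math.h>
#include <stdio.h>

static double rcp_tbl[32];      /* 1/(1.b1..b5 1), cf. _dRcp.             */
static double cbrt_tbl[3][32];  /* cbrt(2^j * 1.b1..b5 1), cf. _dCbrtHiLo. */

static void
init_tables (void)
{
  for (int i = 0; i < 32; i++)
    {
      double ref = 1.0 + i / 32.0 + 1.0 / 64.0;   /* 1.b1..b5 1 */
      rcp_tbl[i] = 1.0 / ref;
      for (int j = 0; j < 3; j++)
        cbrt_tbl[j][i] = cbrt (ldexp (ref, j));
    }
}

static double
cbrt_sketch (double x)
{
  if (x == 0.0 || !isfinite (x))
    return x + x;                       /* 0, Inf, NaN: nothing to do.  */

  /* Split |x| = 2^(3*k + j) * m with m in [1, 2).  */
  int e;
  double m = 2.0 * frexp (fabs (x), &e);
  e -= 1;
  int k = e / 3, j = e % 3;
  if (j < 0)
    {
      j += 3;
      k -= 1;
    }

  /* Index by the top 5 mantissa bits; r is the scaled distance from the
     table reference point, so |r| <= ~1/64.  */
  int idx = (int) ((m - 1.0) * 32.0);
  double ref = 1.0 + idx / 32.0 + 1.0 / 64.0;
  double r = (m - ref) * rcp_tbl[idx];

  /* P(r) ~= ((1 + r)^(1/3) - 1) / r, binomial series; the patch's
     _dA1.._dA7 coefficients play the same role.  */
  double p = ((-10.0 / 243.0 * r + 5.0 / 81.0) * r - 1.0 / 9.0) * r
	     + 1.0 / 3.0;

  double t = ldexp (cbrt_tbl[j][idx], k);   /* 2^k * T */
  return copysign (t + t * r * p, x);
}

int
main (void)
{
  init_tables ();
  printf ("%.17g %.17g\n", cbrt_sketch (27.0), cbrt (27.0));
  printf ("%.17g %.17g\n", cbrt_sketch (-1e-3), cbrt (-1e-3));
  return 0;
}
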
> +
> +/* Offsets for data table __svml_dcbrt_data_internal
> + */
> +#define _dRcp                         	0
> +#define _dCbrtHiLo                    	256
> +#define _dA7                          	1024
> +#define _dA6                          	1040
> +#define _dA5                          	1056
> +#define _dA4                          	1072
> +#define _dA3                          	1088
> +#define _dA2                          	1104
> +#define _dA1                          	1120
> +#define _dNeg65Div64                  	1136
> +#define _dSgnf6Mask                   	1152
> +#define _dNegOne                      	1168
> +#define _dMantissaMask                	1184
> +#define _lExpHiMask                   	1200
> +#define _lExpLoMask                   	1216
> +#define _l1556                        	1232
> +#define _iRcpIndexMask                	1248
> +#define _iAbsMask                     	1264
> +#define _iSignMask                    	1280
> +#define _iBias                        	1296
> +#define _iSub                         	1312
> +#define _iCmp                         	1328
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN2v_cbrt_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +
> +/* Calculate CbrtIndex */
> +        movaps    %xmm0, %xmm10
> +        psrlq     $52, %xmm10
> +
> +/* Load 1/(1+iRcpIndex/32+1/64) reciprocal table value */
> +        lea       __svml_dcbrt_data_internal(%rip), %r8
> +        pand      _lExpLoMask+__svml_dcbrt_data_internal(%rip), %xmm10
> +        movdqu    _l1556+__svml_dcbrt_data_internal(%rip), %xmm9
> +        pmuludq   %xmm10, %xmm9
> +
> +/* If the exponent field is zero - go to callout to process denormals */
> +        movq      _iAbsMask+__svml_dcbrt_data_internal(%rip), %xmm7
> +
> +/* Calculate Rcp table index */
> +        movq      _iRcpIndexMask+__svml_dcbrt_data_internal(%rip), %xmm13
> +
> +/* Get iX - high part of argument */
> +        pshufd    $221, %xmm0, %xmm4
> +
> +/*
> + * Declarations
> + * Load constants
> + */
> +        movq      _iSignMask+__svml_dcbrt_data_internal(%rip), %xmm1
> +        pand      %xmm4, %xmm7
> +        pand      %xmm4, %xmm13
> +
> +/* Compute 2^k */
> +        psrld     $20, %xmm4
> +        movq      _iBias+__svml_dcbrt_data_internal(%rip), %xmm2
> +        pand      %xmm1, %xmm4
> +        pshufd    $136, %xmm9, %xmm15
> +        por       %xmm2, %xmm4
> +        psrld     $14, %xmm15
> +        psrld     $12, %xmm13
> +        paddd     %xmm15, %xmm4
> +        pxor      %xmm2, %xmm2
> +        pslld     $20, %xmm4
> +        movdqa    %xmm15, %xmm11
> +        movd      %xmm13, %edx
> +        paddd     %xmm15, %xmm11
> +        pshufd    $1, %xmm13, %xmm8
> +        punpckldq %xmm4, %xmm2
> +
> +/*
> + * VAND( L, l2k, = l2k, lExpHiMask );
> + * Argument reduction Z
> + */
> +        movups    _dMantissaMask+__svml_dcbrt_data_internal(%rip), %xmm1
> +        movups    _dSgnf6Mask+__svml_dcbrt_data_internal(%rip), %xmm4
> +        andps     %xmm0, %xmm1
> +        movd      %xmm8, %ecx
> +        andps     %xmm0, %xmm4
> +        orps      _dNegOne+__svml_dcbrt_data_internal(%rip), %xmm1
> +        orps      _dNeg65Div64+__svml_dcbrt_data_internal(%rip), %xmm4
> +        movslq    %edx, %rdx
> +        subpd     %xmm4, %xmm1
> +        movslq    %ecx, %rcx
> +        movsd     (%r8,%rdx), %xmm3
> +        movq      _iSub+__svml_dcbrt_data_internal(%rip), %xmm5
> +        psubd     %xmm5, %xmm7
> +        movhpd    (%r8,%rcx), %xmm3
> +        mulpd     %xmm1, %xmm3
> +
> +/* Polynomial */
> +        movups    _dA7+__svml_dcbrt_data_internal(%rip), %xmm5
> +        mulpd     %xmm3, %xmm5
> +        addpd     _dA6+__svml_dcbrt_data_internal(%rip), %xmm5
> +        mulpd     %xmm3, %xmm5
> +        addpd     _dA5+__svml_dcbrt_data_internal(%rip), %xmm5
> +        mulpd     %xmm3, %xmm5
> +        addpd     _dA4+__svml_dcbrt_data_internal(%rip), %xmm5
> +        mulpd     %xmm3, %xmm5
> +        addpd     _dA3+__svml_dcbrt_data_internal(%rip), %xmm5
> +        pshufd    $136, %xmm10, %xmm12
> +        psubd     %xmm15, %xmm12
> +        psubd     %xmm11, %xmm12
> +        mulpd     %xmm3, %xmm5
> +        pslld     $8, %xmm12
> +        paddd     %xmm12, %xmm13
> +
> +/* Load cbrt(2^j*(1+iRcpIndex/32+1/64)) Hi & Lo values */
> +        movd      %xmm13, %esi
> +        pshufd    $1, %xmm13, %xmm14
> +        movq      _iCmp+__svml_dcbrt_data_internal(%rip), %xmm6
> +        movd      %xmm14, %edi
> +        pcmpgtd   %xmm6, %xmm7
> +        movmskps  %xmm7, %eax
> +        addpd     _dA2+__svml_dcbrt_data_internal(%rip), %xmm5
> +        movslq    %esi, %rsi
> +        movslq    %edi, %rdi
> +        mulpd     %xmm3, %xmm5
> +        movsd     256(%r8,%rsi), %xmm6
> +        movhpd    256(%r8,%rdi), %xmm6
> +
> +/* THi*2^k, TLo*2^k */
> +        mulpd     %xmm2, %xmm6
> +        addpd     _dA1+__svml_dcbrt_data_internal(%rip), %xmm5
> +
> +/* THi*2^k*Z */
> +        mulpd     %xmm6, %xmm3
> +
> +/* Final reconstruction */
> +        mulpd     %xmm3, %xmm5
> +        addpd     %xmm5, %xmm6
> +        andl      $3, %eax
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 eax xmm0 xmm6
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movaps    %xmm6, %xmm0
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm0, 32(%rsp)
> +        movups    %xmm6, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 eax xmm6
> +
> +        xorl      %edx, %edx
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %edx, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %eax, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $2, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm6
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm6
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      cbrt@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 48(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN2v_cbrt_sse4)
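
The special-inputs path above follows the pattern shared by all of these
kernels: the vector compare yields a per-lane bit mask, the hot path
returns when the mask is zero, and otherwise the inputs and the fast-path
results are spilled to the stack and every flagged lane is recomputed with
the scalar libm routine before jumping back to the common exit.  A rough C
equivalent of that control flow (names are invented for the sketch; the
real code walks the mask with btl and calls cbrt@PLT per lane):

#include <math.h>
#include <stdio.h>

/* Recompute the lanes flagged in MASK with the scalar routine; X and Y
   are the spilled vector input and the fast-path result.  */
static void
fixup_special_lanes (const double *x, double *y, unsigned int mask, int vlen)
{
  for (int i = 0; i < vlen; i++)
    if (mask & (1u << i))   /* the asm tests this bit with btl      */
      y[i] = cbrt (x[i]);   /* call cbrt@PLT for that lane          */
}

int
main (void)
{
  double x[2] = { 8.0, -0.0 };
  double y[2] = { 2.0, 123.0 };          /* pretend lane 1 was flagged */
  fixup_special_lanes (x, y, 0x2, 2);    /* mask = 0b10                */
  printf ("%g %g\n", y[0], y[1]);        /* prints: 2 -0               */
  return 0;
}

Keeping the slow path out of line this way leaves the no-special-case path
with a single jne after the mask test.
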
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_dcbrt_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(16)) VUINT32 _dRcp[32][2];
> +        __declspec(align(16)) VUINT32 _dCbrtHiLo[96][2];
> +        __declspec(align(16)) VUINT32 _dA7[2][2];
> +        __declspec(align(16)) VUINT32 _dA6[2][2];
> +        __declspec(align(16)) VUINT32 _dA5[2][2];
> +        __declspec(align(16)) VUINT32 _dA4[2][2];
> +        __declspec(align(16)) VUINT32 _dA3[2][2];
> +        __declspec(align(16)) VUINT32 _dA2[2][2];
> +        __declspec(align(16)) VUINT32 _dA1[2][2];
> +        __declspec(align(16)) VUINT32 _dNeg65Div64[2][2];
> +        __declspec(align(16)) VUINT32 _dSgnf6Mask[2][2];
> +        __declspec(align(16)) VUINT32 _dNegOne[2][2];
> +        __declspec(align(16)) VUINT32 _dMantissaMask[2][2];
> +        __declspec(align(16)) VUINT32 _lExpHiMask[2][2];
> +        __declspec(align(16)) VUINT32 _lExpLoMask[2][2];
> +        __declspec(align(16)) VUINT32 _l1556[2][2];
> +        __declspec(align(16)) VUINT32 _iRcpIndexMask[4][1];
> +        __declspec(align(16)) VUINT32 _iAbsMask[4][1];
> +        __declspec(align(16)) VUINT32 _iSignMask[4][1];
> +        __declspec(align(16)) VUINT32 _iBias[4][1];
> +        __declspec(align(16)) VUINT32 _iSub[4][1];
> +        __declspec(align(16)) VUINT32 _iCmp[4][1];
> +} __svml_dcbrt_data_internal;
> +#endif
> +__svml_dcbrt_data_internal:
> +        /*== _dRcp ==*/
> +        .quad 0xBFEF81F81F81F820  /* (1/(1+0/32+1/64)) = -.984615 */
> +        .quad 0xBFEE9131ABF0B767  /* (1/(1+1/32+1/64)) = -.955224 */
> +        .quad 0xBFEDAE6076B981DB  /* (1/(1+2/32+1/64)) = -.927536 */
> +        .quad 0xBFECD85689039B0B  /* (1/(1+3/32+1/64)) = -.901408 */
> +        .quad 0xBFEC0E070381C0E0  /* (1/(1+4/32+1/64)) = -.876712 */
> +        .quad 0xBFEB4E81B4E81B4F  /* (1/(1+5/32+1/64)) = -.853333 */
> +        .quad 0xBFEA98EF606A63BE  /* (1/(1+6/32+1/64)) = -.831169 */
> +        .quad 0xBFE9EC8E951033D9  /* (1/(1+7/32+1/64)) = -.810127 */
> +        .quad 0xBFE948B0FCD6E9E0  /* (1/(1+8/32+1/64)) = -.790123 */
> +        .quad 0xBFE8ACB90F6BF3AA  /* (1/(1+9/32+1/64)) = -.771084 */
> +        .quad 0xBFE8181818181818  /* (1/(1+10/32+1/64)) = -.752941 */
> +        .quad 0xBFE78A4C8178A4C8  /* (1/(1+11/32+1/64)) = -.735632 */
> +        .quad 0xBFE702E05C0B8170  /* (1/(1+12/32+1/64)) = -.719101 */
> +        .quad 0xBFE6816816816817  /* (1/(1+13/32+1/64)) = -.703297 */
> +        .quad 0xBFE6058160581606  /* (1/(1+14/32+1/64)) = -.688172 */
> +        .quad 0xBFE58ED2308158ED  /* (1/(1+15/32+1/64)) = -.673684 */
> +        .quad 0xBFE51D07EAE2F815  /* (1/(1+16/32+1/64)) = -.659794 */
> +        .quad 0xBFE4AFD6A052BF5B  /* (1/(1+17/32+1/64)) = -.646465 */
> +        .quad 0xBFE446F86562D9FB  /* (1/(1+18/32+1/64)) = -.633663 */
> +        .quad 0xBFE3E22CBCE4A902  /* (1/(1+19/32+1/64)) = -.621359 */
> +        .quad 0xBFE3813813813814  /* (1/(1+20/32+1/64)) = -.609524 */
> +        .quad 0xBFE323E34A2B10BF  /* (1/(1+21/32+1/64)) = -.598131 */
> +        .quad 0xBFE2C9FB4D812CA0  /* (1/(1+22/32+1/64)) = -.587156 */
> +        .quad 0xBFE27350B8812735  /* (1/(1+23/32+1/64)) = -.576577 */
> +        .quad 0xBFE21FB78121FB78  /* (1/(1+24/32+1/64)) = -.566372 */
> +        .quad 0xBFE1CF06ADA2811D  /* (1/(1+25/32+1/64)) = -.556522 */
> +        .quad 0xBFE1811811811812  /* (1/(1+26/32+1/64)) = -.547009 */
> +        .quad 0xBFE135C81135C811  /* (1/(1+27/32+1/64)) = -.537815 */
> +        .quad 0xBFE0ECF56BE69C90  /* (1/(1+28/32+1/64)) = -.528926 */
> +        .quad 0xBFE0A6810A6810A7  /* (1/(1+29/32+1/64)) = -.520325 */
> +        .quad 0xBFE0624DD2F1A9FC  /* (1/(1+30/32+1/64)) = -.512    */
> +        .quad 0xBFE0204081020408  /* (1/(1+31/32+1/64)) = -.503937 */
> +        /*== _dCbrtHiLo ==*/
> +        .align 16
> +        .quad 0x3FF01539221D4C97    /* HI((2^0*(1+0/32+1/64))^(1/3)) = 1.005181 */
> +        .quad 0x3FF03F06771A2E33    /* HI((2^0*(1+1/32+1/64))^(1/3)) = 1.015387 */
> +        .quad 0x3FF06800E629D671    /* HI((2^0*(1+2/32+1/64))^(1/3)) = 1.025391 */
> +        .quad 0x3FF090328731DEB2    /* HI((2^0*(1+3/32+1/64))^(1/3)) = 1.035204 */
> +        .quad 0x3FF0B7A4B1BD64AC    /* HI((2^0*(1+4/32+1/64))^(1/3)) = 1.044835 */
> +        .quad 0x3FF0DE601024FB87    /* HI((2^0*(1+5/32+1/64))^(1/3)) = 1.054291 */
> +        .quad 0x3FF1046CB0597000    /* HI((2^0*(1+6/32+1/64))^(1/3)) = 1.06358  */
> +        .quad 0x3FF129D212A9BA9B    /* HI((2^0*(1+7/32+1/64))^(1/3)) = 1.07271  */
> +        .quad 0x3FF14E9736CDAF38    /* HI((2^0*(1+8/32+1/64))^(1/3)) = 1.081687 */
> +        .quad 0x3FF172C2A772F507    /* HI((2^0*(1+9/32+1/64))^(1/3)) = 1.090518 */
> +        .quad 0x3FF1965A848001D3    /* HI((2^0*(1+10/32+1/64))^(1/3)) = 1.099207 */
> +        .quad 0x3FF1B9648C38C55D    /* HI((2^0*(1+11/32+1/64))^(1/3)) = 1.107762 */
> +        .quad 0x3FF1DBE6236A0C45    /* HI((2^0*(1+12/32+1/64))^(1/3)) = 1.116186 */
> +        .quad 0x3FF1FDE45CBB1F9F    /* HI((2^0*(1+13/32+1/64))^(1/3)) = 1.124485 */
> +        .quad 0x3FF21F63FF409042    /* HI((2^0*(1+14/32+1/64))^(1/3)) = 1.132664 */
> +        .quad 0x3FF240698C6746E5    /* HI((2^0*(1+15/32+1/64))^(1/3)) = 1.140726 */
> +        .quad 0x3FF260F9454BB99B    /* HI((2^0*(1+16/32+1/64))^(1/3)) = 1.148675 */
> +        .quad 0x3FF281172F8E7073    /* HI((2^0*(1+17/32+1/64))^(1/3)) = 1.156516 */
> +        .quad 0x3FF2A0C719B4B6D0    /* HI((2^0*(1+18/32+1/64))^(1/3)) = 1.164252 */
> +        .quad 0x3FF2C00C9F2263EC    /* HI((2^0*(1+19/32+1/64))^(1/3)) = 1.171887 */
> +        .quad 0x3FF2DEEB2BB7FB78    /* HI((2^0*(1+20/32+1/64))^(1/3)) = 1.179423 */
> +        .quad 0x3FF2FD65FF1EFBBC    /* HI((2^0*(1+21/32+1/64))^(1/3)) = 1.186865 */
> +        .quad 0x3FF31B802FCCF6A2    /* HI((2^0*(1+22/32+1/64))^(1/3)) = 1.194214 */
> +        .quad 0x3FF3393CADC50708    /* HI((2^0*(1+23/32+1/64))^(1/3)) = 1.201474 */
> +        .quad 0x3FF3569E451E4C2A    /* HI((2^0*(1+24/32+1/64))^(1/3)) = 1.208647 */
> +        .quad 0x3FF373A7A0554CDE    /* HI((2^0*(1+25/32+1/64))^(1/3)) = 1.215736 */
> +        .quad 0x3FF3905B4A6D76CE    /* HI((2^0*(1+26/32+1/64))^(1/3)) = 1.222743 */
> +        .quad 0x3FF3ACBBB0E756B6    /* HI((2^0*(1+27/32+1/64))^(1/3)) = 1.229671 */
> +        .quad 0x3FF3C8CB258FA340    /* HI((2^0*(1+28/32+1/64))^(1/3)) = 1.236522 */
> +        .quad 0x3FF3E48BE02AC0CE    /* HI((2^0*(1+29/32+1/64))^(1/3)) = 1.243297 */
> +        .quad 0x3FF4000000000000    /* HI((2^0*(1+30/32+1/64))^(1/3)) = 1.25     */
> +        .quad 0x3FF41B298D47800E    /* HI((2^0*(1+31/32+1/64))^(1/3)) = 1.256631 */
> +        .quad 0x3FF443604B34D9B2    /* HI((2^1*(1+0/32+1/64))^(1/3)) = 1.266449 */
> +        .quad 0x3FF4780B20906571    /* HI((2^1*(1+1/32+1/64))^(1/3)) = 1.279307 */
> +        .quad 0x3FF4ABAC3EE06706    /* HI((2^1*(1+2/32+1/64))^(1/3)) = 1.291912 */
> +        .quad 0x3FF4DE505DA66B8D    /* HI((2^1*(1+3/32+1/64))^(1/3)) = 1.304276 */
> +        .quad 0x3FF51003420A5C07    /* HI((2^1*(1+4/32+1/64))^(1/3)) = 1.316409 */
> +        .quad 0x3FF540CFD6FD11C1    /* HI((2^1*(1+5/32+1/64))^(1/3)) = 1.328323 */
> +        .quad 0x3FF570C04260716B    /* HI((2^1*(1+6/32+1/64))^(1/3)) = 1.340027 */
> +        .quad 0x3FF59FDDF7A45F38    /* HI((2^1*(1+7/32+1/64))^(1/3)) = 1.35153  */
> +        .quad 0x3FF5CE31C83539DF    /* HI((2^1*(1+8/32+1/64))^(1/3)) = 1.36284  */
> +        .quad 0x3FF5FBC3F20966A4    /* HI((2^1*(1+9/32+1/64))^(1/3)) = 1.373966 */
> +        .quad 0x3FF6289C2C8F1B70    /* HI((2^1*(1+10/32+1/64))^(1/3)) = 1.384915 */
> +        .quad 0x3FF654C1B4316DCF    /* HI((2^1*(1+11/32+1/64))^(1/3)) = 1.395693 */
> +        .quad 0x3FF6803B54A34E44    /* HI((2^1*(1+12/32+1/64))^(1/3)) = 1.406307 */
> +        .quad 0x3FF6AB0F72182659    /* HI((2^1*(1+13/32+1/64))^(1/3)) = 1.416763 */
> +        .quad 0x3FF6D544118C08BC    /* HI((2^1*(1+14/32+1/64))^(1/3)) = 1.427067 */
> +        .quad 0x3FF6FEDEE0388D4A    /* HI((2^1*(1+15/32+1/64))^(1/3)) = 1.437224 */
> +        .quad 0x3FF727E53A4F645E    /* HI((2^1*(1+16/32+1/64))^(1/3)) = 1.44724  */
> +        .quad 0x3FF7505C31104114    /* HI((2^1*(1+17/32+1/64))^(1/3)) = 1.457119 */
> +        .quad 0x3FF77848904CD549    /* HI((2^1*(1+18/32+1/64))^(1/3)) = 1.466866 */
> +        .quad 0x3FF79FAEE36B2534    /* HI((2^1*(1+19/32+1/64))^(1/3)) = 1.476485 */
> +        .quad 0x3FF7C69379F4605B    /* HI((2^1*(1+20/32+1/64))^(1/3)) = 1.48598  */
> +        .quad 0x3FF7ECFA6BBCA391    /* HI((2^1*(1+21/32+1/64))^(1/3)) = 1.495356 */
> +        .quad 0x3FF812E79CAE7EB9    /* HI((2^1*(1+22/32+1/64))^(1/3)) = 1.504615 */
> +        .quad 0x3FF8385EC043C71D    /* HI((2^1*(1+23/32+1/64))^(1/3)) = 1.513762 */
> +        .quad 0x3FF85D635CB41B9D    /* HI((2^1*(1+24/32+1/64))^(1/3)) = 1.5228   */
> +        .quad 0x3FF881F8CDE083DB    /* HI((2^1*(1+25/32+1/64))^(1/3)) = 1.531731 */
> +        .quad 0x3FF8A6224802B8A8    /* HI((2^1*(1+26/32+1/64))^(1/3)) = 1.54056  */
> +        .quad 0x3FF8C9E2DA25E5E4    /* HI((2^1*(1+27/32+1/64))^(1/3)) = 1.549289 */
> +        .quad 0x3FF8ED3D706E1010    /* HI((2^1*(1+28/32+1/64))^(1/3)) = 1.55792  */
> +        .quad 0x3FF91034D632B6DF    /* HI((2^1*(1+29/32+1/64))^(1/3)) = 1.566457 */
> +        .quad 0x3FF932CBB7F0CF2D    /* HI((2^1*(1+30/32+1/64))^(1/3)) = 1.574901 */
> +        .quad 0x3FF95504A517BF3A    /* HI((2^1*(1+31/32+1/64))^(1/3)) = 1.583256 */
> +        .quad 0x3FF987AF34F8BB19    /* HI((2^2*(1+0/32+1/64))^(1/3)) = 1.595626 */
> +        .quad 0x3FF9CA0A8337B317    /* HI((2^2*(1+1/32+1/64))^(1/3)) = 1.611826 */
> +        .quad 0x3FFA0B1709CC13D5    /* HI((2^2*(1+2/32+1/64))^(1/3)) = 1.627708 */
> +        .quad 0x3FFA4AE4CE6419ED    /* HI((2^2*(1+3/32+1/64))^(1/3)) = 1.643285 */
> +        .quad 0x3FFA8982A5567031    /* HI((2^2*(1+4/32+1/64))^(1/3)) = 1.658572 */
> +        .quad 0x3FFAC6FE500AB570    /* HI((2^2*(1+5/32+1/64))^(1/3)) = 1.673582 */
> +        .quad 0x3FFB036497A15A17    /* HI((2^2*(1+6/32+1/64))^(1/3)) = 1.688328 */
> +        .quad 0x3FFB3EC164671755    /* HI((2^2*(1+7/32+1/64))^(1/3)) = 1.702821 */
> +        .quad 0x3FFB791FD288C46F    /* HI((2^2*(1+8/32+1/64))^(1/3)) = 1.717071 */
> +        .quad 0x3FFBB28A44693BE4    /* HI((2^2*(1+9/32+1/64))^(1/3)) = 1.731089 */
> +        .quad 0x3FFBEB0A72EB6E31    /* HI((2^2*(1+10/32+1/64))^(1/3)) = 1.744883 */
> +        .quad 0x3FFC22A97BF5F697    /* HI((2^2*(1+11/32+1/64))^(1/3)) = 1.758462 */
> +        .quad 0x3FFC596FEF6AF983    /* HI((2^2*(1+12/32+1/64))^(1/3)) = 1.771835 */
> +        .quad 0x3FFC8F65DAC655A3    /* HI((2^2*(1+13/32+1/64))^(1/3)) = 1.785009 */
> +        .quad 0x3FFCC492D38CE8D9    /* HI((2^2*(1+14/32+1/64))^(1/3)) = 1.797992 */
> +        .quad 0x3FFCF8FE00B19367    /* HI((2^2*(1+15/32+1/64))^(1/3)) = 1.810789 */
> +        .quad 0x3FFD2CAE230F8709    /* HI((2^2*(1+16/32+1/64))^(1/3)) = 1.823408 */
> +        .quad 0x3FFD5FA99D15208F    /* HI((2^2*(1+17/32+1/64))^(1/3)) = 1.835855 */
> +        .quad 0x3FFD91F679B6E505    /* HI((2^2*(1+18/32+1/64))^(1/3)) = 1.848135 */
> +        .quad 0x3FFDC39A72BF2302    /* HI((2^2*(1+19/32+1/64))^(1/3)) = 1.860255 */
> +        .quad 0x3FFDF49AF68C1570    /* HI((2^2*(1+20/32+1/64))^(1/3)) = 1.872218 */
> +        .quad 0x3FFE24FD2D4C23B8    /* HI((2^2*(1+21/32+1/64))^(1/3)) = 1.884031 */
> +        .quad 0x3FFE54C5FDC5EC73    /* HI((2^2*(1+22/32+1/64))^(1/3)) = 1.895697 */
> +        .quad 0x3FFE83FA11B81DBB    /* HI((2^2*(1+23/32+1/64))^(1/3)) = 1.907221 */
> +        .quad 0x3FFEB29DD9DBAF25    /* HI((2^2*(1+24/32+1/64))^(1/3)) = 1.918608 */
> +        .quad 0x3FFEE0B59191D374    /* HI((2^2*(1+25/32+1/64))^(1/3)) = 1.929861 */
> +        .quad 0x3FFF0E454245E4BF    /* HI((2^2*(1+26/32+1/64))^(1/3)) = 1.940984 */
> +        .quad 0x3FFF3B50C68A9DD3    /* HI((2^2*(1+27/32+1/64))^(1/3)) = 1.951981 */
> +        .quad 0x3FFF67DBCCF922DC    /* HI((2^2*(1+28/32+1/64))^(1/3)) = 1.962856 */
> +        .quad 0x3FFF93E9DAD7A4A6    /* HI((2^2*(1+29/32+1/64))^(1/3)) = 1.973612 */
> +        .quad 0x3FFFBF7E4E8CC9CB    /* HI((2^2*(1+30/32+1/64))^(1/3)) = 1.984251 */
> +        .quad 0x3FFFEA9C61E47CD3    /* HI((2^2*(1+31/32+1/64))^(1/3)) = 1.994778 */
> +        .align 16
> +        .quad 0x3F93750AD588F115, 0x3F93750AD588F115      /* _dA7 */
> +        .align 16
> +        .quad 0xBF98090D6221A247, 0xBF98090D6221A247      /* _dA6 */
> +        .align 16
> +        .quad 0x3F9EE7113506AC12, 0x3F9EE7113506AC12      /* _dA5 */
> +        .align 16
> +        .quad 0xBFA511E8D2B3183B, 0xBFA511E8D2B3183B      /* _dA4 */
> +        .align 16
> +        .quad 0x3FAF9ADD3C0CA458, 0x3FAF9ADD3C0CA458      /* _dA3 */
> +        .align 16
> +        .quad 0xBFBC71C71C71C71C, 0xBFBC71C71C71C71C      /* _dA2 */
> +        .align 16
> +        .quad 0x3FD5555555555555, 0x3FD5555555555555      /* _dA1 */
> +        .align 16
> +        .quad 0xBFF0400000000000, 0xBFF0400000000000        /* _dNeg65Div64 */
> +        .align 16
> +        .quad 0x000FC00000000000, 0x000FC00000000000        /* _dSgnf6Mask */
> +        .align 16
> +        .quad 0xBFF0000000000000, 0xBFF0000000000000        /* _dNegOne */
> +        .align 16
> +        .quad 0x000FFFFFFFFFFFFF, 0x000FFFFFFFFFFFFF        /* _dMantissaMask */
> +        .align 16
> +        .quad 0xFFF0000000000000, 0xFFF0000000000000        /* _lExpHiMask */
> +        .align 16
> +        .quad 0x00000000000007FF, 0x00000000000007FF        /* _lExpLoMask */
> +        .align 16
> +        .quad 0x0000000000001556, 0x0000000000001556        /* _l1556 */
> +        .align 16
> +        .long 0x000F8000, 0x000F8000, 0x000F8000, 0x000F8000    /* _iRcpIndexMask */
> +        .align 16
> +        .long 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF    /* _iAbsMask */
> +        .align 16
> +        .long 0x00000800, 0x00000800, 0x00000800, 0x00000800    /* _iSignMask */
> +        .align 16
> +        .long 0x000002AA, 0x000002AA, 0x000002AA, 0x000002AA    /* _iBias */
> +        .align 16
> +        .long 0x80100000, 0x80100000, 0x80100000, 0x80100000    /* _iSub */
> +        .align 16
> +        .long 0xffdfffff, 0xffdfffff, 0xffdfffff, 0xffdfffff    /* _iCmp */
> +        .align 16
> +        .type	__svml_dcbrt_data_internal,@object
> +        .size	__svml_dcbrt_data_internal,.-__svml_dcbrt_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core-sse.S
> new file mode 100644
> index 0000000000..3b54f31fbc
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized cbrt, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN4v_cbrt _ZGVdN4v_cbrt_sse_wrapper
> +#include "../svml_d_cbrt4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core.c
> new file mode 100644
> index 0000000000..0b135877aa
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized cbrt, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN4v_cbrt
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN4v_cbrt, __GI__ZGVdN4v_cbrt, __redirect__ZGVdN4v_cbrt)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core_avx2.S
> new file mode 100644
> index 0000000000..2223c5309f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt4_core_avx2.S
> @@ -0,0 +1,505 @@
> +/* Function cbrt vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   x=2^{3*k+j} * 1.b1 b2 ... b5 b6 ... b52
> + *   Let r=(x*2^{-3k-j} - 1.b1 b2 ... b5 1)* rcp[b1 b2 ..b5],
> + *   where rcp[b1 b2 .. b5]=1/(1.b1 b2 b3 b4 b5 1) in double precision
> + *   cbrt(2^j * 1. b1 b2 .. b5 1) is approximated as T[j][b1..b5]+D[j][b1..b5]
> + *   (T stores the high 53 bits, D stores the low order bits)
> + *   Result=2^k*T+(2^k*T*r)*P+2^k*D
> + *   where P=p1+p2*r+..+p8*r^7
> + *
> + */
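
A scalar sketch of the decomposition described above may help when reading the
vector code below.  The cbrt() call stands in for the reciprocal-table
reduction plus the polynomial P; the helper is purely illustrative and not
part of the patch:

    #include <math.h>

    /* Illustrative only: split |x| = m * 2^e, write e = 3k + j with
       j in {0,1,2}, handle 2^(j/3) by table and 2^k by scaling.  The
       real kernel replaces the cbrt (m) below with the rcp-table
       reduction and the polynomial in the reduced argument r.  */
    static double
    cbrt_structure_sketch (double x)
    {
      static const double two_pow_j_thirds[3] =
        { 1.0, 1.2599210498948732, 1.5874010519681994 };   /* 2^(j/3) */
      int e;
      double m = frexp (fabs (x), &e);   /* |x| = m * 2^e, m in [0.5, 1) */
      int k = (e >= 0 ? e : e - 2) / 3;  /* floor (e / 3) */
      int j = e - 3 * k;                 /* 0, 1 or 2 */
      return copysign (ldexp (two_pow_j_thirds[j] * cbrt (m), k), x);
    }
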
> +
> +/* Offsets for data table __svml_dcbrt_data_internal
> + */
> +#define _dRcp                         	0
> +#define _dCbrtHiLo                    	256
> +#define _dA7                          	1024
> +#define _dA6                          	1056
> +#define _dA5                          	1088
> +#define _dA4                          	1120
> +#define _dA3                          	1152
> +#define _dA2                          	1184
> +#define _dA1                          	1216
> +#define _dNeg65Div64                  	1248
> +#define _dSgnf6Mask                   	1280
> +#define _dNegOne                      	1312
> +#define _dMantissaMask                	1344
> +#define _lExpHiMask                   	1376
> +#define _lExpLoMask                   	1408
> +#define _l1556                        	1440
> +#define _iRcpIndexMask                	1472
> +#define _iAbsMask                     	1504
> +#define _iSignMask                    	1536
> +#define _iBias                        	1568
> +#define _iSub                         	1600
> +#define _iCmp                         	1632
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN4v_cbrt_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +
> +/* Load 1/(1+iRcpIndex/32+1/64) reciprocal table value */
> +        lea       __svml_dcbrt_data_internal(%rip), %rax
> +        vmovapd   %ymm0, %ymm5
> +
> +/*
> + * Declarations
> + * Load constants
> + * Get iX - high part of argument
> + */
> +        vextractf128 $1, %ymm5, %xmm6
> +
> +/* Calculate CbrtIndex */
> +        vpsrlq    $52, %ymm5, %ymm15
> +        vshufps   $221, %xmm6, %xmm5, %xmm4
> +
> +/* Calculate Rcp table index */
> +        vandps    _iRcpIndexMask+__svml_dcbrt_data_internal(%rip), %xmm4, %xmm10
> +        vpsrld    $12, %xmm10, %xmm3
> +        vmovd     %xmm3, %ecx
> +
> +/* If the exponent field is zero or all ones - go to callout (denormals, 0, Inf, NaN) */
> +        vandps    _iAbsMask+__svml_dcbrt_data_internal(%rip), %xmm4, %xmm7
> +
> +/* Compute 2^k */
> +        vpsrld    $20, %xmm4, %xmm4
> +        vpsubd    _iSub+__svml_dcbrt_data_internal(%rip), %xmm7, %xmm8
> +        vandps    _lExpLoMask+__svml_dcbrt_data_internal(%rip), %ymm15, %ymm0
> +        vpmuludq  _l1556+__svml_dcbrt_data_internal(%rip), %ymm0, %ymm6
> +        vpextrd   $2, %xmm3, %edi
> +        movslq    %ecx, %rcx
> +        vpextrd   $1, %xmm3, %esi
> +        movslq    %edi, %rdi
> +        vpextrd   $3, %xmm3, %r8d
> +        movslq    %esi, %rsi
> +        movslq    %r8d, %r8
> +        vpcmpgtd  _iCmp+__svml_dcbrt_data_internal(%rip), %xmm8, %xmm9
> +        vmovsd    (%rax,%rcx), %xmm11
> +        vmovmskps %xmm9, %edx
> +        vmovsd    (%rax,%rdi), %xmm13
> +        vmovhpd   (%rax,%rsi), %xmm11, %xmm12
> +        vmovhpd   (%rax,%r8), %xmm13, %xmm14
> +        vextractf128 $1, %ymm6, %xmm7
> +        vshufps   $136, %xmm7, %xmm6, %xmm8
> +        vmovups   __VUNPACK_ODD_ind1.613.0.1(%rip), %ymm7
> +        vextractf128 $1, %ymm0, %xmm1
> +        vshufps   $136, %xmm1, %xmm0, %xmm9
> +        vpsrld    $14, %xmm8, %xmm1
> +        vpsubd    %xmm1, %xmm9, %xmm10
> +        vpaddd    %xmm1, %xmm1, %xmm11
> +
> +/*
> + * VAND( L, l2k, = l2k, lExpHiMask );
> + * Argument reduction Z
> + */
> +        vandpd    _dMantissaMask+__svml_dcbrt_data_internal(%rip), %ymm5, %ymm9
> +        vinsertf128 $1, %xmm14, %ymm12, %ymm2
> +        vpsubd    %xmm11, %xmm10, %xmm12
> +        vpslld    $8, %xmm12, %xmm13
> +        vpaddd    %xmm13, %xmm3, %xmm15
> +
> +/* Load cbrt(2^j*(1+iRcpIndex/32+1/64)) Hi & Lo values */
> +        vmovd     %xmm15, %r9d
> +        vpextrd   $2, %xmm15, %r11d
> +        movslq    %r9d, %r9
> +        vpextrd   $1, %xmm15, %r10d
> +        movslq    %r11d, %r11
> +        vpextrd   $3, %xmm15, %ecx
> +        movslq    %r10d, %r10
> +        movslq    %ecx, %rcx
> +        vmovsd    256(%rax,%r9), %xmm3
> +        vmovsd    256(%rax,%r11), %xmm0
> +        vandpd    _dSgnf6Mask+__svml_dcbrt_data_internal(%rip), %ymm5, %ymm10
> +        vmovhpd   256(%rax,%r10), %xmm3, %xmm14
> +        vmovhpd   256(%rax,%rcx), %xmm0, %xmm3
> +        vorpd     _dNegOne+__svml_dcbrt_data_internal(%rip), %ymm9, %ymm11
> +        vorpd     _dNeg65Div64+__svml_dcbrt_data_internal(%rip), %ymm10, %ymm12
> +        vsubpd    %ymm12, %ymm11, %ymm13
> +        vmulpd    %ymm13, %ymm2, %ymm2
> +        vinsertf128 $1, %xmm3, %ymm14, %ymm0
> +        vpand     _iSignMask+__svml_dcbrt_data_internal(%rip), %xmm4, %xmm3
> +        vpor      _iBias+__svml_dcbrt_data_internal(%rip), %xmm3, %xmm4
> +        vpaddd    %xmm1, %xmm4, %xmm1
> +        vpslld    $20, %xmm1, %xmm6
> +
> +/* Polynomial */
> +        vmovupd   _dA7+__svml_dcbrt_data_internal(%rip), %ymm1
> +        vfmadd213pd _dA6+__svml_dcbrt_data_internal(%rip), %ymm2, %ymm1
> +        vfmadd213pd _dA5+__svml_dcbrt_data_internal(%rip), %ymm2, %ymm1
> +        vfmadd213pd _dA4+__svml_dcbrt_data_internal(%rip), %ymm2, %ymm1
> +        vfmadd213pd _dA3+__svml_dcbrt_data_internal(%rip), %ymm2, %ymm1
> +        vfmadd213pd _dA2+__svml_dcbrt_data_internal(%rip), %ymm2, %ymm1
> +        vfmadd213pd _dA1+__svml_dcbrt_data_internal(%rip), %ymm2, %ymm1
> +        vpermps   %ymm6, %ymm7, %ymm8
> +        vandps    __VUNPACK_ODD_mask.613.0.1(%rip), %ymm8, %ymm14
> +
> +/* THi*2^k, TLo*2^k */
> +        vmulpd    %ymm14, %ymm0, %ymm0
> +
> +/* THi*2^k*Z */
> +        vmulpd    %ymm0, %ymm2, %ymm2
> +
> +/* Final reconstruction */
> +        vmulpd    %ymm2, %ymm1, %ymm3
> +        vaddpd    %ymm3, %ymm0, %ymm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm5
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovupd   %ymm5, 32(%rsp)
> +        vmovupd   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovupd   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      cbrt@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 64(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN4v_cbrt_avx2)
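
The special-input path above just spills the input and the fast-path result to
the stack and redoes the flagged lanes with the scalar routine; roughly, in C
(the mask is the vmovmskps result, one bit per double lane):

    #include <math.h>

    /* Recompute with scalar cbrt every lane whose bit is set in mask.  */
    static void
    cbrt4_special_lanes (const double in[4], double out[4], unsigned int mask)
    {
      for (int i = 0; i < 4; i++)
        if (mask & (1u << i))
          out[i] = cbrt (in[i]);
    }
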
> +        .section .rodata, "a"
> +        .align 32
> +
> +__VUNPACK_ODD_ind1.613.0.1:
> +	.rept	3
> +        .long	0
> +	.endr
> +        .long	1
> +        .long	0
> +        .long	2
> +        .long	0
> +        .long	3
> +        .align 32
> +
> +__VUNPACK_ODD_mask.613.0.1:
> +        .long	0
> +        .long	-1
> +        .long	0
> +        .long	-1
> +        .long	0
> +        .long	-1
> +        .long	0
> +        .long	-1
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_dcbrt_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(32)) VUINT32 _dRcp[32][2];
> +        __declspec(align(32)) VUINT32 _dCbrtHiLo[96][2];
> +        __declspec(align(32)) VUINT32 _dA7[4][2];
> +        __declspec(align(32)) VUINT32 _dA6[4][2];
> +        __declspec(align(32)) VUINT32 _dA5[4][2];
> +        __declspec(align(32)) VUINT32 _dA4[4][2];
> +        __declspec(align(32)) VUINT32 _dA3[4][2];
> +        __declspec(align(32)) VUINT32 _dA2[4][2];
> +        __declspec(align(32)) VUINT32 _dA1[4][2];
> +        __declspec(align(32)) VUINT32 _dNeg65Div64[4][2];
> +        __declspec(align(32)) VUINT32 _dSgnf6Mask[4][2];
> +        __declspec(align(32)) VUINT32 _dNegOne[4][2];
> +        __declspec(align(32)) VUINT32 _dMantissaMask[4][2];
> +        __declspec(align(32)) VUINT32 _lExpHiMask[4][2];
> +        __declspec(align(32)) VUINT32 _lExpLoMask[4][2];
> +        __declspec(align(32)) VUINT32 _l1556[4][2];
> +        __declspec(align(32)) VUINT32 _iRcpIndexMask[8][1];
> +        __declspec(align(32)) VUINT32 _iAbsMask[8][1];
> +        __declspec(align(32)) VUINT32 _iSignMask[8][1];
> +        __declspec(align(32)) VUINT32 _iBias[8][1];
> +        __declspec(align(32)) VUINT32 _iSub[8][1];
> +        __declspec(align(32)) VUINT32 _iCmp[8][1];
> +} __svml_dcbrt_data_internal;
> +#endif
> +__svml_dcbrt_data_internal:
> +        /*== _dRcp ==*/
> +        .quad 0xBFEF81F81F81F820  /* (1/(1+0/32+1/64)) = -.984615 */
> +        .quad 0xBFEE9131ABF0B767  /* (1/(1+1/32+1/64)) = -.955224 */
> +        .quad 0xBFEDAE6076B981DB  /* (1/(1+2/32+1/64)) = -.927536 */
> +        .quad 0xBFECD85689039B0B  /* (1/(1+3/32+1/64)) = -.901408 */
> +        .quad 0xBFEC0E070381C0E0  /* (1/(1+4/32+1/64)) = -.876712 */
> +        .quad 0xBFEB4E81B4E81B4F  /* (1/(1+5/32+1/64)) = -.853333 */
> +        .quad 0xBFEA98EF606A63BE  /* (1/(1+6/32+1/64)) = -.831169 */
> +        .quad 0xBFE9EC8E951033D9  /* (1/(1+7/32+1/64)) = -.810127 */
> +        .quad 0xBFE948B0FCD6E9E0  /* (1/(1+8/32+1/64)) = -.790123 */
> +        .quad 0xBFE8ACB90F6BF3AA  /* (1/(1+9/32+1/64)) = -.771084 */
> +        .quad 0xBFE8181818181818  /* (1/(1+10/32+1/64)) = -.752941 */
> +        .quad 0xBFE78A4C8178A4C8  /* (1/(1+11/32+1/64)) = -.735632 */
> +        .quad 0xBFE702E05C0B8170  /* (1/(1+12/32+1/64)) = -.719101 */
> +        .quad 0xBFE6816816816817  /* (1/(1+13/32+1/64)) = -.703297 */
> +        .quad 0xBFE6058160581606  /* (1/(1+14/32+1/64)) = -.688172 */
> +        .quad 0xBFE58ED2308158ED  /* (1/(1+15/32+1/64)) = -.673684 */
> +        .quad 0xBFE51D07EAE2F815  /* (1/(1+16/32+1/64)) = -.659794 */
> +        .quad 0xBFE4AFD6A052BF5B  /* (1/(1+17/32+1/64)) = -.646465 */
> +        .quad 0xBFE446F86562D9FB  /* (1/(1+18/32+1/64)) = -.633663 */
> +        .quad 0xBFE3E22CBCE4A902  /* (1/(1+19/32+1/64)) = -.621359 */
> +        .quad 0xBFE3813813813814  /* (1/(1+20/32+1/64)) = -.609524 */
> +        .quad 0xBFE323E34A2B10BF  /* (1/(1+21/32+1/64)) = -.598131 */
> +        .quad 0xBFE2C9FB4D812CA0  /* (1/(1+22/32+1/64)) = -.587156 */
> +        .quad 0xBFE27350B8812735  /* (1/(1+23/32+1/64)) = -.576577 */
> +        .quad 0xBFE21FB78121FB78  /* (1/(1+24/32+1/64)) = -.566372 */
> +        .quad 0xBFE1CF06ADA2811D  /* (1/(1+25/32+1/64)) = -.556522 */
> +        .quad 0xBFE1811811811812  /* (1/(1+26/32+1/64)) = -.547009 */
> +        .quad 0xBFE135C81135C811  /* (1/(1+27/32+1/64)) = -.537815 */
> +        .quad 0xBFE0ECF56BE69C90  /* (1/(1+28/32+1/64)) = -.528926 */
> +        .quad 0xBFE0A6810A6810A7  /* (1/(1+29/32+1/64)) = -.520325 */
> +        .quad 0xBFE0624DD2F1A9FC  /* (1/(1+30/32+1/64)) = -.512    */
> +        .quad 0xBFE0204081020408  /* (1/(1+31/32+1/64)) = -.503937 */
> +        /*== _dCbrtHiLo ==*/
> +        .align 32
> +        .quad 0x3FF01539221D4C97    /* HI((2^0*(1+0/32+1/64))^(1/3)) = 1.005181 */
> +        .quad 0x3FF03F06771A2E33    /* HI((2^0*(1+1/32+1/64))^(1/3)) = 1.015387 */
> +        .quad 0x3FF06800E629D671    /* HI((2^0*(1+2/32+1/64))^(1/3)) = 1.025391 */
> +        .quad 0x3FF090328731DEB2    /* HI((2^0*(1+3/32+1/64))^(1/3)) = 1.035204 */
> +        .quad 0x3FF0B7A4B1BD64AC    /* HI((2^0*(1+4/32+1/64))^(1/3)) = 1.044835 */
> +        .quad 0x3FF0DE601024FB87    /* HI((2^0*(1+5/32+1/64))^(1/3)) = 1.054291 */
> +        .quad 0x3FF1046CB0597000    /* HI((2^0*(1+6/32+1/64))^(1/3)) = 1.06358  */
> +        .quad 0x3FF129D212A9BA9B    /* HI((2^0*(1+7/32+1/64))^(1/3)) = 1.07271  */
> +        .quad 0x3FF14E9736CDAF38    /* HI((2^0*(1+8/32+1/64))^(1/3)) = 1.081687 */
> +        .quad 0x3FF172C2A772F507    /* HI((2^0*(1+9/32+1/64))^(1/3)) = 1.090518 */
> +        .quad 0x3FF1965A848001D3    /* HI((2^0*(1+10/32+1/64))^(1/3)) = 1.099207 */
> +        .quad 0x3FF1B9648C38C55D    /* HI((2^0*(1+11/32+1/64))^(1/3)) = 1.107762 */
> +        .quad 0x3FF1DBE6236A0C45    /* HI((2^0*(1+12/32+1/64))^(1/3)) = 1.116186 */
> +        .quad 0x3FF1FDE45CBB1F9F    /* HI((2^0*(1+13/32+1/64))^(1/3)) = 1.124485 */
> +        .quad 0x3FF21F63FF409042    /* HI((2^0*(1+14/32+1/64))^(1/3)) = 1.132664 */
> +        .quad 0x3FF240698C6746E5    /* HI((2^0*(1+15/32+1/64))^(1/3)) = 1.140726 */
> +        .quad 0x3FF260F9454BB99B    /* HI((2^0*(1+16/32+1/64))^(1/3)) = 1.148675 */
> +        .quad 0x3FF281172F8E7073    /* HI((2^0*(1+17/32+1/64))^(1/3)) = 1.156516 */
> +        .quad 0x3FF2A0C719B4B6D0    /* HI((2^0*(1+18/32+1/64))^(1/3)) = 1.164252 */
> +        .quad 0x3FF2C00C9F2263EC    /* HI((2^0*(1+19/32+1/64))^(1/3)) = 1.171887 */
> +        .quad 0x3FF2DEEB2BB7FB78    /* HI((2^0*(1+20/32+1/64))^(1/3)) = 1.179423 */
> +        .quad 0x3FF2FD65FF1EFBBC    /* HI((2^0*(1+21/32+1/64))^(1/3)) = 1.186865 */
> +        .quad 0x3FF31B802FCCF6A2    /* HI((2^0*(1+22/32+1/64))^(1/3)) = 1.194214 */
> +        .quad 0x3FF3393CADC50708    /* HI((2^0*(1+23/32+1/64))^(1/3)) = 1.201474 */
> +        .quad 0x3FF3569E451E4C2A    /* HI((2^0*(1+24/32+1/64))^(1/3)) = 1.208647 */
> +        .quad 0x3FF373A7A0554CDE    /* HI((2^0*(1+25/32+1/64))^(1/3)) = 1.215736 */
> +        .quad 0x3FF3905B4A6D76CE    /* HI((2^0*(1+26/32+1/64))^(1/3)) = 1.222743 */
> +        .quad 0x3FF3ACBBB0E756B6    /* HI((2^0*(1+27/32+1/64))^(1/3)) = 1.229671 */
> +        .quad 0x3FF3C8CB258FA340    /* HI((2^0*(1+28/32+1/64))^(1/3)) = 1.236522 */
> +        .quad 0x3FF3E48BE02AC0CE    /* HI((2^0*(1+29/32+1/64))^(1/3)) = 1.243297 */
> +        .quad 0x3FF4000000000000    /* HI((2^0*(1+30/32+1/64))^(1/3)) = 1.25     */
> +        .quad 0x3FF41B298D47800E    /* HI((2^0*(1+31/32+1/64))^(1/3)) = 1.256631 */
> +        .quad 0x3FF443604B34D9B2    /* HI((2^1*(1+0/32+1/64))^(1/3)) = 1.266449 */
> +        .quad 0x3FF4780B20906571    /* HI((2^1*(1+1/32+1/64))^(1/3)) = 1.279307 */
> +        .quad 0x3FF4ABAC3EE06706    /* HI((2^1*(1+2/32+1/64))^(1/3)) = 1.291912 */
> +        .quad 0x3FF4DE505DA66B8D    /* HI((2^1*(1+3/32+1/64))^(1/3)) = 1.304276 */
> +        .quad 0x3FF51003420A5C07    /* HI((2^1*(1+4/32+1/64))^(1/3)) = 1.316409 */
> +        .quad 0x3FF540CFD6FD11C1    /* HI((2^1*(1+5/32+1/64))^(1/3)) = 1.328323 */
> +        .quad 0x3FF570C04260716B    /* HI((2^1*(1+6/32+1/64))^(1/3)) = 1.340027 */
> +        .quad 0x3FF59FDDF7A45F38    /* HI((2^1*(1+7/32+1/64))^(1/3)) = 1.35153  */
> +        .quad 0x3FF5CE31C83539DF    /* HI((2^1*(1+8/32+1/64))^(1/3)) = 1.36284  */
> +        .quad 0x3FF5FBC3F20966A4    /* HI((2^1*(1+9/32+1/64))^(1/3)) = 1.373966 */
> +        .quad 0x3FF6289C2C8F1B70    /* HI((2^1*(1+10/32+1/64))^(1/3)) = 1.384915 */
> +        .quad 0x3FF654C1B4316DCF    /* HI((2^1*(1+11/32+1/64))^(1/3)) = 1.395693 */
> +        .quad 0x3FF6803B54A34E44    /* HI((2^1*(1+12/32+1/64))^(1/3)) = 1.406307 */
> +        .quad 0x3FF6AB0F72182659    /* HI((2^1*(1+13/32+1/64))^(1/3)) = 1.416763 */
> +        .quad 0x3FF6D544118C08BC    /* HI((2^1*(1+14/32+1/64))^(1/3)) = 1.427067 */
> +        .quad 0x3FF6FEDEE0388D4A    /* HI((2^1*(1+15/32+1/64))^(1/3)) = 1.437224 */
> +        .quad 0x3FF727E53A4F645E    /* HI((2^1*(1+16/32+1/64))^(1/3)) = 1.44724  */
> +        .quad 0x3FF7505C31104114    /* HI((2^1*(1+17/32+1/64))^(1/3)) = 1.457119 */
> +        .quad 0x3FF77848904CD549    /* HI((2^1*(1+18/32+1/64))^(1/3)) = 1.466866 */
> +        .quad 0x3FF79FAEE36B2534    /* HI((2^1*(1+19/32+1/64))^(1/3)) = 1.476485 */
> +        .quad 0x3FF7C69379F4605B    /* HI((2^1*(1+20/32+1/64))^(1/3)) = 1.48598  */
> +        .quad 0x3FF7ECFA6BBCA391    /* HI((2^1*(1+21/32+1/64))^(1/3)) = 1.495356 */
> +        .quad 0x3FF812E79CAE7EB9    /* HI((2^1*(1+22/32+1/64))^(1/3)) = 1.504615 */
> +        .quad 0x3FF8385EC043C71D    /* HI((2^1*(1+23/32+1/64))^(1/3)) = 1.513762 */
> +        .quad 0x3FF85D635CB41B9D    /* HI((2^1*(1+24/32+1/64))^(1/3)) = 1.5228   */
> +        .quad 0x3FF881F8CDE083DB    /* HI((2^1*(1+25/32+1/64))^(1/3)) = 1.531731 */
> +        .quad 0x3FF8A6224802B8A8    /* HI((2^1*(1+26/32+1/64))^(1/3)) = 1.54056  */
> +        .quad 0x3FF8C9E2DA25E5E4    /* HI((2^1*(1+27/32+1/64))^(1/3)) = 1.549289 */
> +        .quad 0x3FF8ED3D706E1010    /* HI((2^1*(1+28/32+1/64))^(1/3)) = 1.55792  */
> +        .quad 0x3FF91034D632B6DF    /* HI((2^1*(1+29/32+1/64))^(1/3)) = 1.566457 */
> +        .quad 0x3FF932CBB7F0CF2D    /* HI((2^1*(1+30/32+1/64))^(1/3)) = 1.574901 */
> +        .quad 0x3FF95504A517BF3A    /* HI((2^1*(1+31/32+1/64))^(1/3)) = 1.583256 */
> +        .quad 0x3FF987AF34F8BB19    /* HI((2^2*(1+0/32+1/64))^(1/3)) = 1.595626 */
> +        .quad 0x3FF9CA0A8337B317    /* HI((2^2*(1+1/32+1/64))^(1/3)) = 1.611826 */
> +        .quad 0x3FFA0B1709CC13D5    /* HI((2^2*(1+2/32+1/64))^(1/3)) = 1.627708 */
> +        .quad 0x3FFA4AE4CE6419ED    /* HI((2^2*(1+3/32+1/64))^(1/3)) = 1.643285 */
> +        .quad 0x3FFA8982A5567031    /* HI((2^2*(1+4/32+1/64))^(1/3)) = 1.658572 */
> +        .quad 0x3FFAC6FE500AB570    /* HI((2^2*(1+5/32+1/64))^(1/3)) = 1.673582 */
> +        .quad 0x3FFB036497A15A17    /* HI((2^2*(1+6/32+1/64))^(1/3)) = 1.688328 */
> +        .quad 0x3FFB3EC164671755    /* HI((2^2*(1+7/32+1/64))^(1/3)) = 1.702821 */
> +        .quad 0x3FFB791FD288C46F    /* HI((2^2*(1+8/32+1/64))^(1/3)) = 1.717071 */
> +        .quad 0x3FFBB28A44693BE4    /* HI((2^2*(1+9/32+1/64))^(1/3)) = 1.731089 */
> +        .quad 0x3FFBEB0A72EB6E31    /* HI((2^2*(1+10/32+1/64))^(1/3)) = 1.744883 */
> +        .quad 0x3FFC22A97BF5F697    /* HI((2^2*(1+11/32+1/64))^(1/3)) = 1.758462 */
> +        .quad 0x3FFC596FEF6AF983    /* HI((2^2*(1+12/32+1/64))^(1/3)) = 1.771835 */
> +        .quad 0x3FFC8F65DAC655A3    /* HI((2^2*(1+13/32+1/64))^(1/3)) = 1.785009 */
> +        .quad 0x3FFCC492D38CE8D9    /* HI((2^2*(1+14/32+1/64))^(1/3)) = 1.797992 */
> +        .quad 0x3FFCF8FE00B19367    /* HI((2^2*(1+15/32+1/64))^(1/3)) = 1.810789 */
> +        .quad 0x3FFD2CAE230F8709    /* HI((2^2*(1+16/32+1/64))^(1/3)) = 1.823408 */
> +        .quad 0x3FFD5FA99D15208F    /* HI((2^2*(1+17/32+1/64))^(1/3)) = 1.835855 */
> +        .quad 0x3FFD91F679B6E505    /* HI((2^2*(1+18/32+1/64))^(1/3)) = 1.848135 */
> +        .quad 0x3FFDC39A72BF2302    /* HI((2^2*(1+19/32+1/64))^(1/3)) = 1.860255 */
> +        .quad 0x3FFDF49AF68C1570    /* HI((2^2*(1+20/32+1/64))^(1/3)) = 1.872218 */
> +        .quad 0x3FFE24FD2D4C23B8    /* HI((2^2*(1+21/32+1/64))^(1/3)) = 1.884031 */
> +        .quad 0x3FFE54C5FDC5EC73    /* HI((2^2*(1+22/32+1/64))^(1/3)) = 1.895697 */
> +        .quad 0x3FFE83FA11B81DBB    /* HI((2^2*(1+23/32+1/64))^(1/3)) = 1.907221 */
> +        .quad 0x3FFEB29DD9DBAF25    /* HI((2^2*(1+24/32+1/64))^(1/3)) = 1.918608 */
> +        .quad 0x3FFEE0B59191D374    /* HI((2^2*(1+25/32+1/64))^(1/3)) = 1.929861 */
> +        .quad 0x3FFF0E454245E4BF    /* HI((2^2*(1+26/32+1/64))^(1/3)) = 1.940984 */
> +        .quad 0x3FFF3B50C68A9DD3    /* HI((2^2*(1+27/32+1/64))^(1/3)) = 1.951981 */
> +        .quad 0x3FFF67DBCCF922DC    /* HI((2^2*(1+28/32+1/64))^(1/3)) = 1.962856 */
> +        .quad 0x3FFF93E9DAD7A4A6    /* HI((2^2*(1+29/32+1/64))^(1/3)) = 1.973612 */
> +        .quad 0x3FFFBF7E4E8CC9CB    /* HI((2^2*(1+30/32+1/64))^(1/3)) = 1.984251 */
> +        .quad 0x3FFFEA9C61E47CD3    /* HI((2^2*(1+31/32+1/64))^(1/3)) = 1.994778 */
> +        .align 32
> +        .quad 0x3F93750AD588F115, 0x3F93750AD588F115, 0x3F93750AD588F115, 0x3F93750AD588F115      /* _dA7 */
> +        .align 32
> +        .quad 0xBF98090D6221A247, 0xBF98090D6221A247, 0xBF98090D6221A247, 0xBF98090D6221A247      /* _dA6 */
> +        .align 32
> +        .quad 0x3F9EE7113506AC12, 0x3F9EE7113506AC12, 0x3F9EE7113506AC12, 0x3F9EE7113506AC12      /* _dA5 */
> +        .align 32
> +        .quad 0xBFA511E8D2B3183B, 0xBFA511E8D2B3183B, 0xBFA511E8D2B3183B, 0xBFA511E8D2B3183B      /* _dA4 */
> +        .align 32
> +        .quad 0x3FAF9ADD3C0CA458, 0x3FAF9ADD3C0CA458, 0x3FAF9ADD3C0CA458, 0x3FAF9ADD3C0CA458      /* _dA3 */
> +        .align 32
> +        .quad 0xBFBC71C71C71C71C, 0xBFBC71C71C71C71C, 0xBFBC71C71C71C71C, 0xBFBC71C71C71C71C      /* _dA2 */
> +        .align 32
> +        .quad 0x3FD5555555555555, 0x3FD5555555555555, 0x3FD5555555555555, 0x3FD5555555555555      /* _dA1 */
> +        .align 32
> +        .quad 0xBFF0400000000000, 0xBFF0400000000000, 0xBFF0400000000000, 0xBFF0400000000000        /* _dNeg65Div64 */
> +        .align 32
> +        .quad 0x000FC00000000000, 0x000FC00000000000, 0x000FC00000000000, 0x000FC00000000000        /* _dSgnf6Mask */
> +        .align 32
> +        .quad 0xBFF0000000000000, 0xBFF0000000000000, 0xBFF0000000000000, 0xBFF0000000000000        /* _dNegOne */
> +        .align 32
> +        .quad 0x000FFFFFFFFFFFFF, 0x000FFFFFFFFFFFFF, 0x000FFFFFFFFFFFFF, 0x000FFFFFFFFFFFFF        /* _dMantissaMask */
> +        .align 32
> +        .quad 0xFFF0000000000000, 0xFFF0000000000000, 0xFFF0000000000000, 0xFFF0000000000000        /* _lExpHiMask */
> +        .align 32
> +        .quad 0x00000000000007FF, 0x00000000000007FF, 0x00000000000007FF, 0x00000000000007FF        /* _lExpLoMask */
> +        .align 32
> +        .quad 0x0000000000001556, 0x0000000000001556, 0x0000000000001556, 0x0000000000001556        /* _l1556 */
> +        .align 32
> +        .long 0x000F8000, 0x000F8000, 0x000F8000, 0x000F8000, 0x000F8000, 0x000F8000, 0x000F8000, 0x000F8000    /* _iRcpIndexMask */
> +        .align 32
> +        .long 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF    /* _iAbsMask */
> +        .align 32
> +        .long 0x00000800, 0x00000800, 0x00000800, 0x00000800, 0x00000800, 0x00000800, 0x00000800, 0x00000800    /* _iSignMask */
> +        .align 32
> +        .long 0x000002AA, 0x000002AA, 0x000002AA, 0x000002AA, 0x000002AA, 0x000002AA, 0x000002AA, 0x000002AA    /* _iBias */
> +        .align 32
> +        .long 0x80100000, 0x80100000, 0x80100000, 0x80100000, 0x80100000, 0x80100000, 0x80100000, 0x80100000    /* _iSub */
> +        .align 32
> +        .long 0xffdfffff, 0xffdfffff, 0xffdfffff, 0xffdfffff, 0xffdfffff, 0xffdfffff, 0xffdfffff, 0xffdfffff    /* _iCmp */
> +        .align 32
> +        .type	__svml_dcbrt_data_internal,@object
> +        .size	__svml_dcbrt_data_internal,.-__svml_dcbrt_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core-avx2.S
> new file mode 100644
> index 0000000000..3831e582ce
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized cbrt, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN8v_cbrt _ZGVeN8v_cbrt_avx2_wrapper
> +#include "../svml_d_cbrt8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core.c
> new file mode 100644
> index 0000000000..28c147216f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized cbrt, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN8v_cbrt
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN8v_cbrt, __GI__ZGVeN8v_cbrt, __redirect__ZGVeN8v_cbrt)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
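
The selector pulled in from ifunc-mathvec-avx512-skx.h chooses between the two
variants named in this patch.  A rough, self-contained stand-in (the real
header uses glibc's cpu-features bits, not __builtin_cpu_supports, and tests
more than these two features):

    typedef double __v8df __attribute__ ((__vector_size__ (64)));

    extern __v8df _ZGVeN8v_cbrt_skx (__v8df);
    extern __v8df _ZGVeN8v_cbrt_avx2_wrapper (__v8df);

    /* Prefer the AVX-512 kernel when usable, else the wrapper variant.  */
    static __v8df (*select_cbrt8 (void)) (__v8df)
    {
      if (__builtin_cpu_supports ("avx512f")
          && __builtin_cpu_supports ("avx512dq"))
        return _ZGVeN8v_cbrt_skx;
      return _ZGVeN8v_cbrt_avx2_wrapper;
    }
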
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core_avx512.S
> new file mode 100644
> index 0000000000..b9c071b54c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cbrt8_core_avx512.S
> @@ -0,0 +1,253 @@
> +/* Function cbrt vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   x=2^{3*k+j} * 1.b1 b2 ... b5 b6 ... b52
> + *   Let r=(x*2^{-3k-j} - 1.b1 b2 ... b5 1)* rcp[b1 b2 ..b5],
> + *   where rcp[b1 b2 .. b5]=1/(1.b1 b2 b3 b4 b5 1) in double precision
> + *   cbrt(2^j * 1. b1 b2 .. b5 1) is approximated as T[j][b1..b5]+D[j][b1..b5]
> + *   (T stores the high 53 bits, D stores the low order bits)
> + *   Result=2^k*T+(2^k*T*r)*P+2^k*D
> + *   where P=p1+p2*r+..+p8*r^7
> + *
> + */
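
The reconstruction in the body below follows the Result line of the comment
above; in scalar form (sh/sl are the permuted etbl_H/etbl_L values for
2^(j/3), scaled_th is the cbrt_tbl_H entry already scaled by 2^k via
vscalefpd, p is the polynomial in the reduced argument r, and only the sign of
x is reused):

    #include <math.h>

    /* Matches the "Sh*R", "Sl + (Sh*R)*Poly" and
       "scaled_Th*(Sh+Sl+Sh*R*Poly)" steps in the body.  */
    static double
    cbrt_reconstruct (double sh, double sl, double scaled_th,
                      double p, double r, double x)
    {
      double t = sh * r;            /* Sh*R */
      double s = t * p + sl;        /* Sl + (Sh*R)*Poly */
      return copysign (scaled_th * (sh + s), x);
    }
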
> +
> +/* Offsets for data table __svml_dcbrt_data_internal_avx512
> + */
> +#define etbl_H                        	0
> +#define etbl_L                        	64
> +#define cbrt_tbl_H                    	128
> +#define BiasL                         	256
> +#define SZero                         	320
> +#define OneThird                      	384
> +#define Bias3                         	448
> +#define Three                         	512
> +#define One                           	576
> +#define poly_coeff10                  	640
> +#define poly_coeff9                   	704
> +#define poly_coeff8                   	768
> +#define poly_coeff7                   	832
> +#define poly_coeff6                   	896
> +#define poly_coeff5                   	960
> +#define poly_coeff4                   	1024
> +#define poly_coeff3                   	1088
> +#define poly_coeff2                   	1152
> +#define poly_coeff1                   	1216
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN8v_cbrt_skx)
> +        vgetmantpd $0, {sae}, %zmm0, %zmm14
> +
> +/* GetExp(x) */
> +        vgetexppd {sae}, %zmm0, %zmm7
> +        vmovups   BiasL+__svml_dcbrt_data_internal_avx512(%rip), %zmm8
> +
> +/* exponent/3 */
> +        vmovups   OneThird+__svml_dcbrt_data_internal_avx512(%rip), %zmm9
> +        vmovups   Bias3+__svml_dcbrt_data_internal_avx512(%rip), %zmm10
> +
> +/* Reduced argument: R = DblRcp*Mantissa - 1 */
> +        vmovups   One+__svml_dcbrt_data_internal_avx512(%rip), %zmm2
> +
> +/* exponent%3 (to be used as index) */
> +        vmovups   Three+__svml_dcbrt_data_internal_avx512(%rip), %zmm11
> +
> +/* DblRcp ~ 1/Mantissa */
> +        vrcp14pd  %zmm14, %zmm13
> +        vaddpd    {rn-sae}, %zmm8, %zmm7, %zmm12
> +        vandpd    SZero+__svml_dcbrt_data_internal_avx512(%rip), %zmm0, %zmm6
> +
> +/* round DblRcp to 4 fractional bits (RN mode, no Precision exception) */
> +        vrndscalepd $72, {sae}, %zmm13, %zmm15
> +        vfmsub231pd {rn-sae}, %zmm12, %zmm9, %zmm10
> +
> +/* polynomial */
> +        vmovups   poly_coeff10+__svml_dcbrt_data_internal_avx512(%rip), %zmm0
> +        vmovups   poly_coeff8+__svml_dcbrt_data_internal_avx512(%rip), %zmm7
> +        vmovups   poly_coeff7+__svml_dcbrt_data_internal_avx512(%rip), %zmm9
> +        vfmsub231pd {rn-sae}, %zmm15, %zmm14, %zmm2
> +        vrndscalepd $9, {sae}, %zmm10, %zmm5
> +
> +/* Table lookup */
> +        vmovups   cbrt_tbl_H+__svml_dcbrt_data_internal_avx512(%rip), %zmm10
> +        vmovups   poly_coeff6+__svml_dcbrt_data_internal_avx512(%rip), %zmm8
> +        vmovups   poly_coeff3+__svml_dcbrt_data_internal_avx512(%rip), %zmm13
> +        vfmadd231pd {rn-sae}, %zmm2, %zmm7, %zmm9
> +        vfnmadd231pd {rn-sae}, %zmm5, %zmm11, %zmm12
> +        vmovups   poly_coeff5+__svml_dcbrt_data_internal_avx512(%rip), %zmm11
> +        vmovups   poly_coeff1+__svml_dcbrt_data_internal_avx512(%rip), %zmm14
> +
> +/* Prepare table index */
> +        vpsrlq    $49, %zmm15, %zmm1
> +
> +/* Table lookup: 2^(exponent%3) */
> +        vpermpd   __svml_dcbrt_data_internal_avx512(%rip), %zmm12, %zmm4
> +        vpermpd   etbl_L+__svml_dcbrt_data_internal_avx512(%rip), %zmm12, %zmm3
> +        vpermt2pd cbrt_tbl_H+64+__svml_dcbrt_data_internal_avx512(%rip), %zmm1, %zmm10
> +        vmovups   poly_coeff9+__svml_dcbrt_data_internal_avx512(%rip), %zmm1
> +        vfmadd231pd {rn-sae}, %zmm2, %zmm8, %zmm11
> +        vmovups   poly_coeff2+__svml_dcbrt_data_internal_avx512(%rip), %zmm12
> +        vscalefpd {rn-sae}, %zmm5, %zmm10, %zmm15
> +        vfmadd231pd {rn-sae}, %zmm2, %zmm0, %zmm1
> +        vmovups   poly_coeff4+__svml_dcbrt_data_internal_avx512(%rip), %zmm5
> +        vfmadd231pd {rn-sae}, %zmm2, %zmm12, %zmm14
> +        vmulpd    {rn-sae}, %zmm2, %zmm2, %zmm0
> +        vfmadd231pd {rn-sae}, %zmm2, %zmm5, %zmm13
> +
> +/* Sh*R */
> +        vmulpd    {rn-sae}, %zmm2, %zmm4, %zmm2
> +        vfmadd213pd {rn-sae}, %zmm9, %zmm0, %zmm1
> +        vfmadd213pd {rn-sae}, %zmm11, %zmm0, %zmm1
> +        vfmadd213pd {rn-sae}, %zmm13, %zmm0, %zmm1
> +        vfmadd213pd {rn-sae}, %zmm14, %zmm0, %zmm1
> +
> +/* Sl + (Sh*R)*Poly */
> +        vfmadd213pd {rn-sae}, %zmm3, %zmm1, %zmm2
> +
> +/*
> + * branch-free
> + * scaled_Th*(Sh+Sl+Sh*R*Poly)
> + */
> +        vaddpd    {rn-sae}, %zmm4, %zmm2, %zmm3
> +        vmulpd    {rn-sae}, %zmm15, %zmm3, %zmm4
> +        vorpd     %zmm6, %zmm4, %zmm0
> +        ret
> +
> +END(_ZGVeN8v_cbrt_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_dcbrt_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 etbl_H[8][2];
> +        __declspec(align(64)) VUINT32 etbl_L[8][2];
> +        __declspec(align(64)) VUINT32 cbrt_tbl_H[16][2];
> +        __declspec(align(64)) VUINT32 BiasL[8][2];
> +        __declspec(align(64)) VUINT32 SZero[8][2];
> +        __declspec(align(64)) VUINT32 OneThird[8][2];
> +        __declspec(align(64)) VUINT32 Bias3[8][2];
> +        __declspec(align(64)) VUINT32 Three[8][2];
> +        __declspec(align(64)) VUINT32 One[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff10[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff9[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff8[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff7[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff1[8][2];
> +    } __svml_dcbrt_data_internal_avx512;
> +#endif
> +__svml_dcbrt_data_internal_avx512:
> +        /*== etbl_H ==*/
> +        .quad 0x3ff0000000000000
> +        .quad 0x3ff428a2f98d728b
> +        .quad 0x3ff965fea53d6e3d
> +        .quad 0x0000000000000000
> +        .quad 0xbff0000000000000
> +        .quad 0xbff428a2f98d728b
> +        .quad 0xbff965fea53d6e3d
> +        .quad 0x0000000000000000
> +        /*== etbl_L ==*/
> +        .align 64
> +        .quad 0x0000000000000000
> +        .quad 0xbc7ddc22548ea41e
> +        .quad 0xbc9f53e999952f09
> +        .quad 0x0000000000000000
> +        .quad 0x0000000000000000
> +        .quad 0x3c7ddc22548ea41e
> +        .quad 0x3c9f53e999952f09
> +        .quad 0x0000000000000000
> +        /*== cbrt_tbl_H ==*/
> +        .align 64
> +        .quad 0x3ff428a2f98d728b
> +        .quad 0x3ff361f35ca116ff
> +        .quad 0x3ff2b6b5edf6b54a
> +        .quad 0x3ff220e6dd675180
> +        .quad 0x3ff19c3b38e975a8
> +        .quad 0x3ff12589c21fb842
> +        .quad 0x3ff0ba6ee5f9aad4
> +        .quad 0x3ff059123d3a9848
> +        .quad 0x3ff0000000000000
> +        .quad 0x0000000000000000
> +        .quad 0x0000000000000000
> +        .quad 0x0000000000000000
> +        .quad 0x0000000000000000
> +        .quad 0x0000000000000000
> +        .quad 0x0000000000000000
> +        .quad 0x0000000000000000
> +        /*== BiasL ==*/
> +        .align 64
> +        .quad 0x4338000000000000, 0x4338000000000000, 0x4338000000000000, 0x4338000000000000, 0x4338000000000000, 0x4338000000000000, 0x4338000000000000, 0x4338000000000000
> +        /*== SZero ==*/
> +        .align 64
> +        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000
> +        /*== OneThird ==*/
> +        .align 64
> +        .quad 0x3fd5555555555556, 0x3fd5555555555556, 0x3fd5555555555556, 0x3fd5555555555556, 0x3fd5555555555556, 0x3fd5555555555556, 0x3fd5555555555556, 0x3fd5555555555556
> +        /*== Bias3 ==*/
> +        .align 64
> +        .quad 0x4320000000000000, 0x4320000000000000, 0x4320000000000000, 0x4320000000000000, 0x4320000000000000, 0x4320000000000000, 0x4320000000000000, 0x4320000000000000
> +        /*== Three ==*/
> +        .align 64
> +        .quad 0x4008000000000000, 0x4008000000000000, 0x4008000000000000, 0x4008000000000000, 0x4008000000000000, 0x4008000000000000, 0x4008000000000000, 0x4008000000000000
> +        /*== One ==*/
> +        .align 64
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== poly_coeff10 ==*/
> +        .align 64
> +        .quad 0xbf882e3b6adeca62, 0xbf882e3b6adeca62, 0xbf882e3b6adeca62, 0xbf882e3b6adeca62, 0xbf882e3b6adeca62, 0xbf882e3b6adeca62, 0xbf882e3b6adeca62, 0xbf882e3b6adeca62
> +        /*== poly_coeff9 ==*/
> +        .align 64
> +        .quad 0x3f8bda24bae48875, 0x3f8bda24bae48875, 0x3f8bda24bae48875, 0x3f8bda24bae48875, 0x3f8bda24bae48875, 0x3f8bda24bae48875, 0x3f8bda24bae48875, 0x3f8bda24bae48875
> +        /*== poly_coeff8 ==*/
> +        .align 64
> +        .quad 0xbf9036b87c71d55f, 0xbf9036b87c71d55f, 0xbf9036b87c71d55f, 0xbf9036b87c71d55f, 0xbf9036b87c71d55f, 0xbf9036b87c71d55f, 0xbf9036b87c71d55f, 0xbf9036b87c71d55f
> +        /*== poly_coeff7 ==*/
> +        .align 64
> +        .quad 0x3f9374ed9398b914, 0x3f9374ed9398b914, 0x3f9374ed9398b914, 0x3f9374ed9398b914, 0x3f9374ed9398b914, 0x3f9374ed9398b914, 0x3f9374ed9398b914, 0x3f9374ed9398b914
> +        /*== poly_coeff6 ==*/
> +        .align 64
> +        .quad 0xbf98090d77f2468e, 0xbf98090d77f2468e, 0xbf98090d77f2468e, 0xbf98090d77f2468e, 0xbf98090d77f2468e, 0xbf98090d77f2468e, 0xbf98090d77f2468e, 0xbf98090d77f2468e
> +        /*== poly_coeff5 ==*/
> +        .align 64
> +        .quad 0x3f9ee71141dcf569, 0x3f9ee71141dcf569, 0x3f9ee71141dcf569, 0x3f9ee71141dcf569, 0x3f9ee71141dcf569, 0x3f9ee71141dcf569, 0x3f9ee71141dcf569, 0x3f9ee71141dcf569
> +        /*== poly_coeff4 ==*/
> +        .align 64
> +        .quad 0xbfa511e8d2b0363e, 0xbfa511e8d2b0363e, 0xbfa511e8d2b0363e, 0xbfa511e8d2b0363e, 0xbfa511e8d2b0363e, 0xbfa511e8d2b0363e, 0xbfa511e8d2b0363e, 0xbfa511e8d2b0363e
> +        /*== poly_coeff3 ==*/
> +        .align 64
> +        .quad 0x3faf9add3c0b7e31, 0x3faf9add3c0b7e31, 0x3faf9add3c0b7e31, 0x3faf9add3c0b7e31, 0x3faf9add3c0b7e31, 0x3faf9add3c0b7e31, 0x3faf9add3c0b7e31, 0x3faf9add3c0b7e31
> +        /*== poly_coeff2 ==*/
> +        .align 64
> +        .quad 0xbfbc71c71c71c741, 0xbfbc71c71c71c741, 0xbfbc71c71c71c741, 0xbfbc71c71c71c741, 0xbfbc71c71c71c741, 0xbfbc71c71c71c741, 0xbfbc71c71c71c741, 0xbfbc71c71c71c741
> +        /*== poly_coeff1 ==*/
> +        .align 64
> +        .quad 0x3fd5555555555557, 0x3fd5555555555557, 0x3fd5555555555557, 0x3fd5555555555557, 0x3fd5555555555557, 0x3fd5555555555557, 0x3fd5555555555557, 0x3fd5555555555557
> +        .align 64
> +        .type	__svml_dcbrt_data_internal_avx512,@object
> +        .size	__svml_dcbrt_data_internal_avx512,.-__svml_dcbrt_data_internal_avx512
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core-avx2.S
> new file mode 100644
> index 0000000000..faa847fba6
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized cbrtf, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN16v_cbrtf _ZGVeN16v_cbrtf_avx2_wrapper
> +#include "../svml_s_cbrtf16_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core.c
> new file mode 100644
> index 0000000000..785a68cc0d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized cbrtf, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN16v_cbrtf
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN16v_cbrtf, __GI__ZGVeN16v_cbrtf,
> +	       __redirect__ZGVeN16v_cbrtf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core_avx512.S
> new file mode 100644
> index 0000000000..55b017682b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf16_core_avx512.S
> @@ -0,0 +1,235 @@
> +/* Function cbrtf vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *     x=2^{3*k+j} * 1.b1 b2 ... b5 b6 ... b52
> + *     Let r=(x*2^{-3k-j} - 1.b1 b2 ... b5 1)* rcp[b1 b2 ..b5],
> + *     where rcp[b1 b2 .. b5]=1/(1.b1 b2 b3 b4 b5 1) in single precision
> + *     cbrtf(2^j * 1. b1 b2 .. b5 1) is approximated as T[j][b1..b5]+D[j][b1..b5]
> + *     (T stores the high 24 bits, D stores the low order bits)
> + *     Result=2^k*T+(2^k*T*r)*P+2^k*D
> + *      where P=p1+p2*r+..
> + *
> + */
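
The single-precision kernel below uses the same reduction and reconstruction
but only a quadratic correction polynomial (poly_coeff3/2/1, two FMAs in the
body).  In scalar form, with the coefficients taken from the data table at the
end of this file:

    /* P(r) = c1 + c2*r + c3*r^2, evaluated by Horner as in the kernel.  */
    static float
    cbrtf_poly (float r)
    {
      const float c3 = 0x1.fa0af8p-5f;   /* 0x3d7d057c, poly_coeff3 */
      const float c2 = -0x1.c746c6p-4f;  /* 0xbde3a363, poly_coeff2 */
      const float c1 = 0x1.555554p-2f;   /* 0x3eaaaaaa, poly_coeff1 */
      return (c3 * r + c2) * r + c1;
    }
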
> +
> +/* Offsets for data table __svml_scbrt_data_internal_avx512
> + */
> +#define etbl_H                        	0
> +#define etbl_L                        	64
> +#define cbrt_tbl_H                    	128
> +#define BiasL                         	256
> +#define SZero                         	320
> +#define OneThird                      	384
> +#define Bias3                         	448
> +#define Three                         	512
> +#define One                           	576
> +#define poly_coeff3                   	640
> +#define poly_coeff2                   	704
> +#define poly_coeff1                   	768
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN16v_cbrtf_skx)
> +        vgetmantps $0, {sae}, %zmm0, %zmm8
> +
> +/* GetExp(x) */
> +        vgetexpps {sae}, %zmm0, %zmm1
> +        vmovups   BiasL+__svml_scbrt_data_internal_avx512(%rip), %zmm2
> +
> +/* exponent/3 */
> +        vmovups   OneThird+__svml_scbrt_data_internal_avx512(%rip), %zmm3
> +        vmovups   Bias3+__svml_scbrt_data_internal_avx512(%rip), %zmm4
> +        vmovups   One+__svml_scbrt_data_internal_avx512(%rip), %zmm15
> +
> +/* exponent%3 (to be used as index) */
> +        vmovups   Three+__svml_scbrt_data_internal_avx512(%rip), %zmm5
> +
> +/* polynomial */
> +        vmovups   poly_coeff3+__svml_scbrt_data_internal_avx512(%rip), %zmm11
> +        vmovups   poly_coeff1+__svml_scbrt_data_internal_avx512(%rip), %zmm14
> +
> +/* Table lookup */
> +        vmovups   cbrt_tbl_H+__svml_scbrt_data_internal_avx512(%rip), %zmm12
> +
> +/* DblRcp ~ 1/Mantissa */
> +        vrcp14ps  %zmm8, %zmm7
> +        vaddps    {rn-sae}, %zmm2, %zmm1, %zmm6
> +        vandps    SZero+__svml_scbrt_data_internal_avx512(%rip), %zmm0, %zmm0
> +
> +/* round DblRcp to 5 fractional bits (RN mode, no Precision exception) */
> +        vrndscaleps $88, {sae}, %zmm7, %zmm9
> +        vfmsub231ps {rn-sae}, %zmm6, %zmm3, %zmm4
> +        vmovups   poly_coeff2+__svml_scbrt_data_internal_avx512(%rip), %zmm7
> +
> +/* Reduced argument: R = DblRcp*Mantissa - 1 */
> +        vfmsub231ps {rn-sae}, %zmm9, %zmm8, %zmm15
> +        vrndscaleps $9, {sae}, %zmm4, %zmm13
> +
> +/* Prepare table index */
> +        vpsrld    $19, %zmm9, %zmm10
> +        vfmadd231ps {rn-sae}, %zmm15, %zmm11, %zmm7
> +        vfnmadd231ps {rn-sae}, %zmm13, %zmm5, %zmm6
> +        vpermt2ps cbrt_tbl_H+64+__svml_scbrt_data_internal_avx512(%rip), %zmm10, %zmm12
> +        vfmadd213ps {rn-sae}, %zmm14, %zmm15, %zmm7
> +        vscalefps {rn-sae}, %zmm13, %zmm12, %zmm2
> +
> +/* Table lookup: 2^(exponent%3) */
> +        vpermps   __svml_scbrt_data_internal_avx512(%rip), %zmm6, %zmm1
> +        vpermps   etbl_L+__svml_scbrt_data_internal_avx512(%rip), %zmm6, %zmm6
> +
> +/* Sh*R */
> +        vmulps    {rn-sae}, %zmm15, %zmm1, %zmm14
> +
> +/* Sl + (Sh*R)*Poly */
> +        vfmadd213ps {rn-sae}, %zmm6, %zmm7, %zmm14
> +
> +/*
> + * branch-free
> + * scaled_Th*(Sh+Sl+Sh*R*Poly)
> + */
> +        vaddps    {rn-sae}, %zmm1, %zmm14, %zmm15
> +        vmulps    {rn-sae}, %zmm2, %zmm15, %zmm3
> +        vorps     %zmm0, %zmm3, %zmm0
> +        ret
> +
> +END(_ZGVeN16v_cbrtf_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_scbrt_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 etbl_H[16][1];
> +        __declspec(align(64)) VUINT32 etbl_L[16][1];
> +        __declspec(align(64)) VUINT32 cbrt_tbl_H[32][1];
> +        __declspec(align(64)) VUINT32 BiasL[16][1];
> +        __declspec(align(64)) VUINT32 SZero[16][1];
> +        __declspec(align(64)) VUINT32 OneThird[16][1];
> +        __declspec(align(64)) VUINT32 Bias3[16][1];
> +        __declspec(align(64)) VUINT32 Three[16][1];
> +        __declspec(align(64)) VUINT32 One[16][1];
> +        __declspec(align(64)) VUINT32 poly_coeff3[16][1];
> +        __declspec(align(64)) VUINT32 poly_coeff2[16][1];
> +        __declspec(align(64)) VUINT32 poly_coeff1[16][1];
> +    } __svml_scbrt_data_internal_avx512;
> +#endif
> +__svml_scbrt_data_internal_avx512:
> +        /*== etbl_H ==*/
> +        .long 0x3f800000
> +        .long 0x3fa14518
> +        .long 0x3fcb2ff5
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        /*== etbl_L ==*/
> +        .align 64
> +        .long 0x00000000
> +        .long 0xb2ce51af
> +        .long 0x32a7adc8
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        /*== cbrt_tbl_H ==*/
> +        .align 64
> +        .long 0x3fa14518
> +        .long 0x3f9e0b2b
> +        .long 0x3f9b0f9b
> +        .long 0x3f984a9a
> +        .long 0x3f95b5af
> +        .long 0x3f934b6c
> +        .long 0x3f910737
> +        .long 0x3f8ee526
> +        .long 0x3f8ce1da
> +        .long 0x3f8afa6a
> +        .long 0x3f892c4e
> +        .long 0x3f87754e
> +        .long 0x3f85d377
> +        .long 0x3f844510
> +        .long 0x3f82c892
> +        .long 0x3f815c9f
> +        .long 0x3f800000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        .long 0x00000000
> +        /*== BiasL ==*/
> +        .align 64
> +        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000
> +        /*== SZero ==*/
> +        .align 64
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000
> +        /*== OneThird ==*/
> +        .align 64
> +        .long 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab, 0x3eaaaaab
> +        /*== Bias3 ==*/
> +        .align 64
> +        .long 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000, 0x4a800000
> +        /*== Three ==*/
> +        .align 64
> +        .long 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000, 0x40400000
> +        /*== One ==*/
> +        .align 64
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== poly_coeff3 ==*/
> +        .align 64
> +        .long 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c, 0x3d7d057c
> +        /*== poly_coeff2 ==*/
> +        .align 64
> +        .long 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363, 0xbde3a363
> +        /*== poly_coeff1 ==*/
> +        .align 64
> +        .long 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa, 0x3eaaaaaa
> +        .align 64
> +        .type	__svml_scbrt_data_internal_avx512,@object
> +        .size	__svml_scbrt_data_internal_avx512,.-__svml_scbrt_data_internal_avx512
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core-sse2.S
> new file mode 100644
> index 0000000000..76fc254e7a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized cbrtf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN4v_cbrtf _ZGVbN4v_cbrtf_sse2
> +#include "../svml_s_cbrtf4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core.c
> new file mode 100644
> index 0000000000..564a549b39
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized cbrtf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN4v_cbrtf
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN4v_cbrtf, __GI__ZGVbN4v_cbrtf,
> +	       __redirect__ZGVbN4v_cbrtf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core_sse4.S
> new file mode 100644
> index 0000000000..af42dd5164
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf4_core_sse4.S
> @@ -0,0 +1,490 @@
> +/* Function cbrtf vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *     x=2^{3*k+j} * 1.b1 b2 ... b5 b6 ... b23
> + *     Let r=(x*2^{-3k-j} - 1.b1 b2 ... b5 1)* rcp[b1 b2 ..b5],
> + *     where rcp[b1 b2 .. b5]=1/(1.b1 b2 b3 b4 b5 1) in single precision
> + *     cbrtf(2^j * 1. b1 b2 .. b5 1) is approximated as T[j][b1..b5]+D[j][b1..b5]
> + *     (T stores the high 24 bits, D stores the low order bits)
> + *     Result=2^k*T+(2^k*T*r)*P+2^k*D
> + *      where P=p1+p2*r+..
> + *
> + */
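
A rough scalar sketch of the decomposition described above may help when
reading the vector code below.  This is illustrative only and not part of
the patch: the SIMD kernel replaces the inner cube root with the rcp/T/D
table lookups and the short polynomial in r, and the names here are made
up for the example.

    #include <math.h>
    #include <stdio.h>

    /* Hypothetical sketch: write x = m * 2^(3k+j), so that
       cbrtf (x) = 2^k * cbrtf (m * 2^j).  The assembly derives the inner
       cube root from its tables instead of calling cbrtf.  */
    static float
    cbrtf_sketch (float x)
    {
      int e;
      float m = frexpf (x, &e);   /* x = m * 2^e, 0.5 <= |m| < 1.  */
      int k = e / 3;
      int j = e - 3 * k;          /* e = 3k + j; j may be negative for
                                     negative e, the identity still holds.  */
      return ldexpf (cbrtf (ldexpf (m, j)), k);
    }

    int
    main (void)
    {
      printf ("%f %f\n", cbrtf_sketch (27.0f), cbrtf (27.0f));
      return 0;
    }
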
> +
> +/* Offsets for data table __svml_scbrt_data_internal
> + */
> +#define _sRcp                         	0
> +#define _sCbrtHL                      	128
> +#define _sP2                          	512
> +#define _sP1                          	528
> +#define _sMantissaMask                	544
> +#define _sMantissaMask1               	560
> +#define _sExpMask                     	576
> +#define _sExpMask1                    	592
> +#define _iRcpIndexMask                	608
> +#define _iBExpMask                    	624
> +#define _iSignMask                    	640
> +#define _iBias                        	656
> +#define _iOne                         	672
> +#define _i555                         	688
> +#define _iAbsMask                     	704
> +#define _iSubConst                    	720
> +#define _iCmpConst                    	736
> +
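
These byte offsets index the rows of __svml_scbrt_data_internal defined at
the end of this file: _sRcp is 32 32-bit entries (128 bytes), _sCbrtHL is
96 entries (384 bytes), and every remaining field is a single 16-byte
vector, which is why the later offsets advance in steps of 16.
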
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN4v_cbrtf_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +
> +/*
> + * Load constants
> + * Reciprocal index calculation
> + */
> +        movaps    %xmm0, %xmm2
> +        movdqu    _iRcpIndexMask+__svml_scbrt_data_internal(%rip), %xmm3
> +        psrld     $16, %xmm2
> +        pand      %xmm2, %xmm3
> +
> +/* Load reciprocal value */
> +        lea       __svml_scbrt_data_internal(%rip), %rdx
> +        pshufd    $1, %xmm3, %xmm5
> +
> +/* Get signed biased exponent */
> +        psrld     $7, %xmm2
> +        movd      %xmm3, %eax
> +        movd      %xmm5, %ecx
> +
> +/* Get absolute biased exponent */
> +        movdqu    _iBExpMask+__svml_scbrt_data_internal(%rip), %xmm15
> +
> +/*
> + * Calculate exponent/3
> + * i555Exp=(2^{12}-1)/3*exponent
> + */
> +        movdqu    _i555+__svml_scbrt_data_internal(%rip), %xmm14
> +        pand      %xmm2, %xmm15
> +        movslq    %eax, %rax
> +        movdqa    %xmm14, %xmm5
> +        movslq    %ecx, %rcx
> +        psrlq     $32, %xmm14
> +        pmuludq   %xmm15, %xmm5
> +        movd      (%rdx,%rax), %xmm4
> +        movd      (%rdx,%rcx), %xmm6
> +        punpckldq %xmm6, %xmm4
> +        movdqa    %xmm15, %xmm6
> +        psrlq     $32, %xmm15
> +        pmuludq   %xmm14, %xmm15
> +        pshufd    $2, %xmm3, %xmm7
> +        psllq     $32, %xmm15
> +        pshufd    $3, %xmm3, %xmm8
> +        movd      %xmm7, %esi
> +        movd      %xmm8, %edi
> +
> +/* Argument reduction */
> +        movups    _sMantissaMask+__svml_scbrt_data_internal(%rip), %xmm12
> +        movups    _sMantissaMask1+__svml_scbrt_data_internal(%rip), %xmm11
> +        andps     %xmm0, %xmm12
> +        pand      .FLT_17(%rip), %xmm5
> +        andps     %xmm0, %xmm11
> +        movslq    %esi, %rsi
> +        por       %xmm15, %xmm5
> +        movslq    %edi, %rdi
> +
> +/* Get K (exponent=3*k+j) */
> +        psrld     $12, %xmm5
> +        orps      _sExpMask+__svml_scbrt_data_internal(%rip), %xmm12
> +        orps      _sExpMask1+__svml_scbrt_data_internal(%rip), %xmm11
> +        psubd     _iOne+__svml_scbrt_data_internal(%rip), %xmm6
> +
> +/* r=y-y` */
> +        subps     %xmm11, %xmm12
> +
> +/* Get J */
> +        psubd     %xmm5, %xmm6
> +        movdqu    _iAbsMask+__svml_scbrt_data_internal(%rip), %xmm1
> +        psubd     %xmm5, %xmm6
> +        movd      (%rdx,%rsi), %xmm10
> +        pand      %xmm0, %xmm1
> +        movd      (%rdx,%rdi), %xmm9
> +        psubd     %xmm5, %xmm6
> +        punpckldq %xmm9, %xmm10
> +
> +/* Get 128*J */
> +        pslld     $7, %xmm6
> +        punpcklqdq %xmm10, %xmm4
> +
> +/*
> + * iCbrtIndex=4*l+128*j
> + * Zero index if callout expected
> + */
> +        paddd     %xmm6, %xmm3
> +        psubd     _iSubConst+__svml_scbrt_data_internal(%rip), %xmm1
> +        pcmpgtd   _iCmpConst+__svml_scbrt_data_internal(%rip), %xmm1
> +
> +/* r=(y-y`)*rcp_table(y`) */
> +        mulps     %xmm12, %xmm4
> +        movmskps  %xmm1, %eax
> +
> +/* Biased exponent-1 */
> +        movdqu    _iSignMask+__svml_scbrt_data_internal(%rip), %xmm13
> +        pandn     %xmm3, %xmm1
> +
> +/*
> + * Add 2/3*(bias-1)+1 to (k+1/3*(bias-1))
> + * Attach sign to exponent
> + */
> +        movdqu    _iBias+__svml_scbrt_data_internal(%rip), %xmm12
> +        pand      %xmm13, %xmm2
> +        paddd     %xmm5, %xmm12
> +
> +/* Load Cbrt table Hi & Lo values */
> +        movd      %xmm1, %r8d
> +        por       %xmm2, %xmm12
> +        pshufd    $1, %xmm1, %xmm2
> +        pslld     $23, %xmm12
> +        pshufd    $2, %xmm1, %xmm7
> +        pshufd    $3, %xmm1, %xmm1
> +        movd      %xmm2, %r9d
> +        movd      %xmm7, %r10d
> +        movd      %xmm1, %r11d
> +
> +/* Polynomial:    p1+r*(p2*r+r*(p3+r*p4)) */
> +        movups    _sP2+__svml_scbrt_data_internal(%rip), %xmm11
> +        mulps     %xmm4, %xmm11
> +        movslq    %r8d, %r8
> +        addps     _sP1+__svml_scbrt_data_internal(%rip), %xmm11
> +        movslq    %r9d, %r9
> +        movslq    %r10d, %r10
> +        movslq    %r11d, %r11
> +        movd      128(%rdx,%r8), %xmm10
> +        movd      128(%rdx,%r9), %xmm3
> +        movd      128(%rdx,%r10), %xmm9
> +        movd      128(%rdx,%r11), %xmm8
> +        punpckldq %xmm3, %xmm10
> +        punpckldq %xmm8, %xmm9
> +        punpcklqdq %xmm9, %xmm10
> +
> +/* sCbrtHi *= 2^k */
> +        mulps     %xmm10, %xmm12
> +
> +/* T`*r */
> +        mulps     %xmm12, %xmm4
> +
> +/* (T`*r)*P */
> +        mulps     %xmm4, %xmm11
> +
> +/*
> + * T`*r*P+D`
> + * result = T`+(T`*r*P+D`)
> + */
> +        addps     %xmm11, %xmm12
> +        testl     %eax, %eax
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 eax xmm0 xmm12
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movaps    %xmm12, %xmm0
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm0, 32(%rsp)
> +        movups    %xmm12, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 eax
> +
> +        xorl      %edx, %edx
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %edx, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %eax, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm12
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm12
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      cbrtf@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 48(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN4v_cbrtf_sse4)
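
The special-value handling above follows the usual libmvec pattern: the
vector compare leaves a per-lane mask in %eax, and each flagged lane is
recomputed with the scalar cbrtf while the fast-path results of the other
lanes are kept.  A hedged C sketch of that mechanism (lane count and names
are illustrative, not the actual calling convention used here):

    #include <math.h>

    /* Illustrative only: redo the lanes whose bit is set in MASK with the
       scalar routine, keeping the vector results in DST otherwise.  */
    void
    fixup_special_lanes (const float *src, float *dst, unsigned int mask)
    {
      for (int i = 0; i < 4; i++)
        if (mask & (1u << i))
          dst[i] = cbrtf (src[i]);
    }
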
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_scbrt_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(16)) VUINT32 _sRcp[32][1];
> +        __declspec(align(16)) VUINT32 _sCbrtHL[96][1];
> +        __declspec(align(16)) VUINT32 _sP2[4][1];
> +        __declspec(align(16)) VUINT32 _sP1[4][1];
> +        __declspec(align(16)) VUINT32 _sMantissaMask[4][1];
> +        __declspec(align(16)) VUINT32 _sMantissaMask1[4][1];
> +        __declspec(align(16)) VUINT32 _sExpMask[4][1];
> +        __declspec(align(16)) VUINT32 _sExpMask1[4][1];
> +        __declspec(align(16)) VUINT32 _iRcpIndexMask[4][1];
> +        __declspec(align(16)) VUINT32 _iBExpMask[4][1];
> +        __declspec(align(16)) VUINT32 _iSignMask[4][1];
> +        __declspec(align(16)) VUINT32 _iBias[4][1];
> +        __declspec(align(16)) VUINT32 _iOne[4][1];
> +        __declspec(align(16)) VUINT32 _i555[4][1];
> +        __declspec(align(16)) VUINT32 _iAbsMask[4][1];
> +        __declspec(align(16)) VUINT32 _iSubConst[4][1];
> +        __declspec(align(16)) VUINT32 _iCmpConst[4][1];
> +} __svml_scbrt_data_internal;
> +#endif
> +__svml_scbrt_data_internal:
> +        /*== _sRcp ==*/
> +        .long 0xBF7C0FC1  /* (1/(1+0/32+1/64)) = -.984615 */
> +        .long 0xBF74898D  /* (1/(1+1/32+1/64)) = -.955224 */
> +        .long 0xBF6D7304  /* (1/(1+2/32+1/64)) = -.927536 */
> +        .long 0xBF66C2B4  /* (1/(1+3/32+1/64)) = -.901408 */
> +        .long 0xBF607038  /* (1/(1+4/32+1/64)) = -.876712 */
> +        .long 0xBF5A740E  /* (1/(1+5/32+1/64)) = -.853333 */
> +        .long 0xBF54C77B  /* (1/(1+6/32+1/64)) = -.831169 */
> +        .long 0xBF4F6475  /* (1/(1+7/32+1/64)) = -.810127 */
> +        .long 0xBF4A4588  /* (1/(1+8/32+1/64)) = -.790123 */
> +        .long 0xBF4565C8  /* (1/(1+9/32+1/64)) = -.771084 */
> +        .long 0xBF40C0C1  /* (1/(1+10/32+1/64)) = -.752941 */
> +        .long 0xBF3C5264  /* (1/(1+11/32+1/64)) = -.735632 */
> +        .long 0xBF381703  /* (1/(1+12/32+1/64)) = -.719101 */
> +        .long 0xBF340B41  /* (1/(1+13/32+1/64)) = -.703297 */
> +        .long 0xBF302C0B  /* (1/(1+14/32+1/64)) = -.688172 */
> +        .long 0xBF2C7692  /* (1/(1+15/32+1/64)) = -.673684 */
> +        .long 0xBF28E83F  /* (1/(1+16/32+1/64)) = -.659794 */
> +        .long 0xBF257EB5  /* (1/(1+17/32+1/64)) = -.646465 */
> +        .long 0xBF2237C3  /* (1/(1+18/32+1/64)) = -.633663 */
> +        .long 0xBF1F1166  /* (1/(1+19/32+1/64)) = -.621359 */
> +        .long 0xBF1C09C1  /* (1/(1+20/32+1/64)) = -.609524 */
> +        .long 0xBF191F1A  /* (1/(1+21/32+1/64)) = -.598131 */
> +        .long 0xBF164FDA  /* (1/(1+22/32+1/64)) = -.587156 */
> +        .long 0xBF139A86  /* (1/(1+23/32+1/64)) = -.576577 */
> +        .long 0xBF10FDBC  /* (1/(1+24/32+1/64)) = -.566372 */
> +        .long 0xBF0E7835  /* (1/(1+25/32+1/64)) = -.556522 */
> +        .long 0xBF0C08C1  /* (1/(1+26/32+1/64)) = -.547009 */
> +        .long 0xBF09AE41  /* (1/(1+27/32+1/64)) = -.537815 */
> +        .long 0xBF0767AB  /* (1/(1+28/32+1/64)) = -.528926 */
> +        .long 0xBF053408  /* (1/(1+29/32+1/64)) = -.520325 */
> +        .long 0xBF03126F  /* (1/(1+30/32+1/64)) = -.512    */
> +        .long 0xBF010204  /* (1/(1+31/32+1/64)) = -.503937 */
> +        /*== _sCbrtHL ==*/
> +        .align 16
> +        .long 0x3F80A9C9    /* HI((2^0*(1+0/32+1/64))^(1/3)) = 1.005181 */
> +        .long 0x3F81F833    /* HI((2^0*(1+1/32+1/64))^(1/3)) = 1.015387 */
> +        .long 0x3F834007    /* HI((2^0*(1+2/32+1/64))^(1/3)) = 1.025391 */
> +        .long 0x3F848194    /* HI((2^0*(1+3/32+1/64))^(1/3)) = 1.035204 */
> +        .long 0x3F85BD25    /* HI((2^0*(1+4/32+1/64))^(1/3)) = 1.044835 */
> +        .long 0x3F86F300    /* HI((2^0*(1+5/32+1/64))^(1/3)) = 1.054291 */
> +        .long 0x3F882365    /* HI((2^0*(1+6/32+1/64))^(1/3)) = 1.06358  */
> +        .long 0x3F894E90    /* HI((2^0*(1+7/32+1/64))^(1/3)) = 1.07271  */
> +        .long 0x3F8A74B9    /* HI((2^0*(1+8/32+1/64))^(1/3)) = 1.081687 */
> +        .long 0x3F8B9615    /* HI((2^0*(1+9/32+1/64))^(1/3)) = 1.090518 */
> +        .long 0x3F8CB2D4    /* HI((2^0*(1+10/32+1/64))^(1/3)) = 1.099207 */
> +        .long 0x3F8DCB24    /* HI((2^0*(1+11/32+1/64))^(1/3)) = 1.107762 */
> +        .long 0x3F8EDF31    /* HI((2^0*(1+12/32+1/64))^(1/3)) = 1.116186 */
> +        .long 0x3F8FEF22    /* HI((2^0*(1+13/32+1/64))^(1/3)) = 1.124485 */
> +        .long 0x3F90FB1F    /* HI((2^0*(1+14/32+1/64))^(1/3)) = 1.132664 */
> +        .long 0x3F92034C    /* HI((2^0*(1+15/32+1/64))^(1/3)) = 1.140726 */
> +        .long 0x3F9307CA    /* HI((2^0*(1+16/32+1/64))^(1/3)) = 1.148675 */
> +        .long 0x3F9408B9    /* HI((2^0*(1+17/32+1/64))^(1/3)) = 1.156516 */
> +        .long 0x3F950638    /* HI((2^0*(1+18/32+1/64))^(1/3)) = 1.164252 */
> +        .long 0x3F960064    /* HI((2^0*(1+19/32+1/64))^(1/3)) = 1.171887 */
> +        .long 0x3F96F759    /* HI((2^0*(1+20/32+1/64))^(1/3)) = 1.179423 */
> +        .long 0x3F97EB2F    /* HI((2^0*(1+21/32+1/64))^(1/3)) = 1.186865 */
> +        .long 0x3F98DC01    /* HI((2^0*(1+22/32+1/64))^(1/3)) = 1.194214 */
> +        .long 0x3F99C9E5    /* HI((2^0*(1+23/32+1/64))^(1/3)) = 1.201474 */
> +        .long 0x3F9AB4F2    /* HI((2^0*(1+24/32+1/64))^(1/3)) = 1.208647 */
> +        .long 0x3F9B9D3D    /* HI((2^0*(1+25/32+1/64))^(1/3)) = 1.215736 */
> +        .long 0x3F9C82DA    /* HI((2^0*(1+26/32+1/64))^(1/3)) = 1.222743 */
> +        .long 0x3F9D65DD    /* HI((2^0*(1+27/32+1/64))^(1/3)) = 1.229671 */
> +        .long 0x3F9E4659    /* HI((2^0*(1+28/32+1/64))^(1/3)) = 1.236522 */
> +        .long 0x3F9F245F    /* HI((2^0*(1+29/32+1/64))^(1/3)) = 1.243297 */
> +        .long 0x3FA00000    /* HI((2^0*(1+30/32+1/64))^(1/3)) = 1.25     */
> +        .long 0x3FA0D94C    /* HI((2^0*(1+31/32+1/64))^(1/3)) = 1.256631 */
> +        .long 0x3FA21B02    /* HI((2^1*(1+0/32+1/64))^(1/3)) = 1.266449 */
> +        .long 0x3FA3C059    /* HI((2^1*(1+1/32+1/64))^(1/3)) = 1.279307 */
> +        .long 0x3FA55D61    /* HI((2^1*(1+2/32+1/64))^(1/3)) = 1.291912 */
> +        .long 0x3FA6F282    /* HI((2^1*(1+3/32+1/64))^(1/3)) = 1.304276 */
> +        .long 0x3FA8801A    /* HI((2^1*(1+4/32+1/64))^(1/3)) = 1.316409 */
> +        .long 0x3FAA067E    /* HI((2^1*(1+5/32+1/64))^(1/3)) = 1.328323 */
> +        .long 0x3FAB8602    /* HI((2^1*(1+6/32+1/64))^(1/3)) = 1.340027 */
> +        .long 0x3FACFEEF    /* HI((2^1*(1+7/32+1/64))^(1/3)) = 1.35153  */
> +        .long 0x3FAE718E    /* HI((2^1*(1+8/32+1/64))^(1/3)) = 1.36284  */
> +        .long 0x3FAFDE1F    /* HI((2^1*(1+9/32+1/64))^(1/3)) = 1.373966 */
> +        .long 0x3FB144E1    /* HI((2^1*(1+10/32+1/64))^(1/3)) = 1.384915 */
> +        .long 0x3FB2A60D    /* HI((2^1*(1+11/32+1/64))^(1/3)) = 1.395692 */
> +        .long 0x3FB401DA    /* HI((2^1*(1+12/32+1/64))^(1/3)) = 1.406307 */
> +        .long 0x3FB5587B    /* HI((2^1*(1+13/32+1/64))^(1/3)) = 1.416763 */
> +        .long 0x3FB6AA20    /* HI((2^1*(1+14/32+1/64))^(1/3)) = 1.427067 */
> +        .long 0x3FB7F6F7    /* HI((2^1*(1+15/32+1/64))^(1/3)) = 1.437224 */
> +        .long 0x3FB93F29    /* HI((2^1*(1+16/32+1/64))^(1/3)) = 1.44724  */
> +        .long 0x3FBA82E1    /* HI((2^1*(1+17/32+1/64))^(1/3)) = 1.457119 */
> +        .long 0x3FBBC244    /* HI((2^1*(1+18/32+1/64))^(1/3)) = 1.466866 */
> +        .long 0x3FBCFD77    /* HI((2^1*(1+19/32+1/64))^(1/3)) = 1.476485 */
> +        .long 0x3FBE349B    /* HI((2^1*(1+20/32+1/64))^(1/3)) = 1.48598  */
> +        .long 0x3FBF67D3    /* HI((2^1*(1+21/32+1/64))^(1/3)) = 1.495356 */
> +        .long 0x3FC0973C    /* HI((2^1*(1+22/32+1/64))^(1/3)) = 1.504615 */
> +        .long 0x3FC1C2F6    /* HI((2^1*(1+23/32+1/64))^(1/3)) = 1.513762 */
> +        .long 0x3FC2EB1A    /* HI((2^1*(1+24/32+1/64))^(1/3)) = 1.5228   */
> +        .long 0x3FC40FC6    /* HI((2^1*(1+25/32+1/64))^(1/3)) = 1.531731 */
> +        .long 0x3FC53112    /* HI((2^1*(1+26/32+1/64))^(1/3)) = 1.54056  */
> +        .long 0x3FC64F16    /* HI((2^1*(1+27/32+1/64))^(1/3)) = 1.549289 */
> +        .long 0x3FC769EB    /* HI((2^1*(1+28/32+1/64))^(1/3)) = 1.55792  */
> +        .long 0x3FC881A6    /* HI((2^1*(1+29/32+1/64))^(1/3)) = 1.566457 */
> +        .long 0x3FC9965D    /* HI((2^1*(1+30/32+1/64))^(1/3)) = 1.574901 */
> +        .long 0x3FCAA825    /* HI((2^1*(1+31/32+1/64))^(1/3)) = 1.583256 */
> +        .long 0x3FCC3D79    /* HI((2^2*(1+0/32+1/64))^(1/3)) = 1.595626 */
> +        .long 0x3FCE5054    /* HI((2^2*(1+1/32+1/64))^(1/3)) = 1.611826 */
> +        .long 0x3FD058B8    /* HI((2^2*(1+2/32+1/64))^(1/3)) = 1.627707 */
> +        .long 0x3FD25726    /* HI((2^2*(1+3/32+1/64))^(1/3)) = 1.643285 */
> +        .long 0x3FD44C15    /* HI((2^2*(1+4/32+1/64))^(1/3)) = 1.658572 */
> +        .long 0x3FD637F2    /* HI((2^2*(1+5/32+1/64))^(1/3)) = 1.673582 */
> +        .long 0x3FD81B24    /* HI((2^2*(1+6/32+1/64))^(1/3)) = 1.688328 */
> +        .long 0x3FD9F60B    /* HI((2^2*(1+7/32+1/64))^(1/3)) = 1.702821 */
> +        .long 0x3FDBC8FE    /* HI((2^2*(1+8/32+1/64))^(1/3)) = 1.717071 */
> +        .long 0x3FDD9452    /* HI((2^2*(1+9/32+1/64))^(1/3)) = 1.731089 */
> +        .long 0x3FDF5853    /* HI((2^2*(1+10/32+1/64))^(1/3)) = 1.744883 */
> +        .long 0x3FE1154B    /* HI((2^2*(1+11/32+1/64))^(1/3)) = 1.758462 */
> +        .long 0x3FE2CB7F    /* HI((2^2*(1+12/32+1/64))^(1/3)) = 1.771835 */
> +        .long 0x3FE47B2E    /* HI((2^2*(1+13/32+1/64))^(1/3)) = 1.785009 */
> +        .long 0x3FE62496    /* HI((2^2*(1+14/32+1/64))^(1/3)) = 1.797992 */
> +        .long 0x3FE7C7F0    /* HI((2^2*(1+15/32+1/64))^(1/3)) = 1.810789 */
> +        .long 0x3FE96571    /* HI((2^2*(1+16/32+1/64))^(1/3)) = 1.823408 */
> +        .long 0x3FEAFD4C    /* HI((2^2*(1+17/32+1/64))^(1/3)) = 1.835855 */
> +        .long 0x3FEC8FB3    /* HI((2^2*(1+18/32+1/64))^(1/3)) = 1.848135 */
> +        .long 0x3FEE1CD3    /* HI((2^2*(1+19/32+1/64))^(1/3)) = 1.860255 */
> +        .long 0x3FEFA4D7    /* HI((2^2*(1+20/32+1/64))^(1/3)) = 1.872218 */
> +        .long 0x3FF127E9    /* HI((2^2*(1+21/32+1/64))^(1/3)) = 1.88403  */
> +        .long 0x3FF2A62F    /* HI((2^2*(1+22/32+1/64))^(1/3)) = 1.895697 */
> +        .long 0x3FF41FD0    /* HI((2^2*(1+23/32+1/64))^(1/3)) = 1.907221 */
> +        .long 0x3FF594EE    /* HI((2^2*(1+24/32+1/64))^(1/3)) = 1.918607 */
> +        .long 0x3FF705AC    /* HI((2^2*(1+25/32+1/64))^(1/3)) = 1.929861 */
> +        .long 0x3FF8722A    /* HI((2^2*(1+26/32+1/64))^(1/3)) = 1.940984 */
> +        .long 0x3FF9DA86    /* HI((2^2*(1+27/32+1/64))^(1/3)) = 1.951981 */
> +        .long 0x3FFB3EDE    /* HI((2^2*(1+28/32+1/64))^(1/3)) = 1.962856 */
> +        .long 0x3FFC9F4E    /* HI((2^2*(1+29/32+1/64))^(1/3)) = 1.973612 */
> +        .long 0x3FFDFBF2    /* HI((2^2*(1+30/32+1/64))^(1/3)) = 1.984251 */
> +        .long 0x3FFF54E3    /* HI((2^2*(1+31/32+1/64))^(1/3)) = 1.994778 */
> +        .align 16
> +        .long 0xBDE3A962, 0xBDE3A962, 0xBDE3A962, 0xBDE3A962  /* _sP2 */
> +        .align 16
> +        .long 0x3EAAAC91, 0x3EAAAC91, 0x3EAAAC91, 0x3EAAAC91  /* _sP1 */
> +        .align 16
> +        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff  /* _sMantissaMask (EXP_MSK3) */
> +        .align 16
> +        .long 0x007e0000, 0x007e0000, 0x007e0000, 0x007e0000  /* _sMantissaMask1 (SIG_MASK) */
> +        .align 16
> +        .long 0xBF800000, 0xBF800000, 0xBF800000, 0xBF800000  /* _sExpMask  (EXP_MASK) */
> +        .align 16
> +        .long 0xBF820000, 0xBF820000, 0xBF820000, 0xBF820000  /* _sExpMask1 (EXP_MASK2) */
> +        .align 16
> +        .long 0x0000007c, 0x0000007c, 0x0000007c, 0x0000007c  /* _iRcpIndexMask */
> +        .align 16
> +        .long 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff  /* _iBExpMask */
> +        .align 16
> +        .long 0x00000100, 0x00000100, 0x00000100, 0x00000100  /* _iSignMask */
> +        .align 16
> +        .long 0x00000055, 0x00000055, 0x00000055, 0x00000055  /* _iBias */
> +        .align 16
> +        .long 0x00000001, 0x00000001, 0x00000001, 0x00000001  /* _iOne */
> +        .align 16
> +        .long 0x00000555, 0x00000555, 0x00000555, 0x00000555  /* _i555 */
> +        .align 16
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff  /* _iAbsMask */
> +        .align 16
> +        .long 0x80800000, 0x80800000, 0x80800000, 0x80800000  /* _iSubConst */
> +        .align 16
> +        .long 0xFEFFFFFF, 0xFEFFFFFF, 0xFEFFFFFF, 0xFEFFFFFF  /* _iCmpConst */
> +        .align 16
> +        .type	__svml_scbrt_data_internal,@object
> +        .size	__svml_scbrt_data_internal,.-__svml_scbrt_data_internal
> +        .align 16
> +
> +.FLT_17:
> +        .long	0xffffffff,0x00000000,0xffffffff,0x00000000
> +        .type	.FLT_17,@object
> +        .size	.FLT_17,16
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core-sse.S
> new file mode 100644
> index 0000000000..8eaa457fa6
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized cbrtf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN8v_cbrtf _ZGVdN8v_cbrtf_sse_wrapper
> +#include "../svml_s_cbrtf8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core.c
> new file mode 100644
> index 0000000000..089d28461f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized cbrtf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN8v_cbrtf
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN8v_cbrtf, __GI__ZGVdN8v_cbrtf,
> +	       __redirect__ZGVdN8v_cbrtf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core_avx2.S
> new file mode 100644
> index 0000000000..acd20d9db8
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_cbrtf8_core_avx2.S
> @@ -0,0 +1,509 @@
> +/* Function cbrtf vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *     x=2^{3*k+j} * 1.b1 b2 ... b5 b6 ... b23
> + *     Let r=(x*2^{-3k-j} - 1.b1 b2 ... b5 1)* rcp[b1 b2 ..b5],
> + *     where rcp[b1 b2 .. b5]=1/(1.b1 b2 b3 b4 b5 1) in single precision
> + *     cbrtf(2^j * 1. b1 b2 .. b5 1) is approximated as T[j][b1..b5]+D[j][b1..b5]
> + *     (T stores the high 24 bits, D stores the low order bits)
> + *     Result=2^k*T+(2^k*T*r)*P+2^k*D
> + *      where P=p1+p2*r+..
> + *
> + */
> +
> +/* Offsets for data table __svml_scbrt_data_internal
> + */
> +#define _sRcp                         	0
> +#define _sCbrtHL                      	128
> +#define _sP2                          	512
> +#define _sP1                          	544
> +#define _sMantissaMask                	576
> +#define _sMantissaMask1               	608
> +#define _sExpMask                     	640
> +#define _sExpMask1                    	672
> +#define _iRcpIndexMask                	704
> +#define _iBExpMask                    	736
> +#define _iSignMask                    	768
> +#define _iBias                        	800
> +#define _iOne                         	832
> +#define _i555                         	864
> +#define _iAbsMask                     	896
> +#define _iSubConst                    	928
> +#define _iCmpConst                    	960
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN8v_cbrtf_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +
> +/* Load reciprocal value */
> +        lea       __svml_scbrt_data_internal(%rip), %rdx
> +        vmovaps   %ymm0, %ymm5
> +
> +/*
> + * Load constants
> + * Reciprocal index calculation
> + */
> +        vpsrld    $16, %ymm5, %ymm3
> +        vpand     _iRcpIndexMask+__svml_scbrt_data_internal(%rip), %ymm3, %ymm4
> +        vextractf128 $1, %ymm4, %xmm15
> +        vmovd     %xmm4, %eax
> +        vmovd     %xmm15, %r8d
> +        vpextrd   $1, %xmm15, %r9d
> +        vpextrd   $2, %xmm15, %r10d
> +        vpextrd   $3, %xmm15, %r11d
> +        movslq    %r8d, %r8
> +        movslq    %r9d, %r9
> +        movslq    %r10d, %r10
> +        movslq    %r11d, %r11
> +        vpextrd   $1, %xmm4, %ecx
> +        vpextrd   $2, %xmm4, %esi
> +        vpextrd   $3, %xmm4, %edi
> +        movslq    %eax, %rax
> +        movslq    %ecx, %rcx
> +        movslq    %esi, %rsi
> +        movslq    %edi, %rdi
> +        vmovd     (%rdx,%r8), %xmm13
> +        vmovd     (%rdx,%r9), %xmm14
> +        vmovd     (%rdx,%r10), %xmm1
> +        vmovd     (%rdx,%r11), %xmm0
> +        vpunpckldq %xmm14, %xmm13, %xmm2
> +        vpunpckldq %xmm0, %xmm1, %xmm13
> +
> +/* Get signed biased exponent */
> +        vpsrld    $7, %ymm3, %ymm0
> +        vmovd     (%rdx,%rax), %xmm6
> +        vmovd     (%rdx,%rcx), %xmm7
> +        vmovd     (%rdx,%rsi), %xmm8
> +        vmovd     (%rdx,%rdi), %xmm9
> +        vpunpckldq %xmm7, %xmm6, %xmm10
> +        vpunpckldq %xmm9, %xmm8, %xmm11
> +        vpunpcklqdq %xmm11, %xmm10, %xmm12
> +        vpunpcklqdq %xmm13, %xmm2, %xmm6
> +        vandps    _iAbsMask+__svml_scbrt_data_internal(%rip), %ymm5, %ymm3
> +
> +/* Argument reduction */
> +        vandps    _sMantissaMask+__svml_scbrt_data_internal(%rip), %ymm5, %ymm8
> +        vandps    _sMantissaMask1+__svml_scbrt_data_internal(%rip), %ymm5, %ymm9
> +        vpsubd    _iSubConst+__svml_scbrt_data_internal(%rip), %ymm3, %ymm7
> +        vorps     _sExpMask+__svml_scbrt_data_internal(%rip), %ymm8, %ymm10
> +        vorps     _sExpMask1+__svml_scbrt_data_internal(%rip), %ymm9, %ymm11
> +
> +/* r=y-y` */
> +        vsubps    %ymm11, %ymm10, %ymm15
> +
> +/* Biased exponent-1 */
> +        vpand     _iSignMask+__svml_scbrt_data_internal(%rip), %ymm0, %ymm8
> +        vpcmpgtd  _iCmpConst+__svml_scbrt_data_internal(%rip), %ymm7, %ymm2
> +        vmovmskps %ymm2, %eax
> +        vinsertf128 $1, %xmm6, %ymm12, %ymm14
> +
> +/* Get absolute biased exponent */
> +        vpand     _iBExpMask+__svml_scbrt_data_internal(%rip), %ymm0, %ymm6
> +
> +/* r=(y-y`)*rcp_table(y`) */
> +        vmulps    %ymm15, %ymm14, %ymm1
> +        vpsubd    _iOne+__svml_scbrt_data_internal(%rip), %ymm6, %ymm10
> +
> +/*
> + * Calculate exponent/3
> + * i555Exp=(2^{12}-1)/3*exponent
> + */
> +        vpmulld   _i555+__svml_scbrt_data_internal(%rip), %ymm6, %ymm3
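
The _i555 constant is (2^12 - 1)/3 = 0x555, so multiplying the biased
exponent by it and shifting right by 12 (the "Get K" step below)
approximates the divide-by-3 without an integer division; the three
subsequent vpsubd of K from (exponent - 1) then recover J = exponent - 1 - 3*K.
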
> +
> +/* Get K (exponent=3*k+j) */
> +        vpsrld    $12, %ymm3, %ymm13
> +
> +/* Get J */
> +        vpsubd    %ymm13, %ymm10, %ymm11
> +
> +/* Add 2/3*(bias-1)+1 to (k+1/3*(bias-1)) */
> +        vpaddd    _iBias+__svml_scbrt_data_internal(%rip), %ymm13, %ymm7
> +        vpsubd    %ymm13, %ymm11, %ymm12
> +
> +/* Attach sign to exponent */
> +        vpor      %ymm8, %ymm7, %ymm9
> +        vpsubd    %ymm13, %ymm12, %ymm14
> +        vpslld    $23, %ymm9, %ymm0
> +
> +/* Get 128*J */
> +        vpslld    $7, %ymm14, %ymm15
> +
> +/* iCbrtIndex=4*l+128*j */
> +        vpaddd    %ymm15, %ymm4, %ymm4
> +
> +/* Zero index if callout expected */
> +        vpandn    %ymm4, %ymm2, %ymm4
> +
> +/* Load Cbrt table Hi & Lo values */
> +        vmovd     %xmm4, %ecx
> +        vextractf128 $1, %ymm4, %xmm13
> +        vpextrd   $1, %xmm4, %esi
> +        movslq    %ecx, %rcx
> +        movslq    %esi, %rsi
> +        vmovd     %xmm13, %r9d
> +        vmovd     128(%rdx,%rcx), %xmm2
> +        vpextrd   $2, %xmm4, %edi
> +        vpextrd   $3, %xmm4, %r8d
> +        vmovd     128(%rdx,%rsi), %xmm3
> +        vpextrd   $1, %xmm13, %r10d
> +        vpextrd   $2, %xmm13, %ecx
> +        vpextrd   $3, %xmm13, %esi
> +        movslq    %edi, %rdi
> +        movslq    %r8d, %r8
> +        movslq    %r9d, %r9
> +        movslq    %r10d, %r10
> +        movslq    %ecx, %rcx
> +        movslq    %esi, %rsi
> +        vmovd     128(%rdx,%rdi), %xmm6
> +        vmovd     128(%rdx,%r8), %xmm7
> +        vmovd     128(%rdx,%r9), %xmm11
> +        vmovd     128(%rdx,%r10), %xmm12
> +        vmovd     128(%rdx,%rcx), %xmm14
> +        vmovd     128(%rdx,%rsi), %xmm15
> +        vpunpckldq %xmm3, %xmm2, %xmm8
> +        vpunpckldq %xmm7, %xmm6, %xmm9
> +        vpunpckldq %xmm12, %xmm11, %xmm4
> +        vpunpckldq %xmm15, %xmm14, %xmm11
> +        vpunpcklqdq %xmm9, %xmm8, %xmm10
> +        vpunpcklqdq %xmm11, %xmm4, %xmm2
> +        vinsertf128 $1, %xmm2, %ymm10, %ymm3
> +
> +/* sCbrtHi *= 2^k */
> +        vmulps    %ymm3, %ymm0, %ymm2
> +
> +/* Polynomial:    p1+r*(p2*r+r*(p3+r*p4)) */
> +        vmovups   _sP2+__svml_scbrt_data_internal(%rip), %ymm0
> +        vfmadd213ps _sP1+__svml_scbrt_data_internal(%rip), %ymm1, %ymm0
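
With AT&T operand order, this single vfmadd213ps evaluates
ymm0 = ymm1*ymm0 + _sP1, i.e. P = _sP2*r + _sP1; the p3/p4 terms mentioned
in the comment above have no entries in this data table, so only the
degree-1 polynomial in r is applied to T`*r.
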
> +
> +/* T`*r */
> +        vmulps    %ymm2, %ymm1, %ymm1
> +
> +/* (T`*r)*P */
> +        vmulps    %ymm1, %ymm0, %ymm0
> +
> +/*
> + * T`*r*P+D`
> + * result = T`+(T`*r*P+D`)
> + */
> +        vaddps    %ymm0, %ymm2, %ymm0
> +        testl     %eax, %eax
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm5
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %ymm5, 32(%rsp)
> +        vmovups   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0
> +
> +        xorl      %edx, %edx
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      cbrtf@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 64(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN8v_cbrtf_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_scbrt_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(32)) VUINT32 _sRcp[32][1];
> +        __declspec(align(32)) VUINT32 _sCbrtHL[96][1];
> +        __declspec(align(32)) VUINT32 _sP2[8][1];
> +        __declspec(align(32)) VUINT32 _sP1[8][1];
> +        __declspec(align(32)) VUINT32 _sMantissaMask[8][1];
> +        __declspec(align(32)) VUINT32 _sMantissaMask1[8][1];
> +        __declspec(align(32)) VUINT32 _sExpMask[8][1];
> +        __declspec(align(32)) VUINT32 _sExpMask1[8][1];
> +        __declspec(align(32)) VUINT32 _iRcpIndexMask[8][1];
> +        __declspec(align(32)) VUINT32 _iBExpMask[8][1];
> +        __declspec(align(32)) VUINT32 _iSignMask[8][1];
> +        __declspec(align(32)) VUINT32 _iBias[8][1];
> +        __declspec(align(32)) VUINT32 _iOne[8][1];
> +        __declspec(align(32)) VUINT32 _i555[8][1];
> +        __declspec(align(32)) VUINT32 _iAbsMask[8][1];
> +        __declspec(align(32)) VUINT32 _iSubConst[8][1];
> +        __declspec(align(32)) VUINT32 _iCmpConst[8][1];
> +} __svml_scbrt_data_internal;
> +#endif
> +__svml_scbrt_data_internal:
> +        /*== _sRcp ==*/
> +        .long 0xBF7C0FC1  /* (1/(1+0/32+1/64)) = -.984615 */
> +        .long 0xBF74898D  /* (1/(1+1/32+1/64)) = -.955224 */
> +        .long 0xBF6D7304  /* (1/(1+2/32+1/64)) = -.927536 */
> +        .long 0xBF66C2B4  /* (1/(1+3/32+1/64)) = -.901408 */
> +        .long 0xBF607038  /* (1/(1+4/32+1/64)) = -.876712 */
> +        .long 0xBF5A740E  /* (1/(1+5/32+1/64)) = -.853333 */
> +        .long 0xBF54C77B  /* (1/(1+6/32+1/64)) = -.831169 */
> +        .long 0xBF4F6475  /* (1/(1+7/32+1/64)) = -.810127 */
> +        .long 0xBF4A4588  /* (1/(1+8/32+1/64)) = -.790123 */
> +        .long 0xBF4565C8  /* (1/(1+9/32+1/64)) = -.771084 */
> +        .long 0xBF40C0C1  /* (1/(1+10/32+1/64)) = -.752941 */
> +        .long 0xBF3C5264  /* (1/(1+11/32+1/64)) = -.735632 */
> +        .long 0xBF381703  /* (1/(1+12/32+1/64)) = -.719101 */
> +        .long 0xBF340B41  /* (1/(1+13/32+1/64)) = -.703297 */
> +        .long 0xBF302C0B  /* (1/(1+14/32+1/64)) = -.688172 */
> +        .long 0xBF2C7692  /* (1/(1+15/32+1/64)) = -.673684 */
> +        .long 0xBF28E83F  /* (1/(1+16/32+1/64)) = -.659794 */
> +        .long 0xBF257EB5  /* (1/(1+17/32+1/64)) = -.646465 */
> +        .long 0xBF2237C3  /* (1/(1+18/32+1/64)) = -.633663 */
> +        .long 0xBF1F1166  /* (1/(1+19/32+1/64)) = -.621359 */
> +        .long 0xBF1C09C1  /* (1/(1+20/32+1/64)) = -.609524 */
> +        .long 0xBF191F1A  /* (1/(1+21/32+1/64)) = -.598131 */
> +        .long 0xBF164FDA  /* (1/(1+22/32+1/64)) = -.587156 */
> +        .long 0xBF139A86  /* (1/(1+23/32+1/64)) = -.576577 */
> +        .long 0xBF10FDBC  /* (1/(1+24/32+1/64)) = -.566372 */
> +        .long 0xBF0E7835  /* (1/(1+25/32+1/64)) = -.556522 */
> +        .long 0xBF0C08C1  /* (1/(1+26/32+1/64)) = -.547009 */
> +        .long 0xBF09AE41  /* (1/(1+27/32+1/64)) = -.537815 */
> +        .long 0xBF0767AB  /* (1/(1+28/32+1/64)) = -.528926 */
> +        .long 0xBF053408  /* (1/(1+29/32+1/64)) = -.520325 */
> +        .long 0xBF03126F  /* (1/(1+30/32+1/64)) = -.512    */
> +        .long 0xBF010204  /* (1/(1+31/32+1/64)) = -.503937 */
> +        /*== _sCbrtHL ==*/
> +        .align 32
> +        .long 0x3F80A9C9    /* HI((2^0*(1+0/32+1/64))^(1/3)) = 1.005181 */
> +        .long 0x3F81F833    /* HI((2^0*(1+1/32+1/64))^(1/3)) = 1.015387 */
> +        .long 0x3F834007    /* HI((2^0*(1+2/32+1/64))^(1/3)) = 1.025391 */
> +        .long 0x3F848194    /* HI((2^0*(1+3/32+1/64))^(1/3)) = 1.035204 */
> +        .long 0x3F85BD25    /* HI((2^0*(1+4/32+1/64))^(1/3)) = 1.044835 */
> +        .long 0x3F86F300    /* HI((2^0*(1+5/32+1/64))^(1/3)) = 1.054291 */
> +        .long 0x3F882365    /* HI((2^0*(1+6/32+1/64))^(1/3)) = 1.06358  */
> +        .long 0x3F894E90    /* HI((2^0*(1+7/32+1/64))^(1/3)) = 1.07271  */
> +        .long 0x3F8A74B9    /* HI((2^0*(1+8/32+1/64))^(1/3)) = 1.081687 */
> +        .long 0x3F8B9615    /* HI((2^0*(1+9/32+1/64))^(1/3)) = 1.090518 */
> +        .long 0x3F8CB2D4    /* HI((2^0*(1+10/32+1/64))^(1/3)) = 1.099207 */
> +        .long 0x3F8DCB24    /* HI((2^0*(1+11/32+1/64))^(1/3)) = 1.107762 */
> +        .long 0x3F8EDF31    /* HI((2^0*(1+12/32+1/64))^(1/3)) = 1.116186 */
> +        .long 0x3F8FEF22    /* HI((2^0*(1+13/32+1/64))^(1/3)) = 1.124485 */
> +        .long 0x3F90FB1F    /* HI((2^0*(1+14/32+1/64))^(1/3)) = 1.132664 */
> +        .long 0x3F92034C    /* HI((2^0*(1+15/32+1/64))^(1/3)) = 1.140726 */
> +        .long 0x3F9307CA    /* HI((2^0*(1+16/32+1/64))^(1/3)) = 1.148675 */
> +        .long 0x3F9408B9    /* HI((2^0*(1+17/32+1/64))^(1/3)) = 1.156516 */
> +        .long 0x3F950638    /* HI((2^0*(1+18/32+1/64))^(1/3)) = 1.164252 */
> +        .long 0x3F960064    /* HI((2^0*(1+19/32+1/64))^(1/3)) = 1.171887 */
> +        .long 0x3F96F759    /* HI((2^0*(1+20/32+1/64))^(1/3)) = 1.179423 */
> +        .long 0x3F97EB2F    /* HI((2^0*(1+21/32+1/64))^(1/3)) = 1.186865 */
> +        .long 0x3F98DC01    /* HI((2^0*(1+22/32+1/64))^(1/3)) = 1.194214 */
> +        .long 0x3F99C9E5    /* HI((2^0*(1+23/32+1/64))^(1/3)) = 1.201474 */
> +        .long 0x3F9AB4F2    /* HI((2^0*(1+24/32+1/64))^(1/3)) = 1.208647 */
> +        .long 0x3F9B9D3D    /* HI((2^0*(1+25/32+1/64))^(1/3)) = 1.215736 */
> +        .long 0x3F9C82DA    /* HI((2^0*(1+26/32+1/64))^(1/3)) = 1.222743 */
> +        .long 0x3F9D65DD    /* HI((2^0*(1+27/32+1/64))^(1/3)) = 1.229671 */
> +        .long 0x3F9E4659    /* HI((2^0*(1+28/32+1/64))^(1/3)) = 1.236522 */
> +        .long 0x3F9F245F    /* HI((2^0*(1+29/32+1/64))^(1/3)) = 1.243297 */
> +        .long 0x3FA00000    /* HI((2^0*(1+30/32+1/64))^(1/3)) = 1.25     */
> +        .long 0x3FA0D94C    /* HI((2^0*(1+31/32+1/64))^(1/3)) = 1.256631 */
> +        .long 0x3FA21B02    /* HI((2^1*(1+0/32+1/64))^(1/3)) = 1.266449 */
> +        .long 0x3FA3C059    /* HI((2^1*(1+1/32+1/64))^(1/3)) = 1.279307 */
> +        .long 0x3FA55D61    /* HI((2^1*(1+2/32+1/64))^(1/3)) = 1.291912 */
> +        .long 0x3FA6F282    /* HI((2^1*(1+3/32+1/64))^(1/3)) = 1.304276 */
> +        .long 0x3FA8801A    /* HI((2^1*(1+4/32+1/64))^(1/3)) = 1.316409 */
> +        .long 0x3FAA067E    /* HI((2^1*(1+5/32+1/64))^(1/3)) = 1.328323 */
> +        .long 0x3FAB8602    /* HI((2^1*(1+6/32+1/64))^(1/3)) = 1.340027 */
> +        .long 0x3FACFEEF    /* HI((2^1*(1+7/32+1/64))^(1/3)) = 1.35153  */
> +        .long 0x3FAE718E    /* HI((2^1*(1+8/32+1/64))^(1/3)) = 1.36284  */
> +        .long 0x3FAFDE1F    /* HI((2^1*(1+9/32+1/64))^(1/3)) = 1.373966 */
> +        .long 0x3FB144E1    /* HI((2^1*(1+10/32+1/64))^(1/3)) = 1.384915 */
> +        .long 0x3FB2A60D    /* HI((2^1*(1+11/32+1/64))^(1/3)) = 1.395692 */
> +        .long 0x3FB401DA    /* HI((2^1*(1+12/32+1/64))^(1/3)) = 1.406307 */
> +        .long 0x3FB5587B    /* HI((2^1*(1+13/32+1/64))^(1/3)) = 1.416763 */
> +        .long 0x3FB6AA20    /* HI((2^1*(1+14/32+1/64))^(1/3)) = 1.427067 */
> +        .long 0x3FB7F6F7    /* HI((2^1*(1+15/32+1/64))^(1/3)) = 1.437224 */
> +        .long 0x3FB93F29    /* HI((2^1*(1+16/32+1/64))^(1/3)) = 1.44724  */
> +        .long 0x3FBA82E1    /* HI((2^1*(1+17/32+1/64))^(1/3)) = 1.457119 */
> +        .long 0x3FBBC244    /* HI((2^1*(1+18/32+1/64))^(1/3)) = 1.466866 */
> +        .long 0x3FBCFD77    /* HI((2^1*(1+19/32+1/64))^(1/3)) = 1.476485 */
> +        .long 0x3FBE349B    /* HI((2^1*(1+20/32+1/64))^(1/3)) = 1.48598  */
> +        .long 0x3FBF67D3    /* HI((2^1*(1+21/32+1/64))^(1/3)) = 1.495356 */
> +        .long 0x3FC0973C    /* HI((2^1*(1+22/32+1/64))^(1/3)) = 1.504615 */
> +        .long 0x3FC1C2F6    /* HI((2^1*(1+23/32+1/64))^(1/3)) = 1.513762 */
> +        .long 0x3FC2EB1A    /* HI((2^1*(1+24/32+1/64))^(1/3)) = 1.5228   */
> +        .long 0x3FC40FC6    /* HI((2^1*(1+25/32+1/64))^(1/3)) = 1.531731 */
> +        .long 0x3FC53112    /* HI((2^1*(1+26/32+1/64))^(1/3)) = 1.54056  */
> +        .long 0x3FC64F16    /* HI((2^1*(1+27/32+1/64))^(1/3)) = 1.549289 */
> +        .long 0x3FC769EB    /* HI((2^1*(1+28/32+1/64))^(1/3)) = 1.55792  */
> +        .long 0x3FC881A6    /* HI((2^1*(1+29/32+1/64))^(1/3)) = 1.566457 */
> +        .long 0x3FC9965D    /* HI((2^1*(1+30/32+1/64))^(1/3)) = 1.574901 */
> +        .long 0x3FCAA825    /* HI((2^1*(1+31/32+1/64))^(1/3)) = 1.583256 */
> +        .long 0x3FCC3D79    /* HI((2^2*(1+0/32+1/64))^(1/3)) = 1.595626 */
> +        .long 0x3FCE5054    /* HI((2^2*(1+1/32+1/64))^(1/3)) = 1.611826 */
> +        .long 0x3FD058B8    /* HI((2^2*(1+2/32+1/64))^(1/3)) = 1.627707 */
> +        .long 0x3FD25726    /* HI((2^2*(1+3/32+1/64))^(1/3)) = 1.643285 */
> +        .long 0x3FD44C15    /* HI((2^2*(1+4/32+1/64))^(1/3)) = 1.658572 */
> +        .long 0x3FD637F2    /* HI((2^2*(1+5/32+1/64))^(1/3)) = 1.673582 */
> +        .long 0x3FD81B24    /* HI((2^2*(1+6/32+1/64))^(1/3)) = 1.688328 */
> +        .long 0x3FD9F60B    /* HI((2^2*(1+7/32+1/64))^(1/3)) = 1.702821 */
> +        .long 0x3FDBC8FE    /* HI((2^2*(1+8/32+1/64))^(1/3)) = 1.717071 */
> +        .long 0x3FDD9452    /* HI((2^2*(1+9/32+1/64))^(1/3)) = 1.731089 */
> +        .long 0x3FDF5853    /* HI((2^2*(1+10/32+1/64))^(1/3)) = 1.744883 */
> +        .long 0x3FE1154B    /* HI((2^2*(1+11/32+1/64))^(1/3)) = 1.758462 */
> +        .long 0x3FE2CB7F    /* HI((2^2*(1+12/32+1/64))^(1/3)) = 1.771835 */
> +        .long 0x3FE47B2E    /* HI((2^2*(1+13/32+1/64))^(1/3)) = 1.785009 */
> +        .long 0x3FE62496    /* HI((2^2*(1+14/32+1/64))^(1/3)) = 1.797992 */
> +        .long 0x3FE7C7F0    /* HI((2^2*(1+15/32+1/64))^(1/3)) = 1.810789 */
> +        .long 0x3FE96571    /* HI((2^2*(1+16/32+1/64))^(1/3)) = 1.823408 */
> +        .long 0x3FEAFD4C    /* HI((2^2*(1+17/32+1/64))^(1/3)) = 1.835855 */
> +        .long 0x3FEC8FB3    /* HI((2^2*(1+18/32+1/64))^(1/3)) = 1.848135 */
> +        .long 0x3FEE1CD3    /* HI((2^2*(1+19/32+1/64))^(1/3)) = 1.860255 */
> +        .long 0x3FEFA4D7    /* HI((2^2*(1+20/32+1/64))^(1/3)) = 1.872218 */
> +        .long 0x3FF127E9    /* HI((2^2*(1+21/32+1/64))^(1/3)) = 1.88403  */
> +        .long 0x3FF2A62F    /* HI((2^2*(1+22/32+1/64))^(1/3)) = 1.895697 */
> +        .long 0x3FF41FD0    /* HI((2^2*(1+23/32+1/64))^(1/3)) = 1.907221 */
> +        .long 0x3FF594EE    /* HI((2^2*(1+24/32+1/64))^(1/3)) = 1.918607 */
> +        .long 0x3FF705AC    /* HI((2^2*(1+25/32+1/64))^(1/3)) = 1.929861 */
> +        .long 0x3FF8722A    /* HI((2^2*(1+26/32+1/64))^(1/3)) = 1.940984 */
> +        .long 0x3FF9DA86    /* HI((2^2*(1+27/32+1/64))^(1/3)) = 1.951981 */
> +        .long 0x3FFB3EDE    /* HI((2^2*(1+28/32+1/64))^(1/3)) = 1.962856 */
> +        .long 0x3FFC9F4E    /* HI((2^2*(1+29/32+1/64))^(1/3)) = 1.973612 */
> +        .long 0x3FFDFBF2    /* HI((2^2*(1+30/32+1/64))^(1/3)) = 1.984251 */
> +        .long 0x3FFF54E3    /* HI((2^2*(1+31/32+1/64))^(1/3)) = 1.994778 */
> +        .align 32
> +        .long 0xBDE3A962, 0xBDE3A962, 0xBDE3A962, 0xBDE3A962, 0xBDE3A962, 0xBDE3A962, 0xBDE3A962, 0xBDE3A962  /* _sP2 */
> +        .align 32
> +        .long 0x3EAAAC91, 0x3EAAAC91, 0x3EAAAC91, 0x3EAAAC91, 0x3EAAAC91, 0x3EAAAC91, 0x3EAAAC91, 0x3EAAAC91  /* _sP1 */
> +        .align 32
> +        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff  /* _sMantissaMask (EXP_MSK3) */
> +        .align 32
> +        .long 0x007e0000, 0x007e0000, 0x007e0000, 0x007e0000, 0x007e0000, 0x007e0000, 0x007e0000, 0x007e0000  /* _sMantissaMask1 (SIG_MASK) */
> +        .align 32
> +        .long 0xBF800000, 0xBF800000, 0xBF800000, 0xBF800000, 0xBF800000, 0xBF800000, 0xBF800000, 0xBF800000  /* _sExpMask  (EXP_MASK) */
> +        .align 32
> +        .long 0xBF820000, 0xBF820000, 0xBF820000, 0xBF820000, 0xBF820000, 0xBF820000, 0xBF820000, 0xBF820000  /* _sExpMask1 (EXP_MASK2) */
> +        .align 32
> +        .long 0x0000007c, 0x0000007c, 0x0000007c, 0x0000007c, 0x0000007c, 0x0000007c, 0x0000007c, 0x0000007c  /* _iRcpIndexMask */
> +        .align 32
> +        .long 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff  /* _iBExpMask */
> +        .align 32
> +        .long 0x00000100, 0x00000100, 0x00000100, 0x00000100, 0x00000100, 0x00000100, 0x00000100, 0x00000100  /* _iSignMask */
> +        .align 32
> +        .long 0x00000055, 0x00000055, 0x00000055, 0x00000055, 0x00000055, 0x00000055, 0x00000055, 0x00000055  /* _iBias */
> +        .align 32
> +        .long 0x00000001, 0x00000001, 0x00000001, 0x00000001, 0x00000001, 0x00000001, 0x00000001, 0x00000001  /* _iOne */
> +        .align 32
> +        .long 0x00000555, 0x00000555, 0x00000555, 0x00000555, 0x00000555, 0x00000555, 0x00000555, 0x00000555  /* _i555 */
> +        .align 32
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff  /* _iAbsMask */
> +        .align 32
> +        .long 0x80800000, 0x80800000, 0x80800000, 0x80800000, 0x80800000, 0x80800000, 0x80800000, 0x80800000  /* _iSubConst */
> +        .align 32
> +        .long 0xFEFFFFFF, 0xFEFFFFFF, 0xFEFFFFFF, 0xFEFFFFFF, 0xFEFFFFFF, 0xFEFFFFFF, 0xFEFFFFFF, 0xFEFFFFFF  /* _iCmpConst */
> +        .align 32
> +        .type	__svml_scbrt_data_internal,@object
> +        .size	__svml_scbrt_data_internal,.-__svml_scbrt_data_internal
> diff --git a/sysdeps/x86_64/fpu/svml_d_cbrt2_core.S b/sysdeps/x86_64/fpu/svml_d_cbrt2_core.S
> new file mode 100644
> index 0000000000..4bf546564b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_cbrt2_core.S
> @@ -0,0 +1,29 @@
> +/* Function cbrt vectorized with SSE2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN2v_cbrt)
> +WRAPPER_IMPL_SSE2 cbrt
> +END (_ZGVbN2v_cbrt)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN2v_cbrt)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_cbrt4_core.S b/sysdeps/x86_64/fpu/svml_d_cbrt4_core.S
> new file mode 100644
> index 0000000000..e6d1003e27
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_cbrt4_core.S
> @@ -0,0 +1,29 @@
> +/* Function cbrt vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN4v_cbrt)
> +WRAPPER_IMPL_AVX _ZGVbN2v_cbrt
> +END (_ZGVdN4v_cbrt)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN4v_cbrt)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_cbrt4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_cbrt4_core_avx.S
> new file mode 100644
> index 0000000000..70632869ac
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_cbrt4_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function cbrt vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVcN4v_cbrt)
> +WRAPPER_IMPL_AVX _ZGVbN2v_cbrt
> +END (_ZGVcN4v_cbrt)
> diff --git a/sysdeps/x86_64/fpu/svml_d_cbrt8_core.S b/sysdeps/x86_64/fpu/svml_d_cbrt8_core.S
> new file mode 100644
> index 0000000000..37571673a7
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_cbrt8_core.S
> @@ -0,0 +1,25 @@
> +/* Function cbrt vectorized with AVX-512, wrapper to AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN8v_cbrt)
> +WRAPPER_IMPL_AVX512 _ZGVdN4v_cbrt
> +END (_ZGVeN8v_cbrt)
> diff --git a/sysdeps/x86_64/fpu/svml_s_cbrtf16_core.S b/sysdeps/x86_64/fpu/svml_s_cbrtf16_core.S
> new file mode 100644
> index 0000000000..1be6294026
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_cbrtf16_core.S
> @@ -0,0 +1,25 @@
> +/* Function cbrtf vectorized with AVX-512. Wrapper to AVX2 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN16v_cbrtf)
> +WRAPPER_IMPL_AVX512 _ZGVdN8v_cbrtf
> +END (_ZGVeN16v_cbrtf)
> diff --git a/sysdeps/x86_64/fpu/svml_s_cbrtf4_core.S b/sysdeps/x86_64/fpu/svml_s_cbrtf4_core.S
> new file mode 100644
> index 0000000000..2469a100f4
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_cbrtf4_core.S
> @@ -0,0 +1,29 @@
> +/* Function cbrtf vectorized with SSE2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN4v_cbrtf)
> +WRAPPER_IMPL_SSE2 cbrtf
> +END (_ZGVbN4v_cbrtf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN4v_cbrtf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_cbrtf8_core.S b/sysdeps/x86_64/fpu/svml_s_cbrtf8_core.S
> new file mode 100644
> index 0000000000..efedc22323
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_cbrtf8_core.S
> @@ -0,0 +1,29 @@
> +/* Function cbrtf vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN8v_cbrtf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_cbrtf
> +END (_ZGVdN8v_cbrtf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN8v_cbrtf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_cbrtf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_cbrtf8_core_avx.S
> new file mode 100644
> index 0000000000..b5acc62426
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_cbrtf8_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function cbrtf vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +        .text
> +ENTRY (_ZGVcN8v_cbrtf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_cbrtf
> +END (_ZGVcN8v_cbrtf)
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx.c
> new file mode 100644
> index 0000000000..c8bc643c99
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-cbrt.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx2.c
> new file mode 100644
> index 0000000000..c8bc643c99
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx2.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-cbrt.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx512f.c
> new file mode 100644
> index 0000000000..c8bc643c99
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-cbrt.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cbrt.c b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt.c
> new file mode 100644
> index 0000000000..fb3684b18c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-cbrt.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE double
> +#define LIBMVEC_FUNC cbrt
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> index db136cc901..b1981ac7e4 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> @@ -36,6 +36,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVbN2v_exp10)
>  VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVbN2v_cosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVbN2v_expm1)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVbN2v_sinh)
> +VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVbN2v_cbrt)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> index 5fc09ac8c0..47915a7e59 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> @@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVdN4v_exp10)
>  VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVdN4v_cosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVdN4v_expm1)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVdN4v_sinh)
> +VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVdN4v_cbrt)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m256i
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> index 26ef7fb365..5cd5049807 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> @@ -36,6 +36,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVcN4v_exp10)
>  VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVcN4v_cosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVcN4v_expm1)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVcN4v_sinh)
> +VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVcN4v_cbrt)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> index c7055fca76..83970739ab 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> @@ -36,6 +36,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVeN8v_exp10)
>  VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVeN8v_cosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVeN8v_expm1)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVeN8v_sinh)
> +VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVeN8v_cbrt)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m512i
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx.c
> new file mode 100644
> index 0000000000..59b8d77f71
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-cbrtf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx2.c
> new file mode 100644
> index 0000000000..59b8d77f71
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx2.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-cbrtf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx512f.c
> new file mode 100644
> index 0000000000..59b8d77f71
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-cbrtf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf.c b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf.c
> new file mode 100644
> index 0000000000..3a06ba79e0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-cbrtf.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE float
> +#define LIBMVEC_FUNC cbrtf
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> index d353bcb0f2..0420f11c28 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> @@ -36,6 +36,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVeN16v_exp10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVeN16v_coshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVeN16v_expm1f)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVeN16v_sinhf)
> +VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVeN16v_cbrtf)
>  
>  #define VEC_INT_TYPE __m512i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> index 5e59117626..c8f7580265 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> @@ -36,6 +36,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVbN4v_exp10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVbN4v_coshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVbN4v_expm1f)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVbN4v_sinhf)
> +VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVbN4v_cbrtf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> index e884a5f4df..b581796b88 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> @@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVdN8v_exp10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVdN8v_coshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVdN8v_expm1f)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVdN8v_sinhf)
> +VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVdN8v_cbrtf)
>  
>  /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
>  #undef VECTOR_WRAPPER_fFF
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> index 95910d39e9..f16789e5ff 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> @@ -36,6 +36,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVcN8v_exp10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVcN8v_coshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVcN8v_expm1f)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVcN8v_sinhf)
> +VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVcN8v_cbrtf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> -- 
> 2.31.1
> 

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.


H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 01/18] x86-64: Add vector atan/atanf implementation to libmvec
  2021-12-29  6:39 ` [PATCH v5 01/18] x86-64: Add vector atan/atanf implementation " Sunil K Pandey
@ 2021-12-29 21:25   ` H.J. Lu
  0 siblings, 0 replies; 40+ messages in thread
From: H.J. Lu @ 2021-12-29 21:25 UTC (permalink / raw)
  To: Sunil K Pandey; +Cc: libc-alpha, hjl.tools, andrey.kolesov, marius.cornea

On Tue, Dec 28, 2021 at 10:39:43PM -0800, Sunil K Pandey wrote:
> Implement vectorized atan/atanf for libmvec, with SSE, AVX, AVX2 and
> AVX512 versions, as per the vector ABI.  The patch also contains
> accuracy and ABI tests for vector atan/atanf with regenerated ulps.
> ---
>  bits/libm-simd-decl-stubs.h                   |  11 +
>  math/bits/mathcalls.h                         |   2 +-
>  .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
>  sysdeps/x86/fpu/bits/math-vector.h            |   4 +
>  .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
>  sysdeps/x86_64/fpu/Makeconfig                 |   1 +
>  sysdeps/x86_64/fpu/Versions                   |   2 +
>  sysdeps/x86_64/fpu/libm-test-ulps             |  20 ++
>  .../fpu/multiarch/svml_d_atan2_core-sse2.S    |  20 ++
>  .../x86_64/fpu/multiarch/svml_d_atan2_core.c  |  27 ++
>  .../fpu/multiarch/svml_d_atan2_core_sse4.S    | 245 ++++++++++++++++++
>  .../fpu/multiarch/svml_d_atan4_core-sse.S     |  20 ++
>  .../x86_64/fpu/multiarch/svml_d_atan4_core.c  |  27 ++
>  .../fpu/multiarch/svml_d_atan4_core_avx2.S    | 225 ++++++++++++++++
>  .../fpu/multiarch/svml_d_atan8_core-avx2.S    |  20 ++
>  .../x86_64/fpu/multiarch/svml_d_atan8_core.c  |  27 ++
>  .../fpu/multiarch/svml_d_atan8_core_avx512.S  | 213 +++++++++++++++
>  .../fpu/multiarch/svml_s_atanf16_core-avx2.S  |  20 ++
>  .../fpu/multiarch/svml_s_atanf16_core.c       |  28 ++
>  .../multiarch/svml_s_atanf16_core_avx512.S    | 174 +++++++++++++
>  .../fpu/multiarch/svml_s_atanf4_core-sse2.S   |  20 ++
>  .../x86_64/fpu/multiarch/svml_s_atanf4_core.c |  28 ++
>  .../fpu/multiarch/svml_s_atanf4_core_sse4.S   | 164 ++++++++++++
>  .../fpu/multiarch/svml_s_atanf8_core-sse.S    |  20 ++
>  .../x86_64/fpu/multiarch/svml_s_atanf8_core.c |  28 ++
>  .../fpu/multiarch/svml_s_atanf8_core_avx2.S   | 148 +++++++++++
>  sysdeps/x86_64/fpu/svml_d_atan2_core.S        |  29 +++
>  sysdeps/x86_64/fpu/svml_d_atan4_core.S        |  29 +++
>  sysdeps/x86_64/fpu/svml_d_atan4_core_avx.S    |  25 ++
>  sysdeps/x86_64/fpu/svml_d_atan8_core.S        |  25 ++
>  sysdeps/x86_64/fpu/svml_s_atanf16_core.S      |  25 ++
>  sysdeps/x86_64/fpu/svml_s_atanf4_core.S       |  29 +++
>  sysdeps/x86_64/fpu/svml_s_atanf8_core.S       |  29 +++
>  sysdeps/x86_64/fpu/svml_s_atanf8_core_avx.S   |  25 ++
>  .../x86_64/fpu/test-double-libmvec-atan-avx.c |   1 +
>  .../fpu/test-double-libmvec-atan-avx2.c       |   1 +
>  .../fpu/test-double-libmvec-atan-avx512f.c    |   1 +
>  sysdeps/x86_64/fpu/test-double-libmvec-atan.c |   3 +
>  .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
>  .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
>  .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
>  .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
>  .../x86_64/fpu/test-float-libmvec-atanf-avx.c |   1 +
>  .../fpu/test-float-libmvec-atanf-avx2.c       |   1 +
>  .../fpu/test-float-libmvec-atanf-avx512f.c    |   1 +
>  sysdeps/x86_64/fpu/test-float-libmvec-atanf.c |   3 +
>  .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
>  .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
>  .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
>  .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
>  50 files changed, 1741 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_atan2_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_atan4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_atan4_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_atan8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_atanf16_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_atanf4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_atanf8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_atanf8_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanf.c
> 
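Not part of the patch, just an illustration for readers of how the new
entry points get used.  With the SIMD declaration added to math-vector.h
below, the compiler's auto-vectorizer can emit calls to the _ZGV*_atan
variants for an ordinary atan() loop; the vlen test wrappers instead call
the ABI symbols directly.  A minimal sketch in C, assuming flags along the
lines of "gcc -O3 -march=x86-64-v3 -ffast-math" and linking with -lm (the
exact flags are my assumption, not taken from the patch):

    #include <math.h>
    #include <immintrin.h>

    /* Direct call to one ABI entry point, the way the vlen2 test
       wrapper does it; the 'b'/N2 double variant takes and returns
       __m128d.  */
    extern __m128d _ZGVbN2v_atan (__m128d);

    void
    apply_atan (double *restrict out, const double *restrict in, int n)
    {
      /* With -ffast-math and auto-vectorization enabled, this loop may
         be turned into calls to _ZGVdN4v_atan or _ZGVeN8v_atan.  */
      for (int i = 0; i < n; i++)
        out[i] = atan (in[i]);
    }

    __m128d
    atan_pair (__m128d x)
    {
      return _ZGVbN2v_atan (x);
    }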
> diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
> index 2ccdd1fc53..b4647ca918 100644
> --- a/bits/libm-simd-decl-stubs.h
> +++ b/bits/libm-simd-decl-stubs.h
> @@ -109,4 +109,15 @@
>  #define __DECL_SIMD_acosf32x
>  #define __DECL_SIMD_acosf64x
>  #define __DECL_SIMD_acosf128x
> +
> +#define __DECL_SIMD_atan
> +#define __DECL_SIMD_atanf
> +#define __DECL_SIMD_atanl
> +#define __DECL_SIMD_atanf16
> +#define __DECL_SIMD_atanf32
> +#define __DECL_SIMD_atanf64
> +#define __DECL_SIMD_atanf128
> +#define __DECL_SIMD_atanf32x
> +#define __DECL_SIMD_atanf64x
> +#define __DECL_SIMD_atanf128x
>  #endif
> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index 2cc6654208..3e27c21f21 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -54,7 +54,7 @@ __MATHCALL_VEC (acos,, (_Mdouble_ __x));
>  /* Arc sine of X.  */
>  __MATHCALL (asin,, (_Mdouble_ __x));
>  /* Arc tangent of X.  */
> -__MATHCALL (atan,, (_Mdouble_ __x));
> +__MATHCALL_VEC (atan,, (_Mdouble_ __x));
>  /* Arc tangent of Y/X.  */
>  __MATHCALL (atan2,, (_Mdouble_ __y, _Mdouble_ __x));
>  
> diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> index b37b55777e..a93258db6f 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> @@ -47,10 +47,18 @@ GLIBC_2.22 _ZGVeN8v_sin F
>  GLIBC_2.22 _ZGVeN8vv_pow F
>  GLIBC_2.22 _ZGVeN8vvv_sincos F
>  GLIBC_2.35 _ZGVbN2v_acos F
> +GLIBC_2.35 _ZGVbN2v_atan F
>  GLIBC_2.35 _ZGVbN4v_acosf F
> +GLIBC_2.35 _ZGVbN4v_atanf F
>  GLIBC_2.35 _ZGVcN4v_acos F
> +GLIBC_2.35 _ZGVcN4v_atan F
>  GLIBC_2.35 _ZGVcN8v_acosf F
> +GLIBC_2.35 _ZGVcN8v_atanf F
>  GLIBC_2.35 _ZGVdN4v_acos F
> +GLIBC_2.35 _ZGVdN4v_atan F
>  GLIBC_2.35 _ZGVdN8v_acosf F
> +GLIBC_2.35 _ZGVdN8v_atanf F
>  GLIBC_2.35 _ZGVeN16v_acosf F
> +GLIBC_2.35 _ZGVeN16v_atanf F
>  GLIBC_2.35 _ZGVeN8v_acos F
> +GLIBC_2.35 _ZGVeN8v_atan F
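The names added to the abilist above follow the x86_64 vector function ABI
mangling; a short decode of one of them (annotation only, not from the
patch):

    _ZGV e N 8 v _atan
     |   | | | |   |
     |   | | | |   scalar function name
     |   | | | one vector ("v") parameter
     |   | | vector length (8 doubles per call)
     |   | unmasked ("notinbranch") variant
     |   ISA class: b = SSE, c = AVX, d = AVX2, e = AVX-512
     vector-ABI prefix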
> diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
> index dabb74cbb9..1c0e5c5e35 100644
> --- a/sysdeps/x86/fpu/bits/math-vector.h
> +++ b/sysdeps/x86/fpu/bits/math-vector.h
> @@ -62,6 +62,10 @@
>  #  define __DECL_SIMD_acos __DECL_SIMD_x86_64
>  #  undef __DECL_SIMD_acosf
>  #  define __DECL_SIMD_acosf __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_atan
> +#  define __DECL_SIMD_atan __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_atanf
> +#  define __DECL_SIMD_atanf __DECL_SIMD_x86_64
>  
>  # endif
>  #endif
> diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> index 4bcbd1fbce..ddcccb11d7 100644
> --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> @@ -30,6 +30,8 @@
>  !GCC$ builtin (powf) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (acos) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (acosf) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (atan) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (atanf) attributes simd (notinbranch) if('x86_64')
>  
>  !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
> @@ -45,3 +47,5 @@
>  !GCC$ builtin (powf) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (acos) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (acosf) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (atan) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (atanf) attributes simd (notinbranch) if('x32')
> diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> index 7acf1f306c..dae0887f13 100644
> --- a/sysdeps/x86_64/fpu/Makeconfig
> +++ b/sysdeps/x86_64/fpu/Makeconfig
> @@ -23,6 +23,7 @@ postclean-generated += libmvec.mk
>  # Define for both math and mathvec directories.
>  libmvec-funcs = \
>    acos \
> +  atan \
>    cos \
>    exp \
>    log \
> diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> index 2985fe7ca7..424f6d526e 100644
> --- a/sysdeps/x86_64/fpu/Versions
> +++ b/sysdeps/x86_64/fpu/Versions
> @@ -15,6 +15,8 @@ libmvec {
>    }
>    GLIBC_2.35 {
>      _ZGVbN2v_acos; _ZGVcN4v_acos; _ZGVdN4v_acos; _ZGVeN8v_acos;
> +    _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan;
>      _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
> +    _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf;
>    }
>  }
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index 6c12976c82..2e64e59803 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -164,6 +164,26 @@ float: 2
>  float128: 2
>  ldouble: 1
>  
> +Function: "atan_vlen16":
> +float: 1
> +
> +Function: "atan_vlen2":
> +double: 1
> +
> +Function: "atan_vlen4":
> +double: 1
> +float: 1
> +
> +Function: "atan_vlen4_avx2":
> +double: 1
> +
> +Function: "atan_vlen8":
> +double: 1
> +float: 1
> +
> +Function: "atan_vlen8_avx2":
> +float: 1
> +
>  Function: "atanh":
>  double: 2
>  float: 2
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core-sse2.S
> new file mode 100644
> index 0000000000..115e5223aa
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized atan, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN2v_atan _ZGVbN2v_atan_sse2
> +#include "../svml_d_atan2_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core.c
> new file mode 100644
> index 0000000000..93f079ffcb
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized atan, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN2v_atan
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN2v_atan, __GI__ZGVbN2v_atan, __redirect__ZGVbN2v_atan)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core_sse4.S
> new file mode 100644
> index 0000000000..f0ad036b9e
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan2_core_sse4.S
> @@ -0,0 +1,245 @@
> +/* Function atan vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
> + *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
> + *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
> + *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
> + *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
> + *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
> + *
> + */
> +
> +/* Offsets for data table __svml_datan_data_internal_avx512
> + */
> +#define AbsMask                       	0
> +#define Shifter                       	16
> +#define MaxThreshold                  	32
> +#define MOne                          	48
> +#define One                           	64
> +#define LargeX                        	80
> +#define Zero                          	96
> +#define Tbl_H                         	112
> +#define Tbl_L                         	368
> +#define dIndexMed                     	624
> +#define Pi2                           	640
> +#define Pi2_low                       	656
> +#define coeff                         	672
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN2v_atan_sse4)
> +        lea       Tbl_H+128+__svml_datan_data_internal_avx512(%rip), %rcx
> +        movups    __svml_datan_data_internal_avx512(%rip), %xmm4
> +        movups    Shifter+__svml_datan_data_internal_avx512(%rip), %xmm3
> +        andps     %xmm0, %xmm4
> +        movaps    %xmm3, %xmm12
> +        movaps    %xmm4, %xmm5
> +        addpd     %xmm4, %xmm12
> +        movaps    %xmm12, %xmm7
> +
> +/*
> + * table lookup sequence
> + * VPERMUTE not available
> + */
> +        movaps    %xmm12, %xmm10
> +        subpd     %xmm3, %xmm7
> +        subpd     %xmm7, %xmm5
> +        mulpd     %xmm4, %xmm7
> +        movups    MaxThreshold+__svml_datan_data_internal_avx512(%rip), %xmm2
> +        psllq     $3, %xmm10
> +
> +/* saturate X range */
> +        movups    LargeX+__svml_datan_data_internal_avx512(%rip), %xmm8
> +        pxor      %xmm4, %xmm0
> +        cmplepd   %xmm4, %xmm2
> +        addpd     One+__svml_datan_data_internal_avx512(%rip), %xmm7
> +        minpd     %xmm4, %xmm8
> +        movups    MOne+__svml_datan_data_internal_avx512(%rip), %xmm6
> +        movaps    %xmm2, %xmm1
> +        movaps    %xmm2, %xmm9
> +        andnps    %xmm5, %xmm1
> +        andps     %xmm2, %xmm6
> +        andnps    %xmm7, %xmm9
> +        andps     %xmm2, %xmm8
> +        orps      %xmm6, %xmm1
> +        orps      %xmm8, %xmm9
> +
> +/* R+Rl = DiffX/Y */
> +        divpd     %xmm9, %xmm1
> +        pand      .FLT_11(%rip), %xmm10
> +
> +/* set table value to Pi/2 for large X */
> +        movups    Pi2+__svml_datan_data_internal_avx512(%rip), %xmm4
> +        movd      %xmm10, %eax
> +        andps     %xmm2, %xmm4
> +        pshufd    $2, %xmm10, %xmm11
> +        movaps    %xmm2, %xmm10
> +
> +/* polynomial evaluation */
> +        movaps    %xmm1, %xmm2
> +        mulpd     %xmm1, %xmm2
> +        movd      %xmm11, %edx
> +        movups    coeff+__svml_datan_data_internal_avx512(%rip), %xmm5
> +        movaps    %xmm2, %xmm7
> +        movups    coeff+32+__svml_datan_data_internal_avx512(%rip), %xmm6
> +        movaps    %xmm2, %xmm9
> +        mulpd     %xmm2, %xmm5
> +        mulpd     %xmm2, %xmm7
> +        addpd     coeff+16+__svml_datan_data_internal_avx512(%rip), %xmm5
> +        mulpd     %xmm2, %xmm6
> +        mulpd     %xmm7, %xmm5
> +        addpd     coeff+48+__svml_datan_data_internal_avx512(%rip), %xmm6
> +        mulpd     %xmm1, %xmm9
> +        addpd     %xmm5, %xmm6
> +        movups    coeff+64+__svml_datan_data_internal_avx512(%rip), %xmm8
> +        mulpd     %xmm2, %xmm8
> +        mulpd     %xmm6, %xmm7
> +        addpd     coeff+80+__svml_datan_data_internal_avx512(%rip), %xmm8
> +        addpd     %xmm7, %xmm8
> +        mulpd     %xmm8, %xmm9
> +        movups    dIndexMed+__svml_datan_data_internal_avx512(%rip), %xmm14
> +        cmplepd   %xmm12, %xmm14
> +        addpd     %xmm9, %xmm1
> +        movslq    %eax, %rax
> +        movaps    %xmm14, %xmm3
> +        movslq    %edx, %rdx
> +        movsd     -128(%rax,%rcx), %xmm13
> +        movsd     (%rcx,%rax), %xmm15
> +        movhpd    -128(%rdx,%rcx), %xmm13
> +        movhpd    (%rcx,%rdx), %xmm15
> +        andnps    %xmm13, %xmm3
> +        andps     %xmm14, %xmm15
> +        orps      %xmm15, %xmm3
> +        andnps    %xmm3, %xmm10
> +        orps      %xmm4, %xmm10
> +        addpd     %xmm1, %xmm10
> +        pxor      %xmm10, %xmm0
> +        ret
> +
> +END(_ZGVbN2v_atan_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_datan_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(16)) VUINT32 AbsMask[2][2];
> +        __declspec(align(16)) VUINT32 Shifter[2][2];
> +        __declspec(align(16)) VUINT32 MaxThreshold[2][2];
> +        __declspec(align(16)) VUINT32 MOne[2][2];
> +        __declspec(align(16)) VUINT32 One[2][2];
> +        __declspec(align(16)) VUINT32 LargeX[2][2];
> +        __declspec(align(16)) VUINT32 Zero[2][2];
> +        __declspec(align(16)) VUINT32 Tbl_H[32][2];
> +        __declspec(align(16)) VUINT32 Tbl_L[32][2];
> +        __declspec(align(16)) VUINT32 dIndexMed[2][2];
> +        __declspec(align(16)) VUINT32 Pi2[2][2];
> +        __declspec(align(16)) VUINT32 Pi2_low[2][2];
> +        __declspec(align(16)) VUINT32 coeff[6][2][2];
> +    } __svml_datan_data_internal_avx512;
> +#endif
> +__svml_datan_data_internal_avx512:
> +        /*== AbsMask ==*/
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff
> +        /*== Shifter ==*/
> +        .align 16
> +        .quad 0x4318000000000000, 0x4318000000000000
> +        /*== MaxThreshold ==*/
> +        .align 16
> +        .quad 0x401f800000000000, 0x401f800000000000
> +        /*== MOne ==*/
> +        .align 16
> +        .quad 0xbff0000000000000, 0xbff0000000000000
> +        /*== One ==*/
> +        .align 16
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== LargeX ==*/
> +        .align 16
> +        .quad 0x47f0000000000000, 0x47f0000000000000
> +        /*== Zero ==*/
> +        .align 16
> +        .quad 0x0000000000000000, 0x0000000000000000
> +        /*== Tbl_H ==*/
> +        .align 16
> +        .quad 0x0000000000000000, 0x3fcf5b75f92c80dd
> +        .quad 0x3fddac670561bb4f, 0x3fe4978fa3269ee1
> +        .quad 0x3fe921fb54442d18, 0x3fecac7c57846f9e
> +        .quad 0x3fef730bd281f69b, 0x3ff0d38f2c5ba09f
> +        .quad 0x3ff1b6e192ebbe44, 0x3ff270ef55a53a25
> +        .quad 0x3ff30b6d796a4da8, 0x3ff38d6a6ce13353
> +        .quad 0x3ff3fc176b7a8560, 0x3ff45b54837351a0
> +        .quad 0x3ff4ae10fc6589a5, 0x3ff4f68dea672617
> +        .quad 0x3ff5368c951e9cfd, 0x3ff56f6f33a3e6a7
> +        .quad 0x3ff5a25052114e60, 0x3ff5d013c41adabd
> +        .quad 0x3ff5f97315254857, 0x3ff61f06c6a92b89
> +        .quad 0x3ff6414d44094c7c, 0x3ff660b02c736a06
> +        .quad 0x3ff67d8863bc99bd, 0x3ff698213a9d5053
> +        .quad 0x3ff6b0bae830c070, 0x3ff6c78c7edeb195
> +        .quad 0x3ff6dcc57bb565fd, 0x3ff6f08f07435fec
> +        .quad 0x3ff7030cf9403197, 0x3ff7145eac2088a4
> +        /*== Tbl_L ==*/
> +        .align 16
> +        .quad 0x0000000000000000, 0x3c68ab6e3cf7afbd
> +        .quad 0x3c7a2b7f222f65e2, 0x3c72419a87f2a458
> +        .quad 0x3c81a62633145c07, 0x3c80dae13ad18a6b
> +        .quad 0x3c7007887af0cbbd, 0xbc9bd0dc231bfd70
> +        .quad 0x3c9b1b466a88828e, 0xbc9a66b1af5f84fb
> +        .quad 0x3c96254cb03bb199, 0xbc812c77e8a80f5c
> +        .quad 0xbc4441a3bd3f1084, 0x3c79e4a72eedacc4
> +        .quad 0xbc93b03e8a27f555, 0x3c9934f9f2b0020e
> +        .quad 0xbc996f47948a99f1, 0xbc7df6edd6f1ec3b
> +        .quad 0x3c78c2d0c89de218, 0x3c9f82bba194dd5d
> +        .quad 0xbc831151a43b51ca, 0xbc8487d50bceb1a5
> +        .quad 0xbc9c5f60a65c7397, 0xbc7acb6afb332a0f
> +        .quad 0xbc99b7bd2e1e8c9c, 0xbc9b9839085189e3
> +        .quad 0xbc97d1ab82ffb70b, 0x3c99239ad620ffe2
> +        .quad 0xbc929c86447928e7, 0xbc8957a7170df016
> +        .quad 0xbc7cbe1896221608, 0xbc9fda5797b32a0b
> +        /*== dIndexMed ==*/
> +        .align 16
> +        .quad 0x4318000000000010, 0x4318000000000010
> +        /*== Pi2 ==*/
> +        .align 16
> +        .quad 0x3ff921fb54442d18, 0x3ff921fb54442d18
> +        /*== Pi2_low ==*/
> +        .align 16
> +        .quad 0x3c91a62633145c07, 0x3c91a62633145c07
> +        /*== coeff6 ==*/
> +        .align 16
> +        .quad 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97
> +        .quad 0xbfb74257c46790cc, 0xbfb74257c46790cc
> +        .quad 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0
> +        .quad 0xbfc249248eef04da, 0xbfc249248eef04da
> +        .quad 0x3fc999999998741e, 0x3fc999999998741e
> +        .quad 0xbfd555555555554d, 0xbfd555555555554d
> +        .align 16
> +        .type	__svml_datan_data_internal_avx512,@object
> +        .size	__svml_datan_data_internal_avx512,.-__svml_datan_data_internal_avx512
> +        .align 16
> +
> +.FLT_11:
> +        .long	0x00000078,0x00000000,0x00000078,0x00000000
> +        .type	.FLT_11,@object
> +        .size	.FLT_11,16
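The ALGORITHM DESCRIPTION block at the top of this file is easier to follow
next to a scalar version.  Below is a minimal C sketch of the five-range
reduction it describes; this is an illustration only, since the assembly
above actually uses the 32-entry Tbl_H table selected through the Shifter
constant and a degree-11 polynomial rather than these toy coefficients:

    #include <math.h>

    static double
    atan_sketch (double x)
    {
      double ax = fabs (x);
      /* Base points b and precomputed atan(b) for the first four ranges.  */
      static const double base[] = { 0.0, 0.5, 1.0, 1.5 };
      static const double atan_base[] =
        { 0.0, 0.46364760900080612, 0.78539816339744831, 0.98279372324732907 };
      double hi, s;

      if (ax >= 39.0 / 16.0)
        {
          hi = 1.5707963267948966;      /* atan(inf) = pi/2 */
          s = -1.0 / ax;
        }
      else
        {
          int i = (ax < 7.0 / 16.0) ? 0
                  : (ax < 11.0 / 16.0) ? 1
                  : (ax < 19.0 / 16.0) ? 2 : 3;
          hi = atan_base[i];
          s = (ax - base[i]) / (1.0 + base[i] * ax);
        }
      /* atan(s) ~= s + s^3*P(s^2); a short Taylor tail stands in for the
         degree-11 polynomial used by the assembly.  */
      double s2 = s * s;
      double p = s2 * (-1.0 / 3.0 + s2 * (1.0 / 5.0 + s2 * (-1.0 / 7.0)));
      double r = hi + (s + s * p);
      return copysign (r, x);           /* atan is odd */
    }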
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core-sse.S
> new file mode 100644
> index 0000000000..79c48dbc91
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized atan, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN4v_atan _ZGVdN4v_atan_sse_wrapper
> +#include "../svml_d_atan4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core.c
> new file mode 100644
> index 0000000000..64ce66b9fd
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized atan, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN4v_atan
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN4v_atan, __GI__ZGVdN4v_atan, __redirect__ZGVdN4v_atan)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core_avx2.S
> new file mode 100644
> index 0000000000..50336514d7
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan4_core_avx2.S
> @@ -0,0 +1,225 @@
> +/* Function atan vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
> + *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
> + *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
> + *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
> + *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
> + *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
> + *
> + */
> +
> +/* Offsets for data table __svml_datan_data_internal_avx512
> + */
> +#define AbsMask                       	0
> +#define Shifter                       	32
> +#define MaxThreshold                  	64
> +#define MOne                          	96
> +#define One                           	128
> +#define LargeX                        	160
> +#define Zero                          	192
> +#define Tbl_H                         	224
> +#define Tbl_L                         	480
> +#define dIndexMed                     	736
> +#define Pi2                           	768
> +#define Pi2_low                       	800
> +#define coeff                         	832
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN4v_atan_avx2)
> +        lea       Tbl_H+128+__svml_datan_data_internal_avx512(%rip), %rdi
> +        vmovupd   Shifter+__svml_datan_data_internal_avx512(%rip), %ymm4
> +        vmovupd   One+__svml_datan_data_internal_avx512(%rip), %ymm9
> +
> +/* saturate X range */
> +        vmovupd   LargeX+__svml_datan_data_internal_avx512(%rip), %ymm6
> +        vandpd    __svml_datan_data_internal_avx512(%rip), %ymm0, %ymm7
> +        vaddpd    %ymm4, %ymm7, %ymm2
> +        vcmpge_oqpd MaxThreshold+__svml_datan_data_internal_avx512(%rip), %ymm7, %ymm3
> +        vminpd    %ymm7, %ymm6, %ymm10
> +        vsubpd    %ymm4, %ymm2, %ymm5
> +
> +/*
> + * table lookup sequence
> + * VPERMUTE not available
> + */
> +        vpsllq    $3, %ymm2, %ymm13
> +        vsubpd    %ymm5, %ymm7, %ymm8
> +        vcmpge_oqpd dIndexMed+__svml_datan_data_internal_avx512(%rip), %ymm2, %ymm2
> +        vfmadd231pd %ymm7, %ymm5, %ymm9
> +        vpand     .FLT_11(%rip), %ymm13, %ymm14
> +        vblendvpd %ymm3, MOne+__svml_datan_data_internal_avx512(%rip), %ymm8, %ymm11
> +        vblendvpd %ymm3, %ymm10, %ymm9, %ymm12
> +        vxorpd    %ymm0, %ymm7, %ymm1
> +
> +/* R+Rl = DiffX/Y */
> +        vdivpd    %ymm12, %ymm11, %ymm0
> +        vextractf128 $1, %ymm14, %xmm4
> +        vmovd     %xmm14, %eax
> +        vmovd     %xmm4, %ecx
> +        movslq    %eax, %rax
> +        vpextrd   $2, %xmm14, %edx
> +        movslq    %ecx, %rcx
> +        vpextrd   $2, %xmm4, %esi
> +        movslq    %edx, %rdx
> +        movslq    %esi, %rsi
> +        vmovsd    -128(%rax,%rdi), %xmm15
> +        vmovsd    (%rdi,%rax), %xmm7
> +        vmovsd    -128(%rcx,%rdi), %xmm5
> +        vmovsd    (%rdi,%rcx), %xmm9
> +        vmovhpd   -128(%rdx,%rdi), %xmm15, %xmm15
> +        vmovhpd   (%rdi,%rdx), %xmm7, %xmm8
> +        vmovhpd   -128(%rsi,%rdi), %xmm5, %xmm6
> +        vmovhpd   (%rdi,%rsi), %xmm9, %xmm10
> +
> +/* polynomial evaluation */
> +        vmulpd    %ymm0, %ymm0, %ymm5
> +        vmulpd    %ymm5, %ymm5, %ymm4
> +        vinsertf128 $1, %xmm6, %ymm15, %ymm11
> +        vinsertf128 $1, %xmm10, %ymm8, %ymm12
> +        vblendvpd %ymm2, %ymm12, %ymm11, %ymm13
> +        vmovupd   coeff+__svml_datan_data_internal_avx512(%rip), %ymm8
> +        vmovupd   coeff+64+__svml_datan_data_internal_avx512(%rip), %ymm2
> +        vmulpd    %ymm5, %ymm0, %ymm6
> +        vfmadd213pd coeff+32+__svml_datan_data_internal_avx512(%rip), %ymm5, %ymm8
> +        vfmadd213pd coeff+96+__svml_datan_data_internal_avx512(%rip), %ymm5, %ymm2
> +
> +/* set table value to Pi/2 for large X */
> +        vblendvpd %ymm3, Pi2+__svml_datan_data_internal_avx512(%rip), %ymm13, %ymm7
> +        vmovupd   coeff+128+__svml_datan_data_internal_avx512(%rip), %ymm3
> +        vfmadd213pd %ymm2, %ymm4, %ymm8
> +        vfmadd213pd coeff+160+__svml_datan_data_internal_avx512(%rip), %ymm3, %ymm5
> +        vfmadd213pd %ymm5, %ymm4, %ymm8
> +        vfmadd213pd %ymm0, %ymm6, %ymm8
> +        vaddpd    %ymm8, %ymm7, %ymm0
> +        vxorpd    %ymm1, %ymm0, %ymm0
> +        ret
> +
> +END(_ZGVdN4v_atan_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +.FLT_11:
> +        .long	0x00000078,0x00000000,0x00000078,0x00000000,0x00000078,0x00000000,0x00000078,0x00000000
> +        .type	.FLT_11,@object
> +        .size	.FLT_11,32
> +        .align 32
> +
> +#ifdef __svml_datan_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(32)) VUINT32 AbsMask[4][2];
> +        __declspec(align(32)) VUINT32 Shifter[4][2];
> +        __declspec(align(32)) VUINT32 MaxThreshold[4][2];
> +        __declspec(align(32)) VUINT32 MOne[4][2];
> +        __declspec(align(32)) VUINT32 One[4][2];
> +        __declspec(align(32)) VUINT32 LargeX[4][2];
> +        __declspec(align(32)) VUINT32 Zero[4][2];
> +        __declspec(align(32)) VUINT32 Tbl_H[32][2];
> +        __declspec(align(32)) VUINT32 Tbl_L[32][2];
> +        __declspec(align(32)) VUINT32 dIndexMed[4][2];
> +        __declspec(align(32)) VUINT32 Pi2[4][2];
> +        __declspec(align(32)) VUINT32 Pi2_low[4][2];
> +        __declspec(align(32)) VUINT32 coeff[6][4][2];
> +    } __svml_datan_data_internal_avx512;
> +#endif
> +__svml_datan_data_internal_avx512:
> +        /*== AbsMask ==*/
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
> +        /*== Shifter ==*/
> +        .align 32
> +        .quad 0x4318000000000000, 0x4318000000000000, 0x4318000000000000, 0x4318000000000000
> +        /*== MaxThreshold ==*/
> +        .align 32
> +        .quad 0x401f800000000000, 0x401f800000000000, 0x401f800000000000, 0x401f800000000000
> +        /*== MOne ==*/
> +        .align 32
> +        .quad 0xbff0000000000000, 0xbff0000000000000, 0xbff0000000000000, 0xbff0000000000000
> +        /*== One ==*/
> +        .align 32
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== LargeX ==*/
> +        .align 32
> +        .quad 0x47f0000000000000, 0x47f0000000000000, 0x47f0000000000000, 0x47f0000000000000
> +        /*== Zero ==*/
> +        .align 32
> +        .quad 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000
> +        /*== Tbl_H ==*/
> +        .align 32
> +        .quad 0x0000000000000000, 0x3fcf5b75f92c80dd
> +        .quad 0x3fddac670561bb4f, 0x3fe4978fa3269ee1
> +        .quad 0x3fe921fb54442d18, 0x3fecac7c57846f9e
> +        .quad 0x3fef730bd281f69b, 0x3ff0d38f2c5ba09f
> +        .quad 0x3ff1b6e192ebbe44, 0x3ff270ef55a53a25
> +        .quad 0x3ff30b6d796a4da8, 0x3ff38d6a6ce13353
> +        .quad 0x3ff3fc176b7a8560, 0x3ff45b54837351a0
> +        .quad 0x3ff4ae10fc6589a5, 0x3ff4f68dea672617
> +        .quad 0x3ff5368c951e9cfd, 0x3ff56f6f33a3e6a7
> +        .quad 0x3ff5a25052114e60, 0x3ff5d013c41adabd
> +        .quad 0x3ff5f97315254857, 0x3ff61f06c6a92b89
> +        .quad 0x3ff6414d44094c7c, 0x3ff660b02c736a06
> +        .quad 0x3ff67d8863bc99bd, 0x3ff698213a9d5053
> +        .quad 0x3ff6b0bae830c070, 0x3ff6c78c7edeb195
> +        .quad 0x3ff6dcc57bb565fd, 0x3ff6f08f07435fec
> +        .quad 0x3ff7030cf9403197, 0x3ff7145eac2088a4
> +        /*== Tbl_L ==*/
> +        .align 32
> +        .quad 0x0000000000000000, 0x3c68ab6e3cf7afbd
> +        .quad 0x3c7a2b7f222f65e2, 0x3c72419a87f2a458
> +        .quad 0x3c81a62633145c07, 0x3c80dae13ad18a6b
> +        .quad 0x3c7007887af0cbbd, 0xbc9bd0dc231bfd70
> +        .quad 0x3c9b1b466a88828e, 0xbc9a66b1af5f84fb
> +        .quad 0x3c96254cb03bb199, 0xbc812c77e8a80f5c
> +        .quad 0xbc4441a3bd3f1084, 0x3c79e4a72eedacc4
> +        .quad 0xbc93b03e8a27f555, 0x3c9934f9f2b0020e
> +        .quad 0xbc996f47948a99f1, 0xbc7df6edd6f1ec3b
> +        .quad 0x3c78c2d0c89de218, 0x3c9f82bba194dd5d
> +        .quad 0xbc831151a43b51ca, 0xbc8487d50bceb1a5
> +        .quad 0xbc9c5f60a65c7397, 0xbc7acb6afb332a0f
> +        .quad 0xbc99b7bd2e1e8c9c, 0xbc9b9839085189e3
> +        .quad 0xbc97d1ab82ffb70b, 0x3c99239ad620ffe2
> +        .quad 0xbc929c86447928e7, 0xbc8957a7170df016
> +        .quad 0xbc7cbe1896221608, 0xbc9fda5797b32a0b
> +        /*== dIndexMed ==*/
> +        .align 32
> +        .quad 0x4318000000000010, 0x4318000000000010, 0x4318000000000010, 0x4318000000000010
> +        /*== Pi2 ==*/
> +        .align 32
> +        .quad 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18
> +        /*== Pi2_low ==*/
> +        .align 32
> +        .quad 0x3c91a62633145c07, 0x3c91a62633145c07, 0x3c91a62633145c07, 0x3c91a62633145c07
> +        /*== coeff6 ==*/
> +        .align 32
> +        .quad 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97
> +        .quad 0xbfb74257c46790cc, 0xbfb74257c46790cc, 0xbfb74257c46790cc, 0xbfb74257c46790cc
> +        .quad 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0
> +        .quad 0xbfc249248eef04da, 0xbfc249248eef04da, 0xbfc249248eef04da, 0xbfc249248eef04da
> +        .quad 0x3fc999999998741e, 0x3fc999999998741e, 0x3fc999999998741e, 0x3fc999999998741e
> +        .quad 0xbfd555555555554d, 0xbfd555555555554d, 0xbfd555555555554d, 0xbfd555555555554d
> +        .align 32
> +        .type	__svml_datan_data_internal_avx512,@object
> +        .size	__svml_datan_data_internal_avx512,.-__svml_datan_data_internal_avx512
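Like the SSE4 version, this AVX2 path derives its Tbl_H index without a
vector permute by adding the Shifter constant (0x4318000000000000, i.e.
1.5*2^50, whose ulp is 0.25): the addition rounds |x| to the nearest
multiple of 0.25 and leaves that multiple in the low mantissa bits, which
the following shift-left-by-3 turns into a byte offset.  A C sketch of the
trick as I read the code, with the offset and masking details simplified:

    #include <stdint.h>
    #include <string.h>

    /* Return round (ax / 0.25) for small non-negative ax, the way the
       kernels do it: bias by 1.5*2^50 so the quantized value lands in
       the low mantissa bits of the sum.  */
    static unsigned
    table_index (double ax)
    {
      const double shifter = 0x1.8p50;  /* 0x4318000000000000 */
      double biased = ax + shifter;     /* rounds ax to a multiple of 0.25 */
      uint64_t bits;
      memcpy (&bits, &biased, sizeof bits);
      /* 0x1f covers the 32-entry table; inputs at or above MaxThreshold
         are handled separately with Pi/2, so larger indices never get
         used.  The assembly instead shifts left by 3 for a byte offset
         and masks with .FLT_11.  */
      return (unsigned) (bits & 0x1f);
    }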
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core-avx2.S
> new file mode 100644
> index 0000000000..723734e10b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized atan, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN8v_atan _ZGVeN8v_atan_avx2_wrapper
> +#include "../svml_d_atan8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core.c
> new file mode 100644
> index 0000000000..e97a41b6bc
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized atan, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN8v_atan
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN8v_atan, __GI__ZGVeN8v_atan, __redirect__ZGVeN8v_atan)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core_avx512.S
> new file mode 100644
> index 0000000000..fa6cb47308
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan8_core_avx512.S
> @@ -0,0 +1,213 @@
> +/* Function atan vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
> + *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
> + *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
> + *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
> + *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
> + *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
> + *
> + */
> +
> +/* Offsets for data table __svml_datan_data_internal_avx512
> + */
> +#define AbsMask                       	0
> +#define Shifter                       	64
> +#define MaxThreshold                  	128
> +#define MOne                          	192
> +#define One                           	256
> +#define LargeX                        	320
> +#define Zero                          	384
> +#define Tbl_H                         	448
> +#define dIndexMed                     	704
> +#define Pi2                           	768
> +#define coeff_1                       	832
> +#define coeff_2                       	896
> +#define coeff_3                       	960
> +#define coeff_4                       	1024
> +#define coeff_5                       	1088
> +#define coeff_6                       	1152
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN8v_atan_skx)
> +        vmovups   Shifter+__svml_datan_data_internal_avx512(%rip), %zmm4
> +        vmovups   MaxThreshold+__svml_datan_data_internal_avx512(%rip), %zmm3
> +        vmovups   One+__svml_datan_data_internal_avx512(%rip), %zmm9
> +
> +/* saturate X range */
> +        vmovups   LargeX+__svml_datan_data_internal_avx512(%rip), %zmm7
> +        vandpd    __svml_datan_data_internal_avx512(%rip), %zmm0, %zmm8
> +
> +/* R+Rl = DiffX/Y */
> +        vbroadcastsd .FLT_10(%rip), %zmm15
> +        vaddpd    {rn-sae}, %zmm4, %zmm8, %zmm2
> +        vxorpd    %zmm0, %zmm8, %zmm1
> +        vcmppd    $29, {sae}, %zmm3, %zmm8, %k2
> +
> +/* round to 2 bits after binary point */
> +        vreducepd $40, {sae}, %zmm8, %zmm6
> +        vsubpd    {rn-sae}, %zmm4, %zmm2, %zmm5
> +
> +/*
> + * if|X|>=MaxThreshold, set DiffX=-1
> + * VMSUB(D, DiffX, LargeMask, Zero, One);
> + */
> +        vblendmpd MOne+__svml_datan_data_internal_avx512(%rip), %zmm6, %zmm10{%k2}
> +        vfmadd231pd {rn-sae}, %zmm8, %zmm5, %zmm9
> +        vmovups   dIndexMed+__svml_datan_data_internal_avx512(%rip), %zmm5
> +
> +/* table lookup sequence */
> +        vmovups   Tbl_H+__svml_datan_data_internal_avx512(%rip), %zmm6
> +        vgetmantpd $0, {sae}, %zmm10, %zmm14
> +        vgetexppd {sae}, %zmm10, %zmm11
> +        vmovups   coeff_5+__svml_datan_data_internal_avx512(%rip), %zmm10
> +
> +/*
> + * if|X|>=MaxThreshold, set Y=X
> + * VMADD(D, Y, LargeMask, X, Zero);
> + */
> +        vminpd    {sae}, %zmm8, %zmm7, %zmm9{%k2}
> +        vcmppd    $29, {sae}, %zmm5, %zmm2, %k1
> +        vmovups   Tbl_H+128+__svml_datan_data_internal_avx512(%rip), %zmm7
> +        vmovups   coeff_1+__svml_datan_data_internal_avx512(%rip), %zmm8
> +        vgetmantpd $0, {sae}, %zmm9, %zmm3
> +        vgetexppd {sae}, %zmm9, %zmm12
> +        vmovups   coeff_3+__svml_datan_data_internal_avx512(%rip), %zmm9
> +        vpermt2pd Tbl_H+64+__svml_datan_data_internal_avx512(%rip), %zmm2, %zmm6
> +        vsubpd    {rn-sae}, %zmm12, %zmm11, %zmm4
> +        vpermt2pd Tbl_H+192+__svml_datan_data_internal_avx512(%rip), %zmm2, %zmm7
> +        vrcp14pd  %zmm3, %zmm13
> +        vmovups   coeff_4+__svml_datan_data_internal_avx512(%rip), %zmm12
> +        vmovups   coeff_6+__svml_datan_data_internal_avx512(%rip), %zmm11
> +        vblendmpd %zmm7, %zmm6, %zmm2{%k1}
> +        vmulpd    {rn-sae}, %zmm13, %zmm14, %zmm0
> +        vfnmadd231pd {rn-sae}, %zmm3, %zmm13, %zmm15
> +        vfnmadd213pd {rn-sae}, %zmm14, %zmm0, %zmm3
> +        vfmadd213pd {rn-sae}, %zmm15, %zmm15, %zmm15
> +        vfmadd213pd {rn-sae}, %zmm13, %zmm13, %zmm15
> +        vfmadd213pd {rn-sae}, %zmm0, %zmm15, %zmm3
> +        vscalefpd {rn-sae}, %zmm4, %zmm3, %zmm0
> +
> +/* set table value to Pi/2 for large X */
> +        vblendmpd Pi2+__svml_datan_data_internal_avx512(%rip), %zmm2, %zmm3{%k2}
> +        vmovups   coeff_2+__svml_datan_data_internal_avx512(%rip), %zmm2
> +
> +/* polynomial evaluation */
> +        vmulpd    {rn-sae}, %zmm0, %zmm0, %zmm14
> +        vmulpd    {rn-sae}, %zmm14, %zmm14, %zmm13
> +        vmulpd    {rn-sae}, %zmm0, %zmm14, %zmm15
> +        vfmadd231pd {rn-sae}, %zmm14, %zmm8, %zmm2
> +        vfmadd231pd {rn-sae}, %zmm14, %zmm9, %zmm12
> +        vfmadd213pd {rn-sae}, %zmm11, %zmm10, %zmm14
> +        vfmadd213pd {rn-sae}, %zmm12, %zmm13, %zmm2
> +        vfmadd213pd {rn-sae}, %zmm14, %zmm13, %zmm2
> +        vfmadd213pd {rn-sae}, %zmm0, %zmm15, %zmm2
> +        vaddpd    {rn-sae}, %zmm3, %zmm2, %zmm0
> +        vxorpd    %zmm1, %zmm0, %zmm0
> +        ret
> +
> +END(_ZGVeN8v_atan_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_datan_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 AbsMask[8][2];
> +        __declspec(align(64)) VUINT32 Shifter[8][2];
> +        __declspec(align(64)) VUINT32 MaxThreshold[8][2];
> +        __declspec(align(64)) VUINT32 MOne[8][2];
> +        __declspec(align(64)) VUINT32 One[8][2];
> +        __declspec(align(64)) VUINT32 LargeX[8][2];
> +        __declspec(align(64)) VUINT32 Zero[8][2];
> +        __declspec(align(64)) VUINT32 Tbl_H[32][2];
> +        __declspec(align(64)) VUINT32 dIndexMed[8][2];
> +        __declspec(align(64)) VUINT32 Pi2[8][2];
> +        __declspec(align(64)) VUINT32 coeff[6][8][2];
> +    } __svml_datan_data_internal_avx512;
> +#endif
> +__svml_datan_data_internal_avx512:
> +        /*== AbsMask ==*/
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
> +        /*== Shifter ==*/
> +        .align 64
> +        .quad 0x4318000000000000, 0x4318000000000000, 0x4318000000000000, 0x4318000000000000, 0x4318000000000000, 0x4318000000000000, 0x4318000000000000, 0x4318000000000000
> +        /*== MaxThreshold ==*/
> +        .align 64
> +        .quad 0x401f800000000000, 0x401f800000000000, 0x401f800000000000, 0x401f800000000000, 0x401f800000000000, 0x401f800000000000, 0x401f800000000000, 0x401f800000000000
> +        /*== MOne ==*/
> +        .align 64
> +        .quad 0xbff0000000000000, 0xbff0000000000000, 0xbff0000000000000, 0xbff0000000000000, 0xbff0000000000000, 0xbff0000000000000, 0xbff0000000000000, 0xbff0000000000000
> +        /*== One ==*/
> +        .align 64
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== LargeX ==*/
> +        .align 64
> +        .quad 0x47f0000000000000, 0x47f0000000000000, 0x47f0000000000000, 0x47f0000000000000, 0x47f0000000000000, 0x47f0000000000000, 0x47f0000000000000, 0x47f0000000000000
> +        /*== Zero ==*/
> +        .align 64
> +        .quad 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000
> +        /*== Tbl_H ==*/
> +        .align 64
> +        .quad 0x0000000000000000, 0x3fcf5b75f92c80dd
> +        .quad 0x3fddac670561bb4f, 0x3fe4978fa3269ee1
> +        .quad 0x3fe921fb54442d18, 0x3fecac7c57846f9e
> +        .quad 0x3fef730bd281f69b, 0x3ff0d38f2c5ba09f
> +        .quad 0x3ff1b6e192ebbe44, 0x3ff270ef55a53a25
> +        .quad 0x3ff30b6d796a4da8, 0x3ff38d6a6ce13353
> +        .quad 0x3ff3fc176b7a8560, 0x3ff45b54837351a0
> +        .quad 0x3ff4ae10fc6589a5, 0x3ff4f68dea672617
> +        .quad 0x3ff5368c951e9cfd, 0x3ff56f6f33a3e6a7
> +        .quad 0x3ff5a25052114e60, 0x3ff5d013c41adabd
> +        .quad 0x3ff5f97315254857, 0x3ff61f06c6a92b89
> +        .quad 0x3ff6414d44094c7c, 0x3ff660b02c736a06
> +        .quad 0x3ff67d8863bc99bd, 0x3ff698213a9d5053
> +        .quad 0x3ff6b0bae830c070, 0x3ff6c78c7edeb195
> +        .quad 0x3ff6dcc57bb565fd, 0x3ff6f08f07435fec
> +        .quad 0x3ff7030cf9403197, 0x3ff7145eac2088a4
> +        /*== dIndexMed ==*/
> +        .align 64
> +        .quad 0x4318000000000010, 0x4318000000000010, 0x4318000000000010, 0x4318000000000010, 0x4318000000000010, 0x4318000000000010, 0x4318000000000010, 0x4318000000000010
> +        /*== Pi2 ==*/
> +        .align 64
> +        .quad 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18, 0x3ff921fb54442d18
> +        /*== coeff6 ==*/
> +        .align 64
> +        .quad 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97, 0x3fb2e9b9f5c4fe97
> +        .quad 0xbfb74257c46790cc, 0xbfb74257c46790cc, 0xbfb74257c46790cc, 0xbfb74257c46790cc, 0xbfb74257c46790cc, 0xbfb74257c46790cc, 0xbfb74257c46790cc, 0xbfb74257c46790cc
> +        .quad 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0, 0x3fbc71bfeff916a0
> +        .quad 0xbfc249248eef04da, 0xbfc249248eef04da, 0xbfc249248eef04da, 0xbfc249248eef04da, 0xbfc249248eef04da, 0xbfc249248eef04da, 0xbfc249248eef04da, 0xbfc249248eef04da
> +        .quad 0x3fc999999998741e, 0x3fc999999998741e, 0x3fc999999998741e, 0x3fc999999998741e, 0x3fc999999998741e, 0x3fc999999998741e, 0x3fc999999998741e, 0x3fc999999998741e
> +        .quad 0xbfd555555555554d, 0xbfd555555555554d, 0xbfd555555555554d, 0xbfd555555555554d, 0xbfd555555555554d, 0xbfd555555555554d, 0xbfd555555555554d, 0xbfd555555555554d
> +        .align 64
> +        .type	__svml_datan_data_internal_avx512,@object
> +        .size	__svml_datan_data_internal_avx512,.-__svml_datan_data_internal_avx512
> +        .align 8
> +
> +.FLT_10:
> +        .long	0x00000000,0x3ff00000
> +        .type	.FLT_10,@object
> +        .size	.FLT_10,8
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core-avx2.S
> new file mode 100644
> index 0000000000..27623cdf16
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized atanf.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN16v_atanf _ZGVeN16v_atanf_avx2_wrapper
> +#include "../svml_s_atanf16_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core.c
> new file mode 100644
> index 0000000000..940de26615
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized atanf, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN16v_atanf
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN16v_atanf, __GI__ZGVeN16v_atanf,
> +	       __redirect__ZGVeN16v_atanf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core_avx512.S
> new file mode 100644
> index 0000000000..4a37f03e69
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf16_core_avx512.S
> @@ -0,0 +1,174 @@
> +/* Function atanf vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
> + *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
> + *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
> + *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
> + *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
> + *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
> + *
> + */
> +
> +/* Offsets for data table __svml_satan_data_internal_avx512
> + */
> +#define AbsMask                       	0
> +#define Shifter                       	64
> +#define MaxThreshold                  	128
> +#define MOne                          	192
> +#define One                           	256
> +#define LargeX                        	320
> +#define Zero                          	384
> +#define Tbl_H                         	448
> +#define Pi2                           	576
> +#define coeff_1                       	640
> +#define coeff_2                       	704
> +#define coeff_3                       	768
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN16v_atanf_skx)
> +        vandps    __svml_satan_data_internal_avx512(%rip), %zmm0, %zmm7
> +        vmovups   MaxThreshold+__svml_satan_data_internal_avx512(%rip), %zmm3
> +        vmovups   One+__svml_satan_data_internal_avx512(%rip), %zmm8
> +
> +/* round to 2 bits after binary point */
> +        vreduceps $40, {sae}, %zmm7, %zmm5
> +
> +/* saturate X range */
> +        vmovups   LargeX+__svml_satan_data_internal_avx512(%rip), %zmm6
> +        vmovups   Shifter+__svml_satan_data_internal_avx512(%rip), %zmm2
> +        vcmpps    $29, {sae}, %zmm3, %zmm7, %k1
> +
> +/* table lookup sequence */
> +        vmovups   Tbl_H+__svml_satan_data_internal_avx512(%rip), %zmm3
> +        vsubps    {rn-sae}, %zmm5, %zmm7, %zmm4
> +        vaddps    {rn-sae}, %zmm2, %zmm7, %zmm1
> +        vxorps    %zmm0, %zmm7, %zmm0
> +        vfmadd231ps {rn-sae}, %zmm7, %zmm4, %zmm8
> +        vmovups   coeff_2+__svml_satan_data_internal_avx512(%rip), %zmm4
> +
> +/* if|X|>=MaxThreshold, set DiffX=-1 */
> +        vblendmps MOne+__svml_satan_data_internal_avx512(%rip), %zmm5, %zmm9{%k1}
> +        vmovups   coeff_3+__svml_satan_data_internal_avx512(%rip), %zmm5
> +
> +/* if|X|>=MaxThreshold, set Y=X */
> +        vminps    {sae}, %zmm7, %zmm6, %zmm8{%k1}
> +
> +/* R+Rl = DiffX/Y */
> +        vgetmantps $0, {sae}, %zmm9, %zmm12
> +        vgetexpps {sae}, %zmm9, %zmm10
> +        vpermt2ps Tbl_H+64+__svml_satan_data_internal_avx512(%rip), %zmm1, %zmm3
> +        vgetmantps $0, {sae}, %zmm8, %zmm15
> +        vgetexpps {sae}, %zmm8, %zmm11
> +        vmovups   coeff_1+__svml_satan_data_internal_avx512(%rip), %zmm1
> +
> +/* set table value to Pi/2 for large X */
> +        vblendmps Pi2+__svml_satan_data_internal_avx512(%rip), %zmm3, %zmm9{%k1}
> +        vrcp14ps  %zmm15, %zmm13
> +        vsubps    {rn-sae}, %zmm11, %zmm10, %zmm2
> +        vmulps    {rn-sae}, %zmm13, %zmm12, %zmm14
> +        vfnmadd213ps {rn-sae}, %zmm12, %zmm14, %zmm15
> +        vfmadd213ps {rn-sae}, %zmm14, %zmm13, %zmm15
> +        vscalefps {rn-sae}, %zmm2, %zmm15, %zmm7
> +
> +/* polynomial evaluation */
> +        vmulps    {rn-sae}, %zmm7, %zmm7, %zmm8
> +        vmulps    {rn-sae}, %zmm7, %zmm8, %zmm6
> +        vfmadd231ps {rn-sae}, %zmm8, %zmm1, %zmm4
> +        vfmadd213ps {rn-sae}, %zmm5, %zmm4, %zmm8
> +        vfmadd213ps {rn-sae}, %zmm7, %zmm6, %zmm8
> +        vaddps    {rn-sae}, %zmm9, %zmm8, %zmm10
> +        vxorps    %zmm0, %zmm10, %zmm0
> +        ret
> +
> +END(_ZGVeN16v_atanf_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_satan_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 AbsMask[16][1];
> +        __declspec(align(64)) VUINT32 Shifter[16][1];
> +        __declspec(align(64)) VUINT32 MaxThreshold[16][1];
> +        __declspec(align(64)) VUINT32 MOne[16][1];
> +        __declspec(align(64)) VUINT32 One[16][1];
> +        __declspec(align(64)) VUINT32 LargeX[16][1];
> +        __declspec(align(64)) VUINT32 Zero[16][1];
> +        __declspec(align(64)) VUINT32 Tbl_H[32][1];
> +        __declspec(align(64)) VUINT32 Pi2[16][1];
> +        __declspec(align(64)) VUINT32 coeff[3][16][1];
> +    } __svml_satan_data_internal_avx512;
> +#endif
> +__svml_satan_data_internal_avx512:
> +        /*== AbsMask ==*/
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> +        /*== Shifter ==*/
> +        .align 64
> +        .long 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000, 0x4a000000
> +        /*== MaxThreshold ==*/
> +        .align 64
> +        .long 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000, 0x40F80000
> +        /*== MOne ==*/
> +        .align 64
> +        .long 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000, 0xbf800000
> +        /*== One ==*/
> +        .align 64
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== LargeX ==*/
> +        .align 64
> +        .long 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000, 0x4f800000
> +        /*== Zero ==*/
> +        .align 64
> +        .long 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000
> +        /*== Tbl_H ==*/
> +        .align 64
> +        .long 0x00000000, 0x3e7adbb0
> +        .long 0x3eed6338, 0x3f24bc7d
> +        .long 0x3f490fdb, 0x3f6563e3
> +        .long 0x3f7b985f, 0x3f869c79
> +        .long 0x3f8db70d, 0x3f93877b
> +        .long 0x3f985b6c, 0x3f9c6b53
> +        .long 0x3f9fe0bb, 0x3fa2daa4
> +        .long 0x3fa57088, 0x3fa7b46f
> +        .long 0x3fa9b465, 0x3fab7b7a
> +        .long 0x3fad1283, 0x3fae809e
> +        .long 0x3fafcb99, 0x3fb0f836
> +        .long 0x3fb20a6a, 0x3fb30581
> +        .long 0x3fb3ec43, 0x3fb4c10a
> +        .long 0x3fb585d7, 0x3fb63c64
> +        .long 0x3fb6e62c, 0x3fb78478
> +        .long 0x3fb81868, 0x3fb8a2f5
> +        /*== Pi2 ==*/
> +        .align 64
> +        .long 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB, 0x3fc90FDB
> +        /*== coeff3 ==*/
> +        .align 64
> +        .long 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de, 0xbe0fa8de
> +        .long 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2, 0x3e4cc8e2
> +        .long 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa, 0xbeaaaaaa
> +        .align 64
> +        .type	__svml_satan_data_internal_avx512,@object
> +        .size	__svml_satan_data_internal_avx512,.-__svml_satan_data_internal_avx512
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core-sse2.S
> new file mode 100644
> index 0000000000..fe81170666
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized atanf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN4v_atanf _ZGVbN4v_atanf_sse2
> +#include "../svml_s_atanf4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core.c
> new file mode 100644
> index 0000000000..975ece6812
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized atanf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN4v_atanf
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN4v_atanf, __GI__ZGVbN4v_atanf,
> +	       __redirect__ZGVbN4v_atanf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core_sse4.S
> new file mode 100644
> index 0000000000..c58a894e10
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf4_core_sse4.S
> @@ -0,0 +1,164 @@
> +/* Function atanf vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
> + *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
> + *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
> + *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
> + *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
> + *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
> + *
> + */
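> +
> +/* This SSE4 kernel (and the AVX2 one later in the series) uses a simpler
> +   two-branch reduction than the table scheme above; see the 1)/2)/3)
> +   comment in the function body.  A minimal scalar C sketch, for reference
> +   only: the helper name is arbitrary, and the libm atanf call stands in
> +   for the odd degree-17 polynomial built from _sPC0.._sPC8.
> +
> +     #include <math.h>
> +
> +     static float
> +     atanf_branch_sketch (float x)
> +     {
> +       float r, pio2;
> +       if (fabsf (x) <= 1.0f)
> +         {
> +           r = x;                        // atan(x) = P(x)
> +           pio2 = 0.0f;
> +         }
> +       else
> +         {
> +           r = -1.0f / x;                // atan(x) = +-Pi/2 + P(-1/x)
> +           pio2 = copysignf ((float) M_PI_2, x);
> +         }
> +       return pio2 + atanf (r);          // P approximated by atanf here
> +     }
> + */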
> +
> +/* Offsets for data table __svml_satan_data_internal
> + */
> +#define _sSIGN_MASK                   	0
> +#define _sABS_MASK                    	16
> +#define _sONE                         	32
> +#define _sPIO2                        	48
> +#define _sPC8                         	64
> +#define _sPC7                         	80
> +#define _sPC6                         	96
> +#define _sPC5                         	112
> +#define _sPC4                         	128
> +#define _sPC3                         	144
> +#define _sPC2                         	160
> +#define _sPC1                         	176
> +#define _sPC0                         	192
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN4v_atanf_sse4)
> +/*
> + * To use minps\maxps operations for argument reduction
> + * uncomment _AT_USEMINMAX_ definition
> + *  Declarations
> + * Variables
> + * Constants
> + */
> +        movups    _sABS_MASK+__svml_satan_data_internal(%rip), %xmm2
> +
> +/*
> + * 1) If x>1,      then r=-1/x, PIO2=Pi/2
> + * 2) If -1<=x<=1, then r=x,    PIO2=0
> + * 3) If x<-1,     then r=-1/x, PIO2=-Pi/2
> + */
> +        movups    _sONE+__svml_satan_data_internal(%rip), %xmm1
> +        andps     %xmm0, %xmm2
> +        movaps    %xmm2, %xmm9
> +        movaps    %xmm1, %xmm3
> +        cmpleps   %xmm1, %xmm9
> +        maxps     %xmm2, %xmm3
> +        minps     %xmm2, %xmm1
> +        divps     %xmm3, %xmm1
> +        movups    __svml_satan_data_internal(%rip), %xmm4
> +        movaps    %xmm9, %xmm10
> +        andps     %xmm4, %xmm0
> +        andnps    %xmm4, %xmm9
> +        pxor      %xmm0, %xmm9
> +        pxor      %xmm1, %xmm9
> +
> +/* Polynomial. */
> +        movaps    %xmm9, %xmm8
> +        mulps     %xmm9, %xmm8
> +        movaps    %xmm8, %xmm7
> +        mulps     %xmm8, %xmm7
> +        movups    _sPC8+__svml_satan_data_internal(%rip), %xmm6
> +        mulps     %xmm7, %xmm6
> +        movups    _sPC7+__svml_satan_data_internal(%rip), %xmm5
> +        mulps     %xmm7, %xmm5
> +        addps     _sPC6+__svml_satan_data_internal(%rip), %xmm6
> +        mulps     %xmm7, %xmm6
> +        addps     _sPC5+__svml_satan_data_internal(%rip), %xmm5
> +        mulps     %xmm7, %xmm5
> +        addps     _sPC4+__svml_satan_data_internal(%rip), %xmm6
> +        mulps     %xmm7, %xmm6
> +        addps     _sPC3+__svml_satan_data_internal(%rip), %xmm5
> +        mulps     %xmm5, %xmm7
> +        addps     _sPC2+__svml_satan_data_internal(%rip), %xmm6
> +        mulps     %xmm8, %xmm6
> +        addps     _sPC1+__svml_satan_data_internal(%rip), %xmm7
> +        andnps    _sPIO2+__svml_satan_data_internal(%rip), %xmm10
> +        addps     %xmm6, %xmm7
> +        mulps     %xmm7, %xmm8
> +        pxor      %xmm0, %xmm10
> +        addps     _sPC0+__svml_satan_data_internal(%rip), %xmm8
> +
> +/* Reconstruction. */
> +        mulps     %xmm8, %xmm9
> +        addps     %xmm9, %xmm10
> +        movaps    %xmm10, %xmm0
> +        ret
> +
> +END(_ZGVbN4v_atanf_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_satan_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(16)) VUINT32 _sSIGN_MASK[4][1];
> +        __declspec(align(16)) VUINT32 _sABS_MASK[4][1];
> +        __declspec(align(16)) VUINT32 _sONE[4][1];
> +        __declspec(align(16)) VUINT32 _sPIO2[4][1];
> +        __declspec(align(16)) VUINT32 _sPC8[4][1];
> +        __declspec(align(16)) VUINT32 _sPC7[4][1];
> +        __declspec(align(16)) VUINT32 _sPC6[4][1];
> +        __declspec(align(16)) VUINT32 _sPC5[4][1];
> +        __declspec(align(16)) VUINT32 _sPC4[4][1];
> +        __declspec(align(16)) VUINT32 _sPC3[4][1];
> +        __declspec(align(16)) VUINT32 _sPC2[4][1];
> +        __declspec(align(16)) VUINT32 _sPC1[4][1];
> +        __declspec(align(16)) VUINT32 _sPC0[4][1];
> +} __svml_satan_data_internal;
> +#endif
> +__svml_satan_data_internal:
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000 //_sSIGN_MASK
> +        .align 16
> +        .long 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF //_sABS_MASK
> +        .align 16
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000 //_sONE
> +        .align 16
> +        .long 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB //_sPIO2
> +        .align 16
> +        .long 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0 //_sPC8
> +        .align 16
> +        .long 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631 //_sPC7
> +        .align 16
> +        .long 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384 //_sPC6
> +        .align 16
> +        .long 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629 //_sPC5
> +        .align 16
> +        .long 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474 //_sPC4
> +        .align 16
> +        .long 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8 //_sPC3
> +        .align 16
> +        .long 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F //_sPC2
> +        .align 16
> +        .long 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49 //_sPC1
> +        .align 16
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000 //_sPC0
> +        .align 16
> +        .type	__svml_satan_data_internal,@object
> +        .size	__svml_satan_data_internal,.-__svml_satan_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core-sse.S
> new file mode 100644
> index 0000000000..1652a8f5c6
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized atanf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN8v_atanf _ZGVdN8v_atanf_sse_wrapper
> +#include "../svml_s_atanf8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core.c
> new file mode 100644
> index 0000000000..733d8c3bc3
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized atanf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN8v_atanf
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN8v_atanf, __GI__ZGVdN8v_atanf,
> +	       __redirect__ZGVdN8v_atanf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core_avx2.S
> new file mode 100644
> index 0000000000..e333f979c4
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanf8_core_avx2.S
> @@ -0,0 +1,148 @@
> +/* Function atanf vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
> + *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
> + *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
> + *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
> + *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
> + *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
> + *
> + */
> +
> +/* Offsets for data table __svml_satan_data_internal
> + */
> +#define _sSIGN_MASK                   	0
> +#define _sABS_MASK                    	32
> +#define _sONE                         	64
> +#define _sPIO2                        	96
> +#define _sPC8                         	128
> +#define _sPC7                         	160
> +#define _sPC6                         	192
> +#define _sPC5                         	224
> +#define _sPC4                         	256
> +#define _sPC3                         	288
> +#define _sPC2                         	320
> +#define _sPC1                         	352
> +#define _sPC0                         	384
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN8v_atanf_avx2)
> +/*
> + * 1) If x>1,      then r=-1/x, PIO2=Pi/2
> + * 2) If -1<=x<=1, then r=x,    PIO2=0
> + * 3) If x<-1,     then r=-1/x, PIO2=-Pi/2
> + */
> +        vmovups   _sONE+__svml_satan_data_internal(%rip), %ymm2
> +        vmovups   __svml_satan_data_internal(%rip), %ymm7
> +        vmovups   _sPC7+__svml_satan_data_internal(%rip), %ymm13
> +
> +/*
> + * To use minps\maxps operations for argument reduction
> + * uncomment _AT_USEMINMAX_ definition
> + *  Declarations
> + * Variables
> + * Constants
> + */
> +        vandps    _sABS_MASK+__svml_satan_data_internal(%rip), %ymm0, %ymm3
> +        vmaxps    %ymm3, %ymm2, %ymm5
> +        vminps    %ymm3, %ymm2, %ymm4
> +        vcmple_oqps %ymm2, %ymm3, %ymm6
> +        vdivps    %ymm5, %ymm4, %ymm11
> +        vandps    %ymm7, %ymm0, %ymm9
> +        vandnps   %ymm7, %ymm6, %ymm8
> +        vxorps    %ymm9, %ymm8, %ymm10
> +        vxorps    %ymm11, %ymm10, %ymm15
> +
> +/* Polynomial. */
> +        vmulps    %ymm15, %ymm15, %ymm14
> +        vmovups   _sPC8+__svml_satan_data_internal(%rip), %ymm0
> +        vmulps    %ymm14, %ymm14, %ymm12
> +        vfmadd213ps _sPC6+__svml_satan_data_internal(%rip), %ymm12, %ymm0
> +        vfmadd213ps _sPC5+__svml_satan_data_internal(%rip), %ymm12, %ymm13
> +        vfmadd213ps _sPC4+__svml_satan_data_internal(%rip), %ymm12, %ymm0
> +        vfmadd213ps _sPC3+__svml_satan_data_internal(%rip), %ymm12, %ymm13
> +        vfmadd213ps _sPC2+__svml_satan_data_internal(%rip), %ymm12, %ymm0
> +        vfmadd213ps _sPC1+__svml_satan_data_internal(%rip), %ymm12, %ymm13
> +        vfmadd213ps %ymm13, %ymm14, %ymm0
> +        vfmadd213ps _sPC0+__svml_satan_data_internal(%rip), %ymm14, %ymm0
> +        vandnps   _sPIO2+__svml_satan_data_internal(%rip), %ymm6, %ymm1
> +        vxorps    %ymm9, %ymm1, %ymm1
> +
> +/* Reconstruction. */
> +        vfmadd213ps %ymm1, %ymm15, %ymm0
> +        ret
> +
> +END(_ZGVdN8v_atanf_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_satan_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(32)) VUINT32 _sSIGN_MASK[8][1];
> +        __declspec(align(32)) VUINT32 _sABS_MASK[8][1];
> +        __declspec(align(32)) VUINT32 _sONE[8][1];
> +        __declspec(align(32)) VUINT32 _sPIO2[8][1];
> +        __declspec(align(32)) VUINT32 _sPC8[8][1];
> +        __declspec(align(32)) VUINT32 _sPC7[8][1];
> +        __declspec(align(32)) VUINT32 _sPC6[8][1];
> +        __declspec(align(32)) VUINT32 _sPC5[8][1];
> +        __declspec(align(32)) VUINT32 _sPC4[8][1];
> +        __declspec(align(32)) VUINT32 _sPC3[8][1];
> +        __declspec(align(32)) VUINT32 _sPC2[8][1];
> +        __declspec(align(32)) VUINT32 _sPC1[8][1];
> +        __declspec(align(32)) VUINT32 _sPC0[8][1];
> +} __svml_satan_data_internal;
> +#endif
> +__svml_satan_data_internal:
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000 //_sSIGN_MASK
> +        .align 32
> +        .long 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF //_sABS_MASK
> +        .align 32
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000 //_sONE
> +        .align 32
> +        .long 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB //_sPIO2
> +        .align 32
> +        .long 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0 //_sPC8
> +        .align 32
> +        .long 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631 //_sPC7
> +        .align 32
> +        .long 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384 //_sPC6
> +        .align 32
> +        .long 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629 //_sPC5
> +        .align 32
> +        .long 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474 //_sPC4
> +        .align 32
> +        .long 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8 //_sPC3
> +        .align 32
> +        .long 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F //_sPC2
> +        .align 32
> +        .long 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49 //_sPC1
> +        .align 32
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000 //_sPC0
> +        .align 32
> +        .type	__svml_satan_data_internal,@object
> +        .size	__svml_satan_data_internal,.-__svml_satan_data_internal
> diff --git a/sysdeps/x86_64/fpu/svml_d_atan2_core.S b/sysdeps/x86_64/fpu/svml_d_atan2_core.S
> new file mode 100644
> index 0000000000..e86d5b7047
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_atan2_core.S
> @@ -0,0 +1,29 @@
> +/* Function atan vectorized with SSE2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN2v_atan)
> +WRAPPER_IMPL_SSE2 atan
> +END (_ZGVbN2v_atan)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN2v_atan)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_atan4_core.S b/sysdeps/x86_64/fpu/svml_d_atan4_core.S
> new file mode 100644
> index 0000000000..eb11fd2f17
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_atan4_core.S
> @@ -0,0 +1,29 @@
> +/* Function atan vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN4v_atan)
> +WRAPPER_IMPL_AVX _ZGVbN2v_atan
> +END (_ZGVdN4v_atan)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN4v_atan)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_atan4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_atan4_core_avx.S
> new file mode 100644
> index 0000000000..b83a4be33d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_atan4_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function atan vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVcN4v_atan)
> +WRAPPER_IMPL_AVX _ZGVbN2v_atan
> +END (_ZGVcN4v_atan)
> diff --git a/sysdeps/x86_64/fpu/svml_d_atan8_core.S b/sysdeps/x86_64/fpu/svml_d_atan8_core.S
> new file mode 100644
> index 0000000000..9685a32bdc
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_atan8_core.S
> @@ -0,0 +1,25 @@
> +/* Function atan vectorized with AVX-512, wrapper to AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN8v_atan)
> +WRAPPER_IMPL_AVX512 _ZGVdN4v_atan
> +END (_ZGVeN8v_atan)
> diff --git a/sysdeps/x86_64/fpu/svml_s_atanf16_core.S b/sysdeps/x86_64/fpu/svml_s_atanf16_core.S
> new file mode 100644
> index 0000000000..f82d2422ae
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_atanf16_core.S
> @@ -0,0 +1,25 @@
> +/* Function atanf vectorized with AVX-512. Wrapper to AVX2 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN16v_atanf)
> +WRAPPER_IMPL_AVX512 _ZGVdN8v_atanf
> +END (_ZGVeN16v_atanf)
> diff --git a/sysdeps/x86_64/fpu/svml_s_atanf4_core.S b/sysdeps/x86_64/fpu/svml_s_atanf4_core.S
> new file mode 100644
> index 0000000000..6b8c4d9624
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_atanf4_core.S
> @@ -0,0 +1,29 @@
> +/* Function atanf vectorized with SSE2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN4v_atanf)
> +WRAPPER_IMPL_SSE2 atanf
> +END (_ZGVbN4v_atanf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN4v_atanf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_atanf8_core.S b/sysdeps/x86_64/fpu/svml_s_atanf8_core.S
> new file mode 100644
> index 0000000000..315681f6c0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_atanf8_core.S
> @@ -0,0 +1,29 @@
> +/* Function atanf vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN8v_atanf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_atanf
> +END (_ZGVdN8v_atanf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN8v_atanf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_atanf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_atanf8_core_avx.S
> new file mode 100644
> index 0000000000..b9cd502186
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_atanf8_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function atanf vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +        .text
> +ENTRY (_ZGVcN8v_atanf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_atanf
> +END (_ZGVcN8v_atanf)
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx.c
> new file mode 100644
> index 0000000000..0f7176a20b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-atan.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx2.c
> new file mode 100644
> index 0000000000..0f7176a20b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx2.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-atan.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx512f.c
> new file mode 100644
> index 0000000000..0f7176a20b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-atan-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-atan.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atan.c b/sysdeps/x86_64/fpu/test-double-libmvec-atan.c
> new file mode 100644
> index 0000000000..982687b169
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-atan.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE double
> +#define LIBMVEC_FUNC atan
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> index 0abc7d2021..467c913990 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> @@ -28,6 +28,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVbN2v_log)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVbN2v_exp)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVbN2vv_pow)
>  VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVbN2v_acos)
> +VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVbN2v_atan)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> index dda093b914..b72a7de84e 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> @@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVdN4v_log)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVdN4v_exp)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVdN4vv_pow)
>  VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVdN4v_acos)
> +VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVdN4v_atan)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m256i
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> index f3230463bb..d2434df21e 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> @@ -28,6 +28,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVcN4v_log)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVcN4v_exp)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVcN4vv_pow)
>  VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVcN4v_acos)
> +VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVcN4v_atan)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> index cf9f52faf0..f7aaf8159e 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> @@ -28,6 +28,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVeN8v_log)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVeN8v_exp)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVeN8vv_pow)
>  VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVeN8v_acos)
> +VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVeN8v_atan)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m512i
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx.c
> new file mode 100644
> index 0000000000..9251c65f8a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-atanf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx2.c
> new file mode 100644
> index 0000000000..9251c65f8a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx2.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-atanf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx512f.c
> new file mode 100644
> index 0000000000..9251c65f8a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-atanf-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-atanf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atanf.c b/sysdeps/x86_64/fpu/test-float-libmvec-atanf.c
> new file mode 100644
> index 0000000000..2a8ab87e86
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-atanf.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE float
> +#define LIBMVEC_FUNC atanf
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> index abbd3ed870..af769c56fa 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> @@ -28,6 +28,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVeN16v_logf)
>  VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVeN16v_expf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVeN16vv_powf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVeN16v_acosf)
> +VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVeN16v_atanf)
>  
>  #define VEC_INT_TYPE __m512i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> index 8a24027952..76e61d2f1e 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> @@ -28,6 +28,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVbN4v_logf)
>  VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVbN4v_expf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVbN4vv_powf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVbN4v_acosf)
> +VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVbN4v_atanf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> index aff0442606..5e27eaaf29 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> @@ -31,6 +31,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVdN8v_logf)
>  VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVdN8v_expf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVdN8vv_powf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVdN8v_acosf)
> +VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVdN8v_atanf)
>  
>  /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
>  #undef VECTOR_WRAPPER_fFF
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> index 913584d111..28daf79aa9 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> @@ -28,6 +28,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVcN8v_logf)
>  VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVcN8v_expf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVcN8vv_powf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVcN8v_acosf)
> +VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVcN8v_atanf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> -- 
> 2.31.1
> 

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.


H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 05/18] x86-64: Add vector exp10/exp10f implementation to libmvec
  2021-12-29  6:39 ` [PATCH v5 05/18] x86-64: Add vector exp10/exp10f " Sunil K Pandey
@ 2021-12-29 21:25   ` H.J. Lu
  0 siblings, 0 replies; 40+ messages in thread
From: H.J. Lu @ 2021-12-29 21:25 UTC (permalink / raw)
  To: Sunil K Pandey; +Cc: libc-alpha, hjl.tools, andrey.kolesov, marius.cornea

On Tue, Dec 28, 2021 at 10:39:47PM -0800, Sunil K Pandey wrote:
> Implement vectorized exp10/exp10f containing SSE, AVX, AVX2 and
> AVX512 versions for libmvec as per vector ABI.  It also contains
> accuracy and ABI tests for vector exp10/exp10f with regenerated ulps.
> ---
>  bits/libm-simd-decl-stubs.h                   |  11 +
>  math/bits/mathcalls.h                         |   2 +-
>  .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
>  sysdeps/x86/fpu/bits/math-vector.h            |   4 +
>  .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
>  sysdeps/x86_64/fpu/Makeconfig                 |   1 +
>  sysdeps/x86_64/fpu/Versions                   |   2 +
>  sysdeps/x86_64/fpu/libm-test-ulps             |  20 +
>  .../fpu/multiarch/svml_d_exp102_core-sse2.S   |  20 +
>  .../x86_64/fpu/multiarch/svml_d_exp102_core.c |  27 ++
>  .../fpu/multiarch/svml_d_exp102_core_sse4.S   | 418 +++++++++++++++++
>  .../fpu/multiarch/svml_d_exp104_core-sse.S    |  20 +
>  .../x86_64/fpu/multiarch/svml_d_exp104_core.c |  27 ++
>  .../fpu/multiarch/svml_d_exp104_core_avx2.S   | 429 ++++++++++++++++++
>  .../fpu/multiarch/svml_d_exp108_core-avx2.S   |  20 +
>  .../x86_64/fpu/multiarch/svml_d_exp108_core.c |  27 ++
>  .../fpu/multiarch/svml_d_exp108_core_avx512.S | 287 ++++++++++++
>  .../fpu/multiarch/svml_s_exp10f16_core-avx2.S |  20 +
>  .../fpu/multiarch/svml_s_exp10f16_core.c      |  28 ++
>  .../multiarch/svml_s_exp10f16_core_avx512.S   | 269 +++++++++++
>  .../fpu/multiarch/svml_s_exp10f4_core-sse2.S  |  20 +
>  .../fpu/multiarch/svml_s_exp10f4_core.c       |  28 ++
>  .../fpu/multiarch/svml_s_exp10f4_core_sse4.S  | 311 +++++++++++++
>  .../fpu/multiarch/svml_s_exp10f8_core-sse.S   |  20 +
>  .../fpu/multiarch/svml_s_exp10f8_core.c       |  28 ++
>  .../fpu/multiarch/svml_s_exp10f8_core_avx2.S  | 331 ++++++++++++++
>  sysdeps/x86_64/fpu/svml_d_exp102_core.S       |  29 ++
>  sysdeps/x86_64/fpu/svml_d_exp104_core.S       |  29 ++
>  sysdeps/x86_64/fpu/svml_d_exp104_core_avx.S   |  25 +
>  sysdeps/x86_64/fpu/svml_d_exp108_core.S       |  25 +
>  sysdeps/x86_64/fpu/svml_s_exp10f16_core.S     |  25 +
>  sysdeps/x86_64/fpu/svml_s_exp10f4_core.S      |  29 ++
>  sysdeps/x86_64/fpu/svml_s_exp10f8_core.S      |  29 ++
>  sysdeps/x86_64/fpu/svml_s_exp10f8_core_avx.S  |  25 +
>  .../fpu/test-double-libmvec-exp10-avx.c       |   1 +
>  .../fpu/test-double-libmvec-exp10-avx2.c      |   1 +
>  .../fpu/test-double-libmvec-exp10-avx512f.c   |   1 +
>  .../x86_64/fpu/test-double-libmvec-exp10.c    |   3 +
>  .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
>  .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
>  .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
>  .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
>  .../fpu/test-float-libmvec-exp10f-avx.c       |   1 +
>  .../fpu/test-float-libmvec-exp10f-avx2.c      |   1 +
>  .../fpu/test-float-libmvec-exp10f-avx512f.c   |   1 +
>  .../x86_64/fpu/test-float-libmvec-exp10f.c    |   3 +
>  .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
>  .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
>  .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
>  .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
>  50 files changed, 2617 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_exp102_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_exp104_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_exp104_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_exp108_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_exp10f16_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_exp10f4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_exp10f8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_exp10f8_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-exp10.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-exp10f.c
> 
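For context, a minimal sketch of how user code reaches the new _ZGV*_exp10 entry points, assuming GCC with the usual libmvec options; the flags (-fopenmp-simd, -lmvec) reflect common libmvec usage and are not part of this patch:

#define _GNU_SOURCE             /* exp10 is a GNU extension in <math.h> */
#include <math.h>

/* Assumed build: gcc -O2 -fopenmp-simd vec_exp10.c -lmvec -lm
   With the __DECL_SIMD_exp10 declarations added by this patch, the compiler
   may replace the scalar exp10 calls in this loop with _ZGVbN2v_exp10,
   _ZGVdN4v_exp10, _ZGVeN8v_exp10, etc., according to the vector ABI.  */
void
apply_exp10 (double *y, const double *x, int n)
{
#pragma omp simd
  for (int i = 0; i < n; i++)
    y[i] = exp10 (x[i]);
}
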
> diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
> index 36d6643eb9..bc18621f17 100644
> --- a/bits/libm-simd-decl-stubs.h
> +++ b/bits/libm-simd-decl-stubs.h
> @@ -153,4 +153,15 @@
>  #define __DECL_SIMD_exp2f32x
>  #define __DECL_SIMD_exp2f64x
>  #define __DECL_SIMD_exp2f128x
> +
> +#define __DECL_SIMD_exp10
> +#define __DECL_SIMD_exp10f
> +#define __DECL_SIMD_exp10l
> +#define __DECL_SIMD_exp10f16
> +#define __DECL_SIMD_exp10f32
> +#define __DECL_SIMD_exp10f64
> +#define __DECL_SIMD_exp10f128
> +#define __DECL_SIMD_exp10f32x
> +#define __DECL_SIMD_exp10f64x
> +#define __DECL_SIMD_exp10f128x
>  #endif
> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index 645088cbf3..870778457f 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -111,7 +111,7 @@ __MATHCALL (modf,, (_Mdouble_ __x, _Mdouble_ *__iptr)) __nonnull ((2));
>  
>  #if __GLIBC_USE (IEC_60559_FUNCS_EXT_C2X)
>  /* Compute exponent to base ten.  */
> -__MATHCALL (exp10,, (_Mdouble_ __x));
> +__MATHCALL_VEC (exp10,, (_Mdouble_ __x));
>  #endif
>  
>  #if defined __USE_XOPEN_EXTENDED || defined __USE_ISOC99
> diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> index 1717f2dee9..b3c1f59593 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> @@ -49,40 +49,48 @@ GLIBC_2.22 _ZGVeN8vvv_sincos F
>  GLIBC_2.35 _ZGVbN2v_acos F
>  GLIBC_2.35 _ZGVbN2v_asin F
>  GLIBC_2.35 _ZGVbN2v_atan F
> +GLIBC_2.35 _ZGVbN2v_exp10 F
>  GLIBC_2.35 _ZGVbN2v_exp2 F
>  GLIBC_2.35 _ZGVbN2vv_hypot F
>  GLIBC_2.35 _ZGVbN4v_acosf F
>  GLIBC_2.35 _ZGVbN4v_asinf F
>  GLIBC_2.35 _ZGVbN4v_atanf F
> +GLIBC_2.35 _ZGVbN4v_exp10f F
>  GLIBC_2.35 _ZGVbN4v_exp2f F
>  GLIBC_2.35 _ZGVbN4vv_hypotf F
>  GLIBC_2.35 _ZGVcN4v_acos F
>  GLIBC_2.35 _ZGVcN4v_asin F
>  GLIBC_2.35 _ZGVcN4v_atan F
> +GLIBC_2.35 _ZGVcN4v_exp10 F
>  GLIBC_2.35 _ZGVcN4v_exp2 F
>  GLIBC_2.35 _ZGVcN4vv_hypot F
>  GLIBC_2.35 _ZGVcN8v_acosf F
>  GLIBC_2.35 _ZGVcN8v_asinf F
>  GLIBC_2.35 _ZGVcN8v_atanf F
> +GLIBC_2.35 _ZGVcN8v_exp10f F
>  GLIBC_2.35 _ZGVcN8v_exp2f F
>  GLIBC_2.35 _ZGVcN8vv_hypotf F
>  GLIBC_2.35 _ZGVdN4v_acos F
>  GLIBC_2.35 _ZGVdN4v_asin F
>  GLIBC_2.35 _ZGVdN4v_atan F
> +GLIBC_2.35 _ZGVdN4v_exp10 F
>  GLIBC_2.35 _ZGVdN4v_exp2 F
>  GLIBC_2.35 _ZGVdN4vv_hypot F
>  GLIBC_2.35 _ZGVdN8v_acosf F
>  GLIBC_2.35 _ZGVdN8v_asinf F
>  GLIBC_2.35 _ZGVdN8v_atanf F
> +GLIBC_2.35 _ZGVdN8v_exp10f F
>  GLIBC_2.35 _ZGVdN8v_exp2f F
>  GLIBC_2.35 _ZGVdN8vv_hypotf F
>  GLIBC_2.35 _ZGVeN16v_acosf F
>  GLIBC_2.35 _ZGVeN16v_asinf F
>  GLIBC_2.35 _ZGVeN16v_atanf F
> +GLIBC_2.35 _ZGVeN16v_exp10f F
>  GLIBC_2.35 _ZGVeN16v_exp2f F
>  GLIBC_2.35 _ZGVeN16vv_hypotf F
>  GLIBC_2.35 _ZGVeN8v_acos F
>  GLIBC_2.35 _ZGVeN8v_asin F
>  GLIBC_2.35 _ZGVeN8v_atan F
> +GLIBC_2.35 _ZGVeN8v_exp10 F
>  GLIBC_2.35 _ZGVeN8v_exp2 F
>  GLIBC_2.35 _ZGVeN8vv_hypot F
> diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
> index c7a972521b..f3f9c2e092 100644
> --- a/sysdeps/x86/fpu/bits/math-vector.h
> +++ b/sysdeps/x86/fpu/bits/math-vector.h
> @@ -78,6 +78,10 @@
>  #  define __DECL_SIMD_exp2 __DECL_SIMD_x86_64
>  #  undef __DECL_SIMD_exp2f
>  #  define __DECL_SIMD_exp2f __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_exp10
> +#  define __DECL_SIMD_exp10 __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_exp10f
> +#  define __DECL_SIMD_exp10f __DECL_SIMD_x86_64
>  
>  # endif
>  #endif
> diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> index 0994e6dfac..c033abbedc 100644
> --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> @@ -38,6 +38,8 @@
>  !GCC$ builtin (hypotf) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (exp2) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (exp2f) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (exp10) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (exp10f) attributes simd (notinbranch) if('x86_64')
>  
>  !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
> @@ -61,3 +63,5 @@
>  !GCC$ builtin (hypotf) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (exp2) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (exp2f) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (exp10) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (exp10f) attributes simd (notinbranch) if('x32')
> diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> index 03b2364417..fd0a9da439 100644
> --- a/sysdeps/x86_64/fpu/Makeconfig
> +++ b/sysdeps/x86_64/fpu/Makeconfig
> @@ -27,6 +27,7 @@ libmvec-funcs = \
>    atan \
>    cos \
>    exp \
> +  exp10 \
>    exp2 \
>    hypot \
>    log \
> diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> index 12b7ad1830..f29cfa4cbf 100644
> --- a/sysdeps/x86_64/fpu/Versions
> +++ b/sysdeps/x86_64/fpu/Versions
> @@ -17,11 +17,13 @@ libmvec {
>      _ZGVbN2v_acos; _ZGVcN4v_acos; _ZGVdN4v_acos; _ZGVeN8v_acos;
>      _ZGVbN2v_asin; _ZGVcN4v_asin; _ZGVdN4v_asin; _ZGVeN8v_asin;
>      _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan;
> +    _ZGVbN2v_exp10; _ZGVcN4v_exp10; _ZGVdN4v_exp10; _ZGVeN8v_exp10;
>      _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
>      _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
>      _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
>      _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
>      _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf;
> +    _ZGVbN4v_exp10f; _ZGVcN8v_exp10f; _ZGVdN8v_exp10f; _ZGVeN16v_exp10f;
>      _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
>      _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf;
>    }
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index bc4479ad39..45f2e4bb53 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -1252,6 +1252,26 @@ float: 1
>  float128: 3
>  ldouble: 2
>  
> +Function: "exp10_vlen16":
> +float: 3
> +
> +Function: "exp10_vlen2":
> +double: 1
> +
> +Function: "exp10_vlen4":
> +double: 1
> +float: 1
> +
> +Function: "exp10_vlen4_avx2":
> +double: 1
> +
> +Function: "exp10_vlen8":
> +double: 1
> +float: 1
> +
> +Function: "exp10_vlen8_avx2":
> +float: 1
> +
>  Function: "exp2":
>  double: 1
>  float: 1
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core-sse2.S
> new file mode 100644
> index 0000000000..ab615c0323
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized exp10, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN2v_exp10 _ZGVbN2v_exp10_sse2
> +#include "../svml_d_exp102_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core.c
> new file mode 100644
> index 0000000000..5c5625b278
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized exp10, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN2v_exp10
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN2v_exp10, __GI__ZGVbN2v_exp10, __redirect__ZGVbN2v_exp10)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core_sse4.S
> new file mode 100644
> index 0000000000..7c6e5de3e0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp102_core_sse4.S
> @@ -0,0 +1,418 @@
> +/* Function exp10 vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   exp10(x)  = 2^(x/log10(2)) = 2^n * (1 + T[j]) * (1 + P(y))
> + *   where
> + *        x = m*log10(2)/K + y,  y in [-log10(2)/K..log10(2)/K]
> + *        m = n*K + j,           m,n,j - signed integer, j in [-K/2..K/2]
> + *
> + *        values of 2^j/K are tabulated
> + *
> + *        P(y) is a minimax polynomial approximation of exp10(x)-1
> + *        on small interval [-log10(2)/K..log10(2)/K]
> + *
> + *  Special cases:
> + *
> + *   exp10(NaN)  = NaN
> + *   exp10(+INF) = +INF
> + *   exp10(-INF) = 0
> + *   exp10(x)    = 1 for subnormals
> + *   For IEEE double
> + *     if x >  3.39782712893383973096e+02 then exp10(x) overflow
> + *     if x < -3.45133219101941108420e+02 then exp10(x) underflow
> + *
> + */
> +
> +/* Offsets for data table __svml_dexp10_data_internal
> + */
> +#define _dbT                          	0
> +#define _dbLg2_10                     	1024
> +#define _dbShifter                    	1040
> +#define _dbInvLg2_10hi                	1056
> +#define _dbInvLg2_10lo                	1072
> +#define _dPC1                         	1088
> +#define _dPC2                         	1104
> +#define _dPC3                         	1120
> +#define _dPC4                         	1136
> +#define _dPC5                         	1152
> +#define _lExpMask                     	1168
> +#define _iIndexMask                   	1184
> +#define _iAbsMask                     	1200
> +#define _iDomainRange                 	1216
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN2v_exp10_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +
> +/*  R  */
> +        movaps    %xmm0, %xmm12
> +
> +/*  Load argument  */
> +        movups    _dbLg2_10+__svml_dexp10_data_internal(%rip), %xmm13
> +        lea       __svml_dexp10_data_internal(%rip), %rsi
> +        mulpd     %xmm0, %xmm13
> +        movups    _dbShifter+__svml_dexp10_data_internal(%rip), %xmm1
> +        addpd     %xmm1, %xmm13
> +        movaps    %xmm13, %xmm9
> +        subpd     %xmm1, %xmm9
> +        movups    _dbInvLg2_10hi+__svml_dexp10_data_internal(%rip), %xmm8
> +        mulpd     %xmm9, %xmm8
> +        movups    _dbInvLg2_10lo+__svml_dexp10_data_internal(%rip), %xmm10
> +        mulpd     %xmm9, %xmm10
> +        subpd     %xmm8, %xmm12
> +        subpd     %xmm10, %xmm12
> +
> +/*
> + *  Polynomial
> + * poly(dN) = a1*dR+...+a5*dR^5
> + */
> +        movups    _dPC5+__svml_dexp10_data_internal(%rip), %xmm11
> +        mulpd     %xmm12, %xmm11
> +        addpd     _dPC4+__svml_dexp10_data_internal(%rip), %xmm11
> +        mulpd     %xmm12, %xmm11
> +        addpd     _dPC3+__svml_dexp10_data_internal(%rip), %xmm11
> +        mulpd     %xmm12, %xmm11
> +        addpd     _dPC2+__svml_dexp10_data_internal(%rip), %xmm11
> +
> +/* a1+...+a5*dR^4 ! */
> +        mulpd     %xmm12, %xmm11
> +        addpd     _dPC1+__svml_dexp10_data_internal(%rip), %xmm11
> +        movq      _iIndexMask+__svml_dexp10_data_internal(%rip), %xmm5
> +
> +/*  Index and lookup  */
> +        pshufd    $136, %xmm13, %xmm6
> +
> +/*  2^N  */
> +        psllq     $45, %xmm13
> +        pand      %xmm5, %xmm6
> +
> +/* iIndex*=sizeof(D); */
> +        pslld     $3, %xmm6
> +        movd      %xmm6, %eax
> +        pshufd    $1, %xmm6, %xmm7
> +        movq      _iAbsMask+__svml_dexp10_data_internal(%rip), %xmm2
> +
> +/* a1*dR+...+a5*dR^5 */
> +        mulpd     %xmm11, %xmm12
> +        movd      %xmm7, %ecx
> +
> +/* Check for overflow/underflow  */
> +        pshufd    $221, %xmm0, %xmm4
> +        movq      _iDomainRange+__svml_dexp10_data_internal(%rip), %xmm3
> +        pand      %xmm2, %xmm4
> +        movslq    %eax, %rax
> +        pcmpgtd   %xmm3, %xmm4
> +        movslq    %ecx, %rcx
> +        movmskps  %xmm4, %edx
> +
> +/* lM==EXP(2^N) */
> +        pand      _lExpMask+__svml_dexp10_data_internal(%rip), %xmm13
> +        movsd     (%rsi,%rax), %xmm1
> +        movhpd    (%rsi,%rcx), %xmm1
> +
> +/* Tj*poly */
> +        mulpd     %xmm1, %xmm12
> +        addpd     %xmm12, %xmm1
> +
> +/* quick 2^N */
> +        paddq     %xmm13, %xmm1
> +        andl      $3, %edx
> +
> +/*  Finish   */
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm1
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movaps    %xmm1, %xmm0
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm0, 32(%rsp)
> +        movups    %xmm1, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm1
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $2, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm1
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm1
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      exp10@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 48(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN2v_exp10_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_dexp10_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(16)) VUINT32 _dbT[(1<<7)][2];
> +        __declspec(align(16)) VUINT32 _dbLg2_10[2][2];
> +        __declspec(align(16)) VUINT32 _dbShifter[2][2];
> +        __declspec(align(16)) VUINT32 _dbInvLg2_10hi[2][2];
> +        __declspec(align(16)) VUINT32 _dbInvLg2_10lo[2][2];
> +        __declspec(align(16)) VUINT32 _dPC1[2][2];
> +        __declspec(align(16)) VUINT32 _dPC2[2][2];
> +        __declspec(align(16)) VUINT32 _dPC3[2][2];
> +        __declspec(align(16)) VUINT32 _dPC4[2][2];
> +        __declspec(align(16)) VUINT32 _dPC5[2][2];
> +        __declspec(align(16)) VUINT32 _lExpMask[2][2];
> +        __declspec(align(16)) VUINT32 _iIndexMask[4][1];
> +        __declspec(align(16)) VUINT32 _iAbsMask[4][1];
> +        __declspec(align(16)) VUINT32 _iDomainRange[4][1];
> +} __svml_dexp10_data_internal;
> +#endif
> +__svml_dexp10_data_internal:
> +        /*== _dbT ==*/
> +        .quad 0x3ff0000000000000    /*2^( 0 /128)*/
> +        .quad 0x3ff0163da9fb3335    /*2^( 1 /128)*/
> +        .quad 0x3ff02c9a3e778061    /*2^( 2 /128)*/
> +        .quad 0x3ff04315e86e7f85    /*2^( 3 /128)*/
> +        .quad 0x3ff059b0d3158574    /*2^( 4 /128)*/
> +        .quad 0x3ff0706b29ddf6de    /*2^( 5 /128)*/
> +        .quad 0x3ff0874518759bc8    /*2^( 6 /128)*/
> +        .quad 0x3ff09e3ecac6f383    /*2^( 7 /128)*/
> +        .quad 0x3ff0b5586cf9890f    /*2^( 8 /128)*/
> +        .quad 0x3ff0cc922b7247f7    /*2^( 9 /128)*/
> +        .quad 0x3ff0e3ec32d3d1a2    /*2^( 10 /128)*/
> +        .quad 0x3ff0fb66affed31b    /*2^( 11 /128)*/
> +        .quad 0x3ff11301d0125b51    /*2^( 12 /128)*/
> +        .quad 0x3ff12abdc06c31cc    /*2^( 13 /128)*/
> +        .quad 0x3ff1429aaea92de0    /*2^( 14 /128)*/
> +        .quad 0x3ff15a98c8a58e51    /*2^( 15 /128)*/
> +        .quad 0x3ff172b83c7d517b    /*2^( 16 /128)*/
> +        .quad 0x3ff18af9388c8dea    /*2^( 17 /128)*/
> +        .quad 0x3ff1a35beb6fcb75    /*2^( 18 /128)*/
> +        .quad 0x3ff1bbe084045cd4    /*2^( 19 /128)*/
> +        .quad 0x3ff1d4873168b9aa    /*2^( 20 /128)*/
> +        .quad 0x3ff1ed5022fcd91d    /*2^( 21 /128)*/
> +        .quad 0x3ff2063b88628cd6    /*2^( 22 /128)*/
> +        .quad 0x3ff21f49917ddc96    /*2^( 23 /128)*/
> +        .quad 0x3ff2387a6e756238    /*2^( 24 /128)*/
> +        .quad 0x3ff251ce4fb2a63f    /*2^( 25 /128)*/
> +        .quad 0x3ff26b4565e27cdd    /*2^( 26 /128)*/
> +        .quad 0x3ff284dfe1f56381    /*2^( 27 /128)*/
> +        .quad 0x3ff29e9df51fdee1    /*2^( 28 /128)*/
> +        .quad 0x3ff2b87fd0dad990    /*2^( 29 /128)*/
> +        .quad 0x3ff2d285a6e4030b    /*2^( 30 /128)*/
> +        .quad 0x3ff2ecafa93e2f56    /*2^( 31 /128)*/
> +        .quad 0x3ff306fe0a31b715    /*2^( 32 /128)*/
> +        .quad 0x3ff32170fc4cd831    /*2^( 33 /128)*/
> +        .quad 0x3ff33c08b26416ff    /*2^( 34 /128)*/
> +        .quad 0x3ff356c55f929ff1    /*2^( 35 /128)*/
> +        .quad 0x3ff371a7373aa9cb    /*2^( 36 /128)*/
> +        .quad 0x3ff38cae6d05d866    /*2^( 37 /128)*/
> +        .quad 0x3ff3a7db34e59ff7    /*2^( 38 /128)*/
> +        .quad 0x3ff3c32dc313a8e5    /*2^( 39 /128)*/
> +        .quad 0x3ff3dea64c123422    /*2^( 40 /128)*/
> +        .quad 0x3ff3fa4504ac801c    /*2^( 41 /128)*/
> +        .quad 0x3ff4160a21f72e2a    /*2^( 42 /128)*/
> +        .quad 0x3ff431f5d950a897    /*2^( 43 /128)*/
> +        .quad 0x3ff44e086061892d    /*2^( 44 /128)*/
> +        .quad 0x3ff46a41ed1d0057    /*2^( 45 /128)*/
> +        .quad 0x3ff486a2b5c13cd0    /*2^( 46 /128)*/
> +        .quad 0x3ff4a32af0d7d3de    /*2^( 47 /128)*/
> +        .quad 0x3ff4bfdad5362a27    /*2^( 48 /128)*/
> +        .quad 0x3ff4dcb299fddd0d    /*2^( 49 /128)*/
> +        .quad 0x3ff4f9b2769d2ca7    /*2^( 50 /128)*/
> +        .quad 0x3ff516daa2cf6642    /*2^( 51 /128)*/
> +        .quad 0x3ff5342b569d4f82    /*2^( 52 /128)*/
> +        .quad 0x3ff551a4ca5d920f    /*2^( 53 /128)*/
> +        .quad 0x3ff56f4736b527da    /*2^( 54 /128)*/
> +        .quad 0x3ff58d12d497c7fd    /*2^( 55 /128)*/
> +        .quad 0x3ff5ab07dd485429    /*2^( 56 /128)*/
> +        .quad 0x3ff5c9268a5946b7    /*2^( 57 /128)*/
> +        .quad 0x3ff5e76f15ad2148    /*2^( 58 /128)*/
> +        .quad 0x3ff605e1b976dc09    /*2^( 59 /128)*/
> +        .quad 0x3ff6247eb03a5585    /*2^( 60 /128)*/
> +        .quad 0x3ff6434634ccc320    /*2^( 61 /128)*/
> +        .quad 0x3ff6623882552225    /*2^( 62 /128)*/
> +        .quad 0x3ff68155d44ca973    /*2^( 63 /128)*/
> +        .quad 0x3ff6a09e667f3bcd    /*2^( 64 /128)*/
> +        .quad 0x3ff6c012750bdabf    /*2^( 65 /128)*/
> +        .quad 0x3ff6dfb23c651a2f    /*2^( 66 /128)*/
> +        .quad 0x3ff6ff7df9519484    /*2^( 67 /128)*/
> +        .quad 0x3ff71f75e8ec5f74    /*2^( 68 /128)*/
> +        .quad 0x3ff73f9a48a58174    /*2^( 69 /128)*/
> +        .quad 0x3ff75feb564267c9    /*2^( 70 /128)*/
> +        .quad 0x3ff780694fde5d3f    /*2^( 71 /128)*/
> +        .quad 0x3ff7a11473eb0187    /*2^( 72 /128)*/
> +        .quad 0x3ff7c1ed0130c132    /*2^( 73 /128)*/
> +        .quad 0x3ff7e2f336cf4e62    /*2^( 74 /128)*/
> +        .quad 0x3ff80427543e1a12    /*2^( 75 /128)*/
> +        .quad 0x3ff82589994cce13    /*2^( 76 /128)*/
> +        .quad 0x3ff8471a4623c7ad    /*2^( 77 /128)*/
> +        .quad 0x3ff868d99b4492ed    /*2^( 78 /128)*/
> +        .quad 0x3ff88ac7d98a6699    /*2^( 79 /128)*/
> +        .quad 0x3ff8ace5422aa0db    /*2^( 80 /128)*/
> +        .quad 0x3ff8cf3216b5448c    /*2^( 81 /128)*/
> +        .quad 0x3ff8f1ae99157736    /*2^( 82 /128)*/
> +        .quad 0x3ff9145b0b91ffc6    /*2^( 83 /128)*/
> +        .quad 0x3ff93737b0cdc5e5    /*2^( 84 /128)*/
> +        .quad 0x3ff95a44cbc8520f    /*2^( 85 /128)*/
> +        .quad 0x3ff97d829fde4e50    /*2^( 86 /128)*/
> +        .quad 0x3ff9a0f170ca07ba    /*2^( 87 /128)*/
> +        .quad 0x3ff9c49182a3f090    /*2^( 88 /128)*/
> +        .quad 0x3ff9e86319e32323    /*2^( 89 /128)*/
> +        .quad 0x3ffa0c667b5de565    /*2^( 90 /128)*/
> +        .quad 0x3ffa309bec4a2d33    /*2^( 91 /128)*/
> +        .quad 0x3ffa5503b23e255d    /*2^( 92 /128)*/
> +        .quad 0x3ffa799e1330b358    /*2^( 93 /128)*/
> +        .quad 0x3ffa9e6b5579fdbf    /*2^( 94 /128)*/
> +        .quad 0x3ffac36bbfd3f37a    /*2^( 95 /128)*/
> +        .quad 0x3ffae89f995ad3ad    /*2^( 96 /128)*/
> +        .quad 0x3ffb0e07298db666    /*2^( 97 /128)*/
> +        .quad 0x3ffb33a2b84f15fb    /*2^( 98 /128)*/
> +        .quad 0x3ffb59728de5593a    /*2^( 99 /128)*/
> +        .quad 0x3ffb7f76f2fb5e47    /*2^( 100 /128)*/
> +        .quad 0x3ffba5b030a1064a    /*2^( 101 /128)*/
> +        .quad 0x3ffbcc1e904bc1d2    /*2^( 102 /128)*/
> +        .quad 0x3ffbf2c25bd71e09    /*2^( 103 /128)*/
> +        .quad 0x3ffc199bdd85529c    /*2^( 104 /128)*/
> +        .quad 0x3ffc40ab5fffd07a    /*2^( 105 /128)*/
> +        .quad 0x3ffc67f12e57d14b    /*2^( 106 /128)*/
> +        .quad 0x3ffc8f6d9406e7b5    /*2^( 107 /128)*/
> +        .quad 0x3ffcb720dcef9069    /*2^( 108 /128)*/
> +        .quad 0x3ffcdf0b555dc3fa    /*2^( 109 /128)*/
> +        .quad 0x3ffd072d4a07897c    /*2^( 110 /128)*/
> +        .quad 0x3ffd2f87080d89f2    /*2^( 111 /128)*/
> +        .quad 0x3ffd5818dcfba487    /*2^( 112 /128)*/
> +        .quad 0x3ffd80e316c98398    /*2^( 113 /128)*/
> +        .quad 0x3ffda9e603db3285    /*2^( 114 /128)*/
> +        .quad 0x3ffdd321f301b460    /*2^( 115 /128)*/
> +        .quad 0x3ffdfc97337b9b5f    /*2^( 116 /128)*/
> +        .quad 0x3ffe264614f5a129    /*2^( 117 /128)*/
> +        .quad 0x3ffe502ee78b3ff6    /*2^( 118 /128)*/
> +        .quad 0x3ffe7a51fbc74c83    /*2^( 119 /128)*/
> +        .quad 0x3ffea4afa2a490da    /*2^( 120 /128)*/
> +        .quad 0x3ffecf482d8e67f1    /*2^( 121 /128)*/
> +        .quad 0x3ffefa1bee615a27    /*2^( 122 /128)*/
> +        .quad 0x3fff252b376bba97    /*2^( 123 /128)*/
> +        .quad 0x3fff50765b6e4540    /*2^( 124 /128)*/
> +        .quad 0x3fff7bfdad9cbe14    /*2^( 125 /128)*/
> +        .quad 0x3fffa7c1819e90d8    /*2^( 126 /128)*/
> +        .quad 0x3fffd3c22b8f71f1     /*2^( 127 /128)*/
> +        .align 16
> +        .quad 0x407a934f0979a371, 0x407a934f0979a371  /* _dbLg2_10*2^K */
> +        .align 16
> +        .quad 0x4338800000000000, 0x4338800000000000  /* _dbShifter */
> +        .align 16
> +        .quad 0x3f63441350a00000, 0x3f63441350a00000  /* _dbInvLg2_10hi/2^K 53-11-K bits*/
> +        .align 16
> +        .quad 0xbd10c0219dc1da99, 0xbd10c0219dc1da99  /* _dbInvLg2_10lo/2^K */
> +        //PC0 = 1.0
> +        .align 16
> +        .quad 0x40026bb1bbb55516, 0x40026bb1bbb55516  /* _dPC1 */
> +        .align 16
> +        .quad 0x40053524c73ce8e3, 0x40053524c73ce8e3  /* _dPC2 */
> +        .align 16
> +        .quad 0x4000470591ccea8b, 0x4000470591ccea8b  /* _dPC3 */
> +        .align 16
> +        .quad 0x3ff2bd767584db59, 0x3ff2bd767584db59  /* _dPC4 */
> +        .align 16
> +        .quad 0x3fe144c03efafb54, 0x3fe144c03efafb54  /* _dPC5 */
> +        .align 16
> +        .quad 0xfff0000000000000, 0xfff0000000000000  /* _lExpMask */
> +        .align 16
> +        .long 0x0000007f, 0x0000007f, 0x0000007f, 0x0000007f          /* _iIndexMask =(2^K-1)*/
> +        //common
> +        .align 16
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff          /* _iAbsMask */
> +        .align 16
> +        .long 0x40733a70, 0x40733a70, 0x40733a70, 0x40733a70 /* _iDomainRange */
> +        .align 16
> +        .type	__svml_dexp10_data_internal,@object
> +        .size	__svml_dexp10_data_internal,.-__svml_dexp10_data_internal
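A scalar model of the algorithm described at the top of svml_d_exp102_core_sse4.S may help when reading the assembly. This is only an illustration: the coefficients are plain ln(10)^i/i! stand-ins for the tuned _dPC1.._dPC5 minimax values, and T[j] is recomputed with exp2 rather than read from the _dbT table:

#include <math.h>
#include <stdint.h>

/* exp10(x) = 2^(x/log10(2)) = 2^n * T[j] * (1 + P(r)), 2^K = 128 entry table.  */
static double
exp10_model (double x)
{
  const double log10_2 = 0.30102999566398119521;
  double m = nearbyint (x * (128.0 / log10_2)); /* m = n*128 + j (cf. _dbLg2_10, _dbShifter) */
  int64_t im = (int64_t) m;
  int j = (int) (im & 127);                     /* table index (cf. _iIndexMask) */
  int n = (int) (im >> 7);                      /* binary exponent for the 2^N step */
  double r = x - m * (log10_2 / 128.0);         /* reduced argument (cf. _dbInvLg2_10hi/lo) */
  double t = exp2 (j / 128.0);                  /* stands in for _dbT[j] */
  /* P(r) ~= 10^r - 1 on the small interval; approximate coefficients only.  */
  const double c1 = 2.3025851, c2 = 2.6509491, c3 = 2.0346786,
               c4 = 1.1712552, c5 = 0.5393829;
  double p = r * (c1 + r * (c2 + r * (c3 + r * (c4 + r * c5))));
  return ldexp (t * (1.0 + p), n);              /* quick 2^N via the exponent */
}

Note that _dbT[j] in the data table holds 2^(j/128) directly (its first entry is 1.0), so the final product is T[j]*(1 + P(r)); inputs flagged by the _iDomainRange compare are handled separately, as modelled after the AVX2 variant below.
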
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core-sse.S
> new file mode 100644
> index 0000000000..260c052143
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized exp10, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN4v_exp10 _ZGVdN4v_exp10_sse_wrapper
> +#include "../svml_d_exp104_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core.c
> new file mode 100644
> index 0000000000..e3e302be72
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized exp10, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN4v_exp10
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN4v_exp10, __GI__ZGVdN4v_exp10, __redirect__ZGVdN4v_exp10)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core_avx2.S
> new file mode 100644
> index 0000000000..1a53f43c9e
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp104_core_avx2.S
> @@ -0,0 +1,429 @@
> +/* Function exp10 vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   exp10(x)  = 2^(x/log10(2)) = 2^n * (1 + T[j]) * (1 + P(y))
> + *   where
> + *        x = m*log10(2)/K + y,  y in [-log10(2)/K..log10(2)/K]
> + *        m = n*K + j,           m,n,j - signed integer, j in [-K/2..K/2]
> + *
> + *        values of 2^j/K are tabulated
> + *
> + *        P(y) is a minimax polynomial approximation of exp10(x)-1
> + *        on small interval [-log10(2)/K..log10(2)/K]
> + *
> + *  Special cases:
> + *
> + *   exp10(NaN)  = NaN
> + *   exp10(+INF) = +INF
> + *   exp10(-INF) = 0
> + *   exp10(x)    = 1 for subnormals
> + *   For IEEE double
> + *     if x >  3.39782712893383973096e+02 then exp10(x) overflow
> + *     if x < -3.45133219101941108420e+02 then exp10(x) underflow
> + *
> + */
> +
> +/* Offsets for data table __svml_dexp10_data_internal
> + */
> +#define _dbT                          	0
> +#define _dbLg2_10                     	1024
> +#define _dbShifter                    	1056
> +#define _dbInvLg2_10hi                	1088
> +#define _dbInvLg2_10lo                	1120
> +#define _dPC1                         	1152
> +#define _dPC2                         	1184
> +#define _dPC3                         	1216
> +#define _dPC4                         	1248
> +#define _dPC5                         	1280
> +#define _lExpMask                     	1312
> +#define _iIndexMask                   	1344
> +#define _iAbsMask                     	1376
> +#define _iDomainRange                 	1408
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN4v_exp10_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +        lea       __svml_dexp10_data_internal(%rip), %r8
> +        vmovapd   %ymm0, %ymm2
> +        vmovupd   _dbShifter+__svml_dexp10_data_internal(%rip), %ymm3
> +
> +/*  Load argument  */
> +        vmovupd   _dbLg2_10+__svml_dexp10_data_internal(%rip), %ymm0
> +        vfmadd213pd %ymm3, %ymm2, %ymm0
> +        vsubpd    %ymm3, %ymm0, %ymm1
> +
> +/*  R  */
> +        vmovupd   _dbInvLg2_10hi+__svml_dexp10_data_internal(%rip), %ymm3
> +        vfnmadd213pd %ymm2, %ymm1, %ymm3
> +
> +/* Check for overflow/underflow  */
> +        vextractf128 $1, %ymm2, %xmm4
> +        vfnmadd132pd _dbInvLg2_10lo+__svml_dexp10_data_internal(%rip), %ymm3, %ymm1
> +        vshufps   $221, %xmm4, %xmm2, %xmm5
> +        vandps    _iAbsMask+__svml_dexp10_data_internal(%rip), %xmm5, %xmm6
> +        vpcmpgtd  _iDomainRange+__svml_dexp10_data_internal(%rip), %xmm6, %xmm7
> +
> +/*
> + *  Polynomial
> + * poly(dN) = a1*dR+...+a5*dR^5
> + */
> +        vmovupd   _dPC5+__svml_dexp10_data_internal(%rip), %ymm4
> +        vmovmskps %xmm7, %eax
> +        vfmadd213pd _dPC4+__svml_dexp10_data_internal(%rip), %ymm1, %ymm4
> +        vfmadd213pd _dPC3+__svml_dexp10_data_internal(%rip), %ymm1, %ymm4
> +        vfmadd213pd _dPC2+__svml_dexp10_data_internal(%rip), %ymm1, %ymm4
> +
> +/* a1+...+a5*dR^4 ! */
> +        vfmadd213pd _dPC1+__svml_dexp10_data_internal(%rip), %ymm1, %ymm4
> +
> +/* a1*dR+...+a5*dR^5 */
> +        vmulpd    %ymm4, %ymm1, %ymm1
> +
> +/*  Index and lookup  */
> +        vextractf128 $1, %ymm0, %xmm8
> +        vshufps   $136, %xmm8, %xmm0, %xmm9
> +        vandps    _iIndexMask+__svml_dexp10_data_internal(%rip), %xmm9, %xmm10
> +
> +/* iIndex*=sizeof(D); */
> +        vpslld    $3, %xmm10, %xmm13
> +        vmovd     %xmm13, %edx
> +
> +/*  2^N  */
> +        vpsllq    $45, %ymm0, %ymm0
> +        vpextrd   $2, %xmm13, %esi
> +        movslq    %edx, %rdx
> +        vpextrd   $1, %xmm13, %ecx
> +        movslq    %esi, %rsi
> +        vpextrd   $3, %xmm13, %edi
> +        movslq    %ecx, %rcx
> +        movslq    %edi, %rdi
> +        vmovsd    (%r8,%rdx), %xmm11
> +        vmovsd    (%r8,%rsi), %xmm14
> +        vmovhpd   (%r8,%rcx), %xmm11, %xmm12
> +        vmovhpd   (%r8,%rdi), %xmm14, %xmm15
> +
> +/* lM==EXP(2^N) */
> +        vpand     _lExpMask+__svml_dexp10_data_internal(%rip), %ymm0, %ymm6
> +        vinsertf128 $1, %xmm15, %ymm12, %ymm5
> +
> +/* Tj*poly */
> +        vfmadd213pd %ymm5, %ymm5, %ymm1
> +
> +/* quick 2^N */
> +        vpaddq    %ymm6, %ymm1, %ymm0
> +
> +/*  Finish   */
> +        testl     %eax, %eax
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm2
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovupd   %ymm2, 32(%rsp)
> +        vmovupd   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0
> +
> +        xorl      %edx, %edx
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovupd   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      exp10@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 64(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN4v_exp10_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_dexp10_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(32)) VUINT32 _dbT[(1<<7)][2];
> +        __declspec(align(32)) VUINT32 _dbLg2_10[4][2];
> +        __declspec(align(32)) VUINT32 _dbShifter[4][2];
> +        __declspec(align(32)) VUINT32 _dbInvLg2_10hi[4][2];
> +        __declspec(align(32)) VUINT32 _dbInvLg2_10lo[4][2];
> +        __declspec(align(32)) VUINT32 _dPC1[4][2];
> +        __declspec(align(32)) VUINT32 _dPC2[4][2];
> +        __declspec(align(32)) VUINT32 _dPC3[4][2];
> +        __declspec(align(32)) VUINT32 _dPC4[4][2];
> +        __declspec(align(32)) VUINT32 _dPC5[4][2];
> +        __declspec(align(32)) VUINT32 _lExpMask[4][2];
> +        __declspec(align(32)) VUINT32 _iIndexMask[8][1];
> +        __declspec(align(32)) VUINT32 _iAbsMask[8][1];
> +        __declspec(align(32)) VUINT32 _iDomainRange[8][1];
> +} __svml_dexp10_data_internal;
> +#endif
> +__svml_dexp10_data_internal:
> +        /*== _dbT ==*/
> +        .quad 0x3ff0000000000000    /*2^( 0 /128)*/
> +        .quad 0x3ff0163da9fb3335    /*2^( 1 /128)*/
> +        .quad 0x3ff02c9a3e778061    /*2^( 2 /128)*/
> +        .quad 0x3ff04315e86e7f85    /*2^( 3 /128)*/
> +        .quad 0x3ff059b0d3158574    /*2^( 4 /128)*/
> +        .quad 0x3ff0706b29ddf6de    /*2^( 5 /128)*/
> +        .quad 0x3ff0874518759bc8    /*2^( 6 /128)*/
> +        .quad 0x3ff09e3ecac6f383    /*2^( 7 /128)*/
> +        .quad 0x3ff0b5586cf9890f    /*2^( 8 /128)*/
> +        .quad 0x3ff0cc922b7247f7    /*2^( 9 /128)*/
> +        .quad 0x3ff0e3ec32d3d1a2    /*2^( 10 /128)*/
> +        .quad 0x3ff0fb66affed31b    /*2^( 11 /128)*/
> +        .quad 0x3ff11301d0125b51    /*2^( 12 /128)*/
> +        .quad 0x3ff12abdc06c31cc    /*2^( 13 /128)*/
> +        .quad 0x3ff1429aaea92de0    /*2^( 14 /128)*/
> +        .quad 0x3ff15a98c8a58e51    /*2^( 15 /128)*/
> +        .quad 0x3ff172b83c7d517b    /*2^( 16 /128)*/
> +        .quad 0x3ff18af9388c8dea    /*2^( 17 /128)*/
> +        .quad 0x3ff1a35beb6fcb75    /*2^( 18 /128)*/
> +        .quad 0x3ff1bbe084045cd4    /*2^( 19 /128)*/
> +        .quad 0x3ff1d4873168b9aa    /*2^( 20 /128)*/
> +        .quad 0x3ff1ed5022fcd91d    /*2^( 21 /128)*/
> +        .quad 0x3ff2063b88628cd6    /*2^( 22 /128)*/
> +        .quad 0x3ff21f49917ddc96    /*2^( 23 /128)*/
> +        .quad 0x3ff2387a6e756238    /*2^( 24 /128)*/
> +        .quad 0x3ff251ce4fb2a63f    /*2^( 25 /128)*/
> +        .quad 0x3ff26b4565e27cdd    /*2^( 26 /128)*/
> +        .quad 0x3ff284dfe1f56381    /*2^( 27 /128)*/
> +        .quad 0x3ff29e9df51fdee1    /*2^( 28 /128)*/
> +        .quad 0x3ff2b87fd0dad990    /*2^( 29 /128)*/
> +        .quad 0x3ff2d285a6e4030b    /*2^( 30 /128)*/
> +        .quad 0x3ff2ecafa93e2f56    /*2^( 31 /128)*/
> +        .quad 0x3ff306fe0a31b715    /*2^( 32 /128)*/
> +        .quad 0x3ff32170fc4cd831    /*2^( 33 /128)*/
> +        .quad 0x3ff33c08b26416ff    /*2^( 34 /128)*/
> +        .quad 0x3ff356c55f929ff1    /*2^( 35 /128)*/
> +        .quad 0x3ff371a7373aa9cb    /*2^( 36 /128)*/
> +        .quad 0x3ff38cae6d05d866    /*2^( 37 /128)*/
> +        .quad 0x3ff3a7db34e59ff7    /*2^( 38 /128)*/
> +        .quad 0x3ff3c32dc313a8e5    /*2^( 39 /128)*/
> +        .quad 0x3ff3dea64c123422    /*2^( 40 /128)*/
> +        .quad 0x3ff3fa4504ac801c    /*2^( 41 /128)*/
> +        .quad 0x3ff4160a21f72e2a    /*2^( 42 /128)*/
> +        .quad 0x3ff431f5d950a897    /*2^( 43 /128)*/
> +        .quad 0x3ff44e086061892d    /*2^( 44 /128)*/
> +        .quad 0x3ff46a41ed1d0057    /*2^( 45 /128)*/
> +        .quad 0x3ff486a2b5c13cd0    /*2^( 46 /128)*/
> +        .quad 0x3ff4a32af0d7d3de    /*2^( 47 /128)*/
> +        .quad 0x3ff4bfdad5362a27    /*2^( 48 /128)*/
> +        .quad 0x3ff4dcb299fddd0d    /*2^( 49 /128)*/
> +        .quad 0x3ff4f9b2769d2ca7    /*2^( 50 /128)*/
> +        .quad 0x3ff516daa2cf6642    /*2^( 51 /128)*/
> +        .quad 0x3ff5342b569d4f82    /*2^( 52 /128)*/
> +        .quad 0x3ff551a4ca5d920f    /*2^( 53 /128)*/
> +        .quad 0x3ff56f4736b527da    /*2^( 54 /128)*/
> +        .quad 0x3ff58d12d497c7fd    /*2^( 55 /128)*/
> +        .quad 0x3ff5ab07dd485429    /*2^( 56 /128)*/
> +        .quad 0x3ff5c9268a5946b7    /*2^( 57 /128)*/
> +        .quad 0x3ff5e76f15ad2148    /*2^( 58 /128)*/
> +        .quad 0x3ff605e1b976dc09    /*2^( 59 /128)*/
> +        .quad 0x3ff6247eb03a5585    /*2^( 60 /128)*/
> +        .quad 0x3ff6434634ccc320    /*2^( 61 /128)*/
> +        .quad 0x3ff6623882552225    /*2^( 62 /128)*/
> +        .quad 0x3ff68155d44ca973    /*2^( 63 /128)*/
> +        .quad 0x3ff6a09e667f3bcd    /*2^( 64 /128)*/
> +        .quad 0x3ff6c012750bdabf    /*2^( 65 /128)*/
> +        .quad 0x3ff6dfb23c651a2f    /*2^( 66 /128)*/
> +        .quad 0x3ff6ff7df9519484    /*2^( 67 /128)*/
> +        .quad 0x3ff71f75e8ec5f74    /*2^( 68 /128)*/
> +        .quad 0x3ff73f9a48a58174    /*2^( 69 /128)*/
> +        .quad 0x3ff75feb564267c9    /*2^( 70 /128)*/
> +        .quad 0x3ff780694fde5d3f    /*2^( 71 /128)*/
> +        .quad 0x3ff7a11473eb0187    /*2^( 72 /128)*/
> +        .quad 0x3ff7c1ed0130c132    /*2^( 73 /128)*/
> +        .quad 0x3ff7e2f336cf4e62    /*2^( 74 /128)*/
> +        .quad 0x3ff80427543e1a12    /*2^( 75 /128)*/
> +        .quad 0x3ff82589994cce13    /*2^( 76 /128)*/
> +        .quad 0x3ff8471a4623c7ad    /*2^( 77 /128)*/
> +        .quad 0x3ff868d99b4492ed    /*2^( 78 /128)*/
> +        .quad 0x3ff88ac7d98a6699    /*2^( 79 /128)*/
> +        .quad 0x3ff8ace5422aa0db    /*2^( 80 /128)*/
> +        .quad 0x3ff8cf3216b5448c    /*2^( 81 /128)*/
> +        .quad 0x3ff8f1ae99157736    /*2^( 82 /128)*/
> +        .quad 0x3ff9145b0b91ffc6    /*2^( 83 /128)*/
> +        .quad 0x3ff93737b0cdc5e5    /*2^( 84 /128)*/
> +        .quad 0x3ff95a44cbc8520f    /*2^( 85 /128)*/
> +        .quad 0x3ff97d829fde4e50    /*2^( 86 /128)*/
> +        .quad 0x3ff9a0f170ca07ba    /*2^( 87 /128)*/
> +        .quad 0x3ff9c49182a3f090    /*2^( 88 /128)*/
> +        .quad 0x3ff9e86319e32323    /*2^( 89 /128)*/
> +        .quad 0x3ffa0c667b5de565    /*2^( 90 /128)*/
> +        .quad 0x3ffa309bec4a2d33    /*2^( 91 /128)*/
> +        .quad 0x3ffa5503b23e255d    /*2^( 92 /128)*/
> +        .quad 0x3ffa799e1330b358    /*2^( 93 /128)*/
> +        .quad 0x3ffa9e6b5579fdbf    /*2^( 94 /128)*/
> +        .quad 0x3ffac36bbfd3f37a    /*2^( 95 /128)*/
> +        .quad 0x3ffae89f995ad3ad    /*2^( 96 /128)*/
> +        .quad 0x3ffb0e07298db666    /*2^( 97 /128)*/
> +        .quad 0x3ffb33a2b84f15fb    /*2^( 98 /128)*/
> +        .quad 0x3ffb59728de5593a    /*2^( 99 /128)*/
> +        .quad 0x3ffb7f76f2fb5e47    /*2^( 100 /128)*/
> +        .quad 0x3ffba5b030a1064a    /*2^( 101 /128)*/
> +        .quad 0x3ffbcc1e904bc1d2    /*2^( 102 /128)*/
> +        .quad 0x3ffbf2c25bd71e09    /*2^( 103 /128)*/
> +        .quad 0x3ffc199bdd85529c    /*2^( 104 /128)*/
> +        .quad 0x3ffc40ab5fffd07a    /*2^( 105 /128)*/
> +        .quad 0x3ffc67f12e57d14b    /*2^( 106 /128)*/
> +        .quad 0x3ffc8f6d9406e7b5    /*2^( 107 /128)*/
> +        .quad 0x3ffcb720dcef9069    /*2^( 108 /128)*/
> +        .quad 0x3ffcdf0b555dc3fa    /*2^( 109 /128)*/
> +        .quad 0x3ffd072d4a07897c    /*2^( 110 /128)*/
> +        .quad 0x3ffd2f87080d89f2    /*2^( 111 /128)*/
> +        .quad 0x3ffd5818dcfba487    /*2^( 112 /128)*/
> +        .quad 0x3ffd80e316c98398    /*2^( 113 /128)*/
> +        .quad 0x3ffda9e603db3285    /*2^( 114 /128)*/
> +        .quad 0x3ffdd321f301b460    /*2^( 115 /128)*/
> +        .quad 0x3ffdfc97337b9b5f    /*2^( 116 /128)*/
> +        .quad 0x3ffe264614f5a129    /*2^( 117 /128)*/
> +        .quad 0x3ffe502ee78b3ff6    /*2^( 118 /128)*/
> +        .quad 0x3ffe7a51fbc74c83    /*2^( 119 /128)*/
> +        .quad 0x3ffea4afa2a490da    /*2^( 120 /128)*/
> +        .quad 0x3ffecf482d8e67f1    /*2^( 121 /128)*/
> +        .quad 0x3ffefa1bee615a27    /*2^( 122 /128)*/
> +        .quad 0x3fff252b376bba97    /*2^( 123 /128)*/
> +        .quad 0x3fff50765b6e4540    /*2^( 124 /128)*/
> +        .quad 0x3fff7bfdad9cbe14    /*2^( 125 /128)*/
> +        .quad 0x3fffa7c1819e90d8    /*2^( 126 /128)*/
> +        .quad 0x3fffd3c22b8f71f1     /*2^( 127 /128)*/
> +        .align 32
> +        .quad 0x407a934f0979a371, 0x407a934f0979a371, 0x407a934f0979a371, 0x407a934f0979a371  /* _dbLg2_10*2^K */
> +        .align 32
> +        .quad 0x4338800000000000, 0x4338800000000000, 0x4338800000000000, 0x4338800000000000  /* _dbShifter */
> +        .align 32
> +        .quad 0x3f63441350a00000, 0x3f63441350a00000, 0x3f63441350a00000, 0x3f63441350a00000  /* _dbInvLg2_10hi/2^K 53-11-K bits*/
> +        .align 32
> +        .quad 0xbd10c0219dc1da99, 0xbd10c0219dc1da99, 0xbd10c0219dc1da99, 0xbd10c0219dc1da99  /* _dbInvLg2_10lo/2^K */
> +        //PC0 = 1.0
> +        .align 32
> +        .quad 0x40026bb1bbb55516, 0x40026bb1bbb55516, 0x40026bb1bbb55516, 0x40026bb1bbb55516  /* _dPC1 */
> +        .align 32
> +        .quad 0x40053524c73ce8e3, 0x40053524c73ce8e3, 0x40053524c73ce8e3, 0x40053524c73ce8e3  /* _dPC2 */
> +        .align 32
> +        .quad 0x4000470591ccea8b, 0x4000470591ccea8b, 0x4000470591ccea8b, 0x4000470591ccea8b  /* _dPC3 */
> +        .align 32
> +        .quad 0x3ff2bd767584db59, 0x3ff2bd767584db59, 0x3ff2bd767584db59, 0x3ff2bd767584db59  /* _dPC4 */
> +        .align 32
> +        .quad 0x3fe144c03efafb54, 0x3fe144c03efafb54, 0x3fe144c03efafb54, 0x3fe144c03efafb54  /* _dPC5 */
> +        .align 32
> +        .quad 0xfff0000000000000, 0xfff0000000000000, 0xfff0000000000000, 0xfff0000000000000  /* _lExpMask */
> +        .align 32
> +        .long 0x0000007f, 0x0000007f, 0x0000007f, 0x0000007f, 0x0000007f, 0x0000007f, 0x0000007f, 0x0000007f          /* _iIndexMask =(2^K-1)*/
> +        //common
> +        .align 32
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff          /* _iAbsMask */
> +        .align 32
> +        .long 0x40733a70, 0x40733a70, 0x40733a70, 0x40733a70, 0x40733a70, 0x40733a70, 0x40733a70, 0x40733a70 /* _iDomainRange */
> +        .align 32
> +        .type	__svml_dexp10_data_internal,@object
> +        .size	__svml_dexp10_data_internal,.-__svml_dexp10_data_internal
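The L(SPECIAL_VALUES_BRANCH) / L(RANGEMASK_CHECK) / L(SCALAR_MATH_CALL) blocks in both the SSE4 and AVX2 variants above do the same thing: lanes whose inputs were flagged by the _iDomainRange compare are recomputed one at a time with the scalar exp10. A hypothetical C rendering (VLEN and the mask layout are illustrative):

#define _GNU_SOURCE                 /* for the scalar exp10 prototype */
#include <math.h>

#define VLEN 4                      /* 2 for _ZGVbN2v_exp10_sse4, 4 for _ZGVdN4v_exp10_avx2 */

/* mask has one bit per lane, produced by the overflow/underflow compare;
   this mirrors the btl/jc loop at L(RANGEMASK_CHECK).  */
static void
fixup_special_lanes (const double x[VLEN], double y[VLEN], unsigned int mask)
{
  for (int lane = 0; lane < VLEN; lane++)
    if (mask & (1u << lane))
      y[lane] = exp10 (x[lane]);    /* the call exp10@PLT in the assembly */
}
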
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core-avx2.S
> new file mode 100644
> index 0000000000..3aff9446d3
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized exp10, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN8v_exp10 _ZGVeN8v_exp10_avx2_wrapper
> +#include "../svml_d_exp108_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core.c
> new file mode 100644
> index 0000000000..d592663169
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized exp10, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN8v_exp10
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN8v_exp10, __GI__ZGVeN8v_exp10, __redirect__ZGVeN8v_exp10)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
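
Not part of the patch, just a note for anyone exercising the new entry points:
once a glibc carrying this series is installed, a compiler that honours the
simd declarations can turn a plain loop into calls to _ZGVeN8v_exp10 and its
siblings.  The flags below are assumptions (they vary by compiler and -march),
and the function name is mine:

  /* Possible build line: gcc -Ofast -march=skylake-avx512 -fopenmp-simd demo.c -lm
     then look for _ZGVeN8v_exp10 in the generated code with objdump -d.  */
  #define _GNU_SOURCE
  #include <math.h>

  void
  vector_exp10 (double *restrict out, const double *restrict in, int n)
  {
  #pragma omp simd
    for (int i = 0; i < n; i++)
      out[i] = exp10 (in[i]);	/* candidate for the libmvec variant */
  }
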
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core_avx512.S
> new file mode 100644
> index 0000000000..953cb5bc1a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_exp108_core_avx512.S
> @@ -0,0 +1,287 @@
> +/* Function exp10 vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *   Typical exp10() implementation, except that:
> + *    - tables are small (16 elements), allowing for fast gathers
> + *    - all arguments processed in the main path
> + *        - final VSCALEF assists branch-free design (correct overflow/underflow and special case responses)
> + *        - a VAND is used to ensure the reduced argument |R|<2, even for large inputs
> + *        - RZ mode used to avoid overflow to +/-Inf for x*log2(10); helps with special case handling
> + *        - SAE used to avoid spurious flag settings
> + *
> + */
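
Not part of the patch: a scalar C model of the reduction described above, for
reviewers who want to follow the data flow.  The function name, the plain
Taylor polynomial and the use of exp2()/scalbn() in place of the 16-entry
table and vscalefpd are mine; only the split 10^x = 2^z0 * 10^r and the
shifter constant come from the code:

  #include <math.h>
  #include <stdio.h>

  static double
  exp10_sketch (double x)
  {
    /* Shifter trick: adding 2^(52-4)*1.5 leaves z0 = x*log2(10) rounded to
       4 fractional bits (the kernel does this with a single FMA; it uses RZ
       there, round-to-nearest is fine for this sketch).  */
    const double shifter = 0x1.8p48;
    double z0 = (x * 0x1.a934f0979a371p+1 + shifter) - shifter;

    /* Reduced argument r = x - z0*log10(2), hi part plus a low correction.  */
    double r = x - z0 * 0x1.34413509f79ffp-2 + z0 * 0x1.9dc1da994fd21p-59;

    /* 10^r = exp(r*ln10); Taylor terms stand in for the tuned minimax
       coefficients of the data table.  */
    double t = r * 2.302585092994046;
    double poly = 1.0 + t + t * t / 2.0 + t * t * t / 6.0;

    /* 2^z0: integer part via scalbn (vscalefpd in the kernel), the 4
       fractional bits via the 2^(j/16) table (exp2 stands in for it).  */
    double zi = floor (z0);
    return scalbn (poly * exp2 (z0 - zi), (int) zi);
  }

  int
  main (void)
  {
    double x = 1.2345;
    printf ("%.17g vs %.17g\n", exp10_sketch (x), pow (10.0, x));
    return 0;
  }
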
> +
> +/* Offsets for data table __svml_dexp10_data_internal_avx512
> + */
> +#define Exp_tbl_H                     	0
> +#define L2E                           	128
> +#define Shifter                       	192
> +#define L2H                           	256
> +#define L2L                           	320
> +#define EMask                         	384
> +#define poly_coeff6                   	448
> +#define poly_coeff5                   	512
> +#define poly_coeff4                   	576
> +#define poly_coeff3                   	640
> +#define poly_coeff2                   	704
> +#define poly_coeff1                   	768
> +#define AbsMask                       	832
> +#define Threshold                     	896
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN8v_exp10_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovups   L2E+__svml_dexp10_data_internal_avx512(%rip), %zmm4
> +        vmovups   Shifter+__svml_dexp10_data_internal_avx512(%rip), %zmm2
> +        vmovups   L2H+__svml_dexp10_data_internal_avx512(%rip), %zmm5
> +        vmovups   L2L+__svml_dexp10_data_internal_avx512(%rip), %zmm3
> +
> +/* polynomial */
> +        vmovups   poly_coeff6+__svml_dexp10_data_internal_avx512(%rip), %zmm6
> +        vmovups   poly_coeff4+__svml_dexp10_data_internal_avx512(%rip), %zmm7
> +        vmovups   poly_coeff3+__svml_dexp10_data_internal_avx512(%rip), %zmm9
> +        vmovups   poly_coeff2+__svml_dexp10_data_internal_avx512(%rip), %zmm8
> +        vmovups   poly_coeff1+__svml_dexp10_data_internal_avx512(%rip), %zmm11
> +        vmovups   Threshold+__svml_dexp10_data_internal_avx512(%rip), %zmm14
> +        vmovaps   %zmm0, %zmm1
> +
> +/* 2^(52-4)*1.5 + x * log2(10) */
> +        vfmadd213pd {rz-sae}, %zmm2, %zmm1, %zmm4
> +        vandpd    AbsMask+__svml_dexp10_data_internal_avx512(%rip), %zmm1, %zmm13
> +
> +/* Z0 ~ x*log2(10), rounded down to 4 fractional bits */
> +        vsubpd    {rn-sae}, %zmm2, %zmm4, %zmm0
> +
> +/* Table lookup: Th */
> +        vmovups   __svml_dexp10_data_internal_avx512(%rip), %zmm2
> +        vcmppd    $29, {sae}, %zmm14, %zmm13, %k0
> +
> +/* R = x - Z0*log10(2) */
> +        vfnmadd213pd {rn-sae}, %zmm1, %zmm0, %zmm5
> +        vpermt2pd Exp_tbl_H+64+__svml_dexp10_data_internal_avx512(%rip), %zmm4, %zmm2
> +        kmovw     %k0, %edx
> +        vfnmadd231pd {rn-sae}, %zmm0, %zmm3, %zmm5
> +        vmovups   poly_coeff5+__svml_dexp10_data_internal_avx512(%rip), %zmm3
> +
> +/* ensure |R|<2 even for special cases */
> +        vandpd    EMask+__svml_dexp10_data_internal_avx512(%rip), %zmm5, %zmm12
> +        vmulpd    {rn-sae}, %zmm12, %zmm12, %zmm10
> +        vmulpd    {rn-sae}, %zmm12, %zmm2, %zmm15
> +        vfmadd231pd {rn-sae}, %zmm12, %zmm6, %zmm3
> +        vfmadd231pd {rn-sae}, %zmm12, %zmm7, %zmm9
> +        vfmadd231pd {rn-sae}, %zmm12, %zmm8, %zmm11
> +        vfmadd213pd {rn-sae}, %zmm9, %zmm10, %zmm3
> +        vfmadd213pd {rn-sae}, %zmm11, %zmm10, %zmm3
> +        vfmadd213pd {rn-sae}, %zmm2, %zmm15, %zmm3
> +        vscalefpd {rn-sae}, %zmm0, %zmm3, %zmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm1, 64(%rsp)
> +        vmovups   %zmm0, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     64(%rsp,%r14,8), %xmm0
> +        call      exp10@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 128(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN8v_exp10_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_dexp10_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 Exp_tbl_H[16][2];
> +        __declspec(align(64)) VUINT32 L2E[8][2];
> +        __declspec(align(64)) VUINT32 Shifter[8][2];
> +        __declspec(align(64)) VUINT32 L2H[8][2];
> +        __declspec(align(64)) VUINT32 L2L[8][2];
> +        __declspec(align(64)) VUINT32 EMask[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff1[8][2];
> +        __declspec(align(64)) VUINT32 AbsMask[8][2];
> +        __declspec(align(64)) VUINT32 Threshold[8][2];
> +    } __svml_dexp10_data_internal_avx512;
> +#endif
> +__svml_dexp10_data_internal_avx512:
> +        /*== Exp_tbl_H ==*/
> +        .quad 0x3ff0000000000000
> +        .quad 0x3ff0b5586cf9890f
> +        .quad 0x3ff172b83c7d517b
> +        .quad 0x3ff2387a6e756238
> +        .quad 0x3ff306fe0a31b715
> +        .quad 0x3ff3dea64c123422
> +        .quad 0x3ff4bfdad5362a27
> +        .quad 0x3ff5ab07dd485429
> +        .quad 0x3ff6a09e667f3bcd
> +        .quad 0x3ff7a11473eb0187
> +        .quad 0x3ff8ace5422aa0db
> +        .quad 0x3ff9c49182a3f090
> +        .quad 0x3ffae89f995ad3ad
> +        .quad 0x3ffc199bdd85529c
> +        .quad 0x3ffd5818dcfba487
> +        .quad 0x3ffea4afa2a490da
> +        /*== log2(10) ==*/
> +        .align 64
> +        .quad 0x400A934F0979A371, 0x400A934F0979A371, 0x400A934F0979A371, 0x400A934F0979A371, 0x400A934F0979A371, 0x400A934F0979A371, 0x400A934F0979A371, 0x400A934F0979A371
> +        /*== Shifter=2^(52-4)*1.5 ==*/
> +        .align 64
> +        .quad 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0, 0x42f8000000003ff0
> +        /*== L2H = log10(2) high ==*/
> +        .align 64
> +        .quad 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff
> +        /*== L2L = log10(2) low ==*/
> +        .align 64
> +        .quad 0xbc49dc1da994fd21, 0xbc49dc1da994fd21, 0xbc49dc1da994fd21, 0xbc49dc1da994fd21, 0xbc49dc1da994fd21, 0xbc49dc1da994fd21, 0xbc49dc1da994fd21, 0xbc49dc1da994fd21
> +        /*== EMask ==*/
> +        .align 64
> +        .quad 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff, 0xbfffffffffffffff
> +        /*== poly_coeff6 ==*/
> +        .align 64
> +        .quad 0x3fcb137ed8ac2020, 0x3fcb137ed8ac2020, 0x3fcb137ed8ac2020, 0x3fcb137ed8ac2020, 0x3fcb137ed8ac2020, 0x3fcb137ed8ac2020, 0x3fcb137ed8ac2020, 0x3fcb137ed8ac2020
> +        /*== poly_coeff5 ==*/
> +        .align 64
> +        .quad 0x3fe141a8e24f9424, 0x3fe141a8e24f9424, 0x3fe141a8e24f9424, 0x3fe141a8e24f9424, 0x3fe141a8e24f9424, 0x3fe141a8e24f9424, 0x3fe141a8e24f9424, 0x3fe141a8e24f9424
> +        /*== poly_coeff4 ==*/
> +        .align 64
> +        .quad 0x3ff2bd77a0926c9d, 0x3ff2bd77a0926c9d, 0x3ff2bd77a0926c9d, 0x3ff2bd77a0926c9d, 0x3ff2bd77a0926c9d, 0x3ff2bd77a0926c9d, 0x3ff2bd77a0926c9d, 0x3ff2bd77a0926c9d
> +        /*== poly_coeff3 ==*/
> +        .align 64
> +        .quad 0x40004705908704c8, 0x40004705908704c8, 0x40004705908704c8, 0x40004705908704c8, 0x40004705908704c8, 0x40004705908704c8, 0x40004705908704c8, 0x40004705908704c8
> +        /*== poly_coeff2 ==*/
> +        .align 64
> +        .quad 0x40053524c73dfe25, 0x40053524c73dfe25, 0x40053524c73dfe25, 0x40053524c73dfe25, 0x40053524c73dfe25, 0x40053524c73dfe25, 0x40053524c73dfe25, 0x40053524c73dfe25
> +        /*== poly_coeff1 ==*/
> +        .align 64
> +        .quad 0x40026bb1bbb554c2, 0x40026bb1bbb554c2, 0x40026bb1bbb554c2, 0x40026bb1bbb554c2, 0x40026bb1bbb554c2, 0x40026bb1bbb554c2, 0x40026bb1bbb554c2, 0x40026bb1bbb554c2
> +        /*== AbsMask ==*/
> +        .align 64
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
> +        /*== Threshold ==*/
> +        .align 64
> +        .quad 0x40733A7146F72A41, 0x40733A7146F72A41, 0x40733A7146F72A41, 0x40733A7146F72A41, 0x40733A7146F72A41, 0x40733A7146F72A41, 0x40733A7146F72A41, 0x40733A7146F72A41
> +        .align 64
> +        .type	__svml_dexp10_data_internal_avx512,@object
> +        .size	__svml_dexp10_data_internal_avx512,.-__svml_dexp10_data_internal_avx512
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core-avx2.S
> new file mode 100644
> index 0000000000..dda41c9c8f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized exp10f.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN16v_exp10f _ZGVeN16v_exp10f_avx2_wrapper
> +#include "../svml_s_exp10f16_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core.c
> new file mode 100644
> index 0000000000..8176a5912b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized exp10f, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN16v_exp10f
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN16v_exp10f, __GI__ZGVeN16v_exp10f,
> +	       __redirect__ZGVeN16v_exp10f)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core_avx512.S
> new file mode 100644
> index 0000000000..fc9309c90f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f16_core_avx512.S
> @@ -0,0 +1,269 @@
> +/* Function exp10f vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *   Typical exp10() implementation, except that:
> + *    - tables are small (16 elements), allowing for fast gathers
> + *    - all arguments processed in the main path
> + *        - final VSCALEF assists branch-free design (correct overflow/underflow and special case responses)
> + *        - a VAND is used to ensure the reduced argument |R|<2, even for large inputs
> + *        - RZ mode used to avoid overflow to +/-Inf for x*log2(10); helps with special case handling
> + *        - SAE used to avoid spurious flag settings
> + *
> + */
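
Not part of the patch: the single-precision kernel splits the table lookup in
two (Exp_tbl_H indexed via vpsrld $5, Exp_tbl_L by the low bits), which the
scalar model below spells out.  Function and variable names are mine, and the
tables are regenerated with exp2f(), so the last bit may differ from the data
section:

  #include <math.h>
  #include <stdio.h>
  #include <stdint.h>

  /* With k = round(x*log2(10)*1024):
     2^(k/1024) = 2^(k>>10) * Th[(k>>5) & 31] * Tl[k & 31],
     where Th[j] = 2^(j/32) and Tl[j] = 2^(j/1024).  The kernel obtains k from
     the shifter trick and applies the final 2^(k>>10) with vscalefps.  */
  static float
  exp10f_two_level (float x)
  {
    float th[32], tl[32];
    for (int j = 0; j < 32; j++)
      {
        th[j] = exp2f (j / 32.0f);	/* plays the Exp_tbl_H role */
        tl[j] = exp2f (j / 1024.0f);	/* plays the Exp_tbl_L role */
      }

    int32_t k = (int32_t) lrintf (x * 3.3219281f * 1024.0f);
    float r = x - (k / 1024.0f) * 0.30103f;	/* hi part of log10(2) only */
    float t = r * 2.3025851f;			/* r * ln(10) */
    float poly = 1.0f + t + 0.5f * t * t;	/* rough 10^r */
    return ldexpf (th[(k >> 5) & 31] * tl[k & 31] * poly, k >> 10);
  }

  int
  main (void)
  {
    printf ("%.8g vs %.8g\n", exp10f_two_level (2.5f), powf (10.0f, 2.5f));
    return 0;
  }
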
> +
> +/* Offsets for data table __svml_sexp10_data_internal_avx512
> + */
> +#define Exp_tbl_L                     	0
> +#define Exp_tbl_H                     	128
> +#define L2E                           	256
> +#define Shifter                       	320
> +#define L2H                           	384
> +#define L2L                           	448
> +#define EMask                         	512
> +#define AbsMask                       	576
> +#define Threshold                     	640
> +#define poly_coeff2                   	704
> +#define poly_coeff1                   	768
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN16v_exp10f_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovups   L2E+__svml_sexp10_data_internal_avx512(%rip), %zmm2
> +        vmovups   Shifter+__svml_sexp10_data_internal_avx512(%rip), %zmm1
> +        vmovups   L2H+__svml_sexp10_data_internal_avx512(%rip), %zmm5
> +        vmovups   L2L+__svml_sexp10_data_internal_avx512(%rip), %zmm4
> +
> +/* ensure |R|<2 even for special cases */
> +        vmovups   EMask+__svml_sexp10_data_internal_avx512(%rip), %zmm6
> +        vmovups   poly_coeff2+__svml_sexp10_data_internal_avx512(%rip), %zmm9
> +
> +/* 2^(23-10)*1.5 + x * log2(10) */
> +        vfmadd213ps {rz-sae}, %zmm1, %zmm0, %zmm2
> +        vmovups   poly_coeff1+__svml_sexp10_data_internal_avx512(%rip), %zmm10
> +        vmovups   __svml_sexp10_data_internal_avx512(%rip), %zmm8
> +        vmovups   Exp_tbl_H+__svml_sexp10_data_internal_avx512(%rip), %zmm15
> +        vmovups   Threshold+__svml_sexp10_data_internal_avx512(%rip), %zmm13
> +        vpsrld    $5, %zmm2, %zmm3
> +
> +/* Z0 ~ x*log2(10), rounded down to 6 fractional bits */
> +        vsubps    {rn-sae}, %zmm1, %zmm2, %zmm1
> +        vpermt2ps Exp_tbl_L+64+__svml_sexp10_data_internal_avx512(%rip), %zmm2, %zmm8
> +        vpermt2ps Exp_tbl_H+64+__svml_sexp10_data_internal_avx512(%rip), %zmm3, %zmm15
> +        vandps    AbsMask+__svml_sexp10_data_internal_avx512(%rip), %zmm0, %zmm12
> +
> +/* R = x - Z0*log10(2) */
> +        vfnmadd213ps {rn-sae}, %zmm0, %zmm1, %zmm5
> +        vcmpps    $29, {sae}, %zmm13, %zmm12, %k0
> +        vfnmadd231ps {rn-sae}, %zmm1, %zmm4, %zmm5
> +        kmovw     %k0, %edx
> +        vrangeps  $2, {sae}, %zmm6, %zmm5, %zmm11
> +        vfmadd231ps {rn-sae}, %zmm11, %zmm9, %zmm10
> +        vmulps    {rn-sae}, %zmm11, %zmm10, %zmm14
> +
> +/* x!=0? */
> +        vpxord    %zmm7, %zmm7, %zmm7
> +        vcmpps    $4, {sae}, %zmm7, %zmm0, %k1
> +
> +/* Th*Tl */
> +        vmulps    {rn-sae}, %zmm8, %zmm15, %zmm15{%k1}
> +        vfmadd213ps {rn-sae}, %zmm15, %zmm14, %zmm15
> +        vscalefps {rn-sae}, %zmm1, %zmm15, %zmm1
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        vmovaps   %zmm1, %zmm0
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm0, 64(%rsp)
> +        vmovups   %zmm1, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm1
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $16, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm1
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm1
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     64(%rsp,%r14,4), %xmm0
> +        call      exp10f@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 128(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN16v_exp10f_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_sexp10_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 Exp_tbl_L[32][1];
> +        __declspec(align(64)) VUINT32 Exp_tbl_H[32][1];
> +        __declspec(align(64)) VUINT32 L2E[16][1];
> +        __declspec(align(64)) VUINT32 Shifter[16][1];
> +        __declspec(align(64)) VUINT32 L2H[16][1];
> +        __declspec(align(64)) VUINT32 L2L[16][1];
> +        __declspec(align(64)) VUINT32 EMask[16][1];
> +        __declspec(align(64)) VUINT32 AbsMask[16][1];
> +        __declspec(align(64)) VUINT32 Threshold[16][1];
> +        __declspec(align(64)) VUINT32 poly_coeff2[16][1];
> +        __declspec(align(64)) VUINT32 poly_coeff1[16][1];
> +    } __svml_sexp10_data_internal_avx512;
> +#endif
> +__svml_sexp10_data_internal_avx512:
> +        /*== Exp_tbl_L ==*/
> +        .long 0x3f800001, 0x3f801631, 0x3f802c65, 0x3f80429d
> +        .long 0x3f8058d9, 0x3f806f18, 0x3f80855c, 0x3f809ba3
> +        .long 0x3f80b1ee, 0x3f80c83d, 0x3f80de90, 0x3f80f4e7
> +        .long 0x3f810b42, 0x3f8121a0, 0x3f813803, 0x3f814e69
> +        .long 0x3f8164d3, 0x3f817b41, 0x3f8191b3, 0x3f81a829
> +        .long 0x3f81bea2, 0x3f81d520, 0x3f81eba2, 0x3f820227
> +        .long 0x3f8218b0, 0x3f822f3d, 0x3f8245cf, 0x3f825c64
> +        .long 0x3f8272fd, 0x3f828999, 0x3f82a03a, 0x3f82b6df
> +        /*== Exp_tbl_H ==*/
> +        .align 64
> +        .long 0x3f800000, 0x3f82cd87, 0x3f85aac3, 0x3f88980f
> +        .long 0x3f8b95c2, 0x3f8ea43a, 0x3f91c3d3, 0x3f94f4f0
> +        .long 0x3f9837f0, 0x3f9b8d3a, 0x3f9ef532, 0x3fa27043
> +        .long 0x3fa5fed7, 0x3fa9a15b, 0x3fad583f, 0x3fb123f6
> +        .long 0x3fb504f3, 0x3fb8fbaf, 0x3fbd08a4, 0x3fc12c4d
> +        .long 0x3fc5672a, 0x3fc9b9be, 0x3fce248c, 0x3fd2a81e
> +        .long 0x3fd744fd, 0x3fdbfbb8, 0x3fe0ccdf, 0x3fe5b907
> +        .long 0x3feac0c7, 0x3fefe4ba, 0x3ff5257d, 0x3ffa83b3
> +        /*== log2(10) ==*/
> +        .align 64
> +        .long 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78, 0x40549A78
> +        /*== Shifter=2^(23-10)*1.5 ==*/
> +        .align 64
> +        .long 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000, 0x46400000
> +        /*== L2H = log10(2) high ==*/
> +        .align 64
> +        .long 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b
> +        /*== L2L = log10(2) low ==*/
> +        .align 64
> +        .long 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860, 0xb2760860
> +        /*== EMask ==*/
> +        .align 64
> +        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000
> +        /*== AbsMask ==*/
> +        .align 64
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> +        /*== Threshold ==*/
> +        .align 64
> +        .long 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818, 0x4217B818
> +        /*== poly_coeff2 ==*/
> +        .align 64
> +        .long 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA, 0x4029B7DA
> +        /*== poly_coeff1 ==*/
> +        .align 64
> +        .long 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D, 0x40135D8D
> +        .align 64
> +        .type	__svml_sexp10_data_internal_avx512,@object
> +        .size	__svml_sexp10_data_internal_avx512,.-__svml_sexp10_data_internal_avx512
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core-sse2.S
> new file mode 100644
> index 0000000000..460d01357d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized exp10f, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN4v_exp10f _ZGVbN4v_exp10f_sse2
> +#include "../svml_s_exp10f4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core.c
> new file mode 100644
> index 0000000000..7ce90a9bae
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized exp10f, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN4v_exp10f
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN4v_exp10f, __GI__ZGVbN4v_exp10f,
> +	       __redirect__ZGVbN4v_exp10f)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core_sse4.S
> new file mode 100644
> index 0000000000..879592b789
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f4_core_sse4.S
> @@ -0,0 +1,311 @@
> +/* Function exp10f vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   exp10(x)  = 2^(x/log10(2)) = 2^n * (1 + T[j]) * (1 + P(y))
> + *   where
> + *        x = m*log10(2)/K + y,  y in [-log10(2)/K..log10(2)/K]
> + *        m = n*K + j,           m,n,j - signed integer, j in [-K/2..K/2]
> + *
> + *        values of 2^j/K are tabulated
> + *
> + *        P(y) is a minimax polynomial approximation of exp10(x)-1
> + *        on small interval [-log10(2)/K..log10(2)/K]
> + *
> + *  Special cases:
> + *
> + *   exp10(NaN)  = NaN
> + *   exp10(+INF) = +INF
> + *   exp10(-INF) = 0
> + *   exp10(x)    = 1 for subnormals
> + *   For IEEE float
> + *     if x >  38.5318412780761720 then exp10f(x) overflow
> + *     if x < -45.4555282592773440 then exp10f(x) underflow
> + *
> + */
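
Not part of the patch: a scalar model of the K=32 scheme described above,
including the "quick mul 2^N" step (the pslld $18 / paddd pair later in this
file) done as an explicit integer add into the exponent field.  Names and the
recomputed table/coefficients are mine:

  #include <math.h>
  #include <stdio.h>
  #include <stdint.h>
  #include <string.h>

  static float
  exp10f_k32 (float x)
  {
    const int K = 32;
    int32_t m = (int32_t) lrintf (x * (float) K / 0.30103f); /* m ~ x*K/log10(2) */
    int32_t j = m & (K - 1);			/* table index */
    int32_t n = m >> 5;				/* exponent applied at the end */

    float r = x - (float) m * (0.30103f / K);	/* reduced argument */
    float t = r * 2.3025851f;			/* ln(10) * r */
    float res = exp2f ((float) j / K) * (1.0f + t + 0.5f * t * t);

    /* Quick mul by 2^n: adding n<<23 to the bit pattern of a positive normal
       float bumps its exponent field; this is only valid while the scaled
       result stays normal, which is one reason out-of-range lanes are sent
       to the scalar path.  */
    uint32_t bits;
    memcpy (&bits, &res, sizeof bits);
    bits += (uint32_t) n << 23;
    memcpy (&res, &bits, sizeof bits);
    return res;
  }

  int
  main (void)
  {
    printf ("%.8g vs %.8g\n", exp10f_k32 (3.3f), powf (10.0f, 3.3f));
    return 0;
  }
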
> +
> +/* Offsets for data table __svml_sexp10_data_internal
> + */
> +#define _sT                           	0
> +#define _sLg2_10                      	128
> +#define _sShifter                     	144
> +#define _sInvLg2_10hi                 	160
> +#define _sInvLg2_10lo                 	176
> +#define _sPC0                         	192
> +#define _sPC1                         	208
> +#define _sPC2                         	224
> +#define _iIndexMask                   	240
> +#define _iAbsMask                     	256
> +#define _iDomainRange                 	272
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN4v_exp10f_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +        movaps    %xmm0, %xmm4
> +
> +/*  Load argument  */
> +        movups    _sLg2_10+__svml_sexp10_data_internal(%rip), %xmm2
> +        lea       __svml_sexp10_data_internal(%rip), %r8
> +        mulps     %xmm4, %xmm2
> +        movups    _sShifter+__svml_sexp10_data_internal(%rip), %xmm5
> +
> +/*  R  */
> +        movups    _sInvLg2_10hi+__svml_sexp10_data_internal(%rip), %xmm14
> +        addps     %xmm5, %xmm2
> +        movaps    %xmm2, %xmm1
> +        movups    _sInvLg2_10lo+__svml_sexp10_data_internal(%rip), %xmm15
> +        subps     %xmm5, %xmm1
> +        mulps     %xmm1, %xmm14
> +        movaps    %xmm4, %xmm5
> +        mulps     %xmm1, %xmm15
> +        subps     %xmm14, %xmm5
> +
> +/*
> + *  Polynomial
> + * exp10 = 2^N*(Tj+Tj*poly)
> + * poly(sN) = {1+later} a0+a1*sR
> + */
> +        movups    _sPC2+__svml_sexp10_data_internal(%rip), %xmm1
> +        subps     %xmm15, %xmm5
> +        mulps     %xmm5, %xmm1
> +        movdqu    _iIndexMask+__svml_sexp10_data_internal(%rip), %xmm3
> +
> +/*  Index and lookup  */
> +        movdqa    %xmm3, %xmm10
> +
> +/* remove index bits */
> +        pandn     %xmm2, %xmm3
> +        pand      %xmm2, %xmm10
> +
> +/*  2^N  */
> +        pslld     $18, %xmm3
> +
> +/* iIndex *= sizeof(S); */
> +        pslld     $2, %xmm10
> +        addps     _sPC1+__svml_sexp10_data_internal(%rip), %xmm1
> +        movd      %xmm10, %edx
> +        pshufd    $1, %xmm10, %xmm7
> +        pshufd    $2, %xmm10, %xmm9
> +        pshufd    $3, %xmm10, %xmm11
> +        movd      %xmm7, %ecx
> +        movd      %xmm9, %esi
> +        movd      %xmm11, %edi
> +
> +/* Check for overflow/underflow  */
> +        movdqu    _iAbsMask+__svml_sexp10_data_internal(%rip), %xmm6
> +        pand      %xmm4, %xmm6
> +        mulps     %xmm1, %xmm5
> +        movslq    %edx, %rdx
> +        addps     _sPC0+__svml_sexp10_data_internal(%rip), %xmm5
> +        movslq    %ecx, %rcx
> +        movslq    %esi, %rsi
> +        movslq    %edi, %rdi
> +        movd      (%r8,%rdx), %xmm0
> +        movd      (%r8,%rcx), %xmm8
> +        movd      (%r8,%rsi), %xmm13
> +        movd      (%r8,%rdi), %xmm12
> +        punpckldq %xmm8, %xmm0
> +        punpckldq %xmm12, %xmm13
> +        punpcklqdq %xmm13, %xmm0
> +
> +/* Tj_l+Tj_h*poly */
> +        mulps     %xmm0, %xmm5
> +        pcmpgtd   _iDomainRange+__svml_sexp10_data_internal(%rip), %xmm6
> +        addps     %xmm5, %xmm0
> +        movmskps  %xmm6, %eax
> +
> +/* quick mul 2^N */
> +        paddd     %xmm3, %xmm0
> +
> +/*  Finish   */
> +        testl     %eax, %eax
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 eax xmm0 xmm4
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm4, 32(%rsp)
> +        movups    %xmm0, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 eax
> +
> +        xorl      %edx, %edx
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %edx, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %eax, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      exp10f@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 48(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN4v_exp10f_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_sexp10_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(16)) VUINT32 _sT[(1<<5)][1];
> +        __declspec(align(16)) VUINT32 _sLg2_10[4][1];
> +        __declspec(align(16)) VUINT32 _sShifter[4][1];
> +        __declspec(align(16)) VUINT32 _sInvLg2_10hi[4][1];
> +        __declspec(align(16)) VUINT32 _sInvLg2_10lo[4][1];
> +        __declspec(align(16)) VUINT32 _sPC0[4][1];
> +        __declspec(align(16)) VUINT32 _sPC1[4][1];
> +        __declspec(align(16)) VUINT32 _sPC2[4][1];
> +        __declspec(align(16)) VUINT32 _iIndexMask[4][1];
> +        __declspec(align(16)) VUINT32 _iAbsMask[4][1];
> +        __declspec(align(16)) VUINT32 _iDomainRange[4][1];
> +} __svml_sexp10_data_internal;
> +#endif
> +__svml_sexp10_data_internal:
> +        /*== _sT ==*/
> +        .long 0x3f800000  // 2^( 0 /32 )
> +        .long 0x3f82cd87  // 2^( 1 /32 )
> +        .long 0x3f85aac3  // 2^( 2 /32 )
> +        .long 0x3f88980f  // 2^( 3 /32 )
> +        .long 0x3f8b95c2  // 2^( 4 /32 )
> +        .long 0x3f8ea43a  // 2^( 5 /32 )
> +        .long 0x3f91c3d3  // 2^( 6 /32 )
> +        .long 0x3f94f4f0  // 2^( 7 /32 )
> +        .long 0x3f9837f0  // 2^( 8 /32 )
> +        .long 0x3f9b8d3a  // 2^( 9 /32 )
> +        .long 0x3f9ef532  // 2^( 10/32 )
> +        .long 0x3fa27043  // 2^( 11/32 )
> +        .long 0x3fa5fed7  // 2^( 12/32 )
> +        .long 0x3fa9a15b  // 2^( 13/32 )
> +        .long 0x3fad583f  // 2^( 14/32 )
> +        .long 0x3fb123f6  // 2^( 15/32 )
> +        .long 0x3fb504f3  // 2^( 16/32 )
> +        .long 0x3fb8fbaf  // 2^( 17/32 )
> +        .long 0x3fbd08a4  // 2^( 18/32 )
> +        .long 0x3fc12c4d  // 2^( 19/32 )
> +        .long 0x3fc5672a  // 2^( 20/32 )
> +        .long 0x3fc9b9be  // 2^( 21/32 )
> +        .long 0x3fce248c  // 2^( 22/32 )
> +        .long 0x3fd2a81e  // 2^( 23/32 )
> +        .long 0x3fd744fd  // 2^( 24/32 )
> +        .long 0x3fdbfbb8  // 2^( 25/32 )
> +        .long 0x3fe0ccdf  // 2^( 26/32 )
> +        .long 0x3fe5b907  // 2^( 27/32 )
> +        .long 0x3feac0c7  // 2^( 28/32 )
> +        .long 0x3fefe4ba  // 2^( 29/32 )
> +        .long 0x3ff5257d  // 2^( 30/32 )
> +        .long 0x3ffa83b3  // 2^( 31/32 )
> +        .align 16
> +        .long 0x42d49a78, 0x42d49a78, 0x42d49a78, 0x42d49a78  /* _sLg2_10*2^K   */
> +        .align 16
> +        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000  /* _sShifter */
> +        .align 16
> +        .long 0x3c1a2000, 0x3c1a2000, 0x3c1a2000, 0x3c1a2000  /* _sInvLg2_10hi/2^K hi (24-K-7) bits*/
> +        .align 16
> +        .long 0x341a84fc, 0x341a84fc, 0x341a84fc, 0x341a84fc  /* _sInvLg2_10lo/2^K  lo bits */
> +        // otherwise exp10(0) won't produce exact 1.0
> +        .align 16
> +        .long 0x2fecc868, 0x2fecc868, 0x2fecc868, 0x2fecc868  /* _sPC0 */
> +        .align 16
> +        .long 0x40135e1b, 0x40135e1b, 0x40135e1b, 0x40135e1b  /* _sPC1 */
> +        .align 16
> +        .long 0x4029a8d2, 0x4029a8d2, 0x4029a8d2, 0x4029a8d2  /* _sPC2 */
> +        .align 16
> +        .long 0x0000001f, 0x0000001f, 0x0000001f, 0x0000001f  /* _iIndexMask =(2^K-1)*/
> +        //common
> +        .align 16
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff   /* _iAbsMask */
> +        .align 16
> +        .long 0x4217b818, 0x4217b818, 0x4217b818, 0x4217b818   /* _iDomainRange=-log10(max_denormal=0x007fffff) RZ */
> +        .align 16
> +        .type	__svml_sexp10_data_internal,@object
> +        .size	__svml_sexp10_data_internal,.-__svml_sexp10_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core-sse.S
> new file mode 100644
> index 0000000000..3f3fe252da
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized exp10f, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN8v_exp10f _ZGVdN8v_exp10f_sse_wrapper
> +#include "../svml_s_exp10f8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core.c
> new file mode 100644
> index 0000000000..1f5ed5a59d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized exp10f, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN8v_exp10f
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN8v_exp10f, __GI__ZGVdN8v_exp10f,
> +	       __redirect__ZGVdN8v_exp10f)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core_avx2.S
> new file mode 100644
> index 0000000000..b576412cf1
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_exp10f8_core_avx2.S
> @@ -0,0 +1,331 @@
> +/* Function exp10f vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   exp10(x)  = 2^(x/log10(2)) = 2^n * (1 + T[j]) * (1 + P(y))
> + *   where
> + *        x = m*log10(2)/K + y,  y in [-log10(2)/K..log10(2)/K]
> + *        m = n*K + j,           m,n,j - signed integer, j in [-K/2..K/2]
> + *
> + *        values of 2^j/K are tabulated
> + *
> + *        P(y) is a minimax polynomial approximation of exp10(x)-1
> + *        on small interval [-log10(2)/K..log10(2)/K]
> + *
> + *  Special cases:
> + *
> + *   exp10(NaN)  = NaN
> + *   exp10(+INF) = +INF
> + *   exp10(-INF) = 0
> + *   exp10(x)    = 1 for subnormals
> + *   For IEEE float
> + *     if x >  38.5318412780761720 then exp10f(x) overflow
> + *     if x < -45.4555282592773440 then exp10f(x) underflow
> + *
> + */
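
Not part of the patch: a note on the special-value handling shared by every
function in this series (the algorithm comment above is the same as in the
SSE4 file, so no new model for it here).  The L(SPECIAL_VALUES_BRANCH) /
L(RANGEMASK_CHECK) code further down does the equivalent of the following
loop; the helper name is mine, and `mask` stands for the bitmask produced by
vmovmskps or kmovw:

  #include <math.h>

  /* Recompute the flagged lanes with the scalar routine; the fast-path
     result is kept for all other lanes.  */
  static void
  fixup_special_lanes (const float *x, float *result, unsigned int mask,
                       int nlanes)
  {
    for (int i = 0; i < nlanes; i++)
      if (mask & (1u << i))		/* btl %r12d, %r13d */
        result[i] = exp10f (x[i]);	/* call exp10f@PLT */
  }

Keeping this out of line lets the main path stay branch-free apart from the
single testl/jne on the mask.
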
> +
> +/* Offsets for data table __svml_sexp10_data_internal
> + */
> +#define _sT                           	0
> +#define _sLg2_10                      	128
> +#define _sShifter                     	160
> +#define _sInvLg2_10hi                 	192
> +#define _sInvLg2_10lo                 	224
> +#define _sPC0                         	256
> +#define _sPC1                         	288
> +#define _sPC2                         	320
> +#define _iIndexMask                   	352
> +#define _iAbsMask                     	384
> +#define _iDomainRange                 	416
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN8v_exp10f_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +        lea       __svml_sexp10_data_internal(%rip), %rax
> +        vmovups   _sShifter+__svml_sexp10_data_internal(%rip), %ymm4
> +
> +/*  Load argument  */
> +        vmovups   _sLg2_10+__svml_sexp10_data_internal(%rip), %ymm1
> +        vmovups   _iIndexMask+__svml_sexp10_data_internal(%rip), %ymm2
> +        vmovaps   %ymm0, %ymm3
> +        vfmadd213ps %ymm4, %ymm3, %ymm1
> +
> +/*  Index and lookup  */
> +        vandps    %ymm2, %ymm1, %ymm7
> +
> +/* iIndex *= sizeof(S); */
> +        vpslld    $2, %ymm7, %ymm10
> +        vsubps    %ymm4, %ymm1, %ymm0
> +
> +/* Check for overflow/underflow  */
> +        vandps    _iAbsMask+__svml_sexp10_data_internal(%rip), %ymm3, %ymm5
> +        vpcmpgtd  _iDomainRange+__svml_sexp10_data_internal(%rip), %ymm5, %ymm6
> +        vmovmskps %ymm6, %edx
> +        vmovd     %xmm10, %ecx
> +        vextractf128 $1, %ymm10, %xmm6
> +        vpextrd   $1, %xmm10, %esi
> +        vpextrd   $2, %xmm10, %edi
> +        vpextrd   $3, %xmm10, %r8d
> +        movslq    %ecx, %rcx
> +        movslq    %esi, %rsi
> +        movslq    %edi, %rdi
> +        movslq    %r8d, %r8
> +        vmovd     (%rax,%rcx), %xmm8
> +        vmovd     (%rax,%rsi), %xmm9
> +        vmovd     (%rax,%rdi), %xmm11
> +        vmovd     (%rax,%r8), %xmm12
> +        vpunpckldq %xmm9, %xmm8, %xmm13
> +        vpunpckldq %xmm12, %xmm11, %xmm14
> +        vpunpcklqdq %xmm14, %xmm13, %xmm15
> +
> +/*  R  */
> +        vmovups   _sInvLg2_10hi+__svml_sexp10_data_internal(%rip), %ymm13
> +        vmovd     %xmm6, %r9d
> +        vfnmadd213ps %ymm3, %ymm0, %ymm13
> +        vpextrd   $1, %xmm6, %r10d
> +        movslq    %r9d, %r9
> +        movslq    %r10d, %r10
> +        vfnmadd132ps _sInvLg2_10lo+__svml_sexp10_data_internal(%rip), %ymm13, %ymm0
> +        vmovd     (%rax,%r9), %xmm4
> +        vmovd     (%rax,%r10), %xmm5
> +        vpunpckldq %xmm5, %xmm4, %xmm9
> +
> +/*
> + *  Polynomial
> + * exp10 = 2^N*(Tj+Tj*poly)
> + * poly(sN) = {1+later} a0+a1*sR
> + */
> +        vmovups   _sPC2+__svml_sexp10_data_internal(%rip), %ymm4
> +        vfmadd213ps _sPC1+__svml_sexp10_data_internal(%rip), %ymm0, %ymm4
> +        vpextrd   $2, %xmm6, %r11d
> +        vpextrd   $3, %xmm6, %ecx
> +        movslq    %r11d, %r11
> +        movslq    %ecx, %rcx
> +        vfmadd213ps _sPC0+__svml_sexp10_data_internal(%rip), %ymm0, %ymm4
> +        vmovd     (%rax,%r11), %xmm7
> +        vmovd     (%rax,%rcx), %xmm8
> +        vpunpckldq %xmm8, %xmm7, %xmm11
> +
> +/* remove index bits */
> +        vpandn    %ymm1, %ymm2, %ymm0
> +        vpunpcklqdq %xmm11, %xmm9, %xmm12
> +
> +/*  2^N  */
> +        vpslld    $18, %ymm0, %ymm1
> +        vinsertf128 $1, %xmm12, %ymm15, %ymm14
> +
> +/* Tj_l+Tj_h*poly */
> +        vfmadd213ps %ymm14, %ymm14, %ymm4
> +
> +/* quick mul 2^N */
> +        vpaddd    %ymm1, %ymm4, %ymm0
> +
> +/*  Finish   */
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm3
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %ymm3, 32(%rsp)
> +        vmovups   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      exp10f@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 64(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN8v_exp10f_avx2)
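
A side note for readers rather than a review comment: the SPECIAL_VALUES_BRANCH /
RANGEMASK_CHECK / SCALAR_MATH_CALL machinery above is the usual per-lane fallback.
In scalar C it amounts to roughly the sketch below (illustrative only, the names
are mine; "in" and "out" stand for the input and result vectors spilled to
32(%rsp) and 64(%rsp), and "mask" is the range mask the main path leaves in %edx):

  #define _GNU_SOURCE            /* for the exp10f prototype */
  #include <math.h>

  /* Redo every lane flagged in MASK with the scalar libm routine and patch
     the stored vector result; unflagged lanes keep the fast-path value.  */
  static void
  special_values_fallback (const float *in, float *out, unsigned int mask)
  {
    for (int lane = 0; lane < 8; lane++)      /* SPECIAL_VALUES_LOOP     */
      if (mask & (1u << lane))                /* btl in RANGEMASK_CHECK  */
        out[lane] = exp10f (in[lane]);        /* call exp10f@PLT         */
  }
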
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_sexp10_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(32)) VUINT32 _sT[(1<<5)][1];
> +        __declspec(align(32)) VUINT32 _sLg2_10[8][1];
> +        __declspec(align(32)) VUINT32 _sShifter[8][1];
> +        __declspec(align(32)) VUINT32 _sInvLg2_10hi[8][1];
> +        __declspec(align(32)) VUINT32 _sInvLg2_10lo[8][1];
> +        __declspec(align(32)) VUINT32 _sPC0[8][1];
> +        __declspec(align(32)) VUINT32 _sPC1[8][1];
> +        __declspec(align(32)) VUINT32 _sPC2[8][1];
> +        __declspec(align(32)) VUINT32 _iIndexMask[8][1];
> +        __declspec(align(32)) VUINT32 _iAbsMask[8][1];
> +        __declspec(align(32)) VUINT32 _iDomainRange[8][1];
> +} __svml_sexp10_data_internal;
> +#endif
> +__svml_sexp10_data_internal:
> +        /*== _sT ==*/
> +        .long 0x3f800000  // 2^( 0 /32 )
> +        .long 0x3f82cd87  // 2^( 1 /32 )
> +        .long 0x3f85aac3  // 2^( 2 /32 )
> +        .long 0x3f88980f  // 2^( 3 /32 )
> +        .long 0x3f8b95c2  // 2^( 4 /32 )
> +        .long 0x3f8ea43a  // 2^( 5 /32 )
> +        .long 0x3f91c3d3  // 2^( 6 /32 )
> +        .long 0x3f94f4f0  // 2^( 7 /32 )
> +        .long 0x3f9837f0  // 2^( 8 /32 )
> +        .long 0x3f9b8d3a  // 2^( 9 /32 )
> +        .long 0x3f9ef532  // 2^( 10/32 )
> +        .long 0x3fa27043  // 2^( 11/32 )
> +        .long 0x3fa5fed7  // 2^( 12/32 )
> +        .long 0x3fa9a15b  // 2^( 13/32 )
> +        .long 0x3fad583f  // 2^( 14/32 )
> +        .long 0x3fb123f6  // 2^( 15/32 )
> +        .long 0x3fb504f3  // 2^( 16/32 )
> +        .long 0x3fb8fbaf  // 2^( 17/32 )
> +        .long 0x3fbd08a4  // 2^( 18/32 )
> +        .long 0x3fc12c4d  // 2^( 19/32 )
> +        .long 0x3fc5672a  // 2^( 20/32 )
> +        .long 0x3fc9b9be  // 2^( 21/32 )
> +        .long 0x3fce248c  // 2^( 22/32 )
> +        .long 0x3fd2a81e  // 2^( 23/32 )
> +        .long 0x3fd744fd  // 2^( 24/32 )
> +        .long 0x3fdbfbb8  // 2^( 25/32 )
> +        .long 0x3fe0ccdf  // 2^( 26/32 )
> +        .long 0x3fe5b907  // 2^( 27/32 )
> +        .long 0x3feac0c7  // 2^( 28/32 )
> +        .long 0x3fefe4ba  // 2^( 29/32 )
> +        .long 0x3ff5257d  // 2^( 30/32 )
> +        .long 0x3ffa83b3  // 2^( 31/32 )
> +        .align 32
> +        .long 0x42d49a78, 0x42d49a78, 0x42d49a78, 0x42d49a78, 0x42d49a78, 0x42d49a78, 0x42d49a78, 0x42d49a78  /* _sLg2_10*2^K   */
> +        .align 32
> +        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000  /* _sShifter */
> +        .align 32
> +        .long 0x3c1a2000, 0x3c1a2000, 0x3c1a2000, 0x3c1a2000, 0x3c1a2000, 0x3c1a2000, 0x3c1a2000, 0x3c1a2000  /* _sInvLg2_10hi/2^K hi (24-K-7) bits*/
> +        .align 32
> +        .long 0x341a84fc, 0x341a84fc, 0x341a84fc, 0x341a84fc, 0x341a84fc, 0x341a84fc, 0x341a84fc, 0x341a84fc  /* _sInvLg2_10lo/2^K  lo bits */
> +        // otherwise exp10(0) won't produce exact 1.0
> +        .align 32
> +        .long 0x2fecc868, 0x2fecc868, 0x2fecc868, 0x2fecc868, 0x2fecc868, 0x2fecc868, 0x2fecc868, 0x2fecc868  /* _sPC0 */
> +        .align 32
> +        .long 0x40135e1b, 0x40135e1b, 0x40135e1b, 0x40135e1b, 0x40135e1b, 0x40135e1b, 0x40135e1b, 0x40135e1b  /* _sPC1 */
> +        .align 32
> +        .long 0x4029a8d2, 0x4029a8d2, 0x4029a8d2, 0x4029a8d2, 0x4029a8d2, 0x4029a8d2, 0x4029a8d2, 0x4029a8d2  /* _sPC2 */
> +        .align 32
> +        .long 0x0000001f, 0x0000001f, 0x0000001f, 0x0000001f, 0x0000001f, 0x0000001f, 0x0000001f, 0x0000001f  /* _iIndexMask =(2^K-1)*/
> +        //common
> +        .align 32
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff   /* _iAbsMask */
> +        .align 32
> +        .long 0x4217b818, 0x4217b818, 0x4217b818, 0x4217b818, 0x4217b818, 0x4217b818, 0x4217b818, 0x4217b818   /* _iDomainRange=-log10(max_denormal=0x007fffff) RZ */
> +        .align 32
> +        .type	__svml_sexp10_data_internal,@object
> +        .size	__svml_sexp10_data_internal,.-__svml_sexp10_data_internal
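
Another aside, not a change request: for anyone reading this table scheme for the
first time, the reduction and polynomial above amount to roughly the following
scalar C (illustrative only; the names and the use of exp2/ldexp are mine, the
kernel instead does the rounding with the 2^23 shifter, keeps log10(2)/32 split
into _sInvLg2_10hi/_sInvLg2_10lo, reads 2^(j/32) straight from _sT, folds in the
tiny _sPC0 term, and sends out-of-range and special inputs through the callout,
which the sketch ignores):

  #include <math.h>

  /* 10^x = 2^(x*log2(10)): split x*log2(10)*32 into an integer part and a
     5-bit table index, then a short polynomial approximates 10^r - 1.  */
  static float
  exp10f_sketch (float x)
  {
    double m = nearbyint ((double) x * 32.0 * 3.321928094887362); /* _sLg2_10 */
    int n = (int) m;
    int j = n & 31;             /* _iIndexMask: selects the _sT entry        */
    int e = n >> 5;             /* remaining bits give the 2^N scaling
                                   (assumes arithmetic >> for negative n)    */
    double r = (double) x - m * 0.3010299956639812 / 32.0;  /* _sInvLg2_10hi+lo */
    double p = r * (2.302585092994046 + r * 2.650949055239199); /* cf. _sPC1/_sPC2 */
    double tj = exp2 (j / 32.0);             /* the _sT table value 2^(j/32) */
    return (float) ldexp (tj + tj * p, e);   /* Tj + Tj*poly, quick mul 2^N  */
  }
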
> diff --git a/sysdeps/x86_64/fpu/svml_d_exp102_core.S b/sysdeps/x86_64/fpu/svml_d_exp102_core.S
> new file mode 100644
> index 0000000000..157fb3b7c0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_exp102_core.S
> @@ -0,0 +1,29 @@
> +/* Function exp10 vectorized with SSE2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN2v_exp10)
> +WRAPPER_IMPL_SSE2 exp10
> +END (_ZGVbN2v_exp10)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN2v_exp10)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_exp104_core.S b/sysdeps/x86_64/fpu/svml_d_exp104_core.S
> new file mode 100644
> index 0000000000..9b9d0a5d4b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_exp104_core.S
> @@ -0,0 +1,29 @@
> +/* Function exp10 vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN4v_exp10)
> +WRAPPER_IMPL_AVX _ZGVbN2v_exp10
> +END (_ZGVdN4v_exp10)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN4v_exp10)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_exp104_core_avx.S b/sysdeps/x86_64/fpu/svml_d_exp104_core_avx.S
> new file mode 100644
> index 0000000000..1ba1a819ed
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_exp104_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function exp10 vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVcN4v_exp10)
> +WRAPPER_IMPL_AVX _ZGVbN2v_exp10
> +END (_ZGVcN4v_exp10)
> diff --git a/sysdeps/x86_64/fpu/svml_d_exp108_core.S b/sysdeps/x86_64/fpu/svml_d_exp108_core.S
> new file mode 100644
> index 0000000000..a530dc12de
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_exp108_core.S
> @@ -0,0 +1,25 @@
> +/* Function exp10 vectorized with AVX-512, wrapper to AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN8v_exp10)
> +WRAPPER_IMPL_AVX512 _ZGVdN4v_exp10
> +END (_ZGVeN8v_exp10)
> diff --git a/sysdeps/x86_64/fpu/svml_s_exp10f16_core.S b/sysdeps/x86_64/fpu/svml_s_exp10f16_core.S
> new file mode 100644
> index 0000000000..e5043bc875
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_exp10f16_core.S
> @@ -0,0 +1,25 @@
> +/* Function exp10f vectorized with AVX-512. Wrapper to AVX2 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN16v_exp10f)
> +WRAPPER_IMPL_AVX512 _ZGVdN8v_exp10f
> +END (_ZGVeN16v_exp10f)
> diff --git a/sysdeps/x86_64/fpu/svml_s_exp10f4_core.S b/sysdeps/x86_64/fpu/svml_s_exp10f4_core.S
> new file mode 100644
> index 0000000000..75e6637a82
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_exp10f4_core.S
> @@ -0,0 +1,29 @@
> +/* Function exp10f vectorized with SSE2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN4v_exp10f)
> +WRAPPER_IMPL_SSE2 exp10f
> +END (_ZGVbN4v_exp10f)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN4v_exp10f)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_exp10f8_core.S b/sysdeps/x86_64/fpu/svml_s_exp10f8_core.S
> new file mode 100644
> index 0000000000..d481d2dee9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_exp10f8_core.S
> @@ -0,0 +1,29 @@
> +/* Function exp10f vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN8v_exp10f)
> +WRAPPER_IMPL_AVX _ZGVbN4v_exp10f
> +END (_ZGVdN8v_exp10f)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN8v_exp10f)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_exp10f8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_exp10f8_core_avx.S
> new file mode 100644
> index 0000000000..65944bd4d2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_exp10f8_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function exp10f vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +        .text
> +ENTRY (_ZGVcN8v_exp10f)
> +WRAPPER_IMPL_AVX _ZGVbN4v_exp10f
> +END (_ZGVcN8v_exp10f)
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx.c
> new file mode 100644
> index 0000000000..7cdda9895b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-exp10.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx2.c
> new file mode 100644
> index 0000000000..7cdda9895b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx2.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-exp10.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx512f.c
> new file mode 100644
> index 0000000000..7cdda9895b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp10-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-exp10.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-exp10.c b/sysdeps/x86_64/fpu/test-double-libmvec-exp10.c
> new file mode 100644
> index 0000000000..b1461ed85e
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-exp10.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE double
> +#define LIBMVEC_FUNC exp10
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> index 2f7172bd7b..256e8f07c9 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> @@ -32,6 +32,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVbN2v_atan)
>  VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVbN2v_asin)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVbN2vv_hypot)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVbN2v_exp2)
> +VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVbN2v_exp10)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> index e2d519faac..9de1dab2c2 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> @@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVdN4v_atan)
>  VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVdN4v_asin)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVdN4vv_hypot)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVdN4v_exp2)
> +VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVdN4v_exp10)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m256i
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> index 1ce4d8b413..43865ab099 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> @@ -32,6 +32,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVcN4v_atan)
>  VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVcN4v_asin)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVcN4vv_hypot)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVcN4v_exp2)
> +VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVcN4v_exp10)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> index 6c87cec648..5dbdacf617 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> @@ -32,6 +32,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atan), _ZGVeN8v_atan)
>  VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVeN8v_asin)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVeN8vv_hypot)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVeN8v_exp2)
> +VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVeN8v_exp10)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m512i
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx.c
> new file mode 100644
> index 0000000000..be3cdaa80d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-exp10f.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx2.c
> new file mode 100644
> index 0000000000..be3cdaa80d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx2.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-exp10f.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx512f.c
> new file mode 100644
> index 0000000000..be3cdaa80d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-exp10f.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-exp10f.c b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f.c
> new file mode 100644
> index 0000000000..06f447eb8d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-exp10f.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE float
> +#define LIBMVEC_FUNC exp10f
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> index 597d7d7598..c159c8f583 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> @@ -32,6 +32,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVeN16v_atanf)
>  VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVeN16v_asinf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVeN16vv_hypotf)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVeN16v_exp2f)
> +VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVeN16v_exp10f)
>  
>  #define VEC_INT_TYPE __m512i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> index 3500eec810..c745ef744a 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> @@ -32,6 +32,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVbN4v_atanf)
>  VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVbN4v_asinf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVbN4vv_hypotf)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVbN4v_exp2f)
> +VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVbN4v_exp10f)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> index 921b9c65d6..c9226cf4dc 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> @@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVdN8v_atanf)
>  VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVdN8v_asinf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVdN8vv_hypotf)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVdN8v_exp2f)
> +VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVdN8v_exp10f)
>  
>  /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
>  #undef VECTOR_WRAPPER_fFF
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> index 6cbcb57521..92970c5ace 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> @@ -32,6 +32,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanf), _ZGVcN8v_atanf)
>  VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVcN8v_asinf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVcN8vv_hypotf)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVcN8v_exp2f)
> +VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVcN8v_exp10f)
>  
>  #define VEC_INT_TYPE __m128i
>  
> -- 
> 2.31.1
> 

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.


H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 14/18] x86-64: Add vector atanh/atanhf implementation to libmvec
  2021-12-29  6:39 ` [PATCH v5 14/18] x86-64: Add vector atanh/atanhf " Sunil K Pandey
@ 2021-12-29 21:26   ` H.J. Lu
  0 siblings, 0 replies; 40+ messages in thread
From: H.J. Lu @ 2021-12-29 21:26 UTC (permalink / raw)
  To: Sunil K Pandey; +Cc: libc-alpha, hjl.tools, andrey.kolesov, marius.cornea

On Tue, Dec 28, 2021 at 10:39:56PM -0800, Sunil K Pandey wrote:
> Implement vectorized atanh/atanhf containing SSE, AVX, AVX2 and
> AVX512 versions for libmvec as per vector ABI.  It also contains
> accuracy and ABI tests for vector atanh/atanhf with regenerated ulps.
> ---
>  bits/libm-simd-decl-stubs.h                   |   11 +
>  math/bits/mathcalls.h                         |    2 +-
>  .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
>  sysdeps/x86/fpu/bits/math-vector.h            |    4 +
>  .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
>  sysdeps/x86_64/fpu/Makeconfig                 |    1 +
>  sysdeps/x86_64/fpu/Versions                   |    2 +
>  sysdeps/x86_64/fpu/libm-test-ulps             |   20 +
>  .../fpu/multiarch/svml_d_atanh2_core-sse2.S   |   20 +
>  .../x86_64/fpu/multiarch/svml_d_atanh2_core.c |   27 +
>  .../fpu/multiarch/svml_d_atanh2_core_sse4.S   | 1519 +++++++++++++++++
>  .../fpu/multiarch/svml_d_atanh4_core-sse.S    |   20 +
>  .../x86_64/fpu/multiarch/svml_d_atanh4_core.c |   27 +
>  .../fpu/multiarch/svml_d_atanh4_core_avx2.S   | 1479 ++++++++++++++++
>  .../fpu/multiarch/svml_d_atanh8_core-avx2.S   |   20 +
>  .../x86_64/fpu/multiarch/svml_d_atanh8_core.c |   27 +
>  .../fpu/multiarch/svml_d_atanh8_core_avx512.S |  401 +++++
>  .../fpu/multiarch/svml_s_atanhf16_core-avx2.S |   20 +
>  .../fpu/multiarch/svml_s_atanhf16_core.c      |   28 +
>  .../multiarch/svml_s_atanhf16_core_avx512.S   |  393 +++++
>  .../fpu/multiarch/svml_s_atanhf4_core-sse2.S  |   20 +
>  .../fpu/multiarch/svml_s_atanhf4_core.c       |   28 +
>  .../fpu/multiarch/svml_s_atanhf4_core_sse4.S  |  361 ++++
>  .../fpu/multiarch/svml_s_atanhf8_core-sse.S   |   20 +
>  .../fpu/multiarch/svml_s_atanhf8_core.c       |   28 +
>  .../fpu/multiarch/svml_s_atanhf8_core_avx2.S  |  335 ++++
>  sysdeps/x86_64/fpu/svml_d_atanh2_core.S       |   29 +
>  sysdeps/x86_64/fpu/svml_d_atanh4_core.S       |   29 +
>  sysdeps/x86_64/fpu/svml_d_atanh4_core_avx.S   |   25 +
>  sysdeps/x86_64/fpu/svml_d_atanh8_core.S       |   25 +
>  sysdeps/x86_64/fpu/svml_s_atanhf16_core.S     |   25 +
>  sysdeps/x86_64/fpu/svml_s_atanhf4_core.S      |   29 +
>  sysdeps/x86_64/fpu/svml_s_atanhf8_core.S      |   29 +
>  sysdeps/x86_64/fpu/svml_s_atanhf8_core_avx.S  |   25 +
>  .../fpu/test-double-libmvec-atanh-avx.c       |    1 +
>  .../fpu/test-double-libmvec-atanh-avx2.c      |    1 +
>  .../fpu/test-double-libmvec-atanh-avx512f.c   |    1 +
>  .../x86_64/fpu/test-double-libmvec-atanh.c    |    3 +
>  .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
>  .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
>  .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
>  .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
>  .../fpu/test-float-libmvec-atanhf-avx.c       |    1 +
>  .../fpu/test-float-libmvec-atanhf-avx2.c      |    1 +
>  .../fpu/test-float-libmvec-atanhf-avx512f.c   |    1 +
>  .../x86_64/fpu/test-float-libmvec-atanhf.c    |    3 +
>  .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
>  .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
>  .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
>  .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
>  50 files changed, 5060 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_atanh2_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_atanh4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_atanh4_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_atanh8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_atanhf16_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_atanhf4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_atanhf8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_atanhf8_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atanh.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atanhf.c
> 
> diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
> index 845246fab9..bb7380a446 100644
> --- a/bits/libm-simd-decl-stubs.h
> +++ b/bits/libm-simd-decl-stubs.h
> @@ -252,4 +252,15 @@
>  #define __DECL_SIMD_log1pf32x
>  #define __DECL_SIMD_log1pf64x
>  #define __DECL_SIMD_log1pf128x
> +
> +#define __DECL_SIMD_atanh
> +#define __DECL_SIMD_atanhf
> +#define __DECL_SIMD_atanhl
> +#define __DECL_SIMD_atanhf16
> +#define __DECL_SIMD_atanhf32
> +#define __DECL_SIMD_atanhf64
> +#define __DECL_SIMD_atanhf128
> +#define __DECL_SIMD_atanhf32x
> +#define __DECL_SIMD_atanhf64x
> +#define __DECL_SIMD_atanhf128x
>  #endif
> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index aa4bc61aa4..04dd9c5d1b 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -86,7 +86,7 @@ __MATHCALL (acosh,, (_Mdouble_ __x));
>  /* Hyperbolic arc sine of X.  */
>  __MATHCALL (asinh,, (_Mdouble_ __x));
>  /* Hyperbolic arc tangent of X.  */
> -__MATHCALL (atanh,, (_Mdouble_ __x));
> +__MATHCALL_VEC (atanh,, (_Mdouble_ __x));
>  #endif
>  
>  /* Exponential and logarithmic functions.  */
> diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> index 68b940606a..2d389912b1 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> @@ -49,6 +49,7 @@ GLIBC_2.22 _ZGVeN8vvv_sincos F
>  GLIBC_2.35 _ZGVbN2v_acos F
>  GLIBC_2.35 _ZGVbN2v_asin F
>  GLIBC_2.35 _ZGVbN2v_atan F
> +GLIBC_2.35 _ZGVbN2v_atanh F
>  GLIBC_2.35 _ZGVbN2v_cbrt F
>  GLIBC_2.35 _ZGVbN2v_cosh F
>  GLIBC_2.35 _ZGVbN2v_exp10 F
> @@ -63,6 +64,7 @@ GLIBC_2.35 _ZGVbN2vv_hypot F
>  GLIBC_2.35 _ZGVbN4v_acosf F
>  GLIBC_2.35 _ZGVbN4v_asinf F
>  GLIBC_2.35 _ZGVbN4v_atanf F
> +GLIBC_2.35 _ZGVbN4v_atanhf F
>  GLIBC_2.35 _ZGVbN4v_cbrtf F
>  GLIBC_2.35 _ZGVbN4v_coshf F
>  GLIBC_2.35 _ZGVbN4v_exp10f F
> @@ -77,6 +79,7 @@ GLIBC_2.35 _ZGVbN4vv_hypotf F
>  GLIBC_2.35 _ZGVcN4v_acos F
>  GLIBC_2.35 _ZGVcN4v_asin F
>  GLIBC_2.35 _ZGVcN4v_atan F
> +GLIBC_2.35 _ZGVcN4v_atanh F
>  GLIBC_2.35 _ZGVcN4v_cbrt F
>  GLIBC_2.35 _ZGVcN4v_cosh F
>  GLIBC_2.35 _ZGVcN4v_exp10 F
> @@ -91,6 +94,7 @@ GLIBC_2.35 _ZGVcN4vv_hypot F
>  GLIBC_2.35 _ZGVcN8v_acosf F
>  GLIBC_2.35 _ZGVcN8v_asinf F
>  GLIBC_2.35 _ZGVcN8v_atanf F
> +GLIBC_2.35 _ZGVcN8v_atanhf F
>  GLIBC_2.35 _ZGVcN8v_cbrtf F
>  GLIBC_2.35 _ZGVcN8v_coshf F
>  GLIBC_2.35 _ZGVcN8v_exp10f F
> @@ -105,6 +109,7 @@ GLIBC_2.35 _ZGVcN8vv_hypotf F
>  GLIBC_2.35 _ZGVdN4v_acos F
>  GLIBC_2.35 _ZGVdN4v_asin F
>  GLIBC_2.35 _ZGVdN4v_atan F
> +GLIBC_2.35 _ZGVdN4v_atanh F
>  GLIBC_2.35 _ZGVdN4v_cbrt F
>  GLIBC_2.35 _ZGVdN4v_cosh F
>  GLIBC_2.35 _ZGVdN4v_exp10 F
> @@ -119,6 +124,7 @@ GLIBC_2.35 _ZGVdN4vv_hypot F
>  GLIBC_2.35 _ZGVdN8v_acosf F
>  GLIBC_2.35 _ZGVdN8v_asinf F
>  GLIBC_2.35 _ZGVdN8v_atanf F
> +GLIBC_2.35 _ZGVdN8v_atanhf F
>  GLIBC_2.35 _ZGVdN8v_cbrtf F
>  GLIBC_2.35 _ZGVdN8v_coshf F
>  GLIBC_2.35 _ZGVdN8v_exp10f F
> @@ -133,6 +139,7 @@ GLIBC_2.35 _ZGVdN8vv_hypotf F
>  GLIBC_2.35 _ZGVeN16v_acosf F
>  GLIBC_2.35 _ZGVeN16v_asinf F
>  GLIBC_2.35 _ZGVeN16v_atanf F
> +GLIBC_2.35 _ZGVeN16v_atanhf F
>  GLIBC_2.35 _ZGVeN16v_cbrtf F
>  GLIBC_2.35 _ZGVeN16v_coshf F
>  GLIBC_2.35 _ZGVeN16v_exp10f F
> @@ -147,6 +154,7 @@ GLIBC_2.35 _ZGVeN16vv_hypotf F
>  GLIBC_2.35 _ZGVeN8v_acos F
>  GLIBC_2.35 _ZGVeN8v_asin F
>  GLIBC_2.35 _ZGVeN8v_atan F
> +GLIBC_2.35 _ZGVeN8v_atanh F
>  GLIBC_2.35 _ZGVeN8v_cbrt F
>  GLIBC_2.35 _ZGVeN8v_cosh F
>  GLIBC_2.35 _ZGVeN8v_exp10 F
> diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
> index 14c9db3bb3..4937b6811f 100644
> --- a/sysdeps/x86/fpu/bits/math-vector.h
> +++ b/sysdeps/x86/fpu/bits/math-vector.h
> @@ -114,6 +114,10 @@
>  #  define __DECL_SIMD_log1p __DECL_SIMD_x86_64
>  #  undef __DECL_SIMD_log1pf
>  #  define __DECL_SIMD_log1pf __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_atanh
> +#  define __DECL_SIMD_atanh __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_atanhf
> +#  define __DECL_SIMD_atanhf __DECL_SIMD_x86_64
>  
>  # endif
>  #endif
> diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> index 3dca196432..da39c08ba9 100644
> --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> @@ -56,6 +56,8 @@
>  !GCC$ builtin (log2f) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (log1p) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (log1pf) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (atanh) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (atanhf) attributes simd (notinbranch) if('x86_64')
>  
>  !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
> @@ -97,3 +99,5 @@
>  !GCC$ builtin (log2f) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (log1p) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (log1pf) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (atanh) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (atanhf) attributes simd (notinbranch) if('x32')
> diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> index 378cb06d37..de87544259 100644
> --- a/sysdeps/x86_64/fpu/Makeconfig
> +++ b/sysdeps/x86_64/fpu/Makeconfig
> @@ -26,6 +26,7 @@ libmvec-funcs = \
>    asin \
>    atan \
>    atan2 \
> +  atanh \
>    cbrt \
>    cos \
>    cosh \
> diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> index 155fb115f3..df0ea83711 100644
> --- a/sysdeps/x86_64/fpu/Versions
> +++ b/sysdeps/x86_64/fpu/Versions
> @@ -17,6 +17,7 @@ libmvec {
>      _ZGVbN2v_acos; _ZGVcN4v_acos; _ZGVdN4v_acos; _ZGVeN8v_acos;
>      _ZGVbN2v_asin; _ZGVcN4v_asin; _ZGVdN4v_asin; _ZGVeN8v_asin;
>      _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan;
> +    _ZGVbN2v_atanh; _ZGVcN4v_atanh; _ZGVdN4v_atanh; _ZGVeN8v_atanh;
>      _ZGVbN2v_cbrt; _ZGVcN4v_cbrt; _ZGVdN4v_cbrt; _ZGVeN8v_cbrt;
>      _ZGVbN2v_cosh; _ZGVcN4v_cosh; _ZGVdN4v_cosh; _ZGVeN8v_cosh;
>      _ZGVbN2v_exp10; _ZGVcN4v_exp10; _ZGVdN4v_exp10; _ZGVeN8v_exp10;
> @@ -31,6 +32,7 @@ libmvec {
>      _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
>      _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
>      _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf;
> +    _ZGVbN4v_atanhf; _ZGVcN8v_atanhf; _ZGVdN8v_atanhf; _ZGVeN16v_atanhf;
>      _ZGVbN4v_cbrtf; _ZGVcN8v_cbrtf; _ZGVdN8v_cbrtf; _ZGVeN16v_cbrtf;
>      _ZGVbN4v_coshf; _ZGVcN8v_coshf; _ZGVdN8v_coshf; _ZGVeN16v_coshf;
>      _ZGVbN4v_exp10f; _ZGVcN8v_exp10f; _ZGVdN8v_exp10f; _ZGVeN16v_exp10f;
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index a2b15a795b..09a46190b6 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -248,6 +248,26 @@ float: 3
>  float128: 4
>  ldouble: 5
>  
> +Function: "atanh_vlen16":
> +float: 1
> +
> +Function: "atanh_vlen2":
> +double: 1
> +
> +Function: "atanh_vlen4":
> +double: 1
> +float: 1
> +
> +Function: "atanh_vlen4_avx2":
> +double: 1
> +
> +Function: "atanh_vlen8":
> +double: 1
> +float: 1
> +
> +Function: "atanh_vlen8_avx2":
> +float: 1
> +
>  Function: "cabs":
>  double: 1
>  float128: 1
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core-sse2.S
> new file mode 100644
> index 0000000000..b154ab8649
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized atanh, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN2v_atanh _ZGVbN2v_atanh_sse2
> +#include "../svml_d_atanh2_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core.c
> new file mode 100644
> index 0000000000..138190e568
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized atanh, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN2v_atanh
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN2v_atanh, __GI__ZGVbN2v_atanh, __redirect__ZGVbN2v_atanh)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core_sse4.S
> new file mode 100644
> index 0000000000..7e70b036f7
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh2_core_sse4.S
> @@ -0,0 +1,1519 @@
> +/* Function atanh vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute atanh(x) as 0.5 * log((1 + x)/(1 - x))
> + *
> + *   Special cases:
> + *
> + *   atanh(0)  = 0
> + *   atanh(+1) = +INF
> + *   atanh(-1) = -INF
> + *   atanh(x)  = NaN if |x| > 1, or if x is a NaN or INF
> + *
> + */
> +
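
Aside for readers, no change needed: the formulation described in the comment
above, and implemented below via V = 2*x, U = 1 - x and the log1p code, is in
scalar terms roughly the sketch that follows (illustrative only, the name is
mine; the vector code additionally short-circuits very small |x| through
TinyRange and sends |x| >= 1, NaN and Inf to the callout, which ends up in the
scalar atanh anyway):

  #include <math.h>

  static double
  atanh_sketch (double x)
  {
    double ax = fabs (x);              /* strip the sign, reapply it last  */
    if (!(ax < 1.0))                   /* rangemask condition, catches NaN */
      return atanh (x);                /* callout path -> atanh@PLT        */
    /* atanh(x) = 0.5*log((1+x)/(1-x)) = 0.5*log1p(2x/(1-x))  */
    double r = 0.5 * log1p (2.0 * ax / (1.0 - ax));
    return copysign (r, x);            /* sign back in, including -0       */
  }
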
> +/* Offsets for data table __svml_datanh_data_internal
> + */
> +#define Log_HA_table                  	0
> +#define Log_LA_table                  	8208
> +#define poly_coeff                    	12320
> +#define ExpMask                       	12384
> +#define Two10                         	12400
> +#define MinLog1p                      	12416
> +#define MaxLog1p                      	12432
> +#define One                           	12448
> +#define SgnMask                       	12464
> +#define XThreshold                    	12480
> +#define XhMask                        	12496
> +#define Threshold                     	12512
> +#define Bias                          	12528
> +#define Bias1                         	12544
> +#define ExpMask0                      	12560
> +#define ExpMask2                      	12576
> +#define L2                            	12592
> +#define dHalf                         	12608
> +#define dSign                         	12624
> +#define dTopMask12                    	12640
> +#define dTopMask41                    	12656
> +#define TinyRange                     	12672
> +
> +/* Lookup bias for data table __svml_datanh_data_internal.  */
> +#define Table_Lookup_Bias               -0x405ff0
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN2v_atanh_sse4)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $64, %rsp
> +        movaps    %xmm0, %xmm12
> +        movups    SgnMask+__svml_datanh_data_internal(%rip), %xmm7
> +        lea       Table_Lookup_Bias+__svml_datanh_data_internal(%rip), %rsi
> +
> +/* Load the constant 1 and a sign mask */
> +        movups    One+__svml_datanh_data_internal(%rip), %xmm11
> +
> +/* Strip off the sign, so treat X as positive until right at the end */
> +        movaps    %xmm7, %xmm14
> +        andps     %xmm12, %xmm14
> +        movaps    %xmm11, %xmm15
> +        subpd     %xmm14, %xmm15
> +        movups    dTopMask41+__svml_datanh_data_internal(%rip), %xmm2
> +        movaps    %xmm11, %xmm5
> +        movaps    %xmm2, %xmm0
> +
> +/*
> + * Compute V = 2 * X trivially, and UHi + U_lo = 1 - X in two pieces,
> + * the upper part UHi being <= 41 bits long. Then we have
> + * atanh(X) = 1/2 * log((1 + X) / (1 - X)) = 1/2 * log1p(V / (UHi + ULo)).
> + */
> +        movaps    %xmm14, %xmm6
> +        andps     %xmm15, %xmm0
> +
> +/*
> + * Check whether |X| < 1, in which case we use the main function.
> + * Otherwise set the rangemask so that the callout will get used.
> + * Note that this will also use the callout for NaNs since not(NaN < 1).
> + */
> +        movaps    %xmm14, %xmm13
> +
> +/*
> + * Now compute R = 1/(UHi+ULo) * (1 - E) and the error term E
> + * The first FMR is exact (we force R to 12 bits just in case it
> + * isn't already, to make absolutely sure), and since E is ~ 2^-12,
> + * the rounding error in the other one is acceptable.
> + */
> +        cvtpd2ps  %xmm0, %xmm1
> +        subpd     %xmm15, %xmm5
> +        addpd     %xmm14, %xmm6
> +        subpd     %xmm0, %xmm15
> +        cmpnltpd  %xmm11, %xmm13
> +        subpd     %xmm14, %xmm5
> +        movmskpd  %xmm13, %edx
> +        movlhps   %xmm1, %xmm1
> +        movaps    %xmm14, %xmm9
> +        rcpps     %xmm1, %xmm4
> +        addpd     %xmm15, %xmm5
> +        cmpltpd   TinyRange+__svml_datanh_data_internal(%rip), %xmm9
> +        cvtps2pd  %xmm4, %xmm14
> +        andps     dTopMask12+__svml_datanh_data_internal(%rip), %xmm14
> +        movaps    %xmm11, %xmm13
> +        mulpd     %xmm14, %xmm0
> +        mulpd     %xmm14, %xmm5
> +        subpd     %xmm0, %xmm13
> +
> +/*
> + * Split V as well into upper 41 bits and lower part, so that we can get
> + * a preliminary quotient estimate without rounding error.
> + */
> +        andps     %xmm6, %xmm2
> +
> +/*
> + * Now we feed into the log1p code, using H in place of _VARG1 and
> + * later incorporating L into the reduced argument.
> + * compute 1+x as high, low parts
> + */
> +        movaps    %xmm11, %xmm0
> +        subpd     %xmm5, %xmm13
> +        subpd     %xmm2, %xmm6
> +
> +/* Hence get initial quotient estimate QHi + QLo = R * VHi + R * VLo */
> +        mulpd     %xmm14, %xmm2
> +        mulpd     %xmm6, %xmm14
> +
> +/*
> + * Compute D = E + E^2 + E^3 + E^4 + E^5
> + * = E + (E + E^2) (E + E * E^2)
> + */
> +        movaps    %xmm13, %xmm6
> +        movaps    %xmm13, %xmm3
> +        mulpd     %xmm13, %xmm6
> +        mulpd     %xmm6, %xmm3
> +        addpd     %xmm13, %xmm6
> +        addpd     %xmm13, %xmm3
> +        mulpd     %xmm3, %xmm6
> +        addpd     %xmm6, %xmm13
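
Aside: with R forced to 12 bits and E = 1 - (UHi + ULo)*R, the exact quotient is
V/U = V*R/(1 - E) = V*R*(1 + E + E^2 + ...).  Since |E| is only a few times 2^-12,
dropping everything beyond E^5 leaves a relative error of about E^6, far below
double precision, and the factorization used here, E + (E + E^2)*(E + E*E^2),
expands to exactly E + E^2 + E^3 + E^4 + E^5 using three multiplies and three
additions.
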
> +
> +/*
> + * Compute R * (VHi + VLo) * (1 + E + E^2 + E^3 + E^4 + E^5)
> + * = R *  (VHi + VLo) * (1 + D)
> + * = QHi + (QHi * D + QLo + QLo * D)
> + */
> +        movaps    %xmm13, %xmm1
> +        movaps    %xmm11, %xmm5
> +        mulpd     %xmm14, %xmm13
> +        mulpd     %xmm2, %xmm1
> +        addpd     %xmm13, %xmm14
> +        addpd     %xmm14, %xmm1
> +
> +/*
> + * Now finally accumulate the high and low parts of the
> + * argument to log1p, H + L, with a final compensated summation.
> + */
> +        addpd     %xmm1, %xmm2
> +        maxpd     %xmm2, %xmm0
> +        minpd     %xmm2, %xmm5
> +        andps     %xmm7, %xmm2
> +        movaps    %xmm0, %xmm4
> +        cmpltpd   XThreshold+__svml_datanh_data_internal(%rip), %xmm2
> +        addpd     %xmm5, %xmm4
> +        orps      XhMask+__svml_datanh_data_internal(%rip), %xmm2
> +        movaps    %xmm12, %xmm10
> +
> +/* preserve mantissa, set input exponent to 2^(-10) */
> +        movups    ExpMask+__svml_datanh_data_internal(%rip), %xmm7
> +        andps     %xmm2, %xmm4
> +        andps     %xmm4, %xmm7
> +
> +/* exponent bits */
> +        movaps    %xmm4, %xmm6
> +        orps      Two10+__svml_datanh_data_internal(%rip), %xmm7
> +        psrlq     $20, %xmm6
> +
> +/* reciprocal approximation good to at least 11 bits */
> +        cvtpd2ps  %xmm7, %xmm1
> +        subpd     %xmm4, %xmm0
> +        mulpd     %xmm12, %xmm10
> +        addpd     %xmm0, %xmm5
> +        addpd     %xmm12, %xmm10
> +        movlhps   %xmm1, %xmm1
> +        rcpps     %xmm1, %xmm15
> +        cvtps2pd  %xmm15, %xmm3
> +
> +/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
> +        movups    .FLT_21(%rip), %xmm1
> +        addpd     %xmm1, %xmm3
> +        subpd     %xmm1, %xmm3
> +
> +/* exponent of X needed to scale Xl */
> +        movdqu    ExpMask0+__svml_datanh_data_internal(%rip), %xmm0
> +
> +/*
> + * prepare table index
> + * table lookup
> + */
> +        movaps    %xmm3, %xmm13
> +
> +/* 2^ (-10-exp(X) ) */
> +        movdqu    ExpMask2+__svml_datanh_data_internal(%rip), %xmm2
> +        pand      %xmm4, %xmm0
> +        psubq     %xmm0, %xmm2
> +
> +/* scale DblRcp */
> +        mulpd     %xmm3, %xmm2
> +
> +/* argument reduction */
> +        mulpd     %xmm2, %xmm4
> +        mulpd     %xmm2, %xmm5
> +        subpd     %xmm11, %xmm4
> +        addpd     %xmm5, %xmm4
> +
> +/* polynomial */
> +        movups    poly_coeff+__svml_datanh_data_internal(%rip), %xmm11
> +        psrlq     $40, %xmm13
> +        mulpd     %xmm4, %xmm11
> +        movd      %xmm13, %eax
> +        pshufd    $221, %xmm6, %xmm7
> +
> +/* exponent*log(2.0) */
> +        movups    Threshold+__svml_datanh_data_internal(%rip), %xmm6
> +        cmpltpd   %xmm3, %xmm6
> +        addpd     poly_coeff+16+__svml_datanh_data_internal(%rip), %xmm11
> +
> +/* biased exponent in DP format */
> +        cvtdq2pd  %xmm7, %xmm1
> +        movaps    %xmm4, %xmm3
> +        mulpd     %xmm4, %xmm3
> +        movups    poly_coeff+32+__svml_datanh_data_internal(%rip), %xmm2
> +        mulpd     %xmm4, %xmm2
> +        mulpd     %xmm3, %xmm11
> +        addpd     poly_coeff+48+__svml_datanh_data_internal(%rip), %xmm2
> +        addpd     %xmm11, %xmm2
> +
> +/* reconstruction */
> +        mulpd     %xmm2, %xmm3
> +        andps     Bias+__svml_datanh_data_internal(%rip), %xmm6
> +        orps      Bias1+__svml_datanh_data_internal(%rip), %xmm6
> +        pshufd    $2, %xmm13, %xmm14
> +        subpd     %xmm6, %xmm1
> +        addpd     %xmm3, %xmm4
> +        movd      %xmm14, %ecx
> +        mulpd     L2+__svml_datanh_data_internal(%rip), %xmm1
> +        movslq    %eax, %rax
> +        movslq    %ecx, %rcx
> +
> +/* Record the sign for eventual reincorporation. */
> +        movups    dSign+__svml_datanh_data_internal(%rip), %xmm8
> +        andps     %xmm12, %xmm8
> +        movsd     (%rsi,%rax), %xmm0
> +
> +/* Or the sign bit in with the tiny result to handle atanh(-0) correctly */
> +        orps      %xmm8, %xmm10
> +        movhpd    (%rsi,%rcx), %xmm0
> +        andps     %xmm9, %xmm10
> +        addpd     %xmm4, %xmm0
> +        addpd     %xmm0, %xmm1
> +
> +/* Finally, halve the result and reincorporate the sign */
> +        movups    dHalf+__svml_datanh_data_internal(%rip), %xmm4
> +        movaps    %xmm9, %xmm0
> +        pxor      %xmm8, %xmm4
> +        mulpd     %xmm1, %xmm4
> +        andnps    %xmm4, %xmm0
> +        orps      %xmm10, %xmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx xmm0 xmm12
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm12, 32(%rsp)
> +        movups    %xmm0, 48(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $2, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 xmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      atanh@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 48(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVbN2v_atanh_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_datanh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(16)) VUINT32 Log_HA_table[(1<<10)+2][2];
> +        __declspec(align(16)) VUINT32 Log_LA_table[(1<<9)+1][2];
> +        __declspec(align(16)) VUINT32 poly_coeff[4][2][2];
> +        __declspec(align(16)) VUINT32 ExpMask[2][2];
> +        __declspec(align(16)) VUINT32 Two10[2][2];
> +        __declspec(align(16)) VUINT32 MinLog1p[2][2];
> +        __declspec(align(16)) VUINT32 MaxLog1p[2][2];
> +        __declspec(align(16)) VUINT32 One[2][2];
> +        __declspec(align(16)) VUINT32 SgnMask[2][2];
> +        __declspec(align(16)) VUINT32 XThreshold[2][2];
> +        __declspec(align(16)) VUINT32 XhMask[2][2];
> +        __declspec(align(16)) VUINT32 Threshold[2][2];
> +        __declspec(align(16)) VUINT32 Bias[2][2];
> +        __declspec(align(16)) VUINT32 Bias1[2][2];
> +        __declspec(align(16)) VUINT32 ExpMask0[2][2];
> +        __declspec(align(16)) VUINT32 ExpMask2[2][2];
> +        __declspec(align(16)) VUINT32 L2[2][2];
> +        __declspec(align(16)) VUINT32 dHalf[2][2];
> +        __declspec(align(16)) VUINT32 dSign[2][2];
> +        __declspec(align(16)) VUINT32 dTopMask12[2][2];
> +        __declspec(align(16)) VUINT32 dTopMask41[2][2];
> +        __declspec(align(16)) VUINT32 TinyRange[2][2];
> +} __svml_datanh_data_internal;
> +#endif
> +__svml_datanh_data_internal:
> +        /* Log_HA_table */
> +        .quad 0xc086232bdd7a8300, 0xbe1ce91eef3fb100
> +        .quad 0xc086232fdc7ad828, 0xbe1cefcffda73b6a
> +        .quad 0xc0862333d97d2ba0, 0xbe1cef406748f1ff
> +        .quad 0xc0862337d48378e0, 0xbe1cef2a9429925a
> +        .quad 0xc086233bcd8fb878, 0xbe1cf138d17ebecb
> +        .quad 0xc086233fc4a3e018, 0xbe1ceff2dbbbb29e
> +        .quad 0xc0862343b9c1e270, 0xbe1cf1a42aae437b
> +        .quad 0xc0862347acebaf68, 0xbe1cef3b152048af
> +        .quad 0xc086234b9e2333f0, 0xbe1cef20e127805e
> +        .quad 0xc086234f8d6a5a30, 0xbe1cf00ad6052cf4
> +        .quad 0xc08623537ac30980, 0xbe1cefc4642ee597
> +        .quad 0xc0862357662f2660, 0xbe1cf1f277d36e16
> +        .quad 0xc086235b4fb092a0, 0xbe1ceed009e8d8e6
> +        .quad 0xc086235f37492d28, 0xbe1cf1e4038cb362
> +        .quad 0xc08623631cfad250, 0xbe1cf0b0873b8557
> +        .quad 0xc086236700c75b98, 0xbe1cf15bb3227c0b
> +        .quad 0xc086236ae2b09fe0, 0xbe1cf151ef8ca9ed
> +        .quad 0xc086236ec2b87358, 0xbe1cefe1dc2cd2ed
> +        .quad 0xc0862372a0e0a780, 0xbe1cf0d1eec5454f
> +        .quad 0xc08623767d2b0b48, 0xbe1ceeefd570bbce
> +        .quad 0xc086237a57996af0, 0xbe1cee99ae91b3a7
> +        .quad 0xc086237e302d9028, 0xbe1cf0412830fbd1
> +        .quad 0xc086238206e94218, 0xbe1ceee898588610
> +        .quad 0xc0862385dbce4548, 0xbe1cee9a1fbcaaea
> +        .quad 0xc0862389aede5bc0, 0xbe1ceed8e7cc1ad6
> +        .quad 0xc086238d801b4500, 0xbe1cf10c8d059da6
> +        .quad 0xc08623914f86be18, 0xbe1ceee6c63a8165
> +        .quad 0xc08623951d228180, 0xbe1cf0c3592d2ff1
> +        .quad 0xc0862398e8f04758, 0xbe1cf0026cc4cb1b
> +        .quad 0xc086239cb2f1c538, 0xbe1cf15d48d8e670
> +        .quad 0xc08623a07b28ae60, 0xbe1cef359363787c
> +        .quad 0xc08623a44196b390, 0xbe1cefdf1ab2e82c
> +        .quad 0xc08623a8063d8338, 0xbe1cefe43c02aa84
> +        .quad 0xc08623abc91ec960, 0xbe1cf044f5ae35b7
> +        .quad 0xc08623af8a3c2fb8, 0xbe1cf0b0b4001e1b
> +        .quad 0xc08623b349975d98, 0xbe1cf1bae76dfbcf
> +        .quad 0xc08623b70731f810, 0xbe1cef0a72e13a62
> +        .quad 0xc08623bac30da1c8, 0xbe1cf184007d2b6b
> +        .quad 0xc08623be7d2bfb40, 0xbe1cf16f4b239e98
> +        .quad 0xc08623c2358ea2a0, 0xbe1cf0976acada87
> +        .quad 0xc08623c5ec3733d0, 0xbe1cf066318a16ff
> +        .quad 0xc08623c9a1274880, 0xbe1ceffaa7148798
> +        .quad 0xc08623cd54607820, 0xbe1cf23ab02e9b6e
> +        .quad 0xc08623d105e45800, 0xbe1cefdfef7d4fde
> +        .quad 0xc08623d4b5b47b20, 0xbe1cf17fece44f2b
> +        .quad 0xc08623d863d27270, 0xbe1cf18f907d0d7c
> +        .quad 0xc08623dc103fccb0, 0xbe1cee61fe072c98
> +        .quad 0xc08623dfbafe1668, 0xbe1cf022dd891e2f
> +        .quad 0xc08623e3640eda20, 0xbe1ceecc1daf4358
> +        .quad 0xc08623e70b73a028, 0xbe1cf0173c4fa380
> +        .quad 0xc08623eab12deec8, 0xbe1cf16a2150c2f4
> +        .quad 0xc08623ee553f4a30, 0xbe1cf1bf980b1f4b
> +        .quad 0xc08623f1f7a93480, 0xbe1cef8b731663c2
> +        .quad 0xc08623f5986d2dc0, 0xbe1cee9a664d7ef4
> +        .quad 0xc08623f9378cb3f0, 0xbe1cf1eda2af6400
> +        .quad 0xc08623fcd5094320, 0xbe1cf1923f9d68d7
> +        .quad 0xc086240070e45548, 0xbe1cf0747cd3e03a
> +        .quad 0xc08624040b1f6260, 0xbe1cf22ee855bd6d
> +        .quad 0xc0862407a3bbe078, 0xbe1cf0d57360c00b
> +        .quad 0xc086240b3abb4398, 0xbe1ceebc815cd575
> +        .quad 0xc086240ed01efdd0, 0xbe1cf03bfb970951
> +        .quad 0xc086241263e87f50, 0xbe1cf16e74768529
> +        .quad 0xc0862415f6193658, 0xbe1cefec64b8becb
> +        .quad 0xc086241986b28f30, 0xbe1cf0838d210baa
> +        .quad 0xc086241d15b5f448, 0xbe1cf0ea86e75b11
> +        .quad 0xc0862420a324ce28, 0xbe1cf1708d11d805
> +        .quad 0xc08624242f008380, 0xbe1ceea988c5a417
> +        .quad 0xc0862427b94a7910, 0xbe1cef166a7bbca5
> +        .quad 0xc086242b420411d0, 0xbe1cf0c9d9e86a38
> +        .quad 0xc086242ec92eaee8, 0xbe1cef0946455411
> +        .quad 0xc08624324ecbaf98, 0xbe1cefea60907739
> +        .quad 0xc0862435d2dc7160, 0xbe1cf1ed0934ce42
> +        .quad 0xc086243955624ff8, 0xbe1cf191ba746c7d
> +        .quad 0xc086243cd65ea548, 0xbe1ceeec78cf2a7e
> +        .quad 0xc086244055d2c968, 0xbe1cef345284c119
> +        .quad 0xc0862443d3c012b8, 0xbe1cf24f77355219
> +        .quad 0xc08624475027d5e8, 0xbe1cf05bf087e114
> +        .quad 0xc086244acb0b65d0, 0xbe1cef3504a32189
> +        .quad 0xc086244e446c1398, 0xbe1ceff54b2a406f
> +        .quad 0xc0862451bc4b2eb8, 0xbe1cf0757d54ed4f
> +        .quad 0xc086245532aa04f0, 0xbe1cf0c8099fdfd5
> +        .quad 0xc0862458a789e250, 0xbe1cf0b173796a31
> +        .quad 0xc086245c1aec1138, 0xbe1cf11d8734540d
> +        .quad 0xc086245f8cd1da60, 0xbe1cf1916a723ceb
> +        .quad 0xc0862462fd3c84d8, 0xbe1cf19a911e1da7
> +        .quad 0xc08624666c2d5608, 0xbe1cf23a9ef72e4f
> +        .quad 0xc0862469d9a591c0, 0xbe1cef503d947663
> +        .quad 0xc086246d45a67a18, 0xbe1cf0fceeb1a0b2
> +        .quad 0xc0862470b0314fa8, 0xbe1cf107e27e4fbc
> +        .quad 0xc086247419475160, 0xbe1cf03dd9922331
> +        .quad 0xc086247780e9bc98, 0xbe1cefce1a10e129
> +        .quad 0xc086247ae719cd18, 0xbe1ceea47f73c4f6
> +        .quad 0xc086247e4bd8bd10, 0xbe1ceec0ac56d100
> +        .quad 0xc0862481af27c528, 0xbe1cee8a6593278a
> +        .quad 0xc086248511081c70, 0xbe1cf2231dd9dec7
> +        .quad 0xc0862488717af888, 0xbe1cf0b4b8ed7da8
> +        .quad 0xc086248bd0818d68, 0xbe1cf1bd8d835002
> +        .quad 0xc086248f2e1d0d98, 0xbe1cf259acc107f4
> +        .quad 0xc08624928a4eaa20, 0xbe1cee897636b00c
> +        .quad 0xc0862495e5179270, 0xbe1cee757f20c326
> +        .quad 0xc08624993e78f490, 0xbe1cefafd3aa54a4
> +        .quad 0xc086249c9673fd10, 0xbe1cee7298d38b97
> +        .quad 0xc086249fed09d6f8, 0xbe1ceedc158d4ceb
> +        .quad 0xc08624a3423babe0, 0xbe1cf2282987cb2e
> +        .quad 0xc08624a6960aa400, 0xbe1cefe7381ecc4b
> +        .quad 0xc08624a9e877e600, 0xbe1cef328dbbce80
> +        .quad 0xc08624ad39849728, 0xbe1cefde45f3cc71
> +        .quad 0xc08624b08931db58, 0xbe1cefa8b89433b9
> +        .quad 0xc08624b3d780d500, 0xbe1cef6773c0b139
> +        .quad 0xc08624b72472a528, 0xbe1cf031c931c11f
> +        .quad 0xc08624ba70086b78, 0xbe1cf088f49275e7
> +        .quad 0xc08624bdba434630, 0xbe1cf17de0eaa86d
> +        .quad 0xc08624c103245238, 0xbe1cefd492f1ba75
> +        .quad 0xc08624c44aacab08, 0xbe1cf1253e154466
> +        .quad 0xc08624c790dd6ad0, 0xbe1cf0fb09ee6d55
> +        .quad 0xc08624cad5b7aa58, 0xbe1cf1f08dd048fe
> +        .quad 0xc08624ce193c8120, 0xbe1ceeca0809697f
> +        .quad 0xc08624d15b6d0538, 0xbe1cef8d5662d968
> +        .quad 0xc08624d49c4a4b78, 0xbe1cee97b556ed78
> +        .quad 0xc08624d7dbd56750, 0xbe1cf1b14b6acb75
> +        .quad 0xc08624db1a0f6b00, 0xbe1cef1e860623f2
> +        .quad 0xc08624de56f96758, 0xbe1ceeaf4d156f3d
> +        .quad 0xc08624e192946bf0, 0xbe1ceecc12b400ed
> +        .quad 0xc08624e4cce18710, 0xbe1cf180c40c794f
> +        .quad 0xc08624e805e1c5c8, 0xbe1cf185a08f7f65
> +        .quad 0xc08624eb3d9633d8, 0xbe1cef45fc924078
> +        .quad 0xc08624ee73ffdbb0, 0xbe1cf1e4f457f32a
> +        .quad 0xc08624f1a91fc6a0, 0xbe1cf040147b8a5a
> +        .quad 0xc08624f4dcf6fc98, 0xbe1cf1effca0dfb2
> +        .quad 0xc08624f80f868468, 0xbe1cf0470146e5bc
> +        .quad 0xc08624fb40cf6390, 0xbe1cef4dd186e501
> +        .quad 0xc08624fe70d29e60, 0xbe1ceebe257f66c7
> +        .quad 0xc08625019f9137f0, 0xbe1ceefb7a1c395c
> +        .quad 0xc0862504cd0c3220, 0xbe1cf209dedfed8c
> +        .quad 0xc0862507f9448db0, 0xbe1cf082da464994
> +        .quad 0xc086250b243b4a18, 0xbe1cee88694a73cf
> +        .quad 0xc086250e4df165a0, 0xbe1cf0b61e8f0531
> +        .quad 0xc08625117667dd78, 0xbe1cf1106599c962
> +        .quad 0xc08625149d9fad98, 0xbe1ceff1ee88af1f
> +        .quad 0xc0862517c399d0c8, 0xbe1cf0f746994ef6
> +        .quad 0xc086251ae85740b8, 0xbe1cefe8a1d077e4
> +        .quad 0xc086251e0bd8f5e0, 0xbe1cf1a1da036092
> +        .quad 0xc08625212e1fe7a8, 0xbe1cf0f8a7786fcd
> +        .quad 0xc08625244f2d0c48, 0xbe1cefa1174a07a7
> +        .quad 0xc08625276f0158d8, 0xbe1cef1043aa5b25
> +        .quad 0xc086252a8d9dc150, 0xbe1cf15d521c169d
> +        .quad 0xc086252dab033898, 0xbe1cf220bba8861f
> +        .quad 0xc0862530c732b078, 0xbe1cef51e310eae2
> +        .quad 0xc0862533e22d1988, 0xbe1cf222fcedd8ae
> +        .quad 0xc0862536fbf36370, 0xbe1cefdb4da4bda8
> +        .quad 0xc086253a14867ca0, 0xbe1ceeafc1112171
> +        .quad 0xc086253d2be75280, 0xbe1cee99dfb4b408
> +        .quad 0xc08625404216d160, 0xbe1cf22d2536f06b
> +        .quad 0xc08625435715e498, 0xbe1cef6abbf2e268
> +        .quad 0xc08625466ae57648, 0xbe1cf093a14789f5
> +        .quad 0xc08625497d866fa0, 0xbe1cf0f93655603c
> +        .quad 0xc086254c8ef9b8b8, 0xbe1cf1cc40c9aafc
> +        .quad 0xc086254f9f4038a8, 0xbe1ceeea5f4e9157
> +        .quad 0xc0862552ae5ad568, 0xbe1cefa9f52d4997
> +        .quad 0xc0862555bc4a7400, 0xbe1cefa490a638ff
> +        .quad 0xc0862558c90ff868, 0xbe1cef7fcf797d6f
> +        .quad 0xc086255bd4ac4590, 0xbe1cf1b4c51113c9
> +        .quad 0xc086255edf203d78, 0xbe1cef55e5b4a55d
> +        .quad 0xc0862561e86cc100, 0xbe1cf0d37a25f9dc
> +        .quad 0xc0862564f092b028, 0xbe1ceebe9efc19d9
> +        .quad 0xc0862567f792e9d8, 0xbe1cee8ad30a57b5
> +        .quad 0xc086256afd6e4c08, 0xbe1cef4e1817b90b
> +        .quad 0xc086256e0225b3b8, 0xbe1cee7fa9229996
> +        .quad 0xc086257105b9fce0, 0xbe1cf0b54963d945
> +        .quad 0xc0862574082c0298, 0xbe1cee5f2f3c7995
> +        .quad 0xc0862577097c9ee0, 0xbe1cf0828e303a2c
> +        .quad 0xc086257a09acaae0, 0xbe1cf172c3078947
> +        .quad 0xc086257d08bcfec0, 0xbe1cf189252afa22
> +        .quad 0xc086258006ae71b8, 0xbe1cefdb80426923
> +        .quad 0xc08625830381da08, 0xbe1ceef1391a0372
> +        .quad 0xc0862585ff380d00, 0xbe1cf17720c78d13
> +        .quad 0xc0862588f9d1df18, 0xbe1ceef1f9027d83
> +        .quad 0xc086258bf35023b8, 0xbe1cf06fac99dec9
> +        .quad 0xc086258eebb3ad78, 0xbe1cf1373eeb45c0
> +        .quad 0xc0862591e2fd4e00, 0xbe1cef777536bb81
> +        .quad 0xc0862594d92dd600, 0xbe1cf0f43ca40766
> +        .quad 0xc0862597ce461558, 0xbe1cefb2cfc6766b
> +        .quad 0xc086259ac246daf0, 0xbe1ceea49e64ffa2
> +        .quad 0xc086259db530f4c8, 0xbe1cf250fa457dec
> +        .quad 0xc08625a0a7053018, 0xbe1cf17d8bb2a44e
> +        .quad 0xc08625a397c45918, 0xbe1cf1d5906d54b7
> +        .quad 0xc08625a6876f3b30, 0xbe1cf08fe7b31780
> +        .quad 0xc08625a97606a0e0, 0xbe1cef13edfc9d11
> +        .quad 0xc08625ac638b53c8, 0xbe1cef9d2b107219
> +        .quad 0xc08625af4ffe1cb0, 0xbe1cf1ddd4ff6160
> +        .quad 0xc08625b23b5fc390, 0xbe1cefa02a996495
> +        .quad 0xc08625b525b10f68, 0xbe1cf166a7e37ee5
> +        .quad 0xc08625b80ef2c680, 0xbe1cef0b171068a5
> +        .quad 0xc08625baf725ae28, 0xbe1cf05c80779283
> +        .quad 0xc08625bdde4a8af0, 0xbe1cf1bbfbffb889
> +        .quad 0xc08625c0c4622090, 0xbe1cf0b8666c0124
> +        .quad 0xc08625c3a96d31e0, 0xbe1cf0a8fcf47a86
> +        .quad 0xc08625c68d6c80f0, 0xbe1cef46e18cb092
> +        .quad 0xc08625c97060cef0, 0xbe1cf1458a350efb
> +        .quad 0xc08625cc524adc58, 0xbe1ceeea1dadce12
> +        .quad 0xc08625cf332b68b0, 0xbe1cf0a1bfdc44c7
> +        .quad 0xc08625d2130332d0, 0xbe1cef96d02da73e
> +        .quad 0xc08625d4f1d2f8a8, 0xbe1cf2451c3c7701
> +        .quad 0xc08625d7cf9b7778, 0xbe1cf10d08f83812
> +        .quad 0xc08625daac5d6ba0, 0xbe1ceec5b4895c5e
> +        .quad 0xc08625dd881990b0, 0xbe1cf14e1325c5e4
> +        .quad 0xc08625e062d0a188, 0xbe1cf21d0904be12
> +        .quad 0xc08625e33c835838, 0xbe1ceed0839bcf21
> +        .quad 0xc08625e615326df0, 0xbe1cf1bb944889d2
> +        .quad 0xc08625e8ecde9b48, 0xbe1cee738e85eece
> +        .quad 0xc08625ebc38897e0, 0xbe1cf25c2bc6ef12
> +        .quad 0xc08625ee99311ac8, 0xbe1cf132b70a41ad
> +        .quad 0xc08625f16dd8da28, 0xbe1cf1984236a6e3
> +        .quad 0xc08625f441808b78, 0xbe1cf19ae74998f9
> +        .quad 0xc08625f71428e370, 0xbe1cef3e175d61a1
> +        .quad 0xc08625f9e5d295f8, 0xbe1cf101f9868fd9
> +        .quad 0xc08625fcb67e5658, 0xbe1cee69db83dcd2
> +        .quad 0xc08625ff862cd6f8, 0xbe1cf081b636af51
> +        .quad 0xc086260254dec9a8, 0xbe1cee62c7d59b3e
> +        .quad 0xc08626052294df58, 0xbe1cf1b745c57716
> +        .quad 0xc0862607ef4fc868, 0xbe1cef3d2800ea23
> +        .quad 0xc086260abb103458, 0xbe1cef480ff1acd2
> +        .quad 0xc086260d85d6d200, 0xbe1cf2424c9a17ef
> +        .quad 0xc08626104fa44f90, 0xbe1cf12cfde90fd5
> +        .quad 0xc086261318795a68, 0xbe1cf21f590dd5b6
> +        .quad 0xc0862615e0569f48, 0xbe1cf0c50f9cd28a
> +        .quad 0xc0862618a73cca30, 0xbe1ceedbdb520545
> +        .quad 0xc086261b6d2c8668, 0xbe1cf0b030396011
> +        .quad 0xc086261e32267e98, 0xbe1cf19917010e96
> +        .quad 0xc0862620f62b5cb0, 0xbe1cf07331355985
> +        .quad 0xc0862623b93bc9e8, 0xbe1cf01ae921a1c3
> +        .quad 0xc08626267b586ed0, 0xbe1cefe5cf0dbf0c
> +        .quad 0xc08626293c81f348, 0xbe1cf01b258aeb50
> +        .quad 0xc086262bfcb8fe88, 0xbe1cee6b9e7f4c68
> +        .quad 0xc086262ebbfe3710, 0xbe1cee684a9b21c9
> +        .quad 0xc08626317a5242b8, 0xbe1cf1f8bcde9a8b
> +        .quad 0xc086263437b5c6c0, 0xbe1cf1d063d36238
> +        .quad 0xc0862636f42967a8, 0xbe1cf1e31a19075e
> +        .quad 0xc0862639afadc950, 0xbe1cf1d8efdf7e7d
> +        .quad 0xc086263c6a438ef0, 0xbe1cf1812ee72dba
> +        .quad 0xc086263f23eb5b18, 0xbe1cf1449a9a2279
> +        .quad 0xc0862641dca5cfb8, 0xbe1cee96edce5085
> +        .quad 0xc086264494738e08, 0xbe1cf06797bd03b2
> +        .quad 0xc08626474b5536b8, 0xbe1cef91b9b7ffc1
> +        .quad 0xc086264a014b69c0, 0xbe1cef4b6721278f
> +        .quad 0xc086264cb656c678, 0xbe1cf1942925eb4a
> +        .quad 0xc086264f6a77eba8, 0xbe1cefa2c7bc2e39
> +        .quad 0xc08626521daf7758, 0xbe1cf252595aceb3
> +        .quad 0xc0862654cffe0718, 0xbe1cee8e9ae47ec2
> +        .quad 0xc0862657816437a8, 0xbe1cf1bf913828fa
> +        .quad 0xc086265a31e2a558, 0xbe1cf23475d6b366
> +        .quad 0xc086265ce179ebc8, 0xbe1cef8df00a922b
> +        .quad 0xc086265f902aa5f0, 0xbe1cef279bfa43e0
> +        .quad 0xc08626623df56e38, 0xbe1cf080e10b8365
> +        .quad 0xc0862664eadade70, 0xbe1cf1a518f9b544
> +        .quad 0xc086266796db8fd0, 0xbe1cef9308fed9e9
> +        .quad 0xc086266a41f81ae8, 0xbe1ceea3ae6b19c9
> +        .quad 0xc086266cec3117b8, 0xbe1ceef06003d4c2
> +        .quad 0xc086266f95871da8, 0xbe1cf0b8457ffb0c
> +        .quad 0xc08626723dfac390, 0xbe1cf0c526745ad6
> +        .quad 0xc0862674e58c9fa8, 0xbe1cf0cf91ff7b5d
> +        .quad 0xc08626778c3d4798, 0xbe1cefe260819380
> +        .quad 0xc086267a320d5070, 0xbe1ceebd90aa27a3
> +        .quad 0xc086267cd6fd4ea8, 0xbe1cf0388121dffa
> +        .quad 0xc086267f7b0dd630, 0xbe1cf1a3881435f1
> +        .quad 0xc08626821e3f7a68, 0xbe1cef28e9d9ac52
> +        .quad 0xc0862684c092ce08, 0xbe1cf02d300062dd
> +        .quad 0xc086268762086350, 0xbe1cefaee1edfa35
> +        .quad 0xc086268a02a0cbe0, 0xbe1cf0a5a052e936
> +        .quad 0xc086268ca25c98d8, 0xbe1cee60a4a497ed
> +        .quad 0xc086268f413c5ab0, 0xbe1cf0e4a5d0cf49
> +        .quad 0xc0862691df40a170, 0xbe1cf149235a4e6e
> +        .quad 0xc08626947c69fc80, 0xbe1cf215180b9fcc
> +        .quad 0xc086269718b8fac8, 0xbe1cef9b156a9840
> +        .quad 0xc0862699b42e2a90, 0xbe1cf054c91441be
> +        .quad 0xc086269c4eca19a8, 0xbe1cf13ded26512c
> +        .quad 0xc086269ee88d5550, 0xbe1cf22ea4d8ac06
> +        .quad 0xc08626a181786a40, 0xbe1cf2354666ee2e
> +        .quad 0xc08626a4198be4a8, 0xbe1cefef936752b3
> +        .quad 0xc08626a6b0c85020, 0xbe1cf1e360a9db68
> +        .quad 0xc08626a9472e37d8, 0xbe1ceed6aeb812c5
> +        .quad 0xc08626abdcbe2650, 0xbe1cf227340b4986
> +        .quad 0xc08626ae7178a5b0, 0xbe1cf0215a0cbe0d
> +        .quad 0xc08626b1055e3f70, 0xbe1cf256adf0ae26
> +        .quad 0xc08626b3986f7ca8, 0xbe1ceff3c67aed06
> +        .quad 0xc08626b62aace5c8, 0xbe1cf2159fb93652
> +        .quad 0xc08626b8bc1702e0, 0xbe1cf01e6dbd1c7f
> +        .quad 0xc08626bb4cae5b60, 0xbe1cf009e75d1c0c
> +        .quad 0xc08626bddc737648, 0xbe1ceec10a020e73
> +        .quad 0xc08626c06b66da08, 0xbe1cf06d5783eee7
> +        .quad 0xc08626c2f9890ca0, 0xbe1cf0cb8f169ffe
> +        .quad 0xc08626c586da9388, 0xbe1cef7de2452430
> +        .quad 0xc08626c8135bf3b0, 0xbe1cf05da6f783ae
> +        .quad 0xc08626ca9f0db198, 0xbe1cefcc877d681d
> +        .quad 0xc08626cd29f05138, 0xbe1cef0531954ab3
> +        .quad 0xc08626cfb4045608, 0xbe1cf06b8565ea3d
> +        .quad 0xc08626d23d4a4310, 0xbe1cefdc455d9d7e
> +        .quad 0xc08626d4c5c29ad0, 0xbe1ceefc47e8fa64
> +        .quad 0xc08626d74d6ddf48, 0xbe1cf1872bf033f2
> +        .quad 0xc08626d9d44c9210, 0xbe1cf19d91087f9d
> +        .quad 0xc08626dc5a5f3438, 0xbe1cf012d444c6ab
> +        .quad 0xc08626dedfa64650, 0xbe1cf0ba528ee153
> +        .quad 0xc08626e164224880, 0xbe1ceeb431709788
> +        .quad 0xc08626e3e7d3ba60, 0xbe1cf0b9af31a6a5
> +        .quad 0xc08626e66abb1b28, 0xbe1cf168fb2e135b
> +        .quad 0xc08626e8ecd8e990, 0xbe1cef9097461c93
> +        .quad 0xc08626eb6e2da3d0, 0xbe1cee7a434735d8
> +        .quad 0xc08626edeeb9c7a8, 0xbe1cf235732b86f2
> +        .quad 0xc08626f06e7dd280, 0xbe1cefe1510b89e6
> +        .quad 0xc08626f2ed7a4120, 0xbe1cf1f64b9b80ef
> +        .quad 0xc08626f56baf9000, 0xbe1cf08f320ca339
> +        .quad 0xc08626f7e91e3b08, 0xbe1cf1b1de2808a1
> +        .quad 0xc08626fa65c6bdc0, 0xbe1cf1976d778b28
> +        .quad 0xc08626fce1a99338, 0xbe1ceef40a4f076f
> +        .quad 0xc08626ff5cc73600, 0xbe1cef3e45869ce3
> +        .quad 0xc0862701d7202048, 0xbe1ceef601b4c9d6
> +        .quad 0xc086270450b4cbc0, 0xbe1cf1eaf0b57fd6
> +        .quad 0xc0862706c985b1c0, 0xbe1cef82a44990f3
> +        .quad 0xc086270941934b10, 0xbe1ceefe32981f2c
> +        .quad 0xc086270bb8de1018, 0xbe1cefbf6f5a0445
> +        .quad 0xc086270e2f6678d0, 0xbe1cf18dba75792c
> +        .quad 0xc0862710a52cfcc8, 0xbe1cf0da64ce995f
> +        .quad 0xc08627131a321318, 0xbe1cef04ac0fb802
> +        .quad 0xc08627158e763268, 0xbe1cee9d4e2ad9bd
> +        .quad 0xc086271801f9d0f8, 0xbe1cefa9b55407b5
> +        .quad 0xc086271a74bd64a0, 0xbe1cefe6bd329570
> +        .quad 0xc086271ce6c162c8, 0xbe1cef0b1205dc85
> +        .quad 0xc086271f58064068, 0xbe1cef092a785e3f
> +        .quad 0xc0862721c88c7210, 0xbe1cf050dcdaac30
> +        .quad 0xc086272438546be8, 0xbe1cf210907ded8b
> +        .quad 0xc0862726a75ea1b8, 0xbe1cee760be44f99
> +        .quad 0xc086272915ab86c0, 0xbe1ceeeee07c2bcc
> +        .quad 0xc086272b833b8df0, 0xbe1cf06874992df5
> +        .quad 0xc086272df00f29d0, 0xbe1cef8fac5d4899
> +        .quad 0xc08627305c26cc70, 0xbe1cf1103241cc99
> +        .quad 0xc0862732c782e788, 0xbe1cf1d35fef83fe
> +        .quad 0xc08627353223ec68, 0xbe1cef3ec8133e1d
> +        .quad 0xc08627379c0a4be8, 0xbe1cef7261daccd8
> +        .quad 0xc086273a05367688, 0xbe1cf18656c50806
> +        .quad 0xc086273c6da8dc68, 0xbe1cf1c8736e049a
> +        .quad 0xc086273ed561ed38, 0xbe1cf1f93bff4911
> +        .quad 0xc08627413c621848, 0xbe1cf188a4ea680c
> +        .quad 0xc0862743a2a9cc80, 0xbe1cf1d270930c80
> +        .quad 0xc086274608397868, 0xbe1cf25a328c28e2
> +        .quad 0xc08627486d118a28, 0xbe1cf106f90aa3b8
> +        .quad 0xc086274ad1326f80, 0xbe1cee5e9d2e885a
> +        .quad 0xc086274d349c95c0, 0xbe1cf1c0bac27228
> +        .quad 0xc086274f975069f8, 0xbe1cf1a1500f9b1c
> +        .quad 0xc0862751f94e58c0, 0xbe1cefc30663ac44
> +        .quad 0xc08627545a96ce48, 0xbe1cf17123e427a2
> +        .quad 0xc0862756bb2a3678, 0xbe1cefb92749fea4
> +        .quad 0xc08627591b08fcc0, 0xbe1cefa40e1ea74a
> +        .quad 0xc086275b7a338c40, 0xbe1cee6f4612c3e9
> +        .quad 0xc086275dd8aa4fa8, 0xbe1cf1c54a053627
> +        .quad 0xc0862760366db168, 0xbe1ceff5eb503d9e
> +        .quad 0xc0862762937e1b70, 0xbe1cf02e47f10cee
> +        .quad 0xc0862764efdbf768, 0xbe1ceeb06e1d0dad
> +        .quad 0xc08627674b87ae88, 0xbe1cf10aadd6dba5
> +        .quad 0xc0862769a681a9c0, 0xbe1cf24e9913d30f
> +        .quad 0xc086276c00ca51a0, 0xbe1cef47b301e312
> +        .quad 0xc086276e5a620e48, 0xbe1ceeb1cefc2e85
> +        .quad 0xc0862770b3494788, 0xbe1cf16f1fbbe011
> +        .quad 0xc08627730b8064e8, 0xbe1ceebdf75174c7
> +        .quad 0xc08627756307cd70, 0xbe1cf06e3871a0da
> +        .quad 0xc0862777b9dfe7f0, 0xbe1cef16799fd554
> +        .quad 0xc086277a10091ac0, 0xbe1cf248dabf5377
> +        .quad 0xc086277c6583cc00, 0xbe1cf0c78d92a2cd
> +        .quad 0xc086277eba506158, 0xbe1cf0b911b029f0
> +        .quad 0xc08627810e6f4028, 0xbe1cefdc24719766
> +        .quad 0xc086278361e0cd70, 0xbe1cefbb6562b7e7
> +        .quad 0xc0862785b4a56dd8, 0xbe1cf1e0afb349ec
> +        .quad 0xc086278806bd85c0, 0xbe1cf008292e52fc
> +        .quad 0xc086278a58297918, 0xbe1cf053073872bf
> +        .quad 0xc086278ca8e9ab88, 0xbe1cf17a0a55a947
> +        .quad 0xc086278ef8fe8068, 0xbe1ceeffb0b60234
> +        .quad 0xc086279148685aa0, 0xbe1cf162204794a8
> +        .quad 0xc086279397279ce0, 0xbe1cf24cc8cb48ac
> +        .quad 0xc0862795e53ca978, 0xbe1cf0c9be68d5c3
> +        .quad 0xc086279832a7e258, 0xbe1cf172cd3d7388
> +        .quad 0xc086279a7f69a930, 0xbe1ceea2465fbce5
> +        .quad 0xc086279ccb825f40, 0xbe1cf0a386d2500f
> +        .quad 0xc086279f16f26590, 0xbe1cf1e338ddc18a
> +        .quad 0xc08627a161ba1cd0, 0xbe1cef1f5049867f
> +        .quad 0xc08627a3abd9e548, 0xbe1cef96c1ea8b1f
> +        .quad 0xc08627a5f5521f00, 0xbe1cf138f6fd3c26
> +        .quad 0xc08627a83e2329b0, 0xbe1cf0d4fcbfdf3a
> +        .quad 0xc08627aa864d64b0, 0xbe1cf24870c12c81
> +        .quad 0xc08627accdd12f18, 0xbe1cf0ae2a56348d
> +        .quad 0xc08627af14aee7a0, 0xbe1cee8ca1a9b893
> +        .quad 0xc08627b15ae6eca8, 0xbe1cf20414d637b0
> +        .quad 0xc08627b3a0799c60, 0xbe1cf0fc6b7b12d8
> +        .quad 0xc08627b5e5675488, 0xbe1cf152d93c4a00
> +        .quad 0xc08627b829b072a0, 0xbe1cf1073f9b77c2
> +        .quad 0xc08627ba6d5553d8, 0xbe1cee694f97d5a4
> +        .quad 0xc08627bcb0565500, 0xbe1cf0456b8239d7
> +        .quad 0xc08627bef2b3d2b0, 0xbe1cf211497127e3
> +        .quad 0xc08627c1346e2930, 0xbe1cf01856c0384d
> +        .quad 0xc08627c37585b468, 0xbe1cefa7dd05479e
> +        .quad 0xc08627c5b5fad000, 0xbe1cef3ae8e50b93
> +        .quad 0xc08627c7f5cdd750, 0xbe1ceea5f32fdd3a
> +        .quad 0xc08627ca34ff2560, 0xbe1cef424caeb8d9
> +        .quad 0xc08627cc738f14f0, 0xbe1cf0194d07a81f
> +        .quad 0xc08627ceb17e0070, 0xbe1cf20f452000c1
> +        .quad 0xc08627d0eecc4210, 0xbe1cf00e356218e4
> +        .quad 0xc08627d32b7a33a0, 0xbe1cef30484b4bcb
> +        .quad 0xc08627d567882eb0, 0xbe1ceeea11a6641b
> +        .quad 0xc08627d7a2f68c80, 0xbe1cf13492d5bd7b
> +        .quad 0xc08627d9ddc5a618, 0xbe1ceeb7048fad96
> +        .quad 0xc08627dc17f5d418, 0xbe1ceef0666f0477
> +        .quad 0xc08627de51876ee8, 0xbe1cf060d4b8b5c2
> +        .quad 0xc08627e08a7acea8, 0xbe1cf0b2a4b6ff8c
> +        .quad 0xc08627e2c2d04b28, 0xbe1cf0e34809a875
> +        .quad 0xc08627e4fa883bf0, 0xbe1cf16bf74a3522
> +        .quad 0xc08627e731a2f848, 0xbe1cee6a24623d57
> +        .quad 0xc08627e96820d718, 0xbe1cefc7b4f1528e
> +        .quad 0xc08627eb9e022f18, 0xbe1cf163051f3548
> +        .quad 0xc08627edd34756b8, 0xbe1cef36b3366305
> +        .quad 0xc08627f007f0a408, 0xbe1cf18134625550
> +        .quad 0xc08627f23bfe6cf0, 0xbe1cf0ec32ec1a11
> +        .quad 0xc08627f46f710700, 0xbe1ceeb3b64f3edc
> +        .quad 0xc08627f6a248c778, 0xbe1cf0cd15805bc8
> +        .quad 0xc08627f8d4860368, 0xbe1cf20db3bddebe
> +        .quad 0xc08627fb06290f90, 0xbe1cf25188430e25
> +        .quad 0xc08627fd37324070, 0xbe1ceea1713490f9
> +        .quad 0xc08627ff67a1ea28, 0xbe1cf159521d234c
> +        .quad 0xc0862801977860b8, 0xbe1cf24dfe50783b
> +        .quad 0xc0862803c6b5f7d0, 0xbe1ceef2ef89a60b
> +        .quad 0xc0862805f55b02c8, 0xbe1cee7fc919d62c
> +        .quad 0xc08628082367d4c0, 0xbe1cf215a7fb513a
> +        .quad 0xc086280a50dcc0a8, 0xbe1cf0e4401c5ed4
> +        .quad 0xc086280c7dba1910, 0xbe1cf04ec734d256
> +        .quad 0xc086280eaa003050, 0xbe1cf010ad787fea
> +        .quad 0xc0862810d5af5880, 0xbe1cee622478393d
> +        .quad 0xc086281300c7e368, 0xbe1cf01c7482564f
> +        .quad 0xc08628152b4a22a0, 0xbe1cf0de20d33536
> +        .quad 0xc086281755366778, 0xbe1cef2edae5837d
> +        .quad 0xc08628197e8d02f0, 0xbe1cf0a345318cc9
> +        .quad 0xc086281ba74e45d8, 0xbe1cf20085aa34b8
> +        .quad 0xc086281dcf7a80c0, 0xbe1cef5fa845ad83
> +        .quad 0xc086281ff71203e0, 0xbe1cf050d1df69c4
> +        .quad 0xc08628221e151f48, 0xbe1ceffe43c035b9
> +        .quad 0xc0862824448422b8, 0xbe1cf14f3018d3c2
> +        .quad 0xc08628266a5f5dc0, 0xbe1cef0a5fbae83d
> +        .quad 0xc08628288fa71f98, 0xbe1ceff8a95b72a1
> +        .quad 0xc086282ab45bb750, 0xbe1cef073aa9849b
> +        .quad 0xc086282cd87d73a8, 0xbe1cef69b3835c02
> +        .quad 0xc086282efc0ca328, 0xbe1cf0bc139379a9
> +        .quad 0xc08628311f099420, 0xbe1cef247a9ec596
> +        .quad 0xc086283341749490, 0xbe1cef74bbcc488a
> +        .quad 0xc0862835634df248, 0xbe1cef4bc42e7b8e
> +        .quad 0xc08628378495fad0, 0xbe1cf136d4d5a810
> +        .quad 0xc0862839a54cfb80, 0xbe1cf0d290b24dd8
> +        .quad 0xc086283bc5734168, 0xbe1ceeebde8e0065
> +        .quad 0xc086283de5091950, 0xbe1cf1a09f60aa1e
> +        .quad 0xc0862840040ecfe0, 0xbe1cf0803947a234
> +        .quad 0xc08628422284b168, 0xbe1cf0abf7638127
> +        .quad 0xc0862844406b0a08, 0xbe1cf0f73ee12058
> +        .quad 0xc08628465dc225a0, 0xbe1cf2079971b26c
> +        .quad 0xc08628487a8a4fe0, 0xbe1cee74957564b1
> +        .quad 0xc086284a96c3d420, 0xbe1ceee77c1b7d43
> +        .quad 0xc086284cb26efd90, 0xbe1cf23addba6e09
> +        .quad 0xc086284ecd8c1730, 0xbe1cf199f4a1da60
> +        .quad 0xc0862850e81b6bb0, 0xbe1cf09fdea81393
> +        .quad 0xc0862853021d4588, 0xbe1cf176adb417f7
> +        .quad 0xc08628551b91ef00, 0xbe1cf0f64f84a8da
> +        .quad 0xc08628573479b220, 0xbe1ceec34cf49523
> +        .quad 0xc08628594cd4d8a8, 0xbe1cf16d60fbe0bb
> +        .quad 0xc086285b64a3ac40, 0xbe1cee8de7acfc7b
> +        .quad 0xc086285d7be67630, 0xbe1ceee6256cce8d
> +        .quad 0xc086285f929d7fa0, 0xbe1cee7d66a3d8a5
> +        .quad 0xc0862861a8c91170, 0xbe1cf0bef8265792
> +        .quad 0xc0862863be697458, 0xbe1cf097f890c6f8
> +        .quad 0xc0862865d37ef0c8, 0xbe1cf09502d5c3fc
> +        .quad 0xc0862867e809cf00, 0xbe1ceeffb239dac7
> +        .quad 0xc0862869fc0a56f8, 0xbe1cf1fbfff95c98
> +        .quad 0xc086286c0f80d090, 0xbe1cefa57ad3eef7
> +        .quad 0xc086286e226d8348, 0xbe1cf22c58b9183d
> +        .quad 0xc086287034d0b690, 0xbe1ceff262d0a248
> +        .quad 0xc086287246aab180, 0xbe1cefa7bc194186
> +        .quad 0xc086287457fbbb08, 0xbe1cf06782d784d9
> +        .quad 0xc086287668c419e0, 0xbe1cf1d44d0eaa07
> +        .quad 0xc086287879041490, 0xbe1cf034803c8a48
> +        .quad 0xc086287a88bbf158, 0xbe1cf08e84916b6f
> +        .quad 0xc086287c97ebf650, 0xbe1cf0c4d3dc1bc7
> +        .quad 0xc086287ea6946958, 0xbe1cefb1e4625943
> +        .quad 0xc0862880b4b59010, 0xbe1cf143efdd1fd0
> +        .quad 0xc0862882c24faff8, 0xbe1cee9896d016da
> +        .quad 0xc0862884cf630e38, 0xbe1cf2186072f2cc
> +        .quad 0xc0862886dbefeff0, 0xbe1cef9217633d34
> +        .quad 0xc0862888e7f699e0, 0xbe1cf05603549486
> +        .quad 0xc086288af37750b0, 0xbe1cef50fff513d3
> +        .quad 0xc086288cfe7258c0, 0xbe1cf127713b32d0
> +        .quad 0xc086288f08e7f650, 0xbe1cf05015520f3d
> +        .quad 0xc086289112d86d58, 0xbe1cf12eb458b26f
> +        .quad 0xc08628931c4401a8, 0xbe1cf22eae2887ed
> +        .quad 0xc0862895252af6e0, 0xbe1cefdd6656dd2d
> +        .quad 0xc08628972d8d9058, 0xbe1cf1048ea4e646
> +        .quad 0xc0862899356c1150, 0xbe1ceec4501167e9
> +        .quad 0xc086289b3cc6bcb8, 0xbe1cf0ad52becc3f
> +        .quad 0xc086289d439dd568, 0xbe1cf0daa4e00e35
> +        .quad 0xc086289f49f19df8, 0xbe1cf00b80de8d6a
> +        .quad 0xc08628a14fc258c8, 0xbe1cf1bcf2ea8464
> +        .quad 0xc08628a355104818, 0xbe1cf0435e2782b0
> +        .quad 0xc08628a559dbade0, 0xbe1cf0e3e1a5f56c
> +        .quad 0xc08628a75e24cbf8, 0xbe1cefed9d5a721d
> +        .quad 0xc08628a961ebe3f8, 0xbe1cf0d2d74321e2
> +        .quad 0xc08628ab65313750, 0xbe1cf24200eb55e9
> +        .quad 0xc08628ad67f50740, 0xbe1cf23e9d7cf979
> +        .quad 0xc08628af6a3794d0, 0xbe1cf23a088f421c
> +        .quad 0xc08628b16bf920e0, 0xbe1cef2c1de1ab32
> +        .quad 0xc08628b36d39ec08, 0xbe1cf1abc231f7b2
> +        .quad 0xc08628b56dfa36d0, 0xbe1cf2074d5ba303
> +        .quad 0xc08628b76e3a4180, 0xbe1cf05cd5eed880
> +        /*== Log_LA_table ==*/
> +        .align 16
> +        .quad 0x8000000000000000
> +        .quad 0xbf5ff802a9ab10e6
> +        .quad 0xbf6ff00aa2b10bc0
> +        .quad 0xbf77ee11ebd82e94
> +        .quad 0xbf7fe02a6b106789
> +        .quad 0xbf83e7295d25a7d9
> +        .quad 0xbf87dc475f810a77
> +        .quad 0xbf8bcf712c74384c
> +        .quad 0xbf8fc0a8b0fc03e4
> +        .quad 0xbf91d7f7eb9eebe7
> +        .quad 0xbf93cea44346a575
> +        .quad 0xbf95c45a51b8d389
> +        .quad 0xbf97b91b07d5b11b
> +        .quad 0xbf99ace7551cc514
> +        .quad 0xbf9b9fc027af9198
> +        .quad 0xbf9d91a66c543cc4
> +        .quad 0xbf9f829b0e783300
> +        .quad 0xbfa0b94f7c196176
> +        .quad 0xbfa1b0d98923d980
> +        .quad 0xbfa2a7ec2214e873
> +        .quad 0xbfa39e87b9febd60
> +        .quad 0xbfa494acc34d911c
> +        .quad 0xbfa58a5bafc8e4d5
> +        .quad 0xbfa67f94f094bd98
> +        .quad 0xbfa77458f632dcfc
> +        .quad 0xbfa868a83083f6cf
> +        .quad 0xbfa95c830ec8e3eb
> +        .quad 0xbfaa4fe9ffa3d235
> +        .quad 0xbfab42dd711971bf
> +        .quad 0xbfac355dd0921f2d
> +        .quad 0xbfad276b8adb0b52
> +        .quad 0xbfae19070c276016
> +        .quad 0xbfaf0a30c01162a6
> +        .quad 0xbfaffae9119b9303
> +        .quad 0xbfb075983598e471
> +        .quad 0xbfb0ed839b5526fe
> +        .quad 0xbfb16536eea37ae1
> +        .quad 0xbfb1dcb263db1944
> +        .quad 0xbfb253f62f0a1417
> +        .quad 0xbfb2cb0283f5de1f
> +        .quad 0xbfb341d7961bd1d1
> +        .quad 0xbfb3b87598b1b6ee
> +        .quad 0xbfb42edcbea646f0
> +        .quad 0xbfb4a50d3aa1b040
> +        .quad 0xbfb51b073f06183f
> +        .quad 0xbfb590cafdf01c28
> +        .quad 0xbfb60658a93750c4
> +        .quad 0xbfb67bb0726ec0fc
> +        .quad 0xbfb6f0d28ae56b4c
> +        .quad 0xbfb765bf23a6be13
> +        .quad 0xbfb7da766d7b12cd
> +        .quad 0xbfb84ef898e8282a
> +        .quad 0xbfb8c345d6319b21
> +        .quad 0xbfb9375e55595ede
> +        .quad 0xbfb9ab42462033ad
> +        .quad 0xbfba1ef1d8061cd4
> +        .quad 0xbfba926d3a4ad563
> +        .quad 0xbfbb05b49bee43fe
> +        .quad 0xbfbb78c82bb0eda1
> +        .quad 0xbfbbeba818146765
> +        .quad 0xbfbc5e548f5bc743
> +        .quad 0xbfbcd0cdbf8c13e1
> +        .quad 0xbfbd4313d66cb35d
> +        .quad 0xbfbdb5270187d927
> +        .quad 0xbfbe27076e2af2e6
> +        .quad 0xbfbe98b549671467
> +        .quad 0xbfbf0a30c01162a6
> +        .quad 0xbfbf7b79fec37ddf
> +        .quad 0xbfbfec9131dbeabb
> +        .quad 0xbfc02ebb42bf3d4b
> +        .quad 0xbfc0671512ca596e
> +        .quad 0xbfc09f561ee719c3
> +        .quad 0xbfc0d77e7cd08e59
> +        .quad 0xbfc10f8e422539b1
> +        .quad 0xbfc14785846742ac
> +        .quad 0xbfc17f6458fca611
> +        .quad 0xbfc1b72ad52f67a0
> +        .quad 0xbfc1eed90e2dc2c3
> +        .quad 0xbfc2266f190a5acb
> +        .quad 0xbfc25ded0abc6ad2
> +        .quad 0xbfc29552f81ff523
> +        .quad 0xbfc2cca0f5f5f251
> +        .quad 0xbfc303d718e47fd3
> +        .quad 0xbfc33af575770e4f
> +        .quad 0xbfc371fc201e8f74
> +        .quad 0xbfc3a8eb2d31a376
> +        .quad 0xbfc3dfc2b0ecc62a
> +        .quad 0xbfc41682bf727bc0
> +        .quad 0xbfc44d2b6ccb7d1e
> +        .quad 0xbfc483bccce6e3dd
> +        .quad 0xbfc4ba36f39a55e5
> +        .quad 0xbfc4f099f4a230b2
> +        .quad 0xbfc526e5e3a1b438
> +        .quad 0xbfc55d1ad4232d6f
> +        .quad 0xbfc59338d9982086
> +        .quad 0xbfc5c940075972b9
> +        .quad 0xbfc5ff3070a793d4
> +        .quad 0xbfc6350a28aaa758
> +        .quad 0xbfc66acd4272ad51
> +        .quad 0xbfc6a079d0f7aad2
> +        .quad 0xbfc6d60fe719d21d
> +        .quad 0xbfc70b8f97a1aa75
> +        .quad 0xbfc740f8f54037a5
> +        .quad 0xbfc7764c128f2127
> +        .quad 0xbfc7ab890210d909
> +        .quad 0xbfc7e0afd630c274
> +        .quad 0xbfc815c0a14357eb
> +        .quad 0xbfc84abb75865139
> +        .quad 0xbfc87fa06520c911
> +        .quad 0xbfc8b46f8223625b
> +        .quad 0xbfc8e928de886d41
> +        .quad 0xbfc91dcc8c340bde
> +        .quad 0xbfc9525a9cf456b4
> +        .quad 0xbfc986d3228180ca
> +        .quad 0xbfc9bb362e7dfb83
> +        .quad 0xbfc9ef83d2769a34
> +        .quad 0xbfca23bc1fe2b563
> +        .quad 0xbfca57df28244dcd
> +        .quad 0xbfca8becfc882f19
> +        .quad 0xbfcabfe5ae46124c
> +        .quad 0xbfcaf3c94e80bff3
> +        .quad 0xbfcb2797ee46320c
> +        .quad 0xbfcb5b519e8fb5a4
> +        .quad 0xbfcb8ef670420c3b
> +        .quad 0xbfcbc286742d8cd6
> +        .quad 0xbfcbf601bb0e44e2
> +        .quad 0xbfcc2968558c18c1
> +        .quad 0xbfcc5cba543ae425
> +        .quad 0xbfcc8ff7c79a9a22
> +        .quad 0xbfccc320c0176502
> +        .quad 0xbfccf6354e09c5dc
> +        .quad 0xbfcd293581b6b3e7
> +        .quad 0xbfcd5c216b4fbb91
> +        .quad 0xbfcd8ef91af31d5e
> +        .quad 0xbfcdc1bca0abec7d
> +        .quad 0xbfcdf46c0c722d2f
> +        .quad 0xbfce27076e2af2e6
> +        .quad 0xbfce598ed5a87e2f
> +        .quad 0xbfce8c0252aa5a60
> +        .quad 0xbfcebe61f4dd7b0b
> +        .quad 0xbfcef0adcbdc5936
> +        .quad 0xbfcf22e5e72f105d
> +        .quad 0xbfcf550a564b7b37
> +        .quad 0xbfcf871b28955045
> +        .quad 0xbfcfb9186d5e3e2b
> +        .quad 0xbfcfeb0233e607cc
> +        .quad 0xbfd00e6c45ad501d
> +        .quad 0xbfd0274dc16c232f
> +        .quad 0xbfd0402594b4d041
> +        .quad 0xbfd058f3c703ebc6
> +        .quad 0xbfd071b85fcd590d
> +        .quad 0xbfd08a73667c57af
> +        .quad 0xbfd0a324e27390e3
> +        .quad 0xbfd0bbccdb0d24bd
> +        .quad 0xbfd0d46b579ab74b
> +        .quad 0xbfd0ed005f657da4
> +        .quad 0xbfd1058bf9ae4ad5
> +        .quad 0xbfd11e0e2dad9cb7
> +        .quad 0xbfd136870293a8b0
> +        .quad 0xbfd14ef67f88685a
> +        .quad 0xbfd1675cababa60e
> +        .quad 0xbfd17fb98e15095d
> +        .quad 0xbfd1980d2dd4236f
> +        .quad 0xbfd1b05791f07b49
> +        .quad 0xbfd1c898c16999fb
> +        .quad 0xbfd1e0d0c33716be
> +        .quad 0xbfd1f8ff9e48a2f3
> +        .quad 0xbfd211255986160c
> +        .quad 0xbfd22941fbcf7966
> +        .quad 0xbfd241558bfd1404
> +        .quad 0xbfd2596010df763a
> +        .quad 0xbfd27161913f853d
> +        .quad 0xbfd2895a13de86a3
> +        .quad 0xbfd2a1499f762bc9
> +        .quad 0xbfd2b9303ab89d25
> +        .quad 0xbfd2d10dec508583
> +        .quad 0xbfd2e8e2bae11d31
> +        .quad 0xbfd300aead06350c
> +        .quad 0xbfd31871c9544185
> +        .quad 0xbfd3302c16586588
> +        .quad 0xbfd347dd9a987d55
> +        .quad 0xbfd35f865c93293e
> +        .quad 0xbfd3772662bfd85b
> +        .quad 0xbfd38ebdb38ed321
> +        .quad 0xbfd3a64c556945ea
> +        .quad 0xbfd3bdd24eb14b6a
> +        .quad 0xbfd3d54fa5c1f710
> +        .quad 0xbfd3ecc460ef5f50
> +        .quad 0xbfd404308686a7e4
> +        .quad 0xbfd41b941cce0bee
> +        .quad 0xbfd432ef2a04e814
> +        .quad 0xbfd44a41b463c47c
> +        .quad 0xbfd4618bc21c5ec2
> +        .quad 0xbfd478cd5959b3d9
> +        .quad 0xbfd49006804009d1
> +        .quad 0xbfd4a7373cecf997
> +        .quad 0xbfd4be5f957778a1
> +        .quad 0xbfd4d57f8fefe27f
> +        .quad 0xbfd4ec973260026a
> +        .quad 0xbfd503a682cb1cb3
> +        .quad 0xbfd51aad872df82d
> +        .quad 0xbfd531ac457ee77e
> +        .quad 0xbfd548a2c3add263
> +        .quad 0xbfd55f9107a43ee2
> +        .quad 0xbfd5767717455a6c
> +        .quad 0xbfd58d54f86e02f2
> +        .quad 0xbfd5a42ab0f4cfe2
> +        .quad 0xbfd5baf846aa1b19
> +        .quad 0xbfd5d1bdbf5809ca
> +        .quad 0xbfd5e87b20c2954a
> +        .quad 0xbfd5ff3070a793d4
> +        .quad 0xbfd615ddb4bec13c
> +        .quad 0xbfd62c82f2b9c795
> +        .quad 0x3fd61965cdb02c1f
> +        .quad 0x3fd602d08af091ec
> +        .quad 0x3fd5ec433d5c35ae
> +        .quad 0x3fd5d5bddf595f30
> +        .quad 0x3fd5bf406b543db2
> +        .quad 0x3fd5a8cadbbedfa1
> +        .quad 0x3fd5925d2b112a59
> +        .quad 0x3fd57bf753c8d1fb
> +        .quad 0x3fd565995069514c
> +        .quad 0x3fd54f431b7be1a9
> +        .quad 0x3fd538f4af8f72fe
> +        .quad 0x3fd522ae0738a3d8
> +        .quad 0x3fd50c6f1d11b97c
> +        .quad 0x3fd4f637ebba9810
> +        .quad 0x3fd4e0086dd8baca
> +        .quad 0x3fd4c9e09e172c3c
> +        .quad 0x3fd4b3c077267e9a
> +        .quad 0x3fd49da7f3bcc41f
> +        .quad 0x3fd487970e958770
> +        .quad 0x3fd4718dc271c41b
> +        .quad 0x3fd45b8c0a17df13
> +        .quad 0x3fd44591e0539f49
> +        .quad 0x3fd42f9f3ff62642
> +        .quad 0x3fd419b423d5e8c7
> +        .quad 0x3fd403d086cea79c
> +        .quad 0x3fd3edf463c1683e
> +        .quad 0x3fd3d81fb5946dba
> +        .quad 0x3fd3c25277333184
> +        .quad 0x3fd3ac8ca38e5c5f
> +        .quad 0x3fd396ce359bbf54
> +        .quad 0x3fd3811728564cb2
> +        .quad 0x3fd36b6776be1117
> +        .quad 0x3fd355bf1bd82c8b
> +        .quad 0x3fd3401e12aecba1
> +        .quad 0x3fd32a84565120a8
> +        .quad 0x3fd314f1e1d35ce4
> +        .quad 0x3fd2ff66b04ea9d4
> +        .quad 0x3fd2e9e2bce12286
> +        .quad 0x3fd2d46602adccee
> +        .quad 0x3fd2bef07cdc9354
> +        .quad 0x3fd2a982269a3dbf
> +        .quad 0x3fd2941afb186b7c
> +        .quad 0x3fd27ebaf58d8c9d
> +        .quad 0x3fd269621134db92
> +        .quad 0x3fd25410494e56c7
> +        .quad 0x3fd23ec5991eba49
> +        .quad 0x3fd22981fbef797b
> +        .quad 0x3fd214456d0eb8d4
> +        .quad 0x3fd1ff0fe7cf47a7
> +        .quad 0x3fd1e9e1678899f4
> +        .quad 0x3fd1d4b9e796c245
> +        .quad 0x3fd1bf99635a6b95
> +        .quad 0x3fd1aa7fd638d33f
> +        .quad 0x3fd1956d3b9bc2fa
> +        .quad 0x3fd180618ef18adf
> +        .quad 0x3fd16b5ccbacfb73
> +        .quad 0x3fd1565eed455fc3
> +        .quad 0x3fd14167ef367783
> +        .quad 0x3fd12c77cd00713b
> +        .quad 0x3fd1178e8227e47c
> +        .quad 0x3fd102ac0a35cc1c
> +        .quad 0x3fd0edd060b78081
> +        .quad 0x3fd0d8fb813eb1ef
> +        .quad 0x3fd0c42d676162e3
> +        .quad 0x3fd0af660eb9e279
> +        .quad 0x3fd09aa572e6c6d4
> +        .quad 0x3fd085eb8f8ae797
> +        .quad 0x3fd07138604d5862
> +        .quad 0x3fd05c8be0d9635a
> +        .quad 0x3fd047e60cde83b8
> +        .quad 0x3fd03346e0106062
> +        .quad 0x3fd01eae5626c691
> +        .quad 0x3fd00a1c6adda473
> +        .quad 0x3fcfeb2233ea07cd
> +        .quad 0x3fcfc218be620a5e
> +        .quad 0x3fcf991c6cb3b379
> +        .quad 0x3fcf702d36777df0
> +        .quad 0x3fcf474b134df229
> +        .quad 0x3fcf1e75fadf9bde
> +        .quad 0x3fcef5ade4dcffe6
> +        .quad 0x3fceccf2c8fe920a
> +        .quad 0x3fcea4449f04aaf5
> +        .quad 0x3fce7ba35eb77e2a
> +        .quad 0x3fce530effe71012
> +        .quad 0x3fce2a877a6b2c12
> +        .quad 0x3fce020cc6235ab5
> +        .quad 0x3fcdd99edaf6d7e9
> +        .quad 0x3fcdb13db0d48940
> +        .quad 0x3fcd88e93fb2f450
> +        .quad 0x3fcd60a17f903515
> +        .quad 0x3fcd38666871f465
> +        .quad 0x3fcd1037f2655e7b
> +        .quad 0x3fcce816157f1988
> +        .quad 0x3fccc000c9db3c52
> +        .quad 0x3fcc97f8079d44ec
> +        .quad 0x3fcc6ffbc6f00f71
> +        .quad 0x3fcc480c0005ccd1
> +        .quad 0x3fcc2028ab17f9b4
> +        .quad 0x3fcbf851c067555f
> +        .quad 0x3fcbd087383bd8ad
> +        .quad 0x3fcba8c90ae4ad19
> +        .quad 0x3fcb811730b823d2
> +        .quad 0x3fcb5971a213acdb
> +        .quad 0x3fcb31d8575bce3d
> +        .quad 0x3fcb0a4b48fc1b46
> +        .quad 0x3fcae2ca6f672bd4
> +        .quad 0x3fcabb55c31693ad
> +        .quad 0x3fca93ed3c8ad9e3
> +        .quad 0x3fca6c90d44b704e
> +        .quad 0x3fca454082e6ab05
> +        .quad 0x3fca1dfc40f1b7f1
> +        .quad 0x3fc9f6c407089664
> +        .quad 0x3fc9cf97cdce0ec3
> +        .quad 0x3fc9a8778debaa38
> +        .quad 0x3fc981634011aa75
> +        .quad 0x3fc95a5adcf7017f
> +        .quad 0x3fc9335e5d594989
> +        .quad 0x3fc90c6db9fcbcd9
> +        .quad 0x3fc8e588ebac2dbf
> +        .quad 0x3fc8beafeb38fe8c
> +        .quad 0x3fc897e2b17b19a5
> +        .quad 0x3fc871213750e994
> +        .quad 0x3fc84a6b759f512f
> +        .quad 0x3fc823c16551a3c2
> +        .quad 0x3fc7fd22ff599d4f
> +        .quad 0x3fc7d6903caf5ad0
> +        .quad 0x3fc7b0091651528c
> +        .quad 0x3fc7898d85444c73
> +        .quad 0x3fc7631d82935a86
> +        .quad 0x3fc73cb9074fd14d
> +        .quad 0x3fc716600c914054
> +        .quad 0x3fc6f0128b756abc
> +        .quad 0x3fc6c9d07d203fc7
> +        .quad 0x3fc6a399dabbd383
> +        .quad 0x3fc67d6e9d785771
> +        .quad 0x3fc6574ebe8c133a
> +        .quad 0x3fc6313a37335d76
> +        .quad 0x3fc60b3100b09476
> +        .quad 0x3fc5e533144c1719
> +        .quad 0x3fc5bf406b543db2
> +        .quad 0x3fc59958ff1d52f1
> +        .quad 0x3fc5737cc9018cdd
> +        .quad 0x3fc54dabc26105d2
> +        .quad 0x3fc527e5e4a1b58d
> +        .quad 0x3fc5022b292f6a45
> +        .quad 0x3fc4dc7b897bc1c8
> +        .quad 0x3fc4b6d6fefe22a4
> +        .quad 0x3fc4913d8333b561
> +        .quad 0x3fc46baf0f9f5db7
> +        .quad 0x3fc4462b9dc9b3dc
> +        .quad 0x3fc420b32740fdd4
> +        .quad 0x3fc3fb45a59928cc
> +        .quad 0x3fc3d5e3126bc27f
> +        .quad 0x3fc3b08b6757f2a9
> +        .quad 0x3fc38b3e9e027479
> +        .quad 0x3fc365fcb0159016
> +        .quad 0x3fc340c59741142e
> +        .quad 0x3fc31b994d3a4f85
> +        .quad 0x3fc2f677cbbc0a96
> +        .quad 0x3fc2d1610c86813a
> +        .quad 0x3fc2ac55095f5c59
> +        .quad 0x3fc28753bc11aba5
> +        .quad 0x3fc2625d1e6ddf57
> +        .quad 0x3fc23d712a49c202
> +        .quad 0x3fc2188fd9807263
> +        .quad 0x3fc1f3b925f25d41
> +        .quad 0x3fc1ceed09853752
> +        .quad 0x3fc1aa2b7e23f72a
> +        .quad 0x3fc185747dbecf34
> +        .quad 0x3fc160c8024b27b1
> +        .quad 0x3fc13c2605c398c3
> +        .quad 0x3fc1178e8227e47c
> +        .quad 0x3fc0f301717cf0fb
> +        .quad 0x3fc0ce7ecdccc28d
> +        .quad 0x3fc0aa06912675d5
> +        .quad 0x3fc08598b59e3a07
> +        .quad 0x3fc06135354d4b18
> +        .quad 0x3fc03cdc0a51ec0d
> +        .quad 0x3fc0188d2ecf6140
> +        .quad 0x3fbfe89139dbd566
> +        .quad 0x3fbfa01c9db57ce2
> +        .quad 0x3fbf57bc7d9005db
> +        .quad 0x3fbf0f70cdd992e3
> +        .quad 0x3fbec739830a1120
> +        .quad 0x3fbe7f1691a32d3e
> +        .quad 0x3fbe3707ee30487b
> +        .quad 0x3fbdef0d8d466db9
> +        .quad 0x3fbda727638446a2
> +        .quad 0x3fbd5f55659210e2
> +        .quad 0x3fbd179788219364
> +        .quad 0x3fbccfedbfee13a8
> +        .quad 0x3fbc885801bc4b23
> +        .quad 0x3fbc40d6425a5cb1
> +        .quad 0x3fbbf968769fca11
> +        .quad 0x3fbbb20e936d6974
> +        .quad 0x3fbb6ac88dad5b1c
> +        .quad 0x3fbb23965a52ff00
> +        .quad 0x3fbadc77ee5aea8c
> +        .quad 0x3fba956d3ecade63
> +        .quad 0x3fba4e7640b1bc38
> +        .quad 0x3fba0792e9277cac
> +        .quad 0x3fb9c0c32d4d2548
> +        .quad 0x3fb97a07024cbe74
> +        .quad 0x3fb9335e5d594989
> +        .quad 0x3fb8ecc933aeb6e8
> +        .quad 0x3fb8a6477a91dc29
> +        .quad 0x3fb85fd927506a48
> +        .quad 0x3fb8197e2f40e3f0
> +        .quad 0x3fb7d33687c293c9
> +        .quad 0x3fb78d02263d82d3
> +        .quad 0x3fb746e100226ed9
> +        .quad 0x3fb700d30aeac0e1
> +        .quad 0x3fb6bad83c1883b6
> +        .quad 0x3fb674f089365a7a
> +        .quad 0x3fb62f1be7d77743
> +        .quad 0x3fb5e95a4d9791cb
> +        .quad 0x3fb5a3abb01ade25
> +        .quad 0x3fb55e10050e0384
> +        .quad 0x3fb518874226130a
> +        .quad 0x3fb4d3115d207eac
> +        .quad 0x3fb48dae4bc31018
> +        .quad 0x3fb4485e03dbdfad
> +        .quad 0x3fb403207b414b7f
> +        .quad 0x3fb3bdf5a7d1ee64
> +        .quad 0x3fb378dd7f749714
> +        .quad 0x3fb333d7f8183f4b
> +        .quad 0x3fb2eee507b40301
> +        .quad 0x3fb2aa04a44717a5
> +        .quad 0x3fb26536c3d8c369
> +        .quad 0x3fb2207b5c78549e
> +        .quad 0x3fb1dbd2643d190b
> +        .quad 0x3fb1973bd1465567
> +        .quad 0x3fb152b799bb3cc9
> +        .quad 0x3fb10e45b3cae831
> +        .quad 0x3fb0c9e615ac4e17
> +        .quad 0x3fb08598b59e3a07
> +        .quad 0x3fb0415d89e74444
> +        .quad 0x3faffa6911ab9301
> +        .quad 0x3faf723b517fc523
> +        .quad 0x3faeea31c006b87c
> +        .quad 0x3fae624c4a0b5e1b
> +        .quad 0x3fadda8adc67ee4e
> +        .quad 0x3fad52ed6405d86f
> +        .quad 0x3faccb73cdddb2cc
> +        .quad 0x3fac441e06f72a9e
> +        .quad 0x3fabbcebfc68f420
> +        .quad 0x3fab35dd9b58baad
> +        .quad 0x3faaaef2d0fb10fc
> +        .quad 0x3faa282b8a936171
> +        .quad 0x3fa9a187b573de7c
> +        .quad 0x3fa91b073efd7314
> +        .quad 0x3fa894aa149fb343
> +        .quad 0x3fa80e7023d8ccc4
> +        .quad 0x3fa788595a3577ba
> +        .quad 0x3fa70265a550e777
> +        .quad 0x3fa67c94f2d4bb58
> +        .quad 0x3fa5f6e73078efb8
> +        .quad 0x3fa5715c4c03ceef
> +        .quad 0x3fa4ebf43349e26f
> +        .quad 0x3fa466aed42de3ea
> +        .quad 0x3fa3e18c1ca0ae92
> +        .quad 0x3fa35c8bfaa1306b
> +        .quad 0x3fa2d7ae5c3c5bae
> +        .quad 0x3fa252f32f8d183f
> +        .quad 0x3fa1ce5a62bc353a
> +        .quad 0x3fa149e3e4005a8d
> +        .quad 0x3fa0c58fa19dfaaa
> +        .quad 0x3fa0415d89e74444
> +        .quad 0x3f9f7a9b16782856
> +        .quad 0x3f9e72bf2813ce51
> +        .quad 0x3f9d6b2725979802
> +        .quad 0x3f9c63d2ec14aaf2
> +        .quad 0x3f9b5cc258b718e6
> +        .quad 0x3f9a55f548c5c43f
> +        .quad 0x3f994f6b99a24475
> +        .quad 0x3f98492528c8cabf
> +        .quad 0x3f974321d3d006d3
> +        .quad 0x3f963d6178690bd6
> +        .quad 0x3f9537e3f45f3565
> +        .quad 0x3f9432a925980cc1
> +        .quad 0x3f932db0ea132e22
> +        .quad 0x3f9228fb1fea2e28
> +        .quad 0x3f912487a5507f70
> +        .quad 0x3f90205658935847
> +        .quad 0x3f8e38ce3033310c
> +        .quad 0x3f8c317384c75f06
> +        .quad 0x3f8a2a9c6c170462
> +        .quad 0x3f882448a388a2aa
> +        .quad 0x3f861e77e8b53fc6
> +        .quad 0x3f841929f96832f0
> +        .quad 0x3f82145e939ef1e9
> +        .quad 0x3f8010157588de71
> +        .quad 0x3f7c189cbb0e27fb
> +        .quad 0x3f78121214586b54
> +        .quad 0x3f740c8a747878e2
> +        .quad 0x3f70080559588b35
> +        .quad 0x3f680904828985c0
> +        .quad 0x3f60040155d5889e
> +        .quad 0x3f50020055655889
> +        .quad 0x0000000000000000
> +        /*== poly_coeff[4] ==*/
> +        .align 16
> +        .quad 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A /* coeff4 */
> +        .quad 0xbfd0000148058EE1, 0xbfd0000148058EE1 /* coeff3 */
> +        .quad 0x3fd55555555543C5, 0x3fd55555555543C5 /* coeff2 */
> +        .quad 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F /* coeff1 */
> +        /*== ExpMask ==*/
> +        .align 16
> +        .quad 0x000fffffffffffff, 0x000fffffffffffff
> +        /*== Two10 ==*/
> +        .align 16
> +        .quad 0x3f50000000000000, 0x3f50000000000000
> +        /*== MinLog1p = -1+2^(-53) ==*/
> +        .align 16
> +        .quad 0xbfefffffffffffff, 0xbfefffffffffffff
> +        /*== MaxLog1p ==*/
> +        .align 16
> +        .quad 0x7f3ffffffffff000, 0x7f3ffffffffff000
> +        /*== One ==*/
> +        .align 16
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== SgnMask ==*/
> +        .align 16
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff
> +        /*== XThreshold ==*/
> +        .align 16
> +        .quad 0x3e00000000000000, 0x3e00000000000000
> +        /*== XhMask ==*/
> +        .align 16
> +        .quad 0xfffffffffffffc00, 0xfffffffffffffc00
> +        /*== Threshold ==*/
> +        .align 16
> +        .quad 0x4086a00000000000, 0x4086a00000000000
> +        /*== Bias ==*/
> +        .align 16
> +        .quad 0x408ff80000000000, 0x408ff80000000000
> +        /*== Bias1 ==*/
> +        .align 16
> +        .quad 0x408ff00000000000, 0x408ff00000000000
> +        /*== ExpMask0 ==*/
> +        .align 16
> +        .quad 0x7ff0000000000000, 0x7ff0000000000000
> +        /*== ExpMask2 ==*/
> +        .align 16
> +        .quad 0x7f40000000000000, 0x7f40000000000000
> +        /*== L2 ==*/
> +        .align 16
> +        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
> +        /*== dHalf ==*/
> +        .align 16
> +        .quad 0x3FE0000000000000, 0x3FE0000000000000
> +        /*== dSign ==*/
> +        .align 16
> +        .quad 0x8000000000000000, 0x8000000000000000
> +        /*== dTopMask12 ==*/
> +        .align 16
> +        .quad 0xFFFFFE0000000000, 0xFFFFFE0000000000
> +        /*== dTopMask41 ==*/
> +        .align 16
> +        .quad 0xFFFFFFFFFFFFF000, 0xFFFFFFFFFFFFF000
> +        /*== dTinyRange ==*/
> +        .align 16
> +        .quad 0x0350000000000000, 0x0350000000000000
> +        .align 16
> +        .type	__svml_datanh_data_internal,@object
> +        .size	__svml_datanh_data_internal,.-__svml_datanh_data_internal
> +        .align 16
> +
> +.FLT_21:
> +        .long	0x00000000,0x43380000,0x00000000,0x43380000
> +        .type	.FLT_21,@object
> +        .size	.FLT_21,16
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core-sse.S
> new file mode 100644
> index 0000000000..a39cbb7595
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized atanh, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN4v_atanh _ZGVdN4v_atanh_sse_wrapper
> +#include "../svml_d_atanh4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core.c
> new file mode 100644
> index 0000000000..e8ef343ae7
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized atanh, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN4v_atanh
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN4v_atanh, __GI__ZGVdN4v_atanh, __redirect__ZGVdN4v_atanh)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core_avx2.S
> new file mode 100644
> index 0000000000..1230029da2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh4_core_avx2.S
> @@ -0,0 +1,1479 @@
> +/* Function atanh vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute atanh(x) as 0.5 * log((1 + x)/(1 - x))
> + *
> + *   Special cases:
> + *
> + *   atanh(0)  = 0
> + *   atanh(+1) = +INF
> + *   atanh(-1) = -INF
> + *   atanh(x)  = NaN if |x| > 1, or if x is a NaN or INF
> + *
> + */
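
As a scalar point of reference for the algorithm description above, the formula and its special cases amount to the model below.  This is illustrative only: it models the formulation, not the vector code, which avoids the straightforward division by splitting operands into high and low parts.

    #include <math.h>

    /* Scalar model: atanh(x) = 0.5 * log((1 + x) / (1 - x)).
       Accuracy and intermediate rounding in the vector code are handled
       far more carefully than this.  */
    static double
    atanh_model (double x)
    {
      if (x == 0.0)
        return x;                        /* atanh(+-0) = +-0 */
      if (isnan (x) || fabs (x) > 1.0)
        return NAN;                      /* NaN, +-Inf, |x| > 1 */
      if (fabs (x) == 1.0)
        return copysign (INFINITY, x);   /* atanh(+-1) = +-Inf */
      return 0.5 * log ((1.0 + x) / (1.0 - x));
    }
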
> +
> +/* Offsets for data table __svml_datanh_data_internal
> + */
> +#define Log_HA_table                  	0
> +#define Log_LA_table                  	8224
> +#define poly_coeff                    	12352
> +#define ExpMask                       	12480
> +#define Two10                         	12512
> +#define MinLog1p                      	12544
> +#define MaxLog1p                      	12576
> +#define One                           	12608
> +#define SgnMask                       	12640
> +#define XThreshold                    	12672
> +#define XhMask                        	12704
> +#define Threshold                     	12736
> +#define Bias                          	12768
> +#define Bias1                         	12800
> +#define ExpMask0                      	12832
> +#define ExpMask2                      	12864
> +#define L2                            	12896
> +#define dHalf                         	12928
> +#define dSign                         	12960
> +#define dTopMask12                    	12992
> +#define dTopMask41                    	13024
> +#define TinyRange                     	13056
> +
> +/* Lookup bias for data table __svml_datanh_data_internal.  */
> +#define Table_Lookup_Bias               -0x405fe0
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN4v_atanh_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +        lea       Table_Lookup_Bias+__svml_datanh_data_internal(%rip), %r8
> +        vmovupd   SgnMask+__svml_datanh_data_internal(%rip), %ymm7
> +
> +/* Load the constant 1 and a sign mask */
> +        vmovupd   One+__svml_datanh_data_internal(%rip), %ymm11
> +        vmovapd   %ymm0, %ymm12
> +
> +/* Strip off the sign, so treat X as positive until right at the end */
> +        vandpd    %ymm7, %ymm12, %ymm0
> +        vsubpd    %ymm0, %ymm11, %ymm6
> +
> +/*
> + * Check whether |X| < 1, in which case we use the main function.
> + * Otherwise set the rangemask so that the callout will get used.
> + * Note that this will also use the callout for NaNs since not(NaN < 1).
> + */
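
In C terms, the lane predicate computed by the vcmpnlt_uqpd below is roughly the snippet that follows; NLT_UQ ("not less-than, unordered, quiet") is what makes NaN lanes take the callout path, as the comment notes.  Purely illustrative, with ax standing for |x| in one lane:

    /* !(ax < 1.0) is also true when ax is a NaN, since the comparison
       with a NaN is unordered and therefore yields false.  */
    static int need_callout (double ax) { return !(ax < 1.0); }
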
> +        vcmpnlt_uqpd %ymm11, %ymm0, %ymm13
> +        vcmplt_oqpd TinyRange+__svml_datanh_data_internal(%rip), %ymm0, %ymm10
> +        vsubpd    %ymm6, %ymm11, %ymm15
> +
> +/*
> + * Compute V = 2 * X trivially, and UHi + ULo = 1 - X in two pieces,
> + * the upper part UHi being <= 41 bits long. Then we have
> + * atanh(X) = 1/2 * log((1 + X) / (1 - X)) = 1/2 * log1p(V / (UHi + ULo)).
> + */
> +        vaddpd    %ymm0, %ymm0, %ymm3
> +        vcvtpd2ps %ymm6, %xmm5
> +        vsubpd    %ymm0, %ymm15, %ymm1
> +        vrcpps    %xmm5, %xmm4
> +        vmovapd   %ymm12, %ymm14
> +        vfmadd213pd %ymm12, %ymm12, %ymm14
> +        vcvtps2pd %xmm4, %ymm2
> +
> +/* Record the sign for eventual reincorporation. */
> +        vandpd    dSign+__svml_datanh_data_internal(%rip), %ymm12, %ymm9
> +
> +/* Or the sign bit in with the tiny result to handle atanh(-0) correctly */
> +        vorpd     %ymm9, %ymm14, %ymm8
> +        vandpd    dTopMask12+__svml_datanh_data_internal(%rip), %ymm2, %ymm14
> +
> +/* No need to split dU when FMA is available */
> +        vfnmadd213pd %ymm11, %ymm14, %ymm6
> +        vfnmadd231pd %ymm14, %ymm1, %ymm6
> +
> +/*
> + * Compute D = E + E^2 + E^3 + E^4 + E^5
> + * = E + (E + E^2) (E + E * E^2)
> + * This form only saves operations when FMA is available
> + */
> +        vmovapd   %ymm11, %ymm0
> +        vmovapd   %ymm6, %ymm5
> +        vfmadd231pd %ymm6, %ymm6, %ymm0
> +        vfmadd213pd %ymm6, %ymm6, %ymm5
> +        vfmadd213pd %ymm11, %ymm0, %ymm5
> +        vmovmskpd %ymm13, %eax
> +
> +/*
> + * Split V as well into upper 41 bits and lower part, so that we can get
> + * a preliminary quotient estimate without rounding error.
> + */
> +        vandpd    dTopMask41+__svml_datanh_data_internal(%rip), %ymm3, %ymm13
> +        vsubpd    %ymm13, %ymm3, %ymm15
> +
> +/* Hence get initial quotient estimate QHi + QLo = R * VHi + R * VLo */
> +        vmulpd    %ymm13, %ymm14, %ymm2
> +        vmulpd    %ymm5, %ymm6, %ymm0
> +        vmulpd    %ymm15, %ymm14, %ymm4
> +
> +/* 2^(-10-exp(X)) */
> +        vmovupd   ExpMask2+__svml_datanh_data_internal(%rip), %ymm15
> +
> +/*
> + * Compute R * (VHi + VLo) * (1 + E + E^2 + E^3 + E^4 + E^5)
> + * = R *  (VHi + VLo) * (1 + D)
> + * = QHi + (QHi * D + QLo + QLo * D)
> + */
> +        vmulpd    %ymm0, %ymm2, %ymm6
> +        vfmadd213pd %ymm4, %ymm4, %ymm0
> +        vaddpd    %ymm0, %ymm6, %ymm5
> +
> +/*
> + * Now finally accumulate the high and low parts of the
> + * argument to log1p, H + L, with a final compensated summation.
> + */
> +        vaddpd    %ymm5, %ymm2, %ymm4
> +
> +/*
> + * Now we feed into the log1p code, using H in place of _VARG1 and
> + * later incorporating L into the reduced argument.
> + * compute 1+x as high, low parts
> + */
> +        vmaxpd    %ymm4, %ymm11, %ymm1
> +        vminpd    %ymm4, %ymm11, %ymm3
> +        vandpd    %ymm7, %ymm4, %ymm7
> +        vcmplt_oqpd XThreshold+__svml_datanh_data_internal(%rip), %ymm7, %ymm0
> +        vaddpd    %ymm3, %ymm1, %ymm5
> +        vorpd     XhMask+__svml_datanh_data_internal(%rip), %ymm0, %ymm4
> +        vandpd    %ymm4, %ymm5, %ymm5
> +
> +/* preserve mantissa, set input exponent to 2^(-10) */
> +        vandpd    ExpMask+__svml_datanh_data_internal(%rip), %ymm5, %ymm6
> +        vorpd     Two10+__svml_datanh_data_internal(%rip), %ymm6, %ymm7
> +
> +/* reciprocal approximation good to at least 11 bits */
> +        vcvtpd2ps %ymm7, %xmm13
> +        vsubpd    %ymm5, %ymm1, %ymm2
> +        vrcpps    %xmm13, %xmm14
> +        vaddpd    %ymm2, %ymm3, %ymm4
> +        vcvtps2pd %xmm14, %ymm3
> +
> +/* exponent bits */
> +        vpsrlq    $20, %ymm5, %ymm2
> +
> +/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
> +        vroundpd  $0, %ymm3, %ymm3
> +
> +/*
> + * prepare table index
> + * table lookup
> + */
> +        vpsrlq    $40, %ymm3, %ymm13
> +
> +/* exponent of X needed to scale Xl */
> +        vandps    ExpMask0+__svml_datanh_data_internal(%rip), %ymm5, %ymm0
> +        vpsubq    %ymm0, %ymm15, %ymm6
> +
> +/* Finally, halve the result and reincorporate the sign */
> +        vxorpd    dHalf+__svml_datanh_data_internal(%rip), %ymm9, %ymm9
> +        vmovd     %xmm13, %edx
> +        vextractf128 $1, %ymm13, %xmm0
> +        movslq    %edx, %rdx
> +        vpextrd   $2, %xmm13, %ecx
> +        movslq    %ecx, %rcx
> +        vmovd     %xmm0, %esi
> +        vmovsd    (%r8,%rdx), %xmm14
> +        vmovhpd   (%r8,%rcx), %xmm14, %xmm15
> +
> +/* exponent*log(2.0) */
> +        vmovupd   Threshold+__svml_datanh_data_internal(%rip), %ymm14
> +        movslq    %esi, %rsi
> +        vpextrd   $2, %xmm0, %edi
> +        movslq    %edi, %rdi
> +        vextractf128 $1, %ymm2, %xmm1
> +        vshufps   $221, %xmm1, %xmm2, %xmm7
> +
> +/* scale DblRcp */
> +        vmulpd    %ymm6, %ymm3, %ymm2
> +        vmovsd    (%r8,%rsi), %xmm6
> +
> +/* biased exponent in DP format */
> +        vcvtdq2pd %xmm7, %ymm1
> +        vmovhpd   (%r8,%rdi), %xmm6, %xmm7
> +        vcmplt_oqpd %ymm3, %ymm14, %ymm3
> +
> +/* argument reduction */
> +        vfmsub213pd %ymm11, %ymm2, %ymm5
> +        vmulpd    %ymm2, %ymm4, %ymm11
> +        vmovupd   poly_coeff+64+__svml_datanh_data_internal(%rip), %ymm2
> +        vaddpd    %ymm11, %ymm5, %ymm5
> +        vandpd    Bias+__svml_datanh_data_internal(%rip), %ymm3, %ymm3
> +        vorpd     Bias1+__svml_datanh_data_internal(%rip), %ymm3, %ymm6
> +        vsubpd    %ymm6, %ymm1, %ymm1
> +        vfmadd213pd poly_coeff+96+__svml_datanh_data_internal(%rip), %ymm5, %ymm2
> +        vmulpd    %ymm5, %ymm5, %ymm4
> +        vmulpd    L2+__svml_datanh_data_internal(%rip), %ymm1, %ymm3
> +
> +/* polynomial */
> +        vmovupd   poly_coeff+__svml_datanh_data_internal(%rip), %ymm1
> +        vfmadd213pd poly_coeff+32+__svml_datanh_data_internal(%rip), %ymm5, %ymm1
> +        vfmadd213pd %ymm2, %ymm4, %ymm1
> +
> +/* reconstruction */
> +        vfmadd213pd %ymm5, %ymm4, %ymm1
> +        vinsertf128 $1, %xmm7, %ymm15, %ymm0
> +        vaddpd    %ymm1, %ymm0, %ymm0
> +        vaddpd    %ymm0, %ymm3, %ymm6
> +        vmulpd    %ymm6, %ymm9, %ymm0
> +        vblendvpd %ymm10, %ymm8, %ymm0, %ymm0
> +        testl     %eax, %eax
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm12
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovupd   %ymm12, 32(%rsp)
> +        vmovupd   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0
> +
> +        xorl      %edx, %edx
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovupd   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      atanh@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 64(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN4v_atanh_avx2)
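
For reference, the SPECIAL_VALUES_BRANCH / RANGEMASK_CHECK /
SCALAR_MATH_CALL flow above is roughly this per-lane loop in C (my own
paraphrase, with the 4 AVX2 lanes and the stack spill layout implied):

    #include <math.h>

    static void
    special_values_fallback (const double src[4], double dst[4],
                             int range_mask)
    {
      for (int lane = 0; lane < 4; lane++)
        if (range_mask & (1 << lane))     /* btl %r12d, %r13d */
          dst[lane] = atanh (src[lane]);  /* call atanh@PLT */
    }

Only the lanes flagged by the vmovmskpd result get the scalar atanh
call; the fast-path results are kept for the rest.
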
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_datanh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(32)) VUINT32 Log_HA_table[(1<<10)+2][2];
> +        __declspec(align(32)) VUINT32 Log_LA_table[(1<<9)+1][2];
> +        __declspec(align(32)) VUINT32 poly_coeff[4][4][2];
> +        __declspec(align(32)) VUINT32 ExpMask[4][2];
> +        __declspec(align(32)) VUINT32 Two10[4][2];
> +        __declspec(align(32)) VUINT32 MinLog1p[4][2];
> +        __declspec(align(32)) VUINT32 MaxLog1p[4][2];
> +        __declspec(align(32)) VUINT32 One[4][2];
> +        __declspec(align(32)) VUINT32 SgnMask[4][2];
> +        __declspec(align(32)) VUINT32 XThreshold[4][2];
> +        __declspec(align(32)) VUINT32 XhMask[4][2];
> +        __declspec(align(32)) VUINT32 Threshold[4][2];
> +        __declspec(align(32)) VUINT32 Bias[4][2];
> +        __declspec(align(32)) VUINT32 Bias1[4][2];
> +        __declspec(align(32)) VUINT32 ExpMask0[4][2];
> +        __declspec(align(32)) VUINT32 ExpMask2[4][2];
> +        __declspec(align(32)) VUINT32 L2[4][2];
> +        __declspec(align(32)) VUINT32 dHalf[4][2];
> +        __declspec(align(32)) VUINT32 dSign[4][2];
> +        __declspec(align(32)) VUINT32 dTopMask12[4][2];
> +        __declspec(align(32)) VUINT32 dTopMask41[4][2];
> +        __declspec(align(32)) VUINT32 TinyRange[4][2];
> +} __svml_datanh_data_internal;
> +#endif
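
A quick consistency check of the offset macros at the top of the file
against the structure comment above (my own C11 snippet, for review
only; the 32-byte rounding models the .align 32 between fields):

    /* Log_HA_table: ((1<<10)+2) entries of 8 bytes, then align 32.  */
    _Static_assert (8224 == ((1026 * 8 + 31) & ~31), "Log_LA_table offset");
    /* Log_LA_table: ((1<<9)+1) entries of 8 bytes, then align 32.  */
    _Static_assert (12352 == ((8224 + 513 * 8 + 31) & ~31), "poly_coeff offset");
    /* poly_coeff: 4 32-byte vectors.  */
    _Static_assert (12480 == 12352 + 4 * 32, "ExpMask offset");
    /* 18 more 32-byte vectors from ExpMask up to TinyRange.  */
    _Static_assert (13056 == 12480 + 18 * 32, "TinyRange offset");
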
> +__svml_datanh_data_internal:
> +        /*== Log_HA_table ==*/
> +        .quad 0xc086232bdd7a8300, 0xbe1ce91eef3fb100
> +        .quad 0xc086232fdc7ad828, 0xbe1cefcffda73b6a
> +        .quad 0xc0862333d97d2ba0, 0xbe1cef406748f1ff
> +        .quad 0xc0862337d48378e0, 0xbe1cef2a9429925a
> +        .quad 0xc086233bcd8fb878, 0xbe1cf138d17ebecb
> +        .quad 0xc086233fc4a3e018, 0xbe1ceff2dbbbb29e
> +        .quad 0xc0862343b9c1e270, 0xbe1cf1a42aae437b
> +        .quad 0xc0862347acebaf68, 0xbe1cef3b152048af
> +        .quad 0xc086234b9e2333f0, 0xbe1cef20e127805e
> +        .quad 0xc086234f8d6a5a30, 0xbe1cf00ad6052cf4
> +        .quad 0xc08623537ac30980, 0xbe1cefc4642ee597
> +        .quad 0xc0862357662f2660, 0xbe1cf1f277d36e16
> +        .quad 0xc086235b4fb092a0, 0xbe1ceed009e8d8e6
> +        .quad 0xc086235f37492d28, 0xbe1cf1e4038cb362
> +        .quad 0xc08623631cfad250, 0xbe1cf0b0873b8557
> +        .quad 0xc086236700c75b98, 0xbe1cf15bb3227c0b
> +        .quad 0xc086236ae2b09fe0, 0xbe1cf151ef8ca9ed
> +        .quad 0xc086236ec2b87358, 0xbe1cefe1dc2cd2ed
> +        .quad 0xc0862372a0e0a780, 0xbe1cf0d1eec5454f
> +        .quad 0xc08623767d2b0b48, 0xbe1ceeefd570bbce
> +        .quad 0xc086237a57996af0, 0xbe1cee99ae91b3a7
> +        .quad 0xc086237e302d9028, 0xbe1cf0412830fbd1
> +        .quad 0xc086238206e94218, 0xbe1ceee898588610
> +        .quad 0xc0862385dbce4548, 0xbe1cee9a1fbcaaea
> +        .quad 0xc0862389aede5bc0, 0xbe1ceed8e7cc1ad6
> +        .quad 0xc086238d801b4500, 0xbe1cf10c8d059da6
> +        .quad 0xc08623914f86be18, 0xbe1ceee6c63a8165
> +        .quad 0xc08623951d228180, 0xbe1cf0c3592d2ff1
> +        .quad 0xc0862398e8f04758, 0xbe1cf0026cc4cb1b
> +        .quad 0xc086239cb2f1c538, 0xbe1cf15d48d8e670
> +        .quad 0xc08623a07b28ae60, 0xbe1cef359363787c
> +        .quad 0xc08623a44196b390, 0xbe1cefdf1ab2e82c
> +        .quad 0xc08623a8063d8338, 0xbe1cefe43c02aa84
> +        .quad 0xc08623abc91ec960, 0xbe1cf044f5ae35b7
> +        .quad 0xc08623af8a3c2fb8, 0xbe1cf0b0b4001e1b
> +        .quad 0xc08623b349975d98, 0xbe1cf1bae76dfbcf
> +        .quad 0xc08623b70731f810, 0xbe1cef0a72e13a62
> +        .quad 0xc08623bac30da1c8, 0xbe1cf184007d2b6b
> +        .quad 0xc08623be7d2bfb40, 0xbe1cf16f4b239e98
> +        .quad 0xc08623c2358ea2a0, 0xbe1cf0976acada87
> +        .quad 0xc08623c5ec3733d0, 0xbe1cf066318a16ff
> +        .quad 0xc08623c9a1274880, 0xbe1ceffaa7148798
> +        .quad 0xc08623cd54607820, 0xbe1cf23ab02e9b6e
> +        .quad 0xc08623d105e45800, 0xbe1cefdfef7d4fde
> +        .quad 0xc08623d4b5b47b20, 0xbe1cf17fece44f2b
> +        .quad 0xc08623d863d27270, 0xbe1cf18f907d0d7c
> +        .quad 0xc08623dc103fccb0, 0xbe1cee61fe072c98
> +        .quad 0xc08623dfbafe1668, 0xbe1cf022dd891e2f
> +        .quad 0xc08623e3640eda20, 0xbe1ceecc1daf4358
> +        .quad 0xc08623e70b73a028, 0xbe1cf0173c4fa380
> +        .quad 0xc08623eab12deec8, 0xbe1cf16a2150c2f4
> +        .quad 0xc08623ee553f4a30, 0xbe1cf1bf980b1f4b
> +        .quad 0xc08623f1f7a93480, 0xbe1cef8b731663c2
> +        .quad 0xc08623f5986d2dc0, 0xbe1cee9a664d7ef4
> +        .quad 0xc08623f9378cb3f0, 0xbe1cf1eda2af6400
> +        .quad 0xc08623fcd5094320, 0xbe1cf1923f9d68d7
> +        .quad 0xc086240070e45548, 0xbe1cf0747cd3e03a
> +        .quad 0xc08624040b1f6260, 0xbe1cf22ee855bd6d
> +        .quad 0xc0862407a3bbe078, 0xbe1cf0d57360c00b
> +        .quad 0xc086240b3abb4398, 0xbe1ceebc815cd575
> +        .quad 0xc086240ed01efdd0, 0xbe1cf03bfb970951
> +        .quad 0xc086241263e87f50, 0xbe1cf16e74768529
> +        .quad 0xc0862415f6193658, 0xbe1cefec64b8becb
> +        .quad 0xc086241986b28f30, 0xbe1cf0838d210baa
> +        .quad 0xc086241d15b5f448, 0xbe1cf0ea86e75b11
> +        .quad 0xc0862420a324ce28, 0xbe1cf1708d11d805
> +        .quad 0xc08624242f008380, 0xbe1ceea988c5a417
> +        .quad 0xc0862427b94a7910, 0xbe1cef166a7bbca5
> +        .quad 0xc086242b420411d0, 0xbe1cf0c9d9e86a38
> +        .quad 0xc086242ec92eaee8, 0xbe1cef0946455411
> +        .quad 0xc08624324ecbaf98, 0xbe1cefea60907739
> +        .quad 0xc0862435d2dc7160, 0xbe1cf1ed0934ce42
> +        .quad 0xc086243955624ff8, 0xbe1cf191ba746c7d
> +        .quad 0xc086243cd65ea548, 0xbe1ceeec78cf2a7e
> +        .quad 0xc086244055d2c968, 0xbe1cef345284c119
> +        .quad 0xc0862443d3c012b8, 0xbe1cf24f77355219
> +        .quad 0xc08624475027d5e8, 0xbe1cf05bf087e114
> +        .quad 0xc086244acb0b65d0, 0xbe1cef3504a32189
> +        .quad 0xc086244e446c1398, 0xbe1ceff54b2a406f
> +        .quad 0xc0862451bc4b2eb8, 0xbe1cf0757d54ed4f
> +        .quad 0xc086245532aa04f0, 0xbe1cf0c8099fdfd5
> +        .quad 0xc0862458a789e250, 0xbe1cf0b173796a31
> +        .quad 0xc086245c1aec1138, 0xbe1cf11d8734540d
> +        .quad 0xc086245f8cd1da60, 0xbe1cf1916a723ceb
> +        .quad 0xc0862462fd3c84d8, 0xbe1cf19a911e1da7
> +        .quad 0xc08624666c2d5608, 0xbe1cf23a9ef72e4f
> +        .quad 0xc0862469d9a591c0, 0xbe1cef503d947663
> +        .quad 0xc086246d45a67a18, 0xbe1cf0fceeb1a0b2
> +        .quad 0xc0862470b0314fa8, 0xbe1cf107e27e4fbc
> +        .quad 0xc086247419475160, 0xbe1cf03dd9922331
> +        .quad 0xc086247780e9bc98, 0xbe1cefce1a10e129
> +        .quad 0xc086247ae719cd18, 0xbe1ceea47f73c4f6
> +        .quad 0xc086247e4bd8bd10, 0xbe1ceec0ac56d100
> +        .quad 0xc0862481af27c528, 0xbe1cee8a6593278a
> +        .quad 0xc086248511081c70, 0xbe1cf2231dd9dec7
> +        .quad 0xc0862488717af888, 0xbe1cf0b4b8ed7da8
> +        .quad 0xc086248bd0818d68, 0xbe1cf1bd8d835002
> +        .quad 0xc086248f2e1d0d98, 0xbe1cf259acc107f4
> +        .quad 0xc08624928a4eaa20, 0xbe1cee897636b00c
> +        .quad 0xc0862495e5179270, 0xbe1cee757f20c326
> +        .quad 0xc08624993e78f490, 0xbe1cefafd3aa54a4
> +        .quad 0xc086249c9673fd10, 0xbe1cee7298d38b97
> +        .quad 0xc086249fed09d6f8, 0xbe1ceedc158d4ceb
> +        .quad 0xc08624a3423babe0, 0xbe1cf2282987cb2e
> +        .quad 0xc08624a6960aa400, 0xbe1cefe7381ecc4b
> +        .quad 0xc08624a9e877e600, 0xbe1cef328dbbce80
> +        .quad 0xc08624ad39849728, 0xbe1cefde45f3cc71
> +        .quad 0xc08624b08931db58, 0xbe1cefa8b89433b9
> +        .quad 0xc08624b3d780d500, 0xbe1cef6773c0b139
> +        .quad 0xc08624b72472a528, 0xbe1cf031c931c11f
> +        .quad 0xc08624ba70086b78, 0xbe1cf088f49275e7
> +        .quad 0xc08624bdba434630, 0xbe1cf17de0eaa86d
> +        .quad 0xc08624c103245238, 0xbe1cefd492f1ba75
> +        .quad 0xc08624c44aacab08, 0xbe1cf1253e154466
> +        .quad 0xc08624c790dd6ad0, 0xbe1cf0fb09ee6d55
> +        .quad 0xc08624cad5b7aa58, 0xbe1cf1f08dd048fe
> +        .quad 0xc08624ce193c8120, 0xbe1ceeca0809697f
> +        .quad 0xc08624d15b6d0538, 0xbe1cef8d5662d968
> +        .quad 0xc08624d49c4a4b78, 0xbe1cee97b556ed78
> +        .quad 0xc08624d7dbd56750, 0xbe1cf1b14b6acb75
> +        .quad 0xc08624db1a0f6b00, 0xbe1cef1e860623f2
> +        .quad 0xc08624de56f96758, 0xbe1ceeaf4d156f3d
> +        .quad 0xc08624e192946bf0, 0xbe1ceecc12b400ed
> +        .quad 0xc08624e4cce18710, 0xbe1cf180c40c794f
> +        .quad 0xc08624e805e1c5c8, 0xbe1cf185a08f7f65
> +        .quad 0xc08624eb3d9633d8, 0xbe1cef45fc924078
> +        .quad 0xc08624ee73ffdbb0, 0xbe1cf1e4f457f32a
> +        .quad 0xc08624f1a91fc6a0, 0xbe1cf040147b8a5a
> +        .quad 0xc08624f4dcf6fc98, 0xbe1cf1effca0dfb2
> +        .quad 0xc08624f80f868468, 0xbe1cf0470146e5bc
> +        .quad 0xc08624fb40cf6390, 0xbe1cef4dd186e501
> +        .quad 0xc08624fe70d29e60, 0xbe1ceebe257f66c7
> +        .quad 0xc08625019f9137f0, 0xbe1ceefb7a1c395c
> +        .quad 0xc0862504cd0c3220, 0xbe1cf209dedfed8c
> +        .quad 0xc0862507f9448db0, 0xbe1cf082da464994
> +        .quad 0xc086250b243b4a18, 0xbe1cee88694a73cf
> +        .quad 0xc086250e4df165a0, 0xbe1cf0b61e8f0531
> +        .quad 0xc08625117667dd78, 0xbe1cf1106599c962
> +        .quad 0xc08625149d9fad98, 0xbe1ceff1ee88af1f
> +        .quad 0xc0862517c399d0c8, 0xbe1cf0f746994ef6
> +        .quad 0xc086251ae85740b8, 0xbe1cefe8a1d077e4
> +        .quad 0xc086251e0bd8f5e0, 0xbe1cf1a1da036092
> +        .quad 0xc08625212e1fe7a8, 0xbe1cf0f8a7786fcd
> +        .quad 0xc08625244f2d0c48, 0xbe1cefa1174a07a7
> +        .quad 0xc08625276f0158d8, 0xbe1cef1043aa5b25
> +        .quad 0xc086252a8d9dc150, 0xbe1cf15d521c169d
> +        .quad 0xc086252dab033898, 0xbe1cf220bba8861f
> +        .quad 0xc0862530c732b078, 0xbe1cef51e310eae2
> +        .quad 0xc0862533e22d1988, 0xbe1cf222fcedd8ae
> +        .quad 0xc0862536fbf36370, 0xbe1cefdb4da4bda8
> +        .quad 0xc086253a14867ca0, 0xbe1ceeafc1112171
> +        .quad 0xc086253d2be75280, 0xbe1cee99dfb4b408
> +        .quad 0xc08625404216d160, 0xbe1cf22d2536f06b
> +        .quad 0xc08625435715e498, 0xbe1cef6abbf2e268
> +        .quad 0xc08625466ae57648, 0xbe1cf093a14789f5
> +        .quad 0xc08625497d866fa0, 0xbe1cf0f93655603c
> +        .quad 0xc086254c8ef9b8b8, 0xbe1cf1cc40c9aafc
> +        .quad 0xc086254f9f4038a8, 0xbe1ceeea5f4e9157
> +        .quad 0xc0862552ae5ad568, 0xbe1cefa9f52d4997
> +        .quad 0xc0862555bc4a7400, 0xbe1cefa490a638ff
> +        .quad 0xc0862558c90ff868, 0xbe1cef7fcf797d6f
> +        .quad 0xc086255bd4ac4590, 0xbe1cf1b4c51113c9
> +        .quad 0xc086255edf203d78, 0xbe1cef55e5b4a55d
> +        .quad 0xc0862561e86cc100, 0xbe1cf0d37a25f9dc
> +        .quad 0xc0862564f092b028, 0xbe1ceebe9efc19d9
> +        .quad 0xc0862567f792e9d8, 0xbe1cee8ad30a57b5
> +        .quad 0xc086256afd6e4c08, 0xbe1cef4e1817b90b
> +        .quad 0xc086256e0225b3b8, 0xbe1cee7fa9229996
> +        .quad 0xc086257105b9fce0, 0xbe1cf0b54963d945
> +        .quad 0xc0862574082c0298, 0xbe1cee5f2f3c7995
> +        .quad 0xc0862577097c9ee0, 0xbe1cf0828e303a2c
> +        .quad 0xc086257a09acaae0, 0xbe1cf172c3078947
> +        .quad 0xc086257d08bcfec0, 0xbe1cf189252afa22
> +        .quad 0xc086258006ae71b8, 0xbe1cefdb80426923
> +        .quad 0xc08625830381da08, 0xbe1ceef1391a0372
> +        .quad 0xc0862585ff380d00, 0xbe1cf17720c78d13
> +        .quad 0xc0862588f9d1df18, 0xbe1ceef1f9027d83
> +        .quad 0xc086258bf35023b8, 0xbe1cf06fac99dec9
> +        .quad 0xc086258eebb3ad78, 0xbe1cf1373eeb45c0
> +        .quad 0xc0862591e2fd4e00, 0xbe1cef777536bb81
> +        .quad 0xc0862594d92dd600, 0xbe1cf0f43ca40766
> +        .quad 0xc0862597ce461558, 0xbe1cefb2cfc6766b
> +        .quad 0xc086259ac246daf0, 0xbe1ceea49e64ffa2
> +        .quad 0xc086259db530f4c8, 0xbe1cf250fa457dec
> +        .quad 0xc08625a0a7053018, 0xbe1cf17d8bb2a44e
> +        .quad 0xc08625a397c45918, 0xbe1cf1d5906d54b7
> +        .quad 0xc08625a6876f3b30, 0xbe1cf08fe7b31780
> +        .quad 0xc08625a97606a0e0, 0xbe1cef13edfc9d11
> +        .quad 0xc08625ac638b53c8, 0xbe1cef9d2b107219
> +        .quad 0xc08625af4ffe1cb0, 0xbe1cf1ddd4ff6160
> +        .quad 0xc08625b23b5fc390, 0xbe1cefa02a996495
> +        .quad 0xc08625b525b10f68, 0xbe1cf166a7e37ee5
> +        .quad 0xc08625b80ef2c680, 0xbe1cef0b171068a5
> +        .quad 0xc08625baf725ae28, 0xbe1cf05c80779283
> +        .quad 0xc08625bdde4a8af0, 0xbe1cf1bbfbffb889
> +        .quad 0xc08625c0c4622090, 0xbe1cf0b8666c0124
> +        .quad 0xc08625c3a96d31e0, 0xbe1cf0a8fcf47a86
> +        .quad 0xc08625c68d6c80f0, 0xbe1cef46e18cb092
> +        .quad 0xc08625c97060cef0, 0xbe1cf1458a350efb
> +        .quad 0xc08625cc524adc58, 0xbe1ceeea1dadce12
> +        .quad 0xc08625cf332b68b0, 0xbe1cf0a1bfdc44c7
> +        .quad 0xc08625d2130332d0, 0xbe1cef96d02da73e
> +        .quad 0xc08625d4f1d2f8a8, 0xbe1cf2451c3c7701
> +        .quad 0xc08625d7cf9b7778, 0xbe1cf10d08f83812
> +        .quad 0xc08625daac5d6ba0, 0xbe1ceec5b4895c5e
> +        .quad 0xc08625dd881990b0, 0xbe1cf14e1325c5e4
> +        .quad 0xc08625e062d0a188, 0xbe1cf21d0904be12
> +        .quad 0xc08625e33c835838, 0xbe1ceed0839bcf21
> +        .quad 0xc08625e615326df0, 0xbe1cf1bb944889d2
> +        .quad 0xc08625e8ecde9b48, 0xbe1cee738e85eece
> +        .quad 0xc08625ebc38897e0, 0xbe1cf25c2bc6ef12
> +        .quad 0xc08625ee99311ac8, 0xbe1cf132b70a41ad
> +        .quad 0xc08625f16dd8da28, 0xbe1cf1984236a6e3
> +        .quad 0xc08625f441808b78, 0xbe1cf19ae74998f9
> +        .quad 0xc08625f71428e370, 0xbe1cef3e175d61a1
> +        .quad 0xc08625f9e5d295f8, 0xbe1cf101f9868fd9
> +        .quad 0xc08625fcb67e5658, 0xbe1cee69db83dcd2
> +        .quad 0xc08625ff862cd6f8, 0xbe1cf081b636af51
> +        .quad 0xc086260254dec9a8, 0xbe1cee62c7d59b3e
> +        .quad 0xc08626052294df58, 0xbe1cf1b745c57716
> +        .quad 0xc0862607ef4fc868, 0xbe1cef3d2800ea23
> +        .quad 0xc086260abb103458, 0xbe1cef480ff1acd2
> +        .quad 0xc086260d85d6d200, 0xbe1cf2424c9a17ef
> +        .quad 0xc08626104fa44f90, 0xbe1cf12cfde90fd5
> +        .quad 0xc086261318795a68, 0xbe1cf21f590dd5b6
> +        .quad 0xc0862615e0569f48, 0xbe1cf0c50f9cd28a
> +        .quad 0xc0862618a73cca30, 0xbe1ceedbdb520545
> +        .quad 0xc086261b6d2c8668, 0xbe1cf0b030396011
> +        .quad 0xc086261e32267e98, 0xbe1cf19917010e96
> +        .quad 0xc0862620f62b5cb0, 0xbe1cf07331355985
> +        .quad 0xc0862623b93bc9e8, 0xbe1cf01ae921a1c3
> +        .quad 0xc08626267b586ed0, 0xbe1cefe5cf0dbf0c
> +        .quad 0xc08626293c81f348, 0xbe1cf01b258aeb50
> +        .quad 0xc086262bfcb8fe88, 0xbe1cee6b9e7f4c68
> +        .quad 0xc086262ebbfe3710, 0xbe1cee684a9b21c9
> +        .quad 0xc08626317a5242b8, 0xbe1cf1f8bcde9a8b
> +        .quad 0xc086263437b5c6c0, 0xbe1cf1d063d36238
> +        .quad 0xc0862636f42967a8, 0xbe1cf1e31a19075e
> +        .quad 0xc0862639afadc950, 0xbe1cf1d8efdf7e7d
> +        .quad 0xc086263c6a438ef0, 0xbe1cf1812ee72dba
> +        .quad 0xc086263f23eb5b18, 0xbe1cf1449a9a2279
> +        .quad 0xc0862641dca5cfb8, 0xbe1cee96edce5085
> +        .quad 0xc086264494738e08, 0xbe1cf06797bd03b2
> +        .quad 0xc08626474b5536b8, 0xbe1cef91b9b7ffc1
> +        .quad 0xc086264a014b69c0, 0xbe1cef4b6721278f
> +        .quad 0xc086264cb656c678, 0xbe1cf1942925eb4a
> +        .quad 0xc086264f6a77eba8, 0xbe1cefa2c7bc2e39
> +        .quad 0xc08626521daf7758, 0xbe1cf252595aceb3
> +        .quad 0xc0862654cffe0718, 0xbe1cee8e9ae47ec2
> +        .quad 0xc0862657816437a8, 0xbe1cf1bf913828fa
> +        .quad 0xc086265a31e2a558, 0xbe1cf23475d6b366
> +        .quad 0xc086265ce179ebc8, 0xbe1cef8df00a922b
> +        .quad 0xc086265f902aa5f0, 0xbe1cef279bfa43e0
> +        .quad 0xc08626623df56e38, 0xbe1cf080e10b8365
> +        .quad 0xc0862664eadade70, 0xbe1cf1a518f9b544
> +        .quad 0xc086266796db8fd0, 0xbe1cef9308fed9e9
> +        .quad 0xc086266a41f81ae8, 0xbe1ceea3ae6b19c9
> +        .quad 0xc086266cec3117b8, 0xbe1ceef06003d4c2
> +        .quad 0xc086266f95871da8, 0xbe1cf0b8457ffb0c
> +        .quad 0xc08626723dfac390, 0xbe1cf0c526745ad6
> +        .quad 0xc0862674e58c9fa8, 0xbe1cf0cf91ff7b5d
> +        .quad 0xc08626778c3d4798, 0xbe1cefe260819380
> +        .quad 0xc086267a320d5070, 0xbe1ceebd90aa27a3
> +        .quad 0xc086267cd6fd4ea8, 0xbe1cf0388121dffa
> +        .quad 0xc086267f7b0dd630, 0xbe1cf1a3881435f1
> +        .quad 0xc08626821e3f7a68, 0xbe1cef28e9d9ac52
> +        .quad 0xc0862684c092ce08, 0xbe1cf02d300062dd
> +        .quad 0xc086268762086350, 0xbe1cefaee1edfa35
> +        .quad 0xc086268a02a0cbe0, 0xbe1cf0a5a052e936
> +        .quad 0xc086268ca25c98d8, 0xbe1cee60a4a497ed
> +        .quad 0xc086268f413c5ab0, 0xbe1cf0e4a5d0cf49
> +        .quad 0xc0862691df40a170, 0xbe1cf149235a4e6e
> +        .quad 0xc08626947c69fc80, 0xbe1cf215180b9fcc
> +        .quad 0xc086269718b8fac8, 0xbe1cef9b156a9840
> +        .quad 0xc0862699b42e2a90, 0xbe1cf054c91441be
> +        .quad 0xc086269c4eca19a8, 0xbe1cf13ded26512c
> +        .quad 0xc086269ee88d5550, 0xbe1cf22ea4d8ac06
> +        .quad 0xc08626a181786a40, 0xbe1cf2354666ee2e
> +        .quad 0xc08626a4198be4a8, 0xbe1cefef936752b3
> +        .quad 0xc08626a6b0c85020, 0xbe1cf1e360a9db68
> +        .quad 0xc08626a9472e37d8, 0xbe1ceed6aeb812c5
> +        .quad 0xc08626abdcbe2650, 0xbe1cf227340b4986
> +        .quad 0xc08626ae7178a5b0, 0xbe1cf0215a0cbe0d
> +        .quad 0xc08626b1055e3f70, 0xbe1cf256adf0ae26
> +        .quad 0xc08626b3986f7ca8, 0xbe1ceff3c67aed06
> +        .quad 0xc08626b62aace5c8, 0xbe1cf2159fb93652
> +        .quad 0xc08626b8bc1702e0, 0xbe1cf01e6dbd1c7f
> +        .quad 0xc08626bb4cae5b60, 0xbe1cf009e75d1c0c
> +        .quad 0xc08626bddc737648, 0xbe1ceec10a020e73
> +        .quad 0xc08626c06b66da08, 0xbe1cf06d5783eee7
> +        .quad 0xc08626c2f9890ca0, 0xbe1cf0cb8f169ffe
> +        .quad 0xc08626c586da9388, 0xbe1cef7de2452430
> +        .quad 0xc08626c8135bf3b0, 0xbe1cf05da6f783ae
> +        .quad 0xc08626ca9f0db198, 0xbe1cefcc877d681d
> +        .quad 0xc08626cd29f05138, 0xbe1cef0531954ab3
> +        .quad 0xc08626cfb4045608, 0xbe1cf06b8565ea3d
> +        .quad 0xc08626d23d4a4310, 0xbe1cefdc455d9d7e
> +        .quad 0xc08626d4c5c29ad0, 0xbe1ceefc47e8fa64
> +        .quad 0xc08626d74d6ddf48, 0xbe1cf1872bf033f2
> +        .quad 0xc08626d9d44c9210, 0xbe1cf19d91087f9d
> +        .quad 0xc08626dc5a5f3438, 0xbe1cf012d444c6ab
> +        .quad 0xc08626dedfa64650, 0xbe1cf0ba528ee153
> +        .quad 0xc08626e164224880, 0xbe1ceeb431709788
> +        .quad 0xc08626e3e7d3ba60, 0xbe1cf0b9af31a6a5
> +        .quad 0xc08626e66abb1b28, 0xbe1cf168fb2e135b
> +        .quad 0xc08626e8ecd8e990, 0xbe1cef9097461c93
> +        .quad 0xc08626eb6e2da3d0, 0xbe1cee7a434735d8
> +        .quad 0xc08626edeeb9c7a8, 0xbe1cf235732b86f2
> +        .quad 0xc08626f06e7dd280, 0xbe1cefe1510b89e6
> +        .quad 0xc08626f2ed7a4120, 0xbe1cf1f64b9b80ef
> +        .quad 0xc08626f56baf9000, 0xbe1cf08f320ca339
> +        .quad 0xc08626f7e91e3b08, 0xbe1cf1b1de2808a1
> +        .quad 0xc08626fa65c6bdc0, 0xbe1cf1976d778b28
> +        .quad 0xc08626fce1a99338, 0xbe1ceef40a4f076f
> +        .quad 0xc08626ff5cc73600, 0xbe1cef3e45869ce3
> +        .quad 0xc0862701d7202048, 0xbe1ceef601b4c9d6
> +        .quad 0xc086270450b4cbc0, 0xbe1cf1eaf0b57fd6
> +        .quad 0xc0862706c985b1c0, 0xbe1cef82a44990f3
> +        .quad 0xc086270941934b10, 0xbe1ceefe32981f2c
> +        .quad 0xc086270bb8de1018, 0xbe1cefbf6f5a0445
> +        .quad 0xc086270e2f6678d0, 0xbe1cf18dba75792c
> +        .quad 0xc0862710a52cfcc8, 0xbe1cf0da64ce995f
> +        .quad 0xc08627131a321318, 0xbe1cef04ac0fb802
> +        .quad 0xc08627158e763268, 0xbe1cee9d4e2ad9bd
> +        .quad 0xc086271801f9d0f8, 0xbe1cefa9b55407b5
> +        .quad 0xc086271a74bd64a0, 0xbe1cefe6bd329570
> +        .quad 0xc086271ce6c162c8, 0xbe1cef0b1205dc85
> +        .quad 0xc086271f58064068, 0xbe1cef092a785e3f
> +        .quad 0xc0862721c88c7210, 0xbe1cf050dcdaac30
> +        .quad 0xc086272438546be8, 0xbe1cf210907ded8b
> +        .quad 0xc0862726a75ea1b8, 0xbe1cee760be44f99
> +        .quad 0xc086272915ab86c0, 0xbe1ceeeee07c2bcc
> +        .quad 0xc086272b833b8df0, 0xbe1cf06874992df5
> +        .quad 0xc086272df00f29d0, 0xbe1cef8fac5d4899
> +        .quad 0xc08627305c26cc70, 0xbe1cf1103241cc99
> +        .quad 0xc0862732c782e788, 0xbe1cf1d35fef83fe
> +        .quad 0xc08627353223ec68, 0xbe1cef3ec8133e1d
> +        .quad 0xc08627379c0a4be8, 0xbe1cef7261daccd8
> +        .quad 0xc086273a05367688, 0xbe1cf18656c50806
> +        .quad 0xc086273c6da8dc68, 0xbe1cf1c8736e049a
> +        .quad 0xc086273ed561ed38, 0xbe1cf1f93bff4911
> +        .quad 0xc08627413c621848, 0xbe1cf188a4ea680c
> +        .quad 0xc0862743a2a9cc80, 0xbe1cf1d270930c80
> +        .quad 0xc086274608397868, 0xbe1cf25a328c28e2
> +        .quad 0xc08627486d118a28, 0xbe1cf106f90aa3b8
> +        .quad 0xc086274ad1326f80, 0xbe1cee5e9d2e885a
> +        .quad 0xc086274d349c95c0, 0xbe1cf1c0bac27228
> +        .quad 0xc086274f975069f8, 0xbe1cf1a1500f9b1c
> +        .quad 0xc0862751f94e58c0, 0xbe1cefc30663ac44
> +        .quad 0xc08627545a96ce48, 0xbe1cf17123e427a2
> +        .quad 0xc0862756bb2a3678, 0xbe1cefb92749fea4
> +        .quad 0xc08627591b08fcc0, 0xbe1cefa40e1ea74a
> +        .quad 0xc086275b7a338c40, 0xbe1cee6f4612c3e9
> +        .quad 0xc086275dd8aa4fa8, 0xbe1cf1c54a053627
> +        .quad 0xc0862760366db168, 0xbe1ceff5eb503d9e
> +        .quad 0xc0862762937e1b70, 0xbe1cf02e47f10cee
> +        .quad 0xc0862764efdbf768, 0xbe1ceeb06e1d0dad
> +        .quad 0xc08627674b87ae88, 0xbe1cf10aadd6dba5
> +        .quad 0xc0862769a681a9c0, 0xbe1cf24e9913d30f
> +        .quad 0xc086276c00ca51a0, 0xbe1cef47b301e312
> +        .quad 0xc086276e5a620e48, 0xbe1ceeb1cefc2e85
> +        .quad 0xc0862770b3494788, 0xbe1cf16f1fbbe011
> +        .quad 0xc08627730b8064e8, 0xbe1ceebdf75174c7
> +        .quad 0xc08627756307cd70, 0xbe1cf06e3871a0da
> +        .quad 0xc0862777b9dfe7f0, 0xbe1cef16799fd554
> +        .quad 0xc086277a10091ac0, 0xbe1cf248dabf5377
> +        .quad 0xc086277c6583cc00, 0xbe1cf0c78d92a2cd
> +        .quad 0xc086277eba506158, 0xbe1cf0b911b029f0
> +        .quad 0xc08627810e6f4028, 0xbe1cefdc24719766
> +        .quad 0xc086278361e0cd70, 0xbe1cefbb6562b7e7
> +        .quad 0xc0862785b4a56dd8, 0xbe1cf1e0afb349ec
> +        .quad 0xc086278806bd85c0, 0xbe1cf008292e52fc
> +        .quad 0xc086278a58297918, 0xbe1cf053073872bf
> +        .quad 0xc086278ca8e9ab88, 0xbe1cf17a0a55a947
> +        .quad 0xc086278ef8fe8068, 0xbe1ceeffb0b60234
> +        .quad 0xc086279148685aa0, 0xbe1cf162204794a8
> +        .quad 0xc086279397279ce0, 0xbe1cf24cc8cb48ac
> +        .quad 0xc0862795e53ca978, 0xbe1cf0c9be68d5c3
> +        .quad 0xc086279832a7e258, 0xbe1cf172cd3d7388
> +        .quad 0xc086279a7f69a930, 0xbe1ceea2465fbce5
> +        .quad 0xc086279ccb825f40, 0xbe1cf0a386d2500f
> +        .quad 0xc086279f16f26590, 0xbe1cf1e338ddc18a
> +        .quad 0xc08627a161ba1cd0, 0xbe1cef1f5049867f
> +        .quad 0xc08627a3abd9e548, 0xbe1cef96c1ea8b1f
> +        .quad 0xc08627a5f5521f00, 0xbe1cf138f6fd3c26
> +        .quad 0xc08627a83e2329b0, 0xbe1cf0d4fcbfdf3a
> +        .quad 0xc08627aa864d64b0, 0xbe1cf24870c12c81
> +        .quad 0xc08627accdd12f18, 0xbe1cf0ae2a56348d
> +        .quad 0xc08627af14aee7a0, 0xbe1cee8ca1a9b893
> +        .quad 0xc08627b15ae6eca8, 0xbe1cf20414d637b0
> +        .quad 0xc08627b3a0799c60, 0xbe1cf0fc6b7b12d8
> +        .quad 0xc08627b5e5675488, 0xbe1cf152d93c4a00
> +        .quad 0xc08627b829b072a0, 0xbe1cf1073f9b77c2
> +        .quad 0xc08627ba6d5553d8, 0xbe1cee694f97d5a4
> +        .quad 0xc08627bcb0565500, 0xbe1cf0456b8239d7
> +        .quad 0xc08627bef2b3d2b0, 0xbe1cf211497127e3
> +        .quad 0xc08627c1346e2930, 0xbe1cf01856c0384d
> +        .quad 0xc08627c37585b468, 0xbe1cefa7dd05479e
> +        .quad 0xc08627c5b5fad000, 0xbe1cef3ae8e50b93
> +        .quad 0xc08627c7f5cdd750, 0xbe1ceea5f32fdd3a
> +        .quad 0xc08627ca34ff2560, 0xbe1cef424caeb8d9
> +        .quad 0xc08627cc738f14f0, 0xbe1cf0194d07a81f
> +        .quad 0xc08627ceb17e0070, 0xbe1cf20f452000c1
> +        .quad 0xc08627d0eecc4210, 0xbe1cf00e356218e4
> +        .quad 0xc08627d32b7a33a0, 0xbe1cef30484b4bcb
> +        .quad 0xc08627d567882eb0, 0xbe1ceeea11a6641b
> +        .quad 0xc08627d7a2f68c80, 0xbe1cf13492d5bd7b
> +        .quad 0xc08627d9ddc5a618, 0xbe1ceeb7048fad96
> +        .quad 0xc08627dc17f5d418, 0xbe1ceef0666f0477
> +        .quad 0xc08627de51876ee8, 0xbe1cf060d4b8b5c2
> +        .quad 0xc08627e08a7acea8, 0xbe1cf0b2a4b6ff8c
> +        .quad 0xc08627e2c2d04b28, 0xbe1cf0e34809a875
> +        .quad 0xc08627e4fa883bf0, 0xbe1cf16bf74a3522
> +        .quad 0xc08627e731a2f848, 0xbe1cee6a24623d57
> +        .quad 0xc08627e96820d718, 0xbe1cefc7b4f1528e
> +        .quad 0xc08627eb9e022f18, 0xbe1cf163051f3548
> +        .quad 0xc08627edd34756b8, 0xbe1cef36b3366305
> +        .quad 0xc08627f007f0a408, 0xbe1cf18134625550
> +        .quad 0xc08627f23bfe6cf0, 0xbe1cf0ec32ec1a11
> +        .quad 0xc08627f46f710700, 0xbe1ceeb3b64f3edc
> +        .quad 0xc08627f6a248c778, 0xbe1cf0cd15805bc8
> +        .quad 0xc08627f8d4860368, 0xbe1cf20db3bddebe
> +        .quad 0xc08627fb06290f90, 0xbe1cf25188430e25
> +        .quad 0xc08627fd37324070, 0xbe1ceea1713490f9
> +        .quad 0xc08627ff67a1ea28, 0xbe1cf159521d234c
> +        .quad 0xc0862801977860b8, 0xbe1cf24dfe50783b
> +        .quad 0xc0862803c6b5f7d0, 0xbe1ceef2ef89a60b
> +        .quad 0xc0862805f55b02c8, 0xbe1cee7fc919d62c
> +        .quad 0xc08628082367d4c0, 0xbe1cf215a7fb513a
> +        .quad 0xc086280a50dcc0a8, 0xbe1cf0e4401c5ed4
> +        .quad 0xc086280c7dba1910, 0xbe1cf04ec734d256
> +        .quad 0xc086280eaa003050, 0xbe1cf010ad787fea
> +        .quad 0xc0862810d5af5880, 0xbe1cee622478393d
> +        .quad 0xc086281300c7e368, 0xbe1cf01c7482564f
> +        .quad 0xc08628152b4a22a0, 0xbe1cf0de20d33536
> +        .quad 0xc086281755366778, 0xbe1cef2edae5837d
> +        .quad 0xc08628197e8d02f0, 0xbe1cf0a345318cc9
> +        .quad 0xc086281ba74e45d8, 0xbe1cf20085aa34b8
> +        .quad 0xc086281dcf7a80c0, 0xbe1cef5fa845ad83
> +        .quad 0xc086281ff71203e0, 0xbe1cf050d1df69c4
> +        .quad 0xc08628221e151f48, 0xbe1ceffe43c035b9
> +        .quad 0xc0862824448422b8, 0xbe1cf14f3018d3c2
> +        .quad 0xc08628266a5f5dc0, 0xbe1cef0a5fbae83d
> +        .quad 0xc08628288fa71f98, 0xbe1ceff8a95b72a1
> +        .quad 0xc086282ab45bb750, 0xbe1cef073aa9849b
> +        .quad 0xc086282cd87d73a8, 0xbe1cef69b3835c02
> +        .quad 0xc086282efc0ca328, 0xbe1cf0bc139379a9
> +        .quad 0xc08628311f099420, 0xbe1cef247a9ec596
> +        .quad 0xc086283341749490, 0xbe1cef74bbcc488a
> +        .quad 0xc0862835634df248, 0xbe1cef4bc42e7b8e
> +        .quad 0xc08628378495fad0, 0xbe1cf136d4d5a810
> +        .quad 0xc0862839a54cfb80, 0xbe1cf0d290b24dd8
> +        .quad 0xc086283bc5734168, 0xbe1ceeebde8e0065
> +        .quad 0xc086283de5091950, 0xbe1cf1a09f60aa1e
> +        .quad 0xc0862840040ecfe0, 0xbe1cf0803947a234
> +        .quad 0xc08628422284b168, 0xbe1cf0abf7638127
> +        .quad 0xc0862844406b0a08, 0xbe1cf0f73ee12058
> +        .quad 0xc08628465dc225a0, 0xbe1cf2079971b26c
> +        .quad 0xc08628487a8a4fe0, 0xbe1cee74957564b1
> +        .quad 0xc086284a96c3d420, 0xbe1ceee77c1b7d43
> +        .quad 0xc086284cb26efd90, 0xbe1cf23addba6e09
> +        .quad 0xc086284ecd8c1730, 0xbe1cf199f4a1da60
> +        .quad 0xc0862850e81b6bb0, 0xbe1cf09fdea81393
> +        .quad 0xc0862853021d4588, 0xbe1cf176adb417f7
> +        .quad 0xc08628551b91ef00, 0xbe1cf0f64f84a8da
> +        .quad 0xc08628573479b220, 0xbe1ceec34cf49523
> +        .quad 0xc08628594cd4d8a8, 0xbe1cf16d60fbe0bb
> +        .quad 0xc086285b64a3ac40, 0xbe1cee8de7acfc7b
> +        .quad 0xc086285d7be67630, 0xbe1ceee6256cce8d
> +        .quad 0xc086285f929d7fa0, 0xbe1cee7d66a3d8a5
> +        .quad 0xc0862861a8c91170, 0xbe1cf0bef8265792
> +        .quad 0xc0862863be697458, 0xbe1cf097f890c6f8
> +        .quad 0xc0862865d37ef0c8, 0xbe1cf09502d5c3fc
> +        .quad 0xc0862867e809cf00, 0xbe1ceeffb239dac7
> +        .quad 0xc0862869fc0a56f8, 0xbe1cf1fbfff95c98
> +        .quad 0xc086286c0f80d090, 0xbe1cefa57ad3eef7
> +        .quad 0xc086286e226d8348, 0xbe1cf22c58b9183d
> +        .quad 0xc086287034d0b690, 0xbe1ceff262d0a248
> +        .quad 0xc086287246aab180, 0xbe1cefa7bc194186
> +        .quad 0xc086287457fbbb08, 0xbe1cf06782d784d9
> +        .quad 0xc086287668c419e0, 0xbe1cf1d44d0eaa07
> +        .quad 0xc086287879041490, 0xbe1cf034803c8a48
> +        .quad 0xc086287a88bbf158, 0xbe1cf08e84916b6f
> +        .quad 0xc086287c97ebf650, 0xbe1cf0c4d3dc1bc7
> +        .quad 0xc086287ea6946958, 0xbe1cefb1e4625943
> +        .quad 0xc0862880b4b59010, 0xbe1cf143efdd1fd0
> +        .quad 0xc0862882c24faff8, 0xbe1cee9896d016da
> +        .quad 0xc0862884cf630e38, 0xbe1cf2186072f2cc
> +        .quad 0xc0862886dbefeff0, 0xbe1cef9217633d34
> +        .quad 0xc0862888e7f699e0, 0xbe1cf05603549486
> +        .quad 0xc086288af37750b0, 0xbe1cef50fff513d3
> +        .quad 0xc086288cfe7258c0, 0xbe1cf127713b32d0
> +        .quad 0xc086288f08e7f650, 0xbe1cf05015520f3d
> +        .quad 0xc086289112d86d58, 0xbe1cf12eb458b26f
> +        .quad 0xc08628931c4401a8, 0xbe1cf22eae2887ed
> +        .quad 0xc0862895252af6e0, 0xbe1cefdd6656dd2d
> +        .quad 0xc08628972d8d9058, 0xbe1cf1048ea4e646
> +        .quad 0xc0862899356c1150, 0xbe1ceec4501167e9
> +        .quad 0xc086289b3cc6bcb8, 0xbe1cf0ad52becc3f
> +        .quad 0xc086289d439dd568, 0xbe1cf0daa4e00e35
> +        .quad 0xc086289f49f19df8, 0xbe1cf00b80de8d6a
> +        .quad 0xc08628a14fc258c8, 0xbe1cf1bcf2ea8464
> +        .quad 0xc08628a355104818, 0xbe1cf0435e2782b0
> +        .quad 0xc08628a559dbade0, 0xbe1cf0e3e1a5f56c
> +        .quad 0xc08628a75e24cbf8, 0xbe1cefed9d5a721d
> +        .quad 0xc08628a961ebe3f8, 0xbe1cf0d2d74321e2
> +        .quad 0xc08628ab65313750, 0xbe1cf24200eb55e9
> +        .quad 0xc08628ad67f50740, 0xbe1cf23e9d7cf979
> +        .quad 0xc08628af6a3794d0, 0xbe1cf23a088f421c
> +        .quad 0xc08628b16bf920e0, 0xbe1cef2c1de1ab32
> +        .quad 0xc08628b36d39ec08, 0xbe1cf1abc231f7b2
> +        .quad 0xc08628b56dfa36d0, 0xbe1cf2074d5ba303
> +        .quad 0xc08628b76e3a4180, 0xbe1cf05cd5eed880
> +        /*== Log_LA_table ==*/
> +        .align 32
> +        .quad 0x8000000000000000
> +        .quad 0xbf5ff802a9ab10e6
> +        .quad 0xbf6ff00aa2b10bc0
> +        .quad 0xbf77ee11ebd82e94
> +        .quad 0xbf7fe02a6b106789
> +        .quad 0xbf83e7295d25a7d9
> +        .quad 0xbf87dc475f810a77
> +        .quad 0xbf8bcf712c74384c
> +        .quad 0xbf8fc0a8b0fc03e4
> +        .quad 0xbf91d7f7eb9eebe7
> +        .quad 0xbf93cea44346a575
> +        .quad 0xbf95c45a51b8d389
> +        .quad 0xbf97b91b07d5b11b
> +        .quad 0xbf99ace7551cc514
> +        .quad 0xbf9b9fc027af9198
> +        .quad 0xbf9d91a66c543cc4
> +        .quad 0xbf9f829b0e783300
> +        .quad 0xbfa0b94f7c196176
> +        .quad 0xbfa1b0d98923d980
> +        .quad 0xbfa2a7ec2214e873
> +        .quad 0xbfa39e87b9febd60
> +        .quad 0xbfa494acc34d911c
> +        .quad 0xbfa58a5bafc8e4d5
> +        .quad 0xbfa67f94f094bd98
> +        .quad 0xbfa77458f632dcfc
> +        .quad 0xbfa868a83083f6cf
> +        .quad 0xbfa95c830ec8e3eb
> +        .quad 0xbfaa4fe9ffa3d235
> +        .quad 0xbfab42dd711971bf
> +        .quad 0xbfac355dd0921f2d
> +        .quad 0xbfad276b8adb0b52
> +        .quad 0xbfae19070c276016
> +        .quad 0xbfaf0a30c01162a6
> +        .quad 0xbfaffae9119b9303
> +        .quad 0xbfb075983598e471
> +        .quad 0xbfb0ed839b5526fe
> +        .quad 0xbfb16536eea37ae1
> +        .quad 0xbfb1dcb263db1944
> +        .quad 0xbfb253f62f0a1417
> +        .quad 0xbfb2cb0283f5de1f
> +        .quad 0xbfb341d7961bd1d1
> +        .quad 0xbfb3b87598b1b6ee
> +        .quad 0xbfb42edcbea646f0
> +        .quad 0xbfb4a50d3aa1b040
> +        .quad 0xbfb51b073f06183f
> +        .quad 0xbfb590cafdf01c28
> +        .quad 0xbfb60658a93750c4
> +        .quad 0xbfb67bb0726ec0fc
> +        .quad 0xbfb6f0d28ae56b4c
> +        .quad 0xbfb765bf23a6be13
> +        .quad 0xbfb7da766d7b12cd
> +        .quad 0xbfb84ef898e8282a
> +        .quad 0xbfb8c345d6319b21
> +        .quad 0xbfb9375e55595ede
> +        .quad 0xbfb9ab42462033ad
> +        .quad 0xbfba1ef1d8061cd4
> +        .quad 0xbfba926d3a4ad563
> +        .quad 0xbfbb05b49bee43fe
> +        .quad 0xbfbb78c82bb0eda1
> +        .quad 0xbfbbeba818146765
> +        .quad 0xbfbc5e548f5bc743
> +        .quad 0xbfbcd0cdbf8c13e1
> +        .quad 0xbfbd4313d66cb35d
> +        .quad 0xbfbdb5270187d927
> +        .quad 0xbfbe27076e2af2e6
> +        .quad 0xbfbe98b549671467
> +        .quad 0xbfbf0a30c01162a6
> +        .quad 0xbfbf7b79fec37ddf
> +        .quad 0xbfbfec9131dbeabb
> +        .quad 0xbfc02ebb42bf3d4b
> +        .quad 0xbfc0671512ca596e
> +        .quad 0xbfc09f561ee719c3
> +        .quad 0xbfc0d77e7cd08e59
> +        .quad 0xbfc10f8e422539b1
> +        .quad 0xbfc14785846742ac
> +        .quad 0xbfc17f6458fca611
> +        .quad 0xbfc1b72ad52f67a0
> +        .quad 0xbfc1eed90e2dc2c3
> +        .quad 0xbfc2266f190a5acb
> +        .quad 0xbfc25ded0abc6ad2
> +        .quad 0xbfc29552f81ff523
> +        .quad 0xbfc2cca0f5f5f251
> +        .quad 0xbfc303d718e47fd3
> +        .quad 0xbfc33af575770e4f
> +        .quad 0xbfc371fc201e8f74
> +        .quad 0xbfc3a8eb2d31a376
> +        .quad 0xbfc3dfc2b0ecc62a
> +        .quad 0xbfc41682bf727bc0
> +        .quad 0xbfc44d2b6ccb7d1e
> +        .quad 0xbfc483bccce6e3dd
> +        .quad 0xbfc4ba36f39a55e5
> +        .quad 0xbfc4f099f4a230b2
> +        .quad 0xbfc526e5e3a1b438
> +        .quad 0xbfc55d1ad4232d6f
> +        .quad 0xbfc59338d9982086
> +        .quad 0xbfc5c940075972b9
> +        .quad 0xbfc5ff3070a793d4
> +        .quad 0xbfc6350a28aaa758
> +        .quad 0xbfc66acd4272ad51
> +        .quad 0xbfc6a079d0f7aad2
> +        .quad 0xbfc6d60fe719d21d
> +        .quad 0xbfc70b8f97a1aa75
> +        .quad 0xbfc740f8f54037a5
> +        .quad 0xbfc7764c128f2127
> +        .quad 0xbfc7ab890210d909
> +        .quad 0xbfc7e0afd630c274
> +        .quad 0xbfc815c0a14357eb
> +        .quad 0xbfc84abb75865139
> +        .quad 0xbfc87fa06520c911
> +        .quad 0xbfc8b46f8223625b
> +        .quad 0xbfc8e928de886d41
> +        .quad 0xbfc91dcc8c340bde
> +        .quad 0xbfc9525a9cf456b4
> +        .quad 0xbfc986d3228180ca
> +        .quad 0xbfc9bb362e7dfb83
> +        .quad 0xbfc9ef83d2769a34
> +        .quad 0xbfca23bc1fe2b563
> +        .quad 0xbfca57df28244dcd
> +        .quad 0xbfca8becfc882f19
> +        .quad 0xbfcabfe5ae46124c
> +        .quad 0xbfcaf3c94e80bff3
> +        .quad 0xbfcb2797ee46320c
> +        .quad 0xbfcb5b519e8fb5a4
> +        .quad 0xbfcb8ef670420c3b
> +        .quad 0xbfcbc286742d8cd6
> +        .quad 0xbfcbf601bb0e44e2
> +        .quad 0xbfcc2968558c18c1
> +        .quad 0xbfcc5cba543ae425
> +        .quad 0xbfcc8ff7c79a9a22
> +        .quad 0xbfccc320c0176502
> +        .quad 0xbfccf6354e09c5dc
> +        .quad 0xbfcd293581b6b3e7
> +        .quad 0xbfcd5c216b4fbb91
> +        .quad 0xbfcd8ef91af31d5e
> +        .quad 0xbfcdc1bca0abec7d
> +        .quad 0xbfcdf46c0c722d2f
> +        .quad 0xbfce27076e2af2e6
> +        .quad 0xbfce598ed5a87e2f
> +        .quad 0xbfce8c0252aa5a60
> +        .quad 0xbfcebe61f4dd7b0b
> +        .quad 0xbfcef0adcbdc5936
> +        .quad 0xbfcf22e5e72f105d
> +        .quad 0xbfcf550a564b7b37
> +        .quad 0xbfcf871b28955045
> +        .quad 0xbfcfb9186d5e3e2b
> +        .quad 0xbfcfeb0233e607cc
> +        .quad 0xbfd00e6c45ad501d
> +        .quad 0xbfd0274dc16c232f
> +        .quad 0xbfd0402594b4d041
> +        .quad 0xbfd058f3c703ebc6
> +        .quad 0xbfd071b85fcd590d
> +        .quad 0xbfd08a73667c57af
> +        .quad 0xbfd0a324e27390e3
> +        .quad 0xbfd0bbccdb0d24bd
> +        .quad 0xbfd0d46b579ab74b
> +        .quad 0xbfd0ed005f657da4
> +        .quad 0xbfd1058bf9ae4ad5
> +        .quad 0xbfd11e0e2dad9cb7
> +        .quad 0xbfd136870293a8b0
> +        .quad 0xbfd14ef67f88685a
> +        .quad 0xbfd1675cababa60e
> +        .quad 0xbfd17fb98e15095d
> +        .quad 0xbfd1980d2dd4236f
> +        .quad 0xbfd1b05791f07b49
> +        .quad 0xbfd1c898c16999fb
> +        .quad 0xbfd1e0d0c33716be
> +        .quad 0xbfd1f8ff9e48a2f3
> +        .quad 0xbfd211255986160c
> +        .quad 0xbfd22941fbcf7966
> +        .quad 0xbfd241558bfd1404
> +        .quad 0xbfd2596010df763a
> +        .quad 0xbfd27161913f853d
> +        .quad 0xbfd2895a13de86a3
> +        .quad 0xbfd2a1499f762bc9
> +        .quad 0xbfd2b9303ab89d25
> +        .quad 0xbfd2d10dec508583
> +        .quad 0xbfd2e8e2bae11d31
> +        .quad 0xbfd300aead06350c
> +        .quad 0xbfd31871c9544185
> +        .quad 0xbfd3302c16586588
> +        .quad 0xbfd347dd9a987d55
> +        .quad 0xbfd35f865c93293e
> +        .quad 0xbfd3772662bfd85b
> +        .quad 0xbfd38ebdb38ed321
> +        .quad 0xbfd3a64c556945ea
> +        .quad 0xbfd3bdd24eb14b6a
> +        .quad 0xbfd3d54fa5c1f710
> +        .quad 0xbfd3ecc460ef5f50
> +        .quad 0xbfd404308686a7e4
> +        .quad 0xbfd41b941cce0bee
> +        .quad 0xbfd432ef2a04e814
> +        .quad 0xbfd44a41b463c47c
> +        .quad 0xbfd4618bc21c5ec2
> +        .quad 0xbfd478cd5959b3d9
> +        .quad 0xbfd49006804009d1
> +        .quad 0xbfd4a7373cecf997
> +        .quad 0xbfd4be5f957778a1
> +        .quad 0xbfd4d57f8fefe27f
> +        .quad 0xbfd4ec973260026a
> +        .quad 0xbfd503a682cb1cb3
> +        .quad 0xbfd51aad872df82d
> +        .quad 0xbfd531ac457ee77e
> +        .quad 0xbfd548a2c3add263
> +        .quad 0xbfd55f9107a43ee2
> +        .quad 0xbfd5767717455a6c
> +        .quad 0xbfd58d54f86e02f2
> +        .quad 0xbfd5a42ab0f4cfe2
> +        .quad 0xbfd5baf846aa1b19
> +        .quad 0xbfd5d1bdbf5809ca
> +        .quad 0xbfd5e87b20c2954a
> +        .quad 0xbfd5ff3070a793d4
> +        .quad 0xbfd615ddb4bec13c
> +        .quad 0xbfd62c82f2b9c795
> +        .quad 0x3fd61965cdb02c1f
> +        .quad 0x3fd602d08af091ec
> +        .quad 0x3fd5ec433d5c35ae
> +        .quad 0x3fd5d5bddf595f30
> +        .quad 0x3fd5bf406b543db2
> +        .quad 0x3fd5a8cadbbedfa1
> +        .quad 0x3fd5925d2b112a59
> +        .quad 0x3fd57bf753c8d1fb
> +        .quad 0x3fd565995069514c
> +        .quad 0x3fd54f431b7be1a9
> +        .quad 0x3fd538f4af8f72fe
> +        .quad 0x3fd522ae0738a3d8
> +        .quad 0x3fd50c6f1d11b97c
> +        .quad 0x3fd4f637ebba9810
> +        .quad 0x3fd4e0086dd8baca
> +        .quad 0x3fd4c9e09e172c3c
> +        .quad 0x3fd4b3c077267e9a
> +        .quad 0x3fd49da7f3bcc41f
> +        .quad 0x3fd487970e958770
> +        .quad 0x3fd4718dc271c41b
> +        .quad 0x3fd45b8c0a17df13
> +        .quad 0x3fd44591e0539f49
> +        .quad 0x3fd42f9f3ff62642
> +        .quad 0x3fd419b423d5e8c7
> +        .quad 0x3fd403d086cea79c
> +        .quad 0x3fd3edf463c1683e
> +        .quad 0x3fd3d81fb5946dba
> +        .quad 0x3fd3c25277333184
> +        .quad 0x3fd3ac8ca38e5c5f
> +        .quad 0x3fd396ce359bbf54
> +        .quad 0x3fd3811728564cb2
> +        .quad 0x3fd36b6776be1117
> +        .quad 0x3fd355bf1bd82c8b
> +        .quad 0x3fd3401e12aecba1
> +        .quad 0x3fd32a84565120a8
> +        .quad 0x3fd314f1e1d35ce4
> +        .quad 0x3fd2ff66b04ea9d4
> +        .quad 0x3fd2e9e2bce12286
> +        .quad 0x3fd2d46602adccee
> +        .quad 0x3fd2bef07cdc9354
> +        .quad 0x3fd2a982269a3dbf
> +        .quad 0x3fd2941afb186b7c
> +        .quad 0x3fd27ebaf58d8c9d
> +        .quad 0x3fd269621134db92
> +        .quad 0x3fd25410494e56c7
> +        .quad 0x3fd23ec5991eba49
> +        .quad 0x3fd22981fbef797b
> +        .quad 0x3fd214456d0eb8d4
> +        .quad 0x3fd1ff0fe7cf47a7
> +        .quad 0x3fd1e9e1678899f4
> +        .quad 0x3fd1d4b9e796c245
> +        .quad 0x3fd1bf99635a6b95
> +        .quad 0x3fd1aa7fd638d33f
> +        .quad 0x3fd1956d3b9bc2fa
> +        .quad 0x3fd180618ef18adf
> +        .quad 0x3fd16b5ccbacfb73
> +        .quad 0x3fd1565eed455fc3
> +        .quad 0x3fd14167ef367783
> +        .quad 0x3fd12c77cd00713b
> +        .quad 0x3fd1178e8227e47c
> +        .quad 0x3fd102ac0a35cc1c
> +        .quad 0x3fd0edd060b78081
> +        .quad 0x3fd0d8fb813eb1ef
> +        .quad 0x3fd0c42d676162e3
> +        .quad 0x3fd0af660eb9e279
> +        .quad 0x3fd09aa572e6c6d4
> +        .quad 0x3fd085eb8f8ae797
> +        .quad 0x3fd07138604d5862
> +        .quad 0x3fd05c8be0d9635a
> +        .quad 0x3fd047e60cde83b8
> +        .quad 0x3fd03346e0106062
> +        .quad 0x3fd01eae5626c691
> +        .quad 0x3fd00a1c6adda473
> +        .quad 0x3fcfeb2233ea07cd
> +        .quad 0x3fcfc218be620a5e
> +        .quad 0x3fcf991c6cb3b379
> +        .quad 0x3fcf702d36777df0
> +        .quad 0x3fcf474b134df229
> +        .quad 0x3fcf1e75fadf9bde
> +        .quad 0x3fcef5ade4dcffe6
> +        .quad 0x3fceccf2c8fe920a
> +        .quad 0x3fcea4449f04aaf5
> +        .quad 0x3fce7ba35eb77e2a
> +        .quad 0x3fce530effe71012
> +        .quad 0x3fce2a877a6b2c12
> +        .quad 0x3fce020cc6235ab5
> +        .quad 0x3fcdd99edaf6d7e9
> +        .quad 0x3fcdb13db0d48940
> +        .quad 0x3fcd88e93fb2f450
> +        .quad 0x3fcd60a17f903515
> +        .quad 0x3fcd38666871f465
> +        .quad 0x3fcd1037f2655e7b
> +        .quad 0x3fcce816157f1988
> +        .quad 0x3fccc000c9db3c52
> +        .quad 0x3fcc97f8079d44ec
> +        .quad 0x3fcc6ffbc6f00f71
> +        .quad 0x3fcc480c0005ccd1
> +        .quad 0x3fcc2028ab17f9b4
> +        .quad 0x3fcbf851c067555f
> +        .quad 0x3fcbd087383bd8ad
> +        .quad 0x3fcba8c90ae4ad19
> +        .quad 0x3fcb811730b823d2
> +        .quad 0x3fcb5971a213acdb
> +        .quad 0x3fcb31d8575bce3d
> +        .quad 0x3fcb0a4b48fc1b46
> +        .quad 0x3fcae2ca6f672bd4
> +        .quad 0x3fcabb55c31693ad
> +        .quad 0x3fca93ed3c8ad9e3
> +        .quad 0x3fca6c90d44b704e
> +        .quad 0x3fca454082e6ab05
> +        .quad 0x3fca1dfc40f1b7f1
> +        .quad 0x3fc9f6c407089664
> +        .quad 0x3fc9cf97cdce0ec3
> +        .quad 0x3fc9a8778debaa38
> +        .quad 0x3fc981634011aa75
> +        .quad 0x3fc95a5adcf7017f
> +        .quad 0x3fc9335e5d594989
> +        .quad 0x3fc90c6db9fcbcd9
> +        .quad 0x3fc8e588ebac2dbf
> +        .quad 0x3fc8beafeb38fe8c
> +        .quad 0x3fc897e2b17b19a5
> +        .quad 0x3fc871213750e994
> +        .quad 0x3fc84a6b759f512f
> +        .quad 0x3fc823c16551a3c2
> +        .quad 0x3fc7fd22ff599d4f
> +        .quad 0x3fc7d6903caf5ad0
> +        .quad 0x3fc7b0091651528c
> +        .quad 0x3fc7898d85444c73
> +        .quad 0x3fc7631d82935a86
> +        .quad 0x3fc73cb9074fd14d
> +        .quad 0x3fc716600c914054
> +        .quad 0x3fc6f0128b756abc
> +        .quad 0x3fc6c9d07d203fc7
> +        .quad 0x3fc6a399dabbd383
> +        .quad 0x3fc67d6e9d785771
> +        .quad 0x3fc6574ebe8c133a
> +        .quad 0x3fc6313a37335d76
> +        .quad 0x3fc60b3100b09476
> +        .quad 0x3fc5e533144c1719
> +        .quad 0x3fc5bf406b543db2
> +        .quad 0x3fc59958ff1d52f1
> +        .quad 0x3fc5737cc9018cdd
> +        .quad 0x3fc54dabc26105d2
> +        .quad 0x3fc527e5e4a1b58d
> +        .quad 0x3fc5022b292f6a45
> +        .quad 0x3fc4dc7b897bc1c8
> +        .quad 0x3fc4b6d6fefe22a4
> +        .quad 0x3fc4913d8333b561
> +        .quad 0x3fc46baf0f9f5db7
> +        .quad 0x3fc4462b9dc9b3dc
> +        .quad 0x3fc420b32740fdd4
> +        .quad 0x3fc3fb45a59928cc
> +        .quad 0x3fc3d5e3126bc27f
> +        .quad 0x3fc3b08b6757f2a9
> +        .quad 0x3fc38b3e9e027479
> +        .quad 0x3fc365fcb0159016
> +        .quad 0x3fc340c59741142e
> +        .quad 0x3fc31b994d3a4f85
> +        .quad 0x3fc2f677cbbc0a96
> +        .quad 0x3fc2d1610c86813a
> +        .quad 0x3fc2ac55095f5c59
> +        .quad 0x3fc28753bc11aba5
> +        .quad 0x3fc2625d1e6ddf57
> +        .quad 0x3fc23d712a49c202
> +        .quad 0x3fc2188fd9807263
> +        .quad 0x3fc1f3b925f25d41
> +        .quad 0x3fc1ceed09853752
> +        .quad 0x3fc1aa2b7e23f72a
> +        .quad 0x3fc185747dbecf34
> +        .quad 0x3fc160c8024b27b1
> +        .quad 0x3fc13c2605c398c3
> +        .quad 0x3fc1178e8227e47c
> +        .quad 0x3fc0f301717cf0fb
> +        .quad 0x3fc0ce7ecdccc28d
> +        .quad 0x3fc0aa06912675d5
> +        .quad 0x3fc08598b59e3a07
> +        .quad 0x3fc06135354d4b18
> +        .quad 0x3fc03cdc0a51ec0d
> +        .quad 0x3fc0188d2ecf6140
> +        .quad 0x3fbfe89139dbd566
> +        .quad 0x3fbfa01c9db57ce2
> +        .quad 0x3fbf57bc7d9005db
> +        .quad 0x3fbf0f70cdd992e3
> +        .quad 0x3fbec739830a1120
> +        .quad 0x3fbe7f1691a32d3e
> +        .quad 0x3fbe3707ee30487b
> +        .quad 0x3fbdef0d8d466db9
> +        .quad 0x3fbda727638446a2
> +        .quad 0x3fbd5f55659210e2
> +        .quad 0x3fbd179788219364
> +        .quad 0x3fbccfedbfee13a8
> +        .quad 0x3fbc885801bc4b23
> +        .quad 0x3fbc40d6425a5cb1
> +        .quad 0x3fbbf968769fca11
> +        .quad 0x3fbbb20e936d6974
> +        .quad 0x3fbb6ac88dad5b1c
> +        .quad 0x3fbb23965a52ff00
> +        .quad 0x3fbadc77ee5aea8c
> +        .quad 0x3fba956d3ecade63
> +        .quad 0x3fba4e7640b1bc38
> +        .quad 0x3fba0792e9277cac
> +        .quad 0x3fb9c0c32d4d2548
> +        .quad 0x3fb97a07024cbe74
> +        .quad 0x3fb9335e5d594989
> +        .quad 0x3fb8ecc933aeb6e8
> +        .quad 0x3fb8a6477a91dc29
> +        .quad 0x3fb85fd927506a48
> +        .quad 0x3fb8197e2f40e3f0
> +        .quad 0x3fb7d33687c293c9
> +        .quad 0x3fb78d02263d82d3
> +        .quad 0x3fb746e100226ed9
> +        .quad 0x3fb700d30aeac0e1
> +        .quad 0x3fb6bad83c1883b6
> +        .quad 0x3fb674f089365a7a
> +        .quad 0x3fb62f1be7d77743
> +        .quad 0x3fb5e95a4d9791cb
> +        .quad 0x3fb5a3abb01ade25
> +        .quad 0x3fb55e10050e0384
> +        .quad 0x3fb518874226130a
> +        .quad 0x3fb4d3115d207eac
> +        .quad 0x3fb48dae4bc31018
> +        .quad 0x3fb4485e03dbdfad
> +        .quad 0x3fb403207b414b7f
> +        .quad 0x3fb3bdf5a7d1ee64
> +        .quad 0x3fb378dd7f749714
> +        .quad 0x3fb333d7f8183f4b
> +        .quad 0x3fb2eee507b40301
> +        .quad 0x3fb2aa04a44717a5
> +        .quad 0x3fb26536c3d8c369
> +        .quad 0x3fb2207b5c78549e
> +        .quad 0x3fb1dbd2643d190b
> +        .quad 0x3fb1973bd1465567
> +        .quad 0x3fb152b799bb3cc9
> +        .quad 0x3fb10e45b3cae831
> +        .quad 0x3fb0c9e615ac4e17
> +        .quad 0x3fb08598b59e3a07
> +        .quad 0x3fb0415d89e74444
> +        .quad 0x3faffa6911ab9301
> +        .quad 0x3faf723b517fc523
> +        .quad 0x3faeea31c006b87c
> +        .quad 0x3fae624c4a0b5e1b
> +        .quad 0x3fadda8adc67ee4e
> +        .quad 0x3fad52ed6405d86f
> +        .quad 0x3faccb73cdddb2cc
> +        .quad 0x3fac441e06f72a9e
> +        .quad 0x3fabbcebfc68f420
> +        .quad 0x3fab35dd9b58baad
> +        .quad 0x3faaaef2d0fb10fc
> +        .quad 0x3faa282b8a936171
> +        .quad 0x3fa9a187b573de7c
> +        .quad 0x3fa91b073efd7314
> +        .quad 0x3fa894aa149fb343
> +        .quad 0x3fa80e7023d8ccc4
> +        .quad 0x3fa788595a3577ba
> +        .quad 0x3fa70265a550e777
> +        .quad 0x3fa67c94f2d4bb58
> +        .quad 0x3fa5f6e73078efb8
> +        .quad 0x3fa5715c4c03ceef
> +        .quad 0x3fa4ebf43349e26f
> +        .quad 0x3fa466aed42de3ea
> +        .quad 0x3fa3e18c1ca0ae92
> +        .quad 0x3fa35c8bfaa1306b
> +        .quad 0x3fa2d7ae5c3c5bae
> +        .quad 0x3fa252f32f8d183f
> +        .quad 0x3fa1ce5a62bc353a
> +        .quad 0x3fa149e3e4005a8d
> +        .quad 0x3fa0c58fa19dfaaa
> +        .quad 0x3fa0415d89e74444
> +        .quad 0x3f9f7a9b16782856
> +        .quad 0x3f9e72bf2813ce51
> +        .quad 0x3f9d6b2725979802
> +        .quad 0x3f9c63d2ec14aaf2
> +        .quad 0x3f9b5cc258b718e6
> +        .quad 0x3f9a55f548c5c43f
> +        .quad 0x3f994f6b99a24475
> +        .quad 0x3f98492528c8cabf
> +        .quad 0x3f974321d3d006d3
> +        .quad 0x3f963d6178690bd6
> +        .quad 0x3f9537e3f45f3565
> +        .quad 0x3f9432a925980cc1
> +        .quad 0x3f932db0ea132e22
> +        .quad 0x3f9228fb1fea2e28
> +        .quad 0x3f912487a5507f70
> +        .quad 0x3f90205658935847
> +        .quad 0x3f8e38ce3033310c
> +        .quad 0x3f8c317384c75f06
> +        .quad 0x3f8a2a9c6c170462
> +        .quad 0x3f882448a388a2aa
> +        .quad 0x3f861e77e8b53fc6
> +        .quad 0x3f841929f96832f0
> +        .quad 0x3f82145e939ef1e9
> +        .quad 0x3f8010157588de71
> +        .quad 0x3f7c189cbb0e27fb
> +        .quad 0x3f78121214586b54
> +        .quad 0x3f740c8a747878e2
> +        .quad 0x3f70080559588b35
> +        .quad 0x3f680904828985c0
> +        .quad 0x3f60040155d5889e
> +        .quad 0x3f50020055655889
> +        .quad 0x0000000000000000
> +        /*== poly_coeff[4] ==*/
> +        .align 32
> +        .quad 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A /* coeff4 */
> +        .quad 0xbfd0000148058EE1, 0xbfd0000148058EE1, 0xbfd0000148058EE1, 0xbfd0000148058EE1 /* coeff3 */
> +        .quad 0x3fd55555555543C5, 0x3fd55555555543C5, 0x3fd55555555543C5, 0x3fd55555555543C5 /* coeff2 */
> +        .quad 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F /* coeff1 */
> +        /*== ExpMask ==*/
> +        .align 32
> +        .quad 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff
> +        /*== Two10 ==*/
> +        .align 32
> +        .quad 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000
> +        /*== MinLog1p = -1+2^(-53) ==*/
> +        .align 32
> +        .quad 0xbfefffffffffffff, 0xbfefffffffffffff, 0xbfefffffffffffff, 0xbfefffffffffffff
> +        /*== MaxLog1p ==*/
> +        .align 32
> +        .quad 0x7f3ffffffffff000, 0x7f3ffffffffff000, 0x7f3ffffffffff000, 0x7f3ffffffffff000
> +        /*== One ==*/
> +        .align 32
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== SgnMask ==*/
> +        .align 32
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
> +        /*== XThreshold ==*/
> +        .align 32
> +        .quad 0x3e00000000000000, 0x3e00000000000000, 0x3e00000000000000, 0x3e00000000000000
> +        /*== XhMask ==*/
> +        .align 32
> +        .quad 0xfffffffffffffc00, 0xfffffffffffffc00, 0xfffffffffffffc00, 0xfffffffffffffc00
> +        /*== Threshold ==*/
> +        .align 32
> +        .quad 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000
> +        /*== Bias ==*/
> +        .align 32
> +        .quad 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000
> +        /*== Bias1 ==*/
> +        .align 32
> +        .quad 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000
> +        /*== ExpMask ==*/
> +        .align 32
> +        .quad 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000
> +        /*== ExpMask2 ==*/
> +        .align 32
> +        .quad 0x7f40000000000000, 0x7f40000000000000, 0x7f40000000000000, 0x7f40000000000000
> +        /*== L2L ==*/
> +        .align 32
> +        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
> +        /*== dHalf ==*/
> +        .align 32
> +        .quad 0x3FE0000000000000, 0x3FE0000000000000, 0x3FE0000000000000, 0x3FE0000000000000
> +        /*== dSign ==*/
> +        .align 32
> +        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000
> +        /*== dTopMask12 ==*/
> +        .align 32
> +        .quad 0xFFFFFE0000000000, 0xFFFFFE0000000000, 0xFFFFFE0000000000, 0xFFFFFE0000000000
> +        /*== dTopMask41 ==*/
> +        .align 32
> +        .quad 0xFFFFFFFFFFFFF000, 0xFFFFFFFFFFFFF000, 0xFFFFFFFFFFFFF000, 0xFFFFFFFFFFFFF000
> +        /*== dTinyRange ==*/
> +        .align 32
> +        .quad 0x0350000000000000, 0x0350000000000000, 0x0350000000000000, 0x0350000000000000
> +        .align 32
> +        .type	__svml_datanh_data_internal,@object
> +        .size	__svml_datanh_data_internal,.-__svml_datanh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core-avx2.S
> new file mode 100644
> index 0000000000..675ebd2fd6
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized atanh, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN8v_atanh _ZGVeN8v_atanh_avx2_wrapper
> +#include "../svml_d_atanh8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core.c
> new file mode 100644
> index 0000000000..4da8e20fad
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized atanh, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN8v_atanh
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN8v_atanh, __GI__ZGVeN8v_atanh, __redirect__ZGVeN8v_atanh)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
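
For context, a quick sketch of how callers reach these entry points once
atanh is part of the SIMD declarations: with auto-vectorization enabled
(flags below are illustrative, e.g. -O2 -ffast-math -march=skylake-avx512
-mprefer-vector-width=512), GCC can turn the loop below into calls to the
vector variants such as _ZGVeN8v_atanh, which the ifunc above then resolves
to the AVX-512 implementation or the wrapper.

  #include <math.h>

  void
  atanh_array (double *restrict out, const double *restrict in, int n)
  {
    for (int i = 0; i < n; i++)
      out[i] = atanh (in[i]);
  }
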
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core_avx512.S
> new file mode 100644
> index 0000000000..ef600c073a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atanh8_core_avx512.S
> @@ -0,0 +1,401 @@
> +/* Function atanh vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute atanh(x) as 0.5 * log((1 + x)/(1 - x))
> + *   using a small lookup table that maps to AVX-512 permute instructions
> + *
> + *   Special cases:
> + *
> + *   atanh(0)  = 0
> + *   atanh(+1) = +INF
> + *   atanh(-1) = -INF
> + *   atanh(x)  = NaN if |x| > 1, or if x is a NaN or INF
> + *
> + */
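
The special cases listed above, restated as a minimal scalar reference
(illustrative only; errno and exception details omitted):

  #include <math.h>

  static double
  atanh_ref (double x)
  {
    if (isnan (x) || fabs (x) > 1.0)
      return NAN;                        /* outside the domain */
    if (fabs (x) == 1.0)
      return copysign (INFINITY, x);     /* atanh(+/-1) = +/-inf */
    if (x == 0.0)
      return x;                          /* preserves -0.0 */
    return 0.5 * log ((1.0 + x) / (1.0 - x));
  }
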
> +
> +/* Offsets for data table __svml_datanh_data_internal_avx512
> + */
> +#define Log_tbl_H                     	0
> +#define Log_tbl_L                     	128
> +#define One                           	256
> +#define AbsMask                       	320
> +#define AddB5                         	384
> +#define RcpBitMask                    	448
> +#define poly_coeff8                   	512
> +#define poly_coeff7                   	576
> +#define poly_coeff6                   	640
> +#define poly_coeff5                   	704
> +#define poly_coeff4                   	768
> +#define poly_coeff3                   	832
> +#define poly_coeff2                   	896
> +#define poly_coeff1                   	960
> +#define poly_coeff0                   	1024
> +#define Half                          	1088
> +#define L2H                           	1152
> +#define L2L                           	1216
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN8v_atanh_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovups   One+__svml_datanh_data_internal_avx512(%rip), %zmm15
> +
> +/* round reciprocals to 1+4b mantissas */
> +        vmovups   AddB5+__svml_datanh_data_internal_avx512(%rip), %zmm6
> +        vmovups   RcpBitMask+__svml_datanh_data_internal_avx512(%rip), %zmm9
> +        vmovaps   %zmm0, %zmm2
> +        vandpd    AbsMask+__svml_datanh_data_internal_avx512(%rip), %zmm2, %zmm13
> +
> +/* 1+y */
> +        vaddpd    {rn-sae}, %zmm15, %zmm13, %zmm0
> +
> +/* 1-y */
> +        vsubpd    {rn-sae}, %zmm13, %zmm15, %zmm4
> +        vxorpd    %zmm13, %zmm2, %zmm1
> +
> +/* Yp_high */
> +        vsubpd    {rn-sae}, %zmm15, %zmm0, %zmm7
> +
> +/* -Ym_high */
> +        vsubpd    {rn-sae}, %zmm15, %zmm4, %zmm12
> +
> +/* RcpP ~ 1/Yp */
> +        vrcp14pd  %zmm0, %zmm3
> +
> +/* RcpM ~ 1/Ym */
> +        vrcp14pd  %zmm4, %zmm5
> +
> +/* input outside (-1, 1) ? */
> +        vcmppd    $21, {sae}, %zmm15, %zmm13, %k0
> +        vpaddq    %zmm6, %zmm3, %zmm11
> +        vpaddq    %zmm6, %zmm5, %zmm10
> +
> +/* Yp_low */
> +        vsubpd    {rn-sae}, %zmm7, %zmm13, %zmm8
> +        vandpd    %zmm9, %zmm11, %zmm14
> +        vandpd    %zmm9, %zmm10, %zmm3
> +
> +/* Ym_low */
> +        vaddpd    {rn-sae}, %zmm12, %zmm13, %zmm12
> +
> +/* Reduced argument: Rp = (RcpP*Yp - 1)+RcpP*Yp_low */
> +        vfmsub213pd {rn-sae}, %zmm15, %zmm14, %zmm0
> +
> +/* Reduced argument: Rm = (RcpM*Ym - 1)+RcpM*Ym_low */
> +        vfmsub231pd {rn-sae}, %zmm3, %zmm4, %zmm15
> +
> +/* exponents */
> +        vgetexppd {sae}, %zmm14, %zmm5
> +        vgetexppd {sae}, %zmm3, %zmm4
> +
> +/* Table lookups */
> +        vmovups   __svml_datanh_data_internal_avx512(%rip), %zmm9
> +        vmovups   Log_tbl_H+64+__svml_datanh_data_internal_avx512(%rip), %zmm13
> +        vmovups   Log_tbl_L+__svml_datanh_data_internal_avx512(%rip), %zmm7
> +        vfmadd231pd {rn-sae}, %zmm14, %zmm8, %zmm0
> +        vfnmadd231pd {rn-sae}, %zmm3, %zmm12, %zmm15
> +
> +/* Prepare table index */
> +        vpsrlq    $48, %zmm14, %zmm11
> +        vpsrlq    $48, %zmm3, %zmm8
> +        vmovups   Log_tbl_L+64+__svml_datanh_data_internal_avx512(%rip), %zmm14
> +
> +/* polynomials */
> +        vmovups   poly_coeff8+__svml_datanh_data_internal_avx512(%rip), %zmm3
> +
> +/* Km-Kp */
> +        vsubpd    {rn-sae}, %zmm5, %zmm4, %zmm5
> +        vmovups   poly_coeff7+__svml_datanh_data_internal_avx512(%rip), %zmm4
> +        kmovw     %k0, %edx
> +        vmovaps   %zmm11, %zmm10
> +        vmovaps   %zmm4, %zmm6
> +        vpermi2pd %zmm13, %zmm9, %zmm10
> +        vpermi2pd %zmm14, %zmm7, %zmm11
> +        vpermt2pd %zmm13, %zmm8, %zmm9
> +        vpermt2pd %zmm14, %zmm8, %zmm7
> +        vmovups   poly_coeff6+__svml_datanh_data_internal_avx512(%rip), %zmm8
> +        vfmadd231pd {rn-sae}, %zmm0, %zmm3, %zmm6
> +        vfmadd231pd {rn-sae}, %zmm15, %zmm3, %zmm4
> +        vmovups   poly_coeff3+__svml_datanh_data_internal_avx512(%rip), %zmm13
> +        vmovups   poly_coeff2+__svml_datanh_data_internal_avx512(%rip), %zmm14
> +        vfmadd213pd {rn-sae}, %zmm8, %zmm0, %zmm6
> +        vfmadd213pd {rn-sae}, %zmm8, %zmm15, %zmm4
> +        vmovups   poly_coeff0+__svml_datanh_data_internal_avx512(%rip), %zmm8
> +        vsubpd    {rn-sae}, %zmm11, %zmm7, %zmm12
> +
> +/* table values */
> +        vsubpd    {rn-sae}, %zmm10, %zmm9, %zmm3
> +        vmovups   poly_coeff5+__svml_datanh_data_internal_avx512(%rip), %zmm7
> +        vmovups   poly_coeff4+__svml_datanh_data_internal_avx512(%rip), %zmm9
> +
> +/* K*L2H + Th */
> +        vmovups   L2H+__svml_datanh_data_internal_avx512(%rip), %zmm10
> +
> +/* K*L2L + Tl */
> +        vmovups   L2L+__svml_datanh_data_internal_avx512(%rip), %zmm11
> +        vfmadd213pd {rn-sae}, %zmm7, %zmm0, %zmm6
> +        vfmadd213pd {rn-sae}, %zmm7, %zmm15, %zmm4
> +        vmovups   poly_coeff1+__svml_datanh_data_internal_avx512(%rip), %zmm7
> +        vfmadd231pd {rn-sae}, %zmm5, %zmm10, %zmm3
> +        vfmadd213pd {rn-sae}, %zmm12, %zmm11, %zmm5
> +        vfmadd213pd {rn-sae}, %zmm9, %zmm0, %zmm6
> +        vfmadd213pd {rn-sae}, %zmm9, %zmm15, %zmm4
> +        vfmadd213pd {rn-sae}, %zmm13, %zmm0, %zmm6
> +        vfmadd213pd {rn-sae}, %zmm13, %zmm15, %zmm4
> +        vfmadd213pd {rn-sae}, %zmm14, %zmm0, %zmm6
> +        vfmadd213pd {rn-sae}, %zmm14, %zmm15, %zmm4
> +        vfmadd213pd {rn-sae}, %zmm7, %zmm0, %zmm6
> +        vfmadd213pd {rn-sae}, %zmm7, %zmm15, %zmm4
> +        vfmadd213pd {rn-sae}, %zmm8, %zmm0, %zmm6
> +        vfmadd213pd {rn-sae}, %zmm8, %zmm15, %zmm4
> +
> +/* (K*L2L + Tl) + Rp*PolyP */
> +        vfmadd213pd {rn-sae}, %zmm5, %zmm0, %zmm6
> +        vorpd     Half+__svml_datanh_data_internal_avx512(%rip), %zmm1, %zmm0
> +
> +/* (K*L2L + Tl) + Rp*PolyP -Rm*PolyM */
> +        vfnmadd213pd {rn-sae}, %zmm6, %zmm15, %zmm4
> +        vaddpd    {rn-sae}, %zmm4, %zmm3, %zmm1
> +        vmulpd    {rn-sae}, %zmm0, %zmm1, %zmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm2
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm2, 64(%rsp)
> +        vmovups   %zmm0, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     64(%rsp,%r14,8), %xmm0
> +        call      atanh@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 128(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN8v_atanh_skx)
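
The L(SPECIAL_VALUES_*) blocks above boil down to the following per-lane
fixup (a sketch with illustrative names, not the actual code): the input and
fast-path result vectors are spilled to the stack, and every lane flagged by
the range mask is redone with the scalar libm routine.

  #include <math.h>

  static void
  fixup_special_lanes (unsigned int mask, const double in[8], double out[8])
  {
    for (int i = 0; i < 8; i++)
      if (mask & (1u << i))      /* bit i set by vcmppd for |x| >= 1 or NaN */
        out[i] = atanh (in[i]);  /* scalar call, as in L(SCALAR_MATH_CALL) */
  }
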
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_datanh_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 Log_tbl_H[16][2];
> +        __declspec(align(64)) VUINT32 Log_tbl_L[16][2];
> +        __declspec(align(64)) VUINT32 One[8][2];
> +        __declspec(align(64)) VUINT32 AbsMask[8][2];
> +        __declspec(align(64)) VUINT32 AddB5[8][2];
> +        __declspec(align(64)) VUINT32 RcpBitMask[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff8[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff7[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff1[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff0[8][2];
> +        __declspec(align(64)) VUINT32 Half[8][2];
> +        __declspec(align(64)) VUINT32 L2H[8][2];
> +        __declspec(align(64)) VUINT32 L2L[8][2];
> +    } __svml_datanh_data_internal_avx512;
> +#endif
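
The offset macros at the top of the file follow directly from this layout:
after the two 128-byte log tables, every field occupies one 64-byte row.  A
small self-check along these lines (not part of the patch, C11 assumed, and
only the member sizes matter here) illustrates the correspondence:

  #include <stddef.h>
  #include <stdalign.h>

  struct datanh_layout
  {
    alignas (64) double Log_tbl_H[16];
    alignas (64) double Log_tbl_L[16];
    alignas (64) double One[8];
    alignas (64) double AbsMask[8];
    /* ... remaining fields, one 64-byte row each ... */
  };
  _Static_assert (offsetof (struct datanh_layout, Log_tbl_L) == 128, "");
  _Static_assert (offsetof (struct datanh_layout, One) == 256, "");
  _Static_assert (offsetof (struct datanh_layout, AbsMask) == 320, "");
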
> +__svml_datanh_data_internal_avx512:
> +        /*== Log_tbl_H ==*/
> +        .quad 0x0000000000000000
> +        .quad 0x3faf0a30c0100000
> +        .quad 0x3fbe27076e2a0000
> +        .quad 0x3fc5ff3070a80000
> +        .quad 0x3fcc8ff7c79b0000
> +        .quad 0x3fd1675cabab8000
> +        .quad 0x3fd4618bc21c8000
> +        .quad 0x3fd739d7f6bc0000
> +        .quad 0x3fd9f323ecbf8000
> +        .quad 0x3fdc8ff7c79a8000
> +        .quad 0x3fdf128f5faf0000
> +        .quad 0x3fe0be72e4254000
> +        .quad 0x3fe1e85f5e704000
> +        .quad 0x3fe307d7334f0000
> +        .quad 0x3fe41d8fe8468000
> +        .quad 0x3fe52a2d265bc000
> +        /*== Log_tbl_L ==*/
> +        .align 64
> +        .quad 0x0000000000000000
> +        .quad 0x3d662a6617cc9717
> +        .quad 0x3d6e5cbd3d50fffc
> +        .quad 0xbd6b0b0de3077d7e
> +        .quad 0xbd697794f689f843
> +        .quad 0x3d630701ce63eab9
> +        .quad 0xbd609ec17a426426
> +        .quad 0xbd67fcb18ed9d603
> +        .quad 0x3d584bf2b68d766f
> +        .quad 0x3d5a21ac25d81ef3
> +        .quad 0x3d3bb2cd720ec44c
> +        .quad 0xbd657d49676844cc
> +        .quad 0x3d1a07bd8b34be7c
> +        .quad 0x3d60be1fb590a1f5
> +        .quad 0xbd5aa33736867a17
> +        .quad 0x3d46abb9df22bc57
> +        /*== One ==*/
> +        .align 64
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== AbsMask ==*/
> +        .align 64
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
> +        /*== AddB5 ==*/
> +        .align 64
> +        .quad 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000
> +        /*== RcpBitMask ==*/
> +        .align 64
> +        .quad 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000
> +        /*== poly_coeff8 ==*/
> +        .align 64
> +        .quad 0x3fbc81dd40d38142, 0x3fbc81dd40d38142, 0x3fbc81dd40d38142, 0x3fbc81dd40d38142, 0x3fbc81dd40d38142, 0x3fbc81dd40d38142, 0x3fbc81dd40d38142, 0x3fbc81dd40d38142
> +        /*== poly_coeff7 ==*/
> +        .align 64
> +        .quad 0xbfc0073cb82e8b70, 0xbfc0073cb82e8b70, 0xbfc0073cb82e8b70, 0xbfc0073cb82e8b70, 0xbfc0073cb82e8b70, 0xbfc0073cb82e8b70, 0xbfc0073cb82e8b70, 0xbfc0073cb82e8b70
> +        /*== poly_coeff6 ==*/
> +        .align 64
> +        .quad 0x3fc2492298ffdae8, 0x3fc2492298ffdae8, 0x3fc2492298ffdae8, 0x3fc2492298ffdae8, 0x3fc2492298ffdae8, 0x3fc2492298ffdae8, 0x3fc2492298ffdae8, 0x3fc2492298ffdae8
> +        /*== poly_coeff5 ==*/
> +        .align 64
> +        .quad 0xbfc55553f871e5c5, 0xbfc55553f871e5c5, 0xbfc55553f871e5c5, 0xbfc55553f871e5c5, 0xbfc55553f871e5c5, 0xbfc55553f871e5c5, 0xbfc55553f871e5c5, 0xbfc55553f871e5c5
> +        /*== poly_coeff4 ==*/
> +        .align 64
> +        .quad 0x3fc9999999cd394a, 0x3fc9999999cd394a, 0x3fc9999999cd394a, 0x3fc9999999cd394a, 0x3fc9999999cd394a, 0x3fc9999999cd394a, 0x3fc9999999cd394a, 0x3fc9999999cd394a
> +        /*== poly_coeff3 ==*/
> +        .align 64
> +        .quad 0xbfd00000000c2a01, 0xbfd00000000c2a01, 0xbfd00000000c2a01, 0xbfd00000000c2a01, 0xbfd00000000c2a01, 0xbfd00000000c2a01, 0xbfd00000000c2a01, 0xbfd00000000c2a01
> +        /*== poly_coeff2 ==*/
> +        .align 64
> +        .quad 0x3fd5555555555462, 0x3fd5555555555462, 0x3fd5555555555462, 0x3fd5555555555462, 0x3fd5555555555462, 0x3fd5555555555462, 0x3fd5555555555462, 0x3fd5555555555462
> +        /*== poly_coeff1 ==*/
> +        .align 64
> +        .quad 0xbfdfffffffffffc5, 0xbfdfffffffffffc5, 0xbfdfffffffffffc5, 0xbfdfffffffffffc5, 0xbfdfffffffffffc5, 0xbfdfffffffffffc5, 0xbfdfffffffffffc5, 0xbfdfffffffffffc5
> +        /*== poly_coeff0 ==*/
> +        .align 64
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== Half ==*/
> +        .align 64
> +        .quad 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000
> +        /*== L2H = log(2)_high ==*/
> +        .align 64
> +        .quad 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000
> +        /*== L2L = log(2)_low ==*/
> +        .align 64
> +        .quad 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000
> +        .align 64
> +        .type	__svml_datanh_data_internal_avx512,@object
> +        .size	__svml_datanh_data_internal_avx512,.-__svml_datanh_data_internal_avx512
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core-avx2.S
> new file mode 100644
> index 0000000000..1af3662f65
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized atanhf, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN16v_atanhf _ZGVeN16v_atanhf_avx2_wrapper
> +#include "../svml_s_atanhf16_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core.c
> new file mode 100644
> index 0000000000..4b1190f0eb
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized atanhf, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN16v_atanhf
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN16v_atanhf, __GI__ZGVeN16v_atanhf,
> +	       __redirect__ZGVeN16v_atanhf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core_avx512.S
> new file mode 100644
> index 0000000000..6c5f6a54fa
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core_avx512.S
> @@ -0,0 +1,393 @@
> +/* Function atanhf vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute atanh(x) as 0.5 * log((1 + x)/(1 - x))
> + *   using a small lookup table that maps to AVX-512 permute instructions
> + *
> + *   Special cases:
> + *
> + *   atanh(0)  = 0
> + *   atanh(+1) = +INF
> + *   atanh(-1) = -INF
> + *   atanh(x)  = NaN if |x| > 1, or if x is a NaN or INF
> + *
> + */
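
The float kernel below rounds each reciprocal to a 1+5-bit mantissa and
indexes a 32-entry log table (the double kernel earlier does the same with
4 bits and 16 entries).  A rough scalar model of one lane, mirroring the
AddB5/RcpBitMask rounding and the vpsrld $18 index computation (illustrative
only, not the kernel itself):

  #include <math.h>
  #include <stdint.h>
  #include <string.h>

  static float
  lane_log (float y)                     /* y > 0; here y = 1+|x| or 1-|x| */
  {
    float r = 1.0f / y;                  /* stands in for vrcp14ps */
    uint32_t u;
    memcpy (&u, &r, sizeof u);
    u = (u + 0x00020000) & 0xfffc0000;   /* round to a 1+5-bit mantissa */
    memcpy (&r, &u, sizeof r);
    int i = (u >> 18) & 0x1f;            /* 32-entry table index */
    int e;
    (void) frexpf (r, &e);               /* r = m * 2^(e-1), m in [1, 2) */
    float tbl = logf (1.0f + i / 32.0f); /* what Log_tbl_H/_L[i] encodes */
    float rr = fmaf (r, y, -1.0f);       /* reduced argument, small */
    /* log (y) = log1p (rr) - log (r) = log1p (rr) - (e-1)*ln2 - tbl;
       the kernel evaluates log1p (rr) with the poly_coeff* polynomial and
       then forms atanh as 0.5 * (log (1+|x|) - log (1-|x|)).  */
    return log1pf (rr) - (e - 1) * 0.693147180559945f - tbl;
  }
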
> +
> +/* Offsets for data table __svml_satanh_data_internal_avx512
> + */
> +#define Log_tbl_H                     	0
> +#define Log_tbl_L                     	128
> +#define One                           	256
> +#define AbsMask                       	320
> +#define AddB5                         	384
> +#define RcpBitMask                    	448
> +#define poly_coeff3                   	512
> +#define poly_coeff2                   	576
> +#define poly_coeff1                   	640
> +#define poly_coeff0                   	704
> +#define Half                          	768
> +#define L2H                           	832
> +#define L2L                           	896
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN16v_atanhf_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovups   One+__svml_satanh_data_internal_avx512(%rip), %zmm4
> +
> +/* round reciprocals to 1+5b mantissas */
> +        vmovups   AddB5+__svml_satanh_data_internal_avx512(%rip), %zmm14
> +        vmovups   RcpBitMask+__svml_satanh_data_internal_avx512(%rip), %zmm1
> +        vmovaps   %zmm0, %zmm11
> +        vandps    AbsMask+__svml_satanh_data_internal_avx512(%rip), %zmm11, %zmm6
> +
> +/* 1+y */
> +        vaddps    {rn-sae}, %zmm4, %zmm6, %zmm9
> +
> +/* 1-y */
> +        vsubps    {rn-sae}, %zmm6, %zmm4, %zmm8
> +        vxorps    %zmm6, %zmm11, %zmm10
> +
> +/* Yp_high */
> +        vsubps    {rn-sae}, %zmm4, %zmm9, %zmm2
> +
> +/* -Ym_high */
> +        vsubps    {rn-sae}, %zmm4, %zmm8, %zmm5
> +
> +/* RcpP ~ 1/Yp */
> +        vrcp14ps  %zmm9, %zmm12
> +
> +/* RcpM ~ 1/Ym */
> +        vrcp14ps  %zmm8, %zmm13
> +
> +/* input outside (-1, 1) ? */
> +        vcmpps    $21, {sae}, %zmm4, %zmm6, %k0
> +        vpaddd    %zmm14, %zmm12, %zmm15
> +        vpaddd    %zmm14, %zmm13, %zmm0
> +
> +/* Yp_low */
> +        vsubps    {rn-sae}, %zmm2, %zmm6, %zmm3
> +        vandps    %zmm1, %zmm15, %zmm7
> +        vandps    %zmm1, %zmm0, %zmm12
> +
> +/* Ym_low */
> +        vaddps    {rn-sae}, %zmm5, %zmm6, %zmm5
> +
> +/* Reduced argument: Rp = (RcpP*Yp - 1)+RcpP*Yp_low */
> +        vfmsub213ps {rn-sae}, %zmm4, %zmm7, %zmm9
> +
> +/* Reduced argument: Rm = (RcpM*Ym - 1)+RcpM*Ym_low */
> +        vfmsub231ps {rn-sae}, %zmm12, %zmm8, %zmm4
> +        vmovups   Log_tbl_L+__svml_satanh_data_internal_avx512(%rip), %zmm8
> +        vmovups   Log_tbl_L+64+__svml_satanh_data_internal_avx512(%rip), %zmm13
> +
> +/* exponents */
> +        vgetexpps {sae}, %zmm7, %zmm15
> +        vfmadd231ps {rn-sae}, %zmm7, %zmm3, %zmm9
> +
> +/* Table lookups */
> +        vmovups   __svml_satanh_data_internal_avx512(%rip), %zmm6
> +        vgetexpps {sae}, %zmm12, %zmm14
> +        vfnmadd231ps {rn-sae}, %zmm12, %zmm5, %zmm4
> +
> +/* Prepare table index */
> +        vpsrld    $18, %zmm7, %zmm3
> +        vpsrld    $18, %zmm12, %zmm2
> +        vmovups   Log_tbl_H+64+__svml_satanh_data_internal_avx512(%rip), %zmm7
> +        vmovups   poly_coeff1+__svml_satanh_data_internal_avx512(%rip), %zmm12
> +
> +/* Km-Kp */
> +        vsubps    {rn-sae}, %zmm15, %zmm14, %zmm1
> +        kmovw     %k0, %edx
> +        vmovaps   %zmm3, %zmm0
> +        vpermi2ps %zmm13, %zmm8, %zmm3
> +        vpermt2ps %zmm13, %zmm2, %zmm8
> +        vpermi2ps %zmm7, %zmm6, %zmm0
> +        vpermt2ps %zmm7, %zmm2, %zmm6
> +        vsubps    {rn-sae}, %zmm3, %zmm8, %zmm5
> +
> +/* K*L2H + Th */
> +        vmovups   L2H+__svml_satanh_data_internal_avx512(%rip), %zmm2
> +
> +/* K*L2L + Tl */
> +        vmovups   L2L+__svml_satanh_data_internal_avx512(%rip), %zmm3
> +
> +/* polynomials */
> +        vmovups   poly_coeff3+__svml_satanh_data_internal_avx512(%rip), %zmm7
> +        vmovups   poly_coeff0+__svml_satanh_data_internal_avx512(%rip), %zmm13
> +
> +/* table values */
> +        vsubps    {rn-sae}, %zmm0, %zmm6, %zmm0
> +        vfmadd231ps {rn-sae}, %zmm1, %zmm2, %zmm0
> +        vfmadd213ps {rn-sae}, %zmm5, %zmm3, %zmm1
> +        vmovups   poly_coeff2+__svml_satanh_data_internal_avx512(%rip), %zmm3
> +        vmovaps   %zmm3, %zmm2
> +        vfmadd231ps {rn-sae}, %zmm9, %zmm7, %zmm2
> +        vfmadd231ps {rn-sae}, %zmm4, %zmm7, %zmm3
> +        vfmadd213ps {rn-sae}, %zmm12, %zmm9, %zmm2
> +        vfmadd213ps {rn-sae}, %zmm12, %zmm4, %zmm3
> +        vfmadd213ps {rn-sae}, %zmm13, %zmm9, %zmm2
> +        vfmadd213ps {rn-sae}, %zmm13, %zmm4, %zmm3
> +
> +/* (K*L2L + Tl) + Rp*PolyP */
> +        vfmadd213ps {rn-sae}, %zmm1, %zmm9, %zmm2
> +        vorps     Half+__svml_satanh_data_internal_avx512(%rip), %zmm10, %zmm9
> +
> +/* (K*L2L + Tl) + Rp*PolyP -Rm*PolyM */
> +        vfnmadd213ps {rn-sae}, %zmm2, %zmm4, %zmm3
> +        vaddps    {rn-sae}, %zmm3, %zmm0, %zmm4
> +        vmulps    {rn-sae}, %zmm9, %zmm4, %zmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm11
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm11, 64(%rsp)
> +        vmovups   %zmm0, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $16, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     64(%rsp,%r14,4), %xmm0
> +        call      atanhf@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 128(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN16v_atanhf_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_satanh_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 Log_tbl_H[32][1];
> +        __declspec(align(64)) VUINT32 Log_tbl_L[32][1];
> +        __declspec(align(64)) VUINT32 One[16][1];
> +        __declspec(align(64)) VUINT32 AbsMask[16][1];
> +        __declspec(align(64)) VUINT32 AddB5[16][1];
> +        __declspec(align(64)) VUINT32 RcpBitMask[16][1];
> +        __declspec(align(64)) VUINT32 poly_coeff3[16][1];
> +        __declspec(align(64)) VUINT32 poly_coeff2[16][1];
> +        __declspec(align(64)) VUINT32 poly_coeff1[16][1];
> +        __declspec(align(64)) VUINT32 poly_coeff0[16][1];
> +        __declspec(align(64)) VUINT32 Half[16][1];
> +        __declspec(align(64)) VUINT32 L2H[16][1];
> +        __declspec(align(64)) VUINT32 L2L[16][1];
> +    } __svml_satanh_data_internal_avx512;
> +#endif
> +__svml_satanh_data_internal_avx512:
> +        /*== Log_tbl_H ==*/
> +        .long 0x00000000
> +        .long 0x3cfc0000
> +        .long 0x3d780000
> +        .long 0x3db78000
> +        .long 0x3df10000
> +        .long 0x3e14c000
> +        .long 0x3e300000
> +        .long 0x3e4a8000
> +        .long 0x3e648000
> +        .long 0x3e7dc000
> +        .long 0x3e8b4000
> +        .long 0x3e974000
> +        .long 0x3ea30000
> +        .long 0x3eae8000
> +        .long 0x3eb9c000
> +        .long 0x3ec4e000
> +        .long 0x3ecfa000
> +        .long 0x3eda2000
> +        .long 0x3ee48000
> +        .long 0x3eeea000
> +        .long 0x3ef8a000
> +        .long 0x3f013000
> +        .long 0x3f05f000
> +        .long 0x3f0aa000
> +        .long 0x3f0f4000
> +        .long 0x3f13d000
> +        .long 0x3f184000
> +        .long 0x3f1ca000
> +        .long 0x3f20f000
> +        .long 0x3f252000
> +        .long 0x3f295000
> +        .long 0x3f2d7000
> +        /*== Log_tbl_L ==*/
> +        .align 64
> +        .long 0x00000000
> +        .long 0x3726c39e
> +        .long 0x38a30c01
> +        .long 0x37528ae5
> +        .long 0x38e0edc5
> +        .long 0xb8ab41f8
> +        .long 0xb7cf8f58
> +        .long 0x3896a73d
> +        .long 0xb5838656
> +        .long 0x380c36af
> +        .long 0xb8235454
> +        .long 0x3862bae1
> +        .long 0x38c5e10e
> +        .long 0x38dedfac
> +        .long 0x38ebfb5e
> +        .long 0xb8e63c9f
> +        .long 0xb85c1340
> +        .long 0x38777bcd
> +        .long 0xb6038656
> +        .long 0x37d40984
> +        .long 0xb8b85028
> +        .long 0xb8ad5a5a
> +        .long 0x3865c84a
> +        .long 0x38c3d2f5
> +        .long 0x383ebce1
> +        .long 0xb8a1ed76
> +        .long 0xb7a332c4
> +        .long 0xb779654f
> +        .long 0xb8602f73
> +        .long 0x38f85db0
> +        .long 0x37b4996f
> +        .long 0xb8bfb3ca
> +        /*== One ==*/
> +        .align 64
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== AbsMask ==*/
> +        .align 64
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> +        /*== AddB5 ==*/
> +        .align 64
> +        .long 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000
> +        /*== RcpBitMask ==*/
> +        .align 64
> +        .long 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000
> +        /*== poly_coeff3 ==*/
> +        .align 64
> +        .long 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810
> +        /*== poly_coeff2 ==*/
> +        .align 64
> +        .long 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e
> +        /*== poly_coeff1 ==*/
> +        .align 64
> +        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000
> +        /*== poly_coeff0 ==*/
> +        .align 64
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== Half ==*/
> +        .align 64
> +        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000
> +        /*== L2H = log(2)_high ==*/
> +        .align 64
> +        .long 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000
> +        /*== L2L = log(2)_low ==*/
> +        .align 64
> +        .long 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4
> +        .align 64
> +        .type	__svml_satanh_data_internal_avx512,@object
> +        .size	__svml_satanh_data_internal_avx512,.-__svml_satanh_data_internal_avx512
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core-sse2.S
> new file mode 100644
> index 0000000000..b750092887
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized atanhf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN4v_atanhf _ZGVbN4v_atanhf_sse2
> +#include "../svml_s_atanhf4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core.c
> new file mode 100644
> index 0000000000..46624c48cd
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized atanhf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN4v_atanhf
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN4v_atanhf, __GI__ZGVbN4v_atanhf,
> +	       __redirect__ZGVbN4v_atanhf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core_sse4.S
> new file mode 100644
> index 0000000000..77e46cb5b9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf4_core_sse4.S
> @@ -0,0 +1,361 @@
> +/* Function atanhf vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute atanh(x) as 0.5 * log((1 + x)/(1 - x))
> + *
> + *   Special cases:
> + *
> + *   atanh(0)  = 0
> + *   atanh(+1) = +INF
> + *   atanh(-1) = -INF
> + *   atanh(x)  = NaN if |x| > 1, or if x is a NaN or INF
> + *
> + */
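
Unlike the AVX-512 kernels, this path goes through log1p directly; in scalar
terms the main branch behaves roughly like the sketch below, with lanes
outside (-1, 1) (and NaNs) deferred to the scalar atanhf via the callout:

  #include <math.h>

  static float
  atanhf_model (float x)
  {
    float ax = fabsf (x);
    if (!(ax < 1.0f))           /* NaN or |x| >= 1: callout path */
      return atanhf (x);
    /* atanh(x) = 0.5 * log ((1+x)/(1-x)) = 0.5 * log1p (2*|x| / (1-|x|)) */
    return copysignf (0.5f * log1pf (2.0f * ax / (1.0f - ax)), x);
  }
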
> +
> +/* Offsets for data table __svml_satanh_data_internal
> + */
> +#define SgnMask                       	0
> +#define sOne                          	16
> +#define sPoly                         	32
> +#define iBrkValue                     	160
> +#define iOffExpoMask                  	176
> +#define sHalf                         	192
> +#define sSign                         	208
> +#define sTopMask12                    	224
> +#define TinyRange                     	240
> +#define sLn2                          	256
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN4v_atanhf_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +        movaps    %xmm0, %xmm5
> +
> +/* Load constants including One = 1 */
> +        movups    sOne+__svml_satanh_data_internal(%rip), %xmm4
> +        movaps    %xmm5, %xmm3
> +
> +/* Strip off the sign, so treat X as positive until right at the end */
> +        movups    SgnMask+__svml_satanh_data_internal(%rip), %xmm7
> +        movaps    %xmm4, %xmm8
> +        andps     %xmm5, %xmm7
> +        movaps    %xmm4, %xmm10
> +        movups    sTopMask12+__svml_satanh_data_internal(%rip), %xmm11
> +        movaps    %xmm4, %xmm14
> +        movaps    %xmm11, %xmm9
> +
> +/*
> + * Compute V = 2 * X trivially, and UHi + U_lo = 1 - X in two pieces,
> + * the upper part UHi being <= 12 bits long. Then we have
> + * atanh(X) = 1/2 * log((1 + X) / (1 - X)) = 1/2 * log1p(V / (UHi + ULo)).
> + */
> +        movaps    %xmm7, %xmm12
> +
> +/*
> + * Check whether |X| < 1, in which case we use the main function.
> + * Otherwise set the rangemask so that the callout will get used.
> + * Note that this will also use the callout for NaNs since not(NaN < 1).
> + */
> +        movaps    %xmm7, %xmm6
> +        movaps    %xmm7, %xmm2
> +        cmpnltps  %xmm4, %xmm6
> +        cmpltps   TinyRange+__svml_satanh_data_internal(%rip), %xmm2
> +        mulps     %xmm5, %xmm3
> +        subps     %xmm7, %xmm8
> +        addps     %xmm7, %xmm12
> +        movmskps  %xmm6, %edx
> +        subps     %xmm8, %xmm10
> +        addps     %xmm5, %xmm3
> +        subps     %xmm7, %xmm10
> +        andps     %xmm8, %xmm9
> +
> +/*
> + * Now we feed into the log1p code, using H in place of _VARG1 and
> + * later incorporating L into the reduced argument.
> + * compute 1+x as high, low parts
> + */
> +        movaps    %xmm4, %xmm7
> +
> +/*
> + * Now compute R = 1/(UHi+ULo) * (1 - E) and the error term E
> + * The first FMR is exact (we force R to 12 bits just in case it
> + * isn't already, to make absolutely sure), and since E is ~ 2^-12,
> + * the rounding error in the other one is acceptable.
> + */
> +        rcpps     %xmm9, %xmm15
> +        subps     %xmm9, %xmm8
> +        andps     %xmm11, %xmm15
> +
> +/*
> + * Split V as well into upper 12 bits and lower part, so that we can get
> + * a preliminary quotient estimate without rounding error.
> + */
> +        andps     %xmm12, %xmm11
> +        mulps     %xmm15, %xmm9
> +        addps     %xmm8, %xmm10
> +        subps     %xmm11, %xmm12
> +
> +/* Hence get initial quotient estimate QHi + QLo = R * VHi + R * VLo */
> +        mulps     %xmm15, %xmm11
> +        mulps     %xmm15, %xmm10
> +        subps     %xmm9, %xmm14
> +        mulps     %xmm12, %xmm15
> +        subps     %xmm10, %xmm14
> +
> +/* Compute D = E + E^2 */
> +        movaps    %xmm14, %xmm13
> +        movaps    %xmm4, %xmm8
> +        mulps     %xmm14, %xmm13
> +
> +/* reduction: compute r,n */
> +        movdqu    iBrkValue+__svml_satanh_data_internal(%rip), %xmm9
> +        addps     %xmm13, %xmm14
> +
> +/*
> + * Compute R * (VHi + VLo) * (1 + E + E^2)
> + * = R *  (VHi + VLo) * (1 + D)
> + * = QHi + (QHi * D + QLo + QLo * D)
> + */
> +        movaps    %xmm14, %xmm0
> +        mulps     %xmm15, %xmm14
> +        mulps     %xmm11, %xmm0
> +        addps     %xmm14, %xmm15
> +        movdqu    iOffExpoMask+__svml_satanh_data_internal(%rip), %xmm12
> +        movaps    %xmm4, %xmm14
> +
> +/* Record the sign for eventual reincorporation. */
> +        movups    sSign+__svml_satanh_data_internal(%rip), %xmm1
> +        addps     %xmm15, %xmm0
> +
> +/*
> + * Now finally accumulate the high and low parts of the
> + * argument to log1p, H + L, with a final compensated summation.
> + */
> +        movaps    %xmm0, %xmm6
> +        andps     %xmm5, %xmm1
> +
> +/* Or the sign bit in with the tiny result to handle atanh(-0) correctly */
> +        orps      %xmm1, %xmm3
> +        addps     %xmm11, %xmm6
> +        maxps     %xmm6, %xmm7
> +        minps     %xmm6, %xmm8
> +        subps     %xmm6, %xmm11
> +        movaps    %xmm7, %xmm10
> +        andps     %xmm2, %xmm3
> +        addps     %xmm8, %xmm10
> +        addps     %xmm11, %xmm0
> +        subps     %xmm10, %xmm7
> +        psubd     %xmm9, %xmm10
> +        addps     %xmm7, %xmm8
> +        pand      %xmm10, %xmm12
> +        psrad     $23, %xmm10
> +        cvtdq2ps  %xmm10, %xmm13
> +        addps     %xmm8, %xmm0
> +
> +/* final reconstruction */
> +        mulps     sLn2+__svml_satanh_data_internal(%rip), %xmm13
> +        pslld     $23, %xmm10
> +        paddd     %xmm9, %xmm12
> +        psubd     %xmm10, %xmm14
> +
> +/* polynomial evaluation */
> +        subps     %xmm4, %xmm12
> +        mulps     %xmm0, %xmm14
> +        movups    sPoly+112+__svml_satanh_data_internal(%rip), %xmm0
> +        addps     %xmm12, %xmm14
> +        mulps     %xmm14, %xmm0
> +
> +/* Finally, halve the result and reincorporate the sign */
> +        movups    sHalf+__svml_satanh_data_internal(%rip), %xmm4
> +        pxor      %xmm1, %xmm4
> +        addps     sPoly+96+__svml_satanh_data_internal(%rip), %xmm0
> +        mulps     %xmm14, %xmm0
> +        addps     sPoly+80+__svml_satanh_data_internal(%rip), %xmm0
> +        mulps     %xmm14, %xmm0
> +        addps     sPoly+64+__svml_satanh_data_internal(%rip), %xmm0
> +        mulps     %xmm14, %xmm0
> +        addps     sPoly+48+__svml_satanh_data_internal(%rip), %xmm0
> +        mulps     %xmm14, %xmm0
> +        addps     sPoly+32+__svml_satanh_data_internal(%rip), %xmm0
> +        mulps     %xmm14, %xmm0
> +        addps     sPoly+16+__svml_satanh_data_internal(%rip), %xmm0
> +        mulps     %xmm14, %xmm0
> +        addps     sPoly+__svml_satanh_data_internal(%rip), %xmm0
> +        mulps     %xmm14, %xmm0
> +        mulps     %xmm14, %xmm0
> +        addps     %xmm0, %xmm14
> +        movaps    %xmm2, %xmm0
> +        addps     %xmm13, %xmm14
> +        mulps     %xmm14, %xmm4
> +        andnps    %xmm4, %xmm0
> +        orps      %xmm3, %xmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm5
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm5, 32(%rsp)
> +        movups    %xmm0, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      atanhf@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 48(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN4v_atanhf_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_satanh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(16)) VUINT32 SgnMask[4][1];
> +        __declspec(align(16)) VUINT32 sOne[4][1];
> +        __declspec(align(16)) VUINT32 sPoly[8][4][1];
> +        __declspec(align(16)) VUINT32 iBrkValue[4][1];
> +        __declspec(align(16)) VUINT32 iOffExpoMask[4][1];
> +        __declspec(align(16)) VUINT32 sHalf[4][1];
> +        __declspec(align(16)) VUINT32 sSign[4][1];
> +        __declspec(align(16)) VUINT32 sTopMask12[4][1];
> +        __declspec(align(16)) VUINT32 TinyRange[4][1];
> +        __declspec(align(16)) VUINT32 sLn2[4][1];
> +} __svml_satanh_data_internal;
> +#endif
> +__svml_satanh_data_internal:
> +        /*== SgnMask ==*/
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> +        /*== sOne = SP 1.0 ==*/
> +        .align 16
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== sPoly[] = SP polynomial ==*/
> +        .align 16
> +        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
> +        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
> +        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
> +        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
> +        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
> +        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
> +        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
> +        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
> +        /*== iBrkValue = SP 2/3 ==*/
> +        .align 16
> +        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
> +        /*== iOffExpoMask = SP significand mask ==*/
> +        .align 16
> +        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
> +        /*== sHalf ==*/
> +        .align 16
> +        .long 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000
> +        /*== sSign ==*/
> +        .align 16
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000
> +        /*== sTopMask12 ==*/
> +        .align 16
> +        .long 0xFFFFF000, 0xFFFFF000, 0xFFFFF000, 0xFFFFF000
> +        /*== TinyRange ==*/
> +        .align 16
> +        .long 0x0C000000, 0x0C000000, 0x0C000000, 0x0C000000
> +        /*== sLn2 = SP ln(2) ==*/
> +        .align 16
> +        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
> +        .align 16
> +        .type	__svml_satanh_data_internal,@object
> +        .size	__svml_satanh_data_internal,.-__svml_satanh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core-sse.S
> new file mode 100644
> index 0000000000..b293bd5b41
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized atanhf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN8v_atanhf _ZGVdN8v_atanhf_sse_wrapper
> +#include "../svml_s_atanhf8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core.c
> new file mode 100644
> index 0000000000..3df8d66c94
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized atanhf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN8v_atanhf
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN8v_atanhf, __GI__ZGVdN8v_atanhf,
> +	       __redirect__ZGVdN8v_atanhf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core_avx2.S
> new file mode 100644
> index 0000000000..00225207a8
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core_avx2.S
> @@ -0,0 +1,335 @@
> +/* Function atanhf vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute atanh(x) as 0.5 * log((1 + x)/(1 - x))
> + *
> + *   Special cases:
> + *
> + *   atanh(0)  = 0
> + *   atanh(+1) = +INF
> + *   atanh(-1) = -INF
> + *   atanh(x)  = NaN if |x| > 1, or if x is a NaN or INF
> + *
> + */
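
For reference, the same formula in scalar C — an illustrative sketch with hypothetical names, not part of the patch (the vector code below splits 1 - x and 2 * x into high/low pieces and inlines the log1p evaluation instead of calling libm):

#include <math.h>

/* atanh(x) = 0.5 * log((1 + x) / (1 - x)) = 0.5 * log1p(2x / (1 - x)).  */
static float
atanhf_sketch (float x)
{
  float ax = fabsf (x);
  if (!(ax < 1.0f))
    return atanhf (x);          /* |x| >= 1 or NaN: defer to libm.  */
  float v = 2.0f * ax;          /* V = 2 * X  */
  float u = 1.0f - ax;          /* U = 1 - X  */
  float r = 0.5f * log1pf (v / u);
  return copysignf (r, x);      /* reincorporate the sign */
}
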
> +
> +/* Offsets for data table __svml_satanh_data_internal
> + */
> +#define SgnMask                       	0
> +#define sOne                          	32
> +#define sPoly                         	64
> +#define iBrkValue                     	320
> +#define iOffExpoMask                  	352
> +#define sHalf                         	384
> +#define sSign                         	416
> +#define sTopMask12                    	448
> +#define TinyRange                     	480
> +#define sLn2                          	512
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN8v_atanhf_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +
> +/* Load constants including One = 1 */
> +        vmovups   sOne+__svml_satanh_data_internal(%rip), %ymm5
> +        vmovups   sTopMask12+__svml_satanh_data_internal(%rip), %ymm13
> +        vmovaps   %ymm0, %ymm6
> +
> +/* Strip off the sign, so treat X as positive until right at the end */
> +        vandps    SgnMask+__svml_satanh_data_internal(%rip), %ymm6, %ymm10
> +        vsubps    %ymm10, %ymm5, %ymm1
> +
> +/*
> + * Compute V = 2 * X trivially, and UHi + U_lo = 1 - X in two pieces,
> + * the upper part UHi being <= 12 bits long. Then we have
> + * atanh(X) = 1/2 * log((1 + X) / (1 - X)) = 1/2 * log1p(V / (UHi + ULo)).
> + */
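
The "upper part ... <= 12 bits long" splits are done by masking with sTopMask12 (0xFFFFF000); a scalar sketch of that head/tail split (hypothetical helper, not part of the patch):

#include <stdint.h>
#include <string.h>

/* Keep sign, exponent and the top 11 mantissa bits (12 significant bits
   counting the implicit one); the discarded low bits form the tail.  */
static float
split_hi (float x)
{
  uint32_t u;
  memcpy (&u, &x, sizeof u);
  u &= 0xFFFFF000u;             /* sTopMask12 */
  memcpy (&x, &u, sizeof u);
  return x;
}

/* Usage sketch: vhi = split_hi (v); vlo = v - vhi;  */
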
> +        vaddps    %ymm10, %ymm10, %ymm14
> +
> +/*
> + * Check whether |X| < 1, in which case we use the main function.
> + * Otherwise set the rangemask so that the callout will get used.
> + * Note that this will also use the callout for NaNs since not(NaN < 1).
> + */
> +        vcmpnlt_uqps %ymm5, %ymm10, %ymm7
> +        vsubps    %ymm1, %ymm5, %ymm9
> +        vcmplt_oqps TinyRange+__svml_satanh_data_internal(%rip), %ymm10, %ymm4
> +        vrcpps    %ymm1, %ymm11
> +        vsubps    %ymm10, %ymm9, %ymm12
> +        vandps    %ymm13, %ymm11, %ymm0
> +
> +/* No need to split sU when FMA is available */
> +        vfnmadd213ps %ymm5, %ymm0, %ymm1
> +        vmovaps   %ymm6, %ymm8
> +        vfmadd213ps %ymm6, %ymm6, %ymm8
> +        vfnmadd231ps %ymm0, %ymm12, %ymm1
> +
> +/*
> + * Split V as well into upper 12 bits and lower part, so that we can get
> + * a preliminary quotient estimate without rounding error.
> + */
> +        vandps    %ymm13, %ymm14, %ymm15
> +        vmovmskps %ymm7, %edx
> +        vsubps    %ymm15, %ymm14, %ymm7
> +
> +/* Hence get initial quotient estimate QHi + QLo = R * VHi + R * VLo */
> +        vmulps    %ymm15, %ymm0, %ymm10
> +
> +/* Compute D = E + E^2 */
> +        vfmadd213ps %ymm1, %ymm1, %ymm1
> +
> +/* Record the sign for eventual reincorporation. */
> +        vandps    sSign+__svml_satanh_data_internal(%rip), %ymm6, %ymm3
> +
> +/* Or the sign bit in with the tiny result to handle atanh(-0) correctly */
> +        vorps     %ymm3, %ymm8, %ymm2
> +        vmulps    %ymm7, %ymm0, %ymm8
> +
> +/*
> + * Compute R * (VHi + VLo) * (1 + E + E^2)
> + * = R *  (VHi + VLo) * (1 + D)
> + * = QHi + (QHi * D + QLo + QLo * D)
> + */
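
Equivalently, with E = 1 - R*U for the masked RCPPS estimate R, V/U = R*V/(1 - E) is approximated by R*V*(1 + E + E^2); a scalar sketch of that correction step (hypothetical helper, not part of the patch):

/* Refine a crude reciprocal estimate r ~= 1/u: with e = 1 - r*u,
   v/u = r*v / (1 - e) ~= r*v * (1 + e + e*e).  */
static float
div_refined (float v, float u, float r)
{
  float e = 1.0f - r * u;
  float d = e + e * e;          /* D = E + E^2 */
  return r * v * (1.0f + d);    /* QHi + QHi*D (plus low-order terms) */
}
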
> +        vmulps    %ymm1, %ymm10, %ymm9
> +        vfmadd213ps %ymm8, %ymm8, %ymm1
> +        vaddps    %ymm1, %ymm9, %ymm1
> +
> +/* reduction: compute r,n */
> +        vmovups   iBrkValue+__svml_satanh_data_internal(%rip), %ymm9
> +
> +/*
> + * Now finally accumulate the high and low parts of the
> + * argument to log1p, H + L, with a final compensated summation.
> + */
> +        vaddps    %ymm1, %ymm10, %ymm12
> +        vsubps    %ymm12, %ymm10, %ymm11
> +
> +/*
> + * Now we feed into the log1p code, using H in place of _VARG1 and
> + * later incorporating L into the reduced argument.
> + * compute 1+x as high, low parts
> + */
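
The reduction below is the usual iBrkValue/iOffExpoMask integer trick: write the (positive) argument as 2^n * m with m in roughly [2/3, 4/3), so that log = n*ln2 + P(m - 1). A scalar sketch of the high-part reduction (hypothetical helper, not part of the patch; the vector code additionally rescales the low part L by 2^-n):

#include <stdint.h>
#include <string.h>

static float
log1p_reduce (float y, int *n)
{
  uint32_t iy, k;
  memcpy (&iy, &y, sizeof iy);
  k = iy - 0x3f2aaaabu;                  /* subtract bits of 2/3 (iBrkValue) */
  *n = (int32_t) k >> 23;                /* scaling exponent */
  iy = (k & 0x007fffffu) + 0x3f2aaaabu;  /* mantissa put back near [2/3, 4/3) */
  memcpy (&y, &iy, sizeof iy);
  return y - 1.0f;                       /* reduced argument for the polynomial */
}
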
> +        vmaxps    %ymm12, %ymm5, %ymm13
> +        vminps    %ymm12, %ymm5, %ymm14
> +        vaddps    %ymm11, %ymm1, %ymm0
> +        vaddps    %ymm14, %ymm13, %ymm1
> +        vpsubd    %ymm9, %ymm1, %ymm7
> +        vsubps    %ymm1, %ymm13, %ymm15
> +        vpsrad    $23, %ymm7, %ymm10
> +        vpand     iOffExpoMask+__svml_satanh_data_internal(%rip), %ymm7, %ymm8
> +        vaddps    %ymm15, %ymm14, %ymm13
> +        vpslld    $23, %ymm10, %ymm11
> +        vpaddd    %ymm9, %ymm8, %ymm15
> +        vaddps    %ymm13, %ymm0, %ymm14
> +        vcvtdq2ps %ymm10, %ymm0
> +        vpsubd    %ymm11, %ymm5, %ymm12
> +
> +/* polynomial evaluation */
> +        vsubps    %ymm5, %ymm15, %ymm5
> +        vmulps    %ymm14, %ymm12, %ymm1
> +        vaddps    %ymm5, %ymm1, %ymm5
> +        vmovups   sPoly+224+__svml_satanh_data_internal(%rip), %ymm1
> +        vfmadd213ps sPoly+192+__svml_satanh_data_internal(%rip), %ymm5, %ymm1
> +        vfmadd213ps sPoly+160+__svml_satanh_data_internal(%rip), %ymm5, %ymm1
> +        vfmadd213ps sPoly+128+__svml_satanh_data_internal(%rip), %ymm5, %ymm1
> +        vfmadd213ps sPoly+96+__svml_satanh_data_internal(%rip), %ymm5, %ymm1
> +        vfmadd213ps sPoly+64+__svml_satanh_data_internal(%rip), %ymm5, %ymm1
> +        vfmadd213ps sPoly+32+__svml_satanh_data_internal(%rip), %ymm5, %ymm1
> +        vfmadd213ps sPoly+__svml_satanh_data_internal(%rip), %ymm5, %ymm1
> +        vmulps    %ymm1, %ymm5, %ymm7
> +        vfmadd213ps %ymm5, %ymm5, %ymm7
> +
> +/* final reconstruction */
> +        vfmadd132ps sLn2+__svml_satanh_data_internal(%rip), %ymm7, %ymm0
> +
> +/* Finally, halve the result and reincorporate the sign */
> +        vxorps    sHalf+__svml_satanh_data_internal(%rip), %ymm3, %ymm3
> +        vmulps    %ymm0, %ymm3, %ymm0
> +        vblendvps %ymm4, %ymm2, %ymm0, %ymm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm6
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %ymm6, 32(%rsp)
> +        vmovups   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      atanhf@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 64(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN8v_atanhf_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_satanh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(32)) VUINT32 SgnMask[8][1];
> +        __declspec(align(32)) VUINT32 sOne[8][1];
> +        __declspec(align(32)) VUINT32 sPoly[8][8][1];
> +        __declspec(align(32)) VUINT32 iBrkValue[8][1];
> +        __declspec(align(32)) VUINT32 iOffExpoMask[8][1];
> +        __declspec(align(32)) VUINT32 sHalf[8][1];
> +        __declspec(align(32)) VUINT32 sSign[8][1];
> +        __declspec(align(32)) VUINT32 sTopMask12[8][1];
> +        __declspec(align(32)) VUINT32 TinyRange[8][1];
> +        __declspec(align(32)) VUINT32 sLn2[8][1];
> +} __svml_satanh_data_internal;
> +#endif
> +__svml_satanh_data_internal:
> +        /*== SgnMask ==*/
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> +        /*== sOne = SP 1.0 ==*/
> +        .align 32
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== sPoly[] = SP polynomial ==*/
> +        .align 32
> +        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
> +        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
> +        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
> +        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
> +        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
> +        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
> +        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
> +        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
> +        /*== iBrkValue = SP 2/3 ==*/
> +        .align 32
> +        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
> +        /*== iOffExpoMask = SP significand mask ==*/
> +        .align 32
> +        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
> +        /*== sHalf ==*/
> +        .align 32
> +        .long 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000
> +        /*== sSign ==*/
> +        .align 32
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000
> +        /*== sTopMask12 ==*/
> +        .align 32
> +        .long 0xFFFFF000, 0xFFFFF000, 0xFFFFF000, 0xFFFFF000, 0xFFFFF000, 0xFFFFF000, 0xFFFFF000, 0xFFFFF000
> +        /*== TinyRange ==*/
> +        .align 32
> +        .long 0x0C000000, 0x0C000000, 0x0C000000, 0x0C000000, 0x0C000000, 0x0C000000, 0x0C000000, 0x0C000000
> +        /*== sLn2 = SP ln(2) ==*/
> +        .align 32
> +        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
> +        .align 32
> +        .type	__svml_satanh_data_internal,@object
> +        .size	__svml_satanh_data_internal,.-__svml_satanh_data_internal
> diff --git a/sysdeps/x86_64/fpu/svml_d_atanh2_core.S b/sysdeps/x86_64/fpu/svml_d_atanh2_core.S
> new file mode 100644
> index 0000000000..36f549ddd9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_atanh2_core.S
> @@ -0,0 +1,29 @@
> +/* Function atanh vectorized with SSE2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN2v_atanh)
> +WRAPPER_IMPL_SSE2 atanh
> +END (_ZGVbN2v_atanh)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN2v_atanh)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_atanh4_core.S b/sysdeps/x86_64/fpu/svml_d_atanh4_core.S
> new file mode 100644
> index 0000000000..6d6d11e85e
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_atanh4_core.S
> @@ -0,0 +1,29 @@
> +/* Function atanh vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN4v_atanh)
> +WRAPPER_IMPL_AVX _ZGVbN2v_atanh
> +END (_ZGVdN4v_atanh)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN4v_atanh)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_atanh4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_atanh4_core_avx.S
> new file mode 100644
> index 0000000000..b4cfa275c8
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_atanh4_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function atanh vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVcN4v_atanh)
> +WRAPPER_IMPL_AVX _ZGVbN2v_atanh
> +END (_ZGVcN4v_atanh)
> diff --git a/sysdeps/x86_64/fpu/svml_d_atanh8_core.S b/sysdeps/x86_64/fpu/svml_d_atanh8_core.S
> new file mode 100644
> index 0000000000..b31a6a72a1
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_atanh8_core.S
> @@ -0,0 +1,25 @@
> +/* Function atanh vectorized with AVX-512, wrapper to AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN8v_atanh)
> +WRAPPER_IMPL_AVX512 _ZGVdN4v_atanh
> +END (_ZGVeN8v_atanh)
> diff --git a/sysdeps/x86_64/fpu/svml_s_atanhf16_core.S b/sysdeps/x86_64/fpu/svml_s_atanhf16_core.S
> new file mode 100644
> index 0000000000..2ea61888e7
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_atanhf16_core.S
> @@ -0,0 +1,25 @@
> +/* Function atanhf vectorized with AVX-512. Wrapper to AVX2 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN16v_atanhf)
> +WRAPPER_IMPL_AVX512 _ZGVdN8v_atanhf
> +END (_ZGVeN16v_atanhf)
> diff --git a/sysdeps/x86_64/fpu/svml_s_atanhf4_core.S b/sysdeps/x86_64/fpu/svml_s_atanhf4_core.S
> new file mode 100644
> index 0000000000..6904cc388a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_atanhf4_core.S
> @@ -0,0 +1,29 @@
> +/* Function atanhf vectorized with SSE2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN4v_atanhf)
> +WRAPPER_IMPL_SSE2 atanhf
> +END (_ZGVbN4v_atanhf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN4v_atanhf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_atanhf8_core.S b/sysdeps/x86_64/fpu/svml_s_atanhf8_core.S
> new file mode 100644
> index 0000000000..31d695fb5d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_atanhf8_core.S
> @@ -0,0 +1,29 @@
> +/* Function atanhf vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN8v_atanhf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_atanhf
> +END (_ZGVdN8v_atanhf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN8v_atanhf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_atanhf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_atanhf8_core_avx.S
> new file mode 100644
> index 0000000000..6c24eaf45c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_atanhf8_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function atanhf vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +        .text
> +ENTRY (_ZGVcN8v_atanhf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_atanhf
> +END (_ZGVcN8v_atanhf)
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx.c
> new file mode 100644
> index 0000000000..0bdeec7851
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-atanh.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx2.c
> new file mode 100644
> index 0000000000..0bdeec7851
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx2.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-atanh.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx512f.c
> new file mode 100644
> index 0000000000..0bdeec7851
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-atanh-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-atanh.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atanh.c b/sysdeps/x86_64/fpu/test-double-libmvec-atanh.c
> new file mode 100644
> index 0000000000..41dd8e7af3
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-atanh.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE double
> +#define LIBMVEC_FUNC atanh
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> index 38359b05e3..04a4fe654b 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> @@ -41,6 +41,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVbN2vv_atan2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVbN2v_log10)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVbN2v_log2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVbN2v_log1p)
> +VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVbN2v_atanh)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> index 17701e7731..f9ac2fad5d 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> @@ -44,6 +44,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVdN4vv_atan2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVdN4v_log10)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVdN4v_log2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVdN4v_log1p)
> +VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVdN4v_atanh)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m256i
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> index bba62b2446..185801fa82 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> @@ -41,6 +41,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVcN4vv_atan2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVcN4v_log10)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVcN4v_log2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVcN4v_log1p)
> +VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVcN4v_atanh)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> index 8a04e13a07..1cc8aaecbf 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> @@ -41,6 +41,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVeN8vv_atan2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVeN8v_log10)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVeN8v_log2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVeN8v_log1p)
> +VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVeN8v_atanh)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m512i
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx.c
> new file mode 100644
> index 0000000000..6f89ae70f2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-atanhf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx2.c
> new file mode 100644
> index 0000000000..6f89ae70f2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx2.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-atanhf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx512f.c
> new file mode 100644
> index 0000000000..6f89ae70f2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-atanhf-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-atanhf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atanhf.c b/sysdeps/x86_64/fpu/test-float-libmvec-atanhf.c
> new file mode 100644
> index 0000000000..33a022adb8
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-atanhf.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE float
> +#define LIBMVEC_FUNC atanhf
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> index 706f52c618..b5d76d80e0 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> @@ -41,6 +41,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVeN16vv_atan2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVeN16v_log10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVeN16v_log2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVeN16v_log1pf)
> +VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVeN16v_atanhf)
>  
>  #define VEC_INT_TYPE __m512i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> index ceace4c53a..c1df6a03c1 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> @@ -41,6 +41,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVbN4vv_atan2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVbN4v_log10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVbN4v_log2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVbN4v_log1pf)
> +VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVbN4v_atanhf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> index 06a4753409..f4c646683f 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> @@ -44,6 +44,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVdN8vv_atan2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVdN8v_log10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVdN8v_log2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVdN8v_log1pf)
> +VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVdN8v_atanhf)
>  
>  /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
>  #undef VECTOR_WRAPPER_fFF
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> index a87e5298e0..a6acd3ffca 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> @@ -41,6 +41,7 @@ VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVcN8vv_atan2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVcN8v_log10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVcN8v_log2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVcN8v_log1pf)
> +VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVcN8v_atanhf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> -- 
> 2.31.1
> 

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.


H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 06/18] x86-64: Add vector cosh/coshf implementation to libmvec
  2021-12-29  6:39 ` [PATCH v5 06/18] x86-64: Add vector cosh/coshf " Sunil K Pandey
@ 2021-12-29 21:26   ` H.J. Lu
  0 siblings, 0 replies; 40+ messages in thread
From: H.J. Lu @ 2021-12-29 21:26 UTC (permalink / raw)
  To: Sunil K Pandey; +Cc: libc-alpha, hjl.tools, andrey.kolesov, marius.cornea

On Tue, Dec 28, 2021 at 10:39:48PM -0800, Sunil K Pandey wrote:
> Implement vectorized cosh/coshf containing SSE, AVX, AVX2 and
> AVX512 versions for libmvec as per vector ABI.  It also contains
> accuracy and ABI tests for vector cosh/coshf with regenerated ulps.
> ---
>  bits/libm-simd-decl-stubs.h                   |  11 +
>  math/bits/mathcalls.h                         |   2 +-
>  .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
>  sysdeps/x86/fpu/bits/math-vector.h            |   4 +
>  .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
>  sysdeps/x86_64/fpu/Makeconfig                 |   1 +
>  sysdeps/x86_64/fpu/Versions                   |   2 +
>  sysdeps/x86_64/fpu/libm-test-ulps             |  20 +
>  .../fpu/multiarch/svml_d_cosh2_core-sse2.S    |  20 +
>  .../x86_64/fpu/multiarch/svml_d_cosh2_core.c  |  27 ++
>  .../fpu/multiarch/svml_d_cosh2_core_sse4.S    | 396 +++++++++++++++++
>  .../fpu/multiarch/svml_d_cosh4_core-sse.S     |  20 +
>  .../x86_64/fpu/multiarch/svml_d_cosh4_core.c  |  27 ++
>  .../fpu/multiarch/svml_d_cosh4_core_avx2.S    | 412 ++++++++++++++++++
>  .../fpu/multiarch/svml_d_cosh8_core-avx2.S    |  20 +
>  .../x86_64/fpu/multiarch/svml_d_cosh8_core.c  |  27 ++
>  .../fpu/multiarch/svml_d_cosh8_core_avx512.S  | 323 ++++++++++++++
>  .../fpu/multiarch/svml_s_coshf16_core-avx2.S  |  20 +
>  .../fpu/multiarch/svml_s_coshf16_core.c       |  28 ++
>  .../multiarch/svml_s_coshf16_core_avx512.S    | 321 ++++++++++++++
>  .../fpu/multiarch/svml_s_coshf4_core-sse2.S   |  20 +
>  .../x86_64/fpu/multiarch/svml_s_coshf4_core.c |  28 ++
>  .../fpu/multiarch/svml_s_coshf4_core_sse4.S   | 305 +++++++++++++
>  .../fpu/multiarch/svml_s_coshf8_core-sse.S    |  20 +
>  .../x86_64/fpu/multiarch/svml_s_coshf8_core.c |  28 ++
>  .../fpu/multiarch/svml_s_coshf8_core_avx2.S   | 308 +++++++++++++
>  sysdeps/x86_64/fpu/svml_d_cosh2_core.S        |  29 ++
>  sysdeps/x86_64/fpu/svml_d_cosh4_core.S        |  29 ++
>  sysdeps/x86_64/fpu/svml_d_cosh4_core_avx.S    |  25 ++
>  sysdeps/x86_64/fpu/svml_d_cosh8_core.S        |  25 ++
>  sysdeps/x86_64/fpu/svml_s_coshf16_core.S      |  25 ++
>  sysdeps/x86_64/fpu/svml_s_coshf4_core.S       |  29 ++
>  sysdeps/x86_64/fpu/svml_s_coshf8_core.S       |  29 ++
>  sysdeps/x86_64/fpu/svml_s_coshf8_core_avx.S   |  25 ++
>  .../x86_64/fpu/test-double-libmvec-cosh-avx.c |   1 +
>  .../fpu/test-double-libmvec-cosh-avx2.c       |   1 +
>  .../fpu/test-double-libmvec-cosh-avx512f.c    |   1 +
>  sysdeps/x86_64/fpu/test-double-libmvec-cosh.c |   3 +
>  .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
>  .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
>  .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
>  .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
>  .../x86_64/fpu/test-float-libmvec-coshf-avx.c |   1 +
>  .../fpu/test-float-libmvec-coshf-avx2.c       |   1 +
>  .../fpu/test-float-libmvec-coshf-avx512f.c    |   1 +
>  sysdeps/x86_64/fpu/test-float-libmvec-coshf.c |   3 +
>  .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
>  .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
>  .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
>  .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
>  50 files changed, 2637 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_cosh2_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_cosh4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_cosh4_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_cosh8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_coshf16_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_coshf4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_coshf8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_coshf8_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-cosh.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-coshf.c
> 
> diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
> index bc18621f17..35c6ac57a8 100644
> --- a/bits/libm-simd-decl-stubs.h
> +++ b/bits/libm-simd-decl-stubs.h
> @@ -164,4 +164,15 @@
>  #define __DECL_SIMD_exp10f32x
>  #define __DECL_SIMD_exp10f64x
>  #define __DECL_SIMD_exp10f128x
> +
> +#define __DECL_SIMD_cosh
> +#define __DECL_SIMD_coshf
> +#define __DECL_SIMD_coshl
> +#define __DECL_SIMD_coshf16
> +#define __DECL_SIMD_coshf32
> +#define __DECL_SIMD_coshf64
> +#define __DECL_SIMD_coshf128
> +#define __DECL_SIMD_coshf32x
> +#define __DECL_SIMD_coshf64x
> +#define __DECL_SIMD_coshf128x
>  #endif
> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index 870778457f..60a314f69e 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -68,7 +68,7 @@ __MATHCALL (tan,, (_Mdouble_ __x));
>  /* Hyperbolic functions.  */
>  
>  /* Hyperbolic cosine of X.  */
> -__MATHCALL (cosh,, (_Mdouble_ __x));
> +__MATHCALL_VEC (cosh,, (_Mdouble_ __x));
>  /* Hyperbolic sine of X.  */
>  __MATHCALL (sinh,, (_Mdouble_ __x));
>  /* Hyperbolic tangent of X.  */
> diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> index b3c1f59593..4907680143 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> @@ -49,48 +49,56 @@ GLIBC_2.22 _ZGVeN8vvv_sincos F
>  GLIBC_2.35 _ZGVbN2v_acos F
>  GLIBC_2.35 _ZGVbN2v_asin F
>  GLIBC_2.35 _ZGVbN2v_atan F
> +GLIBC_2.35 _ZGVbN2v_cosh F
>  GLIBC_2.35 _ZGVbN2v_exp10 F
>  GLIBC_2.35 _ZGVbN2v_exp2 F
>  GLIBC_2.35 _ZGVbN2vv_hypot F
>  GLIBC_2.35 _ZGVbN4v_acosf F
>  GLIBC_2.35 _ZGVbN4v_asinf F
>  GLIBC_2.35 _ZGVbN4v_atanf F
> +GLIBC_2.35 _ZGVbN4v_coshf F
>  GLIBC_2.35 _ZGVbN4v_exp10f F
>  GLIBC_2.35 _ZGVbN4v_exp2f F
>  GLIBC_2.35 _ZGVbN4vv_hypotf F
>  GLIBC_2.35 _ZGVcN4v_acos F
>  GLIBC_2.35 _ZGVcN4v_asin F
>  GLIBC_2.35 _ZGVcN4v_atan F
> +GLIBC_2.35 _ZGVcN4v_cosh F
>  GLIBC_2.35 _ZGVcN4v_exp10 F
>  GLIBC_2.35 _ZGVcN4v_exp2 F
>  GLIBC_2.35 _ZGVcN4vv_hypot F
>  GLIBC_2.35 _ZGVcN8v_acosf F
>  GLIBC_2.35 _ZGVcN8v_asinf F
>  GLIBC_2.35 _ZGVcN8v_atanf F
> +GLIBC_2.35 _ZGVcN8v_coshf F
>  GLIBC_2.35 _ZGVcN8v_exp10f F
>  GLIBC_2.35 _ZGVcN8v_exp2f F
>  GLIBC_2.35 _ZGVcN8vv_hypotf F
>  GLIBC_2.35 _ZGVdN4v_acos F
>  GLIBC_2.35 _ZGVdN4v_asin F
>  GLIBC_2.35 _ZGVdN4v_atan F
> +GLIBC_2.35 _ZGVdN4v_cosh F
>  GLIBC_2.35 _ZGVdN4v_exp10 F
>  GLIBC_2.35 _ZGVdN4v_exp2 F
>  GLIBC_2.35 _ZGVdN4vv_hypot F
>  GLIBC_2.35 _ZGVdN8v_acosf F
>  GLIBC_2.35 _ZGVdN8v_asinf F
>  GLIBC_2.35 _ZGVdN8v_atanf F
> +GLIBC_2.35 _ZGVdN8v_coshf F
>  GLIBC_2.35 _ZGVdN8v_exp10f F
>  GLIBC_2.35 _ZGVdN8v_exp2f F
>  GLIBC_2.35 _ZGVdN8vv_hypotf F
>  GLIBC_2.35 _ZGVeN16v_acosf F
>  GLIBC_2.35 _ZGVeN16v_asinf F
>  GLIBC_2.35 _ZGVeN16v_atanf F
> +GLIBC_2.35 _ZGVeN16v_coshf F
>  GLIBC_2.35 _ZGVeN16v_exp10f F
>  GLIBC_2.35 _ZGVeN16v_exp2f F
>  GLIBC_2.35 _ZGVeN16vv_hypotf F
>  GLIBC_2.35 _ZGVeN8v_acos F
>  GLIBC_2.35 _ZGVeN8v_asin F
>  GLIBC_2.35 _ZGVeN8v_atan F
> +GLIBC_2.35 _ZGVeN8v_cosh F
>  GLIBC_2.35 _ZGVeN8v_exp10 F
>  GLIBC_2.35 _ZGVeN8v_exp2 F
>  GLIBC_2.35 _ZGVeN8vv_hypot F
> diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
> index f3f9c2e092..708e81b3d0 100644
> --- a/sysdeps/x86/fpu/bits/math-vector.h
> +++ b/sysdeps/x86/fpu/bits/math-vector.h
> @@ -82,6 +82,10 @@
>  #  define __DECL_SIMD_exp10 __DECL_SIMD_x86_64
>  #  undef __DECL_SIMD_exp10f
>  #  define __DECL_SIMD_exp10f __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_cosh
> +#  define __DECL_SIMD_cosh __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_coshf
> +#  define __DECL_SIMD_coshf __DECL_SIMD_x86_64
>  
>  # endif
>  #endif
> diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> index c033abbedc..81d0238ebf 100644
> --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> @@ -40,6 +40,8 @@
>  !GCC$ builtin (exp2f) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (exp10) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (exp10f) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (cosh) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (coshf) attributes simd (notinbranch) if('x86_64')
>  
>  !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
> @@ -65,3 +67,5 @@
>  !GCC$ builtin (exp2f) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (exp10) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (exp10f) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (cosh) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (coshf) attributes simd (notinbranch) if('x32')
> diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> index fd0a9da439..5bc2df134f 100644
> --- a/sysdeps/x86_64/fpu/Makeconfig
> +++ b/sysdeps/x86_64/fpu/Makeconfig
> @@ -26,6 +26,7 @@ libmvec-funcs = \
>    asin \
>    atan \
>    cos \
> +  cosh \
>    exp \
>    exp10 \
>    exp2 \
> diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> index f29cfa4cbf..53346d16a2 100644
> --- a/sysdeps/x86_64/fpu/Versions
> +++ b/sysdeps/x86_64/fpu/Versions
> @@ -17,12 +17,14 @@ libmvec {
>      _ZGVbN2v_acos; _ZGVcN4v_acos; _ZGVdN4v_acos; _ZGVeN8v_acos;
>      _ZGVbN2v_asin; _ZGVcN4v_asin; _ZGVdN4v_asin; _ZGVeN8v_asin;
>      _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan;
> +    _ZGVbN2v_cosh; _ZGVcN4v_cosh; _ZGVdN4v_cosh; _ZGVeN8v_cosh;
>      _ZGVbN2v_exp10; _ZGVcN4v_exp10; _ZGVdN4v_exp10; _ZGVeN8v_exp10;
>      _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
>      _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
>      _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
>      _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
>      _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf;
> +    _ZGVbN4v_coshf; _ZGVcN8v_coshf; _ZGVdN8v_coshf; _ZGVeN16v_coshf;
>      _ZGVbN4v_exp10f; _ZGVcN8v_exp10f; _ZGVdN8v_exp10f; _ZGVeN16v_exp10f;
>      _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
>      _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf;
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index 45f2e4bb53..ac70f15208 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -891,6 +891,26 @@ float: 2
>  float128: 3
>  ldouble: 3
>  
> +Function: "cosh_vlen16":
> +float: 2
> +
> +Function: "cosh_vlen2":
> +double: 2
> +
> +Function: "cosh_vlen4":
> +double: 2
> +float: 2
> +
> +Function: "cosh_vlen4_avx2":
> +double: 2
> +
> +Function: "cosh_vlen8":
> +double: 2
> +float: 2
> +
> +Function: "cosh_vlen8_avx2":
> +float: 2
> +
>  Function: Real part of "cpow":
>  double: 2
>  float: 5
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core-sse2.S
> new file mode 100644
> index 0000000000..bfe4e3d0f0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized cosh, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN2v_cosh _ZGVbN2v_cosh_sse2
> +#include "../svml_d_cosh2_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core.c
> new file mode 100644
> index 0000000000..99561fea47
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized cosh, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN2v_cosh
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN2v_cosh, __GI__ZGVbN2v_cosh, __redirect__ZGVbN2v_cosh)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core_sse4.S
> new file mode 100644
> index 0000000000..150bfae7e1
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh2_core_sse4.S
> @@ -0,0 +1,396 @@
> +/* Function cosh vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute cosh(x) as (exp(x)+exp(-x))/2,
> + *   where exp is calculated as
> + *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
> + *
> + *   Special cases:
> + *
> + *   cosh(NaN) = quiet NaN, and raise invalid exception
> + *   cosh(+/-INF) = +INF
> + *   cosh(0)   = 1
> + *   cosh(x) overflows for |x| larger than about MAXLOG+log(2)
> + *
> + */
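> +
> +/* Illustrative scalar model of the fast path above (not part of the
> +   build).  With K = 8 as in the data below, n = j + 256*m, and letting
> +   exp(r) stand in for the _dPC2.._dPC4 polynomial and exp2(j/256) for
> +   the _dbT entries (the table stores these values divided by 2), each
> +   lane computes roughly:
> +
> +     #include <math.h>
> +
> +     static double
> +     cosh_model (double x)
> +     {
> +       double a = fabs (x);
> +       double n = nearbyint (a * 256.0 / M_LN2); // x*2^K/log(2), rounded
> +       int    j = (int) n & 0xff;                // table index
> +       int    m = (int) n >> 8;                  // power-of-two part
> +       double r = a - n * (M_LN2 / 256.0);       // reduced argument
> +       double e = exp2 (j / 256.0) * exp (r);    // == exp (a - m*log(2))
> +       return 0.5 * (ldexp (e, m) + ldexp (1.0 / e, -m));
> +     }
> +
> +   The assembly keeps n in the low mantissa bits via _dbShifter, moves m
> +   into the exponent field with psllq/pand, and reads 2^(-j/2^K) from
> +   the same table at index 256-j instead of dividing.  */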
> +
> +/* Offsets for data table __svml_dcosh_data_internal
> + */
> +#define _dbT                          	0
> +#define _dbInvLn2                     	2064
> +#define _dbLn2hi                      	2080
> +#define _dbLn2lo                      	2096
> +#define _dbShifter                    	2112
> +#define _iIndexMask                   	2128
> +#define _dPC2                         	2144
> +#define _dPC3                         	2160
> +#define _dPC4                         	2176
> +#define _iMaxIndex                    	2192
> +#define _lExpMask                     	2208
> +#define _dSign                        	2224
> +#define _iDomainRange                 	2240
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN2v_cosh_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +        movaps    %xmm0, %xmm4
> +        movups    _dSign+__svml_dcosh_data_internal(%rip), %xmm2
> +        lea       _dbT+__svml_dcosh_data_internal(%rip), %r8
> +
> +/*  Abs argument  */
> +        movaps    %xmm2, %xmm5
> +
> +/* dXSign=0x0010000000000000 */
> +        psrlq     $11, %xmm2
> +
> +/*
> + *  Load argument
> + * dM = x*2^K/log(2) + RShifter
> + */
> +        movups    _dbInvLn2+__svml_dcosh_data_internal(%rip), %xmm3
> +        andnps    %xmm4, %xmm5
> +        mulpd     %xmm5, %xmm3
> +        movups    _dbShifter+__svml_dcosh_data_internal(%rip), %xmm1
> +        addpd     %xmm1, %xmm3
> +
> +/*
> + *  R
> + * dN = dM - RShifter
> + */
> +        movaps    %xmm3, %xmm15
> +        subpd     %xmm1, %xmm15
> +
> +/* dR = dX - dN*Log2_hi/2^K */
> +        movups    _dbLn2hi+__svml_dcosh_data_internal(%rip), %xmm14
> +        mulpd     %xmm15, %xmm14
> +
> +/* dR = (dX - dN*Log2_hi/2^K) - dN*Log2_lo/2^K */
> +        movups    _dbLn2lo+__svml_dcosh_data_internal(%rip), %xmm1
> +        mulpd     %xmm15, %xmm1
> +
> +/*
> + * Check for overflow/underflow
> + *
> + */
> +        pshufd    $221, %xmm5, %xmm7
> +        subpd     %xmm14, %xmm5
> +        movq      _iIndexMask+__svml_dcosh_data_internal(%rip), %xmm8
> +
> +/*  Index and lookup  */
> +        pshufd    $136, %xmm3, %xmm9
> +
> +/*
> + *  G1,G2,G3: dTdif,dTn * 2^N,2^(-N)
> + * NB: copied from sinh_la - to be optimized!!!!!
> + */
> +        psllq     $44, %xmm3
> +
> +/*
> + * trick
> + * 256 - iIndex (the same table then yields 2^(-j/2^K))
> + */
> +        movq      _iMaxIndex+__svml_dcosh_data_internal(%rip), %xmm12
> +        pand      %xmm8, %xmm9
> +        subpd     %xmm1, %xmm5
> +        psubd     %xmm9, %xmm12
> +
> +/* iIndex *= 8 (scale to byte offset) */
> +        movdqa    %xmm9, %xmm10
> +
> +/* (256-iIndex) *= 8 */
> +        pslld     $3, %xmm12
> +        pslld     $3, %xmm10
> +        movd      %xmm12, %esi
> +        pshufd    $1, %xmm12, %xmm13
> +        movq      _iDomainRange+__svml_dcosh_data_internal(%rip), %xmm6
> +        movd      %xmm13, %edi
> +        pcmpgtd   %xmm6, %xmm7
> +        movmskps  %xmm7, %eax
> +
> +/* dR2 = dR^2 */
> +        movaps    %xmm5, %xmm7
> +
> +/* lM now is an EXP(2^N) */
> +        pand      _lExpMask+__svml_dcosh_data_internal(%rip), %xmm3
> +        pshufd    $1, %xmm10, %xmm11
> +        movslq    %esi, %rsi
> +        mulpd     %xmm5, %xmm7
> +        movd      %xmm10, %edx
> +        movsd     (%r8,%rsi), %xmm6
> +        movd      %xmm11, %ecx
> +        movslq    %edi, %rdi
> +        movslq    %edx, %rdx
> +        movslq    %ecx, %rcx
> +        movhpd    (%r8,%rdi), %xmm6
> +
> +/* dTn *= 2^-N */
> +        psubq     %xmm3, %xmm6
> +
> +/* lX- = EXP(1/2) */
> +        psubq     %xmm2, %xmm6
> +
> +/*
> + * sinh(r) = r +r*r^2*a3 ....
> + * dSinh_r = r^2*a3
> + */
> +        movups    _dPC3+__svml_dcosh_data_internal(%rip), %xmm2
> +        mulpd     %xmm7, %xmm2
> +
> +/* dSinh_r = r + r*r^2*a3 */
> +        mulpd     %xmm5, %xmm2
> +        movsd     (%r8,%rdx), %xmm0
> +        movhpd    (%r8,%rcx), %xmm0
> +        paddq     %xmm3, %xmm0
> +        addpd     %xmm2, %xmm5
> +
> +/* dTn = dTn*2^N - dTn*2^-N */
> +        movaps    %xmm0, %xmm3
> +        subpd     %xmm6, %xmm3
> +
> +/* dTp = dTn*2^N + dTn*2^-N */
> +        addpd     %xmm6, %xmm0
> +        mulpd     %xmm5, %xmm3
> +
> +/* poly(r) = dTp + dTn*sinh(dR)+dTp*dR2*(a2 +a4*dR2) */
> +        movups    _dPC4+__svml_dcosh_data_internal(%rip), %xmm5
> +        mulpd     %xmm7, %xmm5
> +        addpd     _dPC2+__svml_dcosh_data_internal(%rip), %xmm5
> +        mulpd     %xmm5, %xmm7
> +
> +/* dTn*sinh(dR)+dTp*dR2*(a2 +a4*dR2) */
> +        mulpd     %xmm0, %xmm7
> +        addpd     %xmm7, %xmm3
> +
> +/* _VRES1 = dTp + dTn*sinh(dR)+dTp*dR2*(a2 +a4*dR2) */
> +        addpd     %xmm3, %xmm0
> +        andl      $3, %eax
> +
> +/*  Ret H  */
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 eax xmm0 xmm4
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm4, 32(%rsp)
> +        movups    %xmm0, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 eax xmm0
> +
> +        xorl      %edx, %edx
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %edx, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %eax, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $2, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      cosh@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 48(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN2v_cosh_sse4)
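> +
> +/* The special-input path above amounts to the scalar fallback sketched
> +   below (illustrative only; `in' and `out' stand for the stack slots at
> +   32(%rsp) and 48(%rsp), `mask' for the saved movmskps result):
> +
> +     for (int i = 0; i < 2; i++)   // vector length 2
> +       if (mask & (1 << i))        // lane flagged by the range check
> +         out[i] = cosh (in[i]);    // redo that lane with scalar cosh
> +
> +   r12, r13 and r14 carry the loop counter, the mask and the lane index
> +   across the cosh call, hence the cfi_offset/cfi_restore pairs.  */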
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_dcosh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(16)) VUINT32 _dbT[(1 + (1<<8))][2];  //dTpj ONLY!
> +        __declspec(align(16)) VUINT32 _dbInvLn2[2][2];
> +        __declspec(align(16)) VUINT32 _dbLn2hi[2][2];
> +        __declspec(align(16)) VUINT32 _dbLn2lo[2][2];
> +        __declspec(align(16)) VUINT32 _dbShifter[2][2];
> +        __declspec(align(16)) VUINT32 _iIndexMask[4][1];          //(1<<K)-1
> +        __declspec(align(16)) VUINT32 _dPC2[2][2];
> +        __declspec(align(16)) VUINT32 _dPC3[2][2];
> +        __declspec(align(16)) VUINT32 _dPC4[2][2];
> +        __declspec(align(16)) VUINT32 _iMaxIndex[4][1];       //(1<<K)
> +        __declspec(align(16)) VUINT32 _lExpMask[2][2];
> +        __declspec(align(16)) VUINT32 _dSign[2][2];               //0x8000000000000000
> +        __declspec(align(16)) VUINT32 _iDomainRange[4][1];
> +} __svml_dcosh_data_internal;
> +#endif
> +__svml_dcosh_data_internal:
> +        /*== _dbT ==*/
> +        .quad 0x3fe0000000000000, 0x3fe00b1afa5abcbf, 0x3fe0163da9fb3335, 0x3fe02168143b0281
> +        .quad 0x3fe02c9a3e778061, 0x3fe037d42e11bbcc, 0x3fe04315e86e7f85, 0x3fe04e5f72f654b1
> +        .quad 0x3fe059b0d3158574, 0x3fe0650a0e3c1f89, 0x3fe0706b29ddf6de, 0x3fe07bd42b72a836
> +        .quad 0x3fe0874518759bc8, 0x3fe092bdf66607e0, 0x3fe09e3ecac6f383, 0x3fe0a9c79b1f3919
> +        .quad 0x3fe0b5586cf9890f, 0x3fe0c0f145e46c85, 0x3fe0cc922b7247f7, 0x3fe0d83b23395dec
> +        .quad 0x3fe0e3ec32d3d1a2, 0x3fe0efa55fdfa9c5, 0x3fe0fb66affed31b, 0x3fe1073028d7233e
> +        .quad 0x3fe11301d0125b51, 0x3fe11edbab5e2ab6, 0x3fe12abdc06c31cc, 0x3fe136a814f204ab
> +        .quad 0x3fe1429aaea92de0, 0x3fe14e95934f312e, 0x3fe15a98c8a58e51, 0x3fe166a45471c3c2
> +        .quad 0x3fe172b83c7d517b, 0x3fe17ed48695bbc0, 0x3fe18af9388c8dea, 0x3fe1972658375d2f
> +        .quad 0x3fe1a35beb6fcb75, 0x3fe1af99f8138a1c, 0x3fe1bbe084045cd4, 0x3fe1c82f95281c6b
> +        .quad 0x3fe1d4873168b9aa, 0x3fe1e0e75eb44027, 0x3fe1ed5022fcd91d, 0x3fe1f9c18438ce4d
> +        .quad 0x3fe2063b88628cd6, 0x3fe212be3578a819, 0x3fe21f49917ddc96, 0x3fe22bdda27912d1
> +        .quad 0x3fe2387a6e756238, 0x3fe2451ffb82140a, 0x3fe251ce4fb2a63f, 0x3fe25e85711ece75
> +        .quad 0x3fe26b4565e27cdd, 0x3fe2780e341ddf29, 0x3fe284dfe1f56381, 0x3fe291ba7591bb70
> +        .quad 0x3fe29e9df51fdee1, 0x3fe2ab8a66d10f13, 0x3fe2b87fd0dad990, 0x3fe2c57e39771b2f
> +        .quad 0x3fe2d285a6e4030b, 0x3fe2df961f641589, 0x3fe2ecafa93e2f56, 0x3fe2f9d24abd886b
> +        .quad 0x3fe306fe0a31b715, 0x3fe31432edeeb2fd, 0x3fe32170fc4cd831, 0x3fe32eb83ba8ea32
> +        .quad 0x3fe33c08b26416ff, 0x3fe3496266e3fa2d, 0x3fe356c55f929ff1, 0x3fe36431a2de883b
> +        .quad 0x3fe371a7373aa9cb, 0x3fe37f26231e754a, 0x3fe38cae6d05d866, 0x3fe39a401b7140ef
> +        .quad 0x3fe3a7db34e59ff7, 0x3fe3b57fbfec6cf4, 0x3fe3c32dc313a8e5, 0x3fe3d0e544ede173
> +        .quad 0x3fe3dea64c123422, 0x3fe3ec70df1c5175, 0x3fe3fa4504ac801c, 0x3fe40822c367a024
> +        .quad 0x3fe4160a21f72e2a, 0x3fe423fb2709468a, 0x3fe431f5d950a897, 0x3fe43ffa3f84b9d4
> +        .quad 0x3fe44e086061892d, 0x3fe45c2042a7d232, 0x3fe46a41ed1d0057, 0x3fe4786d668b3237
> +        .quad 0x3fe486a2b5c13cd0, 0x3fe494e1e192aed2, 0x3fe4a32af0d7d3de, 0x3fe4b17dea6db7d7
> +        .quad 0x3fe4bfdad5362a27, 0x3fe4ce41b817c114, 0x3fe4dcb299fddd0d, 0x3fe4eb2d81d8abff
> +        .quad 0x3fe4f9b2769d2ca7, 0x3fe508417f4531ee, 0x3fe516daa2cf6642, 0x3fe5257de83f4eef
> +        .quad 0x3fe5342b569d4f82, 0x3fe542e2f4f6ad27, 0x3fe551a4ca5d920f, 0x3fe56070dde910d2
> +        .quad 0x3fe56f4736b527da, 0x3fe57e27dbe2c4cf, 0x3fe58d12d497c7fd, 0x3fe59c0827ff07cc
> +        .quad 0x3fe5ab07dd485429, 0x3fe5ba11fba87a03, 0x3fe5c9268a5946b7, 0x3fe5d84590998b93
> +        .quad 0x3fe5e76f15ad2148, 0x3fe5f6a320dceb71, 0x3fe605e1b976dc09, 0x3fe6152ae6cdf6f4
> +        .quad 0x3fe6247eb03a5585, 0x3fe633dd1d1929fd, 0x3fe6434634ccc320, 0x3fe652b9febc8fb7
> +        .quad 0x3fe6623882552225, 0x3fe671c1c70833f6, 0x3fe68155d44ca973, 0x3fe690f4b19e9538
> +        .quad 0x3fe6a09e667f3bcd, 0x3fe6b052fa75173e, 0x3fe6c012750bdabf, 0x3fe6cfdcddd47645
> +        .quad 0x3fe6dfb23c651a2f, 0x3fe6ef9298593ae5, 0x3fe6ff7df9519484, 0x3fe70f7466f42e87
> +        .quad 0x3fe71f75e8ec5f74, 0x3fe72f8286ead08a, 0x3fe73f9a48a58174, 0x3fe74fbd35d7cbfd
> +        .quad 0x3fe75feb564267c9, 0x3fe77024b1ab6e09, 0x3fe780694fde5d3f, 0x3fe790b938ac1cf6
> +        .quad 0x3fe7a11473eb0187, 0x3fe7b17b0976cfdb, 0x3fe7c1ed0130c132, 0x3fe7d26a62ff86f0
> +        .quad 0x3fe7e2f336cf4e62, 0x3fe7f3878491c491, 0x3fe80427543e1a12, 0x3fe814d2add106d9
> +        .quad 0x3fe82589994cce13, 0x3fe8364c1eb941f7, 0x3fe8471a4623c7ad, 0x3fe857f4179f5b21
> +        .quad 0x3fe868d99b4492ed, 0x3fe879cad931a436, 0x3fe88ac7d98a6699, 0x3fe89bd0a478580f
> +        .quad 0x3fe8ace5422aa0db, 0x3fe8be05bad61778, 0x3fe8cf3216b5448c, 0x3fe8e06a5e0866d9
> +        .quad 0x3fe8f1ae99157736, 0x3fe902fed0282c8a, 0x3fe9145b0b91ffc6, 0x3fe925c353aa2fe2
> +        .quad 0x3fe93737b0cdc5e5, 0x3fe948b82b5f98e5, 0x3fe95a44cbc8520f, 0x3fe96bdd9a7670b3
> +        .quad 0x3fe97d829fde4e50, 0x3fe98f33e47a22a2, 0x3fe9a0f170ca07ba, 0x3fe9b2bb4d53fe0d
> +        .quad 0x3fe9c49182a3f090, 0x3fe9d674194bb8d5, 0x3fe9e86319e32323, 0x3fe9fa5e8d07f29e
> +        .quad 0x3fea0c667b5de565, 0x3fea1e7aed8eb8bb, 0x3fea309bec4a2d33, 0x3fea42c980460ad8
> +        .quad 0x3fea5503b23e255d, 0x3fea674a8af46052, 0x3fea799e1330b358, 0x3fea8bfe53c12e59
> +        .quad 0x3fea9e6b5579fdbf, 0x3feab0e521356eba, 0x3feac36bbfd3f37a, 0x3fead5ff3a3c2774
> +        .quad 0x3feae89f995ad3ad, 0x3feafb4ce622f2ff, 0x3feb0e07298db666, 0x3feb20ce6c9a8952
> +        .quad 0x3feb33a2b84f15fb, 0x3feb468415b749b1, 0x3feb59728de5593a, 0x3feb6c6e29f1c52a
> +        .quad 0x3feb7f76f2fb5e47, 0x3feb928cf22749e4, 0x3feba5b030a1064a, 0x3febb8e0b79a6f1f
> +        .quad 0x3febcc1e904bc1d2, 0x3febdf69c3f3a207, 0x3febf2c25bd71e09, 0x3fec06286141b33d
> +        .quad 0x3fec199bdd85529c, 0x3fec2d1cd9fa652c, 0x3fec40ab5fffd07a, 0x3fec544778fafb22
> +        .quad 0x3fec67f12e57d14b, 0x3fec7ba88988c933, 0x3fec8f6d9406e7b5, 0x3feca3405751c4db
> +        .quad 0x3fecb720dcef9069, 0x3feccb0f2e6d1675, 0x3fecdf0b555dc3fa, 0x3fecf3155b5bab74
> +        .quad 0x3fed072d4a07897c, 0x3fed1b532b08c968, 0x3fed2f87080d89f2, 0x3fed43c8eacaa1d6
> +        .quad 0x3fed5818dcfba487, 0x3fed6c76e862e6d3, 0x3fed80e316c98398, 0x3fed955d71ff6075
> +        .quad 0x3feda9e603db3285, 0x3fedbe7cd63a8315, 0x3fedd321f301b460, 0x3fede7d5641c0658
> +        .quad 0x3fedfc97337b9b5f, 0x3fee11676b197d17, 0x3fee264614f5a129, 0x3fee3b333b16ee12
> +        .quad 0x3fee502ee78b3ff6, 0x3fee653924676d76, 0x3fee7a51fbc74c83, 0x3fee8f7977cdb740
> +        .quad 0x3feea4afa2a490da, 0x3feeb9f4867cca6e, 0x3feecf482d8e67f1, 0x3feee4aaa2188510
> +        .quad 0x3feefa1bee615a27, 0x3fef0f9c1cb6412a, 0x3fef252b376bba97, 0x3fef3ac948dd7274
> +        .quad 0x3fef50765b6e4540, 0x3fef6632798844f8, 0x3fef7bfdad9cbe14, 0x3fef91d802243c89
> +        .quad 0x3fefa7c1819e90d8, 0x3fefbdba3692d514, 0x3fefd3c22b8f71f1, 0x3fefe9d96b2a23d9
> +        .quad 0x3ff0000000000000
> +        .align 16
> +        .quad 0x3ff71547652b82fe, 0x3ff71547652b82fe /* _dbInvLn2 = 1/log(2) */
> +        .align 16
> +        .quad 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000 /* _dbLn2hi  = log(2) hi*/
> +        .align 16
> +        .quad 0xBDAC610CA86C3899, 0xBDAC610CA86C3899 /* _dbLn2lo  = log(2) lo*/
> +        .align 16
> +        .quad 0x42B8000000000000, 0x42B8000000000000 /* _dbShifter */
> +        .align 16
> +        .long 0x000000FF, 0x000000FF, 0x000000FF, 0x000000FF         /* _iIndexMask */
> +        .align 16
> +        .quad 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD /* _dPC2 */
> +        .align 16
> +        .quad 0x3FC5555570813E14, 0x3FC5555570813E14 /* _dPC3 */
> +        .align 16
> +        .quad 0x3FA55555CF16D299, 0x3FA55555CF16D299 /* _dPC4 */
> +        .align 16
> +        .long 0x00000100, 0x00000100, 0x00000100, 0x00000100 /* _iMaxIndex */
> +        .align 16
> +        .quad 0x7ff0000000000000, 0x7ff0000000000000 /* _lExpMask */
> +        .align 16
> +        .quad 0x8000000000000000, 0x8000000000000000 /* _dSign*/
> +        .align 16
> +        .long 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99 /* _iDomainRange 0x40861d9ac12a3e85 =(1021*2^K-0.5)*log(2)/2^K -needed for quick exp*/
> +        .align 16
> +        .type	__svml_dcosh_data_internal,@object
> +        .size	__svml_dcosh_data_internal,.-__svml_dcosh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core-sse.S
> new file mode 100644
> index 0000000000..4410d34583
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized cosh, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN4v_cosh _ZGVdN4v_cosh_sse_wrapper
> +#include "../svml_d_cosh4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core.c
> new file mode 100644
> index 0000000000..c4f59206a9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized cosh, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN4v_cosh
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN4v_cosh, __GI__ZGVdN4v_cosh, __redirect__ZGVdN4v_cosh)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core_avx2.S
> new file mode 100644
> index 0000000000..2d86a02923
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh4_core_avx2.S
> @@ -0,0 +1,412 @@
> +/* Function cosh vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute cosh(x) as (exp(x)+exp(-x))/2,
> + *   where exp is calculated as
> + *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
> + *
> + *   Special cases:
> + *
> + *   cosh(NaN) = quiet NaN, and raise invalid exception
> + *   cosh(+/-INF) = +INF
> + *   cosh(0)   = 1
> + *   cosh(x) overflows for |x| larger than about MAXLOG+log(2)
> + *
> + */
> +
> +/* Offsets for data table __svml_dcosh_data_internal
> + */
> +#define _dbT                          	0
> +#define _dbInvLn2                     	2080
> +#define _dbLn2hi                      	2112
> +#define _dbLn2lo                      	2144
> +#define _dbShifter                    	2176
> +#define _iIndexMask                   	2208
> +#define _dPC2                         	2240
> +#define _dPC3                         	2272
> +#define _dPC4                         	2304
> +#define _iMaxIndex                    	2336
> +#define _lExpMask                     	2368
> +#define _dSign                        	2400
> +#define _iDomainRange                 	2432
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN4v_cosh_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +        lea       _dbT+__svml_dcosh_data_internal(%rip), %rax
> +        vmovupd   _dSign+__svml_dcosh_data_internal(%rip), %ymm8
> +        vmovupd   _dbShifter+__svml_dcosh_data_internal(%rip), %ymm6
> +
> +/*
> + *  Load argument
> + * dM = x*2^K/log(2) + RShifter
> + */
> +        vmovupd   _dbInvLn2+__svml_dcosh_data_internal(%rip), %ymm3
> +
> +/*
> + * trick
> + * 256 - iIndex (the same table then yields 2^(-j/2^K))
> + */
> +        vmovups   _iMaxIndex+__svml_dcosh_data_internal(%rip), %xmm14
> +
> +/* dXSign=0x0010000000000000 */
> +        vpsrlq    $11, %ymm8, %ymm5
> +        vmovapd   %ymm0, %ymm7
> +
> +/*  Abs argument  */
> +        vandnpd   %ymm7, %ymm8, %ymm4
> +        vfmadd213pd %ymm6, %ymm4, %ymm3
> +
> +/*  Index and lookup  */
> +        vextractf128 $1, %ymm3, %xmm12
> +        vshufps   $136, %xmm12, %xmm3, %xmm13
> +        vpand     _iIndexMask+__svml_dcosh_data_internal(%rip), %xmm13, %xmm15
> +        vpsubd    %xmm15, %xmm14, %xmm0
> +
> +/* (256-iIndex) *= 8 */
> +        vpslld    $3, %xmm0, %xmm2
> +        vmovd     %xmm2, %r9d
> +        vpextrd   $2, %xmm2, %r11d
> +        movslq    %r9d, %r9
> +        vpextrd   $1, %xmm2, %r10d
> +        movslq    %r11d, %r11
> +        movslq    %r10d, %r10
> +        vmovsd    (%rax,%r9), %xmm12
> +
> +/*
> + * Check for overflow/underflow
> + *
> + */
> +        vextractf128 $1, %ymm4, %xmm9
> +        vmovsd    (%rax,%r11), %xmm14
> +        vmovhpd   (%rax,%r10), %xmm12, %xmm13
> +        vshufps   $221, %xmm9, %xmm4, %xmm10
> +
> +/* iIndex *= 8 */
> +        vpslld    $3, %xmm15, %xmm9
> +
> +/*
> + *  R
> + * dN = dM - RShifter
> + */
> +        vsubpd    %ymm6, %ymm3, %ymm15
> +        vmovd     %xmm9, %ecx
> +        vpcmpgtd  _iDomainRange+__svml_dcosh_data_internal(%rip), %xmm10, %xmm11
> +        vmovupd   _dbLn2hi+__svml_dcosh_data_internal(%rip), %ymm6
> +
> +/*
> + *  G1,G2,G3: dTdif,dTn * 2^N,2^(-N)
> + * NB: copied from sinh_la - to be optimized!!!!!
> + */
> +        vpsllq    $44, %ymm3, %ymm3
> +        vmovmskps %xmm11, %edx
> +
> +/* dR = dX - dN*Log2_hi/2^K */
> +        vfnmadd231pd %ymm6, %ymm15, %ymm4
> +
> +/* lM now is an EXP(2^N) */
> +        vpand     _lExpMask+__svml_dcosh_data_internal(%rip), %ymm3, %ymm3
> +
> +/* dR = (dX - dN*Log2_hi/2^K) - dN*Log2_lo/2^K */
> +        vfnmadd231pd _dbLn2lo+__svml_dcosh_data_internal(%rip), %ymm15, %ymm4
> +        movslq    %ecx, %rcx
> +        vpextrd   $2, %xmm9, %edi
> +        vpextrd   $1, %xmm9, %esi
> +        movslq    %edi, %rdi
> +        vmovsd    (%rax,%rcx), %xmm1
> +        vpextrd   $3, %xmm9, %r8d
> +        vpextrd   $3, %xmm2, %ecx
> +        movslq    %esi, %rsi
> +        movslq    %r8d, %r8
> +        movslq    %ecx, %rcx
> +
> +/* dR2 = dR^2 */
> +        vmulpd    %ymm4, %ymm4, %ymm0
> +        vmovsd    (%rax,%rdi), %xmm10
> +        vmovhpd   (%rax,%rsi), %xmm1, %xmm8
> +        vmovhpd   (%rax,%r8), %xmm10, %xmm11
> +        vmovhpd   (%rax,%rcx), %xmm14, %xmm2
> +        vinsertf128 $1, %xmm11, %ymm8, %ymm1
> +        vinsertf128 $1, %xmm2, %ymm13, %ymm2
> +        vpaddq    %ymm3, %ymm1, %ymm6
> +
> +/* dTn *= 2^-N */
> +        vpsubq    %ymm3, %ymm2, %ymm1
> +
> +/*
> + * sinh(r) = r +r*r^2*a3 ....
> + * dSinh_r = r^2*a3
> + */
> +        vmulpd    _dPC3+__svml_dcosh_data_internal(%rip), %ymm0, %ymm2
> +
> +/* lX- = EXP(1/2) */
> +        vpsubq    %ymm5, %ymm1, %ymm5
> +
> +/* dSinh_r = r + r*r^2*a3 */
> +        vfmadd213pd %ymm4, %ymm4, %ymm2
> +
> +/* poly(r) = dTp + dTn*sinh(dR)+dTp*dR2*(a2 +a4*dR2) */
> +        vmovupd   _dPC4+__svml_dcosh_data_internal(%rip), %ymm4
> +
> +/* dTn = dTn*2^N - dTn*2^-N */
> +        vsubpd    %ymm5, %ymm6, %ymm1
> +
> +/* dTp = dTn*2^N + dTn*2^-N */
> +        vaddpd    %ymm5, %ymm6, %ymm3
> +        vfmadd213pd _dPC2+__svml_dcosh_data_internal(%rip), %ymm0, %ymm4
> +        vmulpd    %ymm2, %ymm1, %ymm1
> +        vmulpd    %ymm4, %ymm0, %ymm0
> +
> +/* dTn*sinh(dR)+dTp*dR2*(a2 +a4*dR2) */
> +        vfmadd213pd %ymm1, %ymm3, %ymm0
> +
> +/* _VRES1 = dTp + dTn*sinh(dR)+dTp*dR2*(a2 +a4*dR2) */
> +        vaddpd    %ymm0, %ymm3, %ymm0
> +
> +/*  Ret H  */
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm7
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovupd   %ymm7, 32(%rsp)
> +        vmovupd   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovupd   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      cosh@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 64(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN4v_cosh_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_dcosh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(32)) VUINT32 _dbT[(1 + (1<<8))][2];  //dTpj ONLY!
> +        __declspec(align(32)) VUINT32 _dbInvLn2[4][2];
> +        __declspec(align(32)) VUINT32 _dbLn2hi[4][2];
> +        __declspec(align(32)) VUINT32 _dbLn2lo[4][2];
> +        __declspec(align(32)) VUINT32 _dbShifter[4][2];
> +        __declspec(align(32)) VUINT32 _iIndexMask[8][1];          //(1<<K)-1
> +        __declspec(align(32)) VUINT32 _dPC2[4][2];
> +        __declspec(align(32)) VUINT32 _dPC3[4][2];
> +        __declspec(align(32)) VUINT32 _dPC4[4][2];
> +        __declspec(align(32)) VUINT32 _iMaxIndex[8][1];       //(1<<K)
> +        __declspec(align(32)) VUINT32 _lExpMask[4][2];
> +        __declspec(align(32)) VUINT32 _dSign[4][2];               //0x8000000000000000
> +        __declspec(align(32)) VUINT32 _iDomainRange[8][1];
> +} __svml_dcosh_data_internal;
> +#endif
> +__svml_dcosh_data_internal:
> +        /*== _dbT ==*/
> +        .quad 0x3fe0000000000000, 0x3fe00b1afa5abcbf, 0x3fe0163da9fb3335, 0x3fe02168143b0281
> +        .quad 0x3fe02c9a3e778061, 0x3fe037d42e11bbcc, 0x3fe04315e86e7f85, 0x3fe04e5f72f654b1
> +        .quad 0x3fe059b0d3158574, 0x3fe0650a0e3c1f89, 0x3fe0706b29ddf6de, 0x3fe07bd42b72a836
> +        .quad 0x3fe0874518759bc8, 0x3fe092bdf66607e0, 0x3fe09e3ecac6f383, 0x3fe0a9c79b1f3919
> +        .quad 0x3fe0b5586cf9890f, 0x3fe0c0f145e46c85, 0x3fe0cc922b7247f7, 0x3fe0d83b23395dec
> +        .quad 0x3fe0e3ec32d3d1a2, 0x3fe0efa55fdfa9c5, 0x3fe0fb66affed31b, 0x3fe1073028d7233e
> +        .quad 0x3fe11301d0125b51, 0x3fe11edbab5e2ab6, 0x3fe12abdc06c31cc, 0x3fe136a814f204ab
> +        .quad 0x3fe1429aaea92de0, 0x3fe14e95934f312e, 0x3fe15a98c8a58e51, 0x3fe166a45471c3c2
> +        .quad 0x3fe172b83c7d517b, 0x3fe17ed48695bbc0, 0x3fe18af9388c8dea, 0x3fe1972658375d2f
> +        .quad 0x3fe1a35beb6fcb75, 0x3fe1af99f8138a1c, 0x3fe1bbe084045cd4, 0x3fe1c82f95281c6b
> +        .quad 0x3fe1d4873168b9aa, 0x3fe1e0e75eb44027, 0x3fe1ed5022fcd91d, 0x3fe1f9c18438ce4d
> +        .quad 0x3fe2063b88628cd6, 0x3fe212be3578a819, 0x3fe21f49917ddc96, 0x3fe22bdda27912d1
> +        .quad 0x3fe2387a6e756238, 0x3fe2451ffb82140a, 0x3fe251ce4fb2a63f, 0x3fe25e85711ece75
> +        .quad 0x3fe26b4565e27cdd, 0x3fe2780e341ddf29, 0x3fe284dfe1f56381, 0x3fe291ba7591bb70
> +        .quad 0x3fe29e9df51fdee1, 0x3fe2ab8a66d10f13, 0x3fe2b87fd0dad990, 0x3fe2c57e39771b2f
> +        .quad 0x3fe2d285a6e4030b, 0x3fe2df961f641589, 0x3fe2ecafa93e2f56, 0x3fe2f9d24abd886b
> +        .quad 0x3fe306fe0a31b715, 0x3fe31432edeeb2fd, 0x3fe32170fc4cd831, 0x3fe32eb83ba8ea32
> +        .quad 0x3fe33c08b26416ff, 0x3fe3496266e3fa2d, 0x3fe356c55f929ff1, 0x3fe36431a2de883b
> +        .quad 0x3fe371a7373aa9cb, 0x3fe37f26231e754a, 0x3fe38cae6d05d866, 0x3fe39a401b7140ef
> +        .quad 0x3fe3a7db34e59ff7, 0x3fe3b57fbfec6cf4, 0x3fe3c32dc313a8e5, 0x3fe3d0e544ede173
> +        .quad 0x3fe3dea64c123422, 0x3fe3ec70df1c5175, 0x3fe3fa4504ac801c, 0x3fe40822c367a024
> +        .quad 0x3fe4160a21f72e2a, 0x3fe423fb2709468a, 0x3fe431f5d950a897, 0x3fe43ffa3f84b9d4
> +        .quad 0x3fe44e086061892d, 0x3fe45c2042a7d232, 0x3fe46a41ed1d0057, 0x3fe4786d668b3237
> +        .quad 0x3fe486a2b5c13cd0, 0x3fe494e1e192aed2, 0x3fe4a32af0d7d3de, 0x3fe4b17dea6db7d7
> +        .quad 0x3fe4bfdad5362a27, 0x3fe4ce41b817c114, 0x3fe4dcb299fddd0d, 0x3fe4eb2d81d8abff
> +        .quad 0x3fe4f9b2769d2ca7, 0x3fe508417f4531ee, 0x3fe516daa2cf6642, 0x3fe5257de83f4eef
> +        .quad 0x3fe5342b569d4f82, 0x3fe542e2f4f6ad27, 0x3fe551a4ca5d920f, 0x3fe56070dde910d2
> +        .quad 0x3fe56f4736b527da, 0x3fe57e27dbe2c4cf, 0x3fe58d12d497c7fd, 0x3fe59c0827ff07cc
> +        .quad 0x3fe5ab07dd485429, 0x3fe5ba11fba87a03, 0x3fe5c9268a5946b7, 0x3fe5d84590998b93
> +        .quad 0x3fe5e76f15ad2148, 0x3fe5f6a320dceb71, 0x3fe605e1b976dc09, 0x3fe6152ae6cdf6f4
> +        .quad 0x3fe6247eb03a5585, 0x3fe633dd1d1929fd, 0x3fe6434634ccc320, 0x3fe652b9febc8fb7
> +        .quad 0x3fe6623882552225, 0x3fe671c1c70833f6, 0x3fe68155d44ca973, 0x3fe690f4b19e9538
> +        .quad 0x3fe6a09e667f3bcd, 0x3fe6b052fa75173e, 0x3fe6c012750bdabf, 0x3fe6cfdcddd47645
> +        .quad 0x3fe6dfb23c651a2f, 0x3fe6ef9298593ae5, 0x3fe6ff7df9519484, 0x3fe70f7466f42e87
> +        .quad 0x3fe71f75e8ec5f74, 0x3fe72f8286ead08a, 0x3fe73f9a48a58174, 0x3fe74fbd35d7cbfd
> +        .quad 0x3fe75feb564267c9, 0x3fe77024b1ab6e09, 0x3fe780694fde5d3f, 0x3fe790b938ac1cf6
> +        .quad 0x3fe7a11473eb0187, 0x3fe7b17b0976cfdb, 0x3fe7c1ed0130c132, 0x3fe7d26a62ff86f0
> +        .quad 0x3fe7e2f336cf4e62, 0x3fe7f3878491c491, 0x3fe80427543e1a12, 0x3fe814d2add106d9
> +        .quad 0x3fe82589994cce13, 0x3fe8364c1eb941f7, 0x3fe8471a4623c7ad, 0x3fe857f4179f5b21
> +        .quad 0x3fe868d99b4492ed, 0x3fe879cad931a436, 0x3fe88ac7d98a6699, 0x3fe89bd0a478580f
> +        .quad 0x3fe8ace5422aa0db, 0x3fe8be05bad61778, 0x3fe8cf3216b5448c, 0x3fe8e06a5e0866d9
> +        .quad 0x3fe8f1ae99157736, 0x3fe902fed0282c8a, 0x3fe9145b0b91ffc6, 0x3fe925c353aa2fe2
> +        .quad 0x3fe93737b0cdc5e5, 0x3fe948b82b5f98e5, 0x3fe95a44cbc8520f, 0x3fe96bdd9a7670b3
> +        .quad 0x3fe97d829fde4e50, 0x3fe98f33e47a22a2, 0x3fe9a0f170ca07ba, 0x3fe9b2bb4d53fe0d
> +        .quad 0x3fe9c49182a3f090, 0x3fe9d674194bb8d5, 0x3fe9e86319e32323, 0x3fe9fa5e8d07f29e
> +        .quad 0x3fea0c667b5de565, 0x3fea1e7aed8eb8bb, 0x3fea309bec4a2d33, 0x3fea42c980460ad8
> +        .quad 0x3fea5503b23e255d, 0x3fea674a8af46052, 0x3fea799e1330b358, 0x3fea8bfe53c12e59
> +        .quad 0x3fea9e6b5579fdbf, 0x3feab0e521356eba, 0x3feac36bbfd3f37a, 0x3fead5ff3a3c2774
> +        .quad 0x3feae89f995ad3ad, 0x3feafb4ce622f2ff, 0x3feb0e07298db666, 0x3feb20ce6c9a8952
> +        .quad 0x3feb33a2b84f15fb, 0x3feb468415b749b1, 0x3feb59728de5593a, 0x3feb6c6e29f1c52a
> +        .quad 0x3feb7f76f2fb5e47, 0x3feb928cf22749e4, 0x3feba5b030a1064a, 0x3febb8e0b79a6f1f
> +        .quad 0x3febcc1e904bc1d2, 0x3febdf69c3f3a207, 0x3febf2c25bd71e09, 0x3fec06286141b33d
> +        .quad 0x3fec199bdd85529c, 0x3fec2d1cd9fa652c, 0x3fec40ab5fffd07a, 0x3fec544778fafb22
> +        .quad 0x3fec67f12e57d14b, 0x3fec7ba88988c933, 0x3fec8f6d9406e7b5, 0x3feca3405751c4db
> +        .quad 0x3fecb720dcef9069, 0x3feccb0f2e6d1675, 0x3fecdf0b555dc3fa, 0x3fecf3155b5bab74
> +        .quad 0x3fed072d4a07897c, 0x3fed1b532b08c968, 0x3fed2f87080d89f2, 0x3fed43c8eacaa1d6
> +        .quad 0x3fed5818dcfba487, 0x3fed6c76e862e6d3, 0x3fed80e316c98398, 0x3fed955d71ff6075
> +        .quad 0x3feda9e603db3285, 0x3fedbe7cd63a8315, 0x3fedd321f301b460, 0x3fede7d5641c0658
> +        .quad 0x3fedfc97337b9b5f, 0x3fee11676b197d17, 0x3fee264614f5a129, 0x3fee3b333b16ee12
> +        .quad 0x3fee502ee78b3ff6, 0x3fee653924676d76, 0x3fee7a51fbc74c83, 0x3fee8f7977cdb740
> +        .quad 0x3feea4afa2a490da, 0x3feeb9f4867cca6e, 0x3feecf482d8e67f1, 0x3feee4aaa2188510
> +        .quad 0x3feefa1bee615a27, 0x3fef0f9c1cb6412a, 0x3fef252b376bba97, 0x3fef3ac948dd7274
> +        .quad 0x3fef50765b6e4540, 0x3fef6632798844f8, 0x3fef7bfdad9cbe14, 0x3fef91d802243c89
> +        .quad 0x3fefa7c1819e90d8, 0x3fefbdba3692d514, 0x3fefd3c22b8f71f1, 0x3fefe9d96b2a23d9
> +        .quad 0x3ff0000000000000
> +        .align 32
> +        .quad 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe /* _dbInvLn2 = 1/log(2) */
> +        .align 32
> +        .quad 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000 /* _dbLn2hi  = log(2) hi*/
> +        .align 32
> +        .quad 0xBDAC610CA86C3899, 0xBDAC610CA86C3899, 0xBDAC610CA86C3899, 0xBDAC610CA86C3899 /* _dbLn2lo  = log(2) lo*/
> +        .align 32
> +        .quad 0x42B8000000000000, 0x42B8000000000000, 0x42B8000000000000, 0x42B8000000000000 /* _dbShifter */
> +        .align 32
> +        .long 0x000000FF, 0x000000FF, 0x000000FF, 0x000000FF, 0x000000FF, 0x000000FF, 0x000000FF, 0x000000FF         /* _iIndexMask */
> +        .align 32
> +        .quad 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD /* _dPC2 */
> +        .align 32
> +        .quad 0x3FC5555570813E14, 0x3FC5555570813E14, 0x3FC5555570813E14, 0x3FC5555570813E14 /* _dPC3 */
> +        .align 32
> +        .quad 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299 /* _dPC4 */
> +        .align 32
> +        .long 0x00000100, 0x00000100, 0x00000100, 0x00000100, 0x00000100, 0x00000100, 0x00000100, 0x00000100 /* _iMaxIndex */
> +        .align 32
> +        .quad 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000 /* _lExpMask */
> +        .align 32
> +        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000 /* _dSign*/
> +        .align 32
> +        .long 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99 /* _iDomainRange 0x40861d9ac12a3e85 =(1021*2^K-0.5)*log(2)/2^K -needed for quick exp*/
> +        .align 32
> +        .type	__svml_dcosh_data_internal,@object
> +        .size	__svml_dcosh_data_internal,.-__svml_dcosh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core-avx2.S
> new file mode 100644
> index 0000000000..8b385cc297
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized cosh, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN8v_cosh _ZGVeN8v_cosh_avx2_wrapper
> +#include "../svml_d_cosh8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core.c
> new file mode 100644
> index 0000000000..576b3186d5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized cosh, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN8v_cosh
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN8v_cosh, __GI__ZGVeN8v_cosh, __redirect__ZGVeN8v_cosh)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core_avx512.S
> new file mode 100644
> index 0000000000..53040cef9a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_cosh8_core_avx512.S
> @@ -0,0 +1,323 @@
> +/* Function cosh vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute cosh(x) as (exp(x)+exp(-x))/2,
> + *   where exp is calculated as
> + *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
> + *
> + *   Special cases:
> + *
> + *   cosh(NaN) = quiet NaN, and raise invalid exception
> + *   cosh(+/-INF) = +INF
> + *   cosh(0)   = 1
> + *   cosh(x) overflows for |x| larger than about MAXLOG+log(2)
> + *
> + */
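> +
> +/* Note: unlike the SSE4/AVX2 variants, this version uses K = 4: the
> +   16-entry tables _dTp_h/_dTn_h (2^(j/16)/2 and 2^(-j/16)/2) are
> +   selected with vpermt2pd rather than per-lane scalar loads, and the
> +   shorter table is compensated by the higher-degree _dPC2_UISA ..
> +   _dPC7_UISA polynomials.  */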
> +
> +/* Offsets for data table __svml_dcosh_data_internal
> + */
> +#define _dTp_h                        	0
> +#define _dTn_h                        	128
> +#define _dbShifter_UISA               	256
> +#define _dPC2_UISA                    	320
> +#define _dPC3_UISA                    	384
> +#define _dPC4_UISA                    	448
> +#define _dPC5_UISA                    	512
> +#define _dPC6_UISA                    	576
> +#define _dPC7_UISA                    	640
> +#define _dbInvLn2                     	704
> +#define _dbLn2hi                      	768
> +#define _dbLn2lo                      	832
> +#define _dbShifter                    	896
> +#define _dPC2                         	960
> +#define _dPC3                         	1024
> +#define _dPC4                         	1088
> +#define _lExpMask                     	1152
> +#define _dSign                        	1216
> +#define _iDomainRange                 	1280
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN8v_cosh_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovups   _dSign+__svml_dcosh_data_internal(%rip), %zmm11
> +        vmovups   _dbShifter_UISA+__svml_dcosh_data_internal(%rip), %zmm15
> +
> +/*
> + *  Load argument
> + * dM = x*2^K/log(2) + RShifter
> + */
> +        vmovups   _dbInvLn2+__svml_dcosh_data_internal(%rip), %zmm4
> +        vmovups   _dbLn2hi+__svml_dcosh_data_internal(%rip), %zmm2
> +        vmovups   _dbLn2lo+__svml_dcosh_data_internal(%rip), %zmm3
> +        vmovups   _dPC7_UISA+__svml_dcosh_data_internal(%rip), %zmm8
> +        vmovups   _dPC6_UISA+__svml_dcosh_data_internal(%rip), %zmm9
> +        vmovups   _dPC2_UISA+__svml_dcosh_data_internal(%rip), %zmm7
> +        vmovups   _dPC3_UISA+__svml_dcosh_data_internal(%rip), %zmm6
> +        vmovaps   %zmm0, %zmm10
> +
> +/*  Abs argument  */
> +        vandnpd   %zmm10, %zmm11, %zmm5
> +
> +/*  Index and lookup  */
> +        vmovups   __svml_dcosh_data_internal(%rip), %zmm11
> +        vmovups   _dTn_h+__svml_dcosh_data_internal(%rip), %zmm0
> +        vfmadd213pd {rn-sae}, %zmm15, %zmm5, %zmm4
> +
> +/*
> + * Check for overflow/underflow
> + *
> + */
> +        vpsrlq    $32, %zmm5, %zmm12
> +
> +/* dN = dM - RShifter */
> +        vsubpd    {rn-sae}, %zmm15, %zmm4, %zmm1
> +        vpmovqd   %zmm12, %ymm13
> +        vpermt2pd _dTn_h+64+__svml_dcosh_data_internal(%rip), %zmm4, %zmm0
> +        vpermt2pd _dTp_h+64+__svml_dcosh_data_internal(%rip), %zmm4, %zmm11
> +
> +/* dR = dX - dN*Log2_hi/2^K */
> +        vfnmadd231pd {rn-sae}, %zmm2, %zmm1, %zmm5
> +
> +/*
> + * poly(r) = Gmjp*(1 + a2*r^2 + a4*r^4) + Gmjn*(r + a3*r^3 + a5*r^5)
> + *         = Gmjp_h + Gmjp_l + Gmjp*r^2*(a2 + a4*r^2) + Gmjn*(r + r^3*(a3 + a5*r^2))
> + */
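> +
> +/* Here, with Tp = 2^(j/2^K)/2 and Tn = 2^(-j/2^K)/2 (the _dTp_h and
> +   _dTn_h entries), Gmjp = Tp*2^N + Tn*2^-N and Gmjn = Tp*2^N - Tn*2^-N,
> +   so that
> +     cosh(x) = exp(|x|)/2 + exp(-|x|)/2
> +             = Gmjp*cosh(r) + Gmjn*sinh(r),
> +   with cosh(r) and sinh(r) approximated by the even (a2, a4, a6) and
> +   odd (a3, a5, a7) polynomial parts above.  */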
> +        vmovups   _dPC5_UISA+__svml_dcosh_data_internal(%rip), %zmm12
> +        vpsllq    $48, %zmm4, %zmm2
> +
> +/* dR = (dX - dN*Log2_hi/2^K) - dN*Log2_lo/2^K */
> +        vfnmadd231pd {rn-sae}, %zmm3, %zmm1, %zmm5
> +        vmulpd    {rn-sae}, %zmm5, %zmm5, %zmm1
> +        vfmadd231pd {rn-sae}, %zmm1, %zmm8, %zmm12
> +        vmovups   _dPC4_UISA+__svml_dcosh_data_internal(%rip), %zmm8
> +        vfmadd213pd {rn-sae}, %zmm6, %zmm1, %zmm12
> +        vfmadd231pd {rn-sae}, %zmm1, %zmm9, %zmm8
> +        vfmadd213pd {rn-sae}, %zmm7, %zmm1, %zmm8
> +        vpcmpgtd  _iDomainRange+__svml_dcosh_data_internal(%rip), %ymm13, %ymm14
> +        vmovmskps %ymm14, %edx
> +
> +/* dOut=r^2*(a2 + a4*r^2) */
> +        vmulpd    {rn-sae}, %zmm1, %zmm8, %zmm6
> +
> +/* lM now is an EXP(2^N) */
> +        vpandq    _lExpMask+__svml_dcosh_data_internal(%rip), %zmm2, %zmm3
> +        vpaddq    %zmm3, %zmm11, %zmm4
> +        vpsubq    %zmm3, %zmm0, %zmm0
> +        vsubpd    {rn-sae}, %zmm0, %zmm4, %zmm14
> +        vaddpd    {rn-sae}, %zmm0, %zmm4, %zmm13
> +
> +/* dM=r^2*(a3 +a5*r^2) */
> +        vmulpd    {rn-sae}, %zmm1, %zmm12, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm13, %zmm13, %zmm6
> +
> +/* dM= r + r^3*(a3 +a5*r^2) */
> +        vfmadd213pd {rn-sae}, %zmm5, %zmm5, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm6, %zmm14, %zmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm10
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm10, 64(%rsp)
> +        vmovups   %zmm0, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     64(%rsp,%r14,8), %xmm0
> +        call      cosh@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 128(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN8v_cosh_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_dcosh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(64)) VUINT32 _dTp_h[(1<<4)][2];
> +        __declspec(align(64)) VUINT32 _dTn_h[(1<<4)][2];
> +        __declspec(align(64)) VUINT32 _dbShifter_UISA[8][2];
> +        __declspec(align(64)) VUINT32 _dPC2_UISA[8][2];
> +        __declspec(align(64)) VUINT32 _dPC3_UISA[8][2];
> +        __declspec(align(64)) VUINT32 _dPC4_UISA[8][2];
> +        __declspec(align(64)) VUINT32 _dPC5_UISA[8][2];
> +        __declspec(align(64)) VUINT32 _dPC6_UISA[8][2];
> +        __declspec(align(64)) VUINT32 _dPC7_UISA[8][2];
> +        __declspec(align(64)) VUINT32 _dbInvLn2[8][2];
> +        __declspec(align(64)) VUINT32 _dbLn2hi[8][2];
> +        __declspec(align(64)) VUINT32 _dbLn2lo[8][2];
> +        __declspec(align(64)) VUINT32 _dbShifter[8][2];
> +        __declspec(align(64)) VUINT32 _dPC2[8][2];
> +        __declspec(align(64)) VUINT32 _dPC3[8][2];
> +        __declspec(align(64)) VUINT32 _dPC4[8][2];
> +        __declspec(align(64)) VUINT32 _lExpMask[8][2];
> +        __declspec(align(64)) VUINT32 _dSign[8][2];               //0x8000000000000000
> +        __declspec(align(64)) VUINT32 _iDomainRange[16][1];
> +} __svml_dcosh_data_internal;
> +#endif
> +__svml_dcosh_data_internal:
> +        /*== _dTp_h ==*/
> +        .quad 0x3fe0000000000000, 0x3fe0b5586cf9890f, 0x3fe172b83c7d517b, 0x3fe2387a6e756238
> +        .quad 0x3fe306fe0a31b715, 0x3fe3dea64c123422, 0x3fe4bfdad5362a27, 0x3fe5ab07dd485429
> +        .quad 0x3fe6a09e667f3bcd, 0x3fe7a11473eb0187, 0x3fe8ace5422aa0db, 0x3fe9c49182a3f090
> +        .quad 0x3feae89f995ad3ad, 0x3fec199bdd85529c, 0x3fed5818dcfba487, 0x3feea4afa2a490da
> +        /*== _dTn_h ==*/
> +        .align 64
> +        .quad 0x3fe0000000000000, 0x3fdea4afa2a490da, 0x3fdd5818dcfba487, 0x3fdc199bdd85529c
> +        .quad 0x3fdae89f995ad3ad, 0x3fd9c49182a3f090, 0x3fd8ace5422aa0db, 0x3fd7a11473eb0187
> +        .quad 0x3fd6a09e667f3bcd, 0x3fd5ab07dd485429, 0x3fd4bfdad5362a27, 0x3fd3dea64c123422
> +        .quad 0x3fd306fe0a31b715, 0x3fd2387a6e756238, 0x3fd172b83c7d517b, 0x3fd0b5586cf9890f
> +        .align 64
> +        .quad 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000, 0x42F8000000000000 /* _dbShifter_UISA  */
> +        .align 64
> +        .quad 0x3fe0000000000004, 0x3fe0000000000004, 0x3fe0000000000004, 0x3fe0000000000004, 0x3fe0000000000004, 0x3fe0000000000004, 0x3fe0000000000004, 0x3fe0000000000004 /* _dPC2_UISA       */
> +        .align 64
> +        .quad 0x3fc5555555555543, 0x3fc5555555555543, 0x3fc5555555555543, 0x3fc5555555555543, 0x3fc5555555555543, 0x3fc5555555555543, 0x3fc5555555555543, 0x3fc5555555555543 /* _dPC3_UISA       */
> +        .align 64
> +        .quad 0x3fa5555555484f37, 0x3fa5555555484f37, 0x3fa5555555484f37, 0x3fa5555555484f37, 0x3fa5555555484f37, 0x3fa5555555484f37, 0x3fa5555555484f37, 0x3fa5555555484f37 /* _dPC4_UISA       */
> +        .align 64
> +        .quad 0x3f81111111286a0c, 0x3f81111111286a0c, 0x3f81111111286a0c, 0x3f81111111286a0c, 0x3f81111111286a0c, 0x3f81111111286a0c, 0x3f81111111286a0c, 0x3f81111111286a0c /* _dPC5_UISA       */
> +        .align 64
> +        .quad 0x3f56c183da08f116, 0x3f56c183da08f116, 0x3f56c183da08f116, 0x3f56c183da08f116, 0x3f56c183da08f116, 0x3f56c183da08f116, 0x3f56c183da08f116, 0x3f56c183da08f116 /* _dPC6_UISA       */
> +        .align 64
> +        .quad 0x3f2a018d76da03da, 0x3f2a018d76da03da, 0x3f2a018d76da03da, 0x3f2a018d76da03da, 0x3f2a018d76da03da, 0x3f2a018d76da03da, 0x3f2a018d76da03da, 0x3f2a018d76da03da /* _dPC7_UISA       */
> +        .align 64
> +        .quad 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe /* _dbInvLn2 = 1/log(2) */
> +        .align 64
> +        .quad 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000, 0x3FE62E42FEFC0000 /* _dbLn2hi  = log(2) hi*/
> +        .align 64
> +        .quad 0xBDAC610CA86C3899, 0xBDAC610CA86C3899, 0xBDAC610CA86C3899, 0xBDAC610CA86C3899, 0xBDAC610CA86C3899, 0xBDAC610CA86C3899, 0xBDAC610CA86C3899, 0xBDAC610CA86C3899 /* _dbLn2lo  = log(2) lo*/
> +        .align 64
> +        .quad 0x42B8000000000000, 0x42B8000000000000, 0x42B8000000000000, 0x42B8000000000000, 0x42B8000000000000, 0x42B8000000000000, 0x42B8000000000000, 0x42B8000000000000 /* _dbShifter */
> +        .align 64
> +        .quad 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD /* _dPC2 */
> +        .align 64
> +        .quad 0x3FC5555570813E14, 0x3FC5555570813E14, 0x3FC5555570813E14, 0x3FC5555570813E14, 0x3FC5555570813E14, 0x3FC5555570813E14, 0x3FC5555570813E14, 0x3FC5555570813E14 /* _dPC3 */
> +        .align 64
> +        .quad 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299 /* _dPC4 */
> +        .align 64
> +        .quad 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000 /* _lExpMask */
> +        .align 64
> +        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000 /* _dSign*/
> +        .align 64
> +        .long 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99 /* _iDomainRange 0x40861d9ac12a3e85 =(1021*2^K-0.5)*log(2)/2^K -needed for quick exp*/
> +        .align 64
> +        .type	__svml_dcosh_data_internal,@object
> +        .size	__svml_dcosh_data_internal,.-__svml_dcosh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core-avx2.S
> new file mode 100644
> index 0000000000..456d8a129f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized coshf.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN16v_coshf _ZGVeN16v_coshf_avx2_wrapper
> +#include "../svml_s_coshf16_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core.c
> new file mode 100644
> index 0000000000..34c008871a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized coshf, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN16v_coshf
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN16v_coshf, __GI__ZGVeN16v_coshf,
> +	       __redirect__ZGVeN16v_coshf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core_avx512.S
> new file mode 100644
> index 0000000000..276e3cfe4d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf16_core_avx512.S
> @@ -0,0 +1,321 @@
> +/* Function coshf vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute cosh(x) as (exp(x)+exp(-x))/2,
> + *   where exp is calculated as
> + *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
> + *
> + *   Special cases:
> + *
> + *   cosh(NaN) = quiet NaN, and raises the invalid exception
> + *   cosh(+/-INF) = +INF
> + *   cosh(0)   = 1
> + *   cosh(x) overflows to +INF once |x| exceeds about MAXLOG + log(2)
> + *
> + */
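
For anyone skimming the algorithm comment above: per lane this reduces to a
very small scalar computation.  The sketch below is illustrative only -- the
helper name is mine and the scalar expf() stands in for the table-driven
exp used by the vector code:

#include <math.h>

float
coshf_sketch (float x)
{
  /* cosh is even, so work on |x|.  */
  float ax = fabsf (x);
  /* The real code computes 2^M * 2^(j/2^k) * exp(r); expf is a stand-in.  */
  float t = expf (ax);
  /* cosh(x) = (exp(x) + exp(-x)) / 2.  */
  return 0.5f * (t + 1.0f / t);
}
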
> +
> +/* Offsets for data table __svml_scosh_data_internal
> + */
> +#define _sExp_tbl_PH                  	0
> +#define _sExp_tbl_NH                  	128
> +#define _sShifter_UISA                	256
> +#define _iDomainRange_UISA            	320
> +#define _sPC1_UISA                    	384
> +#define _sPC2_UISA                    	448
> +#define _sPC3_UISA                    	512
> +#define _sInvLn2                      	576
> +#define _sLn2hi                       	640
> +#define _sLn2lo                       	704
> +#define _sSign                        	768
> +#define _iExpMask                     	832
> +#define _sShifter                     	896
> +#define _iDomainRange                 	960
> +#define _sPC1                         	1024
> +#define _sPC2                         	1088
> +#define _sPC3                         	1152
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN16v_coshf_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovups   _sSign+__svml_scosh_data_internal(%rip), %zmm4
> +        vmovups   _sShifter_UISA+__svml_scosh_data_internal(%rip), %zmm6
> +
> +/*
> + *  Load argument
> + * dM = x/log(2) + RShifter
> + */
> +        vmovups   _sInvLn2+__svml_scosh_data_internal(%rip), %zmm10
> +        vmovups   _sLn2hi+__svml_scosh_data_internal(%rip), %zmm7
> +        vmovups   _sLn2lo+__svml_scosh_data_internal(%rip), %zmm9
> +
> +/* Polynomial coefficient a3 */
> +        vmovups   _sPC3_UISA+__svml_scosh_data_internal(%rip), %zmm2
> +
> +/* x^2 */
> +        vmovups   _sPC2_UISA+__svml_scosh_data_internal(%rip), %zmm3
> +
> +/*  G1,G2 2^N,2^(-N)  */
> +        vmovups   __svml_scosh_data_internal(%rip), %zmm12
> +        vmovups   _sExp_tbl_NH+__svml_scosh_data_internal(%rip), %zmm13
> +
> +/*
> + *  Implementation
> + *  Abs argument
> + */
> +        vandnps   %zmm0, %zmm4, %zmm1
> +
> +/* Check for overflow/underflow  */
> +        vpternlogd $255, %zmm5, %zmm5, %zmm5
> +        vfmadd213ps {rn-sae}, %zmm6, %zmm1, %zmm10
> +        vpcmpd    $1, _iDomainRange_UISA+__svml_scosh_data_internal(%rip), %zmm1, %k1
> +
> +/* iM now is an EXP(2^N) */
> +        vpslld    $18, %zmm10, %zmm11
> +
> +/*
> + *  R
> + * sN = sM - RShifter
> + */
> +        vsubps    {rn-sae}, %zmm6, %zmm10, %zmm8
> +        vpermt2ps _sExp_tbl_PH+64+__svml_scosh_data_internal(%rip), %zmm10, %zmm12
> +        vpermt2ps _sExp_tbl_NH+64+__svml_scosh_data_internal(%rip), %zmm10, %zmm13
> +        vpandnd   %zmm1, %zmm1, %zmm5{%k1}
> +
> +/* sR = sX - sN*Log2_hi */
> +        vfnmadd231ps {rn-sae}, %zmm7, %zmm8, %zmm1
> +        vptestmd  %zmm5, %zmm5, %k0
> +
> +/* sR = (sX - sN*Log2_hi) - sN*Log2_lo */
> +        vfnmadd231ps {rn-sae}, %zmm9, %zmm8, %zmm1
> +        kmovw     %k0, %edx
> +        vmulps    {rn-sae}, %zmm1, %zmm1, %zmm4
> +        vmulps    {rn-sae}, %zmm4, %zmm2, %zmm2
> +
> +/* sSinh_r = r + r*(r^2*(a3)) */
> +        vfmadd213ps {rn-sae}, %zmm1, %zmm1, %zmm2
> +
> +/* sOut = r^2*(a2) */
> +        vmulps    {rn-sae}, %zmm4, %zmm3, %zmm1
> +        vpandd    _iExpMask+__svml_scosh_data_internal(%rip), %zmm11, %zmm14
> +        vpaddd    %zmm14, %zmm12, %zmm15
> +        vpsubd    %zmm14, %zmm13, %zmm10
> +
> +/* sG2 = 2^N*Th + 2^(-N)*T_h */
> +        vaddps    {rn-sae}, %zmm10, %zmm15, %zmm5
> +
> +/* sG1 = 2^N*Th - 2^(-N)*T_h */
> +        vsubps    {rn-sae}, %zmm10, %zmm15, %zmm6
> +
> +/* res = sG1*(r + r*(r^2*(a3))) + sG2*(1+r^2*(a2)) */
> +        vfmadd213ps {rn-sae}, %zmm5, %zmm5, %zmm1
> +        vfmadd213ps {rn-sae}, %zmm1, %zmm2, %zmm6
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm6
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        vmovaps   %zmm6, %zmm0
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm0, 64(%rsp)
> +        vmovups   %zmm6, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm6
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $16, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm6
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm6
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     64(%rsp,%r14,4), %xmm0
> +        call      coshf@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 128(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN16v_coshf_skx)
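
The special-value path above (RANGEMASK_CHECK / SPECIAL_VALUES_LOOP /
SCALAR_MATH_CALL) is a per-lane scalar fallback: every lane flagged in the
range mask is recomputed with the scalar coshf and patched back into the
saved vector result.  Roughly, in C (illustrative helper name, not part of
the patch):

#include <math.h>

void
fixup_special_lanes (const float *input, float *result,
		     unsigned int mask, int nlanes)
{
  for (int lane = 0; lane < nlanes; lane++)
    if (mask & (1u << lane))			/* btl %r12d, %r13d  */
      result[lane] = coshf (input[lane]);	/* call coshf@PLT  */
}
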
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_scosh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(64)) VUINT32 _sExp_tbl_PH[32][1];
> +        __declspec(align(64)) VUINT32 _sExp_tbl_NH[32][1];
> +        __declspec(align(64)) VUINT32 _sShifter_UISA[16][1];
> +        __declspec(align(64)) VUINT32 _iDomainRange_UISA[16][1];
> +        __declspec(align(64)) VUINT32 _sPC1_UISA[16][1];
> +        __declspec(align(64)) VUINT32 _sPC2_UISA[16][1];
> +        __declspec(align(64)) VUINT32 _sPC3_UISA[16][1];
> +        __declspec(align(64)) VUINT32 _sInvLn2[16][1];
> +        __declspec(align(64)) VUINT32 _sLn2hi[16][1];
> +        __declspec(align(64)) VUINT32 _sLn2lo[16][1];
> +        __declspec(align(64)) VUINT32 _sSign[16][1];
> +        __declspec(align(64)) VUINT32 _iExpMask[16][1];
> +        __declspec(align(64)) VUINT32 _sShifter[16][1];
> +        __declspec(align(64)) VUINT32 _iDomainRange[16][1];
> +        __declspec(align(64)) VUINT32 _sPC1[16][1];
> +        __declspec(align(64)) VUINT32 _sPC2[16][1];
> +        __declspec(align(64)) VUINT32 _sPC3[16][1];
> +} __svml_scosh_data_internal;
> +#endif
> +__svml_scosh_data_internal:
> +        /* _sExp_tbl_PH 2^(i/32-1), i=0..31 */
> +        .long 0x3f000000, 0x3f02cd87, 0x3f05aac3, 0x3f08980f
> +        .long 0x3f0b95c2, 0x3f0ea43a, 0x3f11c3d3, 0x3f14f4f0
> +        .long 0x3f1837f0, 0x3f1b8d3a, 0x3f1ef532, 0x3f227043
> +        .long 0x3f25fed7, 0x3f29a15b, 0x3f2d583f, 0x3f3123f6
> +        .long 0x3f3504f3, 0x3f38fbaf, 0x3f3d08a4, 0x3f412c4d
> +        .long 0x3f45672a, 0x3f49b9be, 0x3f4e248c, 0x3f52a81e
> +        .long 0x3f5744fd, 0x3f5bfbb8, 0x3f60ccdf, 0x3f65b907
> +        .long 0x3f6ac0c7, 0x3f6fe4ba, 0x3f75257d, 0x3f7a83b3
> +        /* _sExp_tbl_NH 2^(-i/32-1), i=0..31 */
> +        .align 64
> +        .long 0x3f000000, 0x3efa83b3, 0x3ef5257d, 0x3eefe4ba
> +        .long 0x3eeac0c7, 0x3ee5b907, 0x3ee0ccdf, 0x3edbfbb8
> +        .long 0x3ed744fd, 0x3ed2a81e, 0x3ece248c, 0x3ec9b9be
> +        .long 0x3ec5672a, 0x3ec12c4d, 0x3ebd08a4, 0x3eb8fbaf
> +        .long 0x3eb504f3, 0x3eb123f6, 0x3ead583f, 0x3ea9a15b
> +        .long 0x3ea5fed7, 0x3ea27043, 0x3e9ef532, 0x3e9b8d3a
> +        .long 0x3e9837f0, 0x3e94f4f0, 0x3e91c3d3, 0x3e8ea43a
> +        .long 0x3e8b95c2, 0x3e88980f, 0x3e85aac3, 0x3e82cd87
> +        .align 64
> +        .long 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000, 0x48c00000         /* 1.5*2^18 _sShifter_UISA */
> +        .align 64
> +        .long 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E         /* _iDomainRange_UISA */
> +        .align 64
> +        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000         /* _sPC1_UISA=1       */
> +        .align 64
> +        .long 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f, 0x3f00010f         /* _sPC2_UISA         */
> +        .align 64
> +        .long 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd, 0x3e2aaacd         /* _sPC3_UISA         */
> +        .align 64
> +        .long 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B       /* _sInvLn2  */  //k=0
> +        .align 64
> +        .long 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000       /* _sLn2hi   */
> +        .align 64
> +        .long 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4       /* _sLn2lo   */
> +        .align 64
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000       /* _sSign    */
> +        .align 64
> +        .long 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000, 0x7f800000       /* _iExpMask */
> +        .align 64
> +        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000           /* _sShifter */
> +        .align 64
> +        .long 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E           /* _iDomainRange */
> +        .align 64
> +        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000         /* _sPC1=1  */
> +        .align 64
> +        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000         /* _sPC2  */
> +        .align 64
> +        .long 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57         /* _sPC3  */
> +        .align 64
> +        .type	__svml_scosh_data_internal,@object
> +        .size	__svml_scosh_data_internal,.-__svml_scosh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core-sse2.S
> new file mode 100644
> index 0000000000..c719dc7d6a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized coshf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN4v_coshf _ZGVbN4v_coshf_sse2
> +#include "../svml_s_coshf4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core.c
> new file mode 100644
> index 0000000000..c2dfcd44f8
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized coshf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN4v_coshf
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN4v_coshf, __GI__ZGVbN4v_coshf,
> +	       __redirect__ZGVbN4v_coshf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core_sse4.S
> new file mode 100644
> index 0000000000..506f6a4bd9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf4_core_sse4.S
> @@ -0,0 +1,305 @@
> +/* Function coshf vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute cosh(x) as (exp(x)+exp(-x))/2,
> + *   where exp is calculated as
> + *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
> + *
> + *   Special cases:
> + *
> + *   cosh(NaN) = quiet NaN, and raises the invalid exception
> + *   cosh(+/-INF) = +INF
> + *   cosh(0)   = 1
> + *   cosh(x) overflows to +INF once |x| exceeds about MAXLOG + log(2)
> + *
> + */
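
A note on the "RShifter" constant used further down (_sShifter, 1.5*2^23):
adding it to x/log(2) leaves round-to-nearest(x/log(2)) in the low mantissa
bits, so pslld $23 followed by paddd/psubd with _iHalf yields 2^(N-1) and
2^(-N-1) without any float-to-int conversion.  A minimal standalone C
illustration (my own variable names; the constants are simply the hex table
values from this file written as C literals):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int
main (void)
{
  const float shifter = 0x1.8p23f;	/* _sShifter, 1.5 * 2^23 */
  const float inv_ln2 = 0x1.715476p0f;	/* _sInvLn2, ~1/log(2) */
  float x = 3.0f;

  float m = x * inv_ln2 + shifter;	/* N now sits in the low mantissa bits */
  float n = m - shifter;		/* rounded N recovered as a float */

  uint32_t bits;
  memcpy (&bits, &m, sizeof bits);
  uint32_t g2bits = (bits << 23) + 0x3f000000u;	/* pslld $23; paddd _iHalf */
  float g2;
  memcpy (&g2, &g2bits, sizeof g2);

  printf ("N = %g, 2^(N-1) = %g\n", n, g2);	/* prints N = 4, 2^(N-1) = 8 */
  return 0;
}
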
> +
> +/* Offsets for data table __svml_scosh_data_internal
> + */
> +#define _sInvLn2                      	0
> +#define _sLn2hi                       	16
> +#define _sLn2lo                       	32
> +#define _sSign                        	48
> +#define _sShifter                     	64
> +#define _iDomainRange                 	80
> +#define _sPC1                         	96
> +#define _sPC2                         	112
> +#define _sPC3                         	128
> +#define _sPC4                         	144
> +#define _sPC5                         	160
> +#define _sPC6                         	176
> +#define _iHalf                        	192
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN4v_coshf_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +
> +/*
> + *  Implementation
> + *  Abs argument
> + */
> +        movups    _sSign+__svml_scosh_data_internal(%rip), %xmm1
> +
> +/*
> + *  Load argument
> + * dM = x/log(2) + RShifter
> + */
> +        movups    _sInvLn2+__svml_scosh_data_internal(%rip), %xmm9
> +        andnps    %xmm0, %xmm1
> +        mulps     %xmm1, %xmm9
> +
> +/* Check for overflow/underflow  */
> +        movaps    %xmm1, %xmm3
> +        movups    _sShifter+__svml_scosh_data_internal(%rip), %xmm4
> +        movups    _sLn2hi+__svml_scosh_data_internal(%rip), %xmm5
> +        addps     %xmm4, %xmm9
> +
> +/*
> + *  R
> + * sN = sM - RShifter
> + */
> +        movaps    %xmm9, %xmm6
> +
> +/*
> + *  G1,G2 2^N,2^(-N)
> + * iM now is an EXP(2^N)
> + */
> +        pslld     $23, %xmm9
> +        movups    _sLn2lo+__svml_scosh_data_internal(%rip), %xmm7
> +        subps     %xmm4, %xmm6
> +
> +/* sR = sX - sN*Log2_hi */
> +        mulps     %xmm6, %xmm5
> +
> +/* sR = (sX - sN*Log2_hi) - sN*Log2_lo */
> +        mulps     %xmm6, %xmm7
> +        movdqu    _iDomainRange+__svml_scosh_data_internal(%rip), %xmm2
> +        pcmpgtd   %xmm2, %xmm3
> +        pcmpeqd   %xmm1, %xmm2
> +
> +/*
> + * sinh(r) = r*(a1 + r^2*(a3 + r^2*(a5 + r^2*a7))) = r + r*(r^2*(a3 + r^2*(a5 + r^2*a7))), a1 = 1
> + * sSinh_r = (a3+r^2*a5)
> + */
> +        movups    _sPC5+__svml_scosh_data_internal(%rip), %xmm10
> +        por       %xmm2, %xmm3
> +
> +/*
> + * cosh(X) = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2))
> + * sOut = (a4 + a6*sR2)
> + */
> +        movups    _sPC6+__svml_scosh_data_internal(%rip), %xmm11
> +        subps     %xmm5, %xmm1
> +        movmskps  %xmm3, %edx
> +        movdqu    _iHalf+__svml_scosh_data_internal(%rip), %xmm8
> +        subps     %xmm7, %xmm1
> +
> +/* sR2 = sR^2, shuffled */
> +        movaps    %xmm1, %xmm13
> +        movdqa    %xmm8, %xmm2
> +        mulps     %xmm1, %xmm13
> +        paddd     %xmm9, %xmm2
> +        mulps     %xmm13, %xmm10
> +        psubd     %xmm9, %xmm8
> +        mulps     %xmm13, %xmm11
> +        addps     _sPC3+__svml_scosh_data_internal(%rip), %xmm10
> +        addps     _sPC4+__svml_scosh_data_internal(%rip), %xmm11
> +
> +/* sSinh_r = r^2*(a3+r^2*a5) */
> +        mulps     %xmm13, %xmm10
> +
> +/* sOut = a2+sR2*(a4+a6*sR2) */
> +        mulps     %xmm13, %xmm11
> +
> +/* sSinh_r = r + r*(r^2*(a3+r^2*a5)) */
> +        mulps     %xmm1, %xmm10
> +        addps     _sPC2+__svml_scosh_data_internal(%rip), %xmm11
> +        addps     %xmm10, %xmm1
> +
> +/* sOut = sR2*(a2+sR2*(a4+a6*sR2)) */
> +        mulps     %xmm11, %xmm13
> +
> +/* sG1 = 2^(N-1)-2^(-N-1) */
> +        movdqa    %xmm2, %xmm12
> +
> +/* sG2 = 2^(N-1)+2^(-N-1) */
> +        addps     %xmm8, %xmm2
> +        subps     %xmm8, %xmm12
> +
> +/* sOut = sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
> +        mulps     %xmm2, %xmm13
> +
> +/* sOut = sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
> +        mulps     %xmm1, %xmm12
> +        addps     %xmm12, %xmm13
> +
> +/* sOut = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
> +        addps     %xmm13, %xmm2
> +
> +/*  Ret H  */
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm2
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movaps    %xmm2, %xmm0
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm0, 32(%rsp)
> +        movups    %xmm2, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm2
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm2
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      coshf@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 48(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN4v_coshf_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_scosh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(16)) VUINT32 _sInvLn2[4][1];
> +        __declspec(align(16)) VUINT32 _sLn2hi[4][1];
> +        __declspec(align(16)) VUINT32 _sLn2lo[4][1];
> +        __declspec(align(16)) VUINT32 _sSign[4][1];
> +        __declspec(align(16)) VUINT32 _sShifter[4][1];
> +        __declspec(align(16)) VUINT32 _iDomainRange[4][1];
> +        __declspec(align(16)) VUINT32 _sPC1[4][1];
> +        __declspec(align(16)) VUINT32 _sPC2[4][1];
> +        __declspec(align(16)) VUINT32 _sPC3[4][1];
> +        __declspec(align(16)) VUINT32 _sPC4[4][1];
> +        __declspec(align(16)) VUINT32 _sPC5[4][1];
> +        __declspec(align(16)) VUINT32 _sPC6[4][1];
> +        __declspec(align(16)) VUINT32 _iHalf[4][1];
> +} __svml_scosh_data_internal;
> +#endif
> +__svml_scosh_data_internal:
> +        .long 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B       /* _sInvLn2  */  //k=0
> +        .align 16
> +        .long 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000       /* _sLn2hi   */
> +        .align 16
> +        .long 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4       /* _sLn2lo   */
> +        .align 16
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000       /* _sSign    */
> +        .align 16
> +        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000           /* _sShifter */
> +        .align 16
> +        .long 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E           /* _iDomainRange */
> +        .align 16
> +        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000         /* _sPC1=1  */
> +        .align 16
> +        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000         /* _sPC2  */
> +        .align 16
> +        .long 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57         /* _sPC3  */
> +        .align 16
> +        .long 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72         /* _sPC4  */
> +        .align 16
> +        .long 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461         /* _sPC5  */
> +        .align 16
> +        .long 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3         /* _sPC6  */
> +        // Integer constants
> +        .align 16
> +        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000 /* _iHalf*/
> +        .align 16
> +        .type	__svml_scosh_data_internal,@object
> +        .size	__svml_scosh_data_internal,.-__svml_scosh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core-sse.S
> new file mode 100644
> index 0000000000..c27229e1fa
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized coshf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN8v_coshf _ZGVdN8v_coshf_sse_wrapper
> +#include "../svml_s_coshf8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core.c
> new file mode 100644
> index 0000000000..e82818b2c9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized coshf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN8v_coshf
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN8v_coshf, __GI__ZGVdN8v_coshf,
> +	       __redirect__ZGVdN8v_coshf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core_avx2.S
> new file mode 100644
> index 0000000000..9149061e7e
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_coshf8_core_avx2.S
> @@ -0,0 +1,308 @@
> +/* Function coshf vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute cosh(x) as (exp(x)+exp(-x))/2,
> + *   where exp is calculated as
> + *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
> + *
> + *   Special cases:
> + *
> + *   cosh(NaN) = quiet NaN, and raises the invalid exception
> + *   cosh(+/-INF) = +INF
> + *   cosh(0)   = 1
> + *   cosh(x) overflows to +INF once |x| exceeds about MAXLOG + log(2)
> + *
> + */
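
The reconstruction done below follows from
cosh(N*log(2) + r) = cosh(N*log(2))*cosh(r) + sinh(N*log(2))*sinh(r), i.e.
sG2 + sG1*sinh(r) + sG2*r^2*P(r^2) with sG1 = 2^(N-1) - 2^(-N-1) and
sG2 = 2^(N-1) + 2^(-N-1).  A hedged scalar sketch, using plain Taylor
coefficients where the code uses the tuned _sPC2.._sPC6 constants (the
helper name is mine):

#include <math.h>

float
coshf_reconstruct (int n, float r)
{
  /* sG2 = 2^(N-1) + 2^(-N-1), sG1 = 2^(N-1) - 2^(-N-1).  */
  float g2 = ldexpf (1.0f, n - 1) + ldexpf (1.0f, -n - 1);
  float g1 = ldexpf (1.0f, n - 1) - ldexpf (1.0f, -n - 1);
  float r2 = r * r;
  /* sinh(r) ~ r + r^3/6 + r^5/120; cosh(r) - 1 ~ r^2/2 + r^4/24 + r^6/720.  */
  float sinh_r = r + r * r2 * (1.0f / 6.0f + r2 * (1.0f / 120.0f));
  float cosh_r_m1 = r2 * (0.5f + r2 * (1.0f / 24.0f + r2 * (1.0f / 720.0f)));
  return g2 + g1 * sinh_r + g2 * cosh_r_m1;
}
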
> +
> +/* Offsets for data table __svml_scosh_data_internal
> + */
> +#define _sInvLn2                      	0
> +#define _sLn2hi                       	32
> +#define _sLn2lo                       	64
> +#define _sSign                        	96
> +#define _sShifter                     	128
> +#define _iDomainRange                 	160
> +#define _sPC1                         	192
> +#define _sPC2                         	224
> +#define _sPC3                         	256
> +#define _sPC4                         	288
> +#define _sPC5                         	320
> +#define _sPC6                         	352
> +#define _iHalf                        	384
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN8v_coshf_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +        vmovups   _sSign+__svml_scosh_data_internal(%rip), %ymm2
> +        vmovups   _sShifter+__svml_scosh_data_internal(%rip), %ymm7
> +
> +/*
> + *  Load argument
> + * dM = x/log(2) + RShifter
> + */
> +        vmovups   _sInvLn2+__svml_scosh_data_internal(%rip), %ymm10
> +        vmovups   _sLn2hi+__svml_scosh_data_internal(%rip), %ymm8
> +        vmovups   _iDomainRange+__svml_scosh_data_internal(%rip), %ymm3
> +
> +/*
> + * sinh(r) = r*(a1 + r^2*(a3 + r^2*(a5 + r^2*a7))) = r + r*(r^2*(a3 + r^2*(a5 + r^2*a7))), a1 = 1
> + * sSinh_r = (a3+r^2*a5)
> + */
> +        vmovups   _sPC5+__svml_scosh_data_internal(%rip), %ymm15
> +        vmovups   _iHalf+__svml_scosh_data_internal(%rip), %ymm11
> +        vmovaps   %ymm0, %ymm1
> +
> +/*
> + *  Implementation
> + *  Abs argument
> + */
> +        vandnps   %ymm1, %ymm2, %ymm0
> +        vfmadd213ps %ymm7, %ymm0, %ymm10
> +
> +/*
> + *  R
> + * sN = sM - RShifter
> + */
> +        vsubps    %ymm7, %ymm10, %ymm9
> +
> +/*
> + *  G1,G2 2^N,2^(-N)
> + * iM now is an EXP(2^N)
> + */
> +        vpslld    $23, %ymm10, %ymm12
> +
> +/* Check for overflow/underflow  */
> +        vpcmpgtd  %ymm3, %ymm0, %ymm4
> +        vpcmpeqd  %ymm3, %ymm0, %ymm5
> +
> +/* sR = sX - sN*Log2_hi */
> +        vfnmadd231ps %ymm8, %ymm9, %ymm0
> +        vpaddd    %ymm12, %ymm11, %ymm13
> +        vpsubd    %ymm12, %ymm11, %ymm14
> +        vpor      %ymm5, %ymm4, %ymm6
> +
> +/* sR = (sX - sN*Log2_hi) - sN*Log2_lo */
> +        vfnmadd231ps _sLn2lo+__svml_scosh_data_internal(%rip), %ymm9, %ymm0
> +
> +/* sG1 = 2^(N-1)-2^(-N-1) */
> +        vsubps    %ymm14, %ymm13, %ymm4
> +
> +/* sG2 = 2^(N-1)+2^(-N-1) */
> +        vaddps    %ymm14, %ymm13, %ymm3
> +
> +/* sR2 = sR^2, shuffled */
> +        vmulps    %ymm0, %ymm0, %ymm2
> +        vfmadd213ps _sPC3+__svml_scosh_data_internal(%rip), %ymm2, %ymm15
> +
> +/* sSinh_r = r^2*(a3+r^2*a5) */
> +        vmulps    %ymm15, %ymm2, %ymm13
> +
> +/* sSinh_r = r + r*(r^2*(a3+r^2*a5)) */
> +        vfmadd213ps %ymm0, %ymm0, %ymm13
> +
> +/*
> + * cosh(X) = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2))
> + * sOut = (a4 + a6*sR2)
> + */
> +        vmovups   _sPC6+__svml_scosh_data_internal(%rip), %ymm0
> +        vfmadd213ps _sPC4+__svml_scosh_data_internal(%rip), %ymm2, %ymm0
> +
> +/* sOut = a2+sR2*(a4+a6*sR2) */
> +        vfmadd213ps _sPC2+__svml_scosh_data_internal(%rip), %ymm2, %ymm0
> +
> +/* sOut = sR2*(a2+sR2*(a4+a6*sR2)) */
> +        vmulps    %ymm0, %ymm2, %ymm15
> +
> +/* sOut = sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
> +        vmulps    %ymm15, %ymm3, %ymm14
> +
> +/* sOut = sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
> +        vfmadd213ps %ymm14, %ymm13, %ymm4
> +        vmovmskps %ymm6, %edx
> +
> +/* sOut = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
> +        vaddps    %ymm4, %ymm3, %ymm0
> +
> +/*  Ret H  */
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm1
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %ymm1, 32(%rsp)
> +        vmovups   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      coshf@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 64(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN8v_coshf_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_scosh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(32)) VUINT32 _sInvLn2[8][1];
> +        __declspec(align(32)) VUINT32 _sLn2hi[8][1];
> +        __declspec(align(32)) VUINT32 _sLn2lo[8][1];
> +        __declspec(align(32)) VUINT32 _sSign[8][1];
> +        __declspec(align(32)) VUINT32 _sShifter[8][1];
> +        __declspec(align(32)) VUINT32 _iDomainRange[8][1];
> +        __declspec(align(32)) VUINT32 _sPC1[8][1];
> +        __declspec(align(32)) VUINT32 _sPC2[8][1];
> +        __declspec(align(32)) VUINT32 _sPC3[8][1];
> +        __declspec(align(32)) VUINT32 _sPC4[8][1];
> +        __declspec(align(32)) VUINT32 _sPC5[8][1];
> +        __declspec(align(32)) VUINT32 _sPC6[8][1];
> +        __declspec(align(32)) VUINT32 _iHalf[8][1];
> +} __svml_scosh_data_internal;
> +#endif
> +__svml_scosh_data_internal:
> +        .long 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B       /* _sInvLn2  */  //k=0
> +        .align 32
> +        .long 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000       /* _sLn2hi   */
> +        .align 32
> +        .long 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4       /* _sLn2lo   */
> +        .align 32
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000       /* _sSign    */
> +        .align 32
> +        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000           /* _sShifter */
> +        .align 32
> +        .long 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E           /* _iDomainRange */
> +        .align 32
> +        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000         /* _sPC1=1  */
> +        .align 32
> +        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000         /* _sPC2  */
> +        .align 32
> +        .long 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57         /* _sPC3  */
> +        .align 32
> +        .long 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72         /* _sPC4  */
> +        .align 32
> +        .long 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461         /* _sPC5  */
> +        .align 32
> +        .long 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3         /* _sPC6  */
> +        // Integer constants
> +        .align 32
> +        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000 /* _iHalf*/
> +        .align 32
> +        .type	__svml_scosh_data_internal,@object
> +        .size	__svml_scosh_data_internal,.-__svml_scosh_data_internal
> diff --git a/sysdeps/x86_64/fpu/svml_d_cosh2_core.S b/sysdeps/x86_64/fpu/svml_d_cosh2_core.S
> new file mode 100644
> index 0000000000..f95952cfe5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_cosh2_core.S
> @@ -0,0 +1,29 @@
> +/* Function cosh vectorized with SSE2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN2v_cosh)
> +WRAPPER_IMPL_SSE2 cosh
> +END (_ZGVbN2v_cosh)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN2v_cosh)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_cosh4_core.S b/sysdeps/x86_64/fpu/svml_d_cosh4_core.S
> new file mode 100644
> index 0000000000..cc24d0fb6b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_cosh4_core.S
> @@ -0,0 +1,29 @@
> +/* Function cosh vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN4v_cosh)
> +WRAPPER_IMPL_AVX _ZGVbN2v_cosh
> +END (_ZGVdN4v_cosh)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN4v_cosh)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_cosh4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_cosh4_core_avx.S
> new file mode 100644
> index 0000000000..4323f5e308
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_cosh4_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function cosh vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVcN4v_cosh)
> +WRAPPER_IMPL_AVX _ZGVbN2v_cosh
> +END (_ZGVcN4v_cosh)
> diff --git a/sysdeps/x86_64/fpu/svml_d_cosh8_core.S b/sysdeps/x86_64/fpu/svml_d_cosh8_core.S
> new file mode 100644
> index 0000000000..90ee1ca125
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_cosh8_core.S
> @@ -0,0 +1,25 @@
> +/* Function cosh vectorized with AVX-512, wrapper to AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN8v_cosh)
> +WRAPPER_IMPL_AVX512 _ZGVdN4v_cosh
> +END (_ZGVeN8v_cosh)
> diff --git a/sysdeps/x86_64/fpu/svml_s_coshf16_core.S b/sysdeps/x86_64/fpu/svml_s_coshf16_core.S
> new file mode 100644
> index 0000000000..fe243b8b94
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_coshf16_core.S
> @@ -0,0 +1,25 @@
> +/* Function coshf vectorized with AVX-512. Wrapper to AVX2 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN16v_coshf)
> +WRAPPER_IMPL_AVX512 _ZGVdN8v_coshf
> +END (_ZGVeN16v_coshf)
> diff --git a/sysdeps/x86_64/fpu/svml_s_coshf4_core.S b/sysdeps/x86_64/fpu/svml_s_coshf4_core.S
> new file mode 100644
> index 0000000000..b55ede6e38
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_coshf4_core.S
> @@ -0,0 +1,29 @@
> +/* Function coshf vectorized with SSE2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN4v_coshf)
> +WRAPPER_IMPL_SSE2 coshf
> +END (_ZGVbN4v_coshf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN4v_coshf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_coshf8_core.S b/sysdeps/x86_64/fpu/svml_s_coshf8_core.S
> new file mode 100644
> index 0000000000..3ea02d0f19
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_coshf8_core.S
> @@ -0,0 +1,29 @@
> +/* Function coshf vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN8v_coshf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_coshf
> +END (_ZGVdN8v_coshf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN8v_coshf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_coshf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_coshf8_core_avx.S
> new file mode 100644
> index 0000000000..9b3002f7c9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_coshf8_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function coshf vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +        .text
> +ENTRY (_ZGVcN8v_coshf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_coshf
> +END (_ZGVcN8v_coshf)
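
These *_core.S files are thin generic wrappers: each wider entry point
simply forwards to the next narrower kernel through the WRAPPER_IMPL_*
macros in svml_s_wrapper_impl.h.  A rough C sketch of the splitting idea
behind the 16-lane float wrapper (an illustration only, not the actual
macro; the __m256 prototype for _ZGVdN8v_coshf is assumed here just for
readability):

  #include <immintrin.h>

  /* 8-lane AVX2 kernel provided by this patch (prototype assumed).  */
  extern __m256 _ZGVdN8v_coshf (__m256 x);

  __m512
  coshf16_wrapper_sketch (__m512 x)
  {
    __m256 lo = _mm512_castps512_ps256 (x);      /* lanes 0..7   */
    __m256 hi = _mm512_extractf32x8_ps (x, 1);   /* lanes 8..15  */
    __m256 rlo = _ZGVdN8v_coshf (lo);
    __m256 rhi = _ZGVdN8v_coshf (hi);
    return _mm512_insertf32x8 (_mm512_castps256_ps512 (rlo), rhi, 1);
  }
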
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx.c
> new file mode 100644
> index 0000000000..1dd311a562
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-cosh.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx2.c
> new file mode 100644
> index 0000000000..1dd311a562
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx2.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-cosh.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx512f.c
> new file mode 100644
> index 0000000000..1dd311a562
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-cosh-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-cosh.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-cosh.c b/sysdeps/x86_64/fpu/test-double-libmvec-cosh.c
> new file mode 100644
> index 0000000000..cf49ec5d87
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-cosh.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE double
> +#define LIBMVEC_FUNC cosh
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> index 256e8f07c9..68c449e04a 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> @@ -33,6 +33,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVbN2v_asin)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVbN2vv_hypot)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVbN2v_exp2)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVbN2v_exp10)
> +VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVbN2v_cosh)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> index 9de1dab2c2..df67306373 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> @@ -36,6 +36,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVdN4v_asin)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVdN4vv_hypot)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVdN4v_exp2)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVdN4v_exp10)
> +VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVdN4v_cosh)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m256i
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> index 43865ab099..1a6731098f 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> @@ -33,6 +33,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVcN4v_asin)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVcN4vv_hypot)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVcN4v_exp2)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVcN4v_exp10)
> +VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVcN4v_cosh)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> index 5dbdacf617..4cdfa918e8 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> @@ -33,6 +33,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asin), _ZGVeN8v_asin)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypot), _ZGVeN8vv_hypot)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVeN8v_exp2)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVeN8v_exp10)
> +VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVeN8v_cosh)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m512i
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx.c
> new file mode 100644
> index 0000000000..905dc3ca4a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-coshf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx2.c
> new file mode 100644
> index 0000000000..905dc3ca4a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx2.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-coshf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx512f.c
> new file mode 100644
> index 0000000000..905dc3ca4a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-coshf-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-coshf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-coshf.c b/sysdeps/x86_64/fpu/test-float-libmvec-coshf.c
> new file mode 100644
> index 0000000000..94b899076b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-coshf.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE float
> +#define LIBMVEC_FUNC coshf
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> index c159c8f583..47a9862233 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> @@ -33,6 +33,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVeN16v_asinf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVeN16vv_hypotf)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVeN16v_exp2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVeN16v_exp10f)
> +VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVeN16v_coshf)
>  
>  #define VEC_INT_TYPE __m512i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> index c745ef744a..e7c5410e7b 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> @@ -33,6 +33,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVbN4v_asinf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVbN4vv_hypotf)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVbN4v_exp2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVbN4v_exp10f)
> +VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVbN4v_coshf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> index c9226cf4dc..b8e9d48cd6 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> @@ -36,6 +36,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVdN8v_asinf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVdN8vv_hypotf)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVdN8v_exp2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVdN8v_exp10f)
> +VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVdN8v_coshf)
>  
>  /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
>  #undef VECTOR_WRAPPER_fFF
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> index 92970c5ace..328c827b27 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> @@ -33,6 +33,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (asinf), _ZGVcN8v_asinf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (hypotf), _ZGVcN8vv_hypotf)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVcN8v_exp2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVcN8v_exp10f)
> +VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVcN8v_coshf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> -- 
> 2.31.1
> 

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.


H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 12/18] x86-64: Add vector log2/log2f implementation to libmvec
  2021-12-29  6:39 ` [PATCH v5 12/18] x86-64: Add vector log2/log2f " Sunil K Pandey
@ 2021-12-29 21:26   ` H.J. Lu
  0 siblings, 0 replies; 40+ messages in thread
From: H.J. Lu @ 2021-12-29 21:26 UTC (permalink / raw)
  To: Sunil K Pandey; +Cc: libc-alpha, hjl.tools, andrey.kolesov, marius.cornea

On Tue, Dec 28, 2021 at 10:39:54PM -0800, Sunil K Pandey wrote:
> Implement vectorized log2/log2f containing SSE, AVX, AVX2 and
> AVX512 versions for libmvec as per vector ABI.  It also contains
> accuracy and ABI tests for vector log2/log2f with regenerated ulps.
> ---
>  bits/libm-simd-decl-stubs.h                   |   11 +
>  math/bits/mathcalls.h                         |    2 +-
>  .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
>  sysdeps/x86/fpu/bits/math-vector.h            |    4 +
>  .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
>  sysdeps/x86_64/fpu/Makeconfig                 |    1 +
>  sysdeps/x86_64/fpu/Versions                   |    2 +
>  sysdeps/x86_64/fpu/libm-test-ulps             |   20 +
>  .../fpu/multiarch/svml_d_log22_core-sse2.S    |   20 +
>  .../x86_64/fpu/multiarch/svml_d_log22_core.c  |   27 +
>  .../fpu/multiarch/svml_d_log22_core_sse4.S    | 1339 +++++++++++++++++
>  .../fpu/multiarch/svml_d_log24_core-sse.S     |   20 +
>  .../x86_64/fpu/multiarch/svml_d_log24_core.c  |   27 +
>  .../fpu/multiarch/svml_d_log24_core_avx2.S    | 1324 ++++++++++++++++
>  .../fpu/multiarch/svml_d_log28_core-avx2.S    |   20 +
>  .../x86_64/fpu/multiarch/svml_d_log28_core.c  |   27 +
>  .../fpu/multiarch/svml_d_log28_core_avx512.S  |  293 ++++
>  .../fpu/multiarch/svml_s_log2f16_core-avx2.S  |   20 +
>  .../fpu/multiarch/svml_s_log2f16_core.c       |   28 +
>  .../multiarch/svml_s_log2f16_core_avx512.S    |  231 +++
>  .../fpu/multiarch/svml_s_log2f4_core-sse2.S   |   20 +
>  .../x86_64/fpu/multiarch/svml_s_log2f4_core.c |   28 +
>  .../fpu/multiarch/svml_s_log2f4_core_sse4.S   |  223 +++
>  .../fpu/multiarch/svml_s_log2f8_core-sse.S    |   20 +
>  .../x86_64/fpu/multiarch/svml_s_log2f8_core.c |   28 +
>  .../fpu/multiarch/svml_s_log2f8_core_avx2.S   |  226 +++
>  sysdeps/x86_64/fpu/svml_d_log22_core.S        |   29 +
>  sysdeps/x86_64/fpu/svml_d_log24_core.S        |   29 +
>  sysdeps/x86_64/fpu/svml_d_log24_core_avx.S    |   25 +
>  sysdeps/x86_64/fpu/svml_d_log28_core.S        |   25 +
>  sysdeps/x86_64/fpu/svml_s_log2f16_core.S      |   25 +
>  sysdeps/x86_64/fpu/svml_s_log2f4_core.S       |   29 +
>  sysdeps/x86_64/fpu/svml_s_log2f8_core.S       |   29 +
>  sysdeps/x86_64/fpu/svml_s_log2f8_core_avx.S   |   25 +
>  .../x86_64/fpu/test-double-libmvec-log2-avx.c |    1 +
>  .../fpu/test-double-libmvec-log2-avx2.c       |    1 +
>  .../fpu/test-double-libmvec-log2-avx512f.c    |    1 +
>  sysdeps/x86_64/fpu/test-double-libmvec-log2.c |    3 +
>  .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
>  .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
>  .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
>  .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
>  .../x86_64/fpu/test-float-libmvec-log2f-avx.c |    1 +
>  .../fpu/test-float-libmvec-log2f-avx2.c       |    1 +
>  .../fpu/test-float-libmvec-log2f-avx512f.c    |    1 +
>  sysdeps/x86_64/fpu/test-float-libmvec-log2f.c |    3 +
>  .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
>  .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
>  .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
>  .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
>  50 files changed, 4208 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log22_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log22_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log22_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log24_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log24_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log24_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log28_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log28_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log28_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_log22_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_log24_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_log24_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_log28_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_log2f16_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_log2f4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_log2f8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_log2f8_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log2-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log2-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log2-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log2f.c
> 
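
As a usage note (not part of the patch): the simd declarations added to
bits/math-vector.h and math/bits/mathcalls.h below are what let a
vectorizing compiler route scalar log2/log2f calls to the new _ZGV*
entry points.  A minimal sketch, assuming glibc 2.35 headers and GCC
with something like -O2 -ftree-loop-vectorize -ffast-math (the exact
flags and -march choice are an assumption, not part of this patch):

  #include <math.h>

  /* With __DECL_SIMD_log2 in effect, the compiler may turn this loop
     into calls to _ZGVbN2v_log2, _ZGVdN4v_log2 or _ZGVeN8v_log2,
     depending on the selected ISA.  */
  void
  log2_array (double *restrict y, const double *restrict x, int n)
  {
    for (int i = 0; i < n; i++)
      y[i] = log2 (x[i]);
  }
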
> diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
> index 4ad584c227..73252615ca 100644
> --- a/bits/libm-simd-decl-stubs.h
> +++ b/bits/libm-simd-decl-stubs.h
> @@ -230,4 +230,15 @@
>  #define __DECL_SIMD_log10f32x
>  #define __DECL_SIMD_log10f64x
>  #define __DECL_SIMD_log10f128x
> +
> +#define __DECL_SIMD_log2
> +#define __DECL_SIMD_log2f
> +#define __DECL_SIMD_log2l
> +#define __DECL_SIMD_log2f16
> +#define __DECL_SIMD_log2f32
> +#define __DECL_SIMD_log2f64
> +#define __DECL_SIMD_log2f128
> +#define __DECL_SIMD_log2f32x
> +#define __DECL_SIMD_log2f64x
> +#define __DECL_SIMD_log2f128x
>  #endif
> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index f21384758a..bfe52a4666 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -130,7 +130,7 @@ __MATHCALL (logb,, (_Mdouble_ __x));
>  __MATHCALL_VEC (exp2,, (_Mdouble_ __x));
>  
>  /* Compute base-2 logarithm of X.  */
> -__MATHCALL (log2,, (_Mdouble_ __x));
> +__MATHCALL_VEC (log2,, (_Mdouble_ __x));
>  #endif
>  
>  
> diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> index 8108a2a189..fa8b016c5d 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> @@ -55,6 +55,7 @@ GLIBC_2.35 _ZGVbN2v_exp10 F
>  GLIBC_2.35 _ZGVbN2v_exp2 F
>  GLIBC_2.35 _ZGVbN2v_expm1 F
>  GLIBC_2.35 _ZGVbN2v_log10 F
> +GLIBC_2.35 _ZGVbN2v_log2 F
>  GLIBC_2.35 _ZGVbN2v_sinh F
>  GLIBC_2.35 _ZGVbN2vv_atan2 F
>  GLIBC_2.35 _ZGVbN2vv_hypot F
> @@ -67,6 +68,7 @@ GLIBC_2.35 _ZGVbN4v_exp10f F
>  GLIBC_2.35 _ZGVbN4v_exp2f F
>  GLIBC_2.35 _ZGVbN4v_expm1f F
>  GLIBC_2.35 _ZGVbN4v_log10f F
> +GLIBC_2.35 _ZGVbN4v_log2f F
>  GLIBC_2.35 _ZGVbN4v_sinhf F
>  GLIBC_2.35 _ZGVbN4vv_atan2f F
>  GLIBC_2.35 _ZGVbN4vv_hypotf F
> @@ -79,6 +81,7 @@ GLIBC_2.35 _ZGVcN4v_exp10 F
>  GLIBC_2.35 _ZGVcN4v_exp2 F
>  GLIBC_2.35 _ZGVcN4v_expm1 F
>  GLIBC_2.35 _ZGVcN4v_log10 F
> +GLIBC_2.35 _ZGVcN4v_log2 F
>  GLIBC_2.35 _ZGVcN4v_sinh F
>  GLIBC_2.35 _ZGVcN4vv_atan2 F
>  GLIBC_2.35 _ZGVcN4vv_hypot F
> @@ -91,6 +94,7 @@ GLIBC_2.35 _ZGVcN8v_exp10f F
>  GLIBC_2.35 _ZGVcN8v_exp2f F
>  GLIBC_2.35 _ZGVcN8v_expm1f F
>  GLIBC_2.35 _ZGVcN8v_log10f F
> +GLIBC_2.35 _ZGVcN8v_log2f F
>  GLIBC_2.35 _ZGVcN8v_sinhf F
>  GLIBC_2.35 _ZGVcN8vv_atan2f F
>  GLIBC_2.35 _ZGVcN8vv_hypotf F
> @@ -103,6 +107,7 @@ GLIBC_2.35 _ZGVdN4v_exp10 F
>  GLIBC_2.35 _ZGVdN4v_exp2 F
>  GLIBC_2.35 _ZGVdN4v_expm1 F
>  GLIBC_2.35 _ZGVdN4v_log10 F
> +GLIBC_2.35 _ZGVdN4v_log2 F
>  GLIBC_2.35 _ZGVdN4v_sinh F
>  GLIBC_2.35 _ZGVdN4vv_atan2 F
>  GLIBC_2.35 _ZGVdN4vv_hypot F
> @@ -115,6 +120,7 @@ GLIBC_2.35 _ZGVdN8v_exp10f F
>  GLIBC_2.35 _ZGVdN8v_exp2f F
>  GLIBC_2.35 _ZGVdN8v_expm1f F
>  GLIBC_2.35 _ZGVdN8v_log10f F
> +GLIBC_2.35 _ZGVdN8v_log2f F
>  GLIBC_2.35 _ZGVdN8v_sinhf F
>  GLIBC_2.35 _ZGVdN8vv_atan2f F
>  GLIBC_2.35 _ZGVdN8vv_hypotf F
> @@ -127,6 +133,7 @@ GLIBC_2.35 _ZGVeN16v_exp10f F
>  GLIBC_2.35 _ZGVeN16v_exp2f F
>  GLIBC_2.35 _ZGVeN16v_expm1f F
>  GLIBC_2.35 _ZGVeN16v_log10f F
> +GLIBC_2.35 _ZGVeN16v_log2f F
>  GLIBC_2.35 _ZGVeN16v_sinhf F
>  GLIBC_2.35 _ZGVeN16vv_atan2f F
>  GLIBC_2.35 _ZGVeN16vv_hypotf F
> @@ -139,6 +146,7 @@ GLIBC_2.35 _ZGVeN8v_exp10 F
>  GLIBC_2.35 _ZGVeN8v_exp2 F
>  GLIBC_2.35 _ZGVeN8v_expm1 F
>  GLIBC_2.35 _ZGVeN8v_log10 F
> +GLIBC_2.35 _ZGVeN8v_log2 F
>  GLIBC_2.35 _ZGVeN8v_sinh F
>  GLIBC_2.35 _ZGVeN8vv_atan2 F
>  GLIBC_2.35 _ZGVeN8vv_hypot F
> diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
> index 64e80ada7a..59d284a10a 100644
> --- a/sysdeps/x86/fpu/bits/math-vector.h
> +++ b/sysdeps/x86/fpu/bits/math-vector.h
> @@ -106,6 +106,10 @@
>  #  define __DECL_SIMD_log10 __DECL_SIMD_x86_64
>  #  undef __DECL_SIMD_log10f
>  #  define __DECL_SIMD_log10f __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_log2
> +#  define __DECL_SIMD_log2 __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_log2f
> +#  define __DECL_SIMD_log2f __DECL_SIMD_x86_64
>  
>  # endif
>  #endif
> diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> index f5050c68af..a2ca9a203f 100644
> --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> @@ -52,6 +52,8 @@
>  !GCC$ builtin (atan2f) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (log10) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (log10f) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (log2) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (log2f) attributes simd (notinbranch) if('x86_64')
>  
>  !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
> @@ -89,3 +91,5 @@
>  !GCC$ builtin (atan2f) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (log10) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (log10f) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (log2) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (log2f) attributes simd (notinbranch) if('x32')
> diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> index ba37044e9d..8d6d0915af 100644
> --- a/sysdeps/x86_64/fpu/Makeconfig
> +++ b/sysdeps/x86_64/fpu/Makeconfig
> @@ -36,6 +36,7 @@ libmvec-funcs = \
>    hypot \
>    log \
>    log10 \
> +  log2 \
>    pow \
>    sin \
>    sincos \
> diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> index 8beaf0736f..1b48c2d642 100644
> --- a/sysdeps/x86_64/fpu/Versions
> +++ b/sysdeps/x86_64/fpu/Versions
> @@ -23,6 +23,7 @@ libmvec {
>      _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
>      _ZGVbN2v_expm1; _ZGVcN4v_expm1; _ZGVdN4v_expm1; _ZGVeN8v_expm1;
>      _ZGVbN2v_log10; _ZGVcN4v_log10; _ZGVdN4v_log10; _ZGVeN8v_log10;
> +    _ZGVbN2v_log2; _ZGVcN4v_log2; _ZGVdN4v_log2; _ZGVeN8v_log2;
>      _ZGVbN2v_sinh; _ZGVcN4v_sinh; _ZGVdN4v_sinh; _ZGVeN8v_sinh;
>      _ZGVbN2vv_atan2; _ZGVcN4vv_atan2; _ZGVdN4vv_atan2; _ZGVeN8vv_atan2;
>      _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
> @@ -35,6 +36,7 @@ libmvec {
>      _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
>      _ZGVbN4v_expm1f; _ZGVcN8v_expm1f; _ZGVdN8v_expm1f; _ZGVeN16v_expm1f;
>      _ZGVbN4v_log10f; _ZGVcN8v_log10f; _ZGVdN8v_log10f; _ZGVeN16v_log10f;
> +    _ZGVbN4v_log2f; _ZGVcN8v_log2f; _ZGVdN8v_log2f; _ZGVeN16v_log2f;
>      _ZGVbN4v_sinhf; _ZGVcN8v_sinhf; _ZGVdN8v_sinhf; _ZGVeN16v_sinhf;
>      _ZGVbN4vv_atan2f; _ZGVcN8vv_atan2f; _ZGVdN8vv_atan2f; _ZGVeN16vv_atan2f;
>      _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf;
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index b0cd9d60ea..3b7f3cee6f 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -1709,6 +1709,26 @@ float: 3
>  float128: 1
>  ldouble: 1
>  
> +Function: "log2_vlen16":
> +float: 1
> +
> +Function: "log2_vlen2":
> +double: 1
> +
> +Function: "log2_vlen4":
> +double: 1
> +float: 1
> +
> +Function: "log2_vlen4_avx2":
> +double: 1
> +
> +Function: "log2_vlen8":
> +double: 1
> +float: 1
> +
> +Function: "log2_vlen8_avx2":
> +float: 1
> +
>  Function: "log_downward":
>  float: 2
>  float128: 1
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log22_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log22_core-sse2.S
> new file mode 100644
> index 0000000000..e0833a174b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log22_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized log2, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN2v_log2 _ZGVbN2v_log2_sse2
> +#include "../svml_d_log22_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log22_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log22_core.c
> new file mode 100644
> index 0000000000..6d0b5a03ca
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log22_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized log2, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN2v_log2
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN2v_log2, __GI__ZGVbN2v_log2, __redirect__ZGVbN2v_log2)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log22_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log22_core_sse4.S
> new file mode 100644
> index 0000000000..22c12fdfea
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log22_core_sse4.S
> @@ -0,0 +1,1339 @@
> +/* Function log2 vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
> + *    R = Rcp*x - 1.0
> + *    log2(x) = k - log2(Rcp) + poly_approximation(R)
> + *       log2(Rcp) is tabulated
> + *
> + *
> + */
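
In scalar C terms, the scheme described above looks roughly like the
sketch below (an illustration only: the polynomial coefficients are
plain Taylor terms and log2 (rcp) is computed directly, standing in for
the poly_coeff and Log_LA_table data used by the real code; special
inputs are assumed to be handled elsewhere, as in the branch further
down):

  #include <math.h>

  static double
  log2_sketch (double x)        /* assumes a finite, positive, normal x */
  {
    int k;
    double m = frexp (x, &k);   /* x = m * 2^k, 0.5 <= m < 1 */
    double rcp = nearbyint (512.0 / m) / 512.0;  /* short reciprocal, ~9 bits */
    double r = rcp * m - 1.0;   /* small reduced argument */
    double p = r * (1.4426950408889634           /* log2(1+r), Taylor terms */
                    + r * (-0.7213475204444817
                           + r * (0.4808983469629878
                                  + r * (-0.3606737602222409))));
    return (double) k - log2 (rcp) + p;          /* log2 (rcp) is tabulated
                                                    in the real code */
  }
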
> +
> +/* Offsets for data table __svml_dlog2_data_internal
> + */
> +#define Log_HA_table                  	0
> +#define Log_LA_table                  	8208
> +#define poly_coeff                    	12320
> +#define ExpMask                       	12400
> +#define Two10                         	12416
> +#define MinNorm                       	12432
> +#define MaxNorm                       	12448
> +#define HalfMask                      	12464
> +#define One                           	12480
> +#define Threshold                     	12496
> +#define Bias                          	12512
> +#define Bias1                         	12528
> +
> +/* Lookup bias for data table __svml_dlog2_data_internal.  */
> +#define Table_Lookup_Bias               -0x405ff0
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN2v_log2_sse4)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $64, %rsp
> +
> +/* exponent bits */
> +        movaps    %xmm0, %xmm5
> +
> +/* preserve mantissa, set input exponent to 2^(-10) */
> +        movups    ExpMask+__svml_dlog2_data_internal(%rip), %xmm1
> +        psrlq     $20, %xmm5
> +        andps     %xmm0, %xmm1
> +        lea       Table_Lookup_Bias+__svml_dlog2_data_internal(%rip), %rsi
> +        orps      Two10+__svml_dlog2_data_internal(%rip), %xmm1
> +
> +/* check range */
> +        movaps    %xmm0, %xmm8
> +
> +/* reciprocal approximation good to at least 11 bits */
> +        cvtpd2ps  %xmm1, %xmm2
> +        cmpltpd   MinNorm+__svml_dlog2_data_internal(%rip), %xmm8
> +        movlhps   %xmm2, %xmm2
> +        movaps    %xmm0, %xmm7
> +        rcpps     %xmm2, %xmm3
> +        cmpnlepd  MaxNorm+__svml_dlog2_data_internal(%rip), %xmm7
> +        cvtps2pd  %xmm3, %xmm12
> +
> +/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
> +        movups    .FLT_11(%rip), %xmm4
> +        orps      %xmm7, %xmm8
> +        addpd     %xmm4, %xmm12
> +
> +/* combine and get argument value range mask */
> +        movmskpd  %xmm8, %edx
> +
> +/* argument reduction */
> +        movups    HalfMask+__svml_dlog2_data_internal(%rip), %xmm9
> +        subpd     %xmm4, %xmm12
> +        andps     %xmm1, %xmm9
> +
> +/*
> + * prepare table index
> + * table lookup
> + */
> +        movaps    %xmm12, %xmm10
> +        subpd     %xmm9, %xmm1
> +        mulpd     %xmm12, %xmm9
> +        mulpd     %xmm12, %xmm1
> +        subpd     One+__svml_dlog2_data_internal(%rip), %xmm9
> +        addpd     %xmm9, %xmm1
> +
> +/* polynomial */
> +        movups    poly_coeff+__svml_dlog2_data_internal(%rip), %xmm14
> +        psrlq     $40, %xmm10
> +        mulpd     %xmm1, %xmm14
> +        movd      %xmm10, %eax
> +        pshufd    $2, %xmm10, %xmm11
> +        movaps    %xmm1, %xmm10
> +        movups    poly_coeff+32+__svml_dlog2_data_internal(%rip), %xmm15
> +        mulpd     %xmm1, %xmm10
> +        addpd     poly_coeff+16+__svml_dlog2_data_internal(%rip), %xmm14
> +        mulpd     %xmm1, %xmm15
> +        mulpd     %xmm10, %xmm14
> +        addpd     poly_coeff+48+__svml_dlog2_data_internal(%rip), %xmm15
> +        movd      %xmm11, %ecx
> +        movups    poly_coeff+64+__svml_dlog2_data_internal(%rip), %xmm11
> +        addpd     %xmm14, %xmm15
> +        mulpd     %xmm1, %xmm11
> +        mulpd     %xmm15, %xmm10
> +
> +/* exponent */
> +        movups    Threshold+__svml_dlog2_data_internal(%rip), %xmm13
> +        cmpltpd   %xmm12, %xmm13
> +        addpd     %xmm10, %xmm11
> +        pshufd    $221, %xmm5, %xmm6
> +
> +/* biased exponent in DP format */
> +        cvtdq2pd  %xmm6, %xmm3
> +        movslq    %eax, %rax
> +        movslq    %ecx, %rcx
> +        andps     Bias+__svml_dlog2_data_internal(%rip), %xmm13
> +        orps      Bias1+__svml_dlog2_data_internal(%rip), %xmm13
> +        movsd     (%rsi,%rax), %xmm2
> +        movhpd    (%rsi,%rcx), %xmm2
> +        subpd     %xmm13, %xmm3
> +
> +/* reconstruction */
> +        addpd     %xmm11, %xmm2
> +        addpd     %xmm2, %xmm3
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx xmm0 xmm3
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movaps    %xmm3, %xmm0
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm0, 32(%rsp)
> +        movups    %xmm3, 48(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $2, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm3
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 xmm3
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      log2@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 48(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVbN2v_log2_sse4)
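
The special-value path above amounts to: for every lane flagged in the
range mask, fall back to the scalar log2 call and overwrite that lane of
the vector result.  A C-level sketch of the same idea (illustration
only; the two-lane count matches _ZGVbN2v):

  #include <math.h>

  static void
  fixup_special_lanes (double result[2], const double input[2],
                       unsigned int range_mask)
  {
    for (int lane = 0; lane < 2; lane++)
      if (range_mask & (1u << lane))    /* lane missed the fast path */
        result[lane] = log2 (input[lane]);
  }
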
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_dlog2_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(16)) VUINT32 Log_HA_table[(1<<10)+2][2];
> +        __declspec(align(16)) VUINT32 Log_LA_table[(1<<9)+1][2];
> +        __declspec(align(16)) VUINT32 poly_coeff[5][2][2];
> +        __declspec(align(16)) VUINT32 ExpMask[2][2];
> +        __declspec(align(16)) VUINT32 Two10[2][2];
> +        __declspec(align(16)) VUINT32 MinNorm[2][2];
> +        __declspec(align(16)) VUINT32 MaxNorm[2][2];
> +        __declspec(align(16)) VUINT32 HalfMask[2][2];
> +        __declspec(align(16)) VUINT32 One[2][2];
> +        __declspec(align(16)) VUINT32 Threshold[2][2];
> +        __declspec(align(16)) VUINT32 Bias[2][2];
> +        __declspec(align(16)) VUINT32 Bias1[2][2];
> +} __svml_dlog2_data_internal;
> +#endif
> +__svml_dlog2_data_internal:
> +        /* Log_HA_table */
> +        .quad 0xc08ff00000000000, 0x0000000000000000
> +        .quad 0xc08ff0040038c920, 0x3d52bfc81744e999
> +        .quad 0xc08ff007ff0f0190, 0xbd59b2cedc63c895
> +        .quad 0xc08ff00bfc839e88, 0xbd28e365e6741d71
> +        .quad 0xc08ff00ff8979428, 0x3d4027998f69a77d
> +        .quad 0xc08ff013f34bd5a0, 0x3d5dd2cb33fe6a89
> +        .quad 0xc08ff017eca15518, 0xbd526514cdf2c019
> +        .quad 0xc08ff01be49903d8, 0xbd44bfeeba165e04
> +        .quad 0xc08ff01fdb33d218, 0xbd3fa79ee110cec3
> +        .quad 0xc08ff023d072af20, 0xbd4eebb642c7fd60
> +        .quad 0xc08ff027c4568948, 0x3d429b13d7093443
> +        .quad 0xc08ff02bb6e04de8, 0x3d50f346bd36551e
> +        .quad 0xc08ff02fa810e968, 0xbd5020bb662f1536
> +        .quad 0xc08ff03397e94750, 0x3d5de76b56340995
> +        .quad 0xc08ff037866a5218, 0x3d58065ff3304090
> +        .quad 0xc08ff03b7394f360, 0x3d561fc9322fb785
> +        .quad 0xc08ff03f5f6a13d0, 0x3d0abecd17d0d778
> +        .quad 0xc08ff04349ea9b28, 0xbd588f3ad0ce4d44
> +        .quad 0xc08ff04733177040, 0xbd4454ba4ac5f44d
> +        .quad 0xc08ff04b1af178f8, 0xbd556f78faaa0887
> +        .quad 0xc08ff04f01799a58, 0x3d49db8976de7469
> +        .quad 0xc08ff052e6b0b868, 0xbd5cdb6fce17ef00
> +        .quad 0xc08ff056ca97b668, 0xbd576de8c0412f09
> +        .quad 0xc08ff05aad2f76a0, 0x3d30142c7ec6475c
> +        .quad 0xc08ff05e8e78da70, 0xbd1e685afc26de72
> +        .quad 0xc08ff0626e74c260, 0xbd40b64c954078a3
> +        .quad 0xc08ff0664d240e10, 0xbd5fcde393462d7d
> +        .quad 0xc08ff06a2a879c48, 0xbd537245eeeecc53
> +        .quad 0xc08ff06e06a04ae8, 0x3d4ac306eb47b436
> +        .quad 0xc08ff071e16ef6e8, 0xbd5a1fd9d3758f6b
> +        .quad 0xc08ff075baf47c80, 0x3d2401fbaaa67e3c
> +        .quad 0xc08ff0799331b6f0, 0x3d4f8dbef47a4d53
> +        .quad 0xc08ff07d6a2780a8, 0x3d51215e0abb42d1
> +        .quad 0xc08ff0813fd6b340, 0x3d57ce6249eddb35
> +        .quad 0xc08ff08514402770, 0xbd38a803c7083a25
> +        .quad 0xc08ff088e764b528, 0x3d42218beba5073e
> +        .quad 0xc08ff08cb9453370, 0x3d447b66f1c6248f
> +        .quad 0xc08ff09089e27880, 0xbd53d9297847e995
> +        .quad 0xc08ff094593d59c8, 0xbd12b6979cc77aa9
> +        .quad 0xc08ff0982756abd0, 0xbd55308545ecd702
> +        .quad 0xc08ff09bf42f4260, 0xbd578fa97c3b936f
> +        .quad 0xc08ff09fbfc7f068, 0xbd41828408ce869d
> +        .quad 0xc08ff0a38a218808, 0x3d555da6ce7251a6
> +        .quad 0xc08ff0a7533cda88, 0xbd41f3cd14bfcb02
> +        .quad 0xc08ff0ab1b1ab878, 0xbd1f028da6bf1852
> +        .quad 0xc08ff0aee1bbf188, 0xbd4cf04de3267f54
> +        .quad 0xc08ff0b2a72154a8, 0xbd4556e47019db10
> +        .quad 0xc08ff0b66b4baff8, 0x3d1e7ba00b15fbe4
> +        .quad 0xc08ff0ba2e3bd0d0, 0x3d5bfde1c52c2f28
> +        .quad 0xc08ff0bdeff283b8, 0x3d48d63fe20ee5d6
> +        .quad 0xc08ff0c1b0709480, 0x3d57f551980838ff
> +        .quad 0xc08ff0c56fb6ce20, 0xbd4189091f293c81
> +        .quad 0xc08ff0c92dc5fae0, 0x3d4d549f05f06169
> +        .quad 0xc08ff0ccea9ee428, 0xbd5982466074e1e3
> +        .quad 0xc08ff0d0a64252b8, 0xbd5d30a6b16c0e4b
> +        .quad 0xc08ff0d460b10e80, 0xbd3138bf3b51a201
> +        .quad 0xc08ff0d819ebdea8, 0xbd454e680c0801d6
> +        .quad 0xc08ff0dbd1f389a8, 0x3d584db361385926
> +        .quad 0xc08ff0df88c8d520, 0xbd564f2252a82c03
> +        .quad 0xc08ff0e33e6c8610, 0xbd5c78c35ed5d034
> +        .quad 0xc08ff0e6f2df60a8, 0xbd52eb9f29ca3d75
> +        .quad 0xc08ff0eaa6222860, 0x3d5340c0c01b5ff8
> +        .quad 0xc08ff0ee58359fe8, 0x3d10c2acaffa64b6
> +        .quad 0xc08ff0f2091a8948, 0xbd3fced311301ebe
> +        .quad 0xc08ff0f5b8d1a5c8, 0x3d41ee5d591af30b
> +        .quad 0xc08ff0f9675bb5f0, 0x3d4873546b0e668c
> +        .quad 0xc08ff0fd14b97998, 0x3d5a99928177a119
> +        .quad 0xc08ff100c0ebafd8, 0x3d378ead132adcac
> +        .quad 0xc08ff1046bf31720, 0x3d51a538bc597d48
> +        .quad 0xc08ff10815d06d18, 0xbd540ee2f35efd7e
> +        .quad 0xc08ff10bbe846ec8, 0xbd59cf94753adacc
> +        .quad 0xc08ff10f660fd878, 0xbd5201a3d6862895
> +        .quad 0xc08ff1130c7365c0, 0x3d383e25d0822d03
> +        .quad 0xc08ff116b1afd180, 0xbd0b7389bbea8f7b
> +        .quad 0xc08ff11a55c5d5f0, 0xbd4df278087a6617
> +        .quad 0xc08ff11df8b62c98, 0xbd48daeb8ec01e26
> +        .quad 0xc08ff1219a818e50, 0x3d57c9312e0a14da
> +        .quad 0xc08ff1253b28b330, 0xbd5f0fbc0e4d507e
> +        .quad 0xc08ff128daac52c8, 0xbd222afdee008687
> +        .quad 0xc08ff12c790d23d8, 0x3d17c71747bcef8b
> +        .quad 0xc08ff130164bdc88, 0x3d5d69cfd051af50
> +        .quad 0xc08ff133b2693248, 0x3d59dff064e9433a
> +        .quad 0xc08ff1374d65d9e8, 0x3d4f71a30db3240b
> +        .quad 0xc08ff13ae7428788, 0xbd5e56afa9524606
> +        .quad 0xc08ff13e7fffeeb0, 0xbd44acd84e6f8518
> +        .quad 0xc08ff142179ec228, 0xbd519845ade5e121
> +        .quad 0xc08ff145ae1fb420, 0xbd5b3b4a38ddec70
> +        .quad 0xc08ff14943837620, 0xbd5ea4bb5bc137c7
> +        .quad 0xc08ff14cd7cab910, 0x3d5610f3bf8eb6ce
> +        .quad 0xc08ff1506af62d20, 0x3d57b1170d6184cf
> +        .quad 0xc08ff153fd0681f0, 0x3d5791a688a3660e
> +        .quad 0xc08ff1578dfc6678, 0x3d5d41ecf8abac2e
> +        .quad 0xc08ff15b1dd88908, 0x3cf0bd995d64d573
> +        .quad 0xc08ff15eac9b9758, 0xbd5e3653cd796d01
> +        .quad 0xc08ff1623a463e80, 0xbd597573005ef2d8
> +        .quad 0xc08ff165c6d92af0, 0xbd4ee222d6439c41
> +        .quad 0xc08ff16952550880, 0x3d5913b845e75950
> +        .quad 0xc08ff16cdcba8258, 0xbd558e7ba239077e
> +        .quad 0xc08ff170660a4328, 0x3d5a0e174a2cae66
> +        .quad 0xc08ff173ee44f4d8, 0x3d22b8db103db712
> +        .quad 0xc08ff177756b40d8, 0x3d5cc610480853c4
> +        .quad 0xc08ff17afb7dcfe0, 0xbd304a8bc84e5c0f
> +        .quad 0xc08ff17e807d4a28, 0x3d3639d185da5f7d
> +        .quad 0xc08ff182046a5738, 0xbd534705d06d788f
> +        .quad 0xc08ff18587459e10, 0xbd540d25b28a51fd
> +        .quad 0xc08ff189090fc510, 0xbd02d804afa7080a
> +        .quad 0xc08ff18c89c97200, 0x3d5f2a5d305818ba
> +        .quad 0xc08ff19009734a08, 0xbd3a602e9d05c3e4
> +        .quad 0xc08ff193880df1d0, 0xbd533d6fdcd54875
> +        .quad 0xc08ff197059a0d60, 0x3d24eaf0a9490202
> +        .quad 0xc08ff19a82184020, 0xbd5685666d98eb59
> +        .quad 0xc08ff19dfd892cf8, 0xbd509f8745f0868b
> +        .quad 0xc08ff1a177ed7630, 0xbd2dcba340a9d268
> +        .quad 0xc08ff1a4f145bd80, 0x3d4916fcd0331266
> +        .quad 0xc08ff1a86992a408, 0xbd548cd033a49073
> +        .quad 0xc08ff1abe0d4ca68, 0xbd5252f40e5df1a2
> +        .quad 0xc08ff1af570cd0a0, 0xbd541d623bd02248
> +        .quad 0xc08ff1b2cc3b5628, 0xbd258dc48235c071
> +        .quad 0xc08ff1b64060f9e0, 0xbd4b4bd8f02ed3f2
> +        .quad 0xc08ff1b9b37e5a28, 0x3d4e8d20a88cd0a2
> +        .quad 0xc08ff1bd259414c0, 0x3d3b669b6380bc55
> +        .quad 0xc08ff1c096a2c6e8, 0xbd45d54159d51094
> +        .quad 0xc08ff1c406ab0d58, 0x3d59f684ffbca44d
> +        .quad 0xc08ff1c775ad8428, 0x3d543b1b1d508399
> +        .quad 0xc08ff1cae3aac6f8, 0x3d5c30953a12fc6e
> +        .quad 0xc08ff1ce50a370d0, 0xbd1763b04f9aad5f
> +        .quad 0xc08ff1d1bc981c40, 0x3d573c6fa54f46c2
> +        .quad 0xc08ff1d527896338, 0x3d48ccfb9ffd7455
> +        .quad 0xc08ff1d89177df30, 0x3d42756f80d6f7ce
> +        .quad 0xc08ff1dbfa642910, 0xbd3c2bfbc353c5a5
> +        .quad 0xc08ff1df624ed940, 0x3d1d6064f5dc380b
> +        .quad 0xc08ff1e2c9388798, 0x3ce327c6b30711cf
> +        .quad 0xc08ff1e62f21cb70, 0x3d140aa9546525bc
> +        .quad 0xc08ff1e9940b3b98, 0xbd15c1ff43c21863
> +        .quad 0xc08ff1ecf7f56e60, 0x3d590ba680120498
> +        .quad 0xc08ff1f05ae0f988, 0x3d5390c6b62dff50
> +        .quad 0xc08ff1f3bcce7258, 0x3d4da0c90878457f
> +        .quad 0xc08ff1f71dbe6d90, 0x3d30697edc85b98c
> +        .quad 0xc08ff1fa7db17f70, 0x3d04d81188510a79
> +        .quad 0xc08ff1fddca83bb0, 0xbd5f2ddc983ce25c
> +        .quad 0xc08ff2013aa33598, 0x3d46c22f0fae6844
> +        .quad 0xc08ff20497a2ffd0, 0xbd53359b714c3d03
> +        .quad 0xc08ff207f3a82ca0, 0xbd4aefaa5524f88b
> +        .quad 0xc08ff20b4eb34dc0, 0x3d39bf4a4a73d01d
> +        .quad 0xc08ff20ea8c4f468, 0x3d44217befdb12e6
> +        .quad 0xc08ff21201ddb158, 0x3d5219b281d4b6f8
> +        .quad 0xc08ff21559fe14c8, 0xbd5e3b123373d370
> +        .quad 0xc08ff218b126ae88, 0xbd59b525a6edc3cb
> +        .quad 0xc08ff21c07580dd8, 0xbd4b494e7737c4dc
> +        .quad 0xc08ff21f5c92c180, 0xbd3989b7d67e3e54
> +        .quad 0xc08ff222b0d757d0, 0x3d486c8f098ad3cf
> +        .quad 0xc08ff22604265e98, 0x3d5254956d8e15b2
> +        .quad 0xc08ff22956806330, 0x3d3f14730a362959
> +        .quad 0xc08ff22ca7e5f278, 0xbd40e8ed02e32ea1
> +        .quad 0xc08ff22ff85798d8, 0xbd40fb2b9b1e0261
> +        .quad 0xc08ff23347d5e238, 0xbd5bfeb1e13c8bc3
> +        .quad 0xc08ff23696615a18, 0x3d5b891f041e037b
> +        .quad 0xc08ff239e3fa8b60, 0xbd36255027582bb9
> +        .quad 0xc08ff23d30a200a8, 0x3d56bb5a92a55361
> +        .quad 0xc08ff2407c5843f0, 0xbd31902fb4417244
> +        .quad 0xc08ff243c71dded8, 0xbd5a8a7c3c4a2cc6
> +        .quad 0xc08ff24710f35a88, 0xbd23be1be6941016
> +        .quad 0xc08ff24a59d93fa8, 0x3d55c85afafa1d46
> +        .quad 0xc08ff24da1d01668, 0xbd5b4b05a0adcbf1
> +        .quad 0xc08ff250e8d866a0, 0x3d134d191476f74b
> +        .quad 0xc08ff2542ef2b798, 0x3d5e78ce963395e1
> +        .quad 0xc08ff257741f9028, 0x3d3f9219a8f57c17
> +        .quad 0xc08ff25ab85f76c8, 0x3d5cfc6f47ac691b
> +        .quad 0xc08ff25dfbb2f168, 0x3d4ab3b720b5ca71
> +        .quad 0xc08ff2613e1a8598, 0x3d54a4ab99feb71a
> +        .quad 0xc08ff2647f96b868, 0xbd42daa69d79d724
> +        .quad 0xc08ff267c0280e88, 0xbd344d9115018f45
> +        .quad 0xc08ff26affcf0c28, 0xbd56673e143d2ac0
> +        .quad 0xc08ff26e3e8c3518, 0x3d3aac889e91c638
> +        .quad 0xc08ff2717c600ca8, 0x3d4cf65b41d006e7
> +        .quad 0xc08ff274b94b15c0, 0xbd4c821320391e76
> +        .quad 0xc08ff277f54dd2e8, 0x3d51abd6e2ddc2a1
> +        .quad 0xc08ff27b3068c620, 0xbd2f1bdd1264e703
> +        .quad 0xc08ff27e6a9c7110, 0xbd58437b4f032f15
> +        .quad 0xc08ff281a3e954f0, 0xbd4f8e063b069a7d
> +        .quad 0xc08ff284dc4ff288, 0x3d5276d0723a662a
> +        .quad 0xc08ff28813d0ca28, 0xbd5731f7c6d8f6eb
> +        .quad 0xc08ff28b4a6c5bd0, 0xbd58b587f08307ec
> +        .quad 0xc08ff28e80232708, 0x3d57f19a7a352baf
> +        .quad 0xc08ff291b4f5aae0, 0x3d570d99aff32790
> +        .quad 0xc08ff294e8e46610, 0x3d4efafaad4f59db
> +        .quad 0xc08ff2981befd6e0, 0xbd41eb1728371564
> +        .quad 0xc08ff29b4e187b38, 0x3d458465b4e080d7
> +        .quad 0xc08ff29e7f5ed088, 0x3d46acb4a035a820
> +        .quad 0xc08ff2a1afc353e0, 0xbd39fc68238dd5d3
> +        .quad 0xc08ff2a4df4681f0, 0x3d526d90c6750dde
> +        .quad 0xc08ff2a80de8d6f0, 0x3d48505c598278fd
> +        .quad 0xc08ff2ab3baacec0, 0x3d520fece8e148e8
> +        .quad 0xc08ff2ae688ce4d0, 0x3d14f7bf38646243
> +        .quad 0xc08ff2b1948f9430, 0xbd5aa5f693a627df
> +        .quad 0xc08ff2b4bfb35790, 0xbd4725d8e6280861
> +        .quad 0xc08ff2b7e9f8a930, 0x3d482e0765d44bda
> +        .quad 0xc08ff2bb136002e8, 0xbd523d745da75cde
> +        .quad 0xc08ff2be3be9de40, 0xbd32e50b4191ef73
> +        .quad 0xc08ff2c16396b448, 0xbd490856dfe073b2
> +        .quad 0xc08ff2c48a66fdb8, 0xbd512b526137db4d
> +        .quad 0xc08ff2c7b05b32e8, 0x3d5bfcdc71b36585
> +        .quad 0xc08ff2cad573cbb8, 0xbd2c24f2afddb377
> +        .quad 0xc08ff2cdf9b13fc0, 0xbd5ea60d06da12f6
> +        .quad 0xc08ff2d11d140630, 0xbd582f2f9e256dc5
> +        .quad 0xc08ff2d43f9c95d0, 0xbd4411c269523864
> +        .quad 0xc08ff2d7614b6508, 0xbd41107eeb7e1093
> +        .quad 0xc08ff2da8220e9e8, 0x3d5a4aa491710eda
> +        .quad 0xc08ff2dda21d9a10, 0x3d46e50a14550378
> +        .quad 0xc08ff2e0c141ead0, 0xbd4881e3bd846de9
> +        .quad 0xc08ff2e3df8e5118, 0xbd46d93437bd399d
> +        .quad 0xc08ff2e6fd034170, 0xbd5b4ef1e9713a4c
> +        .quad 0xc08ff2ea19a13010, 0x3d4a0e31ed25b3ef
> +        .quad 0xc08ff2ed356890b8, 0xbd5a7a560db90113
> +        .quad 0xc08ff2f05059d6f0, 0x3d51f5bb5f9072c9
> +        .quad 0xc08ff2f36a7575c0, 0x3d5ed5225350a585
> +        .quad 0xc08ff2f683bbdfe0, 0xbd1c9363d9e745db
> +        .quad 0xc08ff2f99c2d87b8, 0x3d329c788e376e0d
> +        .quad 0xc08ff2fcb3cadf40, 0xbd59eb5d29918de0
> +        .quad 0xc08ff2ffca945828, 0xbd4a86aac097a06b
> +        .quad 0xc08ff302e08a63b8, 0x3d541c2c97e8b4d1
> +        .quad 0xc08ff305f5ad72d8, 0x3d43c95dec31821b
> +        .quad 0xc08ff30909fdf620, 0xbd590abed3d72738
> +        .quad 0xc08ff30c1d7c5dd8, 0x3d4caefdad90e913
> +        .quad 0xc08ff30f302919d0, 0xbd4f7ed5e1dcb170
> +        .quad 0xc08ff312420499a0, 0x3d3c590edf8c3407
> +        .quad 0xc08ff315530f4c70, 0x3d5477d46ce838e1
> +        .quad 0xc08ff3186349a118, 0x3d5e4b00c511fa78
> +        .quad 0xc08ff31b72b40610, 0xbd54333e5a0c1658
> +        .quad 0xc08ff31e814ee990, 0x3d25300b88bfa10a
> +        .quad 0xc08ff3218f1ab958, 0xbd5bfbd520249ed7
> +        .quad 0xc08ff3249c17e2f0, 0x3d436b1cdba645b7
> +        .quad 0xc08ff327a846d368, 0xbd5cb667c2f86eaa
> +        .quad 0xc08ff32ab3a7f7a0, 0x3d5334d06a920d5f
> +        .quad 0xc08ff32dbe3bbbf8, 0xbd5407602ab64243
> +        .quad 0xc08ff330c8028ca0, 0xbd52b12c9cc82316
> +        .quad 0xc08ff333d0fcd560, 0x3d158d7dd801324b
> +        .quad 0xc08ff336d92b01a8, 0xbd38b55deae69564
> +        .quad 0xc08ff339e08d7ca0, 0x3d4a92d51dc43d43
> +        .quad 0xc08ff33ce724b110, 0x3d5455afbb5de008
> +        .quad 0xc08ff33fecf10970, 0x3d3b65694b6f87fb
> +        .quad 0xc08ff342f1f2efe8, 0xbd3afb8ccc1260eb
> +        .quad 0xc08ff345f62ace50, 0x3d59c98f7ec71b79
> +        .quad 0xc08ff348f9990e18, 0xbd5238294ff3846d
> +        .quad 0xc08ff34bfc3e1880, 0x3d4deba7087bbf7b
> +        .quad 0xc08ff34efe1a5650, 0xbd573e25d2d308e5
> +        .quad 0xc08ff351ff2e3020, 0xbd44bc302ffa76fb
> +        .quad 0xc08ff354ff7a0e20, 0xbd2cad65891df000
> +        .quad 0xc08ff357fefe5838, 0x3d4b4fe326c05a8a
> +        .quad 0xc08ff35afdbb75f8, 0x3d0fb5680f67649b
> +        .quad 0xc08ff35dfbb1cea8, 0xbd4af509a9977e57
> +        .quad 0xc08ff360f8e1c940, 0x3cea69221cfb0ad6
> +        .quad 0xc08ff363f54bcc60, 0x3d3d116c159fead5
> +        .quad 0xc08ff366f0f03e58, 0xbd5e64e8bff70d5e
> +        .quad 0xc08ff369ebcf8538, 0xbd5cc32ce5effb96
> +        .quad 0xc08ff36ce5ea06b8, 0x3d57bbe811e4fbda
> +        .quad 0xc08ff36fdf402830, 0xbcf46d4595033678
> +        .quad 0xc08ff372d7d24ec8, 0x3d4c4bbec857b9fc
> +        .quad 0xc08ff375cfa0df40, 0xbd59d3f339613a2d
> +        .quad 0xc08ff378c6ac3e28, 0x3d58408e1bcb4e24
> +        .quad 0xc08ff37bbcf4cfa0, 0x3d5fdb793dc8e643
> +        .quad 0xc08ff37eb27af788, 0xbd5f0d884b401f1e
> +        .quad 0xc08ff381a73f1988, 0xbd5a7ed37e2c50b4
> +        .quad 0xc08ff3849b4198e8, 0x3d5b14c1f630b2af
> +        .quad 0xc08ff3878e82d898, 0x3d505a9abef02aff
> +        .quad 0xc08ff38a81033b50, 0xbd4a9bbd51a7d1c4
> +        .quad 0xc08ff38d72c32380, 0x3d4783623464f80e
> +        .quad 0xc08ff39063c2f338, 0xbd0e2d78f68abcc7
> +        .quad 0xc08ff39354030c50, 0x3d3e604763e782cb
> +        .quad 0xc08ff3964383d048, 0xbd4514f0840b6f59
> +        .quad 0xc08ff3993245a060, 0xbd5488753d6035a4
> +        .quad 0xc08ff39c2048dd90, 0x3d5ccc099b5ff97d
> +        .quad 0xc08ff39f0d8de870, 0x3d454ada83325c69
> +        .quad 0xc08ff3a1fa152168, 0x3d1e4b27fb754eb1
> +        .quad 0xc08ff3a4e5dee890, 0x3d58c67819ead583
> +        .quad 0xc08ff3a7d0eb9da8, 0xbd536d02e85d644b
> +        .quad 0xc08ff3aabb3ba048, 0x3d5f510ab9e7c184
> +        .quad 0xc08ff3ada4cf4f98, 0x3d557bc5b296d5f5
> +        .quad 0xc08ff3b08da70a90, 0xbd48893b8f7f52c9
> +        .quad 0xc08ff3b375c32fe8, 0x3d5ca0b69a37d601
> +        .quad 0xc08ff3b65d241df0, 0xbd519c57fff86872
> +        .quad 0xc08ff3b943ca32d8, 0x3d048da0e3a8c3c3
> +        .quad 0xc08ff3bc29b5cc68, 0xbd5dd05e06ec07d0
> +        .quad 0xc08ff3bf0ee74840, 0x3d56c52a5c8015db
> +        .quad 0xc08ff3c1f35f0398, 0x3d54e1dba9930bed
> +        .quad 0xc08ff3c4d71d5b78, 0x3d2c5f679a7932b7
> +        .quad 0xc08ff3c7ba22aca0, 0xbd3f77628aa1aed8
> +        .quad 0xc08ff3cd7e03ac60, 0xbd5cc8a22f1d8591
> +        .quad 0xc08ff3d33f04e360, 0x3d4ae09463e13f6f
> +        .quad 0xc08ff3d8fd292dc8, 0x3d42736efbec3922
> +        .quad 0xc08ff3deb8736390, 0xbce0324f8d149b09
> +        .quad 0xc08ff3e470e65870, 0xbd52089e4b8dd900
> +        .quad 0xc08ff3ea2684dbf0, 0xbd5f8e9d5dea127f
> +        .quad 0xc08ff3efd951b970, 0xbd4b60d79db026b1
> +        .quad 0xc08ff3f5894fb828, 0x3d45ff1d6cea2c52
> +        .quad 0xc08ff3fb36819b38, 0x3d5d56022cd7f5b2
> +        .quad 0xc08ff400e0ea21a8, 0xbd58d63f09907b27
> +        .quad 0xc08ff406888c0690, 0xbd4ce6ea362f7ce0
> +        .quad 0xc08ff40c2d6a00f0, 0x3d519fc9ad2ef3ab
> +        .quad 0xc08ff411cf86c3c8, 0xbd55fc89e7b55f20
> +        .quad 0xc08ff4176ee4fe40, 0xbd53229ca791d9be
> +        .quad 0xc08ff41d0b875b88, 0x3d5e7733e6fb23d1
> +        .quad 0xc08ff422a57082e0, 0x3d5871413696b637
> +        .quad 0xc08ff4283ca317c0, 0x3d4b118aa7f493b9
> +        .quad 0xc08ff42dd121b9c8, 0x3d4bdf3692763b50
> +        .quad 0xc08ff43362ef04c8, 0x3d4867e17476dd63
> +        .quad 0xc08ff438f20d90c8, 0xbd5d49b741c778f3
> +        .quad 0xc08ff43e7e7ff228, 0x3d59ac35724f01e3
> +        .quad 0xc08ff4440848b968, 0xbd5251ccdc49432d
> +        .quad 0xc08ff4498f6a7388, 0x3d56cf153ebc9f07
> +        .quad 0xc08ff44f13e7a9b8, 0x3d503b7a697a659c
> +        .quad 0xc08ff45495c2e198, 0xbd5fa03da8acd872
> +        .quad 0xc08ff45a14fe9d38, 0xbd5e6cfb0b5c38fc
> +        .quad 0xc08ff45f919d5b08, 0x3d468b1f1269f1cf
> +        .quad 0xc08ff4650ba195e0, 0xbd313a3a8f72c0f3
> +        .quad 0xc08ff46a830dc528, 0x3d205d31eb8d2bd4
> +        .quad 0xc08ff46ff7e45cb8, 0xbd56cb8ddf5d4a90
> +        .quad 0xc08ff4756a27cd00, 0x3d272c2d46acdcbf
> +        .quad 0xc08ff47ad9da82e8, 0xbd4946efab7a989d
> +        .quad 0xc08ff48046fee800, 0xbd23fabe48cf933c
> +        .quad 0xc08ff485b1976268, 0x3d4f03b099d80f79
> +        .quad 0xc08ff48b19a654e0, 0x3d4fe0c35ab7e9b5
> +        .quad 0xc08ff4907f2e1ed0, 0xbd54b4843f34fe09
> +        .quad 0xc08ff495e2311c58, 0xbd5dfa6541236a64
> +        .quad 0xc08ff49b42b1a648, 0x3d56fd2c8c418cbb
> +        .quad 0xc08ff4a0a0b21218, 0x3d5e687ef208418a
> +        .quad 0xc08ff4a5fc34b210, 0x3d4a671ce14c5521
> +        .quad 0xc08ff4ab553bd540, 0x3d419d0202e3cd96
> +        .quad 0xc08ff4b0abc9c780, 0x3d576b941a895781
> +        .quad 0xc08ff4b5ffe0d170, 0xbd4ea96d88cd1a30
> +        .quad 0xc08ff4bb518338a0, 0x3d4d6b405bd43ba6
> +        .quad 0xc08ff4c0a0b33f60, 0xbcf03382150a56b7
> +        .quad 0xc08ff4c5ed7324f8, 0xbd400df96beb0937
> +        .quad 0xc08ff4cb37c52590, 0xbd5c161714cdebd5
> +        .quad 0xc08ff4d07fab7a48, 0xbd333e8eda1a8e79
> +        .quad 0xc08ff4d5c5285928, 0x3d53aba20381d59f
> +        .quad 0xc08ff4db083df530, 0xbd45e9b07af4e77c
> +        .quad 0xc08ff4e048ee7e70, 0xbd533cfdb78a8c41
> +        .quad 0xc08ff4e5873c21f0, 0xbd5d9b87f4d283f2
> +        .quad 0xc08ff4eac32909c8, 0xbd53a677deee97fa
> +        .quad 0xc08ff4effcb75d18, 0xbd5afd9f5dedc208
> +        .quad 0xc08ff4f533e94020, 0x3ce9dd794d20ab77
> +        .quad 0xc08ff4fa68c0d428, 0xbd5eeae84ba1cbf1
> +        .quad 0xc08ff4ff9b4037b0, 0xbd4f4451587282c8
> +        .quad 0xc08ff504cb698648, 0xbd4a1fa15087e717
> +        .quad 0xc08ff509f93ed8b0, 0xbd5f2f0042b9331a
> +        .quad 0xc08ff50f24c244e0, 0xbd2c2389f8e86341
> +        .quad 0xc08ff5144df5ddf0, 0xbd556fcb7b48f200
> +        .quad 0xc08ff51974dbb448, 0x3d43ba060aa69038
> +        .quad 0xc08ff51e9975d578, 0x3d477ef38ca20229
> +        .quad 0xc08ff523bbc64c60, 0x3d49bcaf1aa4168a
> +        .quad 0xc08ff528dbcf2120, 0xbd51c5609b60687e
> +        .quad 0xc08ff52df9925930, 0xbd51691708d22ce7
> +        .quad 0xc08ff5331511f750, 0x3d30d05c98ecb3d1
> +        .quad 0xc08ff5382e4ffb90, 0xbd423adb056dd244
> +        .quad 0xc08ff53d454e6368, 0xbd3663607042da50
> +        .quad 0xc08ff5425a0f29a8, 0x3d42655d3c6187a6
> +        .quad 0xc08ff5476c944680, 0xbd028c958ae09d20
> +        .quad 0xc08ff54c7cdfaf90, 0xbd436eaf17756653
> +        .quad 0xc08ff5518af357e8, 0x3d5fbbbee66f8d24
> +        .quad 0xc08ff55696d12ff0, 0xbd5d93b389497880
> +        .quad 0xc08ff55ba07b25b0, 0xbd43ff8ff777f337
> +        .quad 0xc08ff560a7f32488, 0xbcf3568803ec82a4
> +        .quad 0xc08ff565ad3b1560, 0xbd50c83eba5cc7ea
> +        .quad 0xc08ff56ab054deb0, 0x3d5becc2411500b7
> +        .quad 0xc08ff56fb1426458, 0xbd5dac964ffa8b83
> +        .quad 0xc08ff574b00587f0, 0x3d1d82f6cc82e69f
> +        .quad 0xc08ff579aca02878, 0xbd34767c0d40542c
> +        .quad 0xc08ff57ea7142298, 0xbd52d28e996ed2ce
> +        .quad 0xc08ff5839f635090, 0xbd432a85d337086d
> +        .quad 0xc08ff588958f8a38, 0x3d512b06ec20c7fd
> +        .quad 0xc08ff58d899aa500, 0xbd47e2147555e10b
> +        .quad 0xc08ff5927b867410, 0xbd4d84480a1b301d
> +        .quad 0xc08ff5976b54c830, 0x3d5622146f3a51bd
> +        .quad 0xc08ff59c59076fc8, 0x3d46d485c5f9c392
> +        .quad 0xc08ff5a144a03700, 0xbd4562714549f4fd
> +        .quad 0xc08ff5a62e20e7b8, 0x3d541ab67e365a63
> +        .quad 0xc08ff5ab158b4970, 0xbd5b0855668b2369
> +        .quad 0xc08ff5affae12188, 0x3d27de1bc2ed4dd8
> +        .quad 0xc08ff5b4de243300, 0x3d40f2592d5ed454
> +        .quad 0xc08ff5b9bf563ea8, 0xbd4ee2f8ba7b3e9e
> +        .quad 0xc08ff5be9e790320, 0xbd3c2214335c2164
> +        .quad 0xc08ff5c37b8e3cc8, 0x3d30745623ab1fd9
> +        .quad 0xc08ff5c85697a5d0, 0xbd326c8fb0ffde38
> +        .quad 0xc08ff5cd2f96f640, 0xbd4c83277493b0bc
> +        .quad 0xc08ff5d2068de3f8, 0x3d39bb1655e6e5ba
> +        .quad 0xc08ff5d6db7e22a8, 0x3d403170b47a5559
> +        .quad 0xc08ff5dbae6963e8, 0x3d5801ddf1edc325
> +        .quad 0xc08ff5e07f515728, 0x3d4b2704c46fe064
> +        .quad 0xc08ff5e54e37a9c8, 0x3d5a16e99ed6cd83
> +        .quad 0xc08ff5ea1b1e0700, 0xbd5353a3ac18c62f
> +        .quad 0xc08ff5eee6061810, 0x3d567c69c189f21a
> +        .quad 0xc08ff5f3aef18400, 0xbd50dd3220e0b0f2
> +        .quad 0xc08ff5f875e1eff0, 0xbd3ab64d80638db2
> +        .quad 0xc08ff5fd3ad8fee0, 0x3d3ec753439035aa
> +        .quad 0xc08ff601fdd851c8, 0xbd5e10415f5f5e74
> +        .quad 0xc08ff606bee187b0, 0xbd55f1048b113fae
> +        .quad 0xc08ff60b7df63d90, 0x3d1e94e4107406c8
> +        .quad 0xc08ff6103b180e60, 0xbd4e2eb5d0c36eb5
> +        .quad 0xc08ff614f6489330, 0x3d43ec5c714f709a
> +        .quad 0xc08ff619af896308, 0x3d519ec459b62a08
> +        .quad 0xc08ff61e66dc1300, 0xbd5b93d09dd6161d
> +        .quad 0xc08ff6231c423658, 0x3d5d72b849dd56be
> +        .quad 0xc08ff627cfbd5e38, 0xbd276b7e32659173
> +        .quad 0xc08ff62c814f1a08, 0x3d4fd918f2e7a6b9
> +        .quad 0xc08ff63130f8f730, 0x3d5609ba1dcc4c97
> +        .quad 0xc08ff635debc8138, 0xbd55cab233dbd84c
> +        .quad 0xc08ff63a8a9b41d8, 0xbd56778ab7aaabc9
> +        .quad 0xc08ff63f3496c0e0, 0x3d5b2791da49c370
> +        .quad 0xc08ff643dcb08438, 0x3d583063ef145f9c
> +        .quad 0xc08ff64882ea1000, 0xbd484e9cab375fb6
> +        .quad 0xc08ff64d2744e688, 0xbd5c430c95c374aa
> +        .quad 0xc08ff651c9c28848, 0xbd57a16d78490bb3
> +        .quad 0xc08ff6566a6473e8, 0xbd445d70374ea9ec
> +        .quad 0xc08ff65b092c2648, 0x3d5c9729142b9d4b
> +        .quad 0xc08ff65fa61b1a70, 0xbd4aaa179d032405
> +        .quad 0xc08ff6644132c9c0, 0xbd2a3ea300d173de
> +        .quad 0xc08ff668da74abc0, 0x3d57809438efb010
> +        .quad 0xc08ff66d71e23630, 0xbd5e9156720951d6
> +        .quad 0xc08ff672077cdd30, 0xbd5bab62e8462035
> +        .quad 0xc08ff6769b461310, 0xbd05113545431443
> +        .quad 0xc08ff67b2d3f4868, 0x3d5105eb0607e59b
> +        .quad 0xc08ff67fbd69ec18, 0xbd5e657842b37dc0
> +        .quad 0xc08ff6844bc76b68, 0x3d4ad1849705bc4c
> +        .quad 0xc08ff688d85931c8, 0xbd508b6f92b6e0d6
> +        .quad 0xc08ff68d6320a920, 0x3d48683cceb5fdfc
> +        .quad 0xc08ff691ec1f3990, 0xbd2c25ee290acbf5
> +        .quad 0xc08ff696735649a8, 0x3d58904932cd46d0
> +        .quad 0xc08ff69af8c73e38, 0xbd5c964167f0bfeb
> +        .quad 0xc08ff69f7c737a90, 0xbd43d66937fa06a9
> +        .quad 0xc08ff6a3fe5c6040, 0xbd54bc302ffa76fb
> +        .quad 0xc08ff6a87e834f50, 0x3d4609b1487f87a3
> +        .quad 0xc08ff6acfce9a618, 0xbd42c0d9af0400b1
> +        .quad 0xc08ff6b17990c170, 0x3d549a63973d262d
> +        .quad 0xc08ff6b5f479fc80, 0xbd28cde894aa0641
> +        .quad 0xc08ff6ba6da6b0f0, 0xbd5acef617609a34
> +        .quad 0xc08ff6bee51836d8, 0x3d4abb9ff3cf80b8
> +        .quad 0xc08ff6c35acfe4a8, 0xbd53dcfa1b7697f3
> +        .quad 0xc08ff6c7cecf0f68, 0x3d5bcdf4aea18a55
> +        .quad 0xc08ff6cc41170a70, 0x3d3cad29d4324038
> +        .quad 0xc08ff6d0b1a927b0, 0x3d56945f9cc2a565
> +        .quad 0xc08ff6d52086b780, 0x3d5d20dfc1c668a7
> +        .quad 0xc08ff6d98db108b8, 0x3d37f20a9bcbbe04
> +        .quad 0xc08ff6ddf92968b8, 0x3d1e0824a6e3a4d2
> +        .quad 0xc08ff6e262f12358, 0xbd469f07bf6322c7
> +        .quad 0xc08ff6e6cb0982f8, 0xbd5cc593afdbfaef
> +        .quad 0xc08ff6eb3173d080, 0xbd5ee68d555d7122
> +        .quad 0xc08ff6ef96315360, 0xbd144ee1d6a39124
> +        .quad 0xc08ff6f3f9435188, 0xbd40f2cb308bcd25
> +        .quad 0xc08ff6f85aab0f80, 0xbd5fd98ced08a73c
> +        .quad 0xc08ff6fcba69d068, 0x3d54f2f2a1ea8606
> +        .quad 0xc08ff7011880d5d0, 0xbd57818234572db7
> +        .quad 0xc08ff70574f16008, 0x3d52429e823a9a83
> +        .quad 0xc08ff709cfbcadd0, 0x3d5d6dc9bb81476c
> +        .quad 0xc08ff70e28e3fc90, 0x3d57d189e116bcb2
> +        .quad 0xc08ff71280688848, 0x3d0e18992809fd6d
> +        .quad 0xc08ff716d64b8b98, 0xbd3b48ac92b8549a
> +        .quad 0xc08ff71b2a8e3fb8, 0xbd4dcfa48040893b
> +        .quad 0xc08ff71f7d31dc88, 0x3d58d945b8e53ef1
> +        .quad 0xc08ff723ce379878, 0x3d4f80faef3e15ee
> +        .quad 0xc08ff7281da0a8b0, 0x3d53edc0fd40d18f
> +        .quad 0xc08ff72c6b6e40f0, 0xbd4bcac66e0be72f
> +        .quad 0xc08ff730b7a193b0, 0xbd44fcf96e2ec967
> +        .quad 0xc08ff735023bd208, 0x3d57e2ff34b08d86
> +        .quad 0xc08ff7394b3e2bb0, 0xbd4caedfb10b98dd
> +        .quad 0xc08ff73d92a9cf28, 0xbd55db1083e5ac6a
> +        .quad 0xc08ff741d87fe990, 0xbd580e83e6d54ed6
> +        .quad 0xc08ff7461cc1a6c0, 0x3d1688c83e1b0cba
> +        .quad 0xc08ff74a5f703138, 0xbd52c398c872b701
> +        .quad 0xc08ff74ea08cb240, 0xbd49aabc3683b259
> +        .quad 0xc08ff752e01851d0, 0x3d5ccba8de72495b
> +        .quad 0xc08ff7571e143688, 0xbd5981cf630f5793
> +        .quad 0xc08ff75b5a8185e8, 0xbd4f235844e01ebd
> +        .quad 0xc08ff75f95616410, 0xbd5047de7ba8ec62
> +        .quad 0xc08ff763ceb4f3f0, 0x3d5fa55e004d6562
> +        .quad 0xc08ff768067d5720, 0xbd49f386e521a80e
> +        .quad 0xc08ff76c3cbbae20, 0x3d3693551e62fe83
> +        .quad 0xc08ff77071711818, 0x3d4ba63b30b6c42c
> +        .quad 0xc08ff774a49eb300, 0x3d4c26523d32f573
> +        .quad 0xc08ff778d6459b98, 0x3d3b65e70806143a
> +        .quad 0xc08ff77d0666ed68, 0xbd5796d9c9f2c2cb
> +        .quad 0xc08ff7813503c2d0, 0x3d33267b004b912b
> +        .quad 0xc08ff785621d34e8, 0x3d1d5d8a23e33341
> +        .quad 0xc08ff7898db45ba8, 0x3d46c95233e60f40
> +        .quad 0xc08ff78db7ca4dd0, 0x3d362865acc8f43f
> +        .quad 0xc08ff791e06020f8, 0xbd10e8203e161511
> +        .quad 0xc08ff7960776e988, 0xbd5cafe4f4467eaa
> +        .quad 0xc08ff79a2d0fbac8, 0xbd520fddea9ea0cd
> +        .quad 0xc08ff79e512ba6d0, 0x3d5c53d3778dae46
> +        .quad 0xc08ff7a273cbbe80, 0xbd5f0f6f88490367
> +        .quad 0xc08ff7a694f111c0, 0x3d5601aa3f55ec11
> +        .quad 0xc08ff7aab49caf20, 0xbd4f1a8a2328a4c4
> +        .quad 0xc08ff7aed2cfa438, 0xbd4a3d5341c07d0e
> +        .quad 0xc08ff7b2ef8afd68, 0xbd5f4a1f4c525f31
> +        .quad 0xc08ff7b70acfc600, 0xbd4d594d77b3d775
> +        .quad 0xc08ff7bb249f0828, 0x3d2aef47e37e953b
> +        .quad 0xc08ff7bf3cf9ccf0, 0x3d501803b47dfba2
> +        .quad 0xc08ff7c353e11c50, 0x3d5ed5ec84e5745e
> +        .quad 0xc08ff7c76955fd20, 0xbd3de249bc9e7f96
> +        .quad 0xc08ff7cb7d597538, 0x3d5b5794341d1fdf
> +        .quad 0xc08ff7cf8fec8938, 0xbd519dbd08276359
> +        .quad 0xc08ff7d3a1103cd0, 0xbd450129b8038848
> +        .quad 0xc08ff7d7b0c59288, 0x3d348f00d3bb30fd
> +        .quad 0xc08ff7dbbf0d8bd8, 0xbd43529025720d8a
> +        .quad 0xc08ff7dfcbe92938, 0x3d5abdaa2b1955d7
> +        .quad 0xc08ff7e3d75969f8, 0xbd4e8837d4588a98
> +        .quad 0xc08ff7e7e15f4c80, 0x3d57a782a6df5a1f
> +        .quad 0xc08ff7ebe9fbce08, 0x3d304ba3eaa96bf1
> +        .quad 0xc08ff7eff12fead8, 0xbd47aab17b868a60
> +        .quad 0xc08ff7f3f6fc9e28, 0xbd5bd858693ba90a
> +        .quad 0xc08ff7f7fb62e230, 0x3d26abb2c547789a
> +        .quad 0xc08ff7fbfe63b010, 0xbd59d383d543b3f5
> +        .quad 0xc08ff80000000000, 0x8000000000000000
> +        /*== Log_LA_table ==*/
> +        .align 16
> +        .quad 0x0000000000000000
> +        .quad 0xbf670f83ff0a7565
> +        .quad 0xbf7709c46d7aac77
> +        .quad 0xbf8143068125dd0e
> +        .quad 0xbf86fe50b6ef0851
> +        .quad 0xbf8cb6c3abd14559
> +        .quad 0xbf91363117a97b0c
> +        .quad 0xbf940f9786685d29
> +        .quad 0xbf96e79685c2d22a
> +        .quad 0xbf99be2f7749acc2
> +        .quad 0xbf9c9363ba850f86
> +        .quad 0xbf9f6734acf8695a
> +        .quad 0xbfa11cd1d5133413
> +        .quad 0xbfa2855905ca70f6
> +        .quad 0xbfa3ed3094685a26
> +        .quad 0xbfa554592bb8cd58
> +        .quad 0xbfa6bad3758efd87
> +        .quad 0xbfa820a01ac754cb
> +        .quad 0xbfa985bfc3495194
> +        .quad 0xbfaaea3316095f72
> +        .quad 0xbfac4dfab90aab5f
> +        .quad 0xbfadb1175160f3b0
> +        .quad 0xbfaf1389833253a0
> +        .quad 0xbfb03aa8f8dc854c
> +        .quad 0xbfb0eb389fa29f9b
> +        .quad 0xbfb19b74069f5f0a
> +        .quad 0xbfb24b5b7e135a3d
> +        .quad 0xbfb2faef55ccb372
> +        .quad 0xbfb3aa2fdd27f1c3
> +        .quad 0xbfb4591d6310d85a
> +        .quad 0xbfb507b836033bb7
> +        .quad 0xbfb5b600a40bd4f3
> +        .quad 0xbfb663f6fac91316
> +        .quad 0xbfb7119b876bea86
> +        .quad 0xbfb7beee96b8a281
> +        .quad 0xbfb86bf07507a0c7
> +        .quad 0xbfb918a16e46335b
> +        .quad 0xbfb9c501cdf75872
> +        .quad 0xbfba7111df348494
> +        .quad 0xbfbb1cd1ecae66e7
> +        .quad 0xbfbbc84240adabba
> +        .quad 0xbfbc73632513bd4f
> +        .quad 0xbfbd1e34e35b82da
> +        .quad 0xbfbdc8b7c49a1ddb
> +        .quad 0xbfbe72ec117fa5b2
> +        .quad 0xbfbf1cd21257e18c
> +        .quad 0xbfbfc66a0f0b00a5
> +        .quad 0xbfc037da278f2870
> +        .quad 0xbfc08c588cda79e4
> +        .quad 0xbfc0e0b05ac848ed
> +        .quad 0xbfc134e1b489062e
> +        .quad 0xbfc188ecbd1d16be
> +        .quad 0xbfc1dcd197552b7b
> +        .quad 0xbfc2309065d29791
> +        .quad 0xbfc284294b07a640
> +        .quad 0xbfc2d79c6937efdd
> +        .quad 0xbfc32ae9e278ae1a
> +        .quad 0xbfc37e11d8b10f89
> +        .quad 0xbfc3d1146d9a8a64
> +        .quad 0xbfc423f1c2c12ea2
> +        .quad 0xbfc476a9f983f74d
> +        .quad 0xbfc4c93d33151b24
> +        .quad 0xbfc51bab907a5c8a
> +        .quad 0xbfc56df5328d58c5
> +        .quad 0xbfc5c01a39fbd688
> +        .quad 0xbfc6121ac74813cf
> +        .quad 0xbfc663f6fac91316
> +        .quad 0xbfc6b5aef4aae7dc
> +        .quad 0xbfc70742d4ef027f
> +        .quad 0xbfc758b2bb6c7b76
> +        .quad 0xbfc7a9fec7d05ddf
> +        .quad 0xbfc7fb27199df16d
> +        .quad 0xbfc84c2bd02f03b3
> +        .quad 0xbfc89d0d0ab430cd
> +        .quad 0xbfc8edcae8352b6c
> +        .quad 0xbfc93e6587910444
> +        .quad 0xbfc98edd077e70df
> +        .quad 0xbfc9df31868c11d5
> +        .quad 0xbfca2f632320b86b
> +        .quad 0xbfca7f71fb7bab9d
> +        .quad 0xbfcacf5e2db4ec94
> +        .quad 0xbfcb1f27d7bd7a80
> +        .quad 0xbfcb6ecf175f95e9
> +        .quad 0xbfcbbe540a3f036f
> +        .quad 0xbfcc0db6cdd94dee
> +        .quad 0xbfcc5cf77f860826
> +        .quad 0xbfccac163c770dc9
> +        .quad 0xbfccfb1321b8c400
> +        .quad 0xbfcd49ee4c325970
> +        .quad 0xbfcd98a7d8a605a7
> +        .quad 0xbfcde73fe3b1480f
> +        .quad 0xbfce35b689cd2655
> +        .quad 0xbfce840be74e6a4d
> +        .quad 0xbfced2401865df52
> +        .quad 0xbfcf205339208f27
> +        .quad 0xbfcf6e456567fe55
> +        .quad 0xbfcfbc16b902680a
> +        .quad 0xbfd004e3a7c97cbd
> +        .quad 0xbfd02baba24d0664
> +        .quad 0xbfd0526359bab1b3
> +        .quad 0xbfd0790adbb03009
> +        .quad 0xbfd09fa235ba2020
> +        .quad 0xbfd0c62975542a8f
> +        .quad 0xbfd0eca0a7e91e0b
> +        .quad 0xbfd11307dad30b76
> +        .quad 0xbfd1395f1b5b61a6
> +        .quad 0xbfd15fa676bb08ff
> +        .quad 0xbfd185ddfa1a7ed0
> +        .quad 0xbfd1ac05b291f070
> +        .quad 0xbfd1d21dad295632
> +        .quad 0xbfd1f825f6d88e13
> +        .quad 0xbfd21e1e9c877639
> +        .quad 0xbfd24407ab0e073a
> +        .quad 0xbfd269e12f346e2c
> +        .quad 0xbfd28fab35b32683
> +        .quad 0xbfd2b565cb3313b6
> +        .quad 0xbfd2db10fc4d9aaf
> +        .quad 0xbfd300acd58cbb10
> +        .quad 0xbfd32639636b2836
> +        .quad 0xbfd34bb6b2546218
> +        .quad 0xbfd37124cea4cded
> +        .quad 0xbfd39683c4a9ce9a
> +        .quad 0xbfd3bbd3a0a1dcfb
> +        .quad 0xbfd3e1146ebc9ff2
> +        .quad 0xbfd406463b1b0449
> +        .quad 0xbfd42b6911cf5465
> +        .quad 0xbfd4507cfedd4fc4
> +        .quad 0xbfd475820e3a4251
> +        .quad 0xbfd49a784bcd1b8b
> +        .quad 0xbfd4bf5fc36e8577
> +        .quad 0xbfd4e43880e8fb6a
> +        .quad 0xbfd509028ff8e0a2
> +        .quad 0xbfd52dbdfc4c96b3
> +        .quad 0xbfd5526ad18493ce
> +        .quad 0xbfd577091b3378cb
> +        .quad 0xbfd59b98e4de271c
> +        .quad 0xbfd5c01a39fbd688
> +        .quad 0xbfd5e48d25f62ab9
> +        .quad 0xbfd608f1b42948ae
> +        .quad 0xbfd62d47efe3ebee
> +        .quad 0xbfd6518fe4677ba7
> +        .quad 0xbfd675c99ce81f92
> +        .quad 0xbfd699f5248cd4b8
> +        .quad 0xbfd6be12866f820d
> +        .quad 0xbfd6e221cd9d0cde
> +        .quad 0xbfd7062305156d1d
> +        .quad 0xbfd72a1637cbc183
> +        .quad 0xbfd74dfb70a66388
> +        .quad 0xbfd771d2ba7efb3c
> +        .quad 0xbfd7959c202292f1
> +        .quad 0xbfd7b957ac51aac4
> +        .quad 0xbfd7dd0569c04bff
> +        .quad 0xbfd800a563161c54
> +        .quad 0xbfd82437a2ee70f7
> +        .quad 0xbfd847bc33d8618e
> +        .quad 0xbfd86b332056db01
> +        .quad 0xbfd88e9c72e0b226
> +        .quad 0xbfd8b1f835e0b642
> +        .quad 0xbfd8d54673b5c372
> +        .quad 0xbfd8f88736b2d4e8
> +        .quad 0xbfd91bba891f1709
> +        .quad 0xbfd93ee07535f967
> +        .quad 0xbfd961f90527409c
> +        .quad 0xbfd98504431717fc
> +        .quad 0xbfd9a802391e232f
> +        .quad 0xbfd9caf2f1498fa4
> +        .quad 0xbfd9edd6759b25e0
> +        .quad 0xbfda10acd0095ab4
> +        .quad 0xbfda33760a7f6051
> +        .quad 0xbfda56322edd3731
> +        .quad 0xbfda78e146f7bef4
> +        .quad 0xbfda9b835c98c70a
> +        .quad 0xbfdabe18797f1f49
> +        .quad 0xbfdae0a0a75ea862
> +        .quad 0xbfdb031befe06434
> +        .quad 0xbfdb258a5ca28608
> +        .quad 0xbfdb47ebf73882a1
> +        .quad 0xbfdb6a40c92b203f
> +        .quad 0xbfdb8c88dbf8867a
> +        .quad 0xbfdbaec439144dfd
> +        .quad 0xbfdbd0f2e9e79031
> +        .quad 0xbfdbf314f7d0f6ba
> +        .quad 0xbfdc152a6c24cae6
> +        .quad 0xbfdc3733502d04f8
> +        .quad 0xbfdc592fad295b56
> +        .quad 0xbfdc7b1f8c4f51a4
> +        .quad 0xbfdc9d02f6ca47b4
> +        .quad 0xbfdcbed9f5bb886a
> +        .quad 0xbfdce0a4923a587d
> +        .quad 0xbfdd0262d554051c
> +        .quad 0xbfdd2414c80bf27d
> +        .quad 0xbfdd45ba735baa4f
> +        .quad 0xbfdd6753e032ea0f
> +        .quad 0xbfdd88e11777b149
> +        .quad 0xbfddaa6222064fb9
> +        .quad 0xbfddcbd708b17359
> +        .quad 0xbfdded3fd442364c
> +        .quad 0xbfde0e9c8d782cbd
> +        .quad 0xbfde2fed3d097298
> +        .quad 0xbfde5131eba2b931
> +        .quad 0xbfde726aa1e754d2
> +        .quad 0xbfde939768714a32
> +        .quad 0xbfdeb4b847d15bce
> +        .quad 0xbfded5cd488f1732
> +        .quad 0xbfdef6d67328e220
> +        .quad 0xbfdf17d3d01407af
> +        .quad 0xbfdf38c567bcc541
> +        .quad 0xbfdf59ab4286576c
> +        .quad 0xbfdf7a8568cb06cf
> +        .quad 0xbfdf9b53e2dc34c4
> +        .quad 0xbfdfbc16b902680a
> +        .quad 0xbfdfdccdf37d594c
> +        .quad 0xbfdffd799a83ff9b
> +        .quad 0x3fdfe1e649bb6335
> +        .quad 0x3fdfc151b11b3640
> +        .quad 0x3fdfa0c8937e7d5d
> +        .quad 0x3fdf804ae8d0cd02
> +        .quad 0x3fdf5fd8a9063e35
> +        .quad 0x3fdf3f71cc1b629c
> +        .quad 0x3fdf1f164a15389a
> +        .quad 0x3fdefec61b011f85
> +        .quad 0x3fdede8136f4cbf1
> +        .quad 0x3fdebe47960e3c08
> +        .quad 0x3fde9e193073ac06
> +        .quad 0x3fde7df5fe538ab3
> +        .quad 0x3fde5dddf7e46e0a
> +        .quad 0x3fde3dd1156507de
> +        .quad 0x3fde1dcf4f1c1a9e
> +        .quad 0x3fddfdd89d586e2b
> +        .quad 0x3fddddecf870c4c1
> +        .quad 0x3fddbe0c58c3cff2
> +        .quad 0x3fdd9e36b6b825b1
> +        .quad 0x3fdd7e6c0abc3579
> +        .quad 0x3fdd5eac4d463d7e
> +        .quad 0x3fdd3ef776d43ff4
> +        .quad 0x3fdd1f4d7febf868
> +        .quad 0x3fdcffae611ad12b
> +        .quad 0x3fdce01a12f5d8d1
> +        .quad 0x3fdcc0908e19b7bd
> +        .quad 0x3fdca111cb2aa5c5
> +        .quad 0x3fdc819dc2d45fe4
> +        .quad 0x3fdc62346dca1dfe
> +        .quad 0x3fdc42d5c4c688b4
> +        .quad 0x3fdc2381c08baf4f
> +        .quad 0x3fdc043859e2fdb3
> +        .quad 0x3fdbe4f9899d326e
> +        .quad 0x3fdbc5c5489254cc
> +        .quad 0x3fdba69b8fa1ab02
> +        .quad 0x3fdb877c57b1b070
> +        .quad 0x3fdb686799b00be3
> +        .quad 0x3fdb495d4e9185f7
> +        .quad 0x3fdb2a5d6f51ff83
> +        .quad 0x3fdb0b67f4f46810
> +        .quad 0x3fdaec7cd882b46c
> +        .quad 0x3fdacd9c130dd53f
> +        .quad 0x3fdaaec59dadadbe
> +        .quad 0x3fda8ff971810a5e
> +        .quad 0x3fda713787ad97a5
> +        .quad 0x3fda527fd95fd8ff
> +        .quad 0x3fda33d25fcb1fac
> +        .quad 0x3fda152f142981b4
> +        .quad 0x3fd9f695efbbd0ef
> +        .quad 0x3fd9d806ebc9921c
> +        .quad 0x3fd9b98201a0f405
> +        .quad 0x3fd99b072a96c6b2
> +        .quad 0x3fd97c96600672ad
> +        .quad 0x3fd95e2f9b51f04e
> +        .quad 0x3fd93fd2d5e1bf1d
> +        .quad 0x3fd921800924dd3b
> +        .quad 0x3fd903372e90bee4
> +        .quad 0x3fd8e4f83fa145ee
> +        .quad 0x3fd8c6c335d8b966
> +        .quad 0x3fd8a8980abfbd32
> +        .quad 0x3fd88a76b7e549c6
> +        .quad 0x3fd86c5f36dea3dc
> +        .quad 0x3fd84e5181475449
> +        .quad 0x3fd8304d90c11fd3
> +        .quad 0x3fd812535ef3ff19
> +        .quad 0x3fd7f462e58e1688
> +        .quad 0x3fd7d67c1e43ae5c
> +        .quad 0x3fd7b89f02cf2aad
> +        .quad 0x3fd79acb8cf10390
> +        .quad 0x3fd77d01b66fbd37
> +        .quad 0x3fd75f417917e02c
> +        .quad 0x3fd7418acebbf18f
> +        .quad 0x3fd723ddb1346b65
> +        .quad 0x3fd7063a1a5fb4f2
> +        .quad 0x3fd6e8a004221b1f
> +        .quad 0x3fd6cb0f6865c8ea
> +        .quad 0x3fd6ad88411abfea
> +        .quad 0x3fd6900a8836d0d5
> +        .quad 0x3fd6729637b59418
> +        .quad 0x3fd6552b49986277
> +        .quad 0x3fd637c9b7e64dc2
> +        .quad 0x3fd61a717cac1983
> +        .quad 0x3fd5fd2291fc33cf
> +        .quad 0x3fd5dfdcf1eeae0e
> +        .quad 0x3fd5c2a096a135dc
> +        .quad 0x3fd5a56d7a370ded
> +        .quad 0x3fd5884396d90702
> +        .quad 0x3fd56b22e6b578e5
> +        .quad 0x3fd54e0b64003b70
> +        .quad 0x3fd530fd08f29fa7
> +        .quad 0x3fd513f7cfcb68ce
> +        .quad 0x3fd4f6fbb2cec598
> +        .quad 0x3fd4da08ac46495a
> +        .quad 0x3fd4bd1eb680e548
> +        .quad 0x3fd4a03dcbd2e1be
> +        .quad 0x3fd48365e695d797
> +        .quad 0x3fd466970128a987
> +        .quad 0x3fd449d115ef7d87
> +        .quad 0x3fd42d141f53b646
> +        .quad 0x3fd4106017c3eca3
> +        .quad 0x3fd3f3b4f9b3e939
> +        .quad 0x3fd3d712bf9c9def
> +        .quad 0x3fd3ba7963fc1f8f
> +        .quad 0x3fd39de8e1559f6f
> +        .quad 0x3fd3816132316520
> +        .quad 0x3fd364e2511cc821
> +        .quad 0x3fd3486c38aa29a8
> +        .quad 0x3fd32bfee370ee68
> +        .quad 0x3fd30f9a4c0d786d
> +        .quad 0x3fd2f33e6d2120f2
> +        .quad 0x3fd2d6eb4152324f
> +        .quad 0x3fd2baa0c34be1ec
> +        .quad 0x3fd29e5eedbe4a35
> +        .quad 0x3fd28225bb5e64a4
> +        .quad 0x3fd265f526e603cb
> +        .quad 0x3fd249cd2b13cd6c
> +        .quad 0x3fd22dadc2ab3497
> +        .quad 0x3fd21196e87473d1
> +        .quad 0x3fd1f588973c8747
> +        .quad 0x3fd1d982c9d52708
> +        .quad 0x3fd1bd857b14c146
> +        .quad 0x3fd1a190a5d674a0
> +        .quad 0x3fd185a444fa0a7b
> +        .quad 0x3fd169c05363f158
> +        .quad 0x3fd14de4cbfd373e
> +        .quad 0x3fd13211a9b38424
> +        .quad 0x3fd11646e7791469
> +        .quad 0x3fd0fa848044b351
> +        .quad 0x3fd0deca6f11b58b
> +        .quad 0x3fd0c318aedff3c0
> +        .quad 0x3fd0a76f3ab3c52c
> +        .quad 0x3fd08bce0d95fa38
> +        .quad 0x3fd070352293d724
> +        .quad 0x3fd054a474bf0eb7
> +        .quad 0x3fd0391bff2dbcf3
> +        .quad 0x3fd01d9bbcfa61d4
> +        .quad 0x3fd00223a943dc19
> +        .quad 0x3fcfcd677e5ac81d
> +        .quad 0x3fcf9697f3bd0ccf
> +        .quad 0x3fcf5fd8a9063e35
> +        .quad 0x3fcf29299496a889
> +        .quad 0x3fcef28aacd72231
> +        .quad 0x3fcebbfbe83901a6
> +        .quad 0x3fce857d3d361368
> +        .quad 0x3fce4f0ea2509008
> +        .quad 0x3fce18b00e13123d
> +        .quad 0x3fcde26177108d03
> +        .quad 0x3fcdac22d3e441d3
> +        .quad 0x3fcd75f41b31b6dd
> +        .quad 0x3fcd3fd543a4ad5c
> +        .quad 0x3fcd09c643f117f0
> +        .quad 0x3fccd3c712d31109
> +        .quad 0x3fcc9dd7a70ed160
> +        .quad 0x3fcc67f7f770a67e
> +        .quad 0x3fcc3227facce950
> +        .quad 0x3fcbfc67a7fff4cc
> +        .quad 0x3fcbc6b6f5ee1c9b
> +        .quad 0x3fcb9115db83a3dd
> +        .quad 0x3fcb5b844fb4b3ef
> +        .quad 0x3fcb2602497d5346
> +        .quad 0x3fcaf08fbfe15c51
> +        .quad 0x3fcabb2ca9ec7472
> +        .quad 0x3fca85d8feb202f7
> +        .quad 0x3fca5094b54d2828
> +        .quad 0x3fca1b5fc4e0b465
> +        .quad 0x3fc9e63a24971f46
> +        .quad 0x3fc9b123cba27ed3
> +        .quad 0x3fc97c1cb13c7ec1
> +        .quad 0x3fc94724cca657be
> +        .quad 0x3fc9123c1528c6ce
> +        .quad 0x3fc8dd62821404a9
> +        .quad 0x3fc8a8980abfbd32
> +        .quad 0x3fc873dca68b06f4
> +        .quad 0x3fc83f304cdc5aa7
> +        .quad 0x3fc80a92f5218acc
> +        .quad 0x3fc7d60496cfbb4c
> +        .quad 0x3fc7a18529635926
> +        .quad 0x3fc76d14a4601225
> +        .quad 0x3fc738b2ff50ccad
> +        .quad 0x3fc7046031c79f85
> +        .quad 0x3fc6d01c335dc9b5
> +        .quad 0x3fc69be6fbb3aa6f
> +        .quad 0x3fc667c08270b905
> +        .quad 0x3fc633a8bf437ce1
> +        .quad 0x3fc5ff9fa9e18595
> +        .quad 0x3fc5cba53a0762ed
> +        .quad 0x3fc597b967789d12
> +        .quad 0x3fc563dc29ffacb2
> +        .quad 0x3fc5300d796df33a
> +        .quad 0x3fc4fc4d4d9bb313
> +        .quad 0x3fc4c89b9e6807f5
> +        .quad 0x3fc494f863b8df35
> +        .quad 0x3fc46163957af02e
> +        .quad 0x3fc42ddd2ba1b4a9
> +        .quad 0x3fc3fa651e276158
> +        .quad 0x3fc3c6fb650cde51
> +        .quad 0x3fc3939ff859bf9f
> +        .quad 0x3fc36052d01c3dd7
> +        .quad 0x3fc32d13e4692eb7
> +        .quad 0x3fc2f9e32d5bfdd1
> +        .quad 0x3fc2c6c0a316a540
> +        .quad 0x3fc293ac3dc1a668
> +        .quad 0x3fc260a5f58c02bd
> +        .quad 0x3fc22dadc2ab3497
> +        .quad 0x3fc1fac39d5b280c
> +        .quad 0x3fc1c7e77dde33dc
> +        .quad 0x3fc195195c7d125b
> +        .quad 0x3fc162593186da70
> +        .quad 0x3fc12fa6f550f896
> +        .quad 0x3fc0fd02a03727ea
> +        .quad 0x3fc0ca6c2a9b6b41
> +        .quad 0x3fc097e38ce60649
> +        .quad 0x3fc06568bf8576b3
> +        .quad 0x3fc032fbbaee6d65
> +        .quad 0x3fc0009c779bc7b5
> +        .quad 0x3fbf9c95dc1d1165
> +        .quad 0x3fbf380e2d9ba4df
> +        .quad 0x3fbed3a1d4cdbebb
> +        .quad 0x3fbe6f50c2d9f754
> +        .quad 0x3fbe0b1ae8f2fd56
> +        .quad 0x3fbda700385788a2
> +        .quad 0x3fbd4300a2524d41
> +        .quad 0x3fbcdf1c1839ee74
> +        .quad 0x3fbc7b528b70f1c5
> +        .quad 0x3fbc17a3ed65b23c
> +        .quad 0x3fbbb4102f925394
> +        .quad 0x3fbb5097437cb58e
> +        .quad 0x3fbaed391ab6674e
> +        .quad 0x3fba89f5a6dc9acc
> +        .quad 0x3fba26ccd9981853
> +        .quad 0x3fb9c3bea49d3214
> +        .quad 0x3fb960caf9abb7ca
> +        .quad 0x3fb8fdf1ca8eea6a
> +        .quad 0x3fb89b33091d6fe8
> +        .quad 0x3fb8388ea739470a
> +        .quad 0x3fb7d60496cfbb4c
> +        .quad 0x3fb77394c9d958d5
> +        .quad 0x3fb7113f3259e07a
> +        .quad 0x3fb6af03c2603bd0
> +        .quad 0x3fb64ce26c067157
> +        .quad 0x3fb5eadb217198a3
> +        .quad 0x3fb588edd4d1ceaa
> +        .quad 0x3fb5271a78622a0f
> +        .quad 0x3fb4c560fe68af88
> +        .quad 0x3fb463c15936464e
> +        .quad 0x3fb4023b7b26ac9e
> +        .quad 0x3fb3a0cf56a06c4b
> +        .quad 0x3fb33f7cde14cf5a
> +        .quad 0x3fb2de4403ffd4b3
> +        .quad 0x3fb27d24bae824db
> +        .quad 0x3fb21c1ef55f06c2
> +        .quad 0x3fb1bb32a600549d
> +        .quad 0x3fb15a5fbf7270ce
> +        .quad 0x3fb0f9a634663add
> +        .quad 0x3fb09905f797047c
> +        .quad 0x3fb0387efbca869e
> +        .quad 0x3fafb02267a1ad2d
> +        .quad 0x3faeef792508b69d
> +        .quad 0x3fae2f02159384fe
> +        .quad 0x3fad6ebd1f1febfe
> +        .quad 0x3facaeaa27a02241
> +        .quad 0x3fabeec9151aac2e
> +        .quad 0x3fab2f19cdaa46dc
> +        .quad 0x3faa6f9c377dd31b
> +        .quad 0x3fa9b05038d84095
> +        .quad 0x3fa8f135b8107912
> +        .quad 0x3fa8324c9b914bc7
> +        .quad 0x3fa77394c9d958d5
> +        .quad 0x3fa6b50e297afcce
> +        .quad 0x3fa5f6b8a11c3c61
> +        .quad 0x3fa538941776b01e
> +        .quad 0x3fa47aa07357704f
> +        .quad 0x3fa3bcdd9b9f00f3
> +        .quad 0x3fa2ff4b77413dcb
> +        .quad 0x3fa241e9ed454683
> +        .quad 0x3fa184b8e4c56af8
> +        .quad 0x3fa0c7b844ef1795
> +        .quad 0x3fa00ae7f502c1c4
> +        .quad 0x3f9e9c8fb8a7a900
> +        .quad 0x3f9d23afc49139f9
> +        .quad 0x3f9bab2fdcb46ec7
> +        .quad 0x3f9a330fd028f75f
> +        .quad 0x3f98bb4f6e2bd536
> +        .quad 0x3f9743ee861f3556
> +        .quad 0x3f95ccece78a4a9e
> +        .quad 0x3f94564a62192834
> +        .quad 0x3f92e006c59c9c29
> +        .quad 0x3f916a21e20a0a45
> +        .quad 0x3f8fe9370ef68e1b
> +        .quad 0x3f8cfee70c5ce5dc
> +        .quad 0x3f8a15535d0bab34
> +        .quad 0x3f872c7ba20f7327
> +        .quad 0x3f84445f7cbc8fd2
> +        .quad 0x3f815cfe8eaec830
> +        .quad 0x3f7cecb0f3922091
> +        .quad 0x3f7720d9c06a835f
> +        .quad 0x3f715676c8c7a8c1
> +        .quad 0x3f671b0ea42e5fda
> +        .quad 0x3f57182a894b69c6
> +        .quad 0x8000000000000000
> +        /*== poly_coeff[5] ==*/
> +        .align 16
> +        .quad 0x3fd2776E996DA1D2, 0x3fd2776E996DA1D2 /* coeff5 */
> +        .quad 0xbfd715494C3E7C9B, 0xbfd715494C3E7C9B /* coeff4 */
> +        .quad 0x3fdEC709DC39E926, 0x3fdEC709DC39E926 /* coeff3 */
> +        .quad 0xbfe71547652B7CF8, 0xbfe71547652B7CF8 /* coeff2 */
> +        .quad 0x3ff71547652B82FE, 0x3ff71547652B82FE /* coeff1 */
> +        /*== ExpMask ==*/
> +        .align 16
> +        .quad 0x000fffffffffffff, 0x000fffffffffffff
> +        /*== Two10 ==*/
> +        .align 16
> +        .quad 0x3f50000000000000, 0x3f50000000000000
> +        /*== MinNorm ==*/
> +        .align 16
> +        .quad 0x0010000000000000, 0x0010000000000000
> +        /*== MaxNorm ==*/
> +        .align 16
> +        .quad 0x7fefffffffffffff, 0x7fefffffffffffff
> +        /*== HalfMask ==*/
> +        .align 16
> +        .quad 0xfffffffffc000000, 0xfffffffffc000000
> +        /*== One ==*/
> +        .align 16
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== Threshold ==*/
> +        .align 16
> +        .quad 0x4086a00000000000, 0x4086a00000000000
> +        /*== Bias ==*/
> +        .align 16
> +        .quad 0x408ff80000000000, 0x408ff80000000000
> +        /*== Bias1 ==*/
> +        .align 16
> +        .quad 0x408ff00000000000, 0x408ff00000000000
> +        .align 16
> +        .type	__svml_dlog2_data_internal,@object
> +        .size	__svml_dlog2_data_internal,.-__svml_dlog2_data_internal
> +        .space 80, 0x00
> +        .align 16
> +
> +.FLT_11:
> +        .long	0x00000000,0x43380000,0x00000000,0x43380000
> +        .type	.FLT_11,@object
> +        .size	.FLT_11,16
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log24_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log24_core-sse.S
> new file mode 100644
> index 0000000000..882ee276f2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log24_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized log2, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN4v_log2 _ZGVdN4v_log2_sse_wrapper
> +#include "../svml_d_log24_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log24_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log24_core.c
> new file mode 100644
> index 0000000000..7678090d11
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log24_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized log2, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN4v_log2
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN4v_log2, __GI__ZGVdN4v_log2, __redirect__ZGVdN4v_log2)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log24_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log24_core_avx2.S
> new file mode 100644
> index 0000000000..b4ead42eae
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log24_core_avx2.S
> @@ -0,0 +1,1324 @@
> +/* Function log2 vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
> + *    R = Rcp*x - 1.0
> + *    log2(x) = k - log2(Rcp) + poly_approximation(R)
> + *       log2(Rcp) is tabulated
> + *
> + *
> + */
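[Editor note, not part of the patch: a rough scalar C sketch of the reduction
described in the comment above, to make the vector code below easier to
follow.  The ~10-bit reciprocal, the helper name, and the use of libm's
log2() in place of the Log_LA_table lookup are simplifications of my own;
the hex coefficients are the coeff1..coeff5 values listed in the data table
further down, evaluated here in plain Horner form, which appears to be the
same degree-5 polynomial the vector code computes.]

#include <math.h>

static double
log2_sketch (double x)
{
  int k;
  double m = frexp (x, &k);                      /* x = m * 2^k, m in [0.5, 1) */
  double rcp = nearbyint (1024.0 / m) / 1024.0;  /* short (~10-bit) reciprocal */
  double r = rcp * m - 1.0;                      /* argument reduction, |r| small */
  /* poly_approximation(R): Horner form using the coeff1..coeff5 constants.  */
  double poly = r * (0x1.71547652b82fep+0
                     + r * (-0x1.71547652b7cf8p-1
                            + r * (0x1.ec709dc39e926p-2
                                   + r * (-0x1.715494c3e7c9bp-2
                                          + r * 0x1.2776e996da1d2p-2))));
  /* log2(x) = k - log2(Rcp) + poly(R); the real code reads the
     -log2(Rcp) term from Log_LA_table instead of calling log2().  */
  return k - log2 (rcp) + poly;
}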
> +
> +/* Offsets for data table __svml_dlog2_data_internal
> + */
> +#define Log_HA_table                  	0
> +#define Log_LA_table                  	8224
> +#define poly_coeff                    	12352
> +#define ExpMask                       	12512
> +#define Two10                         	12544
> +#define MinNorm                       	12576
> +#define MaxNorm                       	12608
> +#define HalfMask                      	12640
> +#define One                           	12672
> +#define Threshold                     	12704
> +#define Bias                          	12736
> +#define Bias1                         	12768
> +
> +/* Lookup bias for data table __svml_dlog2_data_internal.  */
> +#define Table_Lookup_Bias               -0x405fe0
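[Editor note, not part of the patch: my reading of where -0x405fe0 comes
from.  The rounded reciprocal is an integer roughly in the 512..1024 range,
so after the "vpsrlq $40" below, its smallest value 512.0 (bit pattern
0x4080000000000000) contributes a fixed 0x408000; subtracting that from the
Log_LA_table offset 8224 (0x2020) gives exactly -0x405fe0, so
base + (bits >> 40) appears to index the LA table directly, advancing
8 bytes per reciprocal step.  A small self-contained check of that
arithmetic, with names of my own:]

#include <assert.h>
#include <stdint.h>
#include <string.h>

int
main (void)
{
  double rcp = 512.0;                      /* smallest rounded reciprocal */
  uint64_t bits;
  memcpy (&bits, &rcp, sizeof (bits));     /* 0x4080000000000000 */
  int64_t fixed = (int64_t) (bits >> 40);  /* the part vpsrlq $40 keeps */
  assert (fixed == 0x408000);
  assert (8224 - fixed == -0x405fe0);      /* == Table_Lookup_Bias */
  return 0;
}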
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN4v_log2_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +        lea       Table_Lookup_Bias+__svml_dlog2_data_internal(%rip), %r8
> +        vmovapd   %ymm0, %ymm3
> +
> +/* preserve mantissa, set input exponent to 2^(-10) */
> +        vandpd    ExpMask+__svml_dlog2_data_internal(%rip), %ymm3, %ymm4
> +        vorpd     Two10+__svml_dlog2_data_internal(%rip), %ymm4, %ymm2
> +
> +/* reciprocal approximation good to at least 11 bits */
> +        vcvtpd2ps %ymm2, %xmm5
> +
> +/* exponent bits */
> +        vpsrlq    $20, %ymm3, %ymm7
> +        vmovupd   One+__svml_dlog2_data_internal(%rip), %ymm14
> +        vrcpps    %xmm5, %xmm6
> +
> +/* check range */
> +        vcmplt_oqpd MinNorm+__svml_dlog2_data_internal(%rip), %ymm3, %ymm11
> +        vcmpnle_uqpd MaxNorm+__svml_dlog2_data_internal(%rip), %ymm3, %ymm12
> +        vcvtps2pd %xmm6, %ymm9
> +
> +/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
> +        vroundpd  $0, %ymm9, %ymm1
> +
> +/* exponent */
> +        vmovupd   Threshold+__svml_dlog2_data_internal(%rip), %ymm9
> +
> +/*
> + * prepare table index
> + * table lookup
> + */
> +        vpsrlq    $40, %ymm1, %ymm15
> +
> +/* argument reduction */
> +        vfmsub213pd %ymm14, %ymm1, %ymm2
> +
> +/* polynomial */
> +        vmovupd   poly_coeff+__svml_dlog2_data_internal(%rip), %ymm14
> +        vcmplt_oqpd %ymm1, %ymm9, %ymm1
> +        vfmadd213pd poly_coeff+32+__svml_dlog2_data_internal(%rip), %ymm2, %ymm14
> +        vorpd     %ymm12, %ymm11, %ymm13
> +        vmulpd    %ymm2, %ymm2, %ymm12
> +
> +/* combine and get argument value range mask */
> +        vmovmskpd %ymm13, %eax
> +        vextractf128 $1, %ymm7, %xmm8
> +        vshufps   $221, %xmm8, %xmm7, %xmm10
> +
> +/* biased exponent in DP format */
> +        vcvtdq2pd %xmm10, %ymm0
> +        vandpd    Bias+__svml_dlog2_data_internal(%rip), %ymm1, %ymm10
> +        vorpd     Bias1+__svml_dlog2_data_internal(%rip), %ymm10, %ymm11
> +        vsubpd    %ymm11, %ymm0, %ymm1
> +        vmovupd   poly_coeff+64+__svml_dlog2_data_internal(%rip), %ymm0
> +        vfmadd213pd poly_coeff+96+__svml_dlog2_data_internal(%rip), %ymm2, %ymm0
> +        vmulpd    poly_coeff+128+__svml_dlog2_data_internal(%rip), %ymm2, %ymm2
> +        vfmadd213pd %ymm0, %ymm12, %ymm14
> +        vfmadd213pd %ymm2, %ymm12, %ymm14
> +        vextractf128 $1, %ymm15, %xmm6
> +        vmovd     %xmm15, %edx
> +        vmovd     %xmm6, %esi
> +        movslq    %edx, %rdx
> +        vpextrd   $2, %xmm15, %ecx
> +        movslq    %esi, %rsi
> +        vpextrd   $2, %xmm6, %edi
> +        movslq    %ecx, %rcx
> +        movslq    %edi, %rdi
> +        vmovsd    (%r8,%rdx), %xmm4
> +        vmovsd    (%r8,%rsi), %xmm7
> +        vmovhpd   (%r8,%rcx), %xmm4, %xmm5
> +        vmovhpd   (%r8,%rdi), %xmm7, %xmm8
> +        vinsertf128 $1, %xmm8, %ymm5, %ymm13
> +
> +/* reconstruction */
> +        vaddpd    %ymm14, %ymm13, %ymm0
> +        vaddpd    %ymm0, %ymm1, %ymm0
> +        testl     %eax, %eax
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm3
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovupd   %ymm3, 32(%rsp)
> +        vmovupd   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0
> +
> +        xorl      %edx, %edx
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovupd   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      log2@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 64(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN4v_log2_avx2)
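[Editor note, not part of the patch: a rough C model of what the
SPECIAL_VALUES_* blocks above do, with hypothetical names of my own.  The
four inputs and results are spilled to the stack, and each lane flagged in
the vmovmskpd range mask is recomputed with the scalar log2 and written
back over the vector result.]

#include <math.h>

#define VLEN 4

static void
fixup_special_lanes (const double x[VLEN], double res[VLEN], int range_mask)
{
  for (int lane = 0; lane < VLEN; lane++)  /* mirrors SPECIAL_VALUES_LOOP */
    if (range_mask & (1 << lane))          /* mirrors "btl %r12d, %r13d" */
      res[lane] = log2 (x[lane]);          /* mirrors "call log2@PLT" */
}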
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_dlog2_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(32)) VUINT32 Log_HA_table[(1<<10)+2][2];
> +        __declspec(align(32)) VUINT32 Log_LA_table[(1<<9)+1][2];
> +        __declspec(align(32)) VUINT32 poly_coeff[5][4][2];
> +        __declspec(align(32)) VUINT32 ExpMask[4][2];
> +        __declspec(align(32)) VUINT32 Two10[4][2];
> +        __declspec(align(32)) VUINT32 MinNorm[4][2];
> +        __declspec(align(32)) VUINT32 MaxNorm[4][2];
> +        __declspec(align(32)) VUINT32 HalfMask[4][2];
> +        __declspec(align(32)) VUINT32 One[4][2];
> +        __declspec(align(32)) VUINT32 Threshold[4][2];
> +        __declspec(align(32)) VUINT32 Bias[4][2];
> +        __declspec(align(32)) VUINT32 Bias1[4][2];
> +} __svml_dlog2_data_internal;
> +#endif
> +__svml_dlog2_data_internal:
> +        /* Log_HA_table */
> +        .quad 0xc08ff00000000000, 0x0000000000000000
> +        .quad 0xc08ff0040038c920, 0x3d52bfc81744e999
> +        .quad 0xc08ff007ff0f0190, 0xbd59b2cedc63c895
> +        .quad 0xc08ff00bfc839e88, 0xbd28e365e6741d71
> +        .quad 0xc08ff00ff8979428, 0x3d4027998f69a77d
> +        .quad 0xc08ff013f34bd5a0, 0x3d5dd2cb33fe6a89
> +        .quad 0xc08ff017eca15518, 0xbd526514cdf2c019
> +        .quad 0xc08ff01be49903d8, 0xbd44bfeeba165e04
> +        .quad 0xc08ff01fdb33d218, 0xbd3fa79ee110cec3
> +        .quad 0xc08ff023d072af20, 0xbd4eebb642c7fd60
> +        .quad 0xc08ff027c4568948, 0x3d429b13d7093443
> +        .quad 0xc08ff02bb6e04de8, 0x3d50f346bd36551e
> +        .quad 0xc08ff02fa810e968, 0xbd5020bb662f1536
> +        .quad 0xc08ff03397e94750, 0x3d5de76b56340995
> +        .quad 0xc08ff037866a5218, 0x3d58065ff3304090
> +        .quad 0xc08ff03b7394f360, 0x3d561fc9322fb785
> +        .quad 0xc08ff03f5f6a13d0, 0x3d0abecd17d0d778
> +        .quad 0xc08ff04349ea9b28, 0xbd588f3ad0ce4d44
> +        .quad 0xc08ff04733177040, 0xbd4454ba4ac5f44d
> +        .quad 0xc08ff04b1af178f8, 0xbd556f78faaa0887
> +        .quad 0xc08ff04f01799a58, 0x3d49db8976de7469
> +        .quad 0xc08ff052e6b0b868, 0xbd5cdb6fce17ef00
> +        .quad 0xc08ff056ca97b668, 0xbd576de8c0412f09
> +        .quad 0xc08ff05aad2f76a0, 0x3d30142c7ec6475c
> +        .quad 0xc08ff05e8e78da70, 0xbd1e685afc26de72
> +        .quad 0xc08ff0626e74c260, 0xbd40b64c954078a3
> +        .quad 0xc08ff0664d240e10, 0xbd5fcde393462d7d
> +        .quad 0xc08ff06a2a879c48, 0xbd537245eeeecc53
> +        .quad 0xc08ff06e06a04ae8, 0x3d4ac306eb47b436
> +        .quad 0xc08ff071e16ef6e8, 0xbd5a1fd9d3758f6b
> +        .quad 0xc08ff075baf47c80, 0x3d2401fbaaa67e3c
> +        .quad 0xc08ff0799331b6f0, 0x3d4f8dbef47a4d53
> +        .quad 0xc08ff07d6a2780a8, 0x3d51215e0abb42d1
> +        .quad 0xc08ff0813fd6b340, 0x3d57ce6249eddb35
> +        .quad 0xc08ff08514402770, 0xbd38a803c7083a25
> +        .quad 0xc08ff088e764b528, 0x3d42218beba5073e
> +        .quad 0xc08ff08cb9453370, 0x3d447b66f1c6248f
> +        .quad 0xc08ff09089e27880, 0xbd53d9297847e995
> +        .quad 0xc08ff094593d59c8, 0xbd12b6979cc77aa9
> +        .quad 0xc08ff0982756abd0, 0xbd55308545ecd702
> +        .quad 0xc08ff09bf42f4260, 0xbd578fa97c3b936f
> +        .quad 0xc08ff09fbfc7f068, 0xbd41828408ce869d
> +        .quad 0xc08ff0a38a218808, 0x3d555da6ce7251a6
> +        .quad 0xc08ff0a7533cda88, 0xbd41f3cd14bfcb02
> +        .quad 0xc08ff0ab1b1ab878, 0xbd1f028da6bf1852
> +        .quad 0xc08ff0aee1bbf188, 0xbd4cf04de3267f54
> +        .quad 0xc08ff0b2a72154a8, 0xbd4556e47019db10
> +        .quad 0xc08ff0b66b4baff8, 0x3d1e7ba00b15fbe4
> +        .quad 0xc08ff0ba2e3bd0d0, 0x3d5bfde1c52c2f28
> +        .quad 0xc08ff0bdeff283b8, 0x3d48d63fe20ee5d6
> +        .quad 0xc08ff0c1b0709480, 0x3d57f551980838ff
> +        .quad 0xc08ff0c56fb6ce20, 0xbd4189091f293c81
> +        .quad 0xc08ff0c92dc5fae0, 0x3d4d549f05f06169
> +        .quad 0xc08ff0ccea9ee428, 0xbd5982466074e1e3
> +        .quad 0xc08ff0d0a64252b8, 0xbd5d30a6b16c0e4b
> +        .quad 0xc08ff0d460b10e80, 0xbd3138bf3b51a201
> +        .quad 0xc08ff0d819ebdea8, 0xbd454e680c0801d6
> +        .quad 0xc08ff0dbd1f389a8, 0x3d584db361385926
> +        .quad 0xc08ff0df88c8d520, 0xbd564f2252a82c03
> +        .quad 0xc08ff0e33e6c8610, 0xbd5c78c35ed5d034
> +        .quad 0xc08ff0e6f2df60a8, 0xbd52eb9f29ca3d75
> +        .quad 0xc08ff0eaa6222860, 0x3d5340c0c01b5ff8
> +        .quad 0xc08ff0ee58359fe8, 0x3d10c2acaffa64b6
> +        .quad 0xc08ff0f2091a8948, 0xbd3fced311301ebe
> +        .quad 0xc08ff0f5b8d1a5c8, 0x3d41ee5d591af30b
> +        .quad 0xc08ff0f9675bb5f0, 0x3d4873546b0e668c
> +        .quad 0xc08ff0fd14b97998, 0x3d5a99928177a119
> +        .quad 0xc08ff100c0ebafd8, 0x3d378ead132adcac
> +        .quad 0xc08ff1046bf31720, 0x3d51a538bc597d48
> +        .quad 0xc08ff10815d06d18, 0xbd540ee2f35efd7e
> +        .quad 0xc08ff10bbe846ec8, 0xbd59cf94753adacc
> +        .quad 0xc08ff10f660fd878, 0xbd5201a3d6862895
> +        .quad 0xc08ff1130c7365c0, 0x3d383e25d0822d03
> +        .quad 0xc08ff116b1afd180, 0xbd0b7389bbea8f7b
> +        .quad 0xc08ff11a55c5d5f0, 0xbd4df278087a6617
> +        .quad 0xc08ff11df8b62c98, 0xbd48daeb8ec01e26
> +        .quad 0xc08ff1219a818e50, 0x3d57c9312e0a14da
> +        .quad 0xc08ff1253b28b330, 0xbd5f0fbc0e4d507e
> +        .quad 0xc08ff128daac52c8, 0xbd222afdee008687
> +        .quad 0xc08ff12c790d23d8, 0x3d17c71747bcef8b
> +        .quad 0xc08ff130164bdc88, 0x3d5d69cfd051af50
> +        .quad 0xc08ff133b2693248, 0x3d59dff064e9433a
> +        .quad 0xc08ff1374d65d9e8, 0x3d4f71a30db3240b
> +        .quad 0xc08ff13ae7428788, 0xbd5e56afa9524606
> +        .quad 0xc08ff13e7fffeeb0, 0xbd44acd84e6f8518
> +        .quad 0xc08ff142179ec228, 0xbd519845ade5e121
> +        .quad 0xc08ff145ae1fb420, 0xbd5b3b4a38ddec70
> +        .quad 0xc08ff14943837620, 0xbd5ea4bb5bc137c7
> +        .quad 0xc08ff14cd7cab910, 0x3d5610f3bf8eb6ce
> +        .quad 0xc08ff1506af62d20, 0x3d57b1170d6184cf
> +        .quad 0xc08ff153fd0681f0, 0x3d5791a688a3660e
> +        .quad 0xc08ff1578dfc6678, 0x3d5d41ecf8abac2e
> +        .quad 0xc08ff15b1dd88908, 0x3cf0bd995d64d573
> +        .quad 0xc08ff15eac9b9758, 0xbd5e3653cd796d01
> +        .quad 0xc08ff1623a463e80, 0xbd597573005ef2d8
> +        .quad 0xc08ff165c6d92af0, 0xbd4ee222d6439c41
> +        .quad 0xc08ff16952550880, 0x3d5913b845e75950
> +        .quad 0xc08ff16cdcba8258, 0xbd558e7ba239077e
> +        .quad 0xc08ff170660a4328, 0x3d5a0e174a2cae66
> +        .quad 0xc08ff173ee44f4d8, 0x3d22b8db103db712
> +        .quad 0xc08ff177756b40d8, 0x3d5cc610480853c4
> +        .quad 0xc08ff17afb7dcfe0, 0xbd304a8bc84e5c0f
> +        .quad 0xc08ff17e807d4a28, 0x3d3639d185da5f7d
> +        .quad 0xc08ff182046a5738, 0xbd534705d06d788f
> +        .quad 0xc08ff18587459e10, 0xbd540d25b28a51fd
> +        .quad 0xc08ff189090fc510, 0xbd02d804afa7080a
> +        .quad 0xc08ff18c89c97200, 0x3d5f2a5d305818ba
> +        .quad 0xc08ff19009734a08, 0xbd3a602e9d05c3e4
> +        .quad 0xc08ff193880df1d0, 0xbd533d6fdcd54875
> +        .quad 0xc08ff197059a0d60, 0x3d24eaf0a9490202
> +        .quad 0xc08ff19a82184020, 0xbd5685666d98eb59
> +        .quad 0xc08ff19dfd892cf8, 0xbd509f8745f0868b
> +        .quad 0xc08ff1a177ed7630, 0xbd2dcba340a9d268
> +        .quad 0xc08ff1a4f145bd80, 0x3d4916fcd0331266
> +        .quad 0xc08ff1a86992a408, 0xbd548cd033a49073
> +        .quad 0xc08ff1abe0d4ca68, 0xbd5252f40e5df1a2
> +        .quad 0xc08ff1af570cd0a0, 0xbd541d623bd02248
> +        .quad 0xc08ff1b2cc3b5628, 0xbd258dc48235c071
> +        .quad 0xc08ff1b64060f9e0, 0xbd4b4bd8f02ed3f2
> +        .quad 0xc08ff1b9b37e5a28, 0x3d4e8d20a88cd0a2
> +        .quad 0xc08ff1bd259414c0, 0x3d3b669b6380bc55
> +        .quad 0xc08ff1c096a2c6e8, 0xbd45d54159d51094
> +        .quad 0xc08ff1c406ab0d58, 0x3d59f684ffbca44d
> +        .quad 0xc08ff1c775ad8428, 0x3d543b1b1d508399
> +        .quad 0xc08ff1cae3aac6f8, 0x3d5c30953a12fc6e
> +        .quad 0xc08ff1ce50a370d0, 0xbd1763b04f9aad5f
> +        .quad 0xc08ff1d1bc981c40, 0x3d573c6fa54f46c2
> +        .quad 0xc08ff1d527896338, 0x3d48ccfb9ffd7455
> +        .quad 0xc08ff1d89177df30, 0x3d42756f80d6f7ce
> +        .quad 0xc08ff1dbfa642910, 0xbd3c2bfbc353c5a5
> +        .quad 0xc08ff1df624ed940, 0x3d1d6064f5dc380b
> +        .quad 0xc08ff1e2c9388798, 0x3ce327c6b30711cf
> +        .quad 0xc08ff1e62f21cb70, 0x3d140aa9546525bc
> +        .quad 0xc08ff1e9940b3b98, 0xbd15c1ff43c21863
> +        .quad 0xc08ff1ecf7f56e60, 0x3d590ba680120498
> +        .quad 0xc08ff1f05ae0f988, 0x3d5390c6b62dff50
> +        .quad 0xc08ff1f3bcce7258, 0x3d4da0c90878457f
> +        .quad 0xc08ff1f71dbe6d90, 0x3d30697edc85b98c
> +        .quad 0xc08ff1fa7db17f70, 0x3d04d81188510a79
> +        .quad 0xc08ff1fddca83bb0, 0xbd5f2ddc983ce25c
> +        .quad 0xc08ff2013aa33598, 0x3d46c22f0fae6844
> +        .quad 0xc08ff20497a2ffd0, 0xbd53359b714c3d03
> +        .quad 0xc08ff207f3a82ca0, 0xbd4aefaa5524f88b
> +        .quad 0xc08ff20b4eb34dc0, 0x3d39bf4a4a73d01d
> +        .quad 0xc08ff20ea8c4f468, 0x3d44217befdb12e6
> +        .quad 0xc08ff21201ddb158, 0x3d5219b281d4b6f8
> +        .quad 0xc08ff21559fe14c8, 0xbd5e3b123373d370
> +        .quad 0xc08ff218b126ae88, 0xbd59b525a6edc3cb
> +        .quad 0xc08ff21c07580dd8, 0xbd4b494e7737c4dc
> +        .quad 0xc08ff21f5c92c180, 0xbd3989b7d67e3e54
> +        .quad 0xc08ff222b0d757d0, 0x3d486c8f098ad3cf
> +        .quad 0xc08ff22604265e98, 0x3d5254956d8e15b2
> +        .quad 0xc08ff22956806330, 0x3d3f14730a362959
> +        .quad 0xc08ff22ca7e5f278, 0xbd40e8ed02e32ea1
> +        .quad 0xc08ff22ff85798d8, 0xbd40fb2b9b1e0261
> +        .quad 0xc08ff23347d5e238, 0xbd5bfeb1e13c8bc3
> +        .quad 0xc08ff23696615a18, 0x3d5b891f041e037b
> +        .quad 0xc08ff239e3fa8b60, 0xbd36255027582bb9
> +        .quad 0xc08ff23d30a200a8, 0x3d56bb5a92a55361
> +        .quad 0xc08ff2407c5843f0, 0xbd31902fb4417244
> +        .quad 0xc08ff243c71dded8, 0xbd5a8a7c3c4a2cc6
> +        .quad 0xc08ff24710f35a88, 0xbd23be1be6941016
> +        .quad 0xc08ff24a59d93fa8, 0x3d55c85afafa1d46
> +        .quad 0xc08ff24da1d01668, 0xbd5b4b05a0adcbf1
> +        .quad 0xc08ff250e8d866a0, 0x3d134d191476f74b
> +        .quad 0xc08ff2542ef2b798, 0x3d5e78ce963395e1
> +        .quad 0xc08ff257741f9028, 0x3d3f9219a8f57c17
> +        .quad 0xc08ff25ab85f76c8, 0x3d5cfc6f47ac691b
> +        .quad 0xc08ff25dfbb2f168, 0x3d4ab3b720b5ca71
> +        .quad 0xc08ff2613e1a8598, 0x3d54a4ab99feb71a
> +        .quad 0xc08ff2647f96b868, 0xbd42daa69d79d724
> +        .quad 0xc08ff267c0280e88, 0xbd344d9115018f45
> +        .quad 0xc08ff26affcf0c28, 0xbd56673e143d2ac0
> +        .quad 0xc08ff26e3e8c3518, 0x3d3aac889e91c638
> +        .quad 0xc08ff2717c600ca8, 0x3d4cf65b41d006e7
> +        .quad 0xc08ff274b94b15c0, 0xbd4c821320391e76
> +        .quad 0xc08ff277f54dd2e8, 0x3d51abd6e2ddc2a1
> +        .quad 0xc08ff27b3068c620, 0xbd2f1bdd1264e703
> +        .quad 0xc08ff27e6a9c7110, 0xbd58437b4f032f15
> +        .quad 0xc08ff281a3e954f0, 0xbd4f8e063b069a7d
> +        .quad 0xc08ff284dc4ff288, 0x3d5276d0723a662a
> +        .quad 0xc08ff28813d0ca28, 0xbd5731f7c6d8f6eb
> +        .quad 0xc08ff28b4a6c5bd0, 0xbd58b587f08307ec
> +        .quad 0xc08ff28e80232708, 0x3d57f19a7a352baf
> +        .quad 0xc08ff291b4f5aae0, 0x3d570d99aff32790
> +        .quad 0xc08ff294e8e46610, 0x3d4efafaad4f59db
> +        .quad 0xc08ff2981befd6e0, 0xbd41eb1728371564
> +        .quad 0xc08ff29b4e187b38, 0x3d458465b4e080d7
> +        .quad 0xc08ff29e7f5ed088, 0x3d46acb4a035a820
> +        .quad 0xc08ff2a1afc353e0, 0xbd39fc68238dd5d3
> +        .quad 0xc08ff2a4df4681f0, 0x3d526d90c6750dde
> +        .quad 0xc08ff2a80de8d6f0, 0x3d48505c598278fd
> +        .quad 0xc08ff2ab3baacec0, 0x3d520fece8e148e8
> +        .quad 0xc08ff2ae688ce4d0, 0x3d14f7bf38646243
> +        .quad 0xc08ff2b1948f9430, 0xbd5aa5f693a627df
> +        .quad 0xc08ff2b4bfb35790, 0xbd4725d8e6280861
> +        .quad 0xc08ff2b7e9f8a930, 0x3d482e0765d44bda
> +        .quad 0xc08ff2bb136002e8, 0xbd523d745da75cde
> +        .quad 0xc08ff2be3be9de40, 0xbd32e50b4191ef73
> +        .quad 0xc08ff2c16396b448, 0xbd490856dfe073b2
> +        .quad 0xc08ff2c48a66fdb8, 0xbd512b526137db4d
> +        .quad 0xc08ff2c7b05b32e8, 0x3d5bfcdc71b36585
> +        .quad 0xc08ff2cad573cbb8, 0xbd2c24f2afddb377
> +        .quad 0xc08ff2cdf9b13fc0, 0xbd5ea60d06da12f6
> +        .quad 0xc08ff2d11d140630, 0xbd582f2f9e256dc5
> +        .quad 0xc08ff2d43f9c95d0, 0xbd4411c269523864
> +        .quad 0xc08ff2d7614b6508, 0xbd41107eeb7e1093
> +        .quad 0xc08ff2da8220e9e8, 0x3d5a4aa491710eda
> +        .quad 0xc08ff2dda21d9a10, 0x3d46e50a14550378
> +        .quad 0xc08ff2e0c141ead0, 0xbd4881e3bd846de9
> +        .quad 0xc08ff2e3df8e5118, 0xbd46d93437bd399d
> +        .quad 0xc08ff2e6fd034170, 0xbd5b4ef1e9713a4c
> +        .quad 0xc08ff2ea19a13010, 0x3d4a0e31ed25b3ef
> +        .quad 0xc08ff2ed356890b8, 0xbd5a7a560db90113
> +        .quad 0xc08ff2f05059d6f0, 0x3d51f5bb5f9072c9
> +        .quad 0xc08ff2f36a7575c0, 0x3d5ed5225350a585
> +        .quad 0xc08ff2f683bbdfe0, 0xbd1c9363d9e745db
> +        .quad 0xc08ff2f99c2d87b8, 0x3d329c788e376e0d
> +        .quad 0xc08ff2fcb3cadf40, 0xbd59eb5d29918de0
> +        .quad 0xc08ff2ffca945828, 0xbd4a86aac097a06b
> +        .quad 0xc08ff302e08a63b8, 0x3d541c2c97e8b4d1
> +        .quad 0xc08ff305f5ad72d8, 0x3d43c95dec31821b
> +        .quad 0xc08ff30909fdf620, 0xbd590abed3d72738
> +        .quad 0xc08ff30c1d7c5dd8, 0x3d4caefdad90e913
> +        .quad 0xc08ff30f302919d0, 0xbd4f7ed5e1dcb170
> +        .quad 0xc08ff312420499a0, 0x3d3c590edf8c3407
> +        .quad 0xc08ff315530f4c70, 0x3d5477d46ce838e1
> +        .quad 0xc08ff3186349a118, 0x3d5e4b00c511fa78
> +        .quad 0xc08ff31b72b40610, 0xbd54333e5a0c1658
> +        .quad 0xc08ff31e814ee990, 0x3d25300b88bfa10a
> +        .quad 0xc08ff3218f1ab958, 0xbd5bfbd520249ed7
> +        .quad 0xc08ff3249c17e2f0, 0x3d436b1cdba645b7
> +        .quad 0xc08ff327a846d368, 0xbd5cb667c2f86eaa
> +        .quad 0xc08ff32ab3a7f7a0, 0x3d5334d06a920d5f
> +        .quad 0xc08ff32dbe3bbbf8, 0xbd5407602ab64243
> +        .quad 0xc08ff330c8028ca0, 0xbd52b12c9cc82316
> +        .quad 0xc08ff333d0fcd560, 0x3d158d7dd801324b
> +        .quad 0xc08ff336d92b01a8, 0xbd38b55deae69564
> +        .quad 0xc08ff339e08d7ca0, 0x3d4a92d51dc43d43
> +        .quad 0xc08ff33ce724b110, 0x3d5455afbb5de008
> +        .quad 0xc08ff33fecf10970, 0x3d3b65694b6f87fb
> +        .quad 0xc08ff342f1f2efe8, 0xbd3afb8ccc1260eb
> +        .quad 0xc08ff345f62ace50, 0x3d59c98f7ec71b79
> +        .quad 0xc08ff348f9990e18, 0xbd5238294ff3846d
> +        .quad 0xc08ff34bfc3e1880, 0x3d4deba7087bbf7b
> +        .quad 0xc08ff34efe1a5650, 0xbd573e25d2d308e5
> +        .quad 0xc08ff351ff2e3020, 0xbd44bc302ffa76fb
> +        .quad 0xc08ff354ff7a0e20, 0xbd2cad65891df000
> +        .quad 0xc08ff357fefe5838, 0x3d4b4fe326c05a8a
> +        .quad 0xc08ff35afdbb75f8, 0x3d0fb5680f67649b
> +        .quad 0xc08ff35dfbb1cea8, 0xbd4af509a9977e57
> +        .quad 0xc08ff360f8e1c940, 0x3cea69221cfb0ad6
> +        .quad 0xc08ff363f54bcc60, 0x3d3d116c159fead5
> +        .quad 0xc08ff366f0f03e58, 0xbd5e64e8bff70d5e
> +        .quad 0xc08ff369ebcf8538, 0xbd5cc32ce5effb96
> +        .quad 0xc08ff36ce5ea06b8, 0x3d57bbe811e4fbda
> +        .quad 0xc08ff36fdf402830, 0xbcf46d4595033678
> +        .quad 0xc08ff372d7d24ec8, 0x3d4c4bbec857b9fc
> +        .quad 0xc08ff375cfa0df40, 0xbd59d3f339613a2d
> +        .quad 0xc08ff378c6ac3e28, 0x3d58408e1bcb4e24
> +        .quad 0xc08ff37bbcf4cfa0, 0x3d5fdb793dc8e643
> +        .quad 0xc08ff37eb27af788, 0xbd5f0d884b401f1e
> +        .quad 0xc08ff381a73f1988, 0xbd5a7ed37e2c50b4
> +        .quad 0xc08ff3849b4198e8, 0x3d5b14c1f630b2af
> +        .quad 0xc08ff3878e82d898, 0x3d505a9abef02aff
> +        .quad 0xc08ff38a81033b50, 0xbd4a9bbd51a7d1c4
> +        .quad 0xc08ff38d72c32380, 0x3d4783623464f80e
> +        .quad 0xc08ff39063c2f338, 0xbd0e2d78f68abcc7
> +        .quad 0xc08ff39354030c50, 0x3d3e604763e782cb
> +        .quad 0xc08ff3964383d048, 0xbd4514f0840b6f59
> +        .quad 0xc08ff3993245a060, 0xbd5488753d6035a4
> +        .quad 0xc08ff39c2048dd90, 0x3d5ccc099b5ff97d
> +        .quad 0xc08ff39f0d8de870, 0x3d454ada83325c69
> +        .quad 0xc08ff3a1fa152168, 0x3d1e4b27fb754eb1
> +        .quad 0xc08ff3a4e5dee890, 0x3d58c67819ead583
> +        .quad 0xc08ff3a7d0eb9da8, 0xbd536d02e85d644b
> +        .quad 0xc08ff3aabb3ba048, 0x3d5f510ab9e7c184
> +        .quad 0xc08ff3ada4cf4f98, 0x3d557bc5b296d5f5
> +        .quad 0xc08ff3b08da70a90, 0xbd48893b8f7f52c9
> +        .quad 0xc08ff3b375c32fe8, 0x3d5ca0b69a37d601
> +        .quad 0xc08ff3b65d241df0, 0xbd519c57fff86872
> +        .quad 0xc08ff3b943ca32d8, 0x3d048da0e3a8c3c3
> +        .quad 0xc08ff3bc29b5cc68, 0xbd5dd05e06ec07d0
> +        .quad 0xc08ff3bf0ee74840, 0x3d56c52a5c8015db
> +        .quad 0xc08ff3c1f35f0398, 0x3d54e1dba9930bed
> +        .quad 0xc08ff3c4d71d5b78, 0x3d2c5f679a7932b7
> +        .quad 0xc08ff3c7ba22aca0, 0xbd3f77628aa1aed8
> +        .quad 0xc08ff3cd7e03ac60, 0xbd5cc8a22f1d8591
> +        .quad 0xc08ff3d33f04e360, 0x3d4ae09463e13f6f
> +        .quad 0xc08ff3d8fd292dc8, 0x3d42736efbec3922
> +        .quad 0xc08ff3deb8736390, 0xbce0324f8d149b09
> +        .quad 0xc08ff3e470e65870, 0xbd52089e4b8dd900
> +        .quad 0xc08ff3ea2684dbf0, 0xbd5f8e9d5dea127f
> +        .quad 0xc08ff3efd951b970, 0xbd4b60d79db026b1
> +        .quad 0xc08ff3f5894fb828, 0x3d45ff1d6cea2c52
> +        .quad 0xc08ff3fb36819b38, 0x3d5d56022cd7f5b2
> +        .quad 0xc08ff400e0ea21a8, 0xbd58d63f09907b27
> +        .quad 0xc08ff406888c0690, 0xbd4ce6ea362f7ce0
> +        .quad 0xc08ff40c2d6a00f0, 0x3d519fc9ad2ef3ab
> +        .quad 0xc08ff411cf86c3c8, 0xbd55fc89e7b55f20
> +        .quad 0xc08ff4176ee4fe40, 0xbd53229ca791d9be
> +        .quad 0xc08ff41d0b875b88, 0x3d5e7733e6fb23d1
> +        .quad 0xc08ff422a57082e0, 0x3d5871413696b637
> +        .quad 0xc08ff4283ca317c0, 0x3d4b118aa7f493b9
> +        .quad 0xc08ff42dd121b9c8, 0x3d4bdf3692763b50
> +        .quad 0xc08ff43362ef04c8, 0x3d4867e17476dd63
> +        .quad 0xc08ff438f20d90c8, 0xbd5d49b741c778f3
> +        .quad 0xc08ff43e7e7ff228, 0x3d59ac35724f01e3
> +        .quad 0xc08ff4440848b968, 0xbd5251ccdc49432d
> +        .quad 0xc08ff4498f6a7388, 0x3d56cf153ebc9f07
> +        .quad 0xc08ff44f13e7a9b8, 0x3d503b7a697a659c
> +        .quad 0xc08ff45495c2e198, 0xbd5fa03da8acd872
> +        .quad 0xc08ff45a14fe9d38, 0xbd5e6cfb0b5c38fc
> +        .quad 0xc08ff45f919d5b08, 0x3d468b1f1269f1cf
> +        .quad 0xc08ff4650ba195e0, 0xbd313a3a8f72c0f3
> +        .quad 0xc08ff46a830dc528, 0x3d205d31eb8d2bd4
> +        .quad 0xc08ff46ff7e45cb8, 0xbd56cb8ddf5d4a90
> +        .quad 0xc08ff4756a27cd00, 0x3d272c2d46acdcbf
> +        .quad 0xc08ff47ad9da82e8, 0xbd4946efab7a989d
> +        .quad 0xc08ff48046fee800, 0xbd23fabe48cf933c
> +        .quad 0xc08ff485b1976268, 0x3d4f03b099d80f79
> +        .quad 0xc08ff48b19a654e0, 0x3d4fe0c35ab7e9b5
> +        .quad 0xc08ff4907f2e1ed0, 0xbd54b4843f34fe09
> +        .quad 0xc08ff495e2311c58, 0xbd5dfa6541236a64
> +        .quad 0xc08ff49b42b1a648, 0x3d56fd2c8c418cbb
> +        .quad 0xc08ff4a0a0b21218, 0x3d5e687ef208418a
> +        .quad 0xc08ff4a5fc34b210, 0x3d4a671ce14c5521
> +        .quad 0xc08ff4ab553bd540, 0x3d419d0202e3cd96
> +        .quad 0xc08ff4b0abc9c780, 0x3d576b941a895781
> +        .quad 0xc08ff4b5ffe0d170, 0xbd4ea96d88cd1a30
> +        .quad 0xc08ff4bb518338a0, 0x3d4d6b405bd43ba6
> +        .quad 0xc08ff4c0a0b33f60, 0xbcf03382150a56b7
> +        .quad 0xc08ff4c5ed7324f8, 0xbd400df96beb0937
> +        .quad 0xc08ff4cb37c52590, 0xbd5c161714cdebd5
> +        .quad 0xc08ff4d07fab7a48, 0xbd333e8eda1a8e79
> +        .quad 0xc08ff4d5c5285928, 0x3d53aba20381d59f
> +        .quad 0xc08ff4db083df530, 0xbd45e9b07af4e77c
> +        .quad 0xc08ff4e048ee7e70, 0xbd533cfdb78a8c41
> +        .quad 0xc08ff4e5873c21f0, 0xbd5d9b87f4d283f2
> +        .quad 0xc08ff4eac32909c8, 0xbd53a677deee97fa
> +        .quad 0xc08ff4effcb75d18, 0xbd5afd9f5dedc208
> +        .quad 0xc08ff4f533e94020, 0x3ce9dd794d20ab77
> +        .quad 0xc08ff4fa68c0d428, 0xbd5eeae84ba1cbf1
> +        .quad 0xc08ff4ff9b4037b0, 0xbd4f4451587282c8
> +        .quad 0xc08ff504cb698648, 0xbd4a1fa15087e717
> +        .quad 0xc08ff509f93ed8b0, 0xbd5f2f0042b9331a
> +        .quad 0xc08ff50f24c244e0, 0xbd2c2389f8e86341
> +        .quad 0xc08ff5144df5ddf0, 0xbd556fcb7b48f200
> +        .quad 0xc08ff51974dbb448, 0x3d43ba060aa69038
> +        .quad 0xc08ff51e9975d578, 0x3d477ef38ca20229
> +        .quad 0xc08ff523bbc64c60, 0x3d49bcaf1aa4168a
> +        .quad 0xc08ff528dbcf2120, 0xbd51c5609b60687e
> +        .quad 0xc08ff52df9925930, 0xbd51691708d22ce7
> +        .quad 0xc08ff5331511f750, 0x3d30d05c98ecb3d1
> +        .quad 0xc08ff5382e4ffb90, 0xbd423adb056dd244
> +        .quad 0xc08ff53d454e6368, 0xbd3663607042da50
> +        .quad 0xc08ff5425a0f29a8, 0x3d42655d3c6187a6
> +        .quad 0xc08ff5476c944680, 0xbd028c958ae09d20
> +        .quad 0xc08ff54c7cdfaf90, 0xbd436eaf17756653
> +        .quad 0xc08ff5518af357e8, 0x3d5fbbbee66f8d24
> +        .quad 0xc08ff55696d12ff0, 0xbd5d93b389497880
> +        .quad 0xc08ff55ba07b25b0, 0xbd43ff8ff777f337
> +        .quad 0xc08ff560a7f32488, 0xbcf3568803ec82a4
> +        .quad 0xc08ff565ad3b1560, 0xbd50c83eba5cc7ea
> +        .quad 0xc08ff56ab054deb0, 0x3d5becc2411500b7
> +        .quad 0xc08ff56fb1426458, 0xbd5dac964ffa8b83
> +        .quad 0xc08ff574b00587f0, 0x3d1d82f6cc82e69f
> +        .quad 0xc08ff579aca02878, 0xbd34767c0d40542c
> +        .quad 0xc08ff57ea7142298, 0xbd52d28e996ed2ce
> +        .quad 0xc08ff5839f635090, 0xbd432a85d337086d
> +        .quad 0xc08ff588958f8a38, 0x3d512b06ec20c7fd
> +        .quad 0xc08ff58d899aa500, 0xbd47e2147555e10b
> +        .quad 0xc08ff5927b867410, 0xbd4d84480a1b301d
> +        .quad 0xc08ff5976b54c830, 0x3d5622146f3a51bd
> +        .quad 0xc08ff59c59076fc8, 0x3d46d485c5f9c392
> +        .quad 0xc08ff5a144a03700, 0xbd4562714549f4fd
> +        .quad 0xc08ff5a62e20e7b8, 0x3d541ab67e365a63
> +        .quad 0xc08ff5ab158b4970, 0xbd5b0855668b2369
> +        .quad 0xc08ff5affae12188, 0x3d27de1bc2ed4dd8
> +        .quad 0xc08ff5b4de243300, 0x3d40f2592d5ed454
> +        .quad 0xc08ff5b9bf563ea8, 0xbd4ee2f8ba7b3e9e
> +        .quad 0xc08ff5be9e790320, 0xbd3c2214335c2164
> +        .quad 0xc08ff5c37b8e3cc8, 0x3d30745623ab1fd9
> +        .quad 0xc08ff5c85697a5d0, 0xbd326c8fb0ffde38
> +        .quad 0xc08ff5cd2f96f640, 0xbd4c83277493b0bc
> +        .quad 0xc08ff5d2068de3f8, 0x3d39bb1655e6e5ba
> +        .quad 0xc08ff5d6db7e22a8, 0x3d403170b47a5559
> +        .quad 0xc08ff5dbae6963e8, 0x3d5801ddf1edc325
> +        .quad 0xc08ff5e07f515728, 0x3d4b2704c46fe064
> +        .quad 0xc08ff5e54e37a9c8, 0x3d5a16e99ed6cd83
> +        .quad 0xc08ff5ea1b1e0700, 0xbd5353a3ac18c62f
> +        .quad 0xc08ff5eee6061810, 0x3d567c69c189f21a
> +        .quad 0xc08ff5f3aef18400, 0xbd50dd3220e0b0f2
> +        .quad 0xc08ff5f875e1eff0, 0xbd3ab64d80638db2
> +        .quad 0xc08ff5fd3ad8fee0, 0x3d3ec753439035aa
> +        .quad 0xc08ff601fdd851c8, 0xbd5e10415f5f5e74
> +        .quad 0xc08ff606bee187b0, 0xbd55f1048b113fae
> +        .quad 0xc08ff60b7df63d90, 0x3d1e94e4107406c8
> +        .quad 0xc08ff6103b180e60, 0xbd4e2eb5d0c36eb5
> +        .quad 0xc08ff614f6489330, 0x3d43ec5c714f709a
> +        .quad 0xc08ff619af896308, 0x3d519ec459b62a08
> +        .quad 0xc08ff61e66dc1300, 0xbd5b93d09dd6161d
> +        .quad 0xc08ff6231c423658, 0x3d5d72b849dd56be
> +        .quad 0xc08ff627cfbd5e38, 0xbd276b7e32659173
> +        .quad 0xc08ff62c814f1a08, 0x3d4fd918f2e7a6b9
> +        .quad 0xc08ff63130f8f730, 0x3d5609ba1dcc4c97
> +        .quad 0xc08ff635debc8138, 0xbd55cab233dbd84c
> +        .quad 0xc08ff63a8a9b41d8, 0xbd56778ab7aaabc9
> +        .quad 0xc08ff63f3496c0e0, 0x3d5b2791da49c370
> +        .quad 0xc08ff643dcb08438, 0x3d583063ef145f9c
> +        .quad 0xc08ff64882ea1000, 0xbd484e9cab375fb6
> +        .quad 0xc08ff64d2744e688, 0xbd5c430c95c374aa
> +        .quad 0xc08ff651c9c28848, 0xbd57a16d78490bb3
> +        .quad 0xc08ff6566a6473e8, 0xbd445d70374ea9ec
> +        .quad 0xc08ff65b092c2648, 0x3d5c9729142b9d4b
> +        .quad 0xc08ff65fa61b1a70, 0xbd4aaa179d032405
> +        .quad 0xc08ff6644132c9c0, 0xbd2a3ea300d173de
> +        .quad 0xc08ff668da74abc0, 0x3d57809438efb010
> +        .quad 0xc08ff66d71e23630, 0xbd5e9156720951d6
> +        .quad 0xc08ff672077cdd30, 0xbd5bab62e8462035
> +        .quad 0xc08ff6769b461310, 0xbd05113545431443
> +        .quad 0xc08ff67b2d3f4868, 0x3d5105eb0607e59b
> +        .quad 0xc08ff67fbd69ec18, 0xbd5e657842b37dc0
> +        .quad 0xc08ff6844bc76b68, 0x3d4ad1849705bc4c
> +        .quad 0xc08ff688d85931c8, 0xbd508b6f92b6e0d6
> +        .quad 0xc08ff68d6320a920, 0x3d48683cceb5fdfc
> +        .quad 0xc08ff691ec1f3990, 0xbd2c25ee290acbf5
> +        .quad 0xc08ff696735649a8, 0x3d58904932cd46d0
> +        .quad 0xc08ff69af8c73e38, 0xbd5c964167f0bfeb
> +        .quad 0xc08ff69f7c737a90, 0xbd43d66937fa06a9
> +        .quad 0xc08ff6a3fe5c6040, 0xbd54bc302ffa76fb
> +        .quad 0xc08ff6a87e834f50, 0x3d4609b1487f87a3
> +        .quad 0xc08ff6acfce9a618, 0xbd42c0d9af0400b1
> +        .quad 0xc08ff6b17990c170, 0x3d549a63973d262d
> +        .quad 0xc08ff6b5f479fc80, 0xbd28cde894aa0641
> +        .quad 0xc08ff6ba6da6b0f0, 0xbd5acef617609a34
> +        .quad 0xc08ff6bee51836d8, 0x3d4abb9ff3cf80b8
> +        .quad 0xc08ff6c35acfe4a8, 0xbd53dcfa1b7697f3
> +        .quad 0xc08ff6c7cecf0f68, 0x3d5bcdf4aea18a55
> +        .quad 0xc08ff6cc41170a70, 0x3d3cad29d4324038
> +        .quad 0xc08ff6d0b1a927b0, 0x3d56945f9cc2a565
> +        .quad 0xc08ff6d52086b780, 0x3d5d20dfc1c668a7
> +        .quad 0xc08ff6d98db108b8, 0x3d37f20a9bcbbe04
> +        .quad 0xc08ff6ddf92968b8, 0x3d1e0824a6e3a4d2
> +        .quad 0xc08ff6e262f12358, 0xbd469f07bf6322c7
> +        .quad 0xc08ff6e6cb0982f8, 0xbd5cc593afdbfaef
> +        .quad 0xc08ff6eb3173d080, 0xbd5ee68d555d7122
> +        .quad 0xc08ff6ef96315360, 0xbd144ee1d6a39124
> +        .quad 0xc08ff6f3f9435188, 0xbd40f2cb308bcd25
> +        .quad 0xc08ff6f85aab0f80, 0xbd5fd98ced08a73c
> +        .quad 0xc08ff6fcba69d068, 0x3d54f2f2a1ea8606
> +        .quad 0xc08ff7011880d5d0, 0xbd57818234572db7
> +        .quad 0xc08ff70574f16008, 0x3d52429e823a9a83
> +        .quad 0xc08ff709cfbcadd0, 0x3d5d6dc9bb81476c
> +        .quad 0xc08ff70e28e3fc90, 0x3d57d189e116bcb2
> +        .quad 0xc08ff71280688848, 0x3d0e18992809fd6d
> +        .quad 0xc08ff716d64b8b98, 0xbd3b48ac92b8549a
> +        .quad 0xc08ff71b2a8e3fb8, 0xbd4dcfa48040893b
> +        .quad 0xc08ff71f7d31dc88, 0x3d58d945b8e53ef1
> +        .quad 0xc08ff723ce379878, 0x3d4f80faef3e15ee
> +        .quad 0xc08ff7281da0a8b0, 0x3d53edc0fd40d18f
> +        .quad 0xc08ff72c6b6e40f0, 0xbd4bcac66e0be72f
> +        .quad 0xc08ff730b7a193b0, 0xbd44fcf96e2ec967
> +        .quad 0xc08ff735023bd208, 0x3d57e2ff34b08d86
> +        .quad 0xc08ff7394b3e2bb0, 0xbd4caedfb10b98dd
> +        .quad 0xc08ff73d92a9cf28, 0xbd55db1083e5ac6a
> +        .quad 0xc08ff741d87fe990, 0xbd580e83e6d54ed6
> +        .quad 0xc08ff7461cc1a6c0, 0x3d1688c83e1b0cba
> +        .quad 0xc08ff74a5f703138, 0xbd52c398c872b701
> +        .quad 0xc08ff74ea08cb240, 0xbd49aabc3683b259
> +        .quad 0xc08ff752e01851d0, 0x3d5ccba8de72495b
> +        .quad 0xc08ff7571e143688, 0xbd5981cf630f5793
> +        .quad 0xc08ff75b5a8185e8, 0xbd4f235844e01ebd
> +        .quad 0xc08ff75f95616410, 0xbd5047de7ba8ec62
> +        .quad 0xc08ff763ceb4f3f0, 0x3d5fa55e004d6562
> +        .quad 0xc08ff768067d5720, 0xbd49f386e521a80e
> +        .quad 0xc08ff76c3cbbae20, 0x3d3693551e62fe83
> +        .quad 0xc08ff77071711818, 0x3d4ba63b30b6c42c
> +        .quad 0xc08ff774a49eb300, 0x3d4c26523d32f573
> +        .quad 0xc08ff778d6459b98, 0x3d3b65e70806143a
> +        .quad 0xc08ff77d0666ed68, 0xbd5796d9c9f2c2cb
> +        .quad 0xc08ff7813503c2d0, 0x3d33267b004b912b
> +        .quad 0xc08ff785621d34e8, 0x3d1d5d8a23e33341
> +        .quad 0xc08ff7898db45ba8, 0x3d46c95233e60f40
> +        .quad 0xc08ff78db7ca4dd0, 0x3d362865acc8f43f
> +        .quad 0xc08ff791e06020f8, 0xbd10e8203e161511
> +        .quad 0xc08ff7960776e988, 0xbd5cafe4f4467eaa
> +        .quad 0xc08ff79a2d0fbac8, 0xbd520fddea9ea0cd
> +        .quad 0xc08ff79e512ba6d0, 0x3d5c53d3778dae46
> +        .quad 0xc08ff7a273cbbe80, 0xbd5f0f6f88490367
> +        .quad 0xc08ff7a694f111c0, 0x3d5601aa3f55ec11
> +        .quad 0xc08ff7aab49caf20, 0xbd4f1a8a2328a4c4
> +        .quad 0xc08ff7aed2cfa438, 0xbd4a3d5341c07d0e
> +        .quad 0xc08ff7b2ef8afd68, 0xbd5f4a1f4c525f31
> +        .quad 0xc08ff7b70acfc600, 0xbd4d594d77b3d775
> +        .quad 0xc08ff7bb249f0828, 0x3d2aef47e37e953b
> +        .quad 0xc08ff7bf3cf9ccf0, 0x3d501803b47dfba2
> +        .quad 0xc08ff7c353e11c50, 0x3d5ed5ec84e5745e
> +        .quad 0xc08ff7c76955fd20, 0xbd3de249bc9e7f96
> +        .quad 0xc08ff7cb7d597538, 0x3d5b5794341d1fdf
> +        .quad 0xc08ff7cf8fec8938, 0xbd519dbd08276359
> +        .quad 0xc08ff7d3a1103cd0, 0xbd450129b8038848
> +        .quad 0xc08ff7d7b0c59288, 0x3d348f00d3bb30fd
> +        .quad 0xc08ff7dbbf0d8bd8, 0xbd43529025720d8a
> +        .quad 0xc08ff7dfcbe92938, 0x3d5abdaa2b1955d7
> +        .quad 0xc08ff7e3d75969f8, 0xbd4e8837d4588a98
> +        .quad 0xc08ff7e7e15f4c80, 0x3d57a782a6df5a1f
> +        .quad 0xc08ff7ebe9fbce08, 0x3d304ba3eaa96bf1
> +        .quad 0xc08ff7eff12fead8, 0xbd47aab17b868a60
> +        .quad 0xc08ff7f3f6fc9e28, 0xbd5bd858693ba90a
> +        .quad 0xc08ff7f7fb62e230, 0x3d26abb2c547789a
> +        .quad 0xc08ff7fbfe63b010, 0xbd59d383d543b3f5
> +        .quad 0xc08ff80000000000, 0x8000000000000000
> +        /*== Log_LA_table ==*/
> +        .align 32
> +        .quad 0x0000000000000000
> +        .quad 0xbf670f83ff0a7565
> +        .quad 0xbf7709c46d7aac77
> +        .quad 0xbf8143068125dd0e
> +        .quad 0xbf86fe50b6ef0851
> +        .quad 0xbf8cb6c3abd14559
> +        .quad 0xbf91363117a97b0c
> +        .quad 0xbf940f9786685d29
> +        .quad 0xbf96e79685c2d22a
> +        .quad 0xbf99be2f7749acc2
> +        .quad 0xbf9c9363ba850f86
> +        .quad 0xbf9f6734acf8695a
> +        .quad 0xbfa11cd1d5133413
> +        .quad 0xbfa2855905ca70f6
> +        .quad 0xbfa3ed3094685a26
> +        .quad 0xbfa554592bb8cd58
> +        .quad 0xbfa6bad3758efd87
> +        .quad 0xbfa820a01ac754cb
> +        .quad 0xbfa985bfc3495194
> +        .quad 0xbfaaea3316095f72
> +        .quad 0xbfac4dfab90aab5f
> +        .quad 0xbfadb1175160f3b0
> +        .quad 0xbfaf1389833253a0
> +        .quad 0xbfb03aa8f8dc854c
> +        .quad 0xbfb0eb389fa29f9b
> +        .quad 0xbfb19b74069f5f0a
> +        .quad 0xbfb24b5b7e135a3d
> +        .quad 0xbfb2faef55ccb372
> +        .quad 0xbfb3aa2fdd27f1c3
> +        .quad 0xbfb4591d6310d85a
> +        .quad 0xbfb507b836033bb7
> +        .quad 0xbfb5b600a40bd4f3
> +        .quad 0xbfb663f6fac91316
> +        .quad 0xbfb7119b876bea86
> +        .quad 0xbfb7beee96b8a281
> +        .quad 0xbfb86bf07507a0c7
> +        .quad 0xbfb918a16e46335b
> +        .quad 0xbfb9c501cdf75872
> +        .quad 0xbfba7111df348494
> +        .quad 0xbfbb1cd1ecae66e7
> +        .quad 0xbfbbc84240adabba
> +        .quad 0xbfbc73632513bd4f
> +        .quad 0xbfbd1e34e35b82da
> +        .quad 0xbfbdc8b7c49a1ddb
> +        .quad 0xbfbe72ec117fa5b2
> +        .quad 0xbfbf1cd21257e18c
> +        .quad 0xbfbfc66a0f0b00a5
> +        .quad 0xbfc037da278f2870
> +        .quad 0xbfc08c588cda79e4
> +        .quad 0xbfc0e0b05ac848ed
> +        .quad 0xbfc134e1b489062e
> +        .quad 0xbfc188ecbd1d16be
> +        .quad 0xbfc1dcd197552b7b
> +        .quad 0xbfc2309065d29791
> +        .quad 0xbfc284294b07a640
> +        .quad 0xbfc2d79c6937efdd
> +        .quad 0xbfc32ae9e278ae1a
> +        .quad 0xbfc37e11d8b10f89
> +        .quad 0xbfc3d1146d9a8a64
> +        .quad 0xbfc423f1c2c12ea2
> +        .quad 0xbfc476a9f983f74d
> +        .quad 0xbfc4c93d33151b24
> +        .quad 0xbfc51bab907a5c8a
> +        .quad 0xbfc56df5328d58c5
> +        .quad 0xbfc5c01a39fbd688
> +        .quad 0xbfc6121ac74813cf
> +        .quad 0xbfc663f6fac91316
> +        .quad 0xbfc6b5aef4aae7dc
> +        .quad 0xbfc70742d4ef027f
> +        .quad 0xbfc758b2bb6c7b76
> +        .quad 0xbfc7a9fec7d05ddf
> +        .quad 0xbfc7fb27199df16d
> +        .quad 0xbfc84c2bd02f03b3
> +        .quad 0xbfc89d0d0ab430cd
> +        .quad 0xbfc8edcae8352b6c
> +        .quad 0xbfc93e6587910444
> +        .quad 0xbfc98edd077e70df
> +        .quad 0xbfc9df31868c11d5
> +        .quad 0xbfca2f632320b86b
> +        .quad 0xbfca7f71fb7bab9d
> +        .quad 0xbfcacf5e2db4ec94
> +        .quad 0xbfcb1f27d7bd7a80
> +        .quad 0xbfcb6ecf175f95e9
> +        .quad 0xbfcbbe540a3f036f
> +        .quad 0xbfcc0db6cdd94dee
> +        .quad 0xbfcc5cf77f860826
> +        .quad 0xbfccac163c770dc9
> +        .quad 0xbfccfb1321b8c400
> +        .quad 0xbfcd49ee4c325970
> +        .quad 0xbfcd98a7d8a605a7
> +        .quad 0xbfcde73fe3b1480f
> +        .quad 0xbfce35b689cd2655
> +        .quad 0xbfce840be74e6a4d
> +        .quad 0xbfced2401865df52
> +        .quad 0xbfcf205339208f27
> +        .quad 0xbfcf6e456567fe55
> +        .quad 0xbfcfbc16b902680a
> +        .quad 0xbfd004e3a7c97cbd
> +        .quad 0xbfd02baba24d0664
> +        .quad 0xbfd0526359bab1b3
> +        .quad 0xbfd0790adbb03009
> +        .quad 0xbfd09fa235ba2020
> +        .quad 0xbfd0c62975542a8f
> +        .quad 0xbfd0eca0a7e91e0b
> +        .quad 0xbfd11307dad30b76
> +        .quad 0xbfd1395f1b5b61a6
> +        .quad 0xbfd15fa676bb08ff
> +        .quad 0xbfd185ddfa1a7ed0
> +        .quad 0xbfd1ac05b291f070
> +        .quad 0xbfd1d21dad295632
> +        .quad 0xbfd1f825f6d88e13
> +        .quad 0xbfd21e1e9c877639
> +        .quad 0xbfd24407ab0e073a
> +        .quad 0xbfd269e12f346e2c
> +        .quad 0xbfd28fab35b32683
> +        .quad 0xbfd2b565cb3313b6
> +        .quad 0xbfd2db10fc4d9aaf
> +        .quad 0xbfd300acd58cbb10
> +        .quad 0xbfd32639636b2836
> +        .quad 0xbfd34bb6b2546218
> +        .quad 0xbfd37124cea4cded
> +        .quad 0xbfd39683c4a9ce9a
> +        .quad 0xbfd3bbd3a0a1dcfb
> +        .quad 0xbfd3e1146ebc9ff2
> +        .quad 0xbfd406463b1b0449
> +        .quad 0xbfd42b6911cf5465
> +        .quad 0xbfd4507cfedd4fc4
> +        .quad 0xbfd475820e3a4251
> +        .quad 0xbfd49a784bcd1b8b
> +        .quad 0xbfd4bf5fc36e8577
> +        .quad 0xbfd4e43880e8fb6a
> +        .quad 0xbfd509028ff8e0a2
> +        .quad 0xbfd52dbdfc4c96b3
> +        .quad 0xbfd5526ad18493ce
> +        .quad 0xbfd577091b3378cb
> +        .quad 0xbfd59b98e4de271c
> +        .quad 0xbfd5c01a39fbd688
> +        .quad 0xbfd5e48d25f62ab9
> +        .quad 0xbfd608f1b42948ae
> +        .quad 0xbfd62d47efe3ebee
> +        .quad 0xbfd6518fe4677ba7
> +        .quad 0xbfd675c99ce81f92
> +        .quad 0xbfd699f5248cd4b8
> +        .quad 0xbfd6be12866f820d
> +        .quad 0xbfd6e221cd9d0cde
> +        .quad 0xbfd7062305156d1d
> +        .quad 0xbfd72a1637cbc183
> +        .quad 0xbfd74dfb70a66388
> +        .quad 0xbfd771d2ba7efb3c
> +        .quad 0xbfd7959c202292f1
> +        .quad 0xbfd7b957ac51aac4
> +        .quad 0xbfd7dd0569c04bff
> +        .quad 0xbfd800a563161c54
> +        .quad 0xbfd82437a2ee70f7
> +        .quad 0xbfd847bc33d8618e
> +        .quad 0xbfd86b332056db01
> +        .quad 0xbfd88e9c72e0b226
> +        .quad 0xbfd8b1f835e0b642
> +        .quad 0xbfd8d54673b5c372
> +        .quad 0xbfd8f88736b2d4e8
> +        .quad 0xbfd91bba891f1709
> +        .quad 0xbfd93ee07535f967
> +        .quad 0xbfd961f90527409c
> +        .quad 0xbfd98504431717fc
> +        .quad 0xbfd9a802391e232f
> +        .quad 0xbfd9caf2f1498fa4
> +        .quad 0xbfd9edd6759b25e0
> +        .quad 0xbfda10acd0095ab4
> +        .quad 0xbfda33760a7f6051
> +        .quad 0xbfda56322edd3731
> +        .quad 0xbfda78e146f7bef4
> +        .quad 0xbfda9b835c98c70a
> +        .quad 0xbfdabe18797f1f49
> +        .quad 0xbfdae0a0a75ea862
> +        .quad 0xbfdb031befe06434
> +        .quad 0xbfdb258a5ca28608
> +        .quad 0xbfdb47ebf73882a1
> +        .quad 0xbfdb6a40c92b203f
> +        .quad 0xbfdb8c88dbf8867a
> +        .quad 0xbfdbaec439144dfd
> +        .quad 0xbfdbd0f2e9e79031
> +        .quad 0xbfdbf314f7d0f6ba
> +        .quad 0xbfdc152a6c24cae6
> +        .quad 0xbfdc3733502d04f8
> +        .quad 0xbfdc592fad295b56
> +        .quad 0xbfdc7b1f8c4f51a4
> +        .quad 0xbfdc9d02f6ca47b4
> +        .quad 0xbfdcbed9f5bb886a
> +        .quad 0xbfdce0a4923a587d
> +        .quad 0xbfdd0262d554051c
> +        .quad 0xbfdd2414c80bf27d
> +        .quad 0xbfdd45ba735baa4f
> +        .quad 0xbfdd6753e032ea0f
> +        .quad 0xbfdd88e11777b149
> +        .quad 0xbfddaa6222064fb9
> +        .quad 0xbfddcbd708b17359
> +        .quad 0xbfdded3fd442364c
> +        .quad 0xbfde0e9c8d782cbd
> +        .quad 0xbfde2fed3d097298
> +        .quad 0xbfde5131eba2b931
> +        .quad 0xbfde726aa1e754d2
> +        .quad 0xbfde939768714a32
> +        .quad 0xbfdeb4b847d15bce
> +        .quad 0xbfded5cd488f1732
> +        .quad 0xbfdef6d67328e220
> +        .quad 0xbfdf17d3d01407af
> +        .quad 0xbfdf38c567bcc541
> +        .quad 0xbfdf59ab4286576c
> +        .quad 0xbfdf7a8568cb06cf
> +        .quad 0xbfdf9b53e2dc34c4
> +        .quad 0xbfdfbc16b902680a
> +        .quad 0xbfdfdccdf37d594c
> +        .quad 0xbfdffd799a83ff9b
> +        .quad 0x3fdfe1e649bb6335
> +        .quad 0x3fdfc151b11b3640
> +        .quad 0x3fdfa0c8937e7d5d
> +        .quad 0x3fdf804ae8d0cd02
> +        .quad 0x3fdf5fd8a9063e35
> +        .quad 0x3fdf3f71cc1b629c
> +        .quad 0x3fdf1f164a15389a
> +        .quad 0x3fdefec61b011f85
> +        .quad 0x3fdede8136f4cbf1
> +        .quad 0x3fdebe47960e3c08
> +        .quad 0x3fde9e193073ac06
> +        .quad 0x3fde7df5fe538ab3
> +        .quad 0x3fde5dddf7e46e0a
> +        .quad 0x3fde3dd1156507de
> +        .quad 0x3fde1dcf4f1c1a9e
> +        .quad 0x3fddfdd89d586e2b
> +        .quad 0x3fddddecf870c4c1
> +        .quad 0x3fddbe0c58c3cff2
> +        .quad 0x3fdd9e36b6b825b1
> +        .quad 0x3fdd7e6c0abc3579
> +        .quad 0x3fdd5eac4d463d7e
> +        .quad 0x3fdd3ef776d43ff4
> +        .quad 0x3fdd1f4d7febf868
> +        .quad 0x3fdcffae611ad12b
> +        .quad 0x3fdce01a12f5d8d1
> +        .quad 0x3fdcc0908e19b7bd
> +        .quad 0x3fdca111cb2aa5c5
> +        .quad 0x3fdc819dc2d45fe4
> +        .quad 0x3fdc62346dca1dfe
> +        .quad 0x3fdc42d5c4c688b4
> +        .quad 0x3fdc2381c08baf4f
> +        .quad 0x3fdc043859e2fdb3
> +        .quad 0x3fdbe4f9899d326e
> +        .quad 0x3fdbc5c5489254cc
> +        .quad 0x3fdba69b8fa1ab02
> +        .quad 0x3fdb877c57b1b070
> +        .quad 0x3fdb686799b00be3
> +        .quad 0x3fdb495d4e9185f7
> +        .quad 0x3fdb2a5d6f51ff83
> +        .quad 0x3fdb0b67f4f46810
> +        .quad 0x3fdaec7cd882b46c
> +        .quad 0x3fdacd9c130dd53f
> +        .quad 0x3fdaaec59dadadbe
> +        .quad 0x3fda8ff971810a5e
> +        .quad 0x3fda713787ad97a5
> +        .quad 0x3fda527fd95fd8ff
> +        .quad 0x3fda33d25fcb1fac
> +        .quad 0x3fda152f142981b4
> +        .quad 0x3fd9f695efbbd0ef
> +        .quad 0x3fd9d806ebc9921c
> +        .quad 0x3fd9b98201a0f405
> +        .quad 0x3fd99b072a96c6b2
> +        .quad 0x3fd97c96600672ad
> +        .quad 0x3fd95e2f9b51f04e
> +        .quad 0x3fd93fd2d5e1bf1d
> +        .quad 0x3fd921800924dd3b
> +        .quad 0x3fd903372e90bee4
> +        .quad 0x3fd8e4f83fa145ee
> +        .quad 0x3fd8c6c335d8b966
> +        .quad 0x3fd8a8980abfbd32
> +        .quad 0x3fd88a76b7e549c6
> +        .quad 0x3fd86c5f36dea3dc
> +        .quad 0x3fd84e5181475449
> +        .quad 0x3fd8304d90c11fd3
> +        .quad 0x3fd812535ef3ff19
> +        .quad 0x3fd7f462e58e1688
> +        .quad 0x3fd7d67c1e43ae5c
> +        .quad 0x3fd7b89f02cf2aad
> +        .quad 0x3fd79acb8cf10390
> +        .quad 0x3fd77d01b66fbd37
> +        .quad 0x3fd75f417917e02c
> +        .quad 0x3fd7418acebbf18f
> +        .quad 0x3fd723ddb1346b65
> +        .quad 0x3fd7063a1a5fb4f2
> +        .quad 0x3fd6e8a004221b1f
> +        .quad 0x3fd6cb0f6865c8ea
> +        .quad 0x3fd6ad88411abfea
> +        .quad 0x3fd6900a8836d0d5
> +        .quad 0x3fd6729637b59418
> +        .quad 0x3fd6552b49986277
> +        .quad 0x3fd637c9b7e64dc2
> +        .quad 0x3fd61a717cac1983
> +        .quad 0x3fd5fd2291fc33cf
> +        .quad 0x3fd5dfdcf1eeae0e
> +        .quad 0x3fd5c2a096a135dc
> +        .quad 0x3fd5a56d7a370ded
> +        .quad 0x3fd5884396d90702
> +        .quad 0x3fd56b22e6b578e5
> +        .quad 0x3fd54e0b64003b70
> +        .quad 0x3fd530fd08f29fa7
> +        .quad 0x3fd513f7cfcb68ce
> +        .quad 0x3fd4f6fbb2cec598
> +        .quad 0x3fd4da08ac46495a
> +        .quad 0x3fd4bd1eb680e548
> +        .quad 0x3fd4a03dcbd2e1be
> +        .quad 0x3fd48365e695d797
> +        .quad 0x3fd466970128a987
> +        .quad 0x3fd449d115ef7d87
> +        .quad 0x3fd42d141f53b646
> +        .quad 0x3fd4106017c3eca3
> +        .quad 0x3fd3f3b4f9b3e939
> +        .quad 0x3fd3d712bf9c9def
> +        .quad 0x3fd3ba7963fc1f8f
> +        .quad 0x3fd39de8e1559f6f
> +        .quad 0x3fd3816132316520
> +        .quad 0x3fd364e2511cc821
> +        .quad 0x3fd3486c38aa29a8
> +        .quad 0x3fd32bfee370ee68
> +        .quad 0x3fd30f9a4c0d786d
> +        .quad 0x3fd2f33e6d2120f2
> +        .quad 0x3fd2d6eb4152324f
> +        .quad 0x3fd2baa0c34be1ec
> +        .quad 0x3fd29e5eedbe4a35
> +        .quad 0x3fd28225bb5e64a4
> +        .quad 0x3fd265f526e603cb
> +        .quad 0x3fd249cd2b13cd6c
> +        .quad 0x3fd22dadc2ab3497
> +        .quad 0x3fd21196e87473d1
> +        .quad 0x3fd1f588973c8747
> +        .quad 0x3fd1d982c9d52708
> +        .quad 0x3fd1bd857b14c146
> +        .quad 0x3fd1a190a5d674a0
> +        .quad 0x3fd185a444fa0a7b
> +        .quad 0x3fd169c05363f158
> +        .quad 0x3fd14de4cbfd373e
> +        .quad 0x3fd13211a9b38424
> +        .quad 0x3fd11646e7791469
> +        .quad 0x3fd0fa848044b351
> +        .quad 0x3fd0deca6f11b58b
> +        .quad 0x3fd0c318aedff3c0
> +        .quad 0x3fd0a76f3ab3c52c
> +        .quad 0x3fd08bce0d95fa38
> +        .quad 0x3fd070352293d724
> +        .quad 0x3fd054a474bf0eb7
> +        .quad 0x3fd0391bff2dbcf3
> +        .quad 0x3fd01d9bbcfa61d4
> +        .quad 0x3fd00223a943dc19
> +        .quad 0x3fcfcd677e5ac81d
> +        .quad 0x3fcf9697f3bd0ccf
> +        .quad 0x3fcf5fd8a9063e35
> +        .quad 0x3fcf29299496a889
> +        .quad 0x3fcef28aacd72231
> +        .quad 0x3fcebbfbe83901a6
> +        .quad 0x3fce857d3d361368
> +        .quad 0x3fce4f0ea2509008
> +        .quad 0x3fce18b00e13123d
> +        .quad 0x3fcde26177108d03
> +        .quad 0x3fcdac22d3e441d3
> +        .quad 0x3fcd75f41b31b6dd
> +        .quad 0x3fcd3fd543a4ad5c
> +        .quad 0x3fcd09c643f117f0
> +        .quad 0x3fccd3c712d31109
> +        .quad 0x3fcc9dd7a70ed160
> +        .quad 0x3fcc67f7f770a67e
> +        .quad 0x3fcc3227facce950
> +        .quad 0x3fcbfc67a7fff4cc
> +        .quad 0x3fcbc6b6f5ee1c9b
> +        .quad 0x3fcb9115db83a3dd
> +        .quad 0x3fcb5b844fb4b3ef
> +        .quad 0x3fcb2602497d5346
> +        .quad 0x3fcaf08fbfe15c51
> +        .quad 0x3fcabb2ca9ec7472
> +        .quad 0x3fca85d8feb202f7
> +        .quad 0x3fca5094b54d2828
> +        .quad 0x3fca1b5fc4e0b465
> +        .quad 0x3fc9e63a24971f46
> +        .quad 0x3fc9b123cba27ed3
> +        .quad 0x3fc97c1cb13c7ec1
> +        .quad 0x3fc94724cca657be
> +        .quad 0x3fc9123c1528c6ce
> +        .quad 0x3fc8dd62821404a9
> +        .quad 0x3fc8a8980abfbd32
> +        .quad 0x3fc873dca68b06f4
> +        .quad 0x3fc83f304cdc5aa7
> +        .quad 0x3fc80a92f5218acc
> +        .quad 0x3fc7d60496cfbb4c
> +        .quad 0x3fc7a18529635926
> +        .quad 0x3fc76d14a4601225
> +        .quad 0x3fc738b2ff50ccad
> +        .quad 0x3fc7046031c79f85
> +        .quad 0x3fc6d01c335dc9b5
> +        .quad 0x3fc69be6fbb3aa6f
> +        .quad 0x3fc667c08270b905
> +        .quad 0x3fc633a8bf437ce1
> +        .quad 0x3fc5ff9fa9e18595
> +        .quad 0x3fc5cba53a0762ed
> +        .quad 0x3fc597b967789d12
> +        .quad 0x3fc563dc29ffacb2
> +        .quad 0x3fc5300d796df33a
> +        .quad 0x3fc4fc4d4d9bb313
> +        .quad 0x3fc4c89b9e6807f5
> +        .quad 0x3fc494f863b8df35
> +        .quad 0x3fc46163957af02e
> +        .quad 0x3fc42ddd2ba1b4a9
> +        .quad 0x3fc3fa651e276158
> +        .quad 0x3fc3c6fb650cde51
> +        .quad 0x3fc3939ff859bf9f
> +        .quad 0x3fc36052d01c3dd7
> +        .quad 0x3fc32d13e4692eb7
> +        .quad 0x3fc2f9e32d5bfdd1
> +        .quad 0x3fc2c6c0a316a540
> +        .quad 0x3fc293ac3dc1a668
> +        .quad 0x3fc260a5f58c02bd
> +        .quad 0x3fc22dadc2ab3497
> +        .quad 0x3fc1fac39d5b280c
> +        .quad 0x3fc1c7e77dde33dc
> +        .quad 0x3fc195195c7d125b
> +        .quad 0x3fc162593186da70
> +        .quad 0x3fc12fa6f550f896
> +        .quad 0x3fc0fd02a03727ea
> +        .quad 0x3fc0ca6c2a9b6b41
> +        .quad 0x3fc097e38ce60649
> +        .quad 0x3fc06568bf8576b3
> +        .quad 0x3fc032fbbaee6d65
> +        .quad 0x3fc0009c779bc7b5
> +        .quad 0x3fbf9c95dc1d1165
> +        .quad 0x3fbf380e2d9ba4df
> +        .quad 0x3fbed3a1d4cdbebb
> +        .quad 0x3fbe6f50c2d9f754
> +        .quad 0x3fbe0b1ae8f2fd56
> +        .quad 0x3fbda700385788a2
> +        .quad 0x3fbd4300a2524d41
> +        .quad 0x3fbcdf1c1839ee74
> +        .quad 0x3fbc7b528b70f1c5
> +        .quad 0x3fbc17a3ed65b23c
> +        .quad 0x3fbbb4102f925394
> +        .quad 0x3fbb5097437cb58e
> +        .quad 0x3fbaed391ab6674e
> +        .quad 0x3fba89f5a6dc9acc
> +        .quad 0x3fba26ccd9981853
> +        .quad 0x3fb9c3bea49d3214
> +        .quad 0x3fb960caf9abb7ca
> +        .quad 0x3fb8fdf1ca8eea6a
> +        .quad 0x3fb89b33091d6fe8
> +        .quad 0x3fb8388ea739470a
> +        .quad 0x3fb7d60496cfbb4c
> +        .quad 0x3fb77394c9d958d5
> +        .quad 0x3fb7113f3259e07a
> +        .quad 0x3fb6af03c2603bd0
> +        .quad 0x3fb64ce26c067157
> +        .quad 0x3fb5eadb217198a3
> +        .quad 0x3fb588edd4d1ceaa
> +        .quad 0x3fb5271a78622a0f
> +        .quad 0x3fb4c560fe68af88
> +        .quad 0x3fb463c15936464e
> +        .quad 0x3fb4023b7b26ac9e
> +        .quad 0x3fb3a0cf56a06c4b
> +        .quad 0x3fb33f7cde14cf5a
> +        .quad 0x3fb2de4403ffd4b3
> +        .quad 0x3fb27d24bae824db
> +        .quad 0x3fb21c1ef55f06c2
> +        .quad 0x3fb1bb32a600549d
> +        .quad 0x3fb15a5fbf7270ce
> +        .quad 0x3fb0f9a634663add
> +        .quad 0x3fb09905f797047c
> +        .quad 0x3fb0387efbca869e
> +        .quad 0x3fafb02267a1ad2d
> +        .quad 0x3faeef792508b69d
> +        .quad 0x3fae2f02159384fe
> +        .quad 0x3fad6ebd1f1febfe
> +        .quad 0x3facaeaa27a02241
> +        .quad 0x3fabeec9151aac2e
> +        .quad 0x3fab2f19cdaa46dc
> +        .quad 0x3faa6f9c377dd31b
> +        .quad 0x3fa9b05038d84095
> +        .quad 0x3fa8f135b8107912
> +        .quad 0x3fa8324c9b914bc7
> +        .quad 0x3fa77394c9d958d5
> +        .quad 0x3fa6b50e297afcce
> +        .quad 0x3fa5f6b8a11c3c61
> +        .quad 0x3fa538941776b01e
> +        .quad 0x3fa47aa07357704f
> +        .quad 0x3fa3bcdd9b9f00f3
> +        .quad 0x3fa2ff4b77413dcb
> +        .quad 0x3fa241e9ed454683
> +        .quad 0x3fa184b8e4c56af8
> +        .quad 0x3fa0c7b844ef1795
> +        .quad 0x3fa00ae7f502c1c4
> +        .quad 0x3f9e9c8fb8a7a900
> +        .quad 0x3f9d23afc49139f9
> +        .quad 0x3f9bab2fdcb46ec7
> +        .quad 0x3f9a330fd028f75f
> +        .quad 0x3f98bb4f6e2bd536
> +        .quad 0x3f9743ee861f3556
> +        .quad 0x3f95ccece78a4a9e
> +        .quad 0x3f94564a62192834
> +        .quad 0x3f92e006c59c9c29
> +        .quad 0x3f916a21e20a0a45
> +        .quad 0x3f8fe9370ef68e1b
> +        .quad 0x3f8cfee70c5ce5dc
> +        .quad 0x3f8a15535d0bab34
> +        .quad 0x3f872c7ba20f7327
> +        .quad 0x3f84445f7cbc8fd2
> +        .quad 0x3f815cfe8eaec830
> +        .quad 0x3f7cecb0f3922091
> +        .quad 0x3f7720d9c06a835f
> +        .quad 0x3f715676c8c7a8c1
> +        .quad 0x3f671b0ea42e5fda
> +        .quad 0x3f57182a894b69c6
> +        .quad 0x8000000000000000
> +        /*== poly_coeff[5] ==*/
> +        .align 32
> +        .quad 0x3fd2776E996DA1D2, 0x3fd2776E996DA1D2, 0x3fd2776E996DA1D2, 0x3fd2776E996DA1D2 /* coeff5 */
> +        .quad 0xbfd715494C3E7C9B, 0xbfd715494C3E7C9B, 0xbfd715494C3E7C9B, 0xbfd715494C3E7C9B /* coeff4 */
> +        .quad 0x3fdEC709DC39E926, 0x3fdEC709DC39E926, 0x3fdEC709DC39E926, 0x3fdEC709DC39E926 /* coeff3 */
> +        .quad 0xbfe71547652B7CF8, 0xbfe71547652B7CF8, 0xbfe71547652B7CF8, 0xbfe71547652B7CF8 /* coeff2 */
> +        .quad 0x3ff71547652B82FE, 0x3ff71547652B82FE, 0x3ff71547652B82FE, 0x3ff71547652B82FE /* coeff1 */
> +        /*== ExpMask ==*/
> +        .align 32
> +        .quad 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff
> +        /*== Two10 ==*/
> +        .align 32
> +        .quad 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000
> +        /*== MinNorm ==*/
> +        .align 32
> +        .quad 0x0010000000000000, 0x0010000000000000, 0x0010000000000000, 0x0010000000000000
> +        /*== MaxNorm ==*/
> +        .align 32
> +        .quad 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff
> +        /*== HalfMask ==*/
> +        .align 32
> +        .quad 0xfffffffffc000000, 0xfffffffffc000000, 0xfffffffffc000000, 0xfffffffffc000000
> +        /*== One ==*/
> +        .align 32
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== Threshold ==*/
> +        .align 32
> +        .quad 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000
> +        /*== Bias ==*/
> +        .align 32
> +        .quad 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000
> +        /*== Bias1 ==*/
> +        .align 32
> +        .quad 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000
> +        .align 32
> +        .type	__svml_dlog2_data_internal,@object
> +        .size	__svml_dlog2_data_internal,.-__svml_dlog2_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log28_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log28_core-avx2.S
> new file mode 100644
> index 0000000000..804de5fe0c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log28_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized log2, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN8v_log2 _ZGVeN8v_log2_avx2_wrapper
> +#include "../svml_d_log28_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log28_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log28_core.c
> new file mode 100644
> index 0000000000..bd55abecc7
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log28_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized log2, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN8v_log2
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN8v_log2, __GI__ZGVeN8v_log2, __redirect__ZGVeN8v_log2)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log28_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log28_core_avx512.S
> new file mode 100644
> index 0000000000..211a78f315
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log28_core_avx512.S
> @@ -0,0 +1,293 @@
> +/* Function log2 vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
> + *    R = Rcp*mantissa(x) - 1.0
> + *    log2(x) = k - log2(Rcp) + poly_approximation(R)
> + *       where k is the exponent of x and log2(Rcp) is tabulated
> + *
> + *
> + */
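Before the register-level code, a minimal scalar C sketch of the reduction
described above may help review; it is illustrative only and not part of the
patch.  The function name is mine, frexp stands in for vgetmantpd/vgetexppd,
and the Log_tbl lookup and the degree-9 minimax polynomial of the AVX-512
code are replaced by libm calls:

    #include <math.h>

    static double
    log2_sketch (double x)
    {
      int k;
      double m = frexp (x, &k);        /* x = m * 2^k, m in [0.5, 1)  */

      /* Short reciprocal approximation rounded to a few fractional
         bits (the vector code uses vrcp14pd + vrndscalepd).  */
      double rcp = nearbyint (16.0 / m) / 16.0;

      /* Reduced argument: R = Rcp*m - 1, small by construction.  */
      double r = fma (rcp, m, -1.0);

      /* log2(x) = k - log2(Rcp) + log2(1 + R); in the assembly the
         -log2(Rcp) term comes from Log_tbl and log2(1 + R) from the
         poly_coeff* polynomial.  */
      return k - log2 (rcp) + log2 (1.0 + r);
    }

Special inputs (x <= 0, NaN) are not handled in this sketch; the vector code
defers such lanes to scalar log2, as sketched after the function body below.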
> +
> +/* Offsets for data table __svml_dlog2_data_internal_avx512
> + */
> +#define Log_tbl                       	0
> +#define One                           	128
> +#define C075                          	192
> +#define poly_coeff9                   	256
> +#define poly_coeff8                   	320
> +#define poly_coeff7                   	384
> +#define poly_coeff6                   	448
> +#define poly_coeff5                   	512
> +#define poly_coeff4                   	576
> +#define poly_coeff3                   	640
> +#define poly_coeff2                   	704
> +#define poly_coeff1                   	768
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN8v_log2_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovaps   %zmm0, %zmm7
> +        vgetmantpd $8, {sae}, %zmm7, %zmm6
> +        vmovups   One+__svml_dlog2_data_internal_avx512(%rip), %zmm2
> +        vmovups   poly_coeff5+__svml_dlog2_data_internal_avx512(%rip), %zmm12
> +        vmovups   poly_coeff3+__svml_dlog2_data_internal_avx512(%rip), %zmm13
> +
> +/* Start polynomial evaluation */
> +        vmovups   poly_coeff9+__svml_dlog2_data_internal_avx512(%rip), %zmm10
> +        vmovups   poly_coeff8+__svml_dlog2_data_internal_avx512(%rip), %zmm0
> +        vmovups   poly_coeff7+__svml_dlog2_data_internal_avx512(%rip), %zmm11
> +        vmovups   poly_coeff6+__svml_dlog2_data_internal_avx512(%rip), %zmm14
> +
> +/* Prepare exponent correction: DblRcp<0.75? */
> +        vmovups   C075+__svml_dlog2_data_internal_avx512(%rip), %zmm1
> +
> +/* Table lookup */
> +        vmovups   __svml_dlog2_data_internal_avx512(%rip), %zmm4
> +
> +/* GetExp(x) */
> +        vgetexppd {sae}, %zmm7, %zmm5
> +
> +/* DblRcp ~ 1/Mantissa */
> +        vrcp14pd  %zmm6, %zmm8
> +
> +/* x<=0? */
> +        vfpclasspd $94, %zmm7, %k0
> +
> +/* round DblRcp to 4 fractional bits (RN mode, no Precision exception) */
> +        vrndscalepd $88, {sae}, %zmm8, %zmm3
> +        vmovups   poly_coeff4+__svml_dlog2_data_internal_avx512(%rip), %zmm8
> +        kmovw     %k0, %edx
> +
> +/* Reduced argument: R = DblRcp*Mantissa - 1 */
> +        vfmsub213pd {rn-sae}, %zmm2, %zmm3, %zmm6
> +        vcmppd    $17, {sae}, %zmm1, %zmm3, %k1
> +        vfmadd231pd {rn-sae}, %zmm6, %zmm12, %zmm8
> +        vmovups   poly_coeff2+__svml_dlog2_data_internal_avx512(%rip), %zmm12
> +        vfmadd231pd {rn-sae}, %zmm6, %zmm10, %zmm0
> +        vfmadd231pd {rn-sae}, %zmm6, %zmm11, %zmm14
> +        vmovups   poly_coeff1+__svml_dlog2_data_internal_avx512(%rip), %zmm1
> +
> +/* R^2 */
> +        vmulpd    {rn-sae}, %zmm6, %zmm6, %zmm15
> +        vfmadd231pd {rn-sae}, %zmm6, %zmm13, %zmm12
> +
> +/* Prepare table index */
> +        vpsrlq    $48, %zmm3, %zmm9
> +
> +/* add 1 to Expon if DblRcp<0.75 */
> +        vaddpd    {rn-sae}, %zmm2, %zmm5, %zmm5{%k1}
> +        vmulpd    {rn-sae}, %zmm15, %zmm15, %zmm13
> +        vfmadd213pd {rn-sae}, %zmm14, %zmm15, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm12, %zmm15, %zmm8
> +        vpermt2pd Log_tbl+64+__svml_dlog2_data_internal_avx512(%rip), %zmm9, %zmm4
> +
> +/* polynomial */
> +        vfmadd213pd {rn-sae}, %zmm8, %zmm13, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm1, %zmm6, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm4, %zmm0, %zmm6
> +        vaddpd    {rn-sae}, %zmm6, %zmm5, %zmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm7
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm7, 64(%rsp)
> +        vmovups   %zmm0, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     64(%rsp,%r14,8), %xmm0
> +        call      log2@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 128(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN8v_log2_skx)
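The special-values tail above reduces to: keep the fast-path vector result
and recompute only the lanes flagged by vfpclasspd with the scalar libm
routine.  A C sketch of L(SPECIAL_VALUES_LOOP) / L(SCALAR_MATH_CALL) follows
(illustrative only; the function name and signature are placeholders, and
the assembly additionally spills the input and result vectors to the stack
and preserves r12-r14, which is what the cfi_escape annotations describe):

    #include <math.h>

    static void
    log2_special_lanes (const double x[8], double y[8], unsigned int mask)
    {
      for (int lane = 0; lane < 8; lane++)   /* cmpl $8, %r12d          */
        if (mask & (1u << lane))             /* btl %r12d, %r13d        */
          y[lane] = log2 (x[lane]);          /* call log2@PLT per lane  */
    }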
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_dlog2_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 Log_tbl[16][2];
> +        __declspec(align(64)) VUINT32 One[8][2];
> +        __declspec(align(64)) VUINT32 C075[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff9[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff8[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff7[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff1[8][2];
> +   } __svml_dlog2_data_internal_avx512;
> +#endif
> +__svml_dlog2_data_internal_avx512:
> +        /*== Log_tbl ==*/
> +        .quad 0x0000000000000000
> +        .quad 0xbfb663f6fac91316
> +        .quad 0xbfc5c01a39fbd688
> +        .quad 0xbfcfbc16b902680a
> +        .quad 0xbfd49a784bcd1b8b
> +        .quad 0xbfd91bba891f1709
> +        .quad 0xbfdd6753e032ea0f
> +        .quad 0xbfe0c10500d63aa6
> +        .quad 0x3fda8ff971810a5e
> +        .quad 0x3fd6cb0f6865c8ea
> +        .quad 0x3fd32bfee370ee68
> +        .quad 0x3fcf5fd8a9063e35
> +        .quad 0x3fc8a8980abfbd32
> +        .quad 0x3fc22dadc2ab3497
> +        .quad 0x3fb7d60496cfbb4c
> +        .quad 0x3fa77394c9d958d5
> +        /*== One ==*/
> +        .align 64
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== C075 0.75 ==*/
> +        .align 64
> +        .quad 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000
> +        /*== poly_coeff9 ==*/
> +        .align 64
> +        .quad 0x3fc4904bda0e1d12, 0x3fc4904bda0e1d12, 0x3fc4904bda0e1d12, 0x3fc4904bda0e1d12, 0x3fc4904bda0e1d12, 0x3fc4904bda0e1d12, 0x3fc4904bda0e1d12, 0x3fc4904bda0e1d12
> +        /*== poly_coeff8 ==*/
> +        .align 64
> +        .quad 0xbfc71fb84deb5cce, 0xbfc71fb84deb5cce, 0xbfc71fb84deb5cce, 0xbfc71fb84deb5cce, 0xbfc71fb84deb5cce, 0xbfc71fb84deb5cce, 0xbfc71fb84deb5cce, 0xbfc71fb84deb5cce
> +        /*== poly_coeff7 ==*/
> +        .align 64
> +        .quad 0x3fca617351818613, 0x3fca617351818613, 0x3fca617351818613, 0x3fca617351818613, 0x3fca617351818613, 0x3fca617351818613, 0x3fca617351818613, 0x3fca617351818613
> +        /*== poly_coeff6 ==*/
> +        .align 64
> +        .quad 0xbfcec707e4e3144c, 0xbfcec707e4e3144c, 0xbfcec707e4e3144c, 0xbfcec707e4e3144c, 0xbfcec707e4e3144c, 0xbfcec707e4e3144c, 0xbfcec707e4e3144c, 0xbfcec707e4e3144c
> +        /*== poly_coeff5 ==*/
> +        .align 64
> +        .quad 0x3fd2776c5114d91a, 0x3fd2776c5114d91a, 0x3fd2776c5114d91a, 0x3fd2776c5114d91a, 0x3fd2776c5114d91a, 0x3fd2776c5114d91a, 0x3fd2776c5114d91a, 0x3fd2776c5114d91a
> +        /*== poly_coeff4 ==*/
> +        .align 64
> +        .quad 0xbfd71547653d0f8d, 0xbfd71547653d0f8d, 0xbfd71547653d0f8d, 0xbfd71547653d0f8d, 0xbfd71547653d0f8d, 0xbfd71547653d0f8d, 0xbfd71547653d0f8d, 0xbfd71547653d0f8d
> +        /*== poly_coeff3 ==*/
> +        .align 64
> +        .quad 0x3fdec709dc3a029f, 0x3fdec709dc3a029f, 0x3fdec709dc3a029f, 0x3fdec709dc3a029f, 0x3fdec709dc3a029f, 0x3fdec709dc3a029f, 0x3fdec709dc3a029f, 0x3fdec709dc3a029f
> +        /*== poly_coeff2 ==*/
> +        .align 64
> +        .quad 0xbfe71547652b82d4, 0xbfe71547652b82d4, 0xbfe71547652b82d4, 0xbfe71547652b82d4, 0xbfe71547652b82d4, 0xbfe71547652b82d4, 0xbfe71547652b82d4, 0xbfe71547652b82d4
> +        /*== poly_coeff1 ==*/
> +        .align 64
> +        .quad 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe, 0x3ff71547652b82fe
> +        .align 64
> +        .type	__svml_dlog2_data_internal_avx512,@object
> +        .size	__svml_dlog2_data_internal_avx512,.-__svml_dlog2_data_internal_avx512
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core-avx2.S
> new file mode 100644
> index 0000000000..234bf4750b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized log2f, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN16v_log2f _ZGVeN16v_log2f_avx2_wrapper
> +#include "../svml_s_log2f16_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core.c
> new file mode 100644
> index 0000000000..abf4f04988
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized log2f, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN16v_log2f
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN16v_log2f, __GI__ZGVeN16v_log2f,
> +	       __redirect__ZGVeN16v_log2f)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core_avx512.S
> new file mode 100644
> index 0000000000..c3a5aceef4
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f16_core_avx512.S
> @@ -0,0 +1,231 @@
> +/* Function log2f vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    m = mantissa(x) (vgetmantps),  r = m - 1.0
> + *    k = getexp(x) - getexp(m)
> + *    log2(x) = k + poly_approximation(r), where the polynomial
> + *       coefficients are selected per mantissa sub-interval
> + *
> + *
> + */
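Unlike the double-precision kernel, this single-precision version has no
reciprocal or log2(Rcp) table: vpsrld $19 leaves the four leading mantissa
bits of m in the low bits of each lane, and vpermps (which uses only the low
four index bits when permuting a 16-element table) then picks one coefficient
per lane from the coeff1..coeff4 tables at the end of this file.  Per lane
that selection is equivalent to the C sketch below (the helper name is mine;
'coeff' stands for any one of those tables):

    #include <stdint.h>
    #include <string.h>

    static float
    select_coeff (const float coeff[16], float m)
    {
      uint32_t u;
      memcpy (&u, &m, sizeof u);        /* bit pattern of the mantissa m  */
      return coeff[(u >> 19) & 0xf];    /* leading 4 mantissa bits of m   */
    }

The selected coefficients then feed the fused multiply-adds in the body:
result = r*(c1 + r*(c2 + r*(c3 + r*c4))) + k, with r = m - 1.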
> +
> +/* Offsets for data table __svml_slog2_data_internal_avx512
> + */
> +#define One                           	0
> +#define coeff4                        	64
> +#define coeff3                        	128
> +#define coeff2                        	192
> +#define coeff1                        	256
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN16v_log2f_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vgetmantps $11, {sae}, %zmm0, %zmm3
> +        vmovups   __svml_slog2_data_internal_avx512(%rip), %zmm1
> +        vgetexpps {sae}, %zmm0, %zmm5
> +
> +/* x<=0? */
> +        vfpclassps $94, %zmm0, %k0
> +        vsubps    {rn-sae}, %zmm1, %zmm3, %zmm9
> +        vpsrld    $19, %zmm3, %zmm7
> +        vgetexpps {sae}, %zmm3, %zmm6
> +        vpermps   coeff4+__svml_slog2_data_internal_avx512(%rip), %zmm7, %zmm1
> +        vpermps   coeff3+__svml_slog2_data_internal_avx512(%rip), %zmm7, %zmm2
> +        vpermps   coeff2+__svml_slog2_data_internal_avx512(%rip), %zmm7, %zmm4
> +        vpermps   coeff1+__svml_slog2_data_internal_avx512(%rip), %zmm7, %zmm8
> +        vsubps    {rn-sae}, %zmm6, %zmm5, %zmm10
> +        vfmadd213ps {rn-sae}, %zmm2, %zmm9, %zmm1
> +        kmovw     %k0, %edx
> +        vfmadd213ps {rn-sae}, %zmm4, %zmm9, %zmm1
> +        vfmadd213ps {rn-sae}, %zmm8, %zmm9, %zmm1
> +        vfmadd213ps {rn-sae}, %zmm10, %zmm9, %zmm1
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        vmovaps   %zmm1, %zmm0
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm0, 64(%rsp)
> +        vmovups   %zmm1, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm1
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $16, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm1
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm1
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     64(%rsp,%r14,4), %xmm0
> +        call      log2f@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 128(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN16v_log2f_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_slog2_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 One[16][1];
> +        __declspec(align(64)) VUINT32 coeff4[16][1];
> +        __declspec(align(64)) VUINT32 coeff3[16][1];
> +        __declspec(align(64)) VUINT32 coeff2[16][1];
> +        __declspec(align(64)) VUINT32 coeff1[16][1];
> +    } __svml_slog2_data_internal_avx512;
> +#endif
> +__svml_slog2_data_internal_avx512:
> +        /*== One ==*/
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        // c4
> +        .align 64
> +        .long 0xbea77e4a, 0xbe8aae3d
> +        .long 0xbe67fe32, 0xbe43d1b6
> +        .long 0xbe26a589, 0xbe0ee09b
> +        .long 0xbdf6a8a1, 0xbdd63b49
> +        .long 0xbf584e51, 0xbf3e80a1
> +        .long 0xbf2892f0, 0xbf15d377
> +        .long 0xbf05b525, 0xbeef8e30
> +        .long 0xbed75c8f, 0xbec24184
> +        // c3
> +        .align 64
> +        .long 0x3ef5910c, 0x3ef045a1
> +        .long 0x3ee7d87e, 0x3eddbb84
> +        .long 0x3ed2d6df, 0x3ec7bbd2
> +        .long 0x3ebcc42f, 0x3eb22616
> +        .long 0x3e8f3399, 0x3eb1223e
> +        .long 0x3ec9db4a, 0x3edb7a09
> +        .long 0x3ee79a1a, 0x3eef77cb
> +        .long 0x3ef407a4, 0x3ef607b4
> +        // c2
> +        .align 64
> +        .long 0xbf38a934, 0xbf387de6
> +        .long 0xbf37f6f0, 0xbf37048b
> +        .long 0xbf35a88a, 0xbf33ed04
> +        .long 0xbf31df56, 0xbf2f8d82
> +        .long 0xbf416814, 0xbf3daf58
> +        .long 0xbf3b5c08, 0xbf39fa2a
> +        .long 0xbf393713, 0xbf38d7e1
> +        .long 0xbf38b2cd, 0xbf38aa62
> +        // c1
> +        .align 64
> +        .long 0x3fb8aa3b, 0x3fb8a9c0
> +        .long 0x3fb8a6e8, 0x3fb89f4e
> +        .long 0x3fb890cb, 0x3fb879b1
> +        .long 0x3fb858d8, 0x3fb82d90
> +        .long 0x3fb8655e, 0x3fb8883a
> +        .long 0x3fb89aea, 0x3fb8a42f
> +        .long 0x3fb8a848, 0x3fb8a9c9
> +        .long 0x3fb8aa2f, 0x3fb8aa3b
> +        .align 64
> +        .type	__svml_slog2_data_internal_avx512,@object
> +        .size	__svml_slog2_data_internal_avx512,.-__svml_slog2_data_internal_avx512
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core-sse2.S
> new file mode 100644
> index 0000000000..dd0e763ac9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized log2f, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
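> +/* Build the generic SSE2 wrapper under the _sse2 suffix so the ifunc in
> +   svml_s_log2f4_core.c can select it when SSE4.1 is unavailable.  */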
> +#define _ZGVbN4v_log2f _ZGVbN4v_log2f_sse2
> +#include "../svml_s_log2f4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core.c
> new file mode 100644
> index 0000000000..1eb68d9f52
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized log2f, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN4v_log2f
> +#include "ifunc-mathvec-sse4_1.h"
> +
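> +/* Resolve _ZGVbN4v_log2f at load time: the SSE4.1 implementation when
> +   the CPU supports it, otherwise the _ZGVbN4v_log2f_sse2 wrapper.  */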
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN4v_log2f, __GI__ZGVbN4v_log2f,
> +	       __redirect__ZGVbN4v_log2f)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core_sse4.S
> new file mode 100644
> index 0000000000..a45ea919f4
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f4_core_sse4.S
> @@ -0,0 +1,223 @@
> +/* Function log2f vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
> + *    R = Rcp*x - 1.0
> + *    log2(x) = k - log2(Rcp) + poly_approximation(R)
> + *       log2(Rcp) is tabulated
> + *
> + *
> + */
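> +
> +/*
> + * Illustrative scalar sketch of the reduction actually used below
> + * (assumes default rounding and ignores special inputs; helper names
> + * are hypothetical):
> + *
> + *   ix = bits (x) - bits (2.0f / 3.0f);       // iBrkValue
> + *   n  = (int32_t) ix >> 23;                  // exponent part
> + *   m  = from_bits ((ix & 0x007fffff) + bits (2.0f / 3.0f));
> + *   R  = m - 1.0f;                            // R in roughly [-1/3, 1/3)
> + *   log2f (x) ~= R * P (R) + (float) n;       // P: 9-term sPoly polynomial
> + */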
> +
> +/* Offsets for data table __svml_slog2_data_internal
> + */
> +#define MinNorm                       	0
> +#define MaxNorm                       	16
> +#define iBrkValue                     	32
> +#define iOffExpoMask                  	48
> +#define One                           	64
> +#define sPoly                         	80
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN4v_log2f_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +        movaps    %xmm0, %xmm1
> +
> +/* reduction: compute r,n */
> +        movdqu    iBrkValue+__svml_slog2_data_internal(%rip), %xmm2
> +        movaps    %xmm0, %xmm4
> +        movdqu    iOffExpoMask+__svml_slog2_data_internal(%rip), %xmm10
> +        psubd     %xmm2, %xmm1
> +        pand      %xmm1, %xmm10
> +        movaps    %xmm0, %xmm3
> +        paddd     %xmm2, %xmm10
> +        psrad     $23, %xmm1
> +        movups    sPoly+__svml_slog2_data_internal(%rip), %xmm5
> +        movups    sPoly+32+__svml_slog2_data_internal(%rip), %xmm6
> +        movups    sPoly+64+__svml_slog2_data_internal(%rip), %xmm7
> +        movups    sPoly+96+__svml_slog2_data_internal(%rip), %xmm9
> +        cmpltps   MinNorm+__svml_slog2_data_internal(%rip), %xmm4
> +        cmpnleps  MaxNorm+__svml_slog2_data_internal(%rip), %xmm3
> +        cvtdq2ps  %xmm1, %xmm1
> +        subps     One+__svml_slog2_data_internal(%rip), %xmm10
> +        mulps     %xmm10, %xmm5
> +        movaps    %xmm10, %xmm8
> +        mulps     %xmm10, %xmm6
> +        mulps     %xmm10, %xmm8
> +        addps     sPoly+16+__svml_slog2_data_internal(%rip), %xmm5
> +        mulps     %xmm10, %xmm7
> +        addps     sPoly+48+__svml_slog2_data_internal(%rip), %xmm6
> +        mulps     %xmm10, %xmm9
> +        mulps     %xmm8, %xmm5
> +        addps     sPoly+80+__svml_slog2_data_internal(%rip), %xmm7
> +        addps     sPoly+112+__svml_slog2_data_internal(%rip), %xmm9
> +        addps     %xmm5, %xmm6
> +        mulps     %xmm8, %xmm6
> +        orps      %xmm3, %xmm4
> +
> +/* combine and get argument value range mask */
> +        movmskps  %xmm4, %edx
> +        addps     %xmm6, %xmm7
> +        mulps     %xmm7, %xmm8
> +        addps     %xmm8, %xmm9
> +        mulps     %xmm10, %xmm9
> +        addps     sPoly+128+__svml_slog2_data_internal(%rip), %xmm9
> +        mulps     %xmm9, %xmm10
> +        addps     %xmm10, %xmm1
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm1
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movaps    %xmm1, %xmm0
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm0, 32(%rsp)
> +        movups    %xmm1, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm1
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm1
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      log2f@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 48(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN4v_log2f_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_slog2_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(16)) VUINT32 MinNorm[4][1];
> +        __declspec(align(16)) VUINT32 MaxNorm[4][1];
> +        __declspec(align(16)) VUINT32 iBrkValue[4][1];
> +        __declspec(align(16)) VUINT32 iOffExpoMask[4][1];
> +        __declspec(align(16)) VUINT32 One[4][1];
> +        __declspec(align(16)) VUINT32 sPoly[9][4][1];
> +} __svml_slog2_data_internal;
> +#endif
> +__svml_slog2_data_internal:
> +        /*== MinNorm ==*/
> +        .long 0x00800000, 0x00800000, 0x00800000, 0x00800000
> +        /*== MaxNorm ==*/
> +        .align 16
> +        .long 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff
> +        /*== iBrkValue = SP 2/3 ==*/
> +        .align 16
> +        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
> +        /*== iOffExpoMask = SP significand mask ==*/
> +        .align 16
> +        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
> +        /*== sOne = SP 1.0 ==*/
> +        .align 16
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== spoly[9] ==*/
> +        .align 16
> +        .long 0x3e554012, 0x3e554012, 0x3e554012, 0x3e554012 /* coeff9 */
> +        .long 0xbe638E14, 0xbe638E14, 0xbe638E14, 0xbe638E14 /* coeff8 */
> +        .long 0x3e4D660B, 0x3e4D660B, 0x3e4D660B, 0x3e4D660B /* coeff7 */
> +        .long 0xbe727824, 0xbe727824, 0xbe727824, 0xbe727824 /* coeff6 */
> +        .long 0x3e93DD07, 0x3e93DD07, 0x3e93DD07, 0x3e93DD07 /* coeff5 */
> +        .long 0xbeB8B969, 0xbeB8B969, 0xbeB8B969, 0xbeB8B969 /* coeff4 */
> +        .long 0x3eF637C0, 0x3eF637C0, 0x3eF637C0, 0x3eF637C0 /* coeff3 */
> +        .long 0xbf38AA2B, 0xbf38AA2B, 0xbf38AA2B, 0xbf38AA2B /* coeff2 */
> +        .long 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B /* coeff1 */
> +        .align 16
> +        .type	__svml_slog2_data_internal,@object
> +        .size	__svml_slog2_data_internal,.-__svml_slog2_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core-sse.S
> new file mode 100644
> index 0000000000..ec4b70568d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized log2f, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN8v_log2f _ZGVdN8v_log2f_sse_wrapper
> +#include "../svml_s_log2f8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core.c
> new file mode 100644
> index 0000000000..b3e958021a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized log2f, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN8v_log2f
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN8v_log2f, __GI__ZGVdN8v_log2f,
> +	       __redirect__ZGVdN8v_log2f)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core_avx2.S
> new file mode 100644
> index 0000000000..bc0cb5081a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log2f8_core_avx2.S
> @@ -0,0 +1,226 @@
> +/* Function log2f vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
> + *    R = Rcp*x - 1.0
> + *    log2(x) = k - log2(Rcp) + poly_approximation(R)
> + *       log2(Rcp) is tabulated
> + *
> + *
> + */
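> +
> +/*
> + * The reduction below centers the mantissa on 2/3 (iBrkValue), giving
> + * R = m - 1 in roughly [-1/3, 1/3).  The 9-term polynomial is then
> + * evaluated with FMAs pairing coefficients on R2 = R*R to shorten the
> + * dependency chain; an illustrative sketch (c1..c9 are the sPoly
> + * entries, n the exponent part):
> + *
> + *   t98 = c9 * R + c8;   t76 = c7 * R + c6;
> + *   t54 = c5 * R + c4;   t32 = c3 * R + c2;
> + *   p = ((t98 * R2 + t76) * R2 + t54) * R2 + t32;
> + *   log2f (x) ~= (p * R + c1) * R + n;
> + */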
> +
> +/* Offsets for data table __svml_slog2_data_internal
> + */
> +#define MinNorm                       	0
> +#define MaxNorm                       	32
> +#define iBrkValue                     	64
> +#define iOffExpoMask                  	96
> +#define One                           	128
> +#define sPoly                         	160
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN8v_log2f_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +
> +/* reduction: compute r,n */
> +        vmovups   iBrkValue+__svml_slog2_data_internal(%rip), %ymm4
> +        vmovups   sPoly+64+__svml_slog2_data_internal(%rip), %ymm9
> +        vmovups   sPoly+128+__svml_slog2_data_internal(%rip), %ymm10
> +        vmovups   sPoly+192+__svml_slog2_data_internal(%rip), %ymm12
> +        vpsubd    %ymm4, %ymm0, %ymm1
> +        vcmplt_oqps MinNorm+__svml_slog2_data_internal(%rip), %ymm0, %ymm5
> +        vcmpnle_uqps MaxNorm+__svml_slog2_data_internal(%rip), %ymm0, %ymm6
> +        vpand     iOffExpoMask+__svml_slog2_data_internal(%rip), %ymm1, %ymm3
> +        vpsrad    $23, %ymm1, %ymm2
> +        vmovups   sPoly+__svml_slog2_data_internal(%rip), %ymm1
> +        vpaddd    %ymm4, %ymm3, %ymm8
> +        vcvtdq2ps %ymm2, %ymm14
> +        vsubps    One+__svml_slog2_data_internal(%rip), %ymm8, %ymm13
> +        vfmadd213ps sPoly+32+__svml_slog2_data_internal(%rip), %ymm13, %ymm1
> +        vfmadd213ps sPoly+96+__svml_slog2_data_internal(%rip), %ymm13, %ymm9
> +        vmulps    %ymm13, %ymm13, %ymm11
> +        vfmadd213ps sPoly+160+__svml_slog2_data_internal(%rip), %ymm13, %ymm10
> +        vfmadd213ps sPoly+224+__svml_slog2_data_internal(%rip), %ymm13, %ymm12
> +        vfmadd213ps %ymm9, %ymm11, %ymm1
> +        vfmadd213ps %ymm10, %ymm11, %ymm1
> +        vfmadd213ps %ymm12, %ymm11, %ymm1
> +        vfmadd213ps sPoly+256+__svml_slog2_data_internal(%rip), %ymm13, %ymm1
> +        vorps     %ymm6, %ymm5, %ymm7
> +
> +/* combine and get argument value range mask */
> +        vmovmskps %ymm7, %edx
> +        vfmadd213ps %ymm14, %ymm13, %ymm1
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm1
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        vmovaps   %ymm1, %ymm0
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %ymm0, 32(%rsp)
> +        vmovups   %ymm1, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm1
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   64(%rsp), %ymm1
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm1
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      log2f@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 64(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN8v_log2f_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_slog2_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(32)) VUINT32 MinNorm[8][1];
> +        __declspec(align(32)) VUINT32 MaxNorm[8][1];
> +        __declspec(align(32)) VUINT32 iBrkValue[8][1];
> +        __declspec(align(32)) VUINT32 iOffExpoMask[8][1];
> +        __declspec(align(32)) VUINT32 One[8][1];
> +        __declspec(align(32)) VUINT32 sPoly[9][8][1];
> +} __svml_slog2_data_internal;
> +#endif
> +__svml_slog2_data_internal:
> +        /*== MinNorm ==*/
> +        .long 0x00800000, 0x00800000, 0x00800000, 0x00800000, 0x00800000, 0x00800000, 0x00800000, 0x00800000
> +        /*== MaxNorm ==*/
> +        .align 32
> +        .long 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff
> +        /*== iBrkValue = SP 2/3 ==*/
> +        .align 32
> +        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
> +        /*== iOffExpoMask = SP significand mask ==*/
> +        .align 32
> +        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
> +        /*== sOne = SP 1.0 ==*/
> +        .align 32
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== spoly[9] ==*/
> +        .align 32
> +        .long 0x3e554012, 0x3e554012, 0x3e554012, 0x3e554012, 0x3e554012, 0x3e554012, 0x3e554012, 0x3e554012 /* coeff9 */
> +        .long 0xbe638E14, 0xbe638E14, 0xbe638E14, 0xbe638E14, 0xbe638E14, 0xbe638E14, 0xbe638E14, 0xbe638E14 /* coeff8 */
> +        .long 0x3e4D660B, 0x3e4D660B, 0x3e4D660B, 0x3e4D660B, 0x3e4D660B, 0x3e4D660B, 0x3e4D660B, 0x3e4D660B /* coeff7 */
> +        .long 0xbe727824, 0xbe727824, 0xbe727824, 0xbe727824, 0xbe727824, 0xbe727824, 0xbe727824, 0xbe727824 /* coeff6 */
> +        .long 0x3e93DD07, 0x3e93DD07, 0x3e93DD07, 0x3e93DD07, 0x3e93DD07, 0x3e93DD07, 0x3e93DD07, 0x3e93DD07 /* coeff5 */
> +        .long 0xbeB8B969, 0xbeB8B969, 0xbeB8B969, 0xbeB8B969, 0xbeB8B969, 0xbeB8B969, 0xbeB8B969, 0xbeB8B969 /* coeff4 */
> +        .long 0x3eF637C0, 0x3eF637C0, 0x3eF637C0, 0x3eF637C0, 0x3eF637C0, 0x3eF637C0, 0x3eF637C0, 0x3eF637C0 /* coeff3 */
> +        .long 0xbf38AA2B, 0xbf38AA2B, 0xbf38AA2B, 0xbf38AA2B, 0xbf38AA2B, 0xbf38AA2B, 0xbf38AA2B, 0xbf38AA2B /* coeff2 */
> +        .long 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B, 0x3fB8AA3B /* coeff1 */
> +        .align 32
> +        .type	__svml_slog2_data_internal,@object
> +        .size	__svml_slog2_data_internal,.-__svml_slog2_data_internal
> diff --git a/sysdeps/x86_64/fpu/svml_d_log22_core.S b/sysdeps/x86_64/fpu/svml_d_log22_core.S
> new file mode 100644
> index 0000000000..f181a62c7d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_log22_core.S
> @@ -0,0 +1,29 @@
> +/* Function log2 vectorized with SSE2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN2v_log2)
> +WRAPPER_IMPL_SSE2 log2
> +END (_ZGVbN2v_log2)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN2v_log2)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_log24_core.S b/sysdeps/x86_64/fpu/svml_d_log24_core.S
> new file mode 100644
> index 0000000000..b0a5aa9532
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_log24_core.S
> @@ -0,0 +1,29 @@
> +/* Function log2 vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN4v_log2)
> +WRAPPER_IMPL_AVX _ZGVbN2v_log2
> +END (_ZGVdN4v_log2)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN4v_log2)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_log24_core_avx.S b/sysdeps/x86_64/fpu/svml_d_log24_core_avx.S
> new file mode 100644
> index 0000000000..9a56cfed61
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_log24_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function log2 vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVcN4v_log2)
> +WRAPPER_IMPL_AVX _ZGVbN2v_log2
> +END (_ZGVcN4v_log2)
> diff --git a/sysdeps/x86_64/fpu/svml_d_log28_core.S b/sysdeps/x86_64/fpu/svml_d_log28_core.S
> new file mode 100644
> index 0000000000..443cbfd578
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_log28_core.S
> @@ -0,0 +1,25 @@
> +/* Function log2 vectorized with AVX-512, wrapper to AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN8v_log2)
> +WRAPPER_IMPL_AVX512 _ZGVdN4v_log2
> +END (_ZGVeN8v_log2)
> diff --git a/sysdeps/x86_64/fpu/svml_s_log2f16_core.S b/sysdeps/x86_64/fpu/svml_s_log2f16_core.S
> new file mode 100644
> index 0000000000..6cf265fd33
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_log2f16_core.S
> @@ -0,0 +1,25 @@
> +/* Function log2f vectorized with AVX-512. Wrapper to AVX2 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN16v_log2f)
> +WRAPPER_IMPL_AVX512 _ZGVdN8v_log2f
> +END (_ZGVeN16v_log2f)
> diff --git a/sysdeps/x86_64/fpu/svml_s_log2f4_core.S b/sysdeps/x86_64/fpu/svml_s_log2f4_core.S
> new file mode 100644
> index 0000000000..024ba9b8c5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_log2f4_core.S
> @@ -0,0 +1,29 @@
> +/* Function log2f vectorized with SSE2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN4v_log2f)
> +WRAPPER_IMPL_SSE2 log2f
> +END (_ZGVbN4v_log2f)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN4v_log2f)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_log2f8_core.S b/sysdeps/x86_64/fpu/svml_s_log2f8_core.S
> new file mode 100644
> index 0000000000..5705590563
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_log2f8_core.S
> @@ -0,0 +1,29 @@
> +/* Function log2f vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN8v_log2f)
> +WRAPPER_IMPL_AVX _ZGVbN4v_log2f
> +END (_ZGVdN8v_log2f)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN8v_log2f)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_log2f8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_log2f8_core_avx.S
> new file mode 100644
> index 0000000000..38602c475e
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_log2f8_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function log2f vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +        .text
> +ENTRY (_ZGVcN8v_log2f)
> +WRAPPER_IMPL_AVX _ZGVbN4v_log2f
> +END (_ZGVcN8v_log2f)
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log2-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-log2-avx.c
> new file mode 100644
> index 0000000000..95d8e4bbd8
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-log2-avx.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-log2.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log2-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-log2-avx2.c
> new file mode 100644
> index 0000000000..95d8e4bbd8
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-log2-avx2.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-log2.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log2-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-log2-avx512f.c
> new file mode 100644
> index 0000000000..95d8e4bbd8
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-log2-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-log2.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log2.c b/sysdeps/x86_64/fpu/test-double-libmvec-log2.c
> new file mode 100644
> index 0000000000..326b6f1171
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-log2.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE double
> +#define LIBMVEC_FUNC log2
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> index 3dce136dfc..08c91ff634 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> @@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVbN2v_sinh)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVbN2v_cbrt)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVbN2vv_atan2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVbN2v_log10)
> +VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVbN2v_log2)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> index 1852625897..a2fb0de309 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> @@ -42,6 +42,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVdN4v_sinh)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVdN4v_cbrt)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVdN4vv_atan2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVdN4v_log10)
> +VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVdN4v_log2)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m256i
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> index cf9ea35ffe..dc65a4ee25 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> @@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVcN4v_sinh)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVcN4v_cbrt)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVcN4vv_atan2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVcN4v_log10)
> +VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVcN4v_log2)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> index b6457ea032..253ee8c906 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> @@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVeN8v_sinh)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVeN8v_cbrt)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVeN8vv_atan2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVeN8v_log10)
> +VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVeN8v_log2)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m512i
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx.c
> new file mode 100644
> index 0000000000..c88b3fc5a9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-log2f.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx2.c
> new file mode 100644
> index 0000000000..c88b3fc5a9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx2.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-log2f.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx512f.c
> new file mode 100644
> index 0000000000..c88b3fc5a9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-log2f-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-log2f.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log2f.c b/sysdeps/x86_64/fpu/test-float-libmvec-log2f.c
> new file mode 100644
> index 0000000000..afba03d1e2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-log2f.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE float
> +#define LIBMVEC_FUNC log2f
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> index 272e754e1b..1c7db5146c 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> @@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVeN16v_sinhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVeN16v_cbrtf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVeN16vv_atan2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVeN16v_log10f)
> +VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVeN16v_log2f)
>  
>  #define VEC_INT_TYPE __m512i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> index b892258b99..8ec51603b3 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> @@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVbN4v_sinhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVbN4v_cbrtf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVbN4vv_atan2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVbN4v_log10f)
> +VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVbN4v_log2f)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> index 1c6ead71e1..1cb4553c7a 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> @@ -42,6 +42,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVdN8v_sinhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVdN8v_cbrtf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVdN8vv_atan2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVdN8v_log10f)
> +VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVdN8v_log2f)
>  
>  /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
>  #undef VECTOR_WRAPPER_fFF
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> index 71f5d8d7b6..6ecc1792bb 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> @@ -39,6 +39,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVcN8v_sinhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVcN8v_cbrtf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVcN8vv_atan2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVcN8v_log10f)
> +VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVcN8v_log2f)
>  
>  #define VEC_INT_TYPE __m128i
>  
> -- 
> 2.31.1
> 

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.


H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 10/18] x86-64: Add vector atan2/atan2f implementation to libmvec
  2021-12-29  6:39 ` [PATCH v5 10/18] x86-64: Add vector atan2/atan2f " Sunil K Pandey
@ 2021-12-29 21:26   ` H.J. Lu
  0 siblings, 0 replies; 40+ messages in thread
From: H.J. Lu @ 2021-12-29 21:26 UTC (permalink / raw)
  To: Sunil K Pandey; +Cc: libc-alpha, hjl.tools, andrey.kolesov, marius.cornea

On Tue, Dec 28, 2021 at 10:39:52PM -0800, Sunil K Pandey wrote:
> Implement vectorized atan2/atan2f containing SSE, AVX, AVX2 and
> AVX512 versions for libmvec as per vector ABI.  It also contains
> accuracy and ABI tests for vector atan2/atan2f with regenerated ulps.
> ---
>  bits/libm-simd-decl-stubs.h                   |  11 +
>  math/bits/mathcalls.h                         |   2 +-
>  .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
>  sysdeps/x86/fpu/bits/math-vector.h            |   4 +
>  .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
>  sysdeps/x86_64/fpu/Makeconfig                 |   1 +
>  sysdeps/x86_64/fpu/Versions                   |   2 +
>  sysdeps/x86_64/fpu/libm-test-ulps             |  20 +
>  .../fpu/multiarch/svml_d_atan22_core-sse2.S   |  20 +
>  .../x86_64/fpu/multiarch/svml_d_atan22_core.c |  28 ++
>  .../fpu/multiarch/svml_d_atan22_core_sse4.S   | 471 +++++++++++++++++
>  .../fpu/multiarch/svml_d_atan24_core-sse.S    |  20 +
>  .../x86_64/fpu/multiarch/svml_d_atan24_core.c |  28 ++
>  .../fpu/multiarch/svml_d_atan24_core_avx2.S   | 451 +++++++++++++++++
>  .../fpu/multiarch/svml_d_atan28_core-avx2.S   |  20 +
>  .../x86_64/fpu/multiarch/svml_d_atan28_core.c |  28 ++
>  .../fpu/multiarch/svml_d_atan28_core_avx512.S | 475 ++++++++++++++++++
>  .../fpu/multiarch/svml_s_atan2f16_core-avx2.S |  20 +
>  .../fpu/multiarch/svml_s_atan2f16_core.c      |  28 ++
>  .../multiarch/svml_s_atan2f16_core_avx512.S   | 399 +++++++++++++++
>  .../fpu/multiarch/svml_s_atan2f4_core-sse2.S  |  20 +
>  .../fpu/multiarch/svml_s_atan2f4_core.c       |  28 ++
>  .../fpu/multiarch/svml_s_atan2f4_core_sse4.S  | 384 ++++++++++++++
>  .../fpu/multiarch/svml_s_atan2f8_core-sse.S   |  20 +
>  .../fpu/multiarch/svml_s_atan2f8_core.c       |  28 ++
>  .../fpu/multiarch/svml_s_atan2f8_core_avx2.S  | 362 +++++++++++++
>  sysdeps/x86_64/fpu/svml_d_atan22_core.S       |  29 ++
>  sysdeps/x86_64/fpu/svml_d_atan24_core.S       |  29 ++
>  sysdeps/x86_64/fpu/svml_d_atan24_core_avx.S   |  25 +
>  sysdeps/x86_64/fpu/svml_d_atan28_core.S       |  25 +
>  sysdeps/x86_64/fpu/svml_s_atan2f16_core.S     |  25 +
>  sysdeps/x86_64/fpu/svml_s_atan2f4_core.S      |  29 ++
>  sysdeps/x86_64/fpu/svml_s_atan2f8_core.S      |  29 ++
>  sysdeps/x86_64/fpu/svml_s_atan2f8_core_avx.S  |  25 +
>  .../fpu/test-double-libmvec-atan2-avx.c       |   1 +
>  .../fpu/test-double-libmvec-atan2-avx2.c      |   1 +
>  .../fpu/test-double-libmvec-atan2-avx512f.c   |   1 +
>  .../x86_64/fpu/test-double-libmvec-atan2.c    |   3 +
>  .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
>  .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
>  .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
>  .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
>  .../fpu/test-float-libmvec-atan2f-avx.c       |   1 +
>  .../fpu/test-float-libmvec-atan2f-avx2.c      |   1 +
>  .../fpu/test-float-libmvec-atan2f-avx512f.c   |   1 +
>  .../x86_64/fpu/test-float-libmvec-atan2f.c    |   3 +
>  .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
>  .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
>  .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
>  .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
>  50 files changed, 3117 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_atan22_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_atan24_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_atan24_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_atan28_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_atan2f16_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_atan2f4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_atan2f8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_atan2f8_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-atan2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-atan2f.c
> 
> diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
> index 7f1304ed1d..31878bf4ed 100644
> --- a/bits/libm-simd-decl-stubs.h
> +++ b/bits/libm-simd-decl-stubs.h
> @@ -208,4 +208,15 @@
>  #define __DECL_SIMD_cbrtf32x
>  #define __DECL_SIMD_cbrtf64x
>  #define __DECL_SIMD_cbrtf128x
> +
> +#define __DECL_SIMD_atan2
> +#define __DECL_SIMD_atan2f
> +#define __DECL_SIMD_atan2l
> +#define __DECL_SIMD_atan2f16
> +#define __DECL_SIMD_atan2f32
> +#define __DECL_SIMD_atan2f64
> +#define __DECL_SIMD_atan2f128
> +#define __DECL_SIMD_atan2f32x
> +#define __DECL_SIMD_atan2f64x
> +#define __DECL_SIMD_atan2f128x
>  #endif
> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index 26d18f0135..1bd4911993 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -56,7 +56,7 @@ __MATHCALL_VEC (asin,, (_Mdouble_ __x));
>  /* Arc tangent of X.  */
>  __MATHCALL_VEC (atan,, (_Mdouble_ __x));
>  /* Arc tangent of Y/X.  */
> -__MATHCALL (atan2,, (_Mdouble_ __y, _Mdouble_ __x));
> +__MATHCALL_VEC (atan2,, (_Mdouble_ __y, _Mdouble_ __x));
>  
>  /* Cosine of X.  */
>  __MATHCALL_VEC (cos,, (_Mdouble_ __x));
> diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> index a6558d9810..2b3b8d3886 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> @@ -55,6 +55,7 @@ GLIBC_2.35 _ZGVbN2v_exp10 F
>  GLIBC_2.35 _ZGVbN2v_exp2 F
>  GLIBC_2.35 _ZGVbN2v_expm1 F
>  GLIBC_2.35 _ZGVbN2v_sinh F
> +GLIBC_2.35 _ZGVbN2vv_atan2 F
>  GLIBC_2.35 _ZGVbN2vv_hypot F
>  GLIBC_2.35 _ZGVbN4v_acosf F
>  GLIBC_2.35 _ZGVbN4v_asinf F
> @@ -65,6 +66,7 @@ GLIBC_2.35 _ZGVbN4v_exp10f F
>  GLIBC_2.35 _ZGVbN4v_exp2f F
>  GLIBC_2.35 _ZGVbN4v_expm1f F
>  GLIBC_2.35 _ZGVbN4v_sinhf F
> +GLIBC_2.35 _ZGVbN4vv_atan2f F
>  GLIBC_2.35 _ZGVbN4vv_hypotf F
>  GLIBC_2.35 _ZGVcN4v_acos F
>  GLIBC_2.35 _ZGVcN4v_asin F
> @@ -75,6 +77,7 @@ GLIBC_2.35 _ZGVcN4v_exp10 F
>  GLIBC_2.35 _ZGVcN4v_exp2 F
>  GLIBC_2.35 _ZGVcN4v_expm1 F
>  GLIBC_2.35 _ZGVcN4v_sinh F
> +GLIBC_2.35 _ZGVcN4vv_atan2 F
>  GLIBC_2.35 _ZGVcN4vv_hypot F
>  GLIBC_2.35 _ZGVcN8v_acosf F
>  GLIBC_2.35 _ZGVcN8v_asinf F
> @@ -85,6 +88,7 @@ GLIBC_2.35 _ZGVcN8v_exp10f F
>  GLIBC_2.35 _ZGVcN8v_exp2f F
>  GLIBC_2.35 _ZGVcN8v_expm1f F
>  GLIBC_2.35 _ZGVcN8v_sinhf F
> +GLIBC_2.35 _ZGVcN8vv_atan2f F
>  GLIBC_2.35 _ZGVcN8vv_hypotf F
>  GLIBC_2.35 _ZGVdN4v_acos F
>  GLIBC_2.35 _ZGVdN4v_asin F
> @@ -95,6 +99,7 @@ GLIBC_2.35 _ZGVdN4v_exp10 F
>  GLIBC_2.35 _ZGVdN4v_exp2 F
>  GLIBC_2.35 _ZGVdN4v_expm1 F
>  GLIBC_2.35 _ZGVdN4v_sinh F
> +GLIBC_2.35 _ZGVdN4vv_atan2 F
>  GLIBC_2.35 _ZGVdN4vv_hypot F
>  GLIBC_2.35 _ZGVdN8v_acosf F
>  GLIBC_2.35 _ZGVdN8v_asinf F
> @@ -105,6 +110,7 @@ GLIBC_2.35 _ZGVdN8v_exp10f F
>  GLIBC_2.35 _ZGVdN8v_exp2f F
>  GLIBC_2.35 _ZGVdN8v_expm1f F
>  GLIBC_2.35 _ZGVdN8v_sinhf F
> +GLIBC_2.35 _ZGVdN8vv_atan2f F
>  GLIBC_2.35 _ZGVdN8vv_hypotf F
>  GLIBC_2.35 _ZGVeN16v_acosf F
>  GLIBC_2.35 _ZGVeN16v_asinf F
> @@ -115,6 +121,7 @@ GLIBC_2.35 _ZGVeN16v_exp10f F
>  GLIBC_2.35 _ZGVeN16v_exp2f F
>  GLIBC_2.35 _ZGVeN16v_expm1f F
>  GLIBC_2.35 _ZGVeN16v_sinhf F
> +GLIBC_2.35 _ZGVeN16vv_atan2f F
>  GLIBC_2.35 _ZGVeN16vv_hypotf F
>  GLIBC_2.35 _ZGVeN8v_acos F
>  GLIBC_2.35 _ZGVeN8v_asin F
> @@ -125,4 +132,5 @@ GLIBC_2.35 _ZGVeN8v_exp10 F
>  GLIBC_2.35 _ZGVeN8v_exp2 F
>  GLIBC_2.35 _ZGVeN8v_expm1 F
>  GLIBC_2.35 _ZGVeN8v_sinh F
> +GLIBC_2.35 _ZGVeN8vv_atan2 F
>  GLIBC_2.35 _ZGVeN8vv_hypot F
> diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
> index dcd45934ab..62f2890ab3 100644
> --- a/sysdeps/x86/fpu/bits/math-vector.h
> +++ b/sysdeps/x86/fpu/bits/math-vector.h
> @@ -98,6 +98,10 @@
>  #  define __DECL_SIMD_cbrt __DECL_SIMD_x86_64
>  #  undef __DECL_SIMD_cbrtf
>  #  define __DECL_SIMD_cbrtf __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_atan2
> +#  define __DECL_SIMD_atan2 __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_atan2f
> +#  define __DECL_SIMD_atan2f __DECL_SIMD_x86_64
>  
>  # endif
>  #endif
> diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> index dfb5f13ea3..2269b74d50 100644
> --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> @@ -48,6 +48,8 @@
>  !GCC$ builtin (sinhf) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (cbrt) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (cbrtf) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (atan2) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (atan2f) attributes simd (notinbranch) if('x86_64')
>  
>  !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
> @@ -81,3 +83,5 @@
>  !GCC$ builtin (sinhf) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cbrt) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cbrtf) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (atan2) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (atan2f) attributes simd (notinbranch) if('x32')
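With the __DECL_SIMD_atan2* stubs and the Fortran builtin declarations above
in place, the compiler can map ordinary atan2 calls in a vectorizable loop
onto the new libmvec entry points.  For illustration (a sketch, not part of
the patch), assuming GCC with -O2 -ffast-math, or -fopenmp-simd so that the
pragma below takes effect:

#include <math.h>

/* GCC may emit calls to _ZGVdN4vv_atan2 / _ZGVeN8vv_atan2 (depending on
   the selected ISA) for this loop instead of scalar atan2.  */
void
compute_angles (const double *y, const double *x, double *phi, int n)
{
#pragma omp simd
  for (int i = 0; i < n; i++)
    phi[i] = atan2 (y[i], x[i]);
}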
> diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> index dde737c0d6..96a40856fa 100644
> --- a/sysdeps/x86_64/fpu/Makeconfig
> +++ b/sysdeps/x86_64/fpu/Makeconfig
> @@ -25,6 +25,7 @@ libmvec-funcs = \
>    acos \
>    asin \
>    atan \
> +  atan2 \
>    cbrt \
>    cos \
>    cosh \
> diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> index b70aeb3e2f..f58c98eb45 100644
> --- a/sysdeps/x86_64/fpu/Versions
> +++ b/sysdeps/x86_64/fpu/Versions
> @@ -23,6 +23,7 @@ libmvec {
>      _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
>      _ZGVbN2v_expm1; _ZGVcN4v_expm1; _ZGVdN4v_expm1; _ZGVeN8v_expm1;
>      _ZGVbN2v_sinh; _ZGVcN4v_sinh; _ZGVdN4v_sinh; _ZGVeN8v_sinh;
> +    _ZGVbN2vv_atan2; _ZGVcN4vv_atan2; _ZGVdN4vv_atan2; _ZGVeN8vv_atan2;
>      _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
>      _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
>      _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
> @@ -33,6 +34,7 @@ libmvec {
>      _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
>      _ZGVbN4v_expm1f; _ZGVcN8v_expm1f; _ZGVdN8v_expm1f; _ZGVeN16v_expm1f;
>      _ZGVbN4v_sinhf; _ZGVcN8v_sinhf; _ZGVdN8v_sinhf; _ZGVeN16v_sinhf;
> +    _ZGVbN4vv_atan2f; _ZGVcN8vv_atan2f; _ZGVdN8vv_atan2f; _ZGVeN16vv_atan2f;
>      _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf;
>    }
>  }
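The exported names follow the x86_64 vector function ABI mangling: _ZGV, an
ISA letter (b = SSE, c = AVX, d = AVX2, e = AVX-512), N for the unmasked
variant, the lane count, and one 'v' per vector argument.  For illustration
(a hand-written sketch, not part of the patch), the 2-lane double variant
can also be called directly when linking with -lmvec:

#include <immintrin.h>

/* Prototype written by hand for illustration; the signature comes from
   the vector ABI, there is no public header declaring it.  */
extern __m128d _ZGVbN2vv_atan2 (__m128d y, __m128d x);

int
main (void)
{
  __m128d y = _mm_set_pd (1.0, -1.0);
  __m128d x = _mm_set_pd (-1.0, 1.0);
  __m128d r = _ZGVbN2vv_atan2 (y, x);	/* Two atan2 results per call.  */
  (void) r;
  return 0;
}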
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index e039a993df..6f59c61756 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -166,6 +166,26 @@ float: 2
>  float128: 2
>  ldouble: 1
>  
> +Function: "atan2_vlen16":
> +float: 2
> +
> +Function: "atan2_vlen2":
> +double: 1
> +
> +Function: "atan2_vlen4":
> +double: 1
> +float: 2
> +
> +Function: "atan2_vlen4_avx2":
> +double: 1
> +
> +Function: "atan2_vlen8":
> +double: 1
> +float: 2
> +
> +Function: "atan2_vlen8_avx2":
> +float: 2
> +
>  Function: "atan_downward":
>  double: 1
>  float: 2
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core-sse2.S
> new file mode 100644
> index 0000000000..6c3ad05a6c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized atan2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN2vv_atan2 _ZGVbN2vv_atan2_sse2
> +#include "../svml_d_atan22_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core.c
> new file mode 100644
> index 0000000000..43f1ee7f33
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized atan2, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN2vv_atan2
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN2vv_atan2, __GI__ZGVbN2vv_atan2,
> +	       __redirect__ZGVbN2vv_atan2)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core_sse4.S
> new file mode 100644
> index 0000000000..5c0d0fd17f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan22_core_sse4.S
> @@ -0,0 +1,471 @@
> +/* Function atan2 vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
> + *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
> + *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
> + *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
> + *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
> + *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
> + *
> + *
> + */
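A scalar model may help when reading the kernel below: the code divides the
smaller of |x|, |y| by the larger, evaluates an odd polynomial in the reduced
argument, then adds pi/2 for the swapped case and pi for x < 0, and finally
applies the sign of y.  The sketch below is illustrative only; it truncates
the polynomial to the leading arctangent series terms and omits the zero/NaN
callout path that the real code handles separately:

#include <math.h>

/* Illustrative scalar model of the vector flow; the real kernel evaluates
   the degree-19 polynomial dA00..dA19 from the data table.  */
static double
atan2_model (double y, double x)
{
  double ay = fabs (y), ax = fabs (x);
  int swap = ay >= ax;                  /* |y| >= |x|: swap and add pi/2.  */
  double s = swap ? ax / ay : ay / ax;  /* Reduced argument in [0, 1].  */
  double s2 = s * s;
  /* Truncated stand-in for s + s^3*Poly(s^2).  */
  double p = s * (1.0 + s2 * (-1.0 / 3.0
                              + s2 * (1.0 / 5.0 + s2 * (-1.0 / 7.0))));
  double r = swap ? M_PI_2 - p : p;     /* dPIO2 term.  */
  if (x < 0.0)
    r = M_PI - r;                       /* dPI term for x < 0.  */
  return copysign (r, y);               /* Result takes the sign of y.  */
}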
> +
> +/* Offsets for data table __svml_datan2_data_internal
> + */
> +#define dPI                           	0
> +#define dPIO2                         	16
> +#define dA19                          	32
> +#define dA18                          	48
> +#define dA17                          	64
> +#define dA16                          	80
> +#define dA15                          	96
> +#define dA14                          	112
> +#define dA13                          	128
> +#define dA12                          	144
> +#define dA11                          	160
> +#define dA10                          	176
> +#define dA09                          	192
> +#define dA08                          	208
> +#define dA07                          	224
> +#define dA06                          	240
> +#define dA05                          	256
> +#define dA04                          	272
> +#define dA03                          	288
> +#define dA02                          	304
> +#define dA01                          	320
> +#define dA00                          	336
> +#define dSIGN_MASK                    	352
> +#define iCHK_WORK_SUB                 	368
> +#define iCHK_WORK_CMP                 	384
> +#define dABS_MASK                     	400
> +#define dZERO                         	416
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN2vv_atan2_sse4)
> +        subq      $88, %rsp
> +        cfi_def_cfa_offset(96)
> +        movaps    %xmm0, %xmm8
> +
> +/*
> + * #define NO_VECTOR_ZERO_ATAN2_ARGS
> + *  Declarations
> + * Variables
> + * Constants
> + *  The end of declarations
> + *  Implementation
> + * Get r0~=1/B
> + * Cannot be replaced by VQRCP(D, dR0, dB);
> + * Argument Absolute values
> + */
> +        movups    dABS_MASK+__svml_datan2_data_internal(%rip), %xmm4
> +        movaps    %xmm1, %xmm9
> +        movaps    %xmm4, %xmm1
> +        andps     %xmm8, %xmm4
> +        andps     %xmm9, %xmm1
> +        movaps    %xmm4, %xmm2
> +        cmpnltpd  %xmm1, %xmm2
> +
> +/* Argument signs */
> +        movups    dSIGN_MASK+__svml_datan2_data_internal(%rip), %xmm3
> +        movaps    %xmm2, %xmm0
> +        movups    dPIO2+__svml_datan2_data_internal(%rip), %xmm5
> +        movaps    %xmm3, %xmm7
> +        movaps    %xmm3, %xmm6
> +
> +/*
> + * 1) If y<x then a= y, b=x, PIO2=0
> + * 2) If y>x then a=-x, b=y, PIO2=Pi/2
> + */
> +        orps      %xmm1, %xmm3
> +        movaps    %xmm2, %xmm10
> +        andps     %xmm2, %xmm5
> +        andnps    %xmm4, %xmm0
> +        andps     %xmm2, %xmm3
> +        andnps    %xmm1, %xmm10
> +        andps     %xmm4, %xmm2
> +        orps      %xmm3, %xmm0
> +        orps      %xmm2, %xmm10
> +        divpd     %xmm10, %xmm0
> +        movq      iCHK_WORK_SUB+__svml_datan2_data_internal(%rip), %xmm11
> +
> +/* if x<0, dPI = Pi, else dPI =0 */
> +        movaps    %xmm9, %xmm3
> +
> +/* Check if y and x are on main path. */
> +        pshufd    $221, %xmm1, %xmm12
> +        andps     %xmm9, %xmm7
> +        psubd     %xmm11, %xmm12
> +        andps     %xmm8, %xmm6
> +        movq      iCHK_WORK_CMP+__svml_datan2_data_internal(%rip), %xmm13
> +        xorl      %edx, %edx
> +        movups    %xmm4, 16(%rsp)
> +        xorl      %eax, %eax
> +        pshufd    $221, %xmm4, %xmm14
> +        movdqa    %xmm12, %xmm4
> +        pcmpgtd   %xmm13, %xmm4
> +        pcmpeqd   %xmm13, %xmm12
> +        por       %xmm12, %xmm4
> +
> +/* Polynomial. */
> +        movaps    %xmm0, %xmm12
> +        mulpd     %xmm0, %xmm12
> +        cmplepd   dZERO+__svml_datan2_data_internal(%rip), %xmm3
> +        psubd     %xmm11, %xmm14
> +        movdqa    %xmm14, %xmm15
> +        pcmpeqd   %xmm13, %xmm14
> +        pcmpgtd   %xmm13, %xmm15
> +        por       %xmm14, %xmm15
> +        movaps    %xmm12, %xmm14
> +        mulpd     %xmm12, %xmm14
> +        por       %xmm15, %xmm4
> +        movaps    %xmm14, %xmm15
> +        mulpd     %xmm14, %xmm15
> +        movmskps  %xmm4, %ecx
> +        movups    %xmm10, (%rsp)
> +        movups    dA19+__svml_datan2_data_internal(%rip), %xmm10
> +        mulpd     %xmm15, %xmm10
> +        movups    dA18+__svml_datan2_data_internal(%rip), %xmm13
> +        movups    dA17+__svml_datan2_data_internal(%rip), %xmm11
> +        addpd     dA15+__svml_datan2_data_internal(%rip), %xmm10
> +        mulpd     %xmm15, %xmm13
> +        mulpd     %xmm15, %xmm11
> +        mulpd     %xmm15, %xmm10
> +        addpd     dA14+__svml_datan2_data_internal(%rip), %xmm13
> +        addpd     dA13+__svml_datan2_data_internal(%rip), %xmm11
> +        addpd     dA11+__svml_datan2_data_internal(%rip), %xmm10
> +        mulpd     %xmm15, %xmm13
> +        mulpd     %xmm15, %xmm11
> +        mulpd     %xmm15, %xmm10
> +        addpd     dA10+__svml_datan2_data_internal(%rip), %xmm13
> +        addpd     dA09+__svml_datan2_data_internal(%rip), %xmm11
> +        addpd     dA07+__svml_datan2_data_internal(%rip), %xmm10
> +        mulpd     %xmm15, %xmm13
> +        mulpd     %xmm15, %xmm11
> +        mulpd     %xmm15, %xmm10
> +        addpd     dA06+__svml_datan2_data_internal(%rip), %xmm13
> +        addpd     dA05+__svml_datan2_data_internal(%rip), %xmm11
> +        addpd     dA03+__svml_datan2_data_internal(%rip), %xmm10
> +        mulpd     %xmm15, %xmm13
> +        mulpd     %xmm15, %xmm11
> +        mulpd     %xmm12, %xmm10
> +        addpd     dA02+__svml_datan2_data_internal(%rip), %xmm13
> +        addpd     dA01+__svml_datan2_data_internal(%rip), %xmm11
> +        addpd     %xmm10, %xmm13
> +        mulpd     %xmm11, %xmm12
> +        mulpd     %xmm13, %xmm14
> +        movups    dA16+__svml_datan2_data_internal(%rip), %xmm2
> +        mulpd     %xmm15, %xmm2
> +        addpd     dA12+__svml_datan2_data_internal(%rip), %xmm2
> +        mulpd     %xmm15, %xmm2
> +        addpd     dA08+__svml_datan2_data_internal(%rip), %xmm2
> +        mulpd     %xmm15, %xmm2
> +        addpd     dA04+__svml_datan2_data_internal(%rip), %xmm2
> +
> +/* A00=1.0, account for it later  VQFMA(D, dP4, dP4, dR8, dA00); */
> +        mulpd     %xmm2, %xmm15
> +        addpd     %xmm12, %xmm15
> +        addpd     %xmm14, %xmm15
> +
> +/*
> + * Reconstruction.
> + * dP=(R+R*dP) + dPIO2
> + */
> +        mulpd     %xmm0, %xmm15
> +        addpd     %xmm15, %xmm0
> +        addpd     %xmm5, %xmm0
> +        andps     __svml_datan2_data_internal(%rip), %xmm3
> +        orps      %xmm7, %xmm0
> +        addpd     %xmm3, %xmm0
> +
> +/*  Special branch for fast (vector) processing of zero arguments  */
> +        movups    16(%rsp), %xmm11
> +        orps      %xmm6, %xmm0
> +        testb     $3, %cl
> +
> +/* Go to auxiliary branch */
> +        jne       L(AUX_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 eax edx xmm0 xmm1 xmm3 xmm4 xmm5 xmm6 xmm7 xmm8 xmm9 xmm11
> +
> +/* Return from auxiliary branch
> + * for out of main path inputs
> + */
> +
> +L(AUX_BRANCH_RETURN):
> +/*
> + *  Special branch for fast (vector) processing of zero arguments
> + *  The end of implementation
> + */
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 eax edx xmm0 xmm8 xmm9
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        addq      $88, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(96)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm8, 32(%rsp)
> +        movups    %xmm9, 48(%rsp)
> +        movups    %xmm0, 64(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 eax edx xmm0
> +
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -80)
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -88)
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -96)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $2, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    64(%rsp), %xmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -80)
> +        cfi_offset(13, -88)
> +        cfi_offset(14, -96)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        movsd     48(%rsp,%r14,8), %xmm1
> +        call      atan2@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 64(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +        cfi_restore(12)
> +        cfi_restore(13)
> +        cfi_restore(14)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Auxiliary branch
> + * for out of main path inputs
> + */
> +
> +L(AUX_BRANCH):
> +/* Check if at least one of X or Y is zero: iAXAYZERO */
> +        movups    dZERO+__svml_datan2_data_internal(%rip), %xmm2
> +
> +/* Check if both X & Y are not NaNs:  iXYnotNAN */
> +        movaps    %xmm9, %xmm12
> +        movaps    %xmm8, %xmm10
> +        cmpordpd  %xmm9, %xmm12
> +        cmpordpd  %xmm8, %xmm10
> +        cmpeqpd   %xmm2, %xmm1
> +        cmpeqpd   %xmm2, %xmm11
> +        andps     %xmm10, %xmm12
> +        orps      %xmm11, %xmm1
> +        pshufd    $221, %xmm1, %xmm1
> +        pshufd    $221, %xmm12, %xmm11
> +
> +/* Check if at least one of X or Y is zero and not NaN: iAXAYZEROnotNAN */
> +        pand      %xmm11, %xmm1
> +
> +/* Exclude from previous callout mask zero (and not NaN) arguments */
> +        movdqa    %xmm1, %xmm13
> +        pandn     %xmm4, %xmm13
> +
> +/*
> + *  Path for zero arguments (at least one of the two)
> + * Check if both args are zeros (den. is zero)
> + */
> +        movups    (%rsp), %xmm4
> +        cmpeqpd   %xmm2, %xmm4
> +
> +/* Go to callout */
> +        movmskps  %xmm13, %edx
> +
> +/* Set sPIO2 to zero if den. is zero */
> +        movaps    %xmm4, %xmm15
> +        andps     %xmm2, %xmm4
> +        andnps    %xmm5, %xmm15
> +        andl      $3, %edx
> +        orps      %xmm4, %xmm15
> +        pshufd    $221, %xmm9, %xmm5
> +        orps      %xmm7, %xmm15
> +
> +/* Res = sign(Y)*(X<0)?(PIO2+PI):PIO2 */
> +        pshufd    $221, %xmm2, %xmm7
> +        pcmpgtd   %xmm5, %xmm7
> +        pshufd    $80, %xmm7, %xmm14
> +        andps     %xmm3, %xmm14
> +        addpd     %xmm14, %xmm15
> +
> +/* Merge results from main and spec path */
> +        pshufd    $80, %xmm1, %xmm3
> +        orps      %xmm6, %xmm15
> +        movdqa    %xmm3, %xmm6
> +        andps     %xmm3, %xmm15
> +        andnps    %xmm0, %xmm6
> +        movaps    %xmm6, %xmm0
> +        orps      %xmm15, %xmm0
> +
> +/* Return to main vector processing path */
> +        jmp       L(AUX_BRANCH_RETURN)
> +                                # LOE rbx rbp r12 r13 r14 r15 eax edx xmm0 xmm8 xmm9
> +END(_ZGVbN2vv_atan2_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_datan2_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(16)) VUINT32 dPI[2][2];
> +        __declspec(align(16)) VUINT32 dPIO2[2][2];
> +        __declspec(align(16)) VUINT32 dA19[2][2];
> +        __declspec(align(16)) VUINT32 dA18[2][2];
> +        __declspec(align(16)) VUINT32 dA17[2][2];
> +        __declspec(align(16)) VUINT32 dA16[2][2];
> +        __declspec(align(16)) VUINT32 dA15[2][2];
> +        __declspec(align(16)) VUINT32 dA14[2][2];
> +        __declspec(align(16)) VUINT32 dA13[2][2];
> +        __declspec(align(16)) VUINT32 dA12[2][2];
> +        __declspec(align(16)) VUINT32 dA11[2][2];
> +        __declspec(align(16)) VUINT32 dA10[2][2];
> +        __declspec(align(16)) VUINT32 dA09[2][2];
> +        __declspec(align(16)) VUINT32 dA08[2][2];
> +        __declspec(align(16)) VUINT32 dA07[2][2];
> +        __declspec(align(16)) VUINT32 dA06[2][2];
> +        __declspec(align(16)) VUINT32 dA05[2][2];
> +        __declspec(align(16)) VUINT32 dA04[2][2];
> +        __declspec(align(16)) VUINT32 dA03[2][2];
> +        __declspec(align(16)) VUINT32 dA02[2][2];
> +        __declspec(align(16)) VUINT32 dA01[2][2];
> +        __declspec(align(16)) VUINT32 dA00[2][2];
> +        __declspec(align(16)) VUINT32 dSIGN_MASK[2][2];
> +        __declspec(align(16)) VUINT32 iCHK_WORK_SUB[4][1];
> +        __declspec(align(16)) VUINT32 iCHK_WORK_CMP[4][1];
> +        __declspec(align(16)) VUINT32 dABS_MASK[2][2];
> +        __declspec(align(16)) VUINT32 dZERO[2][2];
> +} __svml_datan2_data_internal;
> +#endif
> +__svml_datan2_data_internal:
> +        .quad 0x400921FB54442D18, 0x400921FB54442D18 //dPI
> +        .align 16
> +        .quad 0x3FF921FB54442D18, 0x3FF921FB54442D18 //dPIO2
> +        .align 16
> +        .quad 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3 // dA19
> +        .align 16
> +        .quad 0x3F2CED0A36665209, 0x3F2CED0A36665209 // dA18
> +        .align 16
> +        .quad 0xBF52E67C93954C23, 0xBF52E67C93954C23 // dA17
> +        .align 16
> +        .quad 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3 // dA16
> +        .align 16
> +        .quad 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD // dA15
> +        .align 16
> +        .quad 0x3F914F4C661116A5, 0x3F914F4C661116A5 // dA14
> +        .align 16
> +        .quad 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C // dA13
> +        .align 16
> +        .quad 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F // dA12
> +        .align 16
> +        .quad 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC // dA11
> +        .align 16
> +        .quad 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B // dA10
> +        .align 16
> +        .quad 0xBFAAD261EAA09954, 0xBFAAD261EAA09954 // dA09
> +        .align 16
> +        .quad 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF // dA08
> +        .align 16
> +        .quad 0xBFB11084009435E0, 0xBFB11084009435E0 // dA07
> +        .align 16
> +        .quad 0x3FB3B12A49295651, 0x3FB3B12A49295651 // dA06
> +        .align 16
> +        .quad 0xBFB745D009BADA94, 0xBFB745D009BADA94 // dA05
> +        .align 16
> +        .quad 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5 // dA04
> +        .align 16
> +        .quad 0xBFC2492491EE55C7, 0xBFC2492491EE55C7 // dA03
> +        .align 16
> +        .quad 0x3FC999999997EE34, 0x3FC999999997EE34 // dA02
> +        .align 16
> +        .quad 0xBFD55555555553C5, 0xBFD55555555553C5 // dA01
> +        .align 16
> +        .quad 0x3FF0000000000000, 0x3FF0000000000000 // dA00
> +        .align 16
> +        .quad 0x8000000000000000, 0x8000000000000000 //dSIGN_MASK
> +        .align 16
> +        .long 0x80300000, 0x80300000, 0x80300000, 0x80300000 //iCHK_WORK_SUB
> +        .align 16
> +        .long 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000 //iCHK_WORK_CMP
> +        .align 16
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff //dABS_MASK
> +        .align 16
> +        .quad 0x0000000000000000, 0x0000000000000000 //dZERO
> +        .align 16
> +        .type	__svml_datan2_data_internal,@object
> +        .size	__svml_datan2_data_internal,.-__svml_datan2_data_internal
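One note on the L(SPECIAL_VALUES_BRANCH) / L(RANGEMASK_CHECK) /
L(SCALAR_MATH_CALL) sequence above: the inputs and the fast-path result are
spilled to the stack, and every set bit in the range mask selects a lane
that gets recomputed with the scalar atan2.  A rough C equivalent for the
2-lane variant (illustrative only):

#include <math.h>

/* Recompute each lane whose range-mask bit is set with the scalar
   routine; the vector result for the other lanes is kept as is.  */
static void
fixup_special_lanes (const double y[2], const double x[2],
                     double result[2], unsigned int mask)
{
  for (int lane = 0; lane < 2; lane++)
    if (mask & (1u << lane))
      result[lane] = atan2 (y[lane], x[lane]);
}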
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core-sse.S
> new file mode 100644
> index 0000000000..0db843a088
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized atan2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN4vv_atan2 _ZGVdN4vv_atan2_sse_wrapper
> +#include "../svml_d_atan24_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core.c
> new file mode 100644
> index 0000000000..c2e2611584
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized atan2, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN4vv_atan2
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN4vv_atan2, __GI__ZGVdN4vv_atan2,
> +	       __redirect__ZGVdN4vv_atan2)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core_avx2.S
> new file mode 100644
> index 0000000000..cdf780715b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan24_core_avx2.S
> @@ -0,0 +1,451 @@
> +/* Function atan2 vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
> + *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
> + *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
> + *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
> + *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
> + *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
> + *
> + *
> + */
> +
> +/* Offsets for data table __svml_datan2_data_internal
> + */
> +#define dPI                           	0
> +#define dPIO2                         	32
> +#define dA19                          	64
> +#define dA18                          	96
> +#define dA17                          	128
> +#define dA16                          	160
> +#define dA15                          	192
> +#define dA14                          	224
> +#define dA13                          	256
> +#define dA12                          	288
> +#define dA11                          	320
> +#define dA10                          	352
> +#define dA09                          	384
> +#define dA08                          	416
> +#define dA07                          	448
> +#define dA06                          	480
> +#define dA05                          	512
> +#define dA04                          	544
> +#define dA03                          	576
> +#define dA02                          	608
> +#define dA01                          	640
> +#define dA00                          	672
> +#define dSIGN_MASK                    	704
> +#define iCHK_WORK_SUB                 	736
> +#define iCHK_WORK_CMP                 	768
> +#define dABS_MASK                     	800
> +#define dZERO                         	832
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN4vv_atan2_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $128, %rsp
> +        xorl      %edx, %edx
> +
> +/*
> + * #define NO_VECTOR_ZERO_ATAN2_ARGS
> + *  Declarations
> + * Variables
> + * Constants
> + *  The end of declarations
> + *  Implementation
> + * Get r0~=1/B
> + * Cannot be replaced by VQRCP(D, dR0, dB);
> + * Argument Absolute values
> + */
> +        vmovupd   dABS_MASK+__svml_datan2_data_internal(%rip), %ymm5
> +
> +/* Argument signs */
> +        vmovupd   dSIGN_MASK+__svml_datan2_data_internal(%rip), %ymm4
> +        vmovups   iCHK_WORK_SUB+__svml_datan2_data_internal(%rip), %xmm13
> +        vmovupd   %ymm0, (%rsp)
> +        vmovapd   %ymm1, %ymm8
> +        vandpd    %ymm5, %ymm8, %ymm2
> +        vandpd    %ymm5, %ymm0, %ymm1
> +        vcmpnlt_uqpd %ymm2, %ymm1, %ymm15
> +
> +/*
> + * 1) If y<x then a= y, b=x, PIO2=0
> + * 2) If y>x then a=-x, b=y, PIO2=Pi/2
> + */
> +        vorpd     %ymm4, %ymm2, %ymm6
> +        vblendvpd %ymm15, %ymm6, %ymm1, %ymm3
> +        vblendvpd %ymm15, %ymm1, %ymm2, %ymm6
> +        vdivpd    %ymm6, %ymm3, %ymm14
> +        vmovups   iCHK_WORK_CMP+__svml_datan2_data_internal(%rip), %xmm3
> +        vmovupd   %ymm6, 32(%rsp)
> +        vandpd    %ymm4, %ymm0, %ymm7
> +        vandpd    %ymm4, %ymm8, %ymm5
> +        vandpd    dPIO2+__svml_datan2_data_internal(%rip), %ymm15, %ymm4
> +
> +/* Check if y and x are on main path. */
> +        vextractf128 $1, %ymm2, %xmm9
> +        vextractf128 $1, %ymm1, %xmm10
> +        vshufps   $221, %xmm9, %xmm2, %xmm11
> +        vshufps   $221, %xmm10, %xmm1, %xmm12
> +        vpsubd    %xmm13, %xmm11, %xmm0
> +        vpsubd    %xmm13, %xmm12, %xmm9
> +        vpcmpgtd  %xmm3, %xmm0, %xmm15
> +        vpcmpeqd  %xmm3, %xmm0, %xmm6
> +        vpcmpgtd  %xmm3, %xmm9, %xmm10
> +        vpcmpeqd  %xmm3, %xmm9, %xmm3
> +        vpor      %xmm6, %xmm15, %xmm11
> +        vpor      %xmm3, %xmm10, %xmm12
> +
> +/* Polynomial. */
> +        vmulpd    %ymm14, %ymm14, %ymm10
> +        vpor      %xmm12, %xmm11, %xmm3
> +        vmovupd   dA18+__svml_datan2_data_internal(%rip), %ymm9
> +        vmovupd   dA17+__svml_datan2_data_internal(%rip), %ymm12
> +        vmovupd   dA16+__svml_datan2_data_internal(%rip), %ymm15
> +        vmulpd    %ymm10, %ymm10, %ymm11
> +
> +/* if x<0, dPI = Pi, else dPI =0 */
> +        vcmple_oqpd dZERO+__svml_datan2_data_internal(%rip), %ymm8, %ymm13
> +        vmovmskps %xmm3, %eax
> +        vmulpd    %ymm11, %ymm11, %ymm0
> +        vandpd    __svml_datan2_data_internal(%rip), %ymm13, %ymm6
> +        vmovupd   dA19+__svml_datan2_data_internal(%rip), %ymm13
> +        vfmadd213pd dA14+__svml_datan2_data_internal(%rip), %ymm0, %ymm9
> +        vfmadd213pd dA13+__svml_datan2_data_internal(%rip), %ymm0, %ymm12
> +        vfmadd213pd dA12+__svml_datan2_data_internal(%rip), %ymm0, %ymm15
> +        vfmadd213pd dA15+__svml_datan2_data_internal(%rip), %ymm0, %ymm13
> +        vfmadd213pd dA10+__svml_datan2_data_internal(%rip), %ymm0, %ymm9
> +        vfmadd213pd dA09+__svml_datan2_data_internal(%rip), %ymm0, %ymm12
> +        vfmadd213pd dA08+__svml_datan2_data_internal(%rip), %ymm0, %ymm15
> +        vfmadd213pd dA11+__svml_datan2_data_internal(%rip), %ymm0, %ymm13
> +        vfmadd213pd dA06+__svml_datan2_data_internal(%rip), %ymm0, %ymm9
> +        vfmadd213pd dA05+__svml_datan2_data_internal(%rip), %ymm0, %ymm12
> +        vfmadd213pd dA04+__svml_datan2_data_internal(%rip), %ymm0, %ymm15
> +        vfmadd213pd dA07+__svml_datan2_data_internal(%rip), %ymm0, %ymm13
> +        vfmadd213pd dA02+__svml_datan2_data_internal(%rip), %ymm0, %ymm9
> +        vfmadd213pd dA01+__svml_datan2_data_internal(%rip), %ymm0, %ymm12
> +        vfmadd213pd dA03+__svml_datan2_data_internal(%rip), %ymm0, %ymm13
> +
> +/* A00=1.0, account for it later  VQFMA(D, dP4, dP4, dR8, dA00); */
> +        vmulpd    %ymm15, %ymm0, %ymm0
> +        vfmadd213pd %ymm9, %ymm10, %ymm13
> +        vfmadd213pd %ymm0, %ymm10, %ymm12
> +        vfmadd213pd %ymm12, %ymm11, %ymm13
> +
> +/*
> + * Reconstruction.
> + * dP=(R+R*dP) + dPIO2
> + */
> +        vfmadd213pd %ymm14, %ymm14, %ymm13
> +        vaddpd    %ymm13, %ymm4, %ymm14
> +        vorpd     %ymm5, %ymm14, %ymm0
> +        vaddpd    %ymm0, %ymm6, %ymm9
> +        vorpd     %ymm7, %ymm9, %ymm0
> +
> +/*  Special branch for fast (vector) processing of zero arguments  */
> +        testl     %eax, %eax
> +
> +/* Go to auxiliary branch */
> +        jne       L(AUX_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx xmm3 ymm0 ymm1 ymm2 ymm4 ymm5 ymm6 ymm7 ymm8
> +
> +/* Return from auxiliary branch
> + * for out of main path inputs
> + */
> +
> +L(AUX_BRANCH_RETURN):
> +/*
> + *  Special branch for fast (vector) processing of zero arguments
> + *  The end of implementation
> + */
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm8
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovupd   (%rsp), %ymm1
> +        vmovupd   %ymm8, 64(%rsp)
> +        vmovupd   %ymm0, 96(%rsp)
> +        vmovupd   %ymm1, 32(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -112; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovupd   96(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -112; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        movsd     64(%rsp,%r14,8), %xmm1
> +        call      atan2@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 96(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +        cfi_restore(12)
> +        cfi_restore(13)
> +        cfi_restore(14)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Auxiliary branch
> + * for out of main path inputs
> + */
> +
> +L(AUX_BRANCH):
> +        vmovupd   (%rsp), %ymm11
> +
> +/* Check if at least one of X or Y is zero: iAXAYZERO */
> +        vmovupd   dZERO+__svml_datan2_data_internal(%rip), %ymm10
> +
> +/* Check if both X & Y are not NaNs:  iXYnotNAN */
> +        vcmpordpd %ymm8, %ymm8, %ymm12
> +        vcmpordpd %ymm11, %ymm11, %ymm13
> +        vcmpeqpd  %ymm10, %ymm2, %ymm2
> +        vcmpeqpd  %ymm10, %ymm1, %ymm1
> +        vandpd    %ymm13, %ymm12, %ymm14
> +        vorpd     %ymm1, %ymm2, %ymm2
> +        vextractf128 $1, %ymm14, %xmm15
> +        vextractf128 $1, %ymm2, %xmm11
> +        vshufps   $221, %xmm15, %xmm14, %xmm9
> +        vshufps   $221, %xmm11, %xmm2, %xmm12
> +
> +/*
> + *  Path for zero arguments (at least one of the two)
> + * Check if both args are zeros (den. is zero)
> + */
> +        vcmpeqpd  32(%rsp), %ymm10, %ymm2
> +
> +/* Check if at least one of X or Y is zero and not NaN: iAXAYZEROnotNAN */
> +        vpand     %xmm9, %xmm12, %xmm1
> +
> +/* Exclude from previous callout mask zero (and not NaN) arguments */
> +        vpandn    %xmm3, %xmm1, %xmm3
> +
> +/* Go to callout */
> +        vmovmskps %xmm3, %edx
> +
> +/* Set sPIO2 to zero if den. is zero */
> +        vblendvpd %ymm2, %ymm10, %ymm4, %ymm4
> +        vorpd     %ymm5, %ymm4, %ymm5
> +
> +/* Res = sign(Y)*(X<0)?(PIO2+PI):PIO2 */
> +        vextractf128 $1, %ymm10, %xmm2
> +        vextractf128 $1, %ymm8, %xmm3
> +        vshufps   $221, %xmm2, %xmm10, %xmm4
> +        vshufps   $221, %xmm3, %xmm8, %xmm9
> +        vpcmpgtd  %xmm9, %xmm4, %xmm12
> +        vpshufd   $80, %xmm12, %xmm11
> +        vpshufd   $250, %xmm12, %xmm13
> +        vinsertf128 $1, %xmm13, %ymm11, %ymm14
> +        vandpd    %ymm6, %ymm14, %ymm6
> +        vaddpd    %ymm6, %ymm5, %ymm2
> +        vorpd     %ymm7, %ymm2, %ymm2
> +
> +/* Merge results from main and spec path */
> +        vpshufd   $80, %xmm1, %xmm7
> +        vpshufd   $250, %xmm1, %xmm1
> +        vinsertf128 $1, %xmm1, %ymm7, %ymm3
> +        vblendvpd %ymm3, %ymm2, %ymm0, %ymm0
> +
> +/* Return to main vector processing path */
> +        jmp       L(AUX_BRANCH_RETURN)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm8
> +END(_ZGVdN4vv_atan2_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_datan2_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(32)) VUINT32 dPI[4][2];
> +        __declspec(align(32)) VUINT32 dPIO2[4][2];
> +        __declspec(align(32)) VUINT32 dA19[4][2];
> +        __declspec(align(32)) VUINT32 dA18[4][2];
> +        __declspec(align(32)) VUINT32 dA17[4][2];
> +        __declspec(align(32)) VUINT32 dA16[4][2];
> +        __declspec(align(32)) VUINT32 dA15[4][2];
> +        __declspec(align(32)) VUINT32 dA14[4][2];
> +        __declspec(align(32)) VUINT32 dA13[4][2];
> +        __declspec(align(32)) VUINT32 dA12[4][2];
> +        __declspec(align(32)) VUINT32 dA11[4][2];
> +        __declspec(align(32)) VUINT32 dA10[4][2];
> +        __declspec(align(32)) VUINT32 dA09[4][2];
> +        __declspec(align(32)) VUINT32 dA08[4][2];
> +        __declspec(align(32)) VUINT32 dA07[4][2];
> +        __declspec(align(32)) VUINT32 dA06[4][2];
> +        __declspec(align(32)) VUINT32 dA05[4][2];
> +        __declspec(align(32)) VUINT32 dA04[4][2];
> +        __declspec(align(32)) VUINT32 dA03[4][2];
> +        __declspec(align(32)) VUINT32 dA02[4][2];
> +        __declspec(align(32)) VUINT32 dA01[4][2];
> +        __declspec(align(32)) VUINT32 dA00[4][2];
> +        __declspec(align(32)) VUINT32 dSIGN_MASK[4][2];
> +        __declspec(align(32)) VUINT32 iCHK_WORK_SUB[8][1];
> +        __declspec(align(32)) VUINT32 iCHK_WORK_CMP[8][1];
> +        __declspec(align(32)) VUINT32 dABS_MASK[4][2];
> +        __declspec(align(32)) VUINT32 dZERO[4][2];
> +} __svml_datan2_data_internal;
> +#endif
> +__svml_datan2_data_internal:
> +        .quad 0x400921FB54442D18, 0x400921FB54442D18, 0x400921FB54442D18, 0x400921FB54442D18 //dPI
> +        .align 32
> +        .quad 0x3FF921FB54442D18, 0x3FF921FB54442D18, 0x3FF921FB54442D18, 0x3FF921FB54442D18 //dPIO2
> +        .align 32
> +        .quad 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3 // dA19
> +        .align 32
> +        .quad 0x3F2CED0A36665209, 0x3F2CED0A36665209, 0x3F2CED0A36665209, 0x3F2CED0A36665209 // dA18
> +        .align 32
> +        .quad 0xBF52E67C93954C23, 0xBF52E67C93954C23, 0xBF52E67C93954C23, 0xBF52E67C93954C23 // dA17
> +        .align 32
> +        .quad 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3 // dA16
> +        .align 32
> +        .quad 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD // dA15
> +        .align 32
> +        .quad 0x3F914F4C661116A5, 0x3F914F4C661116A5, 0x3F914F4C661116A5, 0x3F914F4C661116A5 // dA14
> +        .align 32
> +        .quad 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C // dA13
> +        .align 32
> +        .quad 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F // dA12
> +        .align 32
> +        .quad 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC // dA11
> +        .align 32
> +        .quad 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B // dA10
> +        .align 32
> +        .quad 0xBFAAD261EAA09954, 0xBFAAD261EAA09954, 0xBFAAD261EAA09954, 0xBFAAD261EAA09954 // dA09
> +        .align 32
> +        .quad 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF // dA08
> +        .align 32
> +        .quad 0xBFB11084009435E0, 0xBFB11084009435E0, 0xBFB11084009435E0, 0xBFB11084009435E0 // dA07
> +        .align 32
> +        .quad 0x3FB3B12A49295651, 0x3FB3B12A49295651, 0x3FB3B12A49295651, 0x3FB3B12A49295651 // dA06
> +        .align 32
> +        .quad 0xBFB745D009BADA94, 0xBFB745D009BADA94, 0xBFB745D009BADA94, 0xBFB745D009BADA94 // dA05
> +        .align 32
> +        .quad 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5 // dA04
> +        .align 32
> +        .quad 0xBFC2492491EE55C7, 0xBFC2492491EE55C7, 0xBFC2492491EE55C7, 0xBFC2492491EE55C7 // dA03
> +        .align 32
> +        .quad 0x3FC999999997EE34, 0x3FC999999997EE34, 0x3FC999999997EE34, 0x3FC999999997EE34 // dA02
> +        .align 32
> +        .quad 0xBFD55555555553C5, 0xBFD55555555553C5, 0xBFD55555555553C5, 0xBFD55555555553C5 // dA01
> +        .align 32
> +        .quad 0x3FF0000000000000, 0x3FF0000000000000, 0x3FF0000000000000, 0x3FF0000000000000 // dA00
> +        .align 32
> +        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000 //dSIGN_MASK
> +        .align 32
> +        .long 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000 //iCHK_WORK_SUB
> +        .align 32
> +        .long 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000 //iCHK_WORK_CMP
> +        .align 32
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff //dABS_MASK
> +        .align 32
> +        .quad 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000 //dZERO
> +        .align 32
> +        .type	__svml_datan2_data_internal,@object
> +        .size	__svml_datan2_data_internal,.-__svml_datan2_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core-avx2.S
> new file mode 100644
> index 0000000000..a8d34a6143
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized atan2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN8vv_atan2 _ZGVeN8vv_atan2_avx2_wrapper
> +#include "../svml_d_atan28_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core.c
> new file mode 100644
> index 0000000000..a0897e9cf0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized atan2, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN8vv_atan2
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN8vv_atan2, __GI__ZGVeN8vv_atan2,
> +	       __redirect__ZGVeN8vv_atan2)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core_avx512.S
> new file mode 100644
> index 0000000000..6d18f5f757
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_atan28_core_avx512.S
> @@ -0,0 +1,475 @@
> +/* Function atan2 vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
> + *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
> + *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
> + *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
> + *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
> + *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
> + *
> + *
> + */
> +
> +/* Offsets for data table __svml_datan2_data_internal
> + */
> +#define dPI                           	0
> +#define dPIO2                         	64
> +#define dA19                          	128
> +#define dA18                          	192
> +#define dA17                          	256
> +#define dA16                          	320
> +#define dA15                          	384
> +#define dA14                          	448
> +#define dA13                          	512
> +#define dA12                          	576
> +#define dA11                          	640
> +#define dA10                          	704
> +#define dA09                          	768
> +#define dA08                          	832
> +#define dA07                          	896
> +#define dA06                          	960
> +#define dA05                          	1024
> +#define dA04                          	1088
> +#define dA03                          	1152
> +#define dA02                          	1216
> +#define dA01                          	1280
> +#define dA00                          	1344
> +#define dSIGN_MASK                    	1408
> +#define iCHK_WORK_SUB                 	1472
> +#define iCHK_WORK_CMP                 	1536
> +#define dABS_MASK                     	1600
> +#define dZERO                         	1664
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN8vv_atan2_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $256, %rsp
> +        xorl      %edx, %edx
> +
> +/*
> + * #define NO_VECTOR_ZERO_ATAN2_ARGS
> + *  Declarations
> + * Variables
> + * Constants
> + *  The end of declarations
> + *  Implementation
> + * Get r0~=1/B
> + * Cannot be replaced by VQRCP(D, dR0, dB);
> + * Argument Absolute values
> + */
> +        vmovups   dABS_MASK+__svml_datan2_data_internal(%rip), %zmm4
> +
> +/* Argument signs */
> +        vmovups   dSIGN_MASK+__svml_datan2_data_internal(%rip), %zmm6
> +
> +/*
> + * 1) If y<x then a= y, b=x, PIO2=0
> + * 2) If y>x then a=-x, b=y, PIO2=Pi/2
> + */
> +        vmovups   dPIO2+__svml_datan2_data_internal(%rip), %zmm3
> +        vandpd    %zmm4, %zmm0, %zmm11
> +        vmovaps   %zmm1, %zmm7
> +        vandpd    %zmm4, %zmm7, %zmm2
> +        vandpd    %zmm6, %zmm7, %zmm5
> +        vandpd    %zmm6, %zmm0, %zmm4
> +        vorpd     %zmm6, %zmm2, %zmm12
> +        vcmppd    $17, {sae}, %zmm2, %zmm11, %k1
> +        vmovdqu   iCHK_WORK_CMP+__svml_datan2_data_internal(%rip), %ymm6
> +        vmovups   %zmm11, 64(%rsp)
> +
> +/* Check if y and x are on main path. */
> +        vpsrlq    $32, %zmm2, %zmm9
> +        vblendmpd %zmm11, %zmm12, %zmm13{%k1}
> +        vblendmpd %zmm2, %zmm11, %zmm15{%k1}
> +        vpsrlq    $32, %zmm11, %zmm8
> +        vmovdqu   iCHK_WORK_SUB+__svml_datan2_data_internal(%rip), %ymm12
> +        vdivpd    {rn-sae}, %zmm15, %zmm13, %zmm1
> +        vmovups   %zmm15, (%rsp)
> +        vpmovqd   %zmm9, %ymm14
> +        vpmovqd   %zmm8, %ymm10
> +        vxorpd    %zmm3, %zmm3, %zmm3{%k1}
> +        vpsubd    %ymm12, %ymm14, %ymm13
> +        vpsubd    %ymm12, %ymm10, %ymm9
> +
> +/* Polynomial. */
> +        vmulpd    {rn-sae}, %zmm1, %zmm1, %zmm12
> +        vpcmpgtd  %ymm6, %ymm13, %ymm15
> +        vpcmpeqd  %ymm6, %ymm13, %ymm11
> +        vmulpd    {rn-sae}, %zmm12, %zmm12, %zmm13
> +        vpor      %ymm11, %ymm15, %ymm8
> +        vmovups   dA19+__svml_datan2_data_internal(%rip), %zmm11
> +        vmovups   dA15+__svml_datan2_data_internal(%rip), %zmm15
> +        vpcmpgtd  %ymm6, %ymm9, %ymm14
> +        vpcmpeqd  %ymm6, %ymm9, %ymm6
> +        vpor      %ymm6, %ymm14, %ymm10
> +        vmulpd    {rn-sae}, %zmm13, %zmm13, %zmm14
> +        vmovups   dA18+__svml_datan2_data_internal(%rip), %zmm9
> +        vpor      %ymm10, %ymm8, %ymm6
> +        vmovups   dA17+__svml_datan2_data_internal(%rip), %zmm10
> +        vfmadd231pd {rn-sae}, %zmm14, %zmm11, %zmm15
> +        vmovups   dA14+__svml_datan2_data_internal(%rip), %zmm11
> +        vmovups   dA12+__svml_datan2_data_internal(%rip), %zmm8
> +        vfmadd231pd {rn-sae}, %zmm14, %zmm9, %zmm11
> +        vmovups   dA13+__svml_datan2_data_internal(%rip), %zmm9
> +        vfmadd231pd {rn-sae}, %zmm14, %zmm10, %zmm9
> +        vmovups   dA16+__svml_datan2_data_internal(%rip), %zmm10
> +        vfmadd231pd {rn-sae}, %zmm14, %zmm10, %zmm8
> +        vmovups   dA11+__svml_datan2_data_internal(%rip), %zmm10
> +        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm15
> +        vmovups   dA10+__svml_datan2_data_internal(%rip), %zmm10
> +        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm11
> +        vmovups   dA09+__svml_datan2_data_internal(%rip), %zmm10
> +        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm9
> +        vmovups   dA08+__svml_datan2_data_internal(%rip), %zmm10
> +        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm8
> +        vmovups   dA07+__svml_datan2_data_internal(%rip), %zmm10
> +        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm15
> +        vmovups   dA06+__svml_datan2_data_internal(%rip), %zmm10
> +        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm11
> +        vmovups   dA05+__svml_datan2_data_internal(%rip), %zmm10
> +        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm9
> +        vmovups   dA04+__svml_datan2_data_internal(%rip), %zmm10
> +        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm8
> +        vmovups   dA03+__svml_datan2_data_internal(%rip), %zmm10
> +
> +/* A00=1.0, account for it later  VQFMA(D, dP4, dP4, dR8, dA00); */
> +        vmulpd    {rn-sae}, %zmm14, %zmm8, %zmm8
> +        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm15
> +        vmovups   dA02+__svml_datan2_data_internal(%rip), %zmm10
> +        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm11
> +        vmovups   dA01+__svml_datan2_data_internal(%rip), %zmm10
> +        vfmadd213pd {rn-sae}, %zmm11, %zmm12, %zmm15
> +        vfmadd213pd {rn-sae}, %zmm10, %zmm14, %zmm9
> +        vfmadd213pd {rn-sae}, %zmm8, %zmm12, %zmm9
> +        vmovups   __svml_datan2_data_internal(%rip), %zmm8
> +        vfmadd213pd {rn-sae}, %zmm9, %zmm13, %zmm15
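
A note on the block above: the dA19 ... dA01 loads feed four interleaved Horner chains in r^8 which are then recombined through r^2 and r^4 purely for instruction-level parallelism. Ignoring rounding/ordering effects, it is the same odd polynomial (degree 19 in r^2) that a plain Horner loop would produce. A rough scalar sketch, illustration only (dA[] stands in for the table constants, it is not part of the patch):

/* Illustrative scalar equivalent of the interleaved polynomial above.
   dA[0..18] would hold dA19 ... dA01 from the data table; dA00 == 1.0
   is folded into the final r + r*p step, as the code comment notes.  */
static double
atan_poly (const double dA[19], double r)
{
  double r2 = r * r;
  double p = dA[0];                /* dA19 */
  for (int i = 1; i < 19; i++)     /* dA18 ... dA01 */
    p = p * r2 + dA[i];
  return r + (r * r2) * p;         /* r + r*P(r^2), with dA00 == 1.0 */
}
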
> +
> +/*
> + * Reconstruction.
> + * dP=(R+R*dP) + dPIO2
> + */
> +        vfmadd213pd {rn-sae}, %zmm1, %zmm1, %zmm15
> +        vaddpd    {rn-sae}, %zmm3, %zmm15, %zmm1
> +        vorpd     %zmm5, %zmm1, %zmm9
> +
> +/* if x<0, dPI = Pi, else dPI =0 */
> +        vmovups   dZERO+__svml_datan2_data_internal(%rip), %zmm1
> +        vcmppd    $18, {sae}, %zmm1, %zmm7, %k2
> +        vaddpd    {rn-sae}, %zmm8, %zmm9, %zmm9{%k2}
> +        vmovmskps %ymm6, %eax
> +        vorpd     %zmm4, %zmm9, %zmm11
> +
> +/*  Special branch for fast (vector) processing of zero arguments  */
> +        vmovups   64(%rsp), %zmm9
> +        testl     %eax, %eax
> +
> +/* Go to auxiliary branch */
> +        jne       L(AUX_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm6 zmm0 zmm2 zmm3 zmm4 zmm5 zmm7 zmm9 zmm11
> +
> +/* Return from auxiliary branch
> + * for out of main path inputs
> + */
> +
> +L(AUX_BRANCH_RETURN):
> +/*
> + *  Special branch for fast (vector) processing of zero arguments
> + *  The end of implementation
> + */
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm7 zmm11
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        vmovaps   %zmm11, %zmm0
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm0, 64(%rsp)
> +        vmovups   %zmm7, 128(%rsp)
> +        vmovups   %zmm11, 192(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm11
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -240; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -248; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -256; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   192(%rsp), %zmm11
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -240; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -248; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -256; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm11
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     64(%rsp,%r14,8), %xmm0
> +        movsd     128(%rsp,%r14,8), %xmm1
> +        call      atan2@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 192(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +        cfi_restore(12)
> +        cfi_restore(13)
> +        cfi_restore(14)
> +                                # LOE rbx r15 r12d r13d
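
The RANGEMASK_CHECK / SCALAR_MATH_CALL loop above reads more easily as scalar logic: each set bit in the range mask selects one lane of the saved inputs, which is recomputed with scalar atan2 and written back over the saved result. A minimal sketch, names illustrative only (not from the patch):

#include <math.h>

/* Illustration only: lanes flagged in MASK are redone with scalar
   atan2; vlen is 8 for the AVX-512 double kernel.  */
static void
fixup_special_lanes (const double y[], const double x[], double res[],
                     unsigned int mask, int vlen)
{
  for (int i = 0; i < vlen; i++)
    if (mask & (1u << i))
      res[i] = atan2 (y[i], x[i]);
}
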
> +
> +/* Auxiliary branch
> + * for out of main path inputs
> + */
> +
> +L(AUX_BRANCH):
> +/* Check if at least one of X or Y is zero: iAXAYZERO */
> +        vmovups   dZERO+__svml_datan2_data_internal(%rip), %zmm8
> +
> +/* Check if both X & Y are not NaNs:  iXYnotNAN */
> +        vcmppd    $3, {sae}, %zmm7, %zmm7, %k1
> +        vcmppd    $3, {sae}, %zmm0, %zmm0, %k2
> +        vcmppd    $4, {sae}, %zmm8, %zmm2, %k3
> +        vcmppd    $4, {sae}, %zmm8, %zmm9, %k4
> +
> +/* Res = sign(Y)*(X<0)?(PIO2+PI):PIO2 */
> +        vpcmpgtq  %zmm7, %zmm8, %k6
> +        vpternlogd $0xff, %zmm1, %zmm1, %zmm10
> +        vmovaps   %zmm10, %zmm15
> +        vmovaps   %zmm10, %zmm12
> +        vmovaps   %zmm10, %zmm13
> +        vpandnq   %zmm2, %zmm2, %zmm15{%k3}
> +        vmovaps   %zmm10, %zmm2
> +        vpandnq   %zmm7, %zmm7, %zmm12{%k1}
> +        vpandnq   %zmm0, %zmm0, %zmm13{%k2}
> +        vpandnq   %zmm9, %zmm9, %zmm2{%k4}
> +        vandpd    %zmm13, %zmm12, %zmm14
> +        vorpd     %zmm2, %zmm15, %zmm9
> +        vpsrlq    $32, %zmm14, %zmm1
> +        vpsrlq    $32, %zmm9, %zmm2
> +        vpmovqd   %zmm1, %ymm1
> +        vpmovqd   %zmm2, %ymm9
> +
> +/* Check if at least one of X or Y is zero and not NaN: iAXAYZEROnotNAN */
> +        vpand     %ymm1, %ymm9, %ymm2
> +
> +/*
> + *  Path for zero arguments (at least one of both)
> + * Check if both args are zeros (den. is zero)
> + */
> +        vmovups   (%rsp), %zmm1
> +
> +/* Exclude from previous callout mask zero (and not NaN) arguments */
> +        vpandn    %ymm6, %ymm2, %ymm6
> +        vcmppd    $4, {sae}, %zmm8, %zmm1, %k5
> +
> +/* Go to callout */
> +        vmovmskps %ymm6, %edx
> +        vpandnq   %zmm1, %zmm1, %zmm10{%k5}
> +
> +/* Set sPIO2 to zero if den. is zero */
> +        vpandnq   %zmm3, %zmm10, %zmm3
> +        vpandq    %zmm10, %zmm8, %zmm1
> +        vporq     %zmm1, %zmm3, %zmm3
> +        vorpd     %zmm5, %zmm3, %zmm1
> +        vmovups   __svml_datan2_data_internal(%rip), %zmm5
> +        vaddpd    {rn-sae}, %zmm5, %zmm1, %zmm1{%k6}
> +        vorpd     %zmm4, %zmm1, %zmm1
> +
> +/* Merge results from main and spec path */
> +        vpmovzxdq %ymm2, %zmm4
> +        vpsllq    $32, %zmm4, %zmm2
> +        vpord     %zmm2, %zmm4, %zmm3
> +        vpandnq   %zmm11, %zmm3, %zmm11
> +        vpandq    %zmm3, %zmm1, %zmm1
> +        vporq     %zmm1, %zmm11, %zmm11
> +
> +/* Return to main vector processing path */
> +        jmp       L(AUX_BRANCH_RETURN)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm7 zmm11
> +END(_ZGVeN8vv_atan2_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_datan2_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 dPI[8][2];
> +        __declspec(align(64)) VUINT32 dPIO2[8][2];
> +        __declspec(align(64)) VUINT32 dA19[8][2];
> +        __declspec(align(64)) VUINT32 dA18[8][2];
> +        __declspec(align(64)) VUINT32 dA17[8][2];
> +        __declspec(align(64)) VUINT32 dA16[8][2];
> +        __declspec(align(64)) VUINT32 dA15[8][2];
> +        __declspec(align(64)) VUINT32 dA14[8][2];
> +        __declspec(align(64)) VUINT32 dA13[8][2];
> +        __declspec(align(64)) VUINT32 dA12[8][2];
> +        __declspec(align(64)) VUINT32 dA11[8][2];
> +        __declspec(align(64)) VUINT32 dA10[8][2];
> +        __declspec(align(64)) VUINT32 dA09[8][2];
> +        __declspec(align(64)) VUINT32 dA08[8][2];
> +        __declspec(align(64)) VUINT32 dA07[8][2];
> +        __declspec(align(64)) VUINT32 dA06[8][2];
> +        __declspec(align(64)) VUINT32 dA05[8][2];
> +        __declspec(align(64)) VUINT32 dA04[8][2];
> +        __declspec(align(64)) VUINT32 dA03[8][2];
> +        __declspec(align(64)) VUINT32 dA02[8][2];
> +        __declspec(align(64)) VUINT32 dA01[8][2];
> +        __declspec(align(64)) VUINT32 dA00[8][2];
> +        __declspec(align(64)) VUINT32 dSIGN_MASK[8][2];
> +        __declspec(align(64)) VUINT32 iCHK_WORK_SUB[16][1];
> +        __declspec(align(64)) VUINT32 iCHK_WORK_CMP[16][1];
> +        __declspec(align(64)) VUINT32 dABS_MASK[8][2];
> +        __declspec(align(64)) VUINT32 dZERO[8][2];
> +} __svml_datan2_data_internal;
> +#endif
> +__svml_datan2_data_internal:
> +        .quad 0x400921FB54442D18, 0x400921FB54442D18, 0x400921FB54442D18, 0x400921FB54442D18, 0x400921FB54442D18, 0x400921FB54442D18, 0x400921FB54442D18, 0x400921FB54442D18 //dPI
> +        .align 64
> +        .quad 0x3FF921FB54442D18, 0x3FF921FB54442D18, 0x3FF921FB54442D18, 0x3FF921FB54442D18, 0x3FF921FB54442D18, 0x3FF921FB54442D18, 0x3FF921FB54442D18, 0x3FF921FB54442D18 //dPIO2
> +        .align 64
> +        .quad 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3, 0xBEF4FDB537ABC7A3 // dA19
> +        .align 64
> +        .quad 0x3F2CED0A36665209, 0x3F2CED0A36665209, 0x3F2CED0A36665209, 0x3F2CED0A36665209, 0x3F2CED0A36665209, 0x3F2CED0A36665209, 0x3F2CED0A36665209, 0x3F2CED0A36665209 // dA18
> +        .align 64
> +        .quad 0xBF52E67C93954C23, 0xBF52E67C93954C23, 0xBF52E67C93954C23, 0xBF52E67C93954C23, 0xBF52E67C93954C23, 0xBF52E67C93954C23, 0xBF52E67C93954C23, 0xBF52E67C93954C23 // dA17
> +        .align 64
> +        .quad 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3, 0x3F6F5A1DAE82AFB3 // dA16
> +        .align 64
> +        .quad 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD, 0xBF82B2EC618E4BAD // dA15
> +        .align 64
> +        .quad 0x3F914F4C661116A5, 0x3F914F4C661116A5, 0x3F914F4C661116A5, 0x3F914F4C661116A5, 0x3F914F4C661116A5, 0x3F914F4C661116A5, 0x3F914F4C661116A5, 0x3F914F4C661116A5 // dA14
> +        .align 64
> +        .quad 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C, 0xBF9A5E83B081F69C // dA13
> +        .align 64
> +        .quad 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F, 0x3FA169980CB6AD4F // dA12
> +        .align 64
> +        .quad 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC, 0xBFA4EFA2E563C1BC // dA11
> +        .align 64
> +        .quad 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B, 0x3FA7EC0FBC50683B // dA10
> +        .align 64
> +        .quad 0xBFAAD261EAA09954, 0xBFAAD261EAA09954, 0xBFAAD261EAA09954, 0xBFAAD261EAA09954, 0xBFAAD261EAA09954, 0xBFAAD261EAA09954, 0xBFAAD261EAA09954, 0xBFAAD261EAA09954 // dA09
> +        .align 64
> +        .quad 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF, 0x3FAE1749BD612DCF // dA08
> +        .align 64
> +        .quad 0xBFB11084009435E0, 0xBFB11084009435E0, 0xBFB11084009435E0, 0xBFB11084009435E0, 0xBFB11084009435E0, 0xBFB11084009435E0, 0xBFB11084009435E0, 0xBFB11084009435E0 // dA07
> +        .align 64
> +        .quad 0x3FB3B12A49295651, 0x3FB3B12A49295651, 0x3FB3B12A49295651, 0x3FB3B12A49295651, 0x3FB3B12A49295651, 0x3FB3B12A49295651, 0x3FB3B12A49295651, 0x3FB3B12A49295651 // dA06
> +        .align 64
> +        .quad 0xBFB745D009BADA94, 0xBFB745D009BADA94, 0xBFB745D009BADA94, 0xBFB745D009BADA94, 0xBFB745D009BADA94, 0xBFB745D009BADA94, 0xBFB745D009BADA94, 0xBFB745D009BADA94 // dA05
> +        .align 64
> +        .quad 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5, 0x3FBC71C707F7D5B5 // dA04
> +        .align 64
> +        .quad 0xBFC2492491EE55C7, 0xBFC2492491EE55C7, 0xBFC2492491EE55C7, 0xBFC2492491EE55C7, 0xBFC2492491EE55C7, 0xBFC2492491EE55C7, 0xBFC2492491EE55C7, 0xBFC2492491EE55C7 // dA03
> +        .align 64
> +        .quad 0x3FC999999997EE34, 0x3FC999999997EE34, 0x3FC999999997EE34, 0x3FC999999997EE34, 0x3FC999999997EE34, 0x3FC999999997EE34, 0x3FC999999997EE34, 0x3FC999999997EE34 // dA02
> +        .align 64
> +        .quad 0xBFD55555555553C5, 0xBFD55555555553C5, 0xBFD55555555553C5, 0xBFD55555555553C5, 0xBFD55555555553C5, 0xBFD55555555553C5, 0xBFD55555555553C5, 0xBFD55555555553C5 // dA01
> +        .align 64
> +        .quad 0x3FF0000000000000, 0x3FF0000000000000, 0x3FF0000000000000, 0x3FF0000000000000, 0x3FF0000000000000, 0x3FF0000000000000, 0x3FF0000000000000, 0x3FF0000000000000 // dA00
> +        .align 64
> +        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000 //dSIGN_MASK
> +        .align 64
> +        .long 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000, 0x80300000 //iCHK_WORK_SUB
> +        .align 64
> +        .long 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000, 0xfdd00000 //iCHK_WORK_CMP
> +        .align 64
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff //dABS_MASK
> +        .align 64
> +        .quad 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000 //dZERO
> +        .align 64
> +        .type	__svml_datan2_data_internal,@object
> +        .size	__svml_datan2_data_internal,.-__svml_datan2_data_internal
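
For reference, the AUX_BRANCH above exists so that zero arguments (when not NaN) are resolved in-vector instead of through the callout; the values it must produce are just the C99/IEEE atan2 special cases. In scalar form, illustration only, assuming at least one of y, x is +-0 and neither is NaN:

#include <math.h>

/* Illustration of the required zero-argument results.  */
static double
atan2_zero_cases (double y, double x)
{
  if (y == 0.0)                                 /* y is +0 or -0 */
    return copysign (signbit (x) ? M_PI : 0.0, y);
  return copysign (M_PI_2, y);                  /* x is +-0, y != 0 */
}
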
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core-avx2.S
> new file mode 100644
> index 0000000000..a2a76e8bfd
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized atan2f.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN16vv_atan2f _ZGVeN16vv_atan2f_avx2_wrapper
> +#include "../svml_s_atan2f16_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core.c
> new file mode 100644
> index 0000000000..6fa806414d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized atan2f, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN16vv_atan2f
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN16vv_atan2f, __GI__ZGVeN16vv_atan2f,
> +	       __redirect__ZGVeN16vv_atan2f)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core_avx512.S
> new file mode 100644
> index 0000000000..f3477cc8e6
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f16_core_avx512.S
> @@ -0,0 +1,399 @@
> +/* Function atan2f vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
> + *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
> + *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
> + *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
> + *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
> + *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
> + *
> + *
> + */
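
The description above covers the atan() reduction; for atan2f the main path applies it to a quotient of the two arguments and then folds the signs back in. A rough scalar model of that main path, illustration only (library atanf stands in for the polynomial; zero/NaN/out-of-range lanes are handled by the auxiliary branch instead):

#include <math.h>

static float
atan2f_main_path (float y, float x)
{
  float ay = fabsf (y), ax = fabsf (x);
  /* 1) |y| <  |x|: a =  |y|, b = |x|, PIO2 = 0
     2) |y| >= |x|: a = -|x|, b = |y|, PIO2 = pi/2  */
  float a = ay < ax ? ay : -ax;
  float b = ay < ax ? ax : ay;
  float pio2 = ay < ax ? 0.0f : (float) M_PI_2;

  float r = atanf (a / b) + pio2;   /* the kernel uses the polynomial */
  r = copysignf (r, x);             /* fold in the sign of x */
  if (signbit (x))
    r += (float) M_PI;              /* x < 0: shift by pi */
  return copysignf (r, y);          /* final sign follows y */
}
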
> +
> +/* Offsets for data table __svml_satan2_data_internal
> + */
> +#define sZERO                         	0
> +#define sONE                          	64
> +#define sSIGN_MASK                    	128
> +#define sABS_MASK                     	192
> +#define sPIO2                         	256
> +#define sPI                           	320
> +#define sPC8                          	384
> +#define sPC7                          	448
> +#define sPC6                          	512
> +#define sPC5                          	576
> +#define sPC4                          	640
> +#define sPC3                          	704
> +#define sPC2                          	768
> +#define sPC1                          	832
> +#define sPC0                          	896
> +#define iCHK_WORK_SUB                 	960
> +#define iCHK_WORK_CMP                 	1024
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN16vv_atan2f_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $256, %rsp
> +        xorl      %edx, %edx
> +
> +/*
> + * #define NO_VECTOR_ZERO_ATAN2_ARGS
> + *  Declarations
> + * Variables
> + * Constants
> + *  The end of declarations
> + *  Implementation
> + * Argument signs
> + */
> +        vmovups   sABS_MASK+__svml_satan2_data_internal(%rip), %zmm6
> +        vmovups   sONE+__svml_satan2_data_internal(%rip), %zmm3
> +
> +/* Testing on working interval. */
> +        vmovups   iCHK_WORK_SUB+__svml_satan2_data_internal(%rip), %zmm9
> +        vmovups   iCHK_WORK_CMP+__svml_satan2_data_internal(%rip), %zmm14
> +
> +/*
> + * 1) If y<x then a= y, b=x, PIO2=0
> + * 2) If y>x then a=-x, b=y, PIO2=Pi/2
> + */
> +        vmovups   sPIO2+__svml_satan2_data_internal(%rip), %zmm4
> +        vpternlogd $255, %zmm13, %zmm13, %zmm13
> +        vmovaps   %zmm1, %zmm8
> +        vandps    %zmm6, %zmm8, %zmm2
> +        vandps    %zmm6, %zmm0, %zmm1
> +        vorps     sSIGN_MASK+__svml_satan2_data_internal(%rip), %zmm2, %zmm5
> +        vpsubd    %zmm9, %zmm2, %zmm10
> +        vpsubd    %zmm9, %zmm1, %zmm12
> +        vxorps    %zmm2, %zmm8, %zmm7
> +        vxorps    %zmm1, %zmm0, %zmm6
> +        vcmpps    $17, {sae}, %zmm2, %zmm1, %k1
> +        vpcmpgtd  %zmm10, %zmm14, %k2
> +        vpcmpgtd  %zmm12, %zmm14, %k3
> +        vmovups   sPC6+__svml_satan2_data_internal(%rip), %zmm14
> +        vblendmps %zmm1, %zmm5, %zmm11{%k1}
> +        vblendmps %zmm2, %zmm1, %zmm5{%k1}
> +        vxorps    %zmm4, %zmm4, %zmm4{%k1}
> +
> +/*
> + * Division a/b.
> + * Enabled when FMA is available and
> + * performance is better with NR iteration
> + */
> +        vrcp14ps  %zmm5, %zmm15
> +        vfnmadd231ps {rn-sae}, %zmm5, %zmm15, %zmm3
> +        vfmadd213ps {rn-sae}, %zmm15, %zmm3, %zmm15
> +        vmulps    {rn-sae}, %zmm15, %zmm11, %zmm3
> +        vfnmadd231ps {rn-sae}, %zmm5, %zmm3, %zmm11
> +        vfmadd213ps {rn-sae}, %zmm3, %zmm11, %zmm15
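
The six instructions above implement the a/b division by reciprocal refinement: vrcp14ps gives roughly 14 bits, one Newton step refines the reciprocal, and a final residual correction refines the quotient. The same sequence written with intrinsics, illustration only (the kernel additionally applies {rn-sae} rounding control; compile with -mavx512f):

#include <immintrin.h>

static __m512
div_nr (__m512 a, __m512 b)
{
  __m512 one = _mm512_set1_ps (1.0f);
  __m512 r0  = _mm512_rcp14_ps (b);            /* r0 ~= 1/b            */
  __m512 e   = _mm512_fnmadd_ps (b, r0, one);  /* e   = 1 - b*r0       */
  __m512 r1  = _mm512_fmadd_ps (r0, e, r0);    /* r1  = r0 + r0*e      */
  __m512 q0  = _mm512_mul_ps (a, r1);          /* q0 ~= a/b            */
  __m512 rem = _mm512_fnmadd_ps (b, q0, a);    /* rem = a - b*q0       */
  return _mm512_fmadd_ps (r1, rem, q0);        /* q   = q0 + r1*rem    */
}
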
> +        vmovups   sPC8+__svml_satan2_data_internal(%rip), %zmm11
> +        vpternlogd $255, %zmm3, %zmm3, %zmm3
> +
> +/* Polynomial. */
> +        vmulps    {rn-sae}, %zmm15, %zmm15, %zmm9
> +        vpandnd   %zmm10, %zmm10, %zmm13{%k2}
> +        vmulps    {rn-sae}, %zmm9, %zmm9, %zmm10
> +        vfmadd231ps {rn-sae}, %zmm10, %zmm11, %zmm14
> +        vmovups   sPC5+__svml_satan2_data_internal(%rip), %zmm11
> +        vpandnd   %zmm12, %zmm12, %zmm3{%k3}
> +        vpord     %zmm3, %zmm13, %zmm3
> +        vmovups   sPC4+__svml_satan2_data_internal(%rip), %zmm13
> +        vmovups   sPC7+__svml_satan2_data_internal(%rip), %zmm12
> +        vptestmd  %zmm3, %zmm3, %k0
> +        vfmadd213ps {rn-sae}, %zmm13, %zmm10, %zmm14
> +        vfmadd231ps {rn-sae}, %zmm10, %zmm12, %zmm11
> +        vmovups   sPC3+__svml_satan2_data_internal(%rip), %zmm12
> +        vmovups   sPC2+__svml_satan2_data_internal(%rip), %zmm13
> +
> +/*  Special branch for fast (vector) processing of zero arguments  */
> +        kortestw  %k0, %k0
> +        vfmadd213ps {rn-sae}, %zmm12, %zmm10, %zmm11
> +        vmovups   sPC1+__svml_satan2_data_internal(%rip), %zmm12
> +        vfmadd213ps {rn-sae}, %zmm13, %zmm10, %zmm14
> +        vmovups   sPC0+__svml_satan2_data_internal(%rip), %zmm13
> +        vfmadd213ps {rn-sae}, %zmm12, %zmm10, %zmm11
> +        vfmadd213ps {rn-sae}, %zmm13, %zmm10, %zmm14
> +        vfmadd213ps {rn-sae}, %zmm14, %zmm9, %zmm11
> +
> +/* Reconstruction. */
> +        vfmadd213ps {rn-sae}, %zmm4, %zmm15, %zmm11
> +
> +/* if x<0, sPI = Pi, else sPI =0 */
> +        vmovups   __svml_satan2_data_internal(%rip), %zmm15
> +        vorps     %zmm7, %zmm11, %zmm9
> +        vcmpps    $18, {sae}, %zmm15, %zmm8, %k4
> +        vmovups   sPI+__svml_satan2_data_internal(%rip), %zmm11
> +        vaddps    {rn-sae}, %zmm11, %zmm9, %zmm9{%k4}
> +        vorps     %zmm6, %zmm9, %zmm10
> +
> +/* Go to auxiliary branch */
> +        jne       L(AUX_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1 zmm2 zmm3 zmm4 zmm5 zmm6 zmm7 zmm8 zmm10 zmm11
> +
> +/* Return from auxiliary branch
> + * for out of main path inputs
> + */
> +
> +L(AUX_BRANCH_RETURN):
> +/*
> + *  Special branch for fast (vector) processing of zero arguments
> + *  The end of implementation
> + */
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm8 zmm10
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        vmovaps   %zmm10, %zmm0
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm0, 64(%rsp)
> +        vmovups   %zmm8, 128(%rsp)
> +        vmovups   %zmm10, 192(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm10
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -240; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -248; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -256; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $16, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   192(%rsp), %zmm10
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -240; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x10, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -248; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x08, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -256; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x00, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm10
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     64(%rsp,%r14,4), %xmm0
> +        movss     128(%rsp,%r14,4), %xmm1
> +        call      atan2f@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 192(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +        cfi_restore(12)
> +        cfi_restore(13)
> +        cfi_restore(14)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Auxiliary branch
> + * for out of main path inputs
> + */
> +
> +L(AUX_BRANCH):
> +/* Check if at least one of X or Y is zero: iAXAYZERO */
> +        vmovups   __svml_satan2_data_internal(%rip), %zmm9
> +
> +/* Check if both X & Y are not NaNs:  iXYnotNAN */
> +        vcmpps    $3, {sae}, %zmm8, %zmm8, %k1
> +        vcmpps    $3, {sae}, %zmm0, %zmm0, %k2
> +        vpcmpd    $4, %zmm9, %zmm2, %k3
> +        vpcmpd    $4, %zmm9, %zmm1, %k4
> +
> +/*
> + *  Path for zero arguments (at least one of both)
> + * Check if both args are zeros (den. is zero)
> + */
> +        vcmpps    $4, {sae}, %zmm9, %zmm5, %k5
> +
> +/* Res = sign(Y)*(X<0)?(PIO2+PI):PIO2 */
> +        vpcmpgtd  %zmm8, %zmm9, %k6
> +        vpternlogd $255, %zmm14, %zmm14, %zmm14
> +        vpternlogd $255, %zmm12, %zmm12, %zmm12
> +        vpternlogd $255, %zmm13, %zmm13, %zmm13
> +        vpandnd   %zmm2, %zmm2, %zmm14{%k3}
> +        vpternlogd $255, %zmm2, %zmm2, %zmm2
> +        vpandnd   %zmm1, %zmm1, %zmm2{%k4}
> +        vpord     %zmm2, %zmm14, %zmm15
> +        vpternlogd $255, %zmm2, %zmm2, %zmm2
> +        vpandnd   %zmm5, %zmm5, %zmm2{%k5}
> +
> +/* Set sPIO2 to zero if den. is zero */
> +        vpandnd   %zmm4, %zmm2, %zmm4
> +        vpandd    %zmm2, %zmm9, %zmm5
> +        vpord     %zmm5, %zmm4, %zmm2
> +        vorps     %zmm7, %zmm2, %zmm7
> +        vaddps    {rn-sae}, %zmm11, %zmm7, %zmm7{%k6}
> +        vorps     %zmm6, %zmm7, %zmm6
> +        vpandnd   %zmm8, %zmm8, %zmm12{%k1}
> +        vpandnd   %zmm0, %zmm0, %zmm13{%k2}
> +        vandps    %zmm13, %zmm12, %zmm12
> +
> +/* Check if at least one of X or Y is zero and not NaN: iAXAYZEROnotNAN */
> +        vpandd    %zmm12, %zmm15, %zmm1
> +
> +/* Exclude from previous callout mask zero (and not NaN) arguments */
> +        vpandnd   %zmm3, %zmm1, %zmm3
> +
> +/* Go to callout */
> +        vptestmd  %zmm3, %zmm3, %k0
> +        kmovw     %k0, %edx
> +
> +/* Merge results from main and spec path */
> +        vpandnd   %zmm10, %zmm1, %zmm10
> +        vpandd    %zmm1, %zmm6, %zmm11
> +        vpord     %zmm11, %zmm10, %zmm10
> +
> +/* Return to main vector processing path */
> +        jmp       L(AUX_BRANCH_RETURN)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm8 zmm10
> +END(_ZGVeN16vv_atan2f_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_satan2_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 sZERO[16][1];
> +        __declspec(align(64)) VUINT32 sONE[16][1];
> +        __declspec(align(64)) VUINT32 sSIGN_MASK[16][1];
> +        __declspec(align(64)) VUINT32 sABS_MASK[16][1];
> +        __declspec(align(64)) VUINT32 sPIO2[16][1];
> +        __declspec(align(64)) VUINT32 sPI[16][1];
> +        __declspec(align(64)) VUINT32 sPC8[16][1];
> +        __declspec(align(64)) VUINT32 sPC7[16][1];
> +        __declspec(align(64)) VUINT32 sPC6[16][1];
> +        __declspec(align(64)) VUINT32 sPC5[16][1];
> +        __declspec(align(64)) VUINT32 sPC4[16][1];
> +        __declspec(align(64)) VUINT32 sPC3[16][1];
> +        __declspec(align(64)) VUINT32 sPC2[16][1];
> +        __declspec(align(64)) VUINT32 sPC1[16][1];
> +        __declspec(align(64)) VUINT32 sPC0[16][1];
> +        __declspec(align(64)) VUINT32 iCHK_WORK_SUB[16][1];
> +        __declspec(align(64)) VUINT32 iCHK_WORK_CMP[16][1];
> +} __svml_satan2_data_internal;
> +#endif
> +__svml_satan2_data_internal:
> +        .long 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 // sZERO
> +        .align 64
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000 // sONE
> +        .align 64
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000 // sSIGN_MASK
> +        .align 64
> +        .long 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF // sABS_MASK
> +        .align 64
> +        .long 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB // sPIO2
> +        .align 64
> +        .long 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB // sPI
> +        .align 64
> +        .long 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0 // sA08
> +        .align 64
> +        .long 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631 // sA07
> +        .align 64
> +        .long 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384 // sA06
> +        .align 64
> +        .long 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629 // sA05
> +        .align 64
> +        .long 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474 // sA04
> +        .align 64
> +        .long 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8 // sA03
> +        .align 64
> +        .long 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F // sA02
> +        .align 64
> +        .long 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49 // sA01
> +        .align 64
> +        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000 // sA00
> +        .align 64
> +        .long 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000 //iCHK_WORK_SUB
> +        .align 64
> +        .long 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000 //iCHK_WORK_CMP
> +        .align 64
> +        .type	__svml_satan2_data_internal,@object
> +        .size	__svml_satan2_data_internal,.-__svml_satan2_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core-sse2.S
> new file mode 100644
> index 0000000000..d1a67facf1
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized atan2f.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN4vv_atan2f _ZGVbN4vv_atan2f_sse2
> +#include "../svml_s_atan2f4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core.c
> new file mode 100644
> index 0000000000..ee882b0557
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized atan2f, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN4vv_atan2f
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN4vv_atan2f, __GI__ZGVbN4vv_atan2f,
> +	       __redirect__ZGVbN4vv_atan2f)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
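
For readers unfamiliar with the ifunc wrappers: IFUNC_SELECTOR() comes from ifunc-mathvec-sse4_1.h, which is not shown in this hunk. A hypothetical sketch of its shape, assuming the usual glibc cpu-features machinery (__get_cpu_features, CPU_FEATURE_USABLE_P, OPTIMIZE) and not the actual file contents:

/* Hypothetical sketch only; the real selector lives in
   ifunc-mathvec-sse4_1.h.  */
static inline void *
IFUNC_SELECTOR (void)
{
  const struct cpu_features *cpu_features = __get_cpu_features ();

  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_1))
    return OPTIMIZE (sse4);

  return OPTIMIZE (sse2);
}
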
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core_sse4.S
> new file mode 100644
> index 0000000000..e4fbe82501
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f4_core_sse4.S
> @@ -0,0 +1,384 @@
> +/* Function atan2f vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
> + *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
> + *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
> + *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
> + *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
> + *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
> + *
> + *
> + */
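
As in the AVX-512 kernel, the sPC8 ... sPC0 table entries below are coefficients of an even polynomial in r^2 that multiplies r, and the two interleaved mulps/addps chains are a split Horner evaluation of it. A scalar sketch, illustrative only (sPC[] stands for the table constants, with sPC0 == 1.0f):

/* atan(r) ~= r * (sPC0 + sPC1*r^2 + ... + sPC8*r^16).
   sPC[0] corresponds to sPC8, sPC[8] to sPC0.  */
static float
atanf_poly (const float sPC[9], float r)
{
  float r2 = r * r;
  float p = sPC[0];
  for (int i = 1; i < 9; i++)
    p = p * r2 + sPC[i];
  return r * p;                 /* PIO2 and sign handling follow */
}
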
> +
> +/* Offsets for data table __svml_satan2_data_internal
> + */
> +#define sZERO                         	0
> +#define sSIGN_MASK                    	16
> +#define sABS_MASK                     	32
> +#define sPIO2                         	48
> +#define sPI                           	64
> +#define sPC8                          	80
> +#define sPC7                          	96
> +#define sPC6                          	112
> +#define sPC5                          	128
> +#define sPC4                          	144
> +#define sPC3                          	160
> +#define sPC2                          	176
> +#define sPC1                          	192
> +#define sPC0                          	208
> +#define iCHK_WORK_SUB                 	224
> +#define iCHK_WORK_CMP                 	240
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN4vv_atan2f_sse4)
> +        subq      $88, %rsp
> +        cfi_def_cfa_offset(96)
> +        movaps    %xmm0, %xmm12
> +
> +/*
> + * #define NO_VECTOR_ZERO_ATAN2_ARGS
> + *  Declarations
> + * Variables
> + * Constants
> + *  The end of declarations
> + *  Implementation
> + * Argument signs
> + */
> +        movups    sABS_MASK+__svml_satan2_data_internal(%rip), %xmm10
> +        movaps    %xmm1, %xmm13
> +        movaps    %xmm10, %xmm11
> +        andps     %xmm12, %xmm10
> +        andps     %xmm13, %xmm11
> +        movaps    %xmm10, %xmm7
> +        cmpltps   %xmm11, %xmm7
> +
> +/*
> + * 1) If y<x then a= y, b=x, PIO2=0
> + * 2) If y>x then a=-x, b=y, PIO2=Pi/2
> + */
> +        movups    sSIGN_MASK+__svml_satan2_data_internal(%rip), %xmm6
> +        movaps    %xmm7, %xmm0
> +        orps      %xmm11, %xmm6
> +        movaps    %xmm10, %xmm4
> +        andnps    %xmm6, %xmm0
> +        movaps    %xmm7, %xmm6
> +        movaps    %xmm11, %xmm5
> +        andps     %xmm7, %xmm4
> +        andnps    %xmm10, %xmm6
> +        andps     %xmm7, %xmm5
> +        orps      %xmm4, %xmm0
> +        orps      %xmm5, %xmm6
> +
> +/* Division a/b. */
> +        divps     %xmm6, %xmm0
> +
> +/* Testing on working interval. */
> +        movdqu    iCHK_WORK_SUB+__svml_satan2_data_internal(%rip), %xmm14
> +        movaps    %xmm11, %xmm15
> +        movaps    %xmm10, %xmm3
> +        psubd     %xmm14, %xmm15
> +        psubd     %xmm14, %xmm3
> +        movdqa    %xmm15, %xmm1
> +        movdqu    iCHK_WORK_CMP+__svml_satan2_data_internal(%rip), %xmm2
> +        movdqa    %xmm3, %xmm14
> +        pcmpgtd   %xmm2, %xmm1
> +        pcmpeqd   %xmm2, %xmm15
> +        pcmpgtd   %xmm2, %xmm14
> +        pcmpeqd   %xmm2, %xmm3
> +        por       %xmm15, %xmm1
> +        por       %xmm3, %xmm14
> +        por       %xmm14, %xmm1
> +
> +/* Polynomial. */
> +        movaps    %xmm0, %xmm14
> +        mulps     %xmm0, %xmm14
> +        movaps    %xmm13, %xmm4
> +        movmskps  %xmm1, %ecx
> +        movaps    %xmm14, %xmm15
> +        movaps    %xmm11, %xmm9
> +        mulps     %xmm14, %xmm15
> +        pxor      %xmm13, %xmm9
> +        movups    sPC8+__svml_satan2_data_internal(%rip), %xmm2
> +        movaps    %xmm10, %xmm8
> +        mulps     %xmm15, %xmm2
> +        pxor      %xmm12, %xmm8
> +        movups    sPC7+__svml_satan2_data_internal(%rip), %xmm3
> +        xorl      %edx, %edx
> +        mulps     %xmm15, %xmm3
> +        addps     sPC6+__svml_satan2_data_internal(%rip), %xmm2
> +        mulps     %xmm15, %xmm2
> +        addps     sPC5+__svml_satan2_data_internal(%rip), %xmm3
> +        mulps     %xmm15, %xmm3
> +        addps     sPC4+__svml_satan2_data_internal(%rip), %xmm2
> +        mulps     %xmm15, %xmm2
> +        addps     sPC3+__svml_satan2_data_internal(%rip), %xmm3
> +        mulps     %xmm15, %xmm3
> +        addps     sPC2+__svml_satan2_data_internal(%rip), %xmm2
> +        mulps     %xmm2, %xmm15
> +        addps     sPC1+__svml_satan2_data_internal(%rip), %xmm3
> +        mulps     %xmm3, %xmm14
> +        addps     sPC0+__svml_satan2_data_internal(%rip), %xmm15
> +
> +/* if x<0, sPI = Pi, else sPI =0 */
> +        movups    __svml_satan2_data_internal(%rip), %xmm5
> +        xorl      %eax, %eax
> +        andnps    sPIO2+__svml_satan2_data_internal(%rip), %xmm7
> +        addps     %xmm14, %xmm15
> +        cmpleps   %xmm5, %xmm4
> +
> +/* Reconstruction. */
> +        mulps     %xmm15, %xmm0
> +        andps     sPI+__svml_satan2_data_internal(%rip), %xmm4
> +        addps     %xmm7, %xmm0
> +        orps      %xmm9, %xmm0
> +        addps     %xmm4, %xmm0
> +        orps      %xmm8, %xmm0
> +
> +/*  Special branch for fast (vector) processing of zero arguments  */
> +        testl     %ecx, %ecx
> +
> +/* Go to auxiliary branch */
> +        jne       L(AUX_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 eax edx xmm0 xmm1 xmm4 xmm5 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13
> +
> +/* Return from auxiliary branch
> + * for out of main path inputs
> + */
> +
> +L(AUX_BRANCH_RETURN):
> +/*
> + *  Special branch for fast (vector) processing of zero arguments
> + *  The end of implementation
> + */
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 eax edx xmm0 xmm12 xmm13
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        addq      $88, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(96)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm12, 32(%rsp)
> +        movups    %xmm13, 48(%rsp)
> +        movups    %xmm0, 64(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 eax edx xmm0
> +
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -80)
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -88)
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -96)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    64(%rsp), %xmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -80)
> +        cfi_offset(13, -88)
> +        cfi_offset(14, -96)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        movss     48(%rsp,%r14,4), %xmm1
> +        call      atan2f@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 64(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +        cfi_restore(12)
> +        cfi_restore(13)
> +        cfi_restore(14)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Auxiliary branch
> + * for out of main path inputs
> + */
> +
> +L(AUX_BRANCH):
> +/* Check if both X & Y are not NaNs:  iXYnotNAN */
> +        movaps    %xmm13, %xmm3
> +        movaps    %xmm12, %xmm2
> +        cmpordps  %xmm13, %xmm3
> +        cmpordps  %xmm12, %xmm2
> +
> +/*
> + *  Path for zero arguments (at least one of both)
> + * Check if both args are zeros (den. is zero)
> + */
> +        cmpeqps   %xmm5, %xmm6
> +
> +/* Check if at least one of X or Y is zero: iAXAYZERO */
> +        pcmpeqd   %xmm5, %xmm11
> +        pcmpeqd   %xmm5, %xmm10
> +        andps     %xmm2, %xmm3
> +        por       %xmm10, %xmm11
> +
> +/* Check if at least one of X or Y is zero and not NaN: iAXAYZEROnotNAN */
> +        andps     %xmm3, %xmm11
> +
> +/* Exclude from previous callout mask zero (and not NaN) arguments */
> +        movaps    %xmm11, %xmm10
> +        pandn     %xmm1, %xmm10
> +
> +/* Set sPIO2 to zero if den. is zero */
> +        movaps    %xmm6, %xmm1
> +        andnps    %xmm7, %xmm1
> +        andps     %xmm5, %xmm6
> +        orps      %xmm6, %xmm1
> +
> +/* Res = sign(Y)*(X<0)?(PIO2+PI):PIO2 */
> +        pcmpgtd   %xmm13, %xmm5
> +        orps      %xmm9, %xmm1
> +        andps     %xmm4, %xmm5
> +
> +/* Merge results from main and spec path */
> +        movaps    %xmm11, %xmm4
> +        addps     %xmm5, %xmm1
> +
> +/* Go to callout */
> +        movmskps  %xmm10, %edx
> +        orps      %xmm8, %xmm1
> +        andnps    %xmm0, %xmm4
> +        andps     %xmm11, %xmm1
> +        movaps    %xmm4, %xmm0
> +        orps      %xmm1, %xmm0
> +
> +/* Return to main vector processing path */
> +        jmp       L(AUX_BRANCH_RETURN)
> +                                # LOE rbx rbp r12 r13 r14 r15 eax edx xmm0 xmm12 xmm13
> +END(_ZGVbN4vv_atan2f_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_satan2_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(16)) VUINT32 sZERO[4][1];
> +        __declspec(align(16)) VUINT32 sSIGN_MASK[4][1];
> +        __declspec(align(16)) VUINT32 sABS_MASK[4][1];
> +        __declspec(align(16)) VUINT32 sPIO2[4][1];
> +        __declspec(align(16)) VUINT32 sPI[4][1];
> +        __declspec(align(16)) VUINT32 sPC8[4][1];
> +        __declspec(align(16)) VUINT32 sPC7[4][1];
> +        __declspec(align(16)) VUINT32 sPC6[4][1];
> +        __declspec(align(16)) VUINT32 sPC5[4][1];
> +        __declspec(align(16)) VUINT32 sPC4[4][1];
> +        __declspec(align(16)) VUINT32 sPC3[4][1];
> +        __declspec(align(16)) VUINT32 sPC2[4][1];
> +        __declspec(align(16)) VUINT32 sPC1[4][1];
> +        __declspec(align(16)) VUINT32 sPC0[4][1];
> +        __declspec(align(16)) VUINT32 iCHK_WORK_SUB[4][1];
> +        __declspec(align(16)) VUINT32 iCHK_WORK_CMP[4][1];
> +} __svml_satan2_data_internal;
> +#endif
> +__svml_satan2_data_internal:
> +        .long 0x00000000, 0x00000000, 0x00000000, 0x00000000 // sZERO
> +        .align 16
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000 // sSIGN_MASK
> +        .align 16
> +        .long 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF // sABS_MASK
> +        .align 16
> +        .long 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB // sPIO2
> +        .align 16
> +        .long 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB // sPI
> +        .align 16
> +        .long 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0 // sA08
> +        .align 16
> +        .long 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631 // sA07
> +        .align 16
> +        .long 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384 // sA06
> +        .align 16
> +        .long 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629 // sA05
> +        .align 16
> +        .long 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474 // sA04
> +        .align 16
> +        .long 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8 // sA03
> +        .align 16
> +        .long 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F // sA02
> +        .align 16
> +        .long 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49 // sA01
> +        .align 16
> +        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000 // sA00
> +        .align 16
> +        .long 0x81000000, 0x81000000, 0x81000000, 0x81000000 //iCHK_WORK_SUB
> +        .align 16
> +        .long 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000 //iCHK_WORK_CMP
> +        .align 16
> +        .type	__svml_satan2_data_internal,@object
> +        .size	__svml_satan2_data_internal,.-__svml_satan2_data_internal
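
A usage note, not part of the patch: once these GNU vector ABI entry points are in place, a compiler can route a vectorized loop to _ZGVbN4vv_atan2f, _ZGVdN8vv_atan2f or _ZGVeN16vv_atan2f. For example, assuming GCC with something like -O2 -ffast-math -fopenmp-simd (so the omp-declare-simd declarations from math-vector.h apply) and a suitable -march:

#include <math.h>

void
polar_angles (const float *restrict y, const float *restrict x,
              float *restrict t, int n)
{
  /* With the flags above, the scalar atan2f calls in this loop may be
     replaced by the libmvec vector variants added in this series.  */
#pragma omp simd
  for (int i = 0; i < n; i++)
    t[i] = atan2f (y[i], x[i]);
}
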
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core-sse.S
> new file mode 100644
> index 0000000000..21b1d3ff63
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized atan2f.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN8vv_atan2f _ZGVdN8vv_atan2f_sse_wrapper
> +#include "../svml_s_atan2f8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core.c
> new file mode 100644
> index 0000000000..7e02050983
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized atan2f, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN8vv_atan2f
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN8vv_atan2f, __GI__ZGVdN8vv_atan2f,
> +	       __redirect__ZGVdN8vv_atan2f)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core_avx2.S
> new file mode 100644
> index 0000000000..2e6e5eb71c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_atan2f8_core_avx2.S
> @@ -0,0 +1,362 @@
> +/* Function atan2f vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *      For    0.0    <= x <=  7.0/16.0: atan(x) = atan(0.0) + atan(s), where s=(x-0.0)/(1.0+0.0*x)
> + *      For  7.0/16.0 <= x <= 11.0/16.0: atan(x) = atan(0.5) + atan(s), where s=(x-0.5)/(1.0+0.5*x)
> + *      For 11.0/16.0 <= x <= 19.0/16.0: atan(x) = atan(1.0) + atan(s), where s=(x-1.0)/(1.0+1.0*x)
> + *      For 19.0/16.0 <= x <= 39.0/16.0: atan(x) = atan(1.5) + atan(s), where s=(x-1.5)/(1.0+1.5*x)
> + *      For 39.0/16.0 <= x <=    inf   : atan(x) = atan(inf) + atan(s), where s=-1.0/x
> + *      Where atan(s) ~= s+s^3*Poly11(s^2) on interval |s|<7.0/16.0.
> + *
> + *
> + */
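
The code below reduces atan2f to a single quotient plus a polynomial in its
square (see the "1) If y<x ..." and "Reconstruction" comments further down).
A scalar C model of that scheme, for reference only (not the glibc code;
c[0..8] stand for the sPC0..sPC8 table coefficients, and the zero/NaN lanes
handled later in this file are ignored):

#include <math.h>

static float
atan2f_model (float y, float x, const float c[9])
{
  float ay = fabsf (y), ax = fabsf (x);
  float a, b, pio2;
  if (ay < ax)
    { a = ay;  b = ax; pio2 = 0.0f; }            /* |y| < |x|  */
  else
    { a = -ax; b = ay; pio2 = (float) M_PI_2; }  /* |y| >= |x| */
  float t = a / b;                               /* |t| <= 1   */
  float t2 = t * t;
  float p = c[8];                   /* Horner; the asm uses two FMA chains */
  for (int i = 7; i >= 0; i--)
    p = p * t2 + c[i];
  float r = t * p + pio2;                        /* atan (|y| / |x|)   */
  if (x < 0.0f)
    r = (float) M_PI - r;                        /* left half-plane    */
  return copysignf (r, y);                       /* restore sign of y  */
}
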
> +
> +/* Offsets for data table __svml_satan2_data_internal
> + */
> +#define sZERO                         	0
> +#define sSIGN_MASK                    	32
> +#define sABS_MASK                     	64
> +#define sPIO2                         	96
> +#define sPI                           	128
> +#define sPC8                          	160
> +#define sPC7                          	192
> +#define sPC6                          	224
> +#define sPC5                          	256
> +#define sPC4                          	288
> +#define sPC3                          	320
> +#define sPC2                          	352
> +#define sPC1                          	384
> +#define sPC0                          	416
> +#define iCHK_WORK_SUB                 	448
> +#define iCHK_WORK_CMP                 	480
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN8vv_atan2f_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $128, %rsp
> +        xorl      %edx, %edx
> +
> +/* Argument signs.  */
> +        vmovups   sABS_MASK+__svml_satan2_data_internal(%rip), %ymm2
> +
> +/* Testing on working interval. */
> +        vmovups   iCHK_WORK_SUB+__svml_satan2_data_internal(%rip), %ymm15
> +        vmovups   iCHK_WORK_CMP+__svml_satan2_data_internal(%rip), %ymm9
> +
> +/* if x<0, sPI = Pi, else sPI =0 */
> +        vmovups   __svml_satan2_data_internal(%rip), %ymm5
> +        vmovaps   %ymm1, %ymm7
> +        vandps    %ymm2, %ymm7, %ymm13
> +        vandps    %ymm2, %ymm0, %ymm12
> +        vcmplt_oqps %ymm13, %ymm12, %ymm4
> +        vcmple_oqps %ymm5, %ymm7, %ymm6
> +        vpsubd    %ymm15, %ymm13, %ymm10
> +        vpsubd    %ymm15, %ymm12, %ymm8
> +
> +/*
> + * 1) If y<x then a= y, b=x, PIO2=0
> + * 2) If y>x then a=-x, b=y, PIO2=Pi/2
> + */
> +        vorps     sSIGN_MASK+__svml_satan2_data_internal(%rip), %ymm13, %ymm3
> +        vblendvps %ymm4, %ymm12, %ymm3, %ymm14
> +        vblendvps %ymm4, %ymm13, %ymm12, %ymm3
> +
> +/* Division a/b. */
> +        vdivps    %ymm3, %ymm14, %ymm11
> +        vpcmpgtd  %ymm9, %ymm10, %ymm14
> +        vpcmpeqd  %ymm9, %ymm10, %ymm15
> +        vpor      %ymm15, %ymm14, %ymm10
> +        vmovups   sPC7+__svml_satan2_data_internal(%rip), %ymm15
> +        vpcmpgtd  %ymm9, %ymm8, %ymm14
> +        vpcmpeqd  %ymm9, %ymm8, %ymm8
> +        vpor      %ymm8, %ymm14, %ymm9
> +        vmovups   sPC8+__svml_satan2_data_internal(%rip), %ymm14
> +        vpor      %ymm9, %ymm10, %ymm10
> +
> +/* Polynomial. */
> +        vmulps    %ymm11, %ymm11, %ymm9
> +        vmulps    %ymm9, %ymm9, %ymm8
> +        vfmadd213ps sPC6+__svml_satan2_data_internal(%rip), %ymm8, %ymm14
> +        vfmadd213ps sPC5+__svml_satan2_data_internal(%rip), %ymm8, %ymm15
> +        vfmadd213ps sPC4+__svml_satan2_data_internal(%rip), %ymm8, %ymm14
> +        vfmadd213ps sPC3+__svml_satan2_data_internal(%rip), %ymm8, %ymm15
> +        vfmadd213ps sPC2+__svml_satan2_data_internal(%rip), %ymm8, %ymm14
> +        vfmadd213ps sPC1+__svml_satan2_data_internal(%rip), %ymm8, %ymm15
> +        vfmadd213ps sPC0+__svml_satan2_data_internal(%rip), %ymm8, %ymm14
> +        vfmadd213ps %ymm14, %ymm9, %ymm15
> +        vandnps   sPIO2+__svml_satan2_data_internal(%rip), %ymm4, %ymm4
> +
> +/* Reconstruction. */
> +        vfmadd213ps %ymm4, %ymm11, %ymm15
> +        vxorps    %ymm13, %ymm7, %ymm1
> +        vandps    sPI+__svml_satan2_data_internal(%rip), %ymm6, %ymm6
> +        vorps     %ymm1, %ymm15, %ymm11
> +        vaddps    %ymm11, %ymm6, %ymm8
> +        vmovmskps %ymm10, %eax
> +        vxorps    %ymm12, %ymm0, %ymm2
> +        vorps     %ymm2, %ymm8, %ymm9
> +
> +/*  Special branch for fast (vector) processing of zero arguments  */
> +        testl     %eax, %eax
> +
> +/* Go to auxiliary branch */
> +        jne       L(AUX_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm1 ymm2 ymm3 ymm4 ymm5 ymm6 ymm7 ymm9 ymm10 ymm12 ymm13
> +
> +/* Return from auxiliary branch
> + * for out of main path inputs
> + */
> +
> +L(AUX_BRANCH_RETURN):
> +/*
> + *  Special branch for fast (vector) processing of zero arguments
> + *  The end of implementation
> + */
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm7 ymm9
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        vmovaps   %ymm9, %ymm0
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %ymm0, 32(%rsp)
> +        vmovups   %ymm7, 64(%rsp)
> +        vmovups   %ymm9, 96(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm9
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -112; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   96(%rsp), %ymm9
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -112; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x90, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm9
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        movss     64(%rsp,%r14,4), %xmm1
> +        call      atan2f@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 96(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +        cfi_restore(12)
> +        cfi_restore(13)
> +        cfi_restore(14)
> +                                # LOE rbx r15 r12d r13d
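
In scalar terms, the RANGEMASK_CHECK/SPECIAL_VALUES_LOOP/SCALAR_MATH_CALL
machinery above amounts to the sketch below (names invented for
illustration; the real code keeps the lanes in the stack slots at
32/64/96(%rsp)):

#include <math.h>

static void
atan2f_callout_model (const float y[8], const float x[8],
                      float res[8], unsigned int range_mask)
{
  /* Recompute every lane whose bit is set in the range mask with the
     scalar libm routine and patch it into the vector result.  */
  for (int lane = 0; lane < 8; lane++)
    if (range_mask & (1u << lane))
      res[lane] = atan2f (y[lane], x[lane]);
}
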
> +
> +/* Auxiliary branch
> + * for out of main path inputs
> + */
> +
> +L(AUX_BRANCH):
> +/* Check if at least one of X or Y is zero: iAXAYZERO */
> +        vpcmpeqd  %ymm5, %ymm13, %ymm13
> +        vpcmpeqd  %ymm5, %ymm12, %ymm12
> +
> +/* Check if both X & Y are not NaNs:  iXYnotNAN */
> +        vcmpordps %ymm7, %ymm7, %ymm11
> +        vcmpordps %ymm0, %ymm0, %ymm14
> +
> +/*
> + *  Path for zero arguments (at least one of both)
> + * Check if both args are zeros (den. is zero)
> + */
> +        vcmpeqps  %ymm5, %ymm3, %ymm3
> +        vpor      %ymm12, %ymm13, %ymm15
> +
> +/* Set sPIO2 to zero if den. is zero */
> +        vblendvps %ymm3, %ymm5, %ymm4, %ymm4
> +        vandps    %ymm14, %ymm11, %ymm8
> +
> +/* Check if at least one of X or Y is zero and not NaN: iAXAYZEROnotNAN */
> +        vpand     %ymm8, %ymm15, %ymm8
> +
> +/* Res = sign(Y)*((X<0)?(PIO2+PI):PIO2) */
> +        vpcmpgtd  %ymm7, %ymm5, %ymm5
> +        vorps     %ymm1, %ymm4, %ymm1
> +        vandps    %ymm6, %ymm5, %ymm6
> +        vaddps    %ymm6, %ymm1, %ymm1
> +
> +/* Exclude from previous callout mask zero (and not NaN) arguments */
> +        vpandn    %ymm10, %ymm8, %ymm10
> +        vorps     %ymm2, %ymm1, %ymm2
> +
> +/* Go to callout */
> +        vmovmskps %ymm10, %edx
> +
> +/* Merge results from main and spec path */
> +        vblendvps %ymm8, %ymm2, %ymm9, %ymm9
> +
> +/* Return to main vector processing path */
> +        jmp       L(AUX_BRANCH_RETURN)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm7 ymm9
> +END(_ZGVdN8vv_atan2f_avx2)
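
The AUX_BRANCH above only has to reproduce the IEEE zero-argument results;
a scalar model of those cases, assuming neither input is NaN (as guaranteed
by the iXYnotNAN check) and purely for illustration:

#include <math.h>

static float
atan2f_zero_case_model (float y, float x)
{
  float base;
  if (y == 0.0f)
    base = signbit (x) ? (float) M_PI : 0.0f;  /* atan2 (+-0, x), incl. x == +-0 */
  else
    base = (float) M_PI_2;                     /* atan2 (y, +-0) */
  return signbit (y) ? -base : base;
}
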
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_satan2_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(32)) VUINT32 sZERO[8][1];
> +        __declspec(align(32)) VUINT32 sSIGN_MASK[8][1];
> +        __declspec(align(32)) VUINT32 sABS_MASK[8][1];
> +        __declspec(align(32)) VUINT32 sPIO2[8][1];
> +        __declspec(align(32)) VUINT32 sPI[8][1];
> +        __declspec(align(32)) VUINT32 sPC8[8][1];
> +        __declspec(align(32)) VUINT32 sPC7[8][1];
> +        __declspec(align(32)) VUINT32 sPC6[8][1];
> +        __declspec(align(32)) VUINT32 sPC5[8][1];
> +        __declspec(align(32)) VUINT32 sPC4[8][1];
> +        __declspec(align(32)) VUINT32 sPC3[8][1];
> +        __declspec(align(32)) VUINT32 sPC2[8][1];
> +        __declspec(align(32)) VUINT32 sPC1[8][1];
> +        __declspec(align(32)) VUINT32 sPC0[8][1];
> +        __declspec(align(32)) VUINT32 iCHK_WORK_SUB[8][1];
> +        __declspec(align(32)) VUINT32 iCHK_WORK_CMP[8][1];
> +} __svml_satan2_data_internal;
> +#endif
> +__svml_satan2_data_internal:
> +        .long 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 // sZERO
> +        .align 32
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000 // sSIGN_MASK
> +        .align 32
> +        .long 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF // sABS_MASK
> +        .align 32
> +        .long 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB, 0x3FC90FDB // sPIO2
> +        .align 32
> +        .long 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB, 0x40490FDB // sPI
> +        .align 32
> +        .long 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0, 0x3B322CC0 // sPC8
> +        .align 32
> +        .long 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631, 0xBC7F2631 // sPC7
> +        .align 32
> +        .long 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384, 0x3D2BC384 // sPC6
> +        .align 32
> +        .long 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629, 0xBD987629 // sPC5
> +        .align 32
> +        .long 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474, 0x3DD96474 // sPC4
> +        .align 32
> +        .long 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8, 0xBE1161F8 // sPC3
> +        .align 32
> +        .long 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F, 0x3E4CB79F // sPC2
> +        .align 32
> +        .long 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49, 0xBEAAAA49 // sPC1
> +        .align 32
> +        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000 // sPC0
> +        .align 32
> +        .long 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000, 0x81000000 // iCHK_WORK_SUB
> +        .align 32
> +        .long 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000, 0xFC000000 // iCHK_WORK_CMP
> +        .align 32
> +        .type	__svml_satan2_data_internal,@object
> +        .size	__svml_satan2_data_internal,.-__svml_satan2_data_internal
> diff --git a/sysdeps/x86_64/fpu/svml_d_atan22_core.S b/sysdeps/x86_64/fpu/svml_d_atan22_core.S
> new file mode 100644
> index 0000000000..f3089e70f9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_atan22_core.S
> @@ -0,0 +1,29 @@
> +/* Function atan2 vectorized with SSE2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN2vv_atan2)
> +WRAPPER_IMPL_SSE2_ff atan2
> +END (_ZGVbN2vv_atan2)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN2vv_atan2)
> +#endif
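
The WRAPPER_IMPL_SSE2_ff macro (from svml_d_wrapper_impl.h) provides the
baseline two-lane entry point by calling the scalar routine once per lane;
roughly the following C model, with the struct and function names invented
for illustration:

#include <math.h>

typedef struct { double v[2]; } v2df_model;   /* stand-in for __m128d */

static v2df_model
ZGVbN2vv_atan2_model (v2df_model y, v2df_model x)
{
  v2df_model r;
  for (int i = 0; i < 2; i++)
    r.v[i] = atan2 (y.v[i], x.v[i]);   /* one scalar call per lane */
  return r;
}
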
> diff --git a/sysdeps/x86_64/fpu/svml_d_atan24_core.S b/sysdeps/x86_64/fpu/svml_d_atan24_core.S
> new file mode 100644
> index 0000000000..8a163d12d2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_atan24_core.S
> @@ -0,0 +1,29 @@
> +/* Function atan2 vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN4vv_atan2)
> +WRAPPER_IMPL_AVX_ff _ZGVbN2vv_atan2
> +END (_ZGVdN4vv_atan2)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN4vv_atan2)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_atan24_core_avx.S b/sysdeps/x86_64/fpu/svml_d_atan24_core_avx.S
> new file mode 100644
> index 0000000000..0ee5ae8faf
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_atan24_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function atan2 vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVcN4vv_atan2)
> +WRAPPER_IMPL_AVX_ff _ZGVbN2vv_atan2
> +END (_ZGVcN4vv_atan2)
> diff --git a/sysdeps/x86_64/fpu/svml_d_atan28_core.S b/sysdeps/x86_64/fpu/svml_d_atan28_core.S
> new file mode 100644
> index 0000000000..b85f696686
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_atan28_core.S
> @@ -0,0 +1,25 @@
> +/* Function atan2 vectorized with AVX-512. Wrapper to AVX2 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN8vv_atan2)
> +WRAPPER_IMPL_AVX512_ff _ZGVdN4vv_atan2
> +END (_ZGVeN8vv_atan2)
> diff --git a/sysdeps/x86_64/fpu/svml_s_atan2f16_core.S b/sysdeps/x86_64/fpu/svml_s_atan2f16_core.S
> new file mode 100644
> index 0000000000..25acb31dfb
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_atan2f16_core.S
> @@ -0,0 +1,25 @@
> +/* Function atan2f vectorized with AVX-512. Wrapper to AVX2 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN16vv_atan2f)
> +WRAPPER_IMPL_AVX512_ff _ZGVdN8vv_atan2f
> +END (_ZGVeN16vv_atan2f)
> diff --git a/sysdeps/x86_64/fpu/svml_s_atan2f4_core.S b/sysdeps/x86_64/fpu/svml_s_atan2f4_core.S
> new file mode 100644
> index 0000000000..bc99f0ba10
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_atan2f4_core.S
> @@ -0,0 +1,29 @@
> +/* Function atan2f vectorized with SSE2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN4vv_atan2f)
> +WRAPPER_IMPL_SSE2_ff atan2f
> +END (_ZGVbN4vv_atan2f)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN4vv_atan2f)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_atan2f8_core.S b/sysdeps/x86_64/fpu/svml_s_atan2f8_core.S
> new file mode 100644
> index 0000000000..bfcdb3c372
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_atan2f8_core.S
> @@ -0,0 +1,29 @@
> +/* Function atan2f vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN8vv_atan2f)
> +WRAPPER_IMPL_AVX_ff _ZGVbN4vv_atan2f
> +END (_ZGVdN8vv_atan2f)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN8vv_atan2f)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_atan2f8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_atan2f8_core_avx.S
> new file mode 100644
> index 0000000000..1aa8d05822
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_atan2f8_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function atan2f vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +        .text
> +ENTRY(_ZGVcN8vv_atan2f)
> +WRAPPER_IMPL_AVX_ff _ZGVbN4vv_atan2f
> +END(_ZGVcN8vv_atan2f)
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx.c
> new file mode 100644
> index 0000000000..e423bce25b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-atan2.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx2.c
> new file mode 100644
> index 0000000000..e423bce25b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx2.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-atan2.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx512f.c
> new file mode 100644
> index 0000000000..e423bce25b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-atan2-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-atan2.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-atan2.c b/sysdeps/x86_64/fpu/test-double-libmvec-atan2.c
> new file mode 100644
> index 0000000000..d0aa626d95
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-atan2.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE double
> +#define LIBMVEC_FUNC atan2
> +#include "test-vector-abi-arg2.h"
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> index b1981ac7e4..37a7a1c777 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> @@ -37,6 +37,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVbN2v_cosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVbN2v_expm1)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVbN2v_sinh)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVbN2v_cbrt)
> +VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVbN2vv_atan2)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> index 47915a7e59..4313f67e06 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> @@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVdN4v_cosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVdN4v_expm1)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVdN4v_sinh)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVdN4v_cbrt)
> +VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVdN4vv_atan2)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m256i
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> index 5cd5049807..4b8b00f16d 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> @@ -37,6 +37,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVcN4v_cosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVcN4v_expm1)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVcN4v_sinh)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVcN4v_cbrt)
> +VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVcN4vv_atan2)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> index 83970739ab..d06522a407 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> @@ -37,6 +37,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVeN8v_cosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVeN8v_expm1)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVeN8v_sinh)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVeN8v_cbrt)
> +VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVeN8vv_atan2)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m512i
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx.c
> new file mode 100644
> index 0000000000..5c7e2c9ad5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-atan2f.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx2.c
> new file mode 100644
> index 0000000000..5c7e2c9ad5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx2.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-atan2f.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx512f.c
> new file mode 100644
> index 0000000000..5c7e2c9ad5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-atan2f-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-atan2f.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-atan2f.c b/sysdeps/x86_64/fpu/test-float-libmvec-atan2f.c
> new file mode 100644
> index 0000000000..beb5c745cb
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-atan2f.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE float
> +#define LIBMVEC_FUNC atan2f
> +#include "test-vector-abi-arg2.h"
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> index 0420f11c28..0bd631bf9a 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> @@ -37,6 +37,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVeN16v_coshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVeN16v_expm1f)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVeN16v_sinhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVeN16v_cbrtf)
> +VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVeN16vv_atan2f)
>  
>  #define VEC_INT_TYPE __m512i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> index c8f7580265..1018398bd3 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> @@ -37,6 +37,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVbN4v_coshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVbN4v_expm1f)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVbN4v_sinhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVbN4v_cbrtf)
> +VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVbN4vv_atan2f)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> index b581796b88..42ea28f30f 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> @@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVdN8v_coshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVdN8v_expm1f)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVdN8v_sinhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVdN8v_cbrtf)
> +VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVdN8vv_atan2f)
>  
>  /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
>  #undef VECTOR_WRAPPER_fFF
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> index f16789e5ff..70a0216a07 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> @@ -37,6 +37,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVcN8v_coshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVcN8v_expm1f)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVcN8v_sinhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVcN8v_cbrtf)
> +VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVcN8vv_atan2f)
>  
>  #define VEC_INT_TYPE __m128i
>  
> -- 
> 2.31.1
> 

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.


H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 08/18] x86-64: Add vector sinh/sinhf implementation to libmvec
  2021-12-29  6:39 ` [PATCH v5 08/18] x86-64: Add vector sinh/sinhf " Sunil K Pandey
@ 2021-12-29 21:26   ` H.J. Lu
  0 siblings, 0 replies; 40+ messages in thread
From: H.J. Lu @ 2021-12-29 21:26 UTC (permalink / raw)
  To: Sunil K Pandey; +Cc: libc-alpha, hjl.tools, andrey.kolesov, marius.cornea

On Tue, Dec 28, 2021 at 10:39:50PM -0800, Sunil K Pandey wrote:
> Implement vectorized sinh/sinhf containing SSE, AVX, AVX2 and
> AVX512 versions for libmvec as per vector ABI.  It also contains
> accuracy and ABI tests for vector sinh/sinhf with regenerated ulps.
> ---
>  bits/libm-simd-decl-stubs.h                   |  11 +
>  math/bits/mathcalls.h                         |   2 +-
>  .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
>  sysdeps/x86/fpu/bits/math-vector.h            |   4 +
>  .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
>  sysdeps/x86_64/fpu/Makeconfig                 |   1 +
>  sysdeps/x86_64/fpu/Versions                   |   2 +
>  sysdeps/x86_64/fpu/libm-test-ulps             |  20 +
>  .../fpu/multiarch/svml_d_sinh2_core-sse2.S    |  20 +
>  .../x86_64/fpu/multiarch/svml_d_sinh2_core.c  |  27 +
>  .../fpu/multiarch/svml_d_sinh2_core_sse4.S    | 456 +++++++++++++++++
>  .../fpu/multiarch/svml_d_sinh4_core-sse.S     |  20 +
>  .../x86_64/fpu/multiarch/svml_d_sinh4_core.c  |  27 +
>  .../fpu/multiarch/svml_d_sinh4_core_avx2.S    | 470 ++++++++++++++++++
>  .../fpu/multiarch/svml_d_sinh8_core-avx2.S    |  20 +
>  .../x86_64/fpu/multiarch/svml_d_sinh8_core.c  |  27 +
>  .../fpu/multiarch/svml_d_sinh8_core_avx512.S  | 461 +++++++++++++++++
>  .../fpu/multiarch/svml_s_sinhf16_core-avx2.S  |  20 +
>  .../fpu/multiarch/svml_s_sinhf16_core.c       |  28 ++
>  .../multiarch/svml_s_sinhf16_core_avx512.S    | 318 ++++++++++++
>  .../fpu/multiarch/svml_s_sinhf4_core-sse2.S   |  20 +
>  .../x86_64/fpu/multiarch/svml_s_sinhf4_core.c |  28 ++
>  .../fpu/multiarch/svml_s_sinhf4_core_sse4.S   | 308 ++++++++++++
>  .../fpu/multiarch/svml_s_sinhf8_core-sse.S    |  20 +
>  .../x86_64/fpu/multiarch/svml_s_sinhf8_core.c |  28 ++
>  .../fpu/multiarch/svml_s_sinhf8_core_avx2.S   | 309 ++++++++++++
>  sysdeps/x86_64/fpu/svml_d_sinh2_core.S        |  29 ++
>  sysdeps/x86_64/fpu/svml_d_sinh4_core.S        |  29 ++
>  sysdeps/x86_64/fpu/svml_d_sinh4_core_avx.S    |  25 +
>  sysdeps/x86_64/fpu/svml_d_sinh8_core.S        |  25 +
>  sysdeps/x86_64/fpu/svml_s_sinhf16_core.S      |  25 +
>  sysdeps/x86_64/fpu/svml_s_sinhf4_core.S       |  29 ++
>  sysdeps/x86_64/fpu/svml_s_sinhf8_core.S       |  29 ++
>  sysdeps/x86_64/fpu/svml_s_sinhf8_core_avx.S   |  25 +
>  .../x86_64/fpu/test-double-libmvec-sinh-avx.c |   1 +
>  .../fpu/test-double-libmvec-sinh-avx2.c       |   1 +
>  .../fpu/test-double-libmvec-sinh-avx512f.c    |   1 +
>  sysdeps/x86_64/fpu/test-double-libmvec-sinh.c |   3 +
>  .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
>  .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
>  .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
>  .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
>  .../x86_64/fpu/test-float-libmvec-sinhf-avx.c |   1 +
>  .../fpu/test-float-libmvec-sinhf-avx2.c       |   1 +
>  .../fpu/test-float-libmvec-sinhf-avx512f.c    |   1 +
>  sysdeps/x86_64/fpu/test-float-libmvec-sinhf.c |   3 +
>  .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
>  .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
>  .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
>  .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
>  50 files changed, 2894 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_sinh2_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_sinh4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_sinh4_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_sinh8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_sinhf16_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_sinhf4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_sinhf8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_sinhf8_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-sinh.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-sinhf.c
> 
> diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
> index 28dc4a82c5..6347320521 100644
> --- a/bits/libm-simd-decl-stubs.h
> +++ b/bits/libm-simd-decl-stubs.h
> @@ -186,4 +186,15 @@
>  #define __DECL_SIMD_expm1f32x
>  #define __DECL_SIMD_expm1f64x
>  #define __DECL_SIMD_expm1f128x
> +
> +#define __DECL_SIMD_sinh
> +#define __DECL_SIMD_sinhf
> +#define __DECL_SIMD_sinhl
> +#define __DECL_SIMD_sinhf16
> +#define __DECL_SIMD_sinhf32
> +#define __DECL_SIMD_sinhf64
> +#define __DECL_SIMD_sinhf128
> +#define __DECL_SIMD_sinhf32x
> +#define __DECL_SIMD_sinhf64x
> +#define __DECL_SIMD_sinhf128x
>  #endif
> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index c57adc8ace..673b3a93ba 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -70,7 +70,7 @@ __MATHCALL (tan,, (_Mdouble_ __x));
>  /* Hyperbolic cosine of X.  */
>  __MATHCALL_VEC (cosh,, (_Mdouble_ __x));
>  /* Hyperbolic sine of X.  */
> -__MATHCALL (sinh,, (_Mdouble_ __x));
> +__MATHCALL_VEC (sinh,, (_Mdouble_ __x));
>  /* Hyperbolic tangent of X.  */
>  __MATHCALL (tanh,, (_Mdouble_ __x));
>  
> diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> index c9d3213bd3..f9d7b085ab 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> @@ -53,6 +53,7 @@ GLIBC_2.35 _ZGVbN2v_cosh F
>  GLIBC_2.35 _ZGVbN2v_exp10 F
>  GLIBC_2.35 _ZGVbN2v_exp2 F
>  GLIBC_2.35 _ZGVbN2v_expm1 F
> +GLIBC_2.35 _ZGVbN2v_sinh F
>  GLIBC_2.35 _ZGVbN2vv_hypot F
>  GLIBC_2.35 _ZGVbN4v_acosf F
>  GLIBC_2.35 _ZGVbN4v_asinf F
> @@ -61,6 +62,7 @@ GLIBC_2.35 _ZGVbN4v_coshf F
>  GLIBC_2.35 _ZGVbN4v_exp10f F
>  GLIBC_2.35 _ZGVbN4v_exp2f F
>  GLIBC_2.35 _ZGVbN4v_expm1f F
> +GLIBC_2.35 _ZGVbN4v_sinhf F
>  GLIBC_2.35 _ZGVbN4vv_hypotf F
>  GLIBC_2.35 _ZGVcN4v_acos F
>  GLIBC_2.35 _ZGVcN4v_asin F
> @@ -69,6 +71,7 @@ GLIBC_2.35 _ZGVcN4v_cosh F
>  GLIBC_2.35 _ZGVcN4v_exp10 F
>  GLIBC_2.35 _ZGVcN4v_exp2 F
>  GLIBC_2.35 _ZGVcN4v_expm1 F
> +GLIBC_2.35 _ZGVcN4v_sinh F
>  GLIBC_2.35 _ZGVcN4vv_hypot F
>  GLIBC_2.35 _ZGVcN8v_acosf F
>  GLIBC_2.35 _ZGVcN8v_asinf F
> @@ -77,6 +80,7 @@ GLIBC_2.35 _ZGVcN8v_coshf F
>  GLIBC_2.35 _ZGVcN8v_exp10f F
>  GLIBC_2.35 _ZGVcN8v_exp2f F
>  GLIBC_2.35 _ZGVcN8v_expm1f F
> +GLIBC_2.35 _ZGVcN8v_sinhf F
>  GLIBC_2.35 _ZGVcN8vv_hypotf F
>  GLIBC_2.35 _ZGVdN4v_acos F
>  GLIBC_2.35 _ZGVdN4v_asin F
> @@ -85,6 +89,7 @@ GLIBC_2.35 _ZGVdN4v_cosh F
>  GLIBC_2.35 _ZGVdN4v_exp10 F
>  GLIBC_2.35 _ZGVdN4v_exp2 F
>  GLIBC_2.35 _ZGVdN4v_expm1 F
> +GLIBC_2.35 _ZGVdN4v_sinh F
>  GLIBC_2.35 _ZGVdN4vv_hypot F
>  GLIBC_2.35 _ZGVdN8v_acosf F
>  GLIBC_2.35 _ZGVdN8v_asinf F
> @@ -93,6 +98,7 @@ GLIBC_2.35 _ZGVdN8v_coshf F
>  GLIBC_2.35 _ZGVdN8v_exp10f F
>  GLIBC_2.35 _ZGVdN8v_exp2f F
>  GLIBC_2.35 _ZGVdN8v_expm1f F
> +GLIBC_2.35 _ZGVdN8v_sinhf F
>  GLIBC_2.35 _ZGVdN8vv_hypotf F
>  GLIBC_2.35 _ZGVeN16v_acosf F
>  GLIBC_2.35 _ZGVeN16v_asinf F
> @@ -101,6 +107,7 @@ GLIBC_2.35 _ZGVeN16v_coshf F
>  GLIBC_2.35 _ZGVeN16v_exp10f F
>  GLIBC_2.35 _ZGVeN16v_exp2f F
>  GLIBC_2.35 _ZGVeN16v_expm1f F
> +GLIBC_2.35 _ZGVeN16v_sinhf F
>  GLIBC_2.35 _ZGVeN16vv_hypotf F
>  GLIBC_2.35 _ZGVeN8v_acos F
>  GLIBC_2.35 _ZGVeN8v_asin F
> @@ -109,4 +116,5 @@ GLIBC_2.35 _ZGVeN8v_cosh F
>  GLIBC_2.35 _ZGVeN8v_exp10 F
>  GLIBC_2.35 _ZGVeN8v_exp2 F
>  GLIBC_2.35 _ZGVeN8v_expm1 F
> +GLIBC_2.35 _ZGVeN8v_sinh F
>  GLIBC_2.35 _ZGVeN8vv_hypot F
> diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
> index e2f98e176f..51a41cfebc 100644
> --- a/sysdeps/x86/fpu/bits/math-vector.h
> +++ b/sysdeps/x86/fpu/bits/math-vector.h
> @@ -90,6 +90,10 @@
>  #  define __DECL_SIMD_expm1 __DECL_SIMD_x86_64
>  #  undef __DECL_SIMD_expm1f
>  #  define __DECL_SIMD_expm1f __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_sinh
> +#  define __DECL_SIMD_sinh __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_sinhf
> +#  define __DECL_SIMD_sinhf __DECL_SIMD_x86_64
>  
>  # endif
>  #endif
> diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> index 43233059f6..91e9b4fc83 100644
> --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> @@ -44,6 +44,8 @@
>  !GCC$ builtin (coshf) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (expm1) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (expm1f) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (sinh) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (sinhf) attributes simd (notinbranch) if('x86_64')
>  
>  !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
> @@ -73,3 +75,5 @@
>  !GCC$ builtin (coshf) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (expm1) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (expm1f) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (sinh) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (sinhf) attributes simd (notinbranch) if('x32')
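
These declarations are what lets the auto-vectorizer use the new entry
points; a small usage example (the flags and target below are only
illustrative):

#include <math.h>

/* With the simd attribute visible through <math.h>, GCC may replace the
   scalar call with _ZGVdN4v_sinh / _ZGVeN8v_sinh etc. when this loop is
   vectorized, e.g. built with
   gcc -O3 -march=skylake-avx512 -ffast-math -lmvec.  */
void
sinh_array (double *restrict out, const double *restrict in, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = sinh (in[i]);
}
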
> diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> index 8de8214971..81e9fc95b2 100644
> --- a/sysdeps/x86_64/fpu/Makeconfig
> +++ b/sysdeps/x86_64/fpu/Makeconfig
> @@ -36,6 +36,7 @@ libmvec-funcs = \
>    pow \
>    sin \
>    sincos \
> +  sinh \
>  
>  # Define libmvec function for benchtests directory.
>  libmvec-bench-funcs = \
> diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> index 58debb2dbe..2710446d12 100644
> --- a/sysdeps/x86_64/fpu/Versions
> +++ b/sysdeps/x86_64/fpu/Versions
> @@ -21,6 +21,7 @@ libmvec {
>      _ZGVbN2v_exp10; _ZGVcN4v_exp10; _ZGVdN4v_exp10; _ZGVeN8v_exp10;
>      _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
>      _ZGVbN2v_expm1; _ZGVcN4v_expm1; _ZGVdN4v_expm1; _ZGVeN8v_expm1;
> +    _ZGVbN2v_sinh; _ZGVcN4v_sinh; _ZGVdN4v_sinh; _ZGVeN8v_sinh;
>      _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
>      _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
>      _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
> @@ -29,6 +30,7 @@ libmvec {
>      _ZGVbN4v_exp10f; _ZGVcN8v_exp10f; _ZGVdN8v_exp10f; _ZGVeN16v_exp10f;
>      _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
>      _ZGVbN4v_expm1f; _ZGVcN8v_expm1f; _ZGVdN8v_expm1f; _ZGVeN16v_expm1f;
> +    _ZGVbN4v_sinhf; _ZGVcN8v_sinhf; _ZGVdN8v_sinhf; _ZGVeN16v_sinhf;
>      _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf;
>    }
>  }
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index f05ece8c8a..f4b313119d 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -1840,6 +1840,26 @@ float: 3
>  float128: 4
>  ldouble: 5
>  
> +Function: "sinh_vlen16":
> +float: 1
> +
> +Function: "sinh_vlen2":
> +double: 2
> +
> +Function: "sinh_vlen4":
> +double: 2
> +float: 1
> +
> +Function: "sinh_vlen4_avx2":
> +double: 2
> +
> +Function: "sinh_vlen8":
> +double: 2
> +float: 1
> +
> +Function: "sinh_vlen8_avx2":
> +float: 1
> +
>  Function: "tan":
>  float: 1
>  float128: 1
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core-sse2.S
> new file mode 100644
> index 0000000000..ca12ad6678
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized sinh, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN2v_sinh _ZGVbN2v_sinh_sse2
> +#include "../svml_d_sinh2_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core.c
> new file mode 100644
> index 0000000000..c0344b2902
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized sinh, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN2v_sinh
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN2v_sinh, __GI__ZGVbN2v_sinh, __redirect__ZGVbN2v_sinh)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core_sse4.S
> new file mode 100644
> index 0000000000..80d19e9dba
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh2_core_sse4.S
> @@ -0,0 +1,456 @@
> +/* Function sinh vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute sinh(x) as (exp(x)-exp(-x))/2,
> + *   where exp is calculated as
> + *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
> + *
> + *   Special cases:
> + *
> + *   sinh(NaN) = quiet NaN, and raise invalid exception
> + *   sinh(INF) = that INF
> + *   sinh(x)   = x for subnormals
> + *   sinh(x) overflows for big x and returns MAXLOG+log(2)
> + *
> + */
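
A scalar sketch of the scheme described above, for reference only and much
less accurate than the real code (which uses the 2^(j/2^7) lookup table
_dbT and folds the 2^M scaling into the table values):

#include <math.h>

static double
sinh_model (double x)
{
  double ax = fabs (x);
  double m  = round (ax * M_LOG2E);   /* M in the description above */
  double r  = ax - m * M_LN2;         /* reduced argument, |r| <= ln2/2 */
  /* Low-degree exp(r) polynomial, good enough for illustration only.  */
  double er = 1.0 + r * (1.0 + r * (0.5 + r * (1.0 / 6 + r * (1.0 / 24))));
  double ep = ldexp (er, (int) m);        /* exp(|x|)  = 2^M * exp(r)  */
  double en = ldexp (1.0 / er, -(int) m); /* exp(-|x|) = 2^-M / exp(r) */
  return copysign (0.5 * (ep - en), x);
}
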
> +
> +/* Offsets for data table __svml_dsinh_data_internal
> + */
> +#define _dbInvLn2                     	0
> +#define _dbLn2hi                      	16
> +#define _dbLn2lo                      	32
> +#define _dSign                        	48
> +#define _dbT                          	64
> +#define _dbShifter                    	2112
> +#define _iDomainRange                 	2128
> +#define _dPC2                         	2144
> +#define _dPC3                         	2160
> +#define _dPC4                         	2176
> +#define _dPC5                         	2192
> +#define _lIndexMask                   	2208
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN2v_sinh_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +        movaps    %xmm0, %xmm2
> +
> +/*  Abs argument  */
> +        movups    _dSign+__svml_dsinh_data_internal(%rip), %xmm0
> +        lea       _dbT+8+__svml_dsinh_data_internal(%rip), %rsi
> +        andps     %xmm2, %xmm0
> +        movaps    %xmm0, %xmm1
> +
> +/*
> + *  Load argument
> + * dM = x*2^K/log(2) + RShifter
> + */
> +        movups    _dbInvLn2+__svml_dsinh_data_internal(%rip), %xmm10
> +        pxor      %xmm2, %xmm1
> +        mulpd     %xmm1, %xmm10
> +        movups    _dbShifter+__svml_dsinh_data_internal(%rip), %xmm5
> +        addpd     %xmm5, %xmm10
> +
> +/*
> + *  R
> + * dN = dM - RShifter
> + */
> +        movaps    %xmm10, %xmm7
> +        subpd     %xmm5, %xmm7
> +
> +/* dR = dX - dN*Log2_hi/2^K */
> +        movups    _dbLn2hi+__svml_dsinh_data_internal(%rip), %xmm6
> +        mulpd     %xmm7, %xmm6
> +
> +/* dR = (dX - dN*Log2_hi/2^K) - dN*Log2_lo/2^K */
> +        movups    _dbLn2lo+__svml_dsinh_data_internal(%rip), %xmm8
> +        mulpd     %xmm7, %xmm8
> +
> +/*
> + * Check for overflow/underflow
> + *
> + */
> +        pshufd    $221, %xmm1, %xmm4
> +        subpd     %xmm6, %xmm1
> +        subpd     %xmm8, %xmm1
> +
> +/* VLOAD_CONST( D, dPC[0],         TAB._dPC1 ); */
> +        movq      _iDomainRange+__svml_dsinh_data_internal(%rip), %xmm3
> +        pcmpgtd   %xmm3, %xmm4
> +
> +/* dR2 = dR^2 */
> +        movaps    %xmm1, %xmm3
> +        mulpd     %xmm1, %xmm3
> +        movmskps  %xmm4, %edx
> +
> +/*
> + * sinh(r) = r*((a1=1)+r^2*(a3+r^2*a5)) = r + r*(r^2*(a3+r^2*a5)) ....
> + * dSinh_r = (a3+r^2*a5)
> + */
> +        movups    _dPC5+__svml_dsinh_data_internal(%rip), %xmm12
> +
> +/*
> + * poly(r) = (dG2+dG1)+dG3*sinh(dR)+dG1*sinh(dR)+(dG1+dG2)*dR2*(a2 +a4*dR2)
> + * dOut = (a2 +a4*dR2)
> + */
> +        movups    _dPC4+__svml_dsinh_data_internal(%rip), %xmm13
> +        mulpd     %xmm3, %xmm12
> +        mulpd     %xmm3, %xmm13
> +        addpd     _dPC3+__svml_dsinh_data_internal(%rip), %xmm12
> +        addpd     _dPC2+__svml_dsinh_data_internal(%rip), %xmm13
> +
> +/* dSinh_r = r^2*(a3+r^2*a5) */
> +        mulpd     %xmm3, %xmm12
> +
> +/* dOut = dR2*(a2 +a4*dR2) */
> +        mulpd     %xmm13, %xmm3
> +
> +/* dSinh_r = r + r*(r^2*(a3+r^2*a5)) */
> +        mulpd     %xmm1, %xmm12
> +
> +/*
> + *  Index and lookup
> + * j
> + */
> +        movups    _lIndexMask+__svml_dsinh_data_internal(%rip), %xmm9
> +        andps     %xmm10, %xmm9
> +        movd      %xmm9, %eax
> +
> +/* split j and N */
> +        pxor      %xmm9, %xmm10
> +
> +/*
> + *  G1,G2,G3: dTdif,dTn * 2^N,2^(-N)
> + * lM now is an EXP(2^N)
> + */
> +        psllq     $45, %xmm10
> +
> +/*  */
> +        movaps    %xmm10, %xmm4
> +        pextrw    $4, %xmm9, %ecx
> +        addpd     %xmm12, %xmm1
> +        shll      $4, %eax
> +        shll      $4, %ecx
> +        movq      (%rax,%rsi), %xmm11
> +        movhpd    (%rcx,%rsi), %xmm11
> +        paddq     %xmm11, %xmm4
> +
> +/*  */
> +        psubq     %xmm10, %xmm11
> +
> +/* dG3 = dTn*2^N + dTn*2^-N */
> +        movdqa    %xmm4, %xmm14
> +        addpd     %xmm11, %xmm14
> +
> +/* dG2 = dTn*2^N - dTn*2^-N */
> +        subpd     %xmm11, %xmm4
> +        movq      -8(%rax,%rsi), %xmm15
> +        movhpd    -8(%rcx,%rsi), %xmm15
> +        paddq     %xmm10, %xmm15
> +
> +/* dG2 += dG1 */
> +        addpd     %xmm15, %xmm4
> +
> +/* dG1 += dG3 */
> +        addpd     %xmm14, %xmm15
> +
> +/* dOut = dG2*dR2*(a2 +a4*dR2) */
> +        mulpd     %xmm4, %xmm3
> +
> +/* dOut = dG1*sinh(dR)+dG2*dR2*(a2 +a4*dR2) */
> +        mulpd     %xmm15, %xmm1
> +        addpd     %xmm1, %xmm3
> +
> +/* dOut = dG2 + dG1*sinh(dR)+dG2*dR2*(a2 +a4*dR2) */
> +        addpd     %xmm3, %xmm4
> +
> +/*  Ret H  */
> +        orps      %xmm4, %xmm0
> +        andl      $3, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm2
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm2, 32(%rsp)
> +        movups    %xmm0, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $2, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      sinh@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 48(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN2v_sinh_sse4)
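
Reading the table comments and the dG1/dG2/dG3 steps above together (a
sketch of the identity, not part of the patch): write Tp = 2^(j/128-1)*2^N
and Tm = 2^(-j/128-1)*2^(-N), so exp(|x|)/2 = Tp*e^r and
exp(-|x|)/2 = Tm*e^-r.  With dTdif = 2^(j/128-1) - 2^(-j/128-1) and
dTn = 2^(-j/128-1) from _dbT:

    dG1 = dTdif*2^N
    dG2 = dTn*2^N - dTn*2^-N,  then dG2 += dG1  ->  dG2 = Tp - Tm
    dG3 = dTn*2^N + dTn*2^-N,  then dG1 += dG3  ->  dG1 = Tp + Tm

so the final combination

    dOut = dG2 + dG1*(r + r*r^2*(a3 + r^2*a5)) + dG2*r^2*(a2 + a4*r^2)
        ~= (Tp - Tm)*cosh(r) + (Tp + Tm)*sinh(r)
         = Tp*e^r - Tm*e^-r = sinh(|x|),

and the trailing orps merges the sign of x back in (sinh is odd).
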
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_dsinh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(16)) VUINT32 _dbInvLn2[2][2];
> +        __declspec(align(16)) VUINT32 _dbLn2hi[2][2];
> +        __declspec(align(16)) VUINT32 _dbLn2lo[2][2];
> +        __declspec(align(16)) VUINT32 _dSign[2][2];                //0x8000000000000000
> +        __declspec(align(16)) VUINT32 _dbT[(1<<7)][2][2]; //precalc poly coeff
> +        __declspec(align(16)) VUINT32 _dbShifter[2][2];
> +        __declspec(align(16)) VUINT32 _iDomainRange[4][1];
> +        __declspec(align(16)) VUINT32 _dPC2[2][2];
> +        __declspec(align(16)) VUINT32 _dPC3[2][2];
> +        __declspec(align(16)) VUINT32 _dPC4[2][2];
> +        __declspec(align(16)) VUINT32 _dPC5[2][2];
> +        __declspec(align(16)) VUINT32 _lIndexMask[2][2];
> +} __svml_dsinh_data_internal;
> +#endif
> +__svml_dsinh_data_internal:
> +        .quad 0x3FF71547652B82FE, 0x3FF71547652B82FE /* _dbInvLn2 = 1/log(2) */
> +        .align 16
> +        .quad 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000 /* _dbLn2hi  = log(2) hi*/
> +        .align 16
> +        .quad 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A /* _dbLn2lo  = log(2) lo*/
> +        .align 16
> +        .quad 0x8000000000000000, 0x8000000000000000 /* _dSign */
> +        //_dbT
> +        .align 16
> +        .quad 0x0000000000000000, 0x3FE0000000000000  //2^( 0 /128-1) - 2^(- 0 /128-1), 2^(- 0 /128-1)
> +        .quad 0x3F762E4A19BD1E74, 0x3FDFD3C22B8F71F1  //2^( 1 /128-1) - 2^(- 1 /128-1), 2^(- 1 /128-1)
> +        .quad 0x3F862E5F6A0DFD36, 0x3FDFA7C1819E90D8  //2^( 2 /128-1) - 2^(- 2 /128-1), 2^(- 2 /128-1)
> +        .quad 0x3F90A2E234040F5F, 0x3FDF7BFDAD9CBE14  //2^( 3 /128-1) - 2^(- 3 /128-1), 2^(- 3 /128-1)
> +        .quad 0x3F962EB4ABCC5A81, 0x3FDF50765B6E4540  //2^( 4 /128-1) - 2^(- 4 /128-1), 2^(- 4 /128-1)
> +        .quad 0x3F9BBAB1C5033244, 0x3FDF252B376BBA97  //2^( 5 /128-1) - 2^(- 5 /128-1), 2^(- 5 /128-1)
> +        .quad 0x3FA0A372144EEB45, 0x3FDEFA1BEE615A27  //2^( 6 /128-1) - 2^(- 6 /128-1), 2^(- 6 /128-1)
> +        .quad 0x3FA369AB3FFBF8B0, 0x3FDECF482D8E67F1  //2^( 7 /128-1) - 2^(- 7 /128-1), 2^(- 7 /128-1)
> +        .quad 0x3FA63009BA740A2A, 0x3FDEA4AFA2A490DA  //2^( 8 /128-1) - 2^(- 8 /128-1), 2^(- 8 /128-1)
> +        .quad 0x3FA8F692D8EA1B5A, 0x3FDE7A51FBC74C83  //2^( 9 /128-1) - 2^(- 9 /128-1), 2^(- 9 /128-1)
> +        .quad 0x3FABBD4BF0E31A6F, 0x3FDE502EE78B3FF6  //2^( 10 /128-1) - 2^(- 10 /128-1), 2^(- 10 /128-1)
> +        .quad 0x3FAE843A5840286A, 0x3FDE264614F5A129  //2^( 11 /128-1) - 2^(- 11 /128-1), 2^(- 11 /128-1)
> +        .quad 0x3FB0A5B1B2A46D0A, 0x3FDDFC97337B9B5F  //2^( 12 /128-1) - 2^(- 12 /128-1), 2^(- 12 /128-1)
> +        .quad 0x3FB20966375ABCDF, 0x3FDDD321F301B460  //2^( 13 /128-1) - 2^(- 13 /128-1), 2^(- 13 /128-1)
> +        .quad 0x3FB36D3D65DCA4E8, 0x3FDDA9E603DB3285  //2^( 14 /128-1) - 2^(- 14 /128-1), 2^(- 14 /128-1)
> +        .quad 0x3FB4D139EA06642A, 0x3FDD80E316C98398  //2^( 15 /128-1) - 2^(- 15 /128-1), 2^(- 15 /128-1)
> +        .quad 0x3FB6355E6FFBF9BA, 0x3FDD5818DCFBA487  //2^( 16 /128-1) - 2^(- 16 /128-1), 2^(- 16 /128-1)
> +        .quad 0x3FB799ADA42E4788, 0x3FDD2F87080D89F2  //2^( 17 /128-1) - 2^(- 17 /128-1), 2^(- 17 /128-1)
> +        .quad 0x3FB8FE2A336035BC, 0x3FDD072D4A07897C  //2^( 18 /128-1) - 2^(- 18 /128-1), 2^(- 18 /128-1)
> +        .quad 0x3FBA62D6CAABD6B6, 0x3FDCDF0B555DC3FA  //2^( 19 /128-1) - 2^(- 19 /128-1), 2^(- 19 /128-1)
> +        .quad 0x3FBBC7B617878BAF, 0x3FDCB720DCEF9069  //2^( 20 /128-1) - 2^(- 20 /128-1), 2^(- 20 /128-1)
> +        .quad 0x3FBD2CCAC7CB2A11, 0x3FDC8F6D9406E7B5  //2^( 21 /128-1) - 2^(- 21 /128-1), 2^(- 21 /128-1)
> +        .quad 0x3FBE921789B52185, 0x3FDC67F12E57D14B  //2^( 22 /128-1) - 2^(- 22 /128-1), 2^(- 22 /128-1)
> +        .quad 0x3FBFF79F0BEFA2C7, 0x3FDC40AB5FFFD07A  //2^( 23 /128-1) - 2^(- 23 /128-1), 2^(- 23 /128-1)
> +        .quad 0x3FC0AEB1FECAE3A9, 0x3FDC199BDD85529C  //2^( 24 /128-1) - 2^(- 24 /128-1), 2^(- 24 /128-1)
> +        .quad 0x3FC161B4871C5CEC, 0x3FDBF2C25BD71E09  //2^( 25 /128-1) - 2^(- 25 /128-1), 2^(- 25 /128-1)
> +        .quad 0x3FC214D876F26FD0, 0x3FDBCC1E904BC1D2  //2^( 26 /128-1) - 2^(- 26 /128-1), 2^(- 26 /128-1)
> +        .quad 0x3FC2C81F2693816F, 0x3FDBA5B030A1064A  //2^( 27 /128-1) - 2^(- 27 /128-1), 2^(- 27 /128-1)
> +        .quad 0x3FC37B89EE88BEF7, 0x3FDB7F76F2FB5E47  //2^( 28 /128-1) - 2^(- 28 /128-1), 2^(- 28 /128-1)
> +        .quad 0x3FC42F1A27A0B3CD, 0x3FDB59728DE5593A  //2^( 29 /128-1) - 2^(- 29 /128-1), 2^(- 29 /128-1)
> +        .quad 0x3FC4E2D12AF1E037, 0x3FDB33A2B84F15FB  //2^( 30 /128-1) - 2^(- 30 /128-1), 2^(- 30 /128-1)
> +        .quad 0x3FC596B051DD508D, 0x3FDB0E07298DB666  //2^( 31 /128-1) - 2^(- 31 /128-1), 2^(- 31 /128-1)
> +        .quad 0x3FC64AB8F61134FA, 0x3FDAE89F995AD3AD  //2^( 32 /128-1) - 2^(- 32 /128-1), 2^(- 32 /128-1)
> +        .quad 0x3FC6FEEC718B79D1, 0x3FDAC36BBFD3F37A  //2^( 33 /128-1) - 2^(- 33 /128-1), 2^(- 33 /128-1)
> +        .quad 0x3FC7B34C1E9C607F, 0x3FDA9E6B5579FDBF  //2^( 34 /128-1) - 2^(- 34 /128-1), 2^(- 34 /128-1)
> +        .quad 0x3FC867D957E91912, 0x3FDA799E1330B358  //2^( 35 /128-1) - 2^(- 35 /128-1), 2^(- 35 /128-1)
> +        .quad 0x3FC91C95786E5C72, 0x3FDA5503B23E255D  //2^( 36 /128-1) - 2^(- 36 /128-1), 2^(- 36 /128-1)
> +        .quad 0x3FC9D181DB83072F, 0x3FDA309BEC4A2D33  //2^( 37 /128-1) - 2^(- 37 /128-1), 2^(- 37 /128-1)
> +        .quad 0x3FCA869FDCDAB512, 0x3FDA0C667B5DE565  //2^( 38 /128-1) - 2^(- 38 /128-1), 2^(- 38 /128-1)
> +        .quad 0x3FCB3BF0D8885D4C, 0x3FD9E86319E32323  //2^( 39 /128-1) - 2^(- 39 /128-1), 2^(- 39 /128-1)
> +        .quad 0x3FCBF1762B00EF69, 0x3FD9C49182A3F090  //2^( 40 /128-1) - 2^(- 40 /128-1), 2^(- 40 /128-1)
> +        .quad 0x3FCCA731311DF0FB, 0x3FD9A0F170CA07BA  //2^( 41 /128-1) - 2^(- 41 /128-1), 2^(- 41 /128-1)
> +        .quad 0x3FCD5D2348201C09, 0x3FD97D829FDE4E50  //2^( 42 /128-1) - 2^(- 42 /128-1), 2^(- 42 /128-1)
> +        .quad 0x3FCE134DCDB1FE3E, 0x3FD95A44CBC8520F  //2^( 43 /128-1) - 2^(- 43 /128-1), 2^(- 43 /128-1)
> +        .quad 0x3FCEC9B21FEA98EA, 0x3FD93737B0CDC5E5  //2^( 44 /128-1) - 2^(- 44 /128-1), 2^(- 44 /128-1)
> +        .quad 0x3FCF80519D5001D3, 0x3FD9145B0B91FFC6  //2^( 45 /128-1) - 2^(- 45 /128-1), 2^(- 45 /128-1)
> +        .quad 0x3FD01B96D26D026A, 0x3FD8F1AE99157736  //2^( 46 /128-1) - 2^(- 46 /128-1), 2^(- 46 /128-1)
> +        .quad 0x3FD07723CAFA6331, 0x3FD8CF3216B5448C  //2^( 47 /128-1) - 2^(- 47 /128-1), 2^(- 47 /128-1)
> +        .quad 0x3FD0D2D06841B373, 0x3FD8ACE5422AA0DB  //2^( 48 /128-1) - 2^(- 48 /128-1), 2^(- 48 /128-1)
> +        .quad 0x3FD12E9D5A715381, 0x3FD88AC7D98A6699  //2^( 49 /128-1) - 2^(- 49 /128-1), 2^(- 49 /128-1)
> +        .quad 0x3FD18A8B51F5C661, 0x3FD868D99B4492ED  //2^( 50 /128-1) - 2^(- 50 /128-1), 2^(- 50 /128-1)
> +        .quad 0x3FD1E69AFF7B04D7, 0x3FD8471A4623C7AD  //2^( 51 /128-1) - 2^(- 51 /128-1), 2^(- 51 /128-1)
> +        .quad 0x3FD242CD13EDD0F1, 0x3FD82589994CCE13  //2^( 52 /128-1) - 2^(- 52 /128-1), 2^(- 52 /128-1)
> +        .quad 0x3FD29F22407D0A0C, 0x3FD80427543E1A12  //2^( 53 /128-1) - 2^(- 53 /128-1), 2^(- 53 /128-1)
> +        .quad 0x3FD2FB9B369B0153, 0x3FD7E2F336CF4E62  //2^( 54 /128-1) - 2^(- 54 /128-1), 2^(- 54 /128-1)
> +        .quad 0x3FD35838A7FECEC8, 0x3FD7C1ED0130C132  //2^( 55 /128-1) - 2^(- 55 /128-1), 2^(- 55 /128-1)
> +        .quad 0x3FD3B4FB46A5A6CC, 0x3FD7A11473EB0187  //2^( 56 /128-1) - 2^(- 56 /128-1), 2^(- 56 /128-1)
> +        .quad 0x3FD411E3C4D4302F, 0x3FD780694FDE5D3F  //2^( 57 /128-1) - 2^(- 57 /128-1), 2^(- 57 /128-1)
> +        .quad 0x3FD46EF2D517DAC8, 0x3FD75FEB564267C9  //2^( 58 /128-1) - 2^(- 58 /128-1), 2^(- 58 /128-1)
> +        .quad 0x3FD4CC292A48369E, 0x3FD73F9A48A58174  //2^( 59 /128-1) - 2^(- 59 /128-1), 2^(- 59 /128-1)
> +        .quad 0x3FD5298777884B96, 0x3FD71F75E8EC5F74  //2^( 60 /128-1) - 2^(- 60 /128-1), 2^(- 60 /128-1)
> +        .quad 0x3FD5870E7047F1BC, 0x3FD6FF7DF9519484  //2^( 61 /128-1) - 2^(- 61 /128-1), 2^(- 61 /128-1)
> +        .quad 0x3FD5E4BEC8452A1A, 0x3FD6DFB23C651A2F  //2^( 62 /128-1) - 2^(- 62 /128-1), 2^(- 62 /128-1)
> +        .quad 0x3FD64299338D7827, 0x3FD6C012750BDABF  //2^( 63 /128-1) - 2^(- 63 /128-1), 2^(- 63 /128-1)
> +        .quad 0x3FD6A09E667F3BCD, 0x3FD6A09E667F3BCD  //2^( 64 /128-1) - 2^(- 64 /128-1), 2^(- 64 /128-1)
> +        .quad 0x3FD6FECF15CB0C0B, 0x3FD68155D44CA973  //2^( 65 /128-1) - 2^(- 65 /128-1), 2^(- 65 /128-1)
> +        .quad 0x3FD75D2BF6751239, 0x3FD6623882552225  //2^( 66 /128-1) - 2^(- 66 /128-1), 2^(- 66 /128-1)
> +        .quad 0x3FD7BBB5BDD665E8, 0x3FD6434634CCC320  //2^( 67 /128-1) - 2^(- 67 /128-1), 2^(- 67 /128-1)
> +        .quad 0x3FD81A6D219E6963, 0x3FD6247EB03A5585  //2^( 68 /128-1) - 2^(- 68 /128-1), 2^(- 68 /128-1)
> +        .quad 0x3FD87952D7D426DF, 0x3FD605E1B976DC09  //2^( 69 /128-1) - 2^(- 69 /128-1), 2^(- 69 /128-1)
> +        .quad 0x3FD8D86796D7AE49, 0x3FD5E76F15AD2148  //2^( 70 /128-1) - 2^(- 70 /128-1), 2^(- 70 /128-1)
> +        .quad 0x3FD937AC156373C8, 0x3FD5C9268A5946B7  //2^( 71 /128-1) - 2^(- 71 /128-1), 2^(- 71 /128-1)
> +        .quad 0x3FD997210A8DAEE4, 0x3FD5AB07DD485429  //2^( 72 /128-1) - 2^(- 72 /128-1), 2^(- 72 /128-1)
> +        .quad 0x3FD9F6C72DC9BA68, 0x3FD58D12D497C7FD  //2^( 73 /128-1) - 2^(- 73 /128-1), 2^(- 73 /128-1)
> +        .quad 0x3FDA569F36E974EA, 0x3FD56F4736B527DA  //2^( 74 /128-1) - 2^(- 74 /128-1), 2^(- 74 /128-1)
> +        .quad 0x3FDAB6A9DE1EA215, 0x3FD551A4CA5D920F  //2^( 75 /128-1) - 2^(- 75 /128-1), 2^(- 75 /128-1)
> +        .quad 0x3FDB16E7DBFC4CA3, 0x3FD5342B569D4F82  //2^( 76 /128-1) - 2^(- 76 /128-1), 2^(- 76 /128-1)
> +        .quad 0x3FDB7759E9782918, 0x3FD516DAA2CF6642  //2^( 77 /128-1) - 2^(- 77 /128-1), 2^(- 77 /128-1)
> +        .quad 0x3FDBD800BFEBF932, 0x3FD4F9B2769D2CA7  //2^( 78 /128-1) - 2^(- 78 /128-1), 2^(- 78 /128-1)
> +        .quad 0x3FDC38DD1916F025, 0x3FD4DCB299FDDD0D  //2^( 79 /128-1) - 2^(- 79 /128-1), 2^(- 79 /128-1)
> +        .quad 0x3FDC99EFAF1F1790, 0x3FD4BFDAD5362A27  //2^( 80 /128-1) - 2^(- 80 /128-1), 2^(- 80 /128-1)
> +        .quad 0x3FDCFB393C92B539, 0x3FD4A32AF0D7D3DE  //2^( 81 /128-1) - 2^(- 81 /128-1), 2^(- 81 /128-1)
> +        .quad 0x3FDD5CBA7C69B19C, 0x3FD486A2B5C13CD0  //2^( 82 /128-1) - 2^(- 82 /128-1), 2^(- 82 /128-1)
> +        .quad 0x3FDDBE742A06FF34, 0x3FD46A41ED1D0057  //2^( 83 /128-1) - 2^(- 83 /128-1), 2^(- 83 /128-1)
> +        .quad 0x3FDE2067013A029D, 0x3FD44E086061892D  //2^( 84 /128-1) - 2^(- 84 /128-1), 2^(- 84 /128-1)
> +        .quad 0x3FDE8293BE3FFB87, 0x3FD431F5D950A897  //2^( 85 /128-1) - 2^(- 85 /128-1), 2^(- 85 /128-1)
> +        .quad 0x3FDEE4FB1DC56E75, 0x3FD4160A21F72E2A  //2^( 86 /128-1) - 2^(- 86 /128-1), 2^(- 86 /128-1)
> +        .quad 0x3FDF479DDCE78F58, 0x3FD3FA4504AC801C  //2^( 87 /128-1) - 2^(- 87 /128-1), 2^(- 87 /128-1)
> +        .quad 0x3FDFAA7CB935ACFE, 0x3FD3DEA64C123422  //2^( 88 /128-1) - 2^(- 88 /128-1), 2^(- 88 /128-1)
> +        .quad 0x3FE006CC38594EB1, 0x3FD3C32DC313A8E5  //2^( 89 /128-1) - 2^(- 89 /128-1), 2^(- 89 /128-1)
> +        .quad 0x3FE03878E0EB1569, 0x3FD3A7DB34E59FF7  //2^( 90 /128-1) - 2^(- 90 /128-1), 2^(- 90 /128-1)
> +        .quad 0x3FE06A44B5C74101, 0x3FD38CAE6D05D866  //2^( 91 /128-1) - 2^(- 91 /128-1), 2^(- 91 /128-1)
> +        .quad 0x3FE09C3016A0D077, 0x3FD371A7373AA9CB  //2^( 92 /128-1) - 2^(- 92 /128-1), 2^(- 92 /128-1)
> +        .quad 0x3FE0CE3B63676360, 0x3FD356C55F929FF1  //2^( 93 /128-1) - 2^(- 93 /128-1), 2^(- 93 /128-1)
> +        .quad 0x3FE10066FC47F240, 0x3FD33C08B26416FF  //2^( 94 /128-1) - 2^(- 94 /128-1), 2^(- 94 /128-1)
> +        .quad 0x3FE132B341AD8761, 0x3FD32170FC4CD831  //2^( 95 /128-1) - 2^(- 95 /128-1), 2^(- 95 /128-1)
> +        .quad 0x3FE165209441F823, 0x3FD306FE0A31B715  //2^( 96 /128-1) - 2^(- 96 /128-1), 2^(- 96 /128-1)
> +        .quad 0x3FE197AF54EE9EBB, 0x3FD2ECAFA93E2F56  //2^( 97 /128-1) - 2^(- 97 /128-1), 2^(- 97 /128-1)
> +        .quad 0x3FE1CA5FE4DD1475, 0x3FD2D285A6E4030B  //2^( 98 /128-1) - 2^(- 98 /128-1), 2^(- 98 /128-1)
> +        .quad 0x3FE1FD32A577EC72, 0x3FD2B87FD0DAD990  //2^( 99 /128-1) - 2^(- 99 /128-1), 2^(- 99 /128-1)
> +        .quad 0x3FE23027F86B6ED6, 0x3FD29E9DF51FDEE1  //2^( 100 /128-1) - 2^(- 100 /128-1), 2^(- 100 /128-1)
> +        .quad 0x3FE263403FA65489, 0x3FD284DFE1F56381  //2^( 101 /128-1) - 2^(- 101 /128-1), 2^(- 101 /128-1)
> +        .quad 0x3FE2967BDD5A8364, 0x3FD26B4565E27CDD  //2^( 102 /128-1) - 2^(- 102 /128-1), 2^(- 102 /128-1)
> +        .quad 0x3FE2C9DB33FDCAE9, 0x3FD251CE4FB2A63F  //2^( 103 /128-1) - 2^(- 103 /128-1), 2^(- 103 /128-1)
> +        .quad 0x3FE2FD5EA64AA180, 0x3FD2387A6E756238  //2^( 104 /128-1) - 2^(- 104 /128-1), 2^(- 104 /128-1)
> +        .quad 0x3FE331069740E22F, 0x3FD21F49917DDC96  //2^( 105 /128-1) - 2^(- 105 /128-1), 2^(- 105 /128-1)
> +        .quad 0x3FE364D36A268AE0, 0x3FD2063B88628CD6  //2^( 106 /128-1) - 2^(- 106 /128-1), 2^(- 106 /128-1)
> +        .quad 0x3FE398C582887B27, 0x3FD1ED5022FCD91D  //2^( 107 /128-1) - 2^(- 107 /128-1), 2^(- 107 /128-1)
> +        .quad 0x3FE3CCDD443B3394, 0x3FD1D4873168B9AA  //2^( 108 /128-1) - 2^(- 108 /128-1), 2^(- 108 /128-1)
> +        .quad 0x3FE4011B135B9590, 0x3FD1BBE084045CD4  //2^( 109 /128-1) - 2^(- 109 /128-1), 2^(- 109 /128-1)
> +        .quad 0x3FE4357F544FA3C1, 0x3FD1A35BEB6FCB75  //2^( 110 /128-1) - 2^(- 110 /128-1), 2^(- 110 /128-1)
> +        .quad 0x3FE46A0A6BC742FD, 0x3FD18AF9388C8DEA  //2^( 111 /128-1) - 2^(- 111 /128-1), 2^(- 111 /128-1)
> +        .quad 0x3FE49EBCBEBCFBCA, 0x3FD172B83C7D517B  //2^( 112 /128-1) - 2^(- 112 /128-1), 2^(- 112 /128-1)
> +        .quad 0x3FE4D396B276BC6F, 0x3FD15A98C8A58E51  //2^( 113 /128-1) - 2^(- 113 /128-1), 2^(- 113 /128-1)
> +        .quad 0x3FE50898AC869B96, 0x3FD1429AAEA92DE0  //2^( 114 /128-1) - 2^(- 114 /128-1), 2^(- 114 /128-1)
> +        .quad 0x3FE53DC312CB9B7A, 0x3FD12ABDC06C31CC  //2^( 115 /128-1) - 2^(- 115 /128-1), 2^(- 115 /128-1)
> +        .quad 0x3FE573164B726DB6, 0x3FD11301D0125B51  //2^( 116 /128-1) - 2^(- 116 /128-1), 2^(- 116 /128-1)
> +        .quad 0x3FE5A892BCF6379B, 0x3FD0FB66AFFED31B  //2^( 117 /128-1) - 2^(- 117 /128-1), 2^(- 117 /128-1)
> +        .quad 0x3FE5DE38CE215725, 0x3FD0E3EC32D3D1A2  //2^( 118 /128-1) - 2^(- 118 /128-1), 2^(- 118 /128-1)
> +        .quad 0x3FE61408E60E2888, 0x3FD0CC922B7247F7  //2^( 119 /128-1) - 2^(- 119 /128-1), 2^(- 119 /128-1)
> +        .quad 0x3FE64A036C27CC52, 0x3FD0B5586CF9890F  //2^( 120 /128-1) - 2^(- 120 /128-1), 2^(- 120 /128-1)
> +        .quad 0x3FE68028C82AEE2F, 0x3FD09E3ECAC6F383  //2^( 121 /128-1) - 2^(- 121 /128-1), 2^(- 121 /128-1)
> +        .quad 0x3FE6B67962268C43, 0x3FD0874518759BC8  //2^( 122 /128-1) - 2^(- 122 /128-1), 2^(- 122 /128-1)
> +        .quad 0x3FE6ECF5A27CBF28, 0x3FD0706B29DDF6DE  //2^( 123 /128-1) - 2^(- 123 /128-1), 2^(- 123 /128-1)
> +        .quad 0x3FE7239DF1E38286, 0x3FD059B0D3158574  //2^( 124 /128-1) - 2^(- 124 /128-1), 2^(- 124 /128-1)
> +        .quad 0x3FE75A72B9657E51, 0x3FD04315E86E7F85  //2^( 125 /128-1) - 2^(- 125 /128-1), 2^(- 125 /128-1)
> +        .quad 0x3FE791746262D0A8, 0x3FD02C9A3E778061  //2^( 126 /128-1) - 2^(- 126 /128-1), 2^(- 126 /128-1)
> +        .quad 0x3FE7C8A35691D856, 0x3FD0163DA9FB3335 //2^( 127 /128-1) - 2^(- 127 /128-1), 2^(- 127 /128-1)
> +        .align 16
> +        .quad 0x42C8000000000000, 0x42C8000000000000 /* _dbShifter = 1.5 * 2^(52-k)*/
> +        .align 16
> +        .long 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99         /* _iDomainRange 0x40861d9ac12a3e85 =(1021*2^K-0.5)*log(2)/2^K -needed for quick exp*/
> +        .align 16
> +        .quad 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD /* _dPC2 */
> +        .align 16
> +        .quad 0x3FC55555555554AD, 0x3FC55555555554AD /* _dPC3 */
> +        .align 16
> +        .quad 0x3FA55555CF16D299, 0x3FA55555CF16D299 /* _dPC4 */
> +        .align 16
> +        .quad 0x3F8111115712F425, 0x3F8111115712F425 /* _dPC5 */
> +        .align 16
> +        .quad 0x000000000000007f, 0x000000000000007f /* _lIndexMask */
> +        .align 16
> +        .type	__svml_dsinh_data_internal,@object
> +        .size	__svml_dsinh_data_internal,.-__svml_dsinh_data_internal
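
The 128 _dbT rows above follow directly from the formula in their trailing
comments.  A throwaway generator (not part of the patch) that reproduces
the value pairs, up to possible last-bit rounding differences in exp2:

#include <inttypes.h>
#include <math.h>
#include <stdio.h>
#include <string.h>

int
main (void)
{
  for (int j = 0; j < 128; j++)
    {
      /* Row j: 2^(j/128-1) - 2^(-j/128-1), then 2^(-j/128-1).  */
      double lo = exp2 (-j / 128.0 - 1.0);
      double hi = exp2 (j / 128.0 - 1.0) - lo;
      uint64_t bhi, blo;
      memcpy (&bhi, &hi, sizeof (bhi));
      memcpy (&blo, &lo, sizeof (blo));
      printf ("        .quad 0x%016" PRIX64 ", 0x%016" PRIX64 "\n", bhi, blo);
    }
  return 0;
}

Row 64 is a quick sanity check: 2^(-0.5) - 2^(-1.5) equals 2^(-1.5), which
is why both halves of that row are 0x3FD6A09E667F3BCD.
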
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core-sse.S
> new file mode 100644
> index 0000000000..ae531575fe
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized sinh, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN4v_sinh _ZGVdN4v_sinh_sse_wrapper
> +#include "../svml_d_sinh4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core.c
> new file mode 100644
> index 0000000000..bdf10b664b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized sinh, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN4v_sinh
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN4v_sinh, __GI__ZGVdN4v_sinh, __redirect__ZGVdN4v_sinh)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core_avx2.S
> new file mode 100644
> index 0000000000..27b50d31a8
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh4_core_avx2.S
> @@ -0,0 +1,470 @@
> +/* Function sinh vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute sinh(x) as (exp(x)-exp(-x))/2,
> + *   where exp is calculated as
> + *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
> + *
> + *   Special cases:
> + *
> + *   sinh(NaN) = quiet NaN, and raise invalid exception
> + *   sinh(INF) = that INF
> + *   sinh(x)   = x for subnormals
> + *   sinh(x) overflows for big x and returns MAXLOG+log(2)
> + *
> + */
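
As a scalar model of the reduction described above (illustrative only; the
vector code below keeps the 2^(j/128) factors pre-halved and packed in
_dbT, and rounds with the RShifter trick instead of nearbyint):

#include <math.h>
#include <stdio.h>

#define K 7			/* 2^K = 128 table entries.  */

static double
sinh_model (double x)
{
  double a = fabs (x);
  /* a = (n*2^K + j) * ln2/2^K + r, with |r| <= ln2/2^(K+1).  */
  double m = nearbyint (a * (1 << K) / M_LN2);
  double r = a - m * (M_LN2 / (1 << K));
  int j = (long int) m & ((1 << K) - 1);
  int n = (long int) m >> K;
  /* exp(a) = 2^n * 2^(j/128) * e^r, exp(-a) = 2^-n * 2^(-j/128) * e^-r.  */
  double t = exp2 (j / 128.0);
  double s = 0.5 * (ldexp (t * exp (r), n) - ldexp (exp (-r) / t, -n));
  return copysign (s, x);	/* sinh is odd.  */
}

int
main (void)
{
  printf ("%.17g vs %.17g\n", sinh_model (3.5), sinh (3.5));
  return 0;
}
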
> +
> +/* Offsets for data table __svml_dsinh_data_internal
> + */
> +#define _dbInvLn2                     	0
> +#define _dbLn2hi                      	32
> +#define _dbLn2lo                      	64
> +#define _dSign                        	96
> +#define _dbT                          	128
> +#define _dbShifter                    	2176
> +#define _iDomainRange                 	2208
> +#define _dPC2                         	2240
> +#define _dPC3                         	2272
> +#define _dPC4                         	2304
> +#define _dPC5                         	2336
> +#define _lIndexMask                   	2368
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN4v_sinh_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +        lea       _dbT+8+__svml_dsinh_data_internal(%rip), %r8
> +        vmovupd   _dbShifter+__svml_dsinh_data_internal(%rip), %ymm12
> +
> +/*
> + *  Load argument
> + * dM = x*2^K/log(2) + RShifter
> + */
> +        vmovupd   _dbInvLn2+__svml_dsinh_data_internal(%rip), %ymm5
> +        vmovupd   _dbLn2hi+__svml_dsinh_data_internal(%rip), %ymm13
> +        vmovapd   %ymm0, %ymm8
> +
> +/*
> + * VLOAD_CONST( D, dPC[0],         TAB._dPC1 );
> + *  Abs argument
> + */
> +        vandpd    _dSign+__svml_dsinh_data_internal(%rip), %ymm8, %ymm7
> +        vxorpd    %ymm8, %ymm7, %ymm6
> +        vfmadd213pd %ymm12, %ymm6, %ymm5
> +
> +/*
> + *  R
> + * dN = dM - RShifter
> + */
> +        vsubpd    %ymm12, %ymm5, %ymm3
> +
> +/*
> + *  Index and lookup
> + * j
> + */
> +        vandps    _lIndexMask+__svml_dsinh_data_internal(%rip), %ymm5, %ymm4
> +
> +/*
> + * Check for overflow/underflow
> + *
> + */
> +        vextractf128 $1, %ymm6, %xmm9
> +        vshufps   $221, %xmm9, %xmm6, %xmm10
> +
> +/* dR = dX - dN*Log2_hi/2^K */
> +        vfnmadd231pd %ymm13, %ymm3, %ymm6
> +        vpcmpgtd  _iDomainRange+__svml_dsinh_data_internal(%rip), %xmm10, %xmm11
> +        vmovmskps %xmm11, %eax
> +
> +/* dR = (dX - dN*Log2_hi/2^K) - dN*Log2_lo/2^K */
> +        vfnmadd231pd _dbLn2lo+__svml_dsinh_data_internal(%rip), %ymm3, %ymm6
> +        vextractf128 $1, %ymm4, %xmm0
> +        vmovd     %xmm4, %edx
> +        vmovd     %xmm0, %esi
> +        shll      $4, %edx
> +        vpextrd   $2, %xmm4, %ecx
> +
> +/* split j and N */
> +        vxorps    %ymm4, %ymm5, %ymm3
> +        shll      $4, %esi
> +        vpextrd   $2, %xmm0, %edi
> +        shll      $4, %ecx
> +
> +/*
> + *  G1,G2,G3: dTdif,dTn * 2^N,2^(-N)
> + * lM now is an EXP(2^N)
> + */
> +        vpsllq    $45, %ymm3, %ymm4
> +        vmovq     (%rdx,%r8), %xmm14
> +        vmovq     (%rsi,%r8), %xmm1
> +        vmovhpd   (%rcx,%r8), %xmm14, %xmm15
> +        shll      $4, %edi
> +        vmovhpd   (%rdi,%r8), %xmm1, %xmm2
> +
> +/* dR2 = dR^2 */
> +        vmulpd    %ymm6, %ymm6, %ymm1
> +        vmovq     -8(%rdx,%r8), %xmm9
> +        vmovq     -8(%rsi,%r8), %xmm11
> +        vmovhpd   -8(%rcx,%r8), %xmm9, %xmm10
> +        vmovhpd   -8(%rdi,%r8), %xmm11, %xmm12
> +        vinsertf128 $1, %xmm2, %ymm15, %ymm2
> +
> +/* dTn*2^N */
> +        vpaddq    %ymm4, %ymm2, %ymm5
> +
> +/* dTn*2^-N */
> +        vpsubq    %ymm4, %ymm2, %ymm14
> +
> +/* dG3 = dTn*2^N + dTn*2^-N */
> +        vaddpd    %ymm14, %ymm5, %ymm2
> +
> +/* dG2 = dTn*2^N - dTn*2^-N */
> +        vsubpd    %ymm14, %ymm5, %ymm14
> +
> +/*
> + * sinh(r) = r*((a1=1)+r^2*(a3+r^2*a5)) = r + r*(r^2*(a3+r^2*a5)) ....
> + * dSinh_r = (a3+r^2*a5)
> + */
> +        vmovupd   _dPC5+__svml_dsinh_data_internal(%rip), %ymm5
> +        vfmadd213pd _dPC3+__svml_dsinh_data_internal(%rip), %ymm1, %ymm5
> +        vinsertf128 $1, %xmm12, %ymm10, %ymm13
> +        vpaddq    %ymm4, %ymm13, %ymm0
> +
> +/* dSinh_r = r^2*(a3+r^2*a5) */
> +        vmulpd    %ymm5, %ymm1, %ymm4
> +
> +/* dG2 += dG1 */
> +        vaddpd    %ymm14, %ymm0, %ymm3
> +
> +/* dG1 += dG3 */
> +        vaddpd    %ymm2, %ymm0, %ymm0
> +
> +/* dSinh_r = r + r*(r^2*(a3+r^2*a5)) */
> +        vfmadd213pd %ymm6, %ymm6, %ymm4
> +
> +/*
> + * poly(r) = (dG2+dG1)+dG3*sinh(dR)+dG1*sinh(dR)+(dG1+dG2)*dR2*(a2 +a4*dR2)
> + * dOut = (a2 +a4*dR2)
> + */
> +        vmovupd   _dPC4+__svml_dsinh_data_internal(%rip), %ymm6
> +        vfmadd213pd _dPC2+__svml_dsinh_data_internal(%rip), %ymm1, %ymm6
> +
> +/* dOut = dR2*(a2 +a4*dR2) */
> +        vmulpd    %ymm6, %ymm1, %ymm1
> +
> +/* dOut = dG2*dR2*(a2 +a4*dR2) */
> +        vmulpd    %ymm3, %ymm1, %ymm6
> +
> +/* dOut = dG1*sinh(dR)+dG2*dR2*(a2 +a4*dR2) */
> +        vfmadd213pd %ymm6, %ymm0, %ymm4
> +
> +/* dOut = dG2 + dG1*sinh(dR)+dG2*dR2*(a2 +a4*dR2) */
> +        vaddpd    %ymm4, %ymm3, %ymm5
> +
> +/*  Ret H  */
> +        vorpd     %ymm5, %ymm7, %ymm0
> +        testl     %eax, %eax
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm8
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovupd   %ymm8, 32(%rsp)
> +        vmovupd   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0
> +
> +        xorl      %edx, %edx
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovupd   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      sinh@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 64(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN4v_sinh_avx2)
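
The SPECIAL_VALUES_BRANCH machinery above (and its SSE/AVX-512 twins)
amounts to re-running the scalar routine for the lanes flagged in the
range mask.  In rough C, with illustrative names and the 4 lanes of the
AVX2 version (the real code walks the bits with btl and keeps the input
and result vectors in the on-stack spill slots):

#include <math.h>

static void
fixup_special_lanes (const double vin[4], double vout[4], unsigned int mask)
{
  for (int lane = 0; lane < 4; lane++)
    if (mask & (1u << lane))		/* btl %r12d, %r13d */
      vout[lane] = sinh (vin[lane]);	/* call sinh@PLT */
}
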
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_dsinh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(32)) VUINT32 _dbInvLn2[4][2];
> +        __declspec(align(32)) VUINT32 _dbLn2hi[4][2];
> +        __declspec(align(32)) VUINT32 _dbLn2lo[4][2];
> +        __declspec(align(32)) VUINT32 _dSign[4][2];                //0x8000000000000000
> +        __declspec(align(32)) VUINT32 _dbT[(1<<7)][2][2]; //precalc poly coeff
> +        __declspec(align(32)) VUINT32 _dbShifter[4][2];
> +        __declspec(align(32)) VUINT32 _iDomainRange[8][1];
> +        __declspec(align(32)) VUINT32 _dPC2[4][2];
> +        __declspec(align(32)) VUINT32 _dPC3[4][2];
> +        __declspec(align(32)) VUINT32 _dPC4[4][2];
> +        __declspec(align(32)) VUINT32 _dPC5[4][2];
> +        __declspec(align(32)) VUINT32 _lIndexMask[4][2];
> +} __svml_dsinh_data_internal;
> +#endif
> +__svml_dsinh_data_internal:
> +        .quad 0x3FF71547652B82FE, 0x3FF71547652B82FE, 0x3FF71547652B82FE, 0x3FF71547652B82FE /* _dbInvLn2 = 1/log(2) */
> +        .align 32
> +        .quad 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000 /* _dbLn2hi  = log(2) hi*/
> +        .align 32
> +        .quad 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A /* _dbLn2lo  = log(2) lo*/
> +        .align 32
> +        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000 /* _dSign */
> +        //_dbT
> +        .align 32
> +        .quad 0x0000000000000000, 0x3FE0000000000000  //2^( 0 /128-1) - 2^(- 0 /128-1), 2^(- 0 /128-1)
> +        .quad 0x3F762E4A19BD1E74, 0x3FDFD3C22B8F71F1  //2^( 1 /128-1) - 2^(- 1 /128-1), 2^(- 1 /128-1)
> +        .quad 0x3F862E5F6A0DFD36, 0x3FDFA7C1819E90D8  //2^( 2 /128-1) - 2^(- 2 /128-1), 2^(- 2 /128-1)
> +        .quad 0x3F90A2E234040F5F, 0x3FDF7BFDAD9CBE14  //2^( 3 /128-1) - 2^(- 3 /128-1), 2^(- 3 /128-1)
> +        .quad 0x3F962EB4ABCC5A81, 0x3FDF50765B6E4540  //2^( 4 /128-1) - 2^(- 4 /128-1), 2^(- 4 /128-1)
> +        .quad 0x3F9BBAB1C5033244, 0x3FDF252B376BBA97  //2^( 5 /128-1) - 2^(- 5 /128-1), 2^(- 5 /128-1)
> +        .quad 0x3FA0A372144EEB45, 0x3FDEFA1BEE615A27  //2^( 6 /128-1) - 2^(- 6 /128-1), 2^(- 6 /128-1)
> +        .quad 0x3FA369AB3FFBF8B0, 0x3FDECF482D8E67F1  //2^( 7 /128-1) - 2^(- 7 /128-1), 2^(- 7 /128-1)
> +        .quad 0x3FA63009BA740A2A, 0x3FDEA4AFA2A490DA  //2^( 8 /128-1) - 2^(- 8 /128-1), 2^(- 8 /128-1)
> +        .quad 0x3FA8F692D8EA1B5A, 0x3FDE7A51FBC74C83  //2^( 9 /128-1) - 2^(- 9 /128-1), 2^(- 9 /128-1)
> +        .quad 0x3FABBD4BF0E31A6F, 0x3FDE502EE78B3FF6  //2^( 10 /128-1) - 2^(- 10 /128-1), 2^(- 10 /128-1)
> +        .quad 0x3FAE843A5840286A, 0x3FDE264614F5A129  //2^( 11 /128-1) - 2^(- 11 /128-1), 2^(- 11 /128-1)
> +        .quad 0x3FB0A5B1B2A46D0A, 0x3FDDFC97337B9B5F  //2^( 12 /128-1) - 2^(- 12 /128-1), 2^(- 12 /128-1)
> +        .quad 0x3FB20966375ABCDF, 0x3FDDD321F301B460  //2^( 13 /128-1) - 2^(- 13 /128-1), 2^(- 13 /128-1)
> +        .quad 0x3FB36D3D65DCA4E8, 0x3FDDA9E603DB3285  //2^( 14 /128-1) - 2^(- 14 /128-1), 2^(- 14 /128-1)
> +        .quad 0x3FB4D139EA06642A, 0x3FDD80E316C98398  //2^( 15 /128-1) - 2^(- 15 /128-1), 2^(- 15 /128-1)
> +        .quad 0x3FB6355E6FFBF9BA, 0x3FDD5818DCFBA487  //2^( 16 /128-1) - 2^(- 16 /128-1), 2^(- 16 /128-1)
> +        .quad 0x3FB799ADA42E4788, 0x3FDD2F87080D89F2  //2^( 17 /128-1) - 2^(- 17 /128-1), 2^(- 17 /128-1)
> +        .quad 0x3FB8FE2A336035BC, 0x3FDD072D4A07897C  //2^( 18 /128-1) - 2^(- 18 /128-1), 2^(- 18 /128-1)
> +        .quad 0x3FBA62D6CAABD6B6, 0x3FDCDF0B555DC3FA  //2^( 19 /128-1) - 2^(- 19 /128-1), 2^(- 19 /128-1)
> +        .quad 0x3FBBC7B617878BAF, 0x3FDCB720DCEF9069  //2^( 20 /128-1) - 2^(- 20 /128-1), 2^(- 20 /128-1)
> +        .quad 0x3FBD2CCAC7CB2A11, 0x3FDC8F6D9406E7B5  //2^( 21 /128-1) - 2^(- 21 /128-1), 2^(- 21 /128-1)
> +        .quad 0x3FBE921789B52185, 0x3FDC67F12E57D14B  //2^( 22 /128-1) - 2^(- 22 /128-1), 2^(- 22 /128-1)
> +        .quad 0x3FBFF79F0BEFA2C7, 0x3FDC40AB5FFFD07A  //2^( 23 /128-1) - 2^(- 23 /128-1), 2^(- 23 /128-1)
> +        .quad 0x3FC0AEB1FECAE3A9, 0x3FDC199BDD85529C  //2^( 24 /128-1) - 2^(- 24 /128-1), 2^(- 24 /128-1)
> +        .quad 0x3FC161B4871C5CEC, 0x3FDBF2C25BD71E09  //2^( 25 /128-1) - 2^(- 25 /128-1), 2^(- 25 /128-1)
> +        .quad 0x3FC214D876F26FD0, 0x3FDBCC1E904BC1D2  //2^( 26 /128-1) - 2^(- 26 /128-1), 2^(- 26 /128-1)
> +        .quad 0x3FC2C81F2693816F, 0x3FDBA5B030A1064A  //2^( 27 /128-1) - 2^(- 27 /128-1), 2^(- 27 /128-1)
> +        .quad 0x3FC37B89EE88BEF7, 0x3FDB7F76F2FB5E47  //2^( 28 /128-1) - 2^(- 28 /128-1), 2^(- 28 /128-1)
> +        .quad 0x3FC42F1A27A0B3CD, 0x3FDB59728DE5593A  //2^( 29 /128-1) - 2^(- 29 /128-1), 2^(- 29 /128-1)
> +        .quad 0x3FC4E2D12AF1E037, 0x3FDB33A2B84F15FB  //2^( 30 /128-1) - 2^(- 30 /128-1), 2^(- 30 /128-1)
> +        .quad 0x3FC596B051DD508D, 0x3FDB0E07298DB666  //2^( 31 /128-1) - 2^(- 31 /128-1), 2^(- 31 /128-1)
> +        .quad 0x3FC64AB8F61134FA, 0x3FDAE89F995AD3AD  //2^( 32 /128-1) - 2^(- 32 /128-1), 2^(- 32 /128-1)
> +        .quad 0x3FC6FEEC718B79D1, 0x3FDAC36BBFD3F37A  //2^( 33 /128-1) - 2^(- 33 /128-1), 2^(- 33 /128-1)
> +        .quad 0x3FC7B34C1E9C607F, 0x3FDA9E6B5579FDBF  //2^( 34 /128-1) - 2^(- 34 /128-1), 2^(- 34 /128-1)
> +        .quad 0x3FC867D957E91912, 0x3FDA799E1330B358  //2^( 35 /128-1) - 2^(- 35 /128-1), 2^(- 35 /128-1)
> +        .quad 0x3FC91C95786E5C72, 0x3FDA5503B23E255D  //2^( 36 /128-1) - 2^(- 36 /128-1), 2^(- 36 /128-1)
> +        .quad 0x3FC9D181DB83072F, 0x3FDA309BEC4A2D33  //2^( 37 /128-1) - 2^(- 37 /128-1), 2^(- 37 /128-1)
> +        .quad 0x3FCA869FDCDAB512, 0x3FDA0C667B5DE565  //2^( 38 /128-1) - 2^(- 38 /128-1), 2^(- 38 /128-1)
> +        .quad 0x3FCB3BF0D8885D4C, 0x3FD9E86319E32323  //2^( 39 /128-1) - 2^(- 39 /128-1), 2^(- 39 /128-1)
> +        .quad 0x3FCBF1762B00EF69, 0x3FD9C49182A3F090  //2^( 40 /128-1) - 2^(- 40 /128-1), 2^(- 40 /128-1)
> +        .quad 0x3FCCA731311DF0FB, 0x3FD9A0F170CA07BA  //2^( 41 /128-1) - 2^(- 41 /128-1), 2^(- 41 /128-1)
> +        .quad 0x3FCD5D2348201C09, 0x3FD97D829FDE4E50  //2^( 42 /128-1) - 2^(- 42 /128-1), 2^(- 42 /128-1)
> +        .quad 0x3FCE134DCDB1FE3E, 0x3FD95A44CBC8520F  //2^( 43 /128-1) - 2^(- 43 /128-1), 2^(- 43 /128-1)
> +        .quad 0x3FCEC9B21FEA98EA, 0x3FD93737B0CDC5E5  //2^( 44 /128-1) - 2^(- 44 /128-1), 2^(- 44 /128-1)
> +        .quad 0x3FCF80519D5001D3, 0x3FD9145B0B91FFC6  //2^( 45 /128-1) - 2^(- 45 /128-1), 2^(- 45 /128-1)
> +        .quad 0x3FD01B96D26D026A, 0x3FD8F1AE99157736  //2^( 46 /128-1) - 2^(- 46 /128-1), 2^(- 46 /128-1)
> +        .quad 0x3FD07723CAFA6331, 0x3FD8CF3216B5448C  //2^( 47 /128-1) - 2^(- 47 /128-1), 2^(- 47 /128-1)
> +        .quad 0x3FD0D2D06841B373, 0x3FD8ACE5422AA0DB  //2^( 48 /128-1) - 2^(- 48 /128-1), 2^(- 48 /128-1)
> +        .quad 0x3FD12E9D5A715381, 0x3FD88AC7D98A6699  //2^( 49 /128-1) - 2^(- 49 /128-1), 2^(- 49 /128-1)
> +        .quad 0x3FD18A8B51F5C661, 0x3FD868D99B4492ED  //2^( 50 /128-1) - 2^(- 50 /128-1), 2^(- 50 /128-1)
> +        .quad 0x3FD1E69AFF7B04D7, 0x3FD8471A4623C7AD  //2^( 51 /128-1) - 2^(- 51 /128-1), 2^(- 51 /128-1)
> +        .quad 0x3FD242CD13EDD0F1, 0x3FD82589994CCE13  //2^( 52 /128-1) - 2^(- 52 /128-1), 2^(- 52 /128-1)
> +        .quad 0x3FD29F22407D0A0C, 0x3FD80427543E1A12  //2^( 53 /128-1) - 2^(- 53 /128-1), 2^(- 53 /128-1)
> +        .quad 0x3FD2FB9B369B0153, 0x3FD7E2F336CF4E62  //2^( 54 /128-1) - 2^(- 54 /128-1), 2^(- 54 /128-1)
> +        .quad 0x3FD35838A7FECEC8, 0x3FD7C1ED0130C132  //2^( 55 /128-1) - 2^(- 55 /128-1), 2^(- 55 /128-1)
> +        .quad 0x3FD3B4FB46A5A6CC, 0x3FD7A11473EB0187  //2^( 56 /128-1) - 2^(- 56 /128-1), 2^(- 56 /128-1)
> +        .quad 0x3FD411E3C4D4302F, 0x3FD780694FDE5D3F  //2^( 57 /128-1) - 2^(- 57 /128-1), 2^(- 57 /128-1)
> +        .quad 0x3FD46EF2D517DAC8, 0x3FD75FEB564267C9  //2^( 58 /128-1) - 2^(- 58 /128-1), 2^(- 58 /128-1)
> +        .quad 0x3FD4CC292A48369E, 0x3FD73F9A48A58174  //2^( 59 /128-1) - 2^(- 59 /128-1), 2^(- 59 /128-1)
> +        .quad 0x3FD5298777884B96, 0x3FD71F75E8EC5F74  //2^( 60 /128-1) - 2^(- 60 /128-1), 2^(- 60 /128-1)
> +        .quad 0x3FD5870E7047F1BC, 0x3FD6FF7DF9519484  //2^( 61 /128-1) - 2^(- 61 /128-1), 2^(- 61 /128-1)
> +        .quad 0x3FD5E4BEC8452A1A, 0x3FD6DFB23C651A2F  //2^( 62 /128-1) - 2^(- 62 /128-1), 2^(- 62 /128-1)
> +        .quad 0x3FD64299338D7827, 0x3FD6C012750BDABF  //2^( 63 /128-1) - 2^(- 63 /128-1), 2^(- 63 /128-1)
> +        .quad 0x3FD6A09E667F3BCD, 0x3FD6A09E667F3BCD  //2^( 64 /128-1) - 2^(- 64 /128-1), 2^(- 64 /128-1)
> +        .quad 0x3FD6FECF15CB0C0B, 0x3FD68155D44CA973  //2^( 65 /128-1) - 2^(- 65 /128-1), 2^(- 65 /128-1)
> +        .quad 0x3FD75D2BF6751239, 0x3FD6623882552225  //2^( 66 /128-1) - 2^(- 66 /128-1), 2^(- 66 /128-1)
> +        .quad 0x3FD7BBB5BDD665E8, 0x3FD6434634CCC320  //2^( 67 /128-1) - 2^(- 67 /128-1), 2^(- 67 /128-1)
> +        .quad 0x3FD81A6D219E6963, 0x3FD6247EB03A5585  //2^( 68 /128-1) - 2^(- 68 /128-1), 2^(- 68 /128-1)
> +        .quad 0x3FD87952D7D426DF, 0x3FD605E1B976DC09  //2^( 69 /128-1) - 2^(- 69 /128-1), 2^(- 69 /128-1)
> +        .quad 0x3FD8D86796D7AE49, 0x3FD5E76F15AD2148  //2^( 70 /128-1) - 2^(- 70 /128-1), 2^(- 70 /128-1)
> +        .quad 0x3FD937AC156373C8, 0x3FD5C9268A5946B7  //2^( 71 /128-1) - 2^(- 71 /128-1), 2^(- 71 /128-1)
> +        .quad 0x3FD997210A8DAEE4, 0x3FD5AB07DD485429  //2^( 72 /128-1) - 2^(- 72 /128-1), 2^(- 72 /128-1)
> +        .quad 0x3FD9F6C72DC9BA68, 0x3FD58D12D497C7FD  //2^( 73 /128-1) - 2^(- 73 /128-1), 2^(- 73 /128-1)
> +        .quad 0x3FDA569F36E974EA, 0x3FD56F4736B527DA  //2^( 74 /128-1) - 2^(- 74 /128-1), 2^(- 74 /128-1)
> +        .quad 0x3FDAB6A9DE1EA215, 0x3FD551A4CA5D920F  //2^( 75 /128-1) - 2^(- 75 /128-1), 2^(- 75 /128-1)
> +        .quad 0x3FDB16E7DBFC4CA3, 0x3FD5342B569D4F82  //2^( 76 /128-1) - 2^(- 76 /128-1), 2^(- 76 /128-1)
> +        .quad 0x3FDB7759E9782918, 0x3FD516DAA2CF6642  //2^( 77 /128-1) - 2^(- 77 /128-1), 2^(- 77 /128-1)
> +        .quad 0x3FDBD800BFEBF932, 0x3FD4F9B2769D2CA7  //2^( 78 /128-1) - 2^(- 78 /128-1), 2^(- 78 /128-1)
> +        .quad 0x3FDC38DD1916F025, 0x3FD4DCB299FDDD0D  //2^( 79 /128-1) - 2^(- 79 /128-1), 2^(- 79 /128-1)
> +        .quad 0x3FDC99EFAF1F1790, 0x3FD4BFDAD5362A27  //2^( 80 /128-1) - 2^(- 80 /128-1), 2^(- 80 /128-1)
> +        .quad 0x3FDCFB393C92B539, 0x3FD4A32AF0D7D3DE  //2^( 81 /128-1) - 2^(- 81 /128-1), 2^(- 81 /128-1)
> +        .quad 0x3FDD5CBA7C69B19C, 0x3FD486A2B5C13CD0  //2^( 82 /128-1) - 2^(- 82 /128-1), 2^(- 82 /128-1)
> +        .quad 0x3FDDBE742A06FF34, 0x3FD46A41ED1D0057  //2^( 83 /128-1) - 2^(- 83 /128-1), 2^(- 83 /128-1)
> +        .quad 0x3FDE2067013A029D, 0x3FD44E086061892D  //2^( 84 /128-1) - 2^(- 84 /128-1), 2^(- 84 /128-1)
> +        .quad 0x3FDE8293BE3FFB87, 0x3FD431F5D950A897  //2^( 85 /128-1) - 2^(- 85 /128-1), 2^(- 85 /128-1)
> +        .quad 0x3FDEE4FB1DC56E75, 0x3FD4160A21F72E2A  //2^( 86 /128-1) - 2^(- 86 /128-1), 2^(- 86 /128-1)
> +        .quad 0x3FDF479DDCE78F58, 0x3FD3FA4504AC801C  //2^( 87 /128-1) - 2^(- 87 /128-1), 2^(- 87 /128-1)
> +        .quad 0x3FDFAA7CB935ACFE, 0x3FD3DEA64C123422  //2^( 88 /128-1) - 2^(- 88 /128-1), 2^(- 88 /128-1)
> +        .quad 0x3FE006CC38594EB1, 0x3FD3C32DC313A8E5  //2^( 89 /128-1) - 2^(- 89 /128-1), 2^(- 89 /128-1)
> +        .quad 0x3FE03878E0EB1569, 0x3FD3A7DB34E59FF7  //2^( 90 /128-1) - 2^(- 90 /128-1), 2^(- 90 /128-1)
> +        .quad 0x3FE06A44B5C74101, 0x3FD38CAE6D05D866  //2^( 91 /128-1) - 2^(- 91 /128-1), 2^(- 91 /128-1)
> +        .quad 0x3FE09C3016A0D077, 0x3FD371A7373AA9CB  //2^( 92 /128-1) - 2^(- 92 /128-1), 2^(- 92 /128-1)
> +        .quad 0x3FE0CE3B63676360, 0x3FD356C55F929FF1  //2^( 93 /128-1) - 2^(- 93 /128-1), 2^(- 93 /128-1)
> +        .quad 0x3FE10066FC47F240, 0x3FD33C08B26416FF  //2^( 94 /128-1) - 2^(- 94 /128-1), 2^(- 94 /128-1)
> +        .quad 0x3FE132B341AD8761, 0x3FD32170FC4CD831  //2^( 95 /128-1) - 2^(- 95 /128-1), 2^(- 95 /128-1)
> +        .quad 0x3FE165209441F823, 0x3FD306FE0A31B715  //2^( 96 /128-1) - 2^(- 96 /128-1), 2^(- 96 /128-1)
> +        .quad 0x3FE197AF54EE9EBB, 0x3FD2ECAFA93E2F56  //2^( 97 /128-1) - 2^(- 97 /128-1), 2^(- 97 /128-1)
> +        .quad 0x3FE1CA5FE4DD1475, 0x3FD2D285A6E4030B  //2^( 98 /128-1) - 2^(- 98 /128-1), 2^(- 98 /128-1)
> +        .quad 0x3FE1FD32A577EC72, 0x3FD2B87FD0DAD990  //2^( 99 /128-1) - 2^(- 99 /128-1), 2^(- 99 /128-1)
> +        .quad 0x3FE23027F86B6ED6, 0x3FD29E9DF51FDEE1  //2^( 100 /128-1) - 2^(- 100 /128-1), 2^(- 100 /128-1)
> +        .quad 0x3FE263403FA65489, 0x3FD284DFE1F56381  //2^( 101 /128-1) - 2^(- 101 /128-1), 2^(- 101 /128-1)
> +        .quad 0x3FE2967BDD5A8364, 0x3FD26B4565E27CDD  //2^( 102 /128-1) - 2^(- 102 /128-1), 2^(- 102 /128-1)
> +        .quad 0x3FE2C9DB33FDCAE9, 0x3FD251CE4FB2A63F  //2^( 103 /128-1) - 2^(- 103 /128-1), 2^(- 103 /128-1)
> +        .quad 0x3FE2FD5EA64AA180, 0x3FD2387A6E756238  //2^( 104 /128-1) - 2^(- 104 /128-1), 2^(- 104 /128-1)
> +        .quad 0x3FE331069740E22F, 0x3FD21F49917DDC96  //2^( 105 /128-1) - 2^(- 105 /128-1), 2^(- 105 /128-1)
> +        .quad 0x3FE364D36A268AE0, 0x3FD2063B88628CD6  //2^( 106 /128-1) - 2^(- 106 /128-1), 2^(- 106 /128-1)
> +        .quad 0x3FE398C582887B27, 0x3FD1ED5022FCD91D  //2^( 107 /128-1) - 2^(- 107 /128-1), 2^(- 107 /128-1)
> +        .quad 0x3FE3CCDD443B3394, 0x3FD1D4873168B9AA  //2^( 108 /128-1) - 2^(- 108 /128-1), 2^(- 108 /128-1)
> +        .quad 0x3FE4011B135B9590, 0x3FD1BBE084045CD4  //2^( 109 /128-1) - 2^(- 109 /128-1), 2^(- 109 /128-1)
> +        .quad 0x3FE4357F544FA3C1, 0x3FD1A35BEB6FCB75  //2^( 110 /128-1) - 2^(- 110 /128-1), 2^(- 110 /128-1)
> +        .quad 0x3FE46A0A6BC742FD, 0x3FD18AF9388C8DEA  //2^( 111 /128-1) - 2^(- 111 /128-1), 2^(- 111 /128-1)
> +        .quad 0x3FE49EBCBEBCFBCA, 0x3FD172B83C7D517B  //2^( 112 /128-1) - 2^(- 112 /128-1), 2^(- 112 /128-1)
> +        .quad 0x3FE4D396B276BC6F, 0x3FD15A98C8A58E51  //2^( 113 /128-1) - 2^(- 113 /128-1), 2^(- 113 /128-1)
> +        .quad 0x3FE50898AC869B96, 0x3FD1429AAEA92DE0  //2^( 114 /128-1) - 2^(- 114 /128-1), 2^(- 114 /128-1)
> +        .quad 0x3FE53DC312CB9B7A, 0x3FD12ABDC06C31CC  //2^( 115 /128-1) - 2^(- 115 /128-1), 2^(- 115 /128-1)
> +        .quad 0x3FE573164B726DB6, 0x3FD11301D0125B51  //2^( 116 /128-1) - 2^(- 116 /128-1), 2^(- 116 /128-1)
> +        .quad 0x3FE5A892BCF6379B, 0x3FD0FB66AFFED31B  //2^( 117 /128-1) - 2^(- 117 /128-1), 2^(- 117 /128-1)
> +        .quad 0x3FE5DE38CE215725, 0x3FD0E3EC32D3D1A2  //2^( 118 /128-1) - 2^(- 118 /128-1), 2^(- 118 /128-1)
> +        .quad 0x3FE61408E60E2888, 0x3FD0CC922B7247F7  //2^( 119 /128-1) - 2^(- 119 /128-1), 2^(- 119 /128-1)
> +        .quad 0x3FE64A036C27CC52, 0x3FD0B5586CF9890F  //2^( 120 /128-1) - 2^(- 120 /128-1), 2^(- 120 /128-1)
> +        .quad 0x3FE68028C82AEE2F, 0x3FD09E3ECAC6F383  //2^( 121 /128-1) - 2^(- 121 /128-1), 2^(- 121 /128-1)
> +        .quad 0x3FE6B67962268C43, 0x3FD0874518759BC8  //2^( 122 /128-1) - 2^(- 122 /128-1), 2^(- 122 /128-1)
> +        .quad 0x3FE6ECF5A27CBF28, 0x3FD0706B29DDF6DE  //2^( 123 /128-1) - 2^(- 123 /128-1), 2^(- 123 /128-1)
> +        .quad 0x3FE7239DF1E38286, 0x3FD059B0D3158574  //2^( 124 /128-1) - 2^(- 124 /128-1), 2^(- 124 /128-1)
> +        .quad 0x3FE75A72B9657E51, 0x3FD04315E86E7F85  //2^( 125 /128-1) - 2^(- 125 /128-1), 2^(- 125 /128-1)
> +        .quad 0x3FE791746262D0A8, 0x3FD02C9A3E778061  //2^( 126 /128-1) - 2^(- 126 /128-1), 2^(- 126 /128-1)
> +        .quad 0x3FE7C8A35691D856, 0x3FD0163DA9FB3335 //2^( 127 /128-1) - 2^(- 127 /128-1), 2^(- 127 /128-1)
> +        .align 32
> +        .quad 0x42C8000000000000, 0x42C8000000000000, 0x42C8000000000000, 0x42C8000000000000 /* _dbShifter = 1.5 * 2^(52-k)*/
> +        .align 32
> +        .long 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99         /* _iDomainRange 0x40861d9ac12a3e85 =(1021*2^K-0.5)*log(2)/2^K -needed for quick exp*/
> +        .align 32
> +        .quad 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD /* _dPC2 */
> +        .align 32
> +        .quad 0x3FC55555555554AD, 0x3FC55555555554AD, 0x3FC55555555554AD, 0x3FC55555555554AD /* _dPC3 */
> +        .align 32
> +        .quad 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299 /* _dPC4 */
> +        .align 32
> +        .quad 0x3F8111115712F425, 0x3F8111115712F425, 0x3F8111115712F425, 0x3F8111115712F425 /* _dPC5 */
> +        .align 32
> +        .quad 0x000000000000007f, 0x000000000000007f, 0x000000000000007f, 0x000000000000007f /* _lIndexMask */
> +        .align 32
> +        .type	__svml_dsinh_data_internal,@object
> +        .size	__svml_dsinh_data_internal,.-__svml_dsinh_data_internal
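
The _iDomainRange compare in the code above is what routes lanes to that
scalar fallback: the constant 0x40861d99 sits just below the high word of
(1021*2^7 - 0.5)*log(2)/2^7, which is approximately 707.7, so any |x| at or
above roughly 707.7 trips the strictly-greater 32-bit compare, as do NaN
and Inf inputs (their high words are larger still), and those lanes are
reprocessed by the scalar sinh instead of the table/polynomial fast path.
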
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core-avx2.S
> new file mode 100644
> index 0000000000..d767d25080
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized sinh, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN8v_sinh _ZGVeN8v_sinh_avx2_wrapper
> +#include "../svml_d_sinh8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core.c
> new file mode 100644
> index 0000000000..427d07bce2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized sinh, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN8v_sinh
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN8v_sinh, __GI__ZGVeN8v_sinh, __redirect__ZGVeN8v_sinh)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core_avx512.S
> new file mode 100644
> index 0000000000..d057d6c7eb
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_sinh8_core_avx512.S
> @@ -0,0 +1,461 @@
> +/* Function sinh vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute sinh(x) as (exp(x)-exp(-x))/2,
> + *   where exp is calculated as
> + *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
> + *
> + *   Special cases:
> + *
> + *   sinh(NaN) = quiet NaN, and raise invalid exception
> + *   sinh(INF) = that INF
> + *   sinh(x)   = x for subnormals
> + *   sinh(x) overflows for big x and returns MAXLOG+log(2)
> + *
> + */
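
The polynomial coefficients used with this reduction (_dPC2 through _dPC5
in the tables above) are slightly tuned versions of the leading Taylor
coefficients 1/2, 1/6, 1/24 and 1/120.  A scalar sketch of the two short
polynomials evaluated on the reduced argument r (rounded coefficients,
illustrative only):

double
poly_sinh (double r)		/* ~ sinh(r); the dSinh_r comments */
{
  double r2 = r * r;
  return r + r * (r2 * (1.0 / 6 + r2 * (1.0 / 120)));
}

double
poly_coshm1 (double r)		/* ~ cosh(r) - 1; the dOut comments */
{
  double r2 = r * r;
  return r2 * (0.5 + r2 * (1.0 / 24));
}
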
> +
> +/* Offsets for data table __svml_dsinh_data_internal
> + */
> +#define _dbInvLn2                     	0
> +#define _dbLn2hi                      	64
> +#define _dbLn2lo                      	128
> +#define _dSign                        	192
> +#define _dbT                          	256
> +#define _dbShifter                    	2304
> +#define _iDomainRange                 	2368
> +#define _dPC2                         	2432
> +#define _dPC3                         	2496
> +#define _dPC4                         	2560
> +#define _dPC5                         	2624
> +#define _lIndexMask                   	2688
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN8v_sinh_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        lea       _dbT+8+__svml_dsinh_data_internal(%rip), %rax
> +        vmovaps   %zmm0, %zmm8
> +
> +/*  Abs argument  */
> +        vandpd    _dSign+__svml_dsinh_data_internal(%rip), %zmm8, %zmm7
> +        vmovups   _dbShifter+__svml_dsinh_data_internal(%rip), %zmm13
> +
> +/*
> + *  Load argument
> + * dM = x*2^K/log(2) + RShifter
> + */
> +        vmovups   _dbInvLn2+__svml_dsinh_data_internal(%rip), %zmm12
> +        vmovups   _dbLn2hi+__svml_dsinh_data_internal(%rip), %zmm14
> +        vmovups   _dPC5+__svml_dsinh_data_internal(%rip), %zmm6
> +
> +/* VLOAD_CONST( D, dPC[0],         TAB._dPC1 ); */
> +        vmovups   _dPC4+__svml_dsinh_data_internal(%rip), %zmm4
> +        vxorpd    %zmm8, %zmm7, %zmm5
> +        kxnorw    %k0, %k0, %k1
> +        kxnorw    %k0, %k0, %k2
> +        vfmadd213pd {rn-sae}, %zmm13, %zmm5, %zmm12
> +
> +/*
> + * Check for overflow/underflow
> + *
> + */
> +        vpsrlq    $32, %zmm5, %zmm9
> +
> +/*
> + *  R
> + * dN = dM - RShifter
> + */
> +        vsubpd    {rn-sae}, %zmm13, %zmm12, %zmm2
> +        vpmovqd   %zmm9, %ymm10
> +        vmovups   _dbLn2lo+__svml_dsinh_data_internal(%rip), %zmm9
> +
> +/* dR = dX - dN*Log2_hi/2^K */
> +        vfnmadd231pd {rn-sae}, %zmm14, %zmm2, %zmm5
> +
> +/*
> + * sinh(r) = r*((a1=1)+r^2*(a3+r^2*a5)) = r + r*(r^2*(a3+r^2*a5)) ....
> + * dSinh_r = (a3+r^2*a5)
> + */
> +        vmovups   _dPC3+__svml_dsinh_data_internal(%rip), %zmm14
> +
> +/* dR = (dX - dN*Log2_hi/2^K) - dN*Log2_lo/2^K */
> +        vfnmadd231pd {rn-sae}, %zmm9, %zmm2, %zmm5
> +        vpcmpgtd  _iDomainRange+__svml_dsinh_data_internal(%rip), %ymm10, %ymm11
> +        vmovmskps %ymm11, %edx
> +
> +/* dR2 = dR^2 */
> +        vmulpd    {rn-sae}, %zmm5, %zmm5, %zmm2
> +        vfmadd231pd {rn-sae}, %zmm2, %zmm6, %zmm14
> +
> +/*
> + *  Index and lookup
> + * j
> + */
> +        vpandq    _lIndexMask+__svml_dsinh_data_internal(%rip), %zmm12, %zmm15
> +        vpsllq    $4, %zmm15, %zmm1
> +        vpmovqd   %zmm1, %ymm0
> +        vpxord    %zmm11, %zmm11, %zmm11
> +        vpxord    %zmm10, %zmm10, %zmm10
> +        vgatherdpd (%rax,%ymm0), %zmm11{%k1}
> +        vgatherdpd -8(%rax,%ymm0), %zmm10{%k2}
> +
> +/* split j and N */
> +        vpxorq    %zmm15, %zmm12, %zmm3
> +
> +/*
> + *  G1,G2,G3: dTdif,dTn * 2^N,2^(-N)
> + * lM now is an EXP(2^N)
> + */
> +        vpsllq    $45, %zmm3, %zmm3
> +        vpaddq    %zmm3, %zmm10, %zmm1
> +
> +/* dTn*2^N */
> +        vpaddq    %zmm3, %zmm11, %zmm12
> +
> +/* dTn*2^-N */
> +        vpsubq    %zmm3, %zmm11, %zmm13
> +
> +/* dSinh_r = r^2*(a3+r^2*a5) */
> +        vmulpd    {rn-sae}, %zmm2, %zmm14, %zmm3
> +
> +/* dG2 = dTn*2^N - dTn*2^-N */
> +        vsubpd    {rn-sae}, %zmm13, %zmm12, %zmm15
> +
> +/* dG3 = dTn*2^N + dTn*2^-N */
> +        vaddpd    {rn-sae}, %zmm13, %zmm12, %zmm0
> +
> +/* dSinh_r = r + r*(r^2*(a3+r^2*a5)) */
> +        vfmadd213pd {rn-sae}, %zmm5, %zmm5, %zmm3
> +
> +/*
> + * poly(r) = (dG2+dG1)+dG3*sinh(dR)+dG1*sinh(dR)+(dG1+dG2)*dR2*(a2 +a4*dR2)
> + * dOut = (a2 +a4*dR2)
> + */
> +        vmovups   _dPC2+__svml_dsinh_data_internal(%rip), %zmm5
> +
> +/* dG1 += dG3 */
> +        vaddpd    {rn-sae}, %zmm0, %zmm1, %zmm6
> +        vfmadd231pd {rn-sae}, %zmm2, %zmm4, %zmm5
> +
> +/* dOut = dR2*(a2 +a4*dR2) */
> +        vmulpd    {rn-sae}, %zmm2, %zmm5, %zmm4
> +
> +/* dG2 += dG1 */
> +        vaddpd    {rn-sae}, %zmm15, %zmm1, %zmm2
> +
> +/* dOut = dG2*dR2*(a2 +a4*dR2) */
> +        vmulpd    {rn-sae}, %zmm2, %zmm4, %zmm4
> +
> +/* dOut = dG1*sinh(dR)+dG2*dR2*(a2 +a4*dR2) */
> +        vfmadd213pd {rn-sae}, %zmm4, %zmm6, %zmm3
> +
> +/* dOut = dG2 + dG1*sinh(dR)+dG2*dR2*(a2 +a4*dR2) */
> +        vaddpd    {rn-sae}, %zmm2, %zmm3, %zmm0
> +
> +/*  Ret H  */
> +        vorpd     %zmm0, %zmm7, %zmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm8
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm8, 64(%rsp)
> +        vmovups   %zmm0, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     64(%rsp,%r14,8), %xmm0
> +        call      sinh@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 128(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN8v_sinh_skx)
> +
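The special-input path above (repeated in every function of this series) is simple despite the cfi noise: %edx carries one bit per lane that failed the overflow/underflow check, the input and the vector result are spilled to the stack, and each flagged lane is recomputed with the scalar sinh before the result vector is reloaded.  In scalar C it is roughly the following, with illustrative names:

#include <math.h>

/* Scalar model of the SPECIAL_VALUES/RANGEMASK loop above: every lane
   whose bit is set in the range mask is recomputed with scalar sinh.
   Array and variable names are illustrative.  */
static void
special_values_sketch (const double in[8], double out[8], unsigned mask)
{
  for (int i = 0; i < 8; i++)
    if (mask & (1u << i))
      out[i] = sinh (in[i]);
}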
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_dsinh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(64)) VUINT32 _dbInvLn2[8][2];
> +        __declspec(align(64)) VUINT32 _dbLn2hi[8][2];
> +        __declspec(align(64)) VUINT32 _dbLn2lo[8][2];
> +        __declspec(align(64)) VUINT32 _dSign[8][2];                //0x8000000000000000
> +        __declspec(align(64)) VUINT32 _dbT[(1<<7)][2][2]; //precalc poly coeff
> +        __declspec(align(64)) VUINT32 _dbShifter[8][2];
> +        __declspec(align(64)) VUINT32 _iDomainRange[16][1];
> +        __declspec(align(64)) VUINT32 _dPC2[8][2];
> +        __declspec(align(64)) VUINT32 _dPC3[8][2];
> +        __declspec(align(64)) VUINT32 _dPC4[8][2];
> +        __declspec(align(64)) VUINT32 _dPC5[8][2];
> +        __declspec(align(64)) VUINT32 _lIndexMask[8][2];
> +} __svml_dsinh_data_internal;
> +#endif
> +__svml_dsinh_data_internal:
> +        .quad 0x3FF71547652B82FE, 0x3FF71547652B82FE, 0x3FF71547652B82FE, 0x3FF71547652B82FE, 0x3FF71547652B82FE, 0x3FF71547652B82FE, 0x3FF71547652B82FE, 0x3FF71547652B82FE /* _dbInvLn2 = 1/log(2) */
> +        .align 64
> +        .quad 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000, 0x3FE62E42FEFA0000 /* _dbLn2hi  = log(2) hi*/
> +        .align 64
> +        .quad 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A, 0x3D7CF79ABC9E3B3A /* _dbLn2lo  = log(2) lo*/
> +        .align 64
> +        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000 /* _dSign */
> +        //_dbT
> +        .align 64
> +        .quad 0x0000000000000000, 0x3FE0000000000000  //2^( 0 /128-1) - 2^(- 0 /128-1), 2^(- 0 /128-1)
> +        .quad 0x3F762E4A19BD1E74, 0x3FDFD3C22B8F71F1  //2^( 1 /128-1) - 2^(- 1 /128-1), 2^(- 1 /128-1)
> +        .quad 0x3F862E5F6A0DFD36, 0x3FDFA7C1819E90D8  //2^( 2 /128-1) - 2^(- 2 /128-1), 2^(- 2 /128-1)
> +        .quad 0x3F90A2E234040F5F, 0x3FDF7BFDAD9CBE14  //2^( 3 /128-1) - 2^(- 3 /128-1), 2^(- 3 /128-1)
> +        .quad 0x3F962EB4ABCC5A81, 0x3FDF50765B6E4540  //2^( 4 /128-1) - 2^(- 4 /128-1), 2^(- 4 /128-1)
> +        .quad 0x3F9BBAB1C5033244, 0x3FDF252B376BBA97  //2^( 5 /128-1) - 2^(- 5 /128-1), 2^(- 5 /128-1)
> +        .quad 0x3FA0A372144EEB45, 0x3FDEFA1BEE615A27  //2^( 6 /128-1) - 2^(- 6 /128-1), 2^(- 6 /128-1)
> +        .quad 0x3FA369AB3FFBF8B0, 0x3FDECF482D8E67F1  //2^( 7 /128-1) - 2^(- 7 /128-1), 2^(- 7 /128-1)
> +        .quad 0x3FA63009BA740A2A, 0x3FDEA4AFA2A490DA  //2^( 8 /128-1) - 2^(- 8 /128-1), 2^(- 8 /128-1)
> +        .quad 0x3FA8F692D8EA1B5A, 0x3FDE7A51FBC74C83  //2^( 9 /128-1) - 2^(- 9 /128-1), 2^(- 9 /128-1)
> +        .quad 0x3FABBD4BF0E31A6F, 0x3FDE502EE78B3FF6  //2^( 10 /128-1) - 2^(- 10 /128-1), 2^(- 10 /128-1)
> +        .quad 0x3FAE843A5840286A, 0x3FDE264614F5A129  //2^( 11 /128-1) - 2^(- 11 /128-1), 2^(- 11 /128-1)
> +        .quad 0x3FB0A5B1B2A46D0A, 0x3FDDFC97337B9B5F  //2^( 12 /128-1) - 2^(- 12 /128-1), 2^(- 12 /128-1)
> +        .quad 0x3FB20966375ABCDF, 0x3FDDD321F301B460  //2^( 13 /128-1) - 2^(- 13 /128-1), 2^(- 13 /128-1)
> +        .quad 0x3FB36D3D65DCA4E8, 0x3FDDA9E603DB3285  //2^( 14 /128-1) - 2^(- 14 /128-1), 2^(- 14 /128-1)
> +        .quad 0x3FB4D139EA06642A, 0x3FDD80E316C98398  //2^( 15 /128-1) - 2^(- 15 /128-1), 2^(- 15 /128-1)
> +        .quad 0x3FB6355E6FFBF9BA, 0x3FDD5818DCFBA487  //2^( 16 /128-1) - 2^(- 16 /128-1), 2^(- 16 /128-1)
> +        .quad 0x3FB799ADA42E4788, 0x3FDD2F87080D89F2  //2^( 17 /128-1) - 2^(- 17 /128-1), 2^(- 17 /128-1)
> +        .quad 0x3FB8FE2A336035BC, 0x3FDD072D4A07897C  //2^( 18 /128-1) - 2^(- 18 /128-1), 2^(- 18 /128-1)
> +        .quad 0x3FBA62D6CAABD6B6, 0x3FDCDF0B555DC3FA  //2^( 19 /128-1) - 2^(- 19 /128-1), 2^(- 19 /128-1)
> +        .quad 0x3FBBC7B617878BAF, 0x3FDCB720DCEF9069  //2^( 20 /128-1) - 2^(- 20 /128-1), 2^(- 20 /128-1)
> +        .quad 0x3FBD2CCAC7CB2A11, 0x3FDC8F6D9406E7B5  //2^( 21 /128-1) - 2^(- 21 /128-1), 2^(- 21 /128-1)
> +        .quad 0x3FBE921789B52185, 0x3FDC67F12E57D14B  //2^( 22 /128-1) - 2^(- 22 /128-1), 2^(- 22 /128-1)
> +        .quad 0x3FBFF79F0BEFA2C7, 0x3FDC40AB5FFFD07A  //2^( 23 /128-1) - 2^(- 23 /128-1), 2^(- 23 /128-1)
> +        .quad 0x3FC0AEB1FECAE3A9, 0x3FDC199BDD85529C  //2^( 24 /128-1) - 2^(- 24 /128-1), 2^(- 24 /128-1)
> +        .quad 0x3FC161B4871C5CEC, 0x3FDBF2C25BD71E09  //2^( 25 /128-1) - 2^(- 25 /128-1), 2^(- 25 /128-1)
> +        .quad 0x3FC214D876F26FD0, 0x3FDBCC1E904BC1D2  //2^( 26 /128-1) - 2^(- 26 /128-1), 2^(- 26 /128-1)
> +        .quad 0x3FC2C81F2693816F, 0x3FDBA5B030A1064A  //2^( 27 /128-1) - 2^(- 27 /128-1), 2^(- 27 /128-1)
> +        .quad 0x3FC37B89EE88BEF7, 0x3FDB7F76F2FB5E47  //2^( 28 /128-1) - 2^(- 28 /128-1), 2^(- 28 /128-1)
> +        .quad 0x3FC42F1A27A0B3CD, 0x3FDB59728DE5593A  //2^( 29 /128-1) - 2^(- 29 /128-1), 2^(- 29 /128-1)
> +        .quad 0x3FC4E2D12AF1E037, 0x3FDB33A2B84F15FB  //2^( 30 /128-1) - 2^(- 30 /128-1), 2^(- 30 /128-1)
> +        .quad 0x3FC596B051DD508D, 0x3FDB0E07298DB666  //2^( 31 /128-1) - 2^(- 31 /128-1), 2^(- 31 /128-1)
> +        .quad 0x3FC64AB8F61134FA, 0x3FDAE89F995AD3AD  //2^( 32 /128-1) - 2^(- 32 /128-1), 2^(- 32 /128-1)
> +        .quad 0x3FC6FEEC718B79D1, 0x3FDAC36BBFD3F37A  //2^( 33 /128-1) - 2^(- 33 /128-1), 2^(- 33 /128-1)
> +        .quad 0x3FC7B34C1E9C607F, 0x3FDA9E6B5579FDBF  //2^( 34 /128-1) - 2^(- 34 /128-1), 2^(- 34 /128-1)
> +        .quad 0x3FC867D957E91912, 0x3FDA799E1330B358  //2^( 35 /128-1) - 2^(- 35 /128-1), 2^(- 35 /128-1)
> +        .quad 0x3FC91C95786E5C72, 0x3FDA5503B23E255D  //2^( 36 /128-1) - 2^(- 36 /128-1), 2^(- 36 /128-1)
> +        .quad 0x3FC9D181DB83072F, 0x3FDA309BEC4A2D33  //2^( 37 /128-1) - 2^(- 37 /128-1), 2^(- 37 /128-1)
> +        .quad 0x3FCA869FDCDAB512, 0x3FDA0C667B5DE565  //2^( 38 /128-1) - 2^(- 38 /128-1), 2^(- 38 /128-1)
> +        .quad 0x3FCB3BF0D8885D4C, 0x3FD9E86319E32323  //2^( 39 /128-1) - 2^(- 39 /128-1), 2^(- 39 /128-1)
> +        .quad 0x3FCBF1762B00EF69, 0x3FD9C49182A3F090  //2^( 40 /128-1) - 2^(- 40 /128-1), 2^(- 40 /128-1)
> +        .quad 0x3FCCA731311DF0FB, 0x3FD9A0F170CA07BA  //2^( 41 /128-1) - 2^(- 41 /128-1), 2^(- 41 /128-1)
> +        .quad 0x3FCD5D2348201C09, 0x3FD97D829FDE4E50  //2^( 42 /128-1) - 2^(- 42 /128-1), 2^(- 42 /128-1)
> +        .quad 0x3FCE134DCDB1FE3E, 0x3FD95A44CBC8520F  //2^( 43 /128-1) - 2^(- 43 /128-1), 2^(- 43 /128-1)
> +        .quad 0x3FCEC9B21FEA98EA, 0x3FD93737B0CDC5E5  //2^( 44 /128-1) - 2^(- 44 /128-1), 2^(- 44 /128-1)
> +        .quad 0x3FCF80519D5001D3, 0x3FD9145B0B91FFC6  //2^( 45 /128-1) - 2^(- 45 /128-1), 2^(- 45 /128-1)
> +        .quad 0x3FD01B96D26D026A, 0x3FD8F1AE99157736  //2^( 46 /128-1) - 2^(- 46 /128-1), 2^(- 46 /128-1)
> +        .quad 0x3FD07723CAFA6331, 0x3FD8CF3216B5448C  //2^( 47 /128-1) - 2^(- 47 /128-1), 2^(- 47 /128-1)
> +        .quad 0x3FD0D2D06841B373, 0x3FD8ACE5422AA0DB  //2^( 48 /128-1) - 2^(- 48 /128-1), 2^(- 48 /128-1)
> +        .quad 0x3FD12E9D5A715381, 0x3FD88AC7D98A6699  //2^( 49 /128-1) - 2^(- 49 /128-1), 2^(- 49 /128-1)
> +        .quad 0x3FD18A8B51F5C661, 0x3FD868D99B4492ED  //2^( 50 /128-1) - 2^(- 50 /128-1), 2^(- 50 /128-1)
> +        .quad 0x3FD1E69AFF7B04D7, 0x3FD8471A4623C7AD  //2^( 51 /128-1) - 2^(- 51 /128-1), 2^(- 51 /128-1)
> +        .quad 0x3FD242CD13EDD0F1, 0x3FD82589994CCE13  //2^( 52 /128-1) - 2^(- 52 /128-1), 2^(- 52 /128-1)
> +        .quad 0x3FD29F22407D0A0C, 0x3FD80427543E1A12  //2^( 53 /128-1) - 2^(- 53 /128-1), 2^(- 53 /128-1)
> +        .quad 0x3FD2FB9B369B0153, 0x3FD7E2F336CF4E62  //2^( 54 /128-1) - 2^(- 54 /128-1), 2^(- 54 /128-1)
> +        .quad 0x3FD35838A7FECEC8, 0x3FD7C1ED0130C132  //2^( 55 /128-1) - 2^(- 55 /128-1), 2^(- 55 /128-1)
> +        .quad 0x3FD3B4FB46A5A6CC, 0x3FD7A11473EB0187  //2^( 56 /128-1) - 2^(- 56 /128-1), 2^(- 56 /128-1)
> +        .quad 0x3FD411E3C4D4302F, 0x3FD780694FDE5D3F  //2^( 57 /128-1) - 2^(- 57 /128-1), 2^(- 57 /128-1)
> +        .quad 0x3FD46EF2D517DAC8, 0x3FD75FEB564267C9  //2^( 58 /128-1) - 2^(- 58 /128-1), 2^(- 58 /128-1)
> +        .quad 0x3FD4CC292A48369E, 0x3FD73F9A48A58174  //2^( 59 /128-1) - 2^(- 59 /128-1), 2^(- 59 /128-1)
> +        .quad 0x3FD5298777884B96, 0x3FD71F75E8EC5F74  //2^( 60 /128-1) - 2^(- 60 /128-1), 2^(- 60 /128-1)
> +        .quad 0x3FD5870E7047F1BC, 0x3FD6FF7DF9519484  //2^( 61 /128-1) - 2^(- 61 /128-1), 2^(- 61 /128-1)
> +        .quad 0x3FD5E4BEC8452A1A, 0x3FD6DFB23C651A2F  //2^( 62 /128-1) - 2^(- 62 /128-1), 2^(- 62 /128-1)
> +        .quad 0x3FD64299338D7827, 0x3FD6C012750BDABF  //2^( 63 /128-1) - 2^(- 63 /128-1), 2^(- 63 /128-1)
> +        .quad 0x3FD6A09E667F3BCD, 0x3FD6A09E667F3BCD  //2^( 64 /128-1) - 2^(- 64 /128-1), 2^(- 64 /128-1)
> +        .quad 0x3FD6FECF15CB0C0B, 0x3FD68155D44CA973  //2^( 65 /128-1) - 2^(- 65 /128-1), 2^(- 65 /128-1)
> +        .quad 0x3FD75D2BF6751239, 0x3FD6623882552225  //2^( 66 /128-1) - 2^(- 66 /128-1), 2^(- 66 /128-1)
> +        .quad 0x3FD7BBB5BDD665E8, 0x3FD6434634CCC320  //2^( 67 /128-1) - 2^(- 67 /128-1), 2^(- 67 /128-1)
> +        .quad 0x3FD81A6D219E6963, 0x3FD6247EB03A5585  //2^( 68 /128-1) - 2^(- 68 /128-1), 2^(- 68 /128-1)
> +        .quad 0x3FD87952D7D426DF, 0x3FD605E1B976DC09  //2^( 69 /128-1) - 2^(- 69 /128-1), 2^(- 69 /128-1)
> +        .quad 0x3FD8D86796D7AE49, 0x3FD5E76F15AD2148  //2^( 70 /128-1) - 2^(- 70 /128-1), 2^(- 70 /128-1)
> +        .quad 0x3FD937AC156373C8, 0x3FD5C9268A5946B7  //2^( 71 /128-1) - 2^(- 71 /128-1), 2^(- 71 /128-1)
> +        .quad 0x3FD997210A8DAEE4, 0x3FD5AB07DD485429  //2^( 72 /128-1) - 2^(- 72 /128-1), 2^(- 72 /128-1)
> +        .quad 0x3FD9F6C72DC9BA68, 0x3FD58D12D497C7FD  //2^( 73 /128-1) - 2^(- 73 /128-1), 2^(- 73 /128-1)
> +        .quad 0x3FDA569F36E974EA, 0x3FD56F4736B527DA  //2^( 74 /128-1) - 2^(- 74 /128-1), 2^(- 74 /128-1)
> +        .quad 0x3FDAB6A9DE1EA215, 0x3FD551A4CA5D920F  //2^( 75 /128-1) - 2^(- 75 /128-1), 2^(- 75 /128-1)
> +        .quad 0x3FDB16E7DBFC4CA3, 0x3FD5342B569D4F82  //2^( 76 /128-1) - 2^(- 76 /128-1), 2^(- 76 /128-1)
> +        .quad 0x3FDB7759E9782918, 0x3FD516DAA2CF6642  //2^( 77 /128-1) - 2^(- 77 /128-1), 2^(- 77 /128-1)
> +        .quad 0x3FDBD800BFEBF932, 0x3FD4F9B2769D2CA7  //2^( 78 /128-1) - 2^(- 78 /128-1), 2^(- 78 /128-1)
> +        .quad 0x3FDC38DD1916F025, 0x3FD4DCB299FDDD0D  //2^( 79 /128-1) - 2^(- 79 /128-1), 2^(- 79 /128-1)
> +        .quad 0x3FDC99EFAF1F1790, 0x3FD4BFDAD5362A27  //2^( 80 /128-1) - 2^(- 80 /128-1), 2^(- 80 /128-1)
> +        .quad 0x3FDCFB393C92B539, 0x3FD4A32AF0D7D3DE  //2^( 81 /128-1) - 2^(- 81 /128-1), 2^(- 81 /128-1)
> +        .quad 0x3FDD5CBA7C69B19C, 0x3FD486A2B5C13CD0  //2^( 82 /128-1) - 2^(- 82 /128-1), 2^(- 82 /128-1)
> +        .quad 0x3FDDBE742A06FF34, 0x3FD46A41ED1D0057  //2^( 83 /128-1) - 2^(- 83 /128-1), 2^(- 83 /128-1)
> +        .quad 0x3FDE2067013A029D, 0x3FD44E086061892D  //2^( 84 /128-1) - 2^(- 84 /128-1), 2^(- 84 /128-1)
> +        .quad 0x3FDE8293BE3FFB87, 0x3FD431F5D950A897  //2^( 85 /128-1) - 2^(- 85 /128-1), 2^(- 85 /128-1)
> +        .quad 0x3FDEE4FB1DC56E75, 0x3FD4160A21F72E2A  //2^( 86 /128-1) - 2^(- 86 /128-1), 2^(- 86 /128-1)
> +        .quad 0x3FDF479DDCE78F58, 0x3FD3FA4504AC801C  //2^( 87 /128-1) - 2^(- 87 /128-1), 2^(- 87 /128-1)
> +        .quad 0x3FDFAA7CB935ACFE, 0x3FD3DEA64C123422  //2^( 88 /128-1) - 2^(- 88 /128-1), 2^(- 88 /128-1)
> +        .quad 0x3FE006CC38594EB1, 0x3FD3C32DC313A8E5  //2^( 89 /128-1) - 2^(- 89 /128-1), 2^(- 89 /128-1)
> +        .quad 0x3FE03878E0EB1569, 0x3FD3A7DB34E59FF7  //2^( 90 /128-1) - 2^(- 90 /128-1), 2^(- 90 /128-1)
> +        .quad 0x3FE06A44B5C74101, 0x3FD38CAE6D05D866  //2^( 91 /128-1) - 2^(- 91 /128-1), 2^(- 91 /128-1)
> +        .quad 0x3FE09C3016A0D077, 0x3FD371A7373AA9CB  //2^( 92 /128-1) - 2^(- 92 /128-1), 2^(- 92 /128-1)
> +        .quad 0x3FE0CE3B63676360, 0x3FD356C55F929FF1  //2^( 93 /128-1) - 2^(- 93 /128-1), 2^(- 93 /128-1)
> +        .quad 0x3FE10066FC47F240, 0x3FD33C08B26416FF  //2^( 94 /128-1) - 2^(- 94 /128-1), 2^(- 94 /128-1)
> +        .quad 0x3FE132B341AD8761, 0x3FD32170FC4CD831  //2^( 95 /128-1) - 2^(- 95 /128-1), 2^(- 95 /128-1)
> +        .quad 0x3FE165209441F823, 0x3FD306FE0A31B715  //2^( 96 /128-1) - 2^(- 96 /128-1), 2^(- 96 /128-1)
> +        .quad 0x3FE197AF54EE9EBB, 0x3FD2ECAFA93E2F56  //2^( 97 /128-1) - 2^(- 97 /128-1), 2^(- 97 /128-1)
> +        .quad 0x3FE1CA5FE4DD1475, 0x3FD2D285A6E4030B  //2^( 98 /128-1) - 2^(- 98 /128-1), 2^(- 98 /128-1)
> +        .quad 0x3FE1FD32A577EC72, 0x3FD2B87FD0DAD990  //2^( 99 /128-1) - 2^(- 99 /128-1), 2^(- 99 /128-1)
> +        .quad 0x3FE23027F86B6ED6, 0x3FD29E9DF51FDEE1  //2^( 100 /128-1) - 2^(- 100 /128-1), 2^(- 100 /128-1)
> +        .quad 0x3FE263403FA65489, 0x3FD284DFE1F56381  //2^( 101 /128-1) - 2^(- 101 /128-1), 2^(- 101 /128-1)
> +        .quad 0x3FE2967BDD5A8364, 0x3FD26B4565E27CDD  //2^( 102 /128-1) - 2^(- 102 /128-1), 2^(- 102 /128-1)
> +        .quad 0x3FE2C9DB33FDCAE9, 0x3FD251CE4FB2A63F  //2^( 103 /128-1) - 2^(- 103 /128-1), 2^(- 103 /128-1)
> +        .quad 0x3FE2FD5EA64AA180, 0x3FD2387A6E756238  //2^( 104 /128-1) - 2^(- 104 /128-1), 2^(- 104 /128-1)
> +        .quad 0x3FE331069740E22F, 0x3FD21F49917DDC96  //2^( 105 /128-1) - 2^(- 105 /128-1), 2^(- 105 /128-1)
> +        .quad 0x3FE364D36A268AE0, 0x3FD2063B88628CD6  //2^( 106 /128-1) - 2^(- 106 /128-1), 2^(- 106 /128-1)
> +        .quad 0x3FE398C582887B27, 0x3FD1ED5022FCD91D  //2^( 107 /128-1) - 2^(- 107 /128-1), 2^(- 107 /128-1)
> +        .quad 0x3FE3CCDD443B3394, 0x3FD1D4873168B9AA  //2^( 108 /128-1) - 2^(- 108 /128-1), 2^(- 108 /128-1)
> +        .quad 0x3FE4011B135B9590, 0x3FD1BBE084045CD4  //2^( 109 /128-1) - 2^(- 109 /128-1), 2^(- 109 /128-1)
> +        .quad 0x3FE4357F544FA3C1, 0x3FD1A35BEB6FCB75  //2^( 110 /128-1) - 2^(- 110 /128-1), 2^(- 110 /128-1)
> +        .quad 0x3FE46A0A6BC742FD, 0x3FD18AF9388C8DEA  //2^( 111 /128-1) - 2^(- 111 /128-1), 2^(- 111 /128-1)
> +        .quad 0x3FE49EBCBEBCFBCA, 0x3FD172B83C7D517B  //2^( 112 /128-1) - 2^(- 112 /128-1), 2^(- 112 /128-1)
> +        .quad 0x3FE4D396B276BC6F, 0x3FD15A98C8A58E51  //2^( 113 /128-1) - 2^(- 113 /128-1), 2^(- 113 /128-1)
> +        .quad 0x3FE50898AC869B96, 0x3FD1429AAEA92DE0  //2^( 114 /128-1) - 2^(- 114 /128-1), 2^(- 114 /128-1)
> +        .quad 0x3FE53DC312CB9B7A, 0x3FD12ABDC06C31CC  //2^( 115 /128-1) - 2^(- 115 /128-1), 2^(- 115 /128-1)
> +        .quad 0x3FE573164B726DB6, 0x3FD11301D0125B51  //2^( 116 /128-1) - 2^(- 116 /128-1), 2^(- 116 /128-1)
> +        .quad 0x3FE5A892BCF6379B, 0x3FD0FB66AFFED31B  //2^( 117 /128-1) - 2^(- 117 /128-1), 2^(- 117 /128-1)
> +        .quad 0x3FE5DE38CE215725, 0x3FD0E3EC32D3D1A2  //2^( 118 /128-1) - 2^(- 118 /128-1), 2^(- 118 /128-1)
> +        .quad 0x3FE61408E60E2888, 0x3FD0CC922B7247F7  //2^( 119 /128-1) - 2^(- 119 /128-1), 2^(- 119 /128-1)
> +        .quad 0x3FE64A036C27CC52, 0x3FD0B5586CF9890F  //2^( 120 /128-1) - 2^(- 120 /128-1), 2^(- 120 /128-1)
> +        .quad 0x3FE68028C82AEE2F, 0x3FD09E3ECAC6F383  //2^( 121 /128-1) - 2^(- 121 /128-1), 2^(- 121 /128-1)
> +        .quad 0x3FE6B67962268C43, 0x3FD0874518759BC8  //2^( 122 /128-1) - 2^(- 122 /128-1), 2^(- 122 /128-1)
> +        .quad 0x3FE6ECF5A27CBF28, 0x3FD0706B29DDF6DE  //2^( 123 /128-1) - 2^(- 123 /128-1), 2^(- 123 /128-1)
> +        .quad 0x3FE7239DF1E38286, 0x3FD059B0D3158574  //2^( 124 /128-1) - 2^(- 124 /128-1), 2^(- 124 /128-1)
> +        .quad 0x3FE75A72B9657E51, 0x3FD04315E86E7F85  //2^( 125 /128-1) - 2^(- 125 /128-1), 2^(- 125 /128-1)
> +        .quad 0x3FE791746262D0A8, 0x3FD02C9A3E778061  //2^( 126 /128-1) - 2^(- 126 /128-1), 2^(- 126 /128-1)
> +        .quad 0x3FE7C8A35691D856, 0x3FD0163DA9FB3335 //2^( 127 /128-1) - 2^(- 127 /128-1), 2^(- 127 /128-1)
> +        .align 64
> +        .quad 0x42C8000000000000, 0x42C8000000000000, 0x42C8000000000000, 0x42C8000000000000, 0x42C8000000000000, 0x42C8000000000000, 0x42C8000000000000, 0x42C8000000000000 /* _dbShifter = 1.5 * 2^(52-k)*/
> +        .align 64
> +        .long 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99, 0x40861d99         /* _iDomainRange 0x40861d9ac12a3e85 =(1021*2^K-0.5)*log(2)/2^K -needed for quick exp*/
> +        .align 64
> +        .quad 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD, 0x3FDFFFFFFFFFFDBD /* _dPC2 */
> +        .align 64
> +        .quad 0x3FC55555555554AD, 0x3FC55555555554AD, 0x3FC55555555554AD, 0x3FC55555555554AD, 0x3FC55555555554AD, 0x3FC55555555554AD, 0x3FC55555555554AD, 0x3FC55555555554AD /* _dPC3 */
> +        .align 64
> +        .quad 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299, 0x3FA55555CF16D299 /* _dPC4 */
> +        .align 64
> +        .quad 0x3F8111115712F425, 0x3F8111115712F425, 0x3F8111115712F425, 0x3F8111115712F425, 0x3F8111115712F425, 0x3F8111115712F425, 0x3F8111115712F425, 0x3F8111115712F425 /* _dPC5 */
> +        .align 64
> +        .quad 0x000000000000007f, 0x000000000000007f, 0x000000000000007f, 0x000000000000007f, 0x000000000000007f, 0x000000000000007f, 0x000000000000007f, 0x000000000000007f /* _lIndexMask */
> +        .align 64
> +        .type	__svml_dsinh_data_internal,@object
> +        .size	__svml_dsinh_data_internal,.-__svml_dsinh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core-avx2.S
> new file mode 100644
> index 0000000000..06525b7b37
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized sinhf, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN16v_sinhf _ZGVeN16v_sinhf_avx2_wrapper
> +#include "../svml_s_sinhf16_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core.c
> new file mode 100644
> index 0000000000..6a954caa37
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized sinhf, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN16v_sinhf
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN16v_sinhf, __GI__ZGVeN16v_sinhf,
> +	       __redirect__ZGVeN16v_sinhf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
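Callers normally reach these ifunc-selected entry points through compiler auto-vectorization rather than by name.  Assuming the math.h SIMD declarations added elsewhere in this series are in place, a loop like the one below, built with GCC using -O2 -ffast-math -fopenmp-simd and a suitable -march, may be vectorized into calls to _ZGVeN16v_sinhf (or one of the narrower variants); the flags and the loop are illustrative of typical libmvec usage, not part of this patch:

#include <math.h>

/* Illustrative caller: with vectorization enabled the compiler may
   turn this loop into calls to the _ZGV*v_sinhf symbols added by
   this patch.  */
void
apply_sinhf (float *restrict out, const float *restrict in, int n)
{
#pragma omp simd
  for (int i = 0; i < n; i++)
    out[i] = sinhf (in[i]);
}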
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core_avx512.S
> new file mode 100644
> index 0000000000..1119c00259
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf16_core_avx512.S
> @@ -0,0 +1,318 @@
> +/* Function sinhf vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute sinh(x) as (exp(x)-exp(-x))/2,
> + *   where exp is calculated as
> + *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
> + *
> + *   Special cases:
> + *
> + *   sinh(NaN) = quiet NaN, and raise invalid exception
> + *   sinh(+/-INF) = +/-INF
> + *   sinh(x)   = x for subnormals
> + *   sinh(x) overflows for |x| above about MAXLOG + log(2)
> + *
> + */
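In scalar terms the single-precision kernel below reduces to the sketch that follows.  It substitutes plain Taylor coefficients and ldexpf/copysignf for the tuned _sPC* constants, the _iHalf exponent trick and the _sSign mask, so it approximates the method rather than reproducing the implementation; inputs beyond the _iDomainRange cutoff take the scalar sinhf fallback and are ignored here:

#include <math.h>

/* Scalar sketch of the single-precision scheme used below (k = 0, no
   table): with |x| = N*ln2 + r, sinh(|x|) = sG2*cosh(r) + sG1*sinh(r),
   sG1 = 2^(N-1) + 2^(-N-1), sG2 = 2^(N-1) - 2^(-N-1).  Plain Taylor
   coefficients stand in for the tuned _sPC2.._sPC6 constants.  */
static float
sinhf_sketch (float x)
{
  const double ln2 = 0x1.62e42fefa39efp-1;
  float ax = fabsf (x);
  float n  = nearbyintf ((float) (ax / ln2));
  float r  = (float) (ax - n * ln2);     /* the asm uses a hi/lo split of ln2 */
  float r2 = r * r;
  float g_hi = ldexpf (0.5f, (int) n);   /* 2^(N-1)  */
  float g_lo = ldexpf (0.5f, -(int) n);  /* 2^(-N-1) */
  float sinh_r = r + r * (r2 * (1.0f / 6 + r2 * (1.0f / 120)));
  float even   = r2 * (0.5f + r2 * (1.0f / 24 + r2 * (1.0f / 720)));
  float res = (g_hi - g_lo) + (g_hi + g_lo) * sinh_r + (g_hi - g_lo) * even;
  return copysignf (res, x);             /* sign restored via the _sSign mask */
}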
> +
> +/* Offsets for data table __svml_ssinh_data_internal
> + */
> +#define _sInvLn2                      	0
> +#define _sLn2hi                       	64
> +#define _sLn2lo                       	128
> +#define _sSign                        	192
> +#define _sShifter                     	256
> +#define _iDomainRange                 	320
> +#define _sPC1                         	384
> +#define _sPC2                         	448
> +#define _sPC3                         	512
> +#define _sPC4                         	576
> +#define _sPC5                         	640
> +#define _sPC6                         	704
> +#define _iHalf                        	768
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN16v_sinhf_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovaps   %zmm0, %zmm5
> +
> +/*
> + *  Implementation
> + *  Abs argument
> + */
> +        vandps    _sSign+__svml_ssinh_data_internal(%rip), %zmm5, %zmm4
> +
> +/*
> + * Check for overflow/underflow
> + * (is this faster than a GE compare?)
> + */
> +        vpternlogd $255, %zmm6, %zmm6, %zmm6
> +        vmovups   _sShifter+__svml_ssinh_data_internal(%rip), %zmm7
> +
> +/*
> + *  Load argument
> + * dM = x/log(2) + RShifter
> + */
> +        vmovups   _sInvLn2+__svml_ssinh_data_internal(%rip), %zmm11
> +        vmovups   _sLn2hi+__svml_ssinh_data_internal(%rip), %zmm8
> +        vmovups   _sLn2lo+__svml_ssinh_data_internal(%rip), %zmm10
> +        vmovups   _iHalf+__svml_ssinh_data_internal(%rip), %zmm12
> +        vmovups   _sPC5+__svml_ssinh_data_internal(%rip), %zmm0
> +        vmovups   _sPC6+__svml_ssinh_data_internal(%rip), %zmm3
> +
> +/* x^2 */
> +        vmovups   _sPC2+__svml_ssinh_data_internal(%rip), %zmm2
> +        vxorps    %zmm5, %zmm4, %zmm1
> +        vfmadd213ps {rn-sae}, %zmm7, %zmm1, %zmm11
> +        vpcmpd    $2, _iDomainRange+__svml_ssinh_data_internal(%rip), %zmm1, %k1
> +
> +/*
> + *  G1,G2 2^N,2^(-N)
> + * iM now is an EXP(2^N)
> + */
> +        vpslld    $23, %zmm11, %zmm13
> +
> +/*
> + *  R
> + * sN = sM - RShifter
> + */
> +        vsubps    {rn-sae}, %zmm7, %zmm11, %zmm9
> +        vpaddd    %zmm13, %zmm12, %zmm14
> +        vpsubd    %zmm13, %zmm12, %zmm15
> +
> +/* sG1 = 2^(N-1)+2^(-N-1) */
> +        vaddps    {rn-sae}, %zmm15, %zmm14, %zmm7
> +        vpandnd   %zmm1, %zmm1, %zmm6{%k1}
> +
> +/* sR = sX - sN*Log2_hi */
> +        vfnmadd231ps {rn-sae}, %zmm8, %zmm9, %zmm1
> +        vptestmd  %zmm6, %zmm6, %k0
> +
> +/* sG2 = 2^(N-1)-2^(-N-1) */
> +        vsubps    {rn-sae}, %zmm15, %zmm14, %zmm8
> +
> +/* sR = (sX - sN*Log2_hi) - sN*Log2_lo */
> +        vfnmadd231ps {rn-sae}, %zmm10, %zmm9, %zmm1
> +
> +/*
> + * sinh(r) = r*((a1=1)+r^2*(a3+r^2*a5)) = r + r*(r^2*(a3+r^2*a5))
> + * sSinh_r = (a3+r^2*a5)
> + */
> +        vmovups   _sPC3+__svml_ssinh_data_internal(%rip), %zmm14
> +        kmovw     %k0, %edx
> +
> +/* sR2 = sR^2 */
> +        vmulps    {rn-sae}, %zmm1, %zmm1, %zmm6
> +        vfmadd231ps {rn-sae}, %zmm6, %zmm0, %zmm14
> +
> +/* sSinh_r = r^2*(a3+r^2*a5) */
> +        vmulps    {rn-sae}, %zmm6, %zmm14, %zmm0
> +
> +/* sSinh_r = r + r*(r^2*(a3+r^2*a5)) */
> +        vfmadd213ps {rn-sae}, %zmm1, %zmm1, %zmm0
> +
> +/*
> + * sinh(X) = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2))
> + * sOut = (a4 +a6*sR2)
> + */
> +        vmovups   _sPC4+__svml_ssinh_data_internal(%rip), %zmm1
> +        vfmadd231ps {rn-sae}, %zmm6, %zmm3, %zmm1
> +
> +/* sOut = a2+sR2*(a4+a6*sR2) */
> +        vfmadd213ps {rn-sae}, %zmm2, %zmm6, %zmm1
> +
> +/* sOut = sR2*(a2+sR2*(a4+a6*sR2)) */
> +        vmulps    {rn-sae}, %zmm6, %zmm1, %zmm2
> +
> +/* sOut = sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
> +        vmulps    {rn-sae}, %zmm8, %zmm2, %zmm3
> +
> +/* sOut = sG1*sinh(dR)+sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
> +        vfmadd213ps {rn-sae}, %zmm3, %zmm0, %zmm7
> +
> +/* sOut = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
> +        vaddps    {rn-sae}, %zmm8, %zmm7, %zmm9
> +
> +/*  Ret H  */
> +        vorps     %zmm9, %zmm4, %zmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm5
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm5, 64(%rsp)
> +        vmovups   %zmm0, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $16, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     64(%rsp,%r14,4), %xmm0
> +        call      sinhf@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 128(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN16v_sinhf_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_ssinh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(64)) VUINT32 _sInvLn2[16][1];
> +        __declspec(align(64)) VUINT32 _sLn2hi[16][1];
> +        __declspec(align(64)) VUINT32 _sLn2lo[16][1];
> +        __declspec(align(64)) VUINT32 _sSign[16][1];
> +        __declspec(align(64)) VUINT32 _sShifter[16][1];
> +        __declspec(align(64)) VUINT32 _iDomainRange[16][1];
> +        __declspec(align(64)) VUINT32 _sPC1[16][1];
> +        __declspec(align(64)) VUINT32 _sPC2[16][1];
> +        __declspec(align(64)) VUINT32 _sPC3[16][1];
> +        __declspec(align(64)) VUINT32 _sPC4[16][1];
> +        __declspec(align(64)) VUINT32 _sPC5[16][1];
> +        __declspec(align(64)) VUINT32 _sPC6[16][1];
> +        __declspec(align(64)) VUINT32 _iHalf[16][1];
> +} __svml_ssinh_data_internal;
> +#endif
> +__svml_ssinh_data_internal:
> +        .long 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B           /* _sInvLn2  */  //k=0
> +        .align 64
> +        .long 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000           /* _sLn2hi   */
> +        .align 64
> +        .long 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4           /* _sLn2lo   */
> +        .align 64
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000           /* _sSign    */
> +        .align 64
> +        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000           /* _sShifter */
> +        .align 64
> +        .long 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E           /* _iDomainRange */
> +        .align 64
> +        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000         /* _sPC1=1  */
> +        .align 64
> +        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000         /* _sPC2  */
> +        .align 64
> +        .long 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57         /* _sPC3  */
> +        .align 64
> +        .long 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72         /* _sPC4  */
> +        .align 64
> +        .long 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461         /* _sPC5  */
> +        .align 64
> +        .long 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3         /* _sPC6  */
> +        // Integer constants
> +        .align 64
> +        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000 /* _iHalf*/
> +        .align 64
> +        .type	__svml_ssinh_data_internal,@object
> +        .size	__svml_ssinh_data_internal,.-__svml_ssinh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core-sse2.S
> new file mode 100644
> index 0000000000..1b31095fe1
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized sinhf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN4v_sinhf _ZGVbN4v_sinhf_sse2
> +#include "../svml_s_sinhf4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core.c
> new file mode 100644
> index 0000000000..9d4297c2c9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized sinhf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN4v_sinhf
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN4v_sinhf, __GI__ZGVbN4v_sinhf,
> +	       __redirect__ZGVbN4v_sinhf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core_sse4.S
> new file mode 100644
> index 0000000000..82d6f55d33
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf4_core_sse4.S
> @@ -0,0 +1,308 @@
> +/* Function sinhf vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute sinh(x) as (exp(x)-exp(-x))/2,
> + *   where exp is calculated as
> + *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
> + *
> + *   Special cases:
> + *
> + *   sinh(NaN) = quiet NaN, and raise invalid exception
> + *   sinh(+/-INF) = +/-INF
> + *   sinh(x)   = x for subnormals
> + *   sinh(x) overflows for |x| above about MAXLOG + log(2)
> + *
> + */
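The 2^M factor from the description above is never formed with a floating-point multiply: the body below (like the AVX2 and AVX-512 variants) shifts N into the exponent field and adds/subtracts it against the bits of 0.5f (_iHalf), producing 2^(N-1) and 2^(-N-1) with one paddd/psubd each.  A minimal sketch of that bit trick, assuming IEEE binary32 and a hypothetical helper name:

#include <stdint.h>
#include <string.h>

/* Sketch of the _iHalf trick: adding N<<23 to the bit pattern of 0.5f
   yields 2^(N-1), subtracting it yields 2^(-N-1).  The earlier
   _iDomainRange test keeps N small enough that the exponent
   arithmetic cannot overflow.  */
static void
pow2_pair_sketch (int n, float *g_hi, float *g_lo)
{
  const uint32_t half = 0x3f000000u;   /* bit pattern of 0.5f (_iHalf) */
  uint32_t m  = (uint32_t) n << 23;    /* N placed in the exponent field */
  uint32_t hi = half + m;              /* 2^(N-1)  (paddd) */
  uint32_t lo = half - m;              /* 2^(-N-1) (psubd) */
  memcpy (g_hi, &hi, sizeof *g_hi);
  memcpy (g_lo, &lo, sizeof *g_lo);
}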
> +
> +/* Offsets for data table __svml_ssinh_data_internal
> + */
> +#define _sInvLn2                      	0
> +#define _sLn2hi                       	16
> +#define _sLn2lo                       	32
> +#define _sSign                        	48
> +#define _sShifter                     	64
> +#define _iDomainRange                 	80
> +#define _sPC1                         	96
> +#define _sPC2                         	112
> +#define _sPC3                         	128
> +#define _sPC4                         	144
> +#define _sPC5                         	160
> +#define _sPC6                         	176
> +#define _iHalf                        	192
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN4v_sinhf_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +
> +/*
> + *  Implementation
> + *  Abs argument
> + */
> +        movups    _sSign+__svml_ssinh_data_internal(%rip), %xmm14
> +        andps     %xmm0, %xmm14
> +        movaps    %xmm14, %xmm10
> +
> +/*
> + *  Load argument
> + * dM = x/log(2) + RShifter
> + */
> +        movups    _sInvLn2+__svml_ssinh_data_internal(%rip), %xmm7
> +        pxor      %xmm0, %xmm10
> +        mulps     %xmm10, %xmm7
> +
> +/*
> + * Check for overflow/underflow
> + * (is this faster than a GE compare?)
> + */
> +        movaps    %xmm10, %xmm1
> +        movups    _sShifter+__svml_ssinh_data_internal(%rip), %xmm2
> +
> +/* sR = sX - sN*Log2_hi */
> +        movups    _sLn2hi+__svml_ssinh_data_internal(%rip), %xmm3
> +        addps     %xmm2, %xmm7
> +
> +/*
> + *  R
> + * sN = sM - RShifter
> + */
> +        movaps    %xmm7, %xmm4
> +
> +/*
> + *  G1,G2 2^N,2^(-N)
> + * iM now is an EXP(2^N)
> + */
> +        pslld     $23, %xmm7
> +
> +/* sR = (sX - sN*Log2_hi) - sN*Log2_lo */
> +        movups    _sLn2lo+__svml_ssinh_data_internal(%rip), %xmm5
> +        subps     %xmm2, %xmm4
> +        mulps     %xmm4, %xmm3
> +        mulps     %xmm4, %xmm5
> +        subps     %xmm3, %xmm10
> +
> +/*
> + * sinh(r) = r*((a1=1)+r^2*(a3+r^2*a5)) = r + r*(r^2*(a3+r^2*a5))
> + * sSinh_r = (a3+r^2*a5)
> + */
> +        movups    _sPC5+__svml_ssinh_data_internal(%rip), %xmm8
> +        subps     %xmm5, %xmm10
> +
> +/* sR2 = sR^2 */
> +        movaps    %xmm10, %xmm12
> +        mulps     %xmm10, %xmm12
> +
> +/*
> + * sinh(X) = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2))
> + * sOut = (a4 +a6*sR2)
> + */
> +        movups    _sPC6+__svml_ssinh_data_internal(%rip), %xmm9
> +        mulps     %xmm12, %xmm8
> +        mulps     %xmm12, %xmm9
> +        addps     _sPC3+__svml_ssinh_data_internal(%rip), %xmm8
> +        addps     _sPC4+__svml_ssinh_data_internal(%rip), %xmm9
> +
> +/* sSinh_r = r^2*(a3+r^2*a5) */
> +        mulps     %xmm12, %xmm8
> +
> +/* sOut = a2+sR2*(a4+a6*sR2) */
> +        mulps     %xmm12, %xmm9
> +
> +/* sSinh_r = r + r*(r^2*(a3+r^2*a5)) */
> +        mulps     %xmm10, %xmm8
> +        addps     _sPC2+__svml_ssinh_data_internal(%rip), %xmm9
> +        addps     %xmm8, %xmm10
> +
> +/* sOut = sR2*(a2+sR2*(a4+a6*sR2)) */
> +        mulps     %xmm9, %xmm12
> +        movdqu    _iHalf+__svml_ssinh_data_internal(%rip), %xmm6
> +        movdqa    %xmm6, %xmm13
> +        psubd     %xmm7, %xmm6
> +        paddd     %xmm7, %xmm13
> +
> +/* sG1 = 2^(N-1)+2^(-N-1) */
> +        movdqa    %xmm13, %xmm11
> +
> +/* sG2 = 2^(N-1)-2^(-N-1) */
> +        subps     %xmm6, %xmm13
> +        addps     %xmm6, %xmm11
> +
> +/* sOut = sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
> +        mulps     %xmm13, %xmm12
> +
> +/* sOut = sG1*sinh(dR)+sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
> +        mulps     %xmm10, %xmm11
> +        pcmpgtd   _iDomainRange+__svml_ssinh_data_internal(%rip), %xmm1
> +        addps     %xmm11, %xmm12
> +        movmskps  %xmm1, %edx
> +
> +/* sOut = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
> +        addps     %xmm12, %xmm13
> +
> +/*  Ret H  */
> +        orps      %xmm13, %xmm14
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm14
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movaps    %xmm14, %xmm0
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm0, 32(%rsp)
> +        movups    %xmm14, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm14
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm14
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      sinhf@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 48(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN4v_sinhf_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_ssinh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(16)) VUINT32 _sInvLn2[4][1];
> +        __declspec(align(16)) VUINT32 _sLn2hi[4][1];
> +        __declspec(align(16)) VUINT32 _sLn2lo[4][1];
> +        __declspec(align(16)) VUINT32 _sSign[4][1];
> +        __declspec(align(16)) VUINT32 _sShifter[4][1];
> +        __declspec(align(16)) VUINT32 _iDomainRange[4][1];
> +        __declspec(align(16)) VUINT32 _sPC1[4][1];
> +        __declspec(align(16)) VUINT32 _sPC2[4][1];
> +        __declspec(align(16)) VUINT32 _sPC3[4][1];
> +        __declspec(align(16)) VUINT32 _sPC4[4][1];
> +        __declspec(align(16)) VUINT32 _sPC5[4][1];
> +        __declspec(align(16)) VUINT32 _sPC6[4][1];
> +        __declspec(align(16)) VUINT32 _iHalf[4][1];
> +} __svml_ssinh_data_internal;
> +#endif
> +__svml_ssinh_data_internal:
> +        .long 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B           /* _sInvLn2  */  //k=0
> +        .align 16
> +        .long 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000           /* _sLn2hi   */
> +        .align 16
> +        .long 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4           /* _sLn2lo   */
> +        .align 16
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000           /* _sSign    */
> +        .align 16
> +        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000           /* _sShifter */
> +        .align 16
> +        .long 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E           /* _iDomainRange */
> +        .align 16
> +        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000         /* _sPC1=1  */
> +        .align 16
> +        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000         /* _sPC2  */
> +        .align 16
> +        .long 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57         /* _sPC3  */
> +        .align 16
> +        .long 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72         /* _sPC4  */
> +        .align 16
> +        .long 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461         /* _sPC5  */
> +        .align 16
> +        .long 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3         /* _sPC6  */
> +        // Integer constants
> +        .align 16
> +        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000 /* _iHalf*/
> +        .align 16
> +        .type	__svml_ssinh_data_internal,@object
> +        .size	__svml_ssinh_data_internal,.-__svml_ssinh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core-sse.S
> new file mode 100644
> index 0000000000..d3c9c607a0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized sinhf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN8v_sinhf _ZGVdN8v_sinhf_sse_wrapper
> +#include "../svml_s_sinhf8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core.c
> new file mode 100644
> index 0000000000..2a2e21e742
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized sinhf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN8v_sinhf
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN8v_sinhf, __GI__ZGVdN8v_sinhf,
> +	       __redirect__ZGVdN8v_sinhf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core_avx2.S
> new file mode 100644
> index 0000000000..ea13fb60d4
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinhf8_core_avx2.S
> @@ -0,0 +1,309 @@
> +/* Function sinhf vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute sinh(x) as (exp(x)-exp(-x))/2,
> + *   where exp is calculated as
> + *   exp(M*ln2 + ln2*(j/2^k) + r) = 2^M * 2^(j/2^k) * exp(r)
> + *
> + *   Special cases:
> + *
> + *   sinh(NaN) = quiet NaN, and raise invalid exception
> + *   sinh(+/-INF) = +/-INF
> + *   sinh(x)   = x for subnormals
> + *   sinh(x) overflows for x above MAXLOG + log(2), i.e. once exp(|x|)/2 overflows
> + *
> + */
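For reference, the reduction described above corresponds to the following scalar C sketch.  It is illustrative only: the coefficients are plain Taylor terms standing in for the tuned _sPC2.._sPC6 table values, and the special-input handling of the vector code (|x| beyond _iDomainRange, NaN, INF) is omitted.

#include <math.h>

/* Scalar sketch of the sinhf reduction:
   |x| = N*ln2 + r, sinh(|x|) = G2 + G1*sinh(r) + G2*(cosh(r)-1)
   with G1 = 2^(N-1)+2^(-N-1) and G2 = 2^(N-1)-2^(-N-1).  */
static float
sinhf_sketch (float x)
{
  const float inv_ln2 = 0x1.715476p+0f;   /* _sInvLn2 */
  const float ln2_hi = 0x1.62e000p-1f;    /* _sLn2hi  */
  const float ln2_lo = 0x1.0bfbe8p-15f;   /* _sLn2lo  */
  float ax = fabsf (x);
  float n = nearbyintf (ax * inv_ln2);    /* the asm uses the _sShifter trick */
  float r = (ax - n * ln2_hi) - n * ln2_lo;
  float g_hi = ldexpf (0.5f, (int) n);    /* 2^(N-1)  */
  float g_lo = ldexpf (0.5f, -(int) n);   /* 2^(-N-1) */
  float g1 = g_hi + g_lo, g2 = g_hi - g_lo;
  float r2 = r * r;
  /* Assumed Taylor coefficients in place of _sPC2.._sPC6.  */
  const float a2 = 1.0f/2, a3 = 1.0f/6, a4 = 1.0f/24,
	      a5 = 1.0f/120, a6 = 1.0f/720;
  float sinh_r = r + r * (r2 * (a3 + r2 * a5));
  float even = r2 * (a2 + r2 * (a4 + a6 * r2));
  return copysignf (g2 + g1 * sinh_r + g2 * even, x);
}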
> +
> +/* Offsets for data table __svml_ssinh_data_internal
> + */
> +#define _sInvLn2                      	0
> +#define _sLn2hi                       	32
> +#define _sLn2lo                       	64
> +#define _sSign                        	96
> +#define _sShifter                     	128
> +#define _iDomainRange                 	160
> +#define _sPC1                         	192
> +#define _sPC2                         	224
> +#define _sPC3                         	256
> +#define _sPC4                         	288
> +#define _sPC5                         	320
> +#define _sPC6                         	352
> +#define _iHalf                        	384
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN8v_sinhf_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +        vmovups   _sInvLn2+__svml_ssinh_data_internal(%rip), %ymm7
> +        vmovups   _sShifter+__svml_ssinh_data_internal(%rip), %ymm4
> +        vmovups   _sLn2hi+__svml_ssinh_data_internal(%rip), %ymm5
> +
> +/*
> + * sinh(X) = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2))
> + * sOut = (a4 + a6*sR2)
> + */
> +        vmovups   _sPC6+__svml_ssinh_data_internal(%rip), %ymm14
> +
> +/*
> + * sinh(r) = r*((a1=1)+r^2*(a3+r^2*(a5+r^2*a7))) = r + r*(r^2*(a3+r^2*(a5+r^2*a7)))
> + * sSinh_r = (a3+r^2*a5)
> + */
> +        vmovups   _sPC5+__svml_ssinh_data_internal(%rip), %ymm12
> +        vmovups   _iHalf+__svml_ssinh_data_internal(%rip), %ymm8
> +        vmovaps   %ymm0, %ymm2
> +
> +/*
> + *  Implementation
> + *  Abs argument
> + */
> +        vandps    _sSign+__svml_ssinh_data_internal(%rip), %ymm2, %ymm1
> +        vxorps    %ymm2, %ymm1, %ymm0
> +
> +/*
> + *  Load argument
> + * dM = x/log(2) + RShifter
> + */
> +        vfmadd213ps %ymm4, %ymm0, %ymm7
> +
> +/*
> + *  R
> + * sN = sM - RShifter
> + */
> +        vsubps    %ymm4, %ymm7, %ymm6
> +
> +/*
> + *  G1,G2 2^N,2^(-N)
> + * iM now holds the exponent field of 2^N
> + */
> +        vpslld    $23, %ymm7, %ymm9
> +
> +/*
> + * Check for overflow/underflow.
> + * (Is a GT compare faster than GE here?)
> + */
> +        vpcmpgtd  _iDomainRange+__svml_ssinh_data_internal(%rip), %ymm0, %ymm3
> +
> +/* sR = sX - sN*Log2_hi */
> +        vfnmadd231ps %ymm5, %ymm6, %ymm0
> +        vpaddd    %ymm9, %ymm8, %ymm10
> +        vpsubd    %ymm9, %ymm8, %ymm11
> +
> +/* sR = (sX - sN*Log2_hi) - sN*Log2_lo */
> +        vfnmadd231ps _sLn2lo+__svml_ssinh_data_internal(%rip), %ymm6, %ymm0
> +
> +/* sR2 = sR^2 */
> +        vmulps    %ymm0, %ymm0, %ymm13
> +        vfmadd213ps _sPC4+__svml_ssinh_data_internal(%rip), %ymm13, %ymm14
> +        vfmadd213ps _sPC3+__svml_ssinh_data_internal(%rip), %ymm13, %ymm12
> +
> +/* sOut = a2+sR2*(a4+a6*sR2) */
> +        vfmadd213ps _sPC2+__svml_ssinh_data_internal(%rip), %ymm13, %ymm14
> +
> +/* sSinh_r = r^2*(a3+r^2*a5) */
> +        vmulps    %ymm12, %ymm13, %ymm12
> +
> +/* sOut = sR2*(a2+sR2*(a4+a6*sR2)) */
> +        vmulps    %ymm14, %ymm13, %ymm15
> +
> +/* sSinh_r = r + r*(r^2*(a3+r^2*a5)) */
> +        vfmadd213ps %ymm0, %ymm0, %ymm12
> +        vmovmskps %ymm3, %edx
> +
> +/* sG1 = 2^(N-1)+2^(-N-1) */
> +        vaddps    %ymm11, %ymm10, %ymm3
> +
> +/* sG2 = 2^(N-1)-2^(-N-1) */
> +        vsubps    %ymm11, %ymm10, %ymm10
> +
> +/* sOut = sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
> +        vmulps    %ymm15, %ymm10, %ymm0
> +
> +/* sOut = sG1*sinh(dR)+sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
> +        vfmadd213ps %ymm0, %ymm12, %ymm3
> +
> +/* sOut = sG2 + sG1*sinh(dR) + sG2*sR2*(a2+sR2*(a4+a6*sR2)) */
> +        vaddps    %ymm3, %ymm10, %ymm4
> +
> +/*  Ret H  */
> +        vorps     %ymm4, %ymm1, %ymm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm2
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %ymm2, 32(%rsp)
> +        vmovups   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      sinhf@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 64(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN8v_sinhf_avx2)
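The special-input path above saves the argument and result vectors on the stack, then walks the range mask one bit at a time and repairs only the flagged lanes with the scalar sinhf.  A minimal C sketch of that control flow (names are illustrative; the real code keeps the mask in %r13d and the lane index in %r12d):

#include <math.h>

/* Sketch of the SPECIAL_VALUES_LOOP / RANGEMASK_CHECK / SCALAR_MATH_CALL
   sequence: one scalar call per lane flagged by the range check.  */
static void
fixup_special_lanes (const float in[8], float out[8], unsigned int mask)
{
  for (unsigned int lane = 0; lane < 8; lane++)
    if (mask & (1u << lane))         /* bit set by the vpcmpgtd range check */
      out[lane] = sinhf (in[lane]);  /* defer to the scalar implementation */
}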
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_ssinh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(32)) VUINT32 _sInvLn2[8][1];
> +        __declspec(align(32)) VUINT32 _sLn2hi[8][1];
> +        __declspec(align(32)) VUINT32 _sLn2lo[8][1];
> +        __declspec(align(32)) VUINT32 _sSign[8][1];
> +        __declspec(align(32)) VUINT32 _sShifter[8][1];
> +        __declspec(align(32)) VUINT32 _iDomainRange[8][1];
> +        __declspec(align(32)) VUINT32 _sPC1[8][1];
> +        __declspec(align(32)) VUINT32 _sPC2[8][1];
> +        __declspec(align(32)) VUINT32 _sPC3[8][1];
> +        __declspec(align(32)) VUINT32 _sPC4[8][1];
> +        __declspec(align(32)) VUINT32 _sPC5[8][1];
> +        __declspec(align(32)) VUINT32 _sPC6[8][1];
> +        __declspec(align(32)) VUINT32 _iHalf[8][1];
> +} __svml_ssinh_data_internal;
> +#endif
> +__svml_ssinh_data_internal:
> +        .long 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B, 0x3FB8AA3B           /* _sInvLn2  */  //k=0
> +        .align 32
> +        .long 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000, 0x3F317000           /* _sLn2hi   */
> +        .align 32
> +        .long 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4, 0x3805FDF4           /* _sLn2lo   */
> +        .align 32
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000           /* _sSign    */
> +        .align 32
> +        .long 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000, 0x4b400000           /* _sShifter */
> +        .align 32
> +        .long 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E, 0x42AEAC4E           /* _iDomainRange */
> +        .align 32
> +        .long 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000         /* _sPC1=1  */
> +        .align 32
> +        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000         /* _sPC2  */
> +        .align 32
> +        .long 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57, 0x3e2aaa57         /* _sPC3  */
> +        .align 32
> +        .long 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72, 0x3d2aaa72         /* _sPC4  */
> +        .align 32
> +        .long 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461, 0x3c091461         /* _sPC5  */
> +        .align 32
> +        .long 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3, 0x3ab6a8a3         /* _sPC6  */
> +        // Integer constants
> +        .align 32
> +        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000 /* _iHalf*/
> +        .align 32
> +        .type	__svml_ssinh_data_internal,@object
> +        .size	__svml_ssinh_data_internal,.-__svml_ssinh_data_internal
> diff --git a/sysdeps/x86_64/fpu/svml_d_sinh2_core.S b/sysdeps/x86_64/fpu/svml_d_sinh2_core.S
> new file mode 100644
> index 0000000000..91bda7318c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_sinh2_core.S
> @@ -0,0 +1,29 @@
> +/* Function sinh vectorized with SSE2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN2v_sinh)
> +WRAPPER_IMPL_SSE2 sinh
> +END (_ZGVbN2v_sinh)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN2v_sinh)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_sinh4_core.S b/sysdeps/x86_64/fpu/svml_d_sinh4_core.S
> new file mode 100644
> index 0000000000..7b8091946a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_sinh4_core.S
> @@ -0,0 +1,29 @@
> +/* Function sinh vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN4v_sinh)
> +WRAPPER_IMPL_AVX _ZGVbN2v_sinh
> +END (_ZGVdN4v_sinh)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN4v_sinh)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_sinh4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_sinh4_core_avx.S
> new file mode 100644
> index 0000000000..f773bf110c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_sinh4_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function sinh vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVcN4v_sinh)
> +WRAPPER_IMPL_AVX _ZGVbN2v_sinh
> +END (_ZGVcN4v_sinh)
> diff --git a/sysdeps/x86_64/fpu/svml_d_sinh8_core.S b/sysdeps/x86_64/fpu/svml_d_sinh8_core.S
> new file mode 100644
> index 0000000000..153a18429c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_sinh8_core.S
> @@ -0,0 +1,25 @@
> +/* Function sinh vectorized with AVX-512, wrapper to AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN8v_sinh)
> +WRAPPER_IMPL_AVX512 _ZGVdN4v_sinh
> +END (_ZGVeN8v_sinh)
> diff --git a/sysdeps/x86_64/fpu/svml_s_sinhf16_core.S b/sysdeps/x86_64/fpu/svml_s_sinhf16_core.S
> new file mode 100644
> index 0000000000..f8dc7da336
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_sinhf16_core.S
> @@ -0,0 +1,25 @@
> +/* Function sinhf vectorized with AVX-512. Wrapper to AVX2 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN16v_sinhf)
> +WRAPPER_IMPL_AVX512 _ZGVdN8v_sinhf
> +END (_ZGVeN16v_sinhf)
> diff --git a/sysdeps/x86_64/fpu/svml_s_sinhf4_core.S b/sysdeps/x86_64/fpu/svml_s_sinhf4_core.S
> new file mode 100644
> index 0000000000..d065d03eb6
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_sinhf4_core.S
> @@ -0,0 +1,29 @@
> +/* Function sinhf vectorized with SSE2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN4v_sinhf)
> +WRAPPER_IMPL_SSE2 sinhf
> +END (_ZGVbN4v_sinhf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN4v_sinhf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_sinhf8_core.S b/sysdeps/x86_64/fpu/svml_s_sinhf8_core.S
> new file mode 100644
> index 0000000000..1194699a76
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_sinhf8_core.S
> @@ -0,0 +1,29 @@
> +/* Function sinhf vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN8v_sinhf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_sinhf
> +END (_ZGVdN8v_sinhf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN8v_sinhf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_sinhf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_sinhf8_core_avx.S
> new file mode 100644
> index 0000000000..82c6b9b239
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_sinhf8_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function sinhf vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +        .text
> +ENTRY (_ZGVcN8v_sinhf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_sinhf
> +END (_ZGVcN8v_sinhf)
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx.c
> new file mode 100644
> index 0000000000..55aa36d866
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-sinh.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx2.c
> new file mode 100644
> index 0000000000..55aa36d866
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx2.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-sinh.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx512f.c
> new file mode 100644
> index 0000000000..55aa36d866
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-sinh-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-sinh.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-sinh.c b/sysdeps/x86_64/fpu/test-double-libmvec-sinh.c
> new file mode 100644
> index 0000000000..82dcaf745d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-sinh.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE double
> +#define LIBMVEC_FUNC sinh
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> index 0222f9f5b8..db136cc901 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> @@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVbN2v_exp2)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVbN2v_exp10)
>  VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVbN2v_cosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVbN2v_expm1)
> +VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVbN2v_sinh)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> index 1aad9faf9c..5fc09ac8c0 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> @@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVdN4v_exp2)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVdN4v_exp10)
>  VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVdN4v_cosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVdN4v_expm1)
> +VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVdN4v_sinh)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m256i
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> index e404bf899d..26ef7fb365 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> @@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVcN4v_exp2)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVcN4v_exp10)
>  VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVcN4v_cosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVcN4v_expm1)
> +VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVcN4v_sinh)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> index 2b4de59343..c7055fca76 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> @@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp2), _ZGVeN8v_exp2)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10), _ZGVeN8v_exp10)
>  VECTOR_WRAPPER (WRAPPER_NAME (cosh), _ZGVeN8v_cosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVeN8v_expm1)
> +VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVeN8v_sinh)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m512i
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx.c
> new file mode 100644
> index 0000000000..93986945f3
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-sinhf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx2.c
> new file mode 100644
> index 0000000000..93986945f3
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx2.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-sinhf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx512f.c
> new file mode 100644
> index 0000000000..93986945f3
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-sinhf-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-sinhf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-sinhf.c b/sysdeps/x86_64/fpu/test-float-libmvec-sinhf.c
> new file mode 100644
> index 0000000000..fb1f3c5c48
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-sinhf.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE float
> +#define LIBMVEC_FUNC sinhf
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> index 9a4a1b84a9..d353bcb0f2 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> @@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVeN16v_exp2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVeN16v_exp10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVeN16v_coshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVeN16v_expm1f)
> +VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVeN16v_sinhf)
>  
>  #define VEC_INT_TYPE __m512i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> index eb4e36d0e2..5e59117626 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> @@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVbN4v_exp2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVbN4v_exp10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVbN4v_coshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVbN4v_expm1f)
> +VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVbN4v_sinhf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> index d8adab59e6..e884a5f4df 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> @@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVdN8v_exp2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVdN8v_exp10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVdN8v_coshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVdN8v_expm1f)
> +VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVdN8v_sinhf)
>  
>  /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
>  #undef VECTOR_WRAPPER_fFF
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> index e6e1a90c72..95910d39e9 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> @@ -35,6 +35,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (exp2f), _ZGVcN8v_exp2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (exp10f), _ZGVcN8v_exp10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (coshf), _ZGVcN8v_coshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVcN8v_expm1f)
> +VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVcN8v_sinhf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> -- 
> 2.31.1
> 

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.


H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 11/18] x86-64: Add vector log10/log10f implementation to libmvec
  2021-12-29  6:39 ` [PATCH v5 11/18] x86-64: Add vector log10/log10f " Sunil K Pandey
@ 2021-12-29 21:26   ` H.J. Lu
  0 siblings, 0 replies; 40+ messages in thread
From: H.J. Lu @ 2021-12-29 21:26 UTC (permalink / raw)
  To: Sunil K Pandey; +Cc: libc-alpha, hjl.tools, andrey.kolesov, marius.cornea

On Tue, Dec 28, 2021 at 10:39:53PM -0800, Sunil K Pandey wrote:
> Implement vectorized log10/log10f containing SSE, AVX, AVX2 and
> AVX512 versions for libmvec as per vector ABI.  It also contains
> accuracy and ABI tests for vector log10/log10f with regenerated ulps.
> ---
>  bits/libm-simd-decl-stubs.h                   |   11 +
>  math/bits/mathcalls.h                         |    2 +-
>  .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
>  sysdeps/x86/fpu/bits/math-vector.h            |    4 +
>  .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
>  sysdeps/x86_64/fpu/Makeconfig                 |    1 +
>  sysdeps/x86_64/fpu/Versions                   |    2 +
>  sysdeps/x86_64/fpu/libm-test-ulps             |   20 +
>  .../fpu/multiarch/svml_d_log102_core-sse2.S   |   20 +
>  .../x86_64/fpu/multiarch/svml_d_log102_core.c |   27 +
>  .../fpu/multiarch/svml_d_log102_core_sse4.S   | 1089 +++++++++++++++++
>  .../fpu/multiarch/svml_d_log104_core-sse.S    |   20 +
>  .../x86_64/fpu/multiarch/svml_d_log104_core.c |   27 +
>  .../fpu/multiarch/svml_d_log104_core_avx2.S   | 1074 ++++++++++++++++
>  .../fpu/multiarch/svml_d_log108_core-avx2.S   |   20 +
>  .../x86_64/fpu/multiarch/svml_d_log108_core.c |   27 +
>  .../fpu/multiarch/svml_d_log108_core_avx512.S |  299 +++++
>  .../fpu/multiarch/svml_s_log10f16_core-avx2.S |   20 +
>  .../fpu/multiarch/svml_s_log10f16_core.c      |   28 +
>  .../multiarch/svml_s_log10f16_core_avx512.S   |  238 ++++
>  .../fpu/multiarch/svml_s_log10f4_core-sse2.S  |   20 +
>  .../fpu/multiarch/svml_s_log10f4_core.c       |   28 +
>  .../fpu/multiarch/svml_s_log10f4_core_sse4.S  |  243 ++++
>  .../fpu/multiarch/svml_s_log10f8_core-sse.S   |   20 +
>  .../fpu/multiarch/svml_s_log10f8_core.c       |   28 +
>  .../fpu/multiarch/svml_s_log10f8_core_avx2.S  |  243 ++++
>  sysdeps/x86_64/fpu/svml_d_log102_core.S       |   29 +
>  sysdeps/x86_64/fpu/svml_d_log104_core.S       |   29 +
>  sysdeps/x86_64/fpu/svml_d_log104_core_avx.S   |   25 +
>  sysdeps/x86_64/fpu/svml_d_log108_core.S       |   25 +
>  sysdeps/x86_64/fpu/svml_s_log10f16_core.S     |   25 +
>  sysdeps/x86_64/fpu/svml_s_log10f4_core.S      |   29 +
>  sysdeps/x86_64/fpu/svml_s_log10f8_core.S      |   29 +
>  sysdeps/x86_64/fpu/svml_s_log10f8_core_avx.S  |   25 +
>  .../fpu/test-double-libmvec-log10-avx.c       |    1 +
>  .../fpu/test-double-libmvec-log10-avx2.c      |    1 +
>  .../fpu/test-double-libmvec-log10-avx512f.c   |    1 +
>  .../x86_64/fpu/test-double-libmvec-log10.c    |    3 +
>  .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
>  .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
>  .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
>  .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
>  .../fpu/test-float-libmvec-log10f-avx.c       |    1 +
>  .../fpu/test-float-libmvec-log10f-avx2.c      |    1 +
>  .../fpu/test-float-libmvec-log10f-avx512f.c   |    1 +
>  .../x86_64/fpu/test-float-libmvec-log10f.c    |    3 +
>  .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
>  .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
>  .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
>  .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
>  50 files changed, 3758 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log102_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log102_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log102_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log104_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log104_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log104_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log108_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log108_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log108_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_log102_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_log104_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_log104_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_log108_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_log10f16_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_log10f4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_log10f8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_log10f8_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log10-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log10-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log10-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log10.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log10f.c
> 
> diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
> index 31878bf4ed..4ad584c227 100644
> --- a/bits/libm-simd-decl-stubs.h
> +++ b/bits/libm-simd-decl-stubs.h
> @@ -219,4 +219,15 @@
>  #define __DECL_SIMD_atan2f32x
>  #define __DECL_SIMD_atan2f64x
>  #define __DECL_SIMD_atan2f128x
> +
> +#define __DECL_SIMD_log10
> +#define __DECL_SIMD_log10f
> +#define __DECL_SIMD_log10l
> +#define __DECL_SIMD_log10f16
> +#define __DECL_SIMD_log10f32
> +#define __DECL_SIMD_log10f64
> +#define __DECL_SIMD_log10f128
> +#define __DECL_SIMD_log10f32x
> +#define __DECL_SIMD_log10f64x
> +#define __DECL_SIMD_log10f128x
>  #endif
> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index 1bd4911993..f21384758a 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -104,7 +104,7 @@ __MATHCALL (ldexp,, (_Mdouble_ __x, int __exponent));
>  __MATHCALL_VEC (log,, (_Mdouble_ __x));
>  
>  /* Base-ten logarithm of X.  */
> -__MATHCALL (log10,, (_Mdouble_ __x));
> +__MATHCALL_VEC (log10,, (_Mdouble_ __x));
>  
>  /* Break VALUE into integral and fractional parts.  */
>  __MATHCALL (modf,, (_Mdouble_ __x, _Mdouble_ *__iptr)) __nonnull ((2));
> diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> index 2b3b8d3886..8108a2a189 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> @@ -54,6 +54,7 @@ GLIBC_2.35 _ZGVbN2v_cosh F
>  GLIBC_2.35 _ZGVbN2v_exp10 F
>  GLIBC_2.35 _ZGVbN2v_exp2 F
>  GLIBC_2.35 _ZGVbN2v_expm1 F
> +GLIBC_2.35 _ZGVbN2v_log10 F
>  GLIBC_2.35 _ZGVbN2v_sinh F
>  GLIBC_2.35 _ZGVbN2vv_atan2 F
>  GLIBC_2.35 _ZGVbN2vv_hypot F
> @@ -65,6 +66,7 @@ GLIBC_2.35 _ZGVbN4v_coshf F
>  GLIBC_2.35 _ZGVbN4v_exp10f F
>  GLIBC_2.35 _ZGVbN4v_exp2f F
>  GLIBC_2.35 _ZGVbN4v_expm1f F
> +GLIBC_2.35 _ZGVbN4v_log10f F
>  GLIBC_2.35 _ZGVbN4v_sinhf F
>  GLIBC_2.35 _ZGVbN4vv_atan2f F
>  GLIBC_2.35 _ZGVbN4vv_hypotf F
> @@ -76,6 +78,7 @@ GLIBC_2.35 _ZGVcN4v_cosh F
>  GLIBC_2.35 _ZGVcN4v_exp10 F
>  GLIBC_2.35 _ZGVcN4v_exp2 F
>  GLIBC_2.35 _ZGVcN4v_expm1 F
> +GLIBC_2.35 _ZGVcN4v_log10 F
>  GLIBC_2.35 _ZGVcN4v_sinh F
>  GLIBC_2.35 _ZGVcN4vv_atan2 F
>  GLIBC_2.35 _ZGVcN4vv_hypot F
> @@ -87,6 +90,7 @@ GLIBC_2.35 _ZGVcN8v_coshf F
>  GLIBC_2.35 _ZGVcN8v_exp10f F
>  GLIBC_2.35 _ZGVcN8v_exp2f F
>  GLIBC_2.35 _ZGVcN8v_expm1f F
> +GLIBC_2.35 _ZGVcN8v_log10f F
>  GLIBC_2.35 _ZGVcN8v_sinhf F
>  GLIBC_2.35 _ZGVcN8vv_atan2f F
>  GLIBC_2.35 _ZGVcN8vv_hypotf F
> @@ -98,6 +102,7 @@ GLIBC_2.35 _ZGVdN4v_cosh F
>  GLIBC_2.35 _ZGVdN4v_exp10 F
>  GLIBC_2.35 _ZGVdN4v_exp2 F
>  GLIBC_2.35 _ZGVdN4v_expm1 F
> +GLIBC_2.35 _ZGVdN4v_log10 F
>  GLIBC_2.35 _ZGVdN4v_sinh F
>  GLIBC_2.35 _ZGVdN4vv_atan2 F
>  GLIBC_2.35 _ZGVdN4vv_hypot F
> @@ -109,6 +114,7 @@ GLIBC_2.35 _ZGVdN8v_coshf F
>  GLIBC_2.35 _ZGVdN8v_exp10f F
>  GLIBC_2.35 _ZGVdN8v_exp2f F
>  GLIBC_2.35 _ZGVdN8v_expm1f F
> +GLIBC_2.35 _ZGVdN8v_log10f F
>  GLIBC_2.35 _ZGVdN8v_sinhf F
>  GLIBC_2.35 _ZGVdN8vv_atan2f F
>  GLIBC_2.35 _ZGVdN8vv_hypotf F
> @@ -120,6 +126,7 @@ GLIBC_2.35 _ZGVeN16v_coshf F
>  GLIBC_2.35 _ZGVeN16v_exp10f F
>  GLIBC_2.35 _ZGVeN16v_exp2f F
>  GLIBC_2.35 _ZGVeN16v_expm1f F
> +GLIBC_2.35 _ZGVeN16v_log10f F
>  GLIBC_2.35 _ZGVeN16v_sinhf F
>  GLIBC_2.35 _ZGVeN16vv_atan2f F
>  GLIBC_2.35 _ZGVeN16vv_hypotf F
> @@ -131,6 +138,7 @@ GLIBC_2.35 _ZGVeN8v_cosh F
>  GLIBC_2.35 _ZGVeN8v_exp10 F
>  GLIBC_2.35 _ZGVeN8v_exp2 F
>  GLIBC_2.35 _ZGVeN8v_expm1 F
> +GLIBC_2.35 _ZGVeN8v_log10 F
>  GLIBC_2.35 _ZGVeN8v_sinh F
>  GLIBC_2.35 _ZGVeN8vv_atan2 F
>  GLIBC_2.35 _ZGVeN8vv_hypot F
> diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
> index 62f2890ab3..64e80ada7a 100644
> --- a/sysdeps/x86/fpu/bits/math-vector.h
> +++ b/sysdeps/x86/fpu/bits/math-vector.h
> @@ -102,6 +102,10 @@
>  #  define __DECL_SIMD_atan2 __DECL_SIMD_x86_64
>  #  undef __DECL_SIMD_atan2f
>  #  define __DECL_SIMD_atan2f __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_log10
> +#  define __DECL_SIMD_log10 __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_log10f
> +#  define __DECL_SIMD_log10f __DECL_SIMD_x86_64
>  
>  # endif
>  #endif
> diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> index 2269b74d50..f5050c68af 100644
> --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> @@ -50,6 +50,8 @@
>  !GCC$ builtin (cbrtf) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (atan2) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (atan2f) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (log10) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (log10f) attributes simd (notinbranch) if('x86_64')
>  
>  !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
> @@ -85,3 +87,5 @@
>  !GCC$ builtin (cbrtf) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (atan2) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (atan2f) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (log10) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (log10f) attributes simd (notinbranch) if('x32')
> diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> index 96a40856fa..ba37044e9d 100644
> --- a/sysdeps/x86_64/fpu/Makeconfig
> +++ b/sysdeps/x86_64/fpu/Makeconfig
> @@ -35,6 +35,7 @@ libmvec-funcs = \
>    expm1 \
>    hypot \
>    log \
> +  log10 \
>    pow \
>    sin \
>    sincos \
> diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> index f58c98eb45..8beaf0736f 100644
> --- a/sysdeps/x86_64/fpu/Versions
> +++ b/sysdeps/x86_64/fpu/Versions
> @@ -22,6 +22,7 @@ libmvec {
>      _ZGVbN2v_exp10; _ZGVcN4v_exp10; _ZGVdN4v_exp10; _ZGVeN8v_exp10;
>      _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
>      _ZGVbN2v_expm1; _ZGVcN4v_expm1; _ZGVdN4v_expm1; _ZGVeN8v_expm1;
> +    _ZGVbN2v_log10; _ZGVcN4v_log10; _ZGVdN4v_log10; _ZGVeN8v_log10;
>      _ZGVbN2v_sinh; _ZGVcN4v_sinh; _ZGVdN4v_sinh; _ZGVeN8v_sinh;
>      _ZGVbN2vv_atan2; _ZGVcN4vv_atan2; _ZGVdN4vv_atan2; _ZGVeN8vv_atan2;
>      _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
> @@ -33,6 +34,7 @@ libmvec {
>      _ZGVbN4v_exp10f; _ZGVcN8v_exp10f; _ZGVdN8v_exp10f; _ZGVeN16v_exp10f;
>      _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
>      _ZGVbN4v_expm1f; _ZGVcN8v_expm1f; _ZGVdN8v_expm1f; _ZGVeN16v_expm1f;
> +    _ZGVbN4v_log10f; _ZGVcN8v_log10f; _ZGVdN8v_log10f; _ZGVeN16v_log10f;
>      _ZGVbN4v_sinhf; _ZGVcN8v_sinhf; _ZGVdN8v_sinhf; _ZGVeN16v_sinhf;
>      _ZGVbN4vv_atan2f; _ZGVcN8vv_atan2f; _ZGVdN8vv_atan2f; _ZGVeN16vv_atan2f;
>      _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf;
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index 6f59c61756..b0cd9d60ea 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -1641,6 +1641,26 @@ float: 2
>  float128: 1
>  ldouble: 1
>  
> +Function: "log10_vlen16":
> +float: 1
> +
> +Function: "log10_vlen2":
> +double: 1
> +
> +Function: "log10_vlen4":
> +double: 1
> +float: 1
> +
> +Function: "log10_vlen4_avx2":
> +double: 1
> +
> +Function: "log10_vlen8":
> +double: 1
> +float: 1
> +
> +Function: "log10_vlen8_avx2":
> +float: 1
> +
>  Function: "log1p":
>  double: 1
>  float: 1
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log102_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log102_core-sse2.S
> new file mode 100644
> index 0000000000..e654db6d6c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log102_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized log10, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN2v_log10 _ZGVbN2v_log10_sse2
> +#include "../svml_d_log102_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log102_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log102_core.c
> new file mode 100644
> index 0000000000..1c775f33b6
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log102_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized log10, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN2v_log10
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN2v_log10, __GI__ZGVbN2v_log10, __redirect__ZGVbN2v_log10)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log102_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log102_core_sse4.S
> new file mode 100644
> index 0000000000..33372f576f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log102_core_sse4.S
> @@ -0,0 +1,1089 @@
> +/* Function log10 vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
> + *    R = Rcp*x - 1.0
> + *    log10(x) = k*log10(2.0) - log10(Rcp) + poly_approximation(R)
> + *       log10(Rcp) is tabulated
> + *
> + *
> + */
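For reference, a scalar C sketch of the same reduction.  It is illustrative only: frexp/ldexp stand in for the exponent/mantissa bit manipulation, the -log10(Rcp) table lookup is simply recomputed with the scalar log10, the polynomial is a plain Taylor stand-in for poly_coeff, and special inputs (x <= 0, NaN, subnormals) are not handled.

#include <math.h>

/* Scalar sketch: x = m*2^k, Rcp ~ 1/m rounded to a few bits,
   R = Rcp*m - 1, log10(x) = k*log10(2) - log10(Rcp) + log10(1+R).  */
static double
log10_sketch (double x)
{
  const double log10_2 = 0.30102999566398119521;
  const double log10_e = 0.43429448190325182765;
  int k;
  double m = frexp (x, &k);                                 /* m in [0.5, 1) */
  double rcp = ldexp (nearbyint (ldexp (1.0 / m, 9)), -9);  /* short 1/m     */
  double r = rcp * m - 1.0;                                 /* |r| < 2^-9    */
  /* log10(1+r) via a short Taylor polynomial (stand-in for poly_coeff).  */
  double p = log10_e * r * (1.0 - r * (0.5 - r * (1.0/3 - r * 0.25)));
  return k * log10_2 - log10 (rcp) + p;       /* -log10(Rcp) is tabulated */
}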
> +
> +/* Offsets for data table __svml_dlog10_data_internal
> + */
> +#define Log_HA_table                  	0
> +#define Log_LA_table                  	4112
> +#define poly_coeff                    	8224
> +#define ExpMask                       	8304
> +#define Two10                         	8320
> +#define MinNorm                       	8336
> +#define MaxNorm                       	8352
> +#define HalfMask                      	8368
> +#define One                           	8384
> +#define Threshold                     	8400
> +#define Bias                          	8416
> +#define Bias1                         	8432
> +#define L2                            	8448
> +
> +/* Lookup bias for data table __svml_dlog10_data_internal.  */
> +#define Table_Lookup_Bias               -0x406ff0
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN2v_log10_sse4)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $64, %rsp
> +
> +/* exponent bits */
> +        movaps    %xmm0, %xmm5
> +
> +/* preserve mantissa, set input exponent to 2^(-10) */
> +        movups    ExpMask+__svml_dlog10_data_internal(%rip), %xmm1
> +        psrlq     $20, %xmm5
> +        andps     %xmm0, %xmm1
> +        lea       Table_Lookup_Bias+__svml_dlog10_data_internal(%rip), %rsi
> +        orps      Two10+__svml_dlog10_data_internal(%rip), %xmm1
> +
> +/* check range */
> +        movaps    %xmm0, %xmm8
> +
> +/* reciprocal approximation good to at least 11 bits */
> +        cvtpd2ps  %xmm1, %xmm2
> +        cmpltpd   MinNorm+__svml_dlog10_data_internal(%rip), %xmm8
> +        movlhps   %xmm2, %xmm2
> +        movaps    %xmm0, %xmm7
> +        rcpps     %xmm2, %xmm3
> +        cmpnlepd  MaxNorm+__svml_dlog10_data_internal(%rip), %xmm7
> +        cvtps2pd  %xmm3, %xmm12
> +
> +/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
> +        movups    .FLT_12(%rip), %xmm4
> +        orps      %xmm7, %xmm8
> +        addpd     %xmm4, %xmm12
> +
> +/* combine and get argument value range mask */
> +        movmskpd  %xmm8, %edx
> +
> +/* argument reduction */
> +        movups    HalfMask+__svml_dlog10_data_internal(%rip), %xmm9
> +        subpd     %xmm4, %xmm12
> +        andps     %xmm1, %xmm9
> +
> +/*
> + * prepare table index
> + * table lookup
> + */
> +        movaps    %xmm12, %xmm10
> +        subpd     %xmm9, %xmm1
> +        mulpd     %xmm12, %xmm9
> +        mulpd     %xmm12, %xmm1
> +        subpd     One+__svml_dlog10_data_internal(%rip), %xmm9
> +        addpd     %xmm9, %xmm1
> +
> +/* polynomial */
> +        movups    poly_coeff+__svml_dlog10_data_internal(%rip), %xmm14
> +        psrlq     $40, %xmm10
> +        mulpd     %xmm1, %xmm14
> +        movd      %xmm10, %eax
> +        pshufd    $2, %xmm10, %xmm11
> +        movaps    %xmm1, %xmm10
> +        movups    poly_coeff+32+__svml_dlog10_data_internal(%rip), %xmm15
> +        mulpd     %xmm1, %xmm10
> +        addpd     poly_coeff+16+__svml_dlog10_data_internal(%rip), %xmm14
> +        mulpd     %xmm1, %xmm15
> +        mulpd     %xmm10, %xmm14
> +        addpd     poly_coeff+48+__svml_dlog10_data_internal(%rip), %xmm15
> +        movd      %xmm11, %ecx
> +
> +/* exponent*log(2.0) */
> +        movups    Threshold+__svml_dlog10_data_internal(%rip), %xmm13
> +        addpd     %xmm14, %xmm15
> +        cmpltpd   %xmm12, %xmm13
> +        mulpd     %xmm15, %xmm10
> +        pshufd    $221, %xmm5, %xmm6
> +        movups    poly_coeff+64+__svml_dlog10_data_internal(%rip), %xmm11
> +
> +/* biased exponent in DP format */
> +        cvtdq2pd  %xmm6, %xmm3
> +        mulpd     %xmm1, %xmm11
> +        andps     Bias+__svml_dlog10_data_internal(%rip), %xmm13
> +        orps      Bias1+__svml_dlog10_data_internal(%rip), %xmm13
> +        subpd     %xmm13, %xmm3
> +        addpd     %xmm10, %xmm11
> +        mulpd     L2+__svml_dlog10_data_internal(%rip), %xmm3
> +        movslq    %eax, %rax
> +        movslq    %ecx, %rcx
> +        movsd     (%rsi,%rax), %xmm2
> +        movhpd    (%rsi,%rcx), %xmm2
> +
> +/* reconstruction */
> +        addpd     %xmm11, %xmm2
> +        addpd     %xmm2, %xmm3
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx xmm0 xmm3
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movaps    %xmm3, %xmm0
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm0, 32(%rsp)
> +        movups    %xmm3, 48(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $2, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm3
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 xmm3
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      log10@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 48(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVbN2v_log10_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_dlog10_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(16)) VUINT32 Log_HA_table[(1<<9)+2][2];
> +        __declspec(align(16)) VUINT32 Log_LA_table[(1<<9)+1][2];
> +        __declspec(align(16)) VUINT32 poly_coeff[5][2][2];
> +        __declspec(align(16)) VUINT32 ExpMask[2][2];
> +        __declspec(align(16)) VUINT32 Two10[2][2];
> +        __declspec(align(16)) VUINT32 MinNorm[2][2];
> +        __declspec(align(16)) VUINT32 MaxNorm[2][2];
> +        __declspec(align(16)) VUINT32 HalfMask[2][2];
> +        __declspec(align(16)) VUINT32 One[2][2];
> +        __declspec(align(16)) VUINT32 Threshold[2][2];
> +        __declspec(align(16)) VUINT32 Bias[2][2];
> +        __declspec(align(16)) VUINT32 Bias1[2][2];
> +        __declspec(align(16)) VUINT32 L2[2][2];
> +} __svml_dlog10_data_internal;
> +#endif
> +__svml_dlog10_data_internal:
> +        /* Log_HA_table */
> +        .quad 0xc0733a7146f6b080, 0xbe1e707ce619c200
> +        .quad 0xc0733a7547771970, 0xbe1e79c6c06d6f51
> +        .quad 0xc0733a7945aacb70, 0xbe1e78e225fad29c
> +        .quad 0xc0733a7d41946970, 0xbe1e76d607f9693b
> +        .quad 0xc0733a813b3691f0, 0xbe1e7704b3e0685b
> +        .quad 0xc0733a853293df00, 0xbe1e79c1216a27fa
> +        .quad 0xc0733a8927aee660, 0xbe1e76dce5734a81
> +        .quad 0xc0733a8d1a8a3920, 0xbe1e782ee2ca4dba
> +        .quad 0xc0733a910b286430, 0xbe1e7812d1a0a61f
> +        .quad 0xc0733a94f98bf010, 0xbe1e77e1b5ecbc61
> +        .quad 0xc0733a98e5b76100, 0xbe1e76635cac1586
> +        .quad 0xc0733a9ccfad36f0, 0xbe1e7638f7968f32
> +        .quad 0xc0733aa0b76feda0, 0xbe1e7840ee76e365
> +        .quad 0xc0733aa49d01fcb0, 0xbe1e79f3fd01907e
> +        .quad 0xc0733aa88065d7a0, 0xbe1e77bbb3a9c38a
> +        .quad 0xc0733aac619dedb0, 0xbe1e7742719bf41d
> +        .quad 0xc0733ab040acaa20, 0xbe1e79bcedaf79cb
> +        .quad 0xc0733ab41d947450, 0xbe1e762d63cb7ca0
> +        .quad 0xc0733ab7f857af50, 0xbe1e77a07be83403
> +        .quad 0xc0733abbd0f8ba80, 0xbe1e7763ff836ad0
> +        .quad 0xc0733abfa779f130, 0xbe1e7737720ead39
> +        .quad 0xc0733ac37bddaad0, 0xbe1e7776a08e55e7
> +        .quad 0xc0733ac74e263af0, 0xbe1e793e3c52dd36
> +        .quad 0xc0733acb1e55f160, 0xbe1e788a94695051
> +        .quad 0xc0733aceec6f1a10, 0xbe1e76508114a813
> +        .quad 0xc0733ad2b873fd20, 0xbe1e76909457d23e
> +        .quad 0xc0733ad68266df10, 0xbe1e7664a24f9ca4
> +        .quad 0xc0733ada4a4a0090, 0xbe1e7a07b3d44b18
> +        .quad 0xc0733ade101f9ee0, 0xbe1e76d87594704d
> +        .quad 0xc0733ae1d3e9f340, 0xbe1e79563595a182
> +        .quad 0xc0733ae595ab33b0, 0xbe1e771880c3c6ab
> +        .quad 0xc0733ae955659250, 0xbe1e78c171f517d4
> +        .quad 0xc0733aed131b3df0, 0xbe1e77eac3874666
> +        .quad 0xc0733af0cece61b0, 0xbe1e790db479d8f6
> +        .quad 0xc0733af488812550, 0xbe1e7965d1aa5c90
> +        .quad 0xc0733af84035ad10, 0xbe1e78ceb398ba47
> +        .quad 0xc0733afbf5ee19c0, 0xbe1e779cc0dcb5aa
> +        .quad 0xc0733affa9ac88c0, 0xbe1e7871053953ed
> +        .quad 0xc0733b035b731420, 0xbe1e7a082cffa71a
> +        .quad 0xc0733b070b43d2a0, 0xbe1e7904b4382fad
> +        .quad 0xc0733b0ab920d790, 0xbe1e79b458d0b4f3
> +        .quad 0xc0733b0e650c3310, 0xbe1e79d0ded414c6
> +        .quad 0xc0733b120f07f200, 0xbe1e763c357a1943
> +        .quad 0xc0733b15b7161dd0, 0xbe1e78b80ba6daaa
> +        .quad 0xc0733b195d38bd00, 0xbe1e7998e23b8ffd
> +        .quad 0xc0733b1d0171d2c0, 0xbe1e7974aa65ee8c
> +        .quad 0xc0733b20a3c35f20, 0xbe1e76ccfde752ab
> +        .quad 0xc0733b24442f5ef0, 0xbe1e77b4ff19debb
> +        .quad 0xc0733b27e2b7cc10, 0xbe1e7772ee478542
> +        .quad 0xc0733b2b7f5e9d30, 0xbe1e781d81b58b44
> +        .quad 0xc0733b2f1a25c600, 0xbe1e78350d967565
> +        .quad 0xc0733b32b30f3720, 0xbe1e783888e48152
> +        .quad 0xc0733b364a1cde30, 0xbe1e78367bf7c111
> +        .quad 0xc0733b39df50a5d0, 0xbe1e7959e57ca47d
> +        .quad 0xc0733b3d72ac75c0, 0xbe1e777322423222
> +        .quad 0xc0733b41043232b0, 0xbe1e767ce42a60aa
> +        .quad 0xc0733b4493e3be70, 0xbe1e781d445aea19
> +        .quad 0xc0733b4821c2f800, 0xbe1e7922fca18e18
> +        .quad 0xc0733b4badd1bb80, 0xbe1e76fed3d40647
> +        .quad 0xc0733b4f3811e210, 0xbe1e793948c9eabc
> +        .quad 0xc0733b52c0854240, 0xbe1e76e487656b8c
> +        .quad 0xc0733b56472daf90, 0xbe1e780ab2f71223
> +        .quad 0xc0733b59cc0cfaf0, 0xbe1e77189120b09c
> +        .quad 0xc0733b5d4f24f270, 0xbe1e7644a0343a12
> +        .quad 0xc0733b60d0776160, 0xbe1e78f2a3e4733d
> +        .quad 0xc0733b6450061080, 0xbe1e7913b2f73ae5
> +        .quad 0xc0733b67cdd2c5c0, 0xbe1e7882d08393b5
> +        .quad 0xc0733b6b49df4470, 0xbe1e765e1b209979
> +        .quad 0xc0733b6ec42d4d20, 0xbe1e785c9c4620d4
> +        .quad 0xc0733b75b394f240, 0xbe1e78878cd0e956
> +        .quad 0xc0733b7c9c178630, 0xbe1e789a4112d90b
> +        .quad 0xc0733b837dc2b0f0, 0xbe1e79050b8a1766
> +        .quad 0xc0733b8a58a3f220, 0xbe1e7790dffc47aa
> +        .quad 0xc0733b912cc8a180, 0xbe1e77174593b06a
> +        .quad 0xc0733b97fa3defb0, 0xbe1e7677de2d2ecc
> +        .quad 0xc0733b9ec110e6b0, 0xbe1e76cff477ca18
> +        .quad 0xc0733ba5814e6a80, 0xbe1e78f8644dec7b
> +        .quad 0xc0733bac3b0339d0, 0xbe1e764e1361788d
> +        .quad 0xc0733bb2ee3bee30, 0xbe1e78c913e738de
> +        .quad 0xc0733bb99b04fd30, 0xbe1e76666f5bddaa
> +        .quad 0xc0733bc0416ab850, 0xbe1e77e87cbd8ab6
> +        .quad 0xc0733bc6e1794e10, 0xbe1e76f18ba1c966
> +        .quad 0xc0733bcd7b3cca10, 0xbe1e777c9461b8db
> +        .quad 0xc0733bd40ec115d0, 0xbe1e78b78526ffac
> +        .quad 0xc0733bda9c11f920, 0xbe1e7942abecfede
> +        .quad 0xc0733be1233b1aa0, 0xbe1e76d8a684fd8c
> +        .quad 0xc0733be7a4480010, 0xbe1e79622b539ac9
> +        .quad 0xc0733bee1f440f30, 0xbe1e7978e7cc20ea
> +        .quad 0xc0733bf4943a8de0, 0xbe1e765c9c9de825
> +        .quad 0xc0733bfb0336a290, 0xbe1e775d8b138ee2
> +        .quad 0xc0733c016c435500, 0xbe1e78bf33465c2f
> +        .quad 0xc0733c07cf6b8e80, 0xbe1e78164f7cc441
> +        .quad 0xc0733c0e2cba1a50, 0xbe1e7824e64d0b23
> +        .quad 0xc0733c148439a630, 0xbe1e78373ae7dd81
> +        .quad 0xc0733c1ad5f4c2c0, 0xbe1e7704513e0afe
> +        .quad 0xc0733c2121f5e3d0, 0xbe1e7914aa84200f
> +        .quad 0xc0733c2768476110, 0xbe1e76b1cde25cf6
> +        .quad 0xc0733c2da8f37600, 0xbe1e796120e3862d
> +        .quad 0xc0733c33e40442e0, 0xbe1e78ec836d7e7b
> +        .quad 0xc0733c3a1983cca0, 0xbe1e77fb13b7dabb
> +        .quad 0xc0733c40497bfd70, 0xbe1e783c6fcb2404
> +        .quad 0xc0733c4673f6a530, 0xbe1e7628bb93dce8
> +        .quad 0xc0733c4c98fd7990, 0xbe1e7857a47b5001
> +        .quad 0xc0733c52b89a16d0, 0xbe1e76708dc2831f
> +        .quad 0xc0733c58d2d5ffa0, 0xbe1e77b6038651f1
> +        .quad 0xc0733c5ee7ba9de0, 0xbe1e792e855bb5b2
> +        .quad 0xc0733c64f75142d0, 0xbe1e776cacd5c105
> +        .quad 0xc0733c6b01a32740, 0xbe1e77f8a8011315
> +        .quad 0xc0733c7106b96c30, 0xbe1e765cf3efcfde
> +        .quad 0xc0733c77069d1ad0, 0xbe1e78d837d2efac
> +        .quad 0xc0733c7d01572530, 0xbe1e78b615cf772c
> +        .quad 0xc0733c82f6f06640, 0xbe1e7650bbbd7a25
> +        .quad 0xc0733c88e771a220, 0xbe1e78bcf3495872
> +        .quad 0xc0733c8ed2e386c0, 0xbe1e792266832e84
> +        .quad 0xc0733c94b94eabd0, 0xbe1e79c1c3c2ca52
> +        .quad 0xc0733c9a9abb9340, 0xbe1e78aa61e5807d
> +        .quad 0xc0733ca07732a970, 0xbe1e7620fc4cf156
> +        .quad 0xc0733ca64ebc4570, 0xbe1e76b914a832c5
> +        .quad 0xc0733cac2160a970, 0xbe1e79227f72020e
> +        .quad 0xc0733cb1ef280300, 0xbe1e77ac972cc008
> +        .quad 0xc0733cb7b81a6b10, 0xbe1e798089be41f4
> +        .quad 0xc0733cbd7c3fe6a0, 0xbe1e77942ae037fe
> +        .quad 0xc0733cc33ba06690, 0xbe1e7956ae6463d9
> +        .quad 0xc0733cc8f643c850, 0xbe1e7918a50c7942
> +        .quad 0xc0733cceac31d5d0, 0xbe1e78308eeab604
> +        .quad 0xc0733cd45d7245e0, 0xbe1e76dd4ea88445
> +        .quad 0xc0733cda0a0cbc60, 0xbe1e77e7c1aa5909
> +        .quad 0xc0733cdfb208caa0, 0xbe1e7804b9d20e54
> +        .quad 0xc0733ce5556def70, 0xbe1e78f88e99d49c
> +        .quad 0xc0733ceaf4439780, 0xbe1e787d74682d68
> +        .quad 0xc0733cf08e911d80, 0xbe1e76edc24fe6e7
> +        .quad 0xc0733cf6245dca50, 0xbe1e79b347ec86d2
> +        .quad 0xc0733cfbb5b0d580, 0xbe1e797cceb2c39b
> +        .quad 0xc0733d0142916530, 0xbe1e783adbdc6aa1
> +        .quad 0xc0733d06cb068e70, 0xbe1e76e4c20e3d9e
> +        .quad 0xc0733d0c4f175570, 0xbe1e77070bf3cf61
> +        .quad 0xc0733d11cecaadc0, 0xbe1e781c43502734
> +        .quad 0xc0733d174a277a80, 0xbe1e78b11268ea72
> +        .quad 0xc0733d1cc1348e90, 0xbe1e7754b83bfc7d
> +        .quad 0xc0733d2233f8acb0, 0xbe1e7756c29bf5e9
> +        .quad 0xc0733d27a27a87d0, 0xbe1e7952fc1d9333
> +        .quad 0xc0733d2d0cc0c350, 0xbe1e778c76ae6077
> +        .quad 0xc0733d3272d1f2e0, 0xbe1e7a1896ba8f43
> +        .quad 0xc0733d37d4b49b30, 0xbe1e76dafdf432d8
> +        .quad 0xc0733d3d326f3180, 0xbe1e795330184013
> +        .quad 0xc0733d428c081c80, 0xbe1e763cc774d30f
> +        .quad 0xc0733d47e185b3d0, 0xbe1e77030a779c0a
> +        .quad 0xc0733d4d32ee40b0, 0xbe1e7908af2a2d7e
> +        .quad 0xc0733d528047fe00, 0xbe1e78c4953b797d
> +        .quad 0xc0733d57c9991850, 0xbe1e78b43b096579
> +        .quad 0xc0733d5d0ee7ae30, 0xbe1e7824ae0a4804
> +        .quad 0xc0733d625039d040, 0xbe1e79d2b2fbb740
> +        .quad 0xc0733d678d958190, 0xbe1e7662de59a1a6
> +        .quad 0xc0733d6cc700b760, 0xbe1e76b251d59aaa
> +        .quad 0xc0733d71fc8159b0, 0xbe1e7a00cfd1f487
> +        .quad 0xc0733d772e1d4360, 0xbe1e77f4d246167e
> +        .quad 0xc0733d7c5bda4200, 0xbe1e767a4ee8e6fc
> +        .quad 0xc0733d8185be1640, 0xbe1e777ccf0a8aed
> +        .quad 0xc0733d86abce7420, 0xbe1e767d7e279ada
> +        .quad 0xc0733d8bce1102d0, 0xbe1e7a05cef4bb90
> +        .quad 0xc0733d90ec8b5d40, 0xbe1e78f75369be5b
> +        .quad 0xc0733d96074311d0, 0xbe1e77b9612e8c8a
> +        .quad 0xc0733d9b1e3da2b0, 0xbe1e794518b9adeb
> +        .quad 0xc0733da031808620, 0xbe1e7810626fb934
> +        .quad 0xc0733da541112650, 0xbe1e76d87223fa6d
> +        .quad 0xc0733daa4cf4e1a0, 0xbe1e794c5e7ca3b5
> +        .quad 0xc0733daf55310af0, 0xbe1e789856ef816f
> +        .quad 0xc0733db459cae970, 0xbe1e77d2004effbd
> +        .quad 0xc0733db95ac7b8f0, 0xbe1e78467d31eb9c
> +        .quad 0xc0733dbe582caa00, 0xbe1e79aaa4e25787
> +        .quad 0xc0733dc351fee220, 0xbe1e762de8f107bf
> +        .quad 0xc0733dc848437b90, 0xbe1e7670670a63fe
> +        .quad 0xc0733dcd3aff85d0, 0xbe1e795ca237c6cc
> +        .quad 0xc0733dd22a3805b0, 0xbe1e77e55c53c1d9
> +        .quad 0xc0733dd715f1f520, 0xbe1e78a806213ac4
> +        .quad 0xc0733ddbfe3243b0, 0xbe1e77743a2bc615
> +        .quad 0xc0733de0e2fdd660, 0xbe1e78b8b45b0b7d
> +        .quad 0xc0733de5c4598800, 0xbe1e78d635f2f4b9
> +        .quad 0xc0733deaa24a2920, 0xbe1e7758c396a11e
> +        .quad 0xc0733def7cd48020, 0xbe1e7a17a8cc454c
> +        .quad 0xc0733df453fd49a0, 0xbe1e783caa73f616
> +        .quad 0xc0733df927c93820, 0xbe1e7932cfa29664
> +        .quad 0xc0733dfdf83cf490, 0xbe1e777d265c72a6
> +        .quad 0xc0733e02c55d1e10, 0xbe1e7775e7c03c60
> +        .quad 0xc0733e078f2e4a40, 0xbe1e79f65d52d232
> +        .quad 0xc0733e0c55b50570, 0xbe1e76e7e7464b4e
> +        .quad 0xc0733e1118f5d250, 0xbe1e77be81cad877
> +        .quad 0xc0733e15d8f52a80, 0xbe1e79dd25b5fb3a
> +        .quad 0xc0733e1a95b77e80, 0xbe1e78e45f1418ef
> +        .quad 0xc0733e1f4f4135a0, 0xbe1e78eb7289505b
> +        .quad 0xc0733e240596ae50, 0xbe1e78a468c07cad
> +        .quad 0xc0733e28b8bc3e20, 0xbe1e776b558a4009
> +        .quad 0xc0733e2d68b631d0, 0xbe1e77412eb9941e
> +        .quad 0xc0733e321588cd80, 0xbe1e76b2853f845e
> +        .quad 0xc0733e36bf384cb0, 0xbe1e76aa7184273c
> +        .quad 0xc0733e3b65c8e260, 0xbe1e7832027f78fa
> +        .quad 0xc0733e40093eb930, 0xbe1e7a1c7da131f5
> +        .quad 0xc0733e44a99df380, 0xbe1e76a0bc2ae4bc
> +        .quad 0xc0733e4946eaab30, 0xbe1e78dff13b6f5d
> +        .quad 0xc0733e4de128f250, 0xbe1e765a226dea2c
> +        .quad 0xc0733e52785cd290, 0xbe1e78509b989111
> +        .quad 0xc0733e570c8a4de0, 0xbe1e7916a4e9803d
> +        .quad 0xc0733e5b9db55e30, 0xbe1e7950c15758cc
> +        .quad 0xc0733e602be1f5a0, 0xbe1e7922ba1ad420
> +        .quad 0xc0733e64b713fe90, 0xbe1e794cbaabcef6
> +        .quad 0xc0733e693f4f5bc0, 0xbe1e7837bf883fed
> +        .quad 0xc0733e6dc497e850, 0xbe1e76f198ddbbdf
> +        .quad 0xc0733e7246f177d0, 0xbe1e7a18c1067764
> +        .quad 0xc0733e76c65fd6a0, 0xbe1e76b845a8fd9d
> +        .quad 0xc0733e7b42e6c970, 0xbe1e7714012df506
> +        .quad 0xc0733e7fbc8a0de0, 0xbe1e7765612922cd
> +        .quad 0xc0733e84334d5a50, 0xbe1e7688f5424a00
> +        .quad 0xc0733e88a7345df0, 0xbe1e769d011f6663
> +        .quad 0xc0733e8d1842c0e0, 0xbe1e79914acbfaf7
> +        .quad 0xc0733e91867c2460, 0xbe1e79a85e189bd7
> +        .quad 0xc0733e95f1e422a0, 0xbe1e79ea7c726432
> +        .quad 0xc0733e9a5a7e4f10, 0xbe1e768a6fbb8e6e
> +        .quad 0xc0733e9ec04e3620, 0xbe1e793c75bcc9fc
> +        .quad 0xc0733ea323575dd0, 0xbe1e797f78da13d4
> +        .quad 0xc0733ea7839d4550, 0xbe1e78d8c9cda978
> +        .quad 0xc0733eabe1236540, 0xbe1e77028d480fff
> +        .quad 0xc0733eb03bed2fa0, 0xbe1e7a0d0f74ff7c
> +        .quad 0xc0733eb493fe1040, 0xbe1e76732e8a35fb
> +        .quad 0xc0733eb8e9596c30, 0xbe1e77220caeabeb
> +        .quad 0xc0733ebd3c02a260, 0xbe1e797438b645ef
> +        .quad 0xc0733ec18bfd0b80, 0xbe1e79207c5fd6e8
> +        .quad 0xc0733ec5d94bf9f0, 0xbe1e781c7df8f946
> +        .quad 0xc0733eca23f2b9f0, 0xbe1e76736284e2db
> +        .quad 0xc0733ece6bf49190, 0xbe1e7a109cc0c3f5
> +        .quad 0xc0733ed2b154c120, 0xbe1e767f14a16d50
> +        .quad 0xc0733ed6f4168290, 0xbe1e789cd22acaf0
> +        .quad 0xc0733edb343d0a40, 0xbe1e764355ca28ad
> +        .quad 0xc0733edf71cb8660, 0xbe1e79e4c7a81c45
> +        .quad 0xc0733ee3acc51fb0, 0xbe1e761e26b644c2
> +        .quad 0xc0733ee7e52cf8c0, 0xbe1e793e9f8fbdd3
> +        .quad 0xc0733eec1b062ed0, 0xbe1e78c432991c20
> +        .quad 0xc0733ef04e53d940, 0xbe1e78cdd025f4d8
> +        .quad 0xc0733ef47f1909f0, 0xbe1e778310c6446e
> +        .quad 0xc0733ef8ad58cd20, 0xbe1e7871af3d6e17
> +        .quad 0xc0733efcd91629b0, 0xbe1e77e0e906f697
> +        .quad 0xc0733f01025420f0, 0xbe1e7a1ae9b27892
> +        .quad 0xc0733f052915af00, 0xbe1e76ac64c88f9d
> +        .quad 0xc0733f094d5dca60, 0xbe1e779a815589c4
> +        .quad 0xc0733f0d6f2f6480, 0xbe1e788f39a4864c
> +        .quad 0xc0733f118e8d6980, 0xbe1e79fc51263525
> +        .quad 0xc0733f15ab7ac060, 0xbe1e783501f19e90
> +        .quad 0xc0733f19c5fa4ae0, 0xbe1e767e82c327ab
> +        .quad 0xc0733f1dde0ee5a0, 0xbe1e7a1785d66123
> +        .quad 0xc0733f21f3bb6870, 0xbe1e7936d07203da
> +        .quad 0xc0733f260702a5e0, 0xbe1e7a010a7ac699
> +        .quad 0xc0733f2a17e76bb0, 0xbe1e7975e4e16312
> +        .quad 0xc0733f2e266c82b0, 0xbe1e7654b5422330
> +        .quad 0xc0733f323294aeb0, 0xbe1e77f8a4909d35
> +        .quad 0xc0733f363c62aee0, 0xbe1e792c8e30d226
> +        .quad 0xc0733f3a43d93da0, 0xbe1e76f6ac67a1ff
> +        .quad 0xc0733f3e48fb1070, 0xbe1e775c2e97715a
> +        .quad 0xc0733f424bcad840, 0xbe1e781cd54ae100
> +        /*== Log_LA_table ==*/
> +        .align 16
> +        .quad 0x0000000000000000
> +        .quad 0xbf4bc48a867884b7
> +        .quad 0xbf5bbd9e9482af09
> +        .quad 0xbf64c9096b94befd
> +        .quad 0xbf6bafd47221ed26
> +        .quad 0xbf714999e2ad8ea6
> +        .quad 0xbf74b99563d2a1bd
> +        .quad 0xbf7827de6b310350
> +        .quad 0xbf7b9476a4fcd10f
> +        .quad 0xbf7eff5fbaf25781
> +        .quad 0xbf81344daa2d7553
> +        .quad 0xbf82e8158b08d957
> +        .quad 0xbf849b0851443684
> +        .quad 0xbf864d26cce610dd
> +        .quad 0xbf87fe71ccc4e6b0
> +        .quad 0xbf89aeea1e897fdf
> +        .quad 0xbf8b5e908eb13790
> +        .quad 0xbf8d0d65e890405a
> +        .quad 0xbf8ebb6af653e2ee
> +        .quad 0xbf90345040825bad
> +        .quad 0xbf910a83a8446c78
> +        .quad 0xbf91e05015d30a71
> +        .quad 0xbf92b5b5ec0209d3
> +        .quad 0xbf938ab58d173e91
> +        .quad 0xbf945f4f5acb8be0
> +        .quad 0xbf953383b64bf13f
> +        .quad 0xbf960753003a94ef
> +        .quad 0xbf96dabd98afcc05
> +        .quad 0xbf97adc3df3b1ff8
> +        .quad 0xbf98806632e451d0
> +        .quad 0xbf9952a4f22c5ae9
> +        .quad 0xbf9a24807b0e6b5c
> +        .quad 0xbf9af5f92b00e610
> +        .quad 0xbf9bc70f5ef65a77
> +        .quad 0xbf9c97c3735e7c0a
> +        .quad 0xbf9d6815c4271775
> +        .quad 0xbf9e3806acbd058f
> +        .quad 0xbf9f0796880d1c19
> +        .quad 0xbf9fd6c5b0851c4c
> +        .quad 0xbfa052ca400a4f9b
> +        .quad 0xbfa0ba01a8170000
> +        .quad 0xbfa121093ce3a205
> +        .quad 0xbfa187e12aad8077
> +        .quad 0xbfa1ee899d74a03e
> +        .quad 0xbfa25502c0fc314c
> +        .quad 0xbfa2bb4cc0cafe8d
> +        .quad 0xbfa32167c82bdcda
> +        .quad 0xbfa38754022e18e2
> +        .quad 0xbfa3ed1199a5e425
> +        .quad 0xbfa452a0b92cc0ec
> +        .quad 0xbfa4b8018b21ed4f
> +        .quad 0xbfa51d3439aacd4a
> +        .quad 0xbfa58238eeb353da
> +        .quad 0xbfa5e70fd3ee6b34
> +        .quad 0xbfa64bb912d65c07
> +        .quad 0xbfa6b034d4ad33df
> +        .quad 0xbfa71483427d2a99
> +        .quad 0xbfa778a4851906f3
> +        .quad 0xbfa7dc98c51c8242
> +        .quad 0xbfa840602aecab3d
> +        .quad 0xbfa8a3fadeb847f4
> +        .quad 0xbfa90769087836e4
> +        .quad 0xbfa96aaacfefcf3c
> +        .quad 0xbfa9cdc05cad4042
> +        .quad 0xbfaa30a9d609efea
> +        .quad 0xbfaa9367632ad897
> +        .quad 0xbfaaf5f92b00e610
> +        .quad 0xbfab585f544951a4
> +        .quad 0xbfabba9a058dfd84
> +        .quad 0xbfac1ca96525cf56
> +        .quad 0xbfac7e8d993509f9
> +        .quad 0xbface046c7ada68d
> +        .quad 0xbfad41d5164facb4
> +        .quad 0xbfada338aaa98a0c
> +        .quad 0xbfae0471aa1868f5
> +        .quad 0xbfae658039c88690
> +        .quad 0xbfaec6647eb58808
> +        .quad 0xbfaf271e9daacf20
> +        .quad 0xbfaf87aebb43ce06
> +        .quad 0xbfafe814fbec5a77
> +        .quad 0xbfb02428c1f08016
> +        .quad 0xbfb054323b97a948
> +        .quad 0xbfb08426fcdb1ee7
> +        .quad 0xbfb0b40717932b96
> +        .quad 0xbfb0e3d29d81165e
> +        .quad 0xbfb11389a04f4a2e
> +        .quad 0xbfb1432c31917d08
> +        .quad 0xbfb172ba62c4d6de
> +        .quad 0xbfb1a23445501816
> +        .quad 0xbfb1d199ea83bfbe
> +        .quad 0xbfb200eb639a3173
> +        .quad 0xbfb23028c1b7daed
> +        .quad 0xbfb25f5215eb594a
> +        .quad 0xbfb28e67712d9dfc
> +        .quad 0xbfb2bd68e4621371
> +        .quad 0xbfb2ec568056c16f
> +        .quad 0xbfb31b3055c47118
> +        .quad 0xbfb349f6754ed0b4
> +        .quad 0xbfb378a8ef84971e
> +        .quad 0xbfb3a747d4dfa6f5
> +        .quad 0xbfb3d5d335c53179
> +        .quad 0xbfb4044b2285d925
> +        .quad 0xbfb432afab5dd3ff
> +        .quad 0xbfb46100e0750da1
> +        .quad 0xbfb48f3ed1df48fb
> +        .quad 0xbfb4bd698f9c41cf
> +        .quad 0xbfb4eb812997cde4
> +        .quad 0xbfb51985afa9fdfd
> +        .quad 0xbfb5477731973e85
> +        .quad 0xbfb57555bf1077f5
> +        .quad 0xbfb5a32167b32f02
> +        .quad 0xbfb5d0da3b09a47e
> +        .quad 0xbfb5fe80488af4fd
> +        .quad 0xbfb62c139f9b3837
> +        .quad 0xbfb659944f8ba02d
> +        .quad 0xbfb68702679a980a
> +        .quad 0xbfb6b45df6f3e2c9
> +        .quad 0xbfb6e1a70cb0b99a
> +        .quad 0xbfb70eddb7d7ea07
> +        .quad 0xbfb73c02075df3e5
> +        .quad 0xbfb769140a2526fd
> +        .quad 0xbfb79613cefdc07d
> +        .quad 0xbfb7c30164a60836
> +        .quad 0xbfb7efdcd9ca6d8f
> +        .quad 0xbfb81ca63d05a44a
> +        .quad 0xbfb8495d9ce0c10c
> +        .quad 0xbfb8760307d355ab
> +        .quad 0xbfb8a2968c438d41
> +        .quad 0xbfb8cf183886480d
> +        .quad 0xbfb8fb881adf3713
> +        .quad 0xbfb927e64180f790
> +        .quad 0xbfb95432ba8d2e2f
> +        .quad 0xbfb9806d9414a209
> +        .quad 0xbfb9ac96dc175776
> +        .quad 0xbfb9d8aea084aa9c
> +        .quad 0xbfba04b4ef3b69d8
> +        .quad 0xbfba30a9d609efea
> +        .quad 0xbfba5c8d62ae3dec
> +        .quad 0xbfba885fa2d6151e
> +        .quad 0xbfbab420a41f1076
> +        .quad 0xbfbadfd07416be07
> +        .quad 0xbfbb0b6f203ab82c
> +        .quad 0xbfbb36fcb5f8be8a
> +        .quad 0xbfbb627942aecedd
> +        .quad 0xbfbb8de4d3ab3d98
> +        .quad 0xbfbbb93f762cce4f
> +        .quad 0xbfbbe4893762cbf7
> +        .quad 0xbfbc0fc2246d20f5
> +        .quad 0xbfbc3aea4a5c6eff
> +        .quad 0xbfbc6601b63226cb
> +        .quad 0xbfbc910874e09f98
> +        .quad 0xbfbcbbfe934b2e81
> +        .quad 0xbfbce6e41e463da5
> +        .quad 0xbfbd11b92297632b
> +        .quad 0xbfbd3c7dacf5780b
> +        .quad 0xbfbd6731ca08aeb9
> +        .quad 0xbfbd91d5866aa99c
> +        .quad 0xbfbdbc68eea6915b
> +        .quad 0xbfbde6ec0f392b05
> +        .quad 0xbfbe115ef490ee07
> +        .quad 0xbfbe3bc1ab0e19fe
> +        .quad 0xbfbe66143f02cc5d
> +        .quad 0xbfbe9056bcb315e8
> +        .quad 0xbfbeba893055100b
> +        .quad 0xbfbee4aba610f204
> +        .quad 0xbfbf0ebe2a0125eb
> +        .quad 0xbfbf38c0c8325d86
> +        .quad 0xbfbf62b38ca3a706
> +        .quad 0xbfbf8c9683468191
> +        .quad 0xbfbfb669b7fef1a8
> +        .quad 0xbfbfe02d36a3956d
> +        .quad 0xbfc004f0857edc5c
> +        .quad 0xbfc019c2a064b486
> +        .quad 0xbfc02e8cf1dac4b8
> +        .quad 0xbfc0434f7fb1f307
> +        .quad 0xbfc0580a4fb4a3df
> +        .quad 0xbfc06cbd67a6c3b6
> +        .quad 0xbfc08168cd45d0a9
> +        .quad 0xbfc0960c8648e406
> +        .quad 0xbfc0aaa89860bbcf
> +        .quad 0xbfc0bf3d0937c41c
> +        .quad 0xbfc0d3c9de722078
> +        .quad 0xbfc0e84f1dadb526
> +        .quad 0xbfc0fccccc823059
> +        .quad 0xbfc11142f0811357
> +        .quad 0xbfc125b18f35bb8e
> +        .quad 0xbfc13a18ae256b99
> +        .quad 0xbfc14e7852cf5430
> +        .quad 0xbfc162d082ac9d10
> +        .quad 0xbfc1772143306dc6
> +        .quad 0xbfc18b6a99c7f679
> +        .quad 0xbfc19fac8bda7897
> +        .quad 0xbfc1b3e71ec94f7b
> +        .quad 0xbfc1c81a57eff8fd
> +        .quad 0xbfc1dc463ca41df8
> +        .quad 0xbfc1f06ad2359abd
> +        .quad 0xbfc204881dee8777
> +        .quad 0xbfc2189e25134081
> +        .quad 0xbfc22cacece26ead
> +        .quad 0xbfc240b47a950f79
> +        .quad 0xbfc254b4d35e7d3c
> +        .quad 0xbfc268adfc6c773e
> +        .quad 0xbfc27c9ffae729c1
> +        .quad 0xbfc2908ad3f13603
> +        .quad 0xbfc2a46e8ca7ba2a
> +        .quad 0xbfc2b84b2a225923
> +        .quad 0xbfc2cc20b1734279
> +        .quad 0xbfc2dfef27a73a18
> +        .quad 0xbfc2f3b691c5a001
> +        .quad 0xbfc30776f4d077f7
> +        .quad 0xbfc31b3055c47118
> +        .quad 0xbfc32ee2b998ed6e
> +        .quad 0xbfc3428e2540096d
> +        .quad 0x3fc331f403985097
> +        .quad 0x3fc31e56798a910a
> +        .quad 0x3fc30abfd8f333b6
> +        .quad 0x3fc2f7301cf4e87b
> +        .quad 0x3fc2e3a740b7800f
> +        .quad 0x3fc2d0253f67e4cb
> +        .quad 0x3fc2bcaa14381386
> +        .quad 0x3fc2a935ba5f1479
> +        .quad 0x3fc295c82d18f434
> +        .quad 0x3fc2826167a6bc9c
> +        .quad 0x3fc26f01654e6df6
> +        .quad 0x3fc25ba8215af7fc
> +        .quad 0x3fc24855971c3307
> +        .quad 0x3fc23509c1e6d937
> +        .quad 0x3fc221c49d147fb3
> +        .quad 0x3fc20e8624038fed
> +        .quad 0x3fc1fb4e521740f4
> +        .quad 0x3fc1e81d22b790d4
> +        .quad 0x3fc1d4f291513e01
> +        .quad 0x3fc1c1ce9955c0c6
> +        .quad 0x3fc1aeb1363b44c8
> +        .quad 0x3fc19b9a637ca295
> +        .quad 0x3fc1888a1c995931
> +        .quad 0x3fc175805d1587c1
> +        .quad 0x3fc1627d2079e731
> +        .quad 0x3fc14f806253c3ed
> +        .quad 0x3fc13c8a1e34f7a0
> +        .quad 0x3fc1299a4fb3e306
> +        .quad 0x3fc116b0f26b67bb
> +        .quad 0x3fc103ce01fae223
> +        .quad 0x3fc0f0f17a062353
> +        .quad 0x3fc0de1b56356b04
> +        .quad 0x3fc0cb4b9235619a
> +        .quad 0x3fc0b88229b71227
> +        .quad 0x3fc0a5bf186fe483
> +        .quad 0x3fc093025a19976c
> +        .quad 0x3fc0804bea723aa9
> +        .quad 0x3fc06d9bc53c2941
> +        .quad 0x3fc05af1e63e03b4
> +        .quad 0x3fc0484e4942aa43
> +        .quad 0x3fc035b0ea19373b
> +        .quad 0x3fc02319c494f951
> +        .quad 0x3fc01088d48d6e03
> +        .quad 0x3fbffbfc2bbc7803
> +        .quad 0x3fbfd6f308ce5b52
> +        .quad 0x3fbfb1f6381856f4
> +        .quad 0x3fbf8d05b16a6d47
> +        .quad 0x3fbf68216c9cc727
> +        .quad 0x3fbf4349618fa91a
> +        .quad 0x3fbf1e7d882b689a
> +        .quad 0x3fbef9bdd860616b
> +        .quad 0x3fbed50a4a26eafc
> +        .quad 0x3fbeb062d57f4de8
> +        .quad 0x3fbe8bc77271b97a
> +        .quad 0x3fbe6738190e394c
> +        .quad 0x3fbe42b4c16caaf3
> +        .quad 0x3fbe1e3d63acb3ba
> +        .quad 0x3fbdf9d1f7f5b674
> +        .quad 0x3fbdd5727676c959
> +        .quad 0x3fbdb11ed766abf4
> +        .quad 0x3fbd8cd71303bd26
> +        .quad 0x3fbd689b2193f133
> +        .quad 0x3fbd446afb64c7e5
> +        .quad 0x3fbd204698cb42bd
> +        .quad 0x3fbcfc2df223db2d
> +        .quad 0x3fbcd820ffd278f3
> +        .quad 0x3fbcb41fba42686d
> +        .quad 0x3fbc902a19e65111
> +        .quad 0x3fbc6c4017382bea
> +        .quad 0x3fbc4861aab93a23
> +        .quad 0x3fbc248eccf1fba6
> +        .quad 0x3fbc00c7767225cb
> +        .quad 0x3fbbdd0b9fd09a10
> +        .quad 0x3fbbb95b41ab5ce6
> +        .quad 0x3fbb95b654a78c87
> +        .quad 0x3fbb721cd17157e3
> +        .quad 0x3fbb4e8eb0bbf58f
> +        .quad 0x3fbb2b0beb419ad0
> +        .quad 0x3fbb079479c372ad
> +        .quad 0x3fbae4285509950b
> +        .quad 0x3fbac0c775e2fde6
> +        .quad 0x3fba9d71d5258484
> +        .quad 0x3fba7a276badd2c8
> +        .quad 0x3fba56e8325f5c87
> +        .quad 0x3fba33b4222456f1
> +        .quad 0x3fba108b33edb005
> +        .quad 0x3fb9ed6d60b30612
> +        .quad 0x3fb9ca5aa1729f45
> +        .quad 0x3fb9a752ef316149
> +        .quad 0x3fb9845642fac8f0
> +        .quad 0x3fb9616495e0e1e8
> +        .quad 0x3fb93e7de0fc3e80
> +        .quad 0x3fb91ba21d6bef77
> +        .quad 0x3fb8f8d144557bdf
> +        .quad 0x3fb8d60b4ee4d901
> +        .quad 0x3fb8b350364c6257
> +        .quad 0x3fb8909ff3c4d191
> +        .quad 0x3fb86dfa808d36a0
> +        .quad 0x3fb84b5fd5eaefd8
> +        .quad 0x3fb828cfed29a215
> +        .quad 0x3fb8064abf9b30f1
> +        .quad 0x3fb7e3d04697b704
> +        .quad 0x3fb7c1607b7d7e32
> +        .quad 0x3fb79efb57b0f803
> +        .quad 0x3fb77ca0d49cb608
> +        .quad 0x3fb75a50ebb1624a
> +        .quad 0x3fb7380b9665b7c8
> +        .quad 0x3fb715d0ce367afc
> +        .quad 0x3fb6f3a08ca67270
> +        .quad 0x3fb6d17acb3e5f5e
> +        .quad 0x3fb6af5f838cf654
> +        .quad 0x3fb68d4eaf26d7ee
> +        .quad 0x3fb66b4847a68997
> +        .quad 0x3fb6494c46ac6e4d
> +        .quad 0x3fb6275aa5debf81
> +        .quad 0x3fb605735ee985f1
> +        .quad 0x3fb5e3966b7e9295
> +        .quad 0x3fb5c1c3c5557799
> +        .quad 0x3fb59ffb662b815c
> +        .quad 0x3fb57e3d47c3af7b
> +        .quad 0x3fb55c8963e6adeb
> +        .quad 0x3fb53adfb462ce16
> +        .quad 0x3fb51940330c000b
> +        .quad 0x3fb4f7aad9bbcbaf
> +        .quad 0x3fb4d61fa2514a00
> +        .quad 0x3fb4b49e86b11e5f
> +        .quad 0x3fb4932780c56fe2
> +        .quad 0x3fb471ba8a7de2b7
> +        .quad 0x3fb450579dcf9186
> +        .quad 0x3fb42efeb4b506e9
> +        .quad 0x3fb40dafc92e36e2
> +        .quad 0x3fb3ec6ad5407868
> +        .quad 0x3fb3cb2fd2f67ef1
> +        .quad 0x3fb3a9febc60540a
> +        .quad 0x3fb388d78b9350ff
> +        .quad 0x3fb367ba3aaa1883
> +        .quad 0x3fb346a6c3c49066
> +        .quad 0x3fb3259d2107db54
> +        .quad 0x3fb3049d4c9e52a0
> +        .quad 0x3fb2e3a740b7800f
> +        .quad 0x3fb2c2baf78817b7
> +        .quad 0x3fb2a1d86b49f1e2
> +        .quad 0x3fb280ff963c04fc
> +        .quad 0x3fb2603072a25f82
> +        .quad 0x3fb23f6afac6220a
> +        .quad 0x3fb21eaf28f57941
> +        .quad 0x3fb1fdfcf7839804
> +        .quad 0x3fb1dd5460c8b16f
> +        .quad 0x3fb1bcb55f21f307
> +        .quad 0x3fb19c1fecf17ee0
> +        .quad 0x3fb17b94049e65d0
> +        .quad 0x3fb15b11a094a1aa
> +        .quad 0x3fb13a98bb450f81
> +        .quad 0x3fb11a294f2569f6
> +        .quad 0x3fb0f9c356b04389
> +        .quad 0x3fb0d966cc6500fa
> +        .quad 0x3fb0b913aac7d3a7
> +        .quad 0x3fb098c9ec61b3ff
> +        .quad 0x3fb078898bc05bf4
> +        .quad 0x3fb0585283764178
> +        .quad 0x3fb03824ce1a9101
> +        .quad 0x3fb0180066492817
> +        .quad 0x3fafefca8d451fd6
> +        .quad 0x3fafafa6d397efdb
> +        .quad 0x3faf6f9594de60f0
> +        .quad 0x3faf2f96c6754aee
> +        .quad 0x3faeefaa5dc2b239
> +        .quad 0x3faeafd05035bd3b
> +        .quad 0x3fae70089346a9e6
> +        .quad 0x3fae30531c76c34a
> +        .quad 0x3fadf0afe1505738
> +        .quad 0x3fadb11ed766abf4
> +        .quad 0x3fad719ff455f5f7
> +        .quad 0x3fad32332dc34dbd
> +        .quad 0x3facf2d8795ca5a5
> +        .quad 0x3facb38fccd8bfdb
> +        .quad 0x3fac74591df72456
> +        .quad 0x3fac3534628016dd
> +        .quad 0x3fabf62190448d22
> +        .quad 0x3fabb7209d1e24e5
> +        .quad 0x3fab78317eef1a29
> +        .quad 0x3fab39542ba23d73
> +        .quad 0x3faafa88992aea19
> +        .quad 0x3faabbcebd84fca0
> +        .quad 0x3faa7d268eb4c924
> +        .quad 0x3faa3e9002c711d2
> +        .quad 0x3faa000b0fd0fd6b
> +        .quad 0x3fa9c197abf00dd7
> +        .quad 0x3fa98335cd4a16c3
> +        .quad 0x3fa944e56a0d3450
> +        .quad 0x3fa906a6786fc1cb
> +        .quad 0x3fa8c878eeb05074
> +        .quad 0x3fa88a5cc3159e53
> +        .quad 0x3fa84c51ebee8d15
> +        .quad 0x3fa80e585f9218fc
> +        .quad 0x3fa7d070145f4fd7
> +        .quad 0x3fa7929900bd4809
> +        .quad 0x3fa754d31b1b179c
> +        .quad 0x3fa7171e59efcb5f
> +        .quad 0x3fa6d97ab3ba5e10
> +        .quad 0x3fa69be81f01af99
> +        .quad 0x3fa65e6692547c4e
> +        .quad 0x3fa620f604495440
> +        .quad 0x3fa5e3966b7e9295
> +        .quad 0x3fa5a647be9a54f6
> +        .quad 0x3fa56909f44a72fe
> +        .quad 0x3fa52bdd034475b8
> +        .quad 0x3fa4eec0e2458f30
> +        .quad 0x3fa4b1b588129203
> +        .quad 0x3fa474baeb77e904
> +        .quad 0x3fa437d103498eec
> +        .quad 0x3fa3faf7c663060e
> +        .quad 0x3fa3be2f2ba7501f
> +        .quad 0x3fa381772a00e604
> +        .quad 0x3fa344cfb861afae
> +        .quad 0x3fa30838cdc2fbfd
> +        .quad 0x3fa2cbb2612578b4
> +        .quad 0x3fa28f3c69912a74
> +        .quad 0x3fa252d6de1564c1
> +        .quad 0x3fa21681b5c8c213
> +        .quad 0x3fa1da3ce7c91bf8
> +        .quad 0x3fa19e086b3b8333
> +        .quad 0x3fa161e4374c37f4
> +        .quad 0x3fa125d0432ea20e
> +        .quad 0x3fa0e9cc861d4944
> +        .quad 0x3fa0add8f759cd95
> +        .quad 0x3fa071f58e2cdf9b
> +        .quad 0x3fa0362241e638ec
> +        .quad 0x3f9ff4be13b92920
> +        .quad 0x3f9f7d57badb4ee8
> +        .quad 0x3f9f061167fc31e8
> +        .quad 0x3f9e8eeb09f2f6cb
> +        .quad 0x3f9e17e48fa48962
> +        .quad 0x3f9da0fde8038de9
> +        .quad 0x3f9d2a3702105259
> +        .quad 0x3f9cb38fccd8bfdb
> +        .quad 0x3f9c3d0837784c41
> +        .quad 0x3f9bc6a03117eb97
> +        .quad 0x3f9b5057a8ee01ce
> +        .quad 0x3f9ada2e8e3e546f
> +        .quad 0x3f9a6424d059fc68
> +        .quad 0x3f99ee3a5e9f57e8
> +        .quad 0x3f99786f2879fc53
> +        .quad 0x3f9902c31d62a843
> +        .quad 0x3f988d362cdf359e
> +        .quad 0x3f9817c846828bbd
> +        .quad 0x3f97a27959ec91aa
> +        .quad 0x3f972d4956ca2067
> +        .quad 0x3f96b8382cd4f551
> +        .quad 0x3f964345cbd3a491
> +        .quad 0x3f95ce7223998b98
> +        .quad 0x3f9559bd2406c3ba
> +        .quad 0x3f94e526bd0814d1
> +        .quad 0x3f9470aede96e7f2
> +        .quad 0x3f93fc5578b93a38
> +        .quad 0x3f93881a7b818f9e
> +        .quad 0x3f9313fdd70ee5e8
> +        .quad 0x3f929fff7b8ca79d
> +        .quad 0x3f922c1f59329f1b
> +        .quad 0x3f91b85d6044e9ae
> +        .quad 0x3f9144b98113eac0
> +        .quad 0x3f90d133abfc3f1b
> +        .quad 0x3f905dcbd166b033
> +        .quad 0x3f8fd503c3904f1d
> +        .quad 0x3f8eeeab9b43445d
> +        .quad 0x3f8e088f0b004827
> +        .quad 0x3f8d22adf3f9579d
> +        .quad 0x3f8c3d0837784c41
> +        .quad 0x3f8b579db6dec358
> +        .quad 0x3f8a726e53a6056e
> +        .quad 0x3f898d79ef5eedf0
> +        .quad 0x3f88a8c06bb1d2f4
> +        .quad 0x3f87c441aa5e6d15
> +        .quad 0x3f86dffd8d3bbf70
> +        .quad 0x3f85fbf3f637ffc5
> +        .quad 0x3f851824c7587eb0
> +        .quad 0x3f84348fe2b99002
> +        .quad 0x3f8351352a8e733f
> +        .quad 0x3f826e1481213c2e
> +        .quad 0x3f818b2dc8d2bb91
> +        .quad 0x3f80a880e41a67f6
> +        .quad 0x3f7f8c1b6b0c8d4e
> +        .quad 0x3f7dc7a83f75a96d
> +        .quad 0x3f7c03a80ae5e054
> +        .quad 0x3f7a401a92ff827e
> +        .quad 0x3f787cff9d9147a5
> +        .quad 0x3f76ba56f09621bc
> +        .quad 0x3f74f8205235102d
> +        .quad 0x3f73365b88c0f347
> +        .quad 0x3f7175085ab85ff0
> +        .quad 0x3f6f684d1d8ae702
> +        .quad 0x3f6be76bd77b4fc3
> +        .quad 0x3f68676c71434fb9
> +        .quad 0x3f64e84e793a474a
> +        .quad 0x3f616a117e0d4b30
> +        .quad 0x3f5bd96a1d7d9cbc
> +        .quad 0x3f54e071754c98ba
> +        .quad 0x3f4bd27045bfd025
> +        .quad 0x3f3bcef518e29612
> +        .quad 0x8000000000000000
> +        /*== poly_coeff[5] ==*/
> +        .align 16
> +        .quad 0x3fb63C65231FBD16, 0x3fb63C65231FBD16 /* coeff5 */
> +        .quad 0xbfbBCB7D4EFBE80B, 0xbfbBCB7D4EFBE80B /* coeff4 */
> +        .quad 0x3fc287A7636F341E, 0x3fc287A7636F341E /* coeff3 */
> +        .quad 0xbfcBCB7B1526DE36, 0xbfcBCB7B1526DE36 /* coeff2 */
> +        .quad 0x3fdBCB7B1526E50E, 0x3fdBCB7B1526E50E /* coeff1 */
> +        /*== ExpMask ==*/
> +        .align 16
> +        .quad 0x000fffffffffffff, 0x000fffffffffffff
> +        /*== Two10 ==*/
> +        .align 16
> +        .quad 0x3f50000000000000, 0x3f50000000000000
> +        /*== MinNorm ==*/
> +        .align 16
> +        .quad 0x0010000000000000, 0x0010000000000000
> +        /*== MaxNorm ==*/
> +        .align 16
> +        .quad 0x7fefffffffffffff, 0x7fefffffffffffff
> +        /*== HalfMask ==*/
> +        .align 16
> +        .quad 0xfffffffffc000000, 0xfffffffffc000000
> +        /*== One ==*/
> +        .align 16
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== Threshold ==*/
> +        .align 16
> +        .quad 0x4086a00000000000, 0x4086a00000000000
> +        /*== Bias ==*/
> +        .align 16
> +        .quad 0x408ff80000000000, 0x408ff80000000000
> +        /*== Bias1 ==*/
> +        .align 16
> +        .quad 0x408ff00000000000, 0x408ff00000000000
> +        /*== L2 ==*/
> +        .align 16
> +        .quad 0x3fd34413509f79ff, 0x3fd34413509f79ff
> +        .align 16
> +        .type	__svml_dlog10_data_internal,@object
> +        .size	__svml_dlog10_data_internal,.-__svml_dlog10_data_internal
> +        .space 48, 0x00 	
> +        .align 16
> +
> +.FLT_12:
> +        .long	0x00000000,0x43380000,0x00000000,0x43380000
> +        .type	.FLT_12,@object
> +        .size	.FLT_12,16
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log104_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log104_core-sse.S
> new file mode 100644
> index 0000000000..0a101666f5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log104_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized log10, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN4v_log10 _ZGVdN4v_log10_sse_wrapper
> +#include "../svml_d_log104_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log104_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log104_core.c
> new file mode 100644
> index 0000000000..48c63cfb3d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log104_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized log10, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN4v_log10
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN4v_log10, __GI__ZGVdN4v_log10, __redirect__ZGVdN4v_log10)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log104_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log104_core_avx2.S
> new file mode 100644
> index 0000000000..df23926562
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log104_core_avx2.S
> @@ -0,0 +1,1074 @@
> +/* Function log10 vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
> + *    R = Rcp*x - 1.0
> + *    log10(x) = k*log10(2.0) - log10(Rcp) + poly_approximation(R)
> + *       log10(Rcp) is tabulated
> + *
> + *
> + */
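Spelled out as scalar C, the reduction in the comment above looks roughly
like the sketch below.  This is only an illustration of the algorithm: the
~10-bit reciprocal, the two-term polynomial and the direct log10 (rcp)
call stand in for the vrcpps/vroundpd sequence, the degree-5 poly_coeff
polynomial and the Log_LA_table lookup that the kernel actually uses.

    #include <math.h>

    static double
    log10_sketch (double x)
    {
      int k;
      double m = frexp (x, &k);       /* x = m * 2^k, m in [0.5, 1) */
      /* Short reciprocal approximation Rcp ~ 1/mantissa(x), rounded to a
         few bits so that log10(Rcp) could be read from a table.  */
      double rcp = ldexp (nearbyint (ldexp (1.0 / m, 9)), -9);
      double r = rcp * m - 1.0;       /* R = Rcp*x - 1.0, |R| is small */
      /* poly_approximation(R) ~ log10(1 + R); two terms for brevity.  */
      double log10e = 0.43429448190325176;
      double poly = log10e * (r - 0.5 * r * r);
      return k * log10 (2.0) - log10 (rcp) + poly;
    }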
> +
> +/* Offsets for data table __svml_dlog10_data_internal
> + */
> +#define Log_HA_table                  	0
> +#define Log_LA_table                  	4128
> +#define poly_coeff                    	8256
> +#define ExpMask                       	8416
> +#define Two10                         	8448
> +#define MinNorm                       	8480
> +#define MaxNorm                       	8512
> +#define HalfMask                      	8544
> +#define One                           	8576
> +#define Threshold                     	8608
> +#define Bias                          	8640
> +#define Bias1                         	8672
> +#define L2                            	8704
> +
> +/* Lookup bias for data table __svml_dlog10_data_internal.  */
> +#define Table_Lookup_Bias               -0x406fe0
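A note on Table_Lookup_Bias (my reading of the v5 change, not text from the
patch): the kernel shifts the rounded reciprocal right by 40 bits
(vpsrlq $40 below), which leaves the exponent field plus a few mantissa
bits, and the lea at function entry pre-adds this negative bias to the
data-table address so the shifted value can be used directly as a byte
index.  E.g. for Rcp = 1024.0 (bit pattern 0x4090000000000000) the shift
gives 0x409000, and 0x409000 - 0x406fe0 = 0x2020 = 8224, i.e.
Log_LA_table (offset 4128) + 512 * 8 bytes.  A minimal stand-alone check
of that arithmetic:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int
    main (void)
    {
      double rcp = 1024.0;            /* a possible rounded reciprocal */
      uint64_t bits;
      memcpy (&bits, &rcp, sizeof bits);
      unsigned long off = (unsigned long) (bits >> 40) - 0x406fe0;
      printf ("byte offset 0x%lx = Log_LA_table + %lu\n", off, off - 4128);
      return 0;
    }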
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN4v_log10_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +        lea       Table_Lookup_Bias+__svml_dlog10_data_internal(%rip), %r8
> +        vmovapd   %ymm0, %ymm3
> +
> +/* preserve mantissa, set input exponent to 2^(-10) */
> +        vandpd    ExpMask+__svml_dlog10_data_internal(%rip), %ymm3, %ymm4
> +        vorpd     Two10+__svml_dlog10_data_internal(%rip), %ymm4, %ymm2
> +
> +/* reciprocal approximation good to at least 11 bits */
> +        vcvtpd2ps %ymm2, %xmm5
> +
> +/* exponent bits */
> +        vpsrlq    $20, %ymm3, %ymm7
> +        vmovupd   One+__svml_dlog10_data_internal(%rip), %ymm14
> +        vrcpps    %xmm5, %xmm6
> +
> +/* check range */
> +        vcmplt_oqpd MinNorm+__svml_dlog10_data_internal(%rip), %ymm3, %ymm11
> +        vcmpnle_uqpd MaxNorm+__svml_dlog10_data_internal(%rip), %ymm3, %ymm12
> +        vcvtps2pd %xmm6, %ymm9
> +
> +/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
> +        vroundpd  $0, %ymm9, %ymm1
> +
> +/* exponent*log(2.0) */
> +        vmovupd   Threshold+__svml_dlog10_data_internal(%rip), %ymm9
> +
> +/*
> + * prepare table index
> + * table lookup
> + */
> +        vpsrlq    $40, %ymm1, %ymm15
> +
> +/* argument reduction */
> +        vfmsub213pd %ymm14, %ymm1, %ymm2
> +        vcmplt_oqpd %ymm1, %ymm9, %ymm1
> +        vorpd     %ymm12, %ymm11, %ymm13
> +        vmovupd   poly_coeff+64+__svml_dlog10_data_internal(%rip), %ymm12
> +        vfmadd213pd poly_coeff+96+__svml_dlog10_data_internal(%rip), %ymm2, %ymm12
> +
> +/* combine and get argument value range mask */
> +        vmovmskpd %ymm13, %eax
> +        vmulpd    %ymm2, %ymm2, %ymm13
> +        vextractf128 $1, %ymm7, %xmm8
> +        vshufps   $221, %xmm8, %xmm7, %xmm10
> +
> +/* biased exponent in DP format */
> +        vcvtdq2pd %xmm10, %ymm0
> +        vandpd    Bias+__svml_dlog10_data_internal(%rip), %ymm1, %ymm10
> +        vorpd     Bias1+__svml_dlog10_data_internal(%rip), %ymm10, %ymm11
> +        vsubpd    %ymm11, %ymm0, %ymm0
> +        vmulpd    L2+__svml_dlog10_data_internal(%rip), %ymm0, %ymm1
> +
> +/* polynomial */
> +        vmovupd   poly_coeff+__svml_dlog10_data_internal(%rip), %ymm0
> +        vfmadd213pd poly_coeff+32+__svml_dlog10_data_internal(%rip), %ymm2, %ymm0
> +        vmulpd    poly_coeff+128+__svml_dlog10_data_internal(%rip), %ymm2, %ymm2
> +        vfmadd213pd %ymm12, %ymm13, %ymm0
> +        vfmadd213pd %ymm2, %ymm13, %ymm0
> +        vextractf128 $1, %ymm15, %xmm6
> +        vmovd     %xmm15, %edx
> +        vmovd     %xmm6, %esi
> +        movslq    %edx, %rdx
> +        vpextrd   $2, %xmm15, %ecx
> +        movslq    %esi, %rsi
> +        vpextrd   $2, %xmm6, %edi
> +        movslq    %ecx, %rcx
> +        movslq    %edi, %rdi
> +        vmovsd    (%r8,%rdx), %xmm4
> +        vmovsd    (%r8,%rsi), %xmm7
> +        vmovhpd   (%r8,%rcx), %xmm4, %xmm5
> +        vmovhpd   (%r8,%rdi), %xmm7, %xmm8
> +        vinsertf128 $1, %xmm8, %ymm5, %ymm14
> +
> +/* reconstruction */
> +        vaddpd    %ymm0, %ymm14, %ymm2
> +        vaddpd    %ymm2, %ymm1, %ymm0
> +        testl     %eax, %eax
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm3
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovupd   %ymm3, 32(%rsp)
> +        vmovupd   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0
> +
> +        xorl      %edx, %edx
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovupd   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      log10@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 64(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN4v_log10_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_dlog10_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(32)) VUINT32 Log_HA_table[(1<<9)+2][2];
> +        __declspec(align(32)) VUINT32 Log_LA_table[(1<<9)+1][2];
> +        __declspec(align(32)) VUINT32 poly_coeff[5][4][2];
> +        __declspec(align(32)) VUINT32 ExpMask[4][2];
> +        __declspec(align(32)) VUINT32 Two10[4][2];
> +        __declspec(align(32)) VUINT32 MinNorm[4][2];
> +        __declspec(align(32)) VUINT32 MaxNorm[4][2];
> +        __declspec(align(32)) VUINT32 HalfMask[4][2];
> +        __declspec(align(32)) VUINT32 One[4][2];
> +        __declspec(align(32)) VUINT32 Threshold[4][2];
> +        __declspec(align(32)) VUINT32 Bias[4][2];
> +        __declspec(align(32)) VUINT32 Bias1[4][2];
> +        __declspec(align(32)) VUINT32 L2[4][2];
> +} __svml_dlog10_data_internal;
> +#endif
> +__svml_dlog10_data_internal:
> +        /* Log_HA_table */
> +        .quad 0xc0733a7146f6b080, 0xbe1e707ce619c200
> +        .quad 0xc0733a7547771970, 0xbe1e79c6c06d6f51
> +        .quad 0xc0733a7945aacb70, 0xbe1e78e225fad29c
> +        .quad 0xc0733a7d41946970, 0xbe1e76d607f9693b
> +        .quad 0xc0733a813b3691f0, 0xbe1e7704b3e0685b
> +        .quad 0xc0733a853293df00, 0xbe1e79c1216a27fa
> +        .quad 0xc0733a8927aee660, 0xbe1e76dce5734a81
> +        .quad 0xc0733a8d1a8a3920, 0xbe1e782ee2ca4dba
> +        .quad 0xc0733a910b286430, 0xbe1e7812d1a0a61f
> +        .quad 0xc0733a94f98bf010, 0xbe1e77e1b5ecbc61
> +        .quad 0xc0733a98e5b76100, 0xbe1e76635cac1586
> +        .quad 0xc0733a9ccfad36f0, 0xbe1e7638f7968f32
> +        .quad 0xc0733aa0b76feda0, 0xbe1e7840ee76e365
> +        .quad 0xc0733aa49d01fcb0, 0xbe1e79f3fd01907e
> +        .quad 0xc0733aa88065d7a0, 0xbe1e77bbb3a9c38a
> +        .quad 0xc0733aac619dedb0, 0xbe1e7742719bf41d
> +        .quad 0xc0733ab040acaa20, 0xbe1e79bcedaf79cb
> +        .quad 0xc0733ab41d947450, 0xbe1e762d63cb7ca0
> +        .quad 0xc0733ab7f857af50, 0xbe1e77a07be83403
> +        .quad 0xc0733abbd0f8ba80, 0xbe1e7763ff836ad0
> +        .quad 0xc0733abfa779f130, 0xbe1e7737720ead39
> +        .quad 0xc0733ac37bddaad0, 0xbe1e7776a08e55e7
> +        .quad 0xc0733ac74e263af0, 0xbe1e793e3c52dd36
> +        .quad 0xc0733acb1e55f160, 0xbe1e788a94695051
> +        .quad 0xc0733aceec6f1a10, 0xbe1e76508114a813
> +        .quad 0xc0733ad2b873fd20, 0xbe1e76909457d23e
> +        .quad 0xc0733ad68266df10, 0xbe1e7664a24f9ca4
> +        .quad 0xc0733ada4a4a0090, 0xbe1e7a07b3d44b18
> +        .quad 0xc0733ade101f9ee0, 0xbe1e76d87594704d
> +        .quad 0xc0733ae1d3e9f340, 0xbe1e79563595a182
> +        .quad 0xc0733ae595ab33b0, 0xbe1e771880c3c6ab
> +        .quad 0xc0733ae955659250, 0xbe1e78c171f517d4
> +        .quad 0xc0733aed131b3df0, 0xbe1e77eac3874666
> +        .quad 0xc0733af0cece61b0, 0xbe1e790db479d8f6
> +        .quad 0xc0733af488812550, 0xbe1e7965d1aa5c90
> +        .quad 0xc0733af84035ad10, 0xbe1e78ceb398ba47
> +        .quad 0xc0733afbf5ee19c0, 0xbe1e779cc0dcb5aa
> +        .quad 0xc0733affa9ac88c0, 0xbe1e7871053953ed
> +        .quad 0xc0733b035b731420, 0xbe1e7a082cffa71a
> +        .quad 0xc0733b070b43d2a0, 0xbe1e7904b4382fad
> +        .quad 0xc0733b0ab920d790, 0xbe1e79b458d0b4f3
> +        .quad 0xc0733b0e650c3310, 0xbe1e79d0ded414c6
> +        .quad 0xc0733b120f07f200, 0xbe1e763c357a1943
> +        .quad 0xc0733b15b7161dd0, 0xbe1e78b80ba6daaa
> +        .quad 0xc0733b195d38bd00, 0xbe1e7998e23b8ffd
> +        .quad 0xc0733b1d0171d2c0, 0xbe1e7974aa65ee8c
> +        .quad 0xc0733b20a3c35f20, 0xbe1e76ccfde752ab
> +        .quad 0xc0733b24442f5ef0, 0xbe1e77b4ff19debb
> +        .quad 0xc0733b27e2b7cc10, 0xbe1e7772ee478542
> +        .quad 0xc0733b2b7f5e9d30, 0xbe1e781d81b58b44
> +        .quad 0xc0733b2f1a25c600, 0xbe1e78350d967565
> +        .quad 0xc0733b32b30f3720, 0xbe1e783888e48152
> +        .quad 0xc0733b364a1cde30, 0xbe1e78367bf7c111
> +        .quad 0xc0733b39df50a5d0, 0xbe1e7959e57ca47d
> +        .quad 0xc0733b3d72ac75c0, 0xbe1e777322423222
> +        .quad 0xc0733b41043232b0, 0xbe1e767ce42a60aa
> +        .quad 0xc0733b4493e3be70, 0xbe1e781d445aea19
> +        .quad 0xc0733b4821c2f800, 0xbe1e7922fca18e18
> +        .quad 0xc0733b4badd1bb80, 0xbe1e76fed3d40647
> +        .quad 0xc0733b4f3811e210, 0xbe1e793948c9eabc
> +        .quad 0xc0733b52c0854240, 0xbe1e76e487656b8c
> +        .quad 0xc0733b56472daf90, 0xbe1e780ab2f71223
> +        .quad 0xc0733b59cc0cfaf0, 0xbe1e77189120b09c
> +        .quad 0xc0733b5d4f24f270, 0xbe1e7644a0343a12
> +        .quad 0xc0733b60d0776160, 0xbe1e78f2a3e4733d
> +        .quad 0xc0733b6450061080, 0xbe1e7913b2f73ae5
> +        .quad 0xc0733b67cdd2c5c0, 0xbe1e7882d08393b5
> +        .quad 0xc0733b6b49df4470, 0xbe1e765e1b209979
> +        .quad 0xc0733b6ec42d4d20, 0xbe1e785c9c4620d4
> +        .quad 0xc0733b75b394f240, 0xbe1e78878cd0e956
> +        .quad 0xc0733b7c9c178630, 0xbe1e789a4112d90b
> +        .quad 0xc0733b837dc2b0f0, 0xbe1e79050b8a1766
> +        .quad 0xc0733b8a58a3f220, 0xbe1e7790dffc47aa
> +        .quad 0xc0733b912cc8a180, 0xbe1e77174593b06a
> +        .quad 0xc0733b97fa3defb0, 0xbe1e7677de2d2ecc
> +        .quad 0xc0733b9ec110e6b0, 0xbe1e76cff477ca18
> +        .quad 0xc0733ba5814e6a80, 0xbe1e78f8644dec7b
> +        .quad 0xc0733bac3b0339d0, 0xbe1e764e1361788d
> +        .quad 0xc0733bb2ee3bee30, 0xbe1e78c913e738de
> +        .quad 0xc0733bb99b04fd30, 0xbe1e76666f5bddaa
> +        .quad 0xc0733bc0416ab850, 0xbe1e77e87cbd8ab6
> +        .quad 0xc0733bc6e1794e10, 0xbe1e76f18ba1c966
> +        .quad 0xc0733bcd7b3cca10, 0xbe1e777c9461b8db
> +        .quad 0xc0733bd40ec115d0, 0xbe1e78b78526ffac
> +        .quad 0xc0733bda9c11f920, 0xbe1e7942abecfede
> +        .quad 0xc0733be1233b1aa0, 0xbe1e76d8a684fd8c
> +        .quad 0xc0733be7a4480010, 0xbe1e79622b539ac9
> +        .quad 0xc0733bee1f440f30, 0xbe1e7978e7cc20ea
> +        .quad 0xc0733bf4943a8de0, 0xbe1e765c9c9de825
> +        .quad 0xc0733bfb0336a290, 0xbe1e775d8b138ee2
> +        .quad 0xc0733c016c435500, 0xbe1e78bf33465c2f
> +        .quad 0xc0733c07cf6b8e80, 0xbe1e78164f7cc441
> +        .quad 0xc0733c0e2cba1a50, 0xbe1e7824e64d0b23
> +        .quad 0xc0733c148439a630, 0xbe1e78373ae7dd81
> +        .quad 0xc0733c1ad5f4c2c0, 0xbe1e7704513e0afe
> +        .quad 0xc0733c2121f5e3d0, 0xbe1e7914aa84200f
> +        .quad 0xc0733c2768476110, 0xbe1e76b1cde25cf6
> +        .quad 0xc0733c2da8f37600, 0xbe1e796120e3862d
> +        .quad 0xc0733c33e40442e0, 0xbe1e78ec836d7e7b
> +        .quad 0xc0733c3a1983cca0, 0xbe1e77fb13b7dabb
> +        .quad 0xc0733c40497bfd70, 0xbe1e783c6fcb2404
> +        .quad 0xc0733c4673f6a530, 0xbe1e7628bb93dce8
> +        .quad 0xc0733c4c98fd7990, 0xbe1e7857a47b5001
> +        .quad 0xc0733c52b89a16d0, 0xbe1e76708dc2831f
> +        .quad 0xc0733c58d2d5ffa0, 0xbe1e77b6038651f1
> +        .quad 0xc0733c5ee7ba9de0, 0xbe1e792e855bb5b2
> +        .quad 0xc0733c64f75142d0, 0xbe1e776cacd5c105
> +        .quad 0xc0733c6b01a32740, 0xbe1e77f8a8011315
> +        .quad 0xc0733c7106b96c30, 0xbe1e765cf3efcfde
> +        .quad 0xc0733c77069d1ad0, 0xbe1e78d837d2efac
> +        .quad 0xc0733c7d01572530, 0xbe1e78b615cf772c
> +        .quad 0xc0733c82f6f06640, 0xbe1e7650bbbd7a25
> +        .quad 0xc0733c88e771a220, 0xbe1e78bcf3495872
> +        .quad 0xc0733c8ed2e386c0, 0xbe1e792266832e84
> +        .quad 0xc0733c94b94eabd0, 0xbe1e79c1c3c2ca52
> +        .quad 0xc0733c9a9abb9340, 0xbe1e78aa61e5807d
> +        .quad 0xc0733ca07732a970, 0xbe1e7620fc4cf156
> +        .quad 0xc0733ca64ebc4570, 0xbe1e76b914a832c5
> +        .quad 0xc0733cac2160a970, 0xbe1e79227f72020e
> +        .quad 0xc0733cb1ef280300, 0xbe1e77ac972cc008
> +        .quad 0xc0733cb7b81a6b10, 0xbe1e798089be41f4
> +        .quad 0xc0733cbd7c3fe6a0, 0xbe1e77942ae037fe
> +        .quad 0xc0733cc33ba06690, 0xbe1e7956ae6463d9
> +        .quad 0xc0733cc8f643c850, 0xbe1e7918a50c7942
> +        .quad 0xc0733cceac31d5d0, 0xbe1e78308eeab604
> +        .quad 0xc0733cd45d7245e0, 0xbe1e76dd4ea88445
> +        .quad 0xc0733cda0a0cbc60, 0xbe1e77e7c1aa5909
> +        .quad 0xc0733cdfb208caa0, 0xbe1e7804b9d20e54
> +        .quad 0xc0733ce5556def70, 0xbe1e78f88e99d49c
> +        .quad 0xc0733ceaf4439780, 0xbe1e787d74682d68
> +        .quad 0xc0733cf08e911d80, 0xbe1e76edc24fe6e7
> +        .quad 0xc0733cf6245dca50, 0xbe1e79b347ec86d2
> +        .quad 0xc0733cfbb5b0d580, 0xbe1e797cceb2c39b
> +        .quad 0xc0733d0142916530, 0xbe1e783adbdc6aa1
> +        .quad 0xc0733d06cb068e70, 0xbe1e76e4c20e3d9e
> +        .quad 0xc0733d0c4f175570, 0xbe1e77070bf3cf61
> +        .quad 0xc0733d11cecaadc0, 0xbe1e781c43502734
> +        .quad 0xc0733d174a277a80, 0xbe1e78b11268ea72
> +        .quad 0xc0733d1cc1348e90, 0xbe1e7754b83bfc7d
> +        .quad 0xc0733d2233f8acb0, 0xbe1e7756c29bf5e9
> +        .quad 0xc0733d27a27a87d0, 0xbe1e7952fc1d9333
> +        .quad 0xc0733d2d0cc0c350, 0xbe1e778c76ae6077
> +        .quad 0xc0733d3272d1f2e0, 0xbe1e7a1896ba8f43
> +        .quad 0xc0733d37d4b49b30, 0xbe1e76dafdf432d8
> +        .quad 0xc0733d3d326f3180, 0xbe1e795330184013
> +        .quad 0xc0733d428c081c80, 0xbe1e763cc774d30f
> +        .quad 0xc0733d47e185b3d0, 0xbe1e77030a779c0a
> +        .quad 0xc0733d4d32ee40b0, 0xbe1e7908af2a2d7e
> +        .quad 0xc0733d528047fe00, 0xbe1e78c4953b797d
> +        .quad 0xc0733d57c9991850, 0xbe1e78b43b096579
> +        .quad 0xc0733d5d0ee7ae30, 0xbe1e7824ae0a4804
> +        .quad 0xc0733d625039d040, 0xbe1e79d2b2fbb740
> +        .quad 0xc0733d678d958190, 0xbe1e7662de59a1a6
> +        .quad 0xc0733d6cc700b760, 0xbe1e76b251d59aaa
> +        .quad 0xc0733d71fc8159b0, 0xbe1e7a00cfd1f487
> +        .quad 0xc0733d772e1d4360, 0xbe1e77f4d246167e
> +        .quad 0xc0733d7c5bda4200, 0xbe1e767a4ee8e6fc
> +        .quad 0xc0733d8185be1640, 0xbe1e777ccf0a8aed
> +        .quad 0xc0733d86abce7420, 0xbe1e767d7e279ada
> +        .quad 0xc0733d8bce1102d0, 0xbe1e7a05cef4bb90
> +        .quad 0xc0733d90ec8b5d40, 0xbe1e78f75369be5b
> +        .quad 0xc0733d96074311d0, 0xbe1e77b9612e8c8a
> +        .quad 0xc0733d9b1e3da2b0, 0xbe1e794518b9adeb
> +        .quad 0xc0733da031808620, 0xbe1e7810626fb934
> +        .quad 0xc0733da541112650, 0xbe1e76d87223fa6d
> +        .quad 0xc0733daa4cf4e1a0, 0xbe1e794c5e7ca3b5
> +        .quad 0xc0733daf55310af0, 0xbe1e789856ef816f
> +        .quad 0xc0733db459cae970, 0xbe1e77d2004effbd
> +        .quad 0xc0733db95ac7b8f0, 0xbe1e78467d31eb9c
> +        .quad 0xc0733dbe582caa00, 0xbe1e79aaa4e25787
> +        .quad 0xc0733dc351fee220, 0xbe1e762de8f107bf
> +        .quad 0xc0733dc848437b90, 0xbe1e7670670a63fe
> +        .quad 0xc0733dcd3aff85d0, 0xbe1e795ca237c6cc
> +        .quad 0xc0733dd22a3805b0, 0xbe1e77e55c53c1d9
> +        .quad 0xc0733dd715f1f520, 0xbe1e78a806213ac4
> +        .quad 0xc0733ddbfe3243b0, 0xbe1e77743a2bc615
> +        .quad 0xc0733de0e2fdd660, 0xbe1e78b8b45b0b7d
> +        .quad 0xc0733de5c4598800, 0xbe1e78d635f2f4b9
> +        .quad 0xc0733deaa24a2920, 0xbe1e7758c396a11e
> +        .quad 0xc0733def7cd48020, 0xbe1e7a17a8cc454c
> +        .quad 0xc0733df453fd49a0, 0xbe1e783caa73f616
> +        .quad 0xc0733df927c93820, 0xbe1e7932cfa29664
> +        .quad 0xc0733dfdf83cf490, 0xbe1e777d265c72a6
> +        .quad 0xc0733e02c55d1e10, 0xbe1e7775e7c03c60
> +        .quad 0xc0733e078f2e4a40, 0xbe1e79f65d52d232
> +        .quad 0xc0733e0c55b50570, 0xbe1e76e7e7464b4e
> +        .quad 0xc0733e1118f5d250, 0xbe1e77be81cad877
> +        .quad 0xc0733e15d8f52a80, 0xbe1e79dd25b5fb3a
> +        .quad 0xc0733e1a95b77e80, 0xbe1e78e45f1418ef
> +        .quad 0xc0733e1f4f4135a0, 0xbe1e78eb7289505b
> +        .quad 0xc0733e240596ae50, 0xbe1e78a468c07cad
> +        .quad 0xc0733e28b8bc3e20, 0xbe1e776b558a4009
> +        .quad 0xc0733e2d68b631d0, 0xbe1e77412eb9941e
> +        .quad 0xc0733e321588cd80, 0xbe1e76b2853f845e
> +        .quad 0xc0733e36bf384cb0, 0xbe1e76aa7184273c
> +        .quad 0xc0733e3b65c8e260, 0xbe1e7832027f78fa
> +        .quad 0xc0733e40093eb930, 0xbe1e7a1c7da131f5
> +        .quad 0xc0733e44a99df380, 0xbe1e76a0bc2ae4bc
> +        .quad 0xc0733e4946eaab30, 0xbe1e78dff13b6f5d
> +        .quad 0xc0733e4de128f250, 0xbe1e765a226dea2c
> +        .quad 0xc0733e52785cd290, 0xbe1e78509b989111
> +        .quad 0xc0733e570c8a4de0, 0xbe1e7916a4e9803d
> +        .quad 0xc0733e5b9db55e30, 0xbe1e7950c15758cc
> +        .quad 0xc0733e602be1f5a0, 0xbe1e7922ba1ad420
> +        .quad 0xc0733e64b713fe90, 0xbe1e794cbaabcef6
> +        .quad 0xc0733e693f4f5bc0, 0xbe1e7837bf883fed
> +        .quad 0xc0733e6dc497e850, 0xbe1e76f198ddbbdf
> +        .quad 0xc0733e7246f177d0, 0xbe1e7a18c1067764
> +        .quad 0xc0733e76c65fd6a0, 0xbe1e76b845a8fd9d
> +        .quad 0xc0733e7b42e6c970, 0xbe1e7714012df506
> +        .quad 0xc0733e7fbc8a0de0, 0xbe1e7765612922cd
> +        .quad 0xc0733e84334d5a50, 0xbe1e7688f5424a00
> +        .quad 0xc0733e88a7345df0, 0xbe1e769d011f6663
> +        .quad 0xc0733e8d1842c0e0, 0xbe1e79914acbfaf7
> +        .quad 0xc0733e91867c2460, 0xbe1e79a85e189bd7
> +        .quad 0xc0733e95f1e422a0, 0xbe1e79ea7c726432
> +        .quad 0xc0733e9a5a7e4f10, 0xbe1e768a6fbb8e6e
> +        .quad 0xc0733e9ec04e3620, 0xbe1e793c75bcc9fc
> +        .quad 0xc0733ea323575dd0, 0xbe1e797f78da13d4
> +        .quad 0xc0733ea7839d4550, 0xbe1e78d8c9cda978
> +        .quad 0xc0733eabe1236540, 0xbe1e77028d480fff
> +        .quad 0xc0733eb03bed2fa0, 0xbe1e7a0d0f74ff7c
> +        .quad 0xc0733eb493fe1040, 0xbe1e76732e8a35fb
> +        .quad 0xc0733eb8e9596c30, 0xbe1e77220caeabeb
> +        .quad 0xc0733ebd3c02a260, 0xbe1e797438b645ef
> +        .quad 0xc0733ec18bfd0b80, 0xbe1e79207c5fd6e8
> +        .quad 0xc0733ec5d94bf9f0, 0xbe1e781c7df8f946
> +        .quad 0xc0733eca23f2b9f0, 0xbe1e76736284e2db
> +        .quad 0xc0733ece6bf49190, 0xbe1e7a109cc0c3f5
> +        .quad 0xc0733ed2b154c120, 0xbe1e767f14a16d50
> +        .quad 0xc0733ed6f4168290, 0xbe1e789cd22acaf0
> +        .quad 0xc0733edb343d0a40, 0xbe1e764355ca28ad
> +        .quad 0xc0733edf71cb8660, 0xbe1e79e4c7a81c45
> +        .quad 0xc0733ee3acc51fb0, 0xbe1e761e26b644c2
> +        .quad 0xc0733ee7e52cf8c0, 0xbe1e793e9f8fbdd3
> +        .quad 0xc0733eec1b062ed0, 0xbe1e78c432991c20
> +        .quad 0xc0733ef04e53d940, 0xbe1e78cdd025f4d8
> +        .quad 0xc0733ef47f1909f0, 0xbe1e778310c6446e
> +        .quad 0xc0733ef8ad58cd20, 0xbe1e7871af3d6e17
> +        .quad 0xc0733efcd91629b0, 0xbe1e77e0e906f697
> +        .quad 0xc0733f01025420f0, 0xbe1e7a1ae9b27892
> +        .quad 0xc0733f052915af00, 0xbe1e76ac64c88f9d
> +        .quad 0xc0733f094d5dca60, 0xbe1e779a815589c4
> +        .quad 0xc0733f0d6f2f6480, 0xbe1e788f39a4864c
> +        .quad 0xc0733f118e8d6980, 0xbe1e79fc51263525
> +        .quad 0xc0733f15ab7ac060, 0xbe1e783501f19e90
> +        .quad 0xc0733f19c5fa4ae0, 0xbe1e767e82c327ab
> +        .quad 0xc0733f1dde0ee5a0, 0xbe1e7a1785d66123
> +        .quad 0xc0733f21f3bb6870, 0xbe1e7936d07203da
> +        .quad 0xc0733f260702a5e0, 0xbe1e7a010a7ac699
> +        .quad 0xc0733f2a17e76bb0, 0xbe1e7975e4e16312
> +        .quad 0xc0733f2e266c82b0, 0xbe1e7654b5422330
> +        .quad 0xc0733f323294aeb0, 0xbe1e77f8a4909d35
> +        .quad 0xc0733f363c62aee0, 0xbe1e792c8e30d226
> +        .quad 0xc0733f3a43d93da0, 0xbe1e76f6ac67a1ff
> +        .quad 0xc0733f3e48fb1070, 0xbe1e775c2e97715a
> +        .quad 0xc0733f424bcad840, 0xbe1e781cd54ae100
> +        /*== Log_LA_table ==*/
> +        .align 32
> +        .quad 0x0000000000000000
> +        .quad 0xbf4bc48a867884b7
> +        .quad 0xbf5bbd9e9482af09
> +        .quad 0xbf64c9096b94befd
> +        .quad 0xbf6bafd47221ed26
> +        .quad 0xbf714999e2ad8ea6
> +        .quad 0xbf74b99563d2a1bd
> +        .quad 0xbf7827de6b310350
> +        .quad 0xbf7b9476a4fcd10f
> +        .quad 0xbf7eff5fbaf25781
> +        .quad 0xbf81344daa2d7553
> +        .quad 0xbf82e8158b08d957
> +        .quad 0xbf849b0851443684
> +        .quad 0xbf864d26cce610dd
> +        .quad 0xbf87fe71ccc4e6b0
> +        .quad 0xbf89aeea1e897fdf
> +        .quad 0xbf8b5e908eb13790
> +        .quad 0xbf8d0d65e890405a
> +        .quad 0xbf8ebb6af653e2ee
> +        .quad 0xbf90345040825bad
> +        .quad 0xbf910a83a8446c78
> +        .quad 0xbf91e05015d30a71
> +        .quad 0xbf92b5b5ec0209d3
> +        .quad 0xbf938ab58d173e91
> +        .quad 0xbf945f4f5acb8be0
> +        .quad 0xbf953383b64bf13f
> +        .quad 0xbf960753003a94ef
> +        .quad 0xbf96dabd98afcc05
> +        .quad 0xbf97adc3df3b1ff8
> +        .quad 0xbf98806632e451d0
> +        .quad 0xbf9952a4f22c5ae9
> +        .quad 0xbf9a24807b0e6b5c
> +        .quad 0xbf9af5f92b00e610
> +        .quad 0xbf9bc70f5ef65a77
> +        .quad 0xbf9c97c3735e7c0a
> +        .quad 0xbf9d6815c4271775
> +        .quad 0xbf9e3806acbd058f
> +        .quad 0xbf9f0796880d1c19
> +        .quad 0xbf9fd6c5b0851c4c
> +        .quad 0xbfa052ca400a4f9b
> +        .quad 0xbfa0ba01a8170000
> +        .quad 0xbfa121093ce3a205
> +        .quad 0xbfa187e12aad8077
> +        .quad 0xbfa1ee899d74a03e
> +        .quad 0xbfa25502c0fc314c
> +        .quad 0xbfa2bb4cc0cafe8d
> +        .quad 0xbfa32167c82bdcda
> +        .quad 0xbfa38754022e18e2
> +        .quad 0xbfa3ed1199a5e425
> +        .quad 0xbfa452a0b92cc0ec
> +        .quad 0xbfa4b8018b21ed4f
> +        .quad 0xbfa51d3439aacd4a
> +        .quad 0xbfa58238eeb353da
> +        .quad 0xbfa5e70fd3ee6b34
> +        .quad 0xbfa64bb912d65c07
> +        .quad 0xbfa6b034d4ad33df
> +        .quad 0xbfa71483427d2a99
> +        .quad 0xbfa778a4851906f3
> +        .quad 0xbfa7dc98c51c8242
> +        .quad 0xbfa840602aecab3d
> +        .quad 0xbfa8a3fadeb847f4
> +        .quad 0xbfa90769087836e4
> +        .quad 0xbfa96aaacfefcf3c
> +        .quad 0xbfa9cdc05cad4042
> +        .quad 0xbfaa30a9d609efea
> +        .quad 0xbfaa9367632ad897
> +        .quad 0xbfaaf5f92b00e610
> +        .quad 0xbfab585f544951a4
> +        .quad 0xbfabba9a058dfd84
> +        .quad 0xbfac1ca96525cf56
> +        .quad 0xbfac7e8d993509f9
> +        .quad 0xbface046c7ada68d
> +        .quad 0xbfad41d5164facb4
> +        .quad 0xbfada338aaa98a0c
> +        .quad 0xbfae0471aa1868f5
> +        .quad 0xbfae658039c88690
> +        .quad 0xbfaec6647eb58808
> +        .quad 0xbfaf271e9daacf20
> +        .quad 0xbfaf87aebb43ce06
> +        .quad 0xbfafe814fbec5a77
> +        .quad 0xbfb02428c1f08016
> +        .quad 0xbfb054323b97a948
> +        .quad 0xbfb08426fcdb1ee7
> +        .quad 0xbfb0b40717932b96
> +        .quad 0xbfb0e3d29d81165e
> +        .quad 0xbfb11389a04f4a2e
> +        .quad 0xbfb1432c31917d08
> +        .quad 0xbfb172ba62c4d6de
> +        .quad 0xbfb1a23445501816
> +        .quad 0xbfb1d199ea83bfbe
> +        .quad 0xbfb200eb639a3173
> +        .quad 0xbfb23028c1b7daed
> +        .quad 0xbfb25f5215eb594a
> +        .quad 0xbfb28e67712d9dfc
> +        .quad 0xbfb2bd68e4621371
> +        .quad 0xbfb2ec568056c16f
> +        .quad 0xbfb31b3055c47118
> +        .quad 0xbfb349f6754ed0b4
> +        .quad 0xbfb378a8ef84971e
> +        .quad 0xbfb3a747d4dfa6f5
> +        .quad 0xbfb3d5d335c53179
> +        .quad 0xbfb4044b2285d925
> +        .quad 0xbfb432afab5dd3ff
> +        .quad 0xbfb46100e0750da1
> +        .quad 0xbfb48f3ed1df48fb
> +        .quad 0xbfb4bd698f9c41cf
> +        .quad 0xbfb4eb812997cde4
> +        .quad 0xbfb51985afa9fdfd
> +        .quad 0xbfb5477731973e85
> +        .quad 0xbfb57555bf1077f5
> +        .quad 0xbfb5a32167b32f02
> +        .quad 0xbfb5d0da3b09a47e
> +        .quad 0xbfb5fe80488af4fd
> +        .quad 0xbfb62c139f9b3837
> +        .quad 0xbfb659944f8ba02d
> +        .quad 0xbfb68702679a980a
> +        .quad 0xbfb6b45df6f3e2c9
> +        .quad 0xbfb6e1a70cb0b99a
> +        .quad 0xbfb70eddb7d7ea07
> +        .quad 0xbfb73c02075df3e5
> +        .quad 0xbfb769140a2526fd
> +        .quad 0xbfb79613cefdc07d
> +        .quad 0xbfb7c30164a60836
> +        .quad 0xbfb7efdcd9ca6d8f
> +        .quad 0xbfb81ca63d05a44a
> +        .quad 0xbfb8495d9ce0c10c
> +        .quad 0xbfb8760307d355ab
> +        .quad 0xbfb8a2968c438d41
> +        .quad 0xbfb8cf183886480d
> +        .quad 0xbfb8fb881adf3713
> +        .quad 0xbfb927e64180f790
> +        .quad 0xbfb95432ba8d2e2f
> +        .quad 0xbfb9806d9414a209
> +        .quad 0xbfb9ac96dc175776
> +        .quad 0xbfb9d8aea084aa9c
> +        .quad 0xbfba04b4ef3b69d8
> +        .quad 0xbfba30a9d609efea
> +        .quad 0xbfba5c8d62ae3dec
> +        .quad 0xbfba885fa2d6151e
> +        .quad 0xbfbab420a41f1076
> +        .quad 0xbfbadfd07416be07
> +        .quad 0xbfbb0b6f203ab82c
> +        .quad 0xbfbb36fcb5f8be8a
> +        .quad 0xbfbb627942aecedd
> +        .quad 0xbfbb8de4d3ab3d98
> +        .quad 0xbfbbb93f762cce4f
> +        .quad 0xbfbbe4893762cbf7
> +        .quad 0xbfbc0fc2246d20f5
> +        .quad 0xbfbc3aea4a5c6eff
> +        .quad 0xbfbc6601b63226cb
> +        .quad 0xbfbc910874e09f98
> +        .quad 0xbfbcbbfe934b2e81
> +        .quad 0xbfbce6e41e463da5
> +        .quad 0xbfbd11b92297632b
> +        .quad 0xbfbd3c7dacf5780b
> +        .quad 0xbfbd6731ca08aeb9
> +        .quad 0xbfbd91d5866aa99c
> +        .quad 0xbfbdbc68eea6915b
> +        .quad 0xbfbde6ec0f392b05
> +        .quad 0xbfbe115ef490ee07
> +        .quad 0xbfbe3bc1ab0e19fe
> +        .quad 0xbfbe66143f02cc5d
> +        .quad 0xbfbe9056bcb315e8
> +        .quad 0xbfbeba893055100b
> +        .quad 0xbfbee4aba610f204
> +        .quad 0xbfbf0ebe2a0125eb
> +        .quad 0xbfbf38c0c8325d86
> +        .quad 0xbfbf62b38ca3a706
> +        .quad 0xbfbf8c9683468191
> +        .quad 0xbfbfb669b7fef1a8
> +        .quad 0xbfbfe02d36a3956d
> +        .quad 0xbfc004f0857edc5c
> +        .quad 0xbfc019c2a064b486
> +        .quad 0xbfc02e8cf1dac4b8
> +        .quad 0xbfc0434f7fb1f307
> +        .quad 0xbfc0580a4fb4a3df
> +        .quad 0xbfc06cbd67a6c3b6
> +        .quad 0xbfc08168cd45d0a9
> +        .quad 0xbfc0960c8648e406
> +        .quad 0xbfc0aaa89860bbcf
> +        .quad 0xbfc0bf3d0937c41c
> +        .quad 0xbfc0d3c9de722078
> +        .quad 0xbfc0e84f1dadb526
> +        .quad 0xbfc0fccccc823059
> +        .quad 0xbfc11142f0811357
> +        .quad 0xbfc125b18f35bb8e
> +        .quad 0xbfc13a18ae256b99
> +        .quad 0xbfc14e7852cf5430
> +        .quad 0xbfc162d082ac9d10
> +        .quad 0xbfc1772143306dc6
> +        .quad 0xbfc18b6a99c7f679
> +        .quad 0xbfc19fac8bda7897
> +        .quad 0xbfc1b3e71ec94f7b
> +        .quad 0xbfc1c81a57eff8fd
> +        .quad 0xbfc1dc463ca41df8
> +        .quad 0xbfc1f06ad2359abd
> +        .quad 0xbfc204881dee8777
> +        .quad 0xbfc2189e25134081
> +        .quad 0xbfc22cacece26ead
> +        .quad 0xbfc240b47a950f79
> +        .quad 0xbfc254b4d35e7d3c
> +        .quad 0xbfc268adfc6c773e
> +        .quad 0xbfc27c9ffae729c1
> +        .quad 0xbfc2908ad3f13603
> +        .quad 0xbfc2a46e8ca7ba2a
> +        .quad 0xbfc2b84b2a225923
> +        .quad 0xbfc2cc20b1734279
> +        .quad 0xbfc2dfef27a73a18
> +        .quad 0xbfc2f3b691c5a001
> +        .quad 0xbfc30776f4d077f7
> +        .quad 0xbfc31b3055c47118
> +        .quad 0xbfc32ee2b998ed6e
> +        .quad 0xbfc3428e2540096d
> +        .quad 0x3fc331f403985097
> +        .quad 0x3fc31e56798a910a
> +        .quad 0x3fc30abfd8f333b6
> +        .quad 0x3fc2f7301cf4e87b
> +        .quad 0x3fc2e3a740b7800f
> +        .quad 0x3fc2d0253f67e4cb
> +        .quad 0x3fc2bcaa14381386
> +        .quad 0x3fc2a935ba5f1479
> +        .quad 0x3fc295c82d18f434
> +        .quad 0x3fc2826167a6bc9c
> +        .quad 0x3fc26f01654e6df6
> +        .quad 0x3fc25ba8215af7fc
> +        .quad 0x3fc24855971c3307
> +        .quad 0x3fc23509c1e6d937
> +        .quad 0x3fc221c49d147fb3
> +        .quad 0x3fc20e8624038fed
> +        .quad 0x3fc1fb4e521740f4
> +        .quad 0x3fc1e81d22b790d4
> +        .quad 0x3fc1d4f291513e01
> +        .quad 0x3fc1c1ce9955c0c6
> +        .quad 0x3fc1aeb1363b44c8
> +        .quad 0x3fc19b9a637ca295
> +        .quad 0x3fc1888a1c995931
> +        .quad 0x3fc175805d1587c1
> +        .quad 0x3fc1627d2079e731
> +        .quad 0x3fc14f806253c3ed
> +        .quad 0x3fc13c8a1e34f7a0
> +        .quad 0x3fc1299a4fb3e306
> +        .quad 0x3fc116b0f26b67bb
> +        .quad 0x3fc103ce01fae223
> +        .quad 0x3fc0f0f17a062353
> +        .quad 0x3fc0de1b56356b04
> +        .quad 0x3fc0cb4b9235619a
> +        .quad 0x3fc0b88229b71227
> +        .quad 0x3fc0a5bf186fe483
> +        .quad 0x3fc093025a19976c
> +        .quad 0x3fc0804bea723aa9
> +        .quad 0x3fc06d9bc53c2941
> +        .quad 0x3fc05af1e63e03b4
> +        .quad 0x3fc0484e4942aa43
> +        .quad 0x3fc035b0ea19373b
> +        .quad 0x3fc02319c494f951
> +        .quad 0x3fc01088d48d6e03
> +        .quad 0x3fbffbfc2bbc7803
> +        .quad 0x3fbfd6f308ce5b52
> +        .quad 0x3fbfb1f6381856f4
> +        .quad 0x3fbf8d05b16a6d47
> +        .quad 0x3fbf68216c9cc727
> +        .quad 0x3fbf4349618fa91a
> +        .quad 0x3fbf1e7d882b689a
> +        .quad 0x3fbef9bdd860616b
> +        .quad 0x3fbed50a4a26eafc
> +        .quad 0x3fbeb062d57f4de8
> +        .quad 0x3fbe8bc77271b97a
> +        .quad 0x3fbe6738190e394c
> +        .quad 0x3fbe42b4c16caaf3
> +        .quad 0x3fbe1e3d63acb3ba
> +        .quad 0x3fbdf9d1f7f5b674
> +        .quad 0x3fbdd5727676c959
> +        .quad 0x3fbdb11ed766abf4
> +        .quad 0x3fbd8cd71303bd26
> +        .quad 0x3fbd689b2193f133
> +        .quad 0x3fbd446afb64c7e5
> +        .quad 0x3fbd204698cb42bd
> +        .quad 0x3fbcfc2df223db2d
> +        .quad 0x3fbcd820ffd278f3
> +        .quad 0x3fbcb41fba42686d
> +        .quad 0x3fbc902a19e65111
> +        .quad 0x3fbc6c4017382bea
> +        .quad 0x3fbc4861aab93a23
> +        .quad 0x3fbc248eccf1fba6
> +        .quad 0x3fbc00c7767225cb
> +        .quad 0x3fbbdd0b9fd09a10
> +        .quad 0x3fbbb95b41ab5ce6
> +        .quad 0x3fbb95b654a78c87
> +        .quad 0x3fbb721cd17157e3
> +        .quad 0x3fbb4e8eb0bbf58f
> +        .quad 0x3fbb2b0beb419ad0
> +        .quad 0x3fbb079479c372ad
> +        .quad 0x3fbae4285509950b
> +        .quad 0x3fbac0c775e2fde6
> +        .quad 0x3fba9d71d5258484
> +        .quad 0x3fba7a276badd2c8
> +        .quad 0x3fba56e8325f5c87
> +        .quad 0x3fba33b4222456f1
> +        .quad 0x3fba108b33edb005
> +        .quad 0x3fb9ed6d60b30612
> +        .quad 0x3fb9ca5aa1729f45
> +        .quad 0x3fb9a752ef316149
> +        .quad 0x3fb9845642fac8f0
> +        .quad 0x3fb9616495e0e1e8
> +        .quad 0x3fb93e7de0fc3e80
> +        .quad 0x3fb91ba21d6bef77
> +        .quad 0x3fb8f8d144557bdf
> +        .quad 0x3fb8d60b4ee4d901
> +        .quad 0x3fb8b350364c6257
> +        .quad 0x3fb8909ff3c4d191
> +        .quad 0x3fb86dfa808d36a0
> +        .quad 0x3fb84b5fd5eaefd8
> +        .quad 0x3fb828cfed29a215
> +        .quad 0x3fb8064abf9b30f1
> +        .quad 0x3fb7e3d04697b704
> +        .quad 0x3fb7c1607b7d7e32
> +        .quad 0x3fb79efb57b0f803
> +        .quad 0x3fb77ca0d49cb608
> +        .quad 0x3fb75a50ebb1624a
> +        .quad 0x3fb7380b9665b7c8
> +        .quad 0x3fb715d0ce367afc
> +        .quad 0x3fb6f3a08ca67270
> +        .quad 0x3fb6d17acb3e5f5e
> +        .quad 0x3fb6af5f838cf654
> +        .quad 0x3fb68d4eaf26d7ee
> +        .quad 0x3fb66b4847a68997
> +        .quad 0x3fb6494c46ac6e4d
> +        .quad 0x3fb6275aa5debf81
> +        .quad 0x3fb605735ee985f1
> +        .quad 0x3fb5e3966b7e9295
> +        .quad 0x3fb5c1c3c5557799
> +        .quad 0x3fb59ffb662b815c
> +        .quad 0x3fb57e3d47c3af7b
> +        .quad 0x3fb55c8963e6adeb
> +        .quad 0x3fb53adfb462ce16
> +        .quad 0x3fb51940330c000b
> +        .quad 0x3fb4f7aad9bbcbaf
> +        .quad 0x3fb4d61fa2514a00
> +        .quad 0x3fb4b49e86b11e5f
> +        .quad 0x3fb4932780c56fe2
> +        .quad 0x3fb471ba8a7de2b7
> +        .quad 0x3fb450579dcf9186
> +        .quad 0x3fb42efeb4b506e9
> +        .quad 0x3fb40dafc92e36e2
> +        .quad 0x3fb3ec6ad5407868
> +        .quad 0x3fb3cb2fd2f67ef1
> +        .quad 0x3fb3a9febc60540a
> +        .quad 0x3fb388d78b9350ff
> +        .quad 0x3fb367ba3aaa1883
> +        .quad 0x3fb346a6c3c49066
> +        .quad 0x3fb3259d2107db54
> +        .quad 0x3fb3049d4c9e52a0
> +        .quad 0x3fb2e3a740b7800f
> +        .quad 0x3fb2c2baf78817b7
> +        .quad 0x3fb2a1d86b49f1e2
> +        .quad 0x3fb280ff963c04fc
> +        .quad 0x3fb2603072a25f82
> +        .quad 0x3fb23f6afac6220a
> +        .quad 0x3fb21eaf28f57941
> +        .quad 0x3fb1fdfcf7839804
> +        .quad 0x3fb1dd5460c8b16f
> +        .quad 0x3fb1bcb55f21f307
> +        .quad 0x3fb19c1fecf17ee0
> +        .quad 0x3fb17b94049e65d0
> +        .quad 0x3fb15b11a094a1aa
> +        .quad 0x3fb13a98bb450f81
> +        .quad 0x3fb11a294f2569f6
> +        .quad 0x3fb0f9c356b04389
> +        .quad 0x3fb0d966cc6500fa
> +        .quad 0x3fb0b913aac7d3a7
> +        .quad 0x3fb098c9ec61b3ff
> +        .quad 0x3fb078898bc05bf4
> +        .quad 0x3fb0585283764178
> +        .quad 0x3fb03824ce1a9101
> +        .quad 0x3fb0180066492817
> +        .quad 0x3fafefca8d451fd6
> +        .quad 0x3fafafa6d397efdb
> +        .quad 0x3faf6f9594de60f0
> +        .quad 0x3faf2f96c6754aee
> +        .quad 0x3faeefaa5dc2b239
> +        .quad 0x3faeafd05035bd3b
> +        .quad 0x3fae70089346a9e6
> +        .quad 0x3fae30531c76c34a
> +        .quad 0x3fadf0afe1505738
> +        .quad 0x3fadb11ed766abf4
> +        .quad 0x3fad719ff455f5f7
> +        .quad 0x3fad32332dc34dbd
> +        .quad 0x3facf2d8795ca5a5
> +        .quad 0x3facb38fccd8bfdb
> +        .quad 0x3fac74591df72456
> +        .quad 0x3fac3534628016dd
> +        .quad 0x3fabf62190448d22
> +        .quad 0x3fabb7209d1e24e5
> +        .quad 0x3fab78317eef1a29
> +        .quad 0x3fab39542ba23d73
> +        .quad 0x3faafa88992aea19
> +        .quad 0x3faabbcebd84fca0
> +        .quad 0x3faa7d268eb4c924
> +        .quad 0x3faa3e9002c711d2
> +        .quad 0x3faa000b0fd0fd6b
> +        .quad 0x3fa9c197abf00dd7
> +        .quad 0x3fa98335cd4a16c3
> +        .quad 0x3fa944e56a0d3450
> +        .quad 0x3fa906a6786fc1cb
> +        .quad 0x3fa8c878eeb05074
> +        .quad 0x3fa88a5cc3159e53
> +        .quad 0x3fa84c51ebee8d15
> +        .quad 0x3fa80e585f9218fc
> +        .quad 0x3fa7d070145f4fd7
> +        .quad 0x3fa7929900bd4809
> +        .quad 0x3fa754d31b1b179c
> +        .quad 0x3fa7171e59efcb5f
> +        .quad 0x3fa6d97ab3ba5e10
> +        .quad 0x3fa69be81f01af99
> +        .quad 0x3fa65e6692547c4e
> +        .quad 0x3fa620f604495440
> +        .quad 0x3fa5e3966b7e9295
> +        .quad 0x3fa5a647be9a54f6
> +        .quad 0x3fa56909f44a72fe
> +        .quad 0x3fa52bdd034475b8
> +        .quad 0x3fa4eec0e2458f30
> +        .quad 0x3fa4b1b588129203
> +        .quad 0x3fa474baeb77e904
> +        .quad 0x3fa437d103498eec
> +        .quad 0x3fa3faf7c663060e
> +        .quad 0x3fa3be2f2ba7501f
> +        .quad 0x3fa381772a00e604
> +        .quad 0x3fa344cfb861afae
> +        .quad 0x3fa30838cdc2fbfd
> +        .quad 0x3fa2cbb2612578b4
> +        .quad 0x3fa28f3c69912a74
> +        .quad 0x3fa252d6de1564c1
> +        .quad 0x3fa21681b5c8c213
> +        .quad 0x3fa1da3ce7c91bf8
> +        .quad 0x3fa19e086b3b8333
> +        .quad 0x3fa161e4374c37f4
> +        .quad 0x3fa125d0432ea20e
> +        .quad 0x3fa0e9cc861d4944
> +        .quad 0x3fa0add8f759cd95
> +        .quad 0x3fa071f58e2cdf9b
> +        .quad 0x3fa0362241e638ec
> +        .quad 0x3f9ff4be13b92920
> +        .quad 0x3f9f7d57badb4ee8
> +        .quad 0x3f9f061167fc31e8
> +        .quad 0x3f9e8eeb09f2f6cb
> +        .quad 0x3f9e17e48fa48962
> +        .quad 0x3f9da0fde8038de9
> +        .quad 0x3f9d2a3702105259
> +        .quad 0x3f9cb38fccd8bfdb
> +        .quad 0x3f9c3d0837784c41
> +        .quad 0x3f9bc6a03117eb97
> +        .quad 0x3f9b5057a8ee01ce
> +        .quad 0x3f9ada2e8e3e546f
> +        .quad 0x3f9a6424d059fc68
> +        .quad 0x3f99ee3a5e9f57e8
> +        .quad 0x3f99786f2879fc53
> +        .quad 0x3f9902c31d62a843
> +        .quad 0x3f988d362cdf359e
> +        .quad 0x3f9817c846828bbd
> +        .quad 0x3f97a27959ec91aa
> +        .quad 0x3f972d4956ca2067
> +        .quad 0x3f96b8382cd4f551
> +        .quad 0x3f964345cbd3a491
> +        .quad 0x3f95ce7223998b98
> +        .quad 0x3f9559bd2406c3ba
> +        .quad 0x3f94e526bd0814d1
> +        .quad 0x3f9470aede96e7f2
> +        .quad 0x3f93fc5578b93a38
> +        .quad 0x3f93881a7b818f9e
> +        .quad 0x3f9313fdd70ee5e8
> +        .quad 0x3f929fff7b8ca79d
> +        .quad 0x3f922c1f59329f1b
> +        .quad 0x3f91b85d6044e9ae
> +        .quad 0x3f9144b98113eac0
> +        .quad 0x3f90d133abfc3f1b
> +        .quad 0x3f905dcbd166b033
> +        .quad 0x3f8fd503c3904f1d
> +        .quad 0x3f8eeeab9b43445d
> +        .quad 0x3f8e088f0b004827
> +        .quad 0x3f8d22adf3f9579d
> +        .quad 0x3f8c3d0837784c41
> +        .quad 0x3f8b579db6dec358
> +        .quad 0x3f8a726e53a6056e
> +        .quad 0x3f898d79ef5eedf0
> +        .quad 0x3f88a8c06bb1d2f4
> +        .quad 0x3f87c441aa5e6d15
> +        .quad 0x3f86dffd8d3bbf70
> +        .quad 0x3f85fbf3f637ffc5
> +        .quad 0x3f851824c7587eb0
> +        .quad 0x3f84348fe2b99002
> +        .quad 0x3f8351352a8e733f
> +        .quad 0x3f826e1481213c2e
> +        .quad 0x3f818b2dc8d2bb91
> +        .quad 0x3f80a880e41a67f6
> +        .quad 0x3f7f8c1b6b0c8d4e
> +        .quad 0x3f7dc7a83f75a96d
> +        .quad 0x3f7c03a80ae5e054
> +        .quad 0x3f7a401a92ff827e
> +        .quad 0x3f787cff9d9147a5
> +        .quad 0x3f76ba56f09621bc
> +        .quad 0x3f74f8205235102d
> +        .quad 0x3f73365b88c0f347
> +        .quad 0x3f7175085ab85ff0
> +        .quad 0x3f6f684d1d8ae702
> +        .quad 0x3f6be76bd77b4fc3
> +        .quad 0x3f68676c71434fb9
> +        .quad 0x3f64e84e793a474a
> +        .quad 0x3f616a117e0d4b30
> +        .quad 0x3f5bd96a1d7d9cbc
> +        .quad 0x3f54e071754c98ba
> +        .quad 0x3f4bd27045bfd025
> +        .quad 0x3f3bcef518e29612
> +        .quad 0x8000000000000000
> +        /*== poly_coeff[5] ==*/
> +        .align 32
> +        .quad 0x3fb63C65231FBD16, 0x3fb63C65231FBD16, 0x3fb63C65231FBD16, 0x3fb63C65231FBD16 /* coeff5 */
> +        .quad 0xbfbBCB7D4EFBE80B, 0xbfbBCB7D4EFBE80B, 0xbfbBCB7D4EFBE80B, 0xbfbBCB7D4EFBE80B /* coeff4 */
> +        .quad 0x3fc287A7636F341E, 0x3fc287A7636F341E, 0x3fc287A7636F341E, 0x3fc287A7636F341E /* coeff3 */
> +        .quad 0xbfcBCB7B1526DE36, 0xbfcBCB7B1526DE36, 0xbfcBCB7B1526DE36, 0xbfcBCB7B1526DE36 /* coeff2 */
> +        .quad 0x3fdBCB7B1526E50E, 0x3fdBCB7B1526E50E, 0x3fdBCB7B1526E50E, 0x3fdBCB7B1526E50E /* coeff1 */
> +        /*== ExpMask ==*/
> +        .align 32
> +        .quad 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff
> +        /*== Two10 ==*/
> +        .align 32
> +        .quad 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000
> +        /*== MinNorm ==*/
> +        .align 32
> +        .quad 0x0010000000000000, 0x0010000000000000, 0x0010000000000000, 0x0010000000000000
> +        /*== MaxNorm ==*/
> +        .align 32
> +        .quad 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff
> +        /*== HalfMask ==*/
> +        .align 32
> +        .quad 0xfffffffffc000000, 0xfffffffffc000000, 0xfffffffffc000000, 0xfffffffffc000000
> +        /*== One ==*/
> +        .align 32
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== Threshold ==*/
> +        .align 32
> +        .quad 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000
> +        /*== Bias ==*/
> +        .align 32
> +        .quad 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000
> +        /*== Bias1 ==*/
> +        .align 32
> +        .quad 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000
> +        /*== L2 ==*/
> +        .align 32
> +        .quad 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff
> +        .align 32
> +        .type	__svml_dlog10_data_internal,@object
> +        .size	__svml_dlog10_data_internal,.-__svml_dlog10_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log108_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log108_core-avx2.S
> new file mode 100644
> index 0000000000..3432e7cffe
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log108_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized log10, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN8v_log10 _ZGVeN8v_log10_avx2_wrapper
> +#include "../svml_d_log108_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log108_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log108_core.c
> new file mode 100644
> index 0000000000..273a0d4739
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log108_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized log10, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN8v_log10
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN8v_log10, __GI__ZGVeN8v_log10, __redirect__ZGVeN8v_log10)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
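For context: with the IFUNC above in place, callers normally reach the new
entry points through the vector ABI rather than by name.  A minimal usage
sketch, assuming GCC on x86-64 with -ffast-math (so the SIMD declarations in
<bits/math-vector.h> apply) and -fopenmp-simd; the flags and function name
here are illustrative, not part of the patch:

#include <math.h>

/* Build e.g.: gcc -O2 -march=skylake-avx512 -ffast-math -fopenmp-simd \
   vlog10.c -lm   (libmvec is pulled in through the libm.so linker script).
   The loop is expected to vectorize into libmvec calls such as
   _ZGVeN8v_log10, or the ymm/xmm variants depending on the chosen vector
   width; the selector above then resolves the zmm symbol to the SKX
   implementation or the AVX2 wrapper at load time.  */
void
vector_log10 (double *restrict out, const double *restrict in, int n)
{
#pragma omp simd
  for (int i = 0; i < n; i++)
    out[i] = log10 (in[i]);
}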
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log108_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log108_core_avx512.S
> new file mode 100644
> index 0000000000..0799f99eba
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log108_core_avx512.S
> @@ -0,0 +1,299 @@
> +/* Function log10 vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
> + *    R = Rcp*mantissa(x) - 1.0
> + *    log10(x) = k*log10(2.0) - log10(Rcp) + poly_approximation(R)
> + *       log10(Rcp) is tabulated
> + *
> + *
> + */
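For readers following the kernel below, the reduction described above amounts
to the following scalar model (illustrative only: the hex constant is
log10(2) as stored in L2, and the two log10() calls stand in for the Log_tbl
lookup and the degree-9 polynomial in poly_coeff1..poly_coeff9):

#include <math.h>
#include <stdio.h>

static double
log10_reduction_sketch (double x)
{
  int k;
  double m = frexp (x, &k);              /* x = m * 2^k, m in [0.5, 1)  */
  /* Short reciprocal with a few fractional bits, as vrcp14pd followed
     by vrndscalepd produces below.  */
  double rcp = nearbyint (16.0 / m) / 16.0;
  double r = rcp * m - 1.0;              /* reduced argument, |r| < 1/32 */
  return k * 0x1.34413509f79ffp-2        /* k * log10(2), the L2 value   */
         - log10 (rcp)                   /* the Log_tbl lookup           */
         + log10 (1.0 + r);              /* what the polynomial covers   */
}

int
main (void)
{
  printf ("%a\n%a\n", log10_reduction_sketch (42.0), log10 (42.0));
  return 0;
}

The vector code performs the same split with vgetmantpd/vgetexppd (mantissa
normalized to [1, 2)) plus the DblRcp<0.75 exponent correction visible below;
the sketch only illustrates the shape of the computation.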
> +
> +/* Offsets for data table __svml_dlog10_data_internal_avx512
> + */
> +#define Log_tbl                       	0
> +#define One                           	128
> +#define C075                          	192
> +#define poly_coeff9                   	256
> +#define poly_coeff8                   	320
> +#define poly_coeff7                   	384
> +#define poly_coeff6                   	448
> +#define poly_coeff5                   	512
> +#define poly_coeff4                   	576
> +#define poly_coeff3                   	640
> +#define poly_coeff2                   	704
> +#define poly_coeff1                   	768
> +#define L2                            	832
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN8v_log10_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovaps   %zmm0, %zmm7
> +        vgetmantpd $8, {sae}, %zmm7, %zmm6
> +        vmovups   One+__svml_dlog10_data_internal_avx512(%rip), %zmm3
> +        vmovups   poly_coeff5+__svml_dlog10_data_internal_avx512(%rip), %zmm12
> +        vmovups   poly_coeff3+__svml_dlog10_data_internal_avx512(%rip), %zmm13
> +
> +/* Start polynomial evaluation */
> +        vmovups   poly_coeff9+__svml_dlog10_data_internal_avx512(%rip), %zmm10
> +        vmovups   poly_coeff8+__svml_dlog10_data_internal_avx512(%rip), %zmm1
> +        vmovups   poly_coeff7+__svml_dlog10_data_internal_avx512(%rip), %zmm11
> +        vmovups   poly_coeff6+__svml_dlog10_data_internal_avx512(%rip), %zmm14
> +
> +/* Prepare exponent correction: DblRcp<0.75? */
> +        vmovups   C075+__svml_dlog10_data_internal_avx512(%rip), %zmm2
> +
> +/* Table lookup */
> +        vmovups   __svml_dlog10_data_internal_avx512(%rip), %zmm5
> +
> +/* GetExp(x) */
> +        vgetexppd {sae}, %zmm7, %zmm0
> +
> +/* DblRcp ~ 1/Mantissa */
> +        vrcp14pd  %zmm6, %zmm8
> +
> +/* x<=0? */
> +        vfpclasspd $94, %zmm7, %k0
> +
> +/* round DblRcp to 4 fractional bits (RN mode, no Precision exception) */
> +        vrndscalepd $88, {sae}, %zmm8, %zmm4
> +        vmovups   poly_coeff4+__svml_dlog10_data_internal_avx512(%rip), %zmm8
> +        kmovw     %k0, %edx
> +
> +/* Reduced argument: R = DblRcp*Mantissa - 1 */
> +        vfmsub213pd {rn-sae}, %zmm3, %zmm4, %zmm6
> +        vcmppd    $17, {sae}, %zmm2, %zmm4, %k1
> +        vfmadd231pd {rn-sae}, %zmm6, %zmm12, %zmm8
> +        vmovups   poly_coeff2+__svml_dlog10_data_internal_avx512(%rip), %zmm12
> +        vfmadd231pd {rn-sae}, %zmm6, %zmm10, %zmm1
> +        vfmadd231pd {rn-sae}, %zmm6, %zmm11, %zmm14
> +        vmovups   poly_coeff1+__svml_dlog10_data_internal_avx512(%rip), %zmm2
> +
> +/* R^2 */
> +        vmulpd    {rn-sae}, %zmm6, %zmm6, %zmm15
> +        vfmadd231pd {rn-sae}, %zmm6, %zmm13, %zmm12
> +
> +/* Prepare table index */
> +        vpsrlq    $48, %zmm4, %zmm9
> +
> +/* add 1 to Expon if DblRcp<0.75 */
> +        vaddpd    {rn-sae}, %zmm3, %zmm0, %zmm0{%k1}
> +        vmulpd    {rn-sae}, %zmm15, %zmm15, %zmm13
> +        vfmadd213pd {rn-sae}, %zmm14, %zmm15, %zmm1
> +        vfmadd213pd {rn-sae}, %zmm12, %zmm15, %zmm8
> +        vpermt2pd Log_tbl+64+__svml_dlog10_data_internal_avx512(%rip), %zmm9, %zmm5
> +
> +/* polynomial */
> +        vfmadd213pd {rn-sae}, %zmm8, %zmm13, %zmm1
> +        vfmadd213pd {rn-sae}, %zmm2, %zmm6, %zmm1
> +        vfmadd213pd {rn-sae}, %zmm5, %zmm1, %zmm6
> +        vmovups   L2+__svml_dlog10_data_internal_avx512(%rip), %zmm1
> +        vfmadd213pd {rn-sae}, %zmm6, %zmm1, %zmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm7
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm7, 64(%rsp)
> +        vmovups   %zmm0, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     64(%rsp,%r14,8), %xmm0
> +        call      log10@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 128(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN8v_log10_skx)
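The special-input machinery above (shared by the other kernels in this
series) reduces to a per-lane fallback: the vfpclasspd mask is walked bit by
bit and each flagged lane is recomputed with the scalar libm routine.  A
rough C model, where in[]/out[] stand for the two 64-byte stack slots and
mask for the value kept in %r13d; the names are illustrative, not part of
the patch:

#include <math.h>

static void
fixup_special_lanes (double in[8], double out[8], unsigned int mask)
{
  /* Mirrors L(SPECIAL_VALUES_LOOP) / L(RANGEMASK_CHECK) /
     L(SCALAR_MATH_CALL): btl picks the next flagged lane, log10@PLT
     recomputes it, and the result overwrites the corresponding slot
     before zmm0 is reloaded.  */
  for (int lane = 0; lane < 8; lane++)
    if (mask & (1u << lane))
      out[lane] = log10 (in[lane]);
}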
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_dlog10_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 Log_tbl[16][2];
> +        __declspec(align(64)) VUINT32 One[8][2];
> +        __declspec(align(64)) VUINT32 C075[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff9[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff8[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff7[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff1[8][2];
> +        __declspec(align(64)) VUINT32 L2[8][2];
> +   } __svml_dlog10_data_internal_avx512;
> +#endif
> +__svml_dlog10_data_internal_avx512:
> +        /*== Log_tbl ==*/
> +        .quad 0x0000000000000000
> +        .quad 0xbf9af5f92b00e610
> +        .quad 0xbfaa30a9d609efea
> +        .quad 0xbfb31b3055c47118
> +        .quad 0xbfb8cf183886480d
> +        .quad 0xbfbe3bc1ab0e19fe
> +        .quad 0xbfc1b3e71ec94f7b
> +        .quad 0xbfc42c7e7fe3fc02
> +        .quad 0x3fbffbfc2bbc7803
> +        .quad 0x3fbb721cd17157e3
> +        .quad 0x3fb715d0ce367afc
> +        .quad 0x3fb2e3a740b7800f
> +        .quad 0x3fadb11ed766abf4
> +        .quad 0x3fa5e3966b7e9295
> +        .quad 0x3f9cb38fccd8bfdb
> +        .quad 0x3f8c3d0837784c41
> +        /*== One ==*/
> +        .align 64
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== 0.75 ==*/
> +        .align 64
> +        .quad 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000
> +        /*== poly_coeff9 ==*/
> +        .align 64
> +        .quad 0x3fa8c2d828480370, 0x3fa8c2d828480370, 0x3fa8c2d828480370, 0x3fa8c2d828480370, 0x3fa8c2d828480370, 0x3fa8c2d828480370, 0x3fa8c2d828480370, 0x3fa8c2d828480370
> +        /*== poly_coeff8 ==*/
> +        .align 64
> +        .quad 0xbfabd80d96029814, 0xbfabd80d96029814, 0xbfabd80d96029814, 0xbfabd80d96029814, 0xbfabd80d96029814, 0xbfabd80d96029814, 0xbfabd80d96029814, 0xbfabd80d96029814
> +        /*== poly_coeff7 ==*/
> +        .align 64
> +        .quad 0x3fafc3f6f38b58a2, 0x3fafc3f6f38b58a2, 0x3fafc3f6f38b58a2, 0x3fafc3f6f38b58a2, 0x3fafc3f6f38b58a2, 0x3fafc3f6f38b58a2, 0x3fafc3f6f38b58a2, 0x3fafc3f6f38b58a2
> +        /*== poly_coeff6 ==*/
> +        .align 64
> +        .quad 0xbfb287a63464dc80, 0xbfb287a63464dc80, 0xbfb287a63464dc80, 0xbfb287a63464dc80, 0xbfb287a63464dc80, 0xbfb287a63464dc80, 0xbfb287a63464dc80, 0xbfb287a63464dc80
> +        /*== poly_coeff5 ==*/
> +        .align 64
> +        .quad 0x3fb63c62777f27d9, 0x3fb63c62777f27d9, 0x3fb63c62777f27d9, 0x3fb63c62777f27d9, 0x3fb63c62777f27d9, 0x3fb63c62777f27d9, 0x3fb63c62777f27d9, 0x3fb63c62777f27d9
> +        /*== poly_coeff4 ==*/
> +        .align 64
> +        .quad 0xbfbbcb7b153c06a3, 0xbfbbcb7b153c06a3, 0xbfbbcb7b153c06a3, 0xbfbbcb7b153c06a3, 0xbfbbcb7b153c06a3, 0xbfbbcb7b153c06a3, 0xbfbbcb7b153c06a3, 0xbfbbcb7b153c06a3
> +        /*== poly_coeff3 ==*/
> +        .align 64
> +        .quad 0x3fc287a7636f428c, 0x3fc287a7636f428c, 0x3fc287a7636f428c, 0x3fc287a7636f428c, 0x3fc287a7636f428c, 0x3fc287a7636f428c, 0x3fc287a7636f428c, 0x3fc287a7636f428c
> +        /*== poly_coeff2 ==*/
> +        .align 64
> +        .quad 0xbfcbcb7b1526e4db, 0xbfcbcb7b1526e4db, 0xbfcbcb7b1526e4db, 0xbfcbcb7b1526e4db, 0xbfcbcb7b1526e4db, 0xbfcbcb7b1526e4db, 0xbfcbcb7b1526e4db, 0xbfcbcb7b1526e4db
> +        /*== poly_coeff1 ==*/
> +        .align 64
> +        .quad 0x3fdbcb7b1526e50e, 0x3fdbcb7b1526e50e, 0x3fdbcb7b1526e50e, 0x3fdbcb7b1526e50e, 0x3fdbcb7b1526e50e, 0x3fdbcb7b1526e50e, 0x3fdbcb7b1526e50e, 0x3fdbcb7b1526e50e
> +        /*== L2 ==*/
> +        .align 64
> +        .quad 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff, 0x3fd34413509f79ff
> +        .align 64
> +        .type	__svml_dlog10_data_internal_avx512,@object
> +        .size	__svml_dlog10_data_internal_avx512,.-__svml_dlog10_data_internal_avx512
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core-avx2.S
> new file mode 100644
> index 0000000000..e389e2eca1
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized log10f, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN16v_log10f _ZGVeN16v_log10f_avx2_wrapper
> +#include "../svml_s_log10f16_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core.c
> new file mode 100644
> index 0000000000..274fc7e0ff
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized log10f, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN16v_log10f
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN16v_log10f, __GI__ZGVeN16v_log10f,
> +	       __redirect__ZGVeN16v_log10f)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core_avx512.S
> new file mode 100644
> index 0000000000..3dffd662ab
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f16_core_avx512.S
> @@ -0,0 +1,238 @@
> +/* Function log10f vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
> + *    R = Rcp*mantissa(x) - 1.0
> + *    log10(x) = k*log10(2.0) - log10(Rcp) + poly_approximation(R)
> + *       log10(Rcp) is tabulated
> + *
> + *
> + */
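In the kernel below the reduction is applied directly to the mantissa:
vgetmantps normalizes it to [0.75, 1.5), r = m - 1, and the top four
fraction bits select one of 16 coefficient sets via the vpsrld $19 +
vpermps pair.  A scalar sketch of that indexing, with the coefficient
tables passed as parameters because their values live in the c1..c4 rows
of the data section (illustrative only, not part of the patch):

#include <math.h>
#include <stdint.h>
#include <string.h>

static float
log10f_sketch (float x, const float c1[16], const float c2[16],
               const float c3[16], const float c4[16])
{
  int e;
  float m = frexpf (x, &e);              /* x = m * 2^e, m in [0.5, 1)  */
  if (m < 0.75f)                         /* renormalize to [0.75, 1.5), */
    {                                    /* matching vgetmantps $11     */
      m *= 2.0f;
      e -= 1;
    }
  uint32_t bits;
  memcpy (&bits, &m, sizeof bits);
  unsigned int idx = (bits >> 19) & 0xf; /* top 4 fraction bits pick the
                                            per-interval coefficients   */
  float r = m - 1.0f;                    /* reduced argument            */
  float p = ((c4[idx] * r + c3[idx]) * r + c2[idx]) * r + c1[idx];
  return (float) e * 0x1.344136p-2f + r * p;  /* e*log10(2) + r*poly    */
}

The Horner order matches the vfmadd213ps chain in the kernel; negative,
zero and non-finite inputs are left to the special-value path.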
> +
> +/* Offsets for data table __svml_slog10_data_internal_avx512
> + */
> +#define One                           	0
> +#define coeff4                        	64
> +#define coeff3                        	128
> +#define coeff2                        	192
> +#define coeff1                        	256
> +#define L2                            	320
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN16v_log10f_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vgetmantps $11, {sae}, %zmm0, %zmm3
> +        vmovups   __svml_slog10_data_internal_avx512(%rip), %zmm1
> +        vgetexpps {sae}, %zmm0, %zmm5
> +        vmovups   L2+__svml_slog10_data_internal_avx512(%rip), %zmm10
> +        vpsrld    $19, %zmm3, %zmm7
> +        vgetexpps {sae}, %zmm3, %zmm6
> +        vsubps    {rn-sae}, %zmm1, %zmm3, %zmm11
> +        vpermps   coeff4+__svml_slog10_data_internal_avx512(%rip), %zmm7, %zmm1
> +        vpermps   coeff3+__svml_slog10_data_internal_avx512(%rip), %zmm7, %zmm2
> +        vsubps    {rn-sae}, %zmm6, %zmm5, %zmm9
> +        vpermps   coeff2+__svml_slog10_data_internal_avx512(%rip), %zmm7, %zmm4
> +        vpermps   coeff1+__svml_slog10_data_internal_avx512(%rip), %zmm7, %zmm8
> +
> +/* x<=0? */
> +        vfpclassps $94, %zmm0, %k0
> +        vfmadd213ps {rn-sae}, %zmm2, %zmm11, %zmm1
> +        vmulps    {rn-sae}, %zmm10, %zmm9, %zmm12
> +        vfmadd213ps {rn-sae}, %zmm4, %zmm11, %zmm1
> +        kmovw     %k0, %edx
> +        vfmadd213ps {rn-sae}, %zmm8, %zmm11, %zmm1
> +        vfmadd213ps {rn-sae}, %zmm12, %zmm11, %zmm1
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        vmovaps   %zmm1, %zmm0
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm0, 64(%rsp)
> +        vmovups   %zmm1, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm1
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $16, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm1
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm1
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     64(%rsp,%r14,4), %xmm0
> +        call      log10f@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 128(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN16v_log10f_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_slog10_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 One[16][1];
> +        __declspec(align(64)) VUINT32 coeff4[16][1];
> +        __declspec(align(64)) VUINT32 coeff3[16][1];
> +        __declspec(align(64)) VUINT32 coeff2[16][1];
> +        __declspec(align(64)) VUINT32 coeff1[16][1];
> +        __declspec(align(64)) VUINT32 L2[16][1];
> +    } __svml_slog10_data_internal_avx512;
> +#endif
> +__svml_slog10_data_internal_avx512:
> +        /*== One ==*/
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        // c4
> +        .align 64
> +        .long 0xbdc9ae9b, 0xbda6fcf4
> +        .long 0xbd8bac76, 0xbd6bca30
> +        .long 0xbd48a99b, 0xbd2c0a9f
> +        .long 0xbd1480db, 0xbd00faf2
> +        .long 0xbe823aa9, 0xbe656348
> +        .long 0xbe4afbb9, 0xbe346895
> +        .long 0xbe20ffff, 0xbe103a0b
> +        .long 0xbe01a91c, 0xbde9e84e
> +        // c3
> +        .align 64
> +        .long 0x3e13d888, 0x3e10a87c
> +        .long 0x3e0b95c3, 0x3e057f0b
> +        .long 0x3dfde038, 0x3df080d9
> +        .long 0x3de34c1e, 0x3dd68333
> +        .long 0x3dac6e8e, 0x3dd54a51
> +        .long 0x3df30f40, 0x3e04235d
> +        .long 0x3e0b7033, 0x3e102c90
> +        .long 0x3e12ebad, 0x3e141ff8
> +        // c2
> +        .align 64
> +        .long 0xbe5e5a9b, 0xbe5e2677
> +        .long 0xbe5d83f5, 0xbe5c6016
> +        .long 0xbe5abd0b, 0xbe58a6fd
> +        .long 0xbe562e02, 0xbe5362f8
> +        .long 0xbe68e27c, 0xbe646747
> +        .long 0xbe619a73, 0xbe5ff05a
> +        .long 0xbe5f0570, 0xbe5e92d0
> +        .long 0xbe5e662b, 0xbe5e5c08
> +        // c1
> +        .align 64
> +        .long 0x3ede5bd8, 0x3ede5b45
> +        .long 0x3ede57d8, 0x3ede4eb1
> +        .long 0x3ede3d37, 0x3ede2166
> +        .long 0x3eddf9d9, 0x3eddc5bb
> +        .long 0x3ede08ed, 0x3ede32e7
> +        .long 0x3ede4967, 0x3ede5490
> +        .long 0x3ede597f, 0x3ede5b50
> +        .long 0x3ede5bca, 0x3ede5bd9
> +        /*== L2 ==*/
> +        .align 64
> +        .long 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b
> +        .align 64
> +        .type	__svml_slog10_data_internal_avx512,@object
> +        .size	__svml_slog10_data_internal_avx512,.-__svml_slog10_data_internal_avx512
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core-sse2.S
> new file mode 100644
> index 0000000000..bb1cdee37e
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized log10f, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN4v_log10f _ZGVbN4v_log10f_sse2
> +#include "../svml_s_log10f4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core.c
> new file mode 100644
> index 0000000000..67e9e71a76
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized log10f, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN4v_log10f
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN4v_log10f, __GI__ZGVbN4v_log10f,
> +	       __redirect__ZGVbN4v_log10f)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core_sse4.S
> new file mode 100644
> index 0000000000..88b3535d5c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f4_core_sse4.S
> @@ -0,0 +1,243 @@
> +/* Function log10f vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
> + *    R = Rcp*x - 1.0
> + *    log10(x) = k*log10(2.0) - log10(Rcp) + poly_approximation(R)
> + *       log10(Rcp) is tabulated
> + *
> + *
> + */
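
In this kernel the reduction is done by rebiasing the input around iBrkValue (SP 2/3) rather than through an explicit reciprocal table; the following scalar C model of what the SSE4 code below computes is illustrative only and not part of the patch (the constants are the iBrkValue, sPoly and L2 entries from the data table at the end of this file):

#include <stdint.h>
#include <string.h>

static float
asfloat (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof (f));
  return f;
}

/* Model: rebias x around iBrkValue so that x = 2^k * m with m in
   roughly [2/3, 4/3), then log10f(x) ~= k*log10(2) + R*P(R), R = m-1.  */
static float
log10f_model (float x)
{
  uint32_t ix;
  memcpy (&ix, &x, sizeof (ix));
  uint32_t t = ix - 0x3f2aaaabu;                        /* iBrkValue          */
  float k = (float) ((int32_t) t >> 23);                /* psrad $23 (arith.) */
  float m = asfloat ((t & 0x007fffffu) + 0x3f2aaaabu);  /* iOffExpoMask       */
  float r = m - 1.0f;                                   /* reduced argument R */
  /* Degree-9 polynomial, coefficients c9..c1 as in sPoly below.  */
  static const uint32_t c[9] = {
    0x3d8063b4, 0xbd890073, 0x3d775317, 0xbd91fb27, 0x3db20b96,
    0xbdde6e20, 0x3e143ce5, 0xbe5e5bc5, 0x3ede5bd9
  };
  float p = asfloat (c[0]);
  for (int i = 1; i < 9; i++)
    p = p * r + asfloat (c[i]);
  /* k*L2 + R*P(R); the L2H/L2L split used for extra accuracy, and the
     exact SIMD evaluation order, are folded into one constant here.  */
  return k * asfloat (0x3e9a209bu) + r * p;
}

Inputs outside [MinNorm, MaxNorm] (zero, negatives, denormals, infinities, NaN) are not modeled here; the kernel routes those lanes through the scalar log10f call in its special-values path.
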
> +
> +/* Offsets for data table __svml_slog10_data_internal
> + */
> +#define MinNorm                       	0
> +#define MaxNorm                       	16
> +#define L2H                           	32
> +#define L2L                           	48
> +#define iBrkValue                     	64
> +#define iOffExpoMask                  	80
> +#define One                           	96
> +#define sPoly                         	112
> +#define L2                            	256
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN4v_log10f_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +        movaps    %xmm0, %xmm1
> +
> +/* reduction: compute r,n */
> +        movdqu    iBrkValue+__svml_slog10_data_internal(%rip), %xmm2
> +        movaps    %xmm0, %xmm4
> +        movdqu    iOffExpoMask+__svml_slog10_data_internal(%rip), %xmm10
> +        psubd     %xmm2, %xmm1
> +        pand      %xmm1, %xmm10
> +        psrad     $23, %xmm1
> +        paddd     %xmm2, %xmm10
> +        movaps    %xmm0, %xmm3
> +        movups    sPoly+__svml_slog10_data_internal(%rip), %xmm5
> +        movups    sPoly+32+__svml_slog10_data_internal(%rip), %xmm6
> +        movups    sPoly+64+__svml_slog10_data_internal(%rip), %xmm7
> +        movups    sPoly+96+__svml_slog10_data_internal(%rip), %xmm9
> +        cvtdq2ps  %xmm1, %xmm12
> +        cmpltps   MinNorm+__svml_slog10_data_internal(%rip), %xmm4
> +        cmpnleps  MaxNorm+__svml_slog10_data_internal(%rip), %xmm3
> +        subps     One+__svml_slog10_data_internal(%rip), %xmm10
> +        mulps     %xmm10, %xmm5
> +        movaps    %xmm10, %xmm8
> +        mulps     %xmm10, %xmm6
> +        mulps     %xmm10, %xmm8
> +        addps     sPoly+16+__svml_slog10_data_internal(%rip), %xmm5
> +        mulps     %xmm10, %xmm7
> +        addps     sPoly+48+__svml_slog10_data_internal(%rip), %xmm6
> +        mulps     %xmm10, %xmm9
> +        mulps     %xmm8, %xmm5
> +        addps     sPoly+80+__svml_slog10_data_internal(%rip), %xmm7
> +        addps     sPoly+112+__svml_slog10_data_internal(%rip), %xmm9
> +        addps     %xmm5, %xmm6
> +        mulps     %xmm8, %xmm6
> +        orps      %xmm3, %xmm4
> +
> +/* combine and get argument value range mask */
> +        movmskps  %xmm4, %edx
> +        movups    L2L+__svml_slog10_data_internal(%rip), %xmm1
> +        addps     %xmm6, %xmm7
> +        mulps     %xmm12, %xmm1
> +        mulps     %xmm7, %xmm8
> +        movups    L2H+__svml_slog10_data_internal(%rip), %xmm11
> +        addps     %xmm8, %xmm9
> +        mulps     %xmm11, %xmm12
> +        mulps     %xmm10, %xmm9
> +        addps     sPoly+128+__svml_slog10_data_internal(%rip), %xmm9
> +        mulps     %xmm9, %xmm10
> +        addps     %xmm10, %xmm1
> +        addps     %xmm12, %xmm1
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm1
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movaps    %xmm1, %xmm0
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm0, 32(%rsp)
> +        movups    %xmm1, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm1
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm1
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      log10f@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 48(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN4v_log10f_sse4)
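
The special-values path above follows a simple per-lane pattern: input and vector result are spilled to the stack, the range mask is scanned bit by bit, and each flagged lane is recomputed with the scalar routine and patched back. A minimal C sketch of that pattern (illustrative only; the function name is made up):

#include <math.h>

static void
log10f_special_lanes (const float in[4], float out[4], unsigned int mask)
{
  for (int lane = 0; lane < 4; lane++)   /* SPECIAL_VALUES_LOOP */
    if (mask & (1u << lane))             /* btl %r12d, %r13d    */
      out[lane] = log10f (in[lane]);     /* call log10f@PLT     */
}
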
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_slog10_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(16)) VUINT32 MinNorm[4][1];
> +        __declspec(align(16)) VUINT32 MaxNorm[4][1];
> +        __declspec(align(16)) VUINT32 L2H[4][1];
> +        __declspec(align(16)) VUINT32 L2L[4][1];
> +        __declspec(align(16)) VUINT32 iBrkValue[4][1];
> +        __declspec(align(16)) VUINT32 iOffExpoMask[4][1];
> +        __declspec(align(16)) VUINT32 One[4][1];
> +        __declspec(align(16)) VUINT32 sPoly[9][4][1];
> +        __declspec(align(16)) VUINT32 L2[4][1];
> +} __svml_slog10_data_internal;
> +#endif
> +__svml_slog10_data_internal:
> +        /*== MinNorm ==*/
> +        .long 0x00800000, 0x00800000, 0x00800000, 0x00800000
> +        /*== MaxNorm ==*/
> +        .align 16
> +        .long 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff
> +        /*== L2H ==*/
> +        .align 16
> +        .long 0x3e9a2100, 0x3e9a2100, 0x3e9a2100, 0x3e9a2100
> +        /*== L2L ==*/
> +        .align 16
> +        .long 0xb64AF600, 0xb64AF600, 0xb64AF600, 0xb64AF600
> +        /*== iBrkValue = SP 2/3 ==*/
> +        .align 16
> +        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
> +        /*== iOffExpoMask = SP significand mask ==*/
> +        .align 16
> +        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
> +        /*== sOne = SP 1.0 ==*/
> +        .align 16
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== spoly[9] ==*/
> +        .align 16
> +        .long 0x3d8063B4, 0x3d8063B4, 0x3d8063B4, 0x3d8063B4 /* coeff9 */
> +        .long 0xbd890073, 0xbd890073, 0xbd890073, 0xbd890073 /* coeff8 */
> +        .long 0x3d775317, 0x3d775317, 0x3d775317, 0x3d775317 /* coeff7 */
> +        .long 0xbd91FB27, 0xbd91FB27, 0xbd91FB27, 0xbd91FB27 /* coeff6 */
> +        .long 0x3dB20B96, 0x3dB20B96, 0x3dB20B96, 0x3dB20B96 /* coeff5 */
> +        .long 0xbdDE6E20, 0xbdDE6E20, 0xbdDE6E20, 0xbdDE6E20 /* coeff4 */
> +        .long 0x3e143CE5, 0x3e143CE5, 0x3e143CE5, 0x3e143CE5 /* coeff3 */
> +        .long 0xbe5E5BC5, 0xbe5E5BC5, 0xbe5E5BC5, 0xbe5E5BC5 /* coeff2 */
> +        .long 0x3eDE5BD9, 0x3eDE5BD9, 0x3eDE5BD9, 0x3eDE5BD9 /* coeff1 */
> +        /*== L2 ==*/
> +        .align 16
> +        .long 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b
> +        .align 16
> +        .type	__svml_slog10_data_internal,@object
> +        .size	__svml_slog10_data_internal,.-__svml_slog10_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core-sse.S
> new file mode 100644
> index 0000000000..e3467e5c90
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized log10f, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN8v_log10f _ZGVdN8v_log10f_sse_wrapper
> +#include "../svml_s_log10f8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core.c
> new file mode 100644
> index 0000000000..bfd3ef6554
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized log10f, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN8v_log10f
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN8v_log10f, __GI__ZGVdN8v_log10f,
> +	       __redirect__ZGVdN8v_log10f)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core_avx2.S
> new file mode 100644
> index 0000000000..58e26342e7
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log10f8_core_avx2.S
> @@ -0,0 +1,243 @@
> +/* Function log10f vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    Get short reciprocal approximation Rcp ~ 1/mantissa(x)
> + *    R = Rcp*x - 1.0
> + *    log10(x) = k*log10(2.0) - log10(Rcp) + poly_approximation(R)
> + *       log10(Rcp) is tabulated
> + *
> + *
> + */
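
The AVX2 kernel evaluates the same degree-9 polynomial as the SSE4 version, but folds it with paired vfmadd213ps steps on R and R^2 before applying the split log10(2) = L2H + L2L. A scalar sketch of that schedule, with fmaf standing in for the vector FMAs (illustrative only; c[0..8] hold coeff1..coeff9 from the table below, k is the unbiased exponent):

#include <math.h>

static float
log10f_poly_fma (float r, float k, const float c[9], float l2h, float l2l)
{
  float r2  = r * r;
  float p98 = fmaf (c[8], r, c[7]);  /* c9*R + c8 */
  float p76 = fmaf (c[6], r, c[5]);  /* c7*R + c6 */
  float p54 = fmaf (c[4], r, c[3]);  /* c5*R + c4 */
  float p32 = fmaf (c[2], r, c[1]);  /* c3*R + c2 */
  float p   = fmaf (p98, r2, p76);
  p = fmaf (p, r2, p54);
  p = fmaf (p, r2, p32);
  p = fmaf (p, r, c[0]);             /* + c1    */
  p = fmaf (p, r, k * l2l);          /* + k*L2L */
  return fmaf (k, l2h, p);           /* + k*L2H */
}
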
> +
> +/* Offsets for data table __svml_slog10_data_internal
> + */
> +#define MinNorm                       	0
> +#define MaxNorm                       	32
> +#define L2H                           	64
> +#define L2L                           	96
> +#define iBrkValue                     	128
> +#define iOffExpoMask                  	160
> +#define One                           	192
> +#define sPoly                         	224
> +#define L2                            	512
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN8v_log10f_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +
> +/* reduction: compute r,n */
> +        vmovups   iBrkValue+__svml_slog10_data_internal(%rip), %ymm4
> +        vmovups   sPoly+__svml_slog10_data_internal(%rip), %ymm15
> +        vmovups   sPoly+64+__svml_slog10_data_internal(%rip), %ymm9
> +        vmovups   sPoly+128+__svml_slog10_data_internal(%rip), %ymm10
> +        vmovups   sPoly+192+__svml_slog10_data_internal(%rip), %ymm12
> +        vpsubd    %ymm4, %ymm0, %ymm1
> +        vcmplt_oqps MinNorm+__svml_slog10_data_internal(%rip), %ymm0, %ymm5
> +        vcmpnle_uqps MaxNorm+__svml_slog10_data_internal(%rip), %ymm0, %ymm6
> +        vpand     iOffExpoMask+__svml_slog10_data_internal(%rip), %ymm1, %ymm3
> +        vpsrad    $23, %ymm1, %ymm2
> +        vpaddd    %ymm4, %ymm3, %ymm8
> +        vcvtdq2ps %ymm2, %ymm1
> +        vsubps    One+__svml_slog10_data_internal(%rip), %ymm8, %ymm13
> +        vmulps    L2L+__svml_slog10_data_internal(%rip), %ymm1, %ymm14
> +        vfmadd213ps sPoly+32+__svml_slog10_data_internal(%rip), %ymm13, %ymm15
> +        vfmadd213ps sPoly+96+__svml_slog10_data_internal(%rip), %ymm13, %ymm9
> +        vmulps    %ymm13, %ymm13, %ymm11
> +        vfmadd213ps sPoly+160+__svml_slog10_data_internal(%rip), %ymm13, %ymm10
> +        vfmadd213ps sPoly+224+__svml_slog10_data_internal(%rip), %ymm13, %ymm12
> +        vfmadd213ps %ymm9, %ymm11, %ymm15
> +        vfmadd213ps %ymm10, %ymm11, %ymm15
> +        vfmadd213ps %ymm12, %ymm11, %ymm15
> +        vfmadd213ps sPoly+256+__svml_slog10_data_internal(%rip), %ymm13, %ymm15
> +        vfmadd213ps %ymm14, %ymm13, %ymm15
> +        vorps     %ymm6, %ymm5, %ymm7
> +
> +/* combine and get argument value range mask */
> +        vmovmskps %ymm7, %edx
> +        vfmadd132ps L2H+__svml_slog10_data_internal(%rip), %ymm15, %ymm1
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm1
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        vmovaps   %ymm1, %ymm0
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %ymm0, 32(%rsp)
> +        vmovups   %ymm1, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm1
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   64(%rsp), %ymm1
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm1
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      log10f@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 64(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN8v_log10f_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_slog10_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(32)) VUINT32 MinNorm[8][1];
> +        __declspec(align(32)) VUINT32 MaxNorm[8][1];
> +        __declspec(align(32)) VUINT32 L2H[8][1];
> +        __declspec(align(32)) VUINT32 L2L[8][1];
> +        __declspec(align(32)) VUINT32 iBrkValue[8][1];
> +        __declspec(align(32)) VUINT32 iOffExpoMask[8][1];
> +        __declspec(align(32)) VUINT32 One[8][1];
> +        __declspec(align(32)) VUINT32 sPoly[9][8][1];
> +        __declspec(align(32)) VUINT32 L2[8][1];
> +} __svml_slog10_data_internal;
> +#endif
> +__svml_slog10_data_internal:
> +        /*== MinNorm ==*/
> +        .long 0x00800000, 0x00800000, 0x00800000, 0x00800000, 0x00800000, 0x00800000, 0x00800000, 0x00800000
> +        /*== MaxNorm ==*/
> +        .align 32
> +        .long 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff
> +        /*== L2H ==*/
> +        .align 32
> +        .long 0x3e9a2100, 0x3e9a2100, 0x3e9a2100, 0x3e9a2100, 0x3e9a2100, 0x3e9a2100, 0x3e9a2100, 0x3e9a2100
> +        /*== L2L ==*/
> +        .align 32
> +        .long 0xb64AF600, 0xb64AF600, 0xb64AF600, 0xb64AF600, 0xb64AF600, 0xb64AF600, 0xb64AF600, 0xb64AF600
> +        /*== iBrkValue = SP 2/3 ==*/
> +        .align 32
> +        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
> +        /*== iOffExpoMask = SP significand mask ==*/
> +        .align 32
> +        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
> +        /*== sOne = SP 1.0 ==*/
> +        .align 32
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== spoly[9] ==*/
> +        .align 32
> +        .long 0x3d8063B4, 0x3d8063B4, 0x3d8063B4, 0x3d8063B4, 0x3d8063B4, 0x3d8063B4, 0x3d8063B4, 0x3d8063B4 /* coeff9 */
> +        .long 0xbd890073, 0xbd890073, 0xbd890073, 0xbd890073, 0xbd890073, 0xbd890073, 0xbd890073, 0xbd890073 /* coeff8 */
> +        .long 0x3d775317, 0x3d775317, 0x3d775317, 0x3d775317, 0x3d775317, 0x3d775317, 0x3d775317, 0x3d775317 /* coeff7 */
> +        .long 0xbd91FB27, 0xbd91FB27, 0xbd91FB27, 0xbd91FB27, 0xbd91FB27, 0xbd91FB27, 0xbd91FB27, 0xbd91FB27 /* coeff6 */
> +        .long 0x3dB20B96, 0x3dB20B96, 0x3dB20B96, 0x3dB20B96, 0x3dB20B96, 0x3dB20B96, 0x3dB20B96, 0x3dB20B96 /* coeff5 */
> +        .long 0xbdDE6E20, 0xbdDE6E20, 0xbdDE6E20, 0xbdDE6E20, 0xbdDE6E20, 0xbdDE6E20, 0xbdDE6E20, 0xbdDE6E20 /* coeff4 */
> +        .long 0x3e143CE5, 0x3e143CE5, 0x3e143CE5, 0x3e143CE5, 0x3e143CE5, 0x3e143CE5, 0x3e143CE5, 0x3e143CE5 /* coeff3 */
> +        .long 0xbe5E5BC5, 0xbe5E5BC5, 0xbe5E5BC5, 0xbe5E5BC5, 0xbe5E5BC5, 0xbe5E5BC5, 0xbe5E5BC5, 0xbe5E5BC5 /* coeff2 */
> +        .long 0x3eDE5BD9, 0x3eDE5BD9, 0x3eDE5BD9, 0x3eDE5BD9, 0x3eDE5BD9, 0x3eDE5BD9, 0x3eDE5BD9, 0x3eDE5BD9 /* coeff1 */
> +        /*== L2 ==*/
> +        .align 32
> +        .long 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b, 0x3e9a209b
> +        .align 32
> +        .type	__svml_slog10_data_internal,@object
> +        .size	__svml_slog10_data_internal,.-__svml_slog10_data_internal
> diff --git a/sysdeps/x86_64/fpu/svml_d_log102_core.S b/sysdeps/x86_64/fpu/svml_d_log102_core.S
> new file mode 100644
> index 0000000000..3d0c058ac2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_log102_core.S
> @@ -0,0 +1,29 @@
> +/* Function log10 vectorized with SSE2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN2v_log10)
> +WRAPPER_IMPL_SSE2 log10
> +END (_ZGVbN2v_log10)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN2v_log10)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_log104_core.S b/sysdeps/x86_64/fpu/svml_d_log104_core.S
> new file mode 100644
> index 0000000000..9e32c62c0e
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_log104_core.S
> @@ -0,0 +1,29 @@
> +/* Function log10 vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN4v_log10)
> +WRAPPER_IMPL_AVX _ZGVbN2v_log10
> +END (_ZGVdN4v_log10)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN4v_log10)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_log104_core_avx.S b/sysdeps/x86_64/fpu/svml_d_log104_core_avx.S
> new file mode 100644
> index 0000000000..2b073b16f9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_log104_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function log10 vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVcN4v_log10)
> +WRAPPER_IMPL_AVX _ZGVbN2v_log10
> +END (_ZGVcN4v_log10)
> diff --git a/sysdeps/x86_64/fpu/svml_d_log108_core.S b/sysdeps/x86_64/fpu/svml_d_log108_core.S
> new file mode 100644
> index 0000000000..853d791f2d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_log108_core.S
> @@ -0,0 +1,25 @@
> +/* Function log10 vectorized with AVX-512, wrapper to AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN8v_log10)
> +WRAPPER_IMPL_AVX512 _ZGVdN4v_log10
> +END (_ZGVeN8v_log10)
> diff --git a/sysdeps/x86_64/fpu/svml_s_log10f16_core.S b/sysdeps/x86_64/fpu/svml_s_log10f16_core.S
> new file mode 100644
> index 0000000000..769603c92d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_log10f16_core.S
> @@ -0,0 +1,25 @@
> +/* Function log10f vectorized with AVX-512. Wrapper to AVX2 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN16v_log10f)
> +WRAPPER_IMPL_AVX512 _ZGVdN8v_log10f
> +END (_ZGVeN16v_log10f)
> diff --git a/sysdeps/x86_64/fpu/svml_s_log10f4_core.S b/sysdeps/x86_64/fpu/svml_s_log10f4_core.S
> new file mode 100644
> index 0000000000..523525409b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_log10f4_core.S
> @@ -0,0 +1,29 @@
> +/* Function log10f vectorized with SSE2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN4v_log10f)
> +WRAPPER_IMPL_SSE2 log10f
> +END (_ZGVbN4v_log10f)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN4v_log10f)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_log10f8_core.S b/sysdeps/x86_64/fpu/svml_s_log10f8_core.S
> new file mode 100644
> index 0000000000..630ec76b7f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_log10f8_core.S
> @@ -0,0 +1,29 @@
> +/* Function log10f vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN8v_log10f)
> +WRAPPER_IMPL_AVX _ZGVbN4v_log10f
> +END (_ZGVdN8v_log10f)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN8v_log10f)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_log10f8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_log10f8_core_avx.S
> new file mode 100644
> index 0000000000..374208cb2c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_log10f8_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function log10f vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +        .text
> +ENTRY (_ZGVcN8v_log10f)
> +WRAPPER_IMPL_AVX _ZGVbN4v_log10f
> +END (_ZGVcN8v_log10f)
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log10-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-log10-avx.c
> new file mode 100644
> index 0000000000..770fd725e0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-log10-avx.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-log10.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log10-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-log10-avx2.c
> new file mode 100644
> index 0000000000..770fd725e0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-log10-avx2.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-log10.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log10-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-log10-avx512f.c
> new file mode 100644
> index 0000000000..770fd725e0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-log10-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-log10.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log10.c b/sysdeps/x86_64/fpu/test-double-libmvec-log10.c
> new file mode 100644
> index 0000000000..cb1ab36819
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-log10.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE double
> +#define LIBMVEC_FUNC log10
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> index 37a7a1c777..3dce136dfc 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> @@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVbN2v_expm1)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVbN2v_sinh)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVbN2v_cbrt)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVbN2vv_atan2)
> +VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVbN2v_log10)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> index 4313f67e06..1852625897 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> @@ -41,6 +41,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVdN4v_expm1)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVdN4v_sinh)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVdN4v_cbrt)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVdN4vv_atan2)
> +VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVdN4v_log10)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m256i
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> index 4b8b00f16d..cf9ea35ffe 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> @@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVcN4v_expm1)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVcN4v_sinh)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVcN4v_cbrt)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVcN4vv_atan2)
> +VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVcN4v_log10)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> index d06522a407..b6457ea032 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> @@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expm1), _ZGVeN8v_expm1)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinh), _ZGVeN8v_sinh)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVeN8v_cbrt)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVeN8vv_atan2)
> +VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVeN8v_log10)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m512i
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx.c
> new file mode 100644
> index 0000000000..04f017f1e2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-log10f.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx2.c
> new file mode 100644
> index 0000000000..04f017f1e2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx2.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-log10f.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx512f.c
> new file mode 100644
> index 0000000000..04f017f1e2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-log10f-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-log10f.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log10f.c b/sysdeps/x86_64/fpu/test-float-libmvec-log10f.c
> new file mode 100644
> index 0000000000..682ce1e239
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-log10f.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE float
> +#define LIBMVEC_FUNC log10f
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> index 0bd631bf9a..272e754e1b 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> @@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVeN16v_expm1f)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVeN16v_sinhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVeN16v_cbrtf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVeN16vv_atan2f)
> +VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVeN16v_log10f)
>  
>  #define VEC_INT_TYPE __m512i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> index 1018398bd3..b892258b99 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> @@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVbN4v_expm1f)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVbN4v_sinhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVbN4v_cbrtf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVbN4vv_atan2f)
> +VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVbN4v_log10f)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> index 42ea28f30f..1c6ead71e1 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> @@ -41,6 +41,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVdN8v_expm1f)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVdN8v_sinhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVdN8v_cbrtf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVdN8vv_atan2f)
> +VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVdN8v_log10f)
>  
>  /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
>  #undef VECTOR_WRAPPER_fFF
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> index 70a0216a07..71f5d8d7b6 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> @@ -38,6 +38,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (expm1f), _ZGVcN8v_expm1f)
>  VECTOR_WRAPPER (WRAPPER_NAME (sinhf), _ZGVcN8v_sinhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVcN8v_cbrtf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVcN8vv_atan2f)
> +VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVcN8v_log10f)
>  
>  #define VEC_INT_TYPE __m128i
>  
> -- 
> 2.31.1
> 

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.


H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 15/18] x86-64: Add vector acosh/acoshf implementation to libmvec
  2021-12-29  6:39 ` [PATCH v5 15/18] x86-64: Add vector acosh/acoshf " Sunil K Pandey
@ 2021-12-29 21:26   ` H.J. Lu
  0 siblings, 0 replies; 40+ messages in thread
From: H.J. Lu @ 2021-12-29 21:26 UTC (permalink / raw)
  To: Sunil K Pandey; +Cc: libc-alpha, hjl.tools, andrey.kolesov, marius.cornea

On Tue, Dec 28, 2021 at 10:39:57PM -0800, Sunil K Pandey wrote:
> Implement vectorized acosh/acoshf containing SSE, AVX, AVX2 and
> AVX512 versions for libmvec as per vector ABI.  It also contains
> accuracy and ABI tests for vector acosh/acoshf with regenerated ulps.
> ---
>  bits/libm-simd-decl-stubs.h                   |   11 +
>  math/bits/mathcalls.h                         |    2 +-
>  .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
>  sysdeps/x86/fpu/bits/math-vector.h            |    4 +
>  .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
>  sysdeps/x86_64/fpu/Makeconfig                 |    1 +
>  sysdeps/x86_64/fpu/Versions                   |    2 +
>  sysdeps/x86_64/fpu/libm-test-ulps             |   20 +
>  .../fpu/multiarch/svml_d_acosh2_core-sse2.S   |   20 +
>  .../x86_64/fpu/multiarch/svml_d_acosh2_core.c |   27 +
>  .../fpu/multiarch/svml_d_acosh2_core_sse4.S   | 1469 ++++++++++++++++
>  .../fpu/multiarch/svml_d_acosh4_core-sse.S    |   20 +
>  .../x86_64/fpu/multiarch/svml_d_acosh4_core.c |   27 +
>  .../fpu/multiarch/svml_d_acosh4_core_avx2.S   | 1536 +++++++++++++++++
>  .../fpu/multiarch/svml_d_acosh8_core-avx2.S   |   20 +
>  .../x86_64/fpu/multiarch/svml_d_acosh8_core.c |   27 +
>  .../fpu/multiarch/svml_d_acosh8_core_avx512.S |  480 ++++++
>  .../fpu/multiarch/svml_s_acoshf16_core-avx2.S |   20 +
>  .../fpu/multiarch/svml_s_acoshf16_core.c      |   28 +
>  .../multiarch/svml_s_acoshf16_core_avx512.S   |  449 +++++
>  .../fpu/multiarch/svml_s_acoshf4_core-sse2.S  |   20 +
>  .../fpu/multiarch/svml_s_acoshf4_core.c       |   28 +
>  .../fpu/multiarch/svml_s_acoshf4_core_sse4.S  |  389 +++++
>  .../fpu/multiarch/svml_s_acoshf8_core-sse.S   |   20 +
>  .../fpu/multiarch/svml_s_acoshf8_core.c       |   28 +
>  .../fpu/multiarch/svml_s_acoshf8_core_avx2.S  |  370 ++++
>  sysdeps/x86_64/fpu/svml_d_acosh2_core.S       |   29 +
>  sysdeps/x86_64/fpu/svml_d_acosh4_core.S       |   29 +
>  sysdeps/x86_64/fpu/svml_d_acosh4_core_avx.S   |   25 +
>  sysdeps/x86_64/fpu/svml_d_acosh8_core.S       |   25 +
>  sysdeps/x86_64/fpu/svml_s_acoshf16_core.S     |   25 +
>  sysdeps/x86_64/fpu/svml_s_acoshf4_core.S      |   29 +
>  sysdeps/x86_64/fpu/svml_s_acoshf8_core.S      |   29 +
>  sysdeps/x86_64/fpu/svml_s_acoshf8_core_avx.S  |   25 +
>  .../fpu/test-double-libmvec-acosh-avx.c       |    1 +
>  .../fpu/test-double-libmvec-acosh-avx2.c      |    1 +
>  .../fpu/test-double-libmvec-acosh-avx512f.c   |    1 +
>  .../x86_64/fpu/test-double-libmvec-acosh.c    |    3 +
>  .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
>  .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
>  .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
>  .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
>  .../fpu/test-float-libmvec-acoshf-avx.c       |    1 +
>  .../fpu/test-float-libmvec-acoshf-avx2.c      |    1 +
>  .../fpu/test-float-libmvec-acoshf-avx512f.c   |    1 +
>  .../x86_64/fpu/test-float-libmvec-acoshf.c    |    3 +
>  .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
>  .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
>  .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
>  .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
>  50 files changed, 5265 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_acosh2_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_acosh4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_acosh4_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_acosh8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_acoshf16_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_acoshf4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_acoshf8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_acoshf8_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acosh.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acoshf.c
> 
> diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
> index bb7380a446..b17bf78cd9 100644
> --- a/bits/libm-simd-decl-stubs.h
> +++ b/bits/libm-simd-decl-stubs.h
> @@ -263,4 +263,15 @@
>  #define __DECL_SIMD_atanhf32x
>  #define __DECL_SIMD_atanhf64x
>  #define __DECL_SIMD_atanhf128x
> +
> +#define __DECL_SIMD_acosh
> +#define __DECL_SIMD_acoshf
> +#define __DECL_SIMD_acoshl
> +#define __DECL_SIMD_acoshf16
> +#define __DECL_SIMD_acoshf32
> +#define __DECL_SIMD_acoshf64
> +#define __DECL_SIMD_acoshf128
> +#define __DECL_SIMD_acoshf32x
> +#define __DECL_SIMD_acoshf64x
> +#define __DECL_SIMD_acoshf128x
>  #endif
> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index 04dd9c5d1b..bc37973c41 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -82,7 +82,7 @@ __MATHDECL_VEC (void,sincos,,
>  
>  #if defined __USE_XOPEN_EXTENDED || defined __USE_ISOC99
>  /* Hyperbolic arc cosine of X.  */
> -__MATHCALL (acosh,, (_Mdouble_ __x));
> +__MATHCALL_VEC (acosh,, (_Mdouble_ __x));
>  /* Hyperbolic arc sine of X.  */
>  __MATHCALL (asinh,, (_Mdouble_ __x));
>  /* Hyperbolic arc tangent of X.  */
> diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> index 2d389912b1..e9d6ade70a 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> @@ -47,6 +47,7 @@ GLIBC_2.22 _ZGVeN8v_sin F
>  GLIBC_2.22 _ZGVeN8vv_pow F
>  GLIBC_2.22 _ZGVeN8vvv_sincos F
>  GLIBC_2.35 _ZGVbN2v_acos F
> +GLIBC_2.35 _ZGVbN2v_acosh F
>  GLIBC_2.35 _ZGVbN2v_asin F
>  GLIBC_2.35 _ZGVbN2v_atan F
>  GLIBC_2.35 _ZGVbN2v_atanh F
> @@ -62,6 +63,7 @@ GLIBC_2.35 _ZGVbN2v_sinh F
>  GLIBC_2.35 _ZGVbN2vv_atan2 F
>  GLIBC_2.35 _ZGVbN2vv_hypot F
>  GLIBC_2.35 _ZGVbN4v_acosf F
> +GLIBC_2.35 _ZGVbN4v_acoshf F
>  GLIBC_2.35 _ZGVbN4v_asinf F
>  GLIBC_2.35 _ZGVbN4v_atanf F
>  GLIBC_2.35 _ZGVbN4v_atanhf F
> @@ -77,6 +79,7 @@ GLIBC_2.35 _ZGVbN4v_sinhf F
>  GLIBC_2.35 _ZGVbN4vv_atan2f F
>  GLIBC_2.35 _ZGVbN4vv_hypotf F
>  GLIBC_2.35 _ZGVcN4v_acos F
> +GLIBC_2.35 _ZGVcN4v_acosh F
>  GLIBC_2.35 _ZGVcN4v_asin F
>  GLIBC_2.35 _ZGVcN4v_atan F
>  GLIBC_2.35 _ZGVcN4v_atanh F
> @@ -92,6 +95,7 @@ GLIBC_2.35 _ZGVcN4v_sinh F
>  GLIBC_2.35 _ZGVcN4vv_atan2 F
>  GLIBC_2.35 _ZGVcN4vv_hypot F
>  GLIBC_2.35 _ZGVcN8v_acosf F
> +GLIBC_2.35 _ZGVcN8v_acoshf F
>  GLIBC_2.35 _ZGVcN8v_asinf F
>  GLIBC_2.35 _ZGVcN8v_atanf F
>  GLIBC_2.35 _ZGVcN8v_atanhf F
> @@ -107,6 +111,7 @@ GLIBC_2.35 _ZGVcN8v_sinhf F
>  GLIBC_2.35 _ZGVcN8vv_atan2f F
>  GLIBC_2.35 _ZGVcN8vv_hypotf F
>  GLIBC_2.35 _ZGVdN4v_acos F
> +GLIBC_2.35 _ZGVdN4v_acosh F
>  GLIBC_2.35 _ZGVdN4v_asin F
>  GLIBC_2.35 _ZGVdN4v_atan F
>  GLIBC_2.35 _ZGVdN4v_atanh F
> @@ -122,6 +127,7 @@ GLIBC_2.35 _ZGVdN4v_sinh F
>  GLIBC_2.35 _ZGVdN4vv_atan2 F
>  GLIBC_2.35 _ZGVdN4vv_hypot F
>  GLIBC_2.35 _ZGVdN8v_acosf F
> +GLIBC_2.35 _ZGVdN8v_acoshf F
>  GLIBC_2.35 _ZGVdN8v_asinf F
>  GLIBC_2.35 _ZGVdN8v_atanf F
>  GLIBC_2.35 _ZGVdN8v_atanhf F
> @@ -137,6 +143,7 @@ GLIBC_2.35 _ZGVdN8v_sinhf F
>  GLIBC_2.35 _ZGVdN8vv_atan2f F
>  GLIBC_2.35 _ZGVdN8vv_hypotf F
>  GLIBC_2.35 _ZGVeN16v_acosf F
> +GLIBC_2.35 _ZGVeN16v_acoshf F
>  GLIBC_2.35 _ZGVeN16v_asinf F
>  GLIBC_2.35 _ZGVeN16v_atanf F
>  GLIBC_2.35 _ZGVeN16v_atanhf F
> @@ -152,6 +159,7 @@ GLIBC_2.35 _ZGVeN16v_sinhf F
>  GLIBC_2.35 _ZGVeN16vv_atan2f F
>  GLIBC_2.35 _ZGVeN16vv_hypotf F
>  GLIBC_2.35 _ZGVeN8v_acos F
> +GLIBC_2.35 _ZGVeN8v_acosh F
>  GLIBC_2.35 _ZGVeN8v_asin F
>  GLIBC_2.35 _ZGVeN8v_atan F
>  GLIBC_2.35 _ZGVeN8v_atanh F
> diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
> index 4937b6811f..4ad12a33e5 100644
> --- a/sysdeps/x86/fpu/bits/math-vector.h
> +++ b/sysdeps/x86/fpu/bits/math-vector.h
> @@ -118,6 +118,10 @@
>  #  define __DECL_SIMD_atanh __DECL_SIMD_x86_64
>  #  undef __DECL_SIMD_atanhf
>  #  define __DECL_SIMD_atanhf __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_acosh
> +#  define __DECL_SIMD_acosh __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_acoshf
> +#  define __DECL_SIMD_acoshf __DECL_SIMD_x86_64
>  
>  # endif
>  #endif
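
With these __DECL_SIMD markers in place, GCC can map plain acosh calls in vectorizable loops onto the new _ZGV* entry points. A user-side sketch; the build line is an assumption and not part of the patch, e.g. gcc -O2 -ffast-math -march=x86-64-v3 acosh_loop.c -lmvec -lm:

#include <math.h>

/* With the simd declarations above in effect, this loop can be
   auto-vectorized into calls to _ZGVdN4v_acosh (AVX2) or another
   _ZGV* variant matching the selected ISA.  */
void
acosh_loop (const double *restrict in, double *restrict out, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = acosh (in[i]);
}
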
> diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> index da39c08ba9..503547d3e4 100644
> --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> @@ -58,6 +58,8 @@
>  !GCC$ builtin (log1pf) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (atanh) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (atanhf) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (acosh) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (acoshf) attributes simd (notinbranch) if('x86_64')
>  
>  !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
> @@ -101,3 +103,5 @@
>  !GCC$ builtin (log1pf) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (atanh) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (atanhf) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (acosh) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (acoshf) attributes simd (notinbranch) if('x32')
> diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> index de87544259..7b90b3d049 100644
> --- a/sysdeps/x86_64/fpu/Makeconfig
> +++ b/sysdeps/x86_64/fpu/Makeconfig
> @@ -23,6 +23,7 @@ postclean-generated += libmvec.mk
>  # Define for both math and mathvec directories.
>  libmvec-funcs = \
>    acos \
> +  acosh \
>    asin \
>    atan \
>    atan2 \
> diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> index df0ea83711..fd5e5923a1 100644
> --- a/sysdeps/x86_64/fpu/Versions
> +++ b/sysdeps/x86_64/fpu/Versions
> @@ -15,6 +15,7 @@ libmvec {
>    }
>    GLIBC_2.35 {
>      _ZGVbN2v_acos; _ZGVcN4v_acos; _ZGVdN4v_acos; _ZGVeN8v_acos;
> +    _ZGVbN2v_acosh; _ZGVcN4v_acosh; _ZGVdN4v_acosh; _ZGVeN8v_acosh;
>      _ZGVbN2v_asin; _ZGVcN4v_asin; _ZGVdN4v_asin; _ZGVeN8v_asin;
>      _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan;
>      _ZGVbN2v_atanh; _ZGVcN4v_atanh; _ZGVdN4v_atanh; _ZGVeN8v_atanh;
> @@ -30,6 +31,7 @@ libmvec {
>      _ZGVbN2vv_atan2; _ZGVcN4vv_atan2; _ZGVdN4vv_atan2; _ZGVeN8vv_atan2;
>      _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
>      _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
> +    _ZGVbN4v_acoshf; _ZGVcN8v_acoshf; _ZGVdN8v_acoshf; _ZGVeN16v_acoshf;
>      _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
>      _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf;
>      _ZGVbN4v_atanhf; _ZGVcN8v_atanhf; _ZGVdN8v_atanhf; _ZGVeN16v_atanhf;
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index 09a46190b6..b2aa8fc56e 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -69,6 +69,26 @@ float: 2
>  float128: 3
>  ldouble: 3
>  
> +Function: "acosh_vlen16":
> +float: 1
> +
> +Function: "acosh_vlen2":
> +double: 2
> +
> +Function: "acosh_vlen4":
> +double: 2
> +float: 1
> +
> +Function: "acosh_vlen4_avx2":
> +double: 2
> +
> +Function: "acosh_vlen8":
> +double: 1
> +float: 1
> +
> +Function: "acosh_vlen8_avx2":
> +float: 2
> +
>  Function: "asin":
>  double: 1
>  float: 1
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core-sse2.S
> new file mode 100644
> index 0000000000..28620a03a9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized acosh, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN2v_acosh _ZGVbN2v_acosh_sse2
> +#include "../svml_d_acosh2_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core.c
> new file mode 100644
> index 0000000000..8a41507326
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized acosh, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN2v_acosh
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN2v_acosh, __GI__ZGVbN2v_acosh, __redirect__ZGVbN2v_acosh)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
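The wrapper above is the usual libmvec dispatch: libc_ifunc_redirected binds
_ZGVbN2v_acosh at load time to the SSE4.1 body when the selector says so, and
to the SSE2 fallback otherwise.  For readers unfamiliar with the mechanism, a
generic, self-contained GNU IFUNC sketch is below; myfunc, impl_sse2 and
impl_sse4 are invented names and have nothing to do with the glibc-internal
macros used in this file.

  #include <stdio.h>

  static double impl_sse2 (double x) { return 2.0 * x; }
  static double impl_sse4 (double x) { return 4.0 * x; }

  /* The resolver runs during relocation and returns the function that the
     public symbol should bind to.  */
  static double (*resolve_myfunc (void)) (double)
  {
    __builtin_cpu_init ();
    return __builtin_cpu_supports ("sse4.1") ? impl_sse4 : impl_sse2;
  }

  double myfunc (double x) __attribute__ ((ifunc ("resolve_myfunc")));

  int
  main (void)
  {
    printf ("%g\n", myfunc (1.0));
    return 0;
  }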
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core_sse4.S
> new file mode 100644
> index 0000000000..6455f57ce7
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh2_core_sse4.S
> @@ -0,0 +1,1469 @@
> +/* Function acosh vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute acosh(x) as log(x + sqrt(x*x - 1))
> + *
> + *   Special cases:
> + *
> + *   acosh(NaN)  = quiet NaN, and raise invalid exception
> + *   acosh(-INF) = NaN
> + *   acosh(+INF) = +INF
> + *   acosh(x)    = NaN if x < 1
> + *   acosh(1)    = +0
> + *
> + */
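As a scalar cross-reference for the formula and the special cases listed above,
a naive C version is below.  It deliberately ignores the accuracy and overflow
handling that the rest of this file is about, and acosh_ref is a made-up name.

  #include <math.h>

  /* Naive reference: acosh(x) = log(x + sqrt(x*x - 1)).
     Note that x*x overflows for x near DBL_MAX; the large-argument
     rescaling further down in this file exists to avoid exactly that.  */
  static double
  acosh_ref (double x)
  {
    if (isnan (x))
      return x + x;               /* quiet NaN; raises invalid for sNaN */
    if (x < 1.0)
      return (x - x) / (x - x);   /* NaN + invalid (also covers -Inf) */
    if (isinf (x))
      return x;                   /* acosh(+Inf) = +Inf */
    return log (x + sqrt (x * x - 1.0));   /* acosh(1) = log(1) = +0 */
  }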
> +
> +/* Offsets for data table __svml_dacosh_data_internal
> + */
> +#define Log_HA_table                  	0
> +#define Log_LA_table                  	8208
> +#define poly_coeff                    	12320
> +#define ExpMask                       	12384
> +#define Two10                         	12400
> +#define MinLog1p                      	12416
> +#define MaxLog1p                      	12432
> +#define One                           	12448
> +#define SgnMask                       	12464
> +#define XThreshold                    	12480
> +#define XhMask                        	12496
> +#define Threshold                     	12512
> +#define Bias                          	12528
> +#define Bias1                         	12544
> +#define ExpMask0                      	12560
> +#define ExpMask2                      	12576
> +#define L2                            	12592
> +#define dBigThreshold                 	12608
> +#define dLargestFinite                	12624
> +#define dThirtyOne                    	12640
> +#define XScale                        	12656
> +
> +/* Lookup bias for data table __svml_dacosh_data_internal.  */
> +#define Table_Lookup_Bias               -0x405ff0
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN2v_acosh_sse4)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $64, %rsp
> +        movaps    %xmm0, %xmm7
> +
> +/* Load the constant 1.0 */
> +        movups    One+__svml_dacosh_data_internal(%rip), %xmm6
> +
> +/* Compute U = X - 1 and V = X + 1, naively first. */
> +        movaps    %xmm7, %xmm11
> +        movaps    %xmm6, %xmm10
> +
> +/*
> + * Now we feed into the log1p code, using H in place of _VARG1 and
> + * also adding L into Xl.
> + * compute 1+x as high, low parts
> + */
> +        movaps    %xmm6, %xmm14
> +        subpd     %xmm6, %xmm11
> +        addpd     %xmm7, %xmm10
> +
> +/* For low-accuracy versions, naivety is harmless */
> +        mulpd     %xmm11, %xmm10
> +
> +/* dH = [X + sqrt(X^2 - 1)] - 1 */
> +        sqrtpd    %xmm10, %xmm13
> +        addpd     %xmm11, %xmm13
> +        maxpd     %xmm13, %xmm14
> +        movaps    %xmm6, %xmm4
> +
> +/*
> + * The following computation can go wrong for very large X, e.g.
> + * the X^2 - 1 = U * V can overflow. But for large X we have
> + * acosh(X) / log(2 X) - 1 =~= 1/(4 * X^2), so for X >= 2^30
> + * we can just later stick X back into the log and tweak up the exponent.
> + * Actually we scale X by 2^-30 and tweak the exponent up by 31,
> + * to stay in the safe range for the later log computation.
> + * Compute a flag now telling us when to do this.
> + */
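In other words, once sqrt(x*x - 1) is indistinguishable from x, acosh(x)
collapses to log(2*x), which can be computed safely as log(2^-30 * x) plus
31*log(2).  A tiny stand-alone check of that identity at the 2^30 threshold
(illustrative only, not part of the patch); the two results agree to the last
bit because the acosh(x) - log(2*x) difference is about 1/(4*x*x), far below
one ulp here.

  #include <math.h>
  #include <stdio.h>

  int
  main (void)
  {
    double x = 0x1p30;                               /* the "big" threshold */
    double naive  = log (x + sqrt (x * x - 1.0));    /* fine here; overflows near DBL_MAX */
    double scaled = log (0x1p-30 * x) + 31 * M_LN2;  /* rescale, then bump the exponent */
    printf ("%.17g\n%.17g\n", naive, scaled);        /* both print ~21.4875626 */
    return 0;
  }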
> +        movaps    %xmm7, %xmm5
> +        minpd     %xmm13, %xmm4
> +        cmpltpd   dBigThreshold+__svml_dacosh_data_internal(%rip), %xmm5
> +        movups    SgnMask+__svml_dacosh_data_internal(%rip), %xmm12
> +        movaps    %xmm14, %xmm0
> +
> +/* Now multiplex to the case X = 2^-30 * input, Xl = dL = 0 in the "big" case. */
> +        movups    XScale+__svml_dacosh_data_internal(%rip), %xmm15
> +        andps     %xmm12, %xmm13
> +        mulpd     %xmm7, %xmm15
> +        cmpltpd   XThreshold+__svml_dacosh_data_internal(%rip), %xmm13
> +        addpd     %xmm4, %xmm0
> +        orps      XhMask+__svml_dacosh_data_internal(%rip), %xmm13
> +        movaps    %xmm5, %xmm3
> +        andps     %xmm13, %xmm0
> +        andnps    %xmm15, %xmm3
> +        subpd     %xmm0, %xmm14
> +        andps     %xmm5, %xmm0
> +
> +/*
> + * Check that 1 < X < +inf; otherwise go to the callout function.
> + * We need the callout for X = 1 to avoid division by zero below.
> + * This test ensures that callout handles NaN and either infinity.
> + */
> +        movaps    %xmm7, %xmm9
> +
> +/* Now resume the main code. */
> +        movups    ExpMask+__svml_dacosh_data_internal(%rip), %xmm1
> +        orps      %xmm0, %xmm3
> +
> +/* preserve mantissa, set input exponent to 2^(-10) */
> +        andps     %xmm3, %xmm1
> +        movaps    %xmm6, %xmm8
> +        orps      Two10+__svml_dacosh_data_internal(%rip), %xmm1
> +
> +/* exponent bits */
> +        movaps    %xmm3, %xmm11
> +
> +/* reciprocal approximation good to at least 11 bits */
> +        cvtpd2ps  %xmm1, %xmm2
> +        cmpnlepd  dLargestFinite+__svml_dacosh_data_internal(%rip), %xmm9
> +        cmpnltpd  %xmm7, %xmm8
> +        addpd     %xmm14, %xmm4
> +        movlhps   %xmm2, %xmm2
> +        orps      %xmm8, %xmm9
> +        rcpps     %xmm2, %xmm8
> +        movmskpd  %xmm9, %edx
> +
> +/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
> +        movups    .FLT_20(%rip), %xmm10
> +        andps     %xmm5, %xmm4
> +
> +/* exponent of X needed to scale Xl */
> +        movdqu    ExpMask0+__svml_dacosh_data_internal(%rip), %xmm9
> +        psrlq     $20, %xmm11
> +        cvtps2pd  %xmm8, %xmm1
> +        addpd     %xmm10, %xmm1
> +        subpd     %xmm10, %xmm1
> +
> +/* 2^ (-10-exp(X) ) */
> +        movdqu    ExpMask2+__svml_dacosh_data_internal(%rip), %xmm2
> +        pand      %xmm3, %xmm9
> +        psubq     %xmm9, %xmm2
> +
> +/* scale DblRcp */
> +        mulpd     %xmm1, %xmm2
> +
> +/* argument reduction */
> +        mulpd     %xmm2, %xmm3
> +        mulpd     %xmm2, %xmm4
> +        subpd     %xmm6, %xmm3
> +        movaps    %xmm3, %xmm2
> +        movaps    %xmm5, %xmm0
> +        addpd     %xmm4, %xmm2
> +        pshufd    $221, %xmm11, %xmm12
> +        movaps    %xmm2, %xmm6
> +
> +/* biased exponent in DP format */
> +        cvtdq2pd  %xmm12, %xmm14
> +        subpd     %xmm3, %xmm6
> +
> +/* polynomial */
> +        movups    poly_coeff+__svml_dacosh_data_internal(%rip), %xmm3
> +        lea       Table_Lookup_Bias+__svml_dacosh_data_internal(%rip), %rsi
> +        mulpd     %xmm2, %xmm3
> +        subpd     %xmm6, %xmm4
> +        addpd     poly_coeff+16+__svml_dacosh_data_internal(%rip), %xmm3
> +
> +/* Add 31 to the exponent in the "large" case to get log(2 * input) */
> +        movups    dThirtyOne+__svml_dacosh_data_internal(%rip), %xmm13
> +
> +/* exponent*log(2.0) */
> +        movups    Threshold+__svml_dacosh_data_internal(%rip), %xmm8
> +        addpd     %xmm14, %xmm13
> +        cmpltpd   %xmm1, %xmm8
> +        andps     %xmm5, %xmm14
> +
> +/*
> + * prepare table index
> + * table lookup
> + */
> +        movaps    %xmm1, %xmm5
> +        movaps    %xmm2, %xmm1
> +        andnps    %xmm13, %xmm0
> +        mulpd     %xmm2, %xmm1
> +        movups    poly_coeff+32+__svml_dacosh_data_internal(%rip), %xmm6
> +        psrlq     $40, %xmm5
> +        mulpd     %xmm2, %xmm6
> +        mulpd     %xmm1, %xmm3
> +        addpd     poly_coeff+48+__svml_dacosh_data_internal(%rip), %xmm6
> +        movd      %xmm5, %eax
> +        andps     Bias+__svml_dacosh_data_internal(%rip), %xmm8
> +        orps      %xmm14, %xmm0
> +        addpd     %xmm3, %xmm6
> +
> +/*
> + * reconstruction
> + * VQFMA( D, R, P, R2, R );
> + */
> +        mulpd     %xmm6, %xmm1
> +        addpd     %xmm1, %xmm4
> +        orps      Bias1+__svml_dacosh_data_internal(%rip), %xmm8
> +        pshufd    $2, %xmm5, %xmm15
> +        subpd     %xmm8, %xmm0
> +        addpd     %xmm4, %xmm2
> +        movd      %xmm15, %ecx
> +        mulpd     L2+__svml_dacosh_data_internal(%rip), %xmm0
> +        movslq    %eax, %rax
> +        movslq    %ecx, %rcx
> +        movsd     (%rsi,%rax), %xmm9
> +        movhpd    (%rsi,%rcx), %xmm9
> +        addpd     %xmm2, %xmm9
> +        addpd     %xmm9, %xmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx xmm0 xmm7
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm7, 32(%rsp)
> +        movups    %xmm0, 48(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $2, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 xmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      acosh@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 48(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVbN2v_acosh_sse4)
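The SPECIAL_VALUES_BRANCH path above is the standard libmvec fallback: input
and partial result are spilled to the stack frame set up at entry, and every
lane flagged in the range mask is recomputed with the scalar acosh through the
PLT.  Roughly the same control flow in C, as a sketch only; acosh2_fallback
and its parameters are invented names.

  #include <math.h>

  /* For the 2-lane double variant: bit i of 'mask' is set when lane i hit a
     special input (NaN, x <= 1, an infinity) and needs the scalar path.  */
  static void
  acosh2_fallback (const double in[2], double out[2], unsigned int mask)
  {
    for (int lane = 0; lane < 2; lane++)
      if (mask & (1u << lane))
        out[lane] = acosh (in[lane]);   /* the asm calls acosh@PLT here */
  }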
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_dacosh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(16)) VUINT32 Log_HA_table[(1<<10)+2][2];
> +        __declspec(align(16)) VUINT32 Log_LA_table[(1<<9)+1][2];
> +        __declspec(align(16)) VUINT32 poly_coeff[4][2][2];
> +        __declspec(align(16)) VUINT32 ExpMask[2][2];
> +        __declspec(align(16)) VUINT32 Two10[2][2];
> +        __declspec(align(16)) VUINT32 MinLog1p[2][2];
> +        __declspec(align(16)) VUINT32 MaxLog1p[2][2];
> +        __declspec(align(16)) VUINT32 One[2][2];
> +        __declspec(align(16)) VUINT32 SgnMask[2][2];
> +        __declspec(align(16)) VUINT32 XThreshold[2][2];
> +        __declspec(align(16)) VUINT32 XhMask[2][2];
> +        __declspec(align(16)) VUINT32 Threshold[2][2];
> +        __declspec(align(16)) VUINT32 Bias[2][2];
> +        __declspec(align(16)) VUINT32 Bias1[2][2];
> +        __declspec(align(16)) VUINT32 ExpMask0[2][2];
> +        __declspec(align(16)) VUINT32 ExpMask2[2][2];
> +        __declspec(align(16)) VUINT32 L2[2][2];
> +        __declspec(align(16)) VUINT32 dBigThreshold[2][2];
> +        __declspec(align(16)) VUINT32 dLargestFinite[2][2];
> +        __declspec(align(16)) VUINT32 dThirtyOne[2][2];
> +        __declspec(align(16)) VUINT32 XScale[2][2];
> +} __svml_dacosh_data_internal;
> +#endif
> +__svml_dacosh_data_internal:
> +        /* Log_HA_table */
> +        .quad 0xc086232bdd7a8300, 0xbe1ce91eef3fb100
> +        .quad 0xc086232fdc7ad828, 0xbe1cefcffda73b6a
> +        .quad 0xc0862333d97d2ba0, 0xbe1cef406748f1ff
> +        .quad 0xc0862337d48378e0, 0xbe1cef2a9429925a
> +        .quad 0xc086233bcd8fb878, 0xbe1cf138d17ebecb
> +        .quad 0xc086233fc4a3e018, 0xbe1ceff2dbbbb29e
> +        .quad 0xc0862343b9c1e270, 0xbe1cf1a42aae437b
> +        .quad 0xc0862347acebaf68, 0xbe1cef3b152048af
> +        .quad 0xc086234b9e2333f0, 0xbe1cef20e127805e
> +        .quad 0xc086234f8d6a5a30, 0xbe1cf00ad6052cf4
> +        .quad 0xc08623537ac30980, 0xbe1cefc4642ee597
> +        .quad 0xc0862357662f2660, 0xbe1cf1f277d36e16
> +        .quad 0xc086235b4fb092a0, 0xbe1ceed009e8d8e6
> +        .quad 0xc086235f37492d28, 0xbe1cf1e4038cb362
> +        .quad 0xc08623631cfad250, 0xbe1cf0b0873b8557
> +        .quad 0xc086236700c75b98, 0xbe1cf15bb3227c0b
> +        .quad 0xc086236ae2b09fe0, 0xbe1cf151ef8ca9ed
> +        .quad 0xc086236ec2b87358, 0xbe1cefe1dc2cd2ed
> +        .quad 0xc0862372a0e0a780, 0xbe1cf0d1eec5454f
> +        .quad 0xc08623767d2b0b48, 0xbe1ceeefd570bbce
> +        .quad 0xc086237a57996af0, 0xbe1cee99ae91b3a7
> +        .quad 0xc086237e302d9028, 0xbe1cf0412830fbd1
> +        .quad 0xc086238206e94218, 0xbe1ceee898588610
> +        .quad 0xc0862385dbce4548, 0xbe1cee9a1fbcaaea
> +        .quad 0xc0862389aede5bc0, 0xbe1ceed8e7cc1ad6
> +        .quad 0xc086238d801b4500, 0xbe1cf10c8d059da6
> +        .quad 0xc08623914f86be18, 0xbe1ceee6c63a8165
> +        .quad 0xc08623951d228180, 0xbe1cf0c3592d2ff1
> +        .quad 0xc0862398e8f04758, 0xbe1cf0026cc4cb1b
> +        .quad 0xc086239cb2f1c538, 0xbe1cf15d48d8e670
> +        .quad 0xc08623a07b28ae60, 0xbe1cef359363787c
> +        .quad 0xc08623a44196b390, 0xbe1cefdf1ab2e82c
> +        .quad 0xc08623a8063d8338, 0xbe1cefe43c02aa84
> +        .quad 0xc08623abc91ec960, 0xbe1cf044f5ae35b7
> +        .quad 0xc08623af8a3c2fb8, 0xbe1cf0b0b4001e1b
> +        .quad 0xc08623b349975d98, 0xbe1cf1bae76dfbcf
> +        .quad 0xc08623b70731f810, 0xbe1cef0a72e13a62
> +        .quad 0xc08623bac30da1c8, 0xbe1cf184007d2b6b
> +        .quad 0xc08623be7d2bfb40, 0xbe1cf16f4b239e98
> +        .quad 0xc08623c2358ea2a0, 0xbe1cf0976acada87
> +        .quad 0xc08623c5ec3733d0, 0xbe1cf066318a16ff
> +        .quad 0xc08623c9a1274880, 0xbe1ceffaa7148798
> +        .quad 0xc08623cd54607820, 0xbe1cf23ab02e9b6e
> +        .quad 0xc08623d105e45800, 0xbe1cefdfef7d4fde
> +        .quad 0xc08623d4b5b47b20, 0xbe1cf17fece44f2b
> +        .quad 0xc08623d863d27270, 0xbe1cf18f907d0d7c
> +        .quad 0xc08623dc103fccb0, 0xbe1cee61fe072c98
> +        .quad 0xc08623dfbafe1668, 0xbe1cf022dd891e2f
> +        .quad 0xc08623e3640eda20, 0xbe1ceecc1daf4358
> +        .quad 0xc08623e70b73a028, 0xbe1cf0173c4fa380
> +        .quad 0xc08623eab12deec8, 0xbe1cf16a2150c2f4
> +        .quad 0xc08623ee553f4a30, 0xbe1cf1bf980b1f4b
> +        .quad 0xc08623f1f7a93480, 0xbe1cef8b731663c2
> +        .quad 0xc08623f5986d2dc0, 0xbe1cee9a664d7ef4
> +        .quad 0xc08623f9378cb3f0, 0xbe1cf1eda2af6400
> +        .quad 0xc08623fcd5094320, 0xbe1cf1923f9d68d7
> +        .quad 0xc086240070e45548, 0xbe1cf0747cd3e03a
> +        .quad 0xc08624040b1f6260, 0xbe1cf22ee855bd6d
> +        .quad 0xc0862407a3bbe078, 0xbe1cf0d57360c00b
> +        .quad 0xc086240b3abb4398, 0xbe1ceebc815cd575
> +        .quad 0xc086240ed01efdd0, 0xbe1cf03bfb970951
> +        .quad 0xc086241263e87f50, 0xbe1cf16e74768529
> +        .quad 0xc0862415f6193658, 0xbe1cefec64b8becb
> +        .quad 0xc086241986b28f30, 0xbe1cf0838d210baa
> +        .quad 0xc086241d15b5f448, 0xbe1cf0ea86e75b11
> +        .quad 0xc0862420a324ce28, 0xbe1cf1708d11d805
> +        .quad 0xc08624242f008380, 0xbe1ceea988c5a417
> +        .quad 0xc0862427b94a7910, 0xbe1cef166a7bbca5
> +        .quad 0xc086242b420411d0, 0xbe1cf0c9d9e86a38
> +        .quad 0xc086242ec92eaee8, 0xbe1cef0946455411
> +        .quad 0xc08624324ecbaf98, 0xbe1cefea60907739
> +        .quad 0xc0862435d2dc7160, 0xbe1cf1ed0934ce42
> +        .quad 0xc086243955624ff8, 0xbe1cf191ba746c7d
> +        .quad 0xc086243cd65ea548, 0xbe1ceeec78cf2a7e
> +        .quad 0xc086244055d2c968, 0xbe1cef345284c119
> +        .quad 0xc0862443d3c012b8, 0xbe1cf24f77355219
> +        .quad 0xc08624475027d5e8, 0xbe1cf05bf087e114
> +        .quad 0xc086244acb0b65d0, 0xbe1cef3504a32189
> +        .quad 0xc086244e446c1398, 0xbe1ceff54b2a406f
> +        .quad 0xc0862451bc4b2eb8, 0xbe1cf0757d54ed4f
> +        .quad 0xc086245532aa04f0, 0xbe1cf0c8099fdfd5
> +        .quad 0xc0862458a789e250, 0xbe1cf0b173796a31
> +        .quad 0xc086245c1aec1138, 0xbe1cf11d8734540d
> +        .quad 0xc086245f8cd1da60, 0xbe1cf1916a723ceb
> +        .quad 0xc0862462fd3c84d8, 0xbe1cf19a911e1da7
> +        .quad 0xc08624666c2d5608, 0xbe1cf23a9ef72e4f
> +        .quad 0xc0862469d9a591c0, 0xbe1cef503d947663
> +        .quad 0xc086246d45a67a18, 0xbe1cf0fceeb1a0b2
> +        .quad 0xc0862470b0314fa8, 0xbe1cf107e27e4fbc
> +        .quad 0xc086247419475160, 0xbe1cf03dd9922331
> +        .quad 0xc086247780e9bc98, 0xbe1cefce1a10e129
> +        .quad 0xc086247ae719cd18, 0xbe1ceea47f73c4f6
> +        .quad 0xc086247e4bd8bd10, 0xbe1ceec0ac56d100
> +        .quad 0xc0862481af27c528, 0xbe1cee8a6593278a
> +        .quad 0xc086248511081c70, 0xbe1cf2231dd9dec7
> +        .quad 0xc0862488717af888, 0xbe1cf0b4b8ed7da8
> +        .quad 0xc086248bd0818d68, 0xbe1cf1bd8d835002
> +        .quad 0xc086248f2e1d0d98, 0xbe1cf259acc107f4
> +        .quad 0xc08624928a4eaa20, 0xbe1cee897636b00c
> +        .quad 0xc0862495e5179270, 0xbe1cee757f20c326
> +        .quad 0xc08624993e78f490, 0xbe1cefafd3aa54a4
> +        .quad 0xc086249c9673fd10, 0xbe1cee7298d38b97
> +        .quad 0xc086249fed09d6f8, 0xbe1ceedc158d4ceb
> +        .quad 0xc08624a3423babe0, 0xbe1cf2282987cb2e
> +        .quad 0xc08624a6960aa400, 0xbe1cefe7381ecc4b
> +        .quad 0xc08624a9e877e600, 0xbe1cef328dbbce80
> +        .quad 0xc08624ad39849728, 0xbe1cefde45f3cc71
> +        .quad 0xc08624b08931db58, 0xbe1cefa8b89433b9
> +        .quad 0xc08624b3d780d500, 0xbe1cef6773c0b139
> +        .quad 0xc08624b72472a528, 0xbe1cf031c931c11f
> +        .quad 0xc08624ba70086b78, 0xbe1cf088f49275e7
> +        .quad 0xc08624bdba434630, 0xbe1cf17de0eaa86d
> +        .quad 0xc08624c103245238, 0xbe1cefd492f1ba75
> +        .quad 0xc08624c44aacab08, 0xbe1cf1253e154466
> +        .quad 0xc08624c790dd6ad0, 0xbe1cf0fb09ee6d55
> +        .quad 0xc08624cad5b7aa58, 0xbe1cf1f08dd048fe
> +        .quad 0xc08624ce193c8120, 0xbe1ceeca0809697f
> +        .quad 0xc08624d15b6d0538, 0xbe1cef8d5662d968
> +        .quad 0xc08624d49c4a4b78, 0xbe1cee97b556ed78
> +        .quad 0xc08624d7dbd56750, 0xbe1cf1b14b6acb75
> +        .quad 0xc08624db1a0f6b00, 0xbe1cef1e860623f2
> +        .quad 0xc08624de56f96758, 0xbe1ceeaf4d156f3d
> +        .quad 0xc08624e192946bf0, 0xbe1ceecc12b400ed
> +        .quad 0xc08624e4cce18710, 0xbe1cf180c40c794f
> +        .quad 0xc08624e805e1c5c8, 0xbe1cf185a08f7f65
> +        .quad 0xc08624eb3d9633d8, 0xbe1cef45fc924078
> +        .quad 0xc08624ee73ffdbb0, 0xbe1cf1e4f457f32a
> +        .quad 0xc08624f1a91fc6a0, 0xbe1cf040147b8a5a
> +        .quad 0xc08624f4dcf6fc98, 0xbe1cf1effca0dfb2
> +        .quad 0xc08624f80f868468, 0xbe1cf0470146e5bc
> +        .quad 0xc08624fb40cf6390, 0xbe1cef4dd186e501
> +        .quad 0xc08624fe70d29e60, 0xbe1ceebe257f66c7
> +        .quad 0xc08625019f9137f0, 0xbe1ceefb7a1c395c
> +        .quad 0xc0862504cd0c3220, 0xbe1cf209dedfed8c
> +        .quad 0xc0862507f9448db0, 0xbe1cf082da464994
> +        .quad 0xc086250b243b4a18, 0xbe1cee88694a73cf
> +        .quad 0xc086250e4df165a0, 0xbe1cf0b61e8f0531
> +        .quad 0xc08625117667dd78, 0xbe1cf1106599c962
> +        .quad 0xc08625149d9fad98, 0xbe1ceff1ee88af1f
> +        .quad 0xc0862517c399d0c8, 0xbe1cf0f746994ef6
> +        .quad 0xc086251ae85740b8, 0xbe1cefe8a1d077e4
> +        .quad 0xc086251e0bd8f5e0, 0xbe1cf1a1da036092
> +        .quad 0xc08625212e1fe7a8, 0xbe1cf0f8a7786fcd
> +        .quad 0xc08625244f2d0c48, 0xbe1cefa1174a07a7
> +        .quad 0xc08625276f0158d8, 0xbe1cef1043aa5b25
> +        .quad 0xc086252a8d9dc150, 0xbe1cf15d521c169d
> +        .quad 0xc086252dab033898, 0xbe1cf220bba8861f
> +        .quad 0xc0862530c732b078, 0xbe1cef51e310eae2
> +        .quad 0xc0862533e22d1988, 0xbe1cf222fcedd8ae
> +        .quad 0xc0862536fbf36370, 0xbe1cefdb4da4bda8
> +        .quad 0xc086253a14867ca0, 0xbe1ceeafc1112171
> +        .quad 0xc086253d2be75280, 0xbe1cee99dfb4b408
> +        .quad 0xc08625404216d160, 0xbe1cf22d2536f06b
> +        .quad 0xc08625435715e498, 0xbe1cef6abbf2e268
> +        .quad 0xc08625466ae57648, 0xbe1cf093a14789f5
> +        .quad 0xc08625497d866fa0, 0xbe1cf0f93655603c
> +        .quad 0xc086254c8ef9b8b8, 0xbe1cf1cc40c9aafc
> +        .quad 0xc086254f9f4038a8, 0xbe1ceeea5f4e9157
> +        .quad 0xc0862552ae5ad568, 0xbe1cefa9f52d4997
> +        .quad 0xc0862555bc4a7400, 0xbe1cefa490a638ff
> +        .quad 0xc0862558c90ff868, 0xbe1cef7fcf797d6f
> +        .quad 0xc086255bd4ac4590, 0xbe1cf1b4c51113c9
> +        .quad 0xc086255edf203d78, 0xbe1cef55e5b4a55d
> +        .quad 0xc0862561e86cc100, 0xbe1cf0d37a25f9dc
> +        .quad 0xc0862564f092b028, 0xbe1ceebe9efc19d9
> +        .quad 0xc0862567f792e9d8, 0xbe1cee8ad30a57b5
> +        .quad 0xc086256afd6e4c08, 0xbe1cef4e1817b90b
> +        .quad 0xc086256e0225b3b8, 0xbe1cee7fa9229996
> +        .quad 0xc086257105b9fce0, 0xbe1cf0b54963d945
> +        .quad 0xc0862574082c0298, 0xbe1cee5f2f3c7995
> +        .quad 0xc0862577097c9ee0, 0xbe1cf0828e303a2c
> +        .quad 0xc086257a09acaae0, 0xbe1cf172c3078947
> +        .quad 0xc086257d08bcfec0, 0xbe1cf189252afa22
> +        .quad 0xc086258006ae71b8, 0xbe1cefdb80426923
> +        .quad 0xc08625830381da08, 0xbe1ceef1391a0372
> +        .quad 0xc0862585ff380d00, 0xbe1cf17720c78d13
> +        .quad 0xc0862588f9d1df18, 0xbe1ceef1f9027d83
> +        .quad 0xc086258bf35023b8, 0xbe1cf06fac99dec9
> +        .quad 0xc086258eebb3ad78, 0xbe1cf1373eeb45c0
> +        .quad 0xc0862591e2fd4e00, 0xbe1cef777536bb81
> +        .quad 0xc0862594d92dd600, 0xbe1cf0f43ca40766
> +        .quad 0xc0862597ce461558, 0xbe1cefb2cfc6766b
> +        .quad 0xc086259ac246daf0, 0xbe1ceea49e64ffa2
> +        .quad 0xc086259db530f4c8, 0xbe1cf250fa457dec
> +        .quad 0xc08625a0a7053018, 0xbe1cf17d8bb2a44e
> +        .quad 0xc08625a397c45918, 0xbe1cf1d5906d54b7
> +        .quad 0xc08625a6876f3b30, 0xbe1cf08fe7b31780
> +        .quad 0xc08625a97606a0e0, 0xbe1cef13edfc9d11
> +        .quad 0xc08625ac638b53c8, 0xbe1cef9d2b107219
> +        .quad 0xc08625af4ffe1cb0, 0xbe1cf1ddd4ff6160
> +        .quad 0xc08625b23b5fc390, 0xbe1cefa02a996495
> +        .quad 0xc08625b525b10f68, 0xbe1cf166a7e37ee5
> +        .quad 0xc08625b80ef2c680, 0xbe1cef0b171068a5
> +        .quad 0xc08625baf725ae28, 0xbe1cf05c80779283
> +        .quad 0xc08625bdde4a8af0, 0xbe1cf1bbfbffb889
> +        .quad 0xc08625c0c4622090, 0xbe1cf0b8666c0124
> +        .quad 0xc08625c3a96d31e0, 0xbe1cf0a8fcf47a86
> +        .quad 0xc08625c68d6c80f0, 0xbe1cef46e18cb092
> +        .quad 0xc08625c97060cef0, 0xbe1cf1458a350efb
> +        .quad 0xc08625cc524adc58, 0xbe1ceeea1dadce12
> +        .quad 0xc08625cf332b68b0, 0xbe1cf0a1bfdc44c7
> +        .quad 0xc08625d2130332d0, 0xbe1cef96d02da73e
> +        .quad 0xc08625d4f1d2f8a8, 0xbe1cf2451c3c7701
> +        .quad 0xc08625d7cf9b7778, 0xbe1cf10d08f83812
> +        .quad 0xc08625daac5d6ba0, 0xbe1ceec5b4895c5e
> +        .quad 0xc08625dd881990b0, 0xbe1cf14e1325c5e4
> +        .quad 0xc08625e062d0a188, 0xbe1cf21d0904be12
> +        .quad 0xc08625e33c835838, 0xbe1ceed0839bcf21
> +        .quad 0xc08625e615326df0, 0xbe1cf1bb944889d2
> +        .quad 0xc08625e8ecde9b48, 0xbe1cee738e85eece
> +        .quad 0xc08625ebc38897e0, 0xbe1cf25c2bc6ef12
> +        .quad 0xc08625ee99311ac8, 0xbe1cf132b70a41ad
> +        .quad 0xc08625f16dd8da28, 0xbe1cf1984236a6e3
> +        .quad 0xc08625f441808b78, 0xbe1cf19ae74998f9
> +        .quad 0xc08625f71428e370, 0xbe1cef3e175d61a1
> +        .quad 0xc08625f9e5d295f8, 0xbe1cf101f9868fd9
> +        .quad 0xc08625fcb67e5658, 0xbe1cee69db83dcd2
> +        .quad 0xc08625ff862cd6f8, 0xbe1cf081b636af51
> +        .quad 0xc086260254dec9a8, 0xbe1cee62c7d59b3e
> +        .quad 0xc08626052294df58, 0xbe1cf1b745c57716
> +        .quad 0xc0862607ef4fc868, 0xbe1cef3d2800ea23
> +        .quad 0xc086260abb103458, 0xbe1cef480ff1acd2
> +        .quad 0xc086260d85d6d200, 0xbe1cf2424c9a17ef
> +        .quad 0xc08626104fa44f90, 0xbe1cf12cfde90fd5
> +        .quad 0xc086261318795a68, 0xbe1cf21f590dd5b6
> +        .quad 0xc0862615e0569f48, 0xbe1cf0c50f9cd28a
> +        .quad 0xc0862618a73cca30, 0xbe1ceedbdb520545
> +        .quad 0xc086261b6d2c8668, 0xbe1cf0b030396011
> +        .quad 0xc086261e32267e98, 0xbe1cf19917010e96
> +        .quad 0xc0862620f62b5cb0, 0xbe1cf07331355985
> +        .quad 0xc0862623b93bc9e8, 0xbe1cf01ae921a1c3
> +        .quad 0xc08626267b586ed0, 0xbe1cefe5cf0dbf0c
> +        .quad 0xc08626293c81f348, 0xbe1cf01b258aeb50
> +        .quad 0xc086262bfcb8fe88, 0xbe1cee6b9e7f4c68
> +        .quad 0xc086262ebbfe3710, 0xbe1cee684a9b21c9
> +        .quad 0xc08626317a5242b8, 0xbe1cf1f8bcde9a8b
> +        .quad 0xc086263437b5c6c0, 0xbe1cf1d063d36238
> +        .quad 0xc0862636f42967a8, 0xbe1cf1e31a19075e
> +        .quad 0xc0862639afadc950, 0xbe1cf1d8efdf7e7d
> +        .quad 0xc086263c6a438ef0, 0xbe1cf1812ee72dba
> +        .quad 0xc086263f23eb5b18, 0xbe1cf1449a9a2279
> +        .quad 0xc0862641dca5cfb8, 0xbe1cee96edce5085
> +        .quad 0xc086264494738e08, 0xbe1cf06797bd03b2
> +        .quad 0xc08626474b5536b8, 0xbe1cef91b9b7ffc1
> +        .quad 0xc086264a014b69c0, 0xbe1cef4b6721278f
> +        .quad 0xc086264cb656c678, 0xbe1cf1942925eb4a
> +        .quad 0xc086264f6a77eba8, 0xbe1cefa2c7bc2e39
> +        .quad 0xc08626521daf7758, 0xbe1cf252595aceb3
> +        .quad 0xc0862654cffe0718, 0xbe1cee8e9ae47ec2
> +        .quad 0xc0862657816437a8, 0xbe1cf1bf913828fa
> +        .quad 0xc086265a31e2a558, 0xbe1cf23475d6b366
> +        .quad 0xc086265ce179ebc8, 0xbe1cef8df00a922b
> +        .quad 0xc086265f902aa5f0, 0xbe1cef279bfa43e0
> +        .quad 0xc08626623df56e38, 0xbe1cf080e10b8365
> +        .quad 0xc0862664eadade70, 0xbe1cf1a518f9b544
> +        .quad 0xc086266796db8fd0, 0xbe1cef9308fed9e9
> +        .quad 0xc086266a41f81ae8, 0xbe1ceea3ae6b19c9
> +        .quad 0xc086266cec3117b8, 0xbe1ceef06003d4c2
> +        .quad 0xc086266f95871da8, 0xbe1cf0b8457ffb0c
> +        .quad 0xc08626723dfac390, 0xbe1cf0c526745ad6
> +        .quad 0xc0862674e58c9fa8, 0xbe1cf0cf91ff7b5d
> +        .quad 0xc08626778c3d4798, 0xbe1cefe260819380
> +        .quad 0xc086267a320d5070, 0xbe1ceebd90aa27a3
> +        .quad 0xc086267cd6fd4ea8, 0xbe1cf0388121dffa
> +        .quad 0xc086267f7b0dd630, 0xbe1cf1a3881435f1
> +        .quad 0xc08626821e3f7a68, 0xbe1cef28e9d9ac52
> +        .quad 0xc0862684c092ce08, 0xbe1cf02d300062dd
> +        .quad 0xc086268762086350, 0xbe1cefaee1edfa35
> +        .quad 0xc086268a02a0cbe0, 0xbe1cf0a5a052e936
> +        .quad 0xc086268ca25c98d8, 0xbe1cee60a4a497ed
> +        .quad 0xc086268f413c5ab0, 0xbe1cf0e4a5d0cf49
> +        .quad 0xc0862691df40a170, 0xbe1cf149235a4e6e
> +        .quad 0xc08626947c69fc80, 0xbe1cf215180b9fcc
> +        .quad 0xc086269718b8fac8, 0xbe1cef9b156a9840
> +        .quad 0xc0862699b42e2a90, 0xbe1cf054c91441be
> +        .quad 0xc086269c4eca19a8, 0xbe1cf13ded26512c
> +        .quad 0xc086269ee88d5550, 0xbe1cf22ea4d8ac06
> +        .quad 0xc08626a181786a40, 0xbe1cf2354666ee2e
> +        .quad 0xc08626a4198be4a8, 0xbe1cefef936752b3
> +        .quad 0xc08626a6b0c85020, 0xbe1cf1e360a9db68
> +        .quad 0xc08626a9472e37d8, 0xbe1ceed6aeb812c5
> +        .quad 0xc08626abdcbe2650, 0xbe1cf227340b4986
> +        .quad 0xc08626ae7178a5b0, 0xbe1cf0215a0cbe0d
> +        .quad 0xc08626b1055e3f70, 0xbe1cf256adf0ae26
> +        .quad 0xc08626b3986f7ca8, 0xbe1ceff3c67aed06
> +        .quad 0xc08626b62aace5c8, 0xbe1cf2159fb93652
> +        .quad 0xc08626b8bc1702e0, 0xbe1cf01e6dbd1c7f
> +        .quad 0xc08626bb4cae5b60, 0xbe1cf009e75d1c0c
> +        .quad 0xc08626bddc737648, 0xbe1ceec10a020e73
> +        .quad 0xc08626c06b66da08, 0xbe1cf06d5783eee7
> +        .quad 0xc08626c2f9890ca0, 0xbe1cf0cb8f169ffe
> +        .quad 0xc08626c586da9388, 0xbe1cef7de2452430
> +        .quad 0xc08626c8135bf3b0, 0xbe1cf05da6f783ae
> +        .quad 0xc08626ca9f0db198, 0xbe1cefcc877d681d
> +        .quad 0xc08626cd29f05138, 0xbe1cef0531954ab3
> +        .quad 0xc08626cfb4045608, 0xbe1cf06b8565ea3d
> +        .quad 0xc08626d23d4a4310, 0xbe1cefdc455d9d7e
> +        .quad 0xc08626d4c5c29ad0, 0xbe1ceefc47e8fa64
> +        .quad 0xc08626d74d6ddf48, 0xbe1cf1872bf033f2
> +        .quad 0xc08626d9d44c9210, 0xbe1cf19d91087f9d
> +        .quad 0xc08626dc5a5f3438, 0xbe1cf012d444c6ab
> +        .quad 0xc08626dedfa64650, 0xbe1cf0ba528ee153
> +        .quad 0xc08626e164224880, 0xbe1ceeb431709788
> +        .quad 0xc08626e3e7d3ba60, 0xbe1cf0b9af31a6a5
> +        .quad 0xc08626e66abb1b28, 0xbe1cf168fb2e135b
> +        .quad 0xc08626e8ecd8e990, 0xbe1cef9097461c93
> +        .quad 0xc08626eb6e2da3d0, 0xbe1cee7a434735d8
> +        .quad 0xc08626edeeb9c7a8, 0xbe1cf235732b86f2
> +        .quad 0xc08626f06e7dd280, 0xbe1cefe1510b89e6
> +        .quad 0xc08626f2ed7a4120, 0xbe1cf1f64b9b80ef
> +        .quad 0xc08626f56baf9000, 0xbe1cf08f320ca339
> +        .quad 0xc08626f7e91e3b08, 0xbe1cf1b1de2808a1
> +        .quad 0xc08626fa65c6bdc0, 0xbe1cf1976d778b28
> +        .quad 0xc08626fce1a99338, 0xbe1ceef40a4f076f
> +        .quad 0xc08626ff5cc73600, 0xbe1cef3e45869ce3
> +        .quad 0xc0862701d7202048, 0xbe1ceef601b4c9d6
> +        .quad 0xc086270450b4cbc0, 0xbe1cf1eaf0b57fd6
> +        .quad 0xc0862706c985b1c0, 0xbe1cef82a44990f3
> +        .quad 0xc086270941934b10, 0xbe1ceefe32981f2c
> +        .quad 0xc086270bb8de1018, 0xbe1cefbf6f5a0445
> +        .quad 0xc086270e2f6678d0, 0xbe1cf18dba75792c
> +        .quad 0xc0862710a52cfcc8, 0xbe1cf0da64ce995f
> +        .quad 0xc08627131a321318, 0xbe1cef04ac0fb802
> +        .quad 0xc08627158e763268, 0xbe1cee9d4e2ad9bd
> +        .quad 0xc086271801f9d0f8, 0xbe1cefa9b55407b5
> +        .quad 0xc086271a74bd64a0, 0xbe1cefe6bd329570
> +        .quad 0xc086271ce6c162c8, 0xbe1cef0b1205dc85
> +        .quad 0xc086271f58064068, 0xbe1cef092a785e3f
> +        .quad 0xc0862721c88c7210, 0xbe1cf050dcdaac30
> +        .quad 0xc086272438546be8, 0xbe1cf210907ded8b
> +        .quad 0xc0862726a75ea1b8, 0xbe1cee760be44f99
> +        .quad 0xc086272915ab86c0, 0xbe1ceeeee07c2bcc
> +        .quad 0xc086272b833b8df0, 0xbe1cf06874992df5
> +        .quad 0xc086272df00f29d0, 0xbe1cef8fac5d4899
> +        .quad 0xc08627305c26cc70, 0xbe1cf1103241cc99
> +        .quad 0xc0862732c782e788, 0xbe1cf1d35fef83fe
> +        .quad 0xc08627353223ec68, 0xbe1cef3ec8133e1d
> +        .quad 0xc08627379c0a4be8, 0xbe1cef7261daccd8
> +        .quad 0xc086273a05367688, 0xbe1cf18656c50806
> +        .quad 0xc086273c6da8dc68, 0xbe1cf1c8736e049a
> +        .quad 0xc086273ed561ed38, 0xbe1cf1f93bff4911
> +        .quad 0xc08627413c621848, 0xbe1cf188a4ea680c
> +        .quad 0xc0862743a2a9cc80, 0xbe1cf1d270930c80
> +        .quad 0xc086274608397868, 0xbe1cf25a328c28e2
> +        .quad 0xc08627486d118a28, 0xbe1cf106f90aa3b8
> +        .quad 0xc086274ad1326f80, 0xbe1cee5e9d2e885a
> +        .quad 0xc086274d349c95c0, 0xbe1cf1c0bac27228
> +        .quad 0xc086274f975069f8, 0xbe1cf1a1500f9b1c
> +        .quad 0xc0862751f94e58c0, 0xbe1cefc30663ac44
> +        .quad 0xc08627545a96ce48, 0xbe1cf17123e427a2
> +        .quad 0xc0862756bb2a3678, 0xbe1cefb92749fea4
> +        .quad 0xc08627591b08fcc0, 0xbe1cefa40e1ea74a
> +        .quad 0xc086275b7a338c40, 0xbe1cee6f4612c3e9
> +        .quad 0xc086275dd8aa4fa8, 0xbe1cf1c54a053627
> +        .quad 0xc0862760366db168, 0xbe1ceff5eb503d9e
> +        .quad 0xc0862762937e1b70, 0xbe1cf02e47f10cee
> +        .quad 0xc0862764efdbf768, 0xbe1ceeb06e1d0dad
> +        .quad 0xc08627674b87ae88, 0xbe1cf10aadd6dba5
> +        .quad 0xc0862769a681a9c0, 0xbe1cf24e9913d30f
> +        .quad 0xc086276c00ca51a0, 0xbe1cef47b301e312
> +        .quad 0xc086276e5a620e48, 0xbe1ceeb1cefc2e85
> +        .quad 0xc0862770b3494788, 0xbe1cf16f1fbbe011
> +        .quad 0xc08627730b8064e8, 0xbe1ceebdf75174c7
> +        .quad 0xc08627756307cd70, 0xbe1cf06e3871a0da
> +        .quad 0xc0862777b9dfe7f0, 0xbe1cef16799fd554
> +        .quad 0xc086277a10091ac0, 0xbe1cf248dabf5377
> +        .quad 0xc086277c6583cc00, 0xbe1cf0c78d92a2cd
> +        .quad 0xc086277eba506158, 0xbe1cf0b911b029f0
> +        .quad 0xc08627810e6f4028, 0xbe1cefdc24719766
> +        .quad 0xc086278361e0cd70, 0xbe1cefbb6562b7e7
> +        .quad 0xc0862785b4a56dd8, 0xbe1cf1e0afb349ec
> +        .quad 0xc086278806bd85c0, 0xbe1cf008292e52fc
> +        .quad 0xc086278a58297918, 0xbe1cf053073872bf
> +        .quad 0xc086278ca8e9ab88, 0xbe1cf17a0a55a947
> +        .quad 0xc086278ef8fe8068, 0xbe1ceeffb0b60234
> +        .quad 0xc086279148685aa0, 0xbe1cf162204794a8
> +        .quad 0xc086279397279ce0, 0xbe1cf24cc8cb48ac
> +        .quad 0xc0862795e53ca978, 0xbe1cf0c9be68d5c3
> +        .quad 0xc086279832a7e258, 0xbe1cf172cd3d7388
> +        .quad 0xc086279a7f69a930, 0xbe1ceea2465fbce5
> +        .quad 0xc086279ccb825f40, 0xbe1cf0a386d2500f
> +        .quad 0xc086279f16f26590, 0xbe1cf1e338ddc18a
> +        .quad 0xc08627a161ba1cd0, 0xbe1cef1f5049867f
> +        .quad 0xc08627a3abd9e548, 0xbe1cef96c1ea8b1f
> +        .quad 0xc08627a5f5521f00, 0xbe1cf138f6fd3c26
> +        .quad 0xc08627a83e2329b0, 0xbe1cf0d4fcbfdf3a
> +        .quad 0xc08627aa864d64b0, 0xbe1cf24870c12c81
> +        .quad 0xc08627accdd12f18, 0xbe1cf0ae2a56348d
> +        .quad 0xc08627af14aee7a0, 0xbe1cee8ca1a9b893
> +        .quad 0xc08627b15ae6eca8, 0xbe1cf20414d637b0
> +        .quad 0xc08627b3a0799c60, 0xbe1cf0fc6b7b12d8
> +        .quad 0xc08627b5e5675488, 0xbe1cf152d93c4a00
> +        .quad 0xc08627b829b072a0, 0xbe1cf1073f9b77c2
> +        .quad 0xc08627ba6d5553d8, 0xbe1cee694f97d5a4
> +        .quad 0xc08627bcb0565500, 0xbe1cf0456b8239d7
> +        .quad 0xc08627bef2b3d2b0, 0xbe1cf211497127e3
> +        .quad 0xc08627c1346e2930, 0xbe1cf01856c0384d
> +        .quad 0xc08627c37585b468, 0xbe1cefa7dd05479e
> +        .quad 0xc08627c5b5fad000, 0xbe1cef3ae8e50b93
> +        .quad 0xc08627c7f5cdd750, 0xbe1ceea5f32fdd3a
> +        .quad 0xc08627ca34ff2560, 0xbe1cef424caeb8d9
> +        .quad 0xc08627cc738f14f0, 0xbe1cf0194d07a81f
> +        .quad 0xc08627ceb17e0070, 0xbe1cf20f452000c1
> +        .quad 0xc08627d0eecc4210, 0xbe1cf00e356218e4
> +        .quad 0xc08627d32b7a33a0, 0xbe1cef30484b4bcb
> +        .quad 0xc08627d567882eb0, 0xbe1ceeea11a6641b
> +        .quad 0xc08627d7a2f68c80, 0xbe1cf13492d5bd7b
> +        .quad 0xc08627d9ddc5a618, 0xbe1ceeb7048fad96
> +        .quad 0xc08627dc17f5d418, 0xbe1ceef0666f0477
> +        .quad 0xc08627de51876ee8, 0xbe1cf060d4b8b5c2
> +        .quad 0xc08627e08a7acea8, 0xbe1cf0b2a4b6ff8c
> +        .quad 0xc08627e2c2d04b28, 0xbe1cf0e34809a875
> +        .quad 0xc08627e4fa883bf0, 0xbe1cf16bf74a3522
> +        .quad 0xc08627e731a2f848, 0xbe1cee6a24623d57
> +        .quad 0xc08627e96820d718, 0xbe1cefc7b4f1528e
> +        .quad 0xc08627eb9e022f18, 0xbe1cf163051f3548
> +        .quad 0xc08627edd34756b8, 0xbe1cef36b3366305
> +        .quad 0xc08627f007f0a408, 0xbe1cf18134625550
> +        .quad 0xc08627f23bfe6cf0, 0xbe1cf0ec32ec1a11
> +        .quad 0xc08627f46f710700, 0xbe1ceeb3b64f3edc
> +        .quad 0xc08627f6a248c778, 0xbe1cf0cd15805bc8
> +        .quad 0xc08627f8d4860368, 0xbe1cf20db3bddebe
> +        .quad 0xc08627fb06290f90, 0xbe1cf25188430e25
> +        .quad 0xc08627fd37324070, 0xbe1ceea1713490f9
> +        .quad 0xc08627ff67a1ea28, 0xbe1cf159521d234c
> +        .quad 0xc0862801977860b8, 0xbe1cf24dfe50783b
> +        .quad 0xc0862803c6b5f7d0, 0xbe1ceef2ef89a60b
> +        .quad 0xc0862805f55b02c8, 0xbe1cee7fc919d62c
> +        .quad 0xc08628082367d4c0, 0xbe1cf215a7fb513a
> +        .quad 0xc086280a50dcc0a8, 0xbe1cf0e4401c5ed4
> +        .quad 0xc086280c7dba1910, 0xbe1cf04ec734d256
> +        .quad 0xc086280eaa003050, 0xbe1cf010ad787fea
> +        .quad 0xc0862810d5af5880, 0xbe1cee622478393d
> +        .quad 0xc086281300c7e368, 0xbe1cf01c7482564f
> +        .quad 0xc08628152b4a22a0, 0xbe1cf0de20d33536
> +        .quad 0xc086281755366778, 0xbe1cef2edae5837d
> +        .quad 0xc08628197e8d02f0, 0xbe1cf0a345318cc9
> +        .quad 0xc086281ba74e45d8, 0xbe1cf20085aa34b8
> +        .quad 0xc086281dcf7a80c0, 0xbe1cef5fa845ad83
> +        .quad 0xc086281ff71203e0, 0xbe1cf050d1df69c4
> +        .quad 0xc08628221e151f48, 0xbe1ceffe43c035b9
> +        .quad 0xc0862824448422b8, 0xbe1cf14f3018d3c2
> +        .quad 0xc08628266a5f5dc0, 0xbe1cef0a5fbae83d
> +        .quad 0xc08628288fa71f98, 0xbe1ceff8a95b72a1
> +        .quad 0xc086282ab45bb750, 0xbe1cef073aa9849b
> +        .quad 0xc086282cd87d73a8, 0xbe1cef69b3835c02
> +        .quad 0xc086282efc0ca328, 0xbe1cf0bc139379a9
> +        .quad 0xc08628311f099420, 0xbe1cef247a9ec596
> +        .quad 0xc086283341749490, 0xbe1cef74bbcc488a
> +        .quad 0xc0862835634df248, 0xbe1cef4bc42e7b8e
> +        .quad 0xc08628378495fad0, 0xbe1cf136d4d5a810
> +        .quad 0xc0862839a54cfb80, 0xbe1cf0d290b24dd8
> +        .quad 0xc086283bc5734168, 0xbe1ceeebde8e0065
> +        .quad 0xc086283de5091950, 0xbe1cf1a09f60aa1e
> +        .quad 0xc0862840040ecfe0, 0xbe1cf0803947a234
> +        .quad 0xc08628422284b168, 0xbe1cf0abf7638127
> +        .quad 0xc0862844406b0a08, 0xbe1cf0f73ee12058
> +        .quad 0xc08628465dc225a0, 0xbe1cf2079971b26c
> +        .quad 0xc08628487a8a4fe0, 0xbe1cee74957564b1
> +        .quad 0xc086284a96c3d420, 0xbe1ceee77c1b7d43
> +        .quad 0xc086284cb26efd90, 0xbe1cf23addba6e09
> +        .quad 0xc086284ecd8c1730, 0xbe1cf199f4a1da60
> +        .quad 0xc0862850e81b6bb0, 0xbe1cf09fdea81393
> +        .quad 0xc0862853021d4588, 0xbe1cf176adb417f7
> +        .quad 0xc08628551b91ef00, 0xbe1cf0f64f84a8da
> +        .quad 0xc08628573479b220, 0xbe1ceec34cf49523
> +        .quad 0xc08628594cd4d8a8, 0xbe1cf16d60fbe0bb
> +        .quad 0xc086285b64a3ac40, 0xbe1cee8de7acfc7b
> +        .quad 0xc086285d7be67630, 0xbe1ceee6256cce8d
> +        .quad 0xc086285f929d7fa0, 0xbe1cee7d66a3d8a5
> +        .quad 0xc0862861a8c91170, 0xbe1cf0bef8265792
> +        .quad 0xc0862863be697458, 0xbe1cf097f890c6f8
> +        .quad 0xc0862865d37ef0c8, 0xbe1cf09502d5c3fc
> +        .quad 0xc0862867e809cf00, 0xbe1ceeffb239dac7
> +        .quad 0xc0862869fc0a56f8, 0xbe1cf1fbfff95c98
> +        .quad 0xc086286c0f80d090, 0xbe1cefa57ad3eef7
> +        .quad 0xc086286e226d8348, 0xbe1cf22c58b9183d
> +        .quad 0xc086287034d0b690, 0xbe1ceff262d0a248
> +        .quad 0xc086287246aab180, 0xbe1cefa7bc194186
> +        .quad 0xc086287457fbbb08, 0xbe1cf06782d784d9
> +        .quad 0xc086287668c419e0, 0xbe1cf1d44d0eaa07
> +        .quad 0xc086287879041490, 0xbe1cf034803c8a48
> +        .quad 0xc086287a88bbf158, 0xbe1cf08e84916b6f
> +        .quad 0xc086287c97ebf650, 0xbe1cf0c4d3dc1bc7
> +        .quad 0xc086287ea6946958, 0xbe1cefb1e4625943
> +        .quad 0xc0862880b4b59010, 0xbe1cf143efdd1fd0
> +        .quad 0xc0862882c24faff8, 0xbe1cee9896d016da
> +        .quad 0xc0862884cf630e38, 0xbe1cf2186072f2cc
> +        .quad 0xc0862886dbefeff0, 0xbe1cef9217633d34
> +        .quad 0xc0862888e7f699e0, 0xbe1cf05603549486
> +        .quad 0xc086288af37750b0, 0xbe1cef50fff513d3
> +        .quad 0xc086288cfe7258c0, 0xbe1cf127713b32d0
> +        .quad 0xc086288f08e7f650, 0xbe1cf05015520f3d
> +        .quad 0xc086289112d86d58, 0xbe1cf12eb458b26f
> +        .quad 0xc08628931c4401a8, 0xbe1cf22eae2887ed
> +        .quad 0xc0862895252af6e0, 0xbe1cefdd6656dd2d
> +        .quad 0xc08628972d8d9058, 0xbe1cf1048ea4e646
> +        .quad 0xc0862899356c1150, 0xbe1ceec4501167e9
> +        .quad 0xc086289b3cc6bcb8, 0xbe1cf0ad52becc3f
> +        .quad 0xc086289d439dd568, 0xbe1cf0daa4e00e35
> +        .quad 0xc086289f49f19df8, 0xbe1cf00b80de8d6a
> +        .quad 0xc08628a14fc258c8, 0xbe1cf1bcf2ea8464
> +        .quad 0xc08628a355104818, 0xbe1cf0435e2782b0
> +        .quad 0xc08628a559dbade0, 0xbe1cf0e3e1a5f56c
> +        .quad 0xc08628a75e24cbf8, 0xbe1cefed9d5a721d
> +        .quad 0xc08628a961ebe3f8, 0xbe1cf0d2d74321e2
> +        .quad 0xc08628ab65313750, 0xbe1cf24200eb55e9
> +        .quad 0xc08628ad67f50740, 0xbe1cf23e9d7cf979
> +        .quad 0xc08628af6a3794d0, 0xbe1cf23a088f421c
> +        .quad 0xc08628b16bf920e0, 0xbe1cef2c1de1ab32
> +        .quad 0xc08628b36d39ec08, 0xbe1cf1abc231f7b2
> +        .quad 0xc08628b56dfa36d0, 0xbe1cf2074d5ba303
> +        .quad 0xc08628b76e3a4180, 0xbe1cf05cd5eed880
> +        /* Log_LA_table */
> +        .align 16
> +        .quad 0x8000000000000000
> +        .quad 0xbf5ff802a9ab10e6
> +        .quad 0xbf6ff00aa2b10bc0
> +        .quad 0xbf77ee11ebd82e94
> +        .quad 0xbf7fe02a6b106789
> +        .quad 0xbf83e7295d25a7d9
> +        .quad 0xbf87dc475f810a77
> +        .quad 0xbf8bcf712c74384c
> +        .quad 0xbf8fc0a8b0fc03e4
> +        .quad 0xbf91d7f7eb9eebe7
> +        .quad 0xbf93cea44346a575
> +        .quad 0xbf95c45a51b8d389
> +        .quad 0xbf97b91b07d5b11b
> +        .quad 0xbf99ace7551cc514
> +        .quad 0xbf9b9fc027af9198
> +        .quad 0xbf9d91a66c543cc4
> +        .quad 0xbf9f829b0e783300
> +        .quad 0xbfa0b94f7c196176
> +        .quad 0xbfa1b0d98923d980
> +        .quad 0xbfa2a7ec2214e873
> +        .quad 0xbfa39e87b9febd60
> +        .quad 0xbfa494acc34d911c
> +        .quad 0xbfa58a5bafc8e4d5
> +        .quad 0xbfa67f94f094bd98
> +        .quad 0xbfa77458f632dcfc
> +        .quad 0xbfa868a83083f6cf
> +        .quad 0xbfa95c830ec8e3eb
> +        .quad 0xbfaa4fe9ffa3d235
> +        .quad 0xbfab42dd711971bf
> +        .quad 0xbfac355dd0921f2d
> +        .quad 0xbfad276b8adb0b52
> +        .quad 0xbfae19070c276016
> +        .quad 0xbfaf0a30c01162a6
> +        .quad 0xbfaffae9119b9303
> +        .quad 0xbfb075983598e471
> +        .quad 0xbfb0ed839b5526fe
> +        .quad 0xbfb16536eea37ae1
> +        .quad 0xbfb1dcb263db1944
> +        .quad 0xbfb253f62f0a1417
> +        .quad 0xbfb2cb0283f5de1f
> +        .quad 0xbfb341d7961bd1d1
> +        .quad 0xbfb3b87598b1b6ee
> +        .quad 0xbfb42edcbea646f0
> +        .quad 0xbfb4a50d3aa1b040
> +        .quad 0xbfb51b073f06183f
> +        .quad 0xbfb590cafdf01c28
> +        .quad 0xbfb60658a93750c4
> +        .quad 0xbfb67bb0726ec0fc
> +        .quad 0xbfb6f0d28ae56b4c
> +        .quad 0xbfb765bf23a6be13
> +        .quad 0xbfb7da766d7b12cd
> +        .quad 0xbfb84ef898e8282a
> +        .quad 0xbfb8c345d6319b21
> +        .quad 0xbfb9375e55595ede
> +        .quad 0xbfb9ab42462033ad
> +        .quad 0xbfba1ef1d8061cd4
> +        .quad 0xbfba926d3a4ad563
> +        .quad 0xbfbb05b49bee43fe
> +        .quad 0xbfbb78c82bb0eda1
> +        .quad 0xbfbbeba818146765
> +        .quad 0xbfbc5e548f5bc743
> +        .quad 0xbfbcd0cdbf8c13e1
> +        .quad 0xbfbd4313d66cb35d
> +        .quad 0xbfbdb5270187d927
> +        .quad 0xbfbe27076e2af2e6
> +        .quad 0xbfbe98b549671467
> +        .quad 0xbfbf0a30c01162a6
> +        .quad 0xbfbf7b79fec37ddf
> +        .quad 0xbfbfec9131dbeabb
> +        .quad 0xbfc02ebb42bf3d4b
> +        .quad 0xbfc0671512ca596e
> +        .quad 0xbfc09f561ee719c3
> +        .quad 0xbfc0d77e7cd08e59
> +        .quad 0xbfc10f8e422539b1
> +        .quad 0xbfc14785846742ac
> +        .quad 0xbfc17f6458fca611
> +        .quad 0xbfc1b72ad52f67a0
> +        .quad 0xbfc1eed90e2dc2c3
> +        .quad 0xbfc2266f190a5acb
> +        .quad 0xbfc25ded0abc6ad2
> +        .quad 0xbfc29552f81ff523
> +        .quad 0xbfc2cca0f5f5f251
> +        .quad 0xbfc303d718e47fd3
> +        .quad 0xbfc33af575770e4f
> +        .quad 0xbfc371fc201e8f74
> +        .quad 0xbfc3a8eb2d31a376
> +        .quad 0xbfc3dfc2b0ecc62a
> +        .quad 0xbfc41682bf727bc0
> +        .quad 0xbfc44d2b6ccb7d1e
> +        .quad 0xbfc483bccce6e3dd
> +        .quad 0xbfc4ba36f39a55e5
> +        .quad 0xbfc4f099f4a230b2
> +        .quad 0xbfc526e5e3a1b438
> +        .quad 0xbfc55d1ad4232d6f
> +        .quad 0xbfc59338d9982086
> +        .quad 0xbfc5c940075972b9
> +        .quad 0xbfc5ff3070a793d4
> +        .quad 0xbfc6350a28aaa758
> +        .quad 0xbfc66acd4272ad51
> +        .quad 0xbfc6a079d0f7aad2
> +        .quad 0xbfc6d60fe719d21d
> +        .quad 0xbfc70b8f97a1aa75
> +        .quad 0xbfc740f8f54037a5
> +        .quad 0xbfc7764c128f2127
> +        .quad 0xbfc7ab890210d909
> +        .quad 0xbfc7e0afd630c274
> +        .quad 0xbfc815c0a14357eb
> +        .quad 0xbfc84abb75865139
> +        .quad 0xbfc87fa06520c911
> +        .quad 0xbfc8b46f8223625b
> +        .quad 0xbfc8e928de886d41
> +        .quad 0xbfc91dcc8c340bde
> +        .quad 0xbfc9525a9cf456b4
> +        .quad 0xbfc986d3228180ca
> +        .quad 0xbfc9bb362e7dfb83
> +        .quad 0xbfc9ef83d2769a34
> +        .quad 0xbfca23bc1fe2b563
> +        .quad 0xbfca57df28244dcd
> +        .quad 0xbfca8becfc882f19
> +        .quad 0xbfcabfe5ae46124c
> +        .quad 0xbfcaf3c94e80bff3
> +        .quad 0xbfcb2797ee46320c
> +        .quad 0xbfcb5b519e8fb5a4
> +        .quad 0xbfcb8ef670420c3b
> +        .quad 0xbfcbc286742d8cd6
> +        .quad 0xbfcbf601bb0e44e2
> +        .quad 0xbfcc2968558c18c1
> +        .quad 0xbfcc5cba543ae425
> +        .quad 0xbfcc8ff7c79a9a22
> +        .quad 0xbfccc320c0176502
> +        .quad 0xbfccf6354e09c5dc
> +        .quad 0xbfcd293581b6b3e7
> +        .quad 0xbfcd5c216b4fbb91
> +        .quad 0xbfcd8ef91af31d5e
> +        .quad 0xbfcdc1bca0abec7d
> +        .quad 0xbfcdf46c0c722d2f
> +        .quad 0xbfce27076e2af2e6
> +        .quad 0xbfce598ed5a87e2f
> +        .quad 0xbfce8c0252aa5a60
> +        .quad 0xbfcebe61f4dd7b0b
> +        .quad 0xbfcef0adcbdc5936
> +        .quad 0xbfcf22e5e72f105d
> +        .quad 0xbfcf550a564b7b37
> +        .quad 0xbfcf871b28955045
> +        .quad 0xbfcfb9186d5e3e2b
> +        .quad 0xbfcfeb0233e607cc
> +        .quad 0xbfd00e6c45ad501d
> +        .quad 0xbfd0274dc16c232f
> +        .quad 0xbfd0402594b4d041
> +        .quad 0xbfd058f3c703ebc6
> +        .quad 0xbfd071b85fcd590d
> +        .quad 0xbfd08a73667c57af
> +        .quad 0xbfd0a324e27390e3
> +        .quad 0xbfd0bbccdb0d24bd
> +        .quad 0xbfd0d46b579ab74b
> +        .quad 0xbfd0ed005f657da4
> +        .quad 0xbfd1058bf9ae4ad5
> +        .quad 0xbfd11e0e2dad9cb7
> +        .quad 0xbfd136870293a8b0
> +        .quad 0xbfd14ef67f88685a
> +        .quad 0xbfd1675cababa60e
> +        .quad 0xbfd17fb98e15095d
> +        .quad 0xbfd1980d2dd4236f
> +        .quad 0xbfd1b05791f07b49
> +        .quad 0xbfd1c898c16999fb
> +        .quad 0xbfd1e0d0c33716be
> +        .quad 0xbfd1f8ff9e48a2f3
> +        .quad 0xbfd211255986160c
> +        .quad 0xbfd22941fbcf7966
> +        .quad 0xbfd241558bfd1404
> +        .quad 0xbfd2596010df763a
> +        .quad 0xbfd27161913f853d
> +        .quad 0xbfd2895a13de86a3
> +        .quad 0xbfd2a1499f762bc9
> +        .quad 0xbfd2b9303ab89d25
> +        .quad 0xbfd2d10dec508583
> +        .quad 0xbfd2e8e2bae11d31
> +        .quad 0xbfd300aead06350c
> +        .quad 0xbfd31871c9544185
> +        .quad 0xbfd3302c16586588
> +        .quad 0xbfd347dd9a987d55
> +        .quad 0xbfd35f865c93293e
> +        .quad 0xbfd3772662bfd85b
> +        .quad 0xbfd38ebdb38ed321
> +        .quad 0xbfd3a64c556945ea
> +        .quad 0xbfd3bdd24eb14b6a
> +        .quad 0xbfd3d54fa5c1f710
> +        .quad 0xbfd3ecc460ef5f50
> +        .quad 0xbfd404308686a7e4
> +        .quad 0xbfd41b941cce0bee
> +        .quad 0xbfd432ef2a04e814
> +        .quad 0xbfd44a41b463c47c
> +        .quad 0xbfd4618bc21c5ec2
> +        .quad 0xbfd478cd5959b3d9
> +        .quad 0xbfd49006804009d1
> +        .quad 0xbfd4a7373cecf997
> +        .quad 0xbfd4be5f957778a1
> +        .quad 0xbfd4d57f8fefe27f
> +        .quad 0xbfd4ec973260026a
> +        .quad 0xbfd503a682cb1cb3
> +        .quad 0xbfd51aad872df82d
> +        .quad 0xbfd531ac457ee77e
> +        .quad 0xbfd548a2c3add263
> +        .quad 0xbfd55f9107a43ee2
> +        .quad 0xbfd5767717455a6c
> +        .quad 0xbfd58d54f86e02f2
> +        .quad 0xbfd5a42ab0f4cfe2
> +        .quad 0xbfd5baf846aa1b19
> +        .quad 0xbfd5d1bdbf5809ca
> +        .quad 0xbfd5e87b20c2954a
> +        .quad 0xbfd5ff3070a793d4
> +        .quad 0xbfd615ddb4bec13c
> +        .quad 0xbfd62c82f2b9c795
> +        .quad 0x3fd61965cdb02c1f
> +        .quad 0x3fd602d08af091ec
> +        .quad 0x3fd5ec433d5c35ae
> +        .quad 0x3fd5d5bddf595f30
> +        .quad 0x3fd5bf406b543db2
> +        .quad 0x3fd5a8cadbbedfa1
> +        .quad 0x3fd5925d2b112a59
> +        .quad 0x3fd57bf753c8d1fb
> +        .quad 0x3fd565995069514c
> +        .quad 0x3fd54f431b7be1a9
> +        .quad 0x3fd538f4af8f72fe
> +        .quad 0x3fd522ae0738a3d8
> +        .quad 0x3fd50c6f1d11b97c
> +        .quad 0x3fd4f637ebba9810
> +        .quad 0x3fd4e0086dd8baca
> +        .quad 0x3fd4c9e09e172c3c
> +        .quad 0x3fd4b3c077267e9a
> +        .quad 0x3fd49da7f3bcc41f
> +        .quad 0x3fd487970e958770
> +        .quad 0x3fd4718dc271c41b
> +        .quad 0x3fd45b8c0a17df13
> +        .quad 0x3fd44591e0539f49
> +        .quad 0x3fd42f9f3ff62642
> +        .quad 0x3fd419b423d5e8c7
> +        .quad 0x3fd403d086cea79c
> +        .quad 0x3fd3edf463c1683e
> +        .quad 0x3fd3d81fb5946dba
> +        .quad 0x3fd3c25277333184
> +        .quad 0x3fd3ac8ca38e5c5f
> +        .quad 0x3fd396ce359bbf54
> +        .quad 0x3fd3811728564cb2
> +        .quad 0x3fd36b6776be1117
> +        .quad 0x3fd355bf1bd82c8b
> +        .quad 0x3fd3401e12aecba1
> +        .quad 0x3fd32a84565120a8
> +        .quad 0x3fd314f1e1d35ce4
> +        .quad 0x3fd2ff66b04ea9d4
> +        .quad 0x3fd2e9e2bce12286
> +        .quad 0x3fd2d46602adccee
> +        .quad 0x3fd2bef07cdc9354
> +        .quad 0x3fd2a982269a3dbf
> +        .quad 0x3fd2941afb186b7c
> +        .quad 0x3fd27ebaf58d8c9d
> +        .quad 0x3fd269621134db92
> +        .quad 0x3fd25410494e56c7
> +        .quad 0x3fd23ec5991eba49
> +        .quad 0x3fd22981fbef797b
> +        .quad 0x3fd214456d0eb8d4
> +        .quad 0x3fd1ff0fe7cf47a7
> +        .quad 0x3fd1e9e1678899f4
> +        .quad 0x3fd1d4b9e796c245
> +        .quad 0x3fd1bf99635a6b95
> +        .quad 0x3fd1aa7fd638d33f
> +        .quad 0x3fd1956d3b9bc2fa
> +        .quad 0x3fd180618ef18adf
> +        .quad 0x3fd16b5ccbacfb73
> +        .quad 0x3fd1565eed455fc3
> +        .quad 0x3fd14167ef367783
> +        .quad 0x3fd12c77cd00713b
> +        .quad 0x3fd1178e8227e47c
> +        .quad 0x3fd102ac0a35cc1c
> +        .quad 0x3fd0edd060b78081
> +        .quad 0x3fd0d8fb813eb1ef
> +        .quad 0x3fd0c42d676162e3
> +        .quad 0x3fd0af660eb9e279
> +        .quad 0x3fd09aa572e6c6d4
> +        .quad 0x3fd085eb8f8ae797
> +        .quad 0x3fd07138604d5862
> +        .quad 0x3fd05c8be0d9635a
> +        .quad 0x3fd047e60cde83b8
> +        .quad 0x3fd03346e0106062
> +        .quad 0x3fd01eae5626c691
> +        .quad 0x3fd00a1c6adda473
> +        .quad 0x3fcfeb2233ea07cd
> +        .quad 0x3fcfc218be620a5e
> +        .quad 0x3fcf991c6cb3b379
> +        .quad 0x3fcf702d36777df0
> +        .quad 0x3fcf474b134df229
> +        .quad 0x3fcf1e75fadf9bde
> +        .quad 0x3fcef5ade4dcffe6
> +        .quad 0x3fceccf2c8fe920a
> +        .quad 0x3fcea4449f04aaf5
> +        .quad 0x3fce7ba35eb77e2a
> +        .quad 0x3fce530effe71012
> +        .quad 0x3fce2a877a6b2c12
> +        .quad 0x3fce020cc6235ab5
> +        .quad 0x3fcdd99edaf6d7e9
> +        .quad 0x3fcdb13db0d48940
> +        .quad 0x3fcd88e93fb2f450
> +        .quad 0x3fcd60a17f903515
> +        .quad 0x3fcd38666871f465
> +        .quad 0x3fcd1037f2655e7b
> +        .quad 0x3fcce816157f1988
> +        .quad 0x3fccc000c9db3c52
> +        .quad 0x3fcc97f8079d44ec
> +        .quad 0x3fcc6ffbc6f00f71
> +        .quad 0x3fcc480c0005ccd1
> +        .quad 0x3fcc2028ab17f9b4
> +        .quad 0x3fcbf851c067555f
> +        .quad 0x3fcbd087383bd8ad
> +        .quad 0x3fcba8c90ae4ad19
> +        .quad 0x3fcb811730b823d2
> +        .quad 0x3fcb5971a213acdb
> +        .quad 0x3fcb31d8575bce3d
> +        .quad 0x3fcb0a4b48fc1b46
> +        .quad 0x3fcae2ca6f672bd4
> +        .quad 0x3fcabb55c31693ad
> +        .quad 0x3fca93ed3c8ad9e3
> +        .quad 0x3fca6c90d44b704e
> +        .quad 0x3fca454082e6ab05
> +        .quad 0x3fca1dfc40f1b7f1
> +        .quad 0x3fc9f6c407089664
> +        .quad 0x3fc9cf97cdce0ec3
> +        .quad 0x3fc9a8778debaa38
> +        .quad 0x3fc981634011aa75
> +        .quad 0x3fc95a5adcf7017f
> +        .quad 0x3fc9335e5d594989
> +        .quad 0x3fc90c6db9fcbcd9
> +        .quad 0x3fc8e588ebac2dbf
> +        .quad 0x3fc8beafeb38fe8c
> +        .quad 0x3fc897e2b17b19a5
> +        .quad 0x3fc871213750e994
> +        .quad 0x3fc84a6b759f512f
> +        .quad 0x3fc823c16551a3c2
> +        .quad 0x3fc7fd22ff599d4f
> +        .quad 0x3fc7d6903caf5ad0
> +        .quad 0x3fc7b0091651528c
> +        .quad 0x3fc7898d85444c73
> +        .quad 0x3fc7631d82935a86
> +        .quad 0x3fc73cb9074fd14d
> +        .quad 0x3fc716600c914054
> +        .quad 0x3fc6f0128b756abc
> +        .quad 0x3fc6c9d07d203fc7
> +        .quad 0x3fc6a399dabbd383
> +        .quad 0x3fc67d6e9d785771
> +        .quad 0x3fc6574ebe8c133a
> +        .quad 0x3fc6313a37335d76
> +        .quad 0x3fc60b3100b09476
> +        .quad 0x3fc5e533144c1719
> +        .quad 0x3fc5bf406b543db2
> +        .quad 0x3fc59958ff1d52f1
> +        .quad 0x3fc5737cc9018cdd
> +        .quad 0x3fc54dabc26105d2
> +        .quad 0x3fc527e5e4a1b58d
> +        .quad 0x3fc5022b292f6a45
> +        .quad 0x3fc4dc7b897bc1c8
> +        .quad 0x3fc4b6d6fefe22a4
> +        .quad 0x3fc4913d8333b561
> +        .quad 0x3fc46baf0f9f5db7
> +        .quad 0x3fc4462b9dc9b3dc
> +        .quad 0x3fc420b32740fdd4
> +        .quad 0x3fc3fb45a59928cc
> +        .quad 0x3fc3d5e3126bc27f
> +        .quad 0x3fc3b08b6757f2a9
> +        .quad 0x3fc38b3e9e027479
> +        .quad 0x3fc365fcb0159016
> +        .quad 0x3fc340c59741142e
> +        .quad 0x3fc31b994d3a4f85
> +        .quad 0x3fc2f677cbbc0a96
> +        .quad 0x3fc2d1610c86813a
> +        .quad 0x3fc2ac55095f5c59
> +        .quad 0x3fc28753bc11aba5
> +        .quad 0x3fc2625d1e6ddf57
> +        .quad 0x3fc23d712a49c202
> +        .quad 0x3fc2188fd9807263
> +        .quad 0x3fc1f3b925f25d41
> +        .quad 0x3fc1ceed09853752
> +        .quad 0x3fc1aa2b7e23f72a
> +        .quad 0x3fc185747dbecf34
> +        .quad 0x3fc160c8024b27b1
> +        .quad 0x3fc13c2605c398c3
> +        .quad 0x3fc1178e8227e47c
> +        .quad 0x3fc0f301717cf0fb
> +        .quad 0x3fc0ce7ecdccc28d
> +        .quad 0x3fc0aa06912675d5
> +        .quad 0x3fc08598b59e3a07
> +        .quad 0x3fc06135354d4b18
> +        .quad 0x3fc03cdc0a51ec0d
> +        .quad 0x3fc0188d2ecf6140
> +        .quad 0x3fbfe89139dbd566
> +        .quad 0x3fbfa01c9db57ce2
> +        .quad 0x3fbf57bc7d9005db
> +        .quad 0x3fbf0f70cdd992e3
> +        .quad 0x3fbec739830a1120
> +        .quad 0x3fbe7f1691a32d3e
> +        .quad 0x3fbe3707ee30487b
> +        .quad 0x3fbdef0d8d466db9
> +        .quad 0x3fbda727638446a2
> +        .quad 0x3fbd5f55659210e2
> +        .quad 0x3fbd179788219364
> +        .quad 0x3fbccfedbfee13a8
> +        .quad 0x3fbc885801bc4b23
> +        .quad 0x3fbc40d6425a5cb1
> +        .quad 0x3fbbf968769fca11
> +        .quad 0x3fbbb20e936d6974
> +        .quad 0x3fbb6ac88dad5b1c
> +        .quad 0x3fbb23965a52ff00
> +        .quad 0x3fbadc77ee5aea8c
> +        .quad 0x3fba956d3ecade63
> +        .quad 0x3fba4e7640b1bc38
> +        .quad 0x3fba0792e9277cac
> +        .quad 0x3fb9c0c32d4d2548
> +        .quad 0x3fb97a07024cbe74
> +        .quad 0x3fb9335e5d594989
> +        .quad 0x3fb8ecc933aeb6e8
> +        .quad 0x3fb8a6477a91dc29
> +        .quad 0x3fb85fd927506a48
> +        .quad 0x3fb8197e2f40e3f0
> +        .quad 0x3fb7d33687c293c9
> +        .quad 0x3fb78d02263d82d3
> +        .quad 0x3fb746e100226ed9
> +        .quad 0x3fb700d30aeac0e1
> +        .quad 0x3fb6bad83c1883b6
> +        .quad 0x3fb674f089365a7a
> +        .quad 0x3fb62f1be7d77743
> +        .quad 0x3fb5e95a4d9791cb
> +        .quad 0x3fb5a3abb01ade25
> +        .quad 0x3fb55e10050e0384
> +        .quad 0x3fb518874226130a
> +        .quad 0x3fb4d3115d207eac
> +        .quad 0x3fb48dae4bc31018
> +        .quad 0x3fb4485e03dbdfad
> +        .quad 0x3fb403207b414b7f
> +        .quad 0x3fb3bdf5a7d1ee64
> +        .quad 0x3fb378dd7f749714
> +        .quad 0x3fb333d7f8183f4b
> +        .quad 0x3fb2eee507b40301
> +        .quad 0x3fb2aa04a44717a5
> +        .quad 0x3fb26536c3d8c369
> +        .quad 0x3fb2207b5c78549e
> +        .quad 0x3fb1dbd2643d190b
> +        .quad 0x3fb1973bd1465567
> +        .quad 0x3fb152b799bb3cc9
> +        .quad 0x3fb10e45b3cae831
> +        .quad 0x3fb0c9e615ac4e17
> +        .quad 0x3fb08598b59e3a07
> +        .quad 0x3fb0415d89e74444
> +        .quad 0x3faffa6911ab9301
> +        .quad 0x3faf723b517fc523
> +        .quad 0x3faeea31c006b87c
> +        .quad 0x3fae624c4a0b5e1b
> +        .quad 0x3fadda8adc67ee4e
> +        .quad 0x3fad52ed6405d86f
> +        .quad 0x3faccb73cdddb2cc
> +        .quad 0x3fac441e06f72a9e
> +        .quad 0x3fabbcebfc68f420
> +        .quad 0x3fab35dd9b58baad
> +        .quad 0x3faaaef2d0fb10fc
> +        .quad 0x3faa282b8a936171
> +        .quad 0x3fa9a187b573de7c
> +        .quad 0x3fa91b073efd7314
> +        .quad 0x3fa894aa149fb343
> +        .quad 0x3fa80e7023d8ccc4
> +        .quad 0x3fa788595a3577ba
> +        .quad 0x3fa70265a550e777
> +        .quad 0x3fa67c94f2d4bb58
> +        .quad 0x3fa5f6e73078efb8
> +        .quad 0x3fa5715c4c03ceef
> +        .quad 0x3fa4ebf43349e26f
> +        .quad 0x3fa466aed42de3ea
> +        .quad 0x3fa3e18c1ca0ae92
> +        .quad 0x3fa35c8bfaa1306b
> +        .quad 0x3fa2d7ae5c3c5bae
> +        .quad 0x3fa252f32f8d183f
> +        .quad 0x3fa1ce5a62bc353a
> +        .quad 0x3fa149e3e4005a8d
> +        .quad 0x3fa0c58fa19dfaaa
> +        .quad 0x3fa0415d89e74444
> +        .quad 0x3f9f7a9b16782856
> +        .quad 0x3f9e72bf2813ce51
> +        .quad 0x3f9d6b2725979802
> +        .quad 0x3f9c63d2ec14aaf2
> +        .quad 0x3f9b5cc258b718e6
> +        .quad 0x3f9a55f548c5c43f
> +        .quad 0x3f994f6b99a24475
> +        .quad 0x3f98492528c8cabf
> +        .quad 0x3f974321d3d006d3
> +        .quad 0x3f963d6178690bd6
> +        .quad 0x3f9537e3f45f3565
> +        .quad 0x3f9432a925980cc1
> +        .quad 0x3f932db0ea132e22
> +        .quad 0x3f9228fb1fea2e28
> +        .quad 0x3f912487a5507f70
> +        .quad 0x3f90205658935847
> +        .quad 0x3f8e38ce3033310c
> +        .quad 0x3f8c317384c75f06
> +        .quad 0x3f8a2a9c6c170462
> +        .quad 0x3f882448a388a2aa
> +        .quad 0x3f861e77e8b53fc6
> +        .quad 0x3f841929f96832f0
> +        .quad 0x3f82145e939ef1e9
> +        .quad 0x3f8010157588de71
> +        .quad 0x3f7c189cbb0e27fb
> +        .quad 0x3f78121214586b54
> +        .quad 0x3f740c8a747878e2
> +        .quad 0x3f70080559588b35
> +        .quad 0x3f680904828985c0
> +        .quad 0x3f60040155d5889e
> +        .quad 0x3f50020055655889
> +        .quad 0x0000000000000000
> +        /*== poly_coeff[4] ==*/
> +        .align 16
> +        .quad 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A /* coeff4 */
> +        .quad 0xbfd0000148058EE1, 0xbfd0000148058EE1 /* coeff3 */
> +        .quad 0x3fd55555555543C5, 0x3fd55555555543C5 /* coeff2 */
> +        .quad 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F /* coeff1 */
> +        /*== ExpMask ==*/
> +        .align 16
> +        .quad 0x000fffffffffffff, 0x000fffffffffffff
> +        /*== Two10 ==*/
> +        .align 16
> +        .quad 0x3f50000000000000, 0x3f50000000000000
> +        /*== MinLog1p = -1+2^(-53) ==*/
> +        .align 16
> +        .quad 0xbfefffffffffffff, 0xbfefffffffffffff
> +        /*== MaxLog1p ==*/
> +        .align 16
> +        .quad 0x7f3ffffffffff000, 0x7f3ffffffffff000
> +        /*== One ==*/
> +        .align 16
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== SgnMask ==*/
> +        .align 16
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff
> +        /*== XThreshold ==*/
> +        .align 16
> +        .quad 0x3e00000000000000, 0x3e00000000000000
> +        /*== XhMask ==*/
> +        .align 16
> +        .quad 0xfffffffffffffc00, 0xfffffffffffffc00
> +        /*== Threshold ==*/
> +        .align 16
> +        .quad 0x4086a00000000000, 0x4086a00000000000
> +        /*== Bias ==*/
> +        .align 16
> +        .quad 0x408ff80000000000, 0x408ff80000000000
> +        /*== Bias1 ==*/
> +        .align 16
> +        .quad 0x408ff00000000000, 0x408ff00000000000
> +        /*== ExpMask ==*/
> +        .align 16
> +        .quad 0x7ff0000000000000, 0x7ff0000000000000
> +        /*== ExpMask2 ==*/
> +        .align 16
> +        .quad 0x7f40000000000000, 0x7f40000000000000
> +        /*== L2L ==*/
> +        .align 16
> +        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
> +        /*== dBigThreshold ==*/
> +        .align 16
> +        .quad 0x41D0000000000000, 0x41D0000000000000
> +        /*== dLargestFinite ==*/
> +        .align 16
> +        .quad 0x7FEFFFFFFFFFFFFF, 0x7FEFFFFFFFFFFFFF
> +        /*== dThirtyOne ==*/
> +        .align 16
> +        .quad 0x403F000000000000, 0x403F000000000000
> +        /*== XScale ==*/
> +        .align 16
> +        .quad 0x3E10000000000000, 0x3E10000000000000
> +        .align 16
> +        .type	__svml_dacosh_data_internal,@object
> +        .size	__svml_dacosh_data_internal,.-__svml_dacosh_data_internal
> +        .align 16
> +
> +.FLT_20:
> +        .long	0x00000000,0x43380000,0x00000000,0x43380000
> +        .type	.FLT_20,@object
> +        .size	.FLT_20,16
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core-sse.S
> new file mode 100644
> index 0000000000..cc524d4813
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized acosh, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN4v_acosh _ZGVdN4v_acosh_sse_wrapper
> +#include "../svml_d_acosh4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core.c
> new file mode 100644
> index 0000000000..bb07c44f4b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized acosh, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN4v_acosh
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN4v_acosh, __GI__ZGVdN4v_acosh, __redirect__ZGVdN4v_acosh)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core_avx2.S
> new file mode 100644
> index 0000000000..18f278d899
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh4_core_avx2.S
> @@ -0,0 +1,1536 @@
> +/* Function acosh vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute acosh(x) as log(x + sqrt(x*x - 1))
> + *
> + *   Special cases:
> + *
> + *   acosh(NaN)  = quiet NaN, and raise invalid exception
> + *   acosh(-INF) = NaN
> + *   acosh(+INF) = +INF
> + *   acosh(x)    = NaN if x < 1
> + *   acosh(1)    = +0
> + *
> + */
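
As a point of reference, the identity and special cases listed above
correspond to the scalar sketch below (illustrative only; acosh_ref is a
hypothetical helper, and the vector kernel instead uses a table-driven log
and routes the special cases through its scalar callout):

  #include <math.h>

  /* Hypothetical scalar reference for the identity documented above.  */
  static double
  acosh_ref (double x)
  {
    if (isnan (x) || x < 1.0)
      return NAN;                        /* acosh(NaN) = NaN, acosh(x < 1) = NaN  */
    if (isinf (x))
      return x;                          /* acosh(+INF) = +INF  */
    return log (x + sqrt (x * x - 1.0)); /* acosh(1) = log(1) = +0  */
  }
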
> +
> +/* Offsets for data table __svml_dacosh_data_internal
> + */
> +#define Log_HA_table                  	0
> +#define Log_LA_table                  	8224
> +#define poly_coeff                    	12352
> +#define ExpMask                       	12480
> +#define Two10                         	12512
> +#define MinLog1p                      	12544
> +#define MaxLog1p                      	12576
> +#define One                           	12608
> +#define SgnMask                       	12640
> +#define XThreshold                    	12672
> +#define XhMask                        	12704
> +#define Threshold                     	12736
> +#define Bias                          	12768
> +#define Bias1                         	12800
> +#define ExpMask0                      	12832
> +#define ExpMask2                      	12864
> +#define L2                            	12896
> +#define dBigThreshold                 	12928
> +#define dC1                           	12960
> +#define dC2                           	12992
> +#define dC3                           	13024
> +#define dC4                           	13056
> +#define dC5                           	13088
> +#define dLargestFinite                	13120
> +#define dThirtyOne                    	13152
> +#define dTopMask12                    	13184
> +#define dTopMask29                    	13216
> +#define XScale                        	13248
> +
> +/* Lookup bias for data table __svml_dacosh_data_internal.  */
> +#define Table_Lookup_Bias               -0x405fe0
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN4v_acosh_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +        lea       Table_Lookup_Bias+__svml_dacosh_data_internal(%rip), %r8
> +
> +/* Load the constant 1 and possibly other stuff */
> +        vmovupd   One+__svml_dacosh_data_internal(%rip), %ymm8
> +
> +/*
> + * Now       1 / (1 + d)
> + * = 1 / (1 + (sqrt(1 - e) - 1))
> + * = 1 / sqrt(1 - e)
> + * = 1 + 1/2 * e + 3/8 * e^2 + 5/16 * e^3 + 35/128 * e^4 +
> + * 63/256 * e^5 + 231/1024 * e^6 + ....
> + * So compute the first five nonconstant terms of that, so that
> + * we have a relative correction (1 + Corr) to apply to S etc.
> + * C1 = 1/2
> + * C2 = 3/8
> + * C3 = 5/16
> + * C4 = 35/128
> + * C5 = 63/256
> + */
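
The chain of vfmadd213pd instructions further down evaluates this truncated
series in Horner form.  A plain-C sketch of that arithmetic (rsqrt_corr is
hypothetical; the coefficients are the ones named in the comment, which the
dC1..dC5 table entries encode):

  /* Returns Corr such that 1/sqrt(1 - e) ~= 1 + Corr for the small e
     computed further down.  */
  static double
  rsqrt_corr (double e)
  {
    const double C1 = 0.5, C2 = 0.375, C3 = 0.3125;
    const double C4 = 35.0 / 128.0, C5 = 63.0 / 256.0;
    double p = C5;
    p = p * e + C4;   /* mirrors the vfmadd213pd chain  */
    p = p * e + C3;
    p = p * e + C2;
    p = p * e + C1;
    return p * e;
  }
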
> +        vmovupd   dC5+__svml_dacosh_data_internal(%rip), %ymm3
> +        vmovapd   %ymm0, %ymm9
> +        vmovapd   %ymm8, %ymm13
> +        vfmsub231pd %ymm9, %ymm9, %ymm13
> +
> +/*
> + * Check that 1 < X < +inf; otherwise go to the callout function.
> + * We need the callout for X = 1 to avoid division by zero below.
> + * This test ensures that the callout handles NaN and either infinity.
> + */
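
In scalar terms the two unordered-quiet compares below amount to the
following (needs_callout is a hypothetical helper; the quiet comparison
macros from <math.h> mirror the NLE_UQ/NGT_UQ predicates):

  #include <math.h>

  /* A lane needs the callout if it is NaN, not greater than 1, or larger
     than the largest finite double (i.e. +INF).  */
  static int
  needs_callout (double x, double largest_finite)
  {
    return !islessequal (x, largest_finite)   /* vcmpnle_uqpd  */
           || !isgreater (x, 1.0);            /* vcmpngt_uqpd  */
  }
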
> +        vcmpnle_uqpd dLargestFinite+__svml_dacosh_data_internal(%rip), %ymm9, %ymm10
> +        vcmpngt_uqpd %ymm8, %ymm9, %ymm11
> +
> +/* dU is needed later on */
> +        vsubpd    %ymm8, %ymm9, %ymm6
> +
> +/*
> + * The following computation can go wrong for very large X, e.g.
> + * the X^2 - 1 = U * V can overflow. But for large X we have
> + * acosh(X) / log(2 X) - 1 =~= 1/(4 * X^2), so for X >= 2^30
> + * we can just later stick X back into the log and tweak up the exponent.
> + * Actually we scale X by 2^-30 and tweak the exponent up by 31,
> + * to stay in the safe range for the later log computation.
> + * Compute a flag now telling us when to do this.
> + */
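
A scalar sketch of that shortcut (acosh_big is hypothetical; the log(2)
literal below is the same value the L2 table entry stores):

  #include <math.h>

  /* For x >= 2^30, acosh(x) equals log(2*x) to double precision, and
     log(2*x) = log(x * 2^-30) + 31*log(2) keeps the log argument in a
     safe range.  */
  static double
  acosh_big (double x)
  {
    const double ln2 = 0x1.62e42fefa39efp-1;
    return log (x * 0x1p-30) + 31.0 * ln2;
  }
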
> +        vcmplt_oqpd dBigThreshold+__svml_dacosh_data_internal(%rip), %ymm9, %ymm7
> +
> +/*
> + * do the same thing but with NR iteration
> + * Finally, express Y + W = U * V accurately where Y has <= 29 bits
> + */
> +        vandpd    dTopMask29+__svml_dacosh_data_internal(%rip), %ymm13, %ymm5
> +
> +/*
> + * Compute R = 1/sqrt(Y + W) * (1 + d)
> + * Force R to <= 12 significant bits in case it isn't already
> + * This means that R * Y and R^2 * Y are exactly representable.
> + */
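
The next few instructions build that estimate; an equivalent intrinsics
sketch (rsqrt12 is hypothetical, with top_mask12 standing in for the
dTopMask12 constant):

  #include <immintrin.h>

  /* ~12-bit reciprocal square root estimate, truncated so that R*Y and
     R*R*Y remain exactly representable.  */
  static __m256d
  rsqrt12 (__m256d y, __m256d top_mask12)
  {
    __m128 yf = _mm256_cvtpd_ps (y);       /* vcvtpd2ps  */
    __m128 rf = _mm_rsqrt_ps (yf);         /* vrsqrtps   */
    __m256d r = _mm256_cvtps_pd (rf);      /* vcvtps2pd  */
    return _mm256_and_pd (r, top_mask12);  /* keep ~12 significant bits  */
  }
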
> +        vcvtpd2ps %ymm5, %xmm14
> +        vsubpd    %ymm5, %ymm13, %ymm4
> +        vrsqrtps  %xmm14, %xmm15
> +        vcvtps2pd %xmm15, %ymm0
> +        vandpd    dTopMask12+__svml_dacosh_data_internal(%rip), %ymm0, %ymm2
> +        vorpd     %ymm11, %ymm10, %ymm12
> +
> +/*
> + * Compute S = (Y/sqrt(Y + W)) * (1 + d)
> + * and T = (W/sqrt(Y + W)) * (1 + d)
> + * so that S + T = sqrt(Y + W) * (1 + d)
> + * S is exact, and the rounding error in T is OK.
> + */
> +        vmulpd    %ymm2, %ymm5, %ymm10
> +        vmulpd    %ymm4, %ymm2, %ymm11
> +
> +/*
> + * Compute e = -(2 * d + d^2)
> + * The first FMR is exact, and the rounding error in the other is acceptable
> + * since d and e are ~ 2^-12
> + */
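
A scalar sketch of the two fused negated multiply-adds below (compute_e is
hypothetical; R is the truncated estimate, Y + W the two-part split of
x*x - 1):

  #include <math.h>

  static double
  compute_e (double R, double Y, double W)
  {
    double S = R * Y;             /* exact: R and Y carry few significant bits  */
    double T = R * W;
    double e = fma (-R, S, 1.0);  /* first step, exact per the comment above  */
    return fma (-R, T, e);        /* second step, small rounding error  */
  }
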
> +        vmovapd   %ymm8, %ymm1
> +        vfnmadd231pd %ymm10, %ymm2, %ymm1
> +
> +/*
> + * For low-accuracy versions, the computation can be done
> + * just as U + ((S + T) + (S + T) * Corr)
> + */
> +        vaddpd    %ymm11, %ymm10, %ymm13
> +        vfnmadd231pd %ymm11, %ymm2, %ymm1
> +        vfmadd213pd dC4+__svml_dacosh_data_internal(%rip), %ymm1, %ymm3
> +        vfmadd213pd dC3+__svml_dacosh_data_internal(%rip), %ymm1, %ymm3
> +        vfmadd213pd dC2+__svml_dacosh_data_internal(%rip), %ymm1, %ymm3
> +        vfmadd213pd dC1+__svml_dacosh_data_internal(%rip), %ymm1, %ymm3
> +        vmovmskpd %ymm12, %eax
> +        vmulpd    %ymm3, %ymm1, %ymm12
> +
> +/* Now multiplex to the case X = 2^-30 * input, Xl = dL = 0 in the "big" case. */
> +        vmulpd    XScale+__svml_dacosh_data_internal(%rip), %ymm9, %ymm3
> +        vfmadd213pd %ymm13, %ymm12, %ymm13
> +        vaddpd    %ymm13, %ymm6, %ymm6
> +
> +/*
> + * Now we feed into the log1p code, using H in place of _VARG1 and
> + * also adding L into Xl.
> + * compute 1+x as high, low parts
> + */
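
The max/min, add and subtract sequence below is a standard Fast2Sum split of
1 + dU; in scalar form (split_1pu is hypothetical, and the conditional
XhMask truncation applied to larger inputs is omitted):

  /* Split 1 + u into hi + lo, lo being the exact residual of the sum;
     both operands are nonnegative here, so max/min orders by magnitude.  */
  static void
  split_1pu (double u, double *hi, double *lo)
  {
    double a = u > 1.0 ? u : 1.0;   /* vmaxpd  */
    double b = u > 1.0 ? 1.0 : u;   /* vminpd  */
    *hi = a + b;
    *lo = (a - *hi) + b;
  }
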
> +        vmaxpd    %ymm6, %ymm8, %ymm4
> +        vminpd    %ymm6, %ymm8, %ymm2
> +        vandpd    SgnMask+__svml_dacosh_data_internal(%rip), %ymm6, %ymm14
> +        vcmplt_oqpd XThreshold+__svml_dacosh_data_internal(%rip), %ymm14, %ymm15
> +        vaddpd    %ymm2, %ymm4, %ymm0
> +        vorpd     XhMask+__svml_dacosh_data_internal(%rip), %ymm15, %ymm5
> +        vandpd    %ymm5, %ymm0, %ymm6
> +        vblendvpd %ymm7, %ymm6, %ymm3, %ymm5
> +        vsubpd    %ymm6, %ymm4, %ymm1
> +
> +/* 2^ (-10-exp(X) ) */
> +        vmovupd   ExpMask2+__svml_dacosh_data_internal(%rip), %ymm15
> +        vaddpd    %ymm1, %ymm2, %ymm10
> +
> +/* exponent bits */
> +        vpsrlq    $20, %ymm5, %ymm2
> +
> +/*
> + * Now resume the main code.
> + * preserve mantissa, set input exponent to 2^(-10)
> + */
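
From here on this is the usual table-driven log: split off the exponent,
approximate the reciprocal of the mantissa, reduce, then reconstruct.  A
rough scalar outline (log_reconstruct is hypothetical; T_rcp stands for the
Log_LA_table entry selected from the reciprocal, k for the corrected
exponent):

  /* log(x) ~= k*log(2) + T_rcp + p(r), with r = mantissa*rcp - 1 and
     p(r) = r - r^2/2 + r^3/3 - r^4/4 + r^5/5 (cf. poly_coeff).  */
  static double
  log_reconstruct (double k, double r, double T_rcp)
  {
    const double ln2 = 0x1.62e42fefa39efp-1;
    double p = (((0.2 * r - 0.25) * r + 1.0 / 3.0) * r - 0.5) * (r * r) + r;
    return k * ln2 + T_rcp + p;
  }
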
> +        vandpd    ExpMask+__svml_dacosh_data_internal(%rip), %ymm5, %ymm11
> +        vorpd     Two10+__svml_dacosh_data_internal(%rip), %ymm11, %ymm12
> +
> +/* reciprocal approximation good to at least 11 bits */
> +        vcvtpd2ps %ymm12, %xmm13
> +        vrcpps    %xmm13, %xmm14
> +
> +/* exponent*log(2.0) */
> +        vmovupd   Threshold+__svml_dacosh_data_internal(%rip), %ymm13
> +        vcvtps2pd %xmm14, %ymm3
> +        vandpd    %ymm7, %ymm10, %ymm4
> +
> +/* exponent of X needed to scale Xl */
> +        vandps    ExpMask0+__svml_dacosh_data_internal(%rip), %ymm5, %ymm0
> +        vpsubq    %ymm0, %ymm15, %ymm6
> +
> +/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
> +        vroundpd  $0, %ymm3, %ymm3
> +        vextractf128 $1, %ymm2, %xmm1
> +        vshufps   $221, %xmm1, %xmm2, %xmm10
> +
> +/* biased exponent in DP format */
> +        vcvtdq2pd %xmm10, %ymm12
> +
> +/* scale DblRcp */
> +        vmulpd    %ymm6, %ymm3, %ymm2
> +
> +/* Add 31 to the exponent in the "large" case to get log(2 * input) */
> +        vaddpd    dThirtyOne+__svml_dacosh_data_internal(%rip), %ymm12, %ymm11
> +
> +/* argument reduction */
> +        vfmsub213pd %ymm8, %ymm2, %ymm5
> +        vmulpd    %ymm2, %ymm4, %ymm8
> +        vmovupd   poly_coeff+64+__svml_dacosh_data_internal(%rip), %ymm2
> +        vblendvpd %ymm7, %ymm12, %ymm11, %ymm1
> +
> +/*
> + * prepare table index
> + * table lookup
> + */
> +        vpsrlq    $40, %ymm3, %ymm7
> +        vcmplt_oqpd %ymm3, %ymm13, %ymm3
> +        vandpd    Bias+__svml_dacosh_data_internal(%rip), %ymm3, %ymm14
> +        vorpd     Bias1+__svml_dacosh_data_internal(%rip), %ymm14, %ymm15
> +        vsubpd    %ymm15, %ymm1, %ymm1
> +
> +/* polynomial */
> +        vmovupd   poly_coeff+__svml_dacosh_data_internal(%rip), %ymm3
> +        vmovd     %xmm7, %edx
> +        vextractf128 $1, %ymm7, %xmm10
> +        vpextrd   $2, %xmm7, %ecx
> +        vmulpd    L2+__svml_dacosh_data_internal(%rip), %ymm1, %ymm7
> +        vaddpd    %ymm8, %ymm5, %ymm1
> +        vmovd     %xmm10, %esi
> +        vsubpd    %ymm5, %ymm1, %ymm5
> +        vfmadd213pd poly_coeff+32+__svml_dacosh_data_internal(%rip), %ymm1, %ymm3
> +        vfmadd213pd poly_coeff+96+__svml_dacosh_data_internal(%rip), %ymm1, %ymm2
> +        vsubpd    %ymm5, %ymm8, %ymm4
> +        vmulpd    %ymm1, %ymm1, %ymm8
> +        vfmadd213pd %ymm2, %ymm8, %ymm3
> +        movslq    %edx, %rdx
> +        movslq    %esi, %rsi
> +        vpextrd   $2, %xmm10, %edi
> +        movslq    %ecx, %rcx
> +        movslq    %edi, %rdi
> +
> +/*
> + * reconstruction
> + * VQFMA( D, R, P, R2, R );
> + */
> +        vfmadd213pd %ymm4, %ymm8, %ymm3
> +        vmovsd    (%r8,%rdx), %xmm0
> +        vmovsd    (%r8,%rsi), %xmm11
> +        vmovhpd   (%r8,%rcx), %xmm0, %xmm6
> +        vmovhpd   (%r8,%rdi), %xmm11, %xmm12
> +        vinsertf128 $1, %xmm12, %ymm6, %ymm0
> +        vaddpd    %ymm3, %ymm1, %ymm6
> +        vaddpd    %ymm6, %ymm0, %ymm0
> +        vaddpd    %ymm0, %ymm7, %ymm0
> +        testl     %eax, %eax
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm9
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovupd   %ymm9, 32(%rsp)
> +        vmovupd   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0
> +
> +        xorl      %edx, %edx
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovupd   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
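
The bit-test loop above and the call below simply redo each flagged lane
with the scalar routine; in C terms (fixup_special_lanes is a hypothetical
sketch):

  #include <math.h>

  /* Each set bit in the 4-bit range mask selects a lane to recompute with
     the scalar acosh; the remaining lanes keep the vector result.  */
  static void
  fixup_special_lanes (const double in[4], double out[4], unsigned int mask)
  {
    for (int i = 0; i < 4; i++)
      if (mask & (1u << i))
        out[i] = acosh (in[i]);
  }
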
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      acosh@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 64(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN4v_acosh_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_dacosh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(32)) VUINT32 Log_HA_table[(1<<10)+2][2];
> +        __declspec(align(32)) VUINT32 Log_LA_table[(1<<9)+1][2];
> +        __declspec(align(32)) VUINT32 poly_coeff[4][4][2];
> +        __declspec(align(32)) VUINT32 ExpMask[4][2];
> +        __declspec(align(32)) VUINT32 Two10[4][2];
> +        __declspec(align(32)) VUINT32 MinLog1p[4][2];
> +        __declspec(align(32)) VUINT32 MaxLog1p[4][2];
> +        __declspec(align(32)) VUINT32 One[4][2];
> +        __declspec(align(32)) VUINT32 SgnMask[4][2];
> +        __declspec(align(32)) VUINT32 XThreshold[4][2];
> +        __declspec(align(32)) VUINT32 XhMask[4][2];
> +        __declspec(align(32)) VUINT32 Threshold[4][2];
> +        __declspec(align(32)) VUINT32 Bias[4][2];
> +        __declspec(align(32)) VUINT32 Bias1[4][2];
> +        __declspec(align(32)) VUINT32 ExpMask0[4][2];
> +        __declspec(align(32)) VUINT32 ExpMask2[4][2];
> +        __declspec(align(32)) VUINT32 L2[4][2];
> +        __declspec(align(32)) VUINT32 dBigThreshold[4][2];
> +        __declspec(align(32)) VUINT32 dC1[4][2];
> +        __declspec(align(32)) VUINT32 dC2[4][2];
> +        __declspec(align(32)) VUINT32 dC3[4][2];
> +        __declspec(align(32)) VUINT32 dC4[4][2];
> +        __declspec(align(32)) VUINT32 dC5[4][2];
> +        __declspec(align(32)) VUINT32 dLargestFinite[4][2];
> +        __declspec(align(32)) VUINT32 dThirtyOne[4][2];
> +        __declspec(align(32)) VUINT32 dTopMask12[4][2];
> +        __declspec(align(32)) VUINT32 dTopMask29[4][2];
> +        __declspec(align(32)) VUINT32 XScale[4][2];
> +} __svml_dacosh_data_internal;
> +#endif
> +__svml_dacosh_data_internal:
> +        /* Log_HA_table */
> +        .quad 0xc086232bdd7a8300, 0xbe1ce91eef3fb100
> +        .quad 0xc086232fdc7ad828, 0xbe1cefcffda73b6a
> +        .quad 0xc0862333d97d2ba0, 0xbe1cef406748f1ff
> +        .quad 0xc0862337d48378e0, 0xbe1cef2a9429925a
> +        .quad 0xc086233bcd8fb878, 0xbe1cf138d17ebecb
> +        .quad 0xc086233fc4a3e018, 0xbe1ceff2dbbbb29e
> +        .quad 0xc0862343b9c1e270, 0xbe1cf1a42aae437b
> +        .quad 0xc0862347acebaf68, 0xbe1cef3b152048af
> +        .quad 0xc086234b9e2333f0, 0xbe1cef20e127805e
> +        .quad 0xc086234f8d6a5a30, 0xbe1cf00ad6052cf4
> +        .quad 0xc08623537ac30980, 0xbe1cefc4642ee597
> +        .quad 0xc0862357662f2660, 0xbe1cf1f277d36e16
> +        .quad 0xc086235b4fb092a0, 0xbe1ceed009e8d8e6
> +        .quad 0xc086235f37492d28, 0xbe1cf1e4038cb362
> +        .quad 0xc08623631cfad250, 0xbe1cf0b0873b8557
> +        .quad 0xc086236700c75b98, 0xbe1cf15bb3227c0b
> +        .quad 0xc086236ae2b09fe0, 0xbe1cf151ef8ca9ed
> +        .quad 0xc086236ec2b87358, 0xbe1cefe1dc2cd2ed
> +        .quad 0xc0862372a0e0a780, 0xbe1cf0d1eec5454f
> +        .quad 0xc08623767d2b0b48, 0xbe1ceeefd570bbce
> +        .quad 0xc086237a57996af0, 0xbe1cee99ae91b3a7
> +        .quad 0xc086237e302d9028, 0xbe1cf0412830fbd1
> +        .quad 0xc086238206e94218, 0xbe1ceee898588610
> +        .quad 0xc0862385dbce4548, 0xbe1cee9a1fbcaaea
> +        .quad 0xc0862389aede5bc0, 0xbe1ceed8e7cc1ad6
> +        .quad 0xc086238d801b4500, 0xbe1cf10c8d059da6
> +        .quad 0xc08623914f86be18, 0xbe1ceee6c63a8165
> +        .quad 0xc08623951d228180, 0xbe1cf0c3592d2ff1
> +        .quad 0xc0862398e8f04758, 0xbe1cf0026cc4cb1b
> +        .quad 0xc086239cb2f1c538, 0xbe1cf15d48d8e670
> +        .quad 0xc08623a07b28ae60, 0xbe1cef359363787c
> +        .quad 0xc08623a44196b390, 0xbe1cefdf1ab2e82c
> +        .quad 0xc08623a8063d8338, 0xbe1cefe43c02aa84
> +        .quad 0xc08623abc91ec960, 0xbe1cf044f5ae35b7
> +        .quad 0xc08623af8a3c2fb8, 0xbe1cf0b0b4001e1b
> +        .quad 0xc08623b349975d98, 0xbe1cf1bae76dfbcf
> +        .quad 0xc08623b70731f810, 0xbe1cef0a72e13a62
> +        .quad 0xc08623bac30da1c8, 0xbe1cf184007d2b6b
> +        .quad 0xc08623be7d2bfb40, 0xbe1cf16f4b239e98
> +        .quad 0xc08623c2358ea2a0, 0xbe1cf0976acada87
> +        .quad 0xc08623c5ec3733d0, 0xbe1cf066318a16ff
> +        .quad 0xc08623c9a1274880, 0xbe1ceffaa7148798
> +        .quad 0xc08623cd54607820, 0xbe1cf23ab02e9b6e
> +        .quad 0xc08623d105e45800, 0xbe1cefdfef7d4fde
> +        .quad 0xc08623d4b5b47b20, 0xbe1cf17fece44f2b
> +        .quad 0xc08623d863d27270, 0xbe1cf18f907d0d7c
> +        .quad 0xc08623dc103fccb0, 0xbe1cee61fe072c98
> +        .quad 0xc08623dfbafe1668, 0xbe1cf022dd891e2f
> +        .quad 0xc08623e3640eda20, 0xbe1ceecc1daf4358
> +        .quad 0xc08623e70b73a028, 0xbe1cf0173c4fa380
> +        .quad 0xc08623eab12deec8, 0xbe1cf16a2150c2f4
> +        .quad 0xc08623ee553f4a30, 0xbe1cf1bf980b1f4b
> +        .quad 0xc08623f1f7a93480, 0xbe1cef8b731663c2
> +        .quad 0xc08623f5986d2dc0, 0xbe1cee9a664d7ef4
> +        .quad 0xc08623f9378cb3f0, 0xbe1cf1eda2af6400
> +        .quad 0xc08623fcd5094320, 0xbe1cf1923f9d68d7
> +        .quad 0xc086240070e45548, 0xbe1cf0747cd3e03a
> +        .quad 0xc08624040b1f6260, 0xbe1cf22ee855bd6d
> +        .quad 0xc0862407a3bbe078, 0xbe1cf0d57360c00b
> +        .quad 0xc086240b3abb4398, 0xbe1ceebc815cd575
> +        .quad 0xc086240ed01efdd0, 0xbe1cf03bfb970951
> +        .quad 0xc086241263e87f50, 0xbe1cf16e74768529
> +        .quad 0xc0862415f6193658, 0xbe1cefec64b8becb
> +        .quad 0xc086241986b28f30, 0xbe1cf0838d210baa
> +        .quad 0xc086241d15b5f448, 0xbe1cf0ea86e75b11
> +        .quad 0xc0862420a324ce28, 0xbe1cf1708d11d805
> +        .quad 0xc08624242f008380, 0xbe1ceea988c5a417
> +        .quad 0xc0862427b94a7910, 0xbe1cef166a7bbca5
> +        .quad 0xc086242b420411d0, 0xbe1cf0c9d9e86a38
> +        .quad 0xc086242ec92eaee8, 0xbe1cef0946455411
> +        .quad 0xc08624324ecbaf98, 0xbe1cefea60907739
> +        .quad 0xc0862435d2dc7160, 0xbe1cf1ed0934ce42
> +        .quad 0xc086243955624ff8, 0xbe1cf191ba746c7d
> +        .quad 0xc086243cd65ea548, 0xbe1ceeec78cf2a7e
> +        .quad 0xc086244055d2c968, 0xbe1cef345284c119
> +        .quad 0xc0862443d3c012b8, 0xbe1cf24f77355219
> +        .quad 0xc08624475027d5e8, 0xbe1cf05bf087e114
> +        .quad 0xc086244acb0b65d0, 0xbe1cef3504a32189
> +        .quad 0xc086244e446c1398, 0xbe1ceff54b2a406f
> +        .quad 0xc0862451bc4b2eb8, 0xbe1cf0757d54ed4f
> +        .quad 0xc086245532aa04f0, 0xbe1cf0c8099fdfd5
> +        .quad 0xc0862458a789e250, 0xbe1cf0b173796a31
> +        .quad 0xc086245c1aec1138, 0xbe1cf11d8734540d
> +        .quad 0xc086245f8cd1da60, 0xbe1cf1916a723ceb
> +        .quad 0xc0862462fd3c84d8, 0xbe1cf19a911e1da7
> +        .quad 0xc08624666c2d5608, 0xbe1cf23a9ef72e4f
> +        .quad 0xc0862469d9a591c0, 0xbe1cef503d947663
> +        .quad 0xc086246d45a67a18, 0xbe1cf0fceeb1a0b2
> +        .quad 0xc0862470b0314fa8, 0xbe1cf107e27e4fbc
> +        .quad 0xc086247419475160, 0xbe1cf03dd9922331
> +        .quad 0xc086247780e9bc98, 0xbe1cefce1a10e129
> +        .quad 0xc086247ae719cd18, 0xbe1ceea47f73c4f6
> +        .quad 0xc086247e4bd8bd10, 0xbe1ceec0ac56d100
> +        .quad 0xc0862481af27c528, 0xbe1cee8a6593278a
> +        .quad 0xc086248511081c70, 0xbe1cf2231dd9dec7
> +        .quad 0xc0862488717af888, 0xbe1cf0b4b8ed7da8
> +        .quad 0xc086248bd0818d68, 0xbe1cf1bd8d835002
> +        .quad 0xc086248f2e1d0d98, 0xbe1cf259acc107f4
> +        .quad 0xc08624928a4eaa20, 0xbe1cee897636b00c
> +        .quad 0xc0862495e5179270, 0xbe1cee757f20c326
> +        .quad 0xc08624993e78f490, 0xbe1cefafd3aa54a4
> +        .quad 0xc086249c9673fd10, 0xbe1cee7298d38b97
> +        .quad 0xc086249fed09d6f8, 0xbe1ceedc158d4ceb
> +        .quad 0xc08624a3423babe0, 0xbe1cf2282987cb2e
> +        .quad 0xc08624a6960aa400, 0xbe1cefe7381ecc4b
> +        .quad 0xc08624a9e877e600, 0xbe1cef328dbbce80
> +        .quad 0xc08624ad39849728, 0xbe1cefde45f3cc71
> +        .quad 0xc08624b08931db58, 0xbe1cefa8b89433b9
> +        .quad 0xc08624b3d780d500, 0xbe1cef6773c0b139
> +        .quad 0xc08624b72472a528, 0xbe1cf031c931c11f
> +        .quad 0xc08624ba70086b78, 0xbe1cf088f49275e7
> +        .quad 0xc08624bdba434630, 0xbe1cf17de0eaa86d
> +        .quad 0xc08624c103245238, 0xbe1cefd492f1ba75
> +        .quad 0xc08624c44aacab08, 0xbe1cf1253e154466
> +        .quad 0xc08624c790dd6ad0, 0xbe1cf0fb09ee6d55
> +        .quad 0xc08624cad5b7aa58, 0xbe1cf1f08dd048fe
> +        .quad 0xc08624ce193c8120, 0xbe1ceeca0809697f
> +        .quad 0xc08624d15b6d0538, 0xbe1cef8d5662d968
> +        .quad 0xc08624d49c4a4b78, 0xbe1cee97b556ed78
> +        .quad 0xc08624d7dbd56750, 0xbe1cf1b14b6acb75
> +        .quad 0xc08624db1a0f6b00, 0xbe1cef1e860623f2
> +        .quad 0xc08624de56f96758, 0xbe1ceeaf4d156f3d
> +        .quad 0xc08624e192946bf0, 0xbe1ceecc12b400ed
> +        .quad 0xc08624e4cce18710, 0xbe1cf180c40c794f
> +        .quad 0xc08624e805e1c5c8, 0xbe1cf185a08f7f65
> +        .quad 0xc08624eb3d9633d8, 0xbe1cef45fc924078
> +        .quad 0xc08624ee73ffdbb0, 0xbe1cf1e4f457f32a
> +        .quad 0xc08624f1a91fc6a0, 0xbe1cf040147b8a5a
> +        .quad 0xc08624f4dcf6fc98, 0xbe1cf1effca0dfb2
> +        .quad 0xc08624f80f868468, 0xbe1cf0470146e5bc
> +        .quad 0xc08624fb40cf6390, 0xbe1cef4dd186e501
> +        .quad 0xc08624fe70d29e60, 0xbe1ceebe257f66c7
> +        .quad 0xc08625019f9137f0, 0xbe1ceefb7a1c395c
> +        .quad 0xc0862504cd0c3220, 0xbe1cf209dedfed8c
> +        .quad 0xc0862507f9448db0, 0xbe1cf082da464994
> +        .quad 0xc086250b243b4a18, 0xbe1cee88694a73cf
> +        .quad 0xc086250e4df165a0, 0xbe1cf0b61e8f0531
> +        .quad 0xc08625117667dd78, 0xbe1cf1106599c962
> +        .quad 0xc08625149d9fad98, 0xbe1ceff1ee88af1f
> +        .quad 0xc0862517c399d0c8, 0xbe1cf0f746994ef6
> +        .quad 0xc086251ae85740b8, 0xbe1cefe8a1d077e4
> +        .quad 0xc086251e0bd8f5e0, 0xbe1cf1a1da036092
> +        .quad 0xc08625212e1fe7a8, 0xbe1cf0f8a7786fcd
> +        .quad 0xc08625244f2d0c48, 0xbe1cefa1174a07a7
> +        .quad 0xc08625276f0158d8, 0xbe1cef1043aa5b25
> +        .quad 0xc086252a8d9dc150, 0xbe1cf15d521c169d
> +        .quad 0xc086252dab033898, 0xbe1cf220bba8861f
> +        .quad 0xc0862530c732b078, 0xbe1cef51e310eae2
> +        .quad 0xc0862533e22d1988, 0xbe1cf222fcedd8ae
> +        .quad 0xc0862536fbf36370, 0xbe1cefdb4da4bda8
> +        .quad 0xc086253a14867ca0, 0xbe1ceeafc1112171
> +        .quad 0xc086253d2be75280, 0xbe1cee99dfb4b408
> +        .quad 0xc08625404216d160, 0xbe1cf22d2536f06b
> +        .quad 0xc08625435715e498, 0xbe1cef6abbf2e268
> +        .quad 0xc08625466ae57648, 0xbe1cf093a14789f5
> +        .quad 0xc08625497d866fa0, 0xbe1cf0f93655603c
> +        .quad 0xc086254c8ef9b8b8, 0xbe1cf1cc40c9aafc
> +        .quad 0xc086254f9f4038a8, 0xbe1ceeea5f4e9157
> +        .quad 0xc0862552ae5ad568, 0xbe1cefa9f52d4997
> +        .quad 0xc0862555bc4a7400, 0xbe1cefa490a638ff
> +        .quad 0xc0862558c90ff868, 0xbe1cef7fcf797d6f
> +        .quad 0xc086255bd4ac4590, 0xbe1cf1b4c51113c9
> +        .quad 0xc086255edf203d78, 0xbe1cef55e5b4a55d
> +        .quad 0xc0862561e86cc100, 0xbe1cf0d37a25f9dc
> +        .quad 0xc0862564f092b028, 0xbe1ceebe9efc19d9
> +        .quad 0xc0862567f792e9d8, 0xbe1cee8ad30a57b5
> +        .quad 0xc086256afd6e4c08, 0xbe1cef4e1817b90b
> +        .quad 0xc086256e0225b3b8, 0xbe1cee7fa9229996
> +        .quad 0xc086257105b9fce0, 0xbe1cf0b54963d945
> +        .quad 0xc0862574082c0298, 0xbe1cee5f2f3c7995
> +        .quad 0xc0862577097c9ee0, 0xbe1cf0828e303a2c
> +        .quad 0xc086257a09acaae0, 0xbe1cf172c3078947
> +        .quad 0xc086257d08bcfec0, 0xbe1cf189252afa22
> +        .quad 0xc086258006ae71b8, 0xbe1cefdb80426923
> +        .quad 0xc08625830381da08, 0xbe1ceef1391a0372
> +        .quad 0xc0862585ff380d00, 0xbe1cf17720c78d13
> +        .quad 0xc0862588f9d1df18, 0xbe1ceef1f9027d83
> +        .quad 0xc086258bf35023b8, 0xbe1cf06fac99dec9
> +        .quad 0xc086258eebb3ad78, 0xbe1cf1373eeb45c0
> +        .quad 0xc0862591e2fd4e00, 0xbe1cef777536bb81
> +        .quad 0xc0862594d92dd600, 0xbe1cf0f43ca40766
> +        .quad 0xc0862597ce461558, 0xbe1cefb2cfc6766b
> +        .quad 0xc086259ac246daf0, 0xbe1ceea49e64ffa2
> +        .quad 0xc086259db530f4c8, 0xbe1cf250fa457dec
> +        .quad 0xc08625a0a7053018, 0xbe1cf17d8bb2a44e
> +        .quad 0xc08625a397c45918, 0xbe1cf1d5906d54b7
> +        .quad 0xc08625a6876f3b30, 0xbe1cf08fe7b31780
> +        .quad 0xc08625a97606a0e0, 0xbe1cef13edfc9d11
> +        .quad 0xc08625ac638b53c8, 0xbe1cef9d2b107219
> +        .quad 0xc08625af4ffe1cb0, 0xbe1cf1ddd4ff6160
> +        .quad 0xc08625b23b5fc390, 0xbe1cefa02a996495
> +        .quad 0xc08625b525b10f68, 0xbe1cf166a7e37ee5
> +        .quad 0xc08625b80ef2c680, 0xbe1cef0b171068a5
> +        .quad 0xc08625baf725ae28, 0xbe1cf05c80779283
> +        .quad 0xc08625bdde4a8af0, 0xbe1cf1bbfbffb889
> +        .quad 0xc08625c0c4622090, 0xbe1cf0b8666c0124
> +        .quad 0xc08625c3a96d31e0, 0xbe1cf0a8fcf47a86
> +        .quad 0xc08625c68d6c80f0, 0xbe1cef46e18cb092
> +        .quad 0xc08625c97060cef0, 0xbe1cf1458a350efb
> +        .quad 0xc08625cc524adc58, 0xbe1ceeea1dadce12
> +        .quad 0xc08625cf332b68b0, 0xbe1cf0a1bfdc44c7
> +        .quad 0xc08625d2130332d0, 0xbe1cef96d02da73e
> +        .quad 0xc08625d4f1d2f8a8, 0xbe1cf2451c3c7701
> +        .quad 0xc08625d7cf9b7778, 0xbe1cf10d08f83812
> +        .quad 0xc08625daac5d6ba0, 0xbe1ceec5b4895c5e
> +        .quad 0xc08625dd881990b0, 0xbe1cf14e1325c5e4
> +        .quad 0xc08625e062d0a188, 0xbe1cf21d0904be12
> +        .quad 0xc08625e33c835838, 0xbe1ceed0839bcf21
> +        .quad 0xc08625e615326df0, 0xbe1cf1bb944889d2
> +        .quad 0xc08625e8ecde9b48, 0xbe1cee738e85eece
> +        .quad 0xc08625ebc38897e0, 0xbe1cf25c2bc6ef12
> +        .quad 0xc08625ee99311ac8, 0xbe1cf132b70a41ad
> +        .quad 0xc08625f16dd8da28, 0xbe1cf1984236a6e3
> +        .quad 0xc08625f441808b78, 0xbe1cf19ae74998f9
> +        .quad 0xc08625f71428e370, 0xbe1cef3e175d61a1
> +        .quad 0xc08625f9e5d295f8, 0xbe1cf101f9868fd9
> +        .quad 0xc08625fcb67e5658, 0xbe1cee69db83dcd2
> +        .quad 0xc08625ff862cd6f8, 0xbe1cf081b636af51
> +        .quad 0xc086260254dec9a8, 0xbe1cee62c7d59b3e
> +        .quad 0xc08626052294df58, 0xbe1cf1b745c57716
> +        .quad 0xc0862607ef4fc868, 0xbe1cef3d2800ea23
> +        .quad 0xc086260abb103458, 0xbe1cef480ff1acd2
> +        .quad 0xc086260d85d6d200, 0xbe1cf2424c9a17ef
> +        .quad 0xc08626104fa44f90, 0xbe1cf12cfde90fd5
> +        .quad 0xc086261318795a68, 0xbe1cf21f590dd5b6
> +        .quad 0xc0862615e0569f48, 0xbe1cf0c50f9cd28a
> +        .quad 0xc0862618a73cca30, 0xbe1ceedbdb520545
> +        .quad 0xc086261b6d2c8668, 0xbe1cf0b030396011
> +        .quad 0xc086261e32267e98, 0xbe1cf19917010e96
> +        .quad 0xc0862620f62b5cb0, 0xbe1cf07331355985
> +        .quad 0xc0862623b93bc9e8, 0xbe1cf01ae921a1c3
> +        .quad 0xc08626267b586ed0, 0xbe1cefe5cf0dbf0c
> +        .quad 0xc08626293c81f348, 0xbe1cf01b258aeb50
> +        .quad 0xc086262bfcb8fe88, 0xbe1cee6b9e7f4c68
> +        .quad 0xc086262ebbfe3710, 0xbe1cee684a9b21c9
> +        .quad 0xc08626317a5242b8, 0xbe1cf1f8bcde9a8b
> +        .quad 0xc086263437b5c6c0, 0xbe1cf1d063d36238
> +        .quad 0xc0862636f42967a8, 0xbe1cf1e31a19075e
> +        .quad 0xc0862639afadc950, 0xbe1cf1d8efdf7e7d
> +        .quad 0xc086263c6a438ef0, 0xbe1cf1812ee72dba
> +        .quad 0xc086263f23eb5b18, 0xbe1cf1449a9a2279
> +        .quad 0xc0862641dca5cfb8, 0xbe1cee96edce5085
> +        .quad 0xc086264494738e08, 0xbe1cf06797bd03b2
> +        .quad 0xc08626474b5536b8, 0xbe1cef91b9b7ffc1
> +        .quad 0xc086264a014b69c0, 0xbe1cef4b6721278f
> +        .quad 0xc086264cb656c678, 0xbe1cf1942925eb4a
> +        .quad 0xc086264f6a77eba8, 0xbe1cefa2c7bc2e39
> +        .quad 0xc08626521daf7758, 0xbe1cf252595aceb3
> +        .quad 0xc0862654cffe0718, 0xbe1cee8e9ae47ec2
> +        .quad 0xc0862657816437a8, 0xbe1cf1bf913828fa
> +        .quad 0xc086265a31e2a558, 0xbe1cf23475d6b366
> +        .quad 0xc086265ce179ebc8, 0xbe1cef8df00a922b
> +        .quad 0xc086265f902aa5f0, 0xbe1cef279bfa43e0
> +        .quad 0xc08626623df56e38, 0xbe1cf080e10b8365
> +        .quad 0xc0862664eadade70, 0xbe1cf1a518f9b544
> +        .quad 0xc086266796db8fd0, 0xbe1cef9308fed9e9
> +        .quad 0xc086266a41f81ae8, 0xbe1ceea3ae6b19c9
> +        .quad 0xc086266cec3117b8, 0xbe1ceef06003d4c2
> +        .quad 0xc086266f95871da8, 0xbe1cf0b8457ffb0c
> +        .quad 0xc08626723dfac390, 0xbe1cf0c526745ad6
> +        .quad 0xc0862674e58c9fa8, 0xbe1cf0cf91ff7b5d
> +        .quad 0xc08626778c3d4798, 0xbe1cefe260819380
> +        .quad 0xc086267a320d5070, 0xbe1ceebd90aa27a3
> +        .quad 0xc086267cd6fd4ea8, 0xbe1cf0388121dffa
> +        .quad 0xc086267f7b0dd630, 0xbe1cf1a3881435f1
> +        .quad 0xc08626821e3f7a68, 0xbe1cef28e9d9ac52
> +        .quad 0xc0862684c092ce08, 0xbe1cf02d300062dd
> +        .quad 0xc086268762086350, 0xbe1cefaee1edfa35
> +        .quad 0xc086268a02a0cbe0, 0xbe1cf0a5a052e936
> +        .quad 0xc086268ca25c98d8, 0xbe1cee60a4a497ed
> +        .quad 0xc086268f413c5ab0, 0xbe1cf0e4a5d0cf49
> +        .quad 0xc0862691df40a170, 0xbe1cf149235a4e6e
> +        .quad 0xc08626947c69fc80, 0xbe1cf215180b9fcc
> +        .quad 0xc086269718b8fac8, 0xbe1cef9b156a9840
> +        .quad 0xc0862699b42e2a90, 0xbe1cf054c91441be
> +        .quad 0xc086269c4eca19a8, 0xbe1cf13ded26512c
> +        .quad 0xc086269ee88d5550, 0xbe1cf22ea4d8ac06
> +        .quad 0xc08626a181786a40, 0xbe1cf2354666ee2e
> +        .quad 0xc08626a4198be4a8, 0xbe1cefef936752b3
> +        .quad 0xc08626a6b0c85020, 0xbe1cf1e360a9db68
> +        .quad 0xc08626a9472e37d8, 0xbe1ceed6aeb812c5
> +        .quad 0xc08626abdcbe2650, 0xbe1cf227340b4986
> +        .quad 0xc08626ae7178a5b0, 0xbe1cf0215a0cbe0d
> +        .quad 0xc08626b1055e3f70, 0xbe1cf256adf0ae26
> +        .quad 0xc08626b3986f7ca8, 0xbe1ceff3c67aed06
> +        .quad 0xc08626b62aace5c8, 0xbe1cf2159fb93652
> +        .quad 0xc08626b8bc1702e0, 0xbe1cf01e6dbd1c7f
> +        .quad 0xc08626bb4cae5b60, 0xbe1cf009e75d1c0c
> +        .quad 0xc08626bddc737648, 0xbe1ceec10a020e73
> +        .quad 0xc08626c06b66da08, 0xbe1cf06d5783eee7
> +        .quad 0xc08626c2f9890ca0, 0xbe1cf0cb8f169ffe
> +        .quad 0xc08626c586da9388, 0xbe1cef7de2452430
> +        .quad 0xc08626c8135bf3b0, 0xbe1cf05da6f783ae
> +        .quad 0xc08626ca9f0db198, 0xbe1cefcc877d681d
> +        .quad 0xc08626cd29f05138, 0xbe1cef0531954ab3
> +        .quad 0xc08626cfb4045608, 0xbe1cf06b8565ea3d
> +        .quad 0xc08626d23d4a4310, 0xbe1cefdc455d9d7e
> +        .quad 0xc08626d4c5c29ad0, 0xbe1ceefc47e8fa64
> +        .quad 0xc08626d74d6ddf48, 0xbe1cf1872bf033f2
> +        .quad 0xc08626d9d44c9210, 0xbe1cf19d91087f9d
> +        .quad 0xc08626dc5a5f3438, 0xbe1cf012d444c6ab
> +        .quad 0xc08626dedfa64650, 0xbe1cf0ba528ee153
> +        .quad 0xc08626e164224880, 0xbe1ceeb431709788
> +        .quad 0xc08626e3e7d3ba60, 0xbe1cf0b9af31a6a5
> +        .quad 0xc08626e66abb1b28, 0xbe1cf168fb2e135b
> +        .quad 0xc08626e8ecd8e990, 0xbe1cef9097461c93
> +        .quad 0xc08626eb6e2da3d0, 0xbe1cee7a434735d8
> +        .quad 0xc08626edeeb9c7a8, 0xbe1cf235732b86f2
> +        .quad 0xc08626f06e7dd280, 0xbe1cefe1510b89e6
> +        .quad 0xc08626f2ed7a4120, 0xbe1cf1f64b9b80ef
> +        .quad 0xc08626f56baf9000, 0xbe1cf08f320ca339
> +        .quad 0xc08626f7e91e3b08, 0xbe1cf1b1de2808a1
> +        .quad 0xc08626fa65c6bdc0, 0xbe1cf1976d778b28
> +        .quad 0xc08626fce1a99338, 0xbe1ceef40a4f076f
> +        .quad 0xc08626ff5cc73600, 0xbe1cef3e45869ce3
> +        .quad 0xc0862701d7202048, 0xbe1ceef601b4c9d6
> +        .quad 0xc086270450b4cbc0, 0xbe1cf1eaf0b57fd6
> +        .quad 0xc0862706c985b1c0, 0xbe1cef82a44990f3
> +        .quad 0xc086270941934b10, 0xbe1ceefe32981f2c
> +        .quad 0xc086270bb8de1018, 0xbe1cefbf6f5a0445
> +        .quad 0xc086270e2f6678d0, 0xbe1cf18dba75792c
> +        .quad 0xc0862710a52cfcc8, 0xbe1cf0da64ce995f
> +        .quad 0xc08627131a321318, 0xbe1cef04ac0fb802
> +        .quad 0xc08627158e763268, 0xbe1cee9d4e2ad9bd
> +        .quad 0xc086271801f9d0f8, 0xbe1cefa9b55407b5
> +        .quad 0xc086271a74bd64a0, 0xbe1cefe6bd329570
> +        .quad 0xc086271ce6c162c8, 0xbe1cef0b1205dc85
> +        .quad 0xc086271f58064068, 0xbe1cef092a785e3f
> +        .quad 0xc0862721c88c7210, 0xbe1cf050dcdaac30
> +        .quad 0xc086272438546be8, 0xbe1cf210907ded8b
> +        .quad 0xc0862726a75ea1b8, 0xbe1cee760be44f99
> +        .quad 0xc086272915ab86c0, 0xbe1ceeeee07c2bcc
> +        .quad 0xc086272b833b8df0, 0xbe1cf06874992df5
> +        .quad 0xc086272df00f29d0, 0xbe1cef8fac5d4899
> +        .quad 0xc08627305c26cc70, 0xbe1cf1103241cc99
> +        .quad 0xc0862732c782e788, 0xbe1cf1d35fef83fe
> +        .quad 0xc08627353223ec68, 0xbe1cef3ec8133e1d
> +        .quad 0xc08627379c0a4be8, 0xbe1cef7261daccd8
> +        .quad 0xc086273a05367688, 0xbe1cf18656c50806
> +        .quad 0xc086273c6da8dc68, 0xbe1cf1c8736e049a
> +        .quad 0xc086273ed561ed38, 0xbe1cf1f93bff4911
> +        .quad 0xc08627413c621848, 0xbe1cf188a4ea680c
> +        .quad 0xc0862743a2a9cc80, 0xbe1cf1d270930c80
> +        .quad 0xc086274608397868, 0xbe1cf25a328c28e2
> +        .quad 0xc08627486d118a28, 0xbe1cf106f90aa3b8
> +        .quad 0xc086274ad1326f80, 0xbe1cee5e9d2e885a
> +        .quad 0xc086274d349c95c0, 0xbe1cf1c0bac27228
> +        .quad 0xc086274f975069f8, 0xbe1cf1a1500f9b1c
> +        .quad 0xc0862751f94e58c0, 0xbe1cefc30663ac44
> +        .quad 0xc08627545a96ce48, 0xbe1cf17123e427a2
> +        .quad 0xc0862756bb2a3678, 0xbe1cefb92749fea4
> +        .quad 0xc08627591b08fcc0, 0xbe1cefa40e1ea74a
> +        .quad 0xc086275b7a338c40, 0xbe1cee6f4612c3e9
> +        .quad 0xc086275dd8aa4fa8, 0xbe1cf1c54a053627
> +        .quad 0xc0862760366db168, 0xbe1ceff5eb503d9e
> +        .quad 0xc0862762937e1b70, 0xbe1cf02e47f10cee
> +        .quad 0xc0862764efdbf768, 0xbe1ceeb06e1d0dad
> +        .quad 0xc08627674b87ae88, 0xbe1cf10aadd6dba5
> +        .quad 0xc0862769a681a9c0, 0xbe1cf24e9913d30f
> +        .quad 0xc086276c00ca51a0, 0xbe1cef47b301e312
> +        .quad 0xc086276e5a620e48, 0xbe1ceeb1cefc2e85
> +        .quad 0xc0862770b3494788, 0xbe1cf16f1fbbe011
> +        .quad 0xc08627730b8064e8, 0xbe1ceebdf75174c7
> +        .quad 0xc08627756307cd70, 0xbe1cf06e3871a0da
> +        .quad 0xc0862777b9dfe7f0, 0xbe1cef16799fd554
> +        .quad 0xc086277a10091ac0, 0xbe1cf248dabf5377
> +        .quad 0xc086277c6583cc00, 0xbe1cf0c78d92a2cd
> +        .quad 0xc086277eba506158, 0xbe1cf0b911b029f0
> +        .quad 0xc08627810e6f4028, 0xbe1cefdc24719766
> +        .quad 0xc086278361e0cd70, 0xbe1cefbb6562b7e7
> +        .quad 0xc0862785b4a56dd8, 0xbe1cf1e0afb349ec
> +        .quad 0xc086278806bd85c0, 0xbe1cf008292e52fc
> +        .quad 0xc086278a58297918, 0xbe1cf053073872bf
> +        .quad 0xc086278ca8e9ab88, 0xbe1cf17a0a55a947
> +        .quad 0xc086278ef8fe8068, 0xbe1ceeffb0b60234
> +        .quad 0xc086279148685aa0, 0xbe1cf162204794a8
> +        .quad 0xc086279397279ce0, 0xbe1cf24cc8cb48ac
> +        .quad 0xc0862795e53ca978, 0xbe1cf0c9be68d5c3
> +        .quad 0xc086279832a7e258, 0xbe1cf172cd3d7388
> +        .quad 0xc086279a7f69a930, 0xbe1ceea2465fbce5
> +        .quad 0xc086279ccb825f40, 0xbe1cf0a386d2500f
> +        .quad 0xc086279f16f26590, 0xbe1cf1e338ddc18a
> +        .quad 0xc08627a161ba1cd0, 0xbe1cef1f5049867f
> +        .quad 0xc08627a3abd9e548, 0xbe1cef96c1ea8b1f
> +        .quad 0xc08627a5f5521f00, 0xbe1cf138f6fd3c26
> +        .quad 0xc08627a83e2329b0, 0xbe1cf0d4fcbfdf3a
> +        .quad 0xc08627aa864d64b0, 0xbe1cf24870c12c81
> +        .quad 0xc08627accdd12f18, 0xbe1cf0ae2a56348d
> +        .quad 0xc08627af14aee7a0, 0xbe1cee8ca1a9b893
> +        .quad 0xc08627b15ae6eca8, 0xbe1cf20414d637b0
> +        .quad 0xc08627b3a0799c60, 0xbe1cf0fc6b7b12d8
> +        .quad 0xc08627b5e5675488, 0xbe1cf152d93c4a00
> +        .quad 0xc08627b829b072a0, 0xbe1cf1073f9b77c2
> +        .quad 0xc08627ba6d5553d8, 0xbe1cee694f97d5a4
> +        .quad 0xc08627bcb0565500, 0xbe1cf0456b8239d7
> +        .quad 0xc08627bef2b3d2b0, 0xbe1cf211497127e3
> +        .quad 0xc08627c1346e2930, 0xbe1cf01856c0384d
> +        .quad 0xc08627c37585b468, 0xbe1cefa7dd05479e
> +        .quad 0xc08627c5b5fad000, 0xbe1cef3ae8e50b93
> +        .quad 0xc08627c7f5cdd750, 0xbe1ceea5f32fdd3a
> +        .quad 0xc08627ca34ff2560, 0xbe1cef424caeb8d9
> +        .quad 0xc08627cc738f14f0, 0xbe1cf0194d07a81f
> +        .quad 0xc08627ceb17e0070, 0xbe1cf20f452000c1
> +        .quad 0xc08627d0eecc4210, 0xbe1cf00e356218e4
> +        .quad 0xc08627d32b7a33a0, 0xbe1cef30484b4bcb
> +        .quad 0xc08627d567882eb0, 0xbe1ceeea11a6641b
> +        .quad 0xc08627d7a2f68c80, 0xbe1cf13492d5bd7b
> +        .quad 0xc08627d9ddc5a618, 0xbe1ceeb7048fad96
> +        .quad 0xc08627dc17f5d418, 0xbe1ceef0666f0477
> +        .quad 0xc08627de51876ee8, 0xbe1cf060d4b8b5c2
> +        .quad 0xc08627e08a7acea8, 0xbe1cf0b2a4b6ff8c
> +        .quad 0xc08627e2c2d04b28, 0xbe1cf0e34809a875
> +        .quad 0xc08627e4fa883bf0, 0xbe1cf16bf74a3522
> +        .quad 0xc08627e731a2f848, 0xbe1cee6a24623d57
> +        .quad 0xc08627e96820d718, 0xbe1cefc7b4f1528e
> +        .quad 0xc08627eb9e022f18, 0xbe1cf163051f3548
> +        .quad 0xc08627edd34756b8, 0xbe1cef36b3366305
> +        .quad 0xc08627f007f0a408, 0xbe1cf18134625550
> +        .quad 0xc08627f23bfe6cf0, 0xbe1cf0ec32ec1a11
> +        .quad 0xc08627f46f710700, 0xbe1ceeb3b64f3edc
> +        .quad 0xc08627f6a248c778, 0xbe1cf0cd15805bc8
> +        .quad 0xc08627f8d4860368, 0xbe1cf20db3bddebe
> +        .quad 0xc08627fb06290f90, 0xbe1cf25188430e25
> +        .quad 0xc08627fd37324070, 0xbe1ceea1713490f9
> +        .quad 0xc08627ff67a1ea28, 0xbe1cf159521d234c
> +        .quad 0xc0862801977860b8, 0xbe1cf24dfe50783b
> +        .quad 0xc0862803c6b5f7d0, 0xbe1ceef2ef89a60b
> +        .quad 0xc0862805f55b02c8, 0xbe1cee7fc919d62c
> +        .quad 0xc08628082367d4c0, 0xbe1cf215a7fb513a
> +        .quad 0xc086280a50dcc0a8, 0xbe1cf0e4401c5ed4
> +        .quad 0xc086280c7dba1910, 0xbe1cf04ec734d256
> +        .quad 0xc086280eaa003050, 0xbe1cf010ad787fea
> +        .quad 0xc0862810d5af5880, 0xbe1cee622478393d
> +        .quad 0xc086281300c7e368, 0xbe1cf01c7482564f
> +        .quad 0xc08628152b4a22a0, 0xbe1cf0de20d33536
> +        .quad 0xc086281755366778, 0xbe1cef2edae5837d
> +        .quad 0xc08628197e8d02f0, 0xbe1cf0a345318cc9
> +        .quad 0xc086281ba74e45d8, 0xbe1cf20085aa34b8
> +        .quad 0xc086281dcf7a80c0, 0xbe1cef5fa845ad83
> +        .quad 0xc086281ff71203e0, 0xbe1cf050d1df69c4
> +        .quad 0xc08628221e151f48, 0xbe1ceffe43c035b9
> +        .quad 0xc0862824448422b8, 0xbe1cf14f3018d3c2
> +        .quad 0xc08628266a5f5dc0, 0xbe1cef0a5fbae83d
> +        .quad 0xc08628288fa71f98, 0xbe1ceff8a95b72a1
> +        .quad 0xc086282ab45bb750, 0xbe1cef073aa9849b
> +        .quad 0xc086282cd87d73a8, 0xbe1cef69b3835c02
> +        .quad 0xc086282efc0ca328, 0xbe1cf0bc139379a9
> +        .quad 0xc08628311f099420, 0xbe1cef247a9ec596
> +        .quad 0xc086283341749490, 0xbe1cef74bbcc488a
> +        .quad 0xc0862835634df248, 0xbe1cef4bc42e7b8e
> +        .quad 0xc08628378495fad0, 0xbe1cf136d4d5a810
> +        .quad 0xc0862839a54cfb80, 0xbe1cf0d290b24dd8
> +        .quad 0xc086283bc5734168, 0xbe1ceeebde8e0065
> +        .quad 0xc086283de5091950, 0xbe1cf1a09f60aa1e
> +        .quad 0xc0862840040ecfe0, 0xbe1cf0803947a234
> +        .quad 0xc08628422284b168, 0xbe1cf0abf7638127
> +        .quad 0xc0862844406b0a08, 0xbe1cf0f73ee12058
> +        .quad 0xc08628465dc225a0, 0xbe1cf2079971b26c
> +        .quad 0xc08628487a8a4fe0, 0xbe1cee74957564b1
> +        .quad 0xc086284a96c3d420, 0xbe1ceee77c1b7d43
> +        .quad 0xc086284cb26efd90, 0xbe1cf23addba6e09
> +        .quad 0xc086284ecd8c1730, 0xbe1cf199f4a1da60
> +        .quad 0xc0862850e81b6bb0, 0xbe1cf09fdea81393
> +        .quad 0xc0862853021d4588, 0xbe1cf176adb417f7
> +        .quad 0xc08628551b91ef00, 0xbe1cf0f64f84a8da
> +        .quad 0xc08628573479b220, 0xbe1ceec34cf49523
> +        .quad 0xc08628594cd4d8a8, 0xbe1cf16d60fbe0bb
> +        .quad 0xc086285b64a3ac40, 0xbe1cee8de7acfc7b
> +        .quad 0xc086285d7be67630, 0xbe1ceee6256cce8d
> +        .quad 0xc086285f929d7fa0, 0xbe1cee7d66a3d8a5
> +        .quad 0xc0862861a8c91170, 0xbe1cf0bef8265792
> +        .quad 0xc0862863be697458, 0xbe1cf097f890c6f8
> +        .quad 0xc0862865d37ef0c8, 0xbe1cf09502d5c3fc
> +        .quad 0xc0862867e809cf00, 0xbe1ceeffb239dac7
> +        .quad 0xc0862869fc0a56f8, 0xbe1cf1fbfff95c98
> +        .quad 0xc086286c0f80d090, 0xbe1cefa57ad3eef7
> +        .quad 0xc086286e226d8348, 0xbe1cf22c58b9183d
> +        .quad 0xc086287034d0b690, 0xbe1ceff262d0a248
> +        .quad 0xc086287246aab180, 0xbe1cefa7bc194186
> +        .quad 0xc086287457fbbb08, 0xbe1cf06782d784d9
> +        .quad 0xc086287668c419e0, 0xbe1cf1d44d0eaa07
> +        .quad 0xc086287879041490, 0xbe1cf034803c8a48
> +        .quad 0xc086287a88bbf158, 0xbe1cf08e84916b6f
> +        .quad 0xc086287c97ebf650, 0xbe1cf0c4d3dc1bc7
> +        .quad 0xc086287ea6946958, 0xbe1cefb1e4625943
> +        .quad 0xc0862880b4b59010, 0xbe1cf143efdd1fd0
> +        .quad 0xc0862882c24faff8, 0xbe1cee9896d016da
> +        .quad 0xc0862884cf630e38, 0xbe1cf2186072f2cc
> +        .quad 0xc0862886dbefeff0, 0xbe1cef9217633d34
> +        .quad 0xc0862888e7f699e0, 0xbe1cf05603549486
> +        .quad 0xc086288af37750b0, 0xbe1cef50fff513d3
> +        .quad 0xc086288cfe7258c0, 0xbe1cf127713b32d0
> +        .quad 0xc086288f08e7f650, 0xbe1cf05015520f3d
> +        .quad 0xc086289112d86d58, 0xbe1cf12eb458b26f
> +        .quad 0xc08628931c4401a8, 0xbe1cf22eae2887ed
> +        .quad 0xc0862895252af6e0, 0xbe1cefdd6656dd2d
> +        .quad 0xc08628972d8d9058, 0xbe1cf1048ea4e646
> +        .quad 0xc0862899356c1150, 0xbe1ceec4501167e9
> +        .quad 0xc086289b3cc6bcb8, 0xbe1cf0ad52becc3f
> +        .quad 0xc086289d439dd568, 0xbe1cf0daa4e00e35
> +        .quad 0xc086289f49f19df8, 0xbe1cf00b80de8d6a
> +        .quad 0xc08628a14fc258c8, 0xbe1cf1bcf2ea8464
> +        .quad 0xc08628a355104818, 0xbe1cf0435e2782b0
> +        .quad 0xc08628a559dbade0, 0xbe1cf0e3e1a5f56c
> +        .quad 0xc08628a75e24cbf8, 0xbe1cefed9d5a721d
> +        .quad 0xc08628a961ebe3f8, 0xbe1cf0d2d74321e2
> +        .quad 0xc08628ab65313750, 0xbe1cf24200eb55e9
> +        .quad 0xc08628ad67f50740, 0xbe1cf23e9d7cf979
> +        .quad 0xc08628af6a3794d0, 0xbe1cf23a088f421c
> +        .quad 0xc08628b16bf920e0, 0xbe1cef2c1de1ab32
> +        .quad 0xc08628b36d39ec08, 0xbe1cf1abc231f7b2
> +        .quad 0xc08628b56dfa36d0, 0xbe1cf2074d5ba303
> +        .quad 0xc08628b76e3a4180, 0xbe1cf05cd5eed880
> +        /*== Log_LA_table ==*/
> +        .align 32
> +        .quad 0x8000000000000000
> +        .quad 0xbf5ff802a9ab10e6
> +        .quad 0xbf6ff00aa2b10bc0
> +        .quad 0xbf77ee11ebd82e94
> +        .quad 0xbf7fe02a6b106789
> +        .quad 0xbf83e7295d25a7d9
> +        .quad 0xbf87dc475f810a77
> +        .quad 0xbf8bcf712c74384c
> +        .quad 0xbf8fc0a8b0fc03e4
> +        .quad 0xbf91d7f7eb9eebe7
> +        .quad 0xbf93cea44346a575
> +        .quad 0xbf95c45a51b8d389
> +        .quad 0xbf97b91b07d5b11b
> +        .quad 0xbf99ace7551cc514
> +        .quad 0xbf9b9fc027af9198
> +        .quad 0xbf9d91a66c543cc4
> +        .quad 0xbf9f829b0e783300
> +        .quad 0xbfa0b94f7c196176
> +        .quad 0xbfa1b0d98923d980
> +        .quad 0xbfa2a7ec2214e873
> +        .quad 0xbfa39e87b9febd60
> +        .quad 0xbfa494acc34d911c
> +        .quad 0xbfa58a5bafc8e4d5
> +        .quad 0xbfa67f94f094bd98
> +        .quad 0xbfa77458f632dcfc
> +        .quad 0xbfa868a83083f6cf
> +        .quad 0xbfa95c830ec8e3eb
> +        .quad 0xbfaa4fe9ffa3d235
> +        .quad 0xbfab42dd711971bf
> +        .quad 0xbfac355dd0921f2d
> +        .quad 0xbfad276b8adb0b52
> +        .quad 0xbfae19070c276016
> +        .quad 0xbfaf0a30c01162a6
> +        .quad 0xbfaffae9119b9303
> +        .quad 0xbfb075983598e471
> +        .quad 0xbfb0ed839b5526fe
> +        .quad 0xbfb16536eea37ae1
> +        .quad 0xbfb1dcb263db1944
> +        .quad 0xbfb253f62f0a1417
> +        .quad 0xbfb2cb0283f5de1f
> +        .quad 0xbfb341d7961bd1d1
> +        .quad 0xbfb3b87598b1b6ee
> +        .quad 0xbfb42edcbea646f0
> +        .quad 0xbfb4a50d3aa1b040
> +        .quad 0xbfb51b073f06183f
> +        .quad 0xbfb590cafdf01c28
> +        .quad 0xbfb60658a93750c4
> +        .quad 0xbfb67bb0726ec0fc
> +        .quad 0xbfb6f0d28ae56b4c
> +        .quad 0xbfb765bf23a6be13
> +        .quad 0xbfb7da766d7b12cd
> +        .quad 0xbfb84ef898e8282a
> +        .quad 0xbfb8c345d6319b21
> +        .quad 0xbfb9375e55595ede
> +        .quad 0xbfb9ab42462033ad
> +        .quad 0xbfba1ef1d8061cd4
> +        .quad 0xbfba926d3a4ad563
> +        .quad 0xbfbb05b49bee43fe
> +        .quad 0xbfbb78c82bb0eda1
> +        .quad 0xbfbbeba818146765
> +        .quad 0xbfbc5e548f5bc743
> +        .quad 0xbfbcd0cdbf8c13e1
> +        .quad 0xbfbd4313d66cb35d
> +        .quad 0xbfbdb5270187d927
> +        .quad 0xbfbe27076e2af2e6
> +        .quad 0xbfbe98b549671467
> +        .quad 0xbfbf0a30c01162a6
> +        .quad 0xbfbf7b79fec37ddf
> +        .quad 0xbfbfec9131dbeabb
> +        .quad 0xbfc02ebb42bf3d4b
> +        .quad 0xbfc0671512ca596e
> +        .quad 0xbfc09f561ee719c3
> +        .quad 0xbfc0d77e7cd08e59
> +        .quad 0xbfc10f8e422539b1
> +        .quad 0xbfc14785846742ac
> +        .quad 0xbfc17f6458fca611
> +        .quad 0xbfc1b72ad52f67a0
> +        .quad 0xbfc1eed90e2dc2c3
> +        .quad 0xbfc2266f190a5acb
> +        .quad 0xbfc25ded0abc6ad2
> +        .quad 0xbfc29552f81ff523
> +        .quad 0xbfc2cca0f5f5f251
> +        .quad 0xbfc303d718e47fd3
> +        .quad 0xbfc33af575770e4f
> +        .quad 0xbfc371fc201e8f74
> +        .quad 0xbfc3a8eb2d31a376
> +        .quad 0xbfc3dfc2b0ecc62a
> +        .quad 0xbfc41682bf727bc0
> +        .quad 0xbfc44d2b6ccb7d1e
> +        .quad 0xbfc483bccce6e3dd
> +        .quad 0xbfc4ba36f39a55e5
> +        .quad 0xbfc4f099f4a230b2
> +        .quad 0xbfc526e5e3a1b438
> +        .quad 0xbfc55d1ad4232d6f
> +        .quad 0xbfc59338d9982086
> +        .quad 0xbfc5c940075972b9
> +        .quad 0xbfc5ff3070a793d4
> +        .quad 0xbfc6350a28aaa758
> +        .quad 0xbfc66acd4272ad51
> +        .quad 0xbfc6a079d0f7aad2
> +        .quad 0xbfc6d60fe719d21d
> +        .quad 0xbfc70b8f97a1aa75
> +        .quad 0xbfc740f8f54037a5
> +        .quad 0xbfc7764c128f2127
> +        .quad 0xbfc7ab890210d909
> +        .quad 0xbfc7e0afd630c274
> +        .quad 0xbfc815c0a14357eb
> +        .quad 0xbfc84abb75865139
> +        .quad 0xbfc87fa06520c911
> +        .quad 0xbfc8b46f8223625b
> +        .quad 0xbfc8e928de886d41
> +        .quad 0xbfc91dcc8c340bde
> +        .quad 0xbfc9525a9cf456b4
> +        .quad 0xbfc986d3228180ca
> +        .quad 0xbfc9bb362e7dfb83
> +        .quad 0xbfc9ef83d2769a34
> +        .quad 0xbfca23bc1fe2b563
> +        .quad 0xbfca57df28244dcd
> +        .quad 0xbfca8becfc882f19
> +        .quad 0xbfcabfe5ae46124c
> +        .quad 0xbfcaf3c94e80bff3
> +        .quad 0xbfcb2797ee46320c
> +        .quad 0xbfcb5b519e8fb5a4
> +        .quad 0xbfcb8ef670420c3b
> +        .quad 0xbfcbc286742d8cd6
> +        .quad 0xbfcbf601bb0e44e2
> +        .quad 0xbfcc2968558c18c1
> +        .quad 0xbfcc5cba543ae425
> +        .quad 0xbfcc8ff7c79a9a22
> +        .quad 0xbfccc320c0176502
> +        .quad 0xbfccf6354e09c5dc
> +        .quad 0xbfcd293581b6b3e7
> +        .quad 0xbfcd5c216b4fbb91
> +        .quad 0xbfcd8ef91af31d5e
> +        .quad 0xbfcdc1bca0abec7d
> +        .quad 0xbfcdf46c0c722d2f
> +        .quad 0xbfce27076e2af2e6
> +        .quad 0xbfce598ed5a87e2f
> +        .quad 0xbfce8c0252aa5a60
> +        .quad 0xbfcebe61f4dd7b0b
> +        .quad 0xbfcef0adcbdc5936
> +        .quad 0xbfcf22e5e72f105d
> +        .quad 0xbfcf550a564b7b37
> +        .quad 0xbfcf871b28955045
> +        .quad 0xbfcfb9186d5e3e2b
> +        .quad 0xbfcfeb0233e607cc
> +        .quad 0xbfd00e6c45ad501d
> +        .quad 0xbfd0274dc16c232f
> +        .quad 0xbfd0402594b4d041
> +        .quad 0xbfd058f3c703ebc6
> +        .quad 0xbfd071b85fcd590d
> +        .quad 0xbfd08a73667c57af
> +        .quad 0xbfd0a324e27390e3
> +        .quad 0xbfd0bbccdb0d24bd
> +        .quad 0xbfd0d46b579ab74b
> +        .quad 0xbfd0ed005f657da4
> +        .quad 0xbfd1058bf9ae4ad5
> +        .quad 0xbfd11e0e2dad9cb7
> +        .quad 0xbfd136870293a8b0
> +        .quad 0xbfd14ef67f88685a
> +        .quad 0xbfd1675cababa60e
> +        .quad 0xbfd17fb98e15095d
> +        .quad 0xbfd1980d2dd4236f
> +        .quad 0xbfd1b05791f07b49
> +        .quad 0xbfd1c898c16999fb
> +        .quad 0xbfd1e0d0c33716be
> +        .quad 0xbfd1f8ff9e48a2f3
> +        .quad 0xbfd211255986160c
> +        .quad 0xbfd22941fbcf7966
> +        .quad 0xbfd241558bfd1404
> +        .quad 0xbfd2596010df763a
> +        .quad 0xbfd27161913f853d
> +        .quad 0xbfd2895a13de86a3
> +        .quad 0xbfd2a1499f762bc9
> +        .quad 0xbfd2b9303ab89d25
> +        .quad 0xbfd2d10dec508583
> +        .quad 0xbfd2e8e2bae11d31
> +        .quad 0xbfd300aead06350c
> +        .quad 0xbfd31871c9544185
> +        .quad 0xbfd3302c16586588
> +        .quad 0xbfd347dd9a987d55
> +        .quad 0xbfd35f865c93293e
> +        .quad 0xbfd3772662bfd85b
> +        .quad 0xbfd38ebdb38ed321
> +        .quad 0xbfd3a64c556945ea
> +        .quad 0xbfd3bdd24eb14b6a
> +        .quad 0xbfd3d54fa5c1f710
> +        .quad 0xbfd3ecc460ef5f50
> +        .quad 0xbfd404308686a7e4
> +        .quad 0xbfd41b941cce0bee
> +        .quad 0xbfd432ef2a04e814
> +        .quad 0xbfd44a41b463c47c
> +        .quad 0xbfd4618bc21c5ec2
> +        .quad 0xbfd478cd5959b3d9
> +        .quad 0xbfd49006804009d1
> +        .quad 0xbfd4a7373cecf997
> +        .quad 0xbfd4be5f957778a1
> +        .quad 0xbfd4d57f8fefe27f
> +        .quad 0xbfd4ec973260026a
> +        .quad 0xbfd503a682cb1cb3
> +        .quad 0xbfd51aad872df82d
> +        .quad 0xbfd531ac457ee77e
> +        .quad 0xbfd548a2c3add263
> +        .quad 0xbfd55f9107a43ee2
> +        .quad 0xbfd5767717455a6c
> +        .quad 0xbfd58d54f86e02f2
> +        .quad 0xbfd5a42ab0f4cfe2
> +        .quad 0xbfd5baf846aa1b19
> +        .quad 0xbfd5d1bdbf5809ca
> +        .quad 0xbfd5e87b20c2954a
> +        .quad 0xbfd5ff3070a793d4
> +        .quad 0xbfd615ddb4bec13c
> +        .quad 0xbfd62c82f2b9c795
> +        .quad 0x3fd61965cdb02c1f
> +        .quad 0x3fd602d08af091ec
> +        .quad 0x3fd5ec433d5c35ae
> +        .quad 0x3fd5d5bddf595f30
> +        .quad 0x3fd5bf406b543db2
> +        .quad 0x3fd5a8cadbbedfa1
> +        .quad 0x3fd5925d2b112a59
> +        .quad 0x3fd57bf753c8d1fb
> +        .quad 0x3fd565995069514c
> +        .quad 0x3fd54f431b7be1a9
> +        .quad 0x3fd538f4af8f72fe
> +        .quad 0x3fd522ae0738a3d8
> +        .quad 0x3fd50c6f1d11b97c
> +        .quad 0x3fd4f637ebba9810
> +        .quad 0x3fd4e0086dd8baca
> +        .quad 0x3fd4c9e09e172c3c
> +        .quad 0x3fd4b3c077267e9a
> +        .quad 0x3fd49da7f3bcc41f
> +        .quad 0x3fd487970e958770
> +        .quad 0x3fd4718dc271c41b
> +        .quad 0x3fd45b8c0a17df13
> +        .quad 0x3fd44591e0539f49
> +        .quad 0x3fd42f9f3ff62642
> +        .quad 0x3fd419b423d5e8c7
> +        .quad 0x3fd403d086cea79c
> +        .quad 0x3fd3edf463c1683e
> +        .quad 0x3fd3d81fb5946dba
> +        .quad 0x3fd3c25277333184
> +        .quad 0x3fd3ac8ca38e5c5f
> +        .quad 0x3fd396ce359bbf54
> +        .quad 0x3fd3811728564cb2
> +        .quad 0x3fd36b6776be1117
> +        .quad 0x3fd355bf1bd82c8b
> +        .quad 0x3fd3401e12aecba1
> +        .quad 0x3fd32a84565120a8
> +        .quad 0x3fd314f1e1d35ce4
> +        .quad 0x3fd2ff66b04ea9d4
> +        .quad 0x3fd2e9e2bce12286
> +        .quad 0x3fd2d46602adccee
> +        .quad 0x3fd2bef07cdc9354
> +        .quad 0x3fd2a982269a3dbf
> +        .quad 0x3fd2941afb186b7c
> +        .quad 0x3fd27ebaf58d8c9d
> +        .quad 0x3fd269621134db92
> +        .quad 0x3fd25410494e56c7
> +        .quad 0x3fd23ec5991eba49
> +        .quad 0x3fd22981fbef797b
> +        .quad 0x3fd214456d0eb8d4
> +        .quad 0x3fd1ff0fe7cf47a7
> +        .quad 0x3fd1e9e1678899f4
> +        .quad 0x3fd1d4b9e796c245
> +        .quad 0x3fd1bf99635a6b95
> +        .quad 0x3fd1aa7fd638d33f
> +        .quad 0x3fd1956d3b9bc2fa
> +        .quad 0x3fd180618ef18adf
> +        .quad 0x3fd16b5ccbacfb73
> +        .quad 0x3fd1565eed455fc3
> +        .quad 0x3fd14167ef367783
> +        .quad 0x3fd12c77cd00713b
> +        .quad 0x3fd1178e8227e47c
> +        .quad 0x3fd102ac0a35cc1c
> +        .quad 0x3fd0edd060b78081
> +        .quad 0x3fd0d8fb813eb1ef
> +        .quad 0x3fd0c42d676162e3
> +        .quad 0x3fd0af660eb9e279
> +        .quad 0x3fd09aa572e6c6d4
> +        .quad 0x3fd085eb8f8ae797
> +        .quad 0x3fd07138604d5862
> +        .quad 0x3fd05c8be0d9635a
> +        .quad 0x3fd047e60cde83b8
> +        .quad 0x3fd03346e0106062
> +        .quad 0x3fd01eae5626c691
> +        .quad 0x3fd00a1c6adda473
> +        .quad 0x3fcfeb2233ea07cd
> +        .quad 0x3fcfc218be620a5e
> +        .quad 0x3fcf991c6cb3b379
> +        .quad 0x3fcf702d36777df0
> +        .quad 0x3fcf474b134df229
> +        .quad 0x3fcf1e75fadf9bde
> +        .quad 0x3fcef5ade4dcffe6
> +        .quad 0x3fceccf2c8fe920a
> +        .quad 0x3fcea4449f04aaf5
> +        .quad 0x3fce7ba35eb77e2a
> +        .quad 0x3fce530effe71012
> +        .quad 0x3fce2a877a6b2c12
> +        .quad 0x3fce020cc6235ab5
> +        .quad 0x3fcdd99edaf6d7e9
> +        .quad 0x3fcdb13db0d48940
> +        .quad 0x3fcd88e93fb2f450
> +        .quad 0x3fcd60a17f903515
> +        .quad 0x3fcd38666871f465
> +        .quad 0x3fcd1037f2655e7b
> +        .quad 0x3fcce816157f1988
> +        .quad 0x3fccc000c9db3c52
> +        .quad 0x3fcc97f8079d44ec
> +        .quad 0x3fcc6ffbc6f00f71
> +        .quad 0x3fcc480c0005ccd1
> +        .quad 0x3fcc2028ab17f9b4
> +        .quad 0x3fcbf851c067555f
> +        .quad 0x3fcbd087383bd8ad
> +        .quad 0x3fcba8c90ae4ad19
> +        .quad 0x3fcb811730b823d2
> +        .quad 0x3fcb5971a213acdb
> +        .quad 0x3fcb31d8575bce3d
> +        .quad 0x3fcb0a4b48fc1b46
> +        .quad 0x3fcae2ca6f672bd4
> +        .quad 0x3fcabb55c31693ad
> +        .quad 0x3fca93ed3c8ad9e3
> +        .quad 0x3fca6c90d44b704e
> +        .quad 0x3fca454082e6ab05
> +        .quad 0x3fca1dfc40f1b7f1
> +        .quad 0x3fc9f6c407089664
> +        .quad 0x3fc9cf97cdce0ec3
> +        .quad 0x3fc9a8778debaa38
> +        .quad 0x3fc981634011aa75
> +        .quad 0x3fc95a5adcf7017f
> +        .quad 0x3fc9335e5d594989
> +        .quad 0x3fc90c6db9fcbcd9
> +        .quad 0x3fc8e588ebac2dbf
> +        .quad 0x3fc8beafeb38fe8c
> +        .quad 0x3fc897e2b17b19a5
> +        .quad 0x3fc871213750e994
> +        .quad 0x3fc84a6b759f512f
> +        .quad 0x3fc823c16551a3c2
> +        .quad 0x3fc7fd22ff599d4f
> +        .quad 0x3fc7d6903caf5ad0
> +        .quad 0x3fc7b0091651528c
> +        .quad 0x3fc7898d85444c73
> +        .quad 0x3fc7631d82935a86
> +        .quad 0x3fc73cb9074fd14d
> +        .quad 0x3fc716600c914054
> +        .quad 0x3fc6f0128b756abc
> +        .quad 0x3fc6c9d07d203fc7
> +        .quad 0x3fc6a399dabbd383
> +        .quad 0x3fc67d6e9d785771
> +        .quad 0x3fc6574ebe8c133a
> +        .quad 0x3fc6313a37335d76
> +        .quad 0x3fc60b3100b09476
> +        .quad 0x3fc5e533144c1719
> +        .quad 0x3fc5bf406b543db2
> +        .quad 0x3fc59958ff1d52f1
> +        .quad 0x3fc5737cc9018cdd
> +        .quad 0x3fc54dabc26105d2
> +        .quad 0x3fc527e5e4a1b58d
> +        .quad 0x3fc5022b292f6a45
> +        .quad 0x3fc4dc7b897bc1c8
> +        .quad 0x3fc4b6d6fefe22a4
> +        .quad 0x3fc4913d8333b561
> +        .quad 0x3fc46baf0f9f5db7
> +        .quad 0x3fc4462b9dc9b3dc
> +        .quad 0x3fc420b32740fdd4
> +        .quad 0x3fc3fb45a59928cc
> +        .quad 0x3fc3d5e3126bc27f
> +        .quad 0x3fc3b08b6757f2a9
> +        .quad 0x3fc38b3e9e027479
> +        .quad 0x3fc365fcb0159016
> +        .quad 0x3fc340c59741142e
> +        .quad 0x3fc31b994d3a4f85
> +        .quad 0x3fc2f677cbbc0a96
> +        .quad 0x3fc2d1610c86813a
> +        .quad 0x3fc2ac55095f5c59
> +        .quad 0x3fc28753bc11aba5
> +        .quad 0x3fc2625d1e6ddf57
> +        .quad 0x3fc23d712a49c202
> +        .quad 0x3fc2188fd9807263
> +        .quad 0x3fc1f3b925f25d41
> +        .quad 0x3fc1ceed09853752
> +        .quad 0x3fc1aa2b7e23f72a
> +        .quad 0x3fc185747dbecf34
> +        .quad 0x3fc160c8024b27b1
> +        .quad 0x3fc13c2605c398c3
> +        .quad 0x3fc1178e8227e47c
> +        .quad 0x3fc0f301717cf0fb
> +        .quad 0x3fc0ce7ecdccc28d
> +        .quad 0x3fc0aa06912675d5
> +        .quad 0x3fc08598b59e3a07
> +        .quad 0x3fc06135354d4b18
> +        .quad 0x3fc03cdc0a51ec0d
> +        .quad 0x3fc0188d2ecf6140
> +        .quad 0x3fbfe89139dbd566
> +        .quad 0x3fbfa01c9db57ce2
> +        .quad 0x3fbf57bc7d9005db
> +        .quad 0x3fbf0f70cdd992e3
> +        .quad 0x3fbec739830a1120
> +        .quad 0x3fbe7f1691a32d3e
> +        .quad 0x3fbe3707ee30487b
> +        .quad 0x3fbdef0d8d466db9
> +        .quad 0x3fbda727638446a2
> +        .quad 0x3fbd5f55659210e2
> +        .quad 0x3fbd179788219364
> +        .quad 0x3fbccfedbfee13a8
> +        .quad 0x3fbc885801bc4b23
> +        .quad 0x3fbc40d6425a5cb1
> +        .quad 0x3fbbf968769fca11
> +        .quad 0x3fbbb20e936d6974
> +        .quad 0x3fbb6ac88dad5b1c
> +        .quad 0x3fbb23965a52ff00
> +        .quad 0x3fbadc77ee5aea8c
> +        .quad 0x3fba956d3ecade63
> +        .quad 0x3fba4e7640b1bc38
> +        .quad 0x3fba0792e9277cac
> +        .quad 0x3fb9c0c32d4d2548
> +        .quad 0x3fb97a07024cbe74
> +        .quad 0x3fb9335e5d594989
> +        .quad 0x3fb8ecc933aeb6e8
> +        .quad 0x3fb8a6477a91dc29
> +        .quad 0x3fb85fd927506a48
> +        .quad 0x3fb8197e2f40e3f0
> +        .quad 0x3fb7d33687c293c9
> +        .quad 0x3fb78d02263d82d3
> +        .quad 0x3fb746e100226ed9
> +        .quad 0x3fb700d30aeac0e1
> +        .quad 0x3fb6bad83c1883b6
> +        .quad 0x3fb674f089365a7a
> +        .quad 0x3fb62f1be7d77743
> +        .quad 0x3fb5e95a4d9791cb
> +        .quad 0x3fb5a3abb01ade25
> +        .quad 0x3fb55e10050e0384
> +        .quad 0x3fb518874226130a
> +        .quad 0x3fb4d3115d207eac
> +        .quad 0x3fb48dae4bc31018
> +        .quad 0x3fb4485e03dbdfad
> +        .quad 0x3fb403207b414b7f
> +        .quad 0x3fb3bdf5a7d1ee64
> +        .quad 0x3fb378dd7f749714
> +        .quad 0x3fb333d7f8183f4b
> +        .quad 0x3fb2eee507b40301
> +        .quad 0x3fb2aa04a44717a5
> +        .quad 0x3fb26536c3d8c369
> +        .quad 0x3fb2207b5c78549e
> +        .quad 0x3fb1dbd2643d190b
> +        .quad 0x3fb1973bd1465567
> +        .quad 0x3fb152b799bb3cc9
> +        .quad 0x3fb10e45b3cae831
> +        .quad 0x3fb0c9e615ac4e17
> +        .quad 0x3fb08598b59e3a07
> +        .quad 0x3fb0415d89e74444
> +        .quad 0x3faffa6911ab9301
> +        .quad 0x3faf723b517fc523
> +        .quad 0x3faeea31c006b87c
> +        .quad 0x3fae624c4a0b5e1b
> +        .quad 0x3fadda8adc67ee4e
> +        .quad 0x3fad52ed6405d86f
> +        .quad 0x3faccb73cdddb2cc
> +        .quad 0x3fac441e06f72a9e
> +        .quad 0x3fabbcebfc68f420
> +        .quad 0x3fab35dd9b58baad
> +        .quad 0x3faaaef2d0fb10fc
> +        .quad 0x3faa282b8a936171
> +        .quad 0x3fa9a187b573de7c
> +        .quad 0x3fa91b073efd7314
> +        .quad 0x3fa894aa149fb343
> +        .quad 0x3fa80e7023d8ccc4
> +        .quad 0x3fa788595a3577ba
> +        .quad 0x3fa70265a550e777
> +        .quad 0x3fa67c94f2d4bb58
> +        .quad 0x3fa5f6e73078efb8
> +        .quad 0x3fa5715c4c03ceef
> +        .quad 0x3fa4ebf43349e26f
> +        .quad 0x3fa466aed42de3ea
> +        .quad 0x3fa3e18c1ca0ae92
> +        .quad 0x3fa35c8bfaa1306b
> +        .quad 0x3fa2d7ae5c3c5bae
> +        .quad 0x3fa252f32f8d183f
> +        .quad 0x3fa1ce5a62bc353a
> +        .quad 0x3fa149e3e4005a8d
> +        .quad 0x3fa0c58fa19dfaaa
> +        .quad 0x3fa0415d89e74444
> +        .quad 0x3f9f7a9b16782856
> +        .quad 0x3f9e72bf2813ce51
> +        .quad 0x3f9d6b2725979802
> +        .quad 0x3f9c63d2ec14aaf2
> +        .quad 0x3f9b5cc258b718e6
> +        .quad 0x3f9a55f548c5c43f
> +        .quad 0x3f994f6b99a24475
> +        .quad 0x3f98492528c8cabf
> +        .quad 0x3f974321d3d006d3
> +        .quad 0x3f963d6178690bd6
> +        .quad 0x3f9537e3f45f3565
> +        .quad 0x3f9432a925980cc1
> +        .quad 0x3f932db0ea132e22
> +        .quad 0x3f9228fb1fea2e28
> +        .quad 0x3f912487a5507f70
> +        .quad 0x3f90205658935847
> +        .quad 0x3f8e38ce3033310c
> +        .quad 0x3f8c317384c75f06
> +        .quad 0x3f8a2a9c6c170462
> +        .quad 0x3f882448a388a2aa
> +        .quad 0x3f861e77e8b53fc6
> +        .quad 0x3f841929f96832f0
> +        .quad 0x3f82145e939ef1e9
> +        .quad 0x3f8010157588de71
> +        .quad 0x3f7c189cbb0e27fb
> +        .quad 0x3f78121214586b54
> +        .quad 0x3f740c8a747878e2
> +        .quad 0x3f70080559588b35
> +        .quad 0x3f680904828985c0
> +        .quad 0x3f60040155d5889e
> +        .quad 0x3f50020055655889
> +        .quad 0x0000000000000000
> +        /*== poly_coeff[4] ==*/
> +        .align 32
> +        .quad 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A /* coeff4 */
> +        .quad 0xbfd0000148058EE1, 0xbfd0000148058EE1, 0xbfd0000148058EE1, 0xbfd0000148058EE1 /* coeff3 */
> +        .quad 0x3fd55555555543C5, 0x3fd55555555543C5, 0x3fd55555555543C5, 0x3fd55555555543C5 /* coeff2 */
> +        .quad 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F /* coeff1 */
> +        /*== ExpMask ==*/
> +        .align 32
> +        .quad 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff
> +        /*== Two10 ==*/
> +        .align 32
> +        .quad 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000
> +        /*== MinLog1p = -1+2^(-53) ==*/
> +        .align 32
> +        .quad 0xbfefffffffffffff, 0xbfefffffffffffff, 0xbfefffffffffffff, 0xbfefffffffffffff
> +        /*== MaxLog1p ==*/
> +        .align 32
> +        .quad 0x7f3ffffffffff000, 0x7f3ffffffffff000, 0x7f3ffffffffff000, 0x7f3ffffffffff000
> +        /*== One ==*/
> +        .align 32
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== SgnMask ==*/
> +        .align 32
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
> +        /*== XThreshold ==*/
> +        .align 32
> +        .quad 0x3e00000000000000, 0x3e00000000000000, 0x3e00000000000000, 0x3e00000000000000
> +        /*== XhMask ==*/
> +        .align 32
> +        .quad 0xfffffffffffffc00, 0xfffffffffffffc00, 0xfffffffffffffc00, 0xfffffffffffffc00
> +        /*== Threshold ==*/
> +        .align 32
> +        .quad 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000
> +        /*== Bias ==*/
> +        .align 32
> +        .quad 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000
> +        /*== Bias1 ==*/
> +        .align 32
> +        .quad 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000
> +        /*== ExpMask ==*/
> +        .align 32
> +        .quad 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000
> +        /*== ExpMask2 ==*/
> +        .align 32
> +        .quad 0x7f40000000000000, 0x7f40000000000000, 0x7f40000000000000, 0x7f40000000000000
> +        /*== L2L ==*/
> +        .align 32
> +        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
> +        /*== dBigThreshold ==*/
> +        .align 32
> +        .quad 0x41D0000000000000, 0x41D0000000000000, 0x41D0000000000000, 0x41D0000000000000
> +        /*== dC1 ==*/
> +        .align 32
> +        .quad 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000
> +        /*== dC2 ==*/
> +        .align 32
> +        .quad 0x3fd7fffffffffffa, 0x3fd7fffffffffffa, 0x3fd7fffffffffffa, 0x3fd7fffffffffffa
> +        /*== dC3 ==*/
> +        .align 32
> +        .quad 0x3fd3fffffffffffa, 0x3fd3fffffffffffa, 0x3fd3fffffffffffa, 0x3fd3fffffffffffa
> +        /*== dC4 ==*/
> +        .align 32
> +        .quad 0x3fd1800013d9d428, 0x3fd1800013d9d428, 0x3fd1800013d9d428, 0x3fd1800013d9d428
> +        /*== dC5 ==*/
> +        .align 32
> +        .quad 0x3fcf800025de102f, 0x3fcf800025de102f, 0x3fcf800025de102f, 0x3fcf800025de102f
> +        /*== dLargestFinite ==*/
> +        .align 32
> +        .quad 0x7FEFFFFFFFFFFFFF, 0x7FEFFFFFFFFFFFFF, 0x7FEFFFFFFFFFFFFF, 0x7FEFFFFFFFFFFFFF
> +        /*== dThirtyOne ==*/
> +        .align 32
> +        .quad 0x403F000000000000, 0x403F000000000000, 0x403F000000000000, 0x403F000000000000
> +        /*== dTopMask12 ==*/
> +        .align 32
> +        .quad 0xFFFFFE0000000000, 0xFFFFFE0000000000, 0xFFFFFE0000000000, 0xFFFFFE0000000000
> +        /*== dTopMask29 ==*/
> +        .align 32
> +        .quad 0xFFFFFFFFFF000000, 0xFFFFFFFFFF000000, 0xFFFFFFFFFF000000, 0xFFFFFFFFFF000000
> +        /*== XScale ==*/
> +        .align 32
> +        .quad 0x3E10000000000000, 0x3E10000000000000, 0x3E10000000000000, 0x3E10000000000000
> +        .align 32
> +        .type	__svml_dacosh_data_internal,@object
> +        .size	__svml_dacosh_data_internal,.-__svml_dacosh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core-avx2.S
> new file mode 100644
> index 0000000000..48879787c1
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized acosh, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN8v_acosh _ZGVeN8v_acosh_avx2_wrapper
> +#include "../svml_d_acosh8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core.c
> new file mode 100644
> index 0000000000..4322a5f707
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized acosh, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN8v_acosh
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN8v_acosh, __GI__ZGVeN8v_acosh, __redirect__ZGVeN8v_acosh)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core_avx512.S
> new file mode 100644
> index 0000000000..3199ef77e2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acosh8_core_avx512.S
> @@ -0,0 +1,480 @@
> +/* Function acosh vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute acosh(x) as log(x + sqrt(x*x - 1))
> + *   using RSQRT instructions for starting the
> + *   square root approximation, and small table lookups for log
> + *   that map to AVX-512 permute instructions
> + *
> + *   Special cases:
> + *
> + *   acosh(NaN)  = quiet NaN, and raise invalid exception
> + *   acosh(-INF) = NaN
> + *   acosh(+INF) = +INF
> + *   acosh(x)    = NaN if x < 1
> + *   acosh(1)    = +0
> + *
> + */
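The comment above boils down to the standard scalar identity.  For
reference, a plain C sketch of it (my own illustration, not part of the
patch; it omits the large-input rescaling the vector code performs, so
x*x can overflow for huge x):

  #include <math.h>

  static double
  acosh_ref (double x)
  {
    if (isnan (x) || x < 1.0)       /* also catches -INF */
      return (x - x) / (x - x);     /* NaN; raises invalid for out-of-domain x */
    if (isinf (x))
      return x;                     /* +INF */
    if (x == 1.0)
      return 0.0;                   /* +0 */
    return log (x + sqrt (x * x - 1.0));
  }

The kernel below computes the same thing, but carries sqrt(x*x - 1) and
the log() argument as high/low pairs so the result stays within the
libmvec accuracy bounds.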
> +
> +/* Offsets for data table __svml_dacosh_data_internal_avx512
> + */
> +#define Log_tbl_H                     	0
> +#define Log_tbl_L                     	128
> +#define One                           	256
> +#define SmallThreshold                	320
> +#define Threshold                     	384
> +#define LargeThreshold                	448
> +#define ca2                           	512
> +#define ca1                           	576
> +#define c4s                           	640
> +#define c3s                           	704
> +#define c2s                           	768
> +#define c1s                           	832
> +#define AddB5                         	896
> +#define RcpBitMask                    	960
> +#define OneEighth                     	1024
> +#define Four                          	1088
> +#define poly_coeff9                   	1152
> +#define poly_coeff8                   	1216
> +#define poly_coeff7                   	1280
> +#define poly_coeff6                   	1344
> +#define poly_coeff5                   	1408
> +#define poly_coeff4                   	1472
> +#define poly_coeff3                   	1536
> +#define poly_coeff2                   	1600
> +#define poly_coeff1                   	1664
> +#define L2H                           	1728
> +#define L2L                           	1792
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN8v_acosh_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovups   One+__svml_dacosh_data_internal_avx512(%rip), %zmm5
> +
> +/* polynomial computation for small inputs */
> +        vmovups   ca2+__svml_dacosh_data_internal_avx512(%rip), %zmm13
> +        vmovups   ca1+__svml_dacosh_data_internal_avx512(%rip), %zmm14
> +
> +/*
> + * sqrt(x^2 - 1) ~ Sh + Sl + Sh*Eh*poly_s
> + * poly_s = c1+c2*Eh+c3*Eh^2
> + */
> +        vmovups   c4s+__svml_dacosh_data_internal_avx512(%rip), %zmm1
> +        vmovups   c2s+__svml_dacosh_data_internal_avx512(%rip), %zmm2
> +        vmovups   c1s+__svml_dacosh_data_internal_avx512(%rip), %zmm6
> +
> +/* very large inputs ? */
> +        vmovups   Threshold+__svml_dacosh_data_internal_avx512(%rip), %zmm15
> +
> +/* out of range inputs? */
> +        vmovups   LargeThreshold+__svml_dacosh_data_internal_avx512(%rip), %zmm3
> +
> +/* not a very small input ? */
> +        vmovups   SmallThreshold+__svml_dacosh_data_internal_avx512(%rip), %zmm10
> +        vmovaps   %zmm0, %zmm12
> +
> +/* x^2 - 1 */
> +        vmovaps   %zmm5, %zmm11
> +        vfmsub231pd {rn-sae}, %zmm12, %zmm12, %zmm11
> +        vcmppd    $21, {sae}, %zmm15, %zmm12, %k2
> +        vcmppd    $22, {sae}, %zmm3, %zmm12, %k0
> +        vcmppd    $18, {sae}, %zmm5, %zmm12, %k1
> +        vrsqrt14pd %zmm11, %zmm4
> +        vcmppd    $21, {sae}, %zmm10, %zmm11, %k3
> +        vfmadd231pd {rn-sae}, %zmm11, %zmm13, %zmm14
> +        vmovups   c3s+__svml_dacosh_data_internal_avx512(%rip), %zmm13
> +
> +/* Sh ~sqrt(-1+x^2) */
> +        vmulpd    {rn-sae}, %zmm4, %zmm11, %zmm9
> +        vmulpd    {rn-sae}, %zmm11, %zmm14, %zmm8
> +
> +/* Sh+x */
> +        vaddpd    {rn-sae}, %zmm12, %zmm9, %zmm15
> +
> +/* Shh */
> +        vsubpd    {rn-sae}, %zmm12, %zmm15, %zmm14
> +
> +/* (Yh*R0)_low */
> +        vmovaps   %zmm11, %zmm0
> +        korw      %k0, %k1, %k0
> +
> +/* rel. error term: Eh=1-Sh*R0 */
> +        vmovaps   %zmm5, %zmm7
> +        vfmsub213pd {rn-sae}, %zmm9, %zmm4, %zmm0
> +        vfnmadd231pd {rn-sae}, %zmm9, %zmm4, %zmm7
> +
> +/* rel. error term: Eh=(1-Sh*R0)-Sl*R0 */
> +        vfnmadd231pd {rn-sae}, %zmm0, %zmm4, %zmm7
> +
> +/* Shl */
> +        vsubpd    {rn-sae}, %zmm14, %zmm9, %zmm4
> +        vmovups   poly_coeff7+__svml_dacosh_data_internal_avx512(%rip), %zmm14
> +        vfmadd231pd {rn-sae}, %zmm7, %zmm1, %zmm13
> +        vfmadd213pd {rn-sae}, %zmm2, %zmm7, %zmm13
> +        vfmadd213pd {rn-sae}, %zmm6, %zmm7, %zmm13
> +
> +/* Sh*Eh */
> +        vmulpd    {rn-sae}, %zmm7, %zmm9, %zmm7
> +
> +/* Sl + Sh*Eh*poly_s */
> +        vfmadd213pd {rn-sae}, %zmm0, %zmm13, %zmm7
> +
> +/* polynomials */
> +        vmovups   poly_coeff9+__svml_dacosh_data_internal_avx512(%rip), %zmm13
> +
> +/* polynomial computation for small inputs */
> +        vaddpd    {rn-sae}, %zmm7, %zmm9, %zmm0
> +
> +/* Xin0+Sl+Sh*Eh*poly_s ~ x+sqrt(x^2-1) */
> +        vaddpd    {rn-sae}, %zmm7, %zmm15, %zmm6
> +        vfmadd231pd {rn-sae}, %zmm0, %zmm8, %zmm0
> +
> +/* fixup for very large inputs */
> +        vmovups   OneEighth+__svml_dacosh_data_internal_avx512(%rip), %zmm8
> +
> +/* Sl_high */
> +        vsubpd    {rn-sae}, %zmm15, %zmm6, %zmm9
> +        vmovups   poly_coeff6+__svml_dacosh_data_internal_avx512(%rip), %zmm15
> +        vmulpd    {rn-sae}, %zmm8, %zmm12, %zmm6{%k2}
> +
> +/* Sl_l */
> +        vsubpd    {rn-sae}, %zmm9, %zmm7, %zmm3
> +        vrcp14pd  %zmm6, %zmm1
> +
> +/* Xin_low */
> +        vaddpd    {rn-sae}, %zmm4, %zmm3, %zmm7
> +
> +/* Table lookups */
> +        vmovups   __svml_dacosh_data_internal_avx512(%rip), %zmm3
> +
> +/* round reciprocal to 1+4b mantissas */
> +        vpaddq    AddB5+__svml_dacosh_data_internal_avx512(%rip), %zmm1, %zmm2
> +
> +/* fixup for very large inputs */
> +        vxorpd    %zmm7, %zmm7, %zmm7{%k2}
> +        vmovups   poly_coeff8+__svml_dacosh_data_internal_avx512(%rip), %zmm1
> +        vandpd    RcpBitMask+__svml_dacosh_data_internal_avx512(%rip), %zmm2, %zmm8
> +        vmovups   Log_tbl_L+__svml_dacosh_data_internal_avx512(%rip), %zmm2
> +
> +/* Prepare table index */
> +        vpsrlq    $48, %zmm8, %zmm9
> +
> +/* reduced argument for log(): (Rcp*Xin-1)+Rcp*Xin_low */
> +        vfmsub231pd {rn-sae}, %zmm8, %zmm6, %zmm5
> +
> +/* exponents */
> +        vgetexppd {sae}, %zmm8, %zmm4
> +        vmovups   Four+__svml_dacosh_data_internal_avx512(%rip), %zmm6
> +        vpermt2pd Log_tbl_H+64+__svml_dacosh_data_internal_avx512(%rip), %zmm9, %zmm3
> +        vpermt2pd Log_tbl_L+64+__svml_dacosh_data_internal_avx512(%rip), %zmm9, %zmm2
> +        vsubpd    {rn-sae}, %zmm6, %zmm4, %zmm4{%k2}
> +        vfmadd231pd {rn-sae}, %zmm8, %zmm7, %zmm5
> +        vmovups   poly_coeff5+__svml_dacosh_data_internal_avx512(%rip), %zmm6
> +        vmovups   poly_coeff4+__svml_dacosh_data_internal_avx512(%rip), %zmm7
> +
> +/* -K*L2H + Th */
> +        vmovups   L2H+__svml_dacosh_data_internal_avx512(%rip), %zmm8
> +
> +/* -K*L2L + Tl */
> +        vmovups   L2L+__svml_dacosh_data_internal_avx512(%rip), %zmm9
> +        vfmadd231pd {rn-sae}, %zmm5, %zmm13, %zmm1
> +        vmovups   poly_coeff2+__svml_dacosh_data_internal_avx512(%rip), %zmm13
> +        vfnmadd231pd {rn-sae}, %zmm4, %zmm8, %zmm3
> +        vfnmadd213pd {rn-sae}, %zmm2, %zmm9, %zmm4
> +        vfmadd213pd {rn-sae}, %zmm14, %zmm5, %zmm1
> +        vmovups   poly_coeff3+__svml_dacosh_data_internal_avx512(%rip), %zmm2
> +        vmovups   poly_coeff1+__svml_dacosh_data_internal_avx512(%rip), %zmm14
> +        vfmadd213pd {rn-sae}, %zmm15, %zmm5, %zmm1
> +
> +/* R^2 */
> +        vmulpd    {rn-sae}, %zmm5, %zmm5, %zmm15
> +        vfmadd213pd {rn-sae}, %zmm6, %zmm5, %zmm1
> +        vfmadd213pd {rn-sae}, %zmm7, %zmm5, %zmm1
> +        vfmadd213pd {rn-sae}, %zmm2, %zmm5, %zmm1
> +        vfmadd213pd {rn-sae}, %zmm13, %zmm5, %zmm1
> +        vfmadd213pd {rn-sae}, %zmm14, %zmm5, %zmm1
> +
> +/* Tl + R^2*Poly */
> +        vfmadd213pd {rn-sae}, %zmm4, %zmm15, %zmm1
> +
> +/* R+Tl + R^2*Poly */
> +        vaddpd    {rn-sae}, %zmm5, %zmm1, %zmm5
> +        vaddpd    {rn-sae}, %zmm5, %zmm3, %zmm0{%k3}
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 k0 zmm0 zmm12
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm12, 64(%rsp)
> +        vmovups   %zmm0, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 k0 zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax k0
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        kmovd     %k0, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     64(%rsp,%r14,8), %xmm0
> +        call      acosh@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 128(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN8v_acosh_skx)
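The special-values tail above follows the usual libmvec pattern: k0 has
one bit per lane that left the fast path, the inputs and fast-path
results are spilled to the stack, and each flagged lane is recomputed
with the scalar routine.  Roughly, in C (names made up for the
illustration, not the actual stack layout):

  #include <math.h>

  /* k0: range mask from the vector kernel; x/y: the 8-lane spill areas.  */
  static void
  fixup_special_lanes (unsigned int k0, const double x[8], double y[8])
  {
    for (int i = 0; i < 8; i++)     /* mirrors the incl/cmpl $8 loop */
      if (k0 & (1u << i))           /* mirrors the btl bit test */
        y[i] = acosh (x[i]);        /* the per-lane acosh@PLT call */
  }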
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_dacosh_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 Log_tbl_H[16][2];
> +        __declspec(align(64)) VUINT32 Log_tbl_L[16][2];
> +        __declspec(align(64)) VUINT32 One[8][2];
> +        __declspec(align(64)) VUINT32 SmallThreshold[8][2];
> +        __declspec(align(64)) VUINT32 Threshold[8][2];
> +        __declspec(align(64)) VUINT32 LargeThreshold[8][2];
> +        __declspec(align(64)) VUINT32 ca2[8][2];
> +        __declspec(align(64)) VUINT32 ca1[8][2];
> +        __declspec(align(64)) VUINT32 c4s[8][2];
> +        __declspec(align(64)) VUINT32 c3s[8][2];
> +        __declspec(align(64)) VUINT32 c2s[8][2];
> +        __declspec(align(64)) VUINT32 c1s[8][2];
> +        __declspec(align(64)) VUINT32 AddB5[8][2];
> +        __declspec(align(64)) VUINT32 RcpBitMask[8][2];
> +        __declspec(align(64)) VUINT32 OneEighth[8][2];
> +        __declspec(align(64)) VUINT32 Four[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff9[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff8[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff7[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff1[8][2];
> +        __declspec(align(64)) VUINT32 L2H[8][2];
> +        __declspec(align(64)) VUINT32 L2L[8][2];
> +    } __svml_dacosh_data_internal_avx512;
> +#endif
> +__svml_dacosh_data_internal_avx512:
> +        /*== Log_tbl_H ==*/
> +        .quad 0x0000000000000000
> +        .quad 0xbfaf0a30c0120000
> +        .quad 0xbfbe27076e2b0000
> +        .quad 0xbfc5ff3070a78000
> +        .quad 0xbfcc8ff7c79a8000
> +        .quad 0xbfd1675cababc000
> +        .quad 0xbfd4618bc21c4000
> +        .quad 0xbfd739d7f6bbc000
> +        .quad 0xbfd9f323ecbf8000
> +        .quad 0xbfdc8ff7c79a8000
> +        .quad 0xbfdf128f5faf0000
> +        .quad 0xbfe0be72e4252000
> +        .quad 0xbfe1e85f5e704000
> +        .quad 0xbfe307d7334f2000
> +        .quad 0xbfe41d8fe8468000
> +        .quad 0xbfe52a2d265bc000
> +        /*== Log_tbl_L ==*/
> +        .align 64
> +        .quad 0x0000000000000000
> +        .quad 0x3d53ab33d066d1d2
> +        .quad 0x3d2a342c2af0003c
> +        .quad 0xbd43d3c873e20a07
> +        .quad 0xbd4a21ac25d81ef3
> +        .quad 0x3d59f1fc63382a8f
> +        .quad 0xbd5ec27d0b7b37b3
> +        .quad 0xbd50069ce24c53fb
> +        .quad 0xbd584bf2b68d766f
> +        .quad 0xbd5a21ac25d81ef3
> +        .quad 0xbd3bb2cd720ec44c
> +        .quad 0xbd55056d312f7668
> +        .quad 0xbd1a07bd8b34be7c
> +        .quad 0x3d5e83c094debc15
> +        .quad 0x3d5aa33736867a17
> +        .quad 0xbd46abb9df22bc57
> +        /*== One ==*/
> +        .align 64
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== SmallThreshold ==*/
> +        .align 64
> +        .quad 0x3ef0000000000000, 0x3ef0000000000000, 0x3ef0000000000000, 0x3ef0000000000000, 0x3ef0000000000000, 0x3ef0000000000000, 0x3ef0000000000000, 0x3ef0000000000000
> +        /*== Threshold ==*/
> +        .align 64
> +        .quad 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000
> +        /*== LargeThreshold ==*/
> +        .align 64
> +        .quad 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff
> +        /*== ca2 ==*/
> +        .align 64
> +        .quad 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7
> +        /*== ca1 ==*/
> +        .align 64
> +        .quad 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e
> +        /*== c4s ==*/
> +        .align 64
> +        .quad 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612
> +        /*== c3s ==*/
> +        .align 64
> +        .quad 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000
> +        /*== c2s ==*/
> +        .align 64
> +        .quad 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000
> +        /*== c1s ==*/
> +        .align 64
> +        .quad 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000
> +        /*== AddB5 ==*/
> +        .align 64
> +        .quad 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000
> +        /*== RcpBitMask ==*/
> +        .align 64
> +        .quad 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000
> +        /*==OneEighth ==*/
> +        .align 64
> +        .quad 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000
> +        /*== Four ==*/
> +        .align 64
> +        .quad 0x4010000000000000, 0x4010000000000000, 0x4010000000000000, 0x4010000000000000, 0x4010000000000000, 0x4010000000000000, 0x4010000000000000, 0x4010000000000000
> +        /*== poly_coeff9 ==*/
> +        .align 64
> +        .quad 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368
> +        /*== poly_coeff8 ==*/
> +        .align 64
> +        .quad 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778
> +        /*== poly_coeff7 ==*/
> +        .align 64
> +        .quad 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9
> +        /*== poly_coeff6 ==*/
> +        .align 64
> +        .quad 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1
> +        /*== poly_coeff5 ==*/
> +        .align 64
> +        .quad 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736
> +        /*== poly_coeff4 ==*/
> +        .align 64
> +        .quad 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af
> +        /*== poly_coeff3 ==*/
> +        .align 64
> +        .quad 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65
> +        /*== poly_coeff2 ==*/
> +        .align 64
> +        .quad 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1
> +        /*== poly_coeff1 ==*/
> +        .align 64
> +        .quad 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000
> +        /*== L2H = log(2)_high ==*/
> +        .align 64
> +        .quad 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000
> +        /*== L2L = log(2)_low ==*/
> +        .align 64
> +        .quad 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000
> +        .align 64
> +        .type	__svml_dacosh_data_internal_avx512,@object
> +        .size	__svml_dacosh_data_internal_avx512,.-__svml_dacosh_data_internal_avx512
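For the log() part, the kernel above rounds the reciprocal to a 1+4-bit
mantissa (AddB5/RcpBitMask), uses those bits to index
Log_tbl_H/Log_tbl_L with vpermt2pd, and evaluates a polynomial on the
reduced argument r = Rcp*Xin - 1, adding back the exponent scaled by
L2H/L2L.  A scalar sketch of that reduction (illustration only:
-log(rcp) stands in for the table lookup, and the short polynomial is
far less accurate than the real poly_coeff* one):

  #include <math.h>

  static double
  log_by_recip_table (double x)
  {
    int k;
    double m = frexp (x, &k);                       /* x = m * 2^k, m in [0.5, 1) */
    double rcp = 1.0 / m;
    rcp = ldexp (nearbyint (ldexp (rcp, 4)), -4);   /* keep 1+4 mantissa bits */
    double r = fma (rcp, m, -1.0);                  /* reduced argument, |r| <= ~2^-5 */
    double poly = r * (1.0 + r * (-0.5 + r / 3.0)); /* log1p(r), low order */
    return k * M_LN2 - log (rcp) + poly;            /* -log(rcp) ~ Log_tbl_H + Log_tbl_L */
  }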
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core-avx2.S
> new file mode 100644
> index 0000000000..a54c6863c5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized acoshf, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN16v_acoshf _ZGVeN16v_acoshf_avx2_wrapper
> +#include "../svml_s_acoshf16_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core.c
> new file mode 100644
> index 0000000000..8109b73ebf
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized acoshf, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN16v_acoshf
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN16v_acoshf, __GI__ZGVeN16v_acoshf,
> +	       __redirect__ZGVeN16v_acoshf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core_avx512.S
> new file mode 100644
> index 0000000000..688ca38669
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf16_core_avx512.S
> @@ -0,0 +1,449 @@
> +/* Function acoshf vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute acosh(x) as log(x + sqrt(x*x - 1))
> + *   using RSQRT instructions for starting the
> + *   square root approximation, and small table lookups for log
> + *   that map to AVX-512 permute instructions
> + *
> + *   Special cases:
> + *
> + *   acosh(NaN)  = quiet NaN, and raise invalid exception
> + *   acosh(-INF) = NaN
> + *   acosh(+INF) = +INF
> + *   acosh(x)    = NaN if x < 1
> + *   acosh(1)    = +0
> + *
> + */
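Both the double and the float kernel refine the vrsqrt14 seed the same
way, which is what the c1s/c2s/... constants encode: with y = x*x - 1
and R0 ~ 1/sqrt(y), take Sh = y*R0 and Eh = 1 - Sh*R0, so that
sqrt(y) = Sh*(1 - Eh)^(-1/2) ~ Sh + Sh*Eh*(1/2 + 3/8*Eh + ...).  A
single-precision sketch of that step (illustration only, made-up names;
the real code also folds the Sl low part into Eh):

  #include <math.h>

  static float
  refine_sqrt (float y, float r0)           /* r0 ~ 1/sqrtf (y), e.g. from vrsqrt14ps */
  {
    float sh = y * r0;                      /* Sh ~ sqrt (y) */
    float eh = fmaf (-sh, r0, 1.0f);        /* Eh = 1 - Sh*R0 */
    float poly = fmaf (0.375f, eh, 0.5f);   /* poly_s = c1 + c2*Eh */
    return fmaf (sh * eh, poly, sh);        /* Sh + Sh*Eh*poly_s */
  }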
> +
> +/* Offsets for data table __svml_sacosh_data_internal_avx512
> + */
> +#define Log_tbl_H                     	0
> +#define Log_tbl_L                     	128
> +#define One                           	256
> +#define SmallThreshold                	320
> +#define Threshold                     	384
> +#define LargeThreshold                	448
> +#define ca1                           	512
> +#define c2s                           	576
> +#define c1s                           	640
> +#define AddB5                         	704
> +#define RcpBitMask                    	768
> +#define OneEighth                     	832
> +#define Four                          	896
> +#define poly_coeff3                   	960
> +#define poly_coeff2                   	1024
> +#define poly_coeff1                   	1088
> +#define L2H                           	1152
> +#define L2L                           	1216
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN16v_acoshf_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovups   One+__svml_sacosh_data_internal_avx512(%rip), %zmm1
> +
> +/*
> + * sqrt(x^2 - 1) ~ Sh + Sl + Sh*Eh*poly_s
> + * poly_s = c1+c2*Eh
> + */
> +        vmovups   c2s+__svml_sacosh_data_internal_avx512(%rip), %zmm13
> +        vmovups   c1s+__svml_sacosh_data_internal_avx512(%rip), %zmm15
> +
> +/* polynomial computation for small inputs */
> +        vmovups   ca1+__svml_sacosh_data_internal_avx512(%rip), %zmm9
> +
> +/* very large inputs ? */
> +        vmovups   Threshold+__svml_sacosh_data_internal_avx512(%rip), %zmm10
> +
> +/* out of range inputs? */
> +        vmovups   LargeThreshold+__svml_sacosh_data_internal_avx512(%rip), %zmm11
> +
> +/* not a very small input ? */
> +        vmovups   SmallThreshold+__svml_sacosh_data_internal_avx512(%rip), %zmm6
> +        vmovaps   %zmm0, %zmm8
> +
> +/* x^2 - 1 */
> +        vmovaps   %zmm1, %zmm7
> +        vfmsub231ps {rn-sae}, %zmm8, %zmm8, %zmm7
> +        vcmpps    $21, {sae}, %zmm10, %zmm8, %k2
> +        vcmpps    $22, {sae}, %zmm11, %zmm8, %k0
> +        vcmpps    $18, {sae}, %zmm1, %zmm8, %k1
> +        vrsqrt14ps %zmm7, %zmm12
> +        vcmpps    $21, {sae}, %zmm6, %zmm7, %k3
> +        vmulps    {rn-sae}, %zmm9, %zmm7, %zmm4
> +
> +/* Sh ~sqrt(-1+x^2) */
> +        vmulps    {rn-sae}, %zmm12, %zmm7, %zmm5
> +
> +/* Sh+x */
> +        vaddps    {rn-sae}, %zmm8, %zmm5, %zmm9
> +
> +/* (Yh*R0)_low */
> +        vmovaps   %zmm7, %zmm0
> +        korw      %k0, %k1, %k0
> +
> +/* rel. error term: Eh=1-Sh*R0 */
> +        vmovaps   %zmm1, %zmm14
> +        vfmsub213ps {rn-sae}, %zmm5, %zmm12, %zmm0
> +        vfnmadd231ps {rn-sae}, %zmm5, %zmm12, %zmm14
> +
> +/* rel. error term: Eh=(1-Sh*R0)-Sl*R0 */
> +        vfnmadd231ps {rn-sae}, %zmm0, %zmm12, %zmm14
> +
> +/* Sh*Eh */
> +        vmulps    {rn-sae}, %zmm14, %zmm5, %zmm3
> +        vfmadd231ps {rn-sae}, %zmm14, %zmm13, %zmm15
> +
> +/* Sl + Sh*Eh*poly_s */
> +        vfmadd213ps {rn-sae}, %zmm0, %zmm15, %zmm3
> +
> +/* Shh */
> +        vsubps    {rn-sae}, %zmm8, %zmm9, %zmm15
> +
> +/* polynomial computation for small inputs */
> +        vaddps    {rn-sae}, %zmm3, %zmm5, %zmm0
> +
> +/* Xin0+Sl+Sh*Eh*poly_s ~ x+sqrt(x^2-1) */
> +        vaddps    {rn-sae}, %zmm3, %zmm9, %zmm2
> +
> +/* Shl */
> +        vsubps    {rn-sae}, %zmm15, %zmm5, %zmm10
> +        vfmadd231ps {rn-sae}, %zmm0, %zmm4, %zmm0
> +
> +/* fixup for very large inputs */
> +        vmovups   OneEighth+__svml_sacosh_data_internal_avx512(%rip), %zmm4
> +
> +/* Sl_high */
> +        vsubps    {rn-sae}, %zmm9, %zmm2, %zmm5
> +
> +/* polynomial */
> +        vmovups   poly_coeff3+__svml_sacosh_data_internal_avx512(%rip), %zmm9
> +        vmulps    {rn-sae}, %zmm4, %zmm8, %zmm2{%k2}
> +
> +/* -K*L2L + Tl */
> +        vmovups   L2L+__svml_sacosh_data_internal_avx512(%rip), %zmm4
> +
> +/* Sl_l */
> +        vsubps    {rn-sae}, %zmm5, %zmm3, %zmm3
> +        vrcp14ps  %zmm2, %zmm11
> +        vmovups   Log_tbl_L+__svml_sacosh_data_internal_avx512(%rip), %zmm5
> +
> +/* Xin_low */
> +        vaddps    {rn-sae}, %zmm10, %zmm3, %zmm13
> +
> +/* round reciprocal to 1+4b mantissas */
> +        vpaddd    AddB5+__svml_sacosh_data_internal_avx512(%rip), %zmm11, %zmm12
> +        vmovups   poly_coeff1+__svml_sacosh_data_internal_avx512(%rip), %zmm10
> +        vandps    RcpBitMask+__svml_sacosh_data_internal_avx512(%rip), %zmm12, %zmm14
> +
> +/* fixup for very large inputs */
> +        vxorps    %zmm13, %zmm13, %zmm13{%k2}
> +
> +/* reduced argument for log(): (Rcp*Xin-1)+Rcp*Xin_low */
> +        vfmsub231ps {rn-sae}, %zmm14, %zmm2, %zmm1
> +
> +/* exponents */
> +        vgetexpps {sae}, %zmm14, %zmm12
> +        vmovups   Four+__svml_sacosh_data_internal_avx512(%rip), %zmm2
> +
> +/* Prepare table index */
> +        vpsrld    $18, %zmm14, %zmm3
> +        vfmadd231ps {rn-sae}, %zmm14, %zmm13, %zmm1
> +        vmovups   poly_coeff2+__svml_sacosh_data_internal_avx512(%rip), %zmm13
> +
> +/* Table lookups */
> +        vmovups   __svml_sacosh_data_internal_avx512(%rip), %zmm14
> +        vsubps    {rn-sae}, %zmm2, %zmm12, %zmm12{%k2}
> +        vpermt2ps Log_tbl_L+64+__svml_sacosh_data_internal_avx512(%rip), %zmm3, %zmm5
> +        vpermt2ps Log_tbl_H+64+__svml_sacosh_data_internal_avx512(%rip), %zmm3, %zmm14
> +
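If I read the lookup scheme right: vrcp14ps plus the AddB5/RcpBitMask rounding
keeps only a 1+4-bit mantissa in Rcp, so r = Rcp*Xin - 1 (plus the Rcp*Xin_low
term) stays small; the surviving mantissa bits of Rcp index -log(Rcp) as the
Log_tbl_H/Log_tbl_L pair, vgetexpps supplies the exponent K, and everything is
reassembled below as (Th - K*L2H) + (r + r^2*poly + (Tl - K*L2L)).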
> +/* R^2 */
> +        vmulps    {rn-sae}, %zmm1, %zmm1, %zmm11
> +
> +/* -K*L2H + Th */
> +        vmovups   L2H+__svml_sacosh_data_internal_avx512(%rip), %zmm2
> +        vfmadd231ps {rn-sae}, %zmm1, %zmm9, %zmm13
> +        vfnmadd231ps {rn-sae}, %zmm12, %zmm2, %zmm14
> +        vfnmadd213ps {rn-sae}, %zmm5, %zmm4, %zmm12
> +        vfmadd213ps {rn-sae}, %zmm10, %zmm1, %zmm13
> +
> +/* Tl + R^2*Poly */
> +        vfmadd213ps {rn-sae}, %zmm12, %zmm11, %zmm13
> +
> +/* R+Tl + R^2*Poly */
> +        vaddps    {rn-sae}, %zmm1, %zmm13, %zmm1
> +        vaddps    {rn-sae}, %zmm1, %zmm14, %zmm0{%k3}
> +        kortestw  %k0, %k0
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 k0 zmm0 zmm8
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm8, 64(%rsp)
> +        vmovups   %zmm0, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 k0 zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax k0
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        kmovd     %k0, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $16, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     64(%rsp,%r14,4), %xmm0
> +        call      acoshf@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 128(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN16v_acoshf_skx)
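For anyone reading the special-value path for the first time, this is the
C-level picture of what the btl/call/loop above does (a sketch only -- the
real code keeps both vectors spilled on the stack and preserves r12-r14 with
the cfi_escape expressions):

	#include <math.h>

	/* mask has one bit per lane, set where the fast path must be
	   replaced by the scalar routine.  */
	static void
	fixup_special_lanes (const float in[16], float out[16],
			     unsigned int mask)
	{
	  for (unsigned int i = 0; i < 16; i++)
	    if (mask & (1u << i))	/* btl %r12d, %r13d */
	      out[i] = acoshf (in[i]);	/* call acoshf@PLT */
	}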
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_sacosh_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 Log_tbl_H[32][1];
> +        __declspec(align(64)) VUINT32 Log_tbl_L[32][1];
> +        __declspec(align(64)) VUINT32 One[16][1];
> +        __declspec(align(64)) VUINT32 SmallThreshold[16][1];
> +        __declspec(align(64)) VUINT32 Threshold[16][1];
> +        __declspec(align(64)) VUINT32 LargeThreshold[16][1];
> +        __declspec(align(64)) VUINT32 ca1[16][1];
> +        __declspec(align(64)) VUINT32 c2s[16][1];
> +        __declspec(align(64)) VUINT32 c1s[16][1];
> +        __declspec(align(64)) VUINT32 AddB5[16][1];
> +        __declspec(align(64)) VUINT32 RcpBitMask[16][1];
> +        __declspec(align(64)) VUINT32 OneEighth[16][1];
> +        __declspec(align(64)) VUINT32 Four[16][1];
> +        __declspec(align(64)) VUINT32 poly_coeff3[16][1];
> +        __declspec(align(64)) VUINT32 poly_coeff2[16][1];
> +        __declspec(align(64)) VUINT32 poly_coeff1[16][1];
> +        __declspec(align(64)) VUINT32 L2H[16][1];
> +        __declspec(align(64)) VUINT32 L2L[16][1];
> +    } __svml_sacosh_data_internal_avx512;
> +#endif
> +__svml_sacosh_data_internal_avx512:
> +        /*== Log_tbl_H ==*/
> +        .long 0x00000000
> +        .long 0xbcfc0000
> +        .long 0xbd788000
> +        .long 0xbdb78000
> +        .long 0xbdf14000
> +        .long 0xbe14a000
> +        .long 0xbe300000
> +        .long 0xbe4aa000
> +        .long 0xbe648000
> +        .long 0xbe7dc000
> +        .long 0xbe8b4000
> +        .long 0xbe974000
> +        .long 0xbea31000
> +        .long 0xbeae9000
> +        .long 0xbeb9d000
> +        .long 0xbec4d000
> +        .long 0xbecfa000
> +        .long 0xbeda2000
> +        .long 0xbee48000
> +        .long 0xbeeea000
> +        .long 0xbef89000
> +        .long 0xbf012800
> +        .long 0xbf05f000
> +        .long 0xbf0aa800
> +        .long 0xbf0f4000
> +        .long 0xbf13c800
> +        .long 0xbf184000
> +        .long 0xbf1ca000
> +        .long 0xbf20f000
> +        .long 0xbf252800
> +        .long 0xbf295000
> +        .long 0xbf2d6800
> +        /*== Log_tbl_L ==*/
> +        .align 64
> +        .long 0x80000000
> +        .long 0xb726c39e
> +        .long 0x3839e7fe
> +        .long 0xb7528ae5
> +        .long 0x377891d5
> +        .long 0xb8297c10
> +        .long 0x37cf8f58
> +        .long 0x3852b186
> +        .long 0x35838656
> +        .long 0xb80c36af
> +        .long 0x38235454
> +        .long 0xb862bae1
> +        .long 0x37e87bc7
> +        .long 0x37848150
> +        .long 0x37202511
> +        .long 0xb74e1b05
> +        .long 0x385c1340
> +        .long 0xb8777bcd
> +        .long 0x36038656
> +        .long 0xb7d40984
> +        .long 0xb80f5faf
> +        .long 0xb8254b4c
> +        .long 0xb865c84a
> +        .long 0x37f0b42d
> +        .long 0xb83ebce1
> +        .long 0xb83c2513
> +        .long 0x37a332c4
> +        .long 0x3779654f
> +        .long 0x38602f73
> +        .long 0x367449f8
> +        .long 0xb7b4996f
> +        .long 0xb800986b
> +        /*== One ==*/
> +        .align 64
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== SmallThreshold ==*/
> +        .align 64
> +        .long 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000, 0x39800000
> +        /*== Threshold ==*/
> +        .align 64
> +        .long 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000
> +        /*== LargeThreshold ==*/
> +        .align 64
> +        .long 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff
> +        /*== ca1 ==*/
> +        .align 64
> +        .long 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE
> +        /*== c2s ==*/
> +        .align 64
> +        .long 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000
> +        /*== c1s ==*/
> +        .align 64
> +        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000
> +        /*== AddB5 ==*/
> +        .align 64
> +        .long 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000
> +        /*== RcpBitMask ==*/
> +        .align 64
> +        .long 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000
> +        /*== OneEighth ==*/
> +        .align 64
> +        .long 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000
> +        /*== Four ==*/
> +        .align 64
> +        .long 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000
> +        /*== poly_coeff3 ==*/
> +        .align 64
> +        .long 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810
> +        /*== poly_coeff2 ==*/
> +        .align 64
> +        .long 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e
> +        /*== poly_coeff1 ==*/
> +        .align 64
> +        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000
> +        /*== L2H = log(2)_high ==*/
> +        .align 64
> +        .long 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000
> +        /*== L2L = log(2)_low ==*/
> +        .align 64
> +        .long 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4
> +        .align 64
> +        .type	__svml_sacosh_data_internal_avx512,@object
> +        .size	__svml_sacosh_data_internal_avx512,.-__svml_sacosh_data_internal_avx512
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core-sse2.S
> new file mode 100644
> index 0000000000..d789ec1d47
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized acoshf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN4v_acoshf _ZGVbN4v_acoshf_sse2
> +#include "../svml_s_acoshf4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core.c
> new file mode 100644
> index 0000000000..b2d9101c47
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized acoshf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN4v_acoshf
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN4v_acoshf, __GI__ZGVbN4v_acoshf,
> +	       __redirect__ZGVbN4v_acoshf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core_sse4.S
> new file mode 100644
> index 0000000000..e897ea304f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf4_core_sse4.S
> @@ -0,0 +1,389 @@
> +/* Function acoshf vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute acosh(x) as log(x + sqrt(x*x - 1))
> + *
> + *   Special cases:
> + *
> + *   acosh(NaN)  = quiet NaN, and raise invalid exception
> + *   acosh(-INF) = NaN
> + *   acosh(+INF) = +INF
> + *   acosh(x)    = NaN if x < 1
> + *   acosh(1)    = +0
> + *
> + */
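As a reference point for the special cases listed above, they all fall out of
the basic formula in scalar code (a sketch for illustration only, not what the
kernel below does -- the straightforward form loses accuracy near 1 and
overflows x*x for large x, which is exactly what the rest of this file works
around):

	#include <math.h>

	/* For x < 1 either the sqrt or the log argument goes negative
	   (or Inf-Inf for -Inf), so NaN is returned and "invalid" raised;
	   x == 1 gives log(1) == +0; NaN and +Inf propagate.  */
	static float
	acoshf_ref (float x)
	{
	  return logf (x + sqrtf (x * x - 1.0f));
	}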
> +
> +/* Offsets for data table __svml_sacosh_data_internal
> + */
> +#define sOne                          	0
> +#define sPoly                         	16
> +#define iBrkValue                     	144
> +#define iOffExpoMask                  	160
> +#define sBigThreshold                 	176
> +#define sC2                           	192
> +#define sC3                           	208
> +#define sHalf                         	224
> +#define sLargestFinite                	240
> +#define sThirtyOne                    	256
> +#define sTopMask8                     	272
> +#define XScale                        	288
> +#define sLn2                          	304
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN4v_acoshf_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +
> +/* Compute U = X - 1 and V = X + 1, naively first. */
> +        movaps    %xmm0, %xmm12
> +
> +/* Load constants, always including One = 1 */
> +        movups    sOne+__svml_sacosh_data_internal(%rip), %xmm2
> +
> +/*
> + * Check that 1 < X < +inf; otherwise go to the callout function.
> + * We need the callout for X = 1 to avoid division by zero below.
> + * This test ensures that callout handles NaN and either infinity.
> + */
> +        movaps    %xmm0, %xmm4
> +        movaps    %xmm2, %xmm9
> +
> +/*
> + * Compute e = -(2 * d + d^2)
> + * The first FMR is exact, and the rounding error in the other is acceptable
> + * since d and e are ~ 2^-8
> + */
> +        movaps    %xmm2, %xmm10
> +
> +/* Finally, express Y + W = U * V accurately where Y has <= 8 bits */
> +        movups    sTopMask8+__svml_sacosh_data_internal(%rip), %xmm5
> +
> +/*
> + * Now we feed into the log1p code, using H in place of _VARG1 and
> + * also adding L into Xl.
> + * compute 1+x as high, low parts
> + */
> +        movaps    %xmm2, %xmm13
> +        movaps    %xmm5, %xmm11
> +        movaps    %xmm2, %xmm3
> +
> +/*
> + * Now       1 / (1 + d)
> + * = 1 / (1 + (sqrt(1 - e) - 1))
> + * = 1 / sqrt(1 - e)
> + * = 1 + 1/2 * e + 3/8 * e^2 + 5/16 * e^3 + 35/128 * e^4 + ...
> + * So compute the first three nonconstant terms of that, so that
> + * we have a relative correction (1 + Corr) to apply to S etc.
> + * C1 = 1/2
> + * C2 = 3/8
> + * C3 = 5/16
> + */
> +        movups    sC3+__svml_sacosh_data_internal(%rip), %xmm8
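(These are the sHalf/sC2/sC3 constants in the table at the bottom of this
file: 0x3F000000 = 0.5, 0x3EC00000 = 0.375 = 3/8 and 0x3EA00000 = 0.3125
= 5/16.)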
> +
> +/*
> + * The following computation can go wrong for very large X, e.g.
> + * the X^2 - 1 = U * V can overflow. But for large X we have
> + * acosh(X) / log(2 X) - 1 =~= 1/(4 * X^2), so for X >= 2^30
> + * we can just later stick X back into the log and tweak up the exponent.
> + * Actually we scale X by 2^-30 and tweak the exponent up by 31,
> + * to stay in the safe range for the later log computation.
> + * Compute a flag now telling us when to do this.
> + */
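Working that claim out: for X >= 2^30,

    acosh(X) = log(X + sqrt(X^2 - 1))
             = log(2 X) + log((1 + sqrt(1 - 1/X^2)) / 2)
            ~= log(2 X) - 1/(4 X^2)

and the dropped term is at most 2^-62, far below float precision, while
log(2 X) = log(2^-30 * X) + 31 * log(2), which is exactly the
rescale-and-add-31 trick described above.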
> +        movaps    %xmm0, %xmm1
> +        cmpnleps  sLargestFinite+__svml_sacosh_data_internal(%rip), %xmm4
> +        cmpltps   sBigThreshold+__svml_sacosh_data_internal(%rip), %xmm1
> +        cmpnltps  %xmm0, %xmm3
> +        subps     %xmm2, %xmm12
> +        addps     %xmm0, %xmm9
> +
> +/* For low-accuracy versions, naivety is harmless */
> +        mulps     %xmm12, %xmm9
> +        orps      %xmm3, %xmm4
> +        movmskps  %xmm4, %edx
> +        andps     %xmm9, %xmm11
> +        movaps    %xmm1, %xmm3
> +
> +/*
> + * Compute R = 1/sqrt(Y + W) * (1 + d)
> + * Force R to <= 8 significant bits.
> + * This means that R * Y and R^2 * Y are exactly representable.
> + */
> +        rsqrtps   %xmm11, %xmm7
> +        subps     %xmm11, %xmm9
> +        andps     %xmm5, %xmm7
> +        movaps    %xmm2, %xmm4
> +
> +/*
> + * Compute S = (Y/sqrt(Y + W)) * (1 + d)
> + * and T = (W/sqrt(Y + W)) * (1 + d)
> + * so that S + T = sqrt(Y + W) * (1 + d)
> + * S is exact, and the rounding error in T is OK.
> + */
> +        mulps     %xmm7, %xmm11
> +        movaps    %xmm7, %xmm6
> +        mulps     %xmm7, %xmm9
> +        mulps     %xmm11, %xmm6
> +        mulps     %xmm9, %xmm7
> +
> +/*
> + * For low-accuracy versions, the computation can be done
> + * just as U + ((S + T) + (S + T) * Corr)
> + */
> +        addps     %xmm9, %xmm11
> +        subps     %xmm6, %xmm10
> +        movaps    %xmm2, %xmm9
> +        subps     %xmm7, %xmm10
> +        mulps     %xmm10, %xmm8
> +
> +/* Now multiplex to the case X = 2^-30 * input, Xl = 0 in the "big" case. */
> +        movups    XScale+__svml_sacosh_data_internal(%rip), %xmm14
> +        mulps     %xmm0, %xmm14
> +        addps     sC2+__svml_sacosh_data_internal(%rip), %xmm8
> +        mulps     %xmm10, %xmm8
> +        andnps    %xmm14, %xmm3
> +
> +/*
> + * Now resume the main code.
> + * reduction: compute r,n
> + */
> +        movdqu    iBrkValue+__svml_sacosh_data_internal(%rip), %xmm14
> +        movdqu    iOffExpoMask+__svml_sacosh_data_internal(%rip), %xmm5
> +
> +/* Add 31 to the exponent in the "large" case to get log(2 * input) */
> +        movups    sThirtyOne+__svml_sacosh_data_internal(%rip), %xmm6
> +        addps     sHalf+__svml_sacosh_data_internal(%rip), %xmm8
> +        mulps     %xmm8, %xmm10
> +        movaps    %xmm1, %xmm8
> +        mulps     %xmm11, %xmm10
> +        addps     %xmm10, %xmm11
> +        addps     %xmm11, %xmm12
> +        maxps     %xmm12, %xmm13
> +        minps     %xmm12, %xmm9
> +        movaps    %xmm13, %xmm15
> +        addps     %xmm9, %xmm15
> +        subps     %xmm15, %xmm13
> +        andps     %xmm1, %xmm15
> +        orps      %xmm15, %xmm3
> +        addps     %xmm13, %xmm9
> +        psubd     %xmm14, %xmm3
> +        andps     %xmm1, %xmm9
> +        pand      %xmm3, %xmm5
> +        psrad     $23, %xmm3
> +        cvtdq2ps  %xmm3, %xmm7
> +        pslld     $23, %xmm3
> +        paddd     %xmm14, %xmm5
> +        psubd     %xmm3, %xmm4
> +
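The integer ops just quoted are the standard log1p argument reduction; in
scalar C it amounts to something like this (my own sketch, names made up; it
only models the high part -- the low part is rescaled by the same 2^-n via the
psubd into %xmm4 above):

	#include <stdint.h>
	#include <string.h>

	/* Split xh ~ 2^n * m with m in [2/3, 4/3) and return r = m - 1.  */
	static float
	log1p_reduce (float xh, int *n)
	{
	  const uint32_t brk = 0x3f2aaaab;	/* iBrkValue ~ 2/3 */
	  uint32_t ix, im;
	  float m;

	  memcpy (&ix, &xh, sizeof ix);
	  uint32_t off = ix - brk;		/* psubd iBrkValue */
	  *n = (int32_t) off >> 23;		/* psrad $23 */
	  im = (off & 0x007fffff) + brk;	/* pand iOffExpoMask; paddd */
	  memcpy (&m, &im, sizeof m);
	  return m - 1.0f;			/* "polynomial evaluation" subps */
	}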
> +/* polynomial evaluation */
> +        subps     %xmm2, %xmm5
> +        mulps     %xmm4, %xmm9
> +        addps     %xmm7, %xmm6
> +        movups    sPoly+112+__svml_sacosh_data_internal(%rip), %xmm2
> +        andnps    %xmm6, %xmm8
> +        andps     %xmm1, %xmm7
> +        addps     %xmm5, %xmm9
> +        mulps     %xmm9, %xmm2
> +        orps      %xmm7, %xmm8
> +
> +/* final reconstruction */
> +        mulps     sLn2+__svml_sacosh_data_internal(%rip), %xmm8
> +        addps     sPoly+96+__svml_sacosh_data_internal(%rip), %xmm2
> +        mulps     %xmm9, %xmm2
> +        addps     sPoly+80+__svml_sacosh_data_internal(%rip), %xmm2
> +        mulps     %xmm9, %xmm2
> +        addps     sPoly+64+__svml_sacosh_data_internal(%rip), %xmm2
> +        mulps     %xmm9, %xmm2
> +        addps     sPoly+48+__svml_sacosh_data_internal(%rip), %xmm2
> +        mulps     %xmm9, %xmm2
> +        addps     sPoly+32+__svml_sacosh_data_internal(%rip), %xmm2
> +        mulps     %xmm9, %xmm2
> +        addps     sPoly+16+__svml_sacosh_data_internal(%rip), %xmm2
> +        mulps     %xmm9, %xmm2
> +        addps     sPoly+__svml_sacosh_data_internal(%rip), %xmm2
> +        mulps     %xmm9, %xmm2
> +        mulps     %xmm9, %xmm2
> +        addps     %xmm2, %xmm9
> +        addps     %xmm8, %xmm9
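That is, what comes out of the Horner chain above is

    log1p(x) ~= e * ln2 + (r + r^2 * (P0 + P1*r + ... + P7*r^7))

with e the (possibly +31-adjusted) exponent from the reduction and the
low-order correction already folded into r.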
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm9
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movaps    %xmm9, %xmm0
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm0, 32(%rsp)
> +        movups    %xmm9, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm9
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm9
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      acoshf@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 48(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN4v_acoshf_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_sacosh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(16)) VUINT32 sOne[4][1];
> +        __declspec(align(16)) VUINT32 sPoly[8][4][1];
> +        __declspec(align(16)) VUINT32 iBrkValue[4][1];
> +        __declspec(align(16)) VUINT32 iOffExpoMask[4][1];
> +        __declspec(align(16)) VUINT32 sBigThreshold[4][1];
> +        __declspec(align(16)) VUINT32 sC2[4][1];
> +        __declspec(align(16)) VUINT32 sC3[4][1];
> +        __declspec(align(16)) VUINT32 sHalf[4][1];
> +        __declspec(align(16)) VUINT32 sLargestFinite[4][1];
> +        __declspec(align(16)) VUINT32 sThirtyOne[4][1];
> +        __declspec(align(16)) VUINT32 sTopMask8[4][1];
> +        __declspec(align(16)) VUINT32 XScale[4][1];
> +        __declspec(align(16)) VUINT32 sLn2[4][1];
> +} __svml_sacosh_data_internal;
> +#endif
> +__svml_sacosh_data_internal:
> +        /*== sOne = SP 1.0 ==*/
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== sPoly[] = SP polynomial ==*/
> +        .align 16
> +        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
> +        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
> +        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
> +        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
> +        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
> +        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
> +        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
> +        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
> +        /*== iBrkValue = SP 2/3 ==*/
> +        .align 16
> +        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
> +        /*== iOffExpoMask = SP significand mask ==*/
> +        .align 16
> +        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
> +        /*== sBigThreshold ==*/
> +        .align 16
> +        .long 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000
> +        /*== sC2 ==*/
> +        .align 16
> +        .long 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000
> +        /*== sC3 ==*/
> +        .align 16
> +        .long 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000
> +        /*== sHalf ==*/
> +        .align 16
> +        .long 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000
> +        /*== sLargestFinite ==*/
> +        .align 16
> +        .long 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF
> +        /*== sThirtyOne ==*/
> +        .align 16
> +        .long 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000
> +        /*== sTopMask8 ==*/
> +        .align 16
> +        .long 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000
> +        /*== XScale ==*/
> +        .align 16
> +        .long 0x30800000, 0x30800000, 0x30800000, 0x30800000
> +        /*== sLn2 = SP ln(2) ==*/
> +        .align 16
> +        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
> +        .align 16
> +        .type	__svml_sacosh_data_internal,@object
> +        .size	__svml_sacosh_data_internal,.-__svml_sacosh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core-sse.S
> new file mode 100644
> index 0000000000..cb97d291c5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized acoshf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN8v_acoshf _ZGVdN8v_acoshf_sse_wrapper
> +#include "../svml_s_acoshf8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core.c
> new file mode 100644
> index 0000000000..db71194cd0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized acoshf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN8v_acoshf
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN8v_acoshf, __GI__ZGVdN8v_acoshf,
> +	       __redirect__ZGVdN8v_acoshf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core_avx2.S
> new file mode 100644
> index 0000000000..1d847fcd40
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acoshf8_core_avx2.S
> @@ -0,0 +1,370 @@
> +/* Function acoshf vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute acosh(x) as log(x + sqrt(x*x - 1))
> + *
> + *   Special cases:
> + *
> + *   acosh(NaN)  = quiet NaN, and raise invalid exception
> + *   acosh(-INF) = NaN
> + *   acosh(+INF) = +INF
> + *   acosh(x)    = NaN if x < 1
> + *   acosh(1)    = +0
> + *
> + */
> +
> +/* Offsets for data table __svml_sacosh_data_internal
> + */
> +#define sOne                          	0
> +#define sPoly                         	32
> +#define iBrkValue                     	288
> +#define iOffExpoMask                  	320
> +#define sBigThreshold                 	352
> +#define sC2                           	384
> +#define sC3                           	416
> +#define sHalf                         	448
> +#define sLargestFinite                	480
> +#define sThirtyOne                    	512
> +#define sTopMask8                     	544
> +#define XScale                        	576
> +#define sLn2                          	608
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN8v_acoshf_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +
> +/* Load constants, always including One = 1 */
> +        vmovups   sOne+__svml_sacosh_data_internal(%rip), %ymm2
> +
> +/* Finally, express Y + W = U * V accurately where Y has <= 8 bits */
> +        vmovups   sTopMask8+__svml_sacosh_data_internal(%rip), %ymm9
> +
> +/*
> + * Now       1 / (1 + d)
> + * = 1 / (1 + (sqrt(1 - e) - 1))
> + * = 1 / sqrt(1 - e)
> + * = 1 + 1/2 * e + 3/8 * e^2 + 5/16 * e^3 + 35/128 * e^4 + ...
> + * So compute the first three nonconstant terms of that, so that
> + * we have a relative correction (1 + Corr) to apply to S etc.
> + * C1 = 1/2
> + * C2 = 3/8
> + * C3 = 5/16
> + */
> +        vmovups   sC3+__svml_sacosh_data_internal(%rip), %ymm14
> +        vmovaps   %ymm0, %ymm3
> +        vmovaps   %ymm2, %ymm7
> +        vfmsub231ps %ymm3, %ymm3, %ymm7
> +
> +/*
> + * Check that 1 < X < +inf; otherwise go to the callout function.
> + * We need the callout for X = 1 to avoid division by zero below.
> + * This test ensures that callout handles NaN and either infinity.
> + */
> +        vcmpnle_uqps sLargestFinite+__svml_sacosh_data_internal(%rip), %ymm3, %ymm4
> +        vcmpngt_uqps %ymm2, %ymm3, %ymm5
> +
> +/*
> + * The following computation can go wrong for very large X, e.g.
> + * the X^2 - 1 = U * V can overflow. But for large X we have
> + * acosh(X) / log(2 X) - 1 =~= 1/(4 * X^2), so for X >= 2^30
> + * we can just later stick X back into the log and tweak up the exponent.
> + * Actually we scale X by 2^-30 and tweak the exponent up by 31,
> + * to stay in the safe range for the later log computation.
> + * Compute a flag now telling us when to do this.
> + */
> +        vcmplt_oqps sBigThreshold+__svml_sacosh_data_internal(%rip), %ymm3, %ymm1
> +        vandps    %ymm9, %ymm7, %ymm10
> +
> +/*
> + * Compute R = 1/sqrt(Y + W) * (1 + d)
> + * Force R to <= 8 significant bits.
> + * This means that R * Y and R^2 * Y are exactly representable.
> + */
> +        vrsqrtps  %ymm10, %ymm8
> +        vsubps    %ymm10, %ymm7, %ymm11
> +        vandps    %ymm9, %ymm8, %ymm12
> +
> +/*
> + * Compute S = (Y/sqrt(Y + W)) * (1 + d)
> + * and T = (W/sqrt(Y + W)) * (1 + d)
> + * so that S + T = sqrt(Y + W) * (1 + d)
> + * S is exact, and the rounding error in T is OK.
> + */
> +        vmulps    %ymm12, %ymm10, %ymm15
> +        vmulps    %ymm11, %ymm12, %ymm0
> +
> +/* Now multiplex to the case X = 2^-30 * input, Xl = 0 in the "big" case. */
> +        vmulps    XScale+__svml_sacosh_data_internal(%rip), %ymm3, %ymm11
> +
> +/*
> + * Compute e = -(2 * d + d^2)
> + * The first FMR is exact, and the rounding error in the other is acceptable
> + * since d and e are ~ 2^-8
> + */
> +        vmovaps   %ymm2, %ymm13
> +        vfnmadd231ps %ymm15, %ymm12, %ymm13
> +        vfnmadd231ps %ymm0, %ymm12, %ymm13
> +        vfmadd213ps sC2+__svml_sacosh_data_internal(%rip), %ymm13, %ymm14
> +        vfmadd213ps sHalf+__svml_sacosh_data_internal(%rip), %ymm13, %ymm14
> +        vmulps    %ymm14, %ymm13, %ymm7
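Spelling out how the two vfnmadd231ps above get that e: with S = R*Y, T = R*W
and 1 + d = R*sqrt(Y + W), we have (1 + d)^2 = R^2*(Y + W), so

    e = -(2d + d^2) = 1 - R^2*(Y + W) = 1 - R*S - R*T,

i.e. two negated FMAs starting from 1, followed by the C3/C2/C1 Horner steps
and the final multiply by e to get Corr.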
> +        vorps     %ymm5, %ymm4, %ymm6
> +
> +/*
> + * For low-accuracy versions, the computation can be done
> + * just as U + ((S + T) + (S + T) * Corr)
> + */
> +        vaddps    %ymm0, %ymm15, %ymm5
> +
> +/* sU is needed later on */
> +        vsubps    %ymm2, %ymm3, %ymm4
> +        vfmadd213ps %ymm5, %ymm7, %ymm5
> +        vmovmskps %ymm6, %edx
> +        vaddps    %ymm5, %ymm4, %ymm6
> +
> +/*
> + * Now resume the main code.
> + * reduction: compute r,n
> + */
> +        vmovups   iBrkValue+__svml_sacosh_data_internal(%rip), %ymm4
> +
> +/*
> + * Now we feed into the log1p code, using H in place of _VARG1 and
> + * also adding L into Xl.
> + * compute 1+x as high, low parts
> + */
> +        vmaxps    %ymm6, %ymm2, %ymm8
> +        vminps    %ymm6, %ymm2, %ymm9
> +        vaddps    %ymm9, %ymm8, %ymm12
> +        vblendvps %ymm1, %ymm12, %ymm11, %ymm14
> +        vsubps    %ymm12, %ymm8, %ymm10
> +        vpsubd    %ymm4, %ymm14, %ymm15
> +        vaddps    %ymm10, %ymm9, %ymm13
> +        vpand     iOffExpoMask+__svml_sacosh_data_internal(%rip), %ymm15, %ymm14
> +        vpsrad    $23, %ymm15, %ymm15
> +        vpaddd    %ymm4, %ymm14, %ymm8
> +        vpslld    $23, %ymm15, %ymm5
> +        vmovups   sPoly+224+__svml_sacosh_data_internal(%rip), %ymm4
> +        vcvtdq2ps %ymm15, %ymm0
> +        vpsubd    %ymm5, %ymm2, %ymm7
> +
> +/* polynomial evaluation */
> +        vsubps    %ymm2, %ymm8, %ymm2
> +
> +/* Add 31 to the exponent in the "large" case to get log(2 * input) */
> +        vaddps    sThirtyOne+__svml_sacosh_data_internal(%rip), %ymm0, %ymm5
> +        vandps    %ymm1, %ymm13, %ymm6
> +        vmulps    %ymm7, %ymm6, %ymm9
> +        vblendvps %ymm1, %ymm0, %ymm5, %ymm0
> +        vaddps    %ymm2, %ymm9, %ymm2
> +        vfmadd213ps sPoly+192+__svml_sacosh_data_internal(%rip), %ymm2, %ymm4
> +        vfmadd213ps sPoly+160+__svml_sacosh_data_internal(%rip), %ymm2, %ymm4
> +        vfmadd213ps sPoly+128+__svml_sacosh_data_internal(%rip), %ymm2, %ymm4
> +        vfmadd213ps sPoly+96+__svml_sacosh_data_internal(%rip), %ymm2, %ymm4
> +        vfmadd213ps sPoly+64+__svml_sacosh_data_internal(%rip), %ymm2, %ymm4
> +        vfmadd213ps sPoly+32+__svml_sacosh_data_internal(%rip), %ymm2, %ymm4
> +        vfmadd213ps sPoly+__svml_sacosh_data_internal(%rip), %ymm2, %ymm4
> +        vmulps    %ymm4, %ymm2, %ymm6
> +        vfmadd213ps %ymm2, %ymm2, %ymm6
> +
> +/* final reconstruction */
> +        vfmadd132ps sLn2+__svml_sacosh_data_internal(%rip), %ymm6, %ymm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm3
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %ymm3, 32(%rsp)
> +        vmovups   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      acoshf@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 64(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN8v_acoshf_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_sacosh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(32)) VUINT32 sOne[8][1];
> +        __declspec(align(32)) VUINT32 sPoly[8][8][1];
> +        __declspec(align(32)) VUINT32 iBrkValue[8][1];
> +        __declspec(align(32)) VUINT32 iOffExpoMask[8][1];
> +        __declspec(align(32)) VUINT32 sBigThreshold[8][1];
> +        __declspec(align(32)) VUINT32 sC2[8][1];
> +        __declspec(align(32)) VUINT32 sC3[8][1];
> +        __declspec(align(32)) VUINT32 sHalf[8][1];
> +        __declspec(align(32)) VUINT32 sLargestFinite[8][1];
> +        __declspec(align(32)) VUINT32 sThirtyOne[8][1];
> +        __declspec(align(32)) VUINT32 sTopMask8[8][1];
> +        __declspec(align(32)) VUINT32 XScale[8][1];
> +        __declspec(align(32)) VUINT32 sLn2[8][1];
> +} __svml_sacosh_data_internal;
> +#endif
> +__svml_sacosh_data_internal:
> +        /*== sOne = SP 1.0 ==*/
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== sPoly[] = SP polynomial ==*/
> +        .align 32
> +        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
> +        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
> +        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
> +        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
> +        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
> +        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
> +        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
> +        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
> +        /*== iBrkValue = SP 2/3 ==*/
> +        .align 32
> +        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
> +        /*== iOffExpoMask = SP significand mask ==*/
> +        .align 32
> +        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
> +        /*== sBigThreshold ==*/
> +        .align 32
> +        .long 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000
> +        /*== sC2 ==*/
> +        .align 32
> +        .long 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000
> +        /*== sC3 ==*/
> +        .align 32
> +        .long 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000
> +        /*== sHalf ==*/
> +        .align 32
> +        .long 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000
> +        /*== sLargestFinite ==*/
> +        .align 32
> +        .long 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF
> +        /*== sThirtyOne ==*/
> +        .align 32
> +        .long 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000
> +        /*== sTopMask8 ==*/
> +        .align 32
> +        .long 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000
> +        /*== XScale ==*/
> +        .align 32
> +        .long 0x30800000, 0x30800000, 0x30800000, 0x30800000, 0x30800000, 0x30800000, 0x30800000, 0x30800000
> +        /*== sLn2 = SP ln(2) ==*/
> +        .align 32
> +        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
> +        .align 32
> +        .type	__svml_sacosh_data_internal,@object
> +        .size	__svml_sacosh_data_internal,.-__svml_sacosh_data_internal
> diff --git a/sysdeps/x86_64/fpu/svml_d_acosh2_core.S b/sysdeps/x86_64/fpu/svml_d_acosh2_core.S
> new file mode 100644
> index 0000000000..42bd5c1b5d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_acosh2_core.S
> @@ -0,0 +1,29 @@
> +/* Function acosh vectorized with SSE2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN2v_acosh)
> +WRAPPER_IMPL_SSE2 acosh
> +END (_ZGVbN2v_acosh)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN2v_acosh)
> +#endif
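For readers not familiar with the wrapper layer: WRAPPER_IMPL_SSE2 just peels
the two lanes and calls the scalar routine, so conceptually (a sketch using
the GNU vector-subscript extension, not the actual macro expansion):

	#include <math.h>
	#include <emmintrin.h>

	__m128d
	_ZGVbN2v_acosh (__m128d x)
	{
	  /* Lane-by-lane scalar calls.  */
	  return (__m128d) { acosh (x[0]), acosh (x[1]) };
	}

The AVX and AVX-512 wrappers below are the same idea one level up: split the
wider register in half and call the narrower entry point twice.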
> diff --git a/sysdeps/x86_64/fpu/svml_d_acosh4_core.S b/sysdeps/x86_64/fpu/svml_d_acosh4_core.S
> new file mode 100644
> index 0000000000..433192bae1
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_acosh4_core.S
> @@ -0,0 +1,29 @@
> +/* Function acosh vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN4v_acosh)
> +WRAPPER_IMPL_AVX _ZGVbN2v_acosh
> +END (_ZGVdN4v_acosh)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN4v_acosh)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_acosh4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_acosh4_core_avx.S
> new file mode 100644
> index 0000000000..9e60289c45
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_acosh4_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function acosh vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVcN4v_acosh)
> +WRAPPER_IMPL_AVX _ZGVbN2v_acosh
> +END (_ZGVcN4v_acosh)
> diff --git a/sysdeps/x86_64/fpu/svml_d_acosh8_core.S b/sysdeps/x86_64/fpu/svml_d_acosh8_core.S
> new file mode 100644
> index 0000000000..ef1f8b3426
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_acosh8_core.S
> @@ -0,0 +1,25 @@
> +/* Function acosh vectorized with AVX-512, wrapper to AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN8v_acosh)
> +WRAPPER_IMPL_AVX512 _ZGVdN4v_acosh
> +END (_ZGVeN8v_acosh)
> diff --git a/sysdeps/x86_64/fpu/svml_s_acoshf16_core.S b/sysdeps/x86_64/fpu/svml_s_acoshf16_core.S
> new file mode 100644
> index 0000000000..41c0241492
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_acoshf16_core.S
> @@ -0,0 +1,25 @@
> +/* Function acoshf vectorized with AVX-512. Wrapper to AVX2 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN16v_acoshf)
> +WRAPPER_IMPL_AVX512 _ZGVdN8v_acoshf
> +END (_ZGVeN16v_acoshf)
> diff --git a/sysdeps/x86_64/fpu/svml_s_acoshf4_core.S b/sysdeps/x86_64/fpu/svml_s_acoshf4_core.S
> new file mode 100644
> index 0000000000..2ef7f428c0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_acoshf4_core.S
> @@ -0,0 +1,29 @@
> +/* Function acoshf vectorized with SSE2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN4v_acoshf)
> +WRAPPER_IMPL_SSE2 acoshf
> +END (_ZGVbN4v_acoshf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN4v_acoshf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_acoshf8_core.S b/sysdeps/x86_64/fpu/svml_s_acoshf8_core.S
> new file mode 100644
> index 0000000000..40f1066ce2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_acoshf8_core.S
> @@ -0,0 +1,29 @@
> +/* Function acoshf vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN8v_acoshf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_acoshf
> +END (_ZGVdN8v_acoshf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN8v_acoshf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_acoshf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_acoshf8_core_avx.S
> new file mode 100644
> index 0000000000..b44a9ed28b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_acoshf8_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function acoshf vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVcN8v_acoshf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_acoshf
> +END (_ZGVcN8v_acoshf)
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx.c
> new file mode 100644
> index 0000000000..331c6d71cc
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-acosh.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx2.c
> new file mode 100644
> index 0000000000..331c6d71cc
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx2.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-acosh.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx512f.c
> new file mode 100644
> index 0000000000..331c6d71cc
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-acosh-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-acosh.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-acosh.c b/sysdeps/x86_64/fpu/test-double-libmvec-acosh.c
> new file mode 100644
> index 0000000000..19b5997414
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-acosh.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE double
> +#define LIBMVEC_FUNC acosh
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> index 04a4fe654b..db7ae3e7a6 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> @@ -42,6 +42,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVbN2v_log10)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVbN2v_log2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVbN2v_log1p)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVbN2v_atanh)
> +VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVbN2v_acosh)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> index f9ac2fad5d..269ae38f67 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> @@ -45,6 +45,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVdN4v_log10)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVdN4v_log2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVdN4v_log1p)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVdN4v_atanh)
> +VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVdN4v_acosh)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m256i
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> index 185801fa82..d95b960a45 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> @@ -42,6 +42,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVcN4v_log10)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVcN4v_log2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVcN4v_log1p)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVcN4v_atanh)
> +VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVcN4v_acosh)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> index 1cc8aaecbf..a22f08b5f8 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> @@ -42,6 +42,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVeN8v_log10)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVeN8v_log2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVeN8v_log1p)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVeN8v_atanh)
> +VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVeN8v_acosh)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m512i
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx.c
> new file mode 100644
> index 0000000000..7d75108bc0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-acoshf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx2.c
> new file mode 100644
> index 0000000000..7d75108bc0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx2.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-acoshf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx512f.c
> new file mode 100644
> index 0000000000..7d75108bc0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-acoshf-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-acoshf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-acoshf.c b/sysdeps/x86_64/fpu/test-float-libmvec-acoshf.c
> new file mode 100644
> index 0000000000..f8b536df2e
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-acoshf.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE float
> +#define LIBMVEC_FUNC acoshf
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> index b5d76d80e0..7982ae2c84 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> @@ -42,6 +42,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVeN16v_log10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVeN16v_log2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVeN16v_log1pf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVeN16v_atanhf)
> +VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVeN16v_acoshf)
>  
>  #define VEC_INT_TYPE __m512i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> index c1df6a03c1..bdfcbea2cd 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> @@ -42,6 +42,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVbN4v_log10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVbN4v_log2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVbN4v_log1pf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVbN4v_atanhf)
> +VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVbN4v_acoshf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> index f4c646683f..7b3ba81441 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> @@ -45,6 +45,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVdN8v_log10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVdN8v_log2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVdN8v_log1pf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVdN8v_atanhf)
> +VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVdN8v_acoshf)
>  
>  /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
>  #undef VECTOR_WRAPPER_fFF
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> index a6acd3ffca..a13d2e4ca1 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> @@ -42,6 +42,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVcN8v_log10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVcN8v_log2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVcN8v_log1pf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVcN8v_atanhf)
> +VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVcN8v_acoshf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> -- 
> 2.31.1
> 

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.


H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 17/18] x86-64: Add vector tanh/tanhf implementation to libmvec
  2021-12-29  6:39 ` [PATCH v5 17/18] x86-64: Add vector tanh/tanhf " Sunil K Pandey
@ 2021-12-29 21:26   ` H.J. Lu
  2022-01-29  1:33   ` Noah Goldstein
  1 sibling, 0 replies; 40+ messages in thread
From: H.J. Lu @ 2021-12-29 21:26 UTC (permalink / raw)
  To: Sunil K Pandey; +Cc: libc-alpha, hjl.tools, andrey.kolesov, marius.cornea

On Tue, Dec 28, 2021 at 10:39:59PM -0800, Sunil K Pandey wrote:
> Implement vectorized tanh/tanhf containing SSE, AVX, AVX2 and
> AVX512 versions for libmvec as per vector ABI.  It also contains
> accuracy and ABI tests for vector tanh/tanhf with regenerated ulps.
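
As a usage note, the new entry points are normally reached through compiler
auto-vectorization rather than direct calls.  A minimal sketch, assuming GCC
with -O2 -ffast-math (or -fopenmp-simd) and e.g. -mavx2; the flags and the
function name below are illustrative, not part of the patch:

  #include <math.h>

  /* With the SIMD declarations this series adds to <math.h> via
     bits/math-vector.h, GCC may vectorize the scalar tanh calls in this
     loop into libmvec variants such as _ZGVdN4v_tanh (AVX2) or
     _ZGVeN8v_tanh (AVX-512).  */
  void
  tanh_array (double *restrict dst, const double *restrict src, int n)
  {
    for (int i = 0; i < n; i++)
      dst[i] = tanh (src[i]);
  }
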
> ---
>  bits/libm-simd-decl-stubs.h                   |   11 +
>  math/bits/mathcalls.h                         |    2 +-
>  .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
>  sysdeps/x86/fpu/bits/math-vector.h            |    4 +
>  .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
>  sysdeps/x86_64/fpu/Makeconfig                 |    1 +
>  sysdeps/x86_64/fpu/Versions                   |    2 +
>  sysdeps/x86_64/fpu/libm-test-ulps             |   15 +
>  .../fpu/multiarch/svml_d_tanh2_core-sse2.S    |   20 +
>  .../x86_64/fpu/multiarch/svml_d_tanh2_core.c  |   27 +
>  .../fpu/multiarch/svml_d_tanh2_core_sse4.S    | 1272 ++++++++++++++++
>  .../fpu/multiarch/svml_d_tanh4_core-sse.S     |   20 +
>  .../x86_64/fpu/multiarch/svml_d_tanh4_core.c  |   27 +
>  .../fpu/multiarch/svml_d_tanh4_core_avx2.S    | 1279 +++++++++++++++++
>  .../fpu/multiarch/svml_d_tanh8_core-avx2.S    |   20 +
>  .../x86_64/fpu/multiarch/svml_d_tanh8_core.c  |   27 +
>  .../fpu/multiarch/svml_d_tanh8_core_avx512.S  |  472 ++++++
>  .../fpu/multiarch/svml_s_tanhf16_core-avx2.S  |   20 +
>  .../fpu/multiarch/svml_s_tanhf16_core.c       |   28 +
>  .../multiarch/svml_s_tanhf16_core_avx512.S    |  381 +++++
>  .../fpu/multiarch/svml_s_tanhf4_core-sse2.S   |   20 +
>  .../x86_64/fpu/multiarch/svml_s_tanhf4_core.c |   28 +
>  .../fpu/multiarch/svml_s_tanhf4_core_sse4.S   |  832 +++++++++++
>  .../fpu/multiarch/svml_s_tanhf8_core-sse.S    |   20 +
>  .../x86_64/fpu/multiarch/svml_s_tanhf8_core.c |   28 +
>  .../fpu/multiarch/svml_s_tanhf8_core_avx2.S   |  844 +++++++++++
>  sysdeps/x86_64/fpu/svml_d_tanh2_core.S        |   29 +
>  sysdeps/x86_64/fpu/svml_d_tanh4_core.S        |   29 +
>  sysdeps/x86_64/fpu/svml_d_tanh4_core_avx.S    |   25 +
>  sysdeps/x86_64/fpu/svml_d_tanh8_core.S        |   25 +
>  sysdeps/x86_64/fpu/svml_s_tanhf16_core.S      |   25 +
>  sysdeps/x86_64/fpu/svml_s_tanhf4_core.S       |   29 +
>  sysdeps/x86_64/fpu/svml_s_tanhf8_core.S       |   29 +
>  sysdeps/x86_64/fpu/svml_s_tanhf8_core_avx.S   |   25 +
>  .../x86_64/fpu/test-double-libmvec-tanh-avx.c |    1 +
>  .../fpu/test-double-libmvec-tanh-avx2.c       |    1 +
>  .../fpu/test-double-libmvec-tanh-avx512f.c    |    1 +
>  sysdeps/x86_64/fpu/test-double-libmvec-tanh.c |    3 +
>  .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
>  .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
>  .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
>  .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
>  .../x86_64/fpu/test-float-libmvec-tanhf-avx.c |    1 +
>  .../fpu/test-float-libmvec-tanhf-avx2.c       |    1 +
>  .../fpu/test-float-libmvec-tanhf-avx512f.c    |    1 +
>  sysdeps/x86_64/fpu/test-float-libmvec-tanhf.c |    3 +
>  .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
>  .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
>  .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
>  .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
>  50 files changed, 5647 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_tanh2_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_tanh4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_tanh4_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_tanh8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_tanhf16_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_tanhf4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_tanhf8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_tanhf8_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tanh.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanhf.c
> 
> diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
> index 33d480031b..21f1a43232 100644
> --- a/bits/libm-simd-decl-stubs.h
> +++ b/bits/libm-simd-decl-stubs.h
> @@ -285,4 +285,15 @@
>  #define __DECL_SIMD_erff32x
>  #define __DECL_SIMD_erff64x
>  #define __DECL_SIMD_erff128x
> +
> +#define __DECL_SIMD_tanh
> +#define __DECL_SIMD_tanhf
> +#define __DECL_SIMD_tanhl
> +#define __DECL_SIMD_tanhf16
> +#define __DECL_SIMD_tanhf32
> +#define __DECL_SIMD_tanhf64
> +#define __DECL_SIMD_tanhf128
> +#define __DECL_SIMD_tanhf32x
> +#define __DECL_SIMD_tanhf64x
> +#define __DECL_SIMD_tanhf128x
>  #endif
> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index a5b6c4457f..3d1c2056d5 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -72,7 +72,7 @@ __MATHCALL_VEC (cosh,, (_Mdouble_ __x));
>  /* Hyperbolic sine of X.  */
>  __MATHCALL_VEC (sinh,, (_Mdouble_ __x));
>  /* Hyperbolic tangent of X.  */
> -__MATHCALL (tanh,, (_Mdouble_ __x));
> +__MATHCALL_VEC (tanh,, (_Mdouble_ __x));
>  
>  #ifdef __USE_GNU
>  /* Cosine and sine of X.  */
> diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> index 5525c8a0d6..e178cef683 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> @@ -61,6 +61,7 @@ GLIBC_2.35 _ZGVbN2v_log10 F
>  GLIBC_2.35 _ZGVbN2v_log1p F
>  GLIBC_2.35 _ZGVbN2v_log2 F
>  GLIBC_2.35 _ZGVbN2v_sinh F
> +GLIBC_2.35 _ZGVbN2v_tanh F
>  GLIBC_2.35 _ZGVbN2vv_atan2 F
>  GLIBC_2.35 _ZGVbN2vv_hypot F
>  GLIBC_2.35 _ZGVbN4v_acosf F
> @@ -78,6 +79,7 @@ GLIBC_2.35 _ZGVbN4v_log10f F
>  GLIBC_2.35 _ZGVbN4v_log1pf F
>  GLIBC_2.35 _ZGVbN4v_log2f F
>  GLIBC_2.35 _ZGVbN4v_sinhf F
> +GLIBC_2.35 _ZGVbN4v_tanhf F
>  GLIBC_2.35 _ZGVbN4vv_atan2f F
>  GLIBC_2.35 _ZGVbN4vv_hypotf F
>  GLIBC_2.35 _ZGVcN4v_acos F
> @@ -95,6 +97,7 @@ GLIBC_2.35 _ZGVcN4v_log10 F
>  GLIBC_2.35 _ZGVcN4v_log1p F
>  GLIBC_2.35 _ZGVcN4v_log2 F
>  GLIBC_2.35 _ZGVcN4v_sinh F
> +GLIBC_2.35 _ZGVcN4v_tanh F
>  GLIBC_2.35 _ZGVcN4vv_atan2 F
>  GLIBC_2.35 _ZGVcN4vv_hypot F
>  GLIBC_2.35 _ZGVcN8v_acosf F
> @@ -112,6 +115,7 @@ GLIBC_2.35 _ZGVcN8v_log10f F
>  GLIBC_2.35 _ZGVcN8v_log1pf F
>  GLIBC_2.35 _ZGVcN8v_log2f F
>  GLIBC_2.35 _ZGVcN8v_sinhf F
> +GLIBC_2.35 _ZGVcN8v_tanhf F
>  GLIBC_2.35 _ZGVcN8vv_atan2f F
>  GLIBC_2.35 _ZGVcN8vv_hypotf F
>  GLIBC_2.35 _ZGVdN4v_acos F
> @@ -129,6 +133,7 @@ GLIBC_2.35 _ZGVdN4v_log10 F
>  GLIBC_2.35 _ZGVdN4v_log1p F
>  GLIBC_2.35 _ZGVdN4v_log2 F
>  GLIBC_2.35 _ZGVdN4v_sinh F
> +GLIBC_2.35 _ZGVdN4v_tanh F
>  GLIBC_2.35 _ZGVdN4vv_atan2 F
>  GLIBC_2.35 _ZGVdN4vv_hypot F
>  GLIBC_2.35 _ZGVdN8v_acosf F
> @@ -146,6 +151,7 @@ GLIBC_2.35 _ZGVdN8v_log10f F
>  GLIBC_2.35 _ZGVdN8v_log1pf F
>  GLIBC_2.35 _ZGVdN8v_log2f F
>  GLIBC_2.35 _ZGVdN8v_sinhf F
> +GLIBC_2.35 _ZGVdN8v_tanhf F
>  GLIBC_2.35 _ZGVdN8vv_atan2f F
>  GLIBC_2.35 _ZGVdN8vv_hypotf F
>  GLIBC_2.35 _ZGVeN16v_acosf F
> @@ -163,6 +169,7 @@ GLIBC_2.35 _ZGVeN16v_log10f F
>  GLIBC_2.35 _ZGVeN16v_log1pf F
>  GLIBC_2.35 _ZGVeN16v_log2f F
>  GLIBC_2.35 _ZGVeN16v_sinhf F
> +GLIBC_2.35 _ZGVeN16v_tanhf F
>  GLIBC_2.35 _ZGVeN16vv_atan2f F
>  GLIBC_2.35 _ZGVeN16vv_hypotf F
>  GLIBC_2.35 _ZGVeN8v_acos F
> @@ -180,5 +187,6 @@ GLIBC_2.35 _ZGVeN8v_log10 F
>  GLIBC_2.35 _ZGVeN8v_log1p F
>  GLIBC_2.35 _ZGVeN8v_log2 F
>  GLIBC_2.35 _ZGVeN8v_sinh F
> +GLIBC_2.35 _ZGVeN8v_tanh F
>  GLIBC_2.35 _ZGVeN8vv_atan2 F
>  GLIBC_2.35 _ZGVeN8vv_hypot F
> diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
> index ea0deb31c1..3c657f6108 100644
> --- a/sysdeps/x86/fpu/bits/math-vector.h
> +++ b/sysdeps/x86/fpu/bits/math-vector.h
> @@ -126,6 +126,10 @@
>  #  define __DECL_SIMD_erf __DECL_SIMD_x86_64
>  #  undef __DECL_SIMD_erff
>  #  define __DECL_SIMD_erff __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_tanh
> +#  define __DECL_SIMD_tanh __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_tanhf
> +#  define __DECL_SIMD_tanhf __DECL_SIMD_x86_64
>  
>  # endif
>  #endif
> diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> index 42addd9a25..c7f81945fe 100644
> --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> @@ -62,6 +62,8 @@
>  !GCC$ builtin (acoshf) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (erf) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (erff) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (tanh) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (tanhf) attributes simd (notinbranch) if('x86_64')
>  
>  !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
> @@ -109,3 +111,5 @@
>  !GCC$ builtin (acoshf) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (erf) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (erff) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (tanh) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (tanhf) attributes simd (notinbranch) if('x32')
> diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> index 2b89a1bba3..26df8d47bf 100644
> --- a/sysdeps/x86_64/fpu/Makeconfig
> +++ b/sysdeps/x86_64/fpu/Makeconfig
> @@ -45,6 +45,7 @@ libmvec-funcs = \
>    sin \
>    sincos \
>    sinh \
> +  tanh \
>  
>  # Define libmvec function for benchtests directory.
>  libmvec-bench-funcs = \
> diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> index 2fcdef6944..adcbe0fefb 100644
> --- a/sysdeps/x86_64/fpu/Versions
> +++ b/sysdeps/x86_64/fpu/Versions
> @@ -29,6 +29,7 @@ libmvec {
>      _ZGVbN2v_log1p; _ZGVcN4v_log1p; _ZGVdN4v_log1p; _ZGVeN8v_log1p;
>      _ZGVbN2v_log2; _ZGVcN4v_log2; _ZGVdN4v_log2; _ZGVeN8v_log2;
>      _ZGVbN2v_sinh; _ZGVcN4v_sinh; _ZGVdN4v_sinh; _ZGVeN8v_sinh;
> +    _ZGVbN2v_tanh; _ZGVcN4v_tanh; _ZGVdN4v_tanh; _ZGVeN8v_tanh;
>      _ZGVbN2vv_atan2; _ZGVcN4vv_atan2; _ZGVdN4vv_atan2; _ZGVeN8vv_atan2;
>      _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
>      _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
> @@ -46,6 +47,7 @@ libmvec {
>      _ZGVbN4v_log1pf; _ZGVcN8v_log1pf; _ZGVdN8v_log1pf; _ZGVeN16v_log1pf;
>      _ZGVbN4v_log2f; _ZGVcN8v_log2f; _ZGVdN8v_log2f; _ZGVeN16v_log2f;
>      _ZGVbN4v_sinhf; _ZGVcN8v_sinhf; _ZGVdN8v_sinhf; _ZGVeN16v_sinhf;
> +    _ZGVbN4v_tanhf; _ZGVcN8v_tanhf; _ZGVdN8v_tanhf; _ZGVeN16v_tanhf;
>      _ZGVbN4vv_atan2f; _ZGVcN8vv_atan2f; _ZGVdN8vv_atan2f; _ZGVeN16vv_atan2f;
>      _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf;
>    }
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index 929de0e786..bfaad7acef 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -2067,6 +2067,21 @@ float: 3
>  float128: 3
>  ldouble: 4
>  
> +Function: "tanh_vlen16":
> +float: 1
> +
> +Function: "tanh_vlen2":
> +double: 1
> +
> +Function: "tanh_vlen4":
> +double: 1
> +
> +Function: "tanh_vlen4_avx2":
> +double: 1
> +
> +Function: "tanh_vlen8":
> +double: 1
> +
>  Function: "tgamma":
>  double: 9
>  float: 8
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core-sse2.S
> new file mode 100644
> index 0000000000..35b065fe55
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized tanh, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN2v_tanh _ZGVbN2v_tanh_sse2
> +#include "../svml_d_tanh2_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core.c
> new file mode 100644
> index 0000000000..d2e63bdc56
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized tanh, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN2v_tanh
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN2v_tanh, __GI__ZGVbN2v_tanh, __redirect__ZGVbN2v_tanh)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core_sse4.S
> new file mode 100644
> index 0000000000..35bbb5b04c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core_sse4.S
> @@ -0,0 +1,1272 @@
> +/* Function tanh vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   NOTE: Since the hyperbolic tangent function is odd
> + *         (tanh(x) = -tanh(-x)), the algorithm below works with the
> + *         absolute value of the argument |x|: tanh(x) = sign(x) * tanh(|x|)
> + *
> + *   We use a table lookup method to compute tanh(|x|).
> + *   The basic idea is to split the input range into a number of subintervals
> + *   and to approximate tanh(.) with a polynomial on each of them.
> + *
> + *   IEEE SPECIAL CONDITIONS:
> + *   x = [+,-]0, r = [+,-]0
> + *   x = +Inf,   r = +1
> + *   x = -Inf,   r = -1
> + *   x = QNaN,   r = QNaN
> + *   x = SNaN,   r = QNaN
> + *
> + *
> + *   ALGORITHM DETAILS
> + *   We handle special values in a callout function, separate from the main
> + *   path computations.  "Special" values for this algorithm are:
> + *   INF, NaN, |x| > HUGE_THRESHOLD
> + *
> + *
> + *   Main path computations are organized as follows:
> + *   we split the interval [0, SATURATION_THRESHOLD) into a number of
> + *   subintervals.  On each subinterval we approximate tanh(.) with a
> + *   minimax polynomial of pre-defined degree.  The polynomial coefficients
> + *   are computed beforehand and stored in a table.  We also use
> + *
> + *       y := |x| + B,
> + *
> + *   where B depends on the subinterval and is used to bring the argument
> + *   closer to zero.
> + *   We also add a large artificial interval [SATURATION_THRESHOLD,
> + *   HUGE_THRESHOLD], whose stored coefficients are 1.0 + 0.0*y + 0.0*y^2 ...
> + *   - just to preserve the main path computation logic while returning 1.0
> + *   for all arguments in that range.
> + *
> + *   Hence the reconstruction looks as follows:
> + *   we extract the polynomial and range-reduction coefficients
> + *        (Pj and B) corresponding to the subinterval to which |x| belongs
> + *        and return
> + *
> + *       r := sign(x) * (P0 + P1 * y + ... + Pn * y^n)
> + *
> + *   NOTE: we use a multiprecision technique to multiply and sum the first
> + *         K terms of the polynomial, so Pj, j = 0..K, are each stored in
> + *         the table as a pair of target-precision numbers (Pj and PLj) to
> + *         achieve wider-than-target precision.
> + *
> + *
> + */
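
As a reading aid (not part of the patch): a scalar C sketch of the per-lane
evaluation described above, once the coefficient row for |x|'s subinterval
has been selected.  The row layout PL0, PH0, P1..P10, B, A matches the data
table below; the exponent/mantissa-based index selection is omitted, and the
function name is illustrative.

  #include <math.h>

  static double
  tanh_eval_row (double x, const double c[16])
  {
    double y = fabs (x) + c[12];   /* y := |x| + B */
    double r = c[11];              /* start from P10 */
    for (int j = 10; j >= 2; j--)  /* accumulate P9 .. P1 (Horner) */
      r = r * y + c[j];
    r = r * y + c[1];              /* + PH0 (c[0] = PL0 holds its low part) */
    return copysign (r, x);        /* tanh(x) = sign(x) * tanh(|x|) */
  }
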
> +
> +/* Offsets for data table __svml_dtanh_data_internal
> + */
> +#define _dbP                          	0
> +#define _dbSignMask                   	7680
> +#define _dbAbsMask                    	7696
> +#define _iExpMantMask                 	7712
> +#define _iExpMask                     	7728
> +#define _iMinIdxOfsMask               	7744
> +#define _iMaxIdxMask                  	7760
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN2v_tanh_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +        movaps    %xmm0, %xmm13
> +        movq      _iExpMantMask+__svml_dtanh_data_internal(%rip), %xmm14
> +        lea       _dbP+96+__svml_dtanh_data_internal(%rip), %rsi
> +        pshufd    $221, %xmm13, %xmm8
> +
> +/* if VMIN, VMAX is defined for I type */
> +        pxor      %xmm10, %xmm10
> +        movq      _iMinIdxOfsMask+__svml_dtanh_data_internal(%rip), %xmm9
> +
> +/* Here huge arguments, INF and NaNs are filtered out to the callout. */
> +        pand      %xmm14, %xmm8
> +        movdqa    %xmm8, %xmm11
> +        psubd     %xmm9, %xmm8
> +        movq      _iMaxIdxMask+__svml_dtanh_data_internal(%rip), %xmm5
> +        movdqa    %xmm8, %xmm6
> +        movdqa    %xmm8, %xmm7
> +        pcmpgtd   %xmm5, %xmm6
> +        pcmpgtd   %xmm10, %xmm7
> +        movdqa    %xmm6, %xmm3
> +        pand      %xmm7, %xmm8
> +        andps     %xmm6, %xmm5
> +        andnps    %xmm8, %xmm3
> +        orps      %xmm5, %xmm3
> +
> +/*
> + * VSHRIMM( I, iIndex, = iIndex, (17 - 4) );
> + * VGATHER_MATRIX( L2D, p, TAB._dbP, iIndex, 0, T_ITEM_SIZE, T_ITEM_GRAN, 13, 0, 0 );
> + */
> +        psrld     $10, %xmm3
> +        movd      %xmm3, %eax
> +        pshufd    $1, %xmm3, %xmm4
> +
> +/*  Constant loading  */
> +        movq      _iExpMask+__svml_dtanh_data_internal(%rip), %xmm15
> +        movd      %xmm4, %ecx
> +        pcmpgtd   %xmm15, %xmm11
> +        movmskps  %xmm11, %edx
> +        movups    _dbAbsMask+__svml_dtanh_data_internal(%rip), %xmm0
> +        movups    _dbSignMask+__svml_dtanh_data_internal(%rip), %xmm12
> +        andps     %xmm13, %xmm0
> +        movslq    %eax, %rax
> +        andps     %xmm13, %xmm12
> +        movslq    %ecx, %rcx
> +        movups    %xmm13, (%rsp)
> +        movups    -96(%rax,%rsi), %xmm11
> +        movups    -96(%rcx,%rsi), %xmm2
> +        movups    -80(%rax,%rsi), %xmm9
> +        movups    -48(%rax,%rsi), %xmm5
> +        movaps    %xmm9, %xmm10
> +        movups    -32(%rax,%rsi), %xmm3
> +        movaps    %xmm5, %xmm6
> +        movaps    %xmm3, %xmm4
> +        unpckhpd  %xmm2, %xmm11
> +        movups    -80(%rcx,%rsi), %xmm13
> +        movups    -48(%rcx,%rsi), %xmm15
> +        movups    -32(%rcx,%rsi), %xmm1
> +        movups    -64(%rax,%rsi), %xmm7
> +        movups    -16(%rax,%rsi), %xmm2
> +        movaps    %xmm7, %xmm8
> +        unpcklpd  %xmm13, %xmm10
> +        unpckhpd  %xmm13, %xmm9
> +        movups    -64(%rcx,%rsi), %xmm14
> +        movups    -16(%rcx,%rsi), %xmm13
> +        unpcklpd  %xmm15, %xmm6
> +        unpckhpd  %xmm15, %xmm5
> +        unpcklpd  %xmm1, %xmm4
> +        unpckhpd  %xmm1, %xmm3
> +        movaps    %xmm2, %xmm1
> +        movups    (%rax,%rsi), %xmm15
> +        unpcklpd  %xmm14, %xmm8
> +        unpckhpd  %xmm14, %xmm7
> +        unpcklpd  %xmm13, %xmm1
> +        unpckhpd  %xmm13, %xmm2
> +        movaps    %xmm15, %xmm13
> +        movups    (%rcx,%rsi), %xmm14
> +        unpcklpd  %xmm14, %xmm13
> +        addpd     %xmm13, %xmm0
> +        mulpd     %xmm0, %xmm2
> +        addpd     %xmm1, %xmm2
> +        mulpd     %xmm0, %xmm2
> +        addpd     %xmm3, %xmm2
> +        mulpd     %xmm0, %xmm2
> +        addpd     %xmm4, %xmm2
> +        mulpd     %xmm0, %xmm2
> +        addpd     %xmm5, %xmm2
> +        mulpd     %xmm0, %xmm2
> +        addpd     %xmm6, %xmm2
> +        mulpd     %xmm0, %xmm2
> +        addpd     %xmm7, %xmm2
> +        mulpd     %xmm0, %xmm2
> +        addpd     %xmm8, %xmm2
> +        mulpd     %xmm0, %xmm2
> +        addpd     %xmm9, %xmm2
> +        mulpd     %xmm0, %xmm2
> +        addpd     %xmm10, %xmm2
> +        mulpd     %xmm2, %xmm0
> +        addpd     %xmm11, %xmm0
> +        orps      %xmm12, %xmm0
> +        andl      $3, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    (%rsp), %xmm1
> +        movups    %xmm1, 32(%rsp)
> +        movups    %xmm0, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $2, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      tanh@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 48(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN2v_tanh_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_dtanh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(16)) VUINT32 _dbP[60*16][2];
> +        __declspec(align(16)) VUINT32 _dbSignMask[2][2];
> +        __declspec(align(16)) VUINT32 _dbAbsMask[2][2];
> +        __declspec(align(16)) VUINT32 _iExpMantMask[4][1];
> +        __declspec(align(16)) VUINT32 _iExpMask[4][1];
> +        __declspec(align(16)) VUINT32 _iMinIdxOfsMask[4][1];
> +        __declspec(align(16)) VUINT32 _iMaxIdxMask[4][1];
> +} __svml_dtanh_data_internal;
> +#endif
> +__svml_dtanh_data_internal:
> +        /* Polynomial coefficients */
> +        .quad 0x0000000000000000   /* PL0 = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* PH0 = +0.000000000000000000000e-01 */
> +        .quad 0x3FF0000000000000   /* P1  = +1.000000000000000014103e+00 */
> +        .quad 0xBD197DEAD79668D3   /* P2  = -2.264132406596103056796e-14 */
> +        .quad 0xBFD555555553AF3C   /* P3  = -3.333333333273349741024e-01 */
> +        .quad 0xBE052F7CCA134846   /* P4  = -6.165791385711493738399e-10 */
> +        .quad 0x3FC11111563849D6   /* P5  = +1.333333655353061107201e-01 */
> +        .quad 0xBEB038623673FFB2   /* P6  = -9.668021563879858950855e-07 */
> +        .quad 0xBFAB9F685E64022E   /* P7  = -5.395055916051593179252e-02 */
> +        .quad 0xBF2A54E2B28F2207   /* P8  = -2.008940439550829012647e-04 */
> +        .quad 0x3F97CFB9328A230E   /* P9  = +2.325333949059698582189e-02 */
> +        .quad 0xBF75CA6D61723E02   /* P10 = -5.320002811586290441790e-03 */
> +        .quad 0x0000000000000000   /* B = +0        */
> +        .quad 0x3FF0000000000000   /* A = +1.0      */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C3708A564FAD29A   /* PL0 = +1.248663375337163807466e-18 */
> +        .quad 0x3FC0E6973998DA48   /* PH0 = +1.320370703922029154143e-01 */
> +        .quad 0x3FEF712EB25C0888   /* P1  = +9.825662120422444519229e-01 */
> +        .quad 0xBFC09B296F7C1EA9   /* P2  = -1.297351641044220078331e-01 */
> +        .quad 0xBFD3DD77541EDDA7   /* P3  = -3.103922196855485849143e-01 */
> +        .quad 0x3FB58FFCF4309615   /* P4  = +8.422833406128689275566e-02 */
> +        .quad 0x3FBD3ABE845DCF49   /* P5  = +1.141776154670967208833e-01 */
> +        .quad 0xBFA791DF538C37FA   /* P6  = -4.603479285115947936529e-02 */
> +        .quad 0xBFA4F872F69CD6E8   /* P7  = -4.095801601799370195284e-02 */
> +        .quad 0x3F9772E49EF6412B   /* P8  = +2.289921970583567527179e-02 */
> +        .quad 0x3F8CBC0807393909   /* P9  = +1.403051635784581776625e-02 */
> +        .quad 0xBF85F06A30F93319   /* P10 = -1.071246110873285040939e-02 */
> +        .quad 0xBFC1000000000000   /* B = -.132813 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6004EE5739DEAC   /* PL0 = +6.947247374112211856530e-18 */
> +        .quad 0x3FC2DC968E6E0D62   /* PH0 = +1.473568149050193398786e-01 */
> +        .quad 0x3FEF4E1E606D96DF   /* P1  = +9.782859691010478680677e-01 */
> +        .quad 0xBFC273BD70994AB9   /* P2  = -1.441571044730005866646e-01 */
> +        .quad 0xBFD382B548270D2C   /* P3  = -3.048527912726111386771e-01 */
> +        .quad 0x3FB7CD2D582A6B29   /* P4  = +9.297450449450351894400e-02 */
> +        .quad 0x3FBC1278CCCBF0DB   /* P5  = +1.096568584434324642303e-01 */
> +        .quad 0xBFA9C7F5115B86A1   /* P6  = -5.035367810138536095866e-02 */
> +        .quad 0xBFA371C21BAF618E   /* P7  = -3.797728145554222910481e-02 */
> +        .quad 0x3F9958943F68417E   /* P8  = +2.475196492201935923783e-02 */
> +        .quad 0x3F8930D5CFFD4152   /* P9  = +1.230017701132682667572e-02 */
> +        .quad 0xBF875CF7ADD31B76   /* P10 = -1.140779017658897660092e-02 */
> +        .quad 0xBFC3000000000000   /* B = -.148438 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7EABE24E052A1F   /* PL0 = +2.660321779421749543501e-17 */
> +        .quad 0x3FC4D04783618C71   /* PH0 = +1.626061812886266111366e-01 */
> +        .quad 0x3FEF2765AF97A4B3   /* P1  = +9.735592298067302883212e-01 */
> +        .quad 0xBFC443654205FEA5   /* P2  = -1.583067486171689074207e-01 */
> +        .quad 0xBFD31F2E208A5B97   /* P3  = -2.987780874040536844467e-01 */
> +        .quad 0x3FB9F235BD339878   /* P4  = +1.013520800512156573576e-01 */
> +        .quad 0x3FBAD0B0DFCCA141   /* P5  = +1.047468706498238100104e-01 */
> +        .quad 0xBFABD1B9600E608E   /* P6  = -5.433444306908184548967e-02 */
> +        .quad 0xBFA1CEBEAF07DB58   /* P7  = -3.478046309094534453598e-02 */
> +        .quad 0x3F9AFC9FB1D8EFD2   /* P8  = +2.635430834764902126383e-02 */
> +        .quad 0x3F8573444F1AB502   /* P9  = +1.047376028449287564018e-02 */
> +        .quad 0xBF8874FBC8F24406   /* P10 = -1.194187838544459322219e-02 */
> +        .quad 0xBFC5000000000000   /* B = -.164063 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7FB199D361A790   /* PL0 = +2.748994907060158996213e-17 */
> +        .quad 0x3FC6C170259E21F7   /* PH0 = +1.777782615356639783766e-01 */
> +        .quad 0x3FEEFD17479F7C65   /* P1  = +9.683948897253570478266e-01 */
> +        .quad 0xBFC609530FE4DF8D   /* P2  = -1.721595599753950294577e-01 */
> +        .quad 0xBFD2B3465D71B4DE   /* P3  = -2.921920692959484052676e-01 */
> +        .quad 0x3FBBFD2D34AC509B   /* P4  = +1.093319181057403192166e-01 */
> +        .quad 0x3FB9778C3C16A0FE   /* P5  = +9.948040453912551395183e-02 */
> +        .quad 0xBFADAC4D9E63C665   /* P6  = -5.795519407719210697372e-02 */
> +        .quad 0xBFA0139CCAD02D60   /* P7  = -3.139963126894929339124e-02 */
> +        .quad 0x3F9C5BF43BA6F19D   /* P8  = +2.769452680671379432854e-02 */
> +        .quad 0x3F8190B703350341   /* P9  = +8.576803002712575184772e-03 */
> +        .quad 0xBF8936606782858A   /* P10 = -1.231074634444230850234e-02 */
> +        .quad 0xBFC7000000000000   /* B = -.179688 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6A917CA3624D50   /* PL0 = +1.152216693509785660691e-17 */
> +        .quad 0x3FC8AFD7B974FABB   /* PH0 = +1.928662925292508878439e-01 */
> +        .quad 0x3FEECF47624A5D03   /* P1  = +9.628025932060214187231e-01 */
> +        .quad 0xBFC7C4C2CB4FDE4D   /* P2  = -1.856921665891938814679e-01 */
> +        .quad 0xBFD23F69CB2C1F9D   /* P3  = -2.851204380135586155453e-01 */
> +        .quad 0x3FBDEC5703A03814   /* P4  = +1.168875106670557712458e-01 */
> +        .quad 0x3FB8095003D0CF15   /* P5  = +9.389209836154706616487e-02 */
> +        .quad 0xBFAF554B47B10CBB   /* P6  = -6.119761705533607365968e-02 */
> +        .quad 0xBF9C89743FE7BC1B   /* P7  = -2.786809577986213853937e-02 */
> +        .quad 0x3F9D74725B746E7C   /* P8  = +2.876452143855921824991e-02 */
> +        .quad 0x3F7B2D8AFB70B88C   /* P9  = +6.635229968237631511880e-03 */
> +        .quad 0xBF89A0A2883EF6CB   /* P10 = -1.251341799058582545252e-02 */
> +        .quad 0xBFC9000000000000   /* B = -.195313 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7608279E8609CB   /* PL0 = +1.910958764623660748269e-17 */
> +        .quad 0x3FCA9B46D2DDC5E3   /* PH0 = +2.078636674519166172015e-01 */
> +        .quad 0x3FEE9E0BB72A01A1   /* P1  = +9.567926957534390123919e-01 */
> +        .quad 0xBFC974FAD10C5330   /* P2  = -1.988824387305156976885e-01 */
> +        .quad 0xBFD1C40ACCBA4044   /* P3  = -2.775904654781735703430e-01 */
> +        .quad 0x3FBFBE24E2987853   /* P4  = +1.239951184474830487522e-01 */
> +        .quad 0x3FB6885B4345E47F   /* P5  = +8.801813499839460539687e-02 */
> +        .quad 0xBFB06563D5670584   /* P6  = -6.404708824176991770896e-02 */
> +        .quad 0xBF98CD1D620DF6E2   /* P7  = -2.421995078065365147772e-02 */
> +        .quad 0x3F9E44EF3E844D21   /* P8  = +2.955983943054463683119e-02 */
> +        .quad 0x3F7325FA0148CAAE   /* P9  = +4.674889165971292322643e-03 */
> +        .quad 0xBF89B4C8556C2D92   /* P10 = -1.255184660614964011319e-02 */
> +        .quad 0xBFCB000000000000   /* B = -.210938 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6F19DAA20F51D5   /* PL0 = +1.348790537832000351176e-17 */
> +        .quad 0x3FCC83876CA98E15   /* PH0 = +2.227639465883021474557e-01 */
> +        .quad 0x3FEE697B662D07CD   /* P1  = +9.503762241004040620296e-01 */
> +        .quad 0xBFCB194C7ED76ACF   /* P2  = -2.117095584242946953999e-01 */
> +        .quad 0xBFD141A19E419762   /* P3  = -2.696308179350720680191e-01 */
> +        .quad 0x3FC0B89C64BC7B98   /* P4  = +1.306338779331468503007e-01 */
> +        .quad 0x3FB4F721150BBFC5   /* P5  = +8.189589275184434216748e-02 */
> +        .quad 0xBFB105AAFAB87898   /* P6  = -6.649273511036069461061e-02 */
> +        .quad 0xBF94FB3B31248C01   /* P7  = -2.048962104266749732921e-02 */
> +        .quad 0x3F9ECD31E588709C   /* P8  = +3.007963145692880855964e-02 */
> +        .quad 0x3F664A91A335C105   /* P9  = +2.721104095762541127495e-03 */
> +        .quad 0xBF89754E32E1E26E   /* P10 = -1.243077366619723806134e-02 */
> +        .quad 0xBFCD000000000000   /* B = -.226563 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6AC6C889D8111D   /* PL0 = +1.161245469312620769170e-17 */
> +        .quad 0x3FCE6864FE55A3D0   /* PH0 = +2.375608674877001114112e-01 */
> +        .quad 0x3FEE31AEE116B82B   /* P1  = +9.435648342384913826391e-01 */
> +        .quad 0xBFCCB114B69E808B   /* P2  = -2.241540805525839833707e-01 */
> +        .quad 0xBFD0B8AB913BA99D   /* P3  = -2.612713735858507980441e-01 */
> +        .quad 0x3FC1823322BED48A   /* P4  = +1.367858810096190233514e-01 */
> +        .quad 0x3FB35822B7929893   /* P5  = +7.556359273675842651653e-02 */
> +        .quad 0xBFB18B03CC78D2DA   /* P6  = -6.852744810096158580830e-02 */
> +        .quad 0xBF911CCC3C8D5E5D   /* P7  = -1.671141738492420009734e-02 */
> +        .quad 0x3F9F0DEC2D99B12F   /* P8  = +3.032654789278515819797e-02 */
> +        .quad 0x3F4A28398B4EBD98   /* P9  = +7.982521989244205404918e-04 */
> +        .quad 0xBF88E60CB2FAB9A4   /* P10 = -1.215753480150000985458e-02 */
> +        .quad 0xBFCF000000000000   /* B = -.242188 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C89D2B6774FB61D   /* PL0 = +4.479593208720169247958e-17 */
> +        .quad 0x3FD09C744F539BE4   /* PH0 = +2.595492148088267558848e-01 */
> +        .quad 0x3FEDD823B0400D42   /* P1  = +9.326342050921214825882e-01 */
> +        .quad 0xBFCEFBF7FF305FCC   /* P2  = -2.420644756355144687086e-01 */
> +        .quad 0xBFCFC01DC4F24A41   /* P3  = -2.480504237797323303990e-01 */
> +        .quad 0x3FC291A2C26D5548   /* P4  = +1.450694512701977626753e-01 */
> +        .quad 0x3FB0D562E672D188   /* P5  = +6.575601698097532991976e-02 */
> +        .quad 0xBFB2201ECC119E06   /* P6  = -7.080261690281738261872e-02 */
> +        .quad 0xBF8695D50F778D31   /* P7  = -1.102796987010509974642e-02 */
> +        .quad 0x3F9EEC8CFBC031A0   /* P8  = +3.019924437107734972427e-02 */
> +        .quad 0xBF6030F0A4D3660A   /* P9  = -1.976461417694923328722e-03 */
> +        .quad 0xBF87845288A4AEF5   /* P10 = -1.148285369398347838494e-02 */
> +        .quad 0xBFD1000000000000   /* B = -.265625 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8B6AAB614D1C8D   /* PL0 = +4.756035418366735312727e-17 */
> +        .quad 0x3FD275F7E1CF7F63   /* PH0 = +2.884502129727392616410e-01 */
> +        .quad 0x3FED56658F74C9CC   /* P1  = +9.167964746359813351341e-01 */
> +        .quad 0xBFD0ECC045EBD596   /* P2  = -2.644501383614054083635e-01 */
> +        .quad 0xBFCD5A4BDE179180   /* P3  = -2.293181261476426808811e-01 */
> +        .quad 0x3FC3C00047D34767   /* P4  = +1.542969084462655120552e-01 */
> +        .quad 0x3FAAC7CE84FD609F   /* P5  = +5.230565427217581251974e-02 */
> +        .quad 0xBFB288948D2E8B43   /* P6  = -7.239654967137902384931e-02 */
> +        .quad 0xBF6D6605AAD5A1C0   /* P7  = -3.588687008847041164896e-03 */
> +        .quad 0x3F9DDB0790848E97   /* P8  = +2.915584392134337382866e-02 */
> +        .quad 0xBF75FDE291BAD5B4   /* P9  = -5.369076763306269573660e-03 */
> +        .quad 0xBF84CEA5C52E0A78   /* P10 = -1.015977390284671071888e-02 */
> +        .quad 0xBFD3000000000000   /* B = -.296875 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7139A81C8A6ECF   /* PL0 = +1.494049799478574591322e-17 */
> +        .quad 0x3FD4470650036407   /* PH0 = +3.168350011233659890841e-01 */
> +        .quad 0x3FECC9A69DFDDD48   /* P1  = +8.996155820631566629678e-01 */
> +        .quad 0xBFD23DED3A37A09F   /* P2  = -2.850297039535778028925e-01 */
> +        .quad 0xBFCAD302395D51C1   /* P3  = -2.095644741153943890185e-01 */
> +        .quad 0x3FC4A8FE3F309C22   /* P4  = +1.614072617096278705115e-01 */
> +        .quad 0x3FA3D161188AA436   /* P5  = +3.870681213931741151586e-02 */
> +        .quad 0xBFB288CFE5494E98   /* P6  = -7.240008685885823969403e-02 */
> +        .quad 0x3F6C7903EED8D334   /* P7  = +3.475673371918475361081e-03 */
> +        .quad 0x3F9BE023CDFB02F6   /* P8  = +2.722221321778569498033e-02 */
> +        .quad 0xBF80F8296F2C3A95   /* P9  = -8.285831170295390358336e-03 */
> +        .quad 0xBF8152DF4790049B   /* P10 = -8.458847400108650973189e-03 */
> +        .quad 0xBFD5000000000000   /* B = -.328125 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7751FE0FEE8335   /* PL0 = +2.022712113430213599928e-17 */
> +        .quad 0x3FD60EF7120502A9   /* PH0 = +3.446633983585721261456e-01 */
> +        .quad 0x3FEC32D951E56E6F   /* P1  = +8.812071418319202070776e-01 */
> +        .quad 0xBFD370255FC004F8   /* P2  = -3.037198481616338996824e-01 */
> +        .quad 0xBFC832F0EBC6BB41   /* P3  = -1.890545989276351359107e-01 */
> +        .quad 0x3FC54C99A0FF432F   /* P4  = +1.664001499289269127540e-01 */
> +        .quad 0x3F99DAC0CC283C18   /* P5  = +2.524853941036661688369e-02 */
> +        .quad 0xBFB227B3896A026D   /* P6  = -7.091829399906553280461e-02 */
> +        .quad 0x3F84663364E1FB19   /* P7  = +9.960557476231411602383e-03 */
> +        .quad 0x3F9922D70DE07C57   /* P8  = +2.454696676442965935283e-02 */
> +        .quad 0xBF85C4A4EB6F86BC   /* P9  = -1.062897532932837635222e-02 */
> +        .quad 0xBF7AAB61214FFE17   /* P10 = -6.511096396024671890972e-03 */
> +        .quad 0xBFD7000000000000   /* B = -.359375 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3BFE67F266843B2C   /* PL0 = +1.030196791298162288777e-19 */
> +        .quad 0x3FD7CD3115FC0F16   /* PH0 = +3.718989100163850869407e-01 */
> +        .quad 0x3FEB92F96CCC2C5B   /* P1  = +8.616912007286247079761e-01 */
> +        .quad 0xBFD4827320135092   /* P2  = -3.204620183216856200247e-01 */
> +        .quad 0xBFC582B15550168A   /* P3  = -1.680509249273891977521e-01 */
> +        .quad 0x3FC5AC3B9A2E4C31   /* P4  = +1.693186285816366254244e-01 */
> +        .quad 0x3F88FA599FCADAFB   /* P5  = +1.219625491044728129762e-02 */
> +        .quad 0xBFB16EC8F5CA169E   /* P6  = -6.809669495313605642174e-02 */
> +        .quad 0x3F90140EFC748BBE   /* P7  = +1.570151725639922719844e-02 */
> +        .quad 0x3F95CFC49C1A28DC   /* P8  = +2.130038454792147768770e-02 */
> +        .quad 0xBF8946ED8B1BF454   /* P9  = -1.234231549050882816697e-02 */
> +        .quad 0xBF7239E55C1DD50F   /* P10 = -4.449745117985472755606e-03 */
> +        .quad 0xBFD9000000000000   /* B = -.390625 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6412330191189C   /* PL0 = +8.704448096175471149661e-18 */
> +        .quad 0x3FD9812B3B03F0A5   /* PH0 = +3.985088421175169703936e-01 */
> +        .quad 0x3FEAEB08C3C0E84D   /* P1  = +8.411907027541559254748e-01 */
> +        .quad 0xBFD57446B1BC46CF   /* P2  = -3.352219329545790787820e-01 */
> +        .quad 0xBFC2CA9ABC0444AD   /* P3  = -1.468079965639267634401e-01 */
> +        .quad 0x3FC5CA95F9460D18   /* P4  = +1.702449290424759093710e-01 */
> +        .quad 0xBF2C2DAA35DD05C3   /* P5  = -2.149839664813813012186e-04 */
> +        .quad 0xBFB069A516EEB75D   /* P6  = -6.411201295733578195472e-02 */
> +        .quad 0x3F9512716416FDC7   /* P7  = +2.057816670798986720058e-02 */
> +        .quad 0x3F921630CB1319A3   /* P8  = +1.766277541607908852593e-02 */
> +        .quad 0xBF8B76DA2EC99526   /* P9  = -1.341028647693549562145e-02 */
> +        .quad 0xBF63A97474A161E4   /* P10 = -2.400138332671485493040e-03 */
> +        .quad 0xBFDB000000000000   /* B = -.421875 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C89B79F5783381C   /* PL0 = +4.461236087774530799537e-17 */
> +        .quad 0x3FDB2A6C993B829D   /* PH0 = +4.244643684778937609003e-01 */
> +        .quad 0x3FEA3C0C1FBA328C   /* P1  = +8.198299998926627915155e-01 */
> +        .quad 0xBFD6457212F78DE0   /* P2  = -3.479886231636708581604e-01 */
> +        .quad 0xBFC0129BDA380A66   /* P3  = -1.255678954622282824818e-01 */
> +        .quad 0x3FC5AB77F388FBDE   /* P4  = +1.692953051696965507089e-01 */
> +        .quad 0xBF8822F3A6CADB7C   /* P5  = -1.178541519889874597783e-02 */
> +        .quad 0xBFAE4A876370A4BD   /* P6  = -5.916236008517603590739e-02 */
> +        .quad 0x3F991A89BC3B7710   /* P7  = +2.451529704455085335710e-02 */
> +        .quad 0x3F8C4A4328204D4B   /* P8  = +1.381351915555364098800e-02 */
> +        .quad 0xBF8C5F921D01EC0B   /* P9  = -1.385416174911393178490e-02 */
> +        .quad 0xBF3EE844C5B79FB8   /* P10 = -4.716079617694784908234e-04 */
> +        .quad 0xBFDD000000000000   /* B = -.453125 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C73FA437AD7AD87   /* PL0 = +1.732779905745858845932e-17 */
> +        .quad 0x3FDCC88C9902CF45   /* PH0 = +4.497405523536495697279e-01 */
> +        .quad 0x3FE9870845162D1D   /* P1  = +7.977334355686341748810e-01 */
> +        .quad 0xBFD6F62358F73DA8   /* P2  = -3.587730759436120677668e-01 */
> +        .quad 0xBFBAC4345D675FE1   /* P3  = -1.045563438450467661101e-01 */
> +        .quad 0x3FC5539DA8287019   /* P4  = +1.666142531474868131862e-01 */
> +        .quad 0xBF96E3E0DC04A09F   /* P5  = -2.235366194614185212822e-02 */
> +        .quad 0xBFAB5EC7147C207D   /* P6  = -5.345747113284546871398e-02 */
> +        .quad 0x3F9C24166FFA7A58   /* P7  = +2.748141344511120915667e-02 */
> +        .quad 0x3F8451B907819844   /* P8  = +9.921498815128277696693e-03 */
> +        .quad 0xBF8C1C6D19191FCB   /* P9  = -1.372609360545586670239e-02 */
> +        .quad 0x3F547372DF72E35A   /* P10 = +1.248228245272117756098e-03 */
> +        .quad 0xBFDF000000000000   /* B = -.484375 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C848FE06EE49950   /* PL0 = +3.566941590788961528958e-17 */
> +        .quad 0x3FDF20211A36475D   /* PH0 = +4.863360172249622803697e-01 */
> +        .quad 0x3FE86E67E6B80AC2   /* P1  = +7.634772783497611574659e-01 */
> +        .quad 0xBFD7C37C55474D9B   /* P2  = -3.713064987943767913461e-01 */
> +        .quad 0xBFB2EBF15F3CB036   /* P3  = -7.391270232318521952684e-02 */
> +        .quad 0x3FC4718C8EF6E3AA   /* P4  = +1.597152422016539530950e-01 */
> +        .quad 0xBFA277F8394E9B07   /* P5  = -3.607154559658991932071e-02 */
> +        .quad 0xBFA680312AB207E3   /* P6  = -4.394677778419955009224e-02 */
> +        .quad 0x3F9EDC9A8B57E286   /* P7  = +3.013841128810892143223e-02 */
> +        .quad 0x3F71B8C5E648EAF6   /* P8  = +4.326603932492947851719e-03 */
> +        .quad 0xBF89DB218356730C   /* P9  = -1.262499029217558458029e-02 */
> +        .quad 0x3F6B05728E6EBC8E   /* P10 = +3.298496001171330815865e-03 */
> +        .quad 0xBFE1000000000000   /* B = -.53125  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8429831EDD94DE   /* PL0 = +3.497576705878673192147e-17 */
> +        .quad 0x3FE10AF47E0BF610   /* PH0 = +5.325872861719194162333e-01 */
> +        .quad 0x3FE6EC5879F87EEE   /* P1  = +7.163507826080299761242e-01 */
> +        .quad 0xBFD86AD001BFE200   /* P2  = -3.815193192563413204129e-01 */
> +        .quad 0xBFA239045B661385   /* P3  = -3.559125533778398983564e-02 */
> +        .quad 0x3FC2B4572D9CC147   /* P4  = +1.461285565105845078038e-01 */
> +        .quad 0xBFA99F4F01740705   /* P5  = -5.004355328311586406115e-02 */
> +        .quad 0xBF9F449C484F4879   /* P6  = -3.053516570418721511214e-02 */
> +        .quad 0x3F9F5F42169D7DDE   /* P7  = +3.063681853325116830798e-02 */
> +        .quad 0xBF6111B1BA632A97   /* P8  = -2.083632588527460989469e-03 */
> +        .quad 0xBF84725FBE5B6E61   /* P9  = -9.983776089419639342530e-03 */
> +        .quad 0x3F7438A2986CFA9C   /* P10 = +4.936823976832951342488e-03 */
> +        .quad 0xBFE3000000000000   /* B = -.59375  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6BE9160BFB3505   /* PL0 = +1.210424670976053242391e-17 */
> +        .quad 0x3FE26D76F73233C7   /* PH0 = +5.758623912857893101247e-01 */
> +        .quad 0x3FE56363B5B93937   /* P1  = +6.683825063026124740752e-01 */
> +        .quad 0xBFD8A2244B27297E   /* P2  = -3.848963483730115724200e-01 */
> +        .quad 0xBF52CA2F101EEF63   /* P3  = -1.146837196286797844817e-03 */
> +        .quad 0x3FC081BC342243AD   /* P4  = +1.289592032012739958675e-01 */
> +        .quad 0xBFAE38DB4A932344   /* P5  = -5.902753148399722719732e-02 */
> +        .quad 0xBF91F814D4AE90C6   /* P6  = -1.754791782481459457885e-02 */
> +        .quad 0x3F9D056AE193C4F3   /* P7  = +2.834097863973723355792e-02 */
> +        .quad 0xBF7BD0B502D8F3A0   /* P8  = -6.790835451792626336974e-03 */
> +        .quad 0xBF7B763F7BB8AE2F   /* P9  = -6.704566938008179114124e-03 */
> +        .quad 0x3F76036F42D9AB69   /* P10 = +5.374369252971835729099e-03 */
> +        .quad 0xBFE5000000000000   /* B = -.65625  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8B64AF0450486E   /* PL0 = +4.751979286662385162741e-17 */
> +        .quad 0x3FE3B75F8BCB742D   /* PH0 = +6.161344271055263499548e-01 */
> +        .quad 0x3FE3DA23BC12369F   /* P1  = +6.203783677353447780947e-01 */
> +        .quad 0xBFD8768FF4B46416   /* P2  = -3.822364701932782367281e-01 */
> +        .quad 0x3F9D67CB8AD9CB1A   /* P3  = +2.871625933625941117406e-02 */
> +        .quad 0x3FBC168CB7827DF4   /* P4  = +1.097190807363331305006e-01 */
> +        .quad 0xBFB03A2B83C9272E   /* P5  = -6.338760344911228324430e-02 */
> +        .quad 0xBF789FEB595297DC   /* P6  = -6.011885959344067548074e-03 */
> +        .quad 0x3F98BD01B4C335E7   /* P7  = +2.415850320612902513532e-02 */
> +        .quad 0xBF83BADC303D6535   /* P8  = -9.633751127398152979976e-03 */
> +        .quad 0xBF6C54E7A1C1E3F3   /* P9  = -3.458454519258407989501e-03 */
> +        .quad 0x3F7408394B7EF3E7   /* P10 = +4.890655334688332484537e-03 */
> +        .quad 0xBFE7000000000000   /* B = -.71875  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6A48557F6E0D3E   /* PL0 = +1.139824111505584215867e-17 */
> +        .quad 0x3FE4E8D895B010DC   /* PH0 = +6.534235881413468227663e-01 */
> +        .quad 0x3FE25652FAAF8A73   /* P1  = +5.730376144604875448991e-01 */
> +        .quad 0xBFD7F6C3A57C444B   /* P2  = -3.744362941807295084434e-01 */
> +        .quad 0x3FAB7866E3F99EBE   /* P3  = +5.365296872042567001598e-02 */
> +        .quad 0x3FB6FA1DF47CCD40   /* P4  = +8.975398272450707099784e-02 */
> +        .quad 0xBFB05508D3741B8E   /* P5  = -6.379752314033580026840e-02 */
> +        .quad 0x3F6C3EFDF7BB279C   /* P6  = +3.448005705512137236209e-03 */
> +        .quad 0x3F9372BADD6D3E27   /* P7  = +1.899234749299530050806e-02 */
> +        .quad 0xBF860FD5AE65F3DA   /* P8  = -1.077238977881649471165e-02 */
> +        .quad 0xBF47266FFB07E628   /* P9  = -7.064863949032872448118e-04 */
> +        .quad 0x3F6F9763992C2A05   /* P10 = +3.856367614735181120799e-03 */
> +        .quad 0xBFE9000000000000   /* B = -.78125  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6BB6A2B194E3AB   /* PL0 = +1.201878007209462528697e-17 */
> +        .quad 0x3FE602609AAE7C22   /* PH0 = +6.877902051090851731630e-01 */
> +        .quad 0x3FE0DCBAFE191C7F   /* P1  = +5.269446337560025312137e-01 */
> +        .quad 0xBFD732028428A9FB   /* P2  = -3.624273577321727538225e-01 */
> +        .quad 0x3FB2D92389BE065B   /* P3  = +7.362577545975439796588e-02 */
> +        .quad 0x3FB1F6A9C8C49993   /* P4  = +7.017003203927733370937e-02 */
> +        .quad 0xBFAF47C0B50B56EE   /* P5  = -6.109430513394707378526e-02 */
> +        .quad 0x3F85A8EDD1356223   /* P6  = +1.057611269668352068104e-02 */
> +        .quad 0x3F8BE05C5CD1B4FA   /* P7  = +1.361152799855823798207e-02 */
> +        .quad 0xBF85A0EFE4552F76   /* P8  = -1.056086936537046752272e-02 */
> +        .quad 0x3F559F2A6A356194   /* P9  = +1.319686337259627831943e-03 */
> +        .quad 0x3F6576F5E989208D   /* P10 = +2.620201394425042596201e-03 */
> +        .quad 0xBFEB000000000000   /* B = -.84375  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C80328BD86C8B74   /* PL0 = +2.809809047161267929701e-17 */
> +        .quad 0x3FE704BB1B7FCB81   /* PH0 = +7.193275010198335595035e-01 */
> +        .quad 0x3FDEE264AAD6C40C   /* P1  = +4.825679462765613089739e-01 */
> +        .quad 0xBFD637493CE659F1   /* P2  = -3.471243948673921548357e-01 */
> +        .quad 0x3FB6BE3A3DEE6F4A   /* P3  = +8.884014141079635303208e-02 */
> +        .quad 0x3FAA85EB6470AC0F   /* P4  = +5.180297471118688523488e-02 */
> +        .quad 0xBFACC0146EA4858D   /* P5  = -5.615295267694895314457e-02 */
> +        .quad 0x3F8F8FB683CDDAC5   /* P6  = +1.541082944616557159055e-02 */
> +        .quad 0x3F819515DEE2CB91   /* P7  = +8.585139145315585602547e-03 */
> +        .quad 0xBF834E45E6AF9EA1   /* P8  = -9.426637747267209169415e-03 */
> +        .quad 0x3F65250F197CA56D   /* P9  = +2.581147662472352252568e-03 */
> +        .quad 0x3F57A766026D036C   /* P10 = +1.443719500187702367690e-03 */
> +        .quad 0xBFED000000000000   /* B = -.90625  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C716F7EEF7B61AD   /* PL0 = +1.512291215142578135651e-17 */
> +        .quad 0x3FE7F0E1A4CD846E   /* PH0 = +7.481544703297353660076e-01 */
> +        .quad 0x3FDC2D4CC872DC09   /* P1  = +4.402648885256331012598e-01 */
> +        .quad 0xBFD514A99F92ED53   /* P2  = -3.293861444796750250530e-01 */
> +        .quad 0x3FB9846A6CF2F337   /* P3  = +9.967675361526749494844e-02 */
> +        .quad 0x3FA20896939AB161   /* P4  = +3.522177268800664413493e-02 */
> +        .quad 0xBFA97E801F31EE0D   /* P5  = -4.979324703978358553405e-02 */
> +        .quad 0x3F92A11F47B82085   /* P6  = +1.819275737037219740638e-02 */
> +        .quad 0x3F717D70FE289C34   /* P7  = +4.270020845559097605514e-03 */
> +        .quad 0xBF7FDCF1D3F6CE2D   /* P8  = -7.779068604054678540132e-03 */
> +        .quad 0x3F69F607E81AF6B6   /* P9  = +3.169074480722534625181e-03 */
> +        .quad 0x3F3F925C80D0F889   /* P10 = +4.817462766516585511824e-04 */
> +        .quad 0xBFEF000000000000   /* B = -.96875  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C931A11D7E8606E   /* PL0 = +6.627280241435322692188e-17 */
> +        .quad 0x3FE92BFB370D9B71   /* PH0 = +7.866188121086975515439e-01 */
> +        .quad 0x3FD866160E454111   /* P1  = +3.812308444367014680480e-01 */
> +        .quad 0xBFD33149F3801DBA   /* P2  = -2.998833539899937679796e-01 */
> +        .quad 0x3FBBDB6D4C949899   /* P3  = +1.088169395412442909023e-01 */
> +        .quad 0x3F8D6AB2A74B9343   /* P4  = +1.436366627735597372494e-02 */
> +        .quad 0xBFA404D1047C5D72   /* P5  = -3.909924678571997970917e-02 */
> +        .quad 0x3F93C47D9ACCD919   /* P6  = +1.930423981976856424661e-02 */
> +        .quad 0xBF41B755642CFF1B   /* P7  = -5.406538915408738478158e-04 */
> +        .quad 0xBF74B5301AA1E788   /* P8  = -5.055606752756853900641e-03 */
> +        .quad 0x3F69A84C5B2A3E68   /* P9  = +3.132008679422249529120e-03 */
> +        .quad 0xBF3CF47830328C11   /* P10 = -4.418176105877589308931e-04 */
> +        .quad 0xBFF1000000000000   /* B = -1.0625   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C884D471B8FD396   /* PL0 = +4.215701792312937090514e-17 */
> +        .quad 0x3FEA8DBCBC31897A   /* PH0 = +8.298019099859594849278e-01 */
> +        .quad 0x3FD3EE730537C8EA   /* P1  = +3.114287901836535219818e-01 */
> +        .quad 0xBFD08A05AD27CE32   /* P2  = -2.584242049190123217982e-01 */
> +        .quad 0x3FBC5255406F84B6   /* P3  = +1.106313021005175045399e-01 */
> +        .quad 0xBF772FA2F633AA5E   /* P4  = -5.660664147607434209241e-03 */
> +        .quad 0xBF99DD8E4C473FC4   /* P5  = -2.525923100057504533247e-02 */
> +        .quad 0x3F9183C935B6495D   /* P6  = +1.710428610165003372069e-02 */
> +        .quad 0xBF70471A3A591480   /* P7  = -3.974058583087303228038e-03 */
> +        .quad 0xBF603DDD4DEBB9A4   /* P8  = -1.982624278176818987264e-03 */
> +        .quad 0x3F62591E44D3C17F   /* P9  = +2.239760512218135956425e-03 */
> +        .quad 0xBF4C195D3A9B1AB4   /* P10 = -8.575158328419569430544e-04 */
> +        .quad 0xBFF3000000000000   /* B = -1.1875   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C90DD1C9BFF7F64   /* PL0 = +5.850777430004479798187e-17 */
> +        .quad 0x3FEBAD50A4A68BC1   /* PH0 = +8.649066177207417327466e-01 */
> +        .quad 0x3FD01FBA72CEE1A5   /* P1  = +2.519365426228666233893e-01 */
> +        .quad 0xBFCBE432F647C4D6   /* P2  = -2.179015829602010702633e-01 */
> +        .quad 0x3FBABF92B6E5AC73   /* P3  = +1.044856735731387955105e-01 */
> +        .quad 0xBF922983AA24E217   /* P4  = -1.773648954369563555378e-02 */
> +        .quad 0xBF8C72214C14E23A   /* P5  = -1.388956082756564056328e-02 */
> +        .quad 0x3F8ACB4D1F388E8B   /* P6  = +1.308307887581540972153e-02 */
> +        .quad 0xBF740EF8B4A2EE3B   /* P7  = -4.897090441029978580995e-03 */
> +        .quad 0xBF0EA9F30C8DC900   /* P8  = -5.848668076326342477133e-05 */
> +        .quad 0x3F53CC40D18713AE   /* P9  = +1.208365725788622757410e-03 */
> +        .quad 0xBF4848B86029CBA1   /* P10 = -7.410908004444779592485e-04 */
> +        .quad 0xBFF5000000000000   /* B = -1.3125   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8FB61781D22681   /* PL0 = +5.501032995458057064843e-17 */
> +        .quad 0x3FEC950A3340C8BF   /* PH0 = +8.931933404003514764824e-01 */
> +        .quad 0x3FC9E1DFFD385423   /* P1  = +2.022056566644617586005e-01 */
> +        .quad 0xBFC71E2FF88EBA23   /* P2  = -1.806087459239772032583e-01 */
> +        .quad 0x3FB80AEBD07AB5BA   /* P3  = +9.391664352252506838449e-02 */
> +        .quad 0xBF98404E27EAE6ED   /* P4  = -2.368280523908243895884e-02 */
> +        .quad 0xBF772DA520B5006E   /* P5  = -5.658764868087568802107e-03 */
> +        .quad 0x3F824C9268AF9423   /* P6  = +8.935111827620250551925e-03 */
> +        .quad 0xBF722AE76D206AE3   /* P7  = -4.435447701349490160113e-03 */
> +        .quad 0x3F4B807F56298D5E   /* P8  = +8.392926941493230644497e-04 */
> +        .quad 0x3F3D71027DF95D2A   /* P9  = +4.492407879061627603159e-04 */
> +        .quad 0xBF3EBD17676755FB   /* P10 = -4.690343988874298905483e-04 */
> +        .quad 0xBFF7000000000000   /* B = -1.4375   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C95393C63CE8224   /* PL0 = +7.363407705201031038415e-17 */
> +        .quad 0x3FED4E6F464286B0   /* PH0 = +9.158245441687622445670e-01 */
> +        .quad 0x3FC4A45842B7DE1E   /* P1  = +1.612654042980787191461e-01 */
> +        .quad 0xBFC2E7885AFDD3D0   /* P2  = -1.476908153814791087327e-01 */
> +        .quad 0x3FB4DD6DD51D3FEB   /* P3  = +8.150373890862254580204e-02 */
> +        .quad 0xBF9A05D3ADAB489C   /* P4  = -2.541285274021075503042e-02 */
> +        .quad 0xBF3459B643B4995C   /* P5  = -3.105230313899165257622e-04 */
> +        .quad 0x3F766B30745F2E3A   /* P6  = +5.473317409222350365811e-03 */
> +        .quad 0xBF6C2C891E555BDF   /* P7  = -3.439204988051155730940e-03 */
> +        .quad 0x3F5194F30D6C576D   /* P8  = +1.073109966176012791522e-03 */
> +        .quad 0x3EF4DBB43C3132A2   /* P9  = +1.989194766975849961365e-05 */
> +        .quad 0xBF2E45EBAB3C15A0   /* P10 = -2.309656316514087783666e-04 */
> +        .quad 0xBFF9000000000000   /* B = -1.5625   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C75111669651DAA   /* PL0 = +1.827249135453834384396e-17 */
> +        .quad 0x3FEDE1EB5937518F   /* PH0 = +9.338280432225917193634e-01 */
> +        .quad 0x3FC06129C7C8EBB1   /* P1  = +1.279651856910653382507e-01 */
> +        .quad 0xBFBE9763041064E1   /* P2  = -1.194974789545031421774e-01 */
> +        .quad 0x3FB1A5B9F9113928   /* P3  = +6.893503504509068635308e-02 */
> +        .quad 0xBF992145039F9AFE   /* P4  = -2.454097590080105816526e-02 */
> +        .quad 0x3F66CB116EA49C89   /* P5  = +2.782377288116648315142e-03 */
> +        .quad 0x3F67F972FDF30001   /* P6  = +2.926563829163342740100e-03 */
> +        .quad 0xBF63A7B5975F02F3   /* P7  = -2.399305983061922438601e-03 */
> +        .quad 0x3F4FDE7B8777F4C8   /* P8  = +9.725669069095216373599e-04 */
> +        .quad 0xBF25918876626BA4   /* P9  = -1.645545082212515656240e-04 */
> +        .quad 0xBF1495123C991F00   /* P10 = -7.851527984669912693674e-05 */
> +        .quad 0xBFFB000000000000   /* B = -1.6875   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9F29A5B7426D27   /* PL0 = +1.081172820484012446345e-16 */
> +        .quad 0x3FEE56B6F3EFABFC   /* PH0 = +9.480852856044061915952e-01 */
> +        .quad 0x3FB9E3EFD94BB9FC   /* P1  = +1.011342912204113371518e-01 */
> +        .quad 0xBFB88BD9760FECA7   /* P2  = -9.588393337610288420285e-02 */
> +        .quad 0x3FAD48A0350B3ACF   /* P3  = +5.719471595295077387313e-02 */
> +        .quad 0xBF96CC6A5110F129   /* P4  = -2.226415748394675367257e-02 */
> +        .quad 0x3F71934687170384   /* P5  = +4.290843485649345772606e-03 */
> +        .quad 0x3F5407BAF73B3DF9   /* P6  = +1.222546180475235334287e-03 */
> +        .quad 0xBF591B626C0646DD   /* P7  = -1.532407870488964407324e-03 */
> +        .quad 0x3F48B0E1DD283558   /* P8  = +7.535078860329375669277e-04 */
> +        .quad 0xBF2B322292840D2B   /* P9  = -2.074877932117605962646e-04 */
> +        .quad 0xBE99E4061120C741   /* P10 = -3.858017559892704559672e-07 */
> +        .quad 0xBFFD000000000000   /* B = -1.8125   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6AF8C2041C67CD   /* PL0 = +1.169711482626385762338e-17 */
> +        .quad 0x3FEEB2DFEDD5EC93   /* PH0 = +9.593352933146824801369e-01 */
> +        .quad 0x3FB465A205CFB638   /* P1  = +7.967579500083210999681e-02 */
> +        .quad 0xBFB3914BF68D39FF   /* P2  = -7.643580216720378576778e-02 */
> +        .quad 0x3FA7F21A08C5C734   /* P3  = +4.676896435820623621673e-02 */
> +        .quad 0xBF93DA9560EA9960   /* P4  = -1.938851741820124550772e-02 */
> +        .quad 0x3F73953FEC62820E   /* P5  = +4.781007481284861359820e-03 */
> +        .quad 0x3F2749D5E1273E3C   /* P6  = +1.776765426044646108071e-04 */
> +        .quad 0xBF4D46B0B498CE5A   /* P7  = -8.934367007839658352859e-04 */
> +        .quad 0x3F4153D680E1F4C4   /* P8  = +5.287930851093571206574e-04 */
> +        .quad 0xBF28477014ECA6A2   /* P9  = -1.852344816708944640949e-04 */
> +        .quad 0x3EFFAC54E07CEB4B   /* P10 = +3.020588886147182143902e-05 */
> +        .quad 0xBFFF000000000000   /* B = -1.9375   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7A8AF2BB2231F2   /* PL0 = +2.302217989249372577466e-17 */
> +        .quad 0x3FEF1994DF724FC8   /* PH0 = +9.718727459135090285258e-01 */
> +        .quad 0x3FAC65B1BC0C9D58   /* P1  = +5.546336575053583942603e-02 */
> +        .quad 0xBFAB9937BDA747C8   /* P2  = -5.390333356957871365599e-02 */
> +        .quad 0x3FA15B42D9EF931C   /* P3  = +3.389939222669210777241e-02 */
> +        .quad 0xBF8EACD8E8507A3C   /* P4  = -1.497811755149058215502e-02 */
> +        .quad 0x3F7263A15721C682   /* P5  = +4.489546046998806349050e-03 */
> +        .quad 0xBF42A032ACDC3B32   /* P6  = -5.684134900735048121829e-04 */
> +        .quad 0xBF3431E79B5AD185   /* P7  = -3.081503340170088810438e-04 */
> +        .quad 0x3F31B51667C7DF5E   /* P8  = +2.701930714290502424828e-04 */
> +        .quad 0xBF1F8709579250AD   /* P9  = -1.202678157759563704341e-04 */
> +        .quad 0x3F01ED8ED1BF9595   /* P10 = +3.419487094883790833778e-05 */
> +        .quad 0xC001000000000000   /* B = -2.125    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C86F3F7C3DAFC55   /* PL0 = +3.981710680748877459333e-17 */
> +        .quad 0x3FEF73776B2AA2DB   /* PH0 = +9.828450291725759901951e-01 */
> +        .quad 0x3FA16A7FC4D7B900   /* P1  = +3.401564863075812007064e-02 */
> +        .quad 0xBFA11E03803AD621   /* P2  = -3.343211117082156940532e-02 */
> +        .quad 0x3F9609591597297F   /* P3  = +2.152003473546803654658e-02 */
> +        .quad 0xBF847E74ED9BBB0C   /* P4  = -1.000682211039596246436e-02 */
> +        .quad 0x3F6BFF771725CD65   /* P5  = +3.417713736035987187864e-03 */
> +        .quad 0xBF491D1FF73C18FA   /* P6  = -7.664114077392807421000e-04 */
> +        .quad 0x3EF53EE467B51DC5   /* P7  = +2.026145237479599375099e-05 */
> +        .quad 0x3F160135BE0D94A0   /* P8  = +8.394136922403255700685e-05 */
> +        .quad 0xBF0B32CB1D276A40   /* P9  = -5.187685350778849443841e-05 */
> +        .quad 0x3EF4DAF70C12D555   /* P10 = +1.988919462255396826584e-05 */
> +        .quad 0xC003000000000000   /* B = -2.375    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C19DBF4E2E5B7DC   /* PL0 = +3.504575836708380670219e-19 */
> +        .quad 0x3FEFAA7934B75EBD   /* PH0 = +9.895597486128832054320e-01 */
> +        .quad 0x3F9545200830A42C   /* P1  = +2.077150392520736492125e-02 */
> +        .quad 0xBF950C46D285F6BC   /* P2  = -2.055464420253970271376e-02 */
> +        .quad 0x3F8B79F5BFC6513F   /* P3  = +1.341621390819425058164e-02 */
> +        .quad 0xBF7A50ADAD777898   /* P4  = -6.424597194806612772505e-03 */
> +        .quad 0x3F633A19BE8255E3   /* P5  = +2.347040444940816227383e-03 */
> +        .quad 0xBF44E609BC2557B7   /* P6  = -6.377742322836087134324e-04 */
> +        .quad 0x3F1AFCBAD60EAACD   /* P7  = +1.029480968230231421206e-04 */
> +        .quad 0x3EE80476AC34A8EF   /* P8  = +1.145240583485084317660e-05 */
> +        .quad 0xBEF278E23DE463E9   /* P9  = -1.761646478213091821804e-05 */
> +        .quad 0x3EE209FAF377264D   /* P10 = +8.601658563106529694651e-06 */
> +        .quad 0xC005000000000000   /* B = -2.625    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C979D62702C631C   /* PL0 = +8.193023793215066385979e-17 */
> +        .quad 0x3FEFCC04CDBCDC4B   /* PH0 = +9.936546343150295390600e-01 */
> +        .quad 0x3F89E87D088D269A   /* P1  = +1.265046770426474576547e-02 */
> +        .quad 0xBF89BE6721012B80   /* P2  = -1.257019586059526836624e-02 */
> +        .quad 0x3F80F1C13E8D39D3   /* P3  = +8.273610803056031004326e-03 */
> +        .quad 0xBF7082DBC9602757   /* P4  = -4.031046430108839563004e-03 */
> +        .quad 0x3F590BE9BD4E0A11   /* P5  = +1.528719197467002507978e-03 */
> +        .quad 0xBF3DCC2BEF6D0283   /* P6  = -4.546744598208711809986e-04 */
> +        .quad 0x3F1A08065C4A8E85   /* P7  = +9.930170842636406837764e-05 */
> +        .quad 0xBEE528117D0410F3   /* P8  = -1.008821337267942266431e-05 */
> +        .quad 0xBED0BE73A44FF565   /* P9  = -3.992069257383521775961e-06 */
> +        .quad 0x3EC9B0C11E342E38   /* P10 = +3.062539904901699218737e-06 */
> +        .quad 0xC007000000000000   /* B = -2.875    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C804B931AD7A3CC   /* PL0 = +2.826768921701616830245e-17 */
> +        .quad 0x3FEFE06EB0688212   /* PH0 = +9.961465306733450209009e-01 */
> +        .quad 0x3F7F81BD8876224D   /* P1  = +7.692089427458426472642e-03 */
> +        .quad 0xBF7F62A8C699A963   /* P2  = -7.662448196791823756776e-03 */
> +        .quad 0x3F74C31E2B2A6A28   /* P3  = +5.068891378551522166321e-03 */
> +        .quad 0xBF6470D537F16227   /* P4  = -2.495209162173734080001e-03 */
> +        .quad 0x3F4FAEEF61C89673   /* P5  = +9.668988091717359455754e-04 */
> +        .quad 0xBF33C5E80B349783   /* P6  = -3.017131341088651514023e-04 */
> +        .quad 0x3F138F3D31037A6B   /* P7  = +7.461367590931028650557e-05 */
> +        .quad 0xBEEB3C780996FFE3   /* P8  = -1.298723536791163711556e-05 */
> +        .quad 0x3E9D0C75BC8BFEFC   /* P9  = +4.328589367358221917138e-07 */
> +        .quad 0x3EAC3865227764D4   /* P10 = +8.410302755848104487452e-07 */
> +        .quad 0xC009000000000000   /* B = -3.125    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C5B978B202749F9   /* PL0 = +5.983054034451594408315e-18 */
> +        .quad 0x3FEFECD6B7EA3128   /* PH0 = +9.976609794698889643882e-01 */
> +        .quad 0x3F73238B786137FE   /* P1  = +4.672570043181776968058e-03 */
> +        .quad 0xBF731815ACEA072E   /* P2  = -4.661640805922390930706e-03 */
> +        .quad 0x3F6956F0816D5AEE   /* P3  = +3.093213784647877798933e-03 */
> +        .quad 0xBF591A16286C4885   /* P4  = -1.532098425461232453877e-03 */
> +        .quad 0x3F43B3E3A00C6096   /* P5  = +6.012784434430592468442e-04 */
> +        .quad 0xBF29441B2A56DEC7   /* P6  = -1.927645836710038499293e-04 */
> +        .quad 0x3F0A99C3A2E857B6   /* P7  = +5.073669705184196724674e-05 */
> +        .quad 0xBEE61CB034DDC151   /* P8  = -1.054385361573597042258e-05 */
> +        .quad 0x3EB792BBC76D6107   /* P9  = +1.405070887824641788698e-06 */
> +        .quad 0x3E761472362A16F0   /* P10 = +8.225391704739515383837e-08 */
> +        .quad 0xC00B000000000000   /* B = -3.375    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9C290AFCBDE00D   /* PL0 = +9.770074992945060684926e-17 */
> +        .quad 0x3FEFF45F6D36133A   /* PH0 = +9.985806592017987259879e-01 */
> +        .quad 0x3F673CEC093032DE   /* P1  = +2.836667068100913999228e-03 */
> +        .quad 0xBF67347A7CD844D5   /* P2  = -2.832640870800243808078e-03 */
> +        .quad 0x3F5EDA25530355DB   /* P3  = +1.883064698679040793627e-03 */
> +        .quad 0xBF4EAD3BBABC1BA9   /* P4  = -9.361783645268534848806e-04 */
> +        .quad 0x3F3842E61CD35432   /* P5  = +3.701984213198588740338e-04 */
> +        .quad 0xBF1F9AB7FD1A3DDD   /* P6  = -1.205611036090218544867e-04 */
> +        .quad 0x3F0136C154EA3DED   /* P7  = +3.283288480304320224929e-05 */
> +        .quad 0xBEDF12807F721E66   /* P8  = -7.408207230892235753013e-06 */
> +        .quad 0x3EB5B53687AD5112   /* P9  = +1.293889481520047941659e-06 */
> +        .quad 0xBE801E90FBFED147   /* P10 = -1.200988872775447204019e-07 */
> +        .quad 0xC00D000000000000   /* B = -3.625    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9E323294294877   /* PL0 = +1.047637125334028950603e-16 */
> +        .quad 0x3FEFF8F21CDAAA62   /* PH0 = +9.991388858373506653976e-01 */
> +        .quad 0x3F5C3470628813F2   /* P1  = +1.721486807697344658108e-03 */
> +        .quad 0xBF5C2E38AC6FF8D2   /* P2  = -1.720004411026422324849e-03 */
> +        .quad 0x3F52C13234626F43   /* P3  = +1.144694354969070234454e-03 */
> +        .quad 0xBF42B0A47DF47BB4   /* P4  = -5.703738387728891173354e-04 */
> +        .quad 0x3F2DB2889E32FBFD   /* P5  = +2.265731592156760387344e-04 */
> +        .quad 0xBF1385FBD54C5A55   /* P6  = -7.447576110695385196414e-05 */
> +        .quad 0x3EF5AFA812C6984E   /* P7  = +2.068153223579892541184e-05 */
> +        .quad 0xBED47097C188A03C   /* P8  = -4.873231795467276043290e-06 */
> +        .quad 0x3EAFF2B982F7EE8C   /* P9  = +9.521288628073486288914e-07 */
> +        .quad 0xBE828EC5B57D424D   /* P10 = -1.382656715739529384702e-07 */
> +        .quad 0xC00F000000000000   /* B = -3.875    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9BA40DA6983BEC   /* PL0 = +9.589840482158163453169e-17 */
> +        .quad 0x3FEFFCAAC3F20E65   /* PH0 = +9.995931460438894911036e-01 */
> +        .quad 0x3F4AA87CF664754C   /* P1  = +8.135423820793490331956e-04 */
> +        .quad 0xBF4AA5B62919E224   /* P2  = -8.132113891426467676310e-04 */
> +        .quad 0x3F41C01B53B0B312   /* P3  = +5.416997368051531710388e-04 */
> +        .quad 0xBF31B8B54D091751   /* P4  = -2.704088811110632606347e-04 */
> +        .quad 0x3F1C431305954ECC   /* P5  = +1.078110084525254933728e-04 */
> +        .quad 0xBF02B7DEAD0D44E6   /* P6  = -3.570221236393906131126e-05 */
> +        .quad 0x3EE51C6EFF109EA9   /* P7  = +1.006654199116272154479e-05 */
> +        .quad 0xBEC48CFB08072D17   /* P8  = -2.449834994621594976610e-06 */
> +        .quad 0x3EA1585EC59CAE34   /* P9  = +5.169271261920604503617e-07 */
> +        .quad 0xBE78832BAF950BA9   /* P10 = -9.131575131209528255629e-08 */
> +        .quad 0xC011000000000000   /* B = -4.25     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8FBF237F4AFE10   /* PL0 = +5.507163370275307643966e-17 */
> +        .quad 0x3FEFFEC61279A3A4   /* PH0 = +9.998503075449787225182e-01 */
> +        .quad 0x3F339E78281A00EA   /* P1  = +2.993625022114214863645e-04 */
> +        .quad 0xBF339DB7B072AD62   /* P2  = -2.993176899035080028902e-04 */
> +        .quad 0x3F2A259E658EF4E4   /* P3  = +1.994853835451177669594e-04 */
> +        .quad 0xBF1A219C312B10BA   /* P4  = -9.968295880030927192162e-05 */
> +        .quad 0x3F04E146B4F5F4B7   /* P5  = +3.982541113154699160876e-05 */
> +        .quad 0xBEEBC5F137088210   /* P6  = -1.324329943580649487333e-05 */
> +        .quad 0x3ECF96736E300B00   /* P7  = +3.765547135882256916132e-06 */
> +        .quad 0xBEAF4874840B91EB   /* P8  = -9.323068824421825762292e-07 */
> +        .quad 0x3E8B6AB2B5C8FD3F   /* P9  = +2.042709991312793245971e-07 */
> +        .quad 0xBE650BCCE62FD2B7   /* P10 = -3.920140725219944650830e-08 */
> +        .quad 0xC013000000000000   /* B = -4.75     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9C869C85471703   /* PL0 = +9.896883942603146946483e-17 */
> +        .quad 0x3FEFFF8C81C6DC33   /* PH0 = +9.999449286177707341139e-01 */
> +        .quad 0x3F1CDF5A2E4D7C69   /* P1  = +1.101397316012206760643e-04 */
> +        .quad 0xBF1CDEF1F9BE63BE   /* P2  = -1.101336660539594564027e-04 */
> +        .quad 0x3F133EC10C83AAA0   /* P3  = +7.341435696487731017506e-05 */
> +        .quad 0xBF033DAB325FAACB   /* P4  = -3.669909192168459445238e-05 */
> +        .quad 0x3EEEC598FA98BAD8   /* P5  = +1.467316890843338172161e-05 */
> +        .quad 0xBED47F1A15BA368E   /* P6  = -4.886744445221253126882e-06 */
> +        .quad 0x3EB761FBE7D201C1   /* P7  = +1.393720509029845064726e-06 */
> +        .quad 0xBE974CD75A43BF6B   /* P8  = -3.471994551992448536007e-07 */
> +        .quad 0x3E74B02965BBF8DC   /* P9  = +7.706929621914905669946e-08 */
> +        .quad 0xBE504EF4E3892A66   /* P10 = -1.518840362012570189110e-08 */
> +        .quad 0xC015000000000000   /* B = -5.25     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C643810400471B0   /* PL0 = +8.768592603904887599187e-18 */
> +        .quad 0x3FEFFFD583014825   /* PH0 = +9.999797400180382433987e-01 */
> +        .quad 0x3F053E71416C43CA   /* P1  = +4.051955345663706869871e-05 */
> +        .quad 0xBF053E550C7C8CC9   /* P2  = -4.051873253121394012080e-05 */
> +        .quad 0x3EFC52D0D90D4843   /* P3  = +2.701139380018752534477e-05 */
> +        .quad 0xBEEC523A6ADBE142   /* P4  = -1.350460237457883558350e-05 */
> +        .quad 0x3ED6A73E22D844B3   /* P5  = +5.400965660055565196396e-06 */
> +        .quad 0xBEBE31D10F23ACD0   /* P6  = -1.799738182979224868919e-06 */
> +        .quad 0x3EA13E14264DEAB2   /* P7  = +5.138663935333241981438e-07 */
> +        .quad 0xBE81385ABB98EDCC   /* P8  = -1.282999997786486835638e-07 */
> +        .quad 0x3E5EB9164593E0B6   /* P9  = +2.861301981891537161158e-08 */
> +        .quad 0xBE387218CFE7772E   /* P10 = -5.691705994073124478195e-09 */
> +        .quad 0xC017000000000000   /* B = -5.75     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C92530433F4C703   /* PL0 = +6.357512739163799046861e-17 */
> +        .quad 0x3FEFFFF05E8D3191   /* PH0 = +9.999925467214315633058e-01 */
> +        .quad 0x3EEF42DDFA52B575   /* P1  = +1.490650158538873335176e-05 */
> +        .quad 0xBEEF42CEB54212AA   /* P2  = -1.490639048307961378200e-05 */
> +        .quad 0x3EE4D7201CBCB853   /* P3  = +9.937445518550804010127e-06 */
> +        .quad 0xBED4D6F764B66C37   /* P4  = -4.968574624976280456686e-06 */
> +        .quad 0x3EC0ABB806EBDE71   /* P5  = +1.987311456171617620608e-06 */
> +        .quad 0xBEA6399CF854F876   /* P6  = -6.623581475862682369330e-07 */
> +        .quad 0x3E8964B91728D7C9   /* P7  = +1.891959403186505598965e-07 */
> +        .quad 0xBE6961A0528444D6   /* P8  = -4.727645325404986954168e-08 */
> +        .quad 0x3E46AE3B0814EE00   /* P9  = +1.056147192151514779549e-08 */
> +        .quad 0xBE221B8194DACD16   /* P10 = -2.107984154277957626641e-09 */
> +        .quad 0xC019000000000000   /* B = -6.25     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7BB5622CE1A79E   /* PL0 = +2.403331811901679167526e-17 */
> +        .quad 0x3FEFFFFA3FF22708   /* PH0 = +9.999972580855862602789e-01 */
> +        .quad 0x3ED7003552D53503   /* P1  = +5.483821309338170039906e-06 */
> +        .quad 0xBED7003130C1AB92   /* P2  = -5.483806273169366545037e-06 */
> +        .quad 0x3ECEAAE13B699C45   /* P3  = +3.655850800133043324271e-06 */
> +        .quad 0xBEBEAACB305F3D07   /* P4  = -1.827905351959291114416e-06 */
> +        .quad 0x3EA8887F5F9C87EF   /* P5  = +7.311461438267648556646e-07 */
> +        .quad 0xBE905AD08DF8454F   /* P6  = -2.437046884027860662692e-07 */
> +        .quad 0x3E72B068300B703F   /* P7  = +6.962228483613086736676e-08 */
> +        .quad 0xBE52AF921A71C058   /* P8  = -1.740252888706390465423e-08 */
> +        .quad 0x3E30B53EAA35300D   /* P9  = +3.890131469838137725119e-09 */
> +        .quad 0xBE0AB60CDAD7E22E   /* P10 = -7.773963050435300060566e-10 */
> +        .quad 0xC01B000000000000   /* B = -6.75     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8BD1ACF80D7256   /* PL0 = +4.825835138930451121169e-17 */
> +        .quad 0x3FEFFFFDE2760A41   /* PH0 = +9.999989913051835488389e-01 */
> +        .quad 0x3EC0EC4F1EC27E55   /* P1  = +2.017388615341105998718e-06 */
> +        .quad 0xBEC0EC4E005E6EAC   /* P2  = -2.017386580411626200507e-06 */
> +        .quad 0x3EB6906504BC4610   /* P3  = +1.344921673533307001969e-06 */
> +        .quad 0xBEA6905F0D52C8B5   /* P4  = -6.724581235377781360384e-07 */
> +        .quad 0x3E920D0F5CCE152B   /* P5  = +2.689810941136721216499e-07 */
> +        .quad 0xBE7811505B10E753   /* P6  = -8.965891741619763761543e-08 */
> +        .quad 0x3E5B811EE4F9B8EE   /* P7  = +2.561544781706659619288e-08 */
> +        .quad 0xBE3B80ABC067E840   /* P8  = -6.403452884688571158579e-09 */
> +        .quad 0x3E1898E394E09335   /* P9  = +1.431746793613569087489e-09 */
> +        .quad 0xBDF3ABB5BA711DB7   /* P10 = -2.862469657501951918569e-10 */
> +        .quad 0xC01D000000000000   /* B = -7.25     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8AE01DB39A3791   /* PL0 = +4.662147961093911873193e-17 */
> +        .quad 0x3FEFFFFF38C76668   /* PH0 = +9.999996289217962797125e-01 */
> +        .quad 0x3EA8E712E56E1188   /* P1  = +7.421562696484951529573e-07 */
> +        .quad 0xBEA8E7124A650791   /* P2  = -7.421559942504648535596e-07 */
> +        .quad 0x3EA09A0B62D8EF94   /* P3  = +4.947702955735978541097e-07 */
> +        .quad 0xBE909A09C56C2107   /* P4  = -2.473847805916120382218e-07 */
> +        .quad 0x3E7A900A90A54A6E   /* P5  = +9.895362410487317236618e-08 */
> +        .quad 0xBE61B5557BB449B6   /* P6  = -3.298434544432568302770e-08 */
> +        .quad 0x3E443CC74732CDCA   /* P7  = +9.423781066565733462466e-09 */
> +        .quad 0xBE243CA8AA8D6E54   /* P8  = -2.355890888986360997159e-09 */
> +        .quad 0x3E0219C341E0D1B4   /* P9  = +5.267978308406275552691e-10 */
> +        .quad 0xBDDCF49A10950F13   /* P10 = -1.053394074620716018815e-10 */
> +        .quad 0xC01F000000000000   /* B = -7.75     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C75CB18F3775414   /* PL0 = +1.890271747518592444083e-17 */
> +        .quad 0x3FEFFFFFD38C39F0   /* PH0 = +9.999999172012490333827e-01 */
> +        .quad 0x3E8639E2F89493BB   /* P1  = +1.655974950855472979393e-07 */
> +        .quad 0xBE8639E2D9B29562   /* P2  = -1.655974813708346974914e-07 */
> +        .quad 0x3E7DA2836A1F706E   /* P3  = +1.103982989742589616541e-07 */
> +        .quad 0xBE6DA282C6733DAE   /* P4  = -5.519913131581509871840e-08 */
> +        .quad 0x3E57B53A278851FD   /* P5  = +2.207971980430773309147e-08 */
> +        .quad 0xBE3F9C4A72536E22   /* P6  = -7.359895614149337484810e-09 */
> +        .quad 0x3E220E81FBE19CDD   /* P7  = +2.102073153607135257714e-09 */
> +        .quad 0xBE020E8875ADA8D8   /* P8  = -5.255211642212584097407e-10 */
> +        .quad 0x3DE07634328384FC   /* P9  = +1.197748786062966341989e-10 */
> +        .quad 0xBDBA54078E3C351F   /* P10 = -2.394539505021488953905e-11 */
> +        .quad 0xC021000000000000   /* B = -8.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C98B78738B0EDEF   /* PL0 = +8.575399788039081964921e-17 */
> +        .quad 0x3FEFFFFFF9FBEA40   /* PH0 = +9.999999887944071019774e-01 */
> +        .quad 0x3E581056FAC28C46   /* P1  = +2.241118550516412682327e-08 */
> +        .quad 0xBE581056F63A4351   /* P2  = -2.241118525356742542550e-08 */
> +        .quad 0x3E500AE49533790A   /* P3  = +1.494078933911655875521e-08 */
> +        .quad 0xBE400AE489ACBA90   /* P4  = -7.470394349637968945652e-09 */
> +        .quad 0x3E29AB0D59A1967B   /* P5  = +2.988168557255271725494e-09 */
> +        .quad 0xBE111CB32D6EEF2B   /* P6  = -9.960558400070350772418e-10 */
> +        .quad 0x3DF38CBADF396908   /* P7  = +2.844859618921805216353e-10 */
> +        .quad 0xBDD38CC7B92CECD3   /* P8  = -7.112220386749926320915e-11 */
> +        .quad 0x3DB1D2BBE2705032   /* P9  = +1.621008722427575444686e-11 */
> +        .quad 0xBD8C8199294E6380   /* P10 = -3.240784656869469020111e-12 */
> +        .quad 0xC023000000000000   /* B = -9.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8EEEC16618B984   /* PL0 = +5.365957423487855307906e-17 */
> +        .quad 0x3FEFFFFFFF2F9279   /* PH0 = +9.999999984834878619111e-01 */
> +        .quad 0x3E2A0DB0D052B148   /* P1  = +3.033024167396880687734e-09 */
> +        .quad 0xBE2A0DB0CFA6AB71   /* P2  = -3.033024162734192808028e-09 */
> +        .quad 0x3E215E75D53A3105   /* P3  = +2.022016035353114070618e-09 */
> +        .quad 0xBE115E75D40AA47F   /* P4  = -1.011008013562702155050e-09 */
> +        .quad 0x3DFBCA5CDC12ED1C   /* P5  = +4.044047007631481841556e-10 */
> +        .quad 0xBDE286E85704FC22   /* P6  = -1.348015410318274576187e-10 */
> +        .quad 0x3DC52A8925354517   /* P7  = +3.850101197145027796396e-11 */
> +        .quad 0xBDA52A97EA3F5F4A   /* P8  = -9.625355478142550638468e-12 */
> +        .quad 0x3D834C011A2AC0F7   /* P9  = +2.193802608697321032841e-12 */
> +        .quad 0xBD5EDD05BDCB3A62   /* P10 = -4.385948508419928563300e-13 */
> +        .quad 0xC025000000000000   /* B = -10.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6BD8B474BBF792   /* PL0 = +1.207649585364892639612e-17 */
> +        .quad 0x3FEFFFFFFFE3CAD8   /* PH0 = +9.999999997947623953110e-01 */
> +        .quad 0x3DFC3527E43C565F   /* P1  = +4.104751852963940338559e-10 */
> +        .quad 0xBDFC3527E420F415   /* P2  = -4.104751852036136216697e-10 */
> +        .quad 0x3DF2CE1A8D806DAD   /* P3  = +2.736501142887952919489e-10 */
> +        .quad 0xBDE2CE1A8DDF690A   /* P4  = -1.368250573053032426141e-10 */
> +        .quad 0x3DCE169832D8BD68   /* P5  = +5.473022586854025789680e-11 */
> +        .quad 0xBDB40F0FE853DA5B   /* P6  = -1.824340550195944358477e-11 */
> +        .quad 0x3D96EA8D930D31A1   /* P7  = +5.210545794901128943676e-12 */
> +        .quad 0xBD76EA9DB0D09839   /* P8  = -1.302650427355019556441e-12 */
> +        .quad 0x3D54E474FD4303A1   /* P9  = +2.968990047962355000258e-13 */
> +        .quad 0xBD30B526CA2B228A   /* P10 = -5.935740124899435401321e-14 */
> +        .quad 0xC027000000000000   /* B = -11.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C56E8953D525FD5   /* PL0 = +4.967494994909661698725e-18 */
> +        .quad 0x3FEFFFFFFFFC2EB9   /* PH0 = +9.999999999722241073030e-01 */
> +        .quad 0x3DCE8A37A48016C2   /* P1  = +5.555177547354687971427e-11 */
> +        .quad 0xBDCE8A37A479B7D4   /* P2  = -5.555177547084873157964e-11 */
> +        .quad 0x3DC45C250CFA9C16   /* P3  = +3.703451575129414499553e-11 */
> +        .quad 0xBDB45C250D9F8467   /* P4  = -1.851725791056759260154e-11 */
> +        .quad 0x3DA049BB33CBD4E9   /* P5  = +7.406930640558963265190e-12 */
> +        .quad 0xBD85B7A407C422C1   /* P6  = -2.468976464832073512208e-12 */
> +        .quad 0x3D68CF9CED2B3FD5   /* P7  = +7.051706989348171774536e-13 */
> +        .quad 0xBD48CFAE64C352B3   /* P8  = -1.762945685274427023683e-13 */
> +        .quad 0x3D269EAE08690D52   /* P9  = +4.018091287355461204663e-14 */
> +        .quad 0xBD0216CBEAFFF5AA   /* P10 = -8.033151495672990022322e-15 */
> +        .quad 0xC029000000000000   /* B = -12.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8ACF1392B106D3   /* PL0 = +4.650601502940921454330e-17 */
> +        .quad 0x3FEFFFFFFFFF7BBD   /* PH0 = +9.999999999962408958609e-01 */
> +        .quad 0x3DA088529889B316   /* P1  = +7.518115268189742464885e-12 */
> +        .quad 0xBDA088529887F4C4   /* P2  = -7.518115268005149164680e-12 */
> +        .quad 0x3D960B18BF1DF711   /* P3  = +5.012076679213679703380e-12 */
> +        .quad 0xBD860B18BFD99A48   /* P4  = -2.506038344573564868987e-12 */
> +        .quad 0x3D71A27E7CA64143   /* P5  = +1.002419056539285288454e-12 */
> +        .quad 0xBD5783530EA76D91   /* P6  = -3.341396294294381580191e-13 */
> +        .quad 0x3D3ADCC75CBD2A03   /* P7  = +9.543447641637910477850e-14 */
> +        .quad 0xBD1ADCDA46BE5F17   /* P8  = -2.385887543769010971872e-14 */
> +        .quad 0x3CF87D77650BE5B8   /* P9  = +5.437895260471143131391e-15 */
> +        .quad 0xBCD395AE6E74C6D2   /* P10 = -1.087168847335561258239e-15 */
> +        .quad 0xC02B000000000000   /* B = -13.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C97A8A295292858   /* PL0 = +8.208271151146829171896e-17 */
> +        .quad 0x3FEFFFFFFFFFEE19   /* PH0 = +9.999999999994911847878e-01 */
> +        .quad 0x3D71E642BB008F95   /* P1  = +1.017466259229268282255e-12 */
> +        .quad 0xBD71E642BAFEEC54   /* P2  = -1.017466259207593392022e-12 */
> +        .quad 0x3D67DDAE41647741   /* P3  = +6.783108169938233581038e-13 */
> +        .quad 0xBD57DDAE4230F34B   /* P4  = -3.391554091734942426856e-13 */
> +        .quad 0x3D4317C33FAE2536   /* P5  = +1.356626669455791324801e-13 */
> +        .quad 0xBD2975040D3E26B9   /* P6  = -4.522088139411435138867e-14 */
> +        .quad 0x3D0D155DCD0F0AFB   /* P7  = +1.291565189902030307333e-14 */
> +        .quad 0xBCED157247832B20   /* P8  = -3.228947666403019234175e-15 */
> +        .quad 0x3CCA83D70F607C28   /* P9  = +7.359390959466796619024e-16 */
> +        .quad 0xBCA5343952C1E19E   /* P10 = -1.471323041436694087188e-16 */
> +        .quad 0xC02D000000000000   /* B = -14.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9B7876CBC5306E   /* PL0 = +9.530765996816607711732e-17 */
> +        .quad 0x3FEFFFFFFFFFFD93   /* PH0 = +9.999999999999310551502e-01 */
> +        .quad 0x3D436121E2640D76   /* P1  = +1.376990843765503869546e-13 */
> +        .quad 0xBD436121E26250EA   /* P2  = -1.376990843736775811281e-13 */
> +        .quad 0x3D39D6D7CA259186   /* P3  = +9.179938654047876451320e-14 */
> +        .quad 0xBD29D6D7CB0327CE   /* P4  = -4.589969336188563660531e-14 */
> +        .quad 0x3D14ABE4DC31244A   /* P5  = +1.835994545584345768382e-14 */
> +        .quad 0xBCFB8FDB82AB6BB7   /* P6  = -6.119980791767901275443e-15 */
> +        .quad 0x3CDF7CF757491B60   /* P7  = +1.747943407988343076526e-15 */
> +        .quad 0xBCBF7D0D833640FB   /* P8  = -4.369905470133249448357e-16 */
> +        .quad 0x3C9CB512F6BDC754   /* P9  = +9.959852600692493655511e-17 */
> +        .quad 0xBC76F50AB1B0E9BA   /* P10 = -1.991219205936492089091e-17 */
> +        .quad 0xC02F000000000000   /* B = -15.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6FFE15D5F78543   /* PL0 = +1.387454417328248962819e-17 */
> +        .quad 0x3FEFFFFFFFFFFFE1   /* PH0 = +9.999999999999965583086e-01 */
> +        .quad 0x3CFEE00288B99C26   /* P1  = +6.855635762864742358597e-15 */
> +        .quad 0xBCFEE0027D060EE2   /* P2  = -6.855635607998342735403e-15 */
> +        .quad 0x3CF4954AA23148A2   /* P3  = +4.570381865813341696777e-15 */
> +        .quad 0xBCE4954B5DAD3010   /* P4  = -2.285192173571711474199e-15 */
> +        .quad 0x3CD07883DD8793BD   /* P5  = +9.143109661358222028007e-16 */
> +        .quad 0xBCB5F5F4BB87ADCF   /* P6  = -3.047668447080103869032e-16 */
> +        .quad 0x3C98F1A905097685   /* P7  = +8.654183371862458774513e-17 */
> +        .quad 0xBC78F2D585007222   /* P8  = -2.163943551222030413627e-17 */
> +        .quad 0x3C58A37CC5082B5F   /* P9  = +5.342649626494471588064e-18 */
> +        .quad 0xBC33AE7917F94D17   /* P10 = -1.066938163384541013918e-18 */
> +        .quad 0xC031000000000000   /* B = -17        */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C91BF1D80474F0F   /* PL0 = +6.157069264461989135096e-17 */
> +        .quad 0x3FEFFFFFFFFFFFFE   /* PH0 = +9.999999999999997779554e-01 */
> +        .quad 0x3CB72071400E6275   /* P1  = +3.209478247225075961360e-16 */
> +        .quad 0xBCB72071400A9F37   /* P2  = -3.209478247103497434502e-16 */
> +        .quad 0x3CAED5EC39A77629   /* P3  = +2.139652050028423711308e-16 */
> +        .quad 0xBC9ED5EC3B530600   /* P4  = -1.069826028468029104719e-16 */
> +        .quad 0x3C88AB2BFED159DE   /* P5  = +4.279326904335078988705e-17 */
> +        .quad 0xBC70721D1220B3FC   /* P6  = -1.426441958074916244382e-17 */
> +        .quad 0x3C52C96049721FB8   /* P7  = +4.073700029965821523731e-18 */
> +        .quad 0xBC32C971215735DC   /* P8  = -1.018438939975201710113e-18 */
> +        .quad 0x3C112EF658AB41A9   /* P9  = +2.328791246104218830028e-19 */
> +        .quad 0xBBEB7B598C6AD3DE   /* P10 = -4.655603964908654142787e-20 */
> +        .quad 0xC03287E0C98F84E5   /* B = -18.530774 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* PL0 = +0.000000000000000000000e-01 */
> +        .quad 0x3FF0000000000000   /* PH0 = +1.000000000000000000000e+00 */
> +        .quad 0x0000000000000000   /* P1  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P2  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P3  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P4  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P5  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P6  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P7  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P8  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P9  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P10 = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* B = +0        */
> +        .quad 0x0000000000000000   /* A = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .align 16
> +        .quad 0x8000000000000000, 0x8000000000000000   /* _dbSignMask       */
> +        .align 16
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff   /* _dbAbsMask        */
> +        .align 16
> +        .long 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000           /* _iExpMantMask     */
> +        .align 16
> +        .long 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000           /* _iExpMask         */
> +        .align 16
> +        .long 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000           /* _iMinIdxOfsMask   */
> +        .align 16
> +        .long 0x00760000, 0x00760000, 0x00760000, 0x00760000           /* _iMaxIdxMask      */
> +        .align 16
> +        .type	__svml_dtanh_data_internal,@object
> +        .size	__svml_dtanh_data_internal,.-__svml_dtanh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core-sse.S
> new file mode 100644
> index 0000000000..80e85c47ec
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized tanh, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN4v_tanh _ZGVdN4v_tanh_sse_wrapper
> +#include "../svml_d_tanh4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core.c
> new file mode 100644
> index 0000000000..a26e62052b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized tanh, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN4v_tanh
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN4v_tanh, __GI__ZGVdN4v_tanh, __redirect__ZGVdN4v_tanh)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core_avx2.S
> new file mode 100644
> index 0000000000..53dda241e4
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core_avx2.S
> @@ -0,0 +1,1279 @@
> +/* Function tanh vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   NOTE: Since the hyperbolic tangent function is odd
> + *         (tanh(x) = -tanh(-x)), the algorithm below deals with the
> + *         absolute value of the argument |x|: tanh(x) = sign(x) * tanh(|x|)
> + *
> + *   We use a table lookup method to compute tanh(|x|).
> + *   The basic idea is to split the input range into a number of subintervals
> + *   and to approximate tanh(.) with a polynomial on each of them.
> + *
> + *   IEEE SPECIAL CONDITIONS:
> + *   x = [+,-]0, r = [+,-]0
> + *   x = +Inf,   r = +1
> + *   x = -Inf,   r = -1
> + *   x = QNaN,   r = QNaN
> + *   x = SNaN,   r = QNaN
> + *
> + *
> + *   ALGORITHM DETAILS
> + *   We handle special values in a callout function, separately from the
> + *   main path computations.  "Special" values for this algorithm are:
> + *   INF, NaN, |x| > HUGE_THRESHOLD
> + *
> + *
> + *   Main path computations are organized as follows:
> + *   We split the interval [0, SATURATION_THRESHOLD)
> + *   into a number of subintervals.  On each subinterval we approximate
> + *   tanh(.) with a minimax polynomial of pre-defined degree.  The polynomial
> + *   coefficients are computed beforehand and stored in a table.  We also use
> + *
> + *       y := |x| + B,
> + *
> + *   where B depends on the subinterval and is used to bring the argument
> + *   closer to zero.
> + *   We also add a large fake interval [SATURATION_THRESHOLD, HUGE_THRESHOLD]
> + *   whose stored coefficients are 1.0 + 0.0*y + 0.0*y^2 ..., just to
> + *   preserve the main path computation logic while returning 1.0 for all
> + *   arguments in that range.
> + *
> + *   Hence the reconstruction looks as follows:
> + *   we extract the polynomial and range reduction coefficients
> + *        (Pj and B) corresponding to the subinterval to which |x| belongs,
> + *        and return
> + *
> + *       r := sign(x) * (P0 + P1 * y + ... + Pn * y^n)
> + *
> + *   NOTE: we use a multiprecision technique to multiply and sum the first
> + *         K terms of the polynomial, so Pj, j = 0..K are stored in the
> + *         table each as a pair of target-precision numbers (Pj and PLj) to
> + *         achieve wider than target precision.
> + *
> + *
> + */
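
As a side note for readers of the description above: the main-path reconstruction it
describes corresponds roughly to the following scalar C sketch.  The record layout,
names and Horner loop here are illustrative assumptions only, not the exact table
format emitted below.

    #include <math.h>

    /* Hypothetical per-subinterval record: low/high parts of P0, the higher
       coefficients P1..P10 and the shift B (the real table also stores A and
       alignment padding; see the data layout later in this file).  */
    struct tanh_record
    {
      double pl0, ph0;
      double p[10];   /* P1 .. P10 */
      double b;
    };

    /* y = |x| + B, then a Horner evaluation of P0 + P1*y + ... + P10*y^10,
       with the sign of x restored at the end.  */
    static double
    tanh_by_table (double x, const struct tanh_record *t)
    {
      double y = fabs (x) + t->b;
      double r = t->p[9];                    /* P10 */
      for (int i = 8; i >= 0; i--)
        r = r * y + t->p[i];                 /* ... down to P1 */
      r = r * y + t->ph0 + t->pl0;           /* add P0 = PH0 + PL0 */
      return copysign (r, x);
    }
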
> +
> +/* Offsets for data table __svml_dtanh_data_internal
> + */
> +#define _dbP                          	0
> +#define _dbSignMask                   	7680
> +#define _dbAbsMask                    	7712
> +#define _iExpMantMask                 	7744
> +#define _iExpMask                     	7776
> +#define _iMinIdxOfsMask               	7808
> +#define _iMaxIdxMask                  	7840
> +
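
The offsets above follow from the table shape documented further down in this file:
_dbP holds 60 polynomial records of 16 quadwords (128 bytes) each, i.e. 60 * 128 =
7680 bytes, so _dbSignMask starts at 7680 and each following 32-byte vector constant
advances the offset by 32.  A trivial compile-time restatement of that arithmetic
(illustrative only):

    /* Illustrative only; mirrors the offset #defines above.  */
    enum
    {
      DBP_RECORDS = 60,          /* polynomial records in _dbP          */
      DBP_RECORD_BYTES = 16 * 8  /* 16 quadwords (128 bytes) per record */
    };

    _Static_assert (DBP_RECORDS * DBP_RECORD_BYTES == 7680,
                    "_dbSignMask offset");
    _Static_assert (7680 + 32 == 7712, "_dbAbsMask offset");
    _Static_assert (7712 + 32 == 7744, "_iExpMantMask offset");
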
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN4v_tanh_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +        lea       _dbP+96+__svml_dtanh_data_internal(%rip), %r8
> +        vmovupd   %ymm0, (%rsp)
> +
> +/* if VMIN, VMAX is defined for I type */
> +        vpxor     %xmm11, %xmm11, %xmm11
> +
> +/*  Constant loading  */
> +        vmovups   _iMaxIdxMask+__svml_dtanh_data_internal(%rip), %xmm8
> +        vandpd    _dbAbsMask+__svml_dtanh_data_internal(%rip), %ymm0, %ymm1
> +        vandpd    _dbSignMask+__svml_dtanh_data_internal(%rip), %ymm0, %ymm2
> +        vextractf128 $1, %ymm0, %xmm15
> +        vshufps   $221, %xmm15, %xmm0, %xmm14
> +
> +/* Here huge arguments, INF and NaNs are filtered out to callout. */
> +        vpand     _iExpMantMask+__svml_dtanh_data_internal(%rip), %xmm14, %xmm12
> +        vpsubd    _iMinIdxOfsMask+__svml_dtanh_data_internal(%rip), %xmm12, %xmm9
> +        vpcmpgtd  %xmm11, %xmm9, %xmm10
> +        vpcmpgtd  %xmm8, %xmm9, %xmm0
> +        vpand     %xmm10, %xmm9, %xmm7
> +        blendvps  %xmm0, %xmm8, %xmm7
> +
> +/*
> + * VSHRIMM( I, iIndex, = iIndex, (17 - 4) );
> + * VGATHER_MATRIX( L2D, p, TAB._dbP, iIndex, 0, T_ITEM_SIZE, T_ITEM_GRAN, 13, 0, 0 );
> + */
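
The generator pseudo-ops in the comment above boil down to computing, per element,
the byte offset of a 128-byte record in _dbP from the high word of |x|.  A scalar
sketch of that index computation, assuming the mask values shown in the data table
(0x7ffe0000, 0x3fbe0000, 0x00760000), might look like this:

    #include <stdint.h>
    #include <string.h>

    /* Keep the exponent plus leading mantissa bit of the high word of x,
       rebase at _iMinIdxOfsMask, clamp to [0, _iMaxIdxMask] and shift so
       the result is a multiple of 128, i.e. a record offset into _dbP.  */
    static unsigned int
    tanh_record_offset (double x)
    {
      uint64_t bits;
      memcpy (&bits, &x, sizeof (bits));
      uint32_t hi = (uint32_t) (bits >> 32) & 0x7ffe0000u; /* _iExpMantMask   */
      int32_t idx = (int32_t) (hi - 0x3fbe0000u);          /* _iMinIdxOfsMask */
      if (idx < 0)
        idx = 0;
      if (idx > 0x00760000)                                /* _iMaxIdxMask    */
        idx = 0x00760000;
      return (unsigned int) idx >> 10;       /* 0, 128, ..., 59 * 128 */
    }
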
> +        vpsrld    $10, %xmm7, %xmm6
> +        vmovd     %xmm6, %edx
> +        vpcmpgtd  _iExpMask+__svml_dtanh_data_internal(%rip), %xmm12, %xmm13
> +        vmovmskps %xmm13, %eax
> +        vpextrd   $1, %xmm6, %ecx
> +        movslq    %edx, %rdx
> +        movslq    %ecx, %rcx
> +        vpextrd   $2, %xmm6, %esi
> +        vpextrd   $3, %xmm6, %edi
> +        movslq    %esi, %rsi
> +        movslq    %edi, %rdi
> +        vmovupd   -96(%rdx,%r8), %xmm3
> +        vmovupd   -96(%rcx,%r8), %xmm4
> +        vmovupd   -80(%rcx,%r8), %xmm13
> +        vmovupd   -64(%rcx,%r8), %xmm9
> +        vmovupd   -80(%rdx,%r8), %xmm14
> +        vmovupd   -64(%rdx,%r8), %xmm10
> +        vmovupd   -48(%rdx,%r8), %xmm6
> +        vinsertf128 $1, -96(%rsi,%r8), %ymm3, %ymm0
> +        vinsertf128 $1, -96(%rdi,%r8), %ymm4, %ymm15
> +        vmovupd   -48(%rcx,%r8), %xmm3
> +        vunpckhpd %ymm15, %ymm0, %ymm0
> +        vinsertf128 $1, -80(%rsi,%r8), %ymm14, %ymm12
> +        vinsertf128 $1, -64(%rsi,%r8), %ymm10, %ymm8
> +        vinsertf128 $1, -80(%rdi,%r8), %ymm13, %ymm11
> +        vinsertf128 $1, -64(%rdi,%r8), %ymm9, %ymm7
> +        vunpcklpd %ymm11, %ymm12, %ymm15
> +        vunpckhpd %ymm11, %ymm12, %ymm14
> +        vunpcklpd %ymm7, %ymm8, %ymm13
> +        vunpckhpd %ymm7, %ymm8, %ymm12
> +        vmovupd   -32(%rdx,%r8), %xmm9
> +        vmovupd   -32(%rcx,%r8), %xmm8
> +        vinsertf128 $1, -48(%rsi,%r8), %ymm6, %ymm4
> +        vinsertf128 $1, -48(%rdi,%r8), %ymm3, %ymm5
> +        vunpcklpd %ymm5, %ymm4, %ymm11
> +        vunpckhpd %ymm5, %ymm4, %ymm10
> +        vmovupd   -16(%rdx,%r8), %xmm3
> +        vmovupd   -16(%rcx,%r8), %xmm4
> +        vinsertf128 $1, -32(%rsi,%r8), %ymm9, %ymm7
> +        vinsertf128 $1, -32(%rdi,%r8), %ymm8, %ymm6
> +        vunpcklpd %ymm6, %ymm7, %ymm9
> +        vunpckhpd %ymm6, %ymm7, %ymm8
> +        vinsertf128 $1, -16(%rsi,%r8), %ymm3, %ymm5
> +        vinsertf128 $1, -16(%rdi,%r8), %ymm4, %ymm6
> +        vunpcklpd %ymm6, %ymm5, %ymm7
> +        vunpckhpd %ymm6, %ymm5, %ymm6
> +        vmovupd   (%rdx,%r8), %xmm3
> +        vmovupd   (%rcx,%r8), %xmm5
> +        vinsertf128 $1, (%rsi,%r8), %ymm3, %ymm4
> +        vinsertf128 $1, (%rdi,%r8), %ymm5, %ymm5
> +        vunpcklpd %ymm5, %ymm4, %ymm3
> +        vaddpd    %ymm3, %ymm1, %ymm1
> +        vfmadd213pd %ymm7, %ymm1, %ymm6
> +        vfmadd213pd %ymm8, %ymm1, %ymm6
> +        vfmadd213pd %ymm9, %ymm1, %ymm6
> +        vfmadd213pd %ymm10, %ymm1, %ymm6
> +        vfmadd213pd %ymm11, %ymm1, %ymm6
> +        vfmadd213pd %ymm12, %ymm1, %ymm6
> +        vfmadd213pd %ymm13, %ymm1, %ymm6
> +        vfmadd213pd %ymm14, %ymm1, %ymm6
> +        vfmadd213pd %ymm15, %ymm1, %ymm6
> +        vfmadd213pd %ymm0, %ymm1, %ymm6
> +        vorpd     %ymm2, %ymm6, %ymm0
> +        testl     %eax, %eax
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovupd   (%rsp), %ymm1
> +        vmovupd   %ymm0, 64(%rsp)
> +        vmovupd   %ymm1, 32(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0
> +
> +        xorl      %edx, %edx
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovupd   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      tanh@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 64(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN4v_tanh_avx2)
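
The special-input path above, stripped of its register and stack bookkeeping,
amounts to the following per-lane fixup (a sketch, not the actual callout code;
the 4-element width matches this AVX2 double variant):

    #include <math.h>

    /* For every lane whose bit is set in the range mask, recompute that
       element with the scalar tanh and patch the vector result, mirroring
       the RANGEMASK_CHECK / SCALAR_MATH_CALL loop.  */
    static void
    tanh_fixup_special_lanes (const double src[4], double dst[4],
                              unsigned int mask)
    {
      for (unsigned int i = 0; i < 4; i++)
        if (mask & (1u << i))
          dst[i] = tanh (src[i]);
    }
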
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_dtanh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(32)) VUINT32 _dbP[60*16][2];
> +        __declspec(align(32)) VUINT32 _dbSignMask[4][2];
> +        __declspec(align(32)) VUINT32 _dbAbsMask[4][2];
> +        __declspec(align(32)) VUINT32 _iExpMantMask[8][1];
> +        __declspec(align(32)) VUINT32 _iExpMask[8][1];
> +        __declspec(align(32)) VUINT32 _iMinIdxOfsMask[8][1];
> +        __declspec(align(32)) VUINT32 _iMaxIdxMask[8][1];
> +} __svml_dtanh_data_internal;
> +#endif
> +__svml_dtanh_data_internal:
> +        /* Polynomial coefficients */
> +        .quad 0x0000000000000000   /* PL0 = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* PH0 = +0.000000000000000000000e-01 */
> +        .quad 0x3FF0000000000000   /* P1  = +1.000000000000000014103e+00 */
> +        .quad 0xBD197DEAD79668D3   /* P2  = -2.264132406596103056796e-14 */
> +        .quad 0xBFD555555553AF3C   /* P3  = -3.333333333273349741024e-01 */
> +        .quad 0xBE052F7CCA134846   /* P4  = -6.165791385711493738399e-10 */
> +        .quad 0x3FC11111563849D6   /* P5  = +1.333333655353061107201e-01 */
> +        .quad 0xBEB038623673FFB2   /* P6  = -9.668021563879858950855e-07 */
> +        .quad 0xBFAB9F685E64022E   /* P7  = -5.395055916051593179252e-02 */
> +        .quad 0xBF2A54E2B28F2207   /* P8  = -2.008940439550829012647e-04 */
> +        .quad 0x3F97CFB9328A230E   /* P9  = +2.325333949059698582189e-02 */
> +        .quad 0xBF75CA6D61723E02   /* P10 = -5.320002811586290441790e-03 */
> +        .quad 0x0000000000000000   /* B = +0        */
> +        .quad 0x3FF0000000000000   /* A = +1.0      */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C3708A564FAD29A   /* PL0 = +1.248663375337163807466e-18 */
> +        .quad 0x3FC0E6973998DA48   /* PH0 = +1.320370703922029154143e-01 */
> +        .quad 0x3FEF712EB25C0888   /* P1  = +9.825662120422444519229e-01 */
> +        .quad 0xBFC09B296F7C1EA9   /* P2  = -1.297351641044220078331e-01 */
> +        .quad 0xBFD3DD77541EDDA7   /* P3  = -3.103922196855485849143e-01 */
> +        .quad 0x3FB58FFCF4309615   /* P4  = +8.422833406128689275566e-02 */
> +        .quad 0x3FBD3ABE845DCF49   /* P5  = +1.141776154670967208833e-01 */
> +        .quad 0xBFA791DF538C37FA   /* P6  = -4.603479285115947936529e-02 */
> +        .quad 0xBFA4F872F69CD6E8   /* P7  = -4.095801601799370195284e-02 */
> +        .quad 0x3F9772E49EF6412B   /* P8  = +2.289921970583567527179e-02 */
> +        .quad 0x3F8CBC0807393909   /* P9  = +1.403051635784581776625e-02 */
> +        .quad 0xBF85F06A30F93319   /* P10 = -1.071246110873285040939e-02 */
> +        .quad 0xBFC1000000000000   /* B = -.132813 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6004EE5739DEAC   /* PL0 = +6.947247374112211856530e-18 */
> +        .quad 0x3FC2DC968E6E0D62   /* PH0 = +1.473568149050193398786e-01 */
> +        .quad 0x3FEF4E1E606D96DF   /* P1  = +9.782859691010478680677e-01 */
> +        .quad 0xBFC273BD70994AB9   /* P2  = -1.441571044730005866646e-01 */
> +        .quad 0xBFD382B548270D2C   /* P3  = -3.048527912726111386771e-01 */
> +        .quad 0x3FB7CD2D582A6B29   /* P4  = +9.297450449450351894400e-02 */
> +        .quad 0x3FBC1278CCCBF0DB   /* P5  = +1.096568584434324642303e-01 */
> +        .quad 0xBFA9C7F5115B86A1   /* P6  = -5.035367810138536095866e-02 */
> +        .quad 0xBFA371C21BAF618E   /* P7  = -3.797728145554222910481e-02 */
> +        .quad 0x3F9958943F68417E   /* P8  = +2.475196492201935923783e-02 */
> +        .quad 0x3F8930D5CFFD4152   /* P9  = +1.230017701132682667572e-02 */
> +        .quad 0xBF875CF7ADD31B76   /* P10 = -1.140779017658897660092e-02 */
> +        .quad 0xBFC3000000000000   /* B = -.148438 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7EABE24E052A1F   /* PL0 = +2.660321779421749543501e-17 */
> +        .quad 0x3FC4D04783618C71   /* PH0 = +1.626061812886266111366e-01 */
> +        .quad 0x3FEF2765AF97A4B3   /* P1  = +9.735592298067302883212e-01 */
> +        .quad 0xBFC443654205FEA5   /* P2  = -1.583067486171689074207e-01 */
> +        .quad 0xBFD31F2E208A5B97   /* P3  = -2.987780874040536844467e-01 */
> +        .quad 0x3FB9F235BD339878   /* P4  = +1.013520800512156573576e-01 */
> +        .quad 0x3FBAD0B0DFCCA141   /* P5  = +1.047468706498238100104e-01 */
> +        .quad 0xBFABD1B9600E608E   /* P6  = -5.433444306908184548967e-02 */
> +        .quad 0xBFA1CEBEAF07DB58   /* P7  = -3.478046309094534453598e-02 */
> +        .quad 0x3F9AFC9FB1D8EFD2   /* P8  = +2.635430834764902126383e-02 */
> +        .quad 0x3F8573444F1AB502   /* P9  = +1.047376028449287564018e-02 */
> +        .quad 0xBF8874FBC8F24406   /* P10 = -1.194187838544459322219e-02 */
> +        .quad 0xBFC5000000000000   /* B = -.164063 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7FB199D361A790   /* PL0 = +2.748994907060158996213e-17 */
> +        .quad 0x3FC6C170259E21F7   /* PH0 = +1.777782615356639783766e-01 */
> +        .quad 0x3FEEFD17479F7C65   /* P1  = +9.683948897253570478266e-01 */
> +        .quad 0xBFC609530FE4DF8D   /* P2  = -1.721595599753950294577e-01 */
> +        .quad 0xBFD2B3465D71B4DE   /* P3  = -2.921920692959484052676e-01 */
> +        .quad 0x3FBBFD2D34AC509B   /* P4  = +1.093319181057403192166e-01 */
> +        .quad 0x3FB9778C3C16A0FE   /* P5  = +9.948040453912551395183e-02 */
> +        .quad 0xBFADAC4D9E63C665   /* P6  = -5.795519407719210697372e-02 */
> +        .quad 0xBFA0139CCAD02D60   /* P7  = -3.139963126894929339124e-02 */
> +        .quad 0x3F9C5BF43BA6F19D   /* P8  = +2.769452680671379432854e-02 */
> +        .quad 0x3F8190B703350341   /* P9  = +8.576803002712575184772e-03 */
> +        .quad 0xBF8936606782858A   /* P10 = -1.231074634444230850234e-02 */
> +        .quad 0xBFC7000000000000   /* B = -.179688 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6A917CA3624D50   /* PL0 = +1.152216693509785660691e-17 */
> +        .quad 0x3FC8AFD7B974FABB   /* PH0 = +1.928662925292508878439e-01 */
> +        .quad 0x3FEECF47624A5D03   /* P1  = +9.628025932060214187231e-01 */
> +        .quad 0xBFC7C4C2CB4FDE4D   /* P2  = -1.856921665891938814679e-01 */
> +        .quad 0xBFD23F69CB2C1F9D   /* P3  = -2.851204380135586155453e-01 */
> +        .quad 0x3FBDEC5703A03814   /* P4  = +1.168875106670557712458e-01 */
> +        .quad 0x3FB8095003D0CF15   /* P5  = +9.389209836154706616487e-02 */
> +        .quad 0xBFAF554B47B10CBB   /* P6  = -6.119761705533607365968e-02 */
> +        .quad 0xBF9C89743FE7BC1B   /* P7  = -2.786809577986213853937e-02 */
> +        .quad 0x3F9D74725B746E7C   /* P8  = +2.876452143855921824991e-02 */
> +        .quad 0x3F7B2D8AFB70B88C   /* P9  = +6.635229968237631511880e-03 */
> +        .quad 0xBF89A0A2883EF6CB   /* P10 = -1.251341799058582545252e-02 */
> +        .quad 0xBFC9000000000000   /* B = -.195313 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7608279E8609CB   /* PL0 = +1.910958764623660748269e-17 */
> +        .quad 0x3FCA9B46D2DDC5E3   /* PH0 = +2.078636674519166172015e-01 */
> +        .quad 0x3FEE9E0BB72A01A1   /* P1  = +9.567926957534390123919e-01 */
> +        .quad 0xBFC974FAD10C5330   /* P2  = -1.988824387305156976885e-01 */
> +        .quad 0xBFD1C40ACCBA4044   /* P3  = -2.775904654781735703430e-01 */
> +        .quad 0x3FBFBE24E2987853   /* P4  = +1.239951184474830487522e-01 */
> +        .quad 0x3FB6885B4345E47F   /* P5  = +8.801813499839460539687e-02 */
> +        .quad 0xBFB06563D5670584   /* P6  = -6.404708824176991770896e-02 */
> +        .quad 0xBF98CD1D620DF6E2   /* P7  = -2.421995078065365147772e-02 */
> +        .quad 0x3F9E44EF3E844D21   /* P8  = +2.955983943054463683119e-02 */
> +        .quad 0x3F7325FA0148CAAE   /* P9  = +4.674889165971292322643e-03 */
> +        .quad 0xBF89B4C8556C2D92   /* P10 = -1.255184660614964011319e-02 */
> +        .quad 0xBFCB000000000000   /* B = -.210938 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6F19DAA20F51D5   /* PL0 = +1.348790537832000351176e-17 */
> +        .quad 0x3FCC83876CA98E15   /* PH0 = +2.227639465883021474557e-01 */
> +        .quad 0x3FEE697B662D07CD   /* P1  = +9.503762241004040620296e-01 */
> +        .quad 0xBFCB194C7ED76ACF   /* P2  = -2.117095584242946953999e-01 */
> +        .quad 0xBFD141A19E419762   /* P3  = -2.696308179350720680191e-01 */
> +        .quad 0x3FC0B89C64BC7B98   /* P4  = +1.306338779331468503007e-01 */
> +        .quad 0x3FB4F721150BBFC5   /* P5  = +8.189589275184434216748e-02 */
> +        .quad 0xBFB105AAFAB87898   /* P6  = -6.649273511036069461061e-02 */
> +        .quad 0xBF94FB3B31248C01   /* P7  = -2.048962104266749732921e-02 */
> +        .quad 0x3F9ECD31E588709C   /* P8  = +3.007963145692880855964e-02 */
> +        .quad 0x3F664A91A335C105   /* P9  = +2.721104095762541127495e-03 */
> +        .quad 0xBF89754E32E1E26E   /* P10 = -1.243077366619723806134e-02 */
> +        .quad 0xBFCD000000000000   /* B = -.226563 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6AC6C889D8111D   /* PL0 = +1.161245469312620769170e-17 */
> +        .quad 0x3FCE6864FE55A3D0   /* PH0 = +2.375608674877001114112e-01 */
> +        .quad 0x3FEE31AEE116B82B   /* P1  = +9.435648342384913826391e-01 */
> +        .quad 0xBFCCB114B69E808B   /* P2  = -2.241540805525839833707e-01 */
> +        .quad 0xBFD0B8AB913BA99D   /* P3  = -2.612713735858507980441e-01 */
> +        .quad 0x3FC1823322BED48A   /* P4  = +1.367858810096190233514e-01 */
> +        .quad 0x3FB35822B7929893   /* P5  = +7.556359273675842651653e-02 */
> +        .quad 0xBFB18B03CC78D2DA   /* P6  = -6.852744810096158580830e-02 */
> +        .quad 0xBF911CCC3C8D5E5D   /* P7  = -1.671141738492420009734e-02 */
> +        .quad 0x3F9F0DEC2D99B12F   /* P8  = +3.032654789278515819797e-02 */
> +        .quad 0x3F4A28398B4EBD98   /* P9  = +7.982521989244205404918e-04 */
> +        .quad 0xBF88E60CB2FAB9A4   /* P10 = -1.215753480150000985458e-02 */
> +        .quad 0xBFCF000000000000   /* B = -.242188 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C89D2B6774FB61D   /* PL0 = +4.479593208720169247958e-17 */
> +        .quad 0x3FD09C744F539BE4   /* PH0 = +2.595492148088267558848e-01 */
> +        .quad 0x3FEDD823B0400D42   /* P1  = +9.326342050921214825882e-01 */
> +        .quad 0xBFCEFBF7FF305FCC   /* P2  = -2.420644756355144687086e-01 */
> +        .quad 0xBFCFC01DC4F24A41   /* P3  = -2.480504237797323303990e-01 */
> +        .quad 0x3FC291A2C26D5548   /* P4  = +1.450694512701977626753e-01 */
> +        .quad 0x3FB0D562E672D188   /* P5  = +6.575601698097532991976e-02 */
> +        .quad 0xBFB2201ECC119E06   /* P6  = -7.080261690281738261872e-02 */
> +        .quad 0xBF8695D50F778D31   /* P7  = -1.102796987010509974642e-02 */
> +        .quad 0x3F9EEC8CFBC031A0   /* P8  = +3.019924437107734972427e-02 */
> +        .quad 0xBF6030F0A4D3660A   /* P9  = -1.976461417694923328722e-03 */
> +        .quad 0xBF87845288A4AEF5   /* P10 = -1.148285369398347838494e-02 */
> +        .quad 0xBFD1000000000000   /* B = -.265625 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8B6AAB614D1C8D   /* PL0 = +4.756035418366735312727e-17 */
> +        .quad 0x3FD275F7E1CF7F63   /* PH0 = +2.884502129727392616410e-01 */
> +        .quad 0x3FED56658F74C9CC   /* P1  = +9.167964746359813351341e-01 */
> +        .quad 0xBFD0ECC045EBD596   /* P2  = -2.644501383614054083635e-01 */
> +        .quad 0xBFCD5A4BDE179180   /* P3  = -2.293181261476426808811e-01 */
> +        .quad 0x3FC3C00047D34767   /* P4  = +1.542969084462655120552e-01 */
> +        .quad 0x3FAAC7CE84FD609F   /* P5  = +5.230565427217581251974e-02 */
> +        .quad 0xBFB288948D2E8B43   /* P6  = -7.239654967137902384931e-02 */
> +        .quad 0xBF6D6605AAD5A1C0   /* P7  = -3.588687008847041164896e-03 */
> +        .quad 0x3F9DDB0790848E97   /* P8  = +2.915584392134337382866e-02 */
> +        .quad 0xBF75FDE291BAD5B4   /* P9  = -5.369076763306269573660e-03 */
> +        .quad 0xBF84CEA5C52E0A78   /* P10 = -1.015977390284671071888e-02 */
> +        .quad 0xBFD3000000000000   /* B = -.296875 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7139A81C8A6ECF   /* PL0 = +1.494049799478574591322e-17 */
> +        .quad 0x3FD4470650036407   /* PH0 = +3.168350011233659890841e-01 */
> +        .quad 0x3FECC9A69DFDDD48   /* P1  = +8.996155820631566629678e-01 */
> +        .quad 0xBFD23DED3A37A09F   /* P2  = -2.850297039535778028925e-01 */
> +        .quad 0xBFCAD302395D51C1   /* P3  = -2.095644741153943890185e-01 */
> +        .quad 0x3FC4A8FE3F309C22   /* P4  = +1.614072617096278705115e-01 */
> +        .quad 0x3FA3D161188AA436   /* P5  = +3.870681213931741151586e-02 */
> +        .quad 0xBFB288CFE5494E98   /* P6  = -7.240008685885823969403e-02 */
> +        .quad 0x3F6C7903EED8D334   /* P7  = +3.475673371918475361081e-03 */
> +        .quad 0x3F9BE023CDFB02F6   /* P8  = +2.722221321778569498033e-02 */
> +        .quad 0xBF80F8296F2C3A95   /* P9  = -8.285831170295390358336e-03 */
> +        .quad 0xBF8152DF4790049B   /* P10 = -8.458847400108650973189e-03 */
> +        .quad 0xBFD5000000000000   /* B = -.328125 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7751FE0FEE8335   /* PL0 = +2.022712113430213599928e-17 */
> +        .quad 0x3FD60EF7120502A9   /* PH0 = +3.446633983585721261456e-01 */
> +        .quad 0x3FEC32D951E56E6F   /* P1  = +8.812071418319202070776e-01 */
> +        .quad 0xBFD370255FC004F8   /* P2  = -3.037198481616338996824e-01 */
> +        .quad 0xBFC832F0EBC6BB41   /* P3  = -1.890545989276351359107e-01 */
> +        .quad 0x3FC54C99A0FF432F   /* P4  = +1.664001499289269127540e-01 */
> +        .quad 0x3F99DAC0CC283C18   /* P5  = +2.524853941036661688369e-02 */
> +        .quad 0xBFB227B3896A026D   /* P6  = -7.091829399906553280461e-02 */
> +        .quad 0x3F84663364E1FB19   /* P7  = +9.960557476231411602383e-03 */
> +        .quad 0x3F9922D70DE07C57   /* P8  = +2.454696676442965935283e-02 */
> +        .quad 0xBF85C4A4EB6F86BC   /* P9  = -1.062897532932837635222e-02 */
> +        .quad 0xBF7AAB61214FFE17   /* P10 = -6.511096396024671890972e-03 */
> +        .quad 0xBFD7000000000000   /* B = -.359375 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3BFE67F266843B2C   /* PL0 = +1.030196791298162288777e-19 */
> +        .quad 0x3FD7CD3115FC0F16   /* PH0 = +3.718989100163850869407e-01 */
> +        .quad 0x3FEB92F96CCC2C5B   /* P1  = +8.616912007286247079761e-01 */
> +        .quad 0xBFD4827320135092   /* P2  = -3.204620183216856200247e-01 */
> +        .quad 0xBFC582B15550168A   /* P3  = -1.680509249273891977521e-01 */
> +        .quad 0x3FC5AC3B9A2E4C31   /* P4  = +1.693186285816366254244e-01 */
> +        .quad 0x3F88FA599FCADAFB   /* P5  = +1.219625491044728129762e-02 */
> +        .quad 0xBFB16EC8F5CA169E   /* P6  = -6.809669495313605642174e-02 */
> +        .quad 0x3F90140EFC748BBE   /* P7  = +1.570151725639922719844e-02 */
> +        .quad 0x3F95CFC49C1A28DC   /* P8  = +2.130038454792147768770e-02 */
> +        .quad 0xBF8946ED8B1BF454   /* P9  = -1.234231549050882816697e-02 */
> +        .quad 0xBF7239E55C1DD50F   /* P10 = -4.449745117985472755606e-03 */
> +        .quad 0xBFD9000000000000   /* B = -.390625 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6412330191189C   /* PL0 = +8.704448096175471149661e-18 */
> +        .quad 0x3FD9812B3B03F0A5   /* PH0 = +3.985088421175169703936e-01 */
> +        .quad 0x3FEAEB08C3C0E84D   /* P1  = +8.411907027541559254748e-01 */
> +        .quad 0xBFD57446B1BC46CF   /* P2  = -3.352219329545790787820e-01 */
> +        .quad 0xBFC2CA9ABC0444AD   /* P3  = -1.468079965639267634401e-01 */
> +        .quad 0x3FC5CA95F9460D18   /* P4  = +1.702449290424759093710e-01 */
> +        .quad 0xBF2C2DAA35DD05C3   /* P5  = -2.149839664813813012186e-04 */
> +        .quad 0xBFB069A516EEB75D   /* P6  = -6.411201295733578195472e-02 */
> +        .quad 0x3F9512716416FDC7   /* P7  = +2.057816670798986720058e-02 */
> +        .quad 0x3F921630CB1319A3   /* P8  = +1.766277541607908852593e-02 */
> +        .quad 0xBF8B76DA2EC99526   /* P9  = -1.341028647693549562145e-02 */
> +        .quad 0xBF63A97474A161E4   /* P10 = -2.400138332671485493040e-03 */
> +        .quad 0xBFDB000000000000   /* B = -.421875 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C89B79F5783381C   /* PL0 = +4.461236087774530799537e-17 */
> +        .quad 0x3FDB2A6C993B829D   /* PH0 = +4.244643684778937609003e-01 */
> +        .quad 0x3FEA3C0C1FBA328C   /* P1  = +8.198299998926627915155e-01 */
> +        .quad 0xBFD6457212F78DE0   /* P2  = -3.479886231636708581604e-01 */
> +        .quad 0xBFC0129BDA380A66   /* P3  = -1.255678954622282824818e-01 */
> +        .quad 0x3FC5AB77F388FBDE   /* P4  = +1.692953051696965507089e-01 */
> +        .quad 0xBF8822F3A6CADB7C   /* P5  = -1.178541519889874597783e-02 */
> +        .quad 0xBFAE4A876370A4BD   /* P6  = -5.916236008517603590739e-02 */
> +        .quad 0x3F991A89BC3B7710   /* P7  = +2.451529704455085335710e-02 */
> +        .quad 0x3F8C4A4328204D4B   /* P8  = +1.381351915555364098800e-02 */
> +        .quad 0xBF8C5F921D01EC0B   /* P9  = -1.385416174911393178490e-02 */
> +        .quad 0xBF3EE844C5B79FB8   /* P10 = -4.716079617694784908234e-04 */
> +        .quad 0xBFDD000000000000   /* B = -.453125 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C73FA437AD7AD87   /* PL0 = +1.732779905745858845932e-17 */
> +        .quad 0x3FDCC88C9902CF45   /* PH0 = +4.497405523536495697279e-01 */
> +        .quad 0x3FE9870845162D1D   /* P1  = +7.977334355686341748810e-01 */
> +        .quad 0xBFD6F62358F73DA8   /* P2  = -3.587730759436120677668e-01 */
> +        .quad 0xBFBAC4345D675FE1   /* P3  = -1.045563438450467661101e-01 */
> +        .quad 0x3FC5539DA8287019   /* P4  = +1.666142531474868131862e-01 */
> +        .quad 0xBF96E3E0DC04A09F   /* P5  = -2.235366194614185212822e-02 */
> +        .quad 0xBFAB5EC7147C207D   /* P6  = -5.345747113284546871398e-02 */
> +        .quad 0x3F9C24166FFA7A58   /* P7  = +2.748141344511120915667e-02 */
> +        .quad 0x3F8451B907819844   /* P8  = +9.921498815128277696693e-03 */
> +        .quad 0xBF8C1C6D19191FCB   /* P9  = -1.372609360545586670239e-02 */
> +        .quad 0x3F547372DF72E35A   /* P10 = +1.248228245272117756098e-03 */
> +        .quad 0xBFDF000000000000   /* B = -.484375 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C848FE06EE49950   /* PL0 = +3.566941590788961528958e-17 */
> +        .quad 0x3FDF20211A36475D   /* PH0 = +4.863360172249622803697e-01 */
> +        .quad 0x3FE86E67E6B80AC2   /* P1  = +7.634772783497611574659e-01 */
> +        .quad 0xBFD7C37C55474D9B   /* P2  = -3.713064987943767913461e-01 */
> +        .quad 0xBFB2EBF15F3CB036   /* P3  = -7.391270232318521952684e-02 */
> +        .quad 0x3FC4718C8EF6E3AA   /* P4  = +1.597152422016539530950e-01 */
> +        .quad 0xBFA277F8394E9B07   /* P5  = -3.607154559658991932071e-02 */
> +        .quad 0xBFA680312AB207E3   /* P6  = -4.394677778419955009224e-02 */
> +        .quad 0x3F9EDC9A8B57E286   /* P7  = +3.013841128810892143223e-02 */
> +        .quad 0x3F71B8C5E648EAF6   /* P8  = +4.326603932492947851719e-03 */
> +        .quad 0xBF89DB218356730C   /* P9  = -1.262499029217558458029e-02 */
> +        .quad 0x3F6B05728E6EBC8E   /* P10 = +3.298496001171330815865e-03 */
> +        .quad 0xBFE1000000000000   /* B = -.53125  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8429831EDD94DE   /* PL0 = +3.497576705878673192147e-17 */
> +        .quad 0x3FE10AF47E0BF610   /* PH0 = +5.325872861719194162333e-01 */
> +        .quad 0x3FE6EC5879F87EEE   /* P1  = +7.163507826080299761242e-01 */
> +        .quad 0xBFD86AD001BFE200   /* P2  = -3.815193192563413204129e-01 */
> +        .quad 0xBFA239045B661385   /* P3  = -3.559125533778398983564e-02 */
> +        .quad 0x3FC2B4572D9CC147   /* P4  = +1.461285565105845078038e-01 */
> +        .quad 0xBFA99F4F01740705   /* P5  = -5.004355328311586406115e-02 */
> +        .quad 0xBF9F449C484F4879   /* P6  = -3.053516570418721511214e-02 */
> +        .quad 0x3F9F5F42169D7DDE   /* P7  = +3.063681853325116830798e-02 */
> +        .quad 0xBF6111B1BA632A97   /* P8  = -2.083632588527460989469e-03 */
> +        .quad 0xBF84725FBE5B6E61   /* P9  = -9.983776089419639342530e-03 */
> +        .quad 0x3F7438A2986CFA9C   /* P10 = +4.936823976832951342488e-03 */
> +        .quad 0xBFE3000000000000   /* B = -.59375  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6BE9160BFB3505   /* PL0 = +1.210424670976053242391e-17 */
> +        .quad 0x3FE26D76F73233C7   /* PH0 = +5.758623912857893101247e-01 */
> +        .quad 0x3FE56363B5B93937   /* P1  = +6.683825063026124740752e-01 */
> +        .quad 0xBFD8A2244B27297E   /* P2  = -3.848963483730115724200e-01 */
> +        .quad 0xBF52CA2F101EEF63   /* P3  = -1.146837196286797844817e-03 */
> +        .quad 0x3FC081BC342243AD   /* P4  = +1.289592032012739958675e-01 */
> +        .quad 0xBFAE38DB4A932344   /* P5  = -5.902753148399722719732e-02 */
> +        .quad 0xBF91F814D4AE90C6   /* P6  = -1.754791782481459457885e-02 */
> +        .quad 0x3F9D056AE193C4F3   /* P7  = +2.834097863973723355792e-02 */
> +        .quad 0xBF7BD0B502D8F3A0   /* P8  = -6.790835451792626336974e-03 */
> +        .quad 0xBF7B763F7BB8AE2F   /* P9  = -6.704566938008179114124e-03 */
> +        .quad 0x3F76036F42D9AB69   /* P10 = +5.374369252971835729099e-03 */
> +        .quad 0xBFE5000000000000   /* B = -.65625  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8B64AF0450486E   /* PL0 = +4.751979286662385162741e-17 */
> +        .quad 0x3FE3B75F8BCB742D   /* PH0 = +6.161344271055263499548e-01 */
> +        .quad 0x3FE3DA23BC12369F   /* P1  = +6.203783677353447780947e-01 */
> +        .quad 0xBFD8768FF4B46416   /* P2  = -3.822364701932782367281e-01 */
> +        .quad 0x3F9D67CB8AD9CB1A   /* P3  = +2.871625933625941117406e-02 */
> +        .quad 0x3FBC168CB7827DF4   /* P4  = +1.097190807363331305006e-01 */
> +        .quad 0xBFB03A2B83C9272E   /* P5  = -6.338760344911228324430e-02 */
> +        .quad 0xBF789FEB595297DC   /* P6  = -6.011885959344067548074e-03 */
> +        .quad 0x3F98BD01B4C335E7   /* P7  = +2.415850320612902513532e-02 */
> +        .quad 0xBF83BADC303D6535   /* P8  = -9.633751127398152979976e-03 */
> +        .quad 0xBF6C54E7A1C1E3F3   /* P9  = -3.458454519258407989501e-03 */
> +        .quad 0x3F7408394B7EF3E7   /* P10 = +4.890655334688332484537e-03 */
> +        .quad 0xBFE7000000000000   /* B = -.71875  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6A48557F6E0D3E   /* PL0 = +1.139824111505584215867e-17 */
> +        .quad 0x3FE4E8D895B010DC   /* PH0 = +6.534235881413468227663e-01 */
> +        .quad 0x3FE25652FAAF8A73   /* P1  = +5.730376144604875448991e-01 */
> +        .quad 0xBFD7F6C3A57C444B   /* P2  = -3.744362941807295084434e-01 */
> +        .quad 0x3FAB7866E3F99EBE   /* P3  = +5.365296872042567001598e-02 */
> +        .quad 0x3FB6FA1DF47CCD40   /* P4  = +8.975398272450707099784e-02 */
> +        .quad 0xBFB05508D3741B8E   /* P5  = -6.379752314033580026840e-02 */
> +        .quad 0x3F6C3EFDF7BB279C   /* P6  = +3.448005705512137236209e-03 */
> +        .quad 0x3F9372BADD6D3E27   /* P7  = +1.899234749299530050806e-02 */
> +        .quad 0xBF860FD5AE65F3DA   /* P8  = -1.077238977881649471165e-02 */
> +        .quad 0xBF47266FFB07E628   /* P9  = -7.064863949032872448118e-04 */
> +        .quad 0x3F6F9763992C2A05   /* P10 = +3.856367614735181120799e-03 */
> +        .quad 0xBFE9000000000000   /* B = -.78125  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6BB6A2B194E3AB   /* PL0 = +1.201878007209462528697e-17 */
> +        .quad 0x3FE602609AAE7C22   /* PH0 = +6.877902051090851731630e-01 */
> +        .quad 0x3FE0DCBAFE191C7F   /* P1  = +5.269446337560025312137e-01 */
> +        .quad 0xBFD732028428A9FB   /* P2  = -3.624273577321727538225e-01 */
> +        .quad 0x3FB2D92389BE065B   /* P3  = +7.362577545975439796588e-02 */
> +        .quad 0x3FB1F6A9C8C49993   /* P4  = +7.017003203927733370937e-02 */
> +        .quad 0xBFAF47C0B50B56EE   /* P5  = -6.109430513394707378526e-02 */
> +        .quad 0x3F85A8EDD1356223   /* P6  = +1.057611269668352068104e-02 */
> +        .quad 0x3F8BE05C5CD1B4FA   /* P7  = +1.361152799855823798207e-02 */
> +        .quad 0xBF85A0EFE4552F76   /* P8  = -1.056086936537046752272e-02 */
> +        .quad 0x3F559F2A6A356194   /* P9  = +1.319686337259627831943e-03 */
> +        .quad 0x3F6576F5E989208D   /* P10 = +2.620201394425042596201e-03 */
> +        .quad 0xBFEB000000000000   /* B = -.84375  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C80328BD86C8B74   /* PL0 = +2.809809047161267929701e-17 */
> +        .quad 0x3FE704BB1B7FCB81   /* PH0 = +7.193275010198335595035e-01 */
> +        .quad 0x3FDEE264AAD6C40C   /* P1  = +4.825679462765613089739e-01 */
> +        .quad 0xBFD637493CE659F1   /* P2  = -3.471243948673921548357e-01 */
> +        .quad 0x3FB6BE3A3DEE6F4A   /* P3  = +8.884014141079635303208e-02 */
> +        .quad 0x3FAA85EB6470AC0F   /* P4  = +5.180297471118688523488e-02 */
> +        .quad 0xBFACC0146EA4858D   /* P5  = -5.615295267694895314457e-02 */
> +        .quad 0x3F8F8FB683CDDAC5   /* P6  = +1.541082944616557159055e-02 */
> +        .quad 0x3F819515DEE2CB91   /* P7  = +8.585139145315585602547e-03 */
> +        .quad 0xBF834E45E6AF9EA1   /* P8  = -9.426637747267209169415e-03 */
> +        .quad 0x3F65250F197CA56D   /* P9  = +2.581147662472352252568e-03 */
> +        .quad 0x3F57A766026D036C   /* P10 = +1.443719500187702367690e-03 */
> +        .quad 0xBFED000000000000   /* B = -.90625  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C716F7EEF7B61AD   /* PL0 = +1.512291215142578135651e-17 */
> +        .quad 0x3FE7F0E1A4CD846E   /* PH0 = +7.481544703297353660076e-01 */
> +        .quad 0x3FDC2D4CC872DC09   /* P1  = +4.402648885256331012598e-01 */
> +        .quad 0xBFD514A99F92ED53   /* P2  = -3.293861444796750250530e-01 */
> +        .quad 0x3FB9846A6CF2F337   /* P3  = +9.967675361526749494844e-02 */
> +        .quad 0x3FA20896939AB161   /* P4  = +3.522177268800664413493e-02 */
> +        .quad 0xBFA97E801F31EE0D   /* P5  = -4.979324703978358553405e-02 */
> +        .quad 0x3F92A11F47B82085   /* P6  = +1.819275737037219740638e-02 */
> +        .quad 0x3F717D70FE289C34   /* P7  = +4.270020845559097605514e-03 */
> +        .quad 0xBF7FDCF1D3F6CE2D   /* P8  = -7.779068604054678540132e-03 */
> +        .quad 0x3F69F607E81AF6B6   /* P9  = +3.169074480722534625181e-03 */
> +        .quad 0x3F3F925C80D0F889   /* P10 = +4.817462766516585511824e-04 */
> +        .quad 0xBFEF000000000000   /* B = -.96875  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C931A11D7E8606E   /* PL0 = +6.627280241435322692188e-17 */
> +        .quad 0x3FE92BFB370D9B71   /* PH0 = +7.866188121086975515439e-01 */
> +        .quad 0x3FD866160E454111   /* P1  = +3.812308444367014680480e-01 */
> +        .quad 0xBFD33149F3801DBA   /* P2  = -2.998833539899937679796e-01 */
> +        .quad 0x3FBBDB6D4C949899   /* P3  = +1.088169395412442909023e-01 */
> +        .quad 0x3F8D6AB2A74B9343   /* P4  = +1.436366627735597372494e-02 */
> +        .quad 0xBFA404D1047C5D72   /* P5  = -3.909924678571997970917e-02 */
> +        .quad 0x3F93C47D9ACCD919   /* P6  = +1.930423981976856424661e-02 */
> +        .quad 0xBF41B755642CFF1B   /* P7  = -5.406538915408738478158e-04 */
> +        .quad 0xBF74B5301AA1E788   /* P8  = -5.055606752756853900641e-03 */
> +        .quad 0x3F69A84C5B2A3E68   /* P9  = +3.132008679422249529120e-03 */
> +        .quad 0xBF3CF47830328C11   /* P10 = -4.418176105877589308931e-04 */
> +        .quad 0xBFF1000000000000   /* B = -1.0625   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C884D471B8FD396   /* PL0 = +4.215701792312937090514e-17 */
> +        .quad 0x3FEA8DBCBC31897A   /* PH0 = +8.298019099859594849278e-01 */
> +        .quad 0x3FD3EE730537C8EA   /* P1  = +3.114287901836535219818e-01 */
> +        .quad 0xBFD08A05AD27CE32   /* P2  = -2.584242049190123217982e-01 */
> +        .quad 0x3FBC5255406F84B6   /* P3  = +1.106313021005175045399e-01 */
> +        .quad 0xBF772FA2F633AA5E   /* P4  = -5.660664147607434209241e-03 */
> +        .quad 0xBF99DD8E4C473FC4   /* P5  = -2.525923100057504533247e-02 */
> +        .quad 0x3F9183C935B6495D   /* P6  = +1.710428610165003372069e-02 */
> +        .quad 0xBF70471A3A591480   /* P7  = -3.974058583087303228038e-03 */
> +        .quad 0xBF603DDD4DEBB9A4   /* P8  = -1.982624278176818987264e-03 */
> +        .quad 0x3F62591E44D3C17F   /* P9  = +2.239760512218135956425e-03 */
> +        .quad 0xBF4C195D3A9B1AB4   /* P10 = -8.575158328419569430544e-04 */
> +        .quad 0xBFF3000000000000   /* B = -1.1875   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C90DD1C9BFF7F64   /* PL0 = +5.850777430004479798187e-17 */
> +        .quad 0x3FEBAD50A4A68BC1   /* PH0 = +8.649066177207417327466e-01 */
> +        .quad 0x3FD01FBA72CEE1A5   /* P1  = +2.519365426228666233893e-01 */
> +        .quad 0xBFCBE432F647C4D6   /* P2  = -2.179015829602010702633e-01 */
> +        .quad 0x3FBABF92B6E5AC73   /* P3  = +1.044856735731387955105e-01 */
> +        .quad 0xBF922983AA24E217   /* P4  = -1.773648954369563555378e-02 */
> +        .quad 0xBF8C72214C14E23A   /* P5  = -1.388956082756564056328e-02 */
> +        .quad 0x3F8ACB4D1F388E8B   /* P6  = +1.308307887581540972153e-02 */
> +        .quad 0xBF740EF8B4A2EE3B   /* P7  = -4.897090441029978580995e-03 */
> +        .quad 0xBF0EA9F30C8DC900   /* P8  = -5.848668076326342477133e-05 */
> +        .quad 0x3F53CC40D18713AE   /* P9  = +1.208365725788622757410e-03 */
> +        .quad 0xBF4848B86029CBA1   /* P10 = -7.410908004444779592485e-04 */
> +        .quad 0xBFF5000000000000   /* B = -1.3125   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8FB61781D22681   /* PL0 = +5.501032995458057064843e-17 */
> +        .quad 0x3FEC950A3340C8BF   /* PH0 = +8.931933404003514764824e-01 */
> +        .quad 0x3FC9E1DFFD385423   /* P1  = +2.022056566644617586005e-01 */
> +        .quad 0xBFC71E2FF88EBA23   /* P2  = -1.806087459239772032583e-01 */
> +        .quad 0x3FB80AEBD07AB5BA   /* P3  = +9.391664352252506838449e-02 */
> +        .quad 0xBF98404E27EAE6ED   /* P4  = -2.368280523908243895884e-02 */
> +        .quad 0xBF772DA520B5006E   /* P5  = -5.658764868087568802107e-03 */
> +        .quad 0x3F824C9268AF9423   /* P6  = +8.935111827620250551925e-03 */
> +        .quad 0xBF722AE76D206AE3   /* P7  = -4.435447701349490160113e-03 */
> +        .quad 0x3F4B807F56298D5E   /* P8  = +8.392926941493230644497e-04 */
> +        .quad 0x3F3D71027DF95D2A   /* P9  = +4.492407879061627603159e-04 */
> +        .quad 0xBF3EBD17676755FB   /* P10 = -4.690343988874298905483e-04 */
> +        .quad 0xBFF7000000000000   /* B = -1.4375   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C95393C63CE8224   /* PL0 = +7.363407705201031038415e-17 */
> +        .quad 0x3FED4E6F464286B0   /* PH0 = +9.158245441687622445670e-01 */
> +        .quad 0x3FC4A45842B7DE1E   /* P1  = +1.612654042980787191461e-01 */
> +        .quad 0xBFC2E7885AFDD3D0   /* P2  = -1.476908153814791087327e-01 */
> +        .quad 0x3FB4DD6DD51D3FEB   /* P3  = +8.150373890862254580204e-02 */
> +        .quad 0xBF9A05D3ADAB489C   /* P4  = -2.541285274021075503042e-02 */
> +        .quad 0xBF3459B643B4995C   /* P5  = -3.105230313899165257622e-04 */
> +        .quad 0x3F766B30745F2E3A   /* P6  = +5.473317409222350365811e-03 */
> +        .quad 0xBF6C2C891E555BDF   /* P7  = -3.439204988051155730940e-03 */
> +        .quad 0x3F5194F30D6C576D   /* P8  = +1.073109966176012791522e-03 */
> +        .quad 0x3EF4DBB43C3132A2   /* P9  = +1.989194766975849961365e-05 */
> +        .quad 0xBF2E45EBAB3C15A0   /* P10 = -2.309656316514087783666e-04 */
> +        .quad 0xBFF9000000000000   /* B = -1.5625   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C75111669651DAA   /* PL0 = +1.827249135453834384396e-17 */
> +        .quad 0x3FEDE1EB5937518F   /* PH0 = +9.338280432225917193634e-01 */
> +        .quad 0x3FC06129C7C8EBB1   /* P1  = +1.279651856910653382507e-01 */
> +        .quad 0xBFBE9763041064E1   /* P2  = -1.194974789545031421774e-01 */
> +        .quad 0x3FB1A5B9F9113928   /* P3  = +6.893503504509068635308e-02 */
> +        .quad 0xBF992145039F9AFE   /* P4  = -2.454097590080105816526e-02 */
> +        .quad 0x3F66CB116EA49C89   /* P5  = +2.782377288116648315142e-03 */
> +        .quad 0x3F67F972FDF30001   /* P6  = +2.926563829163342740100e-03 */
> +        .quad 0xBF63A7B5975F02F3   /* P7  = -2.399305983061922438601e-03 */
> +        .quad 0x3F4FDE7B8777F4C8   /* P8  = +9.725669069095216373599e-04 */
> +        .quad 0xBF25918876626BA4   /* P9  = -1.645545082212515656240e-04 */
> +        .quad 0xBF1495123C991F00   /* P10 = -7.851527984669912693674e-05 */
> +        .quad 0xBFFB000000000000   /* B = -1.6875   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9F29A5B7426D27   /* PL0 = +1.081172820484012446345e-16 */
> +        .quad 0x3FEE56B6F3EFABFC   /* PH0 = +9.480852856044061915952e-01 */
> +        .quad 0x3FB9E3EFD94BB9FC   /* P1  = +1.011342912204113371518e-01 */
> +        .quad 0xBFB88BD9760FECA7   /* P2  = -9.588393337610288420285e-02 */
> +        .quad 0x3FAD48A0350B3ACF   /* P3  = +5.719471595295077387313e-02 */
> +        .quad 0xBF96CC6A5110F129   /* P4  = -2.226415748394675367257e-02 */
> +        .quad 0x3F71934687170384   /* P5  = +4.290843485649345772606e-03 */
> +        .quad 0x3F5407BAF73B3DF9   /* P6  = +1.222546180475235334287e-03 */
> +        .quad 0xBF591B626C0646DD   /* P7  = -1.532407870488964407324e-03 */
> +        .quad 0x3F48B0E1DD283558   /* P8  = +7.535078860329375669277e-04 */
> +        .quad 0xBF2B322292840D2B   /* P9  = -2.074877932117605962646e-04 */
> +        .quad 0xBE99E4061120C741   /* P10 = -3.858017559892704559672e-07 */
> +        .quad 0xBFFD000000000000   /* B = -1.8125   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6AF8C2041C67CD   /* PL0 = +1.169711482626385762338e-17 */
> +        .quad 0x3FEEB2DFEDD5EC93   /* PH0 = +9.593352933146824801369e-01 */
> +        .quad 0x3FB465A205CFB638   /* P1  = +7.967579500083210999681e-02 */
> +        .quad 0xBFB3914BF68D39FF   /* P2  = -7.643580216720378576778e-02 */
> +        .quad 0x3FA7F21A08C5C734   /* P3  = +4.676896435820623621673e-02 */
> +        .quad 0xBF93DA9560EA9960   /* P4  = -1.938851741820124550772e-02 */
> +        .quad 0x3F73953FEC62820E   /* P5  = +4.781007481284861359820e-03 */
> +        .quad 0x3F2749D5E1273E3C   /* P6  = +1.776765426044646108071e-04 */
> +        .quad 0xBF4D46B0B498CE5A   /* P7  = -8.934367007839658352859e-04 */
> +        .quad 0x3F4153D680E1F4C4   /* P8  = +5.287930851093571206574e-04 */
> +        .quad 0xBF28477014ECA6A2   /* P9  = -1.852344816708944640949e-04 */
> +        .quad 0x3EFFAC54E07CEB4B   /* P10 = +3.020588886147182143902e-05 */
> +        .quad 0xBFFF000000000000   /* B = -1.9375   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7A8AF2BB2231F2   /* PL0 = +2.302217989249372577466e-17 */
> +        .quad 0x3FEF1994DF724FC8   /* PH0 = +9.718727459135090285258e-01 */
> +        .quad 0x3FAC65B1BC0C9D58   /* P1  = +5.546336575053583942603e-02 */
> +        .quad 0xBFAB9937BDA747C8   /* P2  = -5.390333356957871365599e-02 */
> +        .quad 0x3FA15B42D9EF931C   /* P3  = +3.389939222669210777241e-02 */
> +        .quad 0xBF8EACD8E8507A3C   /* P4  = -1.497811755149058215502e-02 */
> +        .quad 0x3F7263A15721C682   /* P5  = +4.489546046998806349050e-03 */
> +        .quad 0xBF42A032ACDC3B32   /* P6  = -5.684134900735048121829e-04 */
> +        .quad 0xBF3431E79B5AD185   /* P7  = -3.081503340170088810438e-04 */
> +        .quad 0x3F31B51667C7DF5E   /* P8  = +2.701930714290502424828e-04 */
> +        .quad 0xBF1F8709579250AD   /* P9  = -1.202678157759563704341e-04 */
> +        .quad 0x3F01ED8ED1BF9595   /* P10 = +3.419487094883790833778e-05 */
> +        .quad 0xC001000000000000   /* B = -2.125    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C86F3F7C3DAFC55   /* PL0 = +3.981710680748877459333e-17 */
> +        .quad 0x3FEF73776B2AA2DB   /* PH0 = +9.828450291725759901951e-01 */
> +        .quad 0x3FA16A7FC4D7B900   /* P1  = +3.401564863075812007064e-02 */
> +        .quad 0xBFA11E03803AD621   /* P2  = -3.343211117082156940532e-02 */
> +        .quad 0x3F9609591597297F   /* P3  = +2.152003473546803654658e-02 */
> +        .quad 0xBF847E74ED9BBB0C   /* P4  = -1.000682211039596246436e-02 */
> +        .quad 0x3F6BFF771725CD65   /* P5  = +3.417713736035987187864e-03 */
> +        .quad 0xBF491D1FF73C18FA   /* P6  = -7.664114077392807421000e-04 */
> +        .quad 0x3EF53EE467B51DC5   /* P7  = +2.026145237479599375099e-05 */
> +        .quad 0x3F160135BE0D94A0   /* P8  = +8.394136922403255700685e-05 */
> +        .quad 0xBF0B32CB1D276A40   /* P9  = -5.187685350778849443841e-05 */
> +        .quad 0x3EF4DAF70C12D555   /* P10 = +1.988919462255396826584e-05 */
> +        .quad 0xC003000000000000   /* B = -2.375    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C19DBF4E2E5B7DC   /* PL0 = +3.504575836708380670219e-19 */
> +        .quad 0x3FEFAA7934B75EBD   /* PH0 = +9.895597486128832054320e-01 */
> +        .quad 0x3F9545200830A42C   /* P1  = +2.077150392520736492125e-02 */
> +        .quad 0xBF950C46D285F6BC   /* P2  = -2.055464420253970271376e-02 */
> +        .quad 0x3F8B79F5BFC6513F   /* P3  = +1.341621390819425058164e-02 */
> +        .quad 0xBF7A50ADAD777898   /* P4  = -6.424597194806612772505e-03 */
> +        .quad 0x3F633A19BE8255E3   /* P5  = +2.347040444940816227383e-03 */
> +        .quad 0xBF44E609BC2557B7   /* P6  = -6.377742322836087134324e-04 */
> +        .quad 0x3F1AFCBAD60EAACD   /* P7  = +1.029480968230231421206e-04 */
> +        .quad 0x3EE80476AC34A8EF   /* P8  = +1.145240583485084317660e-05 */
> +        .quad 0xBEF278E23DE463E9   /* P9  = -1.761646478213091821804e-05 */
> +        .quad 0x3EE209FAF377264D   /* P10 = +8.601658563106529694651e-06 */
> +        .quad 0xC005000000000000   /* B = -2.625    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C979D62702C631C   /* PL0 = +8.193023793215066385979e-17 */
> +        .quad 0x3FEFCC04CDBCDC4B   /* PH0 = +9.936546343150295390600e-01 */
> +        .quad 0x3F89E87D088D269A   /* P1  = +1.265046770426474576547e-02 */
> +        .quad 0xBF89BE6721012B80   /* P2  = -1.257019586059526836624e-02 */
> +        .quad 0x3F80F1C13E8D39D3   /* P3  = +8.273610803056031004326e-03 */
> +        .quad 0xBF7082DBC9602757   /* P4  = -4.031046430108839563004e-03 */
> +        .quad 0x3F590BE9BD4E0A11   /* P5  = +1.528719197467002507978e-03 */
> +        .quad 0xBF3DCC2BEF6D0283   /* P6  = -4.546744598208711809986e-04 */
> +        .quad 0x3F1A08065C4A8E85   /* P7  = +9.930170842636406837764e-05 */
> +        .quad 0xBEE528117D0410F3   /* P8  = -1.008821337267942266431e-05 */
> +        .quad 0xBED0BE73A44FF565   /* P9  = -3.992069257383521775961e-06 */
> +        .quad 0x3EC9B0C11E342E38   /* P10 = +3.062539904901699218737e-06 */
> +        .quad 0xC007000000000000   /* B = -2.875    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C804B931AD7A3CC   /* PL0 = +2.826768921701616830245e-17 */
> +        .quad 0x3FEFE06EB0688212   /* PH0 = +9.961465306733450209009e-01 */
> +        .quad 0x3F7F81BD8876224D   /* P1  = +7.692089427458426472642e-03 */
> +        .quad 0xBF7F62A8C699A963   /* P2  = -7.662448196791823756776e-03 */
> +        .quad 0x3F74C31E2B2A6A28   /* P3  = +5.068891378551522166321e-03 */
> +        .quad 0xBF6470D537F16227   /* P4  = -2.495209162173734080001e-03 */
> +        .quad 0x3F4FAEEF61C89673   /* P5  = +9.668988091717359455754e-04 */
> +        .quad 0xBF33C5E80B349783   /* P6  = -3.017131341088651514023e-04 */
> +        .quad 0x3F138F3D31037A6B   /* P7  = +7.461367590931028650557e-05 */
> +        .quad 0xBEEB3C780996FFE3   /* P8  = -1.298723536791163711556e-05 */
> +        .quad 0x3E9D0C75BC8BFEFC   /* P9  = +4.328589367358221917138e-07 */
> +        .quad 0x3EAC3865227764D4   /* P10 = +8.410302755848104487452e-07 */
> +        .quad 0xC009000000000000   /* B = -3.125    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C5B978B202749F9   /* PL0 = +5.983054034451594408315e-18 */
> +        .quad 0x3FEFECD6B7EA3128   /* PH0 = +9.976609794698889643882e-01 */
> +        .quad 0x3F73238B786137FE   /* P1  = +4.672570043181776968058e-03 */
> +        .quad 0xBF731815ACEA072E   /* P2  = -4.661640805922390930706e-03 */
> +        .quad 0x3F6956F0816D5AEE   /* P3  = +3.093213784647877798933e-03 */
> +        .quad 0xBF591A16286C4885   /* P4  = -1.532098425461232453877e-03 */
> +        .quad 0x3F43B3E3A00C6096   /* P5  = +6.012784434430592468442e-04 */
> +        .quad 0xBF29441B2A56DEC7   /* P6  = -1.927645836710038499293e-04 */
> +        .quad 0x3F0A99C3A2E857B6   /* P7  = +5.073669705184196724674e-05 */
> +        .quad 0xBEE61CB034DDC151   /* P8  = -1.054385361573597042258e-05 */
> +        .quad 0x3EB792BBC76D6107   /* P9  = +1.405070887824641788698e-06 */
> +        .quad 0x3E761472362A16F0   /* P10 = +8.225391704739515383837e-08 */
> +        .quad 0xC00B000000000000   /* B = -3.375    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9C290AFCBDE00D   /* PL0 = +9.770074992945060684926e-17 */
> +        .quad 0x3FEFF45F6D36133A   /* PH0 = +9.985806592017987259879e-01 */
> +        .quad 0x3F673CEC093032DE   /* P1  = +2.836667068100913999228e-03 */
> +        .quad 0xBF67347A7CD844D5   /* P2  = -2.832640870800243808078e-03 */
> +        .quad 0x3F5EDA25530355DB   /* P3  = +1.883064698679040793627e-03 */
> +        .quad 0xBF4EAD3BBABC1BA9   /* P4  = -9.361783645268534848806e-04 */
> +        .quad 0x3F3842E61CD35432   /* P5  = +3.701984213198588740338e-04 */
> +        .quad 0xBF1F9AB7FD1A3DDD   /* P6  = -1.205611036090218544867e-04 */
> +        .quad 0x3F0136C154EA3DED   /* P7  = +3.283288480304320224929e-05 */
> +        .quad 0xBEDF12807F721E66   /* P8  = -7.408207230892235753013e-06 */
> +        .quad 0x3EB5B53687AD5112   /* P9  = +1.293889481520047941659e-06 */
> +        .quad 0xBE801E90FBFED147   /* P10 = -1.200988872775447204019e-07 */
> +        .quad 0xC00D000000000000   /* B = -3.625    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9E323294294877   /* PL0 = +1.047637125334028950603e-16 */
> +        .quad 0x3FEFF8F21CDAAA62   /* PH0 = +9.991388858373506653976e-01 */
> +        .quad 0x3F5C3470628813F2   /* P1  = +1.721486807697344658108e-03 */
> +        .quad 0xBF5C2E38AC6FF8D2   /* P2  = -1.720004411026422324849e-03 */
> +        .quad 0x3F52C13234626F43   /* P3  = +1.144694354969070234454e-03 */
> +        .quad 0xBF42B0A47DF47BB4   /* P4  = -5.703738387728891173354e-04 */
> +        .quad 0x3F2DB2889E32FBFD   /* P5  = +2.265731592156760387344e-04 */
> +        .quad 0xBF1385FBD54C5A55   /* P6  = -7.447576110695385196414e-05 */
> +        .quad 0x3EF5AFA812C6984E   /* P7  = +2.068153223579892541184e-05 */
> +        .quad 0xBED47097C188A03C   /* P8  = -4.873231795467276043290e-06 */
> +        .quad 0x3EAFF2B982F7EE8C   /* P9  = +9.521288628073486288914e-07 */
> +        .quad 0xBE828EC5B57D424D   /* P10 = -1.382656715739529384702e-07 */
> +        .quad 0xC00F000000000000   /* B = -3.875    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9BA40DA6983BEC   /* PL0 = +9.589840482158163453169e-17 */
> +        .quad 0x3FEFFCAAC3F20E65   /* PH0 = +9.995931460438894911036e-01 */
> +        .quad 0x3F4AA87CF664754C   /* P1  = +8.135423820793490331956e-04 */
> +        .quad 0xBF4AA5B62919E224   /* P2  = -8.132113891426467676310e-04 */
> +        .quad 0x3F41C01B53B0B312   /* P3  = +5.416997368051531710388e-04 */
> +        .quad 0xBF31B8B54D091751   /* P4  = -2.704088811110632606347e-04 */
> +        .quad 0x3F1C431305954ECC   /* P5  = +1.078110084525254933728e-04 */
> +        .quad 0xBF02B7DEAD0D44E6   /* P6  = -3.570221236393906131126e-05 */
> +        .quad 0x3EE51C6EFF109EA9   /* P7  = +1.006654199116272154479e-05 */
> +        .quad 0xBEC48CFB08072D17   /* P8  = -2.449834994621594976610e-06 */
> +        .quad 0x3EA1585EC59CAE34   /* P9  = +5.169271261920604503617e-07 */
> +        .quad 0xBE78832BAF950BA9   /* P10 = -9.131575131209528255629e-08 */
> +        .quad 0xC011000000000000   /* B = -4.25     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8FBF237F4AFE10   /* PL0 = +5.507163370275307643966e-17 */
> +        .quad 0x3FEFFEC61279A3A4   /* PH0 = +9.998503075449787225182e-01 */
> +        .quad 0x3F339E78281A00EA   /* P1  = +2.993625022114214863645e-04 */
> +        .quad 0xBF339DB7B072AD62   /* P2  = -2.993176899035080028902e-04 */
> +        .quad 0x3F2A259E658EF4E4   /* P3  = +1.994853835451177669594e-04 */
> +        .quad 0xBF1A219C312B10BA   /* P4  = -9.968295880030927192162e-05 */
> +        .quad 0x3F04E146B4F5F4B7   /* P5  = +3.982541113154699160876e-05 */
> +        .quad 0xBEEBC5F137088210   /* P6  = -1.324329943580649487333e-05 */
> +        .quad 0x3ECF96736E300B00   /* P7  = +3.765547135882256916132e-06 */
> +        .quad 0xBEAF4874840B91EB   /* P8  = -9.323068824421825762292e-07 */
> +        .quad 0x3E8B6AB2B5C8FD3F   /* P9  = +2.042709991312793245971e-07 */
> +        .quad 0xBE650BCCE62FD2B7   /* P10 = -3.920140725219944650830e-08 */
> +        .quad 0xC013000000000000   /* B = -4.75     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9C869C85471703   /* PL0 = +9.896883942603146946483e-17 */
> +        .quad 0x3FEFFF8C81C6DC33   /* PH0 = +9.999449286177707341139e-01 */
> +        .quad 0x3F1CDF5A2E4D7C69   /* P1  = +1.101397316012206760643e-04 */
> +        .quad 0xBF1CDEF1F9BE63BE   /* P2  = -1.101336660539594564027e-04 */
> +        .quad 0x3F133EC10C83AAA0   /* P3  = +7.341435696487731017506e-05 */
> +        .quad 0xBF033DAB325FAACB   /* P4  = -3.669909192168459445238e-05 */
> +        .quad 0x3EEEC598FA98BAD8   /* P5  = +1.467316890843338172161e-05 */
> +        .quad 0xBED47F1A15BA368E   /* P6  = -4.886744445221253126882e-06 */
> +        .quad 0x3EB761FBE7D201C1   /* P7  = +1.393720509029845064726e-06 */
> +        .quad 0xBE974CD75A43BF6B   /* P8  = -3.471994551992448536007e-07 */
> +        .quad 0x3E74B02965BBF8DC   /* P9  = +7.706929621914905669946e-08 */
> +        .quad 0xBE504EF4E3892A66   /* P10 = -1.518840362012570189110e-08 */
> +        .quad 0xC015000000000000   /* B = -5.25     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C643810400471B0   /* PL0 = +8.768592603904887599187e-18 */
> +        .quad 0x3FEFFFD583014825   /* PH0 = +9.999797400180382433987e-01 */
> +        .quad 0x3F053E71416C43CA   /* P1  = +4.051955345663706869871e-05 */
> +        .quad 0xBF053E550C7C8CC9   /* P2  = -4.051873253121394012080e-05 */
> +        .quad 0x3EFC52D0D90D4843   /* P3  = +2.701139380018752534477e-05 */
> +        .quad 0xBEEC523A6ADBE142   /* P4  = -1.350460237457883558350e-05 */
> +        .quad 0x3ED6A73E22D844B3   /* P5  = +5.400965660055565196396e-06 */
> +        .quad 0xBEBE31D10F23ACD0   /* P6  = -1.799738182979224868919e-06 */
> +        .quad 0x3EA13E14264DEAB2   /* P7  = +5.138663935333241981438e-07 */
> +        .quad 0xBE81385ABB98EDCC   /* P8  = -1.282999997786486835638e-07 */
> +        .quad 0x3E5EB9164593E0B6   /* P9  = +2.861301981891537161158e-08 */
> +        .quad 0xBE387218CFE7772E   /* P10 = -5.691705994073124478195e-09 */
> +        .quad 0xC017000000000000   /* B = -5.75     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C92530433F4C703   /* PL0 = +6.357512739163799046861e-17 */
> +        .quad 0x3FEFFFF05E8D3191   /* PH0 = +9.999925467214315633058e-01 */
> +        .quad 0x3EEF42DDFA52B575   /* P1  = +1.490650158538873335176e-05 */
> +        .quad 0xBEEF42CEB54212AA   /* P2  = -1.490639048307961378200e-05 */
> +        .quad 0x3EE4D7201CBCB853   /* P3  = +9.937445518550804010127e-06 */
> +        .quad 0xBED4D6F764B66C37   /* P4  = -4.968574624976280456686e-06 */
> +        .quad 0x3EC0ABB806EBDE71   /* P5  = +1.987311456171617620608e-06 */
> +        .quad 0xBEA6399CF854F876   /* P6  = -6.623581475862682369330e-07 */
> +        .quad 0x3E8964B91728D7C9   /* P7  = +1.891959403186505598965e-07 */
> +        .quad 0xBE6961A0528444D6   /* P8  = -4.727645325404986954168e-08 */
> +        .quad 0x3E46AE3B0814EE00   /* P9  = +1.056147192151514779549e-08 */
> +        .quad 0xBE221B8194DACD16   /* P10 = -2.107984154277957626641e-09 */
> +        .quad 0xC019000000000000   /* B = -6.25     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7BB5622CE1A79E   /* PL0 = +2.403331811901679167526e-17 */
> +        .quad 0x3FEFFFFA3FF22708   /* PH0 = +9.999972580855862602789e-01 */
> +        .quad 0x3ED7003552D53503   /* P1  = +5.483821309338170039906e-06 */
> +        .quad 0xBED7003130C1AB92   /* P2  = -5.483806273169366545037e-06 */
> +        .quad 0x3ECEAAE13B699C45   /* P3  = +3.655850800133043324271e-06 */
> +        .quad 0xBEBEAACB305F3D07   /* P4  = -1.827905351959291114416e-06 */
> +        .quad 0x3EA8887F5F9C87EF   /* P5  = +7.311461438267648556646e-07 */
> +        .quad 0xBE905AD08DF8454F   /* P6  = -2.437046884027860662692e-07 */
> +        .quad 0x3E72B068300B703F   /* P7  = +6.962228483613086736676e-08 */
> +        .quad 0xBE52AF921A71C058   /* P8  = -1.740252888706390465423e-08 */
> +        .quad 0x3E30B53EAA35300D   /* P9  = +3.890131469838137725119e-09 */
> +        .quad 0xBE0AB60CDAD7E22E   /* P10 = -7.773963050435300060566e-10 */
> +        .quad 0xC01B000000000000   /* B = -6.75     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8BD1ACF80D7256   /* PL0 = +4.825835138930451121169e-17 */
> +        .quad 0x3FEFFFFDE2760A41   /* PH0 = +9.999989913051835488389e-01 */
> +        .quad 0x3EC0EC4F1EC27E55   /* P1  = +2.017388615341105998718e-06 */
> +        .quad 0xBEC0EC4E005E6EAC   /* P2  = -2.017386580411626200507e-06 */
> +        .quad 0x3EB6906504BC4610   /* P3  = +1.344921673533307001969e-06 */
> +        .quad 0xBEA6905F0D52C8B5   /* P4  = -6.724581235377781360384e-07 */
> +        .quad 0x3E920D0F5CCE152B   /* P5  = +2.689810941136721216499e-07 */
> +        .quad 0xBE7811505B10E753   /* P6  = -8.965891741619763761543e-08 */
> +        .quad 0x3E5B811EE4F9B8EE   /* P7  = +2.561544781706659619288e-08 */
> +        .quad 0xBE3B80ABC067E840   /* P8  = -6.403452884688571158579e-09 */
> +        .quad 0x3E1898E394E09335   /* P9  = +1.431746793613569087489e-09 */
> +        .quad 0xBDF3ABB5BA711DB7   /* P10 = -2.862469657501951918569e-10 */
> +        .quad 0xC01D000000000000   /* B = -7.25     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8AE01DB39A3791   /* PL0 = +4.662147961093911873193e-17 */
> +        .quad 0x3FEFFFFF38C76668   /* PH0 = +9.999996289217962797125e-01 */
> +        .quad 0x3EA8E712E56E1188   /* P1  = +7.421562696484951529573e-07 */
> +        .quad 0xBEA8E7124A650791   /* P2  = -7.421559942504648535596e-07 */
> +        .quad 0x3EA09A0B62D8EF94   /* P3  = +4.947702955735978541097e-07 */
> +        .quad 0xBE909A09C56C2107   /* P4  = -2.473847805916120382218e-07 */
> +        .quad 0x3E7A900A90A54A6E   /* P5  = +9.895362410487317236618e-08 */
> +        .quad 0xBE61B5557BB449B6   /* P6  = -3.298434544432568302770e-08 */
> +        .quad 0x3E443CC74732CDCA   /* P7  = +9.423781066565733462466e-09 */
> +        .quad 0xBE243CA8AA8D6E54   /* P8  = -2.355890888986360997159e-09 */
> +        .quad 0x3E0219C341E0D1B4   /* P9  = +5.267978308406275552691e-10 */
> +        .quad 0xBDDCF49A10950F13   /* P10 = -1.053394074620716018815e-10 */
> +        .quad 0xC01F000000000000   /* B = -7.75     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C75CB18F3775414   /* PL0 = +1.890271747518592444083e-17 */
> +        .quad 0x3FEFFFFFD38C39F0   /* PH0 = +9.999999172012490333827e-01 */
> +        .quad 0x3E8639E2F89493BB   /* P1  = +1.655974950855472979393e-07 */
> +        .quad 0xBE8639E2D9B29562   /* P2  = -1.655974813708346974914e-07 */
> +        .quad 0x3E7DA2836A1F706E   /* P3  = +1.103982989742589616541e-07 */
> +        .quad 0xBE6DA282C6733DAE   /* P4  = -5.519913131581509871840e-08 */
> +        .quad 0x3E57B53A278851FD   /* P5  = +2.207971980430773309147e-08 */
> +        .quad 0xBE3F9C4A72536E22   /* P6  = -7.359895614149337484810e-09 */
> +        .quad 0x3E220E81FBE19CDD   /* P7  = +2.102073153607135257714e-09 */
> +        .quad 0xBE020E8875ADA8D8   /* P8  = -5.255211642212584097407e-10 */
> +        .quad 0x3DE07634328384FC   /* P9  = +1.197748786062966341989e-10 */
> +        .quad 0xBDBA54078E3C351F   /* P10 = -2.394539505021488953905e-11 */
> +        .quad 0xC021000000000000   /* B = -8.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C98B78738B0EDEF   /* PL0 = +8.575399788039081964921e-17 */
> +        .quad 0x3FEFFFFFF9FBEA40   /* PH0 = +9.999999887944071019774e-01 */
> +        .quad 0x3E581056FAC28C46   /* P1  = +2.241118550516412682327e-08 */
> +        .quad 0xBE581056F63A4351   /* P2  = -2.241118525356742542550e-08 */
> +        .quad 0x3E500AE49533790A   /* P3  = +1.494078933911655875521e-08 */
> +        .quad 0xBE400AE489ACBA90   /* P4  = -7.470394349637968945652e-09 */
> +        .quad 0x3E29AB0D59A1967B   /* P5  = +2.988168557255271725494e-09 */
> +        .quad 0xBE111CB32D6EEF2B   /* P6  = -9.960558400070350772418e-10 */
> +        .quad 0x3DF38CBADF396908   /* P7  = +2.844859618921805216353e-10 */
> +        .quad 0xBDD38CC7B92CECD3   /* P8  = -7.112220386749926320915e-11 */
> +        .quad 0x3DB1D2BBE2705032   /* P9  = +1.621008722427575444686e-11 */
> +        .quad 0xBD8C8199294E6380   /* P10 = -3.240784656869469020111e-12 */
> +        .quad 0xC023000000000000   /* B = -9.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8EEEC16618B984   /* PL0 = +5.365957423487855307906e-17 */
> +        .quad 0x3FEFFFFFFF2F9279   /* PH0 = +9.999999984834878619111e-01 */
> +        .quad 0x3E2A0DB0D052B148   /* P1  = +3.033024167396880687734e-09 */
> +        .quad 0xBE2A0DB0CFA6AB71   /* P2  = -3.033024162734192808028e-09 */
> +        .quad 0x3E215E75D53A3105   /* P3  = +2.022016035353114070618e-09 */
> +        .quad 0xBE115E75D40AA47F   /* P4  = -1.011008013562702155050e-09 */
> +        .quad 0x3DFBCA5CDC12ED1C   /* P5  = +4.044047007631481841556e-10 */
> +        .quad 0xBDE286E85704FC22   /* P6  = -1.348015410318274576187e-10 */
> +        .quad 0x3DC52A8925354517   /* P7  = +3.850101197145027796396e-11 */
> +        .quad 0xBDA52A97EA3F5F4A   /* P8  = -9.625355478142550638468e-12 */
> +        .quad 0x3D834C011A2AC0F7   /* P9  = +2.193802608697321032841e-12 */
> +        .quad 0xBD5EDD05BDCB3A62   /* P10 = -4.385948508419928563300e-13 */
> +        .quad 0xC025000000000000   /* B = -10.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6BD8B474BBF792   /* PL0 = +1.207649585364892639612e-17 */
> +        .quad 0x3FEFFFFFFFE3CAD8   /* PH0 = +9.999999997947623953110e-01 */
> +        .quad 0x3DFC3527E43C565F   /* P1  = +4.104751852963940338559e-10 */
> +        .quad 0xBDFC3527E420F415   /* P2  = -4.104751852036136216697e-10 */
> +        .quad 0x3DF2CE1A8D806DAD   /* P3  = +2.736501142887952919489e-10 */
> +        .quad 0xBDE2CE1A8DDF690A   /* P4  = -1.368250573053032426141e-10 */
> +        .quad 0x3DCE169832D8BD68   /* P5  = +5.473022586854025789680e-11 */
> +        .quad 0xBDB40F0FE853DA5B   /* P6  = -1.824340550195944358477e-11 */
> +        .quad 0x3D96EA8D930D31A1   /* P7  = +5.210545794901128943676e-12 */
> +        .quad 0xBD76EA9DB0D09839   /* P8  = -1.302650427355019556441e-12 */
> +        .quad 0x3D54E474FD4303A1   /* P9  = +2.968990047962355000258e-13 */
> +        .quad 0xBD30B526CA2B228A   /* P10 = -5.935740124899435401321e-14 */
> +        .quad 0xC027000000000000   /* B = -11.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C56E8953D525FD5   /* PL0 = +4.967494994909661698725e-18 */
> +        .quad 0x3FEFFFFFFFFC2EB9   /* PH0 = +9.999999999722241073030e-01 */
> +        .quad 0x3DCE8A37A48016C2   /* P1  = +5.555177547354687971427e-11 */
> +        .quad 0xBDCE8A37A479B7D4   /* P2  = -5.555177547084873157964e-11 */
> +        .quad 0x3DC45C250CFA9C16   /* P3  = +3.703451575129414499553e-11 */
> +        .quad 0xBDB45C250D9F8467   /* P4  = -1.851725791056759260154e-11 */
> +        .quad 0x3DA049BB33CBD4E9   /* P5  = +7.406930640558963265190e-12 */
> +        .quad 0xBD85B7A407C422C1   /* P6  = -2.468976464832073512208e-12 */
> +        .quad 0x3D68CF9CED2B3FD5   /* P7  = +7.051706989348171774536e-13 */
> +        .quad 0xBD48CFAE64C352B3   /* P8  = -1.762945685274427023683e-13 */
> +        .quad 0x3D269EAE08690D52   /* P9  = +4.018091287355461204663e-14 */
> +        .quad 0xBD0216CBEAFFF5AA   /* P10 = -8.033151495672990022322e-15 */
> +        .quad 0xC029000000000000   /* B = -12.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8ACF1392B106D3   /* PL0 = +4.650601502940921454330e-17 */
> +        .quad 0x3FEFFFFFFFFF7BBD   /* PH0 = +9.999999999962408958609e-01 */
> +        .quad 0x3DA088529889B316   /* P1  = +7.518115268189742464885e-12 */
> +        .quad 0xBDA088529887F4C4   /* P2  = -7.518115268005149164680e-12 */
> +        .quad 0x3D960B18BF1DF711   /* P3  = +5.012076679213679703380e-12 */
> +        .quad 0xBD860B18BFD99A48   /* P4  = -2.506038344573564868987e-12 */
> +        .quad 0x3D71A27E7CA64143   /* P5  = +1.002419056539285288454e-12 */
> +        .quad 0xBD5783530EA76D91   /* P6  = -3.341396294294381580191e-13 */
> +        .quad 0x3D3ADCC75CBD2A03   /* P7  = +9.543447641637910477850e-14 */
> +        .quad 0xBD1ADCDA46BE5F17   /* P8  = -2.385887543769010971872e-14 */
> +        .quad 0x3CF87D77650BE5B8   /* P9  = +5.437895260471143131391e-15 */
> +        .quad 0xBCD395AE6E74C6D2   /* P10 = -1.087168847335561258239e-15 */
> +        .quad 0xC02B000000000000   /* B = -13.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C97A8A295292858   /* PL0 = +8.208271151146829171896e-17 */
> +        .quad 0x3FEFFFFFFFFFEE19   /* PH0 = +9.999999999994911847878e-01 */
> +        .quad 0x3D71E642BB008F95   /* P1  = +1.017466259229268282255e-12 */
> +        .quad 0xBD71E642BAFEEC54   /* P2  = -1.017466259207593392022e-12 */
> +        .quad 0x3D67DDAE41647741   /* P3  = +6.783108169938233581038e-13 */
> +        .quad 0xBD57DDAE4230F34B   /* P4  = -3.391554091734942426856e-13 */
> +        .quad 0x3D4317C33FAE2536   /* P5  = +1.356626669455791324801e-13 */
> +        .quad 0xBD2975040D3E26B9   /* P6  = -4.522088139411435138867e-14 */
> +        .quad 0x3D0D155DCD0F0AFB   /* P7  = +1.291565189902030307333e-14 */
> +        .quad 0xBCED157247832B20   /* P8  = -3.228947666403019234175e-15 */
> +        .quad 0x3CCA83D70F607C28   /* P9  = +7.359390959466796619024e-16 */
> +        .quad 0xBCA5343952C1E19E   /* P10 = -1.471323041436694087188e-16 */
> +        .quad 0xC02D000000000000   /* B = -14.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9B7876CBC5306E   /* PL0 = +9.530765996816607711732e-17 */
> +        .quad 0x3FEFFFFFFFFFFD93   /* PH0 = +9.999999999999310551502e-01 */
> +        .quad 0x3D436121E2640D76   /* P1  = +1.376990843765503869546e-13 */
> +        .quad 0xBD436121E26250EA   /* P2  = -1.376990843736775811281e-13 */
> +        .quad 0x3D39D6D7CA259186   /* P3  = +9.179938654047876451320e-14 */
> +        .quad 0xBD29D6D7CB0327CE   /* P4  = -4.589969336188563660531e-14 */
> +        .quad 0x3D14ABE4DC31244A   /* P5  = +1.835994545584345768382e-14 */
> +        .quad 0xBCFB8FDB82AB6BB7   /* P6  = -6.119980791767901275443e-15 */
> +        .quad 0x3CDF7CF757491B60   /* P7  = +1.747943407988343076526e-15 */
> +        .quad 0xBCBF7D0D833640FB   /* P8  = -4.369905470133249448357e-16 */
> +        .quad 0x3C9CB512F6BDC754   /* P9  = +9.959852600692493655511e-17 */
> +        .quad 0xBC76F50AB1B0E9BA   /* P10 = -1.991219205936492089091e-17 */
> +        .quad 0xC02F000000000000   /* B = -15.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6FFE15D5F78543   /* PL0 = +1.387454417328248962819e-17 */
> +        .quad 0x3FEFFFFFFFFFFFE1   /* PH0 = +9.999999999999965583086e-01 */
> +        .quad 0x3CFEE00288B99C26   /* P1  = +6.855635762864742358597e-15 */
> +        .quad 0xBCFEE0027D060EE2   /* P2  = -6.855635607998342735403e-15 */
> +        .quad 0x3CF4954AA23148A2   /* P3  = +4.570381865813341696777e-15 */
> +        .quad 0xBCE4954B5DAD3010   /* P4  = -2.285192173571711474199e-15 */
> +        .quad 0x3CD07883DD8793BD   /* P5  = +9.143109661358222028007e-16 */
> +        .quad 0xBCB5F5F4BB87ADCF   /* P6  = -3.047668447080103869032e-16 */
> +        .quad 0x3C98F1A905097685   /* P7  = +8.654183371862458774513e-17 */
> +        .quad 0xBC78F2D585007222   /* P8  = -2.163943551222030413627e-17 */
> +        .quad 0x3C58A37CC5082B5F   /* P9  = +5.342649626494471588064e-18 */
> +        .quad 0xBC33AE7917F94D17   /* P10 = -1.066938163384541013918e-18 */
> +        .quad 0xC031000000000000   /* B = -17        */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C91BF1D80474F0F   /* PL0 = +6.157069264461989135096e-17 */
> +        .quad 0x3FEFFFFFFFFFFFFE   /* PH0 = +9.999999999999997779554e-01 */
> +        .quad 0x3CB72071400E6275   /* P1  = +3.209478247225075961360e-16 */
> +        .quad 0xBCB72071400A9F37   /* P2  = -3.209478247103497434502e-16 */
> +        .quad 0x3CAED5EC39A77629   /* P3  = +2.139652050028423711308e-16 */
> +        .quad 0xBC9ED5EC3B530600   /* P4  = -1.069826028468029104719e-16 */
> +        .quad 0x3C88AB2BFED159DE   /* P5  = +4.279326904335078988705e-17 */
> +        .quad 0xBC70721D1220B3FC   /* P6  = -1.426441958074916244382e-17 */
> +        .quad 0x3C52C96049721FB8   /* P7  = +4.073700029965821523731e-18 */
> +        .quad 0xBC32C971215735DC   /* P8  = -1.018438939975201710113e-18 */
> +        .quad 0x3C112EF658AB41A9   /* P9  = +2.328791246104218830028e-19 */
> +        .quad 0xBBEB7B598C6AD3DE   /* P10 = -4.655603964908654142787e-20 */
> +        .quad 0xC03287E0C98F84E5   /* B = -18.530774 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* PL0 = +0.000000000000000000000e-01 */
> +        .quad 0x3FF0000000000000   /* PH0 = +1.000000000000000000000e+00 */
> +        .quad 0x0000000000000000   /* P1  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P2  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P3  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P4  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P5  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P6  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P7  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P8  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P9  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P10 = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* B = +0        */
> +        .quad 0x0000000000000000   /* A = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .align 32
> +        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000   /* _dbSignMask       */
> +        .align 32
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff   /* _dbAbsMask        */
> +        .align 32
> +        .long 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000           /* _iExpMantMask     */
> +        .align 32
> +        .long 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000           /* _iExpMask         */
> +        .align 32
> +        .long 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000           /* _iMinIdxOfsMask   */
> +        .align 32
> +        .long 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000           /* _iMaxIdxMask      */
> +        .align 32
> +        .type	__svml_dtanh_data_internal,@object
> +        .size	__svml_dtanh_data_internal,.-__svml_dtanh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core-avx2.S
> new file mode 100644
> index 0000000000..92fb24a640
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized tanh, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN8v_tanh _ZGVeN8v_tanh_avx2_wrapper
> +#include "../svml_d_tanh8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core.c
> new file mode 100644
> index 0000000000..495cb1f4fc
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized tanh, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN8v_tanh
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN8v_tanh, __GI__ZGVeN8v_tanh, __redirect__ZGVeN8v_tanh)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
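
[Not part of the patch - just for context on how this ifunc entry point gets
reached.  With the SIMD declarations added elsewhere in this series, a loop
like the one below can be auto-vectorized into calls to _ZGVeN8v_tanh (or a
narrower variant).  The exact compiler flags and the vector width actually
chosen are assumptions and vary by compiler version and target.]

    /* e.g. gcc -O3 -ffast-math -march=skylake-avx512 */
    #include <math.h>

    void
    tanh_array (double *dst, const double *src, int n)
    {
      for (int i = 0; i < n; i++)
        dst[i] = tanh (src[i]);
    }
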
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core_avx512.S
> new file mode 100644
> index 0000000000..01fc22ba6f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core_avx512.S
> @@ -0,0 +1,472 @@
> +/* Function tanh vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   NOTE: Since the hyperbolic tangent function is odd
> + *         (tanh(x) = -tanh(-x)), the algorithm below deals with the
> + *         absolute value of the argument |x|: tanh(x) = sign(x) * tanh(|x|)
> + *
> + *   We use a table lookup method to compute tanh(|x|).
> + *   The basic idea is to split the input range into a number of subintervals
> + *   and to approximate tanh(.) with a polynomial on each of them.
> + *
> + *   IEEE SPECIAL CONDITIONS:
> + *   x = [+,-]0, r = [+,-]0
> + *   x = +Inf,   r = +1
> + *   x = -Inf,   r = -1
> + *   x = QNaN,   r = QNaN
> + *   x = SNaN,   r = QNaN
> + *
> + *
> + *   ALGORITHM DETAILS
> + *   We handle special values in a callout function, aside from main path
> + *   computations. "Special" for this algorithm are:
> + *   INF, NAN, |x| > HUGE_THRESHOLD
> + *
> + *
> + *   Main path computations are organized as follows:
> + *   We split the interval [0, SATURATION_THRESHOLD)
> + *   into a number of subintervals.  On each subinterval we approximate tanh(.)
> + *   with a minimax polynomial of pre-defined degree.  Polynomial coefficients
> + *   are computed beforehand and stored in a table.  We also use
> + *
> + *       y := |x| + B,
> + *
> + *   where B depends on the subinterval and is used to bring the argument
> + *   closer to zero.
> + *   We also add a large fake interval [SATURATION_THRESHOLD, HUGE_THRESHOLD],
> + *   where the coefficients 1.0 + 0.0*y + 0.0*y^2 ... are stored - just to
> + *   preserve the main path computation logic while returning 1.0 for all
> + *   arguments in that range.
> + *
> + *   Hence reconstruction looks as follows:
> + *   we extract the polynomial and range reduction coefficients
> + *        (Pj and B) corresponding to the subinterval to which |x| belongs,
> + *        and return
> + *
> + *       r := sign(x) * (P0 + P1 * y + ... + Pn * y^n)
> + *
> + *   NOTE: we use a multiprecision technique to multiply and sum the first
> + *         K terms of the polynomial.  So Pj, j = 0..K, are stored in the
> + *         table each as a pair of target precision numbers (Pj and PLj) to
> + *         achieve wider than target precision.
> + *
> + *
> + */
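
[Not part of the patch - a rough scalar model of the main path described
above, to make the table-lookup scheme concrete.  The arrays dC[] and dP[][]
and the degree NCOEFF are hypothetical stand-ins for rows of
__svml_dtanh_data_internal (the real table also carries split high/low
coefficients); the index computation mirrors the UISA mask constants used by
the code below (0x7ff80000, 0x3fc00000, 0x00780000, shift by 19).  Special
inputs (huge, INF, NaN) are assumed to have been filtered to the callout
already.]

    #include <math.h>
    #include <stdint.h>
    #include <string.h>

    #define NCOEFF 18                     /* assumed polynomial length */
    extern const double dC[16];           /* per-subinterval base points */
    extern const double dP[NCOEFF][16];   /* per-subinterval coefficients */

    static double
    tanh_main_path (double x)
    {
      double ax = fabs (x);

      /* Subinterval index from the top 32 bits of |x|: mask the exponent
         and leading mantissa bits, subtract the minimum offset, clamp the
         result so that idx ends up in [0, 15].  */
      uint64_t bits;
      memcpy (&bits, &ax, sizeof bits);
      int32_t hi = (int32_t) (bits >> 32) & 0x7ff80000;
      int32_t ofs = hi - 0x3fc00000;
      if (ofs < 0)
        ofs = 0;
      if (ofs > 0x00780000)
        ofs = 0x00780000;
      int idx = ofs >> 19;

      /* Range reduction and Horner evaluation of the per-subinterval
         polynomial.  */
      double y = ax - dC[idx];
      double r = dP[NCOEFF - 1][idx];
      for (int j = NCOEFF - 2; j >= 0; j--)
        r = r * y + dP[j][idx];

      /* Reattach the sign of x (the vector code ORs the sign bit back).  */
      return copysign (r, x);
    }
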
> +
> +/* Offsets for data table __svml_dtanh_data_internal
> + */
> +#define _dC                           	0
> +#define _dP0                          	128
> +#define _dP1                          	256
> +#define _dP2                          	384
> +#define _dP3                          	512
> +#define _dP4                          	640
> +#define _dP5                          	768
> +#define _dP6                          	896
> +#define _dP7                          	1024
> +#define _dP8                          	1152
> +#define _dP9                          	1280
> +#define _dP10                         	1408
> +#define _dP11                         	1536
> +#define _dP12                         	1664
> +#define _dP13                         	1792
> +#define _dP14                         	1920
> +#define _dP15                         	2048
> +#define _dP16                         	2176
> +#define _dP17                         	2304
> +#define _iExpMantMask_UISA            	2432
> +#define _iMinIdxOfsMask_UISA          	2496
> +#define _iMaxIdxMask_UISA             	2560
> +#define _dbSignMask                   	2624
> +#define _dbAbsMask                    	2688
> +#define _iExpMantMask                 	2752
> +#define _iExpMask                     	2816
> +#define _iMinIdxOfsMask               	2880
> +#define _iMaxIdxMask                  	2944
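
[Layout note, not part of the patch: each of the _dC/_dPj rows above holds 16
double-precision entries, i.e. 16 * 8 = 128 bytes, which is why those offsets
step by 128; the 32-bit mask rows that follow are 16 * 4 = 64 bytes apart.
The per-lane index (0..15) computed in the code selects one coefficient
within a row via vpermt2pd over the row's two 64-byte halves.]
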
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN8v_tanh_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $320, %rsp
> +        vpsrlq    $32, %zmm0, %zmm4
> +        vmovups   %zmm0, (%rsp)
> +        vmovups   __svml_dtanh_data_internal(%rip), %zmm14
> +        vmovups   _dP0+__svml_dtanh_data_internal(%rip), %zmm15
> +        vpmovqd   %zmm4, %ymm5
> +
> +/*  Constant loading  */
> +        vandpd    _dbAbsMask+__svml_dtanh_data_internal(%rip), %zmm0, %zmm13
> +        vandpd    _dbSignMask+__svml_dtanh_data_internal(%rip), %zmm0, %zmm3
> +
> +/* Here huge arguments, INF and NaNs are filtered out to callout. */
> +        vpand     _iExpMantMask_UISA+__svml_dtanh_data_internal(%rip), %ymm5, %ymm7
> +        vmovups   _dP2+__svml_dtanh_data_internal(%rip), %zmm0
> +        vmovups   _dP16+__svml_dtanh_data_internal(%rip), %zmm4
> +        vmovups   _dP15+__svml_dtanh_data_internal(%rip), %zmm5
> +        vmovups   %zmm3, 64(%rsp)
> +        vmovups   _dP3+__svml_dtanh_data_internal(%rip), %zmm3
> +        vpsubd    _iMinIdxOfsMask_UISA+__svml_dtanh_data_internal(%rip), %ymm7, %ymm8
> +
> +/* Clamp the index offset to [0, _iMaxIdxMask_UISA] (integer VMIN/VMAX).  */
> +        vxorps    %ymm9, %ymm9, %ymm9
> +        vpmaxsd   %ymm9, %ymm8, %ymm10
> +        vpminsd   _iMaxIdxMask_UISA+__svml_dtanh_data_internal(%rip), %ymm10, %ymm11
> +        vpsrld    $19, %ymm11, %ymm12
> +        vmovups   _dP12+__svml_dtanh_data_internal(%rip), %zmm8
> +        vmovups   _dP11+__svml_dtanh_data_internal(%rip), %zmm9
> +        vmovups   _dP10+__svml_dtanh_data_internal(%rip), %zmm10
> +        vmovups   _dP9+__svml_dtanh_data_internal(%rip), %zmm11
> +        vpmovzxdq %ymm12, %zmm2
> +        vmovups   _dP8+__svml_dtanh_data_internal(%rip), %zmm12
> +        vpermt2pd _dP2+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm0
> +        vpermt2pd _dC+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm14
> +        vpermt2pd _dP16+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm4
> +        vpermt2pd _dP15+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm5
> +        vsubpd    {rn-sae}, %zmm14, %zmm13, %zmm1
> +        vpermt2pd _dP12+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm8
> +        vpermt2pd _dP11+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm9
> +        vpermt2pd _dP10+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm10
> +        vpermt2pd _dP9+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm11
> +        vpermt2pd _dP8+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm12
> +        vpermt2pd _dP3+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm3
> +        vpermt2pd _dP0+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm15
> +        vmovups   %zmm0, 192(%rsp)
> +        vmovups   _dP17+__svml_dtanh_data_internal(%rip), %zmm0
> +        vmovups   _dP7+__svml_dtanh_data_internal(%rip), %zmm13
> +        vmovups   _dP6+__svml_dtanh_data_internal(%rip), %zmm14
> +        vmovups   %zmm3, 256(%rsp)
> +        vmovups   _dP5+__svml_dtanh_data_internal(%rip), %zmm3
> +        vmovups   %zmm15, 128(%rsp)
> +        vmovups   _dP4+__svml_dtanh_data_internal(%rip), %zmm15
> +        vpermt2pd _dP17+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm0
> +        vpermt2pd _dP7+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm13
> +        vpermt2pd _dP6+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm14
> +        vpermt2pd _dP5+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm3
> +        vpermt2pd _dP4+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm15
> +        vfmadd213pd {rn-sae}, %zmm4, %zmm1, %zmm0
> +        vpcmpgtd  _iExpMask+__svml_dtanh_data_internal(%rip), %ymm7, %ymm6
> +        vmovmskps %ymm6, %edx
> +        vmovups   _dP14+__svml_dtanh_data_internal(%rip), %zmm6
> +        vfmadd213pd {rn-sae}, %zmm5, %zmm1, %zmm0
> +        vmovups   _dP13+__svml_dtanh_data_internal(%rip), %zmm7
> +        vpermt2pd _dP14+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm6
> +        vpermt2pd _dP13+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm7
> +        vfmadd213pd {rn-sae}, %zmm6, %zmm1, %zmm0
> +        vmovups   256(%rsp), %zmm2
> +        vfmadd213pd {rn-sae}, %zmm7, %zmm1, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm8, %zmm1, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm9, %zmm1, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm10, %zmm1, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm11, %zmm1, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm12, %zmm1, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm13, %zmm1, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm14, %zmm1, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm3, %zmm1, %zmm0
> +        vmovups   128(%rsp), %zmm3
> +        vfmadd213pd {rn-sae}, %zmm15, %zmm1, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm2, %zmm1, %zmm0
> +        vmovups   192(%rsp), %zmm2
> +        vfmadd213pd {rn-sae}, %zmm2, %zmm1, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm3, %zmm1, %zmm0
> +        vorpd     64(%rsp), %zmm0, %zmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   (%rsp), %zmm1
> +        vmovups   %zmm0, 128(%rsp)
> +        vmovups   %zmm1, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -304; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -312; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xfe, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -320; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -304; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -312; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xfe, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -320; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     64(%rsp,%r14,8), %xmm0
> +        call      tanh@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 128(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN8v_tanh_skx)
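
[Not part of the patch - a sketch of what the SPECIAL_VALUES_BRANCH /
RANGEMASK_CHECK / SCALAR_MATH_CALL sequence above amounts to: for every lane
whose bit is set in the range mask, recompute that lane with the scalar tanh.
The function name is made up for illustration.]

    #include <math.h>

    static void
    fixup_special_lanes (double dst[8], const double src[8], unsigned int mask)
    {
      /* Walk the 8 mask bits; for each flagged lane, redo the computation
         with the scalar tanh (the asm calls tanh@PLT) and store it back.  */
      for (int lane = 0; lane < 8; lane++)
        if (mask & (1u << lane))
          dst[lane] = tanh (src[lane]);
    }
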
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_dtanh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(64)) VUINT32 _dC[16][2];
> +        __declspec(align(64)) VUINT32 _dP0[16][2];
> +        __declspec(align(64)) VUINT32 _dP1[16][2];
> +        __declspec(align(64)) VUINT32 _dP2[16][2];
> +        __declspec(align(64)) VUINT32 _dP3[16][2];
> +        __declspec(align(64)) VUINT32 _dP4[16][2];
> +        __declspec(align(64)) VUINT32 _dP5[16][2];
> +        __declspec(align(64)) VUINT32 _dP6[16][2];
> +        __declspec(align(64)) VUINT32 _dP7[16][2];
> +        __declspec(align(64)) VUINT32 _dP8[16][2];
> +        __declspec(align(64)) VUINT32 _dP9[16][2];
> +        __declspec(align(64)) VUINT32 _dP10[16][2];
> +        __declspec(align(64)) VUINT32 _dP11[16][2];
> +        __declspec(align(64)) VUINT32 _dP12[16][2];
> +        __declspec(align(64)) VUINT32 _dP13[16][2];
> +        __declspec(align(64)) VUINT32 _dP14[16][2];
> +        __declspec(align(64)) VUINT32 _dP15[16][2];
> +        __declspec(align(64)) VUINT32 _dP16[16][2];
> +        __declspec(align(64)) VUINT32 _dP17[16][2];
> +        __declspec(align(64)) VUINT32 _iExpMantMask_UISA[16][1];
> +        __declspec(align(64)) VUINT32 _iMinIdxOfsMask_UISA[16][1];
> +        __declspec(align(64)) VUINT32 _iMaxIdxMask_UISA[16][1];
> +        __declspec(align(64)) VUINT32 _dbSignMask[8][2];
> +        __declspec(align(64)) VUINT32 _dbAbsMask[8][2];
> +        __declspec(align(64)) VUINT32 _iExpMantMask[16][1];
> +        __declspec(align(64)) VUINT32 _iExpMask[16][1];
> +        __declspec(align(64)) VUINT32 _iMinIdxOfsMask[16][1];
> +        __declspec(align(64)) VUINT32 _iMaxIdxMask[16][1];
> +} __svml_dtanh_data_internal;
> +#endif
> +__svml_dtanh_data_internal:
> +        /*== _dC ==*/
> +        .quad 0x0000000000000000, 0x3fcc000000000000, 0x3fd4000000000000, 0x3fdc000000000000
> +        .quad 0x3fe4000000000000, 0x3fec000000000000, 0x3ff4000000000000, 0x3ffc000000000000
> +        .quad 0x4004000000000000, 0x400c000000000000, 0x4014000000000000, 0x401c000000000000
> +        .quad 0x4024000000000000, 0x402c000000000000, 0x4034000000000000, 0x0000000000000000
> +        /*== p0 ==*/
> +        .align 64
> +        .quad 0x0000000000000000, 0x3fcb8fd0416a7c92, 0x3fd35f98a0ea650e, 0x3fda5729ee488037
> +        .quad 0x3fe1bf47eabb8f95, 0x3fe686650b8c2015, 0x3feb2523bb6b2dee, 0x3fee1fbf97e33527
> +        .quad 0x3fef9258260a71c2, 0x3feff112c63a9077, 0x3fefff419668df11, 0x3feffffc832750f2
> +        .quad 0x3feffffffdc96f35, 0x3fefffffffffcf58, 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== p1 ==*/
> +        .align 64
> +        .quad 0x0000000000000000, 0x3c65e23ebcd3bcbe, 0xbc4c600bac3adf00, 0x3c6c44091785d040
> +        .quad 0x3c8221d7a6e3674b, 0x3c69f89d2cf6b85c, 0x3c73b3e9ec0b8f1c, 0xbc7f8d4b0428aada
> +        .quad 0xbc7c52d880cf43c0, 0x3c7dd36e37096480, 0x3c7b4f6380c442ca, 0xbc729755de470096
> +        .quad 0x3c84cf852845efbd, 0x3c6fc4fb440a5378, 0xbc63981083b55870, 0x0000000000000000
> +        /*== p2 ==*/
> +        .align 64
> +        .quad 0x3ff0000000000000, 0x3fee842ca3f08532, 0x3fed11574af58f1b, 0x3fea945b9c24e4f9
> +        .quad 0x3fe6284c3374f815, 0x3fe02500a09f8d6e, 0x3fd1f25131e3a8c0, 0x3fbd22ca1c24a139
> +        .quad 0x3f9b3afe1fba5c76, 0x3f6dd37d19b22b21, 0x3f27ccec13a9ef96, 0x3ecbe6c3f33250ae
> +        .quad 0x3e41b4865394f75f, 0x3d8853f01bda5f28, 0x3c73953c0197ef58, 0x0000000000000000
> +        /*== p3 ==*/
> +        .align 64
> +        .quad 0xbbf0b3ea3fdfaa19, 0xbfca48aaeb53bc21, 0xbfd19921f4329916, 0xbfd5e0f09bef8011
> +        .quad 0xbfd893b59c35c882, 0xbfd6ba7cb7576538, 0xbfce7291743d7555, 0xbfbb6d85a01efb80
> +        .quad 0xbf9addae58c7141a, 0xbf6dc59376c7aa19, 0xbf27cc5e74677410, 0xbecbe6c0e8b4cc87
> +        .quad 0xbe41b486526b0565, 0xbd8853f01bef63a4, 0xbc73955be519be31, 0x0000000000000000
> +        /*== p4 ==*/
> +        .align 64
> +        .quad 0xbfd5555555555555, 0xbfd183afc292ba11, 0xbfcc1a4b039c9bfa, 0xbfc16e1e6d8d0be6
> +        .quad 0xbf92426c751e48a2, 0x3fb4f152b2bad124, 0x3fbbba40cbef72be, 0x3fb01ba038be6a3d
> +        .quad 0x3f916df44871efc8, 0x3f63c6869dfc8870, 0x3f1fb9aef915d828, 0x3ec299d1e27c6e11
> +        .quad 0x3e379b5ddcca334c, 0x3d8037f57bc62c9a, 0x3c6a2d4b50a2cff7, 0x0000000000000000
> +        /*== p5 ==*/
> +        .align 64
> +        .quad 0xbce6863ee44ed636, 0x3fc04dcd0476c75e, 0x3fc43d3449a80f08, 0x3fc5c26f3699b7e7
> +        .quad 0x3fc1a686f6ab2533, 0x3faf203c316ce730, 0xbf89c7a02788557c, 0xbf98157e26e0d541
> +        .quad 0xbf807b55c1c7d278, 0xbf53a18d5843190f, 0xbf0fb6bbc89b1a5b, 0xbeb299c9c684a963
> +        .quad 0xbe279b5dd4fb3d01, 0xbd7037f57ae72aa6, 0xbc5a2ca2bba78e86, 0x0000000000000000
> +        /*== p6 ==*/
> +        .align 64
> +        .quad 0x3fc1111111112ab5, 0x3fb5c19efdfc08ad, 0x3fa74c98dc34fbac, 0xbf790d6a8eff0a77
> +        .quad 0xbfac3c021789a786, 0xbfae2196b7326859, 0xbf93a7a011ff8c2a, 0x3f6e4709c7e8430e
> +        .quad 0x3f67682afa611151, 0x3f3ef2ee77717cbf, 0x3ef95a4482f180b7, 0x3e9dc2c27da3b603
> +        .quad 0x3e12e2afd9f7433e, 0x3d59f320348679ba, 0x3c44b61d9bbcc940, 0x0000000000000000
> +        /*== p7 ==*/
> +        .align 64
> +        .quad 0xbda1ea19ddddb3b4, 0xbfb0b8df995ce4df, 0xbfb2955cf41e8164, 0xbfaf9d05c309f7c6
> +        .quad 0xbf987d27ccff4291, 0x3f8b2ca62572b098, 0x3f8f1cf6c7f5b00a, 0x3f60379811e43dd5
> +        .quad 0xbf4793826f78537e, 0xbf2405695e36240f, 0xbee0e08de39ce756, 0xbe83d709ba5f714e
> +        .quad 0xbdf92e3fc5ee63e0, 0xbd414cc030f2110e, 0xbc2ba022e8d82a87, 0x0000000000000000
> +        /*== p8 ==*/
> +        .align 64
> +        .quad 0xbfaba1ba1990520b, 0xbf96e37bba52f6fc, 0x3ecff7df18455399, 0x3f97362834d33a4e
> +        .quad 0x3f9e7f8380184b45, 0x3f869543e7c420d4, 0xbf7326bd4914222a, 0xbf5fc15b0a9d98fa
> +        .quad 0x3f14cffcfa69fbb6, 0x3f057e48e5b79d10, 0x3ec33b66d7d77264, 0x3e66ac4e578b9b10
> +        .quad 0x3ddcc74b8d3d5c42, 0x3d23c589137f92b4, 0x3c107f8e2c8707a1, 0x0000000000000000
> +        /*== p9 ==*/
> +        .align 64
> +        .quad 0xbe351ca7f096011f, 0x3f9eaaf3320c3851, 0x3f9cf823fe761fc1, 0x3f9022271754ff1f
> +        .quad 0xbf731fe77c9c60af, 0xbf84a6046865ec7d, 0xbf4ca3f1f2b9192b, 0x3f4c77dee0afd227
> +        .quad 0x3f04055bce68597a, 0xbee2bf0cb4a71647, 0xbea31eaafe73efd5, 0xbe46abb02c4368ed
> +        .quad 0xbdbcc749ca8079dd, 0xbd03c5883836b9d2, 0xbbf07a5416264aec, 0x0000000000000000
> +        /*== p10 ==*/
> +        .align 64
> +        .quad 0x3f9664f94e6ac14e, 0xbf94d3343bae39dd, 0xbf7bc748e60df843, 0xbf8c89372b43ba85
> +        .quad 0xbf8129a092de747a, 0x3f60c85b4d538746, 0x3f5be9392199ec18, 0xbf2a0c68a4489f10
> +        .quad 0xbf00462601dc2faa, 0x3eb7b6a219dea9f4, 0x3e80cbcc8d4c5c8a, 0x3e2425bb231a5e29
> +        .quad 0x3d9992a4beac8662, 0x3ce191ba5ed3fb67, 0x3bc892450bad44c4, 0x0000000000000000
> +        /*== p11 ==*/
> +        .align 64
> +        .quad 0xbea8c4c1fd7852fe, 0xbfccce16b1046f13, 0xbf81a16f224bb7b6, 0xbf62cbf00406bc09
> +        .quad 0x3f75b29bb02cf69b, 0x3f607df0f9f90c17, 0xbf4b852a6e0758d5, 0xbf0078c63d1b8445
> +        .quad 0x3eec12eadd55be7a, 0xbe6fa600f593181b, 0xbe5a3c935dce3f7d, 0xbe001c6d95e3ae96
> +        .quad 0xbd74755a00ea1fd3, 0xbcbc1c6c063bb7ac, 0xbba3be9a4460fe00, 0x0000000000000000
> +        /*== p12 ==*/
> +        .align 64
> +        .quad 0xbf822404577aa9dd, 0x403d8b07f7a82aa3, 0xbf9f44ab92fbab0a, 0x3fb2eac604473d6a
> +        .quad 0x3f45f87d903aaac8, 0xbf5e104671036300, 0x3f19bc98ddf0f340, 0x3f0d4304bc9246e8
> +        .quad 0xbed13c415f7b9d41, 0xbe722b8d9720cdb0, 0x3e322666d739bec0, 0x3dd76a553d7e7918
> +        .quad 0x3d4de0fa59416a39, 0x3c948716cf3681b4, 0x3b873f9f2d2fda99, 0x0000000000000000
> +        /*== p13 ==*/
> +        .align 64
> +        .quad 0xbefdd99a221ed573, 0x4070593a3735bab4, 0xbfccab654e44835e, 0x3fd13ed80037dbac
> +        .quad 0xbf6045b9076cc487, 0x3f2085ee7e8ac170, 0x3f23524622610430, 0xbeff12a6626911b4
> +        .quad 0x3eab9008bca408af, 0x3e634df71865f620, 0xbe05bb1bcf83ca73, 0xbdaf2ac143fb6762
> +        .quad 0xbd23eae52a3dbf57, 0xbc6b5e3e9ca0955e, 0xbb5eca68e2c1ba2e, 0x0000000000000000
> +        /*== p14 ==*/
> +        .align 64
> +        .quad 0x3f6e3be689423841, 0xc0d263511f5baac1, 0x40169f73b15ebe5c, 0xc025c1dd41cd6cb5
> +        .quad 0xbf58fd89fe05e0d1, 0x3f73f7af01d5af7a, 0xbf1e40bdead17e6b, 0x3ee224cd6c4513e5
> +        .quad 0xbe24b645e68eeaa3, 0xbe4abfebfb72bc83, 0x3dd51c38f8695ed3, 0x3d8313ac38c6832b
> +        .quad 0x3cf7787935626685, 0x3c401ffc49c6bc29, 0xbabf0b21acfa52ab, 0x0000000000000000
> +        /*== p15 ==*/
> +        .align 64
> +        .quad 0xbf2a1306713a4f3a, 0xc1045e509116b066, 0x4041fab9250984ce, 0xc0458d090ec3de95
> +        .quad 0xbf74949d60113d63, 0x3f7c9fd6200d0ade, 0x3f02cd40e0ad0a9f, 0xbe858ab8e019f311
> +        .quad 0xbe792fa6323b7cf8, 0x3e2df04d67876402, 0xbd95c72be95e4d2c, 0xbd55a89c30203106
> +        .quad 0xbccad6b3bb9eff65, 0xbc12705ccd3dd884, 0xba8e0a4c47ae75f5, 0x0000000000000000
> +        /*== p16 ==*/
> +        .align 64
> +        .quad 0xbf55d7e76dc56871, 0x41528c38809c90c7, 0xc076d57fb5190b02, 0x4085f09f888f8ada
> +        .quad 0x3fa246332a2fcba5, 0xbfb29d851a896fcd, 0x3ed9065ae369b212, 0xbeb8e1ba4c98a030
> +        .quad 0x3e6ffd0766ad4016, 0xbe0c63c29f505f5b, 0xbd7fab216b9e0e49, 0x3d2826b62056aa27
> +        .quad 0x3ca313e31762f523, 0x3bea37aa21895319, 0x3ae5c7f1fd871496, 0x0000000000000000
> +        /*== p17 ==*/
> +        .align 64
> +        .quad 0x3f35e67ab76a26e7, 0x41848ee0627d8206, 0xc0a216d618b489ec, 0x40a5b89107c8af4f
> +        .quad 0x3fb69d8374520eda, 0xbfbded519f981716, 0xbef02d288b5b3371, 0x3eb290981209c1a6
> +        .quad 0xbe567e924bf5ff6e, 0x3de3f7f7de6b0eb6, 0x3d69ed18bae3ebbc, 0xbcf7534c4f3dfa71
> +        .quad 0xbc730b73f1eaff20, 0xbbba2cff8135d462, 0xbab5a71b5f7d9035, 0x0000000000000000
> +        .align 64
> +        .long 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000           /* _iExpMantMask_UISA     */
> +        .align 64
> +        .long 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000           /* _iMinIdxOfsMask_UISA   */
> +        .align 64
> +        .long 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000           /* _iMaxIdxMask_UISA      */
> +        .align 64
> +        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000   /* _dbSignMask       */
> +        .align 64
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff   /* _dbAbsMask        */
> +        .align 64
> +        .long 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000           /* _iExpMantMask     */
> +        .align 64
> +        .long 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000           /* _iExpMask         */
> +        .align 64
> +        .long 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000           /* _iMinIdxOfsMask   */
> +        .align 64
> +        .long 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000           /* _iMaxIdxMask      */
> +        .align 64
> +        .type	__svml_dtanh_data_internal,@object
> +        .size	__svml_dtanh_data_internal,.-__svml_dtanh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core-avx2.S
> new file mode 100644
> index 0000000000..76bb22229e
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized tanhf.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN16v_tanhf _ZGVeN16v_tanhf_avx2_wrapper
> +#include "../svml_s_tanhf16_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core.c
> new file mode 100644
> index 0000000000..cec4c7ed74
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized tanhf, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN16v_tanhf
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN16v_tanhf, __GI__ZGVeN16v_tanhf,
> +	       __redirect__ZGVeN16v_tanhf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core_avx512.S
> new file mode 100644
> index 0000000000..b6bdf97cc5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core_avx512.S
> @@ -0,0 +1,381 @@
> +/* Function tanhf vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   NOTE: Since the hyperbolic tangent function is odd
> + *         (tanh(x) = -tanh(-x)), the algorithm below deals with the absolute
> + *         value of the argument |x|: tanh(x) = sign(x) * tanh(|x|)
> + *
> + *   We use a table lookup method to compute tanh(|x|).
> + *   The basic idea is to split the input range into a number of subintervals
> + *   and to approximate tanh(.) with a polynomial on each of them.
> + *
> + *   IEEE SPECIAL CONDITIONS:
> + *   x = [+,-]0, r = [+,-]0
> + *   x = +Inf,   r = +1
> + *   x = -Inf,   r = -1
> + *   x = QNaN,   r = QNaN
> + *   x = SNaN,   r = QNaN
> + *
> + *
> + *   ALGORITHM DETAILS
> + *   We handle special values in a callout function, aside from main path
> + *   computations. "Special" for this algorithm are:
> + *   INF, NAN, |x| > HUGE_THRESHOLD
> + *
> + *
> + *   Main path computations are organized as follows:
> + *   We split the interval [0, SATURATION_THRESHOLD)
> + *   into a number of subintervals.  On each subinterval we approximate tanh(.)
> + *   with a minimax polynomial of pre-defined degree.  The polynomial
> + *   coefficients are computed beforehand and stored in a table.  We also use
> + *
> + *       y := |x| + B,
> + *
> + *   where B depends on the subinterval and is used to bring the argument
> + *   closer to zero.
> + *   We also add a large fake interval [SATURATION_THRESHOLD, HUGE_THRESHOLD],
> + *   where the coefficients 1.0 + 0.0*y + 0.0*y^2 ... are stored - just to
> + *   preserve the main path computation logic but return 1.0 for all arguments.
> + *
> + *   Hence the reconstruction looks as follows:
> + *   we extract the proper polynomial and range reduction coefficients
> + *        (Pj and B) corresponding to the subinterval to which |x| belongs,
> + *        and return
> + *
> + *       r := sign(x) * (P0 + P1 * y + ... + Pn * y^n)
> + *
> + *   NOTE: we use a multiprecision technique to multiply and sum the first
> + *         K terms of the polynomial.  So Pj, j = 0..K are each stored in the
> + *         table as a pair of target precision numbers (Pj and PLj) to
> + *         achieve wider than target precision.
> + *
> + *
> + */
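> +
> +/* A minimal scalar C sketch of the reconstruction described above, assuming
> +   <math.h>; tanh_index, B, P and DEGREE are illustrative names only, not the
> +   actual table layout used below:
> +
> +       static float
> +       tanh_ref (float x)
> +       {
> +         float ax = fabsf (x);
> +         int i = tanh_index (ax);       // subinterval from the top bits of ax
> +         float y = ax + B[i];           // shift the argument towards zero
> +         float r = P[i][DEGREE];
> +         for (int k = DEGREE - 1; k >= 0; k--)
> +           r = fmaf (r, y, P[i][k]);    // Horner step, like the FMAs below
> +         return copysignf (r, x);       // tanh is odd: restore the sign of x
> +       }
> +
> +   The vector code below performs the same steps on all 16 lanes at once,
> +   fetching the per-subinterval coefficients with vpermt2ps table lookups.  */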
> +
> +/* Offsets for data table __svml_stanh_data_internal
> + */
> +#define _sC                           	0
> +#define _sP0                          	128
> +#define _sP2                          	256
> +#define _sP3                          	384
> +#define _sP4                          	512
> +#define _sP5                          	640
> +#define _sP6                          	768
> +#define _sP7                          	896
> +#define _iExpMantMask_UISA            	1024
> +#define _iMinIdxOfsMask_UISA          	1088
> +#define _iMaxIdxMask_UISA             	1152
> +#define _sSignMask                    	1216
> +#define _sAbsMask                     	1280
> +#define _iExpMantMask                 	1344
> +#define _iExpMask                     	1408
> +#define _iMinIdxOfsMask               	1472
> +#define _iMaxIdxMask                  	1536
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN16v_tanhf_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovaps   %zmm0, %zmm1
> +        vmovups   __svml_stanh_data_internal(%rip), %zmm9
> +        vmovups   _sP6+__svml_stanh_data_internal(%rip), %zmm11
> +        vmovups   _sP5+__svml_stanh_data_internal(%rip), %zmm12
> +        vmovups   _sP4+__svml_stanh_data_internal(%rip), %zmm13
> +        vmovups   _sP3+__svml_stanh_data_internal(%rip), %zmm14
> +        vmovups   _sP2+__svml_stanh_data_internal(%rip), %zmm15
> +        vpternlogd $255, %zmm2, %zmm2, %zmm2
> +        vandps    _sAbsMask+__svml_stanh_data_internal(%rip), %zmm1, %zmm8
> +        vandps    _sSignMask+__svml_stanh_data_internal(%rip), %zmm1, %zmm0
> +
> +/* Here huge arguments, INF and NaNs are filtered out to callout. */
> +        vpandd    _iExpMantMask_UISA+__svml_stanh_data_internal(%rip), %zmm1, %zmm3
> +        vpsubd    _iMinIdxOfsMask_UISA+__svml_stanh_data_internal(%rip), %zmm3, %zmm4
> +        vpcmpd    $2, _iExpMask+__svml_stanh_data_internal(%rip), %zmm3, %k1
> +
> +/*
> + *  small table specific variables
> + *  Constant loading
> + */
> +        vpxord    %zmm5, %zmm5, %zmm5
> +
> +/* if VMIN, VMAX is defined for I type */
> +        vpmaxsd   %zmm5, %zmm4, %zmm6
> +        vpminsd   _iMaxIdxMask_UISA+__svml_stanh_data_internal(%rip), %zmm6, %zmm7
> +        vpsrld    $21, %zmm7, %zmm10
> +        vmovups   _sP7+__svml_stanh_data_internal(%rip), %zmm4
> +        vpermt2ps _sC+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm9
> +        vpermt2ps _sP6+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm11
> +        vpermt2ps _sP7+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm4
> +        vpermt2ps _sP5+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm12
> +        vpermt2ps _sP4+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm13
> +        vpermt2ps _sP3+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm14
> +        vpermt2ps _sP2+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm15
> +        vpandnd   %zmm3, %zmm3, %zmm2{%k1}
> +        vptestmd  %zmm2, %zmm2, %k0
> +        vmovups   _sP0+__svml_stanh_data_internal(%rip), %zmm3
> +        vsubps    {rn-sae}, %zmm9, %zmm8, %zmm2
> +        kmovw     %k0, %edx
> +        vfmadd213ps {rn-sae}, %zmm11, %zmm2, %zmm4
> +        vpermt2ps _sP0+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm3
> +        vfmadd213ps {rn-sae}, %zmm12, %zmm2, %zmm4
> +        vfmadd213ps {rn-sae}, %zmm13, %zmm2, %zmm4
> +        vfmadd213ps {rn-sae}, %zmm14, %zmm2, %zmm4
> +        vfmadd213ps {rn-sae}, %zmm15, %zmm2, %zmm4
> +        vfmadd213ps {rn-sae}, %zmm3, %zmm2, %zmm4
> +        vorps     %zmm0, %zmm4, %zmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm1, 64(%rsp)
> +        vmovups   %zmm0, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $16, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     64(%rsp,%r14,4), %xmm0
> +        call      tanhf@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 128(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN16v_tanhf_skx)
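> +
> +/* The special-value path above reduces to the following loop (a sketch only;
> +   'mask' stands for the lane mask kept in %edx, 'in' and 'out' for the stack
> +   copies of the input and result vectors):
> +
> +       for (int lane = 0; lane < 16; lane++)
> +         if (mask & (1 << lane))
> +           out[lane] = tanhf (in[lane]);   // scalar libm handles this lane
> +
> +   Only lanes flagged by the range check (INF, NaN, |x| > HUGE_THRESHOLD)
> +   are recomputed; all other lanes keep the vector result.  */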
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_stanh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(64)) VUINT32 _sC[32][1];
> +        __declspec(align(64)) VUINT32 _sP0[32][1];
> +        __declspec(align(64)) VUINT32 _sP2[32][1];
> +        __declspec(align(64)) VUINT32 _sP3[32][1];
> +        __declspec(align(64)) VUINT32 _sP4[32][1];
> +        __declspec(align(64)) VUINT32 _sP5[32][1];
> +        __declspec(align(64)) VUINT32 _sP6[32][1];
> +        __declspec(align(64)) VUINT32 _sP7[32][1];
> +        __declspec(align(64)) VUINT32 _iExpMantMask_UISA[16][1];
> +        __declspec(align(64)) VUINT32 _iMinIdxOfsMask_UISA[16][1];
> +        __declspec(align(64)) VUINT32 _iMaxIdxMask_UISA[16][1];
> +        __declspec(align(64)) VUINT32 _sSignMask[16][1];
> +        __declspec(align(64)) VUINT32 _sAbsMask[16][1];
> +        __declspec(align(64)) VUINT32 _iExpMantMask[16][1];
> +        __declspec(align(64)) VUINT32 _iExpMask[16][1];
> +        __declspec(align(64)) VUINT32 _iMinIdxOfsMask[16][1];
> +        __declspec(align(64)) VUINT32 _iMaxIdxMask[16][1];
> +} __svml_stanh_data_internal;
> +#endif
> +__svml_stanh_data_internal:
> +        /*== _sC ==*/
> +        .long 0x00000000, 0x3d700000, 0x3d900000, 0x3db00000
> +        .long 0x3dd00000, 0x3df00000, 0x3e100000, 0x3e300000
> +        .long 0x3e500000, 0x3e700000, 0x3e900000, 0x3eb00000
> +        .long 0x3ed00000, 0x3ef00000, 0x3f100000, 0x3f300000
> +        .long 0x3f500000, 0x3f700000, 0x3f900000, 0x3fb00000
> +        .long 0x3fd00000, 0x3ff00000, 0x40100000, 0x40300000
> +        .long 0x40500000, 0x40700000, 0x40900000, 0x40b00000
> +        .long 0x40d00000, 0x40f00000, 0x41100000, 0x00000000
> +        /*== p0 ==*/
> +        .align 64
> +        .long 0x00000000, 0x3d6fb9c9, 0x3d8fc35f, 0x3daf9169
> +        .long 0x3dcf49ab, 0x3deee849, 0x3e0f0ee8, 0x3e2e4984
> +        .long 0x3e4d2f8e, 0x3e6bb32e, 0x3e8c51cd, 0x3ea96163
> +        .long 0x3ec543f1, 0x3edfd735, 0x3f028438, 0x3f18abf0
> +        .long 0x3f2bc480, 0x3f3bec1c, 0x3f4f2e5b, 0x3f613c53
> +        .long 0x3f6ce37d, 0x3f743c4f, 0x3f7a5feb, 0x3f7dea85
> +        .long 0x3f7f3b3d, 0x3f7fb78c, 0x3f7fefd4, 0x3f7ffdd0
> +        .long 0x3f7fffb4, 0x3f7ffff6, 0x3f7fffff, 0x3f800000
> +        /*== p2 ==*/
> +        .align 64
> +        .long 0x3f800000, 0x3f7f1f84, 0x3f7ebd11, 0x3f7e1e5f
> +        .long 0x3f7d609f, 0x3f7c842d, 0x3f7b00e5, 0x3f789580
> +        .long 0x3f75b8ad, 0x3f726fd9, 0x3f6cc59b, 0x3f63fb92
> +        .long 0x3f59ff97, 0x3f4f11d7, 0x3f3d7573, 0x3f24f360
> +        .long 0x3f0cbfe7, 0x3eec1a69, 0x3eb0a801, 0x3e6753a2
> +        .long 0x3e132f1a, 0x3db7e7d3, 0x3d320845, 0x3c84d3d4
> +        .long 0x3bc477b7, 0x3b10d3da, 0x3a01601e, 0x388c1a3b
> +        .long 0x3717b0da, 0x35a43bce, 0x338306c6, 0x00000000
> +        /*== p3 ==*/
> +        .align 64
> +        .long 0xb0343c7b, 0xbd6ee69d, 0xbd8f0da7, 0xbdae477d
> +        .long 0xbdcd2a1f, 0xbdeba80d, 0xbe0c443b, 0xbe293cf3
> +        .long 0xbe44f282, 0xbe5f3651, 0xbe81c7c0, 0xbe96d7ca
> +        .long 0xbea7fb8e, 0xbeb50e9e, 0xbec12efe, 0xbec4be92
> +        .long 0xbebce070, 0xbead510e, 0xbe8ef7d6, 0xbe4b8704
> +        .long 0xbe083237, 0xbdaf7449, 0xbd2e1ec4, 0xbc83bf06
> +        .long 0xbbc3e0b5, 0xbb10aadc, 0xba0157db, 0xb88c18f2
> +        .long 0xb717b096, 0xb5a43bae, 0xb383012c, 0x00000000
> +        /*== p4 ==*/
> +        .align 64
> +        .long 0xbeaaaaa5, 0xbeab0612, 0xbea7f01f, 0xbea4e120
> +        .long 0xbea387b7, 0xbea15962, 0xbe9d57f7, 0xbe976b5a
> +        .long 0xbe90230d, 0xbe880dff, 0xbe7479b3, 0xbe4c3d88
> +        .long 0xbe212482, 0xbdeb8cba, 0xbd5e78ad, 0x3c6b5e6e
> +        .long 0x3d839143, 0x3dc21ee1, 0x3de347af, 0x3dcbec96
> +        .long 0x3d99ef2d, 0x3d542ea1, 0x3cdde701, 0x3c2cca67
> +        .long 0x3b81cb27, 0x3ac073a1, 0x39ac3032, 0x383a94d9
> +        .long 0x36ca081d, 0x355abd4c, 0x332b3cb6, 0x00000000
> +        /*== p5 ==*/
> +        .align 64
> +        .long 0xb76dd6b9, 0xbe1c276d, 0x3c1dcf2f, 0x3dc1a78d
> +        .long 0x3d96f985, 0x3da2b61b, 0x3dc13397, 0x3dd2f670
> +        .long 0x3df48a0a, 0x3e06c5a8, 0x3e1a3aba, 0x3e27c405
> +        .long 0x3e2e78d0, 0x3e2c3e44, 0x3e1d3097, 0x3df4a8f4
> +        .long 0x3da38508, 0x3d31416a, 0x3b562657, 0xbcaeeac9
> +        .long 0xbcce9419, 0xbcaaeac4, 0xbc49e7d0, 0xbba71ddd
> +        .long 0xbb003b0e, 0xba3f9a05, 0xb92c08a7, 0xb7ba9232
> +        .long 0xb64a0b0f, 0xb4dac169, 0xb2ab78ac, 0x00000000
> +        /*== p6 ==*/
> +        .align 64
> +        .long 0x3e0910e9, 0x43761143, 0x4165ecdc, 0xc190f756
> +        .long 0xc08c097d, 0xc02ba813, 0xbf7f6bda, 0x3f2b1dc0
> +        .long 0x3ece105d, 0x3f426a94, 0xbadb0dc4, 0x3da43b17
> +        .long 0xbd51ab88, 0xbcaea23d, 0xbd3b6d8d, 0xbd6caaad
> +        .long 0xbd795bed, 0xbd5fddda, 0xbd038f3b, 0xbc1cad63
> +        .long 0x3abb4766, 0x3b95f10b, 0x3b825873, 0x3afaea66
> +        .long 0x3a49f878, 0x39996bf3, 0x388f3e6c, 0x371bb0e3
> +        .long 0x35a8a5e6, 0x34369b17, 0x322487b0, 0x00000000
> +        /*== p7 ==*/
> +        .align 64
> +        .long 0xbc0e2f66, 0x460bda12, 0x43d638ef, 0xc3e11c3e
> +        .long 0xc2baa4e9, 0xc249da2d, 0xc1859b82, 0x40dd5b57
> +        .long 0x40494640, 0x40c730a8, 0xbf0f160e, 0x3e30e76f
> +        .long 0xbea81387, 0xbdb26a1c, 0xbd351e57, 0xbb4c01a0
> +        .long 0x3c1d7bfb, 0x3c722cd1, 0x3c973f1c, 0x3c33a31b
> +        .long 0x3b862ef4, 0x3a27b3d0, 0xba3b5907, 0xba0efc22
> +        .long 0xb97f9f0f, 0xb8c8af50, 0xb7bdddfb, 0xb64f2950
> +        .long 0xb4e085b1, 0xb3731dfa, 0xb15a1f04, 0x00000000
> +        .align 64
> +        .long 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000           /* _iExpMantMask_UISA     */
> +        .align 64
> +        .long 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000           /* _iMinIdxOfsMask_UISA   */
> +        .align 64
> +        .long 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000           /* _iMaxIdxMask_UISA      */
> +        .align 64
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000           /* _sSignMask        */
> +        .align 64
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff           /* _sAbsMask         */
> +        .align 64
> +        .long 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000           /* _iExpMantMask     */
> +        .align 64
> +        .long 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000           /* _iExpMask         */
> +        .align 64
> +        .long 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000           /* _iMinIdxOfsMask   */
> +        .align 64
> +        .long 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000           /* _iMaxIdxMask      */
> +        .align 64
> +        .type	__svml_stanh_data_internal,@object
> +        .size	__svml_stanh_data_internal,.-__svml_stanh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core-sse2.S
> new file mode 100644
> index 0000000000..cd290db337
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized tanhf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN4v_tanhf _ZGVbN4v_tanhf_sse2
> +#include "../svml_s_tanhf4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core.c
> new file mode 100644
> index 0000000000..2dcb1f3676
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized tanhf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN4v_tanhf
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN4v_tanhf, __GI__ZGVbN4v_tanhf,
> +	       __redirect__ZGVbN4v_tanhf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core_sse4.S
> new file mode 100644
> index 0000000000..3a0ce20473
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core_sse4.S
> @@ -0,0 +1,832 @@
> +/* Function tanhf vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   NOTE: Since the hyperbolic tangent function is odd
> + *         (tanh(x) = -tanh(-x)), the algorithm below deals with the absolute
> + *         value of the argument |x|: tanh(x) = sign(x) * tanh(|x|)
> + *
> + *   We use a table lookup method to compute tanh(|x|).
> + *   The basic idea is to split the input range into a number of subintervals
> + *   and to approximate tanh(.) with a polynomial on each of them.
> + *
> + *   IEEE SPECIAL CONDITIONS:
> + *   x = [+,-]0, r = [+,-]0
> + *   x = +Inf,   r = +1
> + *   x = -Inf,   r = -1
> + *   x = QNaN,   r = QNaN
> + *   x = SNaN,   r = QNaN
> + *
> + *
> + *   ALGORITHM DETAILS
> + *   We handle special values in a callout function, aside from main path
> + *   computations. "Special" for this algorithm are:
> + *   INF, NAN, |x| > HUGE_THRESHOLD
> + *
> + *
> + *   Main path computations are organized as follows:
> + *   We split the interval [0, SATURATION_THRESHOLD)
> + *   into a number of subintervals.  On each subinterval we approximate tanh(.)
> + *   with a minimax polynomial of pre-defined degree.  The polynomial
> + *   coefficients are computed beforehand and stored in a table.  We also use
> + *
> + *       y := |x| + B,
> + *
> + *   where B depends on the subinterval and is used to bring the argument
> + *   closer to zero.
> + *   We also add a large fake interval [SATURATION_THRESHOLD, HUGE_THRESHOLD],
> + *   where the coefficients 1.0 + 0.0*y + 0.0*y^2 ... are stored - just to
> + *   preserve the main path computation logic but return 1.0 for all arguments.
> + *
> + *   Hence the reconstruction looks as follows:
> + *   we extract the proper polynomial and range reduction coefficients
> + *        (Pj and B) corresponding to the subinterval to which |x| belongs,
> + *        and return
> + *
> + *       r := sign(x) * (P0 + P1 * y + ... + Pn * y^n)
> + *
> + *   NOTE: we use a multiprecision technique to multiply and sum the first
> + *         K terms of the polynomial.  So Pj, j = 0..K are each stored in the
> + *         table as a pair of target precision numbers (Pj and PLj) to
> + *         achieve wider than target precision.
> + *
> + *
> + */
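> +
> +/* Unlike the AVX-512 variant, this version evaluates a degree-3 polynomial in
> +   double precision for each lane.  A per-lane C sketch, assuming <math.h>;
> +   idx_from_bits and dbP (rows of four doubles A0..A3) are illustrative names:
> +
> +       double ax = (double) fabsf (x);
> +       const double *A = dbP + 4 * idx_from_bits (x);  // coefficients A0..A3
> +       double r = A[0] + ax * (A[1] + ax * (A[2] + ax * A[3]));
> +       return copysignf ((float) r, x);                // restore the sign of x
> +
> +   The code below does this for all four lanes, gathering the coefficient rows
> +   with scalar loads and sharing the double-precision arithmetic between lane
> +   pairs.  */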
> +
> +/* Offsets for data table __svml_stanh_data_internal
> + */
> +#define _dbP                          	0
> +#define _sSignMask                    	4288
> +#define _sAbsMask                     	4304
> +#define _iExpMantMask                 	4320
> +#define _iExpMask                     	4336
> +#define _iMinIdxOfsMask               	4352
> +#define _iMaxIdxMask                  	4368
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN4v_tanhf_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +        movaps    %xmm0, %xmm5
> +
> +/* Here huge arguments, INF and NaNs are filtered out to callout. */
> +        movdqu    _iExpMantMask+__svml_stanh_data_internal(%rip), %xmm9
> +        lea       _dbP+16+__svml_stanh_data_internal(%rip), %r8
> +        pand      %xmm5, %xmm9
> +
> +/* if VMIN, VMAX is defined for I type */
> +        pxor      %xmm7, %xmm7
> +        movdqa    %xmm9, %xmm6
> +        psubd     _iMinIdxOfsMask+__svml_stanh_data_internal(%rip), %xmm9
> +
> +/*
> + *  small table specific variables
> + *  Constant loading
> + */
> +        movdqu    _iMaxIdxMask+__svml_stanh_data_internal(%rip), %xmm10
> +        movdqa    %xmm9, %xmm11
> +        movdqa    %xmm9, %xmm8
> +        pcmpgtd   %xmm10, %xmm11
> +        pcmpgtd   %xmm7, %xmm8
> +        movdqa    %xmm11, %xmm14
> +        pand      %xmm8, %xmm9
> +        andps     %xmm11, %xmm10
> +        andnps    %xmm9, %xmm14
> +        orps      %xmm10, %xmm14
> +        psrld     $14, %xmm14
> +        movd      %xmm14, %edx
> +        pshufd    $1, %xmm14, %xmm12
> +        pshufd    $2, %xmm14, %xmm13
> +        movd      %xmm12, %ecx
> +        pshufd    $3, %xmm14, %xmm15
> +        movups    _sAbsMask+__svml_stanh_data_internal(%rip), %xmm3
> +        movslq    %edx, %rdx
> +        andps     %xmm5, %xmm3
> +        movslq    %ecx, %rcx
> +        pcmpgtd   _iExpMask+__svml_stanh_data_internal(%rip), %xmm6
> +        movd      %xmm13, %esi
> +        movups    -16(%rdx,%r8), %xmm2
> +        movaps    %xmm2, %xmm0
> +        movd      %xmm15, %edi
> +        movmskps  %xmm6, %eax
> +        movups    -16(%rcx,%r8), %xmm6
> +        unpcklpd  %xmm6, %xmm0
> +        unpckhpd  %xmm6, %xmm2
> +        cvtps2pd  %xmm3, %xmm6
> +        movhlps   %xmm3, %xmm3
> +        cvtps2pd  %xmm3, %xmm3
> +        movslq    %esi, %rsi
> +        movslq    %edi, %rdi
> +        movups    (%rcx,%r8), %xmm8
> +        movups    (%rdx,%r8), %xmm12
> +        movups    (%rsi,%r8), %xmm13
> +        movaps    %xmm12, %xmm10
> +        movups    (%rdi,%r8), %xmm9
> +        movaps    %xmm13, %xmm11
> +        unpckhpd  %xmm8, %xmm12
> +        unpckhpd  %xmm9, %xmm13
> +        mulpd     %xmm6, %xmm12
> +        mulpd     %xmm3, %xmm13
> +        unpcklpd  %xmm8, %xmm10
> +        unpcklpd  %xmm9, %xmm11
> +        addpd     %xmm10, %xmm12
> +        addpd     %xmm11, %xmm13
> +        mulpd     %xmm6, %xmm12
> +        mulpd     %xmm3, %xmm13
> +        addpd     %xmm2, %xmm12
> +        movups    -16(%rsi,%r8), %xmm1
> +        movups    -16(%rdi,%r8), %xmm7
> +        movaps    %xmm1, %xmm14
> +        unpckhpd  %xmm7, %xmm1
> +        addpd     %xmm1, %xmm13
> +        mulpd     %xmm12, %xmm6
> +        mulpd     %xmm13, %xmm3
> +        addpd     %xmm0, %xmm6
> +        unpcklpd  %xmm7, %xmm14
> +        addpd     %xmm14, %xmm3
> +        cvtpd2ps  %xmm6, %xmm0
> +        cvtpd2ps  %xmm3, %xmm1
> +        movups    _sSignMask+__svml_stanh_data_internal(%rip), %xmm4
> +        movlhps   %xmm1, %xmm0
> +        andps     %xmm5, %xmm4
> +        orps      %xmm4, %xmm0
> +        testl     %eax, %eax
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 eax xmm0 xmm5
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm5, 32(%rsp)
> +        movups    %xmm0, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 eax
> +
> +        xorl      %edx, %edx
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %edx, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %eax, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      tanhf@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 48(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN4v_tanhf_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_stanh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(16)) VUINT32 _dbP[(134*4)][2];
> +        __declspec(align(16)) VUINT32 _sSignMask[4][1];
> +        __declspec(align(16)) VUINT32 _sAbsMask[4][1];
> +        __declspec(align(16)) VUINT32 _iExpMantMask[4][1];
> +        __declspec(align(16)) VUINT32 _iExpMask[4][1];
> +        __declspec(align(16)) VUINT32 _iMinIdxOfsMask[4][1];
> +        __declspec(align(16)) VUINT32 _iMaxIdxMask[4][1];
> +} __svml_stanh_data_internal;
> +#endif
> +__svml_stanh_data_internal:
> +        /* Pol_000:  err=7.93e-09, x in [0.0000000; 0.0312500]. */
> +        .quad 0x0000000000000000  /* A00 = +0.000000000000000000000e-01 */
> +        .quad 0x3FF00000022C70EB  /* A01 = +1.000000008097283510367e+00 */
> +        .quad 0xBED00E878CFFA194  /* A02 = -3.828228912518614443549e-06 */
> +        .quad 0xBFD551766D0607A9  /* A03 = -3.330970825846813476723e-01 */
> +        .quad 0xBE53D60CE3E4C297  /* A00 = -1.847383956330407336230e-08 */
> +        .quad 0x3FF000024177CF5C  /* A01 = +1.000002151235967140508e+00 */
> +        .quad 0xBF1758BC94A51A25  /* A02 = -8.906031613262943753568e-05 */
> +        .quad 0xBFD53EAE67E0D4F0  /* A03 = -3.319507612644221339337e-01 */
> +        .quad 0xBE5A9E47EF32D6FE  /* A00 = -2.479020984039698285657e-08 */
> +        .quad 0x3FF00002DA983057  /* A01 = +1.000002721676556793895e+00 */
> +        .quad 0xBF1BD953509E94AA  /* A02 = -1.062352277175377670507e-04 */
> +        .quad 0xBFD53BDB562EEDD5  /* A03 = -3.317783681520414806876e-01 */
> +        .quad 0xBE6191BBE496D294  /* A00 = -3.272532162914017685901e-08 */
> +        .quad 0x3FF0000390492017  /* A01 = +1.000003398528866105366e+00 */
> +        .quad 0xBF20727E814A57CE  /* A02 = -1.254825043772153972919e-04 */
> +        .quad 0xBFD538DE060A6F22  /* A03 = -3.315959033004550748913e-01 */
> +        .quad 0xBE66DAFA2A893A25  /* A00 = -4.257146219278012568149e-08 */
> +        .quad 0x3FF0000465E08CD1  /* A01 = +1.000004194219219266770e+00 */
> +        .quad 0xBF2341C765EF91B6  /* A02 = -1.469188600530365522261e-04 */
> +        .quad 0xBFD535B6841FAF9E  /* A03 = -3.314033785124993469751e-01 */
> +        .quad 0xBE6D5794E361E964  /* A00 = -5.465394929765249413434e-08 */
> +        .quad 0x3FF000055EE2A0CB  /* A01 = +1.000005121846742950353e+00 */
> +        .quad 0xBF265E6C77E66C8B  /* A02 = -1.706607253709506650304e-04 */
> +        .quad 0xBFD53264DDCCEDA6  /* A03 = -3.312008062382240103361e-01 */
> +        .quad 0xBE729C844D374A6E  /* A00 = -6.933284462462096107184e-08 */
> +        .quad 0x3FF000067F019093  /* A01 = +1.000006195180536350264e+00 */
> +        .quad 0xBF29CC5348D6DCE5  /* A02 = -1.968242326435338705130e-04 */
> +        .quad 0xBFD52EE92121ED35  /* A03 = -3.309881995734998416658e-01 */
> +        .quad 0xBE775AEA17EAA872  /* A00 = -8.700465590574974405858e-08 */
> +        .quad 0x3FF00007CA1D66B8  /* A01 = +1.000007428656699559610e+00 */
> +        .quad 0xBF2D8F5EB98A2637  /* A02 = -2.255252009216044881395e-04 */
> +        .quad 0xBFD52B435CDF9128  /* A03 = -3.307655722585587376727e-01 */
> +        .quad 0xBE7D04DA28C343F0  /* A00 = -1.081040272327705484794e-07 */
> +        .quad 0x3FF000094443CCF5  /* A01 = +1.000008837375216730337e+00 */
> +        .quad 0xBF30D5B76C947AE5  /* A02 = -2.568791210978817814332e-04 */
> +        .quad 0xBFD52773A0776FAD  /* A03 = -3.305329386764651045105e-01 */
> +        .quad 0xBE81DD77A12C51C7  /* A00 = -1.331054169875768625701e-07 */
> +        .quad 0x3FF0000AF1AFD2DA  /* A01 = +1.000010437096696680470e+00 */
> +        .quad 0xBF331230624C1680  /* A02 = -2.910011410651516805537e-04 */
> +        .quad 0xBFD52379FC0B61DF  /* A03 = -3.302903138515186909352e-01 */
> +        .quad 0xBE85D04EEEB3C435  /* A00 = -1.625247628488202841012e-07 */
> +        .quad 0x3FF0000CD6C9B1F2  /* A01 = +1.000012244238970726684e+00 */
> +        .quad 0xBF357F0742FADDD4  /* A02 = -3.280060509313874068243e-04 */
> +        .quad 0xBFD51F56806D0E81  /* A03 = -3.300377134475880880338e-01 */
> +        .quad 0xBE8A6E289B59681B  /* A00 = -1.969211333326924655065e-07 */
> +        .quad 0x3FF0000EF8268F72  /* A01 = +1.000014275873550406715e+00 */
> +        .quad 0xBF381E277A1B747A  /* A02 = -3.680082682942575423093e-04 */
> +        .quad 0xBFD51B093F1D6FD4  /* A03 = -3.297751537663746734808e-01 */
> +        .quad 0xBE8FCBC40EE9ABD5  /* A00 = -2.368983653301529373887e-07 */
> +        .quad 0x3FF000115A883B6C  /* A01 = +1.000016549721943981410e+00 */
> +        .quad 0xBF3AF17AC974B3D9  /* A02 = -4.111218235774406434303e-04 */
> +        .quad 0xBFD516924A4C549C  /* A03 = -3.295026517456081105450e-01 */
> +        .quad 0xBE92FFBC60A3F956  /* A00 = -2.831066871072026054144e-07 */
> +        .quad 0x3FF0001402DCED8A  /* A01 = +1.000019084151832604590e+00 */
> +        .quad 0xBF3DFAE9390C4801  /* A02 = -4.574603454311488280083e-04 */
> +        .quad 0xBFD511F1B4D7DC3A  /* A03 = -3.292202249571719585575e-01 */
> +        .quad 0xBE9690A22F96D5AD  /* A00 = -3.362443262393081632612e-07 */
> +        .quad 0x3FF00016F63EFF5D  /* A01 = +1.000021898173108825247e+00 */
> +        .quad 0xBF409E2C839605BB  /* A02 = -5.071370461992499986334e-04 */
> +        .quad 0xBFD50D27924BEE00  /* A03 = -3.289278916051614487515e-01 */
> +        .quad 0xBE9AA56C65E72A73  /* A00 = -3.970591019557469835586e-07 */
> +        .quad 0x3FF0001A39F4A43E  /* A01 = +1.000025011433776978009e+00 */
> +        .quad 0xBF425BD74C3D6667  /* A02 = -5.602647074553602319844e-04 */
> +        .quad 0xBFD50833F6E1ABA2  /* A03 = -3.286256705238718156536e-01 */
> +        .quad 0xBE9F4BD4FF1A83B0  /* A00 = -4.663500013744687071912e-07 */
> +        .quad 0x3FF0001DD36F9EC2  /* A01 = +1.000028444215715683896e+00 */
> +        .quad 0xBF44376634149405  /* A02 = -6.169556656102642569831e-04 */
> +        .quad 0xBFD50316F77EDEE5  /* A03 = -3.283135811757190158922e-01 */
> +        .quad 0xBEA3B625387BB079  /* A00 = -5.874486399249461304297e-07 */
> +        .quad 0x3FF00023E14CFBA9  /* A01 = +1.000034217911642153709e+00 */
> +        .quad 0xBF47392F923218D2  /* A02 = -7.087213783883111826306e-04 */
> +        .quad 0xBFD4FB1FACDEB938  /* A03 = -3.278273761924483942209e-01 */
> +        .quad 0xBEAA6E24F543500A  /* A00 = -7.876828740601738750574e-07 */
> +        .quad 0x3FF0002D5C6E8412  /* A01 = +1.000043259679163742959e+00 */
> +        .quad 0xBF4BAF02BD7FDD70  /* A02 = -8.448375110664940040861e-04 */
> +        .quad 0xBFD4EFEE6527A7DE  /* A03 = -3.271442401734229177279e-01 */
> +        .quad 0xBEB16E3EBE2157D0  /* A00 = -1.038947396133402500647e-06 */
> +        .quad 0x3FF00038990FEE2F  /* A01 = +1.000053975962952312884e+00 */
> +        .quad 0xBF50569481C574CB  /* A02 = -9.972048056490652716971e-04 */
> +        .quad 0xBFD4E419278DA2B4  /* A03 = -3.264220129263251113372e-01 */
> +        .quad 0xBEB6A7B6723165D4  /* A00 = -1.350350836279403750524e-06 */
> +        .quad 0x3FF00045CAB4158E  /* A01 = +1.000066558657042303793e+00 */
> +        .quad 0xBF531D7C9C849108  /* A02 = -1.166698160951775212202e-03 */
> +        .quad 0xBFD4D7A0BB33B152  /* A03 = -3.256608799117844954552e-01 */
> +        .quad 0xBEBD0EE2A8654AFD  /* A00 = -1.732000471561702711532e-06 */
> +        .quad 0x3FF00055276F18D6  /* A01 = +1.000081209219890521211e+00 */
> +        .quad 0xBF562FDBA3FB6C6C  /* A02 = -1.354183666925102939860e-03 */
> +        .quad 0xBFD4CA85F1B93DB2  /* A03 = -3.248610363561638125773e-01 */
> +        .quad 0xBEC269D4036A207E  /* A00 = -2.195047297096822741730e-06 */
> +        .quad 0x3FF00066E7DA6E4E  /* A01 = +1.000098138500919997540e+00 */
> +        .quad 0xBF5991499FC36B3A  /* A02 = -1.560518167983372759405e-03 */
> +        .quad 0xBFD4BCC9A72283D6  /* A03 = -3.240226871658341556426e-01 */
> +        .quad 0xBEC7154B6C09CFE1  /* A00 = -2.751729738565190291276e-06 */
> +        .quad 0x3FF0007B47086B80  /* A01 = +1.000117566559055148900e+00 */
> +        .quad 0xBF5D455433B4F8F4  /* A02 = -1.786548832412968197680e-03 */
> +        .quad 0xBFD4AE6CC1BFE145  /* A03 = -3.231460468373550942722e-01 */
> +        .quad 0xBECCA68CC64A0F8A  /* A00 = -3.415415948561670285790e-06 */
> +        .quad 0x3FF00092827742F7  /* A01 = +1.000139722473418535387e+00 */
> +        .quad 0xBF60A7BF15A527AF  /* A02 = -2.033112728132522705610e-03 */
> +        .quad 0xBFD49F703214084C  /* A03 = -3.222313393636155876010e-01 */
> +        .quad 0xBED19E68676B241B  /* A00 = -4.200644630977303616698e-06 */
> +        .quad 0x3FF000ACDA037B26  /* A01 = +1.000164844146362863597e+00 */
> +        .quad 0xBF62D99F836A02F8  /* A02 = -2.301036405072284102280e-03 */
> +        .quad 0xBFD48FD4F2B91B28  /* A03 = -3.212787981359945810311e-01 */
> +        .quad 0xBED57CF4B0C7AA54  /* A00 = -5.123164339408145209103e-06 */
> +        .quad 0x3FF000CA8FD9E1A1  /* A01 = +1.000193178099017865534e+00 */
> +        .quad 0xBF653A014548E686  /* A02 = -2.591135484433962181405e-03 */
> +        .quad 0xBFD47F9C0844B38F  /* A03 = -3.202886658426046806447e-01 */
> +        .quad 0xBEDA012B1B1A41E2  /* A00 = -6.199971197454598722328e-06 */
> +        .quad 0x3FF000EBE868FDF4  /* A01 = +1.000224979259539459520e+00 */
> +        .quad 0xBF67CA9427E0A544  /* A02 = -2.904214255086275467410e-03 */
> +        .quad 0xBFD46EC6812ADB37  /* A03 = -3.192611943626845749655e-01 */
> +        .quad 0xBEDF3EAC5BF12194  /* A00 = -7.449344990702664567927e-06 */
> +        .quad 0x3FF001112A520784  /* A01 = +1.000260510744255704196e+00 */
> +        .quad 0xBF6A8D01ABDA4DC4  /* A02 = -3.241065277345108255891e-03 */
> +        .quad 0xBFD45D55759FFA4A  /* A03 = -3.181966446572103146551e-01 */
> +        .quad 0xBEE2A541BC274267  /* A00 = -8.890883582164319970972e-06 */
> +        .quad 0x3FF0013A9E5961F2  /* A01 = +1.000300043631906721231e+00 */
> +        .quad 0xBF6D82ECD080C540  /* A02 = -3.602468994380686462264e-03 */
> +        .quad 0xBFD44B4A0779C0AD  /* A03 = -3.170952866557950611259e-01 */
> +        .quad 0xBEE61D97609A27F4  /* A00 = -1.054553560499505625520e-05 */
> +        .quad 0x3FF001688F56A3AF  /* A01 = +1.000343856731187974773e+00 */
> +        .quad 0xBF7056F8EFB683EC  /* A02 = -3.989193351487490407647e-03 */
> +        .quad 0xBFD438A5620F0F74  /* A03 = -3.159573991399533543500e-01 */
> +        .quad 0xBEEA145429EDD370  /* A00 = -1.243563138839952927732e-05 */
> +        .quad 0x3FF0019B4A242A67  /* A01 = +1.000392236341804297339e+00 */
> +        .quad 0xBF7207D31CA78D9B  /* A02 = -4.401993423445739288258e-03 */
> +        .quad 0xBFD42568BA16E7CD  /* A03 = -3.147832696228050619602e-01 */
> +        .quad 0xBEEE96370D52680F  /* A00 = -1.458491207477835326165e-05 */
> +        .quad 0x3FF001D31D8E4115  /* A01 = +1.000445476009251821736e+00 */
> +        .quad 0xBF73D4CC11EDC094  /* A02 = -4.841611050196221316400e-03 */
> +        .quad 0xBFD411954D8664E7  /* A03 = -3.135731942252974469021e-01 */
> +        .quad 0xBEF338C046215EF8  /* A00 = -1.833122622260562810219e-05 */
> +        .quad 0x3FF00230C32C2EC1  /* A01 = +1.000534784691737621998e+00 */
> +        .quad 0xBF76BD019BCC5DAF  /* A02 = -5.551344188254799492943e-03 */
> +        .quad 0xBFD3F2C7156DC21E  /* A03 = -3.116929730668135389848e-01 */
> +        .quad 0xBEF9B15EAE411EAE  /* A00 = -2.450261207822986676092e-05 */
> +        .quad 0x3FF002C2DF057A4D  /* A01 = +1.000674124886830940184e+00 */
> +        .quad 0xBF7B08CCD9AC1E30  /* A02 = -6.600189396301511801646e-03 */
> +        .quad 0xBFD3C7A7A114FED8  /* A03 = -3.090609620157755976777e-01 */
> +        .quad 0xBF00E36483C373B3  /* A00 = -3.221178528332122595812e-05 */
> +        .quad 0x3FF0036F419480D7  /* A01 = +1.000838524028997644777e+00 */
> +        .quad 0xBF7FD255D1777007  /* A02 = -7.768950679260206403087e-03 */
> +        .quad 0xBFD39A453911D6CE  /* A03 = -3.062909180947429588215e-01 */
> +        .quad 0xBF05DFA04DD12059  /* A00 = -4.172046622180685472624e-05 */
> +        .quad 0x3FF00438B2A03D8D  /* A01 = +1.001030633695197069599e+00 */
> +        .quad 0xBF828F8DBB4A9D10  /* A02 = -9.062869337255224921890e-03 */
> +        .quad 0xBFD36AAB704697D9  /* A03 = -3.033856007044711255993e-01 */
> +        .quad 0xBF0BF3E0C647DEFB  /* A00 = -5.331544597092331081714e-05 */
> +        .quad 0x3FF005221063D36D  /* A01 = +1.001253189109060359741e+00 */
> +        .quad 0xBF857A2CB3C96102  /* A02 = -1.048693584122917590862e-02 */
> +        .quad 0xBFD338E65BBB4FEC  /* A03 = -3.003478904549854444639e-01 */
> +        .quad 0xBF11A506ED7C9D31  /* A00 = -6.730894835681591541979e-05 */
> +        .quad 0x3FF0062E4D0EA92A  /* A01 = +1.001508999829250345925e+00 */
> +        .quad 0xBF88AB82C2761AF3  /* A02 = -1.204588085125866091241e-02 */
> +        .quad 0xBFD305028D6BD206  /* A03 = -2.971807843271395688234e-01 */
> +        .quad 0xBF1607C0922D9BF1  /* A00 = -8.403885708006799337092e-05 */
> +        .quad 0x3FF007606C341961  /* A01 = +1.001800940198869449560e+00 */
> +        .quad 0xBF8C25E6DA487BCF  /* A02 = -1.374416688582682892494e-02 */
> +        .quad 0xBFD2CF0D0EE8F7B5  /* A03 = -2.938873906713255768075e-01 */
> +        .quad 0xBF1B3A8480A0A16D  /* A00 = -1.038688061788578038307e-04 */
> +        .quad 0x3FF008BB802D02D6  /* A01 = +1.002131939589323561535e+00 */
> +        .quad 0xBF8FEB8AE99FD100  /* A02 = -1.558598065819483124983e-02 */
> +        .quad 0xBFD297135BD0911B  /* A03 = -2.904709240558688843059e-01 */
> +        .quad 0xBF20ABB9BDB75C65  /* A00 = -1.271881327357976163798e-04 */
> +        .quad 0x3FF00A42A76D8CD1  /* A01 = +1.002504972472525901495e+00 */
> +        .quad 0xBF91FF3D752BB9E6  /* A02 = -1.757522609380570560722e-02 */
> +        .quad 0xBFD25D235C1F88B4  /* A03 = -2.869346999779154305799e-01 */
> +        .quad 0xBF243D3254425461  /* A00 = -1.544116913733432829448e-04 */
> +        .quad 0x3FF00BF909D1795E  /* A01 = +1.002923048355647051011e+00 */
> +        .quad 0xBF94304E04D44942  /* A02 = -1.971551804042204897316e-02 */
> +        .quad 0xBFD2214B5E61CFA6  /* A03 = -2.832821294498394371075e-01 */
> +        .quad 0xBF286070011B61CE  /* A00 = -1.859795307186510085994e-04 */
> +        .quad 0x3FF00DE1D5E1627E  /* A01 = +1.003389201612804537689e+00 */
> +        .quad 0xBF9689D5F4163F59  /* A02 = -2.201017668045266231780e-02 */
> +        .quad 0xBFD1E39A11C3B42C  /* A03 = -2.795167134743816728104e-01 */
> +        .quad 0xBF2D250B366A79E8  /* A00 = -2.223564326486314902259e-04 */
> +        .quad 0x3FF010003E134001  /* A01 = +1.003906481248123094829e+00 */
> +        .quad 0xBF990C9FF91F6F81  /* A02 = -2.446222265267250853271e-02 */
> +        .quad 0xBFD1A41E80084CDC  /* A03 = -2.756420374218586655246e-01 */
> +        .quad 0xBF314DB5DDC2A30E  /* A00 = -2.640313157465248123865e-04 */
> +        .quad 0x3FF012577608921B  /* A01 = +1.004477940624503018441e+00 */
> +        .quad 0xBF9BB9626875B0C9  /* A02 = -2.707437288829409385849e-02 */
> +        .quad 0xBFD162E80768A9D0  /* A03 = -2.716617653228725615122e-01 */
> +        .quad 0xBF346A6133808864  /* A00 = -3.115165050094957730625e-04 */
> +        .quad 0x3FF014EAAFCC88A3  /* A01 = +1.005106627192198898157e+00 */
> +        .quad 0xBF9E90BEF9BF7419  /* A02 = -2.984903716411588595059e-02 */
> +        .quad 0xBFD12006545F7FAD  /* A03 = -2.675796340899932457269e-01 */
> +        .quad 0xBF37F180DC3848EA  /* A00 = -3.653468704395550778821e-04 */
> +        .quad 0x3FF017BD19147861  /* A01 = +1.005795572250939295955e+00 */
> +        .quad 0xBFA0C9A14C702E07  /* A02 = -3.278831537326359207851e-02 */
> +        .quad 0xBFD0DB895B650092  /* A03 = -2.633994476818851682154e-01 */
> +        .quad 0xBF3BEC6AAC6D7635  /* A00 = -4.260788377246944457107e-04 */
> +        .quad 0x3FF01AD1D884E719  /* A01 = +1.006547780778822565040e+00 */
> +        .quad 0xBFA260B2A1B1434A  /* A02 = -3.589399551186163439542e-02 */
> +        .quad 0xBFD09581529E93D6  /* A03 = -2.591250712233067465817e-01 */
> +        .quad 0xBF4164E26167882B  /* A00 = -5.308251737086202562063e-04 */
> +        .quad 0x3FF01FEF14B62B81  /* A01 = +1.007796364693348545316e+00 */
> +        .quad 0xBFA4EB014538AA42  /* A02 = -4.085544557559163403315e-02 */
> +        .quad 0xBFD029D36FEAF41F  /* A03 = -2.525528519580024222613e-01 */
> +        .quad 0xBF46F6FFF4E53DC8  /* A00 = -7.008313930700277652464e-04 */
> +        .quad 0x3FF027CBB51CBBA0  /* A01 = +1.009715754956893363214e+00 */
> +        .quad 0xBFA89DEC9FEC112E  /* A02 = -4.807986690687680864098e-02 */
> +        .quad 0xBFCF2A99464D0DB4  /* A03 = -2.434875100390009317053e-01 */
> +        .quad 0xBF4DCC9C4F66A4D9  /* A00 = -9.094012482836712945103e-04 */
> +        .quad 0x3FF030E7CFCCD583  /* A01 = +1.011939822882909068014e+00 */
> +        .quad 0xBFACAA3B95814081  /* A02 = -5.598627281199331645611e-02 */
> +        .quad 0xBFCDF78F156BE7CF  /* A03 = -2.341173987004467604844e-01 */
> +        .quad 0xBF5308ED74E5C7A6  /* A00 = -1.161796466103906435435e-03 */
> +        .quad 0x3FF03B5986412ECB  /* A01 = +1.014489674026594512313e+00 */
> +        .quad 0xBFB087EBA88DCC3F  /* A02 = -6.457398285947223148806e-02 */
> +        .quad 0xBFCCBB9BD134862F  /* A03 = -2.244753619680052991736e-01 */
> +        .quad 0xBF57FA23C00DF4B5  /* A00 = -1.463446533505758208674e-03 */
> +        .quad 0x3FF0473558A1BCC0  /* A01 = +1.017384859292903342975e+00 */
> +        .quad 0xBFB2E702BC6360EF  /* A02 = -7.383744334527241048871e-02 */
> +        .quad 0xBFCB77D546379288  /* A03 = -2.145945160729250122955e-01 */
> +        .quad 0xBF5DD12971557F71  /* A00 = -1.819887610814388068450e-03 */
> +        .quad 0x3FF0548DDF5000A8  /* A01 = +1.020643112482540360020e+00 */
> +        .quad 0xBFB571B63DA186E1  /* A02 = -8.376635555898871710045e-02 */
> +        .quad 0xBFCA2D5202605148  /* A03 = -2.045080672838912594358e-01 */
> +        .quad 0xBF6252B1AD5D4F17  /* A00 = -2.236697221556737096709e-03 */
> +        .quad 0x3FF063738A910BF7  /* A01 = +1.024280110622155737232e+00 */
> +        .quad 0xBFB8270C8E6B601B  /* A02 = -9.434584118878357184013e-02 */
> +        .quad 0xBFC8DD27D950A07E  /* A03 = -1.942491351230763441116e-01 */
> +        .quad 0xBF66470C91730CFC  /* A00 = -2.719425723258004842786e-03 */
> +        .quad 0x3FF073F468FCF331  /* A01 = +1.028309259519300633556e+00 */
> +        .quad 0xBFBB05C2952191E4  /* A02 = -1.055566419686964629854e-01 */
> +        .quad 0xBFC7886A770DE2BD  /* A03 = -1.838505822486435070662e-01 */
> +        .quad 0xBF6AD114AC8E98EC  /* A00 = -3.273525599485007861467e-03 */
> +        .quad 0x3FF0861BF53E5226  /* A01 = +1.032741506559554434119e+00 */
> +        .quad 0xBFBE0C4F9B461507  /* A02 = -1.173753503881763554650e-01 */
> +        .quad 0xBFC6302A037CDE3A  /* A03 = -1.733448521642786954722e-01 */
> +        .quad 0xBF6FFBDE2A6C2AF8  /* A00 = -3.904279630096648551207e-03 */
> +        .quad 0x3FF099F2EB8E7DA3  /* A01 = +1.037585182326304034106e+00 */
> +        .quad 0xBFC09C74D192DDF0  /* A02 = -1.297746680554463516444e-01 */
> +        .quad 0xBFC4D571D8E3079F  /* A03 = -1.627638157861470424859e-01 */
> +        .quad 0xBF72E8FDC0B952AA  /* A00 = -4.616728994353872309042e-03 */
> +        .quad 0x3FF0AF7F273C9533  /* A01 = +1.042845872181101141152e+00 */
> +        .quad 0xBFC244C512736F10  /* A02 = -1.427236881344176033792e-01 */
> +        .quad 0xBFC379474F58B902  /* A03 = -1.521386277613104298645e-01 */
> +        .quad 0xBF762EABAF17395B  /* A00 = -5.415602341101023557701e-03 */
> +        .quad 0x3FF0C6C3886F63FB  /* A01 = +1.048526318502125631582e+00 */
> +        .quad 0xBFC3FDF9918EA12A  /* A02 = -1.561881981590514389957e-01 */
> +        .quad 0xBFC21CA89ECAB895  /* A03 = -1.414995932913753196036e-01 */
> +        .quad 0xBF79D387CE5B2BAE  /* A00 = -6.305246822828998107258e-03 */
> +        .quad 0x3FF0DFBFE2346376  /* A01 = +1.054626353847394337748e+00 */
> +        .quad 0xBFC5C6DA43602620  /* A02 = -1.701309994680721970894e-01 */
> +        .quad 0xBFC0C08BD8DB6631  /* A03 = -1.308760460731704100557e-01 */
> +        .quad 0xBF7DDBA8E8DA9060  /* A00 = -7.289562037531366334164e-03 */
> +        .quad 0x3FF0FA70F0D1B464  /* A01 = +1.061142864894713433443e+00 */
> +        .quad 0xBFC79E18D92BAA7C  /* A02 = -1.845122394946264732241e-01 */
> +        .quad 0xBFBECBBBF74C2669  /* A03 = -1.202962378266875381749e-01 */
> +        .quad 0xBF81254E76EA25DA  /* A00 = -8.371937755572145950511e-03 */
> +        .quad 0x3FF116D05835EBD0  /* A01 = +1.068069786618014660462e+00 */
> +        .quad 0xBFC982539E2ED224  /* A02 = -1.992897531869327609755e-01 */
> +        .quad 0xBFBC1B043C350159  /* A03 = -1.097872397413132278254e-01 */
> +        .quad 0xBF8391ACBA863403  /* A00 = -9.555196230190082448686e-03 */
> +        .quad 0x3FF134D4AA477FE2  /* A01 = +1.075398125794884141015e+00 */
> +        .quad 0xBFCB7218609FEAFB  /* A02 = -2.144194099235717521079e-01 */
> +        .quad 0xBFB970A16CB88329  /* A03 = -9.937485603633135211599e-02 */
> +        .quad 0xBF87935088E48E8B  /* A00 = -1.151144902957603431692e-02 */
> +        .quad 0x3FF1649892AD7DD3  /* A01 = +1.087059567413110938716e+00 */
> +        .quad 0xBFCE6971DDE75409  /* A02 = -2.375929196847723912089e-01 */
> +        .quad 0xBFB58291E88CB251  /* A03 = -8.402358939628952472223e-02 */
> +        .quad 0xBF8DB3A62C325325  /* A00 = -1.450280973794233242702e-02 */
> +        .quad 0x3FF1A9C900C6DEEA  /* A01 = +1.103951457056548068891e+00 */
> +        .quad 0xBFD13DBC65B0E08E  /* A02 = -2.693930619311765140012e-01 */
> +        .quad 0xBFB06696F62696D1  /* A03 = -6.406539449252625362252e-02 */
> +        .quad 0xBF92583699F2E27A  /* A00 = -1.791463198307716858659e-02 */
> +        .quad 0x3FF1F451B85AA9F0  /* A01 = +1.122148246892376022288e+00 */
> +        .quad 0xBFD34FD5F8288180  /* A02 = -3.017477916164565954205e-01 */
> +        .quad 0xBFA6FB692825B683  /* A03 = -4.488686194495718900788e-02 */
> +        .quad 0xBF9641C26E673D6F  /* A00 = -2.173522757385398448959e-02 */
> +        .quad 0x3FF24364DA5E2B07  /* A01 = +1.141453602790251542487e+00 */
> +        .quad 0xBFD564A5A5EF5890  /* A02 = -3.342680092295120530821e-01 */
> +        .quad 0xBF9B43712011A982  /* A03 = -2.662445791467283467968e-02 */
> +        .quad 0xBF9A901038EC2F39  /* A00 = -2.594018313816024226548e-02 */
> +        .quad 0x3FF2961356DFFEBA  /* A01 = +1.161639537196534011088e+00 */
> +        .quad 0xBFD775EBB17198C7  /* A02 = -3.665723069046972759644e-01 */
> +        .quad 0xBF833B1A926CD462  /* A03 = -9.390075295963199591975e-03 */
> +        .quad 0xBF9F396A6A461B91  /* A00 = -3.049246095317987084727e-02 */
> +        .quad 0x3FF2EB53BAEF534B  /* A01 = +1.182452898229899629357e+00 */
> +        .quad 0xBFD97DABF8AD8BBD  /* A02 = -3.982953957076310058660e-01 */
> +        .quad 0x3F7B8F6A3E0F8837  /* A03 = +6.728568086119371925713e-03 */
> +        .quad 0xBFA21878590F8BAA  /* A00 = -3.534294211546946951064e-02 */
> +        .quad 0x3FF34209790236E1  /* A01 = +1.203622315111197105253e+00 */
> +        .quad 0xBFDB764C0E71BECB  /* A02 = -4.290952817018306997277e-01 */
> +        .quad 0x3F962FE0C03F84C0  /* A03 = +2.166701482190513949888e-02 */
> +        .quad 0xBFA4B36B9AD27ECC  /* A00 = -4.043136849327097492868e-02 */
> +        .quad 0x3FF3990C5B12FC16  /* A01 = +1.224865298994477935679e+00 */
> +        .quad 0xBFDD5AABB0D01390  /* A02 = -4.586590983092770912322e-01 */
> +        .quad 0x3FA21DAF5CA162DB  /* A03 = +3.538272863142363083844e-02 */
> +        .quad 0xBFA7645E4D7BF28B  /* A00 = -4.568762489177399105378e-02 */
> +        .quad 0x3FF3EF2FD51C0D9F  /* A01 = +1.245895225962932562069e+00 */
> +        .quad 0xBFDF26377E1B686E  /* A02 = -4.867075664057044503963e-01 */
> +        .quad 0x3FA8803E756EE812  /* A03 = +4.785342391501513914509e-02 */
> +        .quad 0xBFAA210925C64413  /* A00 = -5.103329263796054643398e-02 */
> +        .quad 0x3FF44349F897D8E7  /* A01 = +1.266427966181760345066e+00 */
> +        .quad 0xBFE06A7B02C6D8E2  /* A02 = -5.129981092675530707226e-01 */
> +        .quad 0x3FAE3F194734F5D0  /* A03 = +5.907515520309980505687e-02 */
> +        .quad 0xBFACDE48F8A19BBB  /* A00 = -5.638340029764018351832e-02 */
> +        .quad 0x3FF49439D5466582  /* A01 = +1.286187966447272845727e+00 */
> +        .quad 0xBFE131C7C1063DDC  /* A02 = -5.373266954429101183166e-01 */
> +        .quad 0x3FB1ADEEC36AD805  /* A03 = +6.906025191241844940482e-02 */
> +        .quad 0xBFAF905D8F585680  /* A00 = -6.164829611604449866036e-02 */
> +        .quad 0x3FF4E0ED1FD27F99  /* A01 = +1.304913639360142818546e+00 */
> +        .quad 0xBFE1E7A859DC1D3D  /* A02 = -5.595285182070380836095e-01 */
> +        .quad 0x3FB3ED018E4642A1  /* A03 = +7.783517573831001679086e-02 */
> +        .quad 0xBFB11595104160BA  /* A00 = -6.673556944713512906198e-02 */
> +        .quad 0x3FF528650340490B  /* A01 = +1.322361958217302513319e+00 */
> +        .quad 0xBFE28B14B40BC974  /* A02 = -5.794776455425521000109e-01 */
> +        .quad 0x3FB5DF49F5BAF6D7  /* A03 = +8.543836831355676453281e-02 */
> +        .quad 0xBFB2513A97344BA4  /* A00 = -7.155195418844911836587e-02 */
> +        .quad 0x3FF569BA0DB5EE14  /* A01 = +1.338312200124055273420e+00 */
> +        .quad 0xBFE31B53A8B67B20  /* A02 = -5.970857901737396389308e-01 */
> +        .quad 0x3FB787F297BB0544  /* A03 = +9.191814617499455275507e-02 */
> +        .quad 0xBFB37512E848FAFA  /* A00 = -7.600515528700305112331e-02 */
> +        .quad 0x3FF5A41F33B403C8  /* A01 = +1.352568819013173495591e+00 */
> +        .quad 0xBFE397F6EA9A58A5  /* A02 = -6.123003561103997904880e-01 */
> +        .quad 0x3FB8EAA9FF25CA06  /* A03 = +9.733068923177520814782e-02 */
> +        .quad 0xBFB47B3E603AFC5D  /* A00 = -8.000554894805263217439e-02 */
> +        .quad 0x3FF5D6E3EDE40487  /* A01 = +1.364963464031718975988e+00 */
> +        .quad 0xBFE400D5BCA6D631  /* A02 = -6.251019177058819709103e-01 */
> +        .quad 0x3FBA0B830ED567FE  /* A03 = +1.017381583418739132707e-01 */
> +        .quad 0xBFB5BBFE8AC90496  /* A00 = -8.489981544791400103200e-02 */
> +        .quad 0x3FF612BA70107E95  /* A01 = +1.379572332145390989311e+00 */
> +        .quad 0xBFE477EAF1FA7693  /* A02 = -6.396383978023599814478e-01 */
> +        .quad 0x3FBB4784B7C08A95  /* A03 = +1.065600346196709652391e-01 */
> +        .quad 0xBFB6D5D940743939  /* A00 = -8.920057128509463473254e-02 */
> +        .quad 0x3FF644A8748F70CE  /* A01 = +1.391762214006166953340e+00 */
> +        .quad 0xBFE4D646AB07EA37  /* A02 = -6.511567440459832267763e-01 */
> +        .quad 0x3FBC354F4E1D5292  /* A03 = +1.101884427747086558913e-01 */
> +        .quad 0xBFB7223D19E4F3D1  /* A00 = -9.036619074045339206069e-02 */
> +        .quad 0x3FF6518FEB42B7FA  /* A01 = +1.394912642466350494175e+00 */
> +        .quad 0xBFE4ED86CB87498C  /* A02 = -6.539949393430091184598e-01 */
> +        .quad 0x3FBC6D29F28CCA9B  /* A03 = +1.110407082713131127205e-01 */
> +        .quad 0xBFB6878652FF6312  /* A00 = -8.800544287022329936754e-02 */
> +        .quad 0x3FF63948C302D040  /* A01 = +1.388985406648330922508e+00 */
> +        .quad 0xBFE4C4E2E7904E17  /* A02 = -6.490339777687407218920e-01 */
> +        .quad 0x3FBC127356CA1ABE  /* A03 = +1.096565329445224612481e-01 */
> +        .quad 0xBFB4F5D18B0C91D6  /* A00 = -8.187589306596207427980e-02 */
> +        .quad 0x3FF5FD27EB7DD0B8  /* A01 = +1.374305648697413673176e+00 */
> +        .quad 0xBFE464E01A2B2FC6  /* A02 = -6.373138915164353601739e-01 */
> +        .quad 0x3FBB460547674A30  /* A03 = +1.065371798825160976065e-01 */
> +        .quad 0xBFB26642FA16A685  /* A00 = -7.187288861919156890412e-02 */
> +        .quad 0x3FF59F9BEDE1C95A  /* A01 = +1.351467065073470141812e+00 */
> +        .quad 0xBFE3D67920C8FBEA  /* A02 = -6.199308052381387046381e-01 */
> +        .quad 0x3FBA24F6A8D3CBC1  /* A03 = +1.021265184570401413078e-01 */
> +        .quad 0xBFADB5294794F097  /* A00 = -5.802277563859197656582e-02 */
> +        .quad 0x3FF523EA7B9CF453  /* A01 = +1.321268542159732772845e+00 */
> +        .quad 0xBFE322A8B55E35DB  /* A02 = -5.979808370918208160205e-01 */
> +        .quad 0x3FB8C8673B1B3E37  /* A03 = +9.680791085269722928697e-02 */
> +        .quad 0xBFA4B7D661965C6A  /* A00 = -4.046506825687219699450e-02 */
> +        .quad 0x3FF48DE3E2CE3122  /* A01 = +1.284641157110919085227e+00 */
> +        .quad 0xBFE251FED1A7F445  /* A02 = -5.725092024655472622285e-01 */
> +        .quad 0x3FB745699FCABDB9  /* A03 = +9.090290213747821701507e-02 */
> +        .quad 0xBF93E60456E4EE1D  /* A00 = -1.943213253365004902773e-02 */
> +        .quad 0x3FF3E1A14E628A59  /* A01 = +1.242585474196536532432e+00 */
> +        .quad 0xBFE16C5AB660E876  /* A02 = -5.444768488007543094653e-01 */
> +        .quad 0x3FB5AD33AA8C188F  /* A03 = +8.467410005332197397987e-02 */
> +        .quad 0x3F738C17C47C7961  /* A00 = +4.772274820224659853951e-03 */
> +        .quad 0x3FF3234DDE3BD146  /* A01 = +1.196119182682268355933e+00 */
> +        .quad 0xBFE078C0D77A9D3B  /* A02 = -5.147403915952176722826e-01 */
> +        .quad 0x3FB40D74B3E276B8  /* A03 = +7.833032027925923568290e-02 */
> +        .quad 0x3FA0474BECC689C7  /* A00 = +3.179394975019849550746e-02 */
> +        .quad 0x3FF256FB4FA7D18A  /* A01 = +1.146235762743432307076e+00 */
> +        .quad 0xBFDEFA8E3FB285E2  /* A02 = -4.840427038235174395098e-01 */
> +        .quad 0x3FB270C007493D59  /* A03 = +7.203293016322244446403e-02 */
> +        .quad 0x3FAF5BD51E479BDC  /* A00 = +6.124750132203590768931e-02 */
> +        .quad 0x3FF18081D0B53BC5  /* A01 = +1.093873801484492647162e+00 */
> +        .quad 0xBFDCFE2439BD0C03  /* A02 = -4.530115665294831006626e-01 */
> +        .quad 0x3FB0DEFE5A45AFDD  /* A03 = +6.590261176978580437424e-02 */
> +        .quad 0x3FB7BD5D2806EA26  /* A00 = +9.273321368429118805032e-02 */
> +        .quad 0x3FF0A369E35B4440  /* A01 = +1.039895904647224256223e+00 */
> +        .quad 0xBFDB04BC5C9951E7  /* A02 = -4.221640495573226181669e-01 */
> +        .quad 0x3FAEBBBAA9D6DEEF  /* A03 = +6.002600978120919278380e-02 */
> +        .quad 0x3FC01BE411098DBC  /* A00 = +1.258511622610124502941e-01 */
> +        .quad 0x3FEF85BDABC031C1  /* A01 = +9.850757936961188621083e-01 */
> +        .quad 0xBFD91521375097C2  /* A02 = -3.919146576102968682065e-01 */
> +        .quad 0x3FABE26F0086D982  /* A03 = +5.446192628317005068883e-02 */
> +        .quad 0x3FC481D7FF5776B9  /* A00 = +1.602125164781023347604e-01 */
> +        .quad 0x3FEDC3506C1E7218  /* A01 = +9.300920592973538347792e-01 */
> +        .quad 0xBFD7349A88DA7D4F  /* A02 = -3.625856720409119104964e-01 */
> +        .quad 0x3FA936E2DFF8E2AE  /* A03 = +4.924687370334389358018e-02 */
> +        .quad 0x3FC90471F96FA27A  /* A00 = +1.954481571149420671141e-01 */
> +        .quad 0x3FEC0451601987A2  /* A01 = +8.755270840595026360376e-01 */
> +        .quad 0xBFD5671CD4B898DC  /* A02 = -3.344184949259110251063e-01 */
> +        .quad 0x3FA6BB9594603B67  /* A03 = +4.439990459660841243261e-02 */
> +        .quad 0x3FCFD8ADB9ED944C  /* A00 = +2.488000066615846384011e-01 */
> +        .quad 0x3FE978C073F6809A  /* A01 = +7.959902062321078108909e-01 */
> +        .quad 0xBFD2DF7E00BCD5A9  /* A02 = -2.948908812716931060471e-01 */
> +        .quad 0x3FA3614033D490B2  /* A03 = +3.785133965200894456959e-02 */
> +        .quad 0x3FD4846A12AFE5A0  /* A00 = +3.205819303981005674586e-01 */
> +        .quad 0x3FE63A1147D40472  /* A01 = +6.945883181471244061100e-01 */
> +        .quad 0xBFCFA2268AD34450  /* A02 = -2.471359422548027318101e-01 */
> +        .quad 0x3F9F150201D9FFE0  /* A03 = +3.035357605267552383310e-02 */
> +        .quad 0x3FD9018641F82BEB  /* A00 = +3.907180446846598154131e-01 */
> +        .quad 0x3FE33B7C220FFBDC  /* A01 = +6.010113396913498995389e-01 */
> +        .quad 0xBFCA4E4187E29C86  /* A02 = -2.055131829740483584423e-01 */
> +        .quad 0x3F98C30CED19F8F4  /* A03 = +2.418155858185229434287e-02 */
> +        .quad 0x3FDD4B8255BEB078  /* A00 = +4.577337109901757905561e-01 */
> +        .quad 0x3FE0858B19D3A49B  /* A01 = +5.163016800335243905451e-01 */
> +        .quad 0xBFC5BC929EACE564  /* A02 = -1.698172831327539045176e-01 */
> +        .quad 0x3F93A083CE57DE2B  /* A03 = +1.916700312537337677621e-02 */
> +        .quad 0x3FE0A8E5E039295C  /* A00 = +5.206174258576470315063e-01 */
> +        .quad 0x3FDC35E1234583FE  /* A01 = +4.407885403107342225937e-01 */
> +        .quad 0xBFC1DE034E31AEB9  /* A02 = -1.395877963835710222629e-01 */
> +        .quad 0x3F8EFDEBB3471BDC  /* A03 = +1.513275280821162888101e-02 */
> +        .quad 0x3FE2851B603CB2A5  /* A00 = +5.787484054213406503564e-01 */
> +        .quad 0x3FD7F4A44ABBB286  /* A01 = +3.743067483726821853551e-01 */
> +        .quad 0xBFBD3EEB67087DE7  /* A02 = -1.142413260026767657385e-01 */
> +        .quad 0x3F8864F38329E8BD  /* A03 = +1.191129917173260922836e-02 */
> +        .quad 0x3FE437DBE3C34AC1  /* A00 = +6.318187187665317283702e-01 */
> +        .quad 0x3FD43F6F789441B5  /* A01 = +3.163717916040938438194e-01 */
> +        .quad 0xBFB7D92E7901B9A4  /* A02 = -9.315767721429907277653e-02 */
> +        .quad 0x3F8327ED342308E1  /* A03 = +9.353497651663324544136e-03 */
> +        .quad 0x3FE5C0977766D55C  /* A00 = +6.797597248138731451661e-01 */
> +        .quad 0x3FD10B42A764D8F9  /* A01 = +2.663122782427219115142e-01 */
> +        .quad 0xBFB3633351D3D70F  /* A02 = -7.573242900602060456716e-02 */
> +        .quad 0x3F7E079E30FF899C  /* A03 = +7.331483779099558922843e-03 */
> +        .quad 0x3FE7202CE08A88C4  /* A00 = +7.226776490754436288455e-01 */
> +        .quad 0x3FCC973EB5662B01  /* A01 = +2.233656297433626314319e-01 */
> +        .quad 0xBFAF70A455F9920B  /* A02 = -6.140626477716545211782e-02 */
> +        .quad 0x3F77812411CE99B6  /* A03 = +5.738392731393584730859e-03 */
> +        .quad 0x3FE85879424095B1  /* A00 = +7.608000082006382003286e-01 */
> +        .quad 0x3FC7E73BD1674D84  /* A01 = +1.867441914060742336190e-01 */
> +        .quad 0xBFA96F84E4BF333B  /* A02 = -4.967894832916504993525e-02 */
> +        .quad 0x3F72606DDCA6E117  /* A03 = +4.486493251924870105662e-03 */
> +        .quad 0x3FE96BFE4957F4DD  /* A00 = +7.944327766887472330737e-01 */
> +        .quad 0x3FC3ED4780D25478  /* A01 = +1.556786898624158421711e-01 */
> +        .quad 0xBFA489C5F9A56B58  /* A02 = -4.011362717093075458408e-02 */
> +        .quad 0x3F6CB5DC17E9AD2A  /* A03 = +3.504686231556104931972e-03 */
> +        .quad 0x3FEA5D9CB2F41234  /* A00 = +8.239272589858672724006e-01 */
> +        .quad 0x3FC091A758374DCF  /* A01 = +1.294449978582705440555e-01 */
> +        .quad 0xBFA08E436D4B5CE0  /* A02 = -3.233538350257858517978e-02 */
> +        .quad 0x3F666997AD53E6B7  /* A03 = +2.735897297154145629133e-03 */
> +        .quad 0x3FEB3060342CB850  /* A00 = +8.496552485501158713532e-01 */
> +        .quad 0x3FBB7D30BBC7DC1B  /* A01 = +1.073790033768634993860e-01 */
> +        .quad 0xBF9AA6BA3443D9E3  /* A02 = -2.602663940430173170060e-02 */
> +        .quad 0x3F617CA764B7850B  /* A03 = +2.134634914668814050648e-03 */
> +        .quad 0x3FEBE759A6A0C7B8  /* A00 = +8.719909910635044170135e-01 */
> +        .quad 0x3FB6C10DE6A703FF  /* A01 = +8.888327485239243264115e-02 */
> +        .quad 0xBF956C566D8BE1F6  /* A02 = -2.092108768099084498138e-02 */
> +        .quad 0x3F5B46D1A4A59CF8  /* A03 = +1.664833764687232917079e-03 */
> +        .quad 0x3FEC858494887A04  /* A00 = +8.912985707318630268503e-01 */
> +        .quad 0x3FB2CC31F543394D  /* A01 = +7.342827070099140762682e-02 */
> +        .quad 0xBF9133477FF69137  /* A02 = -1.679717749142747504343e-02 */
> +        .quad 0x3F5544482FBB4DA5  /* A03 = +1.298017973501022466823e-03 */
> +        .quad 0x3FED0DB59D0E32E9  /* A00 = +9.079235141267335551518e-01 */
> +        .quad 0x3FAF006BAFFC6EF4  /* A01 = +6.055008433597022787787e-02 */
> +        .quad 0xBF8B97146FA2B97A  /* A02 = -1.347175565419144252499e-02 */
> +        .quad 0x3F5093B01F4CDC69  /* A03 = +1.011774057770665211434e-03 */
> +        .quad 0x3FEDB487C3EC457C  /* A00 = +9.282873942012623835751e-01 */
> +        .quad 0x3FA7390C09D0BD1D  /* A01 = +4.535710925881118044112e-02 */
> +        .quad 0xBF83D9F7C3181106  /* A02 = -9.693084374710735778846e-03 */
> +        .quad 0x3F46E34A0A3C0E64  /* A03 = +6.984817050299072134500e-04 */
> +        .quad 0x3FEE5FFCB4E6EB00  /* A00 = +9.492171796076434020506e-01 */
> +        .quad 0x3F9F4913ED00AADF  /* A01 = +3.055220731782070861526e-02 */
> +        .quad 0xBF79670BD0E59B5C  /* A02 = -6.201788097633133961528e-03 */
> +        .quad 0x3F3BC998EBCAF96D  /* A03 = +4.240034429975534616304e-04 */
> +        .quad 0x3FEEDBA41E9542FE  /* A00 = +9.643116566968215064293e-01 */
> +        .quad 0x3F94F5DD18D9C24D  /* A01 = +2.046914543319848858727e-02 */
> +        .quad 0xBF7034896AA122B9  /* A02 = -3.956352980886528904192e-03 */
> +        .quad 0x3F30DCCB47810B39  /* A03 = +2.573009765038273091199e-04 */
> +        .quad 0x3FEF33F2882520ED  /* A00 = +9.750912341196716903724e-01 */
> +        .quad 0x3F8BF37F2CF553FF  /* A01 = +1.364802699996836392315e-02 */
> +        .quad 0xBF649F6F05A69619  /* A02 = -2.517430152880317534986e-03 */
> +        .quad 0x3F247623C950AAC9  /* A03 = +1.561087307505231250044e-04 */
> +        .quad 0x3FEF727757751741  /* A00 = +9.827229221489021115943e-01 */
> +        .quad 0x3F828E67912C4400  /* A01 = +9.060677640748693306705e-03 */
> +        .quad 0xBF5A2F51A806CC2C  /* A02 = -1.598195784123355826789e-03 */
> +        .quad 0x3F18D35D7687E613  /* A03 = +9.470231965016282719549e-05 */
> +        .quad 0x3FEF9E6325C5942A  /* A00 = +9.880843866091073568469e-01 */
> +        .quad 0x3F788AB117618F76  /* A01 = +5.991641772286606867914e-03 */
> +        .quad 0xBF5096EAB0B1EA89  /* A02 = -1.012543859160305046233e-03 */
> +        .quad 0x3F0E1E50EC4435AB  /* A03 = +5.744633156910412119652e-05 */
> +        .quad 0x3FEFBD0784049369  /* A00 = +9.918248728250605994461e-01 */
> +        .quad 0x3F702BBD8294035F  /* A01 = +3.947963975634432264028e-03 */
> +        .quad 0xBF44FB55E0F00593  /* A02 = -6.403130845457509273330e-04 */
> +        .quad 0x3F0244DCD723230A  /* A03 = +3.484534217219031730379e-05 */
> +        .quad 0x3FEFD245E2366A43  /* A00 = +9.944180887426415926811e-01 */
> +        .quad 0x3F653D82EC088433  /* A01 = +2.592807490387838333795e-03 */
> +        .quad 0xBF3A7DF75E013CB8  /* A02 = -4.042366908878036561859e-04 */
> +        .quad 0x3EF6298E69F991CD  /* A03 = +2.113564425911141559972e-05 */
> +        .quad 0x3FEFE0EAA508BC69  /* A00 = +9.962056372950317539861e-01 */
> +        .quad 0x3F5BD0771AF3FDDA  /* A01 = +1.697651208644282514598e-03 */
> +        .quad 0xBF30B2E1254DE571  /* A02 = -2.548026725928887099328e-04 */
> +        .quad 0x3EEAE28B70EC0256  /* A03 = +1.281973848454955042307e-05 */
> +        .quad 0x3FEFEAF5303D7F96  /* A00 = +9.974313680831865536192e-01 */
> +        .quad 0x3F5229111365657E  /* A01 = +1.108423877289460134782e-03 */
> +        .quad 0xBF250572D04DFE66  /* A02 = -1.603796628408704519168e-04 */
> +        .quad 0x3EE04E89BB57C981  /* A03 = +7.775682983689149966743e-06 */
> +        .quad 0x3FEFF1CF52F1CF44  /* A00 = +9.982678051005469122003e-01 */
> +        .quad 0x3F47A71316147CEB  /* A01 = +7.218211359577819110842e-04 */
> +        .quad 0xBF1A6D7604055719  /* A02 = -1.008132248946049582547e-04 */
> +        .quad 0x3ED3C8047586A85C  /* A03 = +4.716233739913014633626e-06 */
> +        .quad 0x3FEFF6770369EF69  /* A00 = +9.988360468555416149528e-01 */
> +        .quad 0x3F3EBB261180FBF0  /* A01 = +4.689186039321105101130e-04 */
> +        .quad 0xBF1097754FE19D7F  /* A02 = -6.329206004950480057066e-05 */
> +        .quad 0x3EC7FEFF83BCA0A7  /* A03 = +2.860556404988488738366e-06 */
> +        .quad 0x3FEFF99D42371AC4  /* A00 = +9.992204945818561334647e-01 */
> +        .quad 0x3F33EB2AEC271F59  /* A01 = +3.039340773764907474054e-04 */
> +        .quad 0xBF04CF18E0FC0D79  /* A02 = -3.968996690952969588805e-05 */
> +        .quad 0x3EBD1BDBD6019BE9  /* A03 = +1.735021065507727833886e-06 */
> +        .quad 0x3FEFFBBCA32B0D91  /* A00 = +9.994795977476532700123e-01 */
> +        .quad 0x3F29C41E1615110A  /* A01 = +1.965796209707565346710e-04 */
> +        .quad 0xBEFA11F93D9DCB5A  /* A02 = -2.486248909101414873235e-05 */
> +        .quad 0x3EB1A7CA4546F7A7  /* A03 = +1.052345642723709228769e-06 */
> +        .quad 0x3FEFFD298B8E8DE2  /* A00 = +9.996535993308806045121e-01 */
> +        .quad 0x3F20A1C42D523C5B  /* A01 = +1.268913244172078754520e-04 */
> +        .quad 0xBEF0507A364AFAE4  /* A02 = -1.555859070622834605755e-05 */
> +        .quad 0x3EA56ACA17E7CDF4  /* A03 = +6.382806956848098872313e-07 */
> +        .quad 0x3FEFFE1DC82BA5A3  /* A00 = +9.997700604991915929176e-01 */
> +        .quad 0x3F156E73B90F1769  /* A01 = +8.175450626798714452801e-05 */
> +        .quad 0xBEE4663579D0A09F  /* A02 = -9.727122057226747625365e-06 */
> +        .quad 0x3E99FAF6FEC5D4C1  /* A03 = +3.871371052824002996020e-07 */
> +        .quad 0x3FEFFEF8D0BB5E81  /* A00 = +9.998745037837154514548e-01 */
> +        .quad 0x3F06686DA18D39C3  /* A01 = +4.273972098777251447726e-05 */
> +        .quad 0xBED46BC298073E90  /* A02 = -4.868731025855742842491e-06 */
> +        .quad 0x3E88E42286B9D0FD  /* A03 = +1.854535328530838170114e-07 */
> +        .quad 0x3FEFFF8DBC68DDC7  /* A00 = +9.999455146670975791423e-01 */
> +        .quad 0x3EF26B2953A80AF0  /* A01 = +1.756534514108903368909e-05 */
> +        .quad 0xBEBFC4472D580F83  /* A02 = -1.893443529411295465239e-06 */
> +        .quad 0x3E72505B4553D19F  /* A03 = +6.822456673547912277047e-08 */
> +        .quad 0x3FEFFFCED1276609  /* A00 = +9.999765477215883935358e-01 */
> +        .quad 0x3EDE1A94C7CC58F5  /* A01 = +7.177313020153979672606e-06 */
> +        .quad 0xBEA8A2C988744E57  /* A02 = -7.342066660497443762363e-07 */
> +        .quad 0x3E5AF30036BBBAF4  /* A03 = +2.509841882843541084885e-08 */
> +        .quad 0x3FEFFFEAFE70FCFC  /* A00 = +9.999899835164849370983e-01 */
> +        .quad 0x3EC879175E3549F5  /* A01 = +2.917410471128503564412e-06 */
> +        .quad 0xBE930E36677D1813  /* A02 = -2.839493400307523115929e-07 */
> +        .quad 0x3E43D4005B42D48F  /* A03 = +9.233192745401904898013e-09 */
> +        .quad 0x3ff0000000000000
> +        .quad 0x0000000000000000
> +        .quad 0x0000000000000000
> +        .quad 0x0000000000000000
> +        .align 16
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000           /* _sSignMask        */
> +        .align 16
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff           /* _sAbsMask         */
> +        .align 16
> +        .long 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000           /* _iExpMantMask     */
> +        .align 16
> +        .long 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000           /* _iExpMask         */
> +        .align 16
> +        .long 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000           /* _iMinIdxOfsMask   */
> +        .align 16
> +        .long 0x04280000, 0x04280000, 0x04280000, 0x04280000           /* _iMaxIdxMask      */
> +        .align 16
> +        .type	__svml_stanh_data_internal,@object
> +        .size	__svml_stanh_data_internal,.-__svml_stanh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core-sse.S
> new file mode 100644
> index 0000000000..a56795e3cd
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized tanhf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN8v_tanhf _ZGVdN8v_tanhf_sse_wrapper
> +#include "../svml_s_tanhf8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core.c
> new file mode 100644
> index 0000000000..fadcea36ab
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized tanhf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN8v_tanhf
> +#include "ifunc-mathvec-avx2.h"
> +
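> +/* IFUNC_SELECTOR (from ifunc-mathvec-avx2.h) is expected to pick the AVX2
> +   implementation when the CPU supports it and otherwise the SSE wrapper
> +   provided by svml_s_tanhf8_core-sse.S.  */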
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN8v_tanhf, __GI__ZGVdN8v_tanhf,
> +	       __redirect__ZGVdN8v_tanhf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core_avx2.S
> new file mode 100644
> index 0000000000..c19e6bf8b5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core_avx2.S
> @@ -0,0 +1,844 @@
> +/* Function tanhf vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   NOTE: Since the hyperbolic tangent function is odd
> + *         (tanh(x) = -tanh(-x)), the algorithm below deals with the absolute
> + *         value of the argument |x|: tanh(x) = sign(x) * tanh(|x|)
> + *
> + *   We use a table lookup method to compute tanh(|x|).
> + *   The basic idea is to split the input range into a number of subintervals
> + *   and to approximate tanh(.) with a polynomial on each of them.
> + *
> + *   IEEE SPECIAL CONDITIONS:
> + *   x = [+,-]0, r = [+,-]0
> + *   x = +Inf,   r = +1
> + *   x = -Inf,   r = -1
> + *   x = QNaN,   r = QNaN
> + *   x = SNaN,   r = QNaN
> + *
> + *
> + *   ALGORITHM DETAILS
> + *   We handle special values in a callout function, outside the main path
> + *   computations.  "Special" values for this algorithm are:
> + *   INF, NAN, |x| > HUGE_THRESHOLD
> + *
> + *
> + *   Main path computations are organized as follows:
> + *   We split the interval [0, SATURATION_THRESHOLD)
> + *   into a number of subintervals.  On each subinterval we approximate
> + *   tanh(.) with a minimax polynomial of pre-defined degree.  The polynomial
> + *   coefficients are computed beforehand and stored in a table.  We also use
> + *
> + *       y := |x| + B,
> + *
> + *   where B depends on the subinterval and is used to bring the argument
> + *   closer to zero.
> + *   We also add a large fake interval [SATURATION_THRESHOLD, HUGE_THRESHOLD]
> + *   whose stored coefficients are 1.0 + 0.0*y + 0.0*y^2 + ..., which
> + *   preserves the main path computation logic but returns 1.0 for all
> + *   such arguments.
> + *
> + *   Hence the reconstruction looks as follows:
> + *   we extract the polynomial and range-reduction coefficients
> + *        (Pj and B) corresponding to the subinterval to which |x| belongs,
> + *        and return
> + *
> + *       r := sign(x) * (P0 + P1 * y + ... + Pn * y^n)
> + *
> + *   NOTE: we use a multiprecision technique to multiply and sum the first
> + *         K terms of the polynomial, so Pj, j = 0..K are each stored in the
> + *         table as a pair of target-precision numbers (Pj and PLj) to
> + *         achieve wider-than-target precision.
> + *
> + *
> + */
> +
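> +/* Scalar sketch of the reconstruction described above, for illustration
> +   only; the vector code below evaluates the same degree-3 polynomial with
> +   FMAs on packed double-precision coefficients:
> +
> +       y = |x| + B;                        B selected per subinterval
> +       p = A00 + y * (A01 + y * (A02 + y * A03));
> +       r = copysign (p, x);
> +
> +   A00..A03 are the per-subinterval coefficients stored in
> +   __svml_stanh_data_internal at the end of this file.  */
> +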
> +/* Offsets for data table __svml_stanh_data_internal
> + */
> +#define _dbP                          	0
> +#define _sSignMask                    	4288
> +#define _sAbsMask                     	4320
> +#define _iExpMantMask                 	4352
> +#define _iExpMask                     	4384
> +#define _iMinIdxOfsMask               	4416
> +#define _iMaxIdxMask                  	4448
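> +
> +/* The offsets above follow the layout of __svml_stanh_data_internal at the
> +   end of this file: _dbP holds 134*4 double-precision coefficients
> +   (134 * 4 * 8 = 4288 bytes), and each of the following 32-byte aligned
> +   vector constants occupies another 32 bytes.  */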
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN8v_tanhf_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        pushq     %r12
> +        subq      $120, %rsp
> +        lea       _dbP+16+__svml_stanh_data_internal(%rip), %r10
> +        vmovaps   %ymm0, %ymm12
> +
> +/* Here huge arguments, INF and NaNs are filtered out to callout. */
> +        vpand     _iExpMantMask+__svml_stanh_data_internal(%rip), %ymm12, %ymm14
> +
> +/*
> + *  Small table specific variables:
> + *  Constant loading
> + */
> +        vmovups   _iMaxIdxMask+__svml_stanh_data_internal(%rip), %ymm8
> +        vpsubd    _iMinIdxOfsMask+__svml_stanh_data_internal(%rip), %ymm14, %ymm9
> +
> +/* if VMIN, VMAX is defined for I type */
> +        vxorps    %ymm15, %ymm15, %ymm15
> +        vpcmpgtd  %ymm15, %ymm9, %ymm0
> +        vpand     %ymm0, %ymm9, %ymm7
> +        vpcmpgtd  %ymm8, %ymm9, %ymm6
> +        vblendvps %ymm6, %ymm8, %ymm7, %ymm3
> +        vpsrld    $14, %ymm3, %ymm1
> +        vpcmpgtd  _iExpMask+__svml_stanh_data_internal(%rip), %ymm14, %ymm13
> +        vmovmskps %ymm13, %r11d
> +        vandps    _sAbsMask+__svml_stanh_data_internal(%rip), %ymm12, %ymm10
> +        vandps    _sSignMask+__svml_stanh_data_internal(%rip), %ymm12, %ymm11
> +        vextractf128 $1, %ymm1, %xmm2
> +        vmovd     %xmm1, %r9d
> +        vmovd     %xmm2, %ecx
> +        vpextrd   $1, %xmm2, %edx
> +        vpextrd   $1, %xmm1, %r8d
> +        movslq    %r9d, %r9
> +        movslq    %edx, %rdx
> +        movslq    %r8d, %r8
> +        vpextrd   $2, %xmm1, %edi
> +        movslq    %ecx, %rcx
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -8; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf8, 0xff, 0xff, 0xff, 0x22
> +        vpextrd   $3, %xmm2, %r12d
> +        vpextrd   $3, %xmm1, %esi
> +        vpextrd   $2, %xmm2, %eax
> +        movslq    %edi, %rdi
> +        movslq    %r12d, %r12
> +        movslq    %esi, %rsi
> +        movslq    %eax, %rax
> +        vmovupd   -16(%r9,%r10), %xmm5
> +        vmovupd   -16(%rdx,%r10), %xmm14
> +        vmovupd   -16(%rcx,%r10), %xmm13
> +        vmovupd   (%r9,%r10), %xmm1
> +        vmovupd   (%r8,%r10), %xmm2
> +        vmovupd   -16(%r8,%r10), %xmm4
> +        vinsertf128 $1, -16(%rdi,%r10), %ymm5, %ymm15
> +        vinsertf128 $1, -16(%r12,%r10), %ymm14, %ymm3
> +        vinsertf128 $1, -16(%rax,%r10), %ymm13, %ymm6
> +        vinsertf128 $1, (%rdi,%r10), %ymm1, %ymm5
> +        vinsertf128 $1, (%rsi,%r10), %ymm2, %ymm14
> +        vunpcklpd %ymm3, %ymm6, %ymm8
> +        vunpckhpd %ymm3, %ymm6, %ymm6
> +        vunpcklpd %ymm14, %ymm5, %ymm3
> +        vunpckhpd %ymm14, %ymm5, %ymm2
> +        vmovupd   (%rcx,%r10), %xmm13
> +        vcvtps2pd %xmm10, %ymm5
> +        vextractf128 $1, %ymm10, %xmm10
> +        vfmadd213pd %ymm3, %ymm5, %ymm2
> +        vinsertf128 $1, -16(%rsi,%r10), %ymm4, %ymm0
> +        vmovupd   (%rdx,%r10), %xmm4
> +        vunpcklpd %ymm0, %ymm15, %ymm9
> +        vunpckhpd %ymm0, %ymm15, %ymm7
> +        vfmadd213pd %ymm7, %ymm5, %ymm2
> +        vfmadd213pd %ymm9, %ymm5, %ymm2
> +        vinsertf128 $1, (%r12,%r10), %ymm4, %ymm0
> +        vcvtps2pd %xmm10, %ymm4
> +        vinsertf128 $1, (%rax,%r10), %ymm13, %ymm15
> +        vunpcklpd %ymm0, %ymm15, %ymm1
> +        vunpckhpd %ymm0, %ymm15, %ymm0
> +        vfmadd213pd %ymm1, %ymm4, %ymm0
> +        vcvtpd2ps %ymm2, %xmm1
> +        vfmadd213pd %ymm6, %ymm4, %ymm0
> +        vfmadd213pd %ymm8, %ymm4, %ymm0
> +        vcvtpd2ps %ymm0, %xmm0
> +        vinsertf128 $1, %xmm0, %ymm1, %ymm2
> +        vorps     %ymm11, %ymm2, %ymm0
> +        testl     %r11d, %r11d
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r13 r14 r15 r11d ymm0 ymm12
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        addq      $120, %rsp
> +        cfi_restore(12)
> +        popq      %r12
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -8; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf8, 0xff, 0xff, 0xff, 0x22
> +
> +/* Branch to process
> + * special inputs
> + */
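> +/* On entry r11d holds an 8-bit mask of lanes whose inputs need the scalar
> +   fallback.  The original inputs (ymm12) and the fast-path results (ymm0)
> +   are spilled to the stack; the loop below walks the mask bit by bit and
> +   overwrites each flagged lane with the result of calling tanhf.  */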
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %ymm12, 32(%rsp)
> +        vmovups   %ymm0, 64(%rsp)
> +                                # LOE rbx r13 r14 r15 r11d ymm0
> +
> +        xorl      %r12d, %r12d
> +                                # LOE rbx r13 r14 r15 r11d r12d
> +
> +        vzeroupper
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
> +        movl      %r11d, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      tanhf@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 64(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN8v_tanhf_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_stanh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(32)) VUINT32 _dbP[(134*4)][2];
> +        __declspec(align(32)) VUINT32 _sSignMask[8][1];
> +        __declspec(align(32)) VUINT32 _sAbsMask[8][1];
> +        __declspec(align(32)) VUINT32 _iExpMantMask[8][1];
> +        __declspec(align(32)) VUINT32 _iExpMask[8][1];
> +        __declspec(align(32)) VUINT32 _iMinIdxOfsMask[8][1];
> +        __declspec(align(32)) VUINT32 _iMaxIdxMask[8][1];
> +} __svml_stanh_data_internal;
> +#endif
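> +/* The typedef above is only compiled when __svml_stanh_data_internal_typedef
> +   is defined; it serves to document the layout of the table that follows.  */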
> +__svml_stanh_data_internal:
> +        /* Pol_000:  err=7.93e-09, x in [0.0000000; 0.0312500]. */
> +        .quad 0x0000000000000000  /* A00 = +0.000000000000000000000e-01 */
> +        .quad 0x3FF00000022C70EB  /* A01 = +1.000000008097283510367e+00 */
> +        .quad 0xBED00E878CFFA194  /* A02 = -3.828228912518614443549e-06 */
> +        .quad 0xBFD551766D0607A9  /* A03 = -3.330970825846813476723e-01 */
> +        .quad 0xBE53D60CE3E4C297  /* A00 = -1.847383956330407336230e-08 */
> +        .quad 0x3FF000024177CF5C  /* A01 = +1.000002151235967140508e+00 */
> +        .quad 0xBF1758BC94A51A25  /* A02 = -8.906031613262943753568e-05 */
> +        .quad 0xBFD53EAE67E0D4F0  /* A03 = -3.319507612644221339337e-01 */
> +        .quad 0xBE5A9E47EF32D6FE  /* A00 = -2.479020984039698285657e-08 */
> +        .quad 0x3FF00002DA983057  /* A01 = +1.000002721676556793895e+00 */
> +        .quad 0xBF1BD953509E94AA  /* A02 = -1.062352277175377670507e-04 */
> +        .quad 0xBFD53BDB562EEDD5  /* A03 = -3.317783681520414806876e-01 */
> +        .quad 0xBE6191BBE496D294  /* A00 = -3.272532162914017685901e-08 */
> +        .quad 0x3FF0000390492017  /* A01 = +1.000003398528866105366e+00 */
> +        .quad 0xBF20727E814A57CE  /* A02 = -1.254825043772153972919e-04 */
> +        .quad 0xBFD538DE060A6F22  /* A03 = -3.315959033004550748913e-01 */
> +        .quad 0xBE66DAFA2A893A25  /* A00 = -4.257146219278012568149e-08 */
> +        .quad 0x3FF0000465E08CD1  /* A01 = +1.000004194219219266770e+00 */
> +        .quad 0xBF2341C765EF91B6  /* A02 = -1.469188600530365522261e-04 */
> +        .quad 0xBFD535B6841FAF9E  /* A03 = -3.314033785124993469751e-01 */
> +        .quad 0xBE6D5794E361E964  /* A00 = -5.465394929765249413434e-08 */
> +        .quad 0x3FF000055EE2A0CB  /* A01 = +1.000005121846742950353e+00 */
> +        .quad 0xBF265E6C77E66C8B  /* A02 = -1.706607253709506650304e-04 */
> +        .quad 0xBFD53264DDCCEDA6  /* A03 = -3.312008062382240103361e-01 */
> +        .quad 0xBE729C844D374A6E  /* A00 = -6.933284462462096107184e-08 */
> +        .quad 0x3FF000067F019093  /* A01 = +1.000006195180536350264e+00 */
> +        .quad 0xBF29CC5348D6DCE5  /* A02 = -1.968242326435338705130e-04 */
> +        .quad 0xBFD52EE92121ED35  /* A03 = -3.309881995734998416658e-01 */
> +        .quad 0xBE775AEA17EAA872  /* A00 = -8.700465590574974405858e-08 */
> +        .quad 0x3FF00007CA1D66B8  /* A01 = +1.000007428656699559610e+00 */
> +        .quad 0xBF2D8F5EB98A2637  /* A02 = -2.255252009216044881395e-04 */
> +        .quad 0xBFD52B435CDF9128  /* A03 = -3.307655722585587376727e-01 */
> +        .quad 0xBE7D04DA28C343F0  /* A00 = -1.081040272327705484794e-07 */
> +        .quad 0x3FF000094443CCF5  /* A01 = +1.000008837375216730337e+00 */
> +        .quad 0xBF30D5B76C947AE5  /* A02 = -2.568791210978817814332e-04 */
> +        .quad 0xBFD52773A0776FAD  /* A03 = -3.305329386764651045105e-01 */
> +        .quad 0xBE81DD77A12C51C7  /* A00 = -1.331054169875768625701e-07 */
> +        .quad 0x3FF0000AF1AFD2DA  /* A01 = +1.000010437096696680470e+00 */
> +        .quad 0xBF331230624C1680  /* A02 = -2.910011410651516805537e-04 */
> +        .quad 0xBFD52379FC0B61DF  /* A03 = -3.302903138515186909352e-01 */
> +        .quad 0xBE85D04EEEB3C435  /* A00 = -1.625247628488202841012e-07 */
> +        .quad 0x3FF0000CD6C9B1F2  /* A01 = +1.000012244238970726684e+00 */
> +        .quad 0xBF357F0742FADDD4  /* A02 = -3.280060509313874068243e-04 */
> +        .quad 0xBFD51F56806D0E81  /* A03 = -3.300377134475880880338e-01 */
> +        .quad 0xBE8A6E289B59681B  /* A00 = -1.969211333326924655065e-07 */
> +        .quad 0x3FF0000EF8268F72  /* A01 = +1.000014275873550406715e+00 */
> +        .quad 0xBF381E277A1B747A  /* A02 = -3.680082682942575423093e-04 */
> +        .quad 0xBFD51B093F1D6FD4  /* A03 = -3.297751537663746734808e-01 */
> +        .quad 0xBE8FCBC40EE9ABD5  /* A00 = -2.368983653301529373887e-07 */
> +        .quad 0x3FF000115A883B6C  /* A01 = +1.000016549721943981410e+00 */
> +        .quad 0xBF3AF17AC974B3D9  /* A02 = -4.111218235774406434303e-04 */
> +        .quad 0xBFD516924A4C549C  /* A03 = -3.295026517456081105450e-01 */
> +        .quad 0xBE92FFBC60A3F956  /* A00 = -2.831066871072026054144e-07 */
> +        .quad 0x3FF0001402DCED8A  /* A01 = +1.000019084151832604590e+00 */
> +        .quad 0xBF3DFAE9390C4801  /* A02 = -4.574603454311488280083e-04 */
> +        .quad 0xBFD511F1B4D7DC3A  /* A03 = -3.292202249571719585575e-01 */
> +        .quad 0xBE9690A22F96D5AD  /* A00 = -3.362443262393081632612e-07 */
> +        .quad 0x3FF00016F63EFF5D  /* A01 = +1.000021898173108825247e+00 */
> +        .quad 0xBF409E2C839605BB  /* A02 = -5.071370461992499986334e-04 */
> +        .quad 0xBFD50D27924BEE00  /* A03 = -3.289278916051614487515e-01 */
> +        .quad 0xBE9AA56C65E72A73  /* A00 = -3.970591019557469835586e-07 */
> +        .quad 0x3FF0001A39F4A43E  /* A01 = +1.000025011433776978009e+00 */
> +        .quad 0xBF425BD74C3D6667  /* A02 = -5.602647074553602319844e-04 */
> +        .quad 0xBFD50833F6E1ABA2  /* A03 = -3.286256705238718156536e-01 */
> +        .quad 0xBE9F4BD4FF1A83B0  /* A00 = -4.663500013744687071912e-07 */
> +        .quad 0x3FF0001DD36F9EC2  /* A01 = +1.000028444215715683896e+00 */
> +        .quad 0xBF44376634149405  /* A02 = -6.169556656102642569831e-04 */
> +        .quad 0xBFD50316F77EDEE5  /* A03 = -3.283135811757190158922e-01 */
> +        .quad 0xBEA3B625387BB079  /* A00 = -5.874486399249461304297e-07 */
> +        .quad 0x3FF00023E14CFBA9  /* A01 = +1.000034217911642153709e+00 */
> +        .quad 0xBF47392F923218D2  /* A02 = -7.087213783883111826306e-04 */
> +        .quad 0xBFD4FB1FACDEB938  /* A03 = -3.278273761924483942209e-01 */
> +        .quad 0xBEAA6E24F543500A  /* A00 = -7.876828740601738750574e-07 */
> +        .quad 0x3FF0002D5C6E8412  /* A01 = +1.000043259679163742959e+00 */
> +        .quad 0xBF4BAF02BD7FDD70  /* A02 = -8.448375110664940040861e-04 */
> +        .quad 0xBFD4EFEE6527A7DE  /* A03 = -3.271442401734229177279e-01 */
> +        .quad 0xBEB16E3EBE2157D0  /* A00 = -1.038947396133402500647e-06 */
> +        .quad 0x3FF00038990FEE2F  /* A01 = +1.000053975962952312884e+00 */
> +        .quad 0xBF50569481C574CB  /* A02 = -9.972048056490652716971e-04 */
> +        .quad 0xBFD4E419278DA2B4  /* A03 = -3.264220129263251113372e-01 */
> +        .quad 0xBEB6A7B6723165D4  /* A00 = -1.350350836279403750524e-06 */
> +        .quad 0x3FF00045CAB4158E  /* A01 = +1.000066558657042303793e+00 */
> +        .quad 0xBF531D7C9C849108  /* A02 = -1.166698160951775212202e-03 */
> +        .quad 0xBFD4D7A0BB33B152  /* A03 = -3.256608799117844954552e-01 */
> +        .quad 0xBEBD0EE2A8654AFD  /* A00 = -1.732000471561702711532e-06 */
> +        .quad 0x3FF00055276F18D6  /* A01 = +1.000081209219890521211e+00 */
> +        .quad 0xBF562FDBA3FB6C6C  /* A02 = -1.354183666925102939860e-03 */
> +        .quad 0xBFD4CA85F1B93DB2  /* A03 = -3.248610363561638125773e-01 */
> +        .quad 0xBEC269D4036A207E  /* A00 = -2.195047297096822741730e-06 */
> +        .quad 0x3FF00066E7DA6E4E  /* A01 = +1.000098138500919997540e+00 */
> +        .quad 0xBF5991499FC36B3A  /* A02 = -1.560518167983372759405e-03 */
> +        .quad 0xBFD4BCC9A72283D6  /* A03 = -3.240226871658341556426e-01 */
> +        .quad 0xBEC7154B6C09CFE1  /* A00 = -2.751729738565190291276e-06 */
> +        .quad 0x3FF0007B47086B80  /* A01 = +1.000117566559055148900e+00 */
> +        .quad 0xBF5D455433B4F8F4  /* A02 = -1.786548832412968197680e-03 */
> +        .quad 0xBFD4AE6CC1BFE145  /* A03 = -3.231460468373550942722e-01 */
> +        .quad 0xBECCA68CC64A0F8A  /* A00 = -3.415415948561670285790e-06 */
> +        .quad 0x3FF00092827742F7  /* A01 = +1.000139722473418535387e+00 */
> +        .quad 0xBF60A7BF15A527AF  /* A02 = -2.033112728132522705610e-03 */
> +        .quad 0xBFD49F703214084C  /* A03 = -3.222313393636155876010e-01 */
> +        .quad 0xBED19E68676B241B  /* A00 = -4.200644630977303616698e-06 */
> +        .quad 0x3FF000ACDA037B26  /* A01 = +1.000164844146362863597e+00 */
> +        .quad 0xBF62D99F836A02F8  /* A02 = -2.301036405072284102280e-03 */
> +        .quad 0xBFD48FD4F2B91B28  /* A03 = -3.212787981359945810311e-01 */
> +        .quad 0xBED57CF4B0C7AA54  /* A00 = -5.123164339408145209103e-06 */
> +        .quad 0x3FF000CA8FD9E1A1  /* A01 = +1.000193178099017865534e+00 */
> +        .quad 0xBF653A014548E686  /* A02 = -2.591135484433962181405e-03 */
> +        .quad 0xBFD47F9C0844B38F  /* A03 = -3.202886658426046806447e-01 */
> +        .quad 0xBEDA012B1B1A41E2  /* A00 = -6.199971197454598722328e-06 */
> +        .quad 0x3FF000EBE868FDF4  /* A01 = +1.000224979259539459520e+00 */
> +        .quad 0xBF67CA9427E0A544  /* A02 = -2.904214255086275467410e-03 */
> +        .quad 0xBFD46EC6812ADB37  /* A03 = -3.192611943626845749655e-01 */
> +        .quad 0xBEDF3EAC5BF12194  /* A00 = -7.449344990702664567927e-06 */
> +        .quad 0x3FF001112A520784  /* A01 = +1.000260510744255704196e+00 */
> +        .quad 0xBF6A8D01ABDA4DC4  /* A02 = -3.241065277345108255891e-03 */
> +        .quad 0xBFD45D55759FFA4A  /* A03 = -3.181966446572103146551e-01 */
> +        .quad 0xBEE2A541BC274267  /* A00 = -8.890883582164319970972e-06 */
> +        .quad 0x3FF0013A9E5961F2  /* A01 = +1.000300043631906721231e+00 */
> +        .quad 0xBF6D82ECD080C540  /* A02 = -3.602468994380686462264e-03 */
> +        .quad 0xBFD44B4A0779C0AD  /* A03 = -3.170952866557950611259e-01 */
> +        .quad 0xBEE61D97609A27F4  /* A00 = -1.054553560499505625520e-05 */
> +        .quad 0x3FF001688F56A3AF  /* A01 = +1.000343856731187974773e+00 */
> +        .quad 0xBF7056F8EFB683EC  /* A02 = -3.989193351487490407647e-03 */
> +        .quad 0xBFD438A5620F0F74  /* A03 = -3.159573991399533543500e-01 */
> +        .quad 0xBEEA145429EDD370  /* A00 = -1.243563138839952927732e-05 */
> +        .quad 0x3FF0019B4A242A67  /* A01 = +1.000392236341804297339e+00 */
> +        .quad 0xBF7207D31CA78D9B  /* A02 = -4.401993423445739288258e-03 */
> +        .quad 0xBFD42568BA16E7CD  /* A03 = -3.147832696228050619602e-01 */
> +        .quad 0xBEEE96370D52680F  /* A00 = -1.458491207477835326165e-05 */
> +        .quad 0x3FF001D31D8E4115  /* A01 = +1.000445476009251821736e+00 */
> +        .quad 0xBF73D4CC11EDC094  /* A02 = -4.841611050196221316400e-03 */
> +        .quad 0xBFD411954D8664E7  /* A03 = -3.135731942252974469021e-01 */
> +        .quad 0xBEF338C046215EF8  /* A00 = -1.833122622260562810219e-05 */
> +        .quad 0x3FF00230C32C2EC1  /* A01 = +1.000534784691737621998e+00 */
> +        .quad 0xBF76BD019BCC5DAF  /* A02 = -5.551344188254799492943e-03 */
> +        .quad 0xBFD3F2C7156DC21E  /* A03 = -3.116929730668135389848e-01 */
> +        .quad 0xBEF9B15EAE411EAE  /* A00 = -2.450261207822986676092e-05 */
> +        .quad 0x3FF002C2DF057A4D  /* A01 = +1.000674124886830940184e+00 */
> +        .quad 0xBF7B08CCD9AC1E30  /* A02 = -6.600189396301511801646e-03 */
> +        .quad 0xBFD3C7A7A114FED8  /* A03 = -3.090609620157755976777e-01 */
> +        .quad 0xBF00E36483C373B3  /* A00 = -3.221178528332122595812e-05 */
> +        .quad 0x3FF0036F419480D7  /* A01 = +1.000838524028997644777e+00 */
> +        .quad 0xBF7FD255D1777007  /* A02 = -7.768950679260206403087e-03 */
> +        .quad 0xBFD39A453911D6CE  /* A03 = -3.062909180947429588215e-01 */
> +        .quad 0xBF05DFA04DD12059  /* A00 = -4.172046622180685472624e-05 */
> +        .quad 0x3FF00438B2A03D8D  /* A01 = +1.001030633695197069599e+00 */
> +        .quad 0xBF828F8DBB4A9D10  /* A02 = -9.062869337255224921890e-03 */
> +        .quad 0xBFD36AAB704697D9  /* A03 = -3.033856007044711255993e-01 */
> +        .quad 0xBF0BF3E0C647DEFB  /* A00 = -5.331544597092331081714e-05 */
> +        .quad 0x3FF005221063D36D  /* A01 = +1.001253189109060359741e+00 */
> +        .quad 0xBF857A2CB3C96102  /* A02 = -1.048693584122917590862e-02 */
> +        .quad 0xBFD338E65BBB4FEC  /* A03 = -3.003478904549854444639e-01 */
> +        .quad 0xBF11A506ED7C9D31  /* A00 = -6.730894835681591541979e-05 */
> +        .quad 0x3FF0062E4D0EA92A  /* A01 = +1.001508999829250345925e+00 */
> +        .quad 0xBF88AB82C2761AF3  /* A02 = -1.204588085125866091241e-02 */
> +        .quad 0xBFD305028D6BD206  /* A03 = -2.971807843271395688234e-01 */
> +        .quad 0xBF1607C0922D9BF1  /* A00 = -8.403885708006799337092e-05 */
> +        .quad 0x3FF007606C341961  /* A01 = +1.001800940198869449560e+00 */
> +        .quad 0xBF8C25E6DA487BCF  /* A02 = -1.374416688582682892494e-02 */
> +        .quad 0xBFD2CF0D0EE8F7B5  /* A03 = -2.938873906713255768075e-01 */
> +        .quad 0xBF1B3A8480A0A16D  /* A00 = -1.038688061788578038307e-04 */
> +        .quad 0x3FF008BB802D02D6  /* A01 = +1.002131939589323561535e+00 */
> +        .quad 0xBF8FEB8AE99FD100  /* A02 = -1.558598065819483124983e-02 */
> +        .quad 0xBFD297135BD0911B  /* A03 = -2.904709240558688843059e-01 */
> +        .quad 0xBF20ABB9BDB75C65  /* A00 = -1.271881327357976163798e-04 */
> +        .quad 0x3FF00A42A76D8CD1  /* A01 = +1.002504972472525901495e+00 */
> +        .quad 0xBF91FF3D752BB9E6  /* A02 = -1.757522609380570560722e-02 */
> +        .quad 0xBFD25D235C1F88B4  /* A03 = -2.869346999779154305799e-01 */
> +        .quad 0xBF243D3254425461  /* A00 = -1.544116913733432829448e-04 */
> +        .quad 0x3FF00BF909D1795E  /* A01 = +1.002923048355647051011e+00 */
> +        .quad 0xBF94304E04D44942  /* A02 = -1.971551804042204897316e-02 */
> +        .quad 0xBFD2214B5E61CFA6  /* A03 = -2.832821294498394371075e-01 */
> +        .quad 0xBF286070011B61CE  /* A00 = -1.859795307186510085994e-04 */
> +        .quad 0x3FF00DE1D5E1627E  /* A01 = +1.003389201612804537689e+00 */
> +        .quad 0xBF9689D5F4163F59  /* A02 = -2.201017668045266231780e-02 */
> +        .quad 0xBFD1E39A11C3B42C  /* A03 = -2.795167134743816728104e-01 */
> +        .quad 0xBF2D250B366A79E8  /* A00 = -2.223564326486314902259e-04 */
> +        .quad 0x3FF010003E134001  /* A01 = +1.003906481248123094829e+00 */
> +        .quad 0xBF990C9FF91F6F81  /* A02 = -2.446222265267250853271e-02 */
> +        .quad 0xBFD1A41E80084CDC  /* A03 = -2.756420374218586655246e-01 */
> +        .quad 0xBF314DB5DDC2A30E  /* A00 = -2.640313157465248123865e-04 */
> +        .quad 0x3FF012577608921B  /* A01 = +1.004477940624503018441e+00 */
> +        .quad 0xBF9BB9626875B0C9  /* A02 = -2.707437288829409385849e-02 */
> +        .quad 0xBFD162E80768A9D0  /* A03 = -2.716617653228725615122e-01 */
> +        .quad 0xBF346A6133808864  /* A00 = -3.115165050094957730625e-04 */
> +        .quad 0x3FF014EAAFCC88A3  /* A01 = +1.005106627192198898157e+00 */
> +        .quad 0xBF9E90BEF9BF7419  /* A02 = -2.984903716411588595059e-02 */
> +        .quad 0xBFD12006545F7FAD  /* A03 = -2.675796340899932457269e-01 */
> +        .quad 0xBF37F180DC3848EA  /* A00 = -3.653468704395550778821e-04 */
> +        .quad 0x3FF017BD19147861  /* A01 = +1.005795572250939295955e+00 */
> +        .quad 0xBFA0C9A14C702E07  /* A02 = -3.278831537326359207851e-02 */
> +        .quad 0xBFD0DB895B650092  /* A03 = -2.633994476818851682154e-01 */
> +        .quad 0xBF3BEC6AAC6D7635  /* A00 = -4.260788377246944457107e-04 */
> +        .quad 0x3FF01AD1D884E719  /* A01 = +1.006547780778822565040e+00 */
> +        .quad 0xBFA260B2A1B1434A  /* A02 = -3.589399551186163439542e-02 */
> +        .quad 0xBFD09581529E93D6  /* A03 = -2.591250712233067465817e-01 */
> +        .quad 0xBF4164E26167882B  /* A00 = -5.308251737086202562063e-04 */
> +        .quad 0x3FF01FEF14B62B81  /* A01 = +1.007796364693348545316e+00 */
> +        .quad 0xBFA4EB014538AA42  /* A02 = -4.085544557559163403315e-02 */
> +        .quad 0xBFD029D36FEAF41F  /* A03 = -2.525528519580024222613e-01 */
> +        .quad 0xBF46F6FFF4E53DC8  /* A00 = -7.008313930700277652464e-04 */
> +        .quad 0x3FF027CBB51CBBA0  /* A01 = +1.009715754956893363214e+00 */
> +        .quad 0xBFA89DEC9FEC112E  /* A02 = -4.807986690687680864098e-02 */
> +        .quad 0xBFCF2A99464D0DB4  /* A03 = -2.434875100390009317053e-01 */
> +        .quad 0xBF4DCC9C4F66A4D9  /* A00 = -9.094012482836712945103e-04 */
> +        .quad 0x3FF030E7CFCCD583  /* A01 = +1.011939822882909068014e+00 */
> +        .quad 0xBFACAA3B95814081  /* A02 = -5.598627281199331645611e-02 */
> +        .quad 0xBFCDF78F156BE7CF  /* A03 = -2.341173987004467604844e-01 */
> +        .quad 0xBF5308ED74E5C7A6  /* A00 = -1.161796466103906435435e-03 */
> +        .quad 0x3FF03B5986412ECB  /* A01 = +1.014489674026594512313e+00 */
> +        .quad 0xBFB087EBA88DCC3F  /* A02 = -6.457398285947223148806e-02 */
> +        .quad 0xBFCCBB9BD134862F  /* A03 = -2.244753619680052991736e-01 */
> +        .quad 0xBF57FA23C00DF4B5  /* A00 = -1.463446533505758208674e-03 */
> +        .quad 0x3FF0473558A1BCC0  /* A01 = +1.017384859292903342975e+00 */
> +        .quad 0xBFB2E702BC6360EF  /* A02 = -7.383744334527241048871e-02 */
> +        .quad 0xBFCB77D546379288  /* A03 = -2.145945160729250122955e-01 */
> +        .quad 0xBF5DD12971557F71  /* A00 = -1.819887610814388068450e-03 */
> +        .quad 0x3FF0548DDF5000A8  /* A01 = +1.020643112482540360020e+00 */
> +        .quad 0xBFB571B63DA186E1  /* A02 = -8.376635555898871710045e-02 */
> +        .quad 0xBFCA2D5202605148  /* A03 = -2.045080672838912594358e-01 */
> +        .quad 0xBF6252B1AD5D4F17  /* A00 = -2.236697221556737096709e-03 */
> +        .quad 0x3FF063738A910BF7  /* A01 = +1.024280110622155737232e+00 */
> +        .quad 0xBFB8270C8E6B601B  /* A02 = -9.434584118878357184013e-02 */
> +        .quad 0xBFC8DD27D950A07E  /* A03 = -1.942491351230763441116e-01 */
> +        .quad 0xBF66470C91730CFC  /* A00 = -2.719425723258004842786e-03 */
> +        .quad 0x3FF073F468FCF331  /* A01 = +1.028309259519300633556e+00 */
> +        .quad 0xBFBB05C2952191E4  /* A02 = -1.055566419686964629854e-01 */
> +        .quad 0xBFC7886A770DE2BD  /* A03 = -1.838505822486435070662e-01 */
> +        .quad 0xBF6AD114AC8E98EC  /* A00 = -3.273525599485007861467e-03 */
> +        .quad 0x3FF0861BF53E5226  /* A01 = +1.032741506559554434119e+00 */
> +        .quad 0xBFBE0C4F9B461507  /* A02 = -1.173753503881763554650e-01 */
> +        .quad 0xBFC6302A037CDE3A  /* A03 = -1.733448521642786954722e-01 */
> +        .quad 0xBF6FFBDE2A6C2AF8  /* A00 = -3.904279630096648551207e-03 */
> +        .quad 0x3FF099F2EB8E7DA3  /* A01 = +1.037585182326304034106e+00 */
> +        .quad 0xBFC09C74D192DDF0  /* A02 = -1.297746680554463516444e-01 */
> +        .quad 0xBFC4D571D8E3079F  /* A03 = -1.627638157861470424859e-01 */
> +        .quad 0xBF72E8FDC0B952AA  /* A00 = -4.616728994353872309042e-03 */
> +        .quad 0x3FF0AF7F273C9533  /* A01 = +1.042845872181101141152e+00 */
> +        .quad 0xBFC244C512736F10  /* A02 = -1.427236881344176033792e-01 */
> +        .quad 0xBFC379474F58B902  /* A03 = -1.521386277613104298645e-01 */
> +        .quad 0xBF762EABAF17395B  /* A00 = -5.415602341101023557701e-03 */
> +        .quad 0x3FF0C6C3886F63FB  /* A01 = +1.048526318502125631582e+00 */
> +        .quad 0xBFC3FDF9918EA12A  /* A02 = -1.561881981590514389957e-01 */
> +        .quad 0xBFC21CA89ECAB895  /* A03 = -1.414995932913753196036e-01 */
> +        .quad 0xBF79D387CE5B2BAE  /* A00 = -6.305246822828998107258e-03 */
> +        .quad 0x3FF0DFBFE2346376  /* A01 = +1.054626353847394337748e+00 */
> +        .quad 0xBFC5C6DA43602620  /* A02 = -1.701309994680721970894e-01 */
> +        .quad 0xBFC0C08BD8DB6631  /* A03 = -1.308760460731704100557e-01 */
> +        .quad 0xBF7DDBA8E8DA9060  /* A00 = -7.289562037531366334164e-03 */
> +        .quad 0x3FF0FA70F0D1B464  /* A01 = +1.061142864894713433443e+00 */
> +        .quad 0xBFC79E18D92BAA7C  /* A02 = -1.845122394946264732241e-01 */
> +        .quad 0xBFBECBBBF74C2669  /* A03 = -1.202962378266875381749e-01 */
> +        .quad 0xBF81254E76EA25DA  /* A00 = -8.371937755572145950511e-03 */
> +        .quad 0x3FF116D05835EBD0  /* A01 = +1.068069786618014660462e+00 */
> +        .quad 0xBFC982539E2ED224  /* A02 = -1.992897531869327609755e-01 */
> +        .quad 0xBFBC1B043C350159  /* A03 = -1.097872397413132278254e-01 */
> +        .quad 0xBF8391ACBA863403  /* A00 = -9.555196230190082448686e-03 */
> +        .quad 0x3FF134D4AA477FE2  /* A01 = +1.075398125794884141015e+00 */
> +        .quad 0xBFCB7218609FEAFB  /* A02 = -2.144194099235717521079e-01 */
> +        .quad 0xBFB970A16CB88329  /* A03 = -9.937485603633135211599e-02 */
> +        .quad 0xBF87935088E48E8B  /* A00 = -1.151144902957603431692e-02 */
> +        .quad 0x3FF1649892AD7DD3  /* A01 = +1.087059567413110938716e+00 */
> +        .quad 0xBFCE6971DDE75409  /* A02 = -2.375929196847723912089e-01 */
> +        .quad 0xBFB58291E88CB251  /* A03 = -8.402358939628952472223e-02 */
> +        .quad 0xBF8DB3A62C325325  /* A00 = -1.450280973794233242702e-02 */
> +        .quad 0x3FF1A9C900C6DEEA  /* A01 = +1.103951457056548068891e+00 */
> +        .quad 0xBFD13DBC65B0E08E  /* A02 = -2.693930619311765140012e-01 */
> +        .quad 0xBFB06696F62696D1  /* A03 = -6.406539449252625362252e-02 */
> +        .quad 0xBF92583699F2E27A  /* A00 = -1.791463198307716858659e-02 */
> +        .quad 0x3FF1F451B85AA9F0  /* A01 = +1.122148246892376022288e+00 */
> +        .quad 0xBFD34FD5F8288180  /* A02 = -3.017477916164565954205e-01 */
> +        .quad 0xBFA6FB692825B683  /* A03 = -4.488686194495718900788e-02 */
> +        .quad 0xBF9641C26E673D6F  /* A00 = -2.173522757385398448959e-02 */
> +        .quad 0x3FF24364DA5E2B07  /* A01 = +1.141453602790251542487e+00 */
> +        .quad 0xBFD564A5A5EF5890  /* A02 = -3.342680092295120530821e-01 */
> +        .quad 0xBF9B43712011A982  /* A03 = -2.662445791467283467968e-02 */
> +        .quad 0xBF9A901038EC2F39  /* A00 = -2.594018313816024226548e-02 */
> +        .quad 0x3FF2961356DFFEBA  /* A01 = +1.161639537196534011088e+00 */
> +        .quad 0xBFD775EBB17198C7  /* A02 = -3.665723069046972759644e-01 */
> +        .quad 0xBF833B1A926CD462  /* A03 = -9.390075295963199591975e-03 */
> +        .quad 0xBF9F396A6A461B91  /* A00 = -3.049246095317987084727e-02 */
> +        .quad 0x3FF2EB53BAEF534B  /* A01 = +1.182452898229899629357e+00 */
> +        .quad 0xBFD97DABF8AD8BBD  /* A02 = -3.982953957076310058660e-01 */
> +        .quad 0x3F7B8F6A3E0F8837  /* A03 = +6.728568086119371925713e-03 */
> +        .quad 0xBFA21878590F8BAA  /* A00 = -3.534294211546946951064e-02 */
> +        .quad 0x3FF34209790236E1  /* A01 = +1.203622315111197105253e+00 */
> +        .quad 0xBFDB764C0E71BECB  /* A02 = -4.290952817018306997277e-01 */
> +        .quad 0x3F962FE0C03F84C0  /* A03 = +2.166701482190513949888e-02 */
> +        .quad 0xBFA4B36B9AD27ECC  /* A00 = -4.043136849327097492868e-02 */
> +        .quad 0x3FF3990C5B12FC16  /* A01 = +1.224865298994477935679e+00 */
> +        .quad 0xBFDD5AABB0D01390  /* A02 = -4.586590983092770912322e-01 */
> +        .quad 0x3FA21DAF5CA162DB  /* A03 = +3.538272863142363083844e-02 */
> +        .quad 0xBFA7645E4D7BF28B  /* A00 = -4.568762489177399105378e-02 */
> +        .quad 0x3FF3EF2FD51C0D9F  /* A01 = +1.245895225962932562069e+00 */
> +        .quad 0xBFDF26377E1B686E  /* A02 = -4.867075664057044503963e-01 */
> +        .quad 0x3FA8803E756EE812  /* A03 = +4.785342391501513914509e-02 */
> +        .quad 0xBFAA210925C64413  /* A00 = -5.103329263796054643398e-02 */
> +        .quad 0x3FF44349F897D8E7  /* A01 = +1.266427966181760345066e+00 */
> +        .quad 0xBFE06A7B02C6D8E2  /* A02 = -5.129981092675530707226e-01 */
> +        .quad 0x3FAE3F194734F5D0  /* A03 = +5.907515520309980505687e-02 */
> +        .quad 0xBFACDE48F8A19BBB  /* A00 = -5.638340029764018351832e-02 */
> +        .quad 0x3FF49439D5466582  /* A01 = +1.286187966447272845727e+00 */
> +        .quad 0xBFE131C7C1063DDC  /* A02 = -5.373266954429101183166e-01 */
> +        .quad 0x3FB1ADEEC36AD805  /* A03 = +6.906025191241844940482e-02 */
> +        .quad 0xBFAF905D8F585680  /* A00 = -6.164829611604449866036e-02 */
> +        .quad 0x3FF4E0ED1FD27F99  /* A01 = +1.304913639360142818546e+00 */
> +        .quad 0xBFE1E7A859DC1D3D  /* A02 = -5.595285182070380836095e-01 */
> +        .quad 0x3FB3ED018E4642A1  /* A03 = +7.783517573831001679086e-02 */
> +        .quad 0xBFB11595104160BA  /* A00 = -6.673556944713512906198e-02 */
> +        .quad 0x3FF528650340490B  /* A01 = +1.322361958217302513319e+00 */
> +        .quad 0xBFE28B14B40BC974  /* A02 = -5.794776455425521000109e-01 */
> +        .quad 0x3FB5DF49F5BAF6D7  /* A03 = +8.543836831355676453281e-02 */
> +        .quad 0xBFB2513A97344BA4  /* A00 = -7.155195418844911836587e-02 */
> +        .quad 0x3FF569BA0DB5EE14  /* A01 = +1.338312200124055273420e+00 */
> +        .quad 0xBFE31B53A8B67B20  /* A02 = -5.970857901737396389308e-01 */
> +        .quad 0x3FB787F297BB0544  /* A03 = +9.191814617499455275507e-02 */
> +        .quad 0xBFB37512E848FAFA  /* A00 = -7.600515528700305112331e-02 */
> +        .quad 0x3FF5A41F33B403C8  /* A01 = +1.352568819013173495591e+00 */
> +        .quad 0xBFE397F6EA9A58A5  /* A02 = -6.123003561103997904880e-01 */
> +        .quad 0x3FB8EAA9FF25CA06  /* A03 = +9.733068923177520814782e-02 */
> +        .quad 0xBFB47B3E603AFC5D  /* A00 = -8.000554894805263217439e-02 */
> +        .quad 0x3FF5D6E3EDE40487  /* A01 = +1.364963464031718975988e+00 */
> +        .quad 0xBFE400D5BCA6D631  /* A02 = -6.251019177058819709103e-01 */
> +        .quad 0x3FBA0B830ED567FE  /* A03 = +1.017381583418739132707e-01 */
> +        .quad 0xBFB5BBFE8AC90496  /* A00 = -8.489981544791400103200e-02 */
> +        .quad 0x3FF612BA70107E95  /* A01 = +1.379572332145390989311e+00 */
> +        .quad 0xBFE477EAF1FA7693  /* A02 = -6.396383978023599814478e-01 */
> +        .quad 0x3FBB4784B7C08A95  /* A03 = +1.065600346196709652391e-01 */
> +        .quad 0xBFB6D5D940743939  /* A00 = -8.920057128509463473254e-02 */
> +        .quad 0x3FF644A8748F70CE  /* A01 = +1.391762214006166953340e+00 */
> +        .quad 0xBFE4D646AB07EA37  /* A02 = -6.511567440459832267763e-01 */
> +        .quad 0x3FBC354F4E1D5292  /* A03 = +1.101884427747086558913e-01 */
> +        .quad 0xBFB7223D19E4F3D1  /* A00 = -9.036619074045339206069e-02 */
> +        .quad 0x3FF6518FEB42B7FA  /* A01 = +1.394912642466350494175e+00 */
> +        .quad 0xBFE4ED86CB87498C  /* A02 = -6.539949393430091184598e-01 */
> +        .quad 0x3FBC6D29F28CCA9B  /* A03 = +1.110407082713131127205e-01 */
> +        .quad 0xBFB6878652FF6312  /* A00 = -8.800544287022329936754e-02 */
> +        .quad 0x3FF63948C302D040  /* A01 = +1.388985406648330922508e+00 */
> +        .quad 0xBFE4C4E2E7904E17  /* A02 = -6.490339777687407218920e-01 */
> +        .quad 0x3FBC127356CA1ABE  /* A03 = +1.096565329445224612481e-01 */
> +        .quad 0xBFB4F5D18B0C91D6  /* A00 = -8.187589306596207427980e-02 */
> +        .quad 0x3FF5FD27EB7DD0B8  /* A01 = +1.374305648697413673176e+00 */
> +        .quad 0xBFE464E01A2B2FC6  /* A02 = -6.373138915164353601739e-01 */
> +        .quad 0x3FBB460547674A30  /* A03 = +1.065371798825160976065e-01 */
> +        .quad 0xBFB26642FA16A685  /* A00 = -7.187288861919156890412e-02 */
> +        .quad 0x3FF59F9BEDE1C95A  /* A01 = +1.351467065073470141812e+00 */
> +        .quad 0xBFE3D67920C8FBEA  /* A02 = -6.199308052381387046381e-01 */
> +        .quad 0x3FBA24F6A8D3CBC1  /* A03 = +1.021265184570401413078e-01 */
> +        .quad 0xBFADB5294794F097  /* A00 = -5.802277563859197656582e-02 */
> +        .quad 0x3FF523EA7B9CF453  /* A01 = +1.321268542159732772845e+00 */
> +        .quad 0xBFE322A8B55E35DB  /* A02 = -5.979808370918208160205e-01 */
> +        .quad 0x3FB8C8673B1B3E37  /* A03 = +9.680791085269722928697e-02 */
> +        .quad 0xBFA4B7D661965C6A  /* A00 = -4.046506825687219699450e-02 */
> +        .quad 0x3FF48DE3E2CE3122  /* A01 = +1.284641157110919085227e+00 */
> +        .quad 0xBFE251FED1A7F445  /* A02 = -5.725092024655472622285e-01 */
> +        .quad 0x3FB745699FCABDB9  /* A03 = +9.090290213747821701507e-02 */
> +        .quad 0xBF93E60456E4EE1D  /* A00 = -1.943213253365004902773e-02 */
> +        .quad 0x3FF3E1A14E628A59  /* A01 = +1.242585474196536532432e+00 */
> +        .quad 0xBFE16C5AB660E876  /* A02 = -5.444768488007543094653e-01 */
> +        .quad 0x3FB5AD33AA8C188F  /* A03 = +8.467410005332197397987e-02 */
> +        .quad 0x3F738C17C47C7961  /* A00 = +4.772274820224659853951e-03 */
> +        .quad 0x3FF3234DDE3BD146  /* A01 = +1.196119182682268355933e+00 */
> +        .quad 0xBFE078C0D77A9D3B  /* A02 = -5.147403915952176722826e-01 */
> +        .quad 0x3FB40D74B3E276B8  /* A03 = +7.833032027925923568290e-02 */
> +        .quad 0x3FA0474BECC689C7  /* A00 = +3.179394975019849550746e-02 */
> +        .quad 0x3FF256FB4FA7D18A  /* A01 = +1.146235762743432307076e+00 */
> +        .quad 0xBFDEFA8E3FB285E2  /* A02 = -4.840427038235174395098e-01 */
> +        .quad 0x3FB270C007493D59  /* A03 = +7.203293016322244446403e-02 */
> +        .quad 0x3FAF5BD51E479BDC  /* A00 = +6.124750132203590768931e-02 */
> +        .quad 0x3FF18081D0B53BC5  /* A01 = +1.093873801484492647162e+00 */
> +        .quad 0xBFDCFE2439BD0C03  /* A02 = -4.530115665294831006626e-01 */
> +        .quad 0x3FB0DEFE5A45AFDD  /* A03 = +6.590261176978580437424e-02 */
> +        .quad 0x3FB7BD5D2806EA26  /* A00 = +9.273321368429118805032e-02 */
> +        .quad 0x3FF0A369E35B4440  /* A01 = +1.039895904647224256223e+00 */
> +        .quad 0xBFDB04BC5C9951E7  /* A02 = -4.221640495573226181669e-01 */
> +        .quad 0x3FAEBBBAA9D6DEEF  /* A03 = +6.002600978120919278380e-02 */
> +        .quad 0x3FC01BE411098DBC  /* A00 = +1.258511622610124502941e-01 */
> +        .quad 0x3FEF85BDABC031C1  /* A01 = +9.850757936961188621083e-01 */
> +        .quad 0xBFD91521375097C2  /* A02 = -3.919146576102968682065e-01 */
> +        .quad 0x3FABE26F0086D982  /* A03 = +5.446192628317005068883e-02 */
> +        .quad 0x3FC481D7FF5776B9  /* A00 = +1.602125164781023347604e-01 */
> +        .quad 0x3FEDC3506C1E7218  /* A01 = +9.300920592973538347792e-01 */
> +        .quad 0xBFD7349A88DA7D4F  /* A02 = -3.625856720409119104964e-01 */
> +        .quad 0x3FA936E2DFF8E2AE  /* A03 = +4.924687370334389358018e-02 */
> +        .quad 0x3FC90471F96FA27A  /* A00 = +1.954481571149420671141e-01 */
> +        .quad 0x3FEC0451601987A2  /* A01 = +8.755270840595026360376e-01 */
> +        .quad 0xBFD5671CD4B898DC  /* A02 = -3.344184949259110251063e-01 */
> +        .quad 0x3FA6BB9594603B67  /* A03 = +4.439990459660841243261e-02 */
> +        .quad 0x3FCFD8ADB9ED944C  /* A00 = +2.488000066615846384011e-01 */
> +        .quad 0x3FE978C073F6809A  /* A01 = +7.959902062321078108909e-01 */
> +        .quad 0xBFD2DF7E00BCD5A9  /* A02 = -2.948908812716931060471e-01 */
> +        .quad 0x3FA3614033D490B2  /* A03 = +3.785133965200894456959e-02 */
> +        .quad 0x3FD4846A12AFE5A0  /* A00 = +3.205819303981005674586e-01 */
> +        .quad 0x3FE63A1147D40472  /* A01 = +6.945883181471244061100e-01 */
> +        .quad 0xBFCFA2268AD34450  /* A02 = -2.471359422548027318101e-01 */
> +        .quad 0x3F9F150201D9FFE0  /* A03 = +3.035357605267552383310e-02 */
> +        .quad 0x3FD9018641F82BEB  /* A00 = +3.907180446846598154131e-01 */
> +        .quad 0x3FE33B7C220FFBDC  /* A01 = +6.010113396913498995389e-01 */
> +        .quad 0xBFCA4E4187E29C86  /* A02 = -2.055131829740483584423e-01 */
> +        .quad 0x3F98C30CED19F8F4  /* A03 = +2.418155858185229434287e-02 */
> +        .quad 0x3FDD4B8255BEB078  /* A00 = +4.577337109901757905561e-01 */
> +        .quad 0x3FE0858B19D3A49B  /* A01 = +5.163016800335243905451e-01 */
> +        .quad 0xBFC5BC929EACE564  /* A02 = -1.698172831327539045176e-01 */
> +        .quad 0x3F93A083CE57DE2B  /* A03 = +1.916700312537337677621e-02 */
> +        .quad 0x3FE0A8E5E039295C  /* A00 = +5.206174258576470315063e-01 */
> +        .quad 0x3FDC35E1234583FE  /* A01 = +4.407885403107342225937e-01 */
> +        .quad 0xBFC1DE034E31AEB9  /* A02 = -1.395877963835710222629e-01 */
> +        .quad 0x3F8EFDEBB3471BDC  /* A03 = +1.513275280821162888101e-02 */
> +        .quad 0x3FE2851B603CB2A5  /* A00 = +5.787484054213406503564e-01 */
> +        .quad 0x3FD7F4A44ABBB286  /* A01 = +3.743067483726821853551e-01 */
> +        .quad 0xBFBD3EEB67087DE7  /* A02 = -1.142413260026767657385e-01 */
> +        .quad 0x3F8864F38329E8BD  /* A03 = +1.191129917173260922836e-02 */
> +        .quad 0x3FE437DBE3C34AC1  /* A00 = +6.318187187665317283702e-01 */
> +        .quad 0x3FD43F6F789441B5  /* A01 = +3.163717916040938438194e-01 */
> +        .quad 0xBFB7D92E7901B9A4  /* A02 = -9.315767721429907277653e-02 */
> +        .quad 0x3F8327ED342308E1  /* A03 = +9.353497651663324544136e-03 */
> +        .quad 0x3FE5C0977766D55C  /* A00 = +6.797597248138731451661e-01 */
> +        .quad 0x3FD10B42A764D8F9  /* A01 = +2.663122782427219115142e-01 */
> +        .quad 0xBFB3633351D3D70F  /* A02 = -7.573242900602060456716e-02 */
> +        .quad 0x3F7E079E30FF899C  /* A03 = +7.331483779099558922843e-03 */
> +        .quad 0x3FE7202CE08A88C4  /* A00 = +7.226776490754436288455e-01 */
> +        .quad 0x3FCC973EB5662B01  /* A01 = +2.233656297433626314319e-01 */
> +        .quad 0xBFAF70A455F9920B  /* A02 = -6.140626477716545211782e-02 */
> +        .quad 0x3F77812411CE99B6  /* A03 = +5.738392731393584730859e-03 */
> +        .quad 0x3FE85879424095B1  /* A00 = +7.608000082006382003286e-01 */
> +        .quad 0x3FC7E73BD1674D84  /* A01 = +1.867441914060742336190e-01 */
> +        .quad 0xBFA96F84E4BF333B  /* A02 = -4.967894832916504993525e-02 */
> +        .quad 0x3F72606DDCA6E117  /* A03 = +4.486493251924870105662e-03 */
> +        .quad 0x3FE96BFE4957F4DD  /* A00 = +7.944327766887472330737e-01 */
> +        .quad 0x3FC3ED4780D25478  /* A01 = +1.556786898624158421711e-01 */
> +        .quad 0xBFA489C5F9A56B58  /* A02 = -4.011362717093075458408e-02 */
> +        .quad 0x3F6CB5DC17E9AD2A  /* A03 = +3.504686231556104931972e-03 */
> +        .quad 0x3FEA5D9CB2F41234  /* A00 = +8.239272589858672724006e-01 */
> +        .quad 0x3FC091A758374DCF  /* A01 = +1.294449978582705440555e-01 */
> +        .quad 0xBFA08E436D4B5CE0  /* A02 = -3.233538350257858517978e-02 */
> +        .quad 0x3F666997AD53E6B7  /* A03 = +2.735897297154145629133e-03 */
> +        .quad 0x3FEB3060342CB850  /* A00 = +8.496552485501158713532e-01 */
> +        .quad 0x3FBB7D30BBC7DC1B  /* A01 = +1.073790033768634993860e-01 */
> +        .quad 0xBF9AA6BA3443D9E3  /* A02 = -2.602663940430173170060e-02 */
> +        .quad 0x3F617CA764B7850B  /* A03 = +2.134634914668814050648e-03 */
> +        .quad 0x3FEBE759A6A0C7B8  /* A00 = +8.719909910635044170135e-01 */
> +        .quad 0x3FB6C10DE6A703FF  /* A01 = +8.888327485239243264115e-02 */
> +        .quad 0xBF956C566D8BE1F6  /* A02 = -2.092108768099084498138e-02 */
> +        .quad 0x3F5B46D1A4A59CF8  /* A03 = +1.664833764687232917079e-03 */
> +        .quad 0x3FEC858494887A04  /* A00 = +8.912985707318630268503e-01 */
> +        .quad 0x3FB2CC31F543394D  /* A01 = +7.342827070099140762682e-02 */
> +        .quad 0xBF9133477FF69137  /* A02 = -1.679717749142747504343e-02 */
> +        .quad 0x3F5544482FBB4DA5  /* A03 = +1.298017973501022466823e-03 */
> +        .quad 0x3FED0DB59D0E32E9  /* A00 = +9.079235141267335551518e-01 */
> +        .quad 0x3FAF006BAFFC6EF4  /* A01 = +6.055008433597022787787e-02 */
> +        .quad 0xBF8B97146FA2B97A  /* A02 = -1.347175565419144252499e-02 */
> +        .quad 0x3F5093B01F4CDC69  /* A03 = +1.011774057770665211434e-03 */
> +        .quad 0x3FEDB487C3EC457C  /* A00 = +9.282873942012623835751e-01 */
> +        .quad 0x3FA7390C09D0BD1D  /* A01 = +4.535710925881118044112e-02 */
> +        .quad 0xBF83D9F7C3181106  /* A02 = -9.693084374710735778846e-03 */
> +        .quad 0x3F46E34A0A3C0E64  /* A03 = +6.984817050299072134500e-04 */
> +        .quad 0x3FEE5FFCB4E6EB00  /* A00 = +9.492171796076434020506e-01 */
> +        .quad 0x3F9F4913ED00AADF  /* A01 = +3.055220731782070861526e-02 */
> +        .quad 0xBF79670BD0E59B5C  /* A02 = -6.201788097633133961528e-03 */
> +        .quad 0x3F3BC998EBCAF96D  /* A03 = +4.240034429975534616304e-04 */
> +        .quad 0x3FEEDBA41E9542FE  /* A00 = +9.643116566968215064293e-01 */
> +        .quad 0x3F94F5DD18D9C24D  /* A01 = +2.046914543319848858727e-02 */
> +        .quad 0xBF7034896AA122B9  /* A02 = -3.956352980886528904192e-03 */
> +        .quad 0x3F30DCCB47810B39  /* A03 = +2.573009765038273091199e-04 */
> +        .quad 0x3FEF33F2882520ED  /* A00 = +9.750912341196716903724e-01 */
> +        .quad 0x3F8BF37F2CF553FF  /* A01 = +1.364802699996836392315e-02 */
> +        .quad 0xBF649F6F05A69619  /* A02 = -2.517430152880317534986e-03 */
> +        .quad 0x3F247623C950AAC9  /* A03 = +1.561087307505231250044e-04 */
> +        .quad 0x3FEF727757751741  /* A00 = +9.827229221489021115943e-01 */
> +        .quad 0x3F828E67912C4400  /* A01 = +9.060677640748693306705e-03 */
> +        .quad 0xBF5A2F51A806CC2C  /* A02 = -1.598195784123355826789e-03 */
> +        .quad 0x3F18D35D7687E613  /* A03 = +9.470231965016282719549e-05 */
> +        .quad 0x3FEF9E6325C5942A  /* A00 = +9.880843866091073568469e-01 */
> +        .quad 0x3F788AB117618F76  /* A01 = +5.991641772286606867914e-03 */
> +        .quad 0xBF5096EAB0B1EA89  /* A02 = -1.012543859160305046233e-03 */
> +        .quad 0x3F0E1E50EC4435AB  /* A03 = +5.744633156910412119652e-05 */
> +        .quad 0x3FEFBD0784049369  /* A00 = +9.918248728250605994461e-01 */
> +        .quad 0x3F702BBD8294035F  /* A01 = +3.947963975634432264028e-03 */
> +        .quad 0xBF44FB55E0F00593  /* A02 = -6.403130845457509273330e-04 */
> +        .quad 0x3F0244DCD723230A  /* A03 = +3.484534217219031730379e-05 */
> +        .quad 0x3FEFD245E2366A43  /* A00 = +9.944180887426415926811e-01 */
> +        .quad 0x3F653D82EC088433  /* A01 = +2.592807490387838333795e-03 */
> +        .quad 0xBF3A7DF75E013CB8  /* A02 = -4.042366908878036561859e-04 */
> +        .quad 0x3EF6298E69F991CD  /* A03 = +2.113564425911141559972e-05 */
> +        .quad 0x3FEFE0EAA508BC69  /* A00 = +9.962056372950317539861e-01 */
> +        .quad 0x3F5BD0771AF3FDDA  /* A01 = +1.697651208644282514598e-03 */
> +        .quad 0xBF30B2E1254DE571  /* A02 = -2.548026725928887099328e-04 */
> +        .quad 0x3EEAE28B70EC0256  /* A03 = +1.281973848454955042307e-05 */
> +        .quad 0x3FEFEAF5303D7F96  /* A00 = +9.974313680831865536192e-01 */
> +        .quad 0x3F5229111365657E  /* A01 = +1.108423877289460134782e-03 */
> +        .quad 0xBF250572D04DFE66  /* A02 = -1.603796628408704519168e-04 */
> +        .quad 0x3EE04E89BB57C981  /* A03 = +7.775682983689149966743e-06 */
> +        .quad 0x3FEFF1CF52F1CF44  /* A00 = +9.982678051005469122003e-01 */
> +        .quad 0x3F47A71316147CEB  /* A01 = +7.218211359577819110842e-04 */
> +        .quad 0xBF1A6D7604055719  /* A02 = -1.008132248946049582547e-04 */
> +        .quad 0x3ED3C8047586A85C  /* A03 = +4.716233739913014633626e-06 */
> +        .quad 0x3FEFF6770369EF69  /* A00 = +9.988360468555416149528e-01 */
> +        .quad 0x3F3EBB261180FBF0  /* A01 = +4.689186039321105101130e-04 */
> +        .quad 0xBF1097754FE19D7F  /* A02 = -6.329206004950480057066e-05 */
> +        .quad 0x3EC7FEFF83BCA0A7  /* A03 = +2.860556404988488738366e-06 */
> +        .quad 0x3FEFF99D42371AC4  /* A00 = +9.992204945818561334647e-01 */
> +        .quad 0x3F33EB2AEC271F59  /* A01 = +3.039340773764907474054e-04 */
> +        .quad 0xBF04CF18E0FC0D79  /* A02 = -3.968996690952969588805e-05 */
> +        .quad 0x3EBD1BDBD6019BE9  /* A03 = +1.735021065507727833886e-06 */
> +        .quad 0x3FEFFBBCA32B0D91  /* A00 = +9.994795977476532700123e-01 */
> +        .quad 0x3F29C41E1615110A  /* A01 = +1.965796209707565346710e-04 */
> +        .quad 0xBEFA11F93D9DCB5A  /* A02 = -2.486248909101414873235e-05 */
> +        .quad 0x3EB1A7CA4546F7A7  /* A03 = +1.052345642723709228769e-06 */
> +        .quad 0x3FEFFD298B8E8DE2  /* A00 = +9.996535993308806045121e-01 */
> +        .quad 0x3F20A1C42D523C5B  /* A01 = +1.268913244172078754520e-04 */
> +        .quad 0xBEF0507A364AFAE4  /* A02 = -1.555859070622834605755e-05 */
> +        .quad 0x3EA56ACA17E7CDF4  /* A03 = +6.382806956848098872313e-07 */
> +        .quad 0x3FEFFE1DC82BA5A3  /* A00 = +9.997700604991915929176e-01 */
> +        .quad 0x3F156E73B90F1769  /* A01 = +8.175450626798714452801e-05 */
> +        .quad 0xBEE4663579D0A09F  /* A02 = -9.727122057226747625365e-06 */
> +        .quad 0x3E99FAF6FEC5D4C1  /* A03 = +3.871371052824002996020e-07 */
> +        .quad 0x3FEFFEF8D0BB5E81  /* A00 = +9.998745037837154514548e-01 */
> +        .quad 0x3F06686DA18D39C3  /* A01 = +4.273972098777251447726e-05 */
> +        .quad 0xBED46BC298073E90  /* A02 = -4.868731025855742842491e-06 */
> +        .quad 0x3E88E42286B9D0FD  /* A03 = +1.854535328530838170114e-07 */
> +        .quad 0x3FEFFF8DBC68DDC7  /* A00 = +9.999455146670975791423e-01 */
> +        .quad 0x3EF26B2953A80AF0  /* A01 = +1.756534514108903368909e-05 */
> +        .quad 0xBEBFC4472D580F83  /* A02 = -1.893443529411295465239e-06 */
> +        .quad 0x3E72505B4553D19F  /* A03 = +6.822456673547912277047e-08 */
> +        .quad 0x3FEFFFCED1276609  /* A00 = +9.999765477215883935358e-01 */
> +        .quad 0x3EDE1A94C7CC58F5  /* A01 = +7.177313020153979672606e-06 */
> +        .quad 0xBEA8A2C988744E57  /* A02 = -7.342066660497443762363e-07 */
> +        .quad 0x3E5AF30036BBBAF4  /* A03 = +2.509841882843541084885e-08 */
> +        .quad 0x3FEFFFEAFE70FCFC  /* A00 = +9.999899835164849370983e-01 */
> +        .quad 0x3EC879175E3549F5  /* A01 = +2.917410471128503564412e-06 */
> +        .quad 0xBE930E36677D1813  /* A02 = -2.839493400307523115929e-07 */
> +        .quad 0x3E43D4005B42D48F  /* A03 = +9.233192745401904898013e-09 */
> +        .quad 0x3ff0000000000000
> +        .quad 0x0000000000000000
> +        .quad 0x0000000000000000
> +        .quad 0x0000000000000000
> +        .align 32
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000           /* _sSignMask        */
> +        .align 32
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff           /* _sAbsMask         */
> +        .align 32
> +        .long 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000           /* _iExpMantMask     */
> +        .align 32
> +        .long 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000           /* _iExpMask         */
> +        .align 32
> +        .long 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000           /* _iMinIdxOfsMask   */
> +        .align 32
> +        .long 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000           /* _iMaxIdxMask      */
> +        .align 32
> +        .type	__svml_stanh_data_internal,@object
> +        .size	__svml_stanh_data_internal,.-__svml_stanh_data_internal
> diff --git a/sysdeps/x86_64/fpu/svml_d_tanh2_core.S b/sysdeps/x86_64/fpu/svml_d_tanh2_core.S
> new file mode 100644
> index 0000000000..c703131777
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_tanh2_core.S
> @@ -0,0 +1,29 @@
> +/* Function tanh vectorized with SSE2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN2v_tanh)
> +WRAPPER_IMPL_SSE2 tanh
> +END (_ZGVbN2v_tanh)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN2v_tanh)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_tanh4_core.S b/sysdeps/x86_64/fpu/svml_d_tanh4_core.S
> new file mode 100644
> index 0000000000..fb293f4dba
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_tanh4_core.S
> @@ -0,0 +1,29 @@
> +/* Function tanh vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN4v_tanh)
> +WRAPPER_IMPL_AVX _ZGVbN2v_tanh
> +END (_ZGVdN4v_tanh)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN4v_tanh)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_tanh4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_tanh4_core_avx.S
> new file mode 100644
> index 0000000000..5385a2c27c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_tanh4_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function tanh vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVcN4v_tanh)
> +WRAPPER_IMPL_AVX _ZGVbN2v_tanh
> +END (_ZGVcN4v_tanh)
> diff --git a/sysdeps/x86_64/fpu/svml_d_tanh8_core.S b/sysdeps/x86_64/fpu/svml_d_tanh8_core.S
> new file mode 100644
> index 0000000000..9dafa7bb9a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_tanh8_core.S
> @@ -0,0 +1,25 @@
> +/* Function tanh vectorized with AVX-512, wrapper to AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN8v_tanh)
> +WRAPPER_IMPL_AVX512 _ZGVdN4v_tanh
> +END (_ZGVeN8v_tanh)
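
The AVX-512 entry point above is only a thin wrapper: it carries no polynomial code of its own and conceptually just hands the 8-element input to the narrower 4-element kernel in two halves. A rough C analogue of that pattern, using illustrative struct types and names rather than the actual WRAPPER_IMPL_AVX512 macro, might look like this:

#include <math.h>

typedef struct { double v[4]; } vec4d;   /* illustrative stand-in for a YMM vector */
typedef struct { double v[8]; } vec8d;   /* illustrative stand-in for a ZMM vector */

/* Stand-in for the 4-element kernel (_ZGVdN4v_tanh).  */
static vec4d
tanh4 (vec4d x)
{
  vec4d r;
  for (int i = 0; i < 4; i++)
    r.v[i] = tanh (x.v[i]);
  return r;
}

/* Rough analogue of the 8-element wrapper: run the narrower kernel on
   the low and high halves and reassemble the result.  */
static vec8d
tanh8 (vec8d x)
{
  vec4d lo, hi;
  vec8d r;
  for (int i = 0; i < 4; i++)
    {
      lo.v[i] = x.v[i];
      hi.v[i] = x.v[i + 4];
    }
  lo = tanh4 (lo);
  hi = tanh4 (hi);
  for (int i = 0; i < 4; i++)
    {
      r.v[i] = lo.v[i];
      r.v[i + 4] = hi.v[i];
    }
  return r;
}

This keeps the full set of vector-ABI entry points available even where only the narrower kernels are built, while the multiarch IFUNC resolvers pick a native AVX-512 implementation when the hardware supports it.
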
> diff --git a/sysdeps/x86_64/fpu/svml_s_tanhf16_core.S b/sysdeps/x86_64/fpu/svml_s_tanhf16_core.S
> new file mode 100644
> index 0000000000..19d51365e8
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_tanhf16_core.S
> @@ -0,0 +1,25 @@
> +/* Function tanhf vectorized with AVX-512. Wrapper to AVX2 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN16v_tanhf)
> +WRAPPER_IMPL_AVX512 _ZGVdN8v_tanhf
> +END (_ZGVeN16v_tanhf)
> diff --git a/sysdeps/x86_64/fpu/svml_s_tanhf4_core.S b/sysdeps/x86_64/fpu/svml_s_tanhf4_core.S
> new file mode 100644
> index 0000000000..6b98950f84
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_tanhf4_core.S
> @@ -0,0 +1,29 @@
> +/* Function tanhf vectorized with SSE2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN4v_tanhf)
> +WRAPPER_IMPL_SSE2 tanhf
> +END (_ZGVbN4v_tanhf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN4v_tanhf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_tanhf8_core.S b/sysdeps/x86_64/fpu/svml_s_tanhf8_core.S
> new file mode 100644
> index 0000000000..3ada061ae0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_tanhf8_core.S
> @@ -0,0 +1,29 @@
> +/* Function tanhf vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN8v_tanhf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_tanhf
> +END (_ZGVdN8v_tanhf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN8v_tanhf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_tanhf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_tanhf8_core_avx.S
> new file mode 100644
> index 0000000000..255d45952d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_tanhf8_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function tanhf vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVcN8v_tanhf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_tanhf
> +END (_ZGVcN8v_tanhf)
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx.c
> new file mode 100644
> index 0000000000..a456c574e2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-tanh.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx2.c
> new file mode 100644
> index 0000000000..a456c574e2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx2.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-tanh.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx512f.c
> new file mode 100644
> index 0000000000..a456c574e2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-tanh.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-tanh.c b/sysdeps/x86_64/fpu/test-double-libmvec-tanh.c
> new file mode 100644
> index 0000000000..4cb6a169d8
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-tanh.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE double
> +#define LIBMVEC_FUNC tanh
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> index 9d91ccfe51..f53bb6813e 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> @@ -44,6 +44,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVbN2v_log1p)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVbN2v_atanh)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVbN2v_acosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVbN2v_erf)
> +VECTOR_WRAPPER (WRAPPER_NAME (tanh), _ZGVbN2v_tanh)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> index 9e86d5fef8..0452c3db38 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> @@ -47,6 +47,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVdN4v_log1p)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVdN4v_atanh)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVdN4v_acosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVdN4v_erf)
> +VECTOR_WRAPPER (WRAPPER_NAME (tanh), _ZGVdN4v_tanh)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m256i
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> index 0f4ef00de4..197d5afc88 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> @@ -44,6 +44,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVcN4v_log1p)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVcN4v_atanh)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVcN4v_acosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVcN4v_erf)
> +VECTOR_WRAPPER (WRAPPER_NAME (tanh), _ZGVcN4v_tanh)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> index 975dff85af..e56ece640c 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> @@ -44,6 +44,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVeN8v_log1p)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVeN8v_atanh)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVeN8v_acosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVeN8v_erf)
> +VECTOR_WRAPPER (WRAPPER_NAME (tanh), _ZGVeN8v_tanh)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m512i
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx.c
> new file mode 100644
> index 0000000000..254f9201aa
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-tanhf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx2.c
> new file mode 100644
> index 0000000000..254f9201aa
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx2.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-tanhf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx512f.c
> new file mode 100644
> index 0000000000..254f9201aa
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-tanhf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-tanhf.c b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf.c
> new file mode 100644
> index 0000000000..9a61ee8f9c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE float
> +#define LIBMVEC_FUNC tanhf
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> index 2b1e27391a..abbebf9993 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> @@ -44,6 +44,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVeN16v_log1pf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVeN16v_atanhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVeN16v_acoshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVeN16v_erff)
> +VECTOR_WRAPPER (WRAPPER_NAME (tanhf), _ZGVeN16v_tanhf)
>  
>  #define VEC_INT_TYPE __m512i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> index 78428bf517..ae1c8b98c2 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> @@ -44,6 +44,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVbN4v_log1pf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVbN4v_atanhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVbN4v_acoshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVbN4v_erff)
> +VECTOR_WRAPPER (WRAPPER_NAME (tanhf), _ZGVbN4v_tanhf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> index dadd4e6ca0..eb477a0371 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> @@ -47,6 +47,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVdN8v_log1pf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVdN8v_atanhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVdN8v_acoshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVdN8v_erff)
> +VECTOR_WRAPPER (WRAPPER_NAME (tanhf), _ZGVdN8v_tanhf)
>  
>  /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
>  #undef VECTOR_WRAPPER_fFF
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> index 7b2d583e54..944f7f0a75 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> @@ -44,6 +44,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVcN8v_log1pf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVcN8v_atanhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVcN8v_acoshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVcN8v_erff)
> +VECTOR_WRAPPER (WRAPPER_NAME (tanhf), _ZGVcN8v_tanhf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> -- 
> 2.31.1
> 

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.


H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 13/18] x86-64: Add vector log1p/log1pf implementation to libmvec
  2021-12-29  6:39 ` [PATCH v5 13/18] x86-64: Add vector log1p/log1pf " Sunil K Pandey
@ 2021-12-29 21:26   ` H.J. Lu
  2021-12-29 23:28     ` Noah Goldstein
  0 siblings, 1 reply; 40+ messages in thread
From: H.J. Lu @ 2021-12-29 21:26 UTC (permalink / raw)
  To: Sunil K Pandey; +Cc: libc-alpha, hjl.tools, andrey.kolesov, marius.cornea

On Tue, Dec 28, 2021 at 10:39:55PM -0800, Sunil K Pandey wrote:
> Implement vectorized log1p/log1pf with SSE, AVX, AVX2 and AVX512
> versions for libmvec, as per the vector ABI.  The patch also adds
> accuracy and ABI tests for vector log1p/log1pf, with regenerated ulps.
> ---
>  bits/libm-simd-decl-stubs.h                   |   11 +
>  math/bits/mathcalls.h                         |    2 +-
>  .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
>  sysdeps/x86/fpu/bits/math-vector.h            |    4 +
>  .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
>  sysdeps/x86_64/fpu/Makeconfig                 |    1 +
>  sysdeps/x86_64/fpu/Versions                   |    2 +
>  sysdeps/x86_64/fpu/libm-test-ulps             |   20 +
>  .../fpu/multiarch/svml_d_log1p2_core-sse2.S   |   20 +
>  .../x86_64/fpu/multiarch/svml_d_log1p2_core.c |   27 +
>  .../fpu/multiarch/svml_d_log1p2_core_sse4.S   | 1398 +++++++++++++++++
>  .../fpu/multiarch/svml_d_log1p4_core-sse.S    |   20 +
>  .../x86_64/fpu/multiarch/svml_d_log1p4_core.c |   27 +
>  .../fpu/multiarch/svml_d_log1p4_core_avx2.S   | 1383 ++++++++++++++++
>  .../fpu/multiarch/svml_d_log1p8_core-avx2.S   |   20 +
>  .../x86_64/fpu/multiarch/svml_d_log1p8_core.c |   27 +
>  .../fpu/multiarch/svml_d_log1p8_core_avx512.S |  317 ++++
>  .../fpu/multiarch/svml_s_log1pf16_core-avx2.S |   20 +
>  .../fpu/multiarch/svml_s_log1pf16_core.c      |   28 +
>  .../multiarch/svml_s_log1pf16_core_avx512.S   |  271 ++++
>  .../fpu/multiarch/svml_s_log1pf4_core-sse2.S  |   20 +
>  .../fpu/multiarch/svml_s_log1pf4_core.c       |   28 +
>  .../fpu/multiarch/svml_s_log1pf4_core_sse4.S  |  252 +++
>  .../fpu/multiarch/svml_s_log1pf8_core-sse.S   |   20 +
>  .../fpu/multiarch/svml_s_log1pf8_core.c       |   28 +
>  .../fpu/multiarch/svml_s_log1pf8_core_avx2.S  |  254 +++
>  sysdeps/x86_64/fpu/svml_d_log1p2_core.S       |   29 +
>  sysdeps/x86_64/fpu/svml_d_log1p4_core.S       |   29 +
>  sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S   |   25 +
>  sysdeps/x86_64/fpu/svml_d_log1p8_core.S       |   25 +
>  sysdeps/x86_64/fpu/svml_s_log1pf16_core.S     |   25 +
>  sysdeps/x86_64/fpu/svml_s_log1pf4_core.S      |   29 +
>  sysdeps/x86_64/fpu/svml_s_log1pf8_core.S      |   29 +
>  sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S  |   25 +
>  .../fpu/test-double-libmvec-log1p-avx.c       |    1 +
>  .../fpu/test-double-libmvec-log1p-avx2.c      |    1 +
>  .../fpu/test-double-libmvec-log1p-avx512f.c   |    1 +
>  .../x86_64/fpu/test-double-libmvec-log1p.c    |    3 +
>  .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
>  .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
>  .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
>  .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
>  .../fpu/test-float-libmvec-log1pf-avx.c       |    1 +
>  .../fpu/test-float-libmvec-log1pf-avx2.c      |    1 +
>  .../fpu/test-float-libmvec-log1pf-avx512f.c   |    1 +
>  .../x86_64/fpu/test-float-libmvec-log1pf.c    |    3 +
>  .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
>  .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
>  .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
>  .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
>  50 files changed, 4447 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_log1p2_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_log1p4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_log1p8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_log1pf16_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_log1pf4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_log1pf8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log1p.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log1pf.c
> 
> diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
> index 73252615ca..845246fab9 100644
> --- a/bits/libm-simd-decl-stubs.h
> +++ b/bits/libm-simd-decl-stubs.h
> @@ -241,4 +241,15 @@
>  #define __DECL_SIMD_log2f32x
>  #define __DECL_SIMD_log2f64x
>  #define __DECL_SIMD_log2f128x
> +
> +#define __DECL_SIMD_log1p
> +#define __DECL_SIMD_log1pf
> +#define __DECL_SIMD_log1pl
> +#define __DECL_SIMD_log1pf16
> +#define __DECL_SIMD_log1pf32
> +#define __DECL_SIMD_log1pf64
> +#define __DECL_SIMD_log1pf128
> +#define __DECL_SIMD_log1pf32x
> +#define __DECL_SIMD_log1pf64x
> +#define __DECL_SIMD_log1pf128x
>  #endif
> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index bfe52a4666..aa4bc61aa4 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -119,7 +119,7 @@ __MATHCALL_VEC (exp10,, (_Mdouble_ __x));
>  __MATHCALL_VEC (expm1,, (_Mdouble_ __x));
>  
>  /* Return log(1 + X).  */
> -__MATHCALL (log1p,, (_Mdouble_ __x));
> +__MATHCALL_VEC (log1p,, (_Mdouble_ __x));
>  
>  /* Return the base 2 signed integral exponent of X.  */
>  __MATHCALL (logb,, (_Mdouble_ __x));
> diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> index fa8b016c5d..68b940606a 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> @@ -55,6 +55,7 @@ GLIBC_2.35 _ZGVbN2v_exp10 F
>  GLIBC_2.35 _ZGVbN2v_exp2 F
>  GLIBC_2.35 _ZGVbN2v_expm1 F
>  GLIBC_2.35 _ZGVbN2v_log10 F
> +GLIBC_2.35 _ZGVbN2v_log1p F
>  GLIBC_2.35 _ZGVbN2v_log2 F
>  GLIBC_2.35 _ZGVbN2v_sinh F
>  GLIBC_2.35 _ZGVbN2vv_atan2 F
> @@ -68,6 +69,7 @@ GLIBC_2.35 _ZGVbN4v_exp10f F
>  GLIBC_2.35 _ZGVbN4v_exp2f F
>  GLIBC_2.35 _ZGVbN4v_expm1f F
>  GLIBC_2.35 _ZGVbN4v_log10f F
> +GLIBC_2.35 _ZGVbN4v_log1pf F
>  GLIBC_2.35 _ZGVbN4v_log2f F
>  GLIBC_2.35 _ZGVbN4v_sinhf F
>  GLIBC_2.35 _ZGVbN4vv_atan2f F
> @@ -81,6 +83,7 @@ GLIBC_2.35 _ZGVcN4v_exp10 F
>  GLIBC_2.35 _ZGVcN4v_exp2 F
>  GLIBC_2.35 _ZGVcN4v_expm1 F
>  GLIBC_2.35 _ZGVcN4v_log10 F
> +GLIBC_2.35 _ZGVcN4v_log1p F
>  GLIBC_2.35 _ZGVcN4v_log2 F
>  GLIBC_2.35 _ZGVcN4v_sinh F
>  GLIBC_2.35 _ZGVcN4vv_atan2 F
> @@ -94,6 +97,7 @@ GLIBC_2.35 _ZGVcN8v_exp10f F
>  GLIBC_2.35 _ZGVcN8v_exp2f F
>  GLIBC_2.35 _ZGVcN8v_expm1f F
>  GLIBC_2.35 _ZGVcN8v_log10f F
> +GLIBC_2.35 _ZGVcN8v_log1pf F
>  GLIBC_2.35 _ZGVcN8v_log2f F
>  GLIBC_2.35 _ZGVcN8v_sinhf F
>  GLIBC_2.35 _ZGVcN8vv_atan2f F
> @@ -107,6 +111,7 @@ GLIBC_2.35 _ZGVdN4v_exp10 F
>  GLIBC_2.35 _ZGVdN4v_exp2 F
>  GLIBC_2.35 _ZGVdN4v_expm1 F
>  GLIBC_2.35 _ZGVdN4v_log10 F
> +GLIBC_2.35 _ZGVdN4v_log1p F
>  GLIBC_2.35 _ZGVdN4v_log2 F
>  GLIBC_2.35 _ZGVdN4v_sinh F
>  GLIBC_2.35 _ZGVdN4vv_atan2 F
> @@ -120,6 +125,7 @@ GLIBC_2.35 _ZGVdN8v_exp10f F
>  GLIBC_2.35 _ZGVdN8v_exp2f F
>  GLIBC_2.35 _ZGVdN8v_expm1f F
>  GLIBC_2.35 _ZGVdN8v_log10f F
> +GLIBC_2.35 _ZGVdN8v_log1pf F
>  GLIBC_2.35 _ZGVdN8v_log2f F
>  GLIBC_2.35 _ZGVdN8v_sinhf F
>  GLIBC_2.35 _ZGVdN8vv_atan2f F
> @@ -133,6 +139,7 @@ GLIBC_2.35 _ZGVeN16v_exp10f F
>  GLIBC_2.35 _ZGVeN16v_exp2f F
>  GLIBC_2.35 _ZGVeN16v_expm1f F
>  GLIBC_2.35 _ZGVeN16v_log10f F
> +GLIBC_2.35 _ZGVeN16v_log1pf F
>  GLIBC_2.35 _ZGVeN16v_log2f F
>  GLIBC_2.35 _ZGVeN16v_sinhf F
>  GLIBC_2.35 _ZGVeN16vv_atan2f F
> @@ -146,6 +153,7 @@ GLIBC_2.35 _ZGVeN8v_exp10 F
>  GLIBC_2.35 _ZGVeN8v_exp2 F
>  GLIBC_2.35 _ZGVeN8v_expm1 F
>  GLIBC_2.35 _ZGVeN8v_log10 F
> +GLIBC_2.35 _ZGVeN8v_log1p F
>  GLIBC_2.35 _ZGVeN8v_log2 F
>  GLIBC_2.35 _ZGVeN8v_sinh F
>  GLIBC_2.35 _ZGVeN8vv_atan2 F
> diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
> index 59d284a10a..14c9db3bb3 100644
> --- a/sysdeps/x86/fpu/bits/math-vector.h
> +++ b/sysdeps/x86/fpu/bits/math-vector.h
> @@ -110,6 +110,10 @@
>  #  define __DECL_SIMD_log2 __DECL_SIMD_x86_64
>  #  undef __DECL_SIMD_log2f
>  #  define __DECL_SIMD_log2f __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_log1p
> +#  define __DECL_SIMD_log1p __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_log1pf
> +#  define __DECL_SIMD_log1pf __DECL_SIMD_x86_64
>  
>  # endif
>  #endif
> diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> index a2ca9a203f..3dca196432 100644
> --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> @@ -54,6 +54,8 @@
>  !GCC$ builtin (log10f) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (log2) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (log2f) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (log1p) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (log1pf) attributes simd (notinbranch) if('x86_64')
>  
>  !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
> @@ -93,3 +95,5 @@
>  !GCC$ builtin (log10f) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (log2) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (log2f) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (log1p) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (log1pf) attributes simd (notinbranch) if('x32')
> diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> index 8d6d0915af..378cb06d37 100644
> --- a/sysdeps/x86_64/fpu/Makeconfig
> +++ b/sysdeps/x86_64/fpu/Makeconfig
> @@ -36,6 +36,7 @@ libmvec-funcs = \
>    hypot \
>    log \
>    log10 \
> +  log1p \
>    log2 \
>    pow \
>    sin \
> diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> index 1b48c2d642..155fb115f3 100644
> --- a/sysdeps/x86_64/fpu/Versions
> +++ b/sysdeps/x86_64/fpu/Versions
> @@ -23,6 +23,7 @@ libmvec {
>      _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
>      _ZGVbN2v_expm1; _ZGVcN4v_expm1; _ZGVdN4v_expm1; _ZGVeN8v_expm1;
>      _ZGVbN2v_log10; _ZGVcN4v_log10; _ZGVdN4v_log10; _ZGVeN8v_log10;
> +    _ZGVbN2v_log1p; _ZGVcN4v_log1p; _ZGVdN4v_log1p; _ZGVeN8v_log1p;
>      _ZGVbN2v_log2; _ZGVcN4v_log2; _ZGVdN4v_log2; _ZGVeN8v_log2;
>      _ZGVbN2v_sinh; _ZGVcN4v_sinh; _ZGVdN4v_sinh; _ZGVeN8v_sinh;
>      _ZGVbN2vv_atan2; _ZGVcN4vv_atan2; _ZGVdN4vv_atan2; _ZGVeN8vv_atan2;
> @@ -36,6 +37,7 @@ libmvec {
>      _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
>      _ZGVbN4v_expm1f; _ZGVcN8v_expm1f; _ZGVdN8v_expm1f; _ZGVeN16v_expm1f;
>      _ZGVbN4v_log10f; _ZGVcN8v_log10f; _ZGVdN8v_log10f; _ZGVeN16v_log10f;
> +    _ZGVbN4v_log1pf; _ZGVcN8v_log1pf; _ZGVdN8v_log1pf; _ZGVeN16v_log1pf;
>      _ZGVbN4v_log2f; _ZGVcN8v_log2f; _ZGVdN8v_log2f; _ZGVeN16v_log2f;
>      _ZGVbN4v_sinhf; _ZGVcN8v_sinhf; _ZGVdN8v_sinhf; _ZGVeN16v_sinhf;
>      _ZGVbN4vv_atan2f; _ZGVcN8vv_atan2f; _ZGVdN8vv_atan2f; _ZGVeN16vv_atan2f;
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index 3b7f3cee6f..a2b15a795b 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -1685,6 +1685,26 @@ float: 2
>  float128: 2
>  ldouble: 3
>  
> +Function: "log1p_vlen16":
> +float: 2
> +
> +Function: "log1p_vlen2":
> +double: 1
> +
> +Function: "log1p_vlen4":
> +double: 1
> +float: 2
> +
> +Function: "log1p_vlen4_avx2":
> +double: 1
> +
> +Function: "log1p_vlen8":
> +double: 1
> +float: 2
> +
> +Function: "log1p_vlen8_avx2":
> +float: 2
> +
>  Function: "log2":
>  double: 2
>  float: 1
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core-sse2.S
> new file mode 100644
> index 0000000000..8004088346
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized log1p, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN2v_log1p _ZGVbN2v_log1p_sse2
> +#include "../svml_d_log1p2_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core.c
> new file mode 100644
> index 0000000000..35ca620aba
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized log1p, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN2v_log1p
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN2v_log1p, __GI__ZGVbN2v_log1p, __redirect__ZGVbN2v_log1p)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core_sse4.S
> new file mode 100644
> index 0000000000..9d3f0647b4
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core_sse4.S
> @@ -0,0 +1,1398 @@
> +/* Function log1p vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
> + *    Get short reciprocal approximation Rcp ~ 1/xh
> + *    R = (Rcp*xh - 1.0) + Rcp*xl
> + *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
> + *       log(Rcp) is tabulated
> + *
> + *
> + */
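
(For reference, the steps above map roughly onto the scalar C sketch below.
It is illustrative only and not part of the patch: a plain 1/xh rounded to
about 9 bits, a log() call and explicit Taylor terms stand in for the
rcpps-based reciprocal, the Log_LA_table lookup and the poly_coeff
polynomial that the kernel actually uses, and the special-value path is
ignored.)

    #include <math.h>

    /* Rough scalar model of the vector algorithm above (assumes x > -1
       and finite; the kernel routes other inputs to the special-value
       branch further down).  */
    static double
    log1p_sketch (double x)
    {
      double a  = fmax (1.0, x), b = fmin (1.0, x);
      double hi = a + b;                 /* high part of 1+x */
      double lo = (a - hi) + b;          /* Fast2Sum rounding error */
      int k;
      double xh = 2.0 * frexp (hi, &k);  /* xh in [1,2), hi = 2^(k-1)*xh */
      k -= 1;
      /* Short reciprocal with ~1+9 significant bits (the kernel uses
         rcpps plus a round-to-nearest constant).  */
      double rcp = ldexp (nearbyint (ldexp (1.0 / xh, 9)), -9);
      double r   = (rcp * xh - 1.0) + rcp * ldexp (lo, -k);
      /* k*log(2) - log(rcp) + poly(r), where poly(r) ~ log1p(r).  */
      return k * M_LN2 - log (rcp)
             + (r - r*r/2 + r*r*r/3 - r*r*r*r/4 + r*r*r*r*r/5);
    }
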
> +
> +/* Offsets for data table __svml_dlog1p_data_internal
> + */
> +#define Log_HA_table                  	0
> +#define Log_LA_table                  	8208
> +#define poly_coeff                    	12320
> +#define ExpMask                       	12384
> +#define Two10                         	12400
> +#define MinLog1p                      	12416
> +#define MaxLog1p                      	12432
> +#define One                           	12448
> +#define SgnMask                       	12464
> +#define XThreshold                    	12480
> +#define XhMask                        	12496
> +#define Threshold                     	12512
> +#define Bias                          	12528
> +#define Bias1                         	12544
> +#define ExpMask0                      	12560
> +#define ExpMask2                      	12576
> +#define L2                            	12592
> +
> +/* Lookup bias for data table __svml_dlog1p_data_internal.  */
> +#define Table_Lookup_Bias               -0x405ff0
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN2v_log1p_sse4)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $64, %rsp
> +        movaps    %xmm0, %xmm7
> +
> +/* SgnMask used by all accuracies */
> +        movups    SgnMask+__svml_dlog1p_data_internal(%rip), %xmm6
> +        lea       Table_Lookup_Bias+__svml_dlog1p_data_internal(%rip), %rsi
> +        movaps    %xmm6, %xmm8
> +        movaps    %xmm7, %xmm15
> +        movups    One+__svml_dlog1p_data_internal(%rip), %xmm0
> +        andps     %xmm7, %xmm8
> +        cmpltpd   XThreshold+__svml_dlog1p_data_internal(%rip), %xmm8
> +        cmpnlepd  MaxLog1p+__svml_dlog1p_data_internal(%rip), %xmm15
> +        movaps    %xmm0, %xmm4
> +
> +/* compute 1+x as high, low parts */
> +        movaps    %xmm0, %xmm9
> +        addpd     %xmm7, %xmm4
> +        maxpd     %xmm7, %xmm9
> +        orps      XhMask+__svml_dlog1p_data_internal(%rip), %xmm8
> +        movaps    %xmm0, %xmm5
> +
> +/* preserve mantissa, set input exponent to 2^(-10) */
> +        movups    ExpMask+__svml_dlog1p_data_internal(%rip), %xmm3
> +        andps     %xmm8, %xmm4
> +        andps     %xmm4, %xmm3
> +
> +/* check range */
> +        movaps    %xmm7, %xmm8
> +        orps      Two10+__svml_dlog1p_data_internal(%rip), %xmm3
> +
> +/* Compute SignMask for all accuracies, including EP */
> +        andnps    %xmm7, %xmm6
> +
> +/* reciprocal approximation good to at least 11 bits */
> +        cvtpd2ps  %xmm3, %xmm10
> +        minpd     %xmm7, %xmm5
> +        subpd     %xmm4, %xmm9
> +        cmpltpd   MinLog1p+__svml_dlog1p_data_internal(%rip), %xmm8
> +        addpd     %xmm9, %xmm5
> +        movlhps   %xmm10, %xmm10
> +        orps      %xmm15, %xmm8
> +        rcpps     %xmm10, %xmm11
> +
> +/* combine and get argument value range mask */
> +        movmskpd  %xmm8, %edx
> +
> +/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
> +        movups    .FLT_16(%rip), %xmm13
> +
> +/* exponent of X needed to scale Xl */
> +        movdqu    ExpMask0+__svml_dlog1p_data_internal(%rip), %xmm12
> +        cvtps2pd  %xmm11, %xmm1
> +        addpd     %xmm13, %xmm1
> +        subpd     %xmm13, %xmm1
> +
> +/* 2^(-10-exp(X)) */
> +        movdqu    ExpMask2+__svml_dlog1p_data_internal(%rip), %xmm2
> +        pand      %xmm4, %xmm12
> +        psubq     %xmm12, %xmm2
> +        mulpd     %xmm1, %xmm3
> +
> +/* scale DblRcp */
> +        mulpd     %xmm1, %xmm2
> +        subpd     %xmm0, %xmm3
> +
> +/*
> + * argument reduction
> + * VQFMS( D, R, X, DblRcp1, One );
> + */
> +        mulpd     %xmm2, %xmm5
> +        addpd     %xmm5, %xmm3
> +
> +/* exponent*log(2.0) */
> +        movups    Threshold+__svml_dlog1p_data_internal(%rip), %xmm10
> +
> +/* exponent bits */
> +        psrlq     $20, %xmm4
> +        pshufd    $221, %xmm4, %xmm14
> +
> +/*
> + * prepare table index
> + * table lookup
> + */
> +        movaps    %xmm1, %xmm4
> +        cmpltpd   %xmm1, %xmm10
> +
> +/* biased exponent in DP format */
> +        cvtdq2pd  %xmm14, %xmm0
> +
> +/* polynomial */
> +        movups    poly_coeff+__svml_dlog1p_data_internal(%rip), %xmm1
> +        movaps    %xmm3, %xmm5
> +        mulpd     %xmm3, %xmm1
> +        mulpd     %xmm3, %xmm5
> +        addpd     poly_coeff+16+__svml_dlog1p_data_internal(%rip), %xmm1
> +        movups    poly_coeff+32+__svml_dlog1p_data_internal(%rip), %xmm2
> +        psrlq     $40, %xmm4
> +        mulpd     %xmm3, %xmm2
> +        mulpd     %xmm5, %xmm1
> +        addpd     poly_coeff+48+__svml_dlog1p_data_internal(%rip), %xmm2
> +        movd      %xmm4, %eax
> +        andps     Bias+__svml_dlog1p_data_internal(%rip), %xmm10
> +        addpd     %xmm1, %xmm2
> +
> +/* reconstruction */
> +        mulpd     %xmm2, %xmm5
> +        orps      Bias1+__svml_dlog1p_data_internal(%rip), %xmm10
> +        pshufd    $2, %xmm4, %xmm9
> +        subpd     %xmm10, %xmm0
> +        addpd     %xmm5, %xmm3
> +        movd      %xmm9, %ecx
> +        mulpd     L2+__svml_dlog1p_data_internal(%rip), %xmm0
> +        movslq    %eax, %rax
> +        movslq    %ecx, %rcx
> +        movsd     (%rsi,%rax), %xmm11
> +        movhpd    (%rsi,%rcx), %xmm11
> +        addpd     %xmm3, %xmm11
> +        addpd     %xmm11, %xmm0
> +
> +/* OR in the Sign of input argument to produce correct log1p(-0) */
> +        orps      %xmm6, %xmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx xmm0 xmm7
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm7, 32(%rsp)
> +        movups    %xmm0, 48(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $2, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 xmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      log1p@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 48(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVbN2v_log1p_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_dlog1p_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(16)) VUINT32 Log_HA_table[(1<<10)+2][2];
> +        __declspec(align(16)) VUINT32 Log_LA_table[(1<<9)+1][2];
> +        __declspec(align(16)) VUINT32 poly_coeff[4][2][2];
> +        __declspec(align(16)) VUINT32 ExpMask[2][2];
> +        __declspec(align(16)) VUINT32 Two10[2][2];
> +        __declspec(align(16)) VUINT32 MinLog1p[2][2];
> +        __declspec(align(16)) VUINT32 MaxLog1p[2][2];
> +        __declspec(align(16)) VUINT32 One[2][2];
> +        __declspec(align(16)) VUINT32 SgnMask[2][2];
> +        __declspec(align(16)) VUINT32 XThreshold[2][2];
> +        __declspec(align(16)) VUINT32 XhMask[2][2];
> +        __declspec(align(16)) VUINT32 Threshold[2][2];
> +        __declspec(align(16)) VUINT32 Bias[2][2];
> +        __declspec(align(16)) VUINT32 Bias1[2][2];
> +        __declspec(align(16)) VUINT32 ExpMask0[2][2];
> +        __declspec(align(16)) VUINT32 ExpMask2[2][2];
> +        __declspec(align(16)) VUINT32 L2[2][2];
> +} __svml_dlog1p_data_internal;
> +#endif
> +__svml_dlog1p_data_internal:
> +        /* Log_HA_table */
> +        .quad 0xc086232bdd7a8300, 0xbe1ce91eef3fb100
> +        .quad 0xc086232fdc7ad828, 0xbe1cefcffda73b6a
> +        .quad 0xc0862333d97d2ba0, 0xbe1cef406748f1ff
> +        .quad 0xc0862337d48378e0, 0xbe1cef2a9429925a
> +        .quad 0xc086233bcd8fb878, 0xbe1cf138d17ebecb
> +        .quad 0xc086233fc4a3e018, 0xbe1ceff2dbbbb29e
> +        .quad 0xc0862343b9c1e270, 0xbe1cf1a42aae437b
> +        .quad 0xc0862347acebaf68, 0xbe1cef3b152048af
> +        .quad 0xc086234b9e2333f0, 0xbe1cef20e127805e
> +        .quad 0xc086234f8d6a5a30, 0xbe1cf00ad6052cf4
> +        .quad 0xc08623537ac30980, 0xbe1cefc4642ee597
> +        .quad 0xc0862357662f2660, 0xbe1cf1f277d36e16
> +        .quad 0xc086235b4fb092a0, 0xbe1ceed009e8d8e6
> +        .quad 0xc086235f37492d28, 0xbe1cf1e4038cb362
> +        .quad 0xc08623631cfad250, 0xbe1cf0b0873b8557
> +        .quad 0xc086236700c75b98, 0xbe1cf15bb3227c0b
> +        .quad 0xc086236ae2b09fe0, 0xbe1cf151ef8ca9ed
> +        .quad 0xc086236ec2b87358, 0xbe1cefe1dc2cd2ed
> +        .quad 0xc0862372a0e0a780, 0xbe1cf0d1eec5454f
> +        .quad 0xc08623767d2b0b48, 0xbe1ceeefd570bbce
> +        .quad 0xc086237a57996af0, 0xbe1cee99ae91b3a7
> +        .quad 0xc086237e302d9028, 0xbe1cf0412830fbd1
> +        .quad 0xc086238206e94218, 0xbe1ceee898588610
> +        .quad 0xc0862385dbce4548, 0xbe1cee9a1fbcaaea
> +        .quad 0xc0862389aede5bc0, 0xbe1ceed8e7cc1ad6
> +        .quad 0xc086238d801b4500, 0xbe1cf10c8d059da6
> +        .quad 0xc08623914f86be18, 0xbe1ceee6c63a8165
> +        .quad 0xc08623951d228180, 0xbe1cf0c3592d2ff1
> +        .quad 0xc0862398e8f04758, 0xbe1cf0026cc4cb1b
> +        .quad 0xc086239cb2f1c538, 0xbe1cf15d48d8e670
> +        .quad 0xc08623a07b28ae60, 0xbe1cef359363787c
> +        .quad 0xc08623a44196b390, 0xbe1cefdf1ab2e82c
> +        .quad 0xc08623a8063d8338, 0xbe1cefe43c02aa84
> +        .quad 0xc08623abc91ec960, 0xbe1cf044f5ae35b7
> +        .quad 0xc08623af8a3c2fb8, 0xbe1cf0b0b4001e1b
> +        .quad 0xc08623b349975d98, 0xbe1cf1bae76dfbcf
> +        .quad 0xc08623b70731f810, 0xbe1cef0a72e13a62
> +        .quad 0xc08623bac30da1c8, 0xbe1cf184007d2b6b
> +        .quad 0xc08623be7d2bfb40, 0xbe1cf16f4b239e98
> +        .quad 0xc08623c2358ea2a0, 0xbe1cf0976acada87
> +        .quad 0xc08623c5ec3733d0, 0xbe1cf066318a16ff
> +        .quad 0xc08623c9a1274880, 0xbe1ceffaa7148798
> +        .quad 0xc08623cd54607820, 0xbe1cf23ab02e9b6e
> +        .quad 0xc08623d105e45800, 0xbe1cefdfef7d4fde
> +        .quad 0xc08623d4b5b47b20, 0xbe1cf17fece44f2b
> +        .quad 0xc08623d863d27270, 0xbe1cf18f907d0d7c
> +        .quad 0xc08623dc103fccb0, 0xbe1cee61fe072c98
> +        .quad 0xc08623dfbafe1668, 0xbe1cf022dd891e2f
> +        .quad 0xc08623e3640eda20, 0xbe1ceecc1daf4358
> +        .quad 0xc08623e70b73a028, 0xbe1cf0173c4fa380
> +        .quad 0xc08623eab12deec8, 0xbe1cf16a2150c2f4
> +        .quad 0xc08623ee553f4a30, 0xbe1cf1bf980b1f4b
> +        .quad 0xc08623f1f7a93480, 0xbe1cef8b731663c2
> +        .quad 0xc08623f5986d2dc0, 0xbe1cee9a664d7ef4
> +        .quad 0xc08623f9378cb3f0, 0xbe1cf1eda2af6400
> +        .quad 0xc08623fcd5094320, 0xbe1cf1923f9d68d7
> +        .quad 0xc086240070e45548, 0xbe1cf0747cd3e03a
> +        .quad 0xc08624040b1f6260, 0xbe1cf22ee855bd6d
> +        .quad 0xc0862407a3bbe078, 0xbe1cf0d57360c00b
> +        .quad 0xc086240b3abb4398, 0xbe1ceebc815cd575
> +        .quad 0xc086240ed01efdd0, 0xbe1cf03bfb970951
> +        .quad 0xc086241263e87f50, 0xbe1cf16e74768529
> +        .quad 0xc0862415f6193658, 0xbe1cefec64b8becb
> +        .quad 0xc086241986b28f30, 0xbe1cf0838d210baa
> +        .quad 0xc086241d15b5f448, 0xbe1cf0ea86e75b11
> +        .quad 0xc0862420a324ce28, 0xbe1cf1708d11d805
> +        .quad 0xc08624242f008380, 0xbe1ceea988c5a417
> +        .quad 0xc0862427b94a7910, 0xbe1cef166a7bbca5
> +        .quad 0xc086242b420411d0, 0xbe1cf0c9d9e86a38
> +        .quad 0xc086242ec92eaee8, 0xbe1cef0946455411
> +        .quad 0xc08624324ecbaf98, 0xbe1cefea60907739
> +        .quad 0xc0862435d2dc7160, 0xbe1cf1ed0934ce42
> +        .quad 0xc086243955624ff8, 0xbe1cf191ba746c7d
> +        .quad 0xc086243cd65ea548, 0xbe1ceeec78cf2a7e
> +        .quad 0xc086244055d2c968, 0xbe1cef345284c119
> +        .quad 0xc0862443d3c012b8, 0xbe1cf24f77355219
> +        .quad 0xc08624475027d5e8, 0xbe1cf05bf087e114
> +        .quad 0xc086244acb0b65d0, 0xbe1cef3504a32189
> +        .quad 0xc086244e446c1398, 0xbe1ceff54b2a406f
> +        .quad 0xc0862451bc4b2eb8, 0xbe1cf0757d54ed4f
> +        .quad 0xc086245532aa04f0, 0xbe1cf0c8099fdfd5
> +        .quad 0xc0862458a789e250, 0xbe1cf0b173796a31
> +        .quad 0xc086245c1aec1138, 0xbe1cf11d8734540d
> +        .quad 0xc086245f8cd1da60, 0xbe1cf1916a723ceb
> +        .quad 0xc0862462fd3c84d8, 0xbe1cf19a911e1da7
> +        .quad 0xc08624666c2d5608, 0xbe1cf23a9ef72e4f
> +        .quad 0xc0862469d9a591c0, 0xbe1cef503d947663
> +        .quad 0xc086246d45a67a18, 0xbe1cf0fceeb1a0b2
> +        .quad 0xc0862470b0314fa8, 0xbe1cf107e27e4fbc
> +        .quad 0xc086247419475160, 0xbe1cf03dd9922331
> +        .quad 0xc086247780e9bc98, 0xbe1cefce1a10e129
> +        .quad 0xc086247ae719cd18, 0xbe1ceea47f73c4f6
> +        .quad 0xc086247e4bd8bd10, 0xbe1ceec0ac56d100
> +        .quad 0xc0862481af27c528, 0xbe1cee8a6593278a
> +        .quad 0xc086248511081c70, 0xbe1cf2231dd9dec7
> +        .quad 0xc0862488717af888, 0xbe1cf0b4b8ed7da8
> +        .quad 0xc086248bd0818d68, 0xbe1cf1bd8d835002
> +        .quad 0xc086248f2e1d0d98, 0xbe1cf259acc107f4
> +        .quad 0xc08624928a4eaa20, 0xbe1cee897636b00c
> +        .quad 0xc0862495e5179270, 0xbe1cee757f20c326
> +        .quad 0xc08624993e78f490, 0xbe1cefafd3aa54a4
> +        .quad 0xc086249c9673fd10, 0xbe1cee7298d38b97
> +        .quad 0xc086249fed09d6f8, 0xbe1ceedc158d4ceb
> +        .quad 0xc08624a3423babe0, 0xbe1cf2282987cb2e
> +        .quad 0xc08624a6960aa400, 0xbe1cefe7381ecc4b
> +        .quad 0xc08624a9e877e600, 0xbe1cef328dbbce80
> +        .quad 0xc08624ad39849728, 0xbe1cefde45f3cc71
> +        .quad 0xc08624b08931db58, 0xbe1cefa8b89433b9
> +        .quad 0xc08624b3d780d500, 0xbe1cef6773c0b139
> +        .quad 0xc08624b72472a528, 0xbe1cf031c931c11f
> +        .quad 0xc08624ba70086b78, 0xbe1cf088f49275e7
> +        .quad 0xc08624bdba434630, 0xbe1cf17de0eaa86d
> +        .quad 0xc08624c103245238, 0xbe1cefd492f1ba75
> +        .quad 0xc08624c44aacab08, 0xbe1cf1253e154466
> +        .quad 0xc08624c790dd6ad0, 0xbe1cf0fb09ee6d55
> +        .quad 0xc08624cad5b7aa58, 0xbe1cf1f08dd048fe
> +        .quad 0xc08624ce193c8120, 0xbe1ceeca0809697f
> +        .quad 0xc08624d15b6d0538, 0xbe1cef8d5662d968
> +        .quad 0xc08624d49c4a4b78, 0xbe1cee97b556ed78
> +        .quad 0xc08624d7dbd56750, 0xbe1cf1b14b6acb75
> +        .quad 0xc08624db1a0f6b00, 0xbe1cef1e860623f2
> +        .quad 0xc08624de56f96758, 0xbe1ceeaf4d156f3d
> +        .quad 0xc08624e192946bf0, 0xbe1ceecc12b400ed
> +        .quad 0xc08624e4cce18710, 0xbe1cf180c40c794f
> +        .quad 0xc08624e805e1c5c8, 0xbe1cf185a08f7f65
> +        .quad 0xc08624eb3d9633d8, 0xbe1cef45fc924078
> +        .quad 0xc08624ee73ffdbb0, 0xbe1cf1e4f457f32a
> +        .quad 0xc08624f1a91fc6a0, 0xbe1cf040147b8a5a
> +        .quad 0xc08624f4dcf6fc98, 0xbe1cf1effca0dfb2
> +        .quad 0xc08624f80f868468, 0xbe1cf0470146e5bc
> +        .quad 0xc08624fb40cf6390, 0xbe1cef4dd186e501
> +        .quad 0xc08624fe70d29e60, 0xbe1ceebe257f66c7
> +        .quad 0xc08625019f9137f0, 0xbe1ceefb7a1c395c
> +        .quad 0xc0862504cd0c3220, 0xbe1cf209dedfed8c
> +        .quad 0xc0862507f9448db0, 0xbe1cf082da464994
> +        .quad 0xc086250b243b4a18, 0xbe1cee88694a73cf
> +        .quad 0xc086250e4df165a0, 0xbe1cf0b61e8f0531
> +        .quad 0xc08625117667dd78, 0xbe1cf1106599c962
> +        .quad 0xc08625149d9fad98, 0xbe1ceff1ee88af1f
> +        .quad 0xc0862517c399d0c8, 0xbe1cf0f746994ef6
> +        .quad 0xc086251ae85740b8, 0xbe1cefe8a1d077e4
> +        .quad 0xc086251e0bd8f5e0, 0xbe1cf1a1da036092
> +        .quad 0xc08625212e1fe7a8, 0xbe1cf0f8a7786fcd
> +        .quad 0xc08625244f2d0c48, 0xbe1cefa1174a07a7
> +        .quad 0xc08625276f0158d8, 0xbe1cef1043aa5b25
> +        .quad 0xc086252a8d9dc150, 0xbe1cf15d521c169d
> +        .quad 0xc086252dab033898, 0xbe1cf220bba8861f
> +        .quad 0xc0862530c732b078, 0xbe1cef51e310eae2
> +        .quad 0xc0862533e22d1988, 0xbe1cf222fcedd8ae
> +        .quad 0xc0862536fbf36370, 0xbe1cefdb4da4bda8
> +        .quad 0xc086253a14867ca0, 0xbe1ceeafc1112171
> +        .quad 0xc086253d2be75280, 0xbe1cee99dfb4b408
> +        .quad 0xc08625404216d160, 0xbe1cf22d2536f06b
> +        .quad 0xc08625435715e498, 0xbe1cef6abbf2e268
> +        .quad 0xc08625466ae57648, 0xbe1cf093a14789f5
> +        .quad 0xc08625497d866fa0, 0xbe1cf0f93655603c
> +        .quad 0xc086254c8ef9b8b8, 0xbe1cf1cc40c9aafc
> +        .quad 0xc086254f9f4038a8, 0xbe1ceeea5f4e9157
> +        .quad 0xc0862552ae5ad568, 0xbe1cefa9f52d4997
> +        .quad 0xc0862555bc4a7400, 0xbe1cefa490a638ff
> +        .quad 0xc0862558c90ff868, 0xbe1cef7fcf797d6f
> +        .quad 0xc086255bd4ac4590, 0xbe1cf1b4c51113c9
> +        .quad 0xc086255edf203d78, 0xbe1cef55e5b4a55d
> +        .quad 0xc0862561e86cc100, 0xbe1cf0d37a25f9dc
> +        .quad 0xc0862564f092b028, 0xbe1ceebe9efc19d9
> +        .quad 0xc0862567f792e9d8, 0xbe1cee8ad30a57b5
> +        .quad 0xc086256afd6e4c08, 0xbe1cef4e1817b90b
> +        .quad 0xc086256e0225b3b8, 0xbe1cee7fa9229996
> +        .quad 0xc086257105b9fce0, 0xbe1cf0b54963d945
> +        .quad 0xc0862574082c0298, 0xbe1cee5f2f3c7995
> +        .quad 0xc0862577097c9ee0, 0xbe1cf0828e303a2c
> +        .quad 0xc086257a09acaae0, 0xbe1cf172c3078947
> +        .quad 0xc086257d08bcfec0, 0xbe1cf189252afa22
> +        .quad 0xc086258006ae71b8, 0xbe1cefdb80426923
> +        .quad 0xc08625830381da08, 0xbe1ceef1391a0372
> +        .quad 0xc0862585ff380d00, 0xbe1cf17720c78d13
> +        .quad 0xc0862588f9d1df18, 0xbe1ceef1f9027d83
> +        .quad 0xc086258bf35023b8, 0xbe1cf06fac99dec9
> +        .quad 0xc086258eebb3ad78, 0xbe1cf1373eeb45c0
> +        .quad 0xc0862591e2fd4e00, 0xbe1cef777536bb81
> +        .quad 0xc0862594d92dd600, 0xbe1cf0f43ca40766
> +        .quad 0xc0862597ce461558, 0xbe1cefb2cfc6766b
> +        .quad 0xc086259ac246daf0, 0xbe1ceea49e64ffa2
> +        .quad 0xc086259db530f4c8, 0xbe1cf250fa457dec
> +        .quad 0xc08625a0a7053018, 0xbe1cf17d8bb2a44e
> +        .quad 0xc08625a397c45918, 0xbe1cf1d5906d54b7
> +        .quad 0xc08625a6876f3b30, 0xbe1cf08fe7b31780
> +        .quad 0xc08625a97606a0e0, 0xbe1cef13edfc9d11
> +        .quad 0xc08625ac638b53c8, 0xbe1cef9d2b107219
> +        .quad 0xc08625af4ffe1cb0, 0xbe1cf1ddd4ff6160
> +        .quad 0xc08625b23b5fc390, 0xbe1cefa02a996495
> +        .quad 0xc08625b525b10f68, 0xbe1cf166a7e37ee5
> +        .quad 0xc08625b80ef2c680, 0xbe1cef0b171068a5
> +        .quad 0xc08625baf725ae28, 0xbe1cf05c80779283
> +        .quad 0xc08625bdde4a8af0, 0xbe1cf1bbfbffb889
> +        .quad 0xc08625c0c4622090, 0xbe1cf0b8666c0124
> +        .quad 0xc08625c3a96d31e0, 0xbe1cf0a8fcf47a86
> +        .quad 0xc08625c68d6c80f0, 0xbe1cef46e18cb092
> +        .quad 0xc08625c97060cef0, 0xbe1cf1458a350efb
> +        .quad 0xc08625cc524adc58, 0xbe1ceeea1dadce12
> +        .quad 0xc08625cf332b68b0, 0xbe1cf0a1bfdc44c7
> +        .quad 0xc08625d2130332d0, 0xbe1cef96d02da73e
> +        .quad 0xc08625d4f1d2f8a8, 0xbe1cf2451c3c7701
> +        .quad 0xc08625d7cf9b7778, 0xbe1cf10d08f83812
> +        .quad 0xc08625daac5d6ba0, 0xbe1ceec5b4895c5e
> +        .quad 0xc08625dd881990b0, 0xbe1cf14e1325c5e4
> +        .quad 0xc08625e062d0a188, 0xbe1cf21d0904be12
> +        .quad 0xc08625e33c835838, 0xbe1ceed0839bcf21
> +        .quad 0xc08625e615326df0, 0xbe1cf1bb944889d2
> +        .quad 0xc08625e8ecde9b48, 0xbe1cee738e85eece
> +        .quad 0xc08625ebc38897e0, 0xbe1cf25c2bc6ef12
> +        .quad 0xc08625ee99311ac8, 0xbe1cf132b70a41ad
> +        .quad 0xc08625f16dd8da28, 0xbe1cf1984236a6e3
> +        .quad 0xc08625f441808b78, 0xbe1cf19ae74998f9
> +        .quad 0xc08625f71428e370, 0xbe1cef3e175d61a1
> +        .quad 0xc08625f9e5d295f8, 0xbe1cf101f9868fd9
> +        .quad 0xc08625fcb67e5658, 0xbe1cee69db83dcd2
> +        .quad 0xc08625ff862cd6f8, 0xbe1cf081b636af51
> +        .quad 0xc086260254dec9a8, 0xbe1cee62c7d59b3e
> +        .quad 0xc08626052294df58, 0xbe1cf1b745c57716
> +        .quad 0xc0862607ef4fc868, 0xbe1cef3d2800ea23
> +        .quad 0xc086260abb103458, 0xbe1cef480ff1acd2
> +        .quad 0xc086260d85d6d200, 0xbe1cf2424c9a17ef
> +        .quad 0xc08626104fa44f90, 0xbe1cf12cfde90fd5
> +        .quad 0xc086261318795a68, 0xbe1cf21f590dd5b6
> +        .quad 0xc0862615e0569f48, 0xbe1cf0c50f9cd28a
> +        .quad 0xc0862618a73cca30, 0xbe1ceedbdb520545
> +        .quad 0xc086261b6d2c8668, 0xbe1cf0b030396011
> +        .quad 0xc086261e32267e98, 0xbe1cf19917010e96
> +        .quad 0xc0862620f62b5cb0, 0xbe1cf07331355985
> +        .quad 0xc0862623b93bc9e8, 0xbe1cf01ae921a1c3
> +        .quad 0xc08626267b586ed0, 0xbe1cefe5cf0dbf0c
> +        .quad 0xc08626293c81f348, 0xbe1cf01b258aeb50
> +        .quad 0xc086262bfcb8fe88, 0xbe1cee6b9e7f4c68
> +        .quad 0xc086262ebbfe3710, 0xbe1cee684a9b21c9
> +        .quad 0xc08626317a5242b8, 0xbe1cf1f8bcde9a8b
> +        .quad 0xc086263437b5c6c0, 0xbe1cf1d063d36238
> +        .quad 0xc0862636f42967a8, 0xbe1cf1e31a19075e
> +        .quad 0xc0862639afadc950, 0xbe1cf1d8efdf7e7d
> +        .quad 0xc086263c6a438ef0, 0xbe1cf1812ee72dba
> +        .quad 0xc086263f23eb5b18, 0xbe1cf1449a9a2279
> +        .quad 0xc0862641dca5cfb8, 0xbe1cee96edce5085
> +        .quad 0xc086264494738e08, 0xbe1cf06797bd03b2
> +        .quad 0xc08626474b5536b8, 0xbe1cef91b9b7ffc1
> +        .quad 0xc086264a014b69c0, 0xbe1cef4b6721278f
> +        .quad 0xc086264cb656c678, 0xbe1cf1942925eb4a
> +        .quad 0xc086264f6a77eba8, 0xbe1cefa2c7bc2e39
> +        .quad 0xc08626521daf7758, 0xbe1cf252595aceb3
> +        .quad 0xc0862654cffe0718, 0xbe1cee8e9ae47ec2
> +        .quad 0xc0862657816437a8, 0xbe1cf1bf913828fa
> +        .quad 0xc086265a31e2a558, 0xbe1cf23475d6b366
> +        .quad 0xc086265ce179ebc8, 0xbe1cef8df00a922b
> +        .quad 0xc086265f902aa5f0, 0xbe1cef279bfa43e0
> +        .quad 0xc08626623df56e38, 0xbe1cf080e10b8365
> +        .quad 0xc0862664eadade70, 0xbe1cf1a518f9b544
> +        .quad 0xc086266796db8fd0, 0xbe1cef9308fed9e9
> +        .quad 0xc086266a41f81ae8, 0xbe1ceea3ae6b19c9
> +        .quad 0xc086266cec3117b8, 0xbe1ceef06003d4c2
> +        .quad 0xc086266f95871da8, 0xbe1cf0b8457ffb0c
> +        .quad 0xc08626723dfac390, 0xbe1cf0c526745ad6
> +        .quad 0xc0862674e58c9fa8, 0xbe1cf0cf91ff7b5d
> +        .quad 0xc08626778c3d4798, 0xbe1cefe260819380
> +        .quad 0xc086267a320d5070, 0xbe1ceebd90aa27a3
> +        .quad 0xc086267cd6fd4ea8, 0xbe1cf0388121dffa
> +        .quad 0xc086267f7b0dd630, 0xbe1cf1a3881435f1
> +        .quad 0xc08626821e3f7a68, 0xbe1cef28e9d9ac52
> +        .quad 0xc0862684c092ce08, 0xbe1cf02d300062dd
> +        .quad 0xc086268762086350, 0xbe1cefaee1edfa35
> +        .quad 0xc086268a02a0cbe0, 0xbe1cf0a5a052e936
> +        .quad 0xc086268ca25c98d8, 0xbe1cee60a4a497ed
> +        .quad 0xc086268f413c5ab0, 0xbe1cf0e4a5d0cf49
> +        .quad 0xc0862691df40a170, 0xbe1cf149235a4e6e
> +        .quad 0xc08626947c69fc80, 0xbe1cf215180b9fcc
> +        .quad 0xc086269718b8fac8, 0xbe1cef9b156a9840
> +        .quad 0xc0862699b42e2a90, 0xbe1cf054c91441be
> +        .quad 0xc086269c4eca19a8, 0xbe1cf13ded26512c
> +        .quad 0xc086269ee88d5550, 0xbe1cf22ea4d8ac06
> +        .quad 0xc08626a181786a40, 0xbe1cf2354666ee2e
> +        .quad 0xc08626a4198be4a8, 0xbe1cefef936752b3
> +        .quad 0xc08626a6b0c85020, 0xbe1cf1e360a9db68
> +        .quad 0xc08626a9472e37d8, 0xbe1ceed6aeb812c5
> +        .quad 0xc08626abdcbe2650, 0xbe1cf227340b4986
> +        .quad 0xc08626ae7178a5b0, 0xbe1cf0215a0cbe0d
> +        .quad 0xc08626b1055e3f70, 0xbe1cf256adf0ae26
> +        .quad 0xc08626b3986f7ca8, 0xbe1ceff3c67aed06
> +        .quad 0xc08626b62aace5c8, 0xbe1cf2159fb93652
> +        .quad 0xc08626b8bc1702e0, 0xbe1cf01e6dbd1c7f
> +        .quad 0xc08626bb4cae5b60, 0xbe1cf009e75d1c0c
> +        .quad 0xc08626bddc737648, 0xbe1ceec10a020e73
> +        .quad 0xc08626c06b66da08, 0xbe1cf06d5783eee7
> +        .quad 0xc08626c2f9890ca0, 0xbe1cf0cb8f169ffe
> +        .quad 0xc08626c586da9388, 0xbe1cef7de2452430
> +        .quad 0xc08626c8135bf3b0, 0xbe1cf05da6f783ae
> +        .quad 0xc08626ca9f0db198, 0xbe1cefcc877d681d
> +        .quad 0xc08626cd29f05138, 0xbe1cef0531954ab3
> +        .quad 0xc08626cfb4045608, 0xbe1cf06b8565ea3d
> +        .quad 0xc08626d23d4a4310, 0xbe1cefdc455d9d7e
> +        .quad 0xc08626d4c5c29ad0, 0xbe1ceefc47e8fa64
> +        .quad 0xc08626d74d6ddf48, 0xbe1cf1872bf033f2
> +        .quad 0xc08626d9d44c9210, 0xbe1cf19d91087f9d
> +        .quad 0xc08626dc5a5f3438, 0xbe1cf012d444c6ab
> +        .quad 0xc08626dedfa64650, 0xbe1cf0ba528ee153
> +        .quad 0xc08626e164224880, 0xbe1ceeb431709788
> +        .quad 0xc08626e3e7d3ba60, 0xbe1cf0b9af31a6a5
> +        .quad 0xc08626e66abb1b28, 0xbe1cf168fb2e135b
> +        .quad 0xc08626e8ecd8e990, 0xbe1cef9097461c93
> +        .quad 0xc08626eb6e2da3d0, 0xbe1cee7a434735d8
> +        .quad 0xc08626edeeb9c7a8, 0xbe1cf235732b86f2
> +        .quad 0xc08626f06e7dd280, 0xbe1cefe1510b89e6
> +        .quad 0xc08626f2ed7a4120, 0xbe1cf1f64b9b80ef
> +        .quad 0xc08626f56baf9000, 0xbe1cf08f320ca339
> +        .quad 0xc08626f7e91e3b08, 0xbe1cf1b1de2808a1
> +        .quad 0xc08626fa65c6bdc0, 0xbe1cf1976d778b28
> +        .quad 0xc08626fce1a99338, 0xbe1ceef40a4f076f
> +        .quad 0xc08626ff5cc73600, 0xbe1cef3e45869ce3
> +        .quad 0xc0862701d7202048, 0xbe1ceef601b4c9d6
> +        .quad 0xc086270450b4cbc0, 0xbe1cf1eaf0b57fd6
> +        .quad 0xc0862706c985b1c0, 0xbe1cef82a44990f3
> +        .quad 0xc086270941934b10, 0xbe1ceefe32981f2c
> +        .quad 0xc086270bb8de1018, 0xbe1cefbf6f5a0445
> +        .quad 0xc086270e2f6678d0, 0xbe1cf18dba75792c
> +        .quad 0xc0862710a52cfcc8, 0xbe1cf0da64ce995f
> +        .quad 0xc08627131a321318, 0xbe1cef04ac0fb802
> +        .quad 0xc08627158e763268, 0xbe1cee9d4e2ad9bd
> +        .quad 0xc086271801f9d0f8, 0xbe1cefa9b55407b5
> +        .quad 0xc086271a74bd64a0, 0xbe1cefe6bd329570
> +        .quad 0xc086271ce6c162c8, 0xbe1cef0b1205dc85
> +        .quad 0xc086271f58064068, 0xbe1cef092a785e3f
> +        .quad 0xc0862721c88c7210, 0xbe1cf050dcdaac30
> +        .quad 0xc086272438546be8, 0xbe1cf210907ded8b
> +        .quad 0xc0862726a75ea1b8, 0xbe1cee760be44f99
> +        .quad 0xc086272915ab86c0, 0xbe1ceeeee07c2bcc
> +        .quad 0xc086272b833b8df0, 0xbe1cf06874992df5
> +        .quad 0xc086272df00f29d0, 0xbe1cef8fac5d4899
> +        .quad 0xc08627305c26cc70, 0xbe1cf1103241cc99
> +        .quad 0xc0862732c782e788, 0xbe1cf1d35fef83fe
> +        .quad 0xc08627353223ec68, 0xbe1cef3ec8133e1d
> +        .quad 0xc08627379c0a4be8, 0xbe1cef7261daccd8
> +        .quad 0xc086273a05367688, 0xbe1cf18656c50806
> +        .quad 0xc086273c6da8dc68, 0xbe1cf1c8736e049a
> +        .quad 0xc086273ed561ed38, 0xbe1cf1f93bff4911
> +        .quad 0xc08627413c621848, 0xbe1cf188a4ea680c
> +        .quad 0xc0862743a2a9cc80, 0xbe1cf1d270930c80
> +        .quad 0xc086274608397868, 0xbe1cf25a328c28e2
> +        .quad 0xc08627486d118a28, 0xbe1cf106f90aa3b8
> +        .quad 0xc086274ad1326f80, 0xbe1cee5e9d2e885a
> +        .quad 0xc086274d349c95c0, 0xbe1cf1c0bac27228
> +        .quad 0xc086274f975069f8, 0xbe1cf1a1500f9b1c
> +        .quad 0xc0862751f94e58c0, 0xbe1cefc30663ac44
> +        .quad 0xc08627545a96ce48, 0xbe1cf17123e427a2
> +        .quad 0xc0862756bb2a3678, 0xbe1cefb92749fea4
> +        .quad 0xc08627591b08fcc0, 0xbe1cefa40e1ea74a
> +        .quad 0xc086275b7a338c40, 0xbe1cee6f4612c3e9
> +        .quad 0xc086275dd8aa4fa8, 0xbe1cf1c54a053627
> +        .quad 0xc0862760366db168, 0xbe1ceff5eb503d9e
> +        .quad 0xc0862762937e1b70, 0xbe1cf02e47f10cee
> +        .quad 0xc0862764efdbf768, 0xbe1ceeb06e1d0dad
> +        .quad 0xc08627674b87ae88, 0xbe1cf10aadd6dba5
> +        .quad 0xc0862769a681a9c0, 0xbe1cf24e9913d30f
> +        .quad 0xc086276c00ca51a0, 0xbe1cef47b301e312
> +        .quad 0xc086276e5a620e48, 0xbe1ceeb1cefc2e85
> +        .quad 0xc0862770b3494788, 0xbe1cf16f1fbbe011
> +        .quad 0xc08627730b8064e8, 0xbe1ceebdf75174c7
> +        .quad 0xc08627756307cd70, 0xbe1cf06e3871a0da
> +        .quad 0xc0862777b9dfe7f0, 0xbe1cef16799fd554
> +        .quad 0xc086277a10091ac0, 0xbe1cf248dabf5377
> +        .quad 0xc086277c6583cc00, 0xbe1cf0c78d92a2cd
> +        .quad 0xc086277eba506158, 0xbe1cf0b911b029f0
> +        .quad 0xc08627810e6f4028, 0xbe1cefdc24719766
> +        .quad 0xc086278361e0cd70, 0xbe1cefbb6562b7e7
> +        .quad 0xc0862785b4a56dd8, 0xbe1cf1e0afb349ec
> +        .quad 0xc086278806bd85c0, 0xbe1cf008292e52fc
> +        .quad 0xc086278a58297918, 0xbe1cf053073872bf
> +        .quad 0xc086278ca8e9ab88, 0xbe1cf17a0a55a947
> +        .quad 0xc086278ef8fe8068, 0xbe1ceeffb0b60234
> +        .quad 0xc086279148685aa0, 0xbe1cf162204794a8
> +        .quad 0xc086279397279ce0, 0xbe1cf24cc8cb48ac
> +        .quad 0xc0862795e53ca978, 0xbe1cf0c9be68d5c3
> +        .quad 0xc086279832a7e258, 0xbe1cf172cd3d7388
> +        .quad 0xc086279a7f69a930, 0xbe1ceea2465fbce5
> +        .quad 0xc086279ccb825f40, 0xbe1cf0a386d2500f
> +        .quad 0xc086279f16f26590, 0xbe1cf1e338ddc18a
> +        .quad 0xc08627a161ba1cd0, 0xbe1cef1f5049867f
> +        .quad 0xc08627a3abd9e548, 0xbe1cef96c1ea8b1f
> +        .quad 0xc08627a5f5521f00, 0xbe1cf138f6fd3c26
> +        .quad 0xc08627a83e2329b0, 0xbe1cf0d4fcbfdf3a
> +        .quad 0xc08627aa864d64b0, 0xbe1cf24870c12c81
> +        .quad 0xc08627accdd12f18, 0xbe1cf0ae2a56348d
> +        .quad 0xc08627af14aee7a0, 0xbe1cee8ca1a9b893
> +        .quad 0xc08627b15ae6eca8, 0xbe1cf20414d637b0
> +        .quad 0xc08627b3a0799c60, 0xbe1cf0fc6b7b12d8
> +        .quad 0xc08627b5e5675488, 0xbe1cf152d93c4a00
> +        .quad 0xc08627b829b072a0, 0xbe1cf1073f9b77c2
> +        .quad 0xc08627ba6d5553d8, 0xbe1cee694f97d5a4
> +        .quad 0xc08627bcb0565500, 0xbe1cf0456b8239d7
> +        .quad 0xc08627bef2b3d2b0, 0xbe1cf211497127e3
> +        .quad 0xc08627c1346e2930, 0xbe1cf01856c0384d
> +        .quad 0xc08627c37585b468, 0xbe1cefa7dd05479e
> +        .quad 0xc08627c5b5fad000, 0xbe1cef3ae8e50b93
> +        .quad 0xc08627c7f5cdd750, 0xbe1ceea5f32fdd3a
> +        .quad 0xc08627ca34ff2560, 0xbe1cef424caeb8d9
> +        .quad 0xc08627cc738f14f0, 0xbe1cf0194d07a81f
> +        .quad 0xc08627ceb17e0070, 0xbe1cf20f452000c1
> +        .quad 0xc08627d0eecc4210, 0xbe1cf00e356218e4
> +        .quad 0xc08627d32b7a33a0, 0xbe1cef30484b4bcb
> +        .quad 0xc08627d567882eb0, 0xbe1ceeea11a6641b
> +        .quad 0xc08627d7a2f68c80, 0xbe1cf13492d5bd7b
> +        .quad 0xc08627d9ddc5a618, 0xbe1ceeb7048fad96
> +        .quad 0xc08627dc17f5d418, 0xbe1ceef0666f0477
> +        .quad 0xc08627de51876ee8, 0xbe1cf060d4b8b5c2
> +        .quad 0xc08627e08a7acea8, 0xbe1cf0b2a4b6ff8c
> +        .quad 0xc08627e2c2d04b28, 0xbe1cf0e34809a875
> +        .quad 0xc08627e4fa883bf0, 0xbe1cf16bf74a3522
> +        .quad 0xc08627e731a2f848, 0xbe1cee6a24623d57
> +        .quad 0xc08627e96820d718, 0xbe1cefc7b4f1528e
> +        .quad 0xc08627eb9e022f18, 0xbe1cf163051f3548
> +        .quad 0xc08627edd34756b8, 0xbe1cef36b3366305
> +        .quad 0xc08627f007f0a408, 0xbe1cf18134625550
> +        .quad 0xc08627f23bfe6cf0, 0xbe1cf0ec32ec1a11
> +        .quad 0xc08627f46f710700, 0xbe1ceeb3b64f3edc
> +        .quad 0xc08627f6a248c778, 0xbe1cf0cd15805bc8
> +        .quad 0xc08627f8d4860368, 0xbe1cf20db3bddebe
> +        .quad 0xc08627fb06290f90, 0xbe1cf25188430e25
> +        .quad 0xc08627fd37324070, 0xbe1ceea1713490f9
> +        .quad 0xc08627ff67a1ea28, 0xbe1cf159521d234c
> +        .quad 0xc0862801977860b8, 0xbe1cf24dfe50783b
> +        .quad 0xc0862803c6b5f7d0, 0xbe1ceef2ef89a60b
> +        .quad 0xc0862805f55b02c8, 0xbe1cee7fc919d62c
> +        .quad 0xc08628082367d4c0, 0xbe1cf215a7fb513a
> +        .quad 0xc086280a50dcc0a8, 0xbe1cf0e4401c5ed4
> +        .quad 0xc086280c7dba1910, 0xbe1cf04ec734d256
> +        .quad 0xc086280eaa003050, 0xbe1cf010ad787fea
> +        .quad 0xc0862810d5af5880, 0xbe1cee622478393d
> +        .quad 0xc086281300c7e368, 0xbe1cf01c7482564f
> +        .quad 0xc08628152b4a22a0, 0xbe1cf0de20d33536
> +        .quad 0xc086281755366778, 0xbe1cef2edae5837d
> +        .quad 0xc08628197e8d02f0, 0xbe1cf0a345318cc9
> +        .quad 0xc086281ba74e45d8, 0xbe1cf20085aa34b8
> +        .quad 0xc086281dcf7a80c0, 0xbe1cef5fa845ad83
> +        .quad 0xc086281ff71203e0, 0xbe1cf050d1df69c4
> +        .quad 0xc08628221e151f48, 0xbe1ceffe43c035b9
> +        .quad 0xc0862824448422b8, 0xbe1cf14f3018d3c2
> +        .quad 0xc08628266a5f5dc0, 0xbe1cef0a5fbae83d
> +        .quad 0xc08628288fa71f98, 0xbe1ceff8a95b72a1
> +        .quad 0xc086282ab45bb750, 0xbe1cef073aa9849b
> +        .quad 0xc086282cd87d73a8, 0xbe1cef69b3835c02
> +        .quad 0xc086282efc0ca328, 0xbe1cf0bc139379a9
> +        .quad 0xc08628311f099420, 0xbe1cef247a9ec596
> +        .quad 0xc086283341749490, 0xbe1cef74bbcc488a
> +        .quad 0xc0862835634df248, 0xbe1cef4bc42e7b8e
> +        .quad 0xc08628378495fad0, 0xbe1cf136d4d5a810
> +        .quad 0xc0862839a54cfb80, 0xbe1cf0d290b24dd8
> +        .quad 0xc086283bc5734168, 0xbe1ceeebde8e0065
> +        .quad 0xc086283de5091950, 0xbe1cf1a09f60aa1e
> +        .quad 0xc0862840040ecfe0, 0xbe1cf0803947a234
> +        .quad 0xc08628422284b168, 0xbe1cf0abf7638127
> +        .quad 0xc0862844406b0a08, 0xbe1cf0f73ee12058
> +        .quad 0xc08628465dc225a0, 0xbe1cf2079971b26c
> +        .quad 0xc08628487a8a4fe0, 0xbe1cee74957564b1
> +        .quad 0xc086284a96c3d420, 0xbe1ceee77c1b7d43
> +        .quad 0xc086284cb26efd90, 0xbe1cf23addba6e09
> +        .quad 0xc086284ecd8c1730, 0xbe1cf199f4a1da60
> +        .quad 0xc0862850e81b6bb0, 0xbe1cf09fdea81393
> +        .quad 0xc0862853021d4588, 0xbe1cf176adb417f7
> +        .quad 0xc08628551b91ef00, 0xbe1cf0f64f84a8da
> +        .quad 0xc08628573479b220, 0xbe1ceec34cf49523
> +        .quad 0xc08628594cd4d8a8, 0xbe1cf16d60fbe0bb
> +        .quad 0xc086285b64a3ac40, 0xbe1cee8de7acfc7b
> +        .quad 0xc086285d7be67630, 0xbe1ceee6256cce8d
> +        .quad 0xc086285f929d7fa0, 0xbe1cee7d66a3d8a5
> +        .quad 0xc0862861a8c91170, 0xbe1cf0bef8265792
> +        .quad 0xc0862863be697458, 0xbe1cf097f890c6f8
> +        .quad 0xc0862865d37ef0c8, 0xbe1cf09502d5c3fc
> +        .quad 0xc0862867e809cf00, 0xbe1ceeffb239dac7
> +        .quad 0xc0862869fc0a56f8, 0xbe1cf1fbfff95c98
> +        .quad 0xc086286c0f80d090, 0xbe1cefa57ad3eef7
> +        .quad 0xc086286e226d8348, 0xbe1cf22c58b9183d
> +        .quad 0xc086287034d0b690, 0xbe1ceff262d0a248
> +        .quad 0xc086287246aab180, 0xbe1cefa7bc194186
> +        .quad 0xc086287457fbbb08, 0xbe1cf06782d784d9
> +        .quad 0xc086287668c419e0, 0xbe1cf1d44d0eaa07
> +        .quad 0xc086287879041490, 0xbe1cf034803c8a48
> +        .quad 0xc086287a88bbf158, 0xbe1cf08e84916b6f
> +        .quad 0xc086287c97ebf650, 0xbe1cf0c4d3dc1bc7
> +        .quad 0xc086287ea6946958, 0xbe1cefb1e4625943
> +        .quad 0xc0862880b4b59010, 0xbe1cf143efdd1fd0
> +        .quad 0xc0862882c24faff8, 0xbe1cee9896d016da
> +        .quad 0xc0862884cf630e38, 0xbe1cf2186072f2cc
> +        .quad 0xc0862886dbefeff0, 0xbe1cef9217633d34
> +        .quad 0xc0862888e7f699e0, 0xbe1cf05603549486
> +        .quad 0xc086288af37750b0, 0xbe1cef50fff513d3
> +        .quad 0xc086288cfe7258c0, 0xbe1cf127713b32d0
> +        .quad 0xc086288f08e7f650, 0xbe1cf05015520f3d
> +        .quad 0xc086289112d86d58, 0xbe1cf12eb458b26f
> +        .quad 0xc08628931c4401a8, 0xbe1cf22eae2887ed
> +        .quad 0xc0862895252af6e0, 0xbe1cefdd6656dd2d
> +        .quad 0xc08628972d8d9058, 0xbe1cf1048ea4e646
> +        .quad 0xc0862899356c1150, 0xbe1ceec4501167e9
> +        .quad 0xc086289b3cc6bcb8, 0xbe1cf0ad52becc3f
> +        .quad 0xc086289d439dd568, 0xbe1cf0daa4e00e35
> +        .quad 0xc086289f49f19df8, 0xbe1cf00b80de8d6a
> +        .quad 0xc08628a14fc258c8, 0xbe1cf1bcf2ea8464
> +        .quad 0xc08628a355104818, 0xbe1cf0435e2782b0
> +        .quad 0xc08628a559dbade0, 0xbe1cf0e3e1a5f56c
> +        .quad 0xc08628a75e24cbf8, 0xbe1cefed9d5a721d
> +        .quad 0xc08628a961ebe3f8, 0xbe1cf0d2d74321e2
> +        .quad 0xc08628ab65313750, 0xbe1cf24200eb55e9
> +        .quad 0xc08628ad67f50740, 0xbe1cf23e9d7cf979
> +        .quad 0xc08628af6a3794d0, 0xbe1cf23a088f421c
> +        .quad 0xc08628b16bf920e0, 0xbe1cef2c1de1ab32
> +        .quad 0xc08628b36d39ec08, 0xbe1cf1abc231f7b2
> +        .quad 0xc08628b56dfa36d0, 0xbe1cf2074d5ba303
> +        .quad 0xc08628b76e3a4180, 0xbe1cf05cd5eed880
> +        /*== Log_LA_table ==*/
> +        .align 16
> +        .quad 0x8000000000000000
> +        .quad 0xbf5ff802a9ab10e6
> +        .quad 0xbf6ff00aa2b10bc0
> +        .quad 0xbf77ee11ebd82e94
> +        .quad 0xbf7fe02a6b106789
> +        .quad 0xbf83e7295d25a7d9
> +        .quad 0xbf87dc475f810a77
> +        .quad 0xbf8bcf712c74384c
> +        .quad 0xbf8fc0a8b0fc03e4
> +        .quad 0xbf91d7f7eb9eebe7
> +        .quad 0xbf93cea44346a575
> +        .quad 0xbf95c45a51b8d389
> +        .quad 0xbf97b91b07d5b11b
> +        .quad 0xbf99ace7551cc514
> +        .quad 0xbf9b9fc027af9198
> +        .quad 0xbf9d91a66c543cc4
> +        .quad 0xbf9f829b0e783300
> +        .quad 0xbfa0b94f7c196176
> +        .quad 0xbfa1b0d98923d980
> +        .quad 0xbfa2a7ec2214e873
> +        .quad 0xbfa39e87b9febd60
> +        .quad 0xbfa494acc34d911c
> +        .quad 0xbfa58a5bafc8e4d5
> +        .quad 0xbfa67f94f094bd98
> +        .quad 0xbfa77458f632dcfc
> +        .quad 0xbfa868a83083f6cf
> +        .quad 0xbfa95c830ec8e3eb
> +        .quad 0xbfaa4fe9ffa3d235
> +        .quad 0xbfab42dd711971bf
> +        .quad 0xbfac355dd0921f2d
> +        .quad 0xbfad276b8adb0b52
> +        .quad 0xbfae19070c276016
> +        .quad 0xbfaf0a30c01162a6
> +        .quad 0xbfaffae9119b9303
> +        .quad 0xbfb075983598e471
> +        .quad 0xbfb0ed839b5526fe
> +        .quad 0xbfb16536eea37ae1
> +        .quad 0xbfb1dcb263db1944
> +        .quad 0xbfb253f62f0a1417
> +        .quad 0xbfb2cb0283f5de1f
> +        .quad 0xbfb341d7961bd1d1
> +        .quad 0xbfb3b87598b1b6ee
> +        .quad 0xbfb42edcbea646f0
> +        .quad 0xbfb4a50d3aa1b040
> +        .quad 0xbfb51b073f06183f
> +        .quad 0xbfb590cafdf01c28
> +        .quad 0xbfb60658a93750c4
> +        .quad 0xbfb67bb0726ec0fc
> +        .quad 0xbfb6f0d28ae56b4c
> +        .quad 0xbfb765bf23a6be13
> +        .quad 0xbfb7da766d7b12cd
> +        .quad 0xbfb84ef898e8282a
> +        .quad 0xbfb8c345d6319b21
> +        .quad 0xbfb9375e55595ede
> +        .quad 0xbfb9ab42462033ad
> +        .quad 0xbfba1ef1d8061cd4
> +        .quad 0xbfba926d3a4ad563
> +        .quad 0xbfbb05b49bee43fe
> +        .quad 0xbfbb78c82bb0eda1
> +        .quad 0xbfbbeba818146765
> +        .quad 0xbfbc5e548f5bc743
> +        .quad 0xbfbcd0cdbf8c13e1
> +        .quad 0xbfbd4313d66cb35d
> +        .quad 0xbfbdb5270187d927
> +        .quad 0xbfbe27076e2af2e6
> +        .quad 0xbfbe98b549671467
> +        .quad 0xbfbf0a30c01162a6
> +        .quad 0xbfbf7b79fec37ddf
> +        .quad 0xbfbfec9131dbeabb
> +        .quad 0xbfc02ebb42bf3d4b
> +        .quad 0xbfc0671512ca596e
> +        .quad 0xbfc09f561ee719c3
> +        .quad 0xbfc0d77e7cd08e59
> +        .quad 0xbfc10f8e422539b1
> +        .quad 0xbfc14785846742ac
> +        .quad 0xbfc17f6458fca611
> +        .quad 0xbfc1b72ad52f67a0
> +        .quad 0xbfc1eed90e2dc2c3
> +        .quad 0xbfc2266f190a5acb
> +        .quad 0xbfc25ded0abc6ad2
> +        .quad 0xbfc29552f81ff523
> +        .quad 0xbfc2cca0f5f5f251
> +        .quad 0xbfc303d718e47fd3
> +        .quad 0xbfc33af575770e4f
> +        .quad 0xbfc371fc201e8f74
> +        .quad 0xbfc3a8eb2d31a376
> +        .quad 0xbfc3dfc2b0ecc62a
> +        .quad 0xbfc41682bf727bc0
> +        .quad 0xbfc44d2b6ccb7d1e
> +        .quad 0xbfc483bccce6e3dd
> +        .quad 0xbfc4ba36f39a55e5
> +        .quad 0xbfc4f099f4a230b2
> +        .quad 0xbfc526e5e3a1b438
> +        .quad 0xbfc55d1ad4232d6f
> +        .quad 0xbfc59338d9982086
> +        .quad 0xbfc5c940075972b9
> +        .quad 0xbfc5ff3070a793d4
> +        .quad 0xbfc6350a28aaa758
> +        .quad 0xbfc66acd4272ad51
> +        .quad 0xbfc6a079d0f7aad2
> +        .quad 0xbfc6d60fe719d21d
> +        .quad 0xbfc70b8f97a1aa75
> +        .quad 0xbfc740f8f54037a5
> +        .quad 0xbfc7764c128f2127
> +        .quad 0xbfc7ab890210d909
> +        .quad 0xbfc7e0afd630c274
> +        .quad 0xbfc815c0a14357eb
> +        .quad 0xbfc84abb75865139
> +        .quad 0xbfc87fa06520c911
> +        .quad 0xbfc8b46f8223625b
> +        .quad 0xbfc8e928de886d41
> +        .quad 0xbfc91dcc8c340bde
> +        .quad 0xbfc9525a9cf456b4
> +        .quad 0xbfc986d3228180ca
> +        .quad 0xbfc9bb362e7dfb83
> +        .quad 0xbfc9ef83d2769a34
> +        .quad 0xbfca23bc1fe2b563
> +        .quad 0xbfca57df28244dcd
> +        .quad 0xbfca8becfc882f19
> +        .quad 0xbfcabfe5ae46124c
> +        .quad 0xbfcaf3c94e80bff3
> +        .quad 0xbfcb2797ee46320c
> +        .quad 0xbfcb5b519e8fb5a4
> +        .quad 0xbfcb8ef670420c3b
> +        .quad 0xbfcbc286742d8cd6
> +        .quad 0xbfcbf601bb0e44e2
> +        .quad 0xbfcc2968558c18c1
> +        .quad 0xbfcc5cba543ae425
> +        .quad 0xbfcc8ff7c79a9a22
> +        .quad 0xbfccc320c0176502
> +        .quad 0xbfccf6354e09c5dc
> +        .quad 0xbfcd293581b6b3e7
> +        .quad 0xbfcd5c216b4fbb91
> +        .quad 0xbfcd8ef91af31d5e
> +        .quad 0xbfcdc1bca0abec7d
> +        .quad 0xbfcdf46c0c722d2f
> +        .quad 0xbfce27076e2af2e6
> +        .quad 0xbfce598ed5a87e2f
> +        .quad 0xbfce8c0252aa5a60
> +        .quad 0xbfcebe61f4dd7b0b
> +        .quad 0xbfcef0adcbdc5936
> +        .quad 0xbfcf22e5e72f105d
> +        .quad 0xbfcf550a564b7b37
> +        .quad 0xbfcf871b28955045
> +        .quad 0xbfcfb9186d5e3e2b
> +        .quad 0xbfcfeb0233e607cc
> +        .quad 0xbfd00e6c45ad501d
> +        .quad 0xbfd0274dc16c232f
> +        .quad 0xbfd0402594b4d041
> +        .quad 0xbfd058f3c703ebc6
> +        .quad 0xbfd071b85fcd590d
> +        .quad 0xbfd08a73667c57af
> +        .quad 0xbfd0a324e27390e3
> +        .quad 0xbfd0bbccdb0d24bd
> +        .quad 0xbfd0d46b579ab74b
> +        .quad 0xbfd0ed005f657da4
> +        .quad 0xbfd1058bf9ae4ad5
> +        .quad 0xbfd11e0e2dad9cb7
> +        .quad 0xbfd136870293a8b0
> +        .quad 0xbfd14ef67f88685a
> +        .quad 0xbfd1675cababa60e
> +        .quad 0xbfd17fb98e15095d
> +        .quad 0xbfd1980d2dd4236f
> +        .quad 0xbfd1b05791f07b49
> +        .quad 0xbfd1c898c16999fb
> +        .quad 0xbfd1e0d0c33716be
> +        .quad 0xbfd1f8ff9e48a2f3
> +        .quad 0xbfd211255986160c
> +        .quad 0xbfd22941fbcf7966
> +        .quad 0xbfd241558bfd1404
> +        .quad 0xbfd2596010df763a
> +        .quad 0xbfd27161913f853d
> +        .quad 0xbfd2895a13de86a3
> +        .quad 0xbfd2a1499f762bc9
> +        .quad 0xbfd2b9303ab89d25
> +        .quad 0xbfd2d10dec508583
> +        .quad 0xbfd2e8e2bae11d31
> +        .quad 0xbfd300aead06350c
> +        .quad 0xbfd31871c9544185
> +        .quad 0xbfd3302c16586588
> +        .quad 0xbfd347dd9a987d55
> +        .quad 0xbfd35f865c93293e
> +        .quad 0xbfd3772662bfd85b
> +        .quad 0xbfd38ebdb38ed321
> +        .quad 0xbfd3a64c556945ea
> +        .quad 0xbfd3bdd24eb14b6a
> +        .quad 0xbfd3d54fa5c1f710
> +        .quad 0xbfd3ecc460ef5f50
> +        .quad 0xbfd404308686a7e4
> +        .quad 0xbfd41b941cce0bee
> +        .quad 0xbfd432ef2a04e814
> +        .quad 0xbfd44a41b463c47c
> +        .quad 0xbfd4618bc21c5ec2
> +        .quad 0xbfd478cd5959b3d9
> +        .quad 0xbfd49006804009d1
> +        .quad 0xbfd4a7373cecf997
> +        .quad 0xbfd4be5f957778a1
> +        .quad 0xbfd4d57f8fefe27f
> +        .quad 0xbfd4ec973260026a
> +        .quad 0xbfd503a682cb1cb3
> +        .quad 0xbfd51aad872df82d
> +        .quad 0xbfd531ac457ee77e
> +        .quad 0xbfd548a2c3add263
> +        .quad 0xbfd55f9107a43ee2
> +        .quad 0xbfd5767717455a6c
> +        .quad 0xbfd58d54f86e02f2
> +        .quad 0xbfd5a42ab0f4cfe2
> +        .quad 0xbfd5baf846aa1b19
> +        .quad 0xbfd5d1bdbf5809ca
> +        .quad 0xbfd5e87b20c2954a
> +        .quad 0xbfd5ff3070a793d4
> +        .quad 0xbfd615ddb4bec13c
> +        .quad 0xbfd62c82f2b9c795
> +        .quad 0x3fd61965cdb02c1f
> +        .quad 0x3fd602d08af091ec
> +        .quad 0x3fd5ec433d5c35ae
> +        .quad 0x3fd5d5bddf595f30
> +        .quad 0x3fd5bf406b543db2
> +        .quad 0x3fd5a8cadbbedfa1
> +        .quad 0x3fd5925d2b112a59
> +        .quad 0x3fd57bf753c8d1fb
> +        .quad 0x3fd565995069514c
> +        .quad 0x3fd54f431b7be1a9
> +        .quad 0x3fd538f4af8f72fe
> +        .quad 0x3fd522ae0738a3d8
> +        .quad 0x3fd50c6f1d11b97c
> +        .quad 0x3fd4f637ebba9810
> +        .quad 0x3fd4e0086dd8baca
> +        .quad 0x3fd4c9e09e172c3c
> +        .quad 0x3fd4b3c077267e9a
> +        .quad 0x3fd49da7f3bcc41f
> +        .quad 0x3fd487970e958770
> +        .quad 0x3fd4718dc271c41b
> +        .quad 0x3fd45b8c0a17df13
> +        .quad 0x3fd44591e0539f49
> +        .quad 0x3fd42f9f3ff62642
> +        .quad 0x3fd419b423d5e8c7
> +        .quad 0x3fd403d086cea79c
> +        .quad 0x3fd3edf463c1683e
> +        .quad 0x3fd3d81fb5946dba
> +        .quad 0x3fd3c25277333184
> +        .quad 0x3fd3ac8ca38e5c5f
> +        .quad 0x3fd396ce359bbf54
> +        .quad 0x3fd3811728564cb2
> +        .quad 0x3fd36b6776be1117
> +        .quad 0x3fd355bf1bd82c8b
> +        .quad 0x3fd3401e12aecba1
> +        .quad 0x3fd32a84565120a8
> +        .quad 0x3fd314f1e1d35ce4
> +        .quad 0x3fd2ff66b04ea9d4
> +        .quad 0x3fd2e9e2bce12286
> +        .quad 0x3fd2d46602adccee
> +        .quad 0x3fd2bef07cdc9354
> +        .quad 0x3fd2a982269a3dbf
> +        .quad 0x3fd2941afb186b7c
> +        .quad 0x3fd27ebaf58d8c9d
> +        .quad 0x3fd269621134db92
> +        .quad 0x3fd25410494e56c7
> +        .quad 0x3fd23ec5991eba49
> +        .quad 0x3fd22981fbef797b
> +        .quad 0x3fd214456d0eb8d4
> +        .quad 0x3fd1ff0fe7cf47a7
> +        .quad 0x3fd1e9e1678899f4
> +        .quad 0x3fd1d4b9e796c245
> +        .quad 0x3fd1bf99635a6b95
> +        .quad 0x3fd1aa7fd638d33f
> +        .quad 0x3fd1956d3b9bc2fa
> +        .quad 0x3fd180618ef18adf
> +        .quad 0x3fd16b5ccbacfb73
> +        .quad 0x3fd1565eed455fc3
> +        .quad 0x3fd14167ef367783
> +        .quad 0x3fd12c77cd00713b
> +        .quad 0x3fd1178e8227e47c
> +        .quad 0x3fd102ac0a35cc1c
> +        .quad 0x3fd0edd060b78081
> +        .quad 0x3fd0d8fb813eb1ef
> +        .quad 0x3fd0c42d676162e3
> +        .quad 0x3fd0af660eb9e279
> +        .quad 0x3fd09aa572e6c6d4
> +        .quad 0x3fd085eb8f8ae797
> +        .quad 0x3fd07138604d5862
> +        .quad 0x3fd05c8be0d9635a
> +        .quad 0x3fd047e60cde83b8
> +        .quad 0x3fd03346e0106062
> +        .quad 0x3fd01eae5626c691
> +        .quad 0x3fd00a1c6adda473
> +        .quad 0x3fcfeb2233ea07cd
> +        .quad 0x3fcfc218be620a5e
> +        .quad 0x3fcf991c6cb3b379
> +        .quad 0x3fcf702d36777df0
> +        .quad 0x3fcf474b134df229
> +        .quad 0x3fcf1e75fadf9bde
> +        .quad 0x3fcef5ade4dcffe6
> +        .quad 0x3fceccf2c8fe920a
> +        .quad 0x3fcea4449f04aaf5
> +        .quad 0x3fce7ba35eb77e2a
> +        .quad 0x3fce530effe71012
> +        .quad 0x3fce2a877a6b2c12
> +        .quad 0x3fce020cc6235ab5
> +        .quad 0x3fcdd99edaf6d7e9
> +        .quad 0x3fcdb13db0d48940
> +        .quad 0x3fcd88e93fb2f450
> +        .quad 0x3fcd60a17f903515
> +        .quad 0x3fcd38666871f465
> +        .quad 0x3fcd1037f2655e7b
> +        .quad 0x3fcce816157f1988
> +        .quad 0x3fccc000c9db3c52
> +        .quad 0x3fcc97f8079d44ec
> +        .quad 0x3fcc6ffbc6f00f71
> +        .quad 0x3fcc480c0005ccd1
> +        .quad 0x3fcc2028ab17f9b4
> +        .quad 0x3fcbf851c067555f
> +        .quad 0x3fcbd087383bd8ad
> +        .quad 0x3fcba8c90ae4ad19
> +        .quad 0x3fcb811730b823d2
> +        .quad 0x3fcb5971a213acdb
> +        .quad 0x3fcb31d8575bce3d
> +        .quad 0x3fcb0a4b48fc1b46
> +        .quad 0x3fcae2ca6f672bd4
> +        .quad 0x3fcabb55c31693ad
> +        .quad 0x3fca93ed3c8ad9e3
> +        .quad 0x3fca6c90d44b704e
> +        .quad 0x3fca454082e6ab05
> +        .quad 0x3fca1dfc40f1b7f1
> +        .quad 0x3fc9f6c407089664
> +        .quad 0x3fc9cf97cdce0ec3
> +        .quad 0x3fc9a8778debaa38
> +        .quad 0x3fc981634011aa75
> +        .quad 0x3fc95a5adcf7017f
> +        .quad 0x3fc9335e5d594989
> +        .quad 0x3fc90c6db9fcbcd9
> +        .quad 0x3fc8e588ebac2dbf
> +        .quad 0x3fc8beafeb38fe8c
> +        .quad 0x3fc897e2b17b19a5
> +        .quad 0x3fc871213750e994
> +        .quad 0x3fc84a6b759f512f
> +        .quad 0x3fc823c16551a3c2
> +        .quad 0x3fc7fd22ff599d4f
> +        .quad 0x3fc7d6903caf5ad0
> +        .quad 0x3fc7b0091651528c
> +        .quad 0x3fc7898d85444c73
> +        .quad 0x3fc7631d82935a86
> +        .quad 0x3fc73cb9074fd14d
> +        .quad 0x3fc716600c914054
> +        .quad 0x3fc6f0128b756abc
> +        .quad 0x3fc6c9d07d203fc7
> +        .quad 0x3fc6a399dabbd383
> +        .quad 0x3fc67d6e9d785771
> +        .quad 0x3fc6574ebe8c133a
> +        .quad 0x3fc6313a37335d76
> +        .quad 0x3fc60b3100b09476
> +        .quad 0x3fc5e533144c1719
> +        .quad 0x3fc5bf406b543db2
> +        .quad 0x3fc59958ff1d52f1
> +        .quad 0x3fc5737cc9018cdd
> +        .quad 0x3fc54dabc26105d2
> +        .quad 0x3fc527e5e4a1b58d
> +        .quad 0x3fc5022b292f6a45
> +        .quad 0x3fc4dc7b897bc1c8
> +        .quad 0x3fc4b6d6fefe22a4
> +        .quad 0x3fc4913d8333b561
> +        .quad 0x3fc46baf0f9f5db7
> +        .quad 0x3fc4462b9dc9b3dc
> +        .quad 0x3fc420b32740fdd4
> +        .quad 0x3fc3fb45a59928cc
> +        .quad 0x3fc3d5e3126bc27f
> +        .quad 0x3fc3b08b6757f2a9
> +        .quad 0x3fc38b3e9e027479
> +        .quad 0x3fc365fcb0159016
> +        .quad 0x3fc340c59741142e
> +        .quad 0x3fc31b994d3a4f85
> +        .quad 0x3fc2f677cbbc0a96
> +        .quad 0x3fc2d1610c86813a
> +        .quad 0x3fc2ac55095f5c59
> +        .quad 0x3fc28753bc11aba5
> +        .quad 0x3fc2625d1e6ddf57
> +        .quad 0x3fc23d712a49c202
> +        .quad 0x3fc2188fd9807263
> +        .quad 0x3fc1f3b925f25d41
> +        .quad 0x3fc1ceed09853752
> +        .quad 0x3fc1aa2b7e23f72a
> +        .quad 0x3fc185747dbecf34
> +        .quad 0x3fc160c8024b27b1
> +        .quad 0x3fc13c2605c398c3
> +        .quad 0x3fc1178e8227e47c
> +        .quad 0x3fc0f301717cf0fb
> +        .quad 0x3fc0ce7ecdccc28d
> +        .quad 0x3fc0aa06912675d5
> +        .quad 0x3fc08598b59e3a07
> +        .quad 0x3fc06135354d4b18
> +        .quad 0x3fc03cdc0a51ec0d
> +        .quad 0x3fc0188d2ecf6140
> +        .quad 0x3fbfe89139dbd566
> +        .quad 0x3fbfa01c9db57ce2
> +        .quad 0x3fbf57bc7d9005db
> +        .quad 0x3fbf0f70cdd992e3
> +        .quad 0x3fbec739830a1120
> +        .quad 0x3fbe7f1691a32d3e
> +        .quad 0x3fbe3707ee30487b
> +        .quad 0x3fbdef0d8d466db9
> +        .quad 0x3fbda727638446a2
> +        .quad 0x3fbd5f55659210e2
> +        .quad 0x3fbd179788219364
> +        .quad 0x3fbccfedbfee13a8
> +        .quad 0x3fbc885801bc4b23
> +        .quad 0x3fbc40d6425a5cb1
> +        .quad 0x3fbbf968769fca11
> +        .quad 0x3fbbb20e936d6974
> +        .quad 0x3fbb6ac88dad5b1c
> +        .quad 0x3fbb23965a52ff00
> +        .quad 0x3fbadc77ee5aea8c
> +        .quad 0x3fba956d3ecade63
> +        .quad 0x3fba4e7640b1bc38
> +        .quad 0x3fba0792e9277cac
> +        .quad 0x3fb9c0c32d4d2548
> +        .quad 0x3fb97a07024cbe74
> +        .quad 0x3fb9335e5d594989
> +        .quad 0x3fb8ecc933aeb6e8
> +        .quad 0x3fb8a6477a91dc29
> +        .quad 0x3fb85fd927506a48
> +        .quad 0x3fb8197e2f40e3f0
> +        .quad 0x3fb7d33687c293c9
> +        .quad 0x3fb78d02263d82d3
> +        .quad 0x3fb746e100226ed9
> +        .quad 0x3fb700d30aeac0e1
> +        .quad 0x3fb6bad83c1883b6
> +        .quad 0x3fb674f089365a7a
> +        .quad 0x3fb62f1be7d77743
> +        .quad 0x3fb5e95a4d9791cb
> +        .quad 0x3fb5a3abb01ade25
> +        .quad 0x3fb55e10050e0384
> +        .quad 0x3fb518874226130a
> +        .quad 0x3fb4d3115d207eac
> +        .quad 0x3fb48dae4bc31018
> +        .quad 0x3fb4485e03dbdfad
> +        .quad 0x3fb403207b414b7f
> +        .quad 0x3fb3bdf5a7d1ee64
> +        .quad 0x3fb378dd7f749714
> +        .quad 0x3fb333d7f8183f4b
> +        .quad 0x3fb2eee507b40301
> +        .quad 0x3fb2aa04a44717a5
> +        .quad 0x3fb26536c3d8c369
> +        .quad 0x3fb2207b5c78549e
> +        .quad 0x3fb1dbd2643d190b
> +        .quad 0x3fb1973bd1465567
> +        .quad 0x3fb152b799bb3cc9
> +        .quad 0x3fb10e45b3cae831
> +        .quad 0x3fb0c9e615ac4e17
> +        .quad 0x3fb08598b59e3a07
> +        .quad 0x3fb0415d89e74444
> +        .quad 0x3faffa6911ab9301
> +        .quad 0x3faf723b517fc523
> +        .quad 0x3faeea31c006b87c
> +        .quad 0x3fae624c4a0b5e1b
> +        .quad 0x3fadda8adc67ee4e
> +        .quad 0x3fad52ed6405d86f
> +        .quad 0x3faccb73cdddb2cc
> +        .quad 0x3fac441e06f72a9e
> +        .quad 0x3fabbcebfc68f420
> +        .quad 0x3fab35dd9b58baad
> +        .quad 0x3faaaef2d0fb10fc
> +        .quad 0x3faa282b8a936171
> +        .quad 0x3fa9a187b573de7c
> +        .quad 0x3fa91b073efd7314
> +        .quad 0x3fa894aa149fb343
> +        .quad 0x3fa80e7023d8ccc4
> +        .quad 0x3fa788595a3577ba
> +        .quad 0x3fa70265a550e777
> +        .quad 0x3fa67c94f2d4bb58
> +        .quad 0x3fa5f6e73078efb8
> +        .quad 0x3fa5715c4c03ceef
> +        .quad 0x3fa4ebf43349e26f
> +        .quad 0x3fa466aed42de3ea
> +        .quad 0x3fa3e18c1ca0ae92
> +        .quad 0x3fa35c8bfaa1306b
> +        .quad 0x3fa2d7ae5c3c5bae
> +        .quad 0x3fa252f32f8d183f
> +        .quad 0x3fa1ce5a62bc353a
> +        .quad 0x3fa149e3e4005a8d
> +        .quad 0x3fa0c58fa19dfaaa
> +        .quad 0x3fa0415d89e74444
> +        .quad 0x3f9f7a9b16782856
> +        .quad 0x3f9e72bf2813ce51
> +        .quad 0x3f9d6b2725979802
> +        .quad 0x3f9c63d2ec14aaf2
> +        .quad 0x3f9b5cc258b718e6
> +        .quad 0x3f9a55f548c5c43f
> +        .quad 0x3f994f6b99a24475
> +        .quad 0x3f98492528c8cabf
> +        .quad 0x3f974321d3d006d3
> +        .quad 0x3f963d6178690bd6
> +        .quad 0x3f9537e3f45f3565
> +        .quad 0x3f9432a925980cc1
> +        .quad 0x3f932db0ea132e22
> +        .quad 0x3f9228fb1fea2e28
> +        .quad 0x3f912487a5507f70
> +        .quad 0x3f90205658935847
> +        .quad 0x3f8e38ce3033310c
> +        .quad 0x3f8c317384c75f06
> +        .quad 0x3f8a2a9c6c170462
> +        .quad 0x3f882448a388a2aa
> +        .quad 0x3f861e77e8b53fc6
> +        .quad 0x3f841929f96832f0
> +        .quad 0x3f82145e939ef1e9
> +        .quad 0x3f8010157588de71
> +        .quad 0x3f7c189cbb0e27fb
> +        .quad 0x3f78121214586b54
> +        .quad 0x3f740c8a747878e2
> +        .quad 0x3f70080559588b35
> +        .quad 0x3f680904828985c0
> +        .quad 0x3f60040155d5889e
> +        .quad 0x3f50020055655889
> +        .quad 0x0000000000000000
> +        /*== poly_coeff[4] ==*/
> +        .align 16
> +        .quad 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A /* coeff4 */
> +        .quad 0xbfd0000148058EE1, 0xbfd0000148058EE1 /* coeff3 */
> +        .quad 0x3fd55555555543C5, 0x3fd55555555543C5 /* coeff2 */
> +        .quad 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F /* coeff1 */
> +        /*== ExpMask ==*/
> +        .align 16
> +        .quad 0x000fffffffffffff, 0x000fffffffffffff
> +        /*== Two10 ==*/
> +        .align 16
> +        .quad 0x3f50000000000000, 0x3f50000000000000
> +        /*== MinLog1p = -1+2^(-53) ==*/
> +        .align 16
> +        .quad 0xbfefffffffffffff, 0xbfefffffffffffff
> +        /*== MaxLog1p ==*/
> +        .align 16
> +        .quad 0x7f3ffffffffff000, 0x7f3ffffffffff000
> +        /*== One ==*/
> +        .align 16
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== SgnMask ==*/
> +        .align 16
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff
> +        /*== XThreshold ==*/
> +        .align 16
> +        .quad 0x3e00000000000000, 0x3e00000000000000
> +        /*== XhMask ==*/
> +        .align 16
> +        .quad 0xfffffffffffffc00, 0xfffffffffffffc00
> +        /*== Threshold ==*/
> +        .align 16
> +        .quad 0x4086a00000000000, 0x4086a00000000000
> +        /*== Bias ==*/
> +        .align 16
> +        .quad 0x408ff80000000000, 0x408ff80000000000
> +        /*== Bias1 ==*/
> +        .align 16
> +        .quad 0x408ff00000000000, 0x408ff00000000000
> +        /*== ExpMask ==*/
> +        .align 16
> +        .quad 0x7ff0000000000000, 0x7ff0000000000000
> +        /*== ExpMask2 ==*/
> +        .align 16
> +        .quad 0x7f40000000000000, 0x7f40000000000000
> +        /*== L2L ==*/
> +        .align 16
> +        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
> +        .align 16
> +        .type	__svml_dlog1p_data_internal,@object
> +        .size	__svml_dlog1p_data_internal,.-__svml_dlog1p_data_internal
> +        .space 96, 0x00
> +        .align 16
> +
> +.FLT_16:
> +        .long	0x00000000,0x43380000,0x00000000,0x43380000
> +        .type	.FLT_16,@object
> +        .size	.FLT_16,16
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core-sse.S
> new file mode 100644
> index 0000000000..ec01af680c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized log1p, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN4v_log1p _ZGVdN4v_log1p_sse_wrapper
> +#include "../svml_d_log1p4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core.c
> new file mode 100644
> index 0000000000..808f3224ef
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized log1p, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN4v_log1p
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN4v_log1p, __GI__ZGVdN4v_log1p, __redirect__ZGVdN4v_log1p)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
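
For context, a minimal hypothetical caller: assuming the usual libmvec SIMD
declarations for log1p are in place, a loop like the one below is the kind of
code a vectorizing compiler is expected to turn into calls to
_ZGVdN4v_log1p, which the ifunc above then resolves to the AVX2 kernel at run
time.  The function name and the flag suggestion (something along the lines of
gcc -O2 -mavx2 -ffast-math -fopenmp-simd) are mine, not part of the patch, and
exact requirements depend on the compiler.

#include <math.h>

void
log1p_array (const double *restrict x, double *restrict y, int n)
{
#pragma omp simd
  for (int i = 0; i < n; i++)
    y[i] = log1p (x[i]);          /* candidate for _ZGVdN4v_log1p */
}
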
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core_avx2.S
> new file mode 100644
> index 0000000000..548538b0ec
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core_avx2.S
> @@ -0,0 +1,1383 @@
> +/* Function log1p vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
> + *    Get short reciprocal approximation Rcp ~ 1/xh
> + *    R = (Rcp*xh - 1.0) + Rcp*xl
> + *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
> + *       log(Rcp) is tabulated
> + *
> + */
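
To make the comment above concrete, here is a rough scalar model (one lane,
main path only, no special-value handling) of how I read the reduction.  The
real kernel gets Rcp from RCPPS rounded to roughly 9 mantissa bits and reads
-log(Rcp) from the Log_LA_table below; this sketch rounds 1/mh by hand and
calls log() in place of the table, so it is only an illustration, not the
implementation.  All names are mine; the ln2 constant matches the L2 table
entry.

#include <math.h>
#include <stdio.h>

static double
log1p_model (double x)
{
  /* 1+x = 2^k * (xh + xl), xh carrying the leading bits (Fast2Sum,
     mirroring the vmaxpd/vminpd/vsubpd/vaddpd sequence).  */
  double hi = fmax (x, 1.0), lo = fmin (x, 1.0);
  double xh = hi + lo;                  /* leading part of 1+x */
  double xl = (hi - xh) + lo;           /* rounding error of that sum */

  int k;
  double mh = frexp (xh, &k);           /* xh = 2^k * mh, mh in [0.5, 1) */
  double ml = ldexp (xl, -k);

  /* Short reciprocal, rounded to ~9 mantissa bits (stand-in for RCPPS
     followed by vroundpd).  */
  double rcp = ldexp (nearbyint (ldexp (1.0 / mh, 9)), -9);

  /* Reduced argument R = (Rcp*xh - 1.0) + Rcp*xl, as in the comment.  */
  double r = (rcp * mh - 1.0) + rcp * ml;

  /* poly(R) ~ log(1+R); the data section uses coefficients close to
     -1/2, 1/3, -1/4, 1/5.  */
  double poly = r + r * r * (-0.5 + r * (1.0 / 3 + r * (-0.25 + r * 0.2)));

  const double ln2 = 0x1.62e42fefa39efp-1;    /* same value as L2 below */
  return k * ln2 - log (rcp) + poly;          /* -log(rcp) is tabulated */
}

int
main (void)
{
  double x = 0.3;
  printf ("%.17g vs %.17g\n", log1p_model (x), log1p (x));
  return 0;
}
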
> +
> +/* Offsets for data table __svml_dlog1p_data_internal
> + */
> +#define Log_HA_table                  	0
> +#define Log_LA_table                  	8224
> +#define poly_coeff                    	12352
> +#define ExpMask                       	12480
> +#define Two10                         	12512
> +#define MinLog1p                      	12544
> +#define MaxLog1p                      	12576
> +#define One                           	12608
> +#define SgnMask                       	12640
> +#define XThreshold                    	12672
> +#define XhMask                        	12704
> +#define Threshold                     	12736
> +#define Bias                          	12768
> +#define Bias1                         	12800
> +#define ExpMask0                      	12832
> +#define ExpMask2                      	12864
> +#define L2                            	12896
> +
> +/* Lookup bias for data table __svml_dlog1p_data_internal.  */
> +#define Table_Lookup_Bias               -0x405fe0
> +
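
My reading of where -0x405fe0 comes from: the table index is the rounded
reciprocal's raw bits shifted right by 40 (the vpsrlq $40 below), and since
that reciprocal lives in roughly [2^9, 2^10] its smallest shifted value is
0x408000, so biasing the base pointer by Log_LA_table - 0x408000 =
8224 - 0x408000 = -0x405fe0 lets the shifted bits be used directly as a byte
offset into Log_LA_table.  A tiny check of that arithmetic (my derivation,
not something stated in the patch):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int
main (void)
{
  double dblrcp_min = 512.0;            /* 2^9, bottom of the DblRcp range */
  uint64_t bits;
  memcpy (&bits, &dblrcp_min, sizeof bits);

  long index = (long) (bits >> 40);     /* what vpsrlq $40 produces */
  long log_la_table = 8224;             /* Log_LA_table offset above */

  assert (index == 0x408000);
  assert (log_la_table - index == -0x405fe0);
  printf ("index = %#lx, Log_LA_table - index = %ld\n",
          (unsigned long) index, log_la_table - index);
  return 0;
}
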
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN4v_log1p_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +        lea       Table_Lookup_Bias+__svml_dlog1p_data_internal(%rip), %r8
> +
> +/* SgnMask used by all accuracies */
> +        vmovupd   SgnMask+__svml_dlog1p_data_internal(%rip), %ymm12
> +        vmovupd   One+__svml_dlog1p_data_internal(%rip), %ymm7
> +
> +/* 2^(-10-exp(X)) */
> +        vmovupd   ExpMask2+__svml_dlog1p_data_internal(%rip), %ymm3
> +        vmovapd   %ymm0, %ymm9
> +        vandpd    %ymm12, %ymm9, %ymm10
> +        vcmplt_oqpd XThreshold+__svml_dlog1p_data_internal(%rip), %ymm10, %ymm11
> +        vaddpd    %ymm7, %ymm9, %ymm13
> +
> +/* compute 1+x as high, low parts */
> +        vmaxpd    %ymm9, %ymm7, %ymm15
> +        vminpd    %ymm9, %ymm7, %ymm6
> +        vorpd     XhMask+__svml_dlog1p_data_internal(%rip), %ymm11, %ymm14
> +        vandpd    %ymm14, %ymm13, %ymm4
> +
> +/* preserve mantissa, set input exponent to 2^(-10) */
> +        vandpd    ExpMask+__svml_dlog1p_data_internal(%rip), %ymm4, %ymm5
> +        vorpd     Two10+__svml_dlog1p_data_internal(%rip), %ymm5, %ymm5
> +
> +/* reciprocal approximation good to at least 11 bits */
> +        vcvtpd2ps %ymm5, %xmm2
> +        vsubpd    %ymm4, %ymm15, %ymm0
> +
> +/* check range */
> +        vcmplt_oqpd MinLog1p+__svml_dlog1p_data_internal(%rip), %ymm9, %ymm15
> +        vrcpps    %xmm2, %xmm1
> +        vaddpd    %ymm0, %ymm6, %ymm6
> +        vcmpnle_uqpd MaxLog1p+__svml_dlog1p_data_internal(%rip), %ymm9, %ymm0
> +        vcvtps2pd %xmm1, %ymm11
> +
> +/* exponent of X needed to scale Xl */
> +        vandps    ExpMask0+__svml_dlog1p_data_internal(%rip), %ymm4, %ymm10
> +        vpsubq    %ymm10, %ymm3, %ymm13
> +
> +/* exponent bits */
> +        vpsrlq    $20, %ymm4, %ymm4
> +
> +/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
> +        vroundpd  $0, %ymm11, %ymm3
> +
> +/* scale DblRcp */
> +        vmulpd    %ymm13, %ymm3, %ymm2
> +
> +/* exponent*log(2.0) */
> +        vmovupd   Threshold+__svml_dlog1p_data_internal(%rip), %ymm13
> +        vfmsub213pd %ymm7, %ymm3, %ymm5
> +
> +/* Compute SignMask for all accuracies, including EP */
> +        vandnpd   %ymm9, %ymm12, %ymm8
> +        vorpd     %ymm0, %ymm15, %ymm7
> +
> +/*
> + * prepare table index
> + * table lookup
> + */
> +        vpsrlq    $40, %ymm3, %ymm0
> +
> +/*
> + * argument reduction
> + * VQFMS( D, R, X, DblRcp1, One );
> + */
> +        vfmadd213pd %ymm5, %ymm2, %ymm6
> +        vmovupd   poly_coeff+64+__svml_dlog1p_data_internal(%rip), %ymm2
> +        vcmplt_oqpd %ymm3, %ymm13, %ymm3
> +        vmulpd    %ymm6, %ymm6, %ymm5
> +        vfmadd213pd poly_coeff+96+__svml_dlog1p_data_internal(%rip), %ymm6, %ymm2
> +
> +/* combine and get argument value range mask */
> +        vmovmskpd %ymm7, %eax
> +        vextractf128 $1, %ymm4, %xmm12
> +        vshufps   $221, %xmm12, %xmm4, %xmm14
> +
> +/* biased exponent in DP format */
> +        vcvtdq2pd %xmm14, %ymm1
> +        vandpd    Bias+__svml_dlog1p_data_internal(%rip), %ymm3, %ymm14
> +        vorpd     Bias1+__svml_dlog1p_data_internal(%rip), %ymm14, %ymm15
> +        vsubpd    %ymm15, %ymm1, %ymm1
> +        vmulpd    L2+__svml_dlog1p_data_internal(%rip), %ymm1, %ymm3
> +
> +/* polynomial */
> +        vmovupd   poly_coeff+__svml_dlog1p_data_internal(%rip), %ymm1
> +        vfmadd213pd poly_coeff+32+__svml_dlog1p_data_internal(%rip), %ymm6, %ymm1
> +        vfmadd213pd %ymm2, %ymm5, %ymm1
> +
> +/* reconstruction */
> +        vfmadd213pd %ymm6, %ymm5, %ymm1
> +        vextractf128 $1, %ymm0, %xmm10
> +        vmovd     %xmm0, %edx
> +        vmovd     %xmm10, %esi
> +        movslq    %edx, %rdx
> +        vpextrd   $2, %xmm0, %ecx
> +        movslq    %esi, %rsi
> +        vpextrd   $2, %xmm10, %edi
> +        movslq    %ecx, %rcx
> +        movslq    %edi, %rdi
> +        vmovsd    (%r8,%rdx), %xmm4
> +        vmovsd    (%r8,%rsi), %xmm11
> +        vmovhpd   (%r8,%rcx), %xmm4, %xmm7
> +        vmovhpd   (%r8,%rdi), %xmm11, %xmm12
> +        vinsertf128 $1, %xmm12, %ymm7, %ymm0
> +        vaddpd    %ymm1, %ymm0, %ymm6
> +        vaddpd    %ymm6, %ymm3, %ymm0
> +
> +/* OR in the Sign of input argument to produce correct log1p(-0) */
> +        vorpd     %ymm8, %ymm0, %ymm0
> +        testl     %eax, %eax
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm9
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovupd   %ymm9, 32(%rsp)
> +        vmovupd   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0
> +
> +        xorl      %edx, %edx
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovupd   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      log1p@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 64(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN4v_log1p_avx2)
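
In C terms, the special-input path above amounts to the following: the vector
result computed earlier is kept for in-range lanes, and each lane flagged in
the 4-bit range mask is recomputed with the scalar log1p call seen at
L(SCALAR_MATH_CALL).  The helper name and signature are mine, just to show
the shape of the loop.

#include <math.h>

void
log1p4_fixup (const double in[4], double out[4], unsigned range_mask)
{
  for (int lane = 0; lane < 4; lane++)       /* L(SPECIAL_VALUES_LOOP) */
    if (range_mask & (1u << lane))           /* btl %r12d, %r13d */
      out[lane] = log1p (in[lane]);          /* call log1p@PLT */
}
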
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_dlog1p_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(32)) VUINT32 Log_HA_table[(1<<10)+2][2];
> +        __declspec(align(32)) VUINT32 Log_LA_table[(1<<9)+1][2];
> +        __declspec(align(32)) VUINT32 poly_coeff[4][4][2];
> +        __declspec(align(32)) VUINT32 ExpMask[4][2];
> +        __declspec(align(32)) VUINT32 Two10[4][2];
> +        __declspec(align(32)) VUINT32 MinLog1p[4][2];
> +        __declspec(align(32)) VUINT32 MaxLog1p[4][2];
> +        __declspec(align(32)) VUINT32 One[4][2];
> +        __declspec(align(32)) VUINT32 SgnMask[4][2];
> +        __declspec(align(32)) VUINT32 XThreshold[4][2];
> +        __declspec(align(32)) VUINT32 XhMask[4][2];
> +        __declspec(align(32)) VUINT32 Threshold[4][2];
> +        __declspec(align(32)) VUINT32 Bias[4][2];
> +        __declspec(align(32)) VUINT32 Bias1[4][2];
> +        __declspec(align(32)) VUINT32 ExpMask0[4][2];
> +        __declspec(align(32)) VUINT32 ExpMask2[4][2];
> +        __declspec(align(32)) VUINT32 L2[4][2];
> +} __svml_dlog1p_data_internal;
> +#endif
> +__svml_dlog1p_data_internal:
> +        /* Log_HA_table */
> +        .quad 0xc086232bdd7a8300, 0xbe1ce91eef3fb100
> +        .quad 0xc086232fdc7ad828, 0xbe1cefcffda73b6a
> +        .quad 0xc0862333d97d2ba0, 0xbe1cef406748f1ff
> +        .quad 0xc0862337d48378e0, 0xbe1cef2a9429925a
> +        .quad 0xc086233bcd8fb878, 0xbe1cf138d17ebecb
> +        .quad 0xc086233fc4a3e018, 0xbe1ceff2dbbbb29e
> +        .quad 0xc0862343b9c1e270, 0xbe1cf1a42aae437b
> +        .quad 0xc0862347acebaf68, 0xbe1cef3b152048af
> +        .quad 0xc086234b9e2333f0, 0xbe1cef20e127805e
> +        .quad 0xc086234f8d6a5a30, 0xbe1cf00ad6052cf4
> +        .quad 0xc08623537ac30980, 0xbe1cefc4642ee597
> +        .quad 0xc0862357662f2660, 0xbe1cf1f277d36e16
> +        .quad 0xc086235b4fb092a0, 0xbe1ceed009e8d8e6
> +        .quad 0xc086235f37492d28, 0xbe1cf1e4038cb362
> +        .quad 0xc08623631cfad250, 0xbe1cf0b0873b8557
> +        .quad 0xc086236700c75b98, 0xbe1cf15bb3227c0b
> +        .quad 0xc086236ae2b09fe0, 0xbe1cf151ef8ca9ed
> +        .quad 0xc086236ec2b87358, 0xbe1cefe1dc2cd2ed
> +        .quad 0xc0862372a0e0a780, 0xbe1cf0d1eec5454f
> +        .quad 0xc08623767d2b0b48, 0xbe1ceeefd570bbce
> +        .quad 0xc086237a57996af0, 0xbe1cee99ae91b3a7
> +        .quad 0xc086237e302d9028, 0xbe1cf0412830fbd1
> +        .quad 0xc086238206e94218, 0xbe1ceee898588610
> +        .quad 0xc0862385dbce4548, 0xbe1cee9a1fbcaaea
> +        .quad 0xc0862389aede5bc0, 0xbe1ceed8e7cc1ad6
> +        .quad 0xc086238d801b4500, 0xbe1cf10c8d059da6
> +        .quad 0xc08623914f86be18, 0xbe1ceee6c63a8165
> +        .quad 0xc08623951d228180, 0xbe1cf0c3592d2ff1
> +        .quad 0xc0862398e8f04758, 0xbe1cf0026cc4cb1b
> +        .quad 0xc086239cb2f1c538, 0xbe1cf15d48d8e670
> +        .quad 0xc08623a07b28ae60, 0xbe1cef359363787c
> +        .quad 0xc08623a44196b390, 0xbe1cefdf1ab2e82c
> +        .quad 0xc08623a8063d8338, 0xbe1cefe43c02aa84
> +        .quad 0xc08623abc91ec960, 0xbe1cf044f5ae35b7
> +        .quad 0xc08623af8a3c2fb8, 0xbe1cf0b0b4001e1b
> +        .quad 0xc08623b349975d98, 0xbe1cf1bae76dfbcf
> +        .quad 0xc08623b70731f810, 0xbe1cef0a72e13a62
> +        .quad 0xc08623bac30da1c8, 0xbe1cf184007d2b6b
> +        .quad 0xc08623be7d2bfb40, 0xbe1cf16f4b239e98
> +        .quad 0xc08623c2358ea2a0, 0xbe1cf0976acada87
> +        .quad 0xc08623c5ec3733d0, 0xbe1cf066318a16ff
> +        .quad 0xc08623c9a1274880, 0xbe1ceffaa7148798
> +        .quad 0xc08623cd54607820, 0xbe1cf23ab02e9b6e
> +        .quad 0xc08623d105e45800, 0xbe1cefdfef7d4fde
> +        .quad 0xc08623d4b5b47b20, 0xbe1cf17fece44f2b
> +        .quad 0xc08623d863d27270, 0xbe1cf18f907d0d7c
> +        .quad 0xc08623dc103fccb0, 0xbe1cee61fe072c98
> +        .quad 0xc08623dfbafe1668, 0xbe1cf022dd891e2f
> +        .quad 0xc08623e3640eda20, 0xbe1ceecc1daf4358
> +        .quad 0xc08623e70b73a028, 0xbe1cf0173c4fa380
> +        .quad 0xc08623eab12deec8, 0xbe1cf16a2150c2f4
> +        .quad 0xc08623ee553f4a30, 0xbe1cf1bf980b1f4b
> +        .quad 0xc08623f1f7a93480, 0xbe1cef8b731663c2
> +        .quad 0xc08623f5986d2dc0, 0xbe1cee9a664d7ef4
> +        .quad 0xc08623f9378cb3f0, 0xbe1cf1eda2af6400
> +        .quad 0xc08623fcd5094320, 0xbe1cf1923f9d68d7
> +        .quad 0xc086240070e45548, 0xbe1cf0747cd3e03a
> +        .quad 0xc08624040b1f6260, 0xbe1cf22ee855bd6d
> +        .quad 0xc0862407a3bbe078, 0xbe1cf0d57360c00b
> +        .quad 0xc086240b3abb4398, 0xbe1ceebc815cd575
> +        .quad 0xc086240ed01efdd0, 0xbe1cf03bfb970951
> +        .quad 0xc086241263e87f50, 0xbe1cf16e74768529
> +        .quad 0xc0862415f6193658, 0xbe1cefec64b8becb
> +        .quad 0xc086241986b28f30, 0xbe1cf0838d210baa
> +        .quad 0xc086241d15b5f448, 0xbe1cf0ea86e75b11
> +        .quad 0xc0862420a324ce28, 0xbe1cf1708d11d805
> +        .quad 0xc08624242f008380, 0xbe1ceea988c5a417
> +        .quad 0xc0862427b94a7910, 0xbe1cef166a7bbca5
> +        .quad 0xc086242b420411d0, 0xbe1cf0c9d9e86a38
> +        .quad 0xc086242ec92eaee8, 0xbe1cef0946455411
> +        .quad 0xc08624324ecbaf98, 0xbe1cefea60907739
> +        .quad 0xc0862435d2dc7160, 0xbe1cf1ed0934ce42
> +        .quad 0xc086243955624ff8, 0xbe1cf191ba746c7d
> +        .quad 0xc086243cd65ea548, 0xbe1ceeec78cf2a7e
> +        .quad 0xc086244055d2c968, 0xbe1cef345284c119
> +        .quad 0xc0862443d3c012b8, 0xbe1cf24f77355219
> +        .quad 0xc08624475027d5e8, 0xbe1cf05bf087e114
> +        .quad 0xc086244acb0b65d0, 0xbe1cef3504a32189
> +        .quad 0xc086244e446c1398, 0xbe1ceff54b2a406f
> +        .quad 0xc0862451bc4b2eb8, 0xbe1cf0757d54ed4f
> +        .quad 0xc086245532aa04f0, 0xbe1cf0c8099fdfd5
> +        .quad 0xc0862458a789e250, 0xbe1cf0b173796a31
> +        .quad 0xc086245c1aec1138, 0xbe1cf11d8734540d
> +        .quad 0xc086245f8cd1da60, 0xbe1cf1916a723ceb
> +        .quad 0xc0862462fd3c84d8, 0xbe1cf19a911e1da7
> +        .quad 0xc08624666c2d5608, 0xbe1cf23a9ef72e4f
> +        .quad 0xc0862469d9a591c0, 0xbe1cef503d947663
> +        .quad 0xc086246d45a67a18, 0xbe1cf0fceeb1a0b2
> +        .quad 0xc0862470b0314fa8, 0xbe1cf107e27e4fbc
> +        .quad 0xc086247419475160, 0xbe1cf03dd9922331
> +        .quad 0xc086247780e9bc98, 0xbe1cefce1a10e129
> +        .quad 0xc086247ae719cd18, 0xbe1ceea47f73c4f6
> +        .quad 0xc086247e4bd8bd10, 0xbe1ceec0ac56d100
> +        .quad 0xc0862481af27c528, 0xbe1cee8a6593278a
> +        .quad 0xc086248511081c70, 0xbe1cf2231dd9dec7
> +        .quad 0xc0862488717af888, 0xbe1cf0b4b8ed7da8
> +        .quad 0xc086248bd0818d68, 0xbe1cf1bd8d835002
> +        .quad 0xc086248f2e1d0d98, 0xbe1cf259acc107f4
> +        .quad 0xc08624928a4eaa20, 0xbe1cee897636b00c
> +        .quad 0xc0862495e5179270, 0xbe1cee757f20c326
> +        .quad 0xc08624993e78f490, 0xbe1cefafd3aa54a4
> +        .quad 0xc086249c9673fd10, 0xbe1cee7298d38b97
> +        .quad 0xc086249fed09d6f8, 0xbe1ceedc158d4ceb
> +        .quad 0xc08624a3423babe0, 0xbe1cf2282987cb2e
> +        .quad 0xc08624a6960aa400, 0xbe1cefe7381ecc4b
> +        .quad 0xc08624a9e877e600, 0xbe1cef328dbbce80
> +        .quad 0xc08624ad39849728, 0xbe1cefde45f3cc71
> +        .quad 0xc08624b08931db58, 0xbe1cefa8b89433b9
> +        .quad 0xc08624b3d780d500, 0xbe1cef6773c0b139
> +        .quad 0xc08624b72472a528, 0xbe1cf031c931c11f
> +        .quad 0xc08624ba70086b78, 0xbe1cf088f49275e7
> +        .quad 0xc08624bdba434630, 0xbe1cf17de0eaa86d
> +        .quad 0xc08624c103245238, 0xbe1cefd492f1ba75
> +        .quad 0xc08624c44aacab08, 0xbe1cf1253e154466
> +        .quad 0xc08624c790dd6ad0, 0xbe1cf0fb09ee6d55
> +        .quad 0xc08624cad5b7aa58, 0xbe1cf1f08dd048fe
> +        .quad 0xc08624ce193c8120, 0xbe1ceeca0809697f
> +        .quad 0xc08624d15b6d0538, 0xbe1cef8d5662d968
> +        .quad 0xc08624d49c4a4b78, 0xbe1cee97b556ed78
> +        .quad 0xc08624d7dbd56750, 0xbe1cf1b14b6acb75
> +        .quad 0xc08624db1a0f6b00, 0xbe1cef1e860623f2
> +        .quad 0xc08624de56f96758, 0xbe1ceeaf4d156f3d
> +        .quad 0xc08624e192946bf0, 0xbe1ceecc12b400ed
> +        .quad 0xc08624e4cce18710, 0xbe1cf180c40c794f
> +        .quad 0xc08624e805e1c5c8, 0xbe1cf185a08f7f65
> +        .quad 0xc08624eb3d9633d8, 0xbe1cef45fc924078
> +        .quad 0xc08624ee73ffdbb0, 0xbe1cf1e4f457f32a
> +        .quad 0xc08624f1a91fc6a0, 0xbe1cf040147b8a5a
> +        .quad 0xc08624f4dcf6fc98, 0xbe1cf1effca0dfb2
> +        .quad 0xc08624f80f868468, 0xbe1cf0470146e5bc
> +        .quad 0xc08624fb40cf6390, 0xbe1cef4dd186e501
> +        .quad 0xc08624fe70d29e60, 0xbe1ceebe257f66c7
> +        .quad 0xc08625019f9137f0, 0xbe1ceefb7a1c395c
> +        .quad 0xc0862504cd0c3220, 0xbe1cf209dedfed8c
> +        .quad 0xc0862507f9448db0, 0xbe1cf082da464994
> +        .quad 0xc086250b243b4a18, 0xbe1cee88694a73cf
> +        .quad 0xc086250e4df165a0, 0xbe1cf0b61e8f0531
> +        .quad 0xc08625117667dd78, 0xbe1cf1106599c962
> +        .quad 0xc08625149d9fad98, 0xbe1ceff1ee88af1f
> +        .quad 0xc0862517c399d0c8, 0xbe1cf0f746994ef6
> +        .quad 0xc086251ae85740b8, 0xbe1cefe8a1d077e4
> +        .quad 0xc086251e0bd8f5e0, 0xbe1cf1a1da036092
> +        .quad 0xc08625212e1fe7a8, 0xbe1cf0f8a7786fcd
> +        .quad 0xc08625244f2d0c48, 0xbe1cefa1174a07a7
> +        .quad 0xc08625276f0158d8, 0xbe1cef1043aa5b25
> +        .quad 0xc086252a8d9dc150, 0xbe1cf15d521c169d
> +        .quad 0xc086252dab033898, 0xbe1cf220bba8861f
> +        .quad 0xc0862530c732b078, 0xbe1cef51e310eae2
> +        .quad 0xc0862533e22d1988, 0xbe1cf222fcedd8ae
> +        .quad 0xc0862536fbf36370, 0xbe1cefdb4da4bda8
> +        .quad 0xc086253a14867ca0, 0xbe1ceeafc1112171
> +        .quad 0xc086253d2be75280, 0xbe1cee99dfb4b408
> +        .quad 0xc08625404216d160, 0xbe1cf22d2536f06b
> +        .quad 0xc08625435715e498, 0xbe1cef6abbf2e268
> +        .quad 0xc08625466ae57648, 0xbe1cf093a14789f5
> +        .quad 0xc08625497d866fa0, 0xbe1cf0f93655603c
> +        .quad 0xc086254c8ef9b8b8, 0xbe1cf1cc40c9aafc
> +        .quad 0xc086254f9f4038a8, 0xbe1ceeea5f4e9157
> +        .quad 0xc0862552ae5ad568, 0xbe1cefa9f52d4997
> +        .quad 0xc0862555bc4a7400, 0xbe1cefa490a638ff
> +        .quad 0xc0862558c90ff868, 0xbe1cef7fcf797d6f
> +        .quad 0xc086255bd4ac4590, 0xbe1cf1b4c51113c9
> +        .quad 0xc086255edf203d78, 0xbe1cef55e5b4a55d
> +        .quad 0xc0862561e86cc100, 0xbe1cf0d37a25f9dc
> +        .quad 0xc0862564f092b028, 0xbe1ceebe9efc19d9
> +        .quad 0xc0862567f792e9d8, 0xbe1cee8ad30a57b5
> +        .quad 0xc086256afd6e4c08, 0xbe1cef4e1817b90b
> +        .quad 0xc086256e0225b3b8, 0xbe1cee7fa9229996
> +        .quad 0xc086257105b9fce0, 0xbe1cf0b54963d945
> +        .quad 0xc0862574082c0298, 0xbe1cee5f2f3c7995
> +        .quad 0xc0862577097c9ee0, 0xbe1cf0828e303a2c
> +        .quad 0xc086257a09acaae0, 0xbe1cf172c3078947
> +        .quad 0xc086257d08bcfec0, 0xbe1cf189252afa22
> +        .quad 0xc086258006ae71b8, 0xbe1cefdb80426923
> +        .quad 0xc08625830381da08, 0xbe1ceef1391a0372
> +        .quad 0xc0862585ff380d00, 0xbe1cf17720c78d13
> +        .quad 0xc0862588f9d1df18, 0xbe1ceef1f9027d83
> +        .quad 0xc086258bf35023b8, 0xbe1cf06fac99dec9
> +        .quad 0xc086258eebb3ad78, 0xbe1cf1373eeb45c0
> +        .quad 0xc0862591e2fd4e00, 0xbe1cef777536bb81
> +        .quad 0xc0862594d92dd600, 0xbe1cf0f43ca40766
> +        .quad 0xc0862597ce461558, 0xbe1cefb2cfc6766b
> +        .quad 0xc086259ac246daf0, 0xbe1ceea49e64ffa2
> +        .quad 0xc086259db530f4c8, 0xbe1cf250fa457dec
> +        .quad 0xc08625a0a7053018, 0xbe1cf17d8bb2a44e
> +        .quad 0xc08625a397c45918, 0xbe1cf1d5906d54b7
> +        .quad 0xc08625a6876f3b30, 0xbe1cf08fe7b31780
> +        .quad 0xc08625a97606a0e0, 0xbe1cef13edfc9d11
> +        .quad 0xc08625ac638b53c8, 0xbe1cef9d2b107219
> +        .quad 0xc08625af4ffe1cb0, 0xbe1cf1ddd4ff6160
> +        .quad 0xc08625b23b5fc390, 0xbe1cefa02a996495
> +        .quad 0xc08625b525b10f68, 0xbe1cf166a7e37ee5
> +        .quad 0xc08625b80ef2c680, 0xbe1cef0b171068a5
> +        .quad 0xc08625baf725ae28, 0xbe1cf05c80779283
> +        .quad 0xc08625bdde4a8af0, 0xbe1cf1bbfbffb889
> +        .quad 0xc08625c0c4622090, 0xbe1cf0b8666c0124
> +        .quad 0xc08625c3a96d31e0, 0xbe1cf0a8fcf47a86
> +        .quad 0xc08625c68d6c80f0, 0xbe1cef46e18cb092
> +        .quad 0xc08625c97060cef0, 0xbe1cf1458a350efb
> +        .quad 0xc08625cc524adc58, 0xbe1ceeea1dadce12
> +        .quad 0xc08625cf332b68b0, 0xbe1cf0a1bfdc44c7
> +        .quad 0xc08625d2130332d0, 0xbe1cef96d02da73e
> +        .quad 0xc08625d4f1d2f8a8, 0xbe1cf2451c3c7701
> +        .quad 0xc08625d7cf9b7778, 0xbe1cf10d08f83812
> +        .quad 0xc08625daac5d6ba0, 0xbe1ceec5b4895c5e
> +        .quad 0xc08625dd881990b0, 0xbe1cf14e1325c5e4
> +        .quad 0xc08625e062d0a188, 0xbe1cf21d0904be12
> +        .quad 0xc08625e33c835838, 0xbe1ceed0839bcf21
> +        .quad 0xc08625e615326df0, 0xbe1cf1bb944889d2
> +        .quad 0xc08625e8ecde9b48, 0xbe1cee738e85eece
> +        .quad 0xc08625ebc38897e0, 0xbe1cf25c2bc6ef12
> +        .quad 0xc08625ee99311ac8, 0xbe1cf132b70a41ad
> +        .quad 0xc08625f16dd8da28, 0xbe1cf1984236a6e3
> +        .quad 0xc08625f441808b78, 0xbe1cf19ae74998f9
> +        .quad 0xc08625f71428e370, 0xbe1cef3e175d61a1
> +        .quad 0xc08625f9e5d295f8, 0xbe1cf101f9868fd9
> +        .quad 0xc08625fcb67e5658, 0xbe1cee69db83dcd2
> +        .quad 0xc08625ff862cd6f8, 0xbe1cf081b636af51
> +        .quad 0xc086260254dec9a8, 0xbe1cee62c7d59b3e
> +        .quad 0xc08626052294df58, 0xbe1cf1b745c57716
> +        .quad 0xc0862607ef4fc868, 0xbe1cef3d2800ea23
> +        .quad 0xc086260abb103458, 0xbe1cef480ff1acd2
> +        .quad 0xc086260d85d6d200, 0xbe1cf2424c9a17ef
> +        .quad 0xc08626104fa44f90, 0xbe1cf12cfde90fd5
> +        .quad 0xc086261318795a68, 0xbe1cf21f590dd5b6
> +        .quad 0xc0862615e0569f48, 0xbe1cf0c50f9cd28a
> +        .quad 0xc0862618a73cca30, 0xbe1ceedbdb520545
> +        .quad 0xc086261b6d2c8668, 0xbe1cf0b030396011
> +        .quad 0xc086261e32267e98, 0xbe1cf19917010e96
> +        .quad 0xc0862620f62b5cb0, 0xbe1cf07331355985
> +        .quad 0xc0862623b93bc9e8, 0xbe1cf01ae921a1c3
> +        .quad 0xc08626267b586ed0, 0xbe1cefe5cf0dbf0c
> +        .quad 0xc08626293c81f348, 0xbe1cf01b258aeb50
> +        .quad 0xc086262bfcb8fe88, 0xbe1cee6b9e7f4c68
> +        .quad 0xc086262ebbfe3710, 0xbe1cee684a9b21c9
> +        .quad 0xc08626317a5242b8, 0xbe1cf1f8bcde9a8b
> +        .quad 0xc086263437b5c6c0, 0xbe1cf1d063d36238
> +        .quad 0xc0862636f42967a8, 0xbe1cf1e31a19075e
> +        .quad 0xc0862639afadc950, 0xbe1cf1d8efdf7e7d
> +        .quad 0xc086263c6a438ef0, 0xbe1cf1812ee72dba
> +        .quad 0xc086263f23eb5b18, 0xbe1cf1449a9a2279
> +        .quad 0xc0862641dca5cfb8, 0xbe1cee96edce5085
> +        .quad 0xc086264494738e08, 0xbe1cf06797bd03b2
> +        .quad 0xc08626474b5536b8, 0xbe1cef91b9b7ffc1
> +        .quad 0xc086264a014b69c0, 0xbe1cef4b6721278f
> +        .quad 0xc086264cb656c678, 0xbe1cf1942925eb4a
> +        .quad 0xc086264f6a77eba8, 0xbe1cefa2c7bc2e39
> +        .quad 0xc08626521daf7758, 0xbe1cf252595aceb3
> +        .quad 0xc0862654cffe0718, 0xbe1cee8e9ae47ec2
> +        .quad 0xc0862657816437a8, 0xbe1cf1bf913828fa
> +        .quad 0xc086265a31e2a558, 0xbe1cf23475d6b366
> +        .quad 0xc086265ce179ebc8, 0xbe1cef8df00a922b
> +        .quad 0xc086265f902aa5f0, 0xbe1cef279bfa43e0
> +        .quad 0xc08626623df56e38, 0xbe1cf080e10b8365
> +        .quad 0xc0862664eadade70, 0xbe1cf1a518f9b544
> +        .quad 0xc086266796db8fd0, 0xbe1cef9308fed9e9
> +        .quad 0xc086266a41f81ae8, 0xbe1ceea3ae6b19c9
> +        .quad 0xc086266cec3117b8, 0xbe1ceef06003d4c2
> +        .quad 0xc086266f95871da8, 0xbe1cf0b8457ffb0c
> +        .quad 0xc08626723dfac390, 0xbe1cf0c526745ad6
> +        .quad 0xc0862674e58c9fa8, 0xbe1cf0cf91ff7b5d
> +        .quad 0xc08626778c3d4798, 0xbe1cefe260819380
> +        .quad 0xc086267a320d5070, 0xbe1ceebd90aa27a3
> +        .quad 0xc086267cd6fd4ea8, 0xbe1cf0388121dffa
> +        .quad 0xc086267f7b0dd630, 0xbe1cf1a3881435f1
> +        .quad 0xc08626821e3f7a68, 0xbe1cef28e9d9ac52
> +        .quad 0xc0862684c092ce08, 0xbe1cf02d300062dd
> +        .quad 0xc086268762086350, 0xbe1cefaee1edfa35
> +        .quad 0xc086268a02a0cbe0, 0xbe1cf0a5a052e936
> +        .quad 0xc086268ca25c98d8, 0xbe1cee60a4a497ed
> +        .quad 0xc086268f413c5ab0, 0xbe1cf0e4a5d0cf49
> +        .quad 0xc0862691df40a170, 0xbe1cf149235a4e6e
> +        .quad 0xc08626947c69fc80, 0xbe1cf215180b9fcc
> +        .quad 0xc086269718b8fac8, 0xbe1cef9b156a9840
> +        .quad 0xc0862699b42e2a90, 0xbe1cf054c91441be
> +        .quad 0xc086269c4eca19a8, 0xbe1cf13ded26512c
> +        .quad 0xc086269ee88d5550, 0xbe1cf22ea4d8ac06
> +        .quad 0xc08626a181786a40, 0xbe1cf2354666ee2e
> +        .quad 0xc08626a4198be4a8, 0xbe1cefef936752b3
> +        .quad 0xc08626a6b0c85020, 0xbe1cf1e360a9db68
> +        .quad 0xc08626a9472e37d8, 0xbe1ceed6aeb812c5
> +        .quad 0xc08626abdcbe2650, 0xbe1cf227340b4986
> +        .quad 0xc08626ae7178a5b0, 0xbe1cf0215a0cbe0d
> +        .quad 0xc08626b1055e3f70, 0xbe1cf256adf0ae26
> +        .quad 0xc08626b3986f7ca8, 0xbe1ceff3c67aed06
> +        .quad 0xc08626b62aace5c8, 0xbe1cf2159fb93652
> +        .quad 0xc08626b8bc1702e0, 0xbe1cf01e6dbd1c7f
> +        .quad 0xc08626bb4cae5b60, 0xbe1cf009e75d1c0c
> +        .quad 0xc08626bddc737648, 0xbe1ceec10a020e73
> +        .quad 0xc08626c06b66da08, 0xbe1cf06d5783eee7
> +        .quad 0xc08626c2f9890ca0, 0xbe1cf0cb8f169ffe
> +        .quad 0xc08626c586da9388, 0xbe1cef7de2452430
> +        .quad 0xc08626c8135bf3b0, 0xbe1cf05da6f783ae
> +        .quad 0xc08626ca9f0db198, 0xbe1cefcc877d681d
> +        .quad 0xc08626cd29f05138, 0xbe1cef0531954ab3
> +        .quad 0xc08626cfb4045608, 0xbe1cf06b8565ea3d
> +        .quad 0xc08626d23d4a4310, 0xbe1cefdc455d9d7e
> +        .quad 0xc08626d4c5c29ad0, 0xbe1ceefc47e8fa64
> +        .quad 0xc08626d74d6ddf48, 0xbe1cf1872bf033f2
> +        .quad 0xc08626d9d44c9210, 0xbe1cf19d91087f9d
> +        .quad 0xc08626dc5a5f3438, 0xbe1cf012d444c6ab
> +        .quad 0xc08626dedfa64650, 0xbe1cf0ba528ee153
> +        .quad 0xc08626e164224880, 0xbe1ceeb431709788
> +        .quad 0xc08626e3e7d3ba60, 0xbe1cf0b9af31a6a5
> +        .quad 0xc08626e66abb1b28, 0xbe1cf168fb2e135b
> +        .quad 0xc08626e8ecd8e990, 0xbe1cef9097461c93
> +        .quad 0xc08626eb6e2da3d0, 0xbe1cee7a434735d8
> +        .quad 0xc08626edeeb9c7a8, 0xbe1cf235732b86f2
> +        .quad 0xc08626f06e7dd280, 0xbe1cefe1510b89e6
> +        .quad 0xc08626f2ed7a4120, 0xbe1cf1f64b9b80ef
> +        .quad 0xc08626f56baf9000, 0xbe1cf08f320ca339
> +        .quad 0xc08626f7e91e3b08, 0xbe1cf1b1de2808a1
> +        .quad 0xc08626fa65c6bdc0, 0xbe1cf1976d778b28
> +        .quad 0xc08626fce1a99338, 0xbe1ceef40a4f076f
> +        .quad 0xc08626ff5cc73600, 0xbe1cef3e45869ce3
> +        .quad 0xc0862701d7202048, 0xbe1ceef601b4c9d6
> +        .quad 0xc086270450b4cbc0, 0xbe1cf1eaf0b57fd6
> +        .quad 0xc0862706c985b1c0, 0xbe1cef82a44990f3
> +        .quad 0xc086270941934b10, 0xbe1ceefe32981f2c
> +        .quad 0xc086270bb8de1018, 0xbe1cefbf6f5a0445
> +        .quad 0xc086270e2f6678d0, 0xbe1cf18dba75792c
> +        .quad 0xc0862710a52cfcc8, 0xbe1cf0da64ce995f
> +        .quad 0xc08627131a321318, 0xbe1cef04ac0fb802
> +        .quad 0xc08627158e763268, 0xbe1cee9d4e2ad9bd
> +        .quad 0xc086271801f9d0f8, 0xbe1cefa9b55407b5
> +        .quad 0xc086271a74bd64a0, 0xbe1cefe6bd329570
> +        .quad 0xc086271ce6c162c8, 0xbe1cef0b1205dc85
> +        .quad 0xc086271f58064068, 0xbe1cef092a785e3f
> +        .quad 0xc0862721c88c7210, 0xbe1cf050dcdaac30
> +        .quad 0xc086272438546be8, 0xbe1cf210907ded8b
> +        .quad 0xc0862726a75ea1b8, 0xbe1cee760be44f99
> +        .quad 0xc086272915ab86c0, 0xbe1ceeeee07c2bcc
> +        .quad 0xc086272b833b8df0, 0xbe1cf06874992df5
> +        .quad 0xc086272df00f29d0, 0xbe1cef8fac5d4899
> +        .quad 0xc08627305c26cc70, 0xbe1cf1103241cc99
> +        .quad 0xc0862732c782e788, 0xbe1cf1d35fef83fe
> +        .quad 0xc08627353223ec68, 0xbe1cef3ec8133e1d
> +        .quad 0xc08627379c0a4be8, 0xbe1cef7261daccd8
> +        .quad 0xc086273a05367688, 0xbe1cf18656c50806
> +        .quad 0xc086273c6da8dc68, 0xbe1cf1c8736e049a
> +        .quad 0xc086273ed561ed38, 0xbe1cf1f93bff4911
> +        .quad 0xc08627413c621848, 0xbe1cf188a4ea680c
> +        .quad 0xc0862743a2a9cc80, 0xbe1cf1d270930c80
> +        .quad 0xc086274608397868, 0xbe1cf25a328c28e2
> +        .quad 0xc08627486d118a28, 0xbe1cf106f90aa3b8
> +        .quad 0xc086274ad1326f80, 0xbe1cee5e9d2e885a
> +        .quad 0xc086274d349c95c0, 0xbe1cf1c0bac27228
> +        .quad 0xc086274f975069f8, 0xbe1cf1a1500f9b1c
> +        .quad 0xc0862751f94e58c0, 0xbe1cefc30663ac44
> +        .quad 0xc08627545a96ce48, 0xbe1cf17123e427a2
> +        .quad 0xc0862756bb2a3678, 0xbe1cefb92749fea4
> +        .quad 0xc08627591b08fcc0, 0xbe1cefa40e1ea74a
> +        .quad 0xc086275b7a338c40, 0xbe1cee6f4612c3e9
> +        .quad 0xc086275dd8aa4fa8, 0xbe1cf1c54a053627
> +        .quad 0xc0862760366db168, 0xbe1ceff5eb503d9e
> +        .quad 0xc0862762937e1b70, 0xbe1cf02e47f10cee
> +        .quad 0xc0862764efdbf768, 0xbe1ceeb06e1d0dad
> +        .quad 0xc08627674b87ae88, 0xbe1cf10aadd6dba5
> +        .quad 0xc0862769a681a9c0, 0xbe1cf24e9913d30f
> +        .quad 0xc086276c00ca51a0, 0xbe1cef47b301e312
> +        .quad 0xc086276e5a620e48, 0xbe1ceeb1cefc2e85
> +        .quad 0xc0862770b3494788, 0xbe1cf16f1fbbe011
> +        .quad 0xc08627730b8064e8, 0xbe1ceebdf75174c7
> +        .quad 0xc08627756307cd70, 0xbe1cf06e3871a0da
> +        .quad 0xc0862777b9dfe7f0, 0xbe1cef16799fd554
> +        .quad 0xc086277a10091ac0, 0xbe1cf248dabf5377
> +        .quad 0xc086277c6583cc00, 0xbe1cf0c78d92a2cd
> +        .quad 0xc086277eba506158, 0xbe1cf0b911b029f0
> +        .quad 0xc08627810e6f4028, 0xbe1cefdc24719766
> +        .quad 0xc086278361e0cd70, 0xbe1cefbb6562b7e7
> +        .quad 0xc0862785b4a56dd8, 0xbe1cf1e0afb349ec
> +        .quad 0xc086278806bd85c0, 0xbe1cf008292e52fc
> +        .quad 0xc086278a58297918, 0xbe1cf053073872bf
> +        .quad 0xc086278ca8e9ab88, 0xbe1cf17a0a55a947
> +        .quad 0xc086278ef8fe8068, 0xbe1ceeffb0b60234
> +        .quad 0xc086279148685aa0, 0xbe1cf162204794a8
> +        .quad 0xc086279397279ce0, 0xbe1cf24cc8cb48ac
> +        .quad 0xc0862795e53ca978, 0xbe1cf0c9be68d5c3
> +        .quad 0xc086279832a7e258, 0xbe1cf172cd3d7388
> +        .quad 0xc086279a7f69a930, 0xbe1ceea2465fbce5
> +        .quad 0xc086279ccb825f40, 0xbe1cf0a386d2500f
> +        .quad 0xc086279f16f26590, 0xbe1cf1e338ddc18a
> +        .quad 0xc08627a161ba1cd0, 0xbe1cef1f5049867f
> +        .quad 0xc08627a3abd9e548, 0xbe1cef96c1ea8b1f
> +        .quad 0xc08627a5f5521f00, 0xbe1cf138f6fd3c26
> +        .quad 0xc08627a83e2329b0, 0xbe1cf0d4fcbfdf3a
> +        .quad 0xc08627aa864d64b0, 0xbe1cf24870c12c81
> +        .quad 0xc08627accdd12f18, 0xbe1cf0ae2a56348d
> +        .quad 0xc08627af14aee7a0, 0xbe1cee8ca1a9b893
> +        .quad 0xc08627b15ae6eca8, 0xbe1cf20414d637b0
> +        .quad 0xc08627b3a0799c60, 0xbe1cf0fc6b7b12d8
> +        .quad 0xc08627b5e5675488, 0xbe1cf152d93c4a00
> +        .quad 0xc08627b829b072a0, 0xbe1cf1073f9b77c2
> +        .quad 0xc08627ba6d5553d8, 0xbe1cee694f97d5a4
> +        .quad 0xc08627bcb0565500, 0xbe1cf0456b8239d7
> +        .quad 0xc08627bef2b3d2b0, 0xbe1cf211497127e3
> +        .quad 0xc08627c1346e2930, 0xbe1cf01856c0384d
> +        .quad 0xc08627c37585b468, 0xbe1cefa7dd05479e
> +        .quad 0xc08627c5b5fad000, 0xbe1cef3ae8e50b93
> +        .quad 0xc08627c7f5cdd750, 0xbe1ceea5f32fdd3a
> +        .quad 0xc08627ca34ff2560, 0xbe1cef424caeb8d9
> +        .quad 0xc08627cc738f14f0, 0xbe1cf0194d07a81f
> +        .quad 0xc08627ceb17e0070, 0xbe1cf20f452000c1
> +        .quad 0xc08627d0eecc4210, 0xbe1cf00e356218e4
> +        .quad 0xc08627d32b7a33a0, 0xbe1cef30484b4bcb
> +        .quad 0xc08627d567882eb0, 0xbe1ceeea11a6641b
> +        .quad 0xc08627d7a2f68c80, 0xbe1cf13492d5bd7b
> +        .quad 0xc08627d9ddc5a618, 0xbe1ceeb7048fad96
> +        .quad 0xc08627dc17f5d418, 0xbe1ceef0666f0477
> +        .quad 0xc08627de51876ee8, 0xbe1cf060d4b8b5c2
> +        .quad 0xc08627e08a7acea8, 0xbe1cf0b2a4b6ff8c
> +        .quad 0xc08627e2c2d04b28, 0xbe1cf0e34809a875
> +        .quad 0xc08627e4fa883bf0, 0xbe1cf16bf74a3522
> +        .quad 0xc08627e731a2f848, 0xbe1cee6a24623d57
> +        .quad 0xc08627e96820d718, 0xbe1cefc7b4f1528e
> +        .quad 0xc08627eb9e022f18, 0xbe1cf163051f3548
> +        .quad 0xc08627edd34756b8, 0xbe1cef36b3366305
> +        .quad 0xc08627f007f0a408, 0xbe1cf18134625550
> +        .quad 0xc08627f23bfe6cf0, 0xbe1cf0ec32ec1a11
> +        .quad 0xc08627f46f710700, 0xbe1ceeb3b64f3edc
> +        .quad 0xc08627f6a248c778, 0xbe1cf0cd15805bc8
> +        .quad 0xc08627f8d4860368, 0xbe1cf20db3bddebe
> +        .quad 0xc08627fb06290f90, 0xbe1cf25188430e25
> +        .quad 0xc08627fd37324070, 0xbe1ceea1713490f9
> +        .quad 0xc08627ff67a1ea28, 0xbe1cf159521d234c
> +        .quad 0xc0862801977860b8, 0xbe1cf24dfe50783b
> +        .quad 0xc0862803c6b5f7d0, 0xbe1ceef2ef89a60b
> +        .quad 0xc0862805f55b02c8, 0xbe1cee7fc919d62c
> +        .quad 0xc08628082367d4c0, 0xbe1cf215a7fb513a
> +        .quad 0xc086280a50dcc0a8, 0xbe1cf0e4401c5ed4
> +        .quad 0xc086280c7dba1910, 0xbe1cf04ec734d256
> +        .quad 0xc086280eaa003050, 0xbe1cf010ad787fea
> +        .quad 0xc0862810d5af5880, 0xbe1cee622478393d
> +        .quad 0xc086281300c7e368, 0xbe1cf01c7482564f
> +        .quad 0xc08628152b4a22a0, 0xbe1cf0de20d33536
> +        .quad 0xc086281755366778, 0xbe1cef2edae5837d
> +        .quad 0xc08628197e8d02f0, 0xbe1cf0a345318cc9
> +        .quad 0xc086281ba74e45d8, 0xbe1cf20085aa34b8
> +        .quad 0xc086281dcf7a80c0, 0xbe1cef5fa845ad83
> +        .quad 0xc086281ff71203e0, 0xbe1cf050d1df69c4
> +        .quad 0xc08628221e151f48, 0xbe1ceffe43c035b9
> +        .quad 0xc0862824448422b8, 0xbe1cf14f3018d3c2
> +        .quad 0xc08628266a5f5dc0, 0xbe1cef0a5fbae83d
> +        .quad 0xc08628288fa71f98, 0xbe1ceff8a95b72a1
> +        .quad 0xc086282ab45bb750, 0xbe1cef073aa9849b
> +        .quad 0xc086282cd87d73a8, 0xbe1cef69b3835c02
> +        .quad 0xc086282efc0ca328, 0xbe1cf0bc139379a9
> +        .quad 0xc08628311f099420, 0xbe1cef247a9ec596
> +        .quad 0xc086283341749490, 0xbe1cef74bbcc488a
> +        .quad 0xc0862835634df248, 0xbe1cef4bc42e7b8e
> +        .quad 0xc08628378495fad0, 0xbe1cf136d4d5a810
> +        .quad 0xc0862839a54cfb80, 0xbe1cf0d290b24dd8
> +        .quad 0xc086283bc5734168, 0xbe1ceeebde8e0065
> +        .quad 0xc086283de5091950, 0xbe1cf1a09f60aa1e
> +        .quad 0xc0862840040ecfe0, 0xbe1cf0803947a234
> +        .quad 0xc08628422284b168, 0xbe1cf0abf7638127
> +        .quad 0xc0862844406b0a08, 0xbe1cf0f73ee12058
> +        .quad 0xc08628465dc225a0, 0xbe1cf2079971b26c
> +        .quad 0xc08628487a8a4fe0, 0xbe1cee74957564b1
> +        .quad 0xc086284a96c3d420, 0xbe1ceee77c1b7d43
> +        .quad 0xc086284cb26efd90, 0xbe1cf23addba6e09
> +        .quad 0xc086284ecd8c1730, 0xbe1cf199f4a1da60
> +        .quad 0xc0862850e81b6bb0, 0xbe1cf09fdea81393
> +        .quad 0xc0862853021d4588, 0xbe1cf176adb417f7
> +        .quad 0xc08628551b91ef00, 0xbe1cf0f64f84a8da
> +        .quad 0xc08628573479b220, 0xbe1ceec34cf49523
> +        .quad 0xc08628594cd4d8a8, 0xbe1cf16d60fbe0bb
> +        .quad 0xc086285b64a3ac40, 0xbe1cee8de7acfc7b
> +        .quad 0xc086285d7be67630, 0xbe1ceee6256cce8d
> +        .quad 0xc086285f929d7fa0, 0xbe1cee7d66a3d8a5
> +        .quad 0xc0862861a8c91170, 0xbe1cf0bef8265792
> +        .quad 0xc0862863be697458, 0xbe1cf097f890c6f8
> +        .quad 0xc0862865d37ef0c8, 0xbe1cf09502d5c3fc
> +        .quad 0xc0862867e809cf00, 0xbe1ceeffb239dac7
> +        .quad 0xc0862869fc0a56f8, 0xbe1cf1fbfff95c98
> +        .quad 0xc086286c0f80d090, 0xbe1cefa57ad3eef7
> +        .quad 0xc086286e226d8348, 0xbe1cf22c58b9183d
> +        .quad 0xc086287034d0b690, 0xbe1ceff262d0a248
> +        .quad 0xc086287246aab180, 0xbe1cefa7bc194186
> +        .quad 0xc086287457fbbb08, 0xbe1cf06782d784d9
> +        .quad 0xc086287668c419e0, 0xbe1cf1d44d0eaa07
> +        .quad 0xc086287879041490, 0xbe1cf034803c8a48
> +        .quad 0xc086287a88bbf158, 0xbe1cf08e84916b6f
> +        .quad 0xc086287c97ebf650, 0xbe1cf0c4d3dc1bc7
> +        .quad 0xc086287ea6946958, 0xbe1cefb1e4625943
> +        .quad 0xc0862880b4b59010, 0xbe1cf143efdd1fd0
> +        .quad 0xc0862882c24faff8, 0xbe1cee9896d016da
> +        .quad 0xc0862884cf630e38, 0xbe1cf2186072f2cc
> +        .quad 0xc0862886dbefeff0, 0xbe1cef9217633d34
> +        .quad 0xc0862888e7f699e0, 0xbe1cf05603549486
> +        .quad 0xc086288af37750b0, 0xbe1cef50fff513d3
> +        .quad 0xc086288cfe7258c0, 0xbe1cf127713b32d0
> +        .quad 0xc086288f08e7f650, 0xbe1cf05015520f3d
> +        .quad 0xc086289112d86d58, 0xbe1cf12eb458b26f
> +        .quad 0xc08628931c4401a8, 0xbe1cf22eae2887ed
> +        .quad 0xc0862895252af6e0, 0xbe1cefdd6656dd2d
> +        .quad 0xc08628972d8d9058, 0xbe1cf1048ea4e646
> +        .quad 0xc0862899356c1150, 0xbe1ceec4501167e9
> +        .quad 0xc086289b3cc6bcb8, 0xbe1cf0ad52becc3f
> +        .quad 0xc086289d439dd568, 0xbe1cf0daa4e00e35
> +        .quad 0xc086289f49f19df8, 0xbe1cf00b80de8d6a
> +        .quad 0xc08628a14fc258c8, 0xbe1cf1bcf2ea8464
> +        .quad 0xc08628a355104818, 0xbe1cf0435e2782b0
> +        .quad 0xc08628a559dbade0, 0xbe1cf0e3e1a5f56c
> +        .quad 0xc08628a75e24cbf8, 0xbe1cefed9d5a721d
> +        .quad 0xc08628a961ebe3f8, 0xbe1cf0d2d74321e2
> +        .quad 0xc08628ab65313750, 0xbe1cf24200eb55e9
> +        .quad 0xc08628ad67f50740, 0xbe1cf23e9d7cf979
> +        .quad 0xc08628af6a3794d0, 0xbe1cf23a088f421c
> +        .quad 0xc08628b16bf920e0, 0xbe1cef2c1de1ab32
> +        .quad 0xc08628b36d39ec08, 0xbe1cf1abc231f7b2
> +        .quad 0xc08628b56dfa36d0, 0xbe1cf2074d5ba303
> +        .quad 0xc08628b76e3a4180, 0xbe1cf05cd5eed880
> +        /*== Log_LA_table ==*/
> +        .align 32
> +        .quad 0x8000000000000000
> +        .quad 0xbf5ff802a9ab10e6
> +        .quad 0xbf6ff00aa2b10bc0
> +        .quad 0xbf77ee11ebd82e94
> +        .quad 0xbf7fe02a6b106789
> +        .quad 0xbf83e7295d25a7d9
> +        .quad 0xbf87dc475f810a77
> +        .quad 0xbf8bcf712c74384c
> +        .quad 0xbf8fc0a8b0fc03e4
> +        .quad 0xbf91d7f7eb9eebe7
> +        .quad 0xbf93cea44346a575
> +        .quad 0xbf95c45a51b8d389
> +        .quad 0xbf97b91b07d5b11b
> +        .quad 0xbf99ace7551cc514
> +        .quad 0xbf9b9fc027af9198
> +        .quad 0xbf9d91a66c543cc4
> +        .quad 0xbf9f829b0e783300
> +        .quad 0xbfa0b94f7c196176
> +        .quad 0xbfa1b0d98923d980
> +        .quad 0xbfa2a7ec2214e873
> +        .quad 0xbfa39e87b9febd60
> +        .quad 0xbfa494acc34d911c
> +        .quad 0xbfa58a5bafc8e4d5
> +        .quad 0xbfa67f94f094bd98
> +        .quad 0xbfa77458f632dcfc
> +        .quad 0xbfa868a83083f6cf
> +        .quad 0xbfa95c830ec8e3eb
> +        .quad 0xbfaa4fe9ffa3d235
> +        .quad 0xbfab42dd711971bf
> +        .quad 0xbfac355dd0921f2d
> +        .quad 0xbfad276b8adb0b52
> +        .quad 0xbfae19070c276016
> +        .quad 0xbfaf0a30c01162a6
> +        .quad 0xbfaffae9119b9303
> +        .quad 0xbfb075983598e471
> +        .quad 0xbfb0ed839b5526fe
> +        .quad 0xbfb16536eea37ae1
> +        .quad 0xbfb1dcb263db1944
> +        .quad 0xbfb253f62f0a1417
> +        .quad 0xbfb2cb0283f5de1f
> +        .quad 0xbfb341d7961bd1d1
> +        .quad 0xbfb3b87598b1b6ee
> +        .quad 0xbfb42edcbea646f0
> +        .quad 0xbfb4a50d3aa1b040
> +        .quad 0xbfb51b073f06183f
> +        .quad 0xbfb590cafdf01c28
> +        .quad 0xbfb60658a93750c4
> +        .quad 0xbfb67bb0726ec0fc
> +        .quad 0xbfb6f0d28ae56b4c
> +        .quad 0xbfb765bf23a6be13
> +        .quad 0xbfb7da766d7b12cd
> +        .quad 0xbfb84ef898e8282a
> +        .quad 0xbfb8c345d6319b21
> +        .quad 0xbfb9375e55595ede
> +        .quad 0xbfb9ab42462033ad
> +        .quad 0xbfba1ef1d8061cd4
> +        .quad 0xbfba926d3a4ad563
> +        .quad 0xbfbb05b49bee43fe
> +        .quad 0xbfbb78c82bb0eda1
> +        .quad 0xbfbbeba818146765
> +        .quad 0xbfbc5e548f5bc743
> +        .quad 0xbfbcd0cdbf8c13e1
> +        .quad 0xbfbd4313d66cb35d
> +        .quad 0xbfbdb5270187d927
> +        .quad 0xbfbe27076e2af2e6
> +        .quad 0xbfbe98b549671467
> +        .quad 0xbfbf0a30c01162a6
> +        .quad 0xbfbf7b79fec37ddf
> +        .quad 0xbfbfec9131dbeabb
> +        .quad 0xbfc02ebb42bf3d4b
> +        .quad 0xbfc0671512ca596e
> +        .quad 0xbfc09f561ee719c3
> +        .quad 0xbfc0d77e7cd08e59
> +        .quad 0xbfc10f8e422539b1
> +        .quad 0xbfc14785846742ac
> +        .quad 0xbfc17f6458fca611
> +        .quad 0xbfc1b72ad52f67a0
> +        .quad 0xbfc1eed90e2dc2c3
> +        .quad 0xbfc2266f190a5acb
> +        .quad 0xbfc25ded0abc6ad2
> +        .quad 0xbfc29552f81ff523
> +        .quad 0xbfc2cca0f5f5f251
> +        .quad 0xbfc303d718e47fd3
> +        .quad 0xbfc33af575770e4f
> +        .quad 0xbfc371fc201e8f74
> +        .quad 0xbfc3a8eb2d31a376
> +        .quad 0xbfc3dfc2b0ecc62a
> +        .quad 0xbfc41682bf727bc0
> +        .quad 0xbfc44d2b6ccb7d1e
> +        .quad 0xbfc483bccce6e3dd
> +        .quad 0xbfc4ba36f39a55e5
> +        .quad 0xbfc4f099f4a230b2
> +        .quad 0xbfc526e5e3a1b438
> +        .quad 0xbfc55d1ad4232d6f
> +        .quad 0xbfc59338d9982086
> +        .quad 0xbfc5c940075972b9
> +        .quad 0xbfc5ff3070a793d4
> +        .quad 0xbfc6350a28aaa758
> +        .quad 0xbfc66acd4272ad51
> +        .quad 0xbfc6a079d0f7aad2
> +        .quad 0xbfc6d60fe719d21d
> +        .quad 0xbfc70b8f97a1aa75
> +        .quad 0xbfc740f8f54037a5
> +        .quad 0xbfc7764c128f2127
> +        .quad 0xbfc7ab890210d909
> +        .quad 0xbfc7e0afd630c274
> +        .quad 0xbfc815c0a14357eb
> +        .quad 0xbfc84abb75865139
> +        .quad 0xbfc87fa06520c911
> +        .quad 0xbfc8b46f8223625b
> +        .quad 0xbfc8e928de886d41
> +        .quad 0xbfc91dcc8c340bde
> +        .quad 0xbfc9525a9cf456b4
> +        .quad 0xbfc986d3228180ca
> +        .quad 0xbfc9bb362e7dfb83
> +        .quad 0xbfc9ef83d2769a34
> +        .quad 0xbfca23bc1fe2b563
> +        .quad 0xbfca57df28244dcd
> +        .quad 0xbfca8becfc882f19
> +        .quad 0xbfcabfe5ae46124c
> +        .quad 0xbfcaf3c94e80bff3
> +        .quad 0xbfcb2797ee46320c
> +        .quad 0xbfcb5b519e8fb5a4
> +        .quad 0xbfcb8ef670420c3b
> +        .quad 0xbfcbc286742d8cd6
> +        .quad 0xbfcbf601bb0e44e2
> +        .quad 0xbfcc2968558c18c1
> +        .quad 0xbfcc5cba543ae425
> +        .quad 0xbfcc8ff7c79a9a22
> +        .quad 0xbfccc320c0176502
> +        .quad 0xbfccf6354e09c5dc
> +        .quad 0xbfcd293581b6b3e7
> +        .quad 0xbfcd5c216b4fbb91
> +        .quad 0xbfcd8ef91af31d5e
> +        .quad 0xbfcdc1bca0abec7d
> +        .quad 0xbfcdf46c0c722d2f
> +        .quad 0xbfce27076e2af2e6
> +        .quad 0xbfce598ed5a87e2f
> +        .quad 0xbfce8c0252aa5a60
> +        .quad 0xbfcebe61f4dd7b0b
> +        .quad 0xbfcef0adcbdc5936
> +        .quad 0xbfcf22e5e72f105d
> +        .quad 0xbfcf550a564b7b37
> +        .quad 0xbfcf871b28955045
> +        .quad 0xbfcfb9186d5e3e2b
> +        .quad 0xbfcfeb0233e607cc
> +        .quad 0xbfd00e6c45ad501d
> +        .quad 0xbfd0274dc16c232f
> +        .quad 0xbfd0402594b4d041
> +        .quad 0xbfd058f3c703ebc6
> +        .quad 0xbfd071b85fcd590d
> +        .quad 0xbfd08a73667c57af
> +        .quad 0xbfd0a324e27390e3
> +        .quad 0xbfd0bbccdb0d24bd
> +        .quad 0xbfd0d46b579ab74b
> +        .quad 0xbfd0ed005f657da4
> +        .quad 0xbfd1058bf9ae4ad5
> +        .quad 0xbfd11e0e2dad9cb7
> +        .quad 0xbfd136870293a8b0
> +        .quad 0xbfd14ef67f88685a
> +        .quad 0xbfd1675cababa60e
> +        .quad 0xbfd17fb98e15095d
> +        .quad 0xbfd1980d2dd4236f
> +        .quad 0xbfd1b05791f07b49
> +        .quad 0xbfd1c898c16999fb
> +        .quad 0xbfd1e0d0c33716be
> +        .quad 0xbfd1f8ff9e48a2f3
> +        .quad 0xbfd211255986160c
> +        .quad 0xbfd22941fbcf7966
> +        .quad 0xbfd241558bfd1404
> +        .quad 0xbfd2596010df763a
> +        .quad 0xbfd27161913f853d
> +        .quad 0xbfd2895a13de86a3
> +        .quad 0xbfd2a1499f762bc9
> +        .quad 0xbfd2b9303ab89d25
> +        .quad 0xbfd2d10dec508583
> +        .quad 0xbfd2e8e2bae11d31
> +        .quad 0xbfd300aead06350c
> +        .quad 0xbfd31871c9544185
> +        .quad 0xbfd3302c16586588
> +        .quad 0xbfd347dd9a987d55
> +        .quad 0xbfd35f865c93293e
> +        .quad 0xbfd3772662bfd85b
> +        .quad 0xbfd38ebdb38ed321
> +        .quad 0xbfd3a64c556945ea
> +        .quad 0xbfd3bdd24eb14b6a
> +        .quad 0xbfd3d54fa5c1f710
> +        .quad 0xbfd3ecc460ef5f50
> +        .quad 0xbfd404308686a7e4
> +        .quad 0xbfd41b941cce0bee
> +        .quad 0xbfd432ef2a04e814
> +        .quad 0xbfd44a41b463c47c
> +        .quad 0xbfd4618bc21c5ec2
> +        .quad 0xbfd478cd5959b3d9
> +        .quad 0xbfd49006804009d1
> +        .quad 0xbfd4a7373cecf997
> +        .quad 0xbfd4be5f957778a1
> +        .quad 0xbfd4d57f8fefe27f
> +        .quad 0xbfd4ec973260026a
> +        .quad 0xbfd503a682cb1cb3
> +        .quad 0xbfd51aad872df82d
> +        .quad 0xbfd531ac457ee77e
> +        .quad 0xbfd548a2c3add263
> +        .quad 0xbfd55f9107a43ee2
> +        .quad 0xbfd5767717455a6c
> +        .quad 0xbfd58d54f86e02f2
> +        .quad 0xbfd5a42ab0f4cfe2
> +        .quad 0xbfd5baf846aa1b19
> +        .quad 0xbfd5d1bdbf5809ca
> +        .quad 0xbfd5e87b20c2954a
> +        .quad 0xbfd5ff3070a793d4
> +        .quad 0xbfd615ddb4bec13c
> +        .quad 0xbfd62c82f2b9c795
> +        .quad 0x3fd61965cdb02c1f
> +        .quad 0x3fd602d08af091ec
> +        .quad 0x3fd5ec433d5c35ae
> +        .quad 0x3fd5d5bddf595f30
> +        .quad 0x3fd5bf406b543db2
> +        .quad 0x3fd5a8cadbbedfa1
> +        .quad 0x3fd5925d2b112a59
> +        .quad 0x3fd57bf753c8d1fb
> +        .quad 0x3fd565995069514c
> +        .quad 0x3fd54f431b7be1a9
> +        .quad 0x3fd538f4af8f72fe
> +        .quad 0x3fd522ae0738a3d8
> +        .quad 0x3fd50c6f1d11b97c
> +        .quad 0x3fd4f637ebba9810
> +        .quad 0x3fd4e0086dd8baca
> +        .quad 0x3fd4c9e09e172c3c
> +        .quad 0x3fd4b3c077267e9a
> +        .quad 0x3fd49da7f3bcc41f
> +        .quad 0x3fd487970e958770
> +        .quad 0x3fd4718dc271c41b
> +        .quad 0x3fd45b8c0a17df13
> +        .quad 0x3fd44591e0539f49
> +        .quad 0x3fd42f9f3ff62642
> +        .quad 0x3fd419b423d5e8c7
> +        .quad 0x3fd403d086cea79c
> +        .quad 0x3fd3edf463c1683e
> +        .quad 0x3fd3d81fb5946dba
> +        .quad 0x3fd3c25277333184
> +        .quad 0x3fd3ac8ca38e5c5f
> +        .quad 0x3fd396ce359bbf54
> +        .quad 0x3fd3811728564cb2
> +        .quad 0x3fd36b6776be1117
> +        .quad 0x3fd355bf1bd82c8b
> +        .quad 0x3fd3401e12aecba1
> +        .quad 0x3fd32a84565120a8
> +        .quad 0x3fd314f1e1d35ce4
> +        .quad 0x3fd2ff66b04ea9d4
> +        .quad 0x3fd2e9e2bce12286
> +        .quad 0x3fd2d46602adccee
> +        .quad 0x3fd2bef07cdc9354
> +        .quad 0x3fd2a982269a3dbf
> +        .quad 0x3fd2941afb186b7c
> +        .quad 0x3fd27ebaf58d8c9d
> +        .quad 0x3fd269621134db92
> +        .quad 0x3fd25410494e56c7
> +        .quad 0x3fd23ec5991eba49
> +        .quad 0x3fd22981fbef797b
> +        .quad 0x3fd214456d0eb8d4
> +        .quad 0x3fd1ff0fe7cf47a7
> +        .quad 0x3fd1e9e1678899f4
> +        .quad 0x3fd1d4b9e796c245
> +        .quad 0x3fd1bf99635a6b95
> +        .quad 0x3fd1aa7fd638d33f
> +        .quad 0x3fd1956d3b9bc2fa
> +        .quad 0x3fd180618ef18adf
> +        .quad 0x3fd16b5ccbacfb73
> +        .quad 0x3fd1565eed455fc3
> +        .quad 0x3fd14167ef367783
> +        .quad 0x3fd12c77cd00713b
> +        .quad 0x3fd1178e8227e47c
> +        .quad 0x3fd102ac0a35cc1c
> +        .quad 0x3fd0edd060b78081
> +        .quad 0x3fd0d8fb813eb1ef
> +        .quad 0x3fd0c42d676162e3
> +        .quad 0x3fd0af660eb9e279
> +        .quad 0x3fd09aa572e6c6d4
> +        .quad 0x3fd085eb8f8ae797
> +        .quad 0x3fd07138604d5862
> +        .quad 0x3fd05c8be0d9635a
> +        .quad 0x3fd047e60cde83b8
> +        .quad 0x3fd03346e0106062
> +        .quad 0x3fd01eae5626c691
> +        .quad 0x3fd00a1c6adda473
> +        .quad 0x3fcfeb2233ea07cd
> +        .quad 0x3fcfc218be620a5e
> +        .quad 0x3fcf991c6cb3b379
> +        .quad 0x3fcf702d36777df0
> +        .quad 0x3fcf474b134df229
> +        .quad 0x3fcf1e75fadf9bde
> +        .quad 0x3fcef5ade4dcffe6
> +        .quad 0x3fceccf2c8fe920a
> +        .quad 0x3fcea4449f04aaf5
> +        .quad 0x3fce7ba35eb77e2a
> +        .quad 0x3fce530effe71012
> +        .quad 0x3fce2a877a6b2c12
> +        .quad 0x3fce020cc6235ab5
> +        .quad 0x3fcdd99edaf6d7e9
> +        .quad 0x3fcdb13db0d48940
> +        .quad 0x3fcd88e93fb2f450
> +        .quad 0x3fcd60a17f903515
> +        .quad 0x3fcd38666871f465
> +        .quad 0x3fcd1037f2655e7b
> +        .quad 0x3fcce816157f1988
> +        .quad 0x3fccc000c9db3c52
> +        .quad 0x3fcc97f8079d44ec
> +        .quad 0x3fcc6ffbc6f00f71
> +        .quad 0x3fcc480c0005ccd1
> +        .quad 0x3fcc2028ab17f9b4
> +        .quad 0x3fcbf851c067555f
> +        .quad 0x3fcbd087383bd8ad
> +        .quad 0x3fcba8c90ae4ad19
> +        .quad 0x3fcb811730b823d2
> +        .quad 0x3fcb5971a213acdb
> +        .quad 0x3fcb31d8575bce3d
> +        .quad 0x3fcb0a4b48fc1b46
> +        .quad 0x3fcae2ca6f672bd4
> +        .quad 0x3fcabb55c31693ad
> +        .quad 0x3fca93ed3c8ad9e3
> +        .quad 0x3fca6c90d44b704e
> +        .quad 0x3fca454082e6ab05
> +        .quad 0x3fca1dfc40f1b7f1
> +        .quad 0x3fc9f6c407089664
> +        .quad 0x3fc9cf97cdce0ec3
> +        .quad 0x3fc9a8778debaa38
> +        .quad 0x3fc981634011aa75
> +        .quad 0x3fc95a5adcf7017f
> +        .quad 0x3fc9335e5d594989
> +        .quad 0x3fc90c6db9fcbcd9
> +        .quad 0x3fc8e588ebac2dbf
> +        .quad 0x3fc8beafeb38fe8c
> +        .quad 0x3fc897e2b17b19a5
> +        .quad 0x3fc871213750e994
> +        .quad 0x3fc84a6b759f512f
> +        .quad 0x3fc823c16551a3c2
> +        .quad 0x3fc7fd22ff599d4f
> +        .quad 0x3fc7d6903caf5ad0
> +        .quad 0x3fc7b0091651528c
> +        .quad 0x3fc7898d85444c73
> +        .quad 0x3fc7631d82935a86
> +        .quad 0x3fc73cb9074fd14d
> +        .quad 0x3fc716600c914054
> +        .quad 0x3fc6f0128b756abc
> +        .quad 0x3fc6c9d07d203fc7
> +        .quad 0x3fc6a399dabbd383
> +        .quad 0x3fc67d6e9d785771
> +        .quad 0x3fc6574ebe8c133a
> +        .quad 0x3fc6313a37335d76
> +        .quad 0x3fc60b3100b09476
> +        .quad 0x3fc5e533144c1719
> +        .quad 0x3fc5bf406b543db2
> +        .quad 0x3fc59958ff1d52f1
> +        .quad 0x3fc5737cc9018cdd
> +        .quad 0x3fc54dabc26105d2
> +        .quad 0x3fc527e5e4a1b58d
> +        .quad 0x3fc5022b292f6a45
> +        .quad 0x3fc4dc7b897bc1c8
> +        .quad 0x3fc4b6d6fefe22a4
> +        .quad 0x3fc4913d8333b561
> +        .quad 0x3fc46baf0f9f5db7
> +        .quad 0x3fc4462b9dc9b3dc
> +        .quad 0x3fc420b32740fdd4
> +        .quad 0x3fc3fb45a59928cc
> +        .quad 0x3fc3d5e3126bc27f
> +        .quad 0x3fc3b08b6757f2a9
> +        .quad 0x3fc38b3e9e027479
> +        .quad 0x3fc365fcb0159016
> +        .quad 0x3fc340c59741142e
> +        .quad 0x3fc31b994d3a4f85
> +        .quad 0x3fc2f677cbbc0a96
> +        .quad 0x3fc2d1610c86813a
> +        .quad 0x3fc2ac55095f5c59
> +        .quad 0x3fc28753bc11aba5
> +        .quad 0x3fc2625d1e6ddf57
> +        .quad 0x3fc23d712a49c202
> +        .quad 0x3fc2188fd9807263
> +        .quad 0x3fc1f3b925f25d41
> +        .quad 0x3fc1ceed09853752
> +        .quad 0x3fc1aa2b7e23f72a
> +        .quad 0x3fc185747dbecf34
> +        .quad 0x3fc160c8024b27b1
> +        .quad 0x3fc13c2605c398c3
> +        .quad 0x3fc1178e8227e47c
> +        .quad 0x3fc0f301717cf0fb
> +        .quad 0x3fc0ce7ecdccc28d
> +        .quad 0x3fc0aa06912675d5
> +        .quad 0x3fc08598b59e3a07
> +        .quad 0x3fc06135354d4b18
> +        .quad 0x3fc03cdc0a51ec0d
> +        .quad 0x3fc0188d2ecf6140
> +        .quad 0x3fbfe89139dbd566
> +        .quad 0x3fbfa01c9db57ce2
> +        .quad 0x3fbf57bc7d9005db
> +        .quad 0x3fbf0f70cdd992e3
> +        .quad 0x3fbec739830a1120
> +        .quad 0x3fbe7f1691a32d3e
> +        .quad 0x3fbe3707ee30487b
> +        .quad 0x3fbdef0d8d466db9
> +        .quad 0x3fbda727638446a2
> +        .quad 0x3fbd5f55659210e2
> +        .quad 0x3fbd179788219364
> +        .quad 0x3fbccfedbfee13a8
> +        .quad 0x3fbc885801bc4b23
> +        .quad 0x3fbc40d6425a5cb1
> +        .quad 0x3fbbf968769fca11
> +        .quad 0x3fbbb20e936d6974
> +        .quad 0x3fbb6ac88dad5b1c
> +        .quad 0x3fbb23965a52ff00
> +        .quad 0x3fbadc77ee5aea8c
> +        .quad 0x3fba956d3ecade63
> +        .quad 0x3fba4e7640b1bc38
> +        .quad 0x3fba0792e9277cac
> +        .quad 0x3fb9c0c32d4d2548
> +        .quad 0x3fb97a07024cbe74
> +        .quad 0x3fb9335e5d594989
> +        .quad 0x3fb8ecc933aeb6e8
> +        .quad 0x3fb8a6477a91dc29
> +        .quad 0x3fb85fd927506a48
> +        .quad 0x3fb8197e2f40e3f0
> +        .quad 0x3fb7d33687c293c9
> +        .quad 0x3fb78d02263d82d3
> +        .quad 0x3fb746e100226ed9
> +        .quad 0x3fb700d30aeac0e1
> +        .quad 0x3fb6bad83c1883b6
> +        .quad 0x3fb674f089365a7a
> +        .quad 0x3fb62f1be7d77743
> +        .quad 0x3fb5e95a4d9791cb
> +        .quad 0x3fb5a3abb01ade25
> +        .quad 0x3fb55e10050e0384
> +        .quad 0x3fb518874226130a
> +        .quad 0x3fb4d3115d207eac
> +        .quad 0x3fb48dae4bc31018
> +        .quad 0x3fb4485e03dbdfad
> +        .quad 0x3fb403207b414b7f
> +        .quad 0x3fb3bdf5a7d1ee64
> +        .quad 0x3fb378dd7f749714
> +        .quad 0x3fb333d7f8183f4b
> +        .quad 0x3fb2eee507b40301
> +        .quad 0x3fb2aa04a44717a5
> +        .quad 0x3fb26536c3d8c369
> +        .quad 0x3fb2207b5c78549e
> +        .quad 0x3fb1dbd2643d190b
> +        .quad 0x3fb1973bd1465567
> +        .quad 0x3fb152b799bb3cc9
> +        .quad 0x3fb10e45b3cae831
> +        .quad 0x3fb0c9e615ac4e17
> +        .quad 0x3fb08598b59e3a07
> +        .quad 0x3fb0415d89e74444
> +        .quad 0x3faffa6911ab9301
> +        .quad 0x3faf723b517fc523
> +        .quad 0x3faeea31c006b87c
> +        .quad 0x3fae624c4a0b5e1b
> +        .quad 0x3fadda8adc67ee4e
> +        .quad 0x3fad52ed6405d86f
> +        .quad 0x3faccb73cdddb2cc
> +        .quad 0x3fac441e06f72a9e
> +        .quad 0x3fabbcebfc68f420
> +        .quad 0x3fab35dd9b58baad
> +        .quad 0x3faaaef2d0fb10fc
> +        .quad 0x3faa282b8a936171
> +        .quad 0x3fa9a187b573de7c
> +        .quad 0x3fa91b073efd7314
> +        .quad 0x3fa894aa149fb343
> +        .quad 0x3fa80e7023d8ccc4
> +        .quad 0x3fa788595a3577ba
> +        .quad 0x3fa70265a550e777
> +        .quad 0x3fa67c94f2d4bb58
> +        .quad 0x3fa5f6e73078efb8
> +        .quad 0x3fa5715c4c03ceef
> +        .quad 0x3fa4ebf43349e26f
> +        .quad 0x3fa466aed42de3ea
> +        .quad 0x3fa3e18c1ca0ae92
> +        .quad 0x3fa35c8bfaa1306b
> +        .quad 0x3fa2d7ae5c3c5bae
> +        .quad 0x3fa252f32f8d183f
> +        .quad 0x3fa1ce5a62bc353a
> +        .quad 0x3fa149e3e4005a8d
> +        .quad 0x3fa0c58fa19dfaaa
> +        .quad 0x3fa0415d89e74444
> +        .quad 0x3f9f7a9b16782856
> +        .quad 0x3f9e72bf2813ce51
> +        .quad 0x3f9d6b2725979802
> +        .quad 0x3f9c63d2ec14aaf2
> +        .quad 0x3f9b5cc258b718e6
> +        .quad 0x3f9a55f548c5c43f
> +        .quad 0x3f994f6b99a24475
> +        .quad 0x3f98492528c8cabf
> +        .quad 0x3f974321d3d006d3
> +        .quad 0x3f963d6178690bd6
> +        .quad 0x3f9537e3f45f3565
> +        .quad 0x3f9432a925980cc1
> +        .quad 0x3f932db0ea132e22
> +        .quad 0x3f9228fb1fea2e28
> +        .quad 0x3f912487a5507f70
> +        .quad 0x3f90205658935847
> +        .quad 0x3f8e38ce3033310c
> +        .quad 0x3f8c317384c75f06
> +        .quad 0x3f8a2a9c6c170462
> +        .quad 0x3f882448a388a2aa
> +        .quad 0x3f861e77e8b53fc6
> +        .quad 0x3f841929f96832f0
> +        .quad 0x3f82145e939ef1e9
> +        .quad 0x3f8010157588de71
> +        .quad 0x3f7c189cbb0e27fb
> +        .quad 0x3f78121214586b54
> +        .quad 0x3f740c8a747878e2
> +        .quad 0x3f70080559588b35
> +        .quad 0x3f680904828985c0
> +        .quad 0x3f60040155d5889e
> +        .quad 0x3f50020055655889
> +        .quad 0x0000000000000000
> +        /*== poly_coeff[4] ==*/
> +        .align 32
> +        .quad 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A /* coeff4 */
> +        .quad 0xbfd0000148058EE1, 0xbfd0000148058EE1, 0xbfd0000148058EE1, 0xbfd0000148058EE1 /* coeff3 */
> +        .quad 0x3fd55555555543C5, 0x3fd55555555543C5, 0x3fd55555555543C5, 0x3fd55555555543C5 /* coeff2 */
> +        .quad 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F /* coeff1 */
> +        /*== ExpMask ==*/
> +        .align 32
> +        .quad 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff
> +        /*== Two10 ==*/
> +        .align 32
> +        .quad 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000
> +        /*== MinLog1p = -1+2^(-53) ==*/
> +        .align 32
> +        .quad 0xbfefffffffffffff, 0xbfefffffffffffff, 0xbfefffffffffffff, 0xbfefffffffffffff
> +        /*== MaxLog1p ==*/
> +        .align 32
> +        .quad 0x7f3ffffffffff000, 0x7f3ffffffffff000, 0x7f3ffffffffff000, 0x7f3ffffffffff000
> +        /*== One ==*/
> +        .align 32
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== SgnMask ==*/
> +        .align 32
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
> +        /*== XThreshold ==*/
> +        .align 32
> +        .quad 0x3e00000000000000, 0x3e00000000000000, 0x3e00000000000000, 0x3e00000000000000
> +        /*== XhMask ==*/
> +        .align 32
> +        .quad 0xfffffffffffffc00, 0xfffffffffffffc00, 0xfffffffffffffc00, 0xfffffffffffffc00
> +        /*== Threshold ==*/
> +        .align 32
> +        .quad 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000
> +        /*== Bias ==*/
> +        .align 32
> +        .quad 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000
> +        /*== Bias1 ==*/
> +        .align 32
> +        .quad 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000
> +        /*== ExpMask ==*/
> +        .align 32
> +        .quad 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000
> +        /*== ExpMask2 ==*/
> +        .align 32
> +        .quad 0x7f40000000000000, 0x7f40000000000000, 0x7f40000000000000, 0x7f40000000000000
> +        /*== L2L ==*/
> +        .align 32
> +        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
> +        .align 32
> +        .type	__svml_dlog1p_data_internal,@object
> +        .size	__svml_dlog1p_data_internal,.-__svml_dlog1p_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core-avx2.S
> new file mode 100644
> index 0000000000..ca174a5f52
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized log1p, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN8v_log1p _ZGVeN8v_log1p_avx2_wrapper
> +#include "../svml_d_log1p8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core.c
> new file mode 100644
> index 0000000000..0aa35ec8c5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized log1p, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN8v_log1p
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN8v_log1p, __GI__ZGVeN8v_log1p, __redirect__ZGVeN8v_log1p)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
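
For callers, the ifunc-selected entry points above are normally reached
through the vector ABI rather than by name.  A minimal usage sketch,
assuming GCC with flags along the lines of
-O2 -ffast-math -fopenmp-simd -march=skylake-avx512 and linking with
-lmvec -lm (the exact flags and the _ZGV* variant chosen depend on the
toolchain and target):

    #include <math.h>

    /* The compiler may vectorize this loop and call _ZGVeN8v_log1p
       (or another _ZGV* variant) through the selector above; scalar
       log1p remains the fallback.  */
    void
    vlog1p (const double *in, double *out, int n)
    {
    #pragma omp simd
      for (int i = 0; i < n; i++)
        out[i] = log1p (in[i]);
    }
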
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core_avx512.S
> new file mode 100644
> index 0000000000..5e38ff8d39
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core_avx512.S
> @@ -0,0 +1,317 @@
> +/* Function log1p vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
> + *    Get short reciprocal approximation Rcp ~ 1/xh
> + *    R = (Rcp*xh - 1.0) + Rcp*xl
> + *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
> + *       log(Rcp) is tabulated
> + *
> + *
> + */
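
The reduction described above can be illustrated with a scalar sketch.
This is not the code in the patch: log () stands in for the tabulated
-log (Rcp) values, the polynomial is truncated to three terms, and
special inputs (x <= -1, NaN, Inf) are left out, since the patch handles
those on its special-values path.

    #include <math.h>

    static double
    log1p_sketch (double x)
    {
      /* 1+x as high/low parts; |hi| >= |lo| by construction.  */
      double hi = fmax (1.0, x), lo = fmin (1.0, x);
      double xh = hi + lo;
      double xl = (hi - xh) + lo;

      /* xh = m * 2^k with m in [1,2); scale xl by 2^-k as well.  */
      int k;
      double m = 2.0 * frexp (xh, &k);
      k -= 1;
      xl = ldexp (xl, -k);

      /* Reciprocal rounded to 4 fractional bits, so that -log (rcp)
         could be read from a small table.  */
      double rcp = ldexp (nearbyint (ldexp (1.0 / m, 4)), -4);

      /* Reduced argument R = (rcp*m - 1) + rcp*xl, |R| small.  */
      double r = fma (rcp, m, -1.0) + rcp * xl;

      /* log (1+R) by a short polynomial, then reconstruct.  */
      double poly = r * (1.0 + r * (-0.5 + r * (1.0 / 3.0)));
      const double ln2 = 0x1.62e42fefa39efp-1;
      return k * ln2 - log (rcp) + poly;
    }
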
> +
> +/* Offsets for data table __svml_dlog1p_data_internal_avx512
> + */
> +#define Log_tbl                       	0
> +#define One                           	128
> +#define SgnMask                       	192
> +#define C075                          	256
> +#define poly_coeff9                   	320
> +#define poly_coeff8                   	384
> +#define poly_coeff7                   	448
> +#define poly_coeff6                   	512
> +#define poly_coeff5                   	576
> +#define poly_coeff4                   	640
> +#define poly_coeff3                   	704
> +#define poly_coeff2                   	768
> +#define L2                            	832
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN8v_log1p_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovups   One+__svml_dlog1p_data_internal_avx512(%rip), %zmm7
> +        vmovups   SgnMask+__svml_dlog1p_data_internal_avx512(%rip), %zmm14
> +        vmovaps   %zmm0, %zmm9
> +        vaddpd    {rn-sae}, %zmm9, %zmm7, %zmm11
> +        vandpd    %zmm14, %zmm9, %zmm8
> +
> +/* compute 1+x as high, low parts */
> +        vmaxpd    {sae}, %zmm9, %zmm7, %zmm10
> +        vminpd    {sae}, %zmm9, %zmm7, %zmm12
> +
> +/* GetMant(x), normalized to [1,2) for x>=0, NaN for x<0 */
> +        vgetmantpd $8, {sae}, %zmm11, %zmm6
> +
> +/* GetExp(x) */
> +        vgetexppd {sae}, %zmm11, %zmm5
> +        vsubpd    {rn-sae}, %zmm10, %zmm11, %zmm13
> +
> +/* DblRcp ~ 1/Mantissa */
> +        vrcp14pd  %zmm6, %zmm15
> +
> +/* Start polynomial evaluation */
> +        vmovups   poly_coeff9+__svml_dlog1p_data_internal_avx512(%rip), %zmm10
> +        vmovups   poly_coeff7+__svml_dlog1p_data_internal_avx512(%rip), %zmm11
> +
> +/* Xl */
> +        vsubpd    {rn-sae}, %zmm13, %zmm12, %zmm2
> +        vxorpd    %zmm14, %zmm5, %zmm3
> +
> +/* round DblRcp to 4 fractional bits (RN mode, no Precision exception) */
> +        vrndscalepd $88, {sae}, %zmm15, %zmm4
> +        vmovups   poly_coeff5+__svml_dlog1p_data_internal_avx512(%rip), %zmm12
> +        vmovups   poly_coeff6+__svml_dlog1p_data_internal_avx512(%rip), %zmm14
> +        vmovups   poly_coeff3+__svml_dlog1p_data_internal_avx512(%rip), %zmm13
> +
> +/* Xl*2^(-Expon) */
> +        vscalefpd {rn-sae}, %zmm3, %zmm2, %zmm1
> +
> +/* Reduced argument: R = DblRcp*(Mantissa+Xl) - 1 */
> +        vfmsub213pd {rn-sae}, %zmm7, %zmm4, %zmm6
> +        vmovups   __svml_dlog1p_data_internal_avx512(%rip), %zmm3
> +
> +/*
> + * Table lookup
> + * Prepare exponent correction: DblRcp<0.75?
> + */
> +        vmovups   C075+__svml_dlog1p_data_internal_avx512(%rip), %zmm2
> +
> +/* Prepare table index */
> +        vpsrlq    $48, %zmm4, %zmm0
> +        vfmadd231pd {rn-sae}, %zmm4, %zmm1, %zmm6
> +        vmovups   poly_coeff8+__svml_dlog1p_data_internal_avx512(%rip), %zmm1
> +        vcmppd    $17, {sae}, %zmm2, %zmm4, %k1
> +        vcmppd    $4, {sae}, %zmm6, %zmm6, %k0
> +        vfmadd231pd {rn-sae}, %zmm6, %zmm10, %zmm1
> +        vmovups   poly_coeff4+__svml_dlog1p_data_internal_avx512(%rip), %zmm10
> +        vfmadd231pd {rn-sae}, %zmm6, %zmm11, %zmm14
> +        vmovups   L2+__svml_dlog1p_data_internal_avx512(%rip), %zmm4
> +        vpermt2pd Log_tbl+64+__svml_dlog1p_data_internal_avx512(%rip), %zmm0, %zmm3
> +
> +/* add 1 to Expon if DblRcp<0.75 */
> +        vaddpd    {rn-sae}, %zmm7, %zmm5, %zmm5{%k1}
> +
> +/* R^2 */
> +        vmulpd    {rn-sae}, %zmm6, %zmm6, %zmm0
> +        vfmadd231pd {rn-sae}, %zmm6, %zmm12, %zmm10
> +        vmovups   poly_coeff2+__svml_dlog1p_data_internal_avx512(%rip), %zmm12
> +        vmulpd    {rn-sae}, %zmm0, %zmm0, %zmm15
> +        vfmadd231pd {rn-sae}, %zmm6, %zmm13, %zmm12
> +        vfmadd213pd {rn-sae}, %zmm14, %zmm0, %zmm1
> +        kmovw     %k0, %edx
> +        vfmadd213pd {rn-sae}, %zmm12, %zmm0, %zmm10
> +
> +/* polynomial */
> +        vfmadd213pd {rn-sae}, %zmm10, %zmm15, %zmm1
> +        vfmadd213pd {rn-sae}, %zmm6, %zmm0, %zmm1
> +        vaddpd    {rn-sae}, %zmm1, %zmm3, %zmm6
> +        vfmadd213pd {rn-sae}, %zmm6, %zmm4, %zmm5
> +        vorpd     %zmm8, %zmm5, %zmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm9
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm9, 64(%rsp)
> +        vmovups   %zmm0, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     64(%rsp,%r14,8), %xmm0
> +        call      log1p@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 128(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN8v_log1p_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_dlog1p_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 Log_tbl[16][2];
> +        __declspec(align(64)) VUINT32 One[8][2];
> +        __declspec(align(64)) VUINT32 SgnMask[8][2];
> +        __declspec(align(64)) VUINT32 C075[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff9[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff8[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff7[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
> +        __declspec(align(64)) VUINT32 L2[8][2];
> +   } __svml_dlog1p_data_internal_avx512;
> +#endif
> +__svml_dlog1p_data_internal_avx512:
> +        /*== Log_tbl ==*/
> +        .quad 0x0000000000000000
> +        .quad 0xbfaf0a30c01162a6
> +        .quad 0xbfbe27076e2af2e6
> +        .quad 0xbfc5ff3070a793d4
> +        .quad 0xbfcc8ff7c79a9a22
> +        .quad 0xbfd1675cababa60e
> +        .quad 0xbfd4618bc21c5ec2
> +        .quad 0xbfd739d7f6bbd007
> +        .quad 0x3fd269621134db92
> +        .quad 0x3fcf991c6cb3b379
> +        .quad 0x3fca93ed3c8ad9e3
> +        .quad 0x3fc5bf406b543db2
> +        .quad 0x3fc1178e8227e47c
> +        .quad 0x3fb9335e5d594989
> +        .quad 0x3fb08598b59e3a07
> +        .quad 0x3fa0415d89e74444
> +        /*== One ==*/
> +        .align 64
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== SgnMask ==*/
> +        .align 64
> +        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000
> +        /*== C075 0.75 ==*/
> +        .align 64
> +        .quad 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000
> +        /*== poly_coeff9 ==*/
> +        .align 64
> +        .quad 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70
> +        /*== poly_coeff8 ==*/
> +        .align 64
> +        .quad 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62
> +        /*== poly_coeff7 ==*/
> +        .align 64
> +        .quad 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF
> +        /*== poly_coeff6 ==*/
> +        .align 64
> +        .quad 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06
> +        /*== poly_coeff5 ==*/
> +        .align 64
> +        .quad 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C
> +        /*== poly_coeff4 ==*/
> +        .align 64
> +        .quad 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD
> +        /*== poly_coeff3 ==*/
> +        .align 64
> +        .quad 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466
> +        /*== poly_coeff2 ==*/
> +        .align 64
> +        .quad 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6
> +        /*== L2 = log(2) ==*/
> +        .align 64
> +        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
> +        .align 64
> +        .type	__svml_dlog1p_data_internal_avx512,@object
> +        .size	__svml_dlog1p_data_internal_avx512,.-__svml_dlog1p_data_internal_avx512
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core-avx2.S
> new file mode 100644
> index 0000000000..3c0a0a01a2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized log1pf.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN16v_log1pf _ZGVeN16v_log1pf_avx2_wrapper
> +#include "../svml_s_log1pf16_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core.c
> new file mode 100644
> index 0000000000..9af1320547
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized log1pf, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN16v_log1pf
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN16v_log1pf, __GI__ZGVeN16v_log1pf,
> +	       __redirect__ZGVeN16v_log1pf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core_avx512.S
> new file mode 100644
> index 0000000000..78b2fe417f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core_avx512.S
> @@ -0,0 +1,271 @@
> +/* Function log1pf vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
> + *    Get short reciprocal approximation Rcp ~ 1/xh
> + *    R = (Rcp*xh - 1.0) + Rcp*xl
> + *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
> + *       log(Rcp) is tabulated
> + *
> + *
> + */
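
As far as I read the single-precision code below, its reduction differs
from this description: instead of a reciprocal table, it subtracts the
iBrkValue (2/3) bit pattern so the mantissa lands in [2/3, 4/3) and R is
(m - 1) plus the scaled low part.  A low-accuracy scalar sketch under
that reading; the 3-term polynomial and the missing special-input
handling are deliberate simplifications of the 8-term sPoly table and
the special-values path.

    #include <math.h>
    #include <stdint.h>
    #include <string.h>

    static float
    log1pf_sketch (float x)
    {
      /* 1+x as high/low parts; |hi| >= |lo| by construction.  */
      float hi = fmaxf (1.0f, x), lo = fminf (1.0f, x);
      float sh = hi + lo;
      float sl = (hi - sh) + lo;

      /* Subtract the 2/3 bit pattern so that sh = m * 2^n with
         m in [2/3, 4/3); the >> 23 mirrors the vpsrad in the code.  */
      uint32_t u, d, mu;
      memcpy (&u, &sh, sizeof u);
      d = u - 0x3f2aaaabu;
      int n = (int32_t) d >> 23;
      mu = (d & 0x007fffffu) + 0x3f2aaaabu;
      float m;
      memcpy (&m, &mu, sizeof m);

      /* Reduced argument and a 3-term stand-in for sPoly[].  */
      float r = (m - 1.0f) + ldexpf (sl, -n);
      float p = -0.5f + r * (1.0f / 3.0f - 0.25f * r);
      return n * 0x1.62e43p-1f + (r + r * r * p);  /* n*ln2 + log(1+r) */
    }
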
> +
> +/* Offsets for data table __svml_slog1p_data_internal
> + */
> +#define SgnMask                       	0
> +#define sOne                          	64
> +#define sPoly_1                       	128
> +#define sPoly_2                       	192
> +#define sPoly_3                       	256
> +#define sPoly_4                       	320
> +#define sPoly_5                       	384
> +#define sPoly_6                       	448
> +#define sPoly_7                       	512
> +#define sPoly_8                       	576
> +#define iHiDelta                      	640
> +#define iLoRange                      	704
> +#define iBrkValue                     	768
> +#define iOffExpoMask                  	832
> +#define sLn2                          	896
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN16v_log1pf_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovups   sOne+__svml_slog1p_data_internal(%rip), %zmm2
> +
> +/* reduction: compute r,n */
> +        vmovups   iBrkValue+__svml_slog1p_data_internal(%rip), %zmm12
> +        vmovups   SgnMask+__svml_slog1p_data_internal(%rip), %zmm4
> +        vmovaps   %zmm0, %zmm3
> +
> +/* compute 1+x as high, low parts */
> +        vmaxps    {sae}, %zmm3, %zmm2, %zmm5
> +        vminps    {sae}, %zmm3, %zmm2, %zmm7
> +        vandnps   %zmm3, %zmm4, %zmm1
> +        vpternlogd $255, %zmm4, %zmm4, %zmm4
> +        vaddps    {rn-sae}, %zmm7, %zmm5, %zmm9
> +        vpsubd    %zmm12, %zmm9, %zmm10
> +        vsubps    {rn-sae}, %zmm9, %zmm5, %zmm6
> +
> +/* check argument value ranges */
> +        vpaddd    iHiDelta+__svml_slog1p_data_internal(%rip), %zmm9, %zmm8
> +        vpsrad    $23, %zmm10, %zmm13
> +        vmovups   sPoly_5+__svml_slog1p_data_internal(%rip), %zmm9
> +        vpcmpd    $5, iLoRange+__svml_slog1p_data_internal(%rip), %zmm8, %k1
> +        vpslld    $23, %zmm13, %zmm14
> +        vaddps    {rn-sae}, %zmm7, %zmm6, %zmm15
> +        vcvtdq2ps {rn-sae}, %zmm13, %zmm0
> +        vpsubd    %zmm14, %zmm2, %zmm13
> +        vmovups   sPoly_8+__svml_slog1p_data_internal(%rip), %zmm7
> +        vmovups   sPoly_1+__svml_slog1p_data_internal(%rip), %zmm14
> +        vmulps    {rn-sae}, %zmm13, %zmm15, %zmm6
> +        vpandd    iOffExpoMask+__svml_slog1p_data_internal(%rip), %zmm10, %zmm11
> +        vpaddd    %zmm12, %zmm11, %zmm5
> +        vmovups   sPoly_4+__svml_slog1p_data_internal(%rip), %zmm10
> +        vmovups   sPoly_3+__svml_slog1p_data_internal(%rip), %zmm11
> +        vmovups   sPoly_2+__svml_slog1p_data_internal(%rip), %zmm12
> +
> +/* polynomial evaluation */
> +        vsubps    {rn-sae}, %zmm2, %zmm5, %zmm2
> +        vaddps    {rn-sae}, %zmm6, %zmm2, %zmm15
> +        vmovups   sPoly_7+__svml_slog1p_data_internal(%rip), %zmm2
> +        vfmadd231ps {rn-sae}, %zmm15, %zmm7, %zmm2
> +        vpandnd   %zmm8, %zmm8, %zmm4{%k1}
> +        vmovups   sPoly_6+__svml_slog1p_data_internal(%rip), %zmm8
> +
> +/* combine and get argument value range mask */
> +        vptestmd  %zmm4, %zmm4, %k0
> +        vfmadd213ps {rn-sae}, %zmm8, %zmm15, %zmm2
> +        kmovw     %k0, %edx
> +        vfmadd213ps {rn-sae}, %zmm9, %zmm15, %zmm2
> +        vfmadd213ps {rn-sae}, %zmm10, %zmm15, %zmm2
> +        vfmadd213ps {rn-sae}, %zmm11, %zmm15, %zmm2
> +        vfmadd213ps {rn-sae}, %zmm12, %zmm15, %zmm2
> +        vfmadd213ps {rn-sae}, %zmm14, %zmm15, %zmm2
> +        vmulps    {rn-sae}, %zmm15, %zmm2, %zmm4
> +        vfmadd213ps {rn-sae}, %zmm15, %zmm15, %zmm4
> +
> +/* final reconstruction */
> +        vmovups   sLn2+__svml_slog1p_data_internal(%rip), %zmm15
> +        vfmadd213ps {rn-sae}, %zmm4, %zmm15, %zmm0
> +        vorps     %zmm1, %zmm0, %zmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm3
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm3, 64(%rsp)
> +        vmovups   %zmm0, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $16, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     64(%rsp,%r14,4), %xmm0
> +        call      log1pf@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 128(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN16v_log1pf_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_slog1p_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 SgnMask[16][1];
> +        __declspec(align(64)) VUINT32 sOne[16][1];
> +        __declspec(align(64)) VUINT32 sPoly[8][16][1];
> +        __declspec(align(64)) VUINT32 iHiDelta[16][1];
> +        __declspec(align(64)) VUINT32 iLoRange[16][1];
> +        __declspec(align(64)) VUINT32 iBrkValue[16][1];
> +        __declspec(align(64)) VUINT32 iOffExpoMask[16][1];
> +        __declspec(align(64)) VUINT32 sLn2[16][1];
> +} __svml_slog1p_data_internal;
> +#endif
> +__svml_slog1p_data_internal:
> +        /*== SgnMask ==*/
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> +        /*== sOne = SP 1.0 ==*/
> +        .align 64
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== sPoly[] = SP polynomial ==*/
> +        .align 64
> +        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
> +        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
> +        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
> +        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
> +        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
> +        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
> +        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
> +        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
> +        /*== iHiDelta = SP 80000000-7f000000 ==*/
> +        .align 64
> +        .long 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000
> +        /*== iLoRange = SP 00800000+iHiDelta ==*/
> +        .align 64
> +        .long 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000
> +        /*== iBrkValue = SP 2/3 ==*/
> +        .align 64
> +        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
> +        /*== iOffExpoMask = SP significand mask ==*/
> +        .align 64
> +        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
> +        /*== sLn2 = SP ln(2) ==*/
> +        .align 64
> +        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
> +        .align 64
> +        .type	__svml_slog1p_data_internal,@object
> +        .size	__svml_slog1p_data_internal,.-__svml_slog1p_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core-sse2.S
> new file mode 100644
> index 0000000000..913c8290c8
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized log1pf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN4v_log1pf _ZGVbN4v_log1pf_sse2
> +#include "../svml_s_log1pf4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core.c
> new file mode 100644
> index 0000000000..b6aff48023
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized log1pf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN4v_log1pf
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN4v_log1pf, __GI__ZGVbN4v_log1pf,
> +	       __redirect__ZGVbN4v_log1pf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core_sse4.S
> new file mode 100644
> index 0000000000..ef1bae58c0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core_sse4.S
> @@ -0,0 +1,252 @@
> +/* Function log1pf vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
> + *    Get short reciprocal approximation Rcp ~ 1/xh
> + *    R = (Rcp*xh - 1.0) + Rcp*xl
> + *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
> + *       log(Rcp) is tabulated
> + *
> + *
> + */
> +
> +/* Offsets for data table __svml_slog1p_data_internal
> + */
> +#define SgnMask                       	0
> +#define sOne                          	16
> +#define sPoly                         	32
> +#define iHiDelta                      	160
> +#define iLoRange                      	176
> +#define iBrkValue                     	192
> +#define iOffExpoMask                  	208
> +#define sLn2                          	224
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN4v_log1pf_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +        movups    sOne+__svml_slog1p_data_internal(%rip), %xmm7
> +
> +/* compute 1+x as high, low parts */
> +        movaps    %xmm7, %xmm1
> +        movaps    %xmm7, %xmm5
> +        maxps     %xmm0, %xmm1
> +        minps     %xmm0, %xmm5
> +        movaps    %xmm1, %xmm4
> +
> +/* check argument value ranges */
> +        movdqu    iHiDelta+__svml_slog1p_data_internal(%rip), %xmm2
> +        addps     %xmm5, %xmm4
> +
> +/* reduction: compute r,n */
> +        movdqu    iBrkValue+__svml_slog1p_data_internal(%rip), %xmm3
> +        paddd     %xmm4, %xmm2
> +        movdqu    iOffExpoMask+__svml_slog1p_data_internal(%rip), %xmm8
> +        subps     %xmm4, %xmm1
> +        psubd     %xmm3, %xmm4
> +        addps     %xmm1, %xmm5
> +        pand      %xmm4, %xmm8
> +        psrad     $23, %xmm4
> +        cvtdq2ps  %xmm4, %xmm10
> +        pslld     $23, %xmm4
> +        movaps    %xmm7, %xmm1
> +        paddd     %xmm3, %xmm8
> +        psubd     %xmm4, %xmm1
> +        mulps     %xmm5, %xmm1
> +
> +/* polynomial evaluation */
> +        subps     %xmm7, %xmm8
> +
> +/* final reconstruction */
> +        mulps     sLn2+__svml_slog1p_data_internal(%rip), %xmm10
> +        addps     %xmm8, %xmm1
> +        movups    sPoly+112+__svml_slog1p_data_internal(%rip), %xmm9
> +        mulps     %xmm1, %xmm9
> +        movdqu    iLoRange+__svml_slog1p_data_internal(%rip), %xmm6
> +        pcmpgtd   %xmm2, %xmm6
> +        addps     sPoly+96+__svml_slog1p_data_internal(%rip), %xmm9
> +
> +/* combine and get argument value range mask */
> +        movmskps  %xmm6, %edx
> +        movups    SgnMask+__svml_slog1p_data_internal(%rip), %xmm11
> +        mulps     %xmm1, %xmm9
> +        andnps    %xmm0, %xmm11
> +        addps     sPoly+80+__svml_slog1p_data_internal(%rip), %xmm9
> +        mulps     %xmm1, %xmm9
> +        addps     sPoly+64+__svml_slog1p_data_internal(%rip), %xmm9
> +        mulps     %xmm1, %xmm9
> +        addps     sPoly+48+__svml_slog1p_data_internal(%rip), %xmm9
> +        mulps     %xmm1, %xmm9
> +        addps     sPoly+32+__svml_slog1p_data_internal(%rip), %xmm9
> +        mulps     %xmm1, %xmm9
> +        addps     sPoly+16+__svml_slog1p_data_internal(%rip), %xmm9
> +        mulps     %xmm1, %xmm9
> +        addps     sPoly+__svml_slog1p_data_internal(%rip), %xmm9
> +        mulps     %xmm1, %xmm9
> +        mulps     %xmm1, %xmm9
> +        addps     %xmm9, %xmm1
> +        addps     %xmm10, %xmm1
> +        orps      %xmm11, %xmm1
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm1
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movaps    %xmm1, %xmm0
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm0, 32(%rsp)
> +        movups    %xmm1, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm1
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm1
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      log1pf@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 48(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN4v_log1pf_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_slog1p_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(16)) VUINT32 SgnMask[4][1];
> +        __declspec(align(16)) VUINT32 sOne[4][1];
> +        __declspec(align(16)) VUINT32 sPoly[8][4][1];
> +        __declspec(align(16)) VUINT32 iHiDelta[4][1];
> +        __declspec(align(16)) VUINT32 iLoRange[4][1];
> +        __declspec(align(16)) VUINT32 iBrkValue[4][1];
> +        __declspec(align(16)) VUINT32 iOffExpoMask[4][1];
> +        __declspec(align(16)) VUINT32 sLn2[4][1];
> +} __svml_slog1p_data_internal;
> +#endif
> +__svml_slog1p_data_internal:
> +        /*== SgnMask ==*/
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> +        /*== sOne = SP 1.0 ==*/
> +        .align 16
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== sPoly[] = SP polynomial ==*/
> +        .align 16
> +        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
> +        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
> +        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
> +        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
> +        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
> +        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
> +        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
> +        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
> +        /*== iHiDelta = SP 80000000-7f000000 ==*/
> +        .align 16
> +        .long 0x01000000, 0x01000000, 0x01000000, 0x01000000
> +        /*== iLoRange = SP 00800000+iHiDelta ==*/
> +        .align 16
> +        .long 0x01800000, 0x01800000, 0x01800000, 0x01800000
> +        /*== iBrkValue = SP 2/3 ==*/
> +        .align 16
> +        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
> +        /*== iOffExpoMask = SP significand mask ==*/
> +        .align 16
> +        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
> +        /*== sLn2 = SP ln(2) ==*/
> +        .align 16
> +        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
> +        .align 16
> +        .type	__svml_slog1p_data_internal,@object
> +        .size	__svml_slog1p_data_internal,.-__svml_slog1p_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core-sse.S
> new file mode 100644
> index 0000000000..c0b97d89e6
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized log1pf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN8v_log1pf _ZGVdN8v_log1pf_sse_wrapper
> +#include "../svml_s_log1pf8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core.c
> new file mode 100644
> index 0000000000..a2bbe37129
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized log1pf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN8v_log1pf
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN8v_log1pf, __GI__ZGVdN8v_log1pf,
> +	       __redirect__ZGVdN8v_log1pf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core_avx2.S
> new file mode 100644
> index 0000000000..957dc23e3f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core_avx2.S
> @@ -0,0 +1,254 @@
> +/* Function log1pf vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
> + *    Get short reciprocal approximation Rcp ~ 1/xh
> + *    R = (Rcp*xh - 1.0) + Rcp*xl
> + *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
> + *       log(Rcp) is tabulated
> + *
> + *
> + */
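
For readers following the code: the single-precision variant below gets by
without a tabulated log(Rcp); the reduction is done against the 2/3
breakpoint (iBrkValue) and the rest comes from the sPoly[] polynomial plus
k * sLn2.  A rough scalar C model of that idea (a sketch only, not the
vector code; frexp stands in for the integer exponent manipulation, and
special inputs are left to the scalar callout just as the code does):

    #include <math.h>

    static float
    log1pf_model (float x)
    {
      double m = 1.0 + (double) x;  /* the vector code also keeps a low part */
      int k;
      double f = frexp (m, &k);     /* m = f * 2^k, f in [0.5, 1) */
      if (f < 2.0 / 3.0)            /* keep the reduced value near 1 */
        {
          f *= 2.0;
          k--;
        }
      double r = f - 1.0;           /* |r| < 1/3 */
      /* Horner evaluation of sPoly[7] .. sPoly[0] from the table below.  */
      double p = 0x1.1b09dap-3;
      p = p * r - 0x1.35b3c6p-3;
      p = p * r + 0x1.1f9624p-3;
      p = p * r - 0x1.515a6ep-3;
      p = p * r + 0x1.99c320p-3;
      p = p * r - 0x1.000b1cp-2;
      p = p * r + 0x1.555528p-2;
      p = p * r - 0.5;
      /* log1p(x) ~= k*ln2 + r + r^2*poly(r); 0x1.62e43p-1 is sLn2.  */
      return (float) (k * 0x1.62e43p-1 + (r + p * r * r));
    }
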
> +
> +/* Offsets for data table __svml_slog1p_data_internal
> + */
> +#define SgnMask                       	0
> +#define sOne                          	32
> +#define sPoly                         	64
> +#define iHiDelta                      	320
> +#define iLoRange                      	352
> +#define iBrkValue                     	384
> +#define iOffExpoMask                  	416
> +#define sLn2                          	448
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN8v_log1pf_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +        vmovups   sOne+__svml_slog1p_data_internal(%rip), %ymm2
> +
> +/* reduction: compute r,n */
> +        vmovups   iBrkValue+__svml_slog1p_data_internal(%rip), %ymm13
> +        vmovups   SgnMask+__svml_slog1p_data_internal(%rip), %ymm4
> +        vmovups   iLoRange+__svml_slog1p_data_internal(%rip), %ymm8
> +        vmovaps   %ymm0, %ymm3
> +
> +/* compute 1+x as high, low parts */
> +        vmaxps    %ymm3, %ymm2, %ymm5
> +        vminps    %ymm3, %ymm2, %ymm6
> +        vaddps    %ymm6, %ymm5, %ymm10
> +        vpsubd    %ymm13, %ymm10, %ymm11
> +
> +/* check argument value ranges */
> +        vpaddd    iHiDelta+__svml_slog1p_data_internal(%rip), %ymm10, %ymm9
> +        vsubps    %ymm10, %ymm5, %ymm7
> +        vpsrad    $23, %ymm11, %ymm14
> +        vpand     iOffExpoMask+__svml_slog1p_data_internal(%rip), %ymm11, %ymm12
> +        vpslld    $23, %ymm14, %ymm15
> +        vcvtdq2ps %ymm14, %ymm0
> +        vpsubd    %ymm15, %ymm2, %ymm14
> +        vandnps   %ymm3, %ymm4, %ymm1
> +        vaddps    %ymm7, %ymm6, %ymm4
> +        vpaddd    %ymm13, %ymm12, %ymm6
> +        vmulps    %ymm4, %ymm14, %ymm7
> +
> +/* polynomial evaluation */
> +        vsubps    %ymm2, %ymm6, %ymm2
> +        vpcmpgtd  %ymm9, %ymm8, %ymm5
> +        vmovups   sPoly+224+__svml_slog1p_data_internal(%rip), %ymm8
> +        vaddps    %ymm2, %ymm7, %ymm9
> +        vfmadd213ps sPoly+192+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
> +        vfmadd213ps sPoly+160+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
> +        vfmadd213ps sPoly+128+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
> +        vfmadd213ps sPoly+96+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
> +        vfmadd213ps sPoly+64+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
> +        vfmadd213ps sPoly+32+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
> +        vfmadd213ps sPoly+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
> +        vmulps    %ymm8, %ymm9, %ymm10
> +        vfmadd213ps %ymm9, %ymm9, %ymm10
> +
> +/* final reconstruction */
> +        vfmadd132ps sLn2+__svml_slog1p_data_internal(%rip), %ymm10, %ymm0
> +
> +/* combine and get argument value range mask */
> +        vmovmskps %ymm5, %edx
> +        vorps     %ymm1, %ymm0, %ymm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm3
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %ymm3, 32(%rsp)
> +        vmovups   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      log1pf@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 64(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN8v_log1pf_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_slog1p_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(32)) VUINT32 SgnMask[8][1];
> +        __declspec(align(32)) VUINT32 sOne[8][1];
> +        __declspec(align(32)) VUINT32 sPoly[8][8][1];
> +        __declspec(align(32)) VUINT32 iHiDelta[8][1];
> +        __declspec(align(32)) VUINT32 iLoRange[8][1];
> +        __declspec(align(32)) VUINT32 iBrkValue[8][1];
> +        __declspec(align(32)) VUINT32 iOffExpoMask[8][1];
> +        __declspec(align(32)) VUINT32 sLn2[8][1];
> +} __svml_slog1p_data_internal;
> +#endif
> +__svml_slog1p_data_internal:
> +        /*== SgnMask ==*/
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> +        /*== sOne = SP 1.0 ==*/
> +        .align 32
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== sPoly[] = SP polynomial ==*/
> +        .align 32
> +        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
> +        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
> +        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
> +        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
> +        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
> +        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
> +        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
> +        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
> +        /*== iHiDelta = SP 80000000-7f000000 ==*/
> +        .align 32
> +        .long 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000
> +        /*== iLoRange = SP 00800000+iHiDelta ==*/
> +        .align 32
> +        .long 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000
> +        /*== iBrkValue = SP 2/3 ==*/
> +        .align 32
> +        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
> +        /*== iOffExpoMask = SP significand mask ==*/
> +        .align 32
> +        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
> +        /*== sLn2 = SP ln(2) ==*/
> +        .align 32
> +        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
> +        .align 32
> +        .type	__svml_slog1p_data_internal,@object
> +        .size	__svml_slog1p_data_internal,.-__svml_slog1p_data_internal
> diff --git a/sysdeps/x86_64/fpu/svml_d_log1p2_core.S b/sysdeps/x86_64/fpu/svml_d_log1p2_core.S
> new file mode 100644
> index 0000000000..e3f01717d9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_log1p2_core.S
> @@ -0,0 +1,29 @@
> +/* Function log1p vectorized with SSE2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN2v_log1p)
> +WRAPPER_IMPL_SSE2 log1p
> +END (_ZGVbN2v_log1p)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN2v_log1p)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_log1p4_core.S b/sysdeps/x86_64/fpu/svml_d_log1p4_core.S
> new file mode 100644
> index 0000000000..49beb96183
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_log1p4_core.S
> @@ -0,0 +1,29 @@
> +/* Function log1p vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN4v_log1p)
> +WRAPPER_IMPL_AVX _ZGVbN2v_log1p
> +END (_ZGVdN4v_log1p)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN4v_log1p)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S
> new file mode 100644
> index 0000000000..8b89768b7c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function log1p vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVcN4v_log1p)
> +WRAPPER_IMPL_AVX _ZGVbN2v_log1p
> +END (_ZGVcN4v_log1p)
> diff --git a/sysdeps/x86_64/fpu/svml_d_log1p8_core.S b/sysdeps/x86_64/fpu/svml_d_log1p8_core.S
> new file mode 100644
> index 0000000000..54b4d4ede8
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_log1p8_core.S
> @@ -0,0 +1,25 @@
> +/* Function log1p vectorized with AVX-512, wrapper to AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN8v_log1p)
> +WRAPPER_IMPL_AVX512 _ZGVdN4v_log1p
> +END (_ZGVeN8v_log1p)
> diff --git a/sysdeps/x86_64/fpu/svml_s_log1pf16_core.S b/sysdeps/x86_64/fpu/svml_s_log1pf16_core.S
> new file mode 100644
> index 0000000000..2c953d00fb
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_log1pf16_core.S
> @@ -0,0 +1,25 @@
> +/* Function log1pf vectorized with AVX-512. Wrapper to AVX2 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN16v_log1pf)
> +WRAPPER_IMPL_AVX512 _ZGVdN8v_log1pf
> +END (_ZGVeN16v_log1pf)
> diff --git a/sysdeps/x86_64/fpu/svml_s_log1pf4_core.S b/sysdeps/x86_64/fpu/svml_s_log1pf4_core.S
> new file mode 100644
> index 0000000000..6f68762eaa
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_log1pf4_core.S
> @@ -0,0 +1,29 @@
> +/* Function log1pf vectorized with SSE2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN4v_log1pf)
> +WRAPPER_IMPL_SSE2 log1pf
> +END (_ZGVbN4v_log1pf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN4v_log1pf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_log1pf8_core.S b/sysdeps/x86_64/fpu/svml_s_log1pf8_core.S
> new file mode 100644
> index 0000000000..74f81283b1
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_log1pf8_core.S
> @@ -0,0 +1,29 @@
> +/* Function log1pf vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN8v_log1pf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_log1pf
> +END (_ZGVdN8v_log1pf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN8v_log1pf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S
> new file mode 100644
> index 0000000000..f33be0e904
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function log1pf vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +        .text
> +ENTRY (_ZGVcN8v_log1pf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_log1pf
> +END (_ZGVcN8v_log1pf)
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx.c
> new file mode 100644
> index 0000000000..18aa6aaeaa
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-log1p.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx2.c
> new file mode 100644
> index 0000000000..18aa6aaeaa
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx2.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-log1p.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx512f.c
> new file mode 100644
> index 0000000000..18aa6aaeaa
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-log1p.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log1p.c b/sysdeps/x86_64/fpu/test-double-libmvec-log1p.c
> new file mode 100644
> index 0000000000..40937f987a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-log1p.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE double
> +#define LIBMVEC_FUNC log1p
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> index 08c91ff634..38359b05e3 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> @@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVbN2v_cbrt)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVbN2vv_atan2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVbN2v_log10)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVbN2v_log2)
> +VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVbN2v_log1p)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> index a2fb0de309..17701e7731 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> @@ -43,6 +43,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVdN4v_cbrt)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVdN4vv_atan2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVdN4v_log10)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVdN4v_log2)
> +VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVdN4v_log1p)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m256i
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> index dc65a4ee25..bba62b2446 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> @@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVcN4v_cbrt)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVcN4vv_atan2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVcN4v_log10)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVcN4v_log2)
> +VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVcN4v_log1p)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> index 253ee8c906..8a04e13a07 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> @@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVeN8v_cbrt)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVeN8vv_atan2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVeN8v_log10)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVeN8v_log2)
> +VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVeN8v_log1p)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m512i
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx.c
> new file mode 100644
> index 0000000000..3395decaf4
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-log1pf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx2.c
> new file mode 100644
> index 0000000000..3395decaf4
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx2.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-log1pf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx512f.c
> new file mode 100644
> index 0000000000..3395decaf4
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-log1pf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log1pf.c b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf.c
> new file mode 100644
> index 0000000000..1b36069ded
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE float
> +#define LIBMVEC_FUNC log1pf
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> index 1c7db5146c..706f52c618 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> @@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVeN16v_cbrtf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVeN16vv_atan2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVeN16v_log10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVeN16v_log2f)
> +VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVeN16v_log1pf)
>  
>  #define VEC_INT_TYPE __m512i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> index 8ec51603b3..ceace4c53a 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> @@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVbN4v_cbrtf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVbN4vv_atan2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVbN4v_log10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVbN4v_log2f)
> +VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVbN4v_log1pf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> index 1cb4553c7a..06a4753409 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> @@ -43,6 +43,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVdN8v_cbrtf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVdN8vv_atan2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVdN8v_log10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVdN8v_log2f)
> +VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVdN8v_log1pf)
>  
>  /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
>  #undef VECTOR_WRAPPER_fFF
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> index 6ecc1792bb..a87e5298e0 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> @@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVcN8v_cbrtf)
>  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVcN8vv_atan2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVcN8v_log10f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVcN8v_log2f)
> +VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVcN8v_log1pf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> -- 
> 2.31.1
> 

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.


H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 18/18] x86-64: Add vector asinh/asinhf implementation to libmvec
  2021-12-29  6:40 ` [PATCH v5 18/18] x86-64: Add vector asinh/asinhf " Sunil K Pandey
@ 2021-12-29 21:27   ` H.J. Lu
  0 siblings, 0 replies; 40+ messages in thread
From: H.J. Lu @ 2021-12-29 21:27 UTC (permalink / raw)
  To: Sunil K Pandey; +Cc: libc-alpha, hjl.tools, andrey.kolesov, marius.cornea

On Tue, Dec 28, 2021 at 10:40:00PM -0800, Sunil K Pandey wrote:
> Implement vectorized asinh/asinhf containing SSE, AVX, AVX2 and
> AVX512 versions for libmvec as per vector ABI.  It also contains
> accuracy and ABI tests for vector asinh/asinhf with regenerated ulps.
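
With the __DECL_SIMD_asinh / math-vector-fortran.h declarations from this
patch in place, a vectorizable loop can pick these entry points up
automatically; a small sketch (function name, file name and the exact
compiler options here are illustrative, not part of the patch):

    /* e.g. gcc -O2 -mavx2 -ffast-math -fopenmp-simd asinh_loop.c */
    #include <math.h>

    void
    asinh_array (double *restrict out, const double *restrict in, int n)
    {
    #pragma omp simd
      for (int i = 0; i < n; i++)
        out[i] = asinh (in[i]);   /* may become a call to _ZGVdN4v_asinh */
    }
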
> ---
>  bits/libm-simd-decl-stubs.h                   |   11 +
>  math/bits/mathcalls.h                         |    2 +-
>  .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
>  sysdeps/x86/fpu/bits/math-vector.h            |    4 +
>  .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
>  sysdeps/x86_64/fpu/Makeconfig                 |    1 +
>  sysdeps/x86_64/fpu/Versions                   |    2 +
>  sysdeps/x86_64/fpu/libm-test-ulps             |   17 +
>  .../fpu/multiarch/svml_d_asinh2_core-sse2.S   |   20 +
>  .../x86_64/fpu/multiarch/svml_d_asinh2_core.c |   27 +
>  .../fpu/multiarch/svml_d_asinh2_core_sse4.S   | 1662 +++++++++++++++++
>  .../fpu/multiarch/svml_d_asinh4_core-sse.S    |   20 +
>  .../x86_64/fpu/multiarch/svml_d_asinh4_core.c |   27 +
>  .../fpu/multiarch/svml_d_asinh4_core_avx2.S   | 1601 ++++++++++++++++
>  .../fpu/multiarch/svml_d_asinh8_core-avx2.S   |   20 +
>  .../x86_64/fpu/multiarch/svml_d_asinh8_core.c |   27 +
>  .../fpu/multiarch/svml_d_asinh8_core_avx512.S |  510 +++++
>  .../fpu/multiarch/svml_s_asinhf16_core-avx2.S |   20 +
>  .../fpu/multiarch/svml_s_asinhf16_core.c      |   28 +
>  .../multiarch/svml_s_asinhf16_core_avx512.S   |  476 +++++
>  .../fpu/multiarch/svml_s_asinhf4_core-sse2.S  |   20 +
>  .../fpu/multiarch/svml_s_asinhf4_core.c       |   28 +
>  .../fpu/multiarch/svml_s_asinhf4_core_sse4.S  |  509 +++++
>  .../fpu/multiarch/svml_s_asinhf8_core-sse.S   |   20 +
>  .../fpu/multiarch/svml_s_asinhf8_core.c       |   28 +
>  .../fpu/multiarch/svml_s_asinhf8_core_avx2.S  |  457 +++++
>  sysdeps/x86_64/fpu/svml_d_asinh2_core.S       |   29 +
>  sysdeps/x86_64/fpu/svml_d_asinh4_core.S       |   29 +
>  sysdeps/x86_64/fpu/svml_d_asinh4_core_avx.S   |   25 +
>  sysdeps/x86_64/fpu/svml_d_asinh8_core.S       |   25 +
>  sysdeps/x86_64/fpu/svml_s_asinhf16_core.S     |   25 +
>  sysdeps/x86_64/fpu/svml_s_asinhf4_core.S      |   29 +
>  sysdeps/x86_64/fpu/svml_s_asinhf8_core.S      |   29 +
>  sysdeps/x86_64/fpu/svml_s_asinhf8_core_avx.S  |   25 +
>  .../fpu/test-double-libmvec-asinh-avx.c       |    1 +
>  .../fpu/test-double-libmvec-asinh-avx2.c      |    1 +
>  .../fpu/test-double-libmvec-asinh-avx512f.c   |    1 +
>  .../x86_64/fpu/test-double-libmvec-asinh.c    |    3 +
>  .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
>  .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
>  .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
>  .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
>  .../fpu/test-float-libmvec-asinhf-avx.c       |    1 +
>  .../fpu/test-float-libmvec-asinhf-avx2.c      |    1 +
>  .../fpu/test-float-libmvec-asinhf-avx512f.c   |    1 +
>  .../x86_64/fpu/test-float-libmvec-asinhf.c    |    3 +
>  .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
>  .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
>  .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
>  .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
>  50 files changed, 5784 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_asinh2_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_asinh4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_asinh4_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_asinh8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_asinhf16_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_asinhf4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_asinhf8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_asinhf8_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-asinh.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-asinhf.c
> 
> diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
> index 21f1a43232..bcaddb7a0e 100644
> --- a/bits/libm-simd-decl-stubs.h
> +++ b/bits/libm-simd-decl-stubs.h
> @@ -296,4 +296,15 @@
>  #define __DECL_SIMD_tanhf32x
>  #define __DECL_SIMD_tanhf64x
>  #define __DECL_SIMD_tanhf128x
> +
> +#define __DECL_SIMD_asinh
> +#define __DECL_SIMD_asinhf
> +#define __DECL_SIMD_asinhl
> +#define __DECL_SIMD_asinhf16
> +#define __DECL_SIMD_asinhf32
> +#define __DECL_SIMD_asinhf64
> +#define __DECL_SIMD_asinhf128
> +#define __DECL_SIMD_asinhf32x
> +#define __DECL_SIMD_asinhf64x
> +#define __DECL_SIMD_asinhf128x
>  #endif
> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index 3d1c2056d5..40e055e579 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -84,7 +84,7 @@ __MATHDECL_VEC (void,sincos,,
>  /* Hyperbolic arc cosine of X.  */
>  __MATHCALL_VEC (acosh,, (_Mdouble_ __x));
>  /* Hyperbolic arc sine of X.  */
> -__MATHCALL (asinh,, (_Mdouble_ __x));
> +__MATHCALL_VEC (asinh,, (_Mdouble_ __x));
>  /* Hyperbolic arc tangent of X.  */
>  __MATHCALL_VEC (atanh,, (_Mdouble_ __x));
>  #endif
> diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> index e178cef683..df265d6a12 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> @@ -49,6 +49,7 @@ GLIBC_2.22 _ZGVeN8vvv_sincos F
>  GLIBC_2.35 _ZGVbN2v_acos F
>  GLIBC_2.35 _ZGVbN2v_acosh F
>  GLIBC_2.35 _ZGVbN2v_asin F
> +GLIBC_2.35 _ZGVbN2v_asinh F
>  GLIBC_2.35 _ZGVbN2v_atan F
>  GLIBC_2.35 _ZGVbN2v_atanh F
>  GLIBC_2.35 _ZGVbN2v_cbrt F
> @@ -67,6 +68,7 @@ GLIBC_2.35 _ZGVbN2vv_hypot F
>  GLIBC_2.35 _ZGVbN4v_acosf F
>  GLIBC_2.35 _ZGVbN4v_acoshf F
>  GLIBC_2.35 _ZGVbN4v_asinf F
> +GLIBC_2.35 _ZGVbN4v_asinhf F
>  GLIBC_2.35 _ZGVbN4v_atanf F
>  GLIBC_2.35 _ZGVbN4v_atanhf F
>  GLIBC_2.35 _ZGVbN4v_cbrtf F
> @@ -85,6 +87,7 @@ GLIBC_2.35 _ZGVbN4vv_hypotf F
>  GLIBC_2.35 _ZGVcN4v_acos F
>  GLIBC_2.35 _ZGVcN4v_acosh F
>  GLIBC_2.35 _ZGVcN4v_asin F
> +GLIBC_2.35 _ZGVcN4v_asinh F
>  GLIBC_2.35 _ZGVcN4v_atan F
>  GLIBC_2.35 _ZGVcN4v_atanh F
>  GLIBC_2.35 _ZGVcN4v_cbrt F
> @@ -103,6 +106,7 @@ GLIBC_2.35 _ZGVcN4vv_hypot F
>  GLIBC_2.35 _ZGVcN8v_acosf F
>  GLIBC_2.35 _ZGVcN8v_acoshf F
>  GLIBC_2.35 _ZGVcN8v_asinf F
> +GLIBC_2.35 _ZGVcN8v_asinhf F
>  GLIBC_2.35 _ZGVcN8v_atanf F
>  GLIBC_2.35 _ZGVcN8v_atanhf F
>  GLIBC_2.35 _ZGVcN8v_cbrtf F
> @@ -121,6 +125,7 @@ GLIBC_2.35 _ZGVcN8vv_hypotf F
>  GLIBC_2.35 _ZGVdN4v_acos F
>  GLIBC_2.35 _ZGVdN4v_acosh F
>  GLIBC_2.35 _ZGVdN4v_asin F
> +GLIBC_2.35 _ZGVdN4v_asinh F
>  GLIBC_2.35 _ZGVdN4v_atan F
>  GLIBC_2.35 _ZGVdN4v_atanh F
>  GLIBC_2.35 _ZGVdN4v_cbrt F
> @@ -139,6 +144,7 @@ GLIBC_2.35 _ZGVdN4vv_hypot F
>  GLIBC_2.35 _ZGVdN8v_acosf F
>  GLIBC_2.35 _ZGVdN8v_acoshf F
>  GLIBC_2.35 _ZGVdN8v_asinf F
> +GLIBC_2.35 _ZGVdN8v_asinhf F
>  GLIBC_2.35 _ZGVdN8v_atanf F
>  GLIBC_2.35 _ZGVdN8v_atanhf F
>  GLIBC_2.35 _ZGVdN8v_cbrtf F
> @@ -157,6 +163,7 @@ GLIBC_2.35 _ZGVdN8vv_hypotf F
>  GLIBC_2.35 _ZGVeN16v_acosf F
>  GLIBC_2.35 _ZGVeN16v_acoshf F
>  GLIBC_2.35 _ZGVeN16v_asinf F
> +GLIBC_2.35 _ZGVeN16v_asinhf F
>  GLIBC_2.35 _ZGVeN16v_atanf F
>  GLIBC_2.35 _ZGVeN16v_atanhf F
>  GLIBC_2.35 _ZGVeN16v_cbrtf F
> @@ -175,6 +182,7 @@ GLIBC_2.35 _ZGVeN16vv_hypotf F
>  GLIBC_2.35 _ZGVeN8v_acos F
>  GLIBC_2.35 _ZGVeN8v_acosh F
>  GLIBC_2.35 _ZGVeN8v_asin F
> +GLIBC_2.35 _ZGVeN8v_asinh F
>  GLIBC_2.35 _ZGVeN8v_atan F
>  GLIBC_2.35 _ZGVeN8v_atanh F
>  GLIBC_2.35 _ZGVeN8v_cbrt F
> diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
> index 3c657f6108..71b7d660db 100644
> --- a/sysdeps/x86/fpu/bits/math-vector.h
> +++ b/sysdeps/x86/fpu/bits/math-vector.h
> @@ -130,6 +130,10 @@
>  #  define __DECL_SIMD_tanh __DECL_SIMD_x86_64
>  #  undef __DECL_SIMD_tanhf
>  #  define __DECL_SIMD_tanhf __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_asinh
> +#  define __DECL_SIMD_asinh __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_asinhf
> +#  define __DECL_SIMD_asinhf __DECL_SIMD_x86_64
>  
>  # endif
>  #endif
> diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> index c7f81945fe..4d3afdf753 100644
> --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> @@ -64,6 +64,8 @@
>  !GCC$ builtin (erff) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (tanh) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (tanhf) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (asinh) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (asinhf) attributes simd (notinbranch) if('x86_64')
>  
>  !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
> @@ -113,3 +115,5 @@
>  !GCC$ builtin (erff) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (tanh) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (tanhf) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (asinh) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (asinhf) attributes simd (notinbranch) if('x32')
> diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> index 26df8d47bf..2ff33c7dd8 100644
> --- a/sysdeps/x86_64/fpu/Makeconfig
> +++ b/sysdeps/x86_64/fpu/Makeconfig
> @@ -25,6 +25,7 @@ libmvec-funcs = \
>    acos \
>    acosh \
>    asin \
> +  asinh \
>    atan \
>    atan2 \
>    atanh \
> diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> index adcbe0fefb..e6ead13085 100644
> --- a/sysdeps/x86_64/fpu/Versions
> +++ b/sysdeps/x86_64/fpu/Versions
> @@ -17,6 +17,7 @@ libmvec {
>      _ZGVbN2v_acos; _ZGVcN4v_acos; _ZGVdN4v_acos; _ZGVeN8v_acos;
>      _ZGVbN2v_acosh; _ZGVcN4v_acosh; _ZGVdN4v_acosh; _ZGVeN8v_acosh;
>      _ZGVbN2v_asin; _ZGVcN4v_asin; _ZGVdN4v_asin; _ZGVeN8v_asin;
> +    _ZGVbN2v_asinh; _ZGVcN4v_asinh; _ZGVdN4v_asinh; _ZGVeN8v_asinh;
>      _ZGVbN2v_atan; _ZGVcN4v_atan; _ZGVdN4v_atan; _ZGVeN8v_atan;
>      _ZGVbN2v_atanh; _ZGVcN4v_atanh; _ZGVdN4v_atanh; _ZGVeN8v_atanh;
>      _ZGVbN2v_cbrt; _ZGVcN4v_cbrt; _ZGVdN4v_cbrt; _ZGVeN8v_cbrt;
> @@ -35,6 +36,7 @@ libmvec {
>      _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
>      _ZGVbN4v_acoshf; _ZGVcN8v_acoshf; _ZGVdN8v_acoshf; _ZGVeN16v_acoshf;
>      _ZGVbN4v_asinf; _ZGVcN8v_asinf; _ZGVdN8v_asinf; _ZGVeN16v_asinf;
> +    _ZGVbN4v_asinhf; _ZGVcN8v_asinhf; _ZGVdN8v_asinhf; _ZGVeN16v_asinhf;
>      _ZGVbN4v_atanf; _ZGVcN8v_atanf; _ZGVdN8v_atanf; _ZGVeN16v_atanf;
>      _ZGVbN4v_atanhf; _ZGVcN8v_atanhf; _ZGVdN8v_atanhf; _ZGVeN16v_atanhf;
>      _ZGVbN4v_cbrtf; _ZGVcN8v_cbrtf; _ZGVdN8v_cbrtf; _ZGVeN16v_cbrtf;
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index bfaad7acef..71e9fced02 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -157,6 +157,23 @@ float: 3
>  float128: 4
>  ldouble: 5
>  
> +Function: "asinh_vlen2":
> +double: 1
> +
> +Function: "asinh_vlen4":
> +double: 1
> +float: 1
> +
> +Function: "asinh_vlen4_avx2":
> +double: 1
> +
> +Function: "asinh_vlen8":
> +double: 1
> +float: 1
> +
> +Function: "asinh_vlen8_avx2":
> +float: 1
> +
>  Function: "atan":
>  double: 1
>  float: 1
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core-sse2.S
> new file mode 100644
> index 0000000000..ddd1c3ca24
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized asinh, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN2v_asinh _ZGVbN2v_asinh_sse2
> +#include "../svml_d_asinh2_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core.c
> new file mode 100644
> index 0000000000..37452d0f92
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized asinh, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN2v_asinh
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN2v_asinh, __GI__ZGVbN2v_asinh, __redirect__ZGVbN2v_asinh)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core_sse4.S
> new file mode 100644
> index 0000000000..0fe130f20a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh2_core_sse4.S
> @@ -0,0 +1,1662 @@
> +/* Function asinh vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute asinh(x) as log(x + sqrt(x*x + 1))
> + *
> + *   Special cases:
> + *
> + *   asinh(NaN) = quiet NaN, and raise invalid exception
> + *   asinh(INF) = that INF
> + *   asinh(0)   = that 0
> + *
> + */
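
In scalar terms the main path is just this identity; a rough C model (not
the vector code, which splits x*x + 1 into high/low pieces and, as the
comments further down explain, rescales very large inputs instead of
forming x*x directly):

    #include <math.h>

    static double
    asinh_model (double x)
    {
      double ax = fabs (x);                  /* asinh is odd: work on |x| */
      double y = log (ax + sqrt (ax * ax + 1.0));
      return copysign (y, x);                /* NaN, +-Inf and +-0 fall out naturally */
    }
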
> +
> +/* Offsets for data table __svml_dasinh_data_internal
> + */
> +#define Log_HA_table                  	0
> +#define Log_LA_table                  	8208
> +#define poly_coeff                    	12320
> +#define ExpMask                       	12384
> +#define Two10                         	12400
> +#define MinLog1p                      	12416
> +#define MaxLog1p                      	12432
> +#define One                           	12448
> +#define SgnMask                       	12464
> +#define XThreshold                    	12480
> +#define XhMask                        	12496
> +#define Threshold                     	12512
> +#define Bias                          	12528
> +#define Bias1                         	12544
> +#define ExpMask0                      	12560
> +#define ExpMask2                      	12576
> +#define L2                            	12592
> +#define dBigThreshold                 	12608
> +#define dC2                           	12624
> +#define dC3                           	12640
> +#define dC4                           	12656
> +#define dC5                           	12672
> +#define dHalf                         	12688
> +#define dLargestFinite                	12704
> +#define dLittleThreshold              	12720
> +#define dSign                         	12736
> +#define dThirtyOne                    	12752
> +#define dTopMask12                    	12768
> +#define dTopMask26                    	12784
> +#define dTopMask29                    	12800
> +#define XScale                        	12816
> +
> +/* Lookup bias for data table __svml_dasinh_data_internal.  */
> +#define Table_Lookup_Bias               -0x405ff0
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN2v_asinh_sse4)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $64, %rsp
> +        movaps    %xmm0, %xmm13
> +
> +/*
> + * Split X into high and low parts, XHi (<= 26 bits) and XLo (<= 27 bits)
> + * We could use either X or |X| here, but it doesn't seem to matter
> + */
> +        movups    dTopMask26+__svml_dasinh_data_internal(%rip), %xmm15
> +        movaps    %xmm13, %xmm7
> +        andps     %xmm13, %xmm15
> +        lea       Table_Lookup_Bias+__svml_dasinh_data_internal(%rip), %rsi
> +
> +/*
> + * Compute X^2 = (XHi + XLo)^2 = XHi^2 + XLo * (X + XHi)
> + * The two parts are shifted off by around 26 bits. So even though
> + * the low bit will not in general be exact, it's near enough
> + */
> +        movaps    %xmm15, %xmm8
> +        mulpd     %xmm15, %xmm8
> +        subpd     %xmm15, %xmm7
> +        addpd     %xmm13, %xmm15
> +
> +/* Load the constant 1 and a sign mask */
> +        movups    One+__svml_dasinh_data_internal(%rip), %xmm12
> +
> +/*
> + * Finally, express Y + W = X^2 + 1 accurately where Y has <= 29 bits.
> + * If |X| <= 1 then |XHi| <= 1 and so |X2Hi| <= 1, so we can treat 1
> + * as the dominant component in the compensated summation. Otherwise,
> + * if |X| >= 1, then since X2Hi only has 52 significant bits, the basic
> + * addition will be exact anyway until we get to |X| >= 2^53. But by
> + * that time the log function is well-conditioned enough that the
> + * rounding error doesn't matter. Hence we can treat 1 as dominant even
> + * if it literally isn't.
> + */
> +        movaps    %xmm12, %xmm3
> +        movaps    %xmm12, %xmm5
> +        addpd     %xmm8, %xmm3
> +        mulpd     %xmm15, %xmm7
> +        subpd     %xmm3, %xmm5
> +        movups    dTopMask29+__svml_dasinh_data_internal(%rip), %xmm6
> +        andps     %xmm3, %xmm6
> +
> +/*
> + * Compute R = 1/sqrt(Y + W) * (1 + d)
> + * Force R to <= 12 significant bits in case it isn't already
> + * This means that R * Y and R^2 * Y are exactly representable.
> + */
> +        cvtpd2ps  %xmm6, %xmm1
> +        addpd     %xmm8, %xmm5
> +        subpd     %xmm6, %xmm3
> +
> +/*
> + * Unfortunately, we can still be in trouble if |X| <= 2^-10, since
> + * the absolute error 2^-(12+53)-ish in sqrt(1 + X^2) gets scaled up
> + * by 1/X and comes close to our threshold. Hence if |X| <= 2^-9,
> + * perform an alternative computation
> + * sqrt(1 + X^2) - 1 = X^2/2 - X^4/8 + X^6/16
> + * X2 = X^2
> + */
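
Series check, for the record: with t = X^2 we have
sqrt(1 + t) = 1 + t/2 - t^2/8 + t^3/16 - ..., hence
sqrt(1 + X^2) - 1 = X^2/2 - X^4/8 + X^6/16 - ..., which is exactly the
three-term polynomial assembled from dX2over2 and dX46over2 further
down.
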
> +        addpd     %xmm7, %xmm8
> +        addpd     %xmm7, %xmm5
> +        movlhps   %xmm1, %xmm1
> +        rsqrtps   %xmm1, %xmm4
> +        addpd     %xmm3, %xmm5
> +        cvtps2pd  %xmm4, %xmm2
> +        andps     dTopMask12+__svml_dasinh_data_internal(%rip), %xmm2
> +
> +/*
> + * Compute e = -(2 * d + d^2)
> + * The first FMR is exact, and the rounding error in the other is acceptable
> + * since d and e are ~ 2^-12
> + */
> +        movaps    %xmm12, %xmm1
> +
> +/*
> + * Compute S = (Y/sqrt(Y + W)) * (1 + d)
> + * and T = (W/sqrt(Y + W)) * (1 + d)
> + * so that S + T = sqrt(Y + W) * (1 + d)
> + * S is exact, and the rounding error in T is OK.
> + */
> +        mulpd     %xmm2, %xmm6
> +        mulpd     %xmm2, %xmm5
> +        movaps    %xmm2, %xmm0
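
Putting the last two comments together: R was forced to
(1 + d) / sqrt(Y + W), and S = R * Y, T = R * W, so
R * S + R * T = R^2 * (Y + W) = (1 + d)^2, and the two subpd below
leave exactly e = 1 - R * S - R * T = -(2 * d + d^2) in xmm1.
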
> +
> +/*
> + * Obtain sqrt(1 + X^2) - 1 in two pieces
> + * sqrt(1 + X^2) - 1
> + * = sqrt(Y + W) - 1
> + * = (S + T) * (1 + Corr) - 1
> + * = [S - 1] + [T + (S + T) * Corr]
> + * We need a compensated summation for the last part. We treat S - 1
> + * as the larger part; it certainly is until about X < 2^-4, and in that
> + * case, the error is affordable since X dominates over sqrt(1 + X^2) - 1
> + * Final sum is dTmp5 (hi) + dTmp7 (lo)
> + */
> +        movaps    %xmm6, %xmm3
> +        mulpd     %xmm6, %xmm0
> +        mulpd     %xmm5, %xmm2
> +        subpd     %xmm0, %xmm1
> +        addpd     %xmm5, %xmm3
> +        subpd     %xmm12, %xmm6
> +        subpd     %xmm2, %xmm1
> +        movups    SgnMask+__svml_dasinh_data_internal(%rip), %xmm9
> +        movaps    %xmm12, %xmm4
> +
> +/*
> + * Get the absolute value of the input, since we will exploit antisymmetry
> + * and mostly assume X >= 0 in the core computation
> + */
> +        movaps    %xmm9, %xmm10
> +        andps     %xmm13, %xmm10
> +
> +/*
> + * Check whether the input is finite, by checking |X| <= MaxFloat
> + * Otherwise set the rangemask so that the callout will get used.
> + * Note that this will also use the callout for NaNs since not(NaN <= MaxFloat)
> + */
> +        movaps    %xmm10, %xmm14
> +
> +/*
> + * The following computation can go wrong for very large X, basically
> + * because X^2 overflows. But for large X we have
> + * asinh(X) - log(2 X) =~= 1/(4 * X^2), so for X >= 2^30
> + * we can just later stick X back into the log and tweak up the exponent.
> + * Actually we scale X by 2^-30 and tweak the exponent up by 31,
> + * to stay in the safe range for the later log computation.
> + * Compute a flag now telling us when to do this.
> + */
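
Derivation of that estimate, since it is what makes 2^30 a safe
cutoff: for large X, sqrt(X^2 + 1) = X + 1/(2X) + O(1/X^3), so
X + sqrt(X^2 + 1) = 2X * (1 + 1/(4X^2) + ...) and therefore
asinh(X) = log(2X) + 1/(4X^2) + ...; the neglected term is below 2^-62
once X >= 2^30, far under double precision.
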
> +        movaps    %xmm10, %xmm11
> +        cmpnlepd  dLargestFinite+__svml_dasinh_data_internal(%rip), %xmm14
> +        cmpltpd   dBigThreshold+__svml_dasinh_data_internal(%rip), %xmm11
> +        movmskpd  %xmm14, %edx
> +
> +/*
> + * Now       1 / (1 + d)
> + * = 1 / (1 + (sqrt(1 - e) - 1))
> + * = 1 / sqrt(1 - e)
> + * = 1 + 1/2 * e + 3/8 * e^2 + 5/16 * e^3 + 35/128 * e^4 +
> + * 63/256 * e^5 + 231/1024 * e^6 + ....
> + * So compute the first five nonconstant terms of that, so that
> + * we have a relative correction (1 + Corr) to apply to S etc.
> + * C1 = 1/2
> + * C2 = 3/8
> + * C3 = 5/16
> + * C4 = 35/128
> + * C5 = 63/256
> + */
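
The coefficients match the binomial series
(1 - e)^(-1/2) = sum_k C(2k,k)/4^k * e^k, and the Horner chain built
below from dC5..dC2 and dHalf is, in scalar form:

  /* Reviewer sketch: returns Corr such that (1 + Corr) ~= 1/(1 + d),
     given e = -(2*d + d^2).  */
  static double
  rsqrt_corr (double e)
  {
    return e * (0.5 + e * (0.375 + e * (0.3125
                  + e * (0.2734375 + e * 0.24609375))));
  }
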
> +        movups    dC5+__svml_dasinh_data_internal(%rip), %xmm14
> +        movups    dHalf+__svml_dasinh_data_internal(%rip), %xmm15
> +        mulpd     %xmm1, %xmm14
> +
> +/* dX2over2 = X^2/2 */
> +        mulpd     %xmm15, %xmm8
> +        addpd     dC4+__svml_dasinh_data_internal(%rip), %xmm14
> +        mulpd     %xmm1, %xmm14
> +        addpd     dC3+__svml_dasinh_data_internal(%rip), %xmm14
> +        mulpd     %xmm1, %xmm14
> +        addpd     dC2+__svml_dasinh_data_internal(%rip), %xmm14
> +        mulpd     %xmm1, %xmm14
> +        addpd     %xmm15, %xmm14
> +        mulpd     %xmm14, %xmm1
> +        mulpd     %xmm3, %xmm1
> +        addpd     %xmm1, %xmm5
> +        addpd     %xmm6, %xmm5
> +
> +/* dX4over4 = X^4/4 */
> +        movaps    %xmm8, %xmm6
> +
> +/* dX46 = -X^4/4 + X^6/8 */
> +        movaps    %xmm8, %xmm7
> +        mulpd     %xmm8, %xmm6
> +        mulpd     %xmm6, %xmm7
> +        subpd     %xmm6, %xmm7
> +
> +/* dX46over2 = -X^4/8 + x^6/16 */
> +        mulpd     %xmm7, %xmm15
> +
> +/* Now multiplex the two possible computations */
> +        movaps    %xmm10, %xmm3
> +        cmplepd   dLittleThreshold+__svml_dasinh_data_internal(%rip), %xmm3
> +        addpd     %xmm15, %xmm8
> +        movaps    %xmm3, %xmm1
> +        andps     %xmm3, %xmm8
> +        andnps    %xmm5, %xmm1
> +
> +/*
> + * Now we feed into the log1p code, using H in place of _VARG1 and
> + * also adding L into Xl.
> + * compute 1+x as high, low parts
> + */
> +        movaps    %xmm12, %xmm5
> +        orps      %xmm8, %xmm1
> +        movaps    %xmm11, %xmm3
> +
> +/*
> + * Now do another compensated sum to add |X| + [sqrt(1 + X^2) - 1].
> + * It's always safe to assume |X| is larger.
> + * This is the final 2-part argument to the log1p function
> + */
> +        addpd     %xmm10, %xmm1
> +        maxpd     %xmm1, %xmm5
> +        minpd     %xmm1, %xmm4
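
Reading note: the maxpd/minpd pair establishes the "larger operand
first" precondition for the Fast2Sum that follows (hi = a + b;
lo = (a - hi) + b with |a| >= |b|).  Here the operands are 1.0 and the
nonnegative log1p argument, so ordering by value is ordering by
magnitude; the XThreshold/XhMask masking just below only adjusts this
for very small arguments.
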
> +
> +/* Now multiplex to the case X = 2^-30 * |input|, Xl = dL = 0 in the "big" case. */
> +        movups    XScale+__svml_dasinh_data_internal(%rip), %xmm8
> +        andps     %xmm9, %xmm1
> +        mulpd     %xmm8, %xmm10
> +        cmpltpd   XThreshold+__svml_dasinh_data_internal(%rip), %xmm1
> +        movaps    %xmm5, %xmm9
> +        andnps    %xmm10, %xmm3
> +        addpd     %xmm4, %xmm9
> +        orps      XhMask+__svml_dasinh_data_internal(%rip), %xmm1
> +        andps     %xmm1, %xmm9
> +        subpd     %xmm9, %xmm5
> +        andps     %xmm11, %xmm9
> +
> +/* Now resume the main code. */
> +        movups    ExpMask+__svml_dasinh_data_internal(%rip), %xmm10
> +        orps      %xmm9, %xmm3
> +
> +/* preserve mantissa, set input exponent to 2^(-10) */
> +        andps     %xmm3, %xmm10
> +
> +/* exponent bits */
> +        movaps    %xmm3, %xmm7
> +        orps      Two10+__svml_dasinh_data_internal(%rip), %xmm10
> +        psrlq     $20, %xmm7
> +
> +/* reciprocal approximation good to at least 11 bits */
> +        cvtpd2ps  %xmm10, %xmm1
> +        addpd     %xmm5, %xmm4
> +        movlhps   %xmm1, %xmm1
> +        andps     %xmm11, %xmm4
> +        rcpps     %xmm1, %xmm0
> +        cvtps2pd  %xmm0, %xmm0
> +
> +/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
> +        movups    .FLT_30(%rip), %xmm6
> +        movaps    %xmm11, %xmm1
> +        addpd     %xmm6, %xmm0
> +        subpd     %xmm6, %xmm0
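
The addpd/subpd with .FLT_30 is the usual "shifter" trick: adding a
large constant S and subtracting it again snaps a value to a grid of
spacing ulp(S), which is how the reciprocal ends up with only 1+9
mantissa bits.  .FLT_30 itself is defined outside this hunk, so the
constant below is only an illustration of the idiom, not the value the
patch uses:

  /* With S = 0x1.8p52, (x + S) - S rounds x to the nearest integer
     (for |x| < 2^51) under the default rounding mode.  */
  static double
  snap (double x)
  {
    const double S = 0x1.8p52;
    return (x + S) - S;
  }
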
> +
> +/* exponent of X needed to scale Xl */
> +        movdqu    ExpMask0+__svml_dasinh_data_internal(%rip), %xmm5
> +
> +/* 2^(-10-exp(X)) */
> +        movdqu    ExpMask2+__svml_dasinh_data_internal(%rip), %xmm2
> +        pand      %xmm3, %xmm5
> +        psubq     %xmm5, %xmm2
> +
> +/* scale DblRcp */
> +        mulpd     %xmm0, %xmm2
> +
> +/* argument reduction */
> +        mulpd     %xmm2, %xmm3
> +        mulpd     %xmm2, %xmm4
> +        subpd     %xmm12, %xmm3
> +        addpd     %xmm4, %xmm3
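
For readers less used to this style of log kernel: the reduction just
performed and the polynomial/table reconstruction that follows have
roughly this scalar shape (a reviewer illustration that calls log()
where the real code reads Log_LA_table; names are mine, and it is not
a model of the patch's accuracy):

  #include <math.h>

  static double
  log_by_reduction (double x)
  {
    int k;
    double m = frexp (x, &k);                   /* x = m * 2^k          */
    double rcp = nearbyint (512.0 / m) / 512.0; /* ~9-bit reciprocal    */
    double r = m * rcp - 1.0;                   /* small reduced arg    */
    const double ln2 = 0.6931471805599453;
    return k * ln2 - log (rcp) + log1p (r);     /* table gives -log(rcp) */
  }
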
> +
> +/* polynomial */
> +        movups    poly_coeff+__svml_dasinh_data_internal(%rip), %xmm12
> +        movaps    %xmm3, %xmm2
> +        pshufd    $221, %xmm7, %xmm8
> +        mulpd     %xmm3, %xmm12
> +
> +/* biased exponent in DP format */
> +        cvtdq2pd  %xmm8, %xmm14
> +        addpd     poly_coeff+16+__svml_dasinh_data_internal(%rip), %xmm12
> +        mulpd     %xmm3, %xmm2
> +
> +/* Add 31 to the exponent in the "large" case to get log(2 * input) */
> +        movups    dThirtyOne+__svml_dasinh_data_internal(%rip), %xmm9
> +
> +/* exponent*log(2.0) */
> +        movups    Threshold+__svml_dasinh_data_internal(%rip), %xmm5
> +        addpd     %xmm14, %xmm9
> +        cmpltpd   %xmm0, %xmm5
> +        mulpd     %xmm2, %xmm12
> +        andps     %xmm11, %xmm14
> +
> +/*
> + * prepare table index
> + * table lookup
> + */
> +        movaps    %xmm0, %xmm11
> +        movups    poly_coeff+32+__svml_dasinh_data_internal(%rip), %xmm0
> +        andnps    %xmm9, %xmm1
> +        mulpd     %xmm3, %xmm0
> +        addpd     poly_coeff+48+__svml_dasinh_data_internal(%rip), %xmm0
> +        addpd     %xmm12, %xmm0
> +
> +/* reconstruction */
> +        mulpd     %xmm0, %xmm2
> +        andps     Bias+__svml_dasinh_data_internal(%rip), %xmm5
> +        psrlq     $40, %xmm11
> +        orps      Bias1+__svml_dasinh_data_internal(%rip), %xmm5
> +        orps      %xmm14, %xmm1
> +        movd      %xmm11, %eax
> +        pshufd    $2, %xmm11, %xmm11
> +
> +/* Finally, reincorporate the original sign. */
> +        movups    dSign+__svml_dasinh_data_internal(%rip), %xmm0
> +        subpd     %xmm5, %xmm1
> +        addpd     %xmm2, %xmm3
> +        movd      %xmm11, %ecx
> +        mulpd     L2+__svml_dasinh_data_internal(%rip), %xmm1
> +        movslq    %eax, %rax
> +        andps     %xmm13, %xmm0
> +        movslq    %ecx, %rcx
> +        movsd     (%rsi,%rax), %xmm6
> +        movhpd    (%rsi,%rcx), %xmm6
> +        addpd     %xmm3, %xmm6
> +        addpd     %xmm6, %xmm1
> +        pxor      %xmm1, %xmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx xmm0 xmm13
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm13, 32(%rsp)
> +        movups    %xmm0, 48(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $2, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 xmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      asinh@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 48(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVbN2v_asinh_sse4)
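
For completeness, a C model of the special-input path above (again a
reviewer sketch, not part of the patch): each lane whose bit is set in
the range mask is simply recomputed with the scalar asinh and patched
back into the saved result vector:

  #include <math.h>

  static void
  fixup_special_lanes (const double arg[2], double res[2], unsigned int mask)
  {
    for (int i = 0; i < 2; i++)
      if (mask & (1u << i))
        res[i] = asinh (arg[i]);
  }
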
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_dasinh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(16)) VUINT32 Log_HA_table[(1<<10)+2][2];
> +        __declspec(align(16)) VUINT32 Log_LA_table[(1<<9)+1][2];
> +        __declspec(align(16)) VUINT32 poly_coeff[4][2][2];
> +        __declspec(align(16)) VUINT32 ExpMask[2][2];
> +        __declspec(align(16)) VUINT32 Two10[2][2];
> +        __declspec(align(16)) VUINT32 MinLog1p[2][2];
> +        __declspec(align(16)) VUINT32 MaxLog1p[2][2];
> +        __declspec(align(16)) VUINT32 One[2][2];
> +        __declspec(align(16)) VUINT32 SgnMask[2][2];
> +        __declspec(align(16)) VUINT32 XThreshold[2][2];
> +        __declspec(align(16)) VUINT32 XhMask[2][2];
> +        __declspec(align(16)) VUINT32 Threshold[2][2];
> +        __declspec(align(16)) VUINT32 Bias[2][2];
> +        __declspec(align(16)) VUINT32 Bias1[2][2];
> +        __declspec(align(16)) VUINT32 ExpMask0[2][2];
> +        __declspec(align(16)) VUINT32 ExpMask2[2][2];
> +        __declspec(align(16)) VUINT32 L2[2][2];
> +        __declspec(align(16)) VUINT32 dBigThreshold[2][2];
> +        __declspec(align(16)) VUINT32 dC2[2][2];
> +        __declspec(align(16)) VUINT32 dC3[2][2];
> +        __declspec(align(16)) VUINT32 dC4[2][2];
> +        __declspec(align(16)) VUINT32 dC5[2][2];
> +        __declspec(align(16)) VUINT32 dHalf[2][2];
> +        __declspec(align(16)) VUINT32 dLargestFinite[2][2];
> +        __declspec(align(16)) VUINT32 dLittleThreshold[2][2];
> +        __declspec(align(16)) VUINT32 dSign[2][2];
> +        __declspec(align(16)) VUINT32 dThirtyOne[2][2];
> +        __declspec(align(16)) VUINT32 dTopMask12[2][2];
> +        __declspec(align(16)) VUINT32 dTopMask26[2][2];
> +        __declspec(align(16)) VUINT32 dTopMask29[2][2];
> +        __declspec(align(16)) VUINT32 XScale[2][2];
> +} __svml_dasinh_data_internal;
> +#endif
> +__svml_dasinh_data_internal:
> +        /* Log_HA_table */
> +        .quad 0xc086232bdd7a8300, 0xbe1ce91eef3fb100
> +        .quad 0xc086232fdc7ad828, 0xbe1cefcffda73b6a
> +        .quad 0xc0862333d97d2ba0, 0xbe1cef406748f1ff
> +        .quad 0xc0862337d48378e0, 0xbe1cef2a9429925a
> +        .quad 0xc086233bcd8fb878, 0xbe1cf138d17ebecb
> +        .quad 0xc086233fc4a3e018, 0xbe1ceff2dbbbb29e
> +        .quad 0xc0862343b9c1e270, 0xbe1cf1a42aae437b
> +        .quad 0xc0862347acebaf68, 0xbe1cef3b152048af
> +        .quad 0xc086234b9e2333f0, 0xbe1cef20e127805e
> +        .quad 0xc086234f8d6a5a30, 0xbe1cf00ad6052cf4
> +        .quad 0xc08623537ac30980, 0xbe1cefc4642ee597
> +        .quad 0xc0862357662f2660, 0xbe1cf1f277d36e16
> +        .quad 0xc086235b4fb092a0, 0xbe1ceed009e8d8e6
> +        .quad 0xc086235f37492d28, 0xbe1cf1e4038cb362
> +        .quad 0xc08623631cfad250, 0xbe1cf0b0873b8557
> +        .quad 0xc086236700c75b98, 0xbe1cf15bb3227c0b
> +        .quad 0xc086236ae2b09fe0, 0xbe1cf151ef8ca9ed
> +        .quad 0xc086236ec2b87358, 0xbe1cefe1dc2cd2ed
> +        .quad 0xc0862372a0e0a780, 0xbe1cf0d1eec5454f
> +        .quad 0xc08623767d2b0b48, 0xbe1ceeefd570bbce
> +        .quad 0xc086237a57996af0, 0xbe1cee99ae91b3a7
> +        .quad 0xc086237e302d9028, 0xbe1cf0412830fbd1
> +        .quad 0xc086238206e94218, 0xbe1ceee898588610
> +        .quad 0xc0862385dbce4548, 0xbe1cee9a1fbcaaea
> +        .quad 0xc0862389aede5bc0, 0xbe1ceed8e7cc1ad6
> +        .quad 0xc086238d801b4500, 0xbe1cf10c8d059da6
> +        .quad 0xc08623914f86be18, 0xbe1ceee6c63a8165
> +        .quad 0xc08623951d228180, 0xbe1cf0c3592d2ff1
> +        .quad 0xc0862398e8f04758, 0xbe1cf0026cc4cb1b
> +        .quad 0xc086239cb2f1c538, 0xbe1cf15d48d8e670
> +        .quad 0xc08623a07b28ae60, 0xbe1cef359363787c
> +        .quad 0xc08623a44196b390, 0xbe1cefdf1ab2e82c
> +        .quad 0xc08623a8063d8338, 0xbe1cefe43c02aa84
> +        .quad 0xc08623abc91ec960, 0xbe1cf044f5ae35b7
> +        .quad 0xc08623af8a3c2fb8, 0xbe1cf0b0b4001e1b
> +        .quad 0xc08623b349975d98, 0xbe1cf1bae76dfbcf
> +        .quad 0xc08623b70731f810, 0xbe1cef0a72e13a62
> +        .quad 0xc08623bac30da1c8, 0xbe1cf184007d2b6b
> +        .quad 0xc08623be7d2bfb40, 0xbe1cf16f4b239e98
> +        .quad 0xc08623c2358ea2a0, 0xbe1cf0976acada87
> +        .quad 0xc08623c5ec3733d0, 0xbe1cf066318a16ff
> +        .quad 0xc08623c9a1274880, 0xbe1ceffaa7148798
> +        .quad 0xc08623cd54607820, 0xbe1cf23ab02e9b6e
> +        .quad 0xc08623d105e45800, 0xbe1cefdfef7d4fde
> +        .quad 0xc08623d4b5b47b20, 0xbe1cf17fece44f2b
> +        .quad 0xc08623d863d27270, 0xbe1cf18f907d0d7c
> +        .quad 0xc08623dc103fccb0, 0xbe1cee61fe072c98
> +        .quad 0xc08623dfbafe1668, 0xbe1cf022dd891e2f
> +        .quad 0xc08623e3640eda20, 0xbe1ceecc1daf4358
> +        .quad 0xc08623e70b73a028, 0xbe1cf0173c4fa380
> +        .quad 0xc08623eab12deec8, 0xbe1cf16a2150c2f4
> +        .quad 0xc08623ee553f4a30, 0xbe1cf1bf980b1f4b
> +        .quad 0xc08623f1f7a93480, 0xbe1cef8b731663c2
> +        .quad 0xc08623f5986d2dc0, 0xbe1cee9a664d7ef4
> +        .quad 0xc08623f9378cb3f0, 0xbe1cf1eda2af6400
> +        .quad 0xc08623fcd5094320, 0xbe1cf1923f9d68d7
> +        .quad 0xc086240070e45548, 0xbe1cf0747cd3e03a
> +        .quad 0xc08624040b1f6260, 0xbe1cf22ee855bd6d
> +        .quad 0xc0862407a3bbe078, 0xbe1cf0d57360c00b
> +        .quad 0xc086240b3abb4398, 0xbe1ceebc815cd575
> +        .quad 0xc086240ed01efdd0, 0xbe1cf03bfb970951
> +        .quad 0xc086241263e87f50, 0xbe1cf16e74768529
> +        .quad 0xc0862415f6193658, 0xbe1cefec64b8becb
> +        .quad 0xc086241986b28f30, 0xbe1cf0838d210baa
> +        .quad 0xc086241d15b5f448, 0xbe1cf0ea86e75b11
> +        .quad 0xc0862420a324ce28, 0xbe1cf1708d11d805
> +        .quad 0xc08624242f008380, 0xbe1ceea988c5a417
> +        .quad 0xc0862427b94a7910, 0xbe1cef166a7bbca5
> +        .quad 0xc086242b420411d0, 0xbe1cf0c9d9e86a38
> +        .quad 0xc086242ec92eaee8, 0xbe1cef0946455411
> +        .quad 0xc08624324ecbaf98, 0xbe1cefea60907739
> +        .quad 0xc0862435d2dc7160, 0xbe1cf1ed0934ce42
> +        .quad 0xc086243955624ff8, 0xbe1cf191ba746c7d
> +        .quad 0xc086243cd65ea548, 0xbe1ceeec78cf2a7e
> +        .quad 0xc086244055d2c968, 0xbe1cef345284c119
> +        .quad 0xc0862443d3c012b8, 0xbe1cf24f77355219
> +        .quad 0xc08624475027d5e8, 0xbe1cf05bf087e114
> +        .quad 0xc086244acb0b65d0, 0xbe1cef3504a32189
> +        .quad 0xc086244e446c1398, 0xbe1ceff54b2a406f
> +        .quad 0xc0862451bc4b2eb8, 0xbe1cf0757d54ed4f
> +        .quad 0xc086245532aa04f0, 0xbe1cf0c8099fdfd5
> +        .quad 0xc0862458a789e250, 0xbe1cf0b173796a31
> +        .quad 0xc086245c1aec1138, 0xbe1cf11d8734540d
> +        .quad 0xc086245f8cd1da60, 0xbe1cf1916a723ceb
> +        .quad 0xc0862462fd3c84d8, 0xbe1cf19a911e1da7
> +        .quad 0xc08624666c2d5608, 0xbe1cf23a9ef72e4f
> +        .quad 0xc0862469d9a591c0, 0xbe1cef503d947663
> +        .quad 0xc086246d45a67a18, 0xbe1cf0fceeb1a0b2
> +        .quad 0xc0862470b0314fa8, 0xbe1cf107e27e4fbc
> +        .quad 0xc086247419475160, 0xbe1cf03dd9922331
> +        .quad 0xc086247780e9bc98, 0xbe1cefce1a10e129
> +        .quad 0xc086247ae719cd18, 0xbe1ceea47f73c4f6
> +        .quad 0xc086247e4bd8bd10, 0xbe1ceec0ac56d100
> +        .quad 0xc0862481af27c528, 0xbe1cee8a6593278a
> +        .quad 0xc086248511081c70, 0xbe1cf2231dd9dec7
> +        .quad 0xc0862488717af888, 0xbe1cf0b4b8ed7da8
> +        .quad 0xc086248bd0818d68, 0xbe1cf1bd8d835002
> +        .quad 0xc086248f2e1d0d98, 0xbe1cf259acc107f4
> +        .quad 0xc08624928a4eaa20, 0xbe1cee897636b00c
> +        .quad 0xc0862495e5179270, 0xbe1cee757f20c326
> +        .quad 0xc08624993e78f490, 0xbe1cefafd3aa54a4
> +        .quad 0xc086249c9673fd10, 0xbe1cee7298d38b97
> +        .quad 0xc086249fed09d6f8, 0xbe1ceedc158d4ceb
> +        .quad 0xc08624a3423babe0, 0xbe1cf2282987cb2e
> +        .quad 0xc08624a6960aa400, 0xbe1cefe7381ecc4b
> +        .quad 0xc08624a9e877e600, 0xbe1cef328dbbce80
> +        .quad 0xc08624ad39849728, 0xbe1cefde45f3cc71
> +        .quad 0xc08624b08931db58, 0xbe1cefa8b89433b9
> +        .quad 0xc08624b3d780d500, 0xbe1cef6773c0b139
> +        .quad 0xc08624b72472a528, 0xbe1cf031c931c11f
> +        .quad 0xc08624ba70086b78, 0xbe1cf088f49275e7
> +        .quad 0xc08624bdba434630, 0xbe1cf17de0eaa86d
> +        .quad 0xc08624c103245238, 0xbe1cefd492f1ba75
> +        .quad 0xc08624c44aacab08, 0xbe1cf1253e154466
> +        .quad 0xc08624c790dd6ad0, 0xbe1cf0fb09ee6d55
> +        .quad 0xc08624cad5b7aa58, 0xbe1cf1f08dd048fe
> +        .quad 0xc08624ce193c8120, 0xbe1ceeca0809697f
> +        .quad 0xc08624d15b6d0538, 0xbe1cef8d5662d968
> +        .quad 0xc08624d49c4a4b78, 0xbe1cee97b556ed78
> +        .quad 0xc08624d7dbd56750, 0xbe1cf1b14b6acb75
> +        .quad 0xc08624db1a0f6b00, 0xbe1cef1e860623f2
> +        .quad 0xc08624de56f96758, 0xbe1ceeaf4d156f3d
> +        .quad 0xc08624e192946bf0, 0xbe1ceecc12b400ed
> +        .quad 0xc08624e4cce18710, 0xbe1cf180c40c794f
> +        .quad 0xc08624e805e1c5c8, 0xbe1cf185a08f7f65
> +        .quad 0xc08624eb3d9633d8, 0xbe1cef45fc924078
> +        .quad 0xc08624ee73ffdbb0, 0xbe1cf1e4f457f32a
> +        .quad 0xc08624f1a91fc6a0, 0xbe1cf040147b8a5a
> +        .quad 0xc08624f4dcf6fc98, 0xbe1cf1effca0dfb2
> +        .quad 0xc08624f80f868468, 0xbe1cf0470146e5bc
> +        .quad 0xc08624fb40cf6390, 0xbe1cef4dd186e501
> +        .quad 0xc08624fe70d29e60, 0xbe1ceebe257f66c7
> +        .quad 0xc08625019f9137f0, 0xbe1ceefb7a1c395c
> +        .quad 0xc0862504cd0c3220, 0xbe1cf209dedfed8c
> +        .quad 0xc0862507f9448db0, 0xbe1cf082da464994
> +        .quad 0xc086250b243b4a18, 0xbe1cee88694a73cf
> +        .quad 0xc086250e4df165a0, 0xbe1cf0b61e8f0531
> +        .quad 0xc08625117667dd78, 0xbe1cf1106599c962
> +        .quad 0xc08625149d9fad98, 0xbe1ceff1ee88af1f
> +        .quad 0xc0862517c399d0c8, 0xbe1cf0f746994ef6
> +        .quad 0xc086251ae85740b8, 0xbe1cefe8a1d077e4
> +        .quad 0xc086251e0bd8f5e0, 0xbe1cf1a1da036092
> +        .quad 0xc08625212e1fe7a8, 0xbe1cf0f8a7786fcd
> +        .quad 0xc08625244f2d0c48, 0xbe1cefa1174a07a7
> +        .quad 0xc08625276f0158d8, 0xbe1cef1043aa5b25
> +        .quad 0xc086252a8d9dc150, 0xbe1cf15d521c169d
> +        .quad 0xc086252dab033898, 0xbe1cf220bba8861f
> +        .quad 0xc0862530c732b078, 0xbe1cef51e310eae2
> +        .quad 0xc0862533e22d1988, 0xbe1cf222fcedd8ae
> +        .quad 0xc0862536fbf36370, 0xbe1cefdb4da4bda8
> +        .quad 0xc086253a14867ca0, 0xbe1ceeafc1112171
> +        .quad 0xc086253d2be75280, 0xbe1cee99dfb4b408
> +        .quad 0xc08625404216d160, 0xbe1cf22d2536f06b
> +        .quad 0xc08625435715e498, 0xbe1cef6abbf2e268
> +        .quad 0xc08625466ae57648, 0xbe1cf093a14789f5
> +        .quad 0xc08625497d866fa0, 0xbe1cf0f93655603c
> +        .quad 0xc086254c8ef9b8b8, 0xbe1cf1cc40c9aafc
> +        .quad 0xc086254f9f4038a8, 0xbe1ceeea5f4e9157
> +        .quad 0xc0862552ae5ad568, 0xbe1cefa9f52d4997
> +        .quad 0xc0862555bc4a7400, 0xbe1cefa490a638ff
> +        .quad 0xc0862558c90ff868, 0xbe1cef7fcf797d6f
> +        .quad 0xc086255bd4ac4590, 0xbe1cf1b4c51113c9
> +        .quad 0xc086255edf203d78, 0xbe1cef55e5b4a55d
> +        .quad 0xc0862561e86cc100, 0xbe1cf0d37a25f9dc
> +        .quad 0xc0862564f092b028, 0xbe1ceebe9efc19d9
> +        .quad 0xc0862567f792e9d8, 0xbe1cee8ad30a57b5
> +        .quad 0xc086256afd6e4c08, 0xbe1cef4e1817b90b
> +        .quad 0xc086256e0225b3b8, 0xbe1cee7fa9229996
> +        .quad 0xc086257105b9fce0, 0xbe1cf0b54963d945
> +        .quad 0xc0862574082c0298, 0xbe1cee5f2f3c7995
> +        .quad 0xc0862577097c9ee0, 0xbe1cf0828e303a2c
> +        .quad 0xc086257a09acaae0, 0xbe1cf172c3078947
> +        .quad 0xc086257d08bcfec0, 0xbe1cf189252afa22
> +        .quad 0xc086258006ae71b8, 0xbe1cefdb80426923
> +        .quad 0xc08625830381da08, 0xbe1ceef1391a0372
> +        .quad 0xc0862585ff380d00, 0xbe1cf17720c78d13
> +        .quad 0xc0862588f9d1df18, 0xbe1ceef1f9027d83
> +        .quad 0xc086258bf35023b8, 0xbe1cf06fac99dec9
> +        .quad 0xc086258eebb3ad78, 0xbe1cf1373eeb45c0
> +        .quad 0xc0862591e2fd4e00, 0xbe1cef777536bb81
> +        .quad 0xc0862594d92dd600, 0xbe1cf0f43ca40766
> +        .quad 0xc0862597ce461558, 0xbe1cefb2cfc6766b
> +        .quad 0xc086259ac246daf0, 0xbe1ceea49e64ffa2
> +        .quad 0xc086259db530f4c8, 0xbe1cf250fa457dec
> +        .quad 0xc08625a0a7053018, 0xbe1cf17d8bb2a44e
> +        .quad 0xc08625a397c45918, 0xbe1cf1d5906d54b7
> +        .quad 0xc08625a6876f3b30, 0xbe1cf08fe7b31780
> +        .quad 0xc08625a97606a0e0, 0xbe1cef13edfc9d11
> +        .quad 0xc08625ac638b53c8, 0xbe1cef9d2b107219
> +        .quad 0xc08625af4ffe1cb0, 0xbe1cf1ddd4ff6160
> +        .quad 0xc08625b23b5fc390, 0xbe1cefa02a996495
> +        .quad 0xc08625b525b10f68, 0xbe1cf166a7e37ee5
> +        .quad 0xc08625b80ef2c680, 0xbe1cef0b171068a5
> +        .quad 0xc08625baf725ae28, 0xbe1cf05c80779283
> +        .quad 0xc08625bdde4a8af0, 0xbe1cf1bbfbffb889
> +        .quad 0xc08625c0c4622090, 0xbe1cf0b8666c0124
> +        .quad 0xc08625c3a96d31e0, 0xbe1cf0a8fcf47a86
> +        .quad 0xc08625c68d6c80f0, 0xbe1cef46e18cb092
> +        .quad 0xc08625c97060cef0, 0xbe1cf1458a350efb
> +        .quad 0xc08625cc524adc58, 0xbe1ceeea1dadce12
> +        .quad 0xc08625cf332b68b0, 0xbe1cf0a1bfdc44c7
> +        .quad 0xc08625d2130332d0, 0xbe1cef96d02da73e
> +        .quad 0xc08625d4f1d2f8a8, 0xbe1cf2451c3c7701
> +        .quad 0xc08625d7cf9b7778, 0xbe1cf10d08f83812
> +        .quad 0xc08625daac5d6ba0, 0xbe1ceec5b4895c5e
> +        .quad 0xc08625dd881990b0, 0xbe1cf14e1325c5e4
> +        .quad 0xc08625e062d0a188, 0xbe1cf21d0904be12
> +        .quad 0xc08625e33c835838, 0xbe1ceed0839bcf21
> +        .quad 0xc08625e615326df0, 0xbe1cf1bb944889d2
> +        .quad 0xc08625e8ecde9b48, 0xbe1cee738e85eece
> +        .quad 0xc08625ebc38897e0, 0xbe1cf25c2bc6ef12
> +        .quad 0xc08625ee99311ac8, 0xbe1cf132b70a41ad
> +        .quad 0xc08625f16dd8da28, 0xbe1cf1984236a6e3
> +        .quad 0xc08625f441808b78, 0xbe1cf19ae74998f9
> +        .quad 0xc08625f71428e370, 0xbe1cef3e175d61a1
> +        .quad 0xc08625f9e5d295f8, 0xbe1cf101f9868fd9
> +        .quad 0xc08625fcb67e5658, 0xbe1cee69db83dcd2
> +        .quad 0xc08625ff862cd6f8, 0xbe1cf081b636af51
> +        .quad 0xc086260254dec9a8, 0xbe1cee62c7d59b3e
> +        .quad 0xc08626052294df58, 0xbe1cf1b745c57716
> +        .quad 0xc0862607ef4fc868, 0xbe1cef3d2800ea23
> +        .quad 0xc086260abb103458, 0xbe1cef480ff1acd2
> +        .quad 0xc086260d85d6d200, 0xbe1cf2424c9a17ef
> +        .quad 0xc08626104fa44f90, 0xbe1cf12cfde90fd5
> +        .quad 0xc086261318795a68, 0xbe1cf21f590dd5b6
> +        .quad 0xc0862615e0569f48, 0xbe1cf0c50f9cd28a
> +        .quad 0xc0862618a73cca30, 0xbe1ceedbdb520545
> +        .quad 0xc086261b6d2c8668, 0xbe1cf0b030396011
> +        .quad 0xc086261e32267e98, 0xbe1cf19917010e96
> +        .quad 0xc0862620f62b5cb0, 0xbe1cf07331355985
> +        .quad 0xc0862623b93bc9e8, 0xbe1cf01ae921a1c3
> +        .quad 0xc08626267b586ed0, 0xbe1cefe5cf0dbf0c
> +        .quad 0xc08626293c81f348, 0xbe1cf01b258aeb50
> +        .quad 0xc086262bfcb8fe88, 0xbe1cee6b9e7f4c68
> +        .quad 0xc086262ebbfe3710, 0xbe1cee684a9b21c9
> +        .quad 0xc08626317a5242b8, 0xbe1cf1f8bcde9a8b
> +        .quad 0xc086263437b5c6c0, 0xbe1cf1d063d36238
> +        .quad 0xc0862636f42967a8, 0xbe1cf1e31a19075e
> +        .quad 0xc0862639afadc950, 0xbe1cf1d8efdf7e7d
> +        .quad 0xc086263c6a438ef0, 0xbe1cf1812ee72dba
> +        .quad 0xc086263f23eb5b18, 0xbe1cf1449a9a2279
> +        .quad 0xc0862641dca5cfb8, 0xbe1cee96edce5085
> +        .quad 0xc086264494738e08, 0xbe1cf06797bd03b2
> +        .quad 0xc08626474b5536b8, 0xbe1cef91b9b7ffc1
> +        .quad 0xc086264a014b69c0, 0xbe1cef4b6721278f
> +        .quad 0xc086264cb656c678, 0xbe1cf1942925eb4a
> +        .quad 0xc086264f6a77eba8, 0xbe1cefa2c7bc2e39
> +        .quad 0xc08626521daf7758, 0xbe1cf252595aceb3
> +        .quad 0xc0862654cffe0718, 0xbe1cee8e9ae47ec2
> +        .quad 0xc0862657816437a8, 0xbe1cf1bf913828fa
> +        .quad 0xc086265a31e2a558, 0xbe1cf23475d6b366
> +        .quad 0xc086265ce179ebc8, 0xbe1cef8df00a922b
> +        .quad 0xc086265f902aa5f0, 0xbe1cef279bfa43e0
> +        .quad 0xc08626623df56e38, 0xbe1cf080e10b8365
> +        .quad 0xc0862664eadade70, 0xbe1cf1a518f9b544
> +        .quad 0xc086266796db8fd0, 0xbe1cef9308fed9e9
> +        .quad 0xc086266a41f81ae8, 0xbe1ceea3ae6b19c9
> +        .quad 0xc086266cec3117b8, 0xbe1ceef06003d4c2
> +        .quad 0xc086266f95871da8, 0xbe1cf0b8457ffb0c
> +        .quad 0xc08626723dfac390, 0xbe1cf0c526745ad6
> +        .quad 0xc0862674e58c9fa8, 0xbe1cf0cf91ff7b5d
> +        .quad 0xc08626778c3d4798, 0xbe1cefe260819380
> +        .quad 0xc086267a320d5070, 0xbe1ceebd90aa27a3
> +        .quad 0xc086267cd6fd4ea8, 0xbe1cf0388121dffa
> +        .quad 0xc086267f7b0dd630, 0xbe1cf1a3881435f1
> +        .quad 0xc08626821e3f7a68, 0xbe1cef28e9d9ac52
> +        .quad 0xc0862684c092ce08, 0xbe1cf02d300062dd
> +        .quad 0xc086268762086350, 0xbe1cefaee1edfa35
> +        .quad 0xc086268a02a0cbe0, 0xbe1cf0a5a052e936
> +        .quad 0xc086268ca25c98d8, 0xbe1cee60a4a497ed
> +        .quad 0xc086268f413c5ab0, 0xbe1cf0e4a5d0cf49
> +        .quad 0xc0862691df40a170, 0xbe1cf149235a4e6e
> +        .quad 0xc08626947c69fc80, 0xbe1cf215180b9fcc
> +        .quad 0xc086269718b8fac8, 0xbe1cef9b156a9840
> +        .quad 0xc0862699b42e2a90, 0xbe1cf054c91441be
> +        .quad 0xc086269c4eca19a8, 0xbe1cf13ded26512c
> +        .quad 0xc086269ee88d5550, 0xbe1cf22ea4d8ac06
> +        .quad 0xc08626a181786a40, 0xbe1cf2354666ee2e
> +        .quad 0xc08626a4198be4a8, 0xbe1cefef936752b3
> +        .quad 0xc08626a6b0c85020, 0xbe1cf1e360a9db68
> +        .quad 0xc08626a9472e37d8, 0xbe1ceed6aeb812c5
> +        .quad 0xc08626abdcbe2650, 0xbe1cf227340b4986
> +        .quad 0xc08626ae7178a5b0, 0xbe1cf0215a0cbe0d
> +        .quad 0xc08626b1055e3f70, 0xbe1cf256adf0ae26
> +        .quad 0xc08626b3986f7ca8, 0xbe1ceff3c67aed06
> +        .quad 0xc08626b62aace5c8, 0xbe1cf2159fb93652
> +        .quad 0xc08626b8bc1702e0, 0xbe1cf01e6dbd1c7f
> +        .quad 0xc08626bb4cae5b60, 0xbe1cf009e75d1c0c
> +        .quad 0xc08626bddc737648, 0xbe1ceec10a020e73
> +        .quad 0xc08626c06b66da08, 0xbe1cf06d5783eee7
> +        .quad 0xc08626c2f9890ca0, 0xbe1cf0cb8f169ffe
> +        .quad 0xc08626c586da9388, 0xbe1cef7de2452430
> +        .quad 0xc08626c8135bf3b0, 0xbe1cf05da6f783ae
> +        .quad 0xc08626ca9f0db198, 0xbe1cefcc877d681d
> +        .quad 0xc08626cd29f05138, 0xbe1cef0531954ab3
> +        .quad 0xc08626cfb4045608, 0xbe1cf06b8565ea3d
> +        .quad 0xc08626d23d4a4310, 0xbe1cefdc455d9d7e
> +        .quad 0xc08626d4c5c29ad0, 0xbe1ceefc47e8fa64
> +        .quad 0xc08626d74d6ddf48, 0xbe1cf1872bf033f2
> +        .quad 0xc08626d9d44c9210, 0xbe1cf19d91087f9d
> +        .quad 0xc08626dc5a5f3438, 0xbe1cf012d444c6ab
> +        .quad 0xc08626dedfa64650, 0xbe1cf0ba528ee153
> +        .quad 0xc08626e164224880, 0xbe1ceeb431709788
> +        .quad 0xc08626e3e7d3ba60, 0xbe1cf0b9af31a6a5
> +        .quad 0xc08626e66abb1b28, 0xbe1cf168fb2e135b
> +        .quad 0xc08626e8ecd8e990, 0xbe1cef9097461c93
> +        .quad 0xc08626eb6e2da3d0, 0xbe1cee7a434735d8
> +        .quad 0xc08626edeeb9c7a8, 0xbe1cf235732b86f2
> +        .quad 0xc08626f06e7dd280, 0xbe1cefe1510b89e6
> +        .quad 0xc08626f2ed7a4120, 0xbe1cf1f64b9b80ef
> +        .quad 0xc08626f56baf9000, 0xbe1cf08f320ca339
> +        .quad 0xc08626f7e91e3b08, 0xbe1cf1b1de2808a1
> +        .quad 0xc08626fa65c6bdc0, 0xbe1cf1976d778b28
> +        .quad 0xc08626fce1a99338, 0xbe1ceef40a4f076f
> +        .quad 0xc08626ff5cc73600, 0xbe1cef3e45869ce3
> +        .quad 0xc0862701d7202048, 0xbe1ceef601b4c9d6
> +        .quad 0xc086270450b4cbc0, 0xbe1cf1eaf0b57fd6
> +        .quad 0xc0862706c985b1c0, 0xbe1cef82a44990f3
> +        .quad 0xc086270941934b10, 0xbe1ceefe32981f2c
> +        .quad 0xc086270bb8de1018, 0xbe1cefbf6f5a0445
> +        .quad 0xc086270e2f6678d0, 0xbe1cf18dba75792c
> +        .quad 0xc0862710a52cfcc8, 0xbe1cf0da64ce995f
> +        .quad 0xc08627131a321318, 0xbe1cef04ac0fb802
> +        .quad 0xc08627158e763268, 0xbe1cee9d4e2ad9bd
> +        .quad 0xc086271801f9d0f8, 0xbe1cefa9b55407b5
> +        .quad 0xc086271a74bd64a0, 0xbe1cefe6bd329570
> +        .quad 0xc086271ce6c162c8, 0xbe1cef0b1205dc85
> +        .quad 0xc086271f58064068, 0xbe1cef092a785e3f
> +        .quad 0xc0862721c88c7210, 0xbe1cf050dcdaac30
> +        .quad 0xc086272438546be8, 0xbe1cf210907ded8b
> +        .quad 0xc0862726a75ea1b8, 0xbe1cee760be44f99
> +        .quad 0xc086272915ab86c0, 0xbe1ceeeee07c2bcc
> +        .quad 0xc086272b833b8df0, 0xbe1cf06874992df5
> +        .quad 0xc086272df00f29d0, 0xbe1cef8fac5d4899
> +        .quad 0xc08627305c26cc70, 0xbe1cf1103241cc99
> +        .quad 0xc0862732c782e788, 0xbe1cf1d35fef83fe
> +        .quad 0xc08627353223ec68, 0xbe1cef3ec8133e1d
> +        .quad 0xc08627379c0a4be8, 0xbe1cef7261daccd8
> +        .quad 0xc086273a05367688, 0xbe1cf18656c50806
> +        .quad 0xc086273c6da8dc68, 0xbe1cf1c8736e049a
> +        .quad 0xc086273ed561ed38, 0xbe1cf1f93bff4911
> +        .quad 0xc08627413c621848, 0xbe1cf188a4ea680c
> +        .quad 0xc0862743a2a9cc80, 0xbe1cf1d270930c80
> +        .quad 0xc086274608397868, 0xbe1cf25a328c28e2
> +        .quad 0xc08627486d118a28, 0xbe1cf106f90aa3b8
> +        .quad 0xc086274ad1326f80, 0xbe1cee5e9d2e885a
> +        .quad 0xc086274d349c95c0, 0xbe1cf1c0bac27228
> +        .quad 0xc086274f975069f8, 0xbe1cf1a1500f9b1c
> +        .quad 0xc0862751f94e58c0, 0xbe1cefc30663ac44
> +        .quad 0xc08627545a96ce48, 0xbe1cf17123e427a2
> +        .quad 0xc0862756bb2a3678, 0xbe1cefb92749fea4
> +        .quad 0xc08627591b08fcc0, 0xbe1cefa40e1ea74a
> +        .quad 0xc086275b7a338c40, 0xbe1cee6f4612c3e9
> +        .quad 0xc086275dd8aa4fa8, 0xbe1cf1c54a053627
> +        .quad 0xc0862760366db168, 0xbe1ceff5eb503d9e
> +        .quad 0xc0862762937e1b70, 0xbe1cf02e47f10cee
> +        .quad 0xc0862764efdbf768, 0xbe1ceeb06e1d0dad
> +        .quad 0xc08627674b87ae88, 0xbe1cf10aadd6dba5
> +        .quad 0xc0862769a681a9c0, 0xbe1cf24e9913d30f
> +        .quad 0xc086276c00ca51a0, 0xbe1cef47b301e312
> +        .quad 0xc086276e5a620e48, 0xbe1ceeb1cefc2e85
> +        .quad 0xc0862770b3494788, 0xbe1cf16f1fbbe011
> +        .quad 0xc08627730b8064e8, 0xbe1ceebdf75174c7
> +        .quad 0xc08627756307cd70, 0xbe1cf06e3871a0da
> +        .quad 0xc0862777b9dfe7f0, 0xbe1cef16799fd554
> +        .quad 0xc086277a10091ac0, 0xbe1cf248dabf5377
> +        .quad 0xc086277c6583cc00, 0xbe1cf0c78d92a2cd
> +        .quad 0xc086277eba506158, 0xbe1cf0b911b029f0
> +        .quad 0xc08627810e6f4028, 0xbe1cefdc24719766
> +        .quad 0xc086278361e0cd70, 0xbe1cefbb6562b7e7
> +        .quad 0xc0862785b4a56dd8, 0xbe1cf1e0afb349ec
> +        .quad 0xc086278806bd85c0, 0xbe1cf008292e52fc
> +        .quad 0xc086278a58297918, 0xbe1cf053073872bf
> +        .quad 0xc086278ca8e9ab88, 0xbe1cf17a0a55a947
> +        .quad 0xc086278ef8fe8068, 0xbe1ceeffb0b60234
> +        .quad 0xc086279148685aa0, 0xbe1cf162204794a8
> +        .quad 0xc086279397279ce0, 0xbe1cf24cc8cb48ac
> +        .quad 0xc0862795e53ca978, 0xbe1cf0c9be68d5c3
> +        .quad 0xc086279832a7e258, 0xbe1cf172cd3d7388
> +        .quad 0xc086279a7f69a930, 0xbe1ceea2465fbce5
> +        .quad 0xc086279ccb825f40, 0xbe1cf0a386d2500f
> +        .quad 0xc086279f16f26590, 0xbe1cf1e338ddc18a
> +        .quad 0xc08627a161ba1cd0, 0xbe1cef1f5049867f
> +        .quad 0xc08627a3abd9e548, 0xbe1cef96c1ea8b1f
> +        .quad 0xc08627a5f5521f00, 0xbe1cf138f6fd3c26
> +        .quad 0xc08627a83e2329b0, 0xbe1cf0d4fcbfdf3a
> +        .quad 0xc08627aa864d64b0, 0xbe1cf24870c12c81
> +        .quad 0xc08627accdd12f18, 0xbe1cf0ae2a56348d
> +        .quad 0xc08627af14aee7a0, 0xbe1cee8ca1a9b893
> +        .quad 0xc08627b15ae6eca8, 0xbe1cf20414d637b0
> +        .quad 0xc08627b3a0799c60, 0xbe1cf0fc6b7b12d8
> +        .quad 0xc08627b5e5675488, 0xbe1cf152d93c4a00
> +        .quad 0xc08627b829b072a0, 0xbe1cf1073f9b77c2
> +        .quad 0xc08627ba6d5553d8, 0xbe1cee694f97d5a4
> +        .quad 0xc08627bcb0565500, 0xbe1cf0456b8239d7
> +        .quad 0xc08627bef2b3d2b0, 0xbe1cf211497127e3
> +        .quad 0xc08627c1346e2930, 0xbe1cf01856c0384d
> +        .quad 0xc08627c37585b468, 0xbe1cefa7dd05479e
> +        .quad 0xc08627c5b5fad000, 0xbe1cef3ae8e50b93
> +        .quad 0xc08627c7f5cdd750, 0xbe1ceea5f32fdd3a
> +        .quad 0xc08627ca34ff2560, 0xbe1cef424caeb8d9
> +        .quad 0xc08627cc738f14f0, 0xbe1cf0194d07a81f
> +        .quad 0xc08627ceb17e0070, 0xbe1cf20f452000c1
> +        .quad 0xc08627d0eecc4210, 0xbe1cf00e356218e4
> +        .quad 0xc08627d32b7a33a0, 0xbe1cef30484b4bcb
> +        .quad 0xc08627d567882eb0, 0xbe1ceeea11a6641b
> +        .quad 0xc08627d7a2f68c80, 0xbe1cf13492d5bd7b
> +        .quad 0xc08627d9ddc5a618, 0xbe1ceeb7048fad96
> +        .quad 0xc08627dc17f5d418, 0xbe1ceef0666f0477
> +        .quad 0xc08627de51876ee8, 0xbe1cf060d4b8b5c2
> +        .quad 0xc08627e08a7acea8, 0xbe1cf0b2a4b6ff8c
> +        .quad 0xc08627e2c2d04b28, 0xbe1cf0e34809a875
> +        .quad 0xc08627e4fa883bf0, 0xbe1cf16bf74a3522
> +        .quad 0xc08627e731a2f848, 0xbe1cee6a24623d57
> +        .quad 0xc08627e96820d718, 0xbe1cefc7b4f1528e
> +        .quad 0xc08627eb9e022f18, 0xbe1cf163051f3548
> +        .quad 0xc08627edd34756b8, 0xbe1cef36b3366305
> +        .quad 0xc08627f007f0a408, 0xbe1cf18134625550
> +        .quad 0xc08627f23bfe6cf0, 0xbe1cf0ec32ec1a11
> +        .quad 0xc08627f46f710700, 0xbe1ceeb3b64f3edc
> +        .quad 0xc08627f6a248c778, 0xbe1cf0cd15805bc8
> +        .quad 0xc08627f8d4860368, 0xbe1cf20db3bddebe
> +        .quad 0xc08627fb06290f90, 0xbe1cf25188430e25
> +        .quad 0xc08627fd37324070, 0xbe1ceea1713490f9
> +        .quad 0xc08627ff67a1ea28, 0xbe1cf159521d234c
> +        .quad 0xc0862801977860b8, 0xbe1cf24dfe50783b
> +        .quad 0xc0862803c6b5f7d0, 0xbe1ceef2ef89a60b
> +        .quad 0xc0862805f55b02c8, 0xbe1cee7fc919d62c
> +        .quad 0xc08628082367d4c0, 0xbe1cf215a7fb513a
> +        .quad 0xc086280a50dcc0a8, 0xbe1cf0e4401c5ed4
> +        .quad 0xc086280c7dba1910, 0xbe1cf04ec734d256
> +        .quad 0xc086280eaa003050, 0xbe1cf010ad787fea
> +        .quad 0xc0862810d5af5880, 0xbe1cee622478393d
> +        .quad 0xc086281300c7e368, 0xbe1cf01c7482564f
> +        .quad 0xc08628152b4a22a0, 0xbe1cf0de20d33536
> +        .quad 0xc086281755366778, 0xbe1cef2edae5837d
> +        .quad 0xc08628197e8d02f0, 0xbe1cf0a345318cc9
> +        .quad 0xc086281ba74e45d8, 0xbe1cf20085aa34b8
> +        .quad 0xc086281dcf7a80c0, 0xbe1cef5fa845ad83
> +        .quad 0xc086281ff71203e0, 0xbe1cf050d1df69c4
> +        .quad 0xc08628221e151f48, 0xbe1ceffe43c035b9
> +        .quad 0xc0862824448422b8, 0xbe1cf14f3018d3c2
> +        .quad 0xc08628266a5f5dc0, 0xbe1cef0a5fbae83d
> +        .quad 0xc08628288fa71f98, 0xbe1ceff8a95b72a1
> +        .quad 0xc086282ab45bb750, 0xbe1cef073aa9849b
> +        .quad 0xc086282cd87d73a8, 0xbe1cef69b3835c02
> +        .quad 0xc086282efc0ca328, 0xbe1cf0bc139379a9
> +        .quad 0xc08628311f099420, 0xbe1cef247a9ec596
> +        .quad 0xc086283341749490, 0xbe1cef74bbcc488a
> +        .quad 0xc0862835634df248, 0xbe1cef4bc42e7b8e
> +        .quad 0xc08628378495fad0, 0xbe1cf136d4d5a810
> +        .quad 0xc0862839a54cfb80, 0xbe1cf0d290b24dd8
> +        .quad 0xc086283bc5734168, 0xbe1ceeebde8e0065
> +        .quad 0xc086283de5091950, 0xbe1cf1a09f60aa1e
> +        .quad 0xc0862840040ecfe0, 0xbe1cf0803947a234
> +        .quad 0xc08628422284b168, 0xbe1cf0abf7638127
> +        .quad 0xc0862844406b0a08, 0xbe1cf0f73ee12058
> +        .quad 0xc08628465dc225a0, 0xbe1cf2079971b26c
> +        .quad 0xc08628487a8a4fe0, 0xbe1cee74957564b1
> +        .quad 0xc086284a96c3d420, 0xbe1ceee77c1b7d43
> +        .quad 0xc086284cb26efd90, 0xbe1cf23addba6e09
> +        .quad 0xc086284ecd8c1730, 0xbe1cf199f4a1da60
> +        .quad 0xc0862850e81b6bb0, 0xbe1cf09fdea81393
> +        .quad 0xc0862853021d4588, 0xbe1cf176adb417f7
> +        .quad 0xc08628551b91ef00, 0xbe1cf0f64f84a8da
> +        .quad 0xc08628573479b220, 0xbe1ceec34cf49523
> +        .quad 0xc08628594cd4d8a8, 0xbe1cf16d60fbe0bb
> +        .quad 0xc086285b64a3ac40, 0xbe1cee8de7acfc7b
> +        .quad 0xc086285d7be67630, 0xbe1ceee6256cce8d
> +        .quad 0xc086285f929d7fa0, 0xbe1cee7d66a3d8a5
> +        .quad 0xc0862861a8c91170, 0xbe1cf0bef8265792
> +        .quad 0xc0862863be697458, 0xbe1cf097f890c6f8
> +        .quad 0xc0862865d37ef0c8, 0xbe1cf09502d5c3fc
> +        .quad 0xc0862867e809cf00, 0xbe1ceeffb239dac7
> +        .quad 0xc0862869fc0a56f8, 0xbe1cf1fbfff95c98
> +        .quad 0xc086286c0f80d090, 0xbe1cefa57ad3eef7
> +        .quad 0xc086286e226d8348, 0xbe1cf22c58b9183d
> +        .quad 0xc086287034d0b690, 0xbe1ceff262d0a248
> +        .quad 0xc086287246aab180, 0xbe1cefa7bc194186
> +        .quad 0xc086287457fbbb08, 0xbe1cf06782d784d9
> +        .quad 0xc086287668c419e0, 0xbe1cf1d44d0eaa07
> +        .quad 0xc086287879041490, 0xbe1cf034803c8a48
> +        .quad 0xc086287a88bbf158, 0xbe1cf08e84916b6f
> +        .quad 0xc086287c97ebf650, 0xbe1cf0c4d3dc1bc7
> +        .quad 0xc086287ea6946958, 0xbe1cefb1e4625943
> +        .quad 0xc0862880b4b59010, 0xbe1cf143efdd1fd0
> +        .quad 0xc0862882c24faff8, 0xbe1cee9896d016da
> +        .quad 0xc0862884cf630e38, 0xbe1cf2186072f2cc
> +        .quad 0xc0862886dbefeff0, 0xbe1cef9217633d34
> +        .quad 0xc0862888e7f699e0, 0xbe1cf05603549486
> +        .quad 0xc086288af37750b0, 0xbe1cef50fff513d3
> +        .quad 0xc086288cfe7258c0, 0xbe1cf127713b32d0
> +        .quad 0xc086288f08e7f650, 0xbe1cf05015520f3d
> +        .quad 0xc086289112d86d58, 0xbe1cf12eb458b26f
> +        .quad 0xc08628931c4401a8, 0xbe1cf22eae2887ed
> +        .quad 0xc0862895252af6e0, 0xbe1cefdd6656dd2d
> +        .quad 0xc08628972d8d9058, 0xbe1cf1048ea4e646
> +        .quad 0xc0862899356c1150, 0xbe1ceec4501167e9
> +        .quad 0xc086289b3cc6bcb8, 0xbe1cf0ad52becc3f
> +        .quad 0xc086289d439dd568, 0xbe1cf0daa4e00e35
> +        .quad 0xc086289f49f19df8, 0xbe1cf00b80de8d6a
> +        .quad 0xc08628a14fc258c8, 0xbe1cf1bcf2ea8464
> +        .quad 0xc08628a355104818, 0xbe1cf0435e2782b0
> +        .quad 0xc08628a559dbade0, 0xbe1cf0e3e1a5f56c
> +        .quad 0xc08628a75e24cbf8, 0xbe1cefed9d5a721d
> +        .quad 0xc08628a961ebe3f8, 0xbe1cf0d2d74321e2
> +        .quad 0xc08628ab65313750, 0xbe1cf24200eb55e9
> +        .quad 0xc08628ad67f50740, 0xbe1cf23e9d7cf979
> +        .quad 0xc08628af6a3794d0, 0xbe1cf23a088f421c
> +        .quad 0xc08628b16bf920e0, 0xbe1cef2c1de1ab32
> +        .quad 0xc08628b36d39ec08, 0xbe1cf1abc231f7b2
> +        .quad 0xc08628b56dfa36d0, 0xbe1cf2074d5ba303
> +        .quad 0xc08628b76e3a4180, 0xbe1cf05cd5eed880
> +        /* Log_LA_table */
> +        .align 16
> +        .quad 0x8000000000000000
> +        .quad 0xbf5ff802a9ab10e6
> +        .quad 0xbf6ff00aa2b10bc0
> +        .quad 0xbf77ee11ebd82e94
> +        .quad 0xbf7fe02a6b106789
> +        .quad 0xbf83e7295d25a7d9
> +        .quad 0xbf87dc475f810a77
> +        .quad 0xbf8bcf712c74384c
> +        .quad 0xbf8fc0a8b0fc03e4
> +        .quad 0xbf91d7f7eb9eebe7
> +        .quad 0xbf93cea44346a575
> +        .quad 0xbf95c45a51b8d389
> +        .quad 0xbf97b91b07d5b11b
> +        .quad 0xbf99ace7551cc514
> +        .quad 0xbf9b9fc027af9198
> +        .quad 0xbf9d91a66c543cc4
> +        .quad 0xbf9f829b0e783300
> +        .quad 0xbfa0b94f7c196176
> +        .quad 0xbfa1b0d98923d980
> +        .quad 0xbfa2a7ec2214e873
> +        .quad 0xbfa39e87b9febd60
> +        .quad 0xbfa494acc34d911c
> +        .quad 0xbfa58a5bafc8e4d5
> +        .quad 0xbfa67f94f094bd98
> +        .quad 0xbfa77458f632dcfc
> +        .quad 0xbfa868a83083f6cf
> +        .quad 0xbfa95c830ec8e3eb
> +        .quad 0xbfaa4fe9ffa3d235
> +        .quad 0xbfab42dd711971bf
> +        .quad 0xbfac355dd0921f2d
> +        .quad 0xbfad276b8adb0b52
> +        .quad 0xbfae19070c276016
> +        .quad 0xbfaf0a30c01162a6
> +        .quad 0xbfaffae9119b9303
> +        .quad 0xbfb075983598e471
> +        .quad 0xbfb0ed839b5526fe
> +        .quad 0xbfb16536eea37ae1
> +        .quad 0xbfb1dcb263db1944
> +        .quad 0xbfb253f62f0a1417
> +        .quad 0xbfb2cb0283f5de1f
> +        .quad 0xbfb341d7961bd1d1
> +        .quad 0xbfb3b87598b1b6ee
> +        .quad 0xbfb42edcbea646f0
> +        .quad 0xbfb4a50d3aa1b040
> +        .quad 0xbfb51b073f06183f
> +        .quad 0xbfb590cafdf01c28
> +        .quad 0xbfb60658a93750c4
> +        .quad 0xbfb67bb0726ec0fc
> +        .quad 0xbfb6f0d28ae56b4c
> +        .quad 0xbfb765bf23a6be13
> +        .quad 0xbfb7da766d7b12cd
> +        .quad 0xbfb84ef898e8282a
> +        .quad 0xbfb8c345d6319b21
> +        .quad 0xbfb9375e55595ede
> +        .quad 0xbfb9ab42462033ad
> +        .quad 0xbfba1ef1d8061cd4
> +        .quad 0xbfba926d3a4ad563
> +        .quad 0xbfbb05b49bee43fe
> +        .quad 0xbfbb78c82bb0eda1
> +        .quad 0xbfbbeba818146765
> +        .quad 0xbfbc5e548f5bc743
> +        .quad 0xbfbcd0cdbf8c13e1
> +        .quad 0xbfbd4313d66cb35d
> +        .quad 0xbfbdb5270187d927
> +        .quad 0xbfbe27076e2af2e6
> +        .quad 0xbfbe98b549671467
> +        .quad 0xbfbf0a30c01162a6
> +        .quad 0xbfbf7b79fec37ddf
> +        .quad 0xbfbfec9131dbeabb
> +        .quad 0xbfc02ebb42bf3d4b
> +        .quad 0xbfc0671512ca596e
> +        .quad 0xbfc09f561ee719c3
> +        .quad 0xbfc0d77e7cd08e59
> +        .quad 0xbfc10f8e422539b1
> +        .quad 0xbfc14785846742ac
> +        .quad 0xbfc17f6458fca611
> +        .quad 0xbfc1b72ad52f67a0
> +        .quad 0xbfc1eed90e2dc2c3
> +        .quad 0xbfc2266f190a5acb
> +        .quad 0xbfc25ded0abc6ad2
> +        .quad 0xbfc29552f81ff523
> +        .quad 0xbfc2cca0f5f5f251
> +        .quad 0xbfc303d718e47fd3
> +        .quad 0xbfc33af575770e4f
> +        .quad 0xbfc371fc201e8f74
> +        .quad 0xbfc3a8eb2d31a376
> +        .quad 0xbfc3dfc2b0ecc62a
> +        .quad 0xbfc41682bf727bc0
> +        .quad 0xbfc44d2b6ccb7d1e
> +        .quad 0xbfc483bccce6e3dd
> +        .quad 0xbfc4ba36f39a55e5
> +        .quad 0xbfc4f099f4a230b2
> +        .quad 0xbfc526e5e3a1b438
> +        .quad 0xbfc55d1ad4232d6f
> +        .quad 0xbfc59338d9982086
> +        .quad 0xbfc5c940075972b9
> +        .quad 0xbfc5ff3070a793d4
> +        .quad 0xbfc6350a28aaa758
> +        .quad 0xbfc66acd4272ad51
> +        .quad 0xbfc6a079d0f7aad2
> +        .quad 0xbfc6d60fe719d21d
> +        .quad 0xbfc70b8f97a1aa75
> +        .quad 0xbfc740f8f54037a5
> +        .quad 0xbfc7764c128f2127
> +        .quad 0xbfc7ab890210d909
> +        .quad 0xbfc7e0afd630c274
> +        .quad 0xbfc815c0a14357eb
> +        .quad 0xbfc84abb75865139
> +        .quad 0xbfc87fa06520c911
> +        .quad 0xbfc8b46f8223625b
> +        .quad 0xbfc8e928de886d41
> +        .quad 0xbfc91dcc8c340bde
> +        .quad 0xbfc9525a9cf456b4
> +        .quad 0xbfc986d3228180ca
> +        .quad 0xbfc9bb362e7dfb83
> +        .quad 0xbfc9ef83d2769a34
> +        .quad 0xbfca23bc1fe2b563
> +        .quad 0xbfca57df28244dcd
> +        .quad 0xbfca8becfc882f19
> +        .quad 0xbfcabfe5ae46124c
> +        .quad 0xbfcaf3c94e80bff3
> +        .quad 0xbfcb2797ee46320c
> +        .quad 0xbfcb5b519e8fb5a4
> +        .quad 0xbfcb8ef670420c3b
> +        .quad 0xbfcbc286742d8cd6
> +        .quad 0xbfcbf601bb0e44e2
> +        .quad 0xbfcc2968558c18c1
> +        .quad 0xbfcc5cba543ae425
> +        .quad 0xbfcc8ff7c79a9a22
> +        .quad 0xbfccc320c0176502
> +        .quad 0xbfccf6354e09c5dc
> +        .quad 0xbfcd293581b6b3e7
> +        .quad 0xbfcd5c216b4fbb91
> +        .quad 0xbfcd8ef91af31d5e
> +        .quad 0xbfcdc1bca0abec7d
> +        .quad 0xbfcdf46c0c722d2f
> +        .quad 0xbfce27076e2af2e6
> +        .quad 0xbfce598ed5a87e2f
> +        .quad 0xbfce8c0252aa5a60
> +        .quad 0xbfcebe61f4dd7b0b
> +        .quad 0xbfcef0adcbdc5936
> +        .quad 0xbfcf22e5e72f105d
> +        .quad 0xbfcf550a564b7b37
> +        .quad 0xbfcf871b28955045
> +        .quad 0xbfcfb9186d5e3e2b
> +        .quad 0xbfcfeb0233e607cc
> +        .quad 0xbfd00e6c45ad501d
> +        .quad 0xbfd0274dc16c232f
> +        .quad 0xbfd0402594b4d041
> +        .quad 0xbfd058f3c703ebc6
> +        .quad 0xbfd071b85fcd590d
> +        .quad 0xbfd08a73667c57af
> +        .quad 0xbfd0a324e27390e3
> +        .quad 0xbfd0bbccdb0d24bd
> +        .quad 0xbfd0d46b579ab74b
> +        .quad 0xbfd0ed005f657da4
> +        .quad 0xbfd1058bf9ae4ad5
> +        .quad 0xbfd11e0e2dad9cb7
> +        .quad 0xbfd136870293a8b0
> +        .quad 0xbfd14ef67f88685a
> +        .quad 0xbfd1675cababa60e
> +        .quad 0xbfd17fb98e15095d
> +        .quad 0xbfd1980d2dd4236f
> +        .quad 0xbfd1b05791f07b49
> +        .quad 0xbfd1c898c16999fb
> +        .quad 0xbfd1e0d0c33716be
> +        .quad 0xbfd1f8ff9e48a2f3
> +        .quad 0xbfd211255986160c
> +        .quad 0xbfd22941fbcf7966
> +        .quad 0xbfd241558bfd1404
> +        .quad 0xbfd2596010df763a
> +        .quad 0xbfd27161913f853d
> +        .quad 0xbfd2895a13de86a3
> +        .quad 0xbfd2a1499f762bc9
> +        .quad 0xbfd2b9303ab89d25
> +        .quad 0xbfd2d10dec508583
> +        .quad 0xbfd2e8e2bae11d31
> +        .quad 0xbfd300aead06350c
> +        .quad 0xbfd31871c9544185
> +        .quad 0xbfd3302c16586588
> +        .quad 0xbfd347dd9a987d55
> +        .quad 0xbfd35f865c93293e
> +        .quad 0xbfd3772662bfd85b
> +        .quad 0xbfd38ebdb38ed321
> +        .quad 0xbfd3a64c556945ea
> +        .quad 0xbfd3bdd24eb14b6a
> +        .quad 0xbfd3d54fa5c1f710
> +        .quad 0xbfd3ecc460ef5f50
> +        .quad 0xbfd404308686a7e4
> +        .quad 0xbfd41b941cce0bee
> +        .quad 0xbfd432ef2a04e814
> +        .quad 0xbfd44a41b463c47c
> +        .quad 0xbfd4618bc21c5ec2
> +        .quad 0xbfd478cd5959b3d9
> +        .quad 0xbfd49006804009d1
> +        .quad 0xbfd4a7373cecf997
> +        .quad 0xbfd4be5f957778a1
> +        .quad 0xbfd4d57f8fefe27f
> +        .quad 0xbfd4ec973260026a
> +        .quad 0xbfd503a682cb1cb3
> +        .quad 0xbfd51aad872df82d
> +        .quad 0xbfd531ac457ee77e
> +        .quad 0xbfd548a2c3add263
> +        .quad 0xbfd55f9107a43ee2
> +        .quad 0xbfd5767717455a6c
> +        .quad 0xbfd58d54f86e02f2
> +        .quad 0xbfd5a42ab0f4cfe2
> +        .quad 0xbfd5baf846aa1b19
> +        .quad 0xbfd5d1bdbf5809ca
> +        .quad 0xbfd5e87b20c2954a
> +        .quad 0xbfd5ff3070a793d4
> +        .quad 0xbfd615ddb4bec13c
> +        .quad 0xbfd62c82f2b9c795
> +        .quad 0x3fd61965cdb02c1f
> +        .quad 0x3fd602d08af091ec
> +        .quad 0x3fd5ec433d5c35ae
> +        .quad 0x3fd5d5bddf595f30
> +        .quad 0x3fd5bf406b543db2
> +        .quad 0x3fd5a8cadbbedfa1
> +        .quad 0x3fd5925d2b112a59
> +        .quad 0x3fd57bf753c8d1fb
> +        .quad 0x3fd565995069514c
> +        .quad 0x3fd54f431b7be1a9
> +        .quad 0x3fd538f4af8f72fe
> +        .quad 0x3fd522ae0738a3d8
> +        .quad 0x3fd50c6f1d11b97c
> +        .quad 0x3fd4f637ebba9810
> +        .quad 0x3fd4e0086dd8baca
> +        .quad 0x3fd4c9e09e172c3c
> +        .quad 0x3fd4b3c077267e9a
> +        .quad 0x3fd49da7f3bcc41f
> +        .quad 0x3fd487970e958770
> +        .quad 0x3fd4718dc271c41b
> +        .quad 0x3fd45b8c0a17df13
> +        .quad 0x3fd44591e0539f49
> +        .quad 0x3fd42f9f3ff62642
> +        .quad 0x3fd419b423d5e8c7
> +        .quad 0x3fd403d086cea79c
> +        .quad 0x3fd3edf463c1683e
> +        .quad 0x3fd3d81fb5946dba
> +        .quad 0x3fd3c25277333184
> +        .quad 0x3fd3ac8ca38e5c5f
> +        .quad 0x3fd396ce359bbf54
> +        .quad 0x3fd3811728564cb2
> +        .quad 0x3fd36b6776be1117
> +        .quad 0x3fd355bf1bd82c8b
> +        .quad 0x3fd3401e12aecba1
> +        .quad 0x3fd32a84565120a8
> +        .quad 0x3fd314f1e1d35ce4
> +        .quad 0x3fd2ff66b04ea9d4
> +        .quad 0x3fd2e9e2bce12286
> +        .quad 0x3fd2d46602adccee
> +        .quad 0x3fd2bef07cdc9354
> +        .quad 0x3fd2a982269a3dbf
> +        .quad 0x3fd2941afb186b7c
> +        .quad 0x3fd27ebaf58d8c9d
> +        .quad 0x3fd269621134db92
> +        .quad 0x3fd25410494e56c7
> +        .quad 0x3fd23ec5991eba49
> +        .quad 0x3fd22981fbef797b
> +        .quad 0x3fd214456d0eb8d4
> +        .quad 0x3fd1ff0fe7cf47a7
> +        .quad 0x3fd1e9e1678899f4
> +        .quad 0x3fd1d4b9e796c245
> +        .quad 0x3fd1bf99635a6b95
> +        .quad 0x3fd1aa7fd638d33f
> +        .quad 0x3fd1956d3b9bc2fa
> +        .quad 0x3fd180618ef18adf
> +        .quad 0x3fd16b5ccbacfb73
> +        .quad 0x3fd1565eed455fc3
> +        .quad 0x3fd14167ef367783
> +        .quad 0x3fd12c77cd00713b
> +        .quad 0x3fd1178e8227e47c
> +        .quad 0x3fd102ac0a35cc1c
> +        .quad 0x3fd0edd060b78081
> +        .quad 0x3fd0d8fb813eb1ef
> +        .quad 0x3fd0c42d676162e3
> +        .quad 0x3fd0af660eb9e279
> +        .quad 0x3fd09aa572e6c6d4
> +        .quad 0x3fd085eb8f8ae797
> +        .quad 0x3fd07138604d5862
> +        .quad 0x3fd05c8be0d9635a
> +        .quad 0x3fd047e60cde83b8
> +        .quad 0x3fd03346e0106062
> +        .quad 0x3fd01eae5626c691
> +        .quad 0x3fd00a1c6adda473
> +        .quad 0x3fcfeb2233ea07cd
> +        .quad 0x3fcfc218be620a5e
> +        .quad 0x3fcf991c6cb3b379
> +        .quad 0x3fcf702d36777df0
> +        .quad 0x3fcf474b134df229
> +        .quad 0x3fcf1e75fadf9bde
> +        .quad 0x3fcef5ade4dcffe6
> +        .quad 0x3fceccf2c8fe920a
> +        .quad 0x3fcea4449f04aaf5
> +        .quad 0x3fce7ba35eb77e2a
> +        .quad 0x3fce530effe71012
> +        .quad 0x3fce2a877a6b2c12
> +        .quad 0x3fce020cc6235ab5
> +        .quad 0x3fcdd99edaf6d7e9
> +        .quad 0x3fcdb13db0d48940
> +        .quad 0x3fcd88e93fb2f450
> +        .quad 0x3fcd60a17f903515
> +        .quad 0x3fcd38666871f465
> +        .quad 0x3fcd1037f2655e7b
> +        .quad 0x3fcce816157f1988
> +        .quad 0x3fccc000c9db3c52
> +        .quad 0x3fcc97f8079d44ec
> +        .quad 0x3fcc6ffbc6f00f71
> +        .quad 0x3fcc480c0005ccd1
> +        .quad 0x3fcc2028ab17f9b4
> +        .quad 0x3fcbf851c067555f
> +        .quad 0x3fcbd087383bd8ad
> +        .quad 0x3fcba8c90ae4ad19
> +        .quad 0x3fcb811730b823d2
> +        .quad 0x3fcb5971a213acdb
> +        .quad 0x3fcb31d8575bce3d
> +        .quad 0x3fcb0a4b48fc1b46
> +        .quad 0x3fcae2ca6f672bd4
> +        .quad 0x3fcabb55c31693ad
> +        .quad 0x3fca93ed3c8ad9e3
> +        .quad 0x3fca6c90d44b704e
> +        .quad 0x3fca454082e6ab05
> +        .quad 0x3fca1dfc40f1b7f1
> +        .quad 0x3fc9f6c407089664
> +        .quad 0x3fc9cf97cdce0ec3
> +        .quad 0x3fc9a8778debaa38
> +        .quad 0x3fc981634011aa75
> +        .quad 0x3fc95a5adcf7017f
> +        .quad 0x3fc9335e5d594989
> +        .quad 0x3fc90c6db9fcbcd9
> +        .quad 0x3fc8e588ebac2dbf
> +        .quad 0x3fc8beafeb38fe8c
> +        .quad 0x3fc897e2b17b19a5
> +        .quad 0x3fc871213750e994
> +        .quad 0x3fc84a6b759f512f
> +        .quad 0x3fc823c16551a3c2
> +        .quad 0x3fc7fd22ff599d4f
> +        .quad 0x3fc7d6903caf5ad0
> +        .quad 0x3fc7b0091651528c
> +        .quad 0x3fc7898d85444c73
> +        .quad 0x3fc7631d82935a86
> +        .quad 0x3fc73cb9074fd14d
> +        .quad 0x3fc716600c914054
> +        .quad 0x3fc6f0128b756abc
> +        .quad 0x3fc6c9d07d203fc7
> +        .quad 0x3fc6a399dabbd383
> +        .quad 0x3fc67d6e9d785771
> +        .quad 0x3fc6574ebe8c133a
> +        .quad 0x3fc6313a37335d76
> +        .quad 0x3fc60b3100b09476
> +        .quad 0x3fc5e533144c1719
> +        .quad 0x3fc5bf406b543db2
> +        .quad 0x3fc59958ff1d52f1
> +        .quad 0x3fc5737cc9018cdd
> +        .quad 0x3fc54dabc26105d2
> +        .quad 0x3fc527e5e4a1b58d
> +        .quad 0x3fc5022b292f6a45
> +        .quad 0x3fc4dc7b897bc1c8
> +        .quad 0x3fc4b6d6fefe22a4
> +        .quad 0x3fc4913d8333b561
> +        .quad 0x3fc46baf0f9f5db7
> +        .quad 0x3fc4462b9dc9b3dc
> +        .quad 0x3fc420b32740fdd4
> +        .quad 0x3fc3fb45a59928cc
> +        .quad 0x3fc3d5e3126bc27f
> +        .quad 0x3fc3b08b6757f2a9
> +        .quad 0x3fc38b3e9e027479
> +        .quad 0x3fc365fcb0159016
> +        .quad 0x3fc340c59741142e
> +        .quad 0x3fc31b994d3a4f85
> +        .quad 0x3fc2f677cbbc0a96
> +        .quad 0x3fc2d1610c86813a
> +        .quad 0x3fc2ac55095f5c59
> +        .quad 0x3fc28753bc11aba5
> +        .quad 0x3fc2625d1e6ddf57
> +        .quad 0x3fc23d712a49c202
> +        .quad 0x3fc2188fd9807263
> +        .quad 0x3fc1f3b925f25d41
> +        .quad 0x3fc1ceed09853752
> +        .quad 0x3fc1aa2b7e23f72a
> +        .quad 0x3fc185747dbecf34
> +        .quad 0x3fc160c8024b27b1
> +        .quad 0x3fc13c2605c398c3
> +        .quad 0x3fc1178e8227e47c
> +        .quad 0x3fc0f301717cf0fb
> +        .quad 0x3fc0ce7ecdccc28d
> +        .quad 0x3fc0aa06912675d5
> +        .quad 0x3fc08598b59e3a07
> +        .quad 0x3fc06135354d4b18
> +        .quad 0x3fc03cdc0a51ec0d
> +        .quad 0x3fc0188d2ecf6140
> +        .quad 0x3fbfe89139dbd566
> +        .quad 0x3fbfa01c9db57ce2
> +        .quad 0x3fbf57bc7d9005db
> +        .quad 0x3fbf0f70cdd992e3
> +        .quad 0x3fbec739830a1120
> +        .quad 0x3fbe7f1691a32d3e
> +        .quad 0x3fbe3707ee30487b
> +        .quad 0x3fbdef0d8d466db9
> +        .quad 0x3fbda727638446a2
> +        .quad 0x3fbd5f55659210e2
> +        .quad 0x3fbd179788219364
> +        .quad 0x3fbccfedbfee13a8
> +        .quad 0x3fbc885801bc4b23
> +        .quad 0x3fbc40d6425a5cb1
> +        .quad 0x3fbbf968769fca11
> +        .quad 0x3fbbb20e936d6974
> +        .quad 0x3fbb6ac88dad5b1c
> +        .quad 0x3fbb23965a52ff00
> +        .quad 0x3fbadc77ee5aea8c
> +        .quad 0x3fba956d3ecade63
> +        .quad 0x3fba4e7640b1bc38
> +        .quad 0x3fba0792e9277cac
> +        .quad 0x3fb9c0c32d4d2548
> +        .quad 0x3fb97a07024cbe74
> +        .quad 0x3fb9335e5d594989
> +        .quad 0x3fb8ecc933aeb6e8
> +        .quad 0x3fb8a6477a91dc29
> +        .quad 0x3fb85fd927506a48
> +        .quad 0x3fb8197e2f40e3f0
> +        .quad 0x3fb7d33687c293c9
> +        .quad 0x3fb78d02263d82d3
> +        .quad 0x3fb746e100226ed9
> +        .quad 0x3fb700d30aeac0e1
> +        .quad 0x3fb6bad83c1883b6
> +        .quad 0x3fb674f089365a7a
> +        .quad 0x3fb62f1be7d77743
> +        .quad 0x3fb5e95a4d9791cb
> +        .quad 0x3fb5a3abb01ade25
> +        .quad 0x3fb55e10050e0384
> +        .quad 0x3fb518874226130a
> +        .quad 0x3fb4d3115d207eac
> +        .quad 0x3fb48dae4bc31018
> +        .quad 0x3fb4485e03dbdfad
> +        .quad 0x3fb403207b414b7f
> +        .quad 0x3fb3bdf5a7d1ee64
> +        .quad 0x3fb378dd7f749714
> +        .quad 0x3fb333d7f8183f4b
> +        .quad 0x3fb2eee507b40301
> +        .quad 0x3fb2aa04a44717a5
> +        .quad 0x3fb26536c3d8c369
> +        .quad 0x3fb2207b5c78549e
> +        .quad 0x3fb1dbd2643d190b
> +        .quad 0x3fb1973bd1465567
> +        .quad 0x3fb152b799bb3cc9
> +        .quad 0x3fb10e45b3cae831
> +        .quad 0x3fb0c9e615ac4e17
> +        .quad 0x3fb08598b59e3a07
> +        .quad 0x3fb0415d89e74444
> +        .quad 0x3faffa6911ab9301
> +        .quad 0x3faf723b517fc523
> +        .quad 0x3faeea31c006b87c
> +        .quad 0x3fae624c4a0b5e1b
> +        .quad 0x3fadda8adc67ee4e
> +        .quad 0x3fad52ed6405d86f
> +        .quad 0x3faccb73cdddb2cc
> +        .quad 0x3fac441e06f72a9e
> +        .quad 0x3fabbcebfc68f420
> +        .quad 0x3fab35dd9b58baad
> +        .quad 0x3faaaef2d0fb10fc
> +        .quad 0x3faa282b8a936171
> +        .quad 0x3fa9a187b573de7c
> +        .quad 0x3fa91b073efd7314
> +        .quad 0x3fa894aa149fb343
> +        .quad 0x3fa80e7023d8ccc4
> +        .quad 0x3fa788595a3577ba
> +        .quad 0x3fa70265a550e777
> +        .quad 0x3fa67c94f2d4bb58
> +        .quad 0x3fa5f6e73078efb8
> +        .quad 0x3fa5715c4c03ceef
> +        .quad 0x3fa4ebf43349e26f
> +        .quad 0x3fa466aed42de3ea
> +        .quad 0x3fa3e18c1ca0ae92
> +        .quad 0x3fa35c8bfaa1306b
> +        .quad 0x3fa2d7ae5c3c5bae
> +        .quad 0x3fa252f32f8d183f
> +        .quad 0x3fa1ce5a62bc353a
> +        .quad 0x3fa149e3e4005a8d
> +        .quad 0x3fa0c58fa19dfaaa
> +        .quad 0x3fa0415d89e74444
> +        .quad 0x3f9f7a9b16782856
> +        .quad 0x3f9e72bf2813ce51
> +        .quad 0x3f9d6b2725979802
> +        .quad 0x3f9c63d2ec14aaf2
> +        .quad 0x3f9b5cc258b718e6
> +        .quad 0x3f9a55f548c5c43f
> +        .quad 0x3f994f6b99a24475
> +        .quad 0x3f98492528c8cabf
> +        .quad 0x3f974321d3d006d3
> +        .quad 0x3f963d6178690bd6
> +        .quad 0x3f9537e3f45f3565
> +        .quad 0x3f9432a925980cc1
> +        .quad 0x3f932db0ea132e22
> +        .quad 0x3f9228fb1fea2e28
> +        .quad 0x3f912487a5507f70
> +        .quad 0x3f90205658935847
> +        .quad 0x3f8e38ce3033310c
> +        .quad 0x3f8c317384c75f06
> +        .quad 0x3f8a2a9c6c170462
> +        .quad 0x3f882448a388a2aa
> +        .quad 0x3f861e77e8b53fc6
> +        .quad 0x3f841929f96832f0
> +        .quad 0x3f82145e939ef1e9
> +        .quad 0x3f8010157588de71
> +        .quad 0x3f7c189cbb0e27fb
> +        .quad 0x3f78121214586b54
> +        .quad 0x3f740c8a747878e2
> +        .quad 0x3f70080559588b35
> +        .quad 0x3f680904828985c0
> +        .quad 0x3f60040155d5889e
> +        .quad 0x3f50020055655889
> +        .quad 0x0000000000000000
> +        /*== poly_coeff[4] ==*/
> +        .align 16
> +        .quad 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A /* coeff4 */
> +        .quad 0xbfd0000148058EE1, 0xbfd0000148058EE1 /* coeff3 */
> +        .quad 0x3fd55555555543C5, 0x3fd55555555543C5 /* coeff2 */
> +        .quad 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F /* coeff1 */
> +        /*== ExpMask ==*/
> +        .align 16
> +        .quad 0x000fffffffffffff, 0x000fffffffffffff
> +        /*== Two10 ==*/
> +        .align 16
> +        .quad 0x3f50000000000000, 0x3f50000000000000
> +        /*== MinLog1p = -1+2^(-53) ==*/
> +        .align 16
> +        .quad 0xbfefffffffffffff, 0xbfefffffffffffff
> +        /*== MaxLog1p ==*/
> +        .align 16
> +        .quad 0x7f3ffffffffff000, 0x7f3ffffffffff000
> +        /*== One ==*/
> +        .align 16
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== SgnMask ==*/
> +        .align 16
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff
> +        /*== XThreshold ==*/
> +        .align 16
> +        .quad 0x3e00000000000000, 0x3e00000000000000
> +        /*== XhMask ==*/
> +        .align 16
> +        .quad 0xfffffffffffffc00, 0xfffffffffffffc00
> +        /*== Threshold ==*/
> +        .align 16
> +        .quad 0x4086a00000000000, 0x4086a00000000000
> +        /*== Bias ==*/
> +        .align 16
> +        .quad 0x408ff80000000000, 0x408ff80000000000
> +        /*== Bias1 ==*/
> +        .align 16
> +        .quad 0x408ff00000000000, 0x408ff00000000000
> +        /*== ExpMask ==*/
> +        .align 16
> +        .quad 0x7ff0000000000000, 0x7ff0000000000000
> +        /*== ExpMask2 ==*/
> +        .align 16
> +        .quad 0x7f40000000000000, 0x7f40000000000000
> +        /*== L2L ==*/
> +        .align 16
> +        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
> +        /*== dBigThreshold ==*/
> +        .align 16
> +        .quad 0x41D0000000000000, 0x41D0000000000000
> +        /*== dC2 ==*/
> +        .align 16
> +        .quad 0x3FD8000000000000, 0x3FD8000000000000
> +        /*== dC3 ==*/
> +        .align 16
> +        .quad 0x3FD4000000000000, 0x3FD4000000000000
> +        /*== dC4 ==*/
> +        .align 16
> +        .quad 0x3FD1800000000000, 0x3FD1800000000000
> +        /*== dC5 ==*/
> +        .align 16
> +        .quad 0x3FCF800000000000, 0x3FCF800000000000
> +        /*== dHalf ==*/
> +        .align 16
> +        .quad 0x3FE0000000000000, 0x3FE0000000000000
> +        /*== dLargestFinite ==*/
> +        .align 16
> +        .quad 0x7FEFFFFFFFFFFFFF, 0x7FEFFFFFFFFFFFFF
> +        /*== dLittleThreshold ==*/
> +        .align 16
> +        .quad 0x3F60000000000000, 0x3F60000000000000
> +        /*== dSign ==*/
> +        .align 16
> +        .quad 0x8000000000000000, 0x8000000000000000
> +        /*== dThirtyOne ==*/
> +        .align 16
> +        .quad 0x403F000000000000, 0x403F000000000000
> +        /*== dTopMask12 ==*/
> +        .align 16
> +        .quad 0xFFFFFE0000000000, 0xFFFFFE0000000000
> +        /*== dTopMask26 ==*/
> +        .align 16
> +        .quad 0xFFFFFFFFF8000000, 0xFFFFFFFFF8000000
> +        /*== dTopMask29 ==*/
> +        .align 16
> +        .quad 0xFFFFFFFFFF000000, 0xFFFFFFFFFF000000
> +        /*== XScale ==*/
> +        .align 16
> +        .quad 0x3E10000000000000, 0x3E10000000000000
> +        .align 16
> +        .type	__svml_dasinh_data_internal,@object
> +        .size	__svml_dasinh_data_internal,.-__svml_dasinh_data_internal
> +        .align 16
> +
> +.FLT_30:
> +        .long	0x00000000,0x43380000,0x00000000,0x43380000
> +        .type	.FLT_30,@object
> +        .size	.FLT_30,16
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core-sse.S
> new file mode 100644
> index 0000000000..903b5f0fb5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized asinh, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN4v_asinh _ZGVdN4v_asinh_sse_wrapper
> +#include "../svml_d_asinh4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core.c
> new file mode 100644
> index 0000000000..e7acd032b5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized asinh, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN4v_asinh
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN4v_asinh, __GI__ZGVdN4v_asinh, __redirect__ZGVdN4v_asinh)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core_avx2.S
> new file mode 100644
> index 0000000000..d691d1ec6f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh4_core_avx2.S
> @@ -0,0 +1,1601 @@
> +/* Function asinh vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute asinh(x) as log(x + sqrt(x*x + 1))
> + *
> + *   Special cases:
> + *
> + *   asinh(NaN) = quiet NaN, and raise invalid exception
> + *   asinh(+/-INF) = +/-INF
> + *   asinh(+/-0)   = +/-0
> + *
> + */
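
For reference, a scalar model of the formula in the comment above is
roughly the sketch below (names are illustrative only, not part of the
patch; the vector code implements the same math with split arithmetic
and a log table):

  #include <math.h>

  /* Reference model: asinh(x) = log(x + sqrt(x*x + 1)), using the odd
     symmetry asinh(-x) = -asinh(x) so the core works on |x|.  */
  static double
  asinh_ref (double x)
  {
    double ax = fabs (x);
    return copysign (log (ax + sqrt (ax * ax + 1.0)), x);
  }
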
> +
> +/* Offsets for data table __svml_dasinh_data_internal
> + */
> +#define Log_HA_table                  	0
> +#define Log_LA_table                  	8224
> +#define poly_coeff                    	12352
> +#define ExpMask                       	12480
> +#define Two10                         	12512
> +#define MinLog1p                      	12544
> +#define MaxLog1p                      	12576
> +#define One                           	12608
> +#define SgnMask                       	12640
> +#define XThreshold                    	12672
> +#define XhMask                        	12704
> +#define Threshold                     	12736
> +#define Bias                          	12768
> +#define Bias1                         	12800
> +#define ExpMask0                      	12832
> +#define ExpMask2                      	12864
> +#define L2                            	12896
> +#define dBigThreshold                 	12928
> +#define dC2                           	12960
> +#define dC3                           	12992
> +#define dC4                           	13024
> +#define dC5                           	13056
> +#define dHalf                         	13088
> +#define dLargestFinite                	13120
> +#define dLittleThreshold              	13152
> +#define dSign                         	13184
> +#define dThirtyOne                    	13216
> +#define dTopMask12                    	13248
> +#define dTopMask29                    	13280
> +#define XScale                        	13312
> +
> +/* Lookup bias for data table __svml_dasinh_data_internal.  */
> +#define Table_Lookup_Bias               -0x405fe0
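
A note for anyone tracing the table lookup further down: with this bias
folded into the lea that loads %r8, each per-lane load amounts to

  entry_addr = &__svml_dasinh_data_internal + Table_Lookup_Bias + (rcp_bits >> 40)

(illustrative expression only; rcp_bits stands for the per-lane dword
produced by vpsrlq $40 from the rounded reciprocal), so the large
negative displacement is applied once in the base register instead of
being encoded into every vmovsd/vmovhpd.
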
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN4v_asinh_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +        lea       Table_Lookup_Bias+__svml_dasinh_data_internal(%rip), %r8
> +        vmovapd   %ymm0, %ymm13
> +        vmovupd   SgnMask+__svml_dasinh_data_internal(%rip), %ymm9
> +
> +/* Load the constant 1 and a sign mask */
> +        vmovupd   One+__svml_dasinh_data_internal(%rip), %ymm12
> +
> +/* No need to split X when FMA is available in hardware. */
> +        vmulpd    %ymm13, %ymm13, %ymm8
> +
> +/*
> + * Get the absolute value of the input, since we will exploit antisymmetry
> + * and mostly assume X >= 0 in the core computation
> + */
> +        vandpd    %ymm9, %ymm13, %ymm10
> +
> +/*
> + * Check whether the input is finite, by checking |X| <= MaxFloat
> + * Otherwise set the rangemask so that the callout will get used.
> + * Note that this will also use the callout for NaNs since not(NaN <= MaxFloat)
> + */
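
(Side note: NLE_UQ is the unordered predicate, so in scalar terms the
lane mask is "!(fabs(x) <= DBL_MAX)", which is also true for NaN inputs,
exactly as the comment describes.)
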
> +        vcmpnle_uqpd dLargestFinite+__svml_dasinh_data_internal(%rip), %ymm10, %ymm14
> +
> +/*
> + * Finally, express Y + W = X^2 + 1 accurately where Y has <= 29 bits.
> + * If |X| <= 1 then |XHi| <= 1 and so |X2Hi| <= 1, so we can treat 1
> + * as the dominant component in the compensated summation. Otherwise,
> + * if |X| >= 1, then since X2Hi only has 52 significant bits, the basic
> + * addition will be exact anyway until we get to |X| >= 2^53. But by
> + * that time the log function is well-conditioned enough that the
> + * rounding error doesn't matter. Hence we can treat 1 as dominant even
> + * if it literally isn't.
> + */
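
In scalar terms the split being described here is roughly the sketch
below (illustrative names only); the vector code computes essentially
these quantities with vfmsub213pd/vaddpd and additionally folds the bits
dropped by the dTopMask29 rounding of Y back into W:

  #include <math.h>

  /* Illustrative only: Y + W ~= 1 + x*x in double-double form.  */
  static void
  split_one_plus_x2 (double x, double *Y, double *W)
  {
    double x2h = x * x;                   /* rounded X^2            */
    double x2l = fma (x, x, -x2h);        /* its rounding error     */
    *Y = 1.0 + x2h;                       /* 1 treated as dominant  */
    *W = ((1.0 - *Y) + x2h) + x2l;        /* compensation term      */
  }
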
> +        vaddpd    %ymm8, %ymm12, %ymm5
> +
> +/*
> + * The following computation can go wrong for very large X, basically
> + * because X^2 overflows. But for large X we have
> + * asinh(X) / log(2 X) - 1 =~= 1/(4 * X^2), so for X >= 2^30
> + * we can just later stick X back into the log and tweak up the exponent.
> + * Actually we scale X by 2^-30 and tweak the exponent up by 31,
> + * to stay in the safe range for the later log computation.
> + * Compute a flag now telling us when to do this.
> + */
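
Concretely, for |X| >= 2^30 (dBigThreshold) the identity being used is

  asinh(X) ~= log(2*X) = log(2^-30 * X) + 31*log(2)

which is why XScale below is 2^-30 and dThirtyOne (31.0) is added to the
exponent before the exponent*log(2) step later on.
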
> +        vcmplt_oqpd dBigThreshold+__svml_dasinh_data_internal(%rip), %ymm10, %ymm11
> +        vsubpd    %ymm5, %ymm12, %ymm15
> +        vmovmskpd %ymm14, %eax
> +        vandpd    dTopMask29+__svml_dasinh_data_internal(%rip), %ymm5, %ymm14
> +
> +/*
> + * Compute R = 1/sqrt(Y + W) * (1 + d)
> + * Force R to <= 12 significant bits in case it isn't already
> + * This means that R * Y and R^2 * Y are exactly representable.
> + */
> +        vcvtpd2ps %ymm14, %xmm1
> +        vaddpd    %ymm15, %ymm8, %ymm0
> +        vsubpd    %ymm14, %ymm5, %ymm2
> +        vrsqrtps  %xmm1, %xmm3
> +        vmovapd   %ymm13, %ymm7
> +        vfmsub213pd %ymm8, %ymm13, %ymm7
> +        vcvtps2pd %xmm3, %ymm6
> +        vaddpd    %ymm0, %ymm7, %ymm4
> +
> +/*
> + * Unfortunately, we can still be in trouble if |X| <= 2^-10, since
> + * the absolute error 2^-(12+53)-ish in sqrt(1 + X^2) gets scaled up
> + * by 1/X and comes close to our threshold. Hence if |X| <= 2^-9,
> + * perform an alternative computation
> + * sqrt(1 + X^2) - 1 = X^2/2 - X^4/8 + X^6/16
> + * X2 = X^2
> + */
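
As a sanity check of that series, the small-|X| path a few instructions
below evaluates, in scalar form (names mine):

  /* Illustrative only: sqrt(1 + x^2) - 1 for tiny |x|.  */
  static double
  sqrt1px2m1_tiny (double x)
  {
    double x2h = 0.5 * (x * x);    /* dX2over2:  X^2/2          */
    double x4q = x2h * x2h;        /* dX4over4:  X^4/4          */
    double x46 = x4q * x2h - x4q;  /* dX46:      X^6/8 - X^4/4  */
    return x2h + 0.5 * x46;        /* X^2/2 - X^4/8 + X^6/16    */
  }
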
> +        vaddpd    %ymm7, %ymm8, %ymm7
> +        vaddpd    %ymm2, %ymm4, %ymm15
> +
> +/*
> + * Now       1 / (1 + d)
> + * = 1 / (1 + (sqrt(1 - e) - 1))
> + * = 1 / sqrt(1 - e)
> + * = 1 + 1/2 * e + 3/8 * e^2 + 5/16 * e^3 + 35/128 * e^4 +
> + * 63/256 * e^5 + 231/1024 * e^6 + ....
> + * So compute the first five nonconstant terms of that, so that
> + * we have a relative correction (1 + Corr) to apply to S etc.
> + * C1 = 1/2
> + * C2 = 3/8
> + * C3 = 5/16
> + * C4 = 35/128
> + * C5 = 63/256
> + */
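
In scalar form the correction built from dHalf and dC2..dC5 below is the
Horner evaluation (illustrative names):

  /* Illustrative only: coefficients match the data table values
     1/2, 3/8, 5/16, 35/128, 63/256.  */
  static double
  rsqrt_corr (double e)
  {
    const double C2 = 0.375, C3 = 0.3125, C4 = 35.0 / 128, C5 = 63.0 / 256;
    return e * (0.5 + e * (C2 + e * (C3 + e * (C4 + e * C5))));
  }

i.e. exactly the five nonconstant terms listed in the comment above.
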
> +        vmovupd   dC5+__svml_dasinh_data_internal(%rip), %ymm4
> +        vandpd    dTopMask12+__svml_dasinh_data_internal(%rip), %ymm6, %ymm0
> +
> +/*
> + * Compute S = (Y/sqrt(Y + W)) * (1 + d)
> + * and T = (W/sqrt(Y + W)) * (1 + d)
> + * so that S + T = sqrt(Y + W) * (1 + d)
> + * S is exact, and the rounding error in T is OK.
> + */
> +        vmulpd    %ymm0, %ymm14, %ymm3
> +        vmulpd    %ymm15, %ymm0, %ymm1
> +        vmovupd   dHalf+__svml_dasinh_data_internal(%rip), %ymm6
> +        vsubpd    %ymm12, %ymm3, %ymm14
> +
> +/*
> + * Obtain sqrt(1 + X^2) - 1 in two pieces
> + * sqrt(1 + X^2) - 1
> + * = sqrt(Y + W) - 1
> + * = (S + T) * (1 + Corr) - 1
> + * = [S - 1] + [T + (S + T) * Corr]
> + * We need a compensated summation for the last part. We treat S - 1
> + * as the larger part; it certainly is until about X < 2^-4, and in that
> + * case, the error is affordable since X dominates over sqrt(1 + X^2) - 1
> + * Final sum is dTmp5 (hi) + dTmp7 (lo)
> + */
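
With S = R*Y, T = R*W and Corr from the series above, the two pieces are,
as a scalar sketch (names mine), matching the vsubpd/vfmadd213pd/vaddpd
sequence that follows:

  /* Illustrative only: split sqrt(Y + W) - 1 into hi + lo given
     S = R*Y, T = R*W and the relative correction Corr.  */
  static void
  sqrt_minus_one_split (double S, double T, double Corr,
                        double *hi, double *lo)
  {
    *hi = S - 1.0;               /* exact dominant part */
    *lo = T + (S + T) * Corr;    /* small correction    */
  }
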
> +        vaddpd    %ymm1, %ymm3, %ymm2
> +
> +/*
> + * Compute e = -(2 * d + d^2)
> + * The first FMA is exact, and the rounding error in the other is acceptable
> + * since d and e are ~ 2^-12
> + */
> +        vmovapd   %ymm12, %ymm5
> +        vfnmadd231pd %ymm3, %ymm0, %ymm5
> +        vfnmadd231pd %ymm1, %ymm0, %ymm5
> +        vfmadd213pd dC4+__svml_dasinh_data_internal(%rip), %ymm5, %ymm4
> +        vfmadd213pd dC3+__svml_dasinh_data_internal(%rip), %ymm5, %ymm4
> +        vfmadd213pd dC2+__svml_dasinh_data_internal(%rip), %ymm5, %ymm4
> +        vfmadd213pd %ymm6, %ymm5, %ymm4
> +        vmulpd    %ymm4, %ymm5, %ymm0
> +        vfmadd213pd %ymm1, %ymm2, %ymm0
> +
> +/* Now multiplex the two possible computations */
> +        vcmple_oqpd dLittleThreshold+__svml_dasinh_data_internal(%rip), %ymm10, %ymm2
> +        vaddpd    %ymm14, %ymm0, %ymm15
> +
> +/* dX2over2 = X^2/2 */
> +        vmulpd    %ymm7, %ymm6, %ymm0
> +
> +/* dX4over4 = X^4/4 */
> +        vmulpd    %ymm0, %ymm0, %ymm8
> +
> +/* dX46 = -X^4/4 + X^6/8 */
> +        vfmsub231pd %ymm0, %ymm8, %ymm8
> +
> +/* dX46over2 = -X^4/8 + X^6/16 */
> +        vmulpd    %ymm8, %ymm6, %ymm5
> +
> +/* 2^(-10-exp(X)) */
> +        vmovupd   ExpMask2+__svml_dasinh_data_internal(%rip), %ymm8
> +        vaddpd    %ymm5, %ymm0, %ymm4
> +        vblendvpd %ymm2, %ymm4, %ymm15, %ymm1
> +
> +/*
> + * Now do another compensated sum to add |X| + [sqrt(1 + X^2) - 1].
> + * It's always safe to assume |X| is larger.
> + * This is the final 2-part argument to the log1p function
> + */
> +        vaddpd    %ymm1, %ymm10, %ymm3
> +
> +/* Now multiplex to the case X = 2^-30 * |input|, Xl = dL = 0 in the "big" case. */
> +        vmulpd    XScale+__svml_dasinh_data_internal(%rip), %ymm10, %ymm10
> +
> +/*
> + * Now we feed into the log1p code, using H in place of _VARG1 and
> + * also adding L into Xl.
> + * compute 1+x as high, low parts
> + */
> +        vmaxpd    %ymm3, %ymm12, %ymm6
> +        vminpd    %ymm3, %ymm12, %ymm7
> +        vandpd    %ymm9, %ymm3, %ymm9
> +        vcmplt_oqpd XThreshold+__svml_dasinh_data_internal(%rip), %ymm9, %ymm0
> +        vaddpd    %ymm7, %ymm6, %ymm5
> +        vorpd     XhMask+__svml_dasinh_data_internal(%rip), %ymm0, %ymm4
> +        vandpd    %ymm4, %ymm5, %ymm1
> +        vblendvpd %ymm11, %ymm1, %ymm10, %ymm5
> +        vsubpd    %ymm1, %ymm6, %ymm2
> +
> +/* exponent bits */
> +        vpsrlq    $20, %ymm5, %ymm10
> +        vaddpd    %ymm2, %ymm7, %ymm3
> +
> +/*
> + * Now resume the main code.
> + * preserve mantissa, set input exponent to 2^(-10)
> + */
> +        vandpd    ExpMask+__svml_dasinh_data_internal(%rip), %ymm5, %ymm0
> +        vorpd     Two10+__svml_dasinh_data_internal(%rip), %ymm0, %ymm2
> +
> +/* reciprocal approximation good to at least 11 bits */
> +        vcvtpd2ps %ymm2, %xmm6
> +        vrcpps    %xmm6, %xmm7
> +        vcvtps2pd %xmm7, %ymm15
> +
> +/* exponent of X needed to scale Xl */
> +        vandps    ExpMask0+__svml_dasinh_data_internal(%rip), %ymm5, %ymm9
> +        vpsubq    %ymm9, %ymm8, %ymm0
> +        vandpd    %ymm11, %ymm3, %ymm4
> +
> +/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
> +        vroundpd  $0, %ymm15, %ymm3
> +
> +/* scale DblRcp */
> +        vmulpd    %ymm0, %ymm3, %ymm2
> +
> +/* argument reduction */
> +        vfmsub213pd %ymm12, %ymm2, %ymm5
> +        vmulpd    %ymm2, %ymm4, %ymm12
> +        vmovupd   poly_coeff+64+__svml_dasinh_data_internal(%rip), %ymm2
> +        vaddpd    %ymm12, %ymm5, %ymm5
> +        vfmadd213pd poly_coeff+96+__svml_dasinh_data_internal(%rip), %ymm5, %ymm2
> +        vmulpd    %ymm5, %ymm5, %ymm4
> +        vextractf128 $1, %ymm10, %xmm14
> +        vshufps   $221, %xmm14, %xmm10, %xmm1
> +
> +/* biased exponent in DP format */
> +        vcvtdq2pd %xmm1, %ymm7
> +
> +/* exponent*log(2.0) */
> +        vmovupd   Threshold+__svml_dasinh_data_internal(%rip), %ymm10
> +
> +/* Add 31 to the exponent in the "large" case to get log(2 * input) */
> +        vaddpd    dThirtyOne+__svml_dasinh_data_internal(%rip), %ymm7, %ymm6
> +        vblendvpd %ymm11, %ymm7, %ymm6, %ymm1
> +
> +/*
> + * prepare table index
> + * table lookup
> + */
> +        vpsrlq    $40, %ymm3, %ymm11
> +        vcmplt_oqpd %ymm3, %ymm10, %ymm3
> +        vandpd    Bias+__svml_dasinh_data_internal(%rip), %ymm3, %ymm14
> +        vorpd     Bias1+__svml_dasinh_data_internal(%rip), %ymm14, %ymm15
> +        vsubpd    %ymm15, %ymm1, %ymm1
> +        vmulpd    L2+__svml_dasinh_data_internal(%rip), %ymm1, %ymm3
> +
> +/* polynomial */
> +        vmovupd   poly_coeff+__svml_dasinh_data_internal(%rip), %ymm1
> +        vfmadd213pd poly_coeff+32+__svml_dasinh_data_internal(%rip), %ymm5, %ymm1
> +        vfmadd213pd %ymm2, %ymm4, %ymm1
> +
> +/* reconstruction */
> +        vfmadd213pd %ymm5, %ymm4, %ymm1
> +        vextractf128 $1, %ymm11, %xmm7
> +        vmovd     %xmm11, %edx
> +        vmovd     %xmm7, %esi
> +        movslq    %edx, %rdx
> +        vpextrd   $2, %xmm11, %ecx
> +        movslq    %esi, %rsi
> +        vpextrd   $2, %xmm7, %edi
> +        movslq    %ecx, %rcx
> +        movslq    %edi, %rdi
> +        vmovsd    (%r8,%rdx), %xmm0
> +        vmovsd    (%r8,%rsi), %xmm8
> +        vmovhpd   (%r8,%rcx), %xmm0, %xmm6
> +        vmovhpd   (%r8,%rdi), %xmm8, %xmm9
> +        vinsertf128 $1, %xmm9, %ymm6, %ymm0
> +        vaddpd    %ymm1, %ymm0, %ymm0
> +        vaddpd    %ymm0, %ymm3, %ymm7
> +
> +/* Finally, reincorporate the original sign. */
> +        vandpd    dSign+__svml_dasinh_data_internal(%rip), %ymm13, %ymm6
> +        vxorpd    %ymm7, %ymm6, %ymm0
> +        testl     %eax, %eax
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm13
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovupd   %ymm13, 32(%rsp)
> +        vmovupd   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0
> +
> +        xorl      %edx, %edx
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovupd   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      asinh@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 64(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN4v_asinh_avx2)
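
As a quick smoke test of the new entry point: a loop like the one below,
built with vectorization and fast-math enabled (for example
"-O2 -ftree-loop-vectorize -ffast-math -mavx2", adding -lmvec if it is
not pulled in automatically; exact options from memory, adjust as
needed), should end up calling _ZGVdN4v_asinh instead of four scalar
asinh calls:

  #include <math.h>

  void
  vec_asinh (double *restrict out, const double *restrict in, int n)
  {
    /* With the vector ABI declarations in effect, the compiler may
       replace the scalar calls in this loop with _ZGVdN4v_asinh.  */
    for (int i = 0; i < n; i++)
      out[i] = asinh (in[i]);
  }
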
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_dasinh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(32)) VUINT32 Log_HA_table[(1<<10)+2][2];
> +        __declspec(align(32)) VUINT32 Log_LA_table[(1<<9)+1][2];
> +        __declspec(align(32)) VUINT32 poly_coeff[4][4][2];
> +        __declspec(align(32)) VUINT32 ExpMask[4][2];
> +        __declspec(align(32)) VUINT32 Two10[4][2];
> +        __declspec(align(32)) VUINT32 MinLog1p[4][2];
> +        __declspec(align(32)) VUINT32 MaxLog1p[4][2];
> +        __declspec(align(32)) VUINT32 One[4][2];
> +        __declspec(align(32)) VUINT32 SgnMask[4][2];
> +        __declspec(align(32)) VUINT32 XThreshold[4][2];
> +        __declspec(align(32)) VUINT32 XhMask[4][2];
> +        __declspec(align(32)) VUINT32 Threshold[4][2];
> +        __declspec(align(32)) VUINT32 Bias[4][2];
> +        __declspec(align(32)) VUINT32 Bias1[4][2];
> +        __declspec(align(32)) VUINT32 ExpMask0[4][2];
> +        __declspec(align(32)) VUINT32 ExpMask2[4][2];
> +        __declspec(align(32)) VUINT32 L2[4][2];
> +        __declspec(align(32)) VUINT32 dBigThreshold[4][2];
> +        __declspec(align(32)) VUINT32 dC2[4][2];
> +        __declspec(align(32)) VUINT32 dC3[4][2];
> +        __declspec(align(32)) VUINT32 dC4[4][2];
> +        __declspec(align(32)) VUINT32 dC5[4][2];
> +        __declspec(align(32)) VUINT32 dHalf[4][2];
> +        __declspec(align(32)) VUINT32 dLargestFinite[4][2];
> +        __declspec(align(32)) VUINT32 dLittleThreshold[4][2];
> +        __declspec(align(32)) VUINT32 dSign[4][2];
> +        __declspec(align(32)) VUINT32 dThirtyOne[4][2];
> +        __declspec(align(32)) VUINT32 dTopMask12[4][2];
> +        __declspec(align(32)) VUINT32 dTopMask29[4][2];
> +        __declspec(align(32)) VUINT32 XScale[4][2];
> +} __svml_dasinh_data_internal;
> +#endif
> +__svml_dasinh_data_internal:
> +        /* Log_HA_table */
> +        .quad 0xc086232bdd7a8300, 0xbe1ce91eef3fb100
> +        .quad 0xc086232fdc7ad828, 0xbe1cefcffda73b6a
> +        .quad 0xc0862333d97d2ba0, 0xbe1cef406748f1ff
> +        .quad 0xc0862337d48378e0, 0xbe1cef2a9429925a
> +        .quad 0xc086233bcd8fb878, 0xbe1cf138d17ebecb
> +        .quad 0xc086233fc4a3e018, 0xbe1ceff2dbbbb29e
> +        .quad 0xc0862343b9c1e270, 0xbe1cf1a42aae437b
> +        .quad 0xc0862347acebaf68, 0xbe1cef3b152048af
> +        .quad 0xc086234b9e2333f0, 0xbe1cef20e127805e
> +        .quad 0xc086234f8d6a5a30, 0xbe1cf00ad6052cf4
> +        .quad 0xc08623537ac30980, 0xbe1cefc4642ee597
> +        .quad 0xc0862357662f2660, 0xbe1cf1f277d36e16
> +        .quad 0xc086235b4fb092a0, 0xbe1ceed009e8d8e6
> +        .quad 0xc086235f37492d28, 0xbe1cf1e4038cb362
> +        .quad 0xc08623631cfad250, 0xbe1cf0b0873b8557
> +        .quad 0xc086236700c75b98, 0xbe1cf15bb3227c0b
> +        .quad 0xc086236ae2b09fe0, 0xbe1cf151ef8ca9ed
> +        .quad 0xc086236ec2b87358, 0xbe1cefe1dc2cd2ed
> +        .quad 0xc0862372a0e0a780, 0xbe1cf0d1eec5454f
> +        .quad 0xc08623767d2b0b48, 0xbe1ceeefd570bbce
> +        .quad 0xc086237a57996af0, 0xbe1cee99ae91b3a7
> +        .quad 0xc086237e302d9028, 0xbe1cf0412830fbd1
> +        .quad 0xc086238206e94218, 0xbe1ceee898588610
> +        .quad 0xc0862385dbce4548, 0xbe1cee9a1fbcaaea
> +        .quad 0xc0862389aede5bc0, 0xbe1ceed8e7cc1ad6
> +        .quad 0xc086238d801b4500, 0xbe1cf10c8d059da6
> +        .quad 0xc08623914f86be18, 0xbe1ceee6c63a8165
> +        .quad 0xc08623951d228180, 0xbe1cf0c3592d2ff1
> +        .quad 0xc0862398e8f04758, 0xbe1cf0026cc4cb1b
> +        .quad 0xc086239cb2f1c538, 0xbe1cf15d48d8e670
> +        .quad 0xc08623a07b28ae60, 0xbe1cef359363787c
> +        .quad 0xc08623a44196b390, 0xbe1cefdf1ab2e82c
> +        .quad 0xc08623a8063d8338, 0xbe1cefe43c02aa84
> +        .quad 0xc08623abc91ec960, 0xbe1cf044f5ae35b7
> +        .quad 0xc08623af8a3c2fb8, 0xbe1cf0b0b4001e1b
> +        .quad 0xc08623b349975d98, 0xbe1cf1bae76dfbcf
> +        .quad 0xc08623b70731f810, 0xbe1cef0a72e13a62
> +        .quad 0xc08623bac30da1c8, 0xbe1cf184007d2b6b
> +        .quad 0xc08623be7d2bfb40, 0xbe1cf16f4b239e98
> +        .quad 0xc08623c2358ea2a0, 0xbe1cf0976acada87
> +        .quad 0xc08623c5ec3733d0, 0xbe1cf066318a16ff
> +        .quad 0xc08623c9a1274880, 0xbe1ceffaa7148798
> +        .quad 0xc08623cd54607820, 0xbe1cf23ab02e9b6e
> +        .quad 0xc08623d105e45800, 0xbe1cefdfef7d4fde
> +        .quad 0xc08623d4b5b47b20, 0xbe1cf17fece44f2b
> +        .quad 0xc08623d863d27270, 0xbe1cf18f907d0d7c
> +        .quad 0xc08623dc103fccb0, 0xbe1cee61fe072c98
> +        .quad 0xc08623dfbafe1668, 0xbe1cf022dd891e2f
> +        .quad 0xc08623e3640eda20, 0xbe1ceecc1daf4358
> +        .quad 0xc08623e70b73a028, 0xbe1cf0173c4fa380
> +        .quad 0xc08623eab12deec8, 0xbe1cf16a2150c2f4
> +        .quad 0xc08623ee553f4a30, 0xbe1cf1bf980b1f4b
> +        .quad 0xc08623f1f7a93480, 0xbe1cef8b731663c2
> +        .quad 0xc08623f5986d2dc0, 0xbe1cee9a664d7ef4
> +        .quad 0xc08623f9378cb3f0, 0xbe1cf1eda2af6400
> +        .quad 0xc08623fcd5094320, 0xbe1cf1923f9d68d7
> +        .quad 0xc086240070e45548, 0xbe1cf0747cd3e03a
> +        .quad 0xc08624040b1f6260, 0xbe1cf22ee855bd6d
> +        .quad 0xc0862407a3bbe078, 0xbe1cf0d57360c00b
> +        .quad 0xc086240b3abb4398, 0xbe1ceebc815cd575
> +        .quad 0xc086240ed01efdd0, 0xbe1cf03bfb970951
> +        .quad 0xc086241263e87f50, 0xbe1cf16e74768529
> +        .quad 0xc0862415f6193658, 0xbe1cefec64b8becb
> +        .quad 0xc086241986b28f30, 0xbe1cf0838d210baa
> +        .quad 0xc086241d15b5f448, 0xbe1cf0ea86e75b11
> +        .quad 0xc0862420a324ce28, 0xbe1cf1708d11d805
> +        .quad 0xc08624242f008380, 0xbe1ceea988c5a417
> +        .quad 0xc0862427b94a7910, 0xbe1cef166a7bbca5
> +        .quad 0xc086242b420411d0, 0xbe1cf0c9d9e86a38
> +        .quad 0xc086242ec92eaee8, 0xbe1cef0946455411
> +        .quad 0xc08624324ecbaf98, 0xbe1cefea60907739
> +        .quad 0xc0862435d2dc7160, 0xbe1cf1ed0934ce42
> +        .quad 0xc086243955624ff8, 0xbe1cf191ba746c7d
> +        .quad 0xc086243cd65ea548, 0xbe1ceeec78cf2a7e
> +        .quad 0xc086244055d2c968, 0xbe1cef345284c119
> +        .quad 0xc0862443d3c012b8, 0xbe1cf24f77355219
> +        .quad 0xc08624475027d5e8, 0xbe1cf05bf087e114
> +        .quad 0xc086244acb0b65d0, 0xbe1cef3504a32189
> +        .quad 0xc086244e446c1398, 0xbe1ceff54b2a406f
> +        .quad 0xc0862451bc4b2eb8, 0xbe1cf0757d54ed4f
> +        .quad 0xc086245532aa04f0, 0xbe1cf0c8099fdfd5
> +        .quad 0xc0862458a789e250, 0xbe1cf0b173796a31
> +        .quad 0xc086245c1aec1138, 0xbe1cf11d8734540d
> +        .quad 0xc086245f8cd1da60, 0xbe1cf1916a723ceb
> +        .quad 0xc0862462fd3c84d8, 0xbe1cf19a911e1da7
> +        .quad 0xc08624666c2d5608, 0xbe1cf23a9ef72e4f
> +        .quad 0xc0862469d9a591c0, 0xbe1cef503d947663
> +        .quad 0xc086246d45a67a18, 0xbe1cf0fceeb1a0b2
> +        .quad 0xc0862470b0314fa8, 0xbe1cf107e27e4fbc
> +        .quad 0xc086247419475160, 0xbe1cf03dd9922331
> +        .quad 0xc086247780e9bc98, 0xbe1cefce1a10e129
> +        .quad 0xc086247ae719cd18, 0xbe1ceea47f73c4f6
> +        .quad 0xc086247e4bd8bd10, 0xbe1ceec0ac56d100
> +        .quad 0xc0862481af27c528, 0xbe1cee8a6593278a
> +        .quad 0xc086248511081c70, 0xbe1cf2231dd9dec7
> +        .quad 0xc0862488717af888, 0xbe1cf0b4b8ed7da8
> +        .quad 0xc086248bd0818d68, 0xbe1cf1bd8d835002
> +        .quad 0xc086248f2e1d0d98, 0xbe1cf259acc107f4
> +        .quad 0xc08624928a4eaa20, 0xbe1cee897636b00c
> +        .quad 0xc0862495e5179270, 0xbe1cee757f20c326
> +        .quad 0xc08624993e78f490, 0xbe1cefafd3aa54a4
> +        .quad 0xc086249c9673fd10, 0xbe1cee7298d38b97
> +        .quad 0xc086249fed09d6f8, 0xbe1ceedc158d4ceb
> +        .quad 0xc08624a3423babe0, 0xbe1cf2282987cb2e
> +        .quad 0xc08624a6960aa400, 0xbe1cefe7381ecc4b
> +        .quad 0xc08624a9e877e600, 0xbe1cef328dbbce80
> +        .quad 0xc08624ad39849728, 0xbe1cefde45f3cc71
> +        .quad 0xc08624b08931db58, 0xbe1cefa8b89433b9
> +        .quad 0xc08624b3d780d500, 0xbe1cef6773c0b139
> +        .quad 0xc08624b72472a528, 0xbe1cf031c931c11f
> +        .quad 0xc08624ba70086b78, 0xbe1cf088f49275e7
> +        .quad 0xc08624bdba434630, 0xbe1cf17de0eaa86d
> +        .quad 0xc08624c103245238, 0xbe1cefd492f1ba75
> +        .quad 0xc08624c44aacab08, 0xbe1cf1253e154466
> +        .quad 0xc08624c790dd6ad0, 0xbe1cf0fb09ee6d55
> +        .quad 0xc08624cad5b7aa58, 0xbe1cf1f08dd048fe
> +        .quad 0xc08624ce193c8120, 0xbe1ceeca0809697f
> +        .quad 0xc08624d15b6d0538, 0xbe1cef8d5662d968
> +        .quad 0xc08624d49c4a4b78, 0xbe1cee97b556ed78
> +        .quad 0xc08624d7dbd56750, 0xbe1cf1b14b6acb75
> +        .quad 0xc08624db1a0f6b00, 0xbe1cef1e860623f2
> +        .quad 0xc08624de56f96758, 0xbe1ceeaf4d156f3d
> +        .quad 0xc08624e192946bf0, 0xbe1ceecc12b400ed
> +        .quad 0xc08624e4cce18710, 0xbe1cf180c40c794f
> +        .quad 0xc08624e805e1c5c8, 0xbe1cf185a08f7f65
> +        .quad 0xc08624eb3d9633d8, 0xbe1cef45fc924078
> +        .quad 0xc08624ee73ffdbb0, 0xbe1cf1e4f457f32a
> +        .quad 0xc08624f1a91fc6a0, 0xbe1cf040147b8a5a
> +        .quad 0xc08624f4dcf6fc98, 0xbe1cf1effca0dfb2
> +        .quad 0xc08624f80f868468, 0xbe1cf0470146e5bc
> +        .quad 0xc08624fb40cf6390, 0xbe1cef4dd186e501
> +        .quad 0xc08624fe70d29e60, 0xbe1ceebe257f66c7
> +        .quad 0xc08625019f9137f0, 0xbe1ceefb7a1c395c
> +        .quad 0xc0862504cd0c3220, 0xbe1cf209dedfed8c
> +        .quad 0xc0862507f9448db0, 0xbe1cf082da464994
> +        .quad 0xc086250b243b4a18, 0xbe1cee88694a73cf
> +        .quad 0xc086250e4df165a0, 0xbe1cf0b61e8f0531
> +        .quad 0xc08625117667dd78, 0xbe1cf1106599c962
> +        .quad 0xc08625149d9fad98, 0xbe1ceff1ee88af1f
> +        .quad 0xc0862517c399d0c8, 0xbe1cf0f746994ef6
> +        .quad 0xc086251ae85740b8, 0xbe1cefe8a1d077e4
> +        .quad 0xc086251e0bd8f5e0, 0xbe1cf1a1da036092
> +        .quad 0xc08625212e1fe7a8, 0xbe1cf0f8a7786fcd
> +        .quad 0xc08625244f2d0c48, 0xbe1cefa1174a07a7
> +        .quad 0xc08625276f0158d8, 0xbe1cef1043aa5b25
> +        .quad 0xc086252a8d9dc150, 0xbe1cf15d521c169d
> +        .quad 0xc086252dab033898, 0xbe1cf220bba8861f
> +        .quad 0xc0862530c732b078, 0xbe1cef51e310eae2
> +        .quad 0xc0862533e22d1988, 0xbe1cf222fcedd8ae
> +        .quad 0xc0862536fbf36370, 0xbe1cefdb4da4bda8
> +        .quad 0xc086253a14867ca0, 0xbe1ceeafc1112171
> +        .quad 0xc086253d2be75280, 0xbe1cee99dfb4b408
> +        .quad 0xc08625404216d160, 0xbe1cf22d2536f06b
> +        .quad 0xc08625435715e498, 0xbe1cef6abbf2e268
> +        .quad 0xc08625466ae57648, 0xbe1cf093a14789f5
> +        .quad 0xc08625497d866fa0, 0xbe1cf0f93655603c
> +        .quad 0xc086254c8ef9b8b8, 0xbe1cf1cc40c9aafc
> +        .quad 0xc086254f9f4038a8, 0xbe1ceeea5f4e9157
> +        .quad 0xc0862552ae5ad568, 0xbe1cefa9f52d4997
> +        .quad 0xc0862555bc4a7400, 0xbe1cefa490a638ff
> +        .quad 0xc0862558c90ff868, 0xbe1cef7fcf797d6f
> +        .quad 0xc086255bd4ac4590, 0xbe1cf1b4c51113c9
> +        .quad 0xc086255edf203d78, 0xbe1cef55e5b4a55d
> +        .quad 0xc0862561e86cc100, 0xbe1cf0d37a25f9dc
> +        .quad 0xc0862564f092b028, 0xbe1ceebe9efc19d9
> +        .quad 0xc0862567f792e9d8, 0xbe1cee8ad30a57b5
> +        .quad 0xc086256afd6e4c08, 0xbe1cef4e1817b90b
> +        .quad 0xc086256e0225b3b8, 0xbe1cee7fa9229996
> +        .quad 0xc086257105b9fce0, 0xbe1cf0b54963d945
> +        .quad 0xc0862574082c0298, 0xbe1cee5f2f3c7995
> +        .quad 0xc0862577097c9ee0, 0xbe1cf0828e303a2c
> +        .quad 0xc086257a09acaae0, 0xbe1cf172c3078947
> +        .quad 0xc086257d08bcfec0, 0xbe1cf189252afa22
> +        .quad 0xc086258006ae71b8, 0xbe1cefdb80426923
> +        .quad 0xc08625830381da08, 0xbe1ceef1391a0372
> +        .quad 0xc0862585ff380d00, 0xbe1cf17720c78d13
> +        .quad 0xc0862588f9d1df18, 0xbe1ceef1f9027d83
> +        .quad 0xc086258bf35023b8, 0xbe1cf06fac99dec9
> +        .quad 0xc086258eebb3ad78, 0xbe1cf1373eeb45c0
> +        .quad 0xc0862591e2fd4e00, 0xbe1cef777536bb81
> +        .quad 0xc0862594d92dd600, 0xbe1cf0f43ca40766
> +        .quad 0xc0862597ce461558, 0xbe1cefb2cfc6766b
> +        .quad 0xc086259ac246daf0, 0xbe1ceea49e64ffa2
> +        .quad 0xc086259db530f4c8, 0xbe1cf250fa457dec
> +        .quad 0xc08625a0a7053018, 0xbe1cf17d8bb2a44e
> +        .quad 0xc08625a397c45918, 0xbe1cf1d5906d54b7
> +        .quad 0xc08625a6876f3b30, 0xbe1cf08fe7b31780
> +        .quad 0xc08625a97606a0e0, 0xbe1cef13edfc9d11
> +        .quad 0xc08625ac638b53c8, 0xbe1cef9d2b107219
> +        .quad 0xc08625af4ffe1cb0, 0xbe1cf1ddd4ff6160
> +        .quad 0xc08625b23b5fc390, 0xbe1cefa02a996495
> +        .quad 0xc08625b525b10f68, 0xbe1cf166a7e37ee5
> +        .quad 0xc08625b80ef2c680, 0xbe1cef0b171068a5
> +        .quad 0xc08625baf725ae28, 0xbe1cf05c80779283
> +        .quad 0xc08625bdde4a8af0, 0xbe1cf1bbfbffb889
> +        .quad 0xc08625c0c4622090, 0xbe1cf0b8666c0124
> +        .quad 0xc08625c3a96d31e0, 0xbe1cf0a8fcf47a86
> +        .quad 0xc08625c68d6c80f0, 0xbe1cef46e18cb092
> +        .quad 0xc08625c97060cef0, 0xbe1cf1458a350efb
> +        .quad 0xc08625cc524adc58, 0xbe1ceeea1dadce12
> +        .quad 0xc08625cf332b68b0, 0xbe1cf0a1bfdc44c7
> +        .quad 0xc08625d2130332d0, 0xbe1cef96d02da73e
> +        .quad 0xc08625d4f1d2f8a8, 0xbe1cf2451c3c7701
> +        .quad 0xc08625d7cf9b7778, 0xbe1cf10d08f83812
> +        .quad 0xc08625daac5d6ba0, 0xbe1ceec5b4895c5e
> +        .quad 0xc08625dd881990b0, 0xbe1cf14e1325c5e4
> +        .quad 0xc08625e062d0a188, 0xbe1cf21d0904be12
> +        .quad 0xc08625e33c835838, 0xbe1ceed0839bcf21
> +        .quad 0xc08625e615326df0, 0xbe1cf1bb944889d2
> +        .quad 0xc08625e8ecde9b48, 0xbe1cee738e85eece
> +        .quad 0xc08625ebc38897e0, 0xbe1cf25c2bc6ef12
> +        .quad 0xc08625ee99311ac8, 0xbe1cf132b70a41ad
> +        .quad 0xc08625f16dd8da28, 0xbe1cf1984236a6e3
> +        .quad 0xc08625f441808b78, 0xbe1cf19ae74998f9
> +        .quad 0xc08625f71428e370, 0xbe1cef3e175d61a1
> +        .quad 0xc08625f9e5d295f8, 0xbe1cf101f9868fd9
> +        .quad 0xc08625fcb67e5658, 0xbe1cee69db83dcd2
> +        .quad 0xc08625ff862cd6f8, 0xbe1cf081b636af51
> +        .quad 0xc086260254dec9a8, 0xbe1cee62c7d59b3e
> +        .quad 0xc08626052294df58, 0xbe1cf1b745c57716
> +        .quad 0xc0862607ef4fc868, 0xbe1cef3d2800ea23
> +        .quad 0xc086260abb103458, 0xbe1cef480ff1acd2
> +        .quad 0xc086260d85d6d200, 0xbe1cf2424c9a17ef
> +        .quad 0xc08626104fa44f90, 0xbe1cf12cfde90fd5
> +        .quad 0xc086261318795a68, 0xbe1cf21f590dd5b6
> +        .quad 0xc0862615e0569f48, 0xbe1cf0c50f9cd28a
> +        .quad 0xc0862618a73cca30, 0xbe1ceedbdb520545
> +        .quad 0xc086261b6d2c8668, 0xbe1cf0b030396011
> +        .quad 0xc086261e32267e98, 0xbe1cf19917010e96
> +        .quad 0xc0862620f62b5cb0, 0xbe1cf07331355985
> +        .quad 0xc0862623b93bc9e8, 0xbe1cf01ae921a1c3
> +        .quad 0xc08626267b586ed0, 0xbe1cefe5cf0dbf0c
> +        .quad 0xc08626293c81f348, 0xbe1cf01b258aeb50
> +        .quad 0xc086262bfcb8fe88, 0xbe1cee6b9e7f4c68
> +        .quad 0xc086262ebbfe3710, 0xbe1cee684a9b21c9
> +        .quad 0xc08626317a5242b8, 0xbe1cf1f8bcde9a8b
> +        .quad 0xc086263437b5c6c0, 0xbe1cf1d063d36238
> +        .quad 0xc0862636f42967a8, 0xbe1cf1e31a19075e
> +        .quad 0xc0862639afadc950, 0xbe1cf1d8efdf7e7d
> +        .quad 0xc086263c6a438ef0, 0xbe1cf1812ee72dba
> +        .quad 0xc086263f23eb5b18, 0xbe1cf1449a9a2279
> +        .quad 0xc0862641dca5cfb8, 0xbe1cee96edce5085
> +        .quad 0xc086264494738e08, 0xbe1cf06797bd03b2
> +        .quad 0xc08626474b5536b8, 0xbe1cef91b9b7ffc1
> +        .quad 0xc086264a014b69c0, 0xbe1cef4b6721278f
> +        .quad 0xc086264cb656c678, 0xbe1cf1942925eb4a
> +        .quad 0xc086264f6a77eba8, 0xbe1cefa2c7bc2e39
> +        .quad 0xc08626521daf7758, 0xbe1cf252595aceb3
> +        .quad 0xc0862654cffe0718, 0xbe1cee8e9ae47ec2
> +        .quad 0xc0862657816437a8, 0xbe1cf1bf913828fa
> +        .quad 0xc086265a31e2a558, 0xbe1cf23475d6b366
> +        .quad 0xc086265ce179ebc8, 0xbe1cef8df00a922b
> +        .quad 0xc086265f902aa5f0, 0xbe1cef279bfa43e0
> +        .quad 0xc08626623df56e38, 0xbe1cf080e10b8365
> +        .quad 0xc0862664eadade70, 0xbe1cf1a518f9b544
> +        .quad 0xc086266796db8fd0, 0xbe1cef9308fed9e9
> +        .quad 0xc086266a41f81ae8, 0xbe1ceea3ae6b19c9
> +        .quad 0xc086266cec3117b8, 0xbe1ceef06003d4c2
> +        .quad 0xc086266f95871da8, 0xbe1cf0b8457ffb0c
> +        .quad 0xc08626723dfac390, 0xbe1cf0c526745ad6
> +        .quad 0xc0862674e58c9fa8, 0xbe1cf0cf91ff7b5d
> +        .quad 0xc08626778c3d4798, 0xbe1cefe260819380
> +        .quad 0xc086267a320d5070, 0xbe1ceebd90aa27a3
> +        .quad 0xc086267cd6fd4ea8, 0xbe1cf0388121dffa
> +        .quad 0xc086267f7b0dd630, 0xbe1cf1a3881435f1
> +        .quad 0xc08626821e3f7a68, 0xbe1cef28e9d9ac52
> +        .quad 0xc0862684c092ce08, 0xbe1cf02d300062dd
> +        .quad 0xc086268762086350, 0xbe1cefaee1edfa35
> +        .quad 0xc086268a02a0cbe0, 0xbe1cf0a5a052e936
> +        .quad 0xc086268ca25c98d8, 0xbe1cee60a4a497ed
> +        .quad 0xc086268f413c5ab0, 0xbe1cf0e4a5d0cf49
> +        .quad 0xc0862691df40a170, 0xbe1cf149235a4e6e
> +        .quad 0xc08626947c69fc80, 0xbe1cf215180b9fcc
> +        .quad 0xc086269718b8fac8, 0xbe1cef9b156a9840
> +        .quad 0xc0862699b42e2a90, 0xbe1cf054c91441be
> +        .quad 0xc086269c4eca19a8, 0xbe1cf13ded26512c
> +        .quad 0xc086269ee88d5550, 0xbe1cf22ea4d8ac06
> +        .quad 0xc08626a181786a40, 0xbe1cf2354666ee2e
> +        .quad 0xc08626a4198be4a8, 0xbe1cefef936752b3
> +        .quad 0xc08626a6b0c85020, 0xbe1cf1e360a9db68
> +        .quad 0xc08626a9472e37d8, 0xbe1ceed6aeb812c5
> +        .quad 0xc08626abdcbe2650, 0xbe1cf227340b4986
> +        .quad 0xc08626ae7178a5b0, 0xbe1cf0215a0cbe0d
> +        .quad 0xc08626b1055e3f70, 0xbe1cf256adf0ae26
> +        .quad 0xc08626b3986f7ca8, 0xbe1ceff3c67aed06
> +        .quad 0xc08626b62aace5c8, 0xbe1cf2159fb93652
> +        .quad 0xc08626b8bc1702e0, 0xbe1cf01e6dbd1c7f
> +        .quad 0xc08626bb4cae5b60, 0xbe1cf009e75d1c0c
> +        .quad 0xc08626bddc737648, 0xbe1ceec10a020e73
> +        .quad 0xc08626c06b66da08, 0xbe1cf06d5783eee7
> +        .quad 0xc08626c2f9890ca0, 0xbe1cf0cb8f169ffe
> +        .quad 0xc08626c586da9388, 0xbe1cef7de2452430
> +        .quad 0xc08626c8135bf3b0, 0xbe1cf05da6f783ae
> +        .quad 0xc08626ca9f0db198, 0xbe1cefcc877d681d
> +        .quad 0xc08626cd29f05138, 0xbe1cef0531954ab3
> +        .quad 0xc08626cfb4045608, 0xbe1cf06b8565ea3d
> +        .quad 0xc08626d23d4a4310, 0xbe1cefdc455d9d7e
> +        .quad 0xc08626d4c5c29ad0, 0xbe1ceefc47e8fa64
> +        .quad 0xc08626d74d6ddf48, 0xbe1cf1872bf033f2
> +        .quad 0xc08626d9d44c9210, 0xbe1cf19d91087f9d
> +        .quad 0xc08626dc5a5f3438, 0xbe1cf012d444c6ab
> +        .quad 0xc08626dedfa64650, 0xbe1cf0ba528ee153
> +        .quad 0xc08626e164224880, 0xbe1ceeb431709788
> +        .quad 0xc08626e3e7d3ba60, 0xbe1cf0b9af31a6a5
> +        .quad 0xc08626e66abb1b28, 0xbe1cf168fb2e135b
> +        .quad 0xc08626e8ecd8e990, 0xbe1cef9097461c93
> +        .quad 0xc08626eb6e2da3d0, 0xbe1cee7a434735d8
> +        .quad 0xc08626edeeb9c7a8, 0xbe1cf235732b86f2
> +        .quad 0xc08626f06e7dd280, 0xbe1cefe1510b89e6
> +        .quad 0xc08626f2ed7a4120, 0xbe1cf1f64b9b80ef
> +        .quad 0xc08626f56baf9000, 0xbe1cf08f320ca339
> +        .quad 0xc08626f7e91e3b08, 0xbe1cf1b1de2808a1
> +        .quad 0xc08626fa65c6bdc0, 0xbe1cf1976d778b28
> +        .quad 0xc08626fce1a99338, 0xbe1ceef40a4f076f
> +        .quad 0xc08626ff5cc73600, 0xbe1cef3e45869ce3
> +        .quad 0xc0862701d7202048, 0xbe1ceef601b4c9d6
> +        .quad 0xc086270450b4cbc0, 0xbe1cf1eaf0b57fd6
> +        .quad 0xc0862706c985b1c0, 0xbe1cef82a44990f3
> +        .quad 0xc086270941934b10, 0xbe1ceefe32981f2c
> +        .quad 0xc086270bb8de1018, 0xbe1cefbf6f5a0445
> +        .quad 0xc086270e2f6678d0, 0xbe1cf18dba75792c
> +        .quad 0xc0862710a52cfcc8, 0xbe1cf0da64ce995f
> +        .quad 0xc08627131a321318, 0xbe1cef04ac0fb802
> +        .quad 0xc08627158e763268, 0xbe1cee9d4e2ad9bd
> +        .quad 0xc086271801f9d0f8, 0xbe1cefa9b55407b5
> +        .quad 0xc086271a74bd64a0, 0xbe1cefe6bd329570
> +        .quad 0xc086271ce6c162c8, 0xbe1cef0b1205dc85
> +        .quad 0xc086271f58064068, 0xbe1cef092a785e3f
> +        .quad 0xc0862721c88c7210, 0xbe1cf050dcdaac30
> +        .quad 0xc086272438546be8, 0xbe1cf210907ded8b
> +        .quad 0xc0862726a75ea1b8, 0xbe1cee760be44f99
> +        .quad 0xc086272915ab86c0, 0xbe1ceeeee07c2bcc
> +        .quad 0xc086272b833b8df0, 0xbe1cf06874992df5
> +        .quad 0xc086272df00f29d0, 0xbe1cef8fac5d4899
> +        .quad 0xc08627305c26cc70, 0xbe1cf1103241cc99
> +        .quad 0xc0862732c782e788, 0xbe1cf1d35fef83fe
> +        .quad 0xc08627353223ec68, 0xbe1cef3ec8133e1d
> +        .quad 0xc08627379c0a4be8, 0xbe1cef7261daccd8
> +        .quad 0xc086273a05367688, 0xbe1cf18656c50806
> +        .quad 0xc086273c6da8dc68, 0xbe1cf1c8736e049a
> +        .quad 0xc086273ed561ed38, 0xbe1cf1f93bff4911
> +        .quad 0xc08627413c621848, 0xbe1cf188a4ea680c
> +        .quad 0xc0862743a2a9cc80, 0xbe1cf1d270930c80
> +        .quad 0xc086274608397868, 0xbe1cf25a328c28e2
> +        .quad 0xc08627486d118a28, 0xbe1cf106f90aa3b8
> +        .quad 0xc086274ad1326f80, 0xbe1cee5e9d2e885a
> +        .quad 0xc086274d349c95c0, 0xbe1cf1c0bac27228
> +        .quad 0xc086274f975069f8, 0xbe1cf1a1500f9b1c
> +        .quad 0xc0862751f94e58c0, 0xbe1cefc30663ac44
> +        .quad 0xc08627545a96ce48, 0xbe1cf17123e427a2
> +        .quad 0xc0862756bb2a3678, 0xbe1cefb92749fea4
> +        .quad 0xc08627591b08fcc0, 0xbe1cefa40e1ea74a
> +        .quad 0xc086275b7a338c40, 0xbe1cee6f4612c3e9
> +        .quad 0xc086275dd8aa4fa8, 0xbe1cf1c54a053627
> +        .quad 0xc0862760366db168, 0xbe1ceff5eb503d9e
> +        .quad 0xc0862762937e1b70, 0xbe1cf02e47f10cee
> +        .quad 0xc0862764efdbf768, 0xbe1ceeb06e1d0dad
> +        .quad 0xc08627674b87ae88, 0xbe1cf10aadd6dba5
> +        .quad 0xc0862769a681a9c0, 0xbe1cf24e9913d30f
> +        .quad 0xc086276c00ca51a0, 0xbe1cef47b301e312
> +        .quad 0xc086276e5a620e48, 0xbe1ceeb1cefc2e85
> +        .quad 0xc0862770b3494788, 0xbe1cf16f1fbbe011
> +        .quad 0xc08627730b8064e8, 0xbe1ceebdf75174c7
> +        .quad 0xc08627756307cd70, 0xbe1cf06e3871a0da
> +        .quad 0xc0862777b9dfe7f0, 0xbe1cef16799fd554
> +        .quad 0xc086277a10091ac0, 0xbe1cf248dabf5377
> +        .quad 0xc086277c6583cc00, 0xbe1cf0c78d92a2cd
> +        .quad 0xc086277eba506158, 0xbe1cf0b911b029f0
> +        .quad 0xc08627810e6f4028, 0xbe1cefdc24719766
> +        .quad 0xc086278361e0cd70, 0xbe1cefbb6562b7e7
> +        .quad 0xc0862785b4a56dd8, 0xbe1cf1e0afb349ec
> +        .quad 0xc086278806bd85c0, 0xbe1cf008292e52fc
> +        .quad 0xc086278a58297918, 0xbe1cf053073872bf
> +        .quad 0xc086278ca8e9ab88, 0xbe1cf17a0a55a947
> +        .quad 0xc086278ef8fe8068, 0xbe1ceeffb0b60234
> +        .quad 0xc086279148685aa0, 0xbe1cf162204794a8
> +        .quad 0xc086279397279ce0, 0xbe1cf24cc8cb48ac
> +        .quad 0xc0862795e53ca978, 0xbe1cf0c9be68d5c3
> +        .quad 0xc086279832a7e258, 0xbe1cf172cd3d7388
> +        .quad 0xc086279a7f69a930, 0xbe1ceea2465fbce5
> +        .quad 0xc086279ccb825f40, 0xbe1cf0a386d2500f
> +        .quad 0xc086279f16f26590, 0xbe1cf1e338ddc18a
> +        .quad 0xc08627a161ba1cd0, 0xbe1cef1f5049867f
> +        .quad 0xc08627a3abd9e548, 0xbe1cef96c1ea8b1f
> +        .quad 0xc08627a5f5521f00, 0xbe1cf138f6fd3c26
> +        .quad 0xc08627a83e2329b0, 0xbe1cf0d4fcbfdf3a
> +        .quad 0xc08627aa864d64b0, 0xbe1cf24870c12c81
> +        .quad 0xc08627accdd12f18, 0xbe1cf0ae2a56348d
> +        .quad 0xc08627af14aee7a0, 0xbe1cee8ca1a9b893
> +        .quad 0xc08627b15ae6eca8, 0xbe1cf20414d637b0
> +        .quad 0xc08627b3a0799c60, 0xbe1cf0fc6b7b12d8
> +        .quad 0xc08627b5e5675488, 0xbe1cf152d93c4a00
> +        .quad 0xc08627b829b072a0, 0xbe1cf1073f9b77c2
> +        .quad 0xc08627ba6d5553d8, 0xbe1cee694f97d5a4
> +        .quad 0xc08627bcb0565500, 0xbe1cf0456b8239d7
> +        .quad 0xc08627bef2b3d2b0, 0xbe1cf211497127e3
> +        .quad 0xc08627c1346e2930, 0xbe1cf01856c0384d
> +        .quad 0xc08627c37585b468, 0xbe1cefa7dd05479e
> +        .quad 0xc08627c5b5fad000, 0xbe1cef3ae8e50b93
> +        .quad 0xc08627c7f5cdd750, 0xbe1ceea5f32fdd3a
> +        .quad 0xc08627ca34ff2560, 0xbe1cef424caeb8d9
> +        .quad 0xc08627cc738f14f0, 0xbe1cf0194d07a81f
> +        .quad 0xc08627ceb17e0070, 0xbe1cf20f452000c1
> +        .quad 0xc08627d0eecc4210, 0xbe1cf00e356218e4
> +        .quad 0xc08627d32b7a33a0, 0xbe1cef30484b4bcb
> +        .quad 0xc08627d567882eb0, 0xbe1ceeea11a6641b
> +        .quad 0xc08627d7a2f68c80, 0xbe1cf13492d5bd7b
> +        .quad 0xc08627d9ddc5a618, 0xbe1ceeb7048fad96
> +        .quad 0xc08627dc17f5d418, 0xbe1ceef0666f0477
> +        .quad 0xc08627de51876ee8, 0xbe1cf060d4b8b5c2
> +        .quad 0xc08627e08a7acea8, 0xbe1cf0b2a4b6ff8c
> +        .quad 0xc08627e2c2d04b28, 0xbe1cf0e34809a875
> +        .quad 0xc08627e4fa883bf0, 0xbe1cf16bf74a3522
> +        .quad 0xc08627e731a2f848, 0xbe1cee6a24623d57
> +        .quad 0xc08627e96820d718, 0xbe1cefc7b4f1528e
> +        .quad 0xc08627eb9e022f18, 0xbe1cf163051f3548
> +        .quad 0xc08627edd34756b8, 0xbe1cef36b3366305
> +        .quad 0xc08627f007f0a408, 0xbe1cf18134625550
> +        .quad 0xc08627f23bfe6cf0, 0xbe1cf0ec32ec1a11
> +        .quad 0xc08627f46f710700, 0xbe1ceeb3b64f3edc
> +        .quad 0xc08627f6a248c778, 0xbe1cf0cd15805bc8
> +        .quad 0xc08627f8d4860368, 0xbe1cf20db3bddebe
> +        .quad 0xc08627fb06290f90, 0xbe1cf25188430e25
> +        .quad 0xc08627fd37324070, 0xbe1ceea1713490f9
> +        .quad 0xc08627ff67a1ea28, 0xbe1cf159521d234c
> +        .quad 0xc0862801977860b8, 0xbe1cf24dfe50783b
> +        .quad 0xc0862803c6b5f7d0, 0xbe1ceef2ef89a60b
> +        .quad 0xc0862805f55b02c8, 0xbe1cee7fc919d62c
> +        .quad 0xc08628082367d4c0, 0xbe1cf215a7fb513a
> +        .quad 0xc086280a50dcc0a8, 0xbe1cf0e4401c5ed4
> +        .quad 0xc086280c7dba1910, 0xbe1cf04ec734d256
> +        .quad 0xc086280eaa003050, 0xbe1cf010ad787fea
> +        .quad 0xc0862810d5af5880, 0xbe1cee622478393d
> +        .quad 0xc086281300c7e368, 0xbe1cf01c7482564f
> +        .quad 0xc08628152b4a22a0, 0xbe1cf0de20d33536
> +        .quad 0xc086281755366778, 0xbe1cef2edae5837d
> +        .quad 0xc08628197e8d02f0, 0xbe1cf0a345318cc9
> +        .quad 0xc086281ba74e45d8, 0xbe1cf20085aa34b8
> +        .quad 0xc086281dcf7a80c0, 0xbe1cef5fa845ad83
> +        .quad 0xc086281ff71203e0, 0xbe1cf050d1df69c4
> +        .quad 0xc08628221e151f48, 0xbe1ceffe43c035b9
> +        .quad 0xc0862824448422b8, 0xbe1cf14f3018d3c2
> +        .quad 0xc08628266a5f5dc0, 0xbe1cef0a5fbae83d
> +        .quad 0xc08628288fa71f98, 0xbe1ceff8a95b72a1
> +        .quad 0xc086282ab45bb750, 0xbe1cef073aa9849b
> +        .quad 0xc086282cd87d73a8, 0xbe1cef69b3835c02
> +        .quad 0xc086282efc0ca328, 0xbe1cf0bc139379a9
> +        .quad 0xc08628311f099420, 0xbe1cef247a9ec596
> +        .quad 0xc086283341749490, 0xbe1cef74bbcc488a
> +        .quad 0xc0862835634df248, 0xbe1cef4bc42e7b8e
> +        .quad 0xc08628378495fad0, 0xbe1cf136d4d5a810
> +        .quad 0xc0862839a54cfb80, 0xbe1cf0d290b24dd8
> +        .quad 0xc086283bc5734168, 0xbe1ceeebde8e0065
> +        .quad 0xc086283de5091950, 0xbe1cf1a09f60aa1e
> +        .quad 0xc0862840040ecfe0, 0xbe1cf0803947a234
> +        .quad 0xc08628422284b168, 0xbe1cf0abf7638127
> +        .quad 0xc0862844406b0a08, 0xbe1cf0f73ee12058
> +        .quad 0xc08628465dc225a0, 0xbe1cf2079971b26c
> +        .quad 0xc08628487a8a4fe0, 0xbe1cee74957564b1
> +        .quad 0xc086284a96c3d420, 0xbe1ceee77c1b7d43
> +        .quad 0xc086284cb26efd90, 0xbe1cf23addba6e09
> +        .quad 0xc086284ecd8c1730, 0xbe1cf199f4a1da60
> +        .quad 0xc0862850e81b6bb0, 0xbe1cf09fdea81393
> +        .quad 0xc0862853021d4588, 0xbe1cf176adb417f7
> +        .quad 0xc08628551b91ef00, 0xbe1cf0f64f84a8da
> +        .quad 0xc08628573479b220, 0xbe1ceec34cf49523
> +        .quad 0xc08628594cd4d8a8, 0xbe1cf16d60fbe0bb
> +        .quad 0xc086285b64a3ac40, 0xbe1cee8de7acfc7b
> +        .quad 0xc086285d7be67630, 0xbe1ceee6256cce8d
> +        .quad 0xc086285f929d7fa0, 0xbe1cee7d66a3d8a5
> +        .quad 0xc0862861a8c91170, 0xbe1cf0bef8265792
> +        .quad 0xc0862863be697458, 0xbe1cf097f890c6f8
> +        .quad 0xc0862865d37ef0c8, 0xbe1cf09502d5c3fc
> +        .quad 0xc0862867e809cf00, 0xbe1ceeffb239dac7
> +        .quad 0xc0862869fc0a56f8, 0xbe1cf1fbfff95c98
> +        .quad 0xc086286c0f80d090, 0xbe1cefa57ad3eef7
> +        .quad 0xc086286e226d8348, 0xbe1cf22c58b9183d
> +        .quad 0xc086287034d0b690, 0xbe1ceff262d0a248
> +        .quad 0xc086287246aab180, 0xbe1cefa7bc194186
> +        .quad 0xc086287457fbbb08, 0xbe1cf06782d784d9
> +        .quad 0xc086287668c419e0, 0xbe1cf1d44d0eaa07
> +        .quad 0xc086287879041490, 0xbe1cf034803c8a48
> +        .quad 0xc086287a88bbf158, 0xbe1cf08e84916b6f
> +        .quad 0xc086287c97ebf650, 0xbe1cf0c4d3dc1bc7
> +        .quad 0xc086287ea6946958, 0xbe1cefb1e4625943
> +        .quad 0xc0862880b4b59010, 0xbe1cf143efdd1fd0
> +        .quad 0xc0862882c24faff8, 0xbe1cee9896d016da
> +        .quad 0xc0862884cf630e38, 0xbe1cf2186072f2cc
> +        .quad 0xc0862886dbefeff0, 0xbe1cef9217633d34
> +        .quad 0xc0862888e7f699e0, 0xbe1cf05603549486
> +        .quad 0xc086288af37750b0, 0xbe1cef50fff513d3
> +        .quad 0xc086288cfe7258c0, 0xbe1cf127713b32d0
> +        .quad 0xc086288f08e7f650, 0xbe1cf05015520f3d
> +        .quad 0xc086289112d86d58, 0xbe1cf12eb458b26f
> +        .quad 0xc08628931c4401a8, 0xbe1cf22eae2887ed
> +        .quad 0xc0862895252af6e0, 0xbe1cefdd6656dd2d
> +        .quad 0xc08628972d8d9058, 0xbe1cf1048ea4e646
> +        .quad 0xc0862899356c1150, 0xbe1ceec4501167e9
> +        .quad 0xc086289b3cc6bcb8, 0xbe1cf0ad52becc3f
> +        .quad 0xc086289d439dd568, 0xbe1cf0daa4e00e35
> +        .quad 0xc086289f49f19df8, 0xbe1cf00b80de8d6a
> +        .quad 0xc08628a14fc258c8, 0xbe1cf1bcf2ea8464
> +        .quad 0xc08628a355104818, 0xbe1cf0435e2782b0
> +        .quad 0xc08628a559dbade0, 0xbe1cf0e3e1a5f56c
> +        .quad 0xc08628a75e24cbf8, 0xbe1cefed9d5a721d
> +        .quad 0xc08628a961ebe3f8, 0xbe1cf0d2d74321e2
> +        .quad 0xc08628ab65313750, 0xbe1cf24200eb55e9
> +        .quad 0xc08628ad67f50740, 0xbe1cf23e9d7cf979
> +        .quad 0xc08628af6a3794d0, 0xbe1cf23a088f421c
> +        .quad 0xc08628b16bf920e0, 0xbe1cef2c1de1ab32
> +        .quad 0xc08628b36d39ec08, 0xbe1cf1abc231f7b2
> +        .quad 0xc08628b56dfa36d0, 0xbe1cf2074d5ba303
> +        .quad 0xc08628b76e3a4180, 0xbe1cf05cd5eed880
> +        /*== Log_LA_table ==*/
> +        .align 32
> +        .quad 0x8000000000000000
> +        .quad 0xbf5ff802a9ab10e6
> +        .quad 0xbf6ff00aa2b10bc0
> +        .quad 0xbf77ee11ebd82e94
> +        .quad 0xbf7fe02a6b106789
> +        .quad 0xbf83e7295d25a7d9
> +        .quad 0xbf87dc475f810a77
> +        .quad 0xbf8bcf712c74384c
> +        .quad 0xbf8fc0a8b0fc03e4
> +        .quad 0xbf91d7f7eb9eebe7
> +        .quad 0xbf93cea44346a575
> +        .quad 0xbf95c45a51b8d389
> +        .quad 0xbf97b91b07d5b11b
> +        .quad 0xbf99ace7551cc514
> +        .quad 0xbf9b9fc027af9198
> +        .quad 0xbf9d91a66c543cc4
> +        .quad 0xbf9f829b0e783300
> +        .quad 0xbfa0b94f7c196176
> +        .quad 0xbfa1b0d98923d980
> +        .quad 0xbfa2a7ec2214e873
> +        .quad 0xbfa39e87b9febd60
> +        .quad 0xbfa494acc34d911c
> +        .quad 0xbfa58a5bafc8e4d5
> +        .quad 0xbfa67f94f094bd98
> +        .quad 0xbfa77458f632dcfc
> +        .quad 0xbfa868a83083f6cf
> +        .quad 0xbfa95c830ec8e3eb
> +        .quad 0xbfaa4fe9ffa3d235
> +        .quad 0xbfab42dd711971bf
> +        .quad 0xbfac355dd0921f2d
> +        .quad 0xbfad276b8adb0b52
> +        .quad 0xbfae19070c276016
> +        .quad 0xbfaf0a30c01162a6
> +        .quad 0xbfaffae9119b9303
> +        .quad 0xbfb075983598e471
> +        .quad 0xbfb0ed839b5526fe
> +        .quad 0xbfb16536eea37ae1
> +        .quad 0xbfb1dcb263db1944
> +        .quad 0xbfb253f62f0a1417
> +        .quad 0xbfb2cb0283f5de1f
> +        .quad 0xbfb341d7961bd1d1
> +        .quad 0xbfb3b87598b1b6ee
> +        .quad 0xbfb42edcbea646f0
> +        .quad 0xbfb4a50d3aa1b040
> +        .quad 0xbfb51b073f06183f
> +        .quad 0xbfb590cafdf01c28
> +        .quad 0xbfb60658a93750c4
> +        .quad 0xbfb67bb0726ec0fc
> +        .quad 0xbfb6f0d28ae56b4c
> +        .quad 0xbfb765bf23a6be13
> +        .quad 0xbfb7da766d7b12cd
> +        .quad 0xbfb84ef898e8282a
> +        .quad 0xbfb8c345d6319b21
> +        .quad 0xbfb9375e55595ede
> +        .quad 0xbfb9ab42462033ad
> +        .quad 0xbfba1ef1d8061cd4
> +        .quad 0xbfba926d3a4ad563
> +        .quad 0xbfbb05b49bee43fe
> +        .quad 0xbfbb78c82bb0eda1
> +        .quad 0xbfbbeba818146765
> +        .quad 0xbfbc5e548f5bc743
> +        .quad 0xbfbcd0cdbf8c13e1
> +        .quad 0xbfbd4313d66cb35d
> +        .quad 0xbfbdb5270187d927
> +        .quad 0xbfbe27076e2af2e6
> +        .quad 0xbfbe98b549671467
> +        .quad 0xbfbf0a30c01162a6
> +        .quad 0xbfbf7b79fec37ddf
> +        .quad 0xbfbfec9131dbeabb
> +        .quad 0xbfc02ebb42bf3d4b
> +        .quad 0xbfc0671512ca596e
> +        .quad 0xbfc09f561ee719c3
> +        .quad 0xbfc0d77e7cd08e59
> +        .quad 0xbfc10f8e422539b1
> +        .quad 0xbfc14785846742ac
> +        .quad 0xbfc17f6458fca611
> +        .quad 0xbfc1b72ad52f67a0
> +        .quad 0xbfc1eed90e2dc2c3
> +        .quad 0xbfc2266f190a5acb
> +        .quad 0xbfc25ded0abc6ad2
> +        .quad 0xbfc29552f81ff523
> +        .quad 0xbfc2cca0f5f5f251
> +        .quad 0xbfc303d718e47fd3
> +        .quad 0xbfc33af575770e4f
> +        .quad 0xbfc371fc201e8f74
> +        .quad 0xbfc3a8eb2d31a376
> +        .quad 0xbfc3dfc2b0ecc62a
> +        .quad 0xbfc41682bf727bc0
> +        .quad 0xbfc44d2b6ccb7d1e
> +        .quad 0xbfc483bccce6e3dd
> +        .quad 0xbfc4ba36f39a55e5
> +        .quad 0xbfc4f099f4a230b2
> +        .quad 0xbfc526e5e3a1b438
> +        .quad 0xbfc55d1ad4232d6f
> +        .quad 0xbfc59338d9982086
> +        .quad 0xbfc5c940075972b9
> +        .quad 0xbfc5ff3070a793d4
> +        .quad 0xbfc6350a28aaa758
> +        .quad 0xbfc66acd4272ad51
> +        .quad 0xbfc6a079d0f7aad2
> +        .quad 0xbfc6d60fe719d21d
> +        .quad 0xbfc70b8f97a1aa75
> +        .quad 0xbfc740f8f54037a5
> +        .quad 0xbfc7764c128f2127
> +        .quad 0xbfc7ab890210d909
> +        .quad 0xbfc7e0afd630c274
> +        .quad 0xbfc815c0a14357eb
> +        .quad 0xbfc84abb75865139
> +        .quad 0xbfc87fa06520c911
> +        .quad 0xbfc8b46f8223625b
> +        .quad 0xbfc8e928de886d41
> +        .quad 0xbfc91dcc8c340bde
> +        .quad 0xbfc9525a9cf456b4
> +        .quad 0xbfc986d3228180ca
> +        .quad 0xbfc9bb362e7dfb83
> +        .quad 0xbfc9ef83d2769a34
> +        .quad 0xbfca23bc1fe2b563
> +        .quad 0xbfca57df28244dcd
> +        .quad 0xbfca8becfc882f19
> +        .quad 0xbfcabfe5ae46124c
> +        .quad 0xbfcaf3c94e80bff3
> +        .quad 0xbfcb2797ee46320c
> +        .quad 0xbfcb5b519e8fb5a4
> +        .quad 0xbfcb8ef670420c3b
> +        .quad 0xbfcbc286742d8cd6
> +        .quad 0xbfcbf601bb0e44e2
> +        .quad 0xbfcc2968558c18c1
> +        .quad 0xbfcc5cba543ae425
> +        .quad 0xbfcc8ff7c79a9a22
> +        .quad 0xbfccc320c0176502
> +        .quad 0xbfccf6354e09c5dc
> +        .quad 0xbfcd293581b6b3e7
> +        .quad 0xbfcd5c216b4fbb91
> +        .quad 0xbfcd8ef91af31d5e
> +        .quad 0xbfcdc1bca0abec7d
> +        .quad 0xbfcdf46c0c722d2f
> +        .quad 0xbfce27076e2af2e6
> +        .quad 0xbfce598ed5a87e2f
> +        .quad 0xbfce8c0252aa5a60
> +        .quad 0xbfcebe61f4dd7b0b
> +        .quad 0xbfcef0adcbdc5936
> +        .quad 0xbfcf22e5e72f105d
> +        .quad 0xbfcf550a564b7b37
> +        .quad 0xbfcf871b28955045
> +        .quad 0xbfcfb9186d5e3e2b
> +        .quad 0xbfcfeb0233e607cc
> +        .quad 0xbfd00e6c45ad501d
> +        .quad 0xbfd0274dc16c232f
> +        .quad 0xbfd0402594b4d041
> +        .quad 0xbfd058f3c703ebc6
> +        .quad 0xbfd071b85fcd590d
> +        .quad 0xbfd08a73667c57af
> +        .quad 0xbfd0a324e27390e3
> +        .quad 0xbfd0bbccdb0d24bd
> +        .quad 0xbfd0d46b579ab74b
> +        .quad 0xbfd0ed005f657da4
> +        .quad 0xbfd1058bf9ae4ad5
> +        .quad 0xbfd11e0e2dad9cb7
> +        .quad 0xbfd136870293a8b0
> +        .quad 0xbfd14ef67f88685a
> +        .quad 0xbfd1675cababa60e
> +        .quad 0xbfd17fb98e15095d
> +        .quad 0xbfd1980d2dd4236f
> +        .quad 0xbfd1b05791f07b49
> +        .quad 0xbfd1c898c16999fb
> +        .quad 0xbfd1e0d0c33716be
> +        .quad 0xbfd1f8ff9e48a2f3
> +        .quad 0xbfd211255986160c
> +        .quad 0xbfd22941fbcf7966
> +        .quad 0xbfd241558bfd1404
> +        .quad 0xbfd2596010df763a
> +        .quad 0xbfd27161913f853d
> +        .quad 0xbfd2895a13de86a3
> +        .quad 0xbfd2a1499f762bc9
> +        .quad 0xbfd2b9303ab89d25
> +        .quad 0xbfd2d10dec508583
> +        .quad 0xbfd2e8e2bae11d31
> +        .quad 0xbfd300aead06350c
> +        .quad 0xbfd31871c9544185
> +        .quad 0xbfd3302c16586588
> +        .quad 0xbfd347dd9a987d55
> +        .quad 0xbfd35f865c93293e
> +        .quad 0xbfd3772662bfd85b
> +        .quad 0xbfd38ebdb38ed321
> +        .quad 0xbfd3a64c556945ea
> +        .quad 0xbfd3bdd24eb14b6a
> +        .quad 0xbfd3d54fa5c1f710
> +        .quad 0xbfd3ecc460ef5f50
> +        .quad 0xbfd404308686a7e4
> +        .quad 0xbfd41b941cce0bee
> +        .quad 0xbfd432ef2a04e814
> +        .quad 0xbfd44a41b463c47c
> +        .quad 0xbfd4618bc21c5ec2
> +        .quad 0xbfd478cd5959b3d9
> +        .quad 0xbfd49006804009d1
> +        .quad 0xbfd4a7373cecf997
> +        .quad 0xbfd4be5f957778a1
> +        .quad 0xbfd4d57f8fefe27f
> +        .quad 0xbfd4ec973260026a
> +        .quad 0xbfd503a682cb1cb3
> +        .quad 0xbfd51aad872df82d
> +        .quad 0xbfd531ac457ee77e
> +        .quad 0xbfd548a2c3add263
> +        .quad 0xbfd55f9107a43ee2
> +        .quad 0xbfd5767717455a6c
> +        .quad 0xbfd58d54f86e02f2
> +        .quad 0xbfd5a42ab0f4cfe2
> +        .quad 0xbfd5baf846aa1b19
> +        .quad 0xbfd5d1bdbf5809ca
> +        .quad 0xbfd5e87b20c2954a
> +        .quad 0xbfd5ff3070a793d4
> +        .quad 0xbfd615ddb4bec13c
> +        .quad 0xbfd62c82f2b9c795
> +        .quad 0x3fd61965cdb02c1f
> +        .quad 0x3fd602d08af091ec
> +        .quad 0x3fd5ec433d5c35ae
> +        .quad 0x3fd5d5bddf595f30
> +        .quad 0x3fd5bf406b543db2
> +        .quad 0x3fd5a8cadbbedfa1
> +        .quad 0x3fd5925d2b112a59
> +        .quad 0x3fd57bf753c8d1fb
> +        .quad 0x3fd565995069514c
> +        .quad 0x3fd54f431b7be1a9
> +        .quad 0x3fd538f4af8f72fe
> +        .quad 0x3fd522ae0738a3d8
> +        .quad 0x3fd50c6f1d11b97c
> +        .quad 0x3fd4f637ebba9810
> +        .quad 0x3fd4e0086dd8baca
> +        .quad 0x3fd4c9e09e172c3c
> +        .quad 0x3fd4b3c077267e9a
> +        .quad 0x3fd49da7f3bcc41f
> +        .quad 0x3fd487970e958770
> +        .quad 0x3fd4718dc271c41b
> +        .quad 0x3fd45b8c0a17df13
> +        .quad 0x3fd44591e0539f49
> +        .quad 0x3fd42f9f3ff62642
> +        .quad 0x3fd419b423d5e8c7
> +        .quad 0x3fd403d086cea79c
> +        .quad 0x3fd3edf463c1683e
> +        .quad 0x3fd3d81fb5946dba
> +        .quad 0x3fd3c25277333184
> +        .quad 0x3fd3ac8ca38e5c5f
> +        .quad 0x3fd396ce359bbf54
> +        .quad 0x3fd3811728564cb2
> +        .quad 0x3fd36b6776be1117
> +        .quad 0x3fd355bf1bd82c8b
> +        .quad 0x3fd3401e12aecba1
> +        .quad 0x3fd32a84565120a8
> +        .quad 0x3fd314f1e1d35ce4
> +        .quad 0x3fd2ff66b04ea9d4
> +        .quad 0x3fd2e9e2bce12286
> +        .quad 0x3fd2d46602adccee
> +        .quad 0x3fd2bef07cdc9354
> +        .quad 0x3fd2a982269a3dbf
> +        .quad 0x3fd2941afb186b7c
> +        .quad 0x3fd27ebaf58d8c9d
> +        .quad 0x3fd269621134db92
> +        .quad 0x3fd25410494e56c7
> +        .quad 0x3fd23ec5991eba49
> +        .quad 0x3fd22981fbef797b
> +        .quad 0x3fd214456d0eb8d4
> +        .quad 0x3fd1ff0fe7cf47a7
> +        .quad 0x3fd1e9e1678899f4
> +        .quad 0x3fd1d4b9e796c245
> +        .quad 0x3fd1bf99635a6b95
> +        .quad 0x3fd1aa7fd638d33f
> +        .quad 0x3fd1956d3b9bc2fa
> +        .quad 0x3fd180618ef18adf
> +        .quad 0x3fd16b5ccbacfb73
> +        .quad 0x3fd1565eed455fc3
> +        .quad 0x3fd14167ef367783
> +        .quad 0x3fd12c77cd00713b
> +        .quad 0x3fd1178e8227e47c
> +        .quad 0x3fd102ac0a35cc1c
> +        .quad 0x3fd0edd060b78081
> +        .quad 0x3fd0d8fb813eb1ef
> +        .quad 0x3fd0c42d676162e3
> +        .quad 0x3fd0af660eb9e279
> +        .quad 0x3fd09aa572e6c6d4
> +        .quad 0x3fd085eb8f8ae797
> +        .quad 0x3fd07138604d5862
> +        .quad 0x3fd05c8be0d9635a
> +        .quad 0x3fd047e60cde83b8
> +        .quad 0x3fd03346e0106062
> +        .quad 0x3fd01eae5626c691
> +        .quad 0x3fd00a1c6adda473
> +        .quad 0x3fcfeb2233ea07cd
> +        .quad 0x3fcfc218be620a5e
> +        .quad 0x3fcf991c6cb3b379
> +        .quad 0x3fcf702d36777df0
> +        .quad 0x3fcf474b134df229
> +        .quad 0x3fcf1e75fadf9bde
> +        .quad 0x3fcef5ade4dcffe6
> +        .quad 0x3fceccf2c8fe920a
> +        .quad 0x3fcea4449f04aaf5
> +        .quad 0x3fce7ba35eb77e2a
> +        .quad 0x3fce530effe71012
> +        .quad 0x3fce2a877a6b2c12
> +        .quad 0x3fce020cc6235ab5
> +        .quad 0x3fcdd99edaf6d7e9
> +        .quad 0x3fcdb13db0d48940
> +        .quad 0x3fcd88e93fb2f450
> +        .quad 0x3fcd60a17f903515
> +        .quad 0x3fcd38666871f465
> +        .quad 0x3fcd1037f2655e7b
> +        .quad 0x3fcce816157f1988
> +        .quad 0x3fccc000c9db3c52
> +        .quad 0x3fcc97f8079d44ec
> +        .quad 0x3fcc6ffbc6f00f71
> +        .quad 0x3fcc480c0005ccd1
> +        .quad 0x3fcc2028ab17f9b4
> +        .quad 0x3fcbf851c067555f
> +        .quad 0x3fcbd087383bd8ad
> +        .quad 0x3fcba8c90ae4ad19
> +        .quad 0x3fcb811730b823d2
> +        .quad 0x3fcb5971a213acdb
> +        .quad 0x3fcb31d8575bce3d
> +        .quad 0x3fcb0a4b48fc1b46
> +        .quad 0x3fcae2ca6f672bd4
> +        .quad 0x3fcabb55c31693ad
> +        .quad 0x3fca93ed3c8ad9e3
> +        .quad 0x3fca6c90d44b704e
> +        .quad 0x3fca454082e6ab05
> +        .quad 0x3fca1dfc40f1b7f1
> +        .quad 0x3fc9f6c407089664
> +        .quad 0x3fc9cf97cdce0ec3
> +        .quad 0x3fc9a8778debaa38
> +        .quad 0x3fc981634011aa75
> +        .quad 0x3fc95a5adcf7017f
> +        .quad 0x3fc9335e5d594989
> +        .quad 0x3fc90c6db9fcbcd9
> +        .quad 0x3fc8e588ebac2dbf
> +        .quad 0x3fc8beafeb38fe8c
> +        .quad 0x3fc897e2b17b19a5
> +        .quad 0x3fc871213750e994
> +        .quad 0x3fc84a6b759f512f
> +        .quad 0x3fc823c16551a3c2
> +        .quad 0x3fc7fd22ff599d4f
> +        .quad 0x3fc7d6903caf5ad0
> +        .quad 0x3fc7b0091651528c
> +        .quad 0x3fc7898d85444c73
> +        .quad 0x3fc7631d82935a86
> +        .quad 0x3fc73cb9074fd14d
> +        .quad 0x3fc716600c914054
> +        .quad 0x3fc6f0128b756abc
> +        .quad 0x3fc6c9d07d203fc7
> +        .quad 0x3fc6a399dabbd383
> +        .quad 0x3fc67d6e9d785771
> +        .quad 0x3fc6574ebe8c133a
> +        .quad 0x3fc6313a37335d76
> +        .quad 0x3fc60b3100b09476
> +        .quad 0x3fc5e533144c1719
> +        .quad 0x3fc5bf406b543db2
> +        .quad 0x3fc59958ff1d52f1
> +        .quad 0x3fc5737cc9018cdd
> +        .quad 0x3fc54dabc26105d2
> +        .quad 0x3fc527e5e4a1b58d
> +        .quad 0x3fc5022b292f6a45
> +        .quad 0x3fc4dc7b897bc1c8
> +        .quad 0x3fc4b6d6fefe22a4
> +        .quad 0x3fc4913d8333b561
> +        .quad 0x3fc46baf0f9f5db7
> +        .quad 0x3fc4462b9dc9b3dc
> +        .quad 0x3fc420b32740fdd4
> +        .quad 0x3fc3fb45a59928cc
> +        .quad 0x3fc3d5e3126bc27f
> +        .quad 0x3fc3b08b6757f2a9
> +        .quad 0x3fc38b3e9e027479
> +        .quad 0x3fc365fcb0159016
> +        .quad 0x3fc340c59741142e
> +        .quad 0x3fc31b994d3a4f85
> +        .quad 0x3fc2f677cbbc0a96
> +        .quad 0x3fc2d1610c86813a
> +        .quad 0x3fc2ac55095f5c59
> +        .quad 0x3fc28753bc11aba5
> +        .quad 0x3fc2625d1e6ddf57
> +        .quad 0x3fc23d712a49c202
> +        .quad 0x3fc2188fd9807263
> +        .quad 0x3fc1f3b925f25d41
> +        .quad 0x3fc1ceed09853752
> +        .quad 0x3fc1aa2b7e23f72a
> +        .quad 0x3fc185747dbecf34
> +        .quad 0x3fc160c8024b27b1
> +        .quad 0x3fc13c2605c398c3
> +        .quad 0x3fc1178e8227e47c
> +        .quad 0x3fc0f301717cf0fb
> +        .quad 0x3fc0ce7ecdccc28d
> +        .quad 0x3fc0aa06912675d5
> +        .quad 0x3fc08598b59e3a07
> +        .quad 0x3fc06135354d4b18
> +        .quad 0x3fc03cdc0a51ec0d
> +        .quad 0x3fc0188d2ecf6140
> +        .quad 0x3fbfe89139dbd566
> +        .quad 0x3fbfa01c9db57ce2
> +        .quad 0x3fbf57bc7d9005db
> +        .quad 0x3fbf0f70cdd992e3
> +        .quad 0x3fbec739830a1120
> +        .quad 0x3fbe7f1691a32d3e
> +        .quad 0x3fbe3707ee30487b
> +        .quad 0x3fbdef0d8d466db9
> +        .quad 0x3fbda727638446a2
> +        .quad 0x3fbd5f55659210e2
> +        .quad 0x3fbd179788219364
> +        .quad 0x3fbccfedbfee13a8
> +        .quad 0x3fbc885801bc4b23
> +        .quad 0x3fbc40d6425a5cb1
> +        .quad 0x3fbbf968769fca11
> +        .quad 0x3fbbb20e936d6974
> +        .quad 0x3fbb6ac88dad5b1c
> +        .quad 0x3fbb23965a52ff00
> +        .quad 0x3fbadc77ee5aea8c
> +        .quad 0x3fba956d3ecade63
> +        .quad 0x3fba4e7640b1bc38
> +        .quad 0x3fba0792e9277cac
> +        .quad 0x3fb9c0c32d4d2548
> +        .quad 0x3fb97a07024cbe74
> +        .quad 0x3fb9335e5d594989
> +        .quad 0x3fb8ecc933aeb6e8
> +        .quad 0x3fb8a6477a91dc29
> +        .quad 0x3fb85fd927506a48
> +        .quad 0x3fb8197e2f40e3f0
> +        .quad 0x3fb7d33687c293c9
> +        .quad 0x3fb78d02263d82d3
> +        .quad 0x3fb746e100226ed9
> +        .quad 0x3fb700d30aeac0e1
> +        .quad 0x3fb6bad83c1883b6
> +        .quad 0x3fb674f089365a7a
> +        .quad 0x3fb62f1be7d77743
> +        .quad 0x3fb5e95a4d9791cb
> +        .quad 0x3fb5a3abb01ade25
> +        .quad 0x3fb55e10050e0384
> +        .quad 0x3fb518874226130a
> +        .quad 0x3fb4d3115d207eac
> +        .quad 0x3fb48dae4bc31018
> +        .quad 0x3fb4485e03dbdfad
> +        .quad 0x3fb403207b414b7f
> +        .quad 0x3fb3bdf5a7d1ee64
> +        .quad 0x3fb378dd7f749714
> +        .quad 0x3fb333d7f8183f4b
> +        .quad 0x3fb2eee507b40301
> +        .quad 0x3fb2aa04a44717a5
> +        .quad 0x3fb26536c3d8c369
> +        .quad 0x3fb2207b5c78549e
> +        .quad 0x3fb1dbd2643d190b
> +        .quad 0x3fb1973bd1465567
> +        .quad 0x3fb152b799bb3cc9
> +        .quad 0x3fb10e45b3cae831
> +        .quad 0x3fb0c9e615ac4e17
> +        .quad 0x3fb08598b59e3a07
> +        .quad 0x3fb0415d89e74444
> +        .quad 0x3faffa6911ab9301
> +        .quad 0x3faf723b517fc523
> +        .quad 0x3faeea31c006b87c
> +        .quad 0x3fae624c4a0b5e1b
> +        .quad 0x3fadda8adc67ee4e
> +        .quad 0x3fad52ed6405d86f
> +        .quad 0x3faccb73cdddb2cc
> +        .quad 0x3fac441e06f72a9e
> +        .quad 0x3fabbcebfc68f420
> +        .quad 0x3fab35dd9b58baad
> +        .quad 0x3faaaef2d0fb10fc
> +        .quad 0x3faa282b8a936171
> +        .quad 0x3fa9a187b573de7c
> +        .quad 0x3fa91b073efd7314
> +        .quad 0x3fa894aa149fb343
> +        .quad 0x3fa80e7023d8ccc4
> +        .quad 0x3fa788595a3577ba
> +        .quad 0x3fa70265a550e777
> +        .quad 0x3fa67c94f2d4bb58
> +        .quad 0x3fa5f6e73078efb8
> +        .quad 0x3fa5715c4c03ceef
> +        .quad 0x3fa4ebf43349e26f
> +        .quad 0x3fa466aed42de3ea
> +        .quad 0x3fa3e18c1ca0ae92
> +        .quad 0x3fa35c8bfaa1306b
> +        .quad 0x3fa2d7ae5c3c5bae
> +        .quad 0x3fa252f32f8d183f
> +        .quad 0x3fa1ce5a62bc353a
> +        .quad 0x3fa149e3e4005a8d
> +        .quad 0x3fa0c58fa19dfaaa
> +        .quad 0x3fa0415d89e74444
> +        .quad 0x3f9f7a9b16782856
> +        .quad 0x3f9e72bf2813ce51
> +        .quad 0x3f9d6b2725979802
> +        .quad 0x3f9c63d2ec14aaf2
> +        .quad 0x3f9b5cc258b718e6
> +        .quad 0x3f9a55f548c5c43f
> +        .quad 0x3f994f6b99a24475
> +        .quad 0x3f98492528c8cabf
> +        .quad 0x3f974321d3d006d3
> +        .quad 0x3f963d6178690bd6
> +        .quad 0x3f9537e3f45f3565
> +        .quad 0x3f9432a925980cc1
> +        .quad 0x3f932db0ea132e22
> +        .quad 0x3f9228fb1fea2e28
> +        .quad 0x3f912487a5507f70
> +        .quad 0x3f90205658935847
> +        .quad 0x3f8e38ce3033310c
> +        .quad 0x3f8c317384c75f06
> +        .quad 0x3f8a2a9c6c170462
> +        .quad 0x3f882448a388a2aa
> +        .quad 0x3f861e77e8b53fc6
> +        .quad 0x3f841929f96832f0
> +        .quad 0x3f82145e939ef1e9
> +        .quad 0x3f8010157588de71
> +        .quad 0x3f7c189cbb0e27fb
> +        .quad 0x3f78121214586b54
> +        .quad 0x3f740c8a747878e2
> +        .quad 0x3f70080559588b35
> +        .quad 0x3f680904828985c0
> +        .quad 0x3f60040155d5889e
> +        .quad 0x3f50020055655889
> +        .quad 0x0000000000000000
> +        /*== poly_coeff[4] ==*/
> +        .align 32
> +        .quad 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A /* coeff4 */
> +        .quad 0xbfd0000148058EE1, 0xbfd0000148058EE1, 0xbfd0000148058EE1, 0xbfd0000148058EE1 /* coeff3 */
> +        .quad 0x3fd55555555543C5, 0x3fd55555555543C5, 0x3fd55555555543C5, 0x3fd55555555543C5 /* coeff2 */
> +        .quad 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F /* coeff1 */
> +        /*== ExpMask ==*/
> +        .align 32
> +        .quad 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff
> +        /*== Two10 ==*/
> +        .align 32
> +        .quad 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000
> +        /*== MinLog1p = -1+2^(-53) ==*/
> +        .align 32
> +        .quad 0xbfefffffffffffff, 0xbfefffffffffffff, 0xbfefffffffffffff, 0xbfefffffffffffff
> +        /*== MaxLog1p ==*/
> +        .align 32
> +        .quad 0x7f3ffffffffff000, 0x7f3ffffffffff000, 0x7f3ffffffffff000, 0x7f3ffffffffff000
> +        /*== One ==*/
> +        .align 32
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== SgnMask ==*/
> +        .align 32
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
> +        /*== XThreshold ==*/
> +        .align 32
> +        .quad 0x3e00000000000000, 0x3e00000000000000, 0x3e00000000000000, 0x3e00000000000000
> +        /*== XhMask ==*/
> +        .align 32
> +        .quad 0xfffffffffffffc00, 0xfffffffffffffc00, 0xfffffffffffffc00, 0xfffffffffffffc00
> +        /*== Threshold ==*/
> +        .align 32
> +        .quad 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000
> +        /*== Bias ==*/
> +        .align 32
> +        .quad 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000
> +        /*== Bias1 ==*/
> +        .align 32
> +        .quad 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000
> +        /*== ExpMask ==*/
> +        .align 32
> +        .quad 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000
> +        /*== ExpMask2 ==*/
> +        .align 32
> +        .quad 0x7f40000000000000, 0x7f40000000000000, 0x7f40000000000000, 0x7f40000000000000
> +        /*== L2L ==*/
> +        .align 32
> +        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
> +        /*== dBigThreshold ==*/
> +        .align 32
> +        .quad 0x41D0000000000000, 0x41D0000000000000, 0x41D0000000000000, 0x41D0000000000000
> +        /*== dC2 ==*/
> +        .align 32
> +        .quad 0x3FD8000000000000, 0x3FD8000000000000, 0x3FD8000000000000, 0x3FD8000000000000
> +        /*== dC3 ==*/
> +        .align 32
> +        .quad 0x3FD4000000000000, 0x3FD4000000000000, 0x3FD4000000000000, 0x3FD4000000000000
> +        /*== dC4 ==*/
> +        .align 32
> +        .quad 0x3FD1800000000000, 0x3FD1800000000000, 0x3FD1800000000000, 0x3FD1800000000000
> +        /*== dC5 ==*/
> +        .align 32
> +        .quad 0x3FCF800000000000, 0x3FCF800000000000, 0x3FCF800000000000, 0x3FCF800000000000
> +        /*== dHalf ==*/
> +        .align 32
> +        .quad 0x3FE0000000000000, 0x3FE0000000000000, 0x3FE0000000000000, 0x3FE0000000000000
> +        /*== dLargestFinite ==*/
> +        .align 32
> +        .quad 0x7FEFFFFFFFFFFFFF, 0x7FEFFFFFFFFFFFFF, 0x7FEFFFFFFFFFFFFF, 0x7FEFFFFFFFFFFFFF
> +        /*== dLittleThreshold ==*/
> +        .align 32
> +        .quad 0x3F60000000000000, 0x3F60000000000000, 0x3F60000000000000, 0x3F60000000000000
> +        /*== dSign ==*/
> +        .align 32
> +        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000
> +        /*== dThirtyOne ==*/
> +        .align 32
> +        .quad 0x403F000000000000, 0x403F000000000000, 0x403F000000000000, 0x403F000000000000
> +        /*== dTopMask12 ==*/
> +        .align 32
> +        .quad 0xFFFFFE0000000000, 0xFFFFFE0000000000, 0xFFFFFE0000000000, 0xFFFFFE0000000000
> +        /*== dTopMask29 ==*/
> +        .align 32
> +        .quad 0xFFFFFFFFFF000000, 0xFFFFFFFFFF000000, 0xFFFFFFFFFF000000, 0xFFFFFFFFFF000000
> +        /*== XScale ==*/
> +        .align 32
> +        .quad 0x3E10000000000000, 0x3E10000000000000, 0x3E10000000000000, 0x3E10000000000000
> +        .align 32
> +        .type	__svml_dasinh_data_internal,@object
> +        .size	__svml_dasinh_data_internal,.-__svml_dasinh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core-avx2.S
> new file mode 100644
> index 0000000000..647c73292c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized asinh, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN8v_asinh _ZGVeN8v_asinh_avx2_wrapper
> +#include "../svml_d_asinh8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core.c
> new file mode 100644
> index 0000000000..45e5ab72a6
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized asinh, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN8v_asinh
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN8v_asinh, __GI__ZGVeN8v_asinh, __redirect__ZGVeN8v_asinh)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core_avx512.S
> new file mode 100644
> index 0000000000..8100e8a50a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_asinh8_core_avx512.S
> @@ -0,0 +1,510 @@
> +/* Function asinh vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute asinh(x) as log(x + sqrt(x*x + 1))
> + *   using RSQRT instructions for starting the
> + *   square root approximation, and small table lookups for log
> + *   that map to AVX-512 permute instructions
> + *
> + *   Special cases:
> + *
> + *   asinh(NaN) = quiet NaN, and raise invalid exception
> + *   asinh(+/-INF) = +/-INF
> + *   asinh(+/-0)   = +/-0
> + *
> + */
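> +
> +/* For reference, the scalar computation being vectorized is essentially
> +   the following (illustrative sketch only -- asinh_ref is not part of
> +   the patch; the kernel below additionally tracks low-order terms,
> +   evaluates the log via table lookups plus a polynomial, and restores
> +   the sign with vxorpd rather than copysign):
> +
> +     #include <math.h>
> +
> +     static double
> +     asinh_ref (double x)
> +     {
> +       double y = fabs (x);
> +       return copysign (log (y + sqrt (y * y + 1.0)), x);
> +     }
> + */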
> +
> +/* Offsets for data table __svml_dasinh_data_internal_avx512
> + */
> +#define Log_tbl_H                     	0
> +#define Log_tbl_L                     	128
> +#define One                           	256
> +#define AbsMask                       	320
> +#define SmallThreshold                	384
> +#define Threshold                     	448
> +#define LargeThreshold                	512
> +#define ca2                           	576
> +#define ca1                           	640
> +#define c4s                           	704
> +#define c3s                           	768
> +#define c2s                           	832
> +#define c1s                           	896
> +#define AddB5                         	960
> +#define RcpBitMask                    	1024
> +#define OneEighth                     	1088
> +#define Four                          	1152
> +#define poly_coeff9                   	1216
> +#define poly_coeff8                   	1280
> +#define poly_coeff7                   	1344
> +#define poly_coeff6                   	1408
> +#define poly_coeff5                   	1472
> +#define poly_coeff4                   	1536
> +#define poly_coeff3                   	1600
> +#define poly_coeff2                   	1664
> +#define poly_coeff1                   	1728
> +#define L2H                           	1792
> +#define L2L                           	1856
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN8v_asinh_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovaps   %zmm0, %zmm3
> +
> +/* x^2 */
> +        vmulpd    {rn-sae}, %zmm3, %zmm3, %zmm14
> +        vmovups   One+__svml_dasinh_data_internal_avx512(%rip), %zmm9
> +
> +/* polynomial computation for small inputs */
> +        vmovups   ca2+__svml_dasinh_data_internal_avx512(%rip), %zmm10
> +        vmovups   ca1+__svml_dasinh_data_internal_avx512(%rip), %zmm11
> +
> +/* not a very small input ? */
> +        vmovups   SmallThreshold+__svml_dasinh_data_internal_avx512(%rip), %zmm0
> +
> +/* A=max(x^2, 1); */
> +        vmaxpd    {sae}, %zmm14, %zmm9, %zmm4
> +
> +/* B=min(x^2, 1); */
> +        vminpd    {sae}, %zmm14, %zmm9, %zmm5
> +        vfmadd231pd {rn-sae}, %zmm14, %zmm10, %zmm11
> +
> +/* 1+x^2 */
> +        vaddpd    {rn-sae}, %zmm9, %zmm14, %zmm8
> +
> +/* |input| */
> +        vandpd    AbsMask+__svml_dasinh_data_internal_avx512(%rip), %zmm3, %zmm1
> +        vrsqrt14pd %zmm8, %zmm6
> +        vcmppd    $21, {sae}, %zmm0, %zmm1, %k2
> +
> +/* B_high */
> +        vsubpd    {rn-sae}, %zmm4, %zmm8, %zmm7
> +
> +/* sign bit */
> +        vxorpd    %zmm3, %zmm1, %zmm2
> +        vmulpd    {rn-sae}, %zmm14, %zmm11, %zmm4
> +
> +/* B_low */
> +        vsubpd    {rn-sae}, %zmm7, %zmm5, %zmm13
> +        vmovups   c2s+__svml_dasinh_data_internal_avx512(%rip), %zmm5
> +        vmovups   c1s+__svml_dasinh_data_internal_avx512(%rip), %zmm7
> +
> +/* polynomial computation for small inputs */
> +        vfmadd213pd {rn-sae}, %zmm1, %zmm1, %zmm4
> +
> +/* (x^2)_low */
> +        vmovaps   %zmm3, %zmm15
> +        vfmsub213pd {rn-sae}, %zmm14, %zmm3, %zmm15
> +
> +/* Sh ~sqrt(1+x^2) */
> +        vmulpd    {rn-sae}, %zmm6, %zmm8, %zmm14
> +
> +/* Yl = (x^2)_low + B_low */
> +        vaddpd    {rn-sae}, %zmm15, %zmm13, %zmm13
> +
> +/* very large inputs ? */
> +        vmovups   Threshold+__svml_dasinh_data_internal_avx512(%rip), %zmm15
> +
> +/* (Yh*R0)_low */
> +        vfmsub213pd {rn-sae}, %zmm14, %zmm6, %zmm8
> +        vcmppd    $21, {sae}, %zmm15, %zmm1, %k1
> +
> +/* Sl = (Yh*R0)_low+(R0*Yl) */
> +        vfmadd213pd {rn-sae}, %zmm8, %zmm6, %zmm13
> +        vmovups   LargeThreshold+__svml_dasinh_data_internal_avx512(%rip), %zmm8
> +
> +/* rel. error term: Eh=1-Sh*R0 */
> +        vmovaps   %zmm9, %zmm12
> +        vfnmadd231pd {rn-sae}, %zmm14, %zmm6, %zmm12
> +        vcmppd    $22, {sae}, %zmm8, %zmm1, %k0
> +
> +/* rel. error term: Eh=(1-Sh*R0)-Sl*R0 */
> +        vfnmadd231pd {rn-sae}, %zmm13, %zmm6, %zmm12
> +
> +/*
> + * sqrt(1+x^2) ~ Sh + Sl + Sh*Eh*poly_s
> + * poly_s = c1+c2*Eh+c3*Eh^2+c4*Eh^3
> + */
> +        vmovups   c4s+__svml_dasinh_data_internal_avx512(%rip), %zmm6
> +        vmovups   c3s+__svml_dasinh_data_internal_avx512(%rip), %zmm8
> +
> +/* Sh*Eh */
> +        vmulpd    {rn-sae}, %zmm12, %zmm14, %zmm11
> +        vfmadd231pd {rn-sae}, %zmm12, %zmm6, %zmm8
> +
> +/* Sh+x */
> +        vaddpd    {rn-sae}, %zmm1, %zmm14, %zmm6
> +        kmovw     %k0, %edx
> +        vfmadd213pd {rn-sae}, %zmm5, %zmm12, %zmm8
> +        vfmadd213pd {rn-sae}, %zmm7, %zmm12, %zmm8
> +
> +/* Xh */
> +        vsubpd    {rn-sae}, %zmm14, %zmm6, %zmm12
> +
> +/* Sl + Sh*Eh*poly_s */
> +        vfmadd213pd {rn-sae}, %zmm13, %zmm8, %zmm11
> +
> +/* fixup for very large inputs */
> +        vmovups   OneEighth+__svml_dasinh_data_internal_avx512(%rip), %zmm8
> +
> +/* Xl */
> +        vsubpd    {rn-sae}, %zmm12, %zmm1, %zmm12
> +
> +/* Xin0+Sl+Sh*Eh*poly_s ~ x+sqrt(1+x^2) */
> +        vaddpd    {rn-sae}, %zmm11, %zmm6, %zmm10
> +
> +/* Sl_high */
> +        vsubpd    {rn-sae}, %zmm6, %zmm10, %zmm5
> +        vmulpd    {rn-sae}, %zmm8, %zmm1, %zmm10{%k1}
> +
> +/* Table lookups */
> +        vmovups   __svml_dasinh_data_internal_avx512(%rip), %zmm6
> +
> +/* Sl_l */
> +        vsubpd    {rn-sae}, %zmm5, %zmm11, %zmm7
> +        vrcp14pd  %zmm10, %zmm13
> +
> +/* Xin_low */
> +        vaddpd    {rn-sae}, %zmm12, %zmm7, %zmm14
> +        vmovups   Log_tbl_L+__svml_dasinh_data_internal_avx512(%rip), %zmm7
> +        vmovups   poly_coeff6+__svml_dasinh_data_internal_avx512(%rip), %zmm12
> +
> +/* round reciprocal to 1+4b mantissas */
> +        vpaddq    AddB5+__svml_dasinh_data_internal_avx512(%rip), %zmm13, %zmm11
> +
> +/* fixup for very large inputs */
> +        vxorpd    %zmm14, %zmm14, %zmm14{%k1}
> +        vmovups   poly_coeff5+__svml_dasinh_data_internal_avx512(%rip), %zmm13
> +        vandpd    RcpBitMask+__svml_dasinh_data_internal_avx512(%rip), %zmm11, %zmm15
> +        vmovups   poly_coeff7+__svml_dasinh_data_internal_avx512(%rip), %zmm11
> +
> +/* Prepare table index */
> +        vpsrlq    $48, %zmm15, %zmm5
> +
> +/* reduced argument for log(): (Rcp*Xin-1)+Rcp*Xin_low */
> +        vfmsub231pd {rn-sae}, %zmm15, %zmm10, %zmm9
> +
> +/* exponents */
> +        vgetexppd {sae}, %zmm15, %zmm8
> +        vmovups   Four+__svml_dasinh_data_internal_avx512(%rip), %zmm10
> +        vpermt2pd Log_tbl_H+64+__svml_dasinh_data_internal_avx512(%rip), %zmm5, %zmm6
> +        vpermt2pd Log_tbl_L+64+__svml_dasinh_data_internal_avx512(%rip), %zmm5, %zmm7
> +        vsubpd    {rn-sae}, %zmm10, %zmm8, %zmm8{%k1}
> +        vfmadd231pd {rn-sae}, %zmm15, %zmm14, %zmm9
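> +
> +/* Log reduction (sketch): with Rcp ~= 1/Xin rounded so that only the top
> +   4 mantissa bits remain,
> +     log(Xin) = -log(Rcp) + log(Rcp*Xin)
> +              = -E*ln2 + Th + Tl + log(1 + R),  R = Rcp*Xin - 1 (small),
> +   where E = getexp(Rcp), Th/Tl come from the 16-entry tables indexed by
> +   those mantissa bits, and log(1 + R) is the polynomial evaluated below.  */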
> +
> +/* polynomials */
> +        vmovups   poly_coeff9+__svml_dasinh_data_internal_avx512(%rip), %zmm10
> +        vmovups   poly_coeff8+__svml_dasinh_data_internal_avx512(%rip), %zmm5
> +        vmovups   poly_coeff4+__svml_dasinh_data_internal_avx512(%rip), %zmm14
> +
> +/* -K*L2H + Th */
> +        vmovups   L2H+__svml_dasinh_data_internal_avx512(%rip), %zmm15
> +        vfmadd231pd {rn-sae}, %zmm9, %zmm10, %zmm5
> +
> +/* -K*L2L + Tl */
> +        vmovups   L2L+__svml_dasinh_data_internal_avx512(%rip), %zmm10
> +        vfnmadd231pd {rn-sae}, %zmm8, %zmm15, %zmm6
> +        vfmadd213pd {rn-sae}, %zmm11, %zmm9, %zmm5
> +        vfnmadd213pd {rn-sae}, %zmm7, %zmm10, %zmm8
> +        vmovups   poly_coeff3+__svml_dasinh_data_internal_avx512(%rip), %zmm7
> +        vmovups   poly_coeff1+__svml_dasinh_data_internal_avx512(%rip), %zmm10
> +
> +/* R^2 */
> +        vmulpd    {rn-sae}, %zmm9, %zmm9, %zmm11
> +        vfmadd213pd {rn-sae}, %zmm12, %zmm9, %zmm5
> +        vfmadd213pd {rn-sae}, %zmm13, %zmm9, %zmm5
> +        vfmadd213pd {rn-sae}, %zmm14, %zmm9, %zmm5
> +        vfmadd213pd {rn-sae}, %zmm7, %zmm9, %zmm5
> +        vmovups   poly_coeff2+__svml_dasinh_data_internal_avx512(%rip), %zmm7
> +        vfmadd213pd {rn-sae}, %zmm7, %zmm9, %zmm5
> +        vfmadd213pd {rn-sae}, %zmm10, %zmm9, %zmm5
> +
> +/* Tl + R^2*Poly */
> +        vfmadd213pd {rn-sae}, %zmm8, %zmm11, %zmm5
> +
> +/* R+Tl + R^2*Poly */
> +        vaddpd    {rn-sae}, %zmm9, %zmm5, %zmm9
> +        vaddpd    {rn-sae}, %zmm9, %zmm6, %zmm4{%k2}
> +        vxorpd    %zmm2, %zmm4, %zmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm3
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
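> +
> +/* In C terms this path does roughly the following (sketch only; the
> +   actual code spills the input and result vectors to the stack and
> +   preserves r12-r14 around the scalar calls):
> +
> +     for (int i = 0; i < 8; i++)
> +       if (mask & (1 << i))
> +         result[i] = asinh (input[i]);
> + */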
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm3, 64(%rsp)
> +        vmovups   %zmm0, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     64(%rsp,%r14,8), %xmm0
> +        call      asinh@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 128(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN8v_asinh_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_dasinh_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 Log_tbl_H[16][2];
> +        __declspec(align(64)) VUINT32 Log_tbl_L[16][2];
> +        __declspec(align(64)) VUINT32 One[8][2];
> +        __declspec(align(64)) VUINT32 AbsMask[8][2];
> +        __declspec(align(64)) VUINT32 SmallThreshold[8][2];
> +        __declspec(align(64)) VUINT32 Threshold[8][2];
> +        __declspec(align(64)) VUINT32 LargeThreshold[8][2];
> +        __declspec(align(64)) VUINT32 ca2[8][2];
> +        __declspec(align(64)) VUINT32 ca1[8][2];
> +        __declspec(align(64)) VUINT32 c4s[8][2];
> +        __declspec(align(64)) VUINT32 c3s[8][2];
> +        __declspec(align(64)) VUINT32 c2s[8][2];
> +        __declspec(align(64)) VUINT32 c1s[8][2];
> +        __declspec(align(64)) VUINT32 AddB5[8][2];
> +        __declspec(align(64)) VUINT32 RcpBitMask[8][2];
> +        __declspec(align(64)) VUINT32 OneEighth[8][2];
> +        __declspec(align(64)) VUINT32 Four[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff9[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff8[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff7[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
> +        __declspec(align(64)) VUINT32 poly_coeff1[8][2];
> +        __declspec(align(64)) VUINT32 L2H[8][2];
> +        __declspec(align(64)) VUINT32 L2L[8][2];
> +    } __svml_dasinh_data_internal_avx512;
> +#endif
> +__svml_dasinh_data_internal_avx512:
> +        /*== Log_tbl_H ==*/
> +        .quad 0x0000000000000000
> +        .quad 0xbfaf0a30c0120000
> +        .quad 0xbfbe27076e2b0000
> +        .quad 0xbfc5ff3070a78000
> +        .quad 0xbfcc8ff7c79a8000
> +        .quad 0xbfd1675cababc000
> +        .quad 0xbfd4618bc21c4000
> +        .quad 0xbfd739d7f6bbc000
> +        .quad 0xbfd9f323ecbf8000
> +        .quad 0xbfdc8ff7c79a8000
> +        .quad 0xbfdf128f5faf0000
> +        .quad 0xbfe0be72e4252000
> +        .quad 0xbfe1e85f5e704000
> +        .quad 0xbfe307d7334f2000
> +        .quad 0xbfe41d8fe8468000
> +        .quad 0xbfe52a2d265bc000
> +        /*== Log_tbl_L ==*/
> +        .align 64
> +        .quad 0x0000000000000000
> +        .quad 0x3d53ab33d066d1d2
> +        .quad 0x3d2a342c2af0003c
> +        .quad 0xbd43d3c873e20a07
> +        .quad 0xbd4a21ac25d81ef3
> +        .quad 0x3d59f1fc63382a8f
> +        .quad 0xbd5ec27d0b7b37b3
> +        .quad 0xbd50069ce24c53fb
> +        .quad 0xbd584bf2b68d766f
> +        .quad 0xbd5a21ac25d81ef3
> +        .quad 0xbd3bb2cd720ec44c
> +        .quad 0xbd55056d312f7668
> +        .quad 0xbd1a07bd8b34be7c
> +        .quad 0x3d5e83c094debc15
> +        .quad 0x3d5aa33736867a17
> +        .quad 0xbd46abb9df22bc57
> +        /*== One ==*/
> +        .align 64
> +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== AbsMask ==*/
> +        .align 64
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
> +        /*== SmallThreshold ==*/
> +        .align 64
> +        .quad 0x3f70000000000000, 0x3f70000000000000, 0x3f70000000000000, 0x3f70000000000000, 0x3f70000000000000, 0x3f70000000000000, 0x3f70000000000000, 0x3f70000000000000
> +        /*== Threshold ==*/
> +        .align 64
> +        .quad 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000, 0x5fe0000000000000
> +        /*== LargeThreshold ==*/
> +        .align 64
> +        .quad 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff, 0x7fefffffffffffff
> +        /*== ca2 ==*/
> +        .align 64
> +        .quad 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7, 0x3fb333220eaf02e7
> +        /*== ca1 ==*/
> +        .align 64
> +        .quad 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e, 0xbfc5555555521e7e
> +        /*== c4s ==*/
> +        .align 64
> +        .quad 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612, 0x3fd1800001943612
> +        /*== c3s ==*/
> +        .align 64
> +        .quad 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000, 0x3fd40000013b0000
> +        /*== c2s ==*/
> +        .align 64
> +        .quad 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000, 0x3fd8000000000000
> +        /*== c1s ==*/
> +        .align 64
> +        .quad 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000, 0x3fe0000000000000
> +        /*== AddB5 ==*/
> +        .align 64
> +        .quad 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000, 0x0000800000000000
> +        /*== RcpBitMask ==*/
> +        .align 64
> +        .quad 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000, 0xffff000000000000
> +        /*== OneEighth ==*/
> +        .align 64
> +        .quad 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000, 0x3fc0000000000000
> +        /*== Four ==*/
> +        .align 64
> +        .quad 0x4010000000000000, 0x4010000000000000, 0x4010000000000000, 0x4010000000000000, 0x4010000000000000, 0x4010000000000000, 0x4010000000000000, 0x4010000000000000
> +        /*== poly_coeff9 ==*/
> +        .align 64
> +        .quad 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368, 0xbfb9a9b040214368
> +        /*== poly_coeff8 ==*/
> +        .align 64
> +        .quad 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778, 0x3fbc80666e249778
> +        /*== poly_coeff7 ==*/
> +        .align 64
> +        .quad 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9, 0xbfbffffb8a054bc9
> +        /*== poly_coeff6 ==*/
> +        .align 64
> +        .quad 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1, 0x3fc24922f71256f1
> +        /*== poly_coeff5 ==*/
> +        .align 64
> +        .quad 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736, 0xbfc55555559ba736
> +        /*== poly_coeff4 ==*/
> +        .align 64
> +        .quad 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af, 0x3fc9999999be77af
> +        /*== poly_coeff3 ==*/
> +        .align 64
> +        .quad 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65, 0xbfcffffffffffc65
> +        /*== poly_coeff2 ==*/
> +        .align 64
> +        .quad 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1, 0x3fd55555555554c1
> +        /*== poly_coeff1 ==*/
> +        .align 64
> +        .quad 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000, 0xbfe0000000000000
> +        /*== L2H = log(2)_high ==*/
> +        .align 64
> +        .quad 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000, 0x3fe62E42FEFA0000
> +        /*== L2L = log(2)_low ==*/
> +        .align 64
> +        .quad 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000, 0x3d7cf79abc9e0000
> +        .align 64
> +        .type	__svml_dasinh_data_internal_avx512,@object
> +        .size	__svml_dasinh_data_internal_avx512,.-__svml_dasinh_data_internal_avx512
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core-avx2.S
> new file mode 100644
> index 0000000000..7dfd95e400
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized asinhf.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN16v_asinhf _ZGVeN16v_asinhf_avx2_wrapper
> +#include "../svml_s_asinhf16_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core.c
> new file mode 100644
> index 0000000000..dc770a0e65
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized asinhf, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN16v_asinhf
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN16v_asinhf, __GI__ZGVeN16v_asinhf,
> +	       __redirect__ZGVeN16v_asinhf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core_avx512.S
> new file mode 100644
> index 0000000000..fc6a8e7cd3
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf16_core_avx512.S
> @@ -0,0 +1,476 @@
> +/* Function asinhf vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute asinh(x) as log(x + sqrt(x*x + 1))
> + *   using RSQRT instructions for starting the
> + *   square root approximation, and small table lookups for log
> + *   that map to AVX-512 permute instructions
> + *
> + *   Special cases:
> + *
> + *   asinh(NaN) = quiet NaN, and raise invalid exception
> + *   asinh(INF) = that INF
> + *   asinh(0)   = that 0
> + *
> + */
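
As a reference for the algorithm description above, the scalar computation can be sketched in C roughly as follows.  This is an illustrative model only (the helper name is invented, and the vector code uses rsqrt plus table lookups rather than calling libm), but it shows the core formula and the special-case behaviour:

  #include <math.h>

  /* Scalar model: exploit asinh(-x) = -asinh(x), work on |x| and
     restore the sign at the end.  NaN propagates through sqrtf/logf,
     and +/-Inf and +/-0 map to themselves.  The vector code adds a
     separate path for very large |x|, where x*x would overflow.  */
  static float
  asinhf_model (float x)
  {
    float ax = fabsf (x);
    return copysignf (logf (ax + sqrtf (ax * ax + 1.0f)), x);
  }
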
> +
> +/* Offsets for data table __svml_sasinh_data_internal_avx512
> + */
> +#define Log_tbl_H                     	0
> +#define Log_tbl_L                     	128
> +#define One                           	256
> +#define AbsMask                       	320
> +#define SmallThreshold                	384
> +#define Threshold                     	448
> +#define LargeThreshold                	512
> +#define ca1                           	576
> +#define c2s                           	640
> +#define c1s                           	704
> +#define AddB5                         	768
> +#define RcpBitMask                    	832
> +#define OneEighth                     	896
> +#define Four                          	960
> +#define poly_coeff3                   	1024
> +#define poly_coeff2                   	1088
> +#define poly_coeff1                   	1152
> +#define L2H                           	1216
> +#define L2L                           	1280
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN16v_asinhf_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovaps   %zmm0, %zmm10
> +
> +/* x^2 */
> +        vmulps    {rn-sae}, %zmm10, %zmm10, %zmm0
> +        vmovups   One+__svml_sasinh_data_internal_avx512(%rip), %zmm2
> +
> +/* polynomial computation for small inputs */
> +        vmovups   ca1+__svml_sasinh_data_internal_avx512(%rip), %zmm1
> +
> +/* not a very small input ? */
> +        vmovups   SmallThreshold+__svml_sasinh_data_internal_avx512(%rip), %zmm11
> +
> +/* 1+x^2 */
> +        vaddps    {rn-sae}, %zmm2, %zmm0, %zmm7
> +
> +/* |input| */
> +        vandps    AbsMask+__svml_sasinh_data_internal_avx512(%rip), %zmm10, %zmm12
> +
> +/* A=max(x^2, 1); */
> +        vmaxps    {sae}, %zmm0, %zmm2, %zmm14
> +        vrsqrt14ps %zmm7, %zmm8
> +
> +/* B=min(x^2, 1); */
> +        vminps    {sae}, %zmm0, %zmm2, %zmm15
> +        vcmpps    $21, {sae}, %zmm11, %zmm12, %k2
> +
> +/* B_high */
> +        vsubps    {rn-sae}, %zmm14, %zmm7, %zmm9
> +
> +/* sign bit */
> +        vxorps    %zmm10, %zmm12, %zmm13
> +
> +/* Sh ~sqrt(1+x^2) */
> +        vmulps    {rn-sae}, %zmm8, %zmm7, %zmm6
> +        vmovups   LargeThreshold+__svml_sasinh_data_internal_avx512(%rip), %zmm14
> +
> +/* B_low */
> +        vsubps    {rn-sae}, %zmm9, %zmm15, %zmm3
> +
> +/* Sh+x */
> +        vaddps    {rn-sae}, %zmm12, %zmm6, %zmm15
> +
> +/* (Yh*R0)_low */
> +        vfmsub213ps {rn-sae}, %zmm6, %zmm8, %zmm7
> +        vmulps    {rn-sae}, %zmm1, %zmm0, %zmm9
> +        vcmpps    $22, {sae}, %zmm14, %zmm12, %k0
> +        vmovups   c1s+__svml_sasinh_data_internal_avx512(%rip), %zmm1
> +
> +/* polynomial computation for small inputs */
> +        vfmadd213ps {rn-sae}, %zmm12, %zmm12, %zmm9
> +        kmovw     %k0, %edx
> +
> +/* (x^2)_low */
> +        vmovaps   %zmm10, %zmm4
> +        vfmsub213ps {rn-sae}, %zmm0, %zmm10, %zmm4
> +
> +/* Yl = (x^2)_low + B_low */
> +        vaddps    {rn-sae}, %zmm4, %zmm3, %zmm5
> +
> +/* rel. error term: Eh=1-Sh*R0 */
> +        vmovaps   %zmm2, %zmm0
> +        vfnmadd231ps {rn-sae}, %zmm6, %zmm8, %zmm0
> +
> +/* Sl = (Yh*R0)_low+(R0*Yl) */
> +        vfmadd213ps {rn-sae}, %zmm7, %zmm8, %zmm5
> +
> +/* very large inputs ? */
> +        vmovups   Threshold+__svml_sasinh_data_internal_avx512(%rip), %zmm7
> +
> +/* rel. error term: Eh=(1-Sh*R0)-Sl*R0 */
> +        vfnmadd231ps {rn-sae}, %zmm5, %zmm8, %zmm0
> +
> +/* sqrt(1+x^2) ~ Sh + Sl + Sh*Eh*poly_s */
> +        vmovups   c2s+__svml_sasinh_data_internal_avx512(%rip), %zmm8
> +        vcmpps    $21, {sae}, %zmm7, %zmm12, %k1
> +
> +/* Sh*Eh */
> +        vmulps    {rn-sae}, %zmm0, %zmm6, %zmm4
> +        vfmadd231ps {rn-sae}, %zmm0, %zmm8, %zmm1
> +
> +/* Sl + Sh*Eh*poly_s */
> +        vfmadd213ps {rn-sae}, %zmm5, %zmm1, %zmm4
> +
> +/* Xh */
> +        vsubps    {rn-sae}, %zmm6, %zmm15, %zmm5
> +
> +/* fixup for very large inputs */
> +        vmovups   OneEighth+__svml_sasinh_data_internal_avx512(%rip), %zmm6
> +
> +/* Xin0+Sl+Sh*Eh*poly_s ~ x+sqrt(1+x^2) */
> +        vaddps    {rn-sae}, %zmm4, %zmm15, %zmm3
> +
> +/* Xl */
> +        vsubps    {rn-sae}, %zmm5, %zmm12, %zmm5
> +
> +/* Sl_high */
> +        vsubps    {rn-sae}, %zmm15, %zmm3, %zmm0
> +        vmulps    {rn-sae}, %zmm6, %zmm12, %zmm3{%k1}
> +
> +/* -K*L2H + Th */
> +        vmovups   L2H+__svml_sasinh_data_internal_avx512(%rip), %zmm15
> +
> +/* Sl_l */
> +        vsubps    {rn-sae}, %zmm0, %zmm4, %zmm1
> +        vrcp14ps  %zmm3, %zmm6
> +
> +/* Table lookups */
> +        vmovups   __svml_sasinh_data_internal_avx512(%rip), %zmm0
> +
> +/* Xin_low */
> +        vaddps    {rn-sae}, %zmm5, %zmm1, %zmm7
> +
> +/* round reciprocal to 1+4b mantissas */
> +        vpaddd    AddB5+__svml_sasinh_data_internal_avx512(%rip), %zmm6, %zmm4
> +        vmovups   poly_coeff1+__svml_sasinh_data_internal_avx512(%rip), %zmm5
> +        vandps    RcpBitMask+__svml_sasinh_data_internal_avx512(%rip), %zmm4, %zmm8
> +
> +/* fixup for very large inputs */
> +        vxorps    %zmm7, %zmm7, %zmm7{%k1}
> +
> +/* polynomial */
> +        vmovups   poly_coeff3+__svml_sasinh_data_internal_avx512(%rip), %zmm4
> +
> +/* reduced argument for log(): (Rcp*Xin-1)+Rcp*Xin_low */
> +        vfmsub231ps {rn-sae}, %zmm8, %zmm3, %zmm2
> +        vmovups   Four+__svml_sasinh_data_internal_avx512(%rip), %zmm3
> +
> +/* exponents */
> +        vgetexpps {sae}, %zmm8, %zmm1
> +
> +/* Prepare table index */
> +        vpsrld    $18, %zmm8, %zmm14
> +        vfmadd231ps {rn-sae}, %zmm8, %zmm7, %zmm2
> +        vmovups   poly_coeff2+__svml_sasinh_data_internal_avx512(%rip), %zmm7
> +        vsubps    {rn-sae}, %zmm3, %zmm1, %zmm1{%k1}
> +        vpermt2ps Log_tbl_H+64+__svml_sasinh_data_internal_avx512(%rip), %zmm14, %zmm0
> +        vmovups   Log_tbl_L+__svml_sasinh_data_internal_avx512(%rip), %zmm3
> +        vfmadd231ps {rn-sae}, %zmm2, %zmm4, %zmm7
> +        vfnmadd231ps {rn-sae}, %zmm1, %zmm15, %zmm0
> +
> +/* R^2 */
> +        vmulps    {rn-sae}, %zmm2, %zmm2, %zmm6
> +        vfmadd213ps {rn-sae}, %zmm5, %zmm2, %zmm7
> +        vpermt2ps Log_tbl_L+64+__svml_sasinh_data_internal_avx512(%rip), %zmm14, %zmm3
> +
> +/* -K*L2L + Tl */
> +        vmovups   L2L+__svml_sasinh_data_internal_avx512(%rip), %zmm14
> +        vfnmadd213ps {rn-sae}, %zmm3, %zmm14, %zmm1
> +
> +/* Tl + R^2*Poly */
> +        vfmadd213ps {rn-sae}, %zmm1, %zmm6, %zmm7
> +
> +/* R+Tl + R^2*Poly */
> +        vaddps    {rn-sae}, %zmm2, %zmm7, %zmm2
> +        vaddps    {rn-sae}, %zmm2, %zmm0, %zmm9{%k2}
> +        vxorps    %zmm13, %zmm9, %zmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm10
> +
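
The log part of the fast path above is the usual table-driven scheme: the reciprocal of x + sqrt(1+x^2) is rounded to a few mantissa bits (AddB5/RcpBitMask), the surviving mantissa bits index the 32-entry Log_tbl_H/Log_tbl_L tables through vpermt2ps, and a short polynomial covers the reduced argument.  A rough scalar sketch of the idea, with the table and polynomial replaced by logf/log1pf purely for illustration (and ignoring the very-large-input fixup):

  #include <math.h>
  #include <stdint.h>
  #include <string.h>

  /* log(y) = -log(rcp) + log(rcp*y): with rcp rounded to a few
     mantissa bits, -log(rcp) splits into n*ln2 (exponent) plus a
     small table entry (mantissa bits), and log(rcp*y) = log1p(r)
     is a short polynomial in r.  */
  static float
  log_via_table (float y)
  {
    float rcp = 1.0f / y;                       /* vrcp14ps in the asm */
    uint32_t bits;
    memcpy (&bits, &rcp, sizeof bits);
    bits = (bits + 0x00020000u) & 0xfffc0000u;  /* AddB5 + RcpBitMask */
    memcpy (&rcp, &bits, sizeof bits);
    float r = rcp * y - 1.0f;                   /* small reduced argument */
    return -logf (rcp) + log1pf (r);
  }
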
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm10, 64(%rsp)
> +        vmovups   %zmm0, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $16, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     64(%rsp,%r14,4), %xmm0
> +        call      asinhf@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 128(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN16v_asinhf_skx)
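
The special-value tail above spills the input and the partial result to the stack and then walks the range mask, calling the scalar asinhf for each lane whose bit is set and patching that lane of the result.  In C the RANGEMASK_CHECK/SCALAR_MATH_CALL loop amounts to something like this (sketch; 16 is the AVX-512 lane count, the function name is invented):

  #include <math.h>

  /* Per-lane fallback: mask has one bit per lane that hit a special
     case (NaN, Inf, |x| too large for x*x, ...).  */
  static void
  fixup_special_lanes (const float in[16], float out[16], unsigned int mask)
  {
    for (int i = 0; i < 16; i++)
      if (mask & (1u << i))
        out[i] = asinhf (in[i]);   /* the asm calls asinhf@PLT */
  }
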
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_sasinh_data_internal_avx512_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(64)) VUINT32 Log_tbl_H[32][1];
> +        __declspec(align(64)) VUINT32 Log_tbl_L[32][1];
> +        __declspec(align(64)) VUINT32 One[16][1];
> +        __declspec(align(64)) VUINT32 AbsMask[16][1];
> +        __declspec(align(64)) VUINT32 SmallThreshold[16][1];
> +        __declspec(align(64)) VUINT32 Threshold[16][1];
> +        __declspec(align(64)) VUINT32 LargeThreshold[16][1];
> +        __declspec(align(64)) VUINT32 ca1[16][1];
> +        __declspec(align(64)) VUINT32 c2s[16][1];
> +        __declspec(align(64)) VUINT32 c1s[16][1];
> +        __declspec(align(64)) VUINT32 AddB5[16][1];
> +        __declspec(align(64)) VUINT32 RcpBitMask[16][1];
> +        __declspec(align(64)) VUINT32 OneEighth[16][1];
> +        __declspec(align(64)) VUINT32 Four[16][1];
> +        __declspec(align(64)) VUINT32 poly_coeff3[16][1];
> +        __declspec(align(64)) VUINT32 poly_coeff2[16][1];
> +        __declspec(align(64)) VUINT32 poly_coeff1[16][1];
> +        __declspec(align(64)) VUINT32 L2H[16][1];
> +        __declspec(align(64)) VUINT32 L2L[16][1];
> +    } __svml_sasinh_data_internal_avx512;
> +#endif
> +__svml_sasinh_data_internal_avx512:
> +        /*== Log_tbl_H ==*/
> +        .long 0x00000000
> +        .long 0xbcfc0000
> +        .long 0xbd788000
> +        .long 0xbdb78000
> +        .long 0xbdf14000
> +        .long 0xbe14a000
> +        .long 0xbe300000
> +        .long 0xbe4aa000
> +        .long 0xbe648000
> +        .long 0xbe7dc000
> +        .long 0xbe8b4000
> +        .long 0xbe974000
> +        .long 0xbea31000
> +        .long 0xbeae9000
> +        .long 0xbeb9d000
> +        .long 0xbec4d000
> +        .long 0xbecfa000
> +        .long 0xbeda2000
> +        .long 0xbee48000
> +        .long 0xbeeea000
> +        .long 0xbef89000
> +        .long 0xbf012800
> +        .long 0xbf05f000
> +        .long 0xbf0aa800
> +        .long 0xbf0f4000
> +        .long 0xbf13c800
> +        .long 0xbf184000
> +        .long 0xbf1ca000
> +        .long 0xbf20f000
> +        .long 0xbf252800
> +        .long 0xbf295000
> +        .long 0xbf2d6800
> +        /*== Log_tbl_L ==*/
> +        .align 64
> +        .long 0x80000000
> +        .long 0xb726c39e
> +        .long 0x3839e7fe
> +        .long 0xb7528ae5
> +        .long 0x377891d5
> +        .long 0xb8297c10
> +        .long 0x37cf8f58
> +        .long 0x3852b186
> +        .long 0x35838656
> +        .long 0xb80c36af
> +        .long 0x38235454
> +        .long 0xb862bae1
> +        .long 0x37e87bc7
> +        .long 0x37848150
> +        .long 0x37202511
> +        .long 0xb74e1b05
> +        .long 0x385c1340
> +        .long 0xb8777bcd
> +        .long 0x36038656
> +        .long 0xb7d40984
> +        .long 0xb80f5faf
> +        .long 0xb8254b4c
> +        .long 0xb865c84a
> +        .long 0x37f0b42d
> +        .long 0xb83ebce1
> +        .long 0xb83c2513
> +        .long 0x37a332c4
> +        .long 0x3779654f
> +        .long 0x38602f73
> +        .long 0x367449f8
> +        .long 0xb7b4996f
> +        .long 0xb800986b
> +        /*== One ==*/
> +        .align 64
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== AbsMask ==*/
> +        .align 64
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> +        /*== SmallThreshold ==*/
> +        .align 64
> +        .long 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000, 0x3c800000
> +        /*== Threshold ==*/
> +        .align 64
> +        .long 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000, 0x5f000000
> +        /*== LargeThreshold ==*/
> +        .align 64
> +        .long 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff, 0x7f7fffff
> +        /*== ca1 ==*/
> +        .align 64
> +        .long 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE, 0xbe2AA5DE
> +        /*== c2s ==*/
> +        .align 64
> +        .long 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000, 0x3ec00000
> +        /*== c1s ==*/
> +        .align 64
> +        .long 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000
> +        /*== AddB5 ==*/
> +        .align 64
> +        .long 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000, 0x00020000
> +        /*== RcpBitMask ==*/
> +        .align 64
> +        .long 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000, 0xfffc0000
> +        /*==OneEighth ==*/
> +        .align 64
> +        .long 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000, 0x3e000000
> +        /*== Four ==*/
> +        .align 64
> +        .long 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000, 0x40800000
> +        /*== poly_coeff3 ==*/
> +        .align 64
> +        .long 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810, 0xbe800810
> +        /*== poly_coeff2 ==*/
> +        .align 64
> +        .long 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e, 0x3eaab11e
> +        /*== poly_coeff1 ==*/
> +        .align 64
> +        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000
> +        /*== L2H = log(2)_high ==*/
> +        .align 64
> +        .long 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000, 0x3f317000
> +        /*== L2L = log(2)_low ==*/
> +        .align 64
> +        .long 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4, 0x3805fdf4
> +        .align 64
> +        .type	__svml_sasinh_data_internal_avx512,@object
> +        .size	__svml_sasinh_data_internal_avx512,.-__svml_sasinh_data_internal_avx512
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core-sse2.S
> new file mode 100644
> index 0000000000..52e4d2f728
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized asinhf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN4v_asinhf _ZGVbN4v_asinhf_sse2
> +#include "../svml_s_asinhf4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core.c
> new file mode 100644
> index 0000000000..296d5754ae
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized asinhf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN4v_asinhf
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN4v_asinhf, __GI__ZGVbN4v_asinhf,
> +	       __redirect__ZGVbN4v_asinhf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core_sse4.S
> new file mode 100644
> index 0000000000..1eeeb4f5af
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf4_core_sse4.S
> @@ -0,0 +1,509 @@
> +/* Function asinhf vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute asinh(x) as log(x + sqrt(x*x + 1))
> + *
> + *   Special cases:
> + *
> + *   asinh(NaN) = quiet NaN, and raise invalid exception
> + *   asinh(INF) = that INF
> + *   asinh(0)   = that 0
> + *
> + */
> +
> +/* Offsets for data table __svml_sasinh_data_internal
> + */
> +#define SgnMask                       	0
> +#define sOne                          	16
> +#define sPoly                         	32
> +#define iBrkValue                     	160
> +#define iOffExpoMask                  	176
> +#define sBigThreshold                 	192
> +#define sC2                           	208
> +#define sC3                           	224
> +#define sHalf                         	240
> +#define sLargestFinite                	256
> +#define sLittleThreshold              	272
> +#define sSign                         	288
> +#define sThirtyOne                    	304
> +#define sTopMask11                    	320
> +#define sTopMask8                     	336
> +#define XScale                        	352
> +#define sLn2                          	368
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN4v_asinhf_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +        movaps    %xmm0, %xmm8
> +
> +/*
> + * Split X into high and low parts, XHi (<= 11 bits) and XLo (<= 13 bits)
> + * We could use either X or |X| here, but it doesn't seem to matter
> + */
> +        movups    sTopMask11+__svml_sasinh_data_internal(%rip), %xmm10
> +        movaps    %xmm8, %xmm2
> +        andps     %xmm8, %xmm10
> +
> +/*
> + * Compute X^2 = (XHi + XLo)^2 = XHi^2 + XLo * (X + XHi)
> + * The two parts are shifted off by around 11 bits. So even though
> + * the low bit will not in general be exact, it's near enough
> + */
> +        movaps    %xmm10, %xmm3
> +        subps     %xmm10, %xmm2
> +        mulps     %xmm10, %xmm3
> +        addps     %xmm8, %xmm10
> +
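The two comments above describe the classic split-and-square step used when FMA is not available.  A scalar sketch (mask value taken from sTopMask11 in the data table below; helper name invented):

  #include <stdint.h>
  #include <string.h>

  /* XHi keeps only the top mantissa bits (sTopMask11 = 0xFFFFE000),
     so XHi*XHi is exact in single precision; the rest of x*x is
     XLo * (x + XHi), since (XHi + XLo)^2 = XHi^2 + XLo*(x + XHi).  */
  static void
  square_split (float x, float *hi, float *lo)
  {
    uint32_t bits;
    float xhi, xlo;
    memcpy (&bits, &x, sizeof bits);
    bits &= 0xffffe000u;          /* drop the low 13 mantissa bits */
    memcpy (&xhi, &bits, sizeof bits);
    xlo = x - xhi;
    *hi = xhi * xhi;              /* exact */
    *lo = xlo * (x + xhi);        /* x*x ~= *hi + *lo */
  }
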
> +/* Load the constant 1 and a sign mask */
> +        movups    sOne+__svml_sasinh_data_internal(%rip), %xmm7
> +
> +/*
> + * Finally, express Y + W = X^2 + 1 accurately where Y has <= 8 bits.
> + * If |X| <= 1 then |XHi| <= 1 and so |X2Hi| <= 1, so we can treat 1
> + * as the dominant component in the compensated summation. Otherwise,
> + * if |X| >= 1, then since X2Hi only has 22 significant bits, the basic
> + * addition will be exact anyway until we get to |X| >= 2^24. But by
> + * that time the log function is well-conditioned enough that the
> + * rounding error doesn't matter. Hence we can treat 1 as dominant even
> + * if it literally isn't.
> + */
> +        movaps    %xmm7, %xmm11
> +        movaps    %xmm7, %xmm4
> +        movups    sTopMask8+__svml_sasinh_data_internal(%rip), %xmm12
> +        addps     %xmm3, %xmm11
> +        mulps     %xmm10, %xmm2
> +        subps     %xmm11, %xmm4
> +        movaps    %xmm12, %xmm0
> +        addps     %xmm3, %xmm4
> +
> +/*
> + * Unfortunately, we can still be in trouble if |X| <= 2^-5, since
> + * the absolute error 2^-(7+24)-ish in sqrt(1 + X^2) gets scaled up
> + * by 1/X and comes close to our threshold. Hence if |X| <= 2^-4,
> + * perform an alternative computation
> + * sqrt(1 + X^2) - 1 = X^2/2 - X^4/8 + X^6/16
> + * X2 = X^2
> + */
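
The alternative small-input computation mentioned above is a short series; a scalar sketch that mirrors the sX2over2/sX4over4/sX46over2 temporaries used further down (helper name invented):

  /* sqrt(1 + x^2) - 1 ~= x^2/2 - x^4/8 + x^6/16 for |x| <= 2^-4.  */
  static float
  sqrt1px2m1_small (float x)
  {
    float x2over2 = 0.5f * x * x;             /* x^2/2 */
    float x4over4 = x2over2 * x2over2;        /* x^4/4 */
    float x46 = x4over4 * x2over2 - x4over4;  /* -x^4/4 + x^6/8 */
    return x2over2 + 0.5f * x46;              /* x^2/2 - x^4/8 + x^6/16 */
  }
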
> +        addps     %xmm2, %xmm3
> +        addps     %xmm2, %xmm4
> +        andps     %xmm11, %xmm0
> +
> +/*
> + * Compute R = 1/sqrt(Y + W) * (1 + d)
> + * Force R to <= 8 significant bits.
> + * This means that R * Y and R^2 * Y are exactly representable.
> + */
> +        rsqrtps   %xmm0, %xmm14
> +        subps     %xmm0, %xmm11
> +        andps     %xmm12, %xmm14
> +        addps     %xmm11, %xmm4
> +
> +/*
> + * Compute S = (Y/sqrt(Y + W)) * (1 + d)
> + * and T = (W/sqrt(Y + W)) * (1 + d)
> + * so that S + T = sqrt(Y + W) * (1 + d)
> + * S is exact, and the rounding error in T is OK.
> + */
> +        mulps     %xmm14, %xmm0
> +        mulps     %xmm14, %xmm4
> +
> +/*
> + * Get the absolute value of the input, since we will exploit antisymmetry
> + * and mostly assume X >= 0 in the core computation
> + */
> +        movups    SgnMask+__svml_sasinh_data_internal(%rip), %xmm6
> +
> +/*
> + * Compute e = -(2 * d + d^2)
> + * The first FMR is exact, and the rounding error in the other is acceptable
> + * since d and e are ~ 2^-8
> + */
> +        movaps    %xmm14, %xmm13
> +        andps     %xmm8, %xmm6
> +
> +/*
> + * Obtain sqrt(1 + X^2) - 1 in two pieces
> + * sqrt(1 + X^2) - 1
> + * = sqrt(Y + W) - 1
> + * = (S + T) * (1 + Corr) - 1
> + * = [S - 1] + [T + (S + T) * Corr]
> + * We need a compensated summation for the last part. We treat S - 1
> + * as the larger part; it certainly is until about X < 2^-4, and in that
> + * case, the error is affordable since X dominates over sqrt(1 + X^2) - 1
> + * Final sum is dTmp5 (hi) + dTmp7 (lo)
> + */
> +        movaps    %xmm0, %xmm1
> +
> +/*
> + * Check whether the input is finite, by checking |X| <= MaxFloat
> + * Otherwise set the rangemask so that the callout will get used.
> + * Note that this will also use the callout for NaNs since not(NaN <= MaxFloat)
> + */
> +        movaps    %xmm6, %xmm9
> +
> +/*
> + * The following computation can go wrong for very large X, basically
> + * because X^2 overflows. But for large X we have
> + * asinh(X) / log(2 X) - 1 =~= 1/(4 * X^2), so for X >= 2^30
> + * we can just later stick X back into the log and tweak up the exponent.
> + * Actually we scale X by 2^-30 and tweak the exponent up by 31,
> + * to stay in the safe range for the later log computation.
> + * Compute a flag now telling us when to do this.
> + */
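
In other words, for |X| >= 2^30 the code ends up computing log(2*X): X is scaled by XScale = 2^-30 before the log, and sThirtyOne = 31 is added to the exponent term afterwards.  As a scalar identity (sketch):

  #include <math.h>

  /* For x >= 2^30:  asinh(x) ~= log(2*x) = log(x * 2^-30) + 31*ln(2).  */
  static float
  asinhf_big (float x)
  {
    const float ln2 = 0.693147181f;
    return logf (x * 0x1p-30f) + 31.0f * ln2;
  }
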
> +        movaps    %xmm6, %xmm5
> +        cmpnleps  sLargestFinite+__svml_sasinh_data_internal(%rip), %xmm9
> +        cmpltps   sBigThreshold+__svml_sasinh_data_internal(%rip), %xmm5
> +        mulps     %xmm0, %xmm13
> +        addps     %xmm4, %xmm1
> +        subps     %xmm7, %xmm0
> +        mulps     %xmm4, %xmm14
> +        movmskps  %xmm9, %edx
> +        movaps    %xmm7, %xmm9
> +
> +/*
> + * Now       1 / (1 + d)
> + * = 1 / (1 + (sqrt(1 - e) - 1))
> + * = 1 / sqrt(1 - e)
> + * = 1 + 1/2 * e + 3/8 * e^2 + 5/16 * e^3 + 35/128 * e^4 + ...
> + * So compute the first three nonconstant terms of that, so that
> + * we have a relative correction (1 + Corr) to apply to S etc.
> + * C1 = 1/2
> + * C2 = 3/8
> + * C3 = 5/16
> + */
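
Putting the last few comments together: the rsqrt estimate R ~ (1+d)/sqrt(Y+W) is truncated to 8 significant bits, e = 1 - R*(S+T) = -(2d + d^2) is accumulated, and the series above turns e into the relative correction 1/(1+d).  A scalar sketch of the refinement (helper name invented; 1/sqrtf stands in for the hardware rsqrt estimate):

  #include <math.h>
  #include <stdint.h>
  #include <string.h>

  /* With r = (1+d)/sqrt(y):  s = r*y = sqrt(y)*(1+d),
     e = 1 - r*s = -(2d + d^2), and
     1/(1+d) = 1/sqrt(1-e) ~= 1 + e/2 + 3/8 e^2 + 5/16 e^3.  */
  static float
  sqrt_refine (float y)
  {
    float r = 1.0f / sqrtf (y);   /* stand-in for the rsqrt estimate */
    uint32_t b;
    memcpy (&b, &r, sizeof b);
    b &= 0xffff0000u;             /* sTopMask8: keep 8 significant bits */
    memcpy (&r, &b, sizeof b);
    float s = r * y;              /* ~ sqrt(y)*(1+d) */
    float e = 1.0f - r * s;       /* -(2d + d^2) */
    float corr = e * (0.5f + e * (0.375f + e * 0.3125f));
    return s * (1.0f + corr);
  }
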
> +        movups    sC3+__svml_sasinh_data_internal(%rip), %xmm15
> +        subps     %xmm13, %xmm9
> +        movups    sHalf+__svml_sasinh_data_internal(%rip), %xmm10
> +        subps     %xmm14, %xmm9
> +
> +/* sX2over2 = X^2/2 */
> +        mulps     %xmm10, %xmm3
> +        mulps     %xmm9, %xmm15
> +
> +/* sX46 = -X^4/4 + X^6/8 */
> +        movaps    %xmm3, %xmm2
> +        movaps    %xmm3, %xmm12
> +
> +/*
> + * Now do another compensated sum to add |X| + [sqrt(1 + X^2) - 1].
> + * It's always safe to assume |X| is larger.
> + * This is the final 2-part argument to the log1p function
> + */
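
The compensated additions used here (and in the sqrt reconstruction above) are the standard Fast2Sum step, keeping hi/lo pairs that eventually feed the log1p stage.  Sketch:

  /* Fast2Sum: for |a| >= |b|, return the rounded sum and its exact
     rounding error, so the low part can be carried along.  */
  static void
  fast_two_sum (float a, float b, float *hi, float *lo)
  {
    float s = a + b;
    *hi = s;
    *lo = b - (s - a);
  }
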
> +        movaps    %xmm6, %xmm14
> +        addps     sC2+__svml_sasinh_data_internal(%rip), %xmm15
> +        mulps     %xmm9, %xmm15
> +        addps     %xmm10, %xmm15
> +        mulps     %xmm15, %xmm9
> +        mulps     %xmm1, %xmm9
> +
> +/* Now multiplex to the case X = 2^-30 * input, Xl = sL = 0 in the "big" case. */
> +        movups    XScale+__svml_sasinh_data_internal(%rip), %xmm15
> +        addps     %xmm9, %xmm4
> +        movaps    %xmm4, %xmm11
> +        addps     %xmm0, %xmm11
> +        subps     %xmm11, %xmm0
> +        addps     %xmm0, %xmm4
> +
> +/* sX4over4 = X^4/4 */
> +        movaps    %xmm3, %xmm0
> +        mulps     %xmm3, %xmm0
> +        mulps     %xmm0, %xmm2
> +        subps     %xmm0, %xmm2
> +
> +/*
> + * Now we feed into the log1p code, using H in place of _VARG1 and
> + * also adding L into Xl.
> + * compute 1+x as high, low parts
> + */
> +        movaps    %xmm7, %xmm0
> +
> +/* sX46over2 = -X^4/8 + X^6/16 */
> +        mulps     %xmm2, %xmm10
> +        movaps    %xmm7, %xmm2
> +        addps     %xmm10, %xmm12
> +        subps     %xmm12, %xmm3
> +        addps     %xmm3, %xmm10
> +
> +/* Now multiplex the two possible computations */
> +        movaps    %xmm6, %xmm3
> +        cmpleps   sLittleThreshold+__svml_sasinh_data_internal(%rip), %xmm3
> +        movaps    %xmm3, %xmm13
> +        andps     %xmm3, %xmm12
> +        andnps    %xmm11, %xmm13
> +        movaps    %xmm3, %xmm1
> +        orps      %xmm12, %xmm13
> +        andnps    %xmm4, %xmm1
> +        andps     %xmm3, %xmm10
> +        movaps    %xmm6, %xmm4
> +        orps      %xmm10, %xmm1
> +        addps     %xmm13, %xmm14
> +        mulps     %xmm15, %xmm6
> +        maxps     %xmm14, %xmm0
> +        minps     %xmm14, %xmm2
> +        subps     %xmm14, %xmm4
> +        movaps    %xmm0, %xmm3
> +        addps     %xmm4, %xmm13
> +        addps     %xmm2, %xmm3
> +        addps     %xmm13, %xmm1
> +        subps     %xmm3, %xmm0
> +        movaps    %xmm5, %xmm4
> +        andps     %xmm5, %xmm3
> +        andnps    %xmm6, %xmm4
> +        addps     %xmm0, %xmm2
> +
> +/*
> + * Now resume the main code.
> + * reduction: compute r,n
> + */
> +        movdqu    iBrkValue+__svml_sasinh_data_internal(%rip), %xmm6
> +        orps      %xmm3, %xmm4
> +        psubd     %xmm6, %xmm4
> +        movaps    %xmm7, %xmm0
> +        addps     %xmm2, %xmm1
> +        movdqu    iOffExpoMask+__svml_sasinh_data_internal(%rip), %xmm2
> +        pand      %xmm4, %xmm2
> +        psrad     $23, %xmm4
> +        cvtdq2ps  %xmm4, %xmm3
> +        pslld     $23, %xmm4
> +        andps     %xmm5, %xmm1
> +        paddd     %xmm6, %xmm2
> +        psubd     %xmm4, %xmm0
> +        mulps     %xmm0, %xmm1
> +
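The integer manipulation above is the usual logf/log1pf reduction around iBrkValue = 2/3: subtract the bit pattern of 2/3, read the exponent difference as n, and reattach the mantissa to 2/3 so the reduced value m lies in [2/3, 4/3).  A scalar sketch (constants are the iBrkValue/iOffExpoMask entries from the data table; helper name invented):

  #include <stdint.h>
  #include <string.h>

  /* Write y = m * 2^n with m in [2/3, 4/3), so that
     log(y) = n*ln2 + log(m) and r = m - 1 is small.  */
  static void
  log_reduce (float y, float *r, int *n)
  {
    const uint32_t brk = 0x3f2aaaabu;   /* bits of 2/3 (iBrkValue) */
    uint32_t u, d;
    float m;
    memcpy (&u, &y, sizeof u);
    d = u - brk;
    *n = (int32_t) d >> 23;             /* n such that y = m * 2^n */
    u = (d & 0x007fffffu) + brk;        /* iOffExpoMask, rebase onto 2/3 */
    memcpy (&m, &u, sizeof u);
    *r = m - 1.0f;
  }
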
> +/* polynomial evaluation */
> +        subps     %xmm7, %xmm2
> +        movups    sPoly+112+__svml_sasinh_data_internal(%rip), %xmm7
> +        addps     %xmm2, %xmm1
> +        mulps     %xmm1, %xmm7
> +        movaps    %xmm5, %xmm2
> +
> +/* Add 31 to the exponent in the "large" case to get log(2 * input) */
> +        movups    sThirtyOne+__svml_sasinh_data_internal(%rip), %xmm0
> +        addps     sPoly+96+__svml_sasinh_data_internal(%rip), %xmm7
> +        addps     %xmm3, %xmm0
> +        mulps     %xmm1, %xmm7
> +        andnps    %xmm0, %xmm2
> +        andps     %xmm5, %xmm3
> +        orps      %xmm3, %xmm2
> +        addps     sPoly+80+__svml_sasinh_data_internal(%rip), %xmm7
> +
> +/* final reconstruction */
> +        mulps     sLn2+__svml_sasinh_data_internal(%rip), %xmm2
> +        mulps     %xmm1, %xmm7
> +
> +/* Finally, reincorporate the original sign. */
> +        movups    sSign+__svml_sasinh_data_internal(%rip), %xmm0
> +        andps     %xmm8, %xmm0
> +        addps     sPoly+64+__svml_sasinh_data_internal(%rip), %xmm7
> +        mulps     %xmm1, %xmm7
> +        addps     sPoly+48+__svml_sasinh_data_internal(%rip), %xmm7
> +        mulps     %xmm1, %xmm7
> +        addps     sPoly+32+__svml_sasinh_data_internal(%rip), %xmm7
> +        mulps     %xmm1, %xmm7
> +        addps     sPoly+16+__svml_sasinh_data_internal(%rip), %xmm7
> +        mulps     %xmm1, %xmm7
> +        addps     sPoly+__svml_sasinh_data_internal(%rip), %xmm7
> +        mulps     %xmm1, %xmm7
> +        mulps     %xmm1, %xmm7
> +        addps     %xmm7, %xmm1
> +        addps     %xmm2, %xmm1
> +        pxor      %xmm1, %xmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm8
> +
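The long mulps/addps chain through the sPoly coefficients above is a plain Horner evaluation of log(1+r) ~= r + r^2*P(r) (eight coefficients P0..P7, listed in the data table below), after which n*ln2 is added and the original sign re-applied.  Sketch of the polynomial part:

  /* Horner evaluation matching the sPoly[] chain:
     log(1 + r) ~= r + r*r * (P0 + r*(P1 + ... + r*P7)).  */
  static float
  log1p_poly (float r, const float p[8])   /* p[i] = Pi from sPoly */
  {
    float t = p[7];
    for (int i = 6; i >= 0; i--)
      t = t * r + p[i];
    return r + r * r * t;
  }
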
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm8, 32(%rsp)
> +        movups    %xmm0, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      asinhf@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 48(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN4v_asinhf_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_sasinh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(16)) VUINT32 SgnMask[4][1];
> +        __declspec(align(16)) VUINT32 sOne[4][1];
> +        __declspec(align(16)) VUINT32 sPoly[8][4][1];
> +        __declspec(align(16)) VUINT32 iBrkValue[4][1];
> +        __declspec(align(16)) VUINT32 iOffExpoMask[4][1];
> +        __declspec(align(16)) VUINT32 sBigThreshold[4][1];
> +        __declspec(align(16)) VUINT32 sC2[4][1];
> +        __declspec(align(16)) VUINT32 sC3[4][1];
> +        __declspec(align(16)) VUINT32 sHalf[4][1];
> +        __declspec(align(16)) VUINT32 sLargestFinite[4][1];
> +        __declspec(align(16)) VUINT32 sLittleThreshold[4][1];
> +        __declspec(align(16)) VUINT32 sSign[4][1];
> +        __declspec(align(16)) VUINT32 sThirtyOne[4][1];
> +        __declspec(align(16)) VUINT32 sTopMask11[4][1];
> +        __declspec(align(16)) VUINT32 sTopMask8[4][1];
> +        __declspec(align(16)) VUINT32 XScale[4][1];
> +        __declspec(align(16)) VUINT32 sLn2[4][1];
> +} __svml_sasinh_data_internal;
> +#endif
> +__svml_sasinh_data_internal:
> +        /*== SgnMask ==*/
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> +        /*== sOne = SP 1.0 ==*/
> +        .align 16
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== sPoly[] = SP polynomial ==*/
> +        .align 16
> +        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
> +        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
> +        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
> +        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
> +        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
> +        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
> +        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
> +        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
> +        /*== iBrkValue = SP 2/3 ==*/
> +        .align 16
> +        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
> +        /*== iOffExpoMask = SP significand mask ==*/
> +        .align 16
> +        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
> +        /*== sBigThreshold ==*/
> +        .align 16
> +        .long 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000
> +        /*== sC2 ==*/
> +        .align 16
> +        .long 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000
> +        /*== sC3 ==*/
> +        .align 16
> +        .long 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000
> +        /*== sHalf ==*/
> +        .align 16
> +        .long 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000
> +        /*== sLargestFinite ==*/
> +        .align 16
> +        .long 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF
> +        /*== sLittleThreshold ==*/
> +        .align 16
> +        .long 0x3D800000, 0x3D800000, 0x3D800000, 0x3D800000
> +        /*== sSign ==*/
> +        .align 16
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000
> +        /*== sThirtyOne ==*/
> +        .align 16
> +        .long 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000
> +        /*== sTopMask11 ==*/
> +        .align 16
> +        .long 0xFFFFE000, 0xFFFFE000, 0xFFFFE000, 0xFFFFE000
> +        /*== sTopMask8 ==*/
> +        .align 16
> +        .long 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000
> +        /*== XScale ==*/
> +        .align 16
> +        .long 0x30800000, 0x30800000, 0x30800000, 0x30800000
> +        /*== sLn2 = SP ln(2) ==*/
> +        .align 16
> +        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
> +        .align 16
> +        .type	__svml_sasinh_data_internal,@object
> +        .size	__svml_sasinh_data_internal,.-__svml_sasinh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core-sse.S
> new file mode 100644
> index 0000000000..1a0e113e94
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized asinhf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN8v_asinhf _ZGVdN8v_asinhf_sse_wrapper
> +#include "../svml_s_asinhf8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core.c
> new file mode 100644
> index 0000000000..d97097a394
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized asinhf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN8v_asinhf
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN8v_asinhf, __GI__ZGVdN8v_asinhf,
> +	       __redirect__ZGVdN8v_asinhf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core_avx2.S
> new file mode 100644
> index 0000000000..a966f53773
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_asinhf8_core_avx2.S
> @@ -0,0 +1,457 @@
> +/* Function asinhf vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Compute asinh(x) as log(x + sqrt(x*x + 1))
> + *
> + *   Special cases:
> + *
> + *   asinh(NaN) = quiet NaN, and raise invalid exception
> + *   asinh(INF) = that INF
> + *   asinh(0)   = that 0
> + *
> + */
> +
> +/* Offsets for data table __svml_sasinh_data_internal
> + */
> +#define SgnMask                       	0
> +#define sOne                          	32
> +#define sPoly                         	64
> +#define iBrkValue                     	320
> +#define iOffExpoMask                  	352
> +#define sBigThreshold                 	384
> +#define sC2                           	416
> +#define sC3                           	448
> +#define sHalf                         	480
> +#define sLargestFinite                	512
> +#define sLittleThreshold              	544
> +#define sSign                         	576
> +#define sThirtyOne                    	608
> +#define sTopMask8                     	640
> +#define XScale                        	672
> +#define sLn2                          	704
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN8v_asinhf_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +        vmovaps   %ymm0, %ymm9
> +
> +/* Load the constant 1 and a sign mask */
> +        vmovups   sOne+__svml_sasinh_data_internal(%rip), %ymm8
> +
> +/* No need to split X when FMA is available in hardware. */
> +        vmulps    %ymm9, %ymm9, %ymm5
> +        vmovups   sTopMask8+__svml_sasinh_data_internal(%rip), %ymm1
> +
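With FMA the exact low part of x*x falls out of a single fused operation, which is why the AVX2 (and AVX-512) variants can skip the sTopMask11 splitting the SSE4 version needs.  Sketch:

  #include <math.h>

  /* TwoProduct via FMA: hi is the rounded x*x, lo its exact rounding
     error, so hi + lo equals x*x exactly.  */
  static void
  square_fma (float x, float *hi, float *lo)
  {
    *hi = x * x;
    *lo = fmaf (x, x, -*hi);
  }
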
> +/*
> + * Finally, express Y + W = X^2 + 1 accurately where Y has <= 8 bits.
> + * If |X| <= 1 then |XHi| <= 1 and so |X2Hi| <= 1, so we can treat 1
> + * as the dominant component in the compensated summation. Otherwise,
> + * if |X| >= 1, then since X2Hi only has 22 significant bits, the basic
> + * addition will be exact anyway until we get to |X| >= 2^24. But by
> + * that time the log function is well-conditioned enough that the
> + * rounding error doesn't matter. Hence we can treat 1 as dominant even
> + * if it literally isn't.
> + */
> +        vaddps    %ymm5, %ymm8, %ymm13
> +        vandps    %ymm1, %ymm13, %ymm2
> +        vmovaps   %ymm9, %ymm4
> +        vsubps    %ymm13, %ymm8, %ymm11
> +        vsubps    %ymm2, %ymm13, %ymm15
> +
> +/*
> + * Compute R = 1/sqrt(Y + W) * (1 + d)
> + * Force R to <= 8 significant bits.
> + * This means that R * Y and R^2 * Y are exactly representable.
> + */
> +        vrsqrtps  %ymm2, %ymm0
> +        vfmsub213ps %ymm5, %ymm9, %ymm4
> +        vaddps    %ymm11, %ymm5, %ymm12
> +
> +/*
> + * Get the absolute value of the input, since we will exploit antisymmetry
> + * and mostly assume X >= 0 in the core computation
> + */
> +        vandps    SgnMask+__svml_sasinh_data_internal(%rip), %ymm9, %ymm6
> +
> +/*
> + * Check whether the input is finite, by checking |X| <= MaxFloat
> + * Otherwise set the rangemask so that the callout will get used.
> + * Note that this will also use the callout for NaNs since not(NaN <= MaxFloat)
> + */
> +        vcmpnle_uqps sLargestFinite+__svml_sasinh_data_internal(%rip), %ymm6, %ymm10
> +        vaddps    %ymm12, %ymm4, %ymm14
> +
> +/*
> + * Unfortunately, we can still be in trouble if |X| <= 2^-5, since
> + * the absolute error 2^-(7+24)-ish in sqrt(1 + X^2) gets scaled up
> + * by 1/X and comes close to our threshold. Hence if |X| <= 2^-4,
> + * perform an alternative computation
> + * sqrt(1 + X^2) - 1 = X^2/2 - X^4/8 + X^6/16
> + * X2 = X^2
> + */
> +        vaddps    %ymm4, %ymm5, %ymm4
> +
> +/*
> + * The following computation can go wrong for very large X, basically
> + * because X^2 overflows. But for large X we have
> + * asinh(X) / log(2 X) - 1 =~= 1/(4 * X^2), so for X >= 2^30
> + * we can just later stick X back into the log and tweak up the exponent.
> + * Actually we scale X by 2^-30 and tweak the exponent up by 31,
> + * to stay in the safe range for the later log computation.
> + * Compute a flag now telling us when to do this.
> + */
> +        vcmplt_oqps sBigThreshold+__svml_sasinh_data_internal(%rip), %ymm6, %ymm7
> +        vaddps    %ymm15, %ymm14, %ymm3
> +
> +/*
> + * Now       1 / (1 + d)
> + * = 1 / (1 + (sqrt(1 - e) - 1))
> + * = 1 / sqrt(1 - e)
> + * = 1 + 1/2 * e + 3/8 * e^2 + 5/16 * e^3 + 35/128 * e^4 + ...
> + * So compute the first three nonconstant terms of that, so that
> + * we have a relative correction (1 + Corr) to apply to S etc.
> + * C1 = 1/2
> + * C2 = 3/8
> + * C3 = 5/16
> + */
> +        vmovups   sC3+__svml_sasinh_data_internal(%rip), %ymm12
> +        vmovmskps %ymm10, %edx
> +        vandps    %ymm1, %ymm0, %ymm10
> +
> +/*
> + * Compute S = (Y/sqrt(Y + W)) * (1 + d)
> + * and T = (W/sqrt(Y + W)) * (1 + d)
> + * so that S + T = sqrt(Y + W) * (1 + d)
> + * S is exact, and the rounding error in T is OK.
> + */
> +        vmulps    %ymm10, %ymm2, %ymm15
> +        vmulps    %ymm3, %ymm10, %ymm14
> +        vmovups   sHalf+__svml_sasinh_data_internal(%rip), %ymm3
> +        vsubps    %ymm8, %ymm15, %ymm0
> +
> +/*
> + * Obtain sqrt(1 + X^2) - 1 in two pieces
> + * sqrt(1 + X^2) - 1
> + * = sqrt(Y + W) - 1
> + * = (S + T) * (1 + Corr) - 1
> + * = [S - 1] + [T + (S + T) * Corr]
> + * We need a compensated summation for the last part. We treat S - 1
> + * as the larger part; it certainly is until about X < 2^-4, and in that
> + * case, the error is affordable since X dominates over sqrt(1 + X^2) - 1
> + * Final sum is dTmp5 (hi) + dTmp7 (lo)
> + */
> +        vaddps    %ymm14, %ymm15, %ymm13
> +
> +/*
> + * Compute e = -(2 * d + d^2)
> + * The first FMR is exact, and the rounding error in the other is acceptable
> + * since d and e are ~ 2^-8
> + */
> +        vmovaps   %ymm8, %ymm11
> +        vfnmadd231ps %ymm15, %ymm10, %ymm11
> +        vfnmadd231ps %ymm14, %ymm10, %ymm11
> +        vfmadd213ps sC2+__svml_sasinh_data_internal(%rip), %ymm11, %ymm12
> +        vfmadd213ps %ymm3, %ymm11, %ymm12
> +        vmulps    %ymm12, %ymm11, %ymm1
> +
> +/* Now multiplex the two possible computations */
> +        vcmple_oqps sLittleThreshold+__svml_sasinh_data_internal(%rip), %ymm6, %ymm11
> +        vfmadd213ps %ymm14, %ymm13, %ymm1
> +        vaddps    %ymm0, %ymm1, %ymm2
> +        vsubps    %ymm2, %ymm0, %ymm10
> +
> +/* sX2over2 = X^2/2 */
> +        vmulps    %ymm4, %ymm3, %ymm0
> +        vaddps    %ymm10, %ymm1, %ymm1
> +
> +/* sX4over4 = X^4/4 */
> +        vmulps    %ymm0, %ymm0, %ymm5
> +
> +/* sX46 = -X^4/4 + X^6/8 */
> +        vfmsub231ps %ymm0, %ymm5, %ymm5
> +
> +/* sX46over2 = -X^4/8 + X^6/16 */
> +        vmulps    %ymm5, %ymm3, %ymm3
> +        vaddps    %ymm3, %ymm0, %ymm5
> +        vblendvps %ymm11, %ymm5, %ymm2, %ymm2
> +        vsubps    %ymm5, %ymm0, %ymm4
> +
> +/*
> + * Now do another compensated sum to add |X| + [sqrt(1 + X^2) - 1].
> + * It's always safe to assume |X| is larger.
> + * This is the final 2-part argument to the log1p function
> + */
> +        vaddps    %ymm2, %ymm6, %ymm14
> +
> +/*
> + * Now resume the main code.
> + * reduction: compute r,n
> + */
> +        vmovups   iBrkValue+__svml_sasinh_data_internal(%rip), %ymm5
> +        vaddps    %ymm4, %ymm3, %ymm10
> +
> +/*
> + * Now we feed into the log1p code, using H in place of _VARG1 and
> + * also adding L into Xl.
> + * compute 1+x as high, low parts
> + */
> +        vmaxps    %ymm14, %ymm8, %ymm15
> +        vminps    %ymm14, %ymm8, %ymm0
> +        vblendvps %ymm11, %ymm10, %ymm1, %ymm12
> +        vsubps    %ymm14, %ymm6, %ymm1
> +        vaddps    %ymm0, %ymm15, %ymm3
> +
> +/* Now multiplex to the case X = 2^-30 * input, Xl = sL = 0 in the "big" case. */
> +        vmulps    XScale+__svml_sasinh_data_internal(%rip), %ymm6, %ymm6
> +        vaddps    %ymm1, %ymm2, %ymm13
> +        vsubps    %ymm3, %ymm15, %ymm15
> +        vaddps    %ymm13, %ymm12, %ymm1
> +        vaddps    %ymm15, %ymm0, %ymm2
> +        vblendvps %ymm7, %ymm3, %ymm6, %ymm0
> +        vaddps    %ymm2, %ymm1, %ymm4
> +        vpsubd    %ymm5, %ymm0, %ymm1
> +        vpsrad    $23, %ymm1, %ymm6
> +        vpand     iOffExpoMask+__svml_sasinh_data_internal(%rip), %ymm1, %ymm2
> +        vmovups   sPoly+224+__svml_sasinh_data_internal(%rip), %ymm1
> +        vpslld    $23, %ymm6, %ymm10
> +        vpaddd    %ymm5, %ymm2, %ymm13
> +        vcvtdq2ps %ymm6, %ymm0
> +        vpsubd    %ymm10, %ymm8, %ymm12
> +
> +/* polynomial evaluation */
> +        vsubps    %ymm8, %ymm13, %ymm8
> +
> +/* Add 31 to the exponent in the "large" case to get log(2 * input) */
> +        vaddps    sThirtyOne+__svml_sasinh_data_internal(%rip), %ymm0, %ymm3
> +        vandps    %ymm7, %ymm4, %ymm11
> +        vmulps    %ymm12, %ymm11, %ymm14
> +        vblendvps %ymm7, %ymm0, %ymm3, %ymm0
> +        vaddps    %ymm8, %ymm14, %ymm2
> +        vfmadd213ps sPoly+192+__svml_sasinh_data_internal(%rip), %ymm2, %ymm1
> +        vfmadd213ps sPoly+160+__svml_sasinh_data_internal(%rip), %ymm2, %ymm1
> +        vfmadd213ps sPoly+128+__svml_sasinh_data_internal(%rip), %ymm2, %ymm1
> +        vfmadd213ps sPoly+96+__svml_sasinh_data_internal(%rip), %ymm2, %ymm1
> +        vfmadd213ps sPoly+64+__svml_sasinh_data_internal(%rip), %ymm2, %ymm1
> +        vfmadd213ps sPoly+32+__svml_sasinh_data_internal(%rip), %ymm2, %ymm1
> +        vfmadd213ps sPoly+__svml_sasinh_data_internal(%rip), %ymm2, %ymm1
> +        vmulps    %ymm1, %ymm2, %ymm4
> +        vfmadd213ps %ymm2, %ymm2, %ymm4
> +
> +/* final reconstruction */
> +        vfmadd132ps sLn2+__svml_sasinh_data_internal(%rip), %ymm4, %ymm0
> +
> +/* Finally, reincorporate the original sign. */
> +        vandps    sSign+__svml_sasinh_data_internal(%rip), %ymm9, %ymm7
> +        vxorps    %ymm0, %ymm7, %ymm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm9
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %ymm9, 32(%rsp)
> +        vmovups   %ymm0, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx ymm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      asinhf@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 64(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN8v_asinhf_avx2)
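
The special-values path above reads more easily if you keep in mind that
32(%rsp)/64(%rsp) hold the spilled input (%ymm9) and result (%ymm0)
vectors and %edx the per-lane range mask.  Per lane it boils down to the
sketch below (my naming, not the patch's):

#include <math.h>

/* One scalar fallback call per flagged lane, mirroring the
   RANGEMASK_CHECK / SCALAR_MATH_CALL blocks above.  */
static void
fixup_special_lanes (const float in[8], float out[8], unsigned int mask)
{
  for (int i = 0; i < 8; i++)
    if (mask & (1u << i))
      out[i] = asinhf (in[i]);
}

The fixed-up vector is then reloaded from 64(%rsp) before the normal exit.
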
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_sasinh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct {
> +        __declspec(align(32)) VUINT32 SgnMask[8][1];
> +        __declspec(align(32)) VUINT32 sOne[8][1];
> +        __declspec(align(32)) VUINT32 sPoly[8][8][1];
> +        __declspec(align(32)) VUINT32 iBrkValue[8][1];
> +        __declspec(align(32)) VUINT32 iOffExpoMask[8][1];
> +        __declspec(align(32)) VUINT32 sBigThreshold[8][1];
> +        __declspec(align(32)) VUINT32 sC2[8][1];
> +        __declspec(align(32)) VUINT32 sC3[8][1];
> +        __declspec(align(32)) VUINT32 sHalf[8][1];
> +        __declspec(align(32)) VUINT32 sLargestFinite[8][1];
> +        __declspec(align(32)) VUINT32 sLittleThreshold[8][1];
> +        __declspec(align(32)) VUINT32 sSign[8][1];
> +        __declspec(align(32)) VUINT32 sThirtyOne[8][1];
> +        __declspec(align(32)) VUINT32 sTopMask8[8][1];
> +        __declspec(align(32)) VUINT32 XScale[8][1];
> +        __declspec(align(32)) VUINT32 sLn2[8][1];
> +} __svml_sasinh_data_internal;
> +#endif
> +__svml_sasinh_data_internal:
> +        /*== SgnMask ==*/
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> +        /*== sOne = SP 1.0 ==*/
> +        .align 32
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> +        /*== sPoly[] = SP polynomial ==*/
> +        .align 32
> +        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
> +        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
> +        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
> +        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
> +        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
> +        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
> +        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
> +        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
> +        /*== iBrkValue = SP 2/3 ==*/
> +        .align 32
> +        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
> +        /*== iOffExpoMask = SP significand mask ==*/
> +        .align 32
> +        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
> +        /*== sBigThreshold ==*/
> +        .align 32
> +        .long 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000, 0x4E800000
> +        /*== sC2 ==*/
> +        .align 32
> +        .long 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000, 0x3EC00000
> +        /*== sC3 ==*/
> +        .align 32
> +        .long 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000, 0x3EA00000
> +        /*== sHalf ==*/
> +        .align 32
> +        .long 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000, 0x3F000000
> +        /*== sLargestFinite ==*/
> +        .align 32
> +        .long 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF, 0x7F7FFFFF
> +        /*== sLittleThreshold ==*/
> +        .align 32
> +        .long 0x3D800000, 0x3D800000, 0x3D800000, 0x3D800000, 0x3D800000, 0x3D800000, 0x3D800000, 0x3D800000
> +        /*== sSign ==*/
> +        .align 32
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000
> +        /*== sThirtyOne ==*/
> +        .align 32
> +        .long 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000, 0x41F80000
> +        /*== sTopMask8 ==*/
> +        .align 32
> +        .long 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000, 0xFFFF0000
> +        /*== XScale ==*/
> +        .align 32
> +        .long 0x30800000, 0x30800000, 0x30800000, 0x30800000, 0x30800000, 0x30800000, 0x30800000, 0x30800000
> +        /*== sLn2 = SP ln(2) ==*/
> +        .align 32
> +        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
> +        .align 32
> +        .type	__svml_sasinh_data_internal,@object
> +        .size	__svml_sasinh_data_internal,.-__svml_sasinh_data_internal
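
For reference, the vfmadd213ps chain over sPoly earlier in this file is a
plain Horner evaluation of the log1p-style polynomial whose coefficients
are spelled out above (P0 = -0.5, P1 ~ 1/3, ...).  Per lane it is roughly
(a sketch, my naming):

/* r is the reduced argument (%ymm2); c[0..7] are sPoly+0 .. sPoly+224.  */
static float
log1p_poly (float r, const float c[8])
{
  float p = c[7];
  for (int i = 6; i >= 0; i--)
    p = p * r + c[i];        /* one vfmadd213ps per step */
  return r + r * r * p;      /* the trailing vmulps + vfmadd213ps pair */
}

The final reconstruction then adds expon * ln(2) (the vfmadd132ps with
sLn2) and reapplies the sign.
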
> diff --git a/sysdeps/x86_64/fpu/svml_d_asinh2_core.S b/sysdeps/x86_64/fpu/svml_d_asinh2_core.S
> new file mode 100644
> index 0000000000..60e372238a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_asinh2_core.S
> @@ -0,0 +1,29 @@
> +/* Function asinh vectorized with SSE2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN2v_asinh)
> +WRAPPER_IMPL_SSE2 asinh
> +END (_ZGVbN2v_asinh)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN2v_asinh)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_asinh4_core.S b/sysdeps/x86_64/fpu/svml_d_asinh4_core.S
> new file mode 100644
> index 0000000000..c7350011e1
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_asinh4_core.S
> @@ -0,0 +1,29 @@
> +/* Function asinh vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN4v_asinh)
> +WRAPPER_IMPL_AVX _ZGVbN2v_asinh
> +END (_ZGVdN4v_asinh)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN4v_asinh)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_asinh4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_asinh4_core_avx.S
> new file mode 100644
> index 0000000000..83aaa8c3f1
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_asinh4_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function asinh vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVcN4v_asinh)
> +WRAPPER_IMPL_AVX _ZGVbN2v_asinh
> +END (_ZGVcN4v_asinh)
> diff --git a/sysdeps/x86_64/fpu/svml_d_asinh8_core.S b/sysdeps/x86_64/fpu/svml_d_asinh8_core.S
> new file mode 100644
> index 0000000000..9597975ff6
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_asinh8_core.S
> @@ -0,0 +1,25 @@
> +/* Function asinh vectorized with AVX-512, wrapper to AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN8v_asinh)
> +WRAPPER_IMPL_AVX512 _ZGVdN4v_asinh
> +END (_ZGVeN8v_asinh)
> diff --git a/sysdeps/x86_64/fpu/svml_s_asinhf16_core.S b/sysdeps/x86_64/fpu/svml_s_asinhf16_core.S
> new file mode 100644
> index 0000000000..5b3d405f2e
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_asinhf16_core.S
> @@ -0,0 +1,25 @@
> +/* Function asinhf vectorized with AVX-512. Wrapper to AVX2 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN16v_asinhf)
> +WRAPPER_IMPL_AVX512 _ZGVdN8v_asinhf
> +END (_ZGVeN16v_asinhf)
> diff --git a/sysdeps/x86_64/fpu/svml_s_asinhf4_core.S b/sysdeps/x86_64/fpu/svml_s_asinhf4_core.S
> new file mode 100644
> index 0000000000..af44fa5108
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_asinhf4_core.S
> @@ -0,0 +1,29 @@
> +/* Function asinhf vectorized with SSE2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN4v_asinhf)
> +WRAPPER_IMPL_SSE2 asinhf
> +END (_ZGVbN4v_asinhf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN4v_asinhf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_asinhf8_core.S b/sysdeps/x86_64/fpu/svml_s_asinhf8_core.S
> new file mode 100644
> index 0000000000..3bd06d8032
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_asinhf8_core.S
> @@ -0,0 +1,29 @@
> +/* Function asinhf vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN8v_asinhf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_asinhf
> +END (_ZGVdN8v_asinhf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN8v_asinhf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_asinhf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_asinhf8_core_avx.S
> new file mode 100644
> index 0000000000..f79616c0bd
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_asinhf8_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function asinhf vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +        .text
> +ENTRY (_ZGVcN8v_asinhf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_asinhf
> +END (_ZGVcN8v_asinhf)
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx.c
> new file mode 100644
> index 0000000000..da03528700
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-asinh.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx2.c
> new file mode 100644
> index 0000000000..da03528700
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx2.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-asinh.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx512f.c
> new file mode 100644
> index 0000000000..da03528700
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-asinh-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-asinh.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-asinh.c b/sysdeps/x86_64/fpu/test-double-libmvec-asinh.c
> new file mode 100644
> index 0000000000..71e6b9f578
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-asinh.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE double
> +#define LIBMVEC_FUNC asinh
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> index f53bb6813e..76114772ba 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> @@ -45,6 +45,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVbN2v_atanh)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVbN2v_acosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVbN2v_erf)
>  VECTOR_WRAPPER (WRAPPER_NAME (tanh), _ZGVbN2v_tanh)
> +VECTOR_WRAPPER (WRAPPER_NAME (asinh), _ZGVbN2v_asinh)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> index 0452c3db38..1e0ee34975 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> @@ -48,6 +48,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVdN4v_atanh)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVdN4v_acosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVdN4v_erf)
>  VECTOR_WRAPPER (WRAPPER_NAME (tanh), _ZGVdN4v_tanh)
> +VECTOR_WRAPPER (WRAPPER_NAME (asinh), _ZGVdN4v_asinh)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m256i
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> index 197d5afc88..17c43a75d1 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> @@ -45,6 +45,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVcN4v_atanh)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVcN4v_acosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVcN4v_erf)
>  VECTOR_WRAPPER (WRAPPER_NAME (tanh), _ZGVcN4v_tanh)
> +VECTOR_WRAPPER (WRAPPER_NAME (asinh), _ZGVcN4v_asinh)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> index e56ece640c..1c6809e6e3 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> @@ -45,6 +45,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVeN8v_atanh)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVeN8v_acosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVeN8v_erf)
>  VECTOR_WRAPPER (WRAPPER_NAME (tanh), _ZGVeN8v_tanh)
> +VECTOR_WRAPPER (WRAPPER_NAME (asinh), _ZGVeN8v_asinh)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m512i
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx.c
> new file mode 100644
> index 0000000000..77e1838bb4
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-asinhf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx2.c
> new file mode 100644
> index 0000000000..77e1838bb4
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx2.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-asinhf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx512f.c
> new file mode 100644
> index 0000000000..77e1838bb4
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-asinhf-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-asinhf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-asinhf.c b/sysdeps/x86_64/fpu/test-float-libmvec-asinhf.c
> new file mode 100644
> index 0000000000..3353754102
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-asinhf.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE float
> +#define LIBMVEC_FUNC asinhf
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> index abbebf9993..e8ab1885a7 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> @@ -45,6 +45,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVeN16v_atanhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVeN16v_acoshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVeN16v_erff)
>  VECTOR_WRAPPER (WRAPPER_NAME (tanhf), _ZGVeN16v_tanhf)
> +VECTOR_WRAPPER (WRAPPER_NAME (asinhf), _ZGVeN16v_asinhf)
>  
>  #define VEC_INT_TYPE __m512i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> index ae1c8b98c2..a80c5387e4 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> @@ -45,6 +45,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVbN4v_atanhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVbN4v_acoshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVbN4v_erff)
>  VECTOR_WRAPPER (WRAPPER_NAME (tanhf), _ZGVbN4v_tanhf)
> +VECTOR_WRAPPER (WRAPPER_NAME (asinhf), _ZGVbN4v_asinhf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> index eb477a0371..c3d1d5936b 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> @@ -48,6 +48,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVdN8v_atanhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVdN8v_acoshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVdN8v_erff)
>  VECTOR_WRAPPER (WRAPPER_NAME (tanhf), _ZGVdN8v_tanhf)
> +VECTOR_WRAPPER (WRAPPER_NAME (asinhf), _ZGVdN8v_asinhf)
>  
>  /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
>  #undef VECTOR_WRAPPER_fFF
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> index 944f7f0a75..b7da0f523b 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> @@ -45,6 +45,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVcN8v_atanhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVcN8v_acoshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVcN8v_erff)
>  VECTOR_WRAPPER (WRAPPER_NAME (tanhf), _ZGVcN8v_tanhf)
> +VECTOR_WRAPPER (WRAPPER_NAME (asinhf), _ZGVcN8v_asinhf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> -- 
> 2.31.1
> 

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.


H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 16/18] x86-64: Add vector erf/erff implementation to libmvec
  2021-12-29  6:39 ` [PATCH v5 16/18] x86-64: Add vector erf/erff " Sunil K Pandey
@ 2021-12-29 21:27   ` H.J. Lu
  0 siblings, 0 replies; 40+ messages in thread
From: H.J. Lu @ 2021-12-29 21:27 UTC (permalink / raw)
  To: Sunil K Pandey; +Cc: libc-alpha, hjl.tools, andrey.kolesov, marius.cornea

On Tue, Dec 28, 2021 at 10:39:58PM -0800, Sunil K Pandey wrote:
> Implement vectorized erf/erff containing SSE, AVX, AVX2 and
> AVX512 versions for libmvec as per vector ABI.  It also contains
> accuracy and ABI tests for vector erf/erff with regenerated ulps.
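
As an aside for anyone wanting to exercise the new entry points without
spelling out the mangled names: with the __DECL_SIMD_erf declarations
added here, a plain loop lets the compiler pick a vector variant.  An
illustrative test only (exact flags and the variant chosen depend on the
compiler and -march level):

#include <math.h>

/* Built with something like:
     gcc -O2 -march=x86-64-v3 -fopenmp-simd erf_loop.c -lmvec -lm
   GCC may then call e.g. _ZGVdN4v_erf for the loop body.  */
void
erf_array (double *restrict y, const double *restrict x, int n)
{
#pragma omp simd
  for (int i = 0; i < n; i++)
    y[i] = erf (x[i]);
}
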
> ---
>  bits/libm-simd-decl-stubs.h                   |  11 +
>  math/bits/mathcalls.h                         |   2 +-
>  .../unix/sysv/linux/x86_64/libmvec.abilist    |   8 +
>  sysdeps/x86/fpu/bits/math-vector.h            |   4 +
>  .../x86/fpu/finclude/math-vector-fortran.h    |   4 +
>  sysdeps/x86_64/fpu/Makeconfig                 |   1 +
>  sysdeps/x86_64/fpu/Versions                   |   2 +
>  sysdeps/x86_64/fpu/libm-test-ulps             |  20 +
>  .../fpu/multiarch/svml_d_erf2_core-sse2.S     |  20 +
>  .../x86_64/fpu/multiarch/svml_d_erf2_core.c   |  27 +
>  .../fpu/multiarch/svml_d_erf2_core_sse4.S     | 987 ++++++++++++++++++
>  .../fpu/multiarch/svml_d_erf4_core-sse.S      |  20 +
>  .../x86_64/fpu/multiarch/svml_d_erf4_core.c   |  27 +
>  .../fpu/multiarch/svml_d_erf4_core_avx2.S     | 984 +++++++++++++++++
>  .../fpu/multiarch/svml_d_erf8_core-avx2.S     |  20 +
>  .../x86_64/fpu/multiarch/svml_d_erf8_core.c   |  27 +
>  .../fpu/multiarch/svml_d_erf8_core_avx512.S   | 983 +++++++++++++++++
>  .../fpu/multiarch/svml_s_erff16_core-avx2.S   |  20 +
>  .../x86_64/fpu/multiarch/svml_s_erff16_core.c |  28 +
>  .../fpu/multiarch/svml_s_erff16_core_avx512.S | 185 ++++
>  .../fpu/multiarch/svml_s_erff4_core-sse2.S    |  20 +
>  .../x86_64/fpu/multiarch/svml_s_erff4_core.c  |  28 +
>  .../fpu/multiarch/svml_s_erff4_core_sse4.S    | 664 ++++++++++++
>  .../fpu/multiarch/svml_s_erff8_core-sse.S     |  20 +
>  .../x86_64/fpu/multiarch/svml_s_erff8_core.c  |  28 +
>  .../fpu/multiarch/svml_s_erff8_core_avx2.S    | 669 ++++++++++++
>  sysdeps/x86_64/fpu/svml_d_erf2_core.S         |  29 +
>  sysdeps/x86_64/fpu/svml_d_erf4_core.S         |  29 +
>  sysdeps/x86_64/fpu/svml_d_erf4_core_avx.S     |  25 +
>  sysdeps/x86_64/fpu/svml_d_erf8_core.S         |  25 +
>  sysdeps/x86_64/fpu/svml_s_erff16_core.S       |  25 +
>  sysdeps/x86_64/fpu/svml_s_erff4_core.S        |  29 +
>  sysdeps/x86_64/fpu/svml_s_erff8_core.S        |  29 +
>  sysdeps/x86_64/fpu/svml_s_erff8_core_avx.S    |  25 +
>  .../x86_64/fpu/test-double-libmvec-erf-avx.c  |   1 +
>  .../x86_64/fpu/test-double-libmvec-erf-avx2.c |   1 +
>  .../fpu/test-double-libmvec-erf-avx512f.c     |   1 +
>  sysdeps/x86_64/fpu/test-double-libmvec-erf.c  |   3 +
>  .../x86_64/fpu/test-double-vlen2-wrappers.c   |   1 +
>  .../fpu/test-double-vlen4-avx2-wrappers.c     |   1 +
>  .../x86_64/fpu/test-double-vlen4-wrappers.c   |   1 +
>  .../x86_64/fpu/test-double-vlen8-wrappers.c   |   1 +
>  .../x86_64/fpu/test-float-libmvec-erff-avx.c  |   1 +
>  .../x86_64/fpu/test-float-libmvec-erff-avx2.c |   1 +
>  .../fpu/test-float-libmvec-erff-avx512f.c     |   1 +
>  sysdeps/x86_64/fpu/test-float-libmvec-erff.c  |   3 +
>  .../x86_64/fpu/test-float-vlen16-wrappers.c   |   1 +
>  .../x86_64/fpu/test-float-vlen4-wrappers.c    |   1 +
>  .../fpu/test-float-vlen8-avx2-wrappers.c      |   1 +
>  .../x86_64/fpu/test-float-vlen8-wrappers.c    |   1 +
>  50 files changed, 5044 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_erf2_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_erf4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_erf4_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_erf8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_erff16_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_erff4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_erff8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_erff8_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erf-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erf-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erf-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-erf.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erff-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erff-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erff-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-erff.c
> 
> diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
> index b17bf78cd9..33d480031b 100644
> --- a/bits/libm-simd-decl-stubs.h
> +++ b/bits/libm-simd-decl-stubs.h
> @@ -274,4 +274,15 @@
>  #define __DECL_SIMD_acoshf32x
>  #define __DECL_SIMD_acoshf64x
>  #define __DECL_SIMD_acoshf128x
> +
> +#define __DECL_SIMD_erf
> +#define __DECL_SIMD_erff
> +#define __DECL_SIMD_erfl
> +#define __DECL_SIMD_erff16
> +#define __DECL_SIMD_erff32
> +#define __DECL_SIMD_erff64
> +#define __DECL_SIMD_erff128
> +#define __DECL_SIMD_erff32x
> +#define __DECL_SIMD_erff64x
> +#define __DECL_SIMD_erff128x
>  #endif
> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index bc37973c41..a5b6c4457f 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -228,7 +228,7 @@ __MATHCALL (yn,, (int, _Mdouble_));
>  
>  #if defined __USE_XOPEN || defined __USE_ISOC99
>  /* Error and gamma functions.  */
> -__MATHCALL (erf,, (_Mdouble_));
> +__MATHCALL_VEC (erf,, (_Mdouble_));
>  __MATHCALL (erfc,, (_Mdouble_));
>  __MATHCALL (lgamma,, (_Mdouble_));
>  #endif
> diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> index e9d6ade70a..5525c8a0d6 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> @@ -53,6 +53,7 @@ GLIBC_2.35 _ZGVbN2v_atan F
>  GLIBC_2.35 _ZGVbN2v_atanh F
>  GLIBC_2.35 _ZGVbN2v_cbrt F
>  GLIBC_2.35 _ZGVbN2v_cosh F
> +GLIBC_2.35 _ZGVbN2v_erf F
>  GLIBC_2.35 _ZGVbN2v_exp10 F
>  GLIBC_2.35 _ZGVbN2v_exp2 F
>  GLIBC_2.35 _ZGVbN2v_expm1 F
> @@ -69,6 +70,7 @@ GLIBC_2.35 _ZGVbN4v_atanf F
>  GLIBC_2.35 _ZGVbN4v_atanhf F
>  GLIBC_2.35 _ZGVbN4v_cbrtf F
>  GLIBC_2.35 _ZGVbN4v_coshf F
> +GLIBC_2.35 _ZGVbN4v_erff F
>  GLIBC_2.35 _ZGVbN4v_exp10f F
>  GLIBC_2.35 _ZGVbN4v_exp2f F
>  GLIBC_2.35 _ZGVbN4v_expm1f F
> @@ -85,6 +87,7 @@ GLIBC_2.35 _ZGVcN4v_atan F
>  GLIBC_2.35 _ZGVcN4v_atanh F
>  GLIBC_2.35 _ZGVcN4v_cbrt F
>  GLIBC_2.35 _ZGVcN4v_cosh F
> +GLIBC_2.35 _ZGVcN4v_erf F
>  GLIBC_2.35 _ZGVcN4v_exp10 F
>  GLIBC_2.35 _ZGVcN4v_exp2 F
>  GLIBC_2.35 _ZGVcN4v_expm1 F
> @@ -101,6 +104,7 @@ GLIBC_2.35 _ZGVcN8v_atanf F
>  GLIBC_2.35 _ZGVcN8v_atanhf F
>  GLIBC_2.35 _ZGVcN8v_cbrtf F
>  GLIBC_2.35 _ZGVcN8v_coshf F
> +GLIBC_2.35 _ZGVcN8v_erff F
>  GLIBC_2.35 _ZGVcN8v_exp10f F
>  GLIBC_2.35 _ZGVcN8v_exp2f F
>  GLIBC_2.35 _ZGVcN8v_expm1f F
> @@ -117,6 +121,7 @@ GLIBC_2.35 _ZGVdN4v_atan F
>  GLIBC_2.35 _ZGVdN4v_atanh F
>  GLIBC_2.35 _ZGVdN4v_cbrt F
>  GLIBC_2.35 _ZGVdN4v_cosh F
> +GLIBC_2.35 _ZGVdN4v_erf F
>  GLIBC_2.35 _ZGVdN4v_exp10 F
>  GLIBC_2.35 _ZGVdN4v_exp2 F
>  GLIBC_2.35 _ZGVdN4v_expm1 F
> @@ -133,6 +138,7 @@ GLIBC_2.35 _ZGVdN8v_atanf F
>  GLIBC_2.35 _ZGVdN8v_atanhf F
>  GLIBC_2.35 _ZGVdN8v_cbrtf F
>  GLIBC_2.35 _ZGVdN8v_coshf F
> +GLIBC_2.35 _ZGVdN8v_erff F
>  GLIBC_2.35 _ZGVdN8v_exp10f F
>  GLIBC_2.35 _ZGVdN8v_exp2f F
>  GLIBC_2.35 _ZGVdN8v_expm1f F
> @@ -149,6 +155,7 @@ GLIBC_2.35 _ZGVeN16v_atanf F
>  GLIBC_2.35 _ZGVeN16v_atanhf F
>  GLIBC_2.35 _ZGVeN16v_cbrtf F
>  GLIBC_2.35 _ZGVeN16v_coshf F
> +GLIBC_2.35 _ZGVeN16v_erff F
>  GLIBC_2.35 _ZGVeN16v_exp10f F
>  GLIBC_2.35 _ZGVeN16v_exp2f F
>  GLIBC_2.35 _ZGVeN16v_expm1f F
> @@ -165,6 +172,7 @@ GLIBC_2.35 _ZGVeN8v_atan F
>  GLIBC_2.35 _ZGVeN8v_atanh F
>  GLIBC_2.35 _ZGVeN8v_cbrt F
>  GLIBC_2.35 _ZGVeN8v_cosh F
> +GLIBC_2.35 _ZGVeN8v_erf F
>  GLIBC_2.35 _ZGVeN8v_exp10 F
>  GLIBC_2.35 _ZGVeN8v_exp2 F
>  GLIBC_2.35 _ZGVeN8v_expm1 F
> diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
> index 4ad12a33e5..ea0deb31c1 100644
> --- a/sysdeps/x86/fpu/bits/math-vector.h
> +++ b/sysdeps/x86/fpu/bits/math-vector.h
> @@ -122,6 +122,10 @@
>  #  define __DECL_SIMD_acosh __DECL_SIMD_x86_64
>  #  undef __DECL_SIMD_acoshf
>  #  define __DECL_SIMD_acoshf __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_erf
> +#  define __DECL_SIMD_erf __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_erff
> +#  define __DECL_SIMD_erff __DECL_SIMD_x86_64
>  
>  # endif
>  #endif
> diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> index 503547d3e4..42addd9a25 100644
> --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> @@ -60,6 +60,8 @@
>  !GCC$ builtin (atanhf) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (acosh) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (acoshf) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (erf) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (erff) attributes simd (notinbranch) if('x86_64')
>  
>  !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
> @@ -105,3 +107,5 @@
>  !GCC$ builtin (atanhf) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (acosh) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (acoshf) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (erf) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (erff) attributes simd (notinbranch) if('x32')
> diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> index 7b90b3d049..2b89a1bba3 100644
> --- a/sysdeps/x86_64/fpu/Makeconfig
> +++ b/sysdeps/x86_64/fpu/Makeconfig
> @@ -31,6 +31,7 @@ libmvec-funcs = \
>    cbrt \
>    cos \
>    cosh \
> +  erf \
>    exp \
>    exp10 \
>    exp2 \
> diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> index fd5e5923a1..2fcdef6944 100644
> --- a/sysdeps/x86_64/fpu/Versions
> +++ b/sysdeps/x86_64/fpu/Versions
> @@ -21,6 +21,7 @@ libmvec {
>      _ZGVbN2v_atanh; _ZGVcN4v_atanh; _ZGVdN4v_atanh; _ZGVeN8v_atanh;
>      _ZGVbN2v_cbrt; _ZGVcN4v_cbrt; _ZGVdN4v_cbrt; _ZGVeN8v_cbrt;
>      _ZGVbN2v_cosh; _ZGVcN4v_cosh; _ZGVdN4v_cosh; _ZGVeN8v_cosh;
> +    _ZGVbN2v_erf; _ZGVcN4v_erf; _ZGVdN4v_erf; _ZGVeN8v_erf;
>      _ZGVbN2v_exp10; _ZGVcN4v_exp10; _ZGVdN4v_exp10; _ZGVeN8v_exp10;
>      _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
>      _ZGVbN2v_expm1; _ZGVcN4v_expm1; _ZGVdN4v_expm1; _ZGVeN8v_expm1;
> @@ -37,6 +38,7 @@ libmvec {
>      _ZGVbN4v_atanhf; _ZGVcN8v_atanhf; _ZGVdN8v_atanhf; _ZGVeN16v_atanhf;
>      _ZGVbN4v_cbrtf; _ZGVcN8v_cbrtf; _ZGVdN8v_cbrtf; _ZGVeN16v_cbrtf;
>      _ZGVbN4v_coshf; _ZGVcN8v_coshf; _ZGVdN8v_coshf; _ZGVeN16v_coshf;
> +    _ZGVbN4v_erff; _ZGVcN8v_erff; _ZGVdN8v_erff; _ZGVeN16v_erff;
>      _ZGVbN4v_exp10f; _ZGVcN8v_exp10f; _ZGVdN8v_exp10f; _ZGVeN16v_exp10f;
>      _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
>      _ZGVbN4v_expm1f; _ZGVcN8v_expm1f; _ZGVdN8v_expm1f; _ZGVeN16v_expm1f;
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index b2aa8fc56e..929de0e786 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -1298,6 +1298,26 @@ float: 1
>  float128: 2
>  ldouble: 1
>  
> +Function: "erf_vlen16":
> +float: 1
> +
> +Function: "erf_vlen2":
> +double: 1
> +
> +Function: "erf_vlen4":
> +double: 1
> +float: 2
> +
> +Function: "erf_vlen4_avx2":
> +double: 1
> +
> +Function: "erf_vlen8":
> +double: 1
> +float: 2
> +
> +Function: "erf_vlen8_avx2":
> +float: 2
> +
>  Function: "erfc":
>  double: 5
>  float: 3
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core-sse2.S
> new file mode 100644
> index 0000000000..2b5735ebb3
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized erf, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN2v_erf _ZGVbN2v_erf_sse2
> +#include "../svml_d_erf2_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core.c
> new file mode 100644
> index 0000000000..74757be88f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized erf, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN2v_erf
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN2v_erf, __GI__ZGVbN2v_erf, __redirect__ZGVbN2v_erf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core_sse4.S
> new file mode 100644
> index 0000000000..c164748bbe
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_erf2_core_sse4.S
> @@ -0,0 +1,987 @@
> +/* Function erf vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Basic formula is
> + *    erf(x) ~ erf(x0) +
> + *              + exp(-x0*x0)*D*(1+c0+T*P1(T)+D^2*P3(T)+D^4*P5(T)+D^6*p7+D^8*p9)
> + *   where D=x-x0, T=x0*D
> + *   x0 is x rounded to a specified number of fractional bits (in this case 7),
> + *    except that x0=0 for |x|<3.5/128.0 (using x0=0 for first 4 table entries)
> + *
> + *   Data table packs both erf(x0)_high and a few bits of erf(x0)_low in one
> + *   entry (in place of redundant exponent bits)
> + *
> + */
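
Spelled out as scalar C, the scheme described above looks roughly like
this (names and the polynomial split are placeholders rather than the
identifiers used in this file; the c0/D^6/D^8 terms are folded into
"corr" here):

#include <math.h>

/* tbl_erf[i] ~ erf(i/128) and tbl_exp[i] ~ exp(-(i/128)^2), i.e. the two
   values packed per _erf_tbl entry; poly1/poly3/poly5 stand for the
   correction polynomials built from the _poly* constants.  */
static double
erf_model (double x, const double tbl_erf[], const double tbl_exp[],
           double (*poly1) (double), double (*poly3) (double),
           double (*poly5) (double))
{
  double ax = fmin (fabs (x), 5.9921875);  /* _MaxThreshold: erf ~ 1 above */
  int i = (int) (ax * 128.0 + 0.5);        /* x0 = i/128: 7 fractional bits */
  double x0 = i / 128.0;
  double d = ax - x0, t = x0 * d;          /* D and T from the comment */
  double corr = 1.0 + t * poly1 (t)
                + d * d * poly3 (t)
                + d * d * d * d * poly5 (t);
  return copysign (tbl_erf[i] + tbl_exp[i] * d * corr, x);
}
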
> +
> +/* Offsets for data table __svml_derf_data_internal
> + */
> +#define _erf_tbl                      	0
> +#define _AbsMask                      	12288
> +#define _MaxThreshold                 	12304
> +#define _SRound                       	12320
> +#define _U2Threshold                  	12336
> +#define _poly1_0                      	12352
> +#define _poly1_1                      	12368
> +#define _poly3_0                      	12384
> +#define _poly3_1                      	12400
> +#define _poly5_0                      	12416
> +#define _poly5_1                      	12432
> +#define _poly1_2                      	12448
> +#define _poly3_2                      	12464
> +#define _poly1_3                      	12480
> +#define _poly3_3                      	12496
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN2v_erf_sse4)
> +/*
> + * vector gather: erf(x0),
> + * second value is exp(-x0*x0)
> + */
> +        lea       __svml_derf_data_internal(%rip), %rcx
> +        movups    _AbsMask+__svml_derf_data_internal(%rip), %xmm5
> +        andps     %xmm0, %xmm5
> +
> +/*
> + * erf(x) rounds to 1.0 for x>_MaxThreshold (5.9921875)
> + * can compute all results in the main path
> + */
> +        movaps    %xmm5, %xmm9
> +
> +/* save sign */
> +        pxor      %xmm5, %xmm0
> +        minpd     _MaxThreshold+__svml_derf_data_internal(%rip), %xmm9
> +        movups    _SRound+__svml_derf_data_internal(%rip), %xmm1
> +        movaps    %xmm1, %xmm2
> +        addpd     %xmm9, %xmm2
> +        movaps    %xmm2, %xmm8
> +        psllq     $4, %xmm2
> +        subpd     %xmm1, %xmm8
> +        movd      %xmm2, %eax
> +        movups    _U2Threshold+__svml_derf_data_internal(%rip), %xmm11
> +        cmpltpd   %xmm9, %xmm11
> +        subpd     %xmm8, %xmm9
> +        mulpd     %xmm9, %xmm8
> +
> +/*
> + * _LA_ polynomial computation
> + * Start polynomial evaluation
> + */
> +        movups    _poly1_0+__svml_derf_data_internal(%rip), %xmm7
> +        andps     %xmm9, %xmm11
> +        mulpd     %xmm8, %xmm7
> +
> +/* D2 = Diff^2 */
> +        mulpd     %xmm11, %xmm11
> +        addpd     _poly1_1+__svml_derf_data_internal(%rip), %xmm7
> +
> +/* NaN fixup */
> +        minpd     %xmm5, %xmm9
> +        mulpd     %xmm8, %xmm7
> +        movups    _poly3_0+__svml_derf_data_internal(%rip), %xmm6
> +
> +/* T^2 */
> +        movaps    %xmm8, %xmm12
> +        mulpd     %xmm8, %xmm6
> +        addpd     _poly1_2+__svml_derf_data_internal(%rip), %xmm7
> +        addpd     _poly3_1+__svml_derf_data_internal(%rip), %xmm6
> +        mulpd     %xmm8, %xmm12
> +        mulpd     %xmm8, %xmm6
> +        mulpd     %xmm8, %xmm7
> +        addpd     _poly3_2+__svml_derf_data_internal(%rip), %xmm6
> +        addpd     _poly1_3+__svml_derf_data_internal(%rip), %xmm7
> +        mulpd     %xmm8, %xmm6
> +
> +/* P1 = T^2*P1 - T */
> +        mulpd     %xmm7, %xmm12
> +        movups    _poly5_0+__svml_derf_data_internal(%rip), %xmm10
> +
> +/* Sign | Diff */
> +        pxor      %xmm0, %xmm9
> +        mulpd     %xmm8, %xmm10
> +        subpd     %xmm8, %xmm12
> +        addpd     _poly5_1+__svml_derf_data_internal(%rip), %xmm10
> +        mulpd     %xmm11, %xmm10
> +        addpd     _poly3_3+__svml_derf_data_internal(%rip), %xmm10
> +        addpd     %xmm6, %xmm10
> +        pshufd    $2, %xmm2, %xmm3
> +        movd      %xmm3, %edx
> +
> +/* P1 + P3*D2 */
> +        mulpd     %xmm10, %xmm11
> +        movslq    %eax, %rax
> +        movslq    %edx, %rdx
> +        addpd     %xmm11, %xmm12
> +        movups    (%rcx,%rax), %xmm13
> +        movups    (%rcx,%rdx), %xmm4
> +        movaps    %xmm13, %xmm14
> +        unpckhpd  %xmm4, %xmm13
> +
> +/* exp_h(x0) * Diff */
> +        mulpd     %xmm9, %xmm13
> +
> +/*
> + * branch-free
> + * low part of result: exp_h(x0) * Diff*(1+P1)
> + */
> +        mulpd     %xmm13, %xmm12
> +        addpd     %xmm12, %xmm13
> +        unpcklpd  %xmm4, %xmm14
> +
> +/* Sign | _Erf_H */
> +        pxor      %xmm0, %xmm14
> +
> +/* Final result */
> +        addpd     %xmm13, %xmm14
> +
> +/* Fix erf(-0) = -0 */
> +        orps      %xmm14, %xmm0
> +        ret
> +
> +END(_ZGVbN2v_erf_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_derf_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(16)) VUINT32 _erf_tbl[6*128*2][2];
> +        __declspec(align(16)) VUINT32 _AbsMask[2][2];
> +        __declspec(align(16)) VUINT32 _MaxThreshold[2][2];
> +        __declspec(align(16)) VUINT32 _SRound[2][2];
> +        __declspec(align(16)) VUINT32 _U2Threshold[2][2];
> +        __declspec(align(16)) VUINT32 _poly1_0[2][2];
> +        __declspec(align(16)) VUINT32 _poly1_1[2][2];
> +        __declspec(align(16)) VUINT32 _poly3_0[2][2];
> +        __declspec(align(16)) VUINT32 _poly3_1[2][2];
> +        __declspec(align(16)) VUINT32 _poly5_0[2][2];
> +        __declspec(align(16)) VUINT32 _poly5_1[2][2];
> +        __declspec(align(16)) VUINT32 _poly1_2[2][2];
> +        __declspec(align(16)) VUINT32 _poly3_2[2][2];
> +        __declspec(align(16)) VUINT32 _poly1_3[2][2];
> +        __declspec(align(16)) VUINT32 _poly3_3[2][2];
> +} __svml_derf_data_internal;
> +#endif
> +__svml_derf_data_internal:
> +        /*== _erf_tbl ==*/
> +        .quad 0x0000000000000000, 0x3ff20dd750429b6d
> +        .quad 0x3f820dbf3deb1340, 0x3ff20d8f1975c85d
> +        .quad 0x3f920d77083f17a0, 0x3ff20cb67bd452c7
> +        .quad 0x3f9b137e0cf584dc, 0x3ff20b4d8bac36c1
> +        .quad 0x3fa20c5645dd2538, 0x3ff209546ad13ccf
> +        .quad 0x3fa68e5d3bbc9526, 0x3ff206cb4897b148
> +        .quad 0x3fab0fafef135745, 0x3ff203b261cd0053
> +        .quad 0x3faf902a77bd3821, 0x3ff2000a00ae3804
> +        .quad 0x3fb207d480e90658, 0x3ff1fbd27cdc72d3
> +        .quad 0x3fb44703e87e8593, 0x3ff1f70c3b4f2cc8
> +        .quad 0x3fb68591a1e83b5d, 0x3ff1f1b7ae44867f
> +        .quad 0x3fb8c36beb8a8d23, 0x3ff1ebd5552f795b
> +        .quad 0x3fbb0081148a873a, 0x3ff1e565bca400d4
> +        .quad 0x3fbd3cbf7e70a4b3, 0x3ff1de697e413d29
> +        .quad 0x3fbf78159ec8bb50, 0x3ff1d6e14099944a
> +        .quad 0x3fc0d939005f65e5, 0x3ff1cecdb718d61c
> +        .quad 0x3fc1f5e1a35c3b89, 0x3ff1c62fa1e869b6
> +        .quad 0x3fc311fc15f56d14, 0x3ff1bd07cdd189ac
> +        .quad 0x3fc42d7fc2f64959, 0x3ff1b357141d95d5
> +        .quad 0x3fc548642321d7c6, 0x3ff1a91e5a748165
> +        .quad 0x3fc662a0bdf7a89f, 0x3ff19e5e92b964ab
> +        .quad 0x3fc77c2d2a765f9e, 0x3ff19318bae53a04
> +        .quad 0x3fc895010fdbdbfd, 0x3ff1874ddcdfce24
> +        .quad 0x3fc9ad142662e14d, 0x3ff17aff0e56ec10
> +        .quad 0x3fcac45e37fe2526, 0x3ff16e2d7093cd8c
> +        .quad 0x3fcbdad72110a648, 0x3ff160da304ed92f
> +        .quad 0x3fccf076d1233237, 0x3ff153068581b781
> +        .quad 0x3fce05354b96ff36, 0x3ff144b3b337c90c
> +        .quad 0x3fcf190aa85540e2, 0x3ff135e3075d076b
> +        .quad 0x3fd015f78a3dcf3d, 0x3ff12695da8b5bde
> +        .quad 0x3fd09eed6982b948, 0x3ff116cd8fd67618
> +        .quad 0x3fd127631eb8de32, 0x3ff1068b94962e5e
> +        .quad 0x3fd1af54e232d609, 0x3ff0f5d1602f7e41
> +        .quad 0x3fd236bef825d9a2, 0x3ff0e4a073dc1b91
> +        .quad 0x3fd2bd9db0f7827f, 0x3ff0d2fa5a70c168
> +        .quad 0x3fd343ed6989b7d9, 0x3ff0c0e0a8223359
> +        .quad 0x3fd3c9aa8b84beda, 0x3ff0ae54fa490723
> +        .quad 0x3fd44ed18d9f6462, 0x3ff09b58f724416b
> +        .quad 0x3fd4d35ef3e5372e, 0x3ff087ee4d9ad247
> +        .quad 0x3fd5574f4ffac98e, 0x3ff07416b4fbfe7c
> +        .quad 0x3fd5da9f415ff23f, 0x3ff05fd3ecbec298
> +        .quad 0x3fd65d4b75b00471, 0x3ff04b27bc403d30
> +        .quad 0x3fd6df50a8dff772, 0x3ff03613f2812daf
> +        .quad 0x3fd760aba57a76bf, 0x3ff0209a65e29545
> +        .quad 0x3fd7e15944d9d3e4, 0x3ff00abcf3e187a9
> +        .quad 0x3fd861566f5fd3c0, 0x3fefe8fb01a47307
> +        .quad 0x3fd8e0a01cab516b, 0x3fefbbbbef34b4b2
> +        .quad 0x3fd95f3353cbb146, 0x3fef8dc092d58ff8
> +        .quad 0x3fd9dd0d2b721f39, 0x3fef5f0cdaf15313
> +        .quad 0x3fda5a2aca209394, 0x3fef2fa4c16c0019
> +        .quad 0x3fdad68966569a87, 0x3feeff8c4b1375db
> +        .quad 0x3fdb522646bbda68, 0x3feecec7870ebca8
> +        .quad 0x3fdbccfec24855b8, 0x3fee9d5a8e4c934e
> +        .quad 0x3fdc4710406a65fc, 0x3fee6b4982f158b9
> +        .quad 0x3fdcc058392a6d2d, 0x3fee38988fc46e72
> +        .quad 0x3fdd38d4354c3bd0, 0x3fee054be79d3042
> +        .quad 0x3fddb081ce6e2a48, 0x3fedd167c4cf9d2a
> +        .quad 0x3fde275eaf25e458, 0x3fed9cf06898cdaf
> +        .quad 0x3fde9d68931ae650, 0x3fed67ea1a8b5368
> +        .quad 0x3fdf129d471eabb1, 0x3fed325927fb9d89
> +        .quad 0x3fdf86faa9428f9d, 0x3fecfc41e36c7df9
> +        .quad 0x3fdffa7ea8eb5fd0, 0x3fecc5a8a3fbea40
> +        .quad 0x3fe03693a371519c, 0x3fec8e91c4d01368
> +        .quad 0x3fe06f794ab2cae7, 0x3fec5701a484ef9d
> +        .quad 0x3fe0a7ef5c18edd2, 0x3fec1efca49a5011
> +        .quad 0x3fe0dff4f247f6c6, 0x3febe68728e29d5e
> +        .quad 0x3fe1178930ada115, 0x3febada596f25436
> +        .quad 0x3fe14eab43841b55, 0x3feb745c55905bf8
> +        .quad 0x3fe1855a5fd3dd50, 0x3feb3aafcc27502e
> +        .quad 0x3fe1bb95c3746199, 0x3feb00a46237d5be
> +        .quad 0x3fe1f15cb50bc4de, 0x3feac63e7ecc1411
> +        .quad 0x3fe226ae840d4d70, 0x3fea8b8287ec6a09
> +        .quad 0x3fe25b8a88b6dd7f, 0x3fea5074e2157620
> +        .quad 0x3fe28ff0240d52cd, 0x3fea1519efaf889e
> +        .quad 0x3fe2c3debfd7d6c1, 0x3fe9d97610879642
> +        .quad 0x3fe2f755ce9a21f4, 0x3fe99d8da149c13f
> +        .quad 0x3fe32a54cb8db67b, 0x3fe96164fafd8de3
> +        .quad 0x3fe35cdb3a9a144d, 0x3fe925007283d7aa
> +        .quad 0x3fe38ee8a84beb71, 0x3fe8e86458169af8
> +        .quad 0x3fe3c07ca9cb4f9e, 0x3fe8ab94f6caa71d
> +        .quad 0x3fe3f196dcd0f135, 0x3fe86e9694134b9e
> +        .quad 0x3fe42236e79a5fa6, 0x3fe8316d6f48133d
> +        .quad 0x3fe4525c78dd5966, 0x3fe7f41dc12c9e89
> +        .quad 0x3fe4820747ba2dc2, 0x3fe7b6abbb7aaf19
> +        .quad 0x3fe4b13713ad3513, 0x3fe7791b886e7403
> +        .quad 0x3fe4dfeba47f63cc, 0x3fe73b714a552763
> +        .quad 0x3fe50e24ca35fd2c, 0x3fe6fdb11b1e0c34
> +        .quad 0x3fe53be25d016a4f, 0x3fe6bfdf0beddaf5
> +        .quad 0x3fe569243d2b3a9b, 0x3fe681ff24b4ab04
> +        .quad 0x3fe595ea53035283, 0x3fe6441563c665d4
> +        .quad 0x3fe5c2348ecc4dc3, 0x3fe60625bd75d07b
> +        .quad 0x3fe5ee02e8a71a53, 0x3fe5c8341bb23767
> +        .quad 0x3fe61955607dd15d, 0x3fe58a445da7c74c
> +        .quad 0x3fe6442bfdedd397, 0x3fe54c5a57629db0
> +        .quad 0x3fe66e86d0312e82, 0x3fe50e79d1749ac9
> +        .quad 0x3fe69865ee075011, 0x3fe4d0a6889dfd9f
> +        .quad 0x3fe6c1c9759d0e5f, 0x3fe492e42d78d2c5
> +        .quad 0x3fe6eab18c74091b, 0x3fe4553664273d24
> +        .quad 0x3fe7131e5f496a5a, 0x3fe417a0c4049fd0
> +        .quad 0x3fe73b1021fc0cb8, 0x3fe3da26d759aef5
> +        .quad 0x3fe762870f720c6f, 0x3fe39ccc1b136d5a
> +        .quad 0x3fe78983697dc96f, 0x3fe35f93fe7d1b3d
> +        .quad 0x3fe7b00578c26037, 0x3fe32281e2fd1a92
> +        .quad 0x3fe7d60d8c979f7b, 0x3fe2e5991bd4cbfc
> +        .quad 0x3fe7fb9bfaed8078, 0x3fe2a8dcede3673b
> +        .quad 0x3fe820b1202f27fb, 0x3fe26c508f6bd0ff
> +        .quad 0x3fe8454d5f25760d, 0x3fe22ff727dd6f7b
> +        .quad 0x3fe8697120d92a4a, 0x3fe1f3d3cf9ffe5a
> +        .quad 0x3fe88d1cd474a2e0, 0x3fe1b7e98fe26217
> +        .quad 0x3fe8b050ef253c37, 0x3fe17c3b626c7a12
> +        .quad 0x3fe8d30debfc572e, 0x3fe140cc3173f007
> +        .quad 0x3fe8f5544bd00c04, 0x3fe1059ed7740313
> +        .quad 0x3fe91724951b8fc6, 0x3fe0cab61f084b93
> +        .quad 0x3fe9387f53df5238, 0x3fe09014c2ca74da
> +        .quad 0x3fe959651980da31, 0x3fe055bd6d32e8d7
> +        .quad 0x3fe979d67caa6631, 0x3fe01bb2b87c6968
> +        .quad 0x3fe999d4192a5715, 0x3fdfc3ee5d1524b0
> +        .quad 0x3fe9b95e8fd26aba, 0x3fdf511a91a67d2a
> +        .quad 0x3fe9d8768656cc42, 0x3fdedeeee0959518
> +        .quad 0x3fe9f71ca72cffb6, 0x3fde6d6ffaa65a25
> +        .quad 0x3fea1551a16aaeaf, 0x3fddfca26f5bbf88
> +        .quad 0x3fea331628a45b92, 0x3fdd8c8aace11e63
> +        .quad 0x3fea506af4cc00f4, 0x3fdd1d2cfff91594
> +        .quad 0x3fea6d50c20fa293, 0x3fdcae8d93f1d7b7
> +        .quad 0x3fea89c850b7d54d, 0x3fdc40b0729ed548
> +        .quad 0x3feaa5d265064366, 0x3fdbd3998457afdb
> +        .quad 0x3feac16fc7143263, 0x3fdb674c8ffc6283
> +        .quad 0x3feadca142b10f98, 0x3fdafbcd3afe8ab6
> +        .quad 0x3feaf767a741088b, 0x3fda911f096fbc26
> +        .quad 0x3feb11c3c79bb424, 0x3fda27455e14c93c
> +        .quad 0x3feb2bb679ead19c, 0x3fd9be437a7de946
> +        .quad 0x3feb4540978921ee, 0x3fd9561c7f23a47b
> +        .quad 0x3feb5e62fce16095, 0x3fd8eed36b886d93
> +        .quad 0x3feb771e894d602e, 0x3fd8886b1e5ecfd1
> +        .quad 0x3feb8f741ef54f83, 0x3fd822e655b417e7
> +        .quad 0x3feba764a2af2b78, 0x3fd7be47af1f5d89
> +        .quad 0x3febbef0fbde6221, 0x3fd75a91a7f4d2ed
> +        .quad 0x3febd61a1453ab44, 0x3fd6f7c69d7d3ef8
> +        .quad 0x3febece0d82d1a5c, 0x3fd695e8cd31867e
> +        .quad 0x3fec034635b66e23, 0x3fd634fa54fa285f
> +        .quad 0x3fec194b1d49a184, 0x3fd5d4fd33729015
> +        .quad 0x3fec2ef0812fc1bd, 0x3fd575f3483021c3
> +        .quad 0x3fec443755820d64, 0x3fd517de540ce2a3
> +        .quad 0x3fec5920900b5fd1, 0x3fd4babff975a04c
> +        .quad 0x3fec6dad2829ec62, 0x3fd45e99bcbb7915
> +        .quad 0x3fec81de16b14cef, 0x3fd4036d0468a7a2
> +        .quad 0x3fec95b455cce69d, 0x3fd3a93b1998736c
> +        .quad 0x3feca930e0e2a825, 0x3fd35005285227f1
> +        .quad 0x3fecbc54b476248d, 0x3fd2f7cc3fe6f423
> +        .quad 0x3feccf20ce0c0d27, 0x3fd2a09153529381
> +        .quad 0x3fece1962c0e0d8b, 0x3fd24a55399ea239
> +        .quad 0x3fecf3b5cdaf0c39, 0x3fd1f518ae487dc8
> +        .quad 0x3fed0580b2cfd249, 0x3fd1a0dc51a9934d
> +        .quad 0x3fed16f7dbe41ca0, 0x3fd14da0a961fd14
> +        .quad 0x3fed281c49d818d0, 0x3fd0fb6620c550af
> +        .quad 0x3fed38eefdf64fdd, 0x3fd0aa2d09497f2b
> +        .quad 0x3fed4970f9ce00d9, 0x3fd059f59af7a906
> +        .quad 0x3fed59a33f19ed42, 0x3fd00abff4dec7a3
> +        .quad 0x3fed6986cfa798e7, 0x3fcf79183b101c5b
> +        .quad 0x3fed791cad3eff01, 0x3fcedeb406d9c825
> +        .quad 0x3fed8865d98abe01, 0x3fce4652fadcb6b2
> +        .quad 0x3fed97635600bb89, 0x3fcdaff4969c0b04
> +        .quad 0x3feda61623cb41e0, 0x3fcd1b982c501370
> +        .quad 0x3fedb47f43b2980d, 0x3fcc893ce1dcbef7
> +        .quad 0x3fedc29fb60715af, 0x3fcbf8e1b1ca2279
> +        .quad 0x3fedd0787a8bb39d, 0x3fcb6a856c3ed54f
> +        .quad 0x3fedde0a90611a0d, 0x3fcade26b7fbed95
> +        .quad 0x3fedeb56f5f12d28, 0x3fca53c4135a6526
> +        .quad 0x3fedf85ea8db188e, 0x3fc9cb5bd549b111
> +        .quad 0x3fee0522a5dfda73, 0x3fc944ec2e4f5630
> +        .quad 0x3fee11a3e8cf4eb8, 0x3fc8c07329874652
> +        .quad 0x3fee1de36c75ba58, 0x3fc83deeada4d25a
> +        .quad 0x3fee29e22a89d766, 0x3fc7bd5c7df3fe9c
> +        .quad 0x3fee35a11b9b61ce, 0x3fc73eba3b5b07b7
> +        .quad 0x3fee4121370224cc, 0x3fc6c205655be720
> +        .quad 0x3fee4c6372cd8927, 0x3fc6473b5b15a7a1
> +        .quad 0x3fee5768c3b4a3fc, 0x3fc5ce595c455b0a
> +        .quad 0x3fee62321d06c5e0, 0x3fc5575c8a468362
> +        .quad 0x3fee6cc0709c8a0d, 0x3fc4e241e912c305
> +        .quad 0x3fee7714aec96534, 0x3fc46f066040a832
> +        .quad 0x3fee812fc64db369, 0x3fc3fda6bc016994
> +        .quad 0x3fee8b12a44944a8, 0x3fc38e1fae1d6a9d
> +        .quad 0x3fee94be342e6743, 0x3fc3206dceef5f87
> +        .quad 0x3fee9e335fb56f87, 0x3fc2b48d9e5dea1c
> +        .quad 0x3feea7730ed0bbb9, 0x3fc24a7b84d38971
> +        .quad 0x3feeb07e27a133aa, 0x3fc1e233d434b813
> +        .quad 0x3feeb9558e6b42ce, 0x3fc17bb2c8d41535
> +        .quad 0x3feec1fa258c4bea, 0x3fc116f48a6476cc
> +        .quad 0x3feeca6ccd709544, 0x3fc0b3f52ce8c383
> +        .quad 0x3feed2ae6489ac1e, 0x3fc052b0b1a174ea
> +        .quad 0x3feedabfc7453e63, 0x3fbfe6460fef4680
> +        .quad 0x3feee2a1d004692c, 0x3fbf2a901ccafb37
> +        .quad 0x3feeea5557137ae0, 0x3fbe723726b824a9
> +        .quad 0x3feef1db32a2277c, 0x3fbdbd32ac4c99b0
> +        .quad 0x3feef93436bc2daa, 0x3fbd0b7a0f921e7c
> +        .quad 0x3fef006135426b26, 0x3fbc5d0497c09e74
> +        .quad 0x3fef0762fde45ee6, 0x3fbbb1c972f23e50
> +        .quad 0x3fef0e3a5e1a1788, 0x3fbb09bfb7d11a84
> +        .quad 0x3fef14e8211e8c55, 0x3fba64de673e8837
> +        .quad 0x3fef1b6d0fea5f4d, 0x3fb9c31c6df3b1b8
> +        .quad 0x3fef21c9f12f0677, 0x3fb92470a61b6965
> +        .quad 0x3fef27ff89525acf, 0x3fb888d1d8e510a3
> +        .quad 0x3fef2e0e9a6a8b09, 0x3fb7f036c0107294
> +        .quad 0x3fef33f7e43a706b, 0x3fb75a96077274ba
> +        .quad 0x3fef39bc242e43e6, 0x3fb6c7e64e7281cb
> +        .quad 0x3fef3f5c1558b19e, 0x3fb6381e2980956b
> +        .quad 0x3fef44d870704911, 0x3fb5ab342383d178
> +        .quad 0x3fef4a31ebcd47df, 0x3fb5211ebf41880b
> +        .quad 0x3fef4f693b67bd77, 0x3fb499d478bca735
> +        .quad 0x3fef547f10d60597, 0x3fb4154bc68d75c3
> +        .quad 0x3fef59741b4b97cf, 0x3fb3937b1b31925a
> +        .quad 0x3fef5e4907982a07, 0x3fb31458e6542847
> +        .quad 0x3fef62fe80272419, 0x3fb297db960e4f63
> +        .quad 0x3fef67952cff6282, 0x3fb21df9981f8e53
> +        .quad 0x3fef6c0db3c34641, 0x3fb1a6a95b1e786f
> +        .quad 0x3fef7068b7b10fd9, 0x3fb131e14fa1625d
> +        .quad 0x3fef74a6d9a38383, 0x3fb0bf97e95f2a64
> +        .quad 0x3fef78c8b812d498, 0x3fb04fc3a0481321
> +        .quad 0x3fef7cceef15d631, 0x3fafc4b5e32d6259
> +        .quad 0x3fef80ba18636f07, 0x3faeeea8c1b1db94
> +        .quad 0x3fef848acb544e95, 0x3fae1d4cf1e2450a
> +        .quad 0x3fef88419ce4e184, 0x3fad508f9a1ea64f
> +        .quad 0x3fef8bdf1fb78370, 0x3fac885df3451a07
> +        .quad 0x3fef8f63e416ebff, 0x3fabc4a54a84e834
> +        .quad 0x3fef92d077f8d56d, 0x3fab055303221015
> +        .quad 0x3fef96256700da8e, 0x3faa4a549829587e
> +        .quad 0x3fef99633a838a57, 0x3fa993979e14fffe
> +        .quad 0x3fef9c8a7989af0d, 0x3fa8e109c4622913
> +        .quad 0x3fef9f9ba8d3c733, 0x3fa83298d717210e
> +        .quad 0x3fefa2974addae45, 0x3fa78832c03aa2b1
> +        .quad 0x3fefa57ddfe27376, 0x3fa6e1c5893c380b
> +        .quad 0x3fefa84fe5e05c8d, 0x3fa63f3f5c4de13b
> +        .quad 0x3fefab0dd89d1309, 0x3fa5a08e85af27e0
> +        .quad 0x3fefadb831a9f9c3, 0x3fa505a174e9c929
> +        .quad 0x3fefb04f6868a944, 0x3fa46e66be002240
> +        .quad 0x3fefb2d3f20f9101, 0x3fa3dacd1a8d8cce
> +        .quad 0x3fefb54641aebbc9, 0x3fa34ac36ad8dafe
> +        .quad 0x3fefb7a6c834b5a2, 0x3fa2be38b6d92415
> +        .quad 0x3fefb9f5f4739170, 0x3fa2351c2f2d1449
> +        .quad 0x3fefbc3433260ca5, 0x3fa1af5d2e04f3f6
> +        .quad 0x3fefbe61eef4cf6a, 0x3fa12ceb37ff9bc3
> +        .quad 0x3fefc07f907bc794, 0x3fa0adb5fcfa8c75
> +        .quad 0x3fefc28d7e4f9cd0, 0x3fa031ad58d56279
> +        .quad 0x3fefc48c1d033c7a, 0x3f9f7182a851bca2
> +        .quad 0x3fefc67bcf2d7b8f, 0x3f9e85c449e377f3
> +        .quad 0x3fefc85cf56ecd38, 0x3f9da0005e5f28df
> +        .quad 0x3fefca2fee770c79, 0x3f9cc0180af00a8b
> +        .quad 0x3fefcbf5170b578b, 0x3f9be5ecd2fcb5f9
> +        .quad 0x3fefcdacca0bfb73, 0x3f9b1160991ff737
> +        .quad 0x3fefcf57607a6e7c, 0x3f9a4255a00b9f03
> +        .quad 0x3fefd0f5317f582f, 0x3f9978ae8b55ce1b
> +        .quad 0x3fefd2869270a56f, 0x3f98b44e6031383e
> +        .quad 0x3fefd40bd6d7a785, 0x3f97f5188610ddc8
> +        .quad 0x3fefd58550773cb5, 0x3f973af0c737bb45
> +        .quad 0x3fefd6f34f52013a, 0x3f9685bb5134ef13
> +        .quad 0x3fefd85621b0876d, 0x3f95d55cb54cd53a
> +        .quad 0x3fefd9ae142795e3, 0x3f9529b9e8cf9a1e
> +        .quad 0x3fefdafb719e6a69, 0x3f9482b8455dc491
> +        .quad 0x3fefdc3e835500b3, 0x3f93e03d891b37de
> +        .quad 0x3fefdd7790ea5bc0, 0x3f93422fd6d12e2b
> +        .quad 0x3fefdea6e062d0c9, 0x3f92a875b5ffab56
> +        .quad 0x3fefdfccb62e52d3, 0x3f9212f612dee7fb
> +        .quad 0x3fefe0e9552ebdd6, 0x3f9181983e5133dd
> +        .quad 0x3fefe1fcfebe2083, 0x3f90f443edc5ce49
> +        .quad 0x3fefe307f2b503d0, 0x3f906ae13b0d3255
> +        .quad 0x3fefe40a6f70af4b, 0x3f8fcab1483ea7fc
> +        .quad 0x3fefe504b1d9696c, 0x3f8ec72615a894c4
> +        .quad 0x3fefe5f6f568b301, 0x3f8dcaf3691fc448
> +        .quad 0x3fefe6e1742f7cf6, 0x3f8cd5ec93c12432
> +        .quad 0x3fefe7c466dc57a1, 0x3f8be7e5ac24963b
> +        .quad 0x3fefe8a004c19ae6, 0x3f8b00b38d6b3575
> +        .quad 0x3fefe97483db8670, 0x3f8a202bd6372dce
> +        .quad 0x3fefea4218d6594a, 0x3f894624e78e0faf
> +        .quad 0x3fefeb08f7146046, 0x3f887275e3a6869e
> +        .quad 0x3fefebc950b3fa75, 0x3f87a4f6aca256cb
> +        .quad 0x3fefec835695932e, 0x3f86dd7fe3358230
> +        .quad 0x3fefed37386190fb, 0x3f861beae53b72b7
> +        .quad 0x3fefede5248e38f4, 0x3f856011cc3b036d
> +        .quad 0x3fefee8d486585ee, 0x3f84a9cf6bda3f4c
> +        .quad 0x3fefef2fd00af31a, 0x3f83f8ff5042a88e
> +        .quad 0x3fefefcce6813974, 0x3f834d7dbc76d7e5
> +        .quad 0x3feff064b5afffbe, 0x3f82a727a89a3f14
> +        .quad 0x3feff0f766697c76, 0x3f8205dac02bd6b9
> +        .quad 0x3feff18520700971, 0x3f81697560347b26
> +        .quad 0x3feff20e0a7ba8c2, 0x3f80d1d69569b82d
> +        .quad 0x3feff2924a3f7a83, 0x3f803ede1a45bfee
> +        .quad 0x3feff312046f2339, 0x3f7f60d8aa2a88f2
> +        .quad 0x3feff38d5cc4227f, 0x3f7e4cc4abf7d065
> +        .quad 0x3feff404760319b4, 0x3f7d4143a9dfe965
> +        .quad 0x3feff47772010262, 0x3f7c3e1a5f5c077c
> +        .quad 0x3feff4e671a85425, 0x3f7b430ecf4a83a8
> +        .quad 0x3feff55194fe19df, 0x3f7a4fe83fb9db25
> +        .quad 0x3feff5b8fb26f5f6, 0x3f79646f35a76624
> +        .quad 0x3feff61cc26c1578, 0x3f78806d70b2fc36
> +        .quad 0x3feff67d08401202, 0x3f77a3ade6c8b3e5
> +        .quad 0x3feff6d9e943c231, 0x3f76cdfcbfc1e263
> +        .quad 0x3feff733814af88c, 0x3f75ff2750fe7820
> +        .quad 0x3feff789eb6130c9, 0x3f7536fc18f7ce5c
> +        .quad 0x3feff7dd41ce2b4d, 0x3f74754abacdf1dc
> +        .quad 0x3feff82d9e1a76d8, 0x3f73b9e3f9d06e3f
> +        .quad 0x3feff87b1913e853, 0x3f730499b503957f
> +        .quad 0x3feff8c5cad200a5, 0x3f72553ee2a336bf
> +        .quad 0x3feff90dcaba4096, 0x3f71aba78ba3af89
> +        .quad 0x3feff9532f846ab0, 0x3f7107a8c7323a6e
> +        .quad 0x3feff9960f3eb327, 0x3f706918b6355624
> +        .quad 0x3feff9d67f51ddba, 0x3f6f9f9cfd9c3035
> +        .quad 0x3feffa14948549a7, 0x3f6e77448fb66bb9
> +        .quad 0x3feffa506302ebae, 0x3f6d58da68fd1170
> +        .quad 0x3feffa89fe5b3625, 0x3f6c4412bf4b8f0b
> +        .quad 0x3feffac17988ef4b, 0x3f6b38a3af2e55b4
> +        .quad 0x3feffaf6e6f4f5c0, 0x3f6a3645330550ff
> +        .quad 0x3feffb2a5879f35e, 0x3f693cb11a30d765
> +        .quad 0x3feffb5bdf67fe6f, 0x3f684ba3004a50d0
> +        .quad 0x3feffb8b8c88295f, 0x3f6762d84469c18f
> +        .quad 0x3feffbb970200110, 0x3f66821000795a03
> +        .quad 0x3feffbe599f4f9d9, 0x3f65a90b00981d93
> +        .quad 0x3feffc10194fcb64, 0x3f64d78bba8ca5fd
> +        .quad 0x3feffc38fcffbb7c, 0x3f640d564548fad7
> +        .quad 0x3feffc60535dd7f5, 0x3f634a305080681f
> +        .quad 0x3feffc862a501fd7, 0x3f628de11c5031eb
> +        .quad 0x3feffcaa8f4c9bea, 0x3f61d83170fbf6fb
> +        .quad 0x3feffccd8f5c66d1, 0x3f6128eb96be8798
> +        .quad 0x3feffcef371ea4d7, 0x3f607fdb4dafea5f
> +        .quad 0x3feffd0f92cb6ba7, 0x3f5fb99b8b8279e1
> +        .quad 0x3feffd2eae369a07, 0x3f5e7f232d9e2630
> +        .quad 0x3feffd4c94d29fdb, 0x3f5d4fed7195d7e8
> +        .quad 0x3feffd6951b33686, 0x3f5c2b9cf7f893bf
> +        .quad 0x3feffd84ef9009ee, 0x3f5b11d702b3deb2
> +        .quad 0x3feffd9f78c7524a, 0x3f5a024365f771bd
> +        .quad 0x3feffdb8f7605ee7, 0x3f58fc8c794b03b5
> +        .quad 0x3feffdd1750e1220, 0x3f58005f08d6f1ef
> +        .quad 0x3feffde8fb314ebf, 0x3f570d6a46e07dda
> +        .quad 0x3feffdff92db56e5, 0x3f56235fbd7a4345
> +        .quad 0x3feffe1544d01ccb, 0x3f5541f340697987
> +        .quad 0x3feffe2a1988857c, 0x3f5468dadf4080ab
> +        .quad 0x3feffe3e19349dc7, 0x3f5397ced7af2b15
> +        .quad 0x3feffe514bbdc197, 0x3f52ce898809244e
> +        .quad 0x3feffe63b8c8b5f7, 0x3f520cc76202c5fb
> +        .quad 0x3feffe7567b7b5e1, 0x3f515246dda49d47
> +        .quad 0x3feffe865fac722b, 0x3f509ec86c75d497
> +        .quad 0x3feffe96a78a04a9, 0x3f4fe41cd9bb4eee
> +        .quad 0x3feffea645f6d6da, 0x3f4e97ba3b77f306
> +        .quad 0x3feffeb5415e7c44, 0x3f4d57f524723822
> +        .quad 0x3feffec39ff380b9, 0x3f4c245d4b99847a
> +        .quad 0x3feffed167b12ac2, 0x3f4afc85e0f82e12
> +        .quad 0x3feffede9e5d3262, 0x3f49e005769dbc1d
> +        .quad 0x3feffeeb49896c6d, 0x3f48ce75e9f6f8a0
> +        .quad 0x3feffef76e956a9f, 0x3f47c7744d9378f7
> +        .quad 0x3fefff0312b010b5, 0x3f46caa0d3582fe9
> +        .quad 0x3fefff0e3ad91ec2, 0x3f45d79eb71e893b
> +        .quad 0x3fefff18ebe2b0e1, 0x3f44ee1429bf7cc0
> +        .quad 0x3fefff232a72b48e, 0x3f440daa3c89f5b6
> +        .quad 0x3fefff2cfb0453d9, 0x3f43360ccd23db3a
> +        .quad 0x3fefff3661e9569d, 0x3f4266ea71d4f71a
> +        .quad 0x3fefff3f634b79f9, 0x3f419ff4663ae9df
> +        .quad 0x3fefff48032dbe40, 0x3f40e0de78654d1e
> +        .quad 0x3fefff50456dab8c, 0x3f40295ef6591848
> +        .quad 0x3fefff582dc48d30, 0x3f3ef25d37f49fe1
> +        .quad 0x3fefff5fbfc8a439, 0x3f3da01102b5f851
> +        .quad 0x3fefff66feee5129, 0x3f3c5b5412dcafad
> +        .quad 0x3fefff6dee89352e, 0x3f3b23a5a23e4210
> +        .quad 0x3fefff7491cd4af6, 0x3f39f8893d8fd1c1
> +        .quad 0x3fefff7aebcff755, 0x3f38d986a4187285
> +        .quad 0x3fefff80ff8911fd, 0x3f37c629a822bc9e
> +        .quad 0x3fefff86cfd3e657, 0x3f36be02102b3520
> +        .quad 0x3fefff8c5f702ccf, 0x3f35c0a378c90bca
> +        .quad 0x3fefff91b102fca8, 0x3f34cda5374ea275
> +        .quad 0x3fefff96c717b695, 0x3f33e4a23d1f4703
> +        .quad 0x3fefff9ba420e834, 0x3f330538fbb77ecd
> +        .quad 0x3fefffa04a7928b1, 0x3f322f0b496539be
> +        .quad 0x3fefffa4bc63ee9a, 0x3f3161be46ad3b50
> +        .quad 0x3fefffa8fc0e5f33, 0x3f309cfa445b00ff
> +        .quad 0x3fefffad0b901755, 0x3f2fc0d55470cf51
> +        .quad 0x3fefffb0ecebee1b, 0x3f2e577bbcd49935
> +        .quad 0x3fefffb4a210b172, 0x3f2cfd4a5adec5c0
> +        .quad 0x3fefffb82cd9dcbf, 0x3f2bb1a9657ce465
> +        .quad 0x3fefffbb8f1049c6, 0x3f2a740684026555
> +        .quad 0x3fefffbeca6adbe9, 0x3f2943d4a1d1ed39
> +        .quad 0x3fefffc1e08f25f5, 0x3f28208bc334a6a5
> +        .quad 0x3fefffc4d3120aa1, 0x3f2709a8db59f25c
> +        .quad 0x3fefffc7a37857d2, 0x3f25feada379d8b7
> +        .quad 0x3fefffca53375ce3, 0x3f24ff207314a102
> +        .quad 0x3fefffcce3b57bff, 0x3f240a8c1949f75e
> +        .quad 0x3fefffcf564ab6b7, 0x3f23207fb7420eb9
> +        .quad 0x3fefffd1ac4135f9, 0x3f22408e9ba3327f
> +        .quad 0x3fefffd3e6d5cd87, 0x3f216a501f0e42ca
> +        .quad 0x3fefffd607387b07, 0x3f209d5f819c9e29
> +        .quad 0x3fefffd80e8ce0da, 0x3f1fb2b792b40a22
> +        .quad 0x3fefffd9fdeabcce, 0x3f1e3bcf436a1a95
> +        .quad 0x3fefffdbd65e5ad0, 0x3f1cd55277c18d05
> +        .quad 0x3fefffdd98e903b2, 0x3f1b7e94604479dc
> +        .quad 0x3fefffdf46816833, 0x3f1a36eec00926dd
> +        .quad 0x3fefffe0e0140857, 0x3f18fdc1b2dcf7b9
> +        .quad 0x3fefffe26683972a, 0x3f17d2737527c3f9
> +        .quad 0x3fefffe3daa95b18, 0x3f16b4702d7d5849
> +        .quad 0x3fefffe53d558ae9, 0x3f15a329b7d30748
> +        .quad 0x3fefffe68f4fa777, 0x3f149e17724f4d41
> +        .quad 0x3fefffe7d156d244, 0x3f13a4b60ba9aa4e
> +        .quad 0x3fefffe904222101, 0x3f12b6875310f785
> +        .quad 0x3fefffea2860ee1e, 0x3f11d312098e9dba
> +        .quad 0x3fefffeb3ebb267b, 0x3f10f9e1b4dd36df
> +        .quad 0x3fefffec47d19457, 0x3f102a8673a94692
> +        .quad 0x3fefffed443e2787, 0x3f0ec929a665b449
> +        .quad 0x3fefffee34943b15, 0x3f0d4f4b4c8e09ed
> +        .quad 0x3fefffef1960d85d, 0x3f0be6abbb10a5aa
> +        .quad 0x3fefffeff32af7af, 0x3f0a8e8cc1fadef6
> +        .quad 0x3feffff0c273bea2, 0x3f094637d5bacfdb
> +        .quad 0x3feffff187b6bc0e, 0x3f080cfdc72220cf
> +        .quad 0x3feffff2436a21dc, 0x3f06e2367dc27f95
> +        .quad 0x3feffff2f5fefcaa, 0x3f05c540b4936fd2
> +        .quad 0x3feffff39fe16963, 0x3f04b581b8d170fc
> +        .quad 0x3feffff44178c8d2, 0x3f03b2652b06c2b2
> +        .quad 0x3feffff4db27f146, 0x3f02bb5cc22e5db6
> +        .quad 0x3feffff56d4d5e5e, 0x3f01cfe010e2052d
> +        .quad 0x3feffff5f8435efc, 0x3f00ef6c4c84a0fe
> +        .quad 0x3feffff67c604180, 0x3f001984165a5f36
> +        .quad 0x3feffff6f9f67e55, 0x3efe9b5e8d00ce77
> +        .quad 0x3feffff77154e0d6, 0x3efd16f5716c6c1a
> +        .quad 0x3feffff7e2c6aea2, 0x3efba4f035d60e03
> +        .quad 0x3feffff84e93cd75, 0x3efa447b7b03f045
> +        .quad 0x3feffff8b500e77c, 0x3ef8f4ccca7fc90d
> +        .quad 0x3feffff9164f8e46, 0x3ef7b5223dac7336
> +        .quad 0x3feffff972be5c59, 0x3ef684c227fcacef
> +        .quad 0x3feffff9ca891572, 0x3ef562fac4329b48
> +        .quad 0x3feffffa1de8c582, 0x3ef44f21e49054f2
> +        .quad 0x3feffffa6d13de73, 0x3ef34894a5e24657
> +        .quad 0x3feffffab83e54b8, 0x3ef24eb7254ccf83
> +        .quad 0x3feffffaff99bac4, 0x3ef160f438c70913
> +        .quad 0x3feffffb43555b5f, 0x3ef07ebd2a2d2844
> +        .quad 0x3feffffb839e52f3, 0x3eef4f12e9ab070a
> +        .quad 0x3feffffbc09fa7cd, 0x3eedb5ad0b27805c
> +        .quad 0x3feffffbfa82616b, 0x3eec304efa2c6f4e
> +        .quad 0x3feffffc316d9ed0, 0x3eeabe09e9144b5e
> +        .quad 0x3feffffc6586abf6, 0x3ee95df988e76644
> +        .quad 0x3feffffc96f1165e, 0x3ee80f439b4ee04b
> +        .quad 0x3feffffcc5cec0c1, 0x3ee6d11788a69c64
> +        .quad 0x3feffffcf23ff5fc, 0x3ee5a2adfa0b4bc4
> +        .quad 0x3feffffd1c637b2b, 0x3ee4834877429b8f
> +        .quad 0x3feffffd4456a10d, 0x3ee37231085c7d9a
> +        .quad 0x3feffffd6a3554a1, 0x3ee26eb9daed6f7e
> +        .quad 0x3feffffd8e1a2f22, 0x3ee1783ceac28910
> +        .quad 0x3feffffdb01e8546, 0x3ee08e1badf0fced
> +        .quad 0x3feffffdd05a75ea, 0x3edf5f7d88472604
> +        .quad 0x3feffffdeee4f810, 0x3eddb92b5212fb8d
> +        .quad 0x3feffffe0bd3e852, 0x3edc282cd3957eda
> +        .quad 0x3feffffe273c15b7, 0x3edaab7abace48dc
> +        .quad 0x3feffffe41314e06, 0x3ed94219bfcb4928
> +        .quad 0x3feffffe59c6698b, 0x3ed7eb1a2075864e
> +        .quad 0x3feffffe710d565e, 0x3ed6a597219a93da
> +        .quad 0x3feffffe8717232d, 0x3ed570b69502f313
> +        .quad 0x3feffffe9bf4098c, 0x3ed44ba864670882
> +        .quad 0x3feffffeafb377d5, 0x3ed335a62115bce2
> +        .quad 0x3feffffec2641a9e, 0x3ed22df298214423
> +        .quad 0x3feffffed413e5b7, 0x3ed133d96ae7e0dd
> +        .quad 0x3feffffee4d01cd6, 0x3ed046aeabcfcdec
> +        .quad 0x3feffffef4a55bd4, 0x3ececb9cfe1d8642
> +        .quad 0x3fefffff039f9e8f, 0x3ecd21397ead99cb
> +        .quad 0x3fefffff11ca4876, 0x3ecb8d094c86d374
> +        .quad 0x3fefffff1f302bc1, 0x3eca0df0f0c626dc
> +        .quad 0x3fefffff2bdb904d, 0x3ec8a2e269750a39
> +        .quad 0x3fefffff37d63a36, 0x3ec74adc8f4064d3
> +        .quad 0x3fefffff43297019, 0x3ec604ea819f007c
> +        .quad 0x3fefffff4dde0118, 0x3ec4d0231928c6f9
> +        .quad 0x3fefffff57fc4a95, 0x3ec3aba85fe22e20
> +        .quad 0x3fefffff618c3da6, 0x3ec296a70f414053
> +        .quad 0x3fefffff6a956450, 0x3ec1905613b3abf2
> +        .quad 0x3fefffff731ee681, 0x3ec097f6156f32c5
> +        .quad 0x3fefffff7b2f8ed6, 0x3ebf59a20caf6695
> +        .quad 0x3fefffff82cdcf1b, 0x3ebd9c73698fb1dc
> +        .quad 0x3fefffff89ffc4aa, 0x3ebbf716c6168bae
> +        .quad 0x3fefffff90cb3c81, 0x3eba6852c6b58392
> +        .quad 0x3fefffff9735b73b, 0x3eb8eefd70594a89
> +        .quad 0x3fefffff9d446ccc, 0x3eb789fb715aae95
> +        .quad 0x3fefffffa2fc5015, 0x3eb6383f726a8e04
> +        .quad 0x3fefffffa8621251, 0x3eb4f8c96f26a26a
> +        .quad 0x3fefffffad7a2652, 0x3eb3caa61607f920
> +        .quad 0x3fefffffb248c39d, 0x3eb2acee2f5ecdb8
> +        .quad 0x3fefffffb6d1e95d, 0x3eb19ec60b1242ed
> +        .quad 0x3fefffffbb196132, 0x3eb09f5cf4dd2877
> +        .quad 0x3fefffffbf22c1e2, 0x3eaf5bd95d8730d8
> +        .quad 0x3fefffffc2f171e3, 0x3ead9371e2ff7c35
> +        .quad 0x3fefffffc688a9cf, 0x3eabe41de54d155a
> +        .quad 0x3fefffffc9eb76ac, 0x3eaa4c89e08ef4f3
> +        .quad 0x3fefffffcd1cbc28, 0x3ea8cb738399b12c
> +        .quad 0x3fefffffd01f36af, 0x3ea75fa8dbc84bec
> +        .quad 0x3fefffffd2f57d68, 0x3ea608078a70dcbc
> +        .quad 0x3fefffffd5a2041f, 0x3ea4c37c0394d094
> +        .quad 0x3fefffffd8271d12, 0x3ea39100d5687bfe
> +        .quad 0x3fefffffda86faa9, 0x3ea26f9df8519bd7
> +        .quad 0x3fefffffdcc3b117, 0x3ea15e6827001f18
> +        .quad 0x3fefffffdedf37ed, 0x3ea05c803e4831c1
> +        .quad 0x3fefffffe0db6b91, 0x3e9ed22548cffd35
> +        .quad 0x3fefffffe2ba0ea5, 0x3e9d06ad6ecdf971
> +        .quad 0x3fefffffe47ccb60, 0x3e9b551c847fbc96
> +        .quad 0x3fefffffe62534d4, 0x3e99bc09f112b494
> +        .quad 0x3fefffffe7b4c81e, 0x3e983a1ff0aa239d
> +        .quad 0x3fefffffe92ced93, 0x3e96ce1aa3fd7bdd
> +        .quad 0x3fefffffea8ef9cf, 0x3e9576c72b514859
> +        .quad 0x3fefffffebdc2ec6, 0x3e943302cc4a0da8
> +        .quad 0x3fefffffed15bcba, 0x3e9301ba221dc9bb
> +        .quad 0x3fefffffee3cc32c, 0x3e91e1e857adc568
> +        .quad 0x3fefffffef5251c2, 0x3e90d2966b1746f7
> +        .quad 0x3feffffff0576917, 0x3e8fa5b4f49cc6b2
> +        .quad 0x3feffffff14cfb92, 0x3e8dc3ae30b55c16
> +        .quad 0x3feffffff233ee1d, 0x3e8bfd7555a3bd68
> +        .quad 0x3feffffff30d18e8, 0x3e8a517d9e61628a
> +        .quad 0x3feffffff3d9480f, 0x3e88be4f8f6c951f
> +        .quad 0x3feffffff4993c46, 0x3e874287ded49339
> +        .quad 0x3feffffff54dab72, 0x3e85dcd669f2cd34
> +        .quad 0x3feffffff5f74141, 0x3e848bfd38302871
> +        .quad 0x3feffffff6969fb8, 0x3e834ecf8a3c124a
> +        .quad 0x3feffffff72c5fb6, 0x3e822430f521cbcf
> +        .quad 0x3feffffff7b91176, 0x3e810b1488aeb235
> +        .quad 0x3feffffff83d3d07, 0x3e80027c00a263a6
> +        .quad 0x3feffffff8b962be, 0x3e7e12ee004efc37
> +        .quad 0x3feffffff92dfba2, 0x3e7c3e44ae32b16b
> +        .quad 0x3feffffff99b79d2, 0x3e7a854ea14102a8
> +        .quad 0x3feffffffa0248e8, 0x3e78e6761569f45d
> +        .quad 0x3feffffffa62ce54, 0x3e77603bac345f65
> +        .quad 0x3feffffffabd69b4, 0x3e75f1353cdad001
> +        .quad 0x3feffffffb127525, 0x3e74980cb3c80949
> +        .quad 0x3feffffffb624592, 0x3e73537f00b6ad4d
> +        .quad 0x3feffffffbad2aff, 0x3e72225b12bffc68
> +        .quad 0x3feffffffbf370cd, 0x3e710380e1adb7e9
> +        .quad 0x3feffffffc355dfd, 0x3e6febc107d5efaa
> +        .quad 0x3feffffffc733572, 0x3e6df0f2a0ee6947
> +        .quad 0x3feffffffcad3626, 0x3e6c14b2188bcee4
> +        .quad 0x3feffffffce39b67, 0x3e6a553644f7f07d
> +        .quad 0x3feffffffd169d0c, 0x3e68b0cfce0579e0
> +        .quad 0x3feffffffd466fa5, 0x3e6725e7c5dd20f7
> +        .quad 0x3feffffffd7344aa, 0x3e65b2fe547a1340
> +        .quad 0x3feffffffd9d4aab, 0x3e6456a974e92e93
> +        .quad 0x3feffffffdc4ad7a, 0x3e630f93c3699078
> +        .quad 0x3feffffffde9964e, 0x3e61dc7b5b978cf8
> +        .quad 0x3feffffffe0c2bf0, 0x3e60bc30c5d52f15
> +        .quad 0x3feffffffe2c92db, 0x3e5f5b2be65a0c7f
> +        .quad 0x3feffffffe4aed5e, 0x3e5d5f3a8dea7357
> +        .quad 0x3feffffffe675bbd, 0x3e5b82915b03515b
> +        .quad 0x3feffffffe81fc4e, 0x3e59c3517e789488
> +        .quad 0x3feffffffe9aeb97, 0x3e581fb7df06136e
> +        .quad 0x3feffffffeb24467, 0x3e56961b8d641d06
> +        .quad 0x3feffffffec81ff2, 0x3e5524ec4d916cae
> +        .quad 0x3feffffffedc95e7, 0x3e53cab1343d18d1
> +        .quad 0x3feffffffeefbc85, 0x3e52860757487a01
> +        .quad 0x3fefffffff01a8b6, 0x3e5155a09065d4f7
> +        .quad 0x3fefffffff126e1e, 0x3e50384250e4c9fc
> +        .quad 0x3fefffffff221f30, 0x3e4e59890b926c78
> +        .quad 0x3fefffffff30cd3f, 0x3e4c642116a8a9e3
> +        .quad 0x3fefffffff3e8892, 0x3e4a8e405e651ab6
> +        .quad 0x3fefffffff4b606f, 0x3e48d5f98114f872
> +        .quad 0x3fefffffff57632d, 0x3e47397c5a66e307
> +        .quad 0x3fefffffff629e44, 0x3e45b71456c5a4c4
> +        .quad 0x3fefffffff6d1e56, 0x3e444d26de513197
> +        .quad 0x3fefffffff76ef3f, 0x3e42fa31d6371537
> +        .quad 0x3fefffffff801c1f, 0x3e41bcca373b7b43
> +        .quad 0x3fefffffff88af67, 0x3e40939ab853339f
> +        .quad 0x3fefffffff90b2e3, 0x3e3efac5187b2863
> +        .quad 0x3fefffffff982fc1, 0x3e3cf1e86235d0e7
> +        .quad 0x3fefffffff9f2e9f, 0x3e3b0a68a2128bab
> +        .quad 0x3fefffffffa5b790, 0x3e39423165bc4444
> +        .quad 0x3fefffffffabd229, 0x3e37974e743dea3d
> +        .quad 0x3fefffffffb18582, 0x3e3607e9eacd1050
> +        .quad 0x3fefffffffb6d844, 0x3e34924a74dec729
> +        .quad 0x3fefffffffbbd0aa, 0x3e3334d19e0c2160
> +        .quad 0x3fefffffffc0748f, 0x3e31edfa3c5f5cca
> +        .quad 0x3fefffffffc4c96c, 0x3e30bc56f1b54701
> +        .quad 0x3fefffffffc8d462, 0x3e2f3d2185e047d9
> +        .quad 0x3fefffffffcc9a41, 0x3e2d26cb87945e87
> +        .quad 0x3fefffffffd01f89, 0x3e2b334fac4b9f99
> +        .quad 0x3fefffffffd36871, 0x3e296076f7918d1c
> +        .quad 0x3fefffffffd678ed, 0x3e27ac2d72fc2c63
> +        .quad 0x3fefffffffd954ae, 0x3e2614801550319e
> +        .quad 0x3fefffffffdbff2a, 0x3e24979ac8b28927
> +        .quad 0x3fefffffffde7ba0, 0x3e2333c68e2d0548
> +        .quad 0x3fefffffffe0cd16, 0x3e21e767bce37dd7
> +        .quad 0x3fefffffffe2f664, 0x3e20b0fc5b6d05a0
> +        .quad 0x3fefffffffe4fa30, 0x3e1f1e3523b41d7d
> +        .quad 0x3fefffffffe6daf7, 0x3e1d00de6608effe
> +        .quad 0x3fefffffffe89b0c, 0x3e1b0778b7b3301b
> +        .quad 0x3fefffffffea3c9a, 0x3e192fb04ec0f6cf
> +        .quad 0x3fefffffffebc1a9, 0x3e177756ec9f78fa
> +        .quad 0x3fefffffffed2c21, 0x3e15dc61922d5a06
> +        .quad 0x3fefffffffee7dc8, 0x3e145ce65699ff6d
> +        .quad 0x3fefffffffefb847, 0x3e12f71a5f159970
> +        .quad 0x3feffffffff0dd2b, 0x3e11a94ff571654f
> +        .quad 0x3feffffffff1ede9, 0x3e1071f4bbea09ec
> +        .quad 0x3feffffffff2ebda, 0x3e0e9f1ff8ddd774
> +        .quad 0x3feffffffff3d843, 0x3e0c818223a202c7
> +        .quad 0x3feffffffff4b453, 0x3e0a887bd2b4404d
> +        .quad 0x3feffffffff58126, 0x3e08b1a336c5eb6b
> +        .quad 0x3feffffffff63fc3, 0x3e06fab63324088a
> +        .quad 0x3feffffffff6f121, 0x3e056197e30205ba
> +        .quad 0x3feffffffff79626, 0x3e03e44e45301b92
> +        .quad 0x3feffffffff82fab, 0x3e0281000bfe4c3f
> +        .quad 0x3feffffffff8be77, 0x3e0135f28f2d50b4
> +        .quad 0x3feffffffff94346, 0x3e000187dded5975
> +        .quad 0x3feffffffff9bec8, 0x3dfdc479de0ef001
> +        .quad 0x3feffffffffa319f, 0x3dfbad4fdad3caa1
> +        .quad 0x3feffffffffa9c63, 0x3df9baed3ed27ab8
> +        .quad 0x3feffffffffaffa4, 0x3df7ead9ce4285bb
> +        .quad 0x3feffffffffb5be5, 0x3df63ac6b4edc88e
> +        .quad 0x3feffffffffbb1a2, 0x3df4a88be2a6390c
> +        .quad 0x3feffffffffc014e, 0x3df332259185f1a0
> +        .quad 0x3feffffffffc4b56, 0x3df1d5b1f3793044
> +        .quad 0x3feffffffffc901c, 0x3df0916f04b6e18b
> +        .quad 0x3feffffffffccfff, 0x3deec77101de6926
> +        .quad 0x3feffffffffd0b56, 0x3dec960bf23153e0
> +        .quad 0x3feffffffffd4271, 0x3dea8bd20fc65ef7
> +        .quad 0x3feffffffffd759d, 0x3de8a61745ec7d1d
> +        .quad 0x3feffffffffda520, 0x3de6e25d0e756261
> +        .quad 0x3feffffffffdd13c, 0x3de53e4f7d1666cb
> +        .quad 0x3feffffffffdfa2d, 0x3de3b7c27a7ddb0e
> +        .quad 0x3feffffffffe202d, 0x3de24caf2c32af14
> +        .quad 0x3feffffffffe4371, 0x3de0fb3186804d0f
> +        .quad 0x3feffffffffe642a, 0x3ddf830c0bb41fd7
> +        .quad 0x3feffffffffe8286, 0x3ddd3c0f1a91c846
> +        .quad 0x3feffffffffe9eb0, 0x3ddb1e5acf351d87
> +        .quad 0x3feffffffffeb8d0, 0x3dd92712d259ce66
> +        .quad 0x3feffffffffed10a, 0x3dd7538c60a04476
> +        .quad 0x3feffffffffee782, 0x3dd5a14b04b47879
> +        .quad 0x3feffffffffefc57, 0x3dd40dfd87456f4c
> +        .quad 0x3fefffffffff0fa7, 0x3dd2977b1172b9d5
> +        .quad 0x3fefffffffff218f, 0x3dd13bc07e891491
> +        .quad 0x3fefffffffff3227, 0x3dcff1dbb4300811
> +        .quad 0x3fefffffffff4188, 0x3dcd9a880f306bd8
> +        .quad 0x3fefffffffff4fc9, 0x3dcb6e45220b55e0
> +        .quad 0x3fefffffffff5cfd, 0x3dc96a0b33f2c4da
> +        .quad 0x3fefffffffff6939, 0x3dc78b07e9e924ac
> +        .quad 0x3fefffffffff748e, 0x3dc5ce9ab1670dd2
> +        .quad 0x3fefffffffff7f0d, 0x3dc4325167006bb0
> +        .quad 0x3fefffffffff88c5, 0x3dc2b3e53538ff3f
> +        .quad 0x3fefffffffff91c6, 0x3dc15137a7f44864
> +        .quad 0x3fefffffffff9a1b, 0x3dc0084ff125639d
> +        .quad 0x3fefffffffffa1d2, 0x3dbdaeb0b7311ec7
> +        .quad 0x3fefffffffffa8f6, 0x3dbb7937d1c40c53
> +        .quad 0x3fefffffffffaf92, 0x3db96d082f59ab06
> +        .quad 0x3fefffffffffb5b0, 0x3db7872d9fa10aad
> +        .quad 0x3fefffffffffbb58, 0x3db5c4e8e37bc7d0
> +        .quad 0x3fefffffffffc095, 0x3db423ac0df49a40
> +        .quad 0x3fefffffffffc56d, 0x3db2a117230ad284
> +        .quad 0x3fefffffffffc9e8, 0x3db13af4f04f9998
> +        .quad 0x3fefffffffffce0d, 0x3dafde703724e560
> +        .quad 0x3fefffffffffd1e1, 0x3dad77f0c82e7641
> +        .quad 0x3fefffffffffd56c, 0x3dab3ee02611d7dd
> +        .quad 0x3fefffffffffd8b3, 0x3da92ff33023d5bd
> +        .quad 0x3fefffffffffdbba, 0x3da7481a9e69f53f
> +        .quad 0x3fefffffffffde86, 0x3da5847eda620959
> +        .quad 0x3fefffffffffe11d, 0x3da3e27c1fcc74bd
> +        .quad 0x3fefffffffffe380, 0x3da25f9ee0b923dc
> +        .quad 0x3fefffffffffe5b6, 0x3da0f9a068653200
> +        .quad 0x3fefffffffffe7c0, 0x3d9f5cc7718082b0
> +        .quad 0x3fefffffffffe9a2, 0x3d9cf7e53d6a2ca5
> +        .quad 0x3fefffffffffeb60, 0x3d9ac0f5f3229372
> +        .quad 0x3fefffffffffecfb, 0x3d98b498644847ea
> +        .quad 0x3fefffffffffee77, 0x3d96cfa9bcca59dc
> +        .quad 0x3fefffffffffefd6, 0x3d950f411d4fd2cd
> +        .quad 0x3feffffffffff11a, 0x3d9370ab8327af5e
> +        .quad 0x3feffffffffff245, 0x3d91f167f88c6b6e
> +        .quad 0x3feffffffffff359, 0x3d908f24085d4597
> +        .quad 0x3feffffffffff457, 0x3d8e8f70e181d61a
> +        .quad 0x3feffffffffff542, 0x3d8c324c20e337dc
> +        .quad 0x3feffffffffff61b, 0x3d8a03261574b54e
> +        .quad 0x3feffffffffff6e3, 0x3d87fe903cdf5855
> +        .quad 0x3feffffffffff79b, 0x3d86215c58da3450
> +        .quad 0x3feffffffffff845, 0x3d846897d4b69fc6
> +        .quad 0x3feffffffffff8e2, 0x3d82d1877d731b7b
> +        .quad 0x3feffffffffff973, 0x3d8159a386b11517
> +        .quad 0x3feffffffffff9f8, 0x3d7ffd27ae9393ce
> +        .quad 0x3feffffffffffa73, 0x3d7d7c593130dd0b
> +        .quad 0x3feffffffffffae4, 0x3d7b2cd607c79bcf
> +        .quad 0x3feffffffffffb4c, 0x3d790ae4d3405651
> +        .quad 0x3feffffffffffbad, 0x3d771312dd1759e2
> +        .quad 0x3feffffffffffc05, 0x3d75422ef5d8949d
> +        .quad 0x3feffffffffffc57, 0x3d739544b0ecc957
> +        .quad 0x3feffffffffffca2, 0x3d720997f73e73dd
> +        .quad 0x3feffffffffffce7, 0x3d709ca0eaacd277
> +        .quad 0x3feffffffffffd27, 0x3d6e9810295890ec
> +        .quad 0x3feffffffffffd62, 0x3d6c2b45b5aa4a1d
> +        .quad 0x3feffffffffffd98, 0x3d69eee068fa7596
> +        .quad 0x3feffffffffffdca, 0x3d67df2b399c10a8
> +        .quad 0x3feffffffffffdf8, 0x3d65f8b87a31bd85
> +        .quad 0x3feffffffffffe22, 0x3d64385c96e9a2d9
> +        .quad 0x3feffffffffffe49, 0x3d629b2933ef4cbc
> +        .quad 0x3feffffffffffe6c, 0x3d611e68a6378f8a
> +        .quad 0x3feffffffffffe8d, 0x3d5f7f338086a86b
> +        .quad 0x3feffffffffffeab, 0x3d5cf8d7d9ce040a
> +        .quad 0x3feffffffffffec7, 0x3d5aa577251ae485
> +        .quad 0x3feffffffffffee1, 0x3d58811d739efb5f
> +        .quad 0x3feffffffffffef8, 0x3d568823e52970be
> +        .quad 0x3fefffffffffff0e, 0x3d54b72ae68e8b4c
> +        .quad 0x3fefffffffffff22, 0x3d530b14dbe876bc
> +        .quad 0x3fefffffffffff34, 0x3d5181012ef86610
> +        .quad 0x3fefffffffffff45, 0x3d501647ba798745
> +        .quad 0x3fefffffffffff54, 0x3d4d90e917701675
> +        .quad 0x3fefffffffffff62, 0x3d4b2a87e86d0c8a
> +        .quad 0x3fefffffffffff6f, 0x3d48f53dcb377293
> +        .quad 0x3fefffffffffff7b, 0x3d46ed2f2515e933
> +        .quad 0x3fefffffffffff86, 0x3d450ecc9ed47f19
> +        .quad 0x3fefffffffffff90, 0x3d4356cd5ce7799e
> +        .quad 0x3fefffffffffff9a, 0x3d41c229a587ab78
> +        .quad 0x3fefffffffffffa2, 0x3d404e15ecc7f3f6
> +        .quad 0x3fefffffffffffaa, 0x3d3deffc7e6a6017
> +        .quad 0x3fefffffffffffb1, 0x3d3b7b040832f310
> +        .quad 0x3fefffffffffffb8, 0x3d3938e021f36d76
> +        .quad 0x3fefffffffffffbe, 0x3d37258610b3b233
> +        .quad 0x3fefffffffffffc3, 0x3d353d3bfc82a909
> +        .quad 0x3fefffffffffffc8, 0x3d337c92babdc2fd
> +        .quad 0x3fefffffffffffcd, 0x3d31e06010120f6a
> +        .quad 0x3fefffffffffffd1, 0x3d3065b9616170d4
> +        .quad 0x3fefffffffffffd5, 0x3d2e13dd96b3753b
> +        .quad 0x3fefffffffffffd9, 0x3d2b950d32467392
> +        .quad 0x3fefffffffffffdc, 0x3d294a72263259a5
> +        .quad 0x3fefffffffffffdf, 0x3d272fd93e036cdc
> +        .quad 0x3fefffffffffffe2, 0x3d254164576929ab
> +        .quad 0x3fefffffffffffe4, 0x3d237b83c521fe96
> +        .quad 0x3fefffffffffffe7, 0x3d21daf033182e96
> +        .quad 0x3fefffffffffffe9, 0x3d205ca50205d26a
> +        .quad 0x3fefffffffffffeb, 0x3d1dfbb6235639fa
> +        .quad 0x3fefffffffffffed, 0x3d1b7807e294781f
> +        .quad 0x3fefffffffffffee, 0x3d19298add70a734
> +        .quad 0x3feffffffffffff0, 0x3d170beaf9c7ffb6
> +        .quad 0x3feffffffffffff1, 0x3d151b2cd6709222
> +        .quad 0x3feffffffffffff3, 0x3d1353a6cf7f7fff
> +        .quad 0x3feffffffffffff4, 0x3d11b1fa8cbe84a7
> +        .quad 0x3feffffffffffff5, 0x3d10330f0fd69921
> +        .quad 0x3feffffffffffff6, 0x3d0da81670f96f9b
> +        .quad 0x3feffffffffffff7, 0x3d0b24a16b4d09aa
> +        .quad 0x3feffffffffffff7, 0x3d08d6eeb6efdbd6
> +        .quad 0x3feffffffffffff8, 0x3d06ba91ac734786
> +        .quad 0x3feffffffffffff9, 0x3d04cb7966770ab5
> +        .quad 0x3feffffffffffff9, 0x3d0305e9721d0981
> +        .quad 0x3feffffffffffffa, 0x3d01667311fff70a
> +        .quad 0x3feffffffffffffb, 0x3cffd3de10d62855
> +        .quad 0x3feffffffffffffb, 0x3cfd1aefbcd48d0c
> +        .quad 0x3feffffffffffffb, 0x3cfa9cc93c25aca9
> +        .quad 0x3feffffffffffffc, 0x3cf85487ee3ea735
> +        .quad 0x3feffffffffffffc, 0x3cf63daf8b4b1e0c
> +        .quad 0x3feffffffffffffd, 0x3cf45421e69a6ca1
> +        .quad 0x3feffffffffffffd, 0x3cf294175802d99a
> +        .quad 0x3feffffffffffffd, 0x3cf0fa17bf41068f
> +        .quad 0x3feffffffffffffd, 0x3cef05e82aae2bb9
> +        .quad 0x3feffffffffffffe, 0x3cec578101b29058
> +        .quad 0x3feffffffffffffe, 0x3ce9e39dc5dd2f7c
> +        .quad 0x3feffffffffffffe, 0x3ce7a553a728bbf2
> +        .quad 0x3feffffffffffffe, 0x3ce5982008db1304
> +        .quad 0x3feffffffffffffe, 0x3ce3b7e00422e51b
> +        .quad 0x3feffffffffffffe, 0x3ce200c898d9ee3e
> +        .quad 0x3fefffffffffffff, 0x3ce06f5f7eb65a56
> +        .quad 0x3fefffffffffffff, 0x3cde00e9148a1d25
> +        .quad 0x3fefffffffffffff, 0x3cdb623734024e92
> +        .quad 0x3fefffffffffffff, 0x3cd8fd4e01891bf8
> +        .quad 0x3fefffffffffffff, 0x3cd6cd44c7470d89
> +        .quad 0x3fefffffffffffff, 0x3cd4cd9c04158cd7
> +        .quad 0x3fefffffffffffff, 0x3cd2fa34bf5c8344
> +        .quad 0x3fefffffffffffff, 0x3cd14f4890ff2461
> +        .quad 0x3fefffffffffffff, 0x3ccf92c49dfa4df5
> +        .quad 0x3fefffffffffffff, 0x3ccccaaea71ab0df
> +        .quad 0x3fefffffffffffff, 0x3cca40829f001197
> +        .quad 0x3ff0000000000000, 0x3cc7eef13b59e96c
> +        .quad 0x3ff0000000000000, 0x3cc5d11e1a252bf5
> +        .quad 0x3ff0000000000000, 0x3cc3e296303b2297
> +        .quad 0x3ff0000000000000, 0x3cc21f47009f43ce
> +        .quad 0x3ff0000000000000, 0x3cc083768c5e4542
> +        .quad 0x3ff0000000000000, 0x3cbe1777d831265f
> +        .quad 0x3ff0000000000000, 0x3cbb69f10b0191b5
> +        .quad 0x3ff0000000000000, 0x3cb8f8a3a05b5b53
> +        .quad 0x3ff0000000000000, 0x3cb6be573c40c8e7
> +        .quad 0x3ff0000000000000, 0x3cb4b645ba991fdb
> +        .align 16
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff  /* _AbsMask */
> +        .align 16
> +        .quad 0x4017f80000000000, 0x4017f80000000000  /* _MaxThreshold = 6.0 - 1.0/128.0 */
> +        .align 16
> +        .quad 0x42c0000000000000, 0x42c0000000000000  /* SRound */
> +        .align 16
> +        .quad 0x2ff0000000000000, 0x2ff0000000000000  /* _U2Threshold */
> +        .align 16
> +        .quad 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5  /* _poly_1_0 */
> +        .align 16
> +        .quad 0x3fc1111235a363b1, 0x3fc1111235a363b1  /* _poly_1_1 */
> +        .align 16
> +        .quad 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57  /* _poly_3_0 */
> +        .align 16
> +        .quad 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8  /* _poly_3_1 */
> +        .align 16
> +        .quad 0xbfc5555800001B4F, 0xbfc5555800001B4F  /* _poly_5_0 */
> +        .align 16
> +        .quad 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122  /* _poly_5_1 */
> +        .align 16
> +        .quad 0xbfd55555555547f6, 0xbfd55555555547f6  /* _poly_1_2 */
> +        .align 16
> +        .quad 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd  /* _poly_3_2 */
> +        .align 16
> +        .quad 0x3fe5555555554b0c, 0x3fe5555555554b0c  /* _poly_1_3 */
> +        .align 16
> +        .quad 0xbfd5555555555555, 0xbfd5555555555555  /* _poly_3_3 */
> +        .align 16
> +        .type	__svml_derf_data_internal,@object
> +        .size	__svml_derf_data_internal,.-__svml_derf_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core-sse.S
> new file mode 100644
> index 0000000000..704785738f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized erf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN4v_erf _ZGVdN4v_erf_sse_wrapper
> +#include "../svml_d_erf4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core.c
> new file mode 100644
> index 0000000000..0647917209
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized erf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN4v_erf
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN4v_erf, __GI__ZGVdN4v_erf, __redirect__ZGVdN4v_erf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core_avx2.S
> new file mode 100644
> index 0000000000..bd7226cd5c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_erf4_core_avx2.S
> @@ -0,0 +1,984 @@
> +/* Function erf vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Basic formula is
> + *    erf(x) ~ erf(x0) +
> + *              + exp(-x0*x0)*D*(1+c0+T*P1(T)+D^2*P3(T)+D^4*P5(T)+D^6*p7+D^8*p9)
> + *   where D=x-x0, T=x0*D
> + *   x0 is x rounded to a specified number of fractional bits (in this case 7),
> + *    except that x0=0 for |x|<3.5/128.0 (using x0=0 for first 4 table entries)
> + *
> + *   Data table packs both erf(x0)_high and a few bits of erf(x0)_low in one
> + *   entry (in place of redundant exponent bits)
> + *
> + */
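
(For readers following the table-driven scheme above, here is a rough,
stand-alone scalar sketch in C.  It is illustrative only: libm's erf()
and exp() stand in for the two table columns, only the leading term of
the expansion is kept, and the names erf_sketch/erf_hi/exp_hi are
hypothetical, not helpers used by this patch.  The 2/sqrt(pi) factor
appears to be folded into the second table column; its first entry,
0x3ff20dd750429b6d, is M_2_SQRTPI.)

#include <math.h>
#include <stdio.h>

#ifndef M_2_SQRTPI
# define M_2_SQRTPI 1.12837916709551257390  /* 2/sqrt(pi) */
#endif

/* Leading-term model of the algorithm described above (illustrative,
   not the glibc code): clamp, round x to 7 fractional bits, take
   erf(x0) and (2/sqrt(pi))*exp(-x0*x0) as the two "table" values,
   then correct by D = x - x0.  The real kernel also adds the
   P1/P3/P5 polynomial terms in T = x0*D and D.  */
static double
erf_sketch (double x)
{
  double a = fabs (x);
  /* erf rounds to 1.0 beyond _MaxThreshold = 6.0 - 1.0/128.0.  */
  if (a > 6.0 - 1.0 / 128.0)
    a = 6.0 - 1.0 / 128.0;
  double x0 = nearbyint (a * 128.0) / 128.0;   /* SRound trick: 1/128 grid */
  double d = a - x0;                           /* D */
  double erf_hi = erf (x0);                    /* first table column */
  double exp_hi = M_2_SQRTPI * exp (-x0 * x0); /* second table column */
  double r = erf_hi + exp_hi * d;              /* + polynomial correction */
  return copysign (r, x);                      /* restore sign; erf is odd */
}

int
main (void)
{
  for (double x = -3.0; x <= 3.0; x += 0.75)
    printf ("x=% .2f  sketch=% .12f  libm=% .12f\n",
            x, erf_sketch (x), erf (x));
  return 0;
}
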
> +
> +/* Offsets for data table __svml_derf_data_internal
> + */
> +#define _erf_tbl                      	0
> +#define _AbsMask                      	12288
> +#define _MaxThreshold                 	12320
> +#define _SRound                       	12352
> +#define _U2Threshold                  	12384
> +#define _poly1_0                      	12416
> +#define _poly1_1                      	12448
> +#define _poly3_0                      	12480
> +#define _poly3_1                      	12512
> +#define _poly5_0                      	12544
> +#define _poly5_1                      	12576
> +#define _poly1_2                      	12608
> +#define _poly3_2                      	12640
> +#define _poly1_3                      	12672
> +#define _poly3_3                      	12704
> +#define _Mask32                       	12736
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN4v_erf_avx2)
> +/*
> + * vector gather: erf(x0),
> + * second value is exp(-x0*x0)
> + */
> +        lea       __svml_derf_data_internal(%rip), %rdi
> +        vmovupd   _SRound+__svml_derf_data_internal(%rip), %ymm6
> +        vandpd    _AbsMask+__svml_derf_data_internal(%rip), %ymm0, %ymm5
> +
> +/*
> + * erf(x) rounds to 1.0 for x>_MaxThreshold (5.9921875)
> + * can compute all results in the main path
> + */
> +        vminpd    _MaxThreshold+__svml_derf_data_internal(%rip), %ymm5, %ymm7
> +        vaddpd    %ymm6, %ymm7, %ymm10
> +        vcmpgt_oqpd _U2Threshold+__svml_derf_data_internal(%rip), %ymm7, %ymm9
> +        vpsllq    $4, %ymm10, %ymm11
> +        vsubpd    %ymm6, %ymm10, %ymm8
> +        vandps    _Mask32+__svml_derf_data_internal(%rip), %ymm11, %ymm12
> +        vsubpd    %ymm8, %ymm7, %ymm3
> +        vmulpd    %ymm3, %ymm8, %ymm2
> +        vandpd    %ymm9, %ymm3, %ymm1
> +
> +/* NaN fixup */
> +        vminpd    %ymm5, %ymm3, %ymm3
> +
> +/* save sign */
> +        vxorpd    %ymm0, %ymm5, %ymm4
> +
> +/* T^2 */
> +        vmulpd    %ymm2, %ymm2, %ymm5
> +        vextractf128 $1, %ymm12, %xmm13
> +        vmovd     %xmm12, %eax
> +        vmovd     %xmm13, %ecx
> +        vpextrd   $2, %xmm12, %edx
> +        vpextrd   $2, %xmm13, %esi
> +        movslq    %eax, %rax
> +        movslq    %edx, %rdx
> +        movslq    %ecx, %rcx
> +        movslq    %esi, %rsi
> +
> +/* Sign | Diff */
> +        vxorpd    %ymm4, %ymm3, %ymm12
> +
> +/*
> + * _LA_ polynomial computation
> + * Start polynomial evaluation
> + */
> +        vmovupd   _poly1_0+__svml_derf_data_internal(%rip), %ymm3
> +        vmovupd   (%rdi,%rax), %xmm6
> +        vmovupd   (%rdi,%rdx), %xmm7
> +        vmovupd   (%rdi,%rcx), %xmm8
> +        vmovupd   (%rdi,%rsi), %xmm9
> +        vunpcklpd %xmm7, %xmm6, %xmm14
> +        vunpcklpd %xmm9, %xmm8, %xmm15
> +
> +/* D2 = Diff^2 */
> +        vmulpd    %ymm1, %ymm1, %ymm13
> +        vfmadd213pd _poly1_1+__svml_derf_data_internal(%rip), %ymm2, %ymm3
> +        vmovupd   _poly5_0+__svml_derf_data_internal(%rip), %ymm1
> +        vunpckhpd %xmm9, %xmm8, %xmm10
> +        vfmadd213pd _poly1_2+__svml_derf_data_internal(%rip), %ymm2, %ymm3
> +        vfmadd213pd _poly5_1+__svml_derf_data_internal(%rip), %ymm2, %ymm1
> +        vfmadd213pd _poly1_3+__svml_derf_data_internal(%rip), %ymm2, %ymm3
> +        vfmadd213pd _poly3_3+__svml_derf_data_internal(%rip), %ymm13, %ymm1
> +
> +/* P1 = T^2*P1 - T */
> +        vfmsub213pd %ymm2, %ymm5, %ymm3
> +        vinsertf128 $1, %xmm15, %ymm14, %ymm0
> +        vunpckhpd %xmm7, %xmm6, %xmm14
> +        vmovupd   _poly3_0+__svml_derf_data_internal(%rip), %ymm6
> +        vfmadd213pd _poly3_1+__svml_derf_data_internal(%rip), %ymm2, %ymm6
> +        vfmadd213pd _poly3_2+__svml_derf_data_internal(%rip), %ymm2, %ymm6
> +        vfmadd213pd %ymm1, %ymm2, %ymm6
> +
> +/* P1 + P3*D2 */
> +        vfmadd213pd %ymm3, %ymm13, %ymm6
> +
> +/* Sign | _Erf_H */
> +        vxorpd    %ymm4, %ymm0, %ymm0
> +        vinsertf128 $1, %xmm10, %ymm14, %ymm11
> +
> +/* exp_h(x0) * Diff */
> +        vmulpd    %ymm12, %ymm11, %ymm2
> +
> +/*
> + * branch-free
> + * low part of result: exp_h(x0) * Diff*(1+P1)
> + */
> +        vfmadd213pd %ymm2, %ymm2, %ymm6
> +
> +/* Final result */
> +        vaddpd    %ymm6, %ymm0, %ymm15
> +
> +/* Fix erf(-0) = -0 */
> +        vorpd     %ymm4, %ymm15, %ymm0
> +        ret
> +
> +END(_ZGVdN4v_erf_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_derf_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(32)) VUINT32 _erf_tbl[6*128*2][2];
> +        __declspec(align(32)) VUINT32 _AbsMask[4][2];
> +        __declspec(align(32)) VUINT32 _MaxThreshold[4][2];
> +        __declspec(align(32)) VUINT32 _SRound[4][2];
> +        __declspec(align(32)) VUINT32 _U2Threshold[4][2];
> +        __declspec(align(32)) VUINT32 _poly1_0[4][2];
> +        __declspec(align(32)) VUINT32 _poly1_1[4][2];
> +        __declspec(align(32)) VUINT32 _poly3_0[4][2];
> +        __declspec(align(32)) VUINT32 _poly3_1[4][2];
> +        __declspec(align(32)) VUINT32 _poly5_0[4][2];
> +        __declspec(align(32)) VUINT32 _poly5_1[4][2];
> +        __declspec(align(32)) VUINT32 _poly1_2[4][2];
> +        __declspec(align(32)) VUINT32 _poly3_2[4][2];
> +        __declspec(align(32)) VUINT32 _poly1_3[4][2];
> +        __declspec(align(32)) VUINT32 _poly3_3[4][2];
> +        __declspec(align(32)) VUINT32 _Mask32[4][2];
> +} __svml_derf_data_internal;
> +#endif
> +__svml_derf_data_internal:
> +        /*== _erf_tbl ==*/
> +        .quad 0x0000000000000000, 0x3ff20dd750429b6d
> +        .quad 0x3f820dbf3deb1340, 0x3ff20d8f1975c85d
> +        .quad 0x3f920d77083f17a0, 0x3ff20cb67bd452c7
> +        .quad 0x3f9b137e0cf584dc, 0x3ff20b4d8bac36c1
> +        .quad 0x3fa20c5645dd2538, 0x3ff209546ad13ccf
> +        .quad 0x3fa68e5d3bbc9526, 0x3ff206cb4897b148
> +        .quad 0x3fab0fafef135745, 0x3ff203b261cd0053
> +        .quad 0x3faf902a77bd3821, 0x3ff2000a00ae3804
> +        .quad 0x3fb207d480e90658, 0x3ff1fbd27cdc72d3
> +        .quad 0x3fb44703e87e8593, 0x3ff1f70c3b4f2cc8
> +        .quad 0x3fb68591a1e83b5d, 0x3ff1f1b7ae44867f
> +        .quad 0x3fb8c36beb8a8d23, 0x3ff1ebd5552f795b
> +        .quad 0x3fbb0081148a873a, 0x3ff1e565bca400d4
> +        .quad 0x3fbd3cbf7e70a4b3, 0x3ff1de697e413d29
> +        .quad 0x3fbf78159ec8bb50, 0x3ff1d6e14099944a
> +        .quad 0x3fc0d939005f65e5, 0x3ff1cecdb718d61c
> +        .quad 0x3fc1f5e1a35c3b89, 0x3ff1c62fa1e869b6
> +        .quad 0x3fc311fc15f56d14, 0x3ff1bd07cdd189ac
> +        .quad 0x3fc42d7fc2f64959, 0x3ff1b357141d95d5
> +        .quad 0x3fc548642321d7c6, 0x3ff1a91e5a748165
> +        .quad 0x3fc662a0bdf7a89f, 0x3ff19e5e92b964ab
> +        .quad 0x3fc77c2d2a765f9e, 0x3ff19318bae53a04
> +        .quad 0x3fc895010fdbdbfd, 0x3ff1874ddcdfce24
> +        .quad 0x3fc9ad142662e14d, 0x3ff17aff0e56ec10
> +        .quad 0x3fcac45e37fe2526, 0x3ff16e2d7093cd8c
> +        .quad 0x3fcbdad72110a648, 0x3ff160da304ed92f
> +        .quad 0x3fccf076d1233237, 0x3ff153068581b781
> +        .quad 0x3fce05354b96ff36, 0x3ff144b3b337c90c
> +        .quad 0x3fcf190aa85540e2, 0x3ff135e3075d076b
> +        .quad 0x3fd015f78a3dcf3d, 0x3ff12695da8b5bde
> +        .quad 0x3fd09eed6982b948, 0x3ff116cd8fd67618
> +        .quad 0x3fd127631eb8de32, 0x3ff1068b94962e5e
> +        .quad 0x3fd1af54e232d609, 0x3ff0f5d1602f7e41
> +        .quad 0x3fd236bef825d9a2, 0x3ff0e4a073dc1b91
> +        .quad 0x3fd2bd9db0f7827f, 0x3ff0d2fa5a70c168
> +        .quad 0x3fd343ed6989b7d9, 0x3ff0c0e0a8223359
> +        .quad 0x3fd3c9aa8b84beda, 0x3ff0ae54fa490723
> +        .quad 0x3fd44ed18d9f6462, 0x3ff09b58f724416b
> +        .quad 0x3fd4d35ef3e5372e, 0x3ff087ee4d9ad247
> +        .quad 0x3fd5574f4ffac98e, 0x3ff07416b4fbfe7c
> +        .quad 0x3fd5da9f415ff23f, 0x3ff05fd3ecbec298
> +        .quad 0x3fd65d4b75b00471, 0x3ff04b27bc403d30
> +        .quad 0x3fd6df50a8dff772, 0x3ff03613f2812daf
> +        .quad 0x3fd760aba57a76bf, 0x3ff0209a65e29545
> +        .quad 0x3fd7e15944d9d3e4, 0x3ff00abcf3e187a9
> +        .quad 0x3fd861566f5fd3c0, 0x3fefe8fb01a47307
> +        .quad 0x3fd8e0a01cab516b, 0x3fefbbbbef34b4b2
> +        .quad 0x3fd95f3353cbb146, 0x3fef8dc092d58ff8
> +        .quad 0x3fd9dd0d2b721f39, 0x3fef5f0cdaf15313
> +        .quad 0x3fda5a2aca209394, 0x3fef2fa4c16c0019
> +        .quad 0x3fdad68966569a87, 0x3feeff8c4b1375db
> +        .quad 0x3fdb522646bbda68, 0x3feecec7870ebca8
> +        .quad 0x3fdbccfec24855b8, 0x3fee9d5a8e4c934e
> +        .quad 0x3fdc4710406a65fc, 0x3fee6b4982f158b9
> +        .quad 0x3fdcc058392a6d2d, 0x3fee38988fc46e72
> +        .quad 0x3fdd38d4354c3bd0, 0x3fee054be79d3042
> +        .quad 0x3fddb081ce6e2a48, 0x3fedd167c4cf9d2a
> +        .quad 0x3fde275eaf25e458, 0x3fed9cf06898cdaf
> +        .quad 0x3fde9d68931ae650, 0x3fed67ea1a8b5368
> +        .quad 0x3fdf129d471eabb1, 0x3fed325927fb9d89
> +        .quad 0x3fdf86faa9428f9d, 0x3fecfc41e36c7df9
> +        .quad 0x3fdffa7ea8eb5fd0, 0x3fecc5a8a3fbea40
> +        .quad 0x3fe03693a371519c, 0x3fec8e91c4d01368
> +        .quad 0x3fe06f794ab2cae7, 0x3fec5701a484ef9d
> +        .quad 0x3fe0a7ef5c18edd2, 0x3fec1efca49a5011
> +        .quad 0x3fe0dff4f247f6c6, 0x3febe68728e29d5e
> +        .quad 0x3fe1178930ada115, 0x3febada596f25436
> +        .quad 0x3fe14eab43841b55, 0x3feb745c55905bf8
> +        .quad 0x3fe1855a5fd3dd50, 0x3feb3aafcc27502e
> +        .quad 0x3fe1bb95c3746199, 0x3feb00a46237d5be
> +        .quad 0x3fe1f15cb50bc4de, 0x3feac63e7ecc1411
> +        .quad 0x3fe226ae840d4d70, 0x3fea8b8287ec6a09
> +        .quad 0x3fe25b8a88b6dd7f, 0x3fea5074e2157620
> +        .quad 0x3fe28ff0240d52cd, 0x3fea1519efaf889e
> +        .quad 0x3fe2c3debfd7d6c1, 0x3fe9d97610879642
> +        .quad 0x3fe2f755ce9a21f4, 0x3fe99d8da149c13f
> +        .quad 0x3fe32a54cb8db67b, 0x3fe96164fafd8de3
> +        .quad 0x3fe35cdb3a9a144d, 0x3fe925007283d7aa
> +        .quad 0x3fe38ee8a84beb71, 0x3fe8e86458169af8
> +        .quad 0x3fe3c07ca9cb4f9e, 0x3fe8ab94f6caa71d
> +        .quad 0x3fe3f196dcd0f135, 0x3fe86e9694134b9e
> +        .quad 0x3fe42236e79a5fa6, 0x3fe8316d6f48133d
> +        .quad 0x3fe4525c78dd5966, 0x3fe7f41dc12c9e89
> +        .quad 0x3fe4820747ba2dc2, 0x3fe7b6abbb7aaf19
> +        .quad 0x3fe4b13713ad3513, 0x3fe7791b886e7403
> +        .quad 0x3fe4dfeba47f63cc, 0x3fe73b714a552763
> +        .quad 0x3fe50e24ca35fd2c, 0x3fe6fdb11b1e0c34
> +        .quad 0x3fe53be25d016a4f, 0x3fe6bfdf0beddaf5
> +        .quad 0x3fe569243d2b3a9b, 0x3fe681ff24b4ab04
> +        .quad 0x3fe595ea53035283, 0x3fe6441563c665d4
> +        .quad 0x3fe5c2348ecc4dc3, 0x3fe60625bd75d07b
> +        .quad 0x3fe5ee02e8a71a53, 0x3fe5c8341bb23767
> +        .quad 0x3fe61955607dd15d, 0x3fe58a445da7c74c
> +        .quad 0x3fe6442bfdedd397, 0x3fe54c5a57629db0
> +        .quad 0x3fe66e86d0312e82, 0x3fe50e79d1749ac9
> +        .quad 0x3fe69865ee075011, 0x3fe4d0a6889dfd9f
> +        .quad 0x3fe6c1c9759d0e5f, 0x3fe492e42d78d2c5
> +        .quad 0x3fe6eab18c74091b, 0x3fe4553664273d24
> +        .quad 0x3fe7131e5f496a5a, 0x3fe417a0c4049fd0
> +        .quad 0x3fe73b1021fc0cb8, 0x3fe3da26d759aef5
> +        .quad 0x3fe762870f720c6f, 0x3fe39ccc1b136d5a
> +        .quad 0x3fe78983697dc96f, 0x3fe35f93fe7d1b3d
> +        .quad 0x3fe7b00578c26037, 0x3fe32281e2fd1a92
> +        .quad 0x3fe7d60d8c979f7b, 0x3fe2e5991bd4cbfc
> +        .quad 0x3fe7fb9bfaed8078, 0x3fe2a8dcede3673b
> +        .quad 0x3fe820b1202f27fb, 0x3fe26c508f6bd0ff
> +        .quad 0x3fe8454d5f25760d, 0x3fe22ff727dd6f7b
> +        .quad 0x3fe8697120d92a4a, 0x3fe1f3d3cf9ffe5a
> +        .quad 0x3fe88d1cd474a2e0, 0x3fe1b7e98fe26217
> +        .quad 0x3fe8b050ef253c37, 0x3fe17c3b626c7a12
> +        .quad 0x3fe8d30debfc572e, 0x3fe140cc3173f007
> +        .quad 0x3fe8f5544bd00c04, 0x3fe1059ed7740313
> +        .quad 0x3fe91724951b8fc6, 0x3fe0cab61f084b93
> +        .quad 0x3fe9387f53df5238, 0x3fe09014c2ca74da
> +        .quad 0x3fe959651980da31, 0x3fe055bd6d32e8d7
> +        .quad 0x3fe979d67caa6631, 0x3fe01bb2b87c6968
> +        .quad 0x3fe999d4192a5715, 0x3fdfc3ee5d1524b0
> +        .quad 0x3fe9b95e8fd26aba, 0x3fdf511a91a67d2a
> +        .quad 0x3fe9d8768656cc42, 0x3fdedeeee0959518
> +        .quad 0x3fe9f71ca72cffb6, 0x3fde6d6ffaa65a25
> +        .quad 0x3fea1551a16aaeaf, 0x3fddfca26f5bbf88
> +        .quad 0x3fea331628a45b92, 0x3fdd8c8aace11e63
> +        .quad 0x3fea506af4cc00f4, 0x3fdd1d2cfff91594
> +        .quad 0x3fea6d50c20fa293, 0x3fdcae8d93f1d7b7
> +        .quad 0x3fea89c850b7d54d, 0x3fdc40b0729ed548
> +        .quad 0x3feaa5d265064366, 0x3fdbd3998457afdb
> +        .quad 0x3feac16fc7143263, 0x3fdb674c8ffc6283
> +        .quad 0x3feadca142b10f98, 0x3fdafbcd3afe8ab6
> +        .quad 0x3feaf767a741088b, 0x3fda911f096fbc26
> +        .quad 0x3feb11c3c79bb424, 0x3fda27455e14c93c
> +        .quad 0x3feb2bb679ead19c, 0x3fd9be437a7de946
> +        .quad 0x3feb4540978921ee, 0x3fd9561c7f23a47b
> +        .quad 0x3feb5e62fce16095, 0x3fd8eed36b886d93
> +        .quad 0x3feb771e894d602e, 0x3fd8886b1e5ecfd1
> +        .quad 0x3feb8f741ef54f83, 0x3fd822e655b417e7
> +        .quad 0x3feba764a2af2b78, 0x3fd7be47af1f5d89
> +        .quad 0x3febbef0fbde6221, 0x3fd75a91a7f4d2ed
> +        .quad 0x3febd61a1453ab44, 0x3fd6f7c69d7d3ef8
> +        .quad 0x3febece0d82d1a5c, 0x3fd695e8cd31867e
> +        .quad 0x3fec034635b66e23, 0x3fd634fa54fa285f
> +        .quad 0x3fec194b1d49a184, 0x3fd5d4fd33729015
> +        .quad 0x3fec2ef0812fc1bd, 0x3fd575f3483021c3
> +        .quad 0x3fec443755820d64, 0x3fd517de540ce2a3
> +        .quad 0x3fec5920900b5fd1, 0x3fd4babff975a04c
> +        .quad 0x3fec6dad2829ec62, 0x3fd45e99bcbb7915
> +        .quad 0x3fec81de16b14cef, 0x3fd4036d0468a7a2
> +        .quad 0x3fec95b455cce69d, 0x3fd3a93b1998736c
> +        .quad 0x3feca930e0e2a825, 0x3fd35005285227f1
> +        .quad 0x3fecbc54b476248d, 0x3fd2f7cc3fe6f423
> +        .quad 0x3feccf20ce0c0d27, 0x3fd2a09153529381
> +        .quad 0x3fece1962c0e0d8b, 0x3fd24a55399ea239
> +        .quad 0x3fecf3b5cdaf0c39, 0x3fd1f518ae487dc8
> +        .quad 0x3fed0580b2cfd249, 0x3fd1a0dc51a9934d
> +        .quad 0x3fed16f7dbe41ca0, 0x3fd14da0a961fd14
> +        .quad 0x3fed281c49d818d0, 0x3fd0fb6620c550af
> +        .quad 0x3fed38eefdf64fdd, 0x3fd0aa2d09497f2b
> +        .quad 0x3fed4970f9ce00d9, 0x3fd059f59af7a906
> +        .quad 0x3fed59a33f19ed42, 0x3fd00abff4dec7a3
> +        .quad 0x3fed6986cfa798e7, 0x3fcf79183b101c5b
> +        .quad 0x3fed791cad3eff01, 0x3fcedeb406d9c825
> +        .quad 0x3fed8865d98abe01, 0x3fce4652fadcb6b2
> +        .quad 0x3fed97635600bb89, 0x3fcdaff4969c0b04
> +        .quad 0x3feda61623cb41e0, 0x3fcd1b982c501370
> +        .quad 0x3fedb47f43b2980d, 0x3fcc893ce1dcbef7
> +        .quad 0x3fedc29fb60715af, 0x3fcbf8e1b1ca2279
> +        .quad 0x3fedd0787a8bb39d, 0x3fcb6a856c3ed54f
> +        .quad 0x3fedde0a90611a0d, 0x3fcade26b7fbed95
> +        .quad 0x3fedeb56f5f12d28, 0x3fca53c4135a6526
> +        .quad 0x3fedf85ea8db188e, 0x3fc9cb5bd549b111
> +        .quad 0x3fee0522a5dfda73, 0x3fc944ec2e4f5630
> +        .quad 0x3fee11a3e8cf4eb8, 0x3fc8c07329874652
> +        .quad 0x3fee1de36c75ba58, 0x3fc83deeada4d25a
> +        .quad 0x3fee29e22a89d766, 0x3fc7bd5c7df3fe9c
> +        .quad 0x3fee35a11b9b61ce, 0x3fc73eba3b5b07b7
> +        .quad 0x3fee4121370224cc, 0x3fc6c205655be720
> +        .quad 0x3fee4c6372cd8927, 0x3fc6473b5b15a7a1
> +        .quad 0x3fee5768c3b4a3fc, 0x3fc5ce595c455b0a
> +        .quad 0x3fee62321d06c5e0, 0x3fc5575c8a468362
> +        .quad 0x3fee6cc0709c8a0d, 0x3fc4e241e912c305
> +        .quad 0x3fee7714aec96534, 0x3fc46f066040a832
> +        .quad 0x3fee812fc64db369, 0x3fc3fda6bc016994
> +        .quad 0x3fee8b12a44944a8, 0x3fc38e1fae1d6a9d
> +        .quad 0x3fee94be342e6743, 0x3fc3206dceef5f87
> +        .quad 0x3fee9e335fb56f87, 0x3fc2b48d9e5dea1c
> +        .quad 0x3feea7730ed0bbb9, 0x3fc24a7b84d38971
> +        .quad 0x3feeb07e27a133aa, 0x3fc1e233d434b813
> +        .quad 0x3feeb9558e6b42ce, 0x3fc17bb2c8d41535
> +        .quad 0x3feec1fa258c4bea, 0x3fc116f48a6476cc
> +        .quad 0x3feeca6ccd709544, 0x3fc0b3f52ce8c383
> +        .quad 0x3feed2ae6489ac1e, 0x3fc052b0b1a174ea
> +        .quad 0x3feedabfc7453e63, 0x3fbfe6460fef4680
> +        .quad 0x3feee2a1d004692c, 0x3fbf2a901ccafb37
> +        .quad 0x3feeea5557137ae0, 0x3fbe723726b824a9
> +        .quad 0x3feef1db32a2277c, 0x3fbdbd32ac4c99b0
> +        .quad 0x3feef93436bc2daa, 0x3fbd0b7a0f921e7c
> +        .quad 0x3fef006135426b26, 0x3fbc5d0497c09e74
> +        .quad 0x3fef0762fde45ee6, 0x3fbbb1c972f23e50
> +        .quad 0x3fef0e3a5e1a1788, 0x3fbb09bfb7d11a84
> +        .quad 0x3fef14e8211e8c55, 0x3fba64de673e8837
> +        .quad 0x3fef1b6d0fea5f4d, 0x3fb9c31c6df3b1b8
> +        .quad 0x3fef21c9f12f0677, 0x3fb92470a61b6965
> +        .quad 0x3fef27ff89525acf, 0x3fb888d1d8e510a3
> +        .quad 0x3fef2e0e9a6a8b09, 0x3fb7f036c0107294
> +        .quad 0x3fef33f7e43a706b, 0x3fb75a96077274ba
> +        .quad 0x3fef39bc242e43e6, 0x3fb6c7e64e7281cb
> +        .quad 0x3fef3f5c1558b19e, 0x3fb6381e2980956b
> +        .quad 0x3fef44d870704911, 0x3fb5ab342383d178
> +        .quad 0x3fef4a31ebcd47df, 0x3fb5211ebf41880b
> +        .quad 0x3fef4f693b67bd77, 0x3fb499d478bca735
> +        .quad 0x3fef547f10d60597, 0x3fb4154bc68d75c3
> +        .quad 0x3fef59741b4b97cf, 0x3fb3937b1b31925a
> +        .quad 0x3fef5e4907982a07, 0x3fb31458e6542847
> +        .quad 0x3fef62fe80272419, 0x3fb297db960e4f63
> +        .quad 0x3fef67952cff6282, 0x3fb21df9981f8e53
> +        .quad 0x3fef6c0db3c34641, 0x3fb1a6a95b1e786f
> +        .quad 0x3fef7068b7b10fd9, 0x3fb131e14fa1625d
> +        .quad 0x3fef74a6d9a38383, 0x3fb0bf97e95f2a64
> +        .quad 0x3fef78c8b812d498, 0x3fb04fc3a0481321
> +        .quad 0x3fef7cceef15d631, 0x3fafc4b5e32d6259
> +        .quad 0x3fef80ba18636f07, 0x3faeeea8c1b1db94
> +        .quad 0x3fef848acb544e95, 0x3fae1d4cf1e2450a
> +        .quad 0x3fef88419ce4e184, 0x3fad508f9a1ea64f
> +        .quad 0x3fef8bdf1fb78370, 0x3fac885df3451a07
> +        .quad 0x3fef8f63e416ebff, 0x3fabc4a54a84e834
> +        .quad 0x3fef92d077f8d56d, 0x3fab055303221015
> +        .quad 0x3fef96256700da8e, 0x3faa4a549829587e
> +        .quad 0x3fef99633a838a57, 0x3fa993979e14fffe
> +        .quad 0x3fef9c8a7989af0d, 0x3fa8e109c4622913
> +        .quad 0x3fef9f9ba8d3c733, 0x3fa83298d717210e
> +        .quad 0x3fefa2974addae45, 0x3fa78832c03aa2b1
> +        .quad 0x3fefa57ddfe27376, 0x3fa6e1c5893c380b
> +        .quad 0x3fefa84fe5e05c8d, 0x3fa63f3f5c4de13b
> +        .quad 0x3fefab0dd89d1309, 0x3fa5a08e85af27e0
> +        .quad 0x3fefadb831a9f9c3, 0x3fa505a174e9c929
> +        .quad 0x3fefb04f6868a944, 0x3fa46e66be002240
> +        .quad 0x3fefb2d3f20f9101, 0x3fa3dacd1a8d8cce
> +        .quad 0x3fefb54641aebbc9, 0x3fa34ac36ad8dafe
> +        .quad 0x3fefb7a6c834b5a2, 0x3fa2be38b6d92415
> +        .quad 0x3fefb9f5f4739170, 0x3fa2351c2f2d1449
> +        .quad 0x3fefbc3433260ca5, 0x3fa1af5d2e04f3f6
> +        .quad 0x3fefbe61eef4cf6a, 0x3fa12ceb37ff9bc3
> +        .quad 0x3fefc07f907bc794, 0x3fa0adb5fcfa8c75
> +        .quad 0x3fefc28d7e4f9cd0, 0x3fa031ad58d56279
> +        .quad 0x3fefc48c1d033c7a, 0x3f9f7182a851bca2
> +        .quad 0x3fefc67bcf2d7b8f, 0x3f9e85c449e377f3
> +        .quad 0x3fefc85cf56ecd38, 0x3f9da0005e5f28df
> +        .quad 0x3fefca2fee770c79, 0x3f9cc0180af00a8b
> +        .quad 0x3fefcbf5170b578b, 0x3f9be5ecd2fcb5f9
> +        .quad 0x3fefcdacca0bfb73, 0x3f9b1160991ff737
> +        .quad 0x3fefcf57607a6e7c, 0x3f9a4255a00b9f03
> +        .quad 0x3fefd0f5317f582f, 0x3f9978ae8b55ce1b
> +        .quad 0x3fefd2869270a56f, 0x3f98b44e6031383e
> +        .quad 0x3fefd40bd6d7a785, 0x3f97f5188610ddc8
> +        .quad 0x3fefd58550773cb5, 0x3f973af0c737bb45
> +        .quad 0x3fefd6f34f52013a, 0x3f9685bb5134ef13
> +        .quad 0x3fefd85621b0876d, 0x3f95d55cb54cd53a
> +        .quad 0x3fefd9ae142795e3, 0x3f9529b9e8cf9a1e
> +        .quad 0x3fefdafb719e6a69, 0x3f9482b8455dc491
> +        .quad 0x3fefdc3e835500b3, 0x3f93e03d891b37de
> +        .quad 0x3fefdd7790ea5bc0, 0x3f93422fd6d12e2b
> +        .quad 0x3fefdea6e062d0c9, 0x3f92a875b5ffab56
> +        .quad 0x3fefdfccb62e52d3, 0x3f9212f612dee7fb
> +        .quad 0x3fefe0e9552ebdd6, 0x3f9181983e5133dd
> +        .quad 0x3fefe1fcfebe2083, 0x3f90f443edc5ce49
> +        .quad 0x3fefe307f2b503d0, 0x3f906ae13b0d3255
> +        .quad 0x3fefe40a6f70af4b, 0x3f8fcab1483ea7fc
> +        .quad 0x3fefe504b1d9696c, 0x3f8ec72615a894c4
> +        .quad 0x3fefe5f6f568b301, 0x3f8dcaf3691fc448
> +        .quad 0x3fefe6e1742f7cf6, 0x3f8cd5ec93c12432
> +        .quad 0x3fefe7c466dc57a1, 0x3f8be7e5ac24963b
> +        .quad 0x3fefe8a004c19ae6, 0x3f8b00b38d6b3575
> +        .quad 0x3fefe97483db8670, 0x3f8a202bd6372dce
> +        .quad 0x3fefea4218d6594a, 0x3f894624e78e0faf
> +        .quad 0x3fefeb08f7146046, 0x3f887275e3a6869e
> +        .quad 0x3fefebc950b3fa75, 0x3f87a4f6aca256cb
> +        .quad 0x3fefec835695932e, 0x3f86dd7fe3358230
> +        .quad 0x3fefed37386190fb, 0x3f861beae53b72b7
> +        .quad 0x3fefede5248e38f4, 0x3f856011cc3b036d
> +        .quad 0x3fefee8d486585ee, 0x3f84a9cf6bda3f4c
> +        .quad 0x3fefef2fd00af31a, 0x3f83f8ff5042a88e
> +        .quad 0x3fefefcce6813974, 0x3f834d7dbc76d7e5
> +        .quad 0x3feff064b5afffbe, 0x3f82a727a89a3f14
> +        .quad 0x3feff0f766697c76, 0x3f8205dac02bd6b9
> +        .quad 0x3feff18520700971, 0x3f81697560347b26
> +        .quad 0x3feff20e0a7ba8c2, 0x3f80d1d69569b82d
> +        .quad 0x3feff2924a3f7a83, 0x3f803ede1a45bfee
> +        .quad 0x3feff312046f2339, 0x3f7f60d8aa2a88f2
> +        .quad 0x3feff38d5cc4227f, 0x3f7e4cc4abf7d065
> +        .quad 0x3feff404760319b4, 0x3f7d4143a9dfe965
> +        .quad 0x3feff47772010262, 0x3f7c3e1a5f5c077c
> +        .quad 0x3feff4e671a85425, 0x3f7b430ecf4a83a8
> +        .quad 0x3feff55194fe19df, 0x3f7a4fe83fb9db25
> +        .quad 0x3feff5b8fb26f5f6, 0x3f79646f35a76624
> +        .quad 0x3feff61cc26c1578, 0x3f78806d70b2fc36
> +        .quad 0x3feff67d08401202, 0x3f77a3ade6c8b3e5
> +        .quad 0x3feff6d9e943c231, 0x3f76cdfcbfc1e263
> +        .quad 0x3feff733814af88c, 0x3f75ff2750fe7820
> +        .quad 0x3feff789eb6130c9, 0x3f7536fc18f7ce5c
> +        .quad 0x3feff7dd41ce2b4d, 0x3f74754abacdf1dc
> +        .quad 0x3feff82d9e1a76d8, 0x3f73b9e3f9d06e3f
> +        .quad 0x3feff87b1913e853, 0x3f730499b503957f
> +        .quad 0x3feff8c5cad200a5, 0x3f72553ee2a336bf
> +        .quad 0x3feff90dcaba4096, 0x3f71aba78ba3af89
> +        .quad 0x3feff9532f846ab0, 0x3f7107a8c7323a6e
> +        .quad 0x3feff9960f3eb327, 0x3f706918b6355624
> +        .quad 0x3feff9d67f51ddba, 0x3f6f9f9cfd9c3035
> +        .quad 0x3feffa14948549a7, 0x3f6e77448fb66bb9
> +        .quad 0x3feffa506302ebae, 0x3f6d58da68fd1170
> +        .quad 0x3feffa89fe5b3625, 0x3f6c4412bf4b8f0b
> +        .quad 0x3feffac17988ef4b, 0x3f6b38a3af2e55b4
> +        .quad 0x3feffaf6e6f4f5c0, 0x3f6a3645330550ff
> +        .quad 0x3feffb2a5879f35e, 0x3f693cb11a30d765
> +        .quad 0x3feffb5bdf67fe6f, 0x3f684ba3004a50d0
> +        .quad 0x3feffb8b8c88295f, 0x3f6762d84469c18f
> +        .quad 0x3feffbb970200110, 0x3f66821000795a03
> +        .quad 0x3feffbe599f4f9d9, 0x3f65a90b00981d93
> +        .quad 0x3feffc10194fcb64, 0x3f64d78bba8ca5fd
> +        .quad 0x3feffc38fcffbb7c, 0x3f640d564548fad7
> +        .quad 0x3feffc60535dd7f5, 0x3f634a305080681f
> +        .quad 0x3feffc862a501fd7, 0x3f628de11c5031eb
> +        .quad 0x3feffcaa8f4c9bea, 0x3f61d83170fbf6fb
> +        .quad 0x3feffccd8f5c66d1, 0x3f6128eb96be8798
> +        .quad 0x3feffcef371ea4d7, 0x3f607fdb4dafea5f
> +        .quad 0x3feffd0f92cb6ba7, 0x3f5fb99b8b8279e1
> +        .quad 0x3feffd2eae369a07, 0x3f5e7f232d9e2630
> +        .quad 0x3feffd4c94d29fdb, 0x3f5d4fed7195d7e8
> +        .quad 0x3feffd6951b33686, 0x3f5c2b9cf7f893bf
> +        .quad 0x3feffd84ef9009ee, 0x3f5b11d702b3deb2
> +        .quad 0x3feffd9f78c7524a, 0x3f5a024365f771bd
> +        .quad 0x3feffdb8f7605ee7, 0x3f58fc8c794b03b5
> +        .quad 0x3feffdd1750e1220, 0x3f58005f08d6f1ef
> +        .quad 0x3feffde8fb314ebf, 0x3f570d6a46e07dda
> +        .quad 0x3feffdff92db56e5, 0x3f56235fbd7a4345
> +        .quad 0x3feffe1544d01ccb, 0x3f5541f340697987
> +        .quad 0x3feffe2a1988857c, 0x3f5468dadf4080ab
> +        .quad 0x3feffe3e19349dc7, 0x3f5397ced7af2b15
> +        .quad 0x3feffe514bbdc197, 0x3f52ce898809244e
> +        .quad 0x3feffe63b8c8b5f7, 0x3f520cc76202c5fb
> +        .quad 0x3feffe7567b7b5e1, 0x3f515246dda49d47
> +        .quad 0x3feffe865fac722b, 0x3f509ec86c75d497
> +        .quad 0x3feffe96a78a04a9, 0x3f4fe41cd9bb4eee
> +        .quad 0x3feffea645f6d6da, 0x3f4e97ba3b77f306
> +        .quad 0x3feffeb5415e7c44, 0x3f4d57f524723822
> +        .quad 0x3feffec39ff380b9, 0x3f4c245d4b99847a
> +        .quad 0x3feffed167b12ac2, 0x3f4afc85e0f82e12
> +        .quad 0x3feffede9e5d3262, 0x3f49e005769dbc1d
> +        .quad 0x3feffeeb49896c6d, 0x3f48ce75e9f6f8a0
> +        .quad 0x3feffef76e956a9f, 0x3f47c7744d9378f7
> +        .quad 0x3fefff0312b010b5, 0x3f46caa0d3582fe9
> +        .quad 0x3fefff0e3ad91ec2, 0x3f45d79eb71e893b
> +        .quad 0x3fefff18ebe2b0e1, 0x3f44ee1429bf7cc0
> +        .quad 0x3fefff232a72b48e, 0x3f440daa3c89f5b6
> +        .quad 0x3fefff2cfb0453d9, 0x3f43360ccd23db3a
> +        .quad 0x3fefff3661e9569d, 0x3f4266ea71d4f71a
> +        .quad 0x3fefff3f634b79f9, 0x3f419ff4663ae9df
> +        .quad 0x3fefff48032dbe40, 0x3f40e0de78654d1e
> +        .quad 0x3fefff50456dab8c, 0x3f40295ef6591848
> +        .quad 0x3fefff582dc48d30, 0x3f3ef25d37f49fe1
> +        .quad 0x3fefff5fbfc8a439, 0x3f3da01102b5f851
> +        .quad 0x3fefff66feee5129, 0x3f3c5b5412dcafad
> +        .quad 0x3fefff6dee89352e, 0x3f3b23a5a23e4210
> +        .quad 0x3fefff7491cd4af6, 0x3f39f8893d8fd1c1
> +        .quad 0x3fefff7aebcff755, 0x3f38d986a4187285
> +        .quad 0x3fefff80ff8911fd, 0x3f37c629a822bc9e
> +        .quad 0x3fefff86cfd3e657, 0x3f36be02102b3520
> +        .quad 0x3fefff8c5f702ccf, 0x3f35c0a378c90bca
> +        .quad 0x3fefff91b102fca8, 0x3f34cda5374ea275
> +        .quad 0x3fefff96c717b695, 0x3f33e4a23d1f4703
> +        .quad 0x3fefff9ba420e834, 0x3f330538fbb77ecd
> +        .quad 0x3fefffa04a7928b1, 0x3f322f0b496539be
> +        .quad 0x3fefffa4bc63ee9a, 0x3f3161be46ad3b50
> +        .quad 0x3fefffa8fc0e5f33, 0x3f309cfa445b00ff
> +        .quad 0x3fefffad0b901755, 0x3f2fc0d55470cf51
> +        .quad 0x3fefffb0ecebee1b, 0x3f2e577bbcd49935
> +        .quad 0x3fefffb4a210b172, 0x3f2cfd4a5adec5c0
> +        .quad 0x3fefffb82cd9dcbf, 0x3f2bb1a9657ce465
> +        .quad 0x3fefffbb8f1049c6, 0x3f2a740684026555
> +        .quad 0x3fefffbeca6adbe9, 0x3f2943d4a1d1ed39
> +        .quad 0x3fefffc1e08f25f5, 0x3f28208bc334a6a5
> +        .quad 0x3fefffc4d3120aa1, 0x3f2709a8db59f25c
> +        .quad 0x3fefffc7a37857d2, 0x3f25feada379d8b7
> +        .quad 0x3fefffca53375ce3, 0x3f24ff207314a102
> +        .quad 0x3fefffcce3b57bff, 0x3f240a8c1949f75e
> +        .quad 0x3fefffcf564ab6b7, 0x3f23207fb7420eb9
> +        .quad 0x3fefffd1ac4135f9, 0x3f22408e9ba3327f
> +        .quad 0x3fefffd3e6d5cd87, 0x3f216a501f0e42ca
> +        .quad 0x3fefffd607387b07, 0x3f209d5f819c9e29
> +        .quad 0x3fefffd80e8ce0da, 0x3f1fb2b792b40a22
> +        .quad 0x3fefffd9fdeabcce, 0x3f1e3bcf436a1a95
> +        .quad 0x3fefffdbd65e5ad0, 0x3f1cd55277c18d05
> +        .quad 0x3fefffdd98e903b2, 0x3f1b7e94604479dc
> +        .quad 0x3fefffdf46816833, 0x3f1a36eec00926dd
> +        .quad 0x3fefffe0e0140857, 0x3f18fdc1b2dcf7b9
> +        .quad 0x3fefffe26683972a, 0x3f17d2737527c3f9
> +        .quad 0x3fefffe3daa95b18, 0x3f16b4702d7d5849
> +        .quad 0x3fefffe53d558ae9, 0x3f15a329b7d30748
> +        .quad 0x3fefffe68f4fa777, 0x3f149e17724f4d41
> +        .quad 0x3fefffe7d156d244, 0x3f13a4b60ba9aa4e
> +        .quad 0x3fefffe904222101, 0x3f12b6875310f785
> +        .quad 0x3fefffea2860ee1e, 0x3f11d312098e9dba
> +        .quad 0x3fefffeb3ebb267b, 0x3f10f9e1b4dd36df
> +        .quad 0x3fefffec47d19457, 0x3f102a8673a94692
> +        .quad 0x3fefffed443e2787, 0x3f0ec929a665b449
> +        .quad 0x3fefffee34943b15, 0x3f0d4f4b4c8e09ed
> +        .quad 0x3fefffef1960d85d, 0x3f0be6abbb10a5aa
> +        .quad 0x3fefffeff32af7af, 0x3f0a8e8cc1fadef6
> +        .quad 0x3feffff0c273bea2, 0x3f094637d5bacfdb
> +        .quad 0x3feffff187b6bc0e, 0x3f080cfdc72220cf
> +        .quad 0x3feffff2436a21dc, 0x3f06e2367dc27f95
> +        .quad 0x3feffff2f5fefcaa, 0x3f05c540b4936fd2
> +        .quad 0x3feffff39fe16963, 0x3f04b581b8d170fc
> +        .quad 0x3feffff44178c8d2, 0x3f03b2652b06c2b2
> +        .quad 0x3feffff4db27f146, 0x3f02bb5cc22e5db6
> +        .quad 0x3feffff56d4d5e5e, 0x3f01cfe010e2052d
> +        .quad 0x3feffff5f8435efc, 0x3f00ef6c4c84a0fe
> +        .quad 0x3feffff67c604180, 0x3f001984165a5f36
> +        .quad 0x3feffff6f9f67e55, 0x3efe9b5e8d00ce77
> +        .quad 0x3feffff77154e0d6, 0x3efd16f5716c6c1a
> +        .quad 0x3feffff7e2c6aea2, 0x3efba4f035d60e03
> +        .quad 0x3feffff84e93cd75, 0x3efa447b7b03f045
> +        .quad 0x3feffff8b500e77c, 0x3ef8f4ccca7fc90d
> +        .quad 0x3feffff9164f8e46, 0x3ef7b5223dac7336
> +        .quad 0x3feffff972be5c59, 0x3ef684c227fcacef
> +        .quad 0x3feffff9ca891572, 0x3ef562fac4329b48
> +        .quad 0x3feffffa1de8c582, 0x3ef44f21e49054f2
> +        .quad 0x3feffffa6d13de73, 0x3ef34894a5e24657
> +        .quad 0x3feffffab83e54b8, 0x3ef24eb7254ccf83
> +        .quad 0x3feffffaff99bac4, 0x3ef160f438c70913
> +        .quad 0x3feffffb43555b5f, 0x3ef07ebd2a2d2844
> +        .quad 0x3feffffb839e52f3, 0x3eef4f12e9ab070a
> +        .quad 0x3feffffbc09fa7cd, 0x3eedb5ad0b27805c
> +        .quad 0x3feffffbfa82616b, 0x3eec304efa2c6f4e
> +        .quad 0x3feffffc316d9ed0, 0x3eeabe09e9144b5e
> +        .quad 0x3feffffc6586abf6, 0x3ee95df988e76644
> +        .quad 0x3feffffc96f1165e, 0x3ee80f439b4ee04b
> +        .quad 0x3feffffcc5cec0c1, 0x3ee6d11788a69c64
> +        .quad 0x3feffffcf23ff5fc, 0x3ee5a2adfa0b4bc4
> +        .quad 0x3feffffd1c637b2b, 0x3ee4834877429b8f
> +        .quad 0x3feffffd4456a10d, 0x3ee37231085c7d9a
> +        .quad 0x3feffffd6a3554a1, 0x3ee26eb9daed6f7e
> +        .quad 0x3feffffd8e1a2f22, 0x3ee1783ceac28910
> +        .quad 0x3feffffdb01e8546, 0x3ee08e1badf0fced
> +        .quad 0x3feffffdd05a75ea, 0x3edf5f7d88472604
> +        .quad 0x3feffffdeee4f810, 0x3eddb92b5212fb8d
> +        .quad 0x3feffffe0bd3e852, 0x3edc282cd3957eda
> +        .quad 0x3feffffe273c15b7, 0x3edaab7abace48dc
> +        .quad 0x3feffffe41314e06, 0x3ed94219bfcb4928
> +        .quad 0x3feffffe59c6698b, 0x3ed7eb1a2075864e
> +        .quad 0x3feffffe710d565e, 0x3ed6a597219a93da
> +        .quad 0x3feffffe8717232d, 0x3ed570b69502f313
> +        .quad 0x3feffffe9bf4098c, 0x3ed44ba864670882
> +        .quad 0x3feffffeafb377d5, 0x3ed335a62115bce2
> +        .quad 0x3feffffec2641a9e, 0x3ed22df298214423
> +        .quad 0x3feffffed413e5b7, 0x3ed133d96ae7e0dd
> +        .quad 0x3feffffee4d01cd6, 0x3ed046aeabcfcdec
> +        .quad 0x3feffffef4a55bd4, 0x3ececb9cfe1d8642
> +        .quad 0x3fefffff039f9e8f, 0x3ecd21397ead99cb
> +        .quad 0x3fefffff11ca4876, 0x3ecb8d094c86d374
> +        .quad 0x3fefffff1f302bc1, 0x3eca0df0f0c626dc
> +        .quad 0x3fefffff2bdb904d, 0x3ec8a2e269750a39
> +        .quad 0x3fefffff37d63a36, 0x3ec74adc8f4064d3
> +        .quad 0x3fefffff43297019, 0x3ec604ea819f007c
> +        .quad 0x3fefffff4dde0118, 0x3ec4d0231928c6f9
> +        .quad 0x3fefffff57fc4a95, 0x3ec3aba85fe22e20
> +        .quad 0x3fefffff618c3da6, 0x3ec296a70f414053
> +        .quad 0x3fefffff6a956450, 0x3ec1905613b3abf2
> +        .quad 0x3fefffff731ee681, 0x3ec097f6156f32c5
> +        .quad 0x3fefffff7b2f8ed6, 0x3ebf59a20caf6695
> +        .quad 0x3fefffff82cdcf1b, 0x3ebd9c73698fb1dc
> +        .quad 0x3fefffff89ffc4aa, 0x3ebbf716c6168bae
> +        .quad 0x3fefffff90cb3c81, 0x3eba6852c6b58392
> +        .quad 0x3fefffff9735b73b, 0x3eb8eefd70594a89
> +        .quad 0x3fefffff9d446ccc, 0x3eb789fb715aae95
> +        .quad 0x3fefffffa2fc5015, 0x3eb6383f726a8e04
> +        .quad 0x3fefffffa8621251, 0x3eb4f8c96f26a26a
> +        .quad 0x3fefffffad7a2652, 0x3eb3caa61607f920
> +        .quad 0x3fefffffb248c39d, 0x3eb2acee2f5ecdb8
> +        .quad 0x3fefffffb6d1e95d, 0x3eb19ec60b1242ed
> +        .quad 0x3fefffffbb196132, 0x3eb09f5cf4dd2877
> +        .quad 0x3fefffffbf22c1e2, 0x3eaf5bd95d8730d8
> +        .quad 0x3fefffffc2f171e3, 0x3ead9371e2ff7c35
> +        .quad 0x3fefffffc688a9cf, 0x3eabe41de54d155a
> +        .quad 0x3fefffffc9eb76ac, 0x3eaa4c89e08ef4f3
> +        .quad 0x3fefffffcd1cbc28, 0x3ea8cb738399b12c
> +        .quad 0x3fefffffd01f36af, 0x3ea75fa8dbc84bec
> +        .quad 0x3fefffffd2f57d68, 0x3ea608078a70dcbc
> +        .quad 0x3fefffffd5a2041f, 0x3ea4c37c0394d094
> +        .quad 0x3fefffffd8271d12, 0x3ea39100d5687bfe
> +        .quad 0x3fefffffda86faa9, 0x3ea26f9df8519bd7
> +        .quad 0x3fefffffdcc3b117, 0x3ea15e6827001f18
> +        .quad 0x3fefffffdedf37ed, 0x3ea05c803e4831c1
> +        .quad 0x3fefffffe0db6b91, 0x3e9ed22548cffd35
> +        .quad 0x3fefffffe2ba0ea5, 0x3e9d06ad6ecdf971
> +        .quad 0x3fefffffe47ccb60, 0x3e9b551c847fbc96
> +        .quad 0x3fefffffe62534d4, 0x3e99bc09f112b494
> +        .quad 0x3fefffffe7b4c81e, 0x3e983a1ff0aa239d
> +        .quad 0x3fefffffe92ced93, 0x3e96ce1aa3fd7bdd
> +        .quad 0x3fefffffea8ef9cf, 0x3e9576c72b514859
> +        .quad 0x3fefffffebdc2ec6, 0x3e943302cc4a0da8
> +        .quad 0x3fefffffed15bcba, 0x3e9301ba221dc9bb
> +        .quad 0x3fefffffee3cc32c, 0x3e91e1e857adc568
> +        .quad 0x3fefffffef5251c2, 0x3e90d2966b1746f7
> +        .quad 0x3feffffff0576917, 0x3e8fa5b4f49cc6b2
> +        .quad 0x3feffffff14cfb92, 0x3e8dc3ae30b55c16
> +        .quad 0x3feffffff233ee1d, 0x3e8bfd7555a3bd68
> +        .quad 0x3feffffff30d18e8, 0x3e8a517d9e61628a
> +        .quad 0x3feffffff3d9480f, 0x3e88be4f8f6c951f
> +        .quad 0x3feffffff4993c46, 0x3e874287ded49339
> +        .quad 0x3feffffff54dab72, 0x3e85dcd669f2cd34
> +        .quad 0x3feffffff5f74141, 0x3e848bfd38302871
> +        .quad 0x3feffffff6969fb8, 0x3e834ecf8a3c124a
> +        .quad 0x3feffffff72c5fb6, 0x3e822430f521cbcf
> +        .quad 0x3feffffff7b91176, 0x3e810b1488aeb235
> +        .quad 0x3feffffff83d3d07, 0x3e80027c00a263a6
> +        .quad 0x3feffffff8b962be, 0x3e7e12ee004efc37
> +        .quad 0x3feffffff92dfba2, 0x3e7c3e44ae32b16b
> +        .quad 0x3feffffff99b79d2, 0x3e7a854ea14102a8
> +        .quad 0x3feffffffa0248e8, 0x3e78e6761569f45d
> +        .quad 0x3feffffffa62ce54, 0x3e77603bac345f65
> +        .quad 0x3feffffffabd69b4, 0x3e75f1353cdad001
> +        .quad 0x3feffffffb127525, 0x3e74980cb3c80949
> +        .quad 0x3feffffffb624592, 0x3e73537f00b6ad4d
> +        .quad 0x3feffffffbad2aff, 0x3e72225b12bffc68
> +        .quad 0x3feffffffbf370cd, 0x3e710380e1adb7e9
> +        .quad 0x3feffffffc355dfd, 0x3e6febc107d5efaa
> +        .quad 0x3feffffffc733572, 0x3e6df0f2a0ee6947
> +        .quad 0x3feffffffcad3626, 0x3e6c14b2188bcee4
> +        .quad 0x3feffffffce39b67, 0x3e6a553644f7f07d
> +        .quad 0x3feffffffd169d0c, 0x3e68b0cfce0579e0
> +        .quad 0x3feffffffd466fa5, 0x3e6725e7c5dd20f7
> +        .quad 0x3feffffffd7344aa, 0x3e65b2fe547a1340
> +        .quad 0x3feffffffd9d4aab, 0x3e6456a974e92e93
> +        .quad 0x3feffffffdc4ad7a, 0x3e630f93c3699078
> +        .quad 0x3feffffffde9964e, 0x3e61dc7b5b978cf8
> +        .quad 0x3feffffffe0c2bf0, 0x3e60bc30c5d52f15
> +        .quad 0x3feffffffe2c92db, 0x3e5f5b2be65a0c7f
> +        .quad 0x3feffffffe4aed5e, 0x3e5d5f3a8dea7357
> +        .quad 0x3feffffffe675bbd, 0x3e5b82915b03515b
> +        .quad 0x3feffffffe81fc4e, 0x3e59c3517e789488
> +        .quad 0x3feffffffe9aeb97, 0x3e581fb7df06136e
> +        .quad 0x3feffffffeb24467, 0x3e56961b8d641d06
> +        .quad 0x3feffffffec81ff2, 0x3e5524ec4d916cae
> +        .quad 0x3feffffffedc95e7, 0x3e53cab1343d18d1
> +        .quad 0x3feffffffeefbc85, 0x3e52860757487a01
> +        .quad 0x3fefffffff01a8b6, 0x3e5155a09065d4f7
> +        .quad 0x3fefffffff126e1e, 0x3e50384250e4c9fc
> +        .quad 0x3fefffffff221f30, 0x3e4e59890b926c78
> +        .quad 0x3fefffffff30cd3f, 0x3e4c642116a8a9e3
> +        .quad 0x3fefffffff3e8892, 0x3e4a8e405e651ab6
> +        .quad 0x3fefffffff4b606f, 0x3e48d5f98114f872
> +        .quad 0x3fefffffff57632d, 0x3e47397c5a66e307
> +        .quad 0x3fefffffff629e44, 0x3e45b71456c5a4c4
> +        .quad 0x3fefffffff6d1e56, 0x3e444d26de513197
> +        .quad 0x3fefffffff76ef3f, 0x3e42fa31d6371537
> +        .quad 0x3fefffffff801c1f, 0x3e41bcca373b7b43
> +        .quad 0x3fefffffff88af67, 0x3e40939ab853339f
> +        .quad 0x3fefffffff90b2e3, 0x3e3efac5187b2863
> +        .quad 0x3fefffffff982fc1, 0x3e3cf1e86235d0e7
> +        .quad 0x3fefffffff9f2e9f, 0x3e3b0a68a2128bab
> +        .quad 0x3fefffffffa5b790, 0x3e39423165bc4444
> +        .quad 0x3fefffffffabd229, 0x3e37974e743dea3d
> +        .quad 0x3fefffffffb18582, 0x3e3607e9eacd1050
> +        .quad 0x3fefffffffb6d844, 0x3e34924a74dec729
> +        .quad 0x3fefffffffbbd0aa, 0x3e3334d19e0c2160
> +        .quad 0x3fefffffffc0748f, 0x3e31edfa3c5f5cca
> +        .quad 0x3fefffffffc4c96c, 0x3e30bc56f1b54701
> +        .quad 0x3fefffffffc8d462, 0x3e2f3d2185e047d9
> +        .quad 0x3fefffffffcc9a41, 0x3e2d26cb87945e87
> +        .quad 0x3fefffffffd01f89, 0x3e2b334fac4b9f99
> +        .quad 0x3fefffffffd36871, 0x3e296076f7918d1c
> +        .quad 0x3fefffffffd678ed, 0x3e27ac2d72fc2c63
> +        .quad 0x3fefffffffd954ae, 0x3e2614801550319e
> +        .quad 0x3fefffffffdbff2a, 0x3e24979ac8b28927
> +        .quad 0x3fefffffffde7ba0, 0x3e2333c68e2d0548
> +        .quad 0x3fefffffffe0cd16, 0x3e21e767bce37dd7
> +        .quad 0x3fefffffffe2f664, 0x3e20b0fc5b6d05a0
> +        .quad 0x3fefffffffe4fa30, 0x3e1f1e3523b41d7d
> +        .quad 0x3fefffffffe6daf7, 0x3e1d00de6608effe
> +        .quad 0x3fefffffffe89b0c, 0x3e1b0778b7b3301b
> +        .quad 0x3fefffffffea3c9a, 0x3e192fb04ec0f6cf
> +        .quad 0x3fefffffffebc1a9, 0x3e177756ec9f78fa
> +        .quad 0x3fefffffffed2c21, 0x3e15dc61922d5a06
> +        .quad 0x3fefffffffee7dc8, 0x3e145ce65699ff6d
> +        .quad 0x3fefffffffefb847, 0x3e12f71a5f159970
> +        .quad 0x3feffffffff0dd2b, 0x3e11a94ff571654f
> +        .quad 0x3feffffffff1ede9, 0x3e1071f4bbea09ec
> +        .quad 0x3feffffffff2ebda, 0x3e0e9f1ff8ddd774
> +        .quad 0x3feffffffff3d843, 0x3e0c818223a202c7
> +        .quad 0x3feffffffff4b453, 0x3e0a887bd2b4404d
> +        .quad 0x3feffffffff58126, 0x3e08b1a336c5eb6b
> +        .quad 0x3feffffffff63fc3, 0x3e06fab63324088a
> +        .quad 0x3feffffffff6f121, 0x3e056197e30205ba
> +        .quad 0x3feffffffff79626, 0x3e03e44e45301b92
> +        .quad 0x3feffffffff82fab, 0x3e0281000bfe4c3f
> +        .quad 0x3feffffffff8be77, 0x3e0135f28f2d50b4
> +        .quad 0x3feffffffff94346, 0x3e000187dded5975
> +        .quad 0x3feffffffff9bec8, 0x3dfdc479de0ef001
> +        .quad 0x3feffffffffa319f, 0x3dfbad4fdad3caa1
> +        .quad 0x3feffffffffa9c63, 0x3df9baed3ed27ab8
> +        .quad 0x3feffffffffaffa4, 0x3df7ead9ce4285bb
> +        .quad 0x3feffffffffb5be5, 0x3df63ac6b4edc88e
> +        .quad 0x3feffffffffbb1a2, 0x3df4a88be2a6390c
> +        .quad 0x3feffffffffc014e, 0x3df332259185f1a0
> +        .quad 0x3feffffffffc4b56, 0x3df1d5b1f3793044
> +        .quad 0x3feffffffffc901c, 0x3df0916f04b6e18b
> +        .quad 0x3feffffffffccfff, 0x3deec77101de6926
> +        .quad 0x3feffffffffd0b56, 0x3dec960bf23153e0
> +        .quad 0x3feffffffffd4271, 0x3dea8bd20fc65ef7
> +        .quad 0x3feffffffffd759d, 0x3de8a61745ec7d1d
> +        .quad 0x3feffffffffda520, 0x3de6e25d0e756261
> +        .quad 0x3feffffffffdd13c, 0x3de53e4f7d1666cb
> +        .quad 0x3feffffffffdfa2d, 0x3de3b7c27a7ddb0e
> +        .quad 0x3feffffffffe202d, 0x3de24caf2c32af14
> +        .quad 0x3feffffffffe4371, 0x3de0fb3186804d0f
> +        .quad 0x3feffffffffe642a, 0x3ddf830c0bb41fd7
> +        .quad 0x3feffffffffe8286, 0x3ddd3c0f1a91c846
> +        .quad 0x3feffffffffe9eb0, 0x3ddb1e5acf351d87
> +        .quad 0x3feffffffffeb8d0, 0x3dd92712d259ce66
> +        .quad 0x3feffffffffed10a, 0x3dd7538c60a04476
> +        .quad 0x3feffffffffee782, 0x3dd5a14b04b47879
> +        .quad 0x3feffffffffefc57, 0x3dd40dfd87456f4c
> +        .quad 0x3fefffffffff0fa7, 0x3dd2977b1172b9d5
> +        .quad 0x3fefffffffff218f, 0x3dd13bc07e891491
> +        .quad 0x3fefffffffff3227, 0x3dcff1dbb4300811
> +        .quad 0x3fefffffffff4188, 0x3dcd9a880f306bd8
> +        .quad 0x3fefffffffff4fc9, 0x3dcb6e45220b55e0
> +        .quad 0x3fefffffffff5cfd, 0x3dc96a0b33f2c4da
> +        .quad 0x3fefffffffff6939, 0x3dc78b07e9e924ac
> +        .quad 0x3fefffffffff748e, 0x3dc5ce9ab1670dd2
> +        .quad 0x3fefffffffff7f0d, 0x3dc4325167006bb0
> +        .quad 0x3fefffffffff88c5, 0x3dc2b3e53538ff3f
> +        .quad 0x3fefffffffff91c6, 0x3dc15137a7f44864
> +        .quad 0x3fefffffffff9a1b, 0x3dc0084ff125639d
> +        .quad 0x3fefffffffffa1d2, 0x3dbdaeb0b7311ec7
> +        .quad 0x3fefffffffffa8f6, 0x3dbb7937d1c40c53
> +        .quad 0x3fefffffffffaf92, 0x3db96d082f59ab06
> +        .quad 0x3fefffffffffb5b0, 0x3db7872d9fa10aad
> +        .quad 0x3fefffffffffbb58, 0x3db5c4e8e37bc7d0
> +        .quad 0x3fefffffffffc095, 0x3db423ac0df49a40
> +        .quad 0x3fefffffffffc56d, 0x3db2a117230ad284
> +        .quad 0x3fefffffffffc9e8, 0x3db13af4f04f9998
> +        .quad 0x3fefffffffffce0d, 0x3dafde703724e560
> +        .quad 0x3fefffffffffd1e1, 0x3dad77f0c82e7641
> +        .quad 0x3fefffffffffd56c, 0x3dab3ee02611d7dd
> +        .quad 0x3fefffffffffd8b3, 0x3da92ff33023d5bd
> +        .quad 0x3fefffffffffdbba, 0x3da7481a9e69f53f
> +        .quad 0x3fefffffffffde86, 0x3da5847eda620959
> +        .quad 0x3fefffffffffe11d, 0x3da3e27c1fcc74bd
> +        .quad 0x3fefffffffffe380, 0x3da25f9ee0b923dc
> +        .quad 0x3fefffffffffe5b6, 0x3da0f9a068653200
> +        .quad 0x3fefffffffffe7c0, 0x3d9f5cc7718082b0
> +        .quad 0x3fefffffffffe9a2, 0x3d9cf7e53d6a2ca5
> +        .quad 0x3fefffffffffeb60, 0x3d9ac0f5f3229372
> +        .quad 0x3fefffffffffecfb, 0x3d98b498644847ea
> +        .quad 0x3fefffffffffee77, 0x3d96cfa9bcca59dc
> +        .quad 0x3fefffffffffefd6, 0x3d950f411d4fd2cd
> +        .quad 0x3feffffffffff11a, 0x3d9370ab8327af5e
> +        .quad 0x3feffffffffff245, 0x3d91f167f88c6b6e
> +        .quad 0x3feffffffffff359, 0x3d908f24085d4597
> +        .quad 0x3feffffffffff457, 0x3d8e8f70e181d61a
> +        .quad 0x3feffffffffff542, 0x3d8c324c20e337dc
> +        .quad 0x3feffffffffff61b, 0x3d8a03261574b54e
> +        .quad 0x3feffffffffff6e3, 0x3d87fe903cdf5855
> +        .quad 0x3feffffffffff79b, 0x3d86215c58da3450
> +        .quad 0x3feffffffffff845, 0x3d846897d4b69fc6
> +        .quad 0x3feffffffffff8e2, 0x3d82d1877d731b7b
> +        .quad 0x3feffffffffff973, 0x3d8159a386b11517
> +        .quad 0x3feffffffffff9f8, 0x3d7ffd27ae9393ce
> +        .quad 0x3feffffffffffa73, 0x3d7d7c593130dd0b
> +        .quad 0x3feffffffffffae4, 0x3d7b2cd607c79bcf
> +        .quad 0x3feffffffffffb4c, 0x3d790ae4d3405651
> +        .quad 0x3feffffffffffbad, 0x3d771312dd1759e2
> +        .quad 0x3feffffffffffc05, 0x3d75422ef5d8949d
> +        .quad 0x3feffffffffffc57, 0x3d739544b0ecc957
> +        .quad 0x3feffffffffffca2, 0x3d720997f73e73dd
> +        .quad 0x3feffffffffffce7, 0x3d709ca0eaacd277
> +        .quad 0x3feffffffffffd27, 0x3d6e9810295890ec
> +        .quad 0x3feffffffffffd62, 0x3d6c2b45b5aa4a1d
> +        .quad 0x3feffffffffffd98, 0x3d69eee068fa7596
> +        .quad 0x3feffffffffffdca, 0x3d67df2b399c10a8
> +        .quad 0x3feffffffffffdf8, 0x3d65f8b87a31bd85
> +        .quad 0x3feffffffffffe22, 0x3d64385c96e9a2d9
> +        .quad 0x3feffffffffffe49, 0x3d629b2933ef4cbc
> +        .quad 0x3feffffffffffe6c, 0x3d611e68a6378f8a
> +        .quad 0x3feffffffffffe8d, 0x3d5f7f338086a86b
> +        .quad 0x3feffffffffffeab, 0x3d5cf8d7d9ce040a
> +        .quad 0x3feffffffffffec7, 0x3d5aa577251ae485
> +        .quad 0x3feffffffffffee1, 0x3d58811d739efb5f
> +        .quad 0x3feffffffffffef8, 0x3d568823e52970be
> +        .quad 0x3fefffffffffff0e, 0x3d54b72ae68e8b4c
> +        .quad 0x3fefffffffffff22, 0x3d530b14dbe876bc
> +        .quad 0x3fefffffffffff34, 0x3d5181012ef86610
> +        .quad 0x3fefffffffffff45, 0x3d501647ba798745
> +        .quad 0x3fefffffffffff54, 0x3d4d90e917701675
> +        .quad 0x3fefffffffffff62, 0x3d4b2a87e86d0c8a
> +        .quad 0x3fefffffffffff6f, 0x3d48f53dcb377293
> +        .quad 0x3fefffffffffff7b, 0x3d46ed2f2515e933
> +        .quad 0x3fefffffffffff86, 0x3d450ecc9ed47f19
> +        .quad 0x3fefffffffffff90, 0x3d4356cd5ce7799e
> +        .quad 0x3fefffffffffff9a, 0x3d41c229a587ab78
> +        .quad 0x3fefffffffffffa2, 0x3d404e15ecc7f3f6
> +        .quad 0x3fefffffffffffaa, 0x3d3deffc7e6a6017
> +        .quad 0x3fefffffffffffb1, 0x3d3b7b040832f310
> +        .quad 0x3fefffffffffffb8, 0x3d3938e021f36d76
> +        .quad 0x3fefffffffffffbe, 0x3d37258610b3b233
> +        .quad 0x3fefffffffffffc3, 0x3d353d3bfc82a909
> +        .quad 0x3fefffffffffffc8, 0x3d337c92babdc2fd
> +        .quad 0x3fefffffffffffcd, 0x3d31e06010120f6a
> +        .quad 0x3fefffffffffffd1, 0x3d3065b9616170d4
> +        .quad 0x3fefffffffffffd5, 0x3d2e13dd96b3753b
> +        .quad 0x3fefffffffffffd9, 0x3d2b950d32467392
> +        .quad 0x3fefffffffffffdc, 0x3d294a72263259a5
> +        .quad 0x3fefffffffffffdf, 0x3d272fd93e036cdc
> +        .quad 0x3fefffffffffffe2, 0x3d254164576929ab
> +        .quad 0x3fefffffffffffe4, 0x3d237b83c521fe96
> +        .quad 0x3fefffffffffffe7, 0x3d21daf033182e96
> +        .quad 0x3fefffffffffffe9, 0x3d205ca50205d26a
> +        .quad 0x3fefffffffffffeb, 0x3d1dfbb6235639fa
> +        .quad 0x3fefffffffffffed, 0x3d1b7807e294781f
> +        .quad 0x3fefffffffffffee, 0x3d19298add70a734
> +        .quad 0x3feffffffffffff0, 0x3d170beaf9c7ffb6
> +        .quad 0x3feffffffffffff1, 0x3d151b2cd6709222
> +        .quad 0x3feffffffffffff3, 0x3d1353a6cf7f7fff
> +        .quad 0x3feffffffffffff4, 0x3d11b1fa8cbe84a7
> +        .quad 0x3feffffffffffff5, 0x3d10330f0fd69921
> +        .quad 0x3feffffffffffff6, 0x3d0da81670f96f9b
> +        .quad 0x3feffffffffffff7, 0x3d0b24a16b4d09aa
> +        .quad 0x3feffffffffffff7, 0x3d08d6eeb6efdbd6
> +        .quad 0x3feffffffffffff8, 0x3d06ba91ac734786
> +        .quad 0x3feffffffffffff9, 0x3d04cb7966770ab5
> +        .quad 0x3feffffffffffff9, 0x3d0305e9721d0981
> +        .quad 0x3feffffffffffffa, 0x3d01667311fff70a
> +        .quad 0x3feffffffffffffb, 0x3cffd3de10d62855
> +        .quad 0x3feffffffffffffb, 0x3cfd1aefbcd48d0c
> +        .quad 0x3feffffffffffffb, 0x3cfa9cc93c25aca9
> +        .quad 0x3feffffffffffffc, 0x3cf85487ee3ea735
> +        .quad 0x3feffffffffffffc, 0x3cf63daf8b4b1e0c
> +        .quad 0x3feffffffffffffd, 0x3cf45421e69a6ca1
> +        .quad 0x3feffffffffffffd, 0x3cf294175802d99a
> +        .quad 0x3feffffffffffffd, 0x3cf0fa17bf41068f
> +        .quad 0x3feffffffffffffd, 0x3cef05e82aae2bb9
> +        .quad 0x3feffffffffffffe, 0x3cec578101b29058
> +        .quad 0x3feffffffffffffe, 0x3ce9e39dc5dd2f7c
> +        .quad 0x3feffffffffffffe, 0x3ce7a553a728bbf2
> +        .quad 0x3feffffffffffffe, 0x3ce5982008db1304
> +        .quad 0x3feffffffffffffe, 0x3ce3b7e00422e51b
> +        .quad 0x3feffffffffffffe, 0x3ce200c898d9ee3e
> +        .quad 0x3fefffffffffffff, 0x3ce06f5f7eb65a56
> +        .quad 0x3fefffffffffffff, 0x3cde00e9148a1d25
> +        .quad 0x3fefffffffffffff, 0x3cdb623734024e92
> +        .quad 0x3fefffffffffffff, 0x3cd8fd4e01891bf8
> +        .quad 0x3fefffffffffffff, 0x3cd6cd44c7470d89
> +        .quad 0x3fefffffffffffff, 0x3cd4cd9c04158cd7
> +        .quad 0x3fefffffffffffff, 0x3cd2fa34bf5c8344
> +        .quad 0x3fefffffffffffff, 0x3cd14f4890ff2461
> +        .quad 0x3fefffffffffffff, 0x3ccf92c49dfa4df5
> +        .quad 0x3fefffffffffffff, 0x3ccccaaea71ab0df
> +        .quad 0x3fefffffffffffff, 0x3cca40829f001197
> +        .quad 0x3ff0000000000000, 0x3cc7eef13b59e96c
> +        .quad 0x3ff0000000000000, 0x3cc5d11e1a252bf5
> +        .quad 0x3ff0000000000000, 0x3cc3e296303b2297
> +        .quad 0x3ff0000000000000, 0x3cc21f47009f43ce
> +        .quad 0x3ff0000000000000, 0x3cc083768c5e4542
> +        .quad 0x3ff0000000000000, 0x3cbe1777d831265f
> +        .quad 0x3ff0000000000000, 0x3cbb69f10b0191b5
> +        .quad 0x3ff0000000000000, 0x3cb8f8a3a05b5b53
> +        .quad 0x3ff0000000000000, 0x3cb6be573c40c8e7
> +        .quad 0x3ff0000000000000, 0x3cb4b645ba991fdb
> +        .align 32
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff  /* _AbsMask */
> +        .align 32
> +        .quad 0x4017f80000000000, 0x4017f80000000000, 0x4017f80000000000, 0x4017f80000000000  /* _MaxThreshold = 6.0 - 1.0/128.0 */
> +        .align 32
> +        .quad 0x42c0000000000000, 0x42c0000000000000, 0x42c0000000000000, 0x42c0000000000000  /* SRound */
> +        .align 32
> +        .quad 0x2ff0000000000000, 0x2ff0000000000000, 0x2ff0000000000000, 0x2ff0000000000000  /* _U2Threshold */
> +        .align 32
> +        .quad 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5  /* _poly_1_0 */
> +        .align 32
> +        .quad 0x3fc1111235a363b1, 0x3fc1111235a363b1, 0x3fc1111235a363b1, 0x3fc1111235a363b1  /* _poly_1_1 */
> +        .align 32
> +        .quad 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57  /* _poly_3_0 */
> +        .align 32
> +        .quad 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8  /* _poly_3_1 */
> +        .align 32
> +        .quad 0xbfc5555800001B4F, 0xbfc5555800001B4F, 0xbfc5555800001B4F, 0xbfc5555800001B4F  /* _poly_5_0 */
> +        .align 32
> +        .quad 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122  /* _poly_5_1 */
> +        .align 32
> +        .quad 0xbfd55555555547f6, 0xbfd55555555547f6, 0xbfd55555555547f6, 0xbfd55555555547f6  /* _poly_1_2 */
> +        .align 32
> +        .quad 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd  /* _poly_3_2 */
> +        .align 32
> +        .quad 0x3fe5555555554b0c, 0x3fe5555555554b0c, 0x3fe5555555554b0c, 0x3fe5555555554b0c  /* _poly_1_3 */
> +        .align 32
> +        .quad 0xbfd5555555555555, 0xbfd5555555555555, 0xbfd5555555555555, 0xbfd5555555555555  /* _poly_3_3 */
> +        .align 32
> +        .quad 0x00000000ffffffff, 0x00000000ffffffff, 0x00000000ffffffff, 0x00000000ffffffff  /* _Mask32 */
> +        .align 32
> +        .type	__svml_derf_data_internal,@object
> +        .size	__svml_derf_data_internal,.-__svml_derf_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core-avx2.S
> new file mode 100644
> index 0000000000..3456142289
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized erf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN8v_erf _ZGVeN8v_erf_avx2_wrapper
> +#include "../svml_d_erf8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core.c
> new file mode 100644
> index 0000000000..78e4a852c6
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized erf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN8v_erf
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN8v_erf, __GI__ZGVeN8v_erf, __redirect__ZGVeN8v_erf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core_avx512.S
> new file mode 100644
> index 0000000000..38f373102a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_erf8_core_avx512.S
> @@ -0,0 +1,983 @@
> +/* Function erf vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Basic formula is
> + *    erf(x) ~ erf(x0) +
> + *              exp(-x0*x0)*D*(1+c0+T*P1(T)+D^2*P3(T)+D^4*P5(T)+D^6*p7+D^8*p9)
> + *   where D=x-x0, T=x0*D
> + *   x0 is x rounded to a specified number of fractional bits (in this case 7),
> + *    except that x0=0 for |x|<3.5/128.0 (using x0=0 for first 4 table entries)
> + *
> + *   Data table packs both erf(x0)_high and a few bits of erf(x0)_low in one
> + *   entry (in place of redundant exponent bits)
> + *
> + */
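
For reference, the scheme described in the comment above reads more easily in
scalar form.  A minimal C sketch, with erf_sketch() as an illustrative name,
libm erf()/exp() standing in for the _erf_tbl lookups, and only the leading
correction term of the expansion around x0 kept (the kernel below evaluates
the full P1/P3/P5 polynomials):

#include <math.h>

/* Scalar sketch of the table-driven scheme; illustration only.  */
static double
erf_sketch (double x)
{
  const double c = 0x1.20dd750429b6dp+0;  /* 2/sqrt(pi), cf. first table entry */
  double ax = fabs (x);
  if (ax > 0x1.7f8p+2)                    /* _MaxThreshold = 6.0 - 1.0/128.0 */
    ax = 0x1.7f8p+2;                      /* erf() rounds to 1.0 beyond it */
  volatile double biased = ax + 0x1p45;   /* _SRound: ulp of the sum is 1/128 */
  double x0 = biased - 0x1p45;            /* ax rounded to 7 fractional bits */
  double d = ax - x0;                     /* D */
  double t = x0 * d;                      /* T */
  /* erf(x0) + erf'(x0)*D*(1 - T + ...); the vector code carries the
     higher-order terms via P1/P3/P5 in T and D^2.  */
  double res = erf (x0) + c * exp (-x0 * x0) * d * (1.0 - t);
  return copysign (res, x);               /* restore sign; keeps erf(-0) = -0 */
}

The final copysign() corresponds to saving the sign up front and OR-ing it
back in at the end of the kernel below.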
> +
> +/* Offsets for data table __svml_derf_data_internal
> + */
> +#define _erf_tbl                      	0
> +#define _AbsMask                      	12288
> +#define _MaxThreshold                 	12352
> +#define _SRound                       	12416
> +#define _U2Threshold                  	12480
> +#define _poly1_0                      	12544
> +#define _poly1_1                      	12608
> +#define _poly3_0                      	12672
> +#define _poly3_1                      	12736
> +#define _poly5_0                      	12800
> +#define _poly5_1                      	12864
> +#define _poly1_2                      	12928
> +#define _poly3_2                      	12992
> +#define _poly1_3                      	13056
> +#define _poly3_3                      	13120
> +#define _Mask32                       	13184
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN8v_erf_skx)
> +/*
> + * vector gather: erf(x0),
> + * second value is exp(-x0*x0)
> + */
> +        lea       __svml_derf_data_internal(%rip), %rax
> +
> +/*
> + * erf(x) rounds to 1.0 for x>_MaxThreshold (5.9921875)
> + * can compute all results in the main path
> + */
> +        vmovups   _MaxThreshold+__svml_derf_data_internal(%rip), %zmm9
> +        vmovups   _SRound+__svml_derf_data_internal(%rip), %zmm11
> +        vmovups   _U2Threshold+__svml_derf_data_internal(%rip), %zmm10
> +        vandpd    _AbsMask+__svml_derf_data_internal(%rip), %zmm0, %zmm7
> +        vpternlogd $0xff, %zmm1, %zmm1, %zmm14
> +        kxnorw    %k0, %k0, %k3
> +        kxnorw    %k0, %k0, %k2
> +        vminpd    {sae}, %zmm9, %zmm7, %zmm12
> +
> +/* save sign */
> +        vxorpd    %zmm0, %zmm7, %zmm8
> +        vaddpd    {rn-sae}, %zmm11, %zmm12, %zmm15
> +        vcmppd    $26, {sae}, %zmm10, %zmm12, %k1
> +
> +/*
> + * _LA_ polynomial computation
> + * Start polynomial evaluation
> + */
> +        vmovups   _poly1_0+__svml_derf_data_internal(%rip), %zmm10
> +        vpsllq    $4, %zmm15, %zmm3
> +        vsubpd    {rn-sae}, %zmm11, %zmm15, %zmm13
> +        vmovups   _poly3_0+__svml_derf_data_internal(%rip), %zmm11
> +        vmovups   _poly3_3+__svml_derf_data_internal(%rip), %zmm15
> +        vsubpd    {rn-sae}, %zmm13, %zmm12, %zmm1
> +        vmulpd    {rn-sae}, %zmm1, %zmm13, %zmm6
> +
> +/* NaN fixup */
> +        vminpd    {sae}, %zmm7, %zmm1, %zmm7
> +        vmovups   _poly1_2+__svml_derf_data_internal(%rip), %zmm13
> +        vpandq    _Mask32+__svml_derf_data_internal(%rip), %zmm3, %zmm2
> +        vpmovqd   %zmm2, %ymm0
> +        vmovups   _poly1_1+__svml_derf_data_internal(%rip), %zmm2
> +        vfmadd231pd {rn-sae}, %zmm6, %zmm10, %zmm2
> +        vfmadd213pd {rn-sae}, %zmm13, %zmm6, %zmm2
> +        vpxord    %zmm4, %zmm4, %zmm4
> +        vgatherdpd 8(%rax,%ymm0), %zmm4{%k3}
> +        vpxord    %zmm5, %zmm5, %zmm5
> +        vgatherdpd (%rax,%ymm0), %zmm5{%k2}
> +        vmovups   _poly3_1+__svml_derf_data_internal(%rip), %zmm0
> +
> +/* Sign | _Erf_H */
> +        vxorpd    %zmm8, %zmm5, %zmm5
> +        vfmadd231pd {rn-sae}, %zmm6, %zmm11, %zmm0
> +        vpandnq   %zmm12, %zmm12, %zmm14{%k1}
> +        vandpd    %zmm14, %zmm1, %zmm9
> +
> +/* Sign | Diff */
> +        vxorpd    %zmm8, %zmm7, %zmm1
> +        vmovups   _poly5_0+__svml_derf_data_internal(%rip), %zmm12
> +        vmovups   _poly5_1+__svml_derf_data_internal(%rip), %zmm7
> +        vmovups   _poly3_2+__svml_derf_data_internal(%rip), %zmm14
> +
> +/* D2 = Diff^2 */
> +        vmulpd    {rn-sae}, %zmm9, %zmm9, %zmm3
> +
> +/* T^2 */
> +        vmulpd    {rn-sae}, %zmm6, %zmm6, %zmm9
> +
> +/* exp_h(x0) * Diff */
> +        vmulpd    {rn-sae}, %zmm1, %zmm4, %zmm4
> +        vfmadd231pd {rn-sae}, %zmm6, %zmm12, %zmm7
> +        vmovups   _poly1_3+__svml_derf_data_internal(%rip), %zmm12
> +        vfmadd213pd {rn-sae}, %zmm14, %zmm6, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm15, %zmm3, %zmm7
> +        vfmadd213pd {rn-sae}, %zmm12, %zmm6, %zmm2
> +        vfmadd213pd {rn-sae}, %zmm7, %zmm6, %zmm0
> +
> +/* P1 = T^2*P1 - T */
> +        vfmsub213pd {rn-sae}, %zmm6, %zmm9, %zmm2
> +
> +/* P1 + P3*D2 */
> +        vfmadd213pd {rn-sae}, %zmm2, %zmm3, %zmm0
> +
> +/*
> + * branch-free
> + * low part of result: exp_h(x0) * Diff*(1+P1)
> + */
> +        vfmadd213pd {rn-sae}, %zmm4, %zmm4, %zmm0
> +
> +/* Final result */
> +        vaddpd    {rn-sae}, %zmm5, %zmm0, %zmm6
> +
> +/* Fix erf(-0) = -0 */
> +        vorpd     %zmm8, %zmm6, %zmm0
> +        ret
> +
> +END(_ZGVeN8v_erf_skx)
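
The gather-index computation above (vaddpd with _SRound, vpsllq $4, _Mask32,
vpmovqd) can be checked in scalar form.  Adding 2^45 makes the ulp of the sum
1/128, so the low mantissa bits of the biased value hold round(ax*128);
shifting left by 4 turns that into a byte offset into the 16-byte _erf_tbl
entries.  A sketch, with erf_tbl_offset() as an illustrative name:

#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Sketch of the gather-index computation, assuming _SRound = 2^45 and
   16-byte table entries.  */
static uint32_t
erf_tbl_offset (double ax)
{
  double biased = ax + 0x1p45;            /* _SRound */
  uint64_t bits;
  memcpy (&bits, &biased, sizeof bits);   /* treat the double as raw bits */
  return (uint32_t) (bits << 4);          /* low 32 bits of index * 16 */
}

int
main (void)
{
  /* ax = 1.0 -> index 128 -> byte offset 2048; entry 128 in the table
     above (0x3feaf767a741088b) is erf(1.0) ~ 0.8427.  */
  printf ("%u\n", erf_tbl_offset (1.0));
  return 0;
}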
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_derf_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(64)) VUINT32 _erf_tbl[6*128*2][2];
> +        __declspec(align(64)) VUINT32 _AbsMask[8][2];
> +        __declspec(align(64)) VUINT32 _MaxThreshold[8][2];
> +        __declspec(align(64)) VUINT32 _SRound[8][2];
> +        __declspec(align(64)) VUINT32 _U2Threshold[8][2];
> +        __declspec(align(64)) VUINT32 _poly1_0[8][2];
> +        __declspec(align(64)) VUINT32 _poly1_1[8][2];
> +        __declspec(align(64)) VUINT32 _poly3_0[8][2];
> +        __declspec(align(64)) VUINT32 _poly3_1[8][2];
> +        __declspec(align(64)) VUINT32 _poly5_0[8][2];
> +        __declspec(align(64)) VUINT32 _poly5_1[8][2];
> +        __declspec(align(64)) VUINT32 _poly1_2[8][2];
> +        __declspec(align(64)) VUINT32 _poly3_2[8][2];
> +        __declspec(align(64)) VUINT32 _poly1_3[8][2];
> +        __declspec(align(64)) VUINT32 _poly3_3[8][2];
> +        __declspec(align(64)) VUINT32 _Mask32[8][2];
> +} __svml_derf_data_internal;
> +#endif
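
The offset #defines earlier in this file follow directly from this layout:
the table is 6*128 entries of two quads (16 bytes) = 12288 bytes, and every
constant after it is one 64-byte-aligned zmm value, so _AbsMask = 12288,
_MaxThreshold = 12352, and so on up to _Mask32 = 12288 + 14*64 = 13184.  A
compile-time cross-check sketch (derf_layout is an illustrative mirror of the
commented struct, using GCC/Clang attribute syntax):

#include <stddef.h>

struct derf_layout
{
  unsigned long long erf_tbl[6 * 128 * 2];                       /* 12288 bytes */
  unsigned long long abs_mask[8]      __attribute__ ((aligned (64)));
  unsigned long long max_threshold[8] __attribute__ ((aligned (64)));
  /* ... one 64-byte field per remaining constant ... */
};

_Static_assert (offsetof (struct derf_layout, abs_mask) == 12288,
                "_AbsMask offset");
_Static_assert (offsetof (struct derf_layout, max_threshold) == 12352,
                "_MaxThreshold offset");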
> +__svml_derf_data_internal:
> +        /*== _erf_tbl ==*/
> +        .quad 0x0000000000000000, 0x3ff20dd750429b6d
> +        .quad 0x3f820dbf3deb1340, 0x3ff20d8f1975c85d
> +        .quad 0x3f920d77083f17a0, 0x3ff20cb67bd452c7
> +        .quad 0x3f9b137e0cf584dc, 0x3ff20b4d8bac36c1
> +        .quad 0x3fa20c5645dd2538, 0x3ff209546ad13ccf
> +        .quad 0x3fa68e5d3bbc9526, 0x3ff206cb4897b148
> +        .quad 0x3fab0fafef135745, 0x3ff203b261cd0053
> +        .quad 0x3faf902a77bd3821, 0x3ff2000a00ae3804
> +        .quad 0x3fb207d480e90658, 0x3ff1fbd27cdc72d3
> +        .quad 0x3fb44703e87e8593, 0x3ff1f70c3b4f2cc8
> +        .quad 0x3fb68591a1e83b5d, 0x3ff1f1b7ae44867f
> +        .quad 0x3fb8c36beb8a8d23, 0x3ff1ebd5552f795b
> +        .quad 0x3fbb0081148a873a, 0x3ff1e565bca400d4
> +        .quad 0x3fbd3cbf7e70a4b3, 0x3ff1de697e413d29
> +        .quad 0x3fbf78159ec8bb50, 0x3ff1d6e14099944a
> +        .quad 0x3fc0d939005f65e5, 0x3ff1cecdb718d61c
> +        .quad 0x3fc1f5e1a35c3b89, 0x3ff1c62fa1e869b6
> +        .quad 0x3fc311fc15f56d14, 0x3ff1bd07cdd189ac
> +        .quad 0x3fc42d7fc2f64959, 0x3ff1b357141d95d5
> +        .quad 0x3fc548642321d7c6, 0x3ff1a91e5a748165
> +        .quad 0x3fc662a0bdf7a89f, 0x3ff19e5e92b964ab
> +        .quad 0x3fc77c2d2a765f9e, 0x3ff19318bae53a04
> +        .quad 0x3fc895010fdbdbfd, 0x3ff1874ddcdfce24
> +        .quad 0x3fc9ad142662e14d, 0x3ff17aff0e56ec10
> +        .quad 0x3fcac45e37fe2526, 0x3ff16e2d7093cd8c
> +        .quad 0x3fcbdad72110a648, 0x3ff160da304ed92f
> +        .quad 0x3fccf076d1233237, 0x3ff153068581b781
> +        .quad 0x3fce05354b96ff36, 0x3ff144b3b337c90c
> +        .quad 0x3fcf190aa85540e2, 0x3ff135e3075d076b
> +        .quad 0x3fd015f78a3dcf3d, 0x3ff12695da8b5bde
> +        .quad 0x3fd09eed6982b948, 0x3ff116cd8fd67618
> +        .quad 0x3fd127631eb8de32, 0x3ff1068b94962e5e
> +        .quad 0x3fd1af54e232d609, 0x3ff0f5d1602f7e41
> +        .quad 0x3fd236bef825d9a2, 0x3ff0e4a073dc1b91
> +        .quad 0x3fd2bd9db0f7827f, 0x3ff0d2fa5a70c168
> +        .quad 0x3fd343ed6989b7d9, 0x3ff0c0e0a8223359
> +        .quad 0x3fd3c9aa8b84beda, 0x3ff0ae54fa490723
> +        .quad 0x3fd44ed18d9f6462, 0x3ff09b58f724416b
> +        .quad 0x3fd4d35ef3e5372e, 0x3ff087ee4d9ad247
> +        .quad 0x3fd5574f4ffac98e, 0x3ff07416b4fbfe7c
> +        .quad 0x3fd5da9f415ff23f, 0x3ff05fd3ecbec298
> +        .quad 0x3fd65d4b75b00471, 0x3ff04b27bc403d30
> +        .quad 0x3fd6df50a8dff772, 0x3ff03613f2812daf
> +        .quad 0x3fd760aba57a76bf, 0x3ff0209a65e29545
> +        .quad 0x3fd7e15944d9d3e4, 0x3ff00abcf3e187a9
> +        .quad 0x3fd861566f5fd3c0, 0x3fefe8fb01a47307
> +        .quad 0x3fd8e0a01cab516b, 0x3fefbbbbef34b4b2
> +        .quad 0x3fd95f3353cbb146, 0x3fef8dc092d58ff8
> +        .quad 0x3fd9dd0d2b721f39, 0x3fef5f0cdaf15313
> +        .quad 0x3fda5a2aca209394, 0x3fef2fa4c16c0019
> +        .quad 0x3fdad68966569a87, 0x3feeff8c4b1375db
> +        .quad 0x3fdb522646bbda68, 0x3feecec7870ebca8
> +        .quad 0x3fdbccfec24855b8, 0x3fee9d5a8e4c934e
> +        .quad 0x3fdc4710406a65fc, 0x3fee6b4982f158b9
> +        .quad 0x3fdcc058392a6d2d, 0x3fee38988fc46e72
> +        .quad 0x3fdd38d4354c3bd0, 0x3fee054be79d3042
> +        .quad 0x3fddb081ce6e2a48, 0x3fedd167c4cf9d2a
> +        .quad 0x3fde275eaf25e458, 0x3fed9cf06898cdaf
> +        .quad 0x3fde9d68931ae650, 0x3fed67ea1a8b5368
> +        .quad 0x3fdf129d471eabb1, 0x3fed325927fb9d89
> +        .quad 0x3fdf86faa9428f9d, 0x3fecfc41e36c7df9
> +        .quad 0x3fdffa7ea8eb5fd0, 0x3fecc5a8a3fbea40
> +        .quad 0x3fe03693a371519c, 0x3fec8e91c4d01368
> +        .quad 0x3fe06f794ab2cae7, 0x3fec5701a484ef9d
> +        .quad 0x3fe0a7ef5c18edd2, 0x3fec1efca49a5011
> +        .quad 0x3fe0dff4f247f6c6, 0x3febe68728e29d5e
> +        .quad 0x3fe1178930ada115, 0x3febada596f25436
> +        .quad 0x3fe14eab43841b55, 0x3feb745c55905bf8
> +        .quad 0x3fe1855a5fd3dd50, 0x3feb3aafcc27502e
> +        .quad 0x3fe1bb95c3746199, 0x3feb00a46237d5be
> +        .quad 0x3fe1f15cb50bc4de, 0x3feac63e7ecc1411
> +        .quad 0x3fe226ae840d4d70, 0x3fea8b8287ec6a09
> +        .quad 0x3fe25b8a88b6dd7f, 0x3fea5074e2157620
> +        .quad 0x3fe28ff0240d52cd, 0x3fea1519efaf889e
> +        .quad 0x3fe2c3debfd7d6c1, 0x3fe9d97610879642
> +        .quad 0x3fe2f755ce9a21f4, 0x3fe99d8da149c13f
> +        .quad 0x3fe32a54cb8db67b, 0x3fe96164fafd8de3
> +        .quad 0x3fe35cdb3a9a144d, 0x3fe925007283d7aa
> +        .quad 0x3fe38ee8a84beb71, 0x3fe8e86458169af8
> +        .quad 0x3fe3c07ca9cb4f9e, 0x3fe8ab94f6caa71d
> +        .quad 0x3fe3f196dcd0f135, 0x3fe86e9694134b9e
> +        .quad 0x3fe42236e79a5fa6, 0x3fe8316d6f48133d
> +        .quad 0x3fe4525c78dd5966, 0x3fe7f41dc12c9e89
> +        .quad 0x3fe4820747ba2dc2, 0x3fe7b6abbb7aaf19
> +        .quad 0x3fe4b13713ad3513, 0x3fe7791b886e7403
> +        .quad 0x3fe4dfeba47f63cc, 0x3fe73b714a552763
> +        .quad 0x3fe50e24ca35fd2c, 0x3fe6fdb11b1e0c34
> +        .quad 0x3fe53be25d016a4f, 0x3fe6bfdf0beddaf5
> +        .quad 0x3fe569243d2b3a9b, 0x3fe681ff24b4ab04
> +        .quad 0x3fe595ea53035283, 0x3fe6441563c665d4
> +        .quad 0x3fe5c2348ecc4dc3, 0x3fe60625bd75d07b
> +        .quad 0x3fe5ee02e8a71a53, 0x3fe5c8341bb23767
> +        .quad 0x3fe61955607dd15d, 0x3fe58a445da7c74c
> +        .quad 0x3fe6442bfdedd397, 0x3fe54c5a57629db0
> +        .quad 0x3fe66e86d0312e82, 0x3fe50e79d1749ac9
> +        .quad 0x3fe69865ee075011, 0x3fe4d0a6889dfd9f
> +        .quad 0x3fe6c1c9759d0e5f, 0x3fe492e42d78d2c5
> +        .quad 0x3fe6eab18c74091b, 0x3fe4553664273d24
> +        .quad 0x3fe7131e5f496a5a, 0x3fe417a0c4049fd0
> +        .quad 0x3fe73b1021fc0cb8, 0x3fe3da26d759aef5
> +        .quad 0x3fe762870f720c6f, 0x3fe39ccc1b136d5a
> +        .quad 0x3fe78983697dc96f, 0x3fe35f93fe7d1b3d
> +        .quad 0x3fe7b00578c26037, 0x3fe32281e2fd1a92
> +        .quad 0x3fe7d60d8c979f7b, 0x3fe2e5991bd4cbfc
> +        .quad 0x3fe7fb9bfaed8078, 0x3fe2a8dcede3673b
> +        .quad 0x3fe820b1202f27fb, 0x3fe26c508f6bd0ff
> +        .quad 0x3fe8454d5f25760d, 0x3fe22ff727dd6f7b
> +        .quad 0x3fe8697120d92a4a, 0x3fe1f3d3cf9ffe5a
> +        .quad 0x3fe88d1cd474a2e0, 0x3fe1b7e98fe26217
> +        .quad 0x3fe8b050ef253c37, 0x3fe17c3b626c7a12
> +        .quad 0x3fe8d30debfc572e, 0x3fe140cc3173f007
> +        .quad 0x3fe8f5544bd00c04, 0x3fe1059ed7740313
> +        .quad 0x3fe91724951b8fc6, 0x3fe0cab61f084b93
> +        .quad 0x3fe9387f53df5238, 0x3fe09014c2ca74da
> +        .quad 0x3fe959651980da31, 0x3fe055bd6d32e8d7
> +        .quad 0x3fe979d67caa6631, 0x3fe01bb2b87c6968
> +        .quad 0x3fe999d4192a5715, 0x3fdfc3ee5d1524b0
> +        .quad 0x3fe9b95e8fd26aba, 0x3fdf511a91a67d2a
> +        .quad 0x3fe9d8768656cc42, 0x3fdedeeee0959518
> +        .quad 0x3fe9f71ca72cffb6, 0x3fde6d6ffaa65a25
> +        .quad 0x3fea1551a16aaeaf, 0x3fddfca26f5bbf88
> +        .quad 0x3fea331628a45b92, 0x3fdd8c8aace11e63
> +        .quad 0x3fea506af4cc00f4, 0x3fdd1d2cfff91594
> +        .quad 0x3fea6d50c20fa293, 0x3fdcae8d93f1d7b7
> +        .quad 0x3fea89c850b7d54d, 0x3fdc40b0729ed548
> +        .quad 0x3feaa5d265064366, 0x3fdbd3998457afdb
> +        .quad 0x3feac16fc7143263, 0x3fdb674c8ffc6283
> +        .quad 0x3feadca142b10f98, 0x3fdafbcd3afe8ab6
> +        .quad 0x3feaf767a741088b, 0x3fda911f096fbc26
> +        .quad 0x3feb11c3c79bb424, 0x3fda27455e14c93c
> +        .quad 0x3feb2bb679ead19c, 0x3fd9be437a7de946
> +        .quad 0x3feb4540978921ee, 0x3fd9561c7f23a47b
> +        .quad 0x3feb5e62fce16095, 0x3fd8eed36b886d93
> +        .quad 0x3feb771e894d602e, 0x3fd8886b1e5ecfd1
> +        .quad 0x3feb8f741ef54f83, 0x3fd822e655b417e7
> +        .quad 0x3feba764a2af2b78, 0x3fd7be47af1f5d89
> +        .quad 0x3febbef0fbde6221, 0x3fd75a91a7f4d2ed
> +        .quad 0x3febd61a1453ab44, 0x3fd6f7c69d7d3ef8
> +        .quad 0x3febece0d82d1a5c, 0x3fd695e8cd31867e
> +        .quad 0x3fec034635b66e23, 0x3fd634fa54fa285f
> +        .quad 0x3fec194b1d49a184, 0x3fd5d4fd33729015
> +        .quad 0x3fec2ef0812fc1bd, 0x3fd575f3483021c3
> +        .quad 0x3fec443755820d64, 0x3fd517de540ce2a3
> +        .quad 0x3fec5920900b5fd1, 0x3fd4babff975a04c
> +        .quad 0x3fec6dad2829ec62, 0x3fd45e99bcbb7915
> +        .quad 0x3fec81de16b14cef, 0x3fd4036d0468a7a2
> +        .quad 0x3fec95b455cce69d, 0x3fd3a93b1998736c
> +        .quad 0x3feca930e0e2a825, 0x3fd35005285227f1
> +        .quad 0x3fecbc54b476248d, 0x3fd2f7cc3fe6f423
> +        .quad 0x3feccf20ce0c0d27, 0x3fd2a09153529381
> +        .quad 0x3fece1962c0e0d8b, 0x3fd24a55399ea239
> +        .quad 0x3fecf3b5cdaf0c39, 0x3fd1f518ae487dc8
> +        .quad 0x3fed0580b2cfd249, 0x3fd1a0dc51a9934d
> +        .quad 0x3fed16f7dbe41ca0, 0x3fd14da0a961fd14
> +        .quad 0x3fed281c49d818d0, 0x3fd0fb6620c550af
> +        .quad 0x3fed38eefdf64fdd, 0x3fd0aa2d09497f2b
> +        .quad 0x3fed4970f9ce00d9, 0x3fd059f59af7a906
> +        .quad 0x3fed59a33f19ed42, 0x3fd00abff4dec7a3
> +        .quad 0x3fed6986cfa798e7, 0x3fcf79183b101c5b
> +        .quad 0x3fed791cad3eff01, 0x3fcedeb406d9c825
> +        .quad 0x3fed8865d98abe01, 0x3fce4652fadcb6b2
> +        .quad 0x3fed97635600bb89, 0x3fcdaff4969c0b04
> +        .quad 0x3feda61623cb41e0, 0x3fcd1b982c501370
> +        .quad 0x3fedb47f43b2980d, 0x3fcc893ce1dcbef7
> +        .quad 0x3fedc29fb60715af, 0x3fcbf8e1b1ca2279
> +        .quad 0x3fedd0787a8bb39d, 0x3fcb6a856c3ed54f
> +        .quad 0x3fedde0a90611a0d, 0x3fcade26b7fbed95
> +        .quad 0x3fedeb56f5f12d28, 0x3fca53c4135a6526
> +        .quad 0x3fedf85ea8db188e, 0x3fc9cb5bd549b111
> +        .quad 0x3fee0522a5dfda73, 0x3fc944ec2e4f5630
> +        .quad 0x3fee11a3e8cf4eb8, 0x3fc8c07329874652
> +        .quad 0x3fee1de36c75ba58, 0x3fc83deeada4d25a
> +        .quad 0x3fee29e22a89d766, 0x3fc7bd5c7df3fe9c
> +        .quad 0x3fee35a11b9b61ce, 0x3fc73eba3b5b07b7
> +        .quad 0x3fee4121370224cc, 0x3fc6c205655be720
> +        .quad 0x3fee4c6372cd8927, 0x3fc6473b5b15a7a1
> +        .quad 0x3fee5768c3b4a3fc, 0x3fc5ce595c455b0a
> +        .quad 0x3fee62321d06c5e0, 0x3fc5575c8a468362
> +        .quad 0x3fee6cc0709c8a0d, 0x3fc4e241e912c305
> +        .quad 0x3fee7714aec96534, 0x3fc46f066040a832
> +        .quad 0x3fee812fc64db369, 0x3fc3fda6bc016994
> +        .quad 0x3fee8b12a44944a8, 0x3fc38e1fae1d6a9d
> +        .quad 0x3fee94be342e6743, 0x3fc3206dceef5f87
> +        .quad 0x3fee9e335fb56f87, 0x3fc2b48d9e5dea1c
> +        .quad 0x3feea7730ed0bbb9, 0x3fc24a7b84d38971
> +        .quad 0x3feeb07e27a133aa, 0x3fc1e233d434b813
> +        .quad 0x3feeb9558e6b42ce, 0x3fc17bb2c8d41535
> +        .quad 0x3feec1fa258c4bea, 0x3fc116f48a6476cc
> +        .quad 0x3feeca6ccd709544, 0x3fc0b3f52ce8c383
> +        .quad 0x3feed2ae6489ac1e, 0x3fc052b0b1a174ea
> +        .quad 0x3feedabfc7453e63, 0x3fbfe6460fef4680
> +        .quad 0x3feee2a1d004692c, 0x3fbf2a901ccafb37
> +        .quad 0x3feeea5557137ae0, 0x3fbe723726b824a9
> +        .quad 0x3feef1db32a2277c, 0x3fbdbd32ac4c99b0
> +        .quad 0x3feef93436bc2daa, 0x3fbd0b7a0f921e7c
> +        .quad 0x3fef006135426b26, 0x3fbc5d0497c09e74
> +        .quad 0x3fef0762fde45ee6, 0x3fbbb1c972f23e50
> +        .quad 0x3fef0e3a5e1a1788, 0x3fbb09bfb7d11a84
> +        .quad 0x3fef14e8211e8c55, 0x3fba64de673e8837
> +        .quad 0x3fef1b6d0fea5f4d, 0x3fb9c31c6df3b1b8
> +        .quad 0x3fef21c9f12f0677, 0x3fb92470a61b6965
> +        .quad 0x3fef27ff89525acf, 0x3fb888d1d8e510a3
> +        .quad 0x3fef2e0e9a6a8b09, 0x3fb7f036c0107294
> +        .quad 0x3fef33f7e43a706b, 0x3fb75a96077274ba
> +        .quad 0x3fef39bc242e43e6, 0x3fb6c7e64e7281cb
> +        .quad 0x3fef3f5c1558b19e, 0x3fb6381e2980956b
> +        .quad 0x3fef44d870704911, 0x3fb5ab342383d178
> +        .quad 0x3fef4a31ebcd47df, 0x3fb5211ebf41880b
> +        .quad 0x3fef4f693b67bd77, 0x3fb499d478bca735
> +        .quad 0x3fef547f10d60597, 0x3fb4154bc68d75c3
> +        .quad 0x3fef59741b4b97cf, 0x3fb3937b1b31925a
> +        .quad 0x3fef5e4907982a07, 0x3fb31458e6542847
> +        .quad 0x3fef62fe80272419, 0x3fb297db960e4f63
> +        .quad 0x3fef67952cff6282, 0x3fb21df9981f8e53
> +        .quad 0x3fef6c0db3c34641, 0x3fb1a6a95b1e786f
> +        .quad 0x3fef7068b7b10fd9, 0x3fb131e14fa1625d
> +        .quad 0x3fef74a6d9a38383, 0x3fb0bf97e95f2a64
> +        .quad 0x3fef78c8b812d498, 0x3fb04fc3a0481321
> +        .quad 0x3fef7cceef15d631, 0x3fafc4b5e32d6259
> +        .quad 0x3fef80ba18636f07, 0x3faeeea8c1b1db94
> +        .quad 0x3fef848acb544e95, 0x3fae1d4cf1e2450a
> +        .quad 0x3fef88419ce4e184, 0x3fad508f9a1ea64f
> +        .quad 0x3fef8bdf1fb78370, 0x3fac885df3451a07
> +        .quad 0x3fef8f63e416ebff, 0x3fabc4a54a84e834
> +        .quad 0x3fef92d077f8d56d, 0x3fab055303221015
> +        .quad 0x3fef96256700da8e, 0x3faa4a549829587e
> +        .quad 0x3fef99633a838a57, 0x3fa993979e14fffe
> +        .quad 0x3fef9c8a7989af0d, 0x3fa8e109c4622913
> +        .quad 0x3fef9f9ba8d3c733, 0x3fa83298d717210e
> +        .quad 0x3fefa2974addae45, 0x3fa78832c03aa2b1
> +        .quad 0x3fefa57ddfe27376, 0x3fa6e1c5893c380b
> +        .quad 0x3fefa84fe5e05c8d, 0x3fa63f3f5c4de13b
> +        .quad 0x3fefab0dd89d1309, 0x3fa5a08e85af27e0
> +        .quad 0x3fefadb831a9f9c3, 0x3fa505a174e9c929
> +        .quad 0x3fefb04f6868a944, 0x3fa46e66be002240
> +        .quad 0x3fefb2d3f20f9101, 0x3fa3dacd1a8d8cce
> +        .quad 0x3fefb54641aebbc9, 0x3fa34ac36ad8dafe
> +        .quad 0x3fefb7a6c834b5a2, 0x3fa2be38b6d92415
> +        .quad 0x3fefb9f5f4739170, 0x3fa2351c2f2d1449
> +        .quad 0x3fefbc3433260ca5, 0x3fa1af5d2e04f3f6
> +        .quad 0x3fefbe61eef4cf6a, 0x3fa12ceb37ff9bc3
> +        .quad 0x3fefc07f907bc794, 0x3fa0adb5fcfa8c75
> +        .quad 0x3fefc28d7e4f9cd0, 0x3fa031ad58d56279
> +        .quad 0x3fefc48c1d033c7a, 0x3f9f7182a851bca2
> +        .quad 0x3fefc67bcf2d7b8f, 0x3f9e85c449e377f3
> +        .quad 0x3fefc85cf56ecd38, 0x3f9da0005e5f28df
> +        .quad 0x3fefca2fee770c79, 0x3f9cc0180af00a8b
> +        .quad 0x3fefcbf5170b578b, 0x3f9be5ecd2fcb5f9
> +        .quad 0x3fefcdacca0bfb73, 0x3f9b1160991ff737
> +        .quad 0x3fefcf57607a6e7c, 0x3f9a4255a00b9f03
> +        .quad 0x3fefd0f5317f582f, 0x3f9978ae8b55ce1b
> +        .quad 0x3fefd2869270a56f, 0x3f98b44e6031383e
> +        .quad 0x3fefd40bd6d7a785, 0x3f97f5188610ddc8
> +        .quad 0x3fefd58550773cb5, 0x3f973af0c737bb45
> +        .quad 0x3fefd6f34f52013a, 0x3f9685bb5134ef13
> +        .quad 0x3fefd85621b0876d, 0x3f95d55cb54cd53a
> +        .quad 0x3fefd9ae142795e3, 0x3f9529b9e8cf9a1e
> +        .quad 0x3fefdafb719e6a69, 0x3f9482b8455dc491
> +        .quad 0x3fefdc3e835500b3, 0x3f93e03d891b37de
> +        .quad 0x3fefdd7790ea5bc0, 0x3f93422fd6d12e2b
> +        .quad 0x3fefdea6e062d0c9, 0x3f92a875b5ffab56
> +        .quad 0x3fefdfccb62e52d3, 0x3f9212f612dee7fb
> +        .quad 0x3fefe0e9552ebdd6, 0x3f9181983e5133dd
> +        .quad 0x3fefe1fcfebe2083, 0x3f90f443edc5ce49
> +        .quad 0x3fefe307f2b503d0, 0x3f906ae13b0d3255
> +        .quad 0x3fefe40a6f70af4b, 0x3f8fcab1483ea7fc
> +        .quad 0x3fefe504b1d9696c, 0x3f8ec72615a894c4
> +        .quad 0x3fefe5f6f568b301, 0x3f8dcaf3691fc448
> +        .quad 0x3fefe6e1742f7cf6, 0x3f8cd5ec93c12432
> +        .quad 0x3fefe7c466dc57a1, 0x3f8be7e5ac24963b
> +        .quad 0x3fefe8a004c19ae6, 0x3f8b00b38d6b3575
> +        .quad 0x3fefe97483db8670, 0x3f8a202bd6372dce
> +        .quad 0x3fefea4218d6594a, 0x3f894624e78e0faf
> +        .quad 0x3fefeb08f7146046, 0x3f887275e3a6869e
> +        .quad 0x3fefebc950b3fa75, 0x3f87a4f6aca256cb
> +        .quad 0x3fefec835695932e, 0x3f86dd7fe3358230
> +        .quad 0x3fefed37386190fb, 0x3f861beae53b72b7
> +        .quad 0x3fefede5248e38f4, 0x3f856011cc3b036d
> +        .quad 0x3fefee8d486585ee, 0x3f84a9cf6bda3f4c
> +        .quad 0x3fefef2fd00af31a, 0x3f83f8ff5042a88e
> +        .quad 0x3fefefcce6813974, 0x3f834d7dbc76d7e5
> +        .quad 0x3feff064b5afffbe, 0x3f82a727a89a3f14
> +        .quad 0x3feff0f766697c76, 0x3f8205dac02bd6b9
> +        .quad 0x3feff18520700971, 0x3f81697560347b26
> +        .quad 0x3feff20e0a7ba8c2, 0x3f80d1d69569b82d
> +        .quad 0x3feff2924a3f7a83, 0x3f803ede1a45bfee
> +        .quad 0x3feff312046f2339, 0x3f7f60d8aa2a88f2
> +        .quad 0x3feff38d5cc4227f, 0x3f7e4cc4abf7d065
> +        .quad 0x3feff404760319b4, 0x3f7d4143a9dfe965
> +        .quad 0x3feff47772010262, 0x3f7c3e1a5f5c077c
> +        .quad 0x3feff4e671a85425, 0x3f7b430ecf4a83a8
> +        .quad 0x3feff55194fe19df, 0x3f7a4fe83fb9db25
> +        .quad 0x3feff5b8fb26f5f6, 0x3f79646f35a76624
> +        .quad 0x3feff61cc26c1578, 0x3f78806d70b2fc36
> +        .quad 0x3feff67d08401202, 0x3f77a3ade6c8b3e5
> +        .quad 0x3feff6d9e943c231, 0x3f76cdfcbfc1e263
> +        .quad 0x3feff733814af88c, 0x3f75ff2750fe7820
> +        .quad 0x3feff789eb6130c9, 0x3f7536fc18f7ce5c
> +        .quad 0x3feff7dd41ce2b4d, 0x3f74754abacdf1dc
> +        .quad 0x3feff82d9e1a76d8, 0x3f73b9e3f9d06e3f
> +        .quad 0x3feff87b1913e853, 0x3f730499b503957f
> +        .quad 0x3feff8c5cad200a5, 0x3f72553ee2a336bf
> +        .quad 0x3feff90dcaba4096, 0x3f71aba78ba3af89
> +        .quad 0x3feff9532f846ab0, 0x3f7107a8c7323a6e
> +        .quad 0x3feff9960f3eb327, 0x3f706918b6355624
> +        .quad 0x3feff9d67f51ddba, 0x3f6f9f9cfd9c3035
> +        .quad 0x3feffa14948549a7, 0x3f6e77448fb66bb9
> +        .quad 0x3feffa506302ebae, 0x3f6d58da68fd1170
> +        .quad 0x3feffa89fe5b3625, 0x3f6c4412bf4b8f0b
> +        .quad 0x3feffac17988ef4b, 0x3f6b38a3af2e55b4
> +        .quad 0x3feffaf6e6f4f5c0, 0x3f6a3645330550ff
> +        .quad 0x3feffb2a5879f35e, 0x3f693cb11a30d765
> +        .quad 0x3feffb5bdf67fe6f, 0x3f684ba3004a50d0
> +        .quad 0x3feffb8b8c88295f, 0x3f6762d84469c18f
> +        .quad 0x3feffbb970200110, 0x3f66821000795a03
> +        .quad 0x3feffbe599f4f9d9, 0x3f65a90b00981d93
> +        .quad 0x3feffc10194fcb64, 0x3f64d78bba8ca5fd
> +        .quad 0x3feffc38fcffbb7c, 0x3f640d564548fad7
> +        .quad 0x3feffc60535dd7f5, 0x3f634a305080681f
> +        .quad 0x3feffc862a501fd7, 0x3f628de11c5031eb
> +        .quad 0x3feffcaa8f4c9bea, 0x3f61d83170fbf6fb
> +        .quad 0x3feffccd8f5c66d1, 0x3f6128eb96be8798
> +        .quad 0x3feffcef371ea4d7, 0x3f607fdb4dafea5f
> +        .quad 0x3feffd0f92cb6ba7, 0x3f5fb99b8b8279e1
> +        .quad 0x3feffd2eae369a07, 0x3f5e7f232d9e2630
> +        .quad 0x3feffd4c94d29fdb, 0x3f5d4fed7195d7e8
> +        .quad 0x3feffd6951b33686, 0x3f5c2b9cf7f893bf
> +        .quad 0x3feffd84ef9009ee, 0x3f5b11d702b3deb2
> +        .quad 0x3feffd9f78c7524a, 0x3f5a024365f771bd
> +        .quad 0x3feffdb8f7605ee7, 0x3f58fc8c794b03b5
> +        .quad 0x3feffdd1750e1220, 0x3f58005f08d6f1ef
> +        .quad 0x3feffde8fb314ebf, 0x3f570d6a46e07dda
> +        .quad 0x3feffdff92db56e5, 0x3f56235fbd7a4345
> +        .quad 0x3feffe1544d01ccb, 0x3f5541f340697987
> +        .quad 0x3feffe2a1988857c, 0x3f5468dadf4080ab
> +        .quad 0x3feffe3e19349dc7, 0x3f5397ced7af2b15
> +        .quad 0x3feffe514bbdc197, 0x3f52ce898809244e
> +        .quad 0x3feffe63b8c8b5f7, 0x3f520cc76202c5fb
> +        .quad 0x3feffe7567b7b5e1, 0x3f515246dda49d47
> +        .quad 0x3feffe865fac722b, 0x3f509ec86c75d497
> +        .quad 0x3feffe96a78a04a9, 0x3f4fe41cd9bb4eee
> +        .quad 0x3feffea645f6d6da, 0x3f4e97ba3b77f306
> +        .quad 0x3feffeb5415e7c44, 0x3f4d57f524723822
> +        .quad 0x3feffec39ff380b9, 0x3f4c245d4b99847a
> +        .quad 0x3feffed167b12ac2, 0x3f4afc85e0f82e12
> +        .quad 0x3feffede9e5d3262, 0x3f49e005769dbc1d
> +        .quad 0x3feffeeb49896c6d, 0x3f48ce75e9f6f8a0
> +        .quad 0x3feffef76e956a9f, 0x3f47c7744d9378f7
> +        .quad 0x3fefff0312b010b5, 0x3f46caa0d3582fe9
> +        .quad 0x3fefff0e3ad91ec2, 0x3f45d79eb71e893b
> +        .quad 0x3fefff18ebe2b0e1, 0x3f44ee1429bf7cc0
> +        .quad 0x3fefff232a72b48e, 0x3f440daa3c89f5b6
> +        .quad 0x3fefff2cfb0453d9, 0x3f43360ccd23db3a
> +        .quad 0x3fefff3661e9569d, 0x3f4266ea71d4f71a
> +        .quad 0x3fefff3f634b79f9, 0x3f419ff4663ae9df
> +        .quad 0x3fefff48032dbe40, 0x3f40e0de78654d1e
> +        .quad 0x3fefff50456dab8c, 0x3f40295ef6591848
> +        .quad 0x3fefff582dc48d30, 0x3f3ef25d37f49fe1
> +        .quad 0x3fefff5fbfc8a439, 0x3f3da01102b5f851
> +        .quad 0x3fefff66feee5129, 0x3f3c5b5412dcafad
> +        .quad 0x3fefff6dee89352e, 0x3f3b23a5a23e4210
> +        .quad 0x3fefff7491cd4af6, 0x3f39f8893d8fd1c1
> +        .quad 0x3fefff7aebcff755, 0x3f38d986a4187285
> +        .quad 0x3fefff80ff8911fd, 0x3f37c629a822bc9e
> +        .quad 0x3fefff86cfd3e657, 0x3f36be02102b3520
> +        .quad 0x3fefff8c5f702ccf, 0x3f35c0a378c90bca
> +        .quad 0x3fefff91b102fca8, 0x3f34cda5374ea275
> +        .quad 0x3fefff96c717b695, 0x3f33e4a23d1f4703
> +        .quad 0x3fefff9ba420e834, 0x3f330538fbb77ecd
> +        .quad 0x3fefffa04a7928b1, 0x3f322f0b496539be
> +        .quad 0x3fefffa4bc63ee9a, 0x3f3161be46ad3b50
> +        .quad 0x3fefffa8fc0e5f33, 0x3f309cfa445b00ff
> +        .quad 0x3fefffad0b901755, 0x3f2fc0d55470cf51
> +        .quad 0x3fefffb0ecebee1b, 0x3f2e577bbcd49935
> +        .quad 0x3fefffb4a210b172, 0x3f2cfd4a5adec5c0
> +        .quad 0x3fefffb82cd9dcbf, 0x3f2bb1a9657ce465
> +        .quad 0x3fefffbb8f1049c6, 0x3f2a740684026555
> +        .quad 0x3fefffbeca6adbe9, 0x3f2943d4a1d1ed39
> +        .quad 0x3fefffc1e08f25f5, 0x3f28208bc334a6a5
> +        .quad 0x3fefffc4d3120aa1, 0x3f2709a8db59f25c
> +        .quad 0x3fefffc7a37857d2, 0x3f25feada379d8b7
> +        .quad 0x3fefffca53375ce3, 0x3f24ff207314a102
> +        .quad 0x3fefffcce3b57bff, 0x3f240a8c1949f75e
> +        .quad 0x3fefffcf564ab6b7, 0x3f23207fb7420eb9
> +        .quad 0x3fefffd1ac4135f9, 0x3f22408e9ba3327f
> +        .quad 0x3fefffd3e6d5cd87, 0x3f216a501f0e42ca
> +        .quad 0x3fefffd607387b07, 0x3f209d5f819c9e29
> +        .quad 0x3fefffd80e8ce0da, 0x3f1fb2b792b40a22
> +        .quad 0x3fefffd9fdeabcce, 0x3f1e3bcf436a1a95
> +        .quad 0x3fefffdbd65e5ad0, 0x3f1cd55277c18d05
> +        .quad 0x3fefffdd98e903b2, 0x3f1b7e94604479dc
> +        .quad 0x3fefffdf46816833, 0x3f1a36eec00926dd
> +        .quad 0x3fefffe0e0140857, 0x3f18fdc1b2dcf7b9
> +        .quad 0x3fefffe26683972a, 0x3f17d2737527c3f9
> +        .quad 0x3fefffe3daa95b18, 0x3f16b4702d7d5849
> +        .quad 0x3fefffe53d558ae9, 0x3f15a329b7d30748
> +        .quad 0x3fefffe68f4fa777, 0x3f149e17724f4d41
> +        .quad 0x3fefffe7d156d244, 0x3f13a4b60ba9aa4e
> +        .quad 0x3fefffe904222101, 0x3f12b6875310f785
> +        .quad 0x3fefffea2860ee1e, 0x3f11d312098e9dba
> +        .quad 0x3fefffeb3ebb267b, 0x3f10f9e1b4dd36df
> +        .quad 0x3fefffec47d19457, 0x3f102a8673a94692
> +        .quad 0x3fefffed443e2787, 0x3f0ec929a665b449
> +        .quad 0x3fefffee34943b15, 0x3f0d4f4b4c8e09ed
> +        .quad 0x3fefffef1960d85d, 0x3f0be6abbb10a5aa
> +        .quad 0x3fefffeff32af7af, 0x3f0a8e8cc1fadef6
> +        .quad 0x3feffff0c273bea2, 0x3f094637d5bacfdb
> +        .quad 0x3feffff187b6bc0e, 0x3f080cfdc72220cf
> +        .quad 0x3feffff2436a21dc, 0x3f06e2367dc27f95
> +        .quad 0x3feffff2f5fefcaa, 0x3f05c540b4936fd2
> +        .quad 0x3feffff39fe16963, 0x3f04b581b8d170fc
> +        .quad 0x3feffff44178c8d2, 0x3f03b2652b06c2b2
> +        .quad 0x3feffff4db27f146, 0x3f02bb5cc22e5db6
> +        .quad 0x3feffff56d4d5e5e, 0x3f01cfe010e2052d
> +        .quad 0x3feffff5f8435efc, 0x3f00ef6c4c84a0fe
> +        .quad 0x3feffff67c604180, 0x3f001984165a5f36
> +        .quad 0x3feffff6f9f67e55, 0x3efe9b5e8d00ce77
> +        .quad 0x3feffff77154e0d6, 0x3efd16f5716c6c1a
> +        .quad 0x3feffff7e2c6aea2, 0x3efba4f035d60e03
> +        .quad 0x3feffff84e93cd75, 0x3efa447b7b03f045
> +        .quad 0x3feffff8b500e77c, 0x3ef8f4ccca7fc90d
> +        .quad 0x3feffff9164f8e46, 0x3ef7b5223dac7336
> +        .quad 0x3feffff972be5c59, 0x3ef684c227fcacef
> +        .quad 0x3feffff9ca891572, 0x3ef562fac4329b48
> +        .quad 0x3feffffa1de8c582, 0x3ef44f21e49054f2
> +        .quad 0x3feffffa6d13de73, 0x3ef34894a5e24657
> +        .quad 0x3feffffab83e54b8, 0x3ef24eb7254ccf83
> +        .quad 0x3feffffaff99bac4, 0x3ef160f438c70913
> +        .quad 0x3feffffb43555b5f, 0x3ef07ebd2a2d2844
> +        .quad 0x3feffffb839e52f3, 0x3eef4f12e9ab070a
> +        .quad 0x3feffffbc09fa7cd, 0x3eedb5ad0b27805c
> +        .quad 0x3feffffbfa82616b, 0x3eec304efa2c6f4e
> +        .quad 0x3feffffc316d9ed0, 0x3eeabe09e9144b5e
> +        .quad 0x3feffffc6586abf6, 0x3ee95df988e76644
> +        .quad 0x3feffffc96f1165e, 0x3ee80f439b4ee04b
> +        .quad 0x3feffffcc5cec0c1, 0x3ee6d11788a69c64
> +        .quad 0x3feffffcf23ff5fc, 0x3ee5a2adfa0b4bc4
> +        .quad 0x3feffffd1c637b2b, 0x3ee4834877429b8f
> +        .quad 0x3feffffd4456a10d, 0x3ee37231085c7d9a
> +        .quad 0x3feffffd6a3554a1, 0x3ee26eb9daed6f7e
> +        .quad 0x3feffffd8e1a2f22, 0x3ee1783ceac28910
> +        .quad 0x3feffffdb01e8546, 0x3ee08e1badf0fced
> +        .quad 0x3feffffdd05a75ea, 0x3edf5f7d88472604
> +        .quad 0x3feffffdeee4f810, 0x3eddb92b5212fb8d
> +        .quad 0x3feffffe0bd3e852, 0x3edc282cd3957eda
> +        .quad 0x3feffffe273c15b7, 0x3edaab7abace48dc
> +        .quad 0x3feffffe41314e06, 0x3ed94219bfcb4928
> +        .quad 0x3feffffe59c6698b, 0x3ed7eb1a2075864e
> +        .quad 0x3feffffe710d565e, 0x3ed6a597219a93da
> +        .quad 0x3feffffe8717232d, 0x3ed570b69502f313
> +        .quad 0x3feffffe9bf4098c, 0x3ed44ba864670882
> +        .quad 0x3feffffeafb377d5, 0x3ed335a62115bce2
> +        .quad 0x3feffffec2641a9e, 0x3ed22df298214423
> +        .quad 0x3feffffed413e5b7, 0x3ed133d96ae7e0dd
> +        .quad 0x3feffffee4d01cd6, 0x3ed046aeabcfcdec
> +        .quad 0x3feffffef4a55bd4, 0x3ececb9cfe1d8642
> +        .quad 0x3fefffff039f9e8f, 0x3ecd21397ead99cb
> +        .quad 0x3fefffff11ca4876, 0x3ecb8d094c86d374
> +        .quad 0x3fefffff1f302bc1, 0x3eca0df0f0c626dc
> +        .quad 0x3fefffff2bdb904d, 0x3ec8a2e269750a39
> +        .quad 0x3fefffff37d63a36, 0x3ec74adc8f4064d3
> +        .quad 0x3fefffff43297019, 0x3ec604ea819f007c
> +        .quad 0x3fefffff4dde0118, 0x3ec4d0231928c6f9
> +        .quad 0x3fefffff57fc4a95, 0x3ec3aba85fe22e20
> +        .quad 0x3fefffff618c3da6, 0x3ec296a70f414053
> +        .quad 0x3fefffff6a956450, 0x3ec1905613b3abf2
> +        .quad 0x3fefffff731ee681, 0x3ec097f6156f32c5
> +        .quad 0x3fefffff7b2f8ed6, 0x3ebf59a20caf6695
> +        .quad 0x3fefffff82cdcf1b, 0x3ebd9c73698fb1dc
> +        .quad 0x3fefffff89ffc4aa, 0x3ebbf716c6168bae
> +        .quad 0x3fefffff90cb3c81, 0x3eba6852c6b58392
> +        .quad 0x3fefffff9735b73b, 0x3eb8eefd70594a89
> +        .quad 0x3fefffff9d446ccc, 0x3eb789fb715aae95
> +        .quad 0x3fefffffa2fc5015, 0x3eb6383f726a8e04
> +        .quad 0x3fefffffa8621251, 0x3eb4f8c96f26a26a
> +        .quad 0x3fefffffad7a2652, 0x3eb3caa61607f920
> +        .quad 0x3fefffffb248c39d, 0x3eb2acee2f5ecdb8
> +        .quad 0x3fefffffb6d1e95d, 0x3eb19ec60b1242ed
> +        .quad 0x3fefffffbb196132, 0x3eb09f5cf4dd2877
> +        .quad 0x3fefffffbf22c1e2, 0x3eaf5bd95d8730d8
> +        .quad 0x3fefffffc2f171e3, 0x3ead9371e2ff7c35
> +        .quad 0x3fefffffc688a9cf, 0x3eabe41de54d155a
> +        .quad 0x3fefffffc9eb76ac, 0x3eaa4c89e08ef4f3
> +        .quad 0x3fefffffcd1cbc28, 0x3ea8cb738399b12c
> +        .quad 0x3fefffffd01f36af, 0x3ea75fa8dbc84bec
> +        .quad 0x3fefffffd2f57d68, 0x3ea608078a70dcbc
> +        .quad 0x3fefffffd5a2041f, 0x3ea4c37c0394d094
> +        .quad 0x3fefffffd8271d12, 0x3ea39100d5687bfe
> +        .quad 0x3fefffffda86faa9, 0x3ea26f9df8519bd7
> +        .quad 0x3fefffffdcc3b117, 0x3ea15e6827001f18
> +        .quad 0x3fefffffdedf37ed, 0x3ea05c803e4831c1
> +        .quad 0x3fefffffe0db6b91, 0x3e9ed22548cffd35
> +        .quad 0x3fefffffe2ba0ea5, 0x3e9d06ad6ecdf971
> +        .quad 0x3fefffffe47ccb60, 0x3e9b551c847fbc96
> +        .quad 0x3fefffffe62534d4, 0x3e99bc09f112b494
> +        .quad 0x3fefffffe7b4c81e, 0x3e983a1ff0aa239d
> +        .quad 0x3fefffffe92ced93, 0x3e96ce1aa3fd7bdd
> +        .quad 0x3fefffffea8ef9cf, 0x3e9576c72b514859
> +        .quad 0x3fefffffebdc2ec6, 0x3e943302cc4a0da8
> +        .quad 0x3fefffffed15bcba, 0x3e9301ba221dc9bb
> +        .quad 0x3fefffffee3cc32c, 0x3e91e1e857adc568
> +        .quad 0x3fefffffef5251c2, 0x3e90d2966b1746f7
> +        .quad 0x3feffffff0576917, 0x3e8fa5b4f49cc6b2
> +        .quad 0x3feffffff14cfb92, 0x3e8dc3ae30b55c16
> +        .quad 0x3feffffff233ee1d, 0x3e8bfd7555a3bd68
> +        .quad 0x3feffffff30d18e8, 0x3e8a517d9e61628a
> +        .quad 0x3feffffff3d9480f, 0x3e88be4f8f6c951f
> +        .quad 0x3feffffff4993c46, 0x3e874287ded49339
> +        .quad 0x3feffffff54dab72, 0x3e85dcd669f2cd34
> +        .quad 0x3feffffff5f74141, 0x3e848bfd38302871
> +        .quad 0x3feffffff6969fb8, 0x3e834ecf8a3c124a
> +        .quad 0x3feffffff72c5fb6, 0x3e822430f521cbcf
> +        .quad 0x3feffffff7b91176, 0x3e810b1488aeb235
> +        .quad 0x3feffffff83d3d07, 0x3e80027c00a263a6
> +        .quad 0x3feffffff8b962be, 0x3e7e12ee004efc37
> +        .quad 0x3feffffff92dfba2, 0x3e7c3e44ae32b16b
> +        .quad 0x3feffffff99b79d2, 0x3e7a854ea14102a8
> +        .quad 0x3feffffffa0248e8, 0x3e78e6761569f45d
> +        .quad 0x3feffffffa62ce54, 0x3e77603bac345f65
> +        .quad 0x3feffffffabd69b4, 0x3e75f1353cdad001
> +        .quad 0x3feffffffb127525, 0x3e74980cb3c80949
> +        .quad 0x3feffffffb624592, 0x3e73537f00b6ad4d
> +        .quad 0x3feffffffbad2aff, 0x3e72225b12bffc68
> +        .quad 0x3feffffffbf370cd, 0x3e710380e1adb7e9
> +        .quad 0x3feffffffc355dfd, 0x3e6febc107d5efaa
> +        .quad 0x3feffffffc733572, 0x3e6df0f2a0ee6947
> +        .quad 0x3feffffffcad3626, 0x3e6c14b2188bcee4
> +        .quad 0x3feffffffce39b67, 0x3e6a553644f7f07d
> +        .quad 0x3feffffffd169d0c, 0x3e68b0cfce0579e0
> +        .quad 0x3feffffffd466fa5, 0x3e6725e7c5dd20f7
> +        .quad 0x3feffffffd7344aa, 0x3e65b2fe547a1340
> +        .quad 0x3feffffffd9d4aab, 0x3e6456a974e92e93
> +        .quad 0x3feffffffdc4ad7a, 0x3e630f93c3699078
> +        .quad 0x3feffffffde9964e, 0x3e61dc7b5b978cf8
> +        .quad 0x3feffffffe0c2bf0, 0x3e60bc30c5d52f15
> +        .quad 0x3feffffffe2c92db, 0x3e5f5b2be65a0c7f
> +        .quad 0x3feffffffe4aed5e, 0x3e5d5f3a8dea7357
> +        .quad 0x3feffffffe675bbd, 0x3e5b82915b03515b
> +        .quad 0x3feffffffe81fc4e, 0x3e59c3517e789488
> +        .quad 0x3feffffffe9aeb97, 0x3e581fb7df06136e
> +        .quad 0x3feffffffeb24467, 0x3e56961b8d641d06
> +        .quad 0x3feffffffec81ff2, 0x3e5524ec4d916cae
> +        .quad 0x3feffffffedc95e7, 0x3e53cab1343d18d1
> +        .quad 0x3feffffffeefbc85, 0x3e52860757487a01
> +        .quad 0x3fefffffff01a8b6, 0x3e5155a09065d4f7
> +        .quad 0x3fefffffff126e1e, 0x3e50384250e4c9fc
> +        .quad 0x3fefffffff221f30, 0x3e4e59890b926c78
> +        .quad 0x3fefffffff30cd3f, 0x3e4c642116a8a9e3
> +        .quad 0x3fefffffff3e8892, 0x3e4a8e405e651ab6
> +        .quad 0x3fefffffff4b606f, 0x3e48d5f98114f872
> +        .quad 0x3fefffffff57632d, 0x3e47397c5a66e307
> +        .quad 0x3fefffffff629e44, 0x3e45b71456c5a4c4
> +        .quad 0x3fefffffff6d1e56, 0x3e444d26de513197
> +        .quad 0x3fefffffff76ef3f, 0x3e42fa31d6371537
> +        .quad 0x3fefffffff801c1f, 0x3e41bcca373b7b43
> +        .quad 0x3fefffffff88af67, 0x3e40939ab853339f
> +        .quad 0x3fefffffff90b2e3, 0x3e3efac5187b2863
> +        .quad 0x3fefffffff982fc1, 0x3e3cf1e86235d0e7
> +        .quad 0x3fefffffff9f2e9f, 0x3e3b0a68a2128bab
> +        .quad 0x3fefffffffa5b790, 0x3e39423165bc4444
> +        .quad 0x3fefffffffabd229, 0x3e37974e743dea3d
> +        .quad 0x3fefffffffb18582, 0x3e3607e9eacd1050
> +        .quad 0x3fefffffffb6d844, 0x3e34924a74dec729
> +        .quad 0x3fefffffffbbd0aa, 0x3e3334d19e0c2160
> +        .quad 0x3fefffffffc0748f, 0x3e31edfa3c5f5cca
> +        .quad 0x3fefffffffc4c96c, 0x3e30bc56f1b54701
> +        .quad 0x3fefffffffc8d462, 0x3e2f3d2185e047d9
> +        .quad 0x3fefffffffcc9a41, 0x3e2d26cb87945e87
> +        .quad 0x3fefffffffd01f89, 0x3e2b334fac4b9f99
> +        .quad 0x3fefffffffd36871, 0x3e296076f7918d1c
> +        .quad 0x3fefffffffd678ed, 0x3e27ac2d72fc2c63
> +        .quad 0x3fefffffffd954ae, 0x3e2614801550319e
> +        .quad 0x3fefffffffdbff2a, 0x3e24979ac8b28927
> +        .quad 0x3fefffffffde7ba0, 0x3e2333c68e2d0548
> +        .quad 0x3fefffffffe0cd16, 0x3e21e767bce37dd7
> +        .quad 0x3fefffffffe2f664, 0x3e20b0fc5b6d05a0
> +        .quad 0x3fefffffffe4fa30, 0x3e1f1e3523b41d7d
> +        .quad 0x3fefffffffe6daf7, 0x3e1d00de6608effe
> +        .quad 0x3fefffffffe89b0c, 0x3e1b0778b7b3301b
> +        .quad 0x3fefffffffea3c9a, 0x3e192fb04ec0f6cf
> +        .quad 0x3fefffffffebc1a9, 0x3e177756ec9f78fa
> +        .quad 0x3fefffffffed2c21, 0x3e15dc61922d5a06
> +        .quad 0x3fefffffffee7dc8, 0x3e145ce65699ff6d
> +        .quad 0x3fefffffffefb847, 0x3e12f71a5f159970
> +        .quad 0x3feffffffff0dd2b, 0x3e11a94ff571654f
> +        .quad 0x3feffffffff1ede9, 0x3e1071f4bbea09ec
> +        .quad 0x3feffffffff2ebda, 0x3e0e9f1ff8ddd774
> +        .quad 0x3feffffffff3d843, 0x3e0c818223a202c7
> +        .quad 0x3feffffffff4b453, 0x3e0a887bd2b4404d
> +        .quad 0x3feffffffff58126, 0x3e08b1a336c5eb6b
> +        .quad 0x3feffffffff63fc3, 0x3e06fab63324088a
> +        .quad 0x3feffffffff6f121, 0x3e056197e30205ba
> +        .quad 0x3feffffffff79626, 0x3e03e44e45301b92
> +        .quad 0x3feffffffff82fab, 0x3e0281000bfe4c3f
> +        .quad 0x3feffffffff8be77, 0x3e0135f28f2d50b4
> +        .quad 0x3feffffffff94346, 0x3e000187dded5975
> +        .quad 0x3feffffffff9bec8, 0x3dfdc479de0ef001
> +        .quad 0x3feffffffffa319f, 0x3dfbad4fdad3caa1
> +        .quad 0x3feffffffffa9c63, 0x3df9baed3ed27ab8
> +        .quad 0x3feffffffffaffa4, 0x3df7ead9ce4285bb
> +        .quad 0x3feffffffffb5be5, 0x3df63ac6b4edc88e
> +        .quad 0x3feffffffffbb1a2, 0x3df4a88be2a6390c
> +        .quad 0x3feffffffffc014e, 0x3df332259185f1a0
> +        .quad 0x3feffffffffc4b56, 0x3df1d5b1f3793044
> +        .quad 0x3feffffffffc901c, 0x3df0916f04b6e18b
> +        .quad 0x3feffffffffccfff, 0x3deec77101de6926
> +        .quad 0x3feffffffffd0b56, 0x3dec960bf23153e0
> +        .quad 0x3feffffffffd4271, 0x3dea8bd20fc65ef7
> +        .quad 0x3feffffffffd759d, 0x3de8a61745ec7d1d
> +        .quad 0x3feffffffffda520, 0x3de6e25d0e756261
> +        .quad 0x3feffffffffdd13c, 0x3de53e4f7d1666cb
> +        .quad 0x3feffffffffdfa2d, 0x3de3b7c27a7ddb0e
> +        .quad 0x3feffffffffe202d, 0x3de24caf2c32af14
> +        .quad 0x3feffffffffe4371, 0x3de0fb3186804d0f
> +        .quad 0x3feffffffffe642a, 0x3ddf830c0bb41fd7
> +        .quad 0x3feffffffffe8286, 0x3ddd3c0f1a91c846
> +        .quad 0x3feffffffffe9eb0, 0x3ddb1e5acf351d87
> +        .quad 0x3feffffffffeb8d0, 0x3dd92712d259ce66
> +        .quad 0x3feffffffffed10a, 0x3dd7538c60a04476
> +        .quad 0x3feffffffffee782, 0x3dd5a14b04b47879
> +        .quad 0x3feffffffffefc57, 0x3dd40dfd87456f4c
> +        .quad 0x3fefffffffff0fa7, 0x3dd2977b1172b9d5
> +        .quad 0x3fefffffffff218f, 0x3dd13bc07e891491
> +        .quad 0x3fefffffffff3227, 0x3dcff1dbb4300811
> +        .quad 0x3fefffffffff4188, 0x3dcd9a880f306bd8
> +        .quad 0x3fefffffffff4fc9, 0x3dcb6e45220b55e0
> +        .quad 0x3fefffffffff5cfd, 0x3dc96a0b33f2c4da
> +        .quad 0x3fefffffffff6939, 0x3dc78b07e9e924ac
> +        .quad 0x3fefffffffff748e, 0x3dc5ce9ab1670dd2
> +        .quad 0x3fefffffffff7f0d, 0x3dc4325167006bb0
> +        .quad 0x3fefffffffff88c5, 0x3dc2b3e53538ff3f
> +        .quad 0x3fefffffffff91c6, 0x3dc15137a7f44864
> +        .quad 0x3fefffffffff9a1b, 0x3dc0084ff125639d
> +        .quad 0x3fefffffffffa1d2, 0x3dbdaeb0b7311ec7
> +        .quad 0x3fefffffffffa8f6, 0x3dbb7937d1c40c53
> +        .quad 0x3fefffffffffaf92, 0x3db96d082f59ab06
> +        .quad 0x3fefffffffffb5b0, 0x3db7872d9fa10aad
> +        .quad 0x3fefffffffffbb58, 0x3db5c4e8e37bc7d0
> +        .quad 0x3fefffffffffc095, 0x3db423ac0df49a40
> +        .quad 0x3fefffffffffc56d, 0x3db2a117230ad284
> +        .quad 0x3fefffffffffc9e8, 0x3db13af4f04f9998
> +        .quad 0x3fefffffffffce0d, 0x3dafde703724e560
> +        .quad 0x3fefffffffffd1e1, 0x3dad77f0c82e7641
> +        .quad 0x3fefffffffffd56c, 0x3dab3ee02611d7dd
> +        .quad 0x3fefffffffffd8b3, 0x3da92ff33023d5bd
> +        .quad 0x3fefffffffffdbba, 0x3da7481a9e69f53f
> +        .quad 0x3fefffffffffde86, 0x3da5847eda620959
> +        .quad 0x3fefffffffffe11d, 0x3da3e27c1fcc74bd
> +        .quad 0x3fefffffffffe380, 0x3da25f9ee0b923dc
> +        .quad 0x3fefffffffffe5b6, 0x3da0f9a068653200
> +        .quad 0x3fefffffffffe7c0, 0x3d9f5cc7718082b0
> +        .quad 0x3fefffffffffe9a2, 0x3d9cf7e53d6a2ca5
> +        .quad 0x3fefffffffffeb60, 0x3d9ac0f5f3229372
> +        .quad 0x3fefffffffffecfb, 0x3d98b498644847ea
> +        .quad 0x3fefffffffffee77, 0x3d96cfa9bcca59dc
> +        .quad 0x3fefffffffffefd6, 0x3d950f411d4fd2cd
> +        .quad 0x3feffffffffff11a, 0x3d9370ab8327af5e
> +        .quad 0x3feffffffffff245, 0x3d91f167f88c6b6e
> +        .quad 0x3feffffffffff359, 0x3d908f24085d4597
> +        .quad 0x3feffffffffff457, 0x3d8e8f70e181d61a
> +        .quad 0x3feffffffffff542, 0x3d8c324c20e337dc
> +        .quad 0x3feffffffffff61b, 0x3d8a03261574b54e
> +        .quad 0x3feffffffffff6e3, 0x3d87fe903cdf5855
> +        .quad 0x3feffffffffff79b, 0x3d86215c58da3450
> +        .quad 0x3feffffffffff845, 0x3d846897d4b69fc6
> +        .quad 0x3feffffffffff8e2, 0x3d82d1877d731b7b
> +        .quad 0x3feffffffffff973, 0x3d8159a386b11517
> +        .quad 0x3feffffffffff9f8, 0x3d7ffd27ae9393ce
> +        .quad 0x3feffffffffffa73, 0x3d7d7c593130dd0b
> +        .quad 0x3feffffffffffae4, 0x3d7b2cd607c79bcf
> +        .quad 0x3feffffffffffb4c, 0x3d790ae4d3405651
> +        .quad 0x3feffffffffffbad, 0x3d771312dd1759e2
> +        .quad 0x3feffffffffffc05, 0x3d75422ef5d8949d
> +        .quad 0x3feffffffffffc57, 0x3d739544b0ecc957
> +        .quad 0x3feffffffffffca2, 0x3d720997f73e73dd
> +        .quad 0x3feffffffffffce7, 0x3d709ca0eaacd277
> +        .quad 0x3feffffffffffd27, 0x3d6e9810295890ec
> +        .quad 0x3feffffffffffd62, 0x3d6c2b45b5aa4a1d
> +        .quad 0x3feffffffffffd98, 0x3d69eee068fa7596
> +        .quad 0x3feffffffffffdca, 0x3d67df2b399c10a8
> +        .quad 0x3feffffffffffdf8, 0x3d65f8b87a31bd85
> +        .quad 0x3feffffffffffe22, 0x3d64385c96e9a2d9
> +        .quad 0x3feffffffffffe49, 0x3d629b2933ef4cbc
> +        .quad 0x3feffffffffffe6c, 0x3d611e68a6378f8a
> +        .quad 0x3feffffffffffe8d, 0x3d5f7f338086a86b
> +        .quad 0x3feffffffffffeab, 0x3d5cf8d7d9ce040a
> +        .quad 0x3feffffffffffec7, 0x3d5aa577251ae485
> +        .quad 0x3feffffffffffee1, 0x3d58811d739efb5f
> +        .quad 0x3feffffffffffef8, 0x3d568823e52970be
> +        .quad 0x3fefffffffffff0e, 0x3d54b72ae68e8b4c
> +        .quad 0x3fefffffffffff22, 0x3d530b14dbe876bc
> +        .quad 0x3fefffffffffff34, 0x3d5181012ef86610
> +        .quad 0x3fefffffffffff45, 0x3d501647ba798745
> +        .quad 0x3fefffffffffff54, 0x3d4d90e917701675
> +        .quad 0x3fefffffffffff62, 0x3d4b2a87e86d0c8a
> +        .quad 0x3fefffffffffff6f, 0x3d48f53dcb377293
> +        .quad 0x3fefffffffffff7b, 0x3d46ed2f2515e933
> +        .quad 0x3fefffffffffff86, 0x3d450ecc9ed47f19
> +        .quad 0x3fefffffffffff90, 0x3d4356cd5ce7799e
> +        .quad 0x3fefffffffffff9a, 0x3d41c229a587ab78
> +        .quad 0x3fefffffffffffa2, 0x3d404e15ecc7f3f6
> +        .quad 0x3fefffffffffffaa, 0x3d3deffc7e6a6017
> +        .quad 0x3fefffffffffffb1, 0x3d3b7b040832f310
> +        .quad 0x3fefffffffffffb8, 0x3d3938e021f36d76
> +        .quad 0x3fefffffffffffbe, 0x3d37258610b3b233
> +        .quad 0x3fefffffffffffc3, 0x3d353d3bfc82a909
> +        .quad 0x3fefffffffffffc8, 0x3d337c92babdc2fd
> +        .quad 0x3fefffffffffffcd, 0x3d31e06010120f6a
> +        .quad 0x3fefffffffffffd1, 0x3d3065b9616170d4
> +        .quad 0x3fefffffffffffd5, 0x3d2e13dd96b3753b
> +        .quad 0x3fefffffffffffd9, 0x3d2b950d32467392
> +        .quad 0x3fefffffffffffdc, 0x3d294a72263259a5
> +        .quad 0x3fefffffffffffdf, 0x3d272fd93e036cdc
> +        .quad 0x3fefffffffffffe2, 0x3d254164576929ab
> +        .quad 0x3fefffffffffffe4, 0x3d237b83c521fe96
> +        .quad 0x3fefffffffffffe7, 0x3d21daf033182e96
> +        .quad 0x3fefffffffffffe9, 0x3d205ca50205d26a
> +        .quad 0x3fefffffffffffeb, 0x3d1dfbb6235639fa
> +        .quad 0x3fefffffffffffed, 0x3d1b7807e294781f
> +        .quad 0x3fefffffffffffee, 0x3d19298add70a734
> +        .quad 0x3feffffffffffff0, 0x3d170beaf9c7ffb6
> +        .quad 0x3feffffffffffff1, 0x3d151b2cd6709222
> +        .quad 0x3feffffffffffff3, 0x3d1353a6cf7f7fff
> +        .quad 0x3feffffffffffff4, 0x3d11b1fa8cbe84a7
> +        .quad 0x3feffffffffffff5, 0x3d10330f0fd69921
> +        .quad 0x3feffffffffffff6, 0x3d0da81670f96f9b
> +        .quad 0x3feffffffffffff7, 0x3d0b24a16b4d09aa
> +        .quad 0x3feffffffffffff7, 0x3d08d6eeb6efdbd6
> +        .quad 0x3feffffffffffff8, 0x3d06ba91ac734786
> +        .quad 0x3feffffffffffff9, 0x3d04cb7966770ab5
> +        .quad 0x3feffffffffffff9, 0x3d0305e9721d0981
> +        .quad 0x3feffffffffffffa, 0x3d01667311fff70a
> +        .quad 0x3feffffffffffffb, 0x3cffd3de10d62855
> +        .quad 0x3feffffffffffffb, 0x3cfd1aefbcd48d0c
> +        .quad 0x3feffffffffffffb, 0x3cfa9cc93c25aca9
> +        .quad 0x3feffffffffffffc, 0x3cf85487ee3ea735
> +        .quad 0x3feffffffffffffc, 0x3cf63daf8b4b1e0c
> +        .quad 0x3feffffffffffffd, 0x3cf45421e69a6ca1
> +        .quad 0x3feffffffffffffd, 0x3cf294175802d99a
> +        .quad 0x3feffffffffffffd, 0x3cf0fa17bf41068f
> +        .quad 0x3feffffffffffffd, 0x3cef05e82aae2bb9
> +        .quad 0x3feffffffffffffe, 0x3cec578101b29058
> +        .quad 0x3feffffffffffffe, 0x3ce9e39dc5dd2f7c
> +        .quad 0x3feffffffffffffe, 0x3ce7a553a728bbf2
> +        .quad 0x3feffffffffffffe, 0x3ce5982008db1304
> +        .quad 0x3feffffffffffffe, 0x3ce3b7e00422e51b
> +        .quad 0x3feffffffffffffe, 0x3ce200c898d9ee3e
> +        .quad 0x3fefffffffffffff, 0x3ce06f5f7eb65a56
> +        .quad 0x3fefffffffffffff, 0x3cde00e9148a1d25
> +        .quad 0x3fefffffffffffff, 0x3cdb623734024e92
> +        .quad 0x3fefffffffffffff, 0x3cd8fd4e01891bf8
> +        .quad 0x3fefffffffffffff, 0x3cd6cd44c7470d89
> +        .quad 0x3fefffffffffffff, 0x3cd4cd9c04158cd7
> +        .quad 0x3fefffffffffffff, 0x3cd2fa34bf5c8344
> +        .quad 0x3fefffffffffffff, 0x3cd14f4890ff2461
> +        .quad 0x3fefffffffffffff, 0x3ccf92c49dfa4df5
> +        .quad 0x3fefffffffffffff, 0x3ccccaaea71ab0df
> +        .quad 0x3fefffffffffffff, 0x3cca40829f001197
> +        .quad 0x3ff0000000000000, 0x3cc7eef13b59e96c
> +        .quad 0x3ff0000000000000, 0x3cc5d11e1a252bf5
> +        .quad 0x3ff0000000000000, 0x3cc3e296303b2297
> +        .quad 0x3ff0000000000000, 0x3cc21f47009f43ce
> +        .quad 0x3ff0000000000000, 0x3cc083768c5e4542
> +        .quad 0x3ff0000000000000, 0x3cbe1777d831265f
> +        .quad 0x3ff0000000000000, 0x3cbb69f10b0191b5
> +        .quad 0x3ff0000000000000, 0x3cb8f8a3a05b5b53
> +        .quad 0x3ff0000000000000, 0x3cb6be573c40c8e7
> +        .quad 0x3ff0000000000000, 0x3cb4b645ba991fdb
> +        .align 64
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff  /* _AbsMask */
> +        .align 64
> +        .quad 0x4017f80000000000, 0x4017f80000000000, 0x4017f80000000000, 0x4017f80000000000, 0x4017f80000000000, 0x4017f80000000000, 0x4017f80000000000, 0x4017f80000000000  /* _MaxThreshold = 6.0 - 1.0/128.0 */
> +        .align 64
> +        .quad 0x42c0000000000000, 0x42c0000000000000, 0x42c0000000000000, 0x42c0000000000000, 0x42c0000000000000, 0x42c0000000000000, 0x42c0000000000000, 0x42c0000000000000  /* SRound */
> +        .align 64
> +        .quad 0x2ff0000000000000, 0x2ff0000000000000, 0x2ff0000000000000, 0x2ff0000000000000, 0x2ff0000000000000, 0x2ff0000000000000, 0x2ff0000000000000, 0x2ff0000000000000  /* _U2Threshold */
> +        .align 64
> +        .quad 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5, 0xbfa6c16db05bdea5  /* _poly_1_0 */
> +        .align 64
> +        .quad 0x3fc1111235a363b1, 0x3fc1111235a363b1, 0x3fc1111235a363b1, 0x3fc1111235a363b1, 0x3fc1111235a363b1, 0x3fc1111235a363b1, 0x3fc1111235a363b1, 0x3fc1111235a363b1  /* _poly_1_1 */
> +        .align 64
> +        .quad 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57, 0x3fcc71ca1c71eb57  /* _poly_3_0 */
> +        .align 64
> +        .quad 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8, 0xbfd9999c2be2dda8  /* _poly_3_1 */
> +        .align 64
> +        .quad 0xbfc5555800001B4F, 0xbfc5555800001B4F, 0xbfc5555800001B4F, 0xbfc5555800001B4F, 0xbfc5555800001B4F, 0xbfc5555800001B4F, 0xbfc5555800001B4F, 0xbfc5555800001B4F  /* _poly_5_0 */
> +        .align 64
> +        .quad 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122, 0x3fb9999E2BE2F122  /* _poly_5_1 */
> +        .align 64
> +        .quad 0xbfd55555555547f6, 0xbfd55555555547f6, 0xbfd55555555547f6, 0xbfd55555555547f6, 0xbfd55555555547f6, 0xbfd55555555547f6, 0xbfd55555555547f6, 0xbfd55555555547f6  /* _poly_1_2 */
> +        .align 64
> +        .quad 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd, 0x3fdfffffffffd4cd  /* _poly_3_2 */
> +        .align 64
> +        .quad 0x3fe5555555554b0c, 0x3fe5555555554b0c, 0x3fe5555555554b0c, 0x3fe5555555554b0c, 0x3fe5555555554b0c, 0x3fe5555555554b0c, 0x3fe5555555554b0c, 0x3fe5555555554b0c  /* _poly_1_3 */
> +        .align 64
> +        .quad 0xbfd5555555555555, 0xbfd5555555555555, 0xbfd5555555555555, 0xbfd5555555555555, 0xbfd5555555555555, 0xbfd5555555555555, 0xbfd5555555555555, 0xbfd5555555555555  /* _poly_3_3 */
> +        .align 64
> +        .quad 0x00000000ffffffff, 0x00000000ffffffff, 0x00000000ffffffff, 0x00000000ffffffff, 0x00000000ffffffff, 0x00000000ffffffff, 0x00000000ffffffff, 0x00000000ffffffff  /* _Mask32 */
> +        .align 64
> +        .type	__svml_derf_data_internal,@object
> +        .size	__svml_derf_data_internal,.-__svml_derf_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core-avx2.S
> new file mode 100644
> index 0000000000..852a247f83
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized erff.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN16v_erff _ZGVeN16v_erff_avx2_wrapper
> +#include "../svml_s_erff16_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core.c
> new file mode 100644
> index 0000000000..5714eaf023
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized erff, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN16v_erff
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN16v_erff, __GI__ZGVeN16v_erff,
> +	       __redirect__ZGVeN16v_erff)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
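
A usage note, not part of the patch: callers never reference
_ZGVeN16v_erff_skx directly.  The compiler emits the vector-ABI name
_ZGVeN16v_erff when it vectorizes erff calls, and the ifunc above resolves
that name to the SKX kernel at run time.  Assuming GCC with the glibc SIMD
declarations in effect (typically -ffast-math together with -fopenmp-simd
and an AVX-512 -march; the exact flags depend on the toolchain), a loop
like the sketch below can end up here, 16 floats per call.

    #include <math.h>

    /* May be auto-vectorized into calls to _ZGVeN16v_erff when the SIMD
       declarations from math.h are enabled; otherwise it stays a plain
       scalar loop.  */
    void
    vec_erff (const float *restrict in, float *restrict out, int n)
    {
    #pragma omp simd
      for (int i = 0; i < n; i++)
        out[i] = erff (in[i]);
    }
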
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core_avx512.S
> new file mode 100644
> index 0000000000..5cdc8a77f7
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_erff16_core_avx512.S
> @@ -0,0 +1,185 @@
> +/* Function erff vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   erf(x) is computed as higher precision simple polynomial
> + *   with no lookup table:
> + *
> + *     R = P0 + x^2*(P1 + x^2*(P2 + .... x^2*P12));
> + *     erf(x) = R * R * x;
> + *
> + *   Special cases:
> + *
> + *   erf(0)    = 0
> + *   erf(+INF) = +1
> + *   erf(-INF) = -1
> + *   erf(QNaN) = QNaN
> + *   erf(SNaN) = QNaN
> + *
> + */
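
For readers who prefer C, here is a rough scalar sketch of the scheme
described above; it is not part of the patch.  The coefficients are taken
as a parameter instead of being copied out of the data table, the
threshold value is only illustrative, and the evaluation is widened to
double just as the vector code below does with vcvtps2pd.

    #include <math.h>

    /* Illustrative stand-in for _gf_MaxThreshold_LA, which the kernel
       compares against x*x.  */
    #define MAX_T 13.35f

    /* P must hold the 13 polynomial coefficients, i.e. the values stored
       as _gf_la_poly_0 .. _gf_la_poly_12 in the data table further down.  */
    static float
    erff_poly_sketch (float x, const double P[13])
    {
      float x2 = x * x;
      /* NaN fails this compare and takes the polynomial path, so
         erff(NaN) = NaN as listed in the special cases above.  */
      if (x2 >= MAX_T)
        return copysignf (1.0f, x);        /* erff saturates to +-1 */
      double t = (double) x * (double) x;  /* widen, as vcvtps2pd does */
      double r = P[12];
      for (int j = 11; j >= 0; j--)        /* R = P0 + t*(P1 + ... + t*P12) */
        r = r * t + P[j];
      return (float) (r * r * (double) x); /* erf(x) = R * R * x */
    }

Doing the whole evaluation in double and rounding once at the end is what
lets a single polynomial of this degree cover the float range with no
lookup table, which is presumably the point of the "higher precision"
wording above.
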
> +
> +/* Offsets for data table __svml_serf_data_internal
> + */
> +#define _AbsMask                      	0
> +#define _One                          	64
> +#define _gf_MaxThreshold_LA           	128
> +#define _gf_la_poly_0                 	192
> +#define _gf_la_poly_1                 	256
> +#define _gf_la_poly_2                 	320
> +#define _gf_la_poly_3                 	384
> +#define _gf_la_poly_4                 	448
> +#define _gf_la_poly_5                 	512
> +#define _gf_la_poly_6                 	576
> +#define _gf_la_poly_7                 	640
> +#define _gf_la_poly_8                 	704
> +#define _gf_la_poly_9                 	768
> +#define _gf_la_poly_10                	832
> +#define _gf_la_poly_11                	896
> +#define _gf_la_poly_12                	960
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN16v_erff_skx)
> +        vmovaps   %zmm0, %zmm8
> +        vmulps    {rn-sae}, %zmm8, %zmm8, %zmm11
> +        vmovups   _gf_la_poly_11+__svml_serf_data_internal(%rip), %zmm15
> +        vmovups   _gf_la_poly_12+__svml_serf_data_internal(%rip), %zmm10
> +        vmovups   _gf_la_poly_10+__svml_serf_data_internal(%rip), %zmm9
> +        vmovups   _gf_la_poly_9+__svml_serf_data_internal(%rip), %zmm7
> +        vmovups   _gf_la_poly_8+__svml_serf_data_internal(%rip), %zmm0
> +        vmovups   _gf_la_poly_7+__svml_serf_data_internal(%rip), %zmm1
> +        vmovups   _gf_la_poly_6+__svml_serf_data_internal(%rip), %zmm2
> +        vmovups   _gf_la_poly_5+__svml_serf_data_internal(%rip), %zmm3
> +        vmovups   _gf_la_poly_4+__svml_serf_data_internal(%rip), %zmm4
> +        vmovups   _gf_la_poly_3+__svml_serf_data_internal(%rip), %zmm5
> +        vmovups   _gf_la_poly_2+__svml_serf_data_internal(%rip), %zmm6
> +        vextractf32x8 $1, %zmm8, %ymm13
> +        vcvtps2pd {sae}, %ymm8, %zmm12
> +        vcvtps2pd {sae}, %ymm13, %zmm14
> +        vmulpd    {rn-sae}, %zmm12, %zmm12, %zmm12
> +        vmulpd    {rn-sae}, %zmm14, %zmm14, %zmm13
> +
> +/* R = P0 + x^2*(P1 + x^2*(P2 + .... x^2*P12)); */
> +        vmovaps   %zmm15, %zmm14
> +        vfmadd231pd {rn-sae}, %zmm12, %zmm10, %zmm14
> +        vfmadd231pd {rn-sae}, %zmm13, %zmm10, %zmm15
> +        vmovups   _gf_la_poly_1+__svml_serf_data_internal(%rip), %zmm10
> +        vfmadd213pd {rn-sae}, %zmm9, %zmm12, %zmm14
> +        vfmadd231pd {rn-sae}, %zmm13, %zmm15, %zmm9
> +        vfmadd213pd {rn-sae}, %zmm7, %zmm12, %zmm14
> +        vfmadd231pd {rn-sae}, %zmm13, %zmm9, %zmm7
> +        vfmadd213pd {rn-sae}, %zmm0, %zmm12, %zmm14
> +        vfmadd231pd {rn-sae}, %zmm13, %zmm7, %zmm0
> +        vmovups   _gf_MaxThreshold_LA+__svml_serf_data_internal(%rip), %zmm7
> +        vfmadd213pd {rn-sae}, %zmm1, %zmm12, %zmm14
> +        vfmadd231pd {rn-sae}, %zmm13, %zmm0, %zmm1
> +        vmovups   _gf_la_poly_0+__svml_serf_data_internal(%rip), %zmm0
> +        vcmpps    $22, {sae}, %zmm11, %zmm7, %k1
> +        vfmadd213pd {rn-sae}, %zmm2, %zmm12, %zmm14
> +        vfmadd231pd {rn-sae}, %zmm13, %zmm1, %zmm2
> +        vfmadd213pd {rn-sae}, %zmm3, %zmm12, %zmm14
> +        vfmadd231pd {rn-sae}, %zmm13, %zmm2, %zmm3
> +        vfmadd213pd {rn-sae}, %zmm4, %zmm12, %zmm14
> +        vfmadd231pd {rn-sae}, %zmm13, %zmm3, %zmm4
> +        vfmadd213pd {rn-sae}, %zmm5, %zmm12, %zmm14
> +        vfmadd231pd {rn-sae}, %zmm13, %zmm4, %zmm5
> +        vfmadd213pd {rn-sae}, %zmm6, %zmm12, %zmm14
> +        vfmadd231pd {rn-sae}, %zmm13, %zmm5, %zmm6
> +        vmovups   _AbsMask+__svml_serf_data_internal(%rip), %zmm5
> +        vfmadd213pd {rn-sae}, %zmm10, %zmm12, %zmm14
> +        vfmadd231pd {rn-sae}, %zmm13, %zmm6, %zmm10
> +        vandnps   %zmm8, %zmm5, %zmm6
> +        vfmadd213pd {rn-sae}, %zmm0, %zmm14, %zmm12
> +        vfmadd213pd {rn-sae}, %zmm0, %zmm10, %zmm13
> +        vorps     _One+__svml_serf_data_internal(%rip), %zmm6, %zmm0
> +        vmulpd    {rn-sae}, %zmm12, %zmm12, %zmm1
> +        vmulpd    {rn-sae}, %zmm13, %zmm13, %zmm3
> +        vcvtpd2ps {rn-sae}, %zmm1, %ymm2
> +        vcvtpd2ps {rn-sae}, %zmm3, %ymm4
> +        vinsertf32x8 $1, %ymm4, %zmm2, %zmm9
> +
> +/* erf(x) = R * R * x; */
> +        vmulps    {rn-sae}, %zmm8, %zmm9, %zmm0{%k1}
> +        ret
> +
> +END(_ZGVeN16v_erff_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_serf_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(64)) VUINT32 _AbsMask[16][1];
> +        __declspec(align(64)) VUINT32 _One[16][1];
> +        __declspec(align(64)) VUINT32 _gf_MaxThreshold_LA[16][1];
> +        __declspec(align(64)) VUINT32 _gf_la_poly_0[8][2];
> +        __declspec(align(64)) VUINT32 _gf_la_poly_1[8][2];
> +        __declspec(align(64)) VUINT32 _gf_la_poly_2[8][2];
> +        __declspec(align(64)) VUINT32 _gf_la_poly_3[8][2];
> +        __declspec(align(64)) VUINT32 _gf_la_poly_4[8][2];
> +        __declspec(align(64)) VUINT32 _gf_la_poly_5[8][2];
> +        __declspec(align(64)) VUINT32 _gf_la_poly_6[8][2];
> +        __declspec(align(64)) VUINT32 _gf_la_poly_7[8][2];
> +        __declspec(align(64)) VUINT32 _gf_la_poly_8[8][2];
> +        __declspec(align(64)) VUINT32 _gf_la_poly_9[8][2];
> +        __declspec(align(64)) VUINT32 _gf_la_poly_10[8][2];
> +        __declspec(align(64)) VUINT32 _gf_la_poly_11[8][2];
> +        __declspec(align(64)) VUINT32 _gf_la_poly_12[8][2];
> +} __svml_serf_data_internal;
> +#endif
> +__svml_serf_data_internal:
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff  /* _AbsMask */
> +        .align 64
> +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000  /* _One */
> +        .align 64
> +        .long 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a, 0x41558c5a          /* _gf_MaxThreshold_LA */
> +        .align 64
> +        .quad 0x3ff0fefbd933b903, 0x3ff0fefbd933b903, 0x3ff0fefbd933b903, 0x3ff0fefbd933b903, 0x3ff0fefbd933b903, 0x3ff0fefbd933b903, 0x3ff0fefbd933b903, 0x3ff0fefbd933b903  /* _gf_la_poly_0 */
> +        .align 64
> +        .quad 0xbfc6a948101e6367, 0xbfc6a948101e6367, 0xbfc6a948101e6367, 0xbfc6a948101e6367, 0xbfc6a948101e6367, 0xbfc6a948101e6367, 0xbfc6a948101e6367, 0xbfc6a948101e6367  /* _gf_la_poly_1 */
> +        .align 64
> +        .quad 0x3fa3a334ce602c6b, 0x3fa3a334ce602c6b, 0x3fa3a334ce602c6b, 0x3fa3a334ce602c6b, 0x3fa3a334ce602c6b, 0x3fa3a334ce602c6b, 0x3fa3a334ce602c6b, 0x3fa3a334ce602c6b  /* _gf_la_poly_2 */
> +        .align 64
> +        .quad 0xbf799309ea0c81dc, 0xbf799309ea0c81dc, 0xbf799309ea0c81dc, 0xbf799309ea0c81dc, 0xbf799309ea0c81dc, 0xbf799309ea0c81dc, 0xbf799309ea0c81dc, 0xbf799309ea0c81dc  /* _gf_la_poly_3 */
> +        .align 64
> +        .quad 0x3f476df64a40e392, 0x3f476df64a40e392, 0x3f476df64a40e392, 0x3f476df64a40e392, 0x3f476df64a40e392, 0x3f476df64a40e392, 0x3f476df64a40e392, 0x3f476df64a40e392  /* _gf_la_poly_4 */
> +        .align 64
> +        .quad 0xbf0a5216b9508ede, 0xbf0a5216b9508ede, 0xbf0a5216b9508ede, 0xbf0a5216b9508ede, 0xbf0a5216b9508ede, 0xbf0a5216b9508ede, 0xbf0a5216b9508ede, 0xbf0a5216b9508ede  /* _gf_la_poly_5 */
> +        .align 64
> +        .quad 0x3ea5794b95c8e8a0, 0x3ea5794b95c8e8a0, 0x3ea5794b95c8e8a0, 0x3ea5794b95c8e8a0, 0x3ea5794b95c8e8a0, 0x3ea5794b95c8e8a0, 0x3ea5794b95c8e8a0, 0x3ea5794b95c8e8a0  /* _gf_la_poly_6 */
> +        .align 64
> +        .quad 0x3e94b6c0b485f30f, 0x3e94b6c0b485f30f, 0x3e94b6c0b485f30f, 0x3e94b6c0b485f30f, 0x3e94b6c0b485f30f, 0x3e94b6c0b485f30f, 0x3e94b6c0b485f30f, 0x3e94b6c0b485f30f  /* _gf_la_poly_7 */
> +        .align 64
> +        .quad 0xbe65806ce17f0523, 0xbe65806ce17f0523, 0xbe65806ce17f0523, 0xbe65806ce17f0523, 0xbe65806ce17f0523, 0xbe65806ce17f0523, 0xbe65806ce17f0523, 0xbe65806ce17f0523  /* _gf_la_poly_8 */
> +        .align 64
> +        .quad 0x3e2715640470db47, 0x3e2715640470db47, 0x3e2715640470db47, 0x3e2715640470db47, 0x3e2715640470db47, 0x3e2715640470db47, 0x3e2715640470db47, 0x3e2715640470db47  /* _gf_la_poly_9 */
> +        .align 64
> +        .quad 0xbdddcb2653d80f03, 0xbdddcb2653d80f03, 0xbdddcb2653d80f03, 0xbdddcb2653d80f03, 0xbdddcb2653d80f03, 0xbdddcb2653d80f03, 0xbdddcb2653d80f03, 0xbdddcb2653d80f03  /* _gf_la_poly_10 */
> +        .align 64
> +        .quad 0x3d85eadfc762d3eb, 0x3d85eadfc762d3eb, 0x3d85eadfc762d3eb, 0x3d85eadfc762d3eb, 0x3d85eadfc762d3eb, 0x3d85eadfc762d3eb, 0x3d85eadfc762d3eb, 0x3d85eadfc762d3eb  /* _gf_la_poly_11 */
> +        .align 64
> +        .quad 0xbd1c668a2871f0f1, 0xbd1c668a2871f0f1, 0xbd1c668a2871f0f1, 0xbd1c668a2871f0f1, 0xbd1c668a2871f0f1, 0xbd1c668a2871f0f1, 0xbd1c668a2871f0f1, 0xbd1c668a2871f0f1  /* _gf_la_poly_12 */
> +        .align 64
> +        .type	__svml_serf_data_internal,@object
> +        .size	__svml_serf_data_internal,.-__svml_serf_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core-sse2.S
> new file mode 100644
> index 0000000000..651fd267a5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized erff, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN4v_erff _ZGVbN4v_erff_sse2
> +#include "../svml_s_erff4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core.c
> new file mode 100644
> index 0000000000..02286a68c6
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized erff, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN4v_erff
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN4v_erff, __GI__ZGVbN4v_erff,
> +	       __redirect__ZGVbN4v_erff)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core_sse4.S
> new file mode 100644
> index 0000000000..5c052f5921
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_erff4_core_sse4.S
> @@ -0,0 +1,664 @@
> +/* Function erff vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Basic formula is
> + *    erf(x) ~ erf(x0) +
> + *              + exp(-x0*x0)*D*(1+c0+T*P1(T)+D^2*P3(T)+D^4*p5)
> + *   where D=x-x0, T=x0*D
> + *   x0 is x rounded to a specified number of fractional bits (in this case 8),
> + *    except that x0=0 for |x|<3.5/256.0 (using x0=0 for first 4 table entries)
> + *
> + *   Data table packs both erf(x0)_high and a few bits of erf(x0)_low in one
> + *   entry (in place of redundant exponent bits)
> + *
> + */
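
The formula above is more general than what the SSE4 body below actually
computes: the float path reduces the correction factor to 1 - T + c3*D^2.
A scalar C restatement, not part of the patch, follows; the grid step and
c3 are written as assumed constants, and the rounding and indexing use
nearbyintf instead of the _SRound shifter plus Table_Lookup_Bias used by
the kernel.

    #include <math.h>

    /* One packed table entry per grid step of x0:
       { erf(x0), exp(-x0*x0)*2/sqrt(pi) }, i.e. value and slope at x0.  */
    struct erf_entry { float erf_h, exp_h; };

    /* Per the note in the kernel, erf(x) rounds to 1.0 beyond this.  */
    #define MAX_X 3.9375f
    /* x0 grid step; the packed table further down is laid out in 1/128
       steps (its second entry is erf(1/128)).  */
    #define STEP  0x1p-7f
    /* Assumed stand-in for the _poly3_0 value.  */
    #define C3    0.222f

    static float
    erff_tbl_sketch (float x, const struct erf_entry *tbl)
    {
      if (isnan (x))                     /* the kernel's "NaN fixup" step */
        return x + x;
      float ax = fminf (fabsf (x), MAX_X);
      int   i  = (int) nearbyintf (ax / STEP);
      float x0 = (float) i * STEP;
      float d  = ax - x0;                /* D = x - x0 */
      float poly = C3 * d * d - x0 * d;  /* c3*D^2 - T, with T = x0*D; the
                                            kernel also zeroes the D^2 term
                                            below _U2Threshold */
      float r = tbl[i].erf_h + tbl[i].exp_h * (d + d * poly);
      return copysignf (r, x);           /* restore the sign of x */
    }

Because the second float of each entry is erf'(x0) = exp(-x0*x0)*2/sqrt(pi),
the update amounts to a first-order Taylor step from the nearest breakpoint
plus a short polynomial correction in D.
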
> +
> +/* Offsets for data table __svml_serf_data_internal
> + */
> +#define _erf_tbl                      	0
> +#define _AbsMask                      	4032
> +#define _MaxThreshold                 	4048
> +#define _SRound                       	4064
> +#define _U2Threshold                  	4080
> +#define _poly3_0                      	4096
> +
> +/* Lookup bias for data table __svml_serf_data_internal.  */
> +#define Table_Lookup_Bias               -0x3c000000
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN4v_erff_sse4)
> +        lea       Table_Lookup_Bias+__svml_serf_data_internal(%rip), %rdi
> +        movups    _AbsMask+__svml_serf_data_internal(%rip), %xmm9
> +        andps     %xmm0, %xmm9
> +
> +/*
> + * erf(x) rounds to 1.0 for x>_MaxThreshold (3.9375)
> + * can compute all results in the main path
> + */
> +        movaps    %xmm9, %xmm12
> +
> +/* save sign */
> +        pxor      %xmm9, %xmm0
> +        minps     _MaxThreshold+__svml_serf_data_internal(%rip), %xmm12
> +
> +/*
> + * vector gather:
> + * erf(x0), exp(-x0*x0)*2.0/sqrt(pi)
> + */
> +        movups    _SRound+__svml_serf_data_internal(%rip), %xmm1
> +        movaps    %xmm1, %xmm4
> +        movups    _U2Threshold+__svml_serf_data_internal(%rip), %xmm11
> +        addps     %xmm12, %xmm4
> +        cmpltps   %xmm12, %xmm11
> +        movaps    %xmm4, %xmm10
> +        pslld     $3, %xmm4
> +        pshufd    $1, %xmm4, %xmm2
> +        subps     %xmm1, %xmm10
> +        movd      %xmm4, %eax
> +        movd      %xmm2, %edx
> +        pshufd    $2, %xmm4, %xmm3
> +        subps     %xmm10, %xmm12
> +        movd      %xmm3, %ecx
> +        andps     %xmm12, %xmm11
> +
> +/* D2 = Diff^2 */
> +        mulps     %xmm11, %xmm11
> +        mulps     %xmm12, %xmm10
> +
> +/* NaN fixup */
> +        minps     %xmm9, %xmm12
> +
> +/*
> + * Start polynomial evaluation
> + * P1
> + */
> +        mulps     _poly3_0+__svml_serf_data_internal(%rip), %xmm11
> +        pshufd    $3, %xmm4, %xmm5
> +        subps     %xmm10, %xmm11
> +        movd      %xmm5, %esi
> +
> +/*
> + * branch-free
> + * (exp_h(x0) * Diff) * (poly + 1.0)
> + */
> +        mulps     %xmm12, %xmm11
> +        movslq    %eax, %rax
> +        addps     %xmm11, %xmm12
> +        movslq    %edx, %rdx
> +        movslq    %ecx, %rcx
> +        movslq    %esi, %rsi
> +        movq      (%rdi,%rax), %xmm13
> +        movq      (%rdi,%rdx), %xmm6
> +        movq      (%rdi,%rcx), %xmm8
> +        movq      (%rdi,%rsi), %xmm7
> +        unpcklps  %xmm6, %xmm13
> +        unpcklps  %xmm7, %xmm8
> +        movaps    %xmm13, %xmm14
> +        shufps    $238, %xmm8, %xmm13
> +
> +/* Final result */
> +        mulps     %xmm12, %xmm13
> +        movlhps   %xmm8, %xmm14
> +        addps     %xmm13, %xmm14
> +
> +/* set sign */
> +        orps      %xmm14, %xmm0
> +        ret
> +
> +END(_ZGVbN4v_erff_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_serf_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(16)) VUINT32 _erf_tbl[1008][1];
> +        __declspec(align(16)) VUINT32 _AbsMask[4][1];
> +        __declspec(align(16)) VUINT32 _MaxThreshold[4][1];
> +        __declspec(align(16)) VUINT32 _SRound[4][1];
> +        __declspec(align(16)) VUINT32 _U2Threshold[4][1];
> +        __declspec(align(16)) VUINT32 _poly3_0[4][1];
> +} __svml_serf_data_internal;
> +#endif
> +__svml_serf_data_internal:
> +        /*== _erf_tbl ==*/
> +        .long 0x00000000, 0x3f906ebb
> +        .long 0x3c106dfa, 0x3f906c79
> +        .long 0x3c906bb8, 0x3f9065b4
> +        .long 0x3cd89bf0, 0x3f905a6c
> +        .long 0x3d1062b2, 0x3f904aa3
> +        .long 0x3d3472ea, 0x3f90365a
> +        .long 0x3d587d7f, 0x3f901d93
> +        .long 0x3d7c8154, 0x3f900050
> +        .long 0x3d903ea4, 0x3f8fde94
> +        .long 0x3da2381f, 0x3f8fb862
> +        .long 0x3db42c8d, 0x3f8f8dbd
> +        .long 0x3dc61b5f, 0x3f8f5eab
> +        .long 0x3dd80409, 0x3f8f2b2e
> +        .long 0x3de9e5fc, 0x3f8ef34c
> +        .long 0x3dfbc0ad, 0x3f8eb70a
> +        .long 0x3e06c9c8, 0x3f8e766e
> +        .long 0x3e0faf0d, 0x3f8e317d
> +        .long 0x3e188fe1, 0x3f8de83e
> +        .long 0x3e216bfe, 0x3f8d9ab9
> +        .long 0x3e2a4321, 0x3f8d48f3
> +        .long 0x3e331506, 0x3f8cf2f5
> +        .long 0x3e3be169, 0x3f8c98c6
> +        .long 0x3e44a808, 0x3f8c3a6f
> +        .long 0x3e4d68a1, 0x3f8bd7f8
> +        .long 0x3e5622f2, 0x3f8b716c
> +        .long 0x3e5ed6b9, 0x3f8b06d2
> +        .long 0x3e6783b7, 0x3f8a9834
> +        .long 0x3e7029aa, 0x3f8a259e
> +        .long 0x3e78c855, 0x3f89af18
> +        .long 0x3e80afbc, 0x3f8934af
> +        .long 0x3e84f76b, 0x3f88b66c
> +        .long 0x3e893b19, 0x3f88345d
> +        .long 0x3e8d7aa7, 0x3f87ae8b
> +        .long 0x3e91b5f8, 0x3f872504
> +        .long 0x3e95ecee, 0x3f8697d3
> +        .long 0x3e9a1f6b, 0x3f860705
> +        .long 0x3e9e4d54, 0x3f8572a8
> +        .long 0x3ea2768c, 0x3f84dac8
> +        .long 0x3ea69af8, 0x3f843f72
> +        .long 0x3eaaba7a, 0x3f83a0b6
> +        .long 0x3eaed4fa, 0x3f82fe9f
> +        .long 0x3eb2ea5c, 0x3f82593e
> +        .long 0x3eb6fa85, 0x3f81b0a0
> +        .long 0x3ebb055d, 0x3f8104d3
> +        .long 0x3ebf0aca, 0x3f8055e8
> +        .long 0x3ec30ab3, 0x3f7f47d8
> +        .long 0x3ec70501, 0x3f7ddddf
> +        .long 0x3ecaf99b, 0x3f7c6e05
> +        .long 0x3ecee869, 0x3f7af867
> +        .long 0x3ed2d156, 0x3f797d26
> +        .long 0x3ed6b44b, 0x3f77fc62
> +        .long 0x3eda9132, 0x3f76763c
> +        .long 0x3ede67f6, 0x3f74ead4
> +        .long 0x3ee23882, 0x3f735a4c
> +        .long 0x3ee602c2, 0x3f71c4c4
> +        .long 0x3ee9c6a2, 0x3f702a5f
> +        .long 0x3eed840e, 0x3f6e8b3e
> +        .long 0x3ef13af5, 0x3f6ce783
> +        .long 0x3ef4eb45, 0x3f6b3f51
> +        .long 0x3ef894ea, 0x3f6992c9
> +        .long 0x3efc37d5, 0x3f67e20f
> +        .long 0x3effd3f5, 0x3f662d45
> +        .long 0x3f01b49d, 0x3f64748e
> +        .long 0x3f037bca, 0x3f62b80d
> +        .long 0x3f053f7b, 0x3f60f7e5
> +        .long 0x3f06ffa8, 0x3f5f3439
> +        .long 0x3f08bc4a, 0x3f5d6d2d
> +        .long 0x3f0a755a, 0x3f5ba2e3
> +        .long 0x3f0c2ad3, 0x3f59d57e
> +        .long 0x3f0ddcae, 0x3f580523
> +        .long 0x3f0f8ae6, 0x3f5631f4
> +        .long 0x3f113574, 0x3f545c14
> +        .long 0x3f12dc54, 0x3f5283a7
> +        .long 0x3f147f81, 0x3f50a8cf
> +        .long 0x3f161ef6, 0x3f4ecbb1
> +        .long 0x3f17baae, 0x3f4cec6d
> +        .long 0x3f1952a6, 0x3f4b0b28
> +        .long 0x3f1ae6da, 0x3f492804
> +        .long 0x3f1c7745, 0x3f474323
> +        .long 0x3f1e03e5, 0x3f455ca8
> +        .long 0x3f1f8cb7, 0x3f4374b5
> +        .long 0x3f2111b7, 0x3f418b6b
> +        .long 0x3f2292e4, 0x3f3fa0ee
> +        .long 0x3f24103a, 0x3f3db55e
> +        .long 0x3f2589b9, 0x3f3bc8dc
> +        .long 0x3f26ff5d, 0x3f39db8a
> +        .long 0x3f287126, 0x3f37ed89
> +        .long 0x3f29df13, 0x3f35fef8
> +        .long 0x3f2b4922, 0x3f340ff9
> +        .long 0x3f2caf53, 0x3f3220ab
> +        .long 0x3f2e11a4, 0x3f30312e
> +        .long 0x3f2f7017, 0x3f2e41a1
> +        .long 0x3f30caab, 0x3f2c5223
> +        .long 0x3f322160, 0x3f2a62d3
> +        .long 0x3f337437, 0x3f2873cf
> +        .long 0x3f34c32f, 0x3f268534
> +        .long 0x3f360e4c, 0x3f249721
> +        .long 0x3f37558c, 0x3f22a9b3
> +        .long 0x3f3898f3, 0x3f20bd06
> +        .long 0x3f39d881, 0x3f1ed137
> +        .long 0x3f3b1438, 0x3f1ce661
> +        .long 0x3f3c4c1b, 0x3f1afca0
> +        .long 0x3f3d802c, 0x3f19140f
> +        .long 0x3f3eb06c, 0x3f172cc9
> +        .long 0x3f3fdce0, 0x3f1546e7
> +        .long 0x3f410589, 0x3f136284
> +        .long 0x3f422a6b, 0x3f117fb9
> +        .long 0x3f434b89, 0x3f0f9e9e
> +        .long 0x3f4468e7, 0x3f0dbf4c
> +        .long 0x3f458287, 0x3f0be1db
> +        .long 0x3f46986f, 0x3f0a0662
> +        .long 0x3f47aaa2, 0x3f082cf7
> +        .long 0x3f48b925, 0x3f0655b1
> +        .long 0x3f49c3fb, 0x3f0480a6
> +        .long 0x3f4acb29, 0x3f02adeb
> +        .long 0x3f4bceb4, 0x3f00dd96
> +        .long 0x3f4ccea1, 0x3efe1f73
> +        .long 0x3f4dcaf4, 0x3efa88d5
> +        .long 0x3f4ec3b4, 0x3ef6f777
> +        .long 0x3f4fb8e5, 0x3ef36b80
> +        .long 0x3f50aa8d, 0x3eefe513
> +        .long 0x3f5198b1, 0x3eec6455
> +        .long 0x3f528358, 0x3ee8e968
> +        .long 0x3f536a86, 0x3ee5746d
> +        .long 0x3f544e43, 0x3ee20584
> +        .long 0x3f552e93, 0x3ede9ccc
> +        .long 0x3f560b7e, 0x3edb3a64
> +        .long 0x3f56e50a, 0x3ed7de6a
> +        .long 0x3f57bb3d, 0x3ed488f8
> +        .long 0x3f588e1e, 0x3ed13a2b
> +        .long 0x3f595db4, 0x3ecdf21c
> +        .long 0x3f5a2a05, 0x3ecab0e4
> +        .long 0x3f5af318, 0x3ec7769b
> +        .long 0x3f5bb8f4, 0x3ec44359
> +        .long 0x3f5c7ba1, 0x3ec11733
> +        .long 0x3f5d3b25, 0x3ebdf23d
> +        .long 0x3f5df788, 0x3ebad48d
> +        .long 0x3f5eb0d1, 0x3eb7be35
> +        .long 0x3f5f6707, 0x3eb4af46
> +        .long 0x3f601a32, 0x3eb1a7d3
> +        .long 0x3f60ca59, 0x3eaea7ea
> +        .long 0x3f617784, 0x3eabaf9a
> +        .long 0x3f6221bb, 0x3ea8bef3
> +        .long 0x3f62c905, 0x3ea5d600
> +        .long 0x3f636d69, 0x3ea2f4ce
> +        .long 0x3f640ef1, 0x3ea01b68
> +        .long 0x3f64ada3, 0x3e9d49d9
> +        .long 0x3f654987, 0x3e9a8029
> +        .long 0x3f65e2a6, 0x3e97be62
> +        .long 0x3f667906, 0x3e95048b
> +        .long 0x3f670cb1, 0x3e9252aa
> +        .long 0x3f679dae, 0x3e8fa8c5
> +        .long 0x3f682c06, 0x3e8d06e3
> +        .long 0x3f68b7bf, 0x3e8a6d05
> +        .long 0x3f6940e2, 0x3e87db31
> +        .long 0x3f69c778, 0x3e855168
> +        .long 0x3f6a4b88, 0x3e82cfad
> +        .long 0x3f6acd1a, 0x3e805600
> +        .long 0x3f6b4c36, 0x3e7bc8c2
> +        .long 0x3f6bc8e5, 0x3e76f5a0
> +        .long 0x3f6c432f, 0x3e723298
> +        .long 0x3f6cbb1b, 0x3e6d7fa5
> +        .long 0x3f6d30b1, 0x3e68dcc1
> +        .long 0x3f6da3fa, 0x3e6449e7
> +        .long 0x3f6e14fe, 0x3e5fc70e
> +        .long 0x3f6e83c4, 0x3e5b542b
> +        .long 0x3f6ef055, 0x3e56f136
> +        .long 0x3f6f5ab8, 0x3e529e21
> +        .long 0x3f6fc2f5, 0x3e4e5adf
> +        .long 0x3f702915, 0x3e4a2761
> +        .long 0x3f708d1f, 0x3e460399
> +        .long 0x3f70ef1b, 0x3e41ef75
> +        .long 0x3f714f11, 0x3e3deae4
> +        .long 0x3f71ad09, 0x3e39f5d2
> +        .long 0x3f72090a, 0x3e36102b
> +        .long 0x3f72631c, 0x3e3239db
> +        .long 0x3f72bb46, 0x3e2e72cb
> +        .long 0x3f731191, 0x3e2abae4
> +        .long 0x3f736604, 0x3e27120f
> +        .long 0x3f73b8a5, 0x3e237833
> +        .long 0x3f74097e, 0x3e1fed36
> +        .long 0x3f745895, 0x3e1c70fd
> +        .long 0x3f74a5f2, 0x3e19036e
> +        .long 0x3f74f19b, 0x3e15a46d
> +        .long 0x3f753b98, 0x3e1253dc
> +        .long 0x3f7583f1, 0x3e0f119f
> +        .long 0x3f75caac, 0x3e0bdd96
> +        .long 0x3f760fd1, 0x3e08b7a4
> +        .long 0x3f765366, 0x3e059fa9
> +        .long 0x3f769573, 0x3e029586
> +        .long 0x3f76d5fe, 0x3dff3230
> +        .long 0x3f77150f, 0x3df95481
> +        .long 0x3f7752ab, 0x3df391b9
> +        .long 0x3f778eda, 0x3dede995
> +        .long 0x3f77c9a2, 0x3de85bd0
> +        .long 0x3f78030a, 0x3de2e825
> +        .long 0x3f783b18, 0x3ddd8e4c
> +        .long 0x3f7871d3, 0x3dd84dfe
> +        .long 0x3f78a741, 0x3dd326f3
> +        .long 0x3f78db68, 0x3dce18e3
> +        .long 0x3f790e50, 0x3dc92385
> +        .long 0x3f793ffc, 0x3dc4468f
> +        .long 0x3f797075, 0x3dbf81b6
> +        .long 0x3f799fbf, 0x3dbad4b0
> +        .long 0x3f79cde1, 0x3db63f32
> +        .long 0x3f79fae1, 0x3db1c0f1
> +        .long 0x3f7a26c4, 0x3dad59a1
> +        .long 0x3f7a518f, 0x3da908f6
> +        .long 0x3f7a7b4a, 0x3da4cea4
> +        .long 0x3f7aa3f9, 0x3da0aa5e
> +        .long 0x3f7acba1, 0x3d9c9bd9
> +        .long 0x3f7af248, 0x3d98a2c7
> +        .long 0x3f7b17f4, 0x3d94bedd
> +        .long 0x3f7b3ca9, 0x3d90efcd
> +        .long 0x3f7b606e, 0x3d8d354b
> +        .long 0x3f7b8346, 0x3d898f0a
> +        .long 0x3f7ba537, 0x3d85fcbf
> +        .long 0x3f7bc646, 0x3d827e1d
> +        .long 0x3f7be677, 0x3d7e25af
> +        .long 0x3f7c05d1, 0x3d777546
> +        .long 0x3f7c2456, 0x3d70ea68
> +        .long 0x3f7c420d, 0x3d6a847d
> +        .long 0x3f7c5ef9, 0x3d6442f0
> +        .long 0x3f7c7b1f, 0x3d5e252a
> +        .long 0x3f7c9684, 0x3d582a98
> +        .long 0x3f7cb12b, 0x3d5252a5
> +        .long 0x3f7ccb1a, 0x3d4c9cbd
> +        .long 0x3f7ce454, 0x3d47084e
> +        .long 0x3f7cfcdd, 0x3d4194c7
> +        .long 0x3f7d14ba, 0x3d3c4196
> +        .long 0x3f7d2bef, 0x3d370e2c
> +        .long 0x3f7d427f, 0x3d31f9fb
> +        .long 0x3f7d586f, 0x3d2d0474
> +        .long 0x3f7d6dc2, 0x3d282d0c
> +        .long 0x3f7d827b, 0x3d237336
> +        .long 0x3f7d96a0, 0x3d1ed669
> +        .long 0x3f7daa32, 0x3d1a561b
> +        .long 0x3f7dbd36, 0x3d15f1c6
> +        .long 0x3f7dcfb0, 0x3d11a8e1
> +        .long 0x3f7de1a2, 0x3d0d7ae9
> +        .long 0x3f7df30f, 0x3d09675a
> +        .long 0x3f7e03fd, 0x3d056db0
> +        .long 0x3f7e146c, 0x3d018d6b
> +        .long 0x3f7e2461, 0x3cfb8c15
> +        .long 0x3f7e33de, 0x3cf42e22
> +        .long 0x3f7e42e8, 0x3ced0003
> +        .long 0x3f7e517f, 0x3ce600c0
> +        .long 0x3f7e5fa9, 0x3cdf2f67
> +        .long 0x3f7e6d66, 0x3cd88b05
> +        .long 0x3f7e7abb, 0x3cd212ad
> +        .long 0x3f7e87aa, 0x3ccbc574
> +        .long 0x3f7e9435, 0x3cc5a273
> +        .long 0x3f7ea05f, 0x3cbfa8c4
> +        .long 0x3f7eac2b, 0x3cb9d786
> +        .long 0x3f7eb79a, 0x3cb42ddb
> +        .long 0x3f7ec2b1, 0x3caeaae6
> +        .long 0x3f7ecd71, 0x3ca94dcf
> +        .long 0x3f7ed7dc, 0x3ca415c2
> +        .long 0x3f7ee1f4, 0x3c9f01ec
> +        .long 0x3f7eebbd, 0x3c9a117f
> +        .long 0x3f7ef537, 0x3c9543ae
> +        .long 0x3f7efe66, 0x3c9097b1
> +        .long 0x3f7f074b, 0x3c8c0cc2
> +        .long 0x3f7f0fe8, 0x3c87a21f
> +        .long 0x3f7f1840, 0x3c83570a
> +        .long 0x3f7f2053, 0x3c7e558a
> +        .long 0x3f7f2826, 0x3c763931
> +        .long 0x3f7f2fb8, 0x3c6e579b
> +        .long 0x3f7f370c, 0x3c66af65
> +        .long 0x3f7f3e23, 0x3c5f3f2d
> +        .long 0x3f7f4500, 0x3c58059c
> +        .long 0x3f7f4ba4, 0x3c51015f
> +        .long 0x3f7f5211, 0x3c4a3127
> +        .long 0x3f7f5848, 0x3c4393af
> +        .long 0x3f7f5e4b, 0x3c3d27b5
> +        .long 0x3f7f641b, 0x3c36ebff
> +        .long 0x3f7f69ba, 0x3c30df57
> +        .long 0x3f7f6f29, 0x3c2b008e
> +        .long 0x3f7f746a, 0x3c254e7b
> +        .long 0x3f7f797f, 0x3c1fc7fb
> +        .long 0x3f7f7e67, 0x3c1a6bee
> +        .long 0x3f7f8326, 0x3c15393d
> +        .long 0x3f7f87bb, 0x3c102ed6
> +        .long 0x3f7f8c29, 0x3c0b4bab
> +        .long 0x3f7f9070, 0x3c068eb5
> +        .long 0x3f7f9492, 0x3c01f6f1
> +        .long 0x3f7f9890, 0x3bfb06c5
> +        .long 0x3f7f9c6b, 0x3bf26625
> +        .long 0x3f7fa024, 0x3bea0a1d
> +        .long 0x3f7fa3bc, 0x3be1f0d3
> +        .long 0x3f7fa734, 0x3bda1876
> +        .long 0x3f7faa8d, 0x3bd27f42
> +        .long 0x3f7fadc8, 0x3bcb237a
> +        .long 0x3f7fb0e6, 0x3bc4036c
> +        .long 0x3f7fb3e8, 0x3bbd1d6f
> +        .long 0x3f7fb6cf, 0x3bb66fe6
> +        .long 0x3f7fb99c, 0x3baff93b
> +        .long 0x3f7fbc4f, 0x3ba9b7e1
> +        .long 0x3f7fbeea, 0x3ba3aa56
> +        .long 0x3f7fc16d, 0x3b9dcf20
> +        .long 0x3f7fc3d9, 0x3b9824ce
> +        .long 0x3f7fc62e, 0x3b92a9f7
> +        .long 0x3f7fc86e, 0x3b8d5d3c
> +        .long 0x3f7fca99, 0x3b883d46
> +        .long 0x3f7fccb0, 0x3b8348c6
> +        .long 0x3f7fceb4, 0x3b7cfce8
> +        .long 0x3f7fd0a5, 0x3b73ba24
> +        .long 0x3f7fd283, 0x3b6ac6d3
> +        .long 0x3f7fd450, 0x3b622096
> +        .long 0x3f7fd60c, 0x3b59c51d
> +        .long 0x3f7fd7b7, 0x3b51b22a
> +        .long 0x3f7fd953, 0x3b49e589
> +        .long 0x3f7fdadf, 0x3b425d18
> +        .long 0x3f7fdc5c, 0x3b3b16c2
> +        .long 0x3f7fddcc, 0x3b341080
> +        .long 0x3f7fdf2d, 0x3b2d4858
> +        .long 0x3f7fe081, 0x3b26bc5e
> +        .long 0x3f7fe1c8, 0x3b206ab2
> +        .long 0x3f7fe303, 0x3b1a5183
> +        .long 0x3f7fe431, 0x3b146f09
> +        .long 0x3f7fe554, 0x3b0ec18c
> +        .long 0x3f7fe66c, 0x3b09475d
> +        .long 0x3f7fe77a, 0x3b03feda
> +        .long 0x3f7fe87d, 0x3afdccdc
> +        .long 0x3f7fe975, 0x3af3f919
> +        .long 0x3f7fea65, 0x3aea7f6c
> +        .long 0x3f7feb4b, 0x3ae15ce8
> +        .long 0x3f7fec27, 0x3ad88eb8
> +        .long 0x3f7fecfc, 0x3ad0121b
> +        .long 0x3f7fedc8, 0x3ac7e464
> +        .long 0x3f7fee8c, 0x3ac002f8
> +        .long 0x3f7fef48, 0x3ab86b52
> +        .long 0x3f7feffd, 0x3ab11afe
> +        .long 0x3f7ff0aa, 0x3aaa0f9a
> +        .long 0x3f7ff151, 0x3aa346d7
> +        .long 0x3f7ff1f1, 0x3a9cbe77
> +        .long 0x3f7ff28a, 0x3a96744c
> +        .long 0x3f7ff31e, 0x3a90663b
> +        .long 0x3f7ff3ab, 0x3a8a9237
> +        .long 0x3f7ff433, 0x3a84f643
> +        .long 0x3f7ff4b5, 0x3a7f20e7
> +        .long 0x3f7ff532, 0x3a74bdd2
> +        .long 0x3f7ff5aa, 0x3a6abfa9
> +        .long 0x3f7ff61d, 0x3a6122ea
> +        .long 0x3f7ff68b, 0x3a57e42f
> +        .long 0x3f7ff6f5, 0x3a4f002c
> +        .long 0x3f7ff75a, 0x3a4673af
> +        .long 0x3f7ff7bb, 0x3a3e3ba2
> +        .long 0x3f7ff819, 0x3a365507
> +        .long 0x3f7ff872, 0x3a2ebcf6
> +        .long 0x3f7ff8c7, 0x3a2770a1
> +        .long 0x3f7ff919, 0x3a206d52
> +        .long 0x3f7ff968, 0x3a19b066
> +        .long 0x3f7ff9b3, 0x3a133754
> +        .long 0x3f7ff9fb, 0x3a0cffa3
> +        .long 0x3f7ffa40, 0x3a0706f4
> +        .long 0x3f7ffa82, 0x3a014af8
> +        .long 0x3f7ffac1, 0x39f792ea
> +        .long 0x3f7ffafe, 0x39ed0088
> +        .long 0x3f7ffb38, 0x39e2daa1
> +        .long 0x3f7ffb6f, 0x39d91d2d
> +        .long 0x3f7ffba5, 0x39cfc44a
> +        .long 0x3f7ffbd7, 0x39c6cc35
> +        .long 0x3f7ffc08, 0x39be314d
> +        .long 0x3f7ffc36, 0x39b5f011
> +        .long 0x3f7ffc63, 0x39ae051c
> +        .long 0x3f7ffc8e, 0x39a66d2a
> +        .long 0x3f7ffcb6, 0x399f2512
> +        .long 0x3f7ffcdd, 0x399829c8
> +        .long 0x3f7ffd02, 0x3991785a
> +        .long 0x3f7ffd26, 0x398b0df2
> +        .long 0x3f7ffd48, 0x3984e7d2
> +        .long 0x3f7ffd68, 0x397e06ab
> +        .long 0x3f7ffd87, 0x3972bbde
> +        .long 0x3f7ffda5, 0x3967ea53
> +        .long 0x3f7ffdc1, 0x395d8d4b
> +        .long 0x3f7ffddc, 0x3953a034
> +        .long 0x3f7ffdf6, 0x394a1ea5
> +        .long 0x3f7ffe0f, 0x3941045e
> +        .long 0x3f7ffe27, 0x39384d47
> +        .long 0x3f7ffe3d, 0x392ff56d
> +        .long 0x3f7ffe53, 0x3927f904
> +        .long 0x3f7ffe67, 0x39205461
> +        .long 0x3f7ffe7b, 0x391903fe
> +        .long 0x3f7ffe8d, 0x39120475
> +        .long 0x3f7ffe9f, 0x390b5281
> +        .long 0x3f7ffeb0, 0x3904eafc
> +        .long 0x3f7ffec0, 0x38fd95bd
> +        .long 0x3f7ffed0, 0x38f1de7a
> +        .long 0x3f7ffedf, 0x38e6aa94
> +        .long 0x3f7ffeed, 0x38dbf4a3
> +        .long 0x3f7ffefa, 0x38d1b776
> +        .long 0x3f7fff07, 0x38c7ee0e
> +        .long 0x3f7fff13, 0x38be939c
> +        .long 0x3f7fff1f, 0x38b5a381
> +        .long 0x3f7fff2a, 0x38ad194e
> +        .long 0x3f7fff34, 0x38a4f0bc
> +        .long 0x3f7fff3f, 0x389d25b0
> +        .long 0x3f7fff48, 0x3895b43b
> +        .long 0x3f7fff51, 0x388e9890
> +        .long 0x3f7fff5a, 0x3887cf0e
> +        .long 0x3f7fff62, 0x38815434
> +        .long 0x3f7fff6a, 0x3876494d
> +        .long 0x3f7fff72, 0x386a7a5a
> +        .long 0x3f7fff79, 0x385f355e
> +        .long 0x3f7fff80, 0x38547466
> +        .long 0x3f7fff86, 0x384a31bf
> +        .long 0x3f7fff8c, 0x384067ee
> +        .long 0x3f7fff92, 0x383711b4
> +        .long 0x3f7fff98, 0x382e2a06
> +        .long 0x3f7fff9d, 0x3825ac0e
> +        .long 0x3f7fffa2, 0x381d9329
> +        .long 0x3f7fffa7, 0x3815dae6
> +        .long 0x3f7fffab, 0x380e7f01
> +        .long 0x3f7fffb0, 0x38077b62
> +        .long 0x3f7fffb4, 0x3800cc21
> +        .long 0x3f7fffb8, 0x37f4daf4
> +        .long 0x3f7fffbc, 0x37e8b7ac
> +        .long 0x3f7fffbf, 0x37dd2782
> +        .long 0x3f7fffc2, 0x37d223dc
> +        .long 0x3f7fffc6, 0x37c7a666
> +        .long 0x3f7fffc9, 0x37bda912
> +        .long 0x3f7fffcc, 0x37b42611
> +        .long 0x3f7fffce, 0x37ab17d6
> +        .long 0x3f7fffd1, 0x37a2790f
> +        .long 0x3f7fffd3, 0x379a44a5
> +        .long 0x3f7fffd6, 0x379275b9
> +        .long 0x3f7fffd8, 0x378b07a2
> +        .long 0x3f7fffda, 0x3783f5e9
> +        .long 0x3f7fffdc, 0x377a7897
> +        .long 0x3f7fffde, 0x376dad68
> +        .long 0x3f7fffe0, 0x37618278
> +        .long 0x3f7fffe2, 0x3755f04f
> +        .long 0x3f7fffe3, 0x374aefcc
> +        .long 0x3f7fffe5, 0x37407a1d
> +        .long 0x3f7fffe6, 0x373688bc
> +        .long 0x3f7fffe8, 0x372d1570
> +        .long 0x3f7fffe9, 0x37241a44
> +        .long 0x3f7fffea, 0x371b9188
> +        .long 0x3f7fffeb, 0x371375cf
> +        .long 0x3f7fffec, 0x370bc1e7
> +        .long 0x3f7fffee, 0x370470dd
> +        .long 0x3f7fffef, 0x36fafbec
> +        .long 0x3f7fffef, 0x36edc95b
> +        .long 0x3f7ffff0, 0x36e14167
> +        .long 0x3f7ffff1, 0x36d55bd6
> +        .long 0x3f7ffff2, 0x36ca10ce
> +        .long 0x3f7ffff3, 0x36bf58d1
> +        .long 0x3f7ffff4, 0x36b52cb9
> +        .long 0x3f7ffff4, 0x36ab85b5
> +        .long 0x3f7ffff5, 0x36a25d43
> +        .long 0x3f7ffff5, 0x3699ad31
> +        .long 0x3f7ffff6, 0x36916f95
> +        .long 0x3f7ffff7, 0x36899ecb
> +        .long 0x3f7ffff7, 0x36823575
> +        .long 0x3f7ffff8, 0x36765ce8
> +        .long 0x3f7ffff8, 0x366909cc
> +        .long 0x3f7ffff9, 0x365c684a
> +        .long 0x3f7ffff9, 0x36506f88
> +        .long 0x3f7ffff9, 0x36451713
> +        .long 0x3f7ffffa, 0x363a56e4
> +        .long 0x3f7ffffa, 0x36302754
> +        .long 0x3f7ffffa, 0x36268119
> +        .long 0x3f7ffffb, 0x361d5d43
> +        .long 0x3f7ffffb, 0x3614b538
> +        .long 0x3f7ffffb, 0x360c82b1
> +        .long 0x3f7ffffc, 0x3604bfb1
> +        .long 0x3f7ffffc, 0x35facd10
> +        .long 0x3f7ffffc, 0x35ece39b
> +        .long 0x3f7ffffc, 0x35dfb8b6
> +        .long 0x3f7ffffd, 0x35d34296
> +        .long 0x3f7ffffd, 0x35c777ec
> +        .long 0x3f7ffffd, 0x35bc4fdc
> +        .long 0x3f7ffffd, 0x35b1c1fc
> +        .long 0x3f7ffffd, 0x35a7c64b
> +        .long 0x3f7ffffd, 0x359e5531
> +        .long 0x3f7ffffe, 0x35956771
> +        .long 0x3f7ffffe, 0x358cf630
> +        .long 0x3f7ffffe, 0x3584fae8
> +        .long 0x3f7ffffe, 0x357adecb
> +        .long 0x3f7ffffe, 0x356c9b8f
> +        .long 0x3f7ffffe, 0x355f20ef
> +        .long 0x3f7ffffe, 0x3552644f
> +        .long 0x3f7ffffe, 0x35465b9c
> +        .long 0x3f7fffff, 0x353afd47
> +        .long 0x3f7fffff, 0x3530403c
> +        .long 0x3f7fffff, 0x35261be0
> +        .long 0x3f7fffff, 0x351c8807
> +        .long 0x3f7fffff, 0x35137cf0
> +        .long 0x3f7fffff, 0x350af341
> +        .long 0x3f7fffff, 0x3502e402
> +        .long 0x3f7fffff, 0x34f6912a
> +        .long 0x3f7fffff, 0x34e8356b
> +        .long 0x3f7fffff, 0x34daa8e4
> +        .long 0x3f7fffff, 0x34cde050
> +        .long 0x3f7fffff, 0x34c1d100
> +        .long 0x3f7fffff, 0x34b670d5
> +        .long 0x3f7fffff, 0x34abb639
> +        .long 0x3f7fffff, 0x34a19816
> +        .long 0x3f7fffff, 0x34980dd1
> +        .long 0x3f7fffff, 0x348f0f43
> +        .long 0x3f7fffff, 0x348694b3
> +        .long 0x3f800000, 0x347d2da8
> +        .long 0x3f800000, 0x346e1d72
> +        .align 16
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff  /* _AbsMask */
> +        .align 16
> +        .long 0x407b8000, 0x407b8000, 0x407b8000, 0x407b8000  /* _MaxThreshold */
> +        .align 16
> +        .long 0x47800000, 0x47800000, 0x47800000, 0x47800000  /* _SRound */
> +        .align 16
> +        .long 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000  /* _U2Threshold */
> +        .align 16
> +        .long 0xbeaaaade, 0xbeaaaade, 0xbeaaaade, 0xbeaaaade  /* _poly3_0 */
> +        .align 16
> +        .type	__svml_serf_data_internal,@object
> +        .size	__svml_serf_data_internal,.-__svml_serf_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core-sse.S
> new file mode 100644
> index 0000000000..4b939f8c55
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized erff, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN8v_erff _ZGVdN8v_erff_sse_wrapper
> +#include "../svml_s_erff8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core.c
> new file mode 100644
> index 0000000000..50f5901db1
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized erff, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN8v_erff
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN8v_erff, __GI__ZGVdN8v_erff,
> +	       __redirect__ZGVdN8v_erff)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core_avx2.S
> new file mode 100644
> index 0000000000..4cd82b45e9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_erff8_core_avx2.S
> @@ -0,0 +1,669 @@
> +/* Function erff vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   Basic formula is
> + *    erf(x) ~ erf(x0) +
> + *              + exp(-x0*x0)*D*(1+c0+T*P1(T)+D^2*P3(T)+D^4*p5)
> + *   where D=x-x0, T=x0*D
> + *   x0 is x rounded to a specified number of fractional bits (in this case 8),
> + *    except that x0=0 for |x|<3.5/256.0 (using x0=0 for first 4 table entries)
> + *
> + *   Data table packs both erf(x0)_high and a few bits of erf(x0)_low in one
> + *   entry (in place of redundant exponent bits)
> + *
> + */
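
Since the whole fast path hangs off this comment, here is a scalar model of
what I believe one lane computes in the single-precision kernels (my own
sketch, not code from the patch; it ignores the _SRound rounding trick, the
_U2Threshold masking and the NaN fixup, and the names are just my reading of
the data layout):

    #include <math.h>

    /* tbl[i] = { erf(x0), 2/sqrt(pi)*exp(-x0*x0) } for x0 = i/128,
       i = 0 .. 503, i.e. the 504 eight-byte entries of _erf_tbl.  */
    static float
    erff_lane_model (float x, const float tbl[][2])
    {
      float a = fminf (fabsf (x), 503.0f / 128.0f); /* clamp at _MaxThreshold */
      int i = (int) (a * 128.0f + 0.5f);            /* nearest grid point */
      float x0 = i * (1.0f / 128.0f);
      float d = a - x0;                             /* Diff */
      /* P1 coefficient: _poly3_0 (0xbeaaaade), about -1/3.  */
      float poly = -0.33333435f * d * d - x0 * d;
      float r = tbl[i][0] + tbl[i][1] * d * (1.0f + poly);
      return copysignf (r, x);                      /* set sign */
    }
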
> +
> +/* Offsets for data table __svml_serf_data_internal
> + */
> +#define _erf_tbl                      	0
> +#define _AbsMask                      	4032
> +#define _MaxThreshold                 	4064
> +#define _SRound                       	4096
> +#define _U2Threshold                  	4128
> +#define _poly3_0                      	4160
> +
> +/* Lookup bias for data table __svml_serf_data_internal.  */
> +#define Table_Lookup_Bias               -0x3c000000
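
For what it's worth, I checked the v5 "Table Lookup Bias" arithmetic and it
looks right to me (my derivation, not stated in the patch): _SRound is 2^16,
so after the add the low mantissa bits of (|x| + _SRound) hold
i = round(|x| * 128), and the shift-by-3 turns the integer representation
into a byte offset:

    8 * (0x47800000 + i) mod 2^32 = 0x3c000000 + 8 * i

so adding -0x3c000000 once to the table address in the lea below cancels the
constant part and leaves a plain 8-byte stride into _erf_tbl.
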
> +
> +#include <sysdep.h>
> +
> +        .text
> +	.section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN8v_erff_avx2)
> +        lea       Table_Lookup_Bias+__svml_serf_data_internal(%rip), %rax
> +
> +/*
> + * vector gather:
> + * erf(x0), exp(-x0*x0)*2.0/sqrt(pi)
> + */
> +        vmovups   _SRound+__svml_serf_data_internal(%rip), %ymm7
> +        vandps    _AbsMask+__svml_serf_data_internal(%rip), %ymm0, %ymm6
> +
> +/*
> + * erf(x) rounds to 1.0 for x>_MaxThreshold (3.9375)
> + * can compute all results in the main path
> + */
> +        vminps    _MaxThreshold+__svml_serf_data_internal(%rip), %ymm6, %ymm8
> +        vaddps    %ymm7, %ymm8, %ymm10
> +        vcmpgt_oqps _U2Threshold+__svml_serf_data_internal(%rip), %ymm8, %ymm9
> +        vpslld    $3, %ymm10, %ymm11
> +        vsubps    %ymm7, %ymm10, %ymm4
> +        vsubps    %ymm4, %ymm8, %ymm3
> +        vandps    %ymm9, %ymm3, %ymm2
> +
> +/* NaN fixup */
> +        vminps    %ymm6, %ymm3, %ymm3
> +
> +/* D2 = Diff^2 */
> +        vmulps    %ymm2, %ymm2, %ymm2
> +
> +/* save sign */
> +        vxorps    %ymm0, %ymm6, %ymm5
> +        vmovd     %xmm11, %edx
> +        vextractf128 $1, %ymm11, %xmm12
> +        vpextrd   $2, %xmm11, %esi
> +        movslq    %edx, %rdx
> +        movslq    %esi, %rsi
> +        vmovd     %xmm12, %r8d
> +        vmovq     (%rax,%rdx), %xmm13
> +        vmovq     (%rax,%rsi), %xmm14
> +        vunpcklps %xmm14, %xmm13, %xmm10
> +        vmovups   _poly3_0+__svml_serf_data_internal(%rip), %ymm14
> +        vpextrd   $1, %xmm11, %ecx
> +        vpextrd   $3, %xmm11, %edi
> +        vpextrd   $1, %xmm12, %r9d
> +        vpextrd   $2, %xmm12, %r10d
> +        vpextrd   $3, %xmm12, %r11d
> +
> +/*
> + * Start polynomial evaluation
> + * P1
> + */
> +        vfmsub231ps %ymm14, %ymm3, %ymm4
> +        movslq    %ecx, %rcx
> +        movslq    %edi, %rdi
> +        movslq    %r8d, %r8
> +        movslq    %r9d, %r9
> +        movslq    %r10d, %r10
> +        movslq    %r11d, %r11
> +        vmovq     (%rax,%rcx), %xmm1
> +        vmovq     (%rax,%rdi), %xmm15
> +
> +/*
> + * branch-free
> + * (exp_h(x0) * Diff) * (poly + 1.0)
> + */
> +        vfmadd213ps %ymm3, %ymm2, %ymm4
> +        vmovq     (%rax,%r8), %xmm7
> +        vmovq     (%rax,%r9), %xmm0
> +        vmovq     (%rax,%r10), %xmm8
> +        vmovq     (%rax,%r11), %xmm9
> +        vunpcklps %xmm15, %xmm1, %xmm11
> +        vunpcklps %xmm8, %xmm7, %xmm1
> +        vunpcklps %xmm9, %xmm0, %xmm0
> +        vinsertf128 $1, %xmm1, %ymm10, %ymm12
> +        vinsertf128 $1, %xmm0, %ymm11, %ymm13
> +        vunpcklps %ymm13, %ymm12, %ymm0
> +        vunpckhps %ymm13, %ymm12, %ymm15
> +
> +/* Final result */
> +        vfmadd213ps %ymm0, %ymm15, %ymm4
> +
> +/* set sign */
> +        vorps     %ymm5, %ymm4, %ymm0
> +        ret
> +
> +END(_ZGVdN8v_erff_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_serf_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(32)) VUINT32 _erf_tbl[1008][1];
> +        __declspec(align(32)) VUINT32 _AbsMask[8][1];
> +        __declspec(align(32)) VUINT32 _MaxThreshold[8][1];
> +        __declspec(align(32)) VUINT32 _SRound[8][1];
> +        __declspec(align(32)) VUINT32 _U2Threshold[8][1];
> +        __declspec(align(32)) VUINT32 _poly3_0[8][1];
> +} __svml_serf_data_internal;
> +#endif
> +__svml_serf_data_internal:
> +        /*== _erf_tbl ==*/
> +        .long 0x00000000, 0x3f906ebb
> +        .long 0x3c106dfa, 0x3f906c79
> +        .long 0x3c906bb8, 0x3f9065b4
> +        .long 0x3cd89bf0, 0x3f905a6c
> +        .long 0x3d1062b2, 0x3f904aa3
> +        .long 0x3d3472ea, 0x3f90365a
> +        .long 0x3d587d7f, 0x3f901d93
> +        .long 0x3d7c8154, 0x3f900050
> +        .long 0x3d903ea4, 0x3f8fde94
> +        .long 0x3da2381f, 0x3f8fb862
> +        .long 0x3db42c8d, 0x3f8f8dbd
> +        .long 0x3dc61b5f, 0x3f8f5eab
> +        .long 0x3dd80409, 0x3f8f2b2e
> +        .long 0x3de9e5fc, 0x3f8ef34c
> +        .long 0x3dfbc0ad, 0x3f8eb70a
> +        .long 0x3e06c9c8, 0x3f8e766e
> +        .long 0x3e0faf0d, 0x3f8e317d
> +        .long 0x3e188fe1, 0x3f8de83e
> +        .long 0x3e216bfe, 0x3f8d9ab9
> +        .long 0x3e2a4321, 0x3f8d48f3
> +        .long 0x3e331506, 0x3f8cf2f5
> +        .long 0x3e3be169, 0x3f8c98c6
> +        .long 0x3e44a808, 0x3f8c3a6f
> +        .long 0x3e4d68a1, 0x3f8bd7f8
> +        .long 0x3e5622f2, 0x3f8b716c
> +        .long 0x3e5ed6b9, 0x3f8b06d2
> +        .long 0x3e6783b7, 0x3f8a9834
> +        .long 0x3e7029aa, 0x3f8a259e
> +        .long 0x3e78c855, 0x3f89af18
> +        .long 0x3e80afbc, 0x3f8934af
> +        .long 0x3e84f76b, 0x3f88b66c
> +        .long 0x3e893b19, 0x3f88345d
> +        .long 0x3e8d7aa7, 0x3f87ae8b
> +        .long 0x3e91b5f8, 0x3f872504
> +        .long 0x3e95ecee, 0x3f8697d3
> +        .long 0x3e9a1f6b, 0x3f860705
> +        .long 0x3e9e4d54, 0x3f8572a8
> +        .long 0x3ea2768c, 0x3f84dac8
> +        .long 0x3ea69af8, 0x3f843f72
> +        .long 0x3eaaba7a, 0x3f83a0b6
> +        .long 0x3eaed4fa, 0x3f82fe9f
> +        .long 0x3eb2ea5c, 0x3f82593e
> +        .long 0x3eb6fa85, 0x3f81b0a0
> +        .long 0x3ebb055d, 0x3f8104d3
> +        .long 0x3ebf0aca, 0x3f8055e8
> +        .long 0x3ec30ab3, 0x3f7f47d8
> +        .long 0x3ec70501, 0x3f7ddddf
> +        .long 0x3ecaf99b, 0x3f7c6e05
> +        .long 0x3ecee869, 0x3f7af867
> +        .long 0x3ed2d156, 0x3f797d26
> +        .long 0x3ed6b44b, 0x3f77fc62
> +        .long 0x3eda9132, 0x3f76763c
> +        .long 0x3ede67f6, 0x3f74ead4
> +        .long 0x3ee23882, 0x3f735a4c
> +        .long 0x3ee602c2, 0x3f71c4c4
> +        .long 0x3ee9c6a2, 0x3f702a5f
> +        .long 0x3eed840e, 0x3f6e8b3e
> +        .long 0x3ef13af5, 0x3f6ce783
> +        .long 0x3ef4eb45, 0x3f6b3f51
> +        .long 0x3ef894ea, 0x3f6992c9
> +        .long 0x3efc37d5, 0x3f67e20f
> +        .long 0x3effd3f5, 0x3f662d45
> +        .long 0x3f01b49d, 0x3f64748e
> +        .long 0x3f037bca, 0x3f62b80d
> +        .long 0x3f053f7b, 0x3f60f7e5
> +        .long 0x3f06ffa8, 0x3f5f3439
> +        .long 0x3f08bc4a, 0x3f5d6d2d
> +        .long 0x3f0a755a, 0x3f5ba2e3
> +        .long 0x3f0c2ad3, 0x3f59d57e
> +        .long 0x3f0ddcae, 0x3f580523
> +        .long 0x3f0f8ae6, 0x3f5631f4
> +        .long 0x3f113574, 0x3f545c14
> +        .long 0x3f12dc54, 0x3f5283a7
> +        .long 0x3f147f81, 0x3f50a8cf
> +        .long 0x3f161ef6, 0x3f4ecbb1
> +        .long 0x3f17baae, 0x3f4cec6d
> +        .long 0x3f1952a6, 0x3f4b0b28
> +        .long 0x3f1ae6da, 0x3f492804
> +        .long 0x3f1c7745, 0x3f474323
> +        .long 0x3f1e03e5, 0x3f455ca8
> +        .long 0x3f1f8cb7, 0x3f4374b5
> +        .long 0x3f2111b7, 0x3f418b6b
> +        .long 0x3f2292e4, 0x3f3fa0ee
> +        .long 0x3f24103a, 0x3f3db55e
> +        .long 0x3f2589b9, 0x3f3bc8dc
> +        .long 0x3f26ff5d, 0x3f39db8a
> +        .long 0x3f287126, 0x3f37ed89
> +        .long 0x3f29df13, 0x3f35fef8
> +        .long 0x3f2b4922, 0x3f340ff9
> +        .long 0x3f2caf53, 0x3f3220ab
> +        .long 0x3f2e11a4, 0x3f30312e
> +        .long 0x3f2f7017, 0x3f2e41a1
> +        .long 0x3f30caab, 0x3f2c5223
> +        .long 0x3f322160, 0x3f2a62d3
> +        .long 0x3f337437, 0x3f2873cf
> +        .long 0x3f34c32f, 0x3f268534
> +        .long 0x3f360e4c, 0x3f249721
> +        .long 0x3f37558c, 0x3f22a9b3
> +        .long 0x3f3898f3, 0x3f20bd06
> +        .long 0x3f39d881, 0x3f1ed137
> +        .long 0x3f3b1438, 0x3f1ce661
> +        .long 0x3f3c4c1b, 0x3f1afca0
> +        .long 0x3f3d802c, 0x3f19140f
> +        .long 0x3f3eb06c, 0x3f172cc9
> +        .long 0x3f3fdce0, 0x3f1546e7
> +        .long 0x3f410589, 0x3f136284
> +        .long 0x3f422a6b, 0x3f117fb9
> +        .long 0x3f434b89, 0x3f0f9e9e
> +        .long 0x3f4468e7, 0x3f0dbf4c
> +        .long 0x3f458287, 0x3f0be1db
> +        .long 0x3f46986f, 0x3f0a0662
> +        .long 0x3f47aaa2, 0x3f082cf7
> +        .long 0x3f48b925, 0x3f0655b1
> +        .long 0x3f49c3fb, 0x3f0480a6
> +        .long 0x3f4acb29, 0x3f02adeb
> +        .long 0x3f4bceb4, 0x3f00dd96
> +        .long 0x3f4ccea1, 0x3efe1f73
> +        .long 0x3f4dcaf4, 0x3efa88d5
> +        .long 0x3f4ec3b4, 0x3ef6f777
> +        .long 0x3f4fb8e5, 0x3ef36b80
> +        .long 0x3f50aa8d, 0x3eefe513
> +        .long 0x3f5198b1, 0x3eec6455
> +        .long 0x3f528358, 0x3ee8e968
> +        .long 0x3f536a86, 0x3ee5746d
> +        .long 0x3f544e43, 0x3ee20584
> +        .long 0x3f552e93, 0x3ede9ccc
> +        .long 0x3f560b7e, 0x3edb3a64
> +        .long 0x3f56e50a, 0x3ed7de6a
> +        .long 0x3f57bb3d, 0x3ed488f8
> +        .long 0x3f588e1e, 0x3ed13a2b
> +        .long 0x3f595db4, 0x3ecdf21c
> +        .long 0x3f5a2a05, 0x3ecab0e4
> +        .long 0x3f5af318, 0x3ec7769b
> +        .long 0x3f5bb8f4, 0x3ec44359
> +        .long 0x3f5c7ba1, 0x3ec11733
> +        .long 0x3f5d3b25, 0x3ebdf23d
> +        .long 0x3f5df788, 0x3ebad48d
> +        .long 0x3f5eb0d1, 0x3eb7be35
> +        .long 0x3f5f6707, 0x3eb4af46
> +        .long 0x3f601a32, 0x3eb1a7d3
> +        .long 0x3f60ca59, 0x3eaea7ea
> +        .long 0x3f617784, 0x3eabaf9a
> +        .long 0x3f6221bb, 0x3ea8bef3
> +        .long 0x3f62c905, 0x3ea5d600
> +        .long 0x3f636d69, 0x3ea2f4ce
> +        .long 0x3f640ef1, 0x3ea01b68
> +        .long 0x3f64ada3, 0x3e9d49d9
> +        .long 0x3f654987, 0x3e9a8029
> +        .long 0x3f65e2a6, 0x3e97be62
> +        .long 0x3f667906, 0x3e95048b
> +        .long 0x3f670cb1, 0x3e9252aa
> +        .long 0x3f679dae, 0x3e8fa8c5
> +        .long 0x3f682c06, 0x3e8d06e3
> +        .long 0x3f68b7bf, 0x3e8a6d05
> +        .long 0x3f6940e2, 0x3e87db31
> +        .long 0x3f69c778, 0x3e855168
> +        .long 0x3f6a4b88, 0x3e82cfad
> +        .long 0x3f6acd1a, 0x3e805600
> +        .long 0x3f6b4c36, 0x3e7bc8c2
> +        .long 0x3f6bc8e5, 0x3e76f5a0
> +        .long 0x3f6c432f, 0x3e723298
> +        .long 0x3f6cbb1b, 0x3e6d7fa5
> +        .long 0x3f6d30b1, 0x3e68dcc1
> +        .long 0x3f6da3fa, 0x3e6449e7
> +        .long 0x3f6e14fe, 0x3e5fc70e
> +        .long 0x3f6e83c4, 0x3e5b542b
> +        .long 0x3f6ef055, 0x3e56f136
> +        .long 0x3f6f5ab8, 0x3e529e21
> +        .long 0x3f6fc2f5, 0x3e4e5adf
> +        .long 0x3f702915, 0x3e4a2761
> +        .long 0x3f708d1f, 0x3e460399
> +        .long 0x3f70ef1b, 0x3e41ef75
> +        .long 0x3f714f11, 0x3e3deae4
> +        .long 0x3f71ad09, 0x3e39f5d2
> +        .long 0x3f72090a, 0x3e36102b
> +        .long 0x3f72631c, 0x3e3239db
> +        .long 0x3f72bb46, 0x3e2e72cb
> +        .long 0x3f731191, 0x3e2abae4
> +        .long 0x3f736604, 0x3e27120f
> +        .long 0x3f73b8a5, 0x3e237833
> +        .long 0x3f74097e, 0x3e1fed36
> +        .long 0x3f745895, 0x3e1c70fd
> +        .long 0x3f74a5f2, 0x3e19036e
> +        .long 0x3f74f19b, 0x3e15a46d
> +        .long 0x3f753b98, 0x3e1253dc
> +        .long 0x3f7583f1, 0x3e0f119f
> +        .long 0x3f75caac, 0x3e0bdd96
> +        .long 0x3f760fd1, 0x3e08b7a4
> +        .long 0x3f765366, 0x3e059fa9
> +        .long 0x3f769573, 0x3e029586
> +        .long 0x3f76d5fe, 0x3dff3230
> +        .long 0x3f77150f, 0x3df95481
> +        .long 0x3f7752ab, 0x3df391b9
> +        .long 0x3f778eda, 0x3dede995
> +        .long 0x3f77c9a2, 0x3de85bd0
> +        .long 0x3f78030a, 0x3de2e825
> +        .long 0x3f783b18, 0x3ddd8e4c
> +        .long 0x3f7871d3, 0x3dd84dfe
> +        .long 0x3f78a741, 0x3dd326f3
> +        .long 0x3f78db68, 0x3dce18e3
> +        .long 0x3f790e50, 0x3dc92385
> +        .long 0x3f793ffc, 0x3dc4468f
> +        .long 0x3f797075, 0x3dbf81b6
> +        .long 0x3f799fbf, 0x3dbad4b0
> +        .long 0x3f79cde1, 0x3db63f32
> +        .long 0x3f79fae1, 0x3db1c0f1
> +        .long 0x3f7a26c4, 0x3dad59a1
> +        .long 0x3f7a518f, 0x3da908f6
> +        .long 0x3f7a7b4a, 0x3da4cea4
> +        .long 0x3f7aa3f9, 0x3da0aa5e
> +        .long 0x3f7acba1, 0x3d9c9bd9
> +        .long 0x3f7af248, 0x3d98a2c7
> +        .long 0x3f7b17f4, 0x3d94bedd
> +        .long 0x3f7b3ca9, 0x3d90efcd
> +        .long 0x3f7b606e, 0x3d8d354b
> +        .long 0x3f7b8346, 0x3d898f0a
> +        .long 0x3f7ba537, 0x3d85fcbf
> +        .long 0x3f7bc646, 0x3d827e1d
> +        .long 0x3f7be677, 0x3d7e25af
> +        .long 0x3f7c05d1, 0x3d777546
> +        .long 0x3f7c2456, 0x3d70ea68
> +        .long 0x3f7c420d, 0x3d6a847d
> +        .long 0x3f7c5ef9, 0x3d6442f0
> +        .long 0x3f7c7b1f, 0x3d5e252a
> +        .long 0x3f7c9684, 0x3d582a98
> +        .long 0x3f7cb12b, 0x3d5252a5
> +        .long 0x3f7ccb1a, 0x3d4c9cbd
> +        .long 0x3f7ce454, 0x3d47084e
> +        .long 0x3f7cfcdd, 0x3d4194c7
> +        .long 0x3f7d14ba, 0x3d3c4196
> +        .long 0x3f7d2bef, 0x3d370e2c
> +        .long 0x3f7d427f, 0x3d31f9fb
> +        .long 0x3f7d586f, 0x3d2d0474
> +        .long 0x3f7d6dc2, 0x3d282d0c
> +        .long 0x3f7d827b, 0x3d237336
> +        .long 0x3f7d96a0, 0x3d1ed669
> +        .long 0x3f7daa32, 0x3d1a561b
> +        .long 0x3f7dbd36, 0x3d15f1c6
> +        .long 0x3f7dcfb0, 0x3d11a8e1
> +        .long 0x3f7de1a2, 0x3d0d7ae9
> +        .long 0x3f7df30f, 0x3d09675a
> +        .long 0x3f7e03fd, 0x3d056db0
> +        .long 0x3f7e146c, 0x3d018d6b
> +        .long 0x3f7e2461, 0x3cfb8c15
> +        .long 0x3f7e33de, 0x3cf42e22
> +        .long 0x3f7e42e8, 0x3ced0003
> +        .long 0x3f7e517f, 0x3ce600c0
> +        .long 0x3f7e5fa9, 0x3cdf2f67
> +        .long 0x3f7e6d66, 0x3cd88b05
> +        .long 0x3f7e7abb, 0x3cd212ad
> +        .long 0x3f7e87aa, 0x3ccbc574
> +        .long 0x3f7e9435, 0x3cc5a273
> +        .long 0x3f7ea05f, 0x3cbfa8c4
> +        .long 0x3f7eac2b, 0x3cb9d786
> +        .long 0x3f7eb79a, 0x3cb42ddb
> +        .long 0x3f7ec2b1, 0x3caeaae6
> +        .long 0x3f7ecd71, 0x3ca94dcf
> +        .long 0x3f7ed7dc, 0x3ca415c2
> +        .long 0x3f7ee1f4, 0x3c9f01ec
> +        .long 0x3f7eebbd, 0x3c9a117f
> +        .long 0x3f7ef537, 0x3c9543ae
> +        .long 0x3f7efe66, 0x3c9097b1
> +        .long 0x3f7f074b, 0x3c8c0cc2
> +        .long 0x3f7f0fe8, 0x3c87a21f
> +        .long 0x3f7f1840, 0x3c83570a
> +        .long 0x3f7f2053, 0x3c7e558a
> +        .long 0x3f7f2826, 0x3c763931
> +        .long 0x3f7f2fb8, 0x3c6e579b
> +        .long 0x3f7f370c, 0x3c66af65
> +        .long 0x3f7f3e23, 0x3c5f3f2d
> +        .long 0x3f7f4500, 0x3c58059c
> +        .long 0x3f7f4ba4, 0x3c51015f
> +        .long 0x3f7f5211, 0x3c4a3127
> +        .long 0x3f7f5848, 0x3c4393af
> +        .long 0x3f7f5e4b, 0x3c3d27b5
> +        .long 0x3f7f641b, 0x3c36ebff
> +        .long 0x3f7f69ba, 0x3c30df57
> +        .long 0x3f7f6f29, 0x3c2b008e
> +        .long 0x3f7f746a, 0x3c254e7b
> +        .long 0x3f7f797f, 0x3c1fc7fb
> +        .long 0x3f7f7e67, 0x3c1a6bee
> +        .long 0x3f7f8326, 0x3c15393d
> +        .long 0x3f7f87bb, 0x3c102ed6
> +        .long 0x3f7f8c29, 0x3c0b4bab
> +        .long 0x3f7f9070, 0x3c068eb5
> +        .long 0x3f7f9492, 0x3c01f6f1
> +        .long 0x3f7f9890, 0x3bfb06c5
> +        .long 0x3f7f9c6b, 0x3bf26625
> +        .long 0x3f7fa024, 0x3bea0a1d
> +        .long 0x3f7fa3bc, 0x3be1f0d3
> +        .long 0x3f7fa734, 0x3bda1876
> +        .long 0x3f7faa8d, 0x3bd27f42
> +        .long 0x3f7fadc8, 0x3bcb237a
> +        .long 0x3f7fb0e6, 0x3bc4036c
> +        .long 0x3f7fb3e8, 0x3bbd1d6f
> +        .long 0x3f7fb6cf, 0x3bb66fe6
> +        .long 0x3f7fb99c, 0x3baff93b
> +        .long 0x3f7fbc4f, 0x3ba9b7e1
> +        .long 0x3f7fbeea, 0x3ba3aa56
> +        .long 0x3f7fc16d, 0x3b9dcf20
> +        .long 0x3f7fc3d9, 0x3b9824ce
> +        .long 0x3f7fc62e, 0x3b92a9f7
> +        .long 0x3f7fc86e, 0x3b8d5d3c
> +        .long 0x3f7fca99, 0x3b883d46
> +        .long 0x3f7fccb0, 0x3b8348c6
> +        .long 0x3f7fceb4, 0x3b7cfce8
> +        .long 0x3f7fd0a5, 0x3b73ba24
> +        .long 0x3f7fd283, 0x3b6ac6d3
> +        .long 0x3f7fd450, 0x3b622096
> +        .long 0x3f7fd60c, 0x3b59c51d
> +        .long 0x3f7fd7b7, 0x3b51b22a
> +        .long 0x3f7fd953, 0x3b49e589
> +        .long 0x3f7fdadf, 0x3b425d18
> +        .long 0x3f7fdc5c, 0x3b3b16c2
> +        .long 0x3f7fddcc, 0x3b341080
> +        .long 0x3f7fdf2d, 0x3b2d4858
> +        .long 0x3f7fe081, 0x3b26bc5e
> +        .long 0x3f7fe1c8, 0x3b206ab2
> +        .long 0x3f7fe303, 0x3b1a5183
> +        .long 0x3f7fe431, 0x3b146f09
> +        .long 0x3f7fe554, 0x3b0ec18c
> +        .long 0x3f7fe66c, 0x3b09475d
> +        .long 0x3f7fe77a, 0x3b03feda
> +        .long 0x3f7fe87d, 0x3afdccdc
> +        .long 0x3f7fe975, 0x3af3f919
> +        .long 0x3f7fea65, 0x3aea7f6c
> +        .long 0x3f7feb4b, 0x3ae15ce8
> +        .long 0x3f7fec27, 0x3ad88eb8
> +        .long 0x3f7fecfc, 0x3ad0121b
> +        .long 0x3f7fedc8, 0x3ac7e464
> +        .long 0x3f7fee8c, 0x3ac002f8
> +        .long 0x3f7fef48, 0x3ab86b52
> +        .long 0x3f7feffd, 0x3ab11afe
> +        .long 0x3f7ff0aa, 0x3aaa0f9a
> +        .long 0x3f7ff151, 0x3aa346d7
> +        .long 0x3f7ff1f1, 0x3a9cbe77
> +        .long 0x3f7ff28a, 0x3a96744c
> +        .long 0x3f7ff31e, 0x3a90663b
> +        .long 0x3f7ff3ab, 0x3a8a9237
> +        .long 0x3f7ff433, 0x3a84f643
> +        .long 0x3f7ff4b5, 0x3a7f20e7
> +        .long 0x3f7ff532, 0x3a74bdd2
> +        .long 0x3f7ff5aa, 0x3a6abfa9
> +        .long 0x3f7ff61d, 0x3a6122ea
> +        .long 0x3f7ff68b, 0x3a57e42f
> +        .long 0x3f7ff6f5, 0x3a4f002c
> +        .long 0x3f7ff75a, 0x3a4673af
> +        .long 0x3f7ff7bb, 0x3a3e3ba2
> +        .long 0x3f7ff819, 0x3a365507
> +        .long 0x3f7ff872, 0x3a2ebcf6
> +        .long 0x3f7ff8c7, 0x3a2770a1
> +        .long 0x3f7ff919, 0x3a206d52
> +        .long 0x3f7ff968, 0x3a19b066
> +        .long 0x3f7ff9b3, 0x3a133754
> +        .long 0x3f7ff9fb, 0x3a0cffa3
> +        .long 0x3f7ffa40, 0x3a0706f4
> +        .long 0x3f7ffa82, 0x3a014af8
> +        .long 0x3f7ffac1, 0x39f792ea
> +        .long 0x3f7ffafe, 0x39ed0088
> +        .long 0x3f7ffb38, 0x39e2daa1
> +        .long 0x3f7ffb6f, 0x39d91d2d
> +        .long 0x3f7ffba5, 0x39cfc44a
> +        .long 0x3f7ffbd7, 0x39c6cc35
> +        .long 0x3f7ffc08, 0x39be314d
> +        .long 0x3f7ffc36, 0x39b5f011
> +        .long 0x3f7ffc63, 0x39ae051c
> +        .long 0x3f7ffc8e, 0x39a66d2a
> +        .long 0x3f7ffcb6, 0x399f2512
> +        .long 0x3f7ffcdd, 0x399829c8
> +        .long 0x3f7ffd02, 0x3991785a
> +        .long 0x3f7ffd26, 0x398b0df2
> +        .long 0x3f7ffd48, 0x3984e7d2
> +        .long 0x3f7ffd68, 0x397e06ab
> +        .long 0x3f7ffd87, 0x3972bbde
> +        .long 0x3f7ffda5, 0x3967ea53
> +        .long 0x3f7ffdc1, 0x395d8d4b
> +        .long 0x3f7ffddc, 0x3953a034
> +        .long 0x3f7ffdf6, 0x394a1ea5
> +        .long 0x3f7ffe0f, 0x3941045e
> +        .long 0x3f7ffe27, 0x39384d47
> +        .long 0x3f7ffe3d, 0x392ff56d
> +        .long 0x3f7ffe53, 0x3927f904
> +        .long 0x3f7ffe67, 0x39205461
> +        .long 0x3f7ffe7b, 0x391903fe
> +        .long 0x3f7ffe8d, 0x39120475
> +        .long 0x3f7ffe9f, 0x390b5281
> +        .long 0x3f7ffeb0, 0x3904eafc
> +        .long 0x3f7ffec0, 0x38fd95bd
> +        .long 0x3f7ffed0, 0x38f1de7a
> +        .long 0x3f7ffedf, 0x38e6aa94
> +        .long 0x3f7ffeed, 0x38dbf4a3
> +        .long 0x3f7ffefa, 0x38d1b776
> +        .long 0x3f7fff07, 0x38c7ee0e
> +        .long 0x3f7fff13, 0x38be939c
> +        .long 0x3f7fff1f, 0x38b5a381
> +        .long 0x3f7fff2a, 0x38ad194e
> +        .long 0x3f7fff34, 0x38a4f0bc
> +        .long 0x3f7fff3f, 0x389d25b0
> +        .long 0x3f7fff48, 0x3895b43b
> +        .long 0x3f7fff51, 0x388e9890
> +        .long 0x3f7fff5a, 0x3887cf0e
> +        .long 0x3f7fff62, 0x38815434
> +        .long 0x3f7fff6a, 0x3876494d
> +        .long 0x3f7fff72, 0x386a7a5a
> +        .long 0x3f7fff79, 0x385f355e
> +        .long 0x3f7fff80, 0x38547466
> +        .long 0x3f7fff86, 0x384a31bf
> +        .long 0x3f7fff8c, 0x384067ee
> +        .long 0x3f7fff92, 0x383711b4
> +        .long 0x3f7fff98, 0x382e2a06
> +        .long 0x3f7fff9d, 0x3825ac0e
> +        .long 0x3f7fffa2, 0x381d9329
> +        .long 0x3f7fffa7, 0x3815dae6
> +        .long 0x3f7fffab, 0x380e7f01
> +        .long 0x3f7fffb0, 0x38077b62
> +        .long 0x3f7fffb4, 0x3800cc21
> +        .long 0x3f7fffb8, 0x37f4daf4
> +        .long 0x3f7fffbc, 0x37e8b7ac
> +        .long 0x3f7fffbf, 0x37dd2782
> +        .long 0x3f7fffc2, 0x37d223dc
> +        .long 0x3f7fffc6, 0x37c7a666
> +        .long 0x3f7fffc9, 0x37bda912
> +        .long 0x3f7fffcc, 0x37b42611
> +        .long 0x3f7fffce, 0x37ab17d6
> +        .long 0x3f7fffd1, 0x37a2790f
> +        .long 0x3f7fffd3, 0x379a44a5
> +        .long 0x3f7fffd6, 0x379275b9
> +        .long 0x3f7fffd8, 0x378b07a2
> +        .long 0x3f7fffda, 0x3783f5e9
> +        .long 0x3f7fffdc, 0x377a7897
> +        .long 0x3f7fffde, 0x376dad68
> +        .long 0x3f7fffe0, 0x37618278
> +        .long 0x3f7fffe2, 0x3755f04f
> +        .long 0x3f7fffe3, 0x374aefcc
> +        .long 0x3f7fffe5, 0x37407a1d
> +        .long 0x3f7fffe6, 0x373688bc
> +        .long 0x3f7fffe8, 0x372d1570
> +        .long 0x3f7fffe9, 0x37241a44
> +        .long 0x3f7fffea, 0x371b9188
> +        .long 0x3f7fffeb, 0x371375cf
> +        .long 0x3f7fffec, 0x370bc1e7
> +        .long 0x3f7fffee, 0x370470dd
> +        .long 0x3f7fffef, 0x36fafbec
> +        .long 0x3f7fffef, 0x36edc95b
> +        .long 0x3f7ffff0, 0x36e14167
> +        .long 0x3f7ffff1, 0x36d55bd6
> +        .long 0x3f7ffff2, 0x36ca10ce
> +        .long 0x3f7ffff3, 0x36bf58d1
> +        .long 0x3f7ffff4, 0x36b52cb9
> +        .long 0x3f7ffff4, 0x36ab85b5
> +        .long 0x3f7ffff5, 0x36a25d43
> +        .long 0x3f7ffff5, 0x3699ad31
> +        .long 0x3f7ffff6, 0x36916f95
> +        .long 0x3f7ffff7, 0x36899ecb
> +        .long 0x3f7ffff7, 0x36823575
> +        .long 0x3f7ffff8, 0x36765ce8
> +        .long 0x3f7ffff8, 0x366909cc
> +        .long 0x3f7ffff9, 0x365c684a
> +        .long 0x3f7ffff9, 0x36506f88
> +        .long 0x3f7ffff9, 0x36451713
> +        .long 0x3f7ffffa, 0x363a56e4
> +        .long 0x3f7ffffa, 0x36302754
> +        .long 0x3f7ffffa, 0x36268119
> +        .long 0x3f7ffffb, 0x361d5d43
> +        .long 0x3f7ffffb, 0x3614b538
> +        .long 0x3f7ffffb, 0x360c82b1
> +        .long 0x3f7ffffc, 0x3604bfb1
> +        .long 0x3f7ffffc, 0x35facd10
> +        .long 0x3f7ffffc, 0x35ece39b
> +        .long 0x3f7ffffc, 0x35dfb8b6
> +        .long 0x3f7ffffd, 0x35d34296
> +        .long 0x3f7ffffd, 0x35c777ec
> +        .long 0x3f7ffffd, 0x35bc4fdc
> +        .long 0x3f7ffffd, 0x35b1c1fc
> +        .long 0x3f7ffffd, 0x35a7c64b
> +        .long 0x3f7ffffd, 0x359e5531
> +        .long 0x3f7ffffe, 0x35956771
> +        .long 0x3f7ffffe, 0x358cf630
> +        .long 0x3f7ffffe, 0x3584fae8
> +        .long 0x3f7ffffe, 0x357adecb
> +        .long 0x3f7ffffe, 0x356c9b8f
> +        .long 0x3f7ffffe, 0x355f20ef
> +        .long 0x3f7ffffe, 0x3552644f
> +        .long 0x3f7ffffe, 0x35465b9c
> +        .long 0x3f7fffff, 0x353afd47
> +        .long 0x3f7fffff, 0x3530403c
> +        .long 0x3f7fffff, 0x35261be0
> +        .long 0x3f7fffff, 0x351c8807
> +        .long 0x3f7fffff, 0x35137cf0
> +        .long 0x3f7fffff, 0x350af341
> +        .long 0x3f7fffff, 0x3502e402
> +        .long 0x3f7fffff, 0x34f6912a
> +        .long 0x3f7fffff, 0x34e8356b
> +        .long 0x3f7fffff, 0x34daa8e4
> +        .long 0x3f7fffff, 0x34cde050
> +        .long 0x3f7fffff, 0x34c1d100
> +        .long 0x3f7fffff, 0x34b670d5
> +        .long 0x3f7fffff, 0x34abb639
> +        .long 0x3f7fffff, 0x34a19816
> +        .long 0x3f7fffff, 0x34980dd1
> +        .long 0x3f7fffff, 0x348f0f43
> +        .long 0x3f7fffff, 0x348694b3
> +        .long 0x3f800000, 0x347d2da8
> +        .long 0x3f800000, 0x346e1d72
> +        .align 32
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff  /* _AbsMask */
> +        .align 32
> +        .long 0x407b8000, 0x407b8000, 0x407b8000, 0x407b8000, 0x407b8000, 0x407b8000, 0x407b8000, 0x407b8000  /* _MaxThreshold */
> +        .align 32
> +        .long 0x47800000, 0x47800000, 0x47800000, 0x47800000, 0x47800000, 0x47800000, 0x47800000, 0x47800000  /* _SRound */
> +        .align 32
> +        .long 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000, 0x2f800000  /* _U2Threshold */
> +        .align 32
> +        .long 0xbeaaaade, 0xbeaaaade, 0xbeaaaade, 0xbeaaaade, 0xbeaaaade, 0xbeaaaade, 0xbeaaaade, 0xbeaaaade  /* _poly3_0 */
> +        .align 32
> +        .type	__svml_serf_data_internal,@object
> +        .size	__svml_serf_data_internal,.-__svml_serf_data_internal
> diff --git a/sysdeps/x86_64/fpu/svml_d_erf2_core.S b/sysdeps/x86_64/fpu/svml_d_erf2_core.S
> new file mode 100644
> index 0000000000..6ef30af2bd
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_erf2_core.S
> @@ -0,0 +1,29 @@
> +/* Function erf vectorized with SSE2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN2v_erf)
> +WRAPPER_IMPL_SSE2 erf
> +END (_ZGVbN2v_erf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN2v_erf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_erf4_core.S b/sysdeps/x86_64/fpu/svml_d_erf4_core.S
> new file mode 100644
> index 0000000000..2ca8dfe92e
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_erf4_core.S
> @@ -0,0 +1,29 @@
> +/* Function erf vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN4v_erf)
> +WRAPPER_IMPL_AVX _ZGVbN2v_erf
> +END (_ZGVdN4v_erf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN4v_erf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_erf4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_erf4_core_avx.S
> new file mode 100644
> index 0000000000..264ff09459
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_erf4_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function erf vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVcN4v_erf)
> +WRAPPER_IMPL_AVX _ZGVbN2v_erf
> +END (_ZGVcN4v_erf)
> diff --git a/sysdeps/x86_64/fpu/svml_d_erf8_core.S b/sysdeps/x86_64/fpu/svml_d_erf8_core.S
> new file mode 100644
> index 0000000000..de8c2a48bb
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_erf8_core.S
> @@ -0,0 +1,25 @@
> +/* Function erf vectorized with AVX-512, wrapper to AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN8v_erf)
> +WRAPPER_IMPL_AVX512 _ZGVdN4v_erf
> +END (_ZGVeN8v_erf)
> diff --git a/sysdeps/x86_64/fpu/svml_s_erff16_core.S b/sysdeps/x86_64/fpu/svml_s_erff16_core.S
> new file mode 100644
> index 0000000000..2c5037a0ec
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_erff16_core.S
> @@ -0,0 +1,25 @@
> +/* Function erff vectorized with AVX-512. Wrapper to AVX2 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVeN16v_erff)
> +WRAPPER_IMPL_AVX512 _ZGVdN8v_erff
> +END (_ZGVeN16v_erff)
> diff --git a/sysdeps/x86_64/fpu/svml_s_erff4_core.S b/sysdeps/x86_64/fpu/svml_s_erff4_core.S
> new file mode 100644
> index 0000000000..0f58bb7aaf
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_erff4_core.S
> @@ -0,0 +1,29 @@
> +/* Function erff vectorized with SSE2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVbN4v_erff)
> +WRAPPER_IMPL_SSE2 erff
> +END (_ZGVbN4v_erff)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN4v_erff)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_erff8_core.S b/sysdeps/x86_64/fpu/svml_s_erff8_core.S
> new file mode 100644
> index 0000000000..a9f287c420
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_erff8_core.S
> @@ -0,0 +1,29 @@
> +/* Function erff vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +	.text
> +ENTRY (_ZGVdN8v_erff)
> +WRAPPER_IMPL_AVX _ZGVbN4v_erff
> +END (_ZGVdN8v_erff)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN8v_erff)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_erff8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_erff8_core_avx.S
> new file mode 100644
> index 0000000000..ca5a8048e8
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_erff8_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function erff vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +        .text
> +ENTRY (_ZGVcN8v_erff)
> +WRAPPER_IMPL_AVX _ZGVbN4v_erff
> +END (_ZGVcN8v_erff)
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-erf-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-erf-avx.c
> new file mode 100644
> index 0000000000..a2eceefc9b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-erf-avx.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-erf.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-erf-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-erf-avx2.c
> new file mode 100644
> index 0000000000..a2eceefc9b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-erf-avx2.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-erf.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-erf-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-erf-avx512f.c
> new file mode 100644
> index 0000000000..a2eceefc9b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-erf-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-erf.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-erf.c b/sysdeps/x86_64/fpu/test-double-libmvec-erf.c
> new file mode 100644
> index 0000000000..c1ded24b1d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-erf.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE double
> +#define LIBMVEC_FUNC erf
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> index db7ae3e7a6..9d91ccfe51 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> @@ -43,6 +43,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVbN2v_log2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVbN2v_log1p)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVbN2v_atanh)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVbN2v_acosh)
> +VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVbN2v_erf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> index 269ae38f67..9e86d5fef8 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> @@ -46,6 +46,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVdN4v_log2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVdN4v_log1p)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVdN4v_atanh)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVdN4v_acosh)
> +VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVdN4v_erf)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m256i
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> index d95b960a45..0f4ef00de4 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> @@ -43,6 +43,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVcN4v_log2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVcN4v_log1p)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVcN4v_atanh)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVcN4v_acosh)
> +VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVcN4v_erf)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> index a22f08b5f8..975dff85af 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> @@ -43,6 +43,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVeN8v_log2)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVeN8v_log1p)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVeN8v_atanh)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVeN8v_acosh)
> +VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVeN8v_erf)
>  
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m512i
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-erff-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-erff-avx.c
> new file mode 100644
> index 0000000000..8cdf4dc069
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-erff-avx.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-erff.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-erff-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-erff-avx2.c
> new file mode 100644
> index 0000000000..8cdf4dc069
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-erff-avx2.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-erff.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-erff-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-erff-avx512f.c
> new file mode 100644
> index 0000000000..8cdf4dc069
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-erff-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-erff.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-erff.c b/sysdeps/x86_64/fpu/test-float-libmvec-erff.c
> new file mode 100644
> index 0000000000..ba83826ab9
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-erff.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE float
> +#define LIBMVEC_FUNC erff
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> index 7982ae2c84..2b1e27391a 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> @@ -43,6 +43,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVeN16v_log2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVeN16v_log1pf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVeN16v_atanhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVeN16v_acoshf)
> +VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVeN16v_erff)
>  
>  #define VEC_INT_TYPE __m512i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> index bdfcbea2cd..78428bf517 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> @@ -43,6 +43,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVbN4v_log2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVbN4v_log1pf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVbN4v_atanhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVbN4v_acoshf)
> +VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVbN4v_erff)
>  
>  #define VEC_INT_TYPE __m128i
>  
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> index 7b3ba81441..dadd4e6ca0 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> @@ -46,6 +46,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVdN8v_log2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVdN8v_log1pf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVdN8v_atanhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVdN8v_acoshf)
> +VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVdN8v_erff)
>  
>  /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
>  #undef VECTOR_WRAPPER_fFF
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> index a13d2e4ca1..7b2d583e54 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> @@ -43,6 +43,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVcN8v_log2f)
>  VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVcN8v_log1pf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVcN8v_atanhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVcN8v_acoshf)
> +VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVcN8v_erff)
>  
>  #define VEC_INT_TYPE __m128i
>  
> -- 
> 2.31.1
> 

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.


H.J.


* Re: [PATCH v5 13/18] x86-64: Add vector log1p/log1pf implementation to libmvec
  2021-12-29 21:26   ` H.J. Lu
@ 2021-12-29 23:28     ` Noah Goldstein
  2021-12-30  0:32       ` H.J. Lu
  0 siblings, 1 reply; 40+ messages in thread
From: Noah Goldstein @ 2021-12-29 23:28 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Sunil K Pandey, Kolesov, Andrey, GNU C Library, Cornea, Marius

On Wed, Dec 29, 2021 at 3:43 PM H.J. Lu via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On Tue, Dec 28, 2021 at 10:39:55PM -0800, Sunil K Pandey wrote:
> > Implement vectorized log1p/log1pf containing SSE, AVX, AVX2 and
> > AVX512 versions for libmvec as per vector ABI.  It also contains
> > accuracy and ABI tests for vector log1p/log1pf with regenerated ulps.
> > ---
> >  bits/libm-simd-decl-stubs.h                   |   11 +
> >  math/bits/mathcalls.h                         |    2 +-
> >  .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
> >  sysdeps/x86/fpu/bits/math-vector.h            |    4 +
> >  .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
> >  sysdeps/x86_64/fpu/Makeconfig                 |    1 +
> >  sysdeps/x86_64/fpu/Versions                   |    2 +
> >  sysdeps/x86_64/fpu/libm-test-ulps             |   20 +
> >  .../fpu/multiarch/svml_d_log1p2_core-sse2.S   |   20 +
> >  .../x86_64/fpu/multiarch/svml_d_log1p2_core.c |   27 +
> >  .../fpu/multiarch/svml_d_log1p2_core_sse4.S   | 1398 +++++++++++++++++
> >  .../fpu/multiarch/svml_d_log1p4_core-sse.S    |   20 +
> >  .../x86_64/fpu/multiarch/svml_d_log1p4_core.c |   27 +
> >  .../fpu/multiarch/svml_d_log1p4_core_avx2.S   | 1383 ++++++++++++++++
> >  .../fpu/multiarch/svml_d_log1p8_core-avx2.S   |   20 +
> >  .../x86_64/fpu/multiarch/svml_d_log1p8_core.c |   27 +
> >  .../fpu/multiarch/svml_d_log1p8_core_avx512.S |  317 ++++
> >  .../fpu/multiarch/svml_s_log1pf16_core-avx2.S |   20 +
> >  .../fpu/multiarch/svml_s_log1pf16_core.c      |   28 +
> >  .../multiarch/svml_s_log1pf16_core_avx512.S   |  271 ++++
> >  .../fpu/multiarch/svml_s_log1pf4_core-sse2.S  |   20 +
> >  .../fpu/multiarch/svml_s_log1pf4_core.c       |   28 +
> >  .../fpu/multiarch/svml_s_log1pf4_core_sse4.S  |  252 +++
> >  .../fpu/multiarch/svml_s_log1pf8_core-sse.S   |   20 +
> >  .../fpu/multiarch/svml_s_log1pf8_core.c       |   28 +
> >  .../fpu/multiarch/svml_s_log1pf8_core_avx2.S  |  254 +++
> >  sysdeps/x86_64/fpu/svml_d_log1p2_core.S       |   29 +
> >  sysdeps/x86_64/fpu/svml_d_log1p4_core.S       |   29 +
> >  sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S   |   25 +
> >  sysdeps/x86_64/fpu/svml_d_log1p8_core.S       |   25 +
> >  sysdeps/x86_64/fpu/svml_s_log1pf16_core.S     |   25 +
> >  sysdeps/x86_64/fpu/svml_s_log1pf4_core.S      |   29 +
> >  sysdeps/x86_64/fpu/svml_s_log1pf8_core.S      |   29 +
> >  sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S  |   25 +
> >  .../fpu/test-double-libmvec-log1p-avx.c       |    1 +
> >  .../fpu/test-double-libmvec-log1p-avx2.c      |    1 +
> >  .../fpu/test-double-libmvec-log1p-avx512f.c   |    1 +
> >  .../x86_64/fpu/test-double-libmvec-log1p.c    |    3 +
> >  .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
> >  .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
> >  .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
> >  .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
> >  .../fpu/test-float-libmvec-log1pf-avx.c       |    1 +
> >  .../fpu/test-float-libmvec-log1pf-avx2.c      |    1 +
> >  .../fpu/test-float-libmvec-log1pf-avx512f.c   |    1 +
> >  .../x86_64/fpu/test-float-libmvec-log1pf.c    |    3 +
> >  .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
> >  .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
> >  .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
> >  .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
> >  50 files changed, 4447 insertions(+), 1 deletion(-)
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core-sse2.S
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core.c
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core_sse4.S
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core-sse.S
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core.c
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core_avx2.S
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core-avx2.S
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core.c
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core_avx512.S
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core-avx2.S
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core.c
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core_avx512.S
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core-sse2.S
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core.c
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core_sse4.S
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core-sse.S
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core.c
> >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core_avx2.S
> >  create mode 100644 sysdeps/x86_64/fpu/svml_d_log1p2_core.S
> >  create mode 100644 sysdeps/x86_64/fpu/svml_d_log1p4_core.S
> >  create mode 100644 sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S
> >  create mode 100644 sysdeps/x86_64/fpu/svml_d_log1p8_core.S
> >  create mode 100644 sysdeps/x86_64/fpu/svml_s_log1pf16_core.S
> >  create mode 100644 sysdeps/x86_64/fpu/svml_s_log1pf4_core.S
> >  create mode 100644 sysdeps/x86_64/fpu/svml_s_log1pf8_core.S
> >  create mode 100644 sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S
> >  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx.c
> >  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx2.c
> >  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx512f.c
> >  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log1p.c
> >  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx.c
> >  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx2.c
> >  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx512f.c
> >  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log1pf.c
> >
> > diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
> > index 73252615ca..845246fab9 100644
> > --- a/bits/libm-simd-decl-stubs.h
> > +++ b/bits/libm-simd-decl-stubs.h
> > @@ -241,4 +241,15 @@
> >  #define __DECL_SIMD_log2f32x
> >  #define __DECL_SIMD_log2f64x
> >  #define __DECL_SIMD_log2f128x
> > +
> > +#define __DECL_SIMD_log1p
> > +#define __DECL_SIMD_log1pf
> > +#define __DECL_SIMD_log1pl
> > +#define __DECL_SIMD_log1pf16
> > +#define __DECL_SIMD_log1pf32
> > +#define __DECL_SIMD_log1pf64
> > +#define __DECL_SIMD_log1pf128
> > +#define __DECL_SIMD_log1pf32x
> > +#define __DECL_SIMD_log1pf64x
> > +#define __DECL_SIMD_log1pf128x
> >  #endif
> > diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> > index bfe52a4666..aa4bc61aa4 100644
> > --- a/math/bits/mathcalls.h
> > +++ b/math/bits/mathcalls.h
> > @@ -119,7 +119,7 @@ __MATHCALL_VEC (exp10,, (_Mdouble_ __x));
> >  __MATHCALL_VEC (expm1,, (_Mdouble_ __x));
> >
> >  /* Return log(1 + X).  */
> > -__MATHCALL (log1p,, (_Mdouble_ __x));
> > +__MATHCALL_VEC (log1p,, (_Mdouble_ __x));
> >
> >  /* Return the base 2 signed integral exponent of X.  */
> >  __MATHCALL (logb,, (_Mdouble_ __x));
> > diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> > index fa8b016c5d..68b940606a 100644
> > --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> > +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> > @@ -55,6 +55,7 @@ GLIBC_2.35 _ZGVbN2v_exp10 F
> >  GLIBC_2.35 _ZGVbN2v_exp2 F
> >  GLIBC_2.35 _ZGVbN2v_expm1 F
> >  GLIBC_2.35 _ZGVbN2v_log10 F
> > +GLIBC_2.35 _ZGVbN2v_log1p F
> >  GLIBC_2.35 _ZGVbN2v_log2 F
> >  GLIBC_2.35 _ZGVbN2v_sinh F
> >  GLIBC_2.35 _ZGVbN2vv_atan2 F
> > @@ -68,6 +69,7 @@ GLIBC_2.35 _ZGVbN4v_exp10f F
> >  GLIBC_2.35 _ZGVbN4v_exp2f F
> >  GLIBC_2.35 _ZGVbN4v_expm1f F
> >  GLIBC_2.35 _ZGVbN4v_log10f F
> > +GLIBC_2.35 _ZGVbN4v_log1pf F
> >  GLIBC_2.35 _ZGVbN4v_log2f F
> >  GLIBC_2.35 _ZGVbN4v_sinhf F
> >  GLIBC_2.35 _ZGVbN4vv_atan2f F
> > @@ -81,6 +83,7 @@ GLIBC_2.35 _ZGVcN4v_exp10 F
> >  GLIBC_2.35 _ZGVcN4v_exp2 F
> >  GLIBC_2.35 _ZGVcN4v_expm1 F
> >  GLIBC_2.35 _ZGVcN4v_log10 F
> > +GLIBC_2.35 _ZGVcN4v_log1p F
> >  GLIBC_2.35 _ZGVcN4v_log2 F
> >  GLIBC_2.35 _ZGVcN4v_sinh F
> >  GLIBC_2.35 _ZGVcN4vv_atan2 F
> > @@ -94,6 +97,7 @@ GLIBC_2.35 _ZGVcN8v_exp10f F
> >  GLIBC_2.35 _ZGVcN8v_exp2f F
> >  GLIBC_2.35 _ZGVcN8v_expm1f F
> >  GLIBC_2.35 _ZGVcN8v_log10f F
> > +GLIBC_2.35 _ZGVcN8v_log1pf F
> >  GLIBC_2.35 _ZGVcN8v_log2f F
> >  GLIBC_2.35 _ZGVcN8v_sinhf F
> >  GLIBC_2.35 _ZGVcN8vv_atan2f F
> > @@ -107,6 +111,7 @@ GLIBC_2.35 _ZGVdN4v_exp10 F
> >  GLIBC_2.35 _ZGVdN4v_exp2 F
> >  GLIBC_2.35 _ZGVdN4v_expm1 F
> >  GLIBC_2.35 _ZGVdN4v_log10 F
> > +GLIBC_2.35 _ZGVdN4v_log1p F
> >  GLIBC_2.35 _ZGVdN4v_log2 F
> >  GLIBC_2.35 _ZGVdN4v_sinh F
> >  GLIBC_2.35 _ZGVdN4vv_atan2 F
> > @@ -120,6 +125,7 @@ GLIBC_2.35 _ZGVdN8v_exp10f F
> >  GLIBC_2.35 _ZGVdN8v_exp2f F
> >  GLIBC_2.35 _ZGVdN8v_expm1f F
> >  GLIBC_2.35 _ZGVdN8v_log10f F
> > +GLIBC_2.35 _ZGVdN8v_log1pf F
> >  GLIBC_2.35 _ZGVdN8v_log2f F
> >  GLIBC_2.35 _ZGVdN8v_sinhf F
> >  GLIBC_2.35 _ZGVdN8vv_atan2f F
> > @@ -133,6 +139,7 @@ GLIBC_2.35 _ZGVeN16v_exp10f F
> >  GLIBC_2.35 _ZGVeN16v_exp2f F
> >  GLIBC_2.35 _ZGVeN16v_expm1f F
> >  GLIBC_2.35 _ZGVeN16v_log10f F
> > +GLIBC_2.35 _ZGVeN16v_log1pf F
> >  GLIBC_2.35 _ZGVeN16v_log2f F
> >  GLIBC_2.35 _ZGVeN16v_sinhf F
> >  GLIBC_2.35 _ZGVeN16vv_atan2f F
> > @@ -146,6 +153,7 @@ GLIBC_2.35 _ZGVeN8v_exp10 F
> >  GLIBC_2.35 _ZGVeN8v_exp2 F
> >  GLIBC_2.35 _ZGVeN8v_expm1 F
> >  GLIBC_2.35 _ZGVeN8v_log10 F
> > +GLIBC_2.35 _ZGVeN8v_log1p F
> >  GLIBC_2.35 _ZGVeN8v_log2 F
> >  GLIBC_2.35 _ZGVeN8v_sinh F
> >  GLIBC_2.35 _ZGVeN8vv_atan2 F
> > diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
> > index 59d284a10a..14c9db3bb3 100644
> > --- a/sysdeps/x86/fpu/bits/math-vector.h
> > +++ b/sysdeps/x86/fpu/bits/math-vector.h
> > @@ -110,6 +110,10 @@
> >  #  define __DECL_SIMD_log2 __DECL_SIMD_x86_64
> >  #  undef __DECL_SIMD_log2f
> >  #  define __DECL_SIMD_log2f __DECL_SIMD_x86_64
> > +#  undef __DECL_SIMD_log1p
> > +#  define __DECL_SIMD_log1p __DECL_SIMD_x86_64
> > +#  undef __DECL_SIMD_log1pf
> > +#  define __DECL_SIMD_log1pf __DECL_SIMD_x86_64
> >
> >  # endif
> >  #endif
> > diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> > index a2ca9a203f..3dca196432 100644
> > --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> > +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> > @@ -54,6 +54,8 @@
> >  !GCC$ builtin (log10f) attributes simd (notinbranch) if('x86_64')
> >  !GCC$ builtin (log2) attributes simd (notinbranch) if('x86_64')
> >  !GCC$ builtin (log2f) attributes simd (notinbranch) if('x86_64')
> > +!GCC$ builtin (log1p) attributes simd (notinbranch) if('x86_64')
> > +!GCC$ builtin (log1pf) attributes simd (notinbranch) if('x86_64')
> >
> >  !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
> >  !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
> > @@ -93,3 +95,5 @@
> >  !GCC$ builtin (log10f) attributes simd (notinbranch) if('x32')
> >  !GCC$ builtin (log2) attributes simd (notinbranch) if('x32')
> >  !GCC$ builtin (log2f) attributes simd (notinbranch) if('x32')
> > +!GCC$ builtin (log1p) attributes simd (notinbranch) if('x32')
> > +!GCC$ builtin (log1pf) attributes simd (notinbranch) if('x32')
> > diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> > index 8d6d0915af..378cb06d37 100644
> > --- a/sysdeps/x86_64/fpu/Makeconfig
> > +++ b/sysdeps/x86_64/fpu/Makeconfig
> > @@ -36,6 +36,7 @@ libmvec-funcs = \
> >    hypot \
> >    log \
> >    log10 \
> > +  log1p \
> >    log2 \
> >    pow \
> >    sin \
> > diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> > index 1b48c2d642..155fb115f3 100644
> > --- a/sysdeps/x86_64/fpu/Versions
> > +++ b/sysdeps/x86_64/fpu/Versions
> > @@ -23,6 +23,7 @@ libmvec {
> >      _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
> >      _ZGVbN2v_expm1; _ZGVcN4v_expm1; _ZGVdN4v_expm1; _ZGVeN8v_expm1;
> >      _ZGVbN2v_log10; _ZGVcN4v_log10; _ZGVdN4v_log10; _ZGVeN8v_log10;
> > +    _ZGVbN2v_log1p; _ZGVcN4v_log1p; _ZGVdN4v_log1p; _ZGVeN8v_log1p;
> >      _ZGVbN2v_log2; _ZGVcN4v_log2; _ZGVdN4v_log2; _ZGVeN8v_log2;
> >      _ZGVbN2v_sinh; _ZGVcN4v_sinh; _ZGVdN4v_sinh; _ZGVeN8v_sinh;
> >      _ZGVbN2vv_atan2; _ZGVcN4vv_atan2; _ZGVdN4vv_atan2; _ZGVeN8vv_atan2;
> > @@ -36,6 +37,7 @@ libmvec {
> >      _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
> >      _ZGVbN4v_expm1f; _ZGVcN8v_expm1f; _ZGVdN8v_expm1f; _ZGVeN16v_expm1f;
> >      _ZGVbN4v_log10f; _ZGVcN8v_log10f; _ZGVdN8v_log10f; _ZGVeN16v_log10f;
> > +    _ZGVbN4v_log1pf; _ZGVcN8v_log1pf; _ZGVdN8v_log1pf; _ZGVeN16v_log1pf;
> >      _ZGVbN4v_log2f; _ZGVcN8v_log2f; _ZGVdN8v_log2f; _ZGVeN16v_log2f;
> >      _ZGVbN4v_sinhf; _ZGVcN8v_sinhf; _ZGVdN8v_sinhf; _ZGVeN16v_sinhf;
> >      _ZGVbN4vv_atan2f; _ZGVcN8vv_atan2f; _ZGVdN8vv_atan2f; _ZGVeN16vv_atan2f;
> > diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> > index 3b7f3cee6f..a2b15a795b 100644
> > --- a/sysdeps/x86_64/fpu/libm-test-ulps
> > +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> > @@ -1685,6 +1685,26 @@ float: 2
> >  float128: 2
> >  ldouble: 3
> >
> > +Function: "log1p_vlen16":
> > +float: 2
> > +
> > +Function: "log1p_vlen2":
> > +double: 1
> > +
> > +Function: "log1p_vlen4":
> > +double: 1
> > +float: 2
> > +
> > +Function: "log1p_vlen4_avx2":
> > +double: 1
> > +
> > +Function: "log1p_vlen8":
> > +double: 1
> > +float: 2
> > +
> > +Function: "log1p_vlen8_avx2":
> > +float: 2
> > +
> >  Function: "log2":
> >  double: 2
> >  float: 1
> > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core-sse2.S
> > new file mode 100644
> > index 0000000000..8004088346
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core-sse2.S
> > @@ -0,0 +1,20 @@
> > +/* SSE2 version of vectorized log1p, vector length is 2.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#define _ZGVbN2v_log1p _ZGVbN2v_log1p_sse2
> > +#include "../svml_d_log1p2_core.S"
> > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core.c
> > new file mode 100644
> > index 0000000000..35ca620aba
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core.c
> > @@ -0,0 +1,27 @@
> > +/* Multiple versions of vectorized log1p, vector length is 2.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#define SYMBOL_NAME _ZGVbN2v_log1p
> > +#include "ifunc-mathvec-sse4_1.h"
> > +
> > +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> > +
> > +#ifdef SHARED
> > +__hidden_ver1 (_ZGVbN2v_log1p, __GI__ZGVbN2v_log1p, __redirect__ZGVbN2v_log1p)
> > +  __attribute__ ((visibility ("hidden")));
> > +#endif
> > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core_sse4.S
> > new file mode 100644
> > index 0000000000..9d3f0647b4
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core_sse4.S
> > @@ -0,0 +1,1398 @@
> > +/* Function log1p vectorized with SSE4.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   https://www.gnu.org/licenses/.  */
> > +
> > +/*
> > + * ALGORITHM DESCRIPTION:
> > + *
> > + *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
> > + *    Get short reciprocal approximation Rcp ~ 1/xh
> > + *    R = (Rcp*xh - 1.0) + Rcp*xl
> > + *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
> > + *       log(Rcp) is tabulated
> > + *
> > + *
> > + */
> > +
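For readers following the assembly, here is a rough scalar C sketch of the
reduction described in the comment above.  It is not part of the patch: the
short rcpps reciprocal, the tabulated log(Rcp) and the minimax poly(R) are
stood in for by plain libm calls and a low-order series, so only the overall
structure matches the SSE4 code below.

#include <math.h>

/* Scalar model: split 1+x into high/low parts, peel off the exponent k,
   take a reciprocal r ~ 1/xh, then log1p(x) = k*ln2 - log(r) + poly(R).  */
static double
log1p_sketch (double x)
{
  double hi = 1.0 + x;                          /* high part of 1+x */
  double lo = fabs (x) < 1.0 ? (1.0 - hi) + x   /* rounding error of the add */
                             : (x - hi) + 1.0;
  int k;
  double xh = 2.0 * frexp (hi, &k);             /* xh in [1,2), hi = 2^(k-1)*xh */
  k -= 1;
  double xl = ldexp (lo, -k);                   /* low part rescaled to match xh */
  double r = 1.0 / xh;                          /* stands in for the short reciprocal */
  double R = (r * xh - 1.0) + r * xl;           /* argument reduction */
  /* Low-order series stands in for poly(R); -log(r) for the table entry.  */
  return k * M_LN2 - log (r) + (R - 0.5 * R * R + R * R * R / 3.0);
}
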
> > +/* Offsets for data table __svml_dlog1p_data_internal
> > + */
> > +#define Log_HA_table                         0

Where is this used?

> > +#define Log_LA_table                         8208
> > +#define poly_coeff                           12320
> > +#define ExpMask                              12384
> > +#define Two10                                12400
> > +#define MinLog1p                             12416
> > +#define MaxLog1p                             12432
> > +#define One                                  12448
> > +#define SgnMask                              12464
> > +#define XThreshold                           12480
> > +#define XhMask                               12496
> > +#define Threshold                            12512
> > +#define Bias                                 12528
> > +#define Bias1                                12544
> > +#define ExpMask0                             12560
> > +#define ExpMask2                             12576
> > +#define L2                                   12592
> > +
> > +/* Lookup bias for data table __svml_dlog1p_data_internal.  */
> > +#define Table_Lookup_Bias               -0x405ff0
> > +
> > +#include <sysdep.h>
> > +
> > +        .text
> > +     .section .text.sse4,"ax",@progbits
> > +ENTRY(_ZGVbN2v_log1p_sse4)
> > +        pushq     %rbp
> > +        cfi_def_cfa_offset(16)
> > +        movq      %rsp, %rbp
> > +        cfi_def_cfa(6, 16)
> > +        cfi_offset(6, -16)
> > +        andq      $-32, %rsp
> > +        subq      $64, %rsp
> > +        movaps    %xmm0, %xmm7
> > +
> > +/* SgnMask used by all accuracies */
> > +        movups    SgnMask+__svml_dlog1p_data_internal(%rip), %xmm6
> > +        lea       Table_Lookup_Bias+__svml_dlog1p_data_internal(%rip), %rsi
> > +        movaps    %xmm6, %xmm8
> > +        movaps    %xmm7, %xmm15
> > +        movups    One+__svml_dlog1p_data_internal(%rip), %xmm0
> > +        andps     %xmm7, %xmm8
> > +        cmpltpd   XThreshold+__svml_dlog1p_data_internal(%rip), %xmm8
> > +        cmpnlepd  MaxLog1p+__svml_dlog1p_data_internal(%rip), %xmm15
> > +        movaps    %xmm0, %xmm4
> > +
> > +/* compute 1+x as high, low parts */
> > +        movaps    %xmm0, %xmm9
> > +        addpd     %xmm7, %xmm4
> > +        maxpd     %xmm7, %xmm9
> > +        orps      XhMask+__svml_dlog1p_data_internal(%rip), %xmm8
> > +        movaps    %xmm0, %xmm5
> > +
> > +/* preserve mantissa, set input exponent to 2^(-10) */
> > +        movups    ExpMask+__svml_dlog1p_data_internal(%rip), %xmm3
> > +        andps     %xmm8, %xmm4
> > +        andps     %xmm4, %xmm3
> > +
> > +/* check range */
> > +        movaps    %xmm7, %xmm8
> > +        orps      Two10+__svml_dlog1p_data_internal(%rip), %xmm3
> > +
> > +/* Compute SignMask for all accuracies, including EP */
> > +        andnps    %xmm7, %xmm6
> > +
> > +/* reciprocal approximation good to at least 11 bits */
> > +        cvtpd2ps  %xmm3, %xmm10
> > +        minpd     %xmm7, %xmm5
> > +        subpd     %xmm4, %xmm9
> > +        cmpltpd   MinLog1p+__svml_dlog1p_data_internal(%rip), %xmm8
> > +        addpd     %xmm9, %xmm5
> > +        movlhps   %xmm10, %xmm10
> > +        orps      %xmm15, %xmm8
> > +        rcpps     %xmm10, %xmm11
> > +
> > +/* combine and get argument value range mask */
> > +        movmskpd  %xmm8, %edx
> > +
> > +/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
> > +        movups    .FLT_16(%rip), %xmm13
> > +
> > +/* exponent of X needed to scale Xl */
> > +        movdqu    ExpMask0+__svml_dlog1p_data_internal(%rip), %xmm12
> > +        cvtps2pd  %xmm11, %xmm1
> > +        addpd     %xmm13, %xmm1
> > +        subpd     %xmm13, %xmm1
> > +
> > +/* 2^ (-10-exp(X) ) */
> > +        movdqu    ExpMask2+__svml_dlog1p_data_internal(%rip), %xmm2
> > +        pand      %xmm4, %xmm12
> > +        psubq     %xmm12, %xmm2
> > +        mulpd     %xmm1, %xmm3
> > +
> > +/* scale DblRcp */
> > +        mulpd     %xmm1, %xmm2
> > +        subpd     %xmm0, %xmm3
> > +
> > +/*
> > + * argument reduction
> > + * VQFMS( D, R, X, DblRcp1, One );
> > + */
> > +        mulpd     %xmm2, %xmm5
> > +        addpd     %xmm5, %xmm3
> > +
> > +/* exponent*log(2.0) */
> > +        movups    Threshold+__svml_dlog1p_data_internal(%rip), %xmm10
> > +
> > +/* exponent bits */
> > +        psrlq     $20, %xmm4
> > +        pshufd    $221, %xmm4, %xmm14
> > +
> > +/*
> > + * prepare table index
> > + * table lookup
> > + */
> > +        movaps    %xmm1, %xmm4
> > +        cmpltpd   %xmm1, %xmm10
> > +
> > +/* biased exponent in DP format */
> > +        cvtdq2pd  %xmm14, %xmm0
> > +
> > +/* polynomial */
> > +        movups    poly_coeff+__svml_dlog1p_data_internal(%rip), %xmm1
> > +        movaps    %xmm3, %xmm5
> > +        mulpd     %xmm3, %xmm1
> > +        mulpd     %xmm3, %xmm5
> > +        addpd     poly_coeff+16+__svml_dlog1p_data_internal(%rip), %xmm1
> > +        movups    poly_coeff+32+__svml_dlog1p_data_internal(%rip), %xmm2
> > +        psrlq     $40, %xmm4
> > +        mulpd     %xmm3, %xmm2
> > +        mulpd     %xmm5, %xmm1
> > +        addpd     poly_coeff+48+__svml_dlog1p_data_internal(%rip), %xmm2
> > +        movd      %xmm4, %eax
> > +        andps     Bias+__svml_dlog1p_data_internal(%rip), %xmm10
> > +        addpd     %xmm1, %xmm2
> > +
> > +/* reconstruction */
> > +        mulpd     %xmm2, %xmm5
> > +        orps      Bias1+__svml_dlog1p_data_internal(%rip), %xmm10
> > +        pshufd    $2, %xmm4, %xmm9
> > +        subpd     %xmm10, %xmm0
> > +        addpd     %xmm5, %xmm3
> > +        movd      %xmm9, %ecx
> > +        mulpd     L2+__svml_dlog1p_data_internal(%rip), %xmm0
> > +        movslq    %eax, %rax
> > +        movslq    %ecx, %rcx
> > +        movsd     (%rsi,%rax), %xmm11
> > +        movhpd    (%rsi,%rcx), %xmm11
> > +        addpd     %xmm3, %xmm11
> > +        addpd     %xmm11, %xmm0
> > +
> > +/* OR in the Sign of input argument to produce correct log1p(-0) */
> > +        orps      %xmm6, %xmm0
> > +        testl     %edx, %edx
> > +
> > +/* Go to special inputs processing branch */
> > +        jne       L(SPECIAL_VALUES_BRANCH)
> > +                                # LOE rbx r12 r13 r14 r15 edx xmm0 xmm7
> > +
> > +/* Restore registers
> > + * and exit the function
> > + */
> > +
> > +L(EXIT):
> > +        movq      %rbp, %rsp
> > +        popq      %rbp
> > +        cfi_def_cfa(7, 8)
> > +        cfi_restore(6)
> > +        ret
> > +        cfi_def_cfa(6, 16)
> > +        cfi_offset(6, -16)
> > +
> > +/* Branch to process
> > + * special inputs
> > + */
> > +
> > +L(SPECIAL_VALUES_BRANCH):
> > +        movups    %xmm7, 32(%rsp)
> > +        movups    %xmm0, 48(%rsp)
> > +                                # LOE rbx r12 r13 r14 r15 edx
> > +
> > +        xorl      %eax, %eax
> > +        movq      %r12, 16(%rsp)
> > +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
> > +        movl      %eax, %r12d
> > +        movq      %r13, 8(%rsp)
> > +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
> > +        movl      %edx, %r13d
> > +        movq      %r14, (%rsp)
> > +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
> > +                                # LOE rbx r15 r12d r13d
> > +
> > +/* Range mask
> > + * bits check
> > + */
> > +
> > +L(RANGEMASK_CHECK):
> > +        btl       %r12d, %r13d
> > +
> > +/* Call scalar math function */
> > +        jc        L(SCALAR_MATH_CALL)
> > +                                # LOE rbx r15 r12d r13d
> > +
> > +/* Special inputs
> > + * processing loop
> > + */
> > +
> > +L(SPECIAL_VALUES_LOOP):
> > +        incl      %r12d
> > +        cmpl      $2, %r12d
> > +
> > +/* Check bits in range mask */
> > +        jl        L(RANGEMASK_CHECK)
> > +                                # LOE rbx r15 r12d r13d
> > +
> > +        movq      16(%rsp), %r12
> > +        cfi_restore(12)
> > +        movq      8(%rsp), %r13
> > +        cfi_restore(13)
> > +        movq      (%rsp), %r14
> > +        cfi_restore(14)
> > +        movups    48(%rsp), %xmm0
> > +
> > +/* Go to exit */
> > +        jmp       L(EXIT)
> > +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
> > +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
> > +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
> > +                                # LOE rbx r12 r13 r14 r15 xmm0
> > +
> > +/* Scalar math function call
> > + * to process special input
> > + */
> > +
> > +L(SCALAR_MATH_CALL):
> > +        movl      %r12d, %r14d
> > +        movsd     32(%rsp,%r14,8), %xmm0
> > +        call      log1p@PLT
> > +                                # LOE rbx r14 r15 r12d r13d xmm0
> > +
> > +        movsd     %xmm0, 48(%rsp,%r14,8)
> > +
> > +/* Process special inputs in loop */
> > +        jmp       L(SPECIAL_VALUES_LOOP)
> > +                                # LOE rbx r15 r12d r13d
> > +END(_ZGVbN2v_log1p_sse4)
> > +
> > +        .section .rodata, "a"
> > +        .align 16
> > +
> > +#ifdef __svml_dlog1p_data_internal_typedef
> > +typedef unsigned int VUINT32;
> > +typedef struct {
> > +        __declspec(align(16)) VUINT32 Log_HA_table[(1<<10)+2][2];
> > +        __declspec(align(16)) VUINT32 Log_LA_table[(1<<9)+1][2];
> > +        __declspec(align(16)) VUINT32 poly_coeff[4][2][2];
> > +        __declspec(align(16)) VUINT32 ExpMask[2][2];
> > +        __declspec(align(16)) VUINT32 Two10[2][2];
> > +        __declspec(align(16)) VUINT32 MinLog1p[2][2];
> > +        __declspec(align(16)) VUINT32 MaxLog1p[2][2];
> > +        __declspec(align(16)) VUINT32 One[2][2];
> > +        __declspec(align(16)) VUINT32 SgnMask[2][2];
> > +        __declspec(align(16)) VUINT32 XThreshold[2][2];
> > +        __declspec(align(16)) VUINT32 XhMask[2][2];
> > +        __declspec(align(16)) VUINT32 Threshold[2][2];
> > +        __declspec(align(16)) VUINT32 Bias[2][2];
> > +        __declspec(align(16)) VUINT32 Bias1[2][2];
> > +        __declspec(align(16)) VUINT32 ExpMask0[2][2];
> > +        __declspec(align(16)) VUINT32 ExpMask2[2][2];
> > +        __declspec(align(16)) VUINT32 L2[2][2];
> > +} __svml_dlog1p_data_internal;
> > +#endif
> > +__svml_dlog1p_data_internal:
> > +        /* Log_HA_table */
> > +        .quad 0xc086232bdd7a8300, 0xbe1ce91eef3fb100
> > +        .quad 0xc086232fdc7ad828, 0xbe1cefcffda73b6a
> > +        .quad 0xc0862333d97d2ba0, 0xbe1cef406748f1ff
> > +        .quad 0xc0862337d48378e0, 0xbe1cef2a9429925a
> > +        .quad 0xc086233bcd8fb878, 0xbe1cf138d17ebecb
> > +        .quad 0xc086233fc4a3e018, 0xbe1ceff2dbbbb29e
> > +        .quad 0xc0862343b9c1e270, 0xbe1cf1a42aae437b
> > +        .quad 0xc0862347acebaf68, 0xbe1cef3b152048af
> > +        .quad 0xc086234b9e2333f0, 0xbe1cef20e127805e
> > +        .quad 0xc086234f8d6a5a30, 0xbe1cf00ad6052cf4
> > +        .quad 0xc08623537ac30980, 0xbe1cefc4642ee597
> > +        .quad 0xc0862357662f2660, 0xbe1cf1f277d36e16
> > +        .quad 0xc086235b4fb092a0, 0xbe1ceed009e8d8e6
> > +        .quad 0xc086235f37492d28, 0xbe1cf1e4038cb362
> > +        .quad 0xc08623631cfad250, 0xbe1cf0b0873b8557
> > +        .quad 0xc086236700c75b98, 0xbe1cf15bb3227c0b
> > +        .quad 0xc086236ae2b09fe0, 0xbe1cf151ef8ca9ed
> > +        .quad 0xc086236ec2b87358, 0xbe1cefe1dc2cd2ed
> > +        .quad 0xc0862372a0e0a780, 0xbe1cf0d1eec5454f
> > +        .quad 0xc08623767d2b0b48, 0xbe1ceeefd570bbce
> > +        .quad 0xc086237a57996af0, 0xbe1cee99ae91b3a7
> > +        .quad 0xc086237e302d9028, 0xbe1cf0412830fbd1
> > +        .quad 0xc086238206e94218, 0xbe1ceee898588610
> > +        .quad 0xc0862385dbce4548, 0xbe1cee9a1fbcaaea
> > +        .quad 0xc0862389aede5bc0, 0xbe1ceed8e7cc1ad6
> > +        .quad 0xc086238d801b4500, 0xbe1cf10c8d059da6
> > +        .quad 0xc08623914f86be18, 0xbe1ceee6c63a8165
> > +        .quad 0xc08623951d228180, 0xbe1cf0c3592d2ff1
> > +        .quad 0xc0862398e8f04758, 0xbe1cf0026cc4cb1b
> > +        .quad 0xc086239cb2f1c538, 0xbe1cf15d48d8e670
> > +        .quad 0xc08623a07b28ae60, 0xbe1cef359363787c
> > +        .quad 0xc08623a44196b390, 0xbe1cefdf1ab2e82c
> > +        .quad 0xc08623a8063d8338, 0xbe1cefe43c02aa84
> > +        .quad 0xc08623abc91ec960, 0xbe1cf044f5ae35b7
> > +        .quad 0xc08623af8a3c2fb8, 0xbe1cf0b0b4001e1b
> > +        .quad 0xc08623b349975d98, 0xbe1cf1bae76dfbcf
> > +        .quad 0xc08623b70731f810, 0xbe1cef0a72e13a62
> > +        .quad 0xc08623bac30da1c8, 0xbe1cf184007d2b6b
> > +        .quad 0xc08623be7d2bfb40, 0xbe1cf16f4b239e98
> > +        .quad 0xc08623c2358ea2a0, 0xbe1cf0976acada87
> > +        .quad 0xc08623c5ec3733d0, 0xbe1cf066318a16ff
> > +        .quad 0xc08623c9a1274880, 0xbe1ceffaa7148798
> > +        .quad 0xc08623cd54607820, 0xbe1cf23ab02e9b6e
> > +        .quad 0xc08623d105e45800, 0xbe1cefdfef7d4fde
> > +        .quad 0xc08623d4b5b47b20, 0xbe1cf17fece44f2b
> > +        .quad 0xc08623d863d27270, 0xbe1cf18f907d0d7c
> > +        .quad 0xc08623dc103fccb0, 0xbe1cee61fe072c98
> > +        .quad 0xc08623dfbafe1668, 0xbe1cf022dd891e2f
> > +        .quad 0xc08623e3640eda20, 0xbe1ceecc1daf4358
> > +        .quad 0xc08623e70b73a028, 0xbe1cf0173c4fa380
> > +        .quad 0xc08623eab12deec8, 0xbe1cf16a2150c2f4
> > +        .quad 0xc08623ee553f4a30, 0xbe1cf1bf980b1f4b
> > +        .quad 0xc08623f1f7a93480, 0xbe1cef8b731663c2
> > +        .quad 0xc08623f5986d2dc0, 0xbe1cee9a664d7ef4
> > +        .quad 0xc08623f9378cb3f0, 0xbe1cf1eda2af6400
> > +        .quad 0xc08623fcd5094320, 0xbe1cf1923f9d68d7
> > +        .quad 0xc086240070e45548, 0xbe1cf0747cd3e03a
> > +        .quad 0xc08624040b1f6260, 0xbe1cf22ee855bd6d
> > +        .quad 0xc0862407a3bbe078, 0xbe1cf0d57360c00b
> > +        .quad 0xc086240b3abb4398, 0xbe1ceebc815cd575
> > +        .quad 0xc086240ed01efdd0, 0xbe1cf03bfb970951
> > +        .quad 0xc086241263e87f50, 0xbe1cf16e74768529
> > +        .quad 0xc0862415f6193658, 0xbe1cefec64b8becb
> > +        .quad 0xc086241986b28f30, 0xbe1cf0838d210baa
> > +        .quad 0xc086241d15b5f448, 0xbe1cf0ea86e75b11
> > +        .quad 0xc0862420a324ce28, 0xbe1cf1708d11d805
> > +        .quad 0xc08624242f008380, 0xbe1ceea988c5a417
> > +        .quad 0xc0862427b94a7910, 0xbe1cef166a7bbca5
> > +        .quad 0xc086242b420411d0, 0xbe1cf0c9d9e86a38
> > +        .quad 0xc086242ec92eaee8, 0xbe1cef0946455411
> > +        .quad 0xc08624324ecbaf98, 0xbe1cefea60907739
> > +        .quad 0xc0862435d2dc7160, 0xbe1cf1ed0934ce42
> > +        .quad 0xc086243955624ff8, 0xbe1cf191ba746c7d
> > +        .quad 0xc086243cd65ea548, 0xbe1ceeec78cf2a7e
> > +        .quad 0xc086244055d2c968, 0xbe1cef345284c119
> > +        .quad 0xc0862443d3c012b8, 0xbe1cf24f77355219
> > +        .quad 0xc08624475027d5e8, 0xbe1cf05bf087e114
> > +        .quad 0xc086244acb0b65d0, 0xbe1cef3504a32189
> > +        .quad 0xc086244e446c1398, 0xbe1ceff54b2a406f
> > +        .quad 0xc0862451bc4b2eb8, 0xbe1cf0757d54ed4f
> > +        .quad 0xc086245532aa04f0, 0xbe1cf0c8099fdfd5
> > +        .quad 0xc0862458a789e250, 0xbe1cf0b173796a31
> > +        .quad 0xc086245c1aec1138, 0xbe1cf11d8734540d
> > +        .quad 0xc086245f8cd1da60, 0xbe1cf1916a723ceb
> > +        .quad 0xc0862462fd3c84d8, 0xbe1cf19a911e1da7
> > +        .quad 0xc08624666c2d5608, 0xbe1cf23a9ef72e4f
> > +        .quad 0xc0862469d9a591c0, 0xbe1cef503d947663
> > +        .quad 0xc086246d45a67a18, 0xbe1cf0fceeb1a0b2
> > +        .quad 0xc0862470b0314fa8, 0xbe1cf107e27e4fbc
> > +        .quad 0xc086247419475160, 0xbe1cf03dd9922331
> > +        .quad 0xc086247780e9bc98, 0xbe1cefce1a10e129
> > +        .quad 0xc086247ae719cd18, 0xbe1ceea47f73c4f6
> > +        .quad 0xc086247e4bd8bd10, 0xbe1ceec0ac56d100
> > +        .quad 0xc0862481af27c528, 0xbe1cee8a6593278a
> > +        .quad 0xc086248511081c70, 0xbe1cf2231dd9dec7
> > +        .quad 0xc0862488717af888, 0xbe1cf0b4b8ed7da8
> > +        .quad 0xc086248bd0818d68, 0xbe1cf1bd8d835002
> > +        .quad 0xc086248f2e1d0d98, 0xbe1cf259acc107f4
> > +        .quad 0xc08624928a4eaa20, 0xbe1cee897636b00c
> > +        .quad 0xc0862495e5179270, 0xbe1cee757f20c326
> > +        .quad 0xc08624993e78f490, 0xbe1cefafd3aa54a4
> > +        .quad 0xc086249c9673fd10, 0xbe1cee7298d38b97
> > +        .quad 0xc086249fed09d6f8, 0xbe1ceedc158d4ceb
> > +        .quad 0xc08624a3423babe0, 0xbe1cf2282987cb2e
> > +        .quad 0xc08624a6960aa400, 0xbe1cefe7381ecc4b
> > +        .quad 0xc08624a9e877e600, 0xbe1cef328dbbce80
> > +        .quad 0xc08624ad39849728, 0xbe1cefde45f3cc71
> > +        .quad 0xc08624b08931db58, 0xbe1cefa8b89433b9
> > +        .quad 0xc08624b3d780d500, 0xbe1cef6773c0b139
> > +        .quad 0xc08624b72472a528, 0xbe1cf031c931c11f
> > +        .quad 0xc08624ba70086b78, 0xbe1cf088f49275e7
> > +        .quad 0xc08624bdba434630, 0xbe1cf17de0eaa86d
> > +        .quad 0xc08624c103245238, 0xbe1cefd492f1ba75
> > +        .quad 0xc08624c44aacab08, 0xbe1cf1253e154466
> > +        .quad 0xc08624c790dd6ad0, 0xbe1cf0fb09ee6d55
> > +        .quad 0xc08624cad5b7aa58, 0xbe1cf1f08dd048fe
> > +        .quad 0xc08624ce193c8120, 0xbe1ceeca0809697f
> > +        .quad 0xc08624d15b6d0538, 0xbe1cef8d5662d968
> > +        .quad 0xc08624d49c4a4b78, 0xbe1cee97b556ed78
> > +        .quad 0xc08624d7dbd56750, 0xbe1cf1b14b6acb75
> > +        .quad 0xc08624db1a0f6b00, 0xbe1cef1e860623f2
> > +        .quad 0xc08624de56f96758, 0xbe1ceeaf4d156f3d
> > +        .quad 0xc08624e192946bf0, 0xbe1ceecc12b400ed
> > +        .quad 0xc08624e4cce18710, 0xbe1cf180c40c794f
> > +        .quad 0xc08624e805e1c5c8, 0xbe1cf185a08f7f65
> > +        .quad 0xc08624eb3d9633d8, 0xbe1cef45fc924078
> > +        .quad 0xc08624ee73ffdbb0, 0xbe1cf1e4f457f32a
> > +        .quad 0xc08624f1a91fc6a0, 0xbe1cf040147b8a5a
> > +        .quad 0xc08624f4dcf6fc98, 0xbe1cf1effca0dfb2
> > +        .quad 0xc08624f80f868468, 0xbe1cf0470146e5bc
> > +        .quad 0xc08624fb40cf6390, 0xbe1cef4dd186e501
> > +        .quad 0xc08624fe70d29e60, 0xbe1ceebe257f66c7
> > +        .quad 0xc08625019f9137f0, 0xbe1ceefb7a1c395c
> > +        .quad 0xc0862504cd0c3220, 0xbe1cf209dedfed8c
> > +        .quad 0xc0862507f9448db0, 0xbe1cf082da464994
> > +        .quad 0xc086250b243b4a18, 0xbe1cee88694a73cf
> > +        .quad 0xc086250e4df165a0, 0xbe1cf0b61e8f0531
> > +        .quad 0xc08625117667dd78, 0xbe1cf1106599c962
> > +        .quad 0xc08625149d9fad98, 0xbe1ceff1ee88af1f
> > +        .quad 0xc0862517c399d0c8, 0xbe1cf0f746994ef6
> > +        .quad 0xc086251ae85740b8, 0xbe1cefe8a1d077e4
> > +        .quad 0xc086251e0bd8f5e0, 0xbe1cf1a1da036092
> > +        .quad 0xc08625212e1fe7a8, 0xbe1cf0f8a7786fcd
> > +        .quad 0xc08625244f2d0c48, 0xbe1cefa1174a07a7
> > +        .quad 0xc08625276f0158d8, 0xbe1cef1043aa5b25
> > +        .quad 0xc086252a8d9dc150, 0xbe1cf15d521c169d
> > +        .quad 0xc086252dab033898, 0xbe1cf220bba8861f
> > +        .quad 0xc0862530c732b078, 0xbe1cef51e310eae2
> > +        .quad 0xc0862533e22d1988, 0xbe1cf222fcedd8ae
> > +        .quad 0xc0862536fbf36370, 0xbe1cefdb4da4bda8
> > +        .quad 0xc086253a14867ca0, 0xbe1ceeafc1112171
> > +        .quad 0xc086253d2be75280, 0xbe1cee99dfb4b408
> > +        .quad 0xc08625404216d160, 0xbe1cf22d2536f06b
> > +        .quad 0xc08625435715e498, 0xbe1cef6abbf2e268
> > +        .quad 0xc08625466ae57648, 0xbe1cf093a14789f5
> > +        .quad 0xc08625497d866fa0, 0xbe1cf0f93655603c
> > +        .quad 0xc086254c8ef9b8b8, 0xbe1cf1cc40c9aafc
> > +        .quad 0xc086254f9f4038a8, 0xbe1ceeea5f4e9157
> > +        .quad 0xc0862552ae5ad568, 0xbe1cefa9f52d4997
> > +        .quad 0xc0862555bc4a7400, 0xbe1cefa490a638ff
> > +        .quad 0xc0862558c90ff868, 0xbe1cef7fcf797d6f
> > +        .quad 0xc086255bd4ac4590, 0xbe1cf1b4c51113c9
> > +        .quad 0xc086255edf203d78, 0xbe1cef55e5b4a55d
> > +        .quad 0xc0862561e86cc100, 0xbe1cf0d37a25f9dc
> > +        .quad 0xc0862564f092b028, 0xbe1ceebe9efc19d9
> > +        .quad 0xc0862567f792e9d8, 0xbe1cee8ad30a57b5
> > +        .quad 0xc086256afd6e4c08, 0xbe1cef4e1817b90b
> > +        .quad 0xc086256e0225b3b8, 0xbe1cee7fa9229996
> > +        .quad 0xc086257105b9fce0, 0xbe1cf0b54963d945
> > +        .quad 0xc0862574082c0298, 0xbe1cee5f2f3c7995
> > +        .quad 0xc0862577097c9ee0, 0xbe1cf0828e303a2c
> > +        .quad 0xc086257a09acaae0, 0xbe1cf172c3078947
> > +        .quad 0xc086257d08bcfec0, 0xbe1cf189252afa22
> > +        .quad 0xc086258006ae71b8, 0xbe1cefdb80426923
> > +        .quad 0xc08625830381da08, 0xbe1ceef1391a0372
> > +        .quad 0xc0862585ff380d00, 0xbe1cf17720c78d13
> > +        .quad 0xc0862588f9d1df18, 0xbe1ceef1f9027d83
> > +        .quad 0xc086258bf35023b8, 0xbe1cf06fac99dec9
> > +        .quad 0xc086258eebb3ad78, 0xbe1cf1373eeb45c0
> > +        .quad 0xc0862591e2fd4e00, 0xbe1cef777536bb81
> > +        .quad 0xc0862594d92dd600, 0xbe1cf0f43ca40766
> > +        .quad 0xc0862597ce461558, 0xbe1cefb2cfc6766b
> > +        .quad 0xc086259ac246daf0, 0xbe1ceea49e64ffa2
> > +        .quad 0xc086259db530f4c8, 0xbe1cf250fa457dec
> > +        .quad 0xc08625a0a7053018, 0xbe1cf17d8bb2a44e
> > +        .quad 0xc08625a397c45918, 0xbe1cf1d5906d54b7
> > +        .quad 0xc08625a6876f3b30, 0xbe1cf08fe7b31780
> > +        .quad 0xc08625a97606a0e0, 0xbe1cef13edfc9d11
> > +        .quad 0xc08625ac638b53c8, 0xbe1cef9d2b107219
> > +        .quad 0xc08625af4ffe1cb0, 0xbe1cf1ddd4ff6160
> > +        .quad 0xc08625b23b5fc390, 0xbe1cefa02a996495
> > +        .quad 0xc08625b525b10f68, 0xbe1cf166a7e37ee5
> > +        .quad 0xc08625b80ef2c680, 0xbe1cef0b171068a5
> > +        .quad 0xc08625baf725ae28, 0xbe1cf05c80779283
> > +        .quad 0xc08625bdde4a8af0, 0xbe1cf1bbfbffb889
> > +        .quad 0xc08625c0c4622090, 0xbe1cf0b8666c0124
> > +        .quad 0xc08625c3a96d31e0, 0xbe1cf0a8fcf47a86
> > +        .quad 0xc08625c68d6c80f0, 0xbe1cef46e18cb092
> > +        .quad 0xc08625c97060cef0, 0xbe1cf1458a350efb
> > +        .quad 0xc08625cc524adc58, 0xbe1ceeea1dadce12
> > +        .quad 0xc08625cf332b68b0, 0xbe1cf0a1bfdc44c7
> > +        .quad 0xc08625d2130332d0, 0xbe1cef96d02da73e
> > +        .quad 0xc08625d4f1d2f8a8, 0xbe1cf2451c3c7701
> > +        .quad 0xc08625d7cf9b7778, 0xbe1cf10d08f83812
> > +        .quad 0xc08625daac5d6ba0, 0xbe1ceec5b4895c5e
> > +        .quad 0xc08625dd881990b0, 0xbe1cf14e1325c5e4
> > +        .quad 0xc08625e062d0a188, 0xbe1cf21d0904be12
> > +        .quad 0xc08625e33c835838, 0xbe1ceed0839bcf21
> > +        .quad 0xc08625e615326df0, 0xbe1cf1bb944889d2
> > +        .quad 0xc08625e8ecde9b48, 0xbe1cee738e85eece
> > +        .quad 0xc08625ebc38897e0, 0xbe1cf25c2bc6ef12
> > +        .quad 0xc08625ee99311ac8, 0xbe1cf132b70a41ad
> > +        .quad 0xc08625f16dd8da28, 0xbe1cf1984236a6e3
> > +        .quad 0xc08625f441808b78, 0xbe1cf19ae74998f9
> > +        .quad 0xc08625f71428e370, 0xbe1cef3e175d61a1
> > +        .quad 0xc08625f9e5d295f8, 0xbe1cf101f9868fd9
> > +        .quad 0xc08625fcb67e5658, 0xbe1cee69db83dcd2
> > +        .quad 0xc08625ff862cd6f8, 0xbe1cf081b636af51
> > +        .quad 0xc086260254dec9a8, 0xbe1cee62c7d59b3e
> > +        .quad 0xc08626052294df58, 0xbe1cf1b745c57716
> > +        .quad 0xc0862607ef4fc868, 0xbe1cef3d2800ea23
> > +        .quad 0xc086260abb103458, 0xbe1cef480ff1acd2
> > +        .quad 0xc086260d85d6d200, 0xbe1cf2424c9a17ef
> > +        .quad 0xc08626104fa44f90, 0xbe1cf12cfde90fd5
> > +        .quad 0xc086261318795a68, 0xbe1cf21f590dd5b6
> > +        .quad 0xc0862615e0569f48, 0xbe1cf0c50f9cd28a
> > +        .quad 0xc0862618a73cca30, 0xbe1ceedbdb520545
> > +        .quad 0xc086261b6d2c8668, 0xbe1cf0b030396011
> > +        .quad 0xc086261e32267e98, 0xbe1cf19917010e96
> > +        .quad 0xc0862620f62b5cb0, 0xbe1cf07331355985
> > +        .quad 0xc0862623b93bc9e8, 0xbe1cf01ae921a1c3
> > +        .quad 0xc08626267b586ed0, 0xbe1cefe5cf0dbf0c
> > +        .quad 0xc08626293c81f348, 0xbe1cf01b258aeb50
> > +        .quad 0xc086262bfcb8fe88, 0xbe1cee6b9e7f4c68
> > +        .quad 0xc086262ebbfe3710, 0xbe1cee684a9b21c9
> > +        .quad 0xc08626317a5242b8, 0xbe1cf1f8bcde9a8b
> > +        .quad 0xc086263437b5c6c0, 0xbe1cf1d063d36238
> > +        .quad 0xc0862636f42967a8, 0xbe1cf1e31a19075e
> > +        .quad 0xc0862639afadc950, 0xbe1cf1d8efdf7e7d
> > +        .quad 0xc086263c6a438ef0, 0xbe1cf1812ee72dba
> > +        .quad 0xc086263f23eb5b18, 0xbe1cf1449a9a2279
> > +        .quad 0xc0862641dca5cfb8, 0xbe1cee96edce5085
> > +        .quad 0xc086264494738e08, 0xbe1cf06797bd03b2
> > +        .quad 0xc08626474b5536b8, 0xbe1cef91b9b7ffc1
> > +        .quad 0xc086264a014b69c0, 0xbe1cef4b6721278f
> > +        .quad 0xc086264cb656c678, 0xbe1cf1942925eb4a
> > +        .quad 0xc086264f6a77eba8, 0xbe1cefa2c7bc2e39
> > +        .quad 0xc08626521daf7758, 0xbe1cf252595aceb3
> > +        .quad 0xc0862654cffe0718, 0xbe1cee8e9ae47ec2
> > +        .quad 0xc0862657816437a8, 0xbe1cf1bf913828fa
> > +        .quad 0xc086265a31e2a558, 0xbe1cf23475d6b366
> > +        .quad 0xc086265ce179ebc8, 0xbe1cef8df00a922b
> > +        .quad 0xc086265f902aa5f0, 0xbe1cef279bfa43e0
> > +        .quad 0xc08626623df56e38, 0xbe1cf080e10b8365
> > +        .quad 0xc0862664eadade70, 0xbe1cf1a518f9b544
> > +        .quad 0xc086266796db8fd0, 0xbe1cef9308fed9e9
> > +        .quad 0xc086266a41f81ae8, 0xbe1ceea3ae6b19c9
> > +        .quad 0xc086266cec3117b8, 0xbe1ceef06003d4c2
> > +        .quad 0xc086266f95871da8, 0xbe1cf0b8457ffb0c
> > +        .quad 0xc08626723dfac390, 0xbe1cf0c526745ad6
> > +        .quad 0xc0862674e58c9fa8, 0xbe1cf0cf91ff7b5d
> > +        .quad 0xc08626778c3d4798, 0xbe1cefe260819380
> > +        .quad 0xc086267a320d5070, 0xbe1ceebd90aa27a3
> > +        .quad 0xc086267cd6fd4ea8, 0xbe1cf0388121dffa
> > +        .quad 0xc086267f7b0dd630, 0xbe1cf1a3881435f1
> > +        .quad 0xc08626821e3f7a68, 0xbe1cef28e9d9ac52
> > +        .quad 0xc0862684c092ce08, 0xbe1cf02d300062dd
> > +        .quad 0xc086268762086350, 0xbe1cefaee1edfa35
> > +        .quad 0xc086268a02a0cbe0, 0xbe1cf0a5a052e936
> > +        .quad 0xc086268ca25c98d8, 0xbe1cee60a4a497ed
> > +        .quad 0xc086268f413c5ab0, 0xbe1cf0e4a5d0cf49
> > +        .quad 0xc0862691df40a170, 0xbe1cf149235a4e6e
> > +        .quad 0xc08626947c69fc80, 0xbe1cf215180b9fcc
> > +        .quad 0xc086269718b8fac8, 0xbe1cef9b156a9840
> > +        .quad 0xc0862699b42e2a90, 0xbe1cf054c91441be
> > +        .quad 0xc086269c4eca19a8, 0xbe1cf13ded26512c
> > +        .quad 0xc086269ee88d5550, 0xbe1cf22ea4d8ac06
> > +        .quad 0xc08626a181786a40, 0xbe1cf2354666ee2e
> > +        .quad 0xc08626a4198be4a8, 0xbe1cefef936752b3
> > +        .quad 0xc08626a6b0c85020, 0xbe1cf1e360a9db68
> > +        .quad 0xc08626a9472e37d8, 0xbe1ceed6aeb812c5
> > +        .quad 0xc08626abdcbe2650, 0xbe1cf227340b4986
> > +        .quad 0xc08626ae7178a5b0, 0xbe1cf0215a0cbe0d
> > +        .quad 0xc08626b1055e3f70, 0xbe1cf256adf0ae26
> > +        .quad 0xc08626b3986f7ca8, 0xbe1ceff3c67aed06
> > +        .quad 0xc08626b62aace5c8, 0xbe1cf2159fb93652
> > +        .quad 0xc08626b8bc1702e0, 0xbe1cf01e6dbd1c7f
> > +        .quad 0xc08626bb4cae5b60, 0xbe1cf009e75d1c0c
> > +        .quad 0xc08626bddc737648, 0xbe1ceec10a020e73
> > +        .quad 0xc08626c06b66da08, 0xbe1cf06d5783eee7
> > +        .quad 0xc08626c2f9890ca0, 0xbe1cf0cb8f169ffe
> > +        .quad 0xc08626c586da9388, 0xbe1cef7de2452430
> > +        .quad 0xc08626c8135bf3b0, 0xbe1cf05da6f783ae
> > +        .quad 0xc08626ca9f0db198, 0xbe1cefcc877d681d
> > +        .quad 0xc08626cd29f05138, 0xbe1cef0531954ab3
> > +        .quad 0xc08626cfb4045608, 0xbe1cf06b8565ea3d
> > +        .quad 0xc08626d23d4a4310, 0xbe1cefdc455d9d7e
> > +        .quad 0xc08626d4c5c29ad0, 0xbe1ceefc47e8fa64
> > +        .quad 0xc08626d74d6ddf48, 0xbe1cf1872bf033f2
> > +        .quad 0xc08626d9d44c9210, 0xbe1cf19d91087f9d
> > +        .quad 0xc08626dc5a5f3438, 0xbe1cf012d444c6ab
> > +        .quad 0xc08626dedfa64650, 0xbe1cf0ba528ee153
> > +        .quad 0xc08626e164224880, 0xbe1ceeb431709788
> > +        .quad 0xc08626e3e7d3ba60, 0xbe1cf0b9af31a6a5
> > +        .quad 0xc08626e66abb1b28, 0xbe1cf168fb2e135b
> > +        .quad 0xc08626e8ecd8e990, 0xbe1cef9097461c93
> > +        .quad 0xc08626eb6e2da3d0, 0xbe1cee7a434735d8
> > +        .quad 0xc08626edeeb9c7a8, 0xbe1cf235732b86f2
> > +        .quad 0xc08626f06e7dd280, 0xbe1cefe1510b89e6
> > +        .quad 0xc08626f2ed7a4120, 0xbe1cf1f64b9b80ef
> > +        .quad 0xc08626f56baf9000, 0xbe1cf08f320ca339
> > +        .quad 0xc08626f7e91e3b08, 0xbe1cf1b1de2808a1
> > +        .quad 0xc08626fa65c6bdc0, 0xbe1cf1976d778b28
> > +        .quad 0xc08626fce1a99338, 0xbe1ceef40a4f076f
> > +        .quad 0xc08626ff5cc73600, 0xbe1cef3e45869ce3
> > +        .quad 0xc0862701d7202048, 0xbe1ceef601b4c9d6
> > +        .quad 0xc086270450b4cbc0, 0xbe1cf1eaf0b57fd6
> > +        .quad 0xc0862706c985b1c0, 0xbe1cef82a44990f3
> > +        .quad 0xc086270941934b10, 0xbe1ceefe32981f2c
> > +        .quad 0xc086270bb8de1018, 0xbe1cefbf6f5a0445
> > +        .quad 0xc086270e2f6678d0, 0xbe1cf18dba75792c
> > +        .quad 0xc0862710a52cfcc8, 0xbe1cf0da64ce995f
> > +        .quad 0xc08627131a321318, 0xbe1cef04ac0fb802
> > +        .quad 0xc08627158e763268, 0xbe1cee9d4e2ad9bd
> > +        .quad 0xc086271801f9d0f8, 0xbe1cefa9b55407b5
> > +        .quad 0xc086271a74bd64a0, 0xbe1cefe6bd329570
> > +        .quad 0xc086271ce6c162c8, 0xbe1cef0b1205dc85
> > +        .quad 0xc086271f58064068, 0xbe1cef092a785e3f
> > +        .quad 0xc0862721c88c7210, 0xbe1cf050dcdaac30
> > +        .quad 0xc086272438546be8, 0xbe1cf210907ded8b
> > +        .quad 0xc0862726a75ea1b8, 0xbe1cee760be44f99
> > +        .quad 0xc086272915ab86c0, 0xbe1ceeeee07c2bcc
> > +        .quad 0xc086272b833b8df0, 0xbe1cf06874992df5
> > +        .quad 0xc086272df00f29d0, 0xbe1cef8fac5d4899
> > +        .quad 0xc08627305c26cc70, 0xbe1cf1103241cc99
> > +        .quad 0xc0862732c782e788, 0xbe1cf1d35fef83fe
> > +        .quad 0xc08627353223ec68, 0xbe1cef3ec8133e1d
> > +        .quad 0xc08627379c0a4be8, 0xbe1cef7261daccd8
> > +        .quad 0xc086273a05367688, 0xbe1cf18656c50806
> > +        .quad 0xc086273c6da8dc68, 0xbe1cf1c8736e049a
> > +        .quad 0xc086273ed561ed38, 0xbe1cf1f93bff4911
> > +        .quad 0xc08627413c621848, 0xbe1cf188a4ea680c
> > +        .quad 0xc0862743a2a9cc80, 0xbe1cf1d270930c80
> > +        .quad 0xc086274608397868, 0xbe1cf25a328c28e2
> > +        .quad 0xc08627486d118a28, 0xbe1cf106f90aa3b8
> > +        .quad 0xc086274ad1326f80, 0xbe1cee5e9d2e885a
> > +        .quad 0xc086274d349c95c0, 0xbe1cf1c0bac27228
> > +        .quad 0xc086274f975069f8, 0xbe1cf1a1500f9b1c
> > +        .quad 0xc0862751f94e58c0, 0xbe1cefc30663ac44
> > +        .quad 0xc08627545a96ce48, 0xbe1cf17123e427a2
> > +        .quad 0xc0862756bb2a3678, 0xbe1cefb92749fea4
> > +        .quad 0xc08627591b08fcc0, 0xbe1cefa40e1ea74a
> > +        .quad 0xc086275b7a338c40, 0xbe1cee6f4612c3e9
> > +        .quad 0xc086275dd8aa4fa8, 0xbe1cf1c54a053627
> > +        .quad 0xc0862760366db168, 0xbe1ceff5eb503d9e
> > +        .quad 0xc0862762937e1b70, 0xbe1cf02e47f10cee
> > +        .quad 0xc0862764efdbf768, 0xbe1ceeb06e1d0dad
> > +        .quad 0xc08627674b87ae88, 0xbe1cf10aadd6dba5
> > +        .quad 0xc0862769a681a9c0, 0xbe1cf24e9913d30f
> > +        .quad 0xc086276c00ca51a0, 0xbe1cef47b301e312
> > +        .quad 0xc086276e5a620e48, 0xbe1ceeb1cefc2e85
> > +        .quad 0xc0862770b3494788, 0xbe1cf16f1fbbe011
> > +        .quad 0xc08627730b8064e8, 0xbe1ceebdf75174c7
> > +        .quad 0xc08627756307cd70, 0xbe1cf06e3871a0da
> > +        .quad 0xc0862777b9dfe7f0, 0xbe1cef16799fd554
> > +        .quad 0xc086277a10091ac0, 0xbe1cf248dabf5377
> > +        .quad 0xc086277c6583cc00, 0xbe1cf0c78d92a2cd
> > +        .quad 0xc086277eba506158, 0xbe1cf0b911b029f0
> > +        .quad 0xc08627810e6f4028, 0xbe1cefdc24719766
> > +        .quad 0xc086278361e0cd70, 0xbe1cefbb6562b7e7
> > +        .quad 0xc0862785b4a56dd8, 0xbe1cf1e0afb349ec
> > +        .quad 0xc086278806bd85c0, 0xbe1cf008292e52fc
> > +        .quad 0xc086278a58297918, 0xbe1cf053073872bf
> > +        .quad 0xc086278ca8e9ab88, 0xbe1cf17a0a55a947
> > +        .quad 0xc086278ef8fe8068, 0xbe1ceeffb0b60234
> > +        .quad 0xc086279148685aa0, 0xbe1cf162204794a8
> > +        .quad 0xc086279397279ce0, 0xbe1cf24cc8cb48ac
> > +        .quad 0xc0862795e53ca978, 0xbe1cf0c9be68d5c3
> > +        .quad 0xc086279832a7e258, 0xbe1cf172cd3d7388
> > +        .quad 0xc086279a7f69a930, 0xbe1ceea2465fbce5
> > +        .quad 0xc086279ccb825f40, 0xbe1cf0a386d2500f
> > +        .quad 0xc086279f16f26590, 0xbe1cf1e338ddc18a
> > +        .quad 0xc08627a161ba1cd0, 0xbe1cef1f5049867f
> > +        .quad 0xc08627a3abd9e548, 0xbe1cef96c1ea8b1f
> > +        .quad 0xc08627a5f5521f00, 0xbe1cf138f6fd3c26
> > +        .quad 0xc08627a83e2329b0, 0xbe1cf0d4fcbfdf3a
> > +        .quad 0xc08627aa864d64b0, 0xbe1cf24870c12c81
> > +        .quad 0xc08627accdd12f18, 0xbe1cf0ae2a56348d
> > +        .quad 0xc08627af14aee7a0, 0xbe1cee8ca1a9b893
> > +        .quad 0xc08627b15ae6eca8, 0xbe1cf20414d637b0
> > +        .quad 0xc08627b3a0799c60, 0xbe1cf0fc6b7b12d8
> > +        .quad 0xc08627b5e5675488, 0xbe1cf152d93c4a00
> > +        .quad 0xc08627b829b072a0, 0xbe1cf1073f9b77c2
> > +        .quad 0xc08627ba6d5553d8, 0xbe1cee694f97d5a4
> > +        .quad 0xc08627bcb0565500, 0xbe1cf0456b8239d7
> > +        .quad 0xc08627bef2b3d2b0, 0xbe1cf211497127e3
> > +        .quad 0xc08627c1346e2930, 0xbe1cf01856c0384d
> > +        .quad 0xc08627c37585b468, 0xbe1cefa7dd05479e
> > +        .quad 0xc08627c5b5fad000, 0xbe1cef3ae8e50b93
> > +        .quad 0xc08627c7f5cdd750, 0xbe1ceea5f32fdd3a
> > +        .quad 0xc08627ca34ff2560, 0xbe1cef424caeb8d9
> > +        .quad 0xc08627cc738f14f0, 0xbe1cf0194d07a81f
> > +        .quad 0xc08627ceb17e0070, 0xbe1cf20f452000c1
> > +        .quad 0xc08627d0eecc4210, 0xbe1cf00e356218e4
> > +        .quad 0xc08627d32b7a33a0, 0xbe1cef30484b4bcb
> > +        .quad 0xc08627d567882eb0, 0xbe1ceeea11a6641b
> > +        .quad 0xc08627d7a2f68c80, 0xbe1cf13492d5bd7b
> > +        .quad 0xc08627d9ddc5a618, 0xbe1ceeb7048fad96
> > +        .quad 0xc08627dc17f5d418, 0xbe1ceef0666f0477
> > +        .quad 0xc08627de51876ee8, 0xbe1cf060d4b8b5c2
> > +        .quad 0xc08627e08a7acea8, 0xbe1cf0b2a4b6ff8c
> > +        .quad 0xc08627e2c2d04b28, 0xbe1cf0e34809a875
> > +        .quad 0xc08627e4fa883bf0, 0xbe1cf16bf74a3522
> > +        .quad 0xc08627e731a2f848, 0xbe1cee6a24623d57
> > +        .quad 0xc08627e96820d718, 0xbe1cefc7b4f1528e
> > +        .quad 0xc08627eb9e022f18, 0xbe1cf163051f3548
> > +        .quad 0xc08627edd34756b8, 0xbe1cef36b3366305
> > +        .quad 0xc08627f007f0a408, 0xbe1cf18134625550
> > +        .quad 0xc08627f23bfe6cf0, 0xbe1cf0ec32ec1a11
> > +        .quad 0xc08627f46f710700, 0xbe1ceeb3b64f3edc
> > +        .quad 0xc08627f6a248c778, 0xbe1cf0cd15805bc8
> > +        .quad 0xc08627f8d4860368, 0xbe1cf20db3bddebe
> > +        .quad 0xc08627fb06290f90, 0xbe1cf25188430e25
> > +        .quad 0xc08627fd37324070, 0xbe1ceea1713490f9
> > +        .quad 0xc08627ff67a1ea28, 0xbe1cf159521d234c
> > +        .quad 0xc0862801977860b8, 0xbe1cf24dfe50783b
> > +        .quad 0xc0862803c6b5f7d0, 0xbe1ceef2ef89a60b
> > +        .quad 0xc0862805f55b02c8, 0xbe1cee7fc919d62c
> > +        .quad 0xc08628082367d4c0, 0xbe1cf215a7fb513a
> > +        .quad 0xc086280a50dcc0a8, 0xbe1cf0e4401c5ed4
> > +        .quad 0xc086280c7dba1910, 0xbe1cf04ec734d256
> > +        .quad 0xc086280eaa003050, 0xbe1cf010ad787fea
> > +        .quad 0xc0862810d5af5880, 0xbe1cee622478393d
> > +        .quad 0xc086281300c7e368, 0xbe1cf01c7482564f
> > +        .quad 0xc08628152b4a22a0, 0xbe1cf0de20d33536
> > +        .quad 0xc086281755366778, 0xbe1cef2edae5837d
> > +        .quad 0xc08628197e8d02f0, 0xbe1cf0a345318cc9
> > +        .quad 0xc086281ba74e45d8, 0xbe1cf20085aa34b8
> > +        .quad 0xc086281dcf7a80c0, 0xbe1cef5fa845ad83
> > +        .quad 0xc086281ff71203e0, 0xbe1cf050d1df69c4
> > +        .quad 0xc08628221e151f48, 0xbe1ceffe43c035b9
> > +        .quad 0xc0862824448422b8, 0xbe1cf14f3018d3c2
> > +        .quad 0xc08628266a5f5dc0, 0xbe1cef0a5fbae83d
> > +        .quad 0xc08628288fa71f98, 0xbe1ceff8a95b72a1
> > +        .quad 0xc086282ab45bb750, 0xbe1cef073aa9849b
> > +        .quad 0xc086282cd87d73a8, 0xbe1cef69b3835c02
> > +        .quad 0xc086282efc0ca328, 0xbe1cf0bc139379a9
> > +        .quad 0xc08628311f099420, 0xbe1cef247a9ec596
> > +        .quad 0xc086283341749490, 0xbe1cef74bbcc488a
> > +        .quad 0xc0862835634df248, 0xbe1cef4bc42e7b8e
> > +        .quad 0xc08628378495fad0, 0xbe1cf136d4d5a810
> > +        .quad 0xc0862839a54cfb80, 0xbe1cf0d290b24dd8
> > +        .quad 0xc086283bc5734168, 0xbe1ceeebde8e0065
> > +        .quad 0xc086283de5091950, 0xbe1cf1a09f60aa1e
> > +        .quad 0xc0862840040ecfe0, 0xbe1cf0803947a234
> > +        .quad 0xc08628422284b168, 0xbe1cf0abf7638127
> > +        .quad 0xc0862844406b0a08, 0xbe1cf0f73ee12058
> > +        .quad 0xc08628465dc225a0, 0xbe1cf2079971b26c
> > +        .quad 0xc08628487a8a4fe0, 0xbe1cee74957564b1
> > +        .quad 0xc086284a96c3d420, 0xbe1ceee77c1b7d43
> > +        .quad 0xc086284cb26efd90, 0xbe1cf23addba6e09
> > +        .quad 0xc086284ecd8c1730, 0xbe1cf199f4a1da60
> > +        .quad 0xc0862850e81b6bb0, 0xbe1cf09fdea81393
> > +        .quad 0xc0862853021d4588, 0xbe1cf176adb417f7
> > +        .quad 0xc08628551b91ef00, 0xbe1cf0f64f84a8da
> > +        .quad 0xc08628573479b220, 0xbe1ceec34cf49523
> > +        .quad 0xc08628594cd4d8a8, 0xbe1cf16d60fbe0bb
> > +        .quad 0xc086285b64a3ac40, 0xbe1cee8de7acfc7b
> > +        .quad 0xc086285d7be67630, 0xbe1ceee6256cce8d
> > +        .quad 0xc086285f929d7fa0, 0xbe1cee7d66a3d8a5
> > +        .quad 0xc0862861a8c91170, 0xbe1cf0bef8265792
> > +        .quad 0xc0862863be697458, 0xbe1cf097f890c6f8
> > +        .quad 0xc0862865d37ef0c8, 0xbe1cf09502d5c3fc
> > +        .quad 0xc0862867e809cf00, 0xbe1ceeffb239dac7
> > +        .quad 0xc0862869fc0a56f8, 0xbe1cf1fbfff95c98
> > +        .quad 0xc086286c0f80d090, 0xbe1cefa57ad3eef7
> > +        .quad 0xc086286e226d8348, 0xbe1cf22c58b9183d
> > +        .quad 0xc086287034d0b690, 0xbe1ceff262d0a248
> > +        .quad 0xc086287246aab180, 0xbe1cefa7bc194186
> > +        .quad 0xc086287457fbbb08, 0xbe1cf06782d784d9
> > +        .quad 0xc086287668c419e0, 0xbe1cf1d44d0eaa07
> > +        .quad 0xc086287879041490, 0xbe1cf034803c8a48
> > +        .quad 0xc086287a88bbf158, 0xbe1cf08e84916b6f
> > +        .quad 0xc086287c97ebf650, 0xbe1cf0c4d3dc1bc7
> > +        .quad 0xc086287ea6946958, 0xbe1cefb1e4625943
> > +        .quad 0xc0862880b4b59010, 0xbe1cf143efdd1fd0
> > +        .quad 0xc0862882c24faff8, 0xbe1cee9896d016da
> > +        .quad 0xc0862884cf630e38, 0xbe1cf2186072f2cc
> > +        .quad 0xc0862886dbefeff0, 0xbe1cef9217633d34
> > +        .quad 0xc0862888e7f699e0, 0xbe1cf05603549486
> > +        .quad 0xc086288af37750b0, 0xbe1cef50fff513d3
> > +        .quad 0xc086288cfe7258c0, 0xbe1cf127713b32d0
> > +        .quad 0xc086288f08e7f650, 0xbe1cf05015520f3d
> > +        .quad 0xc086289112d86d58, 0xbe1cf12eb458b26f
> > +        .quad 0xc08628931c4401a8, 0xbe1cf22eae2887ed
> > +        .quad 0xc0862895252af6e0, 0xbe1cefdd6656dd2d
> > +        .quad 0xc08628972d8d9058, 0xbe1cf1048ea4e646
> > +        .quad 0xc0862899356c1150, 0xbe1ceec4501167e9
> > +        .quad 0xc086289b3cc6bcb8, 0xbe1cf0ad52becc3f
> > +        .quad 0xc086289d439dd568, 0xbe1cf0daa4e00e35
> > +        .quad 0xc086289f49f19df8, 0xbe1cf00b80de8d6a
> > +        .quad 0xc08628a14fc258c8, 0xbe1cf1bcf2ea8464
> > +        .quad 0xc08628a355104818, 0xbe1cf0435e2782b0
> > +        .quad 0xc08628a559dbade0, 0xbe1cf0e3e1a5f56c
> > +        .quad 0xc08628a75e24cbf8, 0xbe1cefed9d5a721d
> > +        .quad 0xc08628a961ebe3f8, 0xbe1cf0d2d74321e2
> > +        .quad 0xc08628ab65313750, 0xbe1cf24200eb55e9
> > +        .quad 0xc08628ad67f50740, 0xbe1cf23e9d7cf979
> > +        .quad 0xc08628af6a3794d0, 0xbe1cf23a088f421c
> > +        .quad 0xc08628b16bf920e0, 0xbe1cef2c1de1ab32
> > +        .quad 0xc08628b36d39ec08, 0xbe1cf1abc231f7b2
> > +        .quad 0xc08628b56dfa36d0, 0xbe1cf2074d5ba303
> > +        .quad 0xc08628b76e3a4180, 0xbe1cf05cd5eed880
> > +        /*== Log_LA_table ==*/
> > +        .align 16
> > +        .quad 0x8000000000000000
> > +        .quad 0xbf5ff802a9ab10e6
> > +        .quad 0xbf6ff00aa2b10bc0
> > +        .quad 0xbf77ee11ebd82e94
> > +        .quad 0xbf7fe02a6b106789
> > +        .quad 0xbf83e7295d25a7d9
> > +        .quad 0xbf87dc475f810a77
> > +        .quad 0xbf8bcf712c74384c
> > +        .quad 0xbf8fc0a8b0fc03e4
> > +        .quad 0xbf91d7f7eb9eebe7
> > +        .quad 0xbf93cea44346a575
> > +        .quad 0xbf95c45a51b8d389
> > +        .quad 0xbf97b91b07d5b11b
> > +        .quad 0xbf99ace7551cc514
> > +        .quad 0xbf9b9fc027af9198
> > +        .quad 0xbf9d91a66c543cc4
> > +        .quad 0xbf9f829b0e783300
> > +        .quad 0xbfa0b94f7c196176
> > +        .quad 0xbfa1b0d98923d980
> > +        .quad 0xbfa2a7ec2214e873
> > +        .quad 0xbfa39e87b9febd60
> > +        .quad 0xbfa494acc34d911c
> > +        .quad 0xbfa58a5bafc8e4d5
> > +        .quad 0xbfa67f94f094bd98
> > +        .quad 0xbfa77458f632dcfc
> > +        .quad 0xbfa868a83083f6cf
> > +        .quad 0xbfa95c830ec8e3eb
> > +        .quad 0xbfaa4fe9ffa3d235
> > +        .quad 0xbfab42dd711971bf
> > +        .quad 0xbfac355dd0921f2d
> > +        .quad 0xbfad276b8adb0b52
> > +        .quad 0xbfae19070c276016
> > +        .quad 0xbfaf0a30c01162a6
> > +        .quad 0xbfaffae9119b9303
> > +        .quad 0xbfb075983598e471
> > +        .quad 0xbfb0ed839b5526fe
> > +        .quad 0xbfb16536eea37ae1
> > +        .quad 0xbfb1dcb263db1944
> > +        .quad 0xbfb253f62f0a1417
> > +        .quad 0xbfb2cb0283f5de1f
> > +        .quad 0xbfb341d7961bd1d1
> > +        .quad 0xbfb3b87598b1b6ee
> > +        .quad 0xbfb42edcbea646f0
> > +        .quad 0xbfb4a50d3aa1b040
> > +        .quad 0xbfb51b073f06183f
> > +        .quad 0xbfb590cafdf01c28
> > +        .quad 0xbfb60658a93750c4
> > +        .quad 0xbfb67bb0726ec0fc
> > +        .quad 0xbfb6f0d28ae56b4c
> > +        .quad 0xbfb765bf23a6be13
> > +        .quad 0xbfb7da766d7b12cd
> > +        .quad 0xbfb84ef898e8282a
> > +        .quad 0xbfb8c345d6319b21
> > +        .quad 0xbfb9375e55595ede
> > +        .quad 0xbfb9ab42462033ad
> > +        .quad 0xbfba1ef1d8061cd4
> > +        .quad 0xbfba926d3a4ad563
> > +        .quad 0xbfbb05b49bee43fe
> > +        .quad 0xbfbb78c82bb0eda1
> > +        .quad 0xbfbbeba818146765
> > +        .quad 0xbfbc5e548f5bc743
> > +        .quad 0xbfbcd0cdbf8c13e1
> > +        .quad 0xbfbd4313d66cb35d
> > +        .quad 0xbfbdb5270187d927
> > +        .quad 0xbfbe27076e2af2e6
> > +        .quad 0xbfbe98b549671467
> > +        .quad 0xbfbf0a30c01162a6
> > +        .quad 0xbfbf7b79fec37ddf
> > +        .quad 0xbfbfec9131dbeabb
> > +        .quad 0xbfc02ebb42bf3d4b
> > +        .quad 0xbfc0671512ca596e
> > +        .quad 0xbfc09f561ee719c3
> > +        .quad 0xbfc0d77e7cd08e59
> > +        .quad 0xbfc10f8e422539b1
> > +        .quad 0xbfc14785846742ac
> > +        .quad 0xbfc17f6458fca611
> > +        .quad 0xbfc1b72ad52f67a0
> > +        .quad 0xbfc1eed90e2dc2c3
> > +        .quad 0xbfc2266f190a5acb
> > +        .quad 0xbfc25ded0abc6ad2
> > +        .quad 0xbfc29552f81ff523
> > +        .quad 0xbfc2cca0f5f5f251
> > +        .quad 0xbfc303d718e47fd3
> > +        .quad 0xbfc33af575770e4f
> > +        .quad 0xbfc371fc201e8f74
> > +        .quad 0xbfc3a8eb2d31a376
> > +        .quad 0xbfc3dfc2b0ecc62a
> > +        .quad 0xbfc41682bf727bc0
> > +        .quad 0xbfc44d2b6ccb7d1e
> > +        .quad 0xbfc483bccce6e3dd
> > +        .quad 0xbfc4ba36f39a55e5
> > +        .quad 0xbfc4f099f4a230b2
> > +        .quad 0xbfc526e5e3a1b438
> > +        .quad 0xbfc55d1ad4232d6f
> > +        .quad 0xbfc59338d9982086
> > +        .quad 0xbfc5c940075972b9
> > +        .quad 0xbfc5ff3070a793d4
> > +        .quad 0xbfc6350a28aaa758
> > +        .quad 0xbfc66acd4272ad51
> > +        .quad 0xbfc6a079d0f7aad2
> > +        .quad 0xbfc6d60fe719d21d
> > +        .quad 0xbfc70b8f97a1aa75
> > +        .quad 0xbfc740f8f54037a5
> > +        .quad 0xbfc7764c128f2127
> > +        .quad 0xbfc7ab890210d909
> > +        .quad 0xbfc7e0afd630c274
> > +        .quad 0xbfc815c0a14357eb
> > +        .quad 0xbfc84abb75865139
> > +        .quad 0xbfc87fa06520c911
> > +        .quad 0xbfc8b46f8223625b
> > +        .quad 0xbfc8e928de886d41
> > +        .quad 0xbfc91dcc8c340bde
> > +        .quad 0xbfc9525a9cf456b4
> > +        .quad 0xbfc986d3228180ca
> > +        .quad 0xbfc9bb362e7dfb83
> > +        .quad 0xbfc9ef83d2769a34
> > +        .quad 0xbfca23bc1fe2b563
> > +        .quad 0xbfca57df28244dcd
> > +        .quad 0xbfca8becfc882f19
> > +        .quad 0xbfcabfe5ae46124c
> > +        .quad 0xbfcaf3c94e80bff3
> > +        .quad 0xbfcb2797ee46320c
> > +        .quad 0xbfcb5b519e8fb5a4
> > +        .quad 0xbfcb8ef670420c3b
> > +        .quad 0xbfcbc286742d8cd6
> > +        .quad 0xbfcbf601bb0e44e2
> > +        .quad 0xbfcc2968558c18c1
> > +        .quad 0xbfcc5cba543ae425
> > +        .quad 0xbfcc8ff7c79a9a22
> > +        .quad 0xbfccc320c0176502
> > +        .quad 0xbfccf6354e09c5dc
> > +        .quad 0xbfcd293581b6b3e7
> > +        .quad 0xbfcd5c216b4fbb91
> > +        .quad 0xbfcd8ef91af31d5e
> > +        .quad 0xbfcdc1bca0abec7d
> > +        .quad 0xbfcdf46c0c722d2f
> > +        .quad 0xbfce27076e2af2e6
> > +        .quad 0xbfce598ed5a87e2f
> > +        .quad 0xbfce8c0252aa5a60
> > +        .quad 0xbfcebe61f4dd7b0b
> > +        .quad 0xbfcef0adcbdc5936
> > +        .quad 0xbfcf22e5e72f105d
> > +        .quad 0xbfcf550a564b7b37
> > +        .quad 0xbfcf871b28955045
> > +        .quad 0xbfcfb9186d5e3e2b
> > +        .quad 0xbfcfeb0233e607cc
> > +        .quad 0xbfd00e6c45ad501d
> > +        .quad 0xbfd0274dc16c232f
> > +        .quad 0xbfd0402594b4d041
> > +        .quad 0xbfd058f3c703ebc6
> > +        .quad 0xbfd071b85fcd590d
> > +        .quad 0xbfd08a73667c57af
> > +        .quad 0xbfd0a324e27390e3
> > +        .quad 0xbfd0bbccdb0d24bd
> > +        .quad 0xbfd0d46b579ab74b
> > +        .quad 0xbfd0ed005f657da4
> > +        .quad 0xbfd1058bf9ae4ad5
> > +        .quad 0xbfd11e0e2dad9cb7
> > +        .quad 0xbfd136870293a8b0
> > +        .quad 0xbfd14ef67f88685a
> > +        .quad 0xbfd1675cababa60e
> > +        .quad 0xbfd17fb98e15095d
> > +        .quad 0xbfd1980d2dd4236f
> > +        .quad 0xbfd1b05791f07b49
> > +        .quad 0xbfd1c898c16999fb
> > +        .quad 0xbfd1e0d0c33716be
> > +        .quad 0xbfd1f8ff9e48a2f3
> > +        .quad 0xbfd211255986160c
> > +        .quad 0xbfd22941fbcf7966
> > +        .quad 0xbfd241558bfd1404
> > +        .quad 0xbfd2596010df763a
> > +        .quad 0xbfd27161913f853d
> > +        .quad 0xbfd2895a13de86a3
> > +        .quad 0xbfd2a1499f762bc9
> > +        .quad 0xbfd2b9303ab89d25
> > +        .quad 0xbfd2d10dec508583
> > +        .quad 0xbfd2e8e2bae11d31
> > +        .quad 0xbfd300aead06350c
> > +        .quad 0xbfd31871c9544185
> > +        .quad 0xbfd3302c16586588
> > +        .quad 0xbfd347dd9a987d55
> > +        .quad 0xbfd35f865c93293e
> > +        .quad 0xbfd3772662bfd85b
> > +        .quad 0xbfd38ebdb38ed321
> > +        .quad 0xbfd3a64c556945ea
> > +        .quad 0xbfd3bdd24eb14b6a
> > +        .quad 0xbfd3d54fa5c1f710
> > +        .quad 0xbfd3ecc460ef5f50
> > +        .quad 0xbfd404308686a7e4
> > +        .quad 0xbfd41b941cce0bee
> > +        .quad 0xbfd432ef2a04e814
> > +        .quad 0xbfd44a41b463c47c
> > +        .quad 0xbfd4618bc21c5ec2
> > +        .quad 0xbfd478cd5959b3d9
> > +        .quad 0xbfd49006804009d1
> > +        .quad 0xbfd4a7373cecf997
> > +        .quad 0xbfd4be5f957778a1
> > +        .quad 0xbfd4d57f8fefe27f
> > +        .quad 0xbfd4ec973260026a
> > +        .quad 0xbfd503a682cb1cb3
> > +        .quad 0xbfd51aad872df82d
> > +        .quad 0xbfd531ac457ee77e
> > +        .quad 0xbfd548a2c3add263
> > +        .quad 0xbfd55f9107a43ee2
> > +        .quad 0xbfd5767717455a6c
> > +        .quad 0xbfd58d54f86e02f2
> > +        .quad 0xbfd5a42ab0f4cfe2
> > +        .quad 0xbfd5baf846aa1b19
> > +        .quad 0xbfd5d1bdbf5809ca
> > +        .quad 0xbfd5e87b20c2954a
> > +        .quad 0xbfd5ff3070a793d4
> > +        .quad 0xbfd615ddb4bec13c
> > +        .quad 0xbfd62c82f2b9c795
> > +        .quad 0x3fd61965cdb02c1f
> > +        .quad 0x3fd602d08af091ec
> > +        .quad 0x3fd5ec433d5c35ae
> > +        .quad 0x3fd5d5bddf595f30
> > +        .quad 0x3fd5bf406b543db2
> > +        .quad 0x3fd5a8cadbbedfa1
> > +        .quad 0x3fd5925d2b112a59
> > +        .quad 0x3fd57bf753c8d1fb
> > +        .quad 0x3fd565995069514c
> > +        .quad 0x3fd54f431b7be1a9
> > +        .quad 0x3fd538f4af8f72fe
> > +        .quad 0x3fd522ae0738a3d8
> > +        .quad 0x3fd50c6f1d11b97c
> > +        .quad 0x3fd4f637ebba9810
> > +        .quad 0x3fd4e0086dd8baca
> > +        .quad 0x3fd4c9e09e172c3c
> > +        .quad 0x3fd4b3c077267e9a
> > +        .quad 0x3fd49da7f3bcc41f
> > +        .quad 0x3fd487970e958770
> > +        .quad 0x3fd4718dc271c41b
> > +        .quad 0x3fd45b8c0a17df13
> > +        .quad 0x3fd44591e0539f49
> > +        .quad 0x3fd42f9f3ff62642
> > +        .quad 0x3fd419b423d5e8c7
> > +        .quad 0x3fd403d086cea79c
> > +        .quad 0x3fd3edf463c1683e
> > +        .quad 0x3fd3d81fb5946dba
> > +        .quad 0x3fd3c25277333184
> > +        .quad 0x3fd3ac8ca38e5c5f
> > +        .quad 0x3fd396ce359bbf54
> > +        .quad 0x3fd3811728564cb2
> > +        .quad 0x3fd36b6776be1117
> > +        .quad 0x3fd355bf1bd82c8b
> > +        .quad 0x3fd3401e12aecba1
> > +        .quad 0x3fd32a84565120a8
> > +        .quad 0x3fd314f1e1d35ce4
> > +        .quad 0x3fd2ff66b04ea9d4
> > +        .quad 0x3fd2e9e2bce12286
> > +        .quad 0x3fd2d46602adccee
> > +        .quad 0x3fd2bef07cdc9354
> > +        .quad 0x3fd2a982269a3dbf
> > +        .quad 0x3fd2941afb186b7c
> > +        .quad 0x3fd27ebaf58d8c9d
> > +        .quad 0x3fd269621134db92
> > +        .quad 0x3fd25410494e56c7
> > +        .quad 0x3fd23ec5991eba49
> > +        .quad 0x3fd22981fbef797b
> > +        .quad 0x3fd214456d0eb8d4
> > +        .quad 0x3fd1ff0fe7cf47a7
> > +        .quad 0x3fd1e9e1678899f4
> > +        .quad 0x3fd1d4b9e796c245
> > +        .quad 0x3fd1bf99635a6b95
> > +        .quad 0x3fd1aa7fd638d33f
> > +        .quad 0x3fd1956d3b9bc2fa
> > +        .quad 0x3fd180618ef18adf
> > +        .quad 0x3fd16b5ccbacfb73
> > +        .quad 0x3fd1565eed455fc3
> > +        .quad 0x3fd14167ef367783
> > +        .quad 0x3fd12c77cd00713b
> > +        .quad 0x3fd1178e8227e47c
> > +        .quad 0x3fd102ac0a35cc1c
> > +        .quad 0x3fd0edd060b78081
> > +        .quad 0x3fd0d8fb813eb1ef
> > +        .quad 0x3fd0c42d676162e3
> > +        .quad 0x3fd0af660eb9e279
> > +        .quad 0x3fd09aa572e6c6d4
> > +        .quad 0x3fd085eb8f8ae797
> > +        .quad 0x3fd07138604d5862
> > +        .quad 0x3fd05c8be0d9635a
> > +        .quad 0x3fd047e60cde83b8
> > +        .quad 0x3fd03346e0106062
> > +        .quad 0x3fd01eae5626c691
> > +        .quad 0x3fd00a1c6adda473
> > +        .quad 0x3fcfeb2233ea07cd
> > +        .quad 0x3fcfc218be620a5e
> > +        .quad 0x3fcf991c6cb3b379
> > +        .quad 0x3fcf702d36777df0
> > +        .quad 0x3fcf474b134df229
> > +        .quad 0x3fcf1e75fadf9bde
> > +        .quad 0x3fcef5ade4dcffe6
> > +        .quad 0x3fceccf2c8fe920a
> > +        .quad 0x3fcea4449f04aaf5
> > +        .quad 0x3fce7ba35eb77e2a
> > +        .quad 0x3fce530effe71012
> > +        .quad 0x3fce2a877a6b2c12
> > +        .quad 0x3fce020cc6235ab5
> > +        .quad 0x3fcdd99edaf6d7e9
> > +        .quad 0x3fcdb13db0d48940
> > +        .quad 0x3fcd88e93fb2f450
> > +        .quad 0x3fcd60a17f903515
> > +        .quad 0x3fcd38666871f465
> > +        .quad 0x3fcd1037f2655e7b
> > +        .quad 0x3fcce816157f1988
> > +        .quad 0x3fccc000c9db3c52
> > +        .quad 0x3fcc97f8079d44ec
> > +        .quad 0x3fcc6ffbc6f00f71
> > +        .quad 0x3fcc480c0005ccd1
> > +        .quad 0x3fcc2028ab17f9b4
> > +        .quad 0x3fcbf851c067555f
> > +        .quad 0x3fcbd087383bd8ad
> > +        .quad 0x3fcba8c90ae4ad19
> > +        .quad 0x3fcb811730b823d2
> > +        .quad 0x3fcb5971a213acdb
> > +        .quad 0x3fcb31d8575bce3d
> > +        .quad 0x3fcb0a4b48fc1b46
> > +        .quad 0x3fcae2ca6f672bd4
> > +        .quad 0x3fcabb55c31693ad
> > +        .quad 0x3fca93ed3c8ad9e3
> > +        .quad 0x3fca6c90d44b704e
> > +        .quad 0x3fca454082e6ab05
> > +        .quad 0x3fca1dfc40f1b7f1
> > +        .quad 0x3fc9f6c407089664
> > +        .quad 0x3fc9cf97cdce0ec3
> > +        .quad 0x3fc9a8778debaa38
> > +        .quad 0x3fc981634011aa75
> > +        .quad 0x3fc95a5adcf7017f
> > +        .quad 0x3fc9335e5d594989
> > +        .quad 0x3fc90c6db9fcbcd9
> > +        .quad 0x3fc8e588ebac2dbf
> > +        .quad 0x3fc8beafeb38fe8c
> > +        .quad 0x3fc897e2b17b19a5
> > +        .quad 0x3fc871213750e994
> > +        .quad 0x3fc84a6b759f512f
> > +        .quad 0x3fc823c16551a3c2
> > +        .quad 0x3fc7fd22ff599d4f
> > +        .quad 0x3fc7d6903caf5ad0
> > +        .quad 0x3fc7b0091651528c
> > +        .quad 0x3fc7898d85444c73
> > +        .quad 0x3fc7631d82935a86
> > +        .quad 0x3fc73cb9074fd14d
> > +        .quad 0x3fc716600c914054
> > +        .quad 0x3fc6f0128b756abc
> > +        .quad 0x3fc6c9d07d203fc7
> > +        .quad 0x3fc6a399dabbd383
> > +        .quad 0x3fc67d6e9d785771
> > +        .quad 0x3fc6574ebe8c133a
> > +        .quad 0x3fc6313a37335d76
> > +        .quad 0x3fc60b3100b09476
> > +        .quad 0x3fc5e533144c1719
> > +        .quad 0x3fc5bf406b543db2
> > +        .quad 0x3fc59958ff1d52f1
> > +        .quad 0x3fc5737cc9018cdd
> > +        .quad 0x3fc54dabc26105d2
> > +        .quad 0x3fc527e5e4a1b58d
> > +        .quad 0x3fc5022b292f6a45
> > +        .quad 0x3fc4dc7b897bc1c8
> > +        .quad 0x3fc4b6d6fefe22a4
> > +        .quad 0x3fc4913d8333b561
> > +        .quad 0x3fc46baf0f9f5db7
> > +        .quad 0x3fc4462b9dc9b3dc
> > +        .quad 0x3fc420b32740fdd4
> > +        .quad 0x3fc3fb45a59928cc
> > +        .quad 0x3fc3d5e3126bc27f
> > +        .quad 0x3fc3b08b6757f2a9
> > +        .quad 0x3fc38b3e9e027479
> > +        .quad 0x3fc365fcb0159016
> > +        .quad 0x3fc340c59741142e
> > +        .quad 0x3fc31b994d3a4f85
> > +        .quad 0x3fc2f677cbbc0a96
> > +        .quad 0x3fc2d1610c86813a
> > +        .quad 0x3fc2ac55095f5c59
> > +        .quad 0x3fc28753bc11aba5
> > +        .quad 0x3fc2625d1e6ddf57
> > +        .quad 0x3fc23d712a49c202
> > +        .quad 0x3fc2188fd9807263
> > +        .quad 0x3fc1f3b925f25d41
> > +        .quad 0x3fc1ceed09853752
> > +        .quad 0x3fc1aa2b7e23f72a
> > +        .quad 0x3fc185747dbecf34
> > +        .quad 0x3fc160c8024b27b1
> > +        .quad 0x3fc13c2605c398c3
> > +        .quad 0x3fc1178e8227e47c
> > +        .quad 0x3fc0f301717cf0fb
> > +        .quad 0x3fc0ce7ecdccc28d
> > +        .quad 0x3fc0aa06912675d5
> > +        .quad 0x3fc08598b59e3a07
> > +        .quad 0x3fc06135354d4b18
> > +        .quad 0x3fc03cdc0a51ec0d
> > +        .quad 0x3fc0188d2ecf6140
> > +        .quad 0x3fbfe89139dbd566
> > +        .quad 0x3fbfa01c9db57ce2
> > +        .quad 0x3fbf57bc7d9005db
> > +        .quad 0x3fbf0f70cdd992e3
> > +        .quad 0x3fbec739830a1120
> > +        .quad 0x3fbe7f1691a32d3e
> > +        .quad 0x3fbe3707ee30487b
> > +        .quad 0x3fbdef0d8d466db9
> > +        .quad 0x3fbda727638446a2
> > +        .quad 0x3fbd5f55659210e2
> > +        .quad 0x3fbd179788219364
> > +        .quad 0x3fbccfedbfee13a8
> > +        .quad 0x3fbc885801bc4b23
> > +        .quad 0x3fbc40d6425a5cb1
> > +        .quad 0x3fbbf968769fca11
> > +        .quad 0x3fbbb20e936d6974
> > +        .quad 0x3fbb6ac88dad5b1c
> > +        .quad 0x3fbb23965a52ff00
> > +        .quad 0x3fbadc77ee5aea8c
> > +        .quad 0x3fba956d3ecade63
> > +        .quad 0x3fba4e7640b1bc38
> > +        .quad 0x3fba0792e9277cac
> > +        .quad 0x3fb9c0c32d4d2548
> > +        .quad 0x3fb97a07024cbe74
> > +        .quad 0x3fb9335e5d594989
> > +        .quad 0x3fb8ecc933aeb6e8
> > +        .quad 0x3fb8a6477a91dc29
> > +        .quad 0x3fb85fd927506a48
> > +        .quad 0x3fb8197e2f40e3f0
> > +        .quad 0x3fb7d33687c293c9
> > +        .quad 0x3fb78d02263d82d3
> > +        .quad 0x3fb746e100226ed9
> > +        .quad 0x3fb700d30aeac0e1
> > +        .quad 0x3fb6bad83c1883b6
> > +        .quad 0x3fb674f089365a7a
> > +        .quad 0x3fb62f1be7d77743
> > +        .quad 0x3fb5e95a4d9791cb
> > +        .quad 0x3fb5a3abb01ade25
> > +        .quad 0x3fb55e10050e0384
> > +        .quad 0x3fb518874226130a
> > +        .quad 0x3fb4d3115d207eac
> > +        .quad 0x3fb48dae4bc31018
> > +        .quad 0x3fb4485e03dbdfad
> > +        .quad 0x3fb403207b414b7f
> > +        .quad 0x3fb3bdf5a7d1ee64
> > +        .quad 0x3fb378dd7f749714
> > +        .quad 0x3fb333d7f8183f4b
> > +        .quad 0x3fb2eee507b40301
> > +        .quad 0x3fb2aa04a44717a5
> > +        .quad 0x3fb26536c3d8c369
> > +        .quad 0x3fb2207b5c78549e
> > +        .quad 0x3fb1dbd2643d190b
> > +        .quad 0x3fb1973bd1465567
> > +        .quad 0x3fb152b799bb3cc9
> > +        .quad 0x3fb10e45b3cae831
> > +        .quad 0x3fb0c9e615ac4e17
> > +        .quad 0x3fb08598b59e3a07
> > +        .quad 0x3fb0415d89e74444
> > +        .quad 0x3faffa6911ab9301
> > +        .quad 0x3faf723b517fc523
> > +        .quad 0x3faeea31c006b87c
> > +        .quad 0x3fae624c4a0b5e1b
> > +        .quad 0x3fadda8adc67ee4e
> > +        .quad 0x3fad52ed6405d86f
> > +        .quad 0x3faccb73cdddb2cc
> > +        .quad 0x3fac441e06f72a9e
> > +        .quad 0x3fabbcebfc68f420
> > +        .quad 0x3fab35dd9b58baad
> > +        .quad 0x3faaaef2d0fb10fc
> > +        .quad 0x3faa282b8a936171
> > +        .quad 0x3fa9a187b573de7c
> > +        .quad 0x3fa91b073efd7314
> > +        .quad 0x3fa894aa149fb343
> > +        .quad 0x3fa80e7023d8ccc4
> > +        .quad 0x3fa788595a3577ba
> > +        .quad 0x3fa70265a550e777
> > +        .quad 0x3fa67c94f2d4bb58
> > +        .quad 0x3fa5f6e73078efb8
> > +        .quad 0x3fa5715c4c03ceef
> > +        .quad 0x3fa4ebf43349e26f
> > +        .quad 0x3fa466aed42de3ea
> > +        .quad 0x3fa3e18c1ca0ae92
> > +        .quad 0x3fa35c8bfaa1306b
> > +        .quad 0x3fa2d7ae5c3c5bae
> > +        .quad 0x3fa252f32f8d183f
> > +        .quad 0x3fa1ce5a62bc353a
> > +        .quad 0x3fa149e3e4005a8d
> > +        .quad 0x3fa0c58fa19dfaaa
> > +        .quad 0x3fa0415d89e74444
> > +        .quad 0x3f9f7a9b16782856
> > +        .quad 0x3f9e72bf2813ce51
> > +        .quad 0x3f9d6b2725979802
> > +        .quad 0x3f9c63d2ec14aaf2
> > +        .quad 0x3f9b5cc258b718e6
> > +        .quad 0x3f9a55f548c5c43f
> > +        .quad 0x3f994f6b99a24475
> > +        .quad 0x3f98492528c8cabf
> > +        .quad 0x3f974321d3d006d3
> > +        .quad 0x3f963d6178690bd6
> > +        .quad 0x3f9537e3f45f3565
> > +        .quad 0x3f9432a925980cc1
> > +        .quad 0x3f932db0ea132e22
> > +        .quad 0x3f9228fb1fea2e28
> > +        .quad 0x3f912487a5507f70
> > +        .quad 0x3f90205658935847
> > +        .quad 0x3f8e38ce3033310c
> > +        .quad 0x3f8c317384c75f06
> > +        .quad 0x3f8a2a9c6c170462
> > +        .quad 0x3f882448a388a2aa
> > +        .quad 0x3f861e77e8b53fc6
> > +        .quad 0x3f841929f96832f0
> > +        .quad 0x3f82145e939ef1e9
> > +        .quad 0x3f8010157588de71
> > +        .quad 0x3f7c189cbb0e27fb
> > +        .quad 0x3f78121214586b54
> > +        .quad 0x3f740c8a747878e2
> > +        .quad 0x3f70080559588b35
> > +        .quad 0x3f680904828985c0
> > +        .quad 0x3f60040155d5889e
> > +        .quad 0x3f50020055655889
> > +        .quad 0x0000000000000000
> > +        /*== poly_coeff[4] ==*/
> > +        .align 16
> > +        .quad 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A /* coeff4 */
> > +        .quad 0xbfd0000148058EE1, 0xbfd0000148058EE1 /* coeff3 */
> > +        .quad 0x3fd55555555543C5, 0x3fd55555555543C5 /* coeff2 */
> > +        .quad 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F /* coeff1 */
> > +        /*== ExpMask ==*/
> > +        .align 16
> > +        .quad 0x000fffffffffffff, 0x000fffffffffffff
> > +        /*== Two10 ==*/
> > +        .align 16
> > +        .quad 0x3f50000000000000, 0x3f50000000000000
> > +        /*== MinLog1p = -1+2^(-53) ==*/
> > +        .align 16
> > +        .quad 0xbfefffffffffffff, 0xbfefffffffffffff
> > +        /*== MaxLog1p ==*/
> > +        .align 16
> > +        .quad 0x7f3ffffffffff000, 0x7f3ffffffffff000
> > +        /*== One ==*/
> > +        .align 16
> > +        .quad 0x3ff0000000000000, 0x3ff0000000000000
> > +        /*== SgnMask ==*/
> > +        .align 16
> > +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff
> > +        /*== XThreshold ==*/
> > +        .align 16
> > +        .quad 0x3e00000000000000, 0x3e00000000000000
> > +        /*== XhMask ==*/
> > +        .align 16
> > +        .quad 0xfffffffffffffc00, 0xfffffffffffffc00
> > +        /*== Threshold ==*/
> > +        .align 16
> > +        .quad 0x4086a00000000000, 0x4086a00000000000
> > +        /*== Bias ==*/
> > +        .align 16
> > +        .quad 0x408ff80000000000, 0x408ff80000000000
> > +        /*== Bias1 ==*/
> > +        .align 16
> > +        .quad 0x408ff00000000000, 0x408ff00000000000
> > +        /*== ExpMask ==*/
> > +        .align 16
> > +        .quad 0x7ff0000000000000, 0x7ff0000000000000
> > +        /*== ExpMask2 ==*/
> > +        .align 16
> > +        .quad 0x7f40000000000000, 0x7f40000000000000
> > +        /*== L2L ==*/
> > +        .align 16
> > +        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
> > +        .align 16
> > +        .type        __svml_dlog1p_data_internal,@object
> > +        .size        __svml_dlog1p_data_internal,.-__svml_dlog1p_data_internal
> > +        .space 96, 0x00
> > +        .align 16
> > +
> > +.FLT_16:
> > +        .long        0x00000000,0x43380000,0x00000000,0x43380000
> > +        .type        .FLT_16,@object
> > +        .size        .FLT_16,16
> > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core-sse.S
> > new file mode 100644
> > index 0000000000..ec01af680c
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core-sse.S
> > @@ -0,0 +1,20 @@
> > +/* SSE version of vectorized log1p, vector length is 4.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#define _ZGVdN4v_log1p _ZGVdN4v_log1p_sse_wrapper
> > +#include "../svml_d_log1p4_core.S"
> > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core.c
> > new file mode 100644
> > index 0000000000..808f3224ef
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core.c
> > @@ -0,0 +1,27 @@
> > +/* Multiple versions of vectorized log1p, vector length is 4.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#define SYMBOL_NAME _ZGVdN4v_log1p
> > +#include "ifunc-mathvec-avx2.h"
> > +
> > +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> > +
> > +#ifdef SHARED
> > +__hidden_ver1 (_ZGVdN4v_log1p, __GI__ZGVdN4v_log1p, __redirect__ZGVdN4v_log1p)
> > +  __attribute__ ((visibility ("hidden")));
> > +#endif
> > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core_avx2.S
> > new file mode 100644
> > index 0000000000..548538b0ec
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core_avx2.S
> > @@ -0,0 +1,1383 @@
> > +/* Function log1p vectorized with AVX2.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +/*
> > + * ALGORITHM DESCRIPTION:
> > + *
> > + *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
> > + *    Get short reciprocal approximation Rcp ~ 1/xh
> > + *    R = (Rcp*xh - 1.0) + Rcp*xl
> > + *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
> > + *       log(Rcp) is tabulated
> > + *
> > + */
> > +
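
Not part of the patch, but for anyone following the AVX2 body below against
the algorithm comment above, the reduction boils down to roughly this scalar
sketch (my own illustration, not code from the patch; it uses an exact 1/m
and libm's log() where the vector code uses rcpps plus the Log_LA_table, and
it ignores the special-input paths):

  #include <math.h>

  /* Scalar model of the reduction only; not the patch's code.  */
  static double
  log1p_model (double x)
  {
    /* 1+x as an exact high/low pair: xh + xl == 1 + x.  */
    double a = fmax (x, 1.0), b = fmin (x, 1.0);
    double xh = a + b;
    double xl = (a - xh) + b;

    /* xh = 2^k * m with m in [1,2); Rcp ~ 1/m (about 9-10 bits in the
       vector code, exact here).  */
    int k;
    double m = 2.0 * frexp (xh, &k);
    k -= 1;
    double rcp = 1.0 / m;

    /* Reduced argument: 1+x == (2^k / rcp) * (1 + r).  */
    double r = (rcp * m - 1.0) + rcp * ldexp (xl, -k);

    /* log1p(x) = k*log(2) - log(rcp) + log(1+r); log(1+r) by a degree-5
       polynomial, log(rcp) from the table in the real code.  */
    double p = (((0.2 * r - 0.25) * r + 1.0 / 3.0) * r - 0.5) * (r * r) + r;
    return k * M_LN2 - log (rcp) + p;
  }
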
> > +/* Offsets for data table __svml_dlog1p_data_internal
> > + */
> > +#define Log_HA_table                         0
> > +#define Log_LA_table                         8224
> > +#define poly_coeff                           12352
> > +#define ExpMask                              12480
> > +#define Two10                                12512
> > +#define MinLog1p                             12544
> > +#define MaxLog1p                             12576
> > +#define One                                  12608
> > +#define SgnMask                              12640
> > +#define XThreshold                           12672
> > +#define XhMask                               12704
> > +#define Threshold                            12736
> > +#define Bias                                 12768
> > +#define Bias1                                12800
> > +#define ExpMask0                             12832
> > +#define ExpMask2                             12864
> > +#define L2                                   12896
> > +
> > +/* Lookup bias for data table __svml_dlog1p_data_internal.  */
> > +#define Table_Lookup_Bias               -0x405fe0
> > +
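
A side note on the Table_Lookup_Bias value, if I am reading the index
computation right: the rounded reciprocal computed below is an integer of
roughly 512..1024 (the mantissa is rescaled to [2^-10, 2^-9) first), and
vpsrlq $40 turns its bit pattern directly into a byte index, so biasing the
base register by -0x405fe0 lands index 512 exactly on Log_LA_table (offset
8224) and steps 8 bytes per tabulated log(Rcp) entry.  A standalone check of
that arithmetic, not code from the patch (the two macros just copy the
values defined above):

  #include <assert.h>
  #include <stdint.h>
  #include <string.h>

  #define LOG_LA_TABLE 8224
  #define TABLE_LOOKUP_BIAS (-0x405fe0)

  int
  main (void)
  {
    for (uint64_t n = 512; n <= 1024; n++)
      {
        double d = (double) n;
        uint64_t bits;
        memcpy (&bits, &d, sizeof bits);
        /* (bits >> 40) + bias == byte offset of the n-th table entry.  */
        assert ((long) (bits >> 40) + TABLE_LOOKUP_BIAS
                == LOG_LA_TABLE + 8 * (long) (n - 512));
      }
    return 0;
  }
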
> > +#include <sysdep.h>
> > +
> > +        .text
> > +     .section .text.avx2,"ax",@progbits
> > +ENTRY(_ZGVdN4v_log1p_avx2)
> > +        pushq     %rbp
> > +        cfi_def_cfa_offset(16)
> > +        movq      %rsp, %rbp
> > +        cfi_def_cfa(6, 16)
> > +        cfi_offset(6, -16)
> > +        andq      $-32, %rsp
> > +        subq      $96, %rsp
> > +        lea       Table_Lookup_Bias+__svml_dlog1p_data_internal(%rip), %r8
> > +
> > +/* SgnMask used by all accuracies */
> > +        vmovupd   SgnMask+__svml_dlog1p_data_internal(%rip), %ymm12
> > +        vmovupd   One+__svml_dlog1p_data_internal(%rip), %ymm7
> > +
> > +/* 2^(-10-exp(X)) */
> > +        vmovupd   ExpMask2+__svml_dlog1p_data_internal(%rip), %ymm3
> > +        vmovapd   %ymm0, %ymm9
> > +        vandpd    %ymm12, %ymm9, %ymm10
> > +        vcmplt_oqpd XThreshold+__svml_dlog1p_data_internal(%rip), %ymm10, %ymm11
> > +        vaddpd    %ymm7, %ymm9, %ymm13
> > +
> > +/* compute 1+x as high, low parts */
> > +        vmaxpd    %ymm9, %ymm7, %ymm15
> > +        vminpd    %ymm9, %ymm7, %ymm6
> > +        vorpd     XhMask+__svml_dlog1p_data_internal(%rip), %ymm11, %ymm14
> > +        vandpd    %ymm14, %ymm13, %ymm4
> > +
> > +/* preserve mantissa, set input exponent to 2^(-10) */
> > +        vandpd    ExpMask+__svml_dlog1p_data_internal(%rip), %ymm4, %ymm5
> > +        vorpd     Two10+__svml_dlog1p_data_internal(%rip), %ymm5, %ymm5
> > +
> > +/* reciprocal approximation good to at least 11 bits */
> > +        vcvtpd2ps %ymm5, %xmm2
> > +        vsubpd    %ymm4, %ymm15, %ymm0
> > +
> > +/* check range */
> > +        vcmplt_oqpd MinLog1p+__svml_dlog1p_data_internal(%rip), %ymm9, %ymm15
> > +        vrcpps    %xmm2, %xmm1
> > +        vaddpd    %ymm0, %ymm6, %ymm6
> > +        vcmpnle_uqpd MaxLog1p+__svml_dlog1p_data_internal(%rip), %ymm9, %ymm0
> > +        vcvtps2pd %xmm1, %ymm11
> > +
> > +/* exponent of X needed to scale Xl */
> > +        vandps    ExpMask0+__svml_dlog1p_data_internal(%rip), %ymm4, %ymm10
> > +        vpsubq    %ymm10, %ymm3, %ymm13
> > +
> > +/* exponent bits */
> > +        vpsrlq    $20, %ymm4, %ymm4
> > +
> > +/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
> > +        vroundpd  $0, %ymm11, %ymm3
> > +
> > +/* scale DblRcp */
> > +        vmulpd    %ymm13, %ymm3, %ymm2
> > +
> > +/* exponent*log(2.0) */
> > +        vmovupd   Threshold+__svml_dlog1p_data_internal(%rip), %ymm13
> > +        vfmsub213pd %ymm7, %ymm3, %ymm5
> > +
> > +/* Compute SignMask for all accuracies, including EP */
> > +        vandnpd   %ymm9, %ymm12, %ymm8
> > +        vorpd     %ymm0, %ymm15, %ymm7
> > +
> > +/*
> > + * prepare table index
> > + * table lookup
> > + */
> > +        vpsrlq    $40, %ymm3, %ymm0
> > +
> > +/*
> > + * argument reduction
> > + * VQFMS( D, R, X, DblRcp1, One );
> > + */
> > +        vfmadd213pd %ymm5, %ymm2, %ymm6
> > +        vmovupd   poly_coeff+64+__svml_dlog1p_data_internal(%rip), %ymm2
> > +        vcmplt_oqpd %ymm3, %ymm13, %ymm3
> > +        vmulpd    %ymm6, %ymm6, %ymm5
> > +        vfmadd213pd poly_coeff+96+__svml_dlog1p_data_internal(%rip), %ymm6, %ymm2
> > +
> > +/* combine and get argument value range mask */
> > +        vmovmskpd %ymm7, %eax
> > +        vextractf128 $1, %ymm4, %xmm12
> > +        vshufps   $221, %xmm12, %xmm4, %xmm14
> > +
> > +/* biased exponent in DP format */
> > +        vcvtdq2pd %xmm14, %ymm1
> > +        vandpd    Bias+__svml_dlog1p_data_internal(%rip), %ymm3, %ymm14
> > +        vorpd     Bias1+__svml_dlog1p_data_internal(%rip), %ymm14, %ymm15
> > +        vsubpd    %ymm15, %ymm1, %ymm1
> > +        vmulpd    L2+__svml_dlog1p_data_internal(%rip), %ymm1, %ymm3
> > +
> > +/* polynomial */
> > +        vmovupd   poly_coeff+__svml_dlog1p_data_internal(%rip), %ymm1
> > +        vfmadd213pd poly_coeff+32+__svml_dlog1p_data_internal(%rip), %ymm6, %ymm1
> > +        vfmadd213pd %ymm2, %ymm5, %ymm1
> > +
> > +/* reconstruction */
> > +        vfmadd213pd %ymm6, %ymm5, %ymm1
> > +        vextractf128 $1, %ymm0, %xmm10
> > +        vmovd     %xmm0, %edx
> > +        vmovd     %xmm10, %esi
> > +        movslq    %edx, %rdx
> > +        vpextrd   $2, %xmm0, %ecx
> > +        movslq    %esi, %rsi
> > +        vpextrd   $2, %xmm10, %edi
> > +        movslq    %ecx, %rcx
> > +        movslq    %edi, %rdi
> > +        vmovsd    (%r8,%rdx), %xmm4
> > +        vmovsd    (%r8,%rsi), %xmm11
> > +        vmovhpd   (%r8,%rcx), %xmm4, %xmm7
> > +        vmovhpd   (%r8,%rdi), %xmm11, %xmm12
> > +        vinsertf128 $1, %xmm12, %ymm7, %ymm0
> > +        vaddpd    %ymm1, %ymm0, %ymm6
> > +        vaddpd    %ymm6, %ymm3, %ymm0
> > +
> > +/* OR in the Sign of input argument to produce correct log1p(-0) */
> > +        vorpd     %ymm8, %ymm0, %ymm0
> > +        testl     %eax, %eax
> > +
> > +/* Go to special inputs processing branch */
> > +        jne       L(SPECIAL_VALUES_BRANCH)
> > +                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm9
> > +
> > +/* Restore registers
> > + * and exit the function
> > + */
> > +
> > +L(EXIT):
> > +        movq      %rbp, %rsp
> > +        popq      %rbp
> > +        cfi_def_cfa(7, 8)
> > +        cfi_restore(6)
> > +        ret
> > +        cfi_def_cfa(6, 16)
> > +        cfi_offset(6, -16)
> > +
> > +/* Branch to process
> > + * special inputs
> > + */
> > +
> > +L(SPECIAL_VALUES_BRANCH):
> > +        vmovupd   %ymm9, 32(%rsp)
> > +        vmovupd   %ymm0, 64(%rsp)
> > +                                # LOE rbx r12 r13 r14 r15 eax ymm0
> > +
> > +        xorl      %edx, %edx
> > +                                # LOE rbx r12 r13 r14 r15 eax edx
> > +
> > +        vzeroupper
> > +        movq      %r12, 16(%rsp)
> > +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> > +        movl      %edx, %r12d
> > +        movq      %r13, 8(%rsp)
> > +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> > +        movl      %eax, %r13d
> > +        movq      %r14, (%rsp)
> > +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> > +                                # LOE rbx r15 r12d r13d
> > +
> > +/* Range mask
> > + * bits check
> > + */
> > +
> > +L(RANGEMASK_CHECK):
> > +        btl       %r12d, %r13d
> > +
> > +/* Call scalar math function */
> > +        jc        L(SCALAR_MATH_CALL)
> > +                                # LOE rbx r15 r12d r13d
> > +
> > +/* Special inputs
> > + * processing loop
> > + */
> > +
> > +L(SPECIAL_VALUES_LOOP):
> > +        incl      %r12d
> > +        cmpl      $4, %r12d
> > +
> > +/* Check bits in range mask */
> > +        jl        L(RANGEMASK_CHECK)
> > +                                # LOE rbx r15 r12d r13d
> > +
> > +        movq      16(%rsp), %r12
> > +        cfi_restore(12)
> > +        movq      8(%rsp), %r13
> > +        cfi_restore(13)
> > +        movq      (%rsp), %r14
> > +        cfi_restore(14)
> > +        vmovupd   64(%rsp), %ymm0
> > +
> > +/* Go to exit */
> > +        jmp       L(EXIT)
> > +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> > +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> > +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> > +                                # LOE rbx r12 r13 r14 r15 ymm0
> > +
> > +/* Scalar math function call
> > + * to process special input
> > + */
> > +
> > +L(SCALAR_MATH_CALL):
> > +        movl      %r12d, %r14d
> > +        movsd     32(%rsp,%r14,8), %xmm0
> > +        call      log1p@PLT
> > +                                # LOE rbx r14 r15 r12d r13d xmm0
> > +
> > +        movsd     %xmm0, 64(%rsp,%r14,8)
> > +
> > +/* Process special inputs in loop */
> > +        jmp       L(SPECIAL_VALUES_LOOP)
> > +                                # LOE rbx r15 r12d r13d
> > +END(_ZGVdN4v_log1p_avx2)
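
For readers new to these libmvec fallback paths: the SPECIAL_VALUES_BRANCH /
RANGEMASK_CHECK / SCALAR_MATH_CALL sequence above amounts to the loop below
(a loose C rendering, not the generated code; the function name is made up,
and in the assembly the two arrays live in the 96-byte stack frame while the
lane counter is r12d):

  #include <math.h>

  /* For each of the 4 lanes flagged in the range mask, redo that lane
     with the scalar log1p and patch it into the vector result.  */
  static void
  fixup_special_lanes (const double in[4], double out[4], unsigned int mask)
  {
    for (int lane = 0; lane < 4; lane++)
      if (mask & (1u << lane))
        out[lane] = log1p (in[lane]);
  }
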
> > +
> > +        .section .rodata, "a"
> > +        .align 32
> > +
> > +#ifdef __svml_dlog1p_data_internal_typedef
> > +typedef unsigned int VUINT32;
> > +typedef struct {
> > +        __declspec(align(32)) VUINT32 Log_HA_table[(1<<10)+2][2];
> > +        __declspec(align(32)) VUINT32 Log_LA_table[(1<<9)+1][2];
> > +        __declspec(align(32)) VUINT32 poly_coeff[4][4][2];
> > +        __declspec(align(32)) VUINT32 ExpMask[4][2];
> > +        __declspec(align(32)) VUINT32 Two10[4][2];
> > +        __declspec(align(32)) VUINT32 MinLog1p[4][2];
> > +        __declspec(align(32)) VUINT32 MaxLog1p[4][2];
> > +        __declspec(align(32)) VUINT32 One[4][2];
> > +        __declspec(align(32)) VUINT32 SgnMask[4][2];
> > +        __declspec(align(32)) VUINT32 XThreshold[4][2];
> > +        __declspec(align(32)) VUINT32 XhMask[4][2];
> > +        __declspec(align(32)) VUINT32 Threshold[4][2];
> > +        __declspec(align(32)) VUINT32 Bias[4][2];
> > +        __declspec(align(32)) VUINT32 Bias1[4][2];
> > +        __declspec(align(32)) VUINT32 ExpMask0[4][2];
> > +        __declspec(align(32)) VUINT32 ExpMask2[4][2];
> > +        __declspec(align(32)) VUINT32 L2[4][2];
> > +} __svml_dlog1p_data_internal;
> > +#endif
> > +__svml_dlog1p_data_internal:
> > +        /* Log_HA_table */
> > +        .quad 0xc086232bdd7a8300, 0xbe1ce91eef3fb100
> > +        .quad 0xc086232fdc7ad828, 0xbe1cefcffda73b6a
> > +        .quad 0xc0862333d97d2ba0, 0xbe1cef406748f1ff
> > +        .quad 0xc0862337d48378e0, 0xbe1cef2a9429925a
> > +        .quad 0xc086233bcd8fb878, 0xbe1cf138d17ebecb
> > +        .quad 0xc086233fc4a3e018, 0xbe1ceff2dbbbb29e
> > +        .quad 0xc0862343b9c1e270, 0xbe1cf1a42aae437b
> > +        .quad 0xc0862347acebaf68, 0xbe1cef3b152048af
> > +        .quad 0xc086234b9e2333f0, 0xbe1cef20e127805e
> > +        .quad 0xc086234f8d6a5a30, 0xbe1cf00ad6052cf4
> > +        .quad 0xc08623537ac30980, 0xbe1cefc4642ee597
> > +        .quad 0xc0862357662f2660, 0xbe1cf1f277d36e16
> > +        .quad 0xc086235b4fb092a0, 0xbe1ceed009e8d8e6
> > +        .quad 0xc086235f37492d28, 0xbe1cf1e4038cb362
> > +        .quad 0xc08623631cfad250, 0xbe1cf0b0873b8557
> > +        .quad 0xc086236700c75b98, 0xbe1cf15bb3227c0b
> > +        .quad 0xc086236ae2b09fe0, 0xbe1cf151ef8ca9ed
> > +        .quad 0xc086236ec2b87358, 0xbe1cefe1dc2cd2ed
> > +        .quad 0xc0862372a0e0a780, 0xbe1cf0d1eec5454f
> > +        .quad 0xc08623767d2b0b48, 0xbe1ceeefd570bbce
> > +        .quad 0xc086237a57996af0, 0xbe1cee99ae91b3a7
> > +        .quad 0xc086237e302d9028, 0xbe1cf0412830fbd1
> > +        .quad 0xc086238206e94218, 0xbe1ceee898588610
> > +        .quad 0xc0862385dbce4548, 0xbe1cee9a1fbcaaea
> > +        .quad 0xc0862389aede5bc0, 0xbe1ceed8e7cc1ad6
> > +        .quad 0xc086238d801b4500, 0xbe1cf10c8d059da6
> > +        .quad 0xc08623914f86be18, 0xbe1ceee6c63a8165
> > +        .quad 0xc08623951d228180, 0xbe1cf0c3592d2ff1
> > +        .quad 0xc0862398e8f04758, 0xbe1cf0026cc4cb1b
> > +        .quad 0xc086239cb2f1c538, 0xbe1cf15d48d8e670
> > +        .quad 0xc08623a07b28ae60, 0xbe1cef359363787c
> > +        .quad 0xc08623a44196b390, 0xbe1cefdf1ab2e82c
> > +        .quad 0xc08623a8063d8338, 0xbe1cefe43c02aa84
> > +        .quad 0xc08623abc91ec960, 0xbe1cf044f5ae35b7
> > +        .quad 0xc08623af8a3c2fb8, 0xbe1cf0b0b4001e1b
> > +        .quad 0xc08623b349975d98, 0xbe1cf1bae76dfbcf
> > +        .quad 0xc08623b70731f810, 0xbe1cef0a72e13a62
> > +        .quad 0xc08623bac30da1c8, 0xbe1cf184007d2b6b
> > +        .quad 0xc08623be7d2bfb40, 0xbe1cf16f4b239e98
> > +        .quad 0xc08623c2358ea2a0, 0xbe1cf0976acada87
> > +        .quad 0xc08623c5ec3733d0, 0xbe1cf066318a16ff
> > +        .quad 0xc08623c9a1274880, 0xbe1ceffaa7148798
> > +        .quad 0xc08623cd54607820, 0xbe1cf23ab02e9b6e
> > +        .quad 0xc08623d105e45800, 0xbe1cefdfef7d4fde
> > +        .quad 0xc08623d4b5b47b20, 0xbe1cf17fece44f2b
> > +        .quad 0xc08623d863d27270, 0xbe1cf18f907d0d7c
> > +        .quad 0xc08623dc103fccb0, 0xbe1cee61fe072c98
> > +        .quad 0xc08623dfbafe1668, 0xbe1cf022dd891e2f
> > +        .quad 0xc08623e3640eda20, 0xbe1ceecc1daf4358
> > +        .quad 0xc08623e70b73a028, 0xbe1cf0173c4fa380
> > +        .quad 0xc08623eab12deec8, 0xbe1cf16a2150c2f4
> > +        .quad 0xc08623ee553f4a30, 0xbe1cf1bf980b1f4b
> > +        .quad 0xc08623f1f7a93480, 0xbe1cef8b731663c2
> > +        .quad 0xc08623f5986d2dc0, 0xbe1cee9a664d7ef4
> > +        .quad 0xc08623f9378cb3f0, 0xbe1cf1eda2af6400
> > +        .quad 0xc08623fcd5094320, 0xbe1cf1923f9d68d7
> > +        .quad 0xc086240070e45548, 0xbe1cf0747cd3e03a
> > +        .quad 0xc08624040b1f6260, 0xbe1cf22ee855bd6d
> > +        .quad 0xc0862407a3bbe078, 0xbe1cf0d57360c00b
> > +        .quad 0xc086240b3abb4398, 0xbe1ceebc815cd575
> > +        .quad 0xc086240ed01efdd0, 0xbe1cf03bfb970951
> > +        .quad 0xc086241263e87f50, 0xbe1cf16e74768529
> > +        .quad 0xc0862415f6193658, 0xbe1cefec64b8becb
> > +        .quad 0xc086241986b28f30, 0xbe1cf0838d210baa
> > +        .quad 0xc086241d15b5f448, 0xbe1cf0ea86e75b11
> > +        .quad 0xc0862420a324ce28, 0xbe1cf1708d11d805
> > +        .quad 0xc08624242f008380, 0xbe1ceea988c5a417
> > +        .quad 0xc0862427b94a7910, 0xbe1cef166a7bbca5
> > +        .quad 0xc086242b420411d0, 0xbe1cf0c9d9e86a38
> > +        .quad 0xc086242ec92eaee8, 0xbe1cef0946455411
> > +        .quad 0xc08624324ecbaf98, 0xbe1cefea60907739
> > +        .quad 0xc0862435d2dc7160, 0xbe1cf1ed0934ce42
> > +        .quad 0xc086243955624ff8, 0xbe1cf191ba746c7d
> > +        .quad 0xc086243cd65ea548, 0xbe1ceeec78cf2a7e
> > +        .quad 0xc086244055d2c968, 0xbe1cef345284c119
> > +        .quad 0xc0862443d3c012b8, 0xbe1cf24f77355219
> > +        .quad 0xc08624475027d5e8, 0xbe1cf05bf087e114
> > +        .quad 0xc086244acb0b65d0, 0xbe1cef3504a32189
> > +        .quad 0xc086244e446c1398, 0xbe1ceff54b2a406f
> > +        .quad 0xc0862451bc4b2eb8, 0xbe1cf0757d54ed4f
> > +        .quad 0xc086245532aa04f0, 0xbe1cf0c8099fdfd5
> > +        .quad 0xc0862458a789e250, 0xbe1cf0b173796a31
> > +        .quad 0xc086245c1aec1138, 0xbe1cf11d8734540d
> > +        .quad 0xc086245f8cd1da60, 0xbe1cf1916a723ceb
> > +        .quad 0xc0862462fd3c84d8, 0xbe1cf19a911e1da7
> > +        .quad 0xc08624666c2d5608, 0xbe1cf23a9ef72e4f
> > +        .quad 0xc0862469d9a591c0, 0xbe1cef503d947663
> > +        .quad 0xc086246d45a67a18, 0xbe1cf0fceeb1a0b2
> > +        .quad 0xc0862470b0314fa8, 0xbe1cf107e27e4fbc
> > +        .quad 0xc086247419475160, 0xbe1cf03dd9922331
> > +        .quad 0xc086247780e9bc98, 0xbe1cefce1a10e129
> > +        .quad 0xc086247ae719cd18, 0xbe1ceea47f73c4f6
> > +        .quad 0xc086247e4bd8bd10, 0xbe1ceec0ac56d100
> > +        .quad 0xc0862481af27c528, 0xbe1cee8a6593278a
> > +        .quad 0xc086248511081c70, 0xbe1cf2231dd9dec7
> > +        .quad 0xc0862488717af888, 0xbe1cf0b4b8ed7da8
> > +        .quad 0xc086248bd0818d68, 0xbe1cf1bd8d835002
> > +        .quad 0xc086248f2e1d0d98, 0xbe1cf259acc107f4
> > +        .quad 0xc08624928a4eaa20, 0xbe1cee897636b00c
> > +        .quad 0xc0862495e5179270, 0xbe1cee757f20c326
> > +        .quad 0xc08624993e78f490, 0xbe1cefafd3aa54a4
> > +        .quad 0xc086249c9673fd10, 0xbe1cee7298d38b97
> > +        .quad 0xc086249fed09d6f8, 0xbe1ceedc158d4ceb
> > +        .quad 0xc08624a3423babe0, 0xbe1cf2282987cb2e
> > +        .quad 0xc08624a6960aa400, 0xbe1cefe7381ecc4b
> > +        .quad 0xc08624a9e877e600, 0xbe1cef328dbbce80
> > +        .quad 0xc08624ad39849728, 0xbe1cefde45f3cc71
> > +        .quad 0xc08624b08931db58, 0xbe1cefa8b89433b9
> > +        .quad 0xc08624b3d780d500, 0xbe1cef6773c0b139
> > +        .quad 0xc08624b72472a528, 0xbe1cf031c931c11f
> > +        .quad 0xc08624ba70086b78, 0xbe1cf088f49275e7
> > +        .quad 0xc08624bdba434630, 0xbe1cf17de0eaa86d
> > +        .quad 0xc08624c103245238, 0xbe1cefd492f1ba75
> > +        .quad 0xc08624c44aacab08, 0xbe1cf1253e154466
> > +        .quad 0xc08624c790dd6ad0, 0xbe1cf0fb09ee6d55
> > +        .quad 0xc08624cad5b7aa58, 0xbe1cf1f08dd048fe
> > +        .quad 0xc08624ce193c8120, 0xbe1ceeca0809697f
> > +        .quad 0xc08624d15b6d0538, 0xbe1cef8d5662d968
> > +        .quad 0xc08624d49c4a4b78, 0xbe1cee97b556ed78
> > +        .quad 0xc08624d7dbd56750, 0xbe1cf1b14b6acb75
> > +        .quad 0xc08624db1a0f6b00, 0xbe1cef1e860623f2
> > +        .quad 0xc08624de56f96758, 0xbe1ceeaf4d156f3d
> > +        .quad 0xc08624e192946bf0, 0xbe1ceecc12b400ed
> > +        .quad 0xc08624e4cce18710, 0xbe1cf180c40c794f
> > +        .quad 0xc08624e805e1c5c8, 0xbe1cf185a08f7f65
> > +        .quad 0xc08624eb3d9633d8, 0xbe1cef45fc924078
> > +        .quad 0xc08624ee73ffdbb0, 0xbe1cf1e4f457f32a
> > +        .quad 0xc08624f1a91fc6a0, 0xbe1cf040147b8a5a
> > +        .quad 0xc08624f4dcf6fc98, 0xbe1cf1effca0dfb2
> > +        .quad 0xc08624f80f868468, 0xbe1cf0470146e5bc
> > +        .quad 0xc08624fb40cf6390, 0xbe1cef4dd186e501
> > +        .quad 0xc08624fe70d29e60, 0xbe1ceebe257f66c7
> > +        .quad 0xc08625019f9137f0, 0xbe1ceefb7a1c395c
> > +        .quad 0xc0862504cd0c3220, 0xbe1cf209dedfed8c
> > +        .quad 0xc0862507f9448db0, 0xbe1cf082da464994
> > +        .quad 0xc086250b243b4a18, 0xbe1cee88694a73cf
> > +        .quad 0xc086250e4df165a0, 0xbe1cf0b61e8f0531
> > +        .quad 0xc08625117667dd78, 0xbe1cf1106599c962
> > +        .quad 0xc08625149d9fad98, 0xbe1ceff1ee88af1f
> > +        .quad 0xc0862517c399d0c8, 0xbe1cf0f746994ef6
> > +        .quad 0xc086251ae85740b8, 0xbe1cefe8a1d077e4
> > +        .quad 0xc086251e0bd8f5e0, 0xbe1cf1a1da036092
> > +        .quad 0xc08625212e1fe7a8, 0xbe1cf0f8a7786fcd
> > +        .quad 0xc08625244f2d0c48, 0xbe1cefa1174a07a7
> > +        .quad 0xc08625276f0158d8, 0xbe1cef1043aa5b25
> > +        .quad 0xc086252a8d9dc150, 0xbe1cf15d521c169d
> > +        .quad 0xc086252dab033898, 0xbe1cf220bba8861f
> > +        .quad 0xc0862530c732b078, 0xbe1cef51e310eae2
> > +        .quad 0xc0862533e22d1988, 0xbe1cf222fcedd8ae
> > +        .quad 0xc0862536fbf36370, 0xbe1cefdb4da4bda8
> > +        .quad 0xc086253a14867ca0, 0xbe1ceeafc1112171
> > +        .quad 0xc086253d2be75280, 0xbe1cee99dfb4b408
> > +        .quad 0xc08625404216d160, 0xbe1cf22d2536f06b
> > +        .quad 0xc08625435715e498, 0xbe1cef6abbf2e268
> > +        .quad 0xc08625466ae57648, 0xbe1cf093a14789f5
> > +        .quad 0xc08625497d866fa0, 0xbe1cf0f93655603c
> > +        .quad 0xc086254c8ef9b8b8, 0xbe1cf1cc40c9aafc
> > +        .quad 0xc086254f9f4038a8, 0xbe1ceeea5f4e9157
> > +        .quad 0xc0862552ae5ad568, 0xbe1cefa9f52d4997
> > +        .quad 0xc0862555bc4a7400, 0xbe1cefa490a638ff
> > +        .quad 0xc0862558c90ff868, 0xbe1cef7fcf797d6f
> > +        .quad 0xc086255bd4ac4590, 0xbe1cf1b4c51113c9
> > +        .quad 0xc086255edf203d78, 0xbe1cef55e5b4a55d
> > +        .quad 0xc0862561e86cc100, 0xbe1cf0d37a25f9dc
> > +        .quad 0xc0862564f092b028, 0xbe1ceebe9efc19d9
> > +        .quad 0xc0862567f792e9d8, 0xbe1cee8ad30a57b5
> > +        .quad 0xc086256afd6e4c08, 0xbe1cef4e1817b90b
> > +        .quad 0xc086256e0225b3b8, 0xbe1cee7fa9229996
> > +        .quad 0xc086257105b9fce0, 0xbe1cf0b54963d945
> > +        .quad 0xc0862574082c0298, 0xbe1cee5f2f3c7995
> > +        .quad 0xc0862577097c9ee0, 0xbe1cf0828e303a2c
> > +        .quad 0xc086257a09acaae0, 0xbe1cf172c3078947
> > +        .quad 0xc086257d08bcfec0, 0xbe1cf189252afa22
> > +        .quad 0xc086258006ae71b8, 0xbe1cefdb80426923
> > +        .quad 0xc08625830381da08, 0xbe1ceef1391a0372
> > +        .quad 0xc0862585ff380d00, 0xbe1cf17720c78d13
> > +        .quad 0xc0862588f9d1df18, 0xbe1ceef1f9027d83
> > +        .quad 0xc086258bf35023b8, 0xbe1cf06fac99dec9
> > +        .quad 0xc086258eebb3ad78, 0xbe1cf1373eeb45c0
> > +        .quad 0xc0862591e2fd4e00, 0xbe1cef777536bb81
> > +        .quad 0xc0862594d92dd600, 0xbe1cf0f43ca40766
> > +        .quad 0xc0862597ce461558, 0xbe1cefb2cfc6766b
> > +        .quad 0xc086259ac246daf0, 0xbe1ceea49e64ffa2
> > +        .quad 0xc086259db530f4c8, 0xbe1cf250fa457dec
> > +        .quad 0xc08625a0a7053018, 0xbe1cf17d8bb2a44e
> > +        .quad 0xc08625a397c45918, 0xbe1cf1d5906d54b7
> > +        .quad 0xc08625a6876f3b30, 0xbe1cf08fe7b31780
> > +        .quad 0xc08625a97606a0e0, 0xbe1cef13edfc9d11
> > +        .quad 0xc08625ac638b53c8, 0xbe1cef9d2b107219
> > +        .quad 0xc08625af4ffe1cb0, 0xbe1cf1ddd4ff6160
> > +        .quad 0xc08625b23b5fc390, 0xbe1cefa02a996495
> > +        .quad 0xc08625b525b10f68, 0xbe1cf166a7e37ee5
> > +        .quad 0xc08625b80ef2c680, 0xbe1cef0b171068a5
> > +        .quad 0xc08625baf725ae28, 0xbe1cf05c80779283
> > +        .quad 0xc08625bdde4a8af0, 0xbe1cf1bbfbffb889
> > +        .quad 0xc08625c0c4622090, 0xbe1cf0b8666c0124
> > +        .quad 0xc08625c3a96d31e0, 0xbe1cf0a8fcf47a86
> > +        .quad 0xc08625c68d6c80f0, 0xbe1cef46e18cb092
> > +        .quad 0xc08625c97060cef0, 0xbe1cf1458a350efb
> > +        .quad 0xc08625cc524adc58, 0xbe1ceeea1dadce12
> > +        .quad 0xc08625cf332b68b0, 0xbe1cf0a1bfdc44c7
> > +        .quad 0xc08625d2130332d0, 0xbe1cef96d02da73e
> > +        .quad 0xc08625d4f1d2f8a8, 0xbe1cf2451c3c7701
> > +        .quad 0xc08625d7cf9b7778, 0xbe1cf10d08f83812
> > +        .quad 0xc08625daac5d6ba0, 0xbe1ceec5b4895c5e
> > +        .quad 0xc08625dd881990b0, 0xbe1cf14e1325c5e4
> > +        .quad 0xc08625e062d0a188, 0xbe1cf21d0904be12
> > +        .quad 0xc08625e33c835838, 0xbe1ceed0839bcf21
> > +        .quad 0xc08625e615326df0, 0xbe1cf1bb944889d2
> > +        .quad 0xc08625e8ecde9b48, 0xbe1cee738e85eece
> > +        .quad 0xc08625ebc38897e0, 0xbe1cf25c2bc6ef12
> > +        .quad 0xc08625ee99311ac8, 0xbe1cf132b70a41ad
> > +        .quad 0xc08625f16dd8da28, 0xbe1cf1984236a6e3
> > +        .quad 0xc08625f441808b78, 0xbe1cf19ae74998f9
> > +        .quad 0xc08625f71428e370, 0xbe1cef3e175d61a1
> > +        .quad 0xc08625f9e5d295f8, 0xbe1cf101f9868fd9
> > +        .quad 0xc08625fcb67e5658, 0xbe1cee69db83dcd2
> > +        .quad 0xc08625ff862cd6f8, 0xbe1cf081b636af51
> > +        .quad 0xc086260254dec9a8, 0xbe1cee62c7d59b3e
> > +        .quad 0xc08626052294df58, 0xbe1cf1b745c57716
> > +        .quad 0xc0862607ef4fc868, 0xbe1cef3d2800ea23
> > +        .quad 0xc086260abb103458, 0xbe1cef480ff1acd2
> > +        .quad 0xc086260d85d6d200, 0xbe1cf2424c9a17ef
> > +        .quad 0xc08626104fa44f90, 0xbe1cf12cfde90fd5
> > +        .quad 0xc086261318795a68, 0xbe1cf21f590dd5b6
> > +        .quad 0xc0862615e0569f48, 0xbe1cf0c50f9cd28a
> > +        .quad 0xc0862618a73cca30, 0xbe1ceedbdb520545
> > +        .quad 0xc086261b6d2c8668, 0xbe1cf0b030396011
> > +        .quad 0xc086261e32267e98, 0xbe1cf19917010e96
> > +        .quad 0xc0862620f62b5cb0, 0xbe1cf07331355985
> > +        .quad 0xc0862623b93bc9e8, 0xbe1cf01ae921a1c3
> > +        .quad 0xc08626267b586ed0, 0xbe1cefe5cf0dbf0c
> > +        .quad 0xc08626293c81f348, 0xbe1cf01b258aeb50
> > +        .quad 0xc086262bfcb8fe88, 0xbe1cee6b9e7f4c68
> > +        .quad 0xc086262ebbfe3710, 0xbe1cee684a9b21c9
> > +        .quad 0xc08626317a5242b8, 0xbe1cf1f8bcde9a8b
> > +        .quad 0xc086263437b5c6c0, 0xbe1cf1d063d36238
> > +        .quad 0xc0862636f42967a8, 0xbe1cf1e31a19075e
> > +        .quad 0xc0862639afadc950, 0xbe1cf1d8efdf7e7d
> > +        .quad 0xc086263c6a438ef0, 0xbe1cf1812ee72dba
> > +        .quad 0xc086263f23eb5b18, 0xbe1cf1449a9a2279
> > +        .quad 0xc0862641dca5cfb8, 0xbe1cee96edce5085
> > +        .quad 0xc086264494738e08, 0xbe1cf06797bd03b2
> > +        .quad 0xc08626474b5536b8, 0xbe1cef91b9b7ffc1
> > +        .quad 0xc086264a014b69c0, 0xbe1cef4b6721278f
> > +        .quad 0xc086264cb656c678, 0xbe1cf1942925eb4a
> > +        .quad 0xc086264f6a77eba8, 0xbe1cefa2c7bc2e39
> > +        .quad 0xc08626521daf7758, 0xbe1cf252595aceb3
> > +        .quad 0xc0862654cffe0718, 0xbe1cee8e9ae47ec2
> > +        .quad 0xc0862657816437a8, 0xbe1cf1bf913828fa
> > +        .quad 0xc086265a31e2a558, 0xbe1cf23475d6b366
> > +        .quad 0xc086265ce179ebc8, 0xbe1cef8df00a922b
> > +        .quad 0xc086265f902aa5f0, 0xbe1cef279bfa43e0
> > +        .quad 0xc08626623df56e38, 0xbe1cf080e10b8365
> > +        .quad 0xc0862664eadade70, 0xbe1cf1a518f9b544
> > +        .quad 0xc086266796db8fd0, 0xbe1cef9308fed9e9
> > +        .quad 0xc086266a41f81ae8, 0xbe1ceea3ae6b19c9
> > +        .quad 0xc086266cec3117b8, 0xbe1ceef06003d4c2
> > +        .quad 0xc086266f95871da8, 0xbe1cf0b8457ffb0c
> > +        .quad 0xc08626723dfac390, 0xbe1cf0c526745ad6
> > +        .quad 0xc0862674e58c9fa8, 0xbe1cf0cf91ff7b5d
> > +        .quad 0xc08626778c3d4798, 0xbe1cefe260819380
> > +        .quad 0xc086267a320d5070, 0xbe1ceebd90aa27a3
> > +        .quad 0xc086267cd6fd4ea8, 0xbe1cf0388121dffa
> > +        .quad 0xc086267f7b0dd630, 0xbe1cf1a3881435f1
> > +        .quad 0xc08626821e3f7a68, 0xbe1cef28e9d9ac52
> > +        .quad 0xc0862684c092ce08, 0xbe1cf02d300062dd
> > +        .quad 0xc086268762086350, 0xbe1cefaee1edfa35
> > +        .quad 0xc086268a02a0cbe0, 0xbe1cf0a5a052e936
> > +        .quad 0xc086268ca25c98d8, 0xbe1cee60a4a497ed
> > +        .quad 0xc086268f413c5ab0, 0xbe1cf0e4a5d0cf49
> > +        .quad 0xc0862691df40a170, 0xbe1cf149235a4e6e
> > +        .quad 0xc08626947c69fc80, 0xbe1cf215180b9fcc
> > +        .quad 0xc086269718b8fac8, 0xbe1cef9b156a9840
> > +        .quad 0xc0862699b42e2a90, 0xbe1cf054c91441be
> > +        .quad 0xc086269c4eca19a8, 0xbe1cf13ded26512c
> > +        .quad 0xc086269ee88d5550, 0xbe1cf22ea4d8ac06
> > +        .quad 0xc08626a181786a40, 0xbe1cf2354666ee2e
> > +        .quad 0xc08626a4198be4a8, 0xbe1cefef936752b3
> > +        .quad 0xc08626a6b0c85020, 0xbe1cf1e360a9db68
> > +        .quad 0xc08626a9472e37d8, 0xbe1ceed6aeb812c5
> > +        .quad 0xc08626abdcbe2650, 0xbe1cf227340b4986
> > +        .quad 0xc08626ae7178a5b0, 0xbe1cf0215a0cbe0d
> > +        .quad 0xc08626b1055e3f70, 0xbe1cf256adf0ae26
> > +        .quad 0xc08626b3986f7ca8, 0xbe1ceff3c67aed06
> > +        .quad 0xc08626b62aace5c8, 0xbe1cf2159fb93652
> > +        .quad 0xc08626b8bc1702e0, 0xbe1cf01e6dbd1c7f
> > +        .quad 0xc08626bb4cae5b60, 0xbe1cf009e75d1c0c
> > +        .quad 0xc08626bddc737648, 0xbe1ceec10a020e73
> > +        .quad 0xc08626c06b66da08, 0xbe1cf06d5783eee7
> > +        .quad 0xc08626c2f9890ca0, 0xbe1cf0cb8f169ffe
> > +        .quad 0xc08626c586da9388, 0xbe1cef7de2452430
> > +        .quad 0xc08626c8135bf3b0, 0xbe1cf05da6f783ae
> > +        .quad 0xc08626ca9f0db198, 0xbe1cefcc877d681d
> > +        .quad 0xc08626cd29f05138, 0xbe1cef0531954ab3
> > +        .quad 0xc08626cfb4045608, 0xbe1cf06b8565ea3d
> > +        .quad 0xc08626d23d4a4310, 0xbe1cefdc455d9d7e
> > +        .quad 0xc08626d4c5c29ad0, 0xbe1ceefc47e8fa64
> > +        .quad 0xc08626d74d6ddf48, 0xbe1cf1872bf033f2
> > +        .quad 0xc08626d9d44c9210, 0xbe1cf19d91087f9d
> > +        .quad 0xc08626dc5a5f3438, 0xbe1cf012d444c6ab
> > +        .quad 0xc08626dedfa64650, 0xbe1cf0ba528ee153
> > +        .quad 0xc08626e164224880, 0xbe1ceeb431709788
> > +        .quad 0xc08626e3e7d3ba60, 0xbe1cf0b9af31a6a5
> > +        .quad 0xc08626e66abb1b28, 0xbe1cf168fb2e135b
> > +        .quad 0xc08626e8ecd8e990, 0xbe1cef9097461c93
> > +        .quad 0xc08626eb6e2da3d0, 0xbe1cee7a434735d8
> > +        .quad 0xc08626edeeb9c7a8, 0xbe1cf235732b86f2
> > +        .quad 0xc08626f06e7dd280, 0xbe1cefe1510b89e6
> > +        .quad 0xc08626f2ed7a4120, 0xbe1cf1f64b9b80ef
> > +        .quad 0xc08626f56baf9000, 0xbe1cf08f320ca339
> > +        .quad 0xc08626f7e91e3b08, 0xbe1cf1b1de2808a1
> > +        .quad 0xc08626fa65c6bdc0, 0xbe1cf1976d778b28
> > +        .quad 0xc08626fce1a99338, 0xbe1ceef40a4f076f
> > +        .quad 0xc08626ff5cc73600, 0xbe1cef3e45869ce3
> > +        .quad 0xc0862701d7202048, 0xbe1ceef601b4c9d6
> > +        .quad 0xc086270450b4cbc0, 0xbe1cf1eaf0b57fd6
> > +        .quad 0xc0862706c985b1c0, 0xbe1cef82a44990f3
> > +        .quad 0xc086270941934b10, 0xbe1ceefe32981f2c
> > +        .quad 0xc086270bb8de1018, 0xbe1cefbf6f5a0445
> > +        .quad 0xc086270e2f6678d0, 0xbe1cf18dba75792c
> > +        .quad 0xc0862710a52cfcc8, 0xbe1cf0da64ce995f
> > +        .quad 0xc08627131a321318, 0xbe1cef04ac0fb802
> > +        .quad 0xc08627158e763268, 0xbe1cee9d4e2ad9bd
> > +        .quad 0xc086271801f9d0f8, 0xbe1cefa9b55407b5
> > +        .quad 0xc086271a74bd64a0, 0xbe1cefe6bd329570
> > +        .quad 0xc086271ce6c162c8, 0xbe1cef0b1205dc85
> > +        .quad 0xc086271f58064068, 0xbe1cef092a785e3f
> > +        .quad 0xc0862721c88c7210, 0xbe1cf050dcdaac30
> > +        .quad 0xc086272438546be8, 0xbe1cf210907ded8b
> > +        .quad 0xc0862726a75ea1b8, 0xbe1cee760be44f99
> > +        .quad 0xc086272915ab86c0, 0xbe1ceeeee07c2bcc
> > +        .quad 0xc086272b833b8df0, 0xbe1cf06874992df5
> > +        .quad 0xc086272df00f29d0, 0xbe1cef8fac5d4899
> > +        .quad 0xc08627305c26cc70, 0xbe1cf1103241cc99
> > +        .quad 0xc0862732c782e788, 0xbe1cf1d35fef83fe
> > +        .quad 0xc08627353223ec68, 0xbe1cef3ec8133e1d
> > +        .quad 0xc08627379c0a4be8, 0xbe1cef7261daccd8
> > +        .quad 0xc086273a05367688, 0xbe1cf18656c50806
> > +        .quad 0xc086273c6da8dc68, 0xbe1cf1c8736e049a
> > +        .quad 0xc086273ed561ed38, 0xbe1cf1f93bff4911
> > +        .quad 0xc08627413c621848, 0xbe1cf188a4ea680c
> > +        .quad 0xc0862743a2a9cc80, 0xbe1cf1d270930c80
> > +        .quad 0xc086274608397868, 0xbe1cf25a328c28e2
> > +        .quad 0xc08627486d118a28, 0xbe1cf106f90aa3b8
> > +        .quad 0xc086274ad1326f80, 0xbe1cee5e9d2e885a
> > +        .quad 0xc086274d349c95c0, 0xbe1cf1c0bac27228
> > +        .quad 0xc086274f975069f8, 0xbe1cf1a1500f9b1c
> > +        .quad 0xc0862751f94e58c0, 0xbe1cefc30663ac44
> > +        .quad 0xc08627545a96ce48, 0xbe1cf17123e427a2
> > +        .quad 0xc0862756bb2a3678, 0xbe1cefb92749fea4
> > +        .quad 0xc08627591b08fcc0, 0xbe1cefa40e1ea74a
> > +        .quad 0xc086275b7a338c40, 0xbe1cee6f4612c3e9
> > +        .quad 0xc086275dd8aa4fa8, 0xbe1cf1c54a053627
> > +        .quad 0xc0862760366db168, 0xbe1ceff5eb503d9e
> > +        .quad 0xc0862762937e1b70, 0xbe1cf02e47f10cee
> > +        .quad 0xc0862764efdbf768, 0xbe1ceeb06e1d0dad
> > +        .quad 0xc08627674b87ae88, 0xbe1cf10aadd6dba5
> > +        .quad 0xc0862769a681a9c0, 0xbe1cf24e9913d30f
> > +        .quad 0xc086276c00ca51a0, 0xbe1cef47b301e312
> > +        .quad 0xc086276e5a620e48, 0xbe1ceeb1cefc2e85
> > +        .quad 0xc0862770b3494788, 0xbe1cf16f1fbbe011
> > +        .quad 0xc08627730b8064e8, 0xbe1ceebdf75174c7
> > +        .quad 0xc08627756307cd70, 0xbe1cf06e3871a0da
> > +        .quad 0xc0862777b9dfe7f0, 0xbe1cef16799fd554
> > +        .quad 0xc086277a10091ac0, 0xbe1cf248dabf5377
> > +        .quad 0xc086277c6583cc00, 0xbe1cf0c78d92a2cd
> > +        .quad 0xc086277eba506158, 0xbe1cf0b911b029f0
> > +        .quad 0xc08627810e6f4028, 0xbe1cefdc24719766
> > +        .quad 0xc086278361e0cd70, 0xbe1cefbb6562b7e7
> > +        .quad 0xc0862785b4a56dd8, 0xbe1cf1e0afb349ec
> > +        .quad 0xc086278806bd85c0, 0xbe1cf008292e52fc
> > +        .quad 0xc086278a58297918, 0xbe1cf053073872bf
> > +        .quad 0xc086278ca8e9ab88, 0xbe1cf17a0a55a947
> > +        .quad 0xc086278ef8fe8068, 0xbe1ceeffb0b60234
> > +        .quad 0xc086279148685aa0, 0xbe1cf162204794a8
> > +        .quad 0xc086279397279ce0, 0xbe1cf24cc8cb48ac
> > +        .quad 0xc0862795e53ca978, 0xbe1cf0c9be68d5c3
> > +        .quad 0xc086279832a7e258, 0xbe1cf172cd3d7388
> > +        .quad 0xc086279a7f69a930, 0xbe1ceea2465fbce5
> > +        .quad 0xc086279ccb825f40, 0xbe1cf0a386d2500f
> > +        .quad 0xc086279f16f26590, 0xbe1cf1e338ddc18a
> > +        .quad 0xc08627a161ba1cd0, 0xbe1cef1f5049867f
> > +        .quad 0xc08627a3abd9e548, 0xbe1cef96c1ea8b1f
> > +        .quad 0xc08627a5f5521f00, 0xbe1cf138f6fd3c26
> > +        .quad 0xc08627a83e2329b0, 0xbe1cf0d4fcbfdf3a
> > +        .quad 0xc08627aa864d64b0, 0xbe1cf24870c12c81
> > +        .quad 0xc08627accdd12f18, 0xbe1cf0ae2a56348d
> > +        .quad 0xc08627af14aee7a0, 0xbe1cee8ca1a9b893
> > +        .quad 0xc08627b15ae6eca8, 0xbe1cf20414d637b0
> > +        .quad 0xc08627b3a0799c60, 0xbe1cf0fc6b7b12d8
> > +        .quad 0xc08627b5e5675488, 0xbe1cf152d93c4a00
> > +        .quad 0xc08627b829b072a0, 0xbe1cf1073f9b77c2
> > +        .quad 0xc08627ba6d5553d8, 0xbe1cee694f97d5a4
> > +        .quad 0xc08627bcb0565500, 0xbe1cf0456b8239d7
> > +        .quad 0xc08627bef2b3d2b0, 0xbe1cf211497127e3
> > +        .quad 0xc08627c1346e2930, 0xbe1cf01856c0384d
> > +        .quad 0xc08627c37585b468, 0xbe1cefa7dd05479e
> > +        .quad 0xc08627c5b5fad000, 0xbe1cef3ae8e50b93
> > +        .quad 0xc08627c7f5cdd750, 0xbe1ceea5f32fdd3a
> > +        .quad 0xc08627ca34ff2560, 0xbe1cef424caeb8d9
> > +        .quad 0xc08627cc738f14f0, 0xbe1cf0194d07a81f
> > +        .quad 0xc08627ceb17e0070, 0xbe1cf20f452000c1
> > +        .quad 0xc08627d0eecc4210, 0xbe1cf00e356218e4
> > +        .quad 0xc08627d32b7a33a0, 0xbe1cef30484b4bcb
> > +        .quad 0xc08627d567882eb0, 0xbe1ceeea11a6641b
> > +        .quad 0xc08627d7a2f68c80, 0xbe1cf13492d5bd7b
> > +        .quad 0xc08627d9ddc5a618, 0xbe1ceeb7048fad96
> > +        .quad 0xc08627dc17f5d418, 0xbe1ceef0666f0477
> > +        .quad 0xc08627de51876ee8, 0xbe1cf060d4b8b5c2
> > +        .quad 0xc08627e08a7acea8, 0xbe1cf0b2a4b6ff8c
> > +        .quad 0xc08627e2c2d04b28, 0xbe1cf0e34809a875
> > +        .quad 0xc08627e4fa883bf0, 0xbe1cf16bf74a3522
> > +        .quad 0xc08627e731a2f848, 0xbe1cee6a24623d57
> > +        .quad 0xc08627e96820d718, 0xbe1cefc7b4f1528e
> > +        .quad 0xc08627eb9e022f18, 0xbe1cf163051f3548
> > +        .quad 0xc08627edd34756b8, 0xbe1cef36b3366305
> > +        .quad 0xc08627f007f0a408, 0xbe1cf18134625550
> > +        .quad 0xc08627f23bfe6cf0, 0xbe1cf0ec32ec1a11
> > +        .quad 0xc08627f46f710700, 0xbe1ceeb3b64f3edc
> > +        .quad 0xc08627f6a248c778, 0xbe1cf0cd15805bc8
> > +        .quad 0xc08627f8d4860368, 0xbe1cf20db3bddebe
> > +        .quad 0xc08627fb06290f90, 0xbe1cf25188430e25
> > +        .quad 0xc08627fd37324070, 0xbe1ceea1713490f9
> > +        .quad 0xc08627ff67a1ea28, 0xbe1cf159521d234c
> > +        .quad 0xc0862801977860b8, 0xbe1cf24dfe50783b
> > +        .quad 0xc0862803c6b5f7d0, 0xbe1ceef2ef89a60b
> > +        .quad 0xc0862805f55b02c8, 0xbe1cee7fc919d62c
> > +        .quad 0xc08628082367d4c0, 0xbe1cf215a7fb513a
> > +        .quad 0xc086280a50dcc0a8, 0xbe1cf0e4401c5ed4
> > +        .quad 0xc086280c7dba1910, 0xbe1cf04ec734d256
> > +        .quad 0xc086280eaa003050, 0xbe1cf010ad787fea
> > +        .quad 0xc0862810d5af5880, 0xbe1cee622478393d
> > +        .quad 0xc086281300c7e368, 0xbe1cf01c7482564f
> > +        .quad 0xc08628152b4a22a0, 0xbe1cf0de20d33536
> > +        .quad 0xc086281755366778, 0xbe1cef2edae5837d
> > +        .quad 0xc08628197e8d02f0, 0xbe1cf0a345318cc9
> > +        .quad 0xc086281ba74e45d8, 0xbe1cf20085aa34b8
> > +        .quad 0xc086281dcf7a80c0, 0xbe1cef5fa845ad83
> > +        .quad 0xc086281ff71203e0, 0xbe1cf050d1df69c4
> > +        .quad 0xc08628221e151f48, 0xbe1ceffe43c035b9
> > +        .quad 0xc0862824448422b8, 0xbe1cf14f3018d3c2
> > +        .quad 0xc08628266a5f5dc0, 0xbe1cef0a5fbae83d
> > +        .quad 0xc08628288fa71f98, 0xbe1ceff8a95b72a1
> > +        .quad 0xc086282ab45bb750, 0xbe1cef073aa9849b
> > +        .quad 0xc086282cd87d73a8, 0xbe1cef69b3835c02
> > +        .quad 0xc086282efc0ca328, 0xbe1cf0bc139379a9
> > +        .quad 0xc08628311f099420, 0xbe1cef247a9ec596
> > +        .quad 0xc086283341749490, 0xbe1cef74bbcc488a
> > +        .quad 0xc0862835634df248, 0xbe1cef4bc42e7b8e
> > +        .quad 0xc08628378495fad0, 0xbe1cf136d4d5a810
> > +        .quad 0xc0862839a54cfb80, 0xbe1cf0d290b24dd8
> > +        .quad 0xc086283bc5734168, 0xbe1ceeebde8e0065
> > +        .quad 0xc086283de5091950, 0xbe1cf1a09f60aa1e
> > +        .quad 0xc0862840040ecfe0, 0xbe1cf0803947a234
> > +        .quad 0xc08628422284b168, 0xbe1cf0abf7638127
> > +        .quad 0xc0862844406b0a08, 0xbe1cf0f73ee12058
> > +        .quad 0xc08628465dc225a0, 0xbe1cf2079971b26c
> > +        .quad 0xc08628487a8a4fe0, 0xbe1cee74957564b1
> > +        .quad 0xc086284a96c3d420, 0xbe1ceee77c1b7d43
> > +        .quad 0xc086284cb26efd90, 0xbe1cf23addba6e09
> > +        .quad 0xc086284ecd8c1730, 0xbe1cf199f4a1da60
> > +        .quad 0xc0862850e81b6bb0, 0xbe1cf09fdea81393
> > +        .quad 0xc0862853021d4588, 0xbe1cf176adb417f7
> > +        .quad 0xc08628551b91ef00, 0xbe1cf0f64f84a8da
> > +        .quad 0xc08628573479b220, 0xbe1ceec34cf49523
> > +        .quad 0xc08628594cd4d8a8, 0xbe1cf16d60fbe0bb
> > +        .quad 0xc086285b64a3ac40, 0xbe1cee8de7acfc7b
> > +        .quad 0xc086285d7be67630, 0xbe1ceee6256cce8d
> > +        .quad 0xc086285f929d7fa0, 0xbe1cee7d66a3d8a5
> > +        .quad 0xc0862861a8c91170, 0xbe1cf0bef8265792
> > +        .quad 0xc0862863be697458, 0xbe1cf097f890c6f8
> > +        .quad 0xc0862865d37ef0c8, 0xbe1cf09502d5c3fc
> > +        .quad 0xc0862867e809cf00, 0xbe1ceeffb239dac7
> > +        .quad 0xc0862869fc0a56f8, 0xbe1cf1fbfff95c98
> > +        .quad 0xc086286c0f80d090, 0xbe1cefa57ad3eef7
> > +        .quad 0xc086286e226d8348, 0xbe1cf22c58b9183d
> > +        .quad 0xc086287034d0b690, 0xbe1ceff262d0a248
> > +        .quad 0xc086287246aab180, 0xbe1cefa7bc194186
> > +        .quad 0xc086287457fbbb08, 0xbe1cf06782d784d9
> > +        .quad 0xc086287668c419e0, 0xbe1cf1d44d0eaa07
> > +        .quad 0xc086287879041490, 0xbe1cf034803c8a48
> > +        .quad 0xc086287a88bbf158, 0xbe1cf08e84916b6f
> > +        .quad 0xc086287c97ebf650, 0xbe1cf0c4d3dc1bc7
> > +        .quad 0xc086287ea6946958, 0xbe1cefb1e4625943
> > +        .quad 0xc0862880b4b59010, 0xbe1cf143efdd1fd0
> > +        .quad 0xc0862882c24faff8, 0xbe1cee9896d016da
> > +        .quad 0xc0862884cf630e38, 0xbe1cf2186072f2cc
> > +        .quad 0xc0862886dbefeff0, 0xbe1cef9217633d34
> > +        .quad 0xc0862888e7f699e0, 0xbe1cf05603549486
> > +        .quad 0xc086288af37750b0, 0xbe1cef50fff513d3
> > +        .quad 0xc086288cfe7258c0, 0xbe1cf127713b32d0
> > +        .quad 0xc086288f08e7f650, 0xbe1cf05015520f3d
> > +        .quad 0xc086289112d86d58, 0xbe1cf12eb458b26f
> > +        .quad 0xc08628931c4401a8, 0xbe1cf22eae2887ed
> > +        .quad 0xc0862895252af6e0, 0xbe1cefdd6656dd2d
> > +        .quad 0xc08628972d8d9058, 0xbe1cf1048ea4e646
> > +        .quad 0xc0862899356c1150, 0xbe1ceec4501167e9
> > +        .quad 0xc086289b3cc6bcb8, 0xbe1cf0ad52becc3f
> > +        .quad 0xc086289d439dd568, 0xbe1cf0daa4e00e35
> > +        .quad 0xc086289f49f19df8, 0xbe1cf00b80de8d6a
> > +        .quad 0xc08628a14fc258c8, 0xbe1cf1bcf2ea8464
> > +        .quad 0xc08628a355104818, 0xbe1cf0435e2782b0
> > +        .quad 0xc08628a559dbade0, 0xbe1cf0e3e1a5f56c
> > +        .quad 0xc08628a75e24cbf8, 0xbe1cefed9d5a721d
> > +        .quad 0xc08628a961ebe3f8, 0xbe1cf0d2d74321e2
> > +        .quad 0xc08628ab65313750, 0xbe1cf24200eb55e9
> > +        .quad 0xc08628ad67f50740, 0xbe1cf23e9d7cf979
> > +        .quad 0xc08628af6a3794d0, 0xbe1cf23a088f421c
> > +        .quad 0xc08628b16bf920e0, 0xbe1cef2c1de1ab32
> > +        .quad 0xc08628b36d39ec08, 0xbe1cf1abc231f7b2
> > +        .quad 0xc08628b56dfa36d0, 0xbe1cf2074d5ba303
> > +        .quad 0xc08628b76e3a4180, 0xbe1cf05cd5eed880
> > +        /*== Log_LA_table ==*/
> > +        .align 32
> > +        .quad 0x8000000000000000
> > +        .quad 0xbf5ff802a9ab10e6
> > +        .quad 0xbf6ff00aa2b10bc0
> > +        .quad 0xbf77ee11ebd82e94
> > +        .quad 0xbf7fe02a6b106789
> > +        .quad 0xbf83e7295d25a7d9
> > +        .quad 0xbf87dc475f810a77
> > +        .quad 0xbf8bcf712c74384c
> > +        .quad 0xbf8fc0a8b0fc03e4
> > +        .quad 0xbf91d7f7eb9eebe7
> > +        .quad 0xbf93cea44346a575
> > +        .quad 0xbf95c45a51b8d389
> > +        .quad 0xbf97b91b07d5b11b
> > +        .quad 0xbf99ace7551cc514
> > +        .quad 0xbf9b9fc027af9198
> > +        .quad 0xbf9d91a66c543cc4
> > +        .quad 0xbf9f829b0e783300
> > +        .quad 0xbfa0b94f7c196176
> > +        .quad 0xbfa1b0d98923d980
> > +        .quad 0xbfa2a7ec2214e873
> > +        .quad 0xbfa39e87b9febd60
> > +        .quad 0xbfa494acc34d911c
> > +        .quad 0xbfa58a5bafc8e4d5
> > +        .quad 0xbfa67f94f094bd98
> > +        .quad 0xbfa77458f632dcfc
> > +        .quad 0xbfa868a83083f6cf
> > +        .quad 0xbfa95c830ec8e3eb
> > +        .quad 0xbfaa4fe9ffa3d235
> > +        .quad 0xbfab42dd711971bf
> > +        .quad 0xbfac355dd0921f2d
> > +        .quad 0xbfad276b8adb0b52
> > +        .quad 0xbfae19070c276016
> > +        .quad 0xbfaf0a30c01162a6
> > +        .quad 0xbfaffae9119b9303
> > +        .quad 0xbfb075983598e471
> > +        .quad 0xbfb0ed839b5526fe
> > +        .quad 0xbfb16536eea37ae1
> > +        .quad 0xbfb1dcb263db1944
> > +        .quad 0xbfb253f62f0a1417
> > +        .quad 0xbfb2cb0283f5de1f
> > +        .quad 0xbfb341d7961bd1d1
> > +        .quad 0xbfb3b87598b1b6ee
> > +        .quad 0xbfb42edcbea646f0
> > +        .quad 0xbfb4a50d3aa1b040
> > +        .quad 0xbfb51b073f06183f
> > +        .quad 0xbfb590cafdf01c28
> > +        .quad 0xbfb60658a93750c4
> > +        .quad 0xbfb67bb0726ec0fc
> > +        .quad 0xbfb6f0d28ae56b4c
> > +        .quad 0xbfb765bf23a6be13
> > +        .quad 0xbfb7da766d7b12cd
> > +        .quad 0xbfb84ef898e8282a
> > +        .quad 0xbfb8c345d6319b21
> > +        .quad 0xbfb9375e55595ede
> > +        .quad 0xbfb9ab42462033ad
> > +        .quad 0xbfba1ef1d8061cd4
> > +        .quad 0xbfba926d3a4ad563
> > +        .quad 0xbfbb05b49bee43fe
> > +        .quad 0xbfbb78c82bb0eda1
> > +        .quad 0xbfbbeba818146765
> > +        .quad 0xbfbc5e548f5bc743
> > +        .quad 0xbfbcd0cdbf8c13e1
> > +        .quad 0xbfbd4313d66cb35d
> > +        .quad 0xbfbdb5270187d927
> > +        .quad 0xbfbe27076e2af2e6
> > +        .quad 0xbfbe98b549671467
> > +        .quad 0xbfbf0a30c01162a6
> > +        .quad 0xbfbf7b79fec37ddf
> > +        .quad 0xbfbfec9131dbeabb
> > +        .quad 0xbfc02ebb42bf3d4b
> > +        .quad 0xbfc0671512ca596e
> > +        .quad 0xbfc09f561ee719c3
> > +        .quad 0xbfc0d77e7cd08e59
> > +        .quad 0xbfc10f8e422539b1
> > +        .quad 0xbfc14785846742ac
> > +        .quad 0xbfc17f6458fca611
> > +        .quad 0xbfc1b72ad52f67a0
> > +        .quad 0xbfc1eed90e2dc2c3
> > +        .quad 0xbfc2266f190a5acb
> > +        .quad 0xbfc25ded0abc6ad2
> > +        .quad 0xbfc29552f81ff523
> > +        .quad 0xbfc2cca0f5f5f251
> > +        .quad 0xbfc303d718e47fd3
> > +        .quad 0xbfc33af575770e4f
> > +        .quad 0xbfc371fc201e8f74
> > +        .quad 0xbfc3a8eb2d31a376
> > +        .quad 0xbfc3dfc2b0ecc62a
> > +        .quad 0xbfc41682bf727bc0
> > +        .quad 0xbfc44d2b6ccb7d1e
> > +        .quad 0xbfc483bccce6e3dd
> > +        .quad 0xbfc4ba36f39a55e5
> > +        .quad 0xbfc4f099f4a230b2
> > +        .quad 0xbfc526e5e3a1b438
> > +        .quad 0xbfc55d1ad4232d6f
> > +        .quad 0xbfc59338d9982086
> > +        .quad 0xbfc5c940075972b9
> > +        .quad 0xbfc5ff3070a793d4
> > +        .quad 0xbfc6350a28aaa758
> > +        .quad 0xbfc66acd4272ad51
> > +        .quad 0xbfc6a079d0f7aad2
> > +        .quad 0xbfc6d60fe719d21d
> > +        .quad 0xbfc70b8f97a1aa75
> > +        .quad 0xbfc740f8f54037a5
> > +        .quad 0xbfc7764c128f2127
> > +        .quad 0xbfc7ab890210d909
> > +        .quad 0xbfc7e0afd630c274
> > +        .quad 0xbfc815c0a14357eb
> > +        .quad 0xbfc84abb75865139
> > +        .quad 0xbfc87fa06520c911
> > +        .quad 0xbfc8b46f8223625b
> > +        .quad 0xbfc8e928de886d41
> > +        .quad 0xbfc91dcc8c340bde
> > +        .quad 0xbfc9525a9cf456b4
> > +        .quad 0xbfc986d3228180ca
> > +        .quad 0xbfc9bb362e7dfb83
> > +        .quad 0xbfc9ef83d2769a34
> > +        .quad 0xbfca23bc1fe2b563
> > +        .quad 0xbfca57df28244dcd
> > +        .quad 0xbfca8becfc882f19
> > +        .quad 0xbfcabfe5ae46124c
> > +        .quad 0xbfcaf3c94e80bff3
> > +        .quad 0xbfcb2797ee46320c
> > +        .quad 0xbfcb5b519e8fb5a4
> > +        .quad 0xbfcb8ef670420c3b
> > +        .quad 0xbfcbc286742d8cd6
> > +        .quad 0xbfcbf601bb0e44e2
> > +        .quad 0xbfcc2968558c18c1
> > +        .quad 0xbfcc5cba543ae425
> > +        .quad 0xbfcc8ff7c79a9a22
> > +        .quad 0xbfccc320c0176502
> > +        .quad 0xbfccf6354e09c5dc
> > +        .quad 0xbfcd293581b6b3e7
> > +        .quad 0xbfcd5c216b4fbb91
> > +        .quad 0xbfcd8ef91af31d5e
> > +        .quad 0xbfcdc1bca0abec7d
> > +        .quad 0xbfcdf46c0c722d2f
> > +        .quad 0xbfce27076e2af2e6
> > +        .quad 0xbfce598ed5a87e2f
> > +        .quad 0xbfce8c0252aa5a60
> > +        .quad 0xbfcebe61f4dd7b0b
> > +        .quad 0xbfcef0adcbdc5936
> > +        .quad 0xbfcf22e5e72f105d
> > +        .quad 0xbfcf550a564b7b37
> > +        .quad 0xbfcf871b28955045
> > +        .quad 0xbfcfb9186d5e3e2b
> > +        .quad 0xbfcfeb0233e607cc
> > +        .quad 0xbfd00e6c45ad501d
> > +        .quad 0xbfd0274dc16c232f
> > +        .quad 0xbfd0402594b4d041
> > +        .quad 0xbfd058f3c703ebc6
> > +        .quad 0xbfd071b85fcd590d
> > +        .quad 0xbfd08a73667c57af
> > +        .quad 0xbfd0a324e27390e3
> > +        .quad 0xbfd0bbccdb0d24bd
> > +        .quad 0xbfd0d46b579ab74b
> > +        .quad 0xbfd0ed005f657da4
> > +        .quad 0xbfd1058bf9ae4ad5
> > +        .quad 0xbfd11e0e2dad9cb7
> > +        .quad 0xbfd136870293a8b0
> > +        .quad 0xbfd14ef67f88685a
> > +        .quad 0xbfd1675cababa60e
> > +        .quad 0xbfd17fb98e15095d
> > +        .quad 0xbfd1980d2dd4236f
> > +        .quad 0xbfd1b05791f07b49
> > +        .quad 0xbfd1c898c16999fb
> > +        .quad 0xbfd1e0d0c33716be
> > +        .quad 0xbfd1f8ff9e48a2f3
> > +        .quad 0xbfd211255986160c
> > +        .quad 0xbfd22941fbcf7966
> > +        .quad 0xbfd241558bfd1404
> > +        .quad 0xbfd2596010df763a
> > +        .quad 0xbfd27161913f853d
> > +        .quad 0xbfd2895a13de86a3
> > +        .quad 0xbfd2a1499f762bc9
> > +        .quad 0xbfd2b9303ab89d25
> > +        .quad 0xbfd2d10dec508583
> > +        .quad 0xbfd2e8e2bae11d31
> > +        .quad 0xbfd300aead06350c
> > +        .quad 0xbfd31871c9544185
> > +        .quad 0xbfd3302c16586588
> > +        .quad 0xbfd347dd9a987d55
> > +        .quad 0xbfd35f865c93293e
> > +        .quad 0xbfd3772662bfd85b
> > +        .quad 0xbfd38ebdb38ed321
> > +        .quad 0xbfd3a64c556945ea
> > +        .quad 0xbfd3bdd24eb14b6a
> > +        .quad 0xbfd3d54fa5c1f710
> > +        .quad 0xbfd3ecc460ef5f50
> > +        .quad 0xbfd404308686a7e4
> > +        .quad 0xbfd41b941cce0bee
> > +        .quad 0xbfd432ef2a04e814
> > +        .quad 0xbfd44a41b463c47c
> > +        .quad 0xbfd4618bc21c5ec2
> > +        .quad 0xbfd478cd5959b3d9
> > +        .quad 0xbfd49006804009d1
> > +        .quad 0xbfd4a7373cecf997
> > +        .quad 0xbfd4be5f957778a1
> > +        .quad 0xbfd4d57f8fefe27f
> > +        .quad 0xbfd4ec973260026a
> > +        .quad 0xbfd503a682cb1cb3
> > +        .quad 0xbfd51aad872df82d
> > +        .quad 0xbfd531ac457ee77e
> > +        .quad 0xbfd548a2c3add263
> > +        .quad 0xbfd55f9107a43ee2
> > +        .quad 0xbfd5767717455a6c
> > +        .quad 0xbfd58d54f86e02f2
> > +        .quad 0xbfd5a42ab0f4cfe2
> > +        .quad 0xbfd5baf846aa1b19
> > +        .quad 0xbfd5d1bdbf5809ca
> > +        .quad 0xbfd5e87b20c2954a
> > +        .quad 0xbfd5ff3070a793d4
> > +        .quad 0xbfd615ddb4bec13c
> > +        .quad 0xbfd62c82f2b9c795
> > +        .quad 0x3fd61965cdb02c1f
> > +        .quad 0x3fd602d08af091ec
> > +        .quad 0x3fd5ec433d5c35ae
> > +        .quad 0x3fd5d5bddf595f30
> > +        .quad 0x3fd5bf406b543db2
> > +        .quad 0x3fd5a8cadbbedfa1
> > +        .quad 0x3fd5925d2b112a59
> > +        .quad 0x3fd57bf753c8d1fb
> > +        .quad 0x3fd565995069514c
> > +        .quad 0x3fd54f431b7be1a9
> > +        .quad 0x3fd538f4af8f72fe
> > +        .quad 0x3fd522ae0738a3d8
> > +        .quad 0x3fd50c6f1d11b97c
> > +        .quad 0x3fd4f637ebba9810
> > +        .quad 0x3fd4e0086dd8baca
> > +        .quad 0x3fd4c9e09e172c3c
> > +        .quad 0x3fd4b3c077267e9a
> > +        .quad 0x3fd49da7f3bcc41f
> > +        .quad 0x3fd487970e958770
> > +        .quad 0x3fd4718dc271c41b
> > +        .quad 0x3fd45b8c0a17df13
> > +        .quad 0x3fd44591e0539f49
> > +        .quad 0x3fd42f9f3ff62642
> > +        .quad 0x3fd419b423d5e8c7
> > +        .quad 0x3fd403d086cea79c
> > +        .quad 0x3fd3edf463c1683e
> > +        .quad 0x3fd3d81fb5946dba
> > +        .quad 0x3fd3c25277333184
> > +        .quad 0x3fd3ac8ca38e5c5f
> > +        .quad 0x3fd396ce359bbf54
> > +        .quad 0x3fd3811728564cb2
> > +        .quad 0x3fd36b6776be1117
> > +        .quad 0x3fd355bf1bd82c8b
> > +        .quad 0x3fd3401e12aecba1
> > +        .quad 0x3fd32a84565120a8
> > +        .quad 0x3fd314f1e1d35ce4
> > +        .quad 0x3fd2ff66b04ea9d4
> > +        .quad 0x3fd2e9e2bce12286
> > +        .quad 0x3fd2d46602adccee
> > +        .quad 0x3fd2bef07cdc9354
> > +        .quad 0x3fd2a982269a3dbf
> > +        .quad 0x3fd2941afb186b7c
> > +        .quad 0x3fd27ebaf58d8c9d
> > +        .quad 0x3fd269621134db92
> > +        .quad 0x3fd25410494e56c7
> > +        .quad 0x3fd23ec5991eba49
> > +        .quad 0x3fd22981fbef797b
> > +        .quad 0x3fd214456d0eb8d4
> > +        .quad 0x3fd1ff0fe7cf47a7
> > +        .quad 0x3fd1e9e1678899f4
> > +        .quad 0x3fd1d4b9e796c245
> > +        .quad 0x3fd1bf99635a6b95
> > +        .quad 0x3fd1aa7fd638d33f
> > +        .quad 0x3fd1956d3b9bc2fa
> > +        .quad 0x3fd180618ef18adf
> > +        .quad 0x3fd16b5ccbacfb73
> > +        .quad 0x3fd1565eed455fc3
> > +        .quad 0x3fd14167ef367783
> > +        .quad 0x3fd12c77cd00713b
> > +        .quad 0x3fd1178e8227e47c
> > +        .quad 0x3fd102ac0a35cc1c
> > +        .quad 0x3fd0edd060b78081
> > +        .quad 0x3fd0d8fb813eb1ef
> > +        .quad 0x3fd0c42d676162e3
> > +        .quad 0x3fd0af660eb9e279
> > +        .quad 0x3fd09aa572e6c6d4
> > +        .quad 0x3fd085eb8f8ae797
> > +        .quad 0x3fd07138604d5862
> > +        .quad 0x3fd05c8be0d9635a
> > +        .quad 0x3fd047e60cde83b8
> > +        .quad 0x3fd03346e0106062
> > +        .quad 0x3fd01eae5626c691
> > +        .quad 0x3fd00a1c6adda473
> > +        .quad 0x3fcfeb2233ea07cd
> > +        .quad 0x3fcfc218be620a5e
> > +        .quad 0x3fcf991c6cb3b379
> > +        .quad 0x3fcf702d36777df0
> > +        .quad 0x3fcf474b134df229
> > +        .quad 0x3fcf1e75fadf9bde
> > +        .quad 0x3fcef5ade4dcffe6
> > +        .quad 0x3fceccf2c8fe920a
> > +        .quad 0x3fcea4449f04aaf5
> > +        .quad 0x3fce7ba35eb77e2a
> > +        .quad 0x3fce530effe71012
> > +        .quad 0x3fce2a877a6b2c12
> > +        .quad 0x3fce020cc6235ab5
> > +        .quad 0x3fcdd99edaf6d7e9
> > +        .quad 0x3fcdb13db0d48940
> > +        .quad 0x3fcd88e93fb2f450
> > +        .quad 0x3fcd60a17f903515
> > +        .quad 0x3fcd38666871f465
> > +        .quad 0x3fcd1037f2655e7b
> > +        .quad 0x3fcce816157f1988
> > +        .quad 0x3fccc000c9db3c52
> > +        .quad 0x3fcc97f8079d44ec
> > +        .quad 0x3fcc6ffbc6f00f71
> > +        .quad 0x3fcc480c0005ccd1
> > +        .quad 0x3fcc2028ab17f9b4
> > +        .quad 0x3fcbf851c067555f
> > +        .quad 0x3fcbd087383bd8ad
> > +        .quad 0x3fcba8c90ae4ad19
> > +        .quad 0x3fcb811730b823d2
> > +        .quad 0x3fcb5971a213acdb
> > +        .quad 0x3fcb31d8575bce3d
> > +        .quad 0x3fcb0a4b48fc1b46
> > +        .quad 0x3fcae2ca6f672bd4
> > +        .quad 0x3fcabb55c31693ad
> > +        .quad 0x3fca93ed3c8ad9e3
> > +        .quad 0x3fca6c90d44b704e
> > +        .quad 0x3fca454082e6ab05
> > +        .quad 0x3fca1dfc40f1b7f1
> > +        .quad 0x3fc9f6c407089664
> > +        .quad 0x3fc9cf97cdce0ec3
> > +        .quad 0x3fc9a8778debaa38
> > +        .quad 0x3fc981634011aa75
> > +        .quad 0x3fc95a5adcf7017f
> > +        .quad 0x3fc9335e5d594989
> > +        .quad 0x3fc90c6db9fcbcd9
> > +        .quad 0x3fc8e588ebac2dbf
> > +        .quad 0x3fc8beafeb38fe8c
> > +        .quad 0x3fc897e2b17b19a5
> > +        .quad 0x3fc871213750e994
> > +        .quad 0x3fc84a6b759f512f
> > +        .quad 0x3fc823c16551a3c2
> > +        .quad 0x3fc7fd22ff599d4f
> > +        .quad 0x3fc7d6903caf5ad0
> > +        .quad 0x3fc7b0091651528c
> > +        .quad 0x3fc7898d85444c73
> > +        .quad 0x3fc7631d82935a86
> > +        .quad 0x3fc73cb9074fd14d
> > +        .quad 0x3fc716600c914054
> > +        .quad 0x3fc6f0128b756abc
> > +        .quad 0x3fc6c9d07d203fc7
> > +        .quad 0x3fc6a399dabbd383
> > +        .quad 0x3fc67d6e9d785771
> > +        .quad 0x3fc6574ebe8c133a
> > +        .quad 0x3fc6313a37335d76
> > +        .quad 0x3fc60b3100b09476
> > +        .quad 0x3fc5e533144c1719
> > +        .quad 0x3fc5bf406b543db2
> > +        .quad 0x3fc59958ff1d52f1
> > +        .quad 0x3fc5737cc9018cdd
> > +        .quad 0x3fc54dabc26105d2
> > +        .quad 0x3fc527e5e4a1b58d
> > +        .quad 0x3fc5022b292f6a45
> > +        .quad 0x3fc4dc7b897bc1c8
> > +        .quad 0x3fc4b6d6fefe22a4
> > +        .quad 0x3fc4913d8333b561
> > +        .quad 0x3fc46baf0f9f5db7
> > +        .quad 0x3fc4462b9dc9b3dc
> > +        .quad 0x3fc420b32740fdd4
> > +        .quad 0x3fc3fb45a59928cc
> > +        .quad 0x3fc3d5e3126bc27f
> > +        .quad 0x3fc3b08b6757f2a9
> > +        .quad 0x3fc38b3e9e027479
> > +        .quad 0x3fc365fcb0159016
> > +        .quad 0x3fc340c59741142e
> > +        .quad 0x3fc31b994d3a4f85
> > +        .quad 0x3fc2f677cbbc0a96
> > +        .quad 0x3fc2d1610c86813a
> > +        .quad 0x3fc2ac55095f5c59
> > +        .quad 0x3fc28753bc11aba5
> > +        .quad 0x3fc2625d1e6ddf57
> > +        .quad 0x3fc23d712a49c202
> > +        .quad 0x3fc2188fd9807263
> > +        .quad 0x3fc1f3b925f25d41
> > +        .quad 0x3fc1ceed09853752
> > +        .quad 0x3fc1aa2b7e23f72a
> > +        .quad 0x3fc185747dbecf34
> > +        .quad 0x3fc160c8024b27b1
> > +        .quad 0x3fc13c2605c398c3
> > +        .quad 0x3fc1178e8227e47c
> > +        .quad 0x3fc0f301717cf0fb
> > +        .quad 0x3fc0ce7ecdccc28d
> > +        .quad 0x3fc0aa06912675d5
> > +        .quad 0x3fc08598b59e3a07
> > +        .quad 0x3fc06135354d4b18
> > +        .quad 0x3fc03cdc0a51ec0d
> > +        .quad 0x3fc0188d2ecf6140
> > +        .quad 0x3fbfe89139dbd566
> > +        .quad 0x3fbfa01c9db57ce2
> > +        .quad 0x3fbf57bc7d9005db
> > +        .quad 0x3fbf0f70cdd992e3
> > +        .quad 0x3fbec739830a1120
> > +        .quad 0x3fbe7f1691a32d3e
> > +        .quad 0x3fbe3707ee30487b
> > +        .quad 0x3fbdef0d8d466db9
> > +        .quad 0x3fbda727638446a2
> > +        .quad 0x3fbd5f55659210e2
> > +        .quad 0x3fbd179788219364
> > +        .quad 0x3fbccfedbfee13a8
> > +        .quad 0x3fbc885801bc4b23
> > +        .quad 0x3fbc40d6425a5cb1
> > +        .quad 0x3fbbf968769fca11
> > +        .quad 0x3fbbb20e936d6974
> > +        .quad 0x3fbb6ac88dad5b1c
> > +        .quad 0x3fbb23965a52ff00
> > +        .quad 0x3fbadc77ee5aea8c
> > +        .quad 0x3fba956d3ecade63
> > +        .quad 0x3fba4e7640b1bc38
> > +        .quad 0x3fba0792e9277cac
> > +        .quad 0x3fb9c0c32d4d2548
> > +        .quad 0x3fb97a07024cbe74
> > +        .quad 0x3fb9335e5d594989
> > +        .quad 0x3fb8ecc933aeb6e8
> > +        .quad 0x3fb8a6477a91dc29
> > +        .quad 0x3fb85fd927506a48
> > +        .quad 0x3fb8197e2f40e3f0
> > +        .quad 0x3fb7d33687c293c9
> > +        .quad 0x3fb78d02263d82d3
> > +        .quad 0x3fb746e100226ed9
> > +        .quad 0x3fb700d30aeac0e1
> > +        .quad 0x3fb6bad83c1883b6
> > +        .quad 0x3fb674f089365a7a
> > +        .quad 0x3fb62f1be7d77743
> > +        .quad 0x3fb5e95a4d9791cb
> > +        .quad 0x3fb5a3abb01ade25
> > +        .quad 0x3fb55e10050e0384
> > +        .quad 0x3fb518874226130a
> > +        .quad 0x3fb4d3115d207eac
> > +        .quad 0x3fb48dae4bc31018
> > +        .quad 0x3fb4485e03dbdfad
> > +        .quad 0x3fb403207b414b7f
> > +        .quad 0x3fb3bdf5a7d1ee64
> > +        .quad 0x3fb378dd7f749714
> > +        .quad 0x3fb333d7f8183f4b
> > +        .quad 0x3fb2eee507b40301
> > +        .quad 0x3fb2aa04a44717a5
> > +        .quad 0x3fb26536c3d8c369
> > +        .quad 0x3fb2207b5c78549e
> > +        .quad 0x3fb1dbd2643d190b
> > +        .quad 0x3fb1973bd1465567
> > +        .quad 0x3fb152b799bb3cc9
> > +        .quad 0x3fb10e45b3cae831
> > +        .quad 0x3fb0c9e615ac4e17
> > +        .quad 0x3fb08598b59e3a07
> > +        .quad 0x3fb0415d89e74444
> > +        .quad 0x3faffa6911ab9301
> > +        .quad 0x3faf723b517fc523
> > +        .quad 0x3faeea31c006b87c
> > +        .quad 0x3fae624c4a0b5e1b
> > +        .quad 0x3fadda8adc67ee4e
> > +        .quad 0x3fad52ed6405d86f
> > +        .quad 0x3faccb73cdddb2cc
> > +        .quad 0x3fac441e06f72a9e
> > +        .quad 0x3fabbcebfc68f420
> > +        .quad 0x3fab35dd9b58baad
> > +        .quad 0x3faaaef2d0fb10fc
> > +        .quad 0x3faa282b8a936171
> > +        .quad 0x3fa9a187b573de7c
> > +        .quad 0x3fa91b073efd7314
> > +        .quad 0x3fa894aa149fb343
> > +        .quad 0x3fa80e7023d8ccc4
> > +        .quad 0x3fa788595a3577ba
> > +        .quad 0x3fa70265a550e777
> > +        .quad 0x3fa67c94f2d4bb58
> > +        .quad 0x3fa5f6e73078efb8
> > +        .quad 0x3fa5715c4c03ceef
> > +        .quad 0x3fa4ebf43349e26f
> > +        .quad 0x3fa466aed42de3ea
> > +        .quad 0x3fa3e18c1ca0ae92
> > +        .quad 0x3fa35c8bfaa1306b
> > +        .quad 0x3fa2d7ae5c3c5bae
> > +        .quad 0x3fa252f32f8d183f
> > +        .quad 0x3fa1ce5a62bc353a
> > +        .quad 0x3fa149e3e4005a8d
> > +        .quad 0x3fa0c58fa19dfaaa
> > +        .quad 0x3fa0415d89e74444
> > +        .quad 0x3f9f7a9b16782856
> > +        .quad 0x3f9e72bf2813ce51
> > +        .quad 0x3f9d6b2725979802
> > +        .quad 0x3f9c63d2ec14aaf2
> > +        .quad 0x3f9b5cc258b718e6
> > +        .quad 0x3f9a55f548c5c43f
> > +        .quad 0x3f994f6b99a24475
> > +        .quad 0x3f98492528c8cabf
> > +        .quad 0x3f974321d3d006d3
> > +        .quad 0x3f963d6178690bd6
> > +        .quad 0x3f9537e3f45f3565
> > +        .quad 0x3f9432a925980cc1
> > +        .quad 0x3f932db0ea132e22
> > +        .quad 0x3f9228fb1fea2e28
> > +        .quad 0x3f912487a5507f70
> > +        .quad 0x3f90205658935847
> > +        .quad 0x3f8e38ce3033310c
> > +        .quad 0x3f8c317384c75f06
> > +        .quad 0x3f8a2a9c6c170462
> > +        .quad 0x3f882448a388a2aa
> > +        .quad 0x3f861e77e8b53fc6
> > +        .quad 0x3f841929f96832f0
> > +        .quad 0x3f82145e939ef1e9
> > +        .quad 0x3f8010157588de71
> > +        .quad 0x3f7c189cbb0e27fb
> > +        .quad 0x3f78121214586b54
> > +        .quad 0x3f740c8a747878e2
> > +        .quad 0x3f70080559588b35
> > +        .quad 0x3f680904828985c0
> > +        .quad 0x3f60040155d5889e
> > +        .quad 0x3f50020055655889
> > +        .quad 0x0000000000000000
> > +        /*== poly_coeff[4] ==*/
> > +        .align 32
> > +        .quad 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A /* coeff4 */
> > +        .quad 0xbfd0000148058EE1, 0xbfd0000148058EE1, 0xbfd0000148058EE1, 0xbfd0000148058EE1 /* coeff3 */
> > +        .quad 0x3fd55555555543C5, 0x3fd55555555543C5, 0x3fd55555555543C5, 0x3fd55555555543C5 /* coeff2 */
> > +        .quad 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F /* coeff1 */
> > +        /*== ExpMask ==*/
> > +        .align 32
> > +        .quad 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff
> > +        /*== Two10 ==*/
> > +        .align 32
> > +        .quad 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000
> > +        /*== MinLog1p = -1+2^(-53) ==*/
> > +        .align 32
> > +        .quad 0xbfefffffffffffff, 0xbfefffffffffffff, 0xbfefffffffffffff, 0xbfefffffffffffff
> > +        /*== MaxLog1p ==*/
> > +        .align 32
> > +        .quad 0x7f3ffffffffff000, 0x7f3ffffffffff000, 0x7f3ffffffffff000, 0x7f3ffffffffff000
> > +        /*== One ==*/
> > +        .align 32
> > +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> > +        /*== SgnMask ==*/
> > +        .align 32
> > +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
> > +        /*== XThreshold ==*/
> > +        .align 32
> > +        .quad 0x3e00000000000000, 0x3e00000000000000, 0x3e00000000000000, 0x3e00000000000000
> > +        /*== XhMask ==*/
> > +        .align 32
> > +        .quad 0xfffffffffffffc00, 0xfffffffffffffc00, 0xfffffffffffffc00, 0xfffffffffffffc00
> > +        /*== Threshold ==*/
> > +        .align 32
> > +        .quad 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000
> > +        /*== Bias ==*/
> > +        .align 32
> > +        .quad 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000
> > +        /*== Bias1 ==*/
> > +        .align 32
> > +        .quad 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000
> > +        /*== ExpMask ==*/
> > +        .align 32
> > +        .quad 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000
> > +        /*== ExpMask2 ==*/
> > +        .align 32
> > +        .quad 0x7f40000000000000, 0x7f40000000000000, 0x7f40000000000000, 0x7f40000000000000
> > +        /*== L2L ==*/
> > +        .align 32
> > +        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
> > +        .align 32
> > +        .type        __svml_dlog1p_data_internal,@object
> > +        .size        __svml_dlog1p_data_internal,.-__svml_dlog1p_data_internal
> > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core-avx2.S
> > new file mode 100644
> > index 0000000000..ca174a5f52
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core-avx2.S
> > @@ -0,0 +1,20 @@
> > +/* AVX2 version of vectorized log1p, vector length is 8.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#define _ZGVeN8v_log1p _ZGVeN8v_log1p_avx2_wrapper
> > +#include "../svml_d_log1p8_core.S"
> > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core.c
> > new file mode 100644
> > index 0000000000..0aa35ec8c5
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core.c
> > @@ -0,0 +1,27 @@
> > +/* Multiple versions of vectorized log1p, vector length is 8.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#define SYMBOL_NAME _ZGVeN8v_log1p
> > +#include "ifunc-mathvec-avx512-skx.h"
> > +
> > +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> > +
> > +#ifdef SHARED
> > +__hidden_ver1 (_ZGVeN8v_log1p, __GI__ZGVeN8v_log1p, __redirect__ZGVeN8v_log1p)
> > +  __attribute__ ((visibility ("hidden")));
> > +#endif
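
As with the other functions in this series, the resolver pulled in from
ifunc-mathvec-avx512-skx.h picks the SKX (AVX-512) kernel when the CPU
supports it and otherwise falls back to the AVX2 wrapper defined in
svml_d_log1p8_core-avx2.S.  Conceptually it amounts to the sketch below
(hypothetical helper only: the real IFUNC_SELECTOR uses glibc's CPU
feature checks rather than a plain int, and the C-level vector type is
just for illustration):

    typedef double vec8 __attribute__ ((vector_size (64)));
    extern vec8 _ZGVeN8v_log1p_skx (vec8);
    extern vec8 _ZGVeN8v_log1p_avx2_wrapper (vec8);

    /* Hypothetical stand-in for IFUNC_SELECTOR ().  */
    static vec8 (*select_log1p8 (int cpu_has_avx512)) (vec8)
    {
      return cpu_has_avx512
             ? _ZGVeN8v_log1p_skx
             : _ZGVeN8v_log1p_avx2_wrapper;
    }
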
> > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core_avx512.S
> > new file mode 100644
> > index 0000000000..5e38ff8d39
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core_avx512.S
> > @@ -0,0 +1,317 @@
> > +/* Function log1p vectorized with AVX-512.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +/*
> > + * ALGORITHM DESCRIPTION:
> > + *
> > + *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
> > + *    Get short reciprocal approximation Rcp ~ 1/xh
> > + *    R = (Rcp*xh - 1.0) + Rcp*xl
> > + *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
> > + *       log(Rcp) is tabulated
> > + *
> > + *
> > + */
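
For readers tracing the reduction above, here is a rough scalar C sketch
of the same steps.  It is illustrative only: the kernel below uses
VGETMANTPD/VGETEXPPD, the -log(Rcp) entries in Log_tbl and the
poly_coeff2..9 polynomial, while this sketch leans on libm's log() and a
short polynomial.

    #include <math.h>

    static double
    log1p_sketch (double x)
    {
      /* 1+x as rounded high part s plus rounding-error low part xl.  */
      double s = 1.0 + x;
      double xl = (fabs (x) <= 1.0) ? x - (s - 1.0) : 1.0 - (s - x);
      /* s = m * 2^k with m renormalized to [1,2).  */
      int k;
      double m = 2.0 * frexp (s, &k);
      k -= 1;
      /* Short reciprocal approximation, rounded to 4 fractional bits.  */
      double rcp = nearbyint (16.0 / m) / 16.0;
      /* R = (Rcp*xh - 1.0) + Rcp*xl, with xl rescaled by 2^-k.  */
      double r = (rcp * m - 1.0) + rcp * ldexp (xl, -k);
      /* First terms of log(1+R); the vector code uses a higher degree.  */
      double poly = r - 0.5 * r * r + r * r * r / 3.0;
      /* k*log(2) - log(Rcp) + poly(R); the hex literal is ln 2,
         cf. the L2L constant in the table above.  */
      return k * 0x1.62e42fefa39efp-1 - log (rcp) + poly;
    }
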
> > +
> > +/* Offsets for data table __svml_dlog1p_data_internal_avx512
> > + */
> > +#define Log_tbl                              0
> > +#define One                                  128
> > +#define SgnMask                              192
> > +#define C075                                 256
> > +#define poly_coeff9                          320
> > +#define poly_coeff8                          384
> > +#define poly_coeff7                          448
> > +#define poly_coeff6                          512
> > +#define poly_coeff5                          576
> > +#define poly_coeff4                          640
> > +#define poly_coeff3                          704
> > +#define poly_coeff2                          768
> > +#define L2                                   832
> > +
> > +#include <sysdep.h>
> > +
> > +        .text
> > +     .section .text.evex512,"ax",@progbits
> > +ENTRY(_ZGVeN8v_log1p_skx)
> > +        pushq     %rbp
> > +        cfi_def_cfa_offset(16)
> > +        movq      %rsp, %rbp
> > +        cfi_def_cfa(6, 16)
> > +        cfi_offset(6, -16)
> > +        andq      $-64, %rsp
> > +        subq      $192, %rsp
> > +        vmovups   One+__svml_dlog1p_data_internal_avx512(%rip), %zmm7
> > +        vmovups   SgnMask+__svml_dlog1p_data_internal_avx512(%rip), %zmm14
> > +        vmovaps   %zmm0, %zmm9
> > +        vaddpd    {rn-sae}, %zmm9, %zmm7, %zmm11
> > +        vandpd    %zmm14, %zmm9, %zmm8
> > +
> > +/* compute 1+x as high, low parts */
> > +        vmaxpd    {sae}, %zmm9, %zmm7, %zmm10
> > +        vminpd    {sae}, %zmm9, %zmm7, %zmm12
> > +
> > +/* GetMant(x), normalized to [1,2) for x>=0, NaN for x<0 */
> > +        vgetmantpd $8, {sae}, %zmm11, %zmm6
> > +
> > +/* GetExp(x) */
> > +        vgetexppd {sae}, %zmm11, %zmm5
> > +        vsubpd    {rn-sae}, %zmm10, %zmm11, %zmm13
> > +
> > +/* DblRcp ~ 1/Mantissa */
> > +        vrcp14pd  %zmm6, %zmm15
> > +
> > +/* Start polynomial evaluation */
> > +        vmovups   poly_coeff9+__svml_dlog1p_data_internal_avx512(%rip), %zmm10
> > +        vmovups   poly_coeff7+__svml_dlog1p_data_internal_avx512(%rip), %zmm11
> > +
> > +/* Xl */
> > +        vsubpd    {rn-sae}, %zmm13, %zmm12, %zmm2
> > +        vxorpd    %zmm14, %zmm5, %zmm3
> > +
> > +/* round DblRcp to 4 fractional bits (RN mode, no Precision exception) */
> > +        vrndscalepd $88, {sae}, %zmm15, %zmm4
> > +        vmovups   poly_coeff5+__svml_dlog1p_data_internal_avx512(%rip), %zmm12
> > +        vmovups   poly_coeff6+__svml_dlog1p_data_internal_avx512(%rip), %zmm14
> > +        vmovups   poly_coeff3+__svml_dlog1p_data_internal_avx512(%rip), %zmm13
> > +
> > +/* Xl*2^(-Expon) */
> > +        vscalefpd {rn-sae}, %zmm3, %zmm2, %zmm1
> > +
> > +/* Reduced argument: R = DblRcp*(Mantissa+Xl) - 1 */
> > +        vfmsub213pd {rn-sae}, %zmm7, %zmm4, %zmm6
> > +        vmovups   __svml_dlog1p_data_internal_avx512(%rip), %zmm3
> > +
> > +/*
> > + * Table lookup
> > + * Prepare exponent correction: DblRcp<0.75?
> > + */
> > +        vmovups   C075+__svml_dlog1p_data_internal_avx512(%rip), %zmm2
> > +
> > +/* Prepare table index */
> > +        vpsrlq    $48, %zmm4, %zmm0
> > +        vfmadd231pd {rn-sae}, %zmm4, %zmm1, %zmm6
> > +        vmovups   poly_coeff8+__svml_dlog1p_data_internal_avx512(%rip), %zmm1
> > +        vcmppd    $17, {sae}, %zmm2, %zmm4, %k1
> > +        vcmppd    $4, {sae}, %zmm6, %zmm6, %k0
> > +        vfmadd231pd {rn-sae}, %zmm6, %zmm10, %zmm1
> > +        vmovups   poly_coeff4+__svml_dlog1p_data_internal_avx512(%rip), %zmm10
> > +        vfmadd231pd {rn-sae}, %zmm6, %zmm11, %zmm14
> > +        vmovups   L2+__svml_dlog1p_data_internal_avx512(%rip), %zmm4
> > +        vpermt2pd Log_tbl+64+__svml_dlog1p_data_internal_avx512(%rip), %zmm0, %zmm3
> > +
> > +/* add 1 to Expon if DblRcp<0.75 */
> > +        vaddpd    {rn-sae}, %zmm7, %zmm5, %zmm5{%k1}
> > +
> > +/* R^2 */
> > +        vmulpd    {rn-sae}, %zmm6, %zmm6, %zmm0
> > +        vfmadd231pd {rn-sae}, %zmm6, %zmm12, %zmm10
> > +        vmovups   poly_coeff2+__svml_dlog1p_data_internal_avx512(%rip), %zmm12
> > +        vmulpd    {rn-sae}, %zmm0, %zmm0, %zmm15
> > +        vfmadd231pd {rn-sae}, %zmm6, %zmm13, %zmm12
> > +        vfmadd213pd {rn-sae}, %zmm14, %zmm0, %zmm1
> > +        kmovw     %k0, %edx
> > +        vfmadd213pd {rn-sae}, %zmm12, %zmm0, %zmm10
> > +
> > +/* polynomial */
> > +        vfmadd213pd {rn-sae}, %zmm10, %zmm15, %zmm1
> > +        vfmadd213pd {rn-sae}, %zmm6, %zmm0, %zmm1
> > +        vaddpd    {rn-sae}, %zmm1, %zmm3, %zmm6
> > +        vfmadd213pd {rn-sae}, %zmm6, %zmm4, %zmm5
> > +        vorpd     %zmm8, %zmm5, %zmm0
> > +        testl     %edx, %edx
> > +
> > +/* Go to special inputs processing branch */
> > +        jne       L(SPECIAL_VALUES_BRANCH)
> > +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm9
> > +
> > +/* Restore registers
> > + * and exit the function
> > + */
> > +
> > +L(EXIT):
> > +        movq      %rbp, %rsp
> > +        popq      %rbp
> > +        cfi_def_cfa(7, 8)
> > +        cfi_restore(6)
> > +        ret
> > +        cfi_def_cfa(6, 16)
> > +        cfi_offset(6, -16)
> > +
> > +/* Branch to process
> > + * special inputs
> > + */
> > +
> > +L(SPECIAL_VALUES_BRANCH):
> > +        vmovups   %zmm9, 64(%rsp)
> > +        vmovups   %zmm0, 128(%rsp)
> > +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> > +
> > +        xorl      %eax, %eax
> > +                                # LOE rbx r12 r13 r14 r15 eax edx
> > +
> > +        vzeroupper
> > +        movq      %r12, 16(%rsp)
> > +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> > +        movl      %eax, %r12d
> > +        movq      %r13, 8(%rsp)
> > +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> > +        movl      %edx, %r13d
> > +        movq      %r14, (%rsp)
> > +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> > +                                # LOE rbx r15 r12d r13d
> > +
> > +/* Range mask
> > + * bits check
> > + */
> > +
> > +L(RANGEMASK_CHECK):
> > +        btl       %r12d, %r13d
> > +
> > +/* Call scalar math function */
> > +        jc        L(SCALAR_MATH_CALL)
> > +                                # LOE rbx r15 r12d r13d
> > +
> > +/* Special inputs
> > + * processing loop
> > + */
> > +
> > +L(SPECIAL_VALUES_LOOP):
> > +        incl      %r12d
> > +        cmpl      $8, %r12d
> > +
> > +/* Check bits in range mask */
> > +        jl        L(RANGEMASK_CHECK)
> > +                                # LOE rbx r15 r12d r13d
> > +
> > +        movq      16(%rsp), %r12
> > +        cfi_restore(12)
> > +        movq      8(%rsp), %r13
> > +        cfi_restore(13)
> > +        movq      (%rsp), %r14
> > +        cfi_restore(14)
> > +        vmovups   128(%rsp), %zmm0
> > +
> > +/* Go to exit */
> > +        jmp       L(EXIT)
> > +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> > +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> > +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> > +                                # LOE rbx r12 r13 r14 r15 zmm0
> > +
> > +/* Scalar math function call
> > + * to process special input
> > + */
> > +
> > +L(SCALAR_MATH_CALL):
> > +        movl      %r12d, %r14d
> > +        movsd     64(%rsp,%r14,8), %xmm0
> > +        call      log1p@PLT
> > +                                # LOE rbx r14 r15 r12d r13d xmm0
> > +
> > +        movsd     %xmm0, 128(%rsp,%r14,8)
> > +
> > +/* Process special inputs in loop */
> > +        jmp       L(SPECIAL_VALUES_LOOP)
> > +                                # LOE rbx r15 r12d r13d
> > +END(_ZGVeN8v_log1p_skx)
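A rough C-level model of the special-values path above (the function and
parameter names here are mine, not from the patch): the kernel keeps one
bit per vector lane in the range mask, and every flagged lane is simply
recomputed with the scalar libm routine.

#include <math.h>

static void
patch_special_lanes (const double *src, double *dst,
                     unsigned int mask, int lanes)
{
  for (int i = 0; i < lanes; i++)      /* lanes == 8 for _ZGVeN8v */
    if (mask & (1u << i))
      dst[i] = log1p (src[i]);         /* scalar call patches the lane */
}

The assembly does the same with btl %r12d, %r13d and a call to
log1p@PLT for each set bit, storing each scalar result back into the
saved result vector before reloading it and jumping to L(EXIT).  The
same pattern is used by every kernel in this file set.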
> > +
> > +        .section .rodata, "a"
> > +        .align 64
> > +
> > +#ifdef __svml_dlog1p_data_internal_avx512_typedef
> > +typedef unsigned int VUINT32;
> > +typedef struct {
> > +        __declspec(align(64)) VUINT32 Log_tbl[16][2];
> > +        __declspec(align(64)) VUINT32 One[8][2];
> > +        __declspec(align(64)) VUINT32 SgnMask[8][2];
> > +        __declspec(align(64)) VUINT32 C075[8][2];
> > +        __declspec(align(64)) VUINT32 poly_coeff9[8][2];
> > +        __declspec(align(64)) VUINT32 poly_coeff8[8][2];
> > +        __declspec(align(64)) VUINT32 poly_coeff7[8][2];
> > +        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
> > +        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
> > +        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
> > +        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
> > +        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
> > +        __declspec(align(64)) VUINT32 L2[8][2];
> > +   } __svml_dlog1p_data_internal_avx512;
> > +#endif
> > +__svml_dlog1p_data_internal_avx512:
> > +        /*== Log_tbl ==*/
> > +        .quad 0x0000000000000000
> > +        .quad 0xbfaf0a30c01162a6
> > +        .quad 0xbfbe27076e2af2e6
> > +        .quad 0xbfc5ff3070a793d4
> > +        .quad 0xbfcc8ff7c79a9a22
> > +        .quad 0xbfd1675cababa60e
> > +        .quad 0xbfd4618bc21c5ec2
> > +        .quad 0xbfd739d7f6bbd007
> > +        .quad 0x3fd269621134db92
> > +        .quad 0x3fcf991c6cb3b379
> > +        .quad 0x3fca93ed3c8ad9e3
> > +        .quad 0x3fc5bf406b543db2
> > +        .quad 0x3fc1178e8227e47c
> > +        .quad 0x3fb9335e5d594989
> > +        .quad 0x3fb08598b59e3a07
> > +        .quad 0x3fa0415d89e74444
> > +        /*== One ==*/
> > +        .align 64
> > +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> > +        /*== SgnMask ==*/
> > +        .align 64
> > +        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000
> > +        /*== C075 0.75 ==*/
> > +        .align 64
> > +        .quad 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000
> > +        /*== poly_coeff9 ==*/
> > +        .align 64
> > +        .quad 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70
> > +        /*== poly_coeff8 ==*/
> > +        .align 64
> > +        .quad 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62
> > +        /*== poly_coeff7 ==*/
> > +        .align 64
> > +        .quad 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF
> > +        /*== poly_coeff6 ==*/
> > +        .align 64
> > +        .quad 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06
> > +        /*== poly_coeff5 ==*/
> > +        .align 64
> > +        .quad 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C
> > +        /*== poly_coeff4 ==*/
> > +        .align 64
> > +        .quad 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD
> > +        /*== poly_coeff3 ==*/
> > +        .align 64
> > +        .quad 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466
> > +        /*== poly_coeff2 ==*/
> > +        .align 64
> > +        .quad 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6
> > +        /*== L2 = log(2) ==*/
> > +        .align 64
> > +        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
> > +        .align 64
> > +        .type        __svml_dlog1p_data_internal_avx512,@object
> > +        .size        __svml_dlog1p_data_internal_avx512,.-__svml_dlog1p_data_internal_avx512
> > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core-avx2.S
> > new file mode 100644
> > index 0000000000..3c0a0a01a2
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core-avx2.S
> > @@ -0,0 +1,20 @@
> > +/* AVX2 version of vectorized log1pf, vector length is 16.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#define _ZGVeN16v_log1pf _ZGVeN16v_log1pf_avx2_wrapper
> > +#include "../svml_s_log1pf16_core.S"
> > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core.c
> > new file mode 100644
> > index 0000000000..9af1320547
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core.c
> > @@ -0,0 +1,28 @@
> > +/* Multiple versions of vectorized log1pf, vector length is 16.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#define SYMBOL_NAME _ZGVeN16v_log1pf
> > +#include "ifunc-mathvec-avx512-skx.h"
> > +
> > +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> > +
> > +#ifdef SHARED
> > +__hidden_ver1 (_ZGVeN16v_log1pf, __GI__ZGVeN16v_log1pf,
> > +            __redirect__ZGVeN16v_log1pf)
> > +  __attribute__ ((visibility ("hidden")));
> > +#endif
> > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core_avx512.S
> > new file mode 100644
> > index 0000000000..78b2fe417f
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core_avx512.S
> > @@ -0,0 +1,271 @@
> > +/* Function log1pf vectorized with AVX-512.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   https://www.gnu.org/licenses/.  */
> > +
> > +/*
> > + * ALGORITHM DESCRIPTION:
> > + *
> > + *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
> > + *    Get short reciprocal approximation Rcp ~ 1/xh
> > + *    R = (Rcp*xh - 1.0) + Rcp*xl
> > + *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
> > + *       log(Rcp) is tabulated
> > + *
> > + *
> > + */
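A hedged scalar sketch of the reduction the vector code below actually
performs (all identifiers in this sketch are illustrative, not from the
patch): 1+x is formed as a high part s plus a rounding-error part t; s
is rewritten as 2^k * m with m in [2/3, 4/3) by integer manipulation of
its bit pattern (iBrkValue below encodes 2/3); the result is
reconstructed as k*ln2 + r + r^2*poly(r) with r = (m - 1) + t*2^-k.
Only the first three polynomial coefficients are used here; the sPoly
table below carries eight.

#include <math.h>
#include <stdint.h>
#include <string.h>

static float
log1pf_model (float x)
{
  float big = fmaxf (1.0f, x), small = fminf (1.0f, x);
  float s = big + small;               /* high part of 1+x */
  float t = (big - s) + small;         /* low part of 1+x */

  uint32_t u;
  memcpy (&u, &s, sizeof u);
  int32_t d = (int32_t) (u - 0x3f2aaaabu);          /* offset from 2/3 */
  int32_t k = d >> 23;                 /* arithmetic shift, as psrad */
  uint32_t mu = (uint32_t) (d & 0x007fffff) + 0x3f2aaaabu;
  uint32_t su = 0x3f800000u - ((uint32_t) k << 23); /* bits of 2^-k */
  float m, scale;
  memcpy (&m, &mu, sizeof m);          /* mantissa in [2/3, 4/3) */
  memcpy (&scale, &su, sizeof scale);

  float r = (m - 1.0f) + t * scale;
  float p = -0.5f + r * (0.33333266f + r * -0.25004238f); /* P0..P2 */
  return (float) k * 0.6931472f + (r + r * r * p);        /* sLn2 */
}

The vector code additionally ORs back the sign bit of x (the
SgnMask/vandnps/vorps steps) so that -0.0f maps to -0.0f, and handles
out-of-range lanes through the special-values path rather than in the
polynomial.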
> > +
> > +/* Offsets for data table __svml_slog1p_data_internal
> > + */
> > +#define SgnMask                              0
> > +#define sOne                                 64
> > +#define sPoly_1                              128
> > +#define sPoly_2                              192
> > +#define sPoly_3                              256
> > +#define sPoly_4                              320
> > +#define sPoly_5                              384
> > +#define sPoly_6                              448
> > +#define sPoly_7                              512
> > +#define sPoly_8                              576
> > +#define iHiDelta                             640
> > +#define iLoRange                             704
> > +#define iBrkValue                            768
> > +#define iOffExpoMask                         832
> > +#define sLn2                                 896
> > +
> > +#include <sysdep.h>
> > +
> > +        .text
> > +     .section .text.evex512,"ax",@progbits
> > +ENTRY(_ZGVeN16v_log1pf_skx)
> > +        pushq     %rbp
> > +        cfi_def_cfa_offset(16)
> > +        movq      %rsp, %rbp
> > +        cfi_def_cfa(6, 16)
> > +        cfi_offset(6, -16)
> > +        andq      $-64, %rsp
> > +        subq      $192, %rsp
> > +        vmovups   sOne+__svml_slog1p_data_internal(%rip), %zmm2
> > +
> > +/* reduction: compute r,n */
> > +        vmovups   iBrkValue+__svml_slog1p_data_internal(%rip), %zmm12
> > +        vmovups   SgnMask+__svml_slog1p_data_internal(%rip), %zmm4
> > +        vmovaps   %zmm0, %zmm3
> > +
> > +/* compute 1+x as high, low parts */
> > +        vmaxps    {sae}, %zmm3, %zmm2, %zmm5
> > +        vminps    {sae}, %zmm3, %zmm2, %zmm7
> > +        vandnps   %zmm3, %zmm4, %zmm1
> > +        vpternlogd $255, %zmm4, %zmm4, %zmm4
> > +        vaddps    {rn-sae}, %zmm7, %zmm5, %zmm9
> > +        vpsubd    %zmm12, %zmm9, %zmm10
> > +        vsubps    {rn-sae}, %zmm9, %zmm5, %zmm6
> > +
> > +/* check argument value ranges */
> > +        vpaddd    iHiDelta+__svml_slog1p_data_internal(%rip), %zmm9, %zmm8
> > +        vpsrad    $23, %zmm10, %zmm13
> > +        vmovups   sPoly_5+__svml_slog1p_data_internal(%rip), %zmm9
> > +        vpcmpd    $5, iLoRange+__svml_slog1p_data_internal(%rip), %zmm8, %k1
> > +        vpslld    $23, %zmm13, %zmm14
> > +        vaddps    {rn-sae}, %zmm7, %zmm6, %zmm15
> > +        vcvtdq2ps {rn-sae}, %zmm13, %zmm0
> > +        vpsubd    %zmm14, %zmm2, %zmm13
> > +        vmovups   sPoly_8+__svml_slog1p_data_internal(%rip), %zmm7
> > +        vmovups   sPoly_1+__svml_slog1p_data_internal(%rip), %zmm14
> > +        vmulps    {rn-sae}, %zmm13, %zmm15, %zmm6
> > +        vpandd    iOffExpoMask+__svml_slog1p_data_internal(%rip), %zmm10, %zmm11
> > +        vpaddd    %zmm12, %zmm11, %zmm5
> > +        vmovups   sPoly_4+__svml_slog1p_data_internal(%rip), %zmm10
> > +        vmovups   sPoly_3+__svml_slog1p_data_internal(%rip), %zmm11
> > +        vmovups   sPoly_2+__svml_slog1p_data_internal(%rip), %zmm12
> > +
> > +/* polynomial evaluation */
> > +        vsubps    {rn-sae}, %zmm2, %zmm5, %zmm2
> > +        vaddps    {rn-sae}, %zmm6, %zmm2, %zmm15
> > +        vmovups   sPoly_7+__svml_slog1p_data_internal(%rip), %zmm2
> > +        vfmadd231ps {rn-sae}, %zmm15, %zmm7, %zmm2
> > +        vpandnd   %zmm8, %zmm8, %zmm4{%k1}
> > +        vmovups   sPoly_6+__svml_slog1p_data_internal(%rip), %zmm8
> > +
> > +/* combine and get argument value range mask */
> > +        vptestmd  %zmm4, %zmm4, %k0
> > +        vfmadd213ps {rn-sae}, %zmm8, %zmm15, %zmm2
> > +        kmovw     %k0, %edx
> > +        vfmadd213ps {rn-sae}, %zmm9, %zmm15, %zmm2
> > +        vfmadd213ps {rn-sae}, %zmm10, %zmm15, %zmm2
> > +        vfmadd213ps {rn-sae}, %zmm11, %zmm15, %zmm2
> > +        vfmadd213ps {rn-sae}, %zmm12, %zmm15, %zmm2
> > +        vfmadd213ps {rn-sae}, %zmm14, %zmm15, %zmm2
> > +        vmulps    {rn-sae}, %zmm15, %zmm2, %zmm4
> > +        vfmadd213ps {rn-sae}, %zmm15, %zmm15, %zmm4
> > +
> > +/* final reconstruction */
> > +        vmovups   sLn2+__svml_slog1p_data_internal(%rip), %zmm15
> > +        vfmadd213ps {rn-sae}, %zmm4, %zmm15, %zmm0
> > +        vorps     %zmm1, %zmm0, %zmm0
> > +        testl     %edx, %edx
> > +
> > +/* Go to special inputs processing branch */
> > +        jne       L(SPECIAL_VALUES_BRANCH)
> > +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm3
> > +
> > +/* Restore registers
> > + * and exit the function
> > + */
> > +
> > +L(EXIT):
> > +        movq      %rbp, %rsp
> > +        popq      %rbp
> > +        cfi_def_cfa(7, 8)
> > +        cfi_restore(6)
> > +        ret
> > +        cfi_def_cfa(6, 16)
> > +        cfi_offset(6, -16)
> > +
> > +/* Branch to process
> > + * special inputs
> > + */
> > +
> > +L(SPECIAL_VALUES_BRANCH):
> > +        vmovups   %zmm3, 64(%rsp)
> > +        vmovups   %zmm0, 128(%rsp)
> > +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> > +
> > +        xorl      %eax, %eax
> > +                                # LOE rbx r12 r13 r14 r15 eax edx
> > +
> > +        vzeroupper
> > +        movq      %r12, 16(%rsp)
> > +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> > +        movl      %eax, %r12d
> > +        movq      %r13, 8(%rsp)
> > +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> > +        movl      %edx, %r13d
> > +        movq      %r14, (%rsp)
> > +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> > +                                # LOE rbx r15 r12d r13d
> > +
> > +/* Range mask
> > + * bits check
> > + */
> > +
> > +L(RANGEMASK_CHECK):
> > +        btl       %r12d, %r13d
> > +
> > +/* Call scalar math function */
> > +        jc        L(SCALAR_MATH_CALL)
> > +                                # LOE rbx r15 r12d r13d
> > +
> > +/* Special inputs
> > + * processing loop
> > + */
> > +
> > +L(SPECIAL_VALUES_LOOP):
> > +        incl      %r12d
> > +        cmpl      $16, %r12d
> > +
> > +/* Check bits in range mask */
> > +        jl        L(RANGEMASK_CHECK)
> > +                                # LOE rbx r15 r12d r13d
> > +
> > +        movq      16(%rsp), %r12
> > +        cfi_restore(12)
> > +        movq      8(%rsp), %r13
> > +        cfi_restore(13)
> > +        movq      (%rsp), %r14
> > +        cfi_restore(14)
> > +        vmovups   128(%rsp), %zmm0
> > +
> > +/* Go to exit */
> > +        jmp       L(EXIT)
> > +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> > +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> > +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> > +                                # LOE rbx r12 r13 r14 r15 zmm0
> > +
> > +/* Scalar math function call
> > + * to process special input
> > + */
> > +
> > +L(SCALAR_MATH_CALL):
> > +        movl      %r12d, %r14d
> > +        movss     64(%rsp,%r14,4), %xmm0
> > +        call      log1pf@PLT
> > +                                # LOE rbx r14 r15 r12d r13d xmm0
> > +
> > +        movss     %xmm0, 128(%rsp,%r14,4)
> > +
> > +/* Process special inputs in loop */
> > +        jmp       L(SPECIAL_VALUES_LOOP)
> > +                                # LOE rbx r15 r12d r13d
> > +END(_ZGVeN16v_log1pf_skx)
> > +
> > +        .section .rodata, "a"
> > +        .align 64
> > +
> > +#ifdef __svml_slog1p_data_internal_typedef
> > +typedef unsigned int VUINT32;
> > +typedef struct {
> > +        __declspec(align(64)) VUINT32 SgnMask[16][1];
> > +        __declspec(align(64)) VUINT32 sOne[16][1];
> > +        __declspec(align(64)) VUINT32 sPoly[8][16][1];
> > +        __declspec(align(64)) VUINT32 iHiDelta[16][1];
> > +        __declspec(align(64)) VUINT32 iLoRange[16][1];
> > +        __declspec(align(64)) VUINT32 iBrkValue[16][1];
> > +        __declspec(align(64)) VUINT32 iOffExpoMask[16][1];
> > +        __declspec(align(64)) VUINT32 sLn2[16][1];
> > +} __svml_slog1p_data_internal;
> > +#endif
> > +__svml_slog1p_data_internal:
> > +        /*== SgnMask ==*/
> > +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> > +        /*== sOne = SP 1.0 ==*/
> > +        .align 64
> > +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> > +        /*== sPoly[] = SP polynomial ==*/
> > +        .align 64
> > +        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
> > +        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
> > +        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
> > +        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
> > +        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
> > +        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
> > +        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
> > +        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
> > +        /*== iHiDelta = SP 80000000-7f000000 ==*/
> > +        .align 64
> > +        .long 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000
> > +        /*== iLoRange = SP 00800000+iHiDelta ==*/
> > +        .align 64
> > +        .long 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000
> > +        /*== iBrkValue = SP 2/3 ==*/
> > +        .align 64
> > +        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
> > +        /*== iOffExpoMask = SP significand mask ==*/
> > +        .align 64
> > +        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
> > +        /*== sLn2 = SP ln(2) ==*/
> > +        .align 64
> > +        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
> > +        .align 64
> > +        .type        __svml_slog1p_data_internal,@object
> > +        .size        __svml_slog1p_data_internal,.-__svml_slog1p_data_internal
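A scalar illustration of the "check argument value ranges" step and the
iHiDelta/iLoRange constants above (the function name is mine, not from
the patch): adding iHiDelta (0x01000000) to the bit pattern of s = 1+x
and comparing the sum, as a signed integer, against iLoRange
(0x01800000) flags in a single test every lane where s is zero or
subnormal, negative, or at least 2^127 (which also covers Inf and NaN).
Those are exactly the lanes later routed to the scalar log1pf call.

#include <stdint.h>
#include <string.h>

static int
log1pf_needs_special (float s)      /* s is the computed 1+x */
{
  uint32_t u;
  memcpy (&u, &s, sizeof u);
  int32_t t = (int32_t) (u + 0x01000000u);  /* + iHiDelta */
  return t < 0x01800000;    /* signed compare, as pcmpgtd/vpcmpd do */
}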
> > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core-sse2.S
> > new file mode 100644
> > index 0000000000..913c8290c8
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core-sse2.S
> > @@ -0,0 +1,20 @@
> > +/* SSE2 version of vectorized log1pf, vector length is 4.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#define _ZGVbN4v_log1pf _ZGVbN4v_log1pf_sse2
> > +#include "../svml_s_log1pf4_core.S"
> > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core.c
> > new file mode 100644
> > index 0000000000..b6aff48023
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core.c
> > @@ -0,0 +1,28 @@
> > +/* Multiple versions of vectorized log1pf, vector length is 4.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#define SYMBOL_NAME _ZGVbN4v_log1pf
> > +#include "ifunc-mathvec-sse4_1.h"
> > +
> > +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> > +
> > +#ifdef SHARED
> > +__hidden_ver1 (_ZGVbN4v_log1pf, __GI__ZGVbN4v_log1pf,
> > +            __redirect__ZGVbN4v_log1pf)
> > +  __attribute__ ((visibility ("hidden")));
> > +#endif
> > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core_sse4.S
> > new file mode 100644
> > index 0000000000..ef1bae58c0
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core_sse4.S
> > @@ -0,0 +1,252 @@
> > +/* Function log1pf vectorized with SSE4.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   https://www.gnu.org/licenses/.  */
> > +
> > +/*
> > + * ALGORITHM DESCRIPTION:
> > + *
> > + *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
> > + *    Get short reciprocal approximation Rcp ~ 1/xh
> > + *    R = (Rcp*xh - 1.0) + Rcp*xl
> > + *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
> > + *       log(Rcp) is tabulated
> > + *
> > + *
> > + */
> > +
> > +/* Offsets for data table __svml_slog1p_data_internal
> > + */
> > +#define SgnMask                              0
> > +#define sOne                                 16
> > +#define sPoly                                32
> > +#define iHiDelta                             160
> > +#define iLoRange                             176
> > +#define iBrkValue                            192
> > +#define iOffExpoMask                         208
> > +#define sLn2                                 224
> > +
> > +#include <sysdep.h>
> > +
> > +        .text
> > +     .section .text.sse4,"ax",@progbits
> > +ENTRY(_ZGVbN4v_log1pf_sse4)
> > +        subq      $72, %rsp
> > +        cfi_def_cfa_offset(80)
> > +        movups    sOne+__svml_slog1p_data_internal(%rip), %xmm7
> > +
> > +/* compute 1+x as high, low parts */
> > +        movaps    %xmm7, %xmm1
> > +        movaps    %xmm7, %xmm5
> > +        maxps     %xmm0, %xmm1
> > +        minps     %xmm0, %xmm5
> > +        movaps    %xmm1, %xmm4
> > +
> > +/* check argument value ranges */
> > +        movdqu    iHiDelta+__svml_slog1p_data_internal(%rip), %xmm2
> > +        addps     %xmm5, %xmm4
> > +
> > +/* reduction: compute r,n */
> > +        movdqu    iBrkValue+__svml_slog1p_data_internal(%rip), %xmm3
> > +        paddd     %xmm4, %xmm2
> > +        movdqu    iOffExpoMask+__svml_slog1p_data_internal(%rip), %xmm8
> > +        subps     %xmm4, %xmm1
> > +        psubd     %xmm3, %xmm4
> > +        addps     %xmm1, %xmm5
> > +        pand      %xmm4, %xmm8
> > +        psrad     $23, %xmm4
> > +        cvtdq2ps  %xmm4, %xmm10
> > +        pslld     $23, %xmm4
> > +        movaps    %xmm7, %xmm1
> > +        paddd     %xmm3, %xmm8
> > +        psubd     %xmm4, %xmm1
> > +        mulps     %xmm5, %xmm1
> > +
> > +/* polynomial evaluation */
> > +        subps     %xmm7, %xmm8
> > +
> > +/* final reconstruction */
> > +        mulps     sLn2+__svml_slog1p_data_internal(%rip), %xmm10
> > +        addps     %xmm8, %xmm1
> > +        movups    sPoly+112+__svml_slog1p_data_internal(%rip), %xmm9
> > +        mulps     %xmm1, %xmm9
> > +        movdqu    iLoRange+__svml_slog1p_data_internal(%rip), %xmm6
> > +        pcmpgtd   %xmm2, %xmm6
> > +        addps     sPoly+96+__svml_slog1p_data_internal(%rip), %xmm9
> > +
> > +/* combine and get argument value range mask */
> > +        movmskps  %xmm6, %edx
> > +        movups    SgnMask+__svml_slog1p_data_internal(%rip), %xmm11
> > +        mulps     %xmm1, %xmm9
> > +        andnps    %xmm0, %xmm11
> > +        addps     sPoly+80+__svml_slog1p_data_internal(%rip), %xmm9
> > +        mulps     %xmm1, %xmm9
> > +        addps     sPoly+64+__svml_slog1p_data_internal(%rip), %xmm9
> > +        mulps     %xmm1, %xmm9
> > +        addps     sPoly+48+__svml_slog1p_data_internal(%rip), %xmm9
> > +        mulps     %xmm1, %xmm9
> > +        addps     sPoly+32+__svml_slog1p_data_internal(%rip), %xmm9
> > +        mulps     %xmm1, %xmm9
> > +        addps     sPoly+16+__svml_slog1p_data_internal(%rip), %xmm9
> > +        mulps     %xmm1, %xmm9
> > +        addps     sPoly+__svml_slog1p_data_internal(%rip), %xmm9
> > +        mulps     %xmm1, %xmm9
> > +        mulps     %xmm1, %xmm9
> > +        addps     %xmm9, %xmm1
> > +        addps     %xmm10, %xmm1
> > +        orps      %xmm11, %xmm1
> > +        testl     %edx, %edx
> > +
> > +/* Go to special inputs processing branch */
> > +        jne       L(SPECIAL_VALUES_BRANCH)
> > +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm1
> > +
> > +/* Restore registers
> > + * and exit the function
> > + */
> > +
> > +L(EXIT):
> > +        movaps    %xmm1, %xmm0
> > +        addq      $72, %rsp
> > +        cfi_def_cfa_offset(8)
> > +        ret
> > +        cfi_def_cfa_offset(80)
> > +
> > +/* Branch to process
> > + * special inputs
> > + */
> > +
> > +L(SPECIAL_VALUES_BRANCH):
> > +        movups    %xmm0, 32(%rsp)
> > +        movups    %xmm1, 48(%rsp)
> > +                                # LOE rbx rbp r12 r13 r14 r15 edx
> > +
> > +        xorl      %eax, %eax
> > +        movq      %r12, 16(%rsp)
> > +        cfi_offset(12, -64)
> > +        movl      %eax, %r12d
> > +        movq      %r13, 8(%rsp)
> > +        cfi_offset(13, -72)
> > +        movl      %edx, %r13d
> > +        movq      %r14, (%rsp)
> > +        cfi_offset(14, -80)
> > +                                # LOE rbx rbp r15 r12d r13d
> > +
> > +/* Range mask
> > + * bits check
> > + */
> > +
> > +L(RANGEMASK_CHECK):
> > +        btl       %r12d, %r13d
> > +
> > +/* Call scalar math function */
> > +        jc        L(SCALAR_MATH_CALL)
> > +                                # LOE rbx rbp r15 r12d r13d
> > +
> > +/* Special inputs
> > + * processing loop
> > + */
> > +
> > +L(SPECIAL_VALUES_LOOP):
> > +        incl      %r12d
> > +        cmpl      $4, %r12d
> > +
> > +/* Check bits in range mask */
> > +        jl        L(RANGEMASK_CHECK)
> > +                                # LOE rbx rbp r15 r12d r13d
> > +
> > +        movq      16(%rsp), %r12
> > +        cfi_restore(12)
> > +        movq      8(%rsp), %r13
> > +        cfi_restore(13)
> > +        movq      (%rsp), %r14
> > +        cfi_restore(14)
> > +        movups    48(%rsp), %xmm1
> > +
> > +/* Go to exit */
> > +        jmp       L(EXIT)
> > +        cfi_offset(12, -64)
> > +        cfi_offset(13, -72)
> > +        cfi_offset(14, -80)
> > +                                # LOE rbx rbp r12 r13 r14 r15 xmm1
> > +
> > +/* Scalar math function call
> > + * to process special input
> > + */
> > +
> > +L(SCALAR_MATH_CALL):
> > +        movl      %r12d, %r14d
> > +        movss     32(%rsp,%r14,4), %xmm0
> > +        call      log1pf@PLT
> > +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> > +
> > +        movss     %xmm0, 48(%rsp,%r14,4)
> > +
> > +/* Process special inputs in loop */
> > +        jmp       L(SPECIAL_VALUES_LOOP)
> > +                                # LOE rbx rbp r15 r12d r13d
> > +END(_ZGVbN4v_log1pf_sse4)
> > +
> > +        .section .rodata, "a"
> > +        .align 16
> > +
> > +#ifdef __svml_slog1p_data_internal_typedef
> > +typedef unsigned int VUINT32;
> > +typedef struct {
> > +        __declspec(align(16)) VUINT32 SgnMask[4][1];
> > +        __declspec(align(16)) VUINT32 sOne[4][1];
> > +        __declspec(align(16)) VUINT32 sPoly[8][4][1];
> > +        __declspec(align(16)) VUINT32 iHiDelta[4][1];
> > +        __declspec(align(16)) VUINT32 iLoRange[4][1];
> > +        __declspec(align(16)) VUINT32 iBrkValue[4][1];
> > +        __declspec(align(16)) VUINT32 iOffExpoMask[4][1];
> > +        __declspec(align(16)) VUINT32 sLn2[4][1];
> > +} __svml_slog1p_data_internal;
> > +#endif
> > +__svml_slog1p_data_internal:
> > +        /*== SgnMask ==*/
> > +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> > +        /*== sOne = SP 1.0 ==*/
> > +        .align 16
> > +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> > +        /*== sPoly[] = SP polynomial ==*/
> > +        .align 16
> > +        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
> > +        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
> > +        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
> > +        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
> > +        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
> > +        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
> > +        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
> > +        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
> > +        /*== iHiDelta = SP 80000000-7f000000 ==*/
> > +        .align 16
> > +        .long 0x01000000, 0x01000000, 0x01000000, 0x01000000
> > +        /*== iLoRange = SP 00800000+iHiDelta ==*/
> > +        .align 16
> > +        .long 0x01800000, 0x01800000, 0x01800000, 0x01800000
> > +        /*== iBrkValue = SP 2/3 ==*/
> > +        .align 16
> > +        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
> > +        /*== iOffExpoMask = SP significand mask ==*/
> > +        .align 16
> > +        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
> > +        /*== sLn2 = SP ln(2) ==*/
> > +        .align 16
> > +        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
> > +        .align 16
> > +        .type        __svml_slog1p_data_internal,@object
> > +        .size        __svml_slog1p_data_internal,.-__svml_slog1p_data_internal
> > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core-sse.S
> > new file mode 100644
> > index 0000000000..c0b97d89e6
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core-sse.S
> > @@ -0,0 +1,20 @@
> > +/* SSE version of vectorized log1pf, vector length is 8.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#define _ZGVdN8v_log1pf _ZGVdN8v_log1pf_sse_wrapper
> > +#include "../svml_s_log1pf8_core.S"
> > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core.c
> > new file mode 100644
> > index 0000000000..a2bbe37129
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core.c
> > @@ -0,0 +1,28 @@
> > +/* Multiple versions of vectorized log1pf, vector length is 8.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#define SYMBOL_NAME _ZGVdN8v_log1pf
> > +#include "ifunc-mathvec-avx2.h"
> > +
> > +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> > +
> > +#ifdef SHARED
> > +__hidden_ver1 (_ZGVdN8v_log1pf, __GI__ZGVdN8v_log1pf,
> > +            __redirect__ZGVdN8v_log1pf)
> > +  __attribute__ ((visibility ("hidden")));
> > +#endif
> > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core_avx2.S
> > new file mode 100644
> > index 0000000000..957dc23e3f
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core_avx2.S
> > @@ -0,0 +1,254 @@
> > +/* Function log1pf vectorized with AVX2.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   https://www.gnu.org/licenses/.  */
> > +
> > +/*
> > + * ALGORITHM DESCRIPTION:
> > + *
> > + *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
> > + *    Get short reciprocal approximation Rcp ~ 1/xh
> > + *    R = (Rcp*xh - 1.0) + Rcp*xl
> > + *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
> > + *       log(Rcp) is tabulated
> > + *
> > + *
> > + */
> > +
> > +/* Offsets for data table __svml_slog1p_data_internal
> > + */
> > +#define SgnMask                              0
> > +#define sOne                                 32
> > +#define sPoly                                64
> > +#define iHiDelta                             320
> > +#define iLoRange                             352
> > +#define iBrkValue                            384
> > +#define iOffExpoMask                         416
> > +#define sLn2                                 448
> > +
> > +#include <sysdep.h>
> > +
> > +        .text
> > +     .section .text.avx2,"ax",@progbits
> > +ENTRY(_ZGVdN8v_log1pf_avx2)
> > +        pushq     %rbp
> > +        cfi_def_cfa_offset(16)
> > +        movq      %rsp, %rbp
> > +        cfi_def_cfa(6, 16)
> > +        cfi_offset(6, -16)
> > +        andq      $-32, %rsp
> > +        subq      $96, %rsp
> > +        vmovups   sOne+__svml_slog1p_data_internal(%rip), %ymm2
> > +
> > +/* reduction: compute r,n */
> > +        vmovups   iBrkValue+__svml_slog1p_data_internal(%rip), %ymm13
> > +        vmovups   SgnMask+__svml_slog1p_data_internal(%rip), %ymm4
> > +        vmovups   iLoRange+__svml_slog1p_data_internal(%rip), %ymm8
> > +        vmovaps   %ymm0, %ymm3
> > +
> > +/* compute 1+x as high, low parts */
> > +        vmaxps    %ymm3, %ymm2, %ymm5
> > +        vminps    %ymm3, %ymm2, %ymm6
> > +        vaddps    %ymm6, %ymm5, %ymm10
> > +        vpsubd    %ymm13, %ymm10, %ymm11
> > +
> > +/* check argument value ranges */
> > +        vpaddd    iHiDelta+__svml_slog1p_data_internal(%rip), %ymm10, %ymm9
> > +        vsubps    %ymm10, %ymm5, %ymm7
> > +        vpsrad    $23, %ymm11, %ymm14
> > +        vpand     iOffExpoMask+__svml_slog1p_data_internal(%rip), %ymm11, %ymm12
> > +        vpslld    $23, %ymm14, %ymm15
> > +        vcvtdq2ps %ymm14, %ymm0
> > +        vpsubd    %ymm15, %ymm2, %ymm14
> > +        vandnps   %ymm3, %ymm4, %ymm1
> > +        vaddps    %ymm7, %ymm6, %ymm4
> > +        vpaddd    %ymm13, %ymm12, %ymm6
> > +        vmulps    %ymm4, %ymm14, %ymm7
> > +
> > +/* polynomial evaluation */
> > +        vsubps    %ymm2, %ymm6, %ymm2
> > +        vpcmpgtd  %ymm9, %ymm8, %ymm5
> > +        vmovups   sPoly+224+__svml_slog1p_data_internal(%rip), %ymm8
> > +        vaddps    %ymm2, %ymm7, %ymm9
> > +        vfmadd213ps sPoly+192+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
> > +        vfmadd213ps sPoly+160+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
> > +        vfmadd213ps sPoly+128+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
> > +        vfmadd213ps sPoly+96+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
> > +        vfmadd213ps sPoly+64+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
> > +        vfmadd213ps sPoly+32+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
> > +        vfmadd213ps sPoly+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
> > +        vmulps    %ymm8, %ymm9, %ymm10
> > +        vfmadd213ps %ymm9, %ymm9, %ymm10
> > +
> > +/* final reconstruction */
> > +        vfmadd132ps sLn2+__svml_slog1p_data_internal(%rip), %ymm10, %ymm0
> > +
> > +/* combine and get argument value range mask */
> > +        vmovmskps %ymm5, %edx
> > +        vorps     %ymm1, %ymm0, %ymm0
> > +        testl     %edx, %edx
> > +
> > +/* Go to special inputs processing branch */
> > +        jne       L(SPECIAL_VALUES_BRANCH)
> > +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm3
> > +
> > +/* Restore registers
> > + * and exit the function
> > + */
> > +
> > +L(EXIT):
> > +        movq      %rbp, %rsp
> > +        popq      %rbp
> > +        cfi_def_cfa(7, 8)
> > +        cfi_restore(6)
> > +        ret
> > +        cfi_def_cfa(6, 16)
> > +        cfi_offset(6, -16)
> > +
> > +/* Branch to process
> > + * special inputs
> > + */
> > +
> > +L(SPECIAL_VALUES_BRANCH):
> > +        vmovups   %ymm3, 32(%rsp)
> > +        vmovups   %ymm0, 64(%rsp)
> > +                                # LOE rbx r12 r13 r14 r15 edx ymm0
> > +
> > +        xorl      %eax, %eax
> > +                                # LOE rbx r12 r13 r14 r15 eax edx
> > +
> > +        vzeroupper
> > +        movq      %r12, 16(%rsp)
> > +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> > +        movl      %eax, %r12d
> > +        movq      %r13, 8(%rsp)
> > +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> > +        movl      %edx, %r13d
> > +        movq      %r14, (%rsp)
> > +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> > +                                # LOE rbx r15 r12d r13d
> > +
> > +/* Range mask
> > + * bits check
> > + */
> > +
> > +L(RANGEMASK_CHECK):
> > +        btl       %r12d, %r13d
> > +
> > +/* Call scalar math function */
> > +        jc        L(SCALAR_MATH_CALL)
> > +                                # LOE rbx r15 r12d r13d
> > +
> > +/* Special inputs
> > + * processing loop
> > + */
> > +
> > +L(SPECIAL_VALUES_LOOP):
> > +        incl      %r12d
> > +        cmpl      $8, %r12d
> > +
> > +/* Check bits in range mask */
> > +        jl        L(RANGEMASK_CHECK)
> > +                                # LOE rbx r15 r12d r13d
> > +
> > +        movq      16(%rsp), %r12
> > +        cfi_restore(12)
> > +        movq      8(%rsp), %r13
> > +        cfi_restore(13)
> > +        movq      (%rsp), %r14
> > +        cfi_restore(14)
> > +        vmovups   64(%rsp), %ymm0
> > +
> > +/* Go to exit */
> > +        jmp       L(EXIT)
> > +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> > +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> > +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> > +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> > +                                # LOE rbx r12 r13 r14 r15 ymm0
> > +
> > +/* Scalar math function call
> > + * to process special input
> > + */
> > +
> > +L(SCALAR_MATH_CALL):
> > +        movl      %r12d, %r14d
> > +        movss     32(%rsp,%r14,4), %xmm0
> > +        call      log1pf@PLT
> > +                                # LOE rbx r14 r15 r12d r13d xmm0
> > +
> > +        movss     %xmm0, 64(%rsp,%r14,4)
> > +
> > +/* Process special inputs in loop */
> > +        jmp       L(SPECIAL_VALUES_LOOP)
> > +                                # LOE rbx r15 r12d r13d
> > +END(_ZGVdN8v_log1pf_avx2)
> > +
> > +        .section .rodata, "a"
> > +        .align 32
> > +
> > +#ifdef __svml_slog1p_data_internal_typedef
> > +typedef unsigned int VUINT32;
> > +typedef struct {
> > +        __declspec(align(32)) VUINT32 SgnMask[8][1];
> > +        __declspec(align(32)) VUINT32 sOne[8][1];
> > +        __declspec(align(32)) VUINT32 sPoly[8][8][1];
> > +        __declspec(align(32)) VUINT32 iHiDelta[8][1];
> > +        __declspec(align(32)) VUINT32 iLoRange[8][1];
> > +        __declspec(align(32)) VUINT32 iBrkValue[8][1];
> > +        __declspec(align(32)) VUINT32 iOffExpoMask[8][1];
> > +        __declspec(align(32)) VUINT32 sLn2[8][1];
> > +} __svml_slog1p_data_internal;
> > +#endif
> > +__svml_slog1p_data_internal:
> > +        /*== SgnMask ==*/
> > +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> > +        /*== sOne = SP 1.0 ==*/
> > +        .align 32
> > +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> > +        /*== sPoly[] = SP polynomial ==*/
> > +        .align 32
> > +        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
> > +        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
> > +        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
> > +        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
> > +        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
> > +        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
> > +        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
> > +        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
> > +        /*== iHiDelta = SP 80000000-7f000000 ==*/
> > +        .align 32
> > +        .long 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000
> > +        /*== iLoRange = SP 00800000+iHiDelta ==*/
> > +        .align 32
> > +        .long 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000
> > +        /*== iBrkValue = SP 2/3 ==*/
> > +        .align 32
> > +        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
> > +        /*== iOffExpoMask = SP significand mask ==*/
> > +        .align 32
> > +        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
> > +        /*== sLn2 = SP ln(2) ==*/
> > +        .align 32
> > +        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
> > +        .align 32
> > +        .type        __svml_slog1p_data_internal,@object
> > +        .size        __svml_slog1p_data_internal,.-__svml_slog1p_data_internal
> > diff --git a/sysdeps/x86_64/fpu/svml_d_log1p2_core.S b/sysdeps/x86_64/fpu/svml_d_log1p2_core.S
> > new file mode 100644
> > index 0000000000..e3f01717d9
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/svml_d_log1p2_core.S
> > @@ -0,0 +1,29 @@
> > +/* Function log1p vectorized with SSE2.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#include <sysdep.h>
> > +#include "svml_d_wrapper_impl.h"
> > +
> > +     .text
> > +ENTRY (_ZGVbN2v_log1p)
> > +WRAPPER_IMPL_SSE2 log1p
> > +END (_ZGVbN2v_log1p)
> > +
> > +#ifndef USE_MULTIARCH
> > + libmvec_hidden_def (_ZGVbN2v_log1p)
> > +#endif
> > diff --git a/sysdeps/x86_64/fpu/svml_d_log1p4_core.S b/sysdeps/x86_64/fpu/svml_d_log1p4_core.S
> > new file mode 100644
> > index 0000000000..49beb96183
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/svml_d_log1p4_core.S
> > @@ -0,0 +1,29 @@
> > +/* Function log1p vectorized with AVX2, wrapper version.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#include <sysdep.h>
> > +#include "svml_d_wrapper_impl.h"
> > +
> > +     .text
> > +ENTRY (_ZGVdN4v_log1p)
> > +WRAPPER_IMPL_AVX _ZGVbN2v_log1p
> > +END (_ZGVdN4v_log1p)
> > +
> > +#ifndef USE_MULTIARCH
> > + libmvec_hidden_def (_ZGVdN4v_log1p)
> > +#endif
> > diff --git a/sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S
> > new file mode 100644
> > index 0000000000..8b89768b7c
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S
> > @@ -0,0 +1,25 @@
> > +/* Function log1p vectorized in AVX ISA as wrapper to SSE4 ISA version.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#include <sysdep.h>
> > +#include "svml_d_wrapper_impl.h"
> > +
> > +     .text
> > +ENTRY (_ZGVcN4v_log1p)
> > +WRAPPER_IMPL_AVX _ZGVbN2v_log1p
> > +END (_ZGVcN4v_log1p)
> > diff --git a/sysdeps/x86_64/fpu/svml_d_log1p8_core.S b/sysdeps/x86_64/fpu/svml_d_log1p8_core.S
> > new file mode 100644
> > index 0000000000..54b4d4ede8
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/svml_d_log1p8_core.S
> > @@ -0,0 +1,25 @@
> > +/* Function log1p vectorized with AVX-512, wrapper to AVX2.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#include <sysdep.h>
> > +#include "svml_d_wrapper_impl.h"
> > +
> > +     .text
> > +ENTRY (_ZGVeN8v_log1p)
> > +WRAPPER_IMPL_AVX512 _ZGVdN4v_log1p
> > +END (_ZGVeN8v_log1p)
> > diff --git a/sysdeps/x86_64/fpu/svml_s_log1pf16_core.S b/sysdeps/x86_64/fpu/svml_s_log1pf16_core.S
> > new file mode 100644
> > index 0000000000..2c953d00fb
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/svml_s_log1pf16_core.S
> > @@ -0,0 +1,25 @@
> > +/* Function log1pf vectorized with AVX-512. Wrapper to AVX2 version.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#include <sysdep.h>
> > +#include "svml_s_wrapper_impl.h"
> > +
> > +     .text
> > +ENTRY (_ZGVeN16v_log1pf)
> > +WRAPPER_IMPL_AVX512 _ZGVdN8v_log1pf
> > +END (_ZGVeN16v_log1pf)
> > diff --git a/sysdeps/x86_64/fpu/svml_s_log1pf4_core.S b/sysdeps/x86_64/fpu/svml_s_log1pf4_core.S
> > new file mode 100644
> > index 0000000000..6f68762eaa
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/svml_s_log1pf4_core.S
> > @@ -0,0 +1,29 @@
> > +/* Function log1pf vectorized with SSE2, wrapper version.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#include <sysdep.h>
> > +#include "svml_s_wrapper_impl.h"
> > +
> > +     .text
> > +ENTRY (_ZGVbN4v_log1pf)
> > +WRAPPER_IMPL_SSE2 log1pf
> > +END (_ZGVbN4v_log1pf)
> > +
> > +#ifndef USE_MULTIARCH
> > + libmvec_hidden_def (_ZGVbN4v_log1pf)
> > +#endif
> > diff --git a/sysdeps/x86_64/fpu/svml_s_log1pf8_core.S b/sysdeps/x86_64/fpu/svml_s_log1pf8_core.S
> > new file mode 100644
> > index 0000000000..74f81283b1
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/svml_s_log1pf8_core.S
> > @@ -0,0 +1,29 @@
> > +/* Function log1pf vectorized with AVX2, wrapper version.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#include <sysdep.h>
> > +#include "svml_s_wrapper_impl.h"
> > +
> > +     .text
> > +ENTRY (_ZGVdN8v_log1pf)
> > +WRAPPER_IMPL_AVX _ZGVbN4v_log1pf
> > +END (_ZGVdN8v_log1pf)
> > +
> > +#ifndef USE_MULTIARCH
> > + libmvec_hidden_def (_ZGVdN8v_log1pf)
> > +#endif
> > diff --git a/sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S
> > new file mode 100644
> > index 0000000000..f33be0e904
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S
> > @@ -0,0 +1,25 @@
> > +/* Function log1pf vectorized in AVX ISA as wrapper to SSE4 ISA version.
> > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +#include <sysdep.h>
> > +#include "svml_s_wrapper_impl.h"
> > +
> > +        .text
> > +ENTRY (_ZGVcN8v_log1pf)
> > +WRAPPER_IMPL_AVX _ZGVbN4v_log1pf
> > +END (_ZGVcN8v_log1pf)
> > diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx.c
> > new file mode 100644
> > index 0000000000..18aa6aaeaa
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx.c
> > @@ -0,0 +1 @@
> > +#include "test-double-libmvec-log1p.c"
> > diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx2.c
> > new file mode 100644
> > index 0000000000..18aa6aaeaa
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx2.c
> > @@ -0,0 +1 @@
> > +#include "test-double-libmvec-log1p.c"
> > diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx512f.c
> > new file mode 100644
> > index 0000000000..18aa6aaeaa
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx512f.c
> > @@ -0,0 +1 @@
> > +#include "test-double-libmvec-log1p.c"
> > diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log1p.c b/sysdeps/x86_64/fpu/test-double-libmvec-log1p.c
> > new file mode 100644
> > index 0000000000..40937f987a
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/test-double-libmvec-log1p.c
> > @@ -0,0 +1,3 @@
> > +#define LIBMVEC_TYPE double
> > +#define LIBMVEC_FUNC log1p
> > +#include "test-vector-abi-arg1.h"
> > diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> > index 08c91ff634..38359b05e3 100644
> > --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> > +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> > @@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVbN2v_cbrt)
> >  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVbN2vv_atan2)
> >  VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVbN2v_log10)
> >  VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVbN2v_log2)
> > +VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVbN2v_log1p)
> >
> >  #define VEC_INT_TYPE __m128i
> >
> > diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> > index a2fb0de309..17701e7731 100644
> > --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> > +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> > @@ -43,6 +43,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVdN4v_cbrt)
> >  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVdN4vv_atan2)
> >  VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVdN4v_log10)
> >  VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVdN4v_log2)
> > +VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVdN4v_log1p)
> >
> >  #ifndef __ILP32__
> >  # define VEC_INT_TYPE __m256i
> > diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> > index dc65a4ee25..bba62b2446 100644
> > --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> > +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> > @@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVcN4v_cbrt)
> >  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVcN4vv_atan2)
> >  VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVcN4v_log10)
> >  VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVcN4v_log2)
> > +VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVcN4v_log1p)
> >
> >  #define VEC_INT_TYPE __m128i
> >
> > diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> > index 253ee8c906..8a04e13a07 100644
> > --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> > +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> > @@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVeN8v_cbrt)
> >  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVeN8vv_atan2)
> >  VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVeN8v_log10)
> >  VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVeN8v_log2)
> > +VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVeN8v_log1p)
> >
> >  #ifndef __ILP32__
> >  # define VEC_INT_TYPE __m512i
> > diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx.c
> > new file mode 100644
> > index 0000000000..3395decaf4
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx.c
> > @@ -0,0 +1 @@
> > +#include "test-float-libmvec-log1pf.c"
> > diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx2.c
> > new file mode 100644
> > index 0000000000..3395decaf4
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx2.c
> > @@ -0,0 +1 @@
> > +#include "test-float-libmvec-log1pf.c"
> > diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx512f.c
> > new file mode 100644
> > index 0000000000..3395decaf4
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx512f.c
> > @@ -0,0 +1 @@
> > +#include "test-float-libmvec-log1pf.c"
> > diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log1pf.c b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf.c
> > new file mode 100644
> > index 0000000000..1b36069ded
> > --- /dev/null
> > +++ b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf.c
> > @@ -0,0 +1,3 @@
> > +#define LIBMVEC_TYPE float
> > +#define LIBMVEC_FUNC log1pf
> > +#include "test-vector-abi-arg1.h"
> > diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> > index 1c7db5146c..706f52c618 100644
> > --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> > +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> > @@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVeN16v_cbrtf)
> >  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVeN16vv_atan2f)
> >  VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVeN16v_log10f)
> >  VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVeN16v_log2f)
> > +VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVeN16v_log1pf)
> >
> >  #define VEC_INT_TYPE __m512i
> >
> > diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> > index 8ec51603b3..ceace4c53a 100644
> > --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> > +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> > @@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVbN4v_cbrtf)
> >  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVbN4vv_atan2f)
> >  VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVbN4v_log10f)
> >  VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVbN4v_log2f)
> > +VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVbN4v_log1pf)
> >
> >  #define VEC_INT_TYPE __m128i
> >
> > diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> > index 1cb4553c7a..06a4753409 100644
> > --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> > +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> > @@ -43,6 +43,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVdN8v_cbrtf)
> >  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVdN8vv_atan2f)
> >  VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVdN8v_log10f)
> >  VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVdN8v_log2f)
> > +VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVdN8v_log1pf)
> >
> >  /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
> >  #undef VECTOR_WRAPPER_fFF
> > diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> > index 6ecc1792bb..a87e5298e0 100644
> > --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> > +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> > @@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVcN8v_cbrtf)
> >  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVcN8vv_atan2f)
> >  VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVcN8v_log10f)
> >  VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVcN8v_log2f)
> > +VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVcN8v_log1pf)
> >
> >  #define VEC_INT_TYPE __m128i
> >
> > --
> > 2.31.1
> >
>
> LGTM.
>
> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
>
> Thanks.
>
>
> H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 13/18] x86-64: Add vector log1p/log1pf implementation to libmvec
  2021-12-29 23:28     ` Noah Goldstein
@ 2021-12-30  0:32       ` H.J. Lu
  0 siblings, 0 replies; 40+ messages in thread
From: H.J. Lu @ 2021-12-30  0:32 UTC (permalink / raw)
  To: Noah Goldstein
  Cc: Sunil K Pandey, Kolesov, Andrey, GNU C Library, Cornea, Marius

On Wed, Dec 29, 2021 at 3:28 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Wed, Dec 29, 2021 at 3:43 PM H.J. Lu via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > On Tue, Dec 28, 2021 at 10:39:55PM -0800, Sunil K Pandey wrote:
> > > Implement vectorized log1p/log1pf containing SSE, AVX, AVX2 and
> > > AVX512 versions for libmvec as per vector ABI.  It also contains
> > > accuracy and ABI tests for vector log1p/log1pf with regenerated ulps.
> > > ---
> > >  bits/libm-simd-decl-stubs.h                   |   11 +
> > >  math/bits/mathcalls.h                         |    2 +-
> > >  .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
> > >  sysdeps/x86/fpu/bits/math-vector.h            |    4 +
> > >  .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
> > >  sysdeps/x86_64/fpu/Makeconfig                 |    1 +
> > >  sysdeps/x86_64/fpu/Versions                   |    2 +
> > >  sysdeps/x86_64/fpu/libm-test-ulps             |   20 +
> > >  .../fpu/multiarch/svml_d_log1p2_core-sse2.S   |   20 +
> > >  .../x86_64/fpu/multiarch/svml_d_log1p2_core.c |   27 +
> > >  .../fpu/multiarch/svml_d_log1p2_core_sse4.S   | 1398 +++++++++++++++++
> > >  .../fpu/multiarch/svml_d_log1p4_core-sse.S    |   20 +
> > >  .../x86_64/fpu/multiarch/svml_d_log1p4_core.c |   27 +
> > >  .../fpu/multiarch/svml_d_log1p4_core_avx2.S   | 1383 ++++++++++++++++
> > >  .../fpu/multiarch/svml_d_log1p8_core-avx2.S   |   20 +
> > >  .../x86_64/fpu/multiarch/svml_d_log1p8_core.c |   27 +
> > >  .../fpu/multiarch/svml_d_log1p8_core_avx512.S |  317 ++++
> > >  .../fpu/multiarch/svml_s_log1pf16_core-avx2.S |   20 +
> > >  .../fpu/multiarch/svml_s_log1pf16_core.c      |   28 +
> > >  .../multiarch/svml_s_log1pf16_core_avx512.S   |  271 ++++
> > >  .../fpu/multiarch/svml_s_log1pf4_core-sse2.S  |   20 +
> > >  .../fpu/multiarch/svml_s_log1pf4_core.c       |   28 +
> > >  .../fpu/multiarch/svml_s_log1pf4_core_sse4.S  |  252 +++
> > >  .../fpu/multiarch/svml_s_log1pf8_core-sse.S   |   20 +
> > >  .../fpu/multiarch/svml_s_log1pf8_core.c       |   28 +
> > >  .../fpu/multiarch/svml_s_log1pf8_core_avx2.S  |  254 +++
> > >  sysdeps/x86_64/fpu/svml_d_log1p2_core.S       |   29 +
> > >  sysdeps/x86_64/fpu/svml_d_log1p4_core.S       |   29 +
> > >  sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S   |   25 +
> > >  sysdeps/x86_64/fpu/svml_d_log1p8_core.S       |   25 +
> > >  sysdeps/x86_64/fpu/svml_s_log1pf16_core.S     |   25 +
> > >  sysdeps/x86_64/fpu/svml_s_log1pf4_core.S      |   29 +
> > >  sysdeps/x86_64/fpu/svml_s_log1pf8_core.S      |   29 +
> > >  sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S  |   25 +
> > >  .../fpu/test-double-libmvec-log1p-avx.c       |    1 +
> > >  .../fpu/test-double-libmvec-log1p-avx2.c      |    1 +
> > >  .../fpu/test-double-libmvec-log1p-avx512f.c   |    1 +
> > >  .../x86_64/fpu/test-double-libmvec-log1p.c    |    3 +
> > >  .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
> > >  .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
> > >  .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
> > >  .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
> > >  .../fpu/test-float-libmvec-log1pf-avx.c       |    1 +
> > >  .../fpu/test-float-libmvec-log1pf-avx2.c      |    1 +
> > >  .../fpu/test-float-libmvec-log1pf-avx512f.c   |    1 +
> > >  .../x86_64/fpu/test-float-libmvec-log1pf.c    |    3 +
> > >  .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
> > >  .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
> > >  .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
> > >  .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
> > >  50 files changed, 4447 insertions(+), 1 deletion(-)
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core-sse2.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core.c
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core_sse4.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core-sse.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core.c
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core_avx2.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core-avx2.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core.c
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core_avx512.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core-avx2.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core.c
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core_avx512.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core-sse2.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core.c
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core_sse4.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core-sse.S
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core.c
> > >  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core_avx2.S
> > >  create mode 100644 sysdeps/x86_64/fpu/svml_d_log1p2_core.S
> > >  create mode 100644 sysdeps/x86_64/fpu/svml_d_log1p4_core.S
> > >  create mode 100644 sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S
> > >  create mode 100644 sysdeps/x86_64/fpu/svml_d_log1p8_core.S
> > >  create mode 100644 sysdeps/x86_64/fpu/svml_s_log1pf16_core.S
> > >  create mode 100644 sysdeps/x86_64/fpu/svml_s_log1pf4_core.S
> > >  create mode 100644 sysdeps/x86_64/fpu/svml_s_log1pf8_core.S
> > >  create mode 100644 sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S
> > >  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx.c
> > >  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx2.c
> > >  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx512f.c
> > >  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-log1p.c
> > >  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx.c
> > >  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx2.c
> > >  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx512f.c
> > >  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-log1pf.c
> > >
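(Not part of the patch, just for orientation: once the math-vector.h change
below advertises the SIMD variants, an ordinary scalar loop can be compiled
into calls to the new _ZGV*_log1p entry points.  A minimal sketch, assuming
GCC on x86-64 with a glibc that carries this series; the flags and the
vector width actually chosen are illustrative:

  /* gcc -O2 -ffast-math -fopenmp-simd -march=skylake-avx512 use.c -lmvec -lm
     may turn the loop below into calls such as _ZGVeN8v_log1p.  */
  #include <math.h>

  void
  map_log1p (const double *x, double *y, int n)
  {
  #pragma omp simd
    for (int i = 0; i < n; i++)
      y[i] = log1p (x[i]);
  }

-ffast-math matters here because math-vector.h only enables the SIMD
declarations under __FAST_MATH__.)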
> > > diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
> > > index 73252615ca..845246fab9 100644
> > > --- a/bits/libm-simd-decl-stubs.h
> > > +++ b/bits/libm-simd-decl-stubs.h
> > > @@ -241,4 +241,15 @@
> > >  #define __DECL_SIMD_log2f32x
> > >  #define __DECL_SIMD_log2f64x
> > >  #define __DECL_SIMD_log2f128x
> > > +
> > > +#define __DECL_SIMD_log1p
> > > +#define __DECL_SIMD_log1pf
> > > +#define __DECL_SIMD_log1pl
> > > +#define __DECL_SIMD_log1pf16
> > > +#define __DECL_SIMD_log1pf32
> > > +#define __DECL_SIMD_log1pf64
> > > +#define __DECL_SIMD_log1pf128
> > > +#define __DECL_SIMD_log1pf32x
> > > +#define __DECL_SIMD_log1pf64x
> > > +#define __DECL_SIMD_log1pf128x
> > >  #endif
> > > diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> > > index bfe52a4666..aa4bc61aa4 100644
> > > --- a/math/bits/mathcalls.h
> > > +++ b/math/bits/mathcalls.h
> > > @@ -119,7 +119,7 @@ __MATHCALL_VEC (exp10,, (_Mdouble_ __x));
> > >  __MATHCALL_VEC (expm1,, (_Mdouble_ __x));
> > >
> > >  /* Return log(1 + X).  */
> > > -__MATHCALL (log1p,, (_Mdouble_ __x));
> > > +__MATHCALL_VEC (log1p,, (_Mdouble_ __x));
> > >
> > >  /* Return the base 2 signed integral exponent of X.  */
> > >  __MATHCALL (logb,, (_Mdouble_ __x));
> > > diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> > > index fa8b016c5d..68b940606a 100644
> > > --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> > > +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> > > @@ -55,6 +55,7 @@ GLIBC_2.35 _ZGVbN2v_exp10 F
> > >  GLIBC_2.35 _ZGVbN2v_exp2 F
> > >  GLIBC_2.35 _ZGVbN2v_expm1 F
> > >  GLIBC_2.35 _ZGVbN2v_log10 F
> > > +GLIBC_2.35 _ZGVbN2v_log1p F
> > >  GLIBC_2.35 _ZGVbN2v_log2 F
> > >  GLIBC_2.35 _ZGVbN2v_sinh F
> > >  GLIBC_2.35 _ZGVbN2vv_atan2 F
> > > @@ -68,6 +69,7 @@ GLIBC_2.35 _ZGVbN4v_exp10f F
> > >  GLIBC_2.35 _ZGVbN4v_exp2f F
> > >  GLIBC_2.35 _ZGVbN4v_expm1f F
> > >  GLIBC_2.35 _ZGVbN4v_log10f F
> > > +GLIBC_2.35 _ZGVbN4v_log1pf F
> > >  GLIBC_2.35 _ZGVbN4v_log2f F
> > >  GLIBC_2.35 _ZGVbN4v_sinhf F
> > >  GLIBC_2.35 _ZGVbN4vv_atan2f F
> > > @@ -81,6 +83,7 @@ GLIBC_2.35 _ZGVcN4v_exp10 F
> > >  GLIBC_2.35 _ZGVcN4v_exp2 F
> > >  GLIBC_2.35 _ZGVcN4v_expm1 F
> > >  GLIBC_2.35 _ZGVcN4v_log10 F
> > > +GLIBC_2.35 _ZGVcN4v_log1p F
> > >  GLIBC_2.35 _ZGVcN4v_log2 F
> > >  GLIBC_2.35 _ZGVcN4v_sinh F
> > >  GLIBC_2.35 _ZGVcN4vv_atan2 F
> > > @@ -94,6 +97,7 @@ GLIBC_2.35 _ZGVcN8v_exp10f F
> > >  GLIBC_2.35 _ZGVcN8v_exp2f F
> > >  GLIBC_2.35 _ZGVcN8v_expm1f F
> > >  GLIBC_2.35 _ZGVcN8v_log10f F
> > > +GLIBC_2.35 _ZGVcN8v_log1pf F
> > >  GLIBC_2.35 _ZGVcN8v_log2f F
> > >  GLIBC_2.35 _ZGVcN8v_sinhf F
> > >  GLIBC_2.35 _ZGVcN8vv_atan2f F
> > > @@ -107,6 +111,7 @@ GLIBC_2.35 _ZGVdN4v_exp10 F
> > >  GLIBC_2.35 _ZGVdN4v_exp2 F
> > >  GLIBC_2.35 _ZGVdN4v_expm1 F
> > >  GLIBC_2.35 _ZGVdN4v_log10 F
> > > +GLIBC_2.35 _ZGVdN4v_log1p F
> > >  GLIBC_2.35 _ZGVdN4v_log2 F
> > >  GLIBC_2.35 _ZGVdN4v_sinh F
> > >  GLIBC_2.35 _ZGVdN4vv_atan2 F
> > > @@ -120,6 +125,7 @@ GLIBC_2.35 _ZGVdN8v_exp10f F
> > >  GLIBC_2.35 _ZGVdN8v_exp2f F
> > >  GLIBC_2.35 _ZGVdN8v_expm1f F
> > >  GLIBC_2.35 _ZGVdN8v_log10f F
> > > +GLIBC_2.35 _ZGVdN8v_log1pf F
> > >  GLIBC_2.35 _ZGVdN8v_log2f F
> > >  GLIBC_2.35 _ZGVdN8v_sinhf F
> > >  GLIBC_2.35 _ZGVdN8vv_atan2f F
> > > @@ -133,6 +139,7 @@ GLIBC_2.35 _ZGVeN16v_exp10f F
> > >  GLIBC_2.35 _ZGVeN16v_exp2f F
> > >  GLIBC_2.35 _ZGVeN16v_expm1f F
> > >  GLIBC_2.35 _ZGVeN16v_log10f F
> > > +GLIBC_2.35 _ZGVeN16v_log1pf F
> > >  GLIBC_2.35 _ZGVeN16v_log2f F
> > >  GLIBC_2.35 _ZGVeN16v_sinhf F
> > >  GLIBC_2.35 _ZGVeN16vv_atan2f F
> > > @@ -146,6 +153,7 @@ GLIBC_2.35 _ZGVeN8v_exp10 F
> > >  GLIBC_2.35 _ZGVeN8v_exp2 F
> > >  GLIBC_2.35 _ZGVeN8v_expm1 F
> > >  GLIBC_2.35 _ZGVeN8v_log10 F
> > > +GLIBC_2.35 _ZGVeN8v_log1p F
> > >  GLIBC_2.35 _ZGVeN8v_log2 F
> > >  GLIBC_2.35 _ZGVeN8v_sinh F
> > >  GLIBC_2.35 _ZGVeN8vv_atan2 F
> > > diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
> > > index 59d284a10a..14c9db3bb3 100644
> > > --- a/sysdeps/x86/fpu/bits/math-vector.h
> > > +++ b/sysdeps/x86/fpu/bits/math-vector.h
> > > @@ -110,6 +110,10 @@
> > >  #  define __DECL_SIMD_log2 __DECL_SIMD_x86_64
> > >  #  undef __DECL_SIMD_log2f
> > >  #  define __DECL_SIMD_log2f __DECL_SIMD_x86_64
> > > +#  undef __DECL_SIMD_log1p
> > > +#  define __DECL_SIMD_log1p __DECL_SIMD_x86_64
> > > +#  undef __DECL_SIMD_log1pf
> > > +#  define __DECL_SIMD_log1pf __DECL_SIMD_x86_64
> > >
> > >  # endif
> > >  #endif
> > > diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> > > index a2ca9a203f..3dca196432 100644
> > > --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> > > +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> > > @@ -54,6 +54,8 @@
> > >  !GCC$ builtin (log10f) attributes simd (notinbranch) if('x86_64')
> > >  !GCC$ builtin (log2) attributes simd (notinbranch) if('x86_64')
> > >  !GCC$ builtin (log2f) attributes simd (notinbranch) if('x86_64')
> > > +!GCC$ builtin (log1p) attributes simd (notinbranch) if('x86_64')
> > > +!GCC$ builtin (log1pf) attributes simd (notinbranch) if('x86_64')
> > >
> > >  !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
> > >  !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
> > > @@ -93,3 +95,5 @@
> > >  !GCC$ builtin (log10f) attributes simd (notinbranch) if('x32')
> > >  !GCC$ builtin (log2) attributes simd (notinbranch) if('x32')
> > >  !GCC$ builtin (log2f) attributes simd (notinbranch) if('x32')
> > > +!GCC$ builtin (log1p) attributes simd (notinbranch) if('x32')
> > > +!GCC$ builtin (log1pf) attributes simd (notinbranch) if('x32')
> > > diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> > > index 8d6d0915af..378cb06d37 100644
> > > --- a/sysdeps/x86_64/fpu/Makeconfig
> > > +++ b/sysdeps/x86_64/fpu/Makeconfig
> > > @@ -36,6 +36,7 @@ libmvec-funcs = \
> > >    hypot \
> > >    log \
> > >    log10 \
> > > +  log1p \
> > >    log2 \
> > >    pow \
> > >    sin \
> > > diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> > > index 1b48c2d642..155fb115f3 100644
> > > --- a/sysdeps/x86_64/fpu/Versions
> > > +++ b/sysdeps/x86_64/fpu/Versions
> > > @@ -23,6 +23,7 @@ libmvec {
> > >      _ZGVbN2v_exp2; _ZGVcN4v_exp2; _ZGVdN4v_exp2; _ZGVeN8v_exp2;
> > >      _ZGVbN2v_expm1; _ZGVcN4v_expm1; _ZGVdN4v_expm1; _ZGVeN8v_expm1;
> > >      _ZGVbN2v_log10; _ZGVcN4v_log10; _ZGVdN4v_log10; _ZGVeN8v_log10;
> > > +    _ZGVbN2v_log1p; _ZGVcN4v_log1p; _ZGVdN4v_log1p; _ZGVeN8v_log1p;
> > >      _ZGVbN2v_log2; _ZGVcN4v_log2; _ZGVdN4v_log2; _ZGVeN8v_log2;
> > >      _ZGVbN2v_sinh; _ZGVcN4v_sinh; _ZGVdN4v_sinh; _ZGVeN8v_sinh;
> > >      _ZGVbN2vv_atan2; _ZGVcN4vv_atan2; _ZGVdN4vv_atan2; _ZGVeN8vv_atan2;
> > > @@ -36,6 +37,7 @@ libmvec {
> > >      _ZGVbN4v_exp2f; _ZGVcN8v_exp2f; _ZGVdN8v_exp2f; _ZGVeN16v_exp2f;
> > >      _ZGVbN4v_expm1f; _ZGVcN8v_expm1f; _ZGVdN8v_expm1f; _ZGVeN16v_expm1f;
> > >      _ZGVbN4v_log10f; _ZGVcN8v_log10f; _ZGVdN8v_log10f; _ZGVeN16v_log10f;
> > > +    _ZGVbN4v_log1pf; _ZGVcN8v_log1pf; _ZGVdN8v_log1pf; _ZGVeN16v_log1pf;
> > >      _ZGVbN4v_log2f; _ZGVcN8v_log2f; _ZGVdN8v_log2f; _ZGVeN16v_log2f;
> > >      _ZGVbN4v_sinhf; _ZGVcN8v_sinhf; _ZGVdN8v_sinhf; _ZGVeN16v_sinhf;
> > >      _ZGVbN4vv_atan2f; _ZGVcN8vv_atan2f; _ZGVdN8vv_atan2f; _ZGVeN16vv_atan2f;
> > > diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> > > index 3b7f3cee6f..a2b15a795b 100644
> > > --- a/sysdeps/x86_64/fpu/libm-test-ulps
> > > +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> > > @@ -1685,6 +1685,26 @@ float: 2
> > >  float128: 2
> > >  ldouble: 3
> > >
> > > +Function: "log1p_vlen16":
> > > +float: 2
> > > +
> > > +Function: "log1p_vlen2":
> > > +double: 1
> > > +
> > > +Function: "log1p_vlen4":
> > > +double: 1
> > > +float: 2
> > > +
> > > +Function: "log1p_vlen4_avx2":
> > > +double: 1
> > > +
> > > +Function: "log1p_vlen8":
> > > +double: 1
> > > +float: 2
> > > +
> > > +Function: "log1p_vlen8_avx2":
> > > +float: 2
> > > +
> > >  Function: "log2":
> > >  double: 2
> > >  float: 1
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core-sse2.S
> > > new file mode 100644
> > > index 0000000000..8004088346
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core-sse2.S
> > > @@ -0,0 +1,20 @@
> > > +/* SSE2 version of vectorized log1p, vector length is 2.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#define _ZGVbN2v_log1p _ZGVbN2v_log1p_sse2
> > > +#include "../svml_d_log1p2_core.S"
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core.c
> > > new file mode 100644
> > > index 0000000000..35ca620aba
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core.c
> > > @@ -0,0 +1,27 @@
> > > +/* Multiple versions of vectorized log1p, vector length is 2.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#define SYMBOL_NAME _ZGVbN2v_log1p
> > > +#include "ifunc-mathvec-sse4_1.h"
> > > +
> > > +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> > > +
> > > +#ifdef SHARED
> > > +__hidden_ver1 (_ZGVbN2v_log1p, __GI__ZGVbN2v_log1p, __redirect__ZGVbN2v_log1p)
> > > +  __attribute__ ((visibility ("hidden")));
> > > +#endif
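(Aside, not part of the patch: the *_core.c files follow glibc's usual
libmvec IFUNC scheme -- ifunc-mathvec-sse4_1.h supplies a selector that
returns the _sse4 variant when SSE4.1 is usable and the _sse2 wrapper
otherwise, and libc_ifunc_redirected wires it up.  A standalone
illustration of the same idea using plain GCC ifunc syntax, not the actual
glibc macros; all names here are made up:

  #include <math.h>

  static double my_log1p_sse2 (double x) { return log1p (x); } /* stand-in */
  static double my_log1p_sse4 (double x) { return log1p (x); } /* stand-in */

  double (*resolve_my_log1p (void)) (double)
  {
    __builtin_cpu_init ();
    return __builtin_cpu_supports ("sse4.1")
           ? my_log1p_sse4 : my_log1p_sse2;
  }

  double my_log1p (double) __attribute__ ((ifunc ("resolve_my_log1p")));

The real selector reads glibc's own CPU feature data rather than the GCC
builtins.)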
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core_sse4.S
> > > new file mode 100644
> > > index 0000000000..9d3f0647b4
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p2_core_sse4.S
> > > @@ -0,0 +1,1398 @@
> > > +/* Function log1p vectorized with SSE4.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +/*
> > > + * ALGORITHM DESCRIPTION:
> > > + *
> > > + *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
> > > + *    Get short reciprocal approximation Rcp ~ 1/xh
> > > + *    R = (Rcp*xh - 1.0) + Rcp*xl
> > > + *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
> > > + *       log(Rcp) is tabulated
> > > + *
> > > + */
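(Illustration only, not the patch's code: a scalar C sketch of the
reduction described in the comment above.  The real kernel gets Rcp from
RCPPS rounded to 1+9 mantissa bits and takes log(Rcp) and poly(R) from the
data table below; here an exact reciprocal, libm's log for the table value
and a short Taylor polynomial stand in, and inputs are assumed finite with
x > -1:

  #include <math.h>

  static double
  log1p_sketch (double x)
  {
    /* 1+x = s + sl as a high/low pair.  */
    double s = 1.0 + x;
    double sl = (fabs (x) >= 1.0) ? ((x - s) + 1.0) : ((1.0 - s) + x);
    /* s = 2^k * xh with xh in [1,2); scale the low part the same way.  */
    int k;
    double xh = 2.0 * frexp (s, &k);
    k -= 1;
    double xl = ldexp (sl, -k);
    /* Reciprocal approximation and reduced argument R.  */
    double rcp = 1.0 / xh;
    double r = (rcp * xh - 1.0) + rcp * xl;
    /* log1p(x) = k*log(2) - log(rcp) + log(1+R); poly(R) ~ log(1+R).  */
    double p = r * (1.0 + r * (-0.5 + r * (1.0 / 3.0 - r * 0.25)));
    return k * 0.6931471805599453 - log (rcp) + p;
  }
)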
> > > +
> > > +/* Offsets for data table __svml_dlog1p_data_internal
> > > + */
> > > +#define Log_HA_table                         0
>
> Where is this used?

This field isn't used directly, but accessed via table lookup code.
A macro is defined for each field used, directly and indirectly.

>
> > > +#define Log_LA_table                         8208
> > > +#define poly_coeff                           12320
> > > +#define ExpMask                              12384
> > > +#define Two10                                12400
> > > +#define MinLog1p                             12416
> > > +#define MaxLog1p                             12432
> > > +#define One                                  12448
> > > +#define SgnMask                              12464
> > > +#define XThreshold                           12480
> > > +#define XhMask                               12496
> > > +#define Threshold                            12512
> > > +#define Bias                                 12528
> > > +#define Bias1                                12544
> > > +#define ExpMask0                             12560
> > > +#define ExpMask2                             12576
> > > +#define L2                                   12592
> > > +
> > > +/* Lookup bias for data table __svml_dlog1p_data_internal.  */
> > > +#define Table_Lookup_Bias               -0x405ff0
> > > +
> > > +#include <sysdep.h>
> > > +
> > > +        .text
> > > +     .section .text.sse4,"ax",@progbits
> > > +ENTRY(_ZGVbN2v_log1p_sse4)
> > > +        pushq     %rbp
> > > +        cfi_def_cfa_offset(16)
> > > +        movq      %rsp, %rbp
> > > +        cfi_def_cfa(6, 16)
> > > +        cfi_offset(6, -16)
> > > +        andq      $-32, %rsp
> > > +        subq      $64, %rsp
> > > +        movaps    %xmm0, %xmm7
> > > +
> > > +/* SgnMask used by all accuracies */
> > > +        movups    SgnMask+__svml_dlog1p_data_internal(%rip), %xmm6
> > > +        lea       Table_Lookup_Bias+__svml_dlog1p_data_internal(%rip), %rsi
> > > +        movaps    %xmm6, %xmm8
> > > +        movaps    %xmm7, %xmm15
> > > +        movups    One+__svml_dlog1p_data_internal(%rip), %xmm0
> > > +        andps     %xmm7, %xmm8
> > > +        cmpltpd   XThreshold+__svml_dlog1p_data_internal(%rip), %xmm8
> > > +        cmpnlepd  MaxLog1p+__svml_dlog1p_data_internal(%rip), %xmm15
> > > +        movaps    %xmm0, %xmm4
> > > +
> > > +/* compute 1+x as high, low parts */
> > > +        movaps    %xmm0, %xmm9
> > > +        addpd     %xmm7, %xmm4
> > > +        maxpd     %xmm7, %xmm9
> > > +        orps      XhMask+__svml_dlog1p_data_internal(%rip), %xmm8
> > > +        movaps    %xmm0, %xmm5
> > > +
> > > +/* preserve mantissa, set input exponent to 2^(-10) */
> > > +        movups    ExpMask+__svml_dlog1p_data_internal(%rip), %xmm3
> > > +        andps     %xmm8, %xmm4
> > > +        andps     %xmm4, %xmm3
> > > +
> > > +/* check range */
> > > +        movaps    %xmm7, %xmm8
> > > +        orps      Two10+__svml_dlog1p_data_internal(%rip), %xmm3
> > > +
> > > +/* Compute SignMask for all accuracies, including EP */
> > > +        andnps    %xmm7, %xmm6
> > > +
> > > +/* reciprocal approximation good to at least 11 bits */
> > > +        cvtpd2ps  %xmm3, %xmm10
> > > +        minpd     %xmm7, %xmm5
> > > +        subpd     %xmm4, %xmm9
> > > +        cmpltpd   MinLog1p+__svml_dlog1p_data_internal(%rip), %xmm8
> > > +        addpd     %xmm9, %xmm5
> > > +        movlhps   %xmm10, %xmm10
> > > +        orps      %xmm15, %xmm8
> > > +        rcpps     %xmm10, %xmm11
> > > +
> > > +/* combine and get argument value range mask */
> > > +        movmskpd  %xmm8, %edx
> > > +
> > > +/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
> > > +        movups    .FLT_16(%rip), %xmm13
> > > +
> > > +/* exponent of X needed to scale Xl */
> > > +        movdqu    ExpMask0+__svml_dlog1p_data_internal(%rip), %xmm12
> > > +        cvtps2pd  %xmm11, %xmm1
> > > +        addpd     %xmm13, %xmm1
> > > +        subpd     %xmm13, %xmm1
> > > +
> > > +/* 2^ (-10-exp(X) ) */
> > > +        movdqu    ExpMask2+__svml_dlog1p_data_internal(%rip), %xmm2
> > > +        pand      %xmm4, %xmm12
> > > +        psubq     %xmm12, %xmm2
> > > +        mulpd     %xmm1, %xmm3
> > > +
> > > +/* scale DblRcp */
> > > +        mulpd     %xmm1, %xmm2
> > > +        subpd     %xmm0, %xmm3
> > > +
> > > +/*
> > > + * argument reduction
> > > + * VQFMS( D, R, X, DblRcp1, One );
> > > + */
> > > +        mulpd     %xmm2, %xmm5
> > > +        addpd     %xmm5, %xmm3
> > > +
> > > +/* exponent*log(2.0) */
> > > +        movups    Threshold+__svml_dlog1p_data_internal(%rip), %xmm10
> > > +
> > > +/* exponent bits */
> > > +        psrlq     $20, %xmm4
> > > +        pshufd    $221, %xmm4, %xmm14
> > > +
> > > +/*
> > > + * prepare table index
> > > + * table lookup
> > > + */
> > > +        movaps    %xmm1, %xmm4
> > > +        cmpltpd   %xmm1, %xmm10
> > > +
> > > +/* biased exponent in DP format */
> > > +        cvtdq2pd  %xmm14, %xmm0
> > > +
> > > +/* polynomial */
> > > +        movups    poly_coeff+__svml_dlog1p_data_internal(%rip), %xmm1
> > > +        movaps    %xmm3, %xmm5
> > > +        mulpd     %xmm3, %xmm1
> > > +        mulpd     %xmm3, %xmm5
> > > +        addpd     poly_coeff+16+__svml_dlog1p_data_internal(%rip), %xmm1
> > > +        movups    poly_coeff+32+__svml_dlog1p_data_internal(%rip), %xmm2
> > > +        psrlq     $40, %xmm4
> > > +        mulpd     %xmm3, %xmm2
> > > +        mulpd     %xmm5, %xmm1
> > > +        addpd     poly_coeff+48+__svml_dlog1p_data_internal(%rip), %xmm2
> > > +        movd      %xmm4, %eax
> > > +        andps     Bias+__svml_dlog1p_data_internal(%rip), %xmm10
> > > +        addpd     %xmm1, %xmm2
> > > +
> > > +/* reconstruction */
> > > +        mulpd     %xmm2, %xmm5
> > > +        orps      Bias1+__svml_dlog1p_data_internal(%rip), %xmm10
> > > +        pshufd    $2, %xmm4, %xmm9
> > > +        subpd     %xmm10, %xmm0
> > > +        addpd     %xmm5, %xmm3
> > > +        movd      %xmm9, %ecx
> > > +        mulpd     L2+__svml_dlog1p_data_internal(%rip), %xmm0
> > > +        movslq    %eax, %rax
> > > +        movslq    %ecx, %rcx
> > > +        movsd     (%rsi,%rax), %xmm11
> > > +        movhpd    (%rsi,%rcx), %xmm11
> > > +        addpd     %xmm3, %xmm11
> > > +        addpd     %xmm11, %xmm0
> > > +
> > > +/* OR in the Sign of input argument to produce correct log1p(-0) */
> > > +        orps      %xmm6, %xmm0
> > > +        testl     %edx, %edx
> > > +
> > > +/* Go to special inputs processing branch */
> > > +        jne       L(SPECIAL_VALUES_BRANCH)
> > > +                                # LOE rbx r12 r13 r14 r15 edx xmm0 xmm7
> > > +
> > > +/* Restore registers
> > > + * and exit the function
> > > + */
> > > +
> > > +L(EXIT):
> > > +        movq      %rbp, %rsp
> > > +        popq      %rbp
> > > +        cfi_def_cfa(7, 8)
> > > +        cfi_restore(6)
> > > +        ret
> > > +        cfi_def_cfa(6, 16)
> > > +        cfi_offset(6, -16)
> > > +
> > > +/* Branch to process
> > > + * special inputs
> > > + */
> > > +
> > > +L(SPECIAL_VALUES_BRANCH):
> > > +        movups    %xmm7, 32(%rsp)
> > > +        movups    %xmm0, 48(%rsp)
> > > +                                # LOE rbx r12 r13 r14 r15 edx
> > > +
> > > +        xorl      %eax, %eax
> > > +        movq      %r12, 16(%rsp)
> > > +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
> > > +        movl      %eax, %r12d
> > > +        movq      %r13, 8(%rsp)
> > > +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
> > > +        movl      %edx, %r13d
> > > +        movq      %r14, (%rsp)
> > > +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
> > > +                                # LOE rbx r15 r12d r13d
> > > +
> > > +/* Range mask
> > > + * bits check
> > > + */
> > > +
> > > +L(RANGEMASK_CHECK):
> > > +        btl       %r12d, %r13d
> > > +
> > > +/* Call scalar math function */
> > > +        jc        L(SCALAR_MATH_CALL)
> > > +                                # LOE rbx r15 r12d r13d
> > > +
> > > +/* Special inputs
> > > + * processing loop
> > > + */
> > > +
> > > +L(SPECIAL_VALUES_LOOP):
> > > +        incl      %r12d
> > > +        cmpl      $2, %r12d
> > > +
> > > +/* Check bits in range mask */
> > > +        jl        L(RANGEMASK_CHECK)
> > > +                                # LOE rbx r15 r12d r13d
> > > +
> > > +        movq      16(%rsp), %r12
> > > +        cfi_restore(12)
> > > +        movq      8(%rsp), %r13
> > > +        cfi_restore(13)
> > > +        movq      (%rsp), %r14
> > > +        cfi_restore(14)
> > > +        movups    48(%rsp), %xmm0
> > > +
> > > +/* Go to exit */
> > > +        jmp       L(EXIT)
> > > +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -48; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xff, 0xff, 0xff, 0x22
> > > +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -56; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xff, 0xff, 0xff, 0x22
> > > +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -64; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x22
> > > +                                # LOE rbx r12 r13 r14 r15 xmm0
> > > +
> > > +/* Scalar math function call
> > > + * to process special input
> > > + */
> > > +
> > > +L(SCALAR_MATH_CALL):
> > > +        movl      %r12d, %r14d
> > > +        movsd     32(%rsp,%r14,8), %xmm0
> > > +        call      log1p@PLT
> > > +                                # LOE rbx r14 r15 r12d r13d xmm0
> > > +
> > > +        movsd     %xmm0, 48(%rsp,%r14,8)
> > > +
> > > +/* Process special inputs in loop */
> > > +        jmp       L(SPECIAL_VALUES_LOOP)
> > > +                                # LOE rbx r15 r12d r13d
> > > +END(_ZGVbN2v_log1p_sse4)
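(Aside, not part of the patch: the SPECIAL_VALUES_BRANCH /
RANGEMASK_CHECK / SCALAR_MATH_CALL block above is the usual libmvec
fallback -- the input lanes and the fast-path results are spilled to the
stack, and every lane whose bit is set in the range mask is recomputed with
the scalar libm routine, so special values get the standard scalar
semantics.  Roughly, in C:

  #include <math.h>

  /* 2-lane double version; mask bit i set => lane i was out of range.  */
  static void
  special_lanes (const double arg[2], double res[2], unsigned int mask)
  {
    for (int i = 0; i < 2; i++)
      if (mask & (1u << i))
        res[i] = log1p (arg[i]);   /* scalar call, as in SCALAR_MATH_CALL */
  }
)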
> > > +
> > > +        .section .rodata, "a"
> > > +        .align 16
> > > +
> > > +#ifdef __svml_dlog1p_data_internal_typedef
> > > +typedef unsigned int VUINT32;
> > > +typedef struct {
> > > +        __declspec(align(16)) VUINT32 Log_HA_table[(1<<10)+2][2];
> > > +        __declspec(align(16)) VUINT32 Log_LA_table[(1<<9)+1][2];
> > > +        __declspec(align(16)) VUINT32 poly_coeff[4][2][2];
> > > +        __declspec(align(16)) VUINT32 ExpMask[2][2];
> > > +        __declspec(align(16)) VUINT32 Two10[2][2];
> > > +        __declspec(align(16)) VUINT32 MinLog1p[2][2];
> > > +        __declspec(align(16)) VUINT32 MaxLog1p[2][2];
> > > +        __declspec(align(16)) VUINT32 One[2][2];
> > > +        __declspec(align(16)) VUINT32 SgnMask[2][2];
> > > +        __declspec(align(16)) VUINT32 XThreshold[2][2];
> > > +        __declspec(align(16)) VUINT32 XhMask[2][2];
> > > +        __declspec(align(16)) VUINT32 Threshold[2][2];
> > > +        __declspec(align(16)) VUINT32 Bias[2][2];
> > > +        __declspec(align(16)) VUINT32 Bias1[2][2];
> > > +        __declspec(align(16)) VUINT32 ExpMask0[2][2];
> > > +        __declspec(align(16)) VUINT32 ExpMask2[2][2];
> > > +        __declspec(align(16)) VUINT32 L2[2][2];
> > > +} __svml_dlog1p_data_internal;
> > > +#endif
> > > +__svml_dlog1p_data_internal:
> > > +        /* Log_HA_table */
> > > +        .quad 0xc086232bdd7a8300, 0xbe1ce91eef3fb100
> > > +        .quad 0xc086232fdc7ad828, 0xbe1cefcffda73b6a
> > > +        .quad 0xc0862333d97d2ba0, 0xbe1cef406748f1ff
> > > +        .quad 0xc0862337d48378e0, 0xbe1cef2a9429925a
> > > +        .quad 0xc086233bcd8fb878, 0xbe1cf138d17ebecb
> > > +        .quad 0xc086233fc4a3e018, 0xbe1ceff2dbbbb29e
> > > +        .quad 0xc0862343b9c1e270, 0xbe1cf1a42aae437b
> > > +        .quad 0xc0862347acebaf68, 0xbe1cef3b152048af
> > > +        .quad 0xc086234b9e2333f0, 0xbe1cef20e127805e
> > > +        .quad 0xc086234f8d6a5a30, 0xbe1cf00ad6052cf4
> > > +        .quad 0xc08623537ac30980, 0xbe1cefc4642ee597
> > > +        .quad 0xc0862357662f2660, 0xbe1cf1f277d36e16
> > > +        .quad 0xc086235b4fb092a0, 0xbe1ceed009e8d8e6
> > > +        .quad 0xc086235f37492d28, 0xbe1cf1e4038cb362
> > > +        .quad 0xc08623631cfad250, 0xbe1cf0b0873b8557
> > > +        .quad 0xc086236700c75b98, 0xbe1cf15bb3227c0b
> > > +        .quad 0xc086236ae2b09fe0, 0xbe1cf151ef8ca9ed
> > > +        .quad 0xc086236ec2b87358, 0xbe1cefe1dc2cd2ed
> > > +        .quad 0xc0862372a0e0a780, 0xbe1cf0d1eec5454f
> > > +        .quad 0xc08623767d2b0b48, 0xbe1ceeefd570bbce
> > > +        .quad 0xc086237a57996af0, 0xbe1cee99ae91b3a7
> > > +        .quad 0xc086237e302d9028, 0xbe1cf0412830fbd1
> > > +        .quad 0xc086238206e94218, 0xbe1ceee898588610
> > > +        .quad 0xc0862385dbce4548, 0xbe1cee9a1fbcaaea
> > > +        .quad 0xc0862389aede5bc0, 0xbe1ceed8e7cc1ad6
> > > +        .quad 0xc086238d801b4500, 0xbe1cf10c8d059da6
> > > +        .quad 0xc08623914f86be18, 0xbe1ceee6c63a8165
> > > +        .quad 0xc08623951d228180, 0xbe1cf0c3592d2ff1
> > > +        .quad 0xc0862398e8f04758, 0xbe1cf0026cc4cb1b
> > > +        .quad 0xc086239cb2f1c538, 0xbe1cf15d48d8e670
> > > +        .quad 0xc08623a07b28ae60, 0xbe1cef359363787c
> > > +        .quad 0xc08623a44196b390, 0xbe1cefdf1ab2e82c
> > > +        .quad 0xc08623a8063d8338, 0xbe1cefe43c02aa84
> > > +        .quad 0xc08623abc91ec960, 0xbe1cf044f5ae35b7
> > > +        .quad 0xc08623af8a3c2fb8, 0xbe1cf0b0b4001e1b
> > > +        .quad 0xc08623b349975d98, 0xbe1cf1bae76dfbcf
> > > +        .quad 0xc08623b70731f810, 0xbe1cef0a72e13a62
> > > +        .quad 0xc08623bac30da1c8, 0xbe1cf184007d2b6b
> > > +        .quad 0xc08623be7d2bfb40, 0xbe1cf16f4b239e98
> > > +        .quad 0xc08623c2358ea2a0, 0xbe1cf0976acada87
> > > +        .quad 0xc08623c5ec3733d0, 0xbe1cf066318a16ff
> > > +        .quad 0xc08623c9a1274880, 0xbe1ceffaa7148798
> > > +        .quad 0xc08623cd54607820, 0xbe1cf23ab02e9b6e
> > > +        .quad 0xc08623d105e45800, 0xbe1cefdfef7d4fde
> > > +        .quad 0xc08623d4b5b47b20, 0xbe1cf17fece44f2b
> > > +        .quad 0xc08623d863d27270, 0xbe1cf18f907d0d7c
> > > +        .quad 0xc08623dc103fccb0, 0xbe1cee61fe072c98
> > > +        .quad 0xc08623dfbafe1668, 0xbe1cf022dd891e2f
> > > +        .quad 0xc08623e3640eda20, 0xbe1ceecc1daf4358
> > > +        .quad 0xc08623e70b73a028, 0xbe1cf0173c4fa380
> > > +        .quad 0xc08623eab12deec8, 0xbe1cf16a2150c2f4
> > > +        .quad 0xc08623ee553f4a30, 0xbe1cf1bf980b1f4b
> > > +        .quad 0xc08623f1f7a93480, 0xbe1cef8b731663c2
> > > +        .quad 0xc08623f5986d2dc0, 0xbe1cee9a664d7ef4
> > > +        .quad 0xc08623f9378cb3f0, 0xbe1cf1eda2af6400
> > > +        .quad 0xc08623fcd5094320, 0xbe1cf1923f9d68d7
> > > +        .quad 0xc086240070e45548, 0xbe1cf0747cd3e03a
> > > +        .quad 0xc08624040b1f6260, 0xbe1cf22ee855bd6d
> > > +        .quad 0xc0862407a3bbe078, 0xbe1cf0d57360c00b
> > > +        .quad 0xc086240b3abb4398, 0xbe1ceebc815cd575
> > > +        .quad 0xc086240ed01efdd0, 0xbe1cf03bfb970951
> > > +        .quad 0xc086241263e87f50, 0xbe1cf16e74768529
> > > +        .quad 0xc0862415f6193658, 0xbe1cefec64b8becb
> > > +        .quad 0xc086241986b28f30, 0xbe1cf0838d210baa
> > > +        .quad 0xc086241d15b5f448, 0xbe1cf0ea86e75b11
> > > +        .quad 0xc0862420a324ce28, 0xbe1cf1708d11d805
> > > +        .quad 0xc08624242f008380, 0xbe1ceea988c5a417
> > > +        .quad 0xc0862427b94a7910, 0xbe1cef166a7bbca5
> > > +        .quad 0xc086242b420411d0, 0xbe1cf0c9d9e86a38
> > > +        .quad 0xc086242ec92eaee8, 0xbe1cef0946455411
> > > +        .quad 0xc08624324ecbaf98, 0xbe1cefea60907739
> > > +        .quad 0xc0862435d2dc7160, 0xbe1cf1ed0934ce42
> > > +        .quad 0xc086243955624ff8, 0xbe1cf191ba746c7d
> > > +        .quad 0xc086243cd65ea548, 0xbe1ceeec78cf2a7e
> > > +        .quad 0xc086244055d2c968, 0xbe1cef345284c119
> > > +        .quad 0xc0862443d3c012b8, 0xbe1cf24f77355219
> > > +        .quad 0xc08624475027d5e8, 0xbe1cf05bf087e114
> > > +        .quad 0xc086244acb0b65d0, 0xbe1cef3504a32189
> > > +        .quad 0xc086244e446c1398, 0xbe1ceff54b2a406f
> > > +        .quad 0xc0862451bc4b2eb8, 0xbe1cf0757d54ed4f
> > > +        .quad 0xc086245532aa04f0, 0xbe1cf0c8099fdfd5
> > > +        .quad 0xc0862458a789e250, 0xbe1cf0b173796a31
> > > +        .quad 0xc086245c1aec1138, 0xbe1cf11d8734540d
> > > +        .quad 0xc086245f8cd1da60, 0xbe1cf1916a723ceb
> > > +        .quad 0xc0862462fd3c84d8, 0xbe1cf19a911e1da7
> > > +        .quad 0xc08624666c2d5608, 0xbe1cf23a9ef72e4f
> > > +        .quad 0xc0862469d9a591c0, 0xbe1cef503d947663
> > > +        .quad 0xc086246d45a67a18, 0xbe1cf0fceeb1a0b2
> > > +        .quad 0xc0862470b0314fa8, 0xbe1cf107e27e4fbc
> > > +        .quad 0xc086247419475160, 0xbe1cf03dd9922331
> > > +        .quad 0xc086247780e9bc98, 0xbe1cefce1a10e129
> > > +        .quad 0xc086247ae719cd18, 0xbe1ceea47f73c4f6
> > > +        .quad 0xc086247e4bd8bd10, 0xbe1ceec0ac56d100
> > > +        .quad 0xc0862481af27c528, 0xbe1cee8a6593278a
> > > +        .quad 0xc086248511081c70, 0xbe1cf2231dd9dec7
> > > +        .quad 0xc0862488717af888, 0xbe1cf0b4b8ed7da8
> > > +        .quad 0xc086248bd0818d68, 0xbe1cf1bd8d835002
> > > +        .quad 0xc086248f2e1d0d98, 0xbe1cf259acc107f4
> > > +        .quad 0xc08624928a4eaa20, 0xbe1cee897636b00c
> > > +        .quad 0xc0862495e5179270, 0xbe1cee757f20c326
> > > +        .quad 0xc08624993e78f490, 0xbe1cefafd3aa54a4
> > > +        .quad 0xc086249c9673fd10, 0xbe1cee7298d38b97
> > > +        .quad 0xc086249fed09d6f8, 0xbe1ceedc158d4ceb
> > > +        .quad 0xc08624a3423babe0, 0xbe1cf2282987cb2e
> > > +        .quad 0xc08624a6960aa400, 0xbe1cefe7381ecc4b
> > > +        .quad 0xc08624a9e877e600, 0xbe1cef328dbbce80
> > > +        .quad 0xc08624ad39849728, 0xbe1cefde45f3cc71
> > > +        .quad 0xc08624b08931db58, 0xbe1cefa8b89433b9
> > > +        .quad 0xc08624b3d780d500, 0xbe1cef6773c0b139
> > > +        .quad 0xc08624b72472a528, 0xbe1cf031c931c11f
> > > +        .quad 0xc08624ba70086b78, 0xbe1cf088f49275e7
> > > +        .quad 0xc08624bdba434630, 0xbe1cf17de0eaa86d
> > > +        .quad 0xc08624c103245238, 0xbe1cefd492f1ba75
> > > +        .quad 0xc08624c44aacab08, 0xbe1cf1253e154466
> > > +        .quad 0xc08624c790dd6ad0, 0xbe1cf0fb09ee6d55
> > > +        .quad 0xc08624cad5b7aa58, 0xbe1cf1f08dd048fe
> > > +        .quad 0xc08624ce193c8120, 0xbe1ceeca0809697f
> > > +        .quad 0xc08624d15b6d0538, 0xbe1cef8d5662d968
> > > +        .quad 0xc08624d49c4a4b78, 0xbe1cee97b556ed78
> > > +        .quad 0xc08624d7dbd56750, 0xbe1cf1b14b6acb75
> > > +        .quad 0xc08624db1a0f6b00, 0xbe1cef1e860623f2
> > > +        .quad 0xc08624de56f96758, 0xbe1ceeaf4d156f3d
> > > +        .quad 0xc08624e192946bf0, 0xbe1ceecc12b400ed
> > > +        .quad 0xc08624e4cce18710, 0xbe1cf180c40c794f
> > > +        .quad 0xc08624e805e1c5c8, 0xbe1cf185a08f7f65
> > > +        .quad 0xc08624eb3d9633d8, 0xbe1cef45fc924078
> > > +        .quad 0xc08624ee73ffdbb0, 0xbe1cf1e4f457f32a
> > > +        .quad 0xc08624f1a91fc6a0, 0xbe1cf040147b8a5a
> > > +        .quad 0xc08624f4dcf6fc98, 0xbe1cf1effca0dfb2
> > > +        .quad 0xc08624f80f868468, 0xbe1cf0470146e5bc
> > > +        .quad 0xc08624fb40cf6390, 0xbe1cef4dd186e501
> > > +        .quad 0xc08624fe70d29e60, 0xbe1ceebe257f66c7
> > > +        .quad 0xc08625019f9137f0, 0xbe1ceefb7a1c395c
> > > +        .quad 0xc0862504cd0c3220, 0xbe1cf209dedfed8c
> > > +        .quad 0xc0862507f9448db0, 0xbe1cf082da464994
> > > +        .quad 0xc086250b243b4a18, 0xbe1cee88694a73cf
> > > +        .quad 0xc086250e4df165a0, 0xbe1cf0b61e8f0531
> > > +        .quad 0xc08625117667dd78, 0xbe1cf1106599c962
> > > +        .quad 0xc08625149d9fad98, 0xbe1ceff1ee88af1f
> > > +        .quad 0xc0862517c399d0c8, 0xbe1cf0f746994ef6
> > > +        .quad 0xc086251ae85740b8, 0xbe1cefe8a1d077e4
> > > +        .quad 0xc086251e0bd8f5e0, 0xbe1cf1a1da036092
> > > +        .quad 0xc08625212e1fe7a8, 0xbe1cf0f8a7786fcd
> > > +        .quad 0xc08625244f2d0c48, 0xbe1cefa1174a07a7
> > > +        .quad 0xc08625276f0158d8, 0xbe1cef1043aa5b25
> > > +        .quad 0xc086252a8d9dc150, 0xbe1cf15d521c169d
> > > +        .quad 0xc086252dab033898, 0xbe1cf220bba8861f
> > > +        .quad 0xc0862530c732b078, 0xbe1cef51e310eae2
> > > +        .quad 0xc0862533e22d1988, 0xbe1cf222fcedd8ae
> > > +        .quad 0xc0862536fbf36370, 0xbe1cefdb4da4bda8
> > > +        .quad 0xc086253a14867ca0, 0xbe1ceeafc1112171
> > > +        .quad 0xc086253d2be75280, 0xbe1cee99dfb4b408
> > > +        .quad 0xc08625404216d160, 0xbe1cf22d2536f06b
> > > +        .quad 0xc08625435715e498, 0xbe1cef6abbf2e268
> > > +        .quad 0xc08625466ae57648, 0xbe1cf093a14789f5
> > > +        .quad 0xc08625497d866fa0, 0xbe1cf0f93655603c
> > > +        .quad 0xc086254c8ef9b8b8, 0xbe1cf1cc40c9aafc
> > > +        .quad 0xc086254f9f4038a8, 0xbe1ceeea5f4e9157
> > > +        .quad 0xc0862552ae5ad568, 0xbe1cefa9f52d4997
> > > +        .quad 0xc0862555bc4a7400, 0xbe1cefa490a638ff
> > > +        .quad 0xc0862558c90ff868, 0xbe1cef7fcf797d6f
> > > +        .quad 0xc086255bd4ac4590, 0xbe1cf1b4c51113c9
> > > +        .quad 0xc086255edf203d78, 0xbe1cef55e5b4a55d
> > > +        .quad 0xc0862561e86cc100, 0xbe1cf0d37a25f9dc
> > > +        .quad 0xc0862564f092b028, 0xbe1ceebe9efc19d9
> > > +        .quad 0xc0862567f792e9d8, 0xbe1cee8ad30a57b5
> > > +        .quad 0xc086256afd6e4c08, 0xbe1cef4e1817b90b
> > > +        .quad 0xc086256e0225b3b8, 0xbe1cee7fa9229996
> > > +        .quad 0xc086257105b9fce0, 0xbe1cf0b54963d945
> > > +        .quad 0xc0862574082c0298, 0xbe1cee5f2f3c7995
> > > +        .quad 0xc0862577097c9ee0, 0xbe1cf0828e303a2c
> > > +        .quad 0xc086257a09acaae0, 0xbe1cf172c3078947
> > > +        .quad 0xc086257d08bcfec0, 0xbe1cf189252afa22
> > > +        .quad 0xc086258006ae71b8, 0xbe1cefdb80426923
> > > +        .quad 0xc08625830381da08, 0xbe1ceef1391a0372
> > > +        .quad 0xc0862585ff380d00, 0xbe1cf17720c78d13
> > > +        .quad 0xc0862588f9d1df18, 0xbe1ceef1f9027d83
> > > +        .quad 0xc086258bf35023b8, 0xbe1cf06fac99dec9
> > > +        .quad 0xc086258eebb3ad78, 0xbe1cf1373eeb45c0
> > > +        .quad 0xc0862591e2fd4e00, 0xbe1cef777536bb81
> > > +        .quad 0xc0862594d92dd600, 0xbe1cf0f43ca40766
> > > +        .quad 0xc0862597ce461558, 0xbe1cefb2cfc6766b
> > > +        .quad 0xc086259ac246daf0, 0xbe1ceea49e64ffa2
> > > +        .quad 0xc086259db530f4c8, 0xbe1cf250fa457dec
> > > +        .quad 0xc08625a0a7053018, 0xbe1cf17d8bb2a44e
> > > +        .quad 0xc08625a397c45918, 0xbe1cf1d5906d54b7
> > > +        .quad 0xc08625a6876f3b30, 0xbe1cf08fe7b31780
> > > +        .quad 0xc08625a97606a0e0, 0xbe1cef13edfc9d11
> > > +        .quad 0xc08625ac638b53c8, 0xbe1cef9d2b107219
> > > +        .quad 0xc08625af4ffe1cb0, 0xbe1cf1ddd4ff6160
> > > +        .quad 0xc08625b23b5fc390, 0xbe1cefa02a996495
> > > +        .quad 0xc08625b525b10f68, 0xbe1cf166a7e37ee5
> > > +        .quad 0xc08625b80ef2c680, 0xbe1cef0b171068a5
> > > +        .quad 0xc08625baf725ae28, 0xbe1cf05c80779283
> > > +        .quad 0xc08625bdde4a8af0, 0xbe1cf1bbfbffb889
> > > +        .quad 0xc08625c0c4622090, 0xbe1cf0b8666c0124
> > > +        .quad 0xc08625c3a96d31e0, 0xbe1cf0a8fcf47a86
> > > +        .quad 0xc08625c68d6c80f0, 0xbe1cef46e18cb092
> > > +        .quad 0xc08625c97060cef0, 0xbe1cf1458a350efb
> > > +        .quad 0xc08625cc524adc58, 0xbe1ceeea1dadce12
> > > +        .quad 0xc08625cf332b68b0, 0xbe1cf0a1bfdc44c7
> > > +        .quad 0xc08625d2130332d0, 0xbe1cef96d02da73e
> > > +        .quad 0xc08625d4f1d2f8a8, 0xbe1cf2451c3c7701
> > > +        .quad 0xc08625d7cf9b7778, 0xbe1cf10d08f83812
> > > +        .quad 0xc08625daac5d6ba0, 0xbe1ceec5b4895c5e
> > > +        .quad 0xc08625dd881990b0, 0xbe1cf14e1325c5e4
> > > +        .quad 0xc08625e062d0a188, 0xbe1cf21d0904be12
> > > +        .quad 0xc08625e33c835838, 0xbe1ceed0839bcf21
> > > +        .quad 0xc08625e615326df0, 0xbe1cf1bb944889d2
> > > +        .quad 0xc08625e8ecde9b48, 0xbe1cee738e85eece
> > > +        .quad 0xc08625ebc38897e0, 0xbe1cf25c2bc6ef12
> > > +        .quad 0xc08625ee99311ac8, 0xbe1cf132b70a41ad
> > > +        .quad 0xc08625f16dd8da28, 0xbe1cf1984236a6e3
> > > +        .quad 0xc08625f441808b78, 0xbe1cf19ae74998f9
> > > +        .quad 0xc08625f71428e370, 0xbe1cef3e175d61a1
> > > +        .quad 0xc08625f9e5d295f8, 0xbe1cf101f9868fd9
> > > +        .quad 0xc08625fcb67e5658, 0xbe1cee69db83dcd2
> > > +        .quad 0xc08625ff862cd6f8, 0xbe1cf081b636af51
> > > +        .quad 0xc086260254dec9a8, 0xbe1cee62c7d59b3e
> > > +        .quad 0xc08626052294df58, 0xbe1cf1b745c57716
> > > +        .quad 0xc0862607ef4fc868, 0xbe1cef3d2800ea23
> > > +        .quad 0xc086260abb103458, 0xbe1cef480ff1acd2
> > > +        .quad 0xc086260d85d6d200, 0xbe1cf2424c9a17ef
> > > +        .quad 0xc08626104fa44f90, 0xbe1cf12cfde90fd5
> > > +        .quad 0xc086261318795a68, 0xbe1cf21f590dd5b6
> > > +        .quad 0xc0862615e0569f48, 0xbe1cf0c50f9cd28a
> > > +        .quad 0xc0862618a73cca30, 0xbe1ceedbdb520545
> > > +        .quad 0xc086261b6d2c8668, 0xbe1cf0b030396011
> > > +        .quad 0xc086261e32267e98, 0xbe1cf19917010e96
> > > +        .quad 0xc0862620f62b5cb0, 0xbe1cf07331355985
> > > +        .quad 0xc0862623b93bc9e8, 0xbe1cf01ae921a1c3
> > > +        .quad 0xc08626267b586ed0, 0xbe1cefe5cf0dbf0c
> > > +        .quad 0xc08626293c81f348, 0xbe1cf01b258aeb50
> > > +        .quad 0xc086262bfcb8fe88, 0xbe1cee6b9e7f4c68
> > > +        .quad 0xc086262ebbfe3710, 0xbe1cee684a9b21c9
> > > +        .quad 0xc08626317a5242b8, 0xbe1cf1f8bcde9a8b
> > > +        .quad 0xc086263437b5c6c0, 0xbe1cf1d063d36238
> > > +        .quad 0xc0862636f42967a8, 0xbe1cf1e31a19075e
> > > +        .quad 0xc0862639afadc950, 0xbe1cf1d8efdf7e7d
> > > +        .quad 0xc086263c6a438ef0, 0xbe1cf1812ee72dba
> > > +        .quad 0xc086263f23eb5b18, 0xbe1cf1449a9a2279
> > > +        .quad 0xc0862641dca5cfb8, 0xbe1cee96edce5085
> > > +        .quad 0xc086264494738e08, 0xbe1cf06797bd03b2
> > > +        .quad 0xc08626474b5536b8, 0xbe1cef91b9b7ffc1
> > > +        .quad 0xc086264a014b69c0, 0xbe1cef4b6721278f
> > > +        .quad 0xc086264cb656c678, 0xbe1cf1942925eb4a
> > > +        .quad 0xc086264f6a77eba8, 0xbe1cefa2c7bc2e39
> > > +        .quad 0xc08626521daf7758, 0xbe1cf252595aceb3
> > > +        .quad 0xc0862654cffe0718, 0xbe1cee8e9ae47ec2
> > > +        .quad 0xc0862657816437a8, 0xbe1cf1bf913828fa
> > > +        .quad 0xc086265a31e2a558, 0xbe1cf23475d6b366
> > > +        .quad 0xc086265ce179ebc8, 0xbe1cef8df00a922b
> > > +        .quad 0xc086265f902aa5f0, 0xbe1cef279bfa43e0
> > > +        .quad 0xc08626623df56e38, 0xbe1cf080e10b8365
> > > +        .quad 0xc0862664eadade70, 0xbe1cf1a518f9b544
> > > +        .quad 0xc086266796db8fd0, 0xbe1cef9308fed9e9
> > > +        .quad 0xc086266a41f81ae8, 0xbe1ceea3ae6b19c9
> > > +        .quad 0xc086266cec3117b8, 0xbe1ceef06003d4c2
> > > +        .quad 0xc086266f95871da8, 0xbe1cf0b8457ffb0c
> > > +        .quad 0xc08626723dfac390, 0xbe1cf0c526745ad6
> > > +        .quad 0xc0862674e58c9fa8, 0xbe1cf0cf91ff7b5d
> > > +        .quad 0xc08626778c3d4798, 0xbe1cefe260819380
> > > +        .quad 0xc086267a320d5070, 0xbe1ceebd90aa27a3
> > > +        .quad 0xc086267cd6fd4ea8, 0xbe1cf0388121dffa
> > > +        .quad 0xc086267f7b0dd630, 0xbe1cf1a3881435f1
> > > +        .quad 0xc08626821e3f7a68, 0xbe1cef28e9d9ac52
> > > +        .quad 0xc0862684c092ce08, 0xbe1cf02d300062dd
> > > +        .quad 0xc086268762086350, 0xbe1cefaee1edfa35
> > > +        .quad 0xc086268a02a0cbe0, 0xbe1cf0a5a052e936
> > > +        .quad 0xc086268ca25c98d8, 0xbe1cee60a4a497ed
> > > +        .quad 0xc086268f413c5ab0, 0xbe1cf0e4a5d0cf49
> > > +        .quad 0xc0862691df40a170, 0xbe1cf149235a4e6e
> > > +        .quad 0xc08626947c69fc80, 0xbe1cf215180b9fcc
> > > +        .quad 0xc086269718b8fac8, 0xbe1cef9b156a9840
> > > +        .quad 0xc0862699b42e2a90, 0xbe1cf054c91441be
> > > +        .quad 0xc086269c4eca19a8, 0xbe1cf13ded26512c
> > > +        .quad 0xc086269ee88d5550, 0xbe1cf22ea4d8ac06
> > > +        .quad 0xc08626a181786a40, 0xbe1cf2354666ee2e
> > > +        .quad 0xc08626a4198be4a8, 0xbe1cefef936752b3
> > > +        .quad 0xc08626a6b0c85020, 0xbe1cf1e360a9db68
> > > +        .quad 0xc08626a9472e37d8, 0xbe1ceed6aeb812c5
> > > +        .quad 0xc08626abdcbe2650, 0xbe1cf227340b4986
> > > +        .quad 0xc08626ae7178a5b0, 0xbe1cf0215a0cbe0d
> > > +        .quad 0xc08626b1055e3f70, 0xbe1cf256adf0ae26
> > > +        .quad 0xc08626b3986f7ca8, 0xbe1ceff3c67aed06
> > > +        .quad 0xc08626b62aace5c8, 0xbe1cf2159fb93652
> > > +        .quad 0xc08626b8bc1702e0, 0xbe1cf01e6dbd1c7f
> > > +        .quad 0xc08626bb4cae5b60, 0xbe1cf009e75d1c0c
> > > +        .quad 0xc08626bddc737648, 0xbe1ceec10a020e73
> > > +        .quad 0xc08626c06b66da08, 0xbe1cf06d5783eee7
> > > +        .quad 0xc08626c2f9890ca0, 0xbe1cf0cb8f169ffe
> > > +        .quad 0xc08626c586da9388, 0xbe1cef7de2452430
> > > +        .quad 0xc08626c8135bf3b0, 0xbe1cf05da6f783ae
> > > +        .quad 0xc08626ca9f0db198, 0xbe1cefcc877d681d
> > > +        .quad 0xc08626cd29f05138, 0xbe1cef0531954ab3
> > > +        .quad 0xc08626cfb4045608, 0xbe1cf06b8565ea3d
> > > +        .quad 0xc08626d23d4a4310, 0xbe1cefdc455d9d7e
> > > +        .quad 0xc08626d4c5c29ad0, 0xbe1ceefc47e8fa64
> > > +        .quad 0xc08626d74d6ddf48, 0xbe1cf1872bf033f2
> > > +        .quad 0xc08626d9d44c9210, 0xbe1cf19d91087f9d
> > > +        .quad 0xc08626dc5a5f3438, 0xbe1cf012d444c6ab
> > > +        .quad 0xc08626dedfa64650, 0xbe1cf0ba528ee153
> > > +        .quad 0xc08626e164224880, 0xbe1ceeb431709788
> > > +        .quad 0xc08626e3e7d3ba60, 0xbe1cf0b9af31a6a5
> > > +        .quad 0xc08626e66abb1b28, 0xbe1cf168fb2e135b
> > > +        .quad 0xc08626e8ecd8e990, 0xbe1cef9097461c93
> > > +        .quad 0xc08626eb6e2da3d0, 0xbe1cee7a434735d8
> > > +        .quad 0xc08626edeeb9c7a8, 0xbe1cf235732b86f2
> > > +        .quad 0xc08626f06e7dd280, 0xbe1cefe1510b89e6
> > > +        .quad 0xc08626f2ed7a4120, 0xbe1cf1f64b9b80ef
> > > +        .quad 0xc08626f56baf9000, 0xbe1cf08f320ca339
> > > +        .quad 0xc08626f7e91e3b08, 0xbe1cf1b1de2808a1
> > > +        .quad 0xc08626fa65c6bdc0, 0xbe1cf1976d778b28
> > > +        .quad 0xc08626fce1a99338, 0xbe1ceef40a4f076f
> > > +        .quad 0xc08626ff5cc73600, 0xbe1cef3e45869ce3
> > > +        .quad 0xc0862701d7202048, 0xbe1ceef601b4c9d6
> > > +        .quad 0xc086270450b4cbc0, 0xbe1cf1eaf0b57fd6
> > > +        .quad 0xc0862706c985b1c0, 0xbe1cef82a44990f3
> > > +        .quad 0xc086270941934b10, 0xbe1ceefe32981f2c
> > > +        .quad 0xc086270bb8de1018, 0xbe1cefbf6f5a0445
> > > +        .quad 0xc086270e2f6678d0, 0xbe1cf18dba75792c
> > > +        .quad 0xc0862710a52cfcc8, 0xbe1cf0da64ce995f
> > > +        .quad 0xc08627131a321318, 0xbe1cef04ac0fb802
> > > +        .quad 0xc08627158e763268, 0xbe1cee9d4e2ad9bd
> > > +        .quad 0xc086271801f9d0f8, 0xbe1cefa9b55407b5
> > > +        .quad 0xc086271a74bd64a0, 0xbe1cefe6bd329570
> > > +        .quad 0xc086271ce6c162c8, 0xbe1cef0b1205dc85
> > > +        .quad 0xc086271f58064068, 0xbe1cef092a785e3f
> > > +        .quad 0xc0862721c88c7210, 0xbe1cf050dcdaac30
> > > +        .quad 0xc086272438546be8, 0xbe1cf210907ded8b
> > > +        .quad 0xc0862726a75ea1b8, 0xbe1cee760be44f99
> > > +        .quad 0xc086272915ab86c0, 0xbe1ceeeee07c2bcc
> > > +        .quad 0xc086272b833b8df0, 0xbe1cf06874992df5
> > > +        .quad 0xc086272df00f29d0, 0xbe1cef8fac5d4899
> > > +        .quad 0xc08627305c26cc70, 0xbe1cf1103241cc99
> > > +        .quad 0xc0862732c782e788, 0xbe1cf1d35fef83fe
> > > +        .quad 0xc08627353223ec68, 0xbe1cef3ec8133e1d
> > > +        .quad 0xc08627379c0a4be8, 0xbe1cef7261daccd8
> > > +        .quad 0xc086273a05367688, 0xbe1cf18656c50806
> > > +        .quad 0xc086273c6da8dc68, 0xbe1cf1c8736e049a
> > > +        .quad 0xc086273ed561ed38, 0xbe1cf1f93bff4911
> > > +        .quad 0xc08627413c621848, 0xbe1cf188a4ea680c
> > > +        .quad 0xc0862743a2a9cc80, 0xbe1cf1d270930c80
> > > +        .quad 0xc086274608397868, 0xbe1cf25a328c28e2
> > > +        .quad 0xc08627486d118a28, 0xbe1cf106f90aa3b8
> > > +        .quad 0xc086274ad1326f80, 0xbe1cee5e9d2e885a
> > > +        .quad 0xc086274d349c95c0, 0xbe1cf1c0bac27228
> > > +        .quad 0xc086274f975069f8, 0xbe1cf1a1500f9b1c
> > > +        .quad 0xc0862751f94e58c0, 0xbe1cefc30663ac44
> > > +        .quad 0xc08627545a96ce48, 0xbe1cf17123e427a2
> > > +        .quad 0xc0862756bb2a3678, 0xbe1cefb92749fea4
> > > +        .quad 0xc08627591b08fcc0, 0xbe1cefa40e1ea74a
> > > +        .quad 0xc086275b7a338c40, 0xbe1cee6f4612c3e9
> > > +        .quad 0xc086275dd8aa4fa8, 0xbe1cf1c54a053627
> > > +        .quad 0xc0862760366db168, 0xbe1ceff5eb503d9e
> > > +        .quad 0xc0862762937e1b70, 0xbe1cf02e47f10cee
> > > +        .quad 0xc0862764efdbf768, 0xbe1ceeb06e1d0dad
> > > +        .quad 0xc08627674b87ae88, 0xbe1cf10aadd6dba5
> > > +        .quad 0xc0862769a681a9c0, 0xbe1cf24e9913d30f
> > > +        .quad 0xc086276c00ca51a0, 0xbe1cef47b301e312
> > > +        .quad 0xc086276e5a620e48, 0xbe1ceeb1cefc2e85
> > > +        .quad 0xc0862770b3494788, 0xbe1cf16f1fbbe011
> > > +        .quad 0xc08627730b8064e8, 0xbe1ceebdf75174c7
> > > +        .quad 0xc08627756307cd70, 0xbe1cf06e3871a0da
> > > +        .quad 0xc0862777b9dfe7f0, 0xbe1cef16799fd554
> > > +        .quad 0xc086277a10091ac0, 0xbe1cf248dabf5377
> > > +        .quad 0xc086277c6583cc00, 0xbe1cf0c78d92a2cd
> > > +        .quad 0xc086277eba506158, 0xbe1cf0b911b029f0
> > > +        .quad 0xc08627810e6f4028, 0xbe1cefdc24719766
> > > +        .quad 0xc086278361e0cd70, 0xbe1cefbb6562b7e7
> > > +        .quad 0xc0862785b4a56dd8, 0xbe1cf1e0afb349ec
> > > +        .quad 0xc086278806bd85c0, 0xbe1cf008292e52fc
> > > +        .quad 0xc086278a58297918, 0xbe1cf053073872bf
> > > +        .quad 0xc086278ca8e9ab88, 0xbe1cf17a0a55a947
> > > +        .quad 0xc086278ef8fe8068, 0xbe1ceeffb0b60234
> > > +        .quad 0xc086279148685aa0, 0xbe1cf162204794a8
> > > +        .quad 0xc086279397279ce0, 0xbe1cf24cc8cb48ac
> > > +        .quad 0xc0862795e53ca978, 0xbe1cf0c9be68d5c3
> > > +        .quad 0xc086279832a7e258, 0xbe1cf172cd3d7388
> > > +        .quad 0xc086279a7f69a930, 0xbe1ceea2465fbce5
> > > +        .quad 0xc086279ccb825f40, 0xbe1cf0a386d2500f
> > > +        .quad 0xc086279f16f26590, 0xbe1cf1e338ddc18a
> > > +        .quad 0xc08627a161ba1cd0, 0xbe1cef1f5049867f
> > > +        .quad 0xc08627a3abd9e548, 0xbe1cef96c1ea8b1f
> > > +        .quad 0xc08627a5f5521f00, 0xbe1cf138f6fd3c26
> > > +        .quad 0xc08627a83e2329b0, 0xbe1cf0d4fcbfdf3a
> > > +        .quad 0xc08627aa864d64b0, 0xbe1cf24870c12c81
> > > +        .quad 0xc08627accdd12f18, 0xbe1cf0ae2a56348d
> > > +        .quad 0xc08627af14aee7a0, 0xbe1cee8ca1a9b893
> > > +        .quad 0xc08627b15ae6eca8, 0xbe1cf20414d637b0
> > > +        .quad 0xc08627b3a0799c60, 0xbe1cf0fc6b7b12d8
> > > +        .quad 0xc08627b5e5675488, 0xbe1cf152d93c4a00
> > > +        .quad 0xc08627b829b072a0, 0xbe1cf1073f9b77c2
> > > +        .quad 0xc08627ba6d5553d8, 0xbe1cee694f97d5a4
> > > +        .quad 0xc08627bcb0565500, 0xbe1cf0456b8239d7
> > > +        .quad 0xc08627bef2b3d2b0, 0xbe1cf211497127e3
> > > +        .quad 0xc08627c1346e2930, 0xbe1cf01856c0384d
> > > +        .quad 0xc08627c37585b468, 0xbe1cefa7dd05479e
> > > +        .quad 0xc08627c5b5fad000, 0xbe1cef3ae8e50b93
> > > +        .quad 0xc08627c7f5cdd750, 0xbe1ceea5f32fdd3a
> > > +        .quad 0xc08627ca34ff2560, 0xbe1cef424caeb8d9
> > > +        .quad 0xc08627cc738f14f0, 0xbe1cf0194d07a81f
> > > +        .quad 0xc08627ceb17e0070, 0xbe1cf20f452000c1
> > > +        .quad 0xc08627d0eecc4210, 0xbe1cf00e356218e4
> > > +        .quad 0xc08627d32b7a33a0, 0xbe1cef30484b4bcb
> > > +        .quad 0xc08627d567882eb0, 0xbe1ceeea11a6641b
> > > +        .quad 0xc08627d7a2f68c80, 0xbe1cf13492d5bd7b
> > > +        .quad 0xc08627d9ddc5a618, 0xbe1ceeb7048fad96
> > > +        .quad 0xc08627dc17f5d418, 0xbe1ceef0666f0477
> > > +        .quad 0xc08627de51876ee8, 0xbe1cf060d4b8b5c2
> > > +        .quad 0xc08627e08a7acea8, 0xbe1cf0b2a4b6ff8c
> > > +        .quad 0xc08627e2c2d04b28, 0xbe1cf0e34809a875
> > > +        .quad 0xc08627e4fa883bf0, 0xbe1cf16bf74a3522
> > > +        .quad 0xc08627e731a2f848, 0xbe1cee6a24623d57
> > > +        .quad 0xc08627e96820d718, 0xbe1cefc7b4f1528e
> > > +        .quad 0xc08627eb9e022f18, 0xbe1cf163051f3548
> > > +        .quad 0xc08627edd34756b8, 0xbe1cef36b3366305
> > > +        .quad 0xc08627f007f0a408, 0xbe1cf18134625550
> > > +        .quad 0xc08627f23bfe6cf0, 0xbe1cf0ec32ec1a11
> > > +        .quad 0xc08627f46f710700, 0xbe1ceeb3b64f3edc
> > > +        .quad 0xc08627f6a248c778, 0xbe1cf0cd15805bc8
> > > +        .quad 0xc08627f8d4860368, 0xbe1cf20db3bddebe
> > > +        .quad 0xc08627fb06290f90, 0xbe1cf25188430e25
> > > +        .quad 0xc08627fd37324070, 0xbe1ceea1713490f9
> > > +        .quad 0xc08627ff67a1ea28, 0xbe1cf159521d234c
> > > +        .quad 0xc0862801977860b8, 0xbe1cf24dfe50783b
> > > +        .quad 0xc0862803c6b5f7d0, 0xbe1ceef2ef89a60b
> > > +        .quad 0xc0862805f55b02c8, 0xbe1cee7fc919d62c
> > > +        .quad 0xc08628082367d4c0, 0xbe1cf215a7fb513a
> > > +        .quad 0xc086280a50dcc0a8, 0xbe1cf0e4401c5ed4
> > > +        .quad 0xc086280c7dba1910, 0xbe1cf04ec734d256
> > > +        .quad 0xc086280eaa003050, 0xbe1cf010ad787fea
> > > +        .quad 0xc0862810d5af5880, 0xbe1cee622478393d
> > > +        .quad 0xc086281300c7e368, 0xbe1cf01c7482564f
> > > +        .quad 0xc08628152b4a22a0, 0xbe1cf0de20d33536
> > > +        .quad 0xc086281755366778, 0xbe1cef2edae5837d
> > > +        .quad 0xc08628197e8d02f0, 0xbe1cf0a345318cc9
> > > +        .quad 0xc086281ba74e45d8, 0xbe1cf20085aa34b8
> > > +        .quad 0xc086281dcf7a80c0, 0xbe1cef5fa845ad83
> > > +        .quad 0xc086281ff71203e0, 0xbe1cf050d1df69c4
> > > +        .quad 0xc08628221e151f48, 0xbe1ceffe43c035b9
> > > +        .quad 0xc0862824448422b8, 0xbe1cf14f3018d3c2
> > > +        .quad 0xc08628266a5f5dc0, 0xbe1cef0a5fbae83d
> > > +        .quad 0xc08628288fa71f98, 0xbe1ceff8a95b72a1
> > > +        .quad 0xc086282ab45bb750, 0xbe1cef073aa9849b
> > > +        .quad 0xc086282cd87d73a8, 0xbe1cef69b3835c02
> > > +        .quad 0xc086282efc0ca328, 0xbe1cf0bc139379a9
> > > +        .quad 0xc08628311f099420, 0xbe1cef247a9ec596
> > > +        .quad 0xc086283341749490, 0xbe1cef74bbcc488a
> > > +        .quad 0xc0862835634df248, 0xbe1cef4bc42e7b8e
> > > +        .quad 0xc08628378495fad0, 0xbe1cf136d4d5a810
> > > +        .quad 0xc0862839a54cfb80, 0xbe1cf0d290b24dd8
> > > +        .quad 0xc086283bc5734168, 0xbe1ceeebde8e0065
> > > +        .quad 0xc086283de5091950, 0xbe1cf1a09f60aa1e
> > > +        .quad 0xc0862840040ecfe0, 0xbe1cf0803947a234
> > > +        .quad 0xc08628422284b168, 0xbe1cf0abf7638127
> > > +        .quad 0xc0862844406b0a08, 0xbe1cf0f73ee12058
> > > +        .quad 0xc08628465dc225a0, 0xbe1cf2079971b26c
> > > +        .quad 0xc08628487a8a4fe0, 0xbe1cee74957564b1
> > > +        .quad 0xc086284a96c3d420, 0xbe1ceee77c1b7d43
> > > +        .quad 0xc086284cb26efd90, 0xbe1cf23addba6e09
> > > +        .quad 0xc086284ecd8c1730, 0xbe1cf199f4a1da60
> > > +        .quad 0xc0862850e81b6bb0, 0xbe1cf09fdea81393
> > > +        .quad 0xc0862853021d4588, 0xbe1cf176adb417f7
> > > +        .quad 0xc08628551b91ef00, 0xbe1cf0f64f84a8da
> > > +        .quad 0xc08628573479b220, 0xbe1ceec34cf49523
> > > +        .quad 0xc08628594cd4d8a8, 0xbe1cf16d60fbe0bb
> > > +        .quad 0xc086285b64a3ac40, 0xbe1cee8de7acfc7b
> > > +        .quad 0xc086285d7be67630, 0xbe1ceee6256cce8d
> > > +        .quad 0xc086285f929d7fa0, 0xbe1cee7d66a3d8a5
> > > +        .quad 0xc0862861a8c91170, 0xbe1cf0bef8265792
> > > +        .quad 0xc0862863be697458, 0xbe1cf097f890c6f8
> > > +        .quad 0xc0862865d37ef0c8, 0xbe1cf09502d5c3fc
> > > +        .quad 0xc0862867e809cf00, 0xbe1ceeffb239dac7
> > > +        .quad 0xc0862869fc0a56f8, 0xbe1cf1fbfff95c98
> > > +        .quad 0xc086286c0f80d090, 0xbe1cefa57ad3eef7
> > > +        .quad 0xc086286e226d8348, 0xbe1cf22c58b9183d
> > > +        .quad 0xc086287034d0b690, 0xbe1ceff262d0a248
> > > +        .quad 0xc086287246aab180, 0xbe1cefa7bc194186
> > > +        .quad 0xc086287457fbbb08, 0xbe1cf06782d784d9
> > > +        .quad 0xc086287668c419e0, 0xbe1cf1d44d0eaa07
> > > +        .quad 0xc086287879041490, 0xbe1cf034803c8a48
> > > +        .quad 0xc086287a88bbf158, 0xbe1cf08e84916b6f
> > > +        .quad 0xc086287c97ebf650, 0xbe1cf0c4d3dc1bc7
> > > +        .quad 0xc086287ea6946958, 0xbe1cefb1e4625943
> > > +        .quad 0xc0862880b4b59010, 0xbe1cf143efdd1fd0
> > > +        .quad 0xc0862882c24faff8, 0xbe1cee9896d016da
> > > +        .quad 0xc0862884cf630e38, 0xbe1cf2186072f2cc
> > > +        .quad 0xc0862886dbefeff0, 0xbe1cef9217633d34
> > > +        .quad 0xc0862888e7f699e0, 0xbe1cf05603549486
> > > +        .quad 0xc086288af37750b0, 0xbe1cef50fff513d3
> > > +        .quad 0xc086288cfe7258c0, 0xbe1cf127713b32d0
> > > +        .quad 0xc086288f08e7f650, 0xbe1cf05015520f3d
> > > +        .quad 0xc086289112d86d58, 0xbe1cf12eb458b26f
> > > +        .quad 0xc08628931c4401a8, 0xbe1cf22eae2887ed
> > > +        .quad 0xc0862895252af6e0, 0xbe1cefdd6656dd2d
> > > +        .quad 0xc08628972d8d9058, 0xbe1cf1048ea4e646
> > > +        .quad 0xc0862899356c1150, 0xbe1ceec4501167e9
> > > +        .quad 0xc086289b3cc6bcb8, 0xbe1cf0ad52becc3f
> > > +        .quad 0xc086289d439dd568, 0xbe1cf0daa4e00e35
> > > +        .quad 0xc086289f49f19df8, 0xbe1cf00b80de8d6a
> > > +        .quad 0xc08628a14fc258c8, 0xbe1cf1bcf2ea8464
> > > +        .quad 0xc08628a355104818, 0xbe1cf0435e2782b0
> > > +        .quad 0xc08628a559dbade0, 0xbe1cf0e3e1a5f56c
> > > +        .quad 0xc08628a75e24cbf8, 0xbe1cefed9d5a721d
> > > +        .quad 0xc08628a961ebe3f8, 0xbe1cf0d2d74321e2
> > > +        .quad 0xc08628ab65313750, 0xbe1cf24200eb55e9
> > > +        .quad 0xc08628ad67f50740, 0xbe1cf23e9d7cf979
> > > +        .quad 0xc08628af6a3794d0, 0xbe1cf23a088f421c
> > > +        .quad 0xc08628b16bf920e0, 0xbe1cef2c1de1ab32
> > > +        .quad 0xc08628b36d39ec08, 0xbe1cf1abc231f7b2
> > > +        .quad 0xc08628b56dfa36d0, 0xbe1cf2074d5ba303
> > > +        .quad 0xc08628b76e3a4180, 0xbe1cf05cd5eed880
> > > +        /*== Log_LA_table ==*/
> > > +        .align 16
> > > +        .quad 0x8000000000000000
> > > +        .quad 0xbf5ff802a9ab10e6
> > > +        .quad 0xbf6ff00aa2b10bc0
> > > +        .quad 0xbf77ee11ebd82e94
> > > +        .quad 0xbf7fe02a6b106789
> > > +        .quad 0xbf83e7295d25a7d9
> > > +        .quad 0xbf87dc475f810a77
> > > +        .quad 0xbf8bcf712c74384c
> > > +        .quad 0xbf8fc0a8b0fc03e4
> > > +        .quad 0xbf91d7f7eb9eebe7
> > > +        .quad 0xbf93cea44346a575
> > > +        .quad 0xbf95c45a51b8d389
> > > +        .quad 0xbf97b91b07d5b11b
> > > +        .quad 0xbf99ace7551cc514
> > > +        .quad 0xbf9b9fc027af9198
> > > +        .quad 0xbf9d91a66c543cc4
> > > +        .quad 0xbf9f829b0e783300
> > > +        .quad 0xbfa0b94f7c196176
> > > +        .quad 0xbfa1b0d98923d980
> > > +        .quad 0xbfa2a7ec2214e873
> > > +        .quad 0xbfa39e87b9febd60
> > > +        .quad 0xbfa494acc34d911c
> > > +        .quad 0xbfa58a5bafc8e4d5
> > > +        .quad 0xbfa67f94f094bd98
> > > +        .quad 0xbfa77458f632dcfc
> > > +        .quad 0xbfa868a83083f6cf
> > > +        .quad 0xbfa95c830ec8e3eb
> > > +        .quad 0xbfaa4fe9ffa3d235
> > > +        .quad 0xbfab42dd711971bf
> > > +        .quad 0xbfac355dd0921f2d
> > > +        .quad 0xbfad276b8adb0b52
> > > +        .quad 0xbfae19070c276016
> > > +        .quad 0xbfaf0a30c01162a6
> > > +        .quad 0xbfaffae9119b9303
> > > +        .quad 0xbfb075983598e471
> > > +        .quad 0xbfb0ed839b5526fe
> > > +        .quad 0xbfb16536eea37ae1
> > > +        .quad 0xbfb1dcb263db1944
> > > +        .quad 0xbfb253f62f0a1417
> > > +        .quad 0xbfb2cb0283f5de1f
> > > +        .quad 0xbfb341d7961bd1d1
> > > +        .quad 0xbfb3b87598b1b6ee
> > > +        .quad 0xbfb42edcbea646f0
> > > +        .quad 0xbfb4a50d3aa1b040
> > > +        .quad 0xbfb51b073f06183f
> > > +        .quad 0xbfb590cafdf01c28
> > > +        .quad 0xbfb60658a93750c4
> > > +        .quad 0xbfb67bb0726ec0fc
> > > +        .quad 0xbfb6f0d28ae56b4c
> > > +        .quad 0xbfb765bf23a6be13
> > > +        .quad 0xbfb7da766d7b12cd
> > > +        .quad 0xbfb84ef898e8282a
> > > +        .quad 0xbfb8c345d6319b21
> > > +        .quad 0xbfb9375e55595ede
> > > +        .quad 0xbfb9ab42462033ad
> > > +        .quad 0xbfba1ef1d8061cd4
> > > +        .quad 0xbfba926d3a4ad563
> > > +        .quad 0xbfbb05b49bee43fe
> > > +        .quad 0xbfbb78c82bb0eda1
> > > +        .quad 0xbfbbeba818146765
> > > +        .quad 0xbfbc5e548f5bc743
> > > +        .quad 0xbfbcd0cdbf8c13e1
> > > +        .quad 0xbfbd4313d66cb35d
> > > +        .quad 0xbfbdb5270187d927
> > > +        .quad 0xbfbe27076e2af2e6
> > > +        .quad 0xbfbe98b549671467
> > > +        .quad 0xbfbf0a30c01162a6
> > > +        .quad 0xbfbf7b79fec37ddf
> > > +        .quad 0xbfbfec9131dbeabb
> > > +        .quad 0xbfc02ebb42bf3d4b
> > > +        .quad 0xbfc0671512ca596e
> > > +        .quad 0xbfc09f561ee719c3
> > > +        .quad 0xbfc0d77e7cd08e59
> > > +        .quad 0xbfc10f8e422539b1
> > > +        .quad 0xbfc14785846742ac
> > > +        .quad 0xbfc17f6458fca611
> > > +        .quad 0xbfc1b72ad52f67a0
> > > +        .quad 0xbfc1eed90e2dc2c3
> > > +        .quad 0xbfc2266f190a5acb
> > > +        .quad 0xbfc25ded0abc6ad2
> > > +        .quad 0xbfc29552f81ff523
> > > +        .quad 0xbfc2cca0f5f5f251
> > > +        .quad 0xbfc303d718e47fd3
> > > +        .quad 0xbfc33af575770e4f
> > > +        .quad 0xbfc371fc201e8f74
> > > +        .quad 0xbfc3a8eb2d31a376
> > > +        .quad 0xbfc3dfc2b0ecc62a
> > > +        .quad 0xbfc41682bf727bc0
> > > +        .quad 0xbfc44d2b6ccb7d1e
> > > +        .quad 0xbfc483bccce6e3dd
> > > +        .quad 0xbfc4ba36f39a55e5
> > > +        .quad 0xbfc4f099f4a230b2
> > > +        .quad 0xbfc526e5e3a1b438
> > > +        .quad 0xbfc55d1ad4232d6f
> > > +        .quad 0xbfc59338d9982086
> > > +        .quad 0xbfc5c940075972b9
> > > +        .quad 0xbfc5ff3070a793d4
> > > +        .quad 0xbfc6350a28aaa758
> > > +        .quad 0xbfc66acd4272ad51
> > > +        .quad 0xbfc6a079d0f7aad2
> > > +        .quad 0xbfc6d60fe719d21d
> > > +        .quad 0xbfc70b8f97a1aa75
> > > +        .quad 0xbfc740f8f54037a5
> > > +        .quad 0xbfc7764c128f2127
> > > +        .quad 0xbfc7ab890210d909
> > > +        .quad 0xbfc7e0afd630c274
> > > +        .quad 0xbfc815c0a14357eb
> > > +        .quad 0xbfc84abb75865139
> > > +        .quad 0xbfc87fa06520c911
> > > +        .quad 0xbfc8b46f8223625b
> > > +        .quad 0xbfc8e928de886d41
> > > +        .quad 0xbfc91dcc8c340bde
> > > +        .quad 0xbfc9525a9cf456b4
> > > +        .quad 0xbfc986d3228180ca
> > > +        .quad 0xbfc9bb362e7dfb83
> > > +        .quad 0xbfc9ef83d2769a34
> > > +        .quad 0xbfca23bc1fe2b563
> > > +        .quad 0xbfca57df28244dcd
> > > +        .quad 0xbfca8becfc882f19
> > > +        .quad 0xbfcabfe5ae46124c
> > > +        .quad 0xbfcaf3c94e80bff3
> > > +        .quad 0xbfcb2797ee46320c
> > > +        .quad 0xbfcb5b519e8fb5a4
> > > +        .quad 0xbfcb8ef670420c3b
> > > +        .quad 0xbfcbc286742d8cd6
> > > +        .quad 0xbfcbf601bb0e44e2
> > > +        .quad 0xbfcc2968558c18c1
> > > +        .quad 0xbfcc5cba543ae425
> > > +        .quad 0xbfcc8ff7c79a9a22
> > > +        .quad 0xbfccc320c0176502
> > > +        .quad 0xbfccf6354e09c5dc
> > > +        .quad 0xbfcd293581b6b3e7
> > > +        .quad 0xbfcd5c216b4fbb91
> > > +        .quad 0xbfcd8ef91af31d5e
> > > +        .quad 0xbfcdc1bca0abec7d
> > > +        .quad 0xbfcdf46c0c722d2f
> > > +        .quad 0xbfce27076e2af2e6
> > > +        .quad 0xbfce598ed5a87e2f
> > > +        .quad 0xbfce8c0252aa5a60
> > > +        .quad 0xbfcebe61f4dd7b0b
> > > +        .quad 0xbfcef0adcbdc5936
> > > +        .quad 0xbfcf22e5e72f105d
> > > +        .quad 0xbfcf550a564b7b37
> > > +        .quad 0xbfcf871b28955045
> > > +        .quad 0xbfcfb9186d5e3e2b
> > > +        .quad 0xbfcfeb0233e607cc
> > > +        .quad 0xbfd00e6c45ad501d
> > > +        .quad 0xbfd0274dc16c232f
> > > +        .quad 0xbfd0402594b4d041
> > > +        .quad 0xbfd058f3c703ebc6
> > > +        .quad 0xbfd071b85fcd590d
> > > +        .quad 0xbfd08a73667c57af
> > > +        .quad 0xbfd0a324e27390e3
> > > +        .quad 0xbfd0bbccdb0d24bd
> > > +        .quad 0xbfd0d46b579ab74b
> > > +        .quad 0xbfd0ed005f657da4
> > > +        .quad 0xbfd1058bf9ae4ad5
> > > +        .quad 0xbfd11e0e2dad9cb7
> > > +        .quad 0xbfd136870293a8b0
> > > +        .quad 0xbfd14ef67f88685a
> > > +        .quad 0xbfd1675cababa60e
> > > +        .quad 0xbfd17fb98e15095d
> > > +        .quad 0xbfd1980d2dd4236f
> > > +        .quad 0xbfd1b05791f07b49
> > > +        .quad 0xbfd1c898c16999fb
> > > +        .quad 0xbfd1e0d0c33716be
> > > +        .quad 0xbfd1f8ff9e48a2f3
> > > +        .quad 0xbfd211255986160c
> > > +        .quad 0xbfd22941fbcf7966
> > > +        .quad 0xbfd241558bfd1404
> > > +        .quad 0xbfd2596010df763a
> > > +        .quad 0xbfd27161913f853d
> > > +        .quad 0xbfd2895a13de86a3
> > > +        .quad 0xbfd2a1499f762bc9
> > > +        .quad 0xbfd2b9303ab89d25
> > > +        .quad 0xbfd2d10dec508583
> > > +        .quad 0xbfd2e8e2bae11d31
> > > +        .quad 0xbfd300aead06350c
> > > +        .quad 0xbfd31871c9544185
> > > +        .quad 0xbfd3302c16586588
> > > +        .quad 0xbfd347dd9a987d55
> > > +        .quad 0xbfd35f865c93293e
> > > +        .quad 0xbfd3772662bfd85b
> > > +        .quad 0xbfd38ebdb38ed321
> > > +        .quad 0xbfd3a64c556945ea
> > > +        .quad 0xbfd3bdd24eb14b6a
> > > +        .quad 0xbfd3d54fa5c1f710
> > > +        .quad 0xbfd3ecc460ef5f50
> > > +        .quad 0xbfd404308686a7e4
> > > +        .quad 0xbfd41b941cce0bee
> > > +        .quad 0xbfd432ef2a04e814
> > > +        .quad 0xbfd44a41b463c47c
> > > +        .quad 0xbfd4618bc21c5ec2
> > > +        .quad 0xbfd478cd5959b3d9
> > > +        .quad 0xbfd49006804009d1
> > > +        .quad 0xbfd4a7373cecf997
> > > +        .quad 0xbfd4be5f957778a1
> > > +        .quad 0xbfd4d57f8fefe27f
> > > +        .quad 0xbfd4ec973260026a
> > > +        .quad 0xbfd503a682cb1cb3
> > > +        .quad 0xbfd51aad872df82d
> > > +        .quad 0xbfd531ac457ee77e
> > > +        .quad 0xbfd548a2c3add263
> > > +        .quad 0xbfd55f9107a43ee2
> > > +        .quad 0xbfd5767717455a6c
> > > +        .quad 0xbfd58d54f86e02f2
> > > +        .quad 0xbfd5a42ab0f4cfe2
> > > +        .quad 0xbfd5baf846aa1b19
> > > +        .quad 0xbfd5d1bdbf5809ca
> > > +        .quad 0xbfd5e87b20c2954a
> > > +        .quad 0xbfd5ff3070a793d4
> > > +        .quad 0xbfd615ddb4bec13c
> > > +        .quad 0xbfd62c82f2b9c795
> > > +        .quad 0x3fd61965cdb02c1f
> > > +        .quad 0x3fd602d08af091ec
> > > +        .quad 0x3fd5ec433d5c35ae
> > > +        .quad 0x3fd5d5bddf595f30
> > > +        .quad 0x3fd5bf406b543db2
> > > +        .quad 0x3fd5a8cadbbedfa1
> > > +        .quad 0x3fd5925d2b112a59
> > > +        .quad 0x3fd57bf753c8d1fb
> > > +        .quad 0x3fd565995069514c
> > > +        .quad 0x3fd54f431b7be1a9
> > > +        .quad 0x3fd538f4af8f72fe
> > > +        .quad 0x3fd522ae0738a3d8
> > > +        .quad 0x3fd50c6f1d11b97c
> > > +        .quad 0x3fd4f637ebba9810
> > > +        .quad 0x3fd4e0086dd8baca
> > > +        .quad 0x3fd4c9e09e172c3c
> > > +        .quad 0x3fd4b3c077267e9a
> > > +        .quad 0x3fd49da7f3bcc41f
> > > +        .quad 0x3fd487970e958770
> > > +        .quad 0x3fd4718dc271c41b
> > > +        .quad 0x3fd45b8c0a17df13
> > > +        .quad 0x3fd44591e0539f49
> > > +        .quad 0x3fd42f9f3ff62642
> > > +        .quad 0x3fd419b423d5e8c7
> > > +        .quad 0x3fd403d086cea79c
> > > +        .quad 0x3fd3edf463c1683e
> > > +        .quad 0x3fd3d81fb5946dba
> > > +        .quad 0x3fd3c25277333184
> > > +        .quad 0x3fd3ac8ca38e5c5f
> > > +        .quad 0x3fd396ce359bbf54
> > > +        .quad 0x3fd3811728564cb2
> > > +        .quad 0x3fd36b6776be1117
> > > +        .quad 0x3fd355bf1bd82c8b
> > > +        .quad 0x3fd3401e12aecba1
> > > +        .quad 0x3fd32a84565120a8
> > > +        .quad 0x3fd314f1e1d35ce4
> > > +        .quad 0x3fd2ff66b04ea9d4
> > > +        .quad 0x3fd2e9e2bce12286
> > > +        .quad 0x3fd2d46602adccee
> > > +        .quad 0x3fd2bef07cdc9354
> > > +        .quad 0x3fd2a982269a3dbf
> > > +        .quad 0x3fd2941afb186b7c
> > > +        .quad 0x3fd27ebaf58d8c9d
> > > +        .quad 0x3fd269621134db92
> > > +        .quad 0x3fd25410494e56c7
> > > +        .quad 0x3fd23ec5991eba49
> > > +        .quad 0x3fd22981fbef797b
> > > +        .quad 0x3fd214456d0eb8d4
> > > +        .quad 0x3fd1ff0fe7cf47a7
> > > +        .quad 0x3fd1e9e1678899f4
> > > +        .quad 0x3fd1d4b9e796c245
> > > +        .quad 0x3fd1bf99635a6b95
> > > +        .quad 0x3fd1aa7fd638d33f
> > > +        .quad 0x3fd1956d3b9bc2fa
> > > +        .quad 0x3fd180618ef18adf
> > > +        .quad 0x3fd16b5ccbacfb73
> > > +        .quad 0x3fd1565eed455fc3
> > > +        .quad 0x3fd14167ef367783
> > > +        .quad 0x3fd12c77cd00713b
> > > +        .quad 0x3fd1178e8227e47c
> > > +        .quad 0x3fd102ac0a35cc1c
> > > +        .quad 0x3fd0edd060b78081
> > > +        .quad 0x3fd0d8fb813eb1ef
> > > +        .quad 0x3fd0c42d676162e3
> > > +        .quad 0x3fd0af660eb9e279
> > > +        .quad 0x3fd09aa572e6c6d4
> > > +        .quad 0x3fd085eb8f8ae797
> > > +        .quad 0x3fd07138604d5862
> > > +        .quad 0x3fd05c8be0d9635a
> > > +        .quad 0x3fd047e60cde83b8
> > > +        .quad 0x3fd03346e0106062
> > > +        .quad 0x3fd01eae5626c691
> > > +        .quad 0x3fd00a1c6adda473
> > > +        .quad 0x3fcfeb2233ea07cd
> > > +        .quad 0x3fcfc218be620a5e
> > > +        .quad 0x3fcf991c6cb3b379
> > > +        .quad 0x3fcf702d36777df0
> > > +        .quad 0x3fcf474b134df229
> > > +        .quad 0x3fcf1e75fadf9bde
> > > +        .quad 0x3fcef5ade4dcffe6
> > > +        .quad 0x3fceccf2c8fe920a
> > > +        .quad 0x3fcea4449f04aaf5
> > > +        .quad 0x3fce7ba35eb77e2a
> > > +        .quad 0x3fce530effe71012
> > > +        .quad 0x3fce2a877a6b2c12
> > > +        .quad 0x3fce020cc6235ab5
> > > +        .quad 0x3fcdd99edaf6d7e9
> > > +        .quad 0x3fcdb13db0d48940
> > > +        .quad 0x3fcd88e93fb2f450
> > > +        .quad 0x3fcd60a17f903515
> > > +        .quad 0x3fcd38666871f465
> > > +        .quad 0x3fcd1037f2655e7b
> > > +        .quad 0x3fcce816157f1988
> > > +        .quad 0x3fccc000c9db3c52
> > > +        .quad 0x3fcc97f8079d44ec
> > > +        .quad 0x3fcc6ffbc6f00f71
> > > +        .quad 0x3fcc480c0005ccd1
> > > +        .quad 0x3fcc2028ab17f9b4
> > > +        .quad 0x3fcbf851c067555f
> > > +        .quad 0x3fcbd087383bd8ad
> > > +        .quad 0x3fcba8c90ae4ad19
> > > +        .quad 0x3fcb811730b823d2
> > > +        .quad 0x3fcb5971a213acdb
> > > +        .quad 0x3fcb31d8575bce3d
> > > +        .quad 0x3fcb0a4b48fc1b46
> > > +        .quad 0x3fcae2ca6f672bd4
> > > +        .quad 0x3fcabb55c31693ad
> > > +        .quad 0x3fca93ed3c8ad9e3
> > > +        .quad 0x3fca6c90d44b704e
> > > +        .quad 0x3fca454082e6ab05
> > > +        .quad 0x3fca1dfc40f1b7f1
> > > +        .quad 0x3fc9f6c407089664
> > > +        .quad 0x3fc9cf97cdce0ec3
> > > +        .quad 0x3fc9a8778debaa38
> > > +        .quad 0x3fc981634011aa75
> > > +        .quad 0x3fc95a5adcf7017f
> > > +        .quad 0x3fc9335e5d594989
> > > +        .quad 0x3fc90c6db9fcbcd9
> > > +        .quad 0x3fc8e588ebac2dbf
> > > +        .quad 0x3fc8beafeb38fe8c
> > > +        .quad 0x3fc897e2b17b19a5
> > > +        .quad 0x3fc871213750e994
> > > +        .quad 0x3fc84a6b759f512f
> > > +        .quad 0x3fc823c16551a3c2
> > > +        .quad 0x3fc7fd22ff599d4f
> > > +        .quad 0x3fc7d6903caf5ad0
> > > +        .quad 0x3fc7b0091651528c
> > > +        .quad 0x3fc7898d85444c73
> > > +        .quad 0x3fc7631d82935a86
> > > +        .quad 0x3fc73cb9074fd14d
> > > +        .quad 0x3fc716600c914054
> > > +        .quad 0x3fc6f0128b756abc
> > > +        .quad 0x3fc6c9d07d203fc7
> > > +        .quad 0x3fc6a399dabbd383
> > > +        .quad 0x3fc67d6e9d785771
> > > +        .quad 0x3fc6574ebe8c133a
> > > +        .quad 0x3fc6313a37335d76
> > > +        .quad 0x3fc60b3100b09476
> > > +        .quad 0x3fc5e533144c1719
> > > +        .quad 0x3fc5bf406b543db2
> > > +        .quad 0x3fc59958ff1d52f1
> > > +        .quad 0x3fc5737cc9018cdd
> > > +        .quad 0x3fc54dabc26105d2
> > > +        .quad 0x3fc527e5e4a1b58d
> > > +        .quad 0x3fc5022b292f6a45
> > > +        .quad 0x3fc4dc7b897bc1c8
> > > +        .quad 0x3fc4b6d6fefe22a4
> > > +        .quad 0x3fc4913d8333b561
> > > +        .quad 0x3fc46baf0f9f5db7
> > > +        .quad 0x3fc4462b9dc9b3dc
> > > +        .quad 0x3fc420b32740fdd4
> > > +        .quad 0x3fc3fb45a59928cc
> > > +        .quad 0x3fc3d5e3126bc27f
> > > +        .quad 0x3fc3b08b6757f2a9
> > > +        .quad 0x3fc38b3e9e027479
> > > +        .quad 0x3fc365fcb0159016
> > > +        .quad 0x3fc340c59741142e
> > > +        .quad 0x3fc31b994d3a4f85
> > > +        .quad 0x3fc2f677cbbc0a96
> > > +        .quad 0x3fc2d1610c86813a
> > > +        .quad 0x3fc2ac55095f5c59
> > > +        .quad 0x3fc28753bc11aba5
> > > +        .quad 0x3fc2625d1e6ddf57
> > > +        .quad 0x3fc23d712a49c202
> > > +        .quad 0x3fc2188fd9807263
> > > +        .quad 0x3fc1f3b925f25d41
> > > +        .quad 0x3fc1ceed09853752
> > > +        .quad 0x3fc1aa2b7e23f72a
> > > +        .quad 0x3fc185747dbecf34
> > > +        .quad 0x3fc160c8024b27b1
> > > +        .quad 0x3fc13c2605c398c3
> > > +        .quad 0x3fc1178e8227e47c
> > > +        .quad 0x3fc0f301717cf0fb
> > > +        .quad 0x3fc0ce7ecdccc28d
> > > +        .quad 0x3fc0aa06912675d5
> > > +        .quad 0x3fc08598b59e3a07
> > > +        .quad 0x3fc06135354d4b18
> > > +        .quad 0x3fc03cdc0a51ec0d
> > > +        .quad 0x3fc0188d2ecf6140
> > > +        .quad 0x3fbfe89139dbd566
> > > +        .quad 0x3fbfa01c9db57ce2
> > > +        .quad 0x3fbf57bc7d9005db
> > > +        .quad 0x3fbf0f70cdd992e3
> > > +        .quad 0x3fbec739830a1120
> > > +        .quad 0x3fbe7f1691a32d3e
> > > +        .quad 0x3fbe3707ee30487b
> > > +        .quad 0x3fbdef0d8d466db9
> > > +        .quad 0x3fbda727638446a2
> > > +        .quad 0x3fbd5f55659210e2
> > > +        .quad 0x3fbd179788219364
> > > +        .quad 0x3fbccfedbfee13a8
> > > +        .quad 0x3fbc885801bc4b23
> > > +        .quad 0x3fbc40d6425a5cb1
> > > +        .quad 0x3fbbf968769fca11
> > > +        .quad 0x3fbbb20e936d6974
> > > +        .quad 0x3fbb6ac88dad5b1c
> > > +        .quad 0x3fbb23965a52ff00
> > > +        .quad 0x3fbadc77ee5aea8c
> > > +        .quad 0x3fba956d3ecade63
> > > +        .quad 0x3fba4e7640b1bc38
> > > +        .quad 0x3fba0792e9277cac
> > > +        .quad 0x3fb9c0c32d4d2548
> > > +        .quad 0x3fb97a07024cbe74
> > > +        .quad 0x3fb9335e5d594989
> > > +        .quad 0x3fb8ecc933aeb6e8
> > > +        .quad 0x3fb8a6477a91dc29
> > > +        .quad 0x3fb85fd927506a48
> > > +        .quad 0x3fb8197e2f40e3f0
> > > +        .quad 0x3fb7d33687c293c9
> > > +        .quad 0x3fb78d02263d82d3
> > > +        .quad 0x3fb746e100226ed9
> > > +        .quad 0x3fb700d30aeac0e1
> > > +        .quad 0x3fb6bad83c1883b6
> > > +        .quad 0x3fb674f089365a7a
> > > +        .quad 0x3fb62f1be7d77743
> > > +        .quad 0x3fb5e95a4d9791cb
> > > +        .quad 0x3fb5a3abb01ade25
> > > +        .quad 0x3fb55e10050e0384
> > > +        .quad 0x3fb518874226130a
> > > +        .quad 0x3fb4d3115d207eac
> > > +        .quad 0x3fb48dae4bc31018
> > > +        .quad 0x3fb4485e03dbdfad
> > > +        .quad 0x3fb403207b414b7f
> > > +        .quad 0x3fb3bdf5a7d1ee64
> > > +        .quad 0x3fb378dd7f749714
> > > +        .quad 0x3fb333d7f8183f4b
> > > +        .quad 0x3fb2eee507b40301
> > > +        .quad 0x3fb2aa04a44717a5
> > > +        .quad 0x3fb26536c3d8c369
> > > +        .quad 0x3fb2207b5c78549e
> > > +        .quad 0x3fb1dbd2643d190b
> > > +        .quad 0x3fb1973bd1465567
> > > +        .quad 0x3fb152b799bb3cc9
> > > +        .quad 0x3fb10e45b3cae831
> > > +        .quad 0x3fb0c9e615ac4e17
> > > +        .quad 0x3fb08598b59e3a07
> > > +        .quad 0x3fb0415d89e74444
> > > +        .quad 0x3faffa6911ab9301
> > > +        .quad 0x3faf723b517fc523
> > > +        .quad 0x3faeea31c006b87c
> > > +        .quad 0x3fae624c4a0b5e1b
> > > +        .quad 0x3fadda8adc67ee4e
> > > +        .quad 0x3fad52ed6405d86f
> > > +        .quad 0x3faccb73cdddb2cc
> > > +        .quad 0x3fac441e06f72a9e
> > > +        .quad 0x3fabbcebfc68f420
> > > +        .quad 0x3fab35dd9b58baad
> > > +        .quad 0x3faaaef2d0fb10fc
> > > +        .quad 0x3faa282b8a936171
> > > +        .quad 0x3fa9a187b573de7c
> > > +        .quad 0x3fa91b073efd7314
> > > +        .quad 0x3fa894aa149fb343
> > > +        .quad 0x3fa80e7023d8ccc4
> > > +        .quad 0x3fa788595a3577ba
> > > +        .quad 0x3fa70265a550e777
> > > +        .quad 0x3fa67c94f2d4bb58
> > > +        .quad 0x3fa5f6e73078efb8
> > > +        .quad 0x3fa5715c4c03ceef
> > > +        .quad 0x3fa4ebf43349e26f
> > > +        .quad 0x3fa466aed42de3ea
> > > +        .quad 0x3fa3e18c1ca0ae92
> > > +        .quad 0x3fa35c8bfaa1306b
> > > +        .quad 0x3fa2d7ae5c3c5bae
> > > +        .quad 0x3fa252f32f8d183f
> > > +        .quad 0x3fa1ce5a62bc353a
> > > +        .quad 0x3fa149e3e4005a8d
> > > +        .quad 0x3fa0c58fa19dfaaa
> > > +        .quad 0x3fa0415d89e74444
> > > +        .quad 0x3f9f7a9b16782856
> > > +        .quad 0x3f9e72bf2813ce51
> > > +        .quad 0x3f9d6b2725979802
> > > +        .quad 0x3f9c63d2ec14aaf2
> > > +        .quad 0x3f9b5cc258b718e6
> > > +        .quad 0x3f9a55f548c5c43f
> > > +        .quad 0x3f994f6b99a24475
> > > +        .quad 0x3f98492528c8cabf
> > > +        .quad 0x3f974321d3d006d3
> > > +        .quad 0x3f963d6178690bd6
> > > +        .quad 0x3f9537e3f45f3565
> > > +        .quad 0x3f9432a925980cc1
> > > +        .quad 0x3f932db0ea132e22
> > > +        .quad 0x3f9228fb1fea2e28
> > > +        .quad 0x3f912487a5507f70
> > > +        .quad 0x3f90205658935847
> > > +        .quad 0x3f8e38ce3033310c
> > > +        .quad 0x3f8c317384c75f06
> > > +        .quad 0x3f8a2a9c6c170462
> > > +        .quad 0x3f882448a388a2aa
> > > +        .quad 0x3f861e77e8b53fc6
> > > +        .quad 0x3f841929f96832f0
> > > +        .quad 0x3f82145e939ef1e9
> > > +        .quad 0x3f8010157588de71
> > > +        .quad 0x3f7c189cbb0e27fb
> > > +        .quad 0x3f78121214586b54
> > > +        .quad 0x3f740c8a747878e2
> > > +        .quad 0x3f70080559588b35
> > > +        .quad 0x3f680904828985c0
> > > +        .quad 0x3f60040155d5889e
> > > +        .quad 0x3f50020055655889
> > > +        .quad 0x0000000000000000
> > > +        /*== poly_coeff[4] ==*/
> > > +        .align 16
> > > +        .quad 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A /* coeff4 */
> > > +        .quad 0xbfd0000148058EE1, 0xbfd0000148058EE1 /* coeff3 */
> > > +        .quad 0x3fd55555555543C5, 0x3fd55555555543C5 /* coeff2 */
> > > +        .quad 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F /* coeff1 */
> > > +        /*== ExpMask ==*/
> > > +        .align 16
> > > +        .quad 0x000fffffffffffff, 0x000fffffffffffff
> > > +        /*== Two10 ==*/
> > > +        .align 16
> > > +        .quad 0x3f50000000000000, 0x3f50000000000000
> > > +        /*== MinLog1p = -1+2^(-53) ==*/
> > > +        .align 16
> > > +        .quad 0xbfefffffffffffff, 0xbfefffffffffffff
> > > +        /*== MaxLog1p ==*/
> > > +        .align 16
> > > +        .quad 0x7f3ffffffffff000, 0x7f3ffffffffff000
> > > +        /*== One ==*/
> > > +        .align 16
> > > +        .quad 0x3ff0000000000000, 0x3ff0000000000000
> > > +        /*== SgnMask ==*/
> > > +        .align 16
> > > +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff
> > > +        /*== XThreshold ==*/
> > > +        .align 16
> > > +        .quad 0x3e00000000000000, 0x3e00000000000000
> > > +        /*== XhMask ==*/
> > > +        .align 16
> > > +        .quad 0xfffffffffffffc00, 0xfffffffffffffc00
> > > +        /*== Threshold ==*/
> > > +        .align 16
> > > +        .quad 0x4086a00000000000, 0x4086a00000000000
> > > +        /*== Bias ==*/
> > > +        .align 16
> > > +        .quad 0x408ff80000000000, 0x408ff80000000000
> > > +        /*== Bias1 ==*/
> > > +        .align 16
> > > +        .quad 0x408ff00000000000, 0x408ff00000000000
> > > +        /*== ExpMask ==*/
> > > +        .align 16
> > > +        .quad 0x7ff0000000000000, 0x7ff0000000000000
> > > +        /*== ExpMask2 ==*/
> > > +        .align 16
> > > +        .quad 0x7f40000000000000, 0x7f40000000000000
> > > +        /*== L2L ==*/
> > > +        .align 16
> > > +        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
> > > +        .align 16
> > > +        .type        __svml_dlog1p_data_internal,@object
> > > +        .size        __svml_dlog1p_data_internal,.-__svml_dlog1p_data_internal
> > > +        .space 96, 0x00
> > > +        .align 16
> > > +
> > > +.FLT_16:
> > > +        .long        0x00000000,0x43380000,0x00000000,0x43380000
> > > +        .type        .FLT_16,@object
> > > +        .size        .FLT_16,16
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core-sse.S
> > > new file mode 100644
> > > index 0000000000..ec01af680c
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core-sse.S
> > > @@ -0,0 +1,20 @@
> > > +/* SSE version of vectorized log1p, vector length is 4.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#define _ZGVdN4v_log1p _ZGVdN4v_log1p_sse_wrapper
> > > +#include "../svml_d_log1p4_core.S"
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core.c
> > > new file mode 100644
> > > index 0000000000..808f3224ef
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core.c
> > > @@ -0,0 +1,27 @@
> > > +/* Multiple versions of vectorized log1p, vector length is 4.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#define SYMBOL_NAME _ZGVdN4v_log1p
> > > +#include "ifunc-mathvec-avx2.h"
> > > +
> > > +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> > > +
> > > +#ifdef SHARED
> > > +__hidden_ver1 (_ZGVdN4v_log1p, __GI__ZGVdN4v_log1p, __redirect__ZGVdN4v_log1p)
> > > +  __attribute__ ((visibility ("hidden")));
> > > +#endif
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core_avx2.S
> > > new file mode 100644
> > > index 0000000000..548538b0ec
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p4_core_avx2.S
> > > @@ -0,0 +1,1383 @@
> > > +/* Function log1p vectorized with AVX2.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +/*
> > > + * ALGORITHM DESCRIPTION:
> > > + *
> > > + *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
> > > + *    Get short reciprocal approximation Rcp ~ 1/xh
> > > + *    R = (Rcp*xh - 1.0) + Rcp*xl
> > > + *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
> > > + *       log(Rcp) is tabulated
> > > + *
> > > + *
> > > + */
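
To make the reduction above easier to follow, here is a minimal scalar C
sketch of the same scheme.  It is illustrative only: the frexp/ldexp split,
the nearbyint rounding of the reciprocal, the three polynomial terms and the
call to scalar log() are choices made for the sketch; the vector code below
instead works on the exponent bits directly and replaces the log() call with
a table lookup.

    /* Scalar sketch of: 1+x = 2^k*(xh+xl), R = (Rcp*xh-1) + Rcp*xl,
       log1p(x) = k*log(2) - log(Rcp) + poly(R).  Illustrative only.  */
    #include <math.h>

    static double
    log1p_sketch (double x)
    {
      /* 1+x as high/low parts (Fast2Sum): xh + xl == 1 + x exactly.  */
      double big = fmax (x, 1.0), small = fmin (x, 1.0);
      double xh = big + small;
      double xl = small - (xh - big);

      /* Split off the exponent: xh = 2^k * m with m in [1,2).  */
      int k;
      double m = 2.0 * frexp (xh, &k);
      k -= 1;
      xl = ldexp (xl, -k);

      /* Short reciprocal approximation Rcp ~ 1/m, ~9 mantissa bits.  */
      double rcp = ldexp (nearbyint (ldexp (1.0 / m, 9)), -9);

      /* Reduced argument and reconstruction; the real code uses a table
         lookup in place of the log (rcp) call below.  */
      double r = (rcp * m - 1.0) + rcp * xl;
      double poly = r - 0.5 * r * r + r * r * r / 3.0;
      const double ln2 = 0x1.62e42fefa39efp-1; /* same value as the L2 entry */
      return k * ln2 - log (rcp) + poly;
    }
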
> > > +
> > > +/* Offsets for data table __svml_dlog1p_data_internal
> > > + */
> > > +#define Log_HA_table                         0
> > > +#define Log_LA_table                         8224
> > > +#define poly_coeff                           12352
> > > +#define ExpMask                              12480
> > > +#define Two10                                12512
> > > +#define MinLog1p                             12544
> > > +#define MaxLog1p                             12576
> > > +#define One                                  12608
> > > +#define SgnMask                              12640
> > > +#define XThreshold                           12672
> > > +#define XhMask                               12704
> > > +#define Threshold                            12736
> > > +#define Bias                                 12768
> > > +#define Bias1                                12800
> > > +#define ExpMask0                             12832
> > > +#define ExpMask2                             12864
> > > +#define L2                                   12896
> > > +
> > > +/* Lookup bias for data table __svml_dlog1p_data_internal.  */
> > > +#define Table_Lookup_Bias               -0x405fe0
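
A side note on Table_Lookup_Bias, inferred from the lea/indexed-load pair
further down rather than stated anywhere in the patch: folding the constant
part of the computed table offset into the base register once lets every
per-lane load use plain (%r8,%rdx) addressing, with no extra displacement to
encode.  A tiny self-contained C sketch of that idea, with invented names and
an invented bias value:

    /* Hypothetical illustration of a biased table base; LOOKUP_BIAS and the
       index arithmetic are made up and do not reflect the real layout.  */
    #include <stdint.h>
    #include <stdio.h>

    static const double table[8] = { 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0 };
    #define LOOKUP_BIAS (-16)          /* constant folded into the base once */

    int
    main (void)
    {
      uintptr_t base = (uintptr_t) table + LOOKUP_BIAS;  /* the one-time lea */
      uintptr_t idx = 16 + 3 * sizeof (double);  /* computed index carries +16 */
      double entry = *(const double *) (base + idx);     /* (%r8,%rdx)-style */
      printf ("%g\n", entry);                            /* table[3] == 2.0  */
      return 0;
    }
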
> > > +
> > > +#include <sysdep.h>
> > > +
> > > +        .text
> > > +     .section .text.avx2,"ax",@progbits
> > > +ENTRY(_ZGVdN4v_log1p_avx2)
> > > +        pushq     %rbp
> > > +        cfi_def_cfa_offset(16)
> > > +        movq      %rsp, %rbp
> > > +        cfi_def_cfa(6, 16)
> > > +        cfi_offset(6, -16)
> > > +        andq      $-32, %rsp
> > > +        subq      $96, %rsp
> > > +        lea       Table_Lookup_Bias+__svml_dlog1p_data_internal(%rip), %r8
> > > +
> > > +/* SgnMask used by all accuracies */
> > > +        vmovupd   SgnMask+__svml_dlog1p_data_internal(%rip), %ymm12
> > > +        vmovupd   One+__svml_dlog1p_data_internal(%rip), %ymm7
> > > +
> > > +/* 2^ (-10-exp(X) ) */
> > > +        vmovupd   ExpMask2+__svml_dlog1p_data_internal(%rip), %ymm3
> > > +        vmovapd   %ymm0, %ymm9
> > > +        vandpd    %ymm12, %ymm9, %ymm10
> > > +        vcmplt_oqpd XThreshold+__svml_dlog1p_data_internal(%rip), %ymm10, %ymm11
> > > +        vaddpd    %ymm7, %ymm9, %ymm13
> > > +
> > > +/* compute 1+x as high, low parts */
> > > +        vmaxpd    %ymm9, %ymm7, %ymm15
> > > +        vminpd    %ymm9, %ymm7, %ymm6
> > > +        vorpd     XhMask+__svml_dlog1p_data_internal(%rip), %ymm11, %ymm14
> > > +        vandpd    %ymm14, %ymm13, %ymm4
> > > +
> > > +/* preserve mantissa, set input exponent to 2^(-10) */
> > > +        vandpd    ExpMask+__svml_dlog1p_data_internal(%rip), %ymm4, %ymm5
> > > +        vorpd     Two10+__svml_dlog1p_data_internal(%rip), %ymm5, %ymm5
> > > +
> > > +/* reciprocal approximation good to at least 11 bits */
> > > +        vcvtpd2ps %ymm5, %xmm2
> > > +        vsubpd    %ymm4, %ymm15, %ymm0
> > > +
> > > +/* check range */
> > > +        vcmplt_oqpd MinLog1p+__svml_dlog1p_data_internal(%rip), %ymm9, %ymm15
> > > +        vrcpps    %xmm2, %xmm1
> > > +        vaddpd    %ymm0, %ymm6, %ymm6
> > > +        vcmpnle_uqpd MaxLog1p+__svml_dlog1p_data_internal(%rip), %ymm9, %ymm0
> > > +        vcvtps2pd %xmm1, %ymm11
> > > +
> > > +/* exponent of X needed to scale Xl */
> > > +        vandps    ExpMask0+__svml_dlog1p_data_internal(%rip), %ymm4, %ymm10
> > > +        vpsubq    %ymm10, %ymm3, %ymm13
> > > +
> > > +/* exponent bits */
> > > +        vpsrlq    $20, %ymm4, %ymm4
> > > +
> > > +/* round reciprocal to nearest integer, will have 1+9 mantissa bits */
> > > +        vroundpd  $0, %ymm11, %ymm3
> > > +
> > > +/* scale DblRcp */
> > > +        vmulpd    %ymm13, %ymm3, %ymm2
> > > +
> > > +/* exponent*log(2.0) */
> > > +        vmovupd   Threshold+__svml_dlog1p_data_internal(%rip), %ymm13
> > > +        vfmsub213pd %ymm7, %ymm3, %ymm5
> > > +
> > > +/* Compute SignMask for all accuracies, including EP */
> > > +        vandnpd   %ymm9, %ymm12, %ymm8
> > > +        vorpd     %ymm0, %ymm15, %ymm7
> > > +
> > > +/*
> > > + * prepare table index
> > > + * table lookup
> > > + */
> > > +        vpsrlq    $40, %ymm3, %ymm0
> > > +
> > > +/*
> > > + * argument reduction
> > > + * VQFMS( D, R, X, DblRcp1, One );
> > > + */
> > > +        vfmadd213pd %ymm5, %ymm2, %ymm6
> > > +        vmovupd   poly_coeff+64+__svml_dlog1p_data_internal(%rip), %ymm2
> > > +        vcmplt_oqpd %ymm3, %ymm13, %ymm3
> > > +        vmulpd    %ymm6, %ymm6, %ymm5
> > > +        vfmadd213pd poly_coeff+96+__svml_dlog1p_data_internal(%rip), %ymm6, %ymm2
> > > +
> > > +/* combine and get argument value range mask */
> > > +        vmovmskpd %ymm7, %eax
> > > +        vextractf128 $1, %ymm4, %xmm12
> > > +        vshufps   $221, %xmm12, %xmm4, %xmm14
> > > +
> > > +/* biased exponent in DP format */
> > > +        vcvtdq2pd %xmm14, %ymm1
> > > +        vandpd    Bias+__svml_dlog1p_data_internal(%rip), %ymm3, %ymm14
> > > +        vorpd     Bias1+__svml_dlog1p_data_internal(%rip), %ymm14, %ymm15
> > > +        vsubpd    %ymm15, %ymm1, %ymm1
> > > +        vmulpd    L2+__svml_dlog1p_data_internal(%rip), %ymm1, %ymm3
> > > +
> > > +/* polynomial */
> > > +        vmovupd   poly_coeff+__svml_dlog1p_data_internal(%rip), %ymm1
> > > +        vfmadd213pd poly_coeff+32+__svml_dlog1p_data_internal(%rip), %ymm6, %ymm1
> > > +        vfmadd213pd %ymm2, %ymm5, %ymm1
> > > +
> > > +/* reconstruction */
> > > +        vfmadd213pd %ymm6, %ymm5, %ymm1
> > > +        vextractf128 $1, %ymm0, %xmm10
> > > +        vmovd     %xmm0, %edx
> > > +        vmovd     %xmm10, %esi
> > > +        movslq    %edx, %rdx
> > > +        vpextrd   $2, %xmm0, %ecx
> > > +        movslq    %esi, %rsi
> > > +        vpextrd   $2, %xmm10, %edi
> > > +        movslq    %ecx, %rcx
> > > +        movslq    %edi, %rdi
> > > +        vmovsd    (%r8,%rdx), %xmm4
> > > +        vmovsd    (%r8,%rsi), %xmm11
> > > +        vmovhpd   (%r8,%rcx), %xmm4, %xmm7
> > > +        vmovhpd   (%r8,%rdi), %xmm11, %xmm12
> > > +        vinsertf128 $1, %xmm12, %ymm7, %ymm0
> > > +        vaddpd    %ymm1, %ymm0, %ymm6
> > > +        vaddpd    %ymm6, %ymm3, %ymm0
> > > +
> > > +/* OR in the Sign of input argument to produce correct log1p(-0) */
> > > +        vorpd     %ymm8, %ymm0, %ymm0
> > > +        testl     %eax, %eax
> > > +
> > > +/* Go to special inputs processing branch */
> > > +        jne       L(SPECIAL_VALUES_BRANCH)
> > > +                                # LOE rbx r12 r13 r14 r15 eax ymm0 ymm9
> > > +
> > > +/* Restore registers
> > > + * and exit the function
> > > + */
> > > +
> > > +L(EXIT):
> > > +        movq      %rbp, %rsp
> > > +        popq      %rbp
> > > +        cfi_def_cfa(7, 8)
> > > +        cfi_restore(6)
> > > +        ret
> > > +        cfi_def_cfa(6, 16)
> > > +        cfi_offset(6, -16)
> > > +
> > > +/* Branch to process
> > > + * special inputs
> > > + */
> > > +
> > > +L(SPECIAL_VALUES_BRANCH):
> > > +        vmovupd   %ymm9, 32(%rsp)
> > > +        vmovupd   %ymm0, 64(%rsp)
> > > +                                # LOE rbx r12 r13 r14 r15 eax ymm0
> > > +
> > > +        xorl      %edx, %edx
> > > +                                # LOE rbx r12 r13 r14 r15 eax edx
> > > +
> > > +        vzeroupper
> > > +        movq      %r12, 16(%rsp)
> > > +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> > > +        movl      %edx, %r12d
> > > +        movq      %r13, 8(%rsp)
> > > +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> > > +        movl      %eax, %r13d
> > > +        movq      %r14, (%rsp)
> > > +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> > > +                                # LOE rbx r15 r12d r13d
> > > +
> > > +/* Range mask
> > > + * bits check
> > > + */
> > > +
> > > +L(RANGEMASK_CHECK):
> > > +        btl       %r12d, %r13d
> > > +
> > > +/* Call scalar math function */
> > > +        jc        L(SCALAR_MATH_CALL)
> > > +                                # LOE rbx r15 r12d r13d
> > > +
> > > +/* Special inputs
> > > + * processing loop
> > > + */
> > > +
> > > +L(SPECIAL_VALUES_LOOP):
> > > +        incl      %r12d
> > > +        cmpl      $4, %r12d
> > > +
> > > +/* Check bits in range mask */
> > > +        jl        L(RANGEMASK_CHECK)
> > > +                                # LOE rbx r15 r12d r13d
> > > +
> > > +        movq      16(%rsp), %r12
> > > +        cfi_restore(12)
> > > +        movq      8(%rsp), %r13
> > > +        cfi_restore(13)
> > > +        movq      (%rsp), %r14
> > > +        cfi_restore(14)
> > > +        vmovupd   64(%rsp), %ymm0
> > > +
> > > +/* Go to exit */
> > > +        jmp       L(EXIT)
> > > +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> > > +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> > > +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> > > +                                # LOE rbx r12 r13 r14 r15 ymm0
> > > +
> > > +/* Scalar math function call
> > > + * to process special input
> > > + */
> > > +
> > > +L(SCALAR_MATH_CALL):
> > > +        movl      %r12d, %r14d
> > > +        movsd     32(%rsp,%r14,8), %xmm0
> > > +        call      log1p@PLT
> > > +                                # LOE rbx r14 r15 r12d r13d xmm0
> > > +
> > > +        movsd     %xmm0, 64(%rsp,%r14,8)
> > > +
> > > +/* Process special inputs in loop */
> > > +        jmp       L(SPECIAL_VALUES_LOOP)
> > > +                                # LOE rbx r15 r12d r13d
> > > +END(_ZGVdN4v_log1p_avx2)
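
A note for readers (not part of the patch itself): the special-value path
above spills the original inputs to 32(%rsp) and the vector results to
64(%rsp), then walks the 4-bit range mask in %r13d and reprocesses each
flagged lane with the scalar log1p before reloading the result vector.
A minimal C sketch of that fallback logic, assuming 4 double lanes and
one mask bit per lane as in this AVX2 variant, would be:

  /* Sketch only: per-lane scalar fallback for special inputs.  */
  #include <math.h>

  static void
  log1p4_special_fallback (const double src[4], double dst[4], int mask)
  {
    for (int i = 0; i < 4; i++)
      if (mask & (1 << i))
        dst[i] = log1p (src[i]);  /* same scalar call as log1p@PLT above */
  }

The assembly keeps the loop index in %r12d and the mask in %r13d, both
callee-saved, so they survive the PLT call.
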
> > > +
> > > +        .section .rodata, "a"
> > > +        .align 32
> > > +
> > > +#ifdef __svml_dlog1p_data_internal_typedef
> > > +typedef unsigned int VUINT32;
> > > +typedef struct {
> > > +        __declspec(align(32)) VUINT32 Log_HA_table[(1<<10)+2][2];
> > > +        __declspec(align(32)) VUINT32 Log_LA_table[(1<<9)+1][2];
> > > +        __declspec(align(32)) VUINT32 poly_coeff[4][4][2];
> > > +        __declspec(align(32)) VUINT32 ExpMask[4][2];
> > > +        __declspec(align(32)) VUINT32 Two10[4][2];
> > > +        __declspec(align(32)) VUINT32 MinLog1p[4][2];
> > > +        __declspec(align(32)) VUINT32 MaxLog1p[4][2];
> > > +        __declspec(align(32)) VUINT32 One[4][2];
> > > +        __declspec(align(32)) VUINT32 SgnMask[4][2];
> > > +        __declspec(align(32)) VUINT32 XThreshold[4][2];
> > > +        __declspec(align(32)) VUINT32 XhMask[4][2];
> > > +        __declspec(align(32)) VUINT32 Threshold[4][2];
> > > +        __declspec(align(32)) VUINT32 Bias[4][2];
> > > +        __declspec(align(32)) VUINT32 Bias1[4][2];
> > > +        __declspec(align(32)) VUINT32 ExpMask0[4][2];
> > > +        __declspec(align(32)) VUINT32 ExpMask2[4][2];
> > > +        __declspec(align(32)) VUINT32 L2[4][2];
> > > +} __svml_dlog1p_data_internal;
> > > +#endif
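
A note for readers (not part of the patch itself): the typedef above is
documentation only — __svml_dlog1p_data_internal_typedef is never
defined — and uses the MSVC-style __declspec(align(32)) spelling, so it
is not meant to compile here.  A rough standard-C paraphrase of the same
layout (the alignas spelling, the stand-in type name, and the collapsing
of each VUINT32[2] pair into one 64-bit word are mine) is:

  #include <stdalign.h>
  #include <stdint.h>

  typedef struct
  {
    alignas (32) uint64_t Log_HA_table[(1 << 10) + 2];
    alignas (32) uint64_t Log_LA_table[(1 << 9) + 1];
    alignas (32) uint64_t poly_coeff[4][4]; /* coeff4..coeff1, 4 lanes each */
    alignas (32) uint64_t ExpMask[4];
    alignas (32) uint64_t Two10[4];
    alignas (32) uint64_t MinLog1p[4];
    alignas (32) uint64_t MaxLog1p[4];
    alignas (32) uint64_t One[4];
    alignas (32) uint64_t SgnMask[4];
    alignas (32) uint64_t XThreshold[4];
    alignas (32) uint64_t XhMask[4];
    alignas (32) uint64_t Threshold[4];
    alignas (32) uint64_t Bias[4];
    alignas (32) uint64_t Bias1[4];
    alignas (32) uint64_t ExpMask0[4];
    alignas (32) uint64_t ExpMask2[4];
    alignas (32) uint64_t L2[4];
  } dlog1p_data_t;
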
> > > +__svml_dlog1p_data_internal:
> > > +        /*== Log_HA_table ==*/
> > > +        .quad 0xc086232bdd7a8300, 0xbe1ce91eef3fb100
> > > +        .quad 0xc086232fdc7ad828, 0xbe1cefcffda73b6a
> > > +        .quad 0xc0862333d97d2ba0, 0xbe1cef406748f1ff
> > > +        .quad 0xc0862337d48378e0, 0xbe1cef2a9429925a
> > > +        .quad 0xc086233bcd8fb878, 0xbe1cf138d17ebecb
> > > +        .quad 0xc086233fc4a3e018, 0xbe1ceff2dbbbb29e
> > > +        .quad 0xc0862343b9c1e270, 0xbe1cf1a42aae437b
> > > +        .quad 0xc0862347acebaf68, 0xbe1cef3b152048af
> > > +        .quad 0xc086234b9e2333f0, 0xbe1cef20e127805e
> > > +        .quad 0xc086234f8d6a5a30, 0xbe1cf00ad6052cf4
> > > +        .quad 0xc08623537ac30980, 0xbe1cefc4642ee597
> > > +        .quad 0xc0862357662f2660, 0xbe1cf1f277d36e16
> > > +        .quad 0xc086235b4fb092a0, 0xbe1ceed009e8d8e6
> > > +        .quad 0xc086235f37492d28, 0xbe1cf1e4038cb362
> > > +        .quad 0xc08623631cfad250, 0xbe1cf0b0873b8557
> > > +        .quad 0xc086236700c75b98, 0xbe1cf15bb3227c0b
> > > +        .quad 0xc086236ae2b09fe0, 0xbe1cf151ef8ca9ed
> > > +        .quad 0xc086236ec2b87358, 0xbe1cefe1dc2cd2ed
> > > +        .quad 0xc0862372a0e0a780, 0xbe1cf0d1eec5454f
> > > +        .quad 0xc08623767d2b0b48, 0xbe1ceeefd570bbce
> > > +        .quad 0xc086237a57996af0, 0xbe1cee99ae91b3a7
> > > +        .quad 0xc086237e302d9028, 0xbe1cf0412830fbd1
> > > +        .quad 0xc086238206e94218, 0xbe1ceee898588610
> > > +        .quad 0xc0862385dbce4548, 0xbe1cee9a1fbcaaea
> > > +        .quad 0xc0862389aede5bc0, 0xbe1ceed8e7cc1ad6
> > > +        .quad 0xc086238d801b4500, 0xbe1cf10c8d059da6
> > > +        .quad 0xc08623914f86be18, 0xbe1ceee6c63a8165
> > > +        .quad 0xc08623951d228180, 0xbe1cf0c3592d2ff1
> > > +        .quad 0xc0862398e8f04758, 0xbe1cf0026cc4cb1b
> > > +        .quad 0xc086239cb2f1c538, 0xbe1cf15d48d8e670
> > > +        .quad 0xc08623a07b28ae60, 0xbe1cef359363787c
> > > +        .quad 0xc08623a44196b390, 0xbe1cefdf1ab2e82c
> > > +        .quad 0xc08623a8063d8338, 0xbe1cefe43c02aa84
> > > +        .quad 0xc08623abc91ec960, 0xbe1cf044f5ae35b7
> > > +        .quad 0xc08623af8a3c2fb8, 0xbe1cf0b0b4001e1b
> > > +        .quad 0xc08623b349975d98, 0xbe1cf1bae76dfbcf
> > > +        .quad 0xc08623b70731f810, 0xbe1cef0a72e13a62
> > > +        .quad 0xc08623bac30da1c8, 0xbe1cf184007d2b6b
> > > +        .quad 0xc08623be7d2bfb40, 0xbe1cf16f4b239e98
> > > +        .quad 0xc08623c2358ea2a0, 0xbe1cf0976acada87
> > > +        .quad 0xc08623c5ec3733d0, 0xbe1cf066318a16ff
> > > +        .quad 0xc08623c9a1274880, 0xbe1ceffaa7148798
> > > +        .quad 0xc08623cd54607820, 0xbe1cf23ab02e9b6e
> > > +        .quad 0xc08623d105e45800, 0xbe1cefdfef7d4fde
> > > +        .quad 0xc08623d4b5b47b20, 0xbe1cf17fece44f2b
> > > +        .quad 0xc08623d863d27270, 0xbe1cf18f907d0d7c
> > > +        .quad 0xc08623dc103fccb0, 0xbe1cee61fe072c98
> > > +        .quad 0xc08623dfbafe1668, 0xbe1cf022dd891e2f
> > > +        .quad 0xc08623e3640eda20, 0xbe1ceecc1daf4358
> > > +        .quad 0xc08623e70b73a028, 0xbe1cf0173c4fa380
> > > +        .quad 0xc08623eab12deec8, 0xbe1cf16a2150c2f4
> > > +        .quad 0xc08623ee553f4a30, 0xbe1cf1bf980b1f4b
> > > +        .quad 0xc08623f1f7a93480, 0xbe1cef8b731663c2
> > > +        .quad 0xc08623f5986d2dc0, 0xbe1cee9a664d7ef4
> > > +        .quad 0xc08623f9378cb3f0, 0xbe1cf1eda2af6400
> > > +        .quad 0xc08623fcd5094320, 0xbe1cf1923f9d68d7
> > > +        .quad 0xc086240070e45548, 0xbe1cf0747cd3e03a
> > > +        .quad 0xc08624040b1f6260, 0xbe1cf22ee855bd6d
> > > +        .quad 0xc0862407a3bbe078, 0xbe1cf0d57360c00b
> > > +        .quad 0xc086240b3abb4398, 0xbe1ceebc815cd575
> > > +        .quad 0xc086240ed01efdd0, 0xbe1cf03bfb970951
> > > +        .quad 0xc086241263e87f50, 0xbe1cf16e74768529
> > > +        .quad 0xc0862415f6193658, 0xbe1cefec64b8becb
> > > +        .quad 0xc086241986b28f30, 0xbe1cf0838d210baa
> > > +        .quad 0xc086241d15b5f448, 0xbe1cf0ea86e75b11
> > > +        .quad 0xc0862420a324ce28, 0xbe1cf1708d11d805
> > > +        .quad 0xc08624242f008380, 0xbe1ceea988c5a417
> > > +        .quad 0xc0862427b94a7910, 0xbe1cef166a7bbca5
> > > +        .quad 0xc086242b420411d0, 0xbe1cf0c9d9e86a38
> > > +        .quad 0xc086242ec92eaee8, 0xbe1cef0946455411
> > > +        .quad 0xc08624324ecbaf98, 0xbe1cefea60907739
> > > +        .quad 0xc0862435d2dc7160, 0xbe1cf1ed0934ce42
> > > +        .quad 0xc086243955624ff8, 0xbe1cf191ba746c7d
> > > +        .quad 0xc086243cd65ea548, 0xbe1ceeec78cf2a7e
> > > +        .quad 0xc086244055d2c968, 0xbe1cef345284c119
> > > +        .quad 0xc0862443d3c012b8, 0xbe1cf24f77355219
> > > +        .quad 0xc08624475027d5e8, 0xbe1cf05bf087e114
> > > +        .quad 0xc086244acb0b65d0, 0xbe1cef3504a32189
> > > +        .quad 0xc086244e446c1398, 0xbe1ceff54b2a406f
> > > +        .quad 0xc0862451bc4b2eb8, 0xbe1cf0757d54ed4f
> > > +        .quad 0xc086245532aa04f0, 0xbe1cf0c8099fdfd5
> > > +        .quad 0xc0862458a789e250, 0xbe1cf0b173796a31
> > > +        .quad 0xc086245c1aec1138, 0xbe1cf11d8734540d
> > > +        .quad 0xc086245f8cd1da60, 0xbe1cf1916a723ceb
> > > +        .quad 0xc0862462fd3c84d8, 0xbe1cf19a911e1da7
> > > +        .quad 0xc08624666c2d5608, 0xbe1cf23a9ef72e4f
> > > +        .quad 0xc0862469d9a591c0, 0xbe1cef503d947663
> > > +        .quad 0xc086246d45a67a18, 0xbe1cf0fceeb1a0b2
> > > +        .quad 0xc0862470b0314fa8, 0xbe1cf107e27e4fbc
> > > +        .quad 0xc086247419475160, 0xbe1cf03dd9922331
> > > +        .quad 0xc086247780e9bc98, 0xbe1cefce1a10e129
> > > +        .quad 0xc086247ae719cd18, 0xbe1ceea47f73c4f6
> > > +        .quad 0xc086247e4bd8bd10, 0xbe1ceec0ac56d100
> > > +        .quad 0xc0862481af27c528, 0xbe1cee8a6593278a
> > > +        .quad 0xc086248511081c70, 0xbe1cf2231dd9dec7
> > > +        .quad 0xc0862488717af888, 0xbe1cf0b4b8ed7da8
> > > +        .quad 0xc086248bd0818d68, 0xbe1cf1bd8d835002
> > > +        .quad 0xc086248f2e1d0d98, 0xbe1cf259acc107f4
> > > +        .quad 0xc08624928a4eaa20, 0xbe1cee897636b00c
> > > +        .quad 0xc0862495e5179270, 0xbe1cee757f20c326
> > > +        .quad 0xc08624993e78f490, 0xbe1cefafd3aa54a4
> > > +        .quad 0xc086249c9673fd10, 0xbe1cee7298d38b97
> > > +        .quad 0xc086249fed09d6f8, 0xbe1ceedc158d4ceb
> > > +        .quad 0xc08624a3423babe0, 0xbe1cf2282987cb2e
> > > +        .quad 0xc08624a6960aa400, 0xbe1cefe7381ecc4b
> > > +        .quad 0xc08624a9e877e600, 0xbe1cef328dbbce80
> > > +        .quad 0xc08624ad39849728, 0xbe1cefde45f3cc71
> > > +        .quad 0xc08624b08931db58, 0xbe1cefa8b89433b9
> > > +        .quad 0xc08624b3d780d500, 0xbe1cef6773c0b139
> > > +        .quad 0xc08624b72472a528, 0xbe1cf031c931c11f
> > > +        .quad 0xc08624ba70086b78, 0xbe1cf088f49275e7
> > > +        .quad 0xc08624bdba434630, 0xbe1cf17de0eaa86d
> > > +        .quad 0xc08624c103245238, 0xbe1cefd492f1ba75
> > > +        .quad 0xc08624c44aacab08, 0xbe1cf1253e154466
> > > +        .quad 0xc08624c790dd6ad0, 0xbe1cf0fb09ee6d55
> > > +        .quad 0xc08624cad5b7aa58, 0xbe1cf1f08dd048fe
> > > +        .quad 0xc08624ce193c8120, 0xbe1ceeca0809697f
> > > +        .quad 0xc08624d15b6d0538, 0xbe1cef8d5662d968
> > > +        .quad 0xc08624d49c4a4b78, 0xbe1cee97b556ed78
> > > +        .quad 0xc08624d7dbd56750, 0xbe1cf1b14b6acb75
> > > +        .quad 0xc08624db1a0f6b00, 0xbe1cef1e860623f2
> > > +        .quad 0xc08624de56f96758, 0xbe1ceeaf4d156f3d
> > > +        .quad 0xc08624e192946bf0, 0xbe1ceecc12b400ed
> > > +        .quad 0xc08624e4cce18710, 0xbe1cf180c40c794f
> > > +        .quad 0xc08624e805e1c5c8, 0xbe1cf185a08f7f65
> > > +        .quad 0xc08624eb3d9633d8, 0xbe1cef45fc924078
> > > +        .quad 0xc08624ee73ffdbb0, 0xbe1cf1e4f457f32a
> > > +        .quad 0xc08624f1a91fc6a0, 0xbe1cf040147b8a5a
> > > +        .quad 0xc08624f4dcf6fc98, 0xbe1cf1effca0dfb2
> > > +        .quad 0xc08624f80f868468, 0xbe1cf0470146e5bc
> > > +        .quad 0xc08624fb40cf6390, 0xbe1cef4dd186e501
> > > +        .quad 0xc08624fe70d29e60, 0xbe1ceebe257f66c7
> > > +        .quad 0xc08625019f9137f0, 0xbe1ceefb7a1c395c
> > > +        .quad 0xc0862504cd0c3220, 0xbe1cf209dedfed8c
> > > +        .quad 0xc0862507f9448db0, 0xbe1cf082da464994
> > > +        .quad 0xc086250b243b4a18, 0xbe1cee88694a73cf
> > > +        .quad 0xc086250e4df165a0, 0xbe1cf0b61e8f0531
> > > +        .quad 0xc08625117667dd78, 0xbe1cf1106599c962
> > > +        .quad 0xc08625149d9fad98, 0xbe1ceff1ee88af1f
> > > +        .quad 0xc0862517c399d0c8, 0xbe1cf0f746994ef6
> > > +        .quad 0xc086251ae85740b8, 0xbe1cefe8a1d077e4
> > > +        .quad 0xc086251e0bd8f5e0, 0xbe1cf1a1da036092
> > > +        .quad 0xc08625212e1fe7a8, 0xbe1cf0f8a7786fcd
> > > +        .quad 0xc08625244f2d0c48, 0xbe1cefa1174a07a7
> > > +        .quad 0xc08625276f0158d8, 0xbe1cef1043aa5b25
> > > +        .quad 0xc086252a8d9dc150, 0xbe1cf15d521c169d
> > > +        .quad 0xc086252dab033898, 0xbe1cf220bba8861f
> > > +        .quad 0xc0862530c732b078, 0xbe1cef51e310eae2
> > > +        .quad 0xc0862533e22d1988, 0xbe1cf222fcedd8ae
> > > +        .quad 0xc0862536fbf36370, 0xbe1cefdb4da4bda8
> > > +        .quad 0xc086253a14867ca0, 0xbe1ceeafc1112171
> > > +        .quad 0xc086253d2be75280, 0xbe1cee99dfb4b408
> > > +        .quad 0xc08625404216d160, 0xbe1cf22d2536f06b
> > > +        .quad 0xc08625435715e498, 0xbe1cef6abbf2e268
> > > +        .quad 0xc08625466ae57648, 0xbe1cf093a14789f5
> > > +        .quad 0xc08625497d866fa0, 0xbe1cf0f93655603c
> > > +        .quad 0xc086254c8ef9b8b8, 0xbe1cf1cc40c9aafc
> > > +        .quad 0xc086254f9f4038a8, 0xbe1ceeea5f4e9157
> > > +        .quad 0xc0862552ae5ad568, 0xbe1cefa9f52d4997
> > > +        .quad 0xc0862555bc4a7400, 0xbe1cefa490a638ff
> > > +        .quad 0xc0862558c90ff868, 0xbe1cef7fcf797d6f
> > > +        .quad 0xc086255bd4ac4590, 0xbe1cf1b4c51113c9
> > > +        .quad 0xc086255edf203d78, 0xbe1cef55e5b4a55d
> > > +        .quad 0xc0862561e86cc100, 0xbe1cf0d37a25f9dc
> > > +        .quad 0xc0862564f092b028, 0xbe1ceebe9efc19d9
> > > +        .quad 0xc0862567f792e9d8, 0xbe1cee8ad30a57b5
> > > +        .quad 0xc086256afd6e4c08, 0xbe1cef4e1817b90b
> > > +        .quad 0xc086256e0225b3b8, 0xbe1cee7fa9229996
> > > +        .quad 0xc086257105b9fce0, 0xbe1cf0b54963d945
> > > +        .quad 0xc0862574082c0298, 0xbe1cee5f2f3c7995
> > > +        .quad 0xc0862577097c9ee0, 0xbe1cf0828e303a2c
> > > +        .quad 0xc086257a09acaae0, 0xbe1cf172c3078947
> > > +        .quad 0xc086257d08bcfec0, 0xbe1cf189252afa22
> > > +        .quad 0xc086258006ae71b8, 0xbe1cefdb80426923
> > > +        .quad 0xc08625830381da08, 0xbe1ceef1391a0372
> > > +        .quad 0xc0862585ff380d00, 0xbe1cf17720c78d13
> > > +        .quad 0xc0862588f9d1df18, 0xbe1ceef1f9027d83
> > > +        .quad 0xc086258bf35023b8, 0xbe1cf06fac99dec9
> > > +        .quad 0xc086258eebb3ad78, 0xbe1cf1373eeb45c0
> > > +        .quad 0xc0862591e2fd4e00, 0xbe1cef777536bb81
> > > +        .quad 0xc0862594d92dd600, 0xbe1cf0f43ca40766
> > > +        .quad 0xc0862597ce461558, 0xbe1cefb2cfc6766b
> > > +        .quad 0xc086259ac246daf0, 0xbe1ceea49e64ffa2
> > > +        .quad 0xc086259db530f4c8, 0xbe1cf250fa457dec
> > > +        .quad 0xc08625a0a7053018, 0xbe1cf17d8bb2a44e
> > > +        .quad 0xc08625a397c45918, 0xbe1cf1d5906d54b7
> > > +        .quad 0xc08625a6876f3b30, 0xbe1cf08fe7b31780
> > > +        .quad 0xc08625a97606a0e0, 0xbe1cef13edfc9d11
> > > +        .quad 0xc08625ac638b53c8, 0xbe1cef9d2b107219
> > > +        .quad 0xc08625af4ffe1cb0, 0xbe1cf1ddd4ff6160
> > > +        .quad 0xc08625b23b5fc390, 0xbe1cefa02a996495
> > > +        .quad 0xc08625b525b10f68, 0xbe1cf166a7e37ee5
> > > +        .quad 0xc08625b80ef2c680, 0xbe1cef0b171068a5
> > > +        .quad 0xc08625baf725ae28, 0xbe1cf05c80779283
> > > +        .quad 0xc08625bdde4a8af0, 0xbe1cf1bbfbffb889
> > > +        .quad 0xc08625c0c4622090, 0xbe1cf0b8666c0124
> > > +        .quad 0xc08625c3a96d31e0, 0xbe1cf0a8fcf47a86
> > > +        .quad 0xc08625c68d6c80f0, 0xbe1cef46e18cb092
> > > +        .quad 0xc08625c97060cef0, 0xbe1cf1458a350efb
> > > +        .quad 0xc08625cc524adc58, 0xbe1ceeea1dadce12
> > > +        .quad 0xc08625cf332b68b0, 0xbe1cf0a1bfdc44c7
> > > +        .quad 0xc08625d2130332d0, 0xbe1cef96d02da73e
> > > +        .quad 0xc08625d4f1d2f8a8, 0xbe1cf2451c3c7701
> > > +        .quad 0xc08625d7cf9b7778, 0xbe1cf10d08f83812
> > > +        .quad 0xc08625daac5d6ba0, 0xbe1ceec5b4895c5e
> > > +        .quad 0xc08625dd881990b0, 0xbe1cf14e1325c5e4
> > > +        .quad 0xc08625e062d0a188, 0xbe1cf21d0904be12
> > > +        .quad 0xc08625e33c835838, 0xbe1ceed0839bcf21
> > > +        .quad 0xc08625e615326df0, 0xbe1cf1bb944889d2
> > > +        .quad 0xc08625e8ecde9b48, 0xbe1cee738e85eece
> > > +        .quad 0xc08625ebc38897e0, 0xbe1cf25c2bc6ef12
> > > +        .quad 0xc08625ee99311ac8, 0xbe1cf132b70a41ad
> > > +        .quad 0xc08625f16dd8da28, 0xbe1cf1984236a6e3
> > > +        .quad 0xc08625f441808b78, 0xbe1cf19ae74998f9
> > > +        .quad 0xc08625f71428e370, 0xbe1cef3e175d61a1
> > > +        .quad 0xc08625f9e5d295f8, 0xbe1cf101f9868fd9
> > > +        .quad 0xc08625fcb67e5658, 0xbe1cee69db83dcd2
> > > +        .quad 0xc08625ff862cd6f8, 0xbe1cf081b636af51
> > > +        .quad 0xc086260254dec9a8, 0xbe1cee62c7d59b3e
> > > +        .quad 0xc08626052294df58, 0xbe1cf1b745c57716
> > > +        .quad 0xc0862607ef4fc868, 0xbe1cef3d2800ea23
> > > +        .quad 0xc086260abb103458, 0xbe1cef480ff1acd2
> > > +        .quad 0xc086260d85d6d200, 0xbe1cf2424c9a17ef
> > > +        .quad 0xc08626104fa44f90, 0xbe1cf12cfde90fd5
> > > +        .quad 0xc086261318795a68, 0xbe1cf21f590dd5b6
> > > +        .quad 0xc0862615e0569f48, 0xbe1cf0c50f9cd28a
> > > +        .quad 0xc0862618a73cca30, 0xbe1ceedbdb520545
> > > +        .quad 0xc086261b6d2c8668, 0xbe1cf0b030396011
> > > +        .quad 0xc086261e32267e98, 0xbe1cf19917010e96
> > > +        .quad 0xc0862620f62b5cb0, 0xbe1cf07331355985
> > > +        .quad 0xc0862623b93bc9e8, 0xbe1cf01ae921a1c3
> > > +        .quad 0xc08626267b586ed0, 0xbe1cefe5cf0dbf0c
> > > +        .quad 0xc08626293c81f348, 0xbe1cf01b258aeb50
> > > +        .quad 0xc086262bfcb8fe88, 0xbe1cee6b9e7f4c68
> > > +        .quad 0xc086262ebbfe3710, 0xbe1cee684a9b21c9
> > > +        .quad 0xc08626317a5242b8, 0xbe1cf1f8bcde9a8b
> > > +        .quad 0xc086263437b5c6c0, 0xbe1cf1d063d36238
> > > +        .quad 0xc0862636f42967a8, 0xbe1cf1e31a19075e
> > > +        .quad 0xc0862639afadc950, 0xbe1cf1d8efdf7e7d
> > > +        .quad 0xc086263c6a438ef0, 0xbe1cf1812ee72dba
> > > +        .quad 0xc086263f23eb5b18, 0xbe1cf1449a9a2279
> > > +        .quad 0xc0862641dca5cfb8, 0xbe1cee96edce5085
> > > +        .quad 0xc086264494738e08, 0xbe1cf06797bd03b2
> > > +        .quad 0xc08626474b5536b8, 0xbe1cef91b9b7ffc1
> > > +        .quad 0xc086264a014b69c0, 0xbe1cef4b6721278f
> > > +        .quad 0xc086264cb656c678, 0xbe1cf1942925eb4a
> > > +        .quad 0xc086264f6a77eba8, 0xbe1cefa2c7bc2e39
> > > +        .quad 0xc08626521daf7758, 0xbe1cf252595aceb3
> > > +        .quad 0xc0862654cffe0718, 0xbe1cee8e9ae47ec2
> > > +        .quad 0xc0862657816437a8, 0xbe1cf1bf913828fa
> > > +        .quad 0xc086265a31e2a558, 0xbe1cf23475d6b366
> > > +        .quad 0xc086265ce179ebc8, 0xbe1cef8df00a922b
> > > +        .quad 0xc086265f902aa5f0, 0xbe1cef279bfa43e0
> > > +        .quad 0xc08626623df56e38, 0xbe1cf080e10b8365
> > > +        .quad 0xc0862664eadade70, 0xbe1cf1a518f9b544
> > > +        .quad 0xc086266796db8fd0, 0xbe1cef9308fed9e9
> > > +        .quad 0xc086266a41f81ae8, 0xbe1ceea3ae6b19c9
> > > +        .quad 0xc086266cec3117b8, 0xbe1ceef06003d4c2
> > > +        .quad 0xc086266f95871da8, 0xbe1cf0b8457ffb0c
> > > +        .quad 0xc08626723dfac390, 0xbe1cf0c526745ad6
> > > +        .quad 0xc0862674e58c9fa8, 0xbe1cf0cf91ff7b5d
> > > +        .quad 0xc08626778c3d4798, 0xbe1cefe260819380
> > > +        .quad 0xc086267a320d5070, 0xbe1ceebd90aa27a3
> > > +        .quad 0xc086267cd6fd4ea8, 0xbe1cf0388121dffa
> > > +        .quad 0xc086267f7b0dd630, 0xbe1cf1a3881435f1
> > > +        .quad 0xc08626821e3f7a68, 0xbe1cef28e9d9ac52
> > > +        .quad 0xc0862684c092ce08, 0xbe1cf02d300062dd
> > > +        .quad 0xc086268762086350, 0xbe1cefaee1edfa35
> > > +        .quad 0xc086268a02a0cbe0, 0xbe1cf0a5a052e936
> > > +        .quad 0xc086268ca25c98d8, 0xbe1cee60a4a497ed
> > > +        .quad 0xc086268f413c5ab0, 0xbe1cf0e4a5d0cf49
> > > +        .quad 0xc0862691df40a170, 0xbe1cf149235a4e6e
> > > +        .quad 0xc08626947c69fc80, 0xbe1cf215180b9fcc
> > > +        .quad 0xc086269718b8fac8, 0xbe1cef9b156a9840
> > > +        .quad 0xc0862699b42e2a90, 0xbe1cf054c91441be
> > > +        .quad 0xc086269c4eca19a8, 0xbe1cf13ded26512c
> > > +        .quad 0xc086269ee88d5550, 0xbe1cf22ea4d8ac06
> > > +        .quad 0xc08626a181786a40, 0xbe1cf2354666ee2e
> > > +        .quad 0xc08626a4198be4a8, 0xbe1cefef936752b3
> > > +        .quad 0xc08626a6b0c85020, 0xbe1cf1e360a9db68
> > > +        .quad 0xc08626a9472e37d8, 0xbe1ceed6aeb812c5
> > > +        .quad 0xc08626abdcbe2650, 0xbe1cf227340b4986
> > > +        .quad 0xc08626ae7178a5b0, 0xbe1cf0215a0cbe0d
> > > +        .quad 0xc08626b1055e3f70, 0xbe1cf256adf0ae26
> > > +        .quad 0xc08626b3986f7ca8, 0xbe1ceff3c67aed06
> > > +        .quad 0xc08626b62aace5c8, 0xbe1cf2159fb93652
> > > +        .quad 0xc08626b8bc1702e0, 0xbe1cf01e6dbd1c7f
> > > +        .quad 0xc08626bb4cae5b60, 0xbe1cf009e75d1c0c
> > > +        .quad 0xc08626bddc737648, 0xbe1ceec10a020e73
> > > +        .quad 0xc08626c06b66da08, 0xbe1cf06d5783eee7
> > > +        .quad 0xc08626c2f9890ca0, 0xbe1cf0cb8f169ffe
> > > +        .quad 0xc08626c586da9388, 0xbe1cef7de2452430
> > > +        .quad 0xc08626c8135bf3b0, 0xbe1cf05da6f783ae
> > > +        .quad 0xc08626ca9f0db198, 0xbe1cefcc877d681d
> > > +        .quad 0xc08626cd29f05138, 0xbe1cef0531954ab3
> > > +        .quad 0xc08626cfb4045608, 0xbe1cf06b8565ea3d
> > > +        .quad 0xc08626d23d4a4310, 0xbe1cefdc455d9d7e
> > > +        .quad 0xc08626d4c5c29ad0, 0xbe1ceefc47e8fa64
> > > +        .quad 0xc08626d74d6ddf48, 0xbe1cf1872bf033f2
> > > +        .quad 0xc08626d9d44c9210, 0xbe1cf19d91087f9d
> > > +        .quad 0xc08626dc5a5f3438, 0xbe1cf012d444c6ab
> > > +        .quad 0xc08626dedfa64650, 0xbe1cf0ba528ee153
> > > +        .quad 0xc08626e164224880, 0xbe1ceeb431709788
> > > +        .quad 0xc08626e3e7d3ba60, 0xbe1cf0b9af31a6a5
> > > +        .quad 0xc08626e66abb1b28, 0xbe1cf168fb2e135b
> > > +        .quad 0xc08626e8ecd8e990, 0xbe1cef9097461c93
> > > +        .quad 0xc08626eb6e2da3d0, 0xbe1cee7a434735d8
> > > +        .quad 0xc08626edeeb9c7a8, 0xbe1cf235732b86f2
> > > +        .quad 0xc08626f06e7dd280, 0xbe1cefe1510b89e6
> > > +        .quad 0xc08626f2ed7a4120, 0xbe1cf1f64b9b80ef
> > > +        .quad 0xc08626f56baf9000, 0xbe1cf08f320ca339
> > > +        .quad 0xc08626f7e91e3b08, 0xbe1cf1b1de2808a1
> > > +        .quad 0xc08626fa65c6bdc0, 0xbe1cf1976d778b28
> > > +        .quad 0xc08626fce1a99338, 0xbe1ceef40a4f076f
> > > +        .quad 0xc08626ff5cc73600, 0xbe1cef3e45869ce3
> > > +        .quad 0xc0862701d7202048, 0xbe1ceef601b4c9d6
> > > +        .quad 0xc086270450b4cbc0, 0xbe1cf1eaf0b57fd6
> > > +        .quad 0xc0862706c985b1c0, 0xbe1cef82a44990f3
> > > +        .quad 0xc086270941934b10, 0xbe1ceefe32981f2c
> > > +        .quad 0xc086270bb8de1018, 0xbe1cefbf6f5a0445
> > > +        .quad 0xc086270e2f6678d0, 0xbe1cf18dba75792c
> > > +        .quad 0xc0862710a52cfcc8, 0xbe1cf0da64ce995f
> > > +        .quad 0xc08627131a321318, 0xbe1cef04ac0fb802
> > > +        .quad 0xc08627158e763268, 0xbe1cee9d4e2ad9bd
> > > +        .quad 0xc086271801f9d0f8, 0xbe1cefa9b55407b5
> > > +        .quad 0xc086271a74bd64a0, 0xbe1cefe6bd329570
> > > +        .quad 0xc086271ce6c162c8, 0xbe1cef0b1205dc85
> > > +        .quad 0xc086271f58064068, 0xbe1cef092a785e3f
> > > +        .quad 0xc0862721c88c7210, 0xbe1cf050dcdaac30
> > > +        .quad 0xc086272438546be8, 0xbe1cf210907ded8b
> > > +        .quad 0xc0862726a75ea1b8, 0xbe1cee760be44f99
> > > +        .quad 0xc086272915ab86c0, 0xbe1ceeeee07c2bcc
> > > +        .quad 0xc086272b833b8df0, 0xbe1cf06874992df5
> > > +        .quad 0xc086272df00f29d0, 0xbe1cef8fac5d4899
> > > +        .quad 0xc08627305c26cc70, 0xbe1cf1103241cc99
> > > +        .quad 0xc0862732c782e788, 0xbe1cf1d35fef83fe
> > > +        .quad 0xc08627353223ec68, 0xbe1cef3ec8133e1d
> > > +        .quad 0xc08627379c0a4be8, 0xbe1cef7261daccd8
> > > +        .quad 0xc086273a05367688, 0xbe1cf18656c50806
> > > +        .quad 0xc086273c6da8dc68, 0xbe1cf1c8736e049a
> > > +        .quad 0xc086273ed561ed38, 0xbe1cf1f93bff4911
> > > +        .quad 0xc08627413c621848, 0xbe1cf188a4ea680c
> > > +        .quad 0xc0862743a2a9cc80, 0xbe1cf1d270930c80
> > > +        .quad 0xc086274608397868, 0xbe1cf25a328c28e2
> > > +        .quad 0xc08627486d118a28, 0xbe1cf106f90aa3b8
> > > +        .quad 0xc086274ad1326f80, 0xbe1cee5e9d2e885a
> > > +        .quad 0xc086274d349c95c0, 0xbe1cf1c0bac27228
> > > +        .quad 0xc086274f975069f8, 0xbe1cf1a1500f9b1c
> > > +        .quad 0xc0862751f94e58c0, 0xbe1cefc30663ac44
> > > +        .quad 0xc08627545a96ce48, 0xbe1cf17123e427a2
> > > +        .quad 0xc0862756bb2a3678, 0xbe1cefb92749fea4
> > > +        .quad 0xc08627591b08fcc0, 0xbe1cefa40e1ea74a
> > > +        .quad 0xc086275b7a338c40, 0xbe1cee6f4612c3e9
> > > +        .quad 0xc086275dd8aa4fa8, 0xbe1cf1c54a053627
> > > +        .quad 0xc0862760366db168, 0xbe1ceff5eb503d9e
> > > +        .quad 0xc0862762937e1b70, 0xbe1cf02e47f10cee
> > > +        .quad 0xc0862764efdbf768, 0xbe1ceeb06e1d0dad
> > > +        .quad 0xc08627674b87ae88, 0xbe1cf10aadd6dba5
> > > +        .quad 0xc0862769a681a9c0, 0xbe1cf24e9913d30f
> > > +        .quad 0xc086276c00ca51a0, 0xbe1cef47b301e312
> > > +        .quad 0xc086276e5a620e48, 0xbe1ceeb1cefc2e85
> > > +        .quad 0xc0862770b3494788, 0xbe1cf16f1fbbe011
> > > +        .quad 0xc08627730b8064e8, 0xbe1ceebdf75174c7
> > > +        .quad 0xc08627756307cd70, 0xbe1cf06e3871a0da
> > > +        .quad 0xc0862777b9dfe7f0, 0xbe1cef16799fd554
> > > +        .quad 0xc086277a10091ac0, 0xbe1cf248dabf5377
> > > +        .quad 0xc086277c6583cc00, 0xbe1cf0c78d92a2cd
> > > +        .quad 0xc086277eba506158, 0xbe1cf0b911b029f0
> > > +        .quad 0xc08627810e6f4028, 0xbe1cefdc24719766
> > > +        .quad 0xc086278361e0cd70, 0xbe1cefbb6562b7e7
> > > +        .quad 0xc0862785b4a56dd8, 0xbe1cf1e0afb349ec
> > > +        .quad 0xc086278806bd85c0, 0xbe1cf008292e52fc
> > > +        .quad 0xc086278a58297918, 0xbe1cf053073872bf
> > > +        .quad 0xc086278ca8e9ab88, 0xbe1cf17a0a55a947
> > > +        .quad 0xc086278ef8fe8068, 0xbe1ceeffb0b60234
> > > +        .quad 0xc086279148685aa0, 0xbe1cf162204794a8
> > > +        .quad 0xc086279397279ce0, 0xbe1cf24cc8cb48ac
> > > +        .quad 0xc0862795e53ca978, 0xbe1cf0c9be68d5c3
> > > +        .quad 0xc086279832a7e258, 0xbe1cf172cd3d7388
> > > +        .quad 0xc086279a7f69a930, 0xbe1ceea2465fbce5
> > > +        .quad 0xc086279ccb825f40, 0xbe1cf0a386d2500f
> > > +        .quad 0xc086279f16f26590, 0xbe1cf1e338ddc18a
> > > +        .quad 0xc08627a161ba1cd0, 0xbe1cef1f5049867f
> > > +        .quad 0xc08627a3abd9e548, 0xbe1cef96c1ea8b1f
> > > +        .quad 0xc08627a5f5521f00, 0xbe1cf138f6fd3c26
> > > +        .quad 0xc08627a83e2329b0, 0xbe1cf0d4fcbfdf3a
> > > +        .quad 0xc08627aa864d64b0, 0xbe1cf24870c12c81
> > > +        .quad 0xc08627accdd12f18, 0xbe1cf0ae2a56348d
> > > +        .quad 0xc08627af14aee7a0, 0xbe1cee8ca1a9b893
> > > +        .quad 0xc08627b15ae6eca8, 0xbe1cf20414d637b0
> > > +        .quad 0xc08627b3a0799c60, 0xbe1cf0fc6b7b12d8
> > > +        .quad 0xc08627b5e5675488, 0xbe1cf152d93c4a00
> > > +        .quad 0xc08627b829b072a0, 0xbe1cf1073f9b77c2
> > > +        .quad 0xc08627ba6d5553d8, 0xbe1cee694f97d5a4
> > > +        .quad 0xc08627bcb0565500, 0xbe1cf0456b8239d7
> > > +        .quad 0xc08627bef2b3d2b0, 0xbe1cf211497127e3
> > > +        .quad 0xc08627c1346e2930, 0xbe1cf01856c0384d
> > > +        .quad 0xc08627c37585b468, 0xbe1cefa7dd05479e
> > > +        .quad 0xc08627c5b5fad000, 0xbe1cef3ae8e50b93
> > > +        .quad 0xc08627c7f5cdd750, 0xbe1ceea5f32fdd3a
> > > +        .quad 0xc08627ca34ff2560, 0xbe1cef424caeb8d9
> > > +        .quad 0xc08627cc738f14f0, 0xbe1cf0194d07a81f
> > > +        .quad 0xc08627ceb17e0070, 0xbe1cf20f452000c1
> > > +        .quad 0xc08627d0eecc4210, 0xbe1cf00e356218e4
> > > +        .quad 0xc08627d32b7a33a0, 0xbe1cef30484b4bcb
> > > +        .quad 0xc08627d567882eb0, 0xbe1ceeea11a6641b
> > > +        .quad 0xc08627d7a2f68c80, 0xbe1cf13492d5bd7b
> > > +        .quad 0xc08627d9ddc5a618, 0xbe1ceeb7048fad96
> > > +        .quad 0xc08627dc17f5d418, 0xbe1ceef0666f0477
> > > +        .quad 0xc08627de51876ee8, 0xbe1cf060d4b8b5c2
> > > +        .quad 0xc08627e08a7acea8, 0xbe1cf0b2a4b6ff8c
> > > +        .quad 0xc08627e2c2d04b28, 0xbe1cf0e34809a875
> > > +        .quad 0xc08627e4fa883bf0, 0xbe1cf16bf74a3522
> > > +        .quad 0xc08627e731a2f848, 0xbe1cee6a24623d57
> > > +        .quad 0xc08627e96820d718, 0xbe1cefc7b4f1528e
> > > +        .quad 0xc08627eb9e022f18, 0xbe1cf163051f3548
> > > +        .quad 0xc08627edd34756b8, 0xbe1cef36b3366305
> > > +        .quad 0xc08627f007f0a408, 0xbe1cf18134625550
> > > +        .quad 0xc08627f23bfe6cf0, 0xbe1cf0ec32ec1a11
> > > +        .quad 0xc08627f46f710700, 0xbe1ceeb3b64f3edc
> > > +        .quad 0xc08627f6a248c778, 0xbe1cf0cd15805bc8
> > > +        .quad 0xc08627f8d4860368, 0xbe1cf20db3bddebe
> > > +        .quad 0xc08627fb06290f90, 0xbe1cf25188430e25
> > > +        .quad 0xc08627fd37324070, 0xbe1ceea1713490f9
> > > +        .quad 0xc08627ff67a1ea28, 0xbe1cf159521d234c
> > > +        .quad 0xc0862801977860b8, 0xbe1cf24dfe50783b
> > > +        .quad 0xc0862803c6b5f7d0, 0xbe1ceef2ef89a60b
> > > +        .quad 0xc0862805f55b02c8, 0xbe1cee7fc919d62c
> > > +        .quad 0xc08628082367d4c0, 0xbe1cf215a7fb513a
> > > +        .quad 0xc086280a50dcc0a8, 0xbe1cf0e4401c5ed4
> > > +        .quad 0xc086280c7dba1910, 0xbe1cf04ec734d256
> > > +        .quad 0xc086280eaa003050, 0xbe1cf010ad787fea
> > > +        .quad 0xc0862810d5af5880, 0xbe1cee622478393d
> > > +        .quad 0xc086281300c7e368, 0xbe1cf01c7482564f
> > > +        .quad 0xc08628152b4a22a0, 0xbe1cf0de20d33536
> > > +        .quad 0xc086281755366778, 0xbe1cef2edae5837d
> > > +        .quad 0xc08628197e8d02f0, 0xbe1cf0a345318cc9
> > > +        .quad 0xc086281ba74e45d8, 0xbe1cf20085aa34b8
> > > +        .quad 0xc086281dcf7a80c0, 0xbe1cef5fa845ad83
> > > +        .quad 0xc086281ff71203e0, 0xbe1cf050d1df69c4
> > > +        .quad 0xc08628221e151f48, 0xbe1ceffe43c035b9
> > > +        .quad 0xc0862824448422b8, 0xbe1cf14f3018d3c2
> > > +        .quad 0xc08628266a5f5dc0, 0xbe1cef0a5fbae83d
> > > +        .quad 0xc08628288fa71f98, 0xbe1ceff8a95b72a1
> > > +        .quad 0xc086282ab45bb750, 0xbe1cef073aa9849b
> > > +        .quad 0xc086282cd87d73a8, 0xbe1cef69b3835c02
> > > +        .quad 0xc086282efc0ca328, 0xbe1cf0bc139379a9
> > > +        .quad 0xc08628311f099420, 0xbe1cef247a9ec596
> > > +        .quad 0xc086283341749490, 0xbe1cef74bbcc488a
> > > +        .quad 0xc0862835634df248, 0xbe1cef4bc42e7b8e
> > > +        .quad 0xc08628378495fad0, 0xbe1cf136d4d5a810
> > > +        .quad 0xc0862839a54cfb80, 0xbe1cf0d290b24dd8
> > > +        .quad 0xc086283bc5734168, 0xbe1ceeebde8e0065
> > > +        .quad 0xc086283de5091950, 0xbe1cf1a09f60aa1e
> > > +        .quad 0xc0862840040ecfe0, 0xbe1cf0803947a234
> > > +        .quad 0xc08628422284b168, 0xbe1cf0abf7638127
> > > +        .quad 0xc0862844406b0a08, 0xbe1cf0f73ee12058
> > > +        .quad 0xc08628465dc225a0, 0xbe1cf2079971b26c
> > > +        .quad 0xc08628487a8a4fe0, 0xbe1cee74957564b1
> > > +        .quad 0xc086284a96c3d420, 0xbe1ceee77c1b7d43
> > > +        .quad 0xc086284cb26efd90, 0xbe1cf23addba6e09
> > > +        .quad 0xc086284ecd8c1730, 0xbe1cf199f4a1da60
> > > +        .quad 0xc0862850e81b6bb0, 0xbe1cf09fdea81393
> > > +        .quad 0xc0862853021d4588, 0xbe1cf176adb417f7
> > > +        .quad 0xc08628551b91ef00, 0xbe1cf0f64f84a8da
> > > +        .quad 0xc08628573479b220, 0xbe1ceec34cf49523
> > > +        .quad 0xc08628594cd4d8a8, 0xbe1cf16d60fbe0bb
> > > +        .quad 0xc086285b64a3ac40, 0xbe1cee8de7acfc7b
> > > +        .quad 0xc086285d7be67630, 0xbe1ceee6256cce8d
> > > +        .quad 0xc086285f929d7fa0, 0xbe1cee7d66a3d8a5
> > > +        .quad 0xc0862861a8c91170, 0xbe1cf0bef8265792
> > > +        .quad 0xc0862863be697458, 0xbe1cf097f890c6f8
> > > +        .quad 0xc0862865d37ef0c8, 0xbe1cf09502d5c3fc
> > > +        .quad 0xc0862867e809cf00, 0xbe1ceeffb239dac7
> > > +        .quad 0xc0862869fc0a56f8, 0xbe1cf1fbfff95c98
> > > +        .quad 0xc086286c0f80d090, 0xbe1cefa57ad3eef7
> > > +        .quad 0xc086286e226d8348, 0xbe1cf22c58b9183d
> > > +        .quad 0xc086287034d0b690, 0xbe1ceff262d0a248
> > > +        .quad 0xc086287246aab180, 0xbe1cefa7bc194186
> > > +        .quad 0xc086287457fbbb08, 0xbe1cf06782d784d9
> > > +        .quad 0xc086287668c419e0, 0xbe1cf1d44d0eaa07
> > > +        .quad 0xc086287879041490, 0xbe1cf034803c8a48
> > > +        .quad 0xc086287a88bbf158, 0xbe1cf08e84916b6f
> > > +        .quad 0xc086287c97ebf650, 0xbe1cf0c4d3dc1bc7
> > > +        .quad 0xc086287ea6946958, 0xbe1cefb1e4625943
> > > +        .quad 0xc0862880b4b59010, 0xbe1cf143efdd1fd0
> > > +        .quad 0xc0862882c24faff8, 0xbe1cee9896d016da
> > > +        .quad 0xc0862884cf630e38, 0xbe1cf2186072f2cc
> > > +        .quad 0xc0862886dbefeff0, 0xbe1cef9217633d34
> > > +        .quad 0xc0862888e7f699e0, 0xbe1cf05603549486
> > > +        .quad 0xc086288af37750b0, 0xbe1cef50fff513d3
> > > +        .quad 0xc086288cfe7258c0, 0xbe1cf127713b32d0
> > > +        .quad 0xc086288f08e7f650, 0xbe1cf05015520f3d
> > > +        .quad 0xc086289112d86d58, 0xbe1cf12eb458b26f
> > > +        .quad 0xc08628931c4401a8, 0xbe1cf22eae2887ed
> > > +        .quad 0xc0862895252af6e0, 0xbe1cefdd6656dd2d
> > > +        .quad 0xc08628972d8d9058, 0xbe1cf1048ea4e646
> > > +        .quad 0xc0862899356c1150, 0xbe1ceec4501167e9
> > > +        .quad 0xc086289b3cc6bcb8, 0xbe1cf0ad52becc3f
> > > +        .quad 0xc086289d439dd568, 0xbe1cf0daa4e00e35
> > > +        .quad 0xc086289f49f19df8, 0xbe1cf00b80de8d6a
> > > +        .quad 0xc08628a14fc258c8, 0xbe1cf1bcf2ea8464
> > > +        .quad 0xc08628a355104818, 0xbe1cf0435e2782b0
> > > +        .quad 0xc08628a559dbade0, 0xbe1cf0e3e1a5f56c
> > > +        .quad 0xc08628a75e24cbf8, 0xbe1cefed9d5a721d
> > > +        .quad 0xc08628a961ebe3f8, 0xbe1cf0d2d74321e2
> > > +        .quad 0xc08628ab65313750, 0xbe1cf24200eb55e9
> > > +        .quad 0xc08628ad67f50740, 0xbe1cf23e9d7cf979
> > > +        .quad 0xc08628af6a3794d0, 0xbe1cf23a088f421c
> > > +        .quad 0xc08628b16bf920e0, 0xbe1cef2c1de1ab32
> > > +        .quad 0xc08628b36d39ec08, 0xbe1cf1abc231f7b2
> > > +        .quad 0xc08628b56dfa36d0, 0xbe1cf2074d5ba303
> > > +        .quad 0xc08628b76e3a4180, 0xbe1cf05cd5eed880
> > > +        /*== Log_LA_table ==*/
> > > +        .align 32
> > > +        .quad 0x8000000000000000
> > > +        .quad 0xbf5ff802a9ab10e6
> > > +        .quad 0xbf6ff00aa2b10bc0
> > > +        .quad 0xbf77ee11ebd82e94
> > > +        .quad 0xbf7fe02a6b106789
> > > +        .quad 0xbf83e7295d25a7d9
> > > +        .quad 0xbf87dc475f810a77
> > > +        .quad 0xbf8bcf712c74384c
> > > +        .quad 0xbf8fc0a8b0fc03e4
> > > +        .quad 0xbf91d7f7eb9eebe7
> > > +        .quad 0xbf93cea44346a575
> > > +        .quad 0xbf95c45a51b8d389
> > > +        .quad 0xbf97b91b07d5b11b
> > > +        .quad 0xbf99ace7551cc514
> > > +        .quad 0xbf9b9fc027af9198
> > > +        .quad 0xbf9d91a66c543cc4
> > > +        .quad 0xbf9f829b0e783300
> > > +        .quad 0xbfa0b94f7c196176
> > > +        .quad 0xbfa1b0d98923d980
> > > +        .quad 0xbfa2a7ec2214e873
> > > +        .quad 0xbfa39e87b9febd60
> > > +        .quad 0xbfa494acc34d911c
> > > +        .quad 0xbfa58a5bafc8e4d5
> > > +        .quad 0xbfa67f94f094bd98
> > > +        .quad 0xbfa77458f632dcfc
> > > +        .quad 0xbfa868a83083f6cf
> > > +        .quad 0xbfa95c830ec8e3eb
> > > +        .quad 0xbfaa4fe9ffa3d235
> > > +        .quad 0xbfab42dd711971bf
> > > +        .quad 0xbfac355dd0921f2d
> > > +        .quad 0xbfad276b8adb0b52
> > > +        .quad 0xbfae19070c276016
> > > +        .quad 0xbfaf0a30c01162a6
> > > +        .quad 0xbfaffae9119b9303
> > > +        .quad 0xbfb075983598e471
> > > +        .quad 0xbfb0ed839b5526fe
> > > +        .quad 0xbfb16536eea37ae1
> > > +        .quad 0xbfb1dcb263db1944
> > > +        .quad 0xbfb253f62f0a1417
> > > +        .quad 0xbfb2cb0283f5de1f
> > > +        .quad 0xbfb341d7961bd1d1
> > > +        .quad 0xbfb3b87598b1b6ee
> > > +        .quad 0xbfb42edcbea646f0
> > > +        .quad 0xbfb4a50d3aa1b040
> > > +        .quad 0xbfb51b073f06183f
> > > +        .quad 0xbfb590cafdf01c28
> > > +        .quad 0xbfb60658a93750c4
> > > +        .quad 0xbfb67bb0726ec0fc
> > > +        .quad 0xbfb6f0d28ae56b4c
> > > +        .quad 0xbfb765bf23a6be13
> > > +        .quad 0xbfb7da766d7b12cd
> > > +        .quad 0xbfb84ef898e8282a
> > > +        .quad 0xbfb8c345d6319b21
> > > +        .quad 0xbfb9375e55595ede
> > > +        .quad 0xbfb9ab42462033ad
> > > +        .quad 0xbfba1ef1d8061cd4
> > > +        .quad 0xbfba926d3a4ad563
> > > +        .quad 0xbfbb05b49bee43fe
> > > +        .quad 0xbfbb78c82bb0eda1
> > > +        .quad 0xbfbbeba818146765
> > > +        .quad 0xbfbc5e548f5bc743
> > > +        .quad 0xbfbcd0cdbf8c13e1
> > > +        .quad 0xbfbd4313d66cb35d
> > > +        .quad 0xbfbdb5270187d927
> > > +        .quad 0xbfbe27076e2af2e6
> > > +        .quad 0xbfbe98b549671467
> > > +        .quad 0xbfbf0a30c01162a6
> > > +        .quad 0xbfbf7b79fec37ddf
> > > +        .quad 0xbfbfec9131dbeabb
> > > +        .quad 0xbfc02ebb42bf3d4b
> > > +        .quad 0xbfc0671512ca596e
> > > +        .quad 0xbfc09f561ee719c3
> > > +        .quad 0xbfc0d77e7cd08e59
> > > +        .quad 0xbfc10f8e422539b1
> > > +        .quad 0xbfc14785846742ac
> > > +        .quad 0xbfc17f6458fca611
> > > +        .quad 0xbfc1b72ad52f67a0
> > > +        .quad 0xbfc1eed90e2dc2c3
> > > +        .quad 0xbfc2266f190a5acb
> > > +        .quad 0xbfc25ded0abc6ad2
> > > +        .quad 0xbfc29552f81ff523
> > > +        .quad 0xbfc2cca0f5f5f251
> > > +        .quad 0xbfc303d718e47fd3
> > > +        .quad 0xbfc33af575770e4f
> > > +        .quad 0xbfc371fc201e8f74
> > > +        .quad 0xbfc3a8eb2d31a376
> > > +        .quad 0xbfc3dfc2b0ecc62a
> > > +        .quad 0xbfc41682bf727bc0
> > > +        .quad 0xbfc44d2b6ccb7d1e
> > > +        .quad 0xbfc483bccce6e3dd
> > > +        .quad 0xbfc4ba36f39a55e5
> > > +        .quad 0xbfc4f099f4a230b2
> > > +        .quad 0xbfc526e5e3a1b438
> > > +        .quad 0xbfc55d1ad4232d6f
> > > +        .quad 0xbfc59338d9982086
> > > +        .quad 0xbfc5c940075972b9
> > > +        .quad 0xbfc5ff3070a793d4
> > > +        .quad 0xbfc6350a28aaa758
> > > +        .quad 0xbfc66acd4272ad51
> > > +        .quad 0xbfc6a079d0f7aad2
> > > +        .quad 0xbfc6d60fe719d21d
> > > +        .quad 0xbfc70b8f97a1aa75
> > > +        .quad 0xbfc740f8f54037a5
> > > +        .quad 0xbfc7764c128f2127
> > > +        .quad 0xbfc7ab890210d909
> > > +        .quad 0xbfc7e0afd630c274
> > > +        .quad 0xbfc815c0a14357eb
> > > +        .quad 0xbfc84abb75865139
> > > +        .quad 0xbfc87fa06520c911
> > > +        .quad 0xbfc8b46f8223625b
> > > +        .quad 0xbfc8e928de886d41
> > > +        .quad 0xbfc91dcc8c340bde
> > > +        .quad 0xbfc9525a9cf456b4
> > > +        .quad 0xbfc986d3228180ca
> > > +        .quad 0xbfc9bb362e7dfb83
> > > +        .quad 0xbfc9ef83d2769a34
> > > +        .quad 0xbfca23bc1fe2b563
> > > +        .quad 0xbfca57df28244dcd
> > > +        .quad 0xbfca8becfc882f19
> > > +        .quad 0xbfcabfe5ae46124c
> > > +        .quad 0xbfcaf3c94e80bff3
> > > +        .quad 0xbfcb2797ee46320c
> > > +        .quad 0xbfcb5b519e8fb5a4
> > > +        .quad 0xbfcb8ef670420c3b
> > > +        .quad 0xbfcbc286742d8cd6
> > > +        .quad 0xbfcbf601bb0e44e2
> > > +        .quad 0xbfcc2968558c18c1
> > > +        .quad 0xbfcc5cba543ae425
> > > +        .quad 0xbfcc8ff7c79a9a22
> > > +        .quad 0xbfccc320c0176502
> > > +        .quad 0xbfccf6354e09c5dc
> > > +        .quad 0xbfcd293581b6b3e7
> > > +        .quad 0xbfcd5c216b4fbb91
> > > +        .quad 0xbfcd8ef91af31d5e
> > > +        .quad 0xbfcdc1bca0abec7d
> > > +        .quad 0xbfcdf46c0c722d2f
> > > +        .quad 0xbfce27076e2af2e6
> > > +        .quad 0xbfce598ed5a87e2f
> > > +        .quad 0xbfce8c0252aa5a60
> > > +        .quad 0xbfcebe61f4dd7b0b
> > > +        .quad 0xbfcef0adcbdc5936
> > > +        .quad 0xbfcf22e5e72f105d
> > > +        .quad 0xbfcf550a564b7b37
> > > +        .quad 0xbfcf871b28955045
> > > +        .quad 0xbfcfb9186d5e3e2b
> > > +        .quad 0xbfcfeb0233e607cc
> > > +        .quad 0xbfd00e6c45ad501d
> > > +        .quad 0xbfd0274dc16c232f
> > > +        .quad 0xbfd0402594b4d041
> > > +        .quad 0xbfd058f3c703ebc6
> > > +        .quad 0xbfd071b85fcd590d
> > > +        .quad 0xbfd08a73667c57af
> > > +        .quad 0xbfd0a324e27390e3
> > > +        .quad 0xbfd0bbccdb0d24bd
> > > +        .quad 0xbfd0d46b579ab74b
> > > +        .quad 0xbfd0ed005f657da4
> > > +        .quad 0xbfd1058bf9ae4ad5
> > > +        .quad 0xbfd11e0e2dad9cb7
> > > +        .quad 0xbfd136870293a8b0
> > > +        .quad 0xbfd14ef67f88685a
> > > +        .quad 0xbfd1675cababa60e
> > > +        .quad 0xbfd17fb98e15095d
> > > +        .quad 0xbfd1980d2dd4236f
> > > +        .quad 0xbfd1b05791f07b49
> > > +        .quad 0xbfd1c898c16999fb
> > > +        .quad 0xbfd1e0d0c33716be
> > > +        .quad 0xbfd1f8ff9e48a2f3
> > > +        .quad 0xbfd211255986160c
> > > +        .quad 0xbfd22941fbcf7966
> > > +        .quad 0xbfd241558bfd1404
> > > +        .quad 0xbfd2596010df763a
> > > +        .quad 0xbfd27161913f853d
> > > +        .quad 0xbfd2895a13de86a3
> > > +        .quad 0xbfd2a1499f762bc9
> > > +        .quad 0xbfd2b9303ab89d25
> > > +        .quad 0xbfd2d10dec508583
> > > +        .quad 0xbfd2e8e2bae11d31
> > > +        .quad 0xbfd300aead06350c
> > > +        .quad 0xbfd31871c9544185
> > > +        .quad 0xbfd3302c16586588
> > > +        .quad 0xbfd347dd9a987d55
> > > +        .quad 0xbfd35f865c93293e
> > > +        .quad 0xbfd3772662bfd85b
> > > +        .quad 0xbfd38ebdb38ed321
> > > +        .quad 0xbfd3a64c556945ea
> > > +        .quad 0xbfd3bdd24eb14b6a
> > > +        .quad 0xbfd3d54fa5c1f710
> > > +        .quad 0xbfd3ecc460ef5f50
> > > +        .quad 0xbfd404308686a7e4
> > > +        .quad 0xbfd41b941cce0bee
> > > +        .quad 0xbfd432ef2a04e814
> > > +        .quad 0xbfd44a41b463c47c
> > > +        .quad 0xbfd4618bc21c5ec2
> > > +        .quad 0xbfd478cd5959b3d9
> > > +        .quad 0xbfd49006804009d1
> > > +        .quad 0xbfd4a7373cecf997
> > > +        .quad 0xbfd4be5f957778a1
> > > +        .quad 0xbfd4d57f8fefe27f
> > > +        .quad 0xbfd4ec973260026a
> > > +        .quad 0xbfd503a682cb1cb3
> > > +        .quad 0xbfd51aad872df82d
> > > +        .quad 0xbfd531ac457ee77e
> > > +        .quad 0xbfd548a2c3add263
> > > +        .quad 0xbfd55f9107a43ee2
> > > +        .quad 0xbfd5767717455a6c
> > > +        .quad 0xbfd58d54f86e02f2
> > > +        .quad 0xbfd5a42ab0f4cfe2
> > > +        .quad 0xbfd5baf846aa1b19
> > > +        .quad 0xbfd5d1bdbf5809ca
> > > +        .quad 0xbfd5e87b20c2954a
> > > +        .quad 0xbfd5ff3070a793d4
> > > +        .quad 0xbfd615ddb4bec13c
> > > +        .quad 0xbfd62c82f2b9c795
> > > +        .quad 0x3fd61965cdb02c1f
> > > +        .quad 0x3fd602d08af091ec
> > > +        .quad 0x3fd5ec433d5c35ae
> > > +        .quad 0x3fd5d5bddf595f30
> > > +        .quad 0x3fd5bf406b543db2
> > > +        .quad 0x3fd5a8cadbbedfa1
> > > +        .quad 0x3fd5925d2b112a59
> > > +        .quad 0x3fd57bf753c8d1fb
> > > +        .quad 0x3fd565995069514c
> > > +        .quad 0x3fd54f431b7be1a9
> > > +        .quad 0x3fd538f4af8f72fe
> > > +        .quad 0x3fd522ae0738a3d8
> > > +        .quad 0x3fd50c6f1d11b97c
> > > +        .quad 0x3fd4f637ebba9810
> > > +        .quad 0x3fd4e0086dd8baca
> > > +        .quad 0x3fd4c9e09e172c3c
> > > +        .quad 0x3fd4b3c077267e9a
> > > +        .quad 0x3fd49da7f3bcc41f
> > > +        .quad 0x3fd487970e958770
> > > +        .quad 0x3fd4718dc271c41b
> > > +        .quad 0x3fd45b8c0a17df13
> > > +        .quad 0x3fd44591e0539f49
> > > +        .quad 0x3fd42f9f3ff62642
> > > +        .quad 0x3fd419b423d5e8c7
> > > +        .quad 0x3fd403d086cea79c
> > > +        .quad 0x3fd3edf463c1683e
> > > +        .quad 0x3fd3d81fb5946dba
> > > +        .quad 0x3fd3c25277333184
> > > +        .quad 0x3fd3ac8ca38e5c5f
> > > +        .quad 0x3fd396ce359bbf54
> > > +        .quad 0x3fd3811728564cb2
> > > +        .quad 0x3fd36b6776be1117
> > > +        .quad 0x3fd355bf1bd82c8b
> > > +        .quad 0x3fd3401e12aecba1
> > > +        .quad 0x3fd32a84565120a8
> > > +        .quad 0x3fd314f1e1d35ce4
> > > +        .quad 0x3fd2ff66b04ea9d4
> > > +        .quad 0x3fd2e9e2bce12286
> > > +        .quad 0x3fd2d46602adccee
> > > +        .quad 0x3fd2bef07cdc9354
> > > +        .quad 0x3fd2a982269a3dbf
> > > +        .quad 0x3fd2941afb186b7c
> > > +        .quad 0x3fd27ebaf58d8c9d
> > > +        .quad 0x3fd269621134db92
> > > +        .quad 0x3fd25410494e56c7
> > > +        .quad 0x3fd23ec5991eba49
> > > +        .quad 0x3fd22981fbef797b
> > > +        .quad 0x3fd214456d0eb8d4
> > > +        .quad 0x3fd1ff0fe7cf47a7
> > > +        .quad 0x3fd1e9e1678899f4
> > > +        .quad 0x3fd1d4b9e796c245
> > > +        .quad 0x3fd1bf99635a6b95
> > > +        .quad 0x3fd1aa7fd638d33f
> > > +        .quad 0x3fd1956d3b9bc2fa
> > > +        .quad 0x3fd180618ef18adf
> > > +        .quad 0x3fd16b5ccbacfb73
> > > +        .quad 0x3fd1565eed455fc3
> > > +        .quad 0x3fd14167ef367783
> > > +        .quad 0x3fd12c77cd00713b
> > > +        .quad 0x3fd1178e8227e47c
> > > +        .quad 0x3fd102ac0a35cc1c
> > > +        .quad 0x3fd0edd060b78081
> > > +        .quad 0x3fd0d8fb813eb1ef
> > > +        .quad 0x3fd0c42d676162e3
> > > +        .quad 0x3fd0af660eb9e279
> > > +        .quad 0x3fd09aa572e6c6d4
> > > +        .quad 0x3fd085eb8f8ae797
> > > +        .quad 0x3fd07138604d5862
> > > +        .quad 0x3fd05c8be0d9635a
> > > +        .quad 0x3fd047e60cde83b8
> > > +        .quad 0x3fd03346e0106062
> > > +        .quad 0x3fd01eae5626c691
> > > +        .quad 0x3fd00a1c6adda473
> > > +        .quad 0x3fcfeb2233ea07cd
> > > +        .quad 0x3fcfc218be620a5e
> > > +        .quad 0x3fcf991c6cb3b379
> > > +        .quad 0x3fcf702d36777df0
> > > +        .quad 0x3fcf474b134df229
> > > +        .quad 0x3fcf1e75fadf9bde
> > > +        .quad 0x3fcef5ade4dcffe6
> > > +        .quad 0x3fceccf2c8fe920a
> > > +        .quad 0x3fcea4449f04aaf5
> > > +        .quad 0x3fce7ba35eb77e2a
> > > +        .quad 0x3fce530effe71012
> > > +        .quad 0x3fce2a877a6b2c12
> > > +        .quad 0x3fce020cc6235ab5
> > > +        .quad 0x3fcdd99edaf6d7e9
> > > +        .quad 0x3fcdb13db0d48940
> > > +        .quad 0x3fcd88e93fb2f450
> > > +        .quad 0x3fcd60a17f903515
> > > +        .quad 0x3fcd38666871f465
> > > +        .quad 0x3fcd1037f2655e7b
> > > +        .quad 0x3fcce816157f1988
> > > +        .quad 0x3fccc000c9db3c52
> > > +        .quad 0x3fcc97f8079d44ec
> > > +        .quad 0x3fcc6ffbc6f00f71
> > > +        .quad 0x3fcc480c0005ccd1
> > > +        .quad 0x3fcc2028ab17f9b4
> > > +        .quad 0x3fcbf851c067555f
> > > +        .quad 0x3fcbd087383bd8ad
> > > +        .quad 0x3fcba8c90ae4ad19
> > > +        .quad 0x3fcb811730b823d2
> > > +        .quad 0x3fcb5971a213acdb
> > > +        .quad 0x3fcb31d8575bce3d
> > > +        .quad 0x3fcb0a4b48fc1b46
> > > +        .quad 0x3fcae2ca6f672bd4
> > > +        .quad 0x3fcabb55c31693ad
> > > +        .quad 0x3fca93ed3c8ad9e3
> > > +        .quad 0x3fca6c90d44b704e
> > > +        .quad 0x3fca454082e6ab05
> > > +        .quad 0x3fca1dfc40f1b7f1
> > > +        .quad 0x3fc9f6c407089664
> > > +        .quad 0x3fc9cf97cdce0ec3
> > > +        .quad 0x3fc9a8778debaa38
> > > +        .quad 0x3fc981634011aa75
> > > +        .quad 0x3fc95a5adcf7017f
> > > +        .quad 0x3fc9335e5d594989
> > > +        .quad 0x3fc90c6db9fcbcd9
> > > +        .quad 0x3fc8e588ebac2dbf
> > > +        .quad 0x3fc8beafeb38fe8c
> > > +        .quad 0x3fc897e2b17b19a5
> > > +        .quad 0x3fc871213750e994
> > > +        .quad 0x3fc84a6b759f512f
> > > +        .quad 0x3fc823c16551a3c2
> > > +        .quad 0x3fc7fd22ff599d4f
> > > +        .quad 0x3fc7d6903caf5ad0
> > > +        .quad 0x3fc7b0091651528c
> > > +        .quad 0x3fc7898d85444c73
> > > +        .quad 0x3fc7631d82935a86
> > > +        .quad 0x3fc73cb9074fd14d
> > > +        .quad 0x3fc716600c914054
> > > +        .quad 0x3fc6f0128b756abc
> > > +        .quad 0x3fc6c9d07d203fc7
> > > +        .quad 0x3fc6a399dabbd383
> > > +        .quad 0x3fc67d6e9d785771
> > > +        .quad 0x3fc6574ebe8c133a
> > > +        .quad 0x3fc6313a37335d76
> > > +        .quad 0x3fc60b3100b09476
> > > +        .quad 0x3fc5e533144c1719
> > > +        .quad 0x3fc5bf406b543db2
> > > +        .quad 0x3fc59958ff1d52f1
> > > +        .quad 0x3fc5737cc9018cdd
> > > +        .quad 0x3fc54dabc26105d2
> > > +        .quad 0x3fc527e5e4a1b58d
> > > +        .quad 0x3fc5022b292f6a45
> > > +        .quad 0x3fc4dc7b897bc1c8
> > > +        .quad 0x3fc4b6d6fefe22a4
> > > +        .quad 0x3fc4913d8333b561
> > > +        .quad 0x3fc46baf0f9f5db7
> > > +        .quad 0x3fc4462b9dc9b3dc
> > > +        .quad 0x3fc420b32740fdd4
> > > +        .quad 0x3fc3fb45a59928cc
> > > +        .quad 0x3fc3d5e3126bc27f
> > > +        .quad 0x3fc3b08b6757f2a9
> > > +        .quad 0x3fc38b3e9e027479
> > > +        .quad 0x3fc365fcb0159016
> > > +        .quad 0x3fc340c59741142e
> > > +        .quad 0x3fc31b994d3a4f85
> > > +        .quad 0x3fc2f677cbbc0a96
> > > +        .quad 0x3fc2d1610c86813a
> > > +        .quad 0x3fc2ac55095f5c59
> > > +        .quad 0x3fc28753bc11aba5
> > > +        .quad 0x3fc2625d1e6ddf57
> > > +        .quad 0x3fc23d712a49c202
> > > +        .quad 0x3fc2188fd9807263
> > > +        .quad 0x3fc1f3b925f25d41
> > > +        .quad 0x3fc1ceed09853752
> > > +        .quad 0x3fc1aa2b7e23f72a
> > > +        .quad 0x3fc185747dbecf34
> > > +        .quad 0x3fc160c8024b27b1
> > > +        .quad 0x3fc13c2605c398c3
> > > +        .quad 0x3fc1178e8227e47c
> > > +        .quad 0x3fc0f301717cf0fb
> > > +        .quad 0x3fc0ce7ecdccc28d
> > > +        .quad 0x3fc0aa06912675d5
> > > +        .quad 0x3fc08598b59e3a07
> > > +        .quad 0x3fc06135354d4b18
> > > +        .quad 0x3fc03cdc0a51ec0d
> > > +        .quad 0x3fc0188d2ecf6140
> > > +        .quad 0x3fbfe89139dbd566
> > > +        .quad 0x3fbfa01c9db57ce2
> > > +        .quad 0x3fbf57bc7d9005db
> > > +        .quad 0x3fbf0f70cdd992e3
> > > +        .quad 0x3fbec739830a1120
> > > +        .quad 0x3fbe7f1691a32d3e
> > > +        .quad 0x3fbe3707ee30487b
> > > +        .quad 0x3fbdef0d8d466db9
> > > +        .quad 0x3fbda727638446a2
> > > +        .quad 0x3fbd5f55659210e2
> > > +        .quad 0x3fbd179788219364
> > > +        .quad 0x3fbccfedbfee13a8
> > > +        .quad 0x3fbc885801bc4b23
> > > +        .quad 0x3fbc40d6425a5cb1
> > > +        .quad 0x3fbbf968769fca11
> > > +        .quad 0x3fbbb20e936d6974
> > > +        .quad 0x3fbb6ac88dad5b1c
> > > +        .quad 0x3fbb23965a52ff00
> > > +        .quad 0x3fbadc77ee5aea8c
> > > +        .quad 0x3fba956d3ecade63
> > > +        .quad 0x3fba4e7640b1bc38
> > > +        .quad 0x3fba0792e9277cac
> > > +        .quad 0x3fb9c0c32d4d2548
> > > +        .quad 0x3fb97a07024cbe74
> > > +        .quad 0x3fb9335e5d594989
> > > +        .quad 0x3fb8ecc933aeb6e8
> > > +        .quad 0x3fb8a6477a91dc29
> > > +        .quad 0x3fb85fd927506a48
> > > +        .quad 0x3fb8197e2f40e3f0
> > > +        .quad 0x3fb7d33687c293c9
> > > +        .quad 0x3fb78d02263d82d3
> > > +        .quad 0x3fb746e100226ed9
> > > +        .quad 0x3fb700d30aeac0e1
> > > +        .quad 0x3fb6bad83c1883b6
> > > +        .quad 0x3fb674f089365a7a
> > > +        .quad 0x3fb62f1be7d77743
> > > +        .quad 0x3fb5e95a4d9791cb
> > > +        .quad 0x3fb5a3abb01ade25
> > > +        .quad 0x3fb55e10050e0384
> > > +        .quad 0x3fb518874226130a
> > > +        .quad 0x3fb4d3115d207eac
> > > +        .quad 0x3fb48dae4bc31018
> > > +        .quad 0x3fb4485e03dbdfad
> > > +        .quad 0x3fb403207b414b7f
> > > +        .quad 0x3fb3bdf5a7d1ee64
> > > +        .quad 0x3fb378dd7f749714
> > > +        .quad 0x3fb333d7f8183f4b
> > > +        .quad 0x3fb2eee507b40301
> > > +        .quad 0x3fb2aa04a44717a5
> > > +        .quad 0x3fb26536c3d8c369
> > > +        .quad 0x3fb2207b5c78549e
> > > +        .quad 0x3fb1dbd2643d190b
> > > +        .quad 0x3fb1973bd1465567
> > > +        .quad 0x3fb152b799bb3cc9
> > > +        .quad 0x3fb10e45b3cae831
> > > +        .quad 0x3fb0c9e615ac4e17
> > > +        .quad 0x3fb08598b59e3a07
> > > +        .quad 0x3fb0415d89e74444
> > > +        .quad 0x3faffa6911ab9301
> > > +        .quad 0x3faf723b517fc523
> > > +        .quad 0x3faeea31c006b87c
> > > +        .quad 0x3fae624c4a0b5e1b
> > > +        .quad 0x3fadda8adc67ee4e
> > > +        .quad 0x3fad52ed6405d86f
> > > +        .quad 0x3faccb73cdddb2cc
> > > +        .quad 0x3fac441e06f72a9e
> > > +        .quad 0x3fabbcebfc68f420
> > > +        .quad 0x3fab35dd9b58baad
> > > +        .quad 0x3faaaef2d0fb10fc
> > > +        .quad 0x3faa282b8a936171
> > > +        .quad 0x3fa9a187b573de7c
> > > +        .quad 0x3fa91b073efd7314
> > > +        .quad 0x3fa894aa149fb343
> > > +        .quad 0x3fa80e7023d8ccc4
> > > +        .quad 0x3fa788595a3577ba
> > > +        .quad 0x3fa70265a550e777
> > > +        .quad 0x3fa67c94f2d4bb58
> > > +        .quad 0x3fa5f6e73078efb8
> > > +        .quad 0x3fa5715c4c03ceef
> > > +        .quad 0x3fa4ebf43349e26f
> > > +        .quad 0x3fa466aed42de3ea
> > > +        .quad 0x3fa3e18c1ca0ae92
> > > +        .quad 0x3fa35c8bfaa1306b
> > > +        .quad 0x3fa2d7ae5c3c5bae
> > > +        .quad 0x3fa252f32f8d183f
> > > +        .quad 0x3fa1ce5a62bc353a
> > > +        .quad 0x3fa149e3e4005a8d
> > > +        .quad 0x3fa0c58fa19dfaaa
> > > +        .quad 0x3fa0415d89e74444
> > > +        .quad 0x3f9f7a9b16782856
> > > +        .quad 0x3f9e72bf2813ce51
> > > +        .quad 0x3f9d6b2725979802
> > > +        .quad 0x3f9c63d2ec14aaf2
> > > +        .quad 0x3f9b5cc258b718e6
> > > +        .quad 0x3f9a55f548c5c43f
> > > +        .quad 0x3f994f6b99a24475
> > > +        .quad 0x3f98492528c8cabf
> > > +        .quad 0x3f974321d3d006d3
> > > +        .quad 0x3f963d6178690bd6
> > > +        .quad 0x3f9537e3f45f3565
> > > +        .quad 0x3f9432a925980cc1
> > > +        .quad 0x3f932db0ea132e22
> > > +        .quad 0x3f9228fb1fea2e28
> > > +        .quad 0x3f912487a5507f70
> > > +        .quad 0x3f90205658935847
> > > +        .quad 0x3f8e38ce3033310c
> > > +        .quad 0x3f8c317384c75f06
> > > +        .quad 0x3f8a2a9c6c170462
> > > +        .quad 0x3f882448a388a2aa
> > > +        .quad 0x3f861e77e8b53fc6
> > > +        .quad 0x3f841929f96832f0
> > > +        .quad 0x3f82145e939ef1e9
> > > +        .quad 0x3f8010157588de71
> > > +        .quad 0x3f7c189cbb0e27fb
> > > +        .quad 0x3f78121214586b54
> > > +        .quad 0x3f740c8a747878e2
> > > +        .quad 0x3f70080559588b35
> > > +        .quad 0x3f680904828985c0
> > > +        .quad 0x3f60040155d5889e
> > > +        .quad 0x3f50020055655889
> > > +        .quad 0x0000000000000000
> > > +        /*== poly_coeff[4] ==*/
> > > +        .align 32
> > > +        .quad 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A, 0x3fc9999CACDB4D0A /* coeff4 */
> > > +        .quad 0xbfd0000148058EE1, 0xbfd0000148058EE1, 0xbfd0000148058EE1, 0xbfd0000148058EE1 /* coeff3 */
> > > +        .quad 0x3fd55555555543C5, 0x3fd55555555543C5, 0x3fd55555555543C5, 0x3fd55555555543C5 /* coeff2 */
> > > +        .quad 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F, 0xbfdFFFFFFFFFF81F /* coeff1 */
> > > +        /*== ExpMask ==*/
> > > +        .align 32
> > > +        .quad 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff, 0x000fffffffffffff
> > > +        /*== Two10 ==*/
> > > +        .align 32
> > > +        .quad 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000, 0x3f50000000000000
> > > +        /*== MinLog1p = -1+2^(-53) ==*/
> > > +        .align 32
> > > +        .quad 0xbfefffffffffffff, 0xbfefffffffffffff, 0xbfefffffffffffff, 0xbfefffffffffffff
> > > +        /*== MaxLog1p ==*/
> > > +        .align 32
> > > +        .quad 0x7f3ffffffffff000, 0x7f3ffffffffff000, 0x7f3ffffffffff000, 0x7f3ffffffffff000
> > > +        /*== One ==*/
> > > +        .align 32
> > > +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> > > +        /*== SgnMask ==*/
> > > +        .align 32
> > > +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff
> > > +        /*== XThreshold ==*/
> > > +        .align 32
> > > +        .quad 0x3e00000000000000, 0x3e00000000000000, 0x3e00000000000000, 0x3e00000000000000
> > > +        /*== XhMask ==*/
> > > +        .align 32
> > > +        .quad 0xfffffffffffffc00, 0xfffffffffffffc00, 0xfffffffffffffc00, 0xfffffffffffffc00
> > > +        /*== Threshold ==*/
> > > +        .align 32
> > > +        .quad 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000, 0x4086a00000000000
> > > +        /*== Bias ==*/
> > > +        .align 32
> > > +        .quad 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000, 0x408ff80000000000
> > > +        /*== Bias1 ==*/
> > > +        .align 32
> > > +        .quad 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000, 0x408ff00000000000
> > > +        /*== ExpMask ==*/
> > > +        .align 32
> > > +        .quad 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000, 0x7ff0000000000000
> > > +        /*== ExpMask2 ==*/
> > > +        .align 32
> > > +        .quad 0x7f40000000000000, 0x7f40000000000000, 0x7f40000000000000, 0x7f40000000000000
> > > +        /*== L2L ==*/
> > > +        .align 32
> > > +        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
> > > +        .align 32
> > > +        .type        __svml_dlog1p_data_internal,@object
> > > +        .size        __svml_dlog1p_data_internal,.-__svml_dlog1p_data_internal
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core-avx2.S
> > > new file mode 100644
> > > index 0000000000..ca174a5f52
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core-avx2.S
> > > @@ -0,0 +1,20 @@
> > > +/* AVX2 version of vectorized log1p, vector length is 8.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#define _ZGVeN8v_log1p _ZGVeN8v_log1p_avx2_wrapper
> > > +#include "../svml_d_log1p8_core.S"
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core.c
> > > new file mode 100644
> > > index 0000000000..0aa35ec8c5
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core.c
> > > @@ -0,0 +1,27 @@
> > > +/* Multiple versions of vectorized log1p, vector length is 8.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#define SYMBOL_NAME _ZGVeN8v_log1p
> > > +#include "ifunc-mathvec-avx512-skx.h"
> > > +
> > > +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> > > +
> > > +#ifdef SHARED
> > > +__hidden_ver1 (_ZGVeN8v_log1p, __GI__ZGVeN8v_log1p, __redirect__ZGVeN8v_log1p)
> > > +  __attribute__ ((visibility ("hidden")));
> > > +#endif
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core_avx512.S
> > > new file mode 100644
> > > index 0000000000..5e38ff8d39
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_log1p8_core_avx512.S
> > > @@ -0,0 +1,317 @@
> > > +/* Function log1p vectorized with AVX-512.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   https://www.gnu.org/licenses/.  */
> > > +
> > > +/*
> > > + * ALGORITHM DESCRIPTION:
> > > + *
> > > + *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
> > > + *    Get short reciprocal approximation Rcp ~ 1/xh
> > > + *    R = (Rcp*xh - 1.0) + Rcp*xl
> > > + *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
> > > + *       log(Rcp) is tabulated
> > > + *
> > > + *
> > > + */
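
For readers who don't want to decode the EVEX sequence below, here is a
rough scalar C model of the reduction this comment describes.  It is
illustrative only: the kernel uses vgetmantpd/vgetexppd, vrcp14pd plus
vrndscalepd, a 16-entry -log(Rcp) table and a fixed degree-9 polynomial,
whereas the model leans on libm for the last two pieces, and it assumes
x > -1 (everything else is routed to the special-value path anyway).

  #include <math.h>

  static double
  log1p_reduction_model (double x)
  {
    double hi = fmax (1.0, x), lo = fmin (1.0, x);
    double xh = hi + lo;                      /* leading part of 1+x */
    double xl = (hi - xh) + lo;               /* rounding error of that add */
    int k;
    double m = 2.0 * frexp (xh, &k);          /* xh = 2^(k-1)*m, m in [1,2) */
    k -= 1;
    double rcp = nearbyint (16.0 / m) / 16.0; /* short reciprocal, 4 bits */
    double r = (rcp * m - 1.0) + rcp * ldexp (xl, -k);
    /* log1p(x) = k*log(2) - log(Rcp) + poly(R), with poly(R) ~ log(1+R).  */
    return k * 0x1.62e42fefa39efp-1 - log (rcp) + log1p (r);
  }
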
> > > +
> > > +/* Offsets for data table __svml_dlog1p_data_internal_avx512
> > > + */
> > > +#define Log_tbl                              0
> > > +#define One                                  128
> > > +#define SgnMask                              192
> > > +#define C075                                 256
> > > +#define poly_coeff9                          320
> > > +#define poly_coeff8                          384
> > > +#define poly_coeff7                          448
> > > +#define poly_coeff6                          512
> > > +#define poly_coeff5                          576
> > > +#define poly_coeff4                          640
> > > +#define poly_coeff3                          704
> > > +#define poly_coeff2                          768
> > > +#define L2                                   832
> > > +
> > > +#include <sysdep.h>
> > > +
> > > +        .text
> > > +     .section .text.evex512,"ax",@progbits
> > > +ENTRY(_ZGVeN8v_log1p_skx)
> > > +        pushq     %rbp
> > > +        cfi_def_cfa_offset(16)
> > > +        movq      %rsp, %rbp
> > > +        cfi_def_cfa(6, 16)
> > > +        cfi_offset(6, -16)
> > > +        andq      $-64, %rsp
> > > +        subq      $192, %rsp
> > > +        vmovups   One+__svml_dlog1p_data_internal_avx512(%rip), %zmm7
> > > +        vmovups   SgnMask+__svml_dlog1p_data_internal_avx512(%rip), %zmm14
> > > +        vmovaps   %zmm0, %zmm9
> > > +        vaddpd    {rn-sae}, %zmm9, %zmm7, %zmm11
> > > +        vandpd    %zmm14, %zmm9, %zmm8
> > > +
> > > +/* compute 1+x as high, low parts */
> > > +        vmaxpd    {sae}, %zmm9, %zmm7, %zmm10
> > > +        vminpd    {sae}, %zmm9, %zmm7, %zmm12
> > > +
> > > +/* GetMant(x), normalized to [1,2) for x>=0, NaN for x<0 */
> > > +        vgetmantpd $8, {sae}, %zmm11, %zmm6
> > > +
> > > +/* GetExp(x) */
> > > +        vgetexppd {sae}, %zmm11, %zmm5
> > > +        vsubpd    {rn-sae}, %zmm10, %zmm11, %zmm13
> > > +
> > > +/* DblRcp ~ 1/Mantissa */
> > > +        vrcp14pd  %zmm6, %zmm15
> > > +
> > > +/* Start polynomial evaluation */
> > > +        vmovups   poly_coeff9+__svml_dlog1p_data_internal_avx512(%rip), %zmm10
> > > +        vmovups   poly_coeff7+__svml_dlog1p_data_internal_avx512(%rip), %zmm11
> > > +
> > > +/* Xl */
> > > +        vsubpd    {rn-sae}, %zmm13, %zmm12, %zmm2
> > > +        vxorpd    %zmm14, %zmm5, %zmm3
> > > +
> > > +/* round DblRcp to 4 fractional bits (RN mode, no Precision exception) */
> > > +        vrndscalepd $88, {sae}, %zmm15, %zmm4
> > > +        vmovups   poly_coeff5+__svml_dlog1p_data_internal_avx512(%rip), %zmm12
> > > +        vmovups   poly_coeff6+__svml_dlog1p_data_internal_avx512(%rip), %zmm14
> > > +        vmovups   poly_coeff3+__svml_dlog1p_data_internal_avx512(%rip), %zmm13
> > > +
> > > +/* Xl*2^(-Expon) */
> > > +        vscalefpd {rn-sae}, %zmm3, %zmm2, %zmm1
> > > +
> > > +/* Reduced argument: R = DblRcp*(Mantissa+Xl) - 1 */
> > > +        vfmsub213pd {rn-sae}, %zmm7, %zmm4, %zmm6
> > > +        vmovups   __svml_dlog1p_data_internal_avx512(%rip), %zmm3
> > > +
> > > +/*
> > > + * Table lookup
> > > + * Prepare exponent correction: DblRcp<0.75?
> > > + */
> > > +        vmovups   C075+__svml_dlog1p_data_internal_avx512(%rip), %zmm2
> > > +
> > > +/* Prepare table index */
> > > +        vpsrlq    $48, %zmm4, %zmm0
> > > +        vfmadd231pd {rn-sae}, %zmm4, %zmm1, %zmm6
> > > +        vmovups   poly_coeff8+__svml_dlog1p_data_internal_avx512(%rip), %zmm1
> > > +        vcmppd    $17, {sae}, %zmm2, %zmm4, %k1
> > > +        vcmppd    $4, {sae}, %zmm6, %zmm6, %k0
> > > +        vfmadd231pd {rn-sae}, %zmm6, %zmm10, %zmm1
> > > +        vmovups   poly_coeff4+__svml_dlog1p_data_internal_avx512(%rip), %zmm10
> > > +        vfmadd231pd {rn-sae}, %zmm6, %zmm11, %zmm14
> > > +        vmovups   L2+__svml_dlog1p_data_internal_avx512(%rip), %zmm4
> > > +        vpermt2pd Log_tbl+64+__svml_dlog1p_data_internal_avx512(%rip), %zmm0, %zmm3
> > > +
> > > +/* add 1 to Expon if DblRcp<0.75 */
> > > +        vaddpd    {rn-sae}, %zmm7, %zmm5, %zmm5{%k1}
> > > +
> > > +/* R^2 */
> > > +        vmulpd    {rn-sae}, %zmm6, %zmm6, %zmm0
> > > +        vfmadd231pd {rn-sae}, %zmm6, %zmm12, %zmm10
> > > +        vmovups   poly_coeff2+__svml_dlog1p_data_internal_avx512(%rip), %zmm12
> > > +        vmulpd    {rn-sae}, %zmm0, %zmm0, %zmm15
> > > +        vfmadd231pd {rn-sae}, %zmm6, %zmm13, %zmm12
> > > +        vfmadd213pd {rn-sae}, %zmm14, %zmm0, %zmm1
> > > +        kmovw     %k0, %edx
> > > +        vfmadd213pd {rn-sae}, %zmm12, %zmm0, %zmm10
> > > +
> > > +/* polynomial */
> > > +        vfmadd213pd {rn-sae}, %zmm10, %zmm15, %zmm1
> > > +        vfmadd213pd {rn-sae}, %zmm6, %zmm0, %zmm1
> > > +        vaddpd    {rn-sae}, %zmm1, %zmm3, %zmm6
> > > +        vfmadd213pd {rn-sae}, %zmm6, %zmm4, %zmm5
> > > +        vorpd     %zmm8, %zmm5, %zmm0
> > > +        testl     %edx, %edx
> > > +
> > > +/* Go to special inputs processing branch */
> > > +        jne       L(SPECIAL_VALUES_BRANCH)
> > > +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm9
> > > +
> > > +/* Restore registers
> > > + * and exit the function
> > > + */
> > > +
> > > +L(EXIT):
> > > +        movq      %rbp, %rsp
> > > +        popq      %rbp
> > > +        cfi_def_cfa(7, 8)
> > > +        cfi_restore(6)
> > > +        ret
> > > +        cfi_def_cfa(6, 16)
> > > +        cfi_offset(6, -16)
> > > +
> > > +/* Branch to process
> > > + * special inputs
> > > + */
> > > +
> > > +L(SPECIAL_VALUES_BRANCH):
> > > +        vmovups   %zmm9, 64(%rsp)
> > > +        vmovups   %zmm0, 128(%rsp)
> > > +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> > > +
> > > +        xorl      %eax, %eax
> > > +                                # LOE rbx r12 r13 r14 r15 eax edx
> > > +
> > > +        vzeroupper
> > > +        movq      %r12, 16(%rsp)
> > > +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> > > +        movl      %eax, %r12d
> > > +        movq      %r13, 8(%rsp)
> > > +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> > > +        movl      %edx, %r13d
> > > +        movq      %r14, (%rsp)
> > > +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> > > +                                # LOE rbx r15 r12d r13d
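
If I decode the three .cfi_escape expressions above correctly, each one
just tells the unwinder how to recompute a save slot from the CFA, which
is needed because %rsp has been realigned.  Roughly (sketch, name mine):

  #include <stdint.h>

  /* slot is -176, -184 or -192, as in the three expressions above.  */
  static uint64_t
  callee_save_slot (uint64_t cfa, int64_t slot)
  {
    return ((cfa - 8) & ~(uint64_t) 63) + slot;  /* lit8; minus; and; plus */
  }

That is, (CFA - 8) & -64 is the value %rsp had right after the
"andq $-64, %rsp", so -176/-184/-192 land exactly on the 16(%rsp),
8(%rsp) and (%rsp) stores above (the final %rsp is that value - 192).
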
> > > +
> > > +/* Range mask
> > > + * bits check
> > > + */
> > > +
> > > +L(RANGEMASK_CHECK):
> > > +        btl       %r12d, %r13d
> > > +
> > > +/* Call scalar math function */
> > > +        jc        L(SCALAR_MATH_CALL)
> > > +                                # LOE rbx r15 r12d r13d
> > > +
> > > +/* Special inputs
> > > + * processing loop
> > > + */
> > > +
> > > +L(SPECIAL_VALUES_LOOP):
> > > +        incl      %r12d
> > > +        cmpl      $8, %r12d
> > > +
> > > +/* Check bits in range mask */
> > > +        jl        L(RANGEMASK_CHECK)
> > > +                                # LOE rbx r15 r12d r13d
> > > +
> > > +        movq      16(%rsp), %r12
> > > +        cfi_restore(12)
> > > +        movq      8(%rsp), %r13
> > > +        cfi_restore(13)
> > > +        movq      (%rsp), %r14
> > > +        cfi_restore(14)
> > > +        vmovups   128(%rsp), %zmm0
> > > +
> > > +/* Go to exit */
> > > +        jmp       L(EXIT)
> > > +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> > > +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> > > +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> > > +                                # LOE rbx r12 r13 r14 r15 zmm0
> > > +
> > > +/* Scalar math function call
> > > + * to process special input
> > > + */
> > > +
> > > +L(SCALAR_MATH_CALL):
> > > +        movl      %r12d, %r14d
> > > +        movsd     64(%rsp,%r14,8), %xmm0
> > > +        call      log1p@PLT
> > > +                                # LOE rbx r14 r15 r12d r13d xmm0
> > > +
> > > +        movsd     %xmm0, 128(%rsp,%r14,8)
> > > +
> > > +/* Process special inputs in loop */
> > > +        jmp       L(SPECIAL_VALUES_LOOP)
> > > +                                # LOE rbx r15 r12d r13d
> > > +END(_ZGVeN8v_log1p_skx)
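
In case it helps review: the SPECIAL_VALUES_BRANCH / RANGEMASK_CHECK /
SCALAR_MATH_CALL machinery above amounts to the per-lane fallback below
(hedged C sketch, names mine; the asm additionally spills r12-r14 and
the input/result vectors around the PLT calls):

  #include <math.h>

  /* src = the 8 saved inputs at 64(%rsp), dst = the 8 results at
     128(%rsp), mask = the range mask that was moved into %r13d.  */
  static void
  fixup_special_lanes (const double *src, double *dst, unsigned int mask)
  {
    for (unsigned int i = 0; i < 8; i++)   /* L(SPECIAL_VALUES_LOOP)  */
      if (mask & (1u << i))                /* btl %r12d, %r13d        */
        dst[i] = log1p (src[i]);           /* call log1p@PLT per lane */
  }
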
> > > +
> > > +        .section .rodata, "a"
> > > +        .align 64
> > > +
> > > +#ifdef __svml_dlog1p_data_internal_avx512_typedef
> > > +typedef unsigned int VUINT32;
> > > +typedef struct {
> > > +        __declspec(align(64)) VUINT32 Log_tbl[16][2];
> > > +        __declspec(align(64)) VUINT32 One[8][2];
> > > +        __declspec(align(64)) VUINT32 SgnMask[8][2];
> > > +        __declspec(align(64)) VUINT32 C075[8][2];
> > > +        __declspec(align(64)) VUINT32 poly_coeff9[8][2];
> > > +        __declspec(align(64)) VUINT32 poly_coeff8[8][2];
> > > +        __declspec(align(64)) VUINT32 poly_coeff7[8][2];
> > > +        __declspec(align(64)) VUINT32 poly_coeff6[8][2];
> > > +        __declspec(align(64)) VUINT32 poly_coeff5[8][2];
> > > +        __declspec(align(64)) VUINT32 poly_coeff4[8][2];
> > > +        __declspec(align(64)) VUINT32 poly_coeff3[8][2];
> > > +        __declspec(align(64)) VUINT32 poly_coeff2[8][2];
> > > +        __declspec(align(64)) VUINT32 L2[8][2];
> > > +   } __svml_dlog1p_data_internal_avx512;
> > > +#endif
> > > +__svml_dlog1p_data_internal_avx512:
> > > +        /*== Log_tbl ==*/
> > > +        .quad 0x0000000000000000
> > > +        .quad 0xbfaf0a30c01162a6
> > > +        .quad 0xbfbe27076e2af2e6
> > > +        .quad 0xbfc5ff3070a793d4
> > > +        .quad 0xbfcc8ff7c79a9a22
> > > +        .quad 0xbfd1675cababa60e
> > > +        .quad 0xbfd4618bc21c5ec2
> > > +        .quad 0xbfd739d7f6bbd007
> > > +        .quad 0x3fd269621134db92
> > > +        .quad 0x3fcf991c6cb3b379
> > > +        .quad 0x3fca93ed3c8ad9e3
> > > +        .quad 0x3fc5bf406b543db2
> > > +        .quad 0x3fc1178e8227e47c
> > > +        .quad 0x3fb9335e5d594989
> > > +        .quad 0x3fb08598b59e3a07
> > > +        .quad 0x3fa0415d89e74444
> > > +        /*== One ==*/
> > > +        .align 64
> > > +        .quad 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000, 0x3ff0000000000000
> > > +        /*== SgnMask ==*/
> > > +        .align 64
> > > +        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000
> > > +        /*== C075 0.75 ==*/
> > > +        .align 64
> > > +        .quad 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000, 0x3fe8000000000000
> > > +        /*== poly_coeff9 ==*/
> > > +        .align 64
> > > +        .quad 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70, 0x3fbC81CD309D7C70
> > > +        /*== poly_coeff8 ==*/
> > > +        .align 64
> > > +        .quad 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62, 0xbfc007357E93AF62
> > > +        /*== poly_coeff7 ==*/
> > > +        .align 64
> > > +        .quad 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF, 0x3fc249229CEE81EF
> > > +        /*== poly_coeff6 ==*/
> > > +        .align 64
> > > +        .quad 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06, 0xbfc55553FB28DB06
> > > +        /*== poly_coeff5 ==*/
> > > +        .align 64
> > > +        .quad 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C, 0x3fc9999999CC9F5C
> > > +        /*== poly_coeff4 ==*/
> > > +        .align 64
> > > +        .quad 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD, 0xbfd00000000C05BD
> > > +        /*== poly_coeff3 ==*/
> > > +        .align 64
> > > +        .quad 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466, 0x3fd5555555555466
> > > +        /*== poly_coeff2 ==*/
> > > +        .align 64
> > > +        .quad 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6, 0xbfdFFFFFFFFFFFC6
> > > +        /*== L2 = log(2) ==*/
> > > +        .align 64
> > > +        .quad 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF, 0x3fe62E42FEFA39EF
> > > +        .align 64
> > > +        .type        __svml_dlog1p_data_internal_avx512,@object
> > > +        .size        __svml_dlog1p_data_internal_avx512,.-__svml_dlog1p_data_internal_avx512
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core-avx2.S
> > > new file mode 100644
> > > index 0000000000..3c0a0a01a2
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core-avx2.S
> > > @@ -0,0 +1,20 @@
> > > +/* AVX2 version of vectorized log1pf, vector length is 16.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#define _ZGVeN16v_log1pf _ZGVeN16v_log1pf_avx2_wrapper
> > > +#include "../svml_s_log1pf16_core.S"
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core.c
> > > new file mode 100644
> > > index 0000000000..9af1320547
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core.c
> > > @@ -0,0 +1,28 @@
> > > +/* Multiple versions of vectorized log1pf, vector length is 16.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#define SYMBOL_NAME _ZGVeN16v_log1pf
> > > +#include "ifunc-mathvec-avx512-skx.h"
> > > +
> > > +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> > > +
> > > +#ifdef SHARED
> > > +__hidden_ver1 (_ZGVeN16v_log1pf, __GI__ZGVeN16v_log1pf,
> > > +            __redirect__ZGVeN16v_log1pf)
> > > +  __attribute__ ((visibility ("hidden")));
> > > +#endif
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core_avx512.S
> > > new file mode 100644
> > > index 0000000000..78b2fe417f
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf16_core_avx512.S
> > > @@ -0,0 +1,271 @@
> > > +/* Function log1pf vectorized with AVX-512.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   https://www.gnu.org/licenses/.  */
> > > +
> > > +/*
> > > + * ALGORITHM DESCRIPTION:
> > > + *
> > > + *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
> > > + *    Get short reciprocal approximation Rcp ~ 1/xh
> > > + *    R = (Rcp*xh - 1.0) + Rcp*xl
> > > + *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
> > > + *       log(Rcp) is tabulated
> > > + *
> > > + *
> > > + */
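
Note that the single-precision kernels below appear to take a different
route than this (shared) description suggests: as far as I can tell they
never form a reciprocal or look up -log(Rcp), they reduce around the 2/3
breakpoint with integer manipulation of the encoding (iBrkValue,
iOffExpoMask).  A rough scalar C model, illustrative only (it borrows
log1pf for the polynomial part, assumes an arithmetic right shift, and
only covers inputs that stay on the main path):

  #include <math.h>
  #include <stdint.h>
  #include <string.h>

  static float
  log1pf_reduction_model (float x)
  {
    float hi = fmaxf (1.0f, x), lo = fminf (1.0f, x);
    float a  = hi + lo;                       /* high part of 1+x */
    float al = (hi - a) + lo;                 /* low part of 1+x  */
    uint32_t ia, brk = 0x3f2aaaabu;           /* iBrkValue: SP 2/3 */
    memcpy (&ia, &a, sizeof ia);
    int32_t d = (int32_t) ia - (int32_t) brk;
    int k = d >> 23;                          /* vpsubd + vpsrad $23 */
    uint32_t im = ((uint32_t) d & 0x007fffffu) + brk; /* vpand + vpaddd */
    float m;                                  /* a = m*2^k, m in [2/3,4/3) */
    memcpy (&m, &im, sizeof m);
    float r = (m - 1.0f) + ldexpf (al, -k);   /* reduced argument */
    return (float) k * 0x1.62e43p-1f + log1pf (r);  /* sLn2; poly ~ log1pf */
  }
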
> > > +
> > > +/* Offsets for data table __svml_slog1p_data_internal
> > > + */
> > > +#define SgnMask                              0
> > > +#define sOne                                 64
> > > +#define sPoly_1                              128
> > > +#define sPoly_2                              192
> > > +#define sPoly_3                              256
> > > +#define sPoly_4                              320
> > > +#define sPoly_5                              384
> > > +#define sPoly_6                              448
> > > +#define sPoly_7                              512
> > > +#define sPoly_8                              576
> > > +#define iHiDelta                             640
> > > +#define iLoRange                             704
> > > +#define iBrkValue                            768
> > > +#define iOffExpoMask                         832
> > > +#define sLn2                                 896
> > > +
> > > +#include <sysdep.h>
> > > +
> > > +        .text
> > > +     .section .text.evex512,"ax",@progbits
> > > +ENTRY(_ZGVeN16v_log1pf_skx)
> > > +        pushq     %rbp
> > > +        cfi_def_cfa_offset(16)
> > > +        movq      %rsp, %rbp
> > > +        cfi_def_cfa(6, 16)
> > > +        cfi_offset(6, -16)
> > > +        andq      $-64, %rsp
> > > +        subq      $192, %rsp
> > > +        vmovups   sOne+__svml_slog1p_data_internal(%rip), %zmm2
> > > +
> > > +/* reduction: compute r,n */
> > > +        vmovups   iBrkValue+__svml_slog1p_data_internal(%rip), %zmm12
> > > +        vmovups   SgnMask+__svml_slog1p_data_internal(%rip), %zmm4
> > > +        vmovaps   %zmm0, %zmm3
> > > +
> > > +/* compute 1+x as high, low parts */
> > > +        vmaxps    {sae}, %zmm3, %zmm2, %zmm5
> > > +        vminps    {sae}, %zmm3, %zmm2, %zmm7
> > > +        vandnps   %zmm3, %zmm4, %zmm1
> > > +        vpternlogd $255, %zmm4, %zmm4, %zmm4
> > > +        vaddps    {rn-sae}, %zmm7, %zmm5, %zmm9
> > > +        vpsubd    %zmm12, %zmm9, %zmm10
> > > +        vsubps    {rn-sae}, %zmm9, %zmm5, %zmm6
> > > +
> > > +/* check argument value ranges */
> > > +        vpaddd    iHiDelta+__svml_slog1p_data_internal(%rip), %zmm9, %zmm8
> > > +        vpsrad    $23, %zmm10, %zmm13
> > > +        vmovups   sPoly_5+__svml_slog1p_data_internal(%rip), %zmm9
> > > +        vpcmpd    $5, iLoRange+__svml_slog1p_data_internal(%rip), %zmm8, %k1
> > > +        vpslld    $23, %zmm13, %zmm14
> > > +        vaddps    {rn-sae}, %zmm7, %zmm6, %zmm15
> > > +        vcvtdq2ps {rn-sae}, %zmm13, %zmm0
> > > +        vpsubd    %zmm14, %zmm2, %zmm13
> > > +        vmovups   sPoly_8+__svml_slog1p_data_internal(%rip), %zmm7
> > > +        vmovups   sPoly_1+__svml_slog1p_data_internal(%rip), %zmm14
> > > +        vmulps    {rn-sae}, %zmm13, %zmm15, %zmm6
> > > +        vpandd    iOffExpoMask+__svml_slog1p_data_internal(%rip), %zmm10, %zmm11
> > > +        vpaddd    %zmm12, %zmm11, %zmm5
> > > +        vmovups   sPoly_4+__svml_slog1p_data_internal(%rip), %zmm10
> > > +        vmovups   sPoly_3+__svml_slog1p_data_internal(%rip), %zmm11
> > > +        vmovups   sPoly_2+__svml_slog1p_data_internal(%rip), %zmm12
> > > +
> > > +/* polynomial evaluation */
> > > +        vsubps    {rn-sae}, %zmm2, %zmm5, %zmm2
> > > +        vaddps    {rn-sae}, %zmm6, %zmm2, %zmm15
> > > +        vmovups   sPoly_7+__svml_slog1p_data_internal(%rip), %zmm2
> > > +        vfmadd231ps {rn-sae}, %zmm15, %zmm7, %zmm2
> > > +        vpandnd   %zmm8, %zmm8, %zmm4{%k1}
> > > +        vmovups   sPoly_6+__svml_slog1p_data_internal(%rip), %zmm8
> > > +
> > > +/* combine and get argument value range mask */
> > > +        vptestmd  %zmm4, %zmm4, %k0
> > > +        vfmadd213ps {rn-sae}, %zmm8, %zmm15, %zmm2
> > > +        kmovw     %k0, %edx
> > > +        vfmadd213ps {rn-sae}, %zmm9, %zmm15, %zmm2
> > > +        vfmadd213ps {rn-sae}, %zmm10, %zmm15, %zmm2
> > > +        vfmadd213ps {rn-sae}, %zmm11, %zmm15, %zmm2
> > > +        vfmadd213ps {rn-sae}, %zmm12, %zmm15, %zmm2
> > > +        vfmadd213ps {rn-sae}, %zmm14, %zmm15, %zmm2
> > > +        vmulps    {rn-sae}, %zmm15, %zmm2, %zmm4
> > > +        vfmadd213ps {rn-sae}, %zmm15, %zmm15, %zmm4
> > > +
> > > +/* final reconstruction */
> > > +        vmovups   sLn2+__svml_slog1p_data_internal(%rip), %zmm15
> > > +        vfmadd213ps {rn-sae}, %zmm4, %zmm15, %zmm0
> > > +        vorps     %zmm1, %zmm0, %zmm0
> > > +        testl     %edx, %edx
> > > +
> > > +/* Go to special inputs processing branch */
> > > +        jne       L(SPECIAL_VALUES_BRANCH)
> > > +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm3
> > > +
> > > +/* Restore registers
> > > + * and exit the function
> > > + */
> > > +
> > > +L(EXIT):
> > > +        movq      %rbp, %rsp
> > > +        popq      %rbp
> > > +        cfi_def_cfa(7, 8)
> > > +        cfi_restore(6)
> > > +        ret
> > > +        cfi_def_cfa(6, 16)
> > > +        cfi_offset(6, -16)
> > > +
> > > +/* Branch to process
> > > + * special inputs
> > > + */
> > > +
> > > +L(SPECIAL_VALUES_BRANCH):
> > > +        vmovups   %zmm3, 64(%rsp)
> > > +        vmovups   %zmm0, 128(%rsp)
> > > +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> > > +
> > > +        xorl      %eax, %eax
> > > +                                # LOE rbx r12 r13 r14 r15 eax edx
> > > +
> > > +        vzeroupper
> > > +        movq      %r12, 16(%rsp)
> > > +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> > > +        movl      %eax, %r12d
> > > +        movq      %r13, 8(%rsp)
> > > +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> > > +        movl      %edx, %r13d
> > > +        movq      %r14, (%rsp)
> > > +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> > > +                                # LOE rbx r15 r12d r13d
> > > +
> > > +/* Range mask
> > > + * bits check
> > > + */
> > > +
> > > +L(RANGEMASK_CHECK):
> > > +        btl       %r12d, %r13d
> > > +
> > > +/* Call scalar math function */
> > > +        jc        L(SCALAR_MATH_CALL)
> > > +                                # LOE rbx r15 r12d r13d
> > > +
> > > +/* Special inputs
> > > + * processing loop
> > > + */
> > > +
> > > +L(SPECIAL_VALUES_LOOP):
> > > +        incl      %r12d
> > > +        cmpl      $16, %r12d
> > > +
> > > +/* Check bits in range mask */
> > > +        jl        L(RANGEMASK_CHECK)
> > > +                                # LOE rbx r15 r12d r13d
> > > +
> > > +        movq      16(%rsp), %r12
> > > +        cfi_restore(12)
> > > +        movq      8(%rsp), %r13
> > > +        cfi_restore(13)
> > > +        movq      (%rsp), %r14
> > > +        cfi_restore(14)
> > > +        vmovups   128(%rsp), %zmm0
> > > +
> > > +/* Go to exit */
> > > +        jmp       L(EXIT)
> > > +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> > > +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> > > +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> > > +                                # LOE rbx r12 r13 r14 r15 zmm0
> > > +
> > > +/* Scalar math function call
> > > + * to process special input
> > > + */
> > > +
> > > +L(SCALAR_MATH_CALL):
> > > +        movl      %r12d, %r14d
> > > +        movss     64(%rsp,%r14,4), %xmm0
> > > +        call      log1pf@PLT
> > > +                                # LOE rbx r14 r15 r12d r13d xmm0
> > > +
> > > +        movss     %xmm0, 128(%rsp,%r14,4)
> > > +
> > > +/* Process special inputs in loop */
> > > +        jmp       L(SPECIAL_VALUES_LOOP)
> > > +                                # LOE rbx r15 r12d r13d
> > > +END(_ZGVeN16v_log1pf_skx)
> > > +
> > > +        .section .rodata, "a"
> > > +        .align 64
> > > +
> > > +#ifdef __svml_slog1p_data_internal_typedef
> > > +typedef unsigned int VUINT32;
> > > +typedef struct {
> > > +        __declspec(align(64)) VUINT32 SgnMask[16][1];
> > > +        __declspec(align(64)) VUINT32 sOne[16][1];
> > > +        __declspec(align(64)) VUINT32 sPoly[8][16][1];
> > > +        __declspec(align(64)) VUINT32 iHiDelta[16][1];
> > > +        __declspec(align(64)) VUINT32 iLoRange[16][1];
> > > +        __declspec(align(64)) VUINT32 iBrkValue[16][1];
> > > +        __declspec(align(64)) VUINT32 iOffExpoMask[16][1];
> > > +        __declspec(align(64)) VUINT32 sLn2[16][1];
> > > +} __svml_slog1p_data_internal;
> > > +#endif
> > > +__svml_slog1p_data_internal:
> > > +        /*== SgnMask ==*/
> > > +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> > > +        /*== sOne = SP 1.0 ==*/
> > > +        .align 64
> > > +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> > > +        /*== sPoly[] = SP polynomial ==*/
> > > +        .align 64
> > > +        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
> > > +        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
> > > +        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
> > > +        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
> > > +        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
> > > +        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
> > > +        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
> > > +        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
> > > +        /*== iHiDelta = SP 80000000-7f000000 ==*/
> > > +        .align 64
> > > +        .long 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000
> > > +        /*== iLoRange = SP 00800000+iHiDelta ==*/
> > > +        .align 64
> > > +        .long 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000
> > > +        /*== iBrkValue = SP 2/3 ==*/
> > > +        .align 64
> > > +        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
> > > +        /*== iOffExpoMask = SP significand mask ==*/
> > > +        .align 64
> > > +        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
> > > +        /*== sLn2 = SP ln(2) ==*/
> > > +        .align 64
> > > +        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
> > > +        .align 64
> > > +        .type        __svml_slog1p_data_internal,@object
> > > +        .size        __svml_slog1p_data_internal,.-__svml_slog1p_data_internal
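
The iHiDelta/iLoRange pair just above drives the "check argument value
ranges" step: if I read the vpaddd/vpcmpd sequence (and the
paddd/pcmpgtd one in the SSE4 version) correctly, a lane is sent to the
scalar fallback whenever 1+x falls outside the positive normal range
[2^-126, 2^127), i.e. when it is zero, subnormal, too large, negative,
Inf or NaN.  A scalar sketch, illustrative only:

  #include <stdint.h>
  #include <string.h>

  static int
  needs_scalar_fallback (float a)   /* a = high part of 1+x */
  {
    uint32_t ia;
    memcpy (&ia, &a, sizeof ia);
    /* iHiDelta = 0x01000000, iLoRange = 0x01800000 in the table above;
       two's-complement wrap is assumed, as in the SIMD compare.  */
    return (int32_t) (ia + 0x01000000u) < (int32_t) 0x01800000;
  }
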
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core-sse2.S
> > > new file mode 100644
> > > index 0000000000..913c8290c8
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core-sse2.S
> > > @@ -0,0 +1,20 @@
> > > +/* SSE2 version of vectorized log1pf, vector length is 4.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#define _ZGVbN4v_log1pf _ZGVbN4v_log1pf_sse2
> > > +#include "../svml_s_log1pf4_core.S"
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core.c
> > > new file mode 100644
> > > index 0000000000..b6aff48023
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core.c
> > > @@ -0,0 +1,28 @@
> > > +/* Multiple versions of vectorized log1pf, vector length is 4.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#define SYMBOL_NAME _ZGVbN4v_log1pf
> > > +#include "ifunc-mathvec-sse4_1.h"
> > > +
> > > +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> > > +
> > > +#ifdef SHARED
> > > +__hidden_ver1 (_ZGVbN4v_log1pf, __GI__ZGVbN4v_log1pf,
> > > +            __redirect__ZGVbN4v_log1pf)
> > > +  __attribute__ ((visibility ("hidden")));
> > > +#endif
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core_sse4.S
> > > new file mode 100644
> > > index 0000000000..ef1bae58c0
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf4_core_sse4.S
> > > @@ -0,0 +1,252 @@
> > > +/* Function log1pf vectorized with SSE4.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   https://www.gnu.org/licenses/.  */
> > > +
> > > +/*
> > > + * ALGORITHM DESCRIPTION:
> > > + *
> > > + *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
> > > + *    Get short reciprocal approximation Rcp ~ 1/xh
> > > + *    R = (Rcp*xh - 1.0) + Rcp*xl
> > > + *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
> > > + *       log(Rcp) is tabulated
> > > + *
> > > + *
> > > + */
> > > +
> > > +/* Offsets for data table __svml_slog1p_data_internal
> > > + */
> > > +#define SgnMask                              0
> > > +#define sOne                                 16
> > > +#define sPoly                                32
> > > +#define iHiDelta                             160
> > > +#define iLoRange                             176
> > > +#define iBrkValue                            192
> > > +#define iOffExpoMask                         208
> > > +#define sLn2                                 224
> > > +
> > > +#include <sysdep.h>
> > > +
> > > +        .text
> > > +     .section .text.sse4,"ax",@progbits
> > > +ENTRY(_ZGVbN4v_log1pf_sse4)
> > > +        subq      $72, %rsp
> > > +        cfi_def_cfa_offset(80)
> > > +        movups    sOne+__svml_slog1p_data_internal(%rip), %xmm7
> > > +
> > > +/* compute 1+x as high, low parts */
> > > +        movaps    %xmm7, %xmm1
> > > +        movaps    %xmm7, %xmm5
> > > +        maxps     %xmm0, %xmm1
> > > +        minps     %xmm0, %xmm5
> > > +        movaps    %xmm1, %xmm4
> > > +
> > > +/* check argument value ranges */
> > > +        movdqu    iHiDelta+__svml_slog1p_data_internal(%rip), %xmm2
> > > +        addps     %xmm5, %xmm4
> > > +
> > > +/* reduction: compute r,n */
> > > +        movdqu    iBrkValue+__svml_slog1p_data_internal(%rip), %xmm3
> > > +        paddd     %xmm4, %xmm2
> > > +        movdqu    iOffExpoMask+__svml_slog1p_data_internal(%rip), %xmm8
> > > +        subps     %xmm4, %xmm1
> > > +        psubd     %xmm3, %xmm4
> > > +        addps     %xmm1, %xmm5
> > > +        pand      %xmm4, %xmm8
> > > +        psrad     $23, %xmm4
> > > +        cvtdq2ps  %xmm4, %xmm10
> > > +        pslld     $23, %xmm4
> > > +        movaps    %xmm7, %xmm1
> > > +        paddd     %xmm3, %xmm8
> > > +        psubd     %xmm4, %xmm1
> > > +        mulps     %xmm5, %xmm1
> > > +
> > > +/* polynomial evaluation */
> > > +        subps     %xmm7, %xmm8
> > > +
> > > +/* final reconstruction */
> > > +        mulps     sLn2+__svml_slog1p_data_internal(%rip), %xmm10
> > > +        addps     %xmm8, %xmm1
> > > +        movups    sPoly+112+__svml_slog1p_data_internal(%rip), %xmm9
> > > +        mulps     %xmm1, %xmm9
> > > +        movdqu    iLoRange+__svml_slog1p_data_internal(%rip), %xmm6
> > > +        pcmpgtd   %xmm2, %xmm6
> > > +        addps     sPoly+96+__svml_slog1p_data_internal(%rip), %xmm9
> > > +
> > > +/* combine and get argument value range mask */
> > > +        movmskps  %xmm6, %edx
> > > +        movups    SgnMask+__svml_slog1p_data_internal(%rip), %xmm11
> > > +        mulps     %xmm1, %xmm9
> > > +        andnps    %xmm0, %xmm11
> > > +        addps     sPoly+80+__svml_slog1p_data_internal(%rip), %xmm9
> > > +        mulps     %xmm1, %xmm9
> > > +        addps     sPoly+64+__svml_slog1p_data_internal(%rip), %xmm9
> > > +        mulps     %xmm1, %xmm9
> > > +        addps     sPoly+48+__svml_slog1p_data_internal(%rip), %xmm9
> > > +        mulps     %xmm1, %xmm9
> > > +        addps     sPoly+32+__svml_slog1p_data_internal(%rip), %xmm9
> > > +        mulps     %xmm1, %xmm9
> > > +        addps     sPoly+16+__svml_slog1p_data_internal(%rip), %xmm9
> > > +        mulps     %xmm1, %xmm9
> > > +        addps     sPoly+__svml_slog1p_data_internal(%rip), %xmm9
> > > +        mulps     %xmm1, %xmm9
> > > +        mulps     %xmm1, %xmm9
> > > +        addps     %xmm9, %xmm1
> > > +        addps     %xmm10, %xmm1
> > > +        orps      %xmm11, %xmm1
> > > +        testl     %edx, %edx
> > > +
> > > +/* Go to special inputs processing branch */
> > > +        jne       L(SPECIAL_VALUES_BRANCH)
> > > +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0 xmm1
> > > +
> > > +/* Restore registers
> > > + * and exit the function
> > > + */
> > > +
> > > +L(EXIT):
> > > +        movaps    %xmm1, %xmm0
> > > +        addq      $72, %rsp
> > > +        cfi_def_cfa_offset(8)
> > > +        ret
> > > +        cfi_def_cfa_offset(80)
> > > +
> > > +/* Branch to process
> > > + * special inputs
> > > + */
> > > +
> > > +L(SPECIAL_VALUES_BRANCH):
> > > +        movups    %xmm0, 32(%rsp)
> > > +        movups    %xmm1, 48(%rsp)
> > > +                                # LOE rbx rbp r12 r13 r14 r15 edx
> > > +
> > > +        xorl      %eax, %eax
> > > +        movq      %r12, 16(%rsp)
> > > +        cfi_offset(12, -64)
> > > +        movl      %eax, %r12d
> > > +        movq      %r13, 8(%rsp)
> > > +        cfi_offset(13, -72)
> > > +        movl      %edx, %r13d
> > > +        movq      %r14, (%rsp)
> > > +        cfi_offset(14, -80)
> > > +                                # LOE rbx rbp r15 r12d r13d
> > > +
> > > +/* Range mask
> > > + * bits check
> > > + */
> > > +
> > > +L(RANGEMASK_CHECK):
> > > +        btl       %r12d, %r13d
> > > +
> > > +/* Call scalar math function */
> > > +        jc        L(SCALAR_MATH_CALL)
> > > +                                # LOE rbx rbp r15 r12d r13d
> > > +
> > > +/* Special inputs
> > > + * processing loop
> > > + */
> > > +
> > > +L(SPECIAL_VALUES_LOOP):
> > > +        incl      %r12d
> > > +        cmpl      $4, %r12d
> > > +
> > > +/* Check bits in range mask */
> > > +        jl        L(RANGEMASK_CHECK)
> > > +                                # LOE rbx rbp r15 r12d r13d
> > > +
> > > +        movq      16(%rsp), %r12
> > > +        cfi_restore(12)
> > > +        movq      8(%rsp), %r13
> > > +        cfi_restore(13)
> > > +        movq      (%rsp), %r14
> > > +        cfi_restore(14)
> > > +        movups    48(%rsp), %xmm1
> > > +
> > > +/* Go to exit */
> > > +        jmp       L(EXIT)
> > > +        cfi_offset(12, -64)
> > > +        cfi_offset(13, -72)
> > > +        cfi_offset(14, -80)
> > > +                                # LOE rbx rbp r12 r13 r14 r15 xmm1
> > > +
> > > +/* Scalar math function call
> > > + * to process special input
> > > + */
> > > +
> > > +L(SCALAR_MATH_CALL):
> > > +        movl      %r12d, %r14d
> > > +        movss     32(%rsp,%r14,4), %xmm0
> > > +        call      log1pf@PLT
> > > +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> > > +
> > > +        movss     %xmm0, 48(%rsp,%r14,4)
> > > +
> > > +/* Process special inputs in loop */
> > > +        jmp       L(SPECIAL_VALUES_LOOP)
> > > +                                # LOE rbx rbp r15 r12d r13d
> > > +END(_ZGVbN4v_log1pf_sse4)
> > > +
> > > +        .section .rodata, "a"
> > > +        .align 16
> > > +
> > > +#ifdef __svml_slog1p_data_internal_typedef
> > > +typedef unsigned int VUINT32;
> > > +typedef struct {
> > > +        __declspec(align(16)) VUINT32 SgnMask[4][1];
> > > +        __declspec(align(16)) VUINT32 sOne[4][1];
> > > +        __declspec(align(16)) VUINT32 sPoly[8][4][1];
> > > +        __declspec(align(16)) VUINT32 iHiDelta[4][1];
> > > +        __declspec(align(16)) VUINT32 iLoRange[4][1];
> > > +        __declspec(align(16)) VUINT32 iBrkValue[4][1];
> > > +        __declspec(align(16)) VUINT32 iOffExpoMask[4][1];
> > > +        __declspec(align(16)) VUINT32 sLn2[4][1];
> > > +} __svml_slog1p_data_internal;
> > > +#endif
> > > +__svml_slog1p_data_internal:
> > > +        /*== SgnMask ==*/
> > > +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> > > +        /*== sOne = SP 1.0 ==*/
> > > +        .align 16
> > > +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> > > +        /*== sPoly[] = SP polynomial ==*/
> > > +        .align 16
> > > +        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
> > > +        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
> > > +        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
> > > +        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
> > > +        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
> > > +        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
> > > +        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
> > > +        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
> > > +        /*== iHiDelta = SP 80000000-7f000000 ==*/
> > > +        .align 16
> > > +        .long 0x01000000, 0x01000000, 0x01000000, 0x01000000
> > > +        /*== iLoRange = SP 00800000+iHiDelta ==*/
> > > +        .align 16
> > > +        .long 0x01800000, 0x01800000, 0x01800000, 0x01800000
> > > +        /*== iBrkValue = SP 2/3 ==*/
> > > +        .align 16
> > > +        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
> > > +        /*== iOffExpoMask = SP significand mask ==*/
> > > +        .align 16
> > > +        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
> > > +        /*== sLn2 = SP ln(2) ==*/
> > > +        .align 16
> > > +        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
> > > +        .align 16
> > > +        .type        __svml_slog1p_data_internal,@object
> > > +        .size        __svml_slog1p_data_internal,.-__svml_slog1p_data_internal
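
For reference, the sPoly[] coefficients above are consumed as a plain
Horner chain that ends in r + r^2*P(r) (the mulps/addps ladder in this
SSE4 kernel, vfmadd chains in the AVX2/AVX-512 ones).  A scalar sketch
using the decimal values from the table comments, illustrative only:

  static float
  log1pf_poly_model (float r)        /* r is the reduced argument */
  {
    static const float P[8] = {      /* sPoly[0..7] from the table above */
      -5.0000000000000000000000000e-01f, 3.3333265781402587890625000e-01f,
      -2.5004237890243530273437500e-01f, 2.0007920265197753906250000e-01f,
      -1.6472326219081878662109375e-01f, 1.4042308926582336425781250e-01f,
      -1.5122179687023162841796875e-01f, 1.3820238411426544189453125e-01f,
    };
    float p = P[7];
    for (int i = 6; i >= 0; i--)     /* the mulps/addps ladder */
      p = p * r + P[i];
    return r + r * r * p;            /* log(1+r) ~ r + r^2 * P(r) */
  }
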
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core-sse.S
> > > new file mode 100644
> > > index 0000000000..c0b97d89e6
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core-sse.S
> > > @@ -0,0 +1,20 @@
> > > +/* SSE version of vectorized log1pf, vector length is 8.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#define _ZGVdN8v_log1pf _ZGVdN8v_log1pf_sse_wrapper
> > > +#include "../svml_s_log1pf8_core.S"
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core.c
> > > new file mode 100644
> > > index 0000000000..a2bbe37129
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core.c
> > > @@ -0,0 +1,28 @@
> > > +/* Multiple versions of vectorized log1pf, vector length is 8.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#define SYMBOL_NAME _ZGVdN8v_log1pf
> > > +#include "ifunc-mathvec-avx2.h"
> > > +
> > > +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> > > +
> > > +#ifdef SHARED
> > > +__hidden_ver1 (_ZGVdN8v_log1pf, __GI__ZGVdN8v_log1pf,
> > > +            __redirect__ZGVdN8v_log1pf)
> > > +  __attribute__ ((visibility ("hidden")));
> > > +#endif
> > > diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core_avx2.S
> > > new file mode 100644
> > > index 0000000000..957dc23e3f
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_log1pf8_core_avx2.S
> > > @@ -0,0 +1,254 @@
> > > +/* Function log1pf vectorized with AVX2.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   https://www.gnu.org/licenses/.  */
> > > +
> > > +/*
> > > + * ALGORITHM DESCRIPTION:
> > > + *
> > > + *    1+x = 2^k*(xh + xl) is computed in high-low parts; xh in [1,2)
> > > + *    Get short reciprocal approximation Rcp ~ 1/xh
> > > + *    R = (Rcp*xh - 1.0) + Rcp*xl
> > > + *    log1p(x) = k*log(2.0) - log(Rcp) + poly(R)
> > > + *       log(Rcp) is tabulated
> > > + *
> > > + *
> > > + */
> > > +
> > > +/* Offsets for data table __svml_slog1p_data_internal
> > > + */
> > > +#define SgnMask                              0
> > > +#define sOne                                 32
> > > +#define sPoly                                64
> > > +#define iHiDelta                             320
> > > +#define iLoRange                             352
> > > +#define iBrkValue                            384
> > > +#define iOffExpoMask                         416
> > > +#define sLn2                                 448
> > > +
> > > +#include <sysdep.h>
> > > +
> > > +        .text
> > > +     .section .text.avx2,"ax",@progbits
> > > +ENTRY(_ZGVdN8v_log1pf_avx2)
> > > +        pushq     %rbp
> > > +        cfi_def_cfa_offset(16)
> > > +        movq      %rsp, %rbp
> > > +        cfi_def_cfa(6, 16)
> > > +        cfi_offset(6, -16)
> > > +        andq      $-32, %rsp
> > > +        subq      $96, %rsp
> > > +        vmovups   sOne+__svml_slog1p_data_internal(%rip), %ymm2
> > > +
> > > +/* reduction: compute r,n */
> > > +        vmovups   iBrkValue+__svml_slog1p_data_internal(%rip), %ymm13
> > > +        vmovups   SgnMask+__svml_slog1p_data_internal(%rip), %ymm4
> > > +        vmovups   iLoRange+__svml_slog1p_data_internal(%rip), %ymm8
> > > +        vmovaps   %ymm0, %ymm3
> > > +
> > > +/* compute 1+x as high, low parts */
> > > +        vmaxps    %ymm3, %ymm2, %ymm5
> > > +        vminps    %ymm3, %ymm2, %ymm6
> > > +        vaddps    %ymm6, %ymm5, %ymm10
> > > +        vpsubd    %ymm13, %ymm10, %ymm11
> > > +
> > > +/* check argument value ranges */
> > > +        vpaddd    iHiDelta+__svml_slog1p_data_internal(%rip), %ymm10, %ymm9
> > > +        vsubps    %ymm10, %ymm5, %ymm7
> > > +        vpsrad    $23, %ymm11, %ymm14
> > > +        vpand     iOffExpoMask+__svml_slog1p_data_internal(%rip), %ymm11, %ymm12
> > > +        vpslld    $23, %ymm14, %ymm15
> > > +        vcvtdq2ps %ymm14, %ymm0
> > > +        vpsubd    %ymm15, %ymm2, %ymm14
> > > +        vandnps   %ymm3, %ymm4, %ymm1
> > > +        vaddps    %ymm7, %ymm6, %ymm4
> > > +        vpaddd    %ymm13, %ymm12, %ymm6
> > > +        vmulps    %ymm4, %ymm14, %ymm7
> > > +
> > > +/* polynomial evaluation */
> > > +        vsubps    %ymm2, %ymm6, %ymm2
> > > +        vpcmpgtd  %ymm9, %ymm8, %ymm5
> > > +        vmovups   sPoly+224+__svml_slog1p_data_internal(%rip), %ymm8
> > > +        vaddps    %ymm2, %ymm7, %ymm9
> > > +        vfmadd213ps sPoly+192+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
> > > +        vfmadd213ps sPoly+160+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
> > > +        vfmadd213ps sPoly+128+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
> > > +        vfmadd213ps sPoly+96+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
> > > +        vfmadd213ps sPoly+64+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
> > > +        vfmadd213ps sPoly+32+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
> > > +        vfmadd213ps sPoly+__svml_slog1p_data_internal(%rip), %ymm9, %ymm8
> > > +        vmulps    %ymm8, %ymm9, %ymm10
> > > +        vfmadd213ps %ymm9, %ymm9, %ymm10
> > > +
> > > +/* final reconstruction */
> > > +        vfmadd132ps sLn2+__svml_slog1p_data_internal(%rip), %ymm10, %ymm0
> > > +
> > > +/* combine and get argument value range mask */
> > > +        vmovmskps %ymm5, %edx
> > > +        vorps     %ymm1, %ymm0, %ymm0
> > > +        testl     %edx, %edx
> > > +
> > > +/* Go to special inputs processing branch */
> > > +        jne       L(SPECIAL_VALUES_BRANCH)
> > > +                                # LOE rbx r12 r13 r14 r15 edx ymm0 ymm3
> > > +
> > > +/* Restore registers
> > > + * and exit the function
> > > + */
> > > +
> > > +L(EXIT):
> > > +        movq      %rbp, %rsp
> > > +        popq      %rbp
> > > +        cfi_def_cfa(7, 8)
> > > +        cfi_restore(6)
> > > +        ret
> > > +        cfi_def_cfa(6, 16)
> > > +        cfi_offset(6, -16)
> > > +
> > > +/* Branch to process
> > > + * special inputs
> > > + */
> > > +
> > > +L(SPECIAL_VALUES_BRANCH):
> > > +        vmovups   %ymm3, 32(%rsp)
> > > +        vmovups   %ymm0, 64(%rsp)
> > > +                                # LOE rbx r12 r13 r14 r15 edx ymm0
> > > +
> > > +        xorl      %eax, %eax
> > > +                                # LOE rbx r12 r13 r14 r15 eax edx
> > > +
> > > +        vzeroupper
> > > +        movq      %r12, 16(%rsp)
> > > +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> > > +        movl      %eax, %r12d
> > > +        movq      %r13, 8(%rsp)
> > > +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> > > +        movl      %edx, %r13d
> > > +        movq      %r14, (%rsp)
> > > +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> > > +                                # LOE rbx r15 r12d r13d
> > > +
> > > +/* Range mask
> > > + * bits check
> > > + */
> > > +
> > > +L(RANGEMASK_CHECK):
> > > +        btl       %r12d, %r13d
> > > +
> > > +/* Call scalar math function */
> > > +        jc        L(SCALAR_MATH_CALL)
> > > +                                # LOE rbx r15 r12d r13d
> > > +
> > > +/* Special inputs
> > > + * processing loop
> > > + */
> > > +
> > > +L(SPECIAL_VALUES_LOOP):
> > > +        incl      %r12d
> > > +        cmpl      $8, %r12d
> > > +
> > > +/* Check bits in range mask */
> > > +        jl        L(RANGEMASK_CHECK)
> > > +                                # LOE rbx r15 r12d r13d
> > > +
> > > +        movq      16(%rsp), %r12
> > > +        cfi_restore(12)
> > > +        movq      8(%rsp), %r13
> > > +        cfi_restore(13)
> > > +        movq      (%rsp), %r14
> > > +        cfi_restore(14)
> > > +        vmovups   64(%rsp), %ymm0
> > > +
> > > +/* Go to exit */
> > > +        jmp       L(EXIT)
> > > +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> > > +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> > > +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> > > +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> > > +                                # LOE rbx r12 r13 r14 r15 ymm0
> > > +
> > > +/* Scalar math function call
> > > + * to process special input
> > > + */
> > > +
> > > +L(SCALAR_MATH_CALL):
> > > +        movl      %r12d, %r14d
> > > +        movss     32(%rsp,%r14,4), %xmm0
> > > +        call      log1pf@PLT
> > > +                                # LOE rbx r14 r15 r12d r13d xmm0
> > > +
> > > +        movss     %xmm0, 64(%rsp,%r14,4)
> > > +
> > > +/* Process special inputs in loop */
> > > +        jmp       L(SPECIAL_VALUES_LOOP)
> > > +                                # LOE rbx r15 r12d r13d
> > > +END(_ZGVdN8v_log1pf_avx2)
> > > +
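
The RANGEMASK_CHECK / SPECIAL_VALUES_LOOP / SCALAR_MATH_CALL sequence above
boils down to the following per-lane fallback (hypothetical C rendering;
src and dst stand for the spill slots at 32(%rsp) and 64(%rsp)):

#include <math.h>

/* Redo only the lanes flagged in the 8-bit range mask with the scalar
   routine; all other lanes keep the vector result already stored.  */
static void
log1pf_fixup_lanes (const float src[8], float dst[8], unsigned int mask)
{
  for (unsigned int i = 0; i < 8; i++)
    if (mask & (1u << i))         /* the btl %r12d, %r13d test */
      dst[i] = log1pf (src[i]);   /* the call to log1pf@PLT */
}
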
> > > +        .section .rodata, "a"
> > > +        .align 32
> > > +
> > > +#ifdef __svml_slog1p_data_internal_typedef
> > > +typedef unsigned int VUINT32;
> > > +typedef struct {
> > > +        __declspec(align(32)) VUINT32 SgnMask[8][1];
> > > +        __declspec(align(32)) VUINT32 sOne[8][1];
> > > +        __declspec(align(32)) VUINT32 sPoly[8][8][1];
> > > +        __declspec(align(32)) VUINT32 iHiDelta[8][1];
> > > +        __declspec(align(32)) VUINT32 iLoRange[8][1];
> > > +        __declspec(align(32)) VUINT32 iBrkValue[8][1];
> > > +        __declspec(align(32)) VUINT32 iOffExpoMask[8][1];
> > > +        __declspec(align(32)) VUINT32 sLn2[8][1];
> > > +} __svml_slog1p_data_internal;
> > > +#endif
> > > +__svml_slog1p_data_internal:
> > > +        /*== SgnMask ==*/
> > > +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff
> > > +        /*== sOne = SP 1.0 ==*/
> > > +        .align 32
> > > +        .long 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000
> > > +        /*== sPoly[] = SP polynomial ==*/
> > > +        .align 32
> > > +        .long 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000, 0xbf000000 /* -5.0000000000000000000000000e-01 P0 */
> > > +        .long 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94, 0x3eaaaa94 /*  3.3333265781402587890625000e-01 P1 */
> > > +        .long 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e, 0xbe80058e /* -2.5004237890243530273437500e-01 P2 */
> > > +        .long 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190, 0x3e4ce190 /*  2.0007920265197753906250000e-01 P3 */
> > > +        .long 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37, 0xbe28ad37 /* -1.6472326219081878662109375e-01 P4 */
> > > +        .long 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12, 0x3e0fcb12 /*  1.4042308926582336425781250e-01 P5 */
> > > +        .long 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3, 0xbe1ad9e3 /* -1.5122179687023162841796875e-01 P6 */
> > > +        .long 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed, 0x3e0d84ed /*  1.3820238411426544189453125e-01 P7 */
> > > +        /*== iHiDelta = SP 80000000-7f000000 ==*/
> > > +        .align 32
> > > +        .long 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000, 0x01000000
> > > +        /*== iLoRange = SP 00800000+iHiDelta ==*/
> > > +        .align 32
> > > +        .long 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000, 0x01800000
> > > +        /*== iBrkValue = SP 2/3 ==*/
> > > +        .align 32
> > > +        .long 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab, 0x3f2aaaab
> > > +        /*== iOffExpoMask = SP significand mask ==*/
> > > +        .align 32
> > > +        .long 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff
> > > +        /*== sLn2 = SP ln(2) ==*/
> > > +        .align 32
> > > +        .long 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218, 0x3f317218
> > > +        .align 32
> > > +        .type        __svml_slog1p_data_internal,@object
> > > +        .size        __svml_slog1p_data_internal,.-__svml_slog1p_data_internal
> > > diff --git a/sysdeps/x86_64/fpu/svml_d_log1p2_core.S b/sysdeps/x86_64/fpu/svml_d_log1p2_core.S
> > > new file mode 100644
> > > index 0000000000..e3f01717d9
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/svml_d_log1p2_core.S
> > > @@ -0,0 +1,29 @@
> > > +/* Function log1p vectorized with SSE2.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#include <sysdep.h>
> > > +#include "svml_d_wrapper_impl.h"
> > > +
> > > +     .text
> > > +ENTRY (_ZGVbN2v_log1p)
> > > +WRAPPER_IMPL_SSE2 log1p
> > > +END (_ZGVbN2v_log1p)
> > > +
> > > +#ifndef USE_MULTIARCH
> > > + libmvec_hidden_def (_ZGVbN2v_log1p)
> > > +#endif
> > > diff --git a/sysdeps/x86_64/fpu/svml_d_log1p4_core.S b/sysdeps/x86_64/fpu/svml_d_log1p4_core.S
> > > new file mode 100644
> > > index 0000000000..49beb96183
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/svml_d_log1p4_core.S
> > > @@ -0,0 +1,29 @@
> > > +/* Function log1p vectorized with AVX2, wrapper version.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#include <sysdep.h>
> > > +#include "svml_d_wrapper_impl.h"
> > > +
> > > +     .text
> > > +ENTRY (_ZGVdN4v_log1p)
> > > +WRAPPER_IMPL_AVX _ZGVbN2v_log1p
> > > +END (_ZGVdN4v_log1p)
> > > +
> > > +#ifndef USE_MULTIARCH
> > > + libmvec_hidden_def (_ZGVdN4v_log1p)
> > > +#endif
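
For reference, WRAPPER_IMPL_AVX effectively runs the 2-lane SSE kernel on
each 128-bit half of the YMM input; a hypothetical C rendering (not the
actual assembly macro from svml_d_wrapper_impl.h) looks like:

#include <immintrin.h>

extern __m128d _ZGVbN2v_log1p (__m128d);

/* Illustrative only: split the 4-lane vector, run the narrower kernel
   twice, and reassemble the result.  */
__m256d
log1p4_via_2lane (__m256d x)
{
  __m128d lo = _ZGVbN2v_log1p (_mm256_castpd256_pd128 (x));
  __m128d hi = _ZGVbN2v_log1p (_mm256_extractf128_pd (x, 1));
  return _mm256_insertf128_pd (_mm256_castpd128_pd256 (lo), hi, 1);
}

The AVX-512 wrappers further down chain the same way, running the 4-lane
kernel on each 256-bit half.
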
> > > diff --git a/sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S
> > > new file mode 100644
> > > index 0000000000..8b89768b7c
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/svml_d_log1p4_core_avx.S
> > > @@ -0,0 +1,25 @@
> > > +/* Function log1p vectorized in AVX ISA as wrapper to SSE4 ISA version.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#include <sysdep.h>
> > > +#include "svml_d_wrapper_impl.h"
> > > +
> > > +     .text
> > > +ENTRY (_ZGVcN4v_log1p)
> > > +WRAPPER_IMPL_AVX _ZGVbN2v_log1p
> > > +END (_ZGVcN4v_log1p)
> > > diff --git a/sysdeps/x86_64/fpu/svml_d_log1p8_core.S b/sysdeps/x86_64/fpu/svml_d_log1p8_core.S
> > > new file mode 100644
> > > index 0000000000..54b4d4ede8
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/svml_d_log1p8_core.S
> > > @@ -0,0 +1,25 @@
> > > +/* Function log1p vectorized with AVX-512, wrapper to AVX2.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#include <sysdep.h>
> > > +#include "svml_d_wrapper_impl.h"
> > > +
> > > +     .text
> > > +ENTRY (_ZGVeN8v_log1p)
> > > +WRAPPER_IMPL_AVX512 _ZGVdN4v_log1p
> > > +END (_ZGVeN8v_log1p)
> > > diff --git a/sysdeps/x86_64/fpu/svml_s_log1pf16_core.S b/sysdeps/x86_64/fpu/svml_s_log1pf16_core.S
> > > new file mode 100644
> > > index 0000000000..2c953d00fb
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/svml_s_log1pf16_core.S
> > > @@ -0,0 +1,25 @@
> > > +/* Function log1pf vectorized with AVX-512. Wrapper to AVX2 version.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#include <sysdep.h>
> > > +#include "svml_s_wrapper_impl.h"
> > > +
> > > +     .text
> > > +ENTRY (_ZGVeN16v_log1pf)
> > > +WRAPPER_IMPL_AVX512 _ZGVdN8v_log1pf
> > > +END (_ZGVeN16v_log1pf)
> > > diff --git a/sysdeps/x86_64/fpu/svml_s_log1pf4_core.S b/sysdeps/x86_64/fpu/svml_s_log1pf4_core.S
> > > new file mode 100644
> > > index 0000000000..6f68762eaa
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/svml_s_log1pf4_core.S
> > > @@ -0,0 +1,29 @@
> > > +/* Function log1pf vectorized with SSE2, wrapper version.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#include <sysdep.h>
> > > +#include "svml_s_wrapper_impl.h"
> > > +
> > > +     .text
> > > +ENTRY (_ZGVbN4v_log1pf)
> > > +WRAPPER_IMPL_SSE2 log1pf
> > > +END (_ZGVbN4v_log1pf)
> > > +
> > > +#ifndef USE_MULTIARCH
> > > + libmvec_hidden_def (_ZGVbN4v_log1pf)
> > > +#endif
> > > diff --git a/sysdeps/x86_64/fpu/svml_s_log1pf8_core.S b/sysdeps/x86_64/fpu/svml_s_log1pf8_core.S
> > > new file mode 100644
> > > index 0000000000..74f81283b1
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/svml_s_log1pf8_core.S
> > > @@ -0,0 +1,29 @@
> > > +/* Function log1pf vectorized with AVX2, wrapper version.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#include <sysdep.h>
> > > +#include "svml_s_wrapper_impl.h"
> > > +
> > > +     .text
> > > +ENTRY (_ZGVdN8v_log1pf)
> > > +WRAPPER_IMPL_AVX _ZGVbN4v_log1pf
> > > +END (_ZGVdN8v_log1pf)
> > > +
> > > +#ifndef USE_MULTIARCH
> > > + libmvec_hidden_def (_ZGVdN8v_log1pf)
> > > +#endif
> > > diff --git a/sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S
> > > new file mode 100644
> > > index 0000000000..f33be0e904
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/svml_s_log1pf8_core_avx.S
> > > @@ -0,0 +1,25 @@
> > > +/* Function log1pf vectorized in AVX ISA as wrapper to SSE4 ISA version.
> > > +   Copyright (C) 2021 Free Software Foundation, Inc.
> > > +   This file is part of the GNU C Library.
> > > +
> > > +   The GNU C Library is free software; you can redistribute it and/or
> > > +   modify it under the terms of the GNU Lesser General Public
> > > +   License as published by the Free Software Foundation; either
> > > +   version 2.1 of the License, or (at your option) any later version.
> > > +
> > > +   The GNU C Library is distributed in the hope that it will be useful,
> > > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > +   Lesser General Public License for more details.
> > > +
> > > +   You should have received a copy of the GNU Lesser General Public
> > > +   License along with the GNU C Library; if not, see
> > > +   <https://www.gnu.org/licenses/>.  */
> > > +
> > > +#include <sysdep.h>
> > > +#include "svml_s_wrapper_impl.h"
> > > +
> > > +        .text
> > > +ENTRY (_ZGVcN8v_log1pf)
> > > +WRAPPER_IMPL_AVX _ZGVbN4v_log1pf
> > > +END (_ZGVcN8v_log1pf)
> > > diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx.c
> > > new file mode 100644
> > > index 0000000000..18aa6aaeaa
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx.c
> > > @@ -0,0 +1 @@
> > > +#include "test-double-libmvec-log1p.c"
> > > diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx2.c
> > > new file mode 100644
> > > index 0000000000..18aa6aaeaa
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx2.c
> > > @@ -0,0 +1 @@
> > > +#include "test-double-libmvec-log1p.c"
> > > diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx512f.c
> > > new file mode 100644
> > > index 0000000000..18aa6aaeaa
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/test-double-libmvec-log1p-avx512f.c
> > > @@ -0,0 +1 @@
> > > +#include "test-double-libmvec-log1p.c"
> > > diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-log1p.c b/sysdeps/x86_64/fpu/test-double-libmvec-log1p.c
> > > new file mode 100644
> > > index 0000000000..40937f987a
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/test-double-libmvec-log1p.c
> > > @@ -0,0 +1,3 @@
> > > +#define LIBMVEC_TYPE double
> > > +#define LIBMVEC_FUNC log1p
> > > +#include "test-vector-abi-arg1.h"
> > > diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> > > index 08c91ff634..38359b05e3 100644
> > > --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> > > +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> > > @@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVbN2v_cbrt)
> > >  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVbN2vv_atan2)
> > >  VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVbN2v_log10)
> > >  VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVbN2v_log2)
> > > +VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVbN2v_log1p)
> > >
> > >  #define VEC_INT_TYPE __m128i
> > >
> > > diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> > > index a2fb0de309..17701e7731 100644
> > > --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> > > +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> > > @@ -43,6 +43,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVdN4v_cbrt)
> > >  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVdN4vv_atan2)
> > >  VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVdN4v_log10)
> > >  VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVdN4v_log2)
> > > +VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVdN4v_log1p)
> > >
> > >  #ifndef __ILP32__
> > >  # define VEC_INT_TYPE __m256i
> > > diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> > > index dc65a4ee25..bba62b2446 100644
> > > --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> > > +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> > > @@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVcN4v_cbrt)
> > >  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVcN4vv_atan2)
> > >  VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVcN4v_log10)
> > >  VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVcN4v_log2)
> > > +VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVcN4v_log1p)
> > >
> > >  #define VEC_INT_TYPE __m128i
> > >
> > > diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> > > index 253ee8c906..8a04e13a07 100644
> > > --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> > > +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> > > @@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrt), _ZGVeN8v_cbrt)
> > >  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2), _ZGVeN8vv_atan2)
> > >  VECTOR_WRAPPER (WRAPPER_NAME (log10), _ZGVeN8v_log10)
> > >  VECTOR_WRAPPER (WRAPPER_NAME (log2), _ZGVeN8v_log2)
> > > +VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVeN8v_log1p)
> > >
> > >  #ifndef __ILP32__
> > >  # define VEC_INT_TYPE __m512i
> > > diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx.c
> > > new file mode 100644
> > > index 0000000000..3395decaf4
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx.c
> > > @@ -0,0 +1 @@
> > > +#include "test-float-libmvec-log1pf.c"
> > > diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx2.c
> > > new file mode 100644
> > > index 0000000000..3395decaf4
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx2.c
> > > @@ -0,0 +1 @@
> > > +#include "test-float-libmvec-log1pf.c"
> > > diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx512f.c
> > > new file mode 100644
> > > index 0000000000..3395decaf4
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf-avx512f.c
> > > @@ -0,0 +1 @@
> > > +#include "test-float-libmvec-log1pf.c"
> > > diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-log1pf.c b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf.c
> > > new file mode 100644
> > > index 0000000000..1b36069ded
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/fpu/test-float-libmvec-log1pf.c
> > > @@ -0,0 +1,3 @@
> > > +#define LIBMVEC_TYPE float
> > > +#define LIBMVEC_FUNC log1pf
> > > +#include "test-vector-abi-arg1.h"
> > > diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> > > index 1c7db5146c..706f52c618 100644
> > > --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> > > +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> > > @@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVeN16v_cbrtf)
> > >  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVeN16vv_atan2f)
> > >  VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVeN16v_log10f)
> > >  VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVeN16v_log2f)
> > > +VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVeN16v_log1pf)
> > >
> > >  #define VEC_INT_TYPE __m512i
> > >
> > > diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> > > index 8ec51603b3..ceace4c53a 100644
> > > --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> > > +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> > > @@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVbN4v_cbrtf)
> > >  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVbN4vv_atan2f)
> > >  VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVbN4v_log10f)
> > >  VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVbN4v_log2f)
> > > +VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVbN4v_log1pf)
> > >
> > >  #define VEC_INT_TYPE __m128i
> > >
> > > diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> > > index 1cb4553c7a..06a4753409 100644
> > > --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> > > +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> > > @@ -43,6 +43,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVdN8v_cbrtf)
> > >  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVdN8vv_atan2f)
> > >  VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVdN8v_log10f)
> > >  VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVdN8v_log2f)
> > > +VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVdN8v_log1pf)
> > >
> > >  /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
> > >  #undef VECTOR_WRAPPER_fFF
> > > diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> > > index 6ecc1792bb..a87e5298e0 100644
> > > --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> > > +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> > > @@ -40,6 +40,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (cbrtf), _ZGVcN8v_cbrtf)
> > >  VECTOR_WRAPPER_ff (WRAPPER_NAME (atan2f), _ZGVcN8vv_atan2f)
> > >  VECTOR_WRAPPER (WRAPPER_NAME (log10f), _ZGVcN8v_log10f)
> > >  VECTOR_WRAPPER (WRAPPER_NAME (log2f), _ZGVcN8v_log2f)
> > > +VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVcN8v_log1pf)
> > >
> > >  #define VEC_INT_TYPE __m128i
> > >
> > > --
> > > 2.31.1
> > >
> >
> > LGTM.
> >
> > Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
> >
> > Thanks.
> >
> >
> > H.J.



-- 
H.J.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 17/18] x86-64: Add vector tanh/tanhf implementation to libmvec
  2021-12-29  6:39 ` [PATCH v5 17/18] x86-64: Add vector tanh/tanhf " Sunil K Pandey
  2021-12-29 21:26   ` H.J. Lu
@ 2022-01-29  1:33   ` Noah Goldstein
  1 sibling, 0 replies; 40+ messages in thread
From: Noah Goldstein @ 2022-01-29  1:33 UTC (permalink / raw)
  To: Sunil K Pandey; +Cc: GNU C Library, Kolesov, Andrey, Cornea, Marius

On Wed, Dec 29, 2021 at 12:50 AM Sunil K Pandey via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> Implement vectorized tanh/tanhf containing SSE, AVX, AVX2 and
> AVX512 versions for libmvec as per vector ABI.  It also contains
> accuracy and ABI tests for vector tanh/tanhf with regenerated ulps.
> ---
>  bits/libm-simd-decl-stubs.h                   |   11 +
>  math/bits/mathcalls.h                         |    2 +-
>  .../unix/sysv/linux/x86_64/libmvec.abilist    |    8 +
>  sysdeps/x86/fpu/bits/math-vector.h            |    4 +
>  .../x86/fpu/finclude/math-vector-fortran.h    |    4 +
>  sysdeps/x86_64/fpu/Makeconfig                 |    1 +
>  sysdeps/x86_64/fpu/Versions                   |    2 +
>  sysdeps/x86_64/fpu/libm-test-ulps             |   15 +
>  .../fpu/multiarch/svml_d_tanh2_core-sse2.S    |   20 +
>  .../x86_64/fpu/multiarch/svml_d_tanh2_core.c  |   27 +
>  .../fpu/multiarch/svml_d_tanh2_core_sse4.S    | 1272 ++++++++++++++++
>  .../fpu/multiarch/svml_d_tanh4_core-sse.S     |   20 +
>  .../x86_64/fpu/multiarch/svml_d_tanh4_core.c  |   27 +
>  .../fpu/multiarch/svml_d_tanh4_core_avx2.S    | 1279 +++++++++++++++++
>  .../fpu/multiarch/svml_d_tanh8_core-avx2.S    |   20 +
>  .../x86_64/fpu/multiarch/svml_d_tanh8_core.c  |   27 +
>  .../fpu/multiarch/svml_d_tanh8_core_avx512.S  |  472 ++++++
>  .../fpu/multiarch/svml_s_tanhf16_core-avx2.S  |   20 +
>  .../fpu/multiarch/svml_s_tanhf16_core.c       |   28 +
>  .../multiarch/svml_s_tanhf16_core_avx512.S    |  381 +++++
>  .../fpu/multiarch/svml_s_tanhf4_core-sse2.S   |   20 +
>  .../x86_64/fpu/multiarch/svml_s_tanhf4_core.c |   28 +
>  .../fpu/multiarch/svml_s_tanhf4_core_sse4.S   |  832 +++++++++++
>  .../fpu/multiarch/svml_s_tanhf8_core-sse.S    |   20 +
>  .../x86_64/fpu/multiarch/svml_s_tanhf8_core.c |   28 +
>  .../fpu/multiarch/svml_s_tanhf8_core_avx2.S   |  844 +++++++++++
>  sysdeps/x86_64/fpu/svml_d_tanh2_core.S        |   29 +
>  sysdeps/x86_64/fpu/svml_d_tanh4_core.S        |   29 +
>  sysdeps/x86_64/fpu/svml_d_tanh4_core_avx.S    |   25 +
>  sysdeps/x86_64/fpu/svml_d_tanh8_core.S        |   25 +
>  sysdeps/x86_64/fpu/svml_s_tanhf16_core.S      |   25 +
>  sysdeps/x86_64/fpu/svml_s_tanhf4_core.S       |   29 +
>  sysdeps/x86_64/fpu/svml_s_tanhf8_core.S       |   29 +
>  sysdeps/x86_64/fpu/svml_s_tanhf8_core_avx.S   |   25 +
>  .../x86_64/fpu/test-double-libmvec-tanh-avx.c |    1 +
>  .../fpu/test-double-libmvec-tanh-avx2.c       |    1 +
>  .../fpu/test-double-libmvec-tanh-avx512f.c    |    1 +
>  sysdeps/x86_64/fpu/test-double-libmvec-tanh.c |    3 +
>  .../x86_64/fpu/test-double-vlen2-wrappers.c   |    1 +
>  .../fpu/test-double-vlen4-avx2-wrappers.c     |    1 +
>  .../x86_64/fpu/test-double-vlen4-wrappers.c   |    1 +
>  .../x86_64/fpu/test-double-vlen8-wrappers.c   |    1 +
>  .../x86_64/fpu/test-float-libmvec-tanhf-avx.c |    1 +
>  .../fpu/test-float-libmvec-tanhf-avx2.c       |    1 +
>  .../fpu/test-float-libmvec-tanhf-avx512f.c    |    1 +
>  sysdeps/x86_64/fpu/test-float-libmvec-tanhf.c |    3 +
>  .../x86_64/fpu/test-float-vlen16-wrappers.c   |    1 +
>  .../x86_64/fpu/test-float-vlen4-wrappers.c    |    1 +
>  .../fpu/test-float-vlen8-avx2-wrappers.c      |    1 +
>  .../x86_64/fpu/test-float-vlen8-wrappers.c    |    1 +
>  50 files changed, 5647 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core-avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core_avx512.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core-sse2.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core_sse4.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core-sse.S
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core.c
>  create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core_avx2.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_tanh2_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_tanh4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_tanh4_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_d_tanh8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_tanhf16_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_tanhf4_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_tanhf8_core.S
>  create mode 100644 sysdeps/x86_64/fpu/svml_s_tanhf8_core_avx.S
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-tanh.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx2.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx512f.c
>  create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-tanhf.c
>
> diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
> index 33d480031b..21f1a43232 100644
> --- a/bits/libm-simd-decl-stubs.h
> +++ b/bits/libm-simd-decl-stubs.h
> @@ -285,4 +285,15 @@
>  #define __DECL_SIMD_erff32x
>  #define __DECL_SIMD_erff64x
>  #define __DECL_SIMD_erff128x
> +
> +#define __DECL_SIMD_tanh
> +#define __DECL_SIMD_tanhf
> +#define __DECL_SIMD_tanhl
> +#define __DECL_SIMD_tanhf16
> +#define __DECL_SIMD_tanhf32
> +#define __DECL_SIMD_tanhf64
> +#define __DECL_SIMD_tanhf128
> +#define __DECL_SIMD_tanhf32x
> +#define __DECL_SIMD_tanhf64x
> +#define __DECL_SIMD_tanhf128x
>  #endif
> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index a5b6c4457f..3d1c2056d5 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -72,7 +72,7 @@ __MATHCALL_VEC (cosh,, (_Mdouble_ __x));
>  /* Hyperbolic sine of X.  */
>  __MATHCALL_VEC (sinh,, (_Mdouble_ __x));
>  /* Hyperbolic tangent of X.  */
> -__MATHCALL (tanh,, (_Mdouble_ __x));
> +__MATHCALL_VEC (tanh,, (_Mdouble_ __x));
>
>  #ifdef __USE_GNU
>  /* Cosine and sine of X.  */
> diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> index 5525c8a0d6..e178cef683 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
> @@ -61,6 +61,7 @@ GLIBC_2.35 _ZGVbN2v_log10 F
>  GLIBC_2.35 _ZGVbN2v_log1p F
>  GLIBC_2.35 _ZGVbN2v_log2 F
>  GLIBC_2.35 _ZGVbN2v_sinh F
> +GLIBC_2.35 _ZGVbN2v_tanh F
>  GLIBC_2.35 _ZGVbN2vv_atan2 F
>  GLIBC_2.35 _ZGVbN2vv_hypot F
>  GLIBC_2.35 _ZGVbN4v_acosf F
> @@ -78,6 +79,7 @@ GLIBC_2.35 _ZGVbN4v_log10f F
>  GLIBC_2.35 _ZGVbN4v_log1pf F
>  GLIBC_2.35 _ZGVbN4v_log2f F
>  GLIBC_2.35 _ZGVbN4v_sinhf F
> +GLIBC_2.35 _ZGVbN4v_tanhf F
>  GLIBC_2.35 _ZGVbN4vv_atan2f F
>  GLIBC_2.35 _ZGVbN4vv_hypotf F
>  GLIBC_2.35 _ZGVcN4v_acos F
> @@ -95,6 +97,7 @@ GLIBC_2.35 _ZGVcN4v_log10 F
>  GLIBC_2.35 _ZGVcN4v_log1p F
>  GLIBC_2.35 _ZGVcN4v_log2 F
>  GLIBC_2.35 _ZGVcN4v_sinh F
> +GLIBC_2.35 _ZGVcN4v_tanh F
>  GLIBC_2.35 _ZGVcN4vv_atan2 F
>  GLIBC_2.35 _ZGVcN4vv_hypot F
>  GLIBC_2.35 _ZGVcN8v_acosf F
> @@ -112,6 +115,7 @@ GLIBC_2.35 _ZGVcN8v_log10f F
>  GLIBC_2.35 _ZGVcN8v_log1pf F
>  GLIBC_2.35 _ZGVcN8v_log2f F
>  GLIBC_2.35 _ZGVcN8v_sinhf F
> +GLIBC_2.35 _ZGVcN8v_tanhf F
>  GLIBC_2.35 _ZGVcN8vv_atan2f F
>  GLIBC_2.35 _ZGVcN8vv_hypotf F
>  GLIBC_2.35 _ZGVdN4v_acos F
> @@ -129,6 +133,7 @@ GLIBC_2.35 _ZGVdN4v_log10 F
>  GLIBC_2.35 _ZGVdN4v_log1p F
>  GLIBC_2.35 _ZGVdN4v_log2 F
>  GLIBC_2.35 _ZGVdN4v_sinh F
> +GLIBC_2.35 _ZGVdN4v_tanh F
>  GLIBC_2.35 _ZGVdN4vv_atan2 F
>  GLIBC_2.35 _ZGVdN4vv_hypot F
>  GLIBC_2.35 _ZGVdN8v_acosf F
> @@ -146,6 +151,7 @@ GLIBC_2.35 _ZGVdN8v_log10f F
>  GLIBC_2.35 _ZGVdN8v_log1pf F
>  GLIBC_2.35 _ZGVdN8v_log2f F
>  GLIBC_2.35 _ZGVdN8v_sinhf F
> +GLIBC_2.35 _ZGVdN8v_tanhf F
>  GLIBC_2.35 _ZGVdN8vv_atan2f F
>  GLIBC_2.35 _ZGVdN8vv_hypotf F
>  GLIBC_2.35 _ZGVeN16v_acosf F
> @@ -163,6 +169,7 @@ GLIBC_2.35 _ZGVeN16v_log10f F
>  GLIBC_2.35 _ZGVeN16v_log1pf F
>  GLIBC_2.35 _ZGVeN16v_log2f F
>  GLIBC_2.35 _ZGVeN16v_sinhf F
> +GLIBC_2.35 _ZGVeN16v_tanhf F
>  GLIBC_2.35 _ZGVeN16vv_atan2f F
>  GLIBC_2.35 _ZGVeN16vv_hypotf F
>  GLIBC_2.35 _ZGVeN8v_acos F
> @@ -180,5 +187,6 @@ GLIBC_2.35 _ZGVeN8v_log10 F
>  GLIBC_2.35 _ZGVeN8v_log1p F
>  GLIBC_2.35 _ZGVeN8v_log2 F
>  GLIBC_2.35 _ZGVeN8v_sinh F
> +GLIBC_2.35 _ZGVeN8v_tanh F
>  GLIBC_2.35 _ZGVeN8vv_atan2 F
>  GLIBC_2.35 _ZGVeN8vv_hypot F
> diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
> index ea0deb31c1..3c657f6108 100644
> --- a/sysdeps/x86/fpu/bits/math-vector.h
> +++ b/sysdeps/x86/fpu/bits/math-vector.h
> @@ -126,6 +126,10 @@
>  #  define __DECL_SIMD_erf __DECL_SIMD_x86_64
>  #  undef __DECL_SIMD_erff
>  #  define __DECL_SIMD_erff __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_tanh
> +#  define __DECL_SIMD_tanh __DECL_SIMD_x86_64
> +#  undef __DECL_SIMD_tanhf
> +#  define __DECL_SIMD_tanhf __DECL_SIMD_x86_64
>
>  # endif
>  #endif
> diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> index 42addd9a25..c7f81945fe 100644
> --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h
> @@ -62,6 +62,8 @@
>  !GCC$ builtin (acoshf) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (erf) attributes simd (notinbranch) if('x86_64')
>  !GCC$ builtin (erff) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (tanh) attributes simd (notinbranch) if('x86_64')
> +!GCC$ builtin (tanhf) attributes simd (notinbranch) if('x86_64')
>
>  !GCC$ builtin (cos) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32')
> @@ -109,3 +111,5 @@
>  !GCC$ builtin (acoshf) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (erf) attributes simd (notinbranch) if('x32')
>  !GCC$ builtin (erff) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (tanh) attributes simd (notinbranch) if('x32')
> +!GCC$ builtin (tanhf) attributes simd (notinbranch) if('x32')
> diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig
> index 2b89a1bba3..26df8d47bf 100644
> --- a/sysdeps/x86_64/fpu/Makeconfig
> +++ b/sysdeps/x86_64/fpu/Makeconfig
> @@ -45,6 +45,7 @@ libmvec-funcs = \
>    sin \
>    sincos \
>    sinh \
> +  tanh \
>
>  # Define libmvec function for benchtests directory.
>  libmvec-bench-funcs = \
> diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> index 2fcdef6944..adcbe0fefb 100644
> --- a/sysdeps/x86_64/fpu/Versions
> +++ b/sysdeps/x86_64/fpu/Versions
> @@ -29,6 +29,7 @@ libmvec {
>      _ZGVbN2v_log1p; _ZGVcN4v_log1p; _ZGVdN4v_log1p; _ZGVeN8v_log1p;
>      _ZGVbN2v_log2; _ZGVcN4v_log2; _ZGVdN4v_log2; _ZGVeN8v_log2;
>      _ZGVbN2v_sinh; _ZGVcN4v_sinh; _ZGVdN4v_sinh; _ZGVeN8v_sinh;
> +    _ZGVbN2v_tanh; _ZGVcN4v_tanh; _ZGVdN4v_tanh; _ZGVeN8v_tanh;
>      _ZGVbN2vv_atan2; _ZGVcN4vv_atan2; _ZGVdN4vv_atan2; _ZGVeN8vv_atan2;
>      _ZGVbN2vv_hypot; _ZGVcN4vv_hypot; _ZGVdN4vv_hypot; _ZGVeN8vv_hypot;
>      _ZGVbN4v_acosf; _ZGVcN8v_acosf; _ZGVdN8v_acosf; _ZGVeN16v_acosf;
> @@ -46,6 +47,7 @@ libmvec {
>      _ZGVbN4v_log1pf; _ZGVcN8v_log1pf; _ZGVdN8v_log1pf; _ZGVeN16v_log1pf;
>      _ZGVbN4v_log2f; _ZGVcN8v_log2f; _ZGVdN8v_log2f; _ZGVeN16v_log2f;
>      _ZGVbN4v_sinhf; _ZGVcN8v_sinhf; _ZGVdN8v_sinhf; _ZGVeN16v_sinhf;
> +    _ZGVbN4v_tanhf; _ZGVcN8v_tanhf; _ZGVdN8v_tanhf; _ZGVeN16v_tanhf;
>      _ZGVbN4vv_atan2f; _ZGVcN8vv_atan2f; _ZGVdN8vv_atan2f; _ZGVeN16vv_atan2f;
>      _ZGVbN4vv_hypotf; _ZGVcN8vv_hypotf; _ZGVdN8vv_hypotf; _ZGVeN16vv_hypotf;
>    }
> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index 929de0e786..bfaad7acef 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -2067,6 +2067,21 @@ float: 3
>  float128: 3
>  ldouble: 4
>
> +Function: "tanh_vlen16":
> +float: 1
> +
> +Function: "tanh_vlen2":
> +double: 1
> +
> +Function: "tanh_vlen4":
> +double: 1
> +
> +Function: "tanh_vlen4_avx2":
> +double: 1
> +
> +Function: "tanh_vlen8":
> +double: 1
> +
>  Function: "tgamma":
>  double: 9
>  float: 8
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core-sse2.S
> new file mode 100644
> index 0000000000..35b065fe55
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized tanh, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN2v_tanh _ZGVbN2v_tanh_sse2
> +#include "../svml_d_tanh2_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core.c
> new file mode 100644
> index 0000000000..d2e63bdc56
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized tanh, vector length is 2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN2v_tanh
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN2v_tanh, __GI__ZGVbN2v_tanh, __redirect__ZGVbN2v_tanh)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core_sse4.S
> new file mode 100644
> index 0000000000..35bbb5b04c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh2_core_sse4.S
> @@ -0,0 +1,1272 @@
> +/* Function tanh vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   NOTE: Since the hyperbolic tangent function is odd
> + *         (tanh(x) = -tanh(-x)), the algorithm below works with the
> + *         absolute value of the argument |x|: tanh(x) = sign(x) * tanh(|x|)
> + *
> + *   We use a table lookup method to compute tanh(|x|).
> + *   The basic idea is to split the input range into a number of subintervals
> + *   and to approximate tanh(.) with a polynomial on each of them.
> + *
> + *   IEEE SPECIAL CONDITIONS:
> + *   x = [+,-]0, r = [+,-]0
> + *   x = +Inf,   r = +1
> + *   x = -Inf,   r = -1
> + *   x = QNaN,   r = QNaN
> + *   x = SNaN,   r = QNaN
> + *
> + *
> + *   ALGORITHM DETAILS
> + *   We handle special values in a callout function, aside from main path
> + *   computations. "Special" for this algorithm are:
> + *   INF, NAN, |x| > HUGE_THRESHOLD
> + *
> + *
> + *   Main path computations are organized as follows:
> + *   Actually we split the interval [0, SATURATION_THRESHOLD)
> + *   into a number of subintervals.  On each subinterval we approximate tanh(.)
> + *   with a minimax polynomial of pre-defined degree. Polynomial coefficients
> + *   are computed beforehand and stored in table. We also use
> + *
> + *       y := |x| + B,
> + *
> + *   where B depends on the subinterval and is used to bring the argument
> + *   closer to zero.
> + *   We also add a large fake interval [SATURATION_THRESHOLD, HUGE_THRESHOLD],
> + *   where the coefficients 1.0 + 0.0*y + 0.0*y^2 ... are stored - just to
> + *   preserve the main-path computation logic while returning 1.0 for all
> + *   arguments in that range.
> + *
> + *   Hence the reconstruction looks as follows:
> + *   we extract the polynomial and range-reduction coefficients
> + *        (Pj and B) corresponding to the subinterval to which |x| belongs,
> + *        and return
> + *
> + *       r := sign(x) * (P0 + P1 * y + ... + Pn * y^n)
> + *
> + *   NOTE: we use a multiprecision technique to multiply and sum the first
> + *         K terms of the polynomial.  So each Pj, j = 0..K, is stored in the
> + *         table as a pair of target-precision numbers (Pj and PLj) to
> + *         achieve wider-than-target precision.
> + *
> + *
> + */
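
A rough scalar model of the scheme described above (hypothetical table
layout and index helper, illustrative only; the real table packs each
subinterval into a fixed-size slice of _dbP and derives the index from the
exponent/top-mantissa bits of |x|):

#include <math.h>

#define NINTERVALS 60
#define DEGREE     10

struct tanh_entry
{
  double pl0, ph0;           /* P0 stored as a low/high pair */
  double p[DEGREE];          /* P1 .. P10 */
  double b;                  /* shift so that y = |x| + B stays small */
};

extern const struct tanh_entry tanh_table[NINTERVALS];
/* Hypothetical helper: picks the subinterval from the bit pattern of |x|.  */
extern unsigned int tanh_interval_index (double ax);

static double
tanh_sketch (double x)
{
  double ax = fabs (x);
  const struct tanh_entry *e = &tanh_table[tanh_interval_index (ax)];
  double y = ax + e->b;
  double r = e->p[DEGREE - 1];            /* Horner from P10 down */
  for (int i = DEGREE - 2; i >= 0; i--)
    r = r * y + e->p[i];
  r = r * y + (e->ph0 + e->pl0);          /* add P0 (high + low) last */
  return copysign (r, x);                 /* tanh(x) = sign(x) * tanh(|x|) */
}

The fake saturation interval mentioned above then simply stores P0 = 1.0 and
zero for the higher coefficients, so the same evaluation returns 1.0.
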
> +
> +/* Offsets for data table __svml_dtanh_data_internal
> + */
> +#define _dbP                           0
> +#define _dbSignMask                    7680
> +#define _dbAbsMask                     7696
> +#define _iExpMantMask                  7712
> +#define _iExpMask                      7728
> +#define _iMinIdxOfsMask                7744
> +#define _iMaxIdxMask                   7760
> +
> +#include <sysdep.h>
> +
> +        .text
> +       .section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN2v_tanh_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +        movaps    %xmm0, %xmm13
> +        movq      _iExpMantMask+__svml_dtanh_data_internal(%rip), %xmm14
> +        lea       _dbP+96+__svml_dtanh_data_internal(%rip), %rsi
> +        pshufd    $221, %xmm13, %xmm8
> +
> +/* if VMIN, VMAX is defined for I type */
> +        pxor      %xmm10, %xmm10
> +        movq      _iMinIdxOfsMask+__svml_dtanh_data_internal(%rip), %xmm9
> +
> +/* Here huge arguments, INF and NaNs are filtered out to the callout. */
> +        pand      %xmm14, %xmm8
> +        movdqa    %xmm8, %xmm11
> +        psubd     %xmm9, %xmm8
> +        movq      _iMaxIdxMask+__svml_dtanh_data_internal(%rip), %xmm5
> +        movdqa    %xmm8, %xmm6
> +        movdqa    %xmm8, %xmm7
> +        pcmpgtd   %xmm5, %xmm6
> +        pcmpgtd   %xmm10, %xmm7
> +        movdqa    %xmm6, %xmm3
> +        pand      %xmm7, %xmm8
> +        andps     %xmm6, %xmm5
> +        andnps    %xmm8, %xmm3
> +        orps      %xmm5, %xmm3
> +
> +/*
> + * VSHRIMM( I, iIndex, = iIndex, (17 - 4) );
> + * VGATHER_MATRIX( L2D, p, TAB._dbP, iIndex, 0, T_ITEM_SIZE, T_ITEM_GRAN, 13, 0, 0 );
> + */
> +        psrld     $10, %xmm3
> +        movd      %xmm3, %eax
> +        pshufd    $1, %xmm3, %xmm4
> +
> +/*  Constant loading  */
> +        movq      _iExpMask+__svml_dtanh_data_internal(%rip), %xmm15
> +        movd      %xmm4, %ecx
> +        pcmpgtd   %xmm15, %xmm11
> +        movmskps  %xmm11, %edx
> +        movups    _dbAbsMask+__svml_dtanh_data_internal(%rip), %xmm0
> +        movups    _dbSignMask+__svml_dtanh_data_internal(%rip), %xmm12
> +        andps     %xmm13, %xmm0
> +        movslq    %eax, %rax
> +        andps     %xmm13, %xmm12
> +        movslq    %ecx, %rcx
> +        movups    %xmm13, (%rsp)
> +        movups    -96(%rax,%rsi), %xmm11
> +        movups    -96(%rcx,%rsi), %xmm2
> +        movups    -80(%rax,%rsi), %xmm9
> +        movups    -48(%rax,%rsi), %xmm5
> +        movaps    %xmm9, %xmm10
> +        movups    -32(%rax,%rsi), %xmm3
> +        movaps    %xmm5, %xmm6
> +        movaps    %xmm3, %xmm4
> +        unpckhpd  %xmm2, %xmm11
> +        movups    -80(%rcx,%rsi), %xmm13
> +        movups    -48(%rcx,%rsi), %xmm15
> +        movups    -32(%rcx,%rsi), %xmm1
> +        movups    -64(%rax,%rsi), %xmm7
> +        movups    -16(%rax,%rsi), %xmm2
> +        movaps    %xmm7, %xmm8
> +        unpcklpd  %xmm13, %xmm10
> +        unpckhpd  %xmm13, %xmm9
> +        movups    -64(%rcx,%rsi), %xmm14
> +        movups    -16(%rcx,%rsi), %xmm13
> +        unpcklpd  %xmm15, %xmm6
> +        unpckhpd  %xmm15, %xmm5
> +        unpcklpd  %xmm1, %xmm4
> +        unpckhpd  %xmm1, %xmm3
> +        movaps    %xmm2, %xmm1
> +        movups    (%rax,%rsi), %xmm15
> +        unpcklpd  %xmm14, %xmm8
> +        unpckhpd  %xmm14, %xmm7
> +        unpcklpd  %xmm13, %xmm1
> +        unpckhpd  %xmm13, %xmm2
> +        movaps    %xmm15, %xmm13
> +        movups    (%rcx,%rsi), %xmm14
> +        unpcklpd  %xmm14, %xmm13
> +        addpd     %xmm13, %xmm0
> +        mulpd     %xmm0, %xmm2
> +        addpd     %xmm1, %xmm2
> +        mulpd     %xmm0, %xmm2
> +        addpd     %xmm3, %xmm2
> +        mulpd     %xmm0, %xmm2
> +        addpd     %xmm4, %xmm2
> +        mulpd     %xmm0, %xmm2
> +        addpd     %xmm5, %xmm2
> +        mulpd     %xmm0, %xmm2
> +        addpd     %xmm6, %xmm2
> +        mulpd     %xmm0, %xmm2
> +        addpd     %xmm7, %xmm2
> +        mulpd     %xmm0, %xmm2
> +        addpd     %xmm8, %xmm2
> +        mulpd     %xmm0, %xmm2
> +        addpd     %xmm9, %xmm2
> +        mulpd     %xmm0, %xmm2
> +        addpd     %xmm10, %xmm2
> +        mulpd     %xmm2, %xmm0
> +        addpd     %xmm11, %xmm0
> +        orps      %xmm12, %xmm0
> +        andl      $3, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    (%rsp), %xmm1
> +        movups    %xmm1, 32(%rsp)
> +        movups    %xmm0, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 edx xmm0
> +
> +        xorl      %eax, %eax
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $2, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      tanh@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 48(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN2v_tanh_sse4)
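
The special-value path above is easiest to read as the scalar loop it
emulates.  A rough C sketch (illustration only, the names are invented;
the real code keeps the vector result on the stack and only patches the
flagged lanes):

    #include <math.h>

    /* vin[]  - the two original double inputs (saved at 32(%rsp))
       vout[] - the two vector results         (saved at 48(%rsp))
       mask   - the 2-bit range mask held in %edx/%r13d             */
    static void
    special_values_fallback (const double vin[2], double vout[2], int mask)
    {
      for (int lane = 0; lane < 2; lane++)     /* cmpl $2, %r12d    */
        if (mask & (1 << lane))                /* btl  %r12d, %r13d */
          vout[lane] = tanh (vin[lane]);       /* call tanh@PLT     */
    }

Lanes whose mask bit is clear keep the value already produced by the
vector code.
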
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_dtanh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __attribute__ ((aligned (16))) VUINT32 _dbP[60*16][2];
> +        __attribute__ ((aligned (16))) VUINT32 _dbSignMask[2][2];
> +        __attribute__ ((aligned (16))) VUINT32 _dbAbsMask[2][2];
> +        __attribute__ ((aligned (16))) VUINT32 _iExpMantMask[4][1];
> +        __attribute__ ((aligned (16))) VUINT32 _iExpMask[4][1];
> +        __attribute__ ((aligned (16))) VUINT32 _iMinIdxOfsMask[4][1];
> +        __attribute__ ((aligned (16))) VUINT32 _iMaxIdxMask[4][1];
> +} __svml_dtanh_data_internal;
> +#endif
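
Each row of the table below is 16 quadwords (128 bytes): the split
constant term PL0/PH0, the polynomial coefficients P1..P10, the
interval offset B, the scale A, and two alignment pads.  As a hedged
scalar sketch of how one row feeds the Horner chain earlier in the
function (row selection from the exponent/mantissa bits is omitted and
the names are invented):

    #include <math.h>

    typedef struct
    {
      double pl0, ph0;   /* split constant term                */
      double p[10];      /* P1 .. P10                          */
      double b;          /* offset added to |x|                */
      double a;          /* scale, +1.0 in every row here      */
      double pad[2];     /* alignment                          */
    } dtanh_row;

    static double
    eval_row (const dtanh_row *row, double x)
    {
      double r = fabs (x) + row->b;        /* reduced argument         */
      double poly = row->p[9];             /* start from P10           */
      for (int i = 8; i >= 0; i--)         /* mulpd/addpd Horner chain */
        poly = poly * r + row->p[i];
      poly = poly * r + row->ph0 + row->pl0;
      return copysign (row->a * poly, x);  /* orps restores the sign   */
    }
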
> +__svml_dtanh_data_internal:
> +        /* Polynomial coefficients */
> +        .quad 0x0000000000000000   /* PL0 = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* PH0 = +0.000000000000000000000e-01 */
> +        .quad 0x3FF0000000000000   /* P1  = +1.000000000000000014103e+00 */
> +        .quad 0xBD197DEAD79668D3   /* P2  = -2.264132406596103056796e-14 */
> +        .quad 0xBFD555555553AF3C   /* P3  = -3.333333333273349741024e-01 */
> +        .quad 0xBE052F7CCA134846   /* P4  = -6.165791385711493738399e-10 */
> +        .quad 0x3FC11111563849D6   /* P5  = +1.333333655353061107201e-01 */
> +        .quad 0xBEB038623673FFB2   /* P6  = -9.668021563879858950855e-07 */
> +        .quad 0xBFAB9F685E64022E   /* P7  = -5.395055916051593179252e-02 */
> +        .quad 0xBF2A54E2B28F2207   /* P8  = -2.008940439550829012647e-04 */
> +        .quad 0x3F97CFB9328A230E   /* P9  = +2.325333949059698582189e-02 */
> +        .quad 0xBF75CA6D61723E02   /* P10 = -5.320002811586290441790e-03 */
> +        .quad 0x0000000000000000   /* B = +0        */
> +        .quad 0x3FF0000000000000   /* A = +1.0      */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C3708A564FAD29A   /* PL0 = +1.248663375337163807466e-18 */
> +        .quad 0x3FC0E6973998DA48   /* PH0 = +1.320370703922029154143e-01 */
> +        .quad 0x3FEF712EB25C0888   /* P1  = +9.825662120422444519229e-01 */
> +        .quad 0xBFC09B296F7C1EA9   /* P2  = -1.297351641044220078331e-01 */
> +        .quad 0xBFD3DD77541EDDA7   /* P3  = -3.103922196855485849143e-01 */
> +        .quad 0x3FB58FFCF4309615   /* P4  = +8.422833406128689275566e-02 */
> +        .quad 0x3FBD3ABE845DCF49   /* P5  = +1.141776154670967208833e-01 */
> +        .quad 0xBFA791DF538C37FA   /* P6  = -4.603479285115947936529e-02 */
> +        .quad 0xBFA4F872F69CD6E8   /* P7  = -4.095801601799370195284e-02 */
> +        .quad 0x3F9772E49EF6412B   /* P8  = +2.289921970583567527179e-02 */
> +        .quad 0x3F8CBC0807393909   /* P9  = +1.403051635784581776625e-02 */
> +        .quad 0xBF85F06A30F93319   /* P10 = -1.071246110873285040939e-02 */
> +        .quad 0xBFC1000000000000   /* B = -.132813 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6004EE5739DEAC   /* PL0 = +6.947247374112211856530e-18 */
> +        .quad 0x3FC2DC968E6E0D62   /* PH0 = +1.473568149050193398786e-01 */
> +        .quad 0x3FEF4E1E606D96DF   /* P1  = +9.782859691010478680677e-01 */
> +        .quad 0xBFC273BD70994AB9   /* P2  = -1.441571044730005866646e-01 */
> +        .quad 0xBFD382B548270D2C   /* P3  = -3.048527912726111386771e-01 */
> +        .quad 0x3FB7CD2D582A6B29   /* P4  = +9.297450449450351894400e-02 */
> +        .quad 0x3FBC1278CCCBF0DB   /* P5  = +1.096568584434324642303e-01 */
> +        .quad 0xBFA9C7F5115B86A1   /* P6  = -5.035367810138536095866e-02 */
> +        .quad 0xBFA371C21BAF618E   /* P7  = -3.797728145554222910481e-02 */
> +        .quad 0x3F9958943F68417E   /* P8  = +2.475196492201935923783e-02 */
> +        .quad 0x3F8930D5CFFD4152   /* P9  = +1.230017701132682667572e-02 */
> +        .quad 0xBF875CF7ADD31B76   /* P10 = -1.140779017658897660092e-02 */
> +        .quad 0xBFC3000000000000   /* B = -.148438 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7EABE24E052A1F   /* PL0 = +2.660321779421749543501e-17 */
> +        .quad 0x3FC4D04783618C71   /* PH0 = +1.626061812886266111366e-01 */
> +        .quad 0x3FEF2765AF97A4B3   /* P1  = +9.735592298067302883212e-01 */
> +        .quad 0xBFC443654205FEA5   /* P2  = -1.583067486171689074207e-01 */
> +        .quad 0xBFD31F2E208A5B97   /* P3  = -2.987780874040536844467e-01 */
> +        .quad 0x3FB9F235BD339878   /* P4  = +1.013520800512156573576e-01 */
> +        .quad 0x3FBAD0B0DFCCA141   /* P5  = +1.047468706498238100104e-01 */
> +        .quad 0xBFABD1B9600E608E   /* P6  = -5.433444306908184548967e-02 */
> +        .quad 0xBFA1CEBEAF07DB58   /* P7  = -3.478046309094534453598e-02 */
> +        .quad 0x3F9AFC9FB1D8EFD2   /* P8  = +2.635430834764902126383e-02 */
> +        .quad 0x3F8573444F1AB502   /* P9  = +1.047376028449287564018e-02 */
> +        .quad 0xBF8874FBC8F24406   /* P10 = -1.194187838544459322219e-02 */
> +        .quad 0xBFC5000000000000   /* B = -.164063 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7FB199D361A790   /* PL0 = +2.748994907060158996213e-17 */
> +        .quad 0x3FC6C170259E21F7   /* PH0 = +1.777782615356639783766e-01 */
> +        .quad 0x3FEEFD17479F7C65   /* P1  = +9.683948897253570478266e-01 */
> +        .quad 0xBFC609530FE4DF8D   /* P2  = -1.721595599753950294577e-01 */
> +        .quad 0xBFD2B3465D71B4DE   /* P3  = -2.921920692959484052676e-01 */
> +        .quad 0x3FBBFD2D34AC509B   /* P4  = +1.093319181057403192166e-01 */
> +        .quad 0x3FB9778C3C16A0FE   /* P5  = +9.948040453912551395183e-02 */
> +        .quad 0xBFADAC4D9E63C665   /* P6  = -5.795519407719210697372e-02 */
> +        .quad 0xBFA0139CCAD02D60   /* P7  = -3.139963126894929339124e-02 */
> +        .quad 0x3F9C5BF43BA6F19D   /* P8  = +2.769452680671379432854e-02 */
> +        .quad 0x3F8190B703350341   /* P9  = +8.576803002712575184772e-03 */
> +        .quad 0xBF8936606782858A   /* P10 = -1.231074634444230850234e-02 */
> +        .quad 0xBFC7000000000000   /* B = -.179688 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6A917CA3624D50   /* PL0 = +1.152216693509785660691e-17 */
> +        .quad 0x3FC8AFD7B974FABB   /* PH0 = +1.928662925292508878439e-01 */
> +        .quad 0x3FEECF47624A5D03   /* P1  = +9.628025932060214187231e-01 */
> +        .quad 0xBFC7C4C2CB4FDE4D   /* P2  = -1.856921665891938814679e-01 */
> +        .quad 0xBFD23F69CB2C1F9D   /* P3  = -2.851204380135586155453e-01 */
> +        .quad 0x3FBDEC5703A03814   /* P4  = +1.168875106670557712458e-01 */
> +        .quad 0x3FB8095003D0CF15   /* P5  = +9.389209836154706616487e-02 */
> +        .quad 0xBFAF554B47B10CBB   /* P6  = -6.119761705533607365968e-02 */
> +        .quad 0xBF9C89743FE7BC1B   /* P7  = -2.786809577986213853937e-02 */
> +        .quad 0x3F9D74725B746E7C   /* P8  = +2.876452143855921824991e-02 */
> +        .quad 0x3F7B2D8AFB70B88C   /* P9  = +6.635229968237631511880e-03 */
> +        .quad 0xBF89A0A2883EF6CB   /* P10 = -1.251341799058582545252e-02 */
> +        .quad 0xBFC9000000000000   /* B = -.195313 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7608279E8609CB   /* PL0 = +1.910958764623660748269e-17 */
> +        .quad 0x3FCA9B46D2DDC5E3   /* PH0 = +2.078636674519166172015e-01 */
> +        .quad 0x3FEE9E0BB72A01A1   /* P1  = +9.567926957534390123919e-01 */
> +        .quad 0xBFC974FAD10C5330   /* P2  = -1.988824387305156976885e-01 */
> +        .quad 0xBFD1C40ACCBA4044   /* P3  = -2.775904654781735703430e-01 */
> +        .quad 0x3FBFBE24E2987853   /* P4  = +1.239951184474830487522e-01 */
> +        .quad 0x3FB6885B4345E47F   /* P5  = +8.801813499839460539687e-02 */
> +        .quad 0xBFB06563D5670584   /* P6  = -6.404708824176991770896e-02 */
> +        .quad 0xBF98CD1D620DF6E2   /* P7  = -2.421995078065365147772e-02 */
> +        .quad 0x3F9E44EF3E844D21   /* P8  = +2.955983943054463683119e-02 */
> +        .quad 0x3F7325FA0148CAAE   /* P9  = +4.674889165971292322643e-03 */
> +        .quad 0xBF89B4C8556C2D92   /* P10 = -1.255184660614964011319e-02 */
> +        .quad 0xBFCB000000000000   /* B = -.210938 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6F19DAA20F51D5   /* PL0 = +1.348790537832000351176e-17 */
> +        .quad 0x3FCC83876CA98E15   /* PH0 = +2.227639465883021474557e-01 */
> +        .quad 0x3FEE697B662D07CD   /* P1  = +9.503762241004040620296e-01 */
> +        .quad 0xBFCB194C7ED76ACF   /* P2  = -2.117095584242946953999e-01 */
> +        .quad 0xBFD141A19E419762   /* P3  = -2.696308179350720680191e-01 */
> +        .quad 0x3FC0B89C64BC7B98   /* P4  = +1.306338779331468503007e-01 */
> +        .quad 0x3FB4F721150BBFC5   /* P5  = +8.189589275184434216748e-02 */
> +        .quad 0xBFB105AAFAB87898   /* P6  = -6.649273511036069461061e-02 */
> +        .quad 0xBF94FB3B31248C01   /* P7  = -2.048962104266749732921e-02 */
> +        .quad 0x3F9ECD31E588709C   /* P8  = +3.007963145692880855964e-02 */
> +        .quad 0x3F664A91A335C105   /* P9  = +2.721104095762541127495e-03 */
> +        .quad 0xBF89754E32E1E26E   /* P10 = -1.243077366619723806134e-02 */
> +        .quad 0xBFCD000000000000   /* B = -.226563 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6AC6C889D8111D   /* PL0 = +1.161245469312620769170e-17 */
> +        .quad 0x3FCE6864FE55A3D0   /* PH0 = +2.375608674877001114112e-01 */
> +        .quad 0x3FEE31AEE116B82B   /* P1  = +9.435648342384913826391e-01 */
> +        .quad 0xBFCCB114B69E808B   /* P2  = -2.241540805525839833707e-01 */
> +        .quad 0xBFD0B8AB913BA99D   /* P3  = -2.612713735858507980441e-01 */
> +        .quad 0x3FC1823322BED48A   /* P4  = +1.367858810096190233514e-01 */
> +        .quad 0x3FB35822B7929893   /* P5  = +7.556359273675842651653e-02 */
> +        .quad 0xBFB18B03CC78D2DA   /* P6  = -6.852744810096158580830e-02 */
> +        .quad 0xBF911CCC3C8D5E5D   /* P7  = -1.671141738492420009734e-02 */
> +        .quad 0x3F9F0DEC2D99B12F   /* P8  = +3.032654789278515819797e-02 */
> +        .quad 0x3F4A28398B4EBD98   /* P9  = +7.982521989244205404918e-04 */
> +        .quad 0xBF88E60CB2FAB9A4   /* P10 = -1.215753480150000985458e-02 */
> +        .quad 0xBFCF000000000000   /* B = -.242188 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C89D2B6774FB61D   /* PL0 = +4.479593208720169247958e-17 */
> +        .quad 0x3FD09C744F539BE4   /* PH0 = +2.595492148088267558848e-01 */
> +        .quad 0x3FEDD823B0400D42   /* P1  = +9.326342050921214825882e-01 */
> +        .quad 0xBFCEFBF7FF305FCC   /* P2  = -2.420644756355144687086e-01 */
> +        .quad 0xBFCFC01DC4F24A41   /* P3  = -2.480504237797323303990e-01 */
> +        .quad 0x3FC291A2C26D5548   /* P4  = +1.450694512701977626753e-01 */
> +        .quad 0x3FB0D562E672D188   /* P5  = +6.575601698097532991976e-02 */
> +        .quad 0xBFB2201ECC119E06   /* P6  = -7.080261690281738261872e-02 */
> +        .quad 0xBF8695D50F778D31   /* P7  = -1.102796987010509974642e-02 */
> +        .quad 0x3F9EEC8CFBC031A0   /* P8  = +3.019924437107734972427e-02 */
> +        .quad 0xBF6030F0A4D3660A   /* P9  = -1.976461417694923328722e-03 */
> +        .quad 0xBF87845288A4AEF5   /* P10 = -1.148285369398347838494e-02 */
> +        .quad 0xBFD1000000000000   /* B = -.265625 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8B6AAB614D1C8D   /* PL0 = +4.756035418366735312727e-17 */
> +        .quad 0x3FD275F7E1CF7F63   /* PH0 = +2.884502129727392616410e-01 */
> +        .quad 0x3FED56658F74C9CC   /* P1  = +9.167964746359813351341e-01 */
> +        .quad 0xBFD0ECC045EBD596   /* P2  = -2.644501383614054083635e-01 */
> +        .quad 0xBFCD5A4BDE179180   /* P3  = -2.293181261476426808811e-01 */
> +        .quad 0x3FC3C00047D34767   /* P4  = +1.542969084462655120552e-01 */
> +        .quad 0x3FAAC7CE84FD609F   /* P5  = +5.230565427217581251974e-02 */
> +        .quad 0xBFB288948D2E8B43   /* P6  = -7.239654967137902384931e-02 */
> +        .quad 0xBF6D6605AAD5A1C0   /* P7  = -3.588687008847041164896e-03 */
> +        .quad 0x3F9DDB0790848E97   /* P8  = +2.915584392134337382866e-02 */
> +        .quad 0xBF75FDE291BAD5B4   /* P9  = -5.369076763306269573660e-03 */
> +        .quad 0xBF84CEA5C52E0A78   /* P10 = -1.015977390284671071888e-02 */
> +        .quad 0xBFD3000000000000   /* B = -.296875 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7139A81C8A6ECF   /* PL0 = +1.494049799478574591322e-17 */
> +        .quad 0x3FD4470650036407   /* PH0 = +3.168350011233659890841e-01 */
> +        .quad 0x3FECC9A69DFDDD48   /* P1  = +8.996155820631566629678e-01 */
> +        .quad 0xBFD23DED3A37A09F   /* P2  = -2.850297039535778028925e-01 */
> +        .quad 0xBFCAD302395D51C1   /* P3  = -2.095644741153943890185e-01 */
> +        .quad 0x3FC4A8FE3F309C22   /* P4  = +1.614072617096278705115e-01 */
> +        .quad 0x3FA3D161188AA436   /* P5  = +3.870681213931741151586e-02 */
> +        .quad 0xBFB288CFE5494E98   /* P6  = -7.240008685885823969403e-02 */
> +        .quad 0x3F6C7903EED8D334   /* P7  = +3.475673371918475361081e-03 */
> +        .quad 0x3F9BE023CDFB02F6   /* P8  = +2.722221321778569498033e-02 */
> +        .quad 0xBF80F8296F2C3A95   /* P9  = -8.285831170295390358336e-03 */
> +        .quad 0xBF8152DF4790049B   /* P10 = -8.458847400108650973189e-03 */
> +        .quad 0xBFD5000000000000   /* B = -.328125 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7751FE0FEE8335   /* PL0 = +2.022712113430213599928e-17 */
> +        .quad 0x3FD60EF7120502A9   /* PH0 = +3.446633983585721261456e-01 */
> +        .quad 0x3FEC32D951E56E6F   /* P1  = +8.812071418319202070776e-01 */
> +        .quad 0xBFD370255FC004F8   /* P2  = -3.037198481616338996824e-01 */
> +        .quad 0xBFC832F0EBC6BB41   /* P3  = -1.890545989276351359107e-01 */
> +        .quad 0x3FC54C99A0FF432F   /* P4  = +1.664001499289269127540e-01 */
> +        .quad 0x3F99DAC0CC283C18   /* P5  = +2.524853941036661688369e-02 */
> +        .quad 0xBFB227B3896A026D   /* P6  = -7.091829399906553280461e-02 */
> +        .quad 0x3F84663364E1FB19   /* P7  = +9.960557476231411602383e-03 */
> +        .quad 0x3F9922D70DE07C57   /* P8  = +2.454696676442965935283e-02 */
> +        .quad 0xBF85C4A4EB6F86BC   /* P9  = -1.062897532932837635222e-02 */
> +        .quad 0xBF7AAB61214FFE17   /* P10 = -6.511096396024671890972e-03 */
> +        .quad 0xBFD7000000000000   /* B = -.359375 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3BFE67F266843B2C   /* PL0 = +1.030196791298162288777e-19 */
> +        .quad 0x3FD7CD3115FC0F16   /* PH0 = +3.718989100163850869407e-01 */
> +        .quad 0x3FEB92F96CCC2C5B   /* P1  = +8.616912007286247079761e-01 */
> +        .quad 0xBFD4827320135092   /* P2  = -3.204620183216856200247e-01 */
> +        .quad 0xBFC582B15550168A   /* P3  = -1.680509249273891977521e-01 */
> +        .quad 0x3FC5AC3B9A2E4C31   /* P4  = +1.693186285816366254244e-01 */
> +        .quad 0x3F88FA599FCADAFB   /* P5  = +1.219625491044728129762e-02 */
> +        .quad 0xBFB16EC8F5CA169E   /* P6  = -6.809669495313605642174e-02 */
> +        .quad 0x3F90140EFC748BBE   /* P7  = +1.570151725639922719844e-02 */
> +        .quad 0x3F95CFC49C1A28DC   /* P8  = +2.130038454792147768770e-02 */
> +        .quad 0xBF8946ED8B1BF454   /* P9  = -1.234231549050882816697e-02 */
> +        .quad 0xBF7239E55C1DD50F   /* P10 = -4.449745117985472755606e-03 */
> +        .quad 0xBFD9000000000000   /* B = -.390625 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6412330191189C   /* PL0 = +8.704448096175471149661e-18 */
> +        .quad 0x3FD9812B3B03F0A5   /* PH0 = +3.985088421175169703936e-01 */
> +        .quad 0x3FEAEB08C3C0E84D   /* P1  = +8.411907027541559254748e-01 */
> +        .quad 0xBFD57446B1BC46CF   /* P2  = -3.352219329545790787820e-01 */
> +        .quad 0xBFC2CA9ABC0444AD   /* P3  = -1.468079965639267634401e-01 */
> +        .quad 0x3FC5CA95F9460D18   /* P4  = +1.702449290424759093710e-01 */
> +        .quad 0xBF2C2DAA35DD05C3   /* P5  = -2.149839664813813012186e-04 */
> +        .quad 0xBFB069A516EEB75D   /* P6  = -6.411201295733578195472e-02 */
> +        .quad 0x3F9512716416FDC7   /* P7  = +2.057816670798986720058e-02 */
> +        .quad 0x3F921630CB1319A3   /* P8  = +1.766277541607908852593e-02 */
> +        .quad 0xBF8B76DA2EC99526   /* P9  = -1.341028647693549562145e-02 */
> +        .quad 0xBF63A97474A161E4   /* P10 = -2.400138332671485493040e-03 */
> +        .quad 0xBFDB000000000000   /* B = -.421875 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C89B79F5783381C   /* PL0 = +4.461236087774530799537e-17 */
> +        .quad 0x3FDB2A6C993B829D   /* PH0 = +4.244643684778937609003e-01 */
> +        .quad 0x3FEA3C0C1FBA328C   /* P1  = +8.198299998926627915155e-01 */
> +        .quad 0xBFD6457212F78DE0   /* P2  = -3.479886231636708581604e-01 */
> +        .quad 0xBFC0129BDA380A66   /* P3  = -1.255678954622282824818e-01 */
> +        .quad 0x3FC5AB77F388FBDE   /* P4  = +1.692953051696965507089e-01 */
> +        .quad 0xBF8822F3A6CADB7C   /* P5  = -1.178541519889874597783e-02 */
> +        .quad 0xBFAE4A876370A4BD   /* P6  = -5.916236008517603590739e-02 */
> +        .quad 0x3F991A89BC3B7710   /* P7  = +2.451529704455085335710e-02 */
> +        .quad 0x3F8C4A4328204D4B   /* P8  = +1.381351915555364098800e-02 */
> +        .quad 0xBF8C5F921D01EC0B   /* P9  = -1.385416174911393178490e-02 */
> +        .quad 0xBF3EE844C5B79FB8   /* P10 = -4.716079617694784908234e-04 */
> +        .quad 0xBFDD000000000000   /* B = -.453125 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C73FA437AD7AD87   /* PL0 = +1.732779905745858845932e-17 */
> +        .quad 0x3FDCC88C9902CF45   /* PH0 = +4.497405523536495697279e-01 */
> +        .quad 0x3FE9870845162D1D   /* P1  = +7.977334355686341748810e-01 */
> +        .quad 0xBFD6F62358F73DA8   /* P2  = -3.587730759436120677668e-01 */
> +        .quad 0xBFBAC4345D675FE1   /* P3  = -1.045563438450467661101e-01 */
> +        .quad 0x3FC5539DA8287019   /* P4  = +1.666142531474868131862e-01 */
> +        .quad 0xBF96E3E0DC04A09F   /* P5  = -2.235366194614185212822e-02 */
> +        .quad 0xBFAB5EC7147C207D   /* P6  = -5.345747113284546871398e-02 */
> +        .quad 0x3F9C24166FFA7A58   /* P7  = +2.748141344511120915667e-02 */
> +        .quad 0x3F8451B907819844   /* P8  = +9.921498815128277696693e-03 */
> +        .quad 0xBF8C1C6D19191FCB   /* P9  = -1.372609360545586670239e-02 */
> +        .quad 0x3F547372DF72E35A   /* P10 = +1.248228245272117756098e-03 */
> +        .quad 0xBFDF000000000000   /* B = -.484375 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C848FE06EE49950   /* PL0 = +3.566941590788961528958e-17 */
> +        .quad 0x3FDF20211A36475D   /* PH0 = +4.863360172249622803697e-01 */
> +        .quad 0x3FE86E67E6B80AC2   /* P1  = +7.634772783497611574659e-01 */
> +        .quad 0xBFD7C37C55474D9B   /* P2  = -3.713064987943767913461e-01 */
> +        .quad 0xBFB2EBF15F3CB036   /* P3  = -7.391270232318521952684e-02 */
> +        .quad 0x3FC4718C8EF6E3AA   /* P4  = +1.597152422016539530950e-01 */
> +        .quad 0xBFA277F8394E9B07   /* P5  = -3.607154559658991932071e-02 */
> +        .quad 0xBFA680312AB207E3   /* P6  = -4.394677778419955009224e-02 */
> +        .quad 0x3F9EDC9A8B57E286   /* P7  = +3.013841128810892143223e-02 */
> +        .quad 0x3F71B8C5E648EAF6   /* P8  = +4.326603932492947851719e-03 */
> +        .quad 0xBF89DB218356730C   /* P9  = -1.262499029217558458029e-02 */
> +        .quad 0x3F6B05728E6EBC8E   /* P10 = +3.298496001171330815865e-03 */
> +        .quad 0xBFE1000000000000   /* B = -.53125  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8429831EDD94DE   /* PL0 = +3.497576705878673192147e-17 */
> +        .quad 0x3FE10AF47E0BF610   /* PH0 = +5.325872861719194162333e-01 */
> +        .quad 0x3FE6EC5879F87EEE   /* P1  = +7.163507826080299761242e-01 */
> +        .quad 0xBFD86AD001BFE200   /* P2  = -3.815193192563413204129e-01 */
> +        .quad 0xBFA239045B661385   /* P3  = -3.559125533778398983564e-02 */
> +        .quad 0x3FC2B4572D9CC147   /* P4  = +1.461285565105845078038e-01 */
> +        .quad 0xBFA99F4F01740705   /* P5  = -5.004355328311586406115e-02 */
> +        .quad 0xBF9F449C484F4879   /* P6  = -3.053516570418721511214e-02 */
> +        .quad 0x3F9F5F42169D7DDE   /* P7  = +3.063681853325116830798e-02 */
> +        .quad 0xBF6111B1BA632A97   /* P8  = -2.083632588527460989469e-03 */
> +        .quad 0xBF84725FBE5B6E61   /* P9  = -9.983776089419639342530e-03 */
> +        .quad 0x3F7438A2986CFA9C   /* P10 = +4.936823976832951342488e-03 */
> +        .quad 0xBFE3000000000000   /* B = -.59375  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6BE9160BFB3505   /* PL0 = +1.210424670976053242391e-17 */
> +        .quad 0x3FE26D76F73233C7   /* PH0 = +5.758623912857893101247e-01 */
> +        .quad 0x3FE56363B5B93937   /* P1  = +6.683825063026124740752e-01 */
> +        .quad 0xBFD8A2244B27297E   /* P2  = -3.848963483730115724200e-01 */
> +        .quad 0xBF52CA2F101EEF63   /* P3  = -1.146837196286797844817e-03 */
> +        .quad 0x3FC081BC342243AD   /* P4  = +1.289592032012739958675e-01 */
> +        .quad 0xBFAE38DB4A932344   /* P5  = -5.902753148399722719732e-02 */
> +        .quad 0xBF91F814D4AE90C6   /* P6  = -1.754791782481459457885e-02 */
> +        .quad 0x3F9D056AE193C4F3   /* P7  = +2.834097863973723355792e-02 */
> +        .quad 0xBF7BD0B502D8F3A0   /* P8  = -6.790835451792626336974e-03 */
> +        .quad 0xBF7B763F7BB8AE2F   /* P9  = -6.704566938008179114124e-03 */
> +        .quad 0x3F76036F42D9AB69   /* P10 = +5.374369252971835729099e-03 */
> +        .quad 0xBFE5000000000000   /* B = -.65625  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8B64AF0450486E   /* PL0 = +4.751979286662385162741e-17 */
> +        .quad 0x3FE3B75F8BCB742D   /* PH0 = +6.161344271055263499548e-01 */
> +        .quad 0x3FE3DA23BC12369F   /* P1  = +6.203783677353447780947e-01 */
> +        .quad 0xBFD8768FF4B46416   /* P2  = -3.822364701932782367281e-01 */
> +        .quad 0x3F9D67CB8AD9CB1A   /* P3  = +2.871625933625941117406e-02 */
> +        .quad 0x3FBC168CB7827DF4   /* P4  = +1.097190807363331305006e-01 */
> +        .quad 0xBFB03A2B83C9272E   /* P5  = -6.338760344911228324430e-02 */
> +        .quad 0xBF789FEB595297DC   /* P6  = -6.011885959344067548074e-03 */
> +        .quad 0x3F98BD01B4C335E7   /* P7  = +2.415850320612902513532e-02 */
> +        .quad 0xBF83BADC303D6535   /* P8  = -9.633751127398152979976e-03 */
> +        .quad 0xBF6C54E7A1C1E3F3   /* P9  = -3.458454519258407989501e-03 */
> +        .quad 0x3F7408394B7EF3E7   /* P10 = +4.890655334688332484537e-03 */
> +        .quad 0xBFE7000000000000   /* B = -.71875  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6A48557F6E0D3E   /* PL0 = +1.139824111505584215867e-17 */
> +        .quad 0x3FE4E8D895B010DC   /* PH0 = +6.534235881413468227663e-01 */
> +        .quad 0x3FE25652FAAF8A73   /* P1  = +5.730376144604875448991e-01 */
> +        .quad 0xBFD7F6C3A57C444B   /* P2  = -3.744362941807295084434e-01 */
> +        .quad 0x3FAB7866E3F99EBE   /* P3  = +5.365296872042567001598e-02 */
> +        .quad 0x3FB6FA1DF47CCD40   /* P4  = +8.975398272450707099784e-02 */
> +        .quad 0xBFB05508D3741B8E   /* P5  = -6.379752314033580026840e-02 */
> +        .quad 0x3F6C3EFDF7BB279C   /* P6  = +3.448005705512137236209e-03 */
> +        .quad 0x3F9372BADD6D3E27   /* P7  = +1.899234749299530050806e-02 */
> +        .quad 0xBF860FD5AE65F3DA   /* P8  = -1.077238977881649471165e-02 */
> +        .quad 0xBF47266FFB07E628   /* P9  = -7.064863949032872448118e-04 */
> +        .quad 0x3F6F9763992C2A05   /* P10 = +3.856367614735181120799e-03 */
> +        .quad 0xBFE9000000000000   /* B = -.78125  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6BB6A2B194E3AB   /* PL0 = +1.201878007209462528697e-17 */
> +        .quad 0x3FE602609AAE7C22   /* PH0 = +6.877902051090851731630e-01 */
> +        .quad 0x3FE0DCBAFE191C7F   /* P1  = +5.269446337560025312137e-01 */
> +        .quad 0xBFD732028428A9FB   /* P2  = -3.624273577321727538225e-01 */
> +        .quad 0x3FB2D92389BE065B   /* P3  = +7.362577545975439796588e-02 */
> +        .quad 0x3FB1F6A9C8C49993   /* P4  = +7.017003203927733370937e-02 */
> +        .quad 0xBFAF47C0B50B56EE   /* P5  = -6.109430513394707378526e-02 */
> +        .quad 0x3F85A8EDD1356223   /* P6  = +1.057611269668352068104e-02 */
> +        .quad 0x3F8BE05C5CD1B4FA   /* P7  = +1.361152799855823798207e-02 */
> +        .quad 0xBF85A0EFE4552F76   /* P8  = -1.056086936537046752272e-02 */
> +        .quad 0x3F559F2A6A356194   /* P9  = +1.319686337259627831943e-03 */
> +        .quad 0x3F6576F5E989208D   /* P10 = +2.620201394425042596201e-03 */
> +        .quad 0xBFEB000000000000   /* B = -.84375  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C80328BD86C8B74   /* PL0 = +2.809809047161267929701e-17 */
> +        .quad 0x3FE704BB1B7FCB81   /* PH0 = +7.193275010198335595035e-01 */
> +        .quad 0x3FDEE264AAD6C40C   /* P1  = +4.825679462765613089739e-01 */
> +        .quad 0xBFD637493CE659F1   /* P2  = -3.471243948673921548357e-01 */
> +        .quad 0x3FB6BE3A3DEE6F4A   /* P3  = +8.884014141079635303208e-02 */
> +        .quad 0x3FAA85EB6470AC0F   /* P4  = +5.180297471118688523488e-02 */
> +        .quad 0xBFACC0146EA4858D   /* P5  = -5.615295267694895314457e-02 */
> +        .quad 0x3F8F8FB683CDDAC5   /* P6  = +1.541082944616557159055e-02 */
> +        .quad 0x3F819515DEE2CB91   /* P7  = +8.585139145315585602547e-03 */
> +        .quad 0xBF834E45E6AF9EA1   /* P8  = -9.426637747267209169415e-03 */
> +        .quad 0x3F65250F197CA56D   /* P9  = +2.581147662472352252568e-03 */
> +        .quad 0x3F57A766026D036C   /* P10 = +1.443719500187702367690e-03 */
> +        .quad 0xBFED000000000000   /* B = -.90625  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C716F7EEF7B61AD   /* PL0 = +1.512291215142578135651e-17 */
> +        .quad 0x3FE7F0E1A4CD846E   /* PH0 = +7.481544703297353660076e-01 */
> +        .quad 0x3FDC2D4CC872DC09   /* P1  = +4.402648885256331012598e-01 */
> +        .quad 0xBFD514A99F92ED53   /* P2  = -3.293861444796750250530e-01 */
> +        .quad 0x3FB9846A6CF2F337   /* P3  = +9.967675361526749494844e-02 */
> +        .quad 0x3FA20896939AB161   /* P4  = +3.522177268800664413493e-02 */
> +        .quad 0xBFA97E801F31EE0D   /* P5  = -4.979324703978358553405e-02 */
> +        .quad 0x3F92A11F47B82085   /* P6  = +1.819275737037219740638e-02 */
> +        .quad 0x3F717D70FE289C34   /* P7  = +4.270020845559097605514e-03 */
> +        .quad 0xBF7FDCF1D3F6CE2D   /* P8  = -7.779068604054678540132e-03 */
> +        .quad 0x3F69F607E81AF6B6   /* P9  = +3.169074480722534625181e-03 */
> +        .quad 0x3F3F925C80D0F889   /* P10 = +4.817462766516585511824e-04 */
> +        .quad 0xBFEF000000000000   /* B = -.96875  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C931A11D7E8606E   /* PL0 = +6.627280241435322692188e-17 */
> +        .quad 0x3FE92BFB370D9B71   /* PH0 = +7.866188121086975515439e-01 */
> +        .quad 0x3FD866160E454111   /* P1  = +3.812308444367014680480e-01 */
> +        .quad 0xBFD33149F3801DBA   /* P2  = -2.998833539899937679796e-01 */
> +        .quad 0x3FBBDB6D4C949899   /* P3  = +1.088169395412442909023e-01 */
> +        .quad 0x3F8D6AB2A74B9343   /* P4  = +1.436366627735597372494e-02 */
> +        .quad 0xBFA404D1047C5D72   /* P5  = -3.909924678571997970917e-02 */
> +        .quad 0x3F93C47D9ACCD919   /* P6  = +1.930423981976856424661e-02 */
> +        .quad 0xBF41B755642CFF1B   /* P7  = -5.406538915408738478158e-04 */
> +        .quad 0xBF74B5301AA1E788   /* P8  = -5.055606752756853900641e-03 */
> +        .quad 0x3F69A84C5B2A3E68   /* P9  = +3.132008679422249529120e-03 */
> +        .quad 0xBF3CF47830328C11   /* P10 = -4.418176105877589308931e-04 */
> +        .quad 0xBFF1000000000000   /* B = -1.0625   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C884D471B8FD396   /* PL0 = +4.215701792312937090514e-17 */
> +        .quad 0x3FEA8DBCBC31897A   /* PH0 = +8.298019099859594849278e-01 */
> +        .quad 0x3FD3EE730537C8EA   /* P1  = +3.114287901836535219818e-01 */
> +        .quad 0xBFD08A05AD27CE32   /* P2  = -2.584242049190123217982e-01 */
> +        .quad 0x3FBC5255406F84B6   /* P3  = +1.106313021005175045399e-01 */
> +        .quad 0xBF772FA2F633AA5E   /* P4  = -5.660664147607434209241e-03 */
> +        .quad 0xBF99DD8E4C473FC4   /* P5  = -2.525923100057504533247e-02 */
> +        .quad 0x3F9183C935B6495D   /* P6  = +1.710428610165003372069e-02 */
> +        .quad 0xBF70471A3A591480   /* P7  = -3.974058583087303228038e-03 */
> +        .quad 0xBF603DDD4DEBB9A4   /* P8  = -1.982624278176818987264e-03 */
> +        .quad 0x3F62591E44D3C17F   /* P9  = +2.239760512218135956425e-03 */
> +        .quad 0xBF4C195D3A9B1AB4   /* P10 = -8.575158328419569430544e-04 */
> +        .quad 0xBFF3000000000000   /* B = -1.1875   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C90DD1C9BFF7F64   /* PL0 = +5.850777430004479798187e-17 */
> +        .quad 0x3FEBAD50A4A68BC1   /* PH0 = +8.649066177207417327466e-01 */
> +        .quad 0x3FD01FBA72CEE1A5   /* P1  = +2.519365426228666233893e-01 */
> +        .quad 0xBFCBE432F647C4D6   /* P2  = -2.179015829602010702633e-01 */
> +        .quad 0x3FBABF92B6E5AC73   /* P3  = +1.044856735731387955105e-01 */
> +        .quad 0xBF922983AA24E217   /* P4  = -1.773648954369563555378e-02 */
> +        .quad 0xBF8C72214C14E23A   /* P5  = -1.388956082756564056328e-02 */
> +        .quad 0x3F8ACB4D1F388E8B   /* P6  = +1.308307887581540972153e-02 */
> +        .quad 0xBF740EF8B4A2EE3B   /* P7  = -4.897090441029978580995e-03 */
> +        .quad 0xBF0EA9F30C8DC900   /* P8  = -5.848668076326342477133e-05 */
> +        .quad 0x3F53CC40D18713AE   /* P9  = +1.208365725788622757410e-03 */
> +        .quad 0xBF4848B86029CBA1   /* P10 = -7.410908004444779592485e-04 */
> +        .quad 0xBFF5000000000000   /* B = -1.3125   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8FB61781D22681   /* PL0 = +5.501032995458057064843e-17 */
> +        .quad 0x3FEC950A3340C8BF   /* PH0 = +8.931933404003514764824e-01 */
> +        .quad 0x3FC9E1DFFD385423   /* P1  = +2.022056566644617586005e-01 */
> +        .quad 0xBFC71E2FF88EBA23   /* P2  = -1.806087459239772032583e-01 */
> +        .quad 0x3FB80AEBD07AB5BA   /* P3  = +9.391664352252506838449e-02 */
> +        .quad 0xBF98404E27EAE6ED   /* P4  = -2.368280523908243895884e-02 */
> +        .quad 0xBF772DA520B5006E   /* P5  = -5.658764868087568802107e-03 */
> +        .quad 0x3F824C9268AF9423   /* P6  = +8.935111827620250551925e-03 */
> +        .quad 0xBF722AE76D206AE3   /* P7  = -4.435447701349490160113e-03 */
> +        .quad 0x3F4B807F56298D5E   /* P8  = +8.392926941493230644497e-04 */
> +        .quad 0x3F3D71027DF95D2A   /* P9  = +4.492407879061627603159e-04 */
> +        .quad 0xBF3EBD17676755FB   /* P10 = -4.690343988874298905483e-04 */
> +        .quad 0xBFF7000000000000   /* B = -1.4375   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C95393C63CE8224   /* PL0 = +7.363407705201031038415e-17 */
> +        .quad 0x3FED4E6F464286B0   /* PH0 = +9.158245441687622445670e-01 */
> +        .quad 0x3FC4A45842B7DE1E   /* P1  = +1.612654042980787191461e-01 */
> +        .quad 0xBFC2E7885AFDD3D0   /* P2  = -1.476908153814791087327e-01 */
> +        .quad 0x3FB4DD6DD51D3FEB   /* P3  = +8.150373890862254580204e-02 */
> +        .quad 0xBF9A05D3ADAB489C   /* P4  = -2.541285274021075503042e-02 */
> +        .quad 0xBF3459B643B4995C   /* P5  = -3.105230313899165257622e-04 */
> +        .quad 0x3F766B30745F2E3A   /* P6  = +5.473317409222350365811e-03 */
> +        .quad 0xBF6C2C891E555BDF   /* P7  = -3.439204988051155730940e-03 */
> +        .quad 0x3F5194F30D6C576D   /* P8  = +1.073109966176012791522e-03 */
> +        .quad 0x3EF4DBB43C3132A2   /* P9  = +1.989194766975849961365e-05 */
> +        .quad 0xBF2E45EBAB3C15A0   /* P10 = -2.309656316514087783666e-04 */
> +        .quad 0xBFF9000000000000   /* B = -1.5625   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C75111669651DAA   /* PL0 = +1.827249135453834384396e-17 */
> +        .quad 0x3FEDE1EB5937518F   /* PH0 = +9.338280432225917193634e-01 */
> +        .quad 0x3FC06129C7C8EBB1   /* P1  = +1.279651856910653382507e-01 */
> +        .quad 0xBFBE9763041064E1   /* P2  = -1.194974789545031421774e-01 */
> +        .quad 0x3FB1A5B9F9113928   /* P3  = +6.893503504509068635308e-02 */
> +        .quad 0xBF992145039F9AFE   /* P4  = -2.454097590080105816526e-02 */
> +        .quad 0x3F66CB116EA49C89   /* P5  = +2.782377288116648315142e-03 */
> +        .quad 0x3F67F972FDF30001   /* P6  = +2.926563829163342740100e-03 */
> +        .quad 0xBF63A7B5975F02F3   /* P7  = -2.399305983061922438601e-03 */
> +        .quad 0x3F4FDE7B8777F4C8   /* P8  = +9.725669069095216373599e-04 */
> +        .quad 0xBF25918876626BA4   /* P9  = -1.645545082212515656240e-04 */
> +        .quad 0xBF1495123C991F00   /* P10 = -7.851527984669912693674e-05 */
> +        .quad 0xBFFB000000000000   /* B = -1.6875   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9F29A5B7426D27   /* PL0 = +1.081172820484012446345e-16 */
> +        .quad 0x3FEE56B6F3EFABFC   /* PH0 = +9.480852856044061915952e-01 */
> +        .quad 0x3FB9E3EFD94BB9FC   /* P1  = +1.011342912204113371518e-01 */
> +        .quad 0xBFB88BD9760FECA7   /* P2  = -9.588393337610288420285e-02 */
> +        .quad 0x3FAD48A0350B3ACF   /* P3  = +5.719471595295077387313e-02 */
> +        .quad 0xBF96CC6A5110F129   /* P4  = -2.226415748394675367257e-02 */
> +        .quad 0x3F71934687170384   /* P5  = +4.290843485649345772606e-03 */
> +        .quad 0x3F5407BAF73B3DF9   /* P6  = +1.222546180475235334287e-03 */
> +        .quad 0xBF591B626C0646DD   /* P7  = -1.532407870488964407324e-03 */
> +        .quad 0x3F48B0E1DD283558   /* P8  = +7.535078860329375669277e-04 */
> +        .quad 0xBF2B322292840D2B   /* P9  = -2.074877932117605962646e-04 */
> +        .quad 0xBE99E4061120C741   /* P10 = -3.858017559892704559672e-07 */
> +        .quad 0xBFFD000000000000   /* B = -1.8125   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6AF8C2041C67CD   /* PL0 = +1.169711482626385762338e-17 */
> +        .quad 0x3FEEB2DFEDD5EC93   /* PH0 = +9.593352933146824801369e-01 */
> +        .quad 0x3FB465A205CFB638   /* P1  = +7.967579500083210999681e-02 */
> +        .quad 0xBFB3914BF68D39FF   /* P2  = -7.643580216720378576778e-02 */
> +        .quad 0x3FA7F21A08C5C734   /* P3  = +4.676896435820623621673e-02 */
> +        .quad 0xBF93DA9560EA9960   /* P4  = -1.938851741820124550772e-02 */
> +        .quad 0x3F73953FEC62820E   /* P5  = +4.781007481284861359820e-03 */
> +        .quad 0x3F2749D5E1273E3C   /* P6  = +1.776765426044646108071e-04 */
> +        .quad 0xBF4D46B0B498CE5A   /* P7  = -8.934367007839658352859e-04 */
> +        .quad 0x3F4153D680E1F4C4   /* P8  = +5.287930851093571206574e-04 */
> +        .quad 0xBF28477014ECA6A2   /* P9  = -1.852344816708944640949e-04 */
> +        .quad 0x3EFFAC54E07CEB4B   /* P10 = +3.020588886147182143902e-05 */
> +        .quad 0xBFFF000000000000   /* B = -1.9375   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7A8AF2BB2231F2   /* PL0 = +2.302217989249372577466e-17 */
> +        .quad 0x3FEF1994DF724FC8   /* PH0 = +9.718727459135090285258e-01 */
> +        .quad 0x3FAC65B1BC0C9D58   /* P1  = +5.546336575053583942603e-02 */
> +        .quad 0xBFAB9937BDA747C8   /* P2  = -5.390333356957871365599e-02 */
> +        .quad 0x3FA15B42D9EF931C   /* P3  = +3.389939222669210777241e-02 */
> +        .quad 0xBF8EACD8E8507A3C   /* P4  = -1.497811755149058215502e-02 */
> +        .quad 0x3F7263A15721C682   /* P5  = +4.489546046998806349050e-03 */
> +        .quad 0xBF42A032ACDC3B32   /* P6  = -5.684134900735048121829e-04 */
> +        .quad 0xBF3431E79B5AD185   /* P7  = -3.081503340170088810438e-04 */
> +        .quad 0x3F31B51667C7DF5E   /* P8  = +2.701930714290502424828e-04 */
> +        .quad 0xBF1F8709579250AD   /* P9  = -1.202678157759563704341e-04 */
> +        .quad 0x3F01ED8ED1BF9595   /* P10 = +3.419487094883790833778e-05 */
> +        .quad 0xC001000000000000   /* B = -2.125    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C86F3F7C3DAFC55   /* PL0 = +3.981710680748877459333e-17 */
> +        .quad 0x3FEF73776B2AA2DB   /* PH0 = +9.828450291725759901951e-01 */
> +        .quad 0x3FA16A7FC4D7B900   /* P1  = +3.401564863075812007064e-02 */
> +        .quad 0xBFA11E03803AD621   /* P2  = -3.343211117082156940532e-02 */
> +        .quad 0x3F9609591597297F   /* P3  = +2.152003473546803654658e-02 */
> +        .quad 0xBF847E74ED9BBB0C   /* P4  = -1.000682211039596246436e-02 */
> +        .quad 0x3F6BFF771725CD65   /* P5  = +3.417713736035987187864e-03 */
> +        .quad 0xBF491D1FF73C18FA   /* P6  = -7.664114077392807421000e-04 */
> +        .quad 0x3EF53EE467B51DC5   /* P7  = +2.026145237479599375099e-05 */
> +        .quad 0x3F160135BE0D94A0   /* P8  = +8.394136922403255700685e-05 */
> +        .quad 0xBF0B32CB1D276A40   /* P9  = -5.187685350778849443841e-05 */
> +        .quad 0x3EF4DAF70C12D555   /* P10 = +1.988919462255396826584e-05 */
> +        .quad 0xC003000000000000   /* B = -2.375    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C19DBF4E2E5B7DC   /* PL0 = +3.504575836708380670219e-19 */
> +        .quad 0x3FEFAA7934B75EBD   /* PH0 = +9.895597486128832054320e-01 */
> +        .quad 0x3F9545200830A42C   /* P1  = +2.077150392520736492125e-02 */
> +        .quad 0xBF950C46D285F6BC   /* P2  = -2.055464420253970271376e-02 */
> +        .quad 0x3F8B79F5BFC6513F   /* P3  = +1.341621390819425058164e-02 */
> +        .quad 0xBF7A50ADAD777898   /* P4  = -6.424597194806612772505e-03 */
> +        .quad 0x3F633A19BE8255E3   /* P5  = +2.347040444940816227383e-03 */
> +        .quad 0xBF44E609BC2557B7   /* P6  = -6.377742322836087134324e-04 */
> +        .quad 0x3F1AFCBAD60EAACD   /* P7  = +1.029480968230231421206e-04 */
> +        .quad 0x3EE80476AC34A8EF   /* P8  = +1.145240583485084317660e-05 */
> +        .quad 0xBEF278E23DE463E9   /* P9  = -1.761646478213091821804e-05 */
> +        .quad 0x3EE209FAF377264D   /* P10 = +8.601658563106529694651e-06 */
> +        .quad 0xC005000000000000   /* B = -2.625    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C979D62702C631C   /* PL0 = +8.193023793215066385979e-17 */
> +        .quad 0x3FEFCC04CDBCDC4B   /* PH0 = +9.936546343150295390600e-01 */
> +        .quad 0x3F89E87D088D269A   /* P1  = +1.265046770426474576547e-02 */
> +        .quad 0xBF89BE6721012B80   /* P2  = -1.257019586059526836624e-02 */
> +        .quad 0x3F80F1C13E8D39D3   /* P3  = +8.273610803056031004326e-03 */
> +        .quad 0xBF7082DBC9602757   /* P4  = -4.031046430108839563004e-03 */
> +        .quad 0x3F590BE9BD4E0A11   /* P5  = +1.528719197467002507978e-03 */
> +        .quad 0xBF3DCC2BEF6D0283   /* P6  = -4.546744598208711809986e-04 */
> +        .quad 0x3F1A08065C4A8E85   /* P7  = +9.930170842636406837764e-05 */
> +        .quad 0xBEE528117D0410F3   /* P8  = -1.008821337267942266431e-05 */
> +        .quad 0xBED0BE73A44FF565   /* P9  = -3.992069257383521775961e-06 */
> +        .quad 0x3EC9B0C11E342E38   /* P10 = +3.062539904901699218737e-06 */
> +        .quad 0xC007000000000000   /* B = -2.875    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C804B931AD7A3CC   /* PL0 = +2.826768921701616830245e-17 */
> +        .quad 0x3FEFE06EB0688212   /* PH0 = +9.961465306733450209009e-01 */
> +        .quad 0x3F7F81BD8876224D   /* P1  = +7.692089427458426472642e-03 */
> +        .quad 0xBF7F62A8C699A963   /* P2  = -7.662448196791823756776e-03 */
> +        .quad 0x3F74C31E2B2A6A28   /* P3  = +5.068891378551522166321e-03 */
> +        .quad 0xBF6470D537F16227   /* P4  = -2.495209162173734080001e-03 */
> +        .quad 0x3F4FAEEF61C89673   /* P5  = +9.668988091717359455754e-04 */
> +        .quad 0xBF33C5E80B349783   /* P6  = -3.017131341088651514023e-04 */
> +        .quad 0x3F138F3D31037A6B   /* P7  = +7.461367590931028650557e-05 */
> +        .quad 0xBEEB3C780996FFE3   /* P8  = -1.298723536791163711556e-05 */
> +        .quad 0x3E9D0C75BC8BFEFC   /* P9  = +4.328589367358221917138e-07 */
> +        .quad 0x3EAC3865227764D4   /* P10 = +8.410302755848104487452e-07 */
> +        .quad 0xC009000000000000   /* B = -3.125    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C5B978B202749F9   /* PL0 = +5.983054034451594408315e-18 */
> +        .quad 0x3FEFECD6B7EA3128   /* PH0 = +9.976609794698889643882e-01 */
> +        .quad 0x3F73238B786137FE   /* P1  = +4.672570043181776968058e-03 */
> +        .quad 0xBF731815ACEA072E   /* P2  = -4.661640805922390930706e-03 */
> +        .quad 0x3F6956F0816D5AEE   /* P3  = +3.093213784647877798933e-03 */
> +        .quad 0xBF591A16286C4885   /* P4  = -1.532098425461232453877e-03 */
> +        .quad 0x3F43B3E3A00C6096   /* P5  = +6.012784434430592468442e-04 */
> +        .quad 0xBF29441B2A56DEC7   /* P6  = -1.927645836710038499293e-04 */
> +        .quad 0x3F0A99C3A2E857B6   /* P7  = +5.073669705184196724674e-05 */
> +        .quad 0xBEE61CB034DDC151   /* P8  = -1.054385361573597042258e-05 */
> +        .quad 0x3EB792BBC76D6107   /* P9  = +1.405070887824641788698e-06 */
> +        .quad 0x3E761472362A16F0   /* P10 = +8.225391704739515383837e-08 */
> +        .quad 0xC00B000000000000   /* B = -3.375    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9C290AFCBDE00D   /* PL0 = +9.770074992945060684926e-17 */
> +        .quad 0x3FEFF45F6D36133A   /* PH0 = +9.985806592017987259879e-01 */
> +        .quad 0x3F673CEC093032DE   /* P1  = +2.836667068100913999228e-03 */
> +        .quad 0xBF67347A7CD844D5   /* P2  = -2.832640870800243808078e-03 */
> +        .quad 0x3F5EDA25530355DB   /* P3  = +1.883064698679040793627e-03 */
> +        .quad 0xBF4EAD3BBABC1BA9   /* P4  = -9.361783645268534848806e-04 */
> +        .quad 0x3F3842E61CD35432   /* P5  = +3.701984213198588740338e-04 */
> +        .quad 0xBF1F9AB7FD1A3DDD   /* P6  = -1.205611036090218544867e-04 */
> +        .quad 0x3F0136C154EA3DED   /* P7  = +3.283288480304320224929e-05 */
> +        .quad 0xBEDF12807F721E66   /* P8  = -7.408207230892235753013e-06 */
> +        .quad 0x3EB5B53687AD5112   /* P9  = +1.293889481520047941659e-06 */
> +        .quad 0xBE801E90FBFED147   /* P10 = -1.200988872775447204019e-07 */
> +        .quad 0xC00D000000000000   /* B = -3.625    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9E323294294877   /* PL0 = +1.047637125334028950603e-16 */
> +        .quad 0x3FEFF8F21CDAAA62   /* PH0 = +9.991388858373506653976e-01 */
> +        .quad 0x3F5C3470628813F2   /* P1  = +1.721486807697344658108e-03 */
> +        .quad 0xBF5C2E38AC6FF8D2   /* P2  = -1.720004411026422324849e-03 */
> +        .quad 0x3F52C13234626F43   /* P3  = +1.144694354969070234454e-03 */
> +        .quad 0xBF42B0A47DF47BB4   /* P4  = -5.703738387728891173354e-04 */
> +        .quad 0x3F2DB2889E32FBFD   /* P5  = +2.265731592156760387344e-04 */
> +        .quad 0xBF1385FBD54C5A55   /* P6  = -7.447576110695385196414e-05 */
> +        .quad 0x3EF5AFA812C6984E   /* P7  = +2.068153223579892541184e-05 */
> +        .quad 0xBED47097C188A03C   /* P8  = -4.873231795467276043290e-06 */
> +        .quad 0x3EAFF2B982F7EE8C   /* P9  = +9.521288628073486288914e-07 */
> +        .quad 0xBE828EC5B57D424D   /* P10 = -1.382656715739529384702e-07 */
> +        .quad 0xC00F000000000000   /* B = -3.875    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9BA40DA6983BEC   /* PL0 = +9.589840482158163453169e-17 */
> +        .quad 0x3FEFFCAAC3F20E65   /* PH0 = +9.995931460438894911036e-01 */
> +        .quad 0x3F4AA87CF664754C   /* P1  = +8.135423820793490331956e-04 */
> +        .quad 0xBF4AA5B62919E224   /* P2  = -8.132113891426467676310e-04 */
> +        .quad 0x3F41C01B53B0B312   /* P3  = +5.416997368051531710388e-04 */
> +        .quad 0xBF31B8B54D091751   /* P4  = -2.704088811110632606347e-04 */
> +        .quad 0x3F1C431305954ECC   /* P5  = +1.078110084525254933728e-04 */
> +        .quad 0xBF02B7DEAD0D44E6   /* P6  = -3.570221236393906131126e-05 */
> +        .quad 0x3EE51C6EFF109EA9   /* P7  = +1.006654199116272154479e-05 */
> +        .quad 0xBEC48CFB08072D17   /* P8  = -2.449834994621594976610e-06 */
> +        .quad 0x3EA1585EC59CAE34   /* P9  = +5.169271261920604503617e-07 */
> +        .quad 0xBE78832BAF950BA9   /* P10 = -9.131575131209528255629e-08 */
> +        .quad 0xC011000000000000   /* B = -4.25     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8FBF237F4AFE10   /* PL0 = +5.507163370275307643966e-17 */
> +        .quad 0x3FEFFEC61279A3A4   /* PH0 = +9.998503075449787225182e-01 */
> +        .quad 0x3F339E78281A00EA   /* P1  = +2.993625022114214863645e-04 */
> +        .quad 0xBF339DB7B072AD62   /* P2  = -2.993176899035080028902e-04 */
> +        .quad 0x3F2A259E658EF4E4   /* P3  = +1.994853835451177669594e-04 */
> +        .quad 0xBF1A219C312B10BA   /* P4  = -9.968295880030927192162e-05 */
> +        .quad 0x3F04E146B4F5F4B7   /* P5  = +3.982541113154699160876e-05 */
> +        .quad 0xBEEBC5F137088210   /* P6  = -1.324329943580649487333e-05 */
> +        .quad 0x3ECF96736E300B00   /* P7  = +3.765547135882256916132e-06 */
> +        .quad 0xBEAF4874840B91EB   /* P8  = -9.323068824421825762292e-07 */
> +        .quad 0x3E8B6AB2B5C8FD3F   /* P9  = +2.042709991312793245971e-07 */
> +        .quad 0xBE650BCCE62FD2B7   /* P10 = -3.920140725219944650830e-08 */
> +        .quad 0xC013000000000000   /* B = -4.75     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9C869C85471703   /* PL0 = +9.896883942603146946483e-17 */
> +        .quad 0x3FEFFF8C81C6DC33   /* PH0 = +9.999449286177707341139e-01 */
> +        .quad 0x3F1CDF5A2E4D7C69   /* P1  = +1.101397316012206760643e-04 */
> +        .quad 0xBF1CDEF1F9BE63BE   /* P2  = -1.101336660539594564027e-04 */
> +        .quad 0x3F133EC10C83AAA0   /* P3  = +7.341435696487731017506e-05 */
> +        .quad 0xBF033DAB325FAACB   /* P4  = -3.669909192168459445238e-05 */
> +        .quad 0x3EEEC598FA98BAD8   /* P5  = +1.467316890843338172161e-05 */
> +        .quad 0xBED47F1A15BA368E   /* P6  = -4.886744445221253126882e-06 */
> +        .quad 0x3EB761FBE7D201C1   /* P7  = +1.393720509029845064726e-06 */
> +        .quad 0xBE974CD75A43BF6B   /* P8  = -3.471994551992448536007e-07 */
> +        .quad 0x3E74B02965BBF8DC   /* P9  = +7.706929621914905669946e-08 */
> +        .quad 0xBE504EF4E3892A66   /* P10 = -1.518840362012570189110e-08 */
> +        .quad 0xC015000000000000   /* B = -5.25     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C643810400471B0   /* PL0 = +8.768592603904887599187e-18 */
> +        .quad 0x3FEFFFD583014825   /* PH0 = +9.999797400180382433987e-01 */
> +        .quad 0x3F053E71416C43CA   /* P1  = +4.051955345663706869871e-05 */
> +        .quad 0xBF053E550C7C8CC9   /* P2  = -4.051873253121394012080e-05 */
> +        .quad 0x3EFC52D0D90D4843   /* P3  = +2.701139380018752534477e-05 */
> +        .quad 0xBEEC523A6ADBE142   /* P4  = -1.350460237457883558350e-05 */
> +        .quad 0x3ED6A73E22D844B3   /* P5  = +5.400965660055565196396e-06 */
> +        .quad 0xBEBE31D10F23ACD0   /* P6  = -1.799738182979224868919e-06 */
> +        .quad 0x3EA13E14264DEAB2   /* P7  = +5.138663935333241981438e-07 */
> +        .quad 0xBE81385ABB98EDCC   /* P8  = -1.282999997786486835638e-07 */
> +        .quad 0x3E5EB9164593E0B6   /* P9  = +2.861301981891537161158e-08 */
> +        .quad 0xBE387218CFE7772E   /* P10 = -5.691705994073124478195e-09 */
> +        .quad 0xC017000000000000   /* B = -5.75     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C92530433F4C703   /* PL0 = +6.357512739163799046861e-17 */
> +        .quad 0x3FEFFFF05E8D3191   /* PH0 = +9.999925467214315633058e-01 */
> +        .quad 0x3EEF42DDFA52B575   /* P1  = +1.490650158538873335176e-05 */
> +        .quad 0xBEEF42CEB54212AA   /* P2  = -1.490639048307961378200e-05 */
> +        .quad 0x3EE4D7201CBCB853   /* P3  = +9.937445518550804010127e-06 */
> +        .quad 0xBED4D6F764B66C37   /* P4  = -4.968574624976280456686e-06 */
> +        .quad 0x3EC0ABB806EBDE71   /* P5  = +1.987311456171617620608e-06 */
> +        .quad 0xBEA6399CF854F876   /* P6  = -6.623581475862682369330e-07 */
> +        .quad 0x3E8964B91728D7C9   /* P7  = +1.891959403186505598965e-07 */
> +        .quad 0xBE6961A0528444D6   /* P8  = -4.727645325404986954168e-08 */
> +        .quad 0x3E46AE3B0814EE00   /* P9  = +1.056147192151514779549e-08 */
> +        .quad 0xBE221B8194DACD16   /* P10 = -2.107984154277957626641e-09 */
> +        .quad 0xC019000000000000   /* B = -6.25     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7BB5622CE1A79E   /* PL0 = +2.403331811901679167526e-17 */
> +        .quad 0x3FEFFFFA3FF22708   /* PH0 = +9.999972580855862602789e-01 */
> +        .quad 0x3ED7003552D53503   /* P1  = +5.483821309338170039906e-06 */
> +        .quad 0xBED7003130C1AB92   /* P2  = -5.483806273169366545037e-06 */
> +        .quad 0x3ECEAAE13B699C45   /* P3  = +3.655850800133043324271e-06 */
> +        .quad 0xBEBEAACB305F3D07   /* P4  = -1.827905351959291114416e-06 */
> +        .quad 0x3EA8887F5F9C87EF   /* P5  = +7.311461438267648556646e-07 */
> +        .quad 0xBE905AD08DF8454F   /* P6  = -2.437046884027860662692e-07 */
> +        .quad 0x3E72B068300B703F   /* P7  = +6.962228483613086736676e-08 */
> +        .quad 0xBE52AF921A71C058   /* P8  = -1.740252888706390465423e-08 */
> +        .quad 0x3E30B53EAA35300D   /* P9  = +3.890131469838137725119e-09 */
> +        .quad 0xBE0AB60CDAD7E22E   /* P10 = -7.773963050435300060566e-10 */
> +        .quad 0xC01B000000000000   /* B = -6.75     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8BD1ACF80D7256   /* PL0 = +4.825835138930451121169e-17 */
> +        .quad 0x3FEFFFFDE2760A41   /* PH0 = +9.999989913051835488389e-01 */
> +        .quad 0x3EC0EC4F1EC27E55   /* P1  = +2.017388615341105998718e-06 */
> +        .quad 0xBEC0EC4E005E6EAC   /* P2  = -2.017386580411626200507e-06 */
> +        .quad 0x3EB6906504BC4610   /* P3  = +1.344921673533307001969e-06 */
> +        .quad 0xBEA6905F0D52C8B5   /* P4  = -6.724581235377781360384e-07 */
> +        .quad 0x3E920D0F5CCE152B   /* P5  = +2.689810941136721216499e-07 */
> +        .quad 0xBE7811505B10E753   /* P6  = -8.965891741619763761543e-08 */
> +        .quad 0x3E5B811EE4F9B8EE   /* P7  = +2.561544781706659619288e-08 */
> +        .quad 0xBE3B80ABC067E840   /* P8  = -6.403452884688571158579e-09 */
> +        .quad 0x3E1898E394E09335   /* P9  = +1.431746793613569087489e-09 */
> +        .quad 0xBDF3ABB5BA711DB7   /* P10 = -2.862469657501951918569e-10 */
> +        .quad 0xC01D000000000000   /* B = -7.25     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8AE01DB39A3791   /* PL0 = +4.662147961093911873193e-17 */
> +        .quad 0x3FEFFFFF38C76668   /* PH0 = +9.999996289217962797125e-01 */
> +        .quad 0x3EA8E712E56E1188   /* P1  = +7.421562696484951529573e-07 */
> +        .quad 0xBEA8E7124A650791   /* P2  = -7.421559942504648535596e-07 */
> +        .quad 0x3EA09A0B62D8EF94   /* P3  = +4.947702955735978541097e-07 */
> +        .quad 0xBE909A09C56C2107   /* P4  = -2.473847805916120382218e-07 */
> +        .quad 0x3E7A900A90A54A6E   /* P5  = +9.895362410487317236618e-08 */
> +        .quad 0xBE61B5557BB449B6   /* P6  = -3.298434544432568302770e-08 */
> +        .quad 0x3E443CC74732CDCA   /* P7  = +9.423781066565733462466e-09 */
> +        .quad 0xBE243CA8AA8D6E54   /* P8  = -2.355890888986360997159e-09 */
> +        .quad 0x3E0219C341E0D1B4   /* P9  = +5.267978308406275552691e-10 */
> +        .quad 0xBDDCF49A10950F13   /* P10 = -1.053394074620716018815e-10 */
> +        .quad 0xC01F000000000000   /* B = -7.75     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C75CB18F3775414   /* PL0 = +1.890271747518592444083e-17 */
> +        .quad 0x3FEFFFFFD38C39F0   /* PH0 = +9.999999172012490333827e-01 */
> +        .quad 0x3E8639E2F89493BB   /* P1  = +1.655974950855472979393e-07 */
> +        .quad 0xBE8639E2D9B29562   /* P2  = -1.655974813708346974914e-07 */
> +        .quad 0x3E7DA2836A1F706E   /* P3  = +1.103982989742589616541e-07 */
> +        .quad 0xBE6DA282C6733DAE   /* P4  = -5.519913131581509871840e-08 */
> +        .quad 0x3E57B53A278851FD   /* P5  = +2.207971980430773309147e-08 */
> +        .quad 0xBE3F9C4A72536E22   /* P6  = -7.359895614149337484810e-09 */
> +        .quad 0x3E220E81FBE19CDD   /* P7  = +2.102073153607135257714e-09 */
> +        .quad 0xBE020E8875ADA8D8   /* P8  = -5.255211642212584097407e-10 */
> +        .quad 0x3DE07634328384FC   /* P9  = +1.197748786062966341989e-10 */
> +        .quad 0xBDBA54078E3C351F   /* P10 = -2.394539505021488953905e-11 */
> +        .quad 0xC021000000000000   /* B = -8.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C98B78738B0EDEF   /* PL0 = +8.575399788039081964921e-17 */
> +        .quad 0x3FEFFFFFF9FBEA40   /* PH0 = +9.999999887944071019774e-01 */
> +        .quad 0x3E581056FAC28C46   /* P1  = +2.241118550516412682327e-08 */
> +        .quad 0xBE581056F63A4351   /* P2  = -2.241118525356742542550e-08 */
> +        .quad 0x3E500AE49533790A   /* P3  = +1.494078933911655875521e-08 */
> +        .quad 0xBE400AE489ACBA90   /* P4  = -7.470394349637968945652e-09 */
> +        .quad 0x3E29AB0D59A1967B   /* P5  = +2.988168557255271725494e-09 */
> +        .quad 0xBE111CB32D6EEF2B   /* P6  = -9.960558400070350772418e-10 */
> +        .quad 0x3DF38CBADF396908   /* P7  = +2.844859618921805216353e-10 */
> +        .quad 0xBDD38CC7B92CECD3   /* P8  = -7.112220386749926320915e-11 */
> +        .quad 0x3DB1D2BBE2705032   /* P9  = +1.621008722427575444686e-11 */
> +        .quad 0xBD8C8199294E6380   /* P10 = -3.240784656869469020111e-12 */
> +        .quad 0xC023000000000000   /* B = -9.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8EEEC16618B984   /* PL0 = +5.365957423487855307906e-17 */
> +        .quad 0x3FEFFFFFFF2F9279   /* PH0 = +9.999999984834878619111e-01 */
> +        .quad 0x3E2A0DB0D052B148   /* P1  = +3.033024167396880687734e-09 */
> +        .quad 0xBE2A0DB0CFA6AB71   /* P2  = -3.033024162734192808028e-09 */
> +        .quad 0x3E215E75D53A3105   /* P3  = +2.022016035353114070618e-09 */
> +        .quad 0xBE115E75D40AA47F   /* P4  = -1.011008013562702155050e-09 */
> +        .quad 0x3DFBCA5CDC12ED1C   /* P5  = +4.044047007631481841556e-10 */
> +        .quad 0xBDE286E85704FC22   /* P6  = -1.348015410318274576187e-10 */
> +        .quad 0x3DC52A8925354517   /* P7  = +3.850101197145027796396e-11 */
> +        .quad 0xBDA52A97EA3F5F4A   /* P8  = -9.625355478142550638468e-12 */
> +        .quad 0x3D834C011A2AC0F7   /* P9  = +2.193802608697321032841e-12 */
> +        .quad 0xBD5EDD05BDCB3A62   /* P10 = -4.385948508419928563300e-13 */
> +        .quad 0xC025000000000000   /* B = -10.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6BD8B474BBF792   /* PL0 = +1.207649585364892639612e-17 */
> +        .quad 0x3FEFFFFFFFE3CAD8   /* PH0 = +9.999999997947623953110e-01 */
> +        .quad 0x3DFC3527E43C565F   /* P1  = +4.104751852963940338559e-10 */
> +        .quad 0xBDFC3527E420F415   /* P2  = -4.104751852036136216697e-10 */
> +        .quad 0x3DF2CE1A8D806DAD   /* P3  = +2.736501142887952919489e-10 */
> +        .quad 0xBDE2CE1A8DDF690A   /* P4  = -1.368250573053032426141e-10 */
> +        .quad 0x3DCE169832D8BD68   /* P5  = +5.473022586854025789680e-11 */
> +        .quad 0xBDB40F0FE853DA5B   /* P6  = -1.824340550195944358477e-11 */
> +        .quad 0x3D96EA8D930D31A1   /* P7  = +5.210545794901128943676e-12 */
> +        .quad 0xBD76EA9DB0D09839   /* P8  = -1.302650427355019556441e-12 */
> +        .quad 0x3D54E474FD4303A1   /* P9  = +2.968990047962355000258e-13 */
> +        .quad 0xBD30B526CA2B228A   /* P10 = -5.935740124899435401321e-14 */
> +        .quad 0xC027000000000000   /* B = -11.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C56E8953D525FD5   /* PL0 = +4.967494994909661698725e-18 */
> +        .quad 0x3FEFFFFFFFFC2EB9   /* PH0 = +9.999999999722241073030e-01 */
> +        .quad 0x3DCE8A37A48016C2   /* P1  = +5.555177547354687971427e-11 */
> +        .quad 0xBDCE8A37A479B7D4   /* P2  = -5.555177547084873157964e-11 */
> +        .quad 0x3DC45C250CFA9C16   /* P3  = +3.703451575129414499553e-11 */
> +        .quad 0xBDB45C250D9F8467   /* P4  = -1.851725791056759260154e-11 */
> +        .quad 0x3DA049BB33CBD4E9   /* P5  = +7.406930640558963265190e-12 */
> +        .quad 0xBD85B7A407C422C1   /* P6  = -2.468976464832073512208e-12 */
> +        .quad 0x3D68CF9CED2B3FD5   /* P7  = +7.051706989348171774536e-13 */
> +        .quad 0xBD48CFAE64C352B3   /* P8  = -1.762945685274427023683e-13 */
> +        .quad 0x3D269EAE08690D52   /* P9  = +4.018091287355461204663e-14 */
> +        .quad 0xBD0216CBEAFFF5AA   /* P10 = -8.033151495672990022322e-15 */
> +        .quad 0xC029000000000000   /* B = -12.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8ACF1392B106D3   /* PL0 = +4.650601502940921454330e-17 */
> +        .quad 0x3FEFFFFFFFFF7BBD   /* PH0 = +9.999999999962408958609e-01 */
> +        .quad 0x3DA088529889B316   /* P1  = +7.518115268189742464885e-12 */
> +        .quad 0xBDA088529887F4C4   /* P2  = -7.518115268005149164680e-12 */
> +        .quad 0x3D960B18BF1DF711   /* P3  = +5.012076679213679703380e-12 */
> +        .quad 0xBD860B18BFD99A48   /* P4  = -2.506038344573564868987e-12 */
> +        .quad 0x3D71A27E7CA64143   /* P5  = +1.002419056539285288454e-12 */
> +        .quad 0xBD5783530EA76D91   /* P6  = -3.341396294294381580191e-13 */
> +        .quad 0x3D3ADCC75CBD2A03   /* P7  = +9.543447641637910477850e-14 */
> +        .quad 0xBD1ADCDA46BE5F17   /* P8  = -2.385887543769010971872e-14 */
> +        .quad 0x3CF87D77650BE5B8   /* P9  = +5.437895260471143131391e-15 */
> +        .quad 0xBCD395AE6E74C6D2   /* P10 = -1.087168847335561258239e-15 */
> +        .quad 0xC02B000000000000   /* B = -13.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C97A8A295292858   /* PL0 = +8.208271151146829171896e-17 */
> +        .quad 0x3FEFFFFFFFFFEE19   /* PH0 = +9.999999999994911847878e-01 */
> +        .quad 0x3D71E642BB008F95   /* P1  = +1.017466259229268282255e-12 */
> +        .quad 0xBD71E642BAFEEC54   /* P2  = -1.017466259207593392022e-12 */
> +        .quad 0x3D67DDAE41647741   /* P3  = +6.783108169938233581038e-13 */
> +        .quad 0xBD57DDAE4230F34B   /* P4  = -3.391554091734942426856e-13 */
> +        .quad 0x3D4317C33FAE2536   /* P5  = +1.356626669455791324801e-13 */
> +        .quad 0xBD2975040D3E26B9   /* P6  = -4.522088139411435138867e-14 */
> +        .quad 0x3D0D155DCD0F0AFB   /* P7  = +1.291565189902030307333e-14 */
> +        .quad 0xBCED157247832B20   /* P8  = -3.228947666403019234175e-15 */
> +        .quad 0x3CCA83D70F607C28   /* P9  = +7.359390959466796619024e-16 */
> +        .quad 0xBCA5343952C1E19E   /* P10 = -1.471323041436694087188e-16 */
> +        .quad 0xC02D000000000000   /* B = -14.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9B7876CBC5306E   /* PL0 = +9.530765996816607711732e-17 */
> +        .quad 0x3FEFFFFFFFFFFD93   /* PH0 = +9.999999999999310551502e-01 */
> +        .quad 0x3D436121E2640D76   /* P1  = +1.376990843765503869546e-13 */
> +        .quad 0xBD436121E26250EA   /* P2  = -1.376990843736775811281e-13 */
> +        .quad 0x3D39D6D7CA259186   /* P3  = +9.179938654047876451320e-14 */
> +        .quad 0xBD29D6D7CB0327CE   /* P4  = -4.589969336188563660531e-14 */
> +        .quad 0x3D14ABE4DC31244A   /* P5  = +1.835994545584345768382e-14 */
> +        .quad 0xBCFB8FDB82AB6BB7   /* P6  = -6.119980791767901275443e-15 */
> +        .quad 0x3CDF7CF757491B60   /* P7  = +1.747943407988343076526e-15 */
> +        .quad 0xBCBF7D0D833640FB   /* P8  = -4.369905470133249448357e-16 */
> +        .quad 0x3C9CB512F6BDC754   /* P9  = +9.959852600692493655511e-17 */
> +        .quad 0xBC76F50AB1B0E9BA   /* P10 = -1.991219205936492089091e-17 */
> +        .quad 0xC02F000000000000   /* B = -15.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6FFE15D5F78543   /* PL0 = +1.387454417328248962819e-17 */
> +        .quad 0x3FEFFFFFFFFFFFE1   /* PH0 = +9.999999999999965583086e-01 */
> +        .quad 0x3CFEE00288B99C26   /* P1  = +6.855635762864742358597e-15 */
> +        .quad 0xBCFEE0027D060EE2   /* P2  = -6.855635607998342735403e-15 */
> +        .quad 0x3CF4954AA23148A2   /* P3  = +4.570381865813341696777e-15 */
> +        .quad 0xBCE4954B5DAD3010   /* P4  = -2.285192173571711474199e-15 */
> +        .quad 0x3CD07883DD8793BD   /* P5  = +9.143109661358222028007e-16 */
> +        .quad 0xBCB5F5F4BB87ADCF   /* P6  = -3.047668447080103869032e-16 */
> +        .quad 0x3C98F1A905097685   /* P7  = +8.654183371862458774513e-17 */
> +        .quad 0xBC78F2D585007222   /* P8  = -2.163943551222030413627e-17 */
> +        .quad 0x3C58A37CC5082B5F   /* P9  = +5.342649626494471588064e-18 */
> +        .quad 0xBC33AE7917F94D17   /* P10 = -1.066938163384541013918e-18 */
> +        .quad 0xC031000000000000   /* B = -17        */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C91BF1D80474F0F   /* PL0 = +6.157069264461989135096e-17 */
> +        .quad 0x3FEFFFFFFFFFFFFE   /* PH0 = +9.999999999999997779554e-01 */
> +        .quad 0x3CB72071400E6275   /* P1  = +3.209478247225075961360e-16 */
> +        .quad 0xBCB72071400A9F37   /* P2  = -3.209478247103497434502e-16 */
> +        .quad 0x3CAED5EC39A77629   /* P3  = +2.139652050028423711308e-16 */
> +        .quad 0xBC9ED5EC3B530600   /* P4  = -1.069826028468029104719e-16 */
> +        .quad 0x3C88AB2BFED159DE   /* P5  = +4.279326904335078988705e-17 */
> +        .quad 0xBC70721D1220B3FC   /* P6  = -1.426441958074916244382e-17 */
> +        .quad 0x3C52C96049721FB8   /* P7  = +4.073700029965821523731e-18 */
> +        .quad 0xBC32C971215735DC   /* P8  = -1.018438939975201710113e-18 */
> +        .quad 0x3C112EF658AB41A9   /* P9  = +2.328791246104218830028e-19 */
> +        .quad 0xBBEB7B598C6AD3DE   /* P10 = -4.655603964908654142787e-20 */
> +        .quad 0xC03287E0C98F84E5   /* B = -18.530774 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* PL0 = +0.000000000000000000000e-01 */
> +        .quad 0x3FF0000000000000   /* PH0 = +1.000000000000000000000e+00 */
> +        .quad 0x0000000000000000   /* P1  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P2  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P3  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P4  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P5  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P6  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P7  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P8  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P9  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P10 = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* B = +0        */
> +        .quad 0x0000000000000000   /* A = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .align 16
> +        .quad 0x8000000000000000, 0x8000000000000000   /* _dbSignMask       */
> +        .align 16
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff   /* _dbAbsMask        */
> +        .align 16
> +        .long 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000           /* _iExpMantMask     */
> +        .align 16
> +        .long 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000           /* _iExpMask         */
> +        .align 16
> +        .long 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000           /* _iMinIdxOfsMask   */
> +        .align 16
> +        .long 0x00760000, 0x00760000, 0x00760000, 0x00760000           /* _iMaxIdxMask      */
> +        .align 16
> +        .type  __svml_dtanh_data_internal,@object
> +        .size  __svml_dtanh_data_internal,.-__svml_dtanh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core-sse.S
> new file mode 100644
> index 0000000000..80e85c47ec
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized tanh, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN4v_tanh _ZGVdN4v_tanh_sse_wrapper
> +#include "../svml_d_tanh4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core.c
> new file mode 100644
> index 0000000000..a26e62052b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized tanh, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN4v_tanh
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN4v_tanh, __GI__ZGVdN4v_tanh, __redirect__ZGVdN4v_tanh)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core_avx2.S
> new file mode 100644
> index 0000000000..53dda241e4
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh4_core_avx2.S
> @@ -0,0 +1,1279 @@
> +/* Function tanh vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   NOTE: Since the hyperbolic tangent function is odd
> + *         (tanh(x) = -tanh(-x)), the algorithm below deals with the absolute
> + *         value of the argument |x|: tanh(x) = sign(x) * tanh(|x|)
> + *
> + *   We use a table lookup method to compute tanh(|x|).
> + *   The basic idea is to split the input range into a number of subintervals
> + *   and to approximate tanh(.) with a polynomial on each of them.
> + *
> + *   IEEE SPECIAL CONDITIONS:
> + *   x = [+,-]0, r = [+,-]0
> + *   x = +Inf,   r = +1
> + *   x = -Inf,   r = -1
> + *   x = QNaN,   r = QNaN
> + *   x = SNaN,   r = QNaN
> + *
> + *
> + *   ALGORITHM DETAILS
> + *   We handle special values in a callout function, aside from main path
> + *   computations.  "Special" values for this algorithm are:
> + *   INF, NaN, |x| > HUGE_THRESHOLD
> + *
> + *
> + *   Main path computations are organized as follows:
> + *   we split the interval [0, SATURATION_THRESHOLD)
> + *   into a number of subintervals.  On each subinterval we approximate tanh(.)
> + *   with a minimax polynomial of pre-defined degree.  The polynomial
> + *   coefficients are computed beforehand and stored in a table.  We also use
> + *
> + *       y := |x| + B,
> + *
> + *   where B depends on the subinterval and is used to bring the argument
> + *   closer to zero.
> + *   We also add a large artificial interval [SATURATION_THRESHOLD,
> + *   HUGE_THRESHOLD], whose stored coefficients are 1.0 + 0.0*y + 0.0*y^2 ...;
> + *   this preserves the main path computation logic while returning 1.0 for
> + *   all arguments in that interval.
> + *
> + *   Reconstruction then looks as follows:
> + *   we extract the polynomial and range-reduction coefficients
> + *        (Pj and B) corresponding to the subinterval to which |x| belongs,
> + *        and return
> + *
> + *       r := sign(x) * (P0 + P1 * y + ... + Pn * y^n)
> + *
> + *   NOTE: we use a multiprecision technique to multiply and sum the first
> + *         K terms of the polynomial, so each Pj, j = 0..K, is stored in the
> + *         table as a pair of target-precision numbers (Pj and PLj) to
> + *         achieve wider-than-target precision.
> + *
> + *
> + */
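
[For reference, the per-subinterval reconstruction described in the comment
above amounts to the following scalar C sketch.  The table_entry struct and
eval_tanh_entry helper are hypothetical and only mirror the 16-quad record
layout used by the data table (PL0, PH0, P1..P10, B, A, two alignment quads);
the vector code below performs the same steps with gathers and FMAs.]

    #include <math.h>

    /* Hypothetical mirror of one 128-byte record of the data table.  */
    struct table_entry
    {
      double pl0, ph0;   /* Low/high parts of P0.  */
      double p[10];      /* P1 .. P10.  */
      double b;          /* Subinterval shift: y = |x| + B.  */
      double a;          /* Scale, +1.0 for every real record.  */
      double pad[2];     /* Alignment padding.  */
    };

    static double
    eval_tanh_entry (const struct table_entry *e, double x)
    {
      double y = fabs (x) + e->b;      /* Bring the argument closer to zero.  */
      double r = e->p[9];              /* Start Horner evaluation at P10.  */
      for (int i = 8; i >= 0; i--)     /* Accumulate P9 .. P1.  */
        r = r * y + e->p[i];
      r = r * y + e->ph0;              /* Add P0 (PL0 carries its low part).  */
      return copysign (r, x);          /* tanh(x) = sign(x) * tanh(|x|).  */
    }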
> +
> +/* Offsets for data table __svml_dtanh_data_internal
> + */
> +#define _dbP                           0
> +#define _dbSignMask                    7680
> +#define _dbAbsMask                     7712
> +#define _iExpMantMask                  7744
> +#define _iExpMask                      7776
> +#define _iMinIdxOfsMask                7808
> +#define _iMaxIdxMask                   7840
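
[These byte offsets follow from the record layout of the table: _dbP holds 60
records of 16 .quad fields (128 bytes) each, so the mask fields begin at
60 * 128 = 7680.  A hypothetical compile-time check of that relationship,
not part of the patch:]

    /* Assumed record count and size, taken from the data table defined
       later in this file.  */
    #define TANH_TBL_RECORDS     60
    #define TANH_TBL_RECORD_SIZE (16 * 8)
    _Static_assert (TANH_TBL_RECORDS * TANH_TBL_RECORD_SIZE == 7680,
                    "_dbSignMask offset matches the size of _dbP");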
> +
> +#include <sysdep.h>
> +
> +        .text
> +       .section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN4v_tanh_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        subq      $96, %rsp
> +        lea       _dbP+96+__svml_dtanh_data_internal(%rip), %r8
> +        vmovupd   %ymm0, (%rsp)
> +
> +/* if VMIN, VMAX is defined for I type */
> +        vpxor     %xmm11, %xmm11, %xmm11
> +
> +/*  Constant loading  */
> +        vmovups   _iMaxIdxMask+__svml_dtanh_data_internal(%rip), %xmm8
> +        vandpd    _dbAbsMask+__svml_dtanh_data_internal(%rip), %ymm0, %ymm1
> +        vandpd    _dbSignMask+__svml_dtanh_data_internal(%rip), %ymm0, %ymm2
> +        vextractf128 $1, %ymm0, %xmm15
> +        vshufps   $221, %xmm15, %xmm0, %xmm14
> +
> +/* Here huge arguments, INF and NaNs are filtered out to callout. */
> +        vpand     _iExpMantMask+__svml_dtanh_data_internal(%rip), %xmm14, %xmm12
> +        vpsubd    _iMinIdxOfsMask+__svml_dtanh_data_internal(%rip), %xmm12, %xmm9
> +        vpcmpgtd  %xmm11, %xmm9, %xmm10
> +        vpcmpgtd  %xmm8, %xmm9, %xmm0
> +        vpand     %xmm10, %xmm9, %xmm7
> +        blendvps  %xmm0, %xmm8, %xmm7
> +
> +/*
> + * VSHRIMM( I, iIndex, = iIndex, (17 - 4) );
> + * VGATHER_MATRIX( L2D, p, TAB._dbP, iIndex, 0, T_ITEM_SIZE, T_ITEM_GRAN, 13, 0, 0 );
> + */
> +        vpsrld    $10, %xmm7, %xmm6
> +        vmovd     %xmm6, %edx
> +        vpcmpgtd  _iExpMask+__svml_dtanh_data_internal(%rip), %xmm12, %xmm13
> +        vmovmskps %xmm13, %eax
> +        vpextrd   $1, %xmm6, %ecx
> +        movslq    %edx, %rdx
> +        movslq    %ecx, %rcx
> +        vpextrd   $2, %xmm6, %esi
> +        vpextrd   $3, %xmm6, %edi
> +        movslq    %esi, %rsi
> +        movslq    %edi, %rdi
> +        vmovupd   -96(%rdx,%r8), %xmm3
> +        vmovupd   -96(%rcx,%r8), %xmm4
> +        vmovupd   -80(%rcx,%r8), %xmm13
> +        vmovupd   -64(%rcx,%r8), %xmm9
> +        vmovupd   -80(%rdx,%r8), %xmm14
> +        vmovupd   -64(%rdx,%r8), %xmm10
> +        vmovupd   -48(%rdx,%r8), %xmm6
> +        vinsertf128 $1, -96(%rsi,%r8), %ymm3, %ymm0
> +        vinsertf128 $1, -96(%rdi,%r8), %ymm4, %ymm15
> +        vmovupd   -48(%rcx,%r8), %xmm3
> +        vunpckhpd %ymm15, %ymm0, %ymm0
> +        vinsertf128 $1, -80(%rsi,%r8), %ymm14, %ymm12
> +        vinsertf128 $1, -64(%rsi,%r8), %ymm10, %ymm8
> +        vinsertf128 $1, -80(%rdi,%r8), %ymm13, %ymm11
> +        vinsertf128 $1, -64(%rdi,%r8), %ymm9, %ymm7
> +        vunpcklpd %ymm11, %ymm12, %ymm15
> +        vunpckhpd %ymm11, %ymm12, %ymm14
> +        vunpcklpd %ymm7, %ymm8, %ymm13
> +        vunpckhpd %ymm7, %ymm8, %ymm12
> +        vmovupd   -32(%rdx,%r8), %xmm9
> +        vmovupd   -32(%rcx,%r8), %xmm8
> +        vinsertf128 $1, -48(%rsi,%r8), %ymm6, %ymm4
> +        vinsertf128 $1, -48(%rdi,%r8), %ymm3, %ymm5
> +        vunpcklpd %ymm5, %ymm4, %ymm11
> +        vunpckhpd %ymm5, %ymm4, %ymm10
> +        vmovupd   -16(%rdx,%r8), %xmm3
> +        vmovupd   -16(%rcx,%r8), %xmm4
> +        vinsertf128 $1, -32(%rsi,%r8), %ymm9, %ymm7
> +        vinsertf128 $1, -32(%rdi,%r8), %ymm8, %ymm6
> +        vunpcklpd %ymm6, %ymm7, %ymm9
> +        vunpckhpd %ymm6, %ymm7, %ymm8
> +        vinsertf128 $1, -16(%rsi,%r8), %ymm3, %ymm5
> +        vinsertf128 $1, -16(%rdi,%r8), %ymm4, %ymm6
> +        vunpcklpd %ymm6, %ymm5, %ymm7
> +        vunpckhpd %ymm6, %ymm5, %ymm6
> +        vmovupd   (%rdx,%r8), %xmm3
> +        vmovupd   (%rcx,%r8), %xmm5
> +        vinsertf128 $1, (%rsi,%r8), %ymm3, %ymm4
> +        vinsertf128 $1, (%rdi,%r8), %ymm5, %ymm5
> +        vunpcklpd %ymm5, %ymm4, %ymm3
> +        vaddpd    %ymm3, %ymm1, %ymm1
> +        vfmadd213pd %ymm7, %ymm1, %ymm6
> +        vfmadd213pd %ymm8, %ymm1, %ymm6
> +        vfmadd213pd %ymm9, %ymm1, %ymm6
> +        vfmadd213pd %ymm10, %ymm1, %ymm6
> +        vfmadd213pd %ymm11, %ymm1, %ymm6
> +        vfmadd213pd %ymm12, %ymm1, %ymm6
> +        vfmadd213pd %ymm13, %ymm1, %ymm6
> +        vfmadd213pd %ymm14, %ymm1, %ymm6
> +        vfmadd213pd %ymm15, %ymm1, %ymm6
> +        vfmadd213pd %ymm0, %ymm1, %ymm6
> +        vorpd     %ymm2, %ymm6, %ymm0
> +        testl     %eax, %eax
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovupd   (%rsp), %ymm1
> +        vmovupd   %ymm0, 64(%rsp)
> +        vmovupd   %ymm1, 32(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 eax ymm0
> +
> +        xorl      %edx, %edx
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovupd   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -80; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xb0, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -88; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa8, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -96; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xa0, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     32(%rsp,%r14,8), %xmm0
> +        call      tanh@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 64(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN4v_tanh_avx2)
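
[The special-values path above is, in effect, the following per-lane fallback,
shown as a hypothetical C sketch (names are illustrative): for every lane whose
bit is set in the range mask produced by vmovmskps, the result is recomputed
with the scalar tanh, exactly as the RANGEMASK_CHECK/SCALAR_MATH_CALL loop
does with the lane index in r12d and the mask in r13d.]

    #include <math.h>

    /* Hypothetical sketch of the special-inputs fallback for 4 double lanes.  */
    static void
    tanh4_special_cases (const double x[4], double r[4], unsigned int mask)
    {
      for (int lane = 0; lane < 4; lane++)
        if (mask & (1u << lane))       /* Bit set: INF, NaN or huge |x|.  */
          r[lane] = tanh (x[lane]);    /* Recompute this lane with scalar tanh.  */
    }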
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_dtanh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(32)) VUINT32 _dbP[60*16][2];
> +        __declspec(align(32)) VUINT32 _dbSignMask[4][2];
> +        __declspec(align(32)) VUINT32 _dbAbsMask[4][2];
> +        __declspec(align(32)) VUINT32 _iExpMantMask[8][1];
> +        __declspec(align(32)) VUINT32 _iExpMask[8][1];
> +        __declspec(align(32)) VUINT32 _iMinIdxOfsMask[8][1];
> +        __declspec(align(32)) VUINT32 _iMaxIdxMask[8][1];
> +} __svml_dtanh_data_internal;
> +#endif
> +__svml_dtanh_data_internal:
> +        /* Polynomial coefficients */
> +        .quad 0x0000000000000000   /* PL0 = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* PH0 = +0.000000000000000000000e-01 */
> +        .quad 0x3FF0000000000000   /* P1  = +1.000000000000000014103e+00 */
> +        .quad 0xBD197DEAD79668D3   /* P2  = -2.264132406596103056796e-14 */
> +        .quad 0xBFD555555553AF3C   /* P3  = -3.333333333273349741024e-01 */
> +        .quad 0xBE052F7CCA134846   /* P4  = -6.165791385711493738399e-10 */
> +        .quad 0x3FC11111563849D6   /* P5  = +1.333333655353061107201e-01 */
> +        .quad 0xBEB038623673FFB2   /* P6  = -9.668021563879858950855e-07 */
> +        .quad 0xBFAB9F685E64022E   /* P7  = -5.395055916051593179252e-02 */
> +        .quad 0xBF2A54E2B28F2207   /* P8  = -2.008940439550829012647e-04 */
> +        .quad 0x3F97CFB9328A230E   /* P9  = +2.325333949059698582189e-02 */
> +        .quad 0xBF75CA6D61723E02   /* P10 = -5.320002811586290441790e-03 */
> +        .quad 0x0000000000000000   /* B = +0        */
> +        .quad 0x3FF0000000000000   /* A = +1.0      */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C3708A564FAD29A   /* PL0 = +1.248663375337163807466e-18 */
> +        .quad 0x3FC0E6973998DA48   /* PH0 = +1.320370703922029154143e-01 */
> +        .quad 0x3FEF712EB25C0888   /* P1  = +9.825662120422444519229e-01 */
> +        .quad 0xBFC09B296F7C1EA9   /* P2  = -1.297351641044220078331e-01 */
> +        .quad 0xBFD3DD77541EDDA7   /* P3  = -3.103922196855485849143e-01 */
> +        .quad 0x3FB58FFCF4309615   /* P4  = +8.422833406128689275566e-02 */
> +        .quad 0x3FBD3ABE845DCF49   /* P5  = +1.141776154670967208833e-01 */
> +        .quad 0xBFA791DF538C37FA   /* P6  = -4.603479285115947936529e-02 */
> +        .quad 0xBFA4F872F69CD6E8   /* P7  = -4.095801601799370195284e-02 */
> +        .quad 0x3F9772E49EF6412B   /* P8  = +2.289921970583567527179e-02 */
> +        .quad 0x3F8CBC0807393909   /* P9  = +1.403051635784581776625e-02 */
> +        .quad 0xBF85F06A30F93319   /* P10 = -1.071246110873285040939e-02 */
> +        .quad 0xBFC1000000000000   /* B = -.132813 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6004EE5739DEAC   /* PL0 = +6.947247374112211856530e-18 */
> +        .quad 0x3FC2DC968E6E0D62   /* PH0 = +1.473568149050193398786e-01 */
> +        .quad 0x3FEF4E1E606D96DF   /* P1  = +9.782859691010478680677e-01 */
> +        .quad 0xBFC273BD70994AB9   /* P2  = -1.441571044730005866646e-01 */
> +        .quad 0xBFD382B548270D2C   /* P3  = -3.048527912726111386771e-01 */
> +        .quad 0x3FB7CD2D582A6B29   /* P4  = +9.297450449450351894400e-02 */
> +        .quad 0x3FBC1278CCCBF0DB   /* P5  = +1.096568584434324642303e-01 */
> +        .quad 0xBFA9C7F5115B86A1   /* P6  = -5.035367810138536095866e-02 */
> +        .quad 0xBFA371C21BAF618E   /* P7  = -3.797728145554222910481e-02 */
> +        .quad 0x3F9958943F68417E   /* P8  = +2.475196492201935923783e-02 */
> +        .quad 0x3F8930D5CFFD4152   /* P9  = +1.230017701132682667572e-02 */
> +        .quad 0xBF875CF7ADD31B76   /* P10 = -1.140779017658897660092e-02 */
> +        .quad 0xBFC3000000000000   /* B = -.148438 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7EABE24E052A1F   /* PL0 = +2.660321779421749543501e-17 */
> +        .quad 0x3FC4D04783618C71   /* PH0 = +1.626061812886266111366e-01 */
> +        .quad 0x3FEF2765AF97A4B3   /* P1  = +9.735592298067302883212e-01 */
> +        .quad 0xBFC443654205FEA5   /* P2  = -1.583067486171689074207e-01 */
> +        .quad 0xBFD31F2E208A5B97   /* P3  = -2.987780874040536844467e-01 */
> +        .quad 0x3FB9F235BD339878   /* P4  = +1.013520800512156573576e-01 */
> +        .quad 0x3FBAD0B0DFCCA141   /* P5  = +1.047468706498238100104e-01 */
> +        .quad 0xBFABD1B9600E608E   /* P6  = -5.433444306908184548967e-02 */
> +        .quad 0xBFA1CEBEAF07DB58   /* P7  = -3.478046309094534453598e-02 */
> +        .quad 0x3F9AFC9FB1D8EFD2   /* P8  = +2.635430834764902126383e-02 */
> +        .quad 0x3F8573444F1AB502   /* P9  = +1.047376028449287564018e-02 */
> +        .quad 0xBF8874FBC8F24406   /* P10 = -1.194187838544459322219e-02 */
> +        .quad 0xBFC5000000000000   /* B = -.164063 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7FB199D361A790   /* PL0 = +2.748994907060158996213e-17 */
> +        .quad 0x3FC6C170259E21F7   /* PH0 = +1.777782615356639783766e-01 */
> +        .quad 0x3FEEFD17479F7C65   /* P1  = +9.683948897253570478266e-01 */
> +        .quad 0xBFC609530FE4DF8D   /* P2  = -1.721595599753950294577e-01 */
> +        .quad 0xBFD2B3465D71B4DE   /* P3  = -2.921920692959484052676e-01 */
> +        .quad 0x3FBBFD2D34AC509B   /* P4  = +1.093319181057403192166e-01 */
> +        .quad 0x3FB9778C3C16A0FE   /* P5  = +9.948040453912551395183e-02 */
> +        .quad 0xBFADAC4D9E63C665   /* P6  = -5.795519407719210697372e-02 */
> +        .quad 0xBFA0139CCAD02D60   /* P7  = -3.139963126894929339124e-02 */
> +        .quad 0x3F9C5BF43BA6F19D   /* P8  = +2.769452680671379432854e-02 */
> +        .quad 0x3F8190B703350341   /* P9  = +8.576803002712575184772e-03 */
> +        .quad 0xBF8936606782858A   /* P10 = -1.231074634444230850234e-02 */
> +        .quad 0xBFC7000000000000   /* B = -.179688 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6A917CA3624D50   /* PL0 = +1.152216693509785660691e-17 */
> +        .quad 0x3FC8AFD7B974FABB   /* PH0 = +1.928662925292508878439e-01 */
> +        .quad 0x3FEECF47624A5D03   /* P1  = +9.628025932060214187231e-01 */
> +        .quad 0xBFC7C4C2CB4FDE4D   /* P2  = -1.856921665891938814679e-01 */
> +        .quad 0xBFD23F69CB2C1F9D   /* P3  = -2.851204380135586155453e-01 */
> +        .quad 0x3FBDEC5703A03814   /* P4  = +1.168875106670557712458e-01 */
> +        .quad 0x3FB8095003D0CF15   /* P5  = +9.389209836154706616487e-02 */
> +        .quad 0xBFAF554B47B10CBB   /* P6  = -6.119761705533607365968e-02 */
> +        .quad 0xBF9C89743FE7BC1B   /* P7  = -2.786809577986213853937e-02 */
> +        .quad 0x3F9D74725B746E7C   /* P8  = +2.876452143855921824991e-02 */
> +        .quad 0x3F7B2D8AFB70B88C   /* P9  = +6.635229968237631511880e-03 */
> +        .quad 0xBF89A0A2883EF6CB   /* P10 = -1.251341799058582545252e-02 */
> +        .quad 0xBFC9000000000000   /* B = -.195313 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7608279E8609CB   /* PL0 = +1.910958764623660748269e-17 */
> +        .quad 0x3FCA9B46D2DDC5E3   /* PH0 = +2.078636674519166172015e-01 */
> +        .quad 0x3FEE9E0BB72A01A1   /* P1  = +9.567926957534390123919e-01 */
> +        .quad 0xBFC974FAD10C5330   /* P2  = -1.988824387305156976885e-01 */
> +        .quad 0xBFD1C40ACCBA4044   /* P3  = -2.775904654781735703430e-01 */
> +        .quad 0x3FBFBE24E2987853   /* P4  = +1.239951184474830487522e-01 */
> +        .quad 0x3FB6885B4345E47F   /* P5  = +8.801813499839460539687e-02 */
> +        .quad 0xBFB06563D5670584   /* P6  = -6.404708824176991770896e-02 */
> +        .quad 0xBF98CD1D620DF6E2   /* P7  = -2.421995078065365147772e-02 */
> +        .quad 0x3F9E44EF3E844D21   /* P8  = +2.955983943054463683119e-02 */
> +        .quad 0x3F7325FA0148CAAE   /* P9  = +4.674889165971292322643e-03 */
> +        .quad 0xBF89B4C8556C2D92   /* P10 = -1.255184660614964011319e-02 */
> +        .quad 0xBFCB000000000000   /* B = -.210938 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6F19DAA20F51D5   /* PL0 = +1.348790537832000351176e-17 */
> +        .quad 0x3FCC83876CA98E15   /* PH0 = +2.227639465883021474557e-01 */
> +        .quad 0x3FEE697B662D07CD   /* P1  = +9.503762241004040620296e-01 */
> +        .quad 0xBFCB194C7ED76ACF   /* P2  = -2.117095584242946953999e-01 */
> +        .quad 0xBFD141A19E419762   /* P3  = -2.696308179350720680191e-01 */
> +        .quad 0x3FC0B89C64BC7B98   /* P4  = +1.306338779331468503007e-01 */
> +        .quad 0x3FB4F721150BBFC5   /* P5  = +8.189589275184434216748e-02 */
> +        .quad 0xBFB105AAFAB87898   /* P6  = -6.649273511036069461061e-02 */
> +        .quad 0xBF94FB3B31248C01   /* P7  = -2.048962104266749732921e-02 */
> +        .quad 0x3F9ECD31E588709C   /* P8  = +3.007963145692880855964e-02 */
> +        .quad 0x3F664A91A335C105   /* P9  = +2.721104095762541127495e-03 */
> +        .quad 0xBF89754E32E1E26E   /* P10 = -1.243077366619723806134e-02 */
> +        .quad 0xBFCD000000000000   /* B = -.226563 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6AC6C889D8111D   /* PL0 = +1.161245469312620769170e-17 */
> +        .quad 0x3FCE6864FE55A3D0   /* PH0 = +2.375608674877001114112e-01 */
> +        .quad 0x3FEE31AEE116B82B   /* P1  = +9.435648342384913826391e-01 */
> +        .quad 0xBFCCB114B69E808B   /* P2  = -2.241540805525839833707e-01 */
> +        .quad 0xBFD0B8AB913BA99D   /* P3  = -2.612713735858507980441e-01 */
> +        .quad 0x3FC1823322BED48A   /* P4  = +1.367858810096190233514e-01 */
> +        .quad 0x3FB35822B7929893   /* P5  = +7.556359273675842651653e-02 */
> +        .quad 0xBFB18B03CC78D2DA   /* P6  = -6.852744810096158580830e-02 */
> +        .quad 0xBF911CCC3C8D5E5D   /* P7  = -1.671141738492420009734e-02 */
> +        .quad 0x3F9F0DEC2D99B12F   /* P8  = +3.032654789278515819797e-02 */
> +        .quad 0x3F4A28398B4EBD98   /* P9  = +7.982521989244205404918e-04 */
> +        .quad 0xBF88E60CB2FAB9A4   /* P10 = -1.215753480150000985458e-02 */
> +        .quad 0xBFCF000000000000   /* B = -.242188 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C89D2B6774FB61D   /* PL0 = +4.479593208720169247958e-17 */
> +        .quad 0x3FD09C744F539BE4   /* PH0 = +2.595492148088267558848e-01 */
> +        .quad 0x3FEDD823B0400D42   /* P1  = +9.326342050921214825882e-01 */
> +        .quad 0xBFCEFBF7FF305FCC   /* P2  = -2.420644756355144687086e-01 */
> +        .quad 0xBFCFC01DC4F24A41   /* P3  = -2.480504237797323303990e-01 */
> +        .quad 0x3FC291A2C26D5548   /* P4  = +1.450694512701977626753e-01 */
> +        .quad 0x3FB0D562E672D188   /* P5  = +6.575601698097532991976e-02 */
> +        .quad 0xBFB2201ECC119E06   /* P6  = -7.080261690281738261872e-02 */
> +        .quad 0xBF8695D50F778D31   /* P7  = -1.102796987010509974642e-02 */
> +        .quad 0x3F9EEC8CFBC031A0   /* P8  = +3.019924437107734972427e-02 */
> +        .quad 0xBF6030F0A4D3660A   /* P9  = -1.976461417694923328722e-03 */
> +        .quad 0xBF87845288A4AEF5   /* P10 = -1.148285369398347838494e-02 */
> +        .quad 0xBFD1000000000000   /* B = -.265625 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8B6AAB614D1C8D   /* PL0 = +4.756035418366735312727e-17 */
> +        .quad 0x3FD275F7E1CF7F63   /* PH0 = +2.884502129727392616410e-01 */
> +        .quad 0x3FED56658F74C9CC   /* P1  = +9.167964746359813351341e-01 */
> +        .quad 0xBFD0ECC045EBD596   /* P2  = -2.644501383614054083635e-01 */
> +        .quad 0xBFCD5A4BDE179180   /* P3  = -2.293181261476426808811e-01 */
> +        .quad 0x3FC3C00047D34767   /* P4  = +1.542969084462655120552e-01 */
> +        .quad 0x3FAAC7CE84FD609F   /* P5  = +5.230565427217581251974e-02 */
> +        .quad 0xBFB288948D2E8B43   /* P6  = -7.239654967137902384931e-02 */
> +        .quad 0xBF6D6605AAD5A1C0   /* P7  = -3.588687008847041164896e-03 */
> +        .quad 0x3F9DDB0790848E97   /* P8  = +2.915584392134337382866e-02 */
> +        .quad 0xBF75FDE291BAD5B4   /* P9  = -5.369076763306269573660e-03 */
> +        .quad 0xBF84CEA5C52E0A78   /* P10 = -1.015977390284671071888e-02 */
> +        .quad 0xBFD3000000000000   /* B = -.296875 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7139A81C8A6ECF   /* PL0 = +1.494049799478574591322e-17 */
> +        .quad 0x3FD4470650036407   /* PH0 = +3.168350011233659890841e-01 */
> +        .quad 0x3FECC9A69DFDDD48   /* P1  = +8.996155820631566629678e-01 */
> +        .quad 0xBFD23DED3A37A09F   /* P2  = -2.850297039535778028925e-01 */
> +        .quad 0xBFCAD302395D51C1   /* P3  = -2.095644741153943890185e-01 */
> +        .quad 0x3FC4A8FE3F309C22   /* P4  = +1.614072617096278705115e-01 */
> +        .quad 0x3FA3D161188AA436   /* P5  = +3.870681213931741151586e-02 */
> +        .quad 0xBFB288CFE5494E98   /* P6  = -7.240008685885823969403e-02 */
> +        .quad 0x3F6C7903EED8D334   /* P7  = +3.475673371918475361081e-03 */
> +        .quad 0x3F9BE023CDFB02F6   /* P8  = +2.722221321778569498033e-02 */
> +        .quad 0xBF80F8296F2C3A95   /* P9  = -8.285831170295390358336e-03 */
> +        .quad 0xBF8152DF4790049B   /* P10 = -8.458847400108650973189e-03 */
> +        .quad 0xBFD5000000000000   /* B = -.328125 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7751FE0FEE8335   /* PL0 = +2.022712113430213599928e-17 */
> +        .quad 0x3FD60EF7120502A9   /* PH0 = +3.446633983585721261456e-01 */
> +        .quad 0x3FEC32D951E56E6F   /* P1  = +8.812071418319202070776e-01 */
> +        .quad 0xBFD370255FC004F8   /* P2  = -3.037198481616338996824e-01 */
> +        .quad 0xBFC832F0EBC6BB41   /* P3  = -1.890545989276351359107e-01 */
> +        .quad 0x3FC54C99A0FF432F   /* P4  = +1.664001499289269127540e-01 */
> +        .quad 0x3F99DAC0CC283C18   /* P5  = +2.524853941036661688369e-02 */
> +        .quad 0xBFB227B3896A026D   /* P6  = -7.091829399906553280461e-02 */
> +        .quad 0x3F84663364E1FB19   /* P7  = +9.960557476231411602383e-03 */
> +        .quad 0x3F9922D70DE07C57   /* P8  = +2.454696676442965935283e-02 */
> +        .quad 0xBF85C4A4EB6F86BC   /* P9  = -1.062897532932837635222e-02 */
> +        .quad 0xBF7AAB61214FFE17   /* P10 = -6.511096396024671890972e-03 */
> +        .quad 0xBFD7000000000000   /* B = -.359375 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3BFE67F266843B2C   /* PL0 = +1.030196791298162288777e-19 */
> +        .quad 0x3FD7CD3115FC0F16   /* PH0 = +3.718989100163850869407e-01 */
> +        .quad 0x3FEB92F96CCC2C5B   /* P1  = +8.616912007286247079761e-01 */
> +        .quad 0xBFD4827320135092   /* P2  = -3.204620183216856200247e-01 */
> +        .quad 0xBFC582B15550168A   /* P3  = -1.680509249273891977521e-01 */
> +        .quad 0x3FC5AC3B9A2E4C31   /* P4  = +1.693186285816366254244e-01 */
> +        .quad 0x3F88FA599FCADAFB   /* P5  = +1.219625491044728129762e-02 */
> +        .quad 0xBFB16EC8F5CA169E   /* P6  = -6.809669495313605642174e-02 */
> +        .quad 0x3F90140EFC748BBE   /* P7  = +1.570151725639922719844e-02 */
> +        .quad 0x3F95CFC49C1A28DC   /* P8  = +2.130038454792147768770e-02 */
> +        .quad 0xBF8946ED8B1BF454   /* P9  = -1.234231549050882816697e-02 */
> +        .quad 0xBF7239E55C1DD50F   /* P10 = -4.449745117985472755606e-03 */
> +        .quad 0xBFD9000000000000   /* B = -.390625 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6412330191189C   /* PL0 = +8.704448096175471149661e-18 */
> +        .quad 0x3FD9812B3B03F0A5   /* PH0 = +3.985088421175169703936e-01 */
> +        .quad 0x3FEAEB08C3C0E84D   /* P1  = +8.411907027541559254748e-01 */
> +        .quad 0xBFD57446B1BC46CF   /* P2  = -3.352219329545790787820e-01 */
> +        .quad 0xBFC2CA9ABC0444AD   /* P3  = -1.468079965639267634401e-01 */
> +        .quad 0x3FC5CA95F9460D18   /* P4  = +1.702449290424759093710e-01 */
> +        .quad 0xBF2C2DAA35DD05C3   /* P5  = -2.149839664813813012186e-04 */
> +        .quad 0xBFB069A516EEB75D   /* P6  = -6.411201295733578195472e-02 */
> +        .quad 0x3F9512716416FDC7   /* P7  = +2.057816670798986720058e-02 */
> +        .quad 0x3F921630CB1319A3   /* P8  = +1.766277541607908852593e-02 */
> +        .quad 0xBF8B76DA2EC99526   /* P9  = -1.341028647693549562145e-02 */
> +        .quad 0xBF63A97474A161E4   /* P10 = -2.400138332671485493040e-03 */
> +        .quad 0xBFDB000000000000   /* B = -.421875 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C89B79F5783381C   /* PL0 = +4.461236087774530799537e-17 */
> +        .quad 0x3FDB2A6C993B829D   /* PH0 = +4.244643684778937609003e-01 */
> +        .quad 0x3FEA3C0C1FBA328C   /* P1  = +8.198299998926627915155e-01 */
> +        .quad 0xBFD6457212F78DE0   /* P2  = -3.479886231636708581604e-01 */
> +        .quad 0xBFC0129BDA380A66   /* P3  = -1.255678954622282824818e-01 */
> +        .quad 0x3FC5AB77F388FBDE   /* P4  = +1.692953051696965507089e-01 */
> +        .quad 0xBF8822F3A6CADB7C   /* P5  = -1.178541519889874597783e-02 */
> +        .quad 0xBFAE4A876370A4BD   /* P6  = -5.916236008517603590739e-02 */
> +        .quad 0x3F991A89BC3B7710   /* P7  = +2.451529704455085335710e-02 */
> +        .quad 0x3F8C4A4328204D4B   /* P8  = +1.381351915555364098800e-02 */
> +        .quad 0xBF8C5F921D01EC0B   /* P9  = -1.385416174911393178490e-02 */
> +        .quad 0xBF3EE844C5B79FB8   /* P10 = -4.716079617694784908234e-04 */
> +        .quad 0xBFDD000000000000   /* B = -.453125 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C73FA437AD7AD87   /* PL0 = +1.732779905745858845932e-17 */
> +        .quad 0x3FDCC88C9902CF45   /* PH0 = +4.497405523536495697279e-01 */
> +        .quad 0x3FE9870845162D1D   /* P1  = +7.977334355686341748810e-01 */
> +        .quad 0xBFD6F62358F73DA8   /* P2  = -3.587730759436120677668e-01 */
> +        .quad 0xBFBAC4345D675FE1   /* P3  = -1.045563438450467661101e-01 */
> +        .quad 0x3FC5539DA8287019   /* P4  = +1.666142531474868131862e-01 */
> +        .quad 0xBF96E3E0DC04A09F   /* P5  = -2.235366194614185212822e-02 */
> +        .quad 0xBFAB5EC7147C207D   /* P6  = -5.345747113284546871398e-02 */
> +        .quad 0x3F9C24166FFA7A58   /* P7  = +2.748141344511120915667e-02 */
> +        .quad 0x3F8451B907819844   /* P8  = +9.921498815128277696693e-03 */
> +        .quad 0xBF8C1C6D19191FCB   /* P9  = -1.372609360545586670239e-02 */
> +        .quad 0x3F547372DF72E35A   /* P10 = +1.248228245272117756098e-03 */
> +        .quad 0xBFDF000000000000   /* B = -.484375 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C848FE06EE49950   /* PL0 = +3.566941590788961528958e-17 */
> +        .quad 0x3FDF20211A36475D   /* PH0 = +4.863360172249622803697e-01 */
> +        .quad 0x3FE86E67E6B80AC2   /* P1  = +7.634772783497611574659e-01 */
> +        .quad 0xBFD7C37C55474D9B   /* P2  = -3.713064987943767913461e-01 */
> +        .quad 0xBFB2EBF15F3CB036   /* P3  = -7.391270232318521952684e-02 */
> +        .quad 0x3FC4718C8EF6E3AA   /* P4  = +1.597152422016539530950e-01 */
> +        .quad 0xBFA277F8394E9B07   /* P5  = -3.607154559658991932071e-02 */
> +        .quad 0xBFA680312AB207E3   /* P6  = -4.394677778419955009224e-02 */
> +        .quad 0x3F9EDC9A8B57E286   /* P7  = +3.013841128810892143223e-02 */
> +        .quad 0x3F71B8C5E648EAF6   /* P8  = +4.326603932492947851719e-03 */
> +        .quad 0xBF89DB218356730C   /* P9  = -1.262499029217558458029e-02 */
> +        .quad 0x3F6B05728E6EBC8E   /* P10 = +3.298496001171330815865e-03 */
> +        .quad 0xBFE1000000000000   /* B = -.53125  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8429831EDD94DE   /* PL0 = +3.497576705878673192147e-17 */
> +        .quad 0x3FE10AF47E0BF610   /* PH0 = +5.325872861719194162333e-01 */
> +        .quad 0x3FE6EC5879F87EEE   /* P1  = +7.163507826080299761242e-01 */
> +        .quad 0xBFD86AD001BFE200   /* P2  = -3.815193192563413204129e-01 */
> +        .quad 0xBFA239045B661385   /* P3  = -3.559125533778398983564e-02 */
> +        .quad 0x3FC2B4572D9CC147   /* P4  = +1.461285565105845078038e-01 */
> +        .quad 0xBFA99F4F01740705   /* P5  = -5.004355328311586406115e-02 */
> +        .quad 0xBF9F449C484F4879   /* P6  = -3.053516570418721511214e-02 */
> +        .quad 0x3F9F5F42169D7DDE   /* P7  = +3.063681853325116830798e-02 */
> +        .quad 0xBF6111B1BA632A97   /* P8  = -2.083632588527460989469e-03 */
> +        .quad 0xBF84725FBE5B6E61   /* P9  = -9.983776089419639342530e-03 */
> +        .quad 0x3F7438A2986CFA9C   /* P10 = +4.936823976832951342488e-03 */
> +        .quad 0xBFE3000000000000   /* B = -.59375  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6BE9160BFB3505   /* PL0 = +1.210424670976053242391e-17 */
> +        .quad 0x3FE26D76F73233C7   /* PH0 = +5.758623912857893101247e-01 */
> +        .quad 0x3FE56363B5B93937   /* P1  = +6.683825063026124740752e-01 */
> +        .quad 0xBFD8A2244B27297E   /* P2  = -3.848963483730115724200e-01 */
> +        .quad 0xBF52CA2F101EEF63   /* P3  = -1.146837196286797844817e-03 */
> +        .quad 0x3FC081BC342243AD   /* P4  = +1.289592032012739958675e-01 */
> +        .quad 0xBFAE38DB4A932344   /* P5  = -5.902753148399722719732e-02 */
> +        .quad 0xBF91F814D4AE90C6   /* P6  = -1.754791782481459457885e-02 */
> +        .quad 0x3F9D056AE193C4F3   /* P7  = +2.834097863973723355792e-02 */
> +        .quad 0xBF7BD0B502D8F3A0   /* P8  = -6.790835451792626336974e-03 */
> +        .quad 0xBF7B763F7BB8AE2F   /* P9  = -6.704566938008179114124e-03 */
> +        .quad 0x3F76036F42D9AB69   /* P10 = +5.374369252971835729099e-03 */
> +        .quad 0xBFE5000000000000   /* B = -.65625  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8B64AF0450486E   /* PL0 = +4.751979286662385162741e-17 */
> +        .quad 0x3FE3B75F8BCB742D   /* PH0 = +6.161344271055263499548e-01 */
> +        .quad 0x3FE3DA23BC12369F   /* P1  = +6.203783677353447780947e-01 */
> +        .quad 0xBFD8768FF4B46416   /* P2  = -3.822364701932782367281e-01 */
> +        .quad 0x3F9D67CB8AD9CB1A   /* P3  = +2.871625933625941117406e-02 */
> +        .quad 0x3FBC168CB7827DF4   /* P4  = +1.097190807363331305006e-01 */
> +        .quad 0xBFB03A2B83C9272E   /* P5  = -6.338760344911228324430e-02 */
> +        .quad 0xBF789FEB595297DC   /* P6  = -6.011885959344067548074e-03 */
> +        .quad 0x3F98BD01B4C335E7   /* P7  = +2.415850320612902513532e-02 */
> +        .quad 0xBF83BADC303D6535   /* P8  = -9.633751127398152979976e-03 */
> +        .quad 0xBF6C54E7A1C1E3F3   /* P9  = -3.458454519258407989501e-03 */
> +        .quad 0x3F7408394B7EF3E7   /* P10 = +4.890655334688332484537e-03 */
> +        .quad 0xBFE7000000000000   /* B = -.71875  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6A48557F6E0D3E   /* PL0 = +1.139824111505584215867e-17 */
> +        .quad 0x3FE4E8D895B010DC   /* PH0 = +6.534235881413468227663e-01 */
> +        .quad 0x3FE25652FAAF8A73   /* P1  = +5.730376144604875448991e-01 */
> +        .quad 0xBFD7F6C3A57C444B   /* P2  = -3.744362941807295084434e-01 */
> +        .quad 0x3FAB7866E3F99EBE   /* P3  = +5.365296872042567001598e-02 */
> +        .quad 0x3FB6FA1DF47CCD40   /* P4  = +8.975398272450707099784e-02 */
> +        .quad 0xBFB05508D3741B8E   /* P5  = -6.379752314033580026840e-02 */
> +        .quad 0x3F6C3EFDF7BB279C   /* P6  = +3.448005705512137236209e-03 */
> +        .quad 0x3F9372BADD6D3E27   /* P7  = +1.899234749299530050806e-02 */
> +        .quad 0xBF860FD5AE65F3DA   /* P8  = -1.077238977881649471165e-02 */
> +        .quad 0xBF47266FFB07E628   /* P9  = -7.064863949032872448118e-04 */
> +        .quad 0x3F6F9763992C2A05   /* P10 = +3.856367614735181120799e-03 */
> +        .quad 0xBFE9000000000000   /* B = -.78125  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6BB6A2B194E3AB   /* PL0 = +1.201878007209462528697e-17 */
> +        .quad 0x3FE602609AAE7C22   /* PH0 = +6.877902051090851731630e-01 */
> +        .quad 0x3FE0DCBAFE191C7F   /* P1  = +5.269446337560025312137e-01 */
> +        .quad 0xBFD732028428A9FB   /* P2  = -3.624273577321727538225e-01 */
> +        .quad 0x3FB2D92389BE065B   /* P3  = +7.362577545975439796588e-02 */
> +        .quad 0x3FB1F6A9C8C49993   /* P4  = +7.017003203927733370937e-02 */
> +        .quad 0xBFAF47C0B50B56EE   /* P5  = -6.109430513394707378526e-02 */
> +        .quad 0x3F85A8EDD1356223   /* P6  = +1.057611269668352068104e-02 */
> +        .quad 0x3F8BE05C5CD1B4FA   /* P7  = +1.361152799855823798207e-02 */
> +        .quad 0xBF85A0EFE4552F76   /* P8  = -1.056086936537046752272e-02 */
> +        .quad 0x3F559F2A6A356194   /* P9  = +1.319686337259627831943e-03 */
> +        .quad 0x3F6576F5E989208D   /* P10 = +2.620201394425042596201e-03 */
> +        .quad 0xBFEB000000000000   /* B = -.84375  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C80328BD86C8B74   /* PL0 = +2.809809047161267929701e-17 */
> +        .quad 0x3FE704BB1B7FCB81   /* PH0 = +7.193275010198335595035e-01 */
> +        .quad 0x3FDEE264AAD6C40C   /* P1  = +4.825679462765613089739e-01 */
> +        .quad 0xBFD637493CE659F1   /* P2  = -3.471243948673921548357e-01 */
> +        .quad 0x3FB6BE3A3DEE6F4A   /* P3  = +8.884014141079635303208e-02 */
> +        .quad 0x3FAA85EB6470AC0F   /* P4  = +5.180297471118688523488e-02 */
> +        .quad 0xBFACC0146EA4858D   /* P5  = -5.615295267694895314457e-02 */
> +        .quad 0x3F8F8FB683CDDAC5   /* P6  = +1.541082944616557159055e-02 */
> +        .quad 0x3F819515DEE2CB91   /* P7  = +8.585139145315585602547e-03 */
> +        .quad 0xBF834E45E6AF9EA1   /* P8  = -9.426637747267209169415e-03 */
> +        .quad 0x3F65250F197CA56D   /* P9  = +2.581147662472352252568e-03 */
> +        .quad 0x3F57A766026D036C   /* P10 = +1.443719500187702367690e-03 */
> +        .quad 0xBFED000000000000   /* B = -.90625  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C716F7EEF7B61AD   /* PL0 = +1.512291215142578135651e-17 */
> +        .quad 0x3FE7F0E1A4CD846E   /* PH0 = +7.481544703297353660076e-01 */
> +        .quad 0x3FDC2D4CC872DC09   /* P1  = +4.402648885256331012598e-01 */
> +        .quad 0xBFD514A99F92ED53   /* P2  = -3.293861444796750250530e-01 */
> +        .quad 0x3FB9846A6CF2F337   /* P3  = +9.967675361526749494844e-02 */
> +        .quad 0x3FA20896939AB161   /* P4  = +3.522177268800664413493e-02 */
> +        .quad 0xBFA97E801F31EE0D   /* P5  = -4.979324703978358553405e-02 */
> +        .quad 0x3F92A11F47B82085   /* P6  = +1.819275737037219740638e-02 */
> +        .quad 0x3F717D70FE289C34   /* P7  = +4.270020845559097605514e-03 */
> +        .quad 0xBF7FDCF1D3F6CE2D   /* P8  = -7.779068604054678540132e-03 */
> +        .quad 0x3F69F607E81AF6B6   /* P9  = +3.169074480722534625181e-03 */
> +        .quad 0x3F3F925C80D0F889   /* P10 = +4.817462766516585511824e-04 */
> +        .quad 0xBFEF000000000000   /* B = -.96875  */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C931A11D7E8606E   /* PL0 = +6.627280241435322692188e-17 */
> +        .quad 0x3FE92BFB370D9B71   /* PH0 = +7.866188121086975515439e-01 */
> +        .quad 0x3FD866160E454111   /* P1  = +3.812308444367014680480e-01 */
> +        .quad 0xBFD33149F3801DBA   /* P2  = -2.998833539899937679796e-01 */
> +        .quad 0x3FBBDB6D4C949899   /* P3  = +1.088169395412442909023e-01 */
> +        .quad 0x3F8D6AB2A74B9343   /* P4  = +1.436366627735597372494e-02 */
> +        .quad 0xBFA404D1047C5D72   /* P5  = -3.909924678571997970917e-02 */
> +        .quad 0x3F93C47D9ACCD919   /* P6  = +1.930423981976856424661e-02 */
> +        .quad 0xBF41B755642CFF1B   /* P7  = -5.406538915408738478158e-04 */
> +        .quad 0xBF74B5301AA1E788   /* P8  = -5.055606752756853900641e-03 */
> +        .quad 0x3F69A84C5B2A3E68   /* P9  = +3.132008679422249529120e-03 */
> +        .quad 0xBF3CF47830328C11   /* P10 = -4.418176105877589308931e-04 */
> +        .quad 0xBFF1000000000000   /* B = -1.0625   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C884D471B8FD396   /* PL0 = +4.215701792312937090514e-17 */
> +        .quad 0x3FEA8DBCBC31897A   /* PH0 = +8.298019099859594849278e-01 */
> +        .quad 0x3FD3EE730537C8EA   /* P1  = +3.114287901836535219818e-01 */
> +        .quad 0xBFD08A05AD27CE32   /* P2  = -2.584242049190123217982e-01 */
> +        .quad 0x3FBC5255406F84B6   /* P3  = +1.106313021005175045399e-01 */
> +        .quad 0xBF772FA2F633AA5E   /* P4  = -5.660664147607434209241e-03 */
> +        .quad 0xBF99DD8E4C473FC4   /* P5  = -2.525923100057504533247e-02 */
> +        .quad 0x3F9183C935B6495D   /* P6  = +1.710428610165003372069e-02 */
> +        .quad 0xBF70471A3A591480   /* P7  = -3.974058583087303228038e-03 */
> +        .quad 0xBF603DDD4DEBB9A4   /* P8  = -1.982624278176818987264e-03 */
> +        .quad 0x3F62591E44D3C17F   /* P9  = +2.239760512218135956425e-03 */
> +        .quad 0xBF4C195D3A9B1AB4   /* P10 = -8.575158328419569430544e-04 */
> +        .quad 0xBFF3000000000000   /* B = -1.1875   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C90DD1C9BFF7F64   /* PL0 = +5.850777430004479798187e-17 */
> +        .quad 0x3FEBAD50A4A68BC1   /* PH0 = +8.649066177207417327466e-01 */
> +        .quad 0x3FD01FBA72CEE1A5   /* P1  = +2.519365426228666233893e-01 */
> +        .quad 0xBFCBE432F647C4D6   /* P2  = -2.179015829602010702633e-01 */
> +        .quad 0x3FBABF92B6E5AC73   /* P3  = +1.044856735731387955105e-01 */
> +        .quad 0xBF922983AA24E217   /* P4  = -1.773648954369563555378e-02 */
> +        .quad 0xBF8C72214C14E23A   /* P5  = -1.388956082756564056328e-02 */
> +        .quad 0x3F8ACB4D1F388E8B   /* P6  = +1.308307887581540972153e-02 */
> +        .quad 0xBF740EF8B4A2EE3B   /* P7  = -4.897090441029978580995e-03 */
> +        .quad 0xBF0EA9F30C8DC900   /* P8  = -5.848668076326342477133e-05 */
> +        .quad 0x3F53CC40D18713AE   /* P9  = +1.208365725788622757410e-03 */
> +        .quad 0xBF4848B86029CBA1   /* P10 = -7.410908004444779592485e-04 */
> +        .quad 0xBFF5000000000000   /* B = -1.3125   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8FB61781D22681   /* PL0 = +5.501032995458057064843e-17 */
> +        .quad 0x3FEC950A3340C8BF   /* PH0 = +8.931933404003514764824e-01 */
> +        .quad 0x3FC9E1DFFD385423   /* P1  = +2.022056566644617586005e-01 */
> +        .quad 0xBFC71E2FF88EBA23   /* P2  = -1.806087459239772032583e-01 */
> +        .quad 0x3FB80AEBD07AB5BA   /* P3  = +9.391664352252506838449e-02 */
> +        .quad 0xBF98404E27EAE6ED   /* P4  = -2.368280523908243895884e-02 */
> +        .quad 0xBF772DA520B5006E   /* P5  = -5.658764868087568802107e-03 */
> +        .quad 0x3F824C9268AF9423   /* P6  = +8.935111827620250551925e-03 */
> +        .quad 0xBF722AE76D206AE3   /* P7  = -4.435447701349490160113e-03 */
> +        .quad 0x3F4B807F56298D5E   /* P8  = +8.392926941493230644497e-04 */
> +        .quad 0x3F3D71027DF95D2A   /* P9  = +4.492407879061627603159e-04 */
> +        .quad 0xBF3EBD17676755FB   /* P10 = -4.690343988874298905483e-04 */
> +        .quad 0xBFF7000000000000   /* B = -1.4375   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C95393C63CE8224   /* PL0 = +7.363407705201031038415e-17 */
> +        .quad 0x3FED4E6F464286B0   /* PH0 = +9.158245441687622445670e-01 */
> +        .quad 0x3FC4A45842B7DE1E   /* P1  = +1.612654042980787191461e-01 */
> +        .quad 0xBFC2E7885AFDD3D0   /* P2  = -1.476908153814791087327e-01 */
> +        .quad 0x3FB4DD6DD51D3FEB   /* P3  = +8.150373890862254580204e-02 */
> +        .quad 0xBF9A05D3ADAB489C   /* P4  = -2.541285274021075503042e-02 */
> +        .quad 0xBF3459B643B4995C   /* P5  = -3.105230313899165257622e-04 */
> +        .quad 0x3F766B30745F2E3A   /* P6  = +5.473317409222350365811e-03 */
> +        .quad 0xBF6C2C891E555BDF   /* P7  = -3.439204988051155730940e-03 */
> +        .quad 0x3F5194F30D6C576D   /* P8  = +1.073109966176012791522e-03 */
> +        .quad 0x3EF4DBB43C3132A2   /* P9  = +1.989194766975849961365e-05 */
> +        .quad 0xBF2E45EBAB3C15A0   /* P10 = -2.309656316514087783666e-04 */
> +        .quad 0xBFF9000000000000   /* B = -1.5625   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C75111669651DAA   /* PL0 = +1.827249135453834384396e-17 */
> +        .quad 0x3FEDE1EB5937518F   /* PH0 = +9.338280432225917193634e-01 */
> +        .quad 0x3FC06129C7C8EBB1   /* P1  = +1.279651856910653382507e-01 */
> +        .quad 0xBFBE9763041064E1   /* P2  = -1.194974789545031421774e-01 */
> +        .quad 0x3FB1A5B9F9113928   /* P3  = +6.893503504509068635308e-02 */
> +        .quad 0xBF992145039F9AFE   /* P4  = -2.454097590080105816526e-02 */
> +        .quad 0x3F66CB116EA49C89   /* P5  = +2.782377288116648315142e-03 */
> +        .quad 0x3F67F972FDF30001   /* P6  = +2.926563829163342740100e-03 */
> +        .quad 0xBF63A7B5975F02F3   /* P7  = -2.399305983061922438601e-03 */
> +        .quad 0x3F4FDE7B8777F4C8   /* P8  = +9.725669069095216373599e-04 */
> +        .quad 0xBF25918876626BA4   /* P9  = -1.645545082212515656240e-04 */
> +        .quad 0xBF1495123C991F00   /* P10 = -7.851527984669912693674e-05 */
> +        .quad 0xBFFB000000000000   /* B = -1.6875   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9F29A5B7426D27   /* PL0 = +1.081172820484012446345e-16 */
> +        .quad 0x3FEE56B6F3EFABFC   /* PH0 = +9.480852856044061915952e-01 */
> +        .quad 0x3FB9E3EFD94BB9FC   /* P1  = +1.011342912204113371518e-01 */
> +        .quad 0xBFB88BD9760FECA7   /* P2  = -9.588393337610288420285e-02 */
> +        .quad 0x3FAD48A0350B3ACF   /* P3  = +5.719471595295077387313e-02 */
> +        .quad 0xBF96CC6A5110F129   /* P4  = -2.226415748394675367257e-02 */
> +        .quad 0x3F71934687170384   /* P5  = +4.290843485649345772606e-03 */
> +        .quad 0x3F5407BAF73B3DF9   /* P6  = +1.222546180475235334287e-03 */
> +        .quad 0xBF591B626C0646DD   /* P7  = -1.532407870488964407324e-03 */
> +        .quad 0x3F48B0E1DD283558   /* P8  = +7.535078860329375669277e-04 */
> +        .quad 0xBF2B322292840D2B   /* P9  = -2.074877932117605962646e-04 */
> +        .quad 0xBE99E4061120C741   /* P10 = -3.858017559892704559672e-07 */
> +        .quad 0xBFFD000000000000   /* B = -1.8125   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6AF8C2041C67CD   /* PL0 = +1.169711482626385762338e-17 */
> +        .quad 0x3FEEB2DFEDD5EC93   /* PH0 = +9.593352933146824801369e-01 */
> +        .quad 0x3FB465A205CFB638   /* P1  = +7.967579500083210999681e-02 */
> +        .quad 0xBFB3914BF68D39FF   /* P2  = -7.643580216720378576778e-02 */
> +        .quad 0x3FA7F21A08C5C734   /* P3  = +4.676896435820623621673e-02 */
> +        .quad 0xBF93DA9560EA9960   /* P4  = -1.938851741820124550772e-02 */
> +        .quad 0x3F73953FEC62820E   /* P5  = +4.781007481284861359820e-03 */
> +        .quad 0x3F2749D5E1273E3C   /* P6  = +1.776765426044646108071e-04 */
> +        .quad 0xBF4D46B0B498CE5A   /* P7  = -8.934367007839658352859e-04 */
> +        .quad 0x3F4153D680E1F4C4   /* P8  = +5.287930851093571206574e-04 */
> +        .quad 0xBF28477014ECA6A2   /* P9  = -1.852344816708944640949e-04 */
> +        .quad 0x3EFFAC54E07CEB4B   /* P10 = +3.020588886147182143902e-05 */
> +        .quad 0xBFFF000000000000   /* B = -1.9375   */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7A8AF2BB2231F2   /* PL0 = +2.302217989249372577466e-17 */
> +        .quad 0x3FEF1994DF724FC8   /* PH0 = +9.718727459135090285258e-01 */
> +        .quad 0x3FAC65B1BC0C9D58   /* P1  = +5.546336575053583942603e-02 */
> +        .quad 0xBFAB9937BDA747C8   /* P2  = -5.390333356957871365599e-02 */
> +        .quad 0x3FA15B42D9EF931C   /* P3  = +3.389939222669210777241e-02 */
> +        .quad 0xBF8EACD8E8507A3C   /* P4  = -1.497811755149058215502e-02 */
> +        .quad 0x3F7263A15721C682   /* P5  = +4.489546046998806349050e-03 */
> +        .quad 0xBF42A032ACDC3B32   /* P6  = -5.684134900735048121829e-04 */
> +        .quad 0xBF3431E79B5AD185   /* P7  = -3.081503340170088810438e-04 */
> +        .quad 0x3F31B51667C7DF5E   /* P8  = +2.701930714290502424828e-04 */
> +        .quad 0xBF1F8709579250AD   /* P9  = -1.202678157759563704341e-04 */
> +        .quad 0x3F01ED8ED1BF9595   /* P10 = +3.419487094883790833778e-05 */
> +        .quad 0xC001000000000000   /* B = -2.125    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C86F3F7C3DAFC55   /* PL0 = +3.981710680748877459333e-17 */
> +        .quad 0x3FEF73776B2AA2DB   /* PH0 = +9.828450291725759901951e-01 */
> +        .quad 0x3FA16A7FC4D7B900   /* P1  = +3.401564863075812007064e-02 */
> +        .quad 0xBFA11E03803AD621   /* P2  = -3.343211117082156940532e-02 */
> +        .quad 0x3F9609591597297F   /* P3  = +2.152003473546803654658e-02 */
> +        .quad 0xBF847E74ED9BBB0C   /* P4  = -1.000682211039596246436e-02 */
> +        .quad 0x3F6BFF771725CD65   /* P5  = +3.417713736035987187864e-03 */
> +        .quad 0xBF491D1FF73C18FA   /* P6  = -7.664114077392807421000e-04 */
> +        .quad 0x3EF53EE467B51DC5   /* P7  = +2.026145237479599375099e-05 */
> +        .quad 0x3F160135BE0D94A0   /* P8  = +8.394136922403255700685e-05 */
> +        .quad 0xBF0B32CB1D276A40   /* P9  = -5.187685350778849443841e-05 */
> +        .quad 0x3EF4DAF70C12D555   /* P10 = +1.988919462255396826584e-05 */
> +        .quad 0xC003000000000000   /* B = -2.375    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C19DBF4E2E5B7DC   /* PL0 = +3.504575836708380670219e-19 */
> +        .quad 0x3FEFAA7934B75EBD   /* PH0 = +9.895597486128832054320e-01 */
> +        .quad 0x3F9545200830A42C   /* P1  = +2.077150392520736492125e-02 */
> +        .quad 0xBF950C46D285F6BC   /* P2  = -2.055464420253970271376e-02 */
> +        .quad 0x3F8B79F5BFC6513F   /* P3  = +1.341621390819425058164e-02 */
> +        .quad 0xBF7A50ADAD777898   /* P4  = -6.424597194806612772505e-03 */
> +        .quad 0x3F633A19BE8255E3   /* P5  = +2.347040444940816227383e-03 */
> +        .quad 0xBF44E609BC2557B7   /* P6  = -6.377742322836087134324e-04 */
> +        .quad 0x3F1AFCBAD60EAACD   /* P7  = +1.029480968230231421206e-04 */
> +        .quad 0x3EE80476AC34A8EF   /* P8  = +1.145240583485084317660e-05 */
> +        .quad 0xBEF278E23DE463E9   /* P9  = -1.761646478213091821804e-05 */
> +        .quad 0x3EE209FAF377264D   /* P10 = +8.601658563106529694651e-06 */
> +        .quad 0xC005000000000000   /* B = -2.625    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C979D62702C631C   /* PL0 = +8.193023793215066385979e-17 */
> +        .quad 0x3FEFCC04CDBCDC4B   /* PH0 = +9.936546343150295390600e-01 */
> +        .quad 0x3F89E87D088D269A   /* P1  = +1.265046770426474576547e-02 */
> +        .quad 0xBF89BE6721012B80   /* P2  = -1.257019586059526836624e-02 */
> +        .quad 0x3F80F1C13E8D39D3   /* P3  = +8.273610803056031004326e-03 */
> +        .quad 0xBF7082DBC9602757   /* P4  = -4.031046430108839563004e-03 */
> +        .quad 0x3F590BE9BD4E0A11   /* P5  = +1.528719197467002507978e-03 */
> +        .quad 0xBF3DCC2BEF6D0283   /* P6  = -4.546744598208711809986e-04 */
> +        .quad 0x3F1A08065C4A8E85   /* P7  = +9.930170842636406837764e-05 */
> +        .quad 0xBEE528117D0410F3   /* P8  = -1.008821337267942266431e-05 */
> +        .quad 0xBED0BE73A44FF565   /* P9  = -3.992069257383521775961e-06 */
> +        .quad 0x3EC9B0C11E342E38   /* P10 = +3.062539904901699218737e-06 */
> +        .quad 0xC007000000000000   /* B = -2.875    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C804B931AD7A3CC   /* PL0 = +2.826768921701616830245e-17 */
> +        .quad 0x3FEFE06EB0688212   /* PH0 = +9.961465306733450209009e-01 */
> +        .quad 0x3F7F81BD8876224D   /* P1  = +7.692089427458426472642e-03 */
> +        .quad 0xBF7F62A8C699A963   /* P2  = -7.662448196791823756776e-03 */
> +        .quad 0x3F74C31E2B2A6A28   /* P3  = +5.068891378551522166321e-03 */
> +        .quad 0xBF6470D537F16227   /* P4  = -2.495209162173734080001e-03 */
> +        .quad 0x3F4FAEEF61C89673   /* P5  = +9.668988091717359455754e-04 */
> +        .quad 0xBF33C5E80B349783   /* P6  = -3.017131341088651514023e-04 */
> +        .quad 0x3F138F3D31037A6B   /* P7  = +7.461367590931028650557e-05 */
> +        .quad 0xBEEB3C780996FFE3   /* P8  = -1.298723536791163711556e-05 */
> +        .quad 0x3E9D0C75BC8BFEFC   /* P9  = +4.328589367358221917138e-07 */
> +        .quad 0x3EAC3865227764D4   /* P10 = +8.410302755848104487452e-07 */
> +        .quad 0xC009000000000000   /* B = -3.125    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C5B978B202749F9   /* PL0 = +5.983054034451594408315e-18 */
> +        .quad 0x3FEFECD6B7EA3128   /* PH0 = +9.976609794698889643882e-01 */
> +        .quad 0x3F73238B786137FE   /* P1  = +4.672570043181776968058e-03 */
> +        .quad 0xBF731815ACEA072E   /* P2  = -4.661640805922390930706e-03 */
> +        .quad 0x3F6956F0816D5AEE   /* P3  = +3.093213784647877798933e-03 */
> +        .quad 0xBF591A16286C4885   /* P4  = -1.532098425461232453877e-03 */
> +        .quad 0x3F43B3E3A00C6096   /* P5  = +6.012784434430592468442e-04 */
> +        .quad 0xBF29441B2A56DEC7   /* P6  = -1.927645836710038499293e-04 */
> +        .quad 0x3F0A99C3A2E857B6   /* P7  = +5.073669705184196724674e-05 */
> +        .quad 0xBEE61CB034DDC151   /* P8  = -1.054385361573597042258e-05 */
> +        .quad 0x3EB792BBC76D6107   /* P9  = +1.405070887824641788698e-06 */
> +        .quad 0x3E761472362A16F0   /* P10 = +8.225391704739515383837e-08 */
> +        .quad 0xC00B000000000000   /* B = -3.375    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9C290AFCBDE00D   /* PL0 = +9.770074992945060684926e-17 */
> +        .quad 0x3FEFF45F6D36133A   /* PH0 = +9.985806592017987259879e-01 */
> +        .quad 0x3F673CEC093032DE   /* P1  = +2.836667068100913999228e-03 */
> +        .quad 0xBF67347A7CD844D5   /* P2  = -2.832640870800243808078e-03 */
> +        .quad 0x3F5EDA25530355DB   /* P3  = +1.883064698679040793627e-03 */
> +        .quad 0xBF4EAD3BBABC1BA9   /* P4  = -9.361783645268534848806e-04 */
> +        .quad 0x3F3842E61CD35432   /* P5  = +3.701984213198588740338e-04 */
> +        .quad 0xBF1F9AB7FD1A3DDD   /* P6  = -1.205611036090218544867e-04 */
> +        .quad 0x3F0136C154EA3DED   /* P7  = +3.283288480304320224929e-05 */
> +        .quad 0xBEDF12807F721E66   /* P8  = -7.408207230892235753013e-06 */
> +        .quad 0x3EB5B53687AD5112   /* P9  = +1.293889481520047941659e-06 */
> +        .quad 0xBE801E90FBFED147   /* P10 = -1.200988872775447204019e-07 */
> +        .quad 0xC00D000000000000   /* B = -3.625    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9E323294294877   /* PL0 = +1.047637125334028950603e-16 */
> +        .quad 0x3FEFF8F21CDAAA62   /* PH0 = +9.991388858373506653976e-01 */
> +        .quad 0x3F5C3470628813F2   /* P1  = +1.721486807697344658108e-03 */
> +        .quad 0xBF5C2E38AC6FF8D2   /* P2  = -1.720004411026422324849e-03 */
> +        .quad 0x3F52C13234626F43   /* P3  = +1.144694354969070234454e-03 */
> +        .quad 0xBF42B0A47DF47BB4   /* P4  = -5.703738387728891173354e-04 */
> +        .quad 0x3F2DB2889E32FBFD   /* P5  = +2.265731592156760387344e-04 */
> +        .quad 0xBF1385FBD54C5A55   /* P6  = -7.447576110695385196414e-05 */
> +        .quad 0x3EF5AFA812C6984E   /* P7  = +2.068153223579892541184e-05 */
> +        .quad 0xBED47097C188A03C   /* P8  = -4.873231795467276043290e-06 */
> +        .quad 0x3EAFF2B982F7EE8C   /* P9  = +9.521288628073486288914e-07 */
> +        .quad 0xBE828EC5B57D424D   /* P10 = -1.382656715739529384702e-07 */
> +        .quad 0xC00F000000000000   /* B = -3.875    */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9BA40DA6983BEC   /* PL0 = +9.589840482158163453169e-17 */
> +        .quad 0x3FEFFCAAC3F20E65   /* PH0 = +9.995931460438894911036e-01 */
> +        .quad 0x3F4AA87CF664754C   /* P1  = +8.135423820793490331956e-04 */
> +        .quad 0xBF4AA5B62919E224   /* P2  = -8.132113891426467676310e-04 */
> +        .quad 0x3F41C01B53B0B312   /* P3  = +5.416997368051531710388e-04 */
> +        .quad 0xBF31B8B54D091751   /* P4  = -2.704088811110632606347e-04 */
> +        .quad 0x3F1C431305954ECC   /* P5  = +1.078110084525254933728e-04 */
> +        .quad 0xBF02B7DEAD0D44E6   /* P6  = -3.570221236393906131126e-05 */
> +        .quad 0x3EE51C6EFF109EA9   /* P7  = +1.006654199116272154479e-05 */
> +        .quad 0xBEC48CFB08072D17   /* P8  = -2.449834994621594976610e-06 */
> +        .quad 0x3EA1585EC59CAE34   /* P9  = +5.169271261920604503617e-07 */
> +        .quad 0xBE78832BAF950BA9   /* P10 = -9.131575131209528255629e-08 */
> +        .quad 0xC011000000000000   /* B = -4.25     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8FBF237F4AFE10   /* PL0 = +5.507163370275307643966e-17 */
> +        .quad 0x3FEFFEC61279A3A4   /* PH0 = +9.998503075449787225182e-01 */
> +        .quad 0x3F339E78281A00EA   /* P1  = +2.993625022114214863645e-04 */
> +        .quad 0xBF339DB7B072AD62   /* P2  = -2.993176899035080028902e-04 */
> +        .quad 0x3F2A259E658EF4E4   /* P3  = +1.994853835451177669594e-04 */
> +        .quad 0xBF1A219C312B10BA   /* P4  = -9.968295880030927192162e-05 */
> +        .quad 0x3F04E146B4F5F4B7   /* P5  = +3.982541113154699160876e-05 */
> +        .quad 0xBEEBC5F137088210   /* P6  = -1.324329943580649487333e-05 */
> +        .quad 0x3ECF96736E300B00   /* P7  = +3.765547135882256916132e-06 */
> +        .quad 0xBEAF4874840B91EB   /* P8  = -9.323068824421825762292e-07 */
> +        .quad 0x3E8B6AB2B5C8FD3F   /* P9  = +2.042709991312793245971e-07 */
> +        .quad 0xBE650BCCE62FD2B7   /* P10 = -3.920140725219944650830e-08 */
> +        .quad 0xC013000000000000   /* B = -4.75     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9C869C85471703   /* PL0 = +9.896883942603146946483e-17 */
> +        .quad 0x3FEFFF8C81C6DC33   /* PH0 = +9.999449286177707341139e-01 */
> +        .quad 0x3F1CDF5A2E4D7C69   /* P1  = +1.101397316012206760643e-04 */
> +        .quad 0xBF1CDEF1F9BE63BE   /* P2  = -1.101336660539594564027e-04 */
> +        .quad 0x3F133EC10C83AAA0   /* P3  = +7.341435696487731017506e-05 */
> +        .quad 0xBF033DAB325FAACB   /* P4  = -3.669909192168459445238e-05 */
> +        .quad 0x3EEEC598FA98BAD8   /* P5  = +1.467316890843338172161e-05 */
> +        .quad 0xBED47F1A15BA368E   /* P6  = -4.886744445221253126882e-06 */
> +        .quad 0x3EB761FBE7D201C1   /* P7  = +1.393720509029845064726e-06 */
> +        .quad 0xBE974CD75A43BF6B   /* P8  = -3.471994551992448536007e-07 */
> +        .quad 0x3E74B02965BBF8DC   /* P9  = +7.706929621914905669946e-08 */
> +        .quad 0xBE504EF4E3892A66   /* P10 = -1.518840362012570189110e-08 */
> +        .quad 0xC015000000000000   /* B = -5.25     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C643810400471B0   /* PL0 = +8.768592603904887599187e-18 */
> +        .quad 0x3FEFFFD583014825   /* PH0 = +9.999797400180382433987e-01 */
> +        .quad 0x3F053E71416C43CA   /* P1  = +4.051955345663706869871e-05 */
> +        .quad 0xBF053E550C7C8CC9   /* P2  = -4.051873253121394012080e-05 */
> +        .quad 0x3EFC52D0D90D4843   /* P3  = +2.701139380018752534477e-05 */
> +        .quad 0xBEEC523A6ADBE142   /* P4  = -1.350460237457883558350e-05 */
> +        .quad 0x3ED6A73E22D844B3   /* P5  = +5.400965660055565196396e-06 */
> +        .quad 0xBEBE31D10F23ACD0   /* P6  = -1.799738182979224868919e-06 */
> +        .quad 0x3EA13E14264DEAB2   /* P7  = +5.138663935333241981438e-07 */
> +        .quad 0xBE81385ABB98EDCC   /* P8  = -1.282999997786486835638e-07 */
> +        .quad 0x3E5EB9164593E0B6   /* P9  = +2.861301981891537161158e-08 */
> +        .quad 0xBE387218CFE7772E   /* P10 = -5.691705994073124478195e-09 */
> +        .quad 0xC017000000000000   /* B = -5.75     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C92530433F4C703   /* PL0 = +6.357512739163799046861e-17 */
> +        .quad 0x3FEFFFF05E8D3191   /* PH0 = +9.999925467214315633058e-01 */
> +        .quad 0x3EEF42DDFA52B575   /* P1  = +1.490650158538873335176e-05 */
> +        .quad 0xBEEF42CEB54212AA   /* P2  = -1.490639048307961378200e-05 */
> +        .quad 0x3EE4D7201CBCB853   /* P3  = +9.937445518550804010127e-06 */
> +        .quad 0xBED4D6F764B66C37   /* P4  = -4.968574624976280456686e-06 */
> +        .quad 0x3EC0ABB806EBDE71   /* P5  = +1.987311456171617620608e-06 */
> +        .quad 0xBEA6399CF854F876   /* P6  = -6.623581475862682369330e-07 */
> +        .quad 0x3E8964B91728D7C9   /* P7  = +1.891959403186505598965e-07 */
> +        .quad 0xBE6961A0528444D6   /* P8  = -4.727645325404986954168e-08 */
> +        .quad 0x3E46AE3B0814EE00   /* P9  = +1.056147192151514779549e-08 */
> +        .quad 0xBE221B8194DACD16   /* P10 = -2.107984154277957626641e-09 */
> +        .quad 0xC019000000000000   /* B = -6.25     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C7BB5622CE1A79E   /* PL0 = +2.403331811901679167526e-17 */
> +        .quad 0x3FEFFFFA3FF22708   /* PH0 = +9.999972580855862602789e-01 */
> +        .quad 0x3ED7003552D53503   /* P1  = +5.483821309338170039906e-06 */
> +        .quad 0xBED7003130C1AB92   /* P2  = -5.483806273169366545037e-06 */
> +        .quad 0x3ECEAAE13B699C45   /* P3  = +3.655850800133043324271e-06 */
> +        .quad 0xBEBEAACB305F3D07   /* P4  = -1.827905351959291114416e-06 */
> +        .quad 0x3EA8887F5F9C87EF   /* P5  = +7.311461438267648556646e-07 */
> +        .quad 0xBE905AD08DF8454F   /* P6  = -2.437046884027860662692e-07 */
> +        .quad 0x3E72B068300B703F   /* P7  = +6.962228483613086736676e-08 */
> +        .quad 0xBE52AF921A71C058   /* P8  = -1.740252888706390465423e-08 */
> +        .quad 0x3E30B53EAA35300D   /* P9  = +3.890131469838137725119e-09 */
> +        .quad 0xBE0AB60CDAD7E22E   /* P10 = -7.773963050435300060566e-10 */
> +        .quad 0xC01B000000000000   /* B = -6.75     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8BD1ACF80D7256   /* PL0 = +4.825835138930451121169e-17 */
> +        .quad 0x3FEFFFFDE2760A41   /* PH0 = +9.999989913051835488389e-01 */
> +        .quad 0x3EC0EC4F1EC27E55   /* P1  = +2.017388615341105998718e-06 */
> +        .quad 0xBEC0EC4E005E6EAC   /* P2  = -2.017386580411626200507e-06 */
> +        .quad 0x3EB6906504BC4610   /* P3  = +1.344921673533307001969e-06 */
> +        .quad 0xBEA6905F0D52C8B5   /* P4  = -6.724581235377781360384e-07 */
> +        .quad 0x3E920D0F5CCE152B   /* P5  = +2.689810941136721216499e-07 */
> +        .quad 0xBE7811505B10E753   /* P6  = -8.965891741619763761543e-08 */
> +        .quad 0x3E5B811EE4F9B8EE   /* P7  = +2.561544781706659619288e-08 */
> +        .quad 0xBE3B80ABC067E840   /* P8  = -6.403452884688571158579e-09 */
> +        .quad 0x3E1898E394E09335   /* P9  = +1.431746793613569087489e-09 */
> +        .quad 0xBDF3ABB5BA711DB7   /* P10 = -2.862469657501951918569e-10 */
> +        .quad 0xC01D000000000000   /* B = -7.25     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8AE01DB39A3791   /* PL0 = +4.662147961093911873193e-17 */
> +        .quad 0x3FEFFFFF38C76668   /* PH0 = +9.999996289217962797125e-01 */
> +        .quad 0x3EA8E712E56E1188   /* P1  = +7.421562696484951529573e-07 */
> +        .quad 0xBEA8E7124A650791   /* P2  = -7.421559942504648535596e-07 */
> +        .quad 0x3EA09A0B62D8EF94   /* P3  = +4.947702955735978541097e-07 */
> +        .quad 0xBE909A09C56C2107   /* P4  = -2.473847805916120382218e-07 */
> +        .quad 0x3E7A900A90A54A6E   /* P5  = +9.895362410487317236618e-08 */
> +        .quad 0xBE61B5557BB449B6   /* P6  = -3.298434544432568302770e-08 */
> +        .quad 0x3E443CC74732CDCA   /* P7  = +9.423781066565733462466e-09 */
> +        .quad 0xBE243CA8AA8D6E54   /* P8  = -2.355890888986360997159e-09 */
> +        .quad 0x3E0219C341E0D1B4   /* P9  = +5.267978308406275552691e-10 */
> +        .quad 0xBDDCF49A10950F13   /* P10 = -1.053394074620716018815e-10 */
> +        .quad 0xC01F000000000000   /* B = -7.75     */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C75CB18F3775414   /* PL0 = +1.890271747518592444083e-17 */
> +        .quad 0x3FEFFFFFD38C39F0   /* PH0 = +9.999999172012490333827e-01 */
> +        .quad 0x3E8639E2F89493BB   /* P1  = +1.655974950855472979393e-07 */
> +        .quad 0xBE8639E2D9B29562   /* P2  = -1.655974813708346974914e-07 */
> +        .quad 0x3E7DA2836A1F706E   /* P3  = +1.103982989742589616541e-07 */
> +        .quad 0xBE6DA282C6733DAE   /* P4  = -5.519913131581509871840e-08 */
> +        .quad 0x3E57B53A278851FD   /* P5  = +2.207971980430773309147e-08 */
> +        .quad 0xBE3F9C4A72536E22   /* P6  = -7.359895614149337484810e-09 */
> +        .quad 0x3E220E81FBE19CDD   /* P7  = +2.102073153607135257714e-09 */
> +        .quad 0xBE020E8875ADA8D8   /* P8  = -5.255211642212584097407e-10 */
> +        .quad 0x3DE07634328384FC   /* P9  = +1.197748786062966341989e-10 */
> +        .quad 0xBDBA54078E3C351F   /* P10 = -2.394539505021488953905e-11 */
> +        .quad 0xC021000000000000   /* B = -8.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C98B78738B0EDEF   /* PL0 = +8.575399788039081964921e-17 */
> +        .quad 0x3FEFFFFFF9FBEA40   /* PH0 = +9.999999887944071019774e-01 */
> +        .quad 0x3E581056FAC28C46   /* P1  = +2.241118550516412682327e-08 */
> +        .quad 0xBE581056F63A4351   /* P2  = -2.241118525356742542550e-08 */
> +        .quad 0x3E500AE49533790A   /* P3  = +1.494078933911655875521e-08 */
> +        .quad 0xBE400AE489ACBA90   /* P4  = -7.470394349637968945652e-09 */
> +        .quad 0x3E29AB0D59A1967B   /* P5  = +2.988168557255271725494e-09 */
> +        .quad 0xBE111CB32D6EEF2B   /* P6  = -9.960558400070350772418e-10 */
> +        .quad 0x3DF38CBADF396908   /* P7  = +2.844859618921805216353e-10 */
> +        .quad 0xBDD38CC7B92CECD3   /* P8  = -7.112220386749926320915e-11 */
> +        .quad 0x3DB1D2BBE2705032   /* P9  = +1.621008722427575444686e-11 */
> +        .quad 0xBD8C8199294E6380   /* P10 = -3.240784656869469020111e-12 */
> +        .quad 0xC023000000000000   /* B = -9.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8EEEC16618B984   /* PL0 = +5.365957423487855307906e-17 */
> +        .quad 0x3FEFFFFFFF2F9279   /* PH0 = +9.999999984834878619111e-01 */
> +        .quad 0x3E2A0DB0D052B148   /* P1  = +3.033024167396880687734e-09 */
> +        .quad 0xBE2A0DB0CFA6AB71   /* P2  = -3.033024162734192808028e-09 */
> +        .quad 0x3E215E75D53A3105   /* P3  = +2.022016035353114070618e-09 */
> +        .quad 0xBE115E75D40AA47F   /* P4  = -1.011008013562702155050e-09 */
> +        .quad 0x3DFBCA5CDC12ED1C   /* P5  = +4.044047007631481841556e-10 */
> +        .quad 0xBDE286E85704FC22   /* P6  = -1.348015410318274576187e-10 */
> +        .quad 0x3DC52A8925354517   /* P7  = +3.850101197145027796396e-11 */
> +        .quad 0xBDA52A97EA3F5F4A   /* P8  = -9.625355478142550638468e-12 */
> +        .quad 0x3D834C011A2AC0F7   /* P9  = +2.193802608697321032841e-12 */
> +        .quad 0xBD5EDD05BDCB3A62   /* P10 = -4.385948508419928563300e-13 */
> +        .quad 0xC025000000000000   /* B = -10.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6BD8B474BBF792   /* PL0 = +1.207649585364892639612e-17 */
> +        .quad 0x3FEFFFFFFFE3CAD8   /* PH0 = +9.999999997947623953110e-01 */
> +        .quad 0x3DFC3527E43C565F   /* P1  = +4.104751852963940338559e-10 */
> +        .quad 0xBDFC3527E420F415   /* P2  = -4.104751852036136216697e-10 */
> +        .quad 0x3DF2CE1A8D806DAD   /* P3  = +2.736501142887952919489e-10 */
> +        .quad 0xBDE2CE1A8DDF690A   /* P4  = -1.368250573053032426141e-10 */
> +        .quad 0x3DCE169832D8BD68   /* P5  = +5.473022586854025789680e-11 */
> +        .quad 0xBDB40F0FE853DA5B   /* P6  = -1.824340550195944358477e-11 */
> +        .quad 0x3D96EA8D930D31A1   /* P7  = +5.210545794901128943676e-12 */
> +        .quad 0xBD76EA9DB0D09839   /* P8  = -1.302650427355019556441e-12 */
> +        .quad 0x3D54E474FD4303A1   /* P9  = +2.968990047962355000258e-13 */
> +        .quad 0xBD30B526CA2B228A   /* P10 = -5.935740124899435401321e-14 */
> +        .quad 0xC027000000000000   /* B = -11.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C56E8953D525FD5   /* PL0 = +4.967494994909661698725e-18 */
> +        .quad 0x3FEFFFFFFFFC2EB9   /* PH0 = +9.999999999722241073030e-01 */
> +        .quad 0x3DCE8A37A48016C2   /* P1  = +5.555177547354687971427e-11 */
> +        .quad 0xBDCE8A37A479B7D4   /* P2  = -5.555177547084873157964e-11 */
> +        .quad 0x3DC45C250CFA9C16   /* P3  = +3.703451575129414499553e-11 */
> +        .quad 0xBDB45C250D9F8467   /* P4  = -1.851725791056759260154e-11 */
> +        .quad 0x3DA049BB33CBD4E9   /* P5  = +7.406930640558963265190e-12 */
> +        .quad 0xBD85B7A407C422C1   /* P6  = -2.468976464832073512208e-12 */
> +        .quad 0x3D68CF9CED2B3FD5   /* P7  = +7.051706989348171774536e-13 */
> +        .quad 0xBD48CFAE64C352B3   /* P8  = -1.762945685274427023683e-13 */
> +        .quad 0x3D269EAE08690D52   /* P9  = +4.018091287355461204663e-14 */
> +        .quad 0xBD0216CBEAFFF5AA   /* P10 = -8.033151495672990022322e-15 */
> +        .quad 0xC029000000000000   /* B = -12.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C8ACF1392B106D3   /* PL0 = +4.650601502940921454330e-17 */
> +        .quad 0x3FEFFFFFFFFF7BBD   /* PH0 = +9.999999999962408958609e-01 */
> +        .quad 0x3DA088529889B316   /* P1  = +7.518115268189742464885e-12 */
> +        .quad 0xBDA088529887F4C4   /* P2  = -7.518115268005149164680e-12 */
> +        .quad 0x3D960B18BF1DF711   /* P3  = +5.012076679213679703380e-12 */
> +        .quad 0xBD860B18BFD99A48   /* P4  = -2.506038344573564868987e-12 */
> +        .quad 0x3D71A27E7CA64143   /* P5  = +1.002419056539285288454e-12 */
> +        .quad 0xBD5783530EA76D91   /* P6  = -3.341396294294381580191e-13 */
> +        .quad 0x3D3ADCC75CBD2A03   /* P7  = +9.543447641637910477850e-14 */
> +        .quad 0xBD1ADCDA46BE5F17   /* P8  = -2.385887543769010971872e-14 */
> +        .quad 0x3CF87D77650BE5B8   /* P9  = +5.437895260471143131391e-15 */
> +        .quad 0xBCD395AE6E74C6D2   /* P10 = -1.087168847335561258239e-15 */
> +        .quad 0xC02B000000000000   /* B = -13.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C97A8A295292858   /* PL0 = +8.208271151146829171896e-17 */
> +        .quad 0x3FEFFFFFFFFFEE19   /* PH0 = +9.999999999994911847878e-01 */
> +        .quad 0x3D71E642BB008F95   /* P1  = +1.017466259229268282255e-12 */
> +        .quad 0xBD71E642BAFEEC54   /* P2  = -1.017466259207593392022e-12 */
> +        .quad 0x3D67DDAE41647741   /* P3  = +6.783108169938233581038e-13 */
> +        .quad 0xBD57DDAE4230F34B   /* P4  = -3.391554091734942426856e-13 */
> +        .quad 0x3D4317C33FAE2536   /* P5  = +1.356626669455791324801e-13 */
> +        .quad 0xBD2975040D3E26B9   /* P6  = -4.522088139411435138867e-14 */
> +        .quad 0x3D0D155DCD0F0AFB   /* P7  = +1.291565189902030307333e-14 */
> +        .quad 0xBCED157247832B20   /* P8  = -3.228947666403019234175e-15 */
> +        .quad 0x3CCA83D70F607C28   /* P9  = +7.359390959466796619024e-16 */
> +        .quad 0xBCA5343952C1E19E   /* P10 = -1.471323041436694087188e-16 */
> +        .quad 0xC02D000000000000   /* B = -14.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C9B7876CBC5306E   /* PL0 = +9.530765996816607711732e-17 */
> +        .quad 0x3FEFFFFFFFFFFD93   /* PH0 = +9.999999999999310551502e-01 */
> +        .quad 0x3D436121E2640D76   /* P1  = +1.376990843765503869546e-13 */
> +        .quad 0xBD436121E26250EA   /* P2  = -1.376990843736775811281e-13 */
> +        .quad 0x3D39D6D7CA259186   /* P3  = +9.179938654047876451320e-14 */
> +        .quad 0xBD29D6D7CB0327CE   /* P4  = -4.589969336188563660531e-14 */
> +        .quad 0x3D14ABE4DC31244A   /* P5  = +1.835994545584345768382e-14 */
> +        .quad 0xBCFB8FDB82AB6BB7   /* P6  = -6.119980791767901275443e-15 */
> +        .quad 0x3CDF7CF757491B60   /* P7  = +1.747943407988343076526e-15 */
> +        .quad 0xBCBF7D0D833640FB   /* P8  = -4.369905470133249448357e-16 */
> +        .quad 0x3C9CB512F6BDC754   /* P9  = +9.959852600692493655511e-17 */
> +        .quad 0xBC76F50AB1B0E9BA   /* P10 = -1.991219205936492089091e-17 */
> +        .quad 0xC02F000000000000   /* B = -15.5      */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C6FFE15D5F78543   /* PL0 = +1.387454417328248962819e-17 */
> +        .quad 0x3FEFFFFFFFFFFFE1   /* PH0 = +9.999999999999965583086e-01 */
> +        .quad 0x3CFEE00288B99C26   /* P1  = +6.855635762864742358597e-15 */
> +        .quad 0xBCFEE0027D060EE2   /* P2  = -6.855635607998342735403e-15 */
> +        .quad 0x3CF4954AA23148A2   /* P3  = +4.570381865813341696777e-15 */
> +        .quad 0xBCE4954B5DAD3010   /* P4  = -2.285192173571711474199e-15 */
> +        .quad 0x3CD07883DD8793BD   /* P5  = +9.143109661358222028007e-16 */
> +        .quad 0xBCB5F5F4BB87ADCF   /* P6  = -3.047668447080103869032e-16 */
> +        .quad 0x3C98F1A905097685   /* P7  = +8.654183371862458774513e-17 */
> +        .quad 0xBC78F2D585007222   /* P8  = -2.163943551222030413627e-17 */
> +        .quad 0x3C58A37CC5082B5F   /* P9  = +5.342649626494471588064e-18 */
> +        .quad 0xBC33AE7917F94D17   /* P10 = -1.066938163384541013918e-18 */
> +        .quad 0xC031000000000000   /* B = -17        */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x3C91BF1D80474F0F   /* PL0 = +6.157069264461989135096e-17 */
> +        .quad 0x3FEFFFFFFFFFFFFE   /* PH0 = +9.999999999999997779554e-01 */
> +        .quad 0x3CB72071400E6275   /* P1  = +3.209478247225075961360e-16 */
> +        .quad 0xBCB72071400A9F37   /* P2  = -3.209478247103497434502e-16 */
> +        .quad 0x3CAED5EC39A77629   /* P3  = +2.139652050028423711308e-16 */
> +        .quad 0xBC9ED5EC3B530600   /* P4  = -1.069826028468029104719e-16 */
> +        .quad 0x3C88AB2BFED159DE   /* P5  = +4.279326904335078988705e-17 */
> +        .quad 0xBC70721D1220B3FC   /* P6  = -1.426441958074916244382e-17 */
> +        .quad 0x3C52C96049721FB8   /* P7  = +4.073700029965821523731e-18 */
> +        .quad 0xBC32C971215735DC   /* P8  = -1.018438939975201710113e-18 */
> +        .quad 0x3C112EF658AB41A9   /* P9  = +2.328791246104218830028e-19 */
> +        .quad 0xBBEB7B598C6AD3DE   /* P10 = -4.655603964908654142787e-20 */
> +        .quad 0xC03287E0C98F84E5   /* B = -18.530774 */
> +        .quad 0x3FF0000000000000   /* A = +1        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* PL0 = +0.000000000000000000000e-01 */
> +        .quad 0x3FF0000000000000   /* PH0 = +1.000000000000000000000e+00 */
> +        .quad 0x0000000000000000   /* P1  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P2  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P3  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P4  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P5  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P6  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P7  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P8  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P9  = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* P10 = +0.000000000000000000000e-01 */
> +        .quad 0x0000000000000000   /* B = +0        */
> +        .quad 0x0000000000000000   /* A = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .quad 0x0000000000000000   /* Align value = +0        */
> +        .align 32
> +        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000   /* _dbSignMask       */
> +        .align 32
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff   /* _dbAbsMask        */
> +        .align 32
> +        .long 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000           /* _iExpMantMask     */
> +        .align 32
> +        .long 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000           /* _iExpMask         */
> +        .align 32
> +        .long 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000           /* _iMinIdxOfsMask   */
> +        .align 32
> +        .long 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000           /* _iMaxIdxMask      */
> +        .align 32
> +        .type  __svml_dtanh_data_internal,@object
> +        .size  __svml_dtanh_data_internal,.-__svml_dtanh_data_internal
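A side note that may help when reading the coefficient table above: each
per-subinterval row occupies 16 .quad slots (128 bytes).  Below is a minimal
C view of that layout, inferred purely from the field comments (PL0, PH0,
P1..P10, B, A, two filler quads); the struct and its names are illustrative
and are not part of the patch:

  /* Illustrative only: one 128-byte row of the coefficient table above.  */
  struct dtanh_row
  {
    double PL0;      /* low part of the double-double P0                  */
    double PH0;      /* high part of P0                                   */
    double P[10];    /* P1 .. P10                                         */
    double B;        /* per-subinterval shift (cf. y := |x| + B below)    */
    double A;        /* listed as +1 in every row except the last one     */
    double pad[2];   /* alignment filler                                  */
  };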
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core-avx2.S
> new file mode 100644
> index 0000000000..92fb24a640
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized tanh, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN8v_tanh _ZGVeN8v_tanh_avx2_wrapper
> +#include "../svml_d_tanh8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core.c
> new file mode 100644
> index 0000000000..495cb1f4fc
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core.c
> @@ -0,0 +1,27 @@
> +/* Multiple versions of vectorized tanh, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN8v_tanh
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN8v_tanh, __GI__ZGVeN8v_tanh, __redirect__ZGVeN8v_tanh)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core_avx512.S
> new file mode 100644
> index 0000000000..01fc22ba6f
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_tanh8_core_avx512.S
> @@ -0,0 +1,472 @@
> +/* Function tanh vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   NOTE: Since the hyperbolic tangent function is odd
> + *         (tanh(x) = -tanh(-x)), the algorithm below works with the
> + *         absolute value of the argument |x|: tanh(x) = sign(x) * tanh(|x|)
> + *
> + *   We use a table lookup method to compute tanh(|x|).
> + *   The basic idea is to split the input range into a number of subintervals
> + *   and to approximate tanh(.) with a polynomial on each of them.
> + *
> + *   IEEE SPECIAL CONDITIONS:
> + *   x = [+,-]0, r = [+,-]0
> + *   x = +Inf,   r = +1
> + *   x = -Inf,   r = -1
> + *   x = QNaN,   r = QNaN
> + *   x = SNaN,   r = QNaN
> + *
> + *
> + *   ALGORITHM DETAILS
> + *   Special values are handled in a callout function, outside the main
> + *   path computation.  "Special" inputs for this algorithm are:
> + *   INF, NaN, |x| > HUGE_THRESHOLD
> + *
> + *
> + *   Main path computations are organized as follows:
> + *   We split the interval [0, SATURATION_THRESHOLD)
> + *   into a number of subintervals.  On each subinterval we approximate
> + *   tanh(.) with a minimax polynomial of pre-defined degree.  The
> + *   polynomial coefficients are computed beforehand and stored in a
> + *   table.  We also use
> + *
> + *       y := |x| + B,
> + *
> + *   where B depends on the subinterval and is used to bring the argument
> + *   closer to zero.
> + *   We also add a large fake interval [SATURATION_THRESHOLD, HUGE_THRESHOLD]
> + *   whose stored coefficients are 1.0, 0.0, 0.0, ..., just to preserve the
> + *   main path computation logic while returning 1.0 for all its arguments.
> + *
> + *   Hence the reconstruction looks as follows:
> + *   we extract the polynomial and range reduction coefficients
> + *        (Pj and B) corresponding to the subinterval to which |x| belongs,
> + *        and return
> + *
> + *       r := sign(x) * (P0 + P1 * y + ... + Pn * y^n)
> + *
> + *   NOTE: we use a multiprecision technique to multiply and sum the first
> + *         K terms of the polynomial, so each Pj, j = 0..K, is stored in
> + *         the table as a pair of target-precision numbers (Pj and PLj) to
> + *         achieve wider-than-target precision.
> + *
> + *
> + */
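For review purposes, here is a scalar C restatement of the reconstruction
described in the comment above.  It is only a sketch: it follows the
degree-10 row layout of the coefficient table quoted earlier in the patch
(the AVX-512 table in this file carries coefficients up to _dP17), and the
struct is the illustrative one from the note after that table, not anything
defined by the patch:

  #include <math.h>

  struct dtanh_row { double PL0, PH0, P[10], B, A, pad[2]; };

  /* Evaluate one table row for x; 'row' would be picked by the
     subinterval index (see the index sketch after the offset macros).  */
  static double
  tanh_row_eval_sketch (double x, const struct dtanh_row *row)
  {
    double y = fabs (x) + row->B;          /* bring the argument near 0  */
    double p = row->P[9];                  /* P10: Horner, top down      */
    for (int j = 8; j >= 0; j--)           /* P9 .. P1                   */
      p = p * y + row->P[j];
    p = p * y + row->PH0 + row->PL0;       /* P0 stored as PH0 + PL0     */
    return copysign (p, x);                /* tanh is odd                */
  }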
> +
> +/* Offsets for data table __svml_dtanh_data_internal
> + */
> +#define _dC                            0
> +#define _dP0                           128
> +#define _dP1                           256
> +#define _dP2                           384
> +#define _dP3                           512
> +#define _dP4                           640
> +#define _dP5                           768
> +#define _dP6                           896
> +#define _dP7                           1024
> +#define _dP8                           1152
> +#define _dP9                           1280
> +#define _dP10                          1408
> +#define _dP11                          1536
> +#define _dP12                          1664
> +#define _dP13                          1792
> +#define _dP14                          1920
> +#define _dP15                          2048
> +#define _dP16                          2176
> +#define _dP17                          2304
> +#define _iExpMantMask_UISA             2432
> +#define _iMinIdxOfsMask_UISA           2496
> +#define _iMaxIdxMask_UISA              2560
> +#define _dbSignMask                    2624
> +#define _dbAbsMask                     2688
> +#define _iExpMantMask                  2752
> +#define _iExpMask                      2816
> +#define _iMinIdxOfsMask                2880
> +#define _iMaxIdxMask                   2944
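Note on these offsets: consecutive _dPj fields are 128 bytes (16 doubles)
apart, i.e. one coefficient slot per subinterval, and the prologue below
selects the slot with a vpand/vpsubd/vpmaxsd/vpminsd/vpsrld sequence feeding
vpermt2pd over paired 64-byte loads.  A scalar sketch of that index
computation follows; the mask values are the generic (non-UISA) constants
from the data table earlier in the patch, since the _UISA values are not
visible in this hunk, so treat the numbers as illustrative:

  #include <stdint.h>
  #include <string.h>

  /* Sketch of the subinterval index selection done in the main path.
     The AND against the exponent/mantissa mask also drops the sign bit.  */
  static unsigned
  tanh_index_sketch (double x)
  {
    uint64_t bits;
    memcpy (&bits, &x, sizeof bits);
    uint32_t hi = (uint32_t) (bits >> 32);        /* high word of x      */
    int32_t t = (int32_t) (hi & 0x7ffe0000u)      /* _iExpMantMask       */
                - 0x3fbe0000;                     /* _iMinIdxOfsMask     */
    if (t < 0)                                    /* clamp to [0, max]   */
      t = 0;
    if (t > 0x00760000)                           /* _iMaxIdxMask        */
      t = 0x00760000;
    return (unsigned) t >> 19;                    /* row / permute index */
  }

With that index, the coefficient for a given lane and field would sit at byte
offset _dPj + idx * 8 in __svml_dtanh_data_internal, which appears to be what
the vpermt2pd pairs below implement in vector form.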
> +
> +#include <sysdep.h>
> +
> +        .text
> +       .section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN8v_tanh_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $320, %rsp
> +        vpsrlq    $32, %zmm0, %zmm4
> +        vmovups   %zmm0, (%rsp)
> +        vmovups   __svml_dtanh_data_internal(%rip), %zmm14
> +        vmovups   _dP0+__svml_dtanh_data_internal(%rip), %zmm15
> +        vpmovqd   %zmm4, %ymm5
> +
> +/*  Constant loading  */
> +        vandpd    _dbAbsMask+__svml_dtanh_data_internal(%rip), %zmm0, %zmm13
> +        vandpd    _dbSignMask+__svml_dtanh_data_internal(%rip), %zmm0, %zmm3
> +
> +/* Here huge arguments, INF and NaNs are filtered out to callout. */
> +        vpand     _iExpMantMask_UISA+__svml_dtanh_data_internal(%rip), %ymm5, %ymm7
> +        vmovups   _dP2+__svml_dtanh_data_internal(%rip), %zmm0
> +        vmovups   _dP16+__svml_dtanh_data_internal(%rip), %zmm4
> +        vmovups   _dP15+__svml_dtanh_data_internal(%rip), %zmm5
> +        vmovups   %zmm3, 64(%rsp)
> +        vmovups   _dP3+__svml_dtanh_data_internal(%rip), %zmm3
> +        vpsubd    _iMinIdxOfsMask_UISA+__svml_dtanh_data_internal(%rip), %ymm7, %ymm8
> +
> +/* if VMIN, VMAX is defined for I type */
> +        vxorps    %ymm9, %ymm9, %ymm9
> +        vpmaxsd   %ymm9, %ymm8, %ymm10
> +        vpminsd   _iMaxIdxMask_UISA+__svml_dtanh_data_internal(%rip), %ymm10, %ymm11
> +        vpsrld    $19, %ymm11, %ymm12
> +        vmovups   _dP12+__svml_dtanh_data_internal(%rip), %zmm8
> +        vmovups   _dP11+__svml_dtanh_data_internal(%rip), %zmm9
> +        vmovups   _dP10+__svml_dtanh_data_internal(%rip), %zmm10
> +        vmovups   _dP9+__svml_dtanh_data_internal(%rip), %zmm11
> +        vpmovzxdq %ymm12, %zmm2
> +        vmovups   _dP8+__svml_dtanh_data_internal(%rip), %zmm12
> +        vpermt2pd _dP2+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm0
> +        vpermt2pd _dC+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm14
> +        vpermt2pd _dP16+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm4
> +        vpermt2pd _dP15+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm5
> +        vsubpd    {rn-sae}, %zmm14, %zmm13, %zmm1
> +        vpermt2pd _dP12+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm8
> +        vpermt2pd _dP11+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm9
> +        vpermt2pd _dP10+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm10
> +        vpermt2pd _dP9+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm11
> +        vpermt2pd _dP8+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm12
> +        vpermt2pd _dP3+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm3
> +        vpermt2pd _dP0+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm15
> +        vmovups   %zmm0, 192(%rsp)
> +        vmovups   _dP17+__svml_dtanh_data_internal(%rip), %zmm0
> +        vmovups   _dP7+__svml_dtanh_data_internal(%rip), %zmm13
> +        vmovups   _dP6+__svml_dtanh_data_internal(%rip), %zmm14
> +        vmovups   %zmm3, 256(%rsp)
> +        vmovups   _dP5+__svml_dtanh_data_internal(%rip), %zmm3
> +        vmovups   %zmm15, 128(%rsp)
> +        vmovups   _dP4+__svml_dtanh_data_internal(%rip), %zmm15
> +        vpermt2pd _dP17+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm0
> +        vpermt2pd _dP7+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm13
> +        vpermt2pd _dP6+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm14
> +        vpermt2pd _dP5+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm3
> +        vpermt2pd _dP4+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm15
> +        vfmadd213pd {rn-sae}, %zmm4, %zmm1, %zmm0
> +        vpcmpgtd  _iExpMask+__svml_dtanh_data_internal(%rip), %ymm7, %ymm6
> +        vmovmskps %ymm6, %edx
> +        vmovups   _dP14+__svml_dtanh_data_internal(%rip), %zmm6
> +        vfmadd213pd {rn-sae}, %zmm5, %zmm1, %zmm0
> +        vmovups   _dP13+__svml_dtanh_data_internal(%rip), %zmm7
> +        vpermt2pd _dP14+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm6
> +        vpermt2pd _dP13+64+__svml_dtanh_data_internal(%rip), %zmm2, %zmm7
> +        vfmadd213pd {rn-sae}, %zmm6, %zmm1, %zmm0
> +        vmovups   256(%rsp), %zmm2
> +        vfmadd213pd {rn-sae}, %zmm7, %zmm1, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm8, %zmm1, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm9, %zmm1, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm10, %zmm1, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm11, %zmm1, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm12, %zmm1, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm13, %zmm1, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm14, %zmm1, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm3, %zmm1, %zmm0
> +        vmovups   128(%rsp), %zmm3
> +        vfmadd213pd {rn-sae}, %zmm15, %zmm1, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm2, %zmm1, %zmm0
> +        vmovups   192(%rsp), %zmm2
> +        vfmadd213pd {rn-sae}, %zmm2, %zmm1, %zmm0
> +        vfmadd213pd {rn-sae}, %zmm3, %zmm1, %zmm0
> +        vorpd     64(%rsp), %zmm0, %zmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   (%rsp), %zmm1
> +        vmovups   %zmm0, 128(%rsp)
> +        vmovups   %zmm1, 64(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -304; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -312; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xfe, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -320; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -304; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xd0, 0xfe, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -312; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc8, 0xfe, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -320; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xc0, 0xfe, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movsd     64(%rsp,%r14,8), %xmm0
> +        call      tanh@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movsd     %xmm0, 128(%rsp,%r14,8)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN8v_tanh_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_dtanh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(64)) VUINT32 _dC[16][2];
> +        __declspec(align(64)) VUINT32 _dP0[16][2];
> +        __declspec(align(64)) VUINT32 _dP1[16][2];
> +        __declspec(align(64)) VUINT32 _dP2[16][2];
> +        __declspec(align(64)) VUINT32 _dP3[16][2];
> +        __declspec(align(64)) VUINT32 _dP4[16][2];
> +        __declspec(align(64)) VUINT32 _dP5[16][2];
> +        __declspec(align(64)) VUINT32 _dP6[16][2];
> +        __declspec(align(64)) VUINT32 _dP7[16][2];
> +        __declspec(align(64)) VUINT32 _dP8[16][2];
> +        __declspec(align(64)) VUINT32 _dP9[16][2];
> +        __declspec(align(64)) VUINT32 _dP10[16][2];
> +        __declspec(align(64)) VUINT32 _dP11[16][2];
> +        __declspec(align(64)) VUINT32 _dP12[16][2];
> +        __declspec(align(64)) VUINT32 _dP13[16][2];
> +        __declspec(align(64)) VUINT32 _dP14[16][2];
> +        __declspec(align(64)) VUINT32 _dP15[16][2];
> +        __declspec(align(64)) VUINT32 _dP16[16][2];
> +        __declspec(align(64)) VUINT32 _dP17[16][2];
> +        __declspec(align(64)) VUINT32 _iExpMantMask_UISA[16][1];
> +        __declspec(align(64)) VUINT32 _iMinIdxOfsMask_UISA[16][1];
> +        __declspec(align(64)) VUINT32 _iMaxIdxMask_UISA[16][1];
> +        __declspec(align(64)) VUINT32 _dbSignMask[8][2];
> +        __declspec(align(64)) VUINT32 _dbAbsMask[8][2];
> +        __declspec(align(64)) VUINT32 _iExpMantMask[16][1];
> +        __declspec(align(64)) VUINT32 _iExpMask[16][1];
> +        __declspec(align(64)) VUINT32 _iMinIdxOfsMask[16][1];
> +        __declspec(align(64)) VUINT32 _iMaxIdxMask[16][1];
> +} __svml_dtanh_data_internal;
> +#endif
> +__svml_dtanh_data_internal:
> +        /*== _dC ==*/
> +        .quad 0x0000000000000000, 0x3fcc000000000000, 0x3fd4000000000000, 0x3fdc000000000000
> +        .quad 0x3fe4000000000000, 0x3fec000000000000, 0x3ff4000000000000, 0x3ffc000000000000
> +        .quad 0x4004000000000000, 0x400c000000000000, 0x4014000000000000, 0x401c000000000000
> +        .quad 0x4024000000000000, 0x402c000000000000, 0x4034000000000000, 0x0000000000000000
> +        /*== p0 ==*/
> +        .align 64
> +        .quad 0x0000000000000000, 0x3fcb8fd0416a7c92, 0x3fd35f98a0ea650e, 0x3fda5729ee488037
> +        .quad 0x3fe1bf47eabb8f95, 0x3fe686650b8c2015, 0x3feb2523bb6b2dee, 0x3fee1fbf97e33527
> +        .quad 0x3fef9258260a71c2, 0x3feff112c63a9077, 0x3fefff419668df11, 0x3feffffc832750f2
> +        .quad 0x3feffffffdc96f35, 0x3fefffffffffcf58, 0x3ff0000000000000, 0x3ff0000000000000
> +        /*== p1 ==*/
> +        .align 64
> +        .quad 0x0000000000000000, 0x3c65e23ebcd3bcbe, 0xbc4c600bac3adf00, 0x3c6c44091785d040
> +        .quad 0x3c8221d7a6e3674b, 0x3c69f89d2cf6b85c, 0x3c73b3e9ec0b8f1c, 0xbc7f8d4b0428aada
> +        .quad 0xbc7c52d880cf43c0, 0x3c7dd36e37096480, 0x3c7b4f6380c442ca, 0xbc729755de470096
> +        .quad 0x3c84cf852845efbd, 0x3c6fc4fb440a5378, 0xbc63981083b55870, 0x0000000000000000
> +        /*== p2 ==*/
> +        .align 64
> +        .quad 0x3ff0000000000000, 0x3fee842ca3f08532, 0x3fed11574af58f1b, 0x3fea945b9c24e4f9
> +        .quad 0x3fe6284c3374f815, 0x3fe02500a09f8d6e, 0x3fd1f25131e3a8c0, 0x3fbd22ca1c24a139
> +        .quad 0x3f9b3afe1fba5c76, 0x3f6dd37d19b22b21, 0x3f27ccec13a9ef96, 0x3ecbe6c3f33250ae
> +        .quad 0x3e41b4865394f75f, 0x3d8853f01bda5f28, 0x3c73953c0197ef58, 0x0000000000000000
> +        /*== p3 ==*/
> +        .align 64
> +        .quad 0xbbf0b3ea3fdfaa19, 0xbfca48aaeb53bc21, 0xbfd19921f4329916, 0xbfd5e0f09bef8011
> +        .quad 0xbfd893b59c35c882, 0xbfd6ba7cb7576538, 0xbfce7291743d7555, 0xbfbb6d85a01efb80
> +        .quad 0xbf9addae58c7141a, 0xbf6dc59376c7aa19, 0xbf27cc5e74677410, 0xbecbe6c0e8b4cc87
> +        .quad 0xbe41b486526b0565, 0xbd8853f01bef63a4, 0xbc73955be519be31, 0x0000000000000000
> +        /*== p4 ==*/
> +        .align 64
> +        .quad 0xbfd5555555555555, 0xbfd183afc292ba11, 0xbfcc1a4b039c9bfa, 0xbfc16e1e6d8d0be6
> +        .quad 0xbf92426c751e48a2, 0x3fb4f152b2bad124, 0x3fbbba40cbef72be, 0x3fb01ba038be6a3d
> +        .quad 0x3f916df44871efc8, 0x3f63c6869dfc8870, 0x3f1fb9aef915d828, 0x3ec299d1e27c6e11
> +        .quad 0x3e379b5ddcca334c, 0x3d8037f57bc62c9a, 0x3c6a2d4b50a2cff7, 0x0000000000000000
> +        /*== p5 ==*/
> +        .align 64
> +        .quad 0xbce6863ee44ed636, 0x3fc04dcd0476c75e, 0x3fc43d3449a80f08, 0x3fc5c26f3699b7e7
> +        .quad 0x3fc1a686f6ab2533, 0x3faf203c316ce730, 0xbf89c7a02788557c, 0xbf98157e26e0d541
> +        .quad 0xbf807b55c1c7d278, 0xbf53a18d5843190f, 0xbf0fb6bbc89b1a5b, 0xbeb299c9c684a963
> +        .quad 0xbe279b5dd4fb3d01, 0xbd7037f57ae72aa6, 0xbc5a2ca2bba78e86, 0x0000000000000000
> +        /*== p6 ==*/
> +        .align 64
> +        .quad 0x3fc1111111112ab5, 0x3fb5c19efdfc08ad, 0x3fa74c98dc34fbac, 0xbf790d6a8eff0a77
> +        .quad 0xbfac3c021789a786, 0xbfae2196b7326859, 0xbf93a7a011ff8c2a, 0x3f6e4709c7e8430e
> +        .quad 0x3f67682afa611151, 0x3f3ef2ee77717cbf, 0x3ef95a4482f180b7, 0x3e9dc2c27da3b603
> +        .quad 0x3e12e2afd9f7433e, 0x3d59f320348679ba, 0x3c44b61d9bbcc940, 0x0000000000000000
> +        /*== p7 ==*/
> +        .align 64
> +        .quad 0xbda1ea19ddddb3b4, 0xbfb0b8df995ce4df, 0xbfb2955cf41e8164, 0xbfaf9d05c309f7c6
> +        .quad 0xbf987d27ccff4291, 0x3f8b2ca62572b098, 0x3f8f1cf6c7f5b00a, 0x3f60379811e43dd5
> +        .quad 0xbf4793826f78537e, 0xbf2405695e36240f, 0xbee0e08de39ce756, 0xbe83d709ba5f714e
> +        .quad 0xbdf92e3fc5ee63e0, 0xbd414cc030f2110e, 0xbc2ba022e8d82a87, 0x0000000000000000
> +        /*== p8 ==*/
> +        .align 64
> +        .quad 0xbfaba1ba1990520b, 0xbf96e37bba52f6fc, 0x3ecff7df18455399, 0x3f97362834d33a4e
> +        .quad 0x3f9e7f8380184b45, 0x3f869543e7c420d4, 0xbf7326bd4914222a, 0xbf5fc15b0a9d98fa
> +        .quad 0x3f14cffcfa69fbb6, 0x3f057e48e5b79d10, 0x3ec33b66d7d77264, 0x3e66ac4e578b9b10
> +        .quad 0x3ddcc74b8d3d5c42, 0x3d23c589137f92b4, 0x3c107f8e2c8707a1, 0x0000000000000000
> +        /*== p9 ==*/
> +        .align 64
> +        .quad 0xbe351ca7f096011f, 0x3f9eaaf3320c3851, 0x3f9cf823fe761fc1, 0x3f9022271754ff1f
> +        .quad 0xbf731fe77c9c60af, 0xbf84a6046865ec7d, 0xbf4ca3f1f2b9192b, 0x3f4c77dee0afd227
> +        .quad 0x3f04055bce68597a, 0xbee2bf0cb4a71647, 0xbea31eaafe73efd5, 0xbe46abb02c4368ed
> +        .quad 0xbdbcc749ca8079dd, 0xbd03c5883836b9d2, 0xbbf07a5416264aec, 0x0000000000000000
> +        /*== p10 ==*/
> +        .align 64
> +        .quad 0x3f9664f94e6ac14e, 0xbf94d3343bae39dd, 0xbf7bc748e60df843, 0xbf8c89372b43ba85
> +        .quad 0xbf8129a092de747a, 0x3f60c85b4d538746, 0x3f5be9392199ec18, 0xbf2a0c68a4489f10
> +        .quad 0xbf00462601dc2faa, 0x3eb7b6a219dea9f4, 0x3e80cbcc8d4c5c8a, 0x3e2425bb231a5e29
> +        .quad 0x3d9992a4beac8662, 0x3ce191ba5ed3fb67, 0x3bc892450bad44c4, 0x0000000000000000
> +        /*== p11 ==*/
> +        .align 64
> +        .quad 0xbea8c4c1fd7852fe, 0xbfccce16b1046f13, 0xbf81a16f224bb7b6, 0xbf62cbf00406bc09
> +        .quad 0x3f75b29bb02cf69b, 0x3f607df0f9f90c17, 0xbf4b852a6e0758d5, 0xbf0078c63d1b8445
> +        .quad 0x3eec12eadd55be7a, 0xbe6fa600f593181b, 0xbe5a3c935dce3f7d, 0xbe001c6d95e3ae96
> +        .quad 0xbd74755a00ea1fd3, 0xbcbc1c6c063bb7ac, 0xbba3be9a4460fe00, 0x0000000000000000
> +        /*== p12 ==*/
> +        .align 64
> +        .quad 0xbf822404577aa9dd, 0x403d8b07f7a82aa3, 0xbf9f44ab92fbab0a, 0x3fb2eac604473d6a
> +        .quad 0x3f45f87d903aaac8, 0xbf5e104671036300, 0x3f19bc98ddf0f340, 0x3f0d4304bc9246e8
> +        .quad 0xbed13c415f7b9d41, 0xbe722b8d9720cdb0, 0x3e322666d739bec0, 0x3dd76a553d7e7918
> +        .quad 0x3d4de0fa59416a39, 0x3c948716cf3681b4, 0x3b873f9f2d2fda99, 0x0000000000000000
> +        /*== p13 ==*/
> +        .align 64
> +        .quad 0xbefdd99a221ed573, 0x4070593a3735bab4, 0xbfccab654e44835e, 0x3fd13ed80037dbac
> +        .quad 0xbf6045b9076cc487, 0x3f2085ee7e8ac170, 0x3f23524622610430, 0xbeff12a6626911b4
> +        .quad 0x3eab9008bca408af, 0x3e634df71865f620, 0xbe05bb1bcf83ca73, 0xbdaf2ac143fb6762
> +        .quad 0xbd23eae52a3dbf57, 0xbc6b5e3e9ca0955e, 0xbb5eca68e2c1ba2e, 0x0000000000000000
> +        /*== p14 ==*/
> +        .align 64
> +        .quad 0x3f6e3be689423841, 0xc0d263511f5baac1, 0x40169f73b15ebe5c, 0xc025c1dd41cd6cb5
> +        .quad 0xbf58fd89fe05e0d1, 0x3f73f7af01d5af7a, 0xbf1e40bdead17e6b, 0x3ee224cd6c4513e5
> +        .quad 0xbe24b645e68eeaa3, 0xbe4abfebfb72bc83, 0x3dd51c38f8695ed3, 0x3d8313ac38c6832b
> +        .quad 0x3cf7787935626685, 0x3c401ffc49c6bc29, 0xbabf0b21acfa52ab, 0x0000000000000000
> +        /*== p15 ==*/
> +        .align 64
> +        .quad 0xbf2a1306713a4f3a, 0xc1045e509116b066, 0x4041fab9250984ce, 0xc0458d090ec3de95
> +        .quad 0xbf74949d60113d63, 0x3f7c9fd6200d0ade, 0x3f02cd40e0ad0a9f, 0xbe858ab8e019f311
> +        .quad 0xbe792fa6323b7cf8, 0x3e2df04d67876402, 0xbd95c72be95e4d2c, 0xbd55a89c30203106
> +        .quad 0xbccad6b3bb9eff65, 0xbc12705ccd3dd884, 0xba8e0a4c47ae75f5, 0x0000000000000000
> +        /*== p16 ==*/
> +        .align 64
> +        .quad 0xbf55d7e76dc56871, 0x41528c38809c90c7, 0xc076d57fb5190b02, 0x4085f09f888f8ada
> +        .quad 0x3fa246332a2fcba5, 0xbfb29d851a896fcd, 0x3ed9065ae369b212, 0xbeb8e1ba4c98a030
> +        .quad 0x3e6ffd0766ad4016, 0xbe0c63c29f505f5b, 0xbd7fab216b9e0e49, 0x3d2826b62056aa27
> +        .quad 0x3ca313e31762f523, 0x3bea37aa21895319, 0x3ae5c7f1fd871496, 0x0000000000000000
> +        /*== p17 ==*/
> +        .align 64
> +        .quad 0x3f35e67ab76a26e7, 0x41848ee0627d8206, 0xc0a216d618b489ec, 0x40a5b89107c8af4f
> +        .quad 0x3fb69d8374520eda, 0xbfbded519f981716, 0xbef02d288b5b3371, 0x3eb290981209c1a6
> +        .quad 0xbe567e924bf5ff6e, 0x3de3f7f7de6b0eb6, 0x3d69ed18bae3ebbc, 0xbcf7534c4f3dfa71
> +        .quad 0xbc730b73f1eaff20, 0xbbba2cff8135d462, 0xbab5a71b5f7d9035, 0x0000000000000000
> +        .align 64
> +        .long 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000           /* _iExpMantMask_UISA     */
> +        .align 64
> +        .long 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000, 0x3fc00000           /* _iMinIdxOfsMask_UISA   */
> +        .align 64
> +        .long 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000, 0x00780000           /* _iMaxIdxMask_UISA      */
> +        .align 64
> +        .quad 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 0x8000000000000000   /* _dbSignMask       */
> +        .align 64
> +        .quad 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff, 0x7fffffffffffffff   /* _dbAbsMask        */
> +        .align 64
> +        .long 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000, 0x7ffe0000           /* _iExpMantMask     */
> +        .align 64
> +        .long 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000           /* _iExpMask         */
> +        .align 64
> +        .long 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000, 0x3fbe0000           /* _iMinIdxOfsMask   */
> +        .align 64
> +        .long 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000, 0x00760000           /* _iMaxIdxMask      */
> +        .align 64
> +        .type  __svml_dtanh_data_internal,@object
> +        .size  __svml_dtanh_data_internal,.-__svml_dtanh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core-avx2.S
> new file mode 100644
> index 0000000000..76bb22229e
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core-avx2.S
> @@ -0,0 +1,20 @@
> +/* AVX2 version of vectorized tanhf.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVeN16v_tanhf _ZGVeN16v_tanhf_avx2_wrapper
> +#include "../svml_s_tanhf16_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core.c
> new file mode 100644
> index 0000000000..cec4c7ed74
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized tanhf, vector length is 16.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVeN16v_tanhf
> +#include "ifunc-mathvec-avx512-skx.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVeN16v_tanhf, __GI__ZGVeN16v_tanhf,
> +              __redirect__ZGVeN16v_tanhf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core_avx512.S
> new file mode 100644
> index 0000000000..b6bdf97cc5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core_avx512.S
> @@ -0,0 +1,381 @@
> +/* Function tanhf vectorized with AVX-512.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   NOTE: Since the hyperbolic tangent function is odd
> + *         (tanh(x) = -tanh(-x)), the algorithm below deals with the absolute
> + *         value of the argument |x|: tanh(x) = sign(x) * tanh(|x|)
> + *
> + *   We use a table lookup method to compute tanh(|x|).
> + *   The basic idea is to split the input range into a number of subintervals
> + *   and to approximate tanh(.) with a polynomial on each of them.
> + *
> + *   IEEE SPECIAL CONDITIONS:
> + *   x = [+,-]0, r = [+,-]0
> + *   x = +Inf,   r = +1
> + *   x = -Inf,   r = -1
> + *   x = QNaN,   r = QNaN
> + *   x = SNaN,   r = QNaN
> + *
> + *
> + *   ALGORITHM DETAILS
> + *   We handle special values in a callout function, separately from the
> + *   main path computations.  "Special" inputs for this algorithm are:
> + *   INF, NAN, |x| > HUGE_THRESHOLD
> + *
> + *
> + *   Main path computations are organized as follows:
> + *   Actually we split the interval [0, SATURATION_THRESHOLD)
> + *   into a number of subintervals.  On each subinterval we approximate tanh(.)
> + *   with a minimax polynomial of pre-defined degree.  Polynomial coefficients
> + *   are computed beforehand and stored in a table.  We also use
> + *
> + *       y := |x| + B,
> + *
> + *   where B depends on the subinterval and is used to bring the argument
> + *   closer to zero.
> + *   We also add a large fake interval [SATURATION_THRESHOLD, HUGE_THRESHOLD],
> + *   whose stored coefficients are 1.0, 0.0, 0.0, ... - just to preserve the
> + *   main path computation logic while returning 1.0 for all such arguments.
> + *
> + *   Hence the reconstruction looks as follows:
> + *   we extract the polynomial and range reduction coefficients
> + *        (Pj and B) corresponding to the subinterval to which |x| belongs,
> + *        and return
> + *
> + *       r := sign(x) * (P0 + P1 * y + ... + Pn * y^n)
> + *
> + *   NOTE: we use a multiprecision technique to multiply and sum the first
> + *         K terms of the polynomial, so Pj, j = 0..K, are each stored in the
> + *         table as a pair of target-precision numbers (Pj and PLj) to
> + *         achieve wider-than-target precision.
> + *
> + *
> + */
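
As an illustration of the subinterval selection, the per-lane index computed
by the vpandd/vpsubd/vpmaxsd/vpminsd/vpsrld sequence below can be modelled in
scalar C roughly as follows.  The mask constants are the _UISA values from
the data table; the helper name and the scalar form are assumptions for
illustration only.

    #include <stdint.h>
    #include <string.h>

    /* Scalar model of the subinterval index computation; the kernel
       performs this on 16 lanes in parallel.  */
    static int
    tanhf_interval_index (float x)
    {
      uint32_t u;
      memcpy (&u, &x, sizeof (u));            /* bit pattern of x */
      int32_t v = (int32_t) (u & 0x7fe00000); /* _iExpMantMask_UISA */
      v -= 0x3d400000;                        /* _iMinIdxOfsMask_UISA */
      if (v < 0)
        v = 0;                                /* small |x|: interval 0 */
      if (v > 0x03e00000)
        v = 0x03e00000;                       /* _iMaxIdxMask_UISA: last */
      return v >> 21;                         /* index 0..31 into tables */
    }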
> +
> +/* Offsets for data table __svml_stanh_data_internal
> + */
> +#define _sC                            0
> +#define _sP0                           128
> +#define _sP2                           256
> +#define _sP3                           384
> +#define _sP4                           512
> +#define _sP5                           640
> +#define _sP6                           768
> +#define _sP7                           896
> +#define _iExpMantMask_UISA             1024
> +#define _iMinIdxOfsMask_UISA           1088
> +#define _iMaxIdxMask_UISA              1152
> +#define _sSignMask                     1216
> +#define _sAbsMask                      1280
> +#define _iExpMantMask                  1344
> +#define _iExpMask                      1408
> +#define _iMinIdxOfsMask                1472
> +#define _iMaxIdxMask                   1536
> +
> +#include <sysdep.h>
> +
> +        .text
> +       .section .text.evex512,"ax",@progbits
> +ENTRY(_ZGVeN16v_tanhf_skx)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-64, %rsp
> +        subq      $192, %rsp
> +        vmovaps   %zmm0, %zmm1
> +        vmovups   __svml_stanh_data_internal(%rip), %zmm9
> +        vmovups   _sP6+__svml_stanh_data_internal(%rip), %zmm11
> +        vmovups   _sP5+__svml_stanh_data_internal(%rip), %zmm12
> +        vmovups   _sP4+__svml_stanh_data_internal(%rip), %zmm13
> +        vmovups   _sP3+__svml_stanh_data_internal(%rip), %zmm14
> +        vmovups   _sP2+__svml_stanh_data_internal(%rip), %zmm15
> +        vpternlogd $255, %zmm2, %zmm2, %zmm2
> +        vandps    _sAbsMask+__svml_stanh_data_internal(%rip), %zmm1, %zmm8
> +        vandps    _sSignMask+__svml_stanh_data_internal(%rip), %zmm1, %zmm0
> +
> +/* Here huge arguments, INF and NaNs are filtered out to callout. */
> +        vpandd    _iExpMantMask_UISA+__svml_stanh_data_internal(%rip), %zmm1, %zmm3
> +        vpsubd    _iMinIdxOfsMask_UISA+__svml_stanh_data_internal(%rip), %zmm3, %zmm4
> +        vpcmpd    $2, _iExpMask+__svml_stanh_data_internal(%rip), %zmm3, %k1
> +
> +/*
> + *  small table specific variables *
> + *  Constant loading
> + */
> +        vpxord    %zmm5, %zmm5, %zmm5
> +
> +/* if VMIN, VMAX is defined for I type */
> +        vpmaxsd   %zmm5, %zmm4, %zmm6
> +        vpminsd   _iMaxIdxMask_UISA+__svml_stanh_data_internal(%rip), %zmm6, %zmm7
> +        vpsrld    $21, %zmm7, %zmm10
> +        vmovups   _sP7+__svml_stanh_data_internal(%rip), %zmm4
> +        vpermt2ps _sC+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm9
> +        vpermt2ps _sP6+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm11
> +        vpermt2ps _sP7+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm4
> +        vpermt2ps _sP5+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm12
> +        vpermt2ps _sP4+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm13
> +        vpermt2ps _sP3+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm14
> +        vpermt2ps _sP2+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm15
> +        vpandnd   %zmm3, %zmm3, %zmm2{%k1}
> +        vptestmd  %zmm2, %zmm2, %k0
> +        vmovups   _sP0+__svml_stanh_data_internal(%rip), %zmm3
> +        vsubps    {rn-sae}, %zmm9, %zmm8, %zmm2
> +        kmovw     %k0, %edx
> +        vfmadd213ps {rn-sae}, %zmm11, %zmm2, %zmm4
> +        vpermt2ps _sP0+64+__svml_stanh_data_internal(%rip), %zmm10, %zmm3
> +        vfmadd213ps {rn-sae}, %zmm12, %zmm2, %zmm4
> +        vfmadd213ps {rn-sae}, %zmm13, %zmm2, %zmm4
> +        vfmadd213ps {rn-sae}, %zmm14, %zmm2, %zmm4
> +        vfmadd213ps {rn-sae}, %zmm15, %zmm2, %zmm4
> +        vfmadd213ps {rn-sae}, %zmm3, %zmm2, %zmm4
> +        vorps     %zmm0, %zmm4, %zmm0
> +        testl     %edx, %edx
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0 zmm1
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %zmm1, 64(%rsp)
> +        vmovups   %zmm0, 128(%rsp)
> +                                # LOE rbx r12 r13 r14 r15 edx zmm0
> +
> +        xorl      %eax, %eax
> +                                # LOE rbx r12 r13 r14 r15 eax edx
> +
> +        vzeroupper
> +        movq      %r12, 16(%rsp)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        movl      %eax, %r12d
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        movl      %edx, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $16, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   128(%rsp), %zmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -176; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x50, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -184; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x48, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -64; DW_OP_and; DW_OP_const4s: -192; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xc0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x40, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r12 r13 r14 r15 zmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     64(%rsp,%r14,4), %xmm0
> +        call      tanhf@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 128(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVeN16v_tanhf_skx)
> +
> +        .section .rodata, "a"
> +        .align 64
> +
> +#ifdef __svml_stanh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(64)) VUINT32 _sC[32][1];
> +        __declspec(align(64)) VUINT32 _sP0[32][1];
> +        __declspec(align(64)) VUINT32 _sP2[32][1];
> +        __declspec(align(64)) VUINT32 _sP3[32][1];
> +        __declspec(align(64)) VUINT32 _sP4[32][1];
> +        __declspec(align(64)) VUINT32 _sP5[32][1];
> +        __declspec(align(64)) VUINT32 _sP6[32][1];
> +        __declspec(align(64)) VUINT32 _sP7[32][1];
> +        __declspec(align(64)) VUINT32 _iExpMantMask_UISA[16][1];
> +        __declspec(align(64)) VUINT32 _iMinIdxOfsMask_UISA[16][1];
> +        __declspec(align(64)) VUINT32 _iMaxIdxMask_UISA[16][1];
> +        __declspec(align(64)) VUINT32 _sSignMask[16][1];
> +        __declspec(align(64)) VUINT32 _sAbsMask[16][1];
> +        __declspec(align(64)) VUINT32 _iExpMantMask[16][1];
> +        __declspec(align(64)) VUINT32 _iExpMask[16][1];
> +        __declspec(align(64)) VUINT32 _iMinIdxOfsMask[16][1];
> +        __declspec(align(64)) VUINT32 _iMaxIdxMask[16][1];
> +} __svml_stanh_data_internal;
> +#endif
> +__svml_stanh_data_internal:
> +        /*== _sC ==*/
> +        .long 0x00000000, 0x3d700000, 0x3d900000, 0x3db00000
> +        .long 0x3dd00000, 0x3df00000, 0x3e100000, 0x3e300000
> +        .long 0x3e500000, 0x3e700000, 0x3e900000, 0x3eb00000
> +        .long 0x3ed00000, 0x3ef00000, 0x3f100000, 0x3f300000
> +        .long 0x3f500000, 0x3f700000, 0x3f900000, 0x3fb00000
> +        .long 0x3fd00000, 0x3ff00000, 0x40100000, 0x40300000
> +        .long 0x40500000, 0x40700000, 0x40900000, 0x40b00000
> +        .long 0x40d00000, 0x40f00000, 0x41100000, 0x00000000
> +        /*== p0 ==*/
> +        .align 64
> +        .long 0x00000000, 0x3d6fb9c9, 0x3d8fc35f, 0x3daf9169
> +        .long 0x3dcf49ab, 0x3deee849, 0x3e0f0ee8, 0x3e2e4984
> +        .long 0x3e4d2f8e, 0x3e6bb32e, 0x3e8c51cd, 0x3ea96163
> +        .long 0x3ec543f1, 0x3edfd735, 0x3f028438, 0x3f18abf0
> +        .long 0x3f2bc480, 0x3f3bec1c, 0x3f4f2e5b, 0x3f613c53
> +        .long 0x3f6ce37d, 0x3f743c4f, 0x3f7a5feb, 0x3f7dea85
> +        .long 0x3f7f3b3d, 0x3f7fb78c, 0x3f7fefd4, 0x3f7ffdd0
> +        .long 0x3f7fffb4, 0x3f7ffff6, 0x3f7fffff, 0x3f800000
> +        /*== p2 ==*/
> +        .align 64
> +        .long 0x3f800000, 0x3f7f1f84, 0x3f7ebd11, 0x3f7e1e5f
> +        .long 0x3f7d609f, 0x3f7c842d, 0x3f7b00e5, 0x3f789580
> +        .long 0x3f75b8ad, 0x3f726fd9, 0x3f6cc59b, 0x3f63fb92
> +        .long 0x3f59ff97, 0x3f4f11d7, 0x3f3d7573, 0x3f24f360
> +        .long 0x3f0cbfe7, 0x3eec1a69, 0x3eb0a801, 0x3e6753a2
> +        .long 0x3e132f1a, 0x3db7e7d3, 0x3d320845, 0x3c84d3d4
> +        .long 0x3bc477b7, 0x3b10d3da, 0x3a01601e, 0x388c1a3b
> +        .long 0x3717b0da, 0x35a43bce, 0x338306c6, 0x00000000
> +        /*== p3 ==*/
> +        .align 64
> +        .long 0xb0343c7b, 0xbd6ee69d, 0xbd8f0da7, 0xbdae477d
> +        .long 0xbdcd2a1f, 0xbdeba80d, 0xbe0c443b, 0xbe293cf3
> +        .long 0xbe44f282, 0xbe5f3651, 0xbe81c7c0, 0xbe96d7ca
> +        .long 0xbea7fb8e, 0xbeb50e9e, 0xbec12efe, 0xbec4be92
> +        .long 0xbebce070, 0xbead510e, 0xbe8ef7d6, 0xbe4b8704
> +        .long 0xbe083237, 0xbdaf7449, 0xbd2e1ec4, 0xbc83bf06
> +        .long 0xbbc3e0b5, 0xbb10aadc, 0xba0157db, 0xb88c18f2
> +        .long 0xb717b096, 0xb5a43bae, 0xb383012c, 0x00000000
> +        /*== p4 ==*/
> +        .align 64
> +        .long 0xbeaaaaa5, 0xbeab0612, 0xbea7f01f, 0xbea4e120
> +        .long 0xbea387b7, 0xbea15962, 0xbe9d57f7, 0xbe976b5a
> +        .long 0xbe90230d, 0xbe880dff, 0xbe7479b3, 0xbe4c3d88
> +        .long 0xbe212482, 0xbdeb8cba, 0xbd5e78ad, 0x3c6b5e6e
> +        .long 0x3d839143, 0x3dc21ee1, 0x3de347af, 0x3dcbec96
> +        .long 0x3d99ef2d, 0x3d542ea1, 0x3cdde701, 0x3c2cca67
> +        .long 0x3b81cb27, 0x3ac073a1, 0x39ac3032, 0x383a94d9
> +        .long 0x36ca081d, 0x355abd4c, 0x332b3cb6, 0x00000000
> +        /*== p5 ==*/
> +        .align 64
> +        .long 0xb76dd6b9, 0xbe1c276d, 0x3c1dcf2f, 0x3dc1a78d
> +        .long 0x3d96f985, 0x3da2b61b, 0x3dc13397, 0x3dd2f670
> +        .long 0x3df48a0a, 0x3e06c5a8, 0x3e1a3aba, 0x3e27c405
> +        .long 0x3e2e78d0, 0x3e2c3e44, 0x3e1d3097, 0x3df4a8f4
> +        .long 0x3da38508, 0x3d31416a, 0x3b562657, 0xbcaeeac9
> +        .long 0xbcce9419, 0xbcaaeac4, 0xbc49e7d0, 0xbba71ddd
> +        .long 0xbb003b0e, 0xba3f9a05, 0xb92c08a7, 0xb7ba9232
> +        .long 0xb64a0b0f, 0xb4dac169, 0xb2ab78ac, 0x00000000
> +        /*== p6 ==*/
> +        .align 64
> +        .long 0x3e0910e9, 0x43761143, 0x4165ecdc, 0xc190f756
> +        .long 0xc08c097d, 0xc02ba813, 0xbf7f6bda, 0x3f2b1dc0
> +        .long 0x3ece105d, 0x3f426a94, 0xbadb0dc4, 0x3da43b17
> +        .long 0xbd51ab88, 0xbcaea23d, 0xbd3b6d8d, 0xbd6caaad
> +        .long 0xbd795bed, 0xbd5fddda, 0xbd038f3b, 0xbc1cad63
> +        .long 0x3abb4766, 0x3b95f10b, 0x3b825873, 0x3afaea66
> +        .long 0x3a49f878, 0x39996bf3, 0x388f3e6c, 0x371bb0e3
> +        .long 0x35a8a5e6, 0x34369b17, 0x322487b0, 0x00000000
> +        /*== p7 ==*/
> +        .align 64
> +        .long 0xbc0e2f66, 0x460bda12, 0x43d638ef, 0xc3e11c3e
> +        .long 0xc2baa4e9, 0xc249da2d, 0xc1859b82, 0x40dd5b57
> +        .long 0x40494640, 0x40c730a8, 0xbf0f160e, 0x3e30e76f
> +        .long 0xbea81387, 0xbdb26a1c, 0xbd351e57, 0xbb4c01a0
> +        .long 0x3c1d7bfb, 0x3c722cd1, 0x3c973f1c, 0x3c33a31b
> +        .long 0x3b862ef4, 0x3a27b3d0, 0xba3b5907, 0xba0efc22
> +        .long 0xb97f9f0f, 0xb8c8af50, 0xb7bdddfb, 0xb64f2950
> +        .long 0xb4e085b1, 0xb3731dfa, 0xb15a1f04, 0x00000000
> +        .align 64
> +        .long 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000, 0x7fe00000           /* _iExpMantMask_UISA     */
> +        .align 64
> +        .long 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000, 0x3d400000           /* _iMinIdxOfsMask_UISA   */
> +        .align 64
> +        .long 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000, 0x03e00000           /* _iMaxIdxMask_UISA      */
> +        .align 64
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000           /* _sSignMask        */
> +        .align 64
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff           /* _sAbsMask         */
> +        .align 64
> +        .long 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000           /* _iExpMantMask     */
> +        .align 64
> +        .long 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000           /* _iExpMask         */
> +        .align 64
> +        .long 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000           /* _iMinIdxOfsMask   */
> +        .align 64
> +        .long 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000           /* _iMaxIdxMask      */
> +        .align 64
> +        .type  __svml_stanh_data_internal,@object
> +        .size  __svml_stanh_data_internal,.-__svml_stanh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core-sse2.S
> new file mode 100644
> index 0000000000..cd290db337
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core-sse2.S
> @@ -0,0 +1,20 @@
> +/* SSE2 version of vectorized tanhf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVbN4v_tanhf _ZGVbN4v_tanhf_sse2
> +#include "../svml_s_tanhf4_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core.c
> new file mode 100644
> index 0000000000..2dcb1f3676
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized tanhf, vector length is 4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVbN4v_tanhf
> +#include "ifunc-mathvec-sse4_1.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVbN4v_tanhf, __GI__ZGVbN4v_tanhf,
> +              __redirect__ZGVbN4v_tanhf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core_sse4.S
> new file mode 100644
> index 0000000000..3a0ce20473
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core_sse4.S
> @@ -0,0 +1,832 @@
> +/* Function tanhf vectorized with SSE4.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   NOTE: Since the hyperbolic tangent function is odd
> + *         (tanh(x) = -tanh(-x)), the algorithm below deals with the absolute
> + *         value of the argument |x|: tanh(x) = sign(x) * tanh(|x|)
> + *
> + *   We use a table lookup method to compute tanh(|x|).
> + *   The basic idea is to split the input range into a number of subintervals
> + *   and to approximate tanh(.) with a polynomial on each of them.
> + *
> + *   IEEE SPECIAL CONDITIONS:
> + *   x = [+,-]0, r = [+,-]0
> + *   x = +Inf,   r = +1
> + *   x = -Inf,   r = -1
> + *   x = QNaN,   r = QNaN
> + *   x = SNaN,   r = QNaN
> + *
> + *
> + *   ALGORITHM DETAILS
> + *   We handle special values in a callout function, separately from the
> + *   main path computations.  "Special" inputs for this algorithm are:
> + *   INF, NAN, |x| > HUGE_THRESHOLD
> + *
> + *
> + *   Main path computations are organized as follows:
> + *   Actually we split the interval [0, SATURATION_THRESHOLD)
> + *   into a number of subintervals.  On each subinterval we approximate tanh(.)
> + *   with a minimax polynomial of pre-defined degree.  Polynomial coefficients
> + *   are computed beforehand and stored in a table.  We also use
> + *
> + *       y := |x| + B,
> + *
> + *   where B depends on the subinterval and is used to bring the argument
> + *   closer to zero.
> + *   We also add a large fake interval [SATURATION_THRESHOLD, HUGE_THRESHOLD],
> + *   whose stored coefficients are 1.0, 0.0, 0.0, ... - just to preserve the
> + *   main path computation logic while returning 1.0 for all such arguments.
> + *
> + *   Hence the reconstruction looks as follows:
> + *   we extract the polynomial and range reduction coefficients
> + *        (Pj and B) corresponding to the subinterval to which |x| belongs,
> + *        and return
> + *
> + *       r := sign(x) * (P0 + P1 * y + ... + Pn * y^n)
> + *
> + *   NOTE: we use a multiprecision technique to multiply and sum the first
> + *         K terms of the polynomial, so Pj, j = 0..K, are each stored in the
> + *         table as a pair of target-precision numbers (Pj and PLj) to
> + *         achieve wider-than-target precision.
> + *
> + *
> + */
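
The SPECIAL_VALUES_BRANCH / RANGEMASK_CHECK / SCALAR_MATH_CALL blocks in this
and the other variants all implement the same idea: walk the range mask
produced by the main path and recompute only the flagged lanes with the
scalar tanhf.  A rough C sketch, with a hypothetical helper name and the
4-lane width of this SSE4 version:

    #include <math.h>

    /* Lanes whose bit is set in MASK (huge, INF or NaN inputs) are
       recomputed with the scalar routine.  */
    static void
    tanhf_fixup_special (const float in[4], float out[4], unsigned int mask)
    {
      for (int lane = 0; lane < 4; lane++)
        if (mask & (1u << lane))          /* btl %r12d, %r13d */
          out[lane] = tanhf (in[lane]);   /* call tanhf@PLT per lane */
    }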
> +
> +/* Offsets for data table __svml_stanh_data_internal
> + */
> +#define _dbP                           0
> +#define _sSignMask                     4288
> +#define _sAbsMask                      4304
> +#define _iExpMantMask                  4320
> +#define _iExpMask                      4336
> +#define _iMinIdxOfsMask                4352
> +#define _iMaxIdxMask                   4368
> +
> +#include <sysdep.h>
> +
> +        .text
> +       .section .text.sse4,"ax",@progbits
> +ENTRY(_ZGVbN4v_tanhf_sse4)
> +        subq      $72, %rsp
> +        cfi_def_cfa_offset(80)
> +        movaps    %xmm0, %xmm5
> +
> +/* Here huge arguments, INF and NaNs are filtered out to callout. */
> +        movdqu    _iExpMantMask+__svml_stanh_data_internal(%rip), %xmm9
> +        lea       _dbP+16+__svml_stanh_data_internal(%rip), %r8
> +        pand      %xmm5, %xmm9
> +
> +/* if VMIN, VMAX is defined for I type */
> +        pxor      %xmm7, %xmm7
> +        movdqa    %xmm9, %xmm6
> +        psubd     _iMinIdxOfsMask+__svml_stanh_data_internal(%rip), %xmm9
> +
> +/*
> + *  small table specific variables *
> + *  Constant loading
> + */
> +        movdqu    _iMaxIdxMask+__svml_stanh_data_internal(%rip), %xmm10
> +        movdqa    %xmm9, %xmm11
> +        movdqa    %xmm9, %xmm8
> +        pcmpgtd   %xmm10, %xmm11
> +        pcmpgtd   %xmm7, %xmm8
> +        movdqa    %xmm11, %xmm14
> +        pand      %xmm8, %xmm9
> +        andps     %xmm11, %xmm10
> +        andnps    %xmm9, %xmm14
> +        orps      %xmm10, %xmm14
> +        psrld     $14, %xmm14
> +        movd      %xmm14, %edx
> +        pshufd    $1, %xmm14, %xmm12
> +        pshufd    $2, %xmm14, %xmm13
> +        movd      %xmm12, %ecx
> +        pshufd    $3, %xmm14, %xmm15
> +        movups    _sAbsMask+__svml_stanh_data_internal(%rip), %xmm3
> +        movslq    %edx, %rdx
> +        andps     %xmm5, %xmm3
> +        movslq    %ecx, %rcx
> +        pcmpgtd   _iExpMask+__svml_stanh_data_internal(%rip), %xmm6
> +        movd      %xmm13, %esi
> +        movups    -16(%rdx,%r8), %xmm2
> +        movaps    %xmm2, %xmm0
> +        movd      %xmm15, %edi
> +        movmskps  %xmm6, %eax
> +        movups    -16(%rcx,%r8), %xmm6
> +        unpcklpd  %xmm6, %xmm0
> +        unpckhpd  %xmm6, %xmm2
> +        cvtps2pd  %xmm3, %xmm6
> +        movhlps   %xmm3, %xmm3
> +        cvtps2pd  %xmm3, %xmm3
> +        movslq    %esi, %rsi
> +        movslq    %edi, %rdi
> +        movups    (%rcx,%r8), %xmm8
> +        movups    (%rdx,%r8), %xmm12
> +        movups    (%rsi,%r8), %xmm13
> +        movaps    %xmm12, %xmm10
> +        movups    (%rdi,%r8), %xmm9
> +        movaps    %xmm13, %xmm11
> +        unpckhpd  %xmm8, %xmm12
> +        unpckhpd  %xmm9, %xmm13
> +        mulpd     %xmm6, %xmm12
> +        mulpd     %xmm3, %xmm13
> +        unpcklpd  %xmm8, %xmm10
> +        unpcklpd  %xmm9, %xmm11
> +        addpd     %xmm10, %xmm12
> +        addpd     %xmm11, %xmm13
> +        mulpd     %xmm6, %xmm12
> +        mulpd     %xmm3, %xmm13
> +        addpd     %xmm2, %xmm12
> +        movups    -16(%rsi,%r8), %xmm1
> +        movups    -16(%rdi,%r8), %xmm7
> +        movaps    %xmm1, %xmm14
> +        unpckhpd  %xmm7, %xmm1
> +        addpd     %xmm1, %xmm13
> +        mulpd     %xmm12, %xmm6
> +        mulpd     %xmm13, %xmm3
> +        addpd     %xmm0, %xmm6
> +        unpcklpd  %xmm7, %xmm14
> +        addpd     %xmm14, %xmm3
> +        cvtpd2ps  %xmm6, %xmm0
> +        cvtpd2ps  %xmm3, %xmm1
> +        movups    _sSignMask+__svml_stanh_data_internal(%rip), %xmm4
> +        movlhps   %xmm1, %xmm0
> +        andps     %xmm5, %xmm4
> +        orps      %xmm4, %xmm0
> +        testl     %eax, %eax
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx rbp r12 r13 r14 r15 eax xmm0 xmm5
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        addq      $72, %rsp
> +        cfi_def_cfa_offset(8)
> +        ret
> +        cfi_def_cfa_offset(80)
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        movups    %xmm5, 32(%rsp)
> +        movups    %xmm0, 48(%rsp)
> +                                # LOE rbx rbp r12 r13 r14 r15 eax
> +
> +        xorl      %edx, %edx
> +        movq      %r12, 16(%rsp)
> +        cfi_offset(12, -64)
> +        movl      %edx, %r12d
> +        movq      %r13, 8(%rsp)
> +        cfi_offset(13, -72)
> +        movl      %eax, %r13d
> +        movq      %r14, (%rsp)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $4, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx rbp r15 r12d r13d
> +
> +        movq      16(%rsp), %r12
> +        cfi_restore(12)
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        movups    48(%rsp), %xmm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        cfi_offset(12, -64)
> +        cfi_offset(13, -72)
> +        cfi_offset(14, -80)
> +                                # LOE rbx rbp r12 r13 r14 r15 xmm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      tanhf@PLT
> +                                # LOE rbx rbp r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 48(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx rbp r15 r12d r13d
> +END(_ZGVbN4v_tanhf_sse4)
> +
> +        .section .rodata, "a"
> +        .align 16
> +
> +#ifdef __svml_stanh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(16)) VUINT32 _dbP[(134*4)][2];
> +        __declspec(align(16)) VUINT32 _sSignMask[4][1];
> +        __declspec(align(16)) VUINT32 _sAbsMask[4][1];
> +        __declspec(align(16)) VUINT32 _iExpMantMask[4][1];
> +        __declspec(align(16)) VUINT32 _iExpMask[4][1];
> +        __declspec(align(16)) VUINT32 _iMinIdxOfsMask[4][1];
> +        __declspec(align(16)) VUINT32 _iMaxIdxMask[4][1];
> +} __svml_stanh_data_internal;
> +#endif
> +__svml_stanh_data_internal:
> +        /* Pol_000:  err=7.93e-09, x in [0.0000000; 0.0312500]. */
> +        .quad 0x0000000000000000  /* A00 = +0.000000000000000000000e-01 */
> +        .quad 0x3FF00000022C70EB  /* A01 = +1.000000008097283510367e+00 */
> +        .quad 0xBED00E878CFFA194  /* A02 = -3.828228912518614443549e-06 */
> +        .quad 0xBFD551766D0607A9  /* A03 = -3.330970825846813476723e-01 */
> +        .quad 0xBE53D60CE3E4C297  /* A00 = -1.847383956330407336230e-08 */
> +        .quad 0x3FF000024177CF5C  /* A01 = +1.000002151235967140508e+00 */
> +        .quad 0xBF1758BC94A51A25  /* A02 = -8.906031613262943753568e-05 */
> +        .quad 0xBFD53EAE67E0D4F0  /* A03 = -3.319507612644221339337e-01 */
> +        .quad 0xBE5A9E47EF32D6FE  /* A00 = -2.479020984039698285657e-08 */
> +        .quad 0x3FF00002DA983057  /* A01 = +1.000002721676556793895e+00 */
> +        .quad 0xBF1BD953509E94AA  /* A02 = -1.062352277175377670507e-04 */
> +        .quad 0xBFD53BDB562EEDD5  /* A03 = -3.317783681520414806876e-01 */
> +        .quad 0xBE6191BBE496D294  /* A00 = -3.272532162914017685901e-08 */
> +        .quad 0x3FF0000390492017  /* A01 = +1.000003398528866105366e+00 */
> +        .quad 0xBF20727E814A57CE  /* A02 = -1.254825043772153972919e-04 */
> +        .quad 0xBFD538DE060A6F22  /* A03 = -3.315959033004550748913e-01 */
> +        .quad 0xBE66DAFA2A893A25  /* A00 = -4.257146219278012568149e-08 */
> +        .quad 0x3FF0000465E08CD1  /* A01 = +1.000004194219219266770e+00 */
> +        .quad 0xBF2341C765EF91B6  /* A02 = -1.469188600530365522261e-04 */
> +        .quad 0xBFD535B6841FAF9E  /* A03 = -3.314033785124993469751e-01 */
> +        .quad 0xBE6D5794E361E964  /* A00 = -5.465394929765249413434e-08 */
> +        .quad 0x3FF000055EE2A0CB  /* A01 = +1.000005121846742950353e+00 */
> +        .quad 0xBF265E6C77E66C8B  /* A02 = -1.706607253709506650304e-04 */
> +        .quad 0xBFD53264DDCCEDA6  /* A03 = -3.312008062382240103361e-01 */
> +        .quad 0xBE729C844D374A6E  /* A00 = -6.933284462462096107184e-08 */
> +        .quad 0x3FF000067F019093  /* A01 = +1.000006195180536350264e+00 */
> +        .quad 0xBF29CC5348D6DCE5  /* A02 = -1.968242326435338705130e-04 */
> +        .quad 0xBFD52EE92121ED35  /* A03 = -3.309881995734998416658e-01 */
> +        .quad 0xBE775AEA17EAA872  /* A00 = -8.700465590574974405858e-08 */
> +        .quad 0x3FF00007CA1D66B8  /* A01 = +1.000007428656699559610e+00 */
> +        .quad 0xBF2D8F5EB98A2637  /* A02 = -2.255252009216044881395e-04 */
> +        .quad 0xBFD52B435CDF9128  /* A03 = -3.307655722585587376727e-01 */
> +        .quad 0xBE7D04DA28C343F0  /* A00 = -1.081040272327705484794e-07 */
> +        .quad 0x3FF000094443CCF5  /* A01 = +1.000008837375216730337e+00 */
> +        .quad 0xBF30D5B76C947AE5  /* A02 = -2.568791210978817814332e-04 */
> +        .quad 0xBFD52773A0776FAD  /* A03 = -3.305329386764651045105e-01 */
> +        .quad 0xBE81DD77A12C51C7  /* A00 = -1.331054169875768625701e-07 */
> +        .quad 0x3FF0000AF1AFD2DA  /* A01 = +1.000010437096696680470e+00 */
> +        .quad 0xBF331230624C1680  /* A02 = -2.910011410651516805537e-04 */
> +        .quad 0xBFD52379FC0B61DF  /* A03 = -3.302903138515186909352e-01 */
> +        .quad 0xBE85D04EEEB3C435  /* A00 = -1.625247628488202841012e-07 */
> +        .quad 0x3FF0000CD6C9B1F2  /* A01 = +1.000012244238970726684e+00 */
> +        .quad 0xBF357F0742FADDD4  /* A02 = -3.280060509313874068243e-04 */
> +        .quad 0xBFD51F56806D0E81  /* A03 = -3.300377134475880880338e-01 */
> +        .quad 0xBE8A6E289B59681B  /* A00 = -1.969211333326924655065e-07 */
> +        .quad 0x3FF0000EF8268F72  /* A01 = +1.000014275873550406715e+00 */
> +        .quad 0xBF381E277A1B747A  /* A02 = -3.680082682942575423093e-04 */
> +        .quad 0xBFD51B093F1D6FD4  /* A03 = -3.297751537663746734808e-01 */
> +        .quad 0xBE8FCBC40EE9ABD5  /* A00 = -2.368983653301529373887e-07 */
> +        .quad 0x3FF000115A883B6C  /* A01 = +1.000016549721943981410e+00 */
> +        .quad 0xBF3AF17AC974B3D9  /* A02 = -4.111218235774406434303e-04 */
> +        .quad 0xBFD516924A4C549C  /* A03 = -3.295026517456081105450e-01 */
> +        .quad 0xBE92FFBC60A3F956  /* A00 = -2.831066871072026054144e-07 */
> +        .quad 0x3FF0001402DCED8A  /* A01 = +1.000019084151832604590e+00 */
> +        .quad 0xBF3DFAE9390C4801  /* A02 = -4.574603454311488280083e-04 */
> +        .quad 0xBFD511F1B4D7DC3A  /* A03 = -3.292202249571719585575e-01 */
> +        .quad 0xBE9690A22F96D5AD  /* A00 = -3.362443262393081632612e-07 */
> +        .quad 0x3FF00016F63EFF5D  /* A01 = +1.000021898173108825247e+00 */
> +        .quad 0xBF409E2C839605BB  /* A02 = -5.071370461992499986334e-04 */
> +        .quad 0xBFD50D27924BEE00  /* A03 = -3.289278916051614487515e-01 */
> +        .quad 0xBE9AA56C65E72A73  /* A00 = -3.970591019557469835586e-07 */
> +        .quad 0x3FF0001A39F4A43E  /* A01 = +1.000025011433776978009e+00 */
> +        .quad 0xBF425BD74C3D6667  /* A02 = -5.602647074553602319844e-04 */
> +        .quad 0xBFD50833F6E1ABA2  /* A03 = -3.286256705238718156536e-01 */
> +        .quad 0xBE9F4BD4FF1A83B0  /* A00 = -4.663500013744687071912e-07 */
> +        .quad 0x3FF0001DD36F9EC2  /* A01 = +1.000028444215715683896e+00 */
> +        .quad 0xBF44376634149405  /* A02 = -6.169556656102642569831e-04 */
> +        .quad 0xBFD50316F77EDEE5  /* A03 = -3.283135811757190158922e-01 */
> +        .quad 0xBEA3B625387BB079  /* A00 = -5.874486399249461304297e-07 */
> +        .quad 0x3FF00023E14CFBA9  /* A01 = +1.000034217911642153709e+00 */
> +        .quad 0xBF47392F923218D2  /* A02 = -7.087213783883111826306e-04 */
> +        .quad 0xBFD4FB1FACDEB938  /* A03 = -3.278273761924483942209e-01 */
> +        .quad 0xBEAA6E24F543500A  /* A00 = -7.876828740601738750574e-07 */
> +        .quad 0x3FF0002D5C6E8412  /* A01 = +1.000043259679163742959e+00 */
> +        .quad 0xBF4BAF02BD7FDD70  /* A02 = -8.448375110664940040861e-04 */
> +        .quad 0xBFD4EFEE6527A7DE  /* A03 = -3.271442401734229177279e-01 */
> +        .quad 0xBEB16E3EBE2157D0  /* A00 = -1.038947396133402500647e-06 */
> +        .quad 0x3FF00038990FEE2F  /* A01 = +1.000053975962952312884e+00 */
> +        .quad 0xBF50569481C574CB  /* A02 = -9.972048056490652716971e-04 */
> +        .quad 0xBFD4E419278DA2B4  /* A03 = -3.264220129263251113372e-01 */
> +        .quad 0xBEB6A7B6723165D4  /* A00 = -1.350350836279403750524e-06 */
> +        .quad 0x3FF00045CAB4158E  /* A01 = +1.000066558657042303793e+00 */
> +        .quad 0xBF531D7C9C849108  /* A02 = -1.166698160951775212202e-03 */
> +        .quad 0xBFD4D7A0BB33B152  /* A03 = -3.256608799117844954552e-01 */
> +        .quad 0xBEBD0EE2A8654AFD  /* A00 = -1.732000471561702711532e-06 */
> +        .quad 0x3FF00055276F18D6  /* A01 = +1.000081209219890521211e+00 */
> +        .quad 0xBF562FDBA3FB6C6C  /* A02 = -1.354183666925102939860e-03 */
> +        .quad 0xBFD4CA85F1B93DB2  /* A03 = -3.248610363561638125773e-01 */
> +        .quad 0xBEC269D4036A207E  /* A00 = -2.195047297096822741730e-06 */
> +        .quad 0x3FF00066E7DA6E4E  /* A01 = +1.000098138500919997540e+00 */
> +        .quad 0xBF5991499FC36B3A  /* A02 = -1.560518167983372759405e-03 */
> +        .quad 0xBFD4BCC9A72283D6  /* A03 = -3.240226871658341556426e-01 */
> +        .quad 0xBEC7154B6C09CFE1  /* A00 = -2.751729738565190291276e-06 */
> +        .quad 0x3FF0007B47086B80  /* A01 = +1.000117566559055148900e+00 */
> +        .quad 0xBF5D455433B4F8F4  /* A02 = -1.786548832412968197680e-03 */
> +        .quad 0xBFD4AE6CC1BFE145  /* A03 = -3.231460468373550942722e-01 */
> +        .quad 0xBECCA68CC64A0F8A  /* A00 = -3.415415948561670285790e-06 */
> +        .quad 0x3FF00092827742F7  /* A01 = +1.000139722473418535387e+00 */
> +        .quad 0xBF60A7BF15A527AF  /* A02 = -2.033112728132522705610e-03 */
> +        .quad 0xBFD49F703214084C  /* A03 = -3.222313393636155876010e-01 */
> +        .quad 0xBED19E68676B241B  /* A00 = -4.200644630977303616698e-06 */
> +        .quad 0x3FF000ACDA037B26  /* A01 = +1.000164844146362863597e+00 */
> +        .quad 0xBF62D99F836A02F8  /* A02 = -2.301036405072284102280e-03 */
> +        .quad 0xBFD48FD4F2B91B28  /* A03 = -3.212787981359945810311e-01 */
> +        .quad 0xBED57CF4B0C7AA54  /* A00 = -5.123164339408145209103e-06 */
> +        .quad 0x3FF000CA8FD9E1A1  /* A01 = +1.000193178099017865534e+00 */
> +        .quad 0xBF653A014548E686  /* A02 = -2.591135484433962181405e-03 */
> +        .quad 0xBFD47F9C0844B38F  /* A03 = -3.202886658426046806447e-01 */
> +        .quad 0xBEDA012B1B1A41E2  /* A00 = -6.199971197454598722328e-06 */
> +        .quad 0x3FF000EBE868FDF4  /* A01 = +1.000224979259539459520e+00 */
> +        .quad 0xBF67CA9427E0A544  /* A02 = -2.904214255086275467410e-03 */
> +        .quad 0xBFD46EC6812ADB37  /* A03 = -3.192611943626845749655e-01 */
> +        .quad 0xBEDF3EAC5BF12194  /* A00 = -7.449344990702664567927e-06 */
> +        .quad 0x3FF001112A520784  /* A01 = +1.000260510744255704196e+00 */
> +        .quad 0xBF6A8D01ABDA4DC4  /* A02 = -3.241065277345108255891e-03 */
> +        .quad 0xBFD45D55759FFA4A  /* A03 = -3.181966446572103146551e-01 */
> +        .quad 0xBEE2A541BC274267  /* A00 = -8.890883582164319970972e-06 */
> +        .quad 0x3FF0013A9E5961F2  /* A01 = +1.000300043631906721231e+00 */
> +        .quad 0xBF6D82ECD080C540  /* A02 = -3.602468994380686462264e-03 */
> +        .quad 0xBFD44B4A0779C0AD  /* A03 = -3.170952866557950611259e-01 */
> +        .quad 0xBEE61D97609A27F4  /* A00 = -1.054553560499505625520e-05 */
> +        .quad 0x3FF001688F56A3AF  /* A01 = +1.000343856731187974773e+00 */
> +        .quad 0xBF7056F8EFB683EC  /* A02 = -3.989193351487490407647e-03 */
> +        .quad 0xBFD438A5620F0F74  /* A03 = -3.159573991399533543500e-01 */
> +        .quad 0xBEEA145429EDD370  /* A00 = -1.243563138839952927732e-05 */
> +        .quad 0x3FF0019B4A242A67  /* A01 = +1.000392236341804297339e+00 */
> +        .quad 0xBF7207D31CA78D9B  /* A02 = -4.401993423445739288258e-03 */
> +        .quad 0xBFD42568BA16E7CD  /* A03 = -3.147832696228050619602e-01 */
> +        .quad 0xBEEE96370D52680F  /* A00 = -1.458491207477835326165e-05 */
> +        .quad 0x3FF001D31D8E4115  /* A01 = +1.000445476009251821736e+00 */
> +        .quad 0xBF73D4CC11EDC094  /* A02 = -4.841611050196221316400e-03 */
> +        .quad 0xBFD411954D8664E7  /* A03 = -3.135731942252974469021e-01 */
> +        .quad 0xBEF338C046215EF8  /* A00 = -1.833122622260562810219e-05 */
> +        .quad 0x3FF00230C32C2EC1  /* A01 = +1.000534784691737621998e+00 */
> +        .quad 0xBF76BD019BCC5DAF  /* A02 = -5.551344188254799492943e-03 */
> +        .quad 0xBFD3F2C7156DC21E  /* A03 = -3.116929730668135389848e-01 */
> +        .quad 0xBEF9B15EAE411EAE  /* A00 = -2.450261207822986676092e-05 */
> +        .quad 0x3FF002C2DF057A4D  /* A01 = +1.000674124886830940184e+00 */
> +        .quad 0xBF7B08CCD9AC1E30  /* A02 = -6.600189396301511801646e-03 */
> +        .quad 0xBFD3C7A7A114FED8  /* A03 = -3.090609620157755976777e-01 */
> +        .quad 0xBF00E36483C373B3  /* A00 = -3.221178528332122595812e-05 */
> +        .quad 0x3FF0036F419480D7  /* A01 = +1.000838524028997644777e+00 */
> +        .quad 0xBF7FD255D1777007  /* A02 = -7.768950679260206403087e-03 */
> +        .quad 0xBFD39A453911D6CE  /* A03 = -3.062909180947429588215e-01 */
> +        .quad 0xBF05DFA04DD12059  /* A00 = -4.172046622180685472624e-05 */
> +        .quad 0x3FF00438B2A03D8D  /* A01 = +1.001030633695197069599e+00 */
> +        .quad 0xBF828F8DBB4A9D10  /* A02 = -9.062869337255224921890e-03 */
> +        .quad 0xBFD36AAB704697D9  /* A03 = -3.033856007044711255993e-01 */
> +        .quad 0xBF0BF3E0C647DEFB  /* A00 = -5.331544597092331081714e-05 */
> +        .quad 0x3FF005221063D36D  /* A01 = +1.001253189109060359741e+00 */
> +        .quad 0xBF857A2CB3C96102  /* A02 = -1.048693584122917590862e-02 */
> +        .quad 0xBFD338E65BBB4FEC  /* A03 = -3.003478904549854444639e-01 */
> +        .quad 0xBF11A506ED7C9D31  /* A00 = -6.730894835681591541979e-05 */
> +        .quad 0x3FF0062E4D0EA92A  /* A01 = +1.001508999829250345925e+00 */
> +        .quad 0xBF88AB82C2761AF3  /* A02 = -1.204588085125866091241e-02 */
> +        .quad 0xBFD305028D6BD206  /* A03 = -2.971807843271395688234e-01 */
> +        .quad 0xBF1607C0922D9BF1  /* A00 = -8.403885708006799337092e-05 */
> +        .quad 0x3FF007606C341961  /* A01 = +1.001800940198869449560e+00 */
> +        .quad 0xBF8C25E6DA487BCF  /* A02 = -1.374416688582682892494e-02 */
> +        .quad 0xBFD2CF0D0EE8F7B5  /* A03 = -2.938873906713255768075e-01 */
> +        .quad 0xBF1B3A8480A0A16D  /* A00 = -1.038688061788578038307e-04 */
> +        .quad 0x3FF008BB802D02D6  /* A01 = +1.002131939589323561535e+00 */
> +        .quad 0xBF8FEB8AE99FD100  /* A02 = -1.558598065819483124983e-02 */
> +        .quad 0xBFD297135BD0911B  /* A03 = -2.904709240558688843059e-01 */
> +        .quad 0xBF20ABB9BDB75C65  /* A00 = -1.271881327357976163798e-04 */
> +        .quad 0x3FF00A42A76D8CD1  /* A01 = +1.002504972472525901495e+00 */
> +        .quad 0xBF91FF3D752BB9E6  /* A02 = -1.757522609380570560722e-02 */
> +        .quad 0xBFD25D235C1F88B4  /* A03 = -2.869346999779154305799e-01 */
> +        .quad 0xBF243D3254425461  /* A00 = -1.544116913733432829448e-04 */
> +        .quad 0x3FF00BF909D1795E  /* A01 = +1.002923048355647051011e+00 */
> +        .quad 0xBF94304E04D44942  /* A02 = -1.971551804042204897316e-02 */
> +        .quad 0xBFD2214B5E61CFA6  /* A03 = -2.832821294498394371075e-01 */
> +        .quad 0xBF286070011B61CE  /* A00 = -1.859795307186510085994e-04 */
> +        .quad 0x3FF00DE1D5E1627E  /* A01 = +1.003389201612804537689e+00 */
> +        .quad 0xBF9689D5F4163F59  /* A02 = -2.201017668045266231780e-02 */
> +        .quad 0xBFD1E39A11C3B42C  /* A03 = -2.795167134743816728104e-01 */
> +        .quad 0xBF2D250B366A79E8  /* A00 = -2.223564326486314902259e-04 */
> +        .quad 0x3FF010003E134001  /* A01 = +1.003906481248123094829e+00 */
> +        .quad 0xBF990C9FF91F6F81  /* A02 = -2.446222265267250853271e-02 */
> +        .quad 0xBFD1A41E80084CDC  /* A03 = -2.756420374218586655246e-01 */
> +        .quad 0xBF314DB5DDC2A30E  /* A00 = -2.640313157465248123865e-04 */
> +        .quad 0x3FF012577608921B  /* A01 = +1.004477940624503018441e+00 */
> +        .quad 0xBF9BB9626875B0C9  /* A02 = -2.707437288829409385849e-02 */
> +        .quad 0xBFD162E80768A9D0  /* A03 = -2.716617653228725615122e-01 */
> +        .quad 0xBF346A6133808864  /* A00 = -3.115165050094957730625e-04 */
> +        .quad 0x3FF014EAAFCC88A3  /* A01 = +1.005106627192198898157e+00 */
> +        .quad 0xBF9E90BEF9BF7419  /* A02 = -2.984903716411588595059e-02 */
> +        .quad 0xBFD12006545F7FAD  /* A03 = -2.675796340899932457269e-01 */
> +        .quad 0xBF37F180DC3848EA  /* A00 = -3.653468704395550778821e-04 */
> +        .quad 0x3FF017BD19147861  /* A01 = +1.005795572250939295955e+00 */
> +        .quad 0xBFA0C9A14C702E07  /* A02 = -3.278831537326359207851e-02 */
> +        .quad 0xBFD0DB895B650092  /* A03 = -2.633994476818851682154e-01 */
> +        .quad 0xBF3BEC6AAC6D7635  /* A00 = -4.260788377246944457107e-04 */
> +        .quad 0x3FF01AD1D884E719  /* A01 = +1.006547780778822565040e+00 */
> +        .quad 0xBFA260B2A1B1434A  /* A02 = -3.589399551186163439542e-02 */
> +        .quad 0xBFD09581529E93D6  /* A03 = -2.591250712233067465817e-01 */
> +        .quad 0xBF4164E26167882B  /* A00 = -5.308251737086202562063e-04 */
> +        .quad 0x3FF01FEF14B62B81  /* A01 = +1.007796364693348545316e+00 */
> +        .quad 0xBFA4EB014538AA42  /* A02 = -4.085544557559163403315e-02 */
> +        .quad 0xBFD029D36FEAF41F  /* A03 = -2.525528519580024222613e-01 */
> +        .quad 0xBF46F6FFF4E53DC8  /* A00 = -7.008313930700277652464e-04 */
> +        .quad 0x3FF027CBB51CBBA0  /* A01 = +1.009715754956893363214e+00 */
> +        .quad 0xBFA89DEC9FEC112E  /* A02 = -4.807986690687680864098e-02 */
> +        .quad 0xBFCF2A99464D0DB4  /* A03 = -2.434875100390009317053e-01 */
> +        .quad 0xBF4DCC9C4F66A4D9  /* A00 = -9.094012482836712945103e-04 */
> +        .quad 0x3FF030E7CFCCD583  /* A01 = +1.011939822882909068014e+00 */
> +        .quad 0xBFACAA3B95814081  /* A02 = -5.598627281199331645611e-02 */
> +        .quad 0xBFCDF78F156BE7CF  /* A03 = -2.341173987004467604844e-01 */
> +        .quad 0xBF5308ED74E5C7A6  /* A00 = -1.161796466103906435435e-03 */
> +        .quad 0x3FF03B5986412ECB  /* A01 = +1.014489674026594512313e+00 */
> +        .quad 0xBFB087EBA88DCC3F  /* A02 = -6.457398285947223148806e-02 */
> +        .quad 0xBFCCBB9BD134862F  /* A03 = -2.244753619680052991736e-01 */
> +        .quad 0xBF57FA23C00DF4B5  /* A00 = -1.463446533505758208674e-03 */
> +        .quad 0x3FF0473558A1BCC0  /* A01 = +1.017384859292903342975e+00 */
> +        .quad 0xBFB2E702BC6360EF  /* A02 = -7.383744334527241048871e-02 */
> +        .quad 0xBFCB77D546379288  /* A03 = -2.145945160729250122955e-01 */
> +        .quad 0xBF5DD12971557F71  /* A00 = -1.819887610814388068450e-03 */
> +        .quad 0x3FF0548DDF5000A8  /* A01 = +1.020643112482540360020e+00 */
> +        .quad 0xBFB571B63DA186E1  /* A02 = -8.376635555898871710045e-02 */
> +        .quad 0xBFCA2D5202605148  /* A03 = -2.045080672838912594358e-01 */
> +        .quad 0xBF6252B1AD5D4F17  /* A00 = -2.236697221556737096709e-03 */
> +        .quad 0x3FF063738A910BF7  /* A01 = +1.024280110622155737232e+00 */
> +        .quad 0xBFB8270C8E6B601B  /* A02 = -9.434584118878357184013e-02 */
> +        .quad 0xBFC8DD27D950A07E  /* A03 = -1.942491351230763441116e-01 */
> +        .quad 0xBF66470C91730CFC  /* A00 = -2.719425723258004842786e-03 */
> +        .quad 0x3FF073F468FCF331  /* A01 = +1.028309259519300633556e+00 */
> +        .quad 0xBFBB05C2952191E4  /* A02 = -1.055566419686964629854e-01 */
> +        .quad 0xBFC7886A770DE2BD  /* A03 = -1.838505822486435070662e-01 */
> +        .quad 0xBF6AD114AC8E98EC  /* A00 = -3.273525599485007861467e-03 */
> +        .quad 0x3FF0861BF53E5226  /* A01 = +1.032741506559554434119e+00 */
> +        .quad 0xBFBE0C4F9B461507  /* A02 = -1.173753503881763554650e-01 */
> +        .quad 0xBFC6302A037CDE3A  /* A03 = -1.733448521642786954722e-01 */
> +        .quad 0xBF6FFBDE2A6C2AF8  /* A00 = -3.904279630096648551207e-03 */
> +        .quad 0x3FF099F2EB8E7DA3  /* A01 = +1.037585182326304034106e+00 */
> +        .quad 0xBFC09C74D192DDF0  /* A02 = -1.297746680554463516444e-01 */
> +        .quad 0xBFC4D571D8E3079F  /* A03 = -1.627638157861470424859e-01 */
> +        .quad 0xBF72E8FDC0B952AA  /* A00 = -4.616728994353872309042e-03 */
> +        .quad 0x3FF0AF7F273C9533  /* A01 = +1.042845872181101141152e+00 */
> +        .quad 0xBFC244C512736F10  /* A02 = -1.427236881344176033792e-01 */
> +        .quad 0xBFC379474F58B902  /* A03 = -1.521386277613104298645e-01 */
> +        .quad 0xBF762EABAF17395B  /* A00 = -5.415602341101023557701e-03 */
> +        .quad 0x3FF0C6C3886F63FB  /* A01 = +1.048526318502125631582e+00 */
> +        .quad 0xBFC3FDF9918EA12A  /* A02 = -1.561881981590514389957e-01 */
> +        .quad 0xBFC21CA89ECAB895  /* A03 = -1.414995932913753196036e-01 */
> +        .quad 0xBF79D387CE5B2BAE  /* A00 = -6.305246822828998107258e-03 */
> +        .quad 0x3FF0DFBFE2346376  /* A01 = +1.054626353847394337748e+00 */
> +        .quad 0xBFC5C6DA43602620  /* A02 = -1.701309994680721970894e-01 */
> +        .quad 0xBFC0C08BD8DB6631  /* A03 = -1.308760460731704100557e-01 */
> +        .quad 0xBF7DDBA8E8DA9060  /* A00 = -7.289562037531366334164e-03 */
> +        .quad 0x3FF0FA70F0D1B464  /* A01 = +1.061142864894713433443e+00 */
> +        .quad 0xBFC79E18D92BAA7C  /* A02 = -1.845122394946264732241e-01 */
> +        .quad 0xBFBECBBBF74C2669  /* A03 = -1.202962378266875381749e-01 */
> +        .quad 0xBF81254E76EA25DA  /* A00 = -8.371937755572145950511e-03 */
> +        .quad 0x3FF116D05835EBD0  /* A01 = +1.068069786618014660462e+00 */
> +        .quad 0xBFC982539E2ED224  /* A02 = -1.992897531869327609755e-01 */
> +        .quad 0xBFBC1B043C350159  /* A03 = -1.097872397413132278254e-01 */
> +        .quad 0xBF8391ACBA863403  /* A00 = -9.555196230190082448686e-03 */
> +        .quad 0x3FF134D4AA477FE2  /* A01 = +1.075398125794884141015e+00 */
> +        .quad 0xBFCB7218609FEAFB  /* A02 = -2.144194099235717521079e-01 */
> +        .quad 0xBFB970A16CB88329  /* A03 = -9.937485603633135211599e-02 */
> +        .quad 0xBF87935088E48E8B  /* A00 = -1.151144902957603431692e-02 */
> +        .quad 0x3FF1649892AD7DD3  /* A01 = +1.087059567413110938716e+00 */
> +        .quad 0xBFCE6971DDE75409  /* A02 = -2.375929196847723912089e-01 */
> +        .quad 0xBFB58291E88CB251  /* A03 = -8.402358939628952472223e-02 */
> +        .quad 0xBF8DB3A62C325325  /* A00 = -1.450280973794233242702e-02 */
> +        .quad 0x3FF1A9C900C6DEEA  /* A01 = +1.103951457056548068891e+00 */
> +        .quad 0xBFD13DBC65B0E08E  /* A02 = -2.693930619311765140012e-01 */
> +        .quad 0xBFB06696F62696D1  /* A03 = -6.406539449252625362252e-02 */
> +        .quad 0xBF92583699F2E27A  /* A00 = -1.791463198307716858659e-02 */
> +        .quad 0x3FF1F451B85AA9F0  /* A01 = +1.122148246892376022288e+00 */
> +        .quad 0xBFD34FD5F8288180  /* A02 = -3.017477916164565954205e-01 */
> +        .quad 0xBFA6FB692825B683  /* A03 = -4.488686194495718900788e-02 */
> +        .quad 0xBF9641C26E673D6F  /* A00 = -2.173522757385398448959e-02 */
> +        .quad 0x3FF24364DA5E2B07  /* A01 = +1.141453602790251542487e+00 */
> +        .quad 0xBFD564A5A5EF5890  /* A02 = -3.342680092295120530821e-01 */
> +        .quad 0xBF9B43712011A982  /* A03 = -2.662445791467283467968e-02 */
> +        .quad 0xBF9A901038EC2F39  /* A00 = -2.594018313816024226548e-02 */
> +        .quad 0x3FF2961356DFFEBA  /* A01 = +1.161639537196534011088e+00 */
> +        .quad 0xBFD775EBB17198C7  /* A02 = -3.665723069046972759644e-01 */
> +        .quad 0xBF833B1A926CD462  /* A03 = -9.390075295963199591975e-03 */
> +        .quad 0xBF9F396A6A461B91  /* A00 = -3.049246095317987084727e-02 */
> +        .quad 0x3FF2EB53BAEF534B  /* A01 = +1.182452898229899629357e+00 */
> +        .quad 0xBFD97DABF8AD8BBD  /* A02 = -3.982953957076310058660e-01 */
> +        .quad 0x3F7B8F6A3E0F8837  /* A03 = +6.728568086119371925713e-03 */
> +        .quad 0xBFA21878590F8BAA  /* A00 = -3.534294211546946951064e-02 */
> +        .quad 0x3FF34209790236E1  /* A01 = +1.203622315111197105253e+00 */
> +        .quad 0xBFDB764C0E71BECB  /* A02 = -4.290952817018306997277e-01 */
> +        .quad 0x3F962FE0C03F84C0  /* A03 = +2.166701482190513949888e-02 */
> +        .quad 0xBFA4B36B9AD27ECC  /* A00 = -4.043136849327097492868e-02 */
> +        .quad 0x3FF3990C5B12FC16  /* A01 = +1.224865298994477935679e+00 */
> +        .quad 0xBFDD5AABB0D01390  /* A02 = -4.586590983092770912322e-01 */
> +        .quad 0x3FA21DAF5CA162DB  /* A03 = +3.538272863142363083844e-02 */
> +        .quad 0xBFA7645E4D7BF28B  /* A00 = -4.568762489177399105378e-02 */
> +        .quad 0x3FF3EF2FD51C0D9F  /* A01 = +1.245895225962932562069e+00 */
> +        .quad 0xBFDF26377E1B686E  /* A02 = -4.867075664057044503963e-01 */
> +        .quad 0x3FA8803E756EE812  /* A03 = +4.785342391501513914509e-02 */
> +        .quad 0xBFAA210925C64413  /* A00 = -5.103329263796054643398e-02 */
> +        .quad 0x3FF44349F897D8E7  /* A01 = +1.266427966181760345066e+00 */
> +        .quad 0xBFE06A7B02C6D8E2  /* A02 = -5.129981092675530707226e-01 */
> +        .quad 0x3FAE3F194734F5D0  /* A03 = +5.907515520309980505687e-02 */
> +        .quad 0xBFACDE48F8A19BBB  /* A00 = -5.638340029764018351832e-02 */
> +        .quad 0x3FF49439D5466582  /* A01 = +1.286187966447272845727e+00 */
> +        .quad 0xBFE131C7C1063DDC  /* A02 = -5.373266954429101183166e-01 */
> +        .quad 0x3FB1ADEEC36AD805  /* A03 = +6.906025191241844940482e-02 */
> +        .quad 0xBFAF905D8F585680  /* A00 = -6.164829611604449866036e-02 */
> +        .quad 0x3FF4E0ED1FD27F99  /* A01 = +1.304913639360142818546e+00 */
> +        .quad 0xBFE1E7A859DC1D3D  /* A02 = -5.595285182070380836095e-01 */
> +        .quad 0x3FB3ED018E4642A1  /* A03 = +7.783517573831001679086e-02 */
> +        .quad 0xBFB11595104160BA  /* A00 = -6.673556944713512906198e-02 */
> +        .quad 0x3FF528650340490B  /* A01 = +1.322361958217302513319e+00 */
> +        .quad 0xBFE28B14B40BC974  /* A02 = -5.794776455425521000109e-01 */
> +        .quad 0x3FB5DF49F5BAF6D7  /* A03 = +8.543836831355676453281e-02 */
> +        .quad 0xBFB2513A97344BA4  /* A00 = -7.155195418844911836587e-02 */
> +        .quad 0x3FF569BA0DB5EE14  /* A01 = +1.338312200124055273420e+00 */
> +        .quad 0xBFE31B53A8B67B20  /* A02 = -5.970857901737396389308e-01 */
> +        .quad 0x3FB787F297BB0544  /* A03 = +9.191814617499455275507e-02 */
> +        .quad 0xBFB37512E848FAFA  /* A00 = -7.600515528700305112331e-02 */
> +        .quad 0x3FF5A41F33B403C8  /* A01 = +1.352568819013173495591e+00 */
> +        .quad 0xBFE397F6EA9A58A5  /* A02 = -6.123003561103997904880e-01 */
> +        .quad 0x3FB8EAA9FF25CA06  /* A03 = +9.733068923177520814782e-02 */
> +        .quad 0xBFB47B3E603AFC5D  /* A00 = -8.000554894805263217439e-02 */
> +        .quad 0x3FF5D6E3EDE40487  /* A01 = +1.364963464031718975988e+00 */
> +        .quad 0xBFE400D5BCA6D631  /* A02 = -6.251019177058819709103e-01 */
> +        .quad 0x3FBA0B830ED567FE  /* A03 = +1.017381583418739132707e-01 */
> +        .quad 0xBFB5BBFE8AC90496  /* A00 = -8.489981544791400103200e-02 */
> +        .quad 0x3FF612BA70107E95  /* A01 = +1.379572332145390989311e+00 */
> +        .quad 0xBFE477EAF1FA7693  /* A02 = -6.396383978023599814478e-01 */
> +        .quad 0x3FBB4784B7C08A95  /* A03 = +1.065600346196709652391e-01 */
> +        .quad 0xBFB6D5D940743939  /* A00 = -8.920057128509463473254e-02 */
> +        .quad 0x3FF644A8748F70CE  /* A01 = +1.391762214006166953340e+00 */
> +        .quad 0xBFE4D646AB07EA37  /* A02 = -6.511567440459832267763e-01 */
> +        .quad 0x3FBC354F4E1D5292  /* A03 = +1.101884427747086558913e-01 */
> +        .quad 0xBFB7223D19E4F3D1  /* A00 = -9.036619074045339206069e-02 */
> +        .quad 0x3FF6518FEB42B7FA  /* A01 = +1.394912642466350494175e+00 */
> +        .quad 0xBFE4ED86CB87498C  /* A02 = -6.539949393430091184598e-01 */
> +        .quad 0x3FBC6D29F28CCA9B  /* A03 = +1.110407082713131127205e-01 */
> +        .quad 0xBFB6878652FF6312  /* A00 = -8.800544287022329936754e-02 */
> +        .quad 0x3FF63948C302D040  /* A01 = +1.388985406648330922508e+00 */
> +        .quad 0xBFE4C4E2E7904E17  /* A02 = -6.490339777687407218920e-01 */
> +        .quad 0x3FBC127356CA1ABE  /* A03 = +1.096565329445224612481e-01 */
> +        .quad 0xBFB4F5D18B0C91D6  /* A00 = -8.187589306596207427980e-02 */
> +        .quad 0x3FF5FD27EB7DD0B8  /* A01 = +1.374305648697413673176e+00 */
> +        .quad 0xBFE464E01A2B2FC6  /* A02 = -6.373138915164353601739e-01 */
> +        .quad 0x3FBB460547674A30  /* A03 = +1.065371798825160976065e-01 */
> +        .quad 0xBFB26642FA16A685  /* A00 = -7.187288861919156890412e-02 */
> +        .quad 0x3FF59F9BEDE1C95A  /* A01 = +1.351467065073470141812e+00 */
> +        .quad 0xBFE3D67920C8FBEA  /* A02 = -6.199308052381387046381e-01 */
> +        .quad 0x3FBA24F6A8D3CBC1  /* A03 = +1.021265184570401413078e-01 */
> +        .quad 0xBFADB5294794F097  /* A00 = -5.802277563859197656582e-02 */
> +        .quad 0x3FF523EA7B9CF453  /* A01 = +1.321268542159732772845e+00 */
> +        .quad 0xBFE322A8B55E35DB  /* A02 = -5.979808370918208160205e-01 */
> +        .quad 0x3FB8C8673B1B3E37  /* A03 = +9.680791085269722928697e-02 */
> +        .quad 0xBFA4B7D661965C6A  /* A00 = -4.046506825687219699450e-02 */
> +        .quad 0x3FF48DE3E2CE3122  /* A01 = +1.284641157110919085227e+00 */
> +        .quad 0xBFE251FED1A7F445  /* A02 = -5.725092024655472622285e-01 */
> +        .quad 0x3FB745699FCABDB9  /* A03 = +9.090290213747821701507e-02 */
> +        .quad 0xBF93E60456E4EE1D  /* A00 = -1.943213253365004902773e-02 */
> +        .quad 0x3FF3E1A14E628A59  /* A01 = +1.242585474196536532432e+00 */
> +        .quad 0xBFE16C5AB660E876  /* A02 = -5.444768488007543094653e-01 */
> +        .quad 0x3FB5AD33AA8C188F  /* A03 = +8.467410005332197397987e-02 */
> +        .quad 0x3F738C17C47C7961  /* A00 = +4.772274820224659853951e-03 */
> +        .quad 0x3FF3234DDE3BD146  /* A01 = +1.196119182682268355933e+00 */
> +        .quad 0xBFE078C0D77A9D3B  /* A02 = -5.147403915952176722826e-01 */
> +        .quad 0x3FB40D74B3E276B8  /* A03 = +7.833032027925923568290e-02 */
> +        .quad 0x3FA0474BECC689C7  /* A00 = +3.179394975019849550746e-02 */
> +        .quad 0x3FF256FB4FA7D18A  /* A01 = +1.146235762743432307076e+00 */
> +        .quad 0xBFDEFA8E3FB285E2  /* A02 = -4.840427038235174395098e-01 */
> +        .quad 0x3FB270C007493D59  /* A03 = +7.203293016322244446403e-02 */
> +        .quad 0x3FAF5BD51E479BDC  /* A00 = +6.124750132203590768931e-02 */
> +        .quad 0x3FF18081D0B53BC5  /* A01 = +1.093873801484492647162e+00 */
> +        .quad 0xBFDCFE2439BD0C03  /* A02 = -4.530115665294831006626e-01 */
> +        .quad 0x3FB0DEFE5A45AFDD  /* A03 = +6.590261176978580437424e-02 */
> +        .quad 0x3FB7BD5D2806EA26  /* A00 = +9.273321368429118805032e-02 */
> +        .quad 0x3FF0A369E35B4440  /* A01 = +1.039895904647224256223e+00 */
> +        .quad 0xBFDB04BC5C9951E7  /* A02 = -4.221640495573226181669e-01 */
> +        .quad 0x3FAEBBBAA9D6DEEF  /* A03 = +6.002600978120919278380e-02 */
> +        .quad 0x3FC01BE411098DBC  /* A00 = +1.258511622610124502941e-01 */
> +        .quad 0x3FEF85BDABC031C1  /* A01 = +9.850757936961188621083e-01 */
> +        .quad 0xBFD91521375097C2  /* A02 = -3.919146576102968682065e-01 */
> +        .quad 0x3FABE26F0086D982  /* A03 = +5.446192628317005068883e-02 */
> +        .quad 0x3FC481D7FF5776B9  /* A00 = +1.602125164781023347604e-01 */
> +        .quad 0x3FEDC3506C1E7218  /* A01 = +9.300920592973538347792e-01 */
> +        .quad 0xBFD7349A88DA7D4F  /* A02 = -3.625856720409119104964e-01 */
> +        .quad 0x3FA936E2DFF8E2AE  /* A03 = +4.924687370334389358018e-02 */
> +        .quad 0x3FC90471F96FA27A  /* A00 = +1.954481571149420671141e-01 */
> +        .quad 0x3FEC0451601987A2  /* A01 = +8.755270840595026360376e-01 */
> +        .quad 0xBFD5671CD4B898DC  /* A02 = -3.344184949259110251063e-01 */
> +        .quad 0x3FA6BB9594603B67  /* A03 = +4.439990459660841243261e-02 */
> +        .quad 0x3FCFD8ADB9ED944C  /* A00 = +2.488000066615846384011e-01 */
> +        .quad 0x3FE978C073F6809A  /* A01 = +7.959902062321078108909e-01 */
> +        .quad 0xBFD2DF7E00BCD5A9  /* A02 = -2.948908812716931060471e-01 */
> +        .quad 0x3FA3614033D490B2  /* A03 = +3.785133965200894456959e-02 */
> +        .quad 0x3FD4846A12AFE5A0  /* A00 = +3.205819303981005674586e-01 */
> +        .quad 0x3FE63A1147D40472  /* A01 = +6.945883181471244061100e-01 */
> +        .quad 0xBFCFA2268AD34450  /* A02 = -2.471359422548027318101e-01 */
> +        .quad 0x3F9F150201D9FFE0  /* A03 = +3.035357605267552383310e-02 */
> +        .quad 0x3FD9018641F82BEB  /* A00 = +3.907180446846598154131e-01 */
> +        .quad 0x3FE33B7C220FFBDC  /* A01 = +6.010113396913498995389e-01 */
> +        .quad 0xBFCA4E4187E29C86  /* A02 = -2.055131829740483584423e-01 */
> +        .quad 0x3F98C30CED19F8F4  /* A03 = +2.418155858185229434287e-02 */
> +        .quad 0x3FDD4B8255BEB078  /* A00 = +4.577337109901757905561e-01 */
> +        .quad 0x3FE0858B19D3A49B  /* A01 = +5.163016800335243905451e-01 */
> +        .quad 0xBFC5BC929EACE564  /* A02 = -1.698172831327539045176e-01 */
> +        .quad 0x3F93A083CE57DE2B  /* A03 = +1.916700312537337677621e-02 */
> +        .quad 0x3FE0A8E5E039295C  /* A00 = +5.206174258576470315063e-01 */
> +        .quad 0x3FDC35E1234583FE  /* A01 = +4.407885403107342225937e-01 */
> +        .quad 0xBFC1DE034E31AEB9  /* A02 = -1.395877963835710222629e-01 */
> +        .quad 0x3F8EFDEBB3471BDC  /* A03 = +1.513275280821162888101e-02 */
> +        .quad 0x3FE2851B603CB2A5  /* A00 = +5.787484054213406503564e-01 */
> +        .quad 0x3FD7F4A44ABBB286  /* A01 = +3.743067483726821853551e-01 */
> +        .quad 0xBFBD3EEB67087DE7  /* A02 = -1.142413260026767657385e-01 */
> +        .quad 0x3F8864F38329E8BD  /* A03 = +1.191129917173260922836e-02 */
> +        .quad 0x3FE437DBE3C34AC1  /* A00 = +6.318187187665317283702e-01 */
> +        .quad 0x3FD43F6F789441B5  /* A01 = +3.163717916040938438194e-01 */
> +        .quad 0xBFB7D92E7901B9A4  /* A02 = -9.315767721429907277653e-02 */
> +        .quad 0x3F8327ED342308E1  /* A03 = +9.353497651663324544136e-03 */
> +        .quad 0x3FE5C0977766D55C  /* A00 = +6.797597248138731451661e-01 */
> +        .quad 0x3FD10B42A764D8F9  /* A01 = +2.663122782427219115142e-01 */
> +        .quad 0xBFB3633351D3D70F  /* A02 = -7.573242900602060456716e-02 */
> +        .quad 0x3F7E079E30FF899C  /* A03 = +7.331483779099558922843e-03 */
> +        .quad 0x3FE7202CE08A88C4  /* A00 = +7.226776490754436288455e-01 */
> +        .quad 0x3FCC973EB5662B01  /* A01 = +2.233656297433626314319e-01 */
> +        .quad 0xBFAF70A455F9920B  /* A02 = -6.140626477716545211782e-02 */
> +        .quad 0x3F77812411CE99B6  /* A03 = +5.738392731393584730859e-03 */
> +        .quad 0x3FE85879424095B1  /* A00 = +7.608000082006382003286e-01 */
> +        .quad 0x3FC7E73BD1674D84  /* A01 = +1.867441914060742336190e-01 */
> +        .quad 0xBFA96F84E4BF333B  /* A02 = -4.967894832916504993525e-02 */
> +        .quad 0x3F72606DDCA6E117  /* A03 = +4.486493251924870105662e-03 */
> +        .quad 0x3FE96BFE4957F4DD  /* A00 = +7.944327766887472330737e-01 */
> +        .quad 0x3FC3ED4780D25478  /* A01 = +1.556786898624158421711e-01 */
> +        .quad 0xBFA489C5F9A56B58  /* A02 = -4.011362717093075458408e-02 */
> +        .quad 0x3F6CB5DC17E9AD2A  /* A03 = +3.504686231556104931972e-03 */
> +        .quad 0x3FEA5D9CB2F41234  /* A00 = +8.239272589858672724006e-01 */
> +        .quad 0x3FC091A758374DCF  /* A01 = +1.294449978582705440555e-01 */
> +        .quad 0xBFA08E436D4B5CE0  /* A02 = -3.233538350257858517978e-02 */
> +        .quad 0x3F666997AD53E6B7  /* A03 = +2.735897297154145629133e-03 */
> +        .quad 0x3FEB3060342CB850  /* A00 = +8.496552485501158713532e-01 */
> +        .quad 0x3FBB7D30BBC7DC1B  /* A01 = +1.073790033768634993860e-01 */
> +        .quad 0xBF9AA6BA3443D9E3  /* A02 = -2.602663940430173170060e-02 */
> +        .quad 0x3F617CA764B7850B  /* A03 = +2.134634914668814050648e-03 */
> +        .quad 0x3FEBE759A6A0C7B8  /* A00 = +8.719909910635044170135e-01 */
> +        .quad 0x3FB6C10DE6A703FF  /* A01 = +8.888327485239243264115e-02 */
> +        .quad 0xBF956C566D8BE1F6  /* A02 = -2.092108768099084498138e-02 */
> +        .quad 0x3F5B46D1A4A59CF8  /* A03 = +1.664833764687232917079e-03 */
> +        .quad 0x3FEC858494887A04  /* A00 = +8.912985707318630268503e-01 */
> +        .quad 0x3FB2CC31F543394D  /* A01 = +7.342827070099140762682e-02 */
> +        .quad 0xBF9133477FF69137  /* A02 = -1.679717749142747504343e-02 */
> +        .quad 0x3F5544482FBB4DA5  /* A03 = +1.298017973501022466823e-03 */
> +        .quad 0x3FED0DB59D0E32E9  /* A00 = +9.079235141267335551518e-01 */
> +        .quad 0x3FAF006BAFFC6EF4  /* A01 = +6.055008433597022787787e-02 */
> +        .quad 0xBF8B97146FA2B97A  /* A02 = -1.347175565419144252499e-02 */
> +        .quad 0x3F5093B01F4CDC69  /* A03 = +1.011774057770665211434e-03 */
> +        .quad 0x3FEDB487C3EC457C  /* A00 = +9.282873942012623835751e-01 */
> +        .quad 0x3FA7390C09D0BD1D  /* A01 = +4.535710925881118044112e-02 */
> +        .quad 0xBF83D9F7C3181106  /* A02 = -9.693084374710735778846e-03 */
> +        .quad 0x3F46E34A0A3C0E64  /* A03 = +6.984817050299072134500e-04 */
> +        .quad 0x3FEE5FFCB4E6EB00  /* A00 = +9.492171796076434020506e-01 */
> +        .quad 0x3F9F4913ED00AADF  /* A01 = +3.055220731782070861526e-02 */
> +        .quad 0xBF79670BD0E59B5C  /* A02 = -6.201788097633133961528e-03 */
> +        .quad 0x3F3BC998EBCAF96D  /* A03 = +4.240034429975534616304e-04 */
> +        .quad 0x3FEEDBA41E9542FE  /* A00 = +9.643116566968215064293e-01 */
> +        .quad 0x3F94F5DD18D9C24D  /* A01 = +2.046914543319848858727e-02 */
> +        .quad 0xBF7034896AA122B9  /* A02 = -3.956352980886528904192e-03 */
> +        .quad 0x3F30DCCB47810B39  /* A03 = +2.573009765038273091199e-04 */
> +        .quad 0x3FEF33F2882520ED  /* A00 = +9.750912341196716903724e-01 */
> +        .quad 0x3F8BF37F2CF553FF  /* A01 = +1.364802699996836392315e-02 */
> +        .quad 0xBF649F6F05A69619  /* A02 = -2.517430152880317534986e-03 */
> +        .quad 0x3F247623C950AAC9  /* A03 = +1.561087307505231250044e-04 */
> +        .quad 0x3FEF727757751741  /* A00 = +9.827229221489021115943e-01 */
> +        .quad 0x3F828E67912C4400  /* A01 = +9.060677640748693306705e-03 */
> +        .quad 0xBF5A2F51A806CC2C  /* A02 = -1.598195784123355826789e-03 */
> +        .quad 0x3F18D35D7687E613  /* A03 = +9.470231965016282719549e-05 */
> +        .quad 0x3FEF9E6325C5942A  /* A00 = +9.880843866091073568469e-01 */
> +        .quad 0x3F788AB117618F76  /* A01 = +5.991641772286606867914e-03 */
> +        .quad 0xBF5096EAB0B1EA89  /* A02 = -1.012543859160305046233e-03 */
> +        .quad 0x3F0E1E50EC4435AB  /* A03 = +5.744633156910412119652e-05 */
> +        .quad 0x3FEFBD0784049369  /* A00 = +9.918248728250605994461e-01 */
> +        .quad 0x3F702BBD8294035F  /* A01 = +3.947963975634432264028e-03 */
> +        .quad 0xBF44FB55E0F00593  /* A02 = -6.403130845457509273330e-04 */
> +        .quad 0x3F0244DCD723230A  /* A03 = +3.484534217219031730379e-05 */
> +        .quad 0x3FEFD245E2366A43  /* A00 = +9.944180887426415926811e-01 */
> +        .quad 0x3F653D82EC088433  /* A01 = +2.592807490387838333795e-03 */
> +        .quad 0xBF3A7DF75E013CB8  /* A02 = -4.042366908878036561859e-04 */
> +        .quad 0x3EF6298E69F991CD  /* A03 = +2.113564425911141559972e-05 */
> +        .quad 0x3FEFE0EAA508BC69  /* A00 = +9.962056372950317539861e-01 */
> +        .quad 0x3F5BD0771AF3FDDA  /* A01 = +1.697651208644282514598e-03 */
> +        .quad 0xBF30B2E1254DE571  /* A02 = -2.548026725928887099328e-04 */
> +        .quad 0x3EEAE28B70EC0256  /* A03 = +1.281973848454955042307e-05 */
> +        .quad 0x3FEFEAF5303D7F96  /* A00 = +9.974313680831865536192e-01 */
> +        .quad 0x3F5229111365657E  /* A01 = +1.108423877289460134782e-03 */
> +        .quad 0xBF250572D04DFE66  /* A02 = -1.603796628408704519168e-04 */
> +        .quad 0x3EE04E89BB57C981  /* A03 = +7.775682983689149966743e-06 */
> +        .quad 0x3FEFF1CF52F1CF44  /* A00 = +9.982678051005469122003e-01 */
> +        .quad 0x3F47A71316147CEB  /* A01 = +7.218211359577819110842e-04 */
> +        .quad 0xBF1A6D7604055719  /* A02 = -1.008132248946049582547e-04 */
> +        .quad 0x3ED3C8047586A85C  /* A03 = +4.716233739913014633626e-06 */
> +        .quad 0x3FEFF6770369EF69  /* A00 = +9.988360468555416149528e-01 */
> +        .quad 0x3F3EBB261180FBF0  /* A01 = +4.689186039321105101130e-04 */
> +        .quad 0xBF1097754FE19D7F  /* A02 = -6.329206004950480057066e-05 */
> +        .quad 0x3EC7FEFF83BCA0A7  /* A03 = +2.860556404988488738366e-06 */
> +        .quad 0x3FEFF99D42371AC4  /* A00 = +9.992204945818561334647e-01 */
> +        .quad 0x3F33EB2AEC271F59  /* A01 = +3.039340773764907474054e-04 */
> +        .quad 0xBF04CF18E0FC0D79  /* A02 = -3.968996690952969588805e-05 */
> +        .quad 0x3EBD1BDBD6019BE9  /* A03 = +1.735021065507727833886e-06 */
> +        .quad 0x3FEFFBBCA32B0D91  /* A00 = +9.994795977476532700123e-01 */
> +        .quad 0x3F29C41E1615110A  /* A01 = +1.965796209707565346710e-04 */
> +        .quad 0xBEFA11F93D9DCB5A  /* A02 = -2.486248909101414873235e-05 */
> +        .quad 0x3EB1A7CA4546F7A7  /* A03 = +1.052345642723709228769e-06 */
> +        .quad 0x3FEFFD298B8E8DE2  /* A00 = +9.996535993308806045121e-01 */
> +        .quad 0x3F20A1C42D523C5B  /* A01 = +1.268913244172078754520e-04 */
> +        .quad 0xBEF0507A364AFAE4  /* A02 = -1.555859070622834605755e-05 */
> +        .quad 0x3EA56ACA17E7CDF4  /* A03 = +6.382806956848098872313e-07 */
> +        .quad 0x3FEFFE1DC82BA5A3  /* A00 = +9.997700604991915929176e-01 */
> +        .quad 0x3F156E73B90F1769  /* A01 = +8.175450626798714452801e-05 */
> +        .quad 0xBEE4663579D0A09F  /* A02 = -9.727122057226747625365e-06 */
> +        .quad 0x3E99FAF6FEC5D4C1  /* A03 = +3.871371052824002996020e-07 */
> +        .quad 0x3FEFFEF8D0BB5E81  /* A00 = +9.998745037837154514548e-01 */
> +        .quad 0x3F06686DA18D39C3  /* A01 = +4.273972098777251447726e-05 */
> +        .quad 0xBED46BC298073E90  /* A02 = -4.868731025855742842491e-06 */
> +        .quad 0x3E88E42286B9D0FD  /* A03 = +1.854535328530838170114e-07 */
> +        .quad 0x3FEFFF8DBC68DDC7  /* A00 = +9.999455146670975791423e-01 */
> +        .quad 0x3EF26B2953A80AF0  /* A01 = +1.756534514108903368909e-05 */
> +        .quad 0xBEBFC4472D580F83  /* A02 = -1.893443529411295465239e-06 */
> +        .quad 0x3E72505B4553D19F  /* A03 = +6.822456673547912277047e-08 */
> +        .quad 0x3FEFFFCED1276609  /* A00 = +9.999765477215883935358e-01 */
> +        .quad 0x3EDE1A94C7CC58F5  /* A01 = +7.177313020153979672606e-06 */
> +        .quad 0xBEA8A2C988744E57  /* A02 = -7.342066660497443762363e-07 */
> +        .quad 0x3E5AF30036BBBAF4  /* A03 = +2.509841882843541084885e-08 */
> +        .quad 0x3FEFFFEAFE70FCFC  /* A00 = +9.999899835164849370983e-01 */
> +        .quad 0x3EC879175E3549F5  /* A01 = +2.917410471128503564412e-06 */
> +        .quad 0xBE930E36677D1813  /* A02 = -2.839493400307523115929e-07 */
> +        .quad 0x3E43D4005B42D48F  /* A03 = +9.233192745401904898013e-09 */
> +        .quad 0x3ff0000000000000  /* A00 = +1.0, saturation row */
> +        .quad 0x0000000000000000  /* A01 = 0 */
> +        .quad 0x0000000000000000  /* A02 = 0 */
> +        .quad 0x0000000000000000  /* A03 = 0 */
> +        .align 16
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000           /* _sSignMask        */
> +        .align 16
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff           /* _sAbsMask         */
> +        .align 16
> +        .long 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000           /* _iExpMantMask     */
> +        .align 16
> +        .long 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000           /* _iExpMask         */
> +        .align 16
> +        .long 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000           /* _iMinIdxOfsMask   */
> +        .align 16
> +        .long 0x04280000, 0x04280000, 0x04280000, 0x04280000           /* _iMaxIdxMask      */
> +        .align 16
> +        .type  __svml_stanh_data_internal,@object
> +        .size  __svml_stanh_data_internal,.-__svml_stanh_data_internal
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core-sse.S
> new file mode 100644
> index 0000000000..a56795e3cd
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core-sse.S
> @@ -0,0 +1,20 @@
> +/* SSE version of vectorized tanhf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define _ZGVdN8v_tanhf _ZGVdN8v_tanhf_sse_wrapper
> +#include "../svml_s_tanhf8_core.S"
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core.c
> new file mode 100644
> index 0000000000..fadcea36ab
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core.c
> @@ -0,0 +1,28 @@
> +/* Multiple versions of vectorized tanhf, vector length is 8.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define SYMBOL_NAME _ZGVdN8v_tanhf
> +#include "ifunc-mathvec-avx2.h"
> +
> +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ());
> +
> +#ifdef SHARED
> +__hidden_ver1 (_ZGVdN8v_tanhf, __GI__ZGVdN8v_tanhf,
> +              __redirect__ZGVdN8v_tanhf)
> +  __attribute__ ((visibility ("hidden")));
> +#endif
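A note on the dispatch for anyone not familiar with the ifunc machinery:
IFUNC_SELECTOR from ifunc-mathvec-avx2.h is evaluated once, when the symbol
is resolved, and binds the exported _ZGVdN8v_tanhf to either the AVX2
implementation or the SSE wrapper alias from the previous file.
Conceptually it amounts to the following (a hand-rolled sketch, not the
glibc code; every name below is hypothetical):

    typedef void vec_fn_t (void);           /* stand-in for the vector ABI type */

    extern vec_fn_t my_tanhf8_avx2;         /* hypothetical AVX2 entry point */
    extern vec_fn_t my_tanhf8_sse_wrapper;  /* hypothetical SSE fallback */
    extern int cpu_has_avx2_and_fma (void); /* hypothetical feature query */

    /* What an IFUNC resolver boils down to: pick an implementation from the
       CPU features and return its address; the dynamic linker then binds the
       symbol to that address.  */
    static vec_fn_t *
    my_tanhf8_selector (void)
    {
      return cpu_has_avx2_and_fma () ? &my_tanhf8_avx2 : &my_tanhf8_sse_wrapper;
    }
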
> diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core_avx2.S
> new file mode 100644
> index 0000000000..c19e6bf8b5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core_avx2.S
> @@ -0,0 +1,844 @@
> +/* Function tanhf vectorized with AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   https://www.gnu.org/licenses/.  */
> +
> +/*
> + * ALGORITHM DESCRIPTION:
> + *
> + *   NOTE: Since the hyperbolic tangent function is odd
> + *         (tanh(x) = -tanh(-x)), the algorithm below deals with the absolute
> + *         value of the argument |x|: tanh(x) = sign(x) * tanh(|x|)
> + *
> + *   We use a table lookup method to compute tanh(|x|).
> + *   The basic idea is to split the input range into a number of subintervals
> + *   and to approximate tanh(.) with a polynomial on each of them.
> + *
> + *   IEEE SPECIAL CONDITIONS:
> + *   x = [+,-]0, r = [+,-]0
> + *   x = +Inf,   r = +1
> + *   x = -Inf,   r = -1
> + *   x = QNaN,   r = QNaN
> + *   x = SNaN,   r = QNaN
> + *
> + *
> + *   ALGORITHM DETAILS
> + *   We handle special values in a callout function, aside from main path
> + *   computations. "Special" inputs for this algorithm are:
> + *   INF, NAN, |x| > HUGE_THRESHOLD
> + *
> + *
> + *   Main path computations are organized as follows:
> + *   We split the interval [0, SATURATION_THRESHOLD)
> + *   into a number of subintervals.  On each subinterval we approximate tanh(.)
> + *   with a minimax polynomial of pre-defined degree.  Polynomial coefficients
> + *   are computed beforehand and stored in a table.  We also use
> + *
> + *       y := |x| + B,
> + *
> + *   where B depends on the subinterval and is used to make the argument
> + *   closer to zero.
> + *   We also add large fake interval [SATURATION_THRESHOLD, HUGE_THRESHOLD],
> + *   where 1.0 + 0.0*y + 0.0*y^2 ... coefficients are stored - just to
> + *   preserve main path computation logic but return 1.0 for all arguments.
> + *
> + *   Hence reconstruction looks as follows:
> + *   we extract proper polynomial and range reduction coefficients
> + *        (Pj and B), corresponding to subinterval, to which |x| belongs,
> + *        and return
> + *
> + *       r := sign(x) * (P0 + P1 * y + ... + Pn * y^n)
> + *
> + *   NOTE: we use a multiprecision technique to multiply and sum the first
> + *         K terms of the polynomial.  So Pj, j = 0..K are stored in the
> + *         table, each as a pair of target precision numbers (Pj and PLj), to
> + *         achieve wider than target precision.
> + *
> + *
> + */
> +
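The reconstruction described above is easier to follow in scalar form.  A
rough C model of the per-element main path is below (illustration only:
'tbl' and 'idx' are placeholders, the real index computation uses the
_iExpMantMask/_iMinIdxOfsMask/_iMaxIdxMask constants and a 14-bit shift as
in the vector code that follows, and the y := |x| + B shift appears to be
folded into the stored coefficients, so the model evaluates directly at
|x|):

    #include <math.h>

    /* Each table row holds the four coefficients A0..A3 of the degree-3
       minimax polynomial for one subinterval of |x|.  */
    static float
    tanhf_scalar_model (float x, const double tbl[][4], int idx)
    {
      double y = fabsf (x);
      double r = tbl[idx][0]
                 + y * (tbl[idx][1] + y * (tbl[idx][2] + y * tbl[idx][3]));
      return copysignf ((float) r, x);   /* tanh(x) = sign(x) * tanh(|x|) */
    }
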
> +/* Offsets for data table __svml_stanh_data_internal
> + */
> +#define _dbP                           0
> +#define _sSignMask                     4288
> +#define _sAbsMask                      4320
> +#define _iExpMantMask                  4352
> +#define _iExpMask                      4384
> +#define _iMinIdxOfsMask                4416
> +#define _iMaxIdxMask                   4448
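These offsets follow directly from the data layout documented in the struct
comment further down: _dbP is (134*4) rows of 8 bytes = 4288 bytes, and each
mask field after it is one 32-byte (YMM-wide) row, so the macros advance in
steps of 32.  A compile-time cross-check could look like this (sketch only;
'stanh_layout' is a throwaway name, not part of the patch):

    #include <stddef.h>

    typedef unsigned int VUINT32;
    typedef struct
    {
      VUINT32 _dbP[134 * 4][2];                          /* 4288 bytes */
      VUINT32 _sSignMask[8][1] __attribute__ ((aligned (32)));
      VUINT32 _sAbsMask[8][1] __attribute__ ((aligned (32)));
      VUINT32 _iExpMantMask[8][1] __attribute__ ((aligned (32)));
      VUINT32 _iExpMask[8][1] __attribute__ ((aligned (32)));
      VUINT32 _iMinIdxOfsMask[8][1] __attribute__ ((aligned (32)));
      VUINT32 _iMaxIdxMask[8][1] __attribute__ ((aligned (32)));
    } stanh_layout;

    _Static_assert (offsetof (stanh_layout, _sSignMask) == 4288, "offset");
    _Static_assert (offsetof (stanh_layout, _iMaxIdxMask) == 4448, "offset");
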
> +
> +#include <sysdep.h>
> +
> +        .text
> +       .section .text.avx2,"ax",@progbits
> +ENTRY(_ZGVdN8v_tanhf_avx2)
> +        pushq     %rbp
> +        cfi_def_cfa_offset(16)
> +        movq      %rsp, %rbp
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        andq      $-32, %rsp
> +        pushq     %r12
> +        subq      $120, %rsp
> +        lea       _dbP+16+__svml_stanh_data_internal(%rip), %r10
> +        vmovaps   %ymm0, %ymm12
> +
> +/* Here huge arguments, INF and NaNs are filtered out to callout. */
> +        vpand     _iExpMantMask+__svml_stanh_data_internal(%rip), %ymm12, %ymm14
> +
> +/*
> + *  Small table specific variables
> + *  Constant loading
> + */
> +        vmovups   _iMaxIdxMask+__svml_stanh_data_internal(%rip), %ymm8
> +        vpsubd    _iMinIdxOfsMask+__svml_stanh_data_internal(%rip), %ymm14, %ymm9
> +
> +/* if VMIN, VMAX is defined for I type */
> +        vxorps    %ymm15, %ymm15, %ymm15
> +        vpcmpgtd  %ymm15, %ymm9, %ymm0
> +        vpand     %ymm0, %ymm9, %ymm7
> +        vpcmpgtd  %ymm8, %ymm9, %ymm6
> +        vblendvps %ymm6, %ymm8, %ymm7, %ymm3
> +        vpsrld    $14, %ymm3, %ymm1
> +        vpcmpgtd  _iExpMask+__svml_stanh_data_internal(%rip), %ymm14, %ymm13
> +        vmovmskps %ymm13, %r11d
> +        vandps    _sAbsMask+__svml_stanh_data_internal(%rip), %ymm12, %ymm10
> +        vandps    _sSignMask+__svml_stanh_data_internal(%rip), %ymm12, %ymm11
> +        vextractf128 $1, %ymm1, %xmm2
> +        vmovd     %xmm1, %r9d
> +        vmovd     %xmm2, %ecx
> +        vpextrd   $1, %xmm2, %edx
> +        vpextrd   $1, %xmm1, %r8d
> +        movslq    %r9d, %r9
> +        movslq    %edx, %rdx
> +        movslq    %r8d, %r8
> +        vpextrd   $2, %xmm1, %edi
> +        movslq    %ecx, %rcx
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -8; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf8, 0xff, 0xff, 0xff, 0x22
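(Decoding the escape above for reference, since it has to stay in sync with
the DW_CFA_expression comment: 0x10 is DW_CFA_expression, 0x0c is DWARF
register 12 (%r12), 0x0e is the 14-byte expression length, and the
expression bytes are DW_OP_lit8 (0x38), DW_OP_minus (0x1c), DW_OP_const4s
-32 (0x0d e0 ff ff ff), DW_OP_and (0x1a), DW_OP_const4s -8 (0x0d f8 ff ff
ff), DW_OP_plus (0x22).  That is, the save slot is ((CFA - 8) & -32) - 8,
which matches the "andq $-32, %rsp; pushq %r12" prologue.)
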
> +        vpextrd   $3, %xmm2, %r12d
> +        vpextrd   $3, %xmm1, %esi
> +        vpextrd   $2, %xmm2, %eax
> +        movslq    %edi, %rdi
> +        movslq    %r12d, %r12
> +        movslq    %esi, %rsi
> +        movslq    %eax, %rax
> +        vmovupd   -16(%r9,%r10), %xmm5
> +        vmovupd   -16(%rdx,%r10), %xmm14
> +        vmovupd   -16(%rcx,%r10), %xmm13
> +        vmovupd   (%r9,%r10), %xmm1
> +        vmovupd   (%r8,%r10), %xmm2
> +        vmovupd   -16(%r8,%r10), %xmm4
> +        vinsertf128 $1, -16(%rdi,%r10), %ymm5, %ymm15
> +        vinsertf128 $1, -16(%r12,%r10), %ymm14, %ymm3
> +        vinsertf128 $1, -16(%rax,%r10), %ymm13, %ymm6
> +        vinsertf128 $1, (%rdi,%r10), %ymm1, %ymm5
> +        vinsertf128 $1, (%rsi,%r10), %ymm2, %ymm14
> +        vunpcklpd %ymm3, %ymm6, %ymm8
> +        vunpckhpd %ymm3, %ymm6, %ymm6
> +        vunpcklpd %ymm14, %ymm5, %ymm3
> +        vunpckhpd %ymm14, %ymm5, %ymm2
> +        vmovupd   (%rcx,%r10), %xmm13
> +        vcvtps2pd %xmm10, %ymm5
> +        vextractf128 $1, %ymm10, %xmm10
> +        vfmadd213pd %ymm3, %ymm5, %ymm2
> +        vinsertf128 $1, -16(%rsi,%r10), %ymm4, %ymm0
> +        vmovupd   (%rdx,%r10), %xmm4
> +        vunpcklpd %ymm0, %ymm15, %ymm9
> +        vunpckhpd %ymm0, %ymm15, %ymm7
> +        vfmadd213pd %ymm7, %ymm5, %ymm2
> +        vfmadd213pd %ymm9, %ymm5, %ymm2
> +        vinsertf128 $1, (%r12,%r10), %ymm4, %ymm0
> +        vcvtps2pd %xmm10, %ymm4
> +        vinsertf128 $1, (%rax,%r10), %ymm13, %ymm15
> +        vunpcklpd %ymm0, %ymm15, %ymm1
> +        vunpckhpd %ymm0, %ymm15, %ymm0
> +        vfmadd213pd %ymm1, %ymm4, %ymm0
> +        vcvtpd2ps %ymm2, %xmm1
> +        vfmadd213pd %ymm6, %ymm4, %ymm0
> +        vfmadd213pd %ymm8, %ymm4, %ymm0
> +        vcvtpd2ps %ymm0, %xmm0
> +        vinsertf128 $1, %xmm0, %ymm1, %ymm2
> +        vorps     %ymm11, %ymm2, %ymm0
> +        testl     %r11d, %r11d
> +
> +/* Go to special inputs processing branch */
> +        jne       L(SPECIAL_VALUES_BRANCH)
> +                                # LOE rbx r13 r14 r15 r11d ymm0 ymm12
> +
> +/* Restore registers
> + * and exit the function
> + */
> +
> +L(EXIT):
> +        addq      $120, %rsp
> +        cfi_restore(12)
> +        popq      %r12
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        cfi_def_cfa(7, 8)
> +        cfi_restore(6)
> +        ret
> +        cfi_def_cfa(6, 16)
> +        cfi_offset(6, -16)
> +        /*  DW_CFA_expression: r12 (r12) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -8; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0c, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0xf8, 0xff, 0xff, 0xff, 0x22
> +
> +/* Branch to process
> + * special inputs
> + */
> +
> +L(SPECIAL_VALUES_BRANCH):
> +        vmovups   %ymm12, 32(%rsp)
> +        vmovups   %ymm0, 64(%rsp)
> +                                # LOE rbx r13 r14 r15 r11d ymm0
> +
> +        xorl      %r12d, %r12d
> +                                # LOE rbx r13 r14 r15 r11d r12d
> +
> +        vzeroupper
> +        movq      %r13, 8(%rsp)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
> +        movl      %r11d, %r13d
> +        movq      %r14, (%rsp)
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Range mask
> + * bits check
> + */
> +
> +L(RANGEMASK_CHECK):
> +        btl       %r12d, %r13d
> +
> +/* Call scalar math function */
> +        jc        L(SCALAR_MATH_CALL)
> +                                # LOE rbx r15 r12d r13d
> +
> +/* Special inputs
> + * processing loop
> + */
> +
> +L(SPECIAL_VALUES_LOOP):
> +        incl      %r12d
> +        cmpl      $8, %r12d
> +
> +/* Check bits in range mask */
> +        jl        L(RANGEMASK_CHECK)
> +                                # LOE rbx r15 r12d r13d
> +
> +        movq      8(%rsp), %r13
> +        cfi_restore(13)
> +        movq      (%rsp), %r14
> +        cfi_restore(14)
> +        vmovups   64(%rsp), %ymm0
> +
> +/* Go to exit */
> +        jmp       L(EXIT)
> +        /*  DW_CFA_expression: r13 (r13) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -120; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0d, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x88, 0xff, 0xff, 0xff, 0x22
> +        /*  DW_CFA_expression: r14 (r14) (DW_OP_lit8; DW_OP_minus; DW_OP_const4s: -32; DW_OP_and; DW_OP_const4s: -128; DW_OP_plus)  */
> +        .cfi_escape 0x10, 0x0e, 0x0e, 0x38, 0x1c, 0x0d, 0xe0, 0xff, 0xff, 0xff, 0x1a, 0x0d, 0x80, 0xff, 0xff, 0xff, 0x22
> +                                # LOE rbx r13 r14 r15 ymm0
> +
> +/* Scalar math function call
> + * to process special input
> + */
> +
> +L(SCALAR_MATH_CALL):
> +        movl      %r12d, %r14d
> +        movss     32(%rsp,%r14,4), %xmm0
> +        call      tanhf@PLT
> +                                # LOE rbx r14 r15 r12d r13d xmm0
> +
> +        movss     %xmm0, 64(%rsp,%r14,4)
> +
> +/* Process special inputs in loop */
> +        jmp       L(SPECIAL_VALUES_LOOP)
> +                                # LOE rbx r15 r12d r13d
> +END(_ZGVdN8v_tanhf_avx2)
> +
> +        .section .rodata, "a"
> +        .align 32
> +
> +#ifdef __svml_stanh_data_internal_typedef
> +typedef unsigned int VUINT32;
> +typedef struct
> +{
> +        __declspec(align(32)) VUINT32 _dbP[(134*4)][2];
> +        __declspec(align(32)) VUINT32 _sSignMask[8][1];
> +        __declspec(align(32)) VUINT32 _sAbsMask[8][1];
> +        __declspec(align(32)) VUINT32 _iExpMantMask[8][1];
> +        __declspec(align(32)) VUINT32 _iExpMask[8][1];
> +        __declspec(align(32)) VUINT32 _iMinIdxOfsMask[8][1];
> +        __declspec(align(32)) VUINT32 _iMaxIdxMask[8][1];
> +} __svml_stanh_data_internal;
> +#endif
> +__svml_stanh_data_internal:
> +        /* Pol_000:  err=7.93e-09, x in [0.0000000; 0.0312500]. */
> +        .quad 0x0000000000000000  /* A00 = +0.000000000000000000000e-01 */
> +        .quad 0x3FF00000022C70EB  /* A01 = +1.000000008097283510367e+00 */
> +        .quad 0xBED00E878CFFA194  /* A02 = -3.828228912518614443549e-06 */
> +        .quad 0xBFD551766D0607A9  /* A03 = -3.330970825846813476723e-01 */
> +        .quad 0xBE53D60CE3E4C297  /* A00 = -1.847383956330407336230e-08 */
> +        .quad 0x3FF000024177CF5C  /* A01 = +1.000002151235967140508e+00 */
> +        .quad 0xBF1758BC94A51A25  /* A02 = -8.906031613262943753568e-05 */
> +        .quad 0xBFD53EAE67E0D4F0  /* A03 = -3.319507612644221339337e-01 */
> +        .quad 0xBE5A9E47EF32D6FE  /* A00 = -2.479020984039698285657e-08 */
> +        .quad 0x3FF00002DA983057  /* A01 = +1.000002721676556793895e+00 */
> +        .quad 0xBF1BD953509E94AA  /* A02 = -1.062352277175377670507e-04 */
> +        .quad 0xBFD53BDB562EEDD5  /* A03 = -3.317783681520414806876e-01 */
> +        .quad 0xBE6191BBE496D294  /* A00 = -3.272532162914017685901e-08 */
> +        .quad 0x3FF0000390492017  /* A01 = +1.000003398528866105366e+00 */
> +        .quad 0xBF20727E814A57CE  /* A02 = -1.254825043772153972919e-04 */
> +        .quad 0xBFD538DE060A6F22  /* A03 = -3.315959033004550748913e-01 */
> +        .quad 0xBE66DAFA2A893A25  /* A00 = -4.257146219278012568149e-08 */
> +        .quad 0x3FF0000465E08CD1  /* A01 = +1.000004194219219266770e+00 */
> +        .quad 0xBF2341C765EF91B6  /* A02 = -1.469188600530365522261e-04 */
> +        .quad 0xBFD535B6841FAF9E  /* A03 = -3.314033785124993469751e-01 */
> +        .quad 0xBE6D5794E361E964  /* A00 = -5.465394929765249413434e-08 */
> +        .quad 0x3FF000055EE2A0CB  /* A01 = +1.000005121846742950353e+00 */
> +        .quad 0xBF265E6C77E66C8B  /* A02 = -1.706607253709506650304e-04 */
> +        .quad 0xBFD53264DDCCEDA6  /* A03 = -3.312008062382240103361e-01 */
> +        .quad 0xBE729C844D374A6E  /* A00 = -6.933284462462096107184e-08 */
> +        .quad 0x3FF000067F019093  /* A01 = +1.000006195180536350264e+00 */
> +        .quad 0xBF29CC5348D6DCE5  /* A02 = -1.968242326435338705130e-04 */
> +        .quad 0xBFD52EE92121ED35  /* A03 = -3.309881995734998416658e-01 */
> +        .quad 0xBE775AEA17EAA872  /* A00 = -8.700465590574974405858e-08 */
> +        .quad 0x3FF00007CA1D66B8  /* A01 = +1.000007428656699559610e+00 */
> +        .quad 0xBF2D8F5EB98A2637  /* A02 = -2.255252009216044881395e-04 */
> +        .quad 0xBFD52B435CDF9128  /* A03 = -3.307655722585587376727e-01 */
> +        .quad 0xBE7D04DA28C343F0  /* A00 = -1.081040272327705484794e-07 */
> +        .quad 0x3FF000094443CCF5  /* A01 = +1.000008837375216730337e+00 */
> +        .quad 0xBF30D5B76C947AE5  /* A02 = -2.568791210978817814332e-04 */
> +        .quad 0xBFD52773A0776FAD  /* A03 = -3.305329386764651045105e-01 */
> +        .quad 0xBE81DD77A12C51C7  /* A00 = -1.331054169875768625701e-07 */
> +        .quad 0x3FF0000AF1AFD2DA  /* A01 = +1.000010437096696680470e+00 */
> +        .quad 0xBF331230624C1680  /* A02 = -2.910011410651516805537e-04 */
> +        .quad 0xBFD52379FC0B61DF  /* A03 = -3.302903138515186909352e-01 */
> +        .quad 0xBE85D04EEEB3C435  /* A00 = -1.625247628488202841012e-07 */
> +        .quad 0x3FF0000CD6C9B1F2  /* A01 = +1.000012244238970726684e+00 */
> +        .quad 0xBF357F0742FADDD4  /* A02 = -3.280060509313874068243e-04 */
> +        .quad 0xBFD51F56806D0E81  /* A03 = -3.300377134475880880338e-01 */
> +        .quad 0xBE8A6E289B59681B  /* A00 = -1.969211333326924655065e-07 */
> +        .quad 0x3FF0000EF8268F72  /* A01 = +1.000014275873550406715e+00 */
> +        .quad 0xBF381E277A1B747A  /* A02 = -3.680082682942575423093e-04 */
> +        .quad 0xBFD51B093F1D6FD4  /* A03 = -3.297751537663746734808e-01 */
> +        .quad 0xBE8FCBC40EE9ABD5  /* A00 = -2.368983653301529373887e-07 */
> +        .quad 0x3FF000115A883B6C  /* A01 = +1.000016549721943981410e+00 */
> +        .quad 0xBF3AF17AC974B3D9  /* A02 = -4.111218235774406434303e-04 */
> +        .quad 0xBFD516924A4C549C  /* A03 = -3.295026517456081105450e-01 */
> +        .quad 0xBE92FFBC60A3F956  /* A00 = -2.831066871072026054144e-07 */
> +        .quad 0x3FF0001402DCED8A  /* A01 = +1.000019084151832604590e+00 */
> +        .quad 0xBF3DFAE9390C4801  /* A02 = -4.574603454311488280083e-04 */
> +        .quad 0xBFD511F1B4D7DC3A  /* A03 = -3.292202249571719585575e-01 */
> +        .quad 0xBE9690A22F96D5AD  /* A00 = -3.362443262393081632612e-07 */
> +        .quad 0x3FF00016F63EFF5D  /* A01 = +1.000021898173108825247e+00 */
> +        .quad 0xBF409E2C839605BB  /* A02 = -5.071370461992499986334e-04 */
> +        .quad 0xBFD50D27924BEE00  /* A03 = -3.289278916051614487515e-01 */
> +        .quad 0xBE9AA56C65E72A73  /* A00 = -3.970591019557469835586e-07 */
> +        .quad 0x3FF0001A39F4A43E  /* A01 = +1.000025011433776978009e+00 */
> +        .quad 0xBF425BD74C3D6667  /* A02 = -5.602647074553602319844e-04 */
> +        .quad 0xBFD50833F6E1ABA2  /* A03 = -3.286256705238718156536e-01 */
> +        .quad 0xBE9F4BD4FF1A83B0  /* A00 = -4.663500013744687071912e-07 */
> +        .quad 0x3FF0001DD36F9EC2  /* A01 = +1.000028444215715683896e+00 */
> +        .quad 0xBF44376634149405  /* A02 = -6.169556656102642569831e-04 */
> +        .quad 0xBFD50316F77EDEE5  /* A03 = -3.283135811757190158922e-01 */
> +        .quad 0xBEA3B625387BB079  /* A00 = -5.874486399249461304297e-07 */
> +        .quad 0x3FF00023E14CFBA9  /* A01 = +1.000034217911642153709e+00 */
> +        .quad 0xBF47392F923218D2  /* A02 = -7.087213783883111826306e-04 */
> +        .quad 0xBFD4FB1FACDEB938  /* A03 = -3.278273761924483942209e-01 */
> +        .quad 0xBEAA6E24F543500A  /* A00 = -7.876828740601738750574e-07 */
> +        .quad 0x3FF0002D5C6E8412  /* A01 = +1.000043259679163742959e+00 */
> +        .quad 0xBF4BAF02BD7FDD70  /* A02 = -8.448375110664940040861e-04 */
> +        .quad 0xBFD4EFEE6527A7DE  /* A03 = -3.271442401734229177279e-01 */
> +        .quad 0xBEB16E3EBE2157D0  /* A00 = -1.038947396133402500647e-06 */
> +        .quad 0x3FF00038990FEE2F  /* A01 = +1.000053975962952312884e+00 */
> +        .quad 0xBF50569481C574CB  /* A02 = -9.972048056490652716971e-04 */
> +        .quad 0xBFD4E419278DA2B4  /* A03 = -3.264220129263251113372e-01 */
> +        .quad 0xBEB6A7B6723165D4  /* A00 = -1.350350836279403750524e-06 */
> +        .quad 0x3FF00045CAB4158E  /* A01 = +1.000066558657042303793e+00 */
> +        .quad 0xBF531D7C9C849108  /* A02 = -1.166698160951775212202e-03 */
> +        .quad 0xBFD4D7A0BB33B152  /* A03 = -3.256608799117844954552e-01 */
> +        .quad 0xBEBD0EE2A8654AFD  /* A00 = -1.732000471561702711532e-06 */
> +        .quad 0x3FF00055276F18D6  /* A01 = +1.000081209219890521211e+00 */
> +        .quad 0xBF562FDBA3FB6C6C  /* A02 = -1.354183666925102939860e-03 */
> +        .quad 0xBFD4CA85F1B93DB2  /* A03 = -3.248610363561638125773e-01 */
> +        .quad 0xBEC269D4036A207E  /* A00 = -2.195047297096822741730e-06 */
> +        .quad 0x3FF00066E7DA6E4E  /* A01 = +1.000098138500919997540e+00 */
> +        .quad 0xBF5991499FC36B3A  /* A02 = -1.560518167983372759405e-03 */
> +        .quad 0xBFD4BCC9A72283D6  /* A03 = -3.240226871658341556426e-01 */
> +        .quad 0xBEC7154B6C09CFE1  /* A00 = -2.751729738565190291276e-06 */
> +        .quad 0x3FF0007B47086B80  /* A01 = +1.000117566559055148900e+00 */
> +        .quad 0xBF5D455433B4F8F4  /* A02 = -1.786548832412968197680e-03 */
> +        .quad 0xBFD4AE6CC1BFE145  /* A03 = -3.231460468373550942722e-01 */
> +        .quad 0xBECCA68CC64A0F8A  /* A00 = -3.415415948561670285790e-06 */
> +        .quad 0x3FF00092827742F7  /* A01 = +1.000139722473418535387e+00 */
> +        .quad 0xBF60A7BF15A527AF  /* A02 = -2.033112728132522705610e-03 */
> +        .quad 0xBFD49F703214084C  /* A03 = -3.222313393636155876010e-01 */
> +        .quad 0xBED19E68676B241B  /* A00 = -4.200644630977303616698e-06 */
> +        .quad 0x3FF000ACDA037B26  /* A01 = +1.000164844146362863597e+00 */
> +        .quad 0xBF62D99F836A02F8  /* A02 = -2.301036405072284102280e-03 */
> +        .quad 0xBFD48FD4F2B91B28  /* A03 = -3.212787981359945810311e-01 */
> +        .quad 0xBED57CF4B0C7AA54  /* A00 = -5.123164339408145209103e-06 */
> +        .quad 0x3FF000CA8FD9E1A1  /* A01 = +1.000193178099017865534e+00 */
> +        .quad 0xBF653A014548E686  /* A02 = -2.591135484433962181405e-03 */
> +        .quad 0xBFD47F9C0844B38F  /* A03 = -3.202886658426046806447e-01 */
> +        .quad 0xBEDA012B1B1A41E2  /* A00 = -6.199971197454598722328e-06 */
> +        .quad 0x3FF000EBE868FDF4  /* A01 = +1.000224979259539459520e+00 */
> +        .quad 0xBF67CA9427E0A544  /* A02 = -2.904214255086275467410e-03 */
> +        .quad 0xBFD46EC6812ADB37  /* A03 = -3.192611943626845749655e-01 */
> +        .quad 0xBEDF3EAC5BF12194  /* A00 = -7.449344990702664567927e-06 */
> +        .quad 0x3FF001112A520784  /* A01 = +1.000260510744255704196e+00 */
> +        .quad 0xBF6A8D01ABDA4DC4  /* A02 = -3.241065277345108255891e-03 */
> +        .quad 0xBFD45D55759FFA4A  /* A03 = -3.181966446572103146551e-01 */
> +        .quad 0xBEE2A541BC274267  /* A00 = -8.890883582164319970972e-06 */
> +        .quad 0x3FF0013A9E5961F2  /* A01 = +1.000300043631906721231e+00 */
> +        .quad 0xBF6D82ECD080C540  /* A02 = -3.602468994380686462264e-03 */
> +        .quad 0xBFD44B4A0779C0AD  /* A03 = -3.170952866557950611259e-01 */
> +        .quad 0xBEE61D97609A27F4  /* A00 = -1.054553560499505625520e-05 */
> +        .quad 0x3FF001688F56A3AF  /* A01 = +1.000343856731187974773e+00 */
> +        .quad 0xBF7056F8EFB683EC  /* A02 = -3.989193351487490407647e-03 */
> +        .quad 0xBFD438A5620F0F74  /* A03 = -3.159573991399533543500e-01 */
> +        .quad 0xBEEA145429EDD370  /* A00 = -1.243563138839952927732e-05 */
> +        .quad 0x3FF0019B4A242A67  /* A01 = +1.000392236341804297339e+00 */
> +        .quad 0xBF7207D31CA78D9B  /* A02 = -4.401993423445739288258e-03 */
> +        .quad 0xBFD42568BA16E7CD  /* A03 = -3.147832696228050619602e-01 */
> +        .quad 0xBEEE96370D52680F  /* A00 = -1.458491207477835326165e-05 */
> +        .quad 0x3FF001D31D8E4115  /* A01 = +1.000445476009251821736e+00 */
> +        .quad 0xBF73D4CC11EDC094  /* A02 = -4.841611050196221316400e-03 */
> +        .quad 0xBFD411954D8664E7  /* A03 = -3.135731942252974469021e-01 */
> +        .quad 0xBEF338C046215EF8  /* A00 = -1.833122622260562810219e-05 */
> +        .quad 0x3FF00230C32C2EC1  /* A01 = +1.000534784691737621998e+00 */
> +        .quad 0xBF76BD019BCC5DAF  /* A02 = -5.551344188254799492943e-03 */
> +        .quad 0xBFD3F2C7156DC21E  /* A03 = -3.116929730668135389848e-01 */
> +        .quad 0xBEF9B15EAE411EAE  /* A00 = -2.450261207822986676092e-05 */
> +        .quad 0x3FF002C2DF057A4D  /* A01 = +1.000674124886830940184e+00 */
> +        .quad 0xBF7B08CCD9AC1E30  /* A02 = -6.600189396301511801646e-03 */
> +        .quad 0xBFD3C7A7A114FED8  /* A03 = -3.090609620157755976777e-01 */
> +        .quad 0xBF00E36483C373B3  /* A00 = -3.221178528332122595812e-05 */
> +        .quad 0x3FF0036F419480D7  /* A01 = +1.000838524028997644777e+00 */
> +        .quad 0xBF7FD255D1777007  /* A02 = -7.768950679260206403087e-03 */
> +        .quad 0xBFD39A453911D6CE  /* A03 = -3.062909180947429588215e-01 */
> +        .quad 0xBF05DFA04DD12059  /* A00 = -4.172046622180685472624e-05 */
> +        .quad 0x3FF00438B2A03D8D  /* A01 = +1.001030633695197069599e+00 */
> +        .quad 0xBF828F8DBB4A9D10  /* A02 = -9.062869337255224921890e-03 */
> +        .quad 0xBFD36AAB704697D9  /* A03 = -3.033856007044711255993e-01 */
> +        .quad 0xBF0BF3E0C647DEFB  /* A00 = -5.331544597092331081714e-05 */
> +        .quad 0x3FF005221063D36D  /* A01 = +1.001253189109060359741e+00 */
> +        .quad 0xBF857A2CB3C96102  /* A02 = -1.048693584122917590862e-02 */
> +        .quad 0xBFD338E65BBB4FEC  /* A03 = -3.003478904549854444639e-01 */
> +        .quad 0xBF11A506ED7C9D31  /* A00 = -6.730894835681591541979e-05 */
> +        .quad 0x3FF0062E4D0EA92A  /* A01 = +1.001508999829250345925e+00 */
> +        .quad 0xBF88AB82C2761AF3  /* A02 = -1.204588085125866091241e-02 */
> +        .quad 0xBFD305028D6BD206  /* A03 = -2.971807843271395688234e-01 */
> +        .quad 0xBF1607C0922D9BF1  /* A00 = -8.403885708006799337092e-05 */
> +        .quad 0x3FF007606C341961  /* A01 = +1.001800940198869449560e+00 */
> +        .quad 0xBF8C25E6DA487BCF  /* A02 = -1.374416688582682892494e-02 */
> +        .quad 0xBFD2CF0D0EE8F7B5  /* A03 = -2.938873906713255768075e-01 */
> +        .quad 0xBF1B3A8480A0A16D  /* A00 = -1.038688061788578038307e-04 */
> +        .quad 0x3FF008BB802D02D6  /* A01 = +1.002131939589323561535e+00 */
> +        .quad 0xBF8FEB8AE99FD100  /* A02 = -1.558598065819483124983e-02 */
> +        .quad 0xBFD297135BD0911B  /* A03 = -2.904709240558688843059e-01 */
> +        .quad 0xBF20ABB9BDB75C65  /* A00 = -1.271881327357976163798e-04 */
> +        .quad 0x3FF00A42A76D8CD1  /* A01 = +1.002504972472525901495e+00 */
> +        .quad 0xBF91FF3D752BB9E6  /* A02 = -1.757522609380570560722e-02 */
> +        .quad 0xBFD25D235C1F88B4  /* A03 = -2.869346999779154305799e-01 */
> +        .quad 0xBF243D3254425461  /* A00 = -1.544116913733432829448e-04 */
> +        .quad 0x3FF00BF909D1795E  /* A01 = +1.002923048355647051011e+00 */
> +        .quad 0xBF94304E04D44942  /* A02 = -1.971551804042204897316e-02 */
> +        .quad 0xBFD2214B5E61CFA6  /* A03 = -2.832821294498394371075e-01 */
> +        .quad 0xBF286070011B61CE  /* A00 = -1.859795307186510085994e-04 */
> +        .quad 0x3FF00DE1D5E1627E  /* A01 = +1.003389201612804537689e+00 */
> +        .quad 0xBF9689D5F4163F59  /* A02 = -2.201017668045266231780e-02 */
> +        .quad 0xBFD1E39A11C3B42C  /* A03 = -2.795167134743816728104e-01 */
> +        .quad 0xBF2D250B366A79E8  /* A00 = -2.223564326486314902259e-04 */
> +        .quad 0x3FF010003E134001  /* A01 = +1.003906481248123094829e+00 */
> +        .quad 0xBF990C9FF91F6F81  /* A02 = -2.446222265267250853271e-02 */
> +        .quad 0xBFD1A41E80084CDC  /* A03 = -2.756420374218586655246e-01 */
> +        .quad 0xBF314DB5DDC2A30E  /* A00 = -2.640313157465248123865e-04 */
> +        .quad 0x3FF012577608921B  /* A01 = +1.004477940624503018441e+00 */
> +        .quad 0xBF9BB9626875B0C9  /* A02 = -2.707437288829409385849e-02 */
> +        .quad 0xBFD162E80768A9D0  /* A03 = -2.716617653228725615122e-01 */
> +        .quad 0xBF346A6133808864  /* A00 = -3.115165050094957730625e-04 */
> +        .quad 0x3FF014EAAFCC88A3  /* A01 = +1.005106627192198898157e+00 */
> +        .quad 0xBF9E90BEF9BF7419  /* A02 = -2.984903716411588595059e-02 */
> +        .quad 0xBFD12006545F7FAD  /* A03 = -2.675796340899932457269e-01 */
> +        .quad 0xBF37F180DC3848EA  /* A00 = -3.653468704395550778821e-04 */
> +        .quad 0x3FF017BD19147861  /* A01 = +1.005795572250939295955e+00 */
> +        .quad 0xBFA0C9A14C702E07  /* A02 = -3.278831537326359207851e-02 */
> +        .quad 0xBFD0DB895B650092  /* A03 = -2.633994476818851682154e-01 */
> +        .quad 0xBF3BEC6AAC6D7635  /* A00 = -4.260788377246944457107e-04 */
> +        .quad 0x3FF01AD1D884E719  /* A01 = +1.006547780778822565040e+00 */
> +        .quad 0xBFA260B2A1B1434A  /* A02 = -3.589399551186163439542e-02 */
> +        .quad 0xBFD09581529E93D6  /* A03 = -2.591250712233067465817e-01 */
> +        .quad 0xBF4164E26167882B  /* A00 = -5.308251737086202562063e-04 */
> +        .quad 0x3FF01FEF14B62B81  /* A01 = +1.007796364693348545316e+00 */
> +        .quad 0xBFA4EB014538AA42  /* A02 = -4.085544557559163403315e-02 */
> +        .quad 0xBFD029D36FEAF41F  /* A03 = -2.525528519580024222613e-01 */
> +        .quad 0xBF46F6FFF4E53DC8  /* A00 = -7.008313930700277652464e-04 */
> +        .quad 0x3FF027CBB51CBBA0  /* A01 = +1.009715754956893363214e+00 */
> +        .quad 0xBFA89DEC9FEC112E  /* A02 = -4.807986690687680864098e-02 */
> +        .quad 0xBFCF2A99464D0DB4  /* A03 = -2.434875100390009317053e-01 */
> +        .quad 0xBF4DCC9C4F66A4D9  /* A00 = -9.094012482836712945103e-04 */
> +        .quad 0x3FF030E7CFCCD583  /* A01 = +1.011939822882909068014e+00 */
> +        .quad 0xBFACAA3B95814081  /* A02 = -5.598627281199331645611e-02 */
> +        .quad 0xBFCDF78F156BE7CF  /* A03 = -2.341173987004467604844e-01 */
> +        .quad 0xBF5308ED74E5C7A6  /* A00 = -1.161796466103906435435e-03 */
> +        .quad 0x3FF03B5986412ECB  /* A01 = +1.014489674026594512313e+00 */
> +        .quad 0xBFB087EBA88DCC3F  /* A02 = -6.457398285947223148806e-02 */
> +        .quad 0xBFCCBB9BD134862F  /* A03 = -2.244753619680052991736e-01 */
> +        .quad 0xBF57FA23C00DF4B5  /* A00 = -1.463446533505758208674e-03 */
> +        .quad 0x3FF0473558A1BCC0  /* A01 = +1.017384859292903342975e+00 */
> +        .quad 0xBFB2E702BC6360EF  /* A02 = -7.383744334527241048871e-02 */
> +        .quad 0xBFCB77D546379288  /* A03 = -2.145945160729250122955e-01 */
> +        .quad 0xBF5DD12971557F71  /* A00 = -1.819887610814388068450e-03 */
> +        .quad 0x3FF0548DDF5000A8  /* A01 = +1.020643112482540360020e+00 */
> +        .quad 0xBFB571B63DA186E1  /* A02 = -8.376635555898871710045e-02 */
> +        .quad 0xBFCA2D5202605148  /* A03 = -2.045080672838912594358e-01 */
> +        .quad 0xBF6252B1AD5D4F17  /* A00 = -2.236697221556737096709e-03 */
> +        .quad 0x3FF063738A910BF7  /* A01 = +1.024280110622155737232e+00 */
> +        .quad 0xBFB8270C8E6B601B  /* A02 = -9.434584118878357184013e-02 */
> +        .quad 0xBFC8DD27D950A07E  /* A03 = -1.942491351230763441116e-01 */
> +        .quad 0xBF66470C91730CFC  /* A00 = -2.719425723258004842786e-03 */
> +        .quad 0x3FF073F468FCF331  /* A01 = +1.028309259519300633556e+00 */
> +        .quad 0xBFBB05C2952191E4  /* A02 = -1.055566419686964629854e-01 */
> +        .quad 0xBFC7886A770DE2BD  /* A03 = -1.838505822486435070662e-01 */
> +        .quad 0xBF6AD114AC8E98EC  /* A00 = -3.273525599485007861467e-03 */
> +        .quad 0x3FF0861BF53E5226  /* A01 = +1.032741506559554434119e+00 */
> +        .quad 0xBFBE0C4F9B461507  /* A02 = -1.173753503881763554650e-01 */
> +        .quad 0xBFC6302A037CDE3A  /* A03 = -1.733448521642786954722e-01 */
> +        .quad 0xBF6FFBDE2A6C2AF8  /* A00 = -3.904279630096648551207e-03 */
> +        .quad 0x3FF099F2EB8E7DA3  /* A01 = +1.037585182326304034106e+00 */
> +        .quad 0xBFC09C74D192DDF0  /* A02 = -1.297746680554463516444e-01 */
> +        .quad 0xBFC4D571D8E3079F  /* A03 = -1.627638157861470424859e-01 */
> +        .quad 0xBF72E8FDC0B952AA  /* A00 = -4.616728994353872309042e-03 */
> +        .quad 0x3FF0AF7F273C9533  /* A01 = +1.042845872181101141152e+00 */
> +        .quad 0xBFC244C512736F10  /* A02 = -1.427236881344176033792e-01 */
> +        .quad 0xBFC379474F58B902  /* A03 = -1.521386277613104298645e-01 */
> +        .quad 0xBF762EABAF17395B  /* A00 = -5.415602341101023557701e-03 */
> +        .quad 0x3FF0C6C3886F63FB  /* A01 = +1.048526318502125631582e+00 */
> +        .quad 0xBFC3FDF9918EA12A  /* A02 = -1.561881981590514389957e-01 */
> +        .quad 0xBFC21CA89ECAB895  /* A03 = -1.414995932913753196036e-01 */
> +        .quad 0xBF79D387CE5B2BAE  /* A00 = -6.305246822828998107258e-03 */
> +        .quad 0x3FF0DFBFE2346376  /* A01 = +1.054626353847394337748e+00 */
> +        .quad 0xBFC5C6DA43602620  /* A02 = -1.701309994680721970894e-01 */
> +        .quad 0xBFC0C08BD8DB6631  /* A03 = -1.308760460731704100557e-01 */
> +        .quad 0xBF7DDBA8E8DA9060  /* A00 = -7.289562037531366334164e-03 */
> +        .quad 0x3FF0FA70F0D1B464  /* A01 = +1.061142864894713433443e+00 */
> +        .quad 0xBFC79E18D92BAA7C  /* A02 = -1.845122394946264732241e-01 */
> +        .quad 0xBFBECBBBF74C2669  /* A03 = -1.202962378266875381749e-01 */
> +        .quad 0xBF81254E76EA25DA  /* A00 = -8.371937755572145950511e-03 */
> +        .quad 0x3FF116D05835EBD0  /* A01 = +1.068069786618014660462e+00 */
> +        .quad 0xBFC982539E2ED224  /* A02 = -1.992897531869327609755e-01 */
> +        .quad 0xBFBC1B043C350159  /* A03 = -1.097872397413132278254e-01 */
> +        .quad 0xBF8391ACBA863403  /* A00 = -9.555196230190082448686e-03 */
> +        .quad 0x3FF134D4AA477FE2  /* A01 = +1.075398125794884141015e+00 */
> +        .quad 0xBFCB7218609FEAFB  /* A02 = -2.144194099235717521079e-01 */
> +        .quad 0xBFB970A16CB88329  /* A03 = -9.937485603633135211599e-02 */
> +        .quad 0xBF87935088E48E8B  /* A00 = -1.151144902957603431692e-02 */
> +        .quad 0x3FF1649892AD7DD3  /* A01 = +1.087059567413110938716e+00 */
> +        .quad 0xBFCE6971DDE75409  /* A02 = -2.375929196847723912089e-01 */
> +        .quad 0xBFB58291E88CB251  /* A03 = -8.402358939628952472223e-02 */
> +        .quad 0xBF8DB3A62C325325  /* A00 = -1.450280973794233242702e-02 */
> +        .quad 0x3FF1A9C900C6DEEA  /* A01 = +1.103951457056548068891e+00 */
> +        .quad 0xBFD13DBC65B0E08E  /* A02 = -2.693930619311765140012e-01 */
> +        .quad 0xBFB06696F62696D1  /* A03 = -6.406539449252625362252e-02 */
> +        .quad 0xBF92583699F2E27A  /* A00 = -1.791463198307716858659e-02 */
> +        .quad 0x3FF1F451B85AA9F0  /* A01 = +1.122148246892376022288e+00 */
> +        .quad 0xBFD34FD5F8288180  /* A02 = -3.017477916164565954205e-01 */
> +        .quad 0xBFA6FB692825B683  /* A03 = -4.488686194495718900788e-02 */
> +        .quad 0xBF9641C26E673D6F  /* A00 = -2.173522757385398448959e-02 */
> +        .quad 0x3FF24364DA5E2B07  /* A01 = +1.141453602790251542487e+00 */
> +        .quad 0xBFD564A5A5EF5890  /* A02 = -3.342680092295120530821e-01 */
> +        .quad 0xBF9B43712011A982  /* A03 = -2.662445791467283467968e-02 */
> +        .quad 0xBF9A901038EC2F39  /* A00 = -2.594018313816024226548e-02 */
> +        .quad 0x3FF2961356DFFEBA  /* A01 = +1.161639537196534011088e+00 */
> +        .quad 0xBFD775EBB17198C7  /* A02 = -3.665723069046972759644e-01 */
> +        .quad 0xBF833B1A926CD462  /* A03 = -9.390075295963199591975e-03 */
> +        .quad 0xBF9F396A6A461B91  /* A00 = -3.049246095317987084727e-02 */
> +        .quad 0x3FF2EB53BAEF534B  /* A01 = +1.182452898229899629357e+00 */
> +        .quad 0xBFD97DABF8AD8BBD  /* A02 = -3.982953957076310058660e-01 */
> +        .quad 0x3F7B8F6A3E0F8837  /* A03 = +6.728568086119371925713e-03 */
> +        .quad 0xBFA21878590F8BAA  /* A00 = -3.534294211546946951064e-02 */
> +        .quad 0x3FF34209790236E1  /* A01 = +1.203622315111197105253e+00 */
> +        .quad 0xBFDB764C0E71BECB  /* A02 = -4.290952817018306997277e-01 */
> +        .quad 0x3F962FE0C03F84C0  /* A03 = +2.166701482190513949888e-02 */
> +        .quad 0xBFA4B36B9AD27ECC  /* A00 = -4.043136849327097492868e-02 */
> +        .quad 0x3FF3990C5B12FC16  /* A01 = +1.224865298994477935679e+00 */
> +        .quad 0xBFDD5AABB0D01390  /* A02 = -4.586590983092770912322e-01 */
> +        .quad 0x3FA21DAF5CA162DB  /* A03 = +3.538272863142363083844e-02 */
> +        .quad 0xBFA7645E4D7BF28B  /* A00 = -4.568762489177399105378e-02 */
> +        .quad 0x3FF3EF2FD51C0D9F  /* A01 = +1.245895225962932562069e+00 */
> +        .quad 0xBFDF26377E1B686E  /* A02 = -4.867075664057044503963e-01 */
> +        .quad 0x3FA8803E756EE812  /* A03 = +4.785342391501513914509e-02 */
> +        .quad 0xBFAA210925C64413  /* A00 = -5.103329263796054643398e-02 */
> +        .quad 0x3FF44349F897D8E7  /* A01 = +1.266427966181760345066e+00 */
> +        .quad 0xBFE06A7B02C6D8E2  /* A02 = -5.129981092675530707226e-01 */
> +        .quad 0x3FAE3F194734F5D0  /* A03 = +5.907515520309980505687e-02 */
> +        .quad 0xBFACDE48F8A19BBB  /* A00 = -5.638340029764018351832e-02 */
> +        .quad 0x3FF49439D5466582  /* A01 = +1.286187966447272845727e+00 */
> +        .quad 0xBFE131C7C1063DDC  /* A02 = -5.373266954429101183166e-01 */
> +        .quad 0x3FB1ADEEC36AD805  /* A03 = +6.906025191241844940482e-02 */
> +        .quad 0xBFAF905D8F585680  /* A00 = -6.164829611604449866036e-02 */
> +        .quad 0x3FF4E0ED1FD27F99  /* A01 = +1.304913639360142818546e+00 */
> +        .quad 0xBFE1E7A859DC1D3D  /* A02 = -5.595285182070380836095e-01 */
> +        .quad 0x3FB3ED018E4642A1  /* A03 = +7.783517573831001679086e-02 */
> +        .quad 0xBFB11595104160BA  /* A00 = -6.673556944713512906198e-02 */
> +        .quad 0x3FF528650340490B  /* A01 = +1.322361958217302513319e+00 */
> +        .quad 0xBFE28B14B40BC974  /* A02 = -5.794776455425521000109e-01 */
> +        .quad 0x3FB5DF49F5BAF6D7  /* A03 = +8.543836831355676453281e-02 */
> +        .quad 0xBFB2513A97344BA4  /* A00 = -7.155195418844911836587e-02 */
> +        .quad 0x3FF569BA0DB5EE14  /* A01 = +1.338312200124055273420e+00 */
> +        .quad 0xBFE31B53A8B67B20  /* A02 = -5.970857901737396389308e-01 */
> +        .quad 0x3FB787F297BB0544  /* A03 = +9.191814617499455275507e-02 */
> +        .quad 0xBFB37512E848FAFA  /* A00 = -7.600515528700305112331e-02 */
> +        .quad 0x3FF5A41F33B403C8  /* A01 = +1.352568819013173495591e+00 */
> +        .quad 0xBFE397F6EA9A58A5  /* A02 = -6.123003561103997904880e-01 */
> +        .quad 0x3FB8EAA9FF25CA06  /* A03 = +9.733068923177520814782e-02 */
> +        .quad 0xBFB47B3E603AFC5D  /* A00 = -8.000554894805263217439e-02 */
> +        .quad 0x3FF5D6E3EDE40487  /* A01 = +1.364963464031718975988e+00 */
> +        .quad 0xBFE400D5BCA6D631  /* A02 = -6.251019177058819709103e-01 */
> +        .quad 0x3FBA0B830ED567FE  /* A03 = +1.017381583418739132707e-01 */
> +        .quad 0xBFB5BBFE8AC90496  /* A00 = -8.489981544791400103200e-02 */
> +        .quad 0x3FF612BA70107E95  /* A01 = +1.379572332145390989311e+00 */
> +        .quad 0xBFE477EAF1FA7693  /* A02 = -6.396383978023599814478e-01 */
> +        .quad 0x3FBB4784B7C08A95  /* A03 = +1.065600346196709652391e-01 */
> +        .quad 0xBFB6D5D940743939  /* A00 = -8.920057128509463473254e-02 */
> +        .quad 0x3FF644A8748F70CE  /* A01 = +1.391762214006166953340e+00 */
> +        .quad 0xBFE4D646AB07EA37  /* A02 = -6.511567440459832267763e-01 */
> +        .quad 0x3FBC354F4E1D5292  /* A03 = +1.101884427747086558913e-01 */
> +        .quad 0xBFB7223D19E4F3D1  /* A00 = -9.036619074045339206069e-02 */
> +        .quad 0x3FF6518FEB42B7FA  /* A01 = +1.394912642466350494175e+00 */
> +        .quad 0xBFE4ED86CB87498C  /* A02 = -6.539949393430091184598e-01 */
> +        .quad 0x3FBC6D29F28CCA9B  /* A03 = +1.110407082713131127205e-01 */
> +        .quad 0xBFB6878652FF6312  /* A00 = -8.800544287022329936754e-02 */
> +        .quad 0x3FF63948C302D040  /* A01 = +1.388985406648330922508e+00 */
> +        .quad 0xBFE4C4E2E7904E17  /* A02 = -6.490339777687407218920e-01 */
> +        .quad 0x3FBC127356CA1ABE  /* A03 = +1.096565329445224612481e-01 */
> +        .quad 0xBFB4F5D18B0C91D6  /* A00 = -8.187589306596207427980e-02 */
> +        .quad 0x3FF5FD27EB7DD0B8  /* A01 = +1.374305648697413673176e+00 */
> +        .quad 0xBFE464E01A2B2FC6  /* A02 = -6.373138915164353601739e-01 */
> +        .quad 0x3FBB460547674A30  /* A03 = +1.065371798825160976065e-01 */
> +        .quad 0xBFB26642FA16A685  /* A00 = -7.187288861919156890412e-02 */
> +        .quad 0x3FF59F9BEDE1C95A  /* A01 = +1.351467065073470141812e+00 */
> +        .quad 0xBFE3D67920C8FBEA  /* A02 = -6.199308052381387046381e-01 */
> +        .quad 0x3FBA24F6A8D3CBC1  /* A03 = +1.021265184570401413078e-01 */
> +        .quad 0xBFADB5294794F097  /* A00 = -5.802277563859197656582e-02 */
> +        .quad 0x3FF523EA7B9CF453  /* A01 = +1.321268542159732772845e+00 */
> +        .quad 0xBFE322A8B55E35DB  /* A02 = -5.979808370918208160205e-01 */
> +        .quad 0x3FB8C8673B1B3E37  /* A03 = +9.680791085269722928697e-02 */
> +        .quad 0xBFA4B7D661965C6A  /* A00 = -4.046506825687219699450e-02 */
> +        .quad 0x3FF48DE3E2CE3122  /* A01 = +1.284641157110919085227e+00 */
> +        .quad 0xBFE251FED1A7F445  /* A02 = -5.725092024655472622285e-01 */
> +        .quad 0x3FB745699FCABDB9  /* A03 = +9.090290213747821701507e-02 */
> +        .quad 0xBF93E60456E4EE1D  /* A00 = -1.943213253365004902773e-02 */
> +        .quad 0x3FF3E1A14E628A59  /* A01 = +1.242585474196536532432e+00 */
> +        .quad 0xBFE16C5AB660E876  /* A02 = -5.444768488007543094653e-01 */
> +        .quad 0x3FB5AD33AA8C188F  /* A03 = +8.467410005332197397987e-02 */
> +        .quad 0x3F738C17C47C7961  /* A00 = +4.772274820224659853951e-03 */
> +        .quad 0x3FF3234DDE3BD146  /* A01 = +1.196119182682268355933e+00 */
> +        .quad 0xBFE078C0D77A9D3B  /* A02 = -5.147403915952176722826e-01 */
> +        .quad 0x3FB40D74B3E276B8  /* A03 = +7.833032027925923568290e-02 */
> +        .quad 0x3FA0474BECC689C7  /* A00 = +3.179394975019849550746e-02 */
> +        .quad 0x3FF256FB4FA7D18A  /* A01 = +1.146235762743432307076e+00 */
> +        .quad 0xBFDEFA8E3FB285E2  /* A02 = -4.840427038235174395098e-01 */
> +        .quad 0x3FB270C007493D59  /* A03 = +7.203293016322244446403e-02 */
> +        .quad 0x3FAF5BD51E479BDC  /* A00 = +6.124750132203590768931e-02 */
> +        .quad 0x3FF18081D0B53BC5  /* A01 = +1.093873801484492647162e+00 */
> +        .quad 0xBFDCFE2439BD0C03  /* A02 = -4.530115665294831006626e-01 */
> +        .quad 0x3FB0DEFE5A45AFDD  /* A03 = +6.590261176978580437424e-02 */
> +        .quad 0x3FB7BD5D2806EA26  /* A00 = +9.273321368429118805032e-02 */
> +        .quad 0x3FF0A369E35B4440  /* A01 = +1.039895904647224256223e+00 */
> +        .quad 0xBFDB04BC5C9951E7  /* A02 = -4.221640495573226181669e-01 */
> +        .quad 0x3FAEBBBAA9D6DEEF  /* A03 = +6.002600978120919278380e-02 */
> +        .quad 0x3FC01BE411098DBC  /* A00 = +1.258511622610124502941e-01 */
> +        .quad 0x3FEF85BDABC031C1  /* A01 = +9.850757936961188621083e-01 */
> +        .quad 0xBFD91521375097C2  /* A02 = -3.919146576102968682065e-01 */
> +        .quad 0x3FABE26F0086D982  /* A03 = +5.446192628317005068883e-02 */
> +        .quad 0x3FC481D7FF5776B9  /* A00 = +1.602125164781023347604e-01 */
> +        .quad 0x3FEDC3506C1E7218  /* A01 = +9.300920592973538347792e-01 */
> +        .quad 0xBFD7349A88DA7D4F  /* A02 = -3.625856720409119104964e-01 */
> +        .quad 0x3FA936E2DFF8E2AE  /* A03 = +4.924687370334389358018e-02 */
> +        .quad 0x3FC90471F96FA27A  /* A00 = +1.954481571149420671141e-01 */
> +        .quad 0x3FEC0451601987A2  /* A01 = +8.755270840595026360376e-01 */
> +        .quad 0xBFD5671CD4B898DC  /* A02 = -3.344184949259110251063e-01 */
> +        .quad 0x3FA6BB9594603B67  /* A03 = +4.439990459660841243261e-02 */
> +        .quad 0x3FCFD8ADB9ED944C  /* A00 = +2.488000066615846384011e-01 */
> +        .quad 0x3FE978C073F6809A  /* A01 = +7.959902062321078108909e-01 */
> +        .quad 0xBFD2DF7E00BCD5A9  /* A02 = -2.948908812716931060471e-01 */
> +        .quad 0x3FA3614033D490B2  /* A03 = +3.785133965200894456959e-02 */
> +        .quad 0x3FD4846A12AFE5A0  /* A00 = +3.205819303981005674586e-01 */
> +        .quad 0x3FE63A1147D40472  /* A01 = +6.945883181471244061100e-01 */
> +        .quad 0xBFCFA2268AD34450  /* A02 = -2.471359422548027318101e-01 */
> +        .quad 0x3F9F150201D9FFE0  /* A03 = +3.035357605267552383310e-02 */
> +        .quad 0x3FD9018641F82BEB  /* A00 = +3.907180446846598154131e-01 */
> +        .quad 0x3FE33B7C220FFBDC  /* A01 = +6.010113396913498995389e-01 */
> +        .quad 0xBFCA4E4187E29C86  /* A02 = -2.055131829740483584423e-01 */
> +        .quad 0x3F98C30CED19F8F4  /* A03 = +2.418155858185229434287e-02 */
> +        .quad 0x3FDD4B8255BEB078  /* A00 = +4.577337109901757905561e-01 */
> +        .quad 0x3FE0858B19D3A49B  /* A01 = +5.163016800335243905451e-01 */
> +        .quad 0xBFC5BC929EACE564  /* A02 = -1.698172831327539045176e-01 */
> +        .quad 0x3F93A083CE57DE2B  /* A03 = +1.916700312537337677621e-02 */
> +        .quad 0x3FE0A8E5E039295C  /* A00 = +5.206174258576470315063e-01 */
> +        .quad 0x3FDC35E1234583FE  /* A01 = +4.407885403107342225937e-01 */
> +        .quad 0xBFC1DE034E31AEB9  /* A02 = -1.395877963835710222629e-01 */
> +        .quad 0x3F8EFDEBB3471BDC  /* A03 = +1.513275280821162888101e-02 */
> +        .quad 0x3FE2851B603CB2A5  /* A00 = +5.787484054213406503564e-01 */
> +        .quad 0x3FD7F4A44ABBB286  /* A01 = +3.743067483726821853551e-01 */
> +        .quad 0xBFBD3EEB67087DE7  /* A02 = -1.142413260026767657385e-01 */
> +        .quad 0x3F8864F38329E8BD  /* A03 = +1.191129917173260922836e-02 */
> +        .quad 0x3FE437DBE3C34AC1  /* A00 = +6.318187187665317283702e-01 */
> +        .quad 0x3FD43F6F789441B5  /* A01 = +3.163717916040938438194e-01 */
> +        .quad 0xBFB7D92E7901B9A4  /* A02 = -9.315767721429907277653e-02 */
> +        .quad 0x3F8327ED342308E1  /* A03 = +9.353497651663324544136e-03 */
> +        .quad 0x3FE5C0977766D55C  /* A00 = +6.797597248138731451661e-01 */
> +        .quad 0x3FD10B42A764D8F9  /* A01 = +2.663122782427219115142e-01 */
> +        .quad 0xBFB3633351D3D70F  /* A02 = -7.573242900602060456716e-02 */
> +        .quad 0x3F7E079E30FF899C  /* A03 = +7.331483779099558922843e-03 */
> +        .quad 0x3FE7202CE08A88C4  /* A00 = +7.226776490754436288455e-01 */
> +        .quad 0x3FCC973EB5662B01  /* A01 = +2.233656297433626314319e-01 */
> +        .quad 0xBFAF70A455F9920B  /* A02 = -6.140626477716545211782e-02 */
> +        .quad 0x3F77812411CE99B6  /* A03 = +5.738392731393584730859e-03 */
> +        .quad 0x3FE85879424095B1  /* A00 = +7.608000082006382003286e-01 */
> +        .quad 0x3FC7E73BD1674D84  /* A01 = +1.867441914060742336190e-01 */
> +        .quad 0xBFA96F84E4BF333B  /* A02 = -4.967894832916504993525e-02 */
> +        .quad 0x3F72606DDCA6E117  /* A03 = +4.486493251924870105662e-03 */
> +        .quad 0x3FE96BFE4957F4DD  /* A00 = +7.944327766887472330737e-01 */
> +        .quad 0x3FC3ED4780D25478  /* A01 = +1.556786898624158421711e-01 */
> +        .quad 0xBFA489C5F9A56B58  /* A02 = -4.011362717093075458408e-02 */
> +        .quad 0x3F6CB5DC17E9AD2A  /* A03 = +3.504686231556104931972e-03 */
> +        .quad 0x3FEA5D9CB2F41234  /* A00 = +8.239272589858672724006e-01 */
> +        .quad 0x3FC091A758374DCF  /* A01 = +1.294449978582705440555e-01 */
> +        .quad 0xBFA08E436D4B5CE0  /* A02 = -3.233538350257858517978e-02 */
> +        .quad 0x3F666997AD53E6B7  /* A03 = +2.735897297154145629133e-03 */
> +        .quad 0x3FEB3060342CB850  /* A00 = +8.496552485501158713532e-01 */
> +        .quad 0x3FBB7D30BBC7DC1B  /* A01 = +1.073790033768634993860e-01 */
> +        .quad 0xBF9AA6BA3443D9E3  /* A02 = -2.602663940430173170060e-02 */
> +        .quad 0x3F617CA764B7850B  /* A03 = +2.134634914668814050648e-03 */
> +        .quad 0x3FEBE759A6A0C7B8  /* A00 = +8.719909910635044170135e-01 */
> +        .quad 0x3FB6C10DE6A703FF  /* A01 = +8.888327485239243264115e-02 */
> +        .quad 0xBF956C566D8BE1F6  /* A02 = -2.092108768099084498138e-02 */
> +        .quad 0x3F5B46D1A4A59CF8  /* A03 = +1.664833764687232917079e-03 */
> +        .quad 0x3FEC858494887A04  /* A00 = +8.912985707318630268503e-01 */
> +        .quad 0x3FB2CC31F543394D  /* A01 = +7.342827070099140762682e-02 */
> +        .quad 0xBF9133477FF69137  /* A02 = -1.679717749142747504343e-02 */
> +        .quad 0x3F5544482FBB4DA5  /* A03 = +1.298017973501022466823e-03 */
> +        .quad 0x3FED0DB59D0E32E9  /* A00 = +9.079235141267335551518e-01 */
> +        .quad 0x3FAF006BAFFC6EF4  /* A01 = +6.055008433597022787787e-02 */
> +        .quad 0xBF8B97146FA2B97A  /* A02 = -1.347175565419144252499e-02 */
> +        .quad 0x3F5093B01F4CDC69  /* A03 = +1.011774057770665211434e-03 */
> +        .quad 0x3FEDB487C3EC457C  /* A00 = +9.282873942012623835751e-01 */
> +        .quad 0x3FA7390C09D0BD1D  /* A01 = +4.535710925881118044112e-02 */
> +        .quad 0xBF83D9F7C3181106  /* A02 = -9.693084374710735778846e-03 */
> +        .quad 0x3F46E34A0A3C0E64  /* A03 = +6.984817050299072134500e-04 */
> +        .quad 0x3FEE5FFCB4E6EB00  /* A00 = +9.492171796076434020506e-01 */
> +        .quad 0x3F9F4913ED00AADF  /* A01 = +3.055220731782070861526e-02 */
> +        .quad 0xBF79670BD0E59B5C  /* A02 = -6.201788097633133961528e-03 */
> +        .quad 0x3F3BC998EBCAF96D  /* A03 = +4.240034429975534616304e-04 */
> +        .quad 0x3FEEDBA41E9542FE  /* A00 = +9.643116566968215064293e-01 */
> +        .quad 0x3F94F5DD18D9C24D  /* A01 = +2.046914543319848858727e-02 */
> +        .quad 0xBF7034896AA122B9  /* A02 = -3.956352980886528904192e-03 */
> +        .quad 0x3F30DCCB47810B39  /* A03 = +2.573009765038273091199e-04 */
> +        .quad 0x3FEF33F2882520ED  /* A00 = +9.750912341196716903724e-01 */
> +        .quad 0x3F8BF37F2CF553FF  /* A01 = +1.364802699996836392315e-02 */
> +        .quad 0xBF649F6F05A69619  /* A02 = -2.517430152880317534986e-03 */
> +        .quad 0x3F247623C950AAC9  /* A03 = +1.561087307505231250044e-04 */
> +        .quad 0x3FEF727757751741  /* A00 = +9.827229221489021115943e-01 */
> +        .quad 0x3F828E67912C4400  /* A01 = +9.060677640748693306705e-03 */
> +        .quad 0xBF5A2F51A806CC2C  /* A02 = -1.598195784123355826789e-03 */
> +        .quad 0x3F18D35D7687E613  /* A03 = +9.470231965016282719549e-05 */
> +        .quad 0x3FEF9E6325C5942A  /* A00 = +9.880843866091073568469e-01 */
> +        .quad 0x3F788AB117618F76  /* A01 = +5.991641772286606867914e-03 */
> +        .quad 0xBF5096EAB0B1EA89  /* A02 = -1.012543859160305046233e-03 */
> +        .quad 0x3F0E1E50EC4435AB  /* A03 = +5.744633156910412119652e-05 */
> +        .quad 0x3FEFBD0784049369  /* A00 = +9.918248728250605994461e-01 */
> +        .quad 0x3F702BBD8294035F  /* A01 = +3.947963975634432264028e-03 */
> +        .quad 0xBF44FB55E0F00593  /* A02 = -6.403130845457509273330e-04 */
> +        .quad 0x3F0244DCD723230A  /* A03 = +3.484534217219031730379e-05 */
> +        .quad 0x3FEFD245E2366A43  /* A00 = +9.944180887426415926811e-01 */
> +        .quad 0x3F653D82EC088433  /* A01 = +2.592807490387838333795e-03 */
> +        .quad 0xBF3A7DF75E013CB8  /* A02 = -4.042366908878036561859e-04 */
> +        .quad 0x3EF6298E69F991CD  /* A03 = +2.113564425911141559972e-05 */
> +        .quad 0x3FEFE0EAA508BC69  /* A00 = +9.962056372950317539861e-01 */
> +        .quad 0x3F5BD0771AF3FDDA  /* A01 = +1.697651208644282514598e-03 */
> +        .quad 0xBF30B2E1254DE571  /* A02 = -2.548026725928887099328e-04 */
> +        .quad 0x3EEAE28B70EC0256  /* A03 = +1.281973848454955042307e-05 */
> +        .quad 0x3FEFEAF5303D7F96  /* A00 = +9.974313680831865536192e-01 */
> +        .quad 0x3F5229111365657E  /* A01 = +1.108423877289460134782e-03 */
> +        .quad 0xBF250572D04DFE66  /* A02 = -1.603796628408704519168e-04 */
> +        .quad 0x3EE04E89BB57C981  /* A03 = +7.775682983689149966743e-06 */
> +        .quad 0x3FEFF1CF52F1CF44  /* A00 = +9.982678051005469122003e-01 */
> +        .quad 0x3F47A71316147CEB  /* A01 = +7.218211359577819110842e-04 */
> +        .quad 0xBF1A6D7604055719  /* A02 = -1.008132248946049582547e-04 */
> +        .quad 0x3ED3C8047586A85C  /* A03 = +4.716233739913014633626e-06 */
> +        .quad 0x3FEFF6770369EF69  /* A00 = +9.988360468555416149528e-01 */
> +        .quad 0x3F3EBB261180FBF0  /* A01 = +4.689186039321105101130e-04 */
> +        .quad 0xBF1097754FE19D7F  /* A02 = -6.329206004950480057066e-05 */
> +        .quad 0x3EC7FEFF83BCA0A7  /* A03 = +2.860556404988488738366e-06 */
> +        .quad 0x3FEFF99D42371AC4  /* A00 = +9.992204945818561334647e-01 */
> +        .quad 0x3F33EB2AEC271F59  /* A01 = +3.039340773764907474054e-04 */
> +        .quad 0xBF04CF18E0FC0D79  /* A02 = -3.968996690952969588805e-05 */
> +        .quad 0x3EBD1BDBD6019BE9  /* A03 = +1.735021065507727833886e-06 */
> +        .quad 0x3FEFFBBCA32B0D91  /* A00 = +9.994795977476532700123e-01 */
> +        .quad 0x3F29C41E1615110A  /* A01 = +1.965796209707565346710e-04 */
> +        .quad 0xBEFA11F93D9DCB5A  /* A02 = -2.486248909101414873235e-05 */
> +        .quad 0x3EB1A7CA4546F7A7  /* A03 = +1.052345642723709228769e-06 */
> +        .quad 0x3FEFFD298B8E8DE2  /* A00 = +9.996535993308806045121e-01 */
> +        .quad 0x3F20A1C42D523C5B  /* A01 = +1.268913244172078754520e-04 */
> +        .quad 0xBEF0507A364AFAE4  /* A02 = -1.555859070622834605755e-05 */
> +        .quad 0x3EA56ACA17E7CDF4  /* A03 = +6.382806956848098872313e-07 */
> +        .quad 0x3FEFFE1DC82BA5A3  /* A00 = +9.997700604991915929176e-01 */
> +        .quad 0x3F156E73B90F1769  /* A01 = +8.175450626798714452801e-05 */
> +        .quad 0xBEE4663579D0A09F  /* A02 = -9.727122057226747625365e-06 */
> +        .quad 0x3E99FAF6FEC5D4C1  /* A03 = +3.871371052824002996020e-07 */
> +        .quad 0x3FEFFEF8D0BB5E81  /* A00 = +9.998745037837154514548e-01 */
> +        .quad 0x3F06686DA18D39C3  /* A01 = +4.273972098777251447726e-05 */
> +        .quad 0xBED46BC298073E90  /* A02 = -4.868731025855742842491e-06 */
> +        .quad 0x3E88E42286B9D0FD  /* A03 = +1.854535328530838170114e-07 */
> +        .quad 0x3FEFFF8DBC68DDC7  /* A00 = +9.999455146670975791423e-01 */
> +        .quad 0x3EF26B2953A80AF0  /* A01 = +1.756534514108903368909e-05 */
> +        .quad 0xBEBFC4472D580F83  /* A02 = -1.893443529411295465239e-06 */
> +        .quad 0x3E72505B4553D19F  /* A03 = +6.822456673547912277047e-08 */
> +        .quad 0x3FEFFFCED1276609  /* A00 = +9.999765477215883935358e-01 */
> +        .quad 0x3EDE1A94C7CC58F5  /* A01 = +7.177313020153979672606e-06 */
> +        .quad 0xBEA8A2C988744E57  /* A02 = -7.342066660497443762363e-07 */
> +        .quad 0x3E5AF30036BBBAF4  /* A03 = +2.509841882843541084885e-08 */
> +        .quad 0x3FEFFFEAFE70FCFC  /* A00 = +9.999899835164849370983e-01 */
> +        .quad 0x3EC879175E3549F5  /* A01 = +2.917410471128503564412e-06 */
> +        .quad 0xBE930E36677D1813  /* A02 = -2.839493400307523115929e-07 */
> +        .quad 0x3E43D4005B42D48F  /* A03 = +9.233192745401904898013e-09 */
> +        .quad 0x3ff0000000000000
> +        .quad 0x0000000000000000
> +        .quad 0x0000000000000000
> +        .quad 0x0000000000000000

Are these last 4 values actually used, or are they just there for padding?
> +        .align 32
> +        .long 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000, 0x80000000           /* _sSignMask        */
> +        .align 32
> +        .long 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff           /* _sAbsMask         */
> +        .align 32
> +        .long 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000, 0x7ff80000           /* _iExpMantMask     */
> +        .align 32
> +        .long 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000, 0x7f000000           /* _iExpMask         */
> +        .align 32
> +        .long 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000, 0x3cf80000           /* _iMinIdxOfsMask   */
> +        .align 32
> +        .long 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000, 0x04280000           /* _iMaxIdxMask      */
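
If I read these constants right, they classify the input and derive the table
row from the exponent/mantissa bits of |x|: _sAbsMask/_sSignMask strip and
restore the sign (tanhf is odd), _iExpMantMask keeps the exponent plus the top
mantissa bits, and the Min/Max values clamp the biased result onto the 134
rows of _dbP.  A rough guess at the scalar equivalent follows; the
authoritative shift/scale is in the AVX2 body earlier in the patch, so treat
this purely as an assumption of mine:

    /* Reviewer's guess at the row selection implied by the constants.  */
    static unsigned int
    tanhf_table_row (unsigned int ix)  /* ix = IEEE-754 bits of |x| */
    {
      int em = (int) (ix & 0x7ff80000u) - (int) 0x3cf80000u;
      if (em < 0)
        em = 0;                        /* |x| below ~0.03: row 0    */
      if (em > (int) 0x04280000)
        em = (int) 0x04280000;         /* |x| above ~10: last row   */
      return (unsigned int) em >> 19;  /* 0 .. 133                  */
    }
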
> +        .align 32
> +        .type  __svml_stanh_data_internal,@object
> +        .size  __svml_stanh_data_internal,.-__svml_stanh_data_internal
> diff --git a/sysdeps/x86_64/fpu/svml_d_tanh2_core.S b/sysdeps/x86_64/fpu/svml_d_tanh2_core.S
> new file mode 100644
> index 0000000000..c703131777
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_tanh2_core.S
> @@ -0,0 +1,29 @@
> +/* Function tanh vectorized with SSE2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +       .text
> +ENTRY (_ZGVbN2v_tanh)
> +WRAPPER_IMPL_SSE2 tanh
> +END (_ZGVbN2v_tanh)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN2v_tanh)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_d_tanh4_core.S b/sysdeps/x86_64/fpu/svml_d_tanh4_core.S
> new file mode 100644
> index 0000000000..fb293f4dba
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_tanh4_core.S
> @@ -0,0 +1,29 @@
> +/* Function tanh vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +       .text
> +ENTRY (_ZGVdN4v_tanh)
> +WRAPPER_IMPL_AVX _ZGVbN2v_tanh
> +END (_ZGVdN4v_tanh)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN4v_tanh)
> +#endif
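
For other reviewers, a reminder of what these generated wrappers expand to (my
summary of svml_d_wrapper_impl.h, not part of this patch): WRAPPER_IMPL_SSE2
runs the scalar call on each of the two doubles, WRAPPER_IMPL_AVX splits the
ymm argument into two xmm halves and calls the narrower _ZGVbN2v_tanh twice,
and WRAPPER_IMPL_AVX512 later does the same with two ymm halves on top of the
AVX2 variant.  Roughly, for the AVX case:

    #include <immintrin.h>

    extern __m128d _ZGVbN2v_tanh (__m128d);   /* SSE2 2-lane variant */

    /* Reviewer's C rendering of WRAPPER_IMPL_AVX _ZGVbN2v_tanh; the real
       code is the asm macro, not this.  */
    static __m256d
    tanh4_via_tanh2 (__m256d x)
    {
      __m128d lo = _ZGVbN2v_tanh (_mm256_castpd256_pd128 (x));
      __m128d hi = _ZGVbN2v_tanh (_mm256_extractf128_pd (x, 1));
      return _mm256_insertf128_pd (_mm256_castpd128_pd256 (lo), hi, 1);
    }
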
> diff --git a/sysdeps/x86_64/fpu/svml_d_tanh4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_tanh4_core_avx.S
> new file mode 100644
> index 0000000000..5385a2c27c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_tanh4_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function tanh vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +       .text
> +ENTRY (_ZGVcN4v_tanh)
> +WRAPPER_IMPL_AVX _ZGVbN2v_tanh
> +END (_ZGVcN4v_tanh)
> diff --git a/sysdeps/x86_64/fpu/svml_d_tanh8_core.S b/sysdeps/x86_64/fpu/svml_d_tanh8_core.S
> new file mode 100644
> index 0000000000..9dafa7bb9a
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_tanh8_core.S
> @@ -0,0 +1,25 @@
> +/* Function tanh vectorized with AVX-512, wrapper to AVX2.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_d_wrapper_impl.h"
> +
> +       .text
> +ENTRY (_ZGVeN8v_tanh)
> +WRAPPER_IMPL_AVX512 _ZGVdN4v_tanh
> +END (_ZGVeN8v_tanh)
> diff --git a/sysdeps/x86_64/fpu/svml_s_tanhf16_core.S b/sysdeps/x86_64/fpu/svml_s_tanhf16_core.S
> new file mode 100644
> index 0000000000..19d51365e8
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_tanhf16_core.S
> @@ -0,0 +1,25 @@
> +/* Function tanhf vectorized with AVX-512. Wrapper to AVX2 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +       .text
> +ENTRY (_ZGVeN16v_tanhf)
> +WRAPPER_IMPL_AVX512 _ZGVdN8v_tanhf
> +END (_ZGVeN16v_tanhf)
> diff --git a/sysdeps/x86_64/fpu/svml_s_tanhf4_core.S b/sysdeps/x86_64/fpu/svml_s_tanhf4_core.S
> new file mode 100644
> index 0000000000..6b98950f84
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_tanhf4_core.S
> @@ -0,0 +1,29 @@
> +/* Function tanhf vectorized with SSE2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +       .text
> +ENTRY (_ZGVbN4v_tanhf)
> +WRAPPER_IMPL_SSE2 tanhf
> +END (_ZGVbN4v_tanhf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVbN4v_tanhf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_tanhf8_core.S b/sysdeps/x86_64/fpu/svml_s_tanhf8_core.S
> new file mode 100644
> index 0000000000..3ada061ae0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_tanhf8_core.S
> @@ -0,0 +1,29 @@
> +/* Function tanhf vectorized with AVX2, wrapper version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +       .text
> +ENTRY (_ZGVdN8v_tanhf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_tanhf
> +END (_ZGVdN8v_tanhf)
> +
> +#ifndef USE_MULTIARCH
> + libmvec_hidden_def (_ZGVdN8v_tanhf)
> +#endif
> diff --git a/sysdeps/x86_64/fpu/svml_s_tanhf8_core_avx.S b/sysdeps/x86_64/fpu/svml_s_tanhf8_core_avx.S
> new file mode 100644
> index 0000000000..255d45952d
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_s_tanhf8_core_avx.S
> @@ -0,0 +1,25 @@
> +/* Function tanhf vectorized in AVX ISA as wrapper to SSE4 ISA version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +#include "svml_s_wrapper_impl.h"
> +
> +        .text
> +ENTRY (_ZGVcN8v_tanhf)
> +WRAPPER_IMPL_AVX _ZGVbN4v_tanhf
> +END (_ZGVcN8v_tanhf)
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx.c
> new file mode 100644
> index 0000000000..a456c574e2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-tanh.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx2.c
> new file mode 100644
> index 0000000000..a456c574e2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx2.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-tanh.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx512f.c
> new file mode 100644
> index 0000000000..a456c574e2
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-tanh-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-double-libmvec-tanh.c"
> diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-tanh.c b/sysdeps/x86_64/fpu/test-double-libmvec-tanh.c
> new file mode 100644
> index 0000000000..4cb6a169d8
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-libmvec-tanh.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE double
> +#define LIBMVEC_FUNC tanh
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> index 9d91ccfe51..f53bb6813e 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c
> @@ -44,6 +44,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVbN2v_log1p)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVbN2v_atanh)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVbN2v_acosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVbN2v_erf)
> +VECTOR_WRAPPER (WRAPPER_NAME (tanh), _ZGVbN2v_tanh)
>
>  #define VEC_INT_TYPE __m128i
>
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> index 9e86d5fef8..0452c3db38 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c
> @@ -47,6 +47,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVdN4v_log1p)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVdN4v_atanh)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVdN4v_acosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVdN4v_erf)
> +VECTOR_WRAPPER (WRAPPER_NAME (tanh), _ZGVdN4v_tanh)
>
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m256i
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> index 0f4ef00de4..197d5afc88 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c
> @@ -44,6 +44,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVcN4v_log1p)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVcN4v_atanh)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVcN4v_acosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVcN4v_erf)
> +VECTOR_WRAPPER (WRAPPER_NAME (tanh), _ZGVcN4v_tanh)
>
>  #define VEC_INT_TYPE __m128i
>
> diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> index 975dff85af..e56ece640c 100644
> --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c
> @@ -44,6 +44,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1p), _ZGVeN8v_log1p)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanh), _ZGVeN8v_atanh)
>  VECTOR_WRAPPER (WRAPPER_NAME (acosh), _ZGVeN8v_acosh)
>  VECTOR_WRAPPER (WRAPPER_NAME (erf), _ZGVeN8v_erf)
> +VECTOR_WRAPPER (WRAPPER_NAME (tanh), _ZGVeN8v_tanh)
>
>  #ifndef __ILP32__
>  # define VEC_INT_TYPE __m512i
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx.c
> new file mode 100644
> index 0000000000..254f9201aa
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-tanhf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx2.c
> new file mode 100644
> index 0000000000..254f9201aa
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx2.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-tanhf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx512f.c
> new file mode 100644
> index 0000000000..254f9201aa
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf-avx512f.c
> @@ -0,0 +1 @@
> +#include "test-float-libmvec-tanhf.c"
> diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-tanhf.c b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf.c
> new file mode 100644
> index 0000000000..9a61ee8f9c
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-float-libmvec-tanhf.c
> @@ -0,0 +1,3 @@
> +#define LIBMVEC_TYPE float
> +#define LIBMVEC_FUNC tanhf
> +#include "test-vector-abi-arg1.h"
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> index 2b1e27391a..abbebf9993 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
> @@ -44,6 +44,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVeN16v_log1pf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVeN16v_atanhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVeN16v_acoshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVeN16v_erff)
> +VECTOR_WRAPPER (WRAPPER_NAME (tanhf), _ZGVeN16v_tanhf)
>
>  #define VEC_INT_TYPE __m512i
>
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> index 78428bf517..ae1c8b98c2 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
> @@ -44,6 +44,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVbN4v_log1pf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVbN4v_atanhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVbN4v_acoshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVbN4v_erff)
> +VECTOR_WRAPPER (WRAPPER_NAME (tanhf), _ZGVbN4v_tanhf)
>
>  #define VEC_INT_TYPE __m128i
>
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> index dadd4e6ca0..eb477a0371 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
> @@ -47,6 +47,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVdN8v_log1pf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVdN8v_atanhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVdN8v_acoshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVdN8v_erff)
> +VECTOR_WRAPPER (WRAPPER_NAME (tanhf), _ZGVdN8v_tanhf)
>
>  /* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf.  */
>  #undef VECTOR_WRAPPER_fFF
> diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> index 7b2d583e54..944f7f0a75 100644
> --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
> @@ -44,6 +44,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (log1pf), _ZGVcN8v_log1pf)
>  VECTOR_WRAPPER (WRAPPER_NAME (atanhf), _ZGVcN8v_atanhf)
>  VECTOR_WRAPPER (WRAPPER_NAME (acoshf), _ZGVcN8v_acoshf)
>  VECTOR_WRAPPER (WRAPPER_NAME (erff), _ZGVcN8v_erff)
> +VECTOR_WRAPPER (WRAPPER_NAME (tanhf), _ZGVcN8v_tanhf)
>
>  #define VEC_INT_TYPE __m128i
>
> --
> 2.31.1
>
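For reference, the AVX-512 double entry point registered by the vlen8
wrapper above is the exported symbol _ZGVeN8v_tanh.  A minimal
standalone sketch of calling it directly is shown below; it is
illustrative only -- the declaration assumes the x86-64 vector function
ABI signature (8 doubles in and out of a ZMM register), and building it
needs an AVX-512 machine plus linking with -lmvec -lm.

    #include <immintrin.h>
    #include <stdio.h>

    /* Illustrative declaration, assuming the vector ABI form of the
       8-element double variant added by this patch.  */
    extern __m512d _ZGVeN8v_tanh (__m512d);

    int
    main (void)
    {
      double in[8] = { -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 4.0 };
      double out[8];

      /* Load 8 doubles, apply the vector tanh, store the results.  */
      __m512d x = _mm512_loadu_pd (in);
      __m512d y = _ZGVeN8v_tanh (x);
      _mm512_storeu_pd (out, y);

      for (int i = 0; i < 8; i++)
        printf ("tanh (% .2f) = % .17g\n", in[i], out[i]);
      return 0;
    }

In normal use the mangled name is not called by hand: the compiler is
expected to emit these calls itself when vectorizing loops over tanh
(for example with -fopenmp-simd and a suitable -march), which is
broadly what the new test-*-libmvec-tanh* ABI tests above appear to
exercise.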



Thread overview: 40+ messages
2021-12-29  6:39 [PATCH v5 00/18] x86-64: Add vector math functions to libmvec Sunil K Pandey
2021-12-29  6:39 ` [PATCH v5 01/18] x86-64: Add vector atan/atanf implementation " Sunil K Pandey
2021-12-29 21:25   ` H.J. Lu
2021-12-29  6:39 ` [PATCH v5 02/18] x86-64: Add vector asin/asinf " Sunil K Pandey
2021-12-29 21:25   ` H.J. Lu
2021-12-29  6:39 ` [PATCH v5 03/18] x86-64: Add vector hypot/hypotf " Sunil K Pandey
2021-12-29 21:24   ` H.J. Lu
2021-12-29  6:39 ` [PATCH v5 04/18] x86-64: Add vector exp2/exp2f " Sunil K Pandey
2021-12-29 21:25   ` H.J. Lu
2021-12-29  6:39 ` [PATCH v5 05/18] x86-64: Add vector exp10/exp10f " Sunil K Pandey
2021-12-29 21:25   ` H.J. Lu
2021-12-29  6:39 ` [PATCH v5 06/18] x86-64: Add vector cosh/coshf " Sunil K Pandey
2021-12-29 21:26   ` H.J. Lu
2021-12-29  6:39 ` [PATCH v5 07/18] x86-64: Add vector expm1/expm1f " Sunil K Pandey
2021-12-29 21:25   ` H.J. Lu
2021-12-29  6:39 ` [PATCH v5 08/18] x86-64: Add vector sinh/sinhf " Sunil K Pandey
2021-12-29 21:26   ` H.J. Lu
2021-12-29  6:39 ` [PATCH v5 09/18] x86-64: Add vector cbrt/cbrtf " Sunil K Pandey
2021-12-29 21:25   ` H.J. Lu
2021-12-29  6:39 ` [PATCH v5 10/18] x86-64: Add vector atan2/atan2f " Sunil K Pandey
2021-12-29 21:26   ` H.J. Lu
2021-12-29  6:39 ` [PATCH v5 11/18] x86-64: Add vector log10/log10f " Sunil K Pandey
2021-12-29 21:26   ` H.J. Lu
2021-12-29  6:39 ` [PATCH v5 12/18] x86-64: Add vector log2/log2f " Sunil K Pandey
2021-12-29 21:26   ` H.J. Lu
2021-12-29  6:39 ` [PATCH v5 13/18] x86-64: Add vector log1p/log1pf " Sunil K Pandey
2021-12-29 21:26   ` H.J. Lu
2021-12-29 23:28     ` Noah Goldstein
2021-12-30  0:32       ` H.J. Lu
2021-12-29  6:39 ` [PATCH v5 14/18] x86-64: Add vector atanh/atanhf " Sunil K Pandey
2021-12-29 21:26   ` H.J. Lu
2021-12-29  6:39 ` [PATCH v5 15/18] x86-64: Add vector acosh/acoshf " Sunil K Pandey
2021-12-29 21:26   ` H.J. Lu
2021-12-29  6:39 ` [PATCH v5 16/18] x86-64: Add vector erf/erff " Sunil K Pandey
2021-12-29 21:27   ` H.J. Lu
2021-12-29  6:39 ` [PATCH v5 17/18] x86-64: Add vector tanh/tanhf " Sunil K Pandey
2021-12-29 21:26   ` H.J. Lu
2022-01-29  1:33   ` Noah Goldstein
2021-12-29  6:40 ` [PATCH v5 18/18] x86-64: Add vector asinh/asinhf " Sunil K Pandey
2021-12-29 21:27   ` H.J. Lu
