[RFC] How to add vector math functions to Glibc

public inbox for libc-alpha@sourceware.org

* [RFC] How to add vector math functions to Glibc
@ 2014-09-18 13:48 Andrew Senkevich
  2014-09-18 13:57 ` Andrew Senkevich
                   ` (2 more replies)
  0 siblings, 3 replies; 67+ messages in thread
From: Andrew Senkevich @ 2014-09-18 13:48 UTC
  To: libc-alpha

Hi all,

Due to the latest OpenMP and CilkPlus features supported in GCC, vectorized
math functions can be utilized more widely, so we need to find a general
way to add vectorized math functions to Glibc.
Let's discuss the following questions, which were raised after the first
x86_64-specific patch was sent.

1. Should functions go in libm or a separate libmvec library?

2. What requirements on the compiler / assembler versions used are imposed
by the requirement that the ABI provided by glibc's shared libraries must
not depend on the tools used to build glibc, and what such requirements is
it OK to impose (it may be OK to move to GCC 4.6 as minimum compiler at
present, but requiring a more recent version would be a problem; we'd need
to consider what binutils version we can require)?  If a separate libmvec
is used, is it OK simply not to build it if those requirements aren't met?
(It's definitely not OK for the ABI of a library to vary incompatibly, but
it might be OK for the presence of a library to be conditional.)

3. Should it be declared that these vectorized functions do not set errno?
(If so, then any header code that enables them to be used must of course
avoid enabling them in the default -fmath-errno case.)  Similarly, do they
follow the other goals documented in the glibc manual for accuracy of
results and exceptions (for all input values, including e.g. range
reduction)?  If not, further conditionals such as -ffast-math may be
needed.

4. We need to handle different architectures having different sets of
functions vectorized.

5. We need to handle different architectures possibly not having the
same set of vector ISAs for each function.

6. How do we handle different glibc versions having vectorized functions
for different vector ISA extensions?

7. Note that if we're relying on #pragma omp declare simd meaning a precise
set of function versions is available - with a guarantee that no future
compiler version will interpret it as also meaning an AVX512 version is
available, for example, so that it's safely possible to use older glibc
with a newer compiler - there should be some sort of ABI document
(preferably compiler-independent) stating that this is the meaning of that
pragma on x86_64 and that this pragma says nothing about availability of
function versions for other vector extensions and that if an ABI is
defined for such versions in future, it will use a different pragma to
declare their availability. Is there such an ABI document available that
defines what this pragma means on x86_64?

--
WBR,
Andrew

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-18 13:48 [RFC] How to add vector math functions to Glibc Andrew Senkevich
@ 2014-09-18 13:57 ` Andrew Senkevich
  2014-09-18 17:05   ` Joseph S. Myers
  2014-09-18 15:37 ` H.J. Lu
  2014-09-21 16:31 ` Andi Kleen
  2 siblings, 1 reply; 67+ messages in thread
From: Andrew Senkevich @ 2014-09-18 13:57 UTC
  To: libc-alpha

2014-09-18 17:48 GMT+04:00 Andrew Senkevich <andrew.n.senkevich@gmail.com>:
> Let's discuss the following questions, which were raised after the first
> x86_64-specific patch was sent.

In the x86_64 case there is a Vector Function ABI,
http://www.cilkplus.org/sites/default/files/open_specifications/Intel-ABI-Vector-Function-2012-v0.9.5.pdf,
which GCC follows (with a few differences) when generating vector calls.

> 1. Should functions go in libm or a separate libmvec library?

If we integrate the new functions into libm, it will be easier for
developers to employ vectorization, which should improve acceptance.
The libmvec case affects compiler options, and it seems a new header would
need to be included instead of math.h; or is it OK to include it from math.h?
On the other hand, libmvec could be better with respect to an optional build
(mentioned below in 2.) when adding implementations for the most modern ISAs.

> 2. What requirements on the compiler / assembler versions used are imposed
> by the requirement that the ABI provided by glibc's shared libraries must
> not depend on the tools used to build glibc, and what such requirements is
> it OK to impose (it may be OK to move to GCC 4.6 as minimum compiler at
> present, but requiring a more recent version would be a problem; we'd need
> to consider what binutils version we can require)?  If a separate libmvec
> is used, is it OK simply not to build it if those requirements aren't met?
> (It's definitely not OK for the ABI of a library to vary incompatibly, but
> it might be OK for the presence of a library to be conditional.)

Adding vectorized versions for the ISAs Glibc already supports (e.g. SSE4,
AVX and AVX2 in the x86_64 case) should not affect the build.
Implementations for the most modern ISAs would need the corresponding
checks in configure and would be hidden under the appropriate macros; in
other words, they could be built optionally.

> 3. Should it be declared that these vectorized functions do not set errno?
> (If so, then any header code that enables them to be used must of course
> avoid enabling them in the default -fmath-errno case.)  Similarly, do they
> follow the other goals documented in the glibc manual for accuracy of
> results and exceptions (for all input values, including e.g. range
> reduction)?  If not, further conditionals such as -ffast-math may be
> needed.

For x86_64 these functions don't set errno and have lower accuracy, so we
need to require -ffast-math (which sets -fno-math-errno) in order to use
them (or perhaps set -ffast-math under -fopenmp for x86_64).

> 4. We need to handle different architectures having different sets of
> functions vectorized.

We need some way for #pragma omp declare simd to be architecture dependent.
That would make it possible to have different sets of vectorized functions
for different architectures.

> 5. We need to handle different architectures possibly not having the
> same set of vector ISAs for each function.

If GCC requires all possible versions to exist (as it currently does for
x86_64), we can achieve that with wrappers.
If GCC can vectorize for only some ISAs on a given architecture, we need
only the corresponding implementations.
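The wrapper approach could look roughly like the following: a variant for a wider ISA is built from an existing narrower one by splitting the input in halves. This is a sketch only; plain arrays stand in for vector registers, and all names are hypothetical. Real variants would follow the register conventions of the Vector Function ABI.

```c
#include <stddef.h>

/* Hypothetical 4-lane kernel type: out[i] = f(in[i]) for i = 0..3.
   Arrays stand in for the vector registers a real variant would use.  */
typedef void (*vec4_fn) (const double in[4], double out[4]);

/* Provide an 8-lane variant (e.g. for a wider ISA) by calling an
   existing 4-lane implementation once per half of the input.  */
static void
wrap8_from_4 (vec4_fn f4, const double in[8], double out[8])
{
  f4 (in, out);
  f4 (in + 4, out + 4);
}

/* Hypothetical stand-in kernel, used only to exercise the wrapper.  */
static void
negate4 (const double in[4], double out[4])
{
  for (size_t i = 0; i < 4; i++)
    out[i] = -in[i];
}
```

The same trick works in the other direction (a narrow variant padding its input into a wider one), at the cost of wasted lanes.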

> 6. How do we handle different glibc versions having vectorized functions
> for different vector ISA extensions?

It seems this becomes a problem only if someone has an old Glibc version
installed and a binary using a symbol from a newer Glibc version?

> 7. Note that if we're relying on #pragma omp declare simd meaning a precise
> set of function versions is available - with a guarantee that no future
> compiler version will interpret it as also meaning an AVX512 version is
> available, for example, so that it's safely possible to use older glibc
> with a newer compiler - there should be some sort of ABI document
> (preferably compiler-independent) stating that this is the meaning of that
> pragma on x86_64 and that this pragma says nothing about availability of
> function versions for other vector extensions and that if an ABI is
> defined for such versions in future, it will use a different pragma to
> declare their availability. Is there such an ABI document available that
> defines what this pragma means on x86_64?

To inform GCC from the Glibc side, we need some OpenMP extension that
tells GCC which ISAs are currently available.

--
WBR,
Andrew

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-18 13:48 [RFC] How to add vector math functions to Glibc Andrew Senkevich
  2014-09-18 13:57 ` Andrew Senkevich
@ 2014-09-18 15:37 ` H.J. Lu
  2014-09-18 17:29   ` Andrew Senkevich
  2014-09-21 16:31 ` Andi Kleen
  2 siblings, 1 reply; 67+ messages in thread
From: H.J. Lu @ 2014-09-18 15:37 UTC
  To: Andrew Senkevich; +Cc: libc-alpha

On Thu, Sep 18, 2014 at 6:48 AM, Andrew Senkevich
<andrew.n.senkevich@gmail.com> wrote:
> Hi all,
>
> Due to the latest OpenMP and CilkPlus features supported in GCC, vectorized
> math functions can be utilized more widely, so we need to find a general
> way to add vectorized math functions to Glibc.
> Let's discuss the following questions, which were raised after the first
> x86_64-specific patch was sent.
>

What are the main target users of this extension? Is
it mainly for programmers to use it directly in their
applications, mostly independent of compilers? Or is it
mainly for GCC?

-- 
H.J.

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-18 13:57 ` Andrew Senkevich
@ 2014-09-18 17:05   ` Joseph S. Myers
  2014-09-22 11:48     ` Andrew Senkevich
  2014-09-24 19:46     ` Andrew Senkevich
  0 siblings, 2 replies; 67+ messages in thread
From: Joseph S. Myers @ 2014-09-18 17:05 UTC
  To: Andrew Senkevich; +Cc: libc-alpha

On Thu, 18 Sep 2014, Andrew Senkevich wrote:

> > 1. Should functions go in libm or a separate libmvec library?
> 
> If we integrate the new functions into libm, it will be easier for
> developers to employ vectorization, which should improve acceptance.
> The libmvec case affects compiler options, and it seems a new header would
> need to be included instead of math.h; or is it OK to include it from math.h?
> On the other hand, libmvec could be better with respect to an optional build
> (mentioned below in 2.) when adding implementations for the most modern ISAs.

There is also the option of installing libm.so as a linker script that 
refers to libmvec inside AS_NEEDED - libc.so is already a linker script 
after all.  That way, libmvec would automatically be linked into programs 
that need it (preserving compatibility with the POSIX rules about what 
library names need specifying to get what functions), without restricting 
the tools that can build glibc if vector extensions are added for which 
tool support is recent, but building libmvec is optional.
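Under this scheme the installed libm.so would be a short text file. The sketch below just formats what such a script could contain; the directory and sonames (/lib64, libm.so.6, libmvec.so.1) are assumptions, and a real script would also include the output-format lines that glibc takes from format.lds (omitted here).

```c
#include <stdio.h>

/* Format the body of a libm.so linker script: libm is always linked,
   while libmvec gets a DT_NEEDED entry only if the program actually
   references one of its symbols (that is what AS_NEEDED does).  All
   paths and sonames come from the caller.  Returns the snprintf
   result.  */
static int
write_libm_script (char *buf, size_t size, const char *slibdir,
                   const char *libm_soname, const char *libmvec_soname)
{
  return snprintf (buf, size,
                   "/* GNU ld script */\n"
                   "GROUP ( %s/%s AS_NEEDED ( %s/%s ) )\n",
                   slibdir, libm_soname, slibdir, libmvec_soname);
}
```

Called with "/lib64", "libm.so.6" and "libmvec.so.1", it produces the two lines "/* GNU ld script */" and "GROUP ( /lib64/libm.so.6 AS_NEEDED ( /lib64/libmvec.so.1 ) )", so users keep linking with plain -lm.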

> > 2. What requirements on the compiler / assembler versions used are imposed
> > by the requirement that the ABI provided by glibc's shared libraries must
> > not depend on the tools used to build glibc, and what such requirements is
> > it OK to impose (it may be OK to move to GCC 4.6 as minimum compiler at
> > present, but requiring a more recent version would be a problem; we'd need
> > to consider what binutils version we can require)?  If a separate libmvec
> > is used, is it OK simply not to build it if those requirements aren't met?
> > (It's definitely not OK for the ABI of a library to vary incompatibly, but
> > it might be OK for the presence of a library to be conditional.)
> 
> Adding vectorized versions for the ISAs Glibc already supports (e.g. SSE4,
> AVX and AVX2 in the x86_64 case) should not affect the build.
> Implementations for the most modern ISAs would need the corresponding
> checks in configure and would be hidden under the appropriate macros; in
> other words, they could be built optionally.

It looks like AVX2 support was new in binutils 2.22.  Currently the 
minimum is 2.20.  Do we think it's OK to move the minimum to 2.22 (or 
later), or do we need to make libmvec optional on x86_64?

> > 3. Should it be declared that these vectorized functions do not set errno?
> > (If so, then any header code that enables them to be used must of course
> > avoid enabling them in the default -fmath-errno case.)  Similarly, do they
> > follow the other goals documented in the glibc manual for accuracy of
> > results and exceptions (for all input values, including e.g. range
> > reduction)?  If not, further conditionals such as -ffast-math may be
> > needed.
> 
> For x86_64 these functions don't set errno and have lower accuracy, so we
> need to require -ffast-math (which sets -fno-math-errno) in order to use
> them (or perhaps set -ffast-math under -fopenmp for x86_64).

We should understand the specific properties of the functions to know what 
the relevant conditions are under which they can be used.  (Those 
conditions should then map to preprocessor conditions in the bits/ header 
that is how glibc tells the compiler about availability of the functions.)

According to the description at <https://sourceware.org/glibc/wiki/libm>, 
the functions may have inaccurate exceptions (-fno-trapping-math needed), 
not set errno (-fno-math-errno needed), may not work properly outside 
round-to-nearest mode (-fno-rounding-math needed - which is the default).  
A 4-ulp maximum error is also stated (which is within the accuracy goals 
expected by the testsuite for most functions, though some users may want 
more accurate functions).  They are stated to work for special cases, so 
no need for -ffinite-math-only.
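For the 4-ulp figure, a minimal sketch of measuring the error of one result against a more accurate reference value. It assumes IEEE 754 binary64 doubles and uses one common definition of ulp (the gap between |ref| and the next larger double); the glibc testsuite may define ulps slightly differently, especially near powers of two.

```c
#include <stdint.h>
#include <string.h>

/* Gap between |x| and the next larger representable double.
   Assumes x is finite and doubles are IEEE 754 binary64.  */
static double
ulp_of (double x)
{
  uint64_t bits;
  double ax, next;
  memcpy (&bits, &x, sizeof bits);
  bits &= UINT64_C (0x7fffffffffffffff);   /* Clear the sign bit: |x|.  */
  memcpy (&ax, &bits, sizeof ax);
  bits += 1;                               /* Next larger double.  */
  memcpy (&next, &bits, sizeof next);
  return next - ax;
}

/* Error of a computed result GOT relative to reference REF, in ulps of REF.  */
static double
ulp_error (double got, double ref)
{
  double diff = got - ref;
  return (diff < 0 ? -diff : diff) / ulp_of (ref);
}

/* Nonzero if GOT is within MAX_ULPS of REF (e.g. MAX_ULPS = 4).  */
static int
within_ulps (double got, double ref, double max_ulps)
{
  return ulp_error (got, ref) <= max_ulps;
}
```

For ref = 1.0 the ulp is 2^-52, so 1.0 + 2^-50 counts as 4 ulps (acceptable under a 4-ulp bound) while 1.0 + 2^-49 is 8 ulps (not acceptable).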

Right now, there are no preprocessor macros for -fno-trapping-math, 
-fno-math-errno or -fno-rounding-math.  So I think you're constrained to 
using these functions only if __FAST_MATH__ is predefined.  However, it 
would be a good idea to add GCC predefines for each of those three 
options.  Then these functions could be used if (defined 
__NO_TRAPPING_MATH__ && defined __NO_MATH_ERRNO__ && defined 
__NO_ROUNDING_MATH__) || defined __FAST_MATH__.  (People using older 
compilers could choose to define those macros themselves if they want the 
functions without using -ffast-math.  There could also be a glibc-defined 
macro for users to define - just like __USE_STRING_INLINES, you could have 
__USE_MATH_VECTOR_FUNCTIONS that users can use to enable these functions 
even if their compiler options would not otherwise cause them to be used.)
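The proposed condition, written out as it might appear in a bits/ header. This is only a sketch: __NO_TRAPPING_MATH__, __NO_MATH_ERRNO__ and __NO_ROUNDING_MATH__ are the predefines suggested above and may not be defined by a given compiler, __USE_MATH_VECTOR_FUNCTIONS is the suggested user opt-in macro, and MATH_VECTOR_ENABLED is a hypothetical name for the result.

```c
/* Enable vector variants only when the user has given up exact
   exceptions, errno setting and non-default rounding modes, or has
   opted in explicitly.  __FAST_MATH__ is what GCC predefines under
   -ffast-math; the three __NO_*__ macros are proposed predefines.  */
#if (defined __NO_TRAPPING_MATH__ && defined __NO_MATH_ERRNO__ \
     && defined __NO_ROUNDING_MATH__) \
    || defined __FAST_MATH__ \
    || defined __USE_MATH_VECTOR_FUNCTIONS
# define MATH_VECTOR_ENABLED 1
#else
# define MATH_VECTOR_ENABLED 0
#endif

/* Report the decision made for this translation unit.  */
static int
math_vector_enabled (void)
{
  return MATH_VECTOR_ENABLED;
}
```

Compiling with -ffast-math or -D__USE_MATH_VECTOR_FUNCTIONS should make this return 1; with default GCC options trapping math stays enabled, so it should return 0.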

__NO_MATH_ERRNO__ would be otherwise useful for replacing _LIB_VERSION / 
libieee with a better way of selecting no-errno versions of functions.

Given these properties of the functions, we can then appropriately 
condition which tests from libm-test.inc are run for them.

> > 4. We need to handle different architectures having different sets of
> > functions vectorized.
> 
> We need some way for #pragma omp declare simd to be architecture dependent.
> That would make it possible to have different sets of vectorized functions
> for different architectures.

I think having a long series of macros such as __DECL_SIMD_COS_DOUBLE as I 
suggested in <https://sourceware.org/ml/libc-alpha/2014-09/msg00182.html>, 
that the architecture-specific <bits/math-vector.h> header may or may not 
define, would work for this.

The default bits/math-vector.h, for architectures without such functions, 
should be minimal (that is, it should not be necessary to define long 
series of macros for functions you don't have versions of on your 
architecture, so that adding vectorized versions of a new function for one 
architecture doesn't require you to update the files for lots of other 
architectures to say they don't have vectorized versions of that function; 
architecture-independent files should handle the macros possibly not being 
defined).
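One way architecture-independent code could cope with the macros possibly not being defined: supply empty defaults, so the generic header never needs a per-function list and an architecture without vector versions defines nothing. A sketch only; demo_cos and demo_cosf are hypothetical stand-ins for cos and cosf, and the commented-out definition is an example of what an architecture's header might contain.

```c
/* An architecture's <bits/math-vector.h> with a vector cos might contain:
     #define __DECL_SIMD_cos _Pragma ("omp declare simd notinbranch")
   An architecture without one defines nothing at all.  */

/* Architecture-independent defaults: expand to nothing if undefined.  */
#ifndef __DECL_SIMD_cos
# define __DECL_SIMD_cos
#endif
#ifndef __DECL_SIMD_cosf
# define __DECL_SIMD_cosf
#endif

/* Declarations can then use the macros unconditionally.  */
__DECL_SIMD_cos
double demo_cos (double x);
__DECL_SIMD_cosf
float demo_cosf (float x);

/* Helpers to inspect what the macro expands to (here: nothing).  */
#define DEMO_STR(x) #x
#define DEMO_XSTR(x) DEMO_STR (x)

static const char *
decl_simd_cos_expansion (void)
{
  return DEMO_XSTR (__DECL_SIMD_cos);
}
```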

> > 6. How do we handle different glibc versions having vectorized functions
> > for different vector ISA extensions?
> 
> It seems this becomes a problem only if someone has an old Glibc version
> installed and a binary using a symbol from a newer Glibc version?

No, this is the point about the pragma used being required to have 
GCC-version-independent semantics about what vector ISAs the function is 
available for, so that old glibc and headers work when used with newer 
compilers that know about newer vector ISAs - hence the need for some 
OpenMP extension for that purpose.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-18 15:37 ` H.J. Lu
@ 2014-09-18 17:29   ` Andrew Senkevich
  0 siblings, 0 replies; 67+ messages in thread
From: Andrew Senkevich @ 2014-09-18 17:29 UTC
  To: H.J. Lu; +Cc: libc-alpha

>> Due to the latest OpenMP and CilkPlus features supported in GCC, vectorized
>> math functions can be utilized more widely, so we need to find a general
>> way to add vectorized math functions to Glibc.
>> Let's discuss the following questions, which were raised after the first
>> x86_64-specific patch was sent.
>>
>
> What are the main target users of this extension? Is
> it mainly for programmers to use it directly in their
> applications, mostly independent of compilers? Or is it
> mainly for GCC?

It is mainly for GCC.

--
WBR,
Andrew

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-18 13:48 [RFC] How to add vector math functions to Glibc Andrew Senkevich
  2014-09-18 13:57 ` Andrew Senkevich
  2014-09-18 15:37 ` H.J. Lu
@ 2014-09-21 16:31 ` Andi Kleen
  2014-09-25 19:43   ` Carlos O'Donell
  2 siblings, 1 reply; 67+ messages in thread
From: Andi Kleen @ 2014-09-21 16:31 UTC
  To: Andrew Senkevich; +Cc: libc-alpha

Andrew Senkevich <andrew.n.senkevich@gmail.com> writes:


> Due to the latest OpenMP and CilkPlus features supported in GCC, vectorized
> math functions can be utilized more widely, so we need to find a general
> way to add vectorized math functions to Glibc.
> Let's discuss the following questions, which were raised after the first
> x86_64-specific patch was sent.
>
> 1. Should functions go in libm or a separate libmvec library?

I would just put them into libm. Simplifies things for everyone.

>
> 2. What requirements on the compiler / assembler versions used are imposed
> by the requirement that the ABI provided by glibc's shared libraries must
> not depend on the tools used to build glibc, and what such requirements is
> it OK to impose (it may be OK to move to GCC 4.6 as minimum compiler at
> present, but requiring a more recent version would be a problem; we'd need
> to consider what binutils version we can require)?  If a separate libmvec
> is used, is it OK simply not to build it if those requirements aren't met?

Requiring newer binutils should be fine. I believe glibc has done that
in the past.

> (It's definitely not OK for the ABI of a library to vary incompatibly, but
> it might be OK for the presence of a library to be conditional.)

It would even be fine to leave the symbols out of libm in this case. The
error message to the user should be of similar quality to that for a
missing library (in fact, likely better).

> 6. How do we handle different glibc versions having vectorized functions
> for different vector ISA extensions?

If your glibc does not support what you compiled for, the dynamic
linker will just error out. Seems reasonable to me.

>
> 7. Note that if we're relying on #pragma omp declare simd meaning a precise
> set of function versions is available - with a guarantee that no future
> compiler version will interpret it as also meaning an AVX512 version is
> available, for example, so that it's safely possible to use older glibc

I guess that would need a (new?) option to gcc to declare what it
can expect to be available in the library, with distributions
setting the default at gcc build time?

-Andi

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-18 17:05   ` Joseph S. Myers
@ 2014-09-22 11:48     ` Andrew Senkevich
  2014-09-22 12:37       ` Joseph S. Myers
  2014-09-24 19:46     ` Andrew Senkevich
  1 sibling, 1 reply; 67+ messages in thread
From: Andrew Senkevich @ 2014-09-22 11:48 UTC
  To: Joseph S. Myers; +Cc: libc-alpha

Hi Joseph,

>> > 4. We need to handle different architectures having different sets of
>> > functions vectorized.
>>
>> We need some way for #pragma omp declare simd to be architecture dependent.
>> That would make it possible to have different sets of vectorized functions
>> for different architectures.
>
> I think having a long series of macros such as __DECL_SIMD_COS_DOUBLE as I
> suggested in <https://sourceware.org/ml/libc-alpha/2014-09/msg00182.html>,
> that the architecture-specific <bits/math-vector.h> header may or may not
> define, would work for this.
>
> The default bits/math-vector.h, for architectures without such functions,
> should be minimal (that is, it should not be necessary to define long
> series of macros for functions you don't have versions of on your
> architecture, so that adding vectorized versions of a new function for one
> architecture doesn't require you to update the files for lots of other
> architectures to say they don't have vectorized versions of that function;
> architecture-independent files should handle the macros possibly not being
> defined).

Is it OK to have the following scheme:

diff --git a/bits/math-vector.h b/bits/math-vector.h
new file mode 100644
index 0000000..4a9c786
--- /dev/null
+++ b/bits/math-vector.h
@@ -0,0 +1,21 @@
+/* Platform-specific SIMD declarations for math functions.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly; include <math.h> instead."
+#endif
diff --git a/math/Makefile b/math/Makefile
index 866bc0f..1941b62 100644
--- a/math/Makefile
+++ b/math/Makefile
@@ -26,7 +26,7 @@ headers               := math.h bits/mathcalls.h bits/mathinline.h bits/huge_val.h \
                   bits/huge_valf.h bits/huge_vall.h bits/inf.h bits/nan.h \
                   fpu_control.h complex.h bits/cmathcalls.h fenv.h \
                   bits/fenv.h bits/fenvinline.h bits/mathdef.h tgmath.h \
-                  bits/math-finite.h
+                  bits/math-finite.h bits/math-vector.h

 # FPU support code.
 aux            := setfpucw fpu_control
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index ae94990..2d31a11 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -70,7 +60,15 @@ __MATHCALL (atan,, (_Mdouble_ __x));
 __MATHCALL (atan2,, (_Mdouble_ __y, _Mdouble_ __x));

 /* Cosine of X.  */
+#if !defined _Mfloat_ && !defined _Mlong_double_ && defined __DECL_SIMD_cos
+__DECL_SIMD_cos
+#endif
+#if defined _Mfloat_ && !defined _Mlong_double_ && defined __DECL_SIMD_cosf
+__DECL_SIMD_cosf
+#endif
+#if defined _Mlong_double_ && defined __DECL_SIMD_cosl
+__DECL_SIMD_cosl
+#endif
 __MATHCALL (cos,, (_Mdouble_ __x));
 /* Sine of X.  */
 __MATHCALL (sin,, (_Mdouble_ __x));
diff --git a/math/math.h b/math/math.h
index 72ec2ca..32a7bec 100644
--- a/math/math.h
+++ b/math/math.h
@@ -27,6 +27,9 @@

 __BEGIN_DECLS

+/* Get machine-dependent vector math functions declarations */
+#include <bits/math-vector.h>
+
 /* Get machine-dependent HUGE_VAL value (returned on overflow).
    On all IEEE754 machines, this is +Infinity.  */
 #include <bits/huge_val.h>
diff --git a/sysdeps/x86_64/bits/math-vector.h b/sysdeps/x86_64/bits/math-vector.h
new file mode 100644
index 0000000..512f4e4
--- /dev/null
+++ b/sysdeps/x86_64/bits/math-vector.h
@@ -0,0 +1,30 @@
+/* SIMD declarations of math functions for x86_64.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly; include <math.h> instead."
+#endif
+
+#if defined _OPENMP && _OPENMP >= 201307
+# define __DECL_SIMD_AVX   _Pragma ("omp declare simd notinbranch processor(core_3rd_gen_avx)")
+# define __DECL_SIMD_SSE4 _Pragma ("omp declare simd notinbranch processor(core_i7_sse4_2)")
+
+# define __DECL_SIMD_cos  __DECL_SIMD_AVX
+# define __DECL_SIMD_cosf __DECL_SIMD_SSE4
+#endif

where the processor() clause is taken from CilkPlus as an example; for
OpenMP, whether to use a processor clause at all still needs discussion.
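A minimal sketch of the _Pragma-in-a-macro approach used in the header above, restricted to the standard OpenMP 4.0 notinbranch clause (processor() is a CilkPlus clause, not an OpenMP one). scale2 is a hypothetical stand-in for a libm function and DECL_SIMD_NOTINBRANCH is a hypothetical macro name; with -fopenmp, GCC may create SIMD clones of scale2 and call them from the vectorized loop, and without it the pragmas are simply ignored.

```c
/* Declare a vector variant only for OpenMP 4.0 compilers
   (_OPENMP >= 201307); otherwise the declaration stays scalar.  */
#if defined _OPENMP && _OPENMP >= 201307
# define DECL_SIMD_NOTINBRANCH _Pragma ("omp declare simd notinbranch")
#else
# define DECL_SIMD_NOTINBRANCH
#endif

/* Hypothetical function standing in for something like cos.  */
DECL_SIMD_NOTINBRANCH
double scale2 (double x);

double
scale2 (double x)
{
  return 2.0 * x;
}

/* A loop the compiler may vectorize, calling a SIMD clone of scale2
   when built with -fopenmp.  */
void
scale2_array (const double *in, double *out, int n)
{
#pragma omp simd
  for (int i = 0; i < n; i++)
    out[i] = scale2 (in[i]);
}
```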

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-22 11:48     ` Andrew Senkevich
@ 2014-09-22 12:37       ` Joseph S. Myers
  0 siblings, 0 replies; 67+ messages in thread
From: Joseph S. Myers @ 2014-09-22 12:37 UTC
  To: Andrew Senkevich; +Cc: libc-alpha

On Mon, 22 Sep 2014, Andrew Senkevich wrote:

> Is it OK to have the following scheme:

Yes, that's the sort of thing I'd expect.  However:

> diff --git a/sysdeps/x86_64/bits/math-vector.h

The installed header files for x86_64 (-m64), x86_64 (x32) and i386 are 
meant to be the same, to support using the same compiler and headers with 
any of -m64, -mx32 and -m32.  Thus, this header needs to go in 
sysdeps/x86/fpu, and to contain __x86_64__ preprocessor conditionals.
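Roughly what such __x86_64__ conditionals could look like in a single installed header shared by -m64, -mx32 and -m32. A sketch only: __x86_64__ is also defined for -mx32 (which additionally defines __ILP32__), so this version offers the SIMD declarations to both 64-bit ABIs and none to -m32; whether x32 should get them is a separate decision. math_vector_abi_variant is a hypothetical helper for inspection.

```c
/* Stand-in for one x86 <bits/math-vector.h> serving all three ABIs.  */
#if defined __x86_64__ && defined _OPENMP && _OPENMP >= 201307
# define __DECL_SIMD_cos _Pragma ("omp declare simd notinbranch")
#endif

/* Report which of the three ABIs this compilation sees.  */
static const char *
math_vector_abi_variant (void)
{
#if defined __x86_64__ && defined __ILP32__
  return "x32";
#elif defined __x86_64__
  return "x86_64";
#elif defined __i386__
  return "i386";
#else
  return "other";
#endif
}
```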

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-18 17:05   ` Joseph S. Myers
  2014-09-22 11:48     ` Andrew Senkevich
@ 2014-09-24 19:46     ` Andrew Senkevich
  2014-09-24 20:19       ` Joseph S. Myers
  1 sibling, 1 reply; 67+ messages in thread
From: Andrew Senkevich @ 2014-09-24 19:46 UTC
  To: Joseph S. Myers; +Cc: libc-alpha

Hi Joseph,

>> > 1. Should functions go in libm or a separate libmvec library?
>>
>> If we integrate the new functions into libm, it will be easier for
>> developers to employ vectorization, which should improve acceptance.
>> The libmvec case affects compiler options, and it seems a new header would
>> need to be included instead of math.h; or is it OK to include it from math.h?
>> On the other hand, libmvec could be better with respect to an optional build
>> (mentioned below in 2.) when adding implementations for the most modern ISAs.
>
> There is also the option of installing libm.so as a linker script that
> refers to libmvec inside AS_NEEDED - libc.so is already a linker script
> after all.  That way, libmvec would automatically be linked into programs
> that need it (preserving compatibility with the POSIX rules about what
> library names need specifying to get what functions), without restricting
> the tools that can build glibc if vector extensions are added for which
> tool support is recent, but building libmvec is optional.

Is it OK to have the following changes in the GLIBC build:

diff --git a/Makeconfig b/Makeconfig
index cef0f06..5173414 100644
--- a/Makeconfig
+++ b/Makeconfig
@@ -1060,11 +1060,11 @@ endif
 # These are the subdirectories containing the library source.  The order
 # is more or less arbitrary.  The sorting step will take care of the
 # dependencies.
-all-subdirs = csu assert ctype locale intl catgets math setjmp signal    \
-      stdlib stdio-common libio malloc string wcsmbs time dirent    \
-      grp pwd posix io termios resource misc socket sysvipc gmon    \
-      gnulib iconv iconvdata wctype manual shadow gshadow po argp   \
-      crypt localedata timezone rt conform debug    \
+all-subdirs = csu assert ctype locale intl catgets math mathvect setjmp    \
+      signal stdlib stdio-common libio malloc string wcsmbs time    \
+      dirent grp pwd posix io termios resource misc socket sysvipc  \
+      gmon gnulib iconv iconvdata wctype manual shadow gshadow po   \
+      argp crypt localedata timezone rt conform debug    \
       $(add-on-subdirs) dlfcn elf

 ifndef avoid-generated
diff --git a/Makerules b/Makerules
index 6b30e8c..058e748 100644
--- a/Makerules
+++ b/Makerules
@@ -965,7 +965,22 @@ $(inst_libdir)/libc.so: $(common-objpfx)format.lds \
       ' AS_NEEDED (' $(rtlddir)/$(rtld-installed-name) ') )' \
  ) > $@.new
  mv -f $@.new $@
-
+else
+ifeq ($(subdir),mathvect)
+# We need to install libm.so as linker script
+# for more comfortable use of libmvect library.
+subdir_install: $(inst_libdir)/libm.so
+$(inst_libdir)/libm.so: $(common-objpfx)format.lds \
+ $(common-objpfx)math/libm.so$(libm.so-version) \
+ $(common-objpfx)mathvect/libmvect.so$(libmvect.so-version) \
+ $(+force)
+ (echo '/* GNU ld script */';\
+ cat $<; \
+ echo 'GROUP ( $(slibdir)/libm.so$(libm.so-version) ' \
+      'AS_NEEDED ( $(slibdir)/libmvect.so$(libmvect.so-version) ) )' \
+ ) > $@.new
+ mv -f $@.new $@
+endif
 endif

 else
diff --git a/shlib-versions b/shlib-versions
index 40469bd..8fddc04 100644
--- a/shlib-versions
+++ b/shlib-versions
@@ -35,6 +35,12 @@ sh.*-.*-linux.* libm=6 GLIBC_2.2
 .*-.*-linux.* libm=6
 .*-.*-gnu-gnu.* libm=6

+# We provide libmvect.so.1 starting from GLIBC_2.21 symbol set.
+sparc64.*-.*-linux.* libmvect=1 GLIBC_2.21
+sh.*-.*-linux.* libmvect=1 GLIBC_2.21
+.*-.*-linux.* libmvect=1 GLIBC_2.21
+.*-.*-gnu-gnu.* libmvect=1 GLIBC_2.21
+
 # We provide libc.so.6 for Linux kernel versions 2.0 and later.
 sh.*-.*-linux.* libc=6 GLIBC_2.2
 sparc64.*-.*-linux.* libc=6 GLIBC_2.2
diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
index 1cb3ec5..520ed09 100644
--- a/sysdeps/x86_64/fpu/Makefile
+++ b/sysdeps/x86_64/fpu/Makefile
@@ -1,3 +1,3 @@
-ifeq ($(subdir),math)
-libm-support += svml_d_cos4_core svml_d_cos_data
+ifeq ($(subdir),mathvect)
+libmvect-support += svml_d_cos4_core svml_d_cos_data
 endif
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index 1717a7a..1fd0921 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -1,4 +1,4 @@
-libm {
+libmvect {
diff --git a/sysdeps/x86_64/fpu/svml_d_cos4_core.S b/sysdeps/x86_64/fpu/svml_d_cos4_core.S
index 8334875..d7fd42f 100644
--- a/sysdeps/x86_64/fpu/svml_d_cos4_core.S
+++ b/sysdeps/x86_64/fpu/svml_d_cos4_core.S
@@ -167,7 +168,7 @@ _LBL_1_10:
         vmovsd    328(%rsp,%r15), %xmm0
         vzeroupper

-        call      __cos@PLT
+        call      cos@PLT

         vmovsd    %xmm0, 392(%rsp,%r15)
         jmp       _LBL_1_8
@@ -178,8 +179,9 @@ _LBL_1_12:
         vmovsd    320(%rsp,%r15), %xmm0
         vzeroupper

-        call      __cos@PLT
+        call      cos@PLT

         vmovsd    %xmm0, 384(%rsp,%r15)
         jmp       _LBL_1_7


WBR,
Andrew

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-24 19:46     ` Andrew Senkevich
@ 2014-09-24 20:19       ` Joseph S. Myers
  2014-09-25 15:18         ` Andrew Senkevich
  0 siblings, 1 reply; 67+ messages in thread
From: Joseph S. Myers @ 2014-09-24 20:19 UTC
  To: Andrew Senkevich; +Cc: libc-alpha

On Wed, 24 Sep 2014, Andrew Senkevich wrote:

> Hi Joseph,
> 
> >> > 1. Should functions go in libm or a separate libmvec library?
> >>
> >> If we integrate the new functions into libm, it will be easier for
> >> developers to employ vectorization, which should improve acceptance.
> >> The libmvec case affects compiler options, and it seems a new header would
> >> need to be included instead of math.h; or is it OK to include it from math.h?
> >> On the other hand, libmvec could be better with respect to an optional build
> >> (mentioned below in 2.) when adding implementations for the most modern ISAs.
> >
> > There is also the option of installing libm.so as a linker script that
> > refers to libmvec inside AS_NEEDED - libc.so is already a linker script
> > after all.  That way, libmvec would automatically be linked into programs
> > that need it (preserving compatibility with the POSIX rules about what
> > library names need specifying to get what functions), without restricting
> > the tools that can build glibc if vector extensions are added for which
> > tool support is recent, but building libmvec is optional.
> 
> Is it OK to have the following changes in the glibc build:

This patch doesn't seem to be relative to current sources.

> diff --git a/Makerules b/Makerules
> index 6b30e8c..058e748 100644
> --- a/Makerules
> +++ b/Makerules
> @@ -965,7 +965,22 @@ $(inst_libdir)/libc.so: $(common-objpfx)format.lds \
>        ' AS_NEEDED (' $(rtlddir)/$(rtld-installed-name) ') )' \
>   ) > $@.new
>   mv -f $@.new $@
> -
> +else
> +ifeq ($(subdir),mathvect)
> +# We need to install libm.so as linker script
> +# for more comfortable use of libmvect library.

If consensus ends up being to have such a library (libmvec or libmvect?), 
then the installation rules for libm.so as a linker script should go in 
math/Makefile, not the toplevel Makerules.  (I don't know what if any 
changes might be needed to allow subdirectories to provide libraries as 
linker scripts.)

> diff --git a/shlib-versions b/shlib-versions
> index 40469bd..8fddc04 100644
> --- a/shlib-versions
> +++ b/shlib-versions
> @@ -35,6 +35,12 @@ sh.*-.*-linux.* libm=6 GLIBC_2.2
>  .*-.*-linux.* libm=6
>  .*-.*-gnu-gnu.* libm=6
> 
> +# We provide libmvect.so.1 starting from GLIBC_2.21 symbol set.
> +sparc64.*-.*-linux.* libmvect=1 GLIBC_2.21
> +sh.*-.*-linux.* libmvect=1 GLIBC_2.21
> +.*-.*-linux.* libmvect=1 GLIBC_2.21
> +.*-.*-gnu-gnu.* libmvect=1 GLIBC_2.21

No.  No architecture-specific or OS-specific content goes in 
shlib-versions outside the appropriate sysdeps directories any more, and 
shlib-versions no longer has a first column listing patterns matching 
configuration triplets.

Since the library will be architecture-specific, shlib-versions entries 
for its symbol versions would go in the relevant architecture files only 
when support is actually added for a given library.  But actually you 
don't need such entries, since each architecture will have Versions file 
entries for the symbols provided in this library on that architecture; 
symbol versions in shlib-versions only serve to override Versions files.  
So all you need is a single toplevel shlib-versions line to specify the 
SONAME.  That is, a line just saying

libmvect=1

or similar.

Also, some configuration support will be needed to allow an architecture 
to specify (via its sysdeps configure fragments, I suppose) whether it 
supports this library at all, so builds for other architectures don't 
attempt to build it.  If not built, of course libm.so should remain as-is; 
it can't become a linker script referring to the new library.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-24 20:19       ` Joseph S. Myers
@ 2014-09-25 15:18         ` Andrew Senkevich
  2014-09-25 15:40           ` H.J. Lu
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Senkevich @ 2014-09-25 15:18 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: libc-alpha

> If consensus ends up being to have such a library (libmvec or libmvect?),
> then the installation rules for libm.so as a linker script should go in
> math/Makefile, not the toplevel Makerules.  (I don't know what if any
> changes might be needed to allow subdirectories to provide libraries as
> linker scripts.)

There were three options about the place where to add vectorized math functions:

1. GLIBC (libm)
2. GLIBC (additional library)
3. GCC

In the glibc cases, building the vectorized functions can be conditional,
and no additional -lmvec is required because libm.so is installed as a
linker script when the vectorized functions are available, so it does not
seem very important whether the functions are located in an additional
library or in libm.

We have had differing feedback on these three options, but it is important
to decide now so that the implementation can proceed.

How can we achieve the final decision?

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-25 15:18         ` Andrew Senkevich
@ 2014-09-25 15:40           ` H.J. Lu
  2014-09-25 19:27             ` Carlos O'Donell
  0 siblings, 1 reply; 67+ messages in thread
From: H.J. Lu @ 2014-09-25 15:40 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: Joseph S. Myers, libc-alpha

On Thu, Sep 25, 2014 at 8:17 AM, Andrew Senkevich
<andrew.n.senkevich@gmail.com> wrote:
>> If consensus ends up being to have such a library (libmvec or libmvect?),
>> then the installation rules for libm.so as a linker script should go in
>> math/Makefile, not the toplevel Makerules.  (I don't know what if any
>> changes might be needed to allow subdirectories to provide libraries as
>> linker scripts.)
>
> There were three options about the place where to add vectorized math functions:
>
> 1. GLIBC (libm)
> 2. GLIBC (additional library)
> 3. GCC
>
> In GLIBC cases build of vectorized functions can be conditional, no
> additional -lmvec required because of libm.so installed as linked
> script in case of vectorized functions available, so it seems not very
> important whether functions located in additional library or in libm.
>

I don't think they should be in libm since most applications
won't use these vector functions, which would increase libm size
unnecessarily. A separate library is better.


-- 
H.J.

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-25 15:40           ` H.J. Lu
@ 2014-09-25 19:27             ` Carlos O'Donell
  2014-09-25 19:37               ` H.J. Lu
  0 siblings, 1 reply; 67+ messages in thread
From: Carlos O'Donell @ 2014-09-25 19:27 UTC (permalink / raw)
  To: H.J. Lu, Andrew Senkevich; +Cc: Joseph S. Myers, libc-alpha

On 09/25/2014 11:40 AM, H.J. Lu wrote:
> On Thu, Sep 25, 2014 at 8:17 AM, Andrew Senkevich
> <andrew.n.senkevich@gmail.com> wrote:
>>> If consensus ends up being to have such a library (libmvec or libmvect?),
>>> then the installation rules for libm.so as a linker script should go in
>>> math/Makefile, not the toplevel Makerules.  (I don't know what if any
>>> changes might be needed to allow subdirectories to provide libraries as
>>> linker scripts.)
>>
>> There were three options about the place where to add vectorized math functions:
>>
>> 1. GLIBC (libm)
>> 2. GLIBC (additional library)
>> 3. GCC
>>
>> In GLIBC cases build of vectorized functions can be conditional, no
>> additional -lmvec required because of libm.so installed as linked
>> script in case of vectorized functions available, so it seems not very
>> important whether functions located in additional library or in libm.
>>
> 
> I don't think they should be in libm since most of applications
> won't use those vector functions, which increase libm size
> unnecessarily. A separate library is better.

I agree. A distinct libmvec.so is best.

I see the consensus that #2 is the way forward.

Cheers,
Carlos.

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-25 19:27             ` Carlos O'Donell
@ 2014-09-25 19:37               ` H.J. Lu
  2014-09-25 19:55                 ` Carlos O'Donell
  0 siblings, 1 reply; 67+ messages in thread
From: H.J. Lu @ 2014-09-25 19:37 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Andrew Senkevich, Joseph S. Myers, libc-alpha

On Thu, Sep 25, 2014 at 12:27 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 09/25/2014 11:40 AM, H.J. Lu wrote:
>> On Thu, Sep 25, 2014 at 8:17 AM, Andrew Senkevich
>> <andrew.n.senkevich@gmail.com> wrote:
>>>> If consensus ends up being to have such a library (libmvec or libmvect?),
>>>> then the installation rules for libm.so as a linker script should go in
>>>> math/Makefile, not the toplevel Makerules.  (I don't know what if any
>>>> changes might be needed to allow subdirectories to provide libraries as
>>>> linker scripts.)
>>>
>>> There were three options about the place where to add vectorized math functions:
>>>
>>> 1. GLIBC (libm)
>>> 2. GLIBC (additional library)
>>> 3. GCC
>>>
>>> In GLIBC cases build of vectorized functions can be conditional, no
>>> additional -lmvec required because of libm.so installed as linked
>>> script in case of vectorized functions available, so it seems not very
>>> important whether functions located in additional library or in libm.
>>>
>>
>> I don't think they should be in libm since most of applications
>> won't use those vector functions, which increase libm size
>> unnecessarily. A separate library is better.
>
> I agree. A distinct libmvec.so is best.
>
> I see the consensus that #2 is the way forward.

Since this vector library targets GCC, there are
pros to put it in GCC.


-- 
H.J.

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-21 16:31 ` Andi Kleen
@ 2014-09-25 19:43   ` Carlos O'Donell
  0 siblings, 0 replies; 67+ messages in thread
From: Carlos O'Donell @ 2014-09-25 19:43 UTC (permalink / raw)
  To: Andi Kleen, Andrew Senkevich; +Cc: libc-alpha

On 09/21/2014 12:31 PM, Andi Kleen wrote:
> Andrew Senkevich <andrew.n.senkevich@gmail.com> writes:
> 
> 
>> due to latest OpenMP and CilkPlus features supported in GCC vectorized
>> math functions can be utilized more widely. So we need to find general
>> way how to add vectorized math functions to Glibc.
>> Lets discuss the following questions which were raised after first
>> patch x86_64 specific was sent.
>>
>> 1. Should functions go in libm or a separate libmvec library?
> 
> I would just put them into libm. Simplifies things for everyone.

I disagree, I think libmvec.so is the best choice.

Adding them to libm.so adds features that are very new to libm.so,
that we might need to fix, and that not all users use. It is the perfect
candidate to start out as a distinct library.

Nothing prevents us from putting libmvec.so symbols into libm.so in
the future if this turns out to be what we want.

I argue for a conservative approach:

* libmvec.so.

* libm.so as a linker script with AS_NEEDED for libmvec.so.

>>
>> 2. What requirements on the compiler / assembler versions used are imposed
>> by the requirement that the ABI provided by glibc's shared libraries must
>> not depend on the tools used to build glibc, and what such requirements is
>> it OK to impose (it may be OK to move to GCC 4.6 as minimum compiler at
>> present, but requiring a more recent version would be a problem; we'd need
>> to consider what binutils version we can require)?  If a separate libmvec
>> is used, is it OK simply not to build it if those requirements aren't met?
> 
> Requiring newer binutils should be fine. I believe glibc has done that
> in the past.

Agreed.

>> (It's definitely not OK for the ABI of a library to vary incompatibly, but
>> it might be OK for the presence of a library to be conditional.)
> 
> It would even be fine to leave out the symbols from libm
> in this case. The error message to the user should be of similar
> quality as for a missing library (in fact, likely better).

I'm not opposed, but I think it's overly complicated.

>> 6. How do we handle different glibc versions having vectorized functions
>> for different vector ISA extensions?
> 
> If your glibc does not support what you compiled for, the dynamic
> linker will just error out. Seems reasonable to me.

Either it errors out because the underlying symbols aren't there, or
it errors out because the object ISA and the runtime ISA don't match.

Note that we currently have no support in the dynamic loader for
detecting a runtime ISA mismatch. We have some support for this in the
static linker via .gnu_attributes. It would be nice to have some fast
bit-mask checking at runtime to detect invalid objects being loaded.

Note that the ld.so.cache takes care of some of this, and that's used
to filter out libraries you shouldn't be able to load. Yet if the cache
isn't populated you still search and load whatever is on disk.
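
The "fast bit-mask checking" idea above could look roughly like the sketch below. The feature bits and both helper functions are hypothetical. They are not taken from ld.so or from any existing ELF interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ISA feature bits an object might record as required.  */
enum
{
  ISA_SSE4_2 = 1u << 0,
  ISA_AVX    = 1u << 1,
  ISA_AVX2   = 1u << 2,
  ISA_AVX512 = 1u << 3
};

/* An object is loadable only if every feature it requires is present.
   The check is one AND-NOT and one compare, which is cheap enough to
   run for every object at load time.  */
static bool isa_compatible (uint32_t required, uint32_t available)
{
  return (required & ~available) == 0;
}

/* Lowest missing feature bit, for an error message; 0 if none.  */
static uint32_t isa_first_missing (uint32_t required, uint32_t available)
{
  uint32_t missing = required & ~available;
  return missing & (0u - missing);  /* isolate the lowest set bit */
}
```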

>>
>> 7. Note that if we're relying on #pragma omp declare simd meaning a precise
>> set of function versions are available - with a guarantee that no future
> compiler version will interpret it as also meaning an AVX512 version is
>> available, for example, so that it's safely possible to use older
>> glibc
> 
> I guess that would need an (new?) option to gcc to declare what it
> can expect to be available in the library? With distributions
> setting the default at gcc build time.

I don't understand this question.
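
Question 7 turns on which vector variants a "#pragma omp declare simd" declaration makes the compiler assume exist. As far as I understand GCC's x86_64 vector function ABI, each variant is a separate symbol. Its name encodes the ISA, the masking, the lane count and the parameter kinds. The helper below is hypothetical and is not a glibc or GCC interface. It builds such names only to make the point concrete. Suppose a newer compiler also assumed an AVX-512 ('e') variant. An older library lacking that symbol would then fail at link or load time.

```c
#include <stdio.h>
#include <string.h>

/* Build a name of the form  _ZGV <isa> <mask> <vlen> <params> _ <name>
     isa:    'b' SSE, 'c' AVX, 'd' AVX2, 'e' AVX-512
     mask:   'N' unmasked (notinbranch), 'M' masked (inbranch)
     params: one 'v' per vector argument passed by value
   Returns the snprintf result, or -1 if nparams is too large.  */
static int vector_abi_name (char *buf, size_t size, char isa, char mask,
                            unsigned vlen, unsigned nparams, const char *name)
{
  char params[8];

  if (nparams >= sizeof params)
    return -1;
  memset (params, 'v', nparams);
  params[nparams] = '\0';
  return snprintf (buf, size, "_ZGV%c%c%u%s_%s", isa, mask, vlen, params,
                   name);
}
```

Under these assumptions, double-precision cos would have the names _ZGVbN2v_cos, _ZGVcN4v_cos and _ZGVdN4v_cos. These would cover the SSE4, AVX and AVX2 versions listed on the wiki.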

Cheers,
Carlos.

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-25 19:37               ` H.J. Lu
@ 2014-09-25 19:55                 ` Carlos O'Donell
  2014-09-25 20:03                   ` H.J. Lu
  0 siblings, 1 reply; 67+ messages in thread
From: Carlos O'Donell @ 2014-09-25 19:55 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Andrew Senkevich, Joseph S. Myers, libc-alpha

On 09/25/2014 03:37 PM, H.J. Lu wrote:
> On Thu, Sep 25, 2014 at 12:27 PM, Carlos O'Donell <carlos@redhat.com> wrote:
>> On 09/25/2014 11:40 AM, H.J. Lu wrote:
>>> On Thu, Sep 25, 2014 at 8:17 AM, Andrew Senkevich
>>> <andrew.n.senkevich@gmail.com> wrote:
>>>>> If consensus ends up being to have such a library (libmvec or libmvect?),
>>>>> then the installation rules for libm.so as a linker script should go in
>>>>> math/Makefile, not the toplevel Makerules.  (I don't know what if any
>>>>> changes might be needed to allow subdirectories to provide libraries as
>>>>> linker scripts.)
>>>>
>>>> There were three options about the place where to add vectorized math functions:
>>>>
>>>> 1. GLIBC (libm)
>>>> 2. GLIBC (additional library)
>>>> 3. GCC
>>>>
>>>> In GLIBC cases build of vectorized functions can be conditional, no
>>>> additional -lmvec required because of libm.so installed as linked
>>>> script in case of vectorized functions available, so it seems not very
>>>> important whether functions located in additional library or in libm.
>>>>
>>>
>>> I don't think they should be in libm since most of applications
>>> won't use those vector functions, which increase libm size
>>> unnecessarily. A separate library is better.
>>
>> I agree. A distinct libmvec.so is best.
>>
>> I see the consensus that #2 is the way forward.
> 
> Since this vector library targets GCC, there are
> pros to put it in GCC.
 
Sorry, I don't understand this part, and perhaps that's why I didn't
understand question 7 in the previous post.

What does it mean for the vector library to target GCC?

Cheers,
Carlos.
 

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-25 19:55                 ` Carlos O'Donell
@ 2014-09-25 20:03                   ` H.J. Lu
  2014-09-25 20:48                     ` Carlos O'Donell
  0 siblings, 1 reply; 67+ messages in thread
From: H.J. Lu @ 2014-09-25 20:03 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Andrew Senkevich, Joseph S. Myers, libc-alpha

On Thu, Sep 25, 2014 at 12:55 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 09/25/2014 03:37 PM, H.J. Lu wrote:
>> On Thu, Sep 25, 2014 at 12:27 PM, Carlos O'Donell <carlos@redhat.com> wrote:
>>> On 09/25/2014 11:40 AM, H.J. Lu wrote:
>>>> On Thu, Sep 25, 2014 at 8:17 AM, Andrew Senkevich
>>>> <andrew.n.senkevich@gmail.com> wrote:
>>>>>> If consensus ends up being to have such a library (libmvec or libmvect?),
>>>>>> then the installation rules for libm.so as a linker script should go in
>>>>>> math/Makefile, not the toplevel Makerules.  (I don't know what if any
>>>>>> changes might be needed to allow subdirectories to provide libraries as
>>>>>> linker scripts.)
>>>>>
>>>>> There were three options about the place where to add vectorized math functions:
>>>>>
>>>>> 1. GLIBC (libm)
>>>>> 2. GLIBC (additional library)
>>>>> 3. GCC
>>>>>
>>>>> In GLIBC cases build of vectorized functions can be conditional, no
>>>>> additional -lmvec required because of libm.so installed as linked
>>>>> script in case of vectorized functions available, so it seems not very
>>>>> important whether functions located in additional library or in libm.
>>>>>
>>>>
>>>> I don't think they should be in libm since most of applications
>>>> won't use those vector functions, which increase libm size
>>>> unnecessarily. A separate library is better.
>>>
>>> I agree. A distinct libmvec.so is best.
>>>
>>> I see the consensus that #2 is the way forward.
>>
>> Since this vector library targets GCC, there are
>> pros to put it in GCC.
>
> Sorry, I don't understand this part, and perhaps that's why I didn't
> understand question 7 in the previous post.
>
> What does it mean for the vector library to target GCC?
>

From

https://sourceware.org/glibc/wiki/libm#Addition_of_x86_64_vector_math_functions_to_Glibc

3.1. Goal

The main goal is to improve GCC vectorization with OpenMP 4.0 SIMD
constructs (#2.8 in http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf)
and Cilk Plus constructs (#6-8 in
http://www.cilkplus.org/sites/default/files/open_specifications/Intel_Cilk_plus_lang_spec_1.2.htm)
on x86_64 by adding SSE4, AVX and AVX2 vector implementations of
several vector math functions (float and double versions). AVX-512
versions are planned to be added later. These functions can also be
used manually (with intrinsics) by developers to obtain a speedup.

So it is mainly for GCC.
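
The compiler mechanism this goal relies on can be shown without libmvec at all. In the sketch below, scale_plus_one is a hypothetical stand-in for cos. Declaring it with "#pragma omp declare simd" lets GCC emit vector clones of it and call them from the "#pragma omp simd" loop. This assumes GCC 4.9 or later, built with -O2 -fopenmp-simd. Without that flag the pragmas are ignored and the same code runs with ordinary scalar calls, so the result is identical either way. For cos itself, math.h would need an equivalent declaration.

```c
#include <stddef.h>

/* Hypothetical stand-in for cos().  With -fopenmp-simd, GCC may also
   emit vector clones of this function (one per x86_64 vector ISA).  */
#pragma omp declare simd notinbranch
double scale_plus_one (double x)
{
  return 2.0 * x + 1.0;
}

/* The simd pragma allows the loop to be vectorized, so a vector clone
   can process several elements per call instead of one.  */
void apply_all (const double *restrict in, double *restrict out, size_t n)
{
#pragma omp simd
  for (size_t i = 0; i < n; i++)
    out[i] = scale_plus_one (in[i]);
}
```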

-- 
H.J.

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-25 20:03                   ` H.J. Lu
@ 2014-09-25 20:48                     ` Carlos O'Donell
  2014-09-26 13:46                       ` Andrew Senkevich
  2014-09-26 15:03                       ` H.J. Lu
  0 siblings, 2 replies; 67+ messages in thread
From: Carlos O'Donell @ 2014-09-25 20:48 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Andrew Senkevich, Joseph S. Myers, libc-alpha

On 09/25/2014 04:03 PM, H.J. Lu wrote:
>> Sorry, I don't understand this part, and perhaps that's why I didn't
>> understand question 7 in the previous post.
>>
>> What does it mean for the vector library to target GCC?
>>
> 
> From
> 
> https://sourceware.org/glibc/wiki/libm#Addition_of_x86_64_vector_math_functions_to_Glibc

Thanks for the pointer. I understand now.
 
> 3.1. Goal
> 
> Main goal is to improve vectorization of GCC with OpenMP4.0 SIMD
> constructs (#2.8 in http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf
> and Cilk Plus constructs (#6-8 in
> http://www.cilkplus.org/sites/default/files/open_specifications/Intel_Cilk_plus_lang_spec_1.2.htm)
> on x86_64 by adding SSE4, AVX and AVX2 vector implementations of
> several vector math functions (float and double versions). AVX-512
> versions are planned to be added later. These functions can be also
> used manually (with intrinsics) by developers to obtain speedup.
> 
> So it is mainly for GCC.

The only counter-argument to that is that a single implementation
in glibc can be shared by gcc and llvm or any other compiler, as
noted in "3.5 Open questions, a."

Intel needs to decide where they want this piece of technology
to reside. I don't know that the community can make this choice
for Intel.

The community is ready to work with Intel to implement this in
glibc.

I've tried to clarify a few of these:
https://sourceware.org/glibc/wiki/libm#Open_questions

Cheers,
Carlos.

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-25 20:48                     ` Carlos O'Donell
@ 2014-09-26 13:46                       ` Andrew Senkevich
  2014-09-26 14:13                         ` Carlos O'Donell
  2014-09-26 14:15                         ` Carlos O'Donell
  2014-09-26 15:03                       ` H.J. Lu
  1 sibling, 2 replies; 67+ messages in thread
From: Andrew Senkevich @ 2014-09-26 13:46 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: H.J. Lu, Joseph S. Myers, libc-alpha

>> 3.1. Goal
>>
>> Main goal is to improve vectorization of GCC with OpenMP4.0 SIMD
>> constructs (#2.8 in http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf
>> and Cilk Plus constructs (#6-8 in
>> http://www.cilkplus.org/sites/default/files/open_specifications/Intel_Cilk_plus_lang_spec_1.2.htm)
>> on x86_64 by adding SSE4, AVX and AVX2 vector implementations of
>> several vector math functions (float and double versions). AVX-512
>> versions are planned to be added later. These functions can be also
>> used manually (with intrinsics) by developers to obtain speedup.
>>
>> So it is mainly for GCC.
>
> The only counter-argument to that is that a single implementation
> in glibc can be shared by gcc and llvm or any other compiler. As
> noted in "3.5 Open questions, a."

Yes, it was a little inaccurate; I have corrected it on the wiki.

> Intel needs to decide where they want this piece of technology
> to reside. I don't know that the community can make this choice
> for Intel.
>
> The community is ready to work with Intel to implement this in
> glibc.

Yes, we also would like to add vector functions to new library libmvec.

So let's discuss the glibc build changes.
Building libmvec (and hence installing libm.so as a linker script) needs
to be architecture-dependent and optional; some of these changes were
already discussed in https://sourceware.org/ml/libc-alpha/2014-09/msg00578.html.
Is it OK, in addition, to have a configure option --enable-mathvec with
default=no in general and default=yes for x86_64 builds?



--
WBR,
Andrew

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-26 13:46                       ` Andrew Senkevich
@ 2014-09-26 14:13                         ` Carlos O'Donell
  2014-09-26 14:15                         ` Carlos O'Donell
  1 sibling, 0 replies; 67+ messages in thread
From: Carlos O'Donell @ 2014-09-26 14:13 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: H.J. Lu, Joseph S. Myers, libc-alpha

On 09/26/2014 09:45 AM, Andrew Senkevich wrote:
>>> 3.1. Goal
>>>
>>> Main goal is to improve vectorization of GCC with OpenMP4.0 SIMD
>>> constructs (#2.8 in http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf
>>> and Cilk Plus constructs (#6-8 in
>>> http://www.cilkplus.org/sites/default/files/open_specifications/Intel_Cilk_plus_lang_spec_1.2.htm)
>>> on x86_64 by adding SSE4, AVX and AVX2 vector implementations of
>>> several vector math functions (float and double versions). AVX-512
>>> versions are planned to be added later. These functions can be also
>>> used manually (with intrinsics) by developers to obtain speedup.
>>>
>>> So it is mainly for GCC.
>>
>> The only counter-argument to that is that a single implementation
>> in glibc can be shared by gcc and llvm or any other compiler. As
>> noted in "3.5 Open questions, a."
> 
> Yes, it was a little bit inaccurate, corrected on wiki.

Thanks. Having the wiki document is useful to allow us and others
to stay organized over the decisions we've already made.

I've moved the two answered questions into a "Consensus" header
for the vector library.

Please correct this if I'm wrong.

>> Intel needs to decide where they want this piece of technology
>> to reside. I don't know that the community can make this choice
>> for Intel.
>>
>> The community is ready to work with Intel to implement this in
>> glibc.
> 
> Yes, we also would like to add vector functions to new library libmvec.

OK.

Cheers,
Carlos.

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-26 13:46                       ` Andrew Senkevich
  2014-09-26 14:13                         ` Carlos O'Donell
@ 2014-09-26 14:15                         ` Carlos O'Donell
  2014-09-30 15:00                           ` Andrew Senkevich
  1 sibling, 1 reply; 67+ messages in thread
From: Carlos O'Donell @ 2014-09-26 14:15 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: H.J. Lu, Joseph S. Myers, libc-alpha

On 09/26/2014 09:45 AM, Andrew Senkevich wrote:
> So lets discuss Glibc build changes.
> Build of libmvec (and hence libm.so installation) need to be
> architecture dependent and optional, and some changes already was
> discussed in https://sourceware.org/ml/libc-alpha/2014-09/msg00578.html.
> Is it OK additionally to have configure option --enable-mathvec with
> default=no and with default=yes for x86_64 build?

Under what circumstances would a non-x86_64 target build with
--enable-mathvec?

When they have their own API/ABI standard to implement and provide
in libmvec.so?

What's wrong with simply producing a libmvec.so that has no public
symbols?

It simplifies everything to just always ship libmvec.so, even if
it's empty.

Cheers,
Carlos.

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-25 20:48                     ` Carlos O'Donell
  2014-09-26 13:46                       ` Andrew Senkevich
@ 2014-09-26 15:03                       ` H.J. Lu
  2014-09-26 15:48                         ` Carlos O'Donell
  1 sibling, 1 reply; 67+ messages in thread
From: H.J. Lu @ 2014-09-26 15:03 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Andrew Senkevich, Joseph S. Myers, libc-alpha

On Thu, Sep 25, 2014 at 1:48 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 09/25/2014 04:03 PM, H.J. Lu wrote:
>>> Sorry, I don't understand this part, and perhaps that's why I didn't
>>> understand question 7 in the previous post.
>>>
>>> What does it mean for the vector library to target GCC?
>>>
>>
>> From
>>
>> https://sourceware.org/glibc/wiki/libm#Addition_of_x86_64_vector_math_functions_to_Glibc
>
> Thanks for the pointer. I understand now.
>
>> 3.1. Goal
>>
>> Main goal is to improve vectorization of GCC with OpenMP4.0 SIMD
>> constructs (#2.8 in http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf
>> and Cilk Plus constructs (#6-8 in
>> http://www.cilkplus.org/sites/default/files/open_specifications/Intel_Cilk_plus_lang_spec_1.2.htm)
>> on x86_64 by adding SSE4, AVX and AVX2 vector implementations of
>> several vector math functions (float and double versions). AVX-512
>> versions are planned to be added later. These functions can be also
>> used manually (with intrinsics) by developers to obtain speedup.
>>
>> So it is mainly for GCC.
>
> The only counter-argument to that is that a single implementation
> in glibc can be shared by gcc and llvm or any other compiler. As
> noted in "3.5 Open questions, a."
>

I don't think it is an issue, since llvm uses libraries from GCC
by default, and putting it in GLIBC doesn't help llvm on non-GLIBC
based systems either.

Putting it in GLIBC means that GCC 5.0 may not support vector
math, depending on the build or target GLIBC.  It is not a good
GCC user experience.

Putting it in GCC means that GCC 5.0 supports vector math on
all platforms the vector math library supports, including
non-GLIBC based systems.

It is true that programmers who want to use vector math
directly need to install GCC 5.0.  That is one reason I am
asking who the main user of vector math is.

-- 
H.J.

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-26 15:03                       ` H.J. Lu
@ 2014-09-26 15:48                         ` Carlos O'Donell
  2014-09-26 16:08                           ` H.J. Lu
  0 siblings, 1 reply; 67+ messages in thread
From: Carlos O'Donell @ 2014-09-26 15:48 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Andrew Senkevich, Joseph S. Myers, libc-alpha

On 09/26/2014 11:03 AM, H.J. Lu wrote:
> Put it in GLIBC means that GCC 5.0 may not support vector
> math, depending on the build or target GLIBC.  It is not a good
> GCC user experience.

The same goes for all the other C11 and C++11 features we
coordinate with glibc? I don't see how this has anything to do
with a "user experience." The distribution maintainers need to
make sure glibc is up to date, and gcc's required features are
present. This is always how it works.

> That is true that programmers who want to use vector math
> directly need to install GCC 5.0.  That is one reason I am
> asking who is the main user of vector math.

That's a serious upgrade for many users.

Backporting libmvec.so, by contrast, is pretty simple, and allows
the feature to be present for immediate use directly by a
developer, and later to be used when they are ready to
upgrade to gcc 5.0.

I think this choice may actually be larger than just Intel.

For example IBM, and particularly their Power vector math
functions were explained to me as being callable directly
by developers. Thus Power might want libmvec.so in glibc?

I still believe that developers will want to call these
by hand if possible for code that is hand-optimized.

That IMO calls out for this to be libmvec.so in glibc.

Not to mention that gcc has no math testing infrastructure
and glibc does. We can immediately put all of this test
infrastructure to play for libmvec.so.

Cheers,
Carlos.

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-26 15:48                         ` Carlos O'Donell
@ 2014-09-26 16:08                           ` H.J. Lu
  2014-09-26 17:55                             ` Carlos O'Donell
  0 siblings, 1 reply; 67+ messages in thread
From: H.J. Lu @ 2014-09-26 16:08 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Andrew Senkevich, Joseph S. Myers, libc-alpha

On Fri, Sep 26, 2014 at 8:48 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 09/26/2014 11:03 AM, H.J. Lu wrote:
>> Put it in GLIBC means that GCC 5.0 may not support vector
>> math, depending on the build or target GLIBC.  It is not a good
>> GCC user experience.
>
> The same goes for all the other C11 and C++11 features we
> coordinate with glibc? I don't see how this has anything to do
> with a "user experience." The distribution maintainers need to
> make sure glibc is up to date, and gcc's required features are
> present. This is always how it works.

When there is no choice, then you don't have a choice.

>> That is true that programmers who want to use vector math
>> directly need to install GCC 5.0.  That is one reason I am
>> asking who is the main user of vector math.
>
> That's a serious upgrade for many users.

I view it quite opposite.  No one I know of installs a new GLIBC
unless it comes from the system.  But they may install a new
GCC.   It is much easier to install a new GCC than a new GLIBC.

> While backporting libmvec.so is pretty simple, and allows
> the feature to be present for immediate use directly by a
> developer, and later to be used when they are ready to
> upgrade to gcc 5.0.

Same backport goes for GCC.

> I think this choice may actually be larger than just Intel.
>
> For example IBM, and particularly their Power vector math
> functions were explained to me as being callable directly
> by developers. Thus Power might want libmvec.so in glibc?

Does Power have the same API as x86?  If not, how will they
be used by programmers?

Again, we need to decide

1. Who is the main user.
2. How it is used by the main user.
3. What is the impact on the programmers.

If we put it in GLIBC, we should have an API with a generic
implementation, and each target can have an optimized implementation.
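
One way to read "an API with a generic implementation" is as a portable C fallback that applies the scalar routine lane by lane. A target could then replace it with tuned SIMD code. This sketch rests on several assumptions. It uses GCC's vector_size extension, which supports element subscripting in recent releases. scalar_op stands in for a libm function. HAVE_TARGET_VEC2 and target_vec2 are hypothetical names. glibc itself would more likely pick per-target files through its sysdeps directories than through a macro.

```c
/* Two-lane double vector via GCC's vector extension.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Stand-in for the scalar libm routine being vectorized.  */
static double scalar_op (double x)
{
  return x * x - 1.0;
}

/* Generic fallback: correct on every target, but no faster than a
   scalar loop.  */
static v2df generic_vec2 (v2df x)
{
  v2df r = { 0.0, 0.0 };
  for (int i = 0; i < 2; i++)
    r[i] = scalar_op (x[i]);
  return r;
}

#ifdef HAVE_TARGET_VEC2
/* A target supplies a tuned version under the same interface.  */
v2df target_vec2 (v2df x);
# define vec2 target_vec2
#else
# define vec2 generic_vec2
#endif
```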

> I still believe that developers will want to call these
> by hand if possible for code that is hand-optimized.

One can do that, regardless whether vector math is in
GCC or GLIBC.

> That IMO calls out for this to be libmvec.so in glibc.
>
> Not to mention that gcc has no math testing infrastructure
> and glibc does. We can immediately put all of this test
> infrastructure to play for libmvec.so.

Vector math can have a testsuite, like other run-time libraries.


-- 
H.J.


* Re: [RFC] How to add vector math functions to Glibc
  2014-09-26 16:08                           ` H.J. Lu
@ 2014-09-26 17:55                             ` Carlos O'Donell
  2014-09-26 18:06                               ` H.J. Lu
  2014-09-30 16:17                               ` Andrew Pinski
  0 siblings, 2 replies; 67+ messages in thread
From: Carlos O'Donell @ 2014-09-26 17:55 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Andrew Senkevich, Joseph S. Myers, libc-alpha

On 09/26/2014 12:08 PM, H.J. Lu wrote:
>> I think this choice may actually be larger than just Intel.
>>
>> For example IBM, and particularly their Power vector math
>> functions were explained to me as being callable directly
>> by developers. Thus Power might want libmvec.so in glibc?
> 
> Does Power have the same API as x86?  If not, how will they
> be used by programmers?

Power does not have the same API.

I expect that David Edelsohn was talking about these:
http://pic.dhe.ibm.com/infocenter/compbg/v121v141/index.jsp?topic=%2Fcom.ibm.xlcpp121.bg.doc%2Fproguide%2Fvector.html

Though I haven't verified.

> Again, we need to decide
> 
> 1. Who is the main user.

Normal developers.

> 2. How it is used by the main user.

They call those functions.

> 3. What is the impact on the programmers.

If the functions are in glibc, we can deploy them independently
of the compiler.

> If we put it in GLIBC, we should have an API with a generic
> implementation, and each target can have an optimized implementation.

I disagree.

Each target will likely have two APIs:

(a) The legacy API supported for compatibility with existing
    applications following the existing published APIs.
    e.g. IBM and Intel vector functions.

(b) A generic GNU implementation that all targets can have.

We aren't even talking about (b) yet.

Cheers,
Carlos.


* Re: [RFC] How to add vector math functions to Glibc
  2014-09-26 17:55                             ` Carlos O'Donell
@ 2014-09-26 18:06                               ` H.J. Lu
  2014-09-30 16:17                               ` Andrew Pinski
  1 sibling, 0 replies; 67+ messages in thread
From: H.J. Lu @ 2014-09-26 18:06 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Andrew Senkevich, Joseph S. Myers, libc-alpha

On Fri, Sep 26, 2014 at 10:55 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 09/26/2014 12:08 PM, H.J. Lu wrote:
>>> I think this choice may actually be larger than just Intel.
>>>
>>> For example IBM, and particularly their Power vector math
>>> functions were explained to me as being callable directly
>>> by developers. Thus Power might want libmvec.so in glibc?
>>
>> Does Power have the same API as x86?  If not, how will they
>> be used by programmers?
>
> Power does not have the same API.
>
> I expect that David Edelsohn was talking about these:
> http://pic.dhe.ibm.com/infocenter/compbg/v121v141/index.jsp?topic=%2Fcom.ibm.xlcpp121.bg.doc%2Fproguide%2Fvector.html
>
> Though I haven't verified.
>
>> Again, we need to decide
>>
>> 1. Who is the main user.
>
> Normal developers.
>

This isn't the same as the goal list at

https://sourceware.org/glibc/wiki/libm#Addition_of_x86_64_vector_math_functions_to_Glibc

which says:

3.1. Goal

The main goal is to utilize the SIMD constructs of OpenMP 4.0 (#2.8 in
http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf) and Cilk Plus (#6-8
in http://www.cilkplus.org/sites/default/files/open_specifications/Intel_Cilk_plus_lang_spec_1.2.htm)
on x86_64 by adding vector implementations of several math
functions (float and double versions).

which is pretty much targeting the compiler.


-- 
H.J.


* Re: [RFC] How to add vector math functions to Glibc
  2014-09-26 14:15                         ` Carlos O'Donell
@ 2014-09-30 15:00                           ` Andrew Senkevich
  2014-09-30 15:44                             ` Andreas Schwab
  2014-09-30 16:35                             ` Joseph S. Myers
  0 siblings, 2 replies; 67+ messages in thread
From: Andrew Senkevich @ 2014-09-30 15:00 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Joseph S. Myers, libc-alpha

2014-09-26 18:15 GMT+04:00 Carlos O'Donell <carlos@redhat.com>:
> On 09/26/2014 09:45 AM, Andrew Senkevich wrote:
>> So let's discuss Glibc build changes.
>> The build of libmvec (and hence the libm.so installation) needs to be
>> architecture dependent and optional, and some changes were already
>> discussed in https://sourceware.org/ml/libc-alpha/2014-09/msg00578.html.
>> Is it also OK to have a configure option --enable-mathvec with
>> default=no, and default=yes for the x86_64 build?
>
> Under what circumstances would a non-x86_64 target build with
> --enable-mathvec?
>
> When they have their own API/ABI standard to implement and provide
> in libmvec.so?
>
> What's wrong with simply producing a libmvec.so that has no public
> symbols?
>
> It simplifies everything to just always ship libmvec.so, even if
> it's empty.

Based on the previous discussion, we now have the following changes:

diff --git a/configure b/configure
index 89566c5..5456c43 100755
--- a/configure
+++ b/configure
@@ -4521,7 +4521,7 @@ $as_echo_n "checking version of $AS... " >&6; }
   ac_prog_version=`$AS --version 2>&1 | sed -n 's/^.*GNU assembler.* \([0-9]*\.[0-9.]*\).*$/\1/p'`
   case $ac_prog_version in
     '') ac_prog_version="v. ?.??, bad"; ac_verc_fail=yes;;
-    2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*)
+    2.1[0-9][0-9]*|2.[2-9][2-9]*|[3-9].*|[1-9][0-9]*)
        ac_prog_version="$ac_prog_version, ok"; ac_verc_fail=no;;
     *) ac_prog_version="$ac_prog_version, bad"; ac_verc_fail=yes;;

diff --git a/configure.ac b/configure.ac
index 82d0896..c5c1758 100644
--- a/configure.ac
+++ b/configure.ac
@@ -903,7 +903,7 @@ LIBC_PROG_BINUTILS
 # Accept binutils 2.20 or newer.
 AC_CHECK_PROG_VER(AS, $AS, --version,
   [GNU assembler.* \([0-9]*\.[0-9.]*\)],
-  [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], AS=: critic_missing="$critic_missing as")
+  [2.1[0-9][0-9]*|2.[2-9][2-9]*|[3-9].*|[1-9][0-9]*], AS=: critic_missing="$critic_missing as")
 AC_CHECK_PROG_VER(LD, $LD, --version,
   [GNU ld.* \([0-9][0-9]*\.[0-9.]*\)],
   [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], LD=: critic_missing="$critic_missing ld")
diff --git a/Makeconfig b/Makeconfig
index 24a3b82..65136d9 100644
--- a/Makeconfig
+++ b/Makeconfig
@@ -1018,7 +1018,7 @@ all-subdirs = csu assert ctype locale intl catgets math setjmp signal    \
       stdlib stdio-common libio malloc string wcsmbs time dirent    \
       grp pwd posix io termios resource misc socket sysvipc gmon    \
       gnulib iconv iconvdata wctype manual shadow gshadow po argp   \
-      crypt localedata timezone rt conform debug    \
+      crypt localedata timezone rt conform debug mathvec    \
       $(add-on-subdirs) dlfcn elf

 ifndef avoid-generated
diff --git a/bits/math-vector.h b/bits/math-vector.h
new file mode 100644
index 0000000..1c1c7ba
--- /dev/null
+++ b/bits/math-vector.h
@@ -0,0 +1,22 @@
+/* Platform-specific SIMD declarations of math functions.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly; \
+ include <math.h> instead."
+#endif
diff --git a/math/Makefile b/math/Makefile
index 866bc0f..1941b62 100644
--- a/math/Makefile
+++ b/math/Makefile
@@ -26,7 +26,7 @@ headers := math.h bits/mathcalls.h bits/mathinline.h bits/huge_val.h \
    bits/huge_valf.h bits/huge_vall.h bits/inf.h bits/nan.h \
    fpu_control.h complex.h bits/cmathcalls.h fenv.h \
    bits/fenv.h bits/fenvinline.h bits/mathdef.h tgmath.h \
-   bits/math-finite.h
+   bits/math-finite.h bits/math-vector.h

 # FPU support code.
 aux := setfpucw fpu_control
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index 8a94a7e..2d31a11 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -60,6 +60,15 @@ __MATHCALL (atan,, (_Mdouble_ __x));
 __MATHCALL (atan2,, (_Mdouble_ __y, _Mdouble_ __x));

 /* Cosine of X.  */
+#if !defined _Mfloat_ && !defined _Mlong_double_ && defined __DECL_SIMD_cos
+__DECL_SIMD_cos
+#endif
+#if defined _Mfloat_ && !defined _Mlong_double_ && defined __DECL_SIMD_cosf
+__DECL_SIMD_cosf
+#endif
+#if defined _Mlong_double_ && defined __DECL_SIMD_cosl
+__DECL_SIMD_cosl
+#endif
 __MATHCALL (cos,, (_Mdouble_ __x));
 /* Sine of X.  */
 __MATHCALL (sin,, (_Mdouble_ __x));
diff --git a/math/math.h b/math/math.h
index 72ec2ca..32a7bec 100644
--- a/math/math.h
+++ b/math/math.h
@@ -27,6 +27,9 @@

 __BEGIN_DECLS

+/* Get machine-dependent vector math functions declarations */
+#include <bits/math-vector.h>
+
 /* Get machine-dependent HUGE_VAL value (returned on overflow).
    On all IEEE754 machines, this is +Infinity.  */
 #include <bits/huge_val.h>
diff --git a/mathvec/Makefile b/mathvec/Makefile
new file mode 100644
index 0000000..8aa4937
--- /dev/null
+++ b/mathvec/Makefile
@@ -0,0 +1,45 @@
+# Copyright (C) 2014 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+
+# Makefile for the vector math library.
+
+subdir := mathvec
+
+include ../Makeconfig
+
+extra-libs := libmvec
+extra-libs-others = $(extra-libs)
+
+libmvec.so-no-z-defs = yes
+libmvec-routines = $(strip $(libmvec-support))
+
+# We need to install libm.so as linker script
+# for more comfortable use of vector math library.
+subdir_install: $(inst_libdir)/libm.so
+
+$(inst_libdir)/libm.so: $(common-objpfx)format.lds \
+ $(common-objpfx)math/libm.so$(libm.so-version) \
+ $(common-objpfx)mathvec/libmvec.so$(libmvec.so-version) \
+ $(+force)
+ (echo '/* GNU ld script */';\
+ cat $<; \
+ echo 'GROUP ( $(slibdir)/libm.so$(libm.so-version) ' \
+ 'AS_NEEDED ( $(slibdir)/libmvec.so$(libmvec.so-version) ) )' \
+ ) > $@.new
+ mv -f $@.new $@
+
+include ../Rules
diff --git a/sysdeps/unix/sysv/linux/shlib-versions b/sysdeps/unix/sysv/linux/shlib-versions
index 9160557..4a32c8a 100644
--- a/sysdeps/unix/sysv/linux/shlib-versions
+++ b/sysdeps/unix/sysv/linux/shlib-versions
@@ -1,2 +1,3 @@
 libm=6
 libc=6
+libmvec=1
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
new file mode 100644
index 0000000..375c176
--- /dev/null
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -0,0 +1,45 @@
+/* Platform-specific SIMD declarations of math functions.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly; \
+ include <math.h> instead."
+#endif
+
+#if defined __x86_64__ && defined __FAST_MATH__
+# define __DECL_SIMD_AVX2
+# define __DECL_SIMD_SSE4
+
+# if defined _OPENMP && _OPENMP >= 201307
+/* OpenMP case. */
+#  undef __DECL_SIMD_AVX2
+#  undef __DECL_SIMD_SSE4
+#  define __DECL_SIMD_AVX2 _Pragma("omp declare simd notinbranch")
+#  define __DECL_SIMD_SSE4 _Pragma("omp declare simd notinbranch")
+# elif defined _CILKPLUS && _CILKPLUS >= 0
+/* CilkPlus case. TODO _CILKPLUS currently nowhere defined */
+#  undef __DECL_SIMD_AVX2
+#  undef __DECL_SIMD_SSE4
+#  define __DECL_SIMD_AVX2 __attribute__((vector (nomask)))
+#  define __DECL_SIMD_SSE4 __attribute__((vector (processor(core_i7_sse4_2), \
+  nomask)))
+# endif
+
+# define __DECL_SIMD_cos  __DECL_SIMD_AVX2
+# define __DECL_SIMD_cosf __DECL_SIMD_SSE4
+#endif
diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
new file mode 100644
index 0000000..588f2f8
--- /dev/null
+++ b/sysdeps/x86_64/fpu/Makefile
@@ -0,0 +1,3 @@
+ifeq ($(subdir),mathvec)
+libmvec-support += svml_d_cos4_core svml_d_cos_data
+endif
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
new file mode 100644
index 0000000..3d433d2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/Versions
@@ -0,0 +1,5 @@
+libmvec {
+  GLIBC_2.21 {
+    _ZGVdN4v_cos;
+  }
+}
diff --git a/sysdeps/x86_64/fpu/svml_d_cos4_core.S b/sysdeps/x86_64/fpu/svml_d_cos4_core.S
new file mode 100644
index 0000000..7316d2b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cos4_core.S
@@ -0,0 +1,185 @@
+/* Function cos vectorized with AVX2.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+ .text
+ENTRY(_ZGVdN4v_cos)
+
+/* ALGORITHM DESCRIPTION:
+ *
+ *    ( low accuracy ( < 4ulp ) or enhanced performance ( half of correct mantissa ) implementation )
+ *
+ *    Argument representation:
+ *    arg + Pi/2 = (N*Pi + R)
+ *
+ *    Result calculation:
+ *    cos(arg) = sin(arg+Pi/2) = sin(N*Pi + R) = (-1)^N * sin(R)
+ *    sin(R) is approximated by corresponding polynomial
+ */
+        pushq     %rbp
+        movq      %rsp, %rbp
+        andq      $-64, %rsp
+        subq      $448, %rsp
+        movq      __gnu_svml_dcos_data@GOTPCREL(%rip), %rax
+        vmovapd   %ymm0, %ymm1
+        vmovupd   192(%rax), %ymm4
+        vmovupd   256(%rax), %ymm5
+
+/* ARGUMENT RANGE REDUCTION:
+ * Add Pi/2 to argument: X' = X+Pi/2
+ */
+        vaddpd    128(%rax), %ymm1, %ymm7
+
+/* Get absolute argument value: X' = |X'| */
+        vandpd    (%rax), %ymm7, %ymm2
+
+/* Y = X'*InvPi + RS : right shifter add */
+        vfmadd213pd %ymm5, %ymm4, %ymm7
+        vmovupd   1216(%rax), %ymm4
+
+/* Check for large arguments path */
+        vcmpnle_uqpd 64(%rax), %ymm2, %ymm3
+
+/* N = Y - RS : right shifter sub */
+        vsubpd    %ymm5, %ymm7, %ymm6
+        vmovupd   640(%rax), %ymm2
+
+/* SignRes = Y<<63 : shift LSB to MSB place for result sign */
+        vpsllq    $63, %ymm7, %ymm7
+
+/* N = N - 0.5 */
+        vsubpd    320(%rax), %ymm6, %ymm0
+        vmovmskpd %ymm3, %ecx
+
+/* R = X - N*Pi1 */
+        vmovapd   %ymm1, %ymm3
+        vfnmadd231pd %ymm0, %ymm2, %ymm3
+
+/* R = R - N*Pi2 */
+        vfnmadd231pd 704(%rax), %ymm0, %ymm3
+
+/* R = R - N*Pi3 */
+        vfnmadd132pd 768(%rax), %ymm3, %ymm0
+
+/* POLYNOMIAL APPROXIMATION:
+ * R2 = R*R
+ */
+        vmulpd    %ymm0, %ymm0, %ymm5
+        vfmadd213pd 1152(%rax), %ymm5, %ymm4
+        vfmadd213pd 1088(%rax), %ymm5, %ymm4
+        vfmadd213pd 1024(%rax), %ymm5, %ymm4
+
+/* Poly = C3+R2*(C4+R2*(C5+R2*(C6+R2*C7))) */
+        vfmadd213pd 960(%rax), %ymm5, %ymm4
+        vfmadd213pd 896(%rax), %ymm5, %ymm4
+        vfmadd213pd 832(%rax), %ymm5, %ymm4
+        vmulpd    %ymm5, %ymm4, %ymm6
+        vfmadd213pd %ymm0, %ymm0, %ymm6
+
+/* RECONSTRUCTION:
+ * Final sign setting: Res = Poly^SignRes
+ */
+        vxorpd    %ymm7, %ymm6, %ymm0
+        testl     %ecx, %ecx
+        jne       _LBL_1_3
+
+_LBL_1_2:
+        movq      %rbp, %rsp
+        popq      %rbp
+        ret
+
+_LBL_1_3:
+        vmovupd   %ymm1, 320(%rsp)
+        vmovupd   %ymm0, 384(%rsp)
+        je        _LBL_1_2
+
+        xorb      %dl, %dl
+        xorl      %eax, %eax
+        vmovups   %ymm8, 224(%rsp)
+        vmovups   %ymm9, 192(%rsp)
+        vmovups   %ymm10, 160(%rsp)
+        vmovups   %ymm11, 128(%rsp)
+        vmovups   %ymm12, 96(%rsp)
+        vmovups   %ymm13, 64(%rsp)
+        vmovups   %ymm14, 32(%rsp)
+        vmovups   %ymm15, (%rsp)
+        movq      %rsi, 264(%rsp)
+        movq      %rdi, 256(%rsp)
+        movq      %r12, 296(%rsp)
+        movb      %dl, %r12b
+        movq      %r13, 288(%rsp)
+        movl      %ecx, %r13d
+        movq      %r14, 280(%rsp)
+        movl      %eax, %r14d
+        movq      %r15, 272(%rsp)
+
+_LBL_1_6:
+        btl       %r14d, %r13d
+        jc        _LBL_1_12
+
+_LBL_1_7:
+        lea       1(%r14), %esi
+        btl       %esi, %r13d
+        jc        _LBL_1_10
+
+_LBL_1_8:
+        incb      %r12b
+        addl      $2, %r14d
+        cmpb      $16, %r12b
+        jb        _LBL_1_6
+
+        vmovups   224(%rsp), %ymm8
+        vmovups   192(%rsp), %ymm9
+        vmovups   160(%rsp), %ymm10
+        vmovups   128(%rsp), %ymm11
+        vmovups   96(%rsp), %ymm12
+        vmovups   64(%rsp), %ymm13
+        vmovups   32(%rsp), %ymm14
+        vmovups   (%rsp), %ymm15
+        vmovupd   384(%rsp), %ymm0
+        movq      264(%rsp), %rsi
+        movq      256(%rsp), %rdi
+        movq      296(%rsp), %r12
+        movq      288(%rsp), %r13
+        movq      280(%rsp), %r14
+        movq      272(%rsp), %r15
+        jmp       _LBL_1_2
+
+_LBL_1_10:
+        movzbl    %r12b, %r15d
+        shlq      $4, %r15
+        vmovsd    328(%rsp,%r15), %xmm0
+        vzeroupper
+
+        call      cos@PLT
+
+        vmovsd    %xmm0, 392(%rsp,%r15)
+        jmp       _LBL_1_8
+
+_LBL_1_12:
+        movzbl    %r12b, %r15d
+        shlq      $4, %r15
+        vmovsd    320(%rsp,%r15), %xmm0
+        vzeroupper
+
+        call      cos@PLT
+
+        vmovsd    %xmm0, 384(%rsp,%r15)
+        jmp       _LBL_1_7
+END(_ZGVdN4v_cos)
diff --git a/sysdeps/x86_64/fpu/svml_d_cos_data.S b/sysdeps/x86_64/fpu/svml_d_cos_data.S
new file mode 100644
index 0000000..7bb1aba
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cos_data.S
@@ -0,0 +1,426 @@
+/* Data for vectorized cos.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+ .section .rodata, "a"
+
+ .align 64
+ .globl __gnu_svml_dcos_data
+__gnu_svml_dcos_data:
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 0
+ .long 1096810496
+ .long 0
+ .long 1096810496
+ .long 0
+ .long 1096810496
+ .long 0
+ .long 1096810496
+ .long 0
+ .long 1096810496
+ .long 0
+ .long 1096810496
+ .long 0
+ .long 1096810496
+ .long 0
+ .long 1096810496
+ .long 1413754136
+ .long 1073291771
+ .long 1413754136
+ .long 1073291771
+ .long 1413754136
+ .long 1073291771
+ .long 1413754136
+ .long 1073291771
+ .long 1413754136
+ .long 1073291771
+ .long 1413754136
+ .long 1073291771
+ .long 1413754136
+ .long 1073291771
+ .long 1413754136
+ .long 1073291771
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 0
+ .long 1127743488
+ .long 0
+ .long 1127743488
+ .long 0
+ .long 1127743488
+ .long 0
+ .long 1127743488
+ .long 0
+ .long 1127743488
+ .long 0
+ .long 1127743488
+ .long 0
+ .long 1127743488
+ .long 0
+ .long 1127743488
+ .long 0
+ .long 1071644672
+ .long 0
+ .long 1071644672
+ .long 0
+ .long 1071644672
+ .long 0
+ .long 1071644672
+ .long 0
+ .long 1071644672
+ .long 0
+ .long 1071644672
+ .long 0
+ .long 1071644672
+ .long 0
+ .long 1071644672
+ .long 1073741824
+ .long 1074340347
+ .long 1073741824
+ .long 1074340347
+ .long 1073741824
+ .long 1074340347
+ .long 1073741824
+ .long 1074340347
+ .long 1073741824
+ .long 1074340347
+ .long 1073741824
+ .long 1074340347
+ .long 1073741824
+ .long 1074340347
+ .long 1073741824
+ .long 1074340347
+ .long 0
+ .long 1048855597
+ .long 0
+ .long 1048855597
+ .long 0
+ .long 1048855597
+ .long 0
+ .long 1048855597
+ .long 0
+ .long 1048855597
+ .long 0
+ .long 1048855597
+ .long 0
+ .long 1048855597
+ .long 0
+ .long 1048855597
+ .long 2147483648
+ .long 1023952536
+ .long 2147483648
+ .long 1023952536
+ .long 2147483648
+ .long 1023952536
+ .long 2147483648
+ .long 1023952536
+ .long 2147483648
+ .long 1023952536
+ .long 2147483648
+ .long 1023952536
+ .long 2147483648
+ .long 1023952536
+ .long 2147483648
+ .long 1023952536
+ .long 1880851354
+ .long 998820945
+ .long 1880851354
+ .long 998820945
+ .long 1880851354
+ .long 998820945
+ .long 1880851354
+ .long 998820945
+ .long 1880851354
+ .long 998820945
+ .long 1880851354
+ .long 998820945
+ .long 1880851354
+ .long 998820945
+ .long 1880851354
+ .long 998820945
+ .long 1413754136
+ .long 1074340347
+ .long 1413754136
+ .long 1074340347
+ .long 1413754136
+ .long 1074340347
+ .long 1413754136
+ .long 1074340347
+ .long 1413754136
+ .long 1074340347
+ .long 1413754136
+ .long 1074340347
+ .long 1413754136
+ .long 1074340347
+ .long 1413754136
+ .long 1074340347
+ .long 856972294
+ .long 1017226790
+ .long 856972294
+ .long 1017226790
+ .long 856972294
+ .long 1017226790
+ .long 856972294
+ .long 1017226790
+ .long 856972294
+ .long 1017226790
+ .long 856972294
+ .long 1017226790
+ .long 856972294
+ .long 1017226790
+ .long 856972294
+ .long 1017226790
+ .long 688016905
+ .long 962338001
+ .long 688016905
+ .long 962338001
+ .long 688016905
+ .long 962338001
+ .long 688016905
+ .long 962338001
+ .long 688016905
+ .long 962338001
+ .long 688016905
+ .long 962338001
+ .long 688016905
+ .long 962338001
+ .long 688016905
+ .long 962338001
+ .long 1431655591
+ .long 3217380693
+ .long 1431655591
+ .long 3217380693
+ .long 1431655591
+ .long 3217380693
+ .long 1431655591
+ .long 3217380693
+ .long 1431655591
+ .long 3217380693
+ .long 1431655591
+ .long 3217380693
+ .long 1431655591
+ .long 3217380693
+ .long 1431655591
+ .long 3217380693
+ .long 286303400
+ .long 1065423121
+ .long 286303400
+ .long 1065423121
+ .long 286303400
+ .long 1065423121
+ .long 286303400
+ .long 1065423121
+ .long 286303400
+ .long 1065423121
+ .long 286303400
+ .long 1065423121
+ .long 286303400
+ .long 1065423121
+ .long 286303400
+ .long 1065423121
+ .long 430291053
+ .long 3207201184
+ .long 430291053
+ .long 3207201184
+ .long 430291053
+ .long 3207201184
+ .long 430291053
+ .long 3207201184
+ .long 430291053
+ .long 3207201184
+ .long 430291053
+ .long 3207201184
+ .long 430291053
+ .long 3207201184
+ .long 430291053
+ .long 3207201184
+ .long 2150694560
+ .long 1053236707
+ .long 2150694560
+ .long 1053236707
+ .long 2150694560
+ .long 1053236707
+ .long 2150694560
+ .long 1053236707
+ .long 2150694560
+ .long 1053236707
+ .long 2150694560
+ .long 1053236707
+ .long 2150694560
+ .long 1053236707
+ .long 2150694560
+ .long 1053236707
+ .long 1174413873
+ .long 3193628213
+ .long 1174413873
+ .long 3193628213
+ .long 1174413873
+ .long 3193628213
+ .long 1174413873
+ .long 3193628213
+ .long 1174413873
+ .long 3193628213
+ .long 1174413873
+ .long 3193628213
+ .long 1174413873
+ .long 3193628213
+ .long 1174413873
+ .long 3193628213
+ .long 1470296608
+ .long 1038487144
+ .long 1470296608
+ .long 1038487144
+ .long 1470296608
+ .long 1038487144
+ .long 1470296608
+ .long 1038487144
+ .long 1470296608
+ .long 1038487144
+ .long 1470296608
+ .long 1038487144
+ .long 1470296608
+ .long 1038487144
+ .long 1470296608
+ .long 1038487144
+ .long 135375560
+ .long 3177836758
+ .long 135375560
+ .long 3177836758
+ .long 135375560
+ .long 3177836758
+ .long 135375560
+ .long 3177836758
+ .long 135375560
+ .long 3177836758
+ .long 135375560
+ .long 3177836758
+ .long 135375560
+ .long 3177836758
+ .long 135375560
+ .long 3177836758
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 4294967295
+ .long 2147483647
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 1841940611
+ .long 1070882608
+ .long 0
+ .long 1127219200
+ .long 0
+ .long 1127219200
+ .long 0
+ .long 1127219200
+ .long 0
+ .long 1127219200
+ .long 0
+ .long 1127219200
+ .long 0
+ .long 1127219200
+ .long 0
+ .long 1127219200
+ .long 0
+ .long 1127219200
+ .long 4294967295
+ .long 1127219199
+ .long 4294967295
+ .long 1127219199
+ .long 4294967295
+ .long 1127219199
+ .long 4294967295
+ .long 1127219199
+ .long 4294967295
+ .long 1127219199
+ .long 4294967295
+ .long 1127219199
+ .long 4294967295
+ .long 1127219199
+ .long 4294967295
+ .long 1127219199
+ .long 8388606
+ .long 1127219200
+ .long 8388606
+ .long 1127219200
+ .long 8388606
+ .long 1127219200
+ .long 8388606
+ .long 1127219200
+ .long 8388606
+ .long 1127219200
+ .long 8388606
+ .long 1127219200
+ .long 8388606
+ .long 1127219200
+ .long 8388606
+ .long 1127219200
+ .type __gnu_svml_dcos_data,@object
+ .size __gnu_svml_dcos_data,1600


--
WBR,
Andrew


* Re: [RFC] How to add vector math functions to Glibc
  2014-09-30 15:00                           ` Andrew Senkevich
@ 2014-09-30 15:44                             ` Andreas Schwab
  2014-09-30 15:53                               ` Andrew Senkevich
  2014-09-30 16:35                             ` Joseph S. Myers
  1 sibling, 1 reply; 67+ messages in thread
From: Andreas Schwab @ 2014-09-30 15:44 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: Carlos O'Donell, Joseph S. Myers, libc-alpha

Andrew Senkevich <andrew.n.senkevich@gmail.com> writes:

> diff --git a/configure.ac b/configure.ac
> index 82d0896..c5c1758 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -903,7 +903,7 @@ LIBC_PROG_BINUTILS
>  # Accept binutils 2.20 or newer.
>  AC_CHECK_PROG_VER(AS, $AS, --version,
>    [GNU assembler.* \([0-9]*\.[0-9.]*\)],
> -  [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], AS=: critic_missing="$critic_missing as")
> +  [2.1[0-9][0-9]*|2.[2-9][2-9]*|[3-9].*|[1-9][0-9]*], AS=: critic_missing="$critic_missing as")

What are you trying to do here?  That doesn't look correct.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


* Re: [RFC] How to add vector math functions to Glibc
  2014-09-30 15:44                             ` Andreas Schwab
@ 2014-09-30 15:53                               ` Andrew Senkevich
  2014-09-30 16:16                                 ` Andreas Schwab
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Senkevich @ 2014-09-30 15:53 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Carlos O'Donell, Joseph S. Myers, libc-alpha

2014-09-30 19:44 GMT+04:00 Andreas Schwab <schwab@suse.de>:
> Andrew Senkevich <andrew.n.senkevich@gmail.com> writes:
>
>> diff --git a/configure.ac b/configure.ac
>> index 82d0896..c5c1758 100644
>> --- a/configure.ac
>> +++ b/configure.ac
>> @@ -903,7 +903,7 @@ LIBC_PROG_BINUTILS
>>  # Accept binutils 2.20 or newer.
>>  AC_CHECK_PROG_VER(AS, $AS, --version,
>>    [GNU assembler.* \([0-9]*\.[0-9.]*\)],
>> -  [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], AS=: critic_missing="$critic_missing as")
>> +  [2.1[0-9][0-9]*|2.[2-9][2-9]*|[3-9].*|[1-9][0-9]*], AS=: critic_missing="$critic_missing as")
>
> What are you trying to do here?  That doesn't look correct.

It updates the minimum required binutils version to 2.22.
And of course the comment needs to be updated as well...


--
WBR,
Andrew


* Re: [RFC] How to add vector math functions to Glibc
  2014-09-30 15:53                               ` Andrew Senkevich
@ 2014-09-30 16:16                                 ` Andreas Schwab
  2014-09-30 16:30                                   ` Andrew Senkevich
  0 siblings, 1 reply; 67+ messages in thread
From: Andreas Schwab @ 2014-09-30 16:16 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: Carlos O'Donell, Joseph S. Myers, libc-alpha

Andrew Senkevich <andrew.n.senkevich@gmail.com> writes:

> 2014-09-30 19:44 GMT+04:00 Andreas Schwab <schwab@suse.de>:
>> Andrew Senkevich <andrew.n.senkevich@gmail.com> writes:
>>
>>> diff --git a/configure.ac b/configure.ac
>>> index 82d0896..c5c1758 100644
>>> --- a/configure.ac
>>> +++ b/configure.ac
>>> @@ -903,7 +903,7 @@ LIBC_PROG_BINUTILS
>>>  # Accept binutils 2.20 or newer.
>>>  AC_CHECK_PROG_VER(AS, $AS, --version,
>>>    [GNU assembler.* \([0-9]*\.[0-9.]*\)],
>>> -  [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], AS=: critic_missing="$critic_missing as")
>>> +  [2.1[0-9][0-9]*|2.[2-9][2-9]*|[3-9].*|[1-9][0-9]*], AS=: critic_missing="$critic_missing as")
>>
>> What are you trying to do here?  That doesn't look correct.
>
> It is an update of the minimum required binutils version to 2.22.

Along with excluding 2.30, 2.31, 2.40, 2.41 ...

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-26 17:55                             ` Carlos O'Donell
  2014-09-26 18:06                               ` H.J. Lu
@ 2014-09-30 16:17                               ` Andrew Pinski
  1 sibling, 0 replies; 67+ messages in thread
From: Andrew Pinski @ 2014-09-30 16:17 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: H.J. Lu, Andrew Senkevich, Joseph S. Myers, libc-alpha

On Fri, Sep 26, 2014 at 10:55 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 09/26/2014 12:08 PM, H.J. Lu wrote:
>>> I think this choice may actually be larger than just Intel.
>>>
>>> For example IBM, and particularly their Power vector math
>>> functions were explained to me as being callable directly
>>> by developers. Thus Power might want libmvec.so in glibc?
>>
>> Does Power have the same API as x86?  If not, how will they
>> be used by programmers?
>
> Power does not have the same API.

They do have a similar API; at least the Cell does:
https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/6BFB9899CEA5456800257360001938B3/$file/SIMD_Library_Specification_for_CBEA_1.2.pdf
(I Helped write this spec when I was at Sony).

Which looks like it is also at
http://pic.dhe.ibm.com/infocenter/compbg/v121v141/index.jsp?topic=%2Fcom.ibm.xlcpp121.bg.doc%2Fproguide%2Fmass_simd.html
.


Thanks,
Andrew Pinski

>
> I expect that David Edelsohn was talking about these:
> http://pic.dhe.ibm.com/infocenter/compbg/v121v141/index.jsp?topic=%2Fcom.ibm.xlcpp121.bg.doc%2Fproguide%2Fvector.html




>
> Though I haven't verified.
>
>> Again, we need to decide
>>
>> 1. Who is the main user.
>
> Normal developers.
>
>> 2. How it is used by the main user.
>
> They call those functions.
>
>> 3. What is the impact on the programmers.
>
> If the functions are in glibc, we can deploy them independent
> of compiler.
>
>> If we put it in GLIBC, we should have a API with a generic
>> implementation and each target can have optimized implementation.
>
> I disagree.
>
> Each target will likely have two APIs:
>
> (a) The legacy API supported for compatibility with existing
>     applications following the existing published APIs.
>     e.g. IBM and Intel vector functions.
>
> (b) A generic GNU implementation that all targets can have.
>
> We aren't even talking about (b) yet.
>
> Cheers,
> Carlos.
>

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-30 16:16                                 ` Andreas Schwab
@ 2014-09-30 16:30                                   ` Andrew Senkevich
  0 siblings, 0 replies; 67+ messages in thread
From: Andrew Senkevich @ 2014-09-30 16:30 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Carlos O'Donell, Joseph S. Myers, libc-alpha

2014-09-30 20:16 GMT+04:00 Andreas Schwab <schwab@suse.de>:
> Andrew Senkevich <andrew.n.senkevich@gmail.com> writes:
>
>> 2014-09-30 19:44 GMT+04:00 Andreas Schwab <schwab@suse.de>:
>>> Andrew Senkevich <andrew.n.senkevich@gmail.com> writes:
>>>
>>>> diff --git a/configure.ac b/configure.ac
>>>> index 82d0896..c5c1758 100644
>>>> --- a/configure.ac
>>>> +++ b/configure.ac
>>>> @@ -903,7 +903,7 @@ LIBC_PROG_BINUTILS
>>>>  # Accept binutils 2.20 or newer.
>>>>  AC_CHECK_PROG_VER(AS, $AS, --version,
>>>>    [GNU assembler.* \([0-9]*\.[0-9.]*\)],
>>>> -  [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], AS=: critic_missing="$critic_missing as")
>>>> +  [2.1[0-9][0-9]*|2.[2-9][2-9]*|[3-9].*|[1-9][0-9]*], AS=: critic_missing="$critic_missing as")
>>>
>>> What are you trying to do here?  That doesn't look correct.
>>
>> It is an update of the minimum required binutils version to 2.22.
>
> Along with excluding 2.30, 2.31, 2.40, 2.41 ...

Yes, thank you; it needs to be fixed:

-  [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], AS=:
+  [2.1[0-9][0-9]*|2.2[2-9]*|2.[3-9][0-9]*|[3-9].*|[1-9][0-9]*], AS=:

And the same check needs to be added for the ld version...
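The effect of the two patterns can be checked outside configure. The sketch below is a hypothetical test harness, not part of the patch: it splits each case pattern at its "|" alternatives and matches them with POSIX fnmatch(), which approximates how the shell evaluates a case pattern. With the first revision, 2.30 and 2.41 are rejected, as Andreas pointed out; with the corrected pattern they are accepted while 2.20 and 2.21 are still refused.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Alternatives of the first-revision AS pattern, split at "|" because
   fnmatch() handles one glob at a time.  */
static const char *const old_as_pattern[] = {
  "2.1[0-9][0-9]*", "2.[2-9][2-9]*", "[3-9].*", "[1-9][0-9]*", NULL
};

/* Alternatives of the corrected pattern.  */
static const char *const new_as_pattern[] = {
  "2.1[0-9][0-9]*", "2.2[2-9]*", "2.[3-9][0-9]*", "[3-9].*",
  "[1-9][0-9]*", NULL
};

/* Return true if VERSION matches any alternative, as "case" would.  */
static bool
version_accepted (const char *const *alts, const char *version)
{
  for (size_t i = 0; alts[i] != NULL; i++)
    if (fnmatch (alts[i], version, 0) == 0)
      return true;
  return false;
}
```

For example, version_accepted (old_as_pattern, "2.30") is false, while version_accepted (new_as_pattern, "2.30") is true.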


--
WBR,
Andrew

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-30 15:00                           ` Andrew Senkevich
  2014-09-30 15:44                             ` Andreas Schwab
@ 2014-09-30 16:35                             ` Joseph S. Myers
  2014-09-30 18:40                               ` Christoph Lauter
                                                 ` (2 more replies)
  1 sibling, 3 replies; 67+ messages in thread
From: Joseph S. Myers @ 2014-09-30 16:35 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: Carlos O'Donell, libc-alpha

On Tue, 30 Sep 2014, Andrew Senkevich wrote:

> diff --git a/configure.ac b/configure.ac
> index 82d0896..c5c1758 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -903,7 +903,7 @@ LIBC_PROG_BINUTILS
>  # Accept binutils 2.20 or newer.
>  AC_CHECK_PROG_VER(AS, $AS, --version,
>    [GNU assembler.* \([0-9]*\.[0-9.]*\)],
> -  [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], AS=: critic_missing="$critic_missing as")
> +  [2.1[0-9][0-9]*|2.[2-9][2-9]*|[3-9].*|[1-9][0-9]*], AS=: critic_missing="$critic_missing as")
>  AC_CHECK_PROG_VER(LD, $LD, --version,
>    [GNU ld.* \([0-9][0-9]*\.[0-9.]*\)],
>    [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], LD=: critic_missing="$critic_missing ld")

Any change to required versions needs to include an update to install.texi 
(and the generated INSTALL file).  It should also be proposed in a 
separate thread whose subject describes what is being proposed.

> +# We need to install libm.so as linker script
> +# for more comfortable use of vector math library.
> +subdir_install: $(inst_libdir)/libm.so
> +
> +$(inst_libdir)/libm.so: $(common-objpfx)format.lds \
> + $(common-objpfx)math/libm.so$(libm.so-version) \
> + $(common-objpfx)mathvec/libmvec.so$(libmvec.so-version) \
> + $(+force)
> + (echo '/* GNU ld script */';\
> + cat $<; \
> + echo 'GROUP ( $(slibdir)/libm.so$(libm.so-version) ' \
> + 'AS_NEEDED ( $(slibdir)/libmvec.so$(libmvec.so-version) ) )' \
> + ) > $@.new
> + mv -f $@.new $@

Do you have ordering issues here?  It seems bad for math/ to install a 
direct symlink and then mathvec/ to change it to something else - all 
installation rules for libm should be in the math/ directory.
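For reference, this is roughly the text that the quoted recipe installs as libm.so. The function below is a hypothetical helper that only formats the same content into a buffer; the slibdir and version strings used with it are assumed values, as is the OUTPUT_FORMAT line standing in for format.lds. AS_NEEDED tells GNU ld to record a DT_NEEDED entry for libmvec only when the link actually uses a symbol from it, so programs that call no vector functions do not gain a libmvec dependency.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the linker script the recipe writes: a marker comment, the
   contents of format.lds, then a GROUP line with libmvec AS_NEEDED.
   Returns the snprintf result (length that would have been written).  */
int
format_libm_script (char *buf, size_t size, const char *format_lds,
                    const char *slibdir, const char *libm_ver,
                    const char *libmvec_ver)
{
  return snprintf (buf, size,
                   "/* GNU ld script */\n"
                   "%s"
                   "GROUP ( %s/libm.so%s AS_NEEDED ( %s/libmvec.so%s ) )\n",
                   format_lds, slibdir, libm_ver, slibdir, libmvec_ver);
}
```

With the assumed values "/lib64", ".6" and ".1", the last line of the script reads GROUP ( /lib64/libm.so.6 AS_NEEDED ( /lib64/libmvec.so.1 ) ).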

Do you need to link libmvec against libm (and if so, I'd expect associated 
Makefile rules, and maybe a Depend file to ensure the directories are 
built in the right order)?

Also, I'm not sure the empty libmvec option for unsupported architectures
works when we consider the case where functions require GCC or binutils versions
newer than we wish to require, so they are optional on some architecture.  
I think having libmvec built or not built on that architecture, depending 
on the tools installed, is better than possibly having it built but empty 
if the tools are too old.

> diff --git a/sysdeps/unix/sysv/linux/shlib-versions b/sysdeps/unix/sysv/linux/shlib-versions
> index 9160557..4a32c8a 100644
> --- a/sysdeps/unix/sysv/linux/shlib-versions
> +++ b/sysdeps/unix/sysv/linux/shlib-versions
> @@ -1,2 +1,3 @@
>  libm=6
>  libc=6
> +libmvec=1

There is nothing Linux-specific about this library, so the toplevel 
shlib-versions seems better.

Did the patch pass the testsuite?  If so, you have a problem - you didn't 
add ABI test baselines for this library (in this version, a default empty 
baseline, and one in sysdeps/unix/sysv/linux/x86_64), so the ABI tests 
should have failed, and you need to find out why they didn't run for this 
library, and fix that.  If it failed for lack of ABI test baselines, add 
them.

> +#if defined __x86_64__ && defined __FAST_MATH__
> +# define __DECL_SIMD_AVX2
> +# define __DECL_SIMD_SSE4

I don't see the need for this initial define to empty and subsequent 
#undef.  Except you should probably have comments explaining exactly what 
these macros mean in terms of what function versions they define to be 
available.

> +# if defined _OPENMP && _OPENMP >= 201307
> +/* OpenMP case. */
> +#  undef __DECL_SIMD_AVX2
> +#  undef __DECL_SIMD_SSE4
> +#  define __DECL_SIMD_AVX2 _Pragma("omp declare simd notinbranch")
> +#  define __DECL_SIMD_SSE4 _Pragma("omp declare simd notinbranch")

I think there should be a comment pointing to the ABI/API documentation 
that says what function versions this pragma defines to be available and 
guaranteeing that it will not be redefined to e.g. say that AVX512 is 
available so that existing headers will work with future compilers (but 
another pragma will be needed if in future AVX512 versions are added).

> +# elif defined _CILKPLUS && _CILKPLUS >= 0
> +/* CilkPlus case. TODO _CILKPLUS currently nowhere defined */
> +#  undef __DECL_SIMD_AVX2
> +#  undef __DECL_SIMD_SSE4
> +#  define __DECL_SIMD_AVX2 __attribute__((vector (nomask)))
> +#  define __DECL_SIMD_SSE4 __attribute__((vector (processor(core_i7_sse4_2), \
> +  nomask)))

To be namespace-clean, you have to use reserved-namespace versions of 
attributes.  That is, __vector__, __nomask__, __processor__ and 
__core_i7_sse4_2__.
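A minimal sketch of the header logic with the two review comments applied: no define-to-empty followed by #undef, a comment stating what the macro promises, and reserved-namespace attribute spellings in the Cilk Plus branch. Only the AVX2 macro is shown. hypothetical_square is an invented stand-in for cos so the example links without libm or libmvec, and the Cilk Plus spelling is taken from the review rather than checked against a compiler.

```c
#include <assert.h>
#include <stddef.h>

/* __DECL_SIMD_AVX2 marks a declaration for which the library claims to
   provide vector variants; exactly which ISAs that implies has to be
   fixed by the OpenMP / Cilk Plus ABI documentation.  */
#if defined __x86_64__ && defined __FAST_MATH__
# if defined _OPENMP && _OPENMP >= 201307
#  define __DECL_SIMD_AVX2 _Pragma ("omp declare simd notinbranch")
# elif defined _CILKPLUS
/* Reserved-namespace spellings, as suggested in the review.  */
#  define __DECL_SIMD_AVX2 __attribute__ ((__vector__ (__nomask__)))
# endif
#endif

/* Default: no vector variants advertised.  */
#ifndef __DECL_SIMD_AVX2
# define __DECL_SIMD_AVX2
#endif

/* Hypothetical scalar function standing in for cos.  */
__DECL_SIMD_AVX2 double hypothetical_square (double x);

double
hypothetical_square (double x)
{
  return x * x;
}

/* A loop that a vectorizer could turn into calls to a vector variant.  */
void
apply_square (double *out, const double *in, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = hypothetical_square (in[i]);
}
```

In an ordinary build (no -fopenmp, no -ffast-math) the macro expands to nothing and the code is plain scalar C, which is the behaviour the default -fmath-errno case needs.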

> + .align 64
> + .globl __gnu_svml_dcos_data
> +__gnu_svml_dcos_data:
> + .long 4294967295

What are the semantics of the values in this table (please add a comment)?  
How was this table generated?

> + .type __gnu_svml_dcos_data,@object
> + .size __gnu_svml_dcos_data,1600

.size __gnu_svml_dcos_data,.-__gnu_svml_dcos_data

seems better than hardcoding another magic number for the size here.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-30 16:35                             ` Joseph S. Myers
  2014-09-30 18:40                               ` Christoph Lauter
@ 2014-09-30 18:40                               ` Andrew Senkevich
  2014-09-30 20:03                                 ` Joseph S. Myers
  2014-10-01 18:47                               ` Andrew Senkevich
  2 siblings, 1 reply; 67+ messages in thread
From: Andrew Senkevich @ 2014-09-30 18:40 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Carlos O'Donell, libc-alpha

2014-09-30 20:35 GMT+04:00 Joseph S. Myers <joseph@codesourcery.com>:
> On Tue, 30 Sep 2014, Andrew Senkevich wrote:
>
>> diff --git a/configure.ac b/configure.ac
>> index 82d0896..c5c1758 100644
>> --- a/configure.ac
>> +++ b/configure.ac
>> @@ -903,7 +903,7 @@ LIBC_PROG_BINUTILS
>>  # Accept binutils 2.20 or newer.
>>  AC_CHECK_PROG_VER(AS, $AS, --version,
>>    [GNU assembler.* \([0-9]*\.[0-9.]*\)],
>> -  [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], AS=: critic_missing="$critic_missing as")
>> +  [2.1[0-9][0-9]*|2.[2-9][2-9]*|[3-9].*|[1-9][0-9]*], AS=: critic_missing="$critic_missing as")
>>  AC_CHECK_PROG_VER(LD, $LD, --version,
>>    [GNU ld.* \([0-9][0-9]*\.[0-9.]*\)],
>>    [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], LD=: critic_missing="$critic_missing ld")
>
> Any change to required versions needs to include an update to install.texi
> (and the generated INSTALL file).  It should also be proposed in a
> separate thread whose subject describes what is being proposed.

I thought it was already agreed in
https://sourceware.org/ml/libc-alpha/2014-09/msg00586.html
But if a separate thread is required, I can start it.

>> +# We need to install libm.so as linker script
>> +# for more comfortable use of vector math library.
>> +subdir_install: $(inst_libdir)/libm.so
>> +
>> +$(inst_libdir)/libm.so: $(common-objpfx)format.lds \
>> + $(common-objpfx)math/libm.so$(libm.so-version) \
>> + $(common-objpfx)mathvec/libmvec.so$(libmvec.so-version) \
>> + $(+force)
>> + (echo '/* GNU ld script */';\
>> + cat $<; \
>> + echo 'GROUP ( $(slibdir)/libm.so$(libm.so-version) ' \
>> + 'AS_NEEDED ( $(slibdir)/libmvec.so$(libmvec.so-version) ) )' \
>> + ) > $@.new
>> + mv -f $@.new $@
>
> Do you have ordering issues here?  It seems bad for math/ to install a
> direct symlink and then mathvec/ to change it to something else - all
> installation rules for libm should be in the math/ directory.

Of course, it must go in a different Makefile.

> Do you need to link libmvec against libm (and if so, I'd expect associated
> Makefile rules, and maybe a Depend file to ensure the directories are
> built in the right order)?

libmvec contains calls to the scalar versions in libm, but it is not
supposed to be used directly.
Is it OK not to link libmvec against libm in this case?

> Also, I'm not sure the empty libmvec option for unsupported architectures
> works when we consider the case where functions require GCC or binutils versions
> newer than we wish to require, so they are optional on some architecture.
> I think having libmvec built or not built on that architecture, depending
> on the tools installed, is better than possibly having it built but empty
> if the tools are too old.

If the library is empty but the headers are installed, compilation will
fail with the corresponding options.
Is it OK to add a configure option, enabled by default on x86_64 and
disabled on unsupported architectures?

> Did the patch pass the testsuite?  If so, you have a problem - you didn't
> add ABI test baselines for this library (in this version, a default empty
> baseline, and one in sysdeps/unix/sysv/linux/x86_64), so the ABI tests
> should have failed, and you need to find out why they didn't run for this
> library, and fix that.  If it failed for lack of ABI test baselines, add
> them.

The patch didn't pass the testsuite (though I don't mean it as a patch, just as an RFC).
The following will be added:

diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libmvec.abilist
new file mode 100644
index 0000000..1d53a6c
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libmvec.abilist
@@ -0,0 +1,3 @@
+GLIBC_2.21
+ GLIBC_2.21 A
+ _ZGVdN4v_cos F

>> +#if defined __x86_64__ && defined __FAST_MATH__
>> +# define __DECL_SIMD_AVX2
>> +# define __DECL_SIMD_SSE4

> I don't see the need for this initial define to empty and subsequent
> #undef.  Except you should probably have comments explaining exactly what
> these macros mean in terms of what function versions they define to be
> available.

With this scheme, adding a function means adding only one line; otherwise
it would mean adding 2 lines (one each for the OpenMP and Cilk Plus cases).


--
WBR,
Andrew

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-30 16:35                             ` Joseph S. Myers
@ 2014-09-30 18:40                               ` Christoph Lauter
  2014-09-30 20:15                                 ` Joseph S. Myers
  2014-09-30 18:40                               ` Andrew Senkevich
  2014-10-01 18:47                               ` Andrew Senkevich
  2 siblings, 1 reply; 67+ messages in thread
From: Christoph Lauter @ 2014-09-30 18:40 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Andrew Senkevich, Carlos O'Donell, libc-alpha

Hi all,

just 2cts from someone who wrote a couple of libm functions already in
his life:

Joseph S. Myers wrote on 30/09/2014 18:35:

>> +# if defined _OPENMP && _OPENMP >= 201307
>> +/* OpenMP case. */
>> +#  undef __DECL_SIMD_AVX2
>> +#  undef __DECL_SIMD_SSE4
>> +#  define __DECL_SIMD_AVX2 _Pragma("omp declare simd notinbranch")
>> +#  define __DECL_SIMD_SSE4 _Pragma("omp declare simd notinbranch")
>
> I think there should be a comment pointing to the ABI/API documentation
> that says what function versions this pragma defines to be available and
> guaranteeing that it will not be redefined to e.g. say that AVX512 is
> available so that existing headers will work with future compilers (but
> another pragma will be needed if in future AVX512 versions are added).
>

Yeah, the ABI/API is not quite self-documenting with functions declared 
as follows:

Andrew Senkevich wrote on 30/09/2014 17:00:
> +#include <sysdep.h>
> +
> + .text
> +ENTRY(_ZGVdN4v_cos)
> +
> +/* ALGORITHM DESCRIPTION:
> + *
> + *    ( low accuracy ( < 4ulp ) or enhanced performance ( half of correct mantissa ) implementation )
> + *
> + *    Argument representation:
> + *    arg + Pi/2 = (N*Pi + R)
> + *
> + *    Result calculation:
> + *    cos(arg) = sin(arg+Pi/2) = sin(N*Pi + R) = (-1)^N * sin(R)
> + *    sin(R) is approximated by corresponding polynomial
> + */
> +        pushq     %rbp
> +        movq      %rsp, %rbp
> +        andq      $-64, %rsp
> +        subq      $448, %rsp
> +        movq      __gnu_svml_dcos_data@GOTPCREL(%rip), %rax
> +        vmovapd   %ymm0, %ymm1
> +        vmovupd   192(%rax), %ymm4
> +        vmovupd   256(%rax), %ymm5
> +

Of course, there are comments in the code about how the algorithm works 
but the code mainly is assembly with lots of magic numbers everywhere.

Frankly speaking, I have trouble seeing the difference between that code 
and a binary blob. Yes, this last remark is polemic.
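To make the quoted algorithm comment concrete, here is a scalar C sketch of the same reduction: N is the integer nearest to (x + Pi/2)/Pi, R = x + Pi/2 - N*Pi, and cos(x) = (-1)^N * sin(R). It is not the patch's code: it uses a truncated Taylor series instead of the tuned polynomial, keeps Pi in a single double, and therefore assumes moderate |x| (no large-argument reduction). A reference like this could be used to compare vector results lane by lane.

```c
#include <assert.h>

static const double PI_D = 3.141592653589793;
static const double HALF_PI_D = 1.5707963267948966;

/* sin(r) for |r| <= pi/2 from Taylor terms up to r^17 (stand-in for a
   minimax polynomial), evaluated with Horner's rule in r^2.  */
static double
sin_poly (double r)
{
  double c[9];
  double r2 = r * r;
  double sum;

  /* c[k] = (-1)^k / (2k+1)!, built from the previous coefficient.  */
  c[0] = 1.0;
  for (int k = 1; k < 9; k++)
    c[k] = -c[k - 1] / ((2.0 * k) * (2.0 * k + 1.0));

  sum = c[8];
  for (int k = 7; k >= 0; k--)
    sum = sum * r2 + c[k];
  return r * sum;
}

/* cos(x) = (-1)^N * sin(R) with x + Pi/2 = N*Pi + R.  */
double
ref_cos (double x)
{
  double t = (x + HALF_PI_D) / PI_D;
  /* Round to nearest by hand so the sketch does not need libm.  */
  long n = (long) (t >= 0.0 ? t + 0.5 : t - 0.5);
  double r = (x + HALF_PI_D) - (double) n * PI_D;
  double s = sin_poly (r);
  return (n % 2 != 0) ? -s : s;
}
```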


>> +# elif defined _CILKPLUS && _CILKPLUS >= 0
>> +/* CilkPlus case. TODO _CILKPLUS currently nowhere defined */
>> +#  undef __DECL_SIMD_AVX2
>> +#  undef __DECL_SIMD_SSE4
>> +#  define __DECL_SIMD_AVX2 __attribute__((vector (nomask)))
>> +#  define __DECL_SIMD_SSE4 __attribute__((vector (processor(core_i7_sse4_2), \
>> +  nomask)))
>
> To be namespace-clean, you have to use reserved-namespace versions of
> attributes.  That is, __vector__, __nomask__, __processor__ and
> __core_i7_sse4_2__.
>
>> + .align 64
>> + .globl __gnu_svml_dcos_data
>> +__gnu_svml_dcos_data:
>> + .long 4294967295
>
> What are the semantics of the values in this table (please add a comment)?
> How was this table generated?
>

Yeah, who codes floating-point values as (little-endian ?) memory 
notation in decimal? I would understand hexadecimal but decimal?

As is, the code is unmaintainable.
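For readers decoding the quoted table: .long 4294967295 is simply 0xFFFFFFFF. If, as seems likely but is not stated in the patch, the table stores double-precision constants as pairs of 32-bit words, then on little-endian x86-64 the first word holds the low half of the bit pattern. The helpers below are hypothetical and for illustration only; they convert between such word pairs and doubles, which is also what a rewrite of the table in hexadecimal would need.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rebuild a double from two consecutive ".long" words.  Assumes the
   first (lower-address) word is the low half, as on x86-64.  */
double
double_from_longs (uint32_t first, uint32_t second)
{
  uint64_t bits = ((uint64_t) second << 32) | first;
  double d;
  memcpy (&d, &bits, sizeof d);  /* reinterpret without aliasing issues */
  return d;
}

/* Split a double back into the two words, e.g. to print them in hex.  */
void
longs_from_double (double d, uint32_t *first, uint32_t *second)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  *first = (uint32_t) bits;
  *second = (uint32_t) (bits >> 32);
}
```

For example, the decimal pair 0, 1072693248 is 1.0 (0x3FF00000 in the high word), and 1413754136, 1074340347 is the double closest to Pi.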

>> + .type __gnu_svml_dcos_data,@object
>> + .size __gnu_svml_dcos_data,1600
>
> .size __gnu_svml_dcos_data,.-__gnu_svml_dcos_data
>
> seems better than hardcoding another magic number for the size here.
>

Yeah, so in conclusion: is there any technical rationale why a compiler 
couldn't produce vectorized libm functions suitable for the purpose of
gcc/cilk integration?

Best Regards,

Christoph Lauter


-- 
Christoph Lauter
Maître de conférences - Associate Professor
Équipe PEQUAN - LIP6 - UPMC Paris 6
4, place Jussieu, 75252 Paris Cedex 05, 26-00/301
Tel.: +33144278029 / +33182521777
http://www.christoph-lauter.org/

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-30 18:40                               ` Andrew Senkevich
@ 2014-09-30 20:03                                 ` Joseph S. Myers
  2014-10-01 13:26                                   ` Andrew Senkevich
  0 siblings, 1 reply; 67+ messages in thread
From: Joseph S. Myers @ 2014-09-30 20:03 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: Carlos O'Donell, libc-alpha

On Tue, 30 Sep 2014, Andrew Senkevich wrote:

> 2014-09-30 20:35 GMT+04:00 Joseph S. Myers <joseph@codesourcery.com>:
> > On Tue, 30 Sep 2014, Andrew Senkevich wrote:
> >
> >> diff --git a/configure.ac b/configure.ac
> >> index 82d0896..c5c1758 100644
> >> --- a/configure.ac
> >> +++ b/configure.ac
> >> @@ -903,7 +903,7 @@ LIBC_PROG_BINUTILS
> >>  # Accept binutils 2.20 or newer.
> >>  AC_CHECK_PROG_VER(AS, $AS, --version,
> >>    [GNU assembler.* \([0-9]*\.[0-9.]*\)],
> >> -  [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], AS=: critic_missing="$critic_missing as")
> >> +  [2.1[0-9][0-9]*|2.[2-9][2-9]*|[3-9].*|[1-9][0-9]*], AS=: critic_missing="$critic_missing as")
> >>  AC_CHECK_PROG_VER(LD, $LD, --version,
> >>    [GNU ld.* \([0-9][0-9]*\.[0-9.]*\)],
> >>    [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], LD=: critic_missing="$critic_missing ld")
> >
> > Any change to required versions needs to include an update to install.texi
> > (and the generated INSTALL file).  It should also be proposed in a
> > separate thread whose subject describes what is being proposed.
> 
> I thought it was already agreed in
> https://sourceware.org/ml/libc-alpha/2014-09/msg00586.html
> But if a separate thread is required, I can start it.

In general, patch submissions should be minimal (subject to bisectability) 
- if pieces can sensibly be separated out, they should be, and each piece 
should be given a meaningful subject (which will be the summary line of 
the git commit message) describing what that piece does.  It's entirely 
plausible there are people concerned about a change to build requirements 
who aren't concerned about vector functions.

> > Do you need to link libmvec against libm (and if so, I'd expect associated
> > Makefile rules, and maybe a Depend file to ensure the directories are
> > built in the right order)?
> 
> libmvec contains calls to the scalar versions in libm, but it is not
> supposed to be used directly.
> Is it OK not to link libmvec against libm in this case?

No.  To have proper versioned undefined references, if a library A has an 
undefined reference to a function from another library B then A must be 
linked against B; otherwise you get an undefined reference without symbol 
version specified.

> Is it OK to add a configure option, enabled by default on x86_64 and
> disabled on unsupported architectures?

I think that would be appropriate.

> diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libmvec.abilist

Unless and until there is a reason for the set of symbols in this library 
to differ between -mx32 and -m64, I think the ABI baseline should go 
directly in sysdeps/unix/sysv/linux/x86_64/ rather than the 64/ 
subdirectory.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-30 18:40                               ` Christoph Lauter
@ 2014-09-30 20:15                                 ` Joseph S. Myers
  2014-10-02 11:55                                   ` Andrew Senkevich
  0 siblings, 1 reply; 67+ messages in thread
From: Joseph S. Myers @ 2014-09-30 20:15 UTC (permalink / raw)
  To: Christoph Lauter; +Cc: Andrew Senkevich, Carlos O'Donell, libc-alpha

On Tue, 30 Sep 2014, Christoph Lauter wrote:

> Hi all,
> 
> just 2cts from someone who wrote a couple of libm functions already in his
> life:
> 
> Joseph S. Myers wrote on 30/09/2014 18:35:
> 
> > > +# if defined _OPENMP && _OPENMP >= 201307
> > > +/* OpenMP case. */
> > > +#  undef __DECL_SIMD_AVX2
> > > +#  undef __DECL_SIMD_SSE4
> > > +#  define __DECL_SIMD_AVX2 _Pragma("omp declare simd notinbranch")
> > > +#  define __DECL_SIMD_SSE4 _Pragma("omp declare simd notinbranch")
> > 
> > I think there should be a comment pointing to the ABI/API documentation
> > that says what function versions this pragma defines to be available and
> > guaranteeing that it will not be redefined to e.g. say that AVX512 is
> > available so that existing headers will work with future compilers (but
> > another pragma will be needed if in future AVX512 versions are added).
> > 
> 
> Yeah, the ABI/API is not quite self-documenting with functions declared as
> follows:

What I'm referring to here is somewhat different - it's the ABI/API that 
defines the contract between the library and compiler implied by the pragma
(or, in the Cilk Plus case, by the attribute).

That ABI/API will effectively say "this pragma / attribute means that 
versions of this function are available for the following vector ISAs" 
(and then go on to say what the ABI is for each ISA).  It should also say 
explicitly that compilers must not interpret the pragma / attribute as 
meaning that functions are available for any other vector ISAs and that 
new pragmas / attributes will be defined for any new vector ISAs as 
needed.  That avoids future compilers misinterpreting glibc 2.21's headers 
as meaning it provides e.g. AVX512 versions of functions.

This ABI/API should be generically about OpenMP / Cilk Plus on x86_64 
processors, rather than specifically about GCC, to establish an 
interpretation intended to be shared by any compiler that implements those 
features, now or in the future.

(Of course then the patch does actually need to provide all the function 
versions implied by the pragma / attribute.)
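As background for the ISA question: the symbol in the proposed ABI list, _ZGVdN4v_cos, follows the x86 vector function name mangling, which as I understand it is _ZGV, an ISA letter (b = SSE, c = AVX, d = AVX2, e = AVX512), N or M for unmasked or masked, the number of lanes, one letter per parameter (v = vector), an underscore, and the scalar name. The sketch below only builds such names; it says nothing about which variants glibc actually ships, which is exactly what the ABI document described above would have to pin down.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "_ZGV<isa><mask><vlen><params>_<name>" into BUF.  */
int
vector_variant_name (char *buf, size_t size, char isa, int masked,
                     unsigned vlen, const char *params, const char *name)
{
  return snprintf (buf, size, "_ZGV%c%c%u%s_%s", isa, masked ? 'M' : 'N',
                   vlen, params, name);
}

/* Lanes for a 64-bit double given the ISA's vector register width.  */
unsigned
double_lanes (char isa)
{
  switch (isa)
    {
    case 'b':
      return 128 / 64;
    case 'c':
    case 'd':
      return 256 / 64;
    case 'e':
      return 512 / 64;
    default:
      return 0;
    }
}
```

So an unmasked AVX2 variant of cos taking one vector argument comes out as _ZGVdN4v_cos, and the SSE one as _ZGVbN2v_cos.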

> > > + .align 64
> > > + .globl __gnu_svml_dcos_data
> > > +__gnu_svml_dcos_data:
> > > + .long 4294967295
> > 
> > What are the semantics of the values in this table (please add a comment)?
> > How was this table generated?
> > 
> 
> Yeah, who codes floating-point values as (little-endian ?) memory notation in
> decimal? I would understand hexadecimal but decimal?
> 
> As is, the code is unmaintainable.

And, generally, we want to be able to regenerate any such tables if there 
are changes to the algorithms.  This means at a minimum having comments 
giving the semantics of the table (coefficients of whatever polynomial 
approximation to a given function on given intervals, for example), but 
preferably source code to generate the table.
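One way to meet the request for source code that regenerates a table is a small generator that prints the data together with its meaning. The sketch below uses sin Taylor coefficients purely as a stand-in for the real fitted values (which it does not reproduce); it prints each entry in hexadecimal with its decimal value and the power of r it multiplies, and ends with the .size form suggested earlier. The symbol name hypothetical_sin_coeffs is invented for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fill OUT with N coefficients out[k] = (-1)^k / (2k+1)!.  */
void
sin_taylor_coeffs (double *out, int n)
{
  if (n <= 0)
    return;
  out[0] = 1.0;
  for (int k = 1; k < n; k++)
    out[k] = -out[k - 1] / ((2.0 * k) * (2.0 * k + 1.0));
}

/* Emit the coefficients as assembler data with a comment per entry.  */
void
emit_table (FILE *f, const char *symbol, const double *c, int n)
{
  fprintf (f, "\t.align 64\n%s:\n", symbol);
  for (int k = 0; k < n; k++)
    {
      uint64_t bits;
      memcpy (&bits, &c[k], sizeof bits);
      fprintf (f, "\t.quad 0x%016" PRIx64 "\t/* c%d = %.17g, coefficient of r^%d */\n",
               bits, k, c[k], 2 * k + 1);
    }
  fprintf (f, "\t.size %s, .-%s\n", symbol, symbol);
}
```

Calling emit_table (stdout, "hypothetical_sin_coeffs", c, 4) after sin_taylor_coeffs (c, 4) prints four .quad lines, the first being 0x3ff0000000000000 for c0 = 1.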

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-30 20:03                                 ` Joseph S. Myers
@ 2014-10-01 13:26                                   ` Andrew Senkevich
  2014-10-01 13:46                                     ` Joseph S. Myers
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Senkevich @ 2014-10-01 13:26 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Carlos O'Donell, libc-alpha

2014-10-01 0:03 GMT+04:00 Joseph S. Myers <joseph@codesourcery.com>:
> On Tue, 30 Sep 2014, Andrew Senkevich wrote:
>
>> 2014-09-30 20:35 GMT+04:00 Joseph S. Myers <joseph@codesourcery.com>:
>> > On Tue, 30 Sep 2014, Andrew Senkevich wrote:
>> >
>> >> diff --git a/configure.ac b/configure.ac
>> >> index 82d0896..c5c1758 100644
>> >> --- a/configure.ac
>> >> +++ b/configure.ac
>> >> @@ -903,7 +903,7 @@ LIBC_PROG_BINUTILS
>> >>  # Accept binutils 2.20 or newer.
>> >>  AC_CHECK_PROG_VER(AS, $AS, --version,
>> >>    [GNU assembler.* \([0-9]*\.[0-9.]*\)],
>> >> -  [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], AS=: critic_missing="$critic_missing as")
>> >> +  [2.1[0-9][0-9]*|2.[2-9][2-9]*|[3-9].*|[1-9][0-9]*], AS=: critic_missing="$critic_missing as")
>> >>  AC_CHECK_PROG_VER(LD, $LD, --version,
>> >>    [GNU ld.* \([0-9][0-9]*\.[0-9.]*\)],
>> >>    [2.1[0-9][0-9]*|2.[2-9][0-9]*|[3-9].*|[1-9][0-9]*], LD=: critic_missing="$critic_missing ld")
>> >
>> > Any change to required versions needs to include an update to install.texi
>> > (and the generated INSTALL file).  It should also be proposed in a
>> > separate thread whose subject describes what is being proposed.
>>
>> I thought it was already agreed in
>> https://sourceware.org/ml/libc-alpha/2014-09/msg00586.html
>> But if a separate thread is required, I can start it.
>
> In general, patch submissions should be minimal (subject to bisectability)
> - if pieces can sensibly be separated out, they should be, and each piece
> should be given a meaningful subject (which will be the summary line of
> the git commit message) describing what that piece does.  It's entirely
> plausible there are people concerned about a change to build requirements
> who aren't concerned about vector functions.

Is it OK to send a patch with such an update that also removes the
configure checks for AVX2 support, as well as the corresponding
preprocessor directives that hide the AVX2 code? Does anything else
need to be updated?


--
WBR,
Andrew

* Re: [RFC] How to add vector math functions to Glibc
  2014-10-01 13:26                                   ` Andrew Senkevich
@ 2014-10-01 13:46                                     ` Joseph S. Myers
  0 siblings, 0 replies; 67+ messages in thread
From: Joseph S. Myers @ 2014-10-01 13:46 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: Carlos O'Donell, libc-alpha

On Wed, 1 Oct 2014, Andrew Senkevich wrote:

> > In general, patch submissions should be minimal (subject to bisectability)
> > - if pieces can sensibly be separated out, they should be, and each piece
> > should be given a meaningful subject (which will be the summary line of
> > the git commit message) describing what that piece does.  It's entirely
> > plausible there are people concerned about a change to build requirements
> > who aren't concerned about vector functions.
> 
> Is it OK to send a patch with such an update that also removes the
> configure checks for AVX2 support, as well as the corresponding
> preprocessor directives that hide the AVX2 code? Does anything else
> need to be updated?

I advise keeping architecture-specific removal of configure checks 
separate from architecture-independent increases in minimum versions.

The AVX2 checks appear to be compiler tests, not binutils tests, so they 
could only be removed after an increase of minimum GCC version for 
building glibc to 4.7.  Again, discussion of minimum GCC versions 
(architecture-independent) is best done in a separate thread that is 
explicitly about that question and that question only, but I'm not sure if 
there would be a consensus for 4.7 or only for 4.6 as new minimum version.  
And removal of configure checks that are obsolete with the new minimum 
version might still best be separate from the patch that actually 
increases the minimum.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-30 16:35                             ` Joseph S. Myers
  2014-09-30 18:40                               ` Christoph Lauter
  2014-09-30 18:40                               ` Andrew Senkevich
@ 2014-10-01 18:47                               ` Andrew Senkevich
  2 siblings, 0 replies; 67+ messages in thread
From: Andrew Senkevich @ 2014-10-01 18:47 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Carlos O'Donell, libc-alpha

>> +# We need to install libm.so as linker script
>> +# for more comfortable use of vector math library.
>> +subdir_install: $(inst_libdir)/libm.so
>> +
>> +$(inst_libdir)/libm.so: $(common-objpfx)format.lds \
>> + $(common-objpfx)math/libm.so$(libm.so-version) \
>> + $(common-objpfx)mathvec/libmvec.so$(libmvec.so-version) \
>> + $(+force)
>> + (echo '/* GNU ld script */';\
>> + cat $<; \
>> + echo 'GROUP ( $(slibdir)/libm.so$(libm.so-version) ' \
>> + 'AS_NEEDED ( $(slibdir)/libmvec.so$(libmvec.so-version) ) )' \
>> + ) > $@.new
>> + mv -f $@.new $@
>
> Do you have ordering issues here?  It seems bad for math/ to install a
> direct symlink and then mathvec/ to change it to something else - all
> installation rules for libm should be in the math/ directory.

When inserted in math/Makefile, this rule produces a warning about
overriding the recipe for target libm.so (as far as I can see, a rule for
libm.so is already generated from o-iterator.mk).

If I use a temporary target:

+subdir_install: $(inst_libdir)/libm.so.tmp
+$(inst_libdir)/libm.so.tmp: $(common-objpfx)format.lds \
+       $(common-objpfx)math/libm.so$(libm.so-version) \
+       $(common-objpfx)mathvec/libmvec.so$(libmvec.so-version) \
+       $(+force)
+       (echo '/* GNU ld script */';\
+       cat $<; \
+       echo 'GROUP ( $(slibdir)/libm.so$(libm.so-version) ' \
+       'AS_NEEDED ( $(slibdir)/libmvec.so$(libmvec.so-version) ) )' \
+       ) > $@
+       mv -f $@ $(inst_libdir)/libm.so

$(inst_libdir)/libm.so gets overwritten later in the installation, so I
generate a temporary file and need to move it to $(inst_libdir)/libm.so
at the very end.

It would be great if someone could advise me on how to do this.


--
WBR,
Andrew

* Re: [RFC] How to add vector math functions to Glibc
  2014-09-30 20:15                                 ` Joseph S. Myers
@ 2014-10-02 11:55                                   ` Andrew Senkevich
  2014-10-02 14:21                                     ` Joseph S. Myers
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Senkevich @ 2014-10-02 11:55 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Christoph Lauter, Carlos O'Donell, libc-alpha

>> > > + .align 64
>> > > + .globl __gnu_svml_dcos_data
>> > > +__gnu_svml_dcos_data:
>> > > + .long 4294967295
>> >
>> > What are the semantics of the values in this table (please add a comment)?

These tables contain data of several types: polynomial coefficients,
other constants, and lookup tables.

>> > How was this table generated?

The values were calculated with Maple, Mathematica and Sollya.

>> Yeah, who codes floating-point values as (little-endian ?) memory notation in
>> decimal? I would understand hexadecimal but decimal?

What are the requirements for data representation? Let's decide how
the values will be represented here.
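For reference when settling on a representation: each binary64 constant occupies two consecutive `.long` words, low word first, because x86_64 is little-endian. So the decimal 4294967295 above is 0xffffffff, which would be the low half of a mask such as 0x7fffffffffffffff if this table follows the same layout as the hexadecimal version posted later in the thread (an assumption). A minimal C sketch, with a hypothetical helper name and assuming IEEE binary64 doubles that share the byte order of 64-bit integers, turns a (low, high) pair back into a double, which makes hexadecimal entries easy to check against their intended values:

```c
#include <stdint.h>
#include <string.h>

/* Combine a (low word, high word) pair, as emitted by two consecutive
   ".long" directives on a little-endian target, into the binary64 value
   it encodes.  Assumes double is IEEE binary64.  */
static double
decode_pair (uint32_t lo, uint32_t hi)
{
  uint64_t bits = ((uint64_t) hi << 32) | lo;
  double d;
  memcpy (&d, &bits, sizeof d);  /* Type-pun without aliasing problems.  */
  return d;
}
```

For example, decode_pair (0x54442d18, 0x3ff921fb) gives pi/2 (the HalfPI entry of the later table), decode_pair (0x00000000, 0x3fe00000) gives 0.5 (OneHalf), and decode_pair (0x00000000, 0x43380000) gives 1.5 * 2^52 (RShifter).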

> And, generally, we want to be able to regenerate any such tables if there
> are changes to the algorithms.  This means at a minimum having comments
> giving the semantics of the table (coefficients of whatever polynomial
> approximation to a given function on given intervals, for example), but
> preferably source code to generate the table.

We can add some comments, but regeneration of these tables is not supported.


--
WBR,
Andrew

* Re: [RFC] How to add vector math functions to Glibc
  2014-10-02 11:55                                   ` Andrew Senkevich
@ 2014-10-02 14:21                                     ` Joseph S. Myers
  2014-10-09 17:10                                       ` Andrew Senkevich
  0 siblings, 1 reply; 67+ messages in thread
From: Joseph S. Myers @ 2014-10-02 14:21 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: Christoph Lauter, Carlos O'Donell, libc-alpha

On Thu, 2 Oct 2014, Andrew Senkevich wrote:

> >> > > + .align 64
> >> > > + .globl __gnu_svml_dcos_data
> >> > > +__gnu_svml_dcos_data:
> >> > > + .long 4294967295
> >> >
> >> > What are the semantics of the values in this table (please add a comment)?
> 
> These tables contain data of several types: polynomial coefficients,
> other constants, and lookup tables.

That then indicates that each part of the table should have a comment 
explaining the exact semantics of the values in that part of the table, 
and naming the macro used for the offset of that part of the table from 
the start of the table - and where the code refers to parts of the table, 
it should use those macros for the offsets instead of hardcoding magic 
constants in the relevant instructions.  Furthermore, if you define those 
macros in a common header, the table can do

.if .-__gnu_svml_dcos_data != MACRO_NAME
.err
.endif

at the start of each section of the table, so avoiding the need for 
comments to mention the macro names and making sure the macros are 
accurate.  Then if someone changes part of the function implementation, 
requiring just one section of the table to be replaced, you don't get 
silent breakage from offsets that were not updated: failing to update the 
macros correctly will cause an immediate build failure.
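The `.if`/`.err` check above works at assembly time. As an illustration of the same idea in C (not glibc code: the macro names, the struct, and covering only four sections are all hypothetical), a shared header could define the byte offset of each table section, and a struct mirroring the layout lets the compiler reject any stale offset. Each section is assumed to hold one binary64 constant replicated eight times (64 bytes), as in the table posted later in this thread:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical offset macros, as a shared header might define them.  */
#define DCOS_ABS_MASK   0
#define DCOS_RANGE_VAL  64
#define DCOS_HALF_PI    128
#define DCOS_INV_PI     192

/* Mirror of the first four sections: one constant replicated eight
   times per section (one 512-bit row each).  */
struct dcos_data_layout
{
  uint64_t abs_mask[8];
  uint64_t range_val[8];
  uint64_t half_pi[8];
  uint64_t inv_pi[8];
};

/* A wrong macro value becomes a build failure, not a silent misread.  */
_Static_assert (offsetof (struct dcos_data_layout, abs_mask) == DCOS_ABS_MASK,
                "DCOS_ABS_MASK is stale");
_Static_assert (offsetof (struct dcos_data_layout, range_val) == DCOS_RANGE_VAL,
                "DCOS_RANGE_VAL is stale");
_Static_assert (offsetof (struct dcos_data_layout, half_pi) == DCOS_HALF_PI,
                "DCOS_HALF_PI is stale");
_Static_assert (offsetof (struct dcos_data_layout, inv_pi) == DCOS_INV_PI,
                "DCOS_INV_PI is stale");

/* Code reads entries through the macros instead of magic constants.  */
static uint64_t
dcos_entry (const unsigned char *table, size_t offset, unsigned lane)
{
  uint64_t v;
  memcpy (&v, table + offset + lane * sizeof v, sizeof v);
  return v;
}
```

A lookup such as dcos_entry (table, DCOS_HALF_PI, lane) then stays correct for as long as the static assertions compile.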

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [RFC] How to add vector math functions to Glibc
  2014-10-02 14:21                                     ` Joseph S. Myers
@ 2014-10-09 17:10                                       ` Andrew Senkevich
  2014-10-09 17:39                                         ` Andreas Schwab
  2014-10-09 17:45                                         ` Joseph S. Myers
  0 siblings, 2 replies; 67+ messages in thread
From: Andrew Senkevich @ 2014-10-09 17:10 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Christoph Lauter, Carlos O'Donell, libc-alpha

Hi all,

let's discuss the changes to the testsuite, the --enable-mathvec configure
option, and the comments for the data table.
A runtime or configure check also needs to be added so that the tests
only run on appropriate hardware.

diff --git a/Makeconfig b/Makeconfig
index 24a3b82..4672008 100644
--- a/Makeconfig
+++ b/Makeconfig
@@ -476,7 +476,7 @@ link-libc = $(link-libc-rpath-link) $(link-libc-before-gnulib) $(gnulib)
 link-libc-tests = $(link-libc-tests-rpath-link) \
   $(link-libc-before-gnulib) $(gnulib-tests)
 # This is how to find at build-time things that will be installed there.
-rpath-dirs = math elf dlfcn nss nis rt resolv crypt
+rpath-dirs = math elf dlfcn nss nis rt resolv crypt mathvec
 rpath-link = \
 $(common-objdir):$(subst $(empty) ,:,$(patsubst ../$(subdir),.,$(rpath-dirs:%=$(common-objpfx)%)))
 else
@@ -1018,7 +1018,7 @@ all-subdirs = csu assert ctype locale intl catgets math setjmp signal    \
       stdlib stdio-common libio malloc string wcsmbs time dirent    \
       grp pwd posix io termios resource misc socket sysvipc gmon    \
       gnulib iconv iconvdata wctype manual shadow gshadow po argp   \
-      crypt localedata timezone rt conform debug    \
+      crypt localedata timezone rt conform debug mathvec    \
       $(add-on-subdirs) dlfcn elf

 ifndef avoid-generated

diff --git a/config.make.in b/config.make.in
index 4a781fd..09fe220 100644
--- a/config.make.in
+++ b/config.make.in
@@ -93,6 +93,7 @@ use-nscd = @use_nscd@
 build-hardcoded-path-in-tests= @hardcoded_path_in_tests@
 build-pt-chown = @build_pt_chown@
 enable-lock-elision = @enable_lock_elision@
+build-mathvect = @build_mathvec@

 # Build tools.
 CC = @CC@

diff --git a/configure.ac b/configure.ac
index 82d0896..c32e508 100644
--- a/configure.ac
+++ b/configure.ac
@@ -353,6 +353,17 @@ if test "$build_pt_chown" = yes; then
   AC_DEFINE(HAVE_PT_CHOWN)
 fi

+AC_ARG_ENABLE([mathvec],
+      [AS_HELP_STRING([--enable-mathvec],
+      [Enable building and installing mathvec @<:@default=yes on x86_64 build, else default=no@:>@])],
+      [build_mathvec=$enableval],
+      [if test -n "$(gcc -v 2>&1 | grep 'Target: x86_64')"; then :
+    build_mathvec=yes
+  else
+    build_mathvec=no
+  fi])
+AC_SUBST(build_mathvec)
+
 # We keep the original values in `$config_*' and never modify them, so we
 # can write them unchanged into config.make.  Everything else uses
 # $machine, $vendor, and $os, and changes them whenever convenient.

diff --git a/math/gen-libm-test.pl b/math/gen-libm-test.pl
index b5d599f..9899e1a 100755
--- a/math/gen-libm-test.pl
+++ b/math/gen-libm-test.pl
@@ -87,7 +87,7 @@ if ($opt_h) {
 $ulps_file = $opt_u if ($opt_u);
 $output_dir = $opt_o if ($opt_o);

-$input = "libm-test.inc";
+$input = "${srcdir}libm-test.inc";
 $auto_input = "${srcdir}auto-libm-test-out";
 $output = "${output_dir}libm-test.c";

diff --git a/math/libm-test.inc b/math/libm-test.inc
index f86a4fa..39901c4 100644
--- a/math/libm-test.inc
+++ b/math/libm-test.inc
@@ -706,13 +706,15 @@ test_single_errno (const char *test_name, int errno_value,
 static void
 test_errno (const char *test_name, int errno_value, int exceptions)
 {
-  ++noErrnoTests;
-  if (exceptions & ERRNO_UNCHANGED)
-    test_single_errno (test_name, errno_value, 0, "unchanged");
-  if (exceptions & ERRNO_EDOM)
-    test_single_errno (test_name, errno_value, EDOM, "EDOM");
-  if (exceptions & ERRNO_ERANGE)
-    test_single_errno (test_name, errno_value, ERANGE, "ERANGE");
+#ifndef TEST_MATHVEC
+      ++noErrnoTests;
+      if (exceptions & ERRNO_UNCHANGED)
+        test_single_errno (test_name, errno_value, 0, "unchanged");
+      if (exceptions & ERRNO_EDOM)
+        test_single_errno (test_name, errno_value, EDOM, "EDOM");
+      if (exceptions & ERRNO_ERANGE)
+        test_single_errno (test_name, errno_value, ERANGE, "ERANGE");
+#endif
 }

 /* Returns the number of ulps that GIVEN is away from EXPECTED.  */
@@ -1734,6 +1736,20 @@ struct test_fFF_11_data
     } \
   while (0);

+/* Run tests for a given function in TONEAREST rounding modes.  */
+#define TN_RM_TEST(FUNC, EXACT, ARRAY, LOOP_MACRO, END_MACRO, ...) \
+  do \
+    { \
+      do \
+ { \
+  START (FUNC, EXACT); \
+  LOOP_MACRO (FUNC, ARRAY, FE_TONEAREST, ## __VA_ARGS__); \
+  END_MACRO; \
+ } \
+      while (0); \
+    } \
+  while (0);
+
 /* This is to prevent messages from the SVID libm emulation.  */
 int
 matherr (struct exception *x __attribute__ ((unused)))
@@ -6258,7 +6274,11 @@ static const struct test_f_f_data cos_test_data[] =
 static void
 cos_test (void)
 {
+#ifndef TEST_MATHVEC
   ALL_RM_TEST (cos, 0, cos_test_data, RUN_TEST_LOOP_f_f, END);
+#else
+  TN_RM_TEST (vector_cos, 0, cos_test_data, RUN_TEST_LOOP_f_f, END);
+#endif
 }

@@ -9824,6 +9844,7 @@ main (int argc, char **argv)
   initialize ();
   printf (TEST_MSG);

+#ifndef TEST_MATHVEC
   check_ulp ();

   /* Keep the tests a wee bit ordered (according to ISO C99).  */
@@ -9960,6 +9981,11 @@ main (int argc, char **argv)
   y0_test ();
   y1_test ();
   yn_test ();
+#else
+  /* Vector trigonometric functions:  */
+  cos_test ();
+
+#endif

   if (output_ulps)
     fclose (ulps_file);

diff --git a/mathvec/Makefile b/mathvec/Makefile
new file mode 100644
index 0000000..546741a
--- /dev/null
+++ b/mathvec/Makefile
@@ -0,0 +1,63 @@
+# Copyright (C) 2014 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+
+# Makefile for the vector math library.
+
+subdir := mathvec
+
+include ../Makeconfig
+
+ifeq ($(build-mathvect),yes)
+extra-libs := libmvec
+extra-libs-others = $(extra-libs)
+endif
+
+libmvec-routines = $(strip $(libmvec-support))
+
+$(objpfx)libmvec.so: $(common-objpfx)math/libm.so
+
+# Rules for the test suite.
+ifeq ($(build-mathvect),yes)
+ifneq (no,$(PERL))
+libmvec-tests = test-vec-double
+libmvec-tests.o = $(addsuffix .o,$(libmvec-tests))
+tests = $(libmvec-tests)
+
+libmvec-tests-generated = $(common-objpfx)math/libm-test-ulps.h $(common-objpfx)math/libm-test.c
+generated += $(libmvec-tests-generated) libmvec-test.stmp
+
+# This is needed for dependencies
+before-compile += $(common-objpfx)math/libm-test.c
+ulps-file = $(firstword $(wildcard $(sysdirs:%=%/libm-test-ulps)))
+
+$(addprefix $(objpfx), $(libmvec-tests-generated)): $(objpfx)libmvec-test.stmp
+
+$(objpfx)libmvec-test.stmp: $(ulps-file) ../math/libm-test.inc \
+ ../math/gen-libm-test.pl ../math/auto-libm-test-out
+ $(make-target-directory)
+ $(PERL) ../math/gen-libm-test.pl -u $< -o "$(common-objpfx)math/"
+ @echo > $@
+
+$(objpfx)test-vec-double.o: $(objpfx)libmvec-test.stmp
+endif
+endif
+
+CFLAGS-test-vec-double.c = -fno-inline -ffloat-store -fno-builtin -frounding-math -mavx2 -Wno-unused-function
+
+rtld-tests-LDFLAGS += $(common-objpfx)math/libm.so $(objpfx)libmvec.so
+
+include ../Rules

diff --git a/mathvec/test-vec-double.c b/mathvec/test-vec-double.c
new file mode 100644
index 0000000..d418ac2
--- /dev/null
+++ b/mathvec/test-vec-double.c
@@ -0,0 +1,58 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FUNC(function) function
+#define FLOAT double
+#define TEST_MSG "testing double vector math (without inline functions)\n"
+#define MATHCONST(x) x
+#define CHOOSE(Clongdouble,Cdouble,Cfloat,Cinlinelongdouble,Cinlinedouble,Cinlinefloat) Cdouble
+#define PRINTF_EXPR "e"
+#define PRINTF_XEXPR "a"
+#define PRINTF_NEXPR "f"
+#define TEST_DOUBLE 1
+
+#ifndef __NO_MATH_INLINES
+# define __NO_MATH_INLINES
+#endif
+
+#include <immintrin.h>
+
+extern __m256d _ZGVdN4v_cos(__m256d);
+
+double vector_cos(double x)
+{
+  int i;
+  __m256d mx = _mm256_set1_pd(x);
+  __m256d mr = _ZGVdN4v_cos(mx);
+
+  for(i=1;i<4;i++)
+  {
+    if (((double*)&mr)[0]!=((double*)&mr)[i])
+    {
+      return ((double*)&mr)[0]+0.1;
+    }
+  }
+
+  return ((double*)&mr)[0];
+}
+
+#define TEST_MATHVEC
+#define EXCEPTION_TESTS_double 0
+
+#include "../math/libm-test.c"

diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 36e1b76..bdc7b56 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -905,6 +905,9 @@ idouble: 1
 ildouble: 2
 ldouble: 2

+Function: "vector_cos":
+double: 1
+
 Function: "cosh":
 double: 1
 float: 1

diff --git a/sysdeps/x86_64/fpu/svml_d_cos_data.S b/sysdeps/x86_64/fpu/svml_d_cos_data.S
new file mode 100644
index 0000000..0f2ff1f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cos_data.S
@@ -0,0 +1,492 @@
+/* Data for vectorized cos.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+ .section .rodata, "a"
+
+ .align 64
+ .globl __gnu_svml_dcos_data
+
+/* Data table for vector implementations of function cos.
+ * The table may contain polynomial, reduction, lookup
+ * coefficients and other constants obtained through different
+ * methods of research and experimental work.
+ */
+__gnu_svml_dcos_data:
+
+/* General constants:
+ * lAbsMask
+ */
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+
+/* lRangeVal */
+ .long 0x00000000
+ .long 0x41600000
+ .long 0x00000000
+ .long 0x41600000
+ .long 0x00000000
+ .long 0x41600000
+ .long 0x00000000
+ .long 0x41600000
+ .long 0x00000000
+ .long 0x41600000
+ .long 0x00000000
+ .long 0x41600000
+ .long 0x00000000
+ .long 0x41600000
+ .long 0x00000000
+ .long 0x41600000
+
+/* HalfPI */
+ .long 0x54442d18
+ .long 0x3ff921fb
+ .long 0x54442d18
+ .long 0x3ff921fb
+ .long 0x54442d18
+ .long 0x3ff921fb
+ .long 0x54442d18
+ .long 0x3ff921fb
+ .long 0x54442d18
+ .long 0x3ff921fb
+ .long 0x54442d18
+ .long 0x3ff921fb
+ .long 0x54442d18
+ .long 0x3ff921fb
+ .long 0x54442d18
+ .long 0x3ff921fb
+
+/* InvPI */
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+
+/* RShifter */
+ .long 0x00000000
+ .long 0x43380000
+ .long 0x00000000
+ .long 0x43380000
+ .long 0x00000000
+ .long 0x43380000
+ .long 0x00000000
+ .long 0x43380000
+ .long 0x00000000
+ .long 0x43380000
+ .long 0x00000000
+ .long 0x43380000
+ .long 0x00000000
+ .long 0x43380000
+ .long 0x00000000
+ .long 0x43380000
+
+/* OneHalf */
+ .long 0x00000000
+ .long 0x3fe00000
+ .long 0x00000000
+ .long 0x3fe00000
+ .long 0x00000000
+ .long 0x3fe00000
+ .long 0x00000000
+ .long 0x3fe00000
+ .long 0x00000000
+ .long 0x3fe00000
+ .long 0x00000000
+ .long 0x3fe00000
+ .long 0x00000000
+ .long 0x3fe00000
+ .long 0x00000000
+ .long 0x3fe00000
+
+/* Range reduction PI-based constants:
+ * PI1
+ */
+ .long 0x40000000
+ .long 0x400921fb
+ .long 0x40000000
+ .long 0x400921fb
+ .long 0x40000000
+ .long 0x400921fb
+ .long 0x40000000
+ .long 0x400921fb
+ .long 0x40000000
+ .long 0x400921fb
+ .long 0x40000000
+ .long 0x400921fb
+ .long 0x40000000
+ .long 0x400921fb
+ .long 0x40000000
+ .long 0x400921fb
+
+/* PI2 */
+ .long 0x00000000
+ .long 0x3e84442d
+ .long 0x00000000
+ .long 0x3e84442d
+ .long 0x00000000
+ .long 0x3e84442d
+ .long 0x00000000
+ .long 0x3e84442d
+ .long 0x00000000
+ .long 0x3e84442d
+ .long 0x00000000
+ .long 0x3e84442d
+ .long 0x00000000
+ .long 0x3e84442d
+ .long 0x00000000
+ .long 0x3e84442d
+
+/* PI3 */
+ .long 0x80000000
+ .long 0x3d084698
+ .long 0x80000000
+ .long 0x3d084698
+ .long 0x80000000
+ .long 0x3d084698
+ .long 0x80000000
+ .long 0x3d084698
+ .long 0x80000000
+ .long 0x3d084698
+ .long 0x80000000
+ .long 0x3d084698
+ .long 0x80000000
+ .long 0x3d084698
+ .long 0x80000000
+ .long 0x3d084698
+
+/* PI4 */
+ .long 0x701b839a
+ .long 0x3b88cc51
+ .long 0x701b839a
+ .long 0x3b88cc51
+ .long 0x701b839a
+ .long 0x3b88cc51
+ .long 0x701b839a
+ .long 0x3b88cc51
+ .long 0x701b839a
+ .long 0x3b88cc51
+ .long 0x701b839a
+ .long 0x3b88cc51
+ .long 0x701b839a
+ .long 0x3b88cc51
+ .long 0x701b839a
+ .long 0x3b88cc51
+
+/* Range reduction PI-based constants if FMA available:
+ * PI1_FMA
+ */
+ .long 0x54442d18
+ .long 0x400921fb
+ .long 0x54442d18
+ .long 0x400921fb
+ .long 0x54442d18
+ .long 0x400921fb
+ .long 0x54442d18
+ .long 0x400921fb
+ .long 0x54442d18
+ .long 0x400921fb
+ .long 0x54442d18
+ .long 0x400921fb
+ .long 0x54442d18
+ .long 0x400921fb
+ .long 0x54442d18
+ .long 0x400921fb
+
+/* PI2_FMA */
+ .long 0x33145c06
+ .long 0x3ca1a626
+ .long 0x33145c06
+ .long 0x3ca1a626
+ .long 0x33145c06
+ .long 0x3ca1a626
+ .long 0x33145c06
+ .long 0x3ca1a626
+ .long 0x33145c06
+ .long 0x3ca1a626
+ .long 0x33145c06
+ .long 0x3ca1a626
+ .long 0x33145c06
+ .long 0x3ca1a626
+ .long 0x33145c06
+ .long 0x3ca1a626
+
+/* PI3_FMA */
+ .long 0x29024e09
+ .long 0x395c1cd1
+ .long 0x29024e09
+ .long 0x395c1cd1
+ .long 0x29024e09
+ .long 0x395c1cd1
+ .long 0x29024e09
+ .long 0x395c1cd1
+ .long 0x29024e09
+ .long 0x395c1cd1
+ .long 0x29024e09
+ .long 0x395c1cd1
+ .long 0x29024e09
+ .long 0x395c1cd1
+ .long 0x29024e09
+ .long 0x395c1cd1
+
+/* Polynomial coefficients (relative error 2^(-52.115)):
+ * C1
+ */
+ .long 0x555554a7
+ .long 0xbfc55555
+ .long 0x555554a7
+ .long 0xbfc55555
+ .long 0x555554a7
+ .long 0xbfc55555
+ .long 0x555554a7
+ .long 0xbfc55555
+ .long 0x555554a7
+ .long 0xbfc55555
+ .long 0x555554a7
+ .long 0xbfc55555
+ .long 0x555554a7
+ .long 0xbfc55555
+ .long 0x555554a7
+ .long 0xbfc55555
+
+/* C2 */
+ .long 0x1110a4a8
+ .long 0x3f811111
+ .long 0x1110a4a8
+ .long 0x3f811111
+ .long 0x1110a4a8
+ .long 0x3f811111
+ .long 0x1110a4a8
+ .long 0x3f811111
+ .long 0x1110a4a8
+ .long 0x3f811111
+ .long 0x1110a4a8
+ .long 0x3f811111
+ .long 0x1110a4a8
+ .long 0x3f811111
+ .long 0x1110a4a8
+ .long 0x3f811111
+
+/* C3 */
+ .long 0x19a5b86d
+ .long 0xbf2a01a0
+ .long 0x19a5b86d
+ .long 0xbf2a01a0
+ .long 0x19a5b86d
+ .long 0xbf2a01a0
+ .long 0x19a5b86d
+ .long 0xbf2a01a0
+ .long 0x19a5b86d
+ .long 0xbf2a01a0
+ .long 0x19a5b86d
+ .long 0xbf2a01a0
+ .long 0x19a5b86d
+ .long 0xbf2a01a0
+ .long 0x19a5b86d
+ .long 0xbf2a01a0
+
+/* C4 */
+ .long 0x8030fea0
+ .long 0x3ec71de3
+ .long 0x8030fea0
+ .long 0x3ec71de3
+ .long 0x8030fea0
+ .long 0x3ec71de3
+ .long 0x8030fea0
+ .long 0x3ec71de3
+ .long 0x8030fea0
+ .long 0x3ec71de3
+ .long 0x8030fea0
+ .long 0x3ec71de3
+ .long 0x8030fea0
+ .long 0x3ec71de3
+ .long 0x8030fea0
+ .long 0x3ec71de3
+
+/* C5 */
+ .long 0x46002231
+ .long 0xbe5ae635
+ .long 0x46002231
+ .long 0xbe5ae635
+ .long 0x46002231
+ .long 0xbe5ae635
+ .long 0x46002231
+ .long 0xbe5ae635
+ .long 0x46002231
+ .long 0xbe5ae635
+ .long 0x46002231
+ .long 0xbe5ae635
+ .long 0x46002231
+ .long 0xbe5ae635
+ .long 0x46002231
+ .long 0xbe5ae635
+
+/* C6 */
+ .long 0x57a2f220
+ .long 0x3de60e68
+ .long 0x57a2f220
+ .long 0x3de60e68
+ .long 0x57a2f220
+ .long 0x3de60e68
+ .long 0x57a2f220
+ .long 0x3de60e68
+ .long 0x57a2f220
+ .long 0x3de60e68
+ .long 0x57a2f220
+ .long 0x3de60e68
+ .long 0x57a2f220
+ .long 0x3de60e68
+ .long 0x57a2f220
+ .long 0x3de60e68
+
+/* C7 */
+ .long 0x0811aac8
+ .long 0xbd69f0d6
+ .long 0x0811aac8
+ .long 0xbd69f0d6
+ .long 0x0811aac8
+ .long 0xbd69f0d6
+ .long 0x0811aac8
+ .long 0xbd69f0d6
+ .long 0x0811aac8
+ .long 0xbd69f0d6
+ .long 0x0811aac8
+ .long 0xbd69f0d6
+ .long 0x0811aac8
+ .long 0xbd69f0d6
+ .long 0x0811aac8
+ .long 0xbd69f0d6
+
+/* Additional constants:
+ * AbsMask
+ */
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+ .long 0xffffffff
+ .long 0x7fffffff
+
+/* InvPI */
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+ .long 0x6dc9c883
+ .long 0x3fd45f30
+
+/* RShifter_la */
+ .long 0x00000000
+ .long 0x43300000
+ .long 0x00000000
+ .long 0x43300000
+ .long 0x00000000
+ .long 0x43300000
+ .long 0x00000000
+ .long 0x43300000
+ .long 0x00000000
+ .long 0x43300000
+ .long 0x00000000
+ .long 0x43300000
+ .long 0x00000000
+ .long 0x43300000
+ .long 0x00000000
+ .long 0x43300000
+
+/* RShifter_la */
+ .long 0xffffffff
+ .long 0x432fffff
+ .long 0xffffffff
+ .long 0x432fffff
+ .long 0xffffffff
+ .long 0x432fffff
+ .long 0xffffffff
+ .long 0x432fffff
+ .long 0xffffffff
+ .long 0x432fffff
+ .long 0xffffffff
+ .long 0x432fffff
+ .long 0xffffffff
+ .long 0x432fffff
+ .long 0xffffffff
+ .long 0x432fffff
+
+/* RSXmax_la */
+ .long 0x007ffffe
+ .long 0x43300000
+ .long 0x007ffffe
+ .long 0x43300000
+ .long 0x007ffffe
+ .long 0x43300000
+ .long 0x007ffffe
+ .long 0x43300000
+ .long 0x007ffffe
+ .long 0x43300000
+ .long 0x007ffffe
+ .long 0x43300000
+ .long 0x007ffffe
+ .long 0x43300000
+ .long 0x007ffffe
+ .long 0x43300000
+ .type __gnu_svml_dcos_data,@object
+ .size __gnu_svml_dcos_data,.-__gnu_svml_dcos_data



--
WBR,
Andrew

* Re: [RFC] How to add vector math functions to Glibc
  2014-10-09 17:10                                       ` Andrew Senkevich
@ 2014-10-09 17:39                                         ` Andreas Schwab
  2014-10-09 17:46                                           ` Joseph S. Myers
  2014-10-09 17:45                                         ` Joseph S. Myers
  1 sibling, 1 reply; 67+ messages in thread
From: Andreas Schwab @ 2014-10-09 17:39 UTC (permalink / raw)
  To: Andrew Senkevich
  Cc: Joseph S. Myers, Christoph Lauter, Carlos O'Donell, libc-alpha

Andrew Senkevich <andrew.n.senkevich@gmail.com> writes:

> +      [if test -n "$(gcc -v 2>&1 | grep 'Target: x86_64')"; then :

You can get the target with -dumpmachine.  But neither takes -m32 into
account, so you'd better check the __x86_64__ predefine.
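To illustrate the point: the compiler's predefined macros reflect the options actually in effect (including -m32), whereas `gcc -v` and `-dumpmachine` only report the compiler's default target. A configure probe would compile a snippet along these lines with $CC and the chosen flags. The function below is only a hypothetical sketch of the macro logic. It also separates x32 (-mx32), where GCC defines __ILP32__ alongside __x86_64__, so a check on __x86_64__ alone may not be enough:

```c
/* Report which x86 ABI this translation unit is compiled for, using only
   predefined macros, so options such as -m32 or -mx32 passed through $CC
   or CFLAGS are taken into account.  */
static const char *
x86_abi_of_compilation (void)
{
#if defined __x86_64__ && defined __ILP32__
  return "x32";
#elif defined __x86_64__
  return "x86_64";
#elif defined __i386__
  return "i386";
#else
  return "other";
#endif
}
```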

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

* Re: [RFC] How to add vector math functions to Glibc
  2014-10-09 17:10                                       ` Andrew Senkevich
  2014-10-09 17:39                                         ` Andreas Schwab
@ 2014-10-09 17:45                                         ` Joseph S. Myers
  2014-10-10 13:27                                           ` Andrew Senkevich
  2014-10-16 16:37                                           ` Andrew Senkevich
  1 sibling, 2 replies; 67+ messages in thread
From: Joseph S. Myers @ 2014-10-09 17:45 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: Christoph Lauter, Carlos O'Donell, libc-alpha

On Thu, 9 Oct 2014, Andrew Senkevich wrote:

> lets discuss changes in the testsuite, --enable-mathvec configure
> option and comments for data table.

I think the patch submission needs much more explanation (several 
paragraphs explaining what this patch does and how it relates to previous 
patch submissions and discussion in this area).  At this stage of 
discussion, the carefully written analysis of the implementation choices 
you faced and the decisions you reached, with rationale, is much more 
important than the patch itself.

> diff --git a/configure.ac b/configure.ac
> index 82d0896..c32e508 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -353,6 +353,17 @@ if test "$build_pt_chown" = yes; then
>    AC_DEFINE(HAVE_PT_CHOWN)
>  fi
> 
> +AC_ARG_ENABLE([mathvec],
> +      [AS_HELP_STRING([--enable-mathvec],
> +      [Enable building and installing mathvec @<:@default=yes on x86_64 build, else default=no@:>@])],
> +      [build_mathvec=$enableval],
> +      [if test -n "$(gcc -v 2>&1 | grep 'Target: x86_64')"; then :

No, you never put architecture-dependencies in the toplevel configure 
script.  Instead, the default needs to be determined by variables that may 
be set by sysdeps configure fragments.

> diff --git a/math/libm-test.inc b/math/libm-test.inc
> index f86a4fa..39901c4 100644
> --- a/math/libm-test.inc
> +++ b/math/libm-test.inc
> @@ -706,13 +706,15 @@ test_single_errno (const char *test_name, int errno_value,
>  static void
>  test_errno (const char *test_name, int errno_value, int exceptions)
>  {
> -  ++noErrnoTests;
> -  if (exceptions & ERRNO_UNCHANGED)
> -    test_single_errno (test_name, errno_value, 0, "unchanged");
> -  if (exceptions & ERRNO_EDOM)
> -    test_single_errno (test_name, errno_value, EDOM, "EDOM");
> -  if (exceptions & ERRNO_ERANGE)
> -    test_single_errno (test_name, errno_value, ERANGE, "ERANGE");
> +#ifndef TEST_MATHVEC

It would seem better to change test_single_errno where it already has a 
conditional "#ifndef TEST_INLINE".

> +/* Run tests for a given function in TONEAREST rounding modes.  */
> +#define TN_RM_TEST(FUNC, EXACT, ARRAY, LOOP_MACRO, END_MACRO, ...) \

I think you should arrange for IF_ROUND_INIT_* to return false for modes 
other than FE_TONEAREST when doing the vector tests, rather than having a 
new macro like this.
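A minimal sketch of this suggestion, using a hypothetical helper in place of the real IF_ROUND_INIT_* macros (their bodies are not shown in this thread): under TEST_MATHVEC every rounding mode except FE_TONEAREST reports "not tested". The existing all-rounding-modes loop can then stay unchanged and no TN_RM_TEST is needed. TEST_MATHVEC is defined at the top only to keep the sketch self-contained. In the patch it comes from the test source file.

```c
#include <fenv.h>
#include <stdbool.h>

#define TEST_MATHVEC 1  /* Normally defined by the vector test source.  */

/* Hypothetical stand-in for the IF_ROUND_INIT_* decision: should the
   tests run in rounding MODE?  */
static bool
rounding_mode_tested (int mode)
{
#ifdef TEST_MATHVEC
  /* The patch only tests the vector variants in round-to-nearest.  */
  return mode == FE_TONEAREST;
#else
  (void) mode;
  return true;
#endif
}

/* Count how many rounding modes an ALL_RM_TEST-style loop would
   actually exercise.  */
static int
rounding_modes_run (void)
{
  static const int modes[] = {
    FE_TONEAREST,
#ifdef FE_TOWARDZERO
    FE_TOWARDZERO,
#endif
#ifdef FE_DOWNWARD
    FE_DOWNWARD,
#endif
#ifdef FE_UPWARD
    FE_UPWARD,
#endif
  };
  int n = 0;
  for (unsigned i = 0; i < sizeof modes / sizeof modes[0]; i++)
    if (rounding_mode_tested (modes[i]))
      n++;
  return n;
}
```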

> @@ -6258,7 +6274,11 @@ static const struct test_f_f_data cos_test_data[] =
>  static void
>  cos_test (void)
>  {
> +#ifndef TEST_MATHVEC
>    ALL_RM_TEST (cos, 0, cos_test_data, RUN_TEST_LOOP_f_f, END);
> +#else
> +  TN_RM_TEST (vector_cos, 0, cos_test_data, RUN_TEST_LOOP_f_f, END);
> +#endif
>  }

And I don't think we want conditionals like this for every function - 
indeed, the tests shouldn't need to know which functions have vector 
versions at all.

Instead, I suggest that all the testing of different variants takes place 
in the math/ directory - and in addition to testing float, double, 
ldouble, ifloat, idouble, ildoubl, that there are also cases float-vector, 
double-vector, ldouble-vector.  (I also suggest renaming the ifloat, 
idouble, ildoubl cases to match this general pattern.)

That is, there are some number of variants that may be tested for each 
floating-point type.  It may be useful for sysdeps Makefile fragments to 
be able to add to the list of variants.  math/Makefile should then arrange 
for the tests to be run for all relevant combinations of (type, variant).

> +CFLAGS-test-vec-double.c = -fno-inline -ffloat-store -fno-builtin -frounding-math -mavx2 -Wno-unused-function

Again, nothing architecture-specific (such as -mavx) in 
architecture-independent files.  If architecture-specific options are 
needed for testing, you need to set up a system of variables that can go 
in sysdeps Makefile fragments.  And since you can't determine at configure 
time what host the tests might run on, instruction set features such as 
AVX need testing for at runtime (this means building separate source files 
for the test with separate options so that you know the compiler won't 
generate AVX code before you've tested for AVX availability).
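One possible shape for this, sketched under assumptions: the test driver is compiled without -mavx2 and queries the CPU at run time with GCC's __builtin_cpu_supports (x86 only, available from GCC 4.8, which may matter given the minimum-compiler question raised at the start of the thread). Only if AVX2 is present does it call into a separately compiled file built with -mavx2. The function run_avx2_vector_tests is hypothetical and is stubbed here so the sketch links. Exit status 77, the conventional "skipped" code in automake-style harnesses, is also an assumption.

```c
#include <stdbool.h>
#include <stdio.h>

#define TEST_UNSUPPORTED 77  /* Assumed "skipped" status.  */

/* True if the CPU running the test supports AVX2.  This file must not be
   compiled with -mavx2, so nothing here can contain AVX2 instructions.  */
static bool
host_has_avx2 (void)
{
#if defined __GNUC__ && (defined __x86_64__ || defined __i386__)
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("avx2");
#else
  return false;
#endif
}

/* Hypothetical entry point that would live in a separate source file
   compiled with -mavx2; stubbed so this sketch is self-contained.  */
static int
run_avx2_vector_tests (void)
{
  return 0;
}

static int
run_vector_tests_if_supported (void)
{
  if (!host_has_avx2 ())
    {
      puts ("AVX2 not available; vector tests skipped");
      return TEST_UNSUPPORTED;
    }
  return run_avx2_vector_tests ();
}
```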

> +#include <immintrin.h>
> +
> +extern __m256d _ZGVdN4v_cos(__m256d);

We need an architecture-independent way of testing.  It might involve 
architecture-specific files providing information about how to map from 
the scalar function to the vector function, what vector functions are 
available, etc. - but the structure needs to have such a division between 
architecture-specific and architecture-independent files.
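As an illustration of the kind of mapping an architecture-specific file could provide: the name _ZGVdN4v_cos follows the x86_64 vector function ABI pattern _ZGV<isa><mask><vlen><parameters>_<name>. Here that means 'd' (AVX2), 'N' (unmasked), 4 lanes, and one vector parameter 'v'. The following sketch builds such names from per-architecture data and recognizes them. Both helper names are hypothetical:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Build "_ZGV<isa><mask><vlen><params>_<scalar>" into BUF.
   Returns false if BUF is too small.  */
static bool
vector_variant_name (char *buf, size_t size, char isa, char mask,
                     unsigned vlen, const char *params, const char *scalar)
{
  int n = snprintf (buf, size, "_ZGV%c%c%u%s_%s",
                    isa, mask, vlen, params, scalar);
  return n >= 0 && (size_t) n < size;
}

/* Rough check that SYM looks like a vector variant of SCALAR: the "_ZGV"
   prefix, at least four characters of ISA/mask/length/parameter codes,
   then "_" followed by the scalar name.  */
static bool
is_vector_variant_of (const char *sym, const char *scalar)
{
  size_t ls = strlen (sym), ln = strlen (scalar);
  return ls >= ln + 9
         && strncmp (sym, "_ZGV", 4) == 0
         && sym[ls - ln - 1] == '_'
         && strcmp (sym + ls - ln, scalar) == 0;
}
```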

(I'd like tests to cover normal use via the installed headers, such as 
-fopenmp, but I think testing the vector functions directly *is* a good 
idea as well.)
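For direct testing, the patch's vector_cos wrapper broadcasts one scalar to all lanes and calls the vector function. If the lanes disagree, it perturbs the result by 0.1 so that the ulp check fails. The same idea can be written without x86 intrinsics by using the GCC/Clang vector extension, which is one step toward an architecture-independent harness. The vector type, the stand-in function, and the generic wrapper below are all hypothetical:

```c
/* Four doubles, the same width as __m256d (GCC/Clang vector extension).  */
typedef double v4df __attribute__ ((vector_size (32)));

/* Hypothetical stand-in for a 4-lane variant such as _ZGVdN4v_cos;
   it simply doubles each lane.  */
static v4df
stand_in_vec_func (v4df x)
{
  v4df two = { 2.0, 2.0, 2.0, 2.0 };
  return x * two;
}

/* Scalar view of a 4-lane function, mirroring vector_cos: broadcast X,
   call VEC_FUNC, and return lane 0, perturbed by 0.1 if any lane differs.  */
static double
vector_wrapper (v4df (*vec_func) (v4df), double x)
{
  v4df mx = { x, x, x, x };
  v4df mr = vec_func (mx);
  for (int i = 1; i < 4; i++)
    if (mr[0] != mr[i])
      return mr[0] + 0.1;
  return mr[0];
}
```

A NaN lane always compares unequal, but NaN + 0.1 is still NaN, so NaN results pass through this check unchanged.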

> +/* General constants:
> + * lAbsMask
> + */

I really don't think these comments are sufficient to explain the 
semantics of the values.  I'm expecting comments of the form "the 
following N 64-bit values are IEEE binary64 constants a[0], a[1], ... for 
a minimax polynomial expansion a[0] + a[1]x + a[2]x^2 + ... of 
cos(x+0.125) for x in the interval [0.125,0.25]" or similar - an 
unambiguous description of exactly what the values mean / how they are 
used.  And see my previous point about defining macros for the offsets in 
this table in such a way that build errors will occur if the macro values 
are wrong.
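As an example of how such a description could be checked, or partly recovered: decoding the C1..C7 entries from the table above gives values very close to the Taylor coefficients (-1)^k/(2k+1)! of sin. That suggests an odd polynomial r + C1 r^3 + C2 r^5 + ... + C7 r^15 approximating sin on a reduced argument r, although the thread does not confirm it. A sketch under that assumption, with hypothetical function names:

```c
#include <stdint.h>
#include <string.h>

/* C1..C7 from the table, as {low word, high word} pairs.  */
static const uint32_t dcos_poly[7][2] = {
  { 0x555554a7, 0xbfc55555 }, { 0x1110a4a8, 0x3f811111 },
  { 0x19a5b86d, 0xbf2a01a0 }, { 0x8030fea0, 0x3ec71de3 },
  { 0x46002231, 0xbe5ae635 }, { 0x57a2f220, 0x3de60e68 },
  { 0x0811aac8, 0xbd69f0d6 },
};

/* Coefficient Ck (k = 1..7) as a double; assumes IEEE binary64.  */
static double
dcos_coeff (int k)
{
  uint64_t bits = ((uint64_t) dcos_poly[k - 1][1] << 32) | dcos_poly[k - 1][0];
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}

/* Taylor coefficient of r^(2k+1) in sin(r): (-1)^k / (2k+1)!.  */
static double
sin_taylor_coeff (int k)
{
  double f = 1.0;
  for (int i = 2; i <= 2 * k + 1; i++)
    f *= i;
  return (k % 2 ? -1.0 : 1.0) / f;
}

/* |Ck - taylor_k| / |taylor_k|.  */
static double
relative_deviation (int k)
{
  double t = sin_taylor_coeff (k);
  double d = (dcos_coeff (k) - t) / t;
  return d < 0 ? -d : d;
}

/* r + C1 r^3 + ... + C7 r^15 by Horner's rule in r^2 (assumed form).  */
static double
poly_sin (double r)
{
  double r2 = r * r, p = 0.0;
  for (int k = 7; k >= 1; k--)
    p = p * r2 + dcos_coeff (k);
  return r + r * r2 * p;
}
```

C1..C6 agree with the Taylor values to well under 1%, while C7 departs from 1/15! by a few percent. That pattern is consistent with minimax-adjusted rather than pure Taylor coefficients. The interval, the reduction scheme, and the remaining sections would still need to be documented.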

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [RFC] How to add vector math functions to Glibc
  2014-10-09 17:39                                         ` Andreas Schwab
@ 2014-10-09 17:46                                           ` Joseph S. Myers
  0 siblings, 0 replies; 67+ messages in thread
From: Joseph S. Myers @ 2014-10-09 17:46 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Andrew Senkevich, Christoph Lauter, Carlos O'Donell, libc-alpha

On Thu, 9 Oct 2014, Andreas Schwab wrote:

> Andrew Senkevich <andrew.n.senkevich@gmail.com> writes:
> 
> > +      [if test -n "$(gcc -v 2>&1 | grep 'Target: x86_64')"; then :
> 
> You can get the target with -dumpmachine.  But neither takes -m32 into
> account, so you'd better check the __x86_64__ predefine.

And it shouldn't use "gcc" at all - the compiler used is $CC.  But we've 
moved all such configuration into sysdeps configure scripts, so that's the 
right approach here.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [RFC] How to add vector math functions to Glibc
  2014-10-09 17:45                                         ` Joseph S. Myers
@ 2014-10-10 13:27                                           ` Andrew Senkevich
  2014-10-10 15:23                                             ` Joseph S. Myers
  2014-10-16 16:37                                           ` Andrew Senkevich
  1 sibling, 1 reply; 67+ messages in thread
From: Andrew Senkevich @ 2014-10-10 13:27 UTC
  To: Joseph S. Myers; +Cc: Christoph Lauter, Carlos O'Donell, libc-alpha

2014-10-09 21:45 GMT+04:00 Joseph S. Myers <joseph@codesourcery.com>:

>> +/* General constants:
>> + * lAbsMask
>> + */
>
> I really don't think these comments are sufficient to explain the
> semantics of the values.  I'm expecting comments of the form "the
> following N 64-bit values are IEEE binary64 constants a[0], a[1], ... for
> a minimax polynomial expansion a[0] + a[1]x + a[2]x^2 + ... of
> cos(x+0.125) for x in the interval [0.125,0.25]" or similar - an
> unambiguous description of exactly what the values mean / how they are
> used.

The table values were obtained mostly through many years of research
and experimental work; they come from fairly old code that has no
detailed comments either. So we propose keeping the current level of
comments, since this code has proved its correctness and effectiveness
through many years of intensive use in math applications at
institutions such as CERN, LLNL, etc.

> And see my previous point about defining macros for the offsets in
> this table in such a way that build errors will occur if the macro values
> are wrong.

We will follow up on this, though these sources will not change often
and the change has no effect on their value to users.


--
WBR,
Andrew

* Re: [RFC] How to add vector math functions to Glibc
  2014-10-10 13:27                                           ` Andrew Senkevich
@ 2014-10-10 15:23                                             ` Joseph S. Myers
  0 siblings, 0 replies; 67+ messages in thread
From: Joseph S. Myers @ 2014-10-10 15:23 UTC
  To: Andrew Senkevich; +Cc: Christoph Lauter, Carlos O'Donell, libc-alpha

On Fri, 10 Oct 2014, Andrew Senkevich wrote:

> The table values were obtained mostly through many years of research
> and experimental work; they come from fairly old code that has no
> detailed comments either. So we propose keeping the current level of
> comments, since this code has proved its correctness and effectiveness
> through many years of intensive use in math applications at
> institutions such as CERN, LLNL, etc.

So maybe you aren't sure if e.g. the values are the result of rounding to 
floating-point values a minimax polynomial approximation over the reals, 
or if they are a minimax polynomial approximation over floating-point 
values, or if they are some other kind of polynomial approximation.  But 
you can still make the comments say how they are used.

E.g. in <https://sourceware.org/ml/libc-alpha/2014-09/msg00680.html> you 
have a comment saying "Poly = C3+R2*(C4+R2*(C5+R2*(C6+R2*C7)))".  Now if 
you repeated that in the table, with the additional information of *what 
this is a polynomial approximation for* ((cos(x)-1)/x^2? (sin(x)-x)/x^3?), 
and *what interval the approximation is used on*, you've provided enough 
information there for someone who wants to recompute values optimized in a 
particular way to do so.
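
To illustrate the kind of comment being asked for, here is a sketch of a documented coefficient table and the Horner evaluation of a polynomial in the same shape as the quoted `Poly = C3+R2*(C4+...)` expression. Purely as an assumption for illustration, it uses plain Taylor coefficients of (sin(r) - r)/r^3 on |r| <= pi/4; these are not the SVML table values and not minimax coefficients.

```c
/* The following four binary64 values S1..S4 are the Taylor coefficients
   of (sin(r) - r) / r^3 = S1 + S2*r^2 + S3*r^4 + S4*r^6 + ...,
   with S_k = (-1)^k / (2k+1)!.  They are used by poly_sin below for
   |r| <= pi/4, where the truncation error is below about 2e-9.  */
static const double sin_poly[4] =
{
  -1.0 / 6.0,        /* S1 = -1/3!  */
  1.0 / 120.0,       /* S2 =  1/5!  */
  -1.0 / 5040.0,     /* S3 = -1/7!  */
  1.0 / 362880.0,    /* S4 =  1/9!  */
};

/* sin(r) ~= r + r^3 * Poly, where
   Poly = S1 + R2*(S2 + R2*(S3 + R2*S4)) and R2 = r*r (Horner's rule).  */
static double
poly_sin (double r)
{
  double r2 = r * r;
  double poly = sin_poly[0]
                + r2 * (sin_poly[1] + r2 * (sin_poly[2] + r2 * sin_poly[3]));
  return r + r * r2 * poly;
}
```

With the function approximated and the interval stated next to the table, someone else can recompute the coefficients, for example as a minimax fit over the same interval.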

This goes together with a few other things to make the table more 
readable:

* If the values are 64-bit doubles, representing them with .quad rather 
than as pairs of .long would make things clearer.

* Where you have vectors repeating the same value eight times, using .rept 
/ .endr would make this obvious and make the source code smaller.

* Combining this with my previous suggestion in 
<https://sourceware.org/ml/libc-alpha/2014-10/msg00040.html> regarding how 
to make the offsets of table entries explicit, you could do:

/* Define a vector of eight copies of VALUE, whose offset from the
   start of the table __gnu_svml_dcos_data must be OFFSET.  */
.macro double_vector offset value
.if .-__gnu_svml_dcos_data != \offset
.err
.endif
.rept 8
.quad \value
.endr
.endm

and then define the values as

double_vector OFFSET_LABSMASK 0x7fffffffffffffff
double_vector OFFSET_LRANGEVAL 0x4160000000000000

etc. - you still need the comments explaining what each of the values is / 
how it is used, and still need the function implementation to use those 
OFFSET_* macros for offsets rather than hardcoding their values, but I 
think macro calls like this are about as clear as you can get for actually 
putting the constants into the table in a .S file.
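
On the C side, the same principle means that code reading the table uses the shared `OFFSET_*` macros instead of literal displacements. A minimal sketch, assuming a table laid out as the `double_vector` macro above would produce it (eight copies of each 64-bit value). The table and function names and the offset values are hypothetical, and only the use of lAbsMask (clearing the sign bit) is illustrated:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets shared by the table and its users (hypothetical values):
   each entry is a vector of eight 64-bit copies, i.e. 64 bytes.  */
#define OFFSET_LABSMASK  0
#define OFFSET_LRANGEVAL 64

#define REPT8(v) v, v, v, v, v, v, v, v

/* Hypothetical table in the layout produced by double_vector.  */
static const uint64_t dcos_table[16] =
{
  REPT8 (0x7fffffffffffffffULL),   /* lAbsMask at OFFSET_LABSMASK.  */
  REPT8 (0x4160000000000000ULL),   /* lRangeVal at OFFSET_LRANGEVAL.  */
};

/* Lane 0 of the vector stored OFFSET bytes from the start of the table.  */
static uint64_t
table_lane0 (size_t offset)
{
  uint64_t v;
  memcpy (&v, (const unsigned char *) dcos_table + offset, sizeof v);
  return v;
}

/* |x|, computed by ANDing the bits of X with the lAbsMask entry.  */
static double
abs_via_table (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  bits &= table_lane0 (OFFSET_LABSMASK);
  memcpy (&x, &bits, sizeof x);
  return x;
}
```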

> > And see my previous point about defining macros for the offsets in
> > this table in such a way that build errors will occur if the macro values
> > are wrong.
> 
> We will follow up on this, though these sources will not change often
> and the change has no effect on their value to users.

Software is for people to read and modify, not just for computers to 
execute.  It's inherent to Free Software that you don't know who might be 
using or modifying it and in what way - so enough information should be 
provided in the source code that someone other than the original author 
can plausibly make local changes (e.g. changing the algorithm in a 
particular case only).

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [RFC] How to add vector math functions to Glibc
  2014-10-09 17:45                                         ` Joseph S. Myers
  2014-10-10 13:27                                           ` Andrew Senkevich
@ 2014-10-16 16:37                                           ` Andrew Senkevich
  2014-10-16 21:51                                             ` Joseph S. Myers
  1 sibling, 1 reply; 67+ messages in thread
From: Andrew Senkevich @ 2014-10-16 16:37 UTC
  To: Joseph S. Myers; +Cc: libc-alpha

Hi, Joseph,

here is the test suite patch, with fixes based on your previous
comments.

>> @@ -6258,7 +6274,11 @@ static const struct test_f_f_data cos_test_data[] =
>>  static void
>>  cos_test (void)
>>  {
>> +#ifndef TEST_MATHVEC
>>    ALL_RM_TEST (cos, 0, cos_test_data, RUN_TEST_LOOP_f_f, END);
>> +#else
>> +  TN_RM_TEST (vector_cos, 0, cos_test_data, RUN_TEST_LOOP_f_f, END);
>> +#endif
>>  }
>
> And I don't think we want conditionals like this for every function -
> indeed, the tests shouldn't need to know which functions have vector
> versions at all.

Do you mean using the same *_test function to test the vector
variant (through a wrapper)?
I have such a scheme now, but it requires adding macros named after
the standard functions, and it also required changes to the START macro.

>> +CFLAGS-test-vec-double.c = -fno-inline -ffloat-store -fno-builtin -frounding-math -mavx2 -Wno-unused-function
>
> And since you can't determine at configure
> time what host the tests might run on, instruction set features such as
> AVX need testing for at runtime (this means building separate source files
> for the test with separate options so that you know the compiler won't
> generate AVX code before you've tested for AVX availability).

Because the vector tests are grouped by ISA, the test drivers have
different names containing the vector length (test-double-vlen4.c for
AVX2). The scalar wrappers (called from the test driver) will be in
separate files (test-double-vlen4-wrapper.c) and will be built with the
architecture-specific options specified in the sysdeps Makefile.
For the runtime check we need to insert a condition before the wrapper
is called, so with the help of new macros added to the *_test functions
that condition can be defined in the test driver.
I have built test-double-vlen4 manually for now; if this approach is OK
I will prepare the sysdeps Makefile.

diff --git a/math/libm-test.inc b/math/libm-test.inc
index f86a4fa..9ddb77e 100644
--- a/math/libm-test.inc
+++ b/math/libm-test.inc
@@ -684,7 +684,7 @@ static void
 test_single_errno (const char *test_name, int errno_value,
    int expected_value, const char *expected_name)
 {
-#ifndef TEST_INLINE
+#if !defined TEST_INLINE && !defined TEST_MATHVEC
   if (errno_value == expected_value)
     {
       if (print_screen (1))
@@ -1691,8 +1691,9 @@ struct test_fFF_11_data
   ROUND_RESTORE_ ## ROUNDING_MODE

 /* Start and end the tests for a given function.  */
-#define START(FUNC, EXACT) \
-  const char *this_func = #FUNC; \
+#define STR_CON(x,y) __STRING(x##y)
+#define START(FUNC, SUFF, EXACT) \
+  const char *this_func = STR_CON (FUNC, SUFF); \
   init_max_error (this_func, EXACT)
 #define END \
   print_max_error (this_func)
@@ -1705,28 +1706,28 @@ struct test_fFF_11_data
     { \
       do \
  { \
-  START (FUNC, EXACT); \
+  START (FUNC, , EXACT); \
   LOOP_MACRO (FUNC, ARRAY, , ## __VA_ARGS__); \
   END_MACRO; \
  } \
       while (0); \
       do \
  { \
-  START (FUNC ## _downward, EXACT); \
+  START (FUNC, _downward, EXACT); \
   LOOP_MACRO (FUNC, ARRAY, FE_DOWNWARD, ## __VA_ARGS__); \
   END_MACRO; \
  } \
       while (0); \
       do \
  { \
-  START (FUNC ## _towardzero, EXACT); \
+  START (FUNC, _towardzero, EXACT); \
   LOOP_MACRO (FUNC, ARRAY, FE_TOWARDZERO, ## __VA_ARGS__); \
   END_MACRO; \
  } \
       while (0); \
       do \
  { \
-  START (FUNC ## _upward, EXACT); \
+  START (FUNC, _upward, EXACT); \
   LOOP_MACRO (FUNC, ARRAY, FE_UPWARD, ## __VA_ARGS__); \
   END_MACRO; \
  } \
@@ -6034,7 +6035,7 @@ static const struct test_c_c_data cexp_test_data[] =
 static void
 cexp_test (void)
 {
-  START (cexp, 0);
+  START (cexp, , 0);
   RUN_TEST_LOOP_c_c (cexp, cexp_test_data, );
   END_COMPLEX;
 }
@@ -6247,7 +6248,7 @@ copysign_test (void)


 static const struct test_f_f_data cos_test_data[] =
-  {
+  {
     TEST_f_f (cos, plus_infty, qnan_value, INVALID_EXCEPTION|ERRNO_EDOM),
     TEST_f_f (cos, minus_infty, qnan_value, INVALID_EXCEPTION|ERRNO_EDOM),
     TEST_f_f (cos, qnan_value, qnan_value, NO_INEXACT_EXCEPTION|ERRNO_UNCHANGED),
@@ -6255,9 +6256,14 @@ static const struct test_f_f_data cos_test_data[] =
     AUTO_TESTS_f_f (cos),
   };

+#ifndef CHECKARCH
+# define CHECKARCH
+#endif
+
 static void
 cos_test (void)
 {
+  CHECKARCH
   ALL_RM_TEST (cos, 0, cos_test_data, RUN_TEST_LOOP_f_f, END);
 }

@@ -7548,7 +7554,7 @@ static const struct test_if_f_data jn_test_data[] =
 static void
 jn_test (void)
 {
-  START (jn, 0);
+  START (jn, , 0);
   RUN_TEST_LOOP_if_f (jn, jn_test_data, );
   END;
 }
@@ -9374,7 +9380,7 @@ static const struct test_f_f_data tgamma_test_data[] =
 static void
 tgamma_test (void)
 {
-  START (tgamma, 0);
+  START (tgamma, , 0);
   RUN_TEST_LOOP_f_f (tgamma, tgamma_test_data, );
   END;
 }
@@ -9824,6 +9830,12 @@ main (int argc, char **argv)
   initialize ();
   printf (TEST_MSG);

+  /* Vector trigonometric functions:  */
+#ifdef TEST_MATHVEC
+
+  cos_test ();
+
+#else
   check_ulp ();

   /* Keep the tests a wee bit ordered (according to ISO C99).  */
@@ -9960,6 +9972,7 @@ main (int argc, char **argv)
   y0_test ();
   y1_test ();
   yn_test ();
+#endif

   if (output_ulps)
     fclose (ulps_file);

diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 36e1b76..0e11cd5 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -905,6 +905,12 @@ idouble: 1
 ildouble: 2
 ldouble: 2

+
+Function: "vlen4_cos":
+double: 1
+
 Function: "cosh":
 double: 1
 float: 1

diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrapper.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrapper.c
new file mode 100644
index 0000000..35e130e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrapper.c
@@ -0,0 +1,40 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FLOAT double
+
+// Wrapper from scalar to vector function implemented in AVX2.
+#define VECTOR_WRAPPER(scalar_func,vector_func) \
+extern __m256d vector_func(__m256d); \
+FLOAT scalar_func(FLOAT x)\
+{\
+  int i;\
+  __m256d mx = _mm256_set1_pd(x);\
+  __m256d mr = vector_func(mx);\
+  for(i=1;i<4;i++)\
+  {\
+    if (((FLOAT*)&mr)[0]!=((FLOAT*)&mr)[i])\
+    {\
+      return ((FLOAT*)&mr)[0]+0.1;\
+    }\
+  }\
+  return ((FLOAT*)&mr)[0];\
+}
+
+#include <immintrin.h>
+
+VECTOR_WRAPPER(vlen4_cos,_ZGVdN4v_cos)

diff --git a/sysdeps/x86_64/fpu/test-double-vlen4.c b/sysdeps/x86_64/fpu/test-double-vlen4.c
new file mode 100644
index 0000000..ce40c04
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-vlen4.c
@@ -0,0 +1,44 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FUNC(function) function
+#define FLOAT double
+#define TEST_MSG "testing double vector math (without inline functions)\n"
+#define MATHCONST(x) x
+#define CHOOSE(Clongdouble,Cdouble,Cfloat,Cinlinelongdouble,Cinlinedouble,Cinlinefloat) Cdouble
+#define PRINTF_EXPR "e"
+#define PRINTF_XEXPR "a"
+#define PRINTF_NEXPR "f"
+#define TEST_DOUBLE 1
+
+#ifndef __NO_MATH_INLINES
+# define __NO_MATH_INLINES
+#endif
+
+#define TEST_MATHVEC
+#define EXCEPTION_TESTS_double 0
+#define ROUNDING_TESTS_double(MODE) ((MODE) == FE_TONEAREST)
+
+#define cos vlen4_cos
+
+#include <init-arch.h>
+
+#define CHECKARCH \
+__init_cpu_features();\
+if (__cpu_features.feature[index_AVX2_Usable] & bit_AVX2_Usable)
+
+#include "libm-test.c"
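
A variant of the scalar-to-vector wrapper can be sketched portably with GCC's generic vector extension instead of `__m256d` and `<immintrin.h>`, so it builds without AVX2. Here `vec_square` is a hypothetical stand-in for a real vector variant such as `_ZGVdN4v_cos`, and the mismatch counter replaces the patch's `+0.1` trick; both are assumptions for illustration. One reason to prefer a counter: if lane 0 is a NaN, every `!=` comparison in the patch's loop is true and the returned NaN plus 0.1 is still a NaN, so a test that expects a NaN result cannot detect the lane mismatch.

```c
#include <string.h>

/* Four doubles in one value (GCC vector extension), standing in for __m256d.  */
typedef double v4df __attribute__ ((vector_size (32)));

/* Number of wrapper calls whose result lanes were not all identical.  */
static int lane_mismatches;

/* Hypothetical vector function standing in for e.g. _ZGVdN4v_cos.  */
static v4df
vec_square (v4df x)
{
  return x * x;
}

/* Broadcast X to all four lanes, call VECTOR_FUNC and return lane 0.
   Lanes are compared bit for bit, so identical NaNs count as agreement,
   while 0.0 versus -0.0 counts as a mismatch.  */
#define VECTOR_WRAPPER(scalar_func, vector_func)                \
  static double                                                 \
  scalar_func (double x)                                        \
  {                                                             \
    v4df mx = { x, x, x, x };                                   \
    v4df mr = vector_func (mx);                                 \
    double lanes[4];                                            \
    memcpy (lanes, &mr, sizeof lanes);                          \
    for (int i = 1; i < 4; i++)                                 \
      if (memcmp (&lanes[0], &lanes[i], sizeof (double)) != 0)  \
        lane_mismatches++;                                      \
    return lanes[0];                                            \
  }

VECTOR_WRAPPER (vlen4_square, vec_square)
```

The test driver would then check `lane_mismatches` once after the tests, rather than relying on a perturbed return value.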


--
WBR,
Andrew

* Re: [RFC] How to add vector math functions to Glibc
  2014-10-16 16:37                                           ` Andrew Senkevich
@ 2014-10-16 21:51                                             ` Joseph S. Myers
  2014-10-21 13:20                                               ` Andrew Senkevich
  0 siblings, 1 reply; 67+ messages in thread
From: Joseph S. Myers @ 2014-10-16 21:51 UTC
  To: Andrew Senkevich; +Cc: libc-alpha

On Thu, 16 Oct 2014, Andrew Senkevich wrote:

> >> @@ -6258,7 +6274,11 @@ static const struct test_f_f_data cos_test_data[] =
> >>  static void
> >>  cos_test (void)
> >>  {
> >> +#ifndef TEST_MATHVEC
> >>    ALL_RM_TEST (cos, 0, cos_test_data, RUN_TEST_LOOP_f_f, END);
> >> +#else
> >> +  TN_RM_TEST (vector_cos, 0, cos_test_data, RUN_TEST_LOOP_f_f, END);
> >> +#endif
> >>  }
> >
> > And I don't think we want conditionals like this for every function -
> > indeed, the tests shouldn't need to know which functions have vector
> > versions at all.
> 
> Do you mean using the same *_test function to test the vector
> variant (through a wrapper)?

Yes.  I don't have a full design, but the principle is to change how the 
macros for running tests expand (or what functions they call do) 
conditional on what is being tested, so that none of the conditionals are 
at the level of individual functions if it can be avoided.  And I don't 
think you should need to change calls to START, just the expansion.

> Because the vector tests are grouped by ISA, the test drivers have
> different names containing the vector length (test-double-vlen4.c for
> AVX2). The scalar wrappers (called from the test driver) will be in
> separate files (test-double-vlen4-wrapper.c) and will be built with the
> architecture-specific options specified in the sysdeps Makefile.
> For the runtime check we need to insert a condition before the wrapper
> is called, so with the help of new macros added to the *_test functions
> that condition can be defined in the test driver.

I'd think that the check for AVX2 etc. availability could run once in 
main, rather than in the tests of individual functions.
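
A sketch of doing the ISA check once, before any vector tests run. It assumes GCC 4.8 or later for `__builtin_cpu_init` and `__builtin_cpu_supports` (the patch itself uses glibc's internal `__init_cpu_features` and `__cpu_features` instead), and `run_avx2_tests` is a hypothetical placeholder for the tests whose wrappers are compiled with `-mavx2` in a separate file.

```c
#include <stdio.h>

/* Hypothetical placeholder: would call the *_test functions whose
   wrappers were built with -mavx2 in a separate source file.  */
static int
run_avx2_tests (void)
{
  return 0;
}

/* Intended to be called once from main: skips all vector tests when the
   CPU running the tests lacks AVX2, so no AVX2 instruction is executed.  */
static int
run_vector_tests (void)
{
#if defined __x86_64__ || defined __i386__
  __builtin_cpu_init ();
  if (!__builtin_cpu_supports ("avx2"))
    {
      puts ("AVX2 not supported by this CPU; vector tests skipped");
      return 0;
    }
  return run_avx2_tests ();
#else
  puts ("not an x86 target; vector tests skipped");
  return 0;
#endif
}
```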

> @@ -6247,7 +6248,7 @@ copysign_test (void)
> 
> 
>  static const struct test_f_f_data cos_test_data[] =
> -  {
> +  {

This looks like a bogus diff hunk.

> +  /* Vector trigonometric functions:  */
> +#ifdef TEST_MATHVEC
> +
> +  cos_test ();
> +
> +#else

There shouldn't be such conditionals.  It should be arranged that if 
there isn't a relevant vector version of a particular function, running 
vector tests for that function does nothing - so there are no conditionals 
on which *_test functions to run, and none inside those functions, just 
conditionals affecting what the test macros do (by means of conditionals 
inside them such as if (HAVE_VECTOR_cos_double_vlen4), for example, 
resulting from appropriate concatenations).
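
The concatenation approach can be sketched as follows. The `HAVE_VECTOR_*` values, the `TEST_TYPE_VLEN` selector and the simplified two-argument `START` are hypothetical stand-ins for libm-test.inc; the point is that the `*_test` functions contain no conditionals of their own and the `START` call sites stay unchanged.

```c
/* What an architecture header, or a header generated from it, might
   provide: only functions with a vector variant get the value 1.  */
#define HAVE_VECTOR_cos_double_vlen4 1
#define HAVE_VECTOR_sin_double_vlen4 0

/* Chosen by the test driver, e.g. test-double-vlen4.c.  */
#define TEST_TYPE_VLEN double_vlen4

/* Two levels so that TEST_TYPE_VLEN is expanded before pasting.  */
#define VEC_PASTE(func, tv) HAVE_VECTOR_##func##_##tv
#define VEC_XPASTE(func, tv) VEC_PASTE (func, tv)
#define HAVE_VECTOR_FOR(func) VEC_XPASTE (func, TEST_TYPE_VLEN)

static int functions_tested;

/* START keeps its original two arguments; only its expansion changes.
   Without a vector variant of FUNC, the rest of the test is skipped.  */
#define START(FUNC, EXACT)        \
  if (!HAVE_VECTOR_FOR (FUNC))    \
    return;                       \
  functions_tested++

static void
cos_test (void)
{
  START (cos, 0);
  /* ... the RUN_TEST_LOOP_f_f and END steps would follow here ...  */
}

static void
sin_test (void)
{
  START (sin, 0);
}
```

Here `HAVE_VECTOR_FOR (cos)` expands to `HAVE_VECTOR_cos_double_vlen4`, so main can call every `*_test` function unconditionally.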

> diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
> index 36e1b76..0e11cd5 100644
> --- a/sysdeps/x86_64/fpu/libm-test-ulps
> +++ b/sysdeps/x86_64/fpu/libm-test-ulps
> @@ -905,6 +905,12 @@ idouble: 1
>  ildouble: 2
>  ldouble: 2
> 
> +
> +Function: "vlen4_cos":
> +double: 1
> +
>  Function: "cosh":
>  double: 1
>  float: 1

This looks odd.  There shouldn't be the double blank line, and entries 
should be sorted alphabetically - this file should be updated by "make 
regen-ulps", and you need to ensure that regen-ulps does include the ulps 
for the tests of the vector functions.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [RFC] How to add vector math functions to Glibc
  2014-10-16 21:51                                             ` Joseph S. Myers
@ 2014-10-21 13:20                                               ` Andrew Senkevich
  2014-10-21 15:29                                                 ` Joseph S. Myers
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Senkevich @ 2014-10-21 13:20 UTC
  To: Joseph S. Myers; +Cc: libc-alpha

2014-10-17 1:51 GMT+04:00 Joseph S. Myers <joseph@codesourcery.com>:

>> +  /* Vector trigonometric functions:  */
>> +#ifdef TEST_MATHVEC
>> +
>> +  cos_test ();
>> +
>> +#else
>
> There shouldn't be such conditionals.  It should be arranged that if
> there isn't a relevant vector version of a particular function, running
> vector tests for that function does nothing - so there are no conditionals
> on which *_test functions to run, and none inside those functions, just
> conditionals affecting what the test macros do (by means of conditionals
> inside them such as if (HAVE_VECTOR_cos_double_vlen4), for example,
> resulting from appropriate concatenations).

With HAVE_VECTOR_cos_double_vlen4 we would need such macros, defined to
zero, for the whole set of functions without vector versions, which is
huge.
Maybe a more suitable way is to give the vector function wrappers a
fixed name prefix and select tests based on the function name?
I mean something like this in the test driver test-double-vlen4.c:

#define HAVE_VECTOR 1
#define VEC_PREFIX_STR "VECTOR_LEN_"
#define cos VECTOR_LEN_4_cos
#include "libm-test.c"

in test-double-vlen4-wrapper.c:

VECTOR_WRAPPER(VECTOR_LEN_4_cos,_ZGVdN4v_cos)

and in libm-test.inc:

#ifndef HAVE_VECTOR
# define HAVE_VECTOR 0
#endif

#ifndef VEC_PREFIX_STR
# define VEC_PREFIX_STR ""
#endif

static const char *vec_prefix = VEC_PREFIX_STR;

static int is_vector_name(const char *this_func)
{
  if (strncmp(this_func, vec_prefix, strlen(vec_prefix))==0)
    return 1;
  return 0;
}

#define STR_CON(x,y) __STRING(x##y)

/* Start and end the tests for a given function.  */
#define START(FUNC, SUFF, EXACT) \
  const char *this_func = STR_CON (FUNC, SUFF); \
  if (HAVE_VECTOR && !is_vector_name(this_func)) return; \
  init_max_error (this_func, EXACT)

Is this way ok?


--
WBR,
Andrew

* Re: [RFC] How to add vector math functions to Glibc
  2014-10-21 13:20                                               ` Andrew Senkevich
@ 2014-10-21 15:29                                                 ` Joseph S. Myers
  2014-10-23 19:23                                                   ` Andrew Senkevich
  0 siblings, 1 reply; 67+ messages in thread
From: Joseph S. Myers @ 2014-10-21 15:29 UTC
  To: Andrew Senkevich; +Cc: libc-alpha

On Tue, 21 Oct 2014, Andrew Senkevich wrote:

> > There shouldn't be such conditionals.  It should be arranged that if
> > there isn't a relevant vector version of a particular function, running
> > vector tests for that function does nothing - so there are no conditionals
> > on which *_test functions to run, and none inside those functions, just
> > conditionals affecting what the test macros do (by means of conditionals
> > inside them such as if (HAVE_VECTOR_cos_double_vlen4), for example,
> > resulting from appropriate concatenations).
> 
> With HAVE_VECTOR_cos_double_vlen4 we would need such macros, defined to
> zero, for the whole set of functions without vector versions, which is
> huge.

But I'd hope such macros could be generated by gen-libm-test.pl (or some 
such script, anyway) rather than needing lots of repetitive definitions to 
be maintained by hand and checked in.

Essentially:

* The architecture-specific headers (installed headers, or possibly 
non-installed ones used only by the testsuite in some cases) contain the 
information about what vector versions of what functions are available.  
Things are designed so that they only need to contain definitions where 
vector functions are available, not where they aren't (to avoid needing to 
repeat slightly different huge lists for each architecture).

* Where a default definition to 0 is needed in any cases, the relevant 
definitions are generated automatically.  (Indeed, this might make sense 
for a header included by bits/mathcalls.h, so that __MATHCALL can expand 
to include the right __DECL_SIMD_*, which might end up empty, rather than 
needing lots of #if conditionals before every function declaration there.)
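
A sketch of the header arrangement in the second point, using hypothetical `DECL_SIMD_` and `my_` names to avoid clashing with glibc's reserved identifiers. An architecture header would define the marker only for functions that have vector variants, for instance (assuming GCC with OpenMP SIMD support) as `_Pragma ("omp declare simd notinbranch")`; here it is left empty so the sketch builds anywhere. A generated header supplies empty defaults, and a `__MATHCALL`-like macro pastes the marker in front of each declaration:

```c
/* Architecture header (hypothetical): only my_cos has a vector variant.
   A real definition might be _Pragma ("omp declare simd notinbranch").  */
#define DECL_SIMD_my_cos

/* Generated header (hypothetical): empty defaults for every function.  */
#ifndef DECL_SIMD_my_cos
# define DECL_SIMD_my_cos
#endif
#ifndef DECL_SIMD_my_sin
# define DECL_SIMD_my_sin
#endif

/* __MATHCALL-like declaration macro: no #if is needed per function.  */
#define MATHCALL(func, args) DECL_SIMD_##func double func args

MATHCALL (my_cos, (double x));
MATHCALL (my_sin, (double x));

/* Trivial stand-in definitions so that the sketch links.  */
double my_cos (double x) { return 1.0 - x * x / 2.0; }
double my_sin (double x) { return x; }
```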

(Incidentally, there have been so many different patch fragments posted in 
this discussion that it's hard to follow what you're proposing, if e.g. in 
the discussion of testing it's relevant to look at what you're proposing 
for installed headers.  I think it would help if you had a git branch with 
the current set of proposed changes, that you frequently rebase so it 
always shows what you currently propose.)

> Maybe a more suitable way is to give the vector function wrappers a
> fixed name prefix and select tests based on the function name?
> I mean something like this in the test driver test-double-vlen4.c:
> 
> #define HAVE_VECTOR 1
> #define VEC_PREFIX_STR "VECTOR_LEN_"
> #define cos VECTOR_LEN_4_cos
> #include "libm-test.c"
> 
> in test-double-vlen4-wrapper.c:
> 
> VECTOR_WRAPPER(VECTOR_LEN_4_cos,_ZGVdN4v_cos)

If you did something like that, I think it would still be desirable to 
have some form of automatic generation of a list of defines, one per 
function and conditional as needed on whether the relevant vector version 
of that function exists.
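
Such automatic generation can be sketched as a small C generator that emits one default block per function and per floating-point type. It is hypothetical: the later patch in this thread uses a shell loop over the ALL_RM_TEST calls, and gen-libm-test.pl would be another natural home. Here the function list is passed in directly, and the VEC_PREFIX_* definitions are omitted.

```c
#include <stdio.h>

/* Emit a block that defines HAVE_VECTOR_<name> to 1 if the architecture
   headers declared a SIMD variant via __DECL_SIMD_<name>, else to 0.  */
static void
emit_have_vector (FILE *out, const char *name)
{
  fprintf (out,
           "#ifdef __DECL_SIMD_%s\n"
           "# define HAVE_VECTOR_%s 1\n"
           "#else\n"
           "# define HAVE_VECTOR_%s 0\n"
           "#endif\n\n",
           name, name, name);
}

/* Emit blocks for the double, float and long double variants of each of
   the N function names in FUNCS.  */
static void
emit_all (FILE *out, const char *const *funcs, size_t n)
{
  static const char *const suffixes[] = { "", "f", "l" };
  char name[64];

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < sizeof suffixes / sizeof suffixes[0]; j++)
      {
        snprintf (name, sizeof name, "%s%s", funcs[i], suffixes[j]);
        emit_have_vector (out, name);
      }
}
```

For example, `emit_all (stdout, (const char *const[]) { "cos", "sin" }, 2)` would print six blocks, for cos, cosf, cosl, sin, sinf and sinl.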

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [RFC] How to add vector math functions to Glibc
  2014-10-21 15:29                                                 ` Joseph S. Myers
@ 2014-10-23 19:23                                                   ` Andrew Senkevich
  2014-10-23 21:37                                                     ` Joseph S. Myers
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Senkevich @ 2014-10-23 19:23 UTC
  To: Joseph S. Myers; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 657 bytes --]

Hi, Joseph,

attached is the current state of my branch. I have generated an
additional header with a series of definitions; for now that header is
included in libm-test.inc, but it would also be possible to keep only
the generating script and produce the header from it.
Currently the information from math.h is used to select which tests to
run in the vector case, but this required corresponding changes in
math-vector.h. I added tests for cos and cosf (float has no vector
function body yet, just a stub).
The test suite passed with no failures in the math tests on a non-AVX2
target, and also on an AVX2 target (but math/test-float-vlen8 is
expected to fail, so that is strange; I will look into it).
Let me know whether such changes are OK in general.


--
WBR,
Andrew

[-- Attachment #2: libmvec_23.10.patch --]
[-- Type: application/octet-stream, Size: 99876 bytes --]

diff --git a/Makeconfig b/Makeconfig
index 24a3b82..4672008 100644
--- a/Makeconfig
+++ b/Makeconfig
@@ -476,7 +476,7 @@ link-libc = $(link-libc-rpath-link) $(link-libc-before-gnulib) $(gnulib)
 link-libc-tests = $(link-libc-tests-rpath-link) \
 		  $(link-libc-before-gnulib) $(gnulib-tests)
 # This is how to find at build-time things that will be installed there.
-rpath-dirs = math elf dlfcn nss nis rt resolv crypt
+rpath-dirs = math elf dlfcn nss nis rt resolv crypt mathvec
 rpath-link = \
 $(common-objdir):$(subst $(empty) ,:,$(patsubst ../$(subdir),.,$(rpath-dirs:%=$(common-objpfx)%)))
 else
@@ -1018,7 +1018,7 @@ all-subdirs = csu assert ctype locale intl catgets math setjmp signal	    \
 	      stdlib stdio-common libio malloc string wcsmbs time dirent    \
 	      grp pwd posix io termios resource misc socket sysvipc gmon    \
 	      gnulib iconv iconvdata wctype manual shadow gshadow po argp   \
-	      crypt localedata timezone rt conform debug		    \
+	      crypt localedata timezone rt conform debug mathvec	    \
 	      $(add-on-subdirs) dlfcn elf
 
 ifndef avoid-generated
diff --git a/bits/math-vector.h b/bits/math-vector.h
new file mode 100644
index 0000000..c8fe5cb
--- /dev/null
+++ b/bits/math-vector.h
@@ -0,0 +1,22 @@
+/* Platform-specific SIMD declarations of math functions.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly; \
+	include <math.h> instead."
+#endif
diff --git a/configure b/configure
index 89566c5..2c9787d 100755
--- a/configure
+++ b/configure
@@ -774,6 +774,7 @@ enable_systemtap
 enable_build_nscd
 enable_nscd
 enable_pt_chown
+enable_mathvec
 with_cpu
 '
       ac_precious_vars='build_alias
@@ -1437,6 +1438,8 @@ Optional Features:
   --disable-build-nscd    disable building and installing the nscd daemon
   --disable-nscd          library functions will not contact the nscd daemon
   --enable-pt_chown       Enable building and installing pt_chown
+  --enable-mathvec        Enable building and installing mathvec [default=yes
+                          on x86_64 build, else default=no]
 
 Optional Packages:
   --with-PACKAGE[=ARG]    use PACKAGE [ARG=yes]
@@ -3730,6 +3733,14 @@ if test "$build_pt_chown" = yes; then
 
 fi
 
+# Check whether --enable-mathvec was given.
+if test "${enable_mathvec+set}" = set; then :
+  enableval=$enable_mathvec; build_mathvec=$enableval
+else
+  build_mathvec=notset
+fi
+
+
 # We keep the original values in `$config_*' and never modify them, so we
 # can write them unchanged into config.make.  Everything else uses
 # $machine, $vendor, and $os, and changes them whenever convenient.
diff --git a/configure.ac b/configure.ac
index 82d0896..b505201 100644
--- a/configure.ac
+++ b/configure.ac
@@ -353,6 +353,12 @@ if test "$build_pt_chown" = yes; then
   AC_DEFINE(HAVE_PT_CHOWN)
 fi
 
+AC_ARG_ENABLE([mathvec],
+	      [AS_HELP_STRING([--enable-mathvec],
+	      [Enable building and installing mathvec @<:@default=yes on x86_64 build, else default=no@:>@])],
+	      [build_mathvec=$enableval],
+	      [build_mathvec=notset])
+
 # We keep the original values in `$config_*' and never modify them, so we
 # can write them unchanged into config.make.  Everything else uses
 # $machine, $vendor, and $os, and changes them whenever convenient.
diff --git a/math/Makefile b/math/Makefile
index 866bc0f..c6659aa 100644
--- a/math/Makefile
+++ b/math/Makefile
@@ -26,7 +26,7 @@ headers		:= math.h bits/mathcalls.h bits/mathinline.h bits/huge_val.h \
 		   bits/huge_valf.h bits/huge_vall.h bits/inf.h bits/nan.h \
 		   fpu_control.h complex.h bits/cmathcalls.h fenv.h \
 		   bits/fenv.h bits/fenvinline.h bits/mathdef.h tgmath.h \
-		   bits/math-finite.h
+		   bits/math-finite.h bits/math-vector.h
 
 # FPU support code.
 aux		:= setfpucw fpu_control
@@ -85,6 +85,22 @@ generated += $(foreach s,.c .S l.c l.S f.c f.S,$(calls:s_%=m_%$s))
 routines = $(calls) $(calls:=f) $(long-c-$(long-double-fcts))
 long-c-yes = $(calls:=l)
 
+ifeq ($(build-mathvect),yes)
+# We need to install libm.so as linker script
+# for more comfortable use of vector math library.
+subdir_install: $(inst_libdir)/libm.so.tmp
+$(inst_libdir)/libm.so.tmp: $(common-objpfx)format.lds \
+	$(common-objpfx)math/libm.so$(libm.so-version) \
+	$(common-objpfx)mathvec/libmvec.so$(libmvec.so-version) \
+	$(+force)
+	(echo '/* GNU ld script */';\
+	cat $<; \
+	echo 'GROUP ( $(slibdir)/libm.so$(libm.so-version) ' \
+	'AS_NEEDED ( $(slibdir)/libmvec.so$(libmvec.so-version) ) )' \
+	) > $@
+	mv -f $@ $(inst_libdir)/libm.so # TODO do it somehow after all other
+endif
+
 # Rules for the test suite.
 tests = test-matherr test-fenv atest-exp atest-sincos atest-exp2 basic-test \
 	test-misc test-fpucw test-fpucw-ieee tst-definitions test-tgmath \
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index 8a94a7e..2d31a11 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -60,6 +60,15 @@ __MATHCALL (atan,, (_Mdouble_ __x));
 __MATHCALL (atan2,, (_Mdouble_ __y, _Mdouble_ __x));
 
 /* Cosine of X.  */
+#if !defined _Mfloat_ && !defined _Mlong_double_ && defined __DECL_SIMD_cos
+__DECL_SIMD_cos
+#endif
+#if defined _Mfloat_ && !defined _Mlong_double_ && defined __DECL_SIMD_cosf
+__DECL_SIMD_cosf
+#endif
+#if defined _Mlong_double_ && defined __DECL_SIMD_cosl
+__DECL_SIMD_cosl
+#endif
 __MATHCALL (cos,, (_Mdouble_ __x));
 /* Sine of X.  */
 __MATHCALL (sin,, (_Mdouble_ __x));
diff --git a/math/have_vector.h b/math/have_vector.h
new file mode 100644
index 0000000..94aacf0
--- /dev/null
+++ b/math/have_vector.h
@@ -0,0 +1,2574 @@
+/* 
+Definitions below are generated with the following bash script:
+for func in $(grep ALL_RM_TEST math/libm-test.inc | awk {'print $2'} | sed -e "s/(//" -e "s/,//"); do 
+echo "#ifdef __DECL_SIMD_${func}"
+echo "# define HAVE_VECTOR_${func} 1"
+echo "# define VEC_PREFIX_${func} VEC_PREFIX"
+echo "#else"
+echo "# define HAVE_VECTOR_${func} 0"
+echo "# define VEC_PREFIX_${func}"
+echo "#endif"
+echo
+echo "#ifdef __DECL_SIMD_${func}f"
+echo "# define HAVE_VECTOR_${func}f 1"
+echo "# define VEC_PREFIX_${func}f VEC_PREFIX"
+echo "#else"
+echo "# define HAVE_VECTOR_${func}f 0"
+echo "# define VEC_PREFIX_${func}f"
+echo "#endif"
+echo
+echo "#ifdef __DECL_SIMD_${func}l"
+echo "# define HAVE_VECTOR_${func}l 1"
+echo "# define VEC_PREFIX_${func}l VEC_PREFIX"
+echo "#else"
+echo "# define HAVE_VECTOR_${func}l 0"
+echo "# define VEC_PREFIX_${func}l"
+echo "#endif"
+echo
+done
+*/
+
+#ifdef __DECL_SIMD_finite
+# define HAVE_VECTOR_finite 1
+# define VEC_PREFIX_finite VEC_PREFIX
+#else
+# define HAVE_VECTOR_finite 0
+# define VEC_PREFIX_finite
+#endif
+
+#ifdef __DECL_SIMD_finitef
+# define HAVE_VECTOR_finitef 1
+# define VEC_PREFIX_finitef VEC_PREFIX
+#else
+# define HAVE_VECTOR_finitef 0
+# define VEC_PREFIX_finitef
+#endif
+
+#ifdef __DECL_SIMD_finitel
+# define HAVE_VECTOR_finitel 1
+# define VEC_PREFIX_finitel VEC_PREFIX
+#else
+# define HAVE_VECTOR_finitel 0
+# define VEC_PREFIX_finitel
+#endif
+
+#ifdef __DECL_SIMD_fpclassify
+# define HAVE_VECTOR_fpclassify 1
+# define VEC_PREFIX_fpclassify VEC_PREFIX
+#else
+# define HAVE_VECTOR_fpclassify 0
+# define VEC_PREFIX_fpclassify
+#endif
+
+#ifdef __DECL_SIMD_fpclassifyf
+# define HAVE_VECTOR_fpclassifyf 1
+# define VEC_PREFIX_fpclassifyf VEC_PREFIX
+#else
+# define HAVE_VECTOR_fpclassifyf 0
+# define VEC_PREFIX_fpclassifyf
+#endif
+
+#ifdef __DECL_SIMD_fpclassifyl
+# define HAVE_VECTOR_fpclassifyl 1
+# define VEC_PREFIX_fpclassifyl VEC_PREFIX
+#else
+# define HAVE_VECTOR_fpclassifyl 0
+# define VEC_PREFIX_fpclassifyl
+#endif
+
+#ifdef __DECL_SIMD_isfinite
+# define HAVE_VECTOR_isfinite 1
+# define VEC_PREFIX_isfinite VEC_PREFIX
+#else
+# define HAVE_VECTOR_isfinite 0
+# define VEC_PREFIX_isfinite
+#endif
+
+#ifdef __DECL_SIMD_isfinitef
+# define HAVE_VECTOR_isfinitef 1
+# define VEC_PREFIX_isfinitef VEC_PREFIX
+#else
+# define HAVE_VECTOR_isfinitef 0
+# define VEC_PREFIX_isfinitef
+#endif
+
+#ifdef __DECL_SIMD_isfinitel
+# define HAVE_VECTOR_isfinitel 1
+# define VEC_PREFIX_isfinitel VEC_PREFIX
+#else
+# define HAVE_VECTOR_isfinitel 0
+# define VEC_PREFIX_isfinitel
+#endif
+
+#ifdef __DECL_SIMD_isinf
+# define HAVE_VECTOR_isinf 1
+# define VEC_PREFIX_isinf VEC_PREFIX
+#else
+# define HAVE_VECTOR_isinf 0
+# define VEC_PREFIX_isinf
+#endif
+
+#ifdef __DECL_SIMD_isinff
+# define HAVE_VECTOR_isinff 1
+# define VEC_PREFIX_isinff VEC_PREFIX
+#else
+# define HAVE_VECTOR_isinff 0
+# define VEC_PREFIX_isinff
+#endif
+
+#ifdef __DECL_SIMD_isinfl
+# define HAVE_VECTOR_isinfl 1
+# define VEC_PREFIX_isinfl VEC_PREFIX
+#else
+# define HAVE_VECTOR_isinfl 0
+# define VEC_PREFIX_isinfl
+#endif
+
+#ifdef __DECL_SIMD_isnan
+# define HAVE_VECTOR_isnan 1
+# define VEC_PREFIX_isnan VEC_PREFIX
+#else
+# define HAVE_VECTOR_isnan 0
+# define VEC_PREFIX_isnan
+#endif
+
+#ifdef __DECL_SIMD_isnanf
+# define HAVE_VECTOR_isnanf 1
+# define VEC_PREFIX_isnanf VEC_PREFIX
+#else
+# define HAVE_VECTOR_isnanf 0
+# define VEC_PREFIX_isnanf
+#endif
+
+#ifdef __DECL_SIMD_isnanl
+# define HAVE_VECTOR_isnanl 1
+# define VEC_PREFIX_isnanl VEC_PREFIX
+#else
+# define HAVE_VECTOR_isnanl 0
+# define VEC_PREFIX_isnanl
+#endif
+
+#ifdef __DECL_SIMD_isnormal
+# define HAVE_VECTOR_isnormal 1
+# define VEC_PREFIX_isnormal VEC_PREFIX
+#else
+# define HAVE_VECTOR_isnormal 0
+# define VEC_PREFIX_isnormal
+#endif
+
+#ifdef __DECL_SIMD_isnormalf
+# define HAVE_VECTOR_isnormalf 1
+# define VEC_PREFIX_isnormalf VEC_PREFIX
+#else
+# define HAVE_VECTOR_isnormalf 0
+# define VEC_PREFIX_isnormalf
+#endif
+
+#ifdef __DECL_SIMD_isnormall
+# define HAVE_VECTOR_isnormall 1
+# define VEC_PREFIX_isnormall VEC_PREFIX
+#else
+# define HAVE_VECTOR_isnormall 0
+# define VEC_PREFIX_isnormall
+#endif
+
+#ifdef __DECL_SIMD_issignaling
+# define HAVE_VECTOR_issignaling 1
+# define VEC_PREFIX_issignaling VEC_PREFIX
+#else
+# define HAVE_VECTOR_issignaling 0
+# define VEC_PREFIX_issignaling
+#endif
+
+#ifdef __DECL_SIMD_issignalingf
+# define HAVE_VECTOR_issignalingf 1
+# define VEC_PREFIX_issignalingf VEC_PREFIX
+#else
+# define HAVE_VECTOR_issignalingf 0
+# define VEC_PREFIX_issignalingf
+#endif
+
+#ifdef __DECL_SIMD_issignalingl
+# define HAVE_VECTOR_issignalingl 1
+# define VEC_PREFIX_issignalingl VEC_PREFIX
+#else
+# define HAVE_VECTOR_issignalingl 0
+# define VEC_PREFIX_issignalingl
+#endif
+
+#ifdef __DECL_SIMD_signbit
+# define HAVE_VECTOR_signbit 1
+# define VEC_PREFIX_signbit VEC_PREFIX
+#else
+# define HAVE_VECTOR_signbit 0
+# define VEC_PREFIX_signbit
+#endif
+
+#ifdef __DECL_SIMD_signbitf
+# define HAVE_VECTOR_signbitf 1
+# define VEC_PREFIX_signbitf VEC_PREFIX
+#else
+# define HAVE_VECTOR_signbitf 0
+# define VEC_PREFIX_signbitf
+#endif
+
+#ifdef __DECL_SIMD_signbitl
+# define HAVE_VECTOR_signbitl 1
+# define VEC_PREFIX_signbitl VEC_PREFIX
+#else
+# define HAVE_VECTOR_signbitl 0
+# define VEC_PREFIX_signbitl
+#endif
+
+#ifdef __DECL_SIMD_acos
+# define HAVE_VECTOR_acos 1
+# define VEC_PREFIX_acos VEC_PREFIX
+#else
+# define HAVE_VECTOR_acos 0
+# define VEC_PREFIX_acos
+#endif
+
+#ifdef __DECL_SIMD_acosf
+# define HAVE_VECTOR_acosf 1
+# define VEC_PREFIX_acosf VEC_PREFIX
+#else
+# define HAVE_VECTOR_acosf 0
+# define VEC_PREFIX_acosf
+#endif
+
+#ifdef __DECL_SIMD_acosl
+# define HAVE_VECTOR_acosl 1
+# define VEC_PREFIX_acosl VEC_PREFIX
+#else
+# define HAVE_VECTOR_acosl 0
+# define VEC_PREFIX_acosl
+#endif
+
+#ifdef __DECL_SIMD_asin
+# define HAVE_VECTOR_asin 1
+# define VEC_PREFIX_asin VEC_PREFIX
+#else
+# define HAVE_VECTOR_asin 0
+# define VEC_PREFIX_asin
+#endif
+
+#ifdef __DECL_SIMD_asinf
+# define HAVE_VECTOR_asinf 1
+# define VEC_PREFIX_asinf VEC_PREFIX
+#else
+# define HAVE_VECTOR_asinf 0
+# define VEC_PREFIX_asinf
+#endif
+
+#ifdef __DECL_SIMD_asinl
+# define HAVE_VECTOR_asinl 1
+# define VEC_PREFIX_asinl VEC_PREFIX
+#else
+# define HAVE_VECTOR_asinl 0
+# define VEC_PREFIX_asinl
+#endif
+
+#ifdef __DECL_SIMD_atan
+# define HAVE_VECTOR_atan 1
+# define VEC_PREFIX_atan VEC_PREFIX
+#else
+# define HAVE_VECTOR_atan 0
+# define VEC_PREFIX_atan
+#endif
+
+#ifdef __DECL_SIMD_atanf
+# define HAVE_VECTOR_atanf 1
+# define VEC_PREFIX_atanf VEC_PREFIX
+#else
+# define HAVE_VECTOR_atanf 0
+# define VEC_PREFIX_atanf
+#endif
+
+#ifdef __DECL_SIMD_atanl
+# define HAVE_VECTOR_atanl 1
+# define VEC_PREFIX_atanl VEC_PREFIX
+#else
+# define HAVE_VECTOR_atanl 0
+# define VEC_PREFIX_atanl
+#endif
+
+#ifdef __DECL_SIMD_atan2
+# define HAVE_VECTOR_atan2 1
+# define VEC_PREFIX_atan2 VEC_PREFIX
+#else
+# define HAVE_VECTOR_atan2 0
+# define VEC_PREFIX_atan2
+#endif
+
+#ifdef __DECL_SIMD_atan2f
+# define HAVE_VECTOR_atan2f 1
+# define VEC_PREFIX_atan2f VEC_PREFIX
+#else
+# define HAVE_VECTOR_atan2f 0
+# define VEC_PREFIX_atan2f
+#endif
+
+#ifdef __DECL_SIMD_atan2l
+# define HAVE_VECTOR_atan2l 1
+# define VEC_PREFIX_atan2l VEC_PREFIX
+#else
+# define HAVE_VECTOR_atan2l 0
+# define VEC_PREFIX_atan2l
+#endif
+
+#ifdef __DECL_SIMD_cos
+# define HAVE_VECTOR_cos 1
+# define VEC_PREFIX_cos VEC_PREFIX
+#else
+# define HAVE_VECTOR_cos 0
+# define VEC_PREFIX_cos
+#endif
+
+#ifdef __DECL_SIMD_cosf
+# define HAVE_VECTOR_cosf 1
+# define VEC_PREFIX_cosf VEC_PREFIX
+#else
+# define HAVE_VECTOR_cosf 0
+# define VEC_PREFIX_cosf
+#endif
+
+#ifdef __DECL_SIMD_cosl
+# define HAVE_VECTOR_cosl 1
+# define VEC_PREFIX_cosl VEC_PREFIX
+#else
+# define HAVE_VECTOR_cosl 0
+# define VEC_PREFIX_cosl
+#endif
+
+#ifdef __DECL_SIMD_sin
+# define HAVE_VECTOR_sin 1
+# define VEC_PREFIX_sin VEC_PREFIX
+#else
+# define HAVE_VECTOR_sin 0
+# define VEC_PREFIX_sin
+#endif
+
+#ifdef __DECL_SIMD_sinf
+# define HAVE_VECTOR_sinf 1
+# define VEC_PREFIX_sinf VEC_PREFIX
+#else
+# define HAVE_VECTOR_sinf 0
+# define VEC_PREFIX_sinf
+#endif
+
+#ifdef __DECL_SIMD_sinl
+# define HAVE_VECTOR_sinl 1
+# define VEC_PREFIX_sinl VEC_PREFIX
+#else
+# define HAVE_VECTOR_sinl 0
+# define VEC_PREFIX_sinl
+#endif
+
+#ifdef __DECL_SIMD_sincos
+# define HAVE_VECTOR_sincos 1
+# define VEC_PREFIX_sincos VEC_PREFIX
+#else
+# define HAVE_VECTOR_sincos 0
+# define VEC_PREFIX_sincos
+#endif
+
+#ifdef __DECL_SIMD_sincosf
+# define HAVE_VECTOR_sincosf 1
+# define VEC_PREFIX_sincosf VEC_PREFIX
+#else
+# define HAVE_VECTOR_sincosf 0
+# define VEC_PREFIX_sincosf
+#endif
+
+#ifdef __DECL_SIMD_sincosl
+# define HAVE_VECTOR_sincosl 1
+# define VEC_PREFIX_sincosl VEC_PREFIX
+#else
+# define HAVE_VECTOR_sincosl 0
+# define VEC_PREFIX_sincosl
+#endif
+
+#ifdef __DECL_SIMD_tan
+# define HAVE_VECTOR_tan 1
+# define VEC_PREFIX_tan VEC_PREFIX
+#else
+# define HAVE_VECTOR_tan 0
+# define VEC_PREFIX_tan
+#endif
+
+#ifdef __DECL_SIMD_tanf
+# define HAVE_VECTOR_tanf 1
+# define VEC_PREFIX_tanf VEC_PREFIX
+#else
+# define HAVE_VECTOR_tanf 0
+# define VEC_PREFIX_tanf
+#endif
+
+#ifdef __DECL_SIMD_tanl
+# define HAVE_VECTOR_tanl 1
+# define VEC_PREFIX_tanl VEC_PREFIX
+#else
+# define HAVE_VECTOR_tanl 0
+# define VEC_PREFIX_tanl
+#endif
+
+#ifdef __DECL_SIMD_acosh
+# define HAVE_VECTOR_acosh 1
+# define VEC_PREFIX_acosh VEC_PREFIX
+#else
+# define HAVE_VECTOR_acosh 0
+# define VEC_PREFIX_acosh
+#endif
+
+#ifdef __DECL_SIMD_acoshf
+# define HAVE_VECTOR_acoshf 1
+# define VEC_PREFIX_acoshf VEC_PREFIX
+#else
+# define HAVE_VECTOR_acoshf 0
+# define VEC_PREFIX_acoshf
+#endif
+
+#ifdef __DECL_SIMD_acoshl
+# define HAVE_VECTOR_acoshl 1
+# define VEC_PREFIX_acoshl VEC_PREFIX
+#else
+# define HAVE_VECTOR_acoshl 0
+# define VEC_PREFIX_acoshl
+#endif
+
+#ifdef __DECL_SIMD_asinh
+# define HAVE_VECTOR_asinh 1
+# define VEC_PREFIX_asinh VEC_PREFIX
+#else
+# define HAVE_VECTOR_asinh 0
+# define VEC_PREFIX_asinh
+#endif
+
+#ifdef __DECL_SIMD_asinhf
+# define HAVE_VECTOR_asinhf 1
+# define VEC_PREFIX_asinhf VEC_PREFIX
+#else
+# define HAVE_VECTOR_asinhf 0
+# define VEC_PREFIX_asinhf
+#endif
+
+#ifdef __DECL_SIMD_asinhl
+# define HAVE_VECTOR_asinhl 1
+# define VEC_PREFIX_asinhl VEC_PREFIX
+#else
+# define HAVE_VECTOR_asinhl 0
+# define VEC_PREFIX_asinhl
+#endif
+
+#ifdef __DECL_SIMD_atanh
+# define HAVE_VECTOR_atanh 1
+# define VEC_PREFIX_atanh VEC_PREFIX
+#else
+# define HAVE_VECTOR_atanh 0
+# define VEC_PREFIX_atanh
+#endif
+
+#ifdef __DECL_SIMD_atanhf
+# define HAVE_VECTOR_atanhf 1
+# define VEC_PREFIX_atanhf VEC_PREFIX
+#else
+# define HAVE_VECTOR_atanhf 0
+# define VEC_PREFIX_atanhf
+#endif
+
+#ifdef __DECL_SIMD_atanhl
+# define HAVE_VECTOR_atanhl 1
+# define VEC_PREFIX_atanhl VEC_PREFIX
+#else
+# define HAVE_VECTOR_atanhl 0
+# define VEC_PREFIX_atanhl
+#endif
+
+#ifdef __DECL_SIMD_cosh
+# define HAVE_VECTOR_cosh 1
+# define VEC_PREFIX_cosh VEC_PREFIX
+#else
+# define HAVE_VECTOR_cosh 0
+# define VEC_PREFIX_cosh
+#endif
+
+#ifdef __DECL_SIMD_coshf
+# define HAVE_VECTOR_coshf 1
+# define VEC_PREFIX_coshf VEC_PREFIX
+#else
+# define HAVE_VECTOR_coshf 0
+# define VEC_PREFIX_coshf
+#endif
+
+#ifdef __DECL_SIMD_coshl
+# define HAVE_VECTOR_coshl 1
+# define VEC_PREFIX_coshl VEC_PREFIX
+#else
+# define HAVE_VECTOR_coshl 0
+# define VEC_PREFIX_coshl
+#endif
+
+#ifdef __DECL_SIMD_sinh
+# define HAVE_VECTOR_sinh 1
+# define VEC_PREFIX_sinh VEC_PREFIX
+#else
+# define HAVE_VECTOR_sinh 0
+# define VEC_PREFIX_sinh
+#endif
+
+#ifdef __DECL_SIMD_sinhf
+# define HAVE_VECTOR_sinhf 1
+# define VEC_PREFIX_sinhf VEC_PREFIX
+#else
+# define HAVE_VECTOR_sinhf 0
+# define VEC_PREFIX_sinhf
+#endif
+
+#ifdef __DECL_SIMD_sinhl
+# define HAVE_VECTOR_sinhl 1
+# define VEC_PREFIX_sinhl VEC_PREFIX
+#else
+# define HAVE_VECTOR_sinhl 0
+# define VEC_PREFIX_sinhl
+#endif
+
+#ifdef __DECL_SIMD_tanh
+# define HAVE_VECTOR_tanh 1
+# define VEC_PREFIX_tanh VEC_PREFIX
+#else
+# define HAVE_VECTOR_tanh 0
+# define VEC_PREFIX_tanh
+#endif
+
+#ifdef __DECL_SIMD_tanhf
+# define HAVE_VECTOR_tanhf 1
+# define VEC_PREFIX_tanhf VEC_PREFIX
+#else
+# define HAVE_VECTOR_tanhf 0
+# define VEC_PREFIX_tanhf
+#endif
+
+#ifdef __DECL_SIMD_tanhl
+# define HAVE_VECTOR_tanhl 1
+# define VEC_PREFIX_tanhl VEC_PREFIX
+#else
+# define HAVE_VECTOR_tanhl 0
+# define VEC_PREFIX_tanhl
+#endif
+
+#ifdef __DECL_SIMD_exp
+# define HAVE_VECTOR_exp 1
+# define VEC_PREFIX_exp VEC_PREFIX
+#else
+# define HAVE_VECTOR_exp 0
+# define VEC_PREFIX_exp
+#endif
+
+#ifdef __DECL_SIMD_expf
+# define HAVE_VECTOR_expf 1
+# define VEC_PREFIX_expf VEC_PREFIX
+#else
+# define HAVE_VECTOR_expf 0
+# define VEC_PREFIX_expf
+#endif
+
+#ifdef __DECL_SIMD_expl
+# define HAVE_VECTOR_expl 1
+# define VEC_PREFIX_expl VEC_PREFIX
+#else
+# define HAVE_VECTOR_expl 0
+# define VEC_PREFIX_expl
+#endif
+
+#ifdef __DECL_SIMD_exp10
+# define HAVE_VECTOR_exp10 1
+# define VEC_PREFIX_exp10 VEC_PREFIX
+#else
+# define HAVE_VECTOR_exp10 0
+# define VEC_PREFIX_exp10
+#endif
+
+#ifdef __DECL_SIMD_exp10f
+# define HAVE_VECTOR_exp10f 1
+# define VEC_PREFIX_exp10f VEC_PREFIX
+#else
+# define HAVE_VECTOR_exp10f 0
+# define VEC_PREFIX_exp10f
+#endif
+
+#ifdef __DECL_SIMD_exp10l
+# define HAVE_VECTOR_exp10l 1
+# define VEC_PREFIX_exp10l VEC_PREFIX
+#else
+# define HAVE_VECTOR_exp10l 0
+# define VEC_PREFIX_exp10l
+#endif
+
+#ifdef __DECL_SIMD_exp2
+# define HAVE_VECTOR_exp2 1
+# define VEC_PREFIX_exp2 VEC_PREFIX
+#else
+# define HAVE_VECTOR_exp2 0
+# define VEC_PREFIX_exp2
+#endif
+
+#ifdef __DECL_SIMD_exp2f
+# define HAVE_VECTOR_exp2f 1
+# define VEC_PREFIX_exp2f VEC_PREFIX
+#else
+# define HAVE_VECTOR_exp2f 0
+# define VEC_PREFIX_exp2f
+#endif
+
+#ifdef __DECL_SIMD_exp2l
+# define HAVE_VECTOR_exp2l 1
+# define VEC_PREFIX_exp2l VEC_PREFIX
+#else
+# define HAVE_VECTOR_exp2l 0
+# define VEC_PREFIX_exp2l
+#endif
+
+#ifdef __DECL_SIMD_expm1
+# define HAVE_VECTOR_expm1 1
+# define VEC_PREFIX_expm1 VEC_PREFIX
+#else
+# define HAVE_VECTOR_expm1 0
+# define VEC_PREFIX_expm1
+#endif
+
+#ifdef __DECL_SIMD_expm1f
+# define HAVE_VECTOR_expm1f 1
+# define VEC_PREFIX_expm1f VEC_PREFIX
+#else
+# define HAVE_VECTOR_expm1f 0
+# define VEC_PREFIX_expm1f
+#endif
+
+#ifdef __DECL_SIMD_expm1l
+# define HAVE_VECTOR_expm1l 1
+# define VEC_PREFIX_expm1l VEC_PREFIX
+#else
+# define HAVE_VECTOR_expm1l 0
+# define VEC_PREFIX_expm1l
+#endif
+
+#ifdef __DECL_SIMD_frexp
+# define HAVE_VECTOR_frexp 1
+# define VEC_PREFIX_frexp VEC_PREFIX
+#else
+# define HAVE_VECTOR_frexp 0
+# define VEC_PREFIX_frexp
+#endif
+
+#ifdef __DECL_SIMD_frexpf
+# define HAVE_VECTOR_frexpf 1
+# define VEC_PREFIX_frexpf VEC_PREFIX
+#else
+# define HAVE_VECTOR_frexpf 0
+# define VEC_PREFIX_frexpf
+#endif
+
+#ifdef __DECL_SIMD_frexpl
+# define HAVE_VECTOR_frexpl 1
+# define VEC_PREFIX_frexpl VEC_PREFIX
+#else
+# define HAVE_VECTOR_frexpl 0
+# define VEC_PREFIX_frexpl
+#endif
+
+#ifdef __DECL_SIMD_ldexp
+# define HAVE_VECTOR_ldexp 1
+# define VEC_PREFIX_ldexp VEC_PREFIX
+#else
+# define HAVE_VECTOR_ldexp 0
+# define VEC_PREFIX_ldexp
+#endif
+
+#ifdef __DECL_SIMD_ldexpf
+# define HAVE_VECTOR_ldexpf 1
+# define VEC_PREFIX_ldexpf VEC_PREFIX
+#else
+# define HAVE_VECTOR_ldexpf 0
+# define VEC_PREFIX_ldexpf
+#endif
+
+#ifdef __DECL_SIMD_ldexpl
+# define HAVE_VECTOR_ldexpl 1
+# define VEC_PREFIX_ldexpl VEC_PREFIX
+#else
+# define HAVE_VECTOR_ldexpl 0
+# define VEC_PREFIX_ldexpl
+#endif
+
+#ifdef __DECL_SIMD_log
+# define HAVE_VECTOR_log 1
+# define VEC_PREFIX_log VEC_PREFIX
+#else
+# define HAVE_VECTOR_log 0
+# define VEC_PREFIX_log
+#endif
+
+#ifdef __DECL_SIMD_logf
+# define HAVE_VECTOR_logf 1
+# define VEC_PREFIX_logf VEC_PREFIX
+#else
+# define HAVE_VECTOR_logf 0
+# define VEC_PREFIX_logf
+#endif
+
+#ifdef __DECL_SIMD_logl
+# define HAVE_VECTOR_logl 1
+# define VEC_PREFIX_logl VEC_PREFIX
+#else
+# define HAVE_VECTOR_logl 0
+# define VEC_PREFIX_logl
+#endif
+
+#ifdef __DECL_SIMD_log10
+# define HAVE_VECTOR_log10 1
+# define VEC_PREFIX_log10 VEC_PREFIX
+#else
+# define HAVE_VECTOR_log10 0
+# define VEC_PREFIX_log10
+#endif
+
+#ifdef __DECL_SIMD_log10f
+# define HAVE_VECTOR_log10f 1
+# define VEC_PREFIX_log10f VEC_PREFIX
+#else
+# define HAVE_VECTOR_log10f 0
+# define VEC_PREFIX_log10f
+#endif
+
+#ifdef __DECL_SIMD_log10l
+# define HAVE_VECTOR_log10l 1
+# define VEC_PREFIX_log10l VEC_PREFIX
+#else
+# define HAVE_VECTOR_log10l 0
+# define VEC_PREFIX_log10l
+#endif
+
+#ifdef __DECL_SIMD_log1p
+# define HAVE_VECTOR_log1p 1
+# define VEC_PREFIX_log1p VEC_PREFIX
+#else
+# define HAVE_VECTOR_log1p 0
+# define VEC_PREFIX_log1p
+#endif
+
+#ifdef __DECL_SIMD_log1pf
+# define HAVE_VECTOR_log1pf 1
+# define VEC_PREFIX_log1pf VEC_PREFIX
+#else
+# define HAVE_VECTOR_log1pf 0
+# define VEC_PREFIX_log1pf
+#endif
+
+#ifdef __DECL_SIMD_log1pl
+# define HAVE_VECTOR_log1pl 1
+# define VEC_PREFIX_log1pl VEC_PREFIX
+#else
+# define HAVE_VECTOR_log1pl 0
+# define VEC_PREFIX_log1pl
+#endif
+
+#ifdef __DECL_SIMD_log2
+# define HAVE_VECTOR_log2 1
+# define VEC_PREFIX_log2 VEC_PREFIX
+#else
+# define HAVE_VECTOR_log2 0
+# define VEC_PREFIX_log2
+#endif
+
+#ifdef __DECL_SIMD_log2f
+# define HAVE_VECTOR_log2f 1
+# define VEC_PREFIX_log2f VEC_PREFIX
+#else
+# define HAVE_VECTOR_log2f 0
+# define VEC_PREFIX_log2f
+#endif
+
+#ifdef __DECL_SIMD_log2l
+# define HAVE_VECTOR_log2l 1
+# define VEC_PREFIX_log2l VEC_PREFIX
+#else
+# define HAVE_VECTOR_log2l 0
+# define VEC_PREFIX_log2l
+#endif
+
+#ifdef __DECL_SIMD_logb
+# define HAVE_VECTOR_logb 1
+# define VEC_PREFIX_logb VEC_PREFIX
+#else
+# define HAVE_VECTOR_logb 0
+# define VEC_PREFIX_logb
+#endif
+
+#ifdef __DECL_SIMD_logbf
+# define HAVE_VECTOR_logbf 1
+# define VEC_PREFIX_logbf VEC_PREFIX
+#else
+# define HAVE_VECTOR_logbf 0
+# define VEC_PREFIX_logbf
+#endif
+
+#ifdef __DECL_SIMD_logbl
+# define HAVE_VECTOR_logbl 1
+# define VEC_PREFIX_logbl VEC_PREFIX
+#else
+# define HAVE_VECTOR_logbl 0
+# define VEC_PREFIX_logbl
+#endif
+
+#ifdef __DECL_SIMD_modf
+# define HAVE_VECTOR_modf 1
+# define VEC_PREFIX_modf VEC_PREFIX
+#else
+# define HAVE_VECTOR_modf 0
+# define VEC_PREFIX_modf
+#endif
+
+#ifdef __DECL_SIMD_modff
+# define HAVE_VECTOR_modff 1
+# define VEC_PREFIX_modff VEC_PREFIX
+#else
+# define HAVE_VECTOR_modff 0
+# define VEC_PREFIX_modff
+#endif
+
+#ifdef __DECL_SIMD_modfl
+# define HAVE_VECTOR_modfl 1
+# define VEC_PREFIX_modfl VEC_PREFIX
+#else
+# define HAVE_VECTOR_modfl 0
+# define VEC_PREFIX_modfl
+#endif
+
+#ifdef __DECL_SIMD_pow10
+# define HAVE_VECTOR_pow10 1
+# define VEC_PREFIX_pow10 VEC_PREFIX
+#else
+# define HAVE_VECTOR_pow10 0
+# define VEC_PREFIX_pow10
+#endif
+
+#ifdef __DECL_SIMD_pow10f
+# define HAVE_VECTOR_pow10f 1
+# define VEC_PREFIX_pow10f VEC_PREFIX
+#else
+# define HAVE_VECTOR_pow10f 0
+# define VEC_PREFIX_pow10f
+#endif
+
+#ifdef __DECL_SIMD_pow10l
+# define HAVE_VECTOR_pow10l 1
+# define VEC_PREFIX_pow10l VEC_PREFIX
+#else
+# define HAVE_VECTOR_pow10l 0
+# define VEC_PREFIX_pow10l
+#endif
+
+#ifdef __DECL_SIMD_ilogb
+# define HAVE_VECTOR_ilogb 1
+# define VEC_PREFIX_ilogb VEC_PREFIX
+#else
+# define HAVE_VECTOR_ilogb 0
+# define VEC_PREFIX_ilogb
+#endif
+
+#ifdef __DECL_SIMD_ilogbf
+# define HAVE_VECTOR_ilogbf 1
+# define VEC_PREFIX_ilogbf VEC_PREFIX
+#else
+# define HAVE_VECTOR_ilogbf 0
+# define VEC_PREFIX_ilogbf
+#endif
+
+#ifdef __DECL_SIMD_ilogbl
+# define HAVE_VECTOR_ilogbl 1
+# define VEC_PREFIX_ilogbl VEC_PREFIX
+#else
+# define HAVE_VECTOR_ilogbl 0
+# define VEC_PREFIX_ilogbl
+#endif
+
+#ifdef __DECL_SIMD_scalb
+# define HAVE_VECTOR_scalb 1
+# define VEC_PREFIX_scalb VEC_PREFIX
+#else
+# define HAVE_VECTOR_scalb 0
+# define VEC_PREFIX_scalb
+#endif
+
+#ifdef __DECL_SIMD_scalbf
+# define HAVE_VECTOR_scalbf 1
+# define VEC_PREFIX_scalbf VEC_PREFIX
+#else
+# define HAVE_VECTOR_scalbf 0
+# define VEC_PREFIX_scalbf
+#endif
+
+#ifdef __DECL_SIMD_scalbl
+# define HAVE_VECTOR_scalbl 1
+# define VEC_PREFIX_scalbl VEC_PREFIX
+#else
+# define HAVE_VECTOR_scalbl 0
+# define VEC_PREFIX_scalbl
+#endif
+
+#ifdef __DECL_SIMD_scalbn
+# define HAVE_VECTOR_scalbn 1
+# define VEC_PREFIX_scalbn VEC_PREFIX
+#else
+# define HAVE_VECTOR_scalbn 0
+# define VEC_PREFIX_scalbn
+#endif
+
+#ifdef __DECL_SIMD_scalbnf
+# define HAVE_VECTOR_scalbnf 1
+# define VEC_PREFIX_scalbnf VEC_PREFIX
+#else
+# define HAVE_VECTOR_scalbnf 0
+# define VEC_PREFIX_scalbnf
+#endif
+
+#ifdef __DECL_SIMD_scalbnl
+# define HAVE_VECTOR_scalbnl 1
+# define VEC_PREFIX_scalbnl VEC_PREFIX
+#else
+# define HAVE_VECTOR_scalbnl 0
+# define VEC_PREFIX_scalbnl
+#endif
+
+#ifdef __DECL_SIMD_scalbln
+# define HAVE_VECTOR_scalbln 1
+# define VEC_PREFIX_scalbln VEC_PREFIX
+#else
+# define HAVE_VECTOR_scalbln 0
+# define VEC_PREFIX_scalbln
+#endif
+
+#ifdef __DECL_SIMD_scalblnf
+# define HAVE_VECTOR_scalblnf 1
+# define VEC_PREFIX_scalblnf VEC_PREFIX
+#else
+# define HAVE_VECTOR_scalblnf 0
+# define VEC_PREFIX_scalblnf
+#endif
+
+#ifdef __DECL_SIMD_scalblnl
+# define HAVE_VECTOR_scalblnl 1
+# define VEC_PREFIX_scalblnl VEC_PREFIX
+#else
+# define HAVE_VECTOR_scalblnl 0
+# define VEC_PREFIX_scalblnl
+#endif
+
+#ifdef __DECL_SIMD_significand
+# define HAVE_VECTOR_significand 1
+# define VEC_PREFIX_significand VEC_PREFIX
+#else
+# define HAVE_VECTOR_significand 0
+# define VEC_PREFIX_significand
+#endif
+
+#ifdef __DECL_SIMD_significandf
+# define HAVE_VECTOR_significandf 1
+# define VEC_PREFIX_significandf VEC_PREFIX
+#else
+# define HAVE_VECTOR_significandf 0
+# define VEC_PREFIX_significandf
+#endif
+
+#ifdef __DECL_SIMD_significandl
+# define HAVE_VECTOR_significandl 1
+# define VEC_PREFIX_significandl VEC_PREFIX
+#else
+# define HAVE_VECTOR_significandl 0
+# define VEC_PREFIX_significandl
+#endif
+
+#ifdef __DECL_SIMD_cbrt
+# define HAVE_VECTOR_cbrt 1
+# define VEC_PREFIX_cbrt VEC_PREFIX
+#else
+# define HAVE_VECTOR_cbrt 0
+# define VEC_PREFIX_cbrt
+#endif
+
+#ifdef __DECL_SIMD_cbrtf
+# define HAVE_VECTOR_cbrtf 1
+# define VEC_PREFIX_cbrtf VEC_PREFIX
+#else
+# define HAVE_VECTOR_cbrtf 0
+# define VEC_PREFIX_cbrtf
+#endif
+
+#ifdef __DECL_SIMD_cbrtl
+# define HAVE_VECTOR_cbrtl 1
+# define VEC_PREFIX_cbrtl VEC_PREFIX
+#else
+# define HAVE_VECTOR_cbrtl 0
+# define VEC_PREFIX_cbrtl
+#endif
+
+#ifdef __DECL_SIMD_fabs
+# define HAVE_VECTOR_fabs 1
+# define VEC_PREFIX_fabs VEC_PREFIX
+#else
+# define HAVE_VECTOR_fabs 0
+# define VEC_PREFIX_fabs
+#endif
+
+#ifdef __DECL_SIMD_fabsf
+# define HAVE_VECTOR_fabsf 1
+# define VEC_PREFIX_fabsf VEC_PREFIX
+#else
+# define HAVE_VECTOR_fabsf 0
+# define VEC_PREFIX_fabsf
+#endif
+
+#ifdef __DECL_SIMD_fabsl
+# define HAVE_VECTOR_fabsl 1
+# define VEC_PREFIX_fabsl VEC_PREFIX
+#else
+# define HAVE_VECTOR_fabsl 0
+# define VEC_PREFIX_fabsl
+#endif
+
+#ifdef __DECL_SIMD_hypot
+# define HAVE_VECTOR_hypot 1
+# define VEC_PREFIX_hypot VEC_PREFIX
+#else
+# define HAVE_VECTOR_hypot 0
+# define VEC_PREFIX_hypot
+#endif
+
+#ifdef __DECL_SIMD_hypotf
+# define HAVE_VECTOR_hypotf 1
+# define VEC_PREFIX_hypotf VEC_PREFIX
+#else
+# define HAVE_VECTOR_hypotf 0
+# define VEC_PREFIX_hypotf
+#endif
+
+#ifdef __DECL_SIMD_hypotl
+# define HAVE_VECTOR_hypotl 1
+# define VEC_PREFIX_hypotl VEC_PREFIX
+#else
+# define HAVE_VECTOR_hypotl 0
+# define VEC_PREFIX_hypotl
+#endif
+
+#ifdef __DECL_SIMD_pow
+# define HAVE_VECTOR_pow 1
+# define VEC_PREFIX_pow VEC_PREFIX
+#else
+# define HAVE_VECTOR_pow 0
+# define VEC_PREFIX_pow
+#endif
+
+#ifdef __DECL_SIMD_powf
+# define HAVE_VECTOR_powf 1
+# define VEC_PREFIX_powf VEC_PREFIX
+#else
+# define HAVE_VECTOR_powf 0
+# define VEC_PREFIX_powf
+#endif
+
+#ifdef __DECL_SIMD_powl
+# define HAVE_VECTOR_powl 1
+# define VEC_PREFIX_powl VEC_PREFIX
+#else
+# define HAVE_VECTOR_powl 0
+# define VEC_PREFIX_powl
+#endif
+
+#ifdef __DECL_SIMD_sqrt
+# define HAVE_VECTOR_sqrt 1
+# define VEC_PREFIX_sqrt VEC_PREFIX
+#else
+# define HAVE_VECTOR_sqrt 0
+# define VEC_PREFIX_sqrt
+#endif
+
+#ifdef __DECL_SIMD_sqrtf
+# define HAVE_VECTOR_sqrtf 1
+# define VEC_PREFIX_sqrtf VEC_PREFIX
+#else
+# define HAVE_VECTOR_sqrtf 0
+# define VEC_PREFIX_sqrtf
+#endif
+
+#ifdef __DECL_SIMD_sqrtl
+# define HAVE_VECTOR_sqrtl 1
+# define VEC_PREFIX_sqrtl VEC_PREFIX
+#else
+# define HAVE_VECTOR_sqrtl 0
+# define VEC_PREFIX_sqrtl
+#endif
+
+#ifdef __DECL_SIMD_erf
+# define HAVE_VECTOR_erf 1
+# define VEC_PREFIX_erf VEC_PREFIX
+#else
+# define HAVE_VECTOR_erf 0
+# define VEC_PREFIX_erf
+#endif
+
+#ifdef __DECL_SIMD_erff
+# define HAVE_VECTOR_erff 1
+# define VEC_PREFIX_erff VEC_PREFIX
+#else
+# define HAVE_VECTOR_erff 0
+# define VEC_PREFIX_erff
+#endif
+
+#ifdef __DECL_SIMD_erfl
+# define HAVE_VECTOR_erfl 1
+# define VEC_PREFIX_erfl VEC_PREFIX
+#else
+# define HAVE_VECTOR_erfl 0
+# define VEC_PREFIX_erfl
+#endif
+
+#ifdef __DECL_SIMD_erfc
+# define HAVE_VECTOR_erfc 1
+# define VEC_PREFIX_erfc VEC_PREFIX
+#else
+# define HAVE_VECTOR_erfc 0
+# define VEC_PREFIX_erfc
+#endif
+
+#ifdef __DECL_SIMD_erfcf
+# define HAVE_VECTOR_erfcf 1
+# define VEC_PREFIX_erfcf VEC_PREFIX
+#else
+# define HAVE_VECTOR_erfcf 0
+# define VEC_PREFIX_erfcf
+#endif
+
+#ifdef __DECL_SIMD_erfcl
+# define HAVE_VECTOR_erfcl 1
+# define VEC_PREFIX_erfcl VEC_PREFIX
+#else
+# define HAVE_VECTOR_erfcl 0
+# define VEC_PREFIX_erfcl
+#endif
+
+#ifdef __DECL_SIMD_gamma
+# define HAVE_VECTOR_gamma 1
+# define VEC_PREFIX_gamma VEC_PREFIX
+#else
+# define HAVE_VECTOR_gamma 0
+# define VEC_PREFIX_gamma
+#endif
+
+#ifdef __DECL_SIMD_gammaf
+# define HAVE_VECTOR_gammaf 1
+# define VEC_PREFIX_gammaf VEC_PREFIX
+#else
+# define HAVE_VECTOR_gammaf 0
+# define VEC_PREFIX_gammaf
+#endif
+
+#ifdef __DECL_SIMD_gammal
+# define HAVE_VECTOR_gammal 1
+# define VEC_PREFIX_gammal VEC_PREFIX
+#else
+# define HAVE_VECTOR_gammal 0
+# define VEC_PREFIX_gammal
+#endif
+
+#ifdef __DECL_SIMD_lgamma
+# define HAVE_VECTOR_lgamma 1
+# define VEC_PREFIX_lgamma VEC_PREFIX
+#else
+# define HAVE_VECTOR_lgamma 0
+# define VEC_PREFIX_lgamma
+#endif
+
+#ifdef __DECL_SIMD_lgammaf
+# define HAVE_VECTOR_lgammaf 1
+# define VEC_PREFIX_lgammaf VEC_PREFIX
+#else
+# define HAVE_VECTOR_lgammaf 0
+# define VEC_PREFIX_lgammaf
+#endif
+
+#ifdef __DECL_SIMD_lgammal
+# define HAVE_VECTOR_lgammal 1
+# define VEC_PREFIX_lgammal VEC_PREFIX
+#else
+# define HAVE_VECTOR_lgammal 0
+# define VEC_PREFIX_lgammal
+#endif
+
+#ifdef __DECL_SIMD_tgamma
+# define HAVE_VECTOR_tgamma 1
+# define VEC_PREFIX_tgamma VEC_PREFIX
+#else
+# define HAVE_VECTOR_tgamma 0
+# define VEC_PREFIX_tgamma
+#endif
+
+#ifdef __DECL_SIMD_tgammaf
+# define HAVE_VECTOR_tgammaf 1
+# define VEC_PREFIX_tgammaf VEC_PREFIX
+#else
+# define HAVE_VECTOR_tgammaf 0
+# define VEC_PREFIX_tgammaf
+#endif
+
+#ifdef __DECL_SIMD_tgammal
+# define HAVE_VECTOR_tgammal 1
+# define VEC_PREFIX_tgammal VEC_PREFIX
+#else
+# define HAVE_VECTOR_tgammal 0
+# define VEC_PREFIX_tgammal
+#endif
+
+#ifdef __DECL_SIMD_ceil
+# define HAVE_VECTOR_ceil 1
+# define VEC_PREFIX_ceil VEC_PREFIX
+#else
+# define HAVE_VECTOR_ceil 0
+# define VEC_PREFIX_ceil
+#endif
+
+#ifdef __DECL_SIMD_ceilf
+# define HAVE_VECTOR_ceilf 1
+# define VEC_PREFIX_ceilf VEC_PREFIX
+#else
+# define HAVE_VECTOR_ceilf 0
+# define VEC_PREFIX_ceilf
+#endif
+
+#ifdef __DECL_SIMD_ceill
+# define HAVE_VECTOR_ceill 1
+# define VEC_PREFIX_ceill VEC_PREFIX
+#else
+# define HAVE_VECTOR_ceill 0
+# define VEC_PREFIX_ceill
+#endif
+
+#ifdef __DECL_SIMD_floor
+# define HAVE_VECTOR_floor 1
+# define VEC_PREFIX_floor VEC_PREFIX
+#else
+# define HAVE_VECTOR_floor 0
+# define VEC_PREFIX_floor
+#endif
+
+#ifdef __DECL_SIMD_floorf
+# define HAVE_VECTOR_floorf 1
+# define VEC_PREFIX_floorf VEC_PREFIX
+#else
+# define HAVE_VECTOR_floorf 0
+# define VEC_PREFIX_floorf
+#endif
+
+#ifdef __DECL_SIMD_floorl
+# define HAVE_VECTOR_floorl 1
+# define VEC_PREFIX_floorl VEC_PREFIX
+#else
+# define HAVE_VECTOR_floorl 0
+# define VEC_PREFIX_floorl
+#endif
+
+#ifdef __DECL_SIMD_nearbyint
+# define HAVE_VECTOR_nearbyint 1
+# define VEC_PREFIX_nearbyint VEC_PREFIX
+#else
+# define HAVE_VECTOR_nearbyint 0
+# define VEC_PREFIX_nearbyint
+#endif
+
+#ifdef __DECL_SIMD_nearbyintf
+# define HAVE_VECTOR_nearbyintf 1
+# define VEC_PREFIX_nearbyintf VEC_PREFIX
+#else
+# define HAVE_VECTOR_nearbyintf 0
+# define VEC_PREFIX_nearbyintf
+#endif
+
+#ifdef __DECL_SIMD_nearbyintl
+# define HAVE_VECTOR_nearbyintl 1
+# define VEC_PREFIX_nearbyintl VEC_PREFIX
+#else
+# define HAVE_VECTOR_nearbyintl 0
+# define VEC_PREFIX_nearbyintl
+#endif
+
+#ifdef __DECL_SIMD_rint
+# define HAVE_VECTOR_rint 1
+# define VEC_PREFIX_rint VEC_PREFIX
+#else
+# define HAVE_VECTOR_rint 0
+# define VEC_PREFIX_rint
+#endif
+
+#ifdef __DECL_SIMD_rintf
+# define HAVE_VECTOR_rintf 1
+# define VEC_PREFIX_rintf VEC_PREFIX
+#else
+# define HAVE_VECTOR_rintf 0
+# define VEC_PREFIX_rintf
+#endif
+
+#ifdef __DECL_SIMD_rintl
+# define HAVE_VECTOR_rintl 1
+# define VEC_PREFIX_rintl VEC_PREFIX
+#else
+# define HAVE_VECTOR_rintl 0
+# define VEC_PREFIX_rintl
+#endif
+
+#ifdef __DECL_SIMD_lrint
+# define HAVE_VECTOR_lrint 1
+# define VEC_PREFIX_lrint VEC_PREFIX
+#else
+# define HAVE_VECTOR_lrint 0
+# define VEC_PREFIX_lrint
+#endif
+
+#ifdef __DECL_SIMD_lrintf
+# define HAVE_VECTOR_lrintf 1
+# define VEC_PREFIX_lrintf VEC_PREFIX
+#else
+# define HAVE_VECTOR_lrintf 0
+# define VEC_PREFIX_lrintf
+#endif
+
+#ifdef __DECL_SIMD_lrintl
+# define HAVE_VECTOR_lrintl 1
+# define VEC_PREFIX_lrintl VEC_PREFIX
+#else
+# define HAVE_VECTOR_lrintl 0
+# define VEC_PREFIX_lrintl
+#endif
+
+#ifdef __DECL_SIMD_llrint
+# define HAVE_VECTOR_llrint 1
+# define VEC_PREFIX_llrint VEC_PREFIX
+#else
+# define HAVE_VECTOR_llrint 0
+# define VEC_PREFIX_llrint
+#endif
+
+#ifdef __DECL_SIMD_llrintf
+# define HAVE_VECTOR_llrintf 1
+# define VEC_PREFIX_llrintf VEC_PREFIX
+#else
+# define HAVE_VECTOR_llrintf 0
+# define VEC_PREFIX_llrintf
+#endif
+
+#ifdef __DECL_SIMD_llrintl
+# define HAVE_VECTOR_llrintl 1
+# define VEC_PREFIX_llrintl VEC_PREFIX
+#else
+# define HAVE_VECTOR_llrintl 0
+# define VEC_PREFIX_llrintl
+#endif
+
+#ifdef __DECL_SIMD_round
+# define HAVE_VECTOR_round 1
+# define VEC_PREFIX_round VEC_PREFIX
+#else
+# define HAVE_VECTOR_round 0
+# define VEC_PREFIX_round
+#endif
+
+#ifdef __DECL_SIMD_roundf
+# define HAVE_VECTOR_roundf 1
+# define VEC_PREFIX_roundf VEC_PREFIX
+#else
+# define HAVE_VECTOR_roundf 0
+# define VEC_PREFIX_roundf
+#endif
+
+#ifdef __DECL_SIMD_roundl
+# define HAVE_VECTOR_roundl 1
+# define VEC_PREFIX_roundl VEC_PREFIX
+#else
+# define HAVE_VECTOR_roundl 0
+# define VEC_PREFIX_roundl
+#endif
+
+#ifdef __DECL_SIMD_lround
+# define HAVE_VECTOR_lround 1
+# define VEC_PREFIX_lround VEC_PREFIX
+#else
+# define HAVE_VECTOR_lround 0
+# define VEC_PREFIX_lround
+#endif
+
+#ifdef __DECL_SIMD_lroundf
+# define HAVE_VECTOR_lroundf 1
+# define VEC_PREFIX_lroundf VEC_PREFIX
+#else
+# define HAVE_VECTOR_lroundf 0
+# define VEC_PREFIX_lroundf
+#endif
+
+#ifdef __DECL_SIMD_lroundl
+# define HAVE_VECTOR_lroundl 1
+# define VEC_PREFIX_lroundl VEC_PREFIX
+#else
+# define HAVE_VECTOR_lroundl 0
+# define VEC_PREFIX_lroundl
+#endif
+
+#ifdef __DECL_SIMD_llround
+# define HAVE_VECTOR_llround 1
+# define VEC_PREFIX_llround VEC_PREFIX
+#else
+# define HAVE_VECTOR_llround 0
+# define VEC_PREFIX_llround
+#endif
+
+#ifdef __DECL_SIMD_llroundf
+# define HAVE_VECTOR_llroundf 1
+# define VEC_PREFIX_llroundf VEC_PREFIX
+#else
+# define HAVE_VECTOR_llroundf 0
+# define VEC_PREFIX_llroundf
+#endif
+
+#ifdef __DECL_SIMD_llroundl
+# define HAVE_VECTOR_llroundl 1
+# define VEC_PREFIX_llroundl VEC_PREFIX
+#else
+# define HAVE_VECTOR_llroundl 0
+# define VEC_PREFIX_llroundl
+#endif
+
+#ifdef __DECL_SIMD_trunc
+# define HAVE_VECTOR_trunc 1
+# define VEC_PREFIX_trunc VEC_PREFIX
+#else
+# define HAVE_VECTOR_trunc 0
+# define VEC_PREFIX_trunc
+#endif
+
+#ifdef __DECL_SIMD_truncf
+# define HAVE_VECTOR_truncf 1
+# define VEC_PREFIX_truncf VEC_PREFIX
+#else
+# define HAVE_VECTOR_truncf 0
+# define VEC_PREFIX_truncf
+#endif
+
+#ifdef __DECL_SIMD_truncl
+# define HAVE_VECTOR_truncl 1
+# define VEC_PREFIX_truncl VEC_PREFIX
+#else
+# define HAVE_VECTOR_truncl 0
+# define VEC_PREFIX_truncl
+#endif
+
+#ifdef __DECL_SIMD_drem
+# define HAVE_VECTOR_drem 1
+# define VEC_PREFIX_drem VEC_PREFIX
+#else
+# define HAVE_VECTOR_drem 0
+# define VEC_PREFIX_drem
+#endif
+
+#ifdef __DECL_SIMD_dremf
+# define HAVE_VECTOR_dremf 1
+# define VEC_PREFIX_dremf VEC_PREFIX
+#else
+# define HAVE_VECTOR_dremf 0
+# define VEC_PREFIX_dremf
+#endif
+
+#ifdef __DECL_SIMD_dreml
+# define HAVE_VECTOR_dreml 1
+# define VEC_PREFIX_dreml VEC_PREFIX
+#else
+# define HAVE_VECTOR_dreml 0
+# define VEC_PREFIX_dreml
+#endif
+
+#ifdef __DECL_SIMD_fmod
+# define HAVE_VECTOR_fmod 1
+# define VEC_PREFIX_fmod VEC_PREFIX
+#else
+# define HAVE_VECTOR_fmod 0
+# define VEC_PREFIX_fmod
+#endif
+
+#ifdef __DECL_SIMD_fmodf
+# define HAVE_VECTOR_fmodf 1
+# define VEC_PREFIX_fmodf VEC_PREFIX
+#else
+# define HAVE_VECTOR_fmodf 0
+# define VEC_PREFIX_fmodf
+#endif
+
+#ifdef __DECL_SIMD_fmodl
+# define HAVE_VECTOR_fmodl 1
+# define VEC_PREFIX_fmodl VEC_PREFIX
+#else
+# define HAVE_VECTOR_fmodl 0
+# define VEC_PREFIX_fmodl
+#endif
+
+#ifdef __DECL_SIMD_remainder
+# define HAVE_VECTOR_remainder 1
+# define VEC_PREFIX_remainder VEC_PREFIX
+#else
+# define HAVE_VECTOR_remainder 0
+# define VEC_PREFIX_remainder
+#endif
+
+#ifdef __DECL_SIMD_remainderf
+# define HAVE_VECTOR_remainderf 1
+# define VEC_PREFIX_remainderf VEC_PREFIX
+#else
+# define HAVE_VECTOR_remainderf 0
+# define VEC_PREFIX_remainderf
+#endif
+
+#ifdef __DECL_SIMD_remainderl
+# define HAVE_VECTOR_remainderl 1
+# define VEC_PREFIX_remainderl VEC_PREFIX
+#else
+# define HAVE_VECTOR_remainderl 0
+# define VEC_PREFIX_remainderl
+#endif
+
+#ifdef __DECL_SIMD_remquo
+# define HAVE_VECTOR_remquo 1
+# define VEC_PREFIX_remquo VEC_PREFIX
+#else
+# define HAVE_VECTOR_remquo 0
+# define VEC_PREFIX_remquo
+#endif
+
+#ifdef __DECL_SIMD_remquof
+# define HAVE_VECTOR_remquof 1
+# define VEC_PREFIX_remquof VEC_PREFIX
+#else
+# define HAVE_VECTOR_remquof 0
+# define VEC_PREFIX_remquof
+#endif
+
+#ifdef __DECL_SIMD_remquol
+# define HAVE_VECTOR_remquol 1
+# define VEC_PREFIX_remquol VEC_PREFIX
+#else
+# define HAVE_VECTOR_remquol 0
+# define VEC_PREFIX_remquol
+#endif
+
+#ifdef __DECL_SIMD_copysign
+# define HAVE_VECTOR_copysign 1
+# define VEC_PREFIX_copysign VEC_PREFIX
+#else
+# define HAVE_VECTOR_copysign 0
+# define VEC_PREFIX_copysign
+#endif
+
+#ifdef __DECL_SIMD_copysignf
+# define HAVE_VECTOR_copysignf 1
+# define VEC_PREFIX_copysignf VEC_PREFIX
+#else
+# define HAVE_VECTOR_copysignf 0
+# define VEC_PREFIX_copysignf
+#endif
+
+#ifdef __DECL_SIMD_copysignl
+# define HAVE_VECTOR_copysignl 1
+# define VEC_PREFIX_copysignl VEC_PREFIX
+#else
+# define HAVE_VECTOR_copysignl 0
+# define VEC_PREFIX_copysignl
+#endif
+
+#ifdef __DECL_SIMD_nextafter
+# define HAVE_VECTOR_nextafter 1
+# define VEC_PREFIX_nextafter VEC_PREFIX
+#else
+# define HAVE_VECTOR_nextafter 0
+# define VEC_PREFIX_nextafter
+#endif
+
+#ifdef __DECL_SIMD_nextafterf
+# define HAVE_VECTOR_nextafterf 1
+# define VEC_PREFIX_nextafterf VEC_PREFIX
+#else
+# define HAVE_VECTOR_nextafterf 0
+# define VEC_PREFIX_nextafterf
+#endif
+
+#ifdef __DECL_SIMD_nextafterl
+# define HAVE_VECTOR_nextafterl 1
+# define VEC_PREFIX_nextafterl VEC_PREFIX
+#else
+# define HAVE_VECTOR_nextafterl 0
+# define VEC_PREFIX_nextafterl
+#endif
+
+#ifdef __DECL_SIMD_nexttoward
+# define HAVE_VECTOR_nexttoward 1
+# define VEC_PREFIX_nexttoward VEC_PREFIX
+#else
+# define HAVE_VECTOR_nexttoward 0
+# define VEC_PREFIX_nexttoward
+#endif
+
+#ifdef __DECL_SIMD_nexttowardf
+# define HAVE_VECTOR_nexttowardf 1
+# define VEC_PREFIX_nexttowardf VEC_PREFIX
+#else
+# define HAVE_VECTOR_nexttowardf 0
+# define VEC_PREFIX_nexttowardf
+#endif
+
+#ifdef __DECL_SIMD_nexttowardl
+# define HAVE_VECTOR_nexttowardl 1
+# define VEC_PREFIX_nexttowardl VEC_PREFIX
+#else
+# define HAVE_VECTOR_nexttowardl 0
+# define VEC_PREFIX_nexttowardl
+#endif
+
+#ifdef __DECL_SIMD_fdim
+# define HAVE_VECTOR_fdim 1
+# define VEC_PREFIX_fdim VEC_PREFIX
+#else
+# define HAVE_VECTOR_fdim 0
+# define VEC_PREFIX_fdim
+#endif
+
+#ifdef __DECL_SIMD_fdimf
+# define HAVE_VECTOR_fdimf 1
+# define VEC_PREFIX_fdimf VEC_PREFIX
+#else
+# define HAVE_VECTOR_fdimf 0
+# define VEC_PREFIX_fdimf
+#endif
+
+#ifdef __DECL_SIMD_fdiml
+# define HAVE_VECTOR_fdiml 1
+# define VEC_PREFIX_fdiml VEC_PREFIX
+#else
+# define HAVE_VECTOR_fdiml 0
+# define VEC_PREFIX_fdiml
+#endif
+
+#ifdef __DECL_SIMD_fmax
+# define HAVE_VECTOR_fmax 1
+# define VEC_PREFIX_fmax VEC_PREFIX
+#else
+# define HAVE_VECTOR_fmax 0
+# define VEC_PREFIX_fmax
+#endif
+
+#ifdef __DECL_SIMD_fmaxf
+# define HAVE_VECTOR_fmaxf 1
+# define VEC_PREFIX_fmaxf VEC_PREFIX
+#else
+# define HAVE_VECTOR_fmaxf 0
+# define VEC_PREFIX_fmaxf
+#endif
+
+#ifdef __DECL_SIMD_fmaxl
+# define HAVE_VECTOR_fmaxl 1
+# define VEC_PREFIX_fmaxl VEC_PREFIX
+#else
+# define HAVE_VECTOR_fmaxl 0
+# define VEC_PREFIX_fmaxl
+#endif
+
+#ifdef __DECL_SIMD_fmin
+# define HAVE_VECTOR_fmin 1
+# define VEC_PREFIX_fmin VEC_PREFIX
+#else
+# define HAVE_VECTOR_fmin 0
+# define VEC_PREFIX_fmin
+#endif
+
+#ifdef __DECL_SIMD_fminf
+# define HAVE_VECTOR_fminf 1
+# define VEC_PREFIX_fminf VEC_PREFIX
+#else
+# define HAVE_VECTOR_fminf 0
+# define VEC_PREFIX_fminf
+#endif
+
+#ifdef __DECL_SIMD_fminl
+# define HAVE_VECTOR_fminl 1
+# define VEC_PREFIX_fminl VEC_PREFIX
+#else
+# define HAVE_VECTOR_fminl 0
+# define VEC_PREFIX_fminl
+#endif
+
+#ifdef __DECL_SIMD_fma
+# define HAVE_VECTOR_fma 1
+# define VEC_PREFIX_fma VEC_PREFIX
+#else
+# define HAVE_VECTOR_fma 0
+# define VEC_PREFIX_fma
+#endif
+
+#ifdef __DECL_SIMD_fmaf
+# define HAVE_VECTOR_fmaf 1
+# define VEC_PREFIX_fmaf VEC_PREFIX
+#else
+# define HAVE_VECTOR_fmaf 0
+# define VEC_PREFIX_fmaf
+#endif
+
+#ifdef __DECL_SIMD_fmal
+# define HAVE_VECTOR_fmal 1
+# define VEC_PREFIX_fmal VEC_PREFIX
+#else
+# define HAVE_VECTOR_fmal 0
+# define VEC_PREFIX_fmal
+#endif
+
+#ifdef __DECL_SIMD_isgreater
+# define HAVE_VECTOR_isgreater 1
+# define VEC_PREFIX_isgreater VEC_PREFIX
+#else
+# define HAVE_VECTOR_isgreater 0
+# define VEC_PREFIX_isgreater
+#endif
+
+#ifdef __DECL_SIMD_isgreaterf
+# define HAVE_VECTOR_isgreaterf 1
+# define VEC_PREFIX_isgreaterf VEC_PREFIX
+#else
+# define HAVE_VECTOR_isgreaterf 0
+# define VEC_PREFIX_isgreaterf
+#endif
+
+#ifdef __DECL_SIMD_isgreaterl
+# define HAVE_VECTOR_isgreaterl 1
+# define VEC_PREFIX_isgreaterl VEC_PREFIX
+#else
+# define HAVE_VECTOR_isgreaterl 0
+# define VEC_PREFIX_isgreaterl
+#endif
+
+#ifdef __DECL_SIMD_isgreaterequal
+# define HAVE_VECTOR_isgreaterequal 1
+# define VEC_PREFIX_isgreaterequal VEC_PREFIX
+#else
+# define HAVE_VECTOR_isgreaterequal 0
+# define VEC_PREFIX_isgreaterequal
+#endif
+
+#ifdef __DECL_SIMD_isgreaterequalf
+# define HAVE_VECTOR_isgreaterequalf 1
+# define VEC_PREFIX_isgreaterequalf VEC_PREFIX
+#else
+# define HAVE_VECTOR_isgreaterequalf 0
+# define VEC_PREFIX_isgreaterequalf
+#endif
+
+#ifdef __DECL_SIMD_isgreaterequall
+# define HAVE_VECTOR_isgreaterequall 1
+# define VEC_PREFIX_isgreaterequall VEC_PREFIX
+#else
+# define HAVE_VECTOR_isgreaterequall 0
+# define VEC_PREFIX_isgreaterequall
+#endif
+
+#ifdef __DECL_SIMD_isless
+# define HAVE_VECTOR_isless 1
+# define VEC_PREFIX_isless VEC_PREFIX
+#else
+# define HAVE_VECTOR_isless 0
+# define VEC_PREFIX_isless
+#endif
+
+#ifdef __DECL_SIMD_islessf
+# define HAVE_VECTOR_islessf 1
+# define VEC_PREFIX_islessf VEC_PREFIX
+#else
+# define HAVE_VECTOR_islessf 0
+# define VEC_PREFIX_islessf
+#endif
+
+#ifdef __DECL_SIMD_islessl
+# define HAVE_VECTOR_islessl 1
+# define VEC_PREFIX_islessl VEC_PREFIX
+#else
+# define HAVE_VECTOR_islessl 0
+# define VEC_PREFIX_islessl
+#endif
+
+#ifdef __DECL_SIMD_islessequal
+# define HAVE_VECTOR_islessequal 1
+# define VEC_PREFIX_islessequal VEC_PREFIX
+#else
+# define HAVE_VECTOR_islessequal 0
+# define VEC_PREFIX_islessequal
+#endif
+
+#ifdef __DECL_SIMD_islessequalf
+# define HAVE_VECTOR_islessequalf 1
+# define VEC_PREFIX_islessequalf VEC_PREFIX
+#else
+# define HAVE_VECTOR_islessequalf 0
+# define VEC_PREFIX_islessequalf
+#endif
+
+#ifdef __DECL_SIMD_islessequall
+# define HAVE_VECTOR_islessequall 1
+# define VEC_PREFIX_islessequall VEC_PREFIX
+#else
+# define HAVE_VECTOR_islessequall 0
+# define VEC_PREFIX_islessequall
+#endif
+
+#ifdef __DECL_SIMD_islessgreater
+# define HAVE_VECTOR_islessgreater 1
+# define VEC_PREFIX_islessgreater VEC_PREFIX
+#else
+# define HAVE_VECTOR_islessgreater 0
+# define VEC_PREFIX_islessgreater
+#endif
+
+#ifdef __DECL_SIMD_islessgreaterf
+# define HAVE_VECTOR_islessgreaterf 1
+# define VEC_PREFIX_islessgreaterf VEC_PREFIX
+#else
+# define HAVE_VECTOR_islessgreaterf 0
+# define VEC_PREFIX_islessgreaterf
+#endif
+
+#ifdef __DECL_SIMD_islessgreaterl
+# define HAVE_VECTOR_islessgreaterl 1
+# define VEC_PREFIX_islessgreaterl VEC_PREFIX
+#else
+# define HAVE_VECTOR_islessgreaterl 0
+# define VEC_PREFIX_islessgreaterl
+#endif
+
+#ifdef __DECL_SIMD_isunordered
+# define HAVE_VECTOR_isunordered 1
+# define VEC_PREFIX_isunordered VEC_PREFIX
+#else
+# define HAVE_VECTOR_isunordered 0
+# define VEC_PREFIX_isunordered
+#endif
+
+#ifdef __DECL_SIMD_isunorderedf
+# define HAVE_VECTOR_isunorderedf 1
+# define VEC_PREFIX_isunorderedf VEC_PREFIX
+#else
+# define HAVE_VECTOR_isunorderedf 0
+# define VEC_PREFIX_isunorderedf
+#endif
+
+#ifdef __DECL_SIMD_isunorderedl
+# define HAVE_VECTOR_isunorderedl 1
+# define VEC_PREFIX_isunorderedl VEC_PREFIX
+#else
+# define HAVE_VECTOR_isunorderedl 0
+# define VEC_PREFIX_isunorderedl
+#endif
+
+#ifdef __DECL_SIMD_cabs
+# define HAVE_VECTOR_cabs 1
+# define VEC_PREFIX_cabs VEC_PREFIX
+#else
+# define HAVE_VECTOR_cabs 0
+# define VEC_PREFIX_cabs
+#endif
+
+#ifdef __DECL_SIMD_cabsf
+# define HAVE_VECTOR_cabsf 1
+# define VEC_PREFIX_cabsf VEC_PREFIX
+#else
+# define HAVE_VECTOR_cabsf 0
+# define VEC_PREFIX_cabsf
+#endif
+
+#ifdef __DECL_SIMD_cabsl
+# define HAVE_VECTOR_cabsl 1
+# define VEC_PREFIX_cabsl VEC_PREFIX
+#else
+# define HAVE_VECTOR_cabsl 0
+# define VEC_PREFIX_cabsl
+#endif
+
+#ifdef __DECL_SIMD_cacos
+# define HAVE_VECTOR_cacos 1
+# define VEC_PREFIX_cacos VEC_PREFIX
+#else
+# define HAVE_VECTOR_cacos 0
+# define VEC_PREFIX_cacos
+#endif
+
+#ifdef __DECL_SIMD_cacosf
+# define HAVE_VECTOR_cacosf 1
+# define VEC_PREFIX_cacosf VEC_PREFIX
+#else
+# define HAVE_VECTOR_cacosf 0
+# define VEC_PREFIX_cacosf
+#endif
+
+#ifdef __DECL_SIMD_cacosl
+# define HAVE_VECTOR_cacosl 1
+# define VEC_PREFIX_cacosl VEC_PREFIX
+#else
+# define HAVE_VECTOR_cacosl 0
+# define VEC_PREFIX_cacosl
+#endif
+
+#ifdef __DECL_SIMD_cacosh
+# define HAVE_VECTOR_cacosh 1
+# define VEC_PREFIX_cacosh VEC_PREFIX
+#else
+# define HAVE_VECTOR_cacosh 0
+# define VEC_PREFIX_cacosh
+#endif
+
+#ifdef __DECL_SIMD_cacoshf
+# define HAVE_VECTOR_cacoshf 1
+# define VEC_PREFIX_cacoshf VEC_PREFIX
+#else
+# define HAVE_VECTOR_cacoshf 0
+# define VEC_PREFIX_cacoshf
+#endif
+
+#ifdef __DECL_SIMD_cacoshl
+# define HAVE_VECTOR_cacoshl 1
+# define VEC_PREFIX_cacoshl VEC_PREFIX
+#else
+# define HAVE_VECTOR_cacoshl 0
+# define VEC_PREFIX_cacoshl
+#endif
+
+#ifdef __DECL_SIMD_carg
+# define HAVE_VECTOR_carg 1
+# define VEC_PREFIX_carg VEC_PREFIX
+#else
+# define HAVE_VECTOR_carg 0
+# define VEC_PREFIX_carg
+#endif
+
+#ifdef __DECL_SIMD_cargf
+# define HAVE_VECTOR_cargf 1
+# define VEC_PREFIX_cargf VEC_PREFIX
+#else
+# define HAVE_VECTOR_cargf 0
+# define VEC_PREFIX_cargf
+#endif
+
+#ifdef __DECL_SIMD_cargl
+# define HAVE_VECTOR_cargl 1
+# define VEC_PREFIX_cargl VEC_PREFIX
+#else
+# define HAVE_VECTOR_cargl 0
+# define VEC_PREFIX_cargl
+#endif
+
+#ifdef __DECL_SIMD_casin
+# define HAVE_VECTOR_casin 1
+# define VEC_PREFIX_casin VEC_PREFIX
+#else
+# define HAVE_VECTOR_casin 0
+# define VEC_PREFIX_casin
+#endif
+
+#ifdef __DECL_SIMD_casinf
+# define HAVE_VECTOR_casinf 1
+# define VEC_PREFIX_casinf VEC_PREFIX
+#else
+# define HAVE_VECTOR_casinf 0
+# define VEC_PREFIX_casinf
+#endif
+
+#ifdef __DECL_SIMD_casinl
+# define HAVE_VECTOR_casinl 1
+# define VEC_PREFIX_casinl VEC_PREFIX
+#else
+# define HAVE_VECTOR_casinl 0
+# define VEC_PREFIX_casinl
+#endif
+
+#ifdef __DECL_SIMD_casinh
+# define HAVE_VECTOR_casinh 1
+# define VEC_PREFIX_casinh VEC_PREFIX
+#else
+# define HAVE_VECTOR_casinh 0
+# define VEC_PREFIX_casinh
+#endif
+
+#ifdef __DECL_SIMD_casinhf
+# define HAVE_VECTOR_casinhf 1
+# define VEC_PREFIX_casinhf VEC_PREFIX
+#else
+# define HAVE_VECTOR_casinhf 0
+# define VEC_PREFIX_casinhf
+#endif
+
+#ifdef __DECL_SIMD_casinhl
+# define HAVE_VECTOR_casinhl 1
+# define VEC_PREFIX_casinhl VEC_PREFIX
+#else
+# define HAVE_VECTOR_casinhl 0
+# define VEC_PREFIX_casinhl
+#endif
+
+#ifdef __DECL_SIMD_catan
+# define HAVE_VECTOR_catan 1
+# define VEC_PREFIX_catan VEC_PREFIX
+#else
+# define HAVE_VECTOR_catan 0
+# define VEC_PREFIX_catan
+#endif
+
+#ifdef __DECL_SIMD_catanf
+# define HAVE_VECTOR_catanf 1
+# define VEC_PREFIX_catanf VEC_PREFIX
+#else
+# define HAVE_VECTOR_catanf 0
+# define VEC_PREFIX_catanf
+#endif
+
+#ifdef __DECL_SIMD_catanl
+# define HAVE_VECTOR_catanl 1
+# define VEC_PREFIX_catanl VEC_PREFIX
+#else
+# define HAVE_VECTOR_catanl 0
+# define VEC_PREFIX_catanl
+#endif
+
+#ifdef __DECL_SIMD_catanh
+# define HAVE_VECTOR_catanh 1
+# define VEC_PREFIX_catanh VEC_PREFIX
+#else
+# define HAVE_VECTOR_catanh 0
+# define VEC_PREFIX_catanh
+#endif
+
+#ifdef __DECL_SIMD_catanhf
+# define HAVE_VECTOR_catanhf 1
+# define VEC_PREFIX_catanhf VEC_PREFIX
+#else
+# define HAVE_VECTOR_catanhf 0
+# define VEC_PREFIX_catanhf
+#endif
+
+#ifdef __DECL_SIMD_catanhl
+# define HAVE_VECTOR_catanhl 1
+# define VEC_PREFIX_catanhl VEC_PREFIX
+#else
+# define HAVE_VECTOR_catanhl 0
+# define VEC_PREFIX_catanhl
+#endif
+
+#ifdef __DECL_SIMD_ccos
+# define HAVE_VECTOR_ccos 1
+# define VEC_PREFIX_ccos VEC_PREFIX
+#else
+# define HAVE_VECTOR_ccos 0
+# define VEC_PREFIX_ccos
+#endif
+
+#ifdef __DECL_SIMD_ccosf
+# define HAVE_VECTOR_ccosf 1
+# define VEC_PREFIX_ccosf VEC_PREFIX
+#else
+# define HAVE_VECTOR_ccosf 0
+# define VEC_PREFIX_ccosf
+#endif
+
+#ifdef __DECL_SIMD_ccosl
+# define HAVE_VECTOR_ccosl 1
+# define VEC_PREFIX_ccosl VEC_PREFIX
+#else
+# define HAVE_VECTOR_ccosl 0
+# define VEC_PREFIX_ccosl
+#endif
+
+#ifdef __DECL_SIMD_ccosh
+# define HAVE_VECTOR_ccosh 1
+# define VEC_PREFIX_ccosh VEC_PREFIX
+#else
+# define HAVE_VECTOR_ccosh 0
+# define VEC_PREFIX_ccosh
+#endif
+
+#ifdef __DECL_SIMD_ccoshf
+# define HAVE_VECTOR_ccoshf 1
+# define VEC_PREFIX_ccoshf VEC_PREFIX
+#else
+# define HAVE_VECTOR_ccoshf 0
+# define VEC_PREFIX_ccoshf
+#endif
+
+#ifdef __DECL_SIMD_ccoshl
+# define HAVE_VECTOR_ccoshl 1
+# define VEC_PREFIX_ccoshl VEC_PREFIX
+#else
+# define HAVE_VECTOR_ccoshl 0
+# define VEC_PREFIX_ccoshl
+#endif
+
+#ifdef __DECL_SIMD_cexp
+# define HAVE_VECTOR_cexp 1
+# define VEC_PREFIX_cexp VEC_PREFIX
+#else
+# define HAVE_VECTOR_cexp 0
+# define VEC_PREFIX_cexp
+#endif
+
+#ifdef __DECL_SIMD_cexpf
+# define HAVE_VECTOR_cexpf 1
+# define VEC_PREFIX_cexpf VEC_PREFIX
+#else
+# define HAVE_VECTOR_cexpf 0
+# define VEC_PREFIX_cexpf
+#endif
+
+#ifdef __DECL_SIMD_cexpl
+# define HAVE_VECTOR_cexpl 1
+# define VEC_PREFIX_cexpl VEC_PREFIX
+#else
+# define HAVE_VECTOR_cexpl 0
+# define VEC_PREFIX_cexpl
+#endif
+
+#ifdef __DECL_SIMD_cimag
+# define HAVE_VECTOR_cimag 1
+# define VEC_PREFIX_cimag VEC_PREFIX
+#else
+# define HAVE_VECTOR_cimag 0
+# define VEC_PREFIX_cimag
+#endif
+
+#ifdef __DECL_SIMD_cimagf
+# define HAVE_VECTOR_cimagf 1
+# define VEC_PREFIX_cimagf VEC_PREFIX
+#else
+# define HAVE_VECTOR_cimagf 0
+# define VEC_PREFIX_cimagf
+#endif
+
+#ifdef __DECL_SIMD_cimagl
+# define HAVE_VECTOR_cimagl 1
+# define VEC_PREFIX_cimagl VEC_PREFIX
+#else
+# define HAVE_VECTOR_cimagl 0
+# define VEC_PREFIX_cimagl
+#endif
+
+#ifdef __DECL_SIMD_clog10
+# define HAVE_VECTOR_clog10 1
+# define VEC_PREFIX_clog10 VEC_PREFIX
+#else
+# define HAVE_VECTOR_clog10 0
+# define VEC_PREFIX_clog10
+#endif
+
+#ifdef __DECL_SIMD_clog10f
+# define HAVE_VECTOR_clog10f 1
+# define VEC_PREFIX_clog10f VEC_PREFIX
+#else
+# define HAVE_VECTOR_clog10f 0
+# define VEC_PREFIX_clog10f
+#endif
+
+#ifdef __DECL_SIMD_clog10l
+# define HAVE_VECTOR_clog10l 1
+# define VEC_PREFIX_clog10l VEC_PREFIX
+#else
+# define HAVE_VECTOR_clog10l 0
+# define VEC_PREFIX_clog10l
+#endif
+
+#ifdef __DECL_SIMD_clog
+# define HAVE_VECTOR_clog 1
+# define VEC_PREFIX_clog VEC_PREFIX
+#else
+# define HAVE_VECTOR_clog 0
+# define VEC_PREFIX_clog
+#endif
+
+#ifdef __DECL_SIMD_clogf
+# define HAVE_VECTOR_clogf 1
+# define VEC_PREFIX_clogf VEC_PREFIX
+#else
+# define HAVE_VECTOR_clogf 0
+# define VEC_PREFIX_clogf
+#endif
+
+#ifdef __DECL_SIMD_clogl
+# define HAVE_VECTOR_clogl 1
+# define VEC_PREFIX_clogl VEC_PREFIX
+#else
+# define HAVE_VECTOR_clogl 0
+# define VEC_PREFIX_clogl
+#endif
+
+#ifdef __DECL_SIMD_conj
+# define HAVE_VECTOR_conj 1
+# define VEC_PREFIX_conj VEC_PREFIX
+#else
+# define HAVE_VECTOR_conj 0
+# define VEC_PREFIX_conj
+#endif
+
+#ifdef __DECL_SIMD_conjf
+# define HAVE_VECTOR_conjf 1
+# define VEC_PREFIX_conjf VEC_PREFIX
+#else
+# define HAVE_VECTOR_conjf 0
+# define VEC_PREFIX_conjf
+#endif
+
+#ifdef __DECL_SIMD_conjl
+# define HAVE_VECTOR_conjl 1
+# define VEC_PREFIX_conjl VEC_PREFIX
+#else
+# define HAVE_VECTOR_conjl 0
+# define VEC_PREFIX_conjl
+#endif
+
+#ifdef __DECL_SIMD_cpow
+# define HAVE_VECTOR_cpow 1
+# define VEC_PREFIX_cpow VEC_PREFIX
+#else
+# define HAVE_VECTOR_cpow 0
+# define VEC_PREFIX_cpow
+#endif
+
+#ifdef __DECL_SIMD_cpowf
+# define HAVE_VECTOR_cpowf 1
+# define VEC_PREFIX_cpowf VEC_PREFIX
+#else
+# define HAVE_VECTOR_cpowf 0
+# define VEC_PREFIX_cpowf
+#endif
+
+#ifdef __DECL_SIMD_cpowl
+# define HAVE_VECTOR_cpowl 1
+# define VEC_PREFIX_cpowl VEC_PREFIX
+#else
+# define HAVE_VECTOR_cpowl 0
+# define VEC_PREFIX_cpowl
+#endif
+
+#ifdef __DECL_SIMD_cproj
+# define HAVE_VECTOR_cproj 1
+# define VEC_PREFIX_cproj VEC_PREFIX
+#else
+# define HAVE_VECTOR_cproj 0
+# define VEC_PREFIX_cproj
+#endif
+
+#ifdef __DECL_SIMD_cprojf
+# define HAVE_VECTOR_cprojf 1
+# define VEC_PREFIX_cprojf VEC_PREFIX
+#else
+# define HAVE_VECTOR_cprojf 0
+# define VEC_PREFIX_cprojf
+#endif
+
+#ifdef __DECL_SIMD_cprojl
+# define HAVE_VECTOR_cprojl 1
+# define VEC_PREFIX_cprojl VEC_PREFIX
+#else
+# define HAVE_VECTOR_cprojl 0
+# define VEC_PREFIX_cprojl
+#endif
+
+#ifdef __DECL_SIMD_creal
+# define HAVE_VECTOR_creal 1
+# define VEC_PREFIX_creal VEC_PREFIX
+#else
+# define HAVE_VECTOR_creal 0
+# define VEC_PREFIX_creal
+#endif
+
+#ifdef __DECL_SIMD_crealf
+# define HAVE_VECTOR_crealf 1
+# define VEC_PREFIX_crealf VEC_PREFIX
+#else
+# define HAVE_VECTOR_crealf 0
+# define VEC_PREFIX_crealf
+#endif
+
+#ifdef __DECL_SIMD_creall
+# define HAVE_VECTOR_creall 1
+# define VEC_PREFIX_creall VEC_PREFIX
+#else
+# define HAVE_VECTOR_creall 0
+# define VEC_PREFIX_creall
+#endif
+
+#ifdef __DECL_SIMD_csin
+# define HAVE_VECTOR_csin 1
+# define VEC_PREFIX_csin VEC_PREFIX
+#else
+# define HAVE_VECTOR_csin 0
+# define VEC_PREFIX_csin
+#endif
+
+#ifdef __DECL_SIMD_csinf
+# define HAVE_VECTOR_csinf 1
+# define VEC_PREFIX_csinf VEC_PREFIX
+#else
+# define HAVE_VECTOR_csinf 0
+# define VEC_PREFIX_csinf
+#endif
+
+#ifdef __DECL_SIMD_csinl
+# define HAVE_VECTOR_csinl 1
+# define VEC_PREFIX_csinl VEC_PREFIX
+#else
+# define HAVE_VECTOR_csinl 0
+# define VEC_PREFIX_csinl
+#endif
+
+#ifdef __DECL_SIMD_csinh
+# define HAVE_VECTOR_csinh 1
+# define VEC_PREFIX_csinh VEC_PREFIX
+#else
+# define HAVE_VECTOR_csinh 0
+# define VEC_PREFIX_csinh
+#endif
+
+#ifdef __DECL_SIMD_csinhf
+# define HAVE_VECTOR_csinhf 1
+# define VEC_PREFIX_csinhf VEC_PREFIX
+#else
+# define HAVE_VECTOR_csinhf 0
+# define VEC_PREFIX_csinhf
+#endif
+
+#ifdef __DECL_SIMD_csinhl
+# define HAVE_VECTOR_csinhl 1
+# define VEC_PREFIX_csinhl VEC_PREFIX
+#else
+# define HAVE_VECTOR_csinhl 0
+# define VEC_PREFIX_csinhl
+#endif
+
+#ifdef __DECL_SIMD_csqrt
+# define HAVE_VECTOR_csqrt 1
+# define VEC_PREFIX_csqrt VEC_PREFIX
+#else
+# define HAVE_VECTOR_csqrt 0
+# define VEC_PREFIX_csqrt
+#endif
+
+#ifdef __DECL_SIMD_csqrtf
+# define HAVE_VECTOR_csqrtf 1
+# define VEC_PREFIX_csqrtf VEC_PREFIX
+#else
+# define HAVE_VECTOR_csqrtf 0
+# define VEC_PREFIX_csqrtf
+#endif
+
+#ifdef __DECL_SIMD_csqrtl
+# define HAVE_VECTOR_csqrtl 1
+# define VEC_PREFIX_csqrtl VEC_PREFIX
+#else
+# define HAVE_VECTOR_csqrtl 0
+# define VEC_PREFIX_csqrtl
+#endif
+
+#ifdef __DECL_SIMD_ctan
+# define HAVE_VECTOR_ctan 1
+# define VEC_PREFIX_ctan VEC_PREFIX
+#else
+# define HAVE_VECTOR_ctan 0
+# define VEC_PREFIX_ctan
+#endif
+
+#ifdef __DECL_SIMD_ctanf
+# define HAVE_VECTOR_ctanf 1
+# define VEC_PREFIX_ctanf VEC_PREFIX
+#else
+# define HAVE_VECTOR_ctanf 0
+# define VEC_PREFIX_ctanf
+#endif
+
+#ifdef __DECL_SIMD_ctanl
+# define HAVE_VECTOR_ctanl 1
+# define VEC_PREFIX_ctanl VEC_PREFIX
+#else
+# define HAVE_VECTOR_ctanl 0
+# define VEC_PREFIX_ctanl
+#endif
+
+#ifdef __DECL_SIMD_ctanh
+# define HAVE_VECTOR_ctanh 1
+# define VEC_PREFIX_ctanh VEC_PREFIX
+#else
+# define HAVE_VECTOR_ctanh 0
+# define VEC_PREFIX_ctanh
+#endif
+
+#ifdef __DECL_SIMD_ctanhf
+# define HAVE_VECTOR_ctanhf 1
+# define VEC_PREFIX_ctanhf VEC_PREFIX
+#else
+# define HAVE_VECTOR_ctanhf 0
+# define VEC_PREFIX_ctanhf
+#endif
+
+#ifdef __DECL_SIMD_ctanhl
+# define HAVE_VECTOR_ctanhl 1
+# define VEC_PREFIX_ctanhl VEC_PREFIX
+#else
+# define HAVE_VECTOR_ctanhl 0
+# define VEC_PREFIX_ctanhl
+#endif
+
+#ifdef __DECL_SIMD_j0
+# define HAVE_VECTOR_j0 1
+# define VEC_PREFIX_j0 VEC_PREFIX
+#else
+# define HAVE_VECTOR_j0 0
+# define VEC_PREFIX_j0
+#endif
+
+#ifdef __DECL_SIMD_j0f
+# define HAVE_VECTOR_j0f 1
+# define VEC_PREFIX_j0f VEC_PREFIX
+#else
+# define HAVE_VECTOR_j0f 0
+# define VEC_PREFIX_j0f
+#endif
+
+#ifdef __DECL_SIMD_j0l
+# define HAVE_VECTOR_j0l 1
+# define VEC_PREFIX_j0l VEC_PREFIX
+#else
+# define HAVE_VECTOR_j0l 0
+# define VEC_PREFIX_j0l
+#endif
+
+#ifdef __DECL_SIMD_j1
+# define HAVE_VECTOR_j1 1
+# define VEC_PREFIX_j1 VEC_PREFIX
+#else
+# define HAVE_VECTOR_j1 0
+# define VEC_PREFIX_j1
+#endif
+
+#ifdef __DECL_SIMD_j1f
+# define HAVE_VECTOR_j1f 1
+# define VEC_PREFIX_j1f VEC_PREFIX
+#else
+# define HAVE_VECTOR_j1f 0
+# define VEC_PREFIX_j1f
+#endif
+
+#ifdef __DECL_SIMD_j1l
+# define HAVE_VECTOR_j1l 1
+# define VEC_PREFIX_j1l VEC_PREFIX
+#else
+# define HAVE_VECTOR_j1l 0
+# define VEC_PREFIX_j1l
+#endif
+
+#ifdef __DECL_SIMD_jn
+# define HAVE_VECTOR_jn 1
+# define VEC_PREFIX_jn VEC_PREFIX
+#else
+# define HAVE_VECTOR_jn 0
+# define VEC_PREFIX_jn
+#endif
+
+#ifdef __DECL_SIMD_jnf
+# define HAVE_VECTOR_jnf 1
+# define VEC_PREFIX_jnf VEC_PREFIX
+#else
+# define HAVE_VECTOR_jnf 0
+# define VEC_PREFIX_jnf
+#endif
+
+#ifdef __DECL_SIMD_jnl
+# define HAVE_VECTOR_jnl 1
+# define VEC_PREFIX_jnl VEC_PREFIX
+#else
+# define HAVE_VECTOR_jnl 0
+# define VEC_PREFIX_jnl
+#endif
+
+#ifdef __DECL_SIMD_y0
+# define HAVE_VECTOR_y0 1
+# define VEC_PREFIX_y0 VEC_PREFIX
+#else
+# define HAVE_VECTOR_y0 0
+# define VEC_PREFIX_y0
+#endif
+
+#ifdef __DECL_SIMD_y0f
+# define HAVE_VECTOR_y0f 1
+# define VEC_PREFIX_y0f VEC_PREFIX
+#else
+# define HAVE_VECTOR_y0f 0
+# define VEC_PREFIX_y0f
+#endif
+
+#ifdef __DECL_SIMD_y0l
+# define HAVE_VECTOR_y0l 1
+# define VEC_PREFIX_y0l VEC_PREFIX
+#else
+# define HAVE_VECTOR_y0l 0
+# define VEC_PREFIX_y0l
+#endif
+
+#ifdef __DECL_SIMD_y1
+# define HAVE_VECTOR_y1 1
+# define VEC_PREFIX_y1 VEC_PREFIX
+#else
+# define HAVE_VECTOR_y1 0
+# define VEC_PREFIX_y1
+#endif
+
+#ifdef __DECL_SIMD_y1f
+# define HAVE_VECTOR_y1f 1
+# define VEC_PREFIX_y1f VEC_PREFIX
+#else
+# define HAVE_VECTOR_y1f 0
+# define VEC_PREFIX_y1f
+#endif
+
+#ifdef __DECL_SIMD_y1l
+# define HAVE_VECTOR_y1l 1
+# define VEC_PREFIX_y1l VEC_PREFIX
+#else
+# define HAVE_VECTOR_y1l 0
+# define VEC_PREFIX_y1l
+#endif
+
+#ifdef __DECL_SIMD_yn
+# define HAVE_VECTOR_yn 1
+# define VEC_PREFIX_yn VEC_PREFIX
+#else
+# define HAVE_VECTOR_yn 0
+# define VEC_PREFIX_yn
+#endif
+
+#ifdef __DECL_SIMD_ynf
+# define HAVE_VECTOR_ynf 1
+# define VEC_PREFIX_ynf VEC_PREFIX
+#else
+# define HAVE_VECTOR_ynf 0
+# define VEC_PREFIX_ynf
+#endif
+
+#ifdef __DECL_SIMD_ynl
+# define HAVE_VECTOR_ynl 1
+# define VEC_PREFIX_ynl VEC_PREFIX
+#else
+# define HAVE_VECTOR_ynl 0
+# define VEC_PREFIX_ynl
+#endif
+
diff --git a/math/libm-test.inc b/math/libm-test.inc
index f86a4fa..f8a10e2 100644
--- a/math/libm-test.inc
+++ b/math/libm-test.inc
@@ -126,6 +126,7 @@
 #include <argp.h>
 #include <tininess.h>
 #include <math-tests.h>
+#include <init-arch.h>
 
 /* Structure for ulp data for a function, or the real or imaginary
    part of a function.  */
@@ -302,6 +303,8 @@ static int output_max_error;	/* Should the maximal errors printed?  */
 static int output_points;	/* Should the single function results printed?  */
 static int ignore_max_ulp;	/* Should we ignore max_ulp?  */
 
+static int avx2_usable;		/* Nonzero if AVX2 is usable.  */
+
 #define plus_zero	CHOOSE (0.0L, 0.0, 0.0f,	\
 				0.0L, 0.0, 0.0f)
 #define minus_zero	CHOOSE (-0.0L, -0.0, -0.0f,	\
@@ -678,13 +681,17 @@ test_exceptions (const char *test_name, int exception)
   feclearexcept (FE_ALL_EXCEPT);
 }
 
+#ifndef TEST_MATHVEC
+# define TEST_MATHVEC 0
+#endif
+
 /* Test whether errno for TEST_NAME, set to ERRNO_VALUE, has value
    EXPECTED_VALUE (description EXPECTED_NAME).  */
 static void
 test_single_errno (const char *test_name, int errno_value,
 		   int expected_value, const char *expected_name)
 {
-#ifndef TEST_INLINE
+#if !defined TEST_INLINE && !TEST_MATHVEC
   if (errno_value == expected_value)
     {
       if (print_screen (1))
@@ -1295,16 +1302,19 @@ struct test_fFF_11_data
 
 /* Run an individual test, including any required setup and checking
    of results, or loop over all tests in an array.  */
-#define RUN_TEST_f_f(ARG_STR, FUNC_NAME, ARG, EXPECTED,			\
-		     EXCEPTIONS)					\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
-		     EXCEPTIONS);					\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_f_f(ARG_STR, FUNC_NAME, ARG, EXPECTED,				\
+		     EXCEPTIONS)						\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	check_float (test_name,							\
+		     CONCAT (CONCAT3_1 (VEC_PREFIX_, FUNC_NAME, FUNC ( )),	\
+			     FUNC (FUNC_NAME)) (ARG),				\
+			     EXPECTED,						\
+		     EXCEPTIONS);						\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_f_f(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1690,10 +1700,34 @@ struct test_fFF_11_data
 		       (ARRAY)[i].RM_##ROUNDING_MODE.extra2_expected);	\
   ROUND_RESTORE_ ## ROUNDING_MODE
 
+#ifndef CHECK_ARCH_EXT
+# define CHECK_ARCH_EXT
+#endif
+
+#ifndef VEC_PREFIX
+# define VEC_PREFIX
+#endif
+
+#include "have_vector.h"
+
+#define CONCAT(prefix,func) __CONCAT(prefix,func)
+
+#define CONCAT3(a,b,c) a ## b ## c
+#define CONCAT3_1(a,b,c) CONCAT3(a,b,c)
+
+#define HAVE_VECTOR_INNER(func,sfx) HAVE_VECTOR_ ## func ## sfx
+#define HAVE_VECTOR_(func,sfx) HAVE_VECTOR_INNER(func,sfx)
+
+#define STR(a,b,c) __STRING(a##b##c)
+#define CONCAT3_1_STR(a,b,c) STR(a,b,c)
+
 /* Start and end the tests for a given function.  */
-#define START(FUNC, EXACT)			\
-  const char *this_func = #FUNC;		\
+#define START(FUN, SUFF, EXACT)						\
+  CHECK_ARCH_EXT							\
+  if (TEST_MATHVEC && !HAVE_VECTOR_(FUN, FUNC( ))) return;		\
+  const char *this_func = CONCAT3_1_STR(VEC_PREFIX, FUN, SUFF);	\
   init_max_error (this_func, EXACT)
+
 #define END					\
   print_max_error (this_func)
 #define END_COMPLEX				\
@@ -1705,28 +1739,28 @@ struct test_fFF_11_data
     {									\
       do								\
 	{								\
-	  START (FUNC, EXACT);						\
+	  START (FUNC, , EXACT);					\
 	  LOOP_MACRO (FUNC, ARRAY, , ## __VA_ARGS__);			\
 	  END_MACRO;							\
 	}								\
       while (0);							\
       do								\
 	{								\
-	  START (FUNC ## _downward, EXACT);				\
+	  START (FUNC, _downward, EXACT);				\
 	  LOOP_MACRO (FUNC, ARRAY, FE_DOWNWARD, ## __VA_ARGS__);	\
 	  END_MACRO;							\
 	}								\
       while (0);							\
       do								\
 	{								\
-	  START (FUNC ## _towardzero, EXACT);				\
+	  START (FUNC, _towardzero, EXACT);				\
 	  LOOP_MACRO (FUNC, ARRAY, FE_TOWARDZERO, ## __VA_ARGS__);	\
 	  END_MACRO;							\
 	}								\
       while (0);							\
       do								\
 	{								\
-	  START (FUNC ## _upward, EXACT);				\
+	  START (FUNC, _upward, EXACT);					\
 	  LOOP_MACRO (FUNC, ARRAY, FE_UPWARD, ## __VA_ARGS__);		\
 	  END_MACRO;							\
 	}								\
@@ -1746,7 +1780,6 @@ matherr (struct exception *x __attribute__ ((unused)))
   Tests for single functions of libm.
   Please keep them alphabetically sorted!
 ****************************************************************************/
-
 static const struct test_f_f_data acos_test_data[] =
   {
     TEST_f_f (acos, plus_infty, qnan_value, INVALID_EXCEPTION|ERRNO_EDOM),
@@ -6034,7 +6067,7 @@ static const struct test_c_c_data cexp_test_data[] =
 static void
 cexp_test (void)
 {
-  START (cexp, 0);
+  START (cexp, , 0);
   RUN_TEST_LOOP_c_c (cexp, cexp_test_data, );
   END_COMPLEX;
 }
@@ -6245,7 +6278,6 @@ copysign_test (void)
   ALL_RM_TEST (copysign, 1, copysign_test_data, RUN_TEST_LOOP_ff_f, END);
 }
 
-
 static const struct test_f_f_data cos_test_data[] =
   {
     TEST_f_f (cos, plus_infty, qnan_value, INVALID_EXCEPTION|ERRNO_EDOM),
@@ -6261,7 +6293,6 @@ cos_test (void)
   ALL_RM_TEST (cos, 0, cos_test_data, RUN_TEST_LOOP_f_f, END);
 }
 
-
 static const struct test_f_f_data cosh_test_data[] =
   {
     TEST_f_f (cosh, plus_infty, plus_infty, NO_TEST_INLINE),
@@ -7548,7 +7579,7 @@ static const struct test_if_f_data jn_test_data[] =
 static void
 jn_test (void)
 {
-  START (jn, 0);
+  START (jn, , 0);
   RUN_TEST_LOOP_if_f (jn, jn_test_data, );
   END;
 }
@@ -9374,7 +9405,7 @@ static const struct test_f_f_data tgamma_test_data[] =
 static void
 tgamma_test (void)
 {
-  START (tgamma, 0);
+  START (tgamma, , 0);
   RUN_TEST_LOOP_f_f (tgamma, tgamma_test_data, );
   END;
 }
@@ -9628,7 +9659,6 @@ significand_test (void)
   ALL_RM_TEST (significand, 1, significand_test_data, RUN_TEST_LOOP_f_f, END);
 }
 
-
 static void
 initialize (void)
 {
@@ -9790,6 +9820,7 @@ main (int argc, char **argv)
   output_dir = NULL;
   /* XXX set to 0 for releases.  */
   ignore_max_ulp = 0;
+  avx2_usable = 0;
 
   /* Parse and process arguments.  */
   argp_parse (&argp, argc, argv, 0, &remaining, NULL);
@@ -9820,10 +9851,14 @@ main (int argc, char **argv)
 	}
     }
 
-
   initialize ();
   printf (TEST_MSG);
 
+#if TEST_MATHVEC
+  __init_cpu_features ();
+  avx2_usable = __cpu_features.feature[index_AVX2_Usable] & bit_AVX2_Usable;
+#endif
+
   check_ulp ();
 
   /* Keep the tests a wee bit ordered (according to ISO C99).  */
diff --git a/math/math.h b/math/math.h
index dc532b7..94ec05b 100644
--- a/math/math.h
+++ b/math/math.h
@@ -27,6 +27,9 @@
 
 __BEGIN_DECLS
 
+/* Get machine-dependent declarations of vector math functions.  */
+#include <bits/math-vector.h>
+
 /* Get machine-dependent HUGE_VAL value (returned on overflow).
    On all IEEE754 machines, this is +Infinity.  */
 #include <bits/huge_val.h>
diff --git a/mathvec/Depend b/mathvec/Depend
new file mode 100644
index 0000000..ede10ab
--- /dev/null
+++ b/mathvec/Depend
@@ -0,0 +1 @@
+math
diff --git a/mathvec/Makefile b/mathvec/Makefile
new file mode 100644
index 0000000..16b918e
--- /dev/null
+++ b/mathvec/Makefile
@@ -0,0 +1,35 @@
+# Copyright (C) 2014 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+
+# Makefile for the vector math library.
+
+subdir		:= mathvec
+
+include ../Makeconfig
+
+ifeq ($(build-mathvec),yes)
+extra-libs	:= libmvec
+extra-libs-others = $(extra-libs)
+
+libmvec-routines = $(strip $(libmvec-support))
+
+$(objpfx)libmvec.so: $(common-objpfx)math/libm.so
+endif
+
+# Rules for the test suite are in the math directory.
+
+include ../Rules
diff --git a/shlib-versions b/shlib-versions
index e05b248..fa3cf1d 100644
--- a/shlib-versions
+++ b/shlib-versions
@@ -71,3 +71,6 @@ libanl=1
 # This defines the libgcc soname version this glibc is to load for
 # asynchronous cancellation to work correctly.
 libgcc_s=1
+
+# The vector math library
+libmvec=1
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
new file mode 100644
index 0000000..8272ddd
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -0,0 +1,3 @@
+GLIBC_2.21
+ GLIBC_2.21 A
+ _ZGVdN4v_cos F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
new file mode 100644
index 0000000..fdd967f
--- /dev/null
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -0,0 +1,44 @@
+/* Platform-specific SIMD declarations of math functions.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly; \
+	include <math.h> instead."
+#endif
+
+#if defined __x86_64__ && defined __FAST_MATH__
+# if defined _OPENMP && _OPENMP >= 201307
+/* OpenMP case. */
+#  define __DECL_SIMD_AVX2 _Pragma("omp declare simd notinbranch")
+#  define __DECL_SIMD_SSE4 _Pragma("omp declare simd notinbranch")
+#  define __DECL_SIMD_cos  __DECL_SIMD_AVX2
+#  define __DECL_SIMD_cosf __DECL_SIMD_SSE4
+# elif defined _CILKPLUS && _CILKPLUS >= 0
+/* CilkPlus case. TODO _CILKPLUS currently nowhere defined */
+#  define __DECL_SIMD_AVX2 __attribute__((__vector__(nomask)))
+#  define __DECL_SIMD_SSE4 __attribute__((__vector__(processor(core_i7_sse4_2),\
+						     nomask)))
+#  define __DECL_SIMD_cos  __DECL_SIMD_AVX2
+#  define __DECL_SIMD_cosf __DECL_SIMD_SSE4
+# endif
+#endif
+
+#if defined TEST_MATHVEC
+# define __DECL_SIMD_cos
+# define __DECL_SIMD_cosf
+#endif
diff --git a/sysdeps/x86_64/configure b/sysdeps/x86_64/configure
index 7d4dadd..b087e65 100644
--- a/sysdeps/x86_64/configure
+++ b/sysdeps/x86_64/configure
@@ -43,6 +43,36 @@ fi
 
 
 
+if test $build_mathvec = notset; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for compiler target is x86_64" >&5
+$as_echo_n "checking for compiler target is x86_64... " >&6; }
+if ${libc_cv_cc_target_x86_64+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+    cat > conftest.c <<\EOF
+        #if !defined (__x86_64__)
+        # error "target is not x86_64"
+        #endif
+EOF
+  if { ac_try='${CC-cc} -c $ASFLAGS conftest.c 1>&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }; then
+    libc_cv_cc_target_x86_64=yes
+  else
+    libc_cv_cc_target_x86_64=no
+  fi
+  rm -f conftest*
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libc_cv_cc_target_x86_64" >&5
+$as_echo "$libc_cv_cc_target_x86_64" >&6; }
+  build_mathvec=$libc_cv_cc_target_x86_64
+fi
+config_vars="$config_vars
+build-mathvec = $build_mathvec"
+
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for SSE4 support" >&5
 $as_echo_n "checking for SSE4 support... " >&6; }
 if ${libc_cv_cc_sse4+:} false; then :
diff --git a/sysdeps/x86_64/configure.ac b/sysdeps/x86_64/configure.ac
index c9f9a51..91c4cdf 100644
--- a/sysdeps/x86_64/configure.ac
+++ b/sysdeps/x86_64/configure.ac
@@ -5,6 +5,24 @@ AC_CHECK_HEADER([cpuid.h], ,
   [AC_MSG_ERROR([gcc must provide the <cpuid.h> header])],
   [/* No default includes.  */])
 
+dnl Check if compiler target is x86_64.
+if test $build_mathvec = notset; then
+  AC_CACHE_CHECK(for compiler target is x86_64, libc_cv_cc_target_x86_64, [dnl
+  cat > conftest.c <<\EOF
+        #if !defined (__x86_64__)
+        # error "target is not x86_64"
+        #endif
+EOF
+  if AC_TRY_COMMAND(${CC-cc} -c $ASFLAGS conftest.c 1>&AS_MESSAGE_LOG_FD); then
+    libc_cv_cc_target_x86_64=yes
+  else
+    libc_cv_cc_target_x86_64=no
+  fi
+  rm -f conftest*])
+  build_mathvec=$libc_cv_cc_target_x86_64
+fi
+LIBC_CONFIG_VAR([build-mathvec], [$build_mathvec])
+
 dnl Check if -msse4 works.
 AC_CACHE_CHECK(for SSE4 support, libc_cv_cc_sse4, [dnl
 LIBC_TRY_CC_OPTION([-msse4], [libc_cv_cc_sse4=yes], [libc_cv_cc_sse4=no])
diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
new file mode 100644
index 0000000..d585fa0
--- /dev/null
+++ b/sysdeps/x86_64/fpu/Makefile
@@ -0,0 +1,33 @@
+ifeq ($(subdir),mathvec)
+libmvec-support += svml_d_cos4_core svml_d_cos_data
+endif
+
+# Rules for libmvec tests
+ifeq ($(subdir),math)
+ifneq ($(PERL),no)
+ifeq ($(build-mathvec),yes)
+libm-tests += test-double-vlen4 test-float-vlen8
+
+CFLAGS-test-double-vlen4-wrapper.c = -fno-inline -ffloat-store -fno-builtin \
+				     -frounding-math -mavx2
+CFLAGS-test-float-vlen8-wrapper.c = -fno-inline -ffloat-store -fno-builtin \
+				    -frounding-math -mavx2
+
+CFLAGS-test-double-vlen4.c = -fno-inline -ffloat-store -fno-builtin \
+			     -frounding-math
+CFLAGS-test-float-vlen8.c = -fno-inline -ffloat-store -fno-builtin \
+			    -frounding-math
+
+$(objpfx)test-double-vlen4.o: $(objpfx)libm-test.stmp
+$(objpfx)test-float-vlen8.o: $(objpfx)libm-test.stmp
+
+$(objpfx)test-double-vlen4: $(common-objpfx)mathvec/libmvec.so \
+			    $(objpfx)test-double-vlen4-wrapper.o \
+			    $(objpfx)init-arch.o
+$(objpfx)test-float-vlen8: $(common-objpfx)mathvec/libmvec.so \
+			    $(objpfx)test-float-vlen8-wrapper.o \
+			    $(objpfx)init-arch.o
+
+endif
+endif
+endif
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
new file mode 100644
index 0000000..3d433d2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/Versions
@@ -0,0 +1,5 @@
+libmvec {
+  GLIBC_2.21 {
+    _ZGVdN4v_cos;
+  }
+}
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 36e1b76..9e4f8cd 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -1961,6 +1961,12 @@ ifloat: 3
 ildouble: 4
 ldouble: 4
 
+Function: "vlen4_cos":
+double: 1
+
+Function: "vlen8_cos":
+float: 1
+
 Function: "y0":
 double: 2
 float: 1
diff --git a/sysdeps/x86_64/fpu/svml_d_cos4_core.S b/sysdeps/x86_64/fpu/svml_d_cos4_core.S
new file mode 100644
index 0000000..7c9f62e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cos4_core.S
@@ -0,0 +1,186 @@
+/* Function cos vectorized with AVX2.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+	.text
+ENTRY(_ZGVdN4v_cos)
+
+/* ALGORITHM DESCRIPTION:
+ *     
+ *    ( low accuracy ( < 4ulp ) or enhanced performance 
+ *      ( half of correct mantissa ) implementation )
+ *     
+ *    Argument representation:
+ *    arg + Pi/2 = (N*Pi + R)
+ *    
+ *    Result calculation:
+ *    cos(arg) = sin(arg+Pi/2) = sin(N*Pi + R) = (-1)^N * sin(R)
+ *    sin(R) is approximated by corresponding polynomial
+ */
+        pushq     %rbp
+        movq      %rsp, %rbp
+        andq      $-64, %rsp
+        subq      $448, %rsp
+        movq      __gnu_svml_dcos_data@GOTPCREL(%rip), %rax
+        vmovapd   %ymm0, %ymm1
+        vmovupd   192(%rax), %ymm4
+        vmovupd   256(%rax), %ymm5
+
+/* ARGUMENT RANGE REDUCTION:
+ * Add Pi/2 to argument: X' = X+Pi/2
+ */
+        vaddpd    128(%rax), %ymm1, %ymm7
+
+/* Get absolute argument value: X' = |X'| */
+        vandpd    (%rax), %ymm7, %ymm2
+
+/* Y = X'*InvPi + RS : right shifter add */
+        vfmadd213pd %ymm5, %ymm4, %ymm7
+        vmovupd   1216(%rax), %ymm4
+
+/* Check for large arguments path */
+        vcmpnle_uqpd 64(%rax), %ymm2, %ymm3
+
+/* N = Y - RS : right shifter sub */
+        vsubpd    %ymm5, %ymm7, %ymm6
+        vmovupd   640(%rax), %ymm2
+
+/* SignRes = Y<<63 : shift LSB to MSB place for result sign */
+        vpsllq    $63, %ymm7, %ymm7
+
+/* N = N - 0.5 */
+        vsubpd    320(%rax), %ymm6, %ymm0
+        vmovmskpd %ymm3, %ecx
+
+/* R = X - N*Pi1 */
+        vmovapd   %ymm1, %ymm3
+        vfnmadd231pd %ymm0, %ymm2, %ymm3
+
+/* R = R - N*Pi2 */
+        vfnmadd231pd 704(%rax), %ymm0, %ymm3
+
+/* R = R - N*Pi3 */
+        vfnmadd132pd 768(%rax), %ymm3, %ymm0
+
+/* POLYNOMIAL APPROXIMATION:
+ * R2 = R*R
+ */
+        vmulpd    %ymm0, %ymm0, %ymm5
+        vfmadd213pd 1152(%rax), %ymm5, %ymm4
+        vfmadd213pd 1088(%rax), %ymm5, %ymm4
+        vfmadd213pd 1024(%rax), %ymm5, %ymm4
+
+/* Poly = C3+R2*(C4+R2*(C5+R2*(C6+R2*C7))) */
+        vfmadd213pd 960(%rax), %ymm5, %ymm4
+        vfmadd213pd 896(%rax), %ymm5, %ymm4
+        vfmadd213pd 832(%rax), %ymm5, %ymm4
+        vmulpd    %ymm5, %ymm4, %ymm6
+        vfmadd213pd %ymm0, %ymm0, %ymm6
+
+/* RECONSTRUCTION:
+ * Final sign setting: Res = Poly^SignRes 
+ */
+        vxorpd    %ymm7, %ymm6, %ymm0
+        testl     %ecx, %ecx
+        jne       _LBL_1_3
+
+_LBL_1_2:
+        movq      %rbp, %rsp
+        popq      %rbp
+        ret
+
+_LBL_1_3:
+        vmovupd   %ymm1, 320(%rsp)
+        vmovupd   %ymm0, 384(%rsp)
+        je        _LBL_1_2
+
+        xorb      %dl, %dl
+        xorl      %eax, %eax
+        vmovups   %ymm8, 224(%rsp)
+        vmovups   %ymm9, 192(%rsp)
+        vmovups   %ymm10, 160(%rsp)
+        vmovups   %ymm11, 128(%rsp)
+        vmovups   %ymm12, 96(%rsp)
+        vmovups   %ymm13, 64(%rsp)
+        vmovups   %ymm14, 32(%rsp)
+        vmovups   %ymm15, (%rsp)
+        movq      %rsi, 264(%rsp)
+        movq      %rdi, 256(%rsp)
+        movq      %r12, 296(%rsp)
+        movb      %dl, %r12b
+        movq      %r13, 288(%rsp)
+        movl      %ecx, %r13d
+        movq      %r14, 280(%rsp)
+        movl      %eax, %r14d
+        movq      %r15, 272(%rsp)
+
+_LBL_1_6:
+        btl       %r14d, %r13d
+        jc        _LBL_1_12
+
+_LBL_1_7:
+        lea       1(%r14), %esi
+        btl       %esi, %r13d
+        jc        _LBL_1_10
+
+_LBL_1_8:
+        incb      %r12b
+        addl      $2, %r14d
+        cmpb      $16, %r12b
+        jb        _LBL_1_6
+
+        vmovups   224(%rsp), %ymm8
+        vmovups   192(%rsp), %ymm9
+        vmovups   160(%rsp), %ymm10
+        vmovups   128(%rsp), %ymm11
+        vmovups   96(%rsp), %ymm12
+        vmovups   64(%rsp), %ymm13
+        vmovups   32(%rsp), %ymm14
+        vmovups   (%rsp), %ymm15
+        vmovupd   384(%rsp), %ymm0
+        movq      264(%rsp), %rsi
+        movq      256(%rsp), %rdi
+        movq      296(%rsp), %r12
+        movq      288(%rsp), %r13
+        movq      280(%rsp), %r14
+        movq      272(%rsp), %r15
+        jmp       _LBL_1_2
+
+_LBL_1_10:
+        movzbl    %r12b, %r15d
+        shlq      $4, %r15
+        vmovsd    328(%rsp,%r15), %xmm0
+        vzeroupper
+
+        call      cos@PLT
+
+        vmovsd    %xmm0, 392(%rsp,%r15)
+        jmp       _LBL_1_8
+
+_LBL_1_12:
+        movzbl    %r12b, %r15d
+        shlq      $4, %r15
+        vmovsd    320(%rsp,%r15), %xmm0
+        vzeroupper
+
+        call      cos@PLT
+
+        vmovsd    %xmm0, 384(%rsp,%r15)
+        jmp       _LBL_1_7
+END(_ZGVdN4v_cos)
diff --git a/sysdeps/x86_64/fpu/svml_d_cos_data.S b/sysdeps/x86_64/fpu/svml_d_cos_data.S
new file mode 100644
index 0000000..0f2ff1f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cos_data.S
@@ -0,0 +1,492 @@
+/* Data for vectorized cos.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+	.section .rodata, "a"
+
+	.align 64
+	.globl __gnu_svml_dcos_data
+
+/* Data table for vector implementations of function cos. 
+ * The table may contain polynomial, reduction, lookup
+ * coefficients and other constants obtained through different
+ * methods of research and experimental work.
+ */
+__gnu_svml_dcos_data:
+
+/* General constants:
+ * lAbsMask
+ */
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+
+/* lRangeVal */
+	.long	0x00000000
+	.long	0x41600000
+	.long	0x00000000
+	.long	0x41600000
+	.long	0x00000000
+	.long	0x41600000
+	.long	0x00000000
+	.long	0x41600000
+	.long	0x00000000
+	.long	0x41600000
+	.long	0x00000000
+	.long	0x41600000
+	.long	0x00000000
+	.long	0x41600000
+	.long	0x00000000
+	.long	0x41600000
+
+/* HalfPI */
+	.long	0x54442d18
+	.long	0x3ff921fb
+	.long	0x54442d18
+	.long	0x3ff921fb
+	.long	0x54442d18
+	.long	0x3ff921fb
+	.long	0x54442d18
+	.long	0x3ff921fb
+	.long	0x54442d18
+	.long	0x3ff921fb
+	.long	0x54442d18
+	.long	0x3ff921fb
+	.long	0x54442d18
+	.long	0x3ff921fb
+	.long	0x54442d18
+	.long	0x3ff921fb
+
+/* InvPI */
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+
+/* RShifter */
+	.long	0x00000000
+	.long	0x43380000
+	.long	0x00000000
+	.long	0x43380000
+	.long	0x00000000
+	.long	0x43380000
+	.long	0x00000000
+	.long	0x43380000
+	.long	0x00000000
+	.long	0x43380000
+	.long	0x00000000
+	.long	0x43380000
+	.long	0x00000000
+	.long	0x43380000
+	.long	0x00000000
+	.long	0x43380000
+
+/* OneHalf */
+	.long	0x00000000
+	.long	0x3fe00000
+	.long	0x00000000
+	.long	0x3fe00000
+	.long	0x00000000
+	.long	0x3fe00000
+	.long	0x00000000
+	.long	0x3fe00000
+	.long	0x00000000
+	.long	0x3fe00000
+	.long	0x00000000
+	.long	0x3fe00000
+	.long	0x00000000
+	.long	0x3fe00000
+	.long	0x00000000
+	.long	0x3fe00000
+
+/* Range reduction PI-based constants:
+ * PI1
+ */
+	.long	0x40000000
+	.long	0x400921fb
+	.long	0x40000000
+	.long	0x400921fb
+	.long	0x40000000
+	.long	0x400921fb
+	.long	0x40000000
+	.long	0x400921fb
+	.long	0x40000000
+	.long	0x400921fb
+	.long	0x40000000
+	.long	0x400921fb
+	.long	0x40000000
+	.long	0x400921fb
+	.long	0x40000000
+	.long	0x400921fb
+
+/* PI2 */
+	.long	0x00000000
+	.long	0x3e84442d
+	.long	0x00000000
+	.long	0x3e84442d
+	.long	0x00000000
+	.long	0x3e84442d
+	.long	0x00000000
+	.long	0x3e84442d
+	.long	0x00000000
+	.long	0x3e84442d
+	.long	0x00000000
+	.long	0x3e84442d
+	.long	0x00000000
+	.long	0x3e84442d
+	.long	0x00000000
+	.long	0x3e84442d
+
+/* PI3 */
+	.long	0x80000000
+	.long	0x3d084698
+	.long	0x80000000
+	.long	0x3d084698
+	.long	0x80000000
+	.long	0x3d084698
+	.long	0x80000000
+	.long	0x3d084698
+	.long	0x80000000
+	.long	0x3d084698
+	.long	0x80000000
+	.long	0x3d084698
+	.long	0x80000000
+	.long	0x3d084698
+	.long	0x80000000
+	.long	0x3d084698
+
+/* PI4 */
+	.long	0x701b839a
+	.long	0x3b88cc51
+	.long	0x701b839a
+	.long	0x3b88cc51
+	.long	0x701b839a
+	.long	0x3b88cc51
+	.long	0x701b839a
+	.long	0x3b88cc51
+	.long	0x701b839a
+	.long	0x3b88cc51
+	.long	0x701b839a
+	.long	0x3b88cc51
+	.long	0x701b839a
+	.long	0x3b88cc51
+	.long	0x701b839a
+	.long	0x3b88cc51
+
+/* Range reduction PI-based constants if FMA available:
+ * PI1_FMA
+ */
+	.long	0x54442d18
+	.long	0x400921fb
+	.long	0x54442d18
+	.long	0x400921fb
+	.long	0x54442d18
+	.long	0x400921fb
+	.long	0x54442d18
+	.long	0x400921fb
+	.long	0x54442d18
+	.long	0x400921fb
+	.long	0x54442d18
+	.long	0x400921fb
+	.long	0x54442d18
+	.long	0x400921fb
+	.long	0x54442d18
+	.long	0x400921fb
+
+/* PI2_FMA */
+	.long	0x33145c06
+	.long	0x3ca1a626
+	.long	0x33145c06
+	.long	0x3ca1a626
+	.long	0x33145c06
+	.long	0x3ca1a626
+	.long	0x33145c06
+	.long	0x3ca1a626
+	.long	0x33145c06
+	.long	0x3ca1a626
+	.long	0x33145c06
+	.long	0x3ca1a626
+	.long	0x33145c06
+	.long	0x3ca1a626
+	.long	0x33145c06
+	.long	0x3ca1a626
+
+/* PI3_FMA */
+	.long	0x29024e09
+	.long	0x395c1cd1
+	.long	0x29024e09
+	.long	0x395c1cd1
+	.long	0x29024e09
+	.long	0x395c1cd1
+	.long	0x29024e09
+	.long	0x395c1cd1
+	.long	0x29024e09
+	.long	0x395c1cd1
+	.long	0x29024e09
+	.long	0x395c1cd1
+	.long	0x29024e09
+	.long	0x395c1cd1
+	.long	0x29024e09
+	.long	0x395c1cd1
+
+/* Polynomial coefficients (relative error 2^(-52.115)):
+ * C1
+ */
+	.long	0x555554a7
+	.long	0xbfc55555
+	.long	0x555554a7
+	.long	0xbfc55555
+	.long	0x555554a7
+	.long	0xbfc55555
+	.long	0x555554a7
+	.long	0xbfc55555
+	.long	0x555554a7
+	.long	0xbfc55555
+	.long	0x555554a7
+	.long	0xbfc55555
+	.long	0x555554a7
+	.long	0xbfc55555
+	.long	0x555554a7
+	.long	0xbfc55555
+
+/* C2 */
+	.long	0x1110a4a8
+	.long	0x3f811111
+	.long	0x1110a4a8
+	.long	0x3f811111
+	.long	0x1110a4a8
+	.long	0x3f811111
+	.long	0x1110a4a8
+	.long	0x3f811111
+	.long	0x1110a4a8
+	.long	0x3f811111
+	.long	0x1110a4a8
+	.long	0x3f811111
+	.long	0x1110a4a8
+	.long	0x3f811111
+	.long	0x1110a4a8
+	.long	0x3f811111
+
+/* C3 */
+	.long	0x19a5b86d
+	.long	0xbf2a01a0
+	.long	0x19a5b86d
+	.long	0xbf2a01a0
+	.long	0x19a5b86d
+	.long	0xbf2a01a0
+	.long	0x19a5b86d
+	.long	0xbf2a01a0
+	.long	0x19a5b86d
+	.long	0xbf2a01a0
+	.long	0x19a5b86d
+	.long	0xbf2a01a0
+	.long	0x19a5b86d
+	.long	0xbf2a01a0
+	.long	0x19a5b86d
+	.long	0xbf2a01a0
+
+/* C4 */
+	.long	0x8030fea0
+	.long	0x3ec71de3
+	.long	0x8030fea0
+	.long	0x3ec71de3
+	.long	0x8030fea0
+	.long	0x3ec71de3
+	.long	0x8030fea0
+	.long	0x3ec71de3
+	.long	0x8030fea0
+	.long	0x3ec71de3
+	.long	0x8030fea0
+	.long	0x3ec71de3
+	.long	0x8030fea0
+	.long	0x3ec71de3
+	.long	0x8030fea0
+	.long	0x3ec71de3
+
+/* C5 */
+	.long	0x46002231
+	.long	0xbe5ae635
+	.long	0x46002231
+	.long	0xbe5ae635
+	.long	0x46002231
+	.long	0xbe5ae635
+	.long	0x46002231
+	.long	0xbe5ae635
+	.long	0x46002231
+	.long	0xbe5ae635
+	.long	0x46002231
+	.long	0xbe5ae635
+	.long	0x46002231
+	.long	0xbe5ae635
+	.long	0x46002231
+	.long	0xbe5ae635
+
+/* C6 */
+	.long	0x57a2f220
+	.long	0x3de60e68
+	.long	0x57a2f220
+	.long	0x3de60e68
+	.long	0x57a2f220
+	.long	0x3de60e68
+	.long	0x57a2f220
+	.long	0x3de60e68
+	.long	0x57a2f220
+	.long	0x3de60e68
+	.long	0x57a2f220
+	.long	0x3de60e68
+	.long	0x57a2f220
+	.long	0x3de60e68
+	.long	0x57a2f220
+	.long	0x3de60e68
+
+/* C7 */
+	.long	0x0811aac8
+	.long	0xbd69f0d6
+	.long	0x0811aac8
+	.long	0xbd69f0d6
+	.long	0x0811aac8
+	.long	0xbd69f0d6
+	.long	0x0811aac8
+	.long	0xbd69f0d6
+	.long	0x0811aac8
+	.long	0xbd69f0d6
+	.long	0x0811aac8
+	.long	0xbd69f0d6
+	.long	0x0811aac8
+	.long	0xbd69f0d6
+	.long	0x0811aac8
+	.long	0xbd69f0d6
+
+/* Additional constants:
+ * AbsMask
+ */
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+
+/* InvPI */
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+
+/* RShifter_la */
+	.long	0x00000000
+	.long	0x43300000
+	.long	0x00000000
+	.long	0x43300000
+	.long	0x00000000
+	.long	0x43300000
+	.long	0x00000000
+	.long	0x43300000
+	.long	0x00000000
+	.long	0x43300000
+	.long	0x00000000
+	.long	0x43300000
+	.long	0x00000000
+	.long	0x43300000
+	.long	0x00000000
+	.long	0x43300000
+
+/* RShifter_la - 0.5 (2^52 - 0.5) */
+	.long	0xffffffff
+	.long	0x432fffff
+	.long	0xffffffff
+	.long	0x432fffff
+	.long	0xffffffff
+	.long	0x432fffff
+	.long	0xffffffff
+	.long	0x432fffff
+	.long	0xffffffff
+	.long	0x432fffff
+	.long	0xffffffff
+	.long	0x432fffff
+	.long	0xffffffff
+	.long	0x432fffff
+	.long	0xffffffff
+	.long	0x432fffff
+
+/* RSXmax_la */
+	.long	0x007ffffe
+	.long	0x43300000
+	.long	0x007ffffe
+	.long	0x43300000
+	.long	0x007ffffe
+	.long	0x43300000
+	.long	0x007ffffe
+	.long	0x43300000
+	.long	0x007ffffe
+	.long	0x43300000
+	.long	0x007ffffe
+	.long	0x43300000
+	.long	0x007ffffe
+	.long	0x43300000
+	.long	0x007ffffe
+	.long	0x43300000
+	.type	__gnu_svml_dcos_data,@object
+	.size	__gnu_svml_dcos_data,.-__gnu_svml_dcos_data
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrapper.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrapper.c
new file mode 100644
index 0000000..0778e23
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrapper.c
@@ -0,0 +1,40 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "test-double-vlen4.h"
+
+// Wrapper from scalar to vector function implemented in AVX2.
+#define VECTOR_WRAPPER(scalar_func, vector_func) \
+extern __m256d vector_func(__m256d); \
+FLOAT scalar_func(FLOAT x)\
+{\
+  int i;\
+  __m256d mx = _mm256_set1_pd(x);\
+  __m256d mr = vector_func(mx);\
+  for(i=1;i<4;i++)\
+  {\
+    if (((FLOAT*)&mr)[0]!=((FLOAT*)&mr)[i])\
+    {\
+      return ((FLOAT*)&mr)[0]+0.1;\
+    }\
+  }\
+  return ((FLOAT*)&mr)[0];\
+}
+
+#include <immintrin.h>
+
+VECTOR_WRAPPER (WRAPPER_NAME (cos), _ZGVdN4v_cos)
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4.c b/sysdeps/x86_64/fpu/test-double-vlen4.c
new file mode 100644
index 0000000..4d3d9a3
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-vlen4.c
@@ -0,0 +1,41 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FUNC(function) function
+#define TEST_MSG "testing double vector math (without inline functions)\n"
+#define MATHCONST(x) x
+#define CHOOSE(Clongdouble,Cdouble,Cfloat,Cinlinelongdouble,Cinlinedouble,Cinlinefloat) Cdouble
+#define PRINTF_EXPR "e"
+#define PRINTF_XEXPR "a"
+#define PRINTF_NEXPR "f"
+#define TEST_DOUBLE 1
+
+#ifndef __NO_MATH_INLINES
+# define __NO_MATH_INLINES
+#endif
+
+#define EXCEPTION_TESTS_double 0
+#define ROUNDING_TESTS_double(MODE) ((MODE) == FE_TONEAREST)
+
+#define TEST_MATHVEC 1
+#define CHECK_ARCH_EXT if (!avx2_usable) return;
+
+#include "test-double-vlen4.h"
+
+extern FLOAT WRAPPER_NAME (cos) (FLOAT);
+
+#include "libm-test.c"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4.h b/sysdeps/x86_64/fpu/test-double-vlen4.h
new file mode 100644
index 0000000..f664fa6
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-vlen4.h
@@ -0,0 +1,8 @@
+
+#define FLOAT double
+
+#define VEC_PREFIX vlen4_
+
+#define CONC(a,b) a ## b
+#define CONC1(a,b) CONC(a,b)
+#define WRAPPER_NAME(function) CONC1(VEC_PREFIX,function)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrapper.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrapper.c
new file mode 100644
index 0000000..a11d853
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrapper.c
@@ -0,0 +1,39 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "test-float-vlen8.h"
+
+#include <immintrin.h>
+
+//extern __m256 _ZGVdN8v_cosf(__m256);
+
+float vlen8_cosf(float x)
+{
+  int i;
+  __m256 mx = _mm256_set1_ps(x);
+  __m256 mr = mx; //_ZGVdN8v_cosf(mx);
+
+  for(i=1;i<8;i++)
+  {
+    if(((float*)&mr)[0]!=((float*)&mr)[i])
+    {
+      return ((float*)&mr)[0]+0.1;
+    }
+  }
+
+  return ((float*)&mr)[0];
+}
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8.c b/sysdeps/x86_64/fpu/test-float-vlen8.c
new file mode 100644
index 0000000..7bfc814
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-vlen8.c
@@ -0,0 +1,41 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FUNC(function) function ## f
+#define TEST_MSG "testing float vector math (without inline functions)\n"
+#define MATHCONST(x) x
+#define CHOOSE(Clongdouble,Cdouble,Cfloat,Cinlinelongdouble,Cinlinedouble,Cinlinefloat) Cfloat
+#define PRINTF_EXPR "e"
+#define PRINTF_XEXPR "a"
+#define PRINTF_NEXPR "f"
+#define TEST_FLOAT 1
+
+#ifndef __NO_MATH_INLINES
+# define __NO_MATH_INLINES
+#endif
+
+#define EXCEPTION_TESTS_float 0
+#define ROUNDING_TESTS_float(MODE) ((MODE) == FE_TONEAREST)
+
+#define TEST_MATHVEC 1
+#define CHECK_ARCH_EXT if (!avx2_usable) return;
+
+#include "test-float-vlen8.h"
+
+extern FLOAT WRAPPER_NAME (cosf) (FLOAT);
+
+#include "libm-test.c"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8.h b/sysdeps/x86_64/fpu/test-float-vlen8.h
new file mode 100644
index 0000000..bf28229
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-vlen8.h
@@ -0,0 +1,8 @@
+
+#define FLOAT float
+
+#define VEC_PREFIX vlen8_
+
+#define CONC(a,b) a ## b
+#define CONC1(a,b) CONC(a,b)
+#define WRAPPER_NAME(function) CONC1(VEC_PREFIX,function)


* Re: [RFC] How to add vector math functions to Glibc
From: Joseph S. Myers @ 2014-10-23 21:37 UTC
  To: Andrew Senkevich; +Cc: libc-alpha

On Thu, 23 Oct 2014, Andrew Senkevich wrote:

> Let me know if such changes ok in general.

I'm not clear we yet reached consensus on whether glibc is the right place 
for this; I think that discussion tailed off without a clear conclusion, 
and someone needs to reread it, post a careful analysis of the discussion 
so far and try to help the community reach consensus.

Regarding the specific patch:

> +	      [Enable building and installing mathvec @<:@default=yes on x86_64 build, else default=no@:>@])],

I don't think the help text in an architecture-independent file should 
refer to specific architectures like this; just say "default depends on 
architecture" or similar.

> +ifeq ($(build-mathvect),yes)
> +# We need to install libm.so as linker script
> +# for more comfortable use of vector math library.
> +subdir_install: $(inst_libdir)/libm.so.tmp
> +$(inst_libdir)/libm.so.tmp: $(common-objpfx)format.lds \
> +	$(common-objpfx)math/libm.so$(libm.so-version) \
> +	$(common-objpfx)mathvec/libmvec.so$(libmvec.so-version) \
> +	$(+force)
> +	(echo '/* GNU ld script */';\
> +	cat $<; \
> +	echo 'GROUP ( $(slibdir)/libm.so$(libm.so-version) ' \
> +	'AS_NEEDED ( $(slibdir)/libmvec.so$(libmvec.so-version) ) )' \
> +	) > $@
> +	mv -f $@ $(inst_libdir)/libm.so # TODO do it somehow after all other
> +endif

Clearly it's necessary to resolve how to disable the normal installation 
rule for libm.so so it can be cleanly replaced by this new one.

> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index 8a94a7e..2d31a11 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -60,6 +60,15 @@ __MATHCALL (atan,, (_Mdouble_ __x));
>  __MATHCALL (atan2,, (_Mdouble_ __y, _Mdouble_ __x));
>  
>  /* Cosine of X.  */
> +#if !defined _Mfloat_ && !defined _Mlong_double_ && defined __DECL_SIMD_cos
> +__DECL_SIMD_cos
> +#endif
> +#if defined _Mfloat_ && !defined _Mlong_double_ && defined __DECL_SIMD_cosf
> +__DECL_SIMD_cosf
> +#endif
> +#if defined _Mlong_double_ && defined __DECL_SIMD_cosl
> +__DECL_SIMD_cosl
> +#endif
>  __MATHCALL (cos,, (_Mdouble_ __x));

As previously noted, I think it would be much better if the definition of 
__MATHCALL can include all the conditional bits (possibly through a 
generated header that defines __DECL_SIMD_cos etc. to empty if not defined 
by bits/math-vector.h).
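
One way this suggestion could look, as a sketch under stated assumptions: the macro names below are hypothetical and drop glibc's leading underscores so they cannot clash with a real <math.h>, the __simd__ attribute is only stringified (never applied to a declaration), and the point is that the per-precision #if blocks collapse into one unconditional macro once every DECL_SIMD_* has an empty default.

```c
#include <string.h>

#define CONCAT_(a, b) a##b
#define CONCAT(a, b) CONCAT_ (a, b)
#define STR_(x) #x
#define STR(x) STR_ (x)

/* What an architecture's math-vector header might provide: only the
   functions that really have vector variants.  */
#define DECL_SIMD_cos __attribute__ ((__simd__ ("notinbranch")))

/* What a generated header would add: an empty default for every other
   function, so declarations never need #ifdef.  */
#ifndef DECL_SIMD_cos
# define DECL_SIMD_cos
#endif
#ifndef DECL_SIMD_cosf
# define DECL_SIMD_cosf
#endif
#ifndef DECL_SIMD_cosl
# define DECL_SIMD_cosl
#endif

/* One unconditional macro, used once per precision like __MATHCALL.  */
#define SIMD_DECL(name) CONCAT (DECL_SIMD_, name)
#define MATHCALL_VEC(type, function, suffix, args) \
  SIMD_DECL (CONCAT (function, suffix)) type CONCAT (function, suffix) args

/* The token sequences the three declarations expand to, as strings.  */
static const char cos_decl[]  = STR (MATHCALL_VEC (double, cos, , (double __x)));
static const char cosf_decl[] = STR (MATHCALL_VEC (float, cos, f, (float __x)));
static const char cosl_decl[] =
  STR (MATHCALL_VEC (long double, cos, l, (long double __x)));
```

Only cos_decl picks up the attribute; the float and long double forms expand to plain declarations without any conditional at the point of use.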

> diff --git a/math/have_vector.h b/math/have_vector.h
> new file mode 100644
> index 0000000..94aacf0
> --- /dev/null
> +++ b/math/have_vector.h
> @@ -0,0 +1,2574 @@
> +/* 
> +Definitions below are generated with the following bash script:
> +for func in $(grep ALL_RM_TEST math/libm-test.inc | awk {'print $2'} | sed -e "s/(//" -e "s/,//"); do 

Rather than having such a file checked in, makefile rules / scripts to 
generate it at test time should be checked in.

> +static int avx2_usable;		/* Set to 1 if AVX2 supported */

Given that we expect multiple architectures to have vector functions, 
this belongs in some architecture-specific file that libm-test.inc 
includes, rather than directly in libm-test.inc (which shouldn't refer 
directly to AVX at all).
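One possible shape for that split, as a sketch only: a hypothetical per-architecture header supplies `INIT_ARCH_EXT` and `ARCH_EXT_USABLE` (both names invented here), and the generic `CHECK_ARCH_EXT` in libm-test.inc uses nothing else. GCC's `__builtin_cpu_init` / `__builtin_cpu_supports` are swapped in for glibc's internal init-arch.h / `__cpu_features` machinery purely to keep the example self-contained; they are not what the patch uses.

```c
/* Hypothetical per-architecture part (would live under sysdeps/).  */
#if defined __x86_64__ && defined __GNUC__
/* Stand-in for init-arch.h: query the CPU through GCC builtins.  */
# define INIT_ARCH_EXT __builtin_cpu_init ()
# define ARCH_EXT_USABLE __builtin_cpu_supports ("avx2")
#else
/* Generic stub: no vector variants, so the vector tests are skipped.  */
# define INIT_ARCH_EXT ((void) 0)
# define ARCH_EXT_USABLE 0
#endif

/* Generic part (libm-test.inc): no AVX2-specific names appear here.  */
#define CHECK_ARCH_EXT                  \
  do                                    \
    {                                   \
      if (!(ARCH_EXT_USABLE))           \
        return 0;                       \
    }                                   \
  while (0)

/* Returns 1 if the vector tests would run on this CPU, 0 if skipped.  */
static int
run_vector_tests (void)
{
  INIT_ARCH_EXT;
  CHECK_ARCH_EXT;
  /* ... tests of the vector variants would run here ... */
  return 1;
}
```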

> -#define RUN_TEST_f_f(ARG_STR, FUNC_NAME, ARG, EXPECTED,			\
> -		     EXCEPTIONS)					\
> -  do									\
> -    if (enable_test (EXCEPTIONS))					\
> -      {									\
> -	COMMON_TEST_SETUP (ARG_STR);					\
> -	check_float (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
> -		     EXCEPTIONS);					\
> -	COMMON_TEST_CLEANUP;						\
> -      }									\
> +#define RUN_TEST_f_f(ARG_STR, FUNC_NAME, ARG, EXPECTED,				\
> +		     EXCEPTIONS)						\
> +  do										\
> +    if (enable_test (EXCEPTIONS))						\
> +      {										\
> +	COMMON_TEST_SETUP (ARG_STR);						\
> +	check_float (test_name,							\
> +		     CONCAT (CONCAT3_1 (VEC_PREFIX_, FUNC_NAME, FUNC ( )),	\
> +			     FUNC (FUNC_NAME)) (ARG),				\
> +			     EXPECTED,						\
> +		     EXCEPTIONS);						\
> +	COMMON_TEST_CLEANUP;							\
> +      }										\

I think it would be better for FUNC to be defined, in the test file that 
includes libm-test.inc, in a way that avoids the need for the CONCAT* 
calls here.  (To avoid warnings / errors about undeclared functions, I 
suppose the generated header might then need to redefine e.g. vec_sin to 
sin if there isn't a vector version of sin.)
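A minimal sketch of that approach, assuming hypothetical scalar functions `sq` and `cube` in place of real libm functions, a 4-lane variant for `sq` only, and the `FUNC_TEST` name used later in this thread. The test file fixes the `vec_` prefix once, and the generated header maps `vec_cube` back to `cube` because no vector variant exists, so RUN_TEST_* needs neither a CONCAT chain nor references to undeclared functions.

```c
/* Hypothetical scalar functions standing in for libm entry points.  */
static double sq (double x) { return x * x; }
static double cube (double x) { return x * x * x; }

/* Hypothetical 4-lane vector variant: only sq has one.  */
static void
vec4_sq (const double in[4], double out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = sq (in[i]);
}

/* Test-side wrapper with a scalar signature, so the RUN_TEST_* macros
   can call it exactly like the scalar function.  */
static double
vec_sq (double x)
{
  double in[4] = { x, x, x, x }, out[4];
  vec4_sq (in, out);
  return out[0];
}

/* Generated-header output for a function with no vector variant:
   map the vec_ name back to the scalar function.  */
#define vec_cube cube

/* Defined once in the vector test file; RUN_TEST_* would then use
   FUNC_TEST (FUNC_NAME) (ARG) directly.  */
#define FUNC_TEST(name) vec_##name
```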

> +#if defined __x86_64__ && defined __FAST_MATH__
> +# if defined _OPENMP && _OPENMP >= 201307
> +/* OpenMP case. */
> +#  define __DECL_SIMD_AVX2 _Pragma("omp declare simd notinbranch")
> +#  define __DECL_SIMD_SSE4 _Pragma("omp declare simd notinbranch")

Of course we still need the API/ABI documentation providing the stable 
guarantee about exactly what this pragma means regarding the function 
versions it is saying are available in glibc.
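To make the ABI question concrete: under the x86_64 Vector Function ABI (as implemented by GCC), each variant that a `declare simd` declaration advertises is looked up under a mangled name of the form `_ZGV<isa><mask><vlen><parameters>_<name>`. The helper below is a hypothetical sketch, not anything in glibc; it only spells those names out, e.g. `_ZGVdN4v_cos` for the 4-lane unmasked AVX2 variant of cos. A declaration without an ISA restriction may lead the compiler to assume variants for every ISA class it knows about, which is why the documented guarantee needs to say exactly which of these symbols libmvec exports.

```c
#include <stdio.h>
#include <string.h>

/* Build a vector-variant name: _ZGV <isa> <mask> <vlen> <params> _ <name>
     isa:    'b' SSE, 'c' AVX, 'd' AVX2, 'e' AVX-512
     mask:   'N' = notinbranch (unmasked), 'M' = inbranch (masked)
     params: one letter per parameter, 'v' = vector, 'u' = uniform,
             'l' = linear
   Returns the length of the full name, as snprintf does.  */
static int
vector_abi_name (char *buf, size_t size, char isa, int masked,
                 int vlen, const char *params, const char *scalar)
{
  return snprintf (buf, size, "_ZGV%c%c%d%s_%s", isa,
                   masked ? 'M' : 'N', vlen, params, scalar);
}
```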

> +#  define __DECL_SIMD_cos  __DECL_SIMD_AVX2
> +#  define __DECL_SIMD_cosf __DECL_SIMD_SSE4
> +# elif defined _CILKPLUS && _CILKPLUS >= 0
> +/* CilkPlus case. TODO _CILKPLUS currently nowhere defined */
> +#  define __DECL_SIMD_AVX2 __attribute__((__vector__(nomask)))
> +#  define __DECL_SIMD_SSE4 __attribute__((__vector__(processor(core_i7_sse4_2),\
> +						     nomask)))

And as previously noted, this needs to be fixed to be namespace-clean - 
using __nomask__, __processor__, __core_i7_sse4_2__.

> +#if defined TEST_MATHVEC

No, you can't have such conditionals on a macro in the user's namespace in 
an installed header.

> diff --git a/sysdeps/x86_64/configure.ac b/sysdeps/x86_64/configure.ac
> index c9f9a51..91c4cdf 100644
> --- a/sysdeps/x86_64/configure.ac
> +++ b/sysdeps/x86_64/configure.ac
> @@ -5,6 +5,24 @@ AC_CHECK_HEADER([cpuid.h], ,
>    [AC_MSG_ERROR([gcc must provide the <cpuid.h> header])],
>    [/* No default includes.  */])
>  
> +dnl Check if compiler target is x86_64.

Not needed.  preconfigure fragments in sysdeps directories need to check 
the architecture, but configure ones don't (they'll only be run for the 
relevant architecture, unless one fragment explicitly sources another).

> diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
> new file mode 100644
> index 0000000..d585fa0
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/Makefile
> @@ -0,0 +1,33 @@
> +ifeq ($(subdir),mathvec)
> +libmvec-support += svml_d_cos4_core svml_d_cos_data
> +endif
> +
> +# Rules for libmvec tests
> +ifeq ($(subdir),math)
> +ifneq ($(PERL),no)
> +ifeq ($(build-mathvec),yes)
> +libm-tests += test-double-vlen4 test-float-vlen8
> +
> +CFLAGS-test-double-vlen4-wrapper.c = -fno-inline -ffloat-store -fno-builtin \
> +				     -frounding-math -mavx2
> +CFLAGS-test-float-vlen8-wrapper.c = -fno-inline -ffloat-store -fno-builtin \
> +				    -frounding-math -mavx2

I think the sysdeps makefile should actually just define that double-vlen4 
and float-vlen8 are the vector lengths for which testing should take 
place, with all the other testing rules being arranged in an 
architecture-independent way.

> +/* General constants:
> + * lAbsMask
> + */
> +	.long	0xffffffff
> +	.long	0x7fffffff

My previous point from 
<https://sourceware.org/ml/libc-alpha/2014-10/msg00324.html> still applies 
about how to make these tables more readable (one line per "double" 
constant, more explicitly say what the constants are) and ensure that the 
offsets in the tables are directly linked to the offsets used in the 
function implementation.
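The same idea expressed in C may make the suggestion clearer. This is a sketch only: `struct cos_data`, its `sign_mask` entry and `ASM_ABS_MASK_OFFSET` are hypothetical. Each constant gets one named 64-bit entry (the quoted `.long 0xffffffff` / `.long 0x7fffffff` pair is simply the little-endian encoding of 0x7fffffffffffffff), and the offsets are derived with `offsetof` and checked against the value the assembly uses, so the table and the code indexing it cannot silently drift apart.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C view of the cos data table: one named 64-bit entry
   per "double" constant instead of pairs of .long words.  */
struct cos_data
{
  uint64_t abs_mask;    /* lAbsMask: clears the sign bit.  */
  uint64_t sign_mask;   /* Hypothetical second entry: only the sign bit.  */
};

static const struct cos_data cos_data =
{
  .abs_mask = UINT64_C (0x7fffffffffffffff),
  .sign_mask = UINT64_C (0x8000000000000000),
};

/* Offsets derived from the layout rather than typed by hand.  */
enum
{
  COS_DATA_ABS_MASK = offsetof (struct cos_data, abs_mask),
  COS_DATA_SIGN_MASK = offsetof (struct cos_data, sign_mask)
};

/* Hypothetical offset as hard-coded in the .S file; the build fails if
   it ever disagrees with the table layout.  */
#define ASM_ABS_MASK_OFFSET 0
_Static_assert (COS_DATA_ABS_MASK == ASM_ABS_MASK_OFFSET,
                "assembly offset of lAbsMask does not match the table");

/* Scalar illustration of what the kernel does with lAbsMask.  */
static double
abs_via_mask (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  bits &= cos_data.abs_mask;
  memcpy (&x, &bits, sizeof x);
  return x;
}
```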

> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrapper.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrapper.c
> new file mode 100644
> index 0000000..0778e23
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrapper.c

This file may well need to be architecture-specific, at least as written, 
but ...

> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4.c b/sysdeps/x86_64/fpu/test-double-vlen4.c
> new file mode 100644
> index 0000000..4d3d9a3
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/test-double-vlen4.c

 ... it's not at all clear that this one should need to be.  At present it 
has some architecture-specific bits

> +#define CHECK_ARCH_EXT if (!avx2_usable) return;
> +
> +extern FLOAT WRAPPER_NAME (cos) (FLOAT);

but I'd think those are all that needs to go somewhere 
architecture-specific and the rest is pretty generic to any architecture 
supporting vector functions for vectors of 4 doubles.
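A sketch of the generic part, assuming (hypothetically) that a vector variant can be modelled as a function over arrays of `VLEN` doubles, with a toy `vec_square` in place of a real libmvec function. Only building the actual vector type, the symbol being called and `CHECK_ARCH_EXT` would stay architecture-specific; broadcasting the argument, checking that all lanes agree and returning lane 0 are the same on any architecture with 4-double vectors.

```c
#include <assert.h>
#include <string.h>

#define VLEN 4   /* Lanes: the part that differs between vlen4 and vlen8.  */

/* Assumed shape of a vector variant, expressed with arrays so the
   sketch does not depend on any particular vector ABI.  */
typedef void vec_fn (const double in[VLEN], double out[VLEN]);

/* Generic scalar-to-vector wrapper: broadcast X into every lane, call
   the vector variant, require all lanes to agree bit for bit (so NaN
   results are handled too), and return lane 0.  */
static double
vlen_wrapper (vec_fn *vec, double x)
{
  double in[VLEN], out[VLEN];
  for (int i = 0; i < VLEN; i++)
    in[i] = x;
  vec (in, out);
  for (int i = 1; i < VLEN; i++)
    assert (memcmp (&out[i], &out[0], sizeof out[0]) == 0);
  return out[0];
}

/* Hypothetical 4-lane variant, used only to exercise the wrapper.  */
static void
vec_square (const double in[VLEN], double out[VLEN])
{
  for (int i = 0; i < VLEN; i++)
    out[i] = in[i] * in[i];
}
```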

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [RFC] How to add vector math functions to Glibc
  2014-10-23 21:37                                                     ` Joseph S. Myers
@ 2014-10-27 14:00                                                       ` Andrew Senkevich
  2014-10-27 14:39                                                         ` Joseph S. Myers
  2014-10-29 13:00                                                       ` Andrew Senkevich
  2014-11-06 20:51                                                       ` Andrew Senkevich
  2 siblings, 1 reply; 67+ messages in thread
From: Andrew Senkevich @ 2014-10-27 14:00 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: libc-alpha

2014-10-24 1:37 GMT+04:00 Joseph S. Myers <joseph@codesourcery.com>:
> On Thu, 23 Oct 2014, Andrew Senkevich wrote:
>
>> Let me know if such changes ok in general.
>
> I'm not clear we yet reached consensus on whether glibc is the right place
> for this; I think that discussion tailed off without a clear conclusion,
> and someone needs to reread it, post a careful analysis of the discussion
> so far and try to help the community reach consensus.

It was already decided, and recorded in the Consensus section of the
wiki, in https://sourceware.org/ml/libc-alpha/2014-09/msg00596.html.
Link to the wiki: https://sourceware.org/glibc/wiki/libm#Consensus

>> +#  define __DECL_SIMD_cos  __DECL_SIMD_AVX2
>> +#  define __DECL_SIMD_cosf __DECL_SIMD_SSE4
>> +# elif defined _CILKPLUS && _CILKPLUS >= 0
>> +/* CilkPlus case. TODO _CILKPLUS currently nowhere defined */
>> +#  define __DECL_SIMD_AVX2 __attribute__((__vector__(nomask)))
>> +#  define __DECL_SIMD_SSE4 __attribute__((__vector__(processor(core_i7_sse4_2),\
>> +                                                  nomask)))
>
> And as previously noted, this needs to be fixed to be namespace-clean -
> using __nomask__, __processor__, __core_i7_sse4_2__.

It seems there are currently no reserved-namespace versions of these keywords...

>> +#if defined TEST_MATHVEC
>
> No, you can't have such conditionals on a macro in the user's namespace in
> an installed header.

Then we would have to build the vector tests with -D__FAST_MATH__
-DTEST_FAST_MATH -D_OPENMP=201307 to be sure we get the needed
definitions from math.h?

>> diff --git a/sysdeps/x86_64/configure.ac b/sysdeps/x86_64/configure.ac
>> index c9f9a51..91c4cdf 100644
>> --- a/sysdeps/x86_64/configure.ac
>> +++ b/sysdeps/x86_64/configure.ac
>> @@ -5,6 +5,24 @@ AC_CHECK_HEADER([cpuid.h], ,
>>    [AC_MSG_ERROR([gcc must provide the <cpuid.h> header])],
>>    [/* No default includes.  */])
>>
>> +dnl Check if compiler target is x86_64.
>
> Not needed.  preconfigure fragments in sysdeps directories need to check
> the architecture, but configure ones don't (they'll only be run for the
> relevant architecture, unless one fragment explicitly sources another).

Understood; then it can be done in the top-level configure like so:
+if test x"$build_mathvec" = xnotset; then
+  if test x"$machine" = xx86_64/64; then
+    build_mathvec=yes
+  else
+    build_mathvec=no
+  fi
+fi
+LIBC_CONFIG_VAR([build-mathvec], [$build_mathvec])


--
WBR,
Andrew


* Re: [RFC] How to add vector math functions to Glibc
  2014-10-27 14:00                                                       ` Andrew Senkevich
@ 2014-10-27 14:39                                                         ` Joseph S. Myers
  0 siblings, 0 replies; 67+ messages in thread
From: Joseph S. Myers @ 2014-10-27 14:39 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: libc-alpha

On Mon, 27 Oct 2014, Andrew Senkevich wrote:

> >> +#  define __DECL_SIMD_cos  __DECL_SIMD_AVX2
> >> +#  define __DECL_SIMD_cosf __DECL_SIMD_SSE4
> >> +# elif defined _CILKPLUS && _CILKPLUS >= 0
> >> +/* CilkPlus case. TODO _CILKPLUS currently nowhere defined */
> >> +#  define __DECL_SIMD_AVX2 __attribute__((__vector__(nomask)))
> >> +#  define __DECL_SIMD_SSE4 __attribute__((__vector__(processor(core_i7_sse4_2),\
> >> +                                                  nomask)))
> >
> > And as previously noted, this needs to be fixed to be namespace-clean -
> > using __nomask__, __processor__, __core_i7_sse4_2__.
> 
> It seems there are currently no reserved-namespace versions of these keywords...

Then fix the compiler to have such reserved-namespace versions and put 
appropriate conditionals on a fixed compiler version in the header.  It's 
not OK to put random identifiers into an installed header like that.
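A sketch of what such a conditional could look like. `GNUC_PREREQ` mirrors glibc's `__GNUC_PREREQ` from <features.h>; version 99.0 and the reserved spellings `__processor__`, `__core_i7_sse4_2__` and `__nomask__` are placeholders for a compiler fix that does not exist yet, so with current compilers the macro expands to nothing and no user-namespace identifier reaches the installed header.

```c
/* Same test as glibc's __GNUC_PREREQ.  */
#if defined __GNUC__ && defined __GNUC_MINOR__
# define GNUC_PREREQ(maj, min) \
  ((__GNUC__ << 16) + __GNUC_MINOR__ >= ((maj) << 16) + (min))
#else
# define GNUC_PREREQ(maj, min) 0
#endif

/* Hypothetical: 99.0 stands for "first compiler release with reserved
   spellings"; no such release is implied.  */
#if GNUC_PREREQ (99, 0)
# define DECL_SIMD_SSE4 \
  __attribute__ ((__vector__ (__processor__ (__core_i7_sse4_2__), __nomask__)))
#else
/* Unfixed compilers: advertise no vector variant rather than put
   user-namespace identifiers into an installed header.  */
# define DECL_SIMD_SSE4
#endif

/* Records whether the vector declarations are enabled for this compiler.  */
static const int simd_decls_enabled = GNUC_PREREQ (99, 0);
```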

> >> +#if defined TEST_MATHVEC
> >
> > No, you can't have such conditionals on a macro in the user's namespace in
> > an installed header.
> 
> Then we would have to build the vector tests with -D__FAST_MATH__
> -DTEST_FAST_MATH -D_OPENMP=201307 to be sure we get the needed
> definitions from math.h?

Yes, -D__FAST_MATH__ is used for some other libm tests.

> >> diff --git a/sysdeps/x86_64/configure.ac b/sysdeps/x86_64/configure.ac
> >> index c9f9a51..91c4cdf 100644
> >> --- a/sysdeps/x86_64/configure.ac
> >> +++ b/sysdeps/x86_64/configure.ac
> >> @@ -5,6 +5,24 @@ AC_CHECK_HEADER([cpuid.h], ,
> >>    [AC_MSG_ERROR([gcc must provide the <cpuid.h> header])],
> >>    [/* No default includes.  */])
> >>
> >> +dnl Check if compiler target is x86_64.
> >
> > Not needed.  preconfigure fragments in sysdeps directories need to check
> > the architecture, but configure ones don't (they'll only be run for the
> > relevant architecture, unless one fragment explicitly sources another).
> 
> Understood; then it can be done in the top-level configure like so:
> +if test x"$build_mathvec" = xnotset; then
> +  if test x"$machine" = xx86_64/64; then

No.  Such conditionals on particular systems do not go in the toplevel 
configure script.

If you want something for x86_64 (both -m64 and -mx32), it can go in 
sysdeps/x86_64/configure.ac, without any machine conditionals.  And if 
there's a good reason (please state the reason if so) something won't work 
for x32, put it in sysdeps/x86_64/64/configure.ac, again with no machine 
conditionals (and in that case, the implementation files would also go in 
sysdeps/x86_64/64/ directories).

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [RFC] How to add vector math functions to Glibc
  2014-10-23 21:37                                                     ` Joseph S. Myers
  2014-10-27 14:00                                                       ` Andrew Senkevich
@ 2014-10-29 13:00                                                       ` Andrew Senkevich
  2014-10-29 18:50                                                         ` Joseph S. Myers
  2014-11-06 20:51                                                       ` Andrew Senkevich
  2 siblings, 1 reply; 67+ messages in thread
From: Andrew Senkevich @ 2014-10-29 13:00 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 5846 bytes --]

2014-10-24 1:37 GMT+04:00 Joseph S. Myers <joseph@codesourcery.com>:

>> +static int avx2_usable;              /* Set to 1 if AVX2 supported */
>
> Given that we expect multiple architectures to have vector functions,
> this belongs in some architecture-specific file that libm-test.inc
> includes, rather than directly in libm-test.inc (which shouldn't refer
> directly to AVX at all).

> which shouldn't refer directly to AVX at all
Do you mean placing the avx2_usable initialization in a function in an
architecture-specific *.c file, with a generic stub, calling it from the
test's main() and changing the build accordingly?
Or maybe simply keep the __cpu_features.feature[index_AVX2_Usable] &
bit_AVX2_Usable check in every test function, inserted through macros?
That doesn't require such big changes and doesn't affect performance
significantly. Or the initialization could be inserted into the test's
main() through macros as well.

>> -#define RUN_TEST_f_f(ARG_STR, FUNC_NAME, ARG, EXPECTED,                      \
>> -                  EXCEPTIONS)                                        \
>> -  do                                                                 \
>> -    if (enable_test (EXCEPTIONS))                                    \
>> -      {                                                                      \
>> -     COMMON_TEST_SETUP (ARG_STR);                                    \
>> -     check_float (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,       \
>> -                  EXCEPTIONS);                                       \
>> -     COMMON_TEST_CLEANUP;                                            \
>> -      }                                                                      \
>> +#define RUN_TEST_f_f(ARG_STR, FUNC_NAME, ARG, EXPECTED,                              \
>> +                  EXCEPTIONS)                                                \
>> +  do                                                                         \
>> +    if (enable_test (EXCEPTIONS))                                            \
>> +      {                                                                              \
>> +     COMMON_TEST_SETUP (ARG_STR);                                            \
>> +     check_float (test_name,                                                 \
>> +                  CONCAT (CONCAT3_1 (VEC_PREFIX_, FUNC_NAME, FUNC ( )),      \
>> +                          FUNC (FUNC_NAME)) (ARG),                           \
>> +                          EXPECTED,                                          \
>> +                  EXCEPTIONS);                                               \
>> +     COMMON_TEST_CLEANUP;                                                    \
>> +      }                                                                              \
>
> I think it would be better for FUNC to be defined, in the test file that
> includes libm-test.inc, in a way that avoids the need for the CONCAT*
> calls here.  (To avoid warnings / errors about undeclared functions, I
> suppose the generated header might then need to redefine e.g. vec_sin to
> sin if there isn't a vector version of sin.)

It is not a good idea to change the FUNC definition, since FUNC is used in
libm-test.c outside the test macros as well (so it could cause a vector
function to be called with a non-vector argument). But it is possible to
reduce the number of concatenations if the generated definitions are
changed in the way you proposed.

Not all functions are tested through ALL_RM_TEST: cexp, tgamma and jn
are not tested in all rounding modes, so the definitions for them have
to be generated manually in the script.

>> diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
>> new file mode 100644
>> index 0000000..d585fa0
>> --- /dev/null
>> +++ b/sysdeps/x86_64/fpu/Makefile
>> @@ -0,0 +1,33 @@
>> +ifeq ($(subdir),mathvec)
>> +libmvec-support += svml_d_cos4_core svml_d_cos_data
>> +endif
>> +
>> +# Rules for libmvec tests
>> +ifeq ($(subdir),math)
>> +ifneq ($(PERL),no)
>> +ifeq ($(build-mathvec),yes)
>> +libm-tests += test-double-vlen4 test-float-vlen8
>> +
>> +CFLAGS-test-double-vlen4-wrapper.c = -fno-inline -ffloat-store -fno-builtin \
>> +                                  -frounding-math -mavx2
>> +CFLAGS-test-float-vlen8-wrapper.c = -fno-inline -ffloat-store -fno-builtin \
>> +                                 -frounding-math -mavx2
>
> I think the sysdeps makefile should actually just define that double-vlen4
> and float-vlen8 are the vector lengths for which testing should take
> place, with all the other testing rules being arranged in an
> architecture-independent way.

Do you mean to keep only the CFLAGS-* definitions in
sysdeps/x86_64/fpu/Makefile, or to set up some variable that the common
makefile will use to build the vector tests?

>> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrapper.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrapper.c
>> new file mode 100644
>> index 0000000..0778e23
>> --- /dev/null
>> +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrapper.c
>
> This file may well need to be architecture-specific, at least as written,
> but ...
>
>> diff --git a/sysdeps/x86_64/fpu/test-double-vlen4.c b/sysdeps/x86_64/fpu/test-double-vlen4.c
>> new file mode 100644
>> index 0000000..4d3d9a3
>> --- /dev/null
>> +++ b/sysdeps/x86_64/fpu/test-double-vlen4.c
>
>  ... it's not at all clear that this one should need to be.  At present it
> has some architecture-specific bits
>
>> +#define CHECK_ARCH_EXT if (!avx2_usable) return;
>> +
>> +extern FLOAT WRAPPER_NAME (cos) (FLOAT);
>
> but I'd think those are all that needs to go somewhere
> architecture-specific and the rest is pretty generic to any architecture
> supporting vector functions for vectors of 4 doubles.

Then let's have math/test-double-vlen4.h with the common definitions and
sysdeps/x86_64/fpu/test-double-vlen4.c containing the wrapper.
The attached patch contains only the changes discussed here.


--
WBR,
Andrew

[-- Attachment #2: tmp29.10.patch --]
[-- Type: application/octet-stream, Size: 24935 bytes --]

diff --git a/math/Makefile b/math/Makefile
index 866bc0f..ea5e6e1
--- a/math/Makefile
+++ b/math/Makefile
@@ -26,7 +26,7 @@ headers		:= math.h bits/mathcalls.h bits/mathinline.h bits/huge_val.h \
 		   bits/huge_valf.h bits/huge_vall.h bits/inf.h bits/nan.h \
 		   fpu_control.h complex.h bits/cmathcalls.h fenv.h \
 		   bits/fenv.h bits/fenvinline.h bits/mathdef.h tgmath.h \
-		   bits/math-finite.h
+		   bits/math-finite.h bits/math-vector.h
 
 # FPU support code.
 aux		:= setfpucw fpu_control
@@ -85,6 +85,22 @@ generated += $(foreach s,.c .S l.c l.S f.c f.S,$(calls:s_%=m_%$s))
 routines = $(calls) $(calls:=f) $(long-c-$(long-double-fcts))
 long-c-yes = $(calls:=l)
 
+ifeq ($(build-mathvec),yes)
+# We need to install libm.so as linker script
+# for more comfortable use of vector math library.
+subdir_install: $(inst_libdir)/libm.so.tmp
+$(inst_libdir)/libm.so.tmp: $(common-objpfx)format.lds \
+	$(common-objpfx)math/libm.so$(libm.so-version) \
+	$(common-objpfx)mathvec/libmvec.so$(libmvec.so-version) \
+	$(+force)
+	(echo '/* GNU ld script */';\
+	cat $<; \
+	echo 'GROUP ( $(slibdir)/libm.so$(libm.so-version) ' \
+	'AS_NEEDED ( $(slibdir)/libmvec.so$(libmvec.so-version) ) )' \
+	) > $@
+	mv -f $@ $(inst_libdir)/libm.so
+endif
+
 # Rules for the test suite.
 tests = test-matherr test-fenv atest-exp atest-sincos atest-exp2 basic-test \
 	test-misc test-fpucw test-fpucw-ieee tst-definitions test-tgmath \
@@ -102,7 +118,7 @@ libm-tests = test-float test-double $(test-longdouble-$(long-double-fcts)) \
 libm-tests.o = $(addsuffix .o,$(libm-tests))
 
 tests += $(libm-tests)
-libm-tests-generated = libm-test-ulps.h libm-test.c
+libm-tests-generated = libm-test-ulps.h libm-have-vector.h libm-test.c
 generated += $(libm-tests-generated) libm-test.stmp
 
 # This is needed for dependencies
@@ -113,9 +129,10 @@ ulps-file = $(firstword $(wildcard $(sysdirs:%=%/libm-test-ulps)))
 $(addprefix $(objpfx), $(libm-tests-generated)): $(objpfx)libm-test.stmp
 
 $(objpfx)libm-test.stmp: $(ulps-file) libm-test.inc gen-libm-test.pl \
-			 auto-libm-test-out
+			 gen-libm-have-vector.sh auto-libm-test-out
 	$(make-target-directory)
 	$(PERL) gen-libm-test.pl -u $< -o "$(objpfx)"
+	$(BASH) gen-libm-have-vector.sh > $(objpfx)libm-have-vector.h
 	@echo > $@
 
 $(objpfx)test-float.o: $(objpfx)libm-test.stmp
@@ -124,6 +141,14 @@ $(objpfx)test-double.o: $(objpfx)libm-test.stmp
 $(objpfx)test-idouble.o: $(objpfx)libm-test.stmp
 $(objpfx)test-ldouble.o: $(objpfx)libm-test.stmp
 $(objpfx)test-ildoubl.o: $(objpfx)libm-test.stmp
+
+$(objpfx)test-double-vlen4.o: $(objpfx)libm-test.stmp
+#$(objpfx)test-float-vlen8.o: $(objpfx)libm-test.stmp
+
+$(objpfx)test-double-vlen4: $(common-objpfx)mathvec/libmvec.so \
+			    $(objpfx)init-arch.o
+#$(objpfx)test-float-vlen8: $(common-objpfx)mathvec/libmvec.so \
+			    $(objpfx)init-arch.o
 endif
 
 CFLAGS-test-float.c = -fno-inline -ffloat-store -fno-builtin -frounding-math
diff --git a/math/gen-libm-have-vector.sh b/math/gen-libm-have-vector.sh
new file mode 100755
index 0000000..baf1368
--- /dev/null
+++ b/math/gen-libm-have-vector.sh
@@ -0,0 +1,48 @@
+#!/bin/sh
+# Copyright (C) 1999-2014 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+
+# Generate series of definitions used for vector math functions tests.
+print_defs()
+{
+  echo "#ifdef __DECL_SIMD_$1"
+  echo "# define HAVE_VECTOR_$1 1"
+  echo "# define VEC_PREFIX_$1 WRAPPER_NAME($1)"
+  echo "#else"
+  echo "# define HAVE_VECTOR_$1 0"
+  echo "# define VEC_PREFIX_$1 $1"
+  echo "#endif"
+  echo
+}
+
+for func in $(grep ALL_RM_TEST libm-test.inc | grep -v define | sed -r "s/.*\(//; s/,.*//"); do 
+  print_defs ${func}
+  print_defs ${func}f
+  print_defs ${func}l
+done
+
+print_defs jn
+print_defs jnf
+print_defs jnl
+
+print_defs cexp
+print_defs cexpf
+print_defs cexpl
+
+print_defs tgamma
+print_defs tgammaf
+print_defs tgammal
diff --git a/math/libm-test.inc b/math/libm-test.inc
index f86a4fa..e348ed4 100644
--- a/math/libm-test.inc
+++ b/math/libm-test.inc
@@ -126,6 +126,7 @@
 #include <argp.h>
 #include <tininess.h>
 #include <math-tests.h>
+#include <init-arch.h>
 
 /* Structure for ulp data for a function, or the real or imaginary
    part of a function.  */
@@ -678,13 +679,17 @@ test_exceptions (const char *test_name, int exception)
   feclearexcept (FE_ALL_EXCEPT);
 }
 
+#ifndef TEST_MATHVEC
+# define TEST_MATHVEC 0
+#endif
+
 /* Test whether errno for TEST_NAME, set to ERRNO_VALUE, has value
    EXPECTED_VALUE (description EXPECTED_NAME).  */
 static void
 test_single_errno (const char *test_name, int errno_value,
 		   int expected_value, const char *expected_name)
 {
-#ifndef TEST_INLINE
+#if !defined TEST_INLINE && !TEST_MATHVEC
   if (errno_value == expected_value)
     {
       if (print_screen (1))
@@ -1295,16 +1300,17 @@ struct test_fFF_11_data
 
 /* Run an individual test, including any required setup and checking
    of results, or loop over all tests in an array.  */
-#define RUN_TEST_f_f(ARG_STR, FUNC_NAME, ARG, EXPECTED,			\
-		     EXCEPTIONS)					\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
-		     EXCEPTIONS);					\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_f_f(ARG_STR, FUNC_NAME, ARG, EXPECTED,		\
+		     EXCEPTIONS)				\
+  do								\
+    if (enable_test (EXCEPTIONS))				\
+      {								\
+	COMMON_TEST_SETUP (ARG_STR);				\
+	check_float (test_name,	FUNC_TEST (FUNC_NAME) (ARG),	\
+		     EXPECTED,					\
+		     EXCEPTIONS);				\
+	COMMON_TEST_CLEANUP;					\
+      }								\
   while (0)
 #define RUN_TEST_LOOP_f_f(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1313,16 +1319,16 @@ struct test_fFF_11_data
 		    (ARRAY)[i].RM_##ROUNDING_MODE.expected,		\
 		    (ARRAY)[i].RM_##ROUNDING_MODE.exceptions);		\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_2_f(ARG_STR, FUNC_NAME, ARG1, ARG2, EXPECTED,	\
-		     EXCEPTIONS)				\
-  do								\
-    if (enable_test (EXCEPTIONS))				\
-      {								\
-	COMMON_TEST_SETUP (ARG_STR);				\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG1, ARG2),	\
-		     EXPECTED, EXCEPTIONS);			\
-	COMMON_TEST_CLEANUP;					\
-      }								\
+#define RUN_TEST_2_f(ARG_STR, FUNC_NAME, ARG1, ARG2, EXPECTED,		\
+		     EXCEPTIONS)					\
+  do									\
+    if (enable_test (EXCEPTIONS))					\
+      {									\
+	COMMON_TEST_SETUP (ARG_STR);					\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG1, ARG2),	\
+		     EXPECTED, EXCEPTIONS);				\
+	COMMON_TEST_CLEANUP;						\
+      }									\
   while (0)
 #define RUN_TEST_LOOP_2_f(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1340,16 +1346,16 @@ struct test_fFF_11_data
 #define RUN_TEST_LOOP_fl_f RUN_TEST_LOOP_2_f
 #define RUN_TEST_if_f RUN_TEST_2_f
 #define RUN_TEST_LOOP_if_f RUN_TEST_LOOP_2_f
-#define RUN_TEST_fff_f(ARG_STR, FUNC_NAME, ARG1, ARG2, ARG3,		\
-		       EXPECTED, EXCEPTIONS)				\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG1, ARG2, ARG3),	\
-		     EXPECTED, EXCEPTIONS);				\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_fff_f(ARG_STR, FUNC_NAME, ARG1, ARG2, ARG3,			\
+		       EXPECTED, EXCEPTIONS)					\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG1, ARG2, ARG3),	\
+		     EXPECTED, EXCEPTIONS);					\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_fff_f(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1359,17 +1365,17 @@ struct test_fFF_11_data
 		      (ARRAY)[i].RM_##ROUNDING_MODE.expected,		\
 		      (ARRAY)[i].RM_##ROUNDING_MODE.exceptions);	\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_c_f(ARG_STR, FUNC_NAME, ARG1, ARG2, EXPECTED,		\
-		     EXCEPTIONS)					\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	check_float (test_name,						\
-		     FUNC (FUNC_NAME) (BUILD_COMPLEX (ARG1, ARG2)),	\
-		     EXPECTED, EXCEPTIONS);				\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_c_f(ARG_STR, FUNC_NAME, ARG1, ARG2, EXPECTED,			\
+		     EXCEPTIONS)						\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	check_float (test_name,							\
+		     FUNC_TEST (FUNC_NAME) (BUILD_COMPLEX (ARG1, ARG2)),	\
+		     EXPECTED, EXCEPTIONS);					\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_c_f(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1387,7 +1393,7 @@ struct test_fFF_11_data
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
 	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;			\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG), EXPECTED,	\
 		     EXCEPTIONS);					\
 	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);				\
 	if (EXTRA_TEST)							\
@@ -1406,22 +1412,22 @@ struct test_fFF_11_data
 		     (ARRAY)[i].RM_##ROUNDING_MODE.extra_test,		\
 		     (ARRAY)[i].RM_##ROUNDING_MODE.extra_expected);	\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_fF_f1(ARG_STR, FUNC_NAME, ARG, EXPECTED,		\
-		       EXCEPTIONS, EXTRA_VAR, EXTRA_TEST,		\
-		       EXTRA_EXPECTED)					\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;			\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG, &(EXTRA_VAR)),	\
-		     EXPECTED, EXCEPTIONS);				\
-	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);				\
-	if (EXTRA_TEST)							\
-	  check_float (extra1_name, EXTRA_VAR, EXTRA_EXPECTED, 0);	\
-	EXTRA_OUTPUT_TEST_CLEANUP (1);					\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_fF_f1(ARG_STR, FUNC_NAME, ARG, EXPECTED,			\
+		       EXCEPTIONS, EXTRA_VAR, EXTRA_TEST,			\
+		       EXTRA_EXPECTED)						\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;				\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG, &(EXTRA_VAR)),	\
+		     EXPECTED, EXCEPTIONS);					\
+	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);					\
+	if (EXTRA_TEST)								\
+	  check_float (extra1_name, EXTRA_VAR, EXTRA_EXPECTED, 0);		\
+	EXTRA_OUTPUT_TEST_CLEANUP (1);						\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_fF_f1(FUNC_NAME, ARRAY, ROUNDING_MODE, EXTRA_VAR)	\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1433,22 +1439,22 @@ struct test_fFF_11_data
 		      (ARRAY)[i].RM_##ROUNDING_MODE.extra_test,		\
 		      (ARRAY)[i].RM_##ROUNDING_MODE.extra_expected);	\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_fI_f1(ARG_STR, FUNC_NAME, ARG, EXPECTED,		\
-		       EXCEPTIONS, EXTRA_VAR, EXTRA_TEST,		\
-		       EXTRA_EXPECTED)					\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;			\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG, &(EXTRA_VAR)),	\
-		     EXPECTED, EXCEPTIONS);				\
-	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);				\
-	if (EXTRA_TEST)							\
-	  check_int (extra1_name, EXTRA_VAR, EXTRA_EXPECTED, 0);	\
-	EXTRA_OUTPUT_TEST_CLEANUP (1);					\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_fI_f1(ARG_STR, FUNC_NAME, ARG, EXPECTED,			\
+		       EXCEPTIONS, EXTRA_VAR, EXTRA_TEST,			\
+		       EXTRA_EXPECTED)						\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;				\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG, &(EXTRA_VAR)),	\
+		     EXPECTED, EXCEPTIONS);					\
+	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);					\
+	if (EXTRA_TEST)								\
+	  check_int (extra1_name, EXTRA_VAR, EXTRA_EXPECTED, 0);		\
+	EXTRA_OUTPUT_TEST_CLEANUP (1);						\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_fI_f1(FUNC_NAME, ARRAY, ROUNDING_MODE, EXTRA_VAR)	\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1469,7 +1475,7 @@ struct test_fFF_11_data
 	COMMON_TEST_SETUP (ARG_STR);					\
 	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;			\
 	check_float (test_name,						\
-		     FUNC (FUNC_NAME) (ARG1, ARG2, &(EXTRA_VAR)),	\
+		     FUNC_TEST (FUNC_NAME) (ARG1, ARG2, &(EXTRA_VAR)),	\
 		     EXPECTED, EXCEPTIONS);				\
 	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);				\
 	if (EXTRA_TEST)							\
@@ -1490,17 +1496,17 @@ struct test_fFF_11_data
 		       (ARRAY)[i].RM_##ROUNDING_MODE.extra_test,	\
 		       (ARRAY)[i].RM_##ROUNDING_MODE.extra_expected);	\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_c_c(ARG_STR, FUNC_NAME, ARGR, ARGC, EXPR, EXPC,	\
-		     EXCEPTIONS)					\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	check_complex (test_name,					\
-		       FUNC (FUNC_NAME) (BUILD_COMPLEX (ARGR, ARGC)),	\
-		       BUILD_COMPLEX (EXPR, EXPC), EXCEPTIONS);		\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_c_c(ARG_STR, FUNC_NAME, ARGR, ARGC, EXPR, EXPC,		\
+		     EXCEPTIONS)						\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	check_complex (test_name,						\
+		       FUNC_TEST (FUNC_NAME) (BUILD_COMPLEX (ARGR, ARGC)),	\
+		       BUILD_COMPLEX (EXPR, EXPC), EXCEPTIONS);			\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_c_c(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1511,18 +1517,18 @@ struct test_fFF_11_data
 		    (ARRAY)[i].RM_##ROUNDING_MODE.expc,			\
 		    (ARRAY)[i].RM_##ROUNDING_MODE.exceptions);		\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_cc_c(ARG_STR, FUNC_NAME, ARG1R, ARG1C, ARG2R, ARG2C,	\
-		      EXPR, EXPC, EXCEPTIONS)				\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	check_complex (test_name,					\
-		       FUNC (FUNC_NAME) (BUILD_COMPLEX (ARG1R, ARG1C),	\
-					 BUILD_COMPLEX (ARG2R, ARG2C)),	\
-		       BUILD_COMPLEX (EXPR, EXPC), EXCEPTIONS);		\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_cc_c(ARG_STR, FUNC_NAME, ARG1R, ARG1C, ARG2R, ARG2C,		\
+		      EXPR, EXPC, EXCEPTIONS)					\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	check_complex (test_name,						\
+		       FUNC_TEST (FUNC_NAME) (BUILD_COMPLEX (ARG1R, ARG1C),	\
+					      BUILD_COMPLEX (ARG2R, ARG2C)),	\
+		       BUILD_COMPLEX (EXPR, EXPC), EXCEPTIONS);			\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_cc_c(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1539,7 +1545,7 @@ struct test_fFF_11_data
     if (enable_test (EXCEPTIONS))					\
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
-	check_int (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,		\
+	check_int (test_name, FUNC_TEST (FUNC_NAME) (ARG), EXPECTED,	\
 		   EXCEPTIONS);						\
 	COMMON_TEST_CLEANUP;						\
       }									\
@@ -1592,7 +1598,7 @@ struct test_fFF_11_data
     if (enable_test (EXCEPTIONS))					\
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
-	check_bool (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
+	check_bool (test_name, FUNC_TEST (FUNC_NAME) (ARG), EXPECTED,	\
 		    EXCEPTIONS);					\
 	COMMON_TEST_CLEANUP;						\
       }									\
@@ -1626,7 +1632,7 @@ struct test_fFF_11_data
     if (enable_test (EXCEPTIONS))					\
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
-	check_long (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
+	check_long (test_name, FUNC_TEST (FUNC_NAME) (ARG), EXPECTED,	\
 		    EXCEPTIONS);					\
 	COMMON_TEST_CLEANUP;						\
       }									\
@@ -1643,8 +1649,8 @@ struct test_fFF_11_data
     if (enable_test (EXCEPTIONS))					\
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
-	check_longlong (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
-			EXCEPTIONS);					\
+	check_longlong (test_name, FUNC_TEST (FUNC_NAME) (ARG),		\
+			EXPECTED, EXCEPTIONS);				\
 	COMMON_TEST_CLEANUP;						\
       }									\
   while (0)
@@ -1663,7 +1669,7 @@ struct test_fFF_11_data
     if (enable_test (EXCEPTIONS))					\
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
-	FUNC (FUNC_NAME) (ARG, &(EXTRA1_VAR), &(EXTRA2_VAR));		\
+	FUNC_TEST (FUNC_NAME) (ARG, &(EXTRA1_VAR), &(EXTRA2_VAR));	\
 	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);				\
 	if (EXTRA1_TEST)						\
 	  check_float (extra1_name, EXTRA1_VAR, EXTRA1_EXPECTED,	\
@@ -1690,10 +1696,30 @@ struct test_fFF_11_data
 		       (ARRAY)[i].RM_##ROUNDING_MODE.extra2_expected);	\
   ROUND_RESTORE_ ## ROUNDING_MODE
 
+#ifndef CHECK_ARCH_EXT
+# define CHECK_ARCH_EXT
+#endif
+
+#ifndef VEC_PREFIX 
+# define VEC_PREFIX
+#endif
+
+#ifndef FUNC_TEST
+# define FUNC_TEST FUNC
+#endif
+
+#include "libm-have-vector.h"
+
+#define STR_CONCAT(a,b,c) __STRING(a##b##c)
+#define STR_CON3(a,b,c) STR_CONCAT(a,b,c)
+
 /* Start and end the tests for a given function.  */
-#define START(FUNC, EXACT)			\
-  const char *this_func = #FUNC;		\
+#define START(FUN, SUFF, EXACT)				\
+  CHECK_ARCH_EXT						\
+  if (TEST_MATHVEC && !HAVE_VECTOR_ ## FUN) return;		\
+  const char *this_func = STR_CON3(VEC_PREFIX, FUN, SUFF);	\
   init_max_error (this_func, EXACT)
+  
 #define END					\
   print_max_error (this_func)
 #define END_COMPLEX				\
@@ -1705,28 +1731,28 @@ struct test_fFF_11_data
     {									\
       do								\
 	{								\
-	  START (FUNC, EXACT);						\
+	  START (FUNC, , EXACT);					\
 	  LOOP_MACRO (FUNC, ARRAY, , ## __VA_ARGS__);			\
 	  END_MACRO;							\
 	}								\
       while (0);							\
       do								\
 	{								\
-	  START (FUNC ## _downward, EXACT);				\
+	  START (FUNC, _downward, EXACT);				\
 	  LOOP_MACRO (FUNC, ARRAY, FE_DOWNWARD, ## __VA_ARGS__);	\
 	  END_MACRO;							\
 	}								\
       while (0);							\
       do								\
 	{								\
-	  START (FUNC ## _towardzero, EXACT);				\
+	  START (FUNC, _towardzero, EXACT);				\
 	  LOOP_MACRO (FUNC, ARRAY, FE_TOWARDZERO, ## __VA_ARGS__);	\
 	  END_MACRO;							\
 	}								\
       while (0);							\
       do								\
 	{								\
-	  START (FUNC ## _upward, EXACT);				\
+	  START (FUNC, _upward, EXACT);					\
 	  LOOP_MACRO (FUNC, ARRAY, FE_UPWARD, ## __VA_ARGS__);		\
 	  END_MACRO;							\
 	}								\
@@ -1746,7 +1772,6 @@ matherr (struct exception *x __attribute__ ((unused)))
   Tests for single functions of libm.
   Please keep them alphabetically sorted!
 ****************************************************************************/
-
 static const struct test_f_f_data acos_test_data[] =
   {
     TEST_f_f (acos, plus_infty, qnan_value, INVALID_EXCEPTION|ERRNO_EDOM),
@@ -6034,7 +6059,7 @@ static const struct test_c_c_data cexp_test_data[] =
 static void
 cexp_test (void)
 {
-  START (cexp, 0);
+  START (cexp, , 0);
   RUN_TEST_LOOP_c_c (cexp, cexp_test_data, );
   END_COMPLEX;
 }
@@ -6245,7 +6270,6 @@ copysign_test (void)
   ALL_RM_TEST (copysign, 1, copysign_test_data, RUN_TEST_LOOP_ff_f, END);
 }
 
-
 static const struct test_f_f_data cos_test_data[] =
   {
     TEST_f_f (cos, plus_infty, qnan_value, INVALID_EXCEPTION|ERRNO_EDOM),
@@ -6261,7 +6285,6 @@ cos_test (void)
   ALL_RM_TEST (cos, 0, cos_test_data, RUN_TEST_LOOP_f_f, END);
 }
 
-
 static const struct test_f_f_data cosh_test_data[] =
   {
     TEST_f_f (cosh, plus_infty, plus_infty, NO_TEST_INLINE),
@@ -7548,7 +7571,7 @@ static const struct test_if_f_data jn_test_data[] =
 static void
 jn_test (void)
 {
-  START (jn, 0);
+  START (jn, , 0);
   RUN_TEST_LOOP_if_f (jn, jn_test_data, );
   END;
 }
@@ -9374,7 +9397,7 @@ static const struct test_f_f_data tgamma_test_data[] =
 static void
 tgamma_test (void)
 {
-  START (tgamma, 0);
+  START (tgamma, , 0);
   RUN_TEST_LOOP_f_f (tgamma, tgamma_test_data, );
   END;
 }
@@ -9628,7 +9651,6 @@ significand_test (void)
   ALL_RM_TEST (significand, 1, significand_test_data, RUN_TEST_LOOP_f_f, END);
 }
 
-
 static void
 initialize (void)
 {
@@ -9820,10 +9842,15 @@ main (int argc, char **argv)
 	}
     }
 
-
   initialize ();
   printf (TEST_MSG);
 
+#if TEST_MATHVEC
+  static int avx2_usable = 0;
+  __init_cpu_features();
+  avx2_usable = __cpu_features.feature[index_AVX2_Usable] & bit_AVX2_Usable;
+#endif
+
   check_ulp ();
 
   /* Keep the tests a wee bit ordered (according to ISO C99).  */
diff --git a/math/test-double-vlen4.h b/math/test-double-vlen4.h
new file mode 100644
index 0000000..be47e5a
--- /dev/null
+++ b/math/test-double-vlen4.h
@@ -0,0 +1,26 @@
+
+#define FLOAT double
+#define FUNC(function) function
+#define TEST_MSG "testing double vector math (without inline functions)\n"
+#define MATHCONST(x) x
+#define CHOOSE(Clongdouble,Cdouble,Cfloat,Cinlinelongdouble,Cinlinedouble,Cinlinefloat) Cdouble
+#define PRINTF_EXPR "e"
+#define PRINTF_XEXPR "a"
+#define PRINTF_NEXPR "f"
+#define TEST_DOUBLE 1
+#define TEST_MATHVEC 1
+
+#ifndef __NO_MATH_INLINES
+# define __NO_MATH_INLINES
+#endif
+
+#define EXCEPTION_TESTS_double 0
+#define ROUNDING_TESTS_double(MODE) ((MODE) == FE_TONEAREST)
+
+#define VEC_PREFIX vlen4_
+
+#define CONCAT(prefix,func) __CONCAT(prefix, func)
+
+#define WRAPPER_NAME(function) CONCAT(VEC_PREFIX, function)
+
+#define FUNC_TEST(function) CONCAT(VEC_PREFIX_ ## function, FUNC( ))
diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
new file mode 100755
index 0000000..9af2a20
--- /dev/null
+++ b/sysdeps/x86_64/fpu/Makefile
@@ -0,0 +1,19 @@
+ifeq ($(subdir),mathvec)
+libmvec-support += svml_d_cos4_core svml_d_cos_data
+endif
+
+# Rules for libmvec tests
+ifeq ($(subdir),math)
+ifeq ($(build-mathvec),yes)
+libm-tests += test-double-vlen4 # test-float-vlen8
+
+CFLAGS-test-double-vlen4.c = -fno-inline -ffloat-store -fno-builtin \
+			     -frounding-math -mavx2 -D__FAST_MATH__ \
+			     -DTEST_FAST_MATH -D_OPENMP=201307
+
+#CFLAGS-test-float-vlen8.c = -fno-inline -ffloat-store -fno-builtin \
+			     -frounding-math -mavx2 -D__FAST_MATH__ \
+			     -DTEST_FAST_MATH -D_OPENMP=201307
+
+endif
+endif
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4.c b/sysdeps/x86_64/fpu/test-double-vlen4.c
new file mode 100644
index 0000000..67e5fd1
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-vlen4.c
@@ -0,0 +1,44 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "test-double-vlen4.h"
+
+// Wrapper from scalar to vector function implemented in AVX2.
+#define VECTOR_WRAPPER(scalar_func, vector_func) \
+extern __m256d vector_func(__m256d); \
+FLOAT scalar_func(FLOAT x)\
+{\
+  int i;\
+  __m256d mx = _mm256_set1_pd(x);\
+  __m256d mr = vector_func(mx);\
+  for(i=1;i<4;i++)\
+  {\
+    if (((FLOAT*)&mr)[0]!=((FLOAT*)&mr)[i])\
+    {\
+      return ((FLOAT*)&mr)[0]+0.1;\
+    }\
+  }\
+  return ((FLOAT*)&mr)[0];\
+}
+
+#include <immintrin.h>
+
+VECTOR_WRAPPER (WRAPPER_NAME (cos), _ZGVdN4v_cos)
+
+#define CHECK_ARCH_EXT if (!avx2_usable) return;
+
+#include "libm-test.c"
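
The VECTOR_WRAPPER idea above can be exercised without AVX2 or libmvec: broadcast one scalar argument to every lane, call the vector entry point, and reject results whose lanes disagree. The sketch below is illustrative only. It assumes GCC/Clang vector extensions, and it uses a 2-lane double vector instead of the patch's 4-lane __m256d so that it builds at the x86_64 baseline. The kernels demo_vec_square and demo_vec_broken are hypothetical stand-ins for an entry point such as _ZGVdN4v_cos. Instead of adding 0.1 to force a mismatch, it reports lane disagreement through a flag.

```c
#include <assert.h>

#define VLEN 2 /* 2 x double = 128 bits; the patch's AVX2 case uses 4 */

typedef double vdouble __attribute__ ((vector_size (VLEN * sizeof (double))));

/* Hypothetical kernel standing in for a libmvec entry point; it squares
   each lane so that no libm is needed.  */
static vdouble
demo_vec_square (vdouble x)
{
  return x * x;
}

/* Hypothetical faulty kernel whose lanes disagree, to exercise the check.  */
static vdouble
demo_vec_broken (vdouble x)
{
  vdouble r = x * x;
  r[1] += 1.0;
  return r;
}

/* Scalar view of a vector function, in the spirit of VECTOR_WRAPPER:
   broadcast X to every lane, call KERNEL, and record whether all lanes
   produced the same value.  Returns the value of lane 0.  */
static double
demo_scalar_wrapper (vdouble (*kernel) (vdouble), double x, int *lanes_agree)
{
  vdouble vx;
  for (int i = 0; i < VLEN; i++)
    vx[i] = x;
  vdouble vr = kernel (vx);
  *lanes_agree = 1;
  for (int i = 1; i < VLEN; i++)
    if (vr[i] != vr[0])
      *lanes_agree = 0;
  return vr[0];
}
```

One design note: like the patch's `!=` comparison, this check treats NaN lanes as disagreeing, since NaN != NaN. A fuller version would accept results in which every lane is NaN.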


* Re: [RFC] How to add vector math functions to Glibc
  2014-10-29 13:00                                                       ` Andrew Senkevich
@ 2014-10-29 18:50                                                         ` Joseph S. Myers
  2014-10-30 12:15                                                           ` Andrew Senkevich
  0 siblings, 1 reply; 67+ messages in thread
From: Joseph S. Myers @ 2014-10-29 18:50 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: libc-alpha

On Wed, 29 Oct 2014, Andrew Senkevich wrote:

> 2014-10-24 1:37 GMT+04:00 Joseph S. Myers <joseph@codesourcery.com>:
> 
> >> +static int avx2_usable;              /* Set to 1 if AVX2 supported */
> >
> > Given that we expect multiple architectures to have vector functions,
> > this belongs in some architecture-specific file that libm-test.inc
> > includes, rather than directly in libm-test.inc (which shouldn't refer
> > directly to AVX at all).
> 
> > which shouldn't refer directly to AVX at all
> Do you mean to place the avx2_usable initialization in a procedure in
> an architecture-specific *.c file, with a generic stub, call it from
> the test's main() and change the build accordingly?

For example.  The aim is to get something clean in accordance with glibc's 
design principles - such as putting things that are 
architecture-independent in architecture-independent places, and things 
that are architecture-specific in architecture-specific places, with a 
minimum of duplication between architectures.  There may be multiple 
approaches that achieve that.
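
One possible shape for that split, sketched under assumptions: the generic code sees only a CHECK_ARCH_EXT hook and an INIT_ARCH_EXT initializer, and an x86_64 variant (which in glibc would live under sysdeps/) fills them in. The macro names, the flag arch_ext_usable and the use of GCC's __builtin_cpu_supports ("avx2") in place of glibc's internal cpu-features check are all illustrative. The flag sits at file scope because the check runs inside each *_test function, not in main.

```c
#include <assert.h>
#include <stdbool.h>

#if defined __x86_64__ && defined __GNUC__
/* Architecture-specific part: probe AVX2 at run time (stand-in for
   glibc's own cpu-features check).  */
static bool arch_ext_usable;
# define INIT_ARCH_EXT()                                   \
  do                                                       \
    {                                                      \
      __builtin_cpu_init ();                               \
      arch_ext_usable = __builtin_cpu_supports ("avx2");   \
    }                                                      \
  while (0)
#else
/* Generic stub: nothing to probe, so the tests always run.  */
static bool arch_ext_usable = true;
# define INIT_ARCH_EXT() do { } while (0)
#endif

/* Architecture-independent hook placed at the top of each test function.  */
#define CHECK_ARCH_EXT if (!arch_ext_usable) return

static int tests_run;

/* Hypothetical test function.  */
static void
demo_cos_vlen4_test (void)
{
  CHECK_ARCH_EXT;
  tests_run++;
}
```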

> > I think it would be better for FUNC to be defined, in the test file that
> > includes libm-test.inc, in a way that avoids the need for the CONCAT*
> > calls here.  (To avoid warnings / errors about undeclared functions, I
> > suppose the generated header might then need to redefine e.g. vec_sin to
> > sin if there isn't a vector version of sin.)
> 
> It's not a good idea to change the FUNC definition, since it is used in
> libm-test.c not only in the test macros (so it could cause a vector
> function to be used with a non-vector parameter). But it is possible
> to reduce the number of concatenations if the generated definitions
> are changed in the way you have proposed.

Well, maybe a preliminary refactoring patch is needed that separates FUNC 
into multiple macros, one for functions used in testsuite infrastructure 
and one for functions being tested.

There are lots of RUN_TEST_* macros (I don't think we should assume that 
only one of them will only ever be relevant for vector tests) - it seems a 
bad idea for every one of them to need to repeat something so cryptic as 
CONCAT (CONCAT3_1 (VEC_PREFIX_, FUNC_NAME, FUNC ( )), FUNC (FUNC_NAME)).
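
A minimal sketch of that separation, for illustration: helpers used by the test infrastructure keep going through FUNC, while the function under test goes through FUNC_TEST. FUNC_TEST defaults to FUNC, and only a vector test file would override it. The helper names demo_absf and demo_twicef, the wrapper prefix in the comment and DEMO_RUN_TEST_f_f are hypothetical simplifications, not the real RUN_TEST_f_f.

```c
#include <assert.h>

/* Per-type settings, as a test-float.c-style file would provide.  */
#define FLOAT float
#define FUNC(function) function##f /* infrastructure: type-suffixed names */

/* A vector test file would route the function under test to a wrapper,
   e.g. (hypothetical prefix):
     #define FUNC_TEST(function) demo_vlen8_##function
   Scalar test files leave it alone and get the default below.  */
#ifndef FUNC_TEST
# define FUNC_TEST FUNC
#endif

/* Infrastructure helper, always reached through FUNC.  */
static FLOAT demo_absf (FLOAT x) { return x < 0 ? -x : x; }

/* Function under test, reached through FUNC_TEST.  */
static FLOAT demo_twicef (FLOAT x) { return x + x; }

/* Much simplified analogue of RUN_TEST_f_f: true if the result matches.  */
#define DEMO_RUN_TEST_f_f(FUNC_NAME, ARG, EXPECTED)                   \
  (FUNC (demo_abs) (FUNC_TEST (FUNC_NAME) (ARG) - (EXPECTED)) == 0)
```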

> Not all functions are tested through ALL_RM_TEST: cexp, tgamma and jn
> are not tested in all rounding modes, so the definitions for them have
> to be generated manually in the script.

Yes, conversion of those to ALL_RM_TEST was deferred because of bugs it 
showed up that need fixing.  And in the case of cexp, the bugs appear to 
be present in other functions as well, but it's not convenient to add 
tests for them in all cases until csin / csinh have moved to tests in 
auto-libm-test-in - and for that, I'm waiting for a new MPC release with 
last December's speedups for mpc_sin / mpc_sinh.  I'm doubtful any changes 
to the arguments to START should be needed, but if they are, then you do 
indeed need to change the code for those three functions' tests manually.
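
For reference, the name that the patch's START builds for each function comes from the two-level STR_CON3 / STR_CONCAT pair. The outer macro lets an argument such as VEC_PREFIX expand before the inner one pastes and stringifies, and an empty suffix argument (as in START (cexp, , 0)) simply disappears. The sketch below reproduces that shape with a local DEMO_STRINGIFY in place of __STRING and hypothetical SCALAR_PREFIX / VLEN4_PREFIX macros. It relies on C99 empty macro arguments.

```c
#include <assert.h>
#include <string.h>

/* Local stringifier; the patch uses __STRING from <sys/cdefs.h>.  */
#define DEMO_STRINGIFY(x) #x

/* Same two-level shape as the patch: STR_CON3 expands its arguments,
   then STR_CONCAT pastes them and turns the result into a string.  */
#define STR_CONCAT(a, b, c) DEMO_STRINGIFY (a##b##c)
#define STR_CON3(a, b, c) STR_CONCAT (a, b, c)

#define SCALAR_PREFIX            /* empty, as for the scalar tests */
#define VLEN4_PREFIX vlen4_      /* as in test-double-vlen4.h */
```

Calling STR_CONCAT directly with VLEN4_PREFIX would paste the macro's name rather than its value, giving "VLEN4_PREFIXcos"; that is why the extra level exists.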

> >> diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
> >> new file mode 100644
> >> index 0000000..d585fa0
> >> --- /dev/null
> >> +++ b/sysdeps/x86_64/fpu/Makefile
> >> @@ -0,0 +1,33 @@
> >> +ifeq ($(subdir),mathvec)
> >> +libmvec-support += svml_d_cos4_core svml_d_cos_data
> >> +endif
> >> +
> >> +# Rules for libmvec tests
> >> +ifeq ($(subdir),math)
> >> +ifneq ($(PERL),no)
> >> +ifeq ($(build-mathvec),yes)
> >> +libm-tests += test-double-vlen4 test-float-vlen8
> >> +
> >> +CFLAGS-test-double-vlen4-wrapper.c = -fno-inline -ffloat-store -fno-builtin \
> >> +                                  -frounding-math -mavx2
> >> +CFLAGS-test-float-vlen8-wrapper.c = -fno-inline -ffloat-store -fno-builtin \
> >> +                                 -frounding-math -mavx2
> >
> > I think the sysdeps makefile should actually just define that double-vlen4
> > and float-vlen8 are the vector lengths for which testing should take
> > place, with all the other testing rules being arranged in an
> > architecture-independent way.
> 
> Do you mean to keep only the CFLAGS-* definitions in
> sysdeps/x86_64/fpu/Makefile, or to set up some variable which will be
> used in a common makefile to build the vector tests?

Only libmvec-support, and a variable containing "double-vlen4 float-vlen8" 
or similar as a list of vector formats for which to run tests, and a 
variable containing "-mavx2" as compiler options for building vector tests 
(all the other options there should be architecture-independent and 
defined only once in a variable in math/Makefile).

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [RFC] How to add vector math functions to Glibc
  2014-10-29 18:50                                                         ` Joseph S. Myers
@ 2014-10-30 12:15                                                           ` Andrew Senkevich
  2014-10-30 13:55                                                             ` Joseph S. Myers
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Senkevich @ 2014-10-30 12:15 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: libc-alpha

2014-10-29 21:50 GMT+03:00 Joseph S. Myers <joseph@codesourcery.com>:

>> > I think it would be better for FUNC to be defined, in the test file that
>> > includes libm-test.inc, in a way that avoids the need for the CONCAT*
>> > calls here.  (To avoid warnings / errors about undeclared functions, I
>> > suppose the generated header might then need to redefine e.g. vec_sin to
>> > sin if there isn't a vector version of sin.)
>>
>> It's not a good idea to change the FUNC definition, since it is used in
>> libm-test.c not only in the test macros (so it could cause a vector
>> function to be used with a non-vector parameter). But it is possible
>> to reduce the number of concatenations if the generated definitions
>> are changed in the way you have proposed.
>
> Well, maybe a preliminary refactoring patch is needed that separates FUNC
> into multiple macros, one for functions used in testsuite infrastructure
> and one for functions being tested.
>
> There are lots of RUN_TEST_* macros (I don't think we should assume that
> only one of them will only ever be relevant for vector tests) - it seems a
> bad idea for every one of them to need to repeat something so cryptic as
> CONCAT (CONCAT3_1 (VEC_PREFIX_, FUNC_NAME, FUNC ( )), FUNC (FUNC_NAME)).

But that is already old code; in yesterday's patch this place looks like:
FUNC_TEST (FUNC_NAME) (ARG)


--
WBR,
Andrew


* Re: [RFC] How to add vector math functions to Glibc
  2014-10-30 12:15                                                           ` Andrew Senkevich
@ 2014-10-30 13:55                                                             ` Joseph S. Myers
  2014-10-30 20:07                                                               ` Joseph S. Myers
  0 siblings, 1 reply; 67+ messages in thread
From: Joseph S. Myers @ 2014-10-30 13:55 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: libc-alpha

On Thu, 30 Oct 2014, Andrew Senkevich wrote:

> > Well, maybe a preliminary refactoring patch is needed that separates FUNC
> > into multiple macros, one for functions used in testsuite infrastructure
> > and one for functions being tested.
> >
> > There are lots of RUN_TEST_* macros (I don't think we should assume that
> > only one of them will only ever be relevant for vector tests) - it seems a
> > bad idea for every one of them to need to repeat something so cryptic as
> > CONCAT (CONCAT3_1 (VEC_PREFIX_, FUNC_NAME, FUNC ( )), FUNC (FUNC_NAME)).
> 
> But that is already old code; in yesterday's patch this place looks like:
> FUNC_TEST (FUNC_NAME) (ARG)

As I said, *preliminary refactoring patch*.  Long sequences of variations 
on the same patch aren't helpful; if you find yourself sending them, you 
need to step back and think very carefully about how to restructure the 
submission to make things as clear and as easy to review as possible.  
That includes separating out any pieces, large or small, that are 
reasonably separable and can be justified on their own merits.  Having 
separated them, you then need to make *self-contained* submissions 
(including all relevant rationale and background), and ping those 
submissions weekly as needed (I haven't seen any pings of the binutils 
version requirement patch).  And please keep the state for your own 
patches in patchwork.sourceware.org clean; I see six entries there with 
the same description "[RFC] How to add vector math functions to Glibc", 
when there should be at most one.

If you do need to make multiple submissions of successive versions of the 
same patch, consider the submission style where each submission contains 
both the full self-contained description and rationale (that would go in 
the git log message) and a separate description of what has changed 
relative to the previous patch version (and number each patch version).

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [RFC] How to add vector math functions to Glibc
  2014-10-30 13:55                                                             ` Joseph S. Myers
@ 2014-10-30 20:07                                                               ` Joseph S. Myers
  2014-10-31 10:24                                                                 ` Andrew Senkevich
  0 siblings, 1 reply; 67+ messages in thread
From: Joseph S. Myers @ 2014-10-30 20:07 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: libc-alpha

Also, I don't see you in copyright.list, so unless you're covered by a 
corporate copyright assignment for glibc you should start work on 
completing the paperwork.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [RFC] How to add vector math functions to Glibc
  2014-10-30 20:07                                                               ` Joseph S. Myers
@ 2014-10-31 10:24                                                                 ` Andrew Senkevich
  0 siblings, 0 replies; 67+ messages in thread
From: Andrew Senkevich @ 2014-10-31 10:24 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: libc-alpha

2014-10-30 23:07 GMT+03:00 Joseph S. Myers <joseph@codesourcery.com>:

> Also, I don't see you in copyright.list, so unless you're covered by a
> corporate copyright assignment for glibc you should start work on
> completing the paperwork.

Paperwork in progress.


--
WBR,
Andrew


* Re: [RFC] How to add vector math functions to Glibc
  2014-10-23 21:37                                                     ` Joseph S. Myers
  2014-10-27 14:00                                                       ` Andrew Senkevich
  2014-10-29 13:00                                                       ` Andrew Senkevich
@ 2014-11-06 20:51                                                       ` Andrew Senkevich
  2014-11-14 15:45                                                         ` Andrew Senkevich
  2 siblings, 1 reply; 67+ messages in thread
From: Andrew Senkevich @ 2014-11-06 20:51 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 2059 bytes --]

Hi, Joseph,

2014-10-24 1:37 GMT+04:00 Joseph S. Myers <joseph@codesourcery.com>:

> On Thu, 23 Oct 2014, Andrew Senkevich wrote:

>> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
>> index 8a94a7e..2d31a11 100644
>> --- a/math/bits/mathcalls.h
>> +++ b/math/bits/mathcalls.h
>> @@ -60,6 +60,15 @@ __MATHCALL (atan,, (_Mdouble_ __x));
>>  __MATHCALL (atan2,, (_Mdouble_ __y, _Mdouble_ __x));
>>
>>  /* Cosine of X.  */
>> +#if !defined _Mfloat_ && !defined _Mlong_double_ && defined __DECL_SIMD_cos
>> +__DECL_SIMD_cos
>> +#endif
>> +#if defined _Mfloat_ && !defined _Mlong_double_ && defined __DECL_SIMD_cosf
>> +__DECL_SIMD_cosf
>> +#endif
>> +#if defined _Mlong_double_ && defined __DECL_SIMD_cosl
>> +__DECL_SIMD_cosl
>> +#endif
>>  __MATHCALL (cos,, (_Mdouble_ __x));
>
> As previously noted, I think it would be much better if the definition of
> __MATHCALL can include all the conditional bits (possibly through a
> generated header that defines __DECL_SIMD_cos etc. to empty if not defined
> by bits/math-vector.h).

The proposal is to use a separate __MATHCALL_VEC for the vector cases,
because it reduces the number of empty definitions needed and they can
be generated simply (the __MATHCALL case would require a lot of manual
searching to find all affected function names, because of redefinitions
in some files).
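
A simplified model of how such a __MATHCALL_VEC could work, assuming it mirrors __MATHCALL but prefixes each declaration with a per-function marker __DECL_SIMD_<name>. That marker is always defined: empty by default (the role of libm-simd-decl-stubs.h) and overridden by a platform header (the role of bits/math-vector.h). All DEMO_* names are hypothetical. A harmless GCC attribute stands in for the OpenMP pragma so that the sketch builds without -fopenmp.

```c
#include <assert.h>

/* Two-level paste so that arguments are macro-expanded before ##.  */
#define DEMO_CONCAT_1(a, b) a##b
#define DEMO_CONCAT(a, b) DEMO_CONCAT_1 (a, b)

/* Simplified analogues of __MATH_PRECNAME and __MATHCALL.  */
#define DEMO_PRECNAME(name, suffix) \
  DEMO_CONCAT (DEMO_CONCAT (demo_, name), suffix)
#define DEMO_MATHCALL(name, suffix, args) \
  double DEMO_PRECNAME (name, suffix) args

/* Vector-aware form: prefix the declaration with DEMO_DECL_SIMD_<name>.  */
#define DEMO_SIMD_DECL(fn) DEMO_CONCAT (DEMO_DECL_SIMD_, fn)
#define DEMO_MATHCALL_VEC(name, suffix, args)        \
  DEMO_SIMD_DECL (DEMO_PRECNAME (name, suffix))      \
  DEMO_MATHCALL (name, suffix, args)

/* Generic stubs: empty markers, one per vector-capable function only.  */
#define DEMO_DECL_SIMD_demo_sq
#define DEMO_DECL_SIMD_demo_cube

/* Platform override.  glibc would use _Pragma ("omp declare simd
   notinbranch") here; a harmless attribute stands in for it.  */
#undef DEMO_DECL_SIMD_demo_cube
#define DEMO_DECL_SIMD_demo_cube __attribute__ ((__const__))

DEMO_MATHCALL_VEC (sq, , (double x));   /* marker expands to nothing */
DEMO_MATHCALL_VEC (cube, , (double x)); /* marker expands to the attribute */
DEMO_MATHCALL (half, , (double x));     /* no marker needed at all */

double demo_sq (double x) { return x * x; }
double demo_cube (double x) { return x * x * x; }
double demo_half (double x) { return x / 2; }
```

Only the functions declared through the vector form need a marker, which is the saving in empty definitions described above.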

>> +#if defined __x86_64__ && defined __FAST_MATH__
>> +# if defined _OPENMP && _OPENMP >= 201307
>> +/* OpenMP case. */
>> +#  define __DECL_SIMD_AVX2 _Pragma("omp declare simd notinbranch")
>> +#  define __DECL_SIMD_SSE4 _Pragma("omp declare simd notinbranch")
>
> Of course we still need the API/ABI documentation providing the stable
> guarantee about exactly what this pragma means regarding the function
> versions it is saying are available in glibc.

We will follow up on this soon.
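
As background for that documentation: the vector variants the compiler expects follow the x86_64 Vector Function ABI naming scheme, _ZGV<isa><mask><vlen><params>_<name>. This is where the _ZGVdN4v_cos used by the test wrapper comes from: 'd' for AVX2, 'N' for notinbranch, 4 lanes, and one vector parameter. The helpers below are hypothetical and only assemble such names. They cover floating-point arguments only, and the ISA letters and register widths reflect our understanding of that ABI, which should be checked against the ABI document itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build _ZGV<isa><mask><vlen><params>_<name>.
   isa:    'b' SSE, 'c' AVX, 'd' AVX2, 'e' AVX-512
   mask:   'N' notinbranch (unmasked), 'M' inbranch (masked)
   params: one letter per argument, 'v' = vector.  */
static int
demo_vector_abi_name (char *buf, size_t size, char isa, int masked,
                      int vlen, const char *params, const char *name)
{
  return snprintf (buf, size, "_ZGV%c%c%d%s_%s", isa, masked ? 'M' : 'N',
                   vlen, params, name);
}

/* Lanes per call for a floating-point element of ELEM_BITS bits, from the
   vector register width associated with each ISA letter.  */
static int
demo_vlen (char isa, int elem_bits)
{
  int reg_bits;
  switch (isa)
    {
    case 'b': reg_bits = 128; break;
    case 'c':
    case 'd': reg_bits = 256; break;
    case 'e': reg_bits = 512; break;
    default: return 0;
    }
  return reg_bits / elem_bits;
}
```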

I have attached a patch with almost all of the infrastructure fixes
discussed before. It seems that the pragma meaning and the data tables
remain to be done. The patch affects a lot of files and will of course
be split into minimal disjoint parts for submission later.


--
WBR,
Andrew

[-- Attachment #2: libmvec_061114.patch --]
[-- Type: application/octet-stream, Size: 57969 bytes --]

diff --git a/Makeconfig b/Makeconfig
index 24a3b82..4672008 100644
--- a/Makeconfig
+++ b/Makeconfig
@@ -476,7 +476,7 @@ link-libc = $(link-libc-rpath-link) $(link-libc-before-gnulib) $(gnulib)
 link-libc-tests = $(link-libc-tests-rpath-link) \
 		  $(link-libc-before-gnulib) $(gnulib-tests)
 # This is how to find at build-time things that will be installed there.
-rpath-dirs = math elf dlfcn nss nis rt resolv crypt
+rpath-dirs = math elf dlfcn nss nis rt resolv crypt mathvec
 rpath-link = \
 $(common-objdir):$(subst $(empty) ,:,$(patsubst ../$(subdir),.,$(rpath-dirs:%=$(common-objpfx)%)))
 else
@@ -1018,7 +1018,7 @@ all-subdirs = csu assert ctype locale intl catgets math setjmp signal	    \
 	      stdlib stdio-common libio malloc string wcsmbs time dirent    \
 	      grp pwd posix io termios resource misc socket sysvipc gmon    \
 	      gnulib iconv iconvdata wctype manual shadow gshadow po argp   \
-	      crypt localedata timezone rt conform debug		    \
+	      crypt localedata timezone rt conform debug mathvec	    \
 	      $(add-on-subdirs) dlfcn elf
 
 ifndef avoid-generated
diff --git a/bits/math-vector.h b/bits/math-vector.h
new file mode 100644
index 0000000..c8fe5cb
--- /dev/null
+++ b/bits/math-vector.h
@@ -0,0 +1,22 @@
+/* Platform-specific SIMD declarations of math functions.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly; \
+	include <math.h> instead."
+#endif
diff --git a/configure b/configure
index 24888d9..fd1a9b9 100755
--- a/configure
+++ b/configure
@@ -774,6 +774,7 @@ enable_systemtap
 enable_build_nscd
 enable_nscd
 enable_pt_chown
+enable_mathvec
 with_cpu
 '
       ac_precious_vars='build_alias
@@ -1437,6 +1438,8 @@ Optional Features:
   --disable-build-nscd    disable building and installing the nscd daemon
   --disable-nscd          library functions will not contact the nscd daemon
   --enable-pt_chown       Enable building and installing pt_chown
+  --enable-mathvec        Enable building and installing mathvec [default
+                          depends on architecture]
 
 Optional Packages:
   --with-PACKAGE[=ARG]    use PACKAGE [ARG=yes]
@@ -3730,6 +3733,14 @@ if test "$build_pt_chown" = yes; then
 
 fi
 
+# Check whether --enable-mathvec was given.
+if test "${enable_mathvec+set}" = set; then :
+  enableval=$enable_mathvec; build_mathvec=$enableval
+else
+  build_mathvec=notset
+fi
+
+
 # We keep the original values in `$config_*' and never modify them, so we
 # can write them unchanged into config.make.  Everything else uses
 # $machine, $vendor, and $os, and changes them whenever convenient.
diff --git a/configure.ac b/configure.ac
index 9dd2c68..f86ed2e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -353,6 +353,12 @@ if test "$build_pt_chown" = yes; then
   AC_DEFINE(HAVE_PT_CHOWN)
 fi
 
+AC_ARG_ENABLE([mathvec],
+	      [AS_HELP_STRING([--enable-mathvec],
+	      [Enable building and installing mathvec @<:@default depends on architecture@:>@])],
+	      [build_mathvec=$enableval],
+	      [build_mathvec=notset])
+
 # We keep the original values in `$config_*' and never modify them, so we
 # can write them unchanged into config.make.  Everything else uses
 # $machine, $vendor, and $os, and changes them whenever convenient.
diff --git a/include/libm-simd-decl-stubs.h b/include/libm-simd-decl-stubs.h
new file mode 100644
index 0000000..0048717
--- /dev/null
+++ b/include/libm-simd-decl-stubs.h
@@ -0,0 +1,35 @@
+/* Empty definitions required for __MATHCALL_VEC unfolding in mathcalls.h.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Needed definitions could be generated with: 
+   for func in $(grep __MATHCALL_VEC math/bits/mathcalls.h |\
+		 sed -r "s|__MATHCALL_VEC.?\(||; s|,.*||"); do 
+     echo "#define __DECL_SIMD_${func}"; 
+     echo "#define __DECL_SIMD_${func}f"; 
+     echo "#define __DECL_SIMD_${func}l";
+   done 
+ */
+
+#ifndef _LIBM_SIMD_DECL_STUBS_H
+#define _LIBM_SIMD_DECL_STUBS_H 1
+
+#define __DECL_SIMD_cos
+#define __DECL_SIMD_cosf
+#define __DECL_SIMD_cosl
+
+#endif
diff --git a/math/Makefile b/math/Makefile
index 866bc0f..dee39d1 100644
--- a/math/Makefile
+++ b/math/Makefile
@@ -26,7 +26,7 @@ headers		:= math.h bits/mathcalls.h bits/mathinline.h bits/huge_val.h \
 		   bits/huge_valf.h bits/huge_vall.h bits/inf.h bits/nan.h \
 		   fpu_control.h complex.h bits/cmathcalls.h fenv.h \
 		   bits/fenv.h bits/fenvinline.h bits/mathdef.h tgmath.h \
-		   bits/math-finite.h
+		   bits/math-finite.h bits/math-vector.h libm-simd-decl-stubs.h
 
 # FPU support code.
 aux		:= setfpucw fpu_control
@@ -85,6 +85,22 @@ generated += $(foreach s,.c .S l.c l.S f.c f.S,$(calls:s_%=m_%$s))
 routines = $(calls) $(calls:=f) $(long-c-$(long-double-fcts))
 long-c-yes = $(calls:=l)
 
+ifeq ($(build-mathvec),yes)
+# We need to install libm.so as linker script
+# for more comfortable use of vector math library.
+install-lib-ldscripts := libm.so
+install_subdir: $(inst_libdir)/libm.so
+$(inst_libdir)/libm.so: $(common-objpfx)format.lds \
+	$(libm) \
+	$(common-objpfx)mathvec/libmvec.so$(libmvec.so-version) \
+	$(+force)
+	(echo '/* GNU ld script'; echo '*/';\
+	cat $<; \
+	echo 'GROUP ( $(slibdir)/libm.so$(libm.so-version) ' \
+	'AS_NEEDED ( $(slibdir)/libmvec.so$(libmvec.so-version) ) )' \
+	) > $@
+endif
+
 # Rules for the test suite.
 tests = test-matherr test-fenv atest-exp atest-sincos atest-exp2 basic-test \
 	test-misc test-fpucw test-fpucw-ieee tst-definitions test-tgmath \
@@ -97,12 +113,13 @@ tests-static = test-fpucw-static test-fpucw-ieee-static
 test-longdouble-yes = test-ldouble test-ildoubl
 
 ifneq (no,$(PERL))
+libm-vec-tests = $(addprefix test-,$(libmvec-tests))
 libm-tests = test-float test-double $(test-longdouble-$(long-double-fcts)) \
-	test-ifloat test-idouble
+	test-ifloat test-idouble $(libm-vec-tests)
 libm-tests.o = $(addsuffix .o,$(libm-tests))
 
 tests += $(libm-tests)
-libm-tests-generated = libm-test-ulps.h libm-test.c
+libm-tests-generated = libm-test-ulps.h libm-have-vector-test.h libm-test.c
 generated += $(libm-tests-generated) libm-test.stmp
 
 # This is needed for dependencies
@@ -113,9 +130,10 @@ ulps-file = $(firstword $(wildcard $(sysdirs:%=%/libm-test-ulps)))
 $(addprefix $(objpfx), $(libm-tests-generated)): $(objpfx)libm-test.stmp
 
 $(objpfx)libm-test.stmp: $(ulps-file) libm-test.inc gen-libm-test.pl \
-			 auto-libm-test-out
+			 gen-libm-have-vector-test.sh auto-libm-test-out
 	$(make-target-directory)
 	$(PERL) gen-libm-test.pl -u $< -o "$(objpfx)"
+	$(BASH) gen-libm-have-vector-test.sh > $(objpfx)libm-have-vector-test.h
 	@echo > $@
 
 $(objpfx)test-float.o: $(objpfx)libm-test.stmp
@@ -124,8 +142,22 @@ $(objpfx)test-double.o: $(objpfx)libm-test.stmp
 $(objpfx)test-idouble.o: $(objpfx)libm-test.stmp
 $(objpfx)test-ldouble.o: $(objpfx)libm-test.stmp
 $(objpfx)test-ildoubl.o: $(objpfx)libm-test.stmp
+
+$(objpfx)test-double-vlen4.o: $(objpfx)libm-test.stmp
+$(objpfx)test-float-vlen8.o: $(objpfx)libm-test.stmp
+
+$(objpfx)test-double-vlen4: $(common-objpfx)mathvec/libmvec.so \
+			    $(objpfx)init-arch.o
+$(objpfx)test-float-vlen8: $(common-objpfx)mathvec/libmvec.so \
+			   $(objpfx)init-arch.o
 endif
 
+CFLAGS-test-double-vlen4.c = -fno-inline -ffloat-store -fno-builtin -frounding-math \
+			     -D__FAST_MATH__ -DTEST_FAST_MATH -D_OPENMP=201307 \
+			     -Wno-unknown-pragmas $(arch-ext-cflags)
+CFLAGS-test-float-vlen8.c = -fno-inline -ffloat-store -fno-builtin -frounding-math \
+			    -D__FAST_MATH__ -DTEST_FAST_MATH -D_OPENMP=201307 \
+			    -Wno-unknown-pragmas $(arch-ext-cflags)
 CFLAGS-test-float.c = -fno-inline -ffloat-store -fno-builtin -frounding-math
 CFLAGS-test-double.c = -fno-inline -ffloat-store -fno-builtin -frounding-math
 CFLAGS-test-ldouble.c = -fno-inline -ffloat-store -fno-builtin -frounding-math
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index 8a94a7e..82928a1 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -60,7 +60,7 @@ __MATHCALL (atan,, (_Mdouble_ __x));
 __MATHCALL (atan2,, (_Mdouble_ __y, _Mdouble_ __x));
 
 /* Cosine of X.  */
-__MATHCALL (cos,, (_Mdouble_ __x));
+__MATHCALL_VEC (cos,, (_Mdouble_ __x));
 /* Sine of X.  */
 __MATHCALL (sin,, (_Mdouble_ __x));
 /* Tangent of X.  */
diff --git a/math/gen-libm-have-vector-test.sh b/math/gen-libm-have-vector-test.sh
new file mode 100755
index 0000000..95c7bef
--- /dev/null
+++ b/math/gen-libm-have-vector-test.sh
@@ -0,0 +1,48 @@
+#!/bin/sh
+# Copyright (C) 1999-2014 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+
+# Generate a series of definitions used by the vector math function tests.
+print_defs()
+{
+  echo "#if defined TEST_VECTOR_$1 && TEST_VECTOR_$1"
+  echo "# define HAVE_VECTOR_$1 1"
+  echo "# define VEC_PREFIX_$1 WRAPPER_NAME($1)"
+  echo "#else"
+  echo "# define HAVE_VECTOR_$1 0"
+  echo "# define VEC_PREFIX_$1 $1"
+  echo "#endif"
+  echo
+}
+
+for func in $(grep ALL_RM_TEST libm-test.inc | grep -v define | sed -r "s/.*\(//; s/,.*//"); do
+  print_defs ${func}
+  print_defs ${func}f
+  print_defs ${func}l
+done
+
+print_defs jn
+print_defs jnf
+print_defs jnl
+
+print_defs cexp
+print_defs cexpf
+print_defs cexpl
+
+print_defs tgamma
+print_defs tgammaf
+print_defs tgammal
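For each name, `print_defs` emits a guarded block that either routes the test through a vector wrapper or falls back to the scalar function. The sketch below writes out that generated output by hand for two hypothetical functions, `twice` and `half` (stand-ins, not glibc functions). It assumes the per-vector-length test file defines `TEST_VECTOR_twice` and leaves `TEST_VECTOR_half` undefined; `WRAPPER_NAME` is simplified and the `vlen4_twice` wrapper is hypothetical.

```c
/* Hypothetical scalar functions standing in for libm entry points.  */
static double twice (double x) { return 2.0 * x; }
static double half (double x) { return 0.5 * x; }

/* Hypothetical vector wrapper: counts its calls so the dispatch is
   observable, and otherwise defers to the scalar function.  */
static int wrapper_calls;
static double
vlen4_twice (double x)
{
  wrapper_calls++;
  return twice (x);
}

/* Assumed to be set by the per-vector-length test file.  */
#define TEST_VECTOR_twice 1
#define WRAPPER_NAME(function) vlen4_ ## function

/* What "print_defs twice" writes into libm-have-vector-test.h.  */
#if defined TEST_VECTOR_twice && TEST_VECTOR_twice
# define HAVE_VECTOR_twice 1
# define VEC_PREFIX_twice WRAPPER_NAME(twice)
#else
# define HAVE_VECTOR_twice 0
# define VEC_PREFIX_twice twice
#endif

/* What "print_defs half" writes; TEST_VECTOR_half is not defined.  */
#if defined TEST_VECTOR_half && TEST_VECTOR_half
# define HAVE_VECTOR_half 1
# define VEC_PREFIX_half WRAPPER_NAME(half)
#else
# define HAVE_VECTOR_half 0
# define VEC_PREFIX_half half
#endif
```

With these definitions `VEC_PREFIX_twice (3.0)` calls the wrapper, while `VEC_PREFIX_half (3.0)` calls the scalar `half` directly.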
diff --git a/math/libm-test.inc b/math/libm-test.inc
index f86a4fa..79bcfca 100644
--- a/math/libm-test.inc
+++ b/math/libm-test.inc
@@ -678,13 +678,17 @@ test_exceptions (const char *test_name, int exception)
   feclearexcept (FE_ALL_EXCEPT);
 }
 
+#ifndef TEST_MATHVEC
+# define TEST_MATHVEC 0
+#endif
+
 /* Test whether errno for TEST_NAME, set to ERRNO_VALUE, has value
    EXPECTED_VALUE (description EXPECTED_NAME).  */
 static void
 test_single_errno (const char *test_name, int errno_value,
 		   int expected_value, const char *expected_name)
 {
-#ifndef TEST_INLINE
+#if !defined TEST_INLINE && !TEST_MATHVEC
   if (errno_value == expected_value)
     {
       if (print_screen (1))
@@ -1295,16 +1299,17 @@ struct test_fFF_11_data
 
 /* Run an individual test, including any required setup and checking
    of results, or loop over all tests in an array.  */
-#define RUN_TEST_f_f(ARG_STR, FUNC_NAME, ARG, EXPECTED,			\
-		     EXCEPTIONS)					\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
-		     EXCEPTIONS);					\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_f_f(ARG_STR, FUNC_NAME, ARG, EXPECTED,		\
+		     EXCEPTIONS)				\
+  do								\
+    if (enable_test (EXCEPTIONS))				\
+      {								\
+	COMMON_TEST_SETUP (ARG_STR);				\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG),	\
+		     EXPECTED,					\
+		     EXCEPTIONS);				\
+	COMMON_TEST_CLEANUP;					\
+      }								\
   while (0)
 #define RUN_TEST_LOOP_f_f(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1313,16 +1318,16 @@ struct test_fFF_11_data
 		    (ARRAY)[i].RM_##ROUNDING_MODE.expected,		\
 		    (ARRAY)[i].RM_##ROUNDING_MODE.exceptions);		\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_2_f(ARG_STR, FUNC_NAME, ARG1, ARG2, EXPECTED,	\
-		     EXCEPTIONS)				\
-  do								\
-    if (enable_test (EXCEPTIONS))				\
-      {								\
-	COMMON_TEST_SETUP (ARG_STR);				\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG1, ARG2),	\
-		     EXPECTED, EXCEPTIONS);			\
-	COMMON_TEST_CLEANUP;					\
-      }								\
+#define RUN_TEST_2_f(ARG_STR, FUNC_NAME, ARG1, ARG2, EXPECTED,		\
+		     EXCEPTIONS)					\
+  do									\
+    if (enable_test (EXCEPTIONS))					\
+      {									\
+	COMMON_TEST_SETUP (ARG_STR);					\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG1, ARG2),	\
+		     EXPECTED, EXCEPTIONS);				\
+	COMMON_TEST_CLEANUP;						\
+      }									\
   while (0)
 #define RUN_TEST_LOOP_2_f(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1340,16 +1345,16 @@ struct test_fFF_11_data
 #define RUN_TEST_LOOP_fl_f RUN_TEST_LOOP_2_f
 #define RUN_TEST_if_f RUN_TEST_2_f
 #define RUN_TEST_LOOP_if_f RUN_TEST_LOOP_2_f
-#define RUN_TEST_fff_f(ARG_STR, FUNC_NAME, ARG1, ARG2, ARG3,		\
-		       EXPECTED, EXCEPTIONS)				\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG1, ARG2, ARG3),	\
-		     EXPECTED, EXCEPTIONS);				\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_fff_f(ARG_STR, FUNC_NAME, ARG1, ARG2, ARG3,			\
+		       EXPECTED, EXCEPTIONS)					\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG1, ARG2, ARG3),	\
+		     EXPECTED, EXCEPTIONS);					\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_fff_f(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1359,17 +1364,17 @@ struct test_fFF_11_data
 		      (ARRAY)[i].RM_##ROUNDING_MODE.expected,		\
 		      (ARRAY)[i].RM_##ROUNDING_MODE.exceptions);	\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_c_f(ARG_STR, FUNC_NAME, ARG1, ARG2, EXPECTED,		\
-		     EXCEPTIONS)					\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	check_float (test_name,						\
-		     FUNC (FUNC_NAME) (BUILD_COMPLEX (ARG1, ARG2)),	\
-		     EXPECTED, EXCEPTIONS);				\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_c_f(ARG_STR, FUNC_NAME, ARG1, ARG2, EXPECTED,			\
+		     EXCEPTIONS)						\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	check_float (test_name,							\
+		     FUNC_TEST (FUNC_NAME) (BUILD_COMPLEX (ARG1, ARG2)),	\
+		     EXPECTED, EXCEPTIONS);					\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_c_f(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1387,7 +1392,7 @@ struct test_fFF_11_data
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
 	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;			\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG), EXPECTED,	\
 		     EXCEPTIONS);					\
 	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);				\
 	if (EXTRA_TEST)							\
@@ -1406,22 +1411,22 @@ struct test_fFF_11_data
 		     (ARRAY)[i].RM_##ROUNDING_MODE.extra_test,		\
 		     (ARRAY)[i].RM_##ROUNDING_MODE.extra_expected);	\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_fF_f1(ARG_STR, FUNC_NAME, ARG, EXPECTED,		\
-		       EXCEPTIONS, EXTRA_VAR, EXTRA_TEST,		\
-		       EXTRA_EXPECTED)					\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;			\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG, &(EXTRA_VAR)),	\
-		     EXPECTED, EXCEPTIONS);				\
-	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);				\
-	if (EXTRA_TEST)							\
-	  check_float (extra1_name, EXTRA_VAR, EXTRA_EXPECTED, 0);	\
-	EXTRA_OUTPUT_TEST_CLEANUP (1);					\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_fF_f1(ARG_STR, FUNC_NAME, ARG, EXPECTED,			\
+		       EXCEPTIONS, EXTRA_VAR, EXTRA_TEST,			\
+		       EXTRA_EXPECTED)						\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;				\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG, &(EXTRA_VAR)),	\
+		     EXPECTED, EXCEPTIONS);					\
+	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);					\
+	if (EXTRA_TEST)								\
+	  check_float (extra1_name, EXTRA_VAR, EXTRA_EXPECTED, 0);		\
+	EXTRA_OUTPUT_TEST_CLEANUP (1);						\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_fF_f1(FUNC_NAME, ARRAY, ROUNDING_MODE, EXTRA_VAR)	\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1433,22 +1438,22 @@ struct test_fFF_11_data
 		      (ARRAY)[i].RM_##ROUNDING_MODE.extra_test,		\
 		      (ARRAY)[i].RM_##ROUNDING_MODE.extra_expected);	\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_fI_f1(ARG_STR, FUNC_NAME, ARG, EXPECTED,		\
-		       EXCEPTIONS, EXTRA_VAR, EXTRA_TEST,		\
-		       EXTRA_EXPECTED)					\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;			\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG, &(EXTRA_VAR)),	\
-		     EXPECTED, EXCEPTIONS);				\
-	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);				\
-	if (EXTRA_TEST)							\
-	  check_int (extra1_name, EXTRA_VAR, EXTRA_EXPECTED, 0);	\
-	EXTRA_OUTPUT_TEST_CLEANUP (1);					\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_fI_f1(ARG_STR, FUNC_NAME, ARG, EXPECTED,			\
+		       EXCEPTIONS, EXTRA_VAR, EXTRA_TEST,			\
+		       EXTRA_EXPECTED)						\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;				\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG, &(EXTRA_VAR)),	\
+		     EXPECTED, EXCEPTIONS);					\
+	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);					\
+	if (EXTRA_TEST)								\
+	  check_int (extra1_name, EXTRA_VAR, EXTRA_EXPECTED, 0);		\
+	EXTRA_OUTPUT_TEST_CLEANUP (1);						\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_fI_f1(FUNC_NAME, ARRAY, ROUNDING_MODE, EXTRA_VAR)	\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1469,7 +1474,7 @@ struct test_fFF_11_data
 	COMMON_TEST_SETUP (ARG_STR);					\
 	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;			\
 	check_float (test_name,						\
-		     FUNC (FUNC_NAME) (ARG1, ARG2, &(EXTRA_VAR)),	\
+		     FUNC_TEST (FUNC_NAME) (ARG1, ARG2, &(EXTRA_VAR)),	\
 		     EXPECTED, EXCEPTIONS);				\
 	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);				\
 	if (EXTRA_TEST)							\
@@ -1490,17 +1495,17 @@ struct test_fFF_11_data
 		       (ARRAY)[i].RM_##ROUNDING_MODE.extra_test,	\
 		       (ARRAY)[i].RM_##ROUNDING_MODE.extra_expected);	\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_c_c(ARG_STR, FUNC_NAME, ARGR, ARGC, EXPR, EXPC,	\
-		     EXCEPTIONS)					\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	check_complex (test_name,					\
-		       FUNC (FUNC_NAME) (BUILD_COMPLEX (ARGR, ARGC)),	\
-		       BUILD_COMPLEX (EXPR, EXPC), EXCEPTIONS);		\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_c_c(ARG_STR, FUNC_NAME, ARGR, ARGC, EXPR, EXPC,		\
+		     EXCEPTIONS)						\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	check_complex (test_name,						\
+		       FUNC_TEST (FUNC_NAME) (BUILD_COMPLEX (ARGR, ARGC)),	\
+		       BUILD_COMPLEX (EXPR, EXPC), EXCEPTIONS);			\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_c_c(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1511,18 +1516,18 @@ struct test_fFF_11_data
 		    (ARRAY)[i].RM_##ROUNDING_MODE.expc,			\
 		    (ARRAY)[i].RM_##ROUNDING_MODE.exceptions);		\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_cc_c(ARG_STR, FUNC_NAME, ARG1R, ARG1C, ARG2R, ARG2C,	\
-		      EXPR, EXPC, EXCEPTIONS)				\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	check_complex (test_name,					\
-		       FUNC (FUNC_NAME) (BUILD_COMPLEX (ARG1R, ARG1C),	\
-					 BUILD_COMPLEX (ARG2R, ARG2C)),	\
-		       BUILD_COMPLEX (EXPR, EXPC), EXCEPTIONS);		\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_cc_c(ARG_STR, FUNC_NAME, ARG1R, ARG1C, ARG2R, ARG2C,		\
+		      EXPR, EXPC, EXCEPTIONS)					\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	check_complex (test_name,						\
+		       FUNC_TEST (FUNC_NAME) (BUILD_COMPLEX (ARG1R, ARG1C),	\
+					      BUILD_COMPLEX (ARG2R, ARG2C)),	\
+		       BUILD_COMPLEX (EXPR, EXPC), EXCEPTIONS);			\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_cc_c(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1539,7 +1544,7 @@ struct test_fFF_11_data
     if (enable_test (EXCEPTIONS))					\
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
-	check_int (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,		\
+	check_int (test_name, FUNC_TEST (FUNC_NAME) (ARG), EXPECTED,	\
 		   EXCEPTIONS);						\
 	COMMON_TEST_CLEANUP;						\
       }									\
@@ -1592,7 +1597,7 @@ struct test_fFF_11_data
     if (enable_test (EXCEPTIONS))					\
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
-	check_bool (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
+	check_bool (test_name, FUNC_TEST (FUNC_NAME) (ARG), EXPECTED,	\
 		    EXCEPTIONS);					\
 	COMMON_TEST_CLEANUP;						\
       }									\
@@ -1626,7 +1631,7 @@ struct test_fFF_11_data
     if (enable_test (EXCEPTIONS))					\
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
-	check_long (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
+	check_long (test_name, FUNC_TEST (FUNC_NAME) (ARG), EXPECTED,	\
 		    EXCEPTIONS);					\
 	COMMON_TEST_CLEANUP;						\
       }									\
@@ -1643,8 +1648,8 @@ struct test_fFF_11_data
     if (enable_test (EXCEPTIONS))					\
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
-	check_longlong (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
-			EXCEPTIONS);					\
+	check_longlong (test_name, FUNC_TEST (FUNC_NAME) (ARG),		\
+			EXPECTED, EXCEPTIONS);				\
 	COMMON_TEST_CLEANUP;						\
       }									\
   while (0)
@@ -1663,7 +1668,7 @@ struct test_fFF_11_data
     if (enable_test (EXCEPTIONS))					\
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
-	FUNC (FUNC_NAME) (ARG, &(EXTRA1_VAR), &(EXTRA2_VAR));		\
+	FUNC_TEST (FUNC_NAME) (ARG, &(EXTRA1_VAR), &(EXTRA2_VAR));	\
 	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);				\
 	if (EXTRA1_TEST)						\
 	  check_float (extra1_name, EXTRA1_VAR, EXTRA1_EXPECTED,	\
@@ -1690,9 +1695,31 @@ struct test_fFF_11_data
 		       (ARRAY)[i].RM_##ROUNDING_MODE.extra2_expected);	\
   ROUND_RESTORE_ ## ROUNDING_MODE
 
+#ifndef INIT_ARCH_EXT
+# define INIT_ARCH_EXT
+# define CHECK_ARCH_EXT
+#endif
+
+#ifndef VEC_PREFIX
+# define VEC_PREFIX
+#endif
+
+#ifndef FUNC_TEST
+# define FUNC_TEST FUNC
+#endif
+
+#include "libm-have-vector-test.h"
+
+#define STR_CONCAT(a, b, c) __STRING (a##b##c)
+#define STR_CON3(a, b, c) STR_CONCAT (a, b, c)
+
+#define HAVE_VECTOR(func) __CONCAT (HAVE_VECTOR_, func)
+
 /* Start and end the tests for a given function.  */
-#define START(FUNC, EXACT)			\
-  const char *this_func = #FUNC;		\
+#define START(FUN, SUFF, EXACT)					\
+  CHECK_ARCH_EXT						\
+  if (TEST_MATHVEC && !HAVE_VECTOR (FUNC (FUN))) return;	\
+  const char *this_func = STR_CON3 (VEC_PREFIX, FUN, SUFF);	\
   init_max_error (this_func, EXACT)
 #define END					\
   print_max_error (this_func)
@@ -1705,28 +1732,28 @@ struct test_fFF_11_data
     {									\
       do								\
 	{								\
-	  START (FUNC, EXACT);						\
+	  START (FUNC, , EXACT);					\
 	  LOOP_MACRO (FUNC, ARRAY, , ## __VA_ARGS__);			\
 	  END_MACRO;							\
 	}								\
       while (0);							\
       do								\
 	{								\
-	  START (FUNC ## _downward, EXACT);				\
+	  START (FUNC, _downward, EXACT);				\
 	  LOOP_MACRO (FUNC, ARRAY, FE_DOWNWARD, ## __VA_ARGS__);	\
 	  END_MACRO;							\
 	}								\
       while (0);							\
       do								\
 	{								\
-	  START (FUNC ## _towardzero, EXACT);				\
+	  START (FUNC, _towardzero, EXACT);				\
 	  LOOP_MACRO (FUNC, ARRAY, FE_TOWARDZERO, ## __VA_ARGS__);	\
 	  END_MACRO;							\
 	}								\
       while (0);							\
       do								\
 	{								\
-	  START (FUNC ## _upward, EXACT);				\
+	  START (FUNC, _upward, EXACT);					\
 	  LOOP_MACRO (FUNC, ARRAY, FE_UPWARD, ## __VA_ARGS__);		\
 	  END_MACRO;							\
 	}								\
@@ -6034,7 +6061,7 @@ static const struct test_c_c_data cexp_test_data[] =
 static void
 cexp_test (void)
 {
-  START (cexp, 0);
+  START (cexp, , 0);
   RUN_TEST_LOOP_c_c (cexp, cexp_test_data, );
   END_COMPLEX;
 }
@@ -7548,7 +7575,7 @@ static const struct test_if_f_data jn_test_data[] =
 static void
 jn_test (void)
 {
-  START (jn, 0);
+  START (jn, , 0);
   RUN_TEST_LOOP_if_f (jn, jn_test_data, );
   END;
 }
@@ -9374,7 +9401,7 @@ static const struct test_f_f_data tgamma_test_data[] =
 static void
 tgamma_test (void)
 {
-  START (tgamma, 0);
+  START (tgamma, , 0);
   RUN_TEST_LOOP_f_f (tgamma, tgamma_test_data, );
   END;
 }
@@ -9824,6 +9851,8 @@ main (int argc, char **argv)
   initialize ();
   printf (TEST_MSG);
 
+  INIT_ARCH_EXT
+
   check_ulp ();
 
   /* Keep the tests a wee bit ordered (according to ISO C99).  */
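`START` now receives the rounding-mode suffix as a separate argument, so it can build the reported name as prefix + function + suffix and can return early for functions without a vector variant. The sketch below reproduces that preprocessor pattern outside libm-test.inc. `MY_STRING` and `CONCAT_` mirror glibc's `__STRING` and `__CONCAT`; `START_SKETCH`, `this_func_is` and the `HAVE_VECTOR_*` values are hypothetical, and the `FUNC` wrapping of the real macro (which appends `f` for float) is omitted.

```c
#include <string.h>

/* Stand-ins for glibc's __STRING and __CONCAT.  */
#define MY_STRING(x) #x
#define CONCAT_(a, b) a##b

/* Same shape as the helpers added to libm-test.inc.  */
#define STR_CONCAT(a, b, c) MY_STRING (a##b##c)
#define STR_CON3(a, b, c) STR_CONCAT (a, b, c)
#define HAVE_VECTOR(func) CONCAT_ (HAVE_VECTOR_, func)

/* Assumed settings of a vlen4 test build.  */
#define TEST_MATHVEC 1
#define VEC_PREFIX vlen4_
#define HAVE_VECTOR_cos 1
#define HAVE_VECTOR_sin 0

static const char *this_func;	/* Name recorded by the last START.  */
static int started;		/* Number of tests that were not skipped.  */

/* Simplified START: skip unvectorized functions, else record the name.
   An empty SUFF argument is valid C99 and pastes to nothing.  */
#define START_SKETCH(FUN, SUFF)				\
  if (TEST_MATHVEC && !HAVE_VECTOR (FUN))		\
    return;						\
  this_func = STR_CON3 (VEC_PREFIX, FUN, SUFF);		\
  started++

static void cos_test (void) { START_SKETCH (cos, ); }
static void cos_test_upward (void) { START_SKETCH (cos, _upward); }
static void sin_test (void) { START_SKETCH (sin, ); }

/* Nonzero if the last recorded name equals EXPECTED.  */
static int
this_func_is (const char *expected)
{
  return this_func != NULL && strcmp (this_func, expected) == 0;
}
```

Calling `cos_test_upward ()` records `"vlen4_cos_upward"`, while `sin_test ()` returns before recording anything because `HAVE_VECTOR_sin` is 0.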
diff --git a/math/math.h b/math/math.h
index dc532b7..b44a23b 100644
--- a/math/math.h
+++ b/math/math.h
@@ -27,6 +27,9 @@
 
 __BEGIN_DECLS
 
+/* Get machine-dependent vector math function declarations.  */
+#include <bits/math-vector.h>
+
 /* Get machine-dependent HUGE_VAL value (returned on overflow).
    On all IEEE754 machines, this is +Infinity.  */
 #include <bits/huge_val.h>
@@ -49,6 +52,12 @@ __BEGIN_DECLS
    so we can easily declare each function as both `name' and `__name',
    and can declare the float versions `namef' and `__namef'.  */
 
+#define __SIMD_DECL(function) __CONCAT(__DECL_SIMD_,function)
+
+#define __MATHCALL_VEC(function,suffix, args)	\
+  __SIMD_DECL(__MATH_PRECNAME(function,suffix)) \
+  __MATHCALL(function,suffix, args)
+
 #define __MATHCALL(function,suffix, args)	\
   __MATHDECL (_Mdouble_,function,suffix, args)
 #define __MATHDECL(type, function,suffix, args) \
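`__MATHCALL_VEC` only places `__DECL_SIMD_<name>` in front of the ordinary declaration. That position matters because an `omp declare simd` pragma must directly precede the declaration it applies to. For every name the architecture header does not override, the stub from libm-simd-decl-stubs.h expands to nothing and the declaration is unchanged. The sketch below reproduces the pattern with hypothetical `MY_` names and captures the expanded tokens as strings; `SIMD_MARKER` is a placeholder for the `_Pragma ("omp declare simd ...")` the x86 header substitutes, and the type and full name are passed directly instead of through `__MATH_PRECNAME` and `__MATHDECL`.

```c
#include <string.h>

#define MY_CONCAT(a, b) a ## b
#define MY_STR_(s) #s
#define MY_STR(s) MY_STR_ (s)

/* Default empty hooks, as in libm-simd-decl-stubs.h.  */
#define MY_DECL_SIMD_cos
#define MY_DECL_SIMD_cosf

/* An architecture header overrides only the variants it provides.  */
#undef MY_DECL_SIMD_cosf
#define MY_DECL_SIMD_cosf SIMD_MARKER

/* Same shape as __SIMD_DECL and __MATHCALL_VEC (simplified).  */
#define MY_SIMD_DECL(function) MY_CONCAT (MY_DECL_SIMD_, function)
#define MY_MATHCALL_VEC(type, function, args) \
  MY_SIMD_DECL (function) type function args;

/* The fully expanded declarations, captured as strings.  */
static const char cos_decl[] =
  MY_STR (MY_MATHCALL_VEC (double, cos, (double v)));
static const char cosf_decl[] =
  MY_STR (MY_MATHCALL_VEC (float, cosf, (float v)));

/* Nonzero if DECL received the SIMD hook.  */
static int
has_simd_hook (const char *decl)
{
  return strstr (decl, "SIMD_MARKER") != NULL;
}
```

Here `cos_decl` holds a plain `double cos (double v);`, while `cosf_decl` starts with the marker, which is the effect the per-function `#undef`/`#define` pair in the architecture header relies on.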
diff --git a/math/test-double-vlen4.h b/math/test-double-vlen4.h
new file mode 100644
index 0000000..a71a3d0
--- /dev/null
+++ b/math/test-double-vlen4.h
@@ -0,0 +1,42 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FLOAT double
+#define FUNC(function) function
+#define TEST_MSG "testing double vector math (without inline functions)\n"
+#define MATHCONST(x) x
+#define CHOOSE(Clongdouble,Cdouble,Cfloat,Cinlinelongdouble,Cinlinedouble,Cinlinefloat) Cdouble
+#define PRINTF_EXPR "e"
+#define PRINTF_XEXPR "a"
+#define PRINTF_NEXPR "f"
+#define TEST_DOUBLE 1
+#define TEST_MATHVEC 1
+
+#ifndef __NO_MATH_INLINES
+# define __NO_MATH_INLINES
+#endif
+
+#define EXCEPTION_TESTS_double 0
+#define ROUNDING_TESTS_double(MODE) ((MODE) == FE_TONEAREST)
+
+#define VEC_PREFIX vlen4_
+
+#define CONCAT(prefix,func) __CONCAT(prefix,func)
+
+#define WRAPPER_NAME(function) CONCAT(VEC_PREFIX,function)
+
+#define FUNC_TEST(function) VEC_PREFIX_ ## function
diff --git a/math/test-float-vlen8.h b/math/test-float-vlen8.h
new file mode 100644
index 0000000..a1a86a1
--- /dev/null
+++ b/math/test-float-vlen8.h
@@ -0,0 +1,42 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FLOAT float
+#define FUNC(function) function ## f
+#define TEST_MSG "testing float vector math (without inline functions)\n"
+#define MATHCONST(x) x
+#define CHOOSE(Clongdouble,Cdouble,Cfloat,Cinlinelongdouble,Cinlinedouble,Cinlinefloat) Cfloat
+#define PRINTF_EXPR "e"
+#define PRINTF_XEXPR "a"
+#define PRINTF_NEXPR "f"
+#define TEST_FLOAT 1
+#define TEST_MATHVEC 1
+
+#ifndef __NO_MATH_INLINES
+# define __NO_MATH_INLINES
+#endif
+
+#define EXCEPTION_TESTS_float 0
+#define ROUNDING_TESTS_float(MODE) ((MODE) == FE_TONEAREST)
+
+#define VEC_PREFIX vlen8_
+
+#define CONCAT(prefix,func) __CONCAT(prefix,func)
+
+#define WRAPPER_NAME(function) CONCAT(VEC_PREFIX,function)
+
+#define FUNC_TEST(function) VEC_PREFIX_ ## function ## f
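`FUNC_TEST` pastes `VEC_PREFIX_` onto the function name (the float header also appends `f`), so the generated `VEC_PREFIX_<name>` macro decides whether a test calls a wrapper or the scalar function. The wrapper sources are not part of this excerpt. One plausible shape, sketched here with hypothetical names, broadcasts the scalar argument into every lane, calls a 4-lane kernel, checks that all lanes agree and returns lane 0. Plain arrays stand in for the AVX2 `__m256d` type so the sketch needs no intrinsics.

```c
#include <stdlib.h>
#include <string.h>

#define VLEN 4

/* Hypothetical 4-lane kernel standing in for a libmvec entry point such
   as _ZGVdN4v_cos; it computes 3*x + 1 in every lane.  */
static void
fake_kernel_vlen4 (const double in[VLEN], double out[VLEN])
{
  for (int i = 0; i < VLEN; i++)
    out[i] = 3.0 * in[i] + 1.0;
}

/* Hypothetical scalar wrapper: broadcast X to all lanes, run the kernel,
   abort unless every lane holds the same bit pattern (memcmp, so equal
   NaN results also pass), and return lane 0.  */
static double
vlen4_fake (double x)
{
  double in[VLEN], out[VLEN];

  for (int i = 0; i < VLEN; i++)
    in[i] = x;
  fake_kernel_vlen4 (in, out);
  for (int i = 1; i < VLEN; i++)
    if (memcmp (&out[i], &out[0], sizeof out[0]) != 0)
      abort ();
  return out[0];
}

/* Assumed generated-header entry, plus FUNC_TEST as in
   test-double-vlen4.h.  */
#define VEC_PREFIX_fake vlen4_fake
#define FUNC_TEST(function) VEC_PREFIX_ ## function
```

`FUNC_TEST (fake) (2.0)` becomes `VEC_PREFIX_fake (2.0)`, which rescans to `vlen4_fake (2.0)` and returns 7.0.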
diff --git a/mathvec/Depend b/mathvec/Depend
new file mode 100644
index 0000000..ede10ab
--- /dev/null
+++ b/mathvec/Depend
@@ -0,0 +1 @@
+math
diff --git a/mathvec/Makefile b/mathvec/Makefile
new file mode 100644
index 0000000..26c552c
--- /dev/null
+++ b/mathvec/Makefile
@@ -0,0 +1,35 @@
+# Copyright (C) 2014 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+
+# Makefile for the vector math library.
+
+subdir		:= mathvec
+
+include ../Makeconfig
+
+ifeq ($(build-mathvec),yes)
+extra-libs	:= libmvec
+extra-libs-others = $(extra-libs)
+
+libmvec-routines = $(strip $(libmvec-support))
+
+$(objpfx)libmvec.so: $(libm)
+endif
+
+# Rules for the test suite are in the math directory.
+
+include ../Rules
diff --git a/shlib-versions b/shlib-versions
index e05b248..fa3cf1d 100644
--- a/shlib-versions
+++ b/shlib-versions
@@ -71,3 +71,6 @@ libanl=1
 # This defines the libgcc soname version this glibc is to load for
 # asynchronous cancellation to work correctly.
 libgcc_s=1
+
+# The vector math library
+libmvec=1
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
new file mode 100644
index 0000000..8272ddd
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -0,0 +1,3 @@
+GLIBC_2.21
+ GLIBC_2.21 A
+ _ZGVdN4v_cos F
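The exported name follows the x86 Vector Function ABI mangling: `_ZGV`, an ISA letter (`b` SSE, `c` AVX, `d` AVX2, `e` AVX-512), a mask letter (`N` unmasked, matching `notinbranch`; `M` masked), the vector length, one letter per parameter (`v` vector, `u` uniform, `l` linear), then `_` and the scalar name. The decoder below is a hypothetical helper written only to make that layout explicit; it handles just the letters listed and is not a complete Vector ABI demangler (linear steps and other extensions are rejected).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Fields of a vector-variant name such as "_ZGVdN4v_cos".  */
struct vec_variant
{
  char isa;		/* 'b' SSE, 'c' AVX, 'd' AVX2, 'e' AVX-512.  */
  int masked;		/* 1 for 'M' (inbranch), 0 for 'N' (notinbranch).  */
  long vlen;		/* Number of lanes.  */
  char params[8];	/* One letter per parameter: 'v', 'u' or 'l'.  */
  const char *scalar;	/* Name of the scalar function.  */
};

/* Decode NAME into *OUT.  Return 0 on success, -1 if NAME is outside
   the subset handled here.  */
static int
decode_vec_name (const char *name, struct vec_variant *out)
{
  const char *p;
  char *end;
  size_t n = 0;

  if (strncmp (name, "_ZGV", 4) != 0)
    return -1;
  p = name + 4;

  /* ISA letter; test for '\0' first because strchr matches it too.  */
  if (*p == '\0' || strchr ("bcde", *p) == NULL)
    return -1;
  out->isa = *p++;

  /* Mask letter.  */
  if (*p != 'N' && *p != 'M')
    return -1;
  out->masked = (*p++ == 'M');

  /* Vector length.  */
  if (!isdigit ((unsigned char) *p))
    return -1;
  out->vlen = strtol (p, &end, 10);
  p = end;

  /* Parameter kinds.  */
  while (*p == 'v' || *p == 'u' || *p == 'l')
    {
      if (n + 1 >= sizeof out->params)
	return -1;
      out->params[n++] = *p++;
    }
  out->params[n] = '\0';

  /* Separator and scalar name.  */
  if (*p != '_' || p[1] == '\0')
    return -1;
  out->scalar = p + 1;
  return 0;
}
```

Decoding `_ZGVdN4v_cos` yields ISA `d`, unmasked, 4 lanes, one vector parameter and scalar name `cos`, which matches the `notinbranch simdlen(4)` declaration the x86 header attaches to `cos`.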
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
new file mode 100644
index 0000000..33ffabb
--- /dev/null
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -0,0 +1,53 @@
+/* Platform-specific SIMD declarations of math functions.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly; \
+	include <math.h> instead."
+#endif
+
+/* Get default empty definitions for SIMD declarations.  */
+#include <libm-simd-decl-stubs.h>
+
+#if defined __x86_64__ && defined __FAST_MATH__
+# if defined _OPENMP && _OPENMP >= 201307
+/* OpenMP case.  */
+#  define __DECL_SIMD_AVX2 _Pragma("omp declare simd notinbranch simdlen(4)")
+#  define __DECL_SIMD_SSE4 _Pragma("omp declare simd notinbranch simdlen(8)")
+#  undef __DECL_SIMD_cos
+#  define __DECL_SIMD_cos  __DECL_SIMD_AVX2
+#  undef __DECL_SIMD_cosf
+#  define __DECL_SIMD_cosf __DECL_SIMD_SSE4
+# elif defined _CILKPLUS && _CILKPLUS >= 0
+/* CilkPlus case.
+ * TODO: _CILKPLUS is not currently defined anywhere;
+ * add reserved-namespace versions and an __GNUC_PREREQ check.
+#  define __DECL_SIMD_AVX2 __attribute__((__vector__(__vectorlength__(4),\
+						     __nomask__,\
+						     __processor__(\
+						       __core_4th_gen_avx__))))
+#  define __DECL_SIMD_SSE4 __attribute__((__vector__(__vectorlength__(8),\
+						     __nomask__,\
+						     __processor__(\
+						       __core_i7_sse4_2__))))
+#  undef __DECL_SIMD_cos
+#  define __DECL_SIMD_cos  __DECL_SIMD_AVX2
+#  undef __DECL_SIMD_cosf
+#  define __DECL_SIMD_cosf __DECL_SIMD_SSE4 */
+# endif
+#endif
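Under `-ffast-math` with OpenMP 4.0 (`_OPENMP >= 201307`), this header puts `#pragma omp declare simd notinbranch simdlen(4)` in front of the `cos` declaration, which allows the vectorizer to replace `cos` calls in a SIMD loop with `_ZGVdN4v_cos`. The same mechanism can be tried on a user function without touching libm; `scale` and `scale_all` below are hypothetical. Built with GCC's `-fopenmp-simd` (or `-fopenmp`) and optimization, GCC may emit vector clones such as `_ZGVdN4v_scale`; without those flags the pragmas are ignored and the loop runs as scalar code with the same results.

```c
/* Hypothetical function annotated the way the header annotates cos.  */
#pragma omp declare simd notinbranch simdlen(4)
double scale (double x);

double
scale (double x)
{
  return 2.0 * x + 1.0;
}

/* A loop the compiler may vectorize; with OpenMP SIMD support enabled,
   the calls can go to an unmasked 4-lane clone of scale.  */
void
scale_all (const double *in, double *out, int n)
{
#pragma omp simd
  for (int i = 0; i < n; i++)
    out[i] = scale (in[i]);
}
```

`notinbranch` promises the function is never called under a condition inside the loop, which is why only an unmasked (`N`) variant needs to exist, as with the single `_ZGVdN4v_cos` symbol exported here.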
diff --git a/sysdeps/x86_64/configure b/sysdeps/x86_64/configure
index 7d4dadd..9773770 100644
--- a/sysdeps/x86_64/configure
+++ b/sysdeps/x86_64/configure
@@ -275,6 +275,16 @@ fi
 config_vars="$config_vars
 config-cflags-avx2 = $libc_cv_cc_avx2"
 
+if test x"$build_mathvec" = xnotset; then
+  if test x"$base_machine" = xx86_64; then
+    build_mathvec=yes
+  else
+    build_mathvec=no
+  fi
+fi
+config_vars="$config_vars
+build-mathvec = $build_mathvec"
+
 $as_echo "#define PI_STATIC_AND_HIDDEN 1" >>confdefs.h
 
 # work around problem with autoconf and empty lines at the end of files
diff --git a/sysdeps/x86_64/configure.ac b/sysdeps/x86_64/configure.ac
index c9f9a51..0b73d5b 100644
--- a/sysdeps/x86_64/configure.ac
+++ b/sysdeps/x86_64/configure.ac
@@ -99,6 +99,15 @@ if test $libc_cv_cc_avx2 = yes; then
 fi
 LIBC_CONFIG_VAR([config-cflags-avx2], [$libc_cv_cc_avx2])
 
+if test x"$build_mathvec" = xnotset; then
+  if test x"$base_machine" = xx86_64; then
+    build_mathvec=yes
+  else
+    build_mathvec=no
+  fi
+fi
+LIBC_CONFIG_VAR([build-mathvec], [$build_mathvec])
+
 dnl It is always possible to access static and hidden symbols in an
 dnl position independent way.
 AC_DEFINE(PI_STATIC_AND_HIDDEN)
diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
new file mode 100644
index 0000000..1b65b09
--- /dev/null
+++ b/sysdeps/x86_64/fpu/Makefile
@@ -0,0 +1,13 @@
+ifeq ($(subdir),mathvec)
+libmvec-support += svml_d_cos4_core svml_d_cos_data
+endif
+
+# Rules for libmvec tests
+ifeq ($(subdir),math)
+ifeq ($(build-mathvec),yes)
+libmvec-tests += double-vlen4 float-vlen8
+
+arch-ext-cflags = -mavx2
+
+endif
+endif
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
new file mode 100644
index 0000000..3d433d2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/Versions
@@ -0,0 +1,5 @@
+libmvec {
+  GLIBC_2.21 {
+    _ZGVdN4v_cos;
+  }
+}
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 36e1b76..9e4f8cd 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -1961,6 +1961,12 @@ ifloat: 3
 ildouble: 4
 ldouble: 4
 
+Function: "vlen4_cos":
+double: 1
+
+Function: "vlen8_cos":
+float: 1
+
 Function: "y0":
 double: 2
 float: 1
diff --git a/sysdeps/x86_64/fpu/math-tests.h b/sysdeps/x86_64/fpu/math-tests.h
new file mode 100644
index 0000000..466b97b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/math-tests.h
@@ -0,0 +1,34 @@
+/* Configuration for math tests.  x86_64 version.
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifdef REQUIRE_AVX2
+# include <init-arch.h>
+
+  static int avx2_usable;	/* Set to 1 if AVX2 supported */
+
+# define INIT_ARCH_EXT 						\
+    __init_cpu_features ();					\
+    avx2_usable = __cpu_features.feature[index_AVX2_Usable]	\
+		& bit_AVX2_Usable;
+
+# define CHECK_ARCH_EXT						\
+  if (!avx2_usable) return;
+
+#endif
+
+#include_next <math-tests.h>
diff --git a/sysdeps/x86_64/fpu/svml_d_cos4_core.S b/sysdeps/x86_64/fpu/svml_d_cos4_core.S
new file mode 100644
index 0000000..7c9f62e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cos4_core.S
@@ -0,0 +1,186 @@
+/* Function cos vectorized with AVX2.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+	.text
+ENTRY(_ZGVdN4v_cos)
+
+/* ALGORITHM DESCRIPTION:
+ *     
+ *    ( low accuracy ( < 4ulp ) or enhanced performance 
+ *      ( half of correct mantissa ) implementation )
+ *     
+ *    Argument representation:
+ *    arg + Pi/2 = (N*Pi + R)
+ *    
+ *    Result calculation:
+ *    cos(arg) = sin(arg+Pi/2) = sin(N*Pi + R) = (-1)^N * sin(R)
+ *    sin(R) is approximated by corresponding polynomial
+ */
+        pushq     %rbp
+        movq      %rsp, %rbp
+        andq      $-64, %rsp
+        subq      $448, %rsp
+        movq      __gnu_svml_dcos_data@GOTPCREL(%rip), %rax
+        vmovapd   %ymm0, %ymm1
+        vmovupd   192(%rax), %ymm4
+        vmovupd   256(%rax), %ymm5
+
+/* ARGUMENT RANGE REDUCTION:
+ * Add Pi/2 to argument: X' = X+Pi/2
+ */
+        vaddpd    128(%rax), %ymm1, %ymm7
+
+/* Get absolute argument value: X' = |X'| */
+        vandpd    (%rax), %ymm7, %ymm2
+
+/* Y = X'*InvPi + RS : right shifter add */
+        vfmadd213pd %ymm5, %ymm4, %ymm7
+        vmovupd   1216(%rax), %ymm4
+
+/* Check for large arguments path */
+        vcmpnle_uqpd 64(%rax), %ymm2, %ymm3
+
+/* N = Y - RS : right shifter sub */
+        vsubpd    %ymm5, %ymm7, %ymm6
+        vmovupd   640(%rax), %ymm2
+
+/* SignRes = Y<<63 : shift LSB to MSB place for result sign */
+        vpsllq    $63, %ymm7, %ymm7
+
+/* N = N - 0.5 */
+        vsubpd    320(%rax), %ymm6, %ymm0
+        vmovmskpd %ymm3, %ecx
+
+/* R = X - N*Pi1 */
+        vmovapd   %ymm1, %ymm3
+        vfnmadd231pd %ymm0, %ymm2, %ymm3
+
+/* R = R - N*Pi2 */
+        vfnmadd231pd 704(%rax), %ymm0, %ymm3
+
+/* R = R - N*Pi3 */
+        vfnmadd132pd 768(%rax), %ymm3, %ymm0
+
+/* POLYNOMIAL APPROXIMATION:
+ * R2 = R*R
+ */
+        vmulpd    %ymm0, %ymm0, %ymm5
+        vfmadd213pd 1152(%rax), %ymm5, %ymm4
+        vfmadd213pd 1088(%rax), %ymm5, %ymm4
+        vfmadd213pd 1024(%rax), %ymm5, %ymm4
+
+/* Poly = C3+R2*(C4+R2*(C5+R2*(C6+R2*C7))) */
+        vfmadd213pd 960(%rax), %ymm5, %ymm4
+        vfmadd213pd 896(%rax), %ymm5, %ymm4
+        vfmadd213pd 832(%rax), %ymm5, %ymm4
+        vmulpd    %ymm5, %ymm4, %ymm6
+        vfmadd213pd %ymm0, %ymm0, %ymm6
+
+/* RECONSTRUCTION:
+ * Final sign setting: Res = Poly^SignRes 
+ */
+        vxorpd    %ymm7, %ymm6, %ymm0
+        testl     %ecx, %ecx
+        jne       _LBL_1_3
+
+_LBL_1_2:
+        movq      %rbp, %rsp
+        popq      %rbp
+        ret
+
+_LBL_1_3:
+        vmovupd   %ymm1, 320(%rsp)
+        vmovupd   %ymm0, 384(%rsp)
+        je        _LBL_1_2
+
+        xorb      %dl, %dl
+        xorl      %eax, %eax
+        vmovups   %ymm8, 224(%rsp)
+        vmovups   %ymm9, 192(%rsp)
+        vmovups   %ymm10, 160(%rsp)
+        vmovups   %ymm11, 128(%rsp)
+        vmovups   %ymm12, 96(%rsp)
+        vmovups   %ymm13, 64(%rsp)
+        vmovups   %ymm14, 32(%rsp)
+        vmovups   %ymm15, (%rsp)
+        movq      %rsi, 264(%rsp)
+        movq      %rdi, 256(%rsp)
+        movq      %r12, 296(%rsp)
+        movb      %dl, %r12b
+        movq      %r13, 288(%rsp)
+        movl      %ecx, %r13d
+        movq      %r14, 280(%rsp)
+        movl      %eax, %r14d
+        movq      %r15, 272(%rsp)
+
+_LBL_1_6:
+        btl       %r14d, %r13d
+        jc        _LBL_1_12
+
+_LBL_1_7:
+        lea       1(%r14), %esi
+        btl       %esi, %r13d
+        jc        _LBL_1_10
+
+_LBL_1_8:
+        incb      %r12b
+        addl      $2, %r14d
+        cmpb      $16, %r12b
+        jb        _LBL_1_6
+
+        vmovups   224(%rsp), %ymm8
+        vmovups   192(%rsp), %ymm9
+        vmovups   160(%rsp), %ymm10
+        vmovups   128(%rsp), %ymm11
+        vmovups   96(%rsp), %ymm12
+        vmovups   64(%rsp), %ymm13
+        vmovups   32(%rsp), %ymm14
+        vmovups   (%rsp), %ymm15
+        vmovupd   384(%rsp), %ymm0
+        movq      264(%rsp), %rsi
+        movq      256(%rsp), %rdi
+        movq      296(%rsp), %r12
+        movq      288(%rsp), %r13
+        movq      280(%rsp), %r14
+        movq      272(%rsp), %r15
+        jmp       _LBL_1_2
+
+_LBL_1_10:
+        movzbl    %r12b, %r15d
+        shlq      $4, %r15
+        vmovsd    328(%rsp,%r15), %xmm0
+        vzeroupper
+
+        call      cos@PLT
+
+        vmovsd    %xmm0, 392(%rsp,%r15)
+        jmp       _LBL_1_8
+
+_LBL_1_12:
+        movzbl    %r12b, %r15d
+        shlq      $4, %r15
+        vmovsd    320(%rsp,%r15), %xmm0
+        vzeroupper
+
+        call      cos@PLT
+
+        vmovsd    %xmm0, 384(%rsp,%r15)
+        jmp       _LBL_1_7
+END(_ZGVdN4v_cos)
diff --git a/sysdeps/x86_64/fpu/svml_d_cos_data.S b/sysdeps/x86_64/fpu/svml_d_cos_data.S
new file mode 100644
index 0000000..5c4431a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cos_data.S
@@ -0,0 +1,493 @@
+/* Data for vectorized cos.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* TODO: Make tables more readable according to comments.  */
+
+	.section .rodata, "a"
+
+	.align 64
+	.globl __gnu_svml_dcos_data
+
+/* Data table for vector implementations of function cos. 
+ * The table may contain polynomial, reduction, lookup
+ * coefficients and other constants obtained through different
+ * methods of research and experimental work.
+ */
+__gnu_svml_dcos_data:
+
+/* General constants:
+ * lAbsMask
+ */
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+
+/* lRangeVal */
+	.long	0x00000000
+	.long	0x41600000
+	.long	0x00000000
+	.long	0x41600000
+	.long	0x00000000
+	.long	0x41600000
+	.long	0x00000000
+	.long	0x41600000
+	.long	0x00000000
+	.long	0x41600000
+	.long	0x00000000
+	.long	0x41600000
+	.long	0x00000000
+	.long	0x41600000
+	.long	0x00000000
+	.long	0x41600000
+
+/* HalfPI */
+	.long	0x54442d18
+	.long	0x3ff921fb
+	.long	0x54442d18
+	.long	0x3ff921fb
+	.long	0x54442d18
+	.long	0x3ff921fb
+	.long	0x54442d18
+	.long	0x3ff921fb
+	.long	0x54442d18
+	.long	0x3ff921fb
+	.long	0x54442d18
+	.long	0x3ff921fb
+	.long	0x54442d18
+	.long	0x3ff921fb
+	.long	0x54442d18
+	.long	0x3ff921fb
+
+/* InvPI */
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+
+/* RShifter */
+	.long	0x00000000
+	.long	0x43380000
+	.long	0x00000000
+	.long	0x43380000
+	.long	0x00000000
+	.long	0x43380000
+	.long	0x00000000
+	.long	0x43380000
+	.long	0x00000000
+	.long	0x43380000
+	.long	0x00000000
+	.long	0x43380000
+	.long	0x00000000
+	.long	0x43380000
+	.long	0x00000000
+	.long	0x43380000
+
+/* OneHalf */
+	.long	0x00000000
+	.long	0x3fe00000
+	.long	0x00000000
+	.long	0x3fe00000
+	.long	0x00000000
+	.long	0x3fe00000
+	.long	0x00000000
+	.long	0x3fe00000
+	.long	0x00000000
+	.long	0x3fe00000
+	.long	0x00000000
+	.long	0x3fe00000
+	.long	0x00000000
+	.long	0x3fe00000
+	.long	0x00000000
+	.long	0x3fe00000
+
+/* Range reduction PI-based constants:
+ * PI1
+ */
+	.long	0x40000000
+	.long	0x400921fb
+	.long	0x40000000
+	.long	0x400921fb
+	.long	0x40000000
+	.long	0x400921fb
+	.long	0x40000000
+	.long	0x400921fb
+	.long	0x40000000
+	.long	0x400921fb
+	.long	0x40000000
+	.long	0x400921fb
+	.long	0x40000000
+	.long	0x400921fb
+	.long	0x40000000
+	.long	0x400921fb
+
+/* PI2 */
+	.long	0x00000000
+	.long	0x3e84442d
+	.long	0x00000000
+	.long	0x3e84442d
+	.long	0x00000000
+	.long	0x3e84442d
+	.long	0x00000000
+	.long	0x3e84442d
+	.long	0x00000000
+	.long	0x3e84442d
+	.long	0x00000000
+	.long	0x3e84442d
+	.long	0x00000000
+	.long	0x3e84442d
+	.long	0x00000000
+	.long	0x3e84442d
+
+/* PI3 */
+	.long	0x80000000
+	.long	0x3d084698
+	.long	0x80000000
+	.long	0x3d084698
+	.long	0x80000000
+	.long	0x3d084698
+	.long	0x80000000
+	.long	0x3d084698
+	.long	0x80000000
+	.long	0x3d084698
+	.long	0x80000000
+	.long	0x3d084698
+	.long	0x80000000
+	.long	0x3d084698
+	.long	0x80000000
+	.long	0x3d084698
+
+/* PI4 */
+	.long	0x701b839a
+	.long	0x3b88cc51
+	.long	0x701b839a
+	.long	0x3b88cc51
+	.long	0x701b839a
+	.long	0x3b88cc51
+	.long	0x701b839a
+	.long	0x3b88cc51
+	.long	0x701b839a
+	.long	0x3b88cc51
+	.long	0x701b839a
+	.long	0x3b88cc51
+	.long	0x701b839a
+	.long	0x3b88cc51
+	.long	0x701b839a
+	.long	0x3b88cc51
+
+/* Range reduction PI-based constants if FMA available:
+ * PI1_FMA
+ */
+	.long	0x54442d18
+	.long	0x400921fb
+	.long	0x54442d18
+	.long	0x400921fb
+	.long	0x54442d18
+	.long	0x400921fb
+	.long	0x54442d18
+	.long	0x400921fb
+	.long	0x54442d18
+	.long	0x400921fb
+	.long	0x54442d18
+	.long	0x400921fb
+	.long	0x54442d18
+	.long	0x400921fb
+	.long	0x54442d18
+	.long	0x400921fb
+
+/* PI2_FMA */
+	.long	0x33145c06
+	.long	0x3ca1a626
+	.long	0x33145c06
+	.long	0x3ca1a626
+	.long	0x33145c06
+	.long	0x3ca1a626
+	.long	0x33145c06
+	.long	0x3ca1a626
+	.long	0x33145c06
+	.long	0x3ca1a626
+	.long	0x33145c06
+	.long	0x3ca1a626
+	.long	0x33145c06
+	.long	0x3ca1a626
+	.long	0x33145c06
+	.long	0x3ca1a626
+
+/* PI3_FMA */
+	.long	0x29024e09
+	.long	0x395c1cd1
+	.long	0x29024e09
+	.long	0x395c1cd1
+	.long	0x29024e09
+	.long	0x395c1cd1
+	.long	0x29024e09
+	.long	0x395c1cd1
+	.long	0x29024e09
+	.long	0x395c1cd1
+	.long	0x29024e09
+	.long	0x395c1cd1
+	.long	0x29024e09
+	.long	0x395c1cd1
+	.long	0x29024e09
+	.long	0x395c1cd1
+
+/* Polynomial coefficients (relative error 2^(-52.115)):
+ * C1
+ */
+	.long	0x555554a7
+	.long	0xbfc55555
+	.long	0x555554a7
+	.long	0xbfc55555
+	.long	0x555554a7
+	.long	0xbfc55555
+	.long	0x555554a7
+	.long	0xbfc55555
+	.long	0x555554a7
+	.long	0xbfc55555
+	.long	0x555554a7
+	.long	0xbfc55555
+	.long	0x555554a7
+	.long	0xbfc55555
+	.long	0x555554a7
+	.long	0xbfc55555
+
+/* C2 */
+	.long	0x1110a4a8
+	.long	0x3f811111
+	.long	0x1110a4a8
+	.long	0x3f811111
+	.long	0x1110a4a8
+	.long	0x3f811111
+	.long	0x1110a4a8
+	.long	0x3f811111
+	.long	0x1110a4a8
+	.long	0x3f811111
+	.long	0x1110a4a8
+	.long	0x3f811111
+	.long	0x1110a4a8
+	.long	0x3f811111
+	.long	0x1110a4a8
+	.long	0x3f811111
+
+/* C3 */
+	.long	0x19a5b86d
+	.long	0xbf2a01a0
+	.long	0x19a5b86d
+	.long	0xbf2a01a0
+	.long	0x19a5b86d
+	.long	0xbf2a01a0
+	.long	0x19a5b86d
+	.long	0xbf2a01a0
+	.long	0x19a5b86d
+	.long	0xbf2a01a0
+	.long	0x19a5b86d
+	.long	0xbf2a01a0
+	.long	0x19a5b86d
+	.long	0xbf2a01a0
+	.long	0x19a5b86d
+	.long	0xbf2a01a0
+
+/* C4 */
+	.long	0x8030fea0
+	.long	0x3ec71de3
+	.long	0x8030fea0
+	.long	0x3ec71de3
+	.long	0x8030fea0
+	.long	0x3ec71de3
+	.long	0x8030fea0
+	.long	0x3ec71de3
+	.long	0x8030fea0
+	.long	0x3ec71de3
+	.long	0x8030fea0
+	.long	0x3ec71de3
+	.long	0x8030fea0
+	.long	0x3ec71de3
+	.long	0x8030fea0
+	.long	0x3ec71de3
+
+/* C5 */
+	.long	0x46002231
+	.long	0xbe5ae635
+	.long	0x46002231
+	.long	0xbe5ae635
+	.long	0x46002231
+	.long	0xbe5ae635
+	.long	0x46002231
+	.long	0xbe5ae635
+	.long	0x46002231
+	.long	0xbe5ae635
+	.long	0x46002231
+	.long	0xbe5ae635
+	.long	0x46002231
+	.long	0xbe5ae635
+	.long	0x46002231
+	.long	0xbe5ae635
+
+/* C6 */
+	.long	0x57a2f220
+	.long	0x3de60e68
+	.long	0x57a2f220
+	.long	0x3de60e68
+	.long	0x57a2f220
+	.long	0x3de60e68
+	.long	0x57a2f220
+	.long	0x3de60e68
+	.long	0x57a2f220
+	.long	0x3de60e68
+	.long	0x57a2f220
+	.long	0x3de60e68
+	.long	0x57a2f220
+	.long	0x3de60e68
+	.long	0x57a2f220
+	.long	0x3de60e68
+
+/* C7 */
+	.long	0x0811aac8
+	.long	0xbd69f0d6
+	.long	0x0811aac8
+	.long	0xbd69f0d6
+	.long	0x0811aac8
+	.long	0xbd69f0d6
+	.long	0x0811aac8
+	.long	0xbd69f0d6
+	.long	0x0811aac8
+	.long	0xbd69f0d6
+	.long	0x0811aac8
+	.long	0xbd69f0d6
+	.long	0x0811aac8
+	.long	0xbd69f0d6
+	.long	0x0811aac8
+	.long	0xbd69f0d6
+
+/* Additional constants:
+ * AbsMask
+ */
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+	.long	0xffffffff
+	.long	0x7fffffff
+
+/* InvPI */
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+	.long	0x6dc9c883
+	.long	0x3fd45f30
+
+/* RShifter_la */
+	.long	0x00000000
+	.long	0x43300000
+	.long	0x00000000
+	.long	0x43300000
+	.long	0x00000000
+	.long	0x43300000
+	.long	0x00000000
+	.long	0x43300000
+	.long	0x00000000
+	.long	0x43300000
+	.long	0x00000000
+	.long	0x43300000
+	.long	0x00000000
+	.long	0x43300000
+	.long	0x00000000
+	.long	0x43300000
+
+/* RShifter_la */
+	.long	0xffffffff
+	.long	0x432fffff
+	.long	0xffffffff
+	.long	0x432fffff
+	.long	0xffffffff
+	.long	0x432fffff
+	.long	0xffffffff
+	.long	0x432fffff
+	.long	0xffffffff
+	.long	0x432fffff
+	.long	0xffffffff
+	.long	0x432fffff
+	.long	0xffffffff
+	.long	0x432fffff
+	.long	0xffffffff
+	.long	0x432fffff
+
+/* RSXmax_la */
+	.long	0x007ffffe
+	.long	0x43300000
+	.long	0x007ffffe
+	.long	0x43300000
+	.long	0x007ffffe
+	.long	0x43300000
+	.long	0x007ffffe
+	.long	0x43300000
+	.long	0x007ffffe
+	.long	0x43300000
+	.long	0x007ffffe
+	.long	0x43300000
+	.long	0x007ffffe
+	.long	0x43300000
+	.long	0x007ffffe
+	.long	0x43300000
+	.type	__gnu_svml_dcos_data,@object
+	.size	__gnu_svml_dcos_data,.-__gnu_svml_dcos_data
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4.c b/sysdeps/x86_64/fpu/test-double-vlen4.c
new file mode 100644
index 0000000..68b07ca
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-vlen4.c
@@ -0,0 +1,46 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "test-double-vlen4.h"
+
+/* Wrapper from the scalar function to the vector function implemented with AVX2.  */
+#define VECTOR_WRAPPER(scalar_func,vector_func) \
+extern __m256d vector_func(__m256d);\
+FLOAT scalar_func(FLOAT x)\
+{\
+  int i;\
+  __m256d mx = _mm256_set1_pd(x);\
+  __m256d mr = vector_func(mx);\
+  for(i=1;i<4;i++)\
+  {\
+    if (((FLOAT*)&mr)[0]!=((FLOAT*)&mr)[i])\
+    {\
+      return ((FLOAT*)&mr)[0]+0.1;\
+    }\
+  }\
+  return ((FLOAT*)&mr)[0];\
+}
+
+#include <immintrin.h>
+
+VECTOR_WRAPPER (WRAPPER_NAME(cos),_ZGVdN4v_cos)
+
+#define TEST_VECTOR_cos 1
+
+#define REQUIRE_AVX2
+
+#include "libm-test.c"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8.c b/sysdeps/x86_64/fpu/test-float-vlen8.c
new file mode 100644
index 0000000..3898df9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-vlen8.c
@@ -0,0 +1,45 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "test-float-vlen8.h"
+
+#define VECTOR_WRAPPER(scalar_func,vector_func) \
+/*extern __m256 vector_func(__m256);*/\
+FLOAT scalar_func(FLOAT x)\
+{\
+  int i;\
+  __m256 mx = _mm256_set1_ps(x);\
+  __m256 mr = mx; /*vector_func(mx);*/\
+  for(i=1;i<8;i++)\
+  {\
+    if(((FLOAT*)&mr)[0]!=((FLOAT*)&mr)[i])\
+    {\
+      return ((FLOAT*)&mr)[0]+0.1;\
+    }\
+  }\
+  return ((FLOAT*)&mr)[0];\
+}
+
+#include <immintrin.h>
+
+VECTOR_WRAPPER (WRAPPER_NAME(cosf),_ZGVdN8v_cosf)
+
+#define TEST_VECTOR_cosf 0
+
+#define REQUIRE_AVX2
+
+#include "libm-test.c"
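
For readers following svml_d_cos4_core.S, here is a scalar C model of one lane of _ZGVdN4v_cos as posted above. It is a sketch, not part of the patch: the constants are the bit patterns from __gnu_svml_dcos_data, the helper names (from_bits, to_bits, cos_fast, cos_needs_fallback) are hypothetical, and it uses plain multiply and add where the assembly uses vfmadd/vfnmadd, so it does not reproduce the exact rounding of the vector code. As in the assembly, the fast path is computed unconditionally and a separate mask test selects the lanes (|X + Pi/2| > 2^23, or NaN) that are recomputed with scalar cos.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern from the data table as a double.  */
static double
from_bits (uint64_t u)
{
  double d;
  memcpy (&d, &u, sizeof d);
  return d;
}

/* Reinterpret a double as its 64-bit pattern.  */
static uint64_t
to_bits (double d)
{
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  return u;
}

/* Bit patterns copied from __gnu_svml_dcos_data.  */
#define HALF_PI   0x3ff921fb54442d18ULL
#define INV_PI    0x3fd45f306dc9c883ULL
#define RSHIFTER  0x4338000000000000ULL /* 1.5 * 2^52 */
#define RANGE_VAL 0x4160000000000000ULL /* 2^23 */
#define PI1_FMA   0x400921fb54442d18ULL
#define PI2_FMA   0x3ca1a62633145c06ULL
#define PI3_FMA   0x395c1cd129024e09ULL

/* C1 .. C7 of the sin(R) polynomial.  */
static const uint64_t coeff_bits[7] = {
  0xbfc55555555554a7ULL, 0x3f8111111110a4a8ULL, 0xbf2a01a019a5b86dULL,
  0x3ec71de38030fea0ULL, 0xbe5ae63546002231ULL, 0x3de60e6857a2f220ULL,
  0xbd69f0d60811aac8ULL
};

/* Mask test (vcmpnle_uqpd): nonzero when |X + Pi/2| > 2^23 or is NaN;
   the assembly recomputes such lanes with the scalar cos.  */
static int
cos_needs_fallback (double x)
{
  double xp = x + from_bits (HALF_PI);
  double ax = from_bits (to_bits (xp) & 0x7fffffffffffffffULL); /* lAbsMask */
  return !(ax <= from_bits (RANGE_VAL));
}

/* Fast path for one lane; only meaningful when cos_needs_fallback (x) is 0.  */
static double
cos_fast (double x)
{
  double xp = x + from_bits (HALF_PI);            /* X' = X + Pi/2 */
  /* Y = X'*InvPi + RS: adding 1.5*2^52 leaves round(X'/Pi) in the low bits.  */
  double y = xp * from_bits (INV_PI) + from_bits (RSHIFTER);
  uint64_t sign = to_bits (y) << 63;              /* SignRes: parity of N */
  double n = (y - from_bits (RSHIFTER)) - 0.5;    /* N - 0.5 */
  /* R = X - (N - 0.5)*Pi = X' - N*Pi, using the three-part Pi.  */
  double r = x - n * from_bits (PI1_FMA);
  r -= n * from_bits (PI2_FMA);
  r -= n * from_bits (PI3_FMA);
  /* sin(R) ~ R + R*R2*(C1 + R2*(C2 + ... + R2*C7)), Horner from C7.  */
  double r2 = r * r;
  double poly = from_bits (coeff_bits[6]);
  for (int i = 5; i >= 0; i--)
    poly = poly * r2 + from_bits (coeff_bits[i]);
  double res = (poly * r2) * r + r;
  return from_bits (to_bits (res) ^ sign);        /* (-1)^N * sin(R) */
}
```

A lane-by-lane caller would take cos_needs_fallback (x) ? cos (x) : cos_fast (x), which is what the _LBL_1_6 loop does for each bit set in the vmovmskpd mask.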


* Re: [RFC] How to add vector math functions to Glibc
  2014-11-06 20:51                                                       ` Andrew Senkevich
@ 2014-11-14 15:45                                                         ` Andrew Senkevich
  2014-11-14 16:51                                                           ` Joseph Myers
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Senkevich @ 2014-11-14 15:45 UTC
  To: Joseph S. Myers; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 2320 bytes --]

2014-11-06 23:51 GMT+03:00 Andrew Senkevich <andrew.n.senkevich@gmail.com>:
> Hi, Joseph,
>
> 2014-10-24 1:37 GMT+04:00 Joseph S. Myers <joseph@codesourcery.com>:
>
>> On Thu, 23 Oct 2014, Andrew Senkevich wrote:
>
>>> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
>>> index 8a94a7e..2d31a11 100644
>>> --- a/math/bits/mathcalls.h
>>> +++ b/math/bits/mathcalls.h
>>> @@ -60,6 +60,15 @@ __MATHCALL (atan,, (_Mdouble_ __x));
>>>  __MATHCALL (atan2,, (_Mdouble_ __y, _Mdouble_ __x));
>>>
>>>  /* Cosine of X.  */
>>> +#if !defined _Mfloat_ && !defined _Mlong_double_ && defined __DECL_SIMD_cos
>>> +__DECL_SIMD_cos
>>> +#endif
>>> +#if defined _Mfloat_ && !defined _Mlong_double_ && defined __DECL_SIMD_cosf
>>> +__DECL_SIMD_cosf
>>> +#endif
>>> +#if defined _Mlong_double_ && defined __DECL_SIMD_cosl
>>> +__DECL_SIMD_cosl
>>> +#endif
>>>  __MATHCALL (cos,, (_Mdouble_ __x));
>>
>> As previously noted, I think it would be much better if the definition of
>> __MATHCALL can include all the conditional bits (possibly through a
>> generated header that defines __DECL_SIMD_cos etc. to empty if not defined
>> by bits/math-vector.h).
>
> The proposal is to use a separate __MATHCALL_VEC for the vector cases,
> because it reduces the number of empty definitions needed and they can
> be generated simply (the __MATHCALL case requires a lot of manual
> searching to find all affected function names, because of redefinitions
> in some files).
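A minimal sketch of how a separate vector declaration macro can prepend a per-function hook while relying on empty stubs when an architecture provides nothing. The names SIMD_DECL, MATHCALL_VEC, DECL_SIMD_demo_cos and demo_cos are hypothetical simplifications, not the glibc headers; the _Pragma form is the one from the x86_64 hunk quoted below, and the #ifndef fallbacks stand in for a generated stubs header.

```c
/* Token pasting helpers; CONCAT expands its arguments first.  */
#define CONCAT_(a, b) a##b
#define CONCAT(a, b) CONCAT_ (a, b)

/* Hook for FUNCTION: expands to DECL_SIMD_<function>.  */
#define SIMD_DECL(function) CONCAT (DECL_SIMD_, function)

/* Architecture header (assumed): announce a vector variant only for
   OpenMP 4.0 or later, as in the quoted x86_64 hunk.  */
#if defined _OPENMP && _OPENMP >= 201307
# define DECL_SIMD_demo_cos _Pragma ("omp declare simd notinbranch")
#endif

/* Generated stubs (assumed): empty definitions for every name the
   architecture header did not define.  */
#ifndef DECL_SIMD_demo_cos
# define DECL_SIMD_demo_cos
#endif
#ifndef DECL_SIMD_demo_cosf
# define DECL_SIMD_demo_cosf
#endif

/* Like a plain declaration macro, but prefixed with the hook for the
   full name (function##suffix), so cos and cosf get separate hooks.  */
#define MATHCALL_VEC(function, suffix, type, args) \
  SIMD_DECL (CONCAT (function, suffix))            \
  extern type CONCAT (function, suffix) args

MATHCALL_VEC (demo_cos, , double, (double x));
MATHCALL_VEC (demo_cos, f, float, (float x));

/* Placeholder bodies (not a real cosine) so the declarations can be used.  */
double demo_cos (double x) { return 1.0 - x * x / 2.0; }
float demo_cosf (float x) { return 1.0f - x * x / 2.0f; }
```

Only names listed in the stubs need an empty definition, which is the reduction in empty definitions described above.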
>
>>> +#if defined __x86_64__ && defined __FAST_MATH__
>>> +# if defined _OPENMP && _OPENMP >= 201307
>>> +/* OpenMP case. */
>>> +#  define __DECL_SIMD_AVX2 _Pragma("omp declare simd notinbranch")
>>> +#  define __DECL_SIMD_SSE4 _Pragma("omp declare simd notinbranch")
>>
>> Of course we still need the API/ABI documentation providing the stable
>> guarantee about exactly what this pragma means regarding the function
>> versions it is saying are available in glibc.
>
> We will follow-up on this soon.
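The function versions such documentation has to pin down are the mangled names in the Versions file, e.g. _ZGVdN4v_cos. As a sketch, assuming the x86_64 Vector Function ABI mangling these symbols appear to follow (_ZGV, an ISA letter with d for AVX2, N for unmasked/notinbranch or M for masked/inbranch, the lane count, one letter per parameter such as v for a vector argument, then _ and the scalar name), the hypothetical parser below splits a name into those parts. Parameter kinds that carry extra digits are stored verbatim, not decoded.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

struct vfabi_name
{
  int ok;            /* 1 if parsing succeeded */
  char isa;          /* 'b' SSE, 'c' AVX, 'd' AVX2, 'e' AVX-512 */
  int masked;        /* 1 for 'M' (inbranch), 0 for 'N' (notinbranch) */
  int vlen;          /* number of lanes */
  char params[16];   /* parameter kinds, e.g. "v" */
  char scalar[64];   /* original scalar function name */
};

/* Parse _ZGV<isa><mask><vlen><params>_<name>; r.ok is 0 on failure.  */
static struct vfabi_name
vfabi_parse (const char *name)
{
  struct vfabi_name r;
  memset (&r, 0, sizeof r);
  if (strncmp (name, "_ZGV", 4) != 0)
    return r;
  const char *p = name + 4;
  if (*p == '\0' || strchr ("bcde", *p) == NULL)
    return r;
  r.isa = *p++;
  if (*p != 'M' && *p != 'N')
    return r;
  r.masked = (*p++ == 'M');
  if (!isdigit ((unsigned char) *p))
    return r;
  char *end;
  r.vlen = (int) strtol (p, &end, 10);
  p = end;
  size_t n = 0;
  while (*p != '\0' && *p != '_')   /* parameter letters up to '_' */
    {
      if (n + 1 >= sizeof r.params)
        return r;
      r.params[n++] = *p++;
    }
  if (n == 0 || *p != '_' || p[1] == '\0' || strlen (p + 1) >= sizeof r.scalar)
    return r;
  strcpy (r.scalar, p + 1);
  r.ok = 1;
  return r;
}
```

Under this reading, _ZGVdN4v_cos is the AVX2, notinbranch, 4-lane variant of cos taking one vector argument, which is what "omp declare simd notinbranch" would have to promise.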
>
> I have attached a patch with almost all of the infrastructure fixes
> discussed before. It seems the pragma meaning and the data tables remain
> to be done. The patch affects many files and will of course be split
> into minimal disjoint parts for submission later.

Here is the patch, updated in the data table and function code
according to the points raised earlier in this discussion.
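
One way to make the data table easier to check against the code is to describe its layout in C. The sketch below assumes what svml_d_cos_data.S shows earlier in this thread: each constant occupies one 64-byte row of eight identical 64-bit patterns (the AVX2 code reads only the first 32 bytes of a row), in the order of the table comments. The struct and field names are hypothetical; the offsets it yields match the displacements used in svml_d_cos4_core.S, such as 192(%rax) for InvPI and 1216(%rax) for C7.

```c
#include <stddef.h>
#include <stdint.h>

/* One table row: a constant replicated into eight 64-bit lanes.  */
typedef struct
{
  uint64_t lane[8];
} dcos_row;

/* Hypothetical C view of __gnu_svml_dcos_data, in table order.  */
struct dcos_data
{
  dcos_row lAbsMask;                    /* 0x7fffffffffffffff */
  dcos_row lRangeVal;                   /* 2^23 */
  dcos_row HalfPI;
  dcos_row InvPI;
  dcos_row RShifter;                    /* 1.5 * 2^52 */
  dcos_row OneHalf;
  dcos_row PI1, PI2, PI3, PI4;          /* Pi split used without FMA */
  dcos_row PI1_FMA, PI2_FMA, PI3_FMA;   /* Pi split used with FMA */
  dcos_row C1, C2, C3, C4, C5, C6, C7;  /* sin(R) polynomial */
  dcos_row additional[5]; /* AbsMask, InvPI, RShifter_la (twice), RSXmax_la */
};

/* Byte offset of FIELD, as written in the assembly, e.g. 1216(%rax).  */
#define DCOS_OFF(field) offsetof (struct dcos_data, field)
```

Static assertions on these offsets next to the table would catch a reordered row before it silently changes the results.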


--
WBR,
Andrew

[-- Attachment #2: libmvec_141114.patch --]
[-- Type: application/octet-stream, Size: 52589 bytes --]

diff --git a/Makeconfig b/Makeconfig
index 24a3b82..4672008 100644
--- a/Makeconfig
+++ b/Makeconfig
@@ -476,7 +476,7 @@ link-libc = $(link-libc-rpath-link) $(link-libc-before-gnulib) $(gnulib)
 link-libc-tests = $(link-libc-tests-rpath-link) \
 		  $(link-libc-before-gnulib) $(gnulib-tests)
 # This is how to find at build-time things that will be installed there.
-rpath-dirs = math elf dlfcn nss nis rt resolv crypt
+rpath-dirs = math elf dlfcn nss nis rt resolv crypt mathvec
 rpath-link = \
 $(common-objdir):$(subst $(empty) ,:,$(patsubst ../$(subdir),.,$(rpath-dirs:%=$(common-objpfx)%)))
 else
@@ -1018,7 +1018,7 @@ all-subdirs = csu assert ctype locale intl catgets math setjmp signal	    \
 	      stdlib stdio-common libio malloc string wcsmbs time dirent    \
 	      grp pwd posix io termios resource misc socket sysvipc gmon    \
 	      gnulib iconv iconvdata wctype manual shadow gshadow po argp   \
-	      crypt localedata timezone rt conform debug		    \
+	      crypt localedata timezone rt conform debug mathvec	    \
 	      $(add-on-subdirs) dlfcn elf
 
 ifndef avoid-generated
diff --git a/bits/math-vector.h b/bits/math-vector.h
new file mode 100644
index 0000000..c8fe5cb
--- /dev/null
+++ b/bits/math-vector.h
@@ -0,0 +1,22 @@
+/* Platform-specific SIMD declarations of math functions.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly; \
+	include <math.h> instead."
+#endif
diff --git a/configure b/configure
index 3c161be..dac0294 100755
--- a/configure
+++ b/configure
@@ -774,6 +774,7 @@ enable_systemtap
 enable_build_nscd
 enable_nscd
 enable_pt_chown
+enable_mathvec
 with_cpu
 '
       ac_precious_vars='build_alias
@@ -1437,6 +1438,8 @@ Optional Features:
   --disable-build-nscd    disable building and installing the nscd daemon
   --disable-nscd          library functions will not contact the nscd daemon
   --enable-pt_chown       Enable building and installing pt_chown
+  --enable-mathvec        Enable building and installing mathvec [default
+                          depends on architecture]
 
 Optional Packages:
   --with-PACKAGE[=ARG]    use PACKAGE [ARG=yes]
@@ -3730,6 +3733,14 @@ if test "$build_pt_chown" = yes; then
 
 fi
 
+# Check whether --enable-mathvec was given.
+if test "${enable_mathvec+set}" = set; then :
+  enableval=$enable_mathvec; build_mathvec=$enableval
+else
+  build_mathvec=notset
+fi
+
+
 # We keep the original values in `$config_*' and never modify them, so we
 # can write them unchanged into config.make.  Everything else uses
 # $machine, $vendor, and $os, and changes them whenever convenient.
diff --git a/configure.ac b/configure.ac
index a982407..b8fd820 100644
--- a/configure.ac
+++ b/configure.ac
@@ -353,6 +353,12 @@ if test "$build_pt_chown" = yes; then
   AC_DEFINE(HAVE_PT_CHOWN)
 fi
 
+AC_ARG_ENABLE([mathvec],
+	      [AS_HELP_STRING([--enable-mathvec],
+	      [Enable building and installing mathvec @<:@default depends on architecture@:>@])],
+	      [build_mathvec=$enableval],
+	      [build_mathvec=notset])
+
 # We keep the original values in `$config_*' and never modify them, so we
 # can write them unchanged into config.make.  Everything else uses
 # $machine, $vendor, and $os, and changes them whenever convenient.
diff --git a/include/libm-simd-decl-stubs.h b/include/libm-simd-decl-stubs.h
new file mode 100644
index 0000000..0048717
--- /dev/null
+++ b/include/libm-simd-decl-stubs.h
@@ -0,0 +1,35 @@
+/* Empty definitions required for __MATHCALL_VEC unfolding in mathcalls.h.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Needed definitions could be generated with: 
+   for func in $(grep __MATHCALL_VEC math/bits/mathcalls.h |\
+		 sed -r "s|__MATHCALL_VEC.?\(||; s|,.*||"); do 
+     echo "#define __DECL_SIMD_${func}"; 
+     echo "#define __DECL_SIMD_${func}f"; 
+     echo "#define __DECL_SIMD_${func}l";
+   done 
+ */
+
+#ifndef _LIBM_SIMD_DECL_STUBS_H
+#define _LIBM_SIMD_DECL_STUBS_H 1
+
+#define __DECL_SIMD_cos
+#define __DECL_SIMD_cosf
+#define __DECL_SIMD_cosl
+
+#endif
diff --git a/math/Makefile b/math/Makefile
index 866bc0f..dee39d1 100644
--- a/math/Makefile
+++ b/math/Makefile
@@ -26,7 +26,7 @@ headers		:= math.h bits/mathcalls.h bits/mathinline.h bits/huge_val.h \
 		   bits/huge_valf.h bits/huge_vall.h bits/inf.h bits/nan.h \
 		   fpu_control.h complex.h bits/cmathcalls.h fenv.h \
 		   bits/fenv.h bits/fenvinline.h bits/mathdef.h tgmath.h \
-		   bits/math-finite.h
+		   bits/math-finite.h bits/math-vector.h libm-simd-decl-stubs.h
 
 # FPU support code.
 aux		:= setfpucw fpu_control
@@ -85,6 +85,22 @@ generated += $(foreach s,.c .S l.c l.S f.c f.S,$(calls:s_%=m_%$s))
 routines = $(calls) $(calls:=f) $(long-c-$(long-double-fcts))
 long-c-yes = $(calls:=l)
 
+ifeq ($(build-mathvec),yes)
+# Install libm.so as a linker script that also pulls in libmvec
+# (AS_NEEDED), so that the vector math library is convenient to use.
+install-lib-ldscripts := libm.so
+install_subdir: $(inst_libdir)/libm.so
+$(inst_libdir)/libm.so: $(common-objpfx)format.lds \
+	$(libm) \
+	$(common-objpfx)mathvec/libmvec.so$(libmvec.so-version) \
+	$(+force)
+	(echo '/* GNU ld script'; echo '*/';\
+	cat $<; \
+	echo 'GROUP ( $(slibdir)/libm.so$(libm.so-version) ' \
+	'AS_NEEDED ( $(slibdir)/libmvec.so$(libmvec.so-version) ) )' \
+	) > $@
+endif
+
 # Rules for the test suite.
 tests = test-matherr test-fenv atest-exp atest-sincos atest-exp2 basic-test \
 	test-misc test-fpucw test-fpucw-ieee tst-definitions test-tgmath \
@@ -97,12 +113,13 @@ tests-static = test-fpucw-static test-fpucw-ieee-static
 test-longdouble-yes = test-ldouble test-ildoubl
 
 ifneq (no,$(PERL))
+libm-vec-tests = $(addprefix test-,$(libmvec-tests))
 libm-tests = test-float test-double $(test-longdouble-$(long-double-fcts)) \
-	test-ifloat test-idouble
+	test-ifloat test-idouble $(libm-vec-tests)
 libm-tests.o = $(addsuffix .o,$(libm-tests))
 
 tests += $(libm-tests)
-libm-tests-generated = libm-test-ulps.h libm-test.c
+libm-tests-generated = libm-test-ulps.h libm-have-vector-test.h libm-test.c
 generated += $(libm-tests-generated) libm-test.stmp
 
 # This is needed for dependencies
@@ -113,9 +130,10 @@ ulps-file = $(firstword $(wildcard $(sysdirs:%=%/libm-test-ulps)))
 $(addprefix $(objpfx), $(libm-tests-generated)): $(objpfx)libm-test.stmp
 
 $(objpfx)libm-test.stmp: $(ulps-file) libm-test.inc gen-libm-test.pl \
-			 auto-libm-test-out
+			 gen-libm-have-vector-test.sh auto-libm-test-out
 	$(make-target-directory)
 	$(PERL) gen-libm-test.pl -u $< -o "$(objpfx)"
+	$(BASH) gen-libm-have-vector-test.sh > $(objpfx)libm-have-vector-test.h
 	@echo > $@
 
 $(objpfx)test-float.o: $(objpfx)libm-test.stmp
@@ -124,8 +142,22 @@ $(objpfx)test-double.o: $(objpfx)libm-test.stmp
 $(objpfx)test-idouble.o: $(objpfx)libm-test.stmp
 $(objpfx)test-ldouble.o: $(objpfx)libm-test.stmp
 $(objpfx)test-ildoubl.o: $(objpfx)libm-test.stmp
+
+$(objpfx)test-double-vlen4.o: $(objpfx)libm-test.stmp
+$(objpfx)test-float-vlen8.o: $(objpfx)libm-test.stmp
+
+$(objpfx)test-double-vlen4: $(common-objpfx)mathvec/libmvec.so \
+			    $(objpfx)init-arch.o
+$(objpfx)test-float-vlen8: $(common-objpfx)mathvec/libmvec.so \
+			   $(objpfx)init-arch.o
 endif
 
+CFLAGS-test-double-vlen4.c = -fno-inline -ffloat-store -fno-builtin -frounding-math \
+			     -D__FAST_MATH__ -DTEST_FAST_MATH -D_OPENMP=201307 \
+			     -Wno-unknown-pragmas $(arch-ext-cflags)
+CFLAGS-test-float-vlen8.c = -fno-inline -ffloat-store -fno-builtin -frounding-math \
+			    -D__FAST_MATH__ -DTEST_FAST_MATH -D_OPENMP=201307 \
+			    -Wno-unknown-pragmas $(arch-ext-cflags)
 CFLAGS-test-float.c = -fno-inline -ffloat-store -fno-builtin -frounding-math
 CFLAGS-test-double.c = -fno-inline -ffloat-store -fno-builtin -frounding-math
 CFLAGS-test-ldouble.c = -fno-inline -ffloat-store -fno-builtin -frounding-math
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index 8a94a7e..82928a1 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -60,7 +60,7 @@ __MATHCALL (atan,, (_Mdouble_ __x));
 __MATHCALL (atan2,, (_Mdouble_ __y, _Mdouble_ __x));
 
 /* Cosine of X.  */
-__MATHCALL (cos,, (_Mdouble_ __x));
+__MATHCALL_VEC (cos,, (_Mdouble_ __x));
 /* Sine of X.  */
 __MATHCALL (sin,, (_Mdouble_ __x));
 /* Tangent of X.  */
diff --git a/math/gen-libm-have-vector-test.sh b/math/gen-libm-have-vector-test.sh
new file mode 100755
index 0000000..95c7bef
--- /dev/null
+++ b/math/gen-libm-have-vector-test.sh
@@ -0,0 +1,48 @@
+#!/bin/sh
+# Copyright (C) 1999-2014 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+
+# Generate a series of definitions used by the vector math function tests.
+print_defs()
+{
+  echo "#if defined TEST_VECTOR_$1 && TEST_VECTOR_$1"
+  echo "# define HAVE_VECTOR_$1 1"
+  echo "# define VEC_PREFIX_$1 WRAPPER_NAME($1)"
+  echo "#else"
+  echo "# define HAVE_VECTOR_$1 0"
+  echo "# define VEC_PREFIX_$1 $1"
+  echo "#endif"
+  echo
+}
+
+for func in $(grep ALL_RM_TEST libm-test.inc | grep -v define | sed -r "s/.*\(//; s/,.*//"); do
+  print_defs ${func}
+  print_defs ${func}f
+  print_defs ${func}l
+done
+
+print_defs jn
+print_defs jnf
+print_defs jnl
+
+print_defs cexp
+print_defs cexpf
+print_defs cexpl
+
+print_defs tgamma
+print_defs tgammaf
+print_defs tgammal
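The generated libm-have-vector-test.h and the FUNC_TEST macro from test-double-vlen4.h work together to redirect each test to either a vector wrapper or the scalar function. The standalone C sketch below reproduces those macros for cos and jn so the expansion can be checked in isolation. It assumes the vector test source (not part of this excerpt) defines TEST_VECTOR_cos to 1 and leaves TEST_VECTOR_jn undefined; CONCAT_, CONCAT, STRING_ and STRING are local stand-ins for the <sys/cdefs.h> helpers.

```c
#include <string.h> /* strcmp, used when checking the expanded names */

/* Local stand-ins for __CONCAT and __STRING from <sys/cdefs.h>.  */
#define CONCAT_(a, b) a##b
#define CONCAT(a, b) CONCAT_ (a, b)
#define STRING_(x) #x
#define STRING(x) STRING_ (x)

/* As in math/test-double-vlen4.h.  */
#define VEC_PREFIX vlen4_
#define WRAPPER_NAME(function) CONCAT (VEC_PREFIX, function)
#define FUNC_TEST(function) VEC_PREFIX_ ## function

/* Assumption: the vector test source enables cos only.  */
#define TEST_VECTOR_cos 1

/* What print_defs emits for "cos"...  */
#if defined TEST_VECTOR_cos && TEST_VECTOR_cos
# define HAVE_VECTOR_cos 1
# define VEC_PREFIX_cos WRAPPER_NAME (cos)
#else
# define HAVE_VECTOR_cos 0
# define VEC_PREFIX_cos cos
#endif

/* ...and for "jn", which has no TEST_VECTOR_jn.  */
#if defined TEST_VECTOR_jn && TEST_VECTOR_jn
# define HAVE_VECTOR_jn 1
# define VEC_PREFIX_jn WRAPPER_NAME (jn)
#else
# define HAVE_VECTOR_jn 0
# define VEC_PREFIX_jn jn
#endif

/* FUNC_TEST (cos) -> VEC_PREFIX_cos -> WRAPPER_NAME (cos) -> vlen4_cos.  */
static const char *const cos_test_name = STRING (FUNC_TEST (cos));
/* FUNC_TEST (jn) -> VEC_PREFIX_jn -> jn, i.e. the scalar function.  */
static const char *const jn_test_name = STRING (FUNC_TEST (jn));
```

For functions without a vector variant, START returns early because HAVE_VECTOR_<func> is 0, so the scalar fallback name is only a safe default.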
diff --git a/math/libm-test.inc b/math/libm-test.inc
index f86a4fa..79bcfca 100644
--- a/math/libm-test.inc
+++ b/math/libm-test.inc
@@ -678,13 +678,17 @@ test_exceptions (const char *test_name, int exception)
   feclearexcept (FE_ALL_EXCEPT);
 }
 
+#ifndef TEST_MATHVEC
+# define TEST_MATHVEC 0
+#endif
+
 /* Test whether errno for TEST_NAME, set to ERRNO_VALUE, has value
    EXPECTED_VALUE (description EXPECTED_NAME).  */
 static void
 test_single_errno (const char *test_name, int errno_value,
 		   int expected_value, const char *expected_name)
 {
-#ifndef TEST_INLINE
+#if !defined TEST_INLINE && !TEST_MATHVEC
   if (errno_value == expected_value)
     {
       if (print_screen (1))
@@ -1295,16 +1299,17 @@ struct test_fFF_11_data
 
 /* Run an individual test, including any required setup and checking
    of results, or loop over all tests in an array.  */
-#define RUN_TEST_f_f(ARG_STR, FUNC_NAME, ARG, EXPECTED,			\
-		     EXCEPTIONS)					\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
-		     EXCEPTIONS);					\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_f_f(ARG_STR, FUNC_NAME, ARG, EXPECTED,		\
+		     EXCEPTIONS)				\
+  do								\
+    if (enable_test (EXCEPTIONS))				\
+      {								\
+	COMMON_TEST_SETUP (ARG_STR);				\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG),	\
+		     EXPECTED,					\
+		     EXCEPTIONS);				\
+	COMMON_TEST_CLEANUP;					\
+      }								\
   while (0)
 #define RUN_TEST_LOOP_f_f(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1313,16 +1318,16 @@ struct test_fFF_11_data
 		    (ARRAY)[i].RM_##ROUNDING_MODE.expected,		\
 		    (ARRAY)[i].RM_##ROUNDING_MODE.exceptions);		\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_2_f(ARG_STR, FUNC_NAME, ARG1, ARG2, EXPECTED,	\
-		     EXCEPTIONS)				\
-  do								\
-    if (enable_test (EXCEPTIONS))				\
-      {								\
-	COMMON_TEST_SETUP (ARG_STR);				\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG1, ARG2),	\
-		     EXPECTED, EXCEPTIONS);			\
-	COMMON_TEST_CLEANUP;					\
-      }								\
+#define RUN_TEST_2_f(ARG_STR, FUNC_NAME, ARG1, ARG2, EXPECTED,		\
+		     EXCEPTIONS)					\
+  do									\
+    if (enable_test (EXCEPTIONS))					\
+      {									\
+	COMMON_TEST_SETUP (ARG_STR);					\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG1, ARG2),	\
+		     EXPECTED, EXCEPTIONS);				\
+	COMMON_TEST_CLEANUP;						\
+      }									\
   while (0)
 #define RUN_TEST_LOOP_2_f(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1340,16 +1345,16 @@ struct test_fFF_11_data
 #define RUN_TEST_LOOP_fl_f RUN_TEST_LOOP_2_f
 #define RUN_TEST_if_f RUN_TEST_2_f
 #define RUN_TEST_LOOP_if_f RUN_TEST_LOOP_2_f
-#define RUN_TEST_fff_f(ARG_STR, FUNC_NAME, ARG1, ARG2, ARG3,		\
-		       EXPECTED, EXCEPTIONS)				\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG1, ARG2, ARG3),	\
-		     EXPECTED, EXCEPTIONS);				\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_fff_f(ARG_STR, FUNC_NAME, ARG1, ARG2, ARG3,			\
+		       EXPECTED, EXCEPTIONS)					\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG1, ARG2, ARG3),	\
+		     EXPECTED, EXCEPTIONS);					\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_fff_f(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1359,17 +1364,17 @@ struct test_fFF_11_data
 		      (ARRAY)[i].RM_##ROUNDING_MODE.expected,		\
 		      (ARRAY)[i].RM_##ROUNDING_MODE.exceptions);	\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_c_f(ARG_STR, FUNC_NAME, ARG1, ARG2, EXPECTED,		\
-		     EXCEPTIONS)					\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	check_float (test_name,						\
-		     FUNC (FUNC_NAME) (BUILD_COMPLEX (ARG1, ARG2)),	\
-		     EXPECTED, EXCEPTIONS);				\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_c_f(ARG_STR, FUNC_NAME, ARG1, ARG2, EXPECTED,			\
+		     EXCEPTIONS)						\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	check_float (test_name,							\
+		     FUNC_TEST (FUNC_NAME) (BUILD_COMPLEX (ARG1, ARG2)),	\
+		     EXPECTED, EXCEPTIONS);					\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_c_f(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1387,7 +1392,7 @@ struct test_fFF_11_data
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
 	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;			\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG), EXPECTED,	\
 		     EXCEPTIONS);					\
 	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);				\
 	if (EXTRA_TEST)							\
@@ -1406,22 +1411,22 @@ struct test_fFF_11_data
 		     (ARRAY)[i].RM_##ROUNDING_MODE.extra_test,		\
 		     (ARRAY)[i].RM_##ROUNDING_MODE.extra_expected);	\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_fF_f1(ARG_STR, FUNC_NAME, ARG, EXPECTED,		\
-		       EXCEPTIONS, EXTRA_VAR, EXTRA_TEST,		\
-		       EXTRA_EXPECTED)					\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;			\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG, &(EXTRA_VAR)),	\
-		     EXPECTED, EXCEPTIONS);				\
-	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);				\
-	if (EXTRA_TEST)							\
-	  check_float (extra1_name, EXTRA_VAR, EXTRA_EXPECTED, 0);	\
-	EXTRA_OUTPUT_TEST_CLEANUP (1);					\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_fF_f1(ARG_STR, FUNC_NAME, ARG, EXPECTED,			\
+		       EXCEPTIONS, EXTRA_VAR, EXTRA_TEST,			\
+		       EXTRA_EXPECTED)						\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;				\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG, &(EXTRA_VAR)),	\
+		     EXPECTED, EXCEPTIONS);					\
+	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);					\
+	if (EXTRA_TEST)								\
+	  check_float (extra1_name, EXTRA_VAR, EXTRA_EXPECTED, 0);		\
+	EXTRA_OUTPUT_TEST_CLEANUP (1);						\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_fF_f1(FUNC_NAME, ARRAY, ROUNDING_MODE, EXTRA_VAR)	\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1433,22 +1438,22 @@ struct test_fFF_11_data
 		      (ARRAY)[i].RM_##ROUNDING_MODE.extra_test,		\
 		      (ARRAY)[i].RM_##ROUNDING_MODE.extra_expected);	\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_fI_f1(ARG_STR, FUNC_NAME, ARG, EXPECTED,		\
-		       EXCEPTIONS, EXTRA_VAR, EXTRA_TEST,		\
-		       EXTRA_EXPECTED)					\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;			\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG, &(EXTRA_VAR)),	\
-		     EXPECTED, EXCEPTIONS);				\
-	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);				\
-	if (EXTRA_TEST)							\
-	  check_int (extra1_name, EXTRA_VAR, EXTRA_EXPECTED, 0);	\
-	EXTRA_OUTPUT_TEST_CLEANUP (1);					\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_fI_f1(ARG_STR, FUNC_NAME, ARG, EXPECTED,			\
+		       EXCEPTIONS, EXTRA_VAR, EXTRA_TEST,			\
+		       EXTRA_EXPECTED)						\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;				\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG, &(EXTRA_VAR)),	\
+		     EXPECTED, EXCEPTIONS);					\
+	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);					\
+	if (EXTRA_TEST)								\
+	  check_int (extra1_name, EXTRA_VAR, EXTRA_EXPECTED, 0);		\
+	EXTRA_OUTPUT_TEST_CLEANUP (1);						\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_fI_f1(FUNC_NAME, ARRAY, ROUNDING_MODE, EXTRA_VAR)	\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1469,7 +1474,7 @@ struct test_fFF_11_data
 	COMMON_TEST_SETUP (ARG_STR);					\
 	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;			\
 	check_float (test_name,						\
-		     FUNC (FUNC_NAME) (ARG1, ARG2, &(EXTRA_VAR)),	\
+		     FUNC_TEST (FUNC_NAME) (ARG1, ARG2, &(EXTRA_VAR)),	\
 		     EXPECTED, EXCEPTIONS);				\
 	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);				\
 	if (EXTRA_TEST)							\
@@ -1490,17 +1495,17 @@ struct test_fFF_11_data
 		       (ARRAY)[i].RM_##ROUNDING_MODE.extra_test,	\
 		       (ARRAY)[i].RM_##ROUNDING_MODE.extra_expected);	\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_c_c(ARG_STR, FUNC_NAME, ARGR, ARGC, EXPR, EXPC,	\
-		     EXCEPTIONS)					\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	check_complex (test_name,					\
-		       FUNC (FUNC_NAME) (BUILD_COMPLEX (ARGR, ARGC)),	\
-		       BUILD_COMPLEX (EXPR, EXPC), EXCEPTIONS);		\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_c_c(ARG_STR, FUNC_NAME, ARGR, ARGC, EXPR, EXPC,		\
+		     EXCEPTIONS)						\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	check_complex (test_name,						\
+		       FUNC_TEST (FUNC_NAME) (BUILD_COMPLEX (ARGR, ARGC)),	\
+		       BUILD_COMPLEX (EXPR, EXPC), EXCEPTIONS);			\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_c_c(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1511,18 +1516,18 @@ struct test_fFF_11_data
 		    (ARRAY)[i].RM_##ROUNDING_MODE.expc,			\
 		    (ARRAY)[i].RM_##ROUNDING_MODE.exceptions);		\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_cc_c(ARG_STR, FUNC_NAME, ARG1R, ARG1C, ARG2R, ARG2C,	\
-		      EXPR, EXPC, EXCEPTIONS)				\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	check_complex (test_name,					\
-		       FUNC (FUNC_NAME) (BUILD_COMPLEX (ARG1R, ARG1C),	\
-					 BUILD_COMPLEX (ARG2R, ARG2C)),	\
-		       BUILD_COMPLEX (EXPR, EXPC), EXCEPTIONS);		\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_cc_c(ARG_STR, FUNC_NAME, ARG1R, ARG1C, ARG2R, ARG2C,		\
+		      EXPR, EXPC, EXCEPTIONS)					\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	check_complex (test_name,						\
+		       FUNC_TEST (FUNC_NAME) (BUILD_COMPLEX (ARG1R, ARG1C),	\
+					      BUILD_COMPLEX (ARG2R, ARG2C)),	\
+		       BUILD_COMPLEX (EXPR, EXPC), EXCEPTIONS);			\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_cc_c(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1539,7 +1544,7 @@ struct test_fFF_11_data
     if (enable_test (EXCEPTIONS))					\
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
-	check_int (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,		\
+	check_int (test_name, FUNC_TEST (FUNC_NAME) (ARG), EXPECTED,	\
 		   EXCEPTIONS);						\
 	COMMON_TEST_CLEANUP;						\
       }									\
@@ -1592,7 +1597,7 @@ struct test_fFF_11_data
     if (enable_test (EXCEPTIONS))					\
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
-	check_bool (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
+	check_bool (test_name, FUNC_TEST (FUNC_NAME) (ARG), EXPECTED,	\
 		    EXCEPTIONS);					\
 	COMMON_TEST_CLEANUP;						\
       }									\
@@ -1626,7 +1631,7 @@ struct test_fFF_11_data
     if (enable_test (EXCEPTIONS))					\
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
-	check_long (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
+	check_long (test_name, FUNC_TEST (FUNC_NAME) (ARG), EXPECTED,	\
 		    EXCEPTIONS);					\
 	COMMON_TEST_CLEANUP;						\
       }									\
@@ -1643,8 +1648,8 @@ struct test_fFF_11_data
     if (enable_test (EXCEPTIONS))					\
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
-	check_longlong (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
-			EXCEPTIONS);					\
+	check_longlong (test_name, FUNC_TEST (FUNC_NAME) (ARG),		\
+			EXPECTED, EXCEPTIONS);				\
 	COMMON_TEST_CLEANUP;						\
       }									\
   while (0)
@@ -1663,7 +1668,7 @@ struct test_fFF_11_data
     if (enable_test (EXCEPTIONS))					\
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
-	FUNC (FUNC_NAME) (ARG, &(EXTRA1_VAR), &(EXTRA2_VAR));		\
+	FUNC_TEST (FUNC_NAME) (ARG, &(EXTRA1_VAR), &(EXTRA2_VAR));	\
 	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);				\
 	if (EXTRA1_TEST)						\
 	  check_float (extra1_name, EXTRA1_VAR, EXTRA1_EXPECTED,	\
@@ -1690,9 +1695,31 @@ struct test_fFF_11_data
 		       (ARRAY)[i].RM_##ROUNDING_MODE.extra2_expected);	\
   ROUND_RESTORE_ ## ROUNDING_MODE
 
+#ifndef INIT_ARCH_EXT
+# define INIT_ARCH_EXT
+# define CHECK_ARCH_EXT
+#endif
+
+#ifndef VEC_PREFIX
+# define VEC_PREFIX
+#endif
+
+#ifndef FUNC_TEST
+# define FUNC_TEST FUNC
+#endif
+
+#include "libm-have-vector-test.h"
+
+#define STR_CONCAT(a,b,c) __STRING(a##b##c)
+#define STR_CON3(a,b,c) STR_CONCAT(a,b,c)
+
+#define HAVE_VECTOR(func) __CONCAT(HAVE_VECTOR_,func)
+
 /* Start and end the tests for a given function.  */
-#define START(FUNC, EXACT)			\
-  const char *this_func = #FUNC;		\
+#define START(FUN, SUFF, EXACT)					\
+  CHECK_ARCH_EXT						\
+  if (TEST_MATHVEC && !HAVE_VECTOR(FUNC(FUN))) return;		\
+  const char *this_func = STR_CON3(VEC_PREFIX,FUN,SUFF);	\
   init_max_error (this_func, EXACT)
 #define END					\
   print_max_error (this_func)
@@ -1705,28 +1732,28 @@ struct test_fFF_11_data
     {									\
       do								\
 	{								\
-	  START (FUNC, EXACT);						\
+	  START (FUNC, , EXACT);					\
 	  LOOP_MACRO (FUNC, ARRAY, , ## __VA_ARGS__);			\
 	  END_MACRO;							\
 	}								\
       while (0);							\
       do								\
 	{								\
-	  START (FUNC ## _downward, EXACT);				\
+	  START (FUNC, _downward, EXACT);				\
 	  LOOP_MACRO (FUNC, ARRAY, FE_DOWNWARD, ## __VA_ARGS__);	\
 	  END_MACRO;							\
 	}								\
       while (0);							\
       do								\
 	{								\
-	  START (FUNC ## _towardzero, EXACT);				\
+	  START (FUNC, _towardzero, EXACT);				\
 	  LOOP_MACRO (FUNC, ARRAY, FE_TOWARDZERO, ## __VA_ARGS__);	\
 	  END_MACRO;							\
 	}								\
       while (0);							\
       do								\
 	{								\
-	  START (FUNC ## _upward, EXACT);				\
+	  START (FUNC, _upward, EXACT);					\
 	  LOOP_MACRO (FUNC, ARRAY, FE_UPWARD, ## __VA_ARGS__);		\
 	  END_MACRO;							\
 	}								\
@@ -6034,7 +6061,7 @@ static const struct test_c_c_data cexp_test_data[] =
 static void
 cexp_test (void)
 {
-  START (cexp, 0);
+  START (cexp, , 0);
   RUN_TEST_LOOP_c_c (cexp, cexp_test_data, );
   END_COMPLEX;
 }
@@ -7548,7 +7575,7 @@ static const struct test_if_f_data jn_test_data[] =
 static void
 jn_test (void)
 {
-  START (jn, 0);
+  START (jn, , 0);
   RUN_TEST_LOOP_if_f (jn, jn_test_data, );
   END;
 }
@@ -9374,7 +9401,7 @@ static const struct test_f_f_data tgamma_test_data[] =
 static void
 tgamma_test (void)
 {
-  START (tgamma, 0);
+  START (tgamma, , 0);
   RUN_TEST_LOOP_f_f (tgamma, tgamma_test_data, );
   END;
 }
@@ -9824,6 +9851,8 @@ main (int argc, char **argv)
   initialize ();
   printf (TEST_MSG);
 
+  INIT_ARCH_EXT
+
   check_ulp ();
 
   /* Keep the tests a wee bit ordered (according to ISO C99).  */
diff --git a/math/math.h b/math/math.h
index dc532b7..b44a23b 100644
--- a/math/math.h
+++ b/math/math.h
@@ -27,6 +27,9 @@
 
 __BEGIN_DECLS
 
+/* Get machine-dependent vector math function declarations.  */
+#include <bits/math-vector.h>
+
 /* Get machine-dependent HUGE_VAL value (returned on overflow).
    On all IEEE754 machines, this is +Infinity.  */
 #include <bits/huge_val.h>
@@ -49,6 +52,12 @@ __BEGIN_DECLS
    so we can easily declare each function as both `name' and `__name',
    and can declare the float versions `namef' and `__namef'.  */
 
+#define __SIMD_DECL(function) __CONCAT(__DECL_SIMD_,function)
+
+#define __MATHCALL_VEC(function,suffix, args)	\
+  __SIMD_DECL(__MATH_PRECNAME(function,suffix)) \
+  __MATHCALL(function,suffix, args)
+
 #define __MATHCALL(function,suffix, args)	\
   __MATHDECL (_Mdouble_,function,suffix, args)
 #define __MATHDECL(type, function,suffix, args) \
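The __MATHCALL_VEC hook can be modelled outside glibc in a few lines of C. In this sketch my_cos and my_sin are hypothetical stand-ins for cos and sin, DECL_SIMD_* plays the role of __DECL_SIMD_* (whose empty defaults are assumed to come from libm-simd-decl-stubs.h, which is not shown here), and the OpenMP guard mirrors the one in bits/math-vector.h. When compiled with OpenMP 4.0 support, the pragma is attached to the my_cos prototype only; otherwise both lines are plain prototypes.

```c
/* Local stand-ins for __CONCAT from <sys/cdefs.h>.  */
#define CONCAT_(a, b) a##b
#define CONCAT(a, b) CONCAT_ (a, b)

/* Default hooks: no SIMD annotation (hypothetical function names).  */
#define DECL_SIMD_my_cos
#define DECL_SIMD_my_sin

/* Architecture override, guarded like sysdeps/x86/fpu/bits/math-vector.h.  */
#if defined _OPENMP && _OPENMP >= 201307
# undef DECL_SIMD_my_cos
# define DECL_SIMD_my_cos _Pragma ("omp declare simd notinbranch simdlen(4)")
#endif

/* Simplified counterparts of __SIMD_DECL, __MATHCALL and __MATHCALL_VEC
   (no __MATH_PRECNAME, no float or long double variants).  */
#define SIMD_DECL(function) CONCAT (DECL_SIMD_, function)
#define MATHCALL(function, args) extern double function args
#define MATHCALL_VEC(function, args) \
  SIMD_DECL (function) MATHCALL (function, args)

MATHCALL_VEC (my_cos, (double x));  /* SIMD pragma (if enabled) + prototype */
MATHCALL (my_sin, (double x));      /* prototype only */

/* Toy bodies so the sketch links; they are not real cos/sin.  */
double my_cos (double x) { return 1.0 - x * x / 2.0; }
double my_sin (double x) { return x; }
```

Keeping the hook per function means an architecture header only has to #undef and redefine the entries it actually implements; every other function keeps its empty default.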
diff --git a/math/test-double-vlen4.h b/math/test-double-vlen4.h
new file mode 100644
index 0000000..a71a3d0
--- /dev/null
+++ b/math/test-double-vlen4.h
@@ -0,0 +1,42 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FLOAT double
+#define FUNC(function) function
+#define TEST_MSG "testing double vector math (without inline functions)\n"
+#define MATHCONST(x) x
+#define CHOOSE(Clongdouble,Cdouble,Cfloat,Cinlinelongdouble,Cinlinedouble,Cinlinefloat) Cdouble
+#define PRINTF_EXPR "e"
+#define PRINTF_XEXPR "a"
+#define PRINTF_NEXPR "f"
+#define TEST_DOUBLE 1
+#define TEST_MATHVEC 1
+
+#ifndef __NO_MATH_INLINES
+# define __NO_MATH_INLINES
+#endif
+
+#define EXCEPTION_TESTS_double 0
+#define ROUNDING_TESTS_double(MODE) ((MODE) == FE_TONEAREST)
+
+#define VEC_PREFIX vlen4_
+
+#define CONCAT(prefix,func) __CONCAT(prefix,func)
+
+#define WRAPPER_NAME(function) CONCAT(VEC_PREFIX,function)
+
+#define FUNC_TEST(function) VEC_PREFIX_ ## function
diff --git a/math/test-float-vlen8.h b/math/test-float-vlen8.h
new file mode 100644
index 0000000..a1a86a1
--- /dev/null
+++ b/math/test-float-vlen8.h
@@ -0,0 +1,42 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FLOAT float
+#define FUNC(function) function ## f
+#define TEST_MSG "testing float vector math (without inline functions)\n"
+#define MATHCONST(x) x
+#define CHOOSE(Clongdouble,Cdouble,Cfloat,Cinlinelongdouble,Cinlinedouble,Cinlinefloat) Cfloat
+#define PRINTF_EXPR "e"
+#define PRINTF_XEXPR "a"
+#define PRINTF_NEXPR "f"
+#define TEST_FLOAT 1
+#define TEST_MATHVEC 1
+
+#ifndef __NO_MATH_INLINES
+# define __NO_MATH_INLINES
+#endif
+
+#define EXCEPTION_TESTS_float 0
+#define ROUNDING_TESTS_float(MODE) ((MODE) == FE_TONEAREST)
+
+#define VEC_PREFIX vlen8_
+
+#define CONCAT(prefix,func) __CONCAT(prefix,func)
+
+#define WRAPPER_NAME(function) CONCAT(VEC_PREFIX,function)
+
+#define FUNC_TEST(function) VEC_PREFIX_ ## function ## f
diff --git a/mathvec/Depend b/mathvec/Depend
new file mode 100644
index 0000000..ede10ab
--- /dev/null
+++ b/mathvec/Depend
@@ -0,0 +1 @@
+math
diff --git a/mathvec/Makefile b/mathvec/Makefile
new file mode 100644
index 0000000..26c552c
--- /dev/null
+++ b/mathvec/Makefile
@@ -0,0 +1,35 @@
+# Copyright (C) 2014 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+
+# Makefile for the vector math library.
+
+subdir		:= mathvec
+
+include ../Makeconfig
+
+ifeq ($(build-mathvec),yes)
+extra-libs	:= libmvec
+extra-libs-others = $(extra-libs)
+
+libmvec-routines = $(strip $(libmvec-support))
+
+$(objpfx)libmvec.so: $(libm)
+endif
+
+# Rules for the test suite are in the math directory.
+
+include ../Rules
diff --git a/shlib-versions b/shlib-versions
index e05b248..fa3cf1d 100644
--- a/shlib-versions
+++ b/shlib-versions
@@ -71,3 +71,6 @@ libanl=1
 # This defines the libgcc soname version this glibc is to load for
 # asynchronous cancellation to work correctly.
 libgcc_s=1
+
+# The vector math library
+libmvec=1
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
new file mode 100644
index 0000000..8272ddd
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -0,0 +1,3 @@
+GLIBC_2.21
+ GLIBC_2.21 A
+ _ZGVdN4v_cos F
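The exported symbol _ZGVdN4v_cos follows the x86_64 vector-function name mangling that GCC uses for "omp declare simd" clones: the prefix _ZGV, an ISA letter ('d' for the AVX2 variant), 'N' for an unmasked (notinbranch) variant, the vector length, one letter per parameter ('v' for a vector argument), then '_' and the scalar name. The helper below is a hypothetical illustration of that scheme, not a glibc or GCC API, and only covers the letters listed in its comment.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h> /* strcmp, used when checking the result */

/* Build a vector-function symbol name into BUF.
   isa:    'b' = SSE, 'c' = AVX, 'd' = AVX2, 'e' = AVX-512.
   masked: false for notinbranch ('N'), true for inbranch ('M').
   params: one letter per parameter, e.g. "v" for one vector argument.
   Returns the snprintf result, i.e. the length of the full name.  */
static int
vector_abi_name (char *buf, size_t size, char isa, bool masked,
                 unsigned int vlen, const char *params, const char *name)
{
  return snprintf (buf, size, "_ZGV%c%c%u%s_%s", isa, masked ? 'M' : 'N',
                   vlen, params, name);
}
```

With simdlen(8) on cosf, the AVX2 variant would be named _ZGVdN8v_cosf; that symbol does not appear in this abilist, which lists only _ZGVdN4v_cos.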
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
new file mode 100644
index 0000000..c58a481
--- /dev/null
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -0,0 +1,50 @@
+/* Platform-specific SIMD declarations of math functions.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly; \
+	include <math.h> instead."
+#endif
+
+/* Get default empty definitions for SIMD declarations.  */
+#include <libm-simd-decl-stubs.h>
+
+#if defined __x86_64__ && defined __FAST_MATH__
+# if defined _OPENMP && _OPENMP >= 201307
+/* OpenMP case. */
+/* TODO: document the meaning of these pragmas.  */
+#  define __DECL_SIMD_AVX2 _Pragma("omp declare simd notinbranch simdlen(4)")
+#  define __DECL_SIMD_SSE4 _Pragma("omp declare simd notinbranch simdlen(8)")
+#  undef __DECL_SIMD_cos
+#  define __DECL_SIMD_cos  __DECL_SIMD_AVX2
+#  undef __DECL_SIMD_cosf
+#  define __DECL_SIMD_cosf __DECL_SIMD_SSE4
+# elif defined _CILKPLUS && _CILKPLUS >= 0
+/* CilkPlus case. */
+/* TODO: _CILKPLUS is currently not defined anywhere;
+ * add reserved-namespace versions and an __GNUC_PREREQ check.
+#  define __DECL_SIMD_AVX2 __attribute__((__vector__(__vectorlength__(4),\
+						     __nomask__)))
+#  define __DECL_SIMD_SSE4 __attribute__((__vector__(__vectorlength__(8),\
+						     __nomask__)))
+#  undef __DECL_SIMD_cos
+#  define __DECL_SIMD_cos  __DECL_SIMD_AVX2
+#  undef __DECL_SIMD_cosf
+#  define __DECL_SIMD_cosf __DECL_SIMD_SSE4 */
+# endif
+#endif
diff --git a/sysdeps/x86_64/configure b/sysdeps/x86_64/configure
index 7d4dadd..9773770 100644
--- a/sysdeps/x86_64/configure
+++ b/sysdeps/x86_64/configure
@@ -275,6 +275,16 @@ fi
 config_vars="$config_vars
 config-cflags-avx2 = $libc_cv_cc_avx2"
 
+if test x"$build_mathvec" = xnotset; then
+  if test x"$base_machine" = xx86_64; then
+    build_mathvec=yes
+  else
+    build_mathvec=no
+  fi
+fi
+config_vars="$config_vars
+build-mathvec = $build_mathvec"
+
 $as_echo "#define PI_STATIC_AND_HIDDEN 1" >>confdefs.h
 
 # work around problem with autoconf and empty lines at the end of files
diff --git a/sysdeps/x86_64/configure.ac b/sysdeps/x86_64/configure.ac
index c9f9a51..0b73d5b 100644
--- a/sysdeps/x86_64/configure.ac
+++ b/sysdeps/x86_64/configure.ac
@@ -99,6 +99,15 @@ if test $libc_cv_cc_avx2 = yes; then
 fi
 LIBC_CONFIG_VAR([config-cflags-avx2], [$libc_cv_cc_avx2])
 
+if test x"$build_mathvec" = xnotset; then
+  if test x"$base_machine" = xx86_64; then
+    build_mathvec=yes
+  else
+    build_mathvec=no
+  fi
+fi
+LIBC_CONFIG_VAR([build-mathvec], [$build_mathvec])
+
 dnl It is always possible to access static and hidden symbols in an
 dnl position independent way.
 AC_DEFINE(PI_STATIC_AND_HIDDEN)
diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
new file mode 100644
index 0000000..1b65b09
--- /dev/null
+++ b/sysdeps/x86_64/fpu/Makefile
@@ -0,0 +1,13 @@
+ifeq ($(subdir),mathvec)
+libmvec-support += svml_d_cos4_core svml_d_cos_data
+endif
+
+# Rules for libmvec tests
+ifeq ($(subdir),math)
+ifeq ($(build-mathvec),yes)
+libmvec-tests += double-vlen4 float-vlen8
+
+arch-ext-cflags = -mavx2
+
+endif
+endif
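Because the vector tests are compiled with -mavx2, they must skip themselves on CPUs without AVX2; the patch does this with INIT_ARCH_EXT and CHECK_ARCH_EXT in sysdeps/x86_64/fpu/math-tests.h, using glibc's internal __cpu_features. Outside glibc, a comparable check can be sketched with the GCC/Clang builtins __builtin_cpu_init and __builtin_cpu_supports. The function and macro names below are hypothetical.

```c
/* Return 1 if the running CPU supports AVX2, 0 otherwise.  */
static int
avx2_usable (void)
{
#if (defined __x86_64__ || defined __i386__) && defined __GNUC__
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("avx2") != 0;
#else
  return 0;  /* Not x86, or no GCC-compatible builtins: assume no AVX2.  */
#endif
}

/* Analogue of CHECK_ARCH_EXT: leave the enclosing void function early.  */
#define SKIP_UNLESS_AVX2() \
  do { if (!avx2_usable ()) return; } while (0)

/* Hypothetical test body: it only records that it ran.  */
static void
run_avx2_test (int *ran)
{
  SKIP_UNLESS_AVX2 ();
  *ran = 1;
}
```

The patch queries the CPU once in main and caches the result in a static variable; this sketch simply calls the builtin each time.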
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
new file mode 100644
index 0000000..3d433d2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/Versions
@@ -0,0 +1,5 @@
+libmvec {
+  GLIBC_2.21 {
+    _ZGVdN4v_cos;
+  }
+}
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 36e1b76..9e4f8cd 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -1961,6 +1961,12 @@ ifloat: 3
 ildouble: 4
 ldouble: 4
 
+Function: "vlen4_cos":
+double: 1
+
+Function: "vlen8_cos":
+float: 1
+
 Function: "y0":
 double: 2
 float: 1
diff --git a/sysdeps/x86_64/fpu/math-tests.h b/sysdeps/x86_64/fpu/math-tests.h
new file mode 100644
index 0000000..466b97b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/math-tests.h
@@ -0,0 +1,34 @@
+/* Configuration for math tests.  x86_64 version.
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifdef REQUIRE_AVX2
+# include <init-arch.h>
+
+static int avx2_usable;	/* Set to 1 if AVX2 is usable.  */
+
+# define INIT_ARCH_EXT						\
+    __init_cpu_features ();					\
+    avx2_usable = __cpu_features.feature[index_AVX2_Usable]	\
+		& bit_AVX2_Usable;
+
+# define CHECK_ARCH_EXT						\
+  if (!avx2_usable) return;
+
+#endif
+
+#include_next <math-tests.h>
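The next file, svml_d_cos4_core.S, describes its algorithm in a comment: with arg + Pi/2 = N*Pi + R, cos(arg) = (-1)^N * sin(R). The scalar C model below follows the same steps for one lane: right-shifter rounding to get N, a three-part Pi subtraction, an odd polynomial for sin(R), and a sign flip taken from the low bit of N. It is an illustration under stated assumptions, not the kernel: the polynomial uses plain Taylor coefficients and the Pi split was chosen for this sketch, so neither comes from svml_d_cos_data.S, and the absolute error is roughly 1e-11 near |R| = Pi/2. It also assumes |x| < 2^20, double arithmetic without excess precision (FLT_EVAL_METHOD == 0, as on x86_64) and no -ffast-math, and it has no equivalent of the kernel's large-argument path.

```c
#include <stdint.h>
#include <string.h>

/* Constants assumed for this sketch (not the svml_d_cos_data.S values).  */
static const double INV_PI = 0.31830988618379067154;   /* 1/Pi */
static const double HALF_PI = 1.57079632679489661923;  /* Pi/2 */
static const double RIGHT_SHIFTER = 0x1.8p52;          /* 1.5 * 2^52 */
/* Pi = PI1 + PI2 + PI3.  PI1 and PI2 have few enough significant bits
   that m * PI1 and m * PI2 are exact for half-integers |m| < 2^20.  */
static const double PI1 = 0x1.921fb54p+1;
static const double PI2 = 0x1.921fb54442d18p+1 - 0x1.921fb54p+1;
static const double PI3 = 1.2246467991473532e-16;      /* Pi - double (Pi) */
/* Taylor coefficients of sin: Ck = (-1)^k / (2k+1)!.  */
static const double C1 = -1.0 / 6.0;
static const double C2 = 1.0 / 120.0;
static const double C3 = -1.0 / 5040.0;
static const double C4 = 1.0 / 362880.0;
static const double C5 = -1.0 / 39916800.0;
static const double C6 = 1.0 / 6227020800.0;
static const double C7 = -1.0 / 1307674368000.0;

/* One lane of the vlen4 cos algorithm; valid for |x| < 2^20 only.  */
static double
cos_model (double x)
{
  /* Y = (X + Pi/2) * InvPi + RS rounds to the nearest integer N.  */
  double y = (x + HALF_PI) * INV_PI + RIGHT_SHIFTER;
  double n = y - RIGHT_SHIFTER;

  /* SignRes = Y << 63: the lowest mantissa bit of Y is the parity of N.  */
  uint64_t y_bits;
  memcpy (&y_bits, &y, sizeof y_bits);
  uint64_t sign_res = y_bits << 63;

  /* R = X - (N - 0.5) * Pi = (X + Pi/2) - N * Pi, so |R| is about Pi/2
     at most.  */
  double m = n - 0.5;
  double r = x - m * PI1;
  r -= m * PI2;
  r -= m * PI3;

  /* Poly = R + R * (R2 * (C1 + R2 * (C2 + ... + R2 * C7))), about sin (R).  */
  double r2 = r * r;
  double p = C7;
  p = p * r2 + C6;
  p = p * r2 + C5;
  p = p * r2 + C4;
  p = p * r2 + C3;
  p = p * r2 + C2;
  p = p * r2 + C1;
  double res = r + r * (r2 * p);

  /* Res = Poly ^ SignRes applies the factor (-1)^N.  */
  uint64_t res_bits;
  memcpy (&res_bits, &res, sizeof res_bits);
  res_bits ^= sign_res;
  memcpy (&res, &res_bits, sizeof res);
  return res;
}
```

Computing R from X rather than from the rounded X + Pi/2 keeps the rounding of the Pi/2 addition out of R; that addition only influences which N is chosen.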
diff --git a/sysdeps/x86_64/fpu/svml_d_cos4_core.S b/sysdeps/x86_64/fpu/svml_d_cos4_core.S
new file mode 100644
index 0000000..95db6b3
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cos4_core.S
@@ -0,0 +1,195 @@
+/* Function cos vectorized with AVX2.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+#define _DATA_TABLE_OFFSETS_ONLY_
+#include "svml_d_cos_data.S"
+
+	.text
+ENTRY(_ZGVdN4v_cos)
+
+/* ALGORITHM DESCRIPTION:
+ *
+ *    ( low accuracy ( < 4ulp ) or enhanced performance
+ *      ( half of correct mantissa ) implementation )
+ *
+ *    Argument representation:
+ *    arg + Pi/2 = (N*Pi + R)
+ *
+ *    Result calculation:
+ *    cos(arg) = sin(arg+Pi/2) = sin(N*Pi + R) = (-1)^N * sin(R)
+ *    sin(R) is approximated by corresponding polynomial
+ */
+        pushq     %rbp
+        movq      %rsp, %rbp
+        andq      $-64, %rsp
+        subq      $448, %rsp
+        movq      __svml_dcos_data@GOTPCREL(%rip), %rax
+        vmovapd   %ymm0, %ymm1
+        vmovupd __dInvPI(%rax), %ymm4
+        vmovupd __dRShifter(%rax), %ymm5
+
+/*
+ * ARGUMENT RANGE REDUCTION:
+ * Add Pi/2 to argument: X' = X+Pi/2
+ */
+        vaddpd __dHalfPI(%rax), %ymm1, %ymm7
+
+/* Get absolute argument value: X' = |X'| */
+        vandpd __dAbsMask(%rax), %ymm7, %ymm2
+
+/* Y = X'*InvPi + RS : right shifter add */
+        vfmadd213pd %ymm5, %ymm4, %ymm7
+        vmovupd __dC7(%rax), %ymm4
+
+/* Check for the large-argument path */
+        vcmpnle_uqpd __dRangeVal(%rax), %ymm2, %ymm3
+
+/* N = Y - RS : right shifter sub */
+        vsubpd    %ymm5, %ymm7, %ymm6
+        vmovupd __dPI1_FMA(%rax), %ymm2
+
+/* SignRes = Y<<63 : shift LSB to MSB place for result sign */
+        vpsllq    $63, %ymm7, %ymm7
+
+/* N = N - 0.5 */
+        vsubpd __dOneHalf(%rax), %ymm6, %ymm0
+        vmovmskpd %ymm3, %ecx
+
+/* R = X - N*Pi1 */
+        vmovapd   %ymm1, %ymm3
+        vfnmadd231pd %ymm0, %ymm2, %ymm3
+
+/* R = R - N*Pi2 */
+        vfnmadd231pd __dPI2_FMA(%rax), %ymm0, %ymm3
+
+/* R = R - N*Pi3 */
+        vfnmadd132pd __dPI3_FMA(%rax), %ymm3, %ymm0
+
+/*
+ * POLYNOMIAL APPROXIMATION:
+ * R2 = R*R
+ */
+        vmulpd    %ymm0, %ymm0, %ymm5
+        vfmadd213pd __dC6(%rax), %ymm5, %ymm4
+        vfmadd213pd __dC5(%rax), %ymm5, %ymm4
+        vfmadd213pd __dC4(%rax), %ymm5, %ymm4
+
+/* Poly = C3+R2*(C4+R2*(C5+R2*(C6+R2*C7))) */
+        vfmadd213pd __dC3(%rax), %ymm5, %ymm4
+
+/* Poly = R+R*(R2*(C1+R2*(C2+R2*Poly))) */
+        vfmadd213pd __dC2(%rax), %ymm5, %ymm4
+        vfmadd213pd __dC1(%rax), %ymm5, %ymm4
+        vmulpd    %ymm5, %ymm4, %ymm6
+        vfmadd213pd %ymm0, %ymm0, %ymm6
+
+/*
+ * RECONSTRUCTION:
+ * Final sign setting: Res = Poly^SignRes
+ */
+        vxorpd    %ymm7, %ymm6, %ymm0
+        testl     %ecx, %ecx
+        jne       .LBL_1_3
+
+.LBL_1_2:
+        movq      %rbp, %rsp
+        popq      %rbp
+        ret
+
+.LBL_1_3:
+        vmovupd   %ymm1, 320(%rsp)
+        vmovupd   %ymm0, 384(%rsp)
+        je        .LBL_1_2
+
+        xorb      %dl, %dl
+        xorl      %eax, %eax
+        vmovups   %ymm8, 224(%rsp)
+        vmovups   %ymm9, 192(%rsp)
+        vmovups   %ymm10, 160(%rsp)
+        vmovups   %ymm11, 128(%rsp)
+        vmovups   %ymm12, 96(%rsp)
+        vmovups   %ymm13, 64(%rsp)
+        vmovups   %ymm14, 32(%rsp)
+        vmovups   %ymm15, (%rsp)
+        movq      %rsi, 264(%rsp)
+        movq      %rdi, 256(%rsp)
+        movq      %r12, 296(%rsp)
+        movb      %dl, %r12b
+        movq      %r13, 288(%rsp)
+        movl      %ecx, %r13d
+        movq      %r14, 280(%rsp)
+        movl      %eax, %r14d
+        movq      %r15, 272(%rsp)
+
+.LBL_1_6:
+        btl       %r14d, %r13d
+        jc        .LBL_1_12
+
+.LBL_1_7:
+        lea       1(%r14), %esi
+        btl       %esi, %r13d
+        jc        .LBL_1_10
+
+.LBL_1_8:
+        incb      %r12b
+        addl      $2, %r14d
+        cmpb      $16, %r12b
+        jb        .LBL_1_6
+
+        vmovups   224(%rsp), %ymm8
+        vmovups   192(%rsp), %ymm9
+        vmovups   160(%rsp), %ymm10
+        vmovups   128(%rsp), %ymm11
+        vmovups   96(%rsp), %ymm12
+        vmovups   64(%rsp), %ymm13
+        vmovups   32(%rsp), %ymm14
+        vmovups   (%rsp), %ymm15
+        vmovupd   384(%rsp), %ymm0
+        movq      264(%rsp), %rsi
+        movq      256(%rsp), %rdi
+        movq      296(%rsp), %r12
+        movq      288(%rsp), %r13
+        movq      280(%rsp), %r14
+        movq      272(%rsp), %r15
+        jmp       .LBL_1_2
+
+.LBL_1_10:
+        movzbl    %r12b, %r15d
+        shlq      $4, %r15
+        vmovsd    328(%rsp,%r15), %xmm0
+        vzeroupper
+
+        call      cos@PLT
+
+        vmovsd    %xmm0, 392(%rsp,%r15)
+        jmp       .LBL_1_8
+
+.LBL_1_12:
+        movzbl    %r12b, %r15d
+        shlq      $4, %r15
+        vmovsd    320(%rsp,%r15), %xmm0
+        vzeroupper
+
+        call      cos@PLT
+
+        vmovsd    %xmm0, 384(%rsp,%r15)
+        jmp       .LBL_1_7
+
+END(_ZGVdN4v_cos)
diff --git a/sysdeps/x86_64/fpu/svml_d_cos_data.S b/sysdeps/x86_64/fpu/svml_d_cos_data.S
new file mode 100644
index 0000000..4e9f36b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cos_data.S
@@ -0,0 +1,147 @@
+/* Data for vectorized cos.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef D_COS_DATA
+#define D_COS_DATA
+
+/* Offsets for data table
+ */
+#define __dAbsMask		0
+#define __dRangeVal		64
+#define __dHalfPI		128
+#define __dInvPI		192
+#define __dRShifter		256
+#define __dOneHalf		320
+#define __dPI1			384
+#define __dPI2			448
+#define __dPI3			512
+#define __dPI4			576
+#define __dPI1_FMA		640
+#define __dPI2_FMA		704
+#define __dPI3_FMA		768
+#define __dC1			832
+#define __dC2			896
+#define __dC3			960
+#define __dC4			1024
+#define __dC5			1088
+#define __dC6			1152
+#define __dC7			1216
+#define __dAbsMask_la		1280
+#define __dInvPI_la		1344
+#define __dRShifter_la		1408
+#define __dRShifterm5_la	1472
+#define __dRXmax_la		1536
+
+#ifndef _DATA_TABLE_OFFSETS_ONLY_
+
+.macro double_vector offset value
+.if .-__svml_dcos_data != \offset
+.err
+.endif
+.rept 8
+.quad \value
+.endr
+.endm
+
+	.section .rodata, "a"
+	.align 64
+
+/* Data table for vector implementations of function cos. 
+ * The table may contain polynomial, reduction, lookup
+ * coefficients and other constants obtained through different
+ * methods of research and experimental work.
+ */
+	.globl __svml_dcos_data
+__svml_dcos_data:
+
+/* General purpose constants:
+ * absolute value mask
+ */
+double_vector __dAbsMask 0x7fffffffffffffff
+
+/* working range threshold */
+double_vector __dRangeVal 0x4160000000000000
+
+/* PI/2 */
+double_vector __dHalfPI 0x3ff921fb54442d18
+
+/* 1/PI */
+double_vector __dInvPI 0x3fd45f306dc9c883
+
+/* right-shifter constant */
+double_vector __dRShifter 0x4338000000000000
+
+/* 0.5 */
+double_vector __dOneHalf 0x3fe0000000000000
+
+/* Range reduction PI-based constants:
+ * PI high part
+ */
+double_vector __dPI1 0x400921fb40000000
+
+/* PI mid  part 1 */
+double_vector __dPI2 0x3e84442d00000000
+
+/* PI mid  part 2 */
+double_vector __dPI3 0x3d08469880000000
+
+/* PI low  part */
+double_vector __dPI4 0x3b88cc51701b839a
+
+/* Range reduction PI-based constants if FMA available:
+ * PI high part (FMA available)
+ */
+double_vector __dPI1_FMA 0x400921fb54442d18
+
+/* PI mid part  (FMA available) */
+double_vector __dPI2_FMA 0x3ca1a62633145c06
+
+/* PI low part  (FMA available) */
+double_vector __dPI3_FMA 0x395c1cd129024e09
+
+/* Polynomial coefficients (relative error 2^(-52.115)): */
+double_vector __dC1 0xbfc55555555554a7
+double_vector __dC2 0x3f8111111110a4a8
+double_vector __dC3 0xbf2a01a019a5b86d
+double_vector __dC4 0x3ec71de38030fea0
+double_vector __dC5 0xbe5ae63546002231
+double_vector __dC6 0x3de60e6857a2f220
+double_vector __dC7 0xbd69f0d60811aac8
+
+/*
+ * Additional constants:
+ * absolute value mask
+ */
+double_vector __dAbsMask_la 0x7fffffffffffffff
+
+/* 1/PI */
+double_vector __dInvPI_la 0x3fd45f306dc9c883
+
+/* right-shifter for low accuracy version */
+double_vector __dRShifter_la 0x4330000000000000
+
+/* right-shifter minus one ulp (2^52 - 0.5) for low accuracy version */
+double_vector __dRShifterm5_la 0x432fffffffffffff
+
+/* right-shifter with low mask for low accuracy version */
+double_vector __dRXmax_la 0x43300000007ffffe
+
+	.type	__svml_dcos_data,@object
+	.size	__svml_dcos_data,.-__svml_dcos_data
+#endif
+#endif
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4.c b/sysdeps/x86_64/fpu/test-double-vlen4.c
new file mode 100644
index 0000000..68b07ca
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-vlen4.c
@@ -0,0 +1,46 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "test-double-vlen4.h"
+
+/* Wrapper from scalar to vector function implemented in AVX2.  */
+#define VECTOR_WRAPPER(scalar_func,vector_func) \
+extern __m256d vector_func(__m256d);\
+FLOAT scalar_func(FLOAT x)\
+{\
+  int i;\
+  __m256d mx = _mm256_set1_pd(x);\
+  __m256d mr = vector_func(mx);\
+  for(i=1;i<4;i++)\
+  {\
+    if (((FLOAT*)&mr)[0]!=((FLOAT*)&mr)[i])\
+    {\
+      return ((FLOAT*)&mr)[0]+0.1;\
+    }\
+  }\
+  return ((FLOAT*)&mr)[0];\
+}
+
+#include <immintrin.h>
+
+VECTOR_WRAPPER (WRAPPER_NAME(cos),_ZGVdN4v_cos)
+
+#define TEST_VECTOR_cos 1
+
+#define REQUIRE_AVX2
+
+#include "libm-test.c"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8.c b/sysdeps/x86_64/fpu/test-float-vlen8.c
new file mode 100644
index 0000000..3898df9
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-vlen8.c
@@ -0,0 +1,45 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "test-float-vlen8.h"
+
+#define VECTOR_WRAPPER(scalar_func,vector_func) \
+/*extern __m256 vector_func(__m256);*/\
+FLOAT scalar_func(FLOAT x)\
+{\
+  int i;\
+  __m256 mx = _mm256_set1_ps(x);\
+  __m256 mr = mx; /*vector_func(mx);*/\
+  for(i=1;i<8;i++)\
+  {\
+    if(((FLOAT*)&mr)[0]!=((FLOAT*)&mr)[i])\
+    {\
+      return ((FLOAT*)&mr)[0]+0.1;\
+    }\
+  }\
+  return ((FLOAT*)&mr)[0];\
+}
+
+#include <immintrin.h>
+
+VECTOR_WRAPPER (WRAPPER_NAME(cosf),_ZGVdN8v_cosf)
+
+#define TEST_VECTOR_cosf 0
+
+#define REQUIRE_AVX2
+
+#include "libm-test.c"

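A scalar C model of a single lane can make the ALGORITHM DESCRIPTION comment in the _ZGVdN4v_cos assembly above easier to follow. This is a minimal sketch, not the patch's code: the names from_bits, to_bits and cos_lane_sketch are hypothetical, and the constants are copied from svml_d_cos_data.S. It uses the table's four-part non-FMA split of Pi (__dPI1 to __dPI4) rather than the FMA split (__dPI1_FMA to __dPI3_FMA) used by the AVX2 code, so plain multiplies and adds are enough. It only flags lanes outside the working range and does not perform the scalar cos fallback itself.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern from svml_d_cos_data.S as a double.  */
static double
from_bits (uint64_t bits)
{
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}

static uint64_t
to_bits (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return bits;
}

/* One lane of the described algorithm (hypothetical helper).  Sets
   *FALLBACK when |X + Pi/2| is above __dRangeVal (2^23) or is NaN, the
   lanes the vector code recomputes with scalar cos.  */
static double
cos_lane_sketch (double x, int *fallback)
{
  static const uint64_t c[7] = {	/* __dC1 .. __dC7 */
    0xbfc55555555554a7, 0x3f8111111110a4a8, 0xbf2a01a019a5b86d,
    0x3ec71de38030fea0, 0xbe5ae63546002231, 0x3de60e6857a2f220,
    0xbd69f0d60811aac8
  };
  const double rshifter = from_bits (0x4338000000000000);  /* 1.5 * 2^52 */

  /* X' = X + Pi/2, then |X'| via __dAbsMask and the range check.  */
  double xp = x + from_bits (0x3ff921fb54442d18);
  double axp = from_bits (to_bits (xp) & 0x7fffffffffffffff);
  *fallback = !(axp <= from_bits (0x4160000000000000));

  /* Y = X'*InvPi + RS rounds X'/Pi to an integer k in Y's low bits.  */
  double y = xp * from_bits (0x3fd45f306dc9c883) + rshifter;
  uint64_t sign = to_bits (y) << 63;	/* parity of k moved to the sign */
  double n = (y - rshifter) - 0.5;	/* N = k - 0.5 */

  /* R = X - N*Pi = X' - k*Pi, with Pi split as __dPI1 + ... + __dPI4.  */
  double r = x - n * from_bits (0x400921fb40000000);
  r -= n * from_bits (0x3e84442d00000000);
  r -= n * from_bits (0x3d08469880000000);
  r -= n * from_bits (0x3b88cc51701b839a);

  /* sin(R) ~= R + R*R2*(C1 + R2*(C2 + ... + R2*C7)).  */
  double r2 = r * r;
  double poly = from_bits (c[6]);
  for (int i = 5; i >= 0; i--)
    poly = poly * r2 + from_bits (c[i]);
  double s = r + r * (r2 * poly);

  /* cos(X) = sin(k*Pi + R) = (-1)^k * sin(R).  */
  return from_bits (to_bits (s) ^ sign);
}
```

For example, cos_lane_sketch (2.0, &fallback) should come out within a few ulp of cos (2.0) with fallback left at 0, while an argument of 1e8 sets fallback, mirroring the lanes that the assembly passes to cos@PLT.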

* Re: [RFC] How to add vector math functions to Glibc
  2014-11-14 15:45                                                         ` Andrew Senkevich
@ 2014-11-14 16:51                                                           ` Joseph Myers
  2014-11-18 19:06                                                             ` Andrew Senkevich
  0 siblings, 1 reply; 67+ messages in thread
From: Joseph Myers @ 2014-11-14 16:51 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: libc-alpha

On Fri, 14 Nov 2014, Andrew Senkevich wrote:

> +#define __SIMD_DECL(function) __CONCAT(__DECL_SIMD_,function)
> +
> +#define __MATHCALL_VEC(function,suffix, args) 	\
> +  __SIMD_DECL(__MATH_PRECNAME(function,suffix)) \
> +  __MATHCALL(function,suffix, args)

Generally, throughout the patch, use GNU style: spaces before open 
parentheses for calls to functions and function-like macros (not of course 
in "#define func(args)" where C syntax doesn't allow that space) and after 
commas.
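As a rough illustration of that style applied to the quoted macros, here is a self-contained sketch. The DEMO_ names are hypothetical stand-ins for __CONCAT, __MATH_PRECNAME and the per-function __DECL_SIMD_* markers, not glibc's own definitions, and the result is stringized so the expansion can be checked: an empty stub makes the marker vanish, while a defined marker is selected by name.

```c
/* Hypothetical stand-ins for __CONCAT and __MATH_PRECNAME.  */
#define DEMO_CONCAT(x, y) x ## y
#define DEMO_PRECNAME(name, r) DEMO_CONCAT (name, r)

/* The quoted __SIMD_DECL in GNU style: a space before the open
   parenthesis of each macro call and after each comma (but none between
   the macro name and its parameter list in the #define itself).  */
#define DEMO_SIMD_DECL(function) DEMO_CONCAT (DEMO_DECL_SIMD_, function)

/* Per-function markers: an empty stub, as libm-simd-decl-stubs.h has for
   cos, and a placeholder token for a platform declaring cosf as SIMD.  */
#define DEMO_DECL_SIMD_cos
#define DEMO_DECL_SIMD_cosf demo_simd_marker

/* Two-level stringizing so that the argument is macro-expanded first.  */
#define DEMO_STR(x) #x
#define DEMO_XSTR(x) DEMO_STR (x)

static const char cos_marker[]
  = DEMO_XSTR (DEMO_SIMD_DECL (DEMO_PRECNAME (cos, )));
static const char cosf_marker[]
  = DEMO_XSTR (DEMO_SIMD_DECL (DEMO_PRECNAME (cos, f)));
```

Because function is not an operand of ## in DEMO_SIMD_DECL itself, DEMO_PRECNAME (cos, f) is expanded to cosf before DEMO_CONCAT pastes it onto the prefix. Here cos_marker ends up as "" and cosf_marker as "demo_simd_marker".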

> diff --git a/sysdeps/x86_64/configure.ac b/sysdeps/x86_64/configure.ac
> index c9f9a51..0b73d5b 100644
> --- a/sysdeps/x86_64/configure.ac
> +++ b/sysdeps/x86_64/configure.ac
> @@ -99,6 +99,15 @@ if test $libc_cv_cc_avx2 = yes; then
>  fi
>  LIBC_CONFIG_VAR([config-cflags-avx2], [$libc_cv_cc_avx2])
>  
> +if test x"$build_mathvec" = xnotset; then
> +  if test x"$base_machine" = xx86_64; then

No need for the base_machine test here; this configure fragment will never 
be called for non-x86_64 machines.  It's only preconfigure fragments that 
need to check for an applicable machine, not configure ones.

> +LIBC_CONFIG_VAR([build-mathvec], [$build_mathvec])

I think the LIBC_CONFIG_VAR call belongs in the toplevel configure script 
(after the sysdeps configure fragments have been run) - as does setting 
build_mathvec to "no" if it's still "notset" after running the sysdeps 
configure fragments.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [RFC] How to add vector math functions to Glibc
  2014-11-14 16:51                                                           ` Joseph Myers
@ 2014-11-18 19:06                                                             ` Andrew Senkevich
  2014-11-18 22:49                                                               ` Joseph Myers
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Senkevich @ 2014-11-18 19:06 UTC (permalink / raw)
  To: Joseph Myers; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 516 bytes --]

Hi Joseph,

The attached patch now contains versions of vector cos for the SSE4, AVX and AVX2 ISAs.

Because both the AVX and AVX2 versions have vector length 4, there are
some changes in the tests: I put the AVX2 make rules in the sysdeps
Makefile and renamed that test to test-double-vlen4-avx2, the AVX test
keeps the old name, and the ULPs file specifies both versions.
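The vlen-4 tests shown earlier in the thread reuse the scalar libm-test.c machinery through the VECTOR_WRAPPER macro in test-double-vlen4.c: the scalar argument is broadcast to every lane, the vector function is called, and lane 0 is returned, perturbed by 0.1 if any lane disagrees. The sketch below restates that idea in portable C with a plain struct instead of __m256d. demo_v4d, demo_scalar_via_vector and the two toy vector functions are hypothetical, chosen only so the wrapper logic can be exercised without AVX hardware.

```c
#include <stddef.h>

#define DEMO_VLEN 4

/* Hypothetical stand-in for __m256d: four double lanes.  */
typedef struct
{
  double lane[DEMO_VLEN];
} demo_v4d;

typedef demo_v4d (*demo_vfunc) (demo_v4d);

/* Toy vector function: halve every lane.  */
static demo_v4d
demo_vec_half (demo_v4d v)
{
  for (size_t i = 0; i < DEMO_VLEN; i++)
    v.lane[i] *= 0.5;
  return v;
}

/* Toy vector function with a bug in its last lane.  */
static demo_v4d
demo_vec_half_buggy (demo_v4d v)
{
  v = demo_vec_half (v);
  v.lane[DEMO_VLEN - 1] = 0.0;
  return v;
}

/* Scalar view of a vector function, in the manner of VECTOR_WRAPPER:
   broadcast X, call F, and return lane 0, or lane 0 + 0.1 when the lanes
   disagree so that the scalar comparison reports a mismatch.  */
static double
demo_scalar_via_vector (demo_vfunc f, double x)
{
  demo_v4d mx, mr;
  for (size_t i = 0; i < DEMO_VLEN; i++)
    mx.lane[i] = x;
  mr = f (mx);
  for (size_t i = 1; i < DEMO_VLEN; i++)
    if (mr.lane[0] != mr.lane[i])
      return mr.lane[0] + 0.1;
  return mr.lane[0];
}
```

Returning a visibly wrong value instead of aborting presumably lets the existing ULP checks in libm-test.c report the lane inconsistency like any other accuracy failure.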

If everything is okay, let me know and I will prepare separate
patches while the document describing the meaning of the pragma is
being prepared (we plan to add it in the last steps).


--
WBR,
Andrew

[-- Attachment #2: libmvec_181114.patch --]
[-- Type: application/octet-stream, Size: 66319 bytes --]

diff --git a/Makeconfig b/Makeconfig
index 24a3b82..4672008 100644
--- a/Makeconfig
+++ b/Makeconfig
@@ -476,7 +476,7 @@ link-libc = $(link-libc-rpath-link) $(link-libc-before-gnulib) $(gnulib)
 link-libc-tests = $(link-libc-tests-rpath-link) \
 		  $(link-libc-before-gnulib) $(gnulib-tests)
 # This is how to find at build-time things that will be installed there.
-rpath-dirs = math elf dlfcn nss nis rt resolv crypt
+rpath-dirs = math elf dlfcn nss nis rt resolv crypt mathvec
 rpath-link = \
 $(common-objdir):$(subst $(empty) ,:,$(patsubst ../$(subdir),.,$(rpath-dirs:%=$(common-objpfx)%)))
 else
@@ -1018,7 +1018,7 @@ all-subdirs = csu assert ctype locale intl catgets math setjmp signal	    \
 	      stdlib stdio-common libio malloc string wcsmbs time dirent    \
 	      grp pwd posix io termios resource misc socket sysvipc gmon    \
 	      gnulib iconv iconvdata wctype manual shadow gshadow po argp   \
-	      crypt localedata timezone rt conform debug		    \
+	      crypt localedata timezone rt conform debug mathvec	    \
 	      $(add-on-subdirs) dlfcn elf
 
 ifndef avoid-generated
diff --git a/bits/math-vector.h b/bits/math-vector.h
new file mode 100644
index 0000000..c8fe5cb
--- /dev/null
+++ b/bits/math-vector.h
@@ -0,0 +1,22 @@
+/* Platform-specific SIMD declarations of math functions.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly; \
+	include <math.h> instead."
+#endif
diff --git a/configure b/configure
index 0cb54ec..a3ea531 100755
--- a/configure
+++ b/configure
@@ -774,6 +774,7 @@ enable_systemtap
 enable_build_nscd
 enable_nscd
 enable_pt_chown
+enable_mathvec
 with_cpu
 '
       ac_precious_vars='build_alias
@@ -1437,6 +1438,8 @@ Optional Features:
   --disable-build-nscd    disable building and installing the nscd daemon
   --disable-nscd          library functions will not contact the nscd daemon
   --enable-pt_chown       Enable building and installing pt_chown
+  --enable-mathvec        Enable building and installing mathvec [default
+                          depends on architecture]
 
 Optional Packages:
   --with-PACKAGE[=ARG]    use PACKAGE [ARG=yes]
@@ -3730,6 +3733,14 @@ if test "$build_pt_chown" = yes; then
 
 fi
 
+# Check whether --enable-mathvec was given.
+if test "${enable_mathvec+set}" = set; then :
+  enableval=$enable_mathvec; build_mathvec=$enableval
+else
+  build_mathvec=notset
+fi
+
+
 # We keep the original values in `$config_*' and never modify them, so we
 # can write them unchanged into config.make.  Everything else uses
 # $machine, $vendor, and $os, and changes them whenever convenient.
@@ -7039,6 +7050,12 @@ $as_echo "running configure fragment for $dir" >&6; }
   fi
 done
 
+if test x"$build_mathvec" = xnotset; then
+  build_mathvec=no
+fi
+config_vars="$config_vars
+build-mathvec = $build_mathvec"
+
 
 
 
diff --git a/configure.ac b/configure.ac
index b2c4b1f..f6805aa 100644
--- a/configure.ac
+++ b/configure.ac
@@ -353,6 +353,12 @@ if test "$build_pt_chown" = yes; then
   AC_DEFINE(HAVE_PT_CHOWN)
 fi
 
+AC_ARG_ENABLE([mathvec],
+	      [AS_HELP_STRING([--enable-mathvec],
+	      [Enable building and installing mathvec @<:@default depends on architecture@:>@])],
+	      [build_mathvec=$enableval],
+	      [build_mathvec=notset])
+
 # We keep the original values in `$config_*' and never modify them, so we
 # can write them unchanged into config.make.  Everything else uses
 # $machine, $vendor, and $os, and changes them whenever convenient.
@@ -1939,6 +1945,11 @@ for dir in $sysnames; do
   fi
 done
 
+if test x"$build_mathvec" = xnotset; then
+  build_mathvec=no
+fi
+LIBC_CONFIG_VAR([build-mathvec], [$build_mathvec])
+
 AC_SUBST(libc_extra_cflags)
 AC_SUBST(libc_extra_cppflags)
 
diff --git a/include/libm-simd-decl-stubs.h b/include/libm-simd-decl-stubs.h
new file mode 100644
index 0000000..0048717
--- /dev/null
+++ b/include/libm-simd-decl-stubs.h
@@ -0,0 +1,35 @@
+/* Empty definitions required for __MATHCALL_VEC unfolding in mathcalls.h.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Needed definitions could be generated with: 
+   for func in $(grep __MATHCALL_VEC math/bits/mathcalls.h |\
+		 sed -r "s|__MATHCALL_VEC.?\(||; s|,.*||"); do 
+     echo "#define __DECL_SIMD_${func}"; 
+     echo "#define __DECL_SIMD_${func}f"; 
+     echo "#define __DECL_SIMD_${func}l";
+   done 
+ */
+
+#ifndef _LIBM_SIMD_DECL_STUBS_H
+#define _LIBM_SIMD_DECL_STUBS_H 1
+
+#define __DECL_SIMD_cos
+#define __DECL_SIMD_cosf
+#define __DECL_SIMD_cosl
+
+#endif
diff --git a/math/Makefile b/math/Makefile
index 866bc0f..4981358 100644
--- a/math/Makefile
+++ b/math/Makefile
@@ -26,7 +26,7 @@ headers		:= math.h bits/mathcalls.h bits/mathinline.h bits/huge_val.h \
 		   bits/huge_valf.h bits/huge_vall.h bits/inf.h bits/nan.h \
 		   fpu_control.h complex.h bits/cmathcalls.h fenv.h \
 		   bits/fenv.h bits/fenvinline.h bits/mathdef.h tgmath.h \
-		   bits/math-finite.h
+		   bits/math-finite.h bits/math-vector.h libm-simd-decl-stubs.h
 
 # FPU support code.
 aux		:= setfpucw fpu_control
@@ -85,6 +85,22 @@ generated += $(foreach s,.c .S l.c l.S f.c f.S,$(calls:s_%=m_%$s))
 routines = $(calls) $(calls:=f) $(long-c-$(long-double-fcts))
 long-c-yes = $(calls:=l)
 
+ifeq ($(build-mathvec),yes)
+# We need to install libm.so as a linker script
+# for more convenient use of the vector math library.
+install-lib-ldscripts := libm.so
+install_subdir: $(inst_libdir)/libm.so
+$(inst_libdir)/libm.so: $(common-objpfx)format.lds \
+	$(libm) \
+	$(common-objpfx)mathvec/libmvec.so$(libmvec.so-version) \
+	$(+force)
+	(echo '/* GNU ld script'; echo '*/';\
+	cat $<; \
+	echo 'GROUP ( $(slibdir)/libm.so$(libm.so-version) ' \
+	'AS_NEEDED ( $(slibdir)/libmvec.so$(libmvec.so-version) ) )' \
+	) > $@
+endif
+
 # Rules for the test suite.
 tests = test-matherr test-fenv atest-exp atest-sincos atest-exp2 basic-test \
 	test-misc test-fpucw test-fpucw-ieee tst-definitions test-tgmath \
@@ -97,12 +113,13 @@ tests-static = test-fpucw-static test-fpucw-ieee-static
 test-longdouble-yes = test-ldouble test-ildoubl
 
 ifneq (no,$(PERL))
+libm-vec-tests = $(addprefix test-,$(libmvec-tests))
 libm-tests = test-float test-double $(test-longdouble-$(long-double-fcts)) \
-	test-ifloat test-idouble
+	test-ifloat test-idouble $(libm-vec-tests)
 libm-tests.o = $(addsuffix .o,$(libm-tests))
 
 tests += $(libm-tests)
-libm-tests-generated = libm-test-ulps.h libm-test.c
+libm-tests-generated = libm-test-ulps.h libm-have-vector-test.h libm-test.c
 generated += $(libm-tests-generated) libm-test.stmp
 
 # This is needed for dependencies
@@ -113,9 +130,10 @@ ulps-file = $(firstword $(wildcard $(sysdirs:%=%/libm-test-ulps)))
 $(addprefix $(objpfx), $(libm-tests-generated)): $(objpfx)libm-test.stmp
 
 $(objpfx)libm-test.stmp: $(ulps-file) libm-test.inc gen-libm-test.pl \
-			 auto-libm-test-out
+			 gen-libm-have-vector-test.sh auto-libm-test-out
 	$(make-target-directory)
 	$(PERL) gen-libm-test.pl -u $< -o "$(objpfx)"
+	$(BASH) gen-libm-have-vector-test.sh > $(objpfx)libm-have-vector-test.h
 	@echo > $@
 
 $(objpfx)test-float.o: $(objpfx)libm-test.stmp
@@ -124,8 +142,22 @@ $(objpfx)test-double.o: $(objpfx)libm-test.stmp
 $(objpfx)test-idouble.o: $(objpfx)libm-test.stmp
 $(objpfx)test-ldouble.o: $(objpfx)libm-test.stmp
 $(objpfx)test-ildoubl.o: $(objpfx)libm-test.stmp
+
+$(objpfx)test-double-vlen2.o: $(objpfx)libm-test.stmp
+$(objpfx)test-double-vlen4.o: $(objpfx)libm-test.stmp
+
+$(objpfx)test-double-vlen2: $(common-objpfx)mathvec/libmvec.so \
+			    $(objpfx)init-arch.o
+$(objpfx)test-double-vlen4: $(common-objpfx)mathvec/libmvec.so \
+			    $(objpfx)init-arch.o
 endif
 
+CFLAGS-test-double-vlen2.c = -fno-inline -ffloat-store -fno-builtin -frounding-math \
+			     -D__FAST_MATH__ -DTEST_FAST_MATH -D_OPENMP=201307 \
+			     -Wno-unknown-pragmas
+CFLAGS-test-double-vlen4.c = -fno-inline -ffloat-store -fno-builtin -frounding-math \
+			     -D__FAST_MATH__ -DTEST_FAST_MATH -D_OPENMP=201307 \
+			     -Wno-unknown-pragmas $(arch-ext-cflags)
 CFLAGS-test-float.c = -fno-inline -ffloat-store -fno-builtin -frounding-math
 CFLAGS-test-double.c = -fno-inline -ffloat-store -fno-builtin -frounding-math
 CFLAGS-test-ldouble.c = -fno-inline -ffloat-store -fno-builtin -frounding-math
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index 8a94a7e..82928a1 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -60,7 +60,7 @@ __MATHCALL (atan,, (_Mdouble_ __x));
 __MATHCALL (atan2,, (_Mdouble_ __y, _Mdouble_ __x));
 
 /* Cosine of X.  */
-__MATHCALL (cos,, (_Mdouble_ __x));
+__MATHCALL_VEC (cos,, (_Mdouble_ __x));
 /* Sine of X.  */
 __MATHCALL (sin,, (_Mdouble_ __x));
 /* Tangent of X.  */
diff --git a/math/gen-libm-have-vector-test.sh b/math/gen-libm-have-vector-test.sh
new file mode 100755
index 0000000..95c7bef
--- /dev/null
+++ b/math/gen-libm-have-vector-test.sh
@@ -0,0 +1,48 @@
+#!/bin/sh
+# Copyright (C) 1999-2014 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+
+# Generate series of definitions used for vector math functions tests.
+print_defs()
+{
+  echo "#if defined TEST_VECTOR_$1 && TEST_VECTOR_$1"
+  echo "# define HAVE_VECTOR_$1 1"
+  echo "# define VEC_PREFIX_$1 WRAPPER_NAME($1)"
+  echo "#else"
+  echo "# define HAVE_VECTOR_$1 0"
+  echo "# define VEC_PREFIX_$1 $1"
+  echo "#endif"
+  echo
+}
+
+for func in $(grep ALL_RM_TEST libm-test.inc | grep -v define | sed -r "s/.*\(//; s/,.*//"); do 
+  print_defs ${func}
+  print_defs ${func}f
+  print_defs ${func}l
+done
+
+print_defs jn
+print_defs jnf
+print_defs jnl
+
+print_defs cexp
+print_defs cexpf
+print_defs cexpl
+
+print_defs tgamma
+print_defs tgammaf
+print_defs tgammal
diff --git a/math/libm-test.inc b/math/libm-test.inc
index f86a4fa..b22bbad 100644
--- a/math/libm-test.inc
+++ b/math/libm-test.inc
@@ -678,13 +678,17 @@ test_exceptions (const char *test_name, int exception)
   feclearexcept (FE_ALL_EXCEPT);
 }
 
+#ifndef TEST_MATHVEC
+# define TEST_MATHVEC 0
+#endif
+
 /* Test whether errno for TEST_NAME, set to ERRNO_VALUE, has value
    EXPECTED_VALUE (description EXPECTED_NAME).  */
 static void
 test_single_errno (const char *test_name, int errno_value,
 		   int expected_value, const char *expected_name)
 {
-#ifndef TEST_INLINE
+#if !defined TEST_INLINE && !TEST_MATHVEC
   if (errno_value == expected_value)
     {
       if (print_screen (1))
@@ -1295,16 +1299,17 @@ struct test_fFF_11_data
 
 /* Run an individual test, including any required setup and checking
    of results, or loop over all tests in an array.  */
-#define RUN_TEST_f_f(ARG_STR, FUNC_NAME, ARG, EXPECTED,			\
-		     EXCEPTIONS)					\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
-		     EXCEPTIONS);					\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_f_f(ARG_STR, FUNC_NAME, ARG, EXPECTED,		\
+		     EXCEPTIONS)				\
+  do								\
+    if (enable_test (EXCEPTIONS))				\
+      {								\
+	COMMON_TEST_SETUP (ARG_STR);				\
+	check_float (test_name,	FUNC_TEST (FUNC_NAME) (ARG),	\
+		     EXPECTED,					\
+		     EXCEPTIONS);				\
+	COMMON_TEST_CLEANUP;					\
+      }								\
   while (0)
 #define RUN_TEST_LOOP_f_f(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1313,16 +1318,16 @@ struct test_fFF_11_data
 		    (ARRAY)[i].RM_##ROUNDING_MODE.expected,		\
 		    (ARRAY)[i].RM_##ROUNDING_MODE.exceptions);		\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_2_f(ARG_STR, FUNC_NAME, ARG1, ARG2, EXPECTED,	\
-		     EXCEPTIONS)				\
-  do								\
-    if (enable_test (EXCEPTIONS))				\
-      {								\
-	COMMON_TEST_SETUP (ARG_STR);				\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG1, ARG2),	\
-		     EXPECTED, EXCEPTIONS);			\
-	COMMON_TEST_CLEANUP;					\
-      }								\
+#define RUN_TEST_2_f(ARG_STR, FUNC_NAME, ARG1, ARG2, EXPECTED,		\
+		     EXCEPTIONS)					\
+  do									\
+    if (enable_test (EXCEPTIONS))					\
+      {									\
+	COMMON_TEST_SETUP (ARG_STR);					\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG1, ARG2),	\
+		     EXPECTED, EXCEPTIONS);				\
+	COMMON_TEST_CLEANUP;						\
+      }									\
   while (0)
 #define RUN_TEST_LOOP_2_f(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1340,16 +1345,16 @@ struct test_fFF_11_data
 #define RUN_TEST_LOOP_fl_f RUN_TEST_LOOP_2_f
 #define RUN_TEST_if_f RUN_TEST_2_f
 #define RUN_TEST_LOOP_if_f RUN_TEST_LOOP_2_f
-#define RUN_TEST_fff_f(ARG_STR, FUNC_NAME, ARG1, ARG2, ARG3,		\
-		       EXPECTED, EXCEPTIONS)				\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG1, ARG2, ARG3),	\
-		     EXPECTED, EXCEPTIONS);				\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_fff_f(ARG_STR, FUNC_NAME, ARG1, ARG2, ARG3,			\
+		       EXPECTED, EXCEPTIONS)					\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG1, ARG2, ARG3),	\
+		     EXPECTED, EXCEPTIONS);					\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_fff_f(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1359,17 +1364,17 @@ struct test_fFF_11_data
 		      (ARRAY)[i].RM_##ROUNDING_MODE.expected,		\
 		      (ARRAY)[i].RM_##ROUNDING_MODE.exceptions);	\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_c_f(ARG_STR, FUNC_NAME, ARG1, ARG2, EXPECTED,		\
-		     EXCEPTIONS)					\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	check_float (test_name,						\
-		     FUNC (FUNC_NAME) (BUILD_COMPLEX (ARG1, ARG2)),	\
-		     EXPECTED, EXCEPTIONS);				\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_c_f(ARG_STR, FUNC_NAME, ARG1, ARG2, EXPECTED,			\
+		     EXCEPTIONS)						\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	check_float (test_name,							\
+		     FUNC_TEST (FUNC_NAME) (BUILD_COMPLEX (ARG1, ARG2)),	\
+		     EXPECTED, EXCEPTIONS);					\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_c_f(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1387,7 +1392,7 @@ struct test_fFF_11_data
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
 	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;			\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG), EXPECTED,	\
 		     EXCEPTIONS);					\
 	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);				\
 	if (EXTRA_TEST)							\
@@ -1406,22 +1411,22 @@ struct test_fFF_11_data
 		     (ARRAY)[i].RM_##ROUNDING_MODE.extra_test,		\
 		     (ARRAY)[i].RM_##ROUNDING_MODE.extra_expected);	\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_fF_f1(ARG_STR, FUNC_NAME, ARG, EXPECTED,		\
-		       EXCEPTIONS, EXTRA_VAR, EXTRA_TEST,		\
-		       EXTRA_EXPECTED)					\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;			\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG, &(EXTRA_VAR)),	\
-		     EXPECTED, EXCEPTIONS);				\
-	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);				\
-	if (EXTRA_TEST)							\
-	  check_float (extra1_name, EXTRA_VAR, EXTRA_EXPECTED, 0);	\
-	EXTRA_OUTPUT_TEST_CLEANUP (1);					\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_fF_f1(ARG_STR, FUNC_NAME, ARG, EXPECTED,			\
+		       EXCEPTIONS, EXTRA_VAR, EXTRA_TEST,			\
+		       EXTRA_EXPECTED)						\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;				\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG, &(EXTRA_VAR)),	\
+		     EXPECTED, EXCEPTIONS);					\
+	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);					\
+	if (EXTRA_TEST)								\
+	  check_float (extra1_name, EXTRA_VAR, EXTRA_EXPECTED, 0);		\
+	EXTRA_OUTPUT_TEST_CLEANUP (1);						\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_fF_f1(FUNC_NAME, ARRAY, ROUNDING_MODE, EXTRA_VAR)	\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1433,22 +1438,22 @@ struct test_fFF_11_data
 		      (ARRAY)[i].RM_##ROUNDING_MODE.extra_test,		\
 		      (ARRAY)[i].RM_##ROUNDING_MODE.extra_expected);	\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_fI_f1(ARG_STR, FUNC_NAME, ARG, EXPECTED,		\
-		       EXCEPTIONS, EXTRA_VAR, EXTRA_TEST,		\
-		       EXTRA_EXPECTED)					\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;			\
-	check_float (test_name, FUNC (FUNC_NAME) (ARG, &(EXTRA_VAR)),	\
-		     EXPECTED, EXCEPTIONS);				\
-	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);				\
-	if (EXTRA_TEST)							\
-	  check_int (extra1_name, EXTRA_VAR, EXTRA_EXPECTED, 0);	\
-	EXTRA_OUTPUT_TEST_CLEANUP (1);					\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_fI_f1(ARG_STR, FUNC_NAME, ARG, EXPECTED,			\
+		       EXCEPTIONS, EXTRA_VAR, EXTRA_TEST,			\
+		       EXTRA_EXPECTED)						\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;				\
+	check_float (test_name, FUNC_TEST (FUNC_NAME) (ARG, &(EXTRA_VAR)),	\
+		     EXPECTED, EXCEPTIONS);					\
+	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);					\
+	if (EXTRA_TEST)								\
+	  check_int (extra1_name, EXTRA_VAR, EXTRA_EXPECTED, 0);		\
+	EXTRA_OUTPUT_TEST_CLEANUP (1);						\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_fI_f1(FUNC_NAME, ARRAY, ROUNDING_MODE, EXTRA_VAR)	\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1469,7 +1474,7 @@ struct test_fFF_11_data
 	COMMON_TEST_SETUP (ARG_STR);					\
 	(EXTRA_VAR) = (EXTRA_EXPECTED) == 0 ? 1 : 0;			\
 	check_float (test_name,						\
-		     FUNC (FUNC_NAME) (ARG1, ARG2, &(EXTRA_VAR)),	\
+		     FUNC_TEST (FUNC_NAME) (ARG1, ARG2, &(EXTRA_VAR)),	\
 		     EXPECTED, EXCEPTIONS);				\
 	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);				\
 	if (EXTRA_TEST)							\
@@ -1490,17 +1495,17 @@ struct test_fFF_11_data
 		       (ARRAY)[i].RM_##ROUNDING_MODE.extra_test,	\
 		       (ARRAY)[i].RM_##ROUNDING_MODE.extra_expected);	\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_c_c(ARG_STR, FUNC_NAME, ARGR, ARGC, EXPR, EXPC,	\
-		     EXCEPTIONS)					\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	check_complex (test_name,					\
-		       FUNC (FUNC_NAME) (BUILD_COMPLEX (ARGR, ARGC)),	\
-		       BUILD_COMPLEX (EXPR, EXPC), EXCEPTIONS);		\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_c_c(ARG_STR, FUNC_NAME, ARGR, ARGC, EXPR, EXPC,		\
+		     EXCEPTIONS)						\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	check_complex (test_name,						\
+		       FUNC_TEST (FUNC_NAME) (BUILD_COMPLEX (ARGR, ARGC)),	\
+		       BUILD_COMPLEX (EXPR, EXPC), EXCEPTIONS);			\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_c_c(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1511,18 +1516,18 @@ struct test_fFF_11_data
 		    (ARRAY)[i].RM_##ROUNDING_MODE.expc,			\
 		    (ARRAY)[i].RM_##ROUNDING_MODE.exceptions);		\
   ROUND_RESTORE_ ## ROUNDING_MODE
-#define RUN_TEST_cc_c(ARG_STR, FUNC_NAME, ARG1R, ARG1C, ARG2R, ARG2C,	\
-		      EXPR, EXPC, EXCEPTIONS)				\
-  do									\
-    if (enable_test (EXCEPTIONS))					\
-      {									\
-	COMMON_TEST_SETUP (ARG_STR);					\
-	check_complex (test_name,					\
-		       FUNC (FUNC_NAME) (BUILD_COMPLEX (ARG1R, ARG1C),	\
-					 BUILD_COMPLEX (ARG2R, ARG2C)),	\
-		       BUILD_COMPLEX (EXPR, EXPC), EXCEPTIONS);		\
-	COMMON_TEST_CLEANUP;						\
-      }									\
+#define RUN_TEST_cc_c(ARG_STR, FUNC_NAME, ARG1R, ARG1C, ARG2R, ARG2C,		\
+		      EXPR, EXPC, EXCEPTIONS)					\
+  do										\
+    if (enable_test (EXCEPTIONS))						\
+      {										\
+	COMMON_TEST_SETUP (ARG_STR);						\
+	check_complex (test_name,						\
+		       FUNC_TEST (FUNC_NAME) (BUILD_COMPLEX (ARG1R, ARG1C),	\
+					      BUILD_COMPLEX (ARG2R, ARG2C)),	\
+		       BUILD_COMPLEX (EXPR, EXPC), EXCEPTIONS);			\
+	COMMON_TEST_CLEANUP;							\
+      }										\
   while (0)
 #define RUN_TEST_LOOP_cc_c(FUNC_NAME, ARRAY, ROUNDING_MODE)		\
   IF_ROUND_INIT_ ## ROUNDING_MODE					\
@@ -1539,7 +1544,7 @@ struct test_fFF_11_data
     if (enable_test (EXCEPTIONS))					\
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
-	check_int (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,		\
+	check_int (test_name, FUNC_TEST (FUNC_NAME) (ARG), EXPECTED,	\
 		   EXCEPTIONS);						\
 	COMMON_TEST_CLEANUP;						\
       }									\
@@ -1592,7 +1597,7 @@ struct test_fFF_11_data
     if (enable_test (EXCEPTIONS))					\
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
-	check_bool (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
+	check_bool (test_name, FUNC_TEST (FUNC_NAME) (ARG), EXPECTED,	\
 		    EXCEPTIONS);					\
 	COMMON_TEST_CLEANUP;						\
       }									\
@@ -1626,7 +1631,7 @@ struct test_fFF_11_data
     if (enable_test (EXCEPTIONS))					\
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
-	check_long (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
+	check_long (test_name, FUNC_TEST (FUNC_NAME) (ARG), EXPECTED,	\
 		    EXCEPTIONS);					\
 	COMMON_TEST_CLEANUP;						\
       }									\
@@ -1643,8 +1648,8 @@ struct test_fFF_11_data
     if (enable_test (EXCEPTIONS))					\
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
-	check_longlong (test_name, FUNC (FUNC_NAME) (ARG), EXPECTED,	\
-			EXCEPTIONS);					\
+	check_longlong (test_name, FUNC_TEST (FUNC_NAME) (ARG),		\
+			EXPECTED, EXCEPTIONS);				\
 	COMMON_TEST_CLEANUP;						\
       }									\
   while (0)
@@ -1663,7 +1668,7 @@ struct test_fFF_11_data
     if (enable_test (EXCEPTIONS))					\
       {									\
 	COMMON_TEST_SETUP (ARG_STR);					\
-	FUNC (FUNC_NAME) (ARG, &(EXTRA1_VAR), &(EXTRA2_VAR));		\
+	FUNC_TEST (FUNC_NAME) (ARG, &(EXTRA1_VAR), &(EXTRA2_VAR));	\
 	EXTRA_OUTPUT_TEST_SETUP (ARG_STR, 1);				\
 	if (EXTRA1_TEST)						\
 	  check_float (extra1_name, EXTRA1_VAR, EXTRA1_EXPECTED,	\
@@ -1690,9 +1695,31 @@ struct test_fFF_11_data
 		       (ARRAY)[i].RM_##ROUNDING_MODE.extra2_expected);	\
   ROUND_RESTORE_ ## ROUNDING_MODE
 
+#ifndef INIT_ARCH_EXT
+# define INIT_ARCH_EXT
+# define CHECK_ARCH_EXT
+#endif
+
+#ifndef VEC_PREFIX
+# define VEC_PREFIX
+#endif
+
+#ifndef FUNC_TEST
+# define FUNC_TEST FUNC
+#endif
+
+#include "libm-have-vector-test.h"
+
+#define STR_CONCAT(a,b,c) __STRING (a##b##c)
+#define STR_CON3(a,b,c) STR_CONCAT (a,b,c)
+
+#define HAVE_VECTOR(func) __CONCAT (HAVE_VECTOR_,func)
+
 /* Start and end the tests for a given function.  */
-#define START(FUNC, EXACT)			\
-  const char *this_func = #FUNC;		\
+#define START(FUN, SUFF, EXACT)					\
+  CHECK_ARCH_EXT						\
+  if (TEST_MATHVEC && !HAVE_VECTOR (FUNC (FUN))) return;		\
+  const char *this_func = STR_CON3 (VEC_PREFIX,FUN,SUFF);	\
   init_max_error (this_func, EXACT)
 #define END					\
   print_max_error (this_func)
@@ -1705,28 +1732,28 @@ struct test_fFF_11_data
     {									\
       do								\
 	{								\
-	  START (FUNC, EXACT);						\
+	  START (FUNC, , EXACT);					\
 	  LOOP_MACRO (FUNC, ARRAY, , ## __VA_ARGS__);			\
 	  END_MACRO;							\
 	}								\
       while (0);							\
       do								\
 	{								\
-	  START (FUNC ## _downward, EXACT);				\
+	  START (FUNC, _downward, EXACT);				\
 	  LOOP_MACRO (FUNC, ARRAY, FE_DOWNWARD, ## __VA_ARGS__);	\
 	  END_MACRO;							\
 	}								\
       while (0);							\
       do								\
 	{								\
-	  START (FUNC ## _towardzero, EXACT);				\
+	  START (FUNC, _towardzero, EXACT);				\
 	  LOOP_MACRO (FUNC, ARRAY, FE_TOWARDZERO, ## __VA_ARGS__);	\
 	  END_MACRO;							\
 	}								\
       while (0);							\
       do								\
 	{								\
-	  START (FUNC ## _upward, EXACT);				\
+	  START (FUNC, _upward, EXACT);					\
 	  LOOP_MACRO (FUNC, ARRAY, FE_UPWARD, ## __VA_ARGS__);		\
 	  END_MACRO;							\
 	}								\
@@ -6034,7 +6061,7 @@ static const struct test_c_c_data cexp_test_data[] =
 static void
 cexp_test (void)
 {
-  START (cexp, 0);
+  START (cexp, , 0);
   RUN_TEST_LOOP_c_c (cexp, cexp_test_data, );
   END_COMPLEX;
 }
@@ -7548,7 +7575,7 @@ static const struct test_if_f_data jn_test_data[] =
 static void
 jn_test (void)
 {
-  START (jn, 0);
+  START (jn, , 0);
   RUN_TEST_LOOP_if_f (jn, jn_test_data, );
   END;
 }
@@ -9374,7 +9401,7 @@ static const struct test_f_f_data tgamma_test_data[] =
 static void
 tgamma_test (void)
 {
-  START (tgamma, 0);
+  START (tgamma, , 0);
   RUN_TEST_LOOP_f_f (tgamma, tgamma_test_data, );
   END;
 }
@@ -9824,6 +9851,8 @@ main (int argc, char **argv)
   initialize ();
   printf (TEST_MSG);
 
+  INIT_ARCH_EXT
+
   check_ulp ();
 
   /* Keep the tests a wee bit ordered (according to ISO C99).  */
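The reworked START macro builds the name used for max-error reporting and the ulps file ("vlen2_cos", "cos_downward", ...) from a prefix, the function name and a rounding-mode suffix. The sketch below reproduces the two-level STR_CONCAT/STR_CON3 pattern with a local stand-in for __STRING (MY_STRING and the three helper functions are hypothetical, for illustration only). It shows why the extra level is needed: operands of ## are not macro-expanded, so pasting VEC_PREFIX directly yields the literal token instead of vlen2_.

```c
#include <assert.h>
#include <string.h>

/* Local stand-in for __STRING from glibc's <sys/cdefs.h>.  */
#define MY_STRING(x) #x

/* Same shape as the patch: STR_CON3 expands its arguments first,
   because operands of ## are not macro-expanded inside STR_CONCAT.  */
#define STR_CONCAT(a,b,c) MY_STRING (a##b##c)
#define STR_CON3(a,b,c) STR_CONCAT (a,b,c)

#define VEC_PREFIX vlen2_

/* Scalar test of cos: empty prefix and empty suffix.  */
static const char *scalar_name (void) { return STR_CON3 (, cos, ); }

/* vlen2 test of cos in the downward rounding mode.  */
static const char *vector_name (void)
{
  return STR_CON3 (VEC_PREFIX, cos, _downward);
}

/* Pasting VEC_PREFIX directly: the macro is NOT expanded.  */
static const char *unexpanded_name (void)
{
  return STR_CONCAT (VEC_PREFIX, cos, );
}
```

Empty macro arguments (as in the scalar case) are valid since C99, which is what lets START (FUNC, , EXACT) work for the non-vector tests.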
diff --git a/math/math.h b/math/math.h
index dc532b7..8609c22 100644
--- a/math/math.h
+++ b/math/math.h
@@ -27,6 +27,9 @@
 
 __BEGIN_DECLS
 
+/* Get machine-dependent vector math function declarations.  */
+#include <bits/math-vector.h>
+
 /* Get machine-dependent HUGE_VAL value (returned on overflow).
    On all IEEE754 machines, this is +Infinity.  */
 #include <bits/huge_val.h>
@@ -49,6 +52,12 @@ __BEGIN_DECLS
    so we can easily declare each function as both `name' and `__name',
    and can declare the float versions `namef' and `__namef'.  */
 
+#define __SIMD_DECL(function) __CONCAT (__DECL_SIMD_,function)
+
+#define __MATHCALL_VEC(function,suffix, args) 	\
+  __SIMD_DECL (__MATH_PRECNAME(function,suffix)) \
+  __MATHCALL (function,suffix, args)
+
 #define __MATHCALL(function,suffix, args)	\
   __MATHDECL (_Mdouble_,function,suffix, args)
 #define __MATHDECL(type, function,suffix, args) \
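__MATHCALL_VEC prefixes a declaration with __SIMD_DECL of the function name, which __CONCAT turns into __DECL_SIMD_<name>. libm-simd-decl-stubs.h supplies empty defaults, and bits/math-vector.h #undefs and redefines the entries an architecture actually provides. A small model of that lookup follows; it is a sketch with hypothetical non-reserved names (MY_CONCAT, SIMD_DECL, DECL_SIMD_*), and a plain marker token stands in for the real _Pragma so the expansion can be inspected as a string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical local versions of __CONCAT and __SIMD_DECL.  */
#define MY_CONCAT(x, y) x ## y
#define SIMD_DECL(function) MY_CONCAT (DECL_SIMD_, function)

/* Default stub, as libm-simd-decl-stubs.h would provide: empty.  */
#define DECL_SIMD_sin
/* Architecture override; a marker token stands in for the
   _Pragma ("omp declare simd ...") used by the real header.  */
#define DECL_SIMD_cos omp_declare_simd_notinbranch_simdlen_4

/* Stringify after full expansion so the result can be checked.  */
#define STR(x) #x
#define XSTR(x) STR (x)

static const char *sin_decl = XSTR (SIMD_DECL (sin));
static const char *cos_decl = XSTR (SIMD_DECL (cos));
```

Because the default expands to nothing, functions without a vector variant get an ordinary declaration, and only the overridden ones gain the SIMD annotation.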
diff --git a/math/test-double-vlen2.h b/math/test-double-vlen2.h
new file mode 100644
index 0000000..d5e92d1
--- /dev/null
+++ b/math/test-double-vlen2.h
@@ -0,0 +1,42 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FLOAT double
+#define FUNC(function) function
+#define TEST_MSG "testing double vector math (without inline functions)\n"
+#define MATHCONST(x) x
+#define CHOOSE(Clongdouble,Cdouble,Cfloat,Cinlinelongdouble,Cinlinedouble,Cinlinefloat) Cdouble
+#define PRINTF_EXPR "e"
+#define PRINTF_XEXPR "a"
+#define PRINTF_NEXPR "f"
+#define TEST_DOUBLE 1
+#define TEST_MATHVEC 1
+
+#ifndef __NO_MATH_INLINES
+# define __NO_MATH_INLINES
+#endif
+
+#define EXCEPTION_TESTS_double 0
+#define ROUNDING_TESTS_double(MODE) ((MODE) == FE_TONEAREST)
+
+#define VEC_PREFIX vlen2_
+
+#define CONCAT(prefix,func) __CONCAT (prefix,func)
+
+#define WRAPPER_NAME(function) CONCAT (VEC_PREFIX,function)
+
+#define FUNC_TEST(function) VEC_PREFIX_ ## function
diff --git a/math/test-double-vlen4.h b/math/test-double-vlen4.h
new file mode 100644
index 0000000..f8fc66e
--- /dev/null
+++ b/math/test-double-vlen4.h
@@ -0,0 +1,40 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FLOAT double
+#define FUNC(function) function
+#define TEST_MSG "testing double vector math (without inline functions)\n"
+#define MATHCONST(x) x
+#define CHOOSE(Clongdouble,Cdouble,Cfloat,Cinlinelongdouble,Cinlinedouble,Cinlinefloat) Cdouble
+#define PRINTF_EXPR "e"
+#define PRINTF_XEXPR "a"
+#define PRINTF_NEXPR "f"
+#define TEST_DOUBLE 1
+#define TEST_MATHVEC 1
+
+#ifndef __NO_MATH_INLINES
+# define __NO_MATH_INLINES
+#endif
+
+#define EXCEPTION_TESTS_double 0
+#define ROUNDING_TESTS_double(MODE) ((MODE) == FE_TONEAREST)
+
+#define CONCAT(prefix,func) __CONCAT (prefix,func)
+
+#define WRAPPER_NAME(function) CONCAT (VEC_PREFIX,function)
+
+#define FUNC_TEST(function) VEC_PREFIX_ ## function
diff --git a/math/test-float-vlen8.h b/math/test-float-vlen8.h
new file mode 100644
index 0000000..2984e0c
--- /dev/null
+++ b/math/test-float-vlen8.h
@@ -0,0 +1,42 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FLOAT float
+#define FUNC(function) function ## f
+#define TEST_MSG "testing float vector math (without inline functions)\n"
+#define MATHCONST(x) x
+#define CHOOSE(Clongdouble,Cdouble,Cfloat,Cinlinelongdouble,Cinlinedouble,Cinlinefloat) Cfloat
+#define PRINTF_EXPR "e"
+#define PRINTF_XEXPR "a"
+#define PRINTF_NEXPR "f"
+#define TEST_FLOAT 1
+#define TEST_MATHVEC 1
+
+#ifndef __NO_MATH_INLINES
+# define __NO_MATH_INLINES
+#endif
+
+#define EXCEPTION_TESTS_float 0
+#define ROUNDING_TESTS_float(MODE) ((MODE) == FE_TONEAREST)
+
+#define VEC_PREFIX vlen8_
+
+#define CONCAT(prefix,func) __CONCAT (prefix,func)
+
+#define WRAPPER_NAME(function) CONCAT (VEC_PREFIX,function)
+
+#define FUNC_TEST(function) VEC_PREFIX_ ## function ## f
diff --git a/mathvec/Depend b/mathvec/Depend
new file mode 100644
index 0000000..ede10ab
--- /dev/null
+++ b/mathvec/Depend
@@ -0,0 +1 @@
+math
diff --git a/mathvec/Makefile b/mathvec/Makefile
new file mode 100644
index 0000000..26c552c
--- /dev/null
+++ b/mathvec/Makefile
@@ -0,0 +1,35 @@
+# Copyright (C) 2014 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+
+# Makefile for the vector math library.
+
+subdir		:= mathvec
+
+include ../Makeconfig
+
+ifeq ($(build-mathvec),yes)
+extra-libs	:= libmvec
+extra-libs-others = $(extra-libs)
+
+libmvec-routines = $(strip $(libmvec-support))
+
+$(objpfx)libmvec.so: $(libm)
+endif
+
+# Rules for the test suite are in the math directory.
+
+include ../Rules
diff --git a/shlib-versions b/shlib-versions
index e05b248..fa3cf1d 100644
--- a/shlib-versions
+++ b/shlib-versions
@@ -71,3 +71,6 @@ libanl=1
 # This defines the libgcc soname version this glibc is to load for
 # asynchronous cancellation to work correctly.
 libgcc_s=1
+
+# The vector math library
+libmvec=1
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
new file mode 100644
index 0000000..b984207
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -0,0 +1,5 @@
+GLIBC_2.21
+ GLIBC_2.21 A
+ _ZGVbN2v_cos F
+ _ZGVcN4v_cos F
+ _ZGVdN4v_cos F
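The exported names follow Intel's vector function ABI mangling, _ZGV<isa><mask><vlen><params>_<name>: here b, c and d select the SSE, AVX and AVX2 entry points, N marks an unmasked variant, the number is the vector length and v a vector parameter. The decoder below is only a sketch for reading such names; it handles the simple forms used in this patch (no linear/uniform parameter details), and decode_vec_name and struct vec_name are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Pieces of a symbol _ZGV<isa><mask><vlen><params>_<name>.  */
struct vec_name
{
  char isa;          /* 'b' = SSE, 'c' = AVX, 'd' = AVX2 on x86_64 */
  int masked;        /* 1 for 'M' (masked), 0 for 'N' (not masked) */
  long vlen;         /* lanes processed per call */
  char params[16];   /* one code per parameter, 'v' = vector */
  char scalar[64];   /* name of the scalar function */
};

/* Return 0 on success, -1 if SYM does not have the simple form.  */
static int decode_vec_name (const char *sym, struct vec_name *out)
{
  if (strncmp (sym, "_ZGV", 4) != 0 || sym[4] == '\0')
    return -1;
  const char *p = sym + 4;
  out->isa = *p++;
  if (*p != 'M' && *p != 'N')
    return -1;
  out->masked = (*p++ == 'M');

  char *end;
  out->vlen = strtol (p, &end, 10);
  if (end == p || out->vlen <= 0)
    return -1;
  p = end;

  const char *us = strchr (p, '_');          /* end of parameter codes */
  size_t plen = us ? (size_t) (us - p) : 0;
  if (us == NULL || plen == 0 || plen >= sizeof out->params
      || us[1] == '\0' || strlen (us + 1) >= sizeof out->scalar)
    return -1;
  memcpy (out->params, p, plen);
  out->params[plen] = '\0';
  strcpy (out->scalar, us + 1);
  return 0;
}
```

For example, _ZGVcN4v_cos decodes to the AVX, unmasked, 4-lane variant of cos taking one vector argument.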
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
new file mode 100644
index 0000000..0d71ce9
--- /dev/null
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -0,0 +1,50 @@
+/* Platform-specific SIMD declarations of math functions.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly; \
+	include <math.h> instead."
+#endif
+
+/* Get default empty definitions for SIMD declarations.  */
+#include <libm-simd-decl-stubs.h>
+
+#if defined __x86_64__ && defined __FAST_MATH__
+# if defined _OPENMP && _OPENMP >= 201307
+/* OpenMP case. */
+/* TODO: document the meaning of these pragmas.  */
+#  define __DECL_SIMD_AVX2 _Pragma ("omp declare simd notinbranch simdlen(4)")
+#  define __DECL_SIMD_SSE4 _Pragma ("omp declare simd notinbranch simdlen(8)")
+#  undef __DECL_SIMD_cos
+#  define __DECL_SIMD_cos  __DECL_SIMD_AVX2
+#  undef __DECL_SIMD_cosf
+#  define __DECL_SIMD_cosf __DECL_SIMD_SSE4
+# elif defined _CILKPLUS && _CILKPLUS >= 0
+/* CilkPlus case. */
+/* TODO: _CILKPLUS is currently not defined anywhere;
+ * add reserved-namespace versions and __GNUC_PREREQ
+#  define __DECL_SIMD_AVX2 __attribute__ ((__vector__ (__vectorlength__(4),\
+						       __nomask__)))
+#  define __DECL_SIMD_SSE4 __attribute__ ((__vector__ (__vectorlength__(8),\
+						       __nomask__)))
+#  undef __DECL_SIMD_cos
+#  define __DECL_SIMD_cos  __DECL_SIMD_AVX2
+#  undef __DECL_SIMD_cosf
+#  define __DECL_SIMD_cosf __DECL_SIMD_SSE4 */
+# endif
+#endif
diff --git a/sysdeps/x86_64/configure b/sysdeps/x86_64/configure
index 7d4dadd..685c036 100644
--- a/sysdeps/x86_64/configure
+++ b/sysdeps/x86_64/configure
@@ -275,6 +275,8 @@ fi
 config_vars="$config_vars
 config-cflags-avx2 = $libc_cv_cc_avx2"
 
+build_mathvec=yes
+
 $as_echo "#define PI_STATIC_AND_HIDDEN 1" >>confdefs.h
 
 # work around problem with autoconf and empty lines at the end of files
diff --git a/sysdeps/x86_64/configure.ac b/sysdeps/x86_64/configure.ac
index c9f9a51..e9eceb1 100644
--- a/sysdeps/x86_64/configure.ac
+++ b/sysdeps/x86_64/configure.ac
@@ -99,6 +99,9 @@ if test $libc_cv_cc_avx2 = yes; then
 fi
 LIBC_CONFIG_VAR([config-cflags-avx2], [$libc_cv_cc_avx2])
 
+dnl Set build_mathvec
+build_mathvec=yes
+
 dnl It is always possible to access static and hidden symbols in an
 dnl position independent way.
 AC_DEFINE(PI_STATIC_AND_HIDDEN)
diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
new file mode 100644
index 0000000..a994c73
--- /dev/null
+++ b/sysdeps/x86_64/fpu/Makefile
@@ -0,0 +1,22 @@
+ifeq ($(subdir),mathvec)
+libmvec-support += svml_d_cos2_core svml_d_cos4_core_avx \
+		   svml_d_cos4_core_avx2 svml_d_cos_data
+endif
+
+# Rules for libmvec tests
+ifeq ($(subdir),math)
+ifeq ($(build-mathvec),yes)
+libmvec-tests += double-vlen2 double-vlen4 double-vlen4-avx2
+
+arch-ext-cflags = -mavx
+
+$(objpfx)test-double-vlen4-avx2.o: $(objpfx)libm-test.stmp
+
+$(objpfx)test-double-vlen4-avx2: $(common-objpfx)mathvec/libmvec.so \
+				 $(objpfx)init-arch.o
+
+CFLAGS-test-double-vlen4-avx2.c = -fno-inline -ffloat-store -fno-builtin -frounding-math \
+				  -D__FAST_MATH__ -DTEST_FAST_MATH -D_OPENMP=201307 \
+				  -Wno-unknown-pragmas -mavx2
+endif
+endif
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
new file mode 100644
index 0000000..c18d985
--- /dev/null
+++ b/sysdeps/x86_64/fpu/Versions
@@ -0,0 +1,7 @@
+libmvec {
+  GLIBC_2.21 {
+    _ZGVbN2v_cos;
+    _ZGVcN4v_cos;
+    _ZGVdN4v_cos;
+  }
+}
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 36e1b76..b5c88d4 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -1961,6 +1961,15 @@ ifloat: 3
 ildouble: 4
 ldouble: 4
 
+Function: "vlen2_cos":
+double: 1
+
+Function: "vlen4_avx2_cos":
+double: 1
+
+Function: "vlen4_avx_cos":
+double: 1
+
 Function: "y0":
 double: 2
 float: 1
diff --git a/sysdeps/x86_64/fpu/math-tests.h b/sysdeps/x86_64/fpu/math-tests.h
new file mode 100644
index 0000000..466b97b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/math-tests.h
@@ -0,0 +1,34 @@
+/* Configuration for math tests.  x86_64 version.
+   Copyright (C) 2013-2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifdef REQUIRE_AVX2
+# include <init-arch.h>
+
+  static int avx2_usable;	/* Set to 1 if AVX2 is usable.  */
+
+# define INIT_ARCH_EXT 						\
+    __init_cpu_features ();					\
+    avx2_usable = __cpu_features.feature[index_AVX2_Usable]	\
+		& bit_AVX2_Usable;
+
+# define CHECK_ARCH_EXT						\
+  if (!avx2_usable) return;
+
+#endif
+
+#include_next <math-tests.h>
diff --git a/sysdeps/x86_64/fpu/svml_d_cos2_core.S b/sysdeps/x86_64/fpu/svml_d_cos2_core.S
new file mode 100644
index 0000000..47288c2
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cos2_core.S
@@ -0,0 +1,210 @@
+/* Function cos vectorized with SSE4.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+#define _DATA_TABLE_OFFSETS_ONLY_
+#include "svml_d_cos_data.S"
+
+	.text
+ENTRY(_ZGVbN2v_cos)
+
+/* ALGORITHM DESCRIPTION:
+ *
+ *     (low-accuracy (< 4 ulp) or enhanced-performance (half of correct mantissa) implementation)
+ *
+ *     Argument representation:
+ *     arg + Pi/2 = N*Pi + R
+ *
+ *     Result calculation:
+ *     cos(arg) = sin(arg + Pi/2) = sin(N*Pi + R) = (-1)^N * sin(R)
+ *     sin(R) is approximated by a polynomial in R.
+ */
+        pushq     %rbp
+        movq      %rsp, %rbp
+        andq      $-64, %rsp
+        subq      $320, %rsp
+        movaps    %xmm0, %xmm3
+        movq      __svml_dcos_data@GOTPCREL(%rip), %rax
+        movups    __dHalfPI(%rax), %xmm2
+
+/* ARGUMENT RANGE REDUCTION:
+ * Add Pi/2 to argument: X' = X+Pi/2
+ */
+        addpd     %xmm3, %xmm2
+        movups    __dInvPI(%rax), %xmm5
+        movups    __dAbsMask(%rax), %xmm4
+
+/* Get absolute argument value: X' = |X'| */
+        andps     %xmm2, %xmm4
+
+/* Y = X'*InvPi + RS : right shifter add */
+        mulpd     %xmm5, %xmm2
+
+/* Check for large arguments path */
+        cmpnlepd  __dRangeVal(%rax), %xmm4
+        movups    __dRShifter(%rax), %xmm6
+        addpd     %xmm6, %xmm2
+        movmskpd  %xmm4, %ecx
+
+/* N = Y - RS : right shifter sub */
+        movaps    %xmm2, %xmm1
+
+/* SignRes = Y<<63 : shift LSB to MSB place for result sign */
+        psllq     $63, %xmm2
+        subpd     %xmm6, %xmm1
+
+/* N = N - 0.5 */
+        subpd     __dOneHalf(%rax), %xmm1
+        movups    __dPI1(%rax), %xmm7
+
+/* R = X - N*Pi1 */
+        mulpd     %xmm1, %xmm7
+        movups    __dPI2(%rax), %xmm4
+
+/* R = R - N*Pi2 */
+        mulpd     %xmm1, %xmm4
+        subpd     %xmm7, %xmm0
+        movups    __dPI3(%rax), %xmm5
+
+/* R = R - N*Pi3 */
+        mulpd     %xmm1, %xmm5
+        subpd     %xmm4, %xmm0
+
+/* R = R - N*Pi4 */
+        movups    __dPI4(%rax), %xmm6
+        mulpd     %xmm6, %xmm1
+        subpd     %xmm5, %xmm0
+        subpd     %xmm1, %xmm0
+
+/* POLYNOMIAL APPROXIMATION:
+ * R2 = R*R
+ */
+        movaps    %xmm0, %xmm4
+        mulpd     %xmm0, %xmm4
+        movups    __dC7(%rax), %xmm1
+        mulpd     %xmm4, %xmm1
+        addpd     __dC6(%rax), %xmm1
+        mulpd     %xmm4, %xmm1
+        addpd     __dC5(%rax), %xmm1
+        mulpd     %xmm4, %xmm1
+        addpd     __dC4(%rax), %xmm1
+
+/* Poly = C3+R2*(C4+R2*(C5+R2*(C6+R2*C7))) */
+        mulpd     %xmm4, %xmm1
+        addpd     __dC3(%rax), %xmm1
+
+/* Poly = R+R*(R2*(C1+R2*(C2+R2*Poly))) */
+        mulpd     %xmm4, %xmm1
+        addpd     __dC2(%rax), %xmm1
+        mulpd     %xmm4, %xmm1
+        addpd     __dC1(%rax), %xmm1
+        mulpd     %xmm1, %xmm4
+        mulpd     %xmm0, %xmm4
+        addpd     %xmm4, %xmm0
+
+/* RECONSTRUCTION:
+ * Final sign setting: Res = Poly^SignRes
+ */
+        xorps     %xmm2, %xmm0
+        testl     %ecx, %ecx
+        jne       .LBL_1_3
+
+.LBL_1_2:
+        movq      %rbp, %rsp
+        popq      %rbp
+        ret
+
+.LBL_1_3:
+        movups    %xmm3, 192(%rsp)
+        movups    %xmm0, 256(%rsp)
+        je        .LBL_1_2
+
+        xorb      %dl, %dl
+        xorl      %eax, %eax
+        movups    %xmm8, 112(%rsp)
+        movups    %xmm9, 96(%rsp)
+        movups    %xmm10, 80(%rsp)
+        movups    %xmm11, 64(%rsp)
+        movups    %xmm12, 48(%rsp)
+        movups    %xmm13, 32(%rsp)
+        movups    %xmm14, 16(%rsp)
+        movups    %xmm15, (%rsp)
+        movq      %rsi, 136(%rsp)
+        movq      %rdi, 128(%rsp)
+        movq      %r12, 168(%rsp)
+        movb      %dl, %r12b
+        movq      %r13, 160(%rsp)
+        movl      %ecx, %r13d
+        movq      %r14, 152(%rsp)
+        movl      %eax, %r14d
+        movq      %r15, 144(%rsp)
+
+.LBL_1_6:
+        btl       %r14d, %r13d
+        jc        .LBL_1_12
+
+.LBL_1_7:
+        lea       1(%r14), %esi
+        btl       %esi, %r13d
+        jc        .LBL_1_10
+
+.LBL_1_8:
+        incb      %r12b
+        addl      $2, %r14d
+        cmpb      $16, %r12b
+        jb        .LBL_1_6
+
+        movups    112(%rsp), %xmm8
+        movups    96(%rsp), %xmm9
+        movups    80(%rsp), %xmm10
+        movups    64(%rsp), %xmm11
+        movups    48(%rsp), %xmm12
+        movups    32(%rsp), %xmm13
+        movups    16(%rsp), %xmm14
+        movups    (%rsp), %xmm15
+        movq      136(%rsp), %rsi
+        movq      128(%rsp), %rdi
+        movq      168(%rsp), %r12
+        movq      160(%rsp), %r13
+        movq      152(%rsp), %r14
+        movq      144(%rsp), %r15
+        movups    256(%rsp), %xmm0
+        jmp       .LBL_1_2
+
+.LBL_1_10:
+        movzbl    %r12b, %r15d
+        shlq      $4, %r15
+        movsd     200(%rsp,%r15), %xmm0
+
+        call      cos@PLT
+
+        movsd     %xmm0, 264(%rsp,%r15)
+        jmp       .LBL_1_8
+
+.LBL_1_12:
+        movzbl    %r12b, %r15d
+        shlq      $4, %r15
+        movsd     192(%rsp,%r15), %xmm0
+
+        call      cos@PLT
+
+        movsd     %xmm0, 256(%rsp,%r15)
+        jmp       .LBL_1_7
+
+END(_ZGVbN2v_cos)
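As a rough scalar model of one lane of _ZGVbN2v_cos: add Pi/2, use the 1.5*2^52 "right shifter" so that the low mantissa bit of Y is the parity of N, compute R = X - (N - 0.5)*Pi, evaluate the odd polynomial for sin(R), and flip the sign when N is odd. This is a sketch under stated assumptions, not the SVML code: the real constants live in svml_d_cos_data.S (not shown), so the range limit, the two-part Pi split and the Taylor coefficients below are hypothetical stand-ins, and cos_lane_fast is an illustrative name. Lanes flagged as big correspond to the movmskpd mask that sends a lane to the scalar cos@PLT call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constants; Pi is split in two parts here instead of
   the four (__dPI1..__dPI4) used by the assembly.  */
static const double kHalfPi   = 1.5707963267948966;     /* Pi/2 */
static const double kInvPi    = 0.3183098861837907;     /* 1/Pi */
static const double kRShifter = 0x1.8p52;               /* 1.5 * 2^52 */
static const double kPiHi     = 3.141592653589793;      /* Pi as a double */
static const double kPiLo     = 1.2246467991473532e-16; /* Pi - kPiHi */
static const double kRangeVal = 65536.0;                /* assumed limit */

/* Taylor coefficients of sin, standing in for the tuned __dC1..__dC7.  */
static const double kC[7] = {
  -1.0 / 6.0, 1.0 / 120.0, -1.0 / 5040.0, 1.0 / 362880.0,
  -1.0 / 39916800.0, 1.0 / 6227020800.0, -1.0 / 1307674368000.0
};

/* One lane of the fast path.  Sets *big when the lane would be redone
   by the scalar cos (the cmpnlepd mask bit); NaN also sets it.  */
static double cos_lane_fast (double x, int *big)
{
  double xp = x + kHalfPi;                   /* X' = X + Pi/2 */
  *big = !(xp >= -kRangeVal && xp <= kRangeVal);

  double y = xp * kInvPi + kRShifter;        /* right shifter add */
  uint64_t ybits;
  memcpy (&ybits, &y, sizeof ybits);
  uint64_t sign = ybits << 63;               /* parity of N -> sign bit */

  double n = (y - kRShifter) - 0.5;          /* N - 0.5 */
  double r = (x - n * kPiHi) - n * kPiLo;    /* R = X - (N - 0.5)*Pi */

  double r2 = r * r;                         /* Horner scheme in R2 */
  double poly = kC[6];
  for (int i = 5; i >= 0; i--)
    poly = poly * r2 + kC[i];
  double res = r + r * (r2 * poly);          /* approximately sin(R) */

  uint64_t rbits;
  memcpy (&rbits, &res, sizeof rbits);
  rbits ^= sign;                             /* Res = Poly ^ SignRes */
  memcpy (&res, &rbits, sizeof res);
  return res;
}
```

With plain Taylor coefficients the truncation error near |R| = Pi/2 is on the order of 1e-11, so this model is only meant to show the data flow, not the accuracy of the tuned polynomial.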
diff --git a/sysdeps/x86_64/fpu/svml_d_cos4_core_avx.S b/sysdeps/x86_64/fpu/svml_d_cos4_core_avx.S
new file mode 100644
index 0000000..24b4f75
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cos4_core_avx.S
@@ -0,0 +1,39 @@
+/* Function cos vectorized in AVX ISA as a wrapper around the SSE4 ISA version.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+	.text
+ENTRY(_ZGVcN4v_cos)
+        pushq		%rbp
+        movq		%rsp, %rbp
+        andq		$-32, %rsp
+        subq		$32, %rsp
+        vextractf128	$1, %ymm0, (%rsp)
+        vzeroupper
+        call		_ZGVbN2v_cos@PLT
+        vmovapd		%xmm0, 16(%rsp)
+        vmovaps		(%rsp), %xmm0
+        call		_ZGVbN2v_cos@PLT
+        vmovapd		%xmm0, %xmm1
+        vmovapd		16(%rsp), %xmm0
+        vinsertf128	$1, %xmm1, %ymm0, %ymm0
+        movq		%rbp, %rsp
+        popq		%rbp
+        ret
+END(_ZGVcN4v_cos)
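_ZGVcN4v_cos has no native 4-lane kernel: it saves the high 128 bits with vextractf128, runs the 2-lane SSE variant on the low half and then on the high half, and joins the results with vinsertf128. The vzeroupper before the calls avoids the AVX/SSE transition penalty when entering legacy-SSE code. The sketch below models the same split with plain arrays instead of ymm/xmm registers; apply4_via2 and the test kernel square2 are hypothetical names.

```c
#include <assert.h>

/* A 2-lane kernel: the role played by _ZGVbN2v_cos (one xmm register).  */
typedef void (*vec2_fn) (const double in[2], double out[2]);

/* 4-lane entry point built from the 2-lane kernel, in the same order
   as the assembly: low half first, then the high half.  */
static void apply4_via2 (vec2_fn kernel2, const double in[4], double out[4])
{
  double lo[2], hi[2];
  kernel2 (in, lo);          /* low 128 bits (xmm0) */
  kernel2 (in + 2, hi);      /* high 128 bits (saved by vextractf128 $1) */
  out[0] = lo[0];            /* vinsertf128 $1 joins the two halves */
  out[1] = lo[1];
  out[2] = hi[0];
  out[3] = hi[1];
}

/* Hypothetical stand-in kernel for testing: squares each lane.  */
static void square2 (const double in[2], double out[2])
{
  out[0] = in[0] * in[0];
  out[1] = in[1] * in[1];
}
```

The cost is two calls and some shuffling per 4 lanes, which is why the AVX2 file below provides a native 4-lane implementation instead.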
diff --git a/sysdeps/x86_64/fpu/svml_d_cos4_core_avx2.S b/sysdeps/x86_64/fpu/svml_d_cos4_core_avx2.S
new file mode 100644
index 0000000..95db6b3
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cos4_core_avx2.S
@@ -0,0 +1,195 @@
+/* Function cos vectorized with AVX2.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+#define _DATA_TABLE_OFFSETS_ONLY_
+#include "svml_d_cos_data.S"
+
+	.text
+ENTRY(_ZGVdN4v_cos)
+
+/* ALGORITHM DESCRIPTION:
+ *
+ *    Low-accuracy (< 4 ulp) or enhanced-performance
+ *    (half of correct mantissa) implementation.
+ *
+ *    Argument representation:
+ *    arg + Pi/2 = N*Pi + R, where N is an integer and |R| <= Pi/2
+ *
+ *    Result calculation:
+ *    cos(arg) = sin(arg+Pi/2) = sin(N*Pi + R) = (-1)^N * sin(R)
+ *    sin(R) is approximated by the corresponding polynomial
+ */
+        pushq     %rbp
+        movq      %rsp, %rbp
+        andq      $-64, %rsp
+        subq      $448, %rsp
+        movq      __svml_dcos_data@GOTPCREL(%rip), %rax
+        vmovapd   %ymm0, %ymm1
+        vmovupd __dInvPI(%rax), %ymm4
+        vmovupd __dRShifter(%rax), %ymm5
+
+/*
+ * ARGUMENT RANGE REDUCTION:
+ * Add Pi/2 to argument: X' = X+Pi/2
+ */
+        vaddpd __dHalfPI(%rax), %ymm1, %ymm7
+
+/* Get absolute argument value: X' = |X'| */
+        vandpd __dAbsMask(%rax), %ymm7, %ymm2
+
+/* Y = X'*InvPi + RS : right shifter add */
+        vfmadd213pd %ymm5, %ymm4, %ymm7
+        vmovupd __dC7(%rax), %ymm4
+
+/* Check for large arguments path */
+        vcmpnle_uqpd __dRangeVal(%rax), %ymm2, %ymm3
+
+/* N = Y - RS : right shifter sub */
+        vsubpd    %ymm5, %ymm7, %ymm6
+        vmovupd __dPI1_FMA(%rax), %ymm2
+
+/* SignRes = Y<<63 : shift LSB to MSB place for result sign */
+        vpsllq    $63, %ymm7, %ymm7
+
+/* N = N - 0.5 */
+        vsubpd __dOneHalf(%rax), %ymm6, %ymm0
+        vmovmskpd %ymm3, %ecx
+
+/* R = X - N*Pi1 */
+        vmovapd   %ymm1, %ymm3
+        vfnmadd231pd %ymm0, %ymm2, %ymm3
+
+/* R = R - N*Pi2 */
+        vfnmadd231pd __dPI2_FMA(%rax), %ymm0, %ymm3
+
+/* R = R - N*Pi3 */
+        vfnmadd132pd __dPI3_FMA(%rax), %ymm3, %ymm0
+
+/*
+ * POLYNOMIAL APPROXIMATION:
+ * R2 = R*R
+ */
+        vmulpd    %ymm0, %ymm0, %ymm5
+        vfmadd213pd __dC6(%rax), %ymm5, %ymm4
+        vfmadd213pd __dC5(%rax), %ymm5, %ymm4
+        vfmadd213pd __dC4(%rax), %ymm5, %ymm4
+
+/* Poly = C3+R2*(C4+R2*(C5+R2*(C6+R2*C7))) */
+        vfmadd213pd __dC3(%rax), %ymm5, %ymm4
+
+/* Poly = R+R*(R2*(C1+R2*(C2+R2*Poly))) */
+        vfmadd213pd __dC2(%rax), %ymm5, %ymm4
+        vfmadd213pd __dC1(%rax), %ymm5, %ymm4
+        vmulpd    %ymm5, %ymm4, %ymm6
+        vfmadd213pd %ymm0, %ymm0, %ymm6
+
+/*
+ * RECONSTRUCTION:
+ * Final sign setting: Res = Poly^SignRes
+ */
+        vxorpd    %ymm7, %ymm6, %ymm0
+        testl     %ecx, %ecx
+        jne       .LBL_1_3
+
+.LBL_1_2:
+        movq      %rbp, %rsp
+        popq      %rbp
+        ret
+
+.LBL_1_3:
+        vmovupd   %ymm1, 320(%rsp)
+        vmovupd   %ymm0, 384(%rsp)
+        je        .LBL_1_2
+
+        xorb      %dl, %dl
+        xorl      %eax, %eax
+        vmovups   %ymm8, 224(%rsp)
+        vmovups   %ymm9, 192(%rsp)
+        vmovups   %ymm10, 160(%rsp)
+        vmovups   %ymm11, 128(%rsp)
+        vmovups   %ymm12, 96(%rsp)
+        vmovups   %ymm13, 64(%rsp)
+        vmovups   %ymm14, 32(%rsp)
+        vmovups   %ymm15, (%rsp)
+        movq      %rsi, 264(%rsp)
+        movq      %rdi, 256(%rsp)
+        movq      %r12, 296(%rsp)
+        movb      %dl, %r12b
+        movq      %r13, 288(%rsp)
+        movl      %ecx, %r13d
+        movq      %r14, 280(%rsp)
+        movl      %eax, %r14d
+        movq      %r15, 272(%rsp)
+
+.LBL_1_6:
+        btl       %r14d, %r13d
+        jc        .LBL_1_12
+
+.LBL_1_7:
+        lea       1(%r14), %esi
+        btl       %esi, %r13d
+        jc        .LBL_1_10
+
+.LBL_1_8:
+        incb      %r12b
+        addl      $2, %r14d
+        cmpb      $16, %r12b
+        jb        .LBL_1_6
+
+        vmovups   224(%rsp), %ymm8
+        vmovups   192(%rsp), %ymm9
+        vmovups   160(%rsp), %ymm10
+        vmovups   128(%rsp), %ymm11
+        vmovups   96(%rsp), %ymm12
+        vmovups   64(%rsp), %ymm13
+        vmovups   32(%rsp), %ymm14
+        vmovups   (%rsp), %ymm15
+        vmovupd   384(%rsp), %ymm0
+        movq      264(%rsp), %rsi
+        movq      256(%rsp), %rdi
+        movq      296(%rsp), %r12
+        movq      288(%rsp), %r13
+        movq      280(%rsp), %r14
+        movq      272(%rsp), %r15
+        jmp       .LBL_1_2
+
+.LBL_1_10:
+        movzbl    %r12b, %r15d
+        shlq      $4, %r15
+        vmovsd    328(%rsp,%r15), %xmm0
+        vzeroupper
+
+        call      cos@PLT
+
+        vmovsd    %xmm0, 392(%rsp,%r15)
+        jmp       .LBL_1_8
+
+.LBL_1_12:
+        movzbl    %r12b, %r15d
+        shlq      $4, %r15
+        vmovsd    320(%rsp,%r15), %xmm0
+        vzeroupper
+
+        call      cos@PLT
+
+        vmovsd    %xmm0, 384(%rsp,%r15)
+        jmp       .LBL_1_7
+
+END(_ZGVdN4v_cos)
diff --git a/sysdeps/x86_64/fpu/svml_d_cos_data.S b/sysdeps/x86_64/fpu/svml_d_cos_data.S
new file mode 100644
index 0000000..4e9f36b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cos_data.S
@@ -0,0 +1,147 @@
+/* Data for vectorized cos.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef D_COS_DATA
+#define D_COS_DATA
+
+/* Offsets for data table
+ */
+#define __dAbsMask		0
+#define __dRangeVal		64
+#define __dHalfPI		128
+#define __dInvPI		192
+#define __dRShifter		256
+#define __dOneHalf		320
+#define __dPI1			384
+#define __dPI2			448
+#define __dPI3			512
+#define __dPI4			576
+#define __dPI1_FMA		640
+#define __dPI2_FMA		704
+#define __dPI3_FMA		768
+#define __dC1			832
+#define __dC2			896
+#define __dC3			960
+#define __dC4			1024
+#define __dC5			1088
+#define __dC6			1152
+#define __dC7			1216
+#define __dAbsMask_la		1280
+#define __dInvPI_la		1344
+#define __dRShifter_la		1408
+#define __dRShifterm5_la	1472
+#define __dRXmax_la		1536
+
+#ifndef _DATA_TABLE_OFFSETS_ONLY_
+
+.macro double_vector offset value
+.if .-__svml_dcos_data != \offset
+.err
+.endif
+.rept 8
+.quad \value
+.endr
+.endm
+
+	.section .rodata, "a"
+	.align 64
+
+/* Data table for vector implementations of function cos.
+ * The table may contain polynomial, reduction, lookup
+ * coefficients and other constants obtained through different
+ * methods of research and experimental work.
+ */
+	.globl __svml_dcos_data
+__svml_dcos_data:
+
+/* General purpose constants:
+ * absolute value mask
+ */
+double_vector __dAbsMask 0x7fffffffffffffff
+
+/* working range threshold */
+double_vector __dRangeVal 0x4160000000000000
+
+/* PI/2 */
+double_vector __dHalfPI 0x3ff921fb54442d18
+
+/* 1/PI */
+double_vector __dInvPI 0x3fd45f306dc9c883
+
+/* right-shifter constant */
+double_vector __dRShifter 0x4338000000000000
+
+/* 0.5 */
+double_vector __dOneHalf 0x3fe0000000000000
+
+/* Range reduction PI-based constants:
+ * PI high part
+ */
+double_vector __dPI1 0x400921fb40000000
+
+/* PI mid part 1 */
+double_vector __dPI2 0x3e84442d00000000
+
+/* PI mid part 2 */
+double_vector __dPI3 0x3d08469880000000
+
+/* PI low part */
+double_vector __dPI4 0x3b88cc51701b839a
+
+/* Range reduction PI-based constants if FMA available:
+ * PI high part (FMA available)
+ */
+double_vector __dPI1_FMA 0x400921fb54442d18
+
+/* PI mid part (FMA available) */
+double_vector __dPI2_FMA 0x3ca1a62633145c06
+
+/* PI low part (FMA available) */
+double_vector __dPI3_FMA 0x395c1cd129024e09
+
+/* Polynomial coefficients (relative error 2^(-52.115)): */
+double_vector __dC1 0xbfc55555555554a7
+double_vector __dC2 0x3f8111111110a4a8
+double_vector __dC3 0xbf2a01a019a5b86d
+double_vector __dC4 0x3ec71de38030fea0
+double_vector __dC5 0xbe5ae63546002231
+double_vector __dC6 0x3de60e6857a2f220
+double_vector __dC7 0xbd69f0d60811aac8
+
+/*
+ * Additional constants:
+ * absolute value mask
+ */
+double_vector __dAbsMask_la 0x7fffffffffffffff
+
+/* 1/PI */
+double_vector __dInvPI_la 0x3fd45f306dc9c883
+
+/* right-shifter for low accuracy version */
+double_vector __dRShifter_la 0x4330000000000000
+
+/* right-shifter minus 0.5 for low accuracy version */
+double_vector __dRShifterm5_la 0x432fffffffffffff
+
+/* right-shifter with low mask for low accuracy version */
+double_vector __dRXmax_la 0x43300000007ffffe
+
+	.type	__svml_dcos_data,@object
+	.size	__svml_dcos_data,.-__svml_dcos_data
+#endif
+#endif
diff --git a/sysdeps/x86_64/fpu/test-double-vlen2.c b/sysdeps/x86_64/fpu/test-double-vlen2.c
new file mode 100644
index 0000000..674c5de
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-vlen2.c
@@ -0,0 +1,44 @@
+/* Tests for SSE4 ISA versions of vector math functions.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "test-double-vlen2.h"
+#include <immintrin.h>
+
+/* Wrap the SSE4 vector function as a scalar function for libm-test.c.  */
+#define VECTOR_WRAPPER(scalar_func,vector_func) \
+extern __m128d vector_func ( __m128d);\
+FLOAT scalar_func (FLOAT x)\
+{\
+  int i;\
+  __m128d mx = _mm_set1_pd (x);\
+  __m128d mr = vector_func (mx);\
+  for (i=1;i<2;i++)\
+  {\
+    if (((FLOAT*)&mr)[0]!=((FLOAT*)&mr)[i])\
+    {\
+      return ((FLOAT*)&mr)[0]+0.1;\
+    }\
+  }\
+  return ((FLOAT*)&mr)[0];\
+}
+
+VECTOR_WRAPPER (WRAPPER_NAME (cos),_ZGVbN2v_cos)
+
+#define TEST_VECTOR_cos 1
+
+#include "libm-test.c"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2.c
new file mode 100644
index 0000000..15b7930
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2.c
@@ -0,0 +1,48 @@
+/* Tests for AVX2 ISA versions of vector math functions.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "test-double-vlen4.h"
+#include <immintrin.h>
+
+/* Wrap the AVX2 vector function as a scalar function for libm-test.c.  */
+#define VECTOR_WRAPPER(scalar_func,vector_func) \
+extern __m256d vector_func (__m256d);\
+FLOAT scalar_func (FLOAT x)\
+{\
+  int i;\
+  __m256d mx = _mm256_set1_pd (x);\
+  __m256d mr = vector_func (mx);\
+  for (i=1;i<4;i++)\
+  {\
+    if (((FLOAT*)&mr)[0]!=((FLOAT*)&mr)[i])\
+    {\
+      return ((FLOAT*)&mr)[0]+0.1;\
+    }\
+  }\
+  return ((FLOAT*)&mr)[0];\
+}
+
+#define VEC_PREFIX vlen4_avx2_
+
+VECTOR_WRAPPER (WRAPPER_NAME (cos),_ZGVdN4v_cos)
+
+#define TEST_VECTOR_cos 1
+
+#define REQUIRE_AVX2
+
+#include "libm-test.c"
diff --git a/sysdeps/x86_64/fpu/test-double-vlen4.c b/sysdeps/x86_64/fpu/test-double-vlen4.c
new file mode 100644
index 0000000..5f68af5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-double-vlen4.c
@@ -0,0 +1,46 @@
+/* Tests for AVX ISA versions of vector math functions.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "test-double-vlen4.h"
+#include <immintrin.h>
+
+/* Wrap the AVX vector function as a scalar function for libm-test.c.  */
+#define VECTOR_WRAPPER(scalar_func,vector_func) \
+extern __m256d vector_func (__m256d);\
+FLOAT scalar_func (FLOAT x)\
+{\
+  int i;\
+  __m256d mx = _mm256_set1_pd (x);\
+  __m256d mr = vector_func (mx);\
+  for (i=1;i<4;i++)\
+  {\
+    if (((FLOAT*)&mr)[0]!=((FLOAT*)&mr)[i])\
+    {\
+      return ((FLOAT*)&mr)[0]+0.1;\
+    }\
+  }\
+  return ((FLOAT*)&mr)[0];\
+}
+
+#define VEC_PREFIX vlen4_avx_
+
+VECTOR_WRAPPER (WRAPPER_NAME (cos),_ZGVcN4v_cos)
+
+#define TEST_VECTOR_cos 1
+
+#include "libm-test.c"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8.c b/sysdeps/x86_64/fpu/test-float-vlen8.c
new file mode 100644
index 0000000..fdb3b5f
--- /dev/null
+++ b/sysdeps/x86_64/fpu/test-float-vlen8.c
@@ -0,0 +1,45 @@
+/* Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "test-float-vlen8.h"
+
+#define VECTOR_WRAPPER(scalar_func,vector_func) \
+/*extern __m256 vector_func (__m256);*/\
+FLOAT scalar_func (FLOAT x)\
+{\
+  int i;\
+  __m256 mx = _mm256_set1_ps (x);\
+  __m256 mr = mx; /*vector_func (mx);*/\
+  for (i=1;i<8;i++)\
+  {\
+    if (((FLOAT*)&mr)[0]!=((FLOAT*)&mr)[i])\
+    {\
+      return ((FLOAT*)&mr)[0]+0.1;\
+    }\
+  }\
+  return ((FLOAT*)&mr)[0];\
+}
+
+#include <immintrin.h>
+
+VECTOR_WRAPPER (WRAPPER_NAME (cosf),_ZGVdN8v_cosf)
+
+#define TEST_VECTOR_cosf 0
+
+#define REQUIRE_AVX2
+
+#include "libm-test.c"


* Re: [RFC] How to add vector math functions to Glibc
From: Joseph Myers @ 2014-11-18 22:49 UTC
  To: Andrew Senkevich; +Cc: libc-alpha

On Tue, 18 Nov 2014, Andrew Senkevich wrote:

> Hi Joseph,
> 
> the attached patch now contains versions of vector cos in the SSE4, AVX
> and AVX2 ISAs.
> 
> Because both the AVX and AVX2 versions have vector length 4, there are
> some changes in the tests: I put the AVX2 make rules in the sysdeps
> Makefile and renamed its test to test-double-vlen4-avx2, while the AVX
> test keeps the old name; both versions are listed in the ULPs file.
> 
> If everything is okay, let me know and I will prepare separate
> patches while the document describing the pragma's meaning is being
> prepared (we plan to add it in the last steps).

The overall approach seems reasonable.  I fully expect further revisions 
to be needed to some of the individual patches once they are submitted.

-- 
Joseph S. Myers
joseph@codesourcery.com


Thread overview: 67+ messages
2014-09-18 13:48 [RFC] How to add vector math functions to Glibc Andrew Senkevich
2014-09-18 13:57 ` Andrew Senkevich
2014-09-18 17:05   ` Joseph S. Myers
2014-09-22 11:48     ` Andrew Senkevich
2014-09-22 12:37       ` Joseph S. Myers
2014-09-24 19:46     ` Andrew Senkevich
2014-09-24 20:19       ` Joseph S. Myers
2014-09-25 15:18         ` Andrew Senkevich
2014-09-25 15:40           ` H.J. Lu
2014-09-25 19:27             ` Carlos O'Donell
2014-09-25 19:37               ` H.J. Lu
2014-09-25 19:55                 ` Carlos O'Donell
2014-09-25 20:03                   ` H.J. Lu
2014-09-25 20:48                     ` Carlos O'Donell
2014-09-26 13:46                       ` Andrew Senkevich
2014-09-26 14:13                         ` Carlos O'Donell
2014-09-26 14:15                         ` Carlos O'Donell
2014-09-30 15:00                           ` Andrew Senkevich
2014-09-30 15:44                             ` Andreas Schwab
2014-09-30 15:53                               ` Andrew Senkevich
2014-09-30 16:16                                 ` Andreas Schwab
2014-09-30 16:30                                   ` Andrew Senkevich
2014-09-30 16:35                             ` Joseph S. Myers
2014-09-30 18:40                               ` Christoph Lauter
2014-09-30 20:15                                 ` Joseph S. Myers
2014-10-02 11:55                                   ` Andrew Senkevich
2014-10-02 14:21                                     ` Joseph S. Myers
2014-10-09 17:10                                       ` Andrew Senkevich
2014-10-09 17:39                                         ` Andreas Schwab
2014-10-09 17:46                                           ` Joseph S. Myers
2014-10-09 17:45                                         ` Joseph S. Myers
2014-10-10 13:27                                           ` Andrew Senkevich
2014-10-10 15:23                                             ` Joseph S. Myers
2014-10-16 16:37                                           ` Andrew Senkevich
2014-10-16 21:51                                             ` Joseph S. Myers
2014-10-21 13:20                                               ` Andrew Senkevich
2014-10-21 15:29                                                 ` Joseph S. Myers
2014-10-23 19:23                                                   ` Andrew Senkevich
2014-10-23 21:37                                                     ` Joseph S. Myers
2014-10-27 14:00                                                       ` Andrew Senkevich
2014-10-27 14:39                                                         ` Joseph S. Myers
2014-10-29 13:00                                                       ` Andrew Senkevich
2014-10-29 18:50                                                         ` Joseph S. Myers
2014-10-30 12:15                                                           ` Andrew Senkevich
2014-10-30 13:55                                                             ` Joseph S. Myers
2014-10-30 20:07                                                               ` Joseph S. Myers
2014-10-31 10:24                                                                 ` Andrew Senkevich
2014-11-06 20:51                                                       ` Andrew Senkevich
2014-11-14 15:45                                                         ` Andrew Senkevich
2014-11-14 16:51                                                           ` Joseph Myers
2014-11-18 19:06                                                             ` Andrew Senkevich
2014-11-18 22:49                                                               ` Joseph Myers
2014-09-30 18:40                               ` Andrew Senkevich
2014-09-30 20:03                                 ` Joseph S. Myers
2014-10-01 13:26                                   ` Andrew Senkevich
2014-10-01 13:46                                     ` Joseph S. Myers
2014-10-01 18:47                               ` Andrew Senkevich
2014-09-26 15:03                       ` H.J. Lu
2014-09-26 15:48                         ` Carlos O'Donell
2014-09-26 16:08                           ` H.J. Lu
2014-09-26 17:55                             ` Carlos O'Donell
2014-09-26 18:06                               ` H.J. Lu
2014-09-30 16:17                               ` Andrew Pinski
2014-09-18 15:37 ` H.J. Lu
2014-09-18 17:29   ` Andrew Senkevich
2014-09-21 16:31 ` Andi Kleen
2014-09-25 19:43   ` Carlos O'Donell
