[PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc

public inbox for libc-alpha@sourceware.org

* [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
@ 2014-09-10 15:08 Andrew Senkevich
  2014-09-10 16:08 ` Joseph S. Myers
  0 siblings, 1 reply; 52+ messages in thread
From: Andrew Senkevich @ 2014-09-10 15:08 UTC (permalink / raw)
  To: libc-alpha; +Cc: igor.zamyatin, Melik-Adamyan, Areg, jakub

Hi all,

this is the first patch in the series of patches which will add Intel
vectorized math functions to Glibc.

This patch adds only one function: cos vectorized with AVX2
instructions, with a vector length of 4.
In the first stage we plan to add cos, sin, sincos, exp, log and pow.
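
As an aside for readers of the ChangeLog in this mail: the symbol name
_ZGVdN4v_cos appears to follow the x86-64 vector function ABI naming
scheme, which the mail itself does not spell out.  The sketch below
assumes that scheme (prefix _ZGV, an ISA letter, a mask letter, the
vector length, one letter per parameter, then the scalar name).  The
helper vector_variant_name is hypothetical and only illustrates how the
pieces fit together; it is not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of the patch: builds a vector-variant
   symbol name in the style of the x86-64 vector function ABI.
     isa:    'b' = SSE, 'c' = AVX, 'd' = AVX2, 'e' = AVX-512
     mask:   'N' = unmasked, 'M' = masked
     vlen:   number of lanes
     params: one letter per argument, 'v' = vector argument
   Returns the number of characters written, or -1 if BUF is too small.  */
int
vector_variant_name (char *buf, size_t size, char isa, char mask,
                     int vlen, const char *params, const char *scalar)
{
  assert (buf != NULL && params != NULL && scalar != NULL);
  int n = snprintf (buf, size, "_ZGV%c%c%d%s_%s",
                    isa, mask, vlen, params, scalar);
  return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

Called with 'd', 'N', 4, "v" and "cos", the helper produces the name
listed in the ChangeLog: AVX2, unmasked, four lanes, one vector argument.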

Changes to the math library testsuite are in progress and will be
presented soon.
Changes to the ABI document will also be prepared.

Any comments are welcome!

ChangeLog

2014-09-10  Andrew Senkevich  <andrew.n.senkevich@gmail.com>

        * math/bits/mathcalls.h (__DECL_SIMD): New macro which helps to add
        vector function declaration.
        * sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: _ZGVdN4v_cos added.
        * sysdeps/unix/sysv/linux/x86_64/64/localplt.data: New file.
        * sysdeps/x86_64/fpu/Makefile: New file, added new files to build.
        * sysdeps/x86_64/fpu/Versions: New file, _ZGVdN4v_cos added.
        * sysdeps/x86_64/fpu/svml_d_cos4_core.S (_ZGVdN4v_cos): New file.
        * sysdeps/x86_64/fpu/svml_d_cos_data.S: New file with data array.



--
WBR,
Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-10 15:08 [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc Andrew Senkevich
@ 2014-09-10 16:08 ` Joseph S. Myers
  2014-09-11 10:11   ` Matthew Fortune
  2014-09-11 19:32   ` Carlos O'Donell
  0 siblings, 2 replies; 52+ messages in thread
From: Joseph S. Myers @ 2014-09-10 16:08 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: libc-alpha, igor.zamyatin, Melik-Adamyan, Areg, jakub

On Wed, 10 Sep 2014, Andrew Senkevich wrote:

> Hi all,
> 
> this is the first patch in the series of patches which will add Intel
> vectorized math functions to Glibc.

Please start by raising general design questions on libc-alpha before 
sending any patches; only send patches once there is consensus on the 
general questions and that consensus has been written up on a wiki page.  
For example:

* Should functions go in libm or a separate libmvec library?

* What requirements on the compiler / assembler versions used are imposed 
by the requirement that the ABI provided by glibc's shared libraries must 
not depend on the tools used to build glibc, and what such requirements is 
it OK to impose (it may be OK to move to GCC 4.6 as minimum compiler at 
present, but requiring a more recent version would be a problem; we'd need 
to consider what binutils version we can require)?  If a separate libmvec 
is used, is it OK simply not to build it if those requirements aren't met?  
(It's definitely not OK for the ABI of a library to vary incompatibly, but 
it might be OK for the presence of a library to be conditional.)

* Should it be declared that these vectorized functions do not set errno?  
(If so, then any header code that enables them to be used must of course 
avoid enabling them in the default -fmath-errno case.)  Similarly, do they 
follow the other goals documented in the glibc manual for accuracy of 
results and exceptions (for all input values, including e.g. range 
reduction)?  If not, further conditionals such as -ffast-math may be 
needed.

* How do we handle different glibc versions having vectorized functions 
for different vector ISA extensions?  You're using a single __DECL_SIMD, 
and having such a function only for AVX2.  But one glibc version could 
have a function vectorized for ISA extensions A and B, with another 
version adding it vectorized for C.  The compiler the user uses with the 
installed glibc headers must be able to tell from those headers which 
functions have what vectorized versions.  That is, if a glibc version is 
released where _Pragma ("omp declare simd") is used with a function that 
only has an AVX2 vectorized version, no past or future GCC version can 
interpret that pragma as meaning that any version other than AVX2 is 
available (it must be possible to use any installed glibc headers with 
both past and future compilers).

* Similarly, we need to handle different architectures having different 
sets of functions vectorized and possibly not having the same set of 
vector ISAs for each function.  Maybe this suggests having an 
architecture-specific bits/ header that, for each function that might be 
vectorized, defines a macro to tell the compiler what vector versions are 
available.  E.g.

#define __DECL_SIMD_COS_FLOAT /* empty */
#define __DECL_SIMD_COS_DOUBLE __DECL_SIMD_AVX2
#define __DECL_SIMD_COS_LONG_DOUBLE /* empty */

where the declaration of cos automatically uses __DECL_SIMD_COS_DOUBLE, 
and __DECL_SIMD_AVX2 expands to a directive whose semantics are agreed 
with compilers to mean that an AVX2 vectorized version of the function is 
available (but other vectorized versions may not be).
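
A rough model of the per-function macros described above, assuming each
macro could be read as the set of vector ISAs for which a version
exists.  In the actual proposal the macros expand to a compiler
directive, not to a bitmask; every name below (SIMD_ISA_*, DECL_SIMD_*,
simd_version_available) is hypothetical and chosen to avoid reserved
identifiers.

```c
/* Hypothetical names throughout; a model of the idea, not glibc code.  */

/* One bit per vector ISA that a header could advertise.  */
#define SIMD_ISA_NONE  0u
#define SIMD_ISA_SSE4  (1u << 0)
#define SIMD_ISA_AVX   (1u << 1)
#define SIMD_ISA_AVX2  (1u << 2)

/* What an architecture-specific bits/ header might record: for each
   function and type, the ISAs that have a vector version.  */
#define DECL_SIMD_COS_FLOAT        SIMD_ISA_NONE
#define DECL_SIMD_COS_DOUBLE       SIMD_ISA_AVX2
#define DECL_SIMD_COS_LONG_DOUBLE  SIMD_ISA_NONE

/* Compiler-side question: may a call to the scalar function be replaced
   by the vector version when generating code for TARGET_ISA?  */
int
simd_version_available (unsigned advertised, unsigned target_isa)
{
  return (advertised & target_isa) != 0;
}
```

With these definitions only the double version of cos is advertised,
and only for AVX2; a compiler targeting plain AVX would have to keep
the scalar call.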

Obviously new functions go at new symbol versions (so GLIBC_2.21 at 
present, not GLIBC_2.2.5 with a completely inappropriate Versions comment 
in your patch).  I'd expect you to need appropriate section directives for 
the data table you add to ensure it goes in read-only data, not writable.  
And you shouldn't be adding a local PLT reference to cos; call an internal 
hidden function.
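
The last two points concern assembly sources, but the same ideas can be
sketched in C.  This is only an illustration under assumptions: the
visibility attribute is a GCC/Clang extension, and the table contents
and all names (cos_poly_coeffs, internal_cos_poly, public_cos_approx)
are made up for the example.

```c
/* GCC/Clang-specific attribute; all names here are hypothetical.  */

/* 'static const' lets the toolchain place the table in read-only data
   (.rodata on ELF) instead of the writable .data section.  */
static const double cos_poly_coeffs[4] =
  { 1.0, -0.5, 1.0 / 24, -1.0 / 720 };

/* Hidden visibility: usable from anywhere inside the same shared
   object, but not exported, so calls bind locally without a PLT entry.  */
__attribute__ ((visibility ("hidden"))) double
internal_cos_poly (double x)
{
  double x2 = x * x;
  /* Horner evaluation of 1 - x^2/2 + x^4/24 - x^6/720.  */
  return cos_poly_coeffs[0]
         + x2 * (cos_poly_coeffs[1]
                 + x2 * (cos_poly_coeffs[2] + x2 * cos_poly_coeffs[3]));
}

/* Exported entry point that calls the hidden helper directly.  */
double
public_cos_approx (double x)
{
  return internal_cos_poly (x);
}
```

The polynomial is a short Taylor series and is not meant to be an
accurate cosine; it only gives the table something to do.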

-- 
Joseph S. Myers
joseph@codesourcery.com


* RE: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-10 16:08 ` Joseph S. Myers
@ 2014-09-11 10:11   ` Matthew Fortune
  2014-09-11 19:47     ` Adhemerval Zanella
                       ` (2 more replies)
  2014-09-11 19:32   ` Carlos O'Donell
  1 sibling, 3 replies; 52+ messages in thread
From: Matthew Fortune @ 2014-09-11 10:11 UTC (permalink / raw)
  To: Joseph S. Myers, Andrew Senkevich
  Cc: libc-alpha, igor.zamyatin, Melik-Adamyan, Areg, jakub

Joseph S. Myers <joseph@codesourcery.com> writes:
> On Wed, 10 Sep 2014, Andrew Senkevich wrote:
> 
> > Hi all,
> >
> > this is the first patch in the series of patches which will add Intel
> > vectorized math functions to Glibc.
> 
> Please start by raising general design questions on libc-alpha before
> sending any patches; only send patches once there is consensus on the
> general questions and that consensus has been written up on a wiki page.
> For example:

FWIW I had envisaged vectorised math functions looking more generic such
that they were not dependent on vector size:

void sin_simd (float* dst, float* src, int count)
{
  while (count > 0)
  {
    *dst = sinf (*src);
    dst++;
    src++;
    count--;
  }
}

With this pattern, the precise SIMD ISA extension is not important to the
caller. The additional cost of loading/storing the data compared with the
number of instructions required to perform the vectorised operation does
not seem significant. Ifunc is then the obvious way to select the widest
SIMD available at runtime.
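
A minimal sketch of the runtime selection described above.  It uses a
plain function pointer as a portable stand-in for an ifunc resolver, and
a two-term Taylor series in place of sinf so that it builds without
libm.  The names sin_array, sin_array_scalar, sin_array_avx2 and
resolve_sin_array are hypothetical, the signature uses size_t and const
where the mail uses int and a plain pointer, and the "AVX2" variant is
only a placeholder that reuses the scalar loop.

```c
#include <stddef.h>

/* Signature of an array-based entry point like the one above.  */
typedef void (*sin_array_fn) (float *dst, const float *src, size_t count);

/* Stand-in for sinf (two-term Taylor series) so the sketch builds
   without libm; it is only reasonable near zero.  */
float
sin_approx (float x)
{
  return x - x * x * x / 6.0f;
}

/* Baseline version: one element per iteration.  */
void
sin_array_scalar (float *dst, const float *src, size_t count)
{
  for (size_t i = 0; i < count; i++)
    dst[i] = sin_approx (src[i]);
}

/* Placeholder for a build of the same loop that uses AVX2; here it
   simply reuses the scalar loop.  */
void
sin_array_avx2 (float *dst, const float *src, size_t count)
{
  sin_array_scalar (dst, src, count);
}

/* Picks an implementation once, in the way an ifunc resolver would.  */
sin_array_fn
resolve_sin_array (void)
{
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("avx2"))
    return sin_array_avx2;
#endif
  return sin_array_scalar;
}

/* Public entry point; the lazy initialisation is not thread-safe.  */
void
sin_array (float *dst, const float *src, size_t count)
{
  static sin_array_fn impl;
  if (impl == NULL)
    impl = resolve_sin_array ();
  impl (dst, src, count);
}
```

The caller sees only sin_array, so the vector width stays an
implementation detail; the cost is that every batch goes through memory
and an indirect call.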

Matthew


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-10 16:08 ` Joseph S. Myers
  2014-09-11 10:11   ` Matthew Fortune
@ 2014-09-11 19:32   ` Carlos O'Donell
  2014-09-11 20:19     ` Zamyatin, Igor
  2014-09-16 16:57     ` Andrew Senkevich
  1 sibling, 2 replies; 52+ messages in thread
From: Carlos O'Donell @ 2014-09-11 19:32 UTC (permalink / raw)
  To: Joseph S. Myers, Andrew Senkevich
  Cc: libc-alpha, igor.zamyatin, Melik-Adamyan, Areg, jakub

On 09/10/2014 12:07 PM, Joseph S. Myers wrote:
> On Wed, 10 Sep 2014, Andrew Senkevich wrote:
> 
>> Hi all,
>>
>> this is the first patch in the series of patches which will add Intel
>> vectorized math functions to Glibc.
> 
> Please start by raising general design questions on libc-alpha before 
> sending any patches; only send patches once there is consensus on the 
> general questions and that consensus has been written up on a wiki page.  

I agree.

Please create a wiki page for this design writeup.

https://sourceware.org/glibc/wiki/

Create an account and I can add you to the editors list.

https://sourceware.org/glibc/wiki/EditorGroup

> For example:

> * Should functions go in libm or a separate libmvec library?

I do not think adding these to libm is a good idea, and libmvec should
be the right way forward, with coordination via the compiler driver to
add -lmvec to these functions.

> * What requirements on the compiler / assembler versions used are imposed 
> by the requirement that the ABI provided by glibc's shared libraries must 
> not depend on the tools used to build glibc, and what such requirements is 
> it OK to impose (it may be OK to move to GCC 4.6 as minimum compiler at 
> present, but requiring a more recent version would be a problem; we'd need 
> to consider what binutils version we can require)?  If a separate libmvec 
> is used, is it OK simply not to build it if those requirements aren't met?  
> (It's definitely not OK for the ABI of a library to vary incompatibly, but 
> it might be OK for the presence of a library to be conditional.)

That's right. This patch has no configure checks that I can see, so how
does it get built if the compiler or assembler doesn't support the required
features?

> * Should it be declared that these vectorized functions do not set errno?  
> (If so, then any header code that enables them to be used must of course 
> avoid enabling them in the default -fmath-errno case.)  Similarly, do they 
> follow the other goals documented in the glibc manual for accuracy of 
> results and exceptions (for all input values, including e.g. range 
> reduction)?  If not, further conditionals such as -ffast-math may be 
> needed.

I assume they follow the Intel documented definitions of these functions.

A pointer to those again would be helpful.

> * How do we handle different glibc versions having vectorized functions 
> for different vector ISA extensions?  You're using a single __DECL_SIMD, 
> and having such a function only for AVX2.  But one glibc version could 
> have a function vectorized for ISA extensions A and B, with another 
> version adding it vectorized for C.  The compiler the user uses with the 
> installed glibc headers must be able to tell from those headers which 
> functions have what vectorized versions.  That is, if a glibc version is 
> released where _Pragma ("omp declare simd") is used with a function that 
> only has an AVX2 vectorized version, no past or future GCC version can 
> interpret that pragma as meaning that any version other than AVX2 is 
> available (it must be possible to use any installed glibc headers with 
> both past and future compilers).
> 
> * Similarly, we need to handle different architectures having different 
> sets of functions vectorized and possibly not having the same set of 
> vector ISAs for each function.  Maybe this suggests having an 
> architecture-specific bits/ header that, for each function that might be 
> vectorized, defines a macro to tell the compiler what vector versions are 
> available.  E.g.
> 
> #define __DECL_SIMD_COS_FLOAT /* empty */
> #define __DECL_SIMD_COS_DOUBLE __DECL_SIMD_AVX2
> #define __DECL_SIMD_COS_LONG_DOUBLE /* empty */
> 
> where the declaration of cos automatically uses __DECL_SIMD_COS_DOUBLE, 
> and __DECL_SIMD_AVX2 expands to a directive whose semantics are agreed 
> with compilers to mean that an AVX2 vectorized version of the function is 
> available (but other vectorized versions may not be).
> 
> Obviously new functions go at new symbol versions (so GLIBC_2.21 at 
> present, not GLIBC_2.2.5 with a completely inappropriate Versions comment 
> in your patch).  I'd expect you to need appropriate section directives for 
> the data table you add to ensure it goes in read-only data, not writable.  
> And you shouldn't be adding a local PLT reference to cos; call an internal 
> hidden function.
> 

Cheers,
Carlos.


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-11 10:11   ` Matthew Fortune
@ 2014-09-11 19:47     ` Adhemerval Zanella
  2014-09-11 20:00     ` Carlos O'Donell
  2014-09-11 21:02     ` Rich Felker
  2 siblings, 0 replies; 52+ messages in thread
From: Adhemerval Zanella @ 2014-09-11 19:47 UTC (permalink / raw)
  To: libc-alpha

On 11-09-2014 07:11, Matthew Fortune wrote:
> Joseph S. Myers <joseph@codesourcery.com> writes:
>> On Wed, 10 Sep 2014, Andrew Senkevich wrote:
>>
>>> Hi all,
>>>
>>> this is the first patch in the series of patches which will add Intel
>>> vectorized math functions to Glibc.
>> Please start by raising general design questions on libc-alpha before
>> sending any patches; only send patches once there is consensus on the
>> general questions and that consensus has been written up on a wiki page.
>> For example:
> FWIW I had envisaged vectorised math functions looking more generic such
> that they were not dependent on vector size:
>
> void sin_simd (float* dst, float* src, int count)
> {
>   while (count > 0)
>   {
>     *dst = sinf (*src);
>     dst++;
>     src++;
>     count--;
>   }
> }
>
> With this pattern, the precise SIMD ISA extension is not important to the
> caller. The additional cost of loading/storing the data compared with the
> number of instructions required to perform the vectorised operation does
> not seem significant. Ifunc is then the obvious way to select the widest
> SIMD available at runtime.

That is not the aim of this proposal, nor is it best suited for a SIMD
math library.  In most cases I want to work on the limited, fixed sizes provided
by the ISA, and to make function calls using an ABI that passes arguments in registers.
For instance, using AVX from Intel or VSX for POWER I would like to pass
128 bits of data (4 x float) in a register and return it in the same way.  The same
principle applies to longer SIMD types, for instance AVX2.
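
A minimal sketch of the register-based style, assuming the GCC/Clang
vector_size extension.  The type v4sf and the function sin_approx_v4sf
are hypothetical, and the body is a two-term Taylor series standing in
for a real vector sine.  On x86-64 with the System V ABI a 16-byte
vector like this is typically passed and returned in a vector register,
with no loads or stores at the call boundary.

```c
/* Four floats in one 128-bit value (GCC/Clang vector extension).  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical vector entry point: argument and result are whole
   vectors, so all four lanes are computed by one call.  */
v4sf
sin_approx_v4sf (v4sf x)
{
  const v4sf six = { 6.0f, 6.0f, 6.0f, 6.0f };
  /* Element-wise x - x^3/6 on all four lanes.  */
  return x - x * x * x / six;
}
```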

That's why Joseph wrote 'But one glibc version could have a function vectorized
for ISA extensions A and B, with another version adding it vectorized for C.'

Right now, I would like some discussion on how we will handle all the questions
Joseph has posed before any patches are posted.

>
> Matthew
>


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-11 10:11   ` Matthew Fortune
  2014-09-11 19:47     ` Adhemerval Zanella
@ 2014-09-11 20:00     ` Carlos O'Donell
  2014-09-11 21:02     ` Rich Felker
  2 siblings, 0 replies; 52+ messages in thread
From: Carlos O'Donell @ 2014-09-11 20:00 UTC (permalink / raw)
  To: Matthew Fortune, Joseph S. Myers, Andrew Senkevich
  Cc: libc-alpha, igor.zamyatin, Melik-Adamyan, Areg, jakub

On 09/11/2014 06:11 AM, Matthew Fortune wrote:
> Joseph S. Myers <joseph@codesourcery.com> writes:
>> On Wed, 10 Sep 2014, Andrew Senkevich wrote:
>>
>>> Hi all,
>>>
>>> this is the first patch in the series of patches which will add Intel
>>> vectorized math functions to Glibc.
>>
>> Please start by raising general design questions on libc-alpha before
>> sending any patches; only send patches once there is consensus on the
>> general questions and that consensus has been written up on a wiki page.
>> For example:
> 
> FWIW I had envisaged vectorised math functions looking more generic such
> that they were not dependent on vector size:
> 
> void sin_simd (float* dst, float* src, int count)
> {
>   while (count > 0)
>   {
>     *dst = sinf (*src);
>     dst++;
>     src++;
>     count--;
>   }
> }
> 
> With this pattern, the precise SIMD ISA extension is not important to the
> caller. The additional cost of loading/storing the data compared with the
> number of instructions required to perform the vectorised operation does
> not seem significant. Ifunc is then the obvious way to select the widest
> SIMD available at runtime.

This isn't going to happen right now. It might be a distinct API, but right
now we are talking about existing APIs created by Intel and IBM that
developers have already been using or know how to use.

A new GNU API for vectorized math functions is a distinct topic that we can
talk about, and talk about in generalities.

Cheers,
Carlos.


* RE: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-11 19:32   ` Carlos O'Donell
@ 2014-09-11 20:19     ` Zamyatin, Igor
  2014-09-11 20:26       ` Carlos O'Donell
  2014-09-11 20:52       ` Joseph S. Myers
  2014-09-16 16:57     ` Andrew Senkevich
  1 sibling, 2 replies; 52+ messages in thread
From: Zamyatin, Igor @ 2014-09-11 20:19 UTC (permalink / raw)
  To: Carlos O'Donell, Joseph S. Myers, Andrew Senkevich
  Cc: libc-alpha, Melik-Adamyan, Areg, jakub

> >
> > Please start by raising general design questions on libc-alpha before
> > sending any patches; only send patches once there is consensus on the
> > general questions and that consensus has been written up on a wiki page.
> 

Joseph, thanks for bringing all these items for discussion.

> I agree.
> 
> Please create a wiki page for this design writeup.
> 
> https://sourceware.org/glibc/wiki/
> 
> Create an account and I can add you to the editors list.
> 
> https://sourceware.org/glibc/wiki/EditorGroup

Created - IgorZamyatin

> 
> > For example:
> 
> > * Should functions go in libm or a separate libmvec library?
> 
> I do not think adding these to libm is a good idea, and libmvec should be the
> right way forward, with coordination via the compiler driver to add -lmvec to
> these functions.
> 

No problem, we will make changes according to this approach.

> > * What requirements on the compiler / assembler versions used are
> > imposed by the requirement that the ABI provided by glibc's shared
> > libraries must not depend on the tools used to build glibc, and what
> > such requirements is it OK to impose (it may be OK to move to GCC 4.6
> > as minimum compiler at present, but requiring a more recent version
> > would be a problem; we'd need to consider what binutils version we can
> > require)?  If a separate libmvec is used, is it OK simply not to build
> > it if those requirements aren't met?
> > (It's definitely not OK for the ABI of a library to vary incompatibly,
> > but it might be OK for the presence of a library to be conditional.)
> 
> That's right. This patch has no configure checks that I can see, so how does it
> get built if the compiler or assembler doesn't support the required features?

The compiler is checked via the _OPENMP macro.
As for the assembler, I agree: we should add checks for the different ISAs.

> 
> > * Should it be declared that these vectorized functions do not set errno?
> > (If so, then any header code that enables them to be used must of
> > course avoid enabling them in the default -fmath-errno case.)
> > Similarly, do they follow the other goals documented in the glibc
> > manual for accuracy of results and exceptions (for all input values,
> > including e.g. range reduction)?  If not, further conditionals such as
> > -ffast-math may be needed.
> 
> I assume they follow the Intel documented definitions of these functions.
> 
> A pointer to those again would be helpful.

Will do.

> 
> > * How do we handle different glibc versions having vectorized
> > functions for different vector ISA extensions?  You're using a single
> > __DECL_SIMD, and having such a function only for AVX2.  But one glibc
> > version could have a function vectorized for ISA extensions A and B,
> > with another version adding it vectorized for C.  The compiler the
> > user uses with the installed glibc headers must be able to tell from
> > those headers which functions have what vectorized versions.  That is,
> > if a glibc version is released where _Pragma ("omp declare simd") is
> > used with a function that only has an AVX2 vectorized version, no past
> > or future GCC version can interpret that pragma as meaning that any
> > version other than AVX2 is available (it must be possible to use any
> > installed glibc headers with both past and future compilers).

We can require providing versions for all possible ISAs (we are going to do this for x86) so the compiler should do all the work itself.

Thanks,
Igor

> >
> > * Similarly, we need to handle different architectures having
> > different sets of functions vectorized and possibly not having the
> > same set of vector ISAs for each function.  Maybe this suggests having
> > an architecture-specific bits/ header that, for each function that
> > might be vectorized, defines a macro to tell the compiler what vector
> > versions are available.  E.g.
> >
> > #define __DECL_SIMD_COS_FLOAT /* empty */
> > #define __DECL_SIMD_COS_DOUBLE __DECL_SIMD_AVX2
> > #define __DECL_SIMD_COS_LONG_DOUBLE /* empty */
> >
> > where the declaration of cos automatically uses
> > __DECL_SIMD_COS_DOUBLE, and __DECL_SIMD_AVX2 expands to a directive
> > whose semantics are agreed with compilers to mean that an AVX2
> > vectorized version of the function is available (but other vectorized
> > versions may not be).
> >
> > Obviously new functions go at new symbol versions (so GLIBC_2.21 at
> > present, not GLIBC_2.2.5 with a completely inappropriate Versions
> > comment in your patch).  I'd expect you to need appropriate section
> > directives for the data table you add to ensure it goes in read-only
> > data, not writable.
> > And you shouldn't be adding a local PLT reference to cos; call an
> > internal hidden function.
> >
> 
> Cheers,
> Carlos.


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-11 20:19     ` Zamyatin, Igor
@ 2014-09-11 20:26       ` Carlos O'Donell
  2014-09-11 20:52       ` Joseph S. Myers
  1 sibling, 0 replies; 52+ messages in thread
From: Carlos O'Donell @ 2014-09-11 20:26 UTC (permalink / raw)
  To: Zamyatin, Igor, Joseph S. Myers, Andrew Senkevich
  Cc: libc-alpha, Melik-Adamyan, Areg, jakub

On 09/11/2014 04:19 PM, Zamyatin, Igor wrote:
> Created - IgorZamyatin

You're in the editor group and can create a wiki page design document.

Cheers,
Carlos.


* RE: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-11 20:19     ` Zamyatin, Igor
  2014-09-11 20:26       ` Carlos O'Donell
@ 2014-09-11 20:52       ` Joseph S. Myers
  2014-09-11 20:57         ` H.J. Lu
  1 sibling, 1 reply; 52+ messages in thread
From: Joseph S. Myers @ 2014-09-11 20:52 UTC (permalink / raw)
  To: Zamyatin, Igor
  Cc: Carlos O'Donell, Andrew Senkevich, libc-alpha, Melik-Adamyan,
	Areg, jakub

On Thu, 11 Sep 2014, Zamyatin, Igor wrote:

> > > * How do we handle different glibc versions having vectorized
> > > functions for different vector ISA extensions?  You're using a single
> > > __DECL_SIMD, and having such a function only for AVX2.  But one glibc
> > > version could have a function vectorized for ISA extensions A and B,
> > > with another version adding it vectorized for C.  The compiler the
> > > user uses with the installed glibc headers must be able to tell from
> > > those headers which functions have what vectorized versions.  That is,
> > > if a glibc version is released where _Pragma ("omp declare simd") is
> > > used with a function that only has an AVX2 vectorized version, no past
> > > or future GCC version can interpret that pragma as meaning that any
> > > version other than AVX2 is available (it must be possible to use any
> > > installed glibc headers with both past and future compilers).
> 
> We can require providing versions for all possible ISAs (we are going to 
> do this for x86) so the compiler should do all the work itself.

That doesn't answer my question.  Maybe glibc 2.21 provides such versions 
for all x86 ISAs there are at present, up to AVX512 - and then a new 
extension AVX1024 appears.  When GCC 7 is used with glibc 2.21 headers and 
-mavx1024, it must not try to generate calls to the AVX1024 functions, 
because glibc 2.21 doesn't have such functions.  But maybe glibc 2.26 adds 
the AVX1024 functions.  So something needs to be different in the headers 
of 2.26 to inform GCC 7 that AVX1024 versions of the functions are 
available.  And I think that means the directive that communicates 
function availability to the compiler needs to identify the set of ISAs 
for which versions of the function in question are available.
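
The scenario above can be written down as a small model, assuming the
header information boils down to a set of ISAs per function.  AVX1024
is the made-up extension from the example, and the structure, constants
and function names below are hypothetical.

```c
/* Hypothetical model of the scenario; AVX1024 does not exist.  */
#define ISA_SSE4     (1u << 0)
#define ISA_AVX      (1u << 1)
#define ISA_AVX2     (1u << 2)
#define ISA_AVX512   (1u << 3)
#define ISA_AVX1024  (1u << 4)

/* What one installed set of glibc headers tells the compiler about cos.  */
struct header_info
{
  const char *glibc_version;
  unsigned cos_isas;		/* ISAs with a vector cos in this release.  */
};

const struct header_info glibc_2_21 =
  { "2.21", ISA_SSE4 | ISA_AVX | ISA_AVX2 | ISA_AVX512 };
const struct header_info glibc_2_26 =
  { "2.26", ISA_SSE4 | ISA_AVX | ISA_AVX2 | ISA_AVX512 | ISA_AVX1024 };

/* The same compiler must give different answers for the two header sets
   when asked to target the new ISA.  */
int
may_call_vector_cos (const struct header_info *h, unsigned target_isa)
{
  return (h->cos_isas & target_isa) != 0;
}
```

A directive that only says "a vector version exists" cannot distinguish
the two releases; the ISA set has to come from the headers.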

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-11 20:52       ` Joseph S. Myers
@ 2014-09-11 20:57         ` H.J. Lu
  2014-09-12  0:10           ` Carlos O'Donell
  0 siblings, 1 reply; 52+ messages in thread
From: H.J. Lu @ 2014-09-11 20:57 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Zamyatin, Igor, Carlos O'Donell, Andrew Senkevich,
	libc-alpha, Melik-Adamyan, Areg, jakub

On Thu, Sep 11, 2014 at 1:52 PM, Joseph S. Myers
<joseph@codesourcery.com> wrote:
> On Thu, 11 Sep 2014, Zamyatin, Igor wrote:
>
>> > > * How do we handle different glibc versions having vectorized
>> > > functions for different vector ISA extensions?  You're using a single
>> > > __DECL_SIMD, and having such a function only for AVX2.  But one glibc
>> > > version could have a function vectorized for ISA extensions A and B,
>> > > with another version adding it vectorized for C.  The compiler the
>> > > user uses with the installed glibc headers must be able to tell from
>> > > those headers which functions have what vectorized versions.  That is,
>> > > if a glibc version is released where _Pragma ("omp declare simd") is
>> > > used with a function that only has an AVX2 vectorized version, no past
>> > > or future GCC version can interpret that pragma as meaning that any
>> > > version other than AVX2 is available (it must be possible to use any
>> > > installed glibc headers with both past and future compilers).
>>
>> We can require providing versions for all possible ISAs (we are going to
>> do this for x86) so the compiler should do all the work itself.
>
> That doesn't answer my question.  Maybe glibc 2.21 provides such versions
> for all x86 ISAs there are at present, up to AVX512 - and then a new
> extension AVX1024 appears.  When GCC 7 is used with glibc 2.21 headers and
> -mavx1024, it must not try to generate calls to the AVX1024 functions,
> because glibc 2.21 doesn't have such functions.  But maybe glibc 2.26 adds
> the AVX1024 functions.  So something needs to be different in the headers
> of 2.26 to inform GCC 7 that AVX1024 versions of the functions are
> available.  And I think that means the directive that communicates
> function availability to the compiler needs to identify the set of ISAs
> for which versions of the function in question are available.
>

Wouldn't it be better to put libmvec in GCC instead?

-- 
H.J.


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-11 10:11   ` Matthew Fortune
  2014-09-11 19:47     ` Adhemerval Zanella
  2014-09-11 20:00     ` Carlos O'Donell
@ 2014-09-11 21:02     ` Rich Felker
  2014-09-12  0:06       ` Carlos O'Donell
  2014-09-12  5:33       ` Andi Kleen
  2 siblings, 2 replies; 52+ messages in thread
From: Rich Felker @ 2014-09-11 21:02 UTC (permalink / raw)
  To: Matthew Fortune
  Cc: Joseph S. Myers, Andrew Senkevich, libc-alpha, igor.zamyatin,
	Melik-Adamyan, Areg, jakub

On Thu, Sep 11, 2014 at 10:11:14AM +0000, Matthew Fortune wrote:
> Joseph S. Myers <joseph@codesourcery.com> writes:
> > On Wed, 10 Sep 2014, Andrew Senkevich wrote:
> > 
> > > Hi all,
> > >
> > > this is the first patch in the series of patches which will add Intel
> > > vectorized math functions to Glibc.
> > 
> > Please start by raising general design questions on libc-alpha before
> > sending any patches; only send patches once there is consensus on the
> > general questions and that consensus has been written up on a wiki page.
> > For example:
> 
> FWIW I had envisaged vectorised math functions looking more generic such
> that they were not dependent on vector size:
> 
> void sin_simd (float* dst, float* src, int count)
> {
>   while (count > 0)
>   {
>     *dst = sinf (*src);
>     dst++;
>     src++;
>     count--;
>   }
> }
> 
> With this pattern, the precise SIMD ISA extension is not important to the
> caller. The additional cost of loading/storing the data compared with the
> number of instructions required to perform the vectorised operation does
> not seem significant. Ifunc is then the obvious way to select the widest
> SIMD available at runtime.

This really seems like something the compiler should be doing --
translating parallelizable calls to the standard math functions into
calls to special simd versions (or better yet, LTO'ing in an
on-the-fly simd version based on the non-simd-specific code in libm.a)
rather than having applications written to a klunky API that's
designed around particular hardware features.

Rich


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-11 21:02     ` Rich Felker
@ 2014-09-12  0:06       ` Carlos O'Donell
  2014-09-12  5:33       ` Andi Kleen
  1 sibling, 0 replies; 52+ messages in thread
From: Carlos O'Donell @ 2014-09-12  0:06 UTC (permalink / raw)
  To: Rich Felker, Matthew Fortune
  Cc: Joseph S. Myers, Andrew Senkevich, libc-alpha, igor.zamyatin,
	Melik-Adamyan, Areg, jakub

On 09/11/2014 05:02 PM, Rich Felker wrote:
> This really seems like something the compiler should be doing --
> translating parallelizable calls to the standard math functions into
> calls to special simd versions (or better yet, LTO'ing in an
> on-the-fly simd version based on the non-simd-specific code in libm.a)
> rather than having applications written to a klunky API that's
> designed around particular hardware features.

Nothing is stopping the compiler from doing exactly what you say and
I know people who are working on it.

However, I think we should also support the klunky APIs that have been
in use at companies like Intel and IBM in something like libmvec.

So it's not one or the other, but both.

Cheers,
Carlos.


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-11 20:57         ` H.J. Lu
@ 2014-09-12  0:10           ` Carlos O'Donell
  2014-09-12 15:01             ` H.J. Lu
  0 siblings, 1 reply; 52+ messages in thread
From: Carlos O'Donell @ 2014-09-12  0:10 UTC (permalink / raw)
  To: H.J. Lu, Joseph S. Myers
  Cc: Zamyatin, Igor, Andrew Senkevich, libc-alpha, Melik-Adamyan, Areg, jakub

On 09/11/2014 04:57 PM, H.J. Lu wrote:
>> That doesn't answer my question.  Maybe glibc 2.21 provides such versions
>> for all x86 ISAs there are at present, up to AVX512 - and then a new
>> extension AVX1024 appears.  When GCC 7 is used with glibc 2.21 headers and
>> -mavx1024, it must not try to generate calls to the AVX1024 functions,
>> because glibc 2.21 doesn't have such functions.  But maybe glibc 2.26 adds
>> the AVX1024 functions.  So something needs to be different in the headers
>> of 2.26 to inform GCC 7 that AVX1024 versions of the functions are
>> available.  And I think that means the directive that communicates
>> function availability to the compiler needs to identify the set of ISAs
>> for which versions of the function in question are available.
>>
> 
> Wouldn't it be better to put libmvec in GCC instead?
 
That's certainly a discussion we can have.

What do you see as the pros and cons?

Cheers,
Carlos.


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-11 21:02     ` Rich Felker
  2014-09-12  0:06       ` Carlos O'Donell
@ 2014-09-12  5:33       ` Andi Kleen
  2014-09-12  7:18         ` Ondřej Bílka
                           ` (2 more replies)
  1 sibling, 3 replies; 52+ messages in thread
From: Andi Kleen @ 2014-09-12  5:33 UTC (permalink / raw)
  To: Rich Felker
  Cc: Matthew Fortune, Joseph S. Myers, Andrew Senkevich, libc-alpha,
	igor.zamyatin, Melik-Adamyan, Areg, jakub

Rich Felker <dalias@libc.org> writes:
>
> This really seems like something the compiler should be doing --
> translating parallelizable calls to the standard math functions into
> calls to special simd versions (

Of course gcc already supports that. Even in two different flavours.

Not sure why the patch doesn't implement one of those ABIs though.

     -mveclibabi=type
           Specifies the ABI type to use for vectorizing intrinsics
           using an external library.
           Supported values for type are svml for the Intel short vector
           math library and acml for
           the AMD math core library.  To use this option, both
           -ftree-vectorize and
           -funsafe-math-optimizations have to be enabled, and an SVML
           or ACML ABI-compatible
           library must be specified at link time.

           GCC currently emits calls to "vmldExp2", "vmldLn2",
           "vmldLog102", "vmldLog102",
           "vmldPow2", "vmldTanh2", "vmldTan2", "vmldAtan2",
           "vmldAtanh2", "vmldCbrt2",
           "vmldSinh2", "vmldSin2", "vmldAsinh2", "vmldAsin2",
           "vmldCosh2", "vmldCos2",
           "vmldAcosh2", "vmldAcos2", "vmlsExp4", "vmlsLn4",
           "vmlsLog104", "vmlsLog104",
           "vmlsPow4", "vmlsTanh4", "vmlsTan4", "vmlsAtan4",
           "vmlsAtanh4", "vmlsCbrt4",
           "vmlsSinh4", "vmlsSin4", "vmlsAsinh4", "vmlsAsin4",
           "vmlsCosh4", "vmlsCos4",
           "vmlsAcosh4" and "vmlsAcos4" for corresponding function type
           when -mveclibabi=svml is
           used, and "__vrd2_sin", "__vrd2_cos", "__vrd2_exp",
           "__vrd2_log", "__vrd2_log2",
           "__vrd2_log10", "__vrs4_sinf", "__vrs4_cosf", "__vrs4_expf",
           "__vrs4_logf",
           "__vrs4_log2f", "__vrs4_log10f" and "__vrs4_powf" for the
           corresponding function type
           when -mveclibabi=acml is used.


-Andi


-- 
ak@linux.intel.com -- Speaking for myself only


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-12  5:33       ` Andi Kleen
@ 2014-09-12  7:18         ` Ondřej Bílka
  2014-09-12 17:04           ` Andi Kleen
  2014-09-12  7:43         ` Jakub Jelinek
  2014-09-12 19:18         ` Carlos O'Donell
  2 siblings, 1 reply; 52+ messages in thread
From: Ondřej Bílka @ 2014-09-12  7:18 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Rich Felker, Matthew Fortune, Joseph S. Myers, Andrew Senkevich,
	libc-alpha, igor.zamyatin, Melik-Adamyan, Areg, jakub

On Thu, Sep 11, 2014 at 10:33:41PM -0700, Andi Kleen wrote:
> Rich Felker <dalias@libc.org> writes:
> >
> > This really seems like something the compiler should be doing --
> > translating parallelizable calls to the standard math functions into
> > calls to special simd versions (
> 
> Of course gcc already supports that. Even in two different flavours.
> 
> Not sure why the patch doesn't implement one of those ABIs though.
> 
>      -mveclibabi=type
>            Specifies the ABI type to use for vectorizing intrinsics
>            using an external library.
>            Supported values for type are svml for the Intel short vector
>            math library and acml for
>            the AMD math core library.  To use this option, both
>            -ftree-vectorize and
>            -funsafe-math-optimizations have to be enabled, and an SVML
>            or ACML ABI-compatible
>            library must be specified at link time.
> 
>            GCC currently emits calls to "vmldExp2", "vmldLn2",

Which has a problem when one wants to support users with svml, acml or
nothing: package maintainers for some reason do not want to create three
versions of the same package.

What about doing runtime detection of what is present? With ifunc one could
use logic like

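static void *svml, *acml;
static void *vec_exp;   /* stays NULL if no vector library is found */

void *
function_ifunc (void)
{
  /* Prefer svml, then acml; the scalar function is returned either way.  */
  if (!(svml = dlopen ("svml.so", RTLD_LAZY)))
    {
      if (!(acml = dlopen ("acml.so", RTLD_LAZY)))
        return function;
      vec_exp = dlsym (acml, "__vrd2_exp");
      return function;
    }
  else
    {
      vec_exp = dlsym (svml, "vmldExp2");
      return function;
    }
}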

where the vectorized loop could look like

if (size < 4 || !vec_exp)
  goto simple_loop;
else
  goto vector_loop;

That would also preserve compatibility and allow adding AVX versions,
with detection of whether the processor supports them.
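
The loop selection described above can be sketched in plain C as follows. This is only an illustration under stated assumptions: vec4_fn, vec_op, demo_vec4_op and apply_op are hypothetical names, the operation is a simple stand-in rather than exp, and the kernel takes pointers to four doubles instead of vector registers so that the sketch stays portable. In the scheme above the pointer would be filled in from dlsym().

```c
#include <stddef.h>

/* Hypothetical kernel type: handles exactly four doubles per call.  */
typedef void (*vec4_fn)(const double *in, double *out);

/* NULL means no vector library was found at run time.  */
vec4_fn vec_op = NULL;

/* Scalar reference operation (a simple stand-in for exp).  */
static double scalar_op(double x)
{
    return 2.0 * x + 1.0;
}

/* Stand-in for a kernel supplied by an external vector library.  */
void demo_vec4_op(const double *in, double *out)
{
    for (int i = 0; i < 4; i++)
        out[i] = 2.0 * in[i] + 1.0;
}

void apply_op(const double *in, double *out, size_t n)
{
    size_t i = 0;

    /* Vector path: taken only while a kernel exists and a full block remains.  */
    if (vec_op != NULL)
        for (; i + 4 <= n; i += 4)
            vec_op(in + i, out + i);

    /* Scalar path: everything when there is no kernel, otherwise the tail.  */
    for (; i < n; i++)
        out[i] = scalar_op(in[i]);
}
```

Callers only ever see apply_op; whether the vector kernel is present changes speed, not results.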


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-12  5:33       ` Andi Kleen
  2014-09-12  7:18         ` Ondřej Bílka
@ 2014-09-12  7:43         ` Jakub Jelinek
  2014-09-12 16:55           ` Andi Kleen
  2014-09-12 17:03           ` Joseph S. Myers
  2014-09-12 19:18         ` Carlos O'Donell
  2 siblings, 2 replies; 52+ messages in thread
From: Jakub Jelinek @ 2014-09-12  7:43 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Rich Felker, Matthew Fortune, Joseph S. Myers, Andrew Senkevich,
	libc-alpha, igor.zamyatin, Melik-Adamyan, Areg

On Thu, Sep 11, 2014 at 10:33:41PM -0700, Andi Kleen wrote:
> Rich Felker <dalias@libc.org> writes:
> >
> > This really seems like something the compiler should be doing --
> > translating parallelizable calls to the standard math functions into
> > calls to special simd versions (
> 
> Of course gcc already supports that. Even in two different flavours.
> 
> Not sure why the patch doesn't implement one of those ABIs though.

Because GCC supports even another flavor, which presumably the patches
implement.  The two you are mentioning are for compatibility with existing
math libraries.  The third one is used by #pragma omp declare simd
functions and Cilk+ elemental functions.  So, to use those you don't
need any extra gcc support, glibc headers could just add
#if defined(_OPENMP) && _OPENMP >= 201307
#pragma omp declare simd
#endif
on the prototypes (of course maybe with some clauses if needed).
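
A minimal sketch of that suggestion, for illustration only: scaled_square and apply_all are hypothetical names chosen for this example, not glibc interfaces, and the notinbranch clause is just one possible choice. The guard tests _OPENMP, the macro a compiler predefines when OpenMP is enabled; without OpenMP the pragmas are skipped and the code stays plain scalar C.

```c
#include <stddef.h>

/* Hypothetical user function; not a glibc interface.  With OpenMP 4.0
   enabled, the pragma asks the compiler to also provide vector variants
   that a vectorized loop may call.  */
#if defined(_OPENMP) && _OPENMP >= 201307
#pragma omp declare simd notinbranch
#endif
double scaled_square(double x);

/* The definition carries the same directive as the prototype.  */
#if defined(_OPENMP) && _OPENMP >= 201307
#pragma omp declare simd notinbranch
#endif
double scaled_square(double x)
{
    return 0.5 * x * x;
}

/* Apply scaled_square to n elements; the loop is a candidate for
   vectorization when OpenMP SIMD support is enabled.  */
void apply_all(const double *in, double *out, size_t n)
{
#if defined(_OPENMP) && _OPENMP >= 201307
#pragma omp simd
#endif
    for (size_t i = 0; i < n; i++)
        out[i] = scaled_square(in[i]);
}
```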

IMHO entry points which take arbitrary length vectors in memory plus counts
are generally useful, you can do whatever you want under the hood, but
entrypoints with arguments directly in vector registers are useful too, you
don't have to use IFUNC for that, the compiler knows what version it
prefers, and if the body of the vectorized loop does more than just
compute a single transcendental function in a tight loop often will be more
handy than having to spill all the vectors to memory and pass it that way.

	Jakub


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-12  0:10           ` Carlos O'Donell
@ 2014-09-12 15:01             ` H.J. Lu
  2014-09-12 15:10               ` Carlos O'Donell
  0 siblings, 1 reply; 52+ messages in thread
From: H.J. Lu @ 2014-09-12 15:01 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Joseph S. Myers, Zamyatin, Igor, Andrew Senkevich, libc-alpha,
	Melik-Adamyan, Areg, jakub

On Thu, Sep 11, 2014 at 5:10 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 09/11/2014 04:57 PM, H.J. Lu wrote:
>>> That doesn't answer my question.  Maybe glibc 2.21 provides such versions
>>> for all x86 ISAs there are at present, up to AVX512 - and then a new
>>> extension AVX1024 appears.  When GCC 7 is used with glibc 2.21 headers and
>>> -mavx1024, it must not try to generate calls to the AVX1024 functions,
>>> because glibc 2.21 doesn't have such functions.  But maybe glibc 2.26 adds
>>> the AVX1024 functions.  So something needs to be different in the headers
>>> of 2.26 to inform GCC 7 that AVX1024 versions of the functions are
>>> available.  And I think that means the directive that communicates
>>> function availability to the compiler needs to identify the set of ISAs
>>> for which versions of the function in question are available.
>>>
>>
>> Wouldn't it be better to put libmvec in GCC instead?
>
> That's certainly a discussion we can have.
>
> What do you see as the pros and cons?
>

It depends on who are the main target users of this library.
If it is mainly for programmers to use them directly in their
applications, mostly independent of compilers, it should be
in glibc.  But if it is mainly used by GCC, it should be in
GCC, just like other run-time libraries.


-- 
H.J.


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-12 15:01             ` H.J. Lu
@ 2014-09-12 15:10               ` Carlos O'Donell
  2014-09-12 16:00                 ` Torvald Riegel
  2014-09-12 19:05                 ` H.J. Lu
  0 siblings, 2 replies; 52+ messages in thread
From: Carlos O'Donell @ 2014-09-12 15:10 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Joseph S. Myers, Zamyatin, Igor, Andrew Senkevich, libc-alpha,
	Melik-Adamyan, Areg, jakub

On 09/12/2014 11:01 AM, H.J. Lu wrote:
> On Thu, Sep 11, 2014 at 5:10 PM, Carlos O'Donell <carlos@redhat.com> wrote:
>> On 09/11/2014 04:57 PM, H.J. Lu wrote:
>>>> That doesn't answer my question.  Maybe glibc 2.21 provides such versions
>>>> for all x86 ISAs there are at present, up to AVX512 - and then a new
>>>> extension AVX1024 appears.  When GCC 7 is used with glibc 2.21 headers and
>>>> -mavx1024, it must not try to generate calls to the AVX1024 functions,
>>>> because glibc 2.21 doesn't have such functions.  But maybe glibc 2.26 adds
>>>> the AVX1024 functions.  So something needs to be different in the headers
>>>> of 2.26 to inform GCC 7 that AVX1024 versions of the functions are
>>>> available.  And I think that means the directive that communicates
>>>> function availability to the compiler needs to identify the set of ISAs
>>>> for which versions of the function in question are available.
>>>>
>>>
>>> Wouldn't it be better to put libmvec in GCC instead?
>>
>> That's certainly a discussion we can have.
>>
>> What do you see as the pros and cons?
>>
> 
> It depends on who are the main target users of this library.
> If it is mainly for programmers to use them directly in their
> applications, mostly independent of compilers, it should be
> in glibc.  But if it is mainly used by GCC, it should be in
> GCC, just like other run-time libraries.

The former. I want users to be able to call these functions
directly regardless of the compiler. The same goes for the
ppc-related API that has been in use for a long time by
developers there.

The compiler can certainly make use of these functions, and
any more standard cross-machine GNU API we design, but it
should always be possible to call them directly.

Does this mean libmvec should be in glibc?

Cheers,
Carlos.
 
 


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-12 15:10               ` Carlos O'Donell
@ 2014-09-12 16:00                 ` Torvald Riegel
  2014-09-12 17:37                   ` Carlos O'Donell
  2014-09-12 19:05                 ` H.J. Lu
  1 sibling, 1 reply; 52+ messages in thread
From: Torvald Riegel @ 2014-09-12 16:00 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: H.J. Lu, Joseph S. Myers, Zamyatin, Igor, Andrew Senkevich,
	libc-alpha, Melik-Adamyan, Areg, jakub

On Fri, 2014-09-12 at 11:10 -0400, Carlos O'Donell wrote:
> On 09/12/2014 11:01 AM, H.J. Lu wrote:
> > On Thu, Sep 11, 2014 at 5:10 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> >> On 09/11/2014 04:57 PM, H.J. Lu wrote:
> >>>> That doesn't answer my question.  Maybe glibc 2.21 provides such versions
> >>>> for all x86 ISAs there are at present, up to AVX512 - and then a new
> >>>> extension AVX1024 appears.  When GCC 7 is used with glibc 2.21 headers and
> >>>> -mavx1024, it must not try to generate calls to the AVX1024 functions,
> >>>> because glibc 2.21 doesn't have such functions.  But maybe glibc 2.26 adds
> >>>> the AVX1024 functions.  So something needs to be different in the headers
> >>>> of 2.26 to inform GCC 7 that AVX1024 versions of the functions are
> >>>> available.  And I think that means the directive that communicates
> >>>> function availability to the compiler needs to identify the set of ISAs
> >>>> for which versions of the function in question are available.
> >>>>
> >>>
> >>> Wouldn't it be better to put libmvec in GCC instead?
> >>
> >> That's certainly a discussion we can have.
> >>
> >> What do you see as the pros and cons?
> >>
> > 
> > It depends on who are the main target users of this library.
> > If it is mainly for programmers to use them directly in their
> > applications, mostly independent of compilers, it should be
> > in glibc.  But if it is mainly used by GCC, it should be in
> > GCC, just like other run-time libraries.
> 
> The former. I want users to be able to call these functions
> directly regardless of the compiler. The same goes for the
> ppc-related API that has been in use for a long time by
> developers there.

From a long-term maintenance perspective, it seems easier to support
higher-level SIMD / vectorization abstractions in programming languages
than glibc APIs for each and every HW vector instruction variant.  We'd
have an ABI to maintain too if we chose the runtime library route, but
then at least it's not another glibc API.


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-12  7:43         ` Jakub Jelinek
@ 2014-09-12 16:55           ` Andi Kleen
  2014-09-12 17:03           ` Joseph S. Myers
  1 sibling, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2014-09-12 16:55 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Andi Kleen, Rich Felker, Matthew Fortune, Joseph S. Myers,
	Andrew Senkevich, libc-alpha, igor.zamyatin, Melik-Adamyan, Areg

> Because GCC supports even another flavor, which presumably the patches
> implement.  The two you are mentioning are for compatibility with existing
> math libraries.  The third one is used by #pragma omp declare simd
> functions and Cilk+ elemental functions.  So, to use those you don't
> need any extra gcc support, glibc headers could just add
> #if defined(_OPENMP) && _OPENMP >= 201307
> #pragma omp declare simd
> #endif
> on the prototypes (of course maybe with some clauses if needed).

Makes sense. Thanks.

-Andi


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-12  7:43         ` Jakub Jelinek
  2014-09-12 16:55           ` Andi Kleen
@ 2014-09-12 17:03           ` Joseph S. Myers
  2014-09-12 17:09             ` Jakub Jelinek
  1 sibling, 1 reply; 52+ messages in thread
From: Joseph S. Myers @ 2014-09-12 17:03 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Andi Kleen, Rich Felker, Matthew Fortune, Andrew Senkevich,
	libc-alpha, igor.zamyatin, Melik-Adamyan, Areg

On Fri, 12 Sep 2014, Jakub Jelinek wrote:

> Because GCC supports even another flavor, which presumably the patches
> implement.  The two you are mentioning are for compatibility with existing
> math libraries.  The third one is used by #pragma omp declare simd
> functions and Cilk+ elemental functions.  So, to use those you don't
> need any extra gcc support, glibc headers could just add
> #if defined(_OPENMP) && _OPENMP >= 201307
> #pragma omp declare simd
> #endif
> on the prototypes (of course maybe with some clauses if needed).

That's what this patch does - but we need a way for the headers to declare 
to GCC which vector ISAs have such versions of a given function available, 
in a way that works both for (new GCC, old glibc) (GCC knows about newer 
vector ISAs without function versions in that glibc, and mustn't try to 
generate calls to functions that glibc doesn't have) and (old GCC, new 
glibc) (glibc is declaring availability of vector versions the old GCC 
doesn't know how to use, so the references to those vector versions need 
to be quietly ignored or conditional on GCC version).
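
The "quietly ignored" half of this requirement can be sketched with a per-function macro that expands to nothing when the compiler cannot use it. This is only an assumption-laden illustration: EXAMPLE_DECL_SIMD_halve and example_halve are invented names, not the macro from the patch, and the sketch does not address the harder question of telling the compiler which vector ISAs are covered.

```c
/* Hypothetical header fragment; EXAMPLE_DECL_SIMD_halve is an invented
   name, not a glibc macro.  */
#if defined(_OPENMP) && _OPENMP >= 201307
# define EXAMPLE_DECL_SIMD_halve _Pragma("omp declare simd notinbranch")
#else
# define EXAMPLE_DECL_SIMD_halve /* expands to nothing */
#endif

/* A compiler without OpenMP 4.0 support sees an ordinary prototype.  */
EXAMPLE_DECL_SIMD_halve
double example_halve(double x);

EXAMPLE_DECL_SIMD_halve
double example_halve(double x)
{
    return 0.5 * x;
}
```

Because the macro is per function, a header could define it only for functions that actually have vector versions in the accompanying library.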

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-12  7:18         ` Ondřej Bílka
@ 2014-09-12 17:04           ` Andi Kleen
  0 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2014-09-12 17:04 UTC (permalink / raw)
  To: Ondřej Bílka
  Cc: Andi Kleen, Rich Felker, Matthew Fortune, Joseph S. Myers,
	Andrew Senkevich, libc-alpha, igor.zamyatin, Melik-Adamyan, Areg,
	jakub

> Which has a problem when one wants to support users with svml, acml or
> nothing: package maintainers for some reason do not want to create three
> versions of the same package.

I assume once Linux establishes a standard vectorized 
math library (like this one), the other math libraries will follow
and start implementing compatible ABIs.

-Andi


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-12 17:03           ` Joseph S. Myers
@ 2014-09-12 17:09             ` Jakub Jelinek
  2014-09-15 12:36               ` Zamyatin, Igor
  2014-11-12 17:42               ` Andrew Senkevich
  0 siblings, 2 replies; 52+ messages in thread
From: Jakub Jelinek @ 2014-09-12 17:09 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Andi Kleen, Rich Felker, Matthew Fortune, Andrew Senkevich,
	libc-alpha, igor.zamyatin, Melik-Adamyan, Areg

On Fri, Sep 12, 2014 at 05:03:06PM +0000, Joseph S. Myers wrote:
> That's what this patch does - but we need a way for the headers to declare 
> to GCC which vector ISAs have such versions of a given function available, 
> in a way that works both for (new GCC, old glibc) (GCC knows about newer 
> vector ISAs without function versions in that glibc, and mustn't try to 
> generate calls to functions that glibc doesn't have) and (old GCC, new 
> glibc) (glibc is declaring availability of vector versions the old GCC 
> doesn't know how to use, so the references to those vector versions need 
> to be quietly ignored or conditional on GCC version).

In Cilk+ there is a way to tell which ISA the elemental function is compiled
for.  In OpenMP we've made a GCC ABI decision that on i?86/x86_64 all of
SSE2, AVX, AVX2 and AVX-512 passing conventions are used; those can be
emitted as aliases, thunks or real functions (have to optimize this on GCC
side at some point).  E.g. in Intel ABI (which uses different letters) it
always uses just SSE2.
When/if AVX-1024 is added, we won't change the ABI, so one will still have
to use two AVX-512 calls; but perhaps we can add as OpenMP extension
some clause like Cilk+ has to specify ISA.
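
For illustration, the symbol names of these variants can be sketched as below. The helper simd_variant_name is a hypothetical function written for this example; it handles only the simplest case (unmasked, one vector argument) and assumes the ISA letters used by GCC's x86 vector function ABI: 'b' for SSE2, 'c' for AVX, 'd' for AVX2 and 'e' for AVX-512. The AVX2 case reproduces the _ZGVdN4v_cos symbol named in the patch's ChangeLog.

```c
#include <stdio.h>

/* Build the name of an unmasked SIMD variant with one vector argument.
   isa:   'b' = SSE2, 'c' = AVX, 'd' = AVX2, 'e' = AVX-512
   lanes: number of elements processed per call
   Returns the snprintf result (length of the full name).  */
int simd_variant_name(char *buf, size_t size, char isa, unsigned lanes,
                      const char *name)
{
    /* "N" marks the unmasked (notinbranch) variant; "v" is one vector
       argument.  */
    return snprintf(buf, size, "_ZGV%cN%uv_%s", isa, lanes, name);
}
```

With double arguments the lane counts would be 2 for SSE2, 4 for AVX and AVX2, and 8 for AVX-512.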

	Jakub


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-12 16:00                 ` Torvald Riegel
@ 2014-09-12 17:37                   ` Carlos O'Donell
  2014-09-12 22:38                     ` Torvald Riegel
  0 siblings, 1 reply; 52+ messages in thread
From: Carlos O'Donell @ 2014-09-12 17:37 UTC (permalink / raw)
  To: Torvald Riegel
  Cc: H.J. Lu, Joseph S. Myers, Zamyatin, Igor, Andrew Senkevich,
	libc-alpha, Melik-Adamyan, Areg, jakub

On 09/12/2014 12:00 PM, Torvald Riegel wrote:
> On Fri, 2014-09-12 at 11:10 -0400, Carlos O'Donell wrote:
>> On 09/12/2014 11:01 AM, H.J. Lu wrote:
>>> On Thu, Sep 11, 2014 at 5:10 PM, Carlos O'Donell <carlos@redhat.com> wrote:
>>>> On 09/11/2014 04:57 PM, H.J. Lu wrote:
>>>>>> That doesn't answer my question.  Maybe glibc 2.21 provides such versions
>>>>>> for all x86 ISAs there are at present, up to AVX512 - and then a new
>>>>>> extension AVX1024 appears.  When GCC 7 is used with glibc 2.21 headers and
>>>>>> -mavx1024, it must not try to generate calls to the AVX1024 functions,
>>>>>> because glibc 2.21 doesn't have such functions.  But maybe glibc 2.26 adds
>>>>>> the AVX1024 functions.  So something needs to be different in the headers
>>>>>> of 2.26 to inform GCC 7 that AVX1024 versions of the functions are
>>>>>> available.  And I think that means the directive that communicates
>>>>>> function availability to the compiler needs to identify the set of ISAs
>>>>>> for which versions of the function in question are available.
>>>>>>
>>>>>
>>>>> Wouldn't it be better to put libmvec in GCC instead?
>>>>
>>>> That's certainly a discussion we can have.
>>>>
>>>> What do you see as the pros and cons?
>>>>
>>>
>>> It depends on who are the main target users of this library.
>>> If it is mainly for programmers to use them directly in their
>>> applications, mostly independent of compilers, it should be
>>> in glibc.  But if it is mainly used by GCC, it should be in
>>> GCC, just like other run-time libraries.
>>
>> The former. I want users to be able to call these functions
>> directly regardless of the compiler. The same goes for the
>> ppc-related API that has been in use for a long time by
>> developers there.
> 
> From a long-term maintenance perspective, it seems easier to support
> higher-level SIMD / vectorization abstractions in programming languages
> than glibc APIs for each and every HW vector instruction variant.  We'd
> have an ABI to maintain too if we chose the runtime library route, but
> then at least it's not another glibc API.

That is correct. However it also requires a much much much more complicated
API design because it has to map optimally to an arbitrarily different
future hardware design that we don't know of yet. We might extrapolate
that AVX-512 -> 1024 -> 2048 -> 4096 etc, and design in that direction
for all machines. Not to mention unified error reporting. It's a harder
problem than exposing, for now, the vendor APIs as a starting point,
as a place where the community can learn about the issues behind these
APIs and design something new.

Cheers,
Carlos.
 


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-12 15:10               ` Carlos O'Donell
  2014-09-12 16:00                 ` Torvald Riegel
@ 2014-09-12 19:05                 ` H.J. Lu
  2014-09-12 19:13                   ` Carlos O'Donell
  1 sibling, 1 reply; 52+ messages in thread
From: H.J. Lu @ 2014-09-12 19:05 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Joseph S. Myers, Zamyatin, Igor, Andrew Senkevich, libc-alpha,
	Melik-Adamyan, Areg, jakub

On Fri, Sep 12, 2014 at 8:10 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 09/12/2014 11:01 AM, H.J. Lu wrote:
>> On Thu, Sep 11, 2014 at 5:10 PM, Carlos O'Donell <carlos@redhat.com> wrote:
>>> On 09/11/2014 04:57 PM, H.J. Lu wrote:
>>>>> That doesn't answer my question.  Maybe glibc 2.21 provides such versions
>>>>> for all x86 ISAs there are at present, up to AVX512 - and then a new
>>>>> extension AVX1024 appears.  When GCC 7 is used with glibc 2.21 headers and
>>>>> -mavx1024, it must not try to generate calls to the AVX1024 functions,
>>>>> because glibc 2.21 doesn't have such functions.  But maybe glibc 2.26 adds
>>>>> the AVX1024 functions.  So something needs to be different in the headers
>>>>> of 2.26 to inform GCC 7 that AVX1024 versions of the functions are
>>>>> available.  And I think that means the directive that communicates
>>>>> function availability to the compiler needs to identify the set of ISAs
>>>>> for which versions of the function in question are available.
>>>>>
>>>>
>>>> Wouldn't it be better to put libmvec in GCC instead?
>>>
>>> That's certainly a discussion we can have.
>>>
>>> What do you see as the pros and cons?
>>>
>>
>> It depends on who are the main target users of this library.
>> If it is mainly for programmers to use them directly in their
>> applications, mostly independent of compilers, it should be
>> in glibc.  But if it is mainly used by GCC, it should be in
>> GCC, just like other run-time libraries.
>
> The former. I want users to be able to call these functions
> directly regardless of the compiler. The same goes for the
> ppc-related API that has been in use for a long time by
> developers there.
>
> The compiler can certainly make use of these functions, and
> any more standard cross-machine GNU API we design, but it
> should always be possible to call them directly.
>
> Does this mean libmvec should be in glibc?
>

If the target users are programmers,  we should make it easier
to use for programmers.  We can provide a generic API with a
generic implementation.  Each target can provide an optimized
version which is transparent to users.  We can use IFUNC to
select the best version at run-time.
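
A rough sketch of such a generic entry point follows. It deliberately does not use IFUNC itself, which performs the selection in the dynamic linker when the symbol is resolved; instead it uses a lazily initialized function pointer so the example stays portable. All names (vec_scale, scale_generic, scale_optimized, cpu_has_avx2) are hypothetical, and the "optimized" version is only a stand-in that computes the same result.

```c
#include <stddef.h>

typedef void (*scale_fn)(const double *in, double *out, size_t n);

/* Generic implementation, usable on any target.  */
static void scale_generic(const double *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = 2.0 * in[i];
}

/* Stand-in for a target-optimized version; results are identical.  */
static void scale_optimized(const double *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] + in[i];
}

/* Run-time CPU check; reports 0 where the builtin is unavailable.  */
static int cpu_has_avx2(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0;
#endif
}

/* Generic API: picks an implementation on first use.  The lazy
   initialization is not synchronized; this is a sketch only.  */
void vec_scale(const double *in, double *out, size_t n)
{
    static scale_fn impl = NULL;

    if (impl == NULL)
        impl = cpu_has_avx2() ? scale_optimized : scale_generic;
    impl(in, out, n);
}
```

The caller uses vec_scale the same way on every machine; which implementation runs is invisible to it.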



-- 
H.J.


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-12 19:05                 ` H.J. Lu
@ 2014-09-12 19:13                   ` Carlos O'Donell
  2014-09-12 19:31                     ` H.J. Lu
  0 siblings, 1 reply; 52+ messages in thread
From: Carlos O'Donell @ 2014-09-12 19:13 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Joseph S. Myers, Zamyatin, Igor, Andrew Senkevich, libc-alpha,
	Melik-Adamyan, Areg, jakub

On 09/12/2014 03:05 PM, H.J. Lu wrote:
> On Fri, Sep 12, 2014 at 8:10 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>> On 09/12/2014 11:01 AM, H.J. Lu wrote:
>>> On Thu, Sep 11, 2014 at 5:10 PM, Carlos O'Donell <carlos@redhat.com> wrote:
>>>> On 09/11/2014 04:57 PM, H.J. Lu wrote:
>>>>>> That doesn't answer my question.  Maybe glibc 2.21 provides such versions
>>>>>> for all x86 ISAs there are at present, up to AVX512 - and then a new
>>>>>> extension AVX1024 appears.  When GCC 7 is used with glibc 2.21 headers and
>>>>>> -mavx1024, it must not try to generate calls to the AVX1024 functions,
>>>>>> because glibc 2.21 doesn't have such functions.  But maybe glibc 2.26 adds
>>>>>> the AVX1024 functions.  So something needs to be different in the headers
>>>>>> of 2.26 to inform GCC 7 that AVX1024 versions of the functions are
>>>>>> available.  And I think that means the directive that communicates
>>>>>> function availability to the compiler needs to identify the set of ISAs
>>>>>> for which versions of the function in question are available.
>>>>>>
>>>>>
>>>>> Wouldn't it be better to put libmvec in GCC instead?
>>>>
>>>> That's certainly a discussion we can have.
>>>>
>>>> What do you see as the pros and cons?
>>>>
>>>
>>> It depends on who are the main target users of this library.
>>> If it is mainly for programmers to use them directly in their
>>> applications, mostly independent of compilers, it should be
>>> in glibc.  But if it is mainly used by GCC, it should be in
>>> GCC, just like other run-time libraries.
>>
>> The former. I want users to be able to call these functions
>> directly regardless of the compiler. The same goes for the
>> ppc-related API that has been in use for a long time by
>> developers there.
>>
>> The compiler can certainly make use of these functions, and
>> any more standard cross-machine GNU API we design, but it
>> should always be possible to call them directly.
>>
>> Does this mean libmvec should be in glibc?
>>
> 
> If the target users are programmers,  we should make it easier
> to use for programmers.  We can provide a generic API with a
> generic implementation.  Each target can provide an optimized
> version which is transparent to users.  We can use IFUNC to
> select the best version at run-time.

I think such an implementation is orthogonal to exposing the
already documented vector functions supported by Intel?

Similarly for IBM.

First and foremost we should support users who are expecting
to be able to call the functions Intel and IBM have already
documented.

Second to that we can create a new API?

Note that the generic API might be very difficult to design,
which is why I don't suggest we tackle it first.

Cheers,
Carlos.


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-12  5:33       ` Andi Kleen
  2014-09-12  7:18         ` Ondřej Bílka
  2014-09-12  7:43         ` Jakub Jelinek
@ 2014-09-12 19:18         ` Carlos O'Donell
  2014-09-12 19:20           ` Carlos O'Donell
  2 siblings, 1 reply; 52+ messages in thread
From: Carlos O'Donell @ 2014-09-12 19:18 UTC (permalink / raw)
  To: Andi Kleen, Rich Felker
  Cc: Matthew Fortune, Joseph S. Myers, Andrew Senkevich, libc-alpha,
	igor.zamyatin, Melik-Adamyan, Areg, jakub

On 09/12/2014 01:33 AM, Andi Kleen wrote:
> Rich Felker <dalias@libc.org> writes:
>>
>> This really seems like something the compiler should be doing --
>> translating parallelizable calls to the standard math functions into
>> calls to special simd versions (
> 
> Of course gcc already supports that. Even in two different flavours.
> 
> Not sure why the patch doesn't implement one of those ABIs though.

Please note that AFAIK gcc doesn't implement the ABI as documented
by Intel, but a variant, and my expectation is that we should *absolutely*
be targeting the gcc version of the Intel ABI otherwise what's the point?

>      -mveclibabi=type
>            Specifies the ABI type to use for vectorizing intrinsics
>            using an external library.
>            Supported values for type are svml for the Intel short vector
>            math library and acml for
>            the AMD math core library.  To use this option, both
>            -ftree-vectorize and
>            -funsafe-math-optimizations have to be enabled, and an SVML
>            or ACML ABI-compatible
>            library must be specified at link time.

Cheers,
Carlos.


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-12 19:18         ` Carlos O'Donell
@ 2014-09-12 19:20           ` Carlos O'Donell
  2014-09-12 19:56             ` Rich Felker
  0 siblings, 1 reply; 52+ messages in thread
From: Carlos O'Donell @ 2014-09-12 19:20 UTC (permalink / raw)
  To: Andi Kleen, Rich Felker
  Cc: Matthew Fortune, Joseph S. Myers, Andrew Senkevich, libc-alpha,
	igor.zamyatin, Melik-Adamyan, Areg, jakub

On 09/12/2014 03:17 PM, Carlos O'Donell wrote:
> On 09/12/2014 01:33 AM, Andi Kleen wrote:
>> Rich Felker <dalias@libc.org> writes:
>>>
>>> This really seems like something the compiler should be doing --
>>> translating parallelizable calls to the standard math functions into
>>> calls to special simd versions (
>>
>> Of course gcc already supports that. Even in two different flavours.
>>
>> Not sure why the patch doesn't implement one of those ABIs though.
> 
> Please note that AFAIK gcc doesn't implement the ABI as documented
> by Intel, but a variant, and my expectation is that we should *absolutely*
> be targeting the gcc version of the Intel ABI otherwise what's the point?

I see Jakub already responded that there is a 3rd ABI :}

c.


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-12 19:13                   ` Carlos O'Donell
@ 2014-09-12 19:31                     ` H.J. Lu
  0 siblings, 0 replies; 52+ messages in thread
From: H.J. Lu @ 2014-09-12 19:31 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Joseph S. Myers, Zamyatin, Igor, Andrew Senkevich, libc-alpha,
	Melik-Adamyan, Areg, jakub

On Fri, Sep 12, 2014 at 12:13 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 09/12/2014 03:05 PM, H.J. Lu wrote:
>> On Fri, Sep 12, 2014 at 8:10 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>>> On 09/12/2014 11:01 AM, H.J. Lu wrote:
>>>> On Thu, Sep 11, 2014 at 5:10 PM, Carlos O'Donell <carlos@redhat.com> wrote:
>>>>> On 09/11/2014 04:57 PM, H.J. Lu wrote:
>>>>>>> That doesn't answer my question.  Maybe glibc 2.21 provides such versions
>>>>>>> for all x86 ISAs there are at present, up to AVX512 - and then a new
>>>>>>> extension AVX1024 appears.  When GCC 7 is used with glibc 2.21 headers and
>>>>>>> -mavx1024, it must not try to generate calls to the AVX1024 functions,
>>>>>>> because glibc 2.21 doesn't have such functions.  But maybe glibc 2.26 adds
>>>>>>> the AVX1024 functions.  So something needs to be different in the headers
>>>>>>> of 2.26 to inform GCC 7 that AVX1024 versions of the functions are
>>>>>>> available.  And I think that means the directive that communicates
>>>>>>> function availability to the compiler needs to identify the set of ISAs
>>>>>>> for which versions of the function in question are available.
>>>>>>>
>>>>>>
>>>>>> Wouldn't it be better to put libmvec in GCC instead?
>>>>>
>>>>> That's certainly a discussion we can have.
>>>>>
>>>>> What do you see as the pros and cons?
>>>>>
>>>>
>>>> It depends on who are the main target users of this library.
>>>> If it is mainly for programmers to use them directly in their
>>>> applications, mostly independent of compilers, it should be
>>>> in glibc.  But if it is mainly used by GCC, it should be in
>>>> GCC, just like other run-time libraries.
>>>
>>> The former. I want users to be able to call these functions
>>> directly regardless of the compiler. The same goes for the
>>> ppc-related API that has been in use for a long time by
>>> developers there.
>>>
>>> The compiler can certainly make use of these functions, and
>>> any more standard cross-machine GNU API we design, but it
>>> should always be possible to call them directly.
>>>
>>> Does this mean libmvec should be in glibc?
>>>
>>
>> If the target users are programmers,  we should make it easier
>> to use for programmers.  We can provide a generic API with a
>> generic implementation.  Each target can provide an optimized
>> version which is transparent to users.  We can use IFUNC to
>> select the best version at run-time.
>
> I think such an implementation is orthogonal to exposing the
> already documented vector functions supported by Intel?
>
> Similarly for IBM.
>
> First and foremost we should support users who are expecting
> to be able to call the functions Intel and IBM have already
> documented.
>
> Second to that we can create a new API?
>
> Note that the generic API might be very difficult to design,
> which is why I don't suggest we tackle it first.
>

I think it is the best solution for our users.
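
A minimal sketch of the run-time selection idea quoted above, in C.  It
uses a plain function pointer rather than IFUNC itself, and GCC's
__builtin_cpu_supports for the feature check.  The names vec_square,
square_generic and square_avx2 are hypothetical, and the "optimized"
variant is only a placeholder that reuses the generic loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic API: apply one operation to n doubles.  */
typedef void (*vec_kernel_fn) (const double *in, double *out, size_t n);

/* Portable fallback, standing in for the generic implementation.  */
static void
square_generic (const double *in, double *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = in[i] * in[i];
}

/* Placeholder for a target-optimized version.  A real one would use
   AVX2 instructions; this sketch just reuses the generic loop.  */
static void
square_avx2 (const double *in, double *out, size_t n)
{
  square_generic (in, out, n);
}

/* Pick an implementation based on what the running CPU supports.  */
static vec_kernel_fn
select_square (void)
{
#if defined __GNUC__ && (defined __x86_64__ || defined __i386__)
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("avx2"))
    return square_avx2;
#endif
  return square_generic;
}

/* Public entry point: the selection is made on the first call and is
   invisible to the caller.  Not synchronized; this is only a sketch.  */
void
vec_square (const double *in, double *out, size_t n)
{
  static vec_kernel_fn impl;

  assert (in != NULL && out != NULL);
  if (impl == NULL)
    impl = select_square ();
  impl (in, out, n);
}
```

IFUNC would move the choice made in select_square to symbol binding
time, so callers would reach the selected variant without the extra
pointer check on each call.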


-- 
H.J.


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-12 19:20           ` Carlos O'Donell
@ 2014-09-12 19:56             ` Rich Felker
  2014-09-12 20:33               ` Jakub Jelinek
  0 siblings, 1 reply; 52+ messages in thread
From: Rich Felker @ 2014-09-12 19:56 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Andi Kleen, Matthew Fortune, Joseph S. Myers, Andrew Senkevich,
	libc-alpha, igor.zamyatin, Melik-Adamyan, Areg, jakub

On Fri, Sep 12, 2014 at 03:19:48PM -0400, Carlos O'Donell wrote:
> On 09/12/2014 03:17 PM, Carlos O'Donell wrote:
> > On 09/12/2014 01:33 AM, Andi Kleen wrote:
> >> Rich Felker <dalias@libc.org> writes:
> >>>
> >>> This really seems like something the compiler should be doing --
> >>> translating parallelizable calls to the standard math functions into
> >>> calls to special simd versions (
> >>
> >> Of course gcc already supports that. Even in two different flavours.
> >>
> >> Not sure why the patch doesn't implement one of those ABIs though.
> > 
> > Please note that AFAIK gcc doesn't implement the ABI as documented
> > by Intel, but a variant, and my expectation is that we should *absolutely*
> > be targeting the gcc version of the Intel ABI, otherwise what's the point?
> 
> I see Jakub already responded that there is a 3rd ABI :}

I don't see any point in targeting ABIs that don't have the function
names in a reserved namespace, since the compiler cannot automatically
make such transformations using a non-reserved namespace.

Rich


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-12 19:56             ` Rich Felker
@ 2014-09-12 20:33               ` Jakub Jelinek
  0 siblings, 0 replies; 52+ messages in thread
From: Jakub Jelinek @ 2014-09-12 20:33 UTC (permalink / raw)
  To: Rich Felker
  Cc: Carlos O'Donell, Andi Kleen, Matthew Fortune,
	Joseph S. Myers, Andrew Senkevich, libc-alpha, igor.zamyatin,
	Melik-Adamyan, Areg

On Fri, Sep 12, 2014 at 03:56:19PM -0400, Rich Felker wrote:
> On Fri, Sep 12, 2014 at 03:19:48PM -0400, Carlos O'Donell wrote:
> > On 09/12/2014 03:17 PM, Carlos O'Donell wrote:
> > > On 09/12/2014 01:33 AM, Andi Kleen wrote:
> > >> Rich Felker <dalias@libc.org> writes:
> > >>>
> > >>> This really seems like something the compiler should be doing --
> > >>> translating parallelizable calls to the standard math functions into
> > >>> calls to special simd versions (
> > >>
> > >> Of course gcc already supports that. Even in two different flavours.
> > >>
> > >> Not sure why the patch doesn't implement one of those ABIs though.
> > > 
> > > Please note that AFAIK gcc doesn't implement the ABI as documented
> > > by Intel, but a variant, and my expectation is that we should *absolutely*
> > > be targeting the gcc version of the Intel ABI, otherwise what's the point?
> > 
> > I see Jakub already responded that there is a 3rd ABI :}
> 
> I don't see any point in targeting ABIs that don't have the function
> names in a reserved namespace, since the compiler cannot automatically
> make such transformations using a non-reserved namespace.

???  The names are in a reserved namespace, e.g. for
#pragma omp declare simd
double foo (double x, double y, double z) { return x + y + z; }
on i?86/x86_64 the mangled names are
_ZGVbN2vvv_foo
_ZGVbM2vvv_foo
_ZGVcN4vvv_foo
_ZGVcM4vvv_foo
_ZGVdN4vvv_foo
_ZGVdM4vvv_foo
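
Read against the list above, each name appears to be composed as
_ZGV, an ISA letter (b, c and d, which I take to be SSE2, AVX and
AVX2), N or M for the unmasked or masked variant, the vector length,
one 'v' per vector argument, and the scalar name.  The C sketch here
rebuilds the names under that reading.  The helper simd_variant_name
is hypothetical and is not a GCC or glibc interface.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Compose "_ZGV" <isa> <N|M> <vlen> <'v' per vector argument> "_" <name>.
   Returns the value of snprintf, or -1 if there are too many arguments
   for the local buffer.  */
static int
simd_variant_name (char *buf, size_t size, char isa, int masked,
                   unsigned vlen, unsigned nvector_args, const char *name)
{
  char kinds[16];

  assert (buf != NULL && name != NULL);
  if (nvector_args >= sizeof kinds)
    return -1;

  /* One 'v' for each argument passed as a vector.  */
  memset (kinds, 'v', nvector_args);
  kinds[nvector_args] = '\0';

  return snprintf (buf, size, "_ZGV%c%c%u%s_%s",
                   isa, masked ? 'M' : 'N', vlen, kinds, name);
}
```

Under the same reading, simd_variant_name (buf, sizeof buf, 'd', 0, 4,
1, "cos") yields _ZGVdN4v_cos, the symbol the patch in this thread
exports.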

	Jakub


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-12 17:37                   ` Carlos O'Donell
@ 2014-09-12 22:38                     ` Torvald Riegel
  2014-09-12 22:47                       ` Carlos O'Donell
  0 siblings, 1 reply; 52+ messages in thread
From: Torvald Riegel @ 2014-09-12 22:38 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: H.J. Lu, Joseph S. Myers, Zamyatin, Igor, Andrew Senkevich,
	libc-alpha, Melik-Adamyan, Areg, jakub

On Fri, 2014-09-12 at 13:37 -0400, Carlos O'Donell wrote:
> On 09/12/2014 12:00 PM, Torvald Riegel wrote:
> > On Fri, 2014-09-12 at 11:10 -0400, Carlos O'Donell wrote:
> >> On 09/12/2014 11:01 AM, H.J. Lu wrote:
> >>> On Thu, Sep 11, 2014 at 5:10 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> >>>> On 09/11/2014 04:57 PM, H.J. Lu wrote:
> >>>>>> That doesn't answer my question.  Maybe glibc 2.21 provides such versions
> >>>>>> for all x86 ISAs there are at present, up to AVX512 - and then a new
> >>>>>> extension AVX1024 appears.  When GCC 7 is used with glibc 2.21 headers and
> >>>>>> -mavx1024, it must not try to generate calls to the AVX1024 functions,
> >>>>>> because glibc 2.21 doesn't have such functions.  But maybe glibc 2.26 adds
> >>>>>> the AVX1024 functions.  So something needs to be different in the headers
> >>>>>> of 2.26 to inform GCC 7 that AVX1024 versions of the functions are
> >>>>>> available.  And I think that means the directive that communicates
> >>>>>> function availability to the compiler needs to identify the set of ISAs
> >>>>>> for which versions of the function in question are available.
> >>>>>>
> >>>>>
> >>>>> Wouldn't it be better to put libmvec in GCC instead?
> >>>>
> >>>> That's certainly a discussion we can have.
> >>>>
> >>>> What do you see as the pros and cons?
> >>>>
> >>>
> >>> It depends on who are the main target users of this library.
> >>> If it is mainly for programmers to use them directly in their
> >>> applications, mostly independent of compilers, it should be
> >>> in glibc.  But if it is mainly used by GCC, it should be in
> >>> GCC, just like other run-time libraries.
> >>
> >> The former. I want users to be able to call these functions
> >> directly regardless of the compiler. The same goes for the
> >> ppc-related API that has been in use for a long time by
> >> developers there.
> > 
> > From a long-term maintenance perspective, it seems easier to support
> > higher-level SIMD / vectorization abstractions in programming languages
> > than glibc APIs for each and every HW vector instruction variant.  We'd
> > have an ABI to maintain too if we chose the runtime library route, but
> > then at least it's not another glibc API.
> 
> That is correct. However it also requires a much much much more complicated
> API design because it has to map optimally to an arbitrarily different
> future hardware design that we don't know of yet. We might extrapolate
> that AVX-512 -> 1024 -> 2048 -> 4096 etc, and design in that direction
> for all machines. Not to mention unified error reporting. It's a harder
> problem than exposing, for now, the vendor APIs as a starting point,
> as a place where the community can learn about the issues behind these
> APIs and design something new.

The language-level interfaces I was speaking about (they're not exactly
APIs) are different.  They have to expose some of the complexity you
mention (see the Cilk+ elemental functions), but hide a lot in the
compiler.  Standards-wise, the current direction does not seem to go
towards trying to have portable APIs for vector instructions, but rather
let programmers specify that SIMD parallelism is allowed and then give a
few hints about how a programmer-supplied function might be used (e.g.,
that one argument is always linear, etc.).  That's why I said that we'd
have complexity in the ABIs (and in the compiler), but are not exposing
a lot of that to programmers via APIs.
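
As an illustration of the kind of hints meant here, a small C sketch
using OpenMP 4.0 syntax.  The function and variable names are
hypothetical.  The uniform and linear clauses tell the compiler how the
arguments behave across SIMD lanes, and the vector variants and their
ABI stay the compiler's business.  Without OpenMP support enabled the
pragmas are ignored and the code runs as an ordinary scalar loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hints for vector variants of this function: `table' and `scale' have
   the same value in every lane (uniform), `i' grows by 1 from lane to
   lane (linear), and the function is never called under a condition
   inside the SIMD loop (notinbranch).  */
#pragma omp declare simd uniform(table, scale) linear(i:1) notinbranch
static double
scaled_entry (const double *table, int i, double scale)
{
  return table[i] * scale;
}

/* The caller only states that SIMD execution of the loop is allowed.  */
static void
scale_table (const double *table, double *out, int n, double scale)
{
  assert (table != NULL && out != NULL && n >= 0);
#pragma omp simd
  for (int i = 0; i < n; i++)
    out[i] = scaled_entry (table, i, scale);
}
```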


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-12 22:38                     ` Torvald Riegel
@ 2014-09-12 22:47                       ` Carlos O'Donell
  0 siblings, 0 replies; 52+ messages in thread
From: Carlos O'Donell @ 2014-09-12 22:47 UTC (permalink / raw)
  To: Torvald Riegel
  Cc: H.J. Lu, Joseph S. Myers, Zamyatin, Igor, Andrew Senkevich,
	libc-alpha, Melik-Adamyan, Areg, jakub

On 09/12/2014 06:38 PM, Torvald Riegel wrote:
>> That is correct. However it also requires a much much much more complicated
>> API design because it has to map optimally to an arbitrarily different
>> future hardware design that we don't know of yet. We might extrapolate
>> that AVX-512 -> 1024 -> 2048 -> 4096 etc, and design in that direction
>> for all machines. Not to mention unified error reporting. It's a harder
>> problem than exposing, for now, the vendor APIs as a starting point,
>> as a place where the community can learn about the issues behind these
>> APIs and design something new.
> 
> The language-level interfaces I was speaking about (they're not exactly
> APIs) are different.  They have to expose some of the complexity you
> mention (see the Cilk+ elemental functions), but hide a lot in the
> compiler.  Standards-wise, the current direction does not seem to go
> towards trying to have portable APIs for vector instructions, but rather
> let programmers specify that SIMD parallelism is allowed and then give a
> few hints about how a programmer-supplied function might be used (e.g.,
> that one argument is always linear, etc.).  That's why I said that we'd
> have complexity in the ABIs (and in the compiler), but are not exposing
> a lot of that to programmers via APIs.

That makes a certain amount of sense, but with the Intel functions
documented the developers expect to be able to write bespoke loops
and call those functions.

Perhaps for the generic implementation we may choose not to expose
that to programmers.

c.
 


* RE: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-12 17:09             ` Jakub Jelinek
@ 2014-09-15 12:36               ` Zamyatin, Igor
  2014-09-15 16:43                 ` Andi Kleen
  2014-11-12 17:42               ` Andrew Senkevich
  1 sibling, 1 reply; 52+ messages in thread
From: Zamyatin, Igor @ 2014-09-15 12:36 UTC (permalink / raw)
  To: Jakub Jelinek, Joseph S. Myers
  Cc: Andi Kleen, Rich Felker, Matthew Fortune, Andrew Senkevich,
	libc-alpha, Melik-Adamyan, Areg

> > That's what this patch does - but we need a way for the headers to
> > declare to GCC which vector ISAs have such versions of a given
> > function available, in a way that works both for (new GCC, old glibc)
> > (GCC knows about newer vector ISAs without function versions in that
> > glibc, and mustn't try to generate calls to functions that glibc
> > doesn't have) and (old GCC, new
> > glibc) (glibc is declaring availability of vector versions the old GCC
> > doesn't know how to use, so the references to those vector versions
> > need to be quietly ignored or conditional on GCC version).
> 
> In Cilk+ there is a way to tell which ISA the elemental function is compiled for.
> In OpenMP we've made a GCC ABI decision that on i?86/x86_64 all of SSE2,
> AVX, AVX2 and AVX-512 passing conventions are used; those can be emitted
> as aliases, thunks or real functions (have to optimize this on GCC side at some
> point).  E.g. in Intel ABI (which uses different letters) it always uses just SSE2.
> When/if AVX-1024 is added, we won't change the ABI, so one will still have to
> use two AVX-512 calls; but perhaps we can add as OpenMP extension some
> clause like Cilk+ has to specify ISA.

But this will not help in the case of a binary compiled for, say, an
AVX512 target and launched on a system whose glibc contains vector
versions only up to AVX2, right?

Thanks,
Igor


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-15 12:36               ` Zamyatin, Igor
@ 2014-09-15 16:43                 ` Andi Kleen
  0 siblings, 0 replies; 52+ messages in thread
From: Andi Kleen @ 2014-09-15 16:43 UTC (permalink / raw)
  To: Zamyatin, Igor
  Cc: Jakub Jelinek, Joseph S. Myers, Andi Kleen, Rich Felker,
	Matthew Fortune, Andrew Senkevich, libc-alpha, Melik-Adamyan,
	Areg

> But this will not help in the case of a binary compiled for, say, an
> AVX512 target and launched on a system whose glibc contains vector
> versions only up to AVX2, right?

You always need the same glibc you built with, or a newer one, to run.
It's no different from any other new symbol in glibc.

We just shouldn't have newer subset glibcs that do not contain some
new symbols, but I don't think anyone is proposing that.

-andi


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-11 19:32   ` Carlos O'Donell
  2014-09-11 20:19     ` Zamyatin, Igor
@ 2014-09-16 16:57     ` Andrew Senkevich
  2014-09-16 17:02       ` H.J. Lu
  1 sibling, 1 reply; 52+ messages in thread
From: Andrew Senkevich @ 2014-09-16 16:57 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Joseph S. Myers, libc-alpha, igor.zamyatin, Melik-Adamyan, Areg, jakub

Hi all,

I have added information to the Enhancing libm wiki page:
https://sourceware.org/glibc/wiki/libm#Addition_of_x86_64_vector_math_functions_to_Glibc


--
WBR,
Andrew


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-16 16:57     ` Andrew Senkevich
@ 2014-09-16 17:02       ` H.J. Lu
  2014-09-17  9:56         ` Andrew Senkevich
  0 siblings, 1 reply; 52+ messages in thread
From: H.J. Lu @ 2014-09-16 17:02 UTC (permalink / raw)
  To: Andrew Senkevich
  Cc: Carlos O'Donell, Joseph S. Myers, libc-alpha, Zamyatin, Igor,
	Melik-Adamyan, Areg, Jakub Jelinek

On Tue, Sep 16, 2014 at 9:56 AM, Andrew Senkevich
<andrew.n.senkevich@gmail.com> wrote:
> Hi all,
>
> I have added information to the Enhancing libm wiki page:
> https://sourceware.org/glibc/wiki/libm#Addition_of_x86_64_vector_math_functions_to_Glibc
>

The wiki says:

3.1. Goal

Main goal is to improve vectorization of GCC with OpenMP4.0 SIMD
constructs (#2.8 in http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf)
and Cilk Plus constructs (6-7 in
http://www.cilkplus.org/sites/default/files/open_specifications/Intel_Cilk_plus_lang_spec_1.2.htm)
on x86_64 by adding SSE4, AVX and AVX2 vector implementations of
several vector math functions (float and double versions). AVX-512
versions are planned to be added later. These functions can also be
used manually (with intrinsics) by developers to obtain speedup.

It is the opposite of

https://sourceware.org/ml/libc-alpha/2014-09/msg00277.html

which is for programmers to use them directly in their
applications, mostly independent of compilers.

We need to come to an agreement on what the goal is first.

-- 
H.J.


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-16 17:02       ` H.J. Lu
@ 2014-09-17  9:56         ` Andrew Senkevich
  2014-09-17 10:09           ` Jakub Jelinek
  0 siblings, 1 reply; 52+ messages in thread
From: Andrew Senkevich @ 2014-09-17  9:56 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Carlos O'Donell, Joseph S. Myers, libc-alpha, Zamyatin, Igor,
	Melik-Adamyan, Areg, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1106 bytes --]

> The wiki says:
>
> 3.1. Goal
>
> Main goal is to improve vectorization of GCC with OpenMP4.0 SIMD
> constructs (#2.8 in http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf)
> and Cilk Plus constructs (6-7 in
> http://www.cilkplus.org/sites/default/files/open_specifications/Intel_Cilk_plus_lang_spec_1.2.htm)
> on x86_64 by adding SSE4, AVX and AVX2 vector implementations of
> several vector math functions (float and double versions). AVX-512
> versions are planned to be added later. These functions can also be
> used manually (with intrinsics) by developers to obtain speedup.
>
> It is the opposite of
>
> https://sourceware.org/ml/libc-alpha/2014-09/msg00277.html
>
> which is for programmers to use them directly in their
> applications, mostly independent of compilers.
>
> We need to come to an agreement on what the goal is first.
>
> --
> H.J.

Hi H.J.,

of course the first goal is to improve vectorization. Usage with
intrinsics is an additional goal and is not very significant.

Attached is the first patch, corrected according to the last comments in
https://sourceware.org/ml/libc-alpha/2014-09/msg00182.html.
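
For readers of the attached AVX2 kernel, a scalar C sketch of the
reduction its comments describe: cos(x) = sin(x + Pi/2) =
(-1)^N * sin(R), with N obtained by the "right shifter" rounding trick.
This is not the code from the patch.  The name cos_sketch, the Taylor
coefficients, the single-constant Pi (the kernel subtracts Pi in three
parts) and the input bound are assumptions made to keep the example
short, so it is far less accurate than the real kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static double
cos_sketch (double x)
{
  const double pi = 3.14159265358979323846;
  const double right_shifter = 6755399441055744.0;  /* 1.5 * 2^52 */

  /* Arbitrary bound for this sketch; the kernel sends large arguments
     to the scalar cos instead.  */
  assert (x > -1.0e6 && x < 1.0e6);

  /* Y = (X + Pi/2) * InvPi + RS.  Adding RS rounds the quotient to an
     integer N, which lands in the low mantissa bits of Y.  */
  volatile double y = (x + pi / 2) * (1.0 / pi) + right_shifter;
  double y_val = y;
  double n = y_val - right_shifter;

  /* SignRes = Y << 63: the parity bit of N moved to the sign position.  */
  uint64_t y_bits;
  memcpy (&y_bits, &y_val, sizeof y_bits);
  uint64_t sign = y_bits << 63;

  /* R = X - (N - 0.5) * Pi, so that |R| <= Pi/2.  */
  double r = x - (n - 0.5) * pi;

  /* sin(R) from a Taylor polynomial up to R^13.  */
  double r2 = r * r;
  double poly = r * (1.0 + r2 * (-1.0 / 6 + r2 * (1.0 / 120
                + r2 * (-1.0 / 5040 + r2 * (1.0 / 362880
                + r2 * (-1.0 / 39916800 + r2 * (1.0 / 6227020800.0)))))));

  /* Res = Poly ^ SignRes: flip the sign when N is odd.  */
  uint64_t bits;
  memcpy (&bits, &poly, sizeof bits);
  bits ^= sign;
  memcpy (&poly, &bits, sizeof poly);
  return poly;
}
```

The integer N never has to be converted explicitly: its parity is read
straight from the bit pattern of Y, which is what lets the vector code
apply the sign with one shift and one XOR.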

[-- Attachment #2: vectorized_cos_v2.patch --]
[-- Type: application/octet-stream, Size: 15725 bytes --]

diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index 8a94a7e..f7e5e39 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -46,6 +46,17 @@
 # error "Never include <bits/mathcalls.h> directly; include <math.h> instead."
 #endif
 
+#undef __DECL_SIMD
+
+/* For now we have a vectorized version only for the _Mdouble_ case,
+   and only when OpenMP 4.0 SIMD constructs are enabled.  */
+#if !defined _Mfloat_ && !defined _Mlong_double_ \
+    && defined _OPENMP && _OPENMP >= 201307
+# define __DECL_SIMD _Pragma ("omp declare simd")
+#else
+# define __DECL_SIMD
+#endif
+
 
 /* Trigonometric functions.  */
 
@@ -60,6 +71,7 @@ __MATHCALL (atan,, (_Mdouble_ __x));
 __MATHCALL (atan2,, (_Mdouble_ __y, _Mdouble_ __x));
 
 /* Cosine of X.  */
+__DECL_SIMD
 __MATHCALL (cos,, (_Mdouble_ __x));
 /* Sine of X.  */
 __MATHCALL (sin,, (_Mdouble_ __x));
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist
index 2390934..bb791ea 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist
@@ -402,5 +402,8 @@ GLIBC_2.2.5
  yn F
  ynf F
  ynl F
+GLIBC_2.21
+ GLIBC_2.21 A
+ _ZGVdN4v_cos F
 GLIBC_2.4
  GLIBC_2.4 A
diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
new file mode 100644
index 0000000..1cb3ec5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/Makefile
@@ -0,0 +1,3 @@
+ifeq ($(subdir),math)
+libm-support += svml_d_cos4_core svml_d_cos_data
+endif
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
new file mode 100644
index 0000000..1717a7a
--- /dev/null
+++ b/sysdeps/x86_64/fpu/Versions
@@ -0,0 +1,7 @@
+libm {
+  GLIBC_2.21 {
+    # AVX2 vector variant of cos (4 doubles per call), named per the
+    # GCC vector function ABI for OpenMP "declare simd".
+    _ZGVdN4v_cos;
+  }
+}
diff --git a/sysdeps/x86_64/fpu/svml_d_cos4_core.S b/sysdeps/x86_64/fpu/svml_d_cos4_core.S
new file mode 100644
index 0000000..8334875
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cos4_core.S
@@ -0,0 +1,185 @@
+/* Function cos vectorized with AVX2.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+	.text
+ENTRY(_ZGVdN4v_cos)
+
+/* ALGORITHM DESCRIPTION:
+ *     
+ *    (low accuracy (< 4ulp) or enhanced performance (half of correct mantissa) implementation)
+ *     
+ *    Argument representation:
+ *    arg + Pi/2 = (N*Pi + R)
+ *    
+ *    Result calculation:
+ *    cos(arg) = sin(arg+Pi/2) = sin(N*Pi + R) = (-1)^N * sin(R)
+ *    sin(R) is approximated by corresponding polynomial
+ */
+        pushq     %rbp
+        movq      %rsp, %rbp
+        andq      $-64, %rsp
+        subq      $448, %rsp
+        movq      __gnu_svml_dcos_data@GOTPCREL(%rip), %rax
+        vmovapd   %ymm0, %ymm1
+        vmovupd   192(%rax), %ymm4
+        vmovupd   256(%rax), %ymm5
+
+/* ARGUMENT RANGE REDUCTION:
+ * Add Pi/2 to argument: X' = X+Pi/2
+ */
+        vaddpd    128(%rax), %ymm1, %ymm7
+
+/* Get absolute argument value: X' = |X'| */
+        vandpd    (%rax), %ymm7, %ymm2
+
+/* Y = X'*InvPi + RS : right shifter add */
+        vfmadd213pd %ymm5, %ymm4, %ymm7
+        vmovupd   1216(%rax), %ymm4
+
+/* Check for large arguments path */
+        vcmpnle_uqpd 64(%rax), %ymm2, %ymm3
+
+/* N = Y - RS : right shifter sub */
+        vsubpd    %ymm5, %ymm7, %ymm6
+        vmovupd   640(%rax), %ymm2
+
+/* SignRes = Y<<63 : shift LSB to MSB place for result sign */
+        vpsllq    $63, %ymm7, %ymm7
+
+/* N = N - 0.5 */
+        vsubpd    320(%rax), %ymm6, %ymm0
+        vmovmskpd %ymm3, %ecx
+
+/* R = X - N*Pi1 */
+        vmovapd   %ymm1, %ymm3
+        vfnmadd231pd %ymm0, %ymm2, %ymm3
+
+/* R = R - N*Pi2 */
+        vfnmadd231pd 704(%rax), %ymm0, %ymm3
+
+/* R = R - N*Pi3 */
+        vfnmadd132pd 768(%rax), %ymm3, %ymm0
+
+/* POLYNOMIAL APPROXIMATION:
+ * R2 = R*R
+ */
+        vmulpd    %ymm0, %ymm0, %ymm5
+        vfmadd213pd 1152(%rax), %ymm5, %ymm4
+        vfmadd213pd 1088(%rax), %ymm5, %ymm4
+        vfmadd213pd 1024(%rax), %ymm5, %ymm4
+
+/* Poly = C3+R2*(C4+R2*(C5+R2*(C6+R2*C7))) */
+        vfmadd213pd 960(%rax), %ymm5, %ymm4
+        vfmadd213pd 896(%rax), %ymm5, %ymm4
+        vfmadd213pd 832(%rax), %ymm5, %ymm4
+        vmulpd    %ymm5, %ymm4, %ymm6
+        vfmadd213pd %ymm0, %ymm0, %ymm6
+
+/* RECONSTRUCTION:
+ * Final sign setting: Res = Poly^SignRes 
+ */
+        vxorpd    %ymm7, %ymm6, %ymm0
+        testl     %ecx, %ecx
+        jne       _LBL_1_3
+
+_LBL_1_2:
+        movq      %rbp, %rsp
+        popq      %rbp
+        ret
+
+_LBL_1_3:
+        vmovupd   %ymm1, 320(%rsp)
+        vmovupd   %ymm0, 384(%rsp)
+        je        _LBL_1_2
+
+        xorb      %dl, %dl
+        xorl      %eax, %eax
+        vmovups   %ymm8, 224(%rsp)
+        vmovups   %ymm9, 192(%rsp)
+        vmovups   %ymm10, 160(%rsp)
+        vmovups   %ymm11, 128(%rsp)
+        vmovups   %ymm12, 96(%rsp)
+        vmovups   %ymm13, 64(%rsp)
+        vmovups   %ymm14, 32(%rsp)
+        vmovups   %ymm15, (%rsp)
+        movq      %rsi, 264(%rsp)
+        movq      %rdi, 256(%rsp)
+        movq      %r12, 296(%rsp)
+        movb      %dl, %r12b
+        movq      %r13, 288(%rsp)
+        movl      %ecx, %r13d
+        movq      %r14, 280(%rsp)
+        movl      %eax, %r14d
+        movq      %r15, 272(%rsp)
+
+_LBL_1_6:
+        btl       %r14d, %r13d
+        jc        _LBL_1_12
+
+_LBL_1_7:
+        lea       1(%r14), %esi
+        btl       %esi, %r13d
+        jc        _LBL_1_10
+
+_LBL_1_8:
+        incb      %r12b
+        addl      $2, %r14d
+        cmpb      $16, %r12b
+        jb        _LBL_1_6
+
+        vmovups   224(%rsp), %ymm8
+        vmovups   192(%rsp), %ymm9
+        vmovups   160(%rsp), %ymm10
+        vmovups   128(%rsp), %ymm11
+        vmovups   96(%rsp), %ymm12
+        vmovups   64(%rsp), %ymm13
+        vmovups   32(%rsp), %ymm14
+        vmovups   (%rsp), %ymm15
+        vmovupd   384(%rsp), %ymm0
+        movq      264(%rsp), %rsi
+        movq      256(%rsp), %rdi
+        movq      296(%rsp), %r12
+        movq      288(%rsp), %r13
+        movq      280(%rsp), %r14
+        movq      272(%rsp), %r15
+        jmp       _LBL_1_2
+
+_LBL_1_10:
+        movzbl    %r12b, %r15d
+        shlq      $4, %r15
+        vmovsd    328(%rsp,%r15), %xmm0
+        vzeroupper
+
+        call      __cos@PLT
+
+        vmovsd    %xmm0, 392(%rsp,%r15)
+        jmp       _LBL_1_8
+
+_LBL_1_12:
+        movzbl    %r12b, %r15d
+        shlq      $4, %r15
+        vmovsd    320(%rsp,%r15), %xmm0
+        vzeroupper
+
+        call      __cos@PLT
+
+        vmovsd    %xmm0, 384(%rsp,%r15)
+        jmp       _LBL_1_7
+END(_ZGVdN4v_cos)
diff --git a/sysdeps/x86_64/fpu/svml_d_cos_data.S b/sysdeps/x86_64/fpu/svml_d_cos_data.S
new file mode 100644
index 0000000..7bb1aba
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cos_data.S
@@ -0,0 +1,426 @@
+/* Data for vectorized cos.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+	.section .rodata, "a"
+
+	.align 64
+	.globl __gnu_svml_dcos_data
+__gnu_svml_dcos_data:
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	0
+	.long	1096810496
+	.long	0
+	.long	1096810496
+	.long	0
+	.long	1096810496
+	.long	0
+	.long	1096810496
+	.long	0
+	.long	1096810496
+	.long	0
+	.long	1096810496
+	.long	0
+	.long	1096810496
+	.long	0
+	.long	1096810496
+	.long	1413754136
+	.long	1073291771
+	.long	1413754136
+	.long	1073291771
+	.long	1413754136
+	.long	1073291771
+	.long	1413754136
+	.long	1073291771
+	.long	1413754136
+	.long	1073291771
+	.long	1413754136
+	.long	1073291771
+	.long	1413754136
+	.long	1073291771
+	.long	1413754136
+	.long	1073291771
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	0
+	.long	1127743488
+	.long	0
+	.long	1127743488
+	.long	0
+	.long	1127743488
+	.long	0
+	.long	1127743488
+	.long	0
+	.long	1127743488
+	.long	0
+	.long	1127743488
+	.long	0
+	.long	1127743488
+	.long	0
+	.long	1127743488
+	.long	0
+	.long	1071644672
+	.long	0
+	.long	1071644672
+	.long	0
+	.long	1071644672
+	.long	0
+	.long	1071644672
+	.long	0
+	.long	1071644672
+	.long	0
+	.long	1071644672
+	.long	0
+	.long	1071644672
+	.long	0
+	.long	1071644672
+	.long	1073741824
+	.long	1074340347
+	.long	1073741824
+	.long	1074340347
+	.long	1073741824
+	.long	1074340347
+	.long	1073741824
+	.long	1074340347
+	.long	1073741824
+	.long	1074340347
+	.long	1073741824
+	.long	1074340347
+	.long	1073741824
+	.long	1074340347
+	.long	1073741824
+	.long	1074340347
+	.long	0
+	.long	1048855597
+	.long	0
+	.long	1048855597
+	.long	0
+	.long	1048855597
+	.long	0
+	.long	1048855597
+	.long	0
+	.long	1048855597
+	.long	0
+	.long	1048855597
+	.long	0
+	.long	1048855597
+	.long	0
+	.long	1048855597
+	.long	2147483648
+	.long	1023952536
+	.long	2147483648
+	.long	1023952536
+	.long	2147483648
+	.long	1023952536
+	.long	2147483648
+	.long	1023952536
+	.long	2147483648
+	.long	1023952536
+	.long	2147483648
+	.long	1023952536
+	.long	2147483648
+	.long	1023952536
+	.long	2147483648
+	.long	1023952536
+	.long	1880851354
+	.long	998820945
+	.long	1880851354
+	.long	998820945
+	.long	1880851354
+	.long	998820945
+	.long	1880851354
+	.long	998820945
+	.long	1880851354
+	.long	998820945
+	.long	1880851354
+	.long	998820945
+	.long	1880851354
+	.long	998820945
+	.long	1880851354
+	.long	998820945
+	.long	1413754136
+	.long	1074340347
+	.long	1413754136
+	.long	1074340347
+	.long	1413754136
+	.long	1074340347
+	.long	1413754136
+	.long	1074340347
+	.long	1413754136
+	.long	1074340347
+	.long	1413754136
+	.long	1074340347
+	.long	1413754136
+	.long	1074340347
+	.long	1413754136
+	.long	1074340347
+	.long	856972294
+	.long	1017226790
+	.long	856972294
+	.long	1017226790
+	.long	856972294
+	.long	1017226790
+	.long	856972294
+	.long	1017226790
+	.long	856972294
+	.long	1017226790
+	.long	856972294
+	.long	1017226790
+	.long	856972294
+	.long	1017226790
+	.long	856972294
+	.long	1017226790
+	.long	688016905
+	.long	962338001
+	.long	688016905
+	.long	962338001
+	.long	688016905
+	.long	962338001
+	.long	688016905
+	.long	962338001
+	.long	688016905
+	.long	962338001
+	.long	688016905
+	.long	962338001
+	.long	688016905
+	.long	962338001
+	.long	688016905
+	.long	962338001
+	.long	1431655591
+	.long	3217380693
+	.long	1431655591
+	.long	3217380693
+	.long	1431655591
+	.long	3217380693
+	.long	1431655591
+	.long	3217380693
+	.long	1431655591
+	.long	3217380693
+	.long	1431655591
+	.long	3217380693
+	.long	1431655591
+	.long	3217380693
+	.long	1431655591
+	.long	3217380693
+	.long	286303400
+	.long	1065423121
+	.long	286303400
+	.long	1065423121
+	.long	286303400
+	.long	1065423121
+	.long	286303400
+	.long	1065423121
+	.long	286303400
+	.long	1065423121
+	.long	286303400
+	.long	1065423121
+	.long	286303400
+	.long	1065423121
+	.long	286303400
+	.long	1065423121
+	.long	430291053
+	.long	3207201184
+	.long	430291053
+	.long	3207201184
+	.long	430291053
+	.long	3207201184
+	.long	430291053
+	.long	3207201184
+	.long	430291053
+	.long	3207201184
+	.long	430291053
+	.long	3207201184
+	.long	430291053
+	.long	3207201184
+	.long	430291053
+	.long	3207201184
+	.long	2150694560
+	.long	1053236707
+	.long	2150694560
+	.long	1053236707
+	.long	2150694560
+	.long	1053236707
+	.long	2150694560
+	.long	1053236707
+	.long	2150694560
+	.long	1053236707
+	.long	2150694560
+	.long	1053236707
+	.long	2150694560
+	.long	1053236707
+	.long	2150694560
+	.long	1053236707
+	.long	1174413873
+	.long	3193628213
+	.long	1174413873
+	.long	3193628213
+	.long	1174413873
+	.long	3193628213
+	.long	1174413873
+	.long	3193628213
+	.long	1174413873
+	.long	3193628213
+	.long	1174413873
+	.long	3193628213
+	.long	1174413873
+	.long	3193628213
+	.long	1174413873
+	.long	3193628213
+	.long	1470296608
+	.long	1038487144
+	.long	1470296608
+	.long	1038487144
+	.long	1470296608
+	.long	1038487144
+	.long	1470296608
+	.long	1038487144
+	.long	1470296608
+	.long	1038487144
+	.long	1470296608
+	.long	1038487144
+	.long	1470296608
+	.long	1038487144
+	.long	1470296608
+	.long	1038487144
+	.long	135375560
+	.long	3177836758
+	.long	135375560
+	.long	3177836758
+	.long	135375560
+	.long	3177836758
+	.long	135375560
+	.long	3177836758
+	.long	135375560
+	.long	3177836758
+	.long	135375560
+	.long	3177836758
+	.long	135375560
+	.long	3177836758
+	.long	135375560
+	.long	3177836758
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	0
+	.long	1127219200
+	.long	0
+	.long	1127219200
+	.long	0
+	.long	1127219200
+	.long	0
+	.long	1127219200
+	.long	0
+	.long	1127219200
+	.long	0
+	.long	1127219200
+	.long	0
+	.long	1127219200
+	.long	0
+	.long	1127219200
+	.long	4294967295
+	.long	1127219199
+	.long	4294967295
+	.long	1127219199
+	.long	4294967295
+	.long	1127219199
+	.long	4294967295
+	.long	1127219199
+	.long	4294967295
+	.long	1127219199
+	.long	4294967295
+	.long	1127219199
+	.long	4294967295
+	.long	1127219199
+	.long	4294967295
+	.long	1127219199
+	.long	8388606
+	.long	1127219200
+	.long	8388606
+	.long	1127219200
+	.long	8388606
+	.long	1127219200
+	.long	8388606
+	.long	1127219200
+	.long	8388606
+	.long	1127219200
+	.long	8388606
+	.long	1127219200
+	.long	8388606
+	.long	1127219200
+	.long	8388606
+	.long	1127219200
+	.type	__gnu_svml_dcos_data,@object
+	.size	__gnu_svml_dcos_data,1600

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-17  9:56         ` Andrew Senkevich
@ 2014-09-17 10:09           ` Jakub Jelinek
  2014-09-17 11:58             ` Andrew Senkevich
  0 siblings, 1 reply; 52+ messages in thread
From: Jakub Jelinek @ 2014-09-17 10:09 UTC (permalink / raw)
  To: Andrew Senkevich
  Cc: H.J. Lu, Carlos O'Donell, Joseph S. Myers, libc-alpha,
	Zamyatin, Igor, Melik-Adamyan, Areg

On Wed, Sep 17, 2014 at 01:56:06PM +0400, Andrew Senkevich wrote:
> > The wiki says:
> >
> > 3.1. Goal
> >
> > Main goal is to improve vectorization of GCC with OpenMP4.0 SIMD
> > constructs (#2.8 in http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf)
> > and Cilk Plus constructs (6-7 in
> > http://www.cilkplus.org/sites/default/files/open_specifications/Intel_Cilk_plus_lang_spec_1.2.htm)
> > on x86_64 by adding SSE4, AVX and AVX2 vector implementations of
> > several vector math functions (float and double versions). AVX-512
> > versions are planned to be added later. These functions can be also
> > used manually (with intrinsics) by developers to obtain speedup.
> >
> > It is the opposite of
> >
> > https://sourceware.org/ml/libc-alpha/2014-09/msg00277.html
> >
> > which is for programmers to use them directly in their
> > applications, mostly independent of compilers.
> >
> > We need to come to an agreement on what goal is first.
> >
> > --
> > H.J.
> 
> Hi H.J.,
> 
> Of course, the first goal is to improve vectorization. Usage with
> intrinsics is an additional goal and is not very significant.
> 
> Attached is the first patch, corrected according to the last comments in
> https://sourceware.org/ml/libc-alpha/2014-09/msg00182.html.

--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -46,6 +46,17 @@
 # error "Never include <bits/mathcalls.h> directly; include <math.h> instead."
 #endif
 
+#undef __DECL_SIMD
+
+/* For now we have vectorized version only for _Mdouble_ case */
+#if !defined _Mfloat_ && !defined _Mlong_double_
+# if defined _OPENMP && _OPENMP >= 201307
+#  define __DECL_SIMD _Pragma ("omp declare simd")

As the function is provided only on x86_64, it needs to be guarded
by defined __x86_64__ too (or there needs to be some way for arch-specific
headers to tell which functions are elemental).

Also, only the N (notinbranch) version is provided, so you'd
need to use "omp declare simd notinbranch". Furthermore, only
the AVX2 version is provided, which is not possible for gcc:
you need all of the SSE2, AVX and AVX2 versions. The other two can be
thunked (extract the arguments and call cos in a loop or similar, then
pass the result back in a vector reg).
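
A thunk of the kind described here can be sketched in plain C. This is only an illustration under stated assumptions: the type vec4d, the function thunk4 and the placeholder demo_scalar are hypothetical names, a struct stands in for the vector register, and a real thunk would follow the vector calling convention and call cos itself.

```c
#include <stddef.h>

/* Hypothetical stand-in for a 256-bit register holding four doubles.  */
typedef struct { double lane[4]; } vec4d;

/* Hypothetical thunk: unpack the four lanes, call the scalar routine
   once per lane, and pack the results into a vector again.  */
vec4d
thunk4 (double (*scalar) (double), vec4d x)
{
  vec4d r;
  for (size_t i = 0; i < 4; i++)
    r.lane[i] = scalar (x.lane[i]);	/* one scalar call per lane */
  return r;
}

/* Placeholder scalar routine so the sketch does not depend on libm;
   a real thunk for the vector cos would pass cos here.  */
double
demo_scalar (double x)
{
  return 2.0 * x + 1.0;
}
```

Such a wrapper gives no speedup over the scalar loop; it only makes the symbol for a given ISA exist so that compiler-generated calls resolve.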

	Jakub

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-17 10:09           ` Jakub Jelinek
@ 2014-09-17 11:58             ` Andrew Senkevich
  2014-09-17 12:34               ` Joseph S. Myers
  0 siblings, 1 reply; 52+ messages in thread
From: Andrew Senkevich @ 2014-09-17 11:58 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: H.J. Lu, Carlos O'Donell, Joseph S. Myers, libc-alpha,
	Zamyatin, Igor, Melik-Adamyan, Areg

Hi Jakub,

> +/* For now we have vectorized version only for _Mdouble_ case */
> +#if !defined _Mfloat_ && !defined _Mlong_double_
> +# if defined _OPENMP && _OPENMP >= 201307
> +#  define __DECL_SIMD _Pragma ("omp declare simd")
>
> As the function is provided only on x86_64, it needs to be guarded
> by defined __x86_64__ too (or there needs to be some way for arch-specific
> headers to tell which functions are elemental).
> Also, only the N (notinbranch) version is provided, so you'd
> need to use "omp declare simd notinbranch". Furthermore, only
> the AVX2 version is provided, which is not possible for gcc:
> you need all of the SSE2, AVX and AVX2 versions. The other two can be
> thunked (extract the arguments and call cos in a loop or similar, then
> pass the result back in a vector reg).

Thank you, this place will look like this:

#if defined (__x86_64__) && !defined _Mfloat_ && !defined _Mlong_double_ \
     && defined _OPENMP && _OPENMP >= 201307
# define __DECL_SIMD _Pragma ("omp declare simd notinbranch")
#else
# define __DECL_SIMD
#endif

Only AVX will be thunked; SSE4 is already implemented.
It will be added in the next patch soon.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-17 11:58             ` Andrew Senkevich
@ 2014-09-17 12:34               ` Joseph S. Myers
  0 siblings, 0 replies; 52+ messages in thread
From: Joseph S. Myers @ 2014-09-17 12:34 UTC (permalink / raw)
  To: Andrew Senkevich
  Cc: Jakub Jelinek, H.J. Lu, Carlos O'Donell, libc-alpha,
	Zamyatin, Igor, Melik-Adamyan, Areg

On Wed, 17 Sep 2014, Andrew Senkevich wrote:

> Thank you, this place will look like this:
> 
> #if defined (__x86_64__) && !defined _Mfloat_ && !defined _Mlong_double_ \
>      && defined _OPENMP && _OPENMP >= 201307
> # define __DECL_SIMD _Pragma ("omp declare simd notinbranch")
> #else
> # define __DECL_SIMD
> #endif

No, we never put architecture conditionals like that in 
architecture-independent headers.  You have to use bits/*.h headers 
(per-architecture) instead to provide all the information about which 
functions have which vectorized versions.  See what I suggested in 
<https://sourceware.org/ml/libc-alpha/2014-09/msg00182.html> about macros 
such as __DECL_SIMD_COS_DOUBLE.
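
The split suggested here can be sketched as two header fragments shown in one file. This is a minimal sketch, not the agreed design: only the macro name __DECL_SIMD_COS_DOUBLE comes from the discussion, while the placement of the two parts and the _OPENMP test in the per-architecture part are assumptions made for illustration.

```c
/* Part 1: what a per-architecture bits/ header might contain (the file
   is hypothetical).  It is selected by the build system, so it needs no
   architecture conditional.  */
#if defined _OPENMP && _OPENMP >= 201307
# define __DECL_SIMD_COS_DOUBLE _Pragma ("omp declare simd notinbranch")
#endif

/* Part 2: the architecture-independent header.  It falls back to an
   empty macro when the per-architecture header announced nothing.  */
#ifndef __DECL_SIMD_COS_DOUBLE
# define __DECL_SIMD_COS_DOUBLE
#endif

/* The declaration itself stays free of architecture conditionals.  */
__DECL_SIMD_COS_DOUBLE
extern double cos (double __x);
```

An architecture without a vector cos simply never defines the macro, and the declaration degrades to the plain scalar prototype.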

There's no point in sending more patch revisions until there's consensus 
on the overall implementation approach.  Please go back several steps and 
start an architecture-independent discussion in a new 
architecture-independent thread seeking consensus on all the points I 
raised in <https://sourceware.org/ml/libc-alpha/2014-09/msg00182.html>.  
State your initial answers to these points so people can agree or disagree 
with them, but be prepared to change following the results of the 
discussion.  Only once there is such consensus, write up the results on 
the wiki, seek confirmation on the list that other people agree that what 
you wrote up is an accurate representation of the consensus, and only then 
start sending patches (which will probably put functions in libmvec, not 
libm, based on the discussions so far).  The patches have to follow the 
consensus, not your preference if the consensus goes against what you 
prefer.

<https://sourceware.org/glibc/wiki/libm> does not in any way reflect any 
sort of consensus.  It's *ideas and proposals*.  Consensus can only be 
reached after discussions on libc-alpha (and if no-one comments on 
something, that's not consensus).  In general, proposals are best posted 
to libc-alpha so it's easy to tell what was being discussed in a 
particular thread - proposals on wiki pages make it harder to follow the 
discussion, as you need to work out which version of the wiki page someone 
was referring to.

Note that if we're relying on #pragma omp declare simd meaning a precise 
set of function versions are available - with a guarantee that no future 
compiler version will interpret it as also meaning an AVX512 version is 
available, for example, so that it's safely possible to use older glibc 
with a newer compiler - there should be some sort of ABI document 
(preferably compiler-independent) stating that this is the meaning of that 
pragma on x86_64 and that this pragma says nothing about availability of 
function versions for other vector extensions and that if an ABI is 
defined for such versions in future, it will use a different pragma to 
declare their availability.  Is there such an ABI document available that 
defines what this pragma means on x86_64?

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-12 17:09             ` Jakub Jelinek
  2014-09-15 12:36               ` Zamyatin, Igor
@ 2014-11-12 17:42               ` Andrew Senkevich
  2014-11-12 17:52                 ` Jakub Jelinek
  1 sibling, 1 reply; 52+ messages in thread
From: Andrew Senkevich @ 2014-11-12 17:42 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: libc-alpha, Joseph S. Myers

2014-09-12 21:08 GMT+04:00 Jakub Jelinek <jakub@redhat.com>:

> On Fri, Sep 12, 2014 at 05:03:06PM +0000, Joseph S. Myers wrote:
>> That's what this patch does - but we need a way for the headers to declare
>> to GCC which vector ISAs have such versions of a given function available,
>> in a way that works both for (new GCC, old glibc) (GCC knows about newer
>> vector ISAs without function versions in that glibc, and mustn't try to
>> generate calls to functions that glibc doesn't have) and (old GCC, new
>> glibc) (glibc is declaring availability of vector versions the old GCC
>> doesn't know how to use, so the references to those vector versions need
>> to be quietly ignored or conditional on GCC version).

> In Cilk+ there is a way to tell which ISA the elemental function is compiled
> for.  In OpenMP we've made a GCC ABI decision that on i?86/x86_64 all of
> SSE2, AVX, AVX2 and AVX-512 passing conventions are used; those can be
> emitted as aliases, thunks or real functions (have to optimize this on GCC
> side at some point).  E.g. in Intel ABI (which uses different letters) it
> always uses just SSE2.
> When/if AVX-1024 is added, we won't change the ABI, so one will still have
> to use two AVX-512 calls; but perhaps we can add as OpenMP extension
> some clause like Cilk+ has to specify ISA.

As far as we can see, the processor clause is unfortunately not parsed by
GCC, so there is so far no way to specify the ISA for either OpenMP or Cilk+.

GCC 4.9 and 5.0 can both vectorize for SSE2, AVX, AVX2 and AVX-512 (with
-mavx512f, two calls to the AVX2 version are generated).
For x86_64 we will add all of the SSE2, AVX and AVX2 versions (some through
wrappers).

Jakub, do you think we can plan to add processor clauses in GCC 6?

If so, we can split the SIMD declarations in math.h into two parts:
one without the processor clause, used when the GCC version is < 6.0 (we
know that these compiler versions and glibc are in sync with respect to
the available versions of the vector math functions), and one with the
processor clause, used when the GCC version is >= 6.0.

Does that look reasonable?


--
WBR,
Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-11-12 17:42               ` Andrew Senkevich
@ 2014-11-12 17:52                 ` Jakub Jelinek
  2014-11-12 18:12                   ` Joseph Myers
  0 siblings, 1 reply; 52+ messages in thread
From: Jakub Jelinek @ 2014-11-12 17:52 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: libc-alpha, Joseph S. Myers

On Wed, Nov 12, 2014 at 09:42:11PM +0400, Andrew Senkevich wrote:
> 2014-09-12 21:08 GMT+04:00 Jakub Jelinek <jakub@redhat.com>:
> 
> > On Fri, Sep 12, 2014 at 05:03:06PM +0000, Joseph S. Myers wrote:
> >> That's what this patch does - but we need a way for the headers to declare
> >> to GCC which vector ISAs have such versions of a given function available,
> >> in a way that works both for (new GCC, old glibc) (GCC knows about newer
> >> vector ISAs without function versions in that glibc, and mustn't try to
> >> generate calls to functions that glibc doesn't have) and (old GCC, new
> >> glibc) (glibc is declaring availability of vector versions the old GCC
> >> doesn't know how to use, so the references to those vector versions need
> >> to be quietly ignored or conditional on GCC version).
> 
> > In Cilk+ there is a way to tell which ISA the elemental function is compiled
> > for.  In OpenMP we've made a GCC ABI decision that on i?86/x86_64 all of
> > SSE2, AVX, AVX2 and AVX-512 passing conventions are used; those can be
> > emitted as aliases, thunks or real functions (have to optimize this on GCC
> > side at some point).  E.g. in Intel ABI (which uses different letters) it
> > always uses just SSE2.
> > When/if AVX-1024 is added, we won't change the ABI, so one will still have
> > to use two AVX-512 calls; but perhaps we can add as OpenMP extension
> > some clause like Cilk+ has to specify ISA.
> 
> As far as we can see, the processor clause is unfortunately not parsed by
> GCC, so there is so far no way to specify the ISA for either OpenMP or Cilk+.

Processor clause is parsed for Cilk+.  OpenMP doesn't have anything like
that, and I'm not sure it would be appropriate for the standard, because
the standard is not specific to a single CPU architecture and all parties
would need to agree on the names.  You actually don't care about the
processors anyway, but about the vector ISA.

> GCC 4.9 and 5.0 can both vectorize for SSE2, AVX, AVX2 and AVX-512 (with
> -mavx512f, two calls to the AVX2 version are generated).

Indeed, those 3 are what GCC emits for OpenMP, AVX-512 is not emitted,
for Cilk+ it should emit SSE2 unless processor clause is used.

	Jakub

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-11-12 17:52                 ` Jakub Jelinek
@ 2014-11-12 18:12                   ` Joseph Myers
  2014-11-17 19:51                     ` Zamyatin, Igor
  0 siblings, 1 reply; 52+ messages in thread
From: Joseph Myers @ 2014-11-12 18:12 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Andrew Senkevich, libc-alpha

On Wed, 12 Nov 2014, Jakub Jelinek wrote:

> Processor clause is parsed for Cilk+.  OpenMP doesn't have anything like
> that, and I'm not sure it would be appropriate for the standard, because
> the standard is not specific to a single CPU architecture and all parties
> would need to agree on the names.  You actually don't care about the
> processors anyway, but about the vector ISA.

An alternative to having a processor clause now would be having an ABI/API 
document for OpenMP on x86_64 - agreed between implementations - that 
specifies what vector versions of a function the standard pragma means are 
available, and specifies that implementations must not generate calls to 
versions not listed unless some non-standard pragma is used to declare 
those other versions to be available (which would put off defining such a 
non-standard pragma until there is a desire to have vector versions for 
newer ISAs).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-11-12 18:12                   ` Joseph Myers
@ 2014-11-17 19:51                     ` Zamyatin, Igor
  2014-11-17 23:55                       ` Joseph Myers
  0 siblings, 1 reply; 52+ messages in thread
From: Zamyatin, Igor @ 2014-11-17 19:51 UTC (permalink / raw)
  To: Joseph Myers, Jakub Jelinek; +Cc: Andrew Senkevich, libc-alpha

> On Wed, 12 Nov 2014, Jakub Jelinek wrote:
> 
> > Processor clause is parsed for Cilk+.  OpenMP doesn't have anything
> > like that, and I'm not sure it would be appropriate for the standard,
> > because the standard is not specific to a single CPU architecture and
> > all parties would need to agree on the names.  You actually don't care
> > about the processors anyway, but about the vector ISA.

Right, it's about the vector ISA. Still, some mechanism for specifying the exact ISA to be used looks useful for different needs...
BTW, the processor clause was replaced by the architecture clause in Cilk Plus 1.2. This clause seems to serve exactly that purpose, specifying the ISA; I will find out more details about it.

> 
> An alternative to having a processor clause now would be having an ABI/API
> document for OpenMP on x86_64 - agreed between implementations - that
> specifies what vector versions of a function the standard pragma means are
> available, and specifies that implementations must not generate calls to
> versions not listed unless some non-standard pragma is used to declare
> those other versions to be available (which would put off defining such a
> non-standard pragma until there is a desire to have vector versions for
> newer ISAs).

We can prepare a document that describes what the compiler (GCC 4.9 and GCC 5) can generate for x86_64 (and of course make sure that we have all those versions in glibc) and put it somewhere on gcc.gnu.org (e.g. Release notes?) and, say, on the glibc wiki.
Will that be enough for now?

Thanks,
Igor

> 
> --
> Joseph S. Myers
> joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-11-17 19:51                     ` Zamyatin, Igor
@ 2014-11-17 23:55                       ` Joseph Myers
  2014-11-27 16:13                         ` Andrew Senkevich
  0 siblings, 1 reply; 52+ messages in thread
From: Joseph Myers @ 2014-11-17 23:55 UTC (permalink / raw)
  To: Zamyatin, Igor; +Cc: Jakub Jelinek, Andrew Senkevich, libc-alpha

On Mon, 17 Nov 2014, Zamyatin, Igor wrote:

> > An alternative to having a processor clause now would be having an ABI/API
> > document for OpenMP on x86_64 - agreed between implementations - that
> > specifies what vector versions of a function the standard pragma means are
> > available, and specifies that implementations must not generate calls to
> > versions not listed unless some non-standard pragma is used to declare
> > those other versions to be available (which would put off defining such a
> > non-standard pragma until there is a desire to have vector versions for
> > newer ISAs).
> 
> We can prepare a document that describes what compiler (gcc 4.9 and 
> gcc5) can generate (and of course make sure that we have all those 
> versions in glibc) for x86_64 and put it somewhere on gcc.gnu.org (e.g. 
> Release notes?) and, say, on glibc wiki. Will it be enough for now?

I'm thinking of a document that multiple implementations have accepted as 
describing the intended semantics of the pragma as regards what function 
versions may be assumed to be present, so that we can expect glibc using 
that pragma in installed headers to work with future versions of multiple 
compilers, rather than something GCC-specific.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-11-17 23:55                       ` Joseph Myers
@ 2014-11-27 16:13                         ` Andrew Senkevich
  2014-11-27 17:17                           ` Joseph Myers
  0 siblings, 1 reply; 52+ messages in thread
From: Andrew Senkevich @ 2014-11-27 16:13 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Zamyatin, Igor, Jakub Jelinek, libc-alpha

2014-11-18 2:55 GMT+03:00 Joseph Myers <joseph@codesourcery.com>:
> On Mon, 17 Nov 2014, Zamyatin, Igor wrote:
>
>> > An alternative to having a processor clause now would be having an ABI/API
>> > document for OpenMP on x86_64 - agreed between implementations - that
>> > specifies what vector versions of a function the standard pragma means are
>> > available, and specifies that implementations must not generate calls to
>> > versions not listed unless some non-standard pragma is used to declare
>> > those other versions to be available (which would put off defining such a
>> > non-standard pragma until there is a desire to have vector versions for
>> > newer ISAs).
>>
>> We can prepare a document that describes what compiler (gcc 4.9 and
>> gcc5) can generate (and of course make sure that we have all those
>> versions in glibc) for x86_64 and put it somewhere on gcc.gnu.org (e.g.
>> Release notes?) and, say, on glibc wiki. Will it be enough for now?
>
> I'm thinking of a document that multiple implementations have accepted as
> describing the intended semantics of the pragma as regards what function
> versions may be assumed to be present, so that we can expect glibc using
> that pragma in installed headers to work with future versions of multiple
> compilers, rather than something GCC-specific.

Joseph,

here is a draft version of such a document; could you please review it?

GLIBC 2.21 VECTOR MATH FUNCTIONS X86_64 ABI/API

This document describes the x86_64 API of the vector math functions
added in Glibc 2.21 and contains the following parts:
1. Vector math functions
2. Auto vectorization and usage model with GCC
3. Variants of available vector math functions names
4. List of vector functions and their ISA specific names

1. Vector math functions

Vector math functions are vector variants of the corresponding scalar math
operations, currently implemented using the SIMD ISA extensions SSE4, AVX
and AVX2 (for now the AVX version is implemented as a wrapper that makes
two calls to the SSE4 version). They take packed vector arguments, perform
the operation on each element of the packed vector argument, and return a
packed vector result.

Vector math functions are expected to be faster than repeatedly called
scalar equivalents in most cases. However, these vector versions
differ from their scalar analogues in accuracy and in behavior on special
values. The functions are optimized for performance on their respective
domains when processing does not encounter special cases such as denormal
values, overflow, underflow, or out-of-range arguments. Special values
are processed in a scalar fashion via calls to the respective scalar
routines. Additionally, functions such as the trigonometric ones may resort
to scalar processing of huge (or other) arguments that do not necessarily
cause special values, but rather require different and less SIMD-friendly
handling.
These functions have been tested to pass a 4-ulp maximum relative error
criterion on their domains in round-to-nearest computation mode.

C99 compliance in terms of special values and errno:
a) Functions may not raise exceptions as required by the C language
standard. Functions may raise spurious exceptions. This is considered
an artifact of SIMD processing and may be fixed in the future on a
case-by-case basis.
b) Functions may not change errno in some of the required cases, e.g.
if the SIMD-friendly algorithm is done branch-free without a libm call
for that value. This is done for performance reasons.
c) As the implementation is dependent on libm, some accuracy and
special-case problems may be inherited from it.
d) Functions do not guarantee fully correct results in computation
modes other than round-to-nearest.

2. Auto vectorization and usage model with GCC

Vector math functions were added to Glibc with the goal of utilizing the SIMD
constructs of OpenMP 4.0 (#2.8 in
http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf). Support for Cilk Plus
SIMD constructs will also be added later.

The standard header math.h was changed by adding the OpenMP declare
simd directive to functions which have vector versions.
This directive has clauses for specifying additional properties of the
vector implementations. For instance, for the vector function “cos”
implemented in the SSE4 ISA,
#pragma omp declare simd notinbranch simdlen(2)
was added to its declaration in math.h.

Starting from version 4.9, GCC requires a command like
gcc test.c -I/PATH_TO_GLIBC_2.21/include/ -L/PATH_TO_GLIBC_2.21/lib/
-fopenmp -ffast-math -lm -O1
(with architecture selection via -mavx, -mavx2 or default -msse4)
for auto vectorization of the following code in test.c:
#include <math.h>

int N = 3200;
double b[3200];
double a[3200];
int main (void)
{
  int i;
  #pragma omp simd
  for (i=0; i<N; i+=1)
  {
    b[i]=cos (a[i]);
  }
  return (0);
}

The exact names of the functions to which the compiler can generate calls
are described in the next part.

3. Variants of available vector math functions names

The name of a vector function created by GCC is based on the Intel Vector
Function ABI (http://www.cilkplus.org/sites/default/files/open_specifications/Intel-ABI-Vector-Function-2012-v0.9.5.pdf)
with a small difference in the part of the name specifying the ISA: namely
letters b, c, d instead of x, y, Y.

For compatibility with GCC, the corresponding names were adopted for the
vector math functions in Glibc.

#pragma omp declare simd notinbranch simdlen(2) for some function
“func” means that the name of the vector version is:
_ZGVbN2v_func (the SSE4 implementation).

#pragma omp declare simd notinbranch simdlen(4) for some function
“func” means that the following names are available:
_ZGVcN4v_func (the AVX implementation)
and
_ZGVdN4v_func (the AVX2 implementation).

Every vector function should be provided by Glibc for each supported
ISA (currently SSE4, AVX and AVX2).
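
The naming pattern above can be reproduced with a small helper. This is a sketch only: vector_name is a hypothetical function, it covers just the single-argument notinbranch case listed in this draft, and reading 'N' as the notinbranch marker and 'v' as one vector parameter is an interpretation of the names shown here.

```c
#include <stdio.h>

/* Hypothetical helper: build the vector variant name for a function
   with one vector argument and no mask.
   isa is 'b' (SSE4), 'c' (AVX) or 'd' (AVX2) as in the draft above.
   Returns what snprintf returns: the length of the full name.  */
int
vector_name (char *buf, size_t size, char isa, int simdlen,
	     const char *func)
{
  /* "_ZGV" prefix, ISA letter, 'N' for notinbranch, vector length,
     'v' for the single vector parameter, then the scalar name.  */
  return snprintf (buf, size, "_ZGV%cN%dv_%s", isa, simdlen, func);
}
```

Called with 'd', 4 and "cos", the helper produces the AVX2 name from the ChangeLog of the first patch, _ZGVdN4v_cos.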

4. List of vector functions and their ISA specific names

Glibc 2.21 contains the following vector version names of math functions:

a) cos: _ZGVbN2v_cos, _ZGVcN4v_cos, _ZGVdN4v_cos

--
WBR,
Andrew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-11-27 16:13                         ` Andrew Senkevich
@ 2014-11-27 17:17                           ` Joseph Myers
  2014-11-27 19:24                             ` Andrew Senkevich
  0 siblings, 1 reply; 52+ messages in thread
From: Joseph Myers @ 2014-11-27 17:17 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: Zamyatin, Igor, Jakub Jelinek, libc-alpha

[-- Attachment #1: Type: text/plain, Size: 1835 bytes --]

On Thu, 27 Nov 2014, Andrew Senkevich wrote:

> here is draft version of such a document, could you please review it?
> 
> GLIBC 2.21 VECTOR MATH FUNCTIONS X86_64 ABI/API

Most of this document is not part of what needs agreeing between compiler 
implementations (it could e.g. form part of an article informing people 
about new features in glibc 2.21).

> 3. Variants of available vector math functions names
> 
> Name of vector function created by GCC is based on Intel Vector
> Function ABI (http://www.cilkplus.org/sites/default/files/open_specifications/Intel-ABI-Vector-Function-2012-v0.9.5.pdf)
> with a little difference in part of name specifying ISA – namely
> letters b, c, d instead of x, y, Y.
> 
> For compatibility with GCC according names was taken for vector math
> functions in Glibc.
> 
> #pragma omp declare simd notinbranch simdlen(2) for some function
> “func” means what the name of vector version is:
> _ZGVbN2v_func (it is SSE4 implementation).
> 
> #pragma omp declare simd notinbranch simdlen(4) for some function
> “func” means what the following names are available:
> _ZGVcN4v_func (it is AVX implementation)
> and
> _ZGVdN4v_func (it is AVX2 implementation).

This is all that is needed.  It needs to be written in a 
compiler-independent manner (and agreed between compilers), and state 
explicitly that the semantics of those pragmas are independent of the 
processor for which code is being generated (so, for example, those 
pragmas must not be interpreted as meaning AVX512 versions of functions 
are available even if code is being built for a processor with AVX512 
support) and that any future ABI extension that defines additional vector 
function versions will also define a different pragma to declare their 
availability.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-11-27 17:17                           ` Joseph Myers
@ 2014-11-27 19:24                             ` Andrew Senkevich
  2014-11-27 20:12                               ` Joseph Myers
  0 siblings, 1 reply; 52+ messages in thread
From: Andrew Senkevich @ 2014-11-27 19:24 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Zamyatin, Igor, Jakub Jelinek, libc-alpha

2014-11-27 20:17 GMT+03:00 Joseph Myers <joseph@codesourcery.com>:
> This is all that is needed.  It needs to be written in a
> compiler-independent manner (and agreed between compilers), and state
> explicitly that the semantics of those pragmas are independent of the
> processor for which code is being generated (so, for example, those
> pragmas must not be interpreted as meaning AVX512 versions of functions
> are available even if code is being built for a processor with AVX512
> support) and that any future ABI extension that defines additional vector
> function versions will also define a different pragma to declare their
> availability.

GLIBC 2.21 VECTOR MATH FUNCTION NAME VARIANTS FOR x86_64

The name of a vector math function is based on the Intel Vector Function ABI
(http://www.cilkplus.org/sites/default/files/open_specifications/Intel-ABI-Vector-Function-2012-v0.9.5.pdf)
with one small difference in the part of the name that specifies the ISA:
the letters b, c, d are used instead of x, y, Y.

#pragma omp declare simd notinbranch simdlen(2) for some function
“func” means that the name of the vector version is:

_ZGVbN2v_func (the SSE4 implementation).

#pragma omp declare simd notinbranch simdlen(4) for some function
“func” means that the following names are available:

_ZGVcN4v_func (the AVX implementation)
and
_ZGVdN4v_func (the AVX2 implementation).

Every vector function should be provided by Glibc for each supported
ISA (currently SSE4, AVX and AVX2).

Semantics of those pragmas are independent of the processor for which
code is being generated.

Those pragmas must not be interpreted as meaning AVX512 versions of
functions are available even if code is being built for a processor
with AVX512 support.

Any future ABI extension that defines additional vector function
versions will also define a different pragma to declare their
availability.
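
To make the naming scheme above concrete, here is a minimal sketch in C
that assembles such a name from its parts.  The helper simd_variant_name
is hypothetical (it is not part of Glibc or of any compiler); it only
assumes the layout shown above: the _ZGV prefix, the ISA letter (b, c or
d), N for the notinbranch (unmasked) variant, the simdlen, and one v for
the single vector argument.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, for illustration only: build the vector variant
   name for a one-argument, notinbranch function.
   isa     - ISA letter from the draft: 'b' (SSE4), 'c' (AVX), 'd' (AVX2)
   simdlen - number of lanes, as given in simdlen(...)
   func    - scalar function name, e.g. "cos"
   Returns the value of snprintf (length of the full name).  */
int simd_variant_name(char isa, unsigned simdlen, const char *func,
                      char *buf, size_t size)
{
    /* "_ZGV" + ISA letter + 'N' (no mask) + simdlen + 'v' (one vector
       parameter) + '_' + scalar name.  */
    return snprintf(buf, size, "_ZGV%cN%uv_%s", isa, simdlen, func);
}
```

Called with 'd', 4 and "cos", this produces _ZGVdN4v_cos, which is the
symbol added by the patch in this thread.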

Is this OK, and where is the proper place to put this document?  I think
it should be included in the documentation rather than put on the wiki.

--
WBR,
Andrew


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-11-27 19:24                             ` Andrew Senkevich
@ 2014-11-27 20:12                               ` Joseph Myers
  0 siblings, 0 replies; 52+ messages in thread
From: Joseph Myers @ 2014-11-27 20:12 UTC (permalink / raw)
  To: Andrew Senkevich; +Cc: Zamyatin, Igor, Jakub Jelinek, libc-alpha

[-- Attachment #1: Type: text/plain, Size: 3209 bytes --]

On Thu, 27 Nov 2014, Andrew Senkevich wrote:

> 2014-11-27 20:17 GMT+03:00 Joseph Myers <joseph@codesourcery.com>:
> > This is all that is needed.  It needs to be written in a
> > compiler-independent manner (and agreed between compilers), and state
> > explicitly that the semantics of those pragmas are independent of the
> > processor for which code is being generated (so, for example, those
> > pragmas must not be interpreted as meaning AVX512 versions of functions
> > are available even if code is being built for a processor with AVX512
> > support) and that any future ABI extension that defines additional vector
> > function versions will also define a different pragma to declare their
> > availability.
> 
> GLIBC 2.21 VECTOR MATH FUNCTION NAME VARIANTS FOR x86_64

Should be something like: "OpenMP vector function ABI for x86_64" (as it's 
about the meaning of #pragma omp declare simd for x86_64, and then glibc 
is simply a user - any other library could equally rely on such an ABI to 
describe the meaning of such a pragma in its headers).

> #pragma omp declare simd notinbranch simdlen(2) for some function
> “func” means that the name of the vector version is:
> 
> _ZGVbN2v_func (the SSE4 implementation).
> 
> #pragma omp declare simd notinbranch simdlen(4) for some function
> “func” means that the following names are available:
> 
> _ZGVcN4v_func (the AVX implementation)
> and
> _ZGVdN4v_func (the AVX2 implementation).
> 
> Every vector function should be provided by Glibc for each supported
> ISA (currently SSE4, AVX and AVX2).

The generic statement would seem to mean: if simdlen(2) is used then only 
the SSE4 version is needed; if simdlen(4) is used then the AVX and AVX2 
versions are needed; only if both are used are all three versions then 
needed.  Is that accurate?  This ABI document should be defining what any 
library can rely on when providing SIMD function implementations and an 
associated installed header that may be used with multiple compilers, 
rather than talking specifically about glibc choices.

> Those pragmas must not be interpreted as meaning AVX512 versions of
> functions are available even if code is being built for a processor
> with AVX512 support.

This should say "for example" or similar (AVX512 is simply an illustrative 
example).

> Is it Ok and where is the proper place to put this document, include
> to documentation I think, not to wiki?

I suggest putting the OpenMP ABI for x86_64 alongside that other ABI 
document you mentioned.  Or, you could add all the relevant OpenMP 
information to the x86_64 ABI document: 
http://www.x86-64.org/svn/trunk/x86-64-ABI/ - discussed on the 
x86-64-abi@googlegroups.com mailing list.

The important thing is not so much where it goes - it's that OpenMP 
maintainers for relevant compilers (probably GCC, LLVM, Intel compiler) 
agree that it represents how they intend to interpret the pragmas, so that 
when glibc installs headers with those pragmas we can be confident 
compilers will interpret them as intended and so programs can be built 
with one compiler to use these functions in glibc built with another 
compiler.
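
As an illustration of the header-plus-caller situation described above,
here is a minimal sketch in C.  The function scale_shift and the caller
apply are hypothetical names invented for this example; the first two
lines stand in for what an installed library header would contain.
Whether a given compiler actually emits a call to a vector variant (for
instance a name of the form _ZGVdN4v_scale_shift when targeting AVX2)
depends on the compiler, its options, and the ABI agreement discussed
in this thread; without OpenMP SIMD support enabled the pragmas are
ignored and the loop stays scalar.

```c
#include <stddef.h>

/* What a library header might declare: a scalar prototype annotated so
   that the compiler may assume 4-lane, unmasked vector variants exist.  */
#pragma omp declare simd notinbranch simdlen(4)
double scale_shift(double x);

/* User code: a loop the compiler is allowed to vectorize by calling a
   vector variant of scale_shift instead of the scalar function.  */
void apply(double *out, const double *in, size_t n)
{
#pragma omp simd
    for (size_t i = 0; i < n; i++)
        out[i] = scale_shift(in[i]);
}

/* Scalar definition, kept in the same file only so that the sketch is
   self-contained; in the library case it would live in the library.  */
double scale_shift(double x)
{
    return 2.0 * x + 1.0;
}
```

The point of the ABI text is that the annotated prototype, not the
compiler's target options, determines which vector names the caller may
reference.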

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
  2014-09-10 15:15 Andrew Senkevich
@ 2014-09-10 16:55 ` Adhemerval Zanella
  0 siblings, 0 replies; 52+ messages in thread
From: Adhemerval Zanella @ 2014-09-10 16:55 UTC (permalink / raw)
  To: libc-alpha

On 10-09-2014 12:14, Andrew Senkevich wrote:
> Patch attached here

Before anything else, I would like to ask you to read the contributor checklist [1].
First, such a change will require you to sign an FSF copyright assignment before any
kind of review.

You also need to describe the intent of your patch: is it an optimization
of current behavior? Just "Patch attached here" does not say anything about it.
What is the performance evaluation? Which benchmarks did you use? Did you run
the testcase? Does the ULPs file need an update?

For such changes, the best way is to provide an internal symbol selected by IFUNC.
I will let the x86 maintainers chime in, but adding a new symbol under GLIBC 2.2.5
*and* an external PLT call is unacceptable, IMHO.

The patch also needs a proper ChangeLog. In short: please read the checklist first.

[1] https://sourceware.org/glibc/wiki/Contribution%20checklist

>
>
> --
> WBR,
> Andrew
> diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
> index 8a94a7e..ebfa583 100644
> --- a/math/bits/mathcalls.h
> +++ b/math/bits/mathcalls.h
> @@ -46,6 +46,17 @@
>  # error "Never include <bits/mathcalls.h> directly; include <math.h> instead."
>  #endif
>  
> +#undef __DECL_SIMD
> +
> +#if defined _OPENMP && _OPENMP >= 201307
> +/* For now we have vectorized version only for _Mdouble_ case */
> +# ifdef _Mdouble_
> +#  define __DECL_SIMD _Pragma ("omp declare simd")
> +# endif
> +#else
> +# define __DECL_SIMD
> +#endif
> +
>  
>  /* Trigonometric functions.  */
>  
> @@ -60,6 +71,7 @@ __MATHCALL (atan,, (_Mdouble_ __x));
>  __MATHCALL (atan2,, (_Mdouble_ __y, _Mdouble_ __x));
>  
>  /* Cosine of X.  */
> +__DECL_SIMD
>  __MATHCALL (cos,, (_Mdouble_ __x));
>  /* Sine of X.  */
>  __MATHCALL (sin,, (_Mdouble_ __x));
> diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist
> index 2390934..1aa3099 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist
> @@ -89,6 +89,7 @@ GLIBC_2.18
>  GLIBC_2.2.5
>   GLIBC_2.2.5 A
>   _LIB_VERSION D 0x4
> + _ZGVdN4v_cos F
>   __clog10 F
>   __clog10f F
>   __clog10l F
> diff --git a/sysdeps/unix/sysv/linux/x86_64/64/localplt.data b/sysdeps/unix/sysv/linux/x86_64/64/localplt.data
> new file mode 100644
> index 0000000..1a683d9
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/x86_64/64/localplt.data
> @@ -0,0 +1,10 @@
> +# See scripts/check-localplt.awk for how this file is processed.
> +# PLT use is required for the malloc family and for matherr because
> +# users can define their own functions and have library internals call them.
> +libc.so: calloc
> +libc.so: free
> +libc.so: malloc
> +libc.so: memalign
> +libc.so: realloc
> +libm.so: matherr
> +libm.so: cos
> diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
> new file mode 100644
> index 0000000..1cb3ec5
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/Makefile
> @@ -0,0 +1,3 @@
> +ifeq ($(subdir),math)
> +libm-support += svml_d_cos4_core svml_d_cos_data
> +endif
> diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
> new file mode 100644
> index 0000000..d30fbb3
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/Versions
> @@ -0,0 +1,7 @@
> +libm {
> +  GLIBC_2.2.5 {
> +    # A generic bug got this omitted from other configurations' version
> +    # sets, but we always had it.
> +    _ZGVdN4v_cos;
> +  }
> +}
> diff --git a/sysdeps/x86_64/fpu/svml_d_cos4_core.S b/sysdeps/x86_64/fpu/svml_d_cos4_core.S
> new file mode 100644
> index 0000000..7316d2b
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_cos4_core.S
> @@ -0,0 +1,185 @@
> +/* Function cos vectorized with AVX2.
> +   Copyright (C) 2014 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <sysdep.h>
> +
> +	.text
> +ENTRY(_ZGVdN4v_cos)
> +
> +/* ALGORITHM DESCRIPTION:
> + *     
> + *    ( low accuracy ( < 4ulp ) or enhanced performance ( half of correct mantissa ) implementation )
> + *     
> + *    Argument representation:
> + *    arg + Pi/2 = (N*Pi + R)
> + *    
> + *    Result calculation:
> + *    cos(arg) = sin(arg+Pi/2) = sin(N*Pi + R) = (-1)^N * sin(R)
> + *    sin(R) is approximated by corresponding polynomial
> + */
> +        pushq     %rbp
> +        movq      %rsp, %rbp
> +        andq      $-64, %rsp
> +        subq      $448, %rsp
> +        movq      __gnu_svml_dcos_data@GOTPCREL(%rip), %rax
> +        vmovapd   %ymm0, %ymm1
> +        vmovupd   192(%rax), %ymm4
> +        vmovupd   256(%rax), %ymm5
> +
> +/* ARGUMENT RANGE REDUCTION:
> + * Add Pi/2 to argument: X' = X+Pi/2
> + */
> +        vaddpd    128(%rax), %ymm1, %ymm7
> +
> +/* Get absolute argument value: X' = |X'| */
> +        vandpd    (%rax), %ymm7, %ymm2
> +
> +/* Y = X'*InvPi + RS : right shifter add */
> +        vfmadd213pd %ymm5, %ymm4, %ymm7
> +        vmovupd   1216(%rax), %ymm4
> +
> +/* Check for large arguments path */
> +        vcmpnle_uqpd 64(%rax), %ymm2, %ymm3
> +
> +/* N = Y - RS : right shifter sub */
> +        vsubpd    %ymm5, %ymm7, %ymm6
> +        vmovupd   640(%rax), %ymm2
> +
> +/* SignRes = Y<<63 : shift LSB to MSB place for result sign */
> +        vpsllq    $63, %ymm7, %ymm7
> +
> +/* N = N - 0.5 */
> +        vsubpd    320(%rax), %ymm6, %ymm0
> +        vmovmskpd %ymm3, %ecx
> +
> +/* R = X - N*Pi1 */
> +        vmovapd   %ymm1, %ymm3
> +        vfnmadd231pd %ymm0, %ymm2, %ymm3
> +
> +/* R = R - N*Pi2 */
> +        vfnmadd231pd 704(%rax), %ymm0, %ymm3
> +
> +/* R = R - N*Pi3 */
> +        vfnmadd132pd 768(%rax), %ymm3, %ymm0
> +
> +/* POLYNOMIAL APPROXIMATION:
> + * R2 = R*R
> + */
> +        vmulpd    %ymm0, %ymm0, %ymm5
> +        vfmadd213pd 1152(%rax), %ymm5, %ymm4
> +        vfmadd213pd 1088(%rax), %ymm5, %ymm4
> +        vfmadd213pd 1024(%rax), %ymm5, %ymm4
> +
> +/* Poly = C3+R2*(C4+R2*(C5+R2*(C6+R2*C7))) */
> +        vfmadd213pd 960(%rax), %ymm5, %ymm4
> +        vfmadd213pd 896(%rax), %ymm5, %ymm4
> +        vfmadd213pd 832(%rax), %ymm5, %ymm4
> +        vmulpd    %ymm5, %ymm4, %ymm6
> +        vfmadd213pd %ymm0, %ymm0, %ymm6
> +
> +/* RECONSTRUCTION:
> + * Final sign setting: Res = Poly^SignRes 
> + */
> +        vxorpd    %ymm7, %ymm6, %ymm0
> +        testl     %ecx, %ecx
> +        jne       _LBL_1_3
> +
> +_LBL_1_2:
> +        movq      %rbp, %rsp
> +        popq      %rbp
> +        ret
> +
> +_LBL_1_3:
> +        vmovupd   %ymm1, 320(%rsp)
> +        vmovupd   %ymm0, 384(%rsp)
> +        je        _LBL_1_2
> +
> +        xorb      %dl, %dl
> +        xorl      %eax, %eax
> +        vmovups   %ymm8, 224(%rsp)
> +        vmovups   %ymm9, 192(%rsp)
> +        vmovups   %ymm10, 160(%rsp)
> +        vmovups   %ymm11, 128(%rsp)
> +        vmovups   %ymm12, 96(%rsp)
> +        vmovups   %ymm13, 64(%rsp)
> +        vmovups   %ymm14, 32(%rsp)
> +        vmovups   %ymm15, (%rsp)
> +        movq      %rsi, 264(%rsp)
> +        movq      %rdi, 256(%rsp)
> +        movq      %r12, 296(%rsp)
> +        movb      %dl, %r12b
> +        movq      %r13, 288(%rsp)
> +        movl      %ecx, %r13d
> +        movq      %r14, 280(%rsp)
> +        movl      %eax, %r14d
> +        movq      %r15, 272(%rsp)
> +
> +_LBL_1_6:
> +        btl       %r14d, %r13d
> +        jc        _LBL_1_12
> +
> +_LBL_1_7:
> +        lea       1(%r14), %esi
> +        btl       %esi, %r13d
> +        jc        _LBL_1_10
> +
> +_LBL_1_8:
> +        incb      %r12b
> +        addl      $2, %r14d
> +        cmpb      $16, %r12b
> +        jb        _LBL_1_6
> +
> +        vmovups   224(%rsp), %ymm8
> +        vmovups   192(%rsp), %ymm9
> +        vmovups   160(%rsp), %ymm10
> +        vmovups   128(%rsp), %ymm11
> +        vmovups   96(%rsp), %ymm12
> +        vmovups   64(%rsp), %ymm13
> +        vmovups   32(%rsp), %ymm14
> +        vmovups   (%rsp), %ymm15
> +        vmovupd   384(%rsp), %ymm0
> +        movq      264(%rsp), %rsi
> +        movq      256(%rsp), %rdi
> +        movq      296(%rsp), %r12
> +        movq      288(%rsp), %r13
> +        movq      280(%rsp), %r14
> +        movq      272(%rsp), %r15
> +        jmp       _LBL_1_2
> +
> +_LBL_1_10:
> +        movzbl    %r12b, %r15d
> +        shlq      $4, %r15
> +        vmovsd    328(%rsp,%r15), %xmm0
> +        vzeroupper
> +
> +        call      cos@PLT
> +
> +        vmovsd    %xmm0, 392(%rsp,%r15)
> +        jmp       _LBL_1_8
> +
> +_LBL_1_12:
> +        movzbl    %r12b, %r15d
> +        shlq      $4, %r15
> +        vmovsd    320(%rsp,%r15), %xmm0
> +        vzeroupper
> +
> +        call      cos@PLT
> +
> +        vmovsd    %xmm0, 384(%rsp,%r15)
> +        jmp       _LBL_1_7
> +END(_ZGVdN4v_cos)
> diff --git a/sysdeps/x86_64/fpu/svml_d_cos_data.S b/sysdeps/x86_64/fpu/svml_d_cos_data.S
> new file mode 100644
> index 0000000..53f5244
> --- /dev/null
> +++ b/sysdeps/x86_64/fpu/svml_d_cos_data.S
> @@ -0,0 +1,424 @@
> +/* Data for function cos vectorized with AVX2.
> +   Copyright (C) 2014 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +
> +	.align 64
> +	.globl __gnu_svml_dcos_data
> +__gnu_svml_dcos_data:
> +	.long	4294967295
> +	.long	2147483647
> +	.long	4294967295
> +	.long	2147483647
> +	.long	4294967295
> +	.long	2147483647
> +	.long	4294967295
> +	.long	2147483647
> +	.long	4294967295
> +	.long	2147483647
> +	.long	4294967295
> +	.long	2147483647
> +	.long	4294967295
> +	.long	2147483647
> +	.long	4294967295
> +	.long	2147483647
> +	.long	0
> +	.long	1096810496
> +	.long	0
> +	.long	1096810496
> +	.long	0
> +	.long	1096810496
> +	.long	0
> +	.long	1096810496
> +	.long	0
> +	.long	1096810496
> +	.long	0
> +	.long	1096810496
> +	.long	0
> +	.long	1096810496
> +	.long	0
> +	.long	1096810496
> +	.long	1413754136
> +	.long	1073291771
> +	.long	1413754136
> +	.long	1073291771
> +	.long	1413754136
> +	.long	1073291771
> +	.long	1413754136
> +	.long	1073291771
> +	.long	1413754136
> +	.long	1073291771
> +	.long	1413754136
> +	.long	1073291771
> +	.long	1413754136
> +	.long	1073291771
> +	.long	1413754136
> +	.long	1073291771
> +	.long	1841940611
> +	.long	1070882608
> +	.long	1841940611
> +	.long	1070882608
> +	.long	1841940611
> +	.long	1070882608
> +	.long	1841940611
> +	.long	1070882608
> +	.long	1841940611
> +	.long	1070882608
> +	.long	1841940611
> +	.long	1070882608
> +	.long	1841940611
> +	.long	1070882608
> +	.long	1841940611
> +	.long	1070882608
> +	.long	0
> +	.long	1127743488
> +	.long	0
> +	.long	1127743488
> +	.long	0
> +	.long	1127743488
> +	.long	0
> +	.long	1127743488
> +	.long	0
> +	.long	1127743488
> +	.long	0
> +	.long	1127743488
> +	.long	0
> +	.long	1127743488
> +	.long	0
> +	.long	1127743488
> +	.long	0
> +	.long	1071644672
> +	.long	0
> +	.long	1071644672
> +	.long	0
> +	.long	1071644672
> +	.long	0
> +	.long	1071644672
> +	.long	0
> +	.long	1071644672
> +	.long	0
> +	.long	1071644672
> +	.long	0
> +	.long	1071644672
> +	.long	0
> +	.long	1071644672
> +	.long	1073741824
> +	.long	1074340347
> +	.long	1073741824
> +	.long	1074340347
> +	.long	1073741824
> +	.long	1074340347
> +	.long	1073741824
> +	.long	1074340347
> +	.long	1073741824
> +	.long	1074340347
> +	.long	1073741824
> +	.long	1074340347
> +	.long	1073741824
> +	.long	1074340347
> +	.long	1073741824
> +	.long	1074340347
> +	.long	0
> +	.long	1048855597
> +	.long	0
> +	.long	1048855597
> +	.long	0
> +	.long	1048855597
> +	.long	0
> +	.long	1048855597
> +	.long	0
> +	.long	1048855597
> +	.long	0
> +	.long	1048855597
> +	.long	0
> +	.long	1048855597
> +	.long	0
> +	.long	1048855597
> +	.long	2147483648
> +	.long	1023952536
> +	.long	2147483648
> +	.long	1023952536
> +	.long	2147483648
> +	.long	1023952536
> +	.long	2147483648
> +	.long	1023952536
> +	.long	2147483648
> +	.long	1023952536
> +	.long	2147483648
> +	.long	1023952536
> +	.long	2147483648
> +	.long	1023952536
> +	.long	2147483648
> +	.long	1023952536
> +	.long	1880851354
> +	.long	998820945
> +	.long	1880851354
> +	.long	998820945
> +	.long	1880851354
> +	.long	998820945
> +	.long	1880851354
> +	.long	998820945
> +	.long	1880851354
> +	.long	998820945
> +	.long	1880851354
> +	.long	998820945
> +	.long	1880851354
> +	.long	998820945
> +	.long	1880851354
> +	.long	998820945
> +	.long	1413754136
> +	.long	1074340347
> +	.long	1413754136
> +	.long	1074340347
> +	.long	1413754136
> +	.long	1074340347
> +	.long	1413754136
> +	.long	1074340347
> +	.long	1413754136
> +	.long	1074340347
> +	.long	1413754136
> +	.long	1074340347
> +	.long	1413754136
> +	.long	1074340347
> +	.long	1413754136
> +	.long	1074340347
> +	.long	856972294
> +	.long	1017226790
> +	.long	856972294
> +	.long	1017226790
> +	.long	856972294
> +	.long	1017226790
> +	.long	856972294
> +	.long	1017226790
> +	.long	856972294
> +	.long	1017226790
> +	.long	856972294
> +	.long	1017226790
> +	.long	856972294
> +	.long	1017226790
> +	.long	856972294
> +	.long	1017226790
> +	.long	688016905
> +	.long	962338001
> +	.long	688016905
> +	.long	962338001
> +	.long	688016905
> +	.long	962338001
> +	.long	688016905
> +	.long	962338001
> +	.long	688016905
> +	.long	962338001
> +	.long	688016905
> +	.long	962338001
> +	.long	688016905
> +	.long	962338001
> +	.long	688016905
> +	.long	962338001
> +	.long	1431655591
> +	.long	3217380693
> +	.long	1431655591
> +	.long	3217380693
> +	.long	1431655591
> +	.long	3217380693
> +	.long	1431655591
> +	.long	3217380693
> +	.long	1431655591
> +	.long	3217380693
> +	.long	1431655591
> +	.long	3217380693
> +	.long	1431655591
> +	.long	3217380693
> +	.long	1431655591
> +	.long	3217380693
> +	.long	286303400
> +	.long	1065423121
> +	.long	286303400
> +	.long	1065423121
> +	.long	286303400
> +	.long	1065423121
> +	.long	286303400
> +	.long	1065423121
> +	.long	286303400
> +	.long	1065423121
> +	.long	286303400
> +	.long	1065423121
> +	.long	286303400
> +	.long	1065423121
> +	.long	286303400
> +	.long	1065423121
> +	.long	430291053
> +	.long	3207201184
> +	.long	430291053
> +	.long	3207201184
> +	.long	430291053
> +	.long	3207201184
> +	.long	430291053
> +	.long	3207201184
> +	.long	430291053
> +	.long	3207201184
> +	.long	430291053
> +	.long	3207201184
> +	.long	430291053
> +	.long	3207201184
> +	.long	430291053
> +	.long	3207201184
> +	.long	2150694560
> +	.long	1053236707
> +	.long	2150694560
> +	.long	1053236707
> +	.long	2150694560
> +	.long	1053236707
> +	.long	2150694560
> +	.long	1053236707
> +	.long	2150694560
> +	.long	1053236707
> +	.long	2150694560
> +	.long	1053236707
> +	.long	2150694560
> +	.long	1053236707
> +	.long	2150694560
> +	.long	1053236707
> +	.long	1174413873
> +	.long	3193628213
> +	.long	1174413873
> +	.long	3193628213
> +	.long	1174413873
> +	.long	3193628213
> +	.long	1174413873
> +	.long	3193628213
> +	.long	1174413873
> +	.long	3193628213
> +	.long	1174413873
> +	.long	3193628213
> +	.long	1174413873
> +	.long	3193628213
> +	.long	1174413873
> +	.long	3193628213
> +	.long	1470296608
> +	.long	1038487144
> +	.long	1470296608
> +	.long	1038487144
> +	.long	1470296608
> +	.long	1038487144
> +	.long	1470296608
> +	.long	1038487144
> +	.long	1470296608
> +	.long	1038487144
> +	.long	1470296608
> +	.long	1038487144
> +	.long	1470296608
> +	.long	1038487144
> +	.long	1470296608
> +	.long	1038487144
> +	.long	135375560
> +	.long	3177836758
> +	.long	135375560
> +	.long	3177836758
> +	.long	135375560
> +	.long	3177836758
> +	.long	135375560
> +	.long	3177836758
> +	.long	135375560
> +	.long	3177836758
> +	.long	135375560
> +	.long	3177836758
> +	.long	135375560
> +	.long	3177836758
> +	.long	135375560
> +	.long	3177836758
> +	.long	4294967295
> +	.long	2147483647
> +	.long	4294967295
> +	.long	2147483647
> +	.long	4294967295
> +	.long	2147483647
> +	.long	4294967295
> +	.long	2147483647
> +	.long	4294967295
> +	.long	2147483647
> +	.long	4294967295
> +	.long	2147483647
> +	.long	4294967295
> +	.long	2147483647
> +	.long	4294967295
> +	.long	2147483647
> +	.long	1841940611
> +	.long	1070882608
> +	.long	1841940611
> +	.long	1070882608
> +	.long	1841940611
> +	.long	1070882608
> +	.long	1841940611
> +	.long	1070882608
> +	.long	1841940611
> +	.long	1070882608
> +	.long	1841940611
> +	.long	1070882608
> +	.long	1841940611
> +	.long	1070882608
> +	.long	1841940611
> +	.long	1070882608
> +	.long	0
> +	.long	1127219200
> +	.long	0
> +	.long	1127219200
> +	.long	0
> +	.long	1127219200
> +	.long	0
> +	.long	1127219200
> +	.long	0
> +	.long	1127219200
> +	.long	0
> +	.long	1127219200
> +	.long	0
> +	.long	1127219200
> +	.long	0
> +	.long	1127219200
> +	.long	4294967295
> +	.long	1127219199
> +	.long	4294967295
> +	.long	1127219199
> +	.long	4294967295
> +	.long	1127219199
> +	.long	4294967295
> +	.long	1127219199
> +	.long	4294967295
> +	.long	1127219199
> +	.long	4294967295
> +	.long	1127219199
> +	.long	4294967295
> +	.long	1127219199
> +	.long	4294967295
> +	.long	1127219199
> +	.long	8388606
> +	.long	1127219200
> +	.long	8388606
> +	.long	1127219200
> +	.long	8388606
> +	.long	1127219200
> +	.long	8388606
> +	.long	1127219200
> +	.long	8388606
> +	.long	1127219200
> +	.long	8388606
> +	.long	1127219200
> +	.long	8388606
> +	.long	1127219200
> +	.long	8388606
> +	.long	1127219200
> +	.type	__gnu_svml_dcos_data,@object
> +	.size	__gnu_svml_dcos_data,1600


* Re: [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc
@ 2014-09-10 15:15 Andrew Senkevich
  2014-09-10 16:55 ` Adhemerval Zanella
  0 siblings, 1 reply; 52+ messages in thread
From: Andrew Senkevich @ 2014-09-10 15:15 UTC (permalink / raw)
  To: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 37 bytes --]

Patch attached here


--
WBR,
Andrew

[-- Attachment #2: vectorized_cos.patch --]
[-- Type: application/octet-stream, Size: 16267 bytes --]

diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index 8a94a7e..ebfa583 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -46,6 +46,17 @@
 # error "Never include <bits/mathcalls.h> directly; include <math.h> instead."
 #endif
 
+#undef __DECL_SIMD
+
+#if defined _OPENMP && _OPENMP >= 201307
+/* For now we have vectorized version only for _Mdouble_ case */
+# ifdef _Mdouble_
+#  define __DECL_SIMD _Pragma ("omp declare simd")
+# endif
+#else
+# define __DECL_SIMD
+#endif
+
 
 /* Trigonometric functions.  */
 
@@ -60,6 +71,7 @@ __MATHCALL (atan,, (_Mdouble_ __x));
 __MATHCALL (atan2,, (_Mdouble_ __y, _Mdouble_ __x));
 
 /* Cosine of X.  */
+__DECL_SIMD
 __MATHCALL (cos,, (_Mdouble_ __x));
 /* Sine of X.  */
 __MATHCALL (sin,, (_Mdouble_ __x));
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist
index 2390934..1aa3099 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libm.abilist
@@ -89,6 +89,7 @@ GLIBC_2.18
 GLIBC_2.2.5
  GLIBC_2.2.5 A
  _LIB_VERSION D 0x4
+ _ZGVdN4v_cos F
  __clog10 F
  __clog10f F
  __clog10l F
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/localplt.data b/sysdeps/unix/sysv/linux/x86_64/64/localplt.data
new file mode 100644
index 0000000..1a683d9
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86_64/64/localplt.data
@@ -0,0 +1,10 @@
+# See scripts/check-localplt.awk for how this file is processed.
+# PLT use is required for the malloc family and for matherr because
+# users can define their own functions and have library internals call them.
+libc.so: calloc
+libc.so: free
+libc.so: malloc
+libc.so: memalign
+libc.so: realloc
+libm.so: matherr
+libm.so: cos
diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
new file mode 100644
index 0000000..1cb3ec5
--- /dev/null
+++ b/sysdeps/x86_64/fpu/Makefile
@@ -0,0 +1,3 @@
+ifeq ($(subdir),math)
+libm-support += svml_d_cos4_core svml_d_cos_data
+endif
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
new file mode 100644
index 0000000..d30fbb3
--- /dev/null
+++ b/sysdeps/x86_64/fpu/Versions
@@ -0,0 +1,7 @@
+libm {
+  GLIBC_2.2.5 {
+    # A generic bug got this omitted from other configurations' version
+    # sets, but we always had it.
+    _ZGVdN4v_cos;
+  }
+}
diff --git a/sysdeps/x86_64/fpu/svml_d_cos4_core.S b/sysdeps/x86_64/fpu/svml_d_cos4_core.S
new file mode 100644
index 0000000..7316d2b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cos4_core.S
@@ -0,0 +1,185 @@
+/* Function cos vectorized with AVX2.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+
+	.text
+ENTRY(_ZGVdN4v_cos)
+
+/* ALGORITHM DESCRIPTION:
+ *     
+ *    ( low accuracy ( < 4ulp ) or enhanced performance ( half of correct mantissa ) implementation )
+ *     
+ *    Argument representation:
+ *    arg + Pi/2 = (N*Pi + R)
+ *    
+ *    Result calculation:
+ *    cos(arg) = sin(arg+Pi/2) = sin(N*Pi + R) = (-1)^N * sin(R)
+ *    sin(R) is approximated by corresponding polynomial
+ */
+        pushq     %rbp
+        movq      %rsp, %rbp
+        andq      $-64, %rsp
+        subq      $448, %rsp
+        movq      __gnu_svml_dcos_data@GOTPCREL(%rip), %rax
+        vmovapd   %ymm0, %ymm1
+        vmovupd   192(%rax), %ymm4
+        vmovupd   256(%rax), %ymm5
+
+/* ARGUMENT RANGE REDUCTION:
+ * Add Pi/2 to argument: X' = X+Pi/2
+ */
+        vaddpd    128(%rax), %ymm1, %ymm7
+
+/* Get absolute argument value: X' = |X'| */
+        vandpd    (%rax), %ymm7, %ymm2
+
+/* Y = X'*InvPi + RS : right shifter add */
+        vfmadd213pd %ymm5, %ymm4, %ymm7
+        vmovupd   1216(%rax), %ymm4
+
+/* Check for large arguments path */
+        vcmpnle_uqpd 64(%rax), %ymm2, %ymm3
+
+/* N = Y - RS : right shifter sub */
+        vsubpd    %ymm5, %ymm7, %ymm6
+        vmovupd   640(%rax), %ymm2
+
+/* SignRes = Y<<63 : shift LSB to MSB place for result sign */
+        vpsllq    $63, %ymm7, %ymm7
+
+/* N = N - 0.5 */
+        vsubpd    320(%rax), %ymm6, %ymm0
+        vmovmskpd %ymm3, %ecx
+
+/* R = X - N*Pi1 */
+        vmovapd   %ymm1, %ymm3
+        vfnmadd231pd %ymm0, %ymm2, %ymm3
+
+/* R = R - N*Pi2 */
+        vfnmadd231pd 704(%rax), %ymm0, %ymm3
+
+/* R = R - N*Pi3 */
+        vfnmadd132pd 768(%rax), %ymm3, %ymm0
+
+/* POLYNOMIAL APPROXIMATION:
+ * R2 = R*R
+ */
+        vmulpd    %ymm0, %ymm0, %ymm5
+        vfmadd213pd 1152(%rax), %ymm5, %ymm4
+        vfmadd213pd 1088(%rax), %ymm5, %ymm4
+        vfmadd213pd 1024(%rax), %ymm5, %ymm4
+
+/* Poly = C3+R2*(C4+R2*(C5+R2*(C6+R2*C7))) */
+        vfmadd213pd 960(%rax), %ymm5, %ymm4
+        vfmadd213pd 896(%rax), %ymm5, %ymm4
+        vfmadd213pd 832(%rax), %ymm5, %ymm4
+        vmulpd    %ymm5, %ymm4, %ymm6
+        vfmadd213pd %ymm0, %ymm0, %ymm6
+
+/* RECONSTRUCTION:
+ * Final sign setting: Res = Poly^SignRes 
+ */
+        vxorpd    %ymm7, %ymm6, %ymm0
+        testl     %ecx, %ecx
+        jne       _LBL_1_3
+
+_LBL_1_2:
+        movq      %rbp, %rsp
+        popq      %rbp
+        ret
+
+_LBL_1_3:
+        vmovupd   %ymm1, 320(%rsp)
+        vmovupd   %ymm0, 384(%rsp)
+        je        _LBL_1_2
+
+        xorb      %dl, %dl
+        xorl      %eax, %eax
+        vmovups   %ymm8, 224(%rsp)
+        vmovups   %ymm9, 192(%rsp)
+        vmovups   %ymm10, 160(%rsp)
+        vmovups   %ymm11, 128(%rsp)
+        vmovups   %ymm12, 96(%rsp)
+        vmovups   %ymm13, 64(%rsp)
+        vmovups   %ymm14, 32(%rsp)
+        vmovups   %ymm15, (%rsp)
+        movq      %rsi, 264(%rsp)
+        movq      %rdi, 256(%rsp)
+        movq      %r12, 296(%rsp)
+        movb      %dl, %r12b
+        movq      %r13, 288(%rsp)
+        movl      %ecx, %r13d
+        movq      %r14, 280(%rsp)
+        movl      %eax, %r14d
+        movq      %r15, 272(%rsp)
+
+_LBL_1_6:
+        btl       %r14d, %r13d
+        jc        _LBL_1_12
+
+_LBL_1_7:
+        lea       1(%r14), %esi
+        btl       %esi, %r13d
+        jc        _LBL_1_10
+
+_LBL_1_8:
+        incb      %r12b
+        addl      $2, %r14d
+        cmpb      $16, %r12b
+        jb        _LBL_1_6
+
+        vmovups   224(%rsp), %ymm8
+        vmovups   192(%rsp), %ymm9
+        vmovups   160(%rsp), %ymm10
+        vmovups   128(%rsp), %ymm11
+        vmovups   96(%rsp), %ymm12
+        vmovups   64(%rsp), %ymm13
+        vmovups   32(%rsp), %ymm14
+        vmovups   (%rsp), %ymm15
+        vmovupd   384(%rsp), %ymm0
+        movq      264(%rsp), %rsi
+        movq      256(%rsp), %rdi
+        movq      296(%rsp), %r12
+        movq      288(%rsp), %r13
+        movq      280(%rsp), %r14
+        movq      272(%rsp), %r15
+        jmp       _LBL_1_2
+
+_LBL_1_10:
+        movzbl    %r12b, %r15d
+        shlq      $4, %r15
+        vmovsd    328(%rsp,%r15), %xmm0
+        vzeroupper
+
+        call      cos@PLT
+
+        vmovsd    %xmm0, 392(%rsp,%r15)
+        jmp       _LBL_1_8
+
+_LBL_1_12:
+        movzbl    %r12b, %r15d
+        shlq      $4, %r15
+        vmovsd    320(%rsp,%r15), %xmm0
+        vzeroupper
+
+        call      cos@PLT
+
+        vmovsd    %xmm0, 384(%rsp,%r15)
+        jmp       _LBL_1_7
+END(_ZGVdN4v_cos)
diff --git a/sysdeps/x86_64/fpu/svml_d_cos_data.S b/sysdeps/x86_64/fpu/svml_d_cos_data.S
new file mode 100644
index 0000000..53f5244
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_d_cos_data.S
@@ -0,0 +1,424 @@
+/* Data for function cos vectorized with AVX2.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+	.align 64
+	.globl __gnu_svml_dcos_data
+__gnu_svml_dcos_data:
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	0
+	.long	1096810496
+	.long	0
+	.long	1096810496
+	.long	0
+	.long	1096810496
+	.long	0
+	.long	1096810496
+	.long	0
+	.long	1096810496
+	.long	0
+	.long	1096810496
+	.long	0
+	.long	1096810496
+	.long	0
+	.long	1096810496
+	.long	1413754136
+	.long	1073291771
+	.long	1413754136
+	.long	1073291771
+	.long	1413754136
+	.long	1073291771
+	.long	1413754136
+	.long	1073291771
+	.long	1413754136
+	.long	1073291771
+	.long	1413754136
+	.long	1073291771
+	.long	1413754136
+	.long	1073291771
+	.long	1413754136
+	.long	1073291771
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	0
+	.long	1127743488
+	.long	0
+	.long	1127743488
+	.long	0
+	.long	1127743488
+	.long	0
+	.long	1127743488
+	.long	0
+	.long	1127743488
+	.long	0
+	.long	1127743488
+	.long	0
+	.long	1127743488
+	.long	0
+	.long	1127743488
+	.long	0
+	.long	1071644672
+	.long	0
+	.long	1071644672
+	.long	0
+	.long	1071644672
+	.long	0
+	.long	1071644672
+	.long	0
+	.long	1071644672
+	.long	0
+	.long	1071644672
+	.long	0
+	.long	1071644672
+	.long	0
+	.long	1071644672
+	.long	1073741824
+	.long	1074340347
+	.long	1073741824
+	.long	1074340347
+	.long	1073741824
+	.long	1074340347
+	.long	1073741824
+	.long	1074340347
+	.long	1073741824
+	.long	1074340347
+	.long	1073741824
+	.long	1074340347
+	.long	1073741824
+	.long	1074340347
+	.long	1073741824
+	.long	1074340347
+	.long	0
+	.long	1048855597
+	.long	0
+	.long	1048855597
+	.long	0
+	.long	1048855597
+	.long	0
+	.long	1048855597
+	.long	0
+	.long	1048855597
+	.long	0
+	.long	1048855597
+	.long	0
+	.long	1048855597
+	.long	0
+	.long	1048855597
+	.long	2147483648
+	.long	1023952536
+	.long	2147483648
+	.long	1023952536
+	.long	2147483648
+	.long	1023952536
+	.long	2147483648
+	.long	1023952536
+	.long	2147483648
+	.long	1023952536
+	.long	2147483648
+	.long	1023952536
+	.long	2147483648
+	.long	1023952536
+	.long	2147483648
+	.long	1023952536
+	.long	1880851354
+	.long	998820945
+	.long	1880851354
+	.long	998820945
+	.long	1880851354
+	.long	998820945
+	.long	1880851354
+	.long	998820945
+	.long	1880851354
+	.long	998820945
+	.long	1880851354
+	.long	998820945
+	.long	1880851354
+	.long	998820945
+	.long	1880851354
+	.long	998820945
+	.long	1413754136
+	.long	1074340347
+	.long	1413754136
+	.long	1074340347
+	.long	1413754136
+	.long	1074340347
+	.long	1413754136
+	.long	1074340347
+	.long	1413754136
+	.long	1074340347
+	.long	1413754136
+	.long	1074340347
+	.long	1413754136
+	.long	1074340347
+	.long	1413754136
+	.long	1074340347
+	.long	856972294
+	.long	1017226790
+	.long	856972294
+	.long	1017226790
+	.long	856972294
+	.long	1017226790
+	.long	856972294
+	.long	1017226790
+	.long	856972294
+	.long	1017226790
+	.long	856972294
+	.long	1017226790
+	.long	856972294
+	.long	1017226790
+	.long	856972294
+	.long	1017226790
+	.long	688016905
+	.long	962338001
+	.long	688016905
+	.long	962338001
+	.long	688016905
+	.long	962338001
+	.long	688016905
+	.long	962338001
+	.long	688016905
+	.long	962338001
+	.long	688016905
+	.long	962338001
+	.long	688016905
+	.long	962338001
+	.long	688016905
+	.long	962338001
+	.long	1431655591
+	.long	3217380693
+	.long	1431655591
+	.long	3217380693
+	.long	1431655591
+	.long	3217380693
+	.long	1431655591
+	.long	3217380693
+	.long	1431655591
+	.long	3217380693
+	.long	1431655591
+	.long	3217380693
+	.long	1431655591
+	.long	3217380693
+	.long	1431655591
+	.long	3217380693
+	.long	286303400
+	.long	1065423121
+	.long	286303400
+	.long	1065423121
+	.long	286303400
+	.long	1065423121
+	.long	286303400
+	.long	1065423121
+	.long	286303400
+	.long	1065423121
+	.long	286303400
+	.long	1065423121
+	.long	286303400
+	.long	1065423121
+	.long	286303400
+	.long	1065423121
+	.long	430291053
+	.long	3207201184
+	.long	430291053
+	.long	3207201184
+	.long	430291053
+	.long	3207201184
+	.long	430291053
+	.long	3207201184
+	.long	430291053
+	.long	3207201184
+	.long	430291053
+	.long	3207201184
+	.long	430291053
+	.long	3207201184
+	.long	430291053
+	.long	3207201184
+	.long	2150694560
+	.long	1053236707
+	.long	2150694560
+	.long	1053236707
+	.long	2150694560
+	.long	1053236707
+	.long	2150694560
+	.long	1053236707
+	.long	2150694560
+	.long	1053236707
+	.long	2150694560
+	.long	1053236707
+	.long	2150694560
+	.long	1053236707
+	.long	2150694560
+	.long	1053236707
+	.long	1174413873
+	.long	3193628213
+	.long	1174413873
+	.long	3193628213
+	.long	1174413873
+	.long	3193628213
+	.long	1174413873
+	.long	3193628213
+	.long	1174413873
+	.long	3193628213
+	.long	1174413873
+	.long	3193628213
+	.long	1174413873
+	.long	3193628213
+	.long	1174413873
+	.long	3193628213
+	.long	1470296608
+	.long	1038487144
+	.long	1470296608
+	.long	1038487144
+	.long	1470296608
+	.long	1038487144
+	.long	1470296608
+	.long	1038487144
+	.long	1470296608
+	.long	1038487144
+	.long	1470296608
+	.long	1038487144
+	.long	1470296608
+	.long	1038487144
+	.long	1470296608
+	.long	1038487144
+	.long	135375560
+	.long	3177836758
+	.long	135375560
+	.long	3177836758
+	.long	135375560
+	.long	3177836758
+	.long	135375560
+	.long	3177836758
+	.long	135375560
+	.long	3177836758
+	.long	135375560
+	.long	3177836758
+	.long	135375560
+	.long	3177836758
+	.long	135375560
+	.long	3177836758
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	4294967295
+	.long	2147483647
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	1841940611
+	.long	1070882608
+	.long	0
+	.long	1127219200
+	.long	0
+	.long	1127219200
+	.long	0
+	.long	1127219200
+	.long	0
+	.long	1127219200
+	.long	0
+	.long	1127219200
+	.long	0
+	.long	1127219200
+	.long	0
+	.long	1127219200
+	.long	0
+	.long	1127219200
+	.long	4294967295
+	.long	1127219199
+	.long	4294967295
+	.long	1127219199
+	.long	4294967295
+	.long	1127219199
+	.long	4294967295
+	.long	1127219199
+	.long	4294967295
+	.long	1127219199
+	.long	4294967295
+	.long	1127219199
+	.long	4294967295
+	.long	1127219199
+	.long	4294967295
+	.long	1127219199
+	.long	8388606
+	.long	1127219200
+	.long	8388606
+	.long	1127219200
+	.long	8388606
+	.long	1127219200
+	.long	8388606
+	.long	1127219200
+	.long	8388606
+	.long	1127219200
+	.long	8388606
+	.long	1127219200
+	.long	8388606
+	.long	1127219200
+	.long	8388606
+	.long	1127219200
+	.type	__gnu_svml_dcos_data,@object
+	.size	__gnu_svml_dcos_data,1600

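The special-case path that starts at `_LBL_1_3` in svml_d_cos4_core.S can be modelled roughly as below. This is a sketch in Python, assuming that `%ecx` holds one bit per lane for inputs the vector path could not handle (how that mask is computed is outside this excerpt). The function name and its arguments are hypothetical and exist only for illustration.

```python
import math

def scalar_fallback(inputs, results, mask, scalar_fn=math.cos):
    """Recompute the lanes flagged in `mask` with the scalar function.

    `inputs` and `results` model the copies spilled to 320(%rsp) and
    384(%rsp); lanes whose mask bit is clear keep the vector result.
    """
    out = list(results)
    # The assembly tests two bit positions per iteration (even bit at
    # _LBL_1_6, odd bit at _LBL_1_7); a plain per-lane loop is equivalent
    # for the four lanes of a 256-bit vector of doubles.
    for lane in range(len(out)):
        if (mask >> lane) & 1:
            out[lane] = scalar_fn(inputs[lane])
    return out

vector_result = [1.0, 0.0, 0.5, -1.0]   # placeholder values, not real output
x = [0.0, 1e300, 1.0, 2.0]
print(scalar_fallback(x, vector_result, 0b0010))
```

Only lane 1 is recomputed here; the other lanes are returned untouched, which mirrors the final reload of `%ymm0` from 384(%rsp) before the jump back to `_LBL_1_2`.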

end of thread, other threads:[~2014-11-27 20:12 UTC | newest]

Thread overview: 52+ messages
2014-09-10 15:08 [PATCH 1/N] x86_64 vectorization support: vectorized math functions addition to Glibc Andrew Senkevich
2014-09-10 16:08 ` Joseph S. Myers
2014-09-11 10:11   ` Matthew Fortune
2014-09-11 19:47     ` Adhemerval Zanella
2014-09-11 20:00     ` Carlos O'Donell
2014-09-11 21:02     ` Rich Felker
2014-09-12  0:06       ` Carlos O'Donell
2014-09-12  5:33       ` Andi Kleen
2014-09-12  7:18         ` Ondřej Bílka
2014-09-12 17:04           ` Andi Kleen
2014-09-12  7:43         ` Jakub Jelinek
2014-09-12 16:55           ` Andi Kleen
2014-09-12 17:03           ` Joseph S. Myers
2014-09-12 17:09             ` Jakub Jelinek
2014-09-15 12:36               ` Zamyatin, Igor
2014-09-15 16:43                 ` Andi Kleen
2014-11-12 17:42               ` Andrew Senkevich
2014-11-12 17:52                 ` Jakub Jelinek
2014-11-12 18:12                   ` Joseph Myers
2014-11-17 19:51                     ` Zamyatin, Igor
2014-11-17 23:55                       ` Joseph Myers
2014-11-27 16:13                         ` Andrew Senkevich
2014-11-27 17:17                           ` Joseph Myers
2014-11-27 19:24                             ` Andrew Senkevich
2014-11-27 20:12                               ` Joseph Myers
2014-09-12 19:18         ` Carlos O'Donell
2014-09-12 19:20           ` Carlos O'Donell
2014-09-12 19:56             ` Rich Felker
2014-09-12 20:33               ` Jakub Jelinek
2014-09-11 19:32   ` Carlos O'Donell
2014-09-11 20:19     ` Zamyatin, Igor
2014-09-11 20:26       ` Carlos O'Donell
2014-09-11 20:52       ` Joseph S. Myers
2014-09-11 20:57         ` H.J. Lu
2014-09-12  0:10           ` Carlos O'Donell
2014-09-12 15:01             ` H.J. Lu
2014-09-12 15:10               ` Carlos O'Donell
2014-09-12 16:00                 ` Torvald Riegel
2014-09-12 17:37                   ` Carlos O'Donell
2014-09-12 22:38                     ` Torvald Riegel
2014-09-12 22:47                       ` Carlos O'Donell
2014-09-12 19:05                 ` H.J. Lu
2014-09-12 19:13                   ` Carlos O'Donell
2014-09-12 19:31                     ` H.J. Lu
2014-09-16 16:57     ` Andrew Senkevich
2014-09-16 17:02       ` H.J. Lu
2014-09-17  9:56         ` Andrew Senkevich
2014-09-17 10:09           ` Jakub Jelinek
2014-09-17 11:58             ` Andrew Senkevich
2014-09-17 12:34               ` Joseph S. Myers
2014-09-10 15:15 Andrew Senkevich
2014-09-10 16:55 ` Adhemerval Zanella
