public inbox for libc-alpha@sourceware.org
* x86-64 assembly codes vs C source codes
@ 2021-12-21 16:18 H.J. Lu
  2021-12-22 12:49 ` Adhemerval Zanella
  0 siblings, 1 reply; 5+ messages in thread
From: H.J. Lu @ 2021-12-21 16:18 UTC (permalink / raw)
  To: GNU C Library, Pandey, Sunil K, Cornea, Marius, Florian Weimer

Hi,

There are 421 x86-64 assembly source files.  The majority of
them have generic versions in C.  We added many x86-64
assembly source files for performance.  Most of the x86-64
assembly source files started from the generic version in C.
They were compiled into assembly and optimized by hand.
Intel is committed to supporting the x86-64 assembly source
files to improve performance and fix any bugs.

For 2.35, we'd like to add more x86-64 assembly source
files to libmvec.  The x86-64 assembly source files are the
preferred form for performance and accuracy today.

We will evaluate the generic alternative in the future
if it has performance and accuracy similar to the
assembly version, like:

ba4b8fab20 x86-64: Remove s_sincosf-sse2.S
4ca945e9c5 x86-64: Remove sysdeps/x86_64/fpu/s_cosf.S
9574c7b68d x86-64: Remove sysdeps/x86_64/fpu/s_sinf.S
e1f59bebd8 x86-64: Replace assembly versions of e_expf with generic e_expf.c
8537e0f6cf x86-64: Implement libmathvec IFUNC selectors in C
10a87ca476 x86-64: Implement libm IFUNC selectors in C
11ffcacb64 x86-64: Implement strcmp family IFUNC selectors in C
70fe2eb794 x86-64: Implement strcspn/strpbrk/strspn IFUNC selectors in C
9f4254b8bd x86-64: Implement wcscpy IFUNC selector in C
9ed0aa15d3 x86-64: Implement strcat family IFUNC selectors in C
b91a52d0d7 x86-64: Implement memcmp family IFUNC selectors in C
93e46f8773 x86-64: Implement memset family IFUNC selectors in C
5c3e322d3b x86-64: Implement memmove family IFUNC selectors in C
5a103908c0 x86-64: Implement strcpy family IFUNC selectors in C

Thanks.

-- 
H.J.
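
[Editor's sketch] Several of the commits listed above moved IFUNC selectors from assembly to C.  The following is a minimal, hedged illustration of what such a selector decides, not the glibc code: it uses a plain function pointer instead of the real IFUNC mechanism, and the names count_byte, count_byte_generic, count_byte_unrolled and cpu_has_avx2 are hypothetical.  The "unrolled" variant is only a stand-in for a hand-tuned implementation.

```c
#include <stddef.h>

typedef size_t (*count_fn)(const unsigned char *, size_t, unsigned char);

/* Baseline variant: works on every CPU. */
size_t count_byte_generic(const unsigned char *s, size_t n, unsigned char c)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (s[i] == c);
    return count;
}

/* Hypothetical stand-in for a tuned variant (e.g. an AVX2 routine). */
size_t count_byte_unrolled(const unsigned char *s, size_t n, unsigned char c)
{
    size_t count = 0, i = 0;
    for (; i + 4 <= n; i += 4)
        count += (s[i] == c) + (s[i + 1] == c)
                 + (s[i + 2] == c) + (s[i + 3] == c);
    for (; i < n; i++)          /* tail: fewer than four bytes left */
        count += (s[i] == c);
    return count;
}

/* CPU feature test via GCC/Clang builtins; returns 0 on other targets. */
int cpu_has_avx2(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0;
#endif
}

/* The selector: the part that is easy to express in C. */
count_fn select_count_byte(void)
{
    return cpu_has_avx2() ? count_byte_unrolled : count_byte_generic;
}

/* Public entry point; resolves once, on first call (not thread-safe
   in this sketch, unlike a real IFUNC resolved by the dynamic linker). */
size_t count_byte(const unsigned char *s, size_t n, unsigned char c)
{
    static count_fn impl;
    if (impl == NULL)
        impl = select_count_byte();
    return impl(s, n, c);
}
```

Calling count_byte on the six bytes of "banana" with 'a' returns 3 whichever variant is selected; only the selection logic differs per CPU.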


* Re: x86-64 assembly codes vs C source codes
  2021-12-21 16:18 x86-64 assembly codes vs C source codes H.J. Lu
@ 2021-12-22 12:49 ` Adhemerval Zanella
  2021-12-22 13:39   ` H.J. Lu
  0 siblings, 1 reply; 5+ messages in thread
From: Adhemerval Zanella @ 2021-12-22 12:49 UTC (permalink / raw)
  To: H.J. Lu, GNU C Library, Pandey, Sunil K, Cornea, Marius, Florian Weimer



On 21/12/2021 13:18, H.J. Lu via Libc-alpha wrote:
> Hi,
> 
> There are 421 x86-64 assembly source files.  The majority of
> them have generic versions in C.  We added many x86-64
> assembly source files for performance.  Most of the x86-64
> assembly source files started from the generic version in C.
> They were compiled into assembly and optimized by hand.
> Intel is committed to supporting the x86-64 assembly source
> files to improve performance and fix any bugs.
>
> For 2.35, we'd like to add more x86-64 assembly source
> files to libmvec.  The x86-64 assembly source files are the
> preferred form for performance and accuracy today.
>
> We will evaluate the generic alternative in the future
> if it has performance and accuracy similar to the
> assembly version, like:

I understand this route could be the most straightforward one to provide
optimized code, since you do not need to handle multiple compilers or tune
the multiple possible options, nor handle different architectures, ABIs,
or chips.

However, if you check the list you will note that many of the assembly
routine removals were possible because of better implementations that
avoided optimized assembly routines by using generic C source code
and leveraging compiler support with minimal arch-specific tuning.

This is more costly initially, but imho it does pay off in the long term.
For instance, the math implementations from ARM optimized routines [1]
provided a better implementation for all architectures, beating even the x86
assembly one (as you noted in the list).

Similarly, the new hypot implementation I created with Wilco is faster
for most architectures, especially when FMA is available (for instance for
x86_64-v3).

ARM and other developers could have just crafted an assembly routine and
optimized it solely for ARM.  It would then have required additional effort
and time from other maintainers to check and craft an optimized version for
each architecture.

I think assembly does make sense for the string routines, which rely on
architecture- and chip-specific tuning that is hard to express in generic C
code.  However, I really think that with current compiler support we should
avoid it for math code.

That is especially so because assembly is already hard to follow, it usually
results in large code that may become unmaintainable over time (or at least
require a lot more effort), it might hinder some compiler support
improvements, and it is confined to a specific architecture (even when the
code might be reused on different ones).

So I would ask whether you could improve the libmvec support to make it more
generic; it would be really useful.

> 
> ba4b8fab20 x86-64: Remove s_sincosf-sse2.S
> 4ca945e9c5 x86-64: Remove sysdeps/x86_64/fpu/s_cosf.S
> 9574c7b68d x86-64: Remove sysdeps/x86_64/fpu/s_sinf.S
> e1f59bebd8 x86-64: Replace assembly versions of e_expf with generic e_expf.c
> 8537e0f6cf x86-64: Implement libmathvec IFUNC selectors in C
> 10a87ca476 x86-64: Implement libm IFUNC selectors in C
> 11ffcacb64 x86-64: Implement strcmp family IFUNC selectors in C
> 70fe2eb794 x86-64: Implement strcspn/strpbrk/strspn IFUNC selectors in C
> 9f4254b8bd x86-64: Implement wcscpy IFUNC selector in C
> 9ed0aa15d3 x86-64: Implement strcat family IFUNC selectors in C
> b91a52d0d7 x86-64: Implement memcmp family IFUNC selectors in C
> 93e46f8773 x86-64: Implement memset family IFUNC selectors in C
> 5c3e322d3b x86-64: Implement memmove family IFUNC selectors in C
> 5a103908c0 x86-64: Implement strcpy family IFUNC selectors in C
> 
> Thanks.
> 

[1] https://github.com/ARM-software/optimized-routines


* Re: x86-64 assembly codes vs C source codes
  2021-12-22 12:49 ` Adhemerval Zanella
@ 2021-12-22 13:39   ` H.J. Lu
  2021-12-22 14:23     ` Szabolcs Nagy
  0 siblings, 1 reply; 5+ messages in thread
From: H.J. Lu @ 2021-12-22 13:39 UTC (permalink / raw)
  To: Adhemerval Zanella
  Cc: GNU C Library, Pandey, Sunil K, Cornea, Marius, Florian Weimer

On Wed, Dec 22, 2021 at 4:49 AM Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
> On 21/12/2021 13:18, H.J. Lu via Libc-alpha wrote:
> > Hi,
> >
> > There are 421 x86-64 assembly source files.  The majority of
> > them have generic versions in C.  We added many x86-64
> > assembly source files for performance.  Most of the x86-64
> > assembly source files started from the generic version in C.
> > They were compiled into assembly and optimized by hand.
> > Intel is committed to supporting the x86-64 assembly source
> > files to improve performance and fix any bugs.
> >
> > For 2.35, we'd like to add more x86-64 assembly source
> > files to libmvec.  The x86-64 assembly source files are the
> > preferred form for performance and accuracy today.
> >
> > We will evaluate the generic alternative in the future
> > if it has performance and accuracy similar to the
> > assembly version, like:
>
> I understand this route could be the most straightforward one to provide
> optimized code, since you do not need to handle multiple compilers or tune
> the multiple possible options, nor handle different architectures, ABIs,
> or chips.
>
> However, if you check the list you will note that many of the assembly
> routine removals were possible because of better implementations that
> avoided optimized assembly routines by using generic C source code
> and leveraging compiler support with minimal arch-specific tuning.
>
> This is more costly initially, but imho it does pay off in the long term.
> For instance, the math implementations from ARM optimized routines [1]
> provided a better implementation for all architectures, beating even the x86
> assembly one (as you noted in the list).

Thank you for understanding.  I appreciate your work on libm.

> Similarly, the new hypot implementation I created with Wilco is faster
> for most architectures, especially when FMA is available (for instance for
> x86_64-v3).
>
> ARM and other developers could have just crafted an assembly routine and
> optimized it solely for ARM.  It would then have required additional effort
> and time from other maintainers to check and craft an optimized version for
> each architecture.
>
> I think assembly does make sense for the string routines, which rely on
> architecture- and chip-specific tuning that is hard to express in generic C
> code.  However, I really think that with current compiler support we should
> avoid it for math code.
>
> That is especially so because assembly is already hard to follow, it usually
> results in large code that may become unmaintainable over time (or at least
> require a lot more effort), it might hinder some compiler support
> improvements, and it is confined to a specific architecture (even when the
> code might be reused on different ones).
>
> So I would ask whether you could improve the libmvec support to make it more
> generic; it would be really useful.

The difficult part of a generic implementation, as you have noted, is the
compiler support.  libmvec is a vector math library from Intel SVML.  It
requires a compiler with good vectorizer support to get performance.
Even with an Intel compiler, we still need to hand-optimize the code after
compilation.  For the short term, we need assembly code for libmvec.  I will
discuss internally what we should do for the long term.

> >
> > ba4b8fab20 x86-64: Remove s_sincosf-sse2.S
> > 4ca945e9c5 x86-64: Remove sysdeps/x86_64/fpu/s_cosf.S
> > 9574c7b68d x86-64: Remove sysdeps/x86_64/fpu/s_sinf.S
> > e1f59bebd8 x86-64: Replace assembly versions of e_expf with generic e_expf.c
> > 8537e0f6cf x86-64: Implement libmathvec IFUNC selectors in C
> > 10a87ca476 x86-64: Implement libm IFUNC selectors in C
> > 11ffcacb64 x86-64: Implement strcmp family IFUNC selectors in C
> > 70fe2eb794 x86-64: Implement strcspn/strpbrk/strspn IFUNC selectors in C
> > 9f4254b8bd x86-64: Implement wcscpy IFUNC selector in C
> > 9ed0aa15d3 x86-64: Implement strcat family IFUNC selectors in C
> > b91a52d0d7 x86-64: Implement memcmp family IFUNC selectors in C
> > 93e46f8773 x86-64: Implement memset family IFUNC selectors in C
> > 5c3e322d3b x86-64: Implement memmove family IFUNC selectors in C
> > 5a103908c0 x86-64: Implement strcpy family IFUNC selectors in C
> >
> > Thanks.
> >
>
> [1] https://github.com/ARM-software/optimized-routines



-- 
H.J.


* Re: x86-64 assembly codes vs C source codes
  2021-12-22 13:39   ` H.J. Lu
@ 2021-12-22 14:23     ` Szabolcs Nagy
  2021-12-22 15:36       ` H.J. Lu
  0 siblings, 1 reply; 5+ messages in thread
From: Szabolcs Nagy @ 2021-12-22 14:23 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Adhemerval Zanella, Florian Weimer, Cornea, Marius,
	GNU C Library, Pandey, Sunil K

The 12/22/2021 05:39, H.J. Lu via Libc-alpha wrote:
> On Wed, Dec 22, 2021 at 4:49 AM Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
> > On 21/12/2021 13:18, H.J. Lu via Libc-alpha wrote:
> > So I would ask whether you could improve the libmvec support to make it more
> > generic; it would be really useful.
>
> The difficult part of a generic implementation, as you have noted, is the
> compiler support.  libmvec is a vector math library from Intel SVML.  It
> requires a compiler with good vectorizer support to get performance.
> Even with an Intel compiler, we still need to hand-optimize the code after
> compilation.  For the short term, we need assembly code for libmvec.  I will
> discuss internally what we should do for the long term.

i think it is hard to write generic simd math code
even if there were good compiler support: the same
algorithm is likely not optimal across targets due
to different available instructions. but if we aim
for maintainability instead of performance then it
should be possible to have generic code.

the c language has no generic simd support, the gcc
vector syntax is very limited and targets handle simd
intrinsics sufficiently differently that using an
abstraction over the common subset is suboptimal:
sleef is an example where this has been tried and
it involves a fair bit of simd isa specific code and
it does not give consistent performance across targets,
it does not cover all aspects of target vector abis and
it does not have to work with an old gcc. but it's
likely more maintainable than writing asm code for
each supported isa.
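
[Editor's sketch] To make the "gcc vector syntax" mentioned above concrete, here is a minimal example of the GCC/Clang vector extension (not ISO C).  It evaluates a cubic polynomial on four float lanes with Horner's scheme.  The type name v4f, the function poly3_v4f and the coefficients are all hypothetical; real vector math routines also need range reduction, special-case handling and target vector ABI entry points, none of which this syntax provides.

```c
/* Four floats packed in a 16-byte vector (GCC/Clang extension). */
typedef float v4f __attribute__((vector_size(16)));

/* Evaluate p(x) = 1 + 2x + 3x^2 + 4x^3 on all four lanes at once,
   using Horner's scheme: ((4x + 3)x + 2)x + 1.
   The arithmetic operators apply lane-wise; the compiler picks the
   SIMD instructions (or scalar code) for the target. */
v4f poly3_v4f(v4f x)
{
    const v4f c0 = {1.0f, 1.0f, 1.0f, 1.0f};
    const v4f c1 = {2.0f, 2.0f, 2.0f, 2.0f};
    const v4f c2 = {3.0f, 3.0f, 3.0f, 3.0f};
    const v4f c3 = {4.0f, 4.0f, 4.0f, 4.0f};

    v4f r = c3;
    r = r * x + c2;
    r = r * x + c1;
    r = r * x + c0;
    return r;
}
```

For the input lanes {0, 1, 2, -1} the result lanes are {1, 10, 49, -2}.  The source is portable across targets, but whether the generated code is competitive with hand-written assembly depends on the compiler and ISA, which is the limitation raised in the message above.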



* Re: x86-64 assembly codes vs C source codes
  2021-12-22 14:23     ` Szabolcs Nagy
@ 2021-12-22 15:36       ` H.J. Lu
  0 siblings, 0 replies; 5+ messages in thread
From: H.J. Lu @ 2021-12-22 15:36 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Adhemerval Zanella, Florian Weimer, Cornea, Marius,
	GNU C Library, Pandey, Sunil K

On Wed, Dec 22, 2021 at 6:23 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>
> The 12/22/2021 05:39, H.J. Lu via Libc-alpha wrote:
> > On Wed, Dec 22, 2021 at 4:49 AM Adhemerval Zanella
> > <adhemerval.zanella@linaro.org> wrote:
> > > On 21/12/2021 13:18, H.J. Lu via Libc-alpha wrote:
> > > So I would ask whether you could improve the libmvec support to make it more
> > > generic; it would be really useful.
> >
> > The difficult part of a generic implementation, as you have noted, is the
> > compiler support.  libmvec is a vector math library from Intel SVML.  It
> > requires a compiler with good vectorizer support to get performance.
> > Even with an Intel compiler, we still need to hand-optimize the code after
> > compilation.  For the short term, we need assembly code for libmvec.  I will
> > discuss internally what we should do for the long term.
>
> i think it is hard to write generic simd math code
> even if there were good compiler support: the same
> algorithm is likely not optimal across targets due
> to different available instructions. but if we aim
> for maintainability instead of performance then it
> should be possible to have generic code.
>
> the c language has no generic simd support, the gcc
> vector syntax is very limited and targets handle simd
> intrinsics sufficiently differently that using an
> abstraction over the common subset is suboptimal:
> sleef is an example where this has been tried and
> it involves a fair bit of simd isa specific code and
> it does not give consistent performance across targets,
> it does not cover all aspects of target vector abis and
> it does not have to work with an old gcc. but it's
> likely more maintainable than writing asm code for
> each supported isa.
>

Hi Szabolcs,

Thanks for pointing out all these issues with the vector
math library.

We'd like to contribute high-performance libmvec functions
as well-commented assembly code to glibc 2.35.  Sunil
will commit acos this week and submit the rest next week.

-- 
H.J.

