public inbox for libc-alpha@sourceware.org
* RFC: Creating a more efficient sincos interface
@ 2018-09-13 13:27 Wilco Dijkstra
  2018-09-13 13:49 ` H.J. Lu
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Wilco Dijkstra @ 2018-09-13 13:27 UTC (permalink / raw)
  To: libc-alpha, gcc; +Cc: nd

Hi,

The existing sincos functions use 2 pointers to return the sine and cosine result. In
most cases 4 memory accesses are necessary per call. This is inefficient and often
significantly slower than returning values in registers. I ran a few experiments on the
new optimized sincosf implementation in GLIBC using the following interface:

__complex__ float sincosf2 (float);

This has 50% higher throughput and a 25% reduction in latency on Cortex-A72 for
random inputs in the range +-PI/4. Larger inputs take longer and thus have lower
gains, but there is still a 5% gain on the (rarely used) path with full range reduction.
Given that sincos is used in various HPC applications, this can give a worthwhile speedup.

LLVM already supports something similar for OSX using a struct of 2 floats.
Using complex float is better since not all targets may support returning structures in
floating point registers and GCC generates very inefficient code on targets that do
(PR86145).

What do people think? Ideally I'd like to support this in a generic way so all targets can
benefit, but it's also feasible to enable it on a per-target basis. Also since not all libraries
will support the new interface, there would have to be a flag or configure option to switch
the new interface off if not supported (maybe automatically based on the math.h header).

Wilco


* Re: RFC: Creating a more efficient sincos interface
  2018-09-13 13:27 RFC: Creating a more efficient sincos interface Wilco Dijkstra
@ 2018-09-13 13:49 ` H.J. Lu
  2018-09-13 13:52 ` Florian Weimer
  2018-09-13 14:32 ` Alexander Monakov
  2 siblings, 0 replies; 7+ messages in thread
From: H.J. Lu @ 2018-09-13 13:49 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: libc-alpha, gcc, nd

On Thu, Sep 13, 2018 at 6:27 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Hi,
>
> The existing sincos functions use 2 pointers to return the sine and cosine result. In
> most cases 4 memory accesses are necessary per call. This is inefficient and often
> significantly slower than returning values in registers. I ran a few experiments on the
> new optimized sincosf implementation in GLIBC using the following interface:
>
> __complex__ float sincosf2 (float);

Is this an internal interface or public one?

> This has 50% higher throughput and a 25% reduction in latency on Cortex-A72 for
> random inputs in the range +-PI/4. Larger inputs take longer and thus have lower
> gains, but there is still a 5% gain on the (rarely used) path with full range reduction.
> Given sincos is used in various HPC applications this can give a worthwhile speedup.
>
> LLVM already supports something similar for OSX using a struct of 2 floats.
> Using complex float is better since not all targets may support returning structures in
> floating point registers and GCC generates very inefficient code on targets that do
> (PR86145).
>
> What do people think? Ideally I'd like to support this in a generic way so all targets can
> benefit, but it's also feasible to enable it on a per-target basis. Also since not all libraries
> will support the new interface, there would have to be a flag or configure option to switch
> the new interface off if not supported (maybe automatically based on the math.h header).
>
> Wilco



-- 
H.J.


* Re: RFC: Creating a more efficient sincos interface
  2018-09-13 13:27 RFC: Creating a more efficient sincos interface Wilco Dijkstra
  2018-09-13 13:49 ` H.J. Lu
@ 2018-09-13 13:52 ` Florian Weimer
  2018-09-13 14:23   ` Szabolcs Nagy
  2018-09-13 14:54   ` Joseph Myers
  2018-09-13 14:32 ` Alexander Monakov
  2 siblings, 2 replies; 7+ messages in thread
From: Florian Weimer @ 2018-09-13 13:52 UTC (permalink / raw)
  To: Wilco Dijkstra, libc-alpha, gcc; +Cc: nd

On 09/13/2018 03:27 PM, Wilco Dijkstra wrote:
> Hi,
> 
> The existing sincos functions use 2 pointers to return the sine and cosine result. In
> most cases 4 memory accesses are necessary per call. This is inefficient and often
> significantly slower than returning values in registers. I ran a few experiments on the
> new optimized sincosf implementation in GLIBC using the following interface:
> 
> __complex__ float sincosf2 (float);
> 
> This has 50% higher throughput and a 25% reduction in latency on Cortex-A72 for
> random inputs in the range +-PI/4. Larger inputs take longer and thus have lower
> gains, but there is still a 5% gain on the (rarely used) path with full range reduction.
> Given sincos is used in various HPC applications this can give a worthwhile speedup.

I think this is totally fine if you call it expif or something like that 
(and put the sine in the imaginary part, of course).

In general, I would object to using complex numbers for arbitrary pairs, 
but this doesn't apply to this case.

Thanks,
Florian


* Re: RFC: Creating a more efficient sincos interface
  2018-09-13 13:52 ` Florian Weimer
@ 2018-09-13 14:23   ` Szabolcs Nagy
  2018-09-13 14:54   ` Joseph Myers
  1 sibling, 0 replies; 7+ messages in thread
From: Szabolcs Nagy @ 2018-09-13 14:23 UTC (permalink / raw)
  To: Florian Weimer, Wilco Dijkstra, libc-alpha, gcc; +Cc: nd

On 13/09/18 14:52, Florian Weimer wrote:
> On 09/13/2018 03:27 PM, Wilco Dijkstra wrote:
>> Hi,
>>
>> The existing sincos functions use 2 pointers to return the sine and cosine result. In
>> most cases 4 memory accesses are necessary per call. This is inefficient and often
>> significantly slower than returning values in registers. I ran a few experiments on the
>> new optimized sincosf implementation in GLIBC using the following interface:
>>
>> __complex__ float sincosf2 (float);
>>
>> This has 50% higher throughput and a 25% reduction in latency on Cortex-A72 for
>> random inputs in the range +-PI/4. Larger inputs take longer and thus have lower
>> gains, but there is still a 5% gain on the (rarely used) path with full range reduction.
>> Given sincos is used in various HPC applications this can give a worthwhile speedup.
> 
> I think this is totally fine if you call it expif or something like that (and put the sine in the imaginary part, of course).
> 

GCC seems to have a __builtin_cexpif:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/builtins.c;h=58ea7475ef7bb2a8abad2463b896efaa8fd79650;hb=HEAD#l2439

but I don't see it documented. Maybe we
can add an actual cexpif symbol with the
above signature?

> In general, I would object to using complex numbers for arbitrary pairs, but this doesn't apply to this case.
> 
> Thanks,
> Florian



* Re: RFC: Creating a more efficient sincos interface
  2018-09-13 13:27 RFC: Creating a more efficient sincos interface Wilco Dijkstra
  2018-09-13 13:49 ` H.J. Lu
  2018-09-13 13:52 ` Florian Weimer
@ 2018-09-13 14:32 ` Alexander Monakov
  2018-09-13 15:41   ` Richard Biener
  2 siblings, 1 reply; 7+ messages in thread
From: Alexander Monakov @ 2018-09-13 14:32 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: libc-alpha, gcc, nd

On Thu, 13 Sep 2018, Wilco Dijkstra wrote:
> What do people think? Ideally I'd like to support this in a generic way so all targets can
> benefit, but it's also feasible to enable it on a per-target basis. Also since not all libraries
> will support the new interface, there would have to be a flag or configure option to switch
> the new interface off if not supported (maybe automatically based on the math.h header).

GCC already has __builtin_cexpi for this, so I think you can introduce a cexpi
implementation in libc, and then adjust expand_builtin_cexpi appropriately.

I wonder if it would be possible to add a fallback cexpi implementation to
libgcc.a that would be picked by the linker if there's no such symbol in libm?

Alexander


* Re: RFC: Creating a more efficient sincos interface
  2018-09-13 13:52 ` Florian Weimer
  2018-09-13 14:23   ` Szabolcs Nagy
@ 2018-09-13 14:54   ` Joseph Myers
  1 sibling, 0 replies; 7+ messages in thread
From: Joseph Myers @ 2018-09-13 14:54 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Wilco Dijkstra, libc-alpha, gcc, nd

On Thu, 13 Sep 2018, Florian Weimer wrote:

> I think this is totally fine if you call it expif or something like that (and
> put the sine in the imaginary part, of course).

And declare it in bits/cmathcalls.h as included from complex.h, rather 
than in math.h.  With an appropriate custom RUN_TEST_LOOP_* macro that 
deals with the different order of expected results you should be able to 
put the tests in libm-test-sincos.inc, sharing the array of expected 
results with that for sincos rather than needing to generate a separate 
file of expected results with sin and cos swapped.  Presumably you'd want 
the various type-generic templates for complex functions using M_SINCOS to 
move to using (an implementation-namespace name for) the faster interface, 
but that could be a separate patch.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: RFC: Creating a more efficient sincos interface
  2018-09-13 14:32 ` Alexander Monakov
@ 2018-09-13 15:41   ` Richard Biener
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Biener @ 2018-09-13 15:41 UTC (permalink / raw)
  To: gcc, Alexander Monakov, Wilco Dijkstra; +Cc: libc-alpha, gcc, nd

On September 13, 2018 4:32:42 PM GMT+02:00, Alexander Monakov <amonakov@ispras.ru> wrote:
>On Thu, 13 Sep 2018, Wilco Dijkstra wrote:
>> What do people think? Ideally I'd like to support this in a generic way so all targets can
>> benefit, but it's also feasible to enable it on a per-target basis. Also since not all libraries
>> will support the new interface, there would have to be a flag or configure option to switch
>> the new interface off if not supported (maybe automatically based on the math.h header).
>
>GCC already has __builtin_cexpi for this, so I think you can introduce cexpi
>implementation in libc, and then adjust expand_builtin_cexpi appropriately.

Note that we currently expand that to sincos (if available) or cexp. We use it for
canonicalization and better optimization on GIMPLE (promoting the pointed-to
variables to registers).

>I wonder if it would be possible to add a fallback cexpi implementation to
>libgcc.a that would be picked by the linker if there's no such symbol in libm?

That would probably be a requirement. 

Richard. 

>
>Alexander

