public inbox for libc-alpha@sourceware.org
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
To: Joseph Myers <joseph@codesourcery.com>
Cc: nd <nd@arm.com>, "libc-alpha@sourceware.org" <libc-alpha@sourceware.org>
Subject: Re: Use floor functions not __floor functions in glibc libm
Date: Fri, 14 Sep 2018 12:57:00 -0000
Message-ID: <HE1PR08MB1035234F4C64C1263009541B83190@HE1PR08MB1035.eurprd08.prod.outlook.com>

Hi Joseph,

> Similar to the changes that were made to call sqrt functions directly
> in glibc, instead of __ieee754_sqrt variants, so that the compiler
> could inline them automatically without needing special inline
> definitions in lots of math_private.h headers, this patch makes libm
> code call floor functions directly instead of __floor variants,
> removing the inlines / macros for x86_64 (SSE4.1) and powerpc
> (POWER5).

Looks great, thanks for doing this! The more general mechanism means
it should be much easier to do this for the remaining functions. Yes, it
sounds like a good idea to do this for copysign too.

> Note that it's possible that in some cases an inline may be used where
> an IFUNC call was previously used - this is the case on x86_64, for
> example.  I think the direct calls to floor are still appropriate; if
> there's any significant performance cost from inline SSE2 floor
> instead of an IFUNC call ending up with SSE4.1 floor, that indicates
> that either the function should be doing something else that's faster
> than using floor at all, or it should itself have IFUNC variants, or
> that the compiler choice of inlining for generic tuning should change
> to allow for the possibility that, by not inlining, an SSE4.1 IFUNC
> might be called at runtime - but not that glibc should avoid calling
> floor internally.  (After all, all the same considerations would apply
> to any user program calling floor, where it might either be inlined or
> left as an out-of-line call allowing for a possible IFUNC.)  Any
> comments on this point?

Going via the PLT is expensive and it would be stupid to not inline simple
functions like floor, lrint etc. I did a quick experiment on floorf:
on AArch64 a tight loop with floorf inlined is at least twice as fast as one
making a library call. On x86_64 the library call through the PLT is at least
2.5 times slower than the inlined version.

The SSE2 floor sequence is twice as slow as the SSE4.1 instruction; however,
due to the high PLT call overhead, inlining the SSE2 version is still 25%
faster than calling a floorf that uses the SSE4.1 instruction. So inlining
these functions is always better.

Wilco


Thread overview: 4+ messages
2018-09-14 12:57 Wilco Dijkstra [this message]
2018-09-14 13:17 ` Joseph Myers
2018-09-17 21:14 ` Joseph Myers
  -- strict thread matches above, loose matches on Subject: below --
2018-09-12 12:17 Joseph Myers
