public inbox for libc-help@sourceware.org
From: "Stefan Kanthak" <stefan.kanthak@nexgo.de>
To: <libc-help@sourceware.org>,
	"Adhemerval Zanella" <adhemerval.zanella@linaro.org>
Subject: Re: Twiddling with 64-bit values as 2 ints;
Date: Mon, 23 Aug 2021 17:37:13 +0200
Message-ID: <3F07DF81FC2040E69CB83A78EDE05BB7@H270>
In-Reply-To: <f8460d66-dec6-9852-3710-8e5d6627df54@linaro.org>

Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:

> On 23/08/2021 10:18, Stefan Kanthak wrote:
>> Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>> 
>>> On 21/08/2021 10:34, Stefan Kanthak wrote:
>>>>
>>>> (Heretic.-) questions:
>>>> - why does glibc still employ such ugly code?
>>>> - Why doesn't glibc take advantage of 64-bit integers in such code?
>>>
>>> Because no one cared to adjust the implementation.  Recently Wilco
>>> has removed a lot of old code that still uses 32-bit instead of 64-bit
>>> to do bit twiddling in floating-point implementations (check caa884dda7
>>> and 9e97f239eae1f2).
>> 
>> That's good to hear.
>> 
>>> I think we should move to the simplest code, assuming a 64-bit CPU
>> 
>> D'accord.
>> And there's a second direction where you might move: almost all CPUs
>> have separate general purpose registers and floating-point registers.
>> Bit-twiddling generally needs extra (and sometimes slow) transfers
>> between them.
>> In a 32-bit environment, where arguments are typically passed on the
>> stack, loading an argument from the stack into a GPR or an FPR
>> makes no difference.
>> In a 64-bit environment, where arguments are passed in registers, they
>> should be operated on in these registers.
>> 
>> So: why not implement routines like nextafter() without bit-twiddling,
>> using floating-point as far as possible for architectures where this
>> gives better results?
> 
> Mainly because some math routines are not performance critical, in the
> sense that they are usually not hotspots, and for these I would prefer the
> simplest code that works with reasonable performance independently of
> the underlying ABI or architecture

With this we're back at square 1: my initial post showed such simple(st)
code.
The performance gain I experienced in my use case was more than
noticeable: on AMD64, the total runtime of my program decreased from
20s to 12s.

> (using integer operations might be better for a soft-fp ABI, for instance).



> For symbols that might be performance critical, we do have more optimized
> versions.  Szabolcs and Wilco spent considerable time tuning a lot of
> math functions and removing the slow code paths; also, for some routines
> we have internal defines that map them to compiler builtins when we know
> that the compiler and architecture allow us to do so (check the rounding
> routines or sqrt, for instance).
> 
> Recently we have been aiming to avoid arch-specific code for complex
> routines, and to prefer C implementations that leverage compiler support.
> This makes the code *much* more maintainable, without the need to keep
> re-evaluating the routines on each new iteration of an architecture (as some
> routines have proven to be slower than better-written generic implementations).

That was the goal of my patch: let the compiler operate on 64-bit integers
instead of having the C implementation operate on pairs of 32-bit integers.
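
To illustrate the difference, here is a minimal sketch, assuming IEEE 754
binary64 doubles. The helper names are hypothetical and are not glibc's
internal macros. It tests for zero once on a single 64-bit integer and
once on a high/low pair of 32-bit words:

```c
#include <stdint.h>
#include <string.h>

/* Read the bit pattern of a double as one 64-bit integer. */
static uint64_t double_bits(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);   /* well-defined type punning */
    return u;
}

/* Older style: the same pattern split into high and low 32-bit words. */
static void double_words(double x, uint32_t *hi, uint32_t *lo)
{
    uint64_t u = double_bits(x);
    *hi = (uint32_t)(u >> 32);
    *lo = (uint32_t)u;
}

/* Zero test on 64 bits: shift the sign bit out, compare once. */
static int is_zero_64(double x)
{
    return (double_bits(x) << 1) == 0;
}

/* Zero test on two halves: mask the sign in the high word, OR in the low word. */
static int is_zero_32x2(double x)
{
    uint32_t hi, lo;
    double_words(x, &hi, &lo);
    return ((hi & 0x7fffffffu) | lo) == 0;
}
```

Both tests give the same answer; the 64-bit form simply leaves the
splitting, if any is needed, to the compiler.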

>> The simple implementation I showed in my initial post improved the
>> throughput in my benchmark (on AMD64) by an order of magnitude.
>> In Szabolcs Nagy's benchmark, which measures latency, it took 0.04 ns/call
>> longer (5.72 ns vs. 5.68 ns) -- despite the POOR job GCC does on FP.
> 
> Your implementation triggered a lot of regressions,

The initial, FP-preferring code was a demonstration, not a patch.

> you will need to sort this out before considering performance numbers.
> Also, we will need a proper benchmark to evaluate it, as Szabolcs and
> Wilco have done for their math work.
> 
>> 
>> Does GLIBC offer a macro like "PREFER_FP_IMPLEMENTATION" that can be
>> used to select between the integer bit-twiddling code and FP-preferring
>> code during compilation?
> 
> No, and I don't think this would be a good addition.  As before, I would
> prefer to have a simple generic implementation that gives us good
> performance on modern hardware instead of a configurable one with many
> tunables.  The latter increases the maintenance cost (with testing and
> performance evaluation).

Having dedicated implementations for different architectures is even more
costly!
My intention/proposal is to have at most two different generic implementations,
one using integer bit-twiddling wherever possible, thus supporting soft-fp well,
the second using floating-point wherever possible, thus supporting modern
hardware well.
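
A minimal sketch of what such a pair could look like for a nextafter()-style
routine, assuming IEEE 754 binary64 doubles. All names are hypothetical, and
PREFER_FP_IMPLEMENTATION is the macro asked about above, not something glibc
provides. The sketch ignores errno and floating-point exception handling,
which a real implementation would have to add:

```c
#include <stdint.h>
#include <string.h>

#define SIGN_BIT UINT64_C(0x8000000000000000)
#define INF_BITS UINT64_C(0x7ff0000000000000)

static uint64_t to_bits(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

static double from_bits(uint64_t u)
{
    double x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Variant 1: every decision is made on the 64-bit integer pattern. */
static double next_after_int(double from, double to)
{
    uint64_t f = to_bits(from), t = to_bits(to);
    uint64_t af = f & ~SIGN_BIT, at = t & ~SIGN_BIT;

    if (af > INF_BITS) return from;           /* from is NaN */
    if (at > INF_BITS) return to;             /* to is NaN */
    if (f == t || (af | at) == 0) return to;  /* equal, including +0 vs -0 */
    if (af == 0)                              /* step off zero toward to */
        return from_bits((t & SIGN_BIT) | 1);
    if (((f ^ t) & SIGN_BIT) || af > at)
        f--;                                  /* magnitude shrinks */
    else
        f++;                                  /* magnitude grows */
    return from_bits(f);
}

/* Variant 2: decisions use FP comparisons; only the final step is integer. */
static double next_after_fp(double from, double to)
{
    if (from != from) return from;            /* from is NaN */
    if (to != to) return to;                  /* to is NaN */
    if (from == to) return to;
    if (from == 0.0)                          /* smallest subnormal, signed */
        return to < 0.0 ? -0x1p-1074 : 0x1p-1074;

    uint64_t f = to_bits(from);
    if ((from < to) == (from < 0.0))
        f--;                                  /* moving toward zero */
    else
        f++;                                  /* moving away from zero */
    return from_bits(f);
}

/* Hypothetical compile-time selection between the two generic variants. */
#ifdef PREFER_FP_IMPLEMENTATION
#define next_after next_after_fp
#else
#define next_after next_after_int
#endif
```

Both variants are plain C with no architecture-specific code; the only
difference is whether the comparisons happen in integer or floating-point
registers.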

Stefan
