From: "Stefan Kanthak" <stefan.kanthak@nexgo.de>
To: <libc-help@sourceware.org>,
"Adhemerval Zanella" <adhemerval.zanella@linaro.org>
Subject: Re: Twiddling with 64-bit values as 2 ints;
Date: Mon, 23 Aug 2021 17:37:13 +0200
Message-ID: <3F07DF81FC2040E69CB83A78EDE05BB7@H270>
In-Reply-To: <f8460d66-dec6-9852-3710-8e5d6627df54@linaro.org>

Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
> On 23/08/2021 10:18, Stefan Kanthak wrote:
>> Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>>
>>> On 21/08/2021 10:34, Stefan Kanthak wrote:
>>>>
>>>> (Heretic.-) questions:
>>>> - why does glibc still employ such ugly code?
>>>> - Why doesn't glibc take advantage of 64-bit integers in such code?
>>>
>>> Because no one cared to adjust the implementation. Recently Wilco
>>> has removed a lot of old code that still used 32-bit instead of 64-bit
>>> integers for bit twiddling in floating-point implementations (check
>>> caa884dda7 and 9e97f239eae1f2).
>>
>> That's good to hear.
>>
>>> I think we should move to simpler code that assumes a 64-bit CPU
>>
>> D'accord.
>> And there's a second direction where you might move: almost all CPUs
>> have separate general purpose registers and floating-point registers.
>> Bit-twiddling generally needs extra (and sometimes slow) transfers
>> between them.
>> In a 32-bit environment, where arguments are typically passed on the
>> stack, at least loading an argument from the stack into a GPR or FPR
>> makes no difference.
>> In a 64-bit environment, where arguments are passed in registers, they
>> should be operated on in these registers.
>>
>> So: why not implement routines like nextafter() without bit-twiddling,
>> using floating-point as far as possible for architectures where this
>> gives better results?
>
> Mainly because some math routines are not performance critical in the
> sense they are usually not hotspots and for these I would prefer the
> simplest code that works with reasonable performance independently of
> the underlying ABI or architecture

With this we're back at square 1: my initial post showed such simple(st)
code.
The performance gain I experienced in my use case was more than
noticeable: on AMD64, the total runtime of my program decreased from
20s to 12s.
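
For readers without the initial post at hand, a routine of this kind
might look roughly like the following. This is a minimal sketch, not the
code from the initial post and not glibc's implementation: the name
nextafter_sketch is hypothetical, it assumes IEEE 754 binary64 doubles,
and it deliberately ignores errno and floating-point exception flags,
which a conforming nextafter() has to handle.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: special cases are decided with floating-point
   comparisons; only the final one-ulp step touches the bit pattern,
   and it does so as a single 64-bit integer. */
double nextafter_sketch(double from, double to)
{
    if (from != from) return from;      /* from is NaN */
    if (to != to) return to;            /* to is NaN */
    if (from == to) return to;          /* also covers +0.0 vs. -0.0 */
    if (from == 0.0)                    /* step to the smallest subnormal */
        return to < 0.0 ? -0x1p-1074 : 0x1p-1074;

    uint64_t bits;
    memcpy(&bits, &from, sizeof bits);  /* well-defined type punning */

    /* Moving toward zero decrements the magnitude bits,
       moving away from zero increments them. */
    if ((from < to) == (from < 0.0))
        bits--;
    else
        bits++;

    memcpy(&from, &bits, sizeof from);
    return from;
}
```

Whether the compiler keeps the comparisons in floating-point registers
and how costly the single register transfer is depends on the target;
that is what a benchmark would have to show.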

> (using integer operations might be better for a soft-fp ABI, for instance).
> For symbols that might be performance critical, we do have more optimized
> versions. Szabolcs and Wilco spent considerable time to tune a lot of
> math functions and to remove the slow code paths; also for some routines
> we have internal defines that map them to compiler builtins when we know
> that the compiler and architecture allow us to do so (check the rounding
> routines or sqrt for instance).
>
> Recently we are aiming to avoid arch-specific code for complex routines,
> and prefer C implementations that leverage the compiler support. It makes
> for *much* more maintainable code, without the need to keep re-evaluating
> the routines on each new iteration of an architecture (as some routines
> proved to be slower than better-coded generic implementations).

That was the goal of my patch: let the compiler operate on 64-bit integers
instead of having the C implementation operate on pairs of 32-bit integers.
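
To illustrate the difference, here is a minimal sketch (not the patch
itself) that advances a positive, finite double by one ulp in both
styles. All function names are hypothetical, and IEEE 754 binary64
doubles are assumed.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its 64-bit pattern and back. */
static inline uint64_t as_uint64(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

static inline double as_double(uint64_t u)
{
    double x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Pair-of-words style: the C code propagates the carry by hand.
   Assumes x is finite and greater than zero. */
static double step_up_2x32(double x)
{
    uint64_t u = as_uint64(x);
    uint32_t hi = (uint32_t)(u >> 32);
    uint32_t lo = (uint32_t)u;

    lo += 1;
    if (lo == 0)        /* low word wrapped around: carry into high word */
        hi += 1;
    return as_double(((uint64_t)hi << 32) | lo);
}

/* 64-bit style: one integer add, the carry is the compiler's business.
   Assumes x is finite and greater than zero. */
static double step_up_64(double x)
{
    return as_double(as_uint64(x) + 1);
}
```

Both functions return the same value; the second simply leaves it to the
compiler to pick a single 64-bit add or an add-with-carry pair for the
target at hand.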

>> The simple implementation I showed in my initial post improved the
>> throughput in my benchmark (on AMD64) by an order of magnitude.
>> In Szabolcs Nagy's benchmark measuring latency it took 0.04ns/call
>> longer (5.72ns vs. 5.68ns) -- despite the POOR job GCC does on FP.
>
> Your implementation triggered a lot of regressions,

The initial, FP-preferring code was a demonstration, not a patch.

> you will need to sort this out before considering performance numbers.
> Also, we will need a proper benchmark to evaluate it, as Szabolcs and
> Wilco have done for their math work.
>
>>
>> Does GLIBC offer a macro like "PREFER_FP_IMPLEMENTATION" that can be
>> used to select between the integer bit-twiddling code and FP-preferring
>> code during compilation?
>
> No, and I don't think this would be a good addition. As before, I would
> prefer to have a simple generic implementation that gives us good
> performance on modern hardware instead of a configurable one with many
> tunables. The latter increases the maintenance cost (with testing and
> performance evaluation).

Having dedicated implementations for different architectures is even more
costly!
My intention/proposal is to have at most two different generic implementations,
one using integer bit-twiddling wherever possible, thus supporting soft-fp well,
the second using floating-point wherever possible, thus supporting modern
hardware well.
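
As a sketch of what such a two-way selection could look like for a
trivial routine: the macro PREFER_FP_IMPLEMENTATION is the hypothetical
one asked about above (glibc does not provide it), is_nan_sketch is a
made-up name, and IEEE 754 binary64 doubles are assumed.

```c
#include <stdint.h>
#include <string.h>

#ifdef PREFER_FP_IMPLEMENTATION

/* Floating-point variant: a NaN is the only value that compares
   unequal to itself (relies on IEEE semantics, so no -ffast-math). */
static int is_nan_sketch(double x)
{
    return x != x;
}

#else

/* Integer variant, friendly to soft-fp: a NaN has all exponent bits
   set and a non-zero mantissa, so with the sign bit masked off its
   bit pattern is greater than that of infinity. */
static int is_nan_sketch(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (bits & UINT64_C(0x7fffffffffffffff)) > UINT64_C(0x7ff0000000000000);
}

#endif
```

Both variants are generic C; only the choice between them would be made
per target, for example by building with -DPREFER_FP_IMPLEMENTATION.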
Stefan