Re: Twiddling with 64-bit values as 2 ints;


From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: Stefan Kanthak <stefan.kanthak@nexgo.de>, libc-help@sourceware.org
Subject: Re: Twiddling with 64-bit values as 2 ints;
Date: Mon, 23 Aug 2021 13:51:27 -0300
Message-ID: <0978c043-b32b-ecf8-5cfe-de31c473bb4d@linaro.org>
In-Reply-To: <3F07DF81FC2040E69CB83A78EDE05BB7@H270>



On 23/08/2021 12:37, Stefan Kanthak wrote:
> Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
> 
>> On 23/08/2021 10:18, Stefan Kanthak wrote:
>>> Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>>>
>>>> On 21/08/2021 10:34, Stefan Kanthak wrote:
>>>>>
>>>>> (Heretic.-) questions:
>>>>> - why does glibc still employ such ugly code?
>>>>> - Why doesn't glibc take advantage of 64-bit integers in such code?
>>>>
>>>> Because no one cared to adjust the implementation.  Recently Wilco
>>>> has removed a lot of old code that still used 32-bit instead of 64-bit
>>>> operations to do bit twiddling in floating-point implementations (check
>>>> caa884dda7 and 9e97f239eae1f2).
>>>
>>> That's good to hear.
>>>
>>>> I think we should move to the simplest code, assuming a 64-bit CPU
>>>
>>> D'accord.
>>> And there's a second direction where you might move: almost all CPUs
>>> have separate general purpose registers and floating-point registers.
>>> Bit-twiddling generally needs extra (and sometimes slow) transfers
>>> between them.
>>> In 32-bit environment, where arguments are typically passed on the
>>> stack, at least loading an argument from the stack into a GPR or FPR
>>> makes no difference.
>>> In 64-bit environment, where arguments are passed in registers, they
>>> should be operated on in these registers.
>>>
>>> So: why not implement routines like nextafter() without bit-twiddling,
>>> using floating-point as far as possible for architectures where this
>>> gives better results?
>>
>> Mainly because some math routines are not performance critical, in the
>> sense that they are usually not hotspots, and for these I would prefer the
>> simplest code that works with reasonable performance independently of
>> the underlying ABI or architecture
> 
> With this we're back at square 1: my initial post showed such simple(st)
> code.
> The performance gain I experienced in my use case was more than
> noticeable: on AMD64, the total runtime of my program decreased from
> 20s to 12s.
> 
>> (using integer operations might be better for a soft-fp ABI, for instance).
> 
> 
> 
>> For symbols that might be performance critical, we do have more optimized
>> versions.  Szabolcs and Wilco spent considerable time tuning a lot of
>> math functions and removing the slow code paths; also, for some routines
>> we have internal defines that map them to compiler builtins when we know
>> that the compiler and architecture allow us to do so (check the rounding
>> routines or sqrt for instance).
>>
>> Recently we have been aiming to avoid arch-specific code for complex
>> routines, and prefer C implementations that leverage compiler support.
>> This makes for *much* more maintainable code, without the need to keep
>> re-evaluating the routines on each new iteration of an architecture (some
>> arch-specific routines have proven to be slower than a well-coded generic
>> implementation).
> 
> That was the goal of my patch: let the compiler operate on 64-bit integers
> instead of having the C implementation operate on pairs of 32-bit integers.
> 
>>> The simple implementation I showed in my initial post improved the
>>> throughput in my benchmark (on AMD64) by an order of magnitude.
>>> In Szabolcs Nagy's benchmark measuring latency it took 0.04ns/call
>>> longer (5.72ns vs. 5.68ns) -- despite the POOR job GCC does on FP.
>>
>> Your implementation triggered a lot of regressions,
> 
> The initial, FP-preferring code was a demonstration, not a patch.

Right, but it does not make much sense to compare performance numbers
against an implementation that adds a lot of regressions.

> 
>> you will need to sort this out before considering performance numbers.
>> Also, we will need a proper benchmark to evaluate it, as Szabolcs and
>> Wilco have done for their math work.
>>
>>>
>>> Does GLIBC offer a macro like "PREFER_FP_IMPLEMENTATION" that can be
>>> used to select between the integer bit-twiddling code and FP-preferring
>>> code during compilation?
>>
>> No, and I don't think this would be a good addition.  As before, I would
>> prefer to have a simple generic implementation that gives us good
>> performance on modern hardware instead of a configurable one with many
>> tunables.  The latter increases the maintenance cost (with testing and
>> performance evaluation).
> 
> Having dedicated implementations for different architectures is even more
> costly!
> My intention/proposal is to have at most two different generic implementations,
> one using integer bit-twiddling wherever possible, thus supporting soft-fp well,
> the second using floating-point wherever possible, thus supporting modern
> hardware well.

The only reservation I have about such an approach is that it would add some more
maintenance and testing.  I added a similar optimization for hypot on powerpc, mainly
to avoid a CPU pipeline hazard on some chips due to the GPR to FP transfer; but I am
working on a generic solution so that the powerpc-specific implementation can simply
be removed in favor of a generic one.
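
For reference, the difference between twiddling a double as two 32-bit
words and as one 64-bit integer can be sketched as below.  This is only an
illustration, not glibc code: the helper names (double_from_bits,
next_up_2x32, next_up_64) are hypothetical, it assumes an IEEE 754 binary64
double, and it only handles finite inputs with the sign bit clear (it is
not a full nextafter).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a double from its 64-bit representation.
   memcpy avoids the aliasing problems of pointer casts. */
static double double_from_bits(uint64_t bits)
{
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Hypothetical helper: get the 64-bit representation of a double. */
static uint64_t bits_from_double(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Old style: treat the value as a (high, low) pair of 32-bit words and
   propagate the carry by hand.  Valid only for finite x with the sign
   bit clear. */
static double next_up_2x32(double x)
{
    uint64_t bits = bits_from_double(x);
    uint32_t hi = (uint32_t)(bits >> 32);
    uint32_t lo = (uint32_t)bits;

    lo += 1;
    if (lo == 0)        /* low word wrapped around: carry into high word */
        hi += 1;

    return double_from_bits(((uint64_t)hi << 32) | lo);
}

/* 64-bit style: one integer increment, the compiler handles the rest.
   Same input restriction as above. */
static double next_up_64(double x)
{
    return double_from_bits(bits_from_double(x) + 1);
}
```

Both functions return the next representable double above x for the
restricted input range; the 64-bit variant simply leaves the carry to the
compiler instead of spelling it out in C.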


Thread overview: 8+ messages
2021-08-21 13:34 Stefan Kanthak
2021-08-23 12:23 ` Adhemerval Zanella
2021-08-23 13:18   ` Stefan Kanthak
2021-08-23 14:11     ` Adhemerval Zanella
2021-08-23 15:37       ` Stefan Kanthak
2021-08-23 16:51         ` Adhemerval Zanella [this message]
2021-08-23 17:32           ` Stefan Kanthak
2021-08-23 18:24             ` Adhemerval Zanella
