Twiddling with 64-bit values as 2 ints;

public inbox for libc-help@sourceware.org

* Twiddling with 64-bit values as 2 ints;
@ 2021-08-21 13:34 Stefan Kanthak
  2021-08-23 12:23 ` Adhemerval Zanella
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Kanthak @ 2021-08-21 13:34 UTC (permalink / raw)
  To: libc-help

Hi,

C89 was published 32 years ago; 64-bit integers, [un]signed long long,
were standardized in C99 and offered as a compiler extension before that.
IEEE 754 defined the 64-bit double-precision floating-point format,
now called binary64, in 1985.

Nevertheless, SunSoft's [fd]libm in particular, which (to my knowledge)
started around this time, IBM's APMathLib/libultim, which followed a
little later, and quite a few ACM TOMS routines use (pairs of) 32-bit
integers for bit-twiddling on the representation of double/binary64:
additions/subtractions/shifts on the 52-bit mantissa/fraction, and
operations on the full 64-bit double, involve both ints and need to
take care of the carry/borrow -- explicitly, and quite ugly!
It's also generally unknown whether a compiler will recognize this
sort of carry/borrow/overflow handling and generate proper machine
code using "add with carry"/"subtract with borrow" instructions.

JFTR: while sticking with 32-bit integers MAY give better performance
      on 32-bit processors, especially when an operation involves
      only the low or only the high part, the explicit carry/borrow
      handling can have a negative performance impact.
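
A minimal sketch of the difference, assuming nothing beyond <stdint.h>:
the struct and all function names below are made up for illustration and
are not taken from any of the libraries mentioned. The first pair of
functions propagates the carry/borrow by hand across two 32-bit halves;
the second pair leaves it to the 64-bit integer type.

```c
#include <stdint.h>

/* Hypothetical helper type: a 64-bit pattern kept as two 32-bit halves. */
struct halves { uint32_t hi, lo; };

/* Increment with the carry propagated by hand. */
static struct halves inc_halves(struct halves v)
{
    v.lo += 1;
    if (v.lo == 0)      /* low half wrapped around: carry into high half */
        v.hi += 1;
    return v;
}

/* Decrement with the borrow propagated by hand. */
static struct halves dec_halves(struct halves v)
{
    if (v.lo == 0)      /* low half is about to wrap: borrow from high half */
        v.hi -= 1;
    v.lo -= 1;
    return v;
}

/* The same operations on one 64-bit integer: carry/borrow is implicit. */
static uint64_t inc_u64(uint64_t v) { return v + 1; }
static uint64_t dec_u64(uint64_t v) { return v - 1; }

/* Glue the halves back together so both variants can be compared. */
static uint64_t join(struct halves v)
{
    return ((uint64_t)v.hi << 32) | v.lo;
}
```

Whether a given compiler turns the first variant into "add with carry" or
"subtract with borrow" instructions has to be checked in the generated code.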

See for example <http://www.netlib.no/netlib/toms/722>, written by
William J. Cody (known from Cody/Waite range reduction):

|    W. J. Cody, J. T. Coonen, March 30, 1992
...
|       /* Otherwise, use integer arithmetic to increment or      */
|       /* decrement least significant half of z, being careful   */
|       /* with carries and borrows involving most significant    */
|       /* half.                                                  */
|          else if (((argx < Zero) && (argx < argy)) ||
|                   ((argx > Zero) && (argx > argy))) {
|                   --lowpart(z);
|                   if (lowpart(z) == -1)
|                      --highpart(z);
|                   }
|                else {
|                   ++lowpart(z);
|                   if (lowpart(z) == 0)
|                      ++highpart(z);
|                   }
|

Compare this with the REALLY UGLY
<https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=math/s_nextafter.c;hb=HEAD>

|  * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
...
|        if(((ix>=0x7ff00000)&&((ix-0x7ff00000)|lx)!=0) ||   /* x is nan */
|           ((iy>=0x7ff00000)&&((iy-0x7ff00000)|ly)!=0))     /* y is nan */
|           return x+y;
...
|        if(hx>=0) {                               /* x > 0 */
|            if(hx>hy||((hx==hy)&&(lx>ly))) {      /* x > y, x -= ulp */
|                if(lx==0) hx -= 1;
|                lx -= 1;
|            } else {                              /* x < y, x += ulp */
|                lx += 1;
|                if(lx==0) hx += 1;
|            }
|        } else {                                  /* x < 0 */
|            if(hy>=0||hx>hy||((hx==hy)&&(lx>ly))){/* x < y, x -= ulp */
|                if(lx==0) hx -= 1;
|                lx -= 1;
|            } else {                              /* x > y, x += ulp */
|                lx += 1;
|                if(lx==0) hx += 1;
|            }
|        }

(Heretic.-) questions:
- why does glibc still employ such ugly code?
- Why doesn't glibc take advantage of 64-bit integers in such code?

JFTR: on 64-bit processors, when the compiler does not recognize
      that hx:lx and hy:ly are in fact a single 64-bit integer it
      can hold in a SINGLE register, but smears it over 2 registers,
      such cruft kills performance.

For 32-bit processors, the JFTR from above still holds: using 64-bit
integers with a C89 compiler should give better machine code.
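
For illustration only, a nextafter-like step on a single 64-bit integer
could look like the sketch below. It is neither glibc's code nor any
implementation referred to in this thread; the name nextafter_u64 is
made up, binary64 doubles are assumed, and floating-point exception
flags and errno are deliberately not handled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical name; a sketch, not a drop-in replacement for nextafter(). */
static double nextafter_u64(double x, double y)
{
    uint64_t ux, uy;

    if (x != x || y != y)       /* either argument is NaN */
        return x + y;
    if (x == y)                 /* also covers +0.0 == -0.0 */
        return y;

    memcpy(&ux, &x, sizeof ux); /* reinterpret the bits without aliasing UB */
    memcpy(&uy, &y, sizeof uy);

    if ((ux << 1) == 0) {
        /* x is +0.0 or -0.0: smallest subnormal carrying the sign of y */
        ux = (uy & 0x8000000000000000ull) | 1;
    } else if ((x < y) == (x > 0.0)) {
        ux += 1;                /* step away from zero: magnitude grows */
    } else {
        ux -= 1;                /* step toward zero: magnitude shrinks */
    }

    memcpy(&x, &ux, sizeof x);
    return x;
}
```

The single `ux += 1` / `ux -= 1` replaces the four hand-written
carry/borrow sequences in the excerpt above, because sign-magnitude
ordering of binary64 matches unsigned integer ordering of the low 63 bits.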

Stefan


* Re: Twiddling with 64-bit values as 2 ints;
  2021-08-21 13:34 Twiddling with 64-bit values as 2 ints; Stefan Kanthak
@ 2021-08-23 12:23 ` Adhemerval Zanella
  2021-08-23 13:18   ` Stefan Kanthak
  0 siblings, 1 reply; 8+ messages in thread
From: Adhemerval Zanella @ 2021-08-23 12:23 UTC (permalink / raw)
  To: Stefan Kanthak, libc-help

On 21/08/2021 10:34, Stefan Kanthak wrote:
> 
> (Heretic.-) questions:
> - why does glibc still employ such ugly code?
> - Why doesn't glibc take advantage of 64-bit integers in such code?

Because no one cared to adjust the implementation.  Recently Wilco
has removed a lot of old code that still used 32-bit instead of 64-bit
integers to do bit twiddling in floating-point implementations (check
caa884dda7 and 9e97f239eae1f2).

I think we should move to the simplest code, assuming a 64-bit CPU,
and let the compiler optimize it (although unfortunately gcc is not that
smart in all cases).


* Re: Twiddling with 64-bit values as 2 ints;
  2021-08-23 12:23 ` Adhemerval Zanella
@ 2021-08-23 13:18   ` Stefan Kanthak
  2021-08-23 14:11     ` Adhemerval Zanella
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Kanthak @ 2021-08-23 13:18 UTC (permalink / raw)
  To: libc-help, Adhemerval Zanella

Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:

> On 21/08/2021 10:34, Stefan Kanthak wrote:
>> 
>> (Heretic.-) questions:
>> - why does glibc still employ such ugly code?
>> - Why doesn't glibc take advantage of 64-bit integers in such code?
> 
> Because no one cared to adjust the implementation.  Recently Wilco
> has removed a lot of old code that still used 32-bit instead of 64-bit
> integers to do bit twiddling in floating-point implementations (check
> caa884dda7 and 9e97f239eae1f2).

That's good to hear.

> I think we should move to use a simplest code assuming 64-bit CPU

D'accord.
And there's a second direction where you might move: almost all CPUs
have separate general purpose registers and floating-point registers.
Bit-twiddling generally needs extra (and sometimes slow) transfers
between them.
In a 32-bit environment, where arguments are typically passed on the
stack, it at least makes no difference whether an argument is loaded
from the stack into a GPR or an FPR.
In a 64-bit environment, where arguments are passed in registers, they
should be operated on in these registers.

So: why not implement routines like nextafter() without bit-twiddling,
using floating-point as far as possible for architectures where this
gives better results?
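
A small, hedged illustration of the two styles, using the NaN test
from the s_nextafter.c excerpt as the example: is_nan_bits() needs the
bit pattern in an integer (typically a general purpose register), while
is_nan_fp() stays in the floating-point domain. Both names are made up
for this sketch, binary64 doubles are assumed, and the FP variant is
only valid when the compiler honours NaN semantics (i.e. not under
options such as -ffast-math).

```c
#include <stdint.h>
#include <string.h>

/* Integer style: NaN means exponent all ones and a non-zero fraction,
   i.e. the value without its sign bit is above the pattern of infinity. */
static int is_nan_bits(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);   /* bit copy from the FP value to an integer */
    return (u & 0x7fffffffffffffffull) > 0x7ff0000000000000ull;
}

/* Floating-point style: a NaN is the only value that differs from itself. */
static int is_nan_fp(double x)
{
    return x != x;
}
```

Which variant is cheaper depends on the target and on where the argument
already lives; that has to be measured, not assumed.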

The simple implementation I showed in my initial post improved the
throughput in my benchmark (on AMD64) by an order of magnitude.
In Szabolcs Nagy's benchmark, which measures latency, it took 0.04ns/call
longer (5.72ns vs. 5.68ns) -- despite the POOR job GCC does on FP.

Does GLIBC offer a macro like "PREFER_FP_IMPLEMENTATION" that can be
used to select between the integer bit-twiddling code and FP-preferring
code during compilation?

> and let the compiler optimize it (which unfortunately gcc is not that
> smart in all the cases).

I know, and I just learned that GCC does NOT perform quite a few
optimisations I expect from a mature compiler.
Quoting Jakub Jelinek on gcc@gcc.gnu.org:

| GCC doesn't do value range propagation of floating point values, not
| even the special ones like NaNs, infinities, +/- zeros etc., and without
| that the earlier ifs aren't taken into account for the earlier code.

The code I used to demonstrate this deficiency is TOMS 722...

Stefan


* Re: Twiddling with 64-bit values as 2 ints;
  2021-08-23 13:18   ` Stefan Kanthak
@ 2021-08-23 14:11     ` Adhemerval Zanella
  2021-08-23 15:37       ` Stefan Kanthak
  0 siblings, 1 reply; 8+ messages in thread
From: Adhemerval Zanella @ 2021-08-23 14:11 UTC (permalink / raw)
  To: Stefan Kanthak, libc-help

On 23/08/2021 10:18, Stefan Kanthak wrote:
> Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
> 
>> On 21/08/2021 10:34, Stefan Kanthak wrote:
>>>
>>> (Heretic.-) questions:
>>> - why does glibc still employ such ugly code?
>>> - Why doesn't glibc take advantage of 64-bit integers in such code?
>>
>> Because no one cared to adjust the implementation.  Recently Wilco
>> has removed a lot of old code that still used 32-bit instead of 64-bit
>> integers to do bit twiddling in floating-point implementations (check
>> caa884dda7 and 9e97f239eae1f2).
> 
> That's good to hear.
> 
>> I think we should move to use a simplest code assuming 64-bit CPU
> 
> D'accord.
> And there's a second direction where you might move: almost all CPUs
> have separate general purpose registers and floating-point registers.
> Bit-twiddling generally needs extra (and sometimes slow) transfers
> between them.
> In 32-bit environment, where arguments are typically passed on the
> stack, at least loading an argument from the stack into a GPR or FPR
> makes no difference.
> In 64-bit environment, where arguments are passed in registers, they
> should be operated on in these registers.
> 
> So: why not implement routines like nextafter() without bit-twiddling,
> using floating-point as far as possible for architectures where this
> gives better results?

Mainly because some math routines are not performance critical, in the
sense that they are usually not hotspots, and for these I would prefer the
simplest code that works with reasonable performance independently of
the underlying ABI or architecture (using integer operations might be
better for a soft-fp ABI, for instance).

For symbols that might be performance critical, we do have more optimized
versions.  Szabolcs and Wilco spent considerable time tuning a lot of
math functions and removing slow code paths; also, for some routines
we have internal defines that map them to compiler builtins when we know
that the compiler and architecture allow us to do so (check the rounding
routines or sqrt, for instance).
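
The general pattern of such a define can be sketched as follows. This is
not glibc's internal code: the macro SKETCH_USE_FABS_BUILTIN and the
function sketch_fabs are hypothetical, and fabs is used instead of the
rounding routines or sqrt only because it keeps the example short.
__builtin_fabs itself is an existing GCC/Clang builtin.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical switch for this sketch: use the builtin when the
   compiler is known to provide it, otherwise the generic code. */
#if defined(__GNUC__) || defined(__clang__)
# define SKETCH_USE_FABS_BUILTIN 1
#else
# define SKETCH_USE_FABS_BUILTIN 0
#endif

static double sketch_fabs(double x)
{
#if SKETCH_USE_FABS_BUILTIN
    /* Let the compiler pick the best instruction for the target. */
    return __builtin_fabs(x);
#else
    /* Generic fallback: clear the sign bit of the binary64 pattern. */
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    u &= 0x7fffffffffffffffull;
    memcpy(&x, &u, sizeof x);
    return x;
#endif
}
```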

Recently we have been aiming to avoid arch-specific code for complex routines
and to prefer C implementations that leverage compiler support.  This makes
the code *much* more maintainable and removes the need to keep re-evaluating
the routines on each new iteration of an architecture (as some routines have
proven to be slower than a better-coded generic implementation).

> 
> The simple implementation I showed in my initial post improved the
> throughput in my benchmark (on AMD64) by an order of magnitude.
> In Szabolcs Nagy benchmark measuring latency it took 0.04ns/call
> longer (5.72ns vs. 5.68ns) -- despite the POOR job GCC does on FP.

Your implementation triggered a lot of regressions; you will need to sort
these out before considering performance numbers.  Also, we will need
a proper benchmark to evaluate it, as Szabolcs and Wilco have done for
their math work.

> 
> Does GLIBC offer a macro like "PREFER_FP_IMPLEMENTATION" that can be
> used to select between the integer bit-twiddling code and FP-preferring
> code during compilation?

No, and I don't think this would be a good addition.  As before, I would
prefer to have a simple generic implementation that gives us good
performance on modern hardware instead of a configurable one with many
tunables.  The latter increases the maintenance cost (testing and
performance evaluation).

> 
>> and let the compiler optimize it (which unfortunately gcc is not that
>> smart in all the cases).
> 
> I know, and I just learned that GCC does NOT perform quite some
> optimisations I expect from a mature compiler.
> Quoting Jakub Jelinek on gcc@gcc.gnu.org:
> 
> | GCC doesn't do value range propagation of floating point values, not
> | even the special ones like NaNs, infinities, +/- zeros etc., and without
> | that the earlier ifs aren't taken into account for the earlier code.
> 
> The code I used to demonstrate this deficiency is TOMS 722...
> 
> Stefan
> 


* Re: Twiddling with 64-bit values as 2 ints;
  2021-08-23 14:11     ` Adhemerval Zanella
@ 2021-08-23 15:37       ` Stefan Kanthak
  2021-08-23 16:51         ` Adhemerval Zanella
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Kanthak @ 2021-08-23 15:37 UTC (permalink / raw)
  To: libc-help, Adhemerval Zanella

Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:

> On 23/08/2021 10:18, Stefan Kanthak wrote:
>> Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>> 
>>> On 21/08/2021 10:34, Stefan Kanthak wrote:
>>>>
>>>> (Heretic.-) questions:
>>>> - why does glibc still employ such ugly code?
>>>> - Why doesn't glibc take advantage of 64-bit integers in such code?
>>>
>>> Because no one cared to adjust the implementation.  Recently Wilco
>>> has removed a lot of old code that still used 32-bit instead of 64-bit
>>> integers to do bit twiddling in floating-point implementations (check
>>> caa884dda7 and 9e97f239eae1f2).
>> 
>> That's good to hear.
>> 
>>> I think we should move to use a simplest code assuming 64-bit CPU
>> 
>> D'accord.
>> And there's a second direction where you might move: almost all CPUs
>> have separate general purpose registers and floating-point registers.
>> Bit-twiddling generally needs extra (and sometimes slow) transfers
>> between them.
>> In 32-bit environment, where arguments are typically passed on the
>> stack, at least loading an argument from the stack into a GPR or FPR
>> makes no difference.
>> In 64-bit environment, where arguments are passed in registers, they
>> should be operated on in these registers.
>> 
>> So: why not implement routines like nextafter() without bit-twiddling,
>> using floating-point as far as possible for architectures where this
>> gives better results?
> 
> Mainly because some math routines are not performance critical in the
> sense they are usually not hotspots and for these I would prefer the 
> simplest code that work with reasonable performance independently of
> the underlying ABI or architecture

With this we're back at square 1: my initial post showed such simple(st)
code.
The performance gain I experienced in my use case was more than
noticeable: on AMD64, the total runtime of my program decreased from
20s to 12s.

> (using integer operation might be be for soft-fp ABI for instance).



> For symbols are might be performance critical, we do have more optimized
> version.  Szabolcs and Wilco spent considerable time to tune a lot of
> math functions and to remove the slow code path; also for some routines
> we have internal defines that map then to compiler builtin when we know
> that compiler and architecture allows us to do so (check the rounding
> routines or sqrt for instance).
> 
> Recently we are aiming to avoid arch-specific code for complex routines,
> and prefer C implementation that leverage the compiler support.  It makes
> a *much* maintainable code and without the need to keep evaluating the 
> routines on each architecture new iterations (as some routines proven to
> be slower than more well coded generic implementation).

That was the goal of my patch: let the compiler operate on 64-bit integers
instead of having the C implementation operate on pairs of 32-bit integers.

>> The simple implementation I showed in my initial post improved the
>> throughput in my benchmark (on AMD64) by an order of magnitude.
>> In Szabolcs Nagy benchmark measuring latency it took 0.04ns/call
>> longer (5.72ns vs. 5.68ns) -- despite the POOR job GCC does on FP.
> 
> Your implementation triggered a lot of regression,

The initial, FP-preferring code was a demonstration, not a patch.

> you will need to sort this out before considering performance numbers.
> Also, we will need a proper benchmark to evaluate it, as Szabolcs and
> Wilco has done for their math work.
> 
>> 
>> Does GLIBC offer a macro like "PREFER_FP_IMPLEMENTATION" that can be
>> used to select between the integer bit-twiddling code and FP-preferring
>> code during compilation?
> 
> No and I don't think we this would be a good addition.  As before, I would
> prefer to have a simple generic implementation that give us a good
> performance on modern hardware instead of a configurable one with many
> tunables.  The later is increases the maintainable cost (with testing and
> performance evaluation).

Having dedicated implementations for different architectures is even more
costly!
My intention/proposal is to have at most two different generic implementations,
one using integer bit-twiddling wherever possible, thus supporting soft-fp well,
the second using floating-point wherever possible, thus supporting modern
hardware well.
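
As a sketch of what "two generic implementations" selected at compile
time could look like, here is a finiteness test in both styles. The macro
name PREFER_FP_IMPLEMENTATION is the one asked about in this thread and,
per the answer above, does not exist in glibc; all function names are made
up, and binary64 doubles are assumed. Note that the FP variant raises the
"invalid" exception flag for infinities and is only valid when the
compiler honours IEEE semantics (not under -ffast-math).

```c
#include <stdint.h>
#include <string.h>

/* FP-preferring variant: x - x is 0 for finite x and NaN otherwise. */
static int isfinite_fp(double x)
{
    return x - x == 0.0;
}

/* Integer variant: finite means the exponent field is not all ones. */
static int isfinite_bits(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return ((u >> 52) & 0x7ff) != 0x7ff;
}

/* Hypothetical compile-time selection between the two variants. */
#ifdef PREFER_FP_IMPLEMENTATION
# define sketch_isfinite(x) isfinite_fp(x)
#else
# define sketch_isfinite(x) isfinite_bits(x)
#endif
```

Both variants have to pass the same tests, which is where the additional
maintenance cost discussed in this thread comes from.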

Stefan


* Re: Twiddling with 64-bit values as 2 ints;
  2021-08-23 15:37       ` Stefan Kanthak
@ 2021-08-23 16:51         ` Adhemerval Zanella
  2021-08-23 17:32           ` Stefan Kanthak
  0 siblings, 1 reply; 8+ messages in thread
From: Adhemerval Zanella @ 2021-08-23 16:51 UTC (permalink / raw)
  To: Stefan Kanthak, libc-help



On 23/08/2021 12:37, Stefan Kanthak wrote:
> Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
> 
>> On 23/08/2021 10:18, Stefan Kanthak wrote:
>>> Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>>>
>>>> On 21/08/2021 10:34, Stefan Kanthak wrote:
>>>>>
>>>>> (Heretic.-) questions:
>>>>> - why does glibc still employ such ugly code?
>>>>> - Why doesn't glibc take advantage of 64-bit integers in such code?
>>>>
>>>> Because no one cared to adjust the implementation.  Recently Wilco
>>>> has removed a lot of old code that still used 32-bit instead of 64-bit
>>>> integers to do bit twiddling in floating-point implementations (check
>>>> caa884dda7 and 9e97f239eae1f2).
>>>
>>> That's good to hear.
>>>
>>>> I think we should move to use a simplest code assuming 64-bit CPU
>>>
>>> D'accord.
>>> And there's a second direction where you might move: almost all CPUs
>>> have separate general purpose registers and floating-point registers.
>>> Bit-twiddling generally needs extra (and sometimes slow) transfers
>>> between them.
>>> In 32-bit environment, where arguments are typically passed on the
>>> stack, at least loading an argument from the stack into a GPR or FPR
>>> makes no difference.
>>> In 64-bit environment, where arguments are passed in registers, they
>>> should be operated on in these registers.
>>>
>>> So: why not implement routines like nextafter() without bit-twiddling,
>>> using floating-point as far as possible for architectures where this
>>> gives better results?
>>
>> Mainly because some math routines are not performance critical in the
>> sense they are usually not hotspots and for these I would prefer the 
>> simplest code that work with reasonable performance independently of
>> the underlying ABI or architecture
> 
> With this we're back at square 1: my initial post showed such simple(st)
> code.
> The performance gain I experienced in my use case was more than
> noticeable: on AMD64, the total runtime of my program decreased from
> 20s to 12s.
> 
>> (using integer operation might be be for soft-fp ABI for instance).
> 
> 
> 
>> For symbols are might be performance critical, we do have more optimized
>> version.  Szabolcs and Wilco spent considerable time to tune a lot of
>> math functions and to remove the slow code path; also for some routines
>> we have internal defines that map then to compiler builtin when we know
>> that compiler and architecture allows us to do so (check the rounding
>> routines or sqrt for instance).
>>
>> Recently we are aiming to avoid arch-specific code for complex routines,
>> and prefer C implementation that leverage the compiler support.  It makes
>> a *much* maintainable code and without the need to keep evaluating the 
>> routines on each architecture new iterations (as some routines proven to
>> be slower than more well coded generic implementation).
> 
> That was the goal of my patch: let the compiler operate on 64-bit integers
> instead the C implementation on pairs of 32-bit integers.
> 
>>> The simple implementation I showed in my initial post improved the
>>> throughput in my benchmark (on AMD64) by an order of magnitude.
>>> In Szabolcs Nagy benchmark measuring latency it took 0.04ns/call
>>> longer (5.72ns vs. 5.68ns) -- despite the POOR job GCC does on FP.
>>
>> Your implementation triggered a lot of regression,
> 
> The initial, FP-preferring code was a demonstration, not a patch.

Right, but it does not make much sense to compare performance numbers with
an implementation that adds a lot of regressions.

> 
>> you will need to sort this out before considering performance numbers.
>> Also, we will need a proper benchmark to evaluate it, as Szabolcs and
>> Wilco has done for their math work.
>>
>>>
>>> Does GLIBC offer a macro like "PREFER_FP_IMPLEMENTATION" that can be
>>> used to select between the integer bit-twiddling code and FP-preferring
>>> code during compilation?
>>
>> No and I don't think we this would be a good addition.  As before, I would
>> prefer to have a simple generic implementation that give us a good
>> performance on modern hardware instead of a configurable one with many
>> tunables.  The later is increases the maintainable cost (with testing and
>> performance evaluation).
> 
> Having dedicated implementations for different architectures is even more
> costly!
> My intention/proposal is to have at most two different generic implementations,
> one using integer bit-twiddling wherever possible, thus supporting soft-fp well,
> the second using floating-point wherever possible, thus supporting modern
> hardware well.

The only reservation I have about such an approach is that it would add some more
maintenance and testing.  I added a similar optimization for hypot on powerpc, mainly
to avoid a CPU pipeline hazard on some chips due to the GPR to FP transfer; but I am
working on a generic solution so that the powerpc-specific implementation can simply
be removed in favor of a generic one.


* Re: Twiddling with 64-bit values as 2 ints;
  2021-08-23 16:51         ` Adhemerval Zanella
@ 2021-08-23 17:32           ` Stefan Kanthak
  2021-08-23 18:24             ` Adhemerval Zanella
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Kanthak @ 2021-08-23 17:32 UTC (permalink / raw)
  To: libc-help, Adhemerval Zanella

Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:

> On 23/08/2021 12:37, Stefan Kanthak wrote:
>> Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>> 
>>> On 23/08/2021 10:18, Stefan Kanthak wrote:
>>>> Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:

>>>> The simple implementation I showed in my initial post improved the
>>>> throughput in my benchmark (on AMD64) by an order of magnitude.
>>>> In Szabolcs Nagy benchmark measuring latency it took 0.04ns/call
>>>> longer (5.72ns vs. 5.68ns) -- despite the POOR job GCC does on FP.
>>>
>>> Your implementation triggered a lot of regression,
>> 
>> The initial, FP-preferring code was a demonstration, not a patch.
> 
> Right, but it does do not much sense comparing performance numbers with
> an implementation that adds a lot of regressions. 

This argument also holds for a correct FP-preferring implementation due to
the POOR code GCC currently generates: the 4 superfluous FP comparisons
plus conditional branches GCC generates have worse runtime than the missing
code to handle fenv/underflow/overflow/errno.

[...]

>> Having dedicated implementations for different architectures is even more
>> costly!
>> My intention/proposal is to have at most two different generic implementations,
>> one using integer bit-twiddling wherever possible, thus supporting soft-fp well,
>> the second using floating-point wherever possible, thus supporting modern
>> hardware well.
> 
> The only reservation I have for such approach it it would add some more maintenance
> and testing.

Insert "wherever needed" before/after "wherever possible".

Stefan


* Re: Twiddling with 64-bit values as 2 ints;
  2021-08-23 17:32           ` Stefan Kanthak
@ 2021-08-23 18:24             ` Adhemerval Zanella
  0 siblings, 0 replies; 8+ messages in thread
From: Adhemerval Zanella @ 2021-08-23 18:24 UTC (permalink / raw)
  To: Stefan Kanthak, libc-help



On 23/08/2021 14:32, Stefan Kanthak wrote:
> Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
> 
>> On 23/08/2021 12:37, Stefan Kanthak wrote:
>>> Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>>>
>>>> On 23/08/2021 10:18, Stefan Kanthak wrote:
>>>>> Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
> 
>>>>> The simple implementation I showed in my initial post improved the
>>>>> throughput in my benchmark (on AMD64) by an order of magnitude.
>>>>> In Szabolcs Nagy benchmark measuring latency it took 0.04ns/call
>>>>> longer (5.72ns vs. 5.68ns) -- despite the POOR job GCC does on FP.
>>>>
>>>> Your implementation triggered a lot of regression,
>>>
>>> The initial, FP-preferring code was a demonstration, not a patch.
>>
>> Right, but it does do not much sense comparing performance numbers with
>> an implementation that adds a lot of regressions. 
> 
> This argument also holds for a correct FP-preferring implementation due to
> the POOR code GCC currently generates: the 4 superfluous FP-comparisions
> plus conditional branches GCC generates have worse runtime than the missing
> code to handle fenv/underflow/overflow/errno.

In any case, please come up with numbers only *after* you fix any regressions
in the test cases.

> 
> [...]
> 
>>> Having dedicated implementations for different architectures is even more
>>> costly!
>>> My intention/proposal is to have at most two different generic implementations,
>>> one using integer bit-twiddling wherever possible, thus supporting soft-fp well,
>>> the second using floating-point wherever possible, thus supporting modern
>>> hardware well.
>>
>> The only reservation I have for such approach it it would add some more maintenance
>> and testing.
> 
> Insert "wherever needed" before/after "wherever possible".
> 
> Stefan
> 
