public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* Re: Improvement of fmod()
@ 2023-03-07 22:13 Wilco Dijkstra
  0 siblings, 0 replies; 3+ messages in thread
From: Wilco Dijkstra @ 2023-03-07 22:13 UTC (permalink / raw)
  To: Adhemerval Zanella, edison.von.myosotis; +Cc: 'GNU C Library'

Hi,

Note the same code seems to be posted several times on Stackoverflow, eg.
[1], [2] - are they all the same author?

Also doing repeated division steps using CLZ is still essentially doing 1 bit at
a time. Previously we've had a patch doing 11+ bits per iteration using division.
In both cases you need hardware support or performance will be significantly
worse. The fastest solution is using a single division followed by multiply by
inverse - LLVM seems to have adopted that.

Cheers,
Wilco

[1] https://stackoverflow.com/questions/69222373/fastest-way-for-wrapping-a-value-in-an-interval
[2] https://stackoverflow.com/questions/41685039/is-fmod-faster-than-for-integer-modulus-calculation

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Improvement of fmod()
  2023-03-07 15:46 Edison von Myositis
@ 2023-03-07 17:01 ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 3+ messages in thread
From: Adhemerval Zanella Netto @ 2023-03-07 17:01 UTC (permalink / raw)
  To: Edison von Myositis, libc-alpha, H.J. Lu



On 07/03/23 12:46, Edison von Myositis via Libc-alpha wrote:
> I've implemented fmod() in a way that it runs +105% to +150% faster than
> the 30 yrs. old implementation from sun. It requires sth. like SETcc and BSR
> / LZCNT on x86.

This implementation is essentially the same posted sometime ago on libc-help [1],
did you used it as reference?  I also wrote some remarks of what would need to
be done to include it glibc.

PS: this maillist is mostly for patch submissions and related discussions.

[1] https://sourceware.org/pipermail/libc-help/2022-October/006310.html

> 
> #include <stdint.h>
> #include <string.h>
> #if defined(_MSC_VER)
>     #include <intrin.h>
> #endif
> 
> #define LIKELY(x) __builtin_expect((x), 1)
> #define UNLIKELY(x) __builtin_expect((x), 0)
> 
> #define MAX_EXP (0x7FF)
> #define SIGN_BIT ((uint64_t)1 << 63)
> #define EXP_MASK ((uint64_t)MAX_EXP << 52)
> #define IMPLCIT_BIT ((uint64_t)1 << 52)
> #define MANT_MASK (IMPLCIT_BIT - 1)
> 
> #define HAS_MAX_EXP(b) ((b) >= EXP_MASK)
> #define HAS_INF_MANT(b) (!((b) & MANT_MASK))
> 
> inline uint64_t bin( double d )
> {
>     uint64_t u;
>     memcpy( &u, &d, sizeof d );
>     return u;
> }
> 
> inline double dbl( uint64_t u )
> {
>     double d;
>     memcpy( &d, &u, sizeof u );
>     return d;
> }
> 
> inline void normalize( uint64_t *mant, int *exp )
> {
>     unsigned bits = __builtin_clzll( *mant ) - 11;
>     *mant <<= bits;
>     *exp -= bits;
> }
> 
> double myFmodC<( double counter, double denom )
> {
>     uint64_t
>         bCounter = bin( counter ),
>         bDenom = bin( denom ) & ~SIGN_BIT,
>         bSign = bCounter & SIGN_BIT;
>     bCounter &= ~SIGN_BIT;
>     if( UNLIKELY(!bDenom) || UNLIKELY(HAS_MAX_EXP(bCounter)) )
>         return (counter * denom) / (counter * denom);
>     if( UNLIKELY(HAS_MAX_EXP(bDenom)) )
>         if( LIKELY(HAS_INF_MANT(bDenom)) )
>             return counter;
>         else
>             return (counter * denom) / (counter * denom);
>     if( UNLIKELY(!bCounter) )
>         return counter;
>     int
>         counterExp = bCounter >> 52 & MAX_EXP,
>         denomExp = bDenom >> 52 & MAX_EXP;
>     uint64_t
>         counterMant = (uint64_t)(counterExp != 0) << 52 | bCounter &
> MANT_MASK,
>         denomMant = (uint64_t)(denomExp != 0) << 52 | bDenom & MANT_MASK;
>     if( UNLIKELY(!counterExp) )
>         // normalize counter
>         normalize( &counterMant, &counterExp ),
>         ++counterExp;
>     if( UNLIKELY(!denomExp) )
>         // normalize denominator
>         normalize( &denomMant, &denomExp ),
>         ++denomExp;
>     int remExp = counterExp;
>     uint64_t remMant = counterMant;
>     for( ; ; )
>     {
>         int below = remMant < denomMant;
>         if( UNLIKELY(remExp - below < denomExp) )
>             break;
>         remExp -= below;
>         remMant <<= below;
>         if( UNLIKELY(!(remMant -= denomMant)) )
>         {
>             remExp = 0;
>             break;
>         }
>         normalize( &remMant, &remExp );
>     };
>     if( UNLIKELY(remExp <= 0) )
>         // denormal result
>         remMant >>= -remExp + 1,
>         remExp = 0;
>     return dbl( bSign | (uint64_t)remExp << 52 | remMant & MANT_MASK );
> }
> 
> The results are binary-compatible to those of glibc i.e. all the (S)NaN-
> and Inf-results are all the same and all finite results are the same.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Improvement of fmod()
@ 2023-03-07 15:46 Edison von Myositis
  2023-03-07 17:01 ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 3+ messages in thread
From: Edison von Myositis @ 2023-03-07 15:46 UTC (permalink / raw)
  To: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 2775 bytes --]

I've implemented fmod() in a way that it runs +105% to +150% faster than
the 30 yrs. old implementation from sun. It requires sth. like SETcc and BSR
/ LZCNT on x86.

#include <stdint.h>
#include <string.h>
#if defined(_MSC_VER)
    #include <intrin.h>
#endif

#define LIKELY(x) __builtin_expect((x), 1)
#define UNLIKELY(x) __builtin_expect((x), 0)

#define MAX_EXP (0x7FF)
#define SIGN_BIT ((uint64_t)1 << 63)
#define EXP_MASK ((uint64_t)MAX_EXP << 52)
#define IMPLCIT_BIT ((uint64_t)1 << 52)
#define MANT_MASK (IMPLCIT_BIT - 1)

#define HAS_MAX_EXP(b) ((b) >= EXP_MASK)
#define HAS_INF_MANT(b) (!((b) & MANT_MASK))

inline uint64_t bin( double d )
{
    uint64_t u;
    memcpy( &u, &d, sizeof d );
    return u;
}

inline double dbl( uint64_t u )
{
    double d;
    memcpy( &d, &u, sizeof u );
    return d;
}

inline void normalize( uint64_t *mant, int *exp )
{
    unsigned bits = __builtin_clzll( *mant ) - 11;
    *mant <<= bits;
    *exp -= bits;
}

double myFmodC<( double counter, double denom )
{
    uint64_t
        bCounter = bin( counter ),
        bDenom = bin( denom ) & ~SIGN_BIT,
        bSign = bCounter & SIGN_BIT;
    bCounter &= ~SIGN_BIT;
    if( UNLIKELY(!bDenom) || UNLIKELY(HAS_MAX_EXP(bCounter)) )
        return (counter * denom) / (counter * denom);
    if( UNLIKELY(HAS_MAX_EXP(bDenom)) )
        if( LIKELY(HAS_INF_MANT(bDenom)) )
            return counter;
        else
            return (counter * denom) / (counter * denom);
    if( UNLIKELY(!bCounter) )
        return counter;
    int
        counterExp = bCounter >> 52 & MAX_EXP,
        denomExp = bDenom >> 52 & MAX_EXP;
    uint64_t
        counterMant = (uint64_t)(counterExp != 0) << 52 | bCounter &
MANT_MASK,
        denomMant = (uint64_t)(denomExp != 0) << 52 | bDenom & MANT_MASK;
    if( UNLIKELY(!counterExp) )
        // normalize counter
        normalize( &counterMant, &counterExp ),
        ++counterExp;
    if( UNLIKELY(!denomExp) )
        // normalize denominator
        normalize( &denomMant, &denomExp ),
        ++denomExp;
    int remExp = counterExp;
    uint64_t remMant = counterMant;
    for( ; ; )
    {
        int below = remMant < denomMant;
        if( UNLIKELY(remExp - below < denomExp) )
            break;
        remExp -= below;
        remMant <<= below;
        if( UNLIKELY(!(remMant -= denomMant)) )
        {
            remExp = 0;
            break;
        }
        normalize( &remMant, &remExp );
    };
    if( UNLIKELY(remExp <= 0) )
        // denormal result
        remMant >>= -remExp + 1,
        remExp = 0;
    return dbl( bSign | (uint64_t)remExp << 52 | remMant & MANT_MASK );
}

The results are binary-compatible to those of glibc i.e. all the (S)NaN-
and Inf-results are all the same and all finite results are the same.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-03-07 22:14 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-07 22:13 Improvement of fmod() Wilco Dijkstra
  -- strict thread matches above, loose matches on Subject: below --
2023-03-07 15:46 Edison von Myositis
2023-03-07 17:01 ` Adhemerval Zanella Netto

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).