Message-ID: <9a032859-4bcb-077d-06e3-382e63bc5271@linaro.org>
Date: Tue, 7 Mar 2023 14:01:47 -0300
Subject: Re: Improvement of fmod()
To: Edison von Myositis, libc-alpha@sourceware.org, "H.J. Lu"
From: Adhemerval Zanella Netto
Organization: Linaro

On 07/03/23 12:46, Edison von Myositis via Libc-alpha wrote:
> I've implemented fmod() in a way that it runs +105% to +150% faster than
> the 30 yrs. old implementation from sun. It requires sth. like SETcc and BSR
> / LZCNT on x86.

This implementation is essentially the same as the one posted some time ago
on libc-help [1]; did you use it as a reference? I also wrote some remarks on
what would need to be done to include it in glibc.

PS: this mailing list is mostly for patch submissions and related discussions.

[1] https://sourceware.org/pipermail/libc-help/2022-October/006310.html

>
> #include <stdint.h>
> #include <string.h>
> #if defined(_MSC_VER)
>     #include <intrin.h>
> #endif
>
> #define LIKELY(x) __builtin_expect((x), 1)
> #define UNLIKELY(x) __builtin_expect((x), 0)
>
> #define MAX_EXP (0x7FF)
> #define SIGN_BIT ((uint64_t)1 << 63)
> #define EXP_MASK ((uint64_t)MAX_EXP << 52)
> #define IMPLCIT_BIT ((uint64_t)1 << 52)
> #define MANT_MASK (IMPLCIT_BIT - 1)
>
> #define HAS_MAX_EXP(b) ((b) >= EXP_MASK)
> #define HAS_INF_MANT(b) (!((b) & MANT_MASK))
>
> inline uint64_t bin( double d )
> {
>     uint64_t u;
>     memcpy( &u, &d, sizeof d );
>     return u;
> }
>
> inline double dbl( uint64_t u )
> {
>     double d;
>     memcpy( &d, &u, sizeof u );
>     return d;
> }
>
> inline void normalize( uint64_t *mant, int *exp )
> {
>     unsigned bits = __builtin_clzll( *mant ) - 11;
>     *mant <<= bits;
>     *exp -= bits;
> }
>
> double myFmodC( double counter, double denom )
> {
>     uint64_t
>         bCounter = bin( counter ),
>         bDenom = bin( denom ) & ~SIGN_BIT,
>         bSign = bCounter & SIGN_BIT;
>     bCounter &= ~SIGN_BIT;
>     if( UNLIKELY(!bDenom) || UNLIKELY(HAS_MAX_EXP(bCounter)) )
>         return (counter * denom) / (counter * denom);
>     if( UNLIKELY(HAS_MAX_EXP(bDenom)) )
>         if( LIKELY(HAS_INF_MANT(bDenom)) )
>             return counter;
>         else
>             return (counter * denom) / (counter * denom);
>     if( UNLIKELY(!bCounter) )
>         return counter;
>     int
>         counterExp = bCounter >> 52 & MAX_EXP,
>         denomExp = bDenom >> 52 & MAX_EXP;
>     uint64_t
>         counterMant = (uint64_t)(counterExp != 0) << 52 | bCounter & MANT_MASK,
>         denomMant = (uint64_t)(denomExp != 0) << 52 | bDenom & MANT_MASK;
>     if( UNLIKELY(!counterExp) )
>         // normalize counter
>         normalize( &counterMant, &counterExp ),
>         ++counterExp;
>     if( UNLIKELY(!denomExp) )
>         // normalize denominator
>         normalize( &denomMant, &denomExp ),
>         ++denomExp;
>     int remExp = counterExp;
>     uint64_t remMant = counterMant;
>     for( ; ; )
>     {
>         int below = remMant < denomMant;
>         if( UNLIKELY(remExp - below < denomExp) )
>             break;
>         remExp -= below;
>         remMant <<= below;
>         if( UNLIKELY(!(remMant -= denomMant)) )
>         {
>             remExp = 0;
>             break;
>         }
>         normalize( &remMant, &remExp );
>     };
>     if( UNLIKELY(remExp <= 0) )
>         // denormal result
>         remMant >>= -remExp + 1,
>         remExp = 0;
>     return dbl( bSign | (uint64_t)remExp << 52 | remMant & MANT_MASK );
> }
>
> The results are binary-compatible to those of glibc i.e. all the (S)NaN-
> and Inf-results are all the same and all finite results are the same.
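
For readers following the quoted code: the count-leading-zeros requirement
(BSR / LZCNT) comes from the normalize() helper, which shifts the top set bit
of a significand up to bit 52. Below is a minimal sketch of how a finite
double can be split into a biased exponent and a normalized significand,
including the subnormal case. It assumes IEEE 754 binary64 doubles and a
GCC/Clang compiler that provides __builtin_clzll. The names to_bits,
split_double and struct parts are hypothetical and are not taken from the
post or from glibc.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration; only the binary64 layout and the
   GCC/Clang builtin __builtin_clzll are assumed to exist. */
#define EXP_FIELD_MAX 0x7FF
#define IMPLICIT_BIT  ((uint64_t)1 << 52)
#define MANT_MASK     (IMPLICIT_BIT - 1)

struct parts {
    int      exp;   /* biased exponent; can be <= 0 for subnormal inputs */
    uint64_t mant;  /* significand with bit 52 set, or 0 for a zero input */
};

/* Reinterpret a double as its 64-bit pattern without aliasing problems. */
static uint64_t to_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Split a finite double into biased exponent and normalized significand. */
static struct parts split_double(double d)
{
    uint64_t bits = to_bits(d) & ~((uint64_t)1 << 63);  /* drop the sign */
    struct parts p;

    p.exp  = (int)(bits >> 52) & EXP_FIELD_MAX;
    p.mant = bits & MANT_MASK;
    if (p.exp != 0) {
        /* Normal number: restore the hidden leading 1. */
        p.mant |= IMPLICIT_BIT;
    } else if (p.mant != 0) {
        /* Subnormal: move the top set bit to bit 52 and lower the exponent
           by the same amount. Subnormals scale like exponent 1, hence the
           "1 - shift" (the quoted code does this as normalize + ++exp). */
        int shift = __builtin_clzll(p.mant) - 11;
        p.mant <<= shift;
        p.exp = 1 - shift;
    }
    return p;
}
```

With this split, 1.0 gives exp 1023 and mant 1 << 52, while the smallest
subnormal (0x1p-1074) gives exp -51 with the same mant, so a shift-and-subtract
loop can treat normal and subnormal operands alike.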