Date: Mon, 13 Mar 2023 13:38:03 -0300
Subject: Re: [PATCH 4/4] math: Improve fmodf
To: Matt Turner
Cc: libc-alpha@sourceware.org, Wilco Dijkstra, "H . J . Lu", kirill
References: <20230310175900.2388957-1-adhemerval.zanella@linaro.org> <20230310175900.2388957-5-adhemerval.zanella@linaro.org>
From: Adhemerval Zanella Netto
Organization: Linaro

On 13/03/23 12:19, Matt Turner wrote:
> On Fri, Mar 10, 2023 at 1:01 PM Adhemerval Zanella via Libc-alpha
> wrote:
>>
>> This uses a new algorithm similar to already proposed earlier [1].
>> With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers),
>> the simplest implementation is:
>>
>>   mx * 2^ex == 2 * mx * 2^(ex - 1)
>>
>>   while (ex > ey)
>>     {
>>       mx *= 2;
>>       --ex;
>>       mx %= my;
>>     }
>>
>> With mx/my being the mantissas of floating-point values, on each step
>> the argument reduction can be improved by 8 bits (which is the bit
>> width of uint32_t minus MANTISSA_WIDTH plus the sign bit):
>>
>>   while (ex > ey)
>>     {
>>       mx <<= 8;
>>       ex -= 8;
>>       mx %= my;
>>     }
>>
>> The implementation uses builtin clz and ctz, along with shifts to
>> convert hx/hy back to floating-point values.  Unlike the original
>> patch, this patch assumes that modulo/divide operations are slow, so
>> it uses multiplication by inverted values instead.
>>
>> I see the following performance improvements using fmod benchtests
>> (the results show only the 'mean' value):
>>
>> Architecture     | Input           | master   | patch
>> -----------------|-----------------|----------|--------
>> x86_64 (Ryzen 9) | subnormals      | 17.2549  | 12.3214
>> x86_64 (Ryzen 9) | normal          | 85.4096  | 52.6625
>> x86_64 (Ryzen 9) | close-exponents | 19.1072  | 17.4622
>> aarch64 (N1)     | subnormal       | 10.2182  | 6.81778
>> aarch64 (N1)     | normal          | 60.0616  | 158.339
>
> Is this line correct? 60 -> 158?

No, that was an oversight on my part.  I reran the benchmark to
double-check, and the correct numbers are:

Architecture     | Input           | master   | patch
-----------------|-----------------|----------|--------
x86_64 (Ryzen 9) | subnormals      | 17.2549  | 12.3214
x86_64 (Ryzen 9) | normal          | 85.4096  | 52.6625
x86_64 (Ryzen 9) | close-exponents | 19.1072  | 17.4622
aarch64 (N1)     | subnormal       | 10.2182  | 6.81778
aarch64 (N1)     | normal          | 60.0616  | 21.2581
aarch64 (N1)     | close-exponents | 11.5256  | 8.67894
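
For anyone following the thread, the two reduction loops quoted above can
be sketched on plain integers as below.  This is only an illustration
under stated assumptions, not the code from the patch: the function names
are hypothetical, 0 < my < 2^24 and ex >= ey are assumed so that the
shifted value always fits in a uint32_t, the last step is clamped when
ex - ey is not a multiple of 8, and the sketch keeps the `%` operator
rather than the multiplication by an inverse mentioned in the commit
message.

```c
#include <stdint.h>

/* Hypothetical helper: remainder of mx * 2^(ex - ey) modulo my,
   consuming one exponent bit per iteration.
   Assumes 0 < my < 2^24 and ex >= ey.  */
static uint32_t
reduce_one_bit (uint32_t mx, int ex, uint32_t my, int ey)
{
  mx %= my;                     /* Now mx < my < 2^24.  */
  while (ex > ey)
    {
      mx *= 2;                  /* At most 2^25, no overflow.  */
      --ex;
      mx %= my;
    }
  return mx;
}

/* Hypothetical helper: same result, but consuming up to 8 exponent
   bits per iteration.  Since mx < 2^24 before each shift, mx << 8
   still fits in 32 bits.  */
static uint32_t
reduce_eight_bits (uint32_t mx, int ex, uint32_t my, int ey)
{
  mx %= my;
  while (ex > ey)
    {
      /* Clamp the final step so ex never drops below ey.  */
      int shift = ex - ey < 8 ? ex - ey : 8;
      mx <<= shift;
      ex -= shift;
      mx %= my;
    }
  return mx;
}
```

Both helpers return the integer r such that the remainder of
(mx * 2^ex) by (my * 2^ey) is r * 2^ey.  For example, with mx = 5,
ex = 3, my = 3, ey = 0 they compute 40 mod 3.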