From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <b91719fb-ecc4-c34a-5806-5a9726550a16@linaro.org>
Date: Mon, 13 Mar 2023 13:38:03 -0300
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
 Gecko/20100101 Thunderbird/102.8.0
Subject: Re: [PATCH 4/4] math: Improve fmodf
Content-Language: en-US
To: Matt Turner <mattst88@gmail.com>
Cc: libc-alpha@sourceware.org, Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
 "H . J . Lu" <hjl.tools@gmail.com>, kirill <kirill.okhotnikov@gmail.com>
References: <20230310175900.2388957-1-adhemerval.zanella@linaro.org>
 <20230310175900.2388957-5-adhemerval.zanella@linaro.org>
 <CAEdQ38GP4Lvik9Pgt_4Q+5O2DLtCNc6YVmQPAkywTKz-mz9L_Q@mail.gmail.com>
From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Organization: Linaro
In-Reply-To: <CAEdQ38GP4Lvik9Pgt_4Q+5O2DLtCNc6YVmQPAkywTKz-mz9L_Q@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
List-Id: <libc-alpha.sourceware.org>


On 13/03/23 12:19, Matt Turner wrote:
> On Fri, Mar 10, 2023 at 1:01 PM Adhemerval Zanella via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
>>
>> This uses a new algorithm similar to the one proposed earlier [1].
>> With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers),
>> the simplest implementation is:
>>
>>    mx * 2^ex == 2 * mx * 2^(ex - 1)
>>
>>    while (ex > ey)
>>      {
>>        mx *= 2;
>>        --ex;
>>        mx %= my;
>>      }
>>
>> With mx/my being the mantissas of single-precision floating-point
>> values, the argument reduction can consume 8 bits on each step (the
>> width of uint32_t in bits minus MANTISSA_WIDTH plus the sign bit,
>> i.e. 32 - (23 + 1)):
>>
>>    while (ex > ey)
>>      {
>>        mx <<= 8;
>>        ex -= 8;
>>        mx %= my;
>>      }
>>
>> The implementation uses builtin clz and ctz, along with shifts to
>> convert hx/hy back to floats.  Unlike the original patch, this patch
>> assumes that modulo/divide operations are slow, so it uses
>> multiplication by inverted values instead.
>>
>> I see the following performance improvements using fmod benchtests
>> (only the 'mean' result is shown):
>>
>>   Architecture     | Input           | master   | patch
>>   -----------------|-----------------|----------|--------
>>   x86_64 (Ryzen 9) | subnormals      | 17.2549  | 12.3214
>>   x86_64 (Ryzen 9) | normal          | 85.4096  | 52.6625
>>   x86_64 (Ryzen 9) | close-exponents | 19.1072  | 17.4622
>>   aarch64 (N1)     | subnormals      | 10.2182  | 6.81778
>>   aarch64 (N1)     | normal          | 60.0616  | 158.339
> 
> Is this line correct? 60 -> 158?

No, it was an oversight on my part.  I reran the benchmark to
double-check, and the correct numbers are:

      Architecture     | Input           | master   | patch
      -----------------|-----------------|----------|--------
      x86_64 (Ryzen 9) | subnormals      | 17.2549  | 12.3214
      x86_64 (Ryzen 9) | normal          | 85.4096  | 52.6625
      x86_64 (Ryzen 9) | close-exponents | 19.1072  | 17.4622
      aarch64 (N1)     | subnormals      | 10.2182  | 6.81778
      aarch64 (N1)     | normal          | 60.0616  | 21.2581
      aarch64 (N1)     | close-exponents | 11.5256  | 8.67894
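
As a side note on the two reduction loops in the quoted description, here
is a minimal C sketch of both, not the code from the patch.  The names
reduce_bitwise and reduce_by8 are hypothetical, k stands for ex - ey and is
assumed to be non-negative, and mx/my are assumed to be integer mantissas
below 2^24 (as for float).  The sketch uses plain '%' for clarity, whereas
the patch replaces the division with a multiplication by an inverse.  It
also handles the case where ex - ey is not a multiple of 8 with one final
partial shift.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helpers only; the names are not taken from the patch.
   Both return (mx * 2^k) % my for 0 < my < 2^24.  */

/* Simplest form: consume one exponent bit per step.  */
static uint32_t
reduce_bitwise (uint32_t mx, uint32_t my, unsigned int k)
{
  assert (my != 0 && my < (UINT32_C (1) << 24));
  mx %= my;			/* Now mx < my < 2^24.  */
  while (k > 0)
    {
      mx <<= 1;			/* At most 2^25, no overflow.  */
      --k;
      mx %= my;
    }
  return mx;
}

/* Faster form: consume 8 exponent bits per step.  Since mx < 2^24
   after each modulo, mx << 8 still fits in 32 bits.  */
static uint32_t
reduce_by8 (uint32_t mx, uint32_t my, unsigned int k)
{
  assert (my != 0 && my < (UINT32_C (1) << 24));
  mx %= my;
  while (k >= 8)
    {
      mx <<= 8;
      k -= 8;
      mx %= my;
    }
  mx <<= k;			/* Remaining 0 to 7 bits.  */
  return mx % my;
}
```

For example, reduce_by8 (5, 3, 10) gives 2, which matches 5 * 2^10 = 5120
and 5120 % 3 = 2; reduce_bitwise returns the same value with ten iterations
instead of two.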