Re: [PATCH v2 3/5] math: Improve fmod - Adhemerval Zanella Netto

From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
To: "H.J. Lu" <hjl.tools@gmail.com>
Cc: libc-alpha@sourceware.org,
	Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
	kirill <kirill.okhotnikov@gmail.com>
Subject: Re: [PATCH v2 3/5] math: Improve fmod
Date: Thu, 16 Mar 2023 11:28:22 -0300
Message-ID: <b17659bf-27fd-1cb6-a608-f5e381948917@linaro.org>
In-Reply-To: <CAMe9rOrAzcLzqL5QUQ098aYc10pGX+z8V70KR+zt7Lc-C9ry=Q@mail.gmail.com>



On 15/03/23 21:58, H.J. Lu wrote:
> On Wed, Mar 15, 2023 at 1:59 PM Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
>>
>> This uses a new algorithm similar to the one proposed earlier [1].
>> With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers),
>> the simplest implementation is:
>>
>>    mx * 2^ex == 2 * mx * 2^(ex - 1)
>>
>>    while (ex > ey)
>>      {
>>        mx *= 2;
>>        --ex;
>>        mx %= my;
>>      }
>>
>> With mx/my being the mantissas of double floating-point values, the
>> argument reduction can be improved to handle 11 bits on each step (which is
>> the size of uint64_t minus MANTISSA_WIDTH plus the sign bit):
>>
>>    while (ex > ey)
>>      {
>>        mx <<= 11;
>>        ex -= 11;
>>        mx %= my;
>>      }
>>
>> The implementation uses builtin clz and ctz, along with shifts to
>> convert hx/hy back to doubles.  Unlike the original patch, this patch
>> assumes the modulo/divide operation is slow, so it uses multiplication
>> by inverse values.
>>
>> I see the following performance improvements using fmod benchtests
>> (only the 'mean' result is shown):
>>
>>   Architecture     | Input           | master   | patch
>>   -----------------|-----------------|----------|--------
>>   x86_64 (Ryzen 9) | subnormals      | 19.1584  | 12.5049
>>   x86_64 (Ryzen 9) | normal          | 1016.51  | 296.939
>>   x86_64 (Ryzen 9) | close-exponents | 18.4428  | 16.0244
> 
> I tried it with the test in
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=30179
> 
> On Intel i7-10710U, I got
> 
> time ./sse
> 3.13user 0.00system 0:03.13elapsed 99%CPU (0avgtext+0avgdata 512maxresident)k
> 0inputs+0outputs (0major+37minor)pagefaults 0swaps
> time ./x87
> 0.24user 0.00system 0:00.24elapsed 100%CPU (0avgtext+0avgdata 512maxresident)k
> 0inputs+0outputs (0major+37minor)pagefaults 0swaps
> time ./generic
> 0.55user 0.00system 0:00.55elapsed 99%CPU (0avgtext+0avgdata 512maxresident)k
> 0inputs+0outputs (0major+37minor)pagefaults 0swaps
> 
> The new generic is still slower than x87.

I think it really depends on the underlying hardware and on the input range.
Using the benchmark from the patch set and patch 66182 [1], I see:

CPU              | Input           | patch    | 66182
-----------------|-----------------|----------|--------
Ryzen 9          | subnormals      | 12.5049  | 31.2822
Ryzen 9          | normal          | 296.939  | 592.489
Ryzen 9          | close-exponents | 16.0244  | 33.5172
E5-2640          | subnormals      | 34.5454  | 652.59
E5-2640          | normal          | 473.602  | 438.836
E5-2640          | close-exponents | 39.298   | 22.2742
i7-4510U         | subnormals      | 25.2624  | 666.964
i7-4510U         | normal          | 386.489  | 454.222
i7-4510U         | close-exponents | 29.463   | 22.8572

So it seems that fprem performance is not really consistent across x86 CPUs, and
even on a recent AMD chip it is far from great.  So I still think the generic
implementation is the better default for x86, and that fprem should be used
along with ifunc to select it only on CPUs where it really yields better numbers
(also taking into consideration that the numbers for subnormal inputs seem to
be pretty bad).

You might get better x86 performance by removing the SVID wrapper, as I did
in the last patch; but it will increase the complexity of 66182 (you will need
to check for NaN/INF/0.0 and set errno).  And I hardly think it will close the
gap on the AMD chip I use.
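
For reference, the checks that the wrapper removal pushes into the
implementation look roughly like the sketch below.  The helper name
fmod_is_domain_error is hypothetical (it is not from either patch); it only
assumes the C/POSIX rule that fmod has a domain error when x is infinite or y
is zero, and that NaN inputs propagate without setting errno.

```c
#include <errno.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper, not taken from glibc: report whether fmod (x, y)
   is a domain error and, if so, set errno to EDOM.  */
static bool
fmod_is_domain_error (double x, double y)
{
  /* A NaN operand yields a NaN result without touching errno.  */
  if (isnan (x) || isnan (y))
    return false;

  /* x infinite or y zero is a domain error.  */
  if (isinf (x) || y == 0.0)
    {
      errno = EDOM;
      return true;
    }

  return false;
}
```

A caller would return a NaN when the helper reports true, and otherwise
continue with the actual remainder computation.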

I am also checking an algorithm change to use a simple loop for the normal
inputs, where the integer modulo operation is used instead of inverse
multiplication.  But as far as I have tested, performance is really bad on all
the x86 Intel chips I tested (it is not as bad on AMD).
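
To make that variant concrete, here is a minimal sketch of the simple loop,
where each step shifts the mantissa by up to 11 bits and reduces it with an
integer modulo.  It is not the code from the patch: the names
fmod_simple_loop, asuint64, and asdouble are hypothetical here, it only
handles positive, finite, normal inputs, and it assumes IEEE 754 binary64
doubles and the GCC/Clang __builtin_clzll builtin.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MANT_BITS 52
#define MANT_MASK ((UINT64_C (1) << MANT_BITS) - 1)

/* Reinterpret a double as its bit pattern, and back.  */
static uint64_t
asuint64 (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

static double
asdouble (uint64_t u)
{
  double x;
  memcpy (&x, &u, sizeof x);
  return x;
}

/* Sketch only: x and y must be positive, finite, normal doubles.  */
static double
fmod_simple_loop (double x, double y)
{
  uint64_t hx = asuint64 (x), hy = asuint64 (y);
  int ex = (int) (hx >> MANT_BITS);  /* Biased exponent; sign bit is 0.  */
  int ey = (int) (hy >> MANT_BITS);
  assert (ex >= 1 && ex <= 2046 && ey >= 1 && ey <= 2046);

  if (x < y)
    return x;

  /* x = mx * 2^(ex - 1075) and y = my * 2^(ey - 1075), with the
     implicit leading bit made explicit.  */
  uint64_t mx = (hx & MANT_MASK) | (UINT64_C (1) << MANT_BITS);
  uint64_t my = (hy & MANT_MASK) | (UINT64_C (1) << MANT_BITS);

  mx %= my;
  while (ex > ey)
    {
      /* mx < my < 2^53, so a shift of up to 11 bits cannot overflow.  */
      int shift = ex - ey < 11 ? ex - ey : 11;
      mx <<= shift;
      ex -= shift;
      mx %= my;
    }

  if (mx == 0)
    return 0.0;

  /* The remainder is mx * 2^(ey - 1075).  Renormalize by moving the
     leading bit of mx up to bit 52.  */
  int lz = __builtin_clzll (mx) - 11;
  if (ey - lz >= 1)
    return asdouble (((uint64_t) (ey - lz) << MANT_BITS)
                     | ((mx << lz) & MANT_MASK));

  /* Subnormal result: the stored mantissa is mx * 2^(ey - 1).  */
  return asdouble (mx << (ey - 1));
}
```

The cost is one 64-bit integer modulo per 11 bits of exponent difference, which
is why the speed of the hardware divider matters so much for this variant.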

[1] https://patchwork.sourceware.org/project/glibc/patch/20230309183312.205763-1-hjl.tools@gmail.com/
