From: vladimir.mezentsev@oracle.com
To: Joseph Myers <joseph@codesourcery.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] PR libgcc/59714 complex division is surprising on aarch64
Date: Mon, 29 Jan 2018 20:54:00 -0000 [thread overview]
Message-ID: <c1a5bfb5-8346-70e8-e1e7-13f9c05df852@oracle.com> (raw)
In-Reply-To: <alpine.DEB.2.20.1801260005550.12943@digraph.polyomino.org.uk>
On 01/25/2018 04:14 PM, Joseph Myers wrote:
> On Thu, 25 Jan 2018, vladimir.mezentsev@oracle.com wrote:
>
>> From: Vladimir Mezentsev <vladimir.mezentsev@oracle.com>
>>
>> Tested on aarch64-linux-gnu, x86_64-pc-linux-gnu and sparc64-unknown-linux-gnu.
>> No regression. New tests now passed.
>> There is a performance degradation for complex double type:
> This is, of course, something to consider for GCC 9; it's not suitable for
> the current development stage.
 Why ?
>
> Could you provide a more extended writeup of the issue (starting with the
> argument for it being a bug at all), the approach you took and the
> rationale for the approach, when you resubmit the patch for GCC 9 stage 1?
The formula for complex division (a+bi)/(c+di) is (a*c+b*d)/(c*c+d*d) +
(b*c-a*d)/(c*c+d*d)i
The current implementation avoids overflow in (c*c + d*d) using
x=MAX(|c|, |d|);Â c1=c/x; d1=d/x;
and then calculating the result as
(a*c1+b*d1)/(c1*c1+d1*d1)*(1.0/x) + (b*c1-a*d1)/(c1*c1+d1*d1)*(1.0/x)i
This approach has two issues:
1. Unexpected rounding when x is not a power of 2.0
For example, (1.0+3.0i) /(1.0+3.0i) should be 1.0
but it is (1.0 - 1.66533e-17i) on aarch64 (PR59714)
2. Unexpected underflow in c/x or d/x.
For example, (0x1.0p+1023 + 0x0.8p-1022 i)/(0x1.0p+677 + 0x1.0p-677 i)
should be (0x1.0p+346 - 0x1.0p-1008 i) but it is (0x1.0p+346)
My implementation avoids these issues.
>
>> * libgcc/config/sparc/sfp-machine.h: New
> Are you introducing a requirement for all architectures to provide
> sfp-machine.h? If so, did you determine that SPARC was the only one
> lacking it? If not the only one lacking it, you need to make sure that
> you do not break any existing architecture in GCC.
>
> What about architectures using non-IEEE formats? soft-fp is not suitable
> for those, but you mustn't break them. Remember that e.g. TFmode can be
> IBM long double, and other ?Fmode similarly need not be IEEE; the name
> only indicates how many bytes are in the format, nothing else about it.
Agree.Â
An additional fix is needed.
Build can be broken without sfp-mashine.h or on architectures using
non-IEEE format.
>
> What about powerpc __divkc3?
>
> What was the rationale for using soft-fp rather than adding appropriate
> built-in functions as suggested in a comment?
I had a version with built-in functions and I can restore it.
Let's design what we want to use soft-fp or built-in function.
I'd prefer to use built-in functions but performance is in two times worse.
The test below demonstrates performance degradation:
% cat r1.c
#include <stdio.h>
int main(int argc, char** argv)
{
 int i;
 long long sum = 0;
 for(i = 1; i < 1000000000; i++)
   {
     int exp;
     double d = (double) i;
#ifdef BUITIN
     double v = __builtin_frexp(d, &exp);
#else
     union _FP_UNION_D
     {
       double flt;
       struct _FP_STRUCT_LAYOUT
       {
         unsigned frac0 : 32;
         unsigned frac1 : 20;
         unsigned exp  : 11;
         unsigned sign : 1;
       } bits __attribute__ ((packed));
       long long LL;
     } u;
     u.flt = d;
     exp = u.bits.exp - 1022;
#endif
     sum += exp * (i % 2 == 0 ? 1 : -1);
   }
 printf("sum = %d\n", sum);
 return 0;
}
%gcc -O2 -DBUITIN r1.c ; time ./a.out
real   0m4.247s
user   0m4.243s
sys   0m0.001s
%gcc -O2 r1.c ; time ./a.out
real   0m1.977s
user   0m1.973s
sys   0m0.001s
 -Vladimir
next prev parent reply other threads:[~2018-01-29 20:38 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-01-25 20:40 vladimir.mezentsev
2018-01-26 3:39 ` Joseph Myers
2018-01-29 20:54 ` vladimir.mezentsev [this message]
2018-01-29 21:01 ` Joseph Myers
2018-02-06 8:55 ` vladimir.mezentsev
2018-02-06 17:13 ` Joseph Myers
-- strict thread matches above, loose matches on Subject: below --
2018-02-06 7:20 vladimir.mezentsev
2018-02-06 16:53 ` Joseph Myers
2018-02-07 0:25 ` vladimir.mezentsev
2018-02-07 0:31 ` Joseph Myers
2017-10-25 17:29 vladimir.mezentsev
2017-10-25 17:29 ` Joseph Myers
2017-10-25 19:19 ` vladimir.mezentsev
2017-10-25 20:04 ` Joseph Myers
2017-10-19 13:20 Wilco Dijkstra
2017-10-19 13:52 ` Richard Earnshaw (lists)
2017-10-19 17:13 ` Vladimir Mezentsev
2017-10-19 17:24 ` Wilco Dijkstra
2017-10-25 3:26 ` vladimir.mezentsev
2017-10-18 21:32 vladimir.mezentsev
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c1a5bfb5-8346-70e8-e1e7-13f9c05df852@oracle.com \
--to=vladimir.mezentsev@oracle.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=joseph@codesourcery.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).