Re: [PATCH] PR libgcc/59714 complex division is surprising on aarch64

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: vladimir.mezentsev@oracle.com
To: Joseph Myers <joseph@codesourcery.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] PR libgcc/59714 complex division is surprising on aarch64
Date: Mon, 29 Jan 2018 20:54:00 -0000	[thread overview]
Message-ID: <c1a5bfb5-8346-70e8-e1e7-13f9c05df852@oracle.com> (raw)
In-Reply-To: <alpine.DEB.2.20.1801260005550.12943@digraph.polyomino.org.uk>

On 01/25/2018 04:14 PM, Joseph Myers wrote:
> On Thu, 25 Jan 2018, vladimir.mezentsev@oracle.com wrote:
>
>> From: Vladimir Mezentsev <vladimir.mezentsev@oracle.com>
>>
>> Tested on aarch64-linux-gnu, x86_64-pc-linux-gnu and sparc64-unknown-linux-gnu.
>> No regression. New tests now passed.
>> There is a performance degradation for complex double type:
> This is, of course, something to consider for GCC 9; it's not suitable for 
> the current development stage.
Â  Why ?

>
> Could you provide a more extended writeup of the issue (starting with the 
> argument for it being a bug at all), the approach you took and the 
> rationale for the approach, when you resubmit the patch for GCC 9 stage 1?

The formula for complex division (a+bi)/(c+di) is (a*c+b*d)/(c*c+d*d) +
(b*c-a*d)/(c*c+d*d)i

The current implementation avoids overflow in (c*c + d*d) using
x=MAX(|c|, |d|);Â  c1=c/x; d1=d/x;
and then calculating the result as
(a*c1+b*d1)/(c1*c1+d1*d1)*(1.0/x) + (b*c1-a*d1)/(c1*c1+d1*d1)*(1.0/x)i

This approach has two issues:
1. Unexpected rounding when x is not a power of 2.0
For example, (1.0+3.0i) /(1.0+3.0i) should be 1.0
but it is (1.0 - 1.66533e-17i) on aarch64 (PR59714)

2. Unexpected underflow in c/x or d/x.
For example, (0x1.0p+1023 + 0x0.8p-1022 i)/(0x1.0p+677 + 0x1.0p-677 i)
should be (0x1.0p+346 - 0x1.0p-1008 i) but it is (0x1.0p+346)

My implementation avoids these issues.

>
>> * libgcc/config/sparc/sfp-machine.h: New
> Are you introducing a requirement for all architectures to provide 
> sfp-machine.h?  If so, did you determine that SPARC was the only one 
> lacking it?  If not the only one lacking it, you need to make sure that 
> you do not break any existing architecture in GCC.
>
> What about architectures using non-IEEE formats?  soft-fp is not suitable 
> for those, but you mustn't break them.  Remember that e.g. TFmode can be 
> IBM long double, and other ?Fmode similarly need not be IEEE; the name 
> only indicates how many bytes are in the format, nothing else about it.
Agree.Â 
An additional fix is needed.
BuildÂ  can be broken without sfp-mashine.h or onÂ  architectures using
non-IEEE format.

>
> What about powerpc __divkc3?
>
> What was the rationale for using soft-fp rather than adding appropriate 
> built-in functions as suggested in a comment?

I had a version with built-in functions and I can restore it.

Let's design what we want to use soft-fp or built-in function.
I'd prefer to use built-in functions but performance is in two times worse.

The test below demonstrates performance degradation:
% cat r1.c
#include <stdio.h>

int main(int argc, char** argv)
{
Â  int i;
Â  long long sum = 0;
Â  for(i = 1; i < 1000000000; i++)
Â Â Â  {
Â Â Â Â Â  int exp;
Â Â Â Â Â  double d = (double) i;
#ifdef BUITIN
Â Â Â Â Â  double v = __builtin_frexp(d, &exp);
#else
Â Â Â Â Â  union _FP_UNION_D
Â Â Â Â Â  {
Â Â Â Â Â Â Â  double flt;
Â Â Â Â Â Â Â  struct _FP_STRUCT_LAYOUT
Â Â Â Â Â Â Â  {
Â Â Â Â Â Â Â Â Â  unsigned frac0 : 32;
Â Â Â Â Â Â Â Â Â  unsigned frac1 : 20;
Â Â Â Â Â Â Â Â Â  unsigned expÂ Â  : 11;
Â Â Â Â Â Â Â Â Â  unsigned signÂ  : 1;
Â Â Â Â Â Â Â  } bits __attribute__ ((packed));
Â Â Â Â Â Â Â  long long LL;
Â Â Â Â Â  } u;
Â Â Â Â Â  u.flt = d;
Â Â Â Â Â  exp = u.bits.exp - 1022;
#endif
Â Â Â Â Â  sum += exp * (i % 2 == 0 ? 1 : -1);
Â Â Â  }
Â  printf("sum = %d\n", sum);
Â  return 0;
}

%gcc -O2 -DBUITIN r1.c ; time ./a.out
realÂ Â Â  0m4.247s
userÂ Â Â  0m4.243s
sysÂ Â Â  0m0.001s

%gcc -O2 r1.c ; time ./a.out
realÂ Â Â  0m1.977s
userÂ Â Â  0m1.973s
sysÂ Â Â  0m0.001s

Â  -Vladimir

next prev parent reply	other threads:[~2018-01-29 20:38 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-25 20:40 vladimir.mezentsev
2018-01-26  3:39 ` Joseph Myers
2018-01-29 20:54   ` vladimir.mezentsev [this message]
2018-01-29 21:01     ` Joseph Myers
2018-02-06  8:55       ` vladimir.mezentsev
2018-02-06 17:13         ` Joseph Myers
  -- strict thread matches above, loose matches on Subject: below --
2018-02-06  7:20 vladimir.mezentsev
2018-02-06 16:53 ` Joseph Myers
2018-02-07  0:25   ` vladimir.mezentsev
2018-02-07  0:31     ` Joseph Myers
2017-10-25 17:29 vladimir.mezentsev
2017-10-25 17:29 ` Joseph Myers
2017-10-25 19:19   ` vladimir.mezentsev
2017-10-25 20:04     ` Joseph Myers
2017-10-19 13:20 Wilco Dijkstra
2017-10-19 13:52 ` Richard Earnshaw (lists)
2017-10-19 17:13   ` Vladimir Mezentsev
2017-10-19 17:24     ` Wilco Dijkstra
2017-10-25  3:26       ` vladimir.mezentsev
2017-10-18 21:32 vladimir.mezentsev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c1a5bfb5-8346-70e8-e1e7-13f9c05df852@oracle.com \
    --to=vladimir.mezentsev@oracle.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=joseph@codesourcery.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).