[PATCH,spu]: generate inline code for divdf3

public inbox for gcc-patches@gcc.gnu.org

* [PATCH,spu]: generate inline code for divdf3
From: Sa Liu @ 2007-12-14 15:54 UTC
  To: gcc-patches; +Cc: trevor_smigiel, Andrew_Pinski

Similar to the int-to-double conversion patch
(http://gcc.gnu.org/ml/gcc-patches/2007-09/msg01161.html), this patch
generates inline code for double division. The implementation
doesn't handle INF or NAN, so it applies only when
-ffinite-math-only is given.

No regressions found in the GCC test suites. OK for mainline?

Thanks!
Sa

Index: gcc/gcc/config/spu/spu.md
===================================================================
--- gcc.orig/gcc/config/spu/spu.md
+++ gcc/gcc/config/spu/spu.md
@@ -1735,6 +1735,58 @@
     DONE;
   })
 
+;; Taken from STI's gcc
+;; Does not correctly handle INF or NAN.
+(define_expand "divdf3"
+  [(set (match_operand:DF 0 "register_operand" "=r")
+        (div:DF (match_operand:DF 1 "register_operand" "r")
+                (match_operand:DF 2 "register_operand" "r")))]
+  "flag_finite_math_only"
+  "{ 
+    /*
+    double
+    divdf3 (double x, double y)
+    {
+        float x0;
+        float y_f = (float) y;
+        double x1, x2;
+
+        x0 = spu_extract(spu_re(spu_promote(y_f, 0)), 0);
+        x1 = (double)(x0 * (2.0f - y_f * x0)); 
+        x2 = x1 * (2.0 - y * x1);
+        return (x * x2 * (2.0 - y * x2));
+    }
+    */
+
+    rtx dst = operands[0];
+    rtx x   = operands[1];
+    rtx y   = operands[2];
+    rtx y_f = gen_reg_rtx(SFmode);
+    rtx x0_f = gen_reg_rtx(SFmode);
+    rtx x1_f = gen_reg_rtx(SFmode);
+    rtx x1 = gen_reg_rtx(DFmode);
+    rtx x2 = gen_reg_rtx(DFmode);
+    rtx t1_f = gen_reg_rtx(SFmode);
+    rtx t1 = gen_reg_rtx(DFmode);
+    rtx two = gen_reg_rtx(DFmode);
+    rtx two_f = gen_reg_rtx(SFmode);
+
+    emit_insn (gen_truncdfsf2 (y_f, y));
+    emit_insn (gen_frest_sf (x0_f, y_f));
+    emit_insn (gen_fi_sf (x0_f, y_f, x0_f));
+    emit_insn (gen_movsf (two_f, spu_float_const(\"2.0\",SFmode)));
+    emit_insn (gen_fnms_sf (t1_f, y_f, x0_f, two_f));
+    emit_insn (gen_mulsf3 (x1_f, t1_f, x0_f));
+    emit_insn (gen_extendsfdf2 (x1, x1_f));
+    emit_insn (gen_extendsfdf2 (two, two_f));
+    emit_insn (gen_movdf (t1, two));
+    emit_insn (gen_fnms_df (t1, y, x1, t1));
+    emit_insn (gen_muldf3 (x2, x1, t1));
+    emit_insn (gen_fnms_df (two, y, x2, two));
+    emit_insn (gen_muldf3 (dst, x2, two));
+    emit_insn (gen_muldf3 (dst, dst, x));
+    DONE;
+}")
 \f
 ;; sqrt
 

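For readers without an SPU toolchain, the algorithm in the patch's C comment can be approximated in portable C. This is only a sketch: the name divdf3_sketch is hypothetical, and 1.0f / y_f is an assumed stand-in for the frest/fi reciprocal estimate, which is less precise on real hardware. Each refinement step x' = x * (2 - y * x) roughly squares the relative error of the estimate, which is why one single-precision step and two double-precision steps are used.

```c
#include <assert.h>
#include <math.h>

/* Sketch of the Newton-Raphson reciprocal refinement from the patch comment.
   ASSUMPTION: 1.0f / y_f replaces spu_re (frest + fi); this is not the
   SPU instruction sequence, only the same arithmetic structure. */
static double divdf3_sketch(double x, double y)
{
    float y_f = (float) y;                          /* truncate divisor to float */
    float x0 = 1.0f / y_f;                          /* initial reciprocal estimate */
    double x1 = (double) (x0 * (2.0f - y_f * x0));  /* one single-precision step */
    double x2 = x1 * (2.0 - y * x1);                /* first double-precision step */
    return x * x2 * (2.0 - y * x2);                 /* second step folded into x * (1/y) */
}

/* Small self-check; call it from a test driver. */
static void check_divdf3_sketch(void)
{
    assert(fabs(divdf3_sketch(10.0, 4.0) - 2.5) < 1e-15);
}
```

Like the patch, the sketch makes no attempt to handle INF, NAN, or a zero divisor.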

* Re: [PATCH,spu]: generate inline code for divdf3
From: trevor_smigiel @ 2007-12-17 22:17 UTC
  To: Sa Liu; +Cc: gcc-patches, Andrew_Pinski

OK.

Trevor


* Re: [PATCH,spu]: generate inline code for divdf3
From: trevor_smigiel @ 2008-06-16 22:02 UTC
  To: Sa Liu, Ulrich Weigand; +Cc: gcc-patches, Andrew_Pinski, russell_olsen

[-- Attachment #1: Type: text/plain, Size: 1231 bytes --]

Ulrich, Sa,

I've attached an alternate implementation for divdf3 (and divv2df3).
This implementation properly handles Inf, zero, NaN and exponents that
are out of the range [-126..128], except for 1023 and 1024. It doesn't
do anything special for IEEE exceptions (DBZ, etc.).

The current implementation uses frds, which doesn't deal well with
doubles that don't fit in a single precision float.  So, I'm inclined to
replace it with a call to the attached out-of-line version.  Do you have
any objections to that?

Also, what do you think about making the attached version the default
even without -ffinite-math-only?   It is much faster and smaller than
the default provided by libgcc.

Trevor
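The range problem with rounding the divisor to single precision can be illustrated on an ordinary IEEE host. This is a sketch under assumptions: div_via_float_seed is a hypothetical name, the (float) cast stands in for frds, and 1.0f / y_f stands in for frest/fi. The SPU's single-precision arithmetic differs in detail from IEEE, so the exact wrong result there may differ; the point is only that the seed is useless once the divisor falls outside float range.

```c
#include <assert.h>
#include <math.h>

/* Same arithmetic structure as the inline expansion.
   ASSUMPTION: IEEE host semantics, where an out-of-range double-to-float
   conversion yields infinity and a tiny value underflows to zero. */
static double div_via_float_seed(double x, double y)
{
    float y_f = (float) y;                          /* the step frds performs */
    float x0 = 1.0f / y_f;                          /* stand-in for frest/fi */
    double x1 = (double) (x0 * (2.0f - y_f * x0));
    double x2 = x1 * (2.0 - y * x1);
    return x * x2 * (2.0 - y * x2);
}

static void check_seed_range(void)
{
    /* Divisor fits in float: the result is accurate. */
    assert(fabs(div_via_float_seed(1.0, 8.0) - 0.125) < 1e-15);

    /* 1e300 converts to +inf, the seed becomes 0, and inf * 0 poisons
       the refinement with NaN, although 1.0 / 1e300 is representable. */
    assert(isnan(div_via_float_seed(1.0, 1e300)));

    /* 1e-300 converts to 0, the seed becomes +inf, same outcome. */
    assert(isnan(div_via_float_seed(1.0, 1e-300)));
}
```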

[-- Attachment #2: divdf_fast.c --]
[-- Type: text/x-csrc, Size: 1644 bytes --]

#include <spu_intrinsics.h>

qword
__divv2df3_fast (qword x, qword y)
{
  qword y_f;
  qword sign, exp, mant;
  qword sign_mask, exp_mask;
  qword inverse, two;
  qword is_inf, is_zero, m_is_zero;

  two = si_from_double (2.0);
  sign_mask = (qword){0x80, 0, 0, 0, 0, 0, 0, 0, 0x80, 0, 0, 0, 0, 0, 0, 0};
  exp_mask = (qword){0x7f, 0xf0, 0, 0, 0, 0, 0, 0, 0x7f, 0xf0, 0, 0, 0, 0, 0, 0};

  exp = si_and (y, exp_mask);
  sign = si_and (y, sign_mask);

  /* Test for zero and inf */
  m_is_zero = si_ceqi (si_andc (si_andc (y, exp_mask), sign_mask), 0);
  is_inf = si_and (si_ceq (exp, si_ilhu (0x7ff0)), m_is_zero);
  is_inf = si_xswd (si_and (si_rotqbyi (is_inf, -4), m_is_zero));
  is_zero = si_xswd (si_rotqbyi (si_ceqi (exp, 0), -4));

  /* Compute the inverse of the exponent */
  exp = si_sf (exp, si_ilhu (0x7fd0));
  exp = si_selb (si_il (0), exp, si_cgti (exp, -1));

  /* The only part we want from frest/fi is the mantissa.  We use bit
   * manipulation to get 23 bits of mantissa from y, and set the sign
   * and exponent to 0x3f8. */
  y_f = si_selb (si_shlqbii (y, 3), si_ilhu (0x3f80), si_ilhu (0xff80));
  mant = si_fesd (si_fi (y_f, si_frest (y_f)));

  /* Merge the exponent and mantissa to create a double */
  inverse = si_selb (mant, exp, exp_mask);

  /* Three iterations of x = x * (2.0 - y * x) */
  inverse = si_dfm (inverse, si_dfnms (inverse, y, two));
  inverse = si_dfm (inverse, si_dfnms (inverse, y, two));
  inverse = si_dfm (inverse, si_dfnms (inverse, y, two));
  inverse = si_selb (inverse, exp_mask, is_zero);
  inverse = si_selb (inverse, si_il (0), is_inf);
  return si_dfm (si_xor (x, sign), inverse);
}
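The central idea of the attachment, taking only the mantissa from the single-precision estimate and computing the inverse exponent separately, can be sketched in scalar portable C. This is an illustration under assumptions, not a translation of the intrinsics: div_mantissa_seed is a hypothetical name, frexp/ldexp replace the bit manipulation, and 1.0f / m replaces frest/fi. It covers only finite, nonzero, normal divisors; the zero and Inf selects from the attachment are omitted.

```c
#include <assert.h>
#include <math.h>

/* Scalar sketch: seed the reciprocal from the mantissa alone so the
   single-precision step never sees an out-of-range value. */
static double div_mantissa_seed(double x, double y)
{
    int e;
    double m = frexp(y, &e);               /* y = m * 2^e, 0.5 <= |m| < 1 */
    float r0 = 1.0f / (float) m;           /* ASSUMPTION: stand-in for frest/fi */
    double inv = ldexp((double) r0, -e);   /* reattach the inverted exponent */

    /* Three iterations of inv = inv * (2.0 - y * inv), as in the attachment. */
    for (int i = 0; i < 3; i++)
        inv = inv * (2.0 - y * inv);

    return x * inv;
}

static void check_mantissa_seed(void)
{
    assert(fabs(div_mantissa_seed(6.0, -3.0) + 2.0) < 1e-14);
    /* Divisors far outside float range still work. */
    assert(fabs(div_mantissa_seed(1.0, 1e300) * 1e300 - 1.0) < 1e-14);
    assert(fabs(div_mantissa_seed(1.0, 1e-300) / 1e300 - 1.0) < 1e-14);
}
```

Because the seed is always built from a value in [0.5, 1), the product y * inv stays close to 1 throughout the refinement regardless of the magnitude of y.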


* Re: [PATCH,spu]: generate inline code for divdf3
From: Sa Liu @ 2008-06-19  8:38 UTC
  To: trevor_smigiel; +Cc: Andrew_Pinski, gcc-patches, russell_olsen, Ulrich Weigand

Trevor,

It's nice that the attached version also handles special-value operands.
But I'm not sure how it would affect performance compared with the
inline version. We brought this inaccurate inline code back into the backend
because of a decline in performance, compared of course with the libgcc
version. I think we will need some measurements for this function too.

Sa 


* Re: [PATCH,spu]: generate inline code for divdf3
From: Ulrich Weigand @ 2008-06-19 15:47 UTC
  To: trevor_smigiel; +Cc: Sa Liu, gcc-patches, Andrew_Pinski, russell_olsen

Trevor Smigiel wrote:

> I've attached an alternate implementation for divdf3 (and divv2df3).
> This implementation properly handles Inf, zero, NaN and exponents that
> are out of the range [-126..128], except for 1023 and 1024. It doesn't
> do anything special for IEEE exceptions (DBZ, etc.)
> 
> The current implementation uses frds which doesn't deal well with
> doubles that don't fit in a single precision float.  So, I'm inclined to
> replace it with a call to the attached out-of-line version.  Do you have
> any objections to that?

Well, as Sa mentioned, we have use cases where just the fact that we make
an out-of-line call to implement division drastically affects the performance
of applications (due to unhinted branches etc).  Therefore we'd really like
to keep the ability to inline divdf ...   Maybe this could be made 
dependent on optimize_size?

Of course, replacing the inlined version by something that does not use
frds would be nice.  Also, inlined versions could be tuned according to
the various -ffast-math related switches.

> Also, what do you think about making the attached version the default
> even without -ffinite-math-only?   It is much faster and smaller than
> the default provided by libgcc.

Well, if it handles Inf/NaN correctly, there is no need to guard that
implementation by -ffinite-math-only.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com

* Re: [PATCH,spu]: generate inline code for divdf3
From: trevor_smigiel @ 2008-06-19 18:07 UTC
  To: Sa Liu; +Cc: Andrew_Pinski, gcc-patches, russell_olsen, Ulrich Weigand

Sa,

This new version takes about 100 cycles.

The libgcc version takes over 1000 cycles because it has a loop to do a
64-bit integer divide.  The libgcc code is also much bigger.

The current inline version is only about 6 cycles faster because the
critical path has so much latency that most of the extra instructions are
free.  It will be a bit faster still because it is inlined.

If you have a test case where performance of double divide is critical,
can you please provide it, or a pointer to it?

Trevor


* Re: [PATCH,spu]: generate inline code for divdf3
From: trevor_smigiel @ 2008-06-19 18:34 UTC
  To: Ulrich Weigand; +Cc: Sa Liu, gcc-patches, Andrew_Pinski, russell_olsen

Ulrich,

OK.  How about I check in this version as the default, but
-ffinite-math-only will still call the existing inline version?  I'll
leave it up to someone else to improve the inline version because I have
no way to decide what changes are acceptable for your test cases.

Trevor


* Re: [PATCH,spu]: generate inline code for divdf3
From: Ulrich Weigand @ 2008-06-24 19:19 UTC
  To: trevor_smigiel; +Cc: Sa Liu, gcc-patches, Andrew_Pinski, russell_olsen

Trevor Smigiel wrote:

> OK.  How about I check in this version as the default, but
> -ffinite-math-only will still call the existing inline version?  I'll
> leave it up to someone else to improve the inline version because I have
> no way to decide what changes are acceptable for your test cases.

Certainly, that would be fine with me.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com
