public inbox for gcc-patches@gcc.gnu.org
* [PATCH, spu] Improve precision of divsf3 on SPU
@ 2008-06-15 15:49 Ulrich Weigand
  2008-06-16 19:11 ` trevor_smigiel
From: Ulrich Weigand @ 2008-06-15 15:49 UTC
  To: gcc-patches; +Cc: trevor_smigiel, andrew_pinski

Hello,

the method used in spu.md's divsf3 quite frequently produces a result
that is "off by one" (1 ulp too small in magnitude).  The SDK compiler
(and also the simdmath library implementation) uses an additional check
to catch those cases; I propose to add this check to the FSF
implementation as well.
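
For orientation, the estimate-and-refine sequence that the existing
pattern emits (frest/fi, mul, fnms, fma, as visible in the patch context)
can be sketched in plain C.  This is only an illustration under stated
assumptions: `fast_div_sketch` is a hypothetical name, the host's
correctly rounded `1.0f / b` stands in for the hardware reciprocal
estimate, and a host that rounds to nearest will not necessarily
reproduce the 1 ulp shortfall that SPU truncation causes.

```c
/* Sketch of the estimate-and-refine sequence; the name is hypothetical. */
float fast_div_sketch(float a, float b)
{
    /* Assumption: the correctly rounded host reciprocal stands in for
       the frest + fi reciprocal estimate. */
    float r = 1.0f / b;
    /* mul: first quotient estimate q0 = a * r. */
    float q0 = (float)((double)a * (double)r);
    /* fnms: residual = a - q0 * b (the product is exact in double). */
    float res = (float)((double)a - (double)q0 * (double)b);
    /* fma: refined quotient = res * r + q0. */
    return (float)((double)res * (double)r + (double)q0);
}
```

On the SPU every step truncates toward zero, which is where the
systematic shortfall comes from; the sketch only shows the data flow.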

The patch below does this and fixes the following test cases
(without introducing any regressions):

FAIL: gcc.c-torture/execute/20000605-1.c execution,  -O0
FAIL: gcc.c-torture/execute/20071030-1.c execution,  -O0
FAIL: gcc.c-torture/execute/20071030-1.c execution,  -O1
FAIL: gcc.c-torture/execute/20071030-1.c execution,  -O2
FAIL: gcc.c-torture/execute/20071030-1.c execution,  -O3 -fomit-frame-pointer
FAIL: gcc.c-torture/execute/20071030-1.c execution,  -O3 -fomit-frame-pointer -funroll-loops
FAIL: gcc.c-torture/execute/20071030-1.c execution,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
FAIL: gcc.c-torture/execute/20071030-1.c execution,  -O3 -g
FAIL: gcc.c-torture/execute/20071030-1.c execution,  -Os
FAIL: gcc.c-torture/execute/gofast.c execution,  -O0
FAIL: gcc.c-torture/execute/gofast.c execution,  -O1
FAIL: gcc.c-torture/execute/ieee/980619-1.c execution,  -O0
FAIL: gcc.dg/builtins-50.c execution test
FAIL: gcc.dg/pr19402-2.c execution test

OK for mainline and 4.3 branch?

Bye,
Ulrich


ChangeLog:

	* config/spu/spu.md ("div<mode>3"): Return number with next highest
	magnitude if this is still less-or-equal the true quotient.

Index: gcc/config/spu/spu.md
===================================================================
*** gcc/config/spu/spu.md	(revision 136680)
--- gcc/config/spu/spu.md	(working copy)
***************
*** 1722,1728 ****
     (set_attr "length" "80")])
  
  (define_insn_and_split "div<mode>3"
!   [(set (match_operand:VSF 0 "spu_reg_operand" "=r")
  	(div:VSF (match_operand:VSF 1 "spu_reg_operand" "r")
  		 (match_operand:VSF 2 "spu_reg_operand" "r")))
     (clobber (match_scratch:VSF 3 "=&r"))
--- 1722,1728 ----
     (set_attr "length" "80")])
  
  (define_insn_and_split "div<mode>3"
!   [(set (match_operand:VSF 0 "spu_reg_operand" "=&r")
  	(div:VSF (match_operand:VSF 1 "spu_reg_operand" "r")
  		 (match_operand:VSF 2 "spu_reg_operand" "r")))
     (clobber (match_scratch:VSF 3 "=&r"))
***************
*** 1740,1746 ****
      emit_insn (gen_fi_<mode>(operands[3], operands[2], operands[3]));
      emit_insn (gen_mul<mode>3(operands[4], operands[1], operands[3]));
      emit_insn (gen_fnms_<mode>(operands[0], operands[4], operands[2], operands[1]));
!     emit_insn (gen_fma_<mode>(operands[0], operands[0], operands[3], operands[4]));
      DONE;
    })
  
--- 1740,1754 ----
      emit_insn (gen_fi_<mode>(operands[3], operands[2], operands[3]));
      emit_insn (gen_mul<mode>3(operands[4], operands[1], operands[3]));
      emit_insn (gen_fnms_<mode>(operands[0], operands[4], operands[2], operands[1]));
!     emit_insn (gen_fma_<mode>(operands[3], operands[0], operands[3], operands[4]));
!     emit_insn (gen_add<f2i>3(gen_lowpart (<F2I>mode, operands[4]),
! 			     gen_lowpart (<F2I>mode, operands[3]),
! 			     spu_const (<F2I>mode, 1)));
!     emit_insn (gen_fnms_<mode>(operands[0], operands[2],  operands[4], operands[1]));
!     emit_insn (gen_cgt_<f2i>(gen_lowpart (<F2I>mode, operands[0]),
! 			     gen_lowpart (<F2I>mode, operands[0]),
! 			     spu_const (<F2I>mode, -1)));
!     emit_insn (gen_selb(operands[0], operands[3], operands[4], operands[0]));
      DONE;
    })
  
-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com


* Re: [PATCH, spu] Improve precision of divsf3 on SPU
  2008-06-15 15:49 [PATCH, spu] Improve precision of divsf3 on SPU Ulrich Weigand
@ 2008-06-16 19:11 ` trevor_smigiel
  2008-06-26 12:49   ` Ulrich Weigand
From: trevor_smigiel @ 2008-06-16 19:11 UTC
  To: Ulrich Weigand; +Cc: gcc-patches, andrew_pinski

Ulrich,

Is this result still consistent with round-to-zero?

Trevor



* Re: [PATCH, spu] Improve precision of divsf3 on SPU
  2008-06-16 19:11 ` trevor_smigiel
@ 2008-06-26 12:49   ` Ulrich Weigand
  2008-06-26 18:38     ` trevor_smigiel
From: Ulrich Weigand @ 2008-06-26 12:49 UTC
  To: trevor_smigiel; +Cc: gcc-patches, andrew_pinski

Trevor Smigiel wrote:

> Is this result still consistent with round-to-zero?

Not in all cases.  As I understand it, the original code always
guarantees that the result is smaller than or equal to the true quotient
in magnitude.  However, it is not always the *nearest* such number (i.e.
the result expected in round-to-zero mode); sometimes it is 1 ulp less
than that.

This 1 ulp off tends to occur very frequently if the true quotient is
actually representable exactly.  This leads to the quite surprising
behaviour that "simple" divisors like 1.0 or 2.0 nearly always yield
wrong results (e.g. the identity x / 1.0 == x does not hold).
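
To make "1 ulp" concrete: for finite single-precision values, adding or
subtracting 1 on the integer view of the bit pattern moves to the
neighbouring value of larger or smaller magnitude.  A minimal sketch
(the helper names are hypothetical, and the steps are only meaningful
away from zero, infinity and NaN):

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern and back. */
static uint32_t float_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

static float bits_float(uint32_t u)
{
    float x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Neighbour with the next larger magnitude (sign is preserved). */
float next_magnitude(float x)
{
    return bits_float(float_bits(x) + 1u);
}

/* Neighbour with the next smaller magnitude (sign is preserved). */
float prev_magnitude(float x)
{
    return bits_float(float_bits(x) - 1u);
}
```

A result that is 1 ulp low for x / 1.0 is thus `prev_magnitude(x)`
rather than `x`, which is why the identity fails.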

The patch tries to fix this by checking whether the number 1 ulp larger
fits the real quotient better.  The intent is to still remain lower than
the true result (to respect round-to-zero), but the code does not always
achieve this.

There are two reasons for this:

- If the dividend is negative, the check is actually incorrect; we'd 
  have to check the error term for <= 0 in this case, but we always
  check it for >= 0.

- If the dividend is very small in magnitude (< 2^-100), the computation
  of the error term can underflow to zero, so we accidentally treat a
  too-large result as if it were the exact result.
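
The first point can be illustrated with a small sketch of the check as
the initial patch performs it: the candidate is accepted when the error
term a - b * c is non-negative.  The function names are hypothetical,
and double arithmetic is an assumption made here so that the product is
exact on a host; the pattern itself uses a single-precision fnms.

```c
#include <stdint.h>
#include <string.h>

/* Neighbour with the next larger magnitude, via the integer view. */
static float next_magnitude(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    u += 1u;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Sign-unaware check: accept candidate c for a / b when the error
   term a - b * c is non-negative. */
int accepts_candidate(float a, float b, float c)
{
    double err = (double)a - (double)b * (double)c;
    return err >= 0.0;
}
```

With a = 3.0 and b = 1.0 the candidate 1 ulp above 3.0 is rejected, as
intended.  With a = -3.0 the candidate 1 ulp beyond -3.0 in magnitude
gives a positive error term and is accepted, although it is too large.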

The first of these problems can be fixed by multiplying the error term
by -1.0 for negative dividends.  The patch below implements this; it
is slightly less efficient than the original patch, but it may be
preferable because it avoids that systematic error.
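
A rough C sketch of this sign fix follows.  It mirrors the selb step
that combines the sign bit of the dividend (mask 0x80000000) with the
bit pattern of 1.0 (0x3f800000), then scales the error term before the
comparison.  All names are hypothetical; double arithmetic and the
ordinary `>=` comparison are assumptions of the sketch, whereas the
pattern tests the sign bit of a single-precision error term.

```c
#include <stdint.h>
#include <string.h>

static uint32_t float_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

static float bits_float(uint32_t u)
{
    float x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* selb: take bits from b where mask is 1, otherwise from a. */
static uint32_t selb(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & ~mask) | (b & mask);
}

/* +1.0 or -1.0, carrying the sign of the dividend. */
float sign_unit(float dividend)
{
    return bits_float(selb(0x3f800000u, float_bits(dividend),
                           0x80000000u));
}

/* Given an estimate q of a / b that may be 1 ulp low in magnitude,
   return the neighbour of larger magnitude if it does not exceed the
   true quotient in magnitude, else q itself. */
float adjust_quotient(float a, float b, float q)
{
    float q1 = bits_float(float_bits(q) + 1u);
    double err = ((double)a - (double)b * (double)q1)
                 * (double)sign_unit(a);
    return (err >= 0.0) ? q1 : q;
}
```

For example, an estimate 1 ulp below 3.0 for 3.0 / 1.0 is bumped to 3.0,
while an estimate of exactly -3.0 for -3.0 / 1.0 is left alone.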

It seems the second problem can only be fixed by much more elaborate
code (e.g. normalizing the input operands and computing the result
exponent by hand, as the simdmath _divf4.h code does) ...   I don't
think we should do that for the "fast" inline implementation -- if this 
deviates from round-to-zero for input values near the limits of 
representable values, that should be an acceptable trade-off.

The alternative would be to provide a fully exact algorithm (along the
lines of the simdmath implementation) as a libgcc function.

What do you think?

Bye,
Ulrich


The patch below was tested on spu-elf with no regressions; it fixes the
same set of test cases that the initial patch fixed.

ChangeLog:

	* config/spu/spu.md ("div<mode>3"): Return number with next
	highest magnitude if this is still less or equal to the true
	quotient in magnitude.


Index: gcc/config/spu/spu.md
===================================================================
*** gcc/config/spu/spu.md	(revision 136911)
--- gcc/config/spu/spu.md	(working copy)
***************
*** 1726,1732 ****
  	(div:VSF (match_operand:VSF 1 "spu_reg_operand" "r")
  		 (match_operand:VSF 2 "spu_reg_operand" "r")))
     (clobber (match_scratch:VSF 3 "=&r"))
!    (clobber (match_scratch:VSF 4 "=&r"))]
    ""
    "#"
    "reload_completed"
--- 1726,1733 ----
  	(div:VSF (match_operand:VSF 1 "spu_reg_operand" "r")
  		 (match_operand:VSF 2 "spu_reg_operand" "r")))
     (clobber (match_scratch:VSF 3 "=&r"))
!    (clobber (match_scratch:VSF 4 "=&r"))
!    (clobber (match_scratch:VSF 5 "=&r"))]
    ""
    "#"
    "reload_completed"
***************
*** 1734,1746 ****
  	(div:VSF (match_dup:VSF 1)
  		 (match_dup:VSF 2)))
     (clobber (match_dup:VSF 3))
!    (clobber (match_dup:VSF 4))]
    {
      emit_insn (gen_frest_<mode>(operands[3], operands[2]));
      emit_insn (gen_fi_<mode>(operands[3], operands[2], operands[3]));
      emit_insn (gen_mul<mode>3(operands[4], operands[1], operands[3]));
!     emit_insn (gen_fnms_<mode>(operands[0], operands[4], operands[2], operands[1]));
!     emit_insn (gen_fma_<mode>(operands[0], operands[0], operands[3], operands[4]));
      DONE;
    })
  
--- 1735,1767 ----
  	(div:VSF (match_dup:VSF 1)
  		 (match_dup:VSF 2)))
     (clobber (match_dup:VSF 3))
!    (clobber (match_dup:VSF 4))
!    (clobber (match_dup:VSF 5))]
    {
      emit_insn (gen_frest_<mode>(operands[3], operands[2]));
      emit_insn (gen_fi_<mode>(operands[3], operands[2], operands[3]));
      emit_insn (gen_mul<mode>3(operands[4], operands[1], operands[3]));
!     emit_insn (gen_fnms_<mode>(operands[5], operands[4], operands[2], operands[1]));
!     emit_insn (gen_fma_<mode>(operands[3], operands[5], operands[3], operands[4]));
! 
!    /* Due to truncation error, the quotient result may be low by 1 ulp.
!       Conditionally add one if the estimate is too small in magnitude.  */
! 
!     emit_move_insn (gen_lowpart (<F2I>mode, operands[4]),
! 		    spu_const (<F2I>mode, 0x80000000ULL));
!     emit_move_insn (gen_lowpart (<F2I>mode, operands[5]),
! 		    spu_const (<F2I>mode, 0x3f800000ULL));
!     emit_insn (gen_selb (operands[5], operands[5], operands[1], operands[4]));
! 
!     emit_insn (gen_add<f2i>3 (gen_lowpart (<F2I>mode, operands[4]),
! 			      gen_lowpart (<F2I>mode, operands[3]),
! 			      spu_const (<F2I>mode, 1)));
!     emit_insn (gen_fnms_<mode> (operands[0], operands[2], operands[4], operands[1]));
!     emit_insn (gen_mul<mode>3 (operands[0], operands[0], operands[5]));
!     emit_insn (gen_cgt_<f2i> (gen_lowpart (<F2I>mode, operands[0]),
! 			      gen_lowpart (<F2I>mode, operands[0]),
! 			      spu_const (<F2I>mode, -1)));
!     emit_insn (gen_selb (operands[0], operands[3], operands[4], operands[0]));
      DONE;
    })
  

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com


* Re: [PATCH, spu] Improve precision of divsf3 on SPU
  2008-06-26 12:49   ` Ulrich Weigand
@ 2008-06-26 18:38     ` trevor_smigiel
  2008-07-21 17:54       ` Ulrich Weigand
From: trevor_smigiel @ 2008-06-26 18:38 UTC
  To: Ulrich Weigand; +Cc: gcc-patches, andrew_pinski

I see.   Given that the original implementation is inaccurate, I think
it is ok to have a new implementation that is inaccurate in a different
way.  

I'm ok with either of the patches, as long as -ffast-math generates the
previous version, i.e., without the extra adjustment.

I assume we will eventually add the -mfloat=accurate and
-mdouble=accurate options to generate fully accurate answers.

Trevor



* Re: [PATCH, spu] Improve precision of divsf3 on SPU
  2008-06-26 18:38     ` trevor_smigiel
@ 2008-07-21 17:54       ` Ulrich Weigand
From: Ulrich Weigand @ 2008-07-21 17:54 UTC
  To: trevor_smigiel; +Cc: gcc-patches, andrew_pinski

Trevor Smigiel wrote:

> I see.   Given that the original implementation is inaccurate, I think
> it is ok to have a new implementation that is inaccurate in a different
> way.  
> 
> I'm ok with either of the patches, as long as -ffast-math generates the
> previous version, i.e., without the extra adjustment.
> 
> I assume we will eventually add the -mfloat=accurate and
> -mdouble=accurate options to generate fully accurate answers.

OK, this version falls back to the original code if
flag_unsafe_math_optimizations is set (which -ffast-math enables).

It still fixes the same set of test cases originally reported.

Tested on spu-elf on mainline and 4.3 with no regressions.
Committed to mainline and 4.3.

Bye,
Ulrich


ChangeLog:

	* config/spu/spu.md ("div<mode>3"): Convert into expander, move
	original insn and splitter contents into ...
	("*div<mode>3_fast"): ... this new pattern.  Enable only if
	flag_unsafe_math_optimizations.  Add dummy scratch register.
	("*div<mode>3_adjusted"): New insn and splitter.  Enable only if
	!flag_unsafe_math_optimizations.  Returns number with next
	highest magnitude if this is still less or equal to the true
	quotient in magnitude.


Index: gcc/config/spu/spu.md
===================================================================
*** gcc/config/spu/spu.md	(revision 138006)
--- gcc/config/spu/spu.md	(working copy)
***************
*** 1721,1740 ****
    [(set_attr "type" "multi0")
     (set_attr "length" "80")])
  
! (define_insn_and_split "div<mode>3"
    [(set (match_operand:VSF 0 "spu_reg_operand" "=r")
  	(div:VSF (match_operand:VSF 1 "spu_reg_operand" "r")
  		 (match_operand:VSF 2 "spu_reg_operand" "r")))
     (clobber (match_scratch:VSF 3 "=&r"))
!    (clobber (match_scratch:VSF 4 "=&r"))]
!   ""
    "#"
    "reload_completed"
    [(set (match_dup:VSF 0)
  	(div:VSF (match_dup:VSF 1)
  		 (match_dup:VSF 2)))
     (clobber (match_dup:VSF 3))
!    (clobber (match_dup:VSF 4))]
    {
      emit_insn (gen_frest_<mode>(operands[3], operands[2]));
      emit_insn (gen_fi_<mode>(operands[3], operands[2], operands[3]));
--- 1721,1753 ----
    [(set_attr "type" "multi0")
     (set_attr "length" "80")])
  
! (define_expand "div<mode>3"
!   [(parallel
!     [(set (match_operand:VSF 0 "spu_reg_operand" "")	
! 	  (div:VSF (match_operand:VSF 1 "spu_reg_operand" "")
! 		   (match_operand:VSF 2 "spu_reg_operand" "")))
!      (clobber (match_scratch:VSF 3 ""))
!      (clobber (match_scratch:VSF 4 ""))
!      (clobber (match_scratch:VSF 5 ""))])]
!   ""
!   "")
! 
! (define_insn_and_split "*div<mode>3_fast"
    [(set (match_operand:VSF 0 "spu_reg_operand" "=r")
  	(div:VSF (match_operand:VSF 1 "spu_reg_operand" "r")
  		 (match_operand:VSF 2 "spu_reg_operand" "r")))
     (clobber (match_scratch:VSF 3 "=&r"))
!    (clobber (match_scratch:VSF 4 "=&r"))
!    (clobber (scratch:VSF))]
!   "flag_unsafe_math_optimizations"
    "#"
    "reload_completed"
    [(set (match_dup:VSF 0)
  	(div:VSF (match_dup:VSF 1)
  		 (match_dup:VSF 2)))
     (clobber (match_dup:VSF 3))
!    (clobber (match_dup:VSF 4))
!    (clobber (scratch:VSF))]
    {
      emit_insn (gen_frest_<mode>(operands[3], operands[2]));
      emit_insn (gen_fi_<mode>(operands[3], operands[2], operands[3]));
***************
*** 1744,1749 ****
--- 1757,1806 ----
      DONE;
    })
  
+ (define_insn_and_split "*div<mode>3_adjusted"
+   [(set (match_operand:VSF 0 "spu_reg_operand" "=r")
+ 	(div:VSF (match_operand:VSF 1 "spu_reg_operand" "r")
+ 		 (match_operand:VSF 2 "spu_reg_operand" "r")))
+    (clobber (match_scratch:VSF 3 "=&r"))
+    (clobber (match_scratch:VSF 4 "=&r"))
+    (clobber (match_scratch:VSF 5 "=&r"))]
+   "!flag_unsafe_math_optimizations"
+   "#"
+   "reload_completed"
+   [(set (match_dup:VSF 0)
+ 	(div:VSF (match_dup:VSF 1)
+ 		 (match_dup:VSF 2)))
+    (clobber (match_dup:VSF 3))
+    (clobber (match_dup:VSF 4))
+    (clobber (match_dup:VSF 5))]
+   {
+     emit_insn (gen_frest_<mode> (operands[3], operands[2]));
+     emit_insn (gen_fi_<mode> (operands[3], operands[2], operands[3]));
+     emit_insn (gen_mul<mode>3 (operands[4], operands[1], operands[3]));
+     emit_insn (gen_fnms_<mode> (operands[5], operands[4], operands[2], operands[1]));
+     emit_insn (gen_fma_<mode> (operands[3], operands[5], operands[3], operands[4]));
+ 
+    /* Due to truncation error, the quotient result may be low by 1 ulp.
+       Conditionally add one if the estimate is too small in magnitude.  */
+ 
+     emit_move_insn (gen_lowpart (<F2I>mode, operands[4]),
+ 		    spu_const (<F2I>mode, 0x80000000ULL));
+     emit_move_insn (gen_lowpart (<F2I>mode, operands[5]),
+ 		    spu_const (<F2I>mode, 0x3f800000ULL));
+     emit_insn (gen_selb (operands[5], operands[5], operands[1], operands[4]));
+ 
+     emit_insn (gen_add<f2i>3 (gen_lowpart (<F2I>mode, operands[4]),
+ 			      gen_lowpart (<F2I>mode, operands[3]),
+ 			      spu_const (<F2I>mode, 1)));
+     emit_insn (gen_fnms_<mode> (operands[0], operands[2], operands[4], operands[1]));
+     emit_insn (gen_mul<mode>3 (operands[0], operands[0], operands[5]));
+     emit_insn (gen_cgt_<f2i> (gen_lowpart (<F2I>mode, operands[0]),
+ 			      gen_lowpart (<F2I>mode, operands[0]),
+ 			      spu_const (<F2I>mode, -1)));
+     emit_insn (gen_selb (operands[0], operands[3], operands[4], operands[0]));
+     DONE;
+   })
+ 
  ;; Taken from STI's gcc
  ;; Does not correctly handle INF or NAN.
  (define_expand "divdf3"
-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com
