public inbox for gcc-patches@gcc.gnu.org
* [PATCH][i386] Implement ix86_emit_swdivsf more efficiently
From: Richard Guenther @ 2011-03-14 15:59 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jan Hubicka


This rewrites the iteration step of swdivsf to be more register
efficient (two registers instead of four, no load of a FP constant).
This matches how ICC emits the rcp sequence and causes no overall loss
of precision (Micha might still remember the exact details).  The patch is
fallout of the work trying to fix PR47989.
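
As a quick check that the old and new formulations agree in exact arithmetic (a sketch only, not an error analysis of the rounded sequence), write x_0 for the rcp(b) estimate and \varepsilon for its relative error. The symbol \varepsilon is notation introduced here and does not appear in the patch.

```latex
\begin{align*}
x_1 &= x_0\,(2 - b\,x_0) = (x_0 + x_0) - (b\,x_0)\,x_0, \\
x_0 &= \tfrac{1}{b}\,(1+\varepsilon)
  \;\Longrightarrow\;
  x_1 = \tfrac{1}{b}\,(1+\varepsilon)(1-\varepsilon)
      = \tfrac{1}{b}\,(1-\varepsilon^2), \\
a\,x_1 &= \tfrac{a}{b}\,(1-\varepsilon^2).
\end{align*}
```

Both sequences therefore perform the same Newton-Raphson step. They differ only in where the intermediate roundings fall, and the new form does not need the constant 2.0 in a register.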

Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for 4.7?

Thanks,
Richard.

2011-03-14  Richard Guenther  <rguenther@suse.de>

	* config/i386/i386.c (ix86_emit_swdivsf): Implement more
	efficiently.

Index: trunk/gcc/config/i386/i386.c
===================================================================
--- trunk.orig/gcc/config/i386/i386.c	2011-03-09 11:52:21.000000000 +0100
+++ trunk/gcc/config/i386/i386.c	2011-03-10 15:43:47.000000000 +0100
@@ -31747,38 +31747,38 @@ void ix86_emit_i387_log1p (rtx op0, rtx
 
 void ix86_emit_swdivsf (rtx res, rtx a, rtx b, enum machine_mode mode)
 {
-  rtx x0, x1, e0, e1, two;
+  rtx x0, x1, e0, e1;
 
   x0 = gen_reg_rtx (mode);
   e0 = gen_reg_rtx (mode);
   e1 = gen_reg_rtx (mode);
   x1 = gen_reg_rtx (mode);
 
-  two = CONST_DOUBLE_FROM_REAL_VALUE (dconst2, SFmode);
-
-  if (VECTOR_MODE_P (mode))
-    two = ix86_build_const_vector (mode, true, two);
-
-  two = force_reg (mode, two);
-
-  /* a / b = a * rcp(b) * (2.0 - b * rcp(b)) */
+  /* a / b = a * ((rcp(b) + rcp(b)) - (b * rcp(b) * rcp (b))) */
 
   /* x0 = rcp(b) estimate */
   emit_insn (gen_rtx_SET (VOIDmode, x0,
 			  gen_rtx_UNSPEC (mode, gen_rtvec (1, b),
 					  UNSPEC_RCP)));
-  /* e0 = x0 * a */
+  /* e0 = x0 * b */
   emit_insn (gen_rtx_SET (VOIDmode, e0,
-			  gen_rtx_MULT (mode, x0, a)));
-  /* e1 = x0 * b */
-  emit_insn (gen_rtx_SET (VOIDmode, e1,
 			  gen_rtx_MULT (mode, x0, b)));
-  /* x1 = 2. - e1 */
+
+  /* e0 = x0 * e0 */
+  emit_insn (gen_rtx_SET (VOIDmode, e0,
+			  gen_rtx_MULT (mode, x0, e0)));
+
+  /* e1 = x0 + x0 */
+  emit_insn (gen_rtx_SET (VOIDmode, e1,
+			  gen_rtx_PLUS (mode, x0, x0)));
+
+  /* x1 = e1 - e0 */
   emit_insn (gen_rtx_SET (VOIDmode, x1,
-			  gen_rtx_MINUS (mode, two, e1)));
-  /* res = e0 * x1 */
+			  gen_rtx_MINUS (mode, e1, e0)));
+
+  /* res = a * x1 */
   emit_insn (gen_rtx_SET (VOIDmode, res,
-			  gen_rtx_MULT (mode, e0, x1)));
+			  gen_rtx_MULT (mode, a, x1)));
 }
 
 /* Output code to perform a Newton-Rhapson approximation of a


* Re: [PATCH][i386] Implement ix86_emit_swdivsf more efficiently
From: Michael Matz @ 2011-03-17 14:36 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Jan Hubicka

Hi,

On Mon, 14 Mar 2011, Richard Guenther wrote:

> This rewrites the iteration step of swdivsf to be more register 
> efficient (two registers instead of four, no load of a FP constant). 
> This matches how ICC emits the rcp sequence and causes no overall loss 
> of precision (Micha might still remember the exact details).

I haven't done a full error analysis of the intermediate rounding steps, 
only a statistical analysis over a subset of dividends and the full 
set of divisors.  On AMD and Intel processors (this matters because rcpss 
accuracy differs between the two), the sum of all absolute errors between 
the quotient from divss and the quotient from each method is smaller for 
the new method than for the old one.  The max error is 2ulps in each case.
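
A comparison of this kind could be sketched in portable C as below. This is not the test that was actually run. `rcp_estimate` is a hypothetical stand-in for rcpss (an exact reciprocal with an injected relative error of about 2^-12), and `ulp_distance`, `swdiv_old`, `swdiv_new` and `max_ulp_error` are names made up for the illustration. The numbers it yields therefore say nothing about real AMD or Intel hardware.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for rcpss: the exact reciprocal with a
   deliberately injected relative error of about 2^-12.  Real hardware
   estimates behave differently.  */
static float rcp_estimate (float b)
{
  return (1.0f / b) * (1.0f + 0x1p-12f);
}

/* Old sequence: a / b ~= (a * x0) * (2.0 - b * x0).  */
static float swdiv_old (float a, float b)
{
  float x0 = rcp_estimate (b);
  float e0 = x0 * a;
  float e1 = x0 * b;
  float x1 = 2.0f - e1;
  return e0 * x1;
}

/* New sequence: a / b ~= a * ((x0 + x0) - (b * x0) * x0).  */
static float swdiv_new (float a, float b)
{
  float x0 = rcp_estimate (b);
  float e0 = x0 * b;
  e0 = x0 * e0;
  float e1 = x0 + x0;
  float x1 = e1 - e0;
  return a * x1;
}

/* Distance in units in the last place.  Only valid for finite,
   positive floats, which is all this sketch feeds it.  */
static int32_t ulp_distance (float p, float q)
{
  int32_t ip, iq;
  memcpy (&ip, &p, sizeof ip);
  memcpy (&iq, &q, sizeof iq);
  return ip > iq ? ip - iq : iq - ip;
}

/* Worst ulp distance from the ordinary a / b over 1024 divisors in
   [1, 2).  The step 2^-10 is exact, so the loop does not drift.  */
static int32_t max_ulp_error (float (*div) (float, float), float a)
{
  int32_t worst = 0;
  for (float b = 1.0f; b < 2.0f; b += 0x1p-10f)
    {
      int32_t d = ulp_distance (div (a, b), a / b);
      if (d > worst)
	worst = d;
    }
  return worst;
}
```

Calling `max_ulp_error (swdiv_old, a)` and `max_ulp_error (swdiv_new, a)` for a few positive dividends gives a rough side-by-side view of the two sequences.  To measure the real instruction, `rcp_estimate` would have to be replaced by something that actually issues rcpss.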


Ciao,
Michael.


* Re: [PATCH][i386] Implement ix86_emit_swdivsf more efficiently
From: Uros Bizjak @ 2011-03-14 17:23 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Guenther, jh

Hello!

> This rewrites the iteration step of swdivsf to be more register
> efficient (two registers instead of four, no load of a FP constant).
> This matches how ICC emits the rcp sequence and causes no overall loss
> of precision (Micha might still remember the exact details).  The patch is
> fallout of the work trying to fix PR47989.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for 4.7?
>
> Thanks,
> Richard.
>
> 2011-03-14  Richard Guenther  <rguenther@suse.de>
>
> 	* config/i386/i386.c (ix86_emit_swdivsf): Implement more
> 	efficiently.

OK for 4.7.

Thanks,
Uros.
