public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/31723]  New: Use reciprocal and reciprocal square root with -ffast-math
@ 2007-04-27  8:07 jb at gcc dot gnu dot org
  2007-04-27  9:16 ` [Bug middle-end/31723] " burnus at gcc dot gnu dot org
                   ` (28 more replies)
  0 siblings, 29 replies; 30+ messages in thread
From: jb at gcc dot gnu dot org @ 2007-04-27  8:07 UTC (permalink / raw)
  To: gcc-bugs

I did some analysis of why gfortran does badly at the gas_dyn benchmark of the
Polyhedron benchmark suite. See my analysis at

http://gcc.gnu.org/ml/fortran/2007-04/msg00494.html

In short, GCC should use reciprocal and reciprocal square root instructions
(available in single precision for SSE and Altivec) when possible. These
instructions are very fast, a few cycles vs. dozens or hundreds of cycles for
normal division and square root instructions. However, as these instructions
are accurate only to 12 bits, they should be enabled only with -ffast-math (or
some separate option that gets included with -ffast-math).

The following C program demonstrates the issue. For each of these functions it
should be possible to use reciprocal and/or reciprocal square root instructions
instead of normal div and sqrt:

#include <math.h>

float recip1 (float a)
{
  return 1.0f/a;
}

float recip2 (float a, float b)
{
  return a/b;
}

float rsqrt1 (float a)
{
  return 1.0f/sqrtf(a);
}

float rsqrt2 (float a, float b)
{
  /* Mathematically equivalent to 1/sqrt(b*(1/a))  */
  return sqrtf(a/b);
}

asm output (compiled with -std=c99 -O3 -c -Wall -pedantic -march=k8
-mfpmath=sse -ffast-math -S):

        .file   "recip.c"
        .text
        .p2align 4,,15
.globl recip1
        .type   recip1, @function
recip1:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $4, %esp
        movss   .LC0, %xmm0
        divss   8(%ebp), %xmm0
        movss   %xmm0, -4(%ebp)
        flds    -4(%ebp)
        leave
        ret
        .size   recip1, .-recip1
        .p2align 4,,15
.globl recip2
        .type   recip2, @function
recip2:
        pushl   %ebp
        movl    %esp, %ebp
        movss   8(%ebp), %xmm0
        divss   12(%ebp), %xmm0
        movss   %xmm0, 8(%ebp)
        flds    8(%ebp)
        leave
        ret
        .size   recip2, .-recip2
        .p2align 4,,15
.globl rsqrt2
        .type   rsqrt2, @function
rsqrt2:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $4, %esp
        movss   8(%ebp), %xmm0
        divss   12(%ebp), %xmm0
        sqrtss  %xmm0, %xmm0
        movss   %xmm0, -4(%ebp)
        flds    -4(%ebp)
        leave
        ret
        .size   rsqrt2, .-rsqrt2
        .p2align 4,,15
.globl rsqrt1
        .type   rsqrt1, @function
rsqrt1:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $4, %esp
        movss   .LC0, %xmm0
        sqrtss  8(%ebp), %xmm1
        divss   %xmm1, %xmm0
        movss   %xmm0, -4(%ebp)
        flds    -4(%ebp)
        leave
        ret
        .size   rsqrt1, .-rsqrt1
        .section        .rodata.cst4,"aM",@progbits,4
        .align 4
.LC0:
        .long   1065353216
        .ident  "GCC: (GNU) 4.3.0 20070426 (experimental)"
        .section        .note.GNU-stack,"",@progbits


As can be seen, it uses divss and sqrtss instead of rcpss and rsqrtss. Of
course, there are also vectorized versions of these instructions, rcpps and
rsqrtps, which should be used when appropriate (vectorization is important,
e.g., for gas_dyn).


-- 
           Summary: Use reciprocal and reciprocal square root with -ffast-
                    math
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jb at gcc dot gnu dot org
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: burnus at gcc dot gnu dot org @ 2007-04-27  9:16 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from burnus at gcc dot gnu dot org  2007-04-27 10:16 -------
Comment by Richard Guenther in the same thread:
-----------------
I think that even with -ffast-math 12 bits of accuracy is not ok.  There is
the possibility of doing another Newton iteration step to improve
accuracy; that would be ok for -ffast-math.  We can, though, add an
extra flag -msserecip or whatever you'd call it to enable use of the
instructions with less accuracy.
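Why one extra Newton step is enough can be seen from a short derivation (a sketch, writing ε for the relative error of the hardware estimate and ignoring the rounding of the refinement step itself):

```latex
x_0 = \frac{1+\varepsilon}{a}, \qquad
x_1 = x_0\,(2 - a\,x_0) = \frac{(1+\varepsilon)(1-\varepsilon)}{a} = \frac{1-\varepsilon^2}{a}

y_0 = \frac{1+\varepsilon}{\sqrt{a}}, \qquad
y_1 = \tfrac{1}{2}\,y_0\,(3 - a\,y_0^2) = \frac{1 - \tfrac{3}{2}\varepsilon^2 - \tfrac{1}{2}\varepsilon^3}{\sqrt{a}}
```

With |ε| ≤ 1.5 * 2^-12, one step leaves an error of order ε^2, i.e. about 22 good bits: close to single precision, but not a guaranteed correctly rounded result.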


-- 

burnus at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |burnus at gcc dot gnu dot
                   |                            |org


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: rguenth at gcc dot gnu dot org @ 2007-04-27  9:45 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from rguenth at gcc dot gnu dot org  2007-04-27 10:45 -------
Note that SSE can vectorize only the float precision variant, not the double
precision one.  So one needs to carefully either disable vectorization for the
double variant to get reciprocal code or the other way around.

Note that the function/pattern vectorizer needs to be quite "adjusted" to
support emitting multiple instructions if we don't want to create builtin
functions for the result.  But it's certainly possible.

The easier part is to expand differently.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2007-04-27 10:45:36
               date|                            |


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: jb at gcc dot gnu dot org @ 2007-04-27 10:27 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from jb at gcc dot gnu dot org  2007-04-27 11:27 -------
(In reply to comment #2)
> Note that SSE can vectorize only the float precision variant, not the double
> precision one.  So one needs to carefuly either disable vectorization for the
> double variant to get reciprocal code or the other way around.

AFAICS these reciprocal instructions are available only for single precision,
both for scalar and packed variants. Altivec is single precision only; the SSE
instructions are

rcpss (single precision scalar reciprocal)
rcpps (single precision packed reciprocal)
rsqrtss (single precision scalar reciprocal square root)
rsqrtps (single precision packed reciprocal square root)

There are no equivalent double precision versions of any of these instructions.
Or do you think there would be a speed benefit for double precision to

1. Convert to single precision
2. Calculate rcp(s|p)s or rsqrt(p|s)s
3. Refine with newton iteration

vs. just using div(p|s)d or sqrt(p|s)d?


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: jb at gcc dot gnu dot org @ 2007-04-27 10:29 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from jb at gcc dot gnu dot org  2007-04-27 11:29 -------
(In reply to comment #3)
> 1. Convert to single precision
> 2. Calculate rcp(s|p)s or rsqrt(p|s)s
> 3. Refine with newton iteration
> 
> vs. just using div(p|s)d or sqrt(p|s)d?

This should be

1. Convert to single precision
2. Calculate rcp(s|p)s or rsqrt(p|s)s
3. Convert back to double precision
4. Refine with newton iteration


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: jb at gcc dot gnu dot org @ 2007-04-27 11:01 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from jb at gcc dot gnu dot org  2007-04-27 12:01 -------
With the benchmarks at http://www.hlnum.org/english/doc/frsqrt/frsqrt.html

I get

~/src/benchmark/rsqrt% g++ -O3 -funroll-loops -ffast-math -funit-at-a-time
-march=k8 -mfpmath=sse frsqrt.cc
~/src/benchmark/rsqrt% ./a.out
first example: 1 / sqrt(3)
  exact  = 5.7735026918962584e-01
  float  = 5.7735025882720947e-01, error = 1.7948e-08
  double = 5.7735026918962506e-01, error = 1.3461e-15
second example: 1 / sqrt(5)
  exact  = 4.4721359549995793e-01
  float  = 4.4721359014511108e-01, error = 1.1974e-08
  double = 4.4721359549995704e-01, error = 1.9860e-15

Benchmark

(float)  time for 1.0 / sqrt = 5.96 sec (res = 2.8450581250000000e+05)
(float)  time for      rsqrt = 2.49 sec (res = 2.2360225000000000e+05)
(double)  time for 1.0 / sqrt = 7.35 sec (res = 5.9926234364635509e+05)
(double)  time for      rsqrt = 7.49 sec (res = 5.9926234364355623e+05)


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: rguenth at gcc dot gnu dot org @ 2007-04-27 11:09 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from rguenth at gcc dot gnu dot org  2007-04-27 12:09 -------
You are right, they are only available for float precision.


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: burnus at gcc dot gnu dot org @ 2007-04-27 11:41 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from burnus at gcc dot gnu dot org  2007-04-27 12:41 -------
> (float)  time for 1.0 / sqrt = 5.96 sec (res = 2.8450581250000000e+05)
> (float)  time for      rsqrt = 2.49 sec (res = 2.2360225000000000e+05)
> (double)  time for 1.0 / sqrt = 7.35 sec (res = 5.9926234364635509e+05)
> (double)  time for      rsqrt = 7.49 sec (res = 5.9926234364355623e+05)

On an Athlon 64 X2, the double result is more favourable for rsqrt
(using the system g++ 4.1.2 with g++ -march=opteron -O3 -ftree-vectorize
-funroll-loops -funit-at-a-time -msse3 frsqrt.cc; similarly with -ffast-math)

(float)  time for 1.0 / sqrt = 3.76 sec (res = 1.7943843750000000e+05)
(float)  time for      rsqrt = 1.72 sec (res = 1.7943843750000000e+05)
(double)  time for 1.0 / sqrt = 5.15 sec (res = 5.9926234364320245e+05)
(double)  time for      rsqrt = 3.34 sec (res = 5.9926234364320245e+05)


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: steven at gcc dot gnu dot org @ 2007-04-27 20:43 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from steven at gcc dot gnu dot org  2007-04-27 21:43 -------
I suppose this is something that requires new builtins?


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |steven at gcc dot gnu dot
                   |                            |org


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: rguenth at gcc dot gnu dot org @ 2007-04-27 21:03 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from rguenth at gcc dot gnu dot org  2007-04-27 22:03 -------
I looked at this some time ago, and in principle it doesn't require them.  For
the vectorized call we'd need to support target-dependent pattern
vectorization; for the scalar case we would need a new optab to handle 1/x
expansion specially.
Now, for 1/sqrt a builtin could make sense, but even that can be handled via
another optab at expansion time.

If only I had the time to start experimenting...


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: pinskia at gcc dot gnu dot org @ 2007-04-27 23:25 UTC (permalink / raw)
  To: gcc-bugs



-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: ubizjak at gmail dot com @ 2007-06-10  8:28 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from ubizjak at gmail dot com  2007-06-10 08:28 -------
I have experimented a bit with rcpss, trying to measure the effect of an
additional NR step on performance. The NR step was derived from
http://en.wikipedia.org/wiki/N-th_root_algorithm, and for N=-1 (1/A) it
simplifies to:

x1 = x0 (2.0 - A x0)

To obtain 24-bit precision, we have to use a reciprocal, two multiplies and a
subtraction (+ a constant load).

First, please note that "divss" instruction is quite _fast_, clocking at 23
cycles, where approximation with NR step would sum up to 20 cycles, not
counting load of constant.

I have checked the performance of the following testcase with various
implementations on x86_64 C2D:

--cut here--
#include <stdio.h>

float test(float a)
{
  return 1.0 / a;
}


int main (void)
{
  float a = 1.12345;
  volatile float t = 0.0f;  /* volatile: every partial sum is stored */
  int i;

  for (i = 1; i < 1000000000; i++)
    {
      t += test (a);
      a += 1.0;
    }

  printf("%f\n", t);

  return 0;
}
--cut here--

divss     : 3.132s
rcpss NR  : 3.264s
rcpss only: 3.080s

To enhance the precision of 1/sqrt(A), additional NR step is calculated as

x1 = 0.5 X0 (3.0 - A x0 x0 x0)

and considering that sqrtss also clocks at 23 clocks (_far_ from hundreds of
clocks ;) ), additional NR step just isn't worth it.

The experimental patch:

Index: i386.md
===================================================================
--- i386.md     (revision 125599)
+++ i386.md     (working copy)
@@ -15399,6 +15399,15 @@
 ;; Gcc is slightly more smart about handling normal two address instructions
 ;; so use special patterns for add and mull.

+(define_insn "*rcpsf2_sse"
+  [(set (match_operand:SF 0 "register_operand" "=x")
+       (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm")]
+                  UNSPEC_RCP))]
+  "TARGET_SSE"
+  "rcpss\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sse")
+   (set_attr "mode" "SF")])
+
 (define_insn "*fop_sf_comm_mixed"
   [(set (match_operand:SF 0 "register_operand" "=f,x")
        (match_operator:SF 3 "binary_fp_operator"
@@ -15448,6 +15457,29 @@
           (const_string "fop")))
    (set_attr "mode" "SF")])

+(define_insn_and_split "*rcp_sf_1_sse"
+  [(set (match_operand:SF 0 "register_operand" "=x")
+       (div:SF (match_operand:SF 1 "immediate_operand" "F")
+               (match_operand:SF 2 "nonimmediate_operand" "xm")))
+   (clobber (match_scratch:SF 3 "=&x"))
+   (clobber (match_scratch:SF 4 "=&x"))]
+  "TARGET_SSE_MATH
+   && operands[1] == CONST1_RTX (SFmode)
+   && flag_unsafe_math_optimizations"
+   "#"
+   "&& reload_completed"
+   [(set (match_dup 3)(match_dup 2))
+    (set (match_dup 4)(match_dup 5))
+    (set (match_dup 0)(unspec:SF [(match_dup 3)] UNSPEC_RCP))
+    (set (match_dup 3)(mult:SF (match_dup 3)(match_dup 0)))
+    (set (match_dup 4)(minus:SF (match_dup 4)(match_dup 3)))
+    (set (match_dup 0)(mult:SF (match_dup 0)(match_dup 4)))]
+{
+  rtx two = const_double_from_real_value (dconst2, SFmode);
+
+  operands[5] = validize_mem (force_const_mem (SFmode, two));
+})
+
 (define_insn "*fop_sf_1_mixed"
   [(set (match_operand:SF 0 "register_operand" "=f,f,x")
        (match_operator:SF 3 "binary_fp_operator"

Based on these findings, I guess that NR step is just not worth it. If we want
to have noticeable speed-up on division and square root, we have to use 12bit
implementations, without any refinements - mainly for benchmarketing, I'm
afraid.

BTW: on x86_64, patched gcc compiles "test" function to:

test:
        movaps  %xmm0, %xmm1
        rcpss   %xmm0, %xmm0
        movss   .LC1(%rip), %xmm2
        mulss   %xmm0, %xmm1
        subss   %xmm1, %xmm2
        mulss   %xmm2, %xmm0
        ret


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: ubizjak at gmail dot com @ 2007-06-10 10:47 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from ubizjak at gmail dot com  2007-06-10 10:47 -------
Here are the results of mubench insn timings for various x86 processors:
http://mubench.sourceforge.net/results.html (target processor can be
benchmarked by downloading mubench from
http://mubench.sourceforge.net/index.html).

And finally, an interesting read on how commercial compilers trade accuracy for
speed (please read at least the part about the SPEC2006 benchmark):
http://www.hpcwire.com/hpc/1556972.html


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ubizjak at gmail dot com


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: jb at gcc dot gnu dot org @ 2007-06-10 11:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from jb at gcc dot gnu dot org  2007-06-10 11:06 -------
(In reply to comment #11)

Thanks for the work.

> First, please note that "divss" instruction is quite _fast_, clocking at 23
> cycles, where approximation with NR step would sum up to 20 cycles, not
> counting load of constant.
> 
> I have checked the performance of following testcase with various
> implementetations on x86_64 C2D:
> 
> --cut here--
> float test(float a)
> {
>   return 1.0 / a;
> }
>
> divss     : 3.132s
> rcpss NR  : 3.264s
> rcpss only: 3.080s

Interesting, on ubuntu/i686/K8 I get (average of 3 runs)

divss: 7.485 s
rcpss NR: 9.915 s

> To enhance the precision of 1/sqrt(A), additional NR step is calculated as
> 
> x1 = 0.5 X0 (3.0 - A x0 x0 x0)
> 
> and considering that sqrtss also clocks at 23 clocks (_far_ from hundreds of
> clocks ;) ), additional NR step just isn't worth it.

Well, I suppose it depends on the hardware. IIRC older CPUs did division with
microcode whereas at least Core 2 and K8 do it in hardware, so I guess the
hundreds of cycles don't apply to current CPUs.

Also, supposedly Penryn will have a much improved divider.

That being said, I think there is still a case for the reciprocal square root,
as evidenced by the benchmarks in #5 and #7 as well as my analysis of gas_dyn
linked to in the first message in this PR (in short, ifort does sqrt(a/b) about
twice as fast as gfortran by using reciprocal approximations + NR). If indeed
div(p|s)s is about equally fast as rcp(p|s)s as your benchmarks show, then it
suggests almost all the performance benefit ifort gets is due to the
rsqrt(p|s)s, no? Or perhaps there is some issue with pipelining? In gas_dyn the
sqrt(a/b) loop fills an array, whereas your benchmark accumulates.

> Based on these findings, I guess that NR step is just not worth it. If we want
> to have noticeable speed-up on division and square root, we have to use 12bit
> implementations, without any refinements - mainly for benchmarketing, I'm
> afraid.

I hear that it's possible to pass spec2k6/gromacs without the NR step. Like most
MD programs, gromacs spends almost all its time in the force calculations,
where the majority of time is spent calculating 1/sqrt(...). So perhaps one
should watch out for compilers that get suspiciously high scores on that
benchmark. :)

No, I'm not suggesting gcc should do this.


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: rguenth at gcc dot gnu dot org @ 2007-06-10 12:07 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from rguenth at gcc dot gnu dot org  2007-06-10 12:07 -------
The interesting difference between sqrtss, divss and rcpss, rsqrtss is that
the former have throughput of 1/16 while the latter are 1/1 (latencies compare
21 vs. 3).  This is on K10.  The optimization guide only mentions calculating
the reciprocal y = a/b via rcpss and the square root (!) via rsqrtss
(sqrt a = 0.5 * a * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a)))

So the optimization would be mainly to improve instruction throughput, not
overall latency.


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: rguenth at gcc dot gnu dot org @ 2007-06-10 12:09 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from rguenth at gcc dot gnu dot org  2007-06-10 12:09 -------
And of course optimizing division or square root this way violates IEEE 754,
which specifies these as correctly rounded basic operations.  So a flag
separate from -funsafe-math-optimizations should be used for this
optimization.


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: ubizjak at gmail dot com @ 2007-06-10 16:25 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #16 from ubizjak at gmail dot com  2007-06-10 16:24 -------
(In reply to comment #13)

> > x1 = 0.5 X0 (3.0 - A x0 x0 x0)

Whoops! One x0 too many above. The correct calculation reads:

rsqrt = 0.5 rsqrt(a) (3.0 - a rsqrt(a) rsqrt(a)).

> Well, I suppose it depends on the hardware. IIRC older CPUs did division with
> microcode whereas at least Core 2 and K8 do it in hardware, so I guess the
> hundreds of cycles don't apply to current CPUs.
> 
> Also, supposedly Penryn will have a much improved divider.

Well, mubench says that on my Core2Duo _all_ sqrt and div instructions have a
latency of 6 clocks and a reciprocal throughput of 5 clks. By _all_ I mean
divss, divps, divsd, divpd, sqrtss, sqrtps, sqrtsd and sqrtpd. OTOH, rsqrtss
and rcpss have a latency of 3 clks and a reciprocal throughput of 2 clks. This
is just amazing.

> That being said, I think there is still a case for the reciprocal square root,
> as evidenced by the benchmarks in #5 and #7 as well as my analysis of gas_dyn
> linked to in the first message in this PR (in short, ifort does sqrt(a/b) about
> twice as fast as gfortran by using reciprocal approximations + NR). If indeed
> div(p|s)s is about equally fast as rcp(p|s)s as your benchmarks show, then it
> suggests almost all the performance benefit ifort gets is due to the
> rsqrt(p|s)s, no? Or perhaps there is some issue with pipelining? In gas_dyn the
> sqrt(a/b) loop fills an array, whereas your benchmark accumulates.

It is true that only a trivial accumulation function is benchmarked by my
"benchmark". I can prepare a bunch of expanders to expand:

a / b <=> a [rcpss(b) (2.0 - b rcpss(b))]

a / sqrtss(b) <=> a [0.5 rsqrtss(b) (3.0 - b rsqrtss(b) rsqrtss(b))].

sqrtss (a) <=> a 0.5 rsqrtss(a) (3.0 - a rsqrtss(a) rsqrtss(a))

The second and third cases indeed look similar...

> I hear that it's possible to pass spec2k6/gromacs without the NR step. Like most
> MD programs, gromacs spends almost all its time in the force calculations,
> where the majority of time is spent calculating 1/sqrt(...). So perhaps one
> should watch out for compilers that get suspiciously high scores on that
> benchmark. :)

Yes, see the HPCwire article in Comment #12.

> No, I'm not suggesting gcc should do this.

;))


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: ubizjak at gmail dot com @ 2007-06-10 16:49 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from ubizjak at gmail dot com  2007-06-10 16:49 -------
(In reply to comment #0)

>   /* Mathematically equivalent to 1/sqrt(b*(1/a))  */
>   return sqrtf(a/b);

Whoa, this one is a little gem, but ATM in the opposite direction. At least for
-ffast-math we could optimize (a / sqrt (b/c)) into a * sqrt (c/b), thus
saving one division. I'm sure that richi knows by heart how to write this
kind of folding ;)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: ubizjak at gmail dot com @ 2007-06-10 17:34 UTC
  To: gcc-bugs



------- Comment #18 from ubizjak at gmail dot com  2007-06-10 17:34 -------
(In reply to comment #14)
> The interesting difference between sqrtss, divss and rcpss, rsqrtss is that
> the former have throughput of 1/16 while the latter are 1/1 (latencies compare
> 21 vs. 3).  This is on K10.  The optimization guide only mentions calculating
> the reciprocal y = a/b via rcpss and the square root (!) via rsqrtss
> (sqrt a = 0.5 * a * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a)))
> 
> So the optimization would be mainly to improve instruction throughput, not
> overall latency.

If this is the case, then the middle-end will need to fold sqrtss in a different
way for targets that prefer rsqrtss. According to Comment #16, it is better to
fold to 1.0/sqrt(c/b) instead of sqrt(b/c) because this way we save one
multiplication during NR expansion by rsqrt [due to sqrt(x) <=> x * (1.0 /
sqrt(x))].

IMO we need a new tree code to handle reciprocal sqrt (RSQRT_EXPR), together
with proper folding functionality that expands directly to the (NR-enhanced)
rsqrt optab. If we consider a*sqrt(b/c), then b/c will be expanded as
b * NR-rcp(c) [where NR-rcp stands for NR-enhanced rcp] and sqrt will be
expanded as NR-rsqrt. In this case, I see no RTL pass that would be able to
combine everything together in order to swap the (b/c) operands to produce the
NR-enhanced a*rsqrt(c/b) equivalent.
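The multiplication that this fold saves can be seen in a small sketch (hypothetical helper names; rsqrtss via `_mm_rsqrt_ss` on SSE hosts, an exact stand-in otherwise). Computing sqrt(b/c) as x * rsqrt(x) with x = b/c needs a final multiplication by x. Rewriting it as 1/sqrt(c/b) instead maps directly onto rsqrt(c/b).

```c
#include <math.h>

#if defined(__SSE__)
#include <xmmintrin.h>
/* rsqrtss: hardware estimate of 1/sqrt(x), about 12 bits. */
static float rsqrt_est(float x)
{
  return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}
#else
/* Non-SSE fallback: exact value as a stand-in for the estimate. */
static float rsqrt_est(float x)
{
  return 1.0f / sqrtf(x);
}
#endif

/* NR-enhanced rsqrt: 0.5 * y * (3 - x*y*y), y = rsqrtss(x). */
static float nr_rsqrt(float x)
{
  float y = rsqrt_est(x);
  return 0.5f * y * (3.0f - x * y * y);
}

/* sqrt(b/c) as x * rsqrt(x): needs an extra multiply by x at the end. */
float sqrt_ratio_via_mul(float b, float c)
{
  float x = b / c;
  return x * nr_rsqrt(x);
}

/* sqrt(b/c) folded to 1/sqrt(c/b): maps directly onto rsqrt, no final multiply. */
float sqrt_ratio_via_rsqrt(float b, float c)
{
  return nr_rsqrt(c / b);
}
```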


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: rguenther at suse dot de @ 2007-06-10 21:39 UTC
  To: gcc-bugs



------- Comment #19 from rguenther at suse dot de  2007-06-10 21:39 -------
Subject: Re:  Use reciprocal and reciprocal square root
 with -ffast-math

On Sun, 10 Jun 2007, ubizjak at gmail dot com wrote:

> 
> 
> ------- Comment #18 from ubizjak at gmail dot com  2007-06-10 17:34 -------
> (In reply to comment #14)
> > The interesting difference between sqrtss, divss and rcpss, rsqrtss is that
> > the former have throughput of 1/16 while the latter are 1/1 (latencies compare
> > 21 vs. 3).  This is on K10.  The optimization guide only mentions calculating
> > the reciprocal y = a/b via rcpss and the square root (!) via rsqrtss
> > (sqrt a = 0.5 * a * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a)))
> > 
> > So the optimization would be mainly to improve instruction throughput, not
> > overall latency.
> 
> If this is the case, then the middle-end will need to fold sqrtss in a different
> way for targets that prefer rsqrtss. According to Comment #16, it is better to
> fold to 1.0/sqrt(c/b) instead of sqrt(b/c) because this way we save one
> multiplication during NR expansion by rsqrt [due to sqrt(x) <=> x * (1.0 /
> sqrt(x))].
> 
> IMO we need a new tree code to handle reciprocal sqrt (RSQRT_EXPR), together
> with proper folding functionality that expands directly to the (NR-enhanced)
> rsqrt optab. If we consider a*sqrt(b/c), then b/c will be expanded as
> b * NR-rcp(c) [where NR-rcp stands for NR-enhanced rcp] and sqrt will be
> expanded as NR-rsqrt. In this case, I see no RTL pass that would be able to
> combine everything together in order to swap the (b/c) operands to produce the
> NR-enhanced a*rsqrt(c/b) equivalent.

We just need a new builtin function, __builtin_rsqrt, and at some stage
replace reciprocals of sqrt with the new builtin, for example in
tree-ssa-math-opts.c, which does the existing reciprocal transforms.
A target hook could be provided that would, for example, look
like

   tree target_fn_for_expr (tree expr);

and return a target builtin decl for the given expression.

And we should start splitting this PR ;)  One for a/sqrt(b/c) and one
for the above transformation.

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: rguenth at gcc dot gnu dot org @ 2007-06-10 21:47 UTC
  To: gcc-bugs



------- Comment #20 from rguenth at gcc dot gnu dot org  2007-06-10 21:46 -------
PR32279 for 1/sqrt(x/y) to sqrt(y/x)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: rguenth at gcc dot gnu dot org @ 2007-06-10 21:48 UTC
  To: gcc-bugs



------- Comment #21 from rguenth at gcc dot gnu dot org  2007-06-10 21:48 -------
The other issue is really about this bug, so not splitting.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: tbptbp at gmail dot com @ 2007-06-11  3:32 UTC
  To: gcc-bugs



------- Comment #22 from tbptbp at gmail dot com  2007-06-11 03:32 -------
I'm a bit late to the debate but...

At some point icc did such transformations (for 1/x and sqrt), but apparently
they have since been removed. It didn't bother to plug every hole (i.e. wrt
infinities), but it at least got the case of 0 covered even when set loose; it's
cheap to do. I've repeatedly been pointed to the peculiar semantics of
-ffast-math in the past, so I know there's little chance for me to succeed, but
would it be possible to consider that as an option?

PS: Yes, I do rely on infinities and -ffast-math, and deserve to die a slow and
painful death.


-- 

tbptbp at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tbptbp at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: ubizjak at gmail dot com @ 2007-06-11  5:51 UTC
  To: gcc-bugs



------- Comment #23 from ubizjak at gmail dot com  2007-06-11 05:51 -------
(In reply to comment #22)

> At some point icc did such transformations (for 1/x and sqrt), but apparently
> they have since been removed. It didn't bother to plug every hole (i.e. wrt
> infinities), but it at least got the case of 0 covered even when set loose; it's
> cheap to do. I've repeatedly been pointed to the peculiar semantics of
> -ffast-math in the past, so I know there's little chance for me to succeed, but
> would it be possible to consider that as an option?

But both rcpss and rsqrtss handle infinities correctly (they return zero) and
return [-]inf when [-]0.0 is used as an argument.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: tbptbp at gmail dot com @ 2007-06-11  5:58 UTC
  To: gcc-bugs



------- Comment #24 from tbptbp at gmail dot com  2007-06-11 05:58 -------
Yes, but there's some fuss at 0 when you pile an NR round on top.
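The "fuss at 0" can be reproduced in a short sketch. It assumes SSE via `_mm_rcp_ss`, uses an exact stand-in otherwise, and its helper names are hypothetical. The raw estimate behaves as described in Comment #23, but one NR round r * (2 - b*r) evaluates 0 * inf for b = +/-0 and inf * 0 for b = +/-inf, giving NaN. The guard shown is one cheap, hypothetical way to keep the raw estimate in those cases; it is not taken from icc or from the committed patch.

```c
#include <math.h>

#if defined(__SSE__)
#include <xmmintrin.h>
/* rcpss: +-0 -> +-inf, +-inf -> +-0. */
static float rcp_est(float x)
{
  return _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(x)));
}
#else
/* Non-SSE fallback: exact 1/x has the same behaviour at 0 and inf. */
static float rcp_est(float x)
{
  return 1.0f / x;
}
#endif

/* One NR round r * (2 - b*r). For b = +-0, b*r is 0 * inf = NaN;
   for b = +-inf, b*r is inf * 0 = NaN. */
float nr_recip(float b)
{
  float r = rcp_est(b);
  return r * (2.0f - b * r);
}

/* Hypothetical guard: an estimate of 0 or +-inf is already the exact
   reciprocal, so return it without the NR round. */
float guarded_recip(float b)
{
  float r = rcp_est(b);
  if (r == 0.0f || isinf(r))
    return r;
  return r * (2.0f - b * r);
}
```

The same 0 * inf product appears in the sqrt(x) = x * rsqrt(x) expansion at x = 0.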


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: ubizjak at gmail dot com @ 2007-06-13 20:21 UTC
  To: gcc-bugs



------- Comment #25 from ubizjak at gmail dot com  2007-06-13 20:20 -------
RFC patch at http://gcc.gnu.org/ml/gcc-patches/2007-06/msg00916.html


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: ubizjak at gmail dot com @ 2007-06-14  9:18 UTC
  To: gcc-bugs



------- Comment #26 from ubizjak at gmail dot com  2007-06-14 09:18 -------
Patch at http://gcc.gnu.org/ml/gcc-patches/2007-06/msg00944.html


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|ubizjak at gmail dot com    |
         AssignedTo|unassigned at gcc dot gnu   |ubizjak at gmail dot com
                   |dot org                     |
                URL|                            |http://gcc.gnu.org/ml/gcc-
                   |                            |patches/2007-
                   |                            |06/msg00944.html
             Status|NEW                         |ASSIGNED
           Keywords|                            |patch
   Last reconfirmed|2007-04-27 10:45:36         |2007-06-14 09:18:11
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: burnus at gcc dot gnu dot org @ 2007-06-15 13:23 UTC
  To: gcc-bugs



------- Comment #27 from burnus at gcc dot gnu dot org  2007-06-15 13:23 -------
Cross-pointer: see also PR 32352 (Polyhedron aermod.f90 crashes due to
out-of-bounds problems caused by numerical differences when using rsqrt/-mrecip).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: uros at gcc dot gnu dot org @ 2007-06-16  9:53 UTC
  To: gcc-bugs



------- Comment #28 from uros at gcc dot gnu dot org  2007-06-16 09:53 -------
Subject: Bug 31723

Author: uros
Date: Sat Jun 16 09:52:48 2007
New Revision: 125756

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=125756
Log:
    PR middle-end/31723
    * hooks.c (hook_tree_tree_bool_null): New hook.
    * hooks.h (hook_tree_tree_bool_null): Add prototype.
    * tree-pass.h (pass_convert_to_rsqrt): Declare.
    * passes.c (init_optimization_passes): Add pass_convert_to_rsqrt.
    * tree-ssa-math-opts.c (execute_cse_reciprocals): Scan for a/func(b)
    and convert it to reciprocal a*rfunc(b).
    (execute_convert_to_rsqrt): New function.
    (gate_convert_to_rsqrt): New function.
    (pass_convert_to_rsqrt): New pass definition.
    * target.h (struct gcc_target): Add builtin_reciprocal.
    * target-def.h (TARGET_BUILTIN_RECIPROCAL): New define.
    (TARGET_INITIALIZER): Initialize builtin_reciprocal with
    TARGET_BUILTIN_RECIPROCAL.
    * doc/tm.texi (TARGET_BUILTIN_RECIPROCAL): Document.

    * config/i386/i386.h (TARGET_RECIP): New define.
    * config/i386/i386.md (divsf3): Expand by calling ix86_emit_swdivsf
    for TARGET_SSE_MATH and TARGET_RECIP when
    flag_unsafe_math_optimizations is set and not optimizing for size.
    (*rcpsf2_sse): New insn pattern.
    (*rsqrtsf2_sse): Ditto.
    (rsqrtsf2): New expander.  Expand by calling ix86_emit_swsqrtsf
    for TARGET_SSE_MATH and TARGET_RECIP when
    flag_unsafe_math_optimizations is set and not optimizing for size.
    (sqrt<mode>2): Expand SFmode operands by calling ix86_emit_swsqrtsf
    for TARGET_SSE_MATH and TARGET_RECIP when
    flag_unsafe_math_optimizations is set and not optimizing for size.
    * config/i386/sse.md (divv4sf): Expand by calling ix86_emit_swdivsf
    for TARGET_SSE_MATH and TARGET_RECIP when
    flag_unsafe_math_optimizations is set and not optimizing for size.
    (*sse_rsqrtv4sf2): Do not export.
    (sqrtv4sf2): Ditto.
    (sse_rsqrtv4sf2): New expander.  Expand by calling ix86_emit_swsqrtsf
    for TARGET_SSE_MATH and TARGET_RECIP when
    flag_unsafe_math_optimizations is set and not optimizing for size.
    (sqrtv4sf2): Ditto.
    * config/i386/i386.opt (mrecip): New option.
    * config/i386/i386-protos.h (ix86_emit_swdivsf): Declare.
    (ix86_emit_swsqrtsf): Ditto.
    * config/i386/i386.c (IX86_BUILTIN_RSQRTF): New constant.
    (ix86_init_mmx_sse_builtins): __builtin_ia32_rsqrtf: New
    builtin definition.
    (ix86_expand_builtin): Expand IX86_BUILTIN_RSQRTF using
    ix86_expand_unop1_builtin.
    (ix86_emit_swdivsf): New function.
    (ix86_emit_swsqrtsf): Ditto.
    (ix86_builtin_reciprocal): New function.
    (TARGET_BUILTIN_RECIPROCAL): Use it.
    (ix86_vectorize_builtin_conversion): Rename from
    ix86_builtin_conversion.
    (TARGET_VECTORIZE_BUILTIN_CONVERSION): Use renamed function.
    * doc/invoke.texi (Machine Dependent Options): Add -mrecip to
    "i386 and x86_64 Options" section.
    (Intel 386 and AMD x86_64 Options): Document -mrecip.

testsuite/ChangeLog:

    PR middle-end/31723
    * gcc.target/i386/recip-divf.c: New test.
    * gcc.target/i386/recip-sqrtf.c: Ditto.
    * gcc.target/i386/recip-vec-divf.c: Ditto.
    * gcc.target/i386/recip-vec-sqrtf.c: Ditto.
    * gcc.target/i386/sse-recip.c: Ditto.


Added:
    trunk/gcc/testsuite/gcc.target/i386/recip-divf.c
    trunk/gcc/testsuite/gcc.target/i386/recip-sqrtf.c
    trunk/gcc/testsuite/gcc.target/i386/recip-vec-divf.c
    trunk/gcc/testsuite/gcc.target/i386/recip-vec-sqrtf.c
    trunk/gcc/testsuite/gcc.target/i386/sse-recip.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386-protos.h
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.h
    trunk/gcc/config/i386/i386.md
    trunk/gcc/config/i386/i386.opt
    trunk/gcc/config/i386/sse.md
    trunk/gcc/doc/invoke.texi
    trunk/gcc/doc/tm.texi
    trunk/gcc/hooks.c
    trunk/gcc/hooks.h
    trunk/gcc/passes.c
    trunk/gcc/target-def.h
    trunk/gcc/target.h
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-pass.h
    trunk/gcc/tree-ssa-math-opts.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723


* [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
From: ubizjak at gmail dot com @ 2007-06-18  8:56 UTC
  To: gcc-bugs



------- Comment #29 from ubizjak at gmail dot com  2007-06-18 08:56 -------
Patch was committed to SVN, so closing as fixed.


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723


Thread overview: 30+ messages
2007-04-27  8:07 [Bug middle-end/31723] New: Use reciprocal and reciprocal square root with -ffast-math jb at gcc dot gnu dot org
2007-04-27  9:16 ` [Bug middle-end/31723] " burnus at gcc dot gnu dot org
2007-04-27  9:45 ` rguenth at gcc dot gnu dot org
2007-04-27 10:27 ` jb at gcc dot gnu dot org
2007-04-27 10:29 ` jb at gcc dot gnu dot org
2007-04-27 11:01 ` jb at gcc dot gnu dot org
2007-04-27 11:09 ` rguenth at gcc dot gnu dot org
2007-04-27 11:41 ` burnus at gcc dot gnu dot org
2007-04-27 20:43 ` steven at gcc dot gnu dot org
2007-04-27 21:03 ` rguenth at gcc dot gnu dot org
2007-04-27 23:25 ` pinskia at gcc dot gnu dot org
2007-06-10  8:28 ` ubizjak at gmail dot com
2007-06-10 10:47 ` ubizjak at gmail dot com
2007-06-10 11:06 ` jb at gcc dot gnu dot org
2007-06-10 12:07 ` rguenth at gcc dot gnu dot org
2007-06-10 12:09 ` rguenth at gcc dot gnu dot org
2007-06-10 16:25 ` ubizjak at gmail dot com
2007-06-10 16:49 ` ubizjak at gmail dot com
2007-06-10 17:34 ` ubizjak at gmail dot com
2007-06-10 21:39 ` rguenther at suse dot de
2007-06-10 21:47 ` rguenth at gcc dot gnu dot org
2007-06-10 21:48 ` rguenth at gcc dot gnu dot org
2007-06-11  3:32 ` tbptbp at gmail dot com
2007-06-11  5:51 ` ubizjak at gmail dot com
2007-06-11  5:58 ` tbptbp at gmail dot com
2007-06-13 20:21 ` ubizjak at gmail dot com
2007-06-14  9:18 ` ubizjak at gmail dot com
2007-06-15 13:23 ` burnus at gcc dot gnu dot org
2007-06-16  9:53 ` uros at gcc dot gnu dot org
2007-06-18  8:56 ` ubizjak at gmail dot com
