* [Bug middle-end/31723]  New: Use reciprocal and reciprocal square root with -ffast-math
From: jb at gcc dot gnu dot org @ 2007-04-27  8:07 UTC (permalink / raw)
  To: gcc-bugs

I did some analysis of why gfortran does badly at the gas_dyn benchmark of the
Polyhedron benchmark suite. See my analysis at

http://gcc.gnu.org/ml/fortran/2007-04/msg00494.html

In short, GCC should use reciprocal and reciprocal square root instructions
(available in single precision for SSE and Altivec) when possible. These
instructions are very fast: a few cycles, versus dozens or hundreds of cycles
for normal division and square root instructions. However, as these
instructions are accurate only to 12 bits, they should be enabled only with
-ffast-math (or some separate option that gets included with -ffast-math).
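
To make the accuracy trade-off concrete, here is a minimal sketch that reaches
the SSE instructions through the intrinsics in <xmmintrin.h>. The helper names
approx_recip, approx_rsqrt and rel_err are made up for this illustration and
are not part of GCC or of the original test case; the sketch assumes an x86
target with SSE enabled.

```c
#include <math.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rcp_ss, _mm_rsqrt_ss */

/* Hypothetical helper: approximate 1/a with the rcpss instruction. */
float approx_recip(float a)
{
    return _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(a)));
}

/* Hypothetical helper: approximate 1/sqrt(a) with the rsqrtss instruction. */
float approx_rsqrt(float a)
{
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(a)));
}

/* Relative error of an approximation against a reference value. */
float rel_err(float approx, float exact)
{
    return fabsf(approx - exact) / fabsf(exact);
}
```

Comparing rel_err(approx_recip(a), 1.0f/a) against the result of a plain
division should show an error on the order of 2^-12, which is why the
substitution is only acceptable under -ffast-math or a similar opt-in.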

The following C program demonstrates the issue. For every function in it, it
should be possible to use reciprocal and/or reciprocal square root
instructions instead of normal div and sqrt:

#include <math.h>

float recip1 (float a)
{
  return 1.0f/a;
}

float recip2 (float a, float b)
{
  return a/b;
}

float rsqrt1 (float a)
{
  return 1.0f/sqrtf(a);
}

float rsqrt2 (float a, float b)
{
  /* Mathematically equivalent to 1/sqrt(b*(1/a))  */
  return sqrtf(a/b);
}

asm output (compiled with -std=c99 -O3 -c -Wall -pedantic -march=k8
-mfpmath=sse -ffast-math -S):

        .file   "recip.c"
        .text
        .p2align 4,,15
.globl recip1
        .type   recip1, @function
recip1:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $4, %esp
        movss   .LC0, %xmm0
        divss   8(%ebp), %xmm0
        movss   %xmm0, -4(%ebp)
        flds    -4(%ebp)
        leave
        ret
        .size   recip1, .-recip1
        .p2align 4,,15
.globl recip2
        .type   recip2, @function
recip2:
        pushl   %ebp
        movl    %esp, %ebp
        movss   8(%ebp), %xmm0
        divss   12(%ebp), %xmm0
        movss   %xmm0, 8(%ebp)
        flds    8(%ebp)
        leave
        ret
        .size   recip2, .-recip2
        .p2align 4,,15
.globl rsqrt2
        .type   rsqrt2, @function
rsqrt2:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $4, %esp
        movss   8(%ebp), %xmm0
        divss   12(%ebp), %xmm0
        sqrtss  %xmm0, %xmm0
        movss   %xmm0, -4(%ebp)
        flds    -4(%ebp)
        leave
        ret
        .size   rsqrt2, .-rsqrt2
        .p2align 4,,15
.globl rsqrt1
        .type   rsqrt1, @function
rsqrt1:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $4, %esp
        movss   .LC0, %xmm0
        sqrtss  8(%ebp), %xmm1
        divss   %xmm1, %xmm0
        movss   %xmm0, -4(%ebp)
        flds    -4(%ebp)
        leave
        ret
        .size   rsqrt1, .-rsqrt1
        .section        .rodata.cst4,"aM",@progbits,4
        .align 4
.LC0:
        .long   1065353216
        .ident  "GCC: (GNU) 4.3.0 20070426 (experimental)"
        .section        .note.GNU-stack,"",@progbits


As can be seen, GCC emits divss and sqrtss instead of rcpss and rsqrtss. Of
course, there are vectorized versions of these instructions too, rcpps and
rsqrtps, that should be used when appropriate (vectorization is important e.g.
for gas_dyn).
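
For comparison, here is a rough hand-written sketch of what the four test
functions could compute if the approximate instructions were used, plus one
packed loop using rsqrtps. It is only an illustration of the requested
transformation, not what any GCC version generates. All names ending in
_fast, and the two *_ss_approx helpers, are invented for this sketch, and it
assumes an x86 target with SSE enabled.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: rcpss, approximately 1/x. */
static inline float rcp_ss_approx(float x)
{
    return _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(x)));
}

/* Hypothetical helper: rsqrtss, approximately 1/sqrt(x). */
static inline float rsqrt_ss_approx(float x)
{
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

/* 1.0f/a becomes a single reciprocal estimate. */
float recip1_fast(float a)
{
    return rcp_ss_approx(a);
}

/* a/b becomes a multiplication by the reciprocal estimate of b. */
float recip2_fast(float a, float b)
{
    return a * rcp_ss_approx(b);
}

/* 1.0f/sqrtf(a) becomes a single reciprocal square root estimate. */
float rsqrt1_fast(float a)
{
    return rsqrt_ss_approx(a);
}

/* sqrtf(a/b) rewritten as 1/sqrt(b*(1/a)), as in the comment in rsqrt2. */
float rsqrt2_fast(float a, float b)
{
    return rsqrt_ss_approx(b * rcp_ss_approx(a));
}

/* Packed variant: out[i] is approximately 1/sqrt(in[i]).
   Four elements per iteration with rsqrtps, then a scalar tail. */
void rsqrt_array_fast(const float *in, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(out + i, _mm_rsqrt_ps(_mm_loadu_ps(in + i)));
    for (; i < n; ++i)
        out[i] = rsqrt_ss_approx(in[i]);
}
```

Note that rsqrt2_fast chains two estimates, so their errors compound; results
will differ from the divss/sqrtss versions in the low-order bits.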


-- 
           Summary: Use reciprocal and reciprocal square root with -ffast-math
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jb at gcc dot gnu dot org
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723



Thread overview: 30+ messages
2007-04-27  8:07 [Bug middle-end/31723] New: Use reciprocal and reciprocal square root with -ffast-math jb at gcc dot gnu dot org
2007-04-27  9:16 ` [Bug middle-end/31723] " burnus at gcc dot gnu dot org
2007-04-27  9:45 ` rguenth at gcc dot gnu dot org
2007-04-27 10:27 ` jb at gcc dot gnu dot org
2007-04-27 10:29 ` jb at gcc dot gnu dot org
2007-04-27 11:01 ` jb at gcc dot gnu dot org
2007-04-27 11:09 ` rguenth at gcc dot gnu dot org
2007-04-27 11:41 ` burnus at gcc dot gnu dot org
2007-04-27 20:43 ` steven at gcc dot gnu dot org
2007-04-27 21:03 ` rguenth at gcc dot gnu dot org
2007-04-27 23:25 ` pinskia at gcc dot gnu dot org
2007-06-10  8:28 ` ubizjak at gmail dot com
2007-06-10 10:47 ` ubizjak at gmail dot com
2007-06-10 11:06 ` jb at gcc dot gnu dot org
2007-06-10 12:07 ` rguenth at gcc dot gnu dot org
2007-06-10 12:09 ` rguenth at gcc dot gnu dot org
2007-06-10 16:25 ` ubizjak at gmail dot com
2007-06-10 16:49 ` ubizjak at gmail dot com
2007-06-10 17:34 ` ubizjak at gmail dot com
2007-06-10 21:39 ` rguenther at suse dot de
2007-06-10 21:47 ` rguenth at gcc dot gnu dot org
2007-06-10 21:48 ` rguenth at gcc dot gnu dot org
2007-06-11  3:32 ` tbptbp at gmail dot com
2007-06-11  5:51 ` ubizjak at gmail dot com
2007-06-11  5:58 ` tbptbp at gmail dot com
2007-06-13 20:21 ` ubizjak at gmail dot com
2007-06-14  9:18 ` ubizjak at gmail dot com
2007-06-15 13:23 ` burnus at gcc dot gnu dot org
2007-06-16  9:53 ` uros at gcc dot gnu dot org
2007-06-18  8:56 ` ubizjak at gmail dot com
