* [Bug tree-optimization/55760] New: scalar code non using rsqrtss and rcpss
@ 2012-12-20 15:49 vincenzo.innocente at cern dot ch
  2012-12-20 15:52 ` [Bug tree-optimization/55760] " rguenth at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2012-12-20 15:49 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760

             Bug #: 55760
           Summary: scalar code non using rsqrtss and rcpss
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: vincenzo.innocente@cern.ch


Is there any reason why rsqrtss and rcpss are not used for scalar code, while
rsqrtps and rcpps are used for loops?

cat scalar.cc
#include<cmath>
void scalar(float& a, float& b) {
  a = std::sqrt(a);
  b = 1.f/b;
}

float v[1024];
float w[1024];

void vector() {
  for(int i=0;i!=1024;++i) {
    v[i] = std::sqrt(v[i]);
    w[i] = 1.f/w[i];
  }
}

c++ -std=c++11 -Ofast -march=corei7 -S scalar.cc -ftree-vectorizer-verbose=1 
-ftree-loop-if-convert-stores; cat scalar.s | c++filt


scalar(float&, float&):
LFB221:
    sqrtss    (%rdi), %xmm0
    movss    %xmm0, (%rdi)
    movss    LC0(%rip), %xmm0
    divss    (%rsi), %xmm0
    movss    %xmm0, (%rsi)
    ret
LFE221:
    .align 4,0x90
    .globl vector()
vector():
LFB222:
    movaps    LC1(%rip), %xmm5
    leaq    v(%rip), %rax
    xorps    %xmm3, %xmm3
    movaps    LC2(%rip), %xmm4
    leaq    w(%rip), %rdx
    leaq    4096+v(%rip), %rcx
    .align 4,0x90
L4:
    movaps    (%rax), %xmm1
    movaps    %xmm3, %xmm2
    addq    $16, %rax
    addq    $16, %rdx
    rsqrtps    %xmm1, %xmm0
    cmpneqps    %xmm1, %xmm2
    andps    %xmm2, %xmm0
    mulps    %xmm0, %xmm1
    mulps    %xmm1, %xmm0
    mulps    %xmm4, %xmm1
    addps    %xmm5, %xmm0
    mulps    %xmm1, %xmm0
    movaps    %xmm0, -16(%rax)
    movaps    -16(%rdx), %xmm1
    rcpps    %xmm1, %xmm0
    mulps    %xmm0, %xmm1
    mulps    %xmm0, %xmm1
    addps    %xmm0, %xmm0
    subps    %xmm1, %xmm0
    movaps    %xmm0, -16(%rdx)
    cmpq    %rcx, %rax
    jne    L4
    rep; ret
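
For readers following the vector() loop above: the instruction runs after rsqrtps and rcpps each look like one Newton-Raphson refinement of a hardware estimate. The portable C++ sketch below mirrors that arithmetic. The names coarse, refine_rcp, refine_sqrt and rel_err are hypothetical; coarse only imitates an estimate by injecting an assumed relative error of 2^-12, and the constants -3.0f and -0.5f are assumed values for LC1 and LC2, which the listing does not show.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for rcpps/rsqrtps: the exact value with an assumed
// relative error of 2^-12 injected. This is not real hardware output.
inline float coarse(float exact) {
    return exact * (1.0f - 1.0f / 4096.0f);
}

// Reciprocal refinement in the order of the rcpps sequence
// (mulps, mulps, addps, subps): x1 = (x0 + x0) - (a * x0) * x0.
inline float refine_rcp(float a, float x0) {
    assert(a != 0.0f);
    return (x0 + x0) - (a * x0) * x0;
}

// Square root from an estimate r0 ~ 1/sqrt(a), in the order of the rsqrtps
// sequence: sqrt(a) ~ (-0.5 * a*r0) * ((a*r0)*r0 - 3).
// The listing also masks r0 to zero when a == 0; that case is left out here.
inline float refine_sqrt(float a, float r0) {
    assert(a > 0.0f);
    float e = a * r0;                        // roughly sqrt(a)
    return (-0.5f * e) * (e * r0 + -3.0f);   // assumed LC2 and LC1
}

// Relative error of a float approximation against a double reference.
inline double rel_err(float approx, double exact) {
    return std::fabs((static_cast<double>(approx) - exact) / exact);
}
```

Under these assumptions a single step takes the relative error from about 2^-12 to roughly 2^-24, which is close to, but not always identical to, the correctly rounded result of divss or sqrtss.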



* [Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss
From: rguenth at gcc dot gnu.org @ 2012-12-20 15:52 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> 2012-12-20 15:52:31 UTC ---
Use -mrecip.  It's otherwise not safe for SPEC CPU 2006 which is why it is not
enabled by default for -ffast-math.



* [Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss
From: vincenzo.innocente at cern dot ch @ 2012-12-20 15:55 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760

--- Comment #2 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2012-12-20 15:55:03 UTC ---
Thanks.
By "not safe", do you mean that it produces incorrect results?
Is that documented?



* [Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss
From: rguenth at gcc dot gnu.org @ 2012-12-20 15:59 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> 2012-12-20 15:58:55 UTC ---
(In reply to comment #2)
> Thanks.
> not safe meaning producing incorrect results?

Yes.

> Is it documented?

See the documentation for -mrecip:

...

Note that while the throughput of the sequence is higher than the throughput
of the non-reciprocal instruction, the precision of the sequence can be
decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).

...



* [Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss
From: dominiq at lps dot ens.fr @ 2012-12-20 16:07 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760

--- Comment #4 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-12-20 16:07:11 UTC ---
> is there any reason why rsqrtss and rcpss are not used for scalar code while
> rsqrtps and rcpps are used for loops?

Yep! I don't have the patience to dig through the Bugzilla archive right now,
but the main reason is a loss of accuracy (notably 1/2.0 != 0.5) that causes
problems in some codes (see gas_dyn.f90 in the Polyhedron tests). You can
pass options to force the use of rsqrtss and rcpss for scalars:

-mrecip
This option enables use of RCPSS and RSQRTSS instructions (and their vectorized
variants RCPPS and RSQRTPS) with an additional Newton-Raphson step to increase
precision instead of DIVSS and SQRTSS (and their vectorized variants) for
single-precision floating-point arguments. These instructions are generated
only when -funsafe-math-optimizations is enabled together with
-finite-math-only and -fno-trapping-math. Note that while the throughput of the
sequence is higher than the throughput of the non-reciprocal instruction, the
precision of the sequence can be decreased by up to 2 ulp (i.e. the inverse of
1.0 equals 0.99999994).
Note that GCC implements 1.0f/sqrtf(x) in terms of RSQRTSS (or RSQRTPS) already
with -ffast-math (or the above option combination), and doesn't need -mrecip.

Also note that GCC emits the above sequence with additional Newton-Raphson step
for vectorized single-float division and vectorized sqrtf(x) already with
-ffast-math (or the above option combination), and doesn't need -mrecip. 

-mrecip=opt
This option controls which reciprocal estimate instructions may be used. opt is
a comma-separated list of options, which may be preceded by a `!' to invert the
option:

`all'
    Enable all estimate instructions.
`default'
    Enable the default instructions, equivalent to -mrecip.
`none'
    Disable all estimate instructions, equivalent to -mno-recip.
`div'
    Enable the approximation for scalar division.
`vec-div'
    Enable the approximation for vectorized division.
`sqrt'
    Enable the approximation for scalar square root.
`vec-sqrt'
    Enable the approximation for vectorized square root.

So, for example, -mrecip=all,!sqrt enables all of the reciprocal
approximations, except for square root.
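
To make the accuracy loss concrete (1/2.0 != 0.5, and the inverse of 1.0 coming out as 0.99999994), here is a minimal portable sketch. It does not call rcpss; assumed_estimate is a hypothetical stand-in that scales the exact reciprocal by (1 - 2^-12), an assumed error chosen for illustration, and approx_inverse applies one Newton-Raphson step to it.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical estimate: the exact reciprocal scaled by (1 - 2^-12).
// The error is an assumption for illustration, not measured hardware output.
inline float assumed_estimate(float a) {
    assert(a != 0.0f);
    return (1.0f / a) * (1.0f - 1.0f / 4096.0f);
}

// One Newton-Raphson step on a reciprocal estimate x0 ~ 1/a.
inline float refine_rcp(float a, float x0) {
    return (x0 + x0) - (a * x0) * x0;
}

// Estimate followed by one refinement step, as the -mrecip text describes.
inline float approx_inverse(float a) {
    return refine_rcp(a, assumed_estimate(a));
}
```

With this assumed starting error, approx_inverse(1.0f) is the largest float below 1.0 (about 0.99999994) and approx_inverse(2.0f) is the largest float below 0.5, so comparisons such as 1/2.0 == 0.5 fail even though the error is a single ulp.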



* [Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss
From: vincenzo.innocente at cern dot ch @ 2013-01-08 15:29 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760

--- Comment #5 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2013-01-08 15:29:18 UTC ---
We just got "hit" by this great type of code (copysign is unknown to
scientists).

Most probably GCC could optimize it under -Ofast to return copysignf(1.f,x),
just as x/x is already optimized to 1.


cat one.cc;c++ -Ofast -mrecip -S one.cc; cat one.s
#include<cmath>
int one(float x) {
  return x/std::abs(x);
}

    .text
    .align 4,0x90
    .globl __Z3onef
__Z3onef:
LFB86:
    movss    LC0(%rip), %xmm2
    andps    %xmm0, %xmm2
    rcpss    %xmm2, %xmm1
    mulss    %xmm1, %xmm2
    mulss    %xmm1, %xmm2
    addss    %xmm1, %xmm1
    subss    %xmm2, %xmm1
    mulss    %xmm0, %xmm1
    cvttss2si    %xmm1, %eax
    ret
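
A hedged illustration of why this pattern bites when the result is converted to int: if the refined reciprocal of |x| lands one ulp below the exact value, x/|x| is slightly less than 1 in magnitude and the truncating conversion (cvttss2si in the listing) yields 0. The sketch below simulates that with an assumed estimate error of 2^-12 rather than the real rcpss instruction; one_with_estimate and one_copysign are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Simulated -mrecip division: x * refined_reciprocal(|x|), then truncation
// toward zero. The starting error of 2^-12 is an assumption.
inline int one_with_estimate(float x) {
    assert(x != 0.0f);
    float d = std::fabs(x);
    float x0 = (1.0f / d) * (1.0f - 1.0f / 4096.0f);  // assumed estimate
    float r = (x0 + x0) - (d * x0) * x0;              // one Newton-Raphson step
    return static_cast<int>(x * r);
}

// The copysign form suggested above: no division, so nothing to approximate.
inline int one_copysign(float x) {
    return static_cast<int>(std::copysign(1.0f, x));
}
```

Under that assumption one_with_estimate(1.0f) returns 0 while one_copysign(1.0f) returns 1. The two forms also differ for x == 0 and NaN, where x/|x| is NaN but copysign still returns +1 or -1; this is only acceptable if such inputs are assumed not to occur, as -Ofast does through -ffinite-math-only.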



* [Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss
From: glisse at gcc dot gnu.org @ 2013-01-08 23:55 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760

Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |glisse at gcc dot gnu.org

--- Comment #6 from Marc Glisse <glisse at gcc dot gnu.org> 2013-01-08 23:55:18 UTC ---
(In reply to comment #5)
> we just got "hit" by this great type of code (copysign is unknown to
> scientists)
> 
> most probably gcc could optimize it for -Ofast to return copysignf(1.f,x); (x/x
> is optimized in 1)
> 
> 
> cat one.cc;c++ -Ofast -mrecip -S one.cc; cat one.s
> #include<cmath>
> int one(float x) {
>   return x/std::abs(x);
> }

That looks like a completely different issue from this PR; I think you should
open a separate PR if you don't want it to get lost. It seems easy to add a
few lines to fold_binary_loc for it (not the best place, but that's where the
others are), near the code that optimizes A / A to 1.0. You could try writing
the patch; I don't foresee any trap.



* [Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss
From: pinskia at gcc dot gnu.org @ 2021-08-07 22:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47989
           Keywords|                            |documentation

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
See PR 47989 for the reason why this option is not enabled for scalar code and
why it was only enabled for vectorized code.

