public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/55760] New: scalar code non using rsqrtss and rcpss
@ 2012-12-20 15:49 vincenzo.innocente at cern dot ch
2012-12-20 15:52 ` [Bug tree-optimization/55760] " rguenth at gcc dot gnu.org
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2012-12-20 15:49 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760
Bug #: 55760
Summary: scalar code non using rsqrtss and rcpss
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: vincenzo.innocente@cern.ch
Is there any reason why rsqrtss and rcpss are not used for scalar code, while
rsqrtps and rcpps are used for loops?
cat scalar.cc
#include <cmath>

void scalar(float& a, float& b) {
  a = std::sqrt(a);
  b = 1.f/b;
}

float v[1024];
float w[1024];

void vector() {
  for (int i = 0; i != 1024; ++i) {
    v[i] = std::sqrt(v[i]);
    w[i] = 1.f/w[i];
  }
}
c++ -std=c++11 -Ofast -march=corei7 -S scalar.cc -ftree-vectorizer-verbose=1
-ftree-loop-if-convert-stores; cat scalar.s | c++filt
scalar(float&, float&):
LFB221:
        sqrtss (%rdi), %xmm0
        movss %xmm0, (%rdi)
        movss LC0(%rip), %xmm0
        divss (%rsi), %xmm0
        movss %xmm0, (%rsi)
        ret
LFE221:
        .align 4,0x90
        .globl vector()
vector():
LFB222:
        movaps LC1(%rip), %xmm5
        leaq void(%rip), %rax
        xorps %xmm3, %xmm3
        movaps LC2(%rip), %xmm4
        leaq wchar_t(%rip), %rdx
        leaq 4096+void(%rip), %rcx
        .align 4,0x90
L4:
        movaps (%rax), %xmm1
        movaps %xmm3, %xmm2
        addq $16, %rax
        addq $16, %rdx
        rsqrtps %xmm1, %xmm0
        cmpneqps %xmm1, %xmm2
        andps %xmm2, %xmm0
        mulps %xmm0, %xmm1
        mulps %xmm1, %xmm0
        mulps %xmm4, %xmm1
        addps %xmm5, %xmm0
        mulps %xmm1, %xmm0
        movaps %xmm0, -16(%rax)
        movaps -16(%rdx), %xmm1
        rcpps %xmm1, %xmm0
        mulps %xmm0, %xmm1
        mulps %xmm0, %xmm1
        addps %xmm0, %xmm0
        subps %xmm1, %xmm0
        movaps %xmm0, -16(%rdx)
        cmpq %rcx, %rax
        jne L4
        rep; ret
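
For readers following the vector() listing: the instructions after rsqrtps and rcpps look like one Newton-Raphson refinement step each. A minimal C++ sketch of that arithmetic follows. The names refine_recip and refine_sqrt are made up for illustration, the initial estimates are passed in by the caller instead of coming from the hardware, and the constants 0.5 and 3 are an assumption, since the values behind LC1 and LC2 are not shown in the listing.

```cpp
#include <cmath>

// One Newton-Raphson step for 1/b, in the same shape as the
// mulps/mulps/addps/subps sequence that follows rcpps:
//   x1 = (x0 + x0) - (b * x0) * x0
// x0 stands in for the hardware estimate of 1/b.
float refine_recip(float b, float x0) {
    float t = b * x0;      // mulps: b * x0
    t = t * x0;            // mulps: b * x0 * x0
    return (x0 + x0) - t;  // addps, then subps
}

// One Newton-Raphson step for sqrt(a), starting from an estimate y0 of
// 1/sqrt(a):
//   sqrt(a) ~= 0.5 * (a * y0) * (3 - (a * y0) * y0)
// The constants 0.5 and 3 are assumed (LC1/LC2 are not visible in the listing).
float refine_sqrt(float a, float y0) {
    if (a == 0.0f) return 0.0f;  // the listing masks the estimate to 0 when a == 0
    float ay = a * y0;
    return 0.5f * ay * (3.0f - ay * y0);
}
```

With a hypothetical estimate of 1 - 2^-12 for 1/1.0, refine_recip(1.0f, 1.0f - std::ldexp(1.0f, -12)) yields the float just below 1.0 rather than 1.0 itself, which is the kind of result the scalar division would not produce.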
* [Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss
From: rguenth at gcc dot gnu.org @ 2012-12-20 15:52 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> 2012-12-20 15:52:31 UTC ---
Use -mrecip. It is not safe for SPEC CPU 2006, which is why -ffast-math does
not enable it by default.
* [Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss
From: vincenzo.innocente at cern dot ch @ 2012-12-20 15:55 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760
--- Comment #2 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2012-12-20 15:55:03 UTC ---
Thanks.
Does "not safe" mean that it produces incorrect results?
Is it documented?
* [Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss
From: rguenth at gcc dot gnu.org @ 2012-12-20 15:59 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> 2012-12-20 15:58:55 UTC ---
(In reply to comment #2)
> Thanks.
> not safe meaning producing incorrect results?
Yes.
> Is it documented?
See the documentation for -mrecip:
...
Note that while the throughput of the sequence is higher than the throughput
of the non-reciprocal instruction, the precision of the sequence can be
decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).
...
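
To put a number on "ulp", one could count how many representable floats lie between two values. A minimal sketch, assuming IEEE 754 binary32 floats and finite, non-NaN inputs; ordered_bits and ulp_distance are illustrative names, not GCC or standard-library functions.

```cpp
#include <cstdint>
#include <cstring>

// Map a float's bit pattern to an integer that increases with the float's
// value, so that neighbouring floats map to neighbouring integers.
// Assumes IEEE 754 binary32 and a finite, non-NaN input.
std::int64_t ordered_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);  // well-defined way to read the bits
    const std::int64_t magnitude = static_cast<std::int64_t>(u & 0x7fffffffu);
    return (u & 0x80000000u) ? -magnitude : magnitude;  // sign-magnitude to ordered
}

// Number of representable steps between a and b.
std::int64_t ulp_distance(float a, float b) {
    const std::int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}
```

By this measure, the documentation's example value 0.99999994f is one representable step away from 1.0f.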
* [Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss
From: dominiq at lps dot ens.fr @ 2012-12-20 16:07 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760
--- Comment #4 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-12-20 16:07:11 UTC ---
> is there any reason why rsqrtss and rcpss are not used for scalar code while
> rsqrtps and rcpps are used for loops?
Yep! I don't have the patience to dig through the Bugzilla archive right now,
but the main reason is a loss of accuracy (notably 1/2.0 != 0.5) that causes
problems in some codes (see gas_dyn.f90 in the polyhedron tests). You can
pass options to force the use of rsqrtss and rcpss for scalars:
-mrecip
    This option enables use of RCPSS and RSQRTSS instructions (and their
    vectorized variants RCPPS and RSQRTPS) with an additional Newton-Raphson
    step to increase precision instead of DIVSS and SQRTSS (and their
    vectorized variants) for single-precision floating-point arguments. These
    instructions are generated only when -funsafe-math-optimizations is
    enabled together with -finite-math-only and -fno-trapping-math. Note that
    while the throughput of the sequence is higher than the throughput of the
    non-reciprocal instruction, the precision of the sequence can be decreased
    by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).

    Note that GCC implements 1.0f/sqrtf(x) in terms of RSQRTSS (or RSQRTPS)
    already with -ffast-math (or the above option combination), and doesn't
    need -mrecip.

    Also note that GCC emits the above sequence with additional Newton-Raphson
    step for vectorized single-float division and vectorized sqrtf(x) already
    with -ffast-math (or the above option combination), and doesn't need
    -mrecip.

-mrecip=opt
    This option controls which reciprocal estimate instructions may be used.
    opt is a comma-separated list of options, which may be preceded by a `!'
    to invert the option:

    `all'
        Enable all estimate instructions.
    `default'
        Enable the default instructions, equivalent to -mrecip.
    `none'
        Disable all estimate instructions, equivalent to -mno-recip.
    `div'
        Enable the approximation for scalar division.
    `vec-div'
        Enable the approximation for vectorized division.
    `sqrt'
        Enable the approximation for scalar square root.
    `vec-sqrt'
        Enable the approximation for vectorized square root.

    So, for example, -mrecip=all,!sqrt enables all of the reciprocal
    approximations, except for square root.
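
The list syntax quoted above can be read as a sequence of set and clear operations on a bit mask. What follows is a rough sketch of one such reading, not GCC's actual option-handling code: parse_recip and the k* constants are hypothetical names, "default" is assumed to enable all four approximations, and "none" is assumed to clear them.

```cpp
#include <sstream>
#include <string>

// Hypothetical bit flags, one per approximation named in the quoted text.
constexpr unsigned kDiv     = 1u << 0;
constexpr unsigned kVecDiv  = 1u << 1;
constexpr unsigned kSqrt    = 1u << 2;
constexpr unsigned kVecSqrt = 1u << 3;
constexpr unsigned kAll     = kDiv | kVecDiv | kSqrt | kVecSqrt;

// Apply a comma-separated list such as "all,!sqrt" to mask, left to right.
// A leading '!' inverts the item. Returns false on an unknown name.
bool parse_recip(const std::string& opt, unsigned& mask) {
    std::istringstream in(opt);
    std::string item;
    while (std::getline(in, item, ',')) {
        bool invert = !item.empty() && item[0] == '!';
        if (invert) item.erase(0, 1);

        unsigned bits = 0;
        if (item == "all" || item == "default") {
            bits = kAll;           // assumption: "default" behaves like "all"
        } else if (item == "none") {
            bits = kAll;           // assumption: "none" clears every flag,
            invert = !invert;      // so it is treated as "!all"
        } else if (item == "div") {
            bits = kDiv;
        } else if (item == "vec-div") {
            bits = kVecDiv;
        } else if (item == "sqrt") {
            bits = kSqrt;
        } else if (item == "vec-sqrt") {
            bits = kVecSqrt;
        } else {
            return false;          // unknown option name
        }

        if (invert) mask &= ~bits; else mask |= bits;
    }
    return true;
}
```

Under these assumptions, applying "all,!sqrt" to an empty mask leaves kDiv, kVecDiv and kVecSqrt set and kSqrt clear, which matches the example in the quoted text.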
* [Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss
From: vincenzo.innocente at cern dot ch @ 2013-01-08 15:29 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760
--- Comment #5 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2013-01-08 15:29:18 UTC ---
We just got "hit" by this kind of code (copysign is unknown to scientists).
Most probably GCC could optimize it under -Ofast to return copysignf(1.f,x),
just as x/x is already optimized to 1.
cat one.cc;c++ -Ofast -mrecip -S one.cc; cat one.s
#include <cmath>

int one(float x) {
  return x/std::abs(x);
}

        .text
        .align 4,0x90
        .globl __Z3onef
__Z3onef:
LFB86:
        movss LC0(%rip), %xmm2
        andps %xmm0, %xmm2
        rcpss %xmm2, %xmm1
        mulss %xmm1, %xmm2
        mulss %xmm1, %xmm2
        addss %xmm1, %xmm1
        subss %xmm2, %xmm1
        mulss %xmm0, %xmm1
        cvttss2si %xmm1, %eax
        ret
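
A guess at why this bites: the function returns int, so the final cvttss2si truncates toward zero, and a quotient that comes out one step below 1.0 would turn into 0. The sketch below shows that truncation next to a copysign-based alternative; truncate_to_int and sign_one are illustrative names only.

```cpp
#include <cmath>

// float -> int conversion truncates toward zero, so any value in (0, 1),
// including the float just below 1.0, becomes 0.
int truncate_to_int(float q) {
    return static_cast<int>(q);
}

// Sign of x as an int, computed without a division: copysign transfers the
// sign bit of x onto 1.0f, so the result is exactly 1.0f or -1.0f.
int sign_one(float x) {
    return static_cast<int>(std::copysign(1.0f, x));
}
```

Here truncate_to_int(std::nextafter(1.0f, 0.0f)) is 0 while sign_one(3.5f) is 1. Note that sign_one differs from x/std::abs(x) at zero: the division gives NaN there, while copysign returns 1 or -1 according to the sign bit.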
* [Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss
From: glisse at gcc dot gnu.org @ 2013-01-08 23:55 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760
Marc Glisse <glisse at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |glisse at gcc dot gnu.org
--- Comment #6 from Marc Glisse <glisse at gcc dot gnu.org> 2013-01-08 23:55:18 UTC ---
(In reply to comment #5)
> we just got "hit" by this great type of code (copysign is unknown to
> scientists)
>
> most probably gcc could optimize it for -Ofast to return copysignf(1.f,x); (x/x
> is optimized in 1)
>
>
> cat one.cc;c++ -Ofast -mrecip -S one.cc; cat one.s
> #include<cmath>
> int one(float x) {
> return x/std::abs(x);
> }
That looks like a completely different issue from this PR; I think you should
open a separate PR if you don't want it to get lost. It seems easy to add a
few lines to fold_binary_loc for it (not the best place, but that's where the
others are), near the code that optimizes A / A to 1.0. You could try writing
the patch; I don't foresee any trap.
* [Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss
From: pinskia at gcc dot gnu.org @ 2021-08-07 22:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47989
           Keywords|                            |documentation
--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
See PR 47989 for the reason why this option is not enabled for scalar code and
why it was only enabled for vectorized code.