* [Bug tree-optimization/55016] New: request for specific builtins for rcp and rsqrt
From: vincenzo.innocente at cern dot ch @ 2012-10-22 6:44 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55016
Bug #: 55016
Summary: request for specific builtins for rcp and rsqrt
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: vincenzo.innocente@cern.ch
There are cases where an approximate rcp or rsqrt suffices.
I wonder whether it would be possible to introduce "generic" builtins for
"rcp" and "rsqrt" that produce the proper instruction for the target
architecture (SSE, AVX, etc.) and eventually generate vector instructions in a loop.
At the moment anything like this is target-specific, inefficient, and does not
vectorize!
#include <x86intrin.h>

float v0[1024];
float v1[1024];

inline
float rsqrtf( float x ) {
  return _mm_cvtss_f32( _mm_rsqrt_ss( _mm_set_ss( x ) ) );
}

void v() {
  for(int i=0; i!=1024; ++i)
    v0[i] = rsqrtf(v1[i]);
}
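As a rough illustration of the kind of interface being requested, the following sketch wraps the SSE estimate instructions behind target-neutral names and falls back to a full-precision computation elsewhere. The names approx_rsqrtf and approx_rcpf are hypothetical; they are not GCC builtins, and this preprocessor-based wrapper does nothing to help the vectorizer, which is the main point of the request.

```c
#include <math.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper names, not part of GCC or any library. */

/* Approximate 1/sqrt(x): uses rsqrtss where SSE is available. */
static inline float approx_rsqrtf(float x) {
#if defined(__SSE__)
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#else
    return 1.0f / sqrtf(x);   /* full-precision fallback */
#endif
}

/* Approximate 1/x: uses rcpss where SSE is available. */
static inline float approx_rcpf(float x) {
#if defined(__SSE__)
    return _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(x)));
#else
    return 1.0f / x;          /* full-precision fallback */
#endif
}
```

On x86 the two estimate instructions are documented to have a relative error of at most 1.5 * 2^-12, so callers should only use such helpers where roughly 11 to 12 bits of precision are acceptable.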
* [Bug tree-optimization/55016] request for specific builtins for rcp and rsqrt
From: glisse at gcc dot gnu.org @ 2012-10-22 18:57 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55016
--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-22 18:56:42 UTC ---
(In reply to comment #0)
> void v() {
> for(int i=0; i!=1024; ++i)
> v0[i] = rsqrtf(v1[i]);
> }
Doesn't writing
v0[i] = 1 / sqrtf(v1[i])
work with suitable fast-math flags? It still produces an extra iteration to
refine the result; do we want a -ffaster-math?
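The "extra iteration" mentioned here is presumably a Newton-Raphson refinement step applied to the hardware estimate. A minimal sketch of one such step, written in plain C so the arithmetic is visible (refine_rsqrtf is a hypothetical helper name, and the exact sequence GCC emits may differ):

```c
/* One Newton-Raphson step for y ~ 1/sqrt(x):
 *   y_next = y * (1.5 - 0.5 * x * y * y)
 * Each step roughly doubles the number of correct bits in y.
 * refine_rsqrtf is a hypothetical name used for illustration only. */
static inline float refine_rsqrtf(float x, float y) {
    return y * (1.5f - 0.5f * x * y * y);
}
```

For example, starting from the estimate 0.49 for 1/sqrt(4), one step gives about 0.4997, so a 2% error shrinks to roughly 0.06%. Skipping this step is what the request in comment #0 amounts to.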
* [Bug tree-optimization/55016] request for specific builtins for rcp and rsqrt
From: vincenzo.innocente at cern dot ch @ 2012-10-23 5:19 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55016
--- Comment #2 from Vincenzo Innocente <vincenzo.innocente at cern dot ch> 2012-10-23 05:19:37 UTC ---
For the application I have in mind, a global flag such as -ffaster-math
will not be suitable,
as it would also affect places where full "single precision" is still required.
I would just like to take advantage of the rcp and rsqrt instructions in cases where
their low "precision" is enough.
* [Bug tree-optimization/55016] request for specific builtins for rcp and rsqrt
From: glisse at gcc dot gnu.org @ 2012-10-23 6:12 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55016
--- Comment #3 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-23 06:12:30 UTC ---
(In reply to comment #2)
> For the application I have in mind, a global flag such as -ffaster-math
> will not be suitable,
> as it would also affect places where full "single precision" is still required.
Flags can be used per function, but that doesn't work very well indeed.
> I would just like to take advantage of the rcp and rsqrt instructions in cases where
> their low "precision" is enough.
Some math libraries (AIX for instance) provide a correctly rounded rsqrt, so to
avoid a name clash an approximate builtin would have to be called
vaguely_estimate_rsqrt or something similar (and it should still not be
confused with the fast-math version, which adds one refinement round to reach
almost full precision).