public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/55016] New: request for specific builtins for rcp and rsqrt
From: vincenzo.innocente at cern dot ch @ 2012-10-22 6:44 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55016

           Bug #: 55016
           Summary: request for specific builtins for rcp and rsqrt
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: vincenzo.innocente@cern.ch

There are cases where the approximate rcp and rsqrt instructions suffice.
I wonder whether it would be possible to introduce specific "generic" builtins for "rcp" and "rsqrt" that emit the proper instruction for the target architecture (SSE, AVX, etc.) and, where possible, generate vector instructions in loops.
At the moment anything like this is target-specific and inefficient, and it does not vectorize!

#include <x86intrin.h>

float v0[1024];
float v1[1024];

/* scalar wrapper around the rsqrtss estimate instruction */
inline float rsqrtf( float x ) {
  return _mm_cvtss_f32( _mm_rsqrt_ss( _mm_set_ss( x ) ) );
}

/* this loop does not get vectorized */
void v() {
  for(int i=0; i!=1024; ++i)
    v0[i] = rsqrtf(v1[i]);
}
* [Bug tree-optimization/55016] request for specific builtins for rcp and rsqrt
From: glisse at gcc dot gnu.org @ 2012-10-22 18:57 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55016

--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-22 18:56:42 UTC ---
(In reply to comment #0)
> void v() {
>   for(int i=0; i!=1024; ++i)
>     v0[i] = rsqrtf(v1[i]);
> }

Doesn't writing v0[i] = 1 / sqrtf(v1[i]) work with suitable fast-math flags?
It still emits an extra Newton-Raphson iteration to refine the result; do we want a -ffaster-math?
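The "extra iteration" is a Newton-Raphson step on the hardware estimate. As we understand GCC's x86 documentation for -mrecip, 1.0f/sqrtf(x) under -ffast-math is already expanded into rsqrtss/rsqrtps plus one such step. The sketch below illustrates the step in portable C. Because the real estimate instruction is target-specific, it substitutes the well-known bit-level initial guess (constant 0x5f3759df) as a stand-in. That guess is an assumption for illustration and is not what rsqrtss computes. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a hardware estimate such as rsqrtss (NOT the same
   algorithm): the classic bit-trick guess, accurate to a few percent. */
static float rsqrt_estimate(float x) {
    assert(x > 0.0f);
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* read the float's bits safely */
    bits = 0x5f3759dfu - (bits >> 1);
    float y;
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* One Newton-Raphson step for f(y) = 1/y^2 - x:
   y' = y * (1.5 - 0.5 * x * y * y).
   The relative error roughly squares, so the number of correct bits
   roughly doubles (about 12 -> about 23 for a 12-bit estimate). */
static float rsqrt_refine(float x, float y) {
    return y * (1.5f - 0.5f * x * y * y);
}
```

Usage: `rsqrt_refine(x, rsqrt_estimate(x))`. Skipping `rsqrt_refine` is essentially what a hypothetical -ffaster-math would do: keep only the estimate and save a few multiplies.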
* [Bug tree-optimization/55016] request for specific builtins for rcp and rsqrt
From: vincenzo.innocente at cern dot ch @ 2012-10-23 5:19 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55016

--- Comment #2 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2012-10-23 05:19:37 UTC ---
For the application I have in mind, a global flag such as -ffaster-math would not be suitable, as it would also affect places where full "single precision" is still required.
I would just like to benefit from the rcp and rsqrt instructions in cases where their low "precision" is enough.
* [Bug tree-optimization/55016] request for specific builtins for rcp and rsqrt
From: glisse at gcc dot gnu.org @ 2012-10-23 6:12 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55016

--- Comment #3 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-23 06:12:30 UTC ---
(In reply to comment #2)
> For the application I have in mind, a global flag such as -ffaster-math
> would not be suitable,
> as it would also affect places where full "single precision" is still required.

Flags can be set per function, but indeed that doesn't work very well.

> I would just like to benefit from the rcp and rsqrt instructions in cases where
> their low "precision" is enough.

Some math libraries (AIX, for instance) provide a correctly rounded rsqrt, so the builtin would have to be called vaguely_estimate_rsqrt or something similar (and still not be confused with the fast-math version, which adds one refinement round to reach almost full precision).