public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/55016] New: request for specific builtins for rcp and rsqrt
@ 2012-10-22  6:44 vincenzo.innocente at cern dot ch
  2012-10-22 18:57 ` [Bug tree-optimization/55016] " glisse at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2012-10-22  6:44 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55016

             Bug #: 55016
           Summary: request for specific builtins for rcp and rsqrt
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: vincenzo.innocente@cern.ch


There are cases where the approximate rcp and rsqrt instructions suffice.

I wonder whether it would be possible to introduce specific "generic" builtins
for "rcp" and "rsqrt" that emit the proper instruction for the target
architecture (SSE, AVX, etc.) and, where possible, generate vector
instructions in a loop.

At the moment, anything like this is target-specific, inefficient, and does
not vectorize!

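#include <x86intrin.h>

float v0[1024];
float v1[1024];

/* Approximate 1/sqrt(x) with the scalar SSE rsqrtss instruction
   (relative error <= 1.5 * 2^-12). "static" keeps C99 builds linking
   when the call is not inlined. */
static inline
float rsqrtf( float x ) {
  return _mm_cvtss_f32( _mm_rsqrt_ss( _mm_set_ss( x ) ) );
}

/* Each element goes through the scalar wrapper: no vectorization. */
void v() {
  for(int i=0; i!=1024; ++i)
    v0[i] = rsqrtf(v1[i]);
}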
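For comparison, the loop above can be vectorized by hand with the packed form of the same instruction, which is roughly what a generic builtin would let the compiler produce on its own. This is only a sketch: it assumes SSE is available and that the element count is a multiple of 4, and the name rsqrt_approx_array is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <x86intrin.h>

/* Hypothetical helper: approximate 1/sqrt(in[i]) for n floats using the
   packed rsqrtps instruction, four elements per iteration.
   Assumes n is a multiple of 4; no refinement step is applied, so each
   result has relative error <= 1.5 * 2^-12. */
void rsqrt_approx_array(float *out, const float *in, size_t n) {
  assert(n % 4 == 0);
  for (size_t i = 0; i != n; i += 4) {
    __m128 x = _mm_loadu_ps(in + i);          /* unaligned load of 4 floats */
    _mm_storeu_ps(out + i, _mm_rsqrt_ps(x));  /* 4 estimates at once */
  }
}
```

With it, the body of v() reduces to rsqrt_approx_array(v0, v1, 1024), but the code is now tied to SSE and must be rewritten for AVX or other targets, which is the point of the request.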



* [Bug tree-optimization/55016] request for specific builtins for rcp and rsqrt
From: glisse at gcc dot gnu.org @ 2012-10-22 18:57 UTC (permalink / raw)
  To: gcc-bugs



--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-22 18:56:42 UTC ---
(In reply to comment #0)
> void v() {
>   for(int i=0; i!=1024; ++i)
>     v0[i] = rsqrtf(v1[i]);
> }

Doesn't writing
v0[i] = 1 / sqrtf(v1[i])
work with suitable fast-math flags? It still produces an extra (Newton-Raphson)
iteration to refine the result; do we want a -ffaster-math?
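The refinement mentioned here is a Newton-Raphson step applied to the hardware estimate. Below is a scalar sketch of the two variants, assuming x86 with SSE; the helper names are hypothetical, and GCC's actual fast-math sequence may order the operations differently. One step of y1 = y0 * (1.5 - 0.5 * x * y0 * y0) turns an estimate with relative error e into one with relative error of roughly -1.5 * e^2, i.e. from about 11-12 correct bits to nearly full single precision.

```c
#include <assert.h>
#include <x86intrin.h>

/* Hypothetical: raw hardware estimate of 1/sqrt(x) via rsqrtss. */
static inline float rsqrt_estimate(float x) {
  return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

/* Hypothetical: the estimate plus one Newton-Raphson step for
   f(y) = 1/y^2 - x, i.e. y1 = y0 * (1.5 - 0.5 * x * y0 * y0).
   If y0 has relative error e, y1 has relative error of about -1.5 * e^2. */
static inline float rsqrt_refined(float x) {
  assert(x > 0.0f);
  float y0 = rsqrt_estimate(x);
  return y0 * (1.5f - 0.5f * x * y0 * y0);
}
```

According to the GCC manual's description of -mrecip, GCC already implements 1.0f/sqrtf(x) with RSQRTSS (or RSQRTPS) under -ffast-math; the extra multiplications and subtraction in rsqrt_refined are roughly the cost that a -ffaster-math would drop.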



* [Bug tree-optimization/55016] request for specific builtins for rcp and rsqrt
From: vincenzo.innocente at cern dot ch @ 2012-10-23  5:19 UTC (permalink / raw)
  To: gcc-bugs



--- Comment #2 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2012-10-23 05:19:37 UTC ---
For the application I have in mind, a global flag such as -ffaster-math
would not be suitable,
as it would also affect places where full "single precision" is still required.
I would just like to take advantage of the rcp and rsqrt instructions where
their low "precision" is enough.
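Without a generic builtin, this kind of local, opt-in use has to be written with target intrinsics. A sketch for the reciprocal case, assuming SSE and a hypothetical helper name rcp_approx_array: the packed rcpps instruction has the same 1.5 * 2^-12 relative error bound as rsqrtps, and only call sites that use the helper lose precision. The scalar tail loop is the part a vectorizer would normally generate for free.

```c
#include <stddef.h>
#include <x86intrin.h>

/* Hypothetical helper: approximate 1/in[i] for any n, using packed rcpps
   for groups of 4 and scalar rcpss for the remaining 0-3 elements.
   Relative error per element is bounded by 1.5 * 2^-12; no refinement. */
void rcp_approx_array(float *out, const float *in, size_t n) {
  size_t i = 0;
  for (; i + 4 <= n; i += 4)
    _mm_storeu_ps(out + i, _mm_rcp_ps(_mm_loadu_ps(in + i)));
  for (; i != n; ++i)  /* scalar tail */
    out[i] = _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(in[i])));
}
```

Code elsewhere in the same translation unit that writes a / b keeps an IEEE division, which is the separation a global flag cannot provide.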



* [Bug tree-optimization/55016] request for specific builtins for rcp and rsqrt
From: glisse at gcc dot gnu.org @ 2012-10-23  6:12 UTC (permalink / raw)
  To: gcc-bugs



--- Comment #3 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-23 06:12:30 UTC ---
(In reply to comment #2)
> For the application I have in mind, a global flag such as -ffaster-math
> would not be suitable,
> as it would also affect places where full "single precision" is still required.

Flags can be set per function, but indeed that doesn't work very well.
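For reference, GCC's per-function mechanism is the optimize attribute (or #pragma GCC optimize). A minimal sketch, assuming GCC on x86-64 with SSE math; the function name fast_inv_sqrt is hypothetical. The GCC manual itself says the attribute is meant for debugging rather than production code, and a function whose options differ from its caller's may not be inlined into it, which is part of why this is an awkward substitute for a dedicated builtin.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical function compiled as if with -O2 -ffast-math, regardless of
   the command line. With these options GCC may implement 1.0f / sqrtf(x)
   as rsqrtss plus one refinement step instead of sqrtss + divss. */
__attribute__((optimize("O2", "fast-math")))
float fast_inv_sqrt(float x) {
  assert(x > 0.0f);  /* fast-math assumes finite inputs anyway */
  return 1.0f / sqrtf(x);
}
```

The granularity is the whole function: every floating-point expression inside fast_inv_sqrt gets fast-math semantics, not just the one reciprocal square root.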

> I would just like to take advantage of the rcp and rsqrt instructions where
> their low "precision" is enough.

Some math libraries (AIX's, for instance) provide a correctly rounded rsqrt, so
a builtin returning the raw estimate would have to be called
vaguely_estimate_rsqrt or something similar (and still not be confused with
the fast-math version, which adds one refinement step to reach almost full
precision).


