Fast operations on floating point numbers?

public inbox for gcc@gcc.gnu.org

* Fast operations on floating point numbers?
@ 2003-12-04 10:11 Martin Reinecke
  2003-12-04 14:16 ` Peter Barada
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Martin Reinecke @ 2003-12-04 10:11 UTC (permalink / raw)
  To: gcc

Hi,

I'm not completely sure whether this question belongs to a C/C++ newsgroup,
but it is probably compiler-specific, so I'm trying my luck here.

I'm developing a numerical simulation code which contains operations
like this one

double a;
int flipsign;

[...]

if (flipsign)
   a = -a;

in an inner loop which is executed billions of times.
I suspect that the conditional slows the execution down,
and the "a=-a" statement might not be optimal as well.

What is, according to your experience, the most efficient way to negate
a floating point number? Should one write
  a=-a;
or
  a*=-1;
or would it be better just to flip the sign bit by hand with something like this
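   #include <ieee754.h>   /* glibc-specific header */

   union ieee754_double a;   /* a.d holds the double value */
   a.ieee.negative ^= 1;

In the latter case, I could replace the above example by

   a.ieee.negative ^= flipsign;   /* only the low bit of flipsign is used */

which avoids the conditional and should be rather efficient.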

It would be great to have some intrinsics for dealing with this issue,
but I guess not many people are bitten by this problem, and since it is not covered
by the C and C++ standards, this is unlikely to happen.

Similarly, there is the math library function ldexp(), which multiplies a double
with a given power of two. For some reason it is extremely slow, but in principle
it should just add a number to the exponent of the double (and do a few sanity checks).

Does gcc offer a way to perform these kinds of operations efficiently (on machines
adhering to IEEE 754)?

Thanks in advance,
   Martin
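
A minimal sketch of the branch-free sign flip asked about above, written
without the glibc-specific <ieee754.h>. It assumes that double is an IEEE 754
binary64 value and that its sign bit is the top bit of a same-sized unsigned
integer; the name flip_sign_if is made up for illustration. Whether this is
faster than the plain conditional depends on the CPU and compiler, so it
should be measured rather than assumed.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: double is IEEE 754 binary64 and shares byte order with uint64_t. */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Hypothetical helper: negates a when flipsign is non-zero, without a branch. */
static inline double flip_sign_if(double a, int flipsign)
{
    uint64_t bits;

    memcpy(&bits, &a, sizeof bits);            /* well-defined type pun */
    bits ^= (uint64_t)(flipsign != 0) << 63;   /* toggle the sign bit if requested */
    memcpy(&a, &bits, sizeof a);
    return a;
}
```

Normalizing with (flipsign != 0) means any non-zero flag flips the sign, which
matches the behaviour of "if (flipsign) a = -a;".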


* Re: Fast operations on floating point numbers?
  2003-12-04 10:11 Fast operations on floating point numbers? Martin Reinecke
@ 2003-12-04 14:16 ` Peter Barada
  2003-12-04 14:30   ` Martin Reinecke
  2003-12-04 14:34 ` Falk Hueffner
  2003-12-04 17:49 ` Davide Rossetti
  2 siblings, 1 reply; 9+ messages in thread
From: Peter Barada @ 2003-12-04 14:16 UTC (permalink / raw)
  To: martin; +Cc: gcc


>I'm not completely sure whether this question belongs to a C/C++ newsgroup,
>but it is probably compiler-specific, so I'm trying my luck here.
>
>I'm developing a numerical simulation code which contains operations
>like this one
>
>double a;
>int flipsign;
>
>[...]
>
>if (flipsign)
>   a = -a;
>
>in an inner loop which is executed billions of times.
>I suspect that the conditional slows the execution down,
>and the "a=-a" statement might not be optimal as well.

If flipsign is only referred to inside the (heavily executed) loop, you
could try:

double a, flip;
int flipsign;

flip = flipsign ? -1.0 : 1.0;

for (;;) {
  ...
  a *= flip;
  ...
}

And get rid of the conditional...

-- 
Peter Barada
peter@the-baradas.com


* Re: Fast operations on floating point numbers?
  2003-12-04 14:16 ` Peter Barada
@ 2003-12-04 14:30   ` Martin Reinecke
  2003-12-04 14:43     ` Scott Robert Ladd
                       ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Martin Reinecke @ 2003-12-04 14:30 UTC (permalink / raw)
  To: Peter Barada; +Cc: gcc

Peter Barada wrote:

> If flipsign is only referred to inside the (heavily executed) loop, you
> could try:
> 
> double a, flip;
> int flipsign;
> 
> flip = flipsign ? -1.0 : 1.0;
> 
> for (;;) {
>   ...
>   a *= flip;
>   ...
> }
> 
> And get rid of the conditional...

Unfortunately this does not work in all places where I have this
problem. But even where I can use it, there is still a
full floating-point multiplication where the simple XORing of
a single bit should do. But maybe the cost of both operations
is not so different on today's CPUs; I don't know.

Thanks,
   Martin


* Re: Fast operations on floating point numbers?
  2003-12-04 10:11 Fast operations on floating point numbers? Martin Reinecke
  2003-12-04 14:16 ` Peter Barada
@ 2003-12-04 14:34 ` Falk Hueffner
  2003-12-04 17:49 ` Davide Rossetti
  2 siblings, 0 replies; 9+ messages in thread
From: Falk Hueffner @ 2003-12-04 14:34 UTC (permalink / raw)
  To: Martin Reinecke; +Cc: gcc

Martin Reinecke <martin@MPA-Garching.MPG.DE> writes:

> if (flipsign)
>    a = -a;
> 
> in an inner loop which is executed billions of times.
> I suspect that the conditional slows the execution down,
> and the "a=-a" statement might not be optimal as well.

a = -a should produce optimal code. If it doesn't, please file a bug
report. We should not need intrinsics for this.

> Similarly, there is the math library function ldexp(), which
> multiplies a double with a given power of two. For some reason it is
> extremely slow, but in principle it should just add a number to the
> exponent of the double (and do a few sanity checks).

gcc provides __builtin_ldexp for this. I don't know whether it is well
optimized, though. If it doesn't produce optimal code for your
platform, please file a bug report with the assembly you would like to
see for it.

-- 
	Falk
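
A sketch of the "add a number to the exponent and do a few sanity checks"
idea from the question, for comparison with ldexp() or __builtin_ldexp. It
assumes IEEE 754 binary64 doubles; the name scale_pow2 is hypothetical.
Anything outside the simple case (zero, subnormal, infinity, NaN, or a
result that would leave the normal range) is handed to the standard ldexp().
Whether this beats the library routine on a given platform is not
established here and needs measuring.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: computes x * 2^n, assuming IEEE 754 binary64 doubles. */
double scale_pow2(double x, int n)
{
    uint64_t bits;
    int exponent;

    memcpy(&bits, &x, sizeof bits);
    exponent = (int)((bits >> 52) & 0x7FF);    /* biased 11-bit exponent field */

    /* Fast path: x is a normal number and n is small enough not to overflow int. */
    if (exponent != 0 && exponent != 0x7FF && n > -2046 && n < 2046) {
        int scaled = exponent + n;

        if (scaled > 0 && scaled < 0x7FF) {    /* result is still a normal number */
            bits &= ~(UINT64_C(0x7FF) << 52);  /* clear the old exponent field */
            bits |= (uint64_t)scaled << 52;    /* store the new exponent */
            memcpy(&x, &bits, sizeof x);
            return x;
        }
    }

    /* Zero, subnormal, infinity, NaN, overflow, or underflow: use the library. */
    return ldexp(x, n);
}
```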


* Re: Fast operations on floating point numbers?
  2003-12-04 14:30   ` Martin Reinecke
@ 2003-12-04 14:43     ` Scott Robert Ladd
  2003-12-04 14:52       ` Rupert Wood
  2003-12-04 15:55     ` Paul Koning
  2003-12-04 17:44     ` Joe Buck
  2 siblings, 1 reply; 9+ messages in thread
From: Scott Robert Ladd @ 2003-12-04 14:43 UTC (permalink / raw)
  To: Martin Reinecke, gcc mailing list

Martin Reinecke wrote:
> Unfortunately this does not work in all places where I have this
> problem. But even where I can use it, there is still a
> full floating-point multiplication where the simple XORing of
> a single bit should do. But maybe the cost of both operations
> is not so different on today's CPUs; I don't know.

Your problem is architecture-specific. For example, modern Intel
architecture processors support the FCHS instruction to flip the sign
of the floating-point value in ST(0), by performing a simple bit flip. I 
haven't run any tests, but I suspect GCC is smart enough to use that 
instruction.

-- 
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing



* RE: Fast operations on floating point numbers?
  2003-12-04 14:43     ` Scott Robert Ladd
@ 2003-12-04 14:52       ` Rupert Wood
  0 siblings, 0 replies; 9+ messages in thread
From: Rupert Wood @ 2003-12-04 14:52 UTC (permalink / raw)
  To: 'Scott Robert Ladd', 'Martin Reinecke'; +Cc: gcc

Scott Robert Ladd wrote:

> Your problem is architecture-specific. For example, modern Intel
> architecture processors support the FCHS instruction to flip the
> sign of the floating-point value in ST(0), by performing a simple
> bit flip. I haven't run any tests, but I suspect GCC is smart
> enough to use that instruction.

Yeah, e.g.

    /* floating-point in a register */
    double f(double a, int b)
    {
        if (b) a *= -1;
        return a;
    }

    /* floating-point in memory */
    void g(double *p)
    {
        *p *= -1;
    }

The register case generates the floating point change sign operation:

        fchs

and the memory case generates the XOR you'd hope for:

        movl    4(%esp), %eax
        xorb    $-128, 7(%eax)

Rup.


* Re: Fast operations on floating point numbers?
  2003-12-04 14:30   ` Martin Reinecke
  2003-12-04 14:43     ` Scott Robert Ladd
@ 2003-12-04 15:55     ` Paul Koning
  2003-12-04 17:44     ` Joe Buck
  2 siblings, 0 replies; 9+ messages in thread
From: Paul Koning @ 2003-12-04 15:55 UTC (permalink / raw)
  To: martin; +Cc: peter, gcc

>>>>> "Martin" == Martin Reinecke <martin@MPA-Garching.MPG.DE> writes:

 Martin> Peter Barada wrote:
 >> If flipsign is only referred to inside the (heavily executed) loop,
 >> you could try:
 >> 
 >> double a, flip; int flipsign;
 >> 
 >> flip = flipsign ? -1.0 : 1.0;
 >> 
 >> for (;;) { ...  a *= flip; ...  }
 >> 
 >> And get rid of the conditional...

 Martin> Unfortunately this does not work in all places where I have
 Martin> this problem. But even where I can use it, there is still a
 Martin> full floating-point multiplication where the simple XORing of
 Martin> a single bit should do. But maybe the cost of both operations
 Martin> is not so different on today's CPUs; I don't know.

You'd have to study the CPU books to find out; the answer will vary.

On some CPUs, trying to XOR float values is a bad idea because it
requires moving things from float to integer registers, which may
require memory load/stores.  

On some CPUs, using a multiply instead of a negate may hurt a lot.  On
others it might be just fine.

Some CPUs may have conditional-execute machinery so something like
     if (foo) a = -a;
doesn't involve any branches.  Other CPUs may have efficient branch
caches so the cost of that conditional branch is very low.

My inclination would be to write the obvious source code (i.e., what's
above) and let the optimizer pick "the right" answer.  Trying to fake
it out by introducing a multiply by -1.0 either will have no effect
(if the compiler recognizes the constant and translates the operation
back into a negate) or is likely to make things slower.

If profiling and/or analysis of the generated code along with a
detailed study of the processor documentation tells you that the
optimizer is NOT generating the best code for this task (and that
improving it will actually make an interesting difference) then your
best bet is to insert assembly language code for the job.  But if you
don't understand the CPU well enough to do that, you really should
just leave things alone and let the compiler do its best.

     paul
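
A rough sketch of the kind of measurement suggested above: two ways of
writing the conditional negation, and a small timing routine built on the
standard clock(). All names here (negate_fn, negate_branch, negate_multiply,
time_variant) are made up for illustration. The call goes through a function
pointer, which keeps the compiler from inlining the variants, so the numbers
are only a first indication; reading the generated assembly (gcc -O2 -S)
remains the more reliable check.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical signature shared by the variants being compared. */
typedef double (*negate_fn)(double a, int flipsign);

/* The obvious source form: leave the choice of instructions to the optimizer. */
double negate_branch(double a, int flipsign)
{
    if (flipsign)
        a = -a;
    return a;
}

/* The multiply-by-a-factor form discussed in the thread. */
double negate_multiply(double a, int flipsign)
{
    return a * (flipsign ? -1.0 : 1.0);
}

/* Calls fn n times with an alternating flag, starting from 1.0.
   Returns the elapsed CPU time in seconds and stores the final value
   in *result so the two variants can be checked against each other. */
double time_variant(negate_fn fn, size_t n, double *result)
{
    volatile double acc = 1.0;   /* volatile keeps the loop from being removed */
    clock_t start = clock();

    for (size_t i = 0; i < n; ++i)
        acc = fn(acc, (int)(i & 1));

    clock_t stop = clock();
    *result = acc;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

A caller would run time_variant(negate_branch, n, &r1) and
time_variant(negate_multiply, n, &r2) with a large n, confirm that r1 and r2
agree, and only then compare the two timings.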


* Re: Fast operations on floating point numbers?
  2003-12-04 14:30   ` Martin Reinecke
  2003-12-04 14:43     ` Scott Robert Ladd
  2003-12-04 15:55     ` Paul Koning
@ 2003-12-04 17:44     ` Joe Buck
  2 siblings, 0 replies; 9+ messages in thread
From: Joe Buck @ 2003-12-04 17:44 UTC (permalink / raw)
  To: Martin Reinecke; +Cc: Peter Barada, gcc

On Thu, Dec 04, 2003 at 03:22:31PM +0100, Martin Reinecke wrote:
> Peter Barada wrote:
> 
> > If flipsign is only referred to inside the (heavily executed) loop, you
> > could try:
> > 
> > double a, flip;
> > int flipsign;
> > 
> > flip = flipsign ? -1.0 : 1.0;
> > 
> > for (;;) {
> >   ...
> >   a *= flip;
> >   ...
> > }
> > 
> > And get rid of the conditional...
> 
> Unfortunately this does not work in all places where I have this
> problem. But even where I can use it, there is still a
> full floating-point multiplication where the simple XORing of
> a single bit should do.

On many modern CPUs, the full floating point multiplication will be
faster than the XOR, because you'll be using a pipelined multiplier
and keeping the pipe full, and there isn't an XOR instruction that
operates on a floating point register.
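
For the places where the flag changes from one iteration to the next, the
multiplication can still be kept branch-free by looking the factor up in a
two-entry table. This is only a sketch; the names sign_factor and apply_sign
are made up for illustration, and whether the table load plus multiply beats
a well-predicted branch depends on the CPU.

```c
/* Factor to multiply by: index 0 keeps the sign, index 1 flips it. */
static const double sign_factor[2] = { 1.0, -1.0 };

/* Hypothetical helper: negates a when flipsign is non-zero, using a multiply. */
static inline double apply_sign(double a, int flipsign)
{
    return a * sign_factor[flipsign != 0];   /* (flipsign != 0) is 0 or 1 */
}
```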


* Re: Fast operations on floating point numbers?
  2003-12-04 10:11 Fast operations on floating point numbers? Martin Reinecke
  2003-12-04 14:16 ` Peter Barada
  2003-12-04 14:34 ` Falk Hueffner
@ 2003-12-04 17:49 ` Davide Rossetti
  2 siblings, 0 replies; 9+ messages in thread
From: Davide Rossetti @ 2003-12-04 17:49 UTC (permalink / raw)
  To: Martin Reinecke; +Cc: gcc

Martin Reinecke wrote:

> Hi,
>
> I'm not completely sure whether this question belongs to a C/C++ newsgroup,
> but it is probably compiler-specific, so I'm trying my luck here.
>
> I'm developing a numerical simulation code which contains operations
> like this one
>
> double a;
> int flipsign;
>
> [...]
>
> if (flipsign)
>   a = -a;
>
> in an inner loop which is executed billions of times.
> I suspect that the conditional slows the execution down,
> and the "a=-a" statement might not be optimal as well.

If you are on an SSE2 platform (I don't know about AltiVec) and are using
-msse2 -mfpmath=sse, double is mapped onto SSE2 registers. In that case I seem
to remember that negation is optimized into an SSE XOR. What I do not know is
whether GCC is also able to use SSE instructions to generate masks and so
avoid jumps; I hope so.
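
What the mask idea could look like when written by hand with the SSE2
intrinsics from <emmintrin.h>; this is a sketch, not a statement about the
code GCC itself emits. A compare produces an all-ones mask when the flag is
non-zero, the mask is ANDed with the sign bit, and the result is XORed into
the value. The function name flip_sign_sse2 is made up for illustration, and
on targets without SSE2 the sketch falls back to the plain conditional.

```c
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: negates a when flipsign is non-zero. */
static inline double flip_sign_sse2(double a, int flipsign)
{
#if defined(__SSE2__)
    __m128d value = _mm_set_sd(a);
    __m128d sign_bit = _mm_set_sd(-0.0);   /* only the sign bit is set */

    /* Low lane becomes all ones if flipsign != 0, all zeros otherwise. */
    __m128d selected = _mm_cmpneq_pd(_mm_set_sd((double)flipsign),
                                     _mm_setzero_pd());

    /* Keep the sign bit only when selected, then toggle it in the value. */
    __m128d mask = _mm_and_pd(selected, sign_bit);
    return _mm_cvtsd_f64(_mm_xor_pd(value, mask));
#else
    return flipsign ? -a : a;              /* portable fallback */
#endif
}
```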


