Re: binary compiled with -O1 and w/ individual optimization flags are not the same

public inbox for gcc-help@gcc.gnu.org

* Re: binary compiled with -O1 and w/ individual optimization flags are not the same
  2008-02-29 17:28 binary compiled with -O1 and w/ individual optimization flags are not the same CSights
@ 2008-02-29 17:28 ` Andrew Haley
  2008-02-29 17:40 ` Eljay Love-Jensen
  2008-02-29 20:15 ` Brian Dessent
  2 siblings, 0 replies; 9+ messages in thread
From: Andrew Haley @ 2008-02-29 17:28 UTC
  To: CSights; +Cc: gcc-help

CSights wrote:
> Hi,
> 	I'm trying to debug some mismatching results from a program compiled with 
> (-O1,2,3) and without (-O0 or nothing) optimization flags.
> 	My thought was to individually turn on optimization flags and see which one 
> changes the program's output.
> 	Unfortunately for this plan, the binaries produced using -O1 and those flags 
> said to be turned on by g++ in the manual (-fdefer-pop -fdelayed-branch
> -fguess-branch-probability -fcprop-registers -fif-conversion -fif-conversion2 
> -ftree-ccp -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-ter -ftree-lrs 
> -ftree-sra -ftree-copyrename -ftree-fre -ftree-ch -funit-at-a-time -fmerge-constants) 
> do not match.

No, they wouldn't: without -O you get no optimizations.  Doesn't matter
what individual opt flags you put on the command line.

> I also tried without -fdelayed-branch b/c it is not supported 
> on my architectures (i386: athlon xp, core duo) and with -fomit-frame-pointer 
> in case that was the difference.
> 	I've tried this with g++ versions
> g++ (GCC) 4.2.3 (Debian 4.2.3-1) 
> and
> i686-apple-darwin8-g++-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5363)
> 	Is there a way to have g++ tell me which flags it is actually using when 
> compiling a program.  E.g. expand -O1 to the individual optimization flags at 
> run time?

-save-temps -fverbose-asm puts all the optimizations into the .s file.

> 	Anyone have any other suggestions?

Being a bit more scientific, what is the difference between runs?

Andrew.


* binary compiled with -O1 and w/ individual optimization flags are not the same
@ 2008-02-29 17:28 CSights
  2008-02-29 17:28 ` Andrew Haley
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: CSights @ 2008-02-29 17:28 UTC
  To: gcc-help

Hi,
	I'm trying to debug some mismatching results from a program compiled with 
(-O1,2,3) and without (-O0 or nothing) optimization flags.
	My thought was to individually turn on optimization flags and see which one 
changes the program's output.
	Unfortunately for this plan, the binaries produced using -O1 and those flags 
said to be turned on by g++ in the manual (-fdefer-pop -fdelayed-branch
-fguess-branch-probability -fcprop-registers -fif-conversion -fif-conversion2 
-ftree-ccp -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-ter -ftree-lrs 
-ftree-sra -ftree-copyrename -ftree-fre -ftree-ch -funit-at-a-time -fmerge-constants) 
do not match.  I also tried without -fdelayed-branch b/c it is not supported 
on my architectures (i386: athlon xp, core duo) and with -fomit-frame-pointer 
in case that was the difference.
	I've tried this with g++ versions
g++ (GCC) 4.2.3 (Debian 4.2.3-1) 
and
i686-apple-darwin8-g++-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5363)
	Is there a way to have g++ tell me which flags it is actually using when 
compiling a program.  E.g. expand -O1 to the individual optimization flags at 
run time?
	Anyone have any other suggestions?

Thanks much,
	C.


* Re: binary compiled with -O1 and w/ individual optimization flags are not the same
  2008-02-29 17:28 binary compiled with -O1 and w/ individual optimization flags are not the same CSights
  2008-02-29 17:28 ` Andrew Haley
@ 2008-02-29 17:40 ` Eljay Love-Jensen
  2008-02-29 20:15 ` Brian Dessent
  2 siblings, 0 replies; 9+ messages in thread
From: Eljay Love-Jensen @ 2008-02-29 17:40 UTC
  To: CSights, GCC-help

Hi CSights,

> Is there a way to have g++ tell me which flags it is actually using when
> compiling a program.  E.g. expand -O1 to the individual optimization flags at
> run time?

Yes.

I use this trick:

g++ -O1 -S -fverbose-asm -x c++ <(echo '') -o O1.s

cat O1.s

> Anyone have any other suggestions?

Keep in mind that -O1 or higher turns on many more "behind the scenes"
optimizations than just those that are twiddle-able with the -f optimization
flags.

And keep in mind that -O0 disables all optimizations, regardless of
specified -f optimization flags (which are ignored).

HTH,
--Eljay


* Re: binary compiled with -O1 and w/ individual optimization flags are not the same
  2008-02-29 17:28 binary compiled with -O1 and w/ individual optimization flags are not the same CSights
  2008-02-29 17:28 ` Andrew Haley
  2008-02-29 17:40 ` Eljay Love-Jensen
@ 2008-02-29 20:15 ` Brian Dessent
  2008-02-29 22:38   ` CSights
  2 siblings, 1 reply; 9+ messages in thread
From: Brian Dessent @ 2008-02-29 20:15 UTC
  To: CSights; +Cc: gcc-help

CSights wrote:

>         I'm trying to debug some mismatching results from a program compiled with
> (-O1,2,3) and without (-O0 or nothing) optimization flags.
>         My thought was to individually turn on optimization flags and see which one
> changes the program's output.

-O is not just a combination of a bunch of -f flags.  It doesn't work
that way.  There are optimizations that are controlled directly by -O,
with no corresponding -f.  The manual says this at the top of
<http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html>:

> Not all optimizations are controlled directly by a flag. Only optimizations that have a flag are listed.

If you have a program that behaves differently with and without
optimization, then it's probably relying on undefined behavior.  A
common mistake is to violate the C aliasing rules.  Compile your code
with -O2 -fno-strict-aliasing and see if that makes the problem go
away.  If it does that's a good indication that it's an aliasing issue. 
Then compile with -O2 -Wstrict-aliasing (i.e. remove
-fno-strict-aliasing) and see if any of the warnings give you a clue as
to the problem.  You can try -Wstrict-aliasing=1 if the default did not
give any warnings, at the cost of potentially more false positives.
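
For anyone unsure what an aliasing violation looks like, here is a minimal sketch. The function names are made up for illustration and have nothing to do with CSights's program; the commented-out variant is the undefined pattern, and the memcpy version is the usual well-defined replacement. It assumes a 32-bit IEEE-754 float.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical example, not taken from the program discussed in this thread.
//
// UNDEFINED under the aliasing rules: the float object is read through a
// pointer to an unrelated type. With -fstrict-aliasing (enabled at -O2) the
// compiler may assume the two pointers never refer to the same object.
//
//   std::uint32_t bits_bad(float f) {
//       return *reinterpret_cast<std::uint32_t*>(&f);
//   }

// Well-defined: copy the object representation instead of aliasing it.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Compilers typically turn the memcpy into a single register move, so the safe version usually costs nothing extra.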

Brian


* Re: binary compiled with -O1 and w/ individual optimization flags are not the same
  2008-02-29 20:15 ` Brian Dessent
@ 2008-02-29 22:38   ` CSights
       [not found]     ` <47C88B66.A501ABD6@dessent.net>
  0 siblings, 1 reply; 9+ messages in thread
From: CSights @ 2008-02-29 22:38 UTC
  To: gcc-help

Hi all,
	Thanks for your helpful hints.
	The first thing I did was compare the compiled program's output with -O2 
and -O2 -fno-strict-aliasing.  The differences did not go away 
using -fno-strict-aliasing, but it gave me a clue that I should be increasing 
the precision on the output to check for differences.
	Below is some output which does not match when viewed at high precision.  The 
output should match, because the number is calculated from the string 
sequence. (The output is two individuals' fitness in a genetic algorithm if 
that matters to you.)

-O2
0.879923389326927374298747963621281087398529052734375, "dadabda"

-O2 -fno-strict-aliasing 
0.87992338932692748532105042613693512976169586181640625, "dadabda"

	This makes me think that there is something going on with calculation of the 
number from the string of letters.  The calculation includes use of exp(), 
but I think that is the only special thing other than + - / *.
	I played around with "-msse -mfpmath=sse", but this doesn't seem to make a 
difference.  Is there any other math-related option to try?  Does the fact 
that a program compiled with "-O2 -fno-strict-aliasing" produces different 
output give you all any clues?  Also, I tried -Wstrict-aliasing and 
-Wstrict-aliasing=1, but they didn't give any warnings/errors.

	Here are the output tests I've tried so far, and whether the output matches 
or not:
(-O1 == -O2 == -O2 -msse -mfpmath=sse == -O3) != (-O2 -fno-strict-aliasing) != 
(-O0 == -O0 -msse -mfpmath=sse)

	Does this suggest any further tests?

Thanks again,
	C.

P.S.  Thanks for the hint on how to view all the optimization flags activated 
by Ox.  I tried some tests adding the flags activated by O1 to the command 
line, but that didn't cause the output to be the same.  It seems the 
difference in output is from one of the non-flag optimizations activated by 
O1-3.


* Re: binary compiled with -O1 and w/ individual optimization flags are not the same
       [not found]     ` <47C88B66.A501ABD6@dessent.net>
@ 2008-03-01 15:59       ` CSights
  2008-03-01 17:26         ` Tim Prince
  2008-03-01 18:22         ` Brian Dessent
  0 siblings, 2 replies; 9+ messages in thread
From: CSights @ 2008-03-01 15:59 UTC
  To: gcc-help; +Cc: Brian Dessent

Hi Brian and list, 

> What data types are used in this calculation?  floats only offer ~7
> decimals and doubles only ~15 decimals of precision, so there's no way
> you can compare past that and expect consistency, unless you're using
> some kind of bignum package.

	Currently using doubles, but thanks for reminding me about the number of 
decimals that make sense.

> By default calculations on the 387 are done by the hardware in 80 bits
> precision, but truncated down to 64 (assuming double types) when moved
> out of the registers.  There are a number of ways to deal with it, or at
> least expose it:
>
> -ffloat-store will cause gcc to always move intermediate results out of
> registers and into memory, which effectively gets rid of the excess
> precision at the cost of a speed hit.

	Progress! Now the program output matching blocks are
(-O0 -ffloat-store == -O1 -ffloat-store == -O2 -ffloat-store) != (-O0) != 
(-O1 == -O2 == -O3).  In other words, -O0 now matches -O1 and -O2 once 
-ffloat-store is added, even though it still doesn't match the -Ox builds 
without -ffloat-store.
	Does this suggest to you the mismatching output was due to floating-point 
precision differences rather than other problems (aliasing for example)?
	Also, I didn't mention earlier (did I?) that the program's output when 
compiled on the Macintosh matched at all optimization levels.  (O0 == O1 == 
O2) (Though the output did not match any output from the program compiled on 
linux.)  Is this possibly b/c the Mac has sse2 (Core 2 Duo) and able to use 
those instructions which have more meaningful decimal places?
	If this is the problem, what would be a good way of dealing with it?  
Throwing away the meaningless decimal digits is okay with me, but avoiding 
the performance hit that comes with ffloat-store would be nice.  Also, it 
would be nice to not have the output depend on compiler flags.
	Is there a way to do the float-store equivalent in the program code itself?  
The goal being to have the program's output when compiled with O0,1,2 match, 
as it does with -ffloat-store.
	I've tried using floats only in what I guess is the key calculation 
involving the exp(), then casting to double (so that I don't have to modify 
all the code to be float), but this doesn't result in matching output between 
O1 and O0.  Does the compiler do any recasting of float->double double->float 
behind the scenes?
	Another way might be to use doubles, then zero out the least significant bits 
that a float does not have.  Then use these modified doubles in the 
calculation.  Would that work?

Thanks again!
	C.


* Re: binary compiled with -O1 and w/ individual optimization flags are not the same
  2008-03-01 15:59       ` CSights
@ 2008-03-01 17:26         ` Tim Prince
  2008-03-01 18:22         ` Brian Dessent
  1 sibling, 0 replies; 9+ messages in thread
From: Tim Prince @ 2008-03-01 17:26 UTC
  To: CSights; +Cc: gcc-help, Brian Dessent

CSights wrote:
>
> 	Currently using doubles, but thanks for reminding me about the number of 
> decimals that make sense.
>
>   
>> By default calculations on the 387 are done by the hardware in 80 bits
>> precision, but truncated down to 64 (assuming double types) when moved
>> out of the registers.  There are a number of ways to deal with it, or at
>> least expose it:
>>
>> -ffloat-store will cause gcc to always move intermediate results out of
>> registers and into memory, which effectively gets rid of the excess
>> precision at the cost of a speed hit.
>>     
>
> 	Progress! Now the program output matching blocks are
> (-O0 -ffloat-store == -O1 -ffloat-store == -O2 -ffloat-store) != (-O0) != 
> (-O1 == -O2 == -O3).  In other words, -O0 now matches -O1 and -O2 once 
> -ffloat-store is added, even though it still doesn't match the -Ox builds 
> without -ffloat-store.
> 	Does this suggest to you the mismatching output was due to floating-point 
> precision differences rather than other problems (aliasing for example)?
>   
It suggests that you were in fact getting more than 53-bit double 
precision somewhere, and that it's not an aliasing error.
> 	Also, I didn't mention earlier (did I?) that the program's output when 
> compiled on the Macintosh matched at all optimization levels.  (O0 == O1 == 
> O2) (Though the output did not match any output from the program compiled on 
> linux.)  Is this possibly b/c the Mac has sse2 (Core 2 Duo) and able to use 
> those instructions which have more meaningful decimal places?
>   
If you use SSE2, you have no extra precision for -ffloat-store to 
suppress.  Assuming the machine where you used 387 has SSE2 hardware, 
you could set -mfpmath=sse.  That is the default for 64-bit gcc.
> 	I've tried using floats only in what I guess is the key calculation 
> involving the exp(), then casting to double (so that I don't have to modify 
> all the code to be float), but this doesn't result in matching output between 
> O1 and O0.  Does the compiler do any recasting of float->double double->float 
> behind the scenes?
>
>   
The 387 exp() performs all its calculations with extra precision.  Then, 
if you don't set -ffloat-store, it may never get rounded down.  If you 
have no SSE2 math library, you will still get 387 exp() even if you set 
-mfpmath=sse, but there will be an implicit -ffloat-store in the 
conversion of the result to SSE2.
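
On the question of getting the -ffloat-store effect from inside the program: one common approach is to pass a value through a volatile variable, which forces a real store to memory and therefore a rounding to the declared type. A minimal sketch, assuming IEEE-754 float and double; the helper names are hypothetical and not from CSights's program:

```cpp
#include <cassert>

// Hypothetical helpers; the names are illustrative only.

// Writing through a volatile object forces an actual store to memory, so
// any excess x87 precision is rounded away here, much as -ffloat-store
// does for floating-point variables in general.
double store_as_double(double x) {
    volatile double tmp = x;
    return tmp;
}

// Same idea, but the value is rounded to float precision
// (24-bit significand) before being widened back to double.
double store_as_float(double x) {
    volatile float tmp = static_cast<float>(x);
    return tmp;
}
```

Applied to the two fitness values posted earlier in the thread, store_as_float() gives the same number for both, since they differ only in the last bit of the double. Each call costs a store and a load, so it is probably best kept to the few places where results are compared or printed.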


* Re: binary compiled with -O1 and w/ individual optimization flags are not the same
  2008-03-01 15:59       ` CSights
  2008-03-01 17:26         ` Tim Prince
@ 2008-03-01 18:22         ` Brian Dessent
  2008-03-05 18:13           ` CSights
  1 sibling, 1 reply; 9+ messages in thread
From: Brian Dessent @ 2008-03-01 18:22 UTC
  To: CSights; +Cc: gcc-help

CSights wrote:

>         Also, I didn't mention earlier (did I?) that the program's output when
> compiled on the Macintosh matched at all optimization levels.  (O0 == O1 ==
> O2) (Though the output did not match any output from the program compiled on
> linux.)  Is this possibly b/c the Mac has sse2 (Core 2 Duo) and able to use
> those instructions which have more meaningful decimal places?

Yes, it's probably using the sse2 unit.

>         If this is the problem, what would be a good way of dealing with it?

Well first realize that it's not a problem per se.  The results *are*
equivalent in the significant digits that actually represent what a
double can hold.  The only reason they seem different is because there
are these extra bits of precision that result from the value still being
in a 387 register.  But those bits shouldn't matter because as soon as
the result is moved into memory they are truncated away.

> Throwing away the meaningless decimal digits is okay with me, but avoiding
> the performance hit that comes with ffloat-store would be nice.  Also, it

Like I said, you can use -mpc64 to explicitly set the 387 to double
precision (a 53-bit significand, as in a 64-bit double), just like the
sse2 unit.  If you don't have a gcc new enough to have this option or
you don't want to depend on requiring an option, you can simply
manually configure the 387 at the beginning of your program to disable
the extended precision.  See
<http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323#c60> for a code snippet
of how to do this.  (That relies on a glibc-specific fpu_control.h
header but the definitions in that header are pretty self-contained.)
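
A rough sketch of what such a control-word change can look like, using the glibc macros from fpu_control.h. This is not necessarily the same code as in the bug report; the function name and the HAVE_X87_CONTROL guard are made up here so that the sketch still compiles on systems without glibc or without an x86 FPU.

```cpp
#include <cassert>
#include <cstdio>   // any libc header; makes __GLIBC__ visible on glibc systems

#if defined(__GLIBC__) && (defined(__i386__) || defined(__x86_64__))
#include <fpu_control.h>
#define HAVE_X87_CONTROL 1
#else
#define HAVE_X87_CONTROL 0
#endif

// Hypothetical helper: switch the x87 unit from extended precision
// (64-bit significand) to double precision (53-bit significand).
// Returns false where the glibc/x86 control-word macros are unavailable.
bool set_x87_double_precision() {
#if HAVE_X87_CONTROL
    fpu_control_t cw;
    _FPU_GETCW(cw);          // read the current control word
    cw &= ~_FPU_EXTENDED;    // clear the precision-control bits
    cw |= _FPU_DOUBLE;       // select double precision
    _FPU_SETCW(cw);          // write it back
    return true;
#else
    return false;
#endif
}
```

Calling set_x87_double_precision() once at the top of main() should be enough for a single-threaded program. It only matters for code that actually runs on the x87 unit, and it narrows the significand only, not the exponent range.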

> would be nice to not have the output depend on compiler flags.

But the output doesn't *really* depend on compiler flags!  That's the
point I'm trying to make.  It only seems like the output differs because
you're looking at something that's like the equivalent of uninitialized
memory.

Suppose you had a string buffer of 80 chars and you filled it with a
\0-terminated string of 40 chars, but to display it you print all 80
chars of the buffer.  Clearly two strings that have the same first 40
chars before the \0 are semantically equivalent as C strings, because
the rest of the buffer is just junk.  No reasonable programmer would
ever consider printing the junk past the \0 when displaying the string,
just like it's not reasonable to print more than 15 (or whatever the
limit is, I forget) significant digits of a double.
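
To make the point about significant digits concrete, here is a small sketch that prints a double with only std::numeric_limits<double>::digits10 digits (15 for an IEEE-754 double). The helper name is hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <limits>
#include <string>

// Hypothetical helper: format a double using only as many significant
// decimal digits as are guaranteed to survive a text -> double -> text
// round trip (digits10, which is 15 for IEEE-754 double).
std::string format_significant(double x) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g",
                  std::numeric_limits<double>::digits10, x);
    return std::string(buf);
}
```

With this, both fitness values posted earlier in the thread print as 0.879923389326927. That is not a general guarantee: two doubles that differ in the last bit can still land on opposite sides of a decimal rounding boundary, so this is a display aid rather than a comparison method.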

This can also cause issues if you are simply testing for equality, i.e.
assert((x/y) == (x/y)) can sometimes fail simply because one result is
in a register and another in memory.  But the solution here is to not
use == for comparing floating point values, but rather compare the
absolute value of their difference to some small delta.  But this is
something that you should do anyway with floating point calculations
because they are by their very design inexact.  Some details at
<http://www.lahey.com/float.htm>.
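
A minimal sketch of the delta comparison described above. The helper name and the default tolerance of 1e-12 are assumptions for illustration; a suitable delta depends on the magnitude of the values being compared.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: true when a and b differ by at most `delta`.
// The default tolerance is an arbitrary choice for values of order 1.
bool nearly_equal(double a, double b, double delta = 1e-12) {
    assert(delta >= 0.0);
    return std::fabs(a - b) <= delta;
}
```

For values that span many orders of magnitude, a relative tolerance (scaling delta by the larger of |a| and |b|) is usually the better fit.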

Brian


* Re: binary compiled with -O1 and w/ individual optimization flags are not the same
  2008-03-01 18:22         ` Brian Dessent
@ 2008-03-05 18:13           ` CSights
  0 siblings, 0 replies; 9+ messages in thread
From: CSights @ 2008-03-05 18:13 UTC
  To: gcc-help

Thanks Brian and others,
	I ended up getting the output of programs compiled with O0 and O1,2,3 to 
match by changing some key doubles to floats.

Thanks for all your help!
	C.
