From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <47C99188.2050309@myrealbox.com>
Date: Sat, 01 Mar 2008 17:26:00 -0000
From: Tim Prince <tprince@myrealbox.com>
User-Agent: Thunderbird 2.0.0.12 (Windows/20080213)
MIME-Version: 1.0
To: CSights <csights@fastmail.fm>
CC: gcc-help@gcc.gnu.org, Brian Dessent <brian@dessent.net>
Subject: Re: binary compiled with -O1 and w/ individual optimization flags are not the same
References: <200802291215.47293.csights@fastmail.fm> <200802291716.22197.csights@fastmail.fm> <47C88B66.A501ABD6@dessent.net> <200803011057.26145.csights@fastmail.fm>
In-Reply-To: <200803011057.26145.csights@fastmail.fm>

CSights wrote:
>
> 	Currently using doubles, but thanks for reminding me about the number of 
> decimals that make sense.
>
>   
>> By default calculations on the 387 are done by the hardware in 80 bits
>> precision, but rounded down to 64 (assuming double types) when moved
>> out of the registers.  There are a number of ways to deal with it, or at
>> least expose it:
>>
>> -ffloat-store will cause gcc to always move intermediate results out of
>> registers and into memory, which effectively gets rid of the excess
>> precision at the cost of a speed hit.
>>     
>
> 	Progress! Now the matching blocks of program output are
> (O0 -ffloat-store == O1 -ffloat-store == O2 -ffloat-store) != (O0) != (O1 == O2 
> == O3).  In other words, O0 now matches O1 and O2 once -ffloat-store is 
> added, even though it still doesn't match the Ox builds without 
> -ffloat-store.
> 	Does this suggest to you that the mismatching output was due to differences 
> in decimal precision rather than other problems (aliasing, for example)?
>   
It suggests that you were in fact getting more than 53-bit double 
precision somewhere, and that it's not an aliasing error.
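
To make the 53-bit point concrete, here is a minimal C sketch. It does not reproduce the 387 behaviour itself. It uses long double to stand in for an extended-precision register and a volatile store to stand in for the spill to memory that -ffloat-store forces. The function names are hypothetical, and the width of long double is platform dependent (64 mantissa bits on x86 Linux; other targets differ).

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Round a wider value to double by pushing it through memory.
   The volatile store is an illustrative stand-in for the store
   that -ffloat-store forces on floating-point variables. */
double store_as_double(long double wide)
{
    volatile double slot = (double)wide;
    return slot;
}

/* Print what survives in the wide type and what survives the store. */
void demo(void)
{
    /* volatile keeps the compiler from folding the sum at compile time */
    volatile long double one = 1.0L;
    volatile long double tiny = 0x1p-60L;   /* 2^-60, below double's 53 bits at 1.0 */
    long double sum = one + tiny;
    double stored = store_as_double(sum);

    printf("double mantissa bits:      %d\n", DBL_MANT_DIG);
    printf("long double mantissa bits: %d\n", LDBL_MANT_DIG);
    printf("wide sum differs from 1:   %s\n", sum != 1.0L ? "yes" : "no");
    printf("stored sum differs from 1: %s\n", stored != 1.0 ? "yes" : "no");

    /* Whatever the wide format kept, a 53-bit double cannot hold 1 + 2^-60. */
    assert(stored == 1.0);
}
```

Calling demo() from main() on a machine with an 80-bit long double should report that the wide sum differs from 1 while the stored one does not. That is the same kind of discrepancy that can separate a build keeping intermediates in 387 registers from one that stores them.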
> 	Also, I didn't mention earlier (did I?) that the program's output when 
> compiled on the Macintosh matched at all optimization levels.  (O0 == O1 == 
> O2) (Though the output did not match any output from the program compiled on 
> linux.)  Is this possibly because the Mac has sse2 (Core 2 Duo) and is able to 
> use those instructions, which have more meaningful decimal places?
>   
If you use SSE2, you have no extra precision for -ffloat-store to 
suppress.  Assuming the machine where you used 387 has SSE2 hardware, 
you could set -mfpmath=sse (together with -msse2 on a 32-bit target).  
That is the default for 64-bit gcc.
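
One way to check which situation a given build is in is the C99 macro FLT_EVAL_METHOD from <float.h>. The sketch below classifies its value. The helper names are hypothetical, and the expectation that gcc reports 2 for 387 math and 0 for SSE math is an assumption worth verifying on your own compiler version.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Classify a FLT_EVAL_METHOD value (C99 <float.h>):
   returns 1 if double expressions may be evaluated with more
   precision than double, 0 if they are not, -1 if it cannot be told. */
int double_has_excess_precision(int eval_method)
{
    switch (eval_method) {
    case 0:   /* every operation is evaluated in its own type */
    case 1:   /* float is widened to double, double stays double */
        return 0;
    case 2:   /* everything is evaluated as long double */
        return LDBL_MANT_DIG > DBL_MANT_DIG;
    default:  /* -1 or an implementation-defined value */
        return -1;
    }
}

/* Report the evaluation mode this translation unit was compiled with. */
void print_eval_mode(void)
{
    int verdict = double_has_excess_precision((int)FLT_EVAL_METHOD);
    assert(verdict >= -1 && verdict <= 1);
    printf("FLT_EVAL_METHOD = %d, excess precision for double: %s\n",
           (int)FLT_EVAL_METHOD,
           verdict == 1 ? "possible" : verdict == 0 ? "no" : "unknown");
}
```

Building a small program that calls print_eval_mode() once with `gcc -m32 -mfpmath=387` and once with `gcc -m32 -msse2 -mfpmath=sse` would show whether the two builds evaluate double expressions differently.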
> 	I've tried using floats only in what I guess is the key calculation 
> involving the exp(), then casting to double (so that I don't have to modify 
> all the code to be float), but this doesn't result in matching output between 
> O1 and O0.  Does the compiler do any recasting of float->double double->float 
> behind the scenes?
>
>   
The 387 exp() performs all its calculations with extra precision.  Then, 
if you don't set -ffloat-store, the result may never get rounded down.  If 
you have no SSE2 math library, you will still get 387 exp() even if you set 
-mfpmath=sse, but there will be an implicit -ffloat-store in the 
conversion of the result to SSE2, because the value is rounded to double 
on the way.