question about equivalent x87/x64-64 fpu code...

* question about equivalent x87/x64-64 fpu code...
From: Paweł Sikora @ 2011-05-13 18:12 UTC
  To: gcc-help

Hi,

i'm using a 3rd-party engine http://glaros.dtc.umn.edu/gkhome/metis/metis/overview
for partitioning some complex data. it worked fine for years until today (may 13)...

observations:
- the 32-bit metis build produces nice and balanced partitions.
- the 64-bit metis build produces bad and unbalanced partitions.

the metis engine uses arrays of integers on its public interface and, internally,
some float-based operations (x<y and x==y) that are unsafe in terms of precision.

so, i've built/tested the following metis variants:

1). -m32 -march=pentium4 -O1                         - works fine.
2). -m32 -march=pentium4 -O1 -mfpmath=sse            - works fine.
3). -m64 -march=x86-64 -O1                           - bad/unbalanced partitions.
4). -m64 -march=x86-64 -O1 -mfpmath=387              - bad/unbalanced partitions.

at this point i expected wrong results (< 80-bit precision) from variants 2/3
and good results from variants 1/4, but the real world differs.

next, i isolated one place in the sources with a float x<y statement and changed it
to (x-y)<0.00001. with that change both native variants (1 and 3) give nice/equivalent results.
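
The change described above swaps an exact x<y for a comparison against a small
tolerance. A minimal sketch of that idea in C follows. The helper names and the
eps parameter are hypothetical and are not taken from the METIS sources; the
0.00001 value is simply the one quoted above. Note that (x-y)<eps also holds when
x exceeds y by less than eps, so it is a looser test than x<y, not an equivalent one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of METIS: the replacement quoted above.
 * True when x is below y, or above y by less than eps. */
bool less_within(float x, float y, float eps)
{
    assert(eps >= 0.0f);
    return (x - y) < eps;
}

/* Hypothetical stricter variant: true only when x is below y by more
 * than eps, so near-ties are never reported as "less". */
bool definitely_less(float x, float y, float eps)
{
    assert(eps >= 0.0f);
    return (y - x) > eps;
}
```

Which of the two is appropriate depends on how the surrounding algorithm treats
near-ties; that is a question for someone who knows the partitioning code.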

so, where is the problem? are variants 1/4 really equivalent?

BR,
Paweł.

* Re: question about equivalent x87/x64-64 fpu code...
From: Andrew Haley @ 2011-05-16 11:03 UTC
  To: gcc-help

On 13/05/11 19:11, Paweł Sikora wrote:
> so, where is the problem? are variants 1/4 really equivalent?

It's going to be very hard for gcc specialists to answer this.  You really
need a numerical analyst who is familiar with the code to have a look.

This may be a gcc bug, or it may be a bug in the code.  It'd be impossible
to know without doing more digging into the problem.

Andrew.

* Re: question about equivalent x87/x64-64 fpu code...
From: Pawel Sikora @ 2011-05-16 22:58 UTC
  To: gcc-help; +Cc: Andrew Haley

On Monday 16 of May 2011 11:15:29 Andrew Haley wrote:
> It's going to be very hard for gcc specialists to answer this.  You really
> need a numerical analyst who is familiar with the code to have a look.
>
> This may be a gcc bug, or it may be a bug in the code.  It'd be impossible
> to know without doing more digging into the problem.

Hi,

naturally, i reported these numerical problems to the author in the first place,
but i'm still surprised that code produced by gcc for x87/x86-64 with explicit
and equal -mpc32/-mfpmath options gives different results.

the testcase compiled for 32/64-bit with SSE math, and fpu precision forced
to 32-bit, gives the same (bad) results:

$ LANG=C make METIS_VER=4.0.1 EXTRA_CFLAGS="-march=core2 -mpc32" EXTRA_CFLAGS32="-mfpmath=sse" EXTRA_CFLAGS64=""
compiling 32-bit metis-4.0.1 testcase...
gcc -m32 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32 -mfpmath=sse
gcc -m32 -lm *.o -o test32.m4.0.1 && rm *.o
./test32.m4.0.1 && mv test{,32.m4.0.1}.out
partition 0: lut+dram:  216506, flip-flop:   56961, bram:  141955
partition 1: lut+dram:   86815, flip-flop:   36485, bram:  143550
partition 2: lut+dram:  142807, flip-flop:   49038, bram:  151525
partition 3: lut+dram:  142517, flip-flop:   48918, bram:  149930
compiling 64-bit metis-4.0.1 testcase...
gcc -m64 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32
gcc -m64 -lm *.o -o test64.m4.0.1 && rm *.o
./test64.m4.0.1 && mv test{,64.m4.0.1}.out
partition 0: lut+dram:  216506, flip-flop:   56961, bram:  141955
partition 1: lut+dram:   86815, flip-flop:   36485, bram:  143550
partition 2: lut+dram:  142807, flip-flop:   49038, bram:  151525
partition 3: lut+dram:  142517, flip-flop:   48918, bram:  149930
diff -u test32.m4.0.1.out test64.m4.0.1.out >test.m4.0.1.out.diff

a similar variant with math forced to x87 behaves differently:

$ LANG=C make METIS_VER=4.0.1 EXTRA_CFLAGS="-march=core2 -mpc32" EXTRA_CFLAGS32="" EXTRA_CFLAGS64="-mfpmath=387"
compiling 32-bit metis-4.0.1 testcase...
gcc -m32 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32
gcc -m32 -lm *.o -o test32.m4.0.1 && rm *.o
./test32.m4.0.1 && mv test{,32.m4.0.1}.out
partition 0: lut+dram:  150173, flip-flop:   46357, bram:  141955
partition 1: lut+dram:  153148, flip-flop:   47089, bram:  143550
partition 2: lut+dram:  141322, flip-flop:   49043, bram:  151525
partition 3: lut+dram:  144002, flip-flop:   48913, bram:  149930
compiling 64-bit metis-4.0.1 testcase...
gcc -m64 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32 -mfpmath=387
gcc -m64 -lm *.o -o test64.m4.0.1 && rm *.o
./test64.m4.0.1 && mv test{,64.m4.0.1}.out
partition 0: lut+dram:  216506, flip-flop:   56961, bram:  141955
partition 1: lut+dram:   86815, flip-flop:   36485, bram:  143550
partition 2: lut+dram:  142807, flip-flop:   49038, bram:  151525
partition 3: lut+dram:  142517, flip-flop:   48918, bram:  149930
diff -u test32.m4.0.1.out test64.m4.0.1.out >test.m4.0.1.out.diff
make: *** [all] Error 1

but... adding -fexcess-precision=standard to the 32-bit testcase again gives me bad but equal results.

$ LANG=C make METIS_VER=4.0.1 EXTRA_CFLAGS="-march=core2 -mpc32" EXTRA_CFLAGS32="-fexcess-precision=standard" EXTRA_CFLAGS64="-mfpmath=387"
compiling 32-bit metis-4.0.1 testcase...
gcc -m32 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32 -fexcess-precision=standard
gcc -m32 -lm *.o -o test32.m4.0.1 && rm *.o
./test32.m4.0.1 && mv test{,32.m4.0.1}.out
partition 0: lut+dram:  216506, flip-flop:   56961, bram:  141955
partition 1: lut+dram:   86815, flip-flop:   36485, bram:  143550
partition 2: lut+dram:  142807, flip-flop:   49038, bram:  151525
partition 3: lut+dram:  142517, flip-flop:   48918, bram:  149930
compiling 64-bit metis-4.0.1 testcase...
gcc -m64 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32 -mfpmath=387
gcc -m64 -lm *.o -o test64.m4.0.1 && rm *.o
./test64.m4.0.1 && mv test{,64.m4.0.1}.out
partition 0: lut+dram:  216506, flip-flop:   56961, bram:  141955
partition 1: lut+dram:   86815, flip-flop:   36485, bram:  143550
partition 2: lut+dram:  142807, flip-flop:   49038, bram:  151525
partition 3: lut+dram:  142517, flip-flop:   48918, bram:  149930
diff -u test32.m4.0.1.out test64.m4.0.1.out >test.m4.0.1.out.diff


should -mpc32 and an equal fpmath model produce equal results (no matter good or bad)?
or maybe there's a bug in gcc exposed by the explicit -fexcess-precision option?
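
One way to see which evaluation model the compiler applies to each build is the
C99 macro FLT_EVAL_METHOD from <float.h>. The sketch below only reports that
value; the function names are hypothetical. It is an assumption to verify, not a
documented guarantee, that builds using SSE math report 0 and builds using x87
math report 2.

```c
#include <float.h>
#include <stdio.h>

/* Describe a C99 FLT_EVAL_METHOD value in words. */
const char *eval_method_name(int m)
{
    switch (m) {
    case 0:  return "operations evaluated in the precision of their own type";
    case 1:  return "float and double evaluated as double";
    case 2:  return "all operations evaluated as long double";
    default: return "indeterminable or implementation-defined";
    }
}

/* Print the evaluation method this translation unit was compiled with. */
void report_eval_method(void)
{
    int m = (int)FLT_EVAL_METHOD;
    printf("FLT_EVAL_METHOD = %d (%s)\n", m, eval_method_name(m));
}
```

Compiling this once per flag set from the runs above and calling
report_eval_method() would show whether two builds that look equal on the
command line are treated the same way by the compiler.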

should i report this as a potential gcc bug?

BR,
Paweł.

* Re: question about equivalent x87/x64-64 fpu code...
From: Andrew Haley @ 2011-05-16 23:57 UTC
  To: gcc-help

On 16/05/11 11:45, Pawel Sikora wrote:
> Hi,
> 
> naturally, i reported these numerical problems to the author in the first place,
> but i'm still surprised that code produced by gcc for x87/x86-64 with explicit
> and equal -mpc32/-mfpmath options gives different results.
> 
> should -mpc32 and an equal fpmath model produce equal results (no matter good or bad)?

Not necessarily.  Whatever libraries your code is calling won't be affected
by the compiler options you use, for example.
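
Since not everything in a process is governed by the flags given on one command
line, it can help to check the x87 precision setting at run time rather than
infer it from the build. The sketch below is an illustration under assumptions:
the function names are hypothetical, the reader uses GCC inline assembly and
exists only on x86 targets, and the decoder relies on the precision-control
field occupying bits 8-9 of the x87 control word.

```c
#include <stdint.h>

/* Decode the precision-control field (bits 8-9) of an x87 control word
 * into a significand width in bits; returns 0 for the reserved encoding. */
int x87_precision_bits(uint16_t cw)
{
    switch ((cw >> 8) & 0x3) {
    case 0:  return 24;  /* single precision */
    case 2:  return 53;  /* double precision */
    case 3:  return 64;  /* extended precision */
    default: return 0;   /* reserved */
    }
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Read the current x87 control word (GCC inline assembly, x86 only). */
uint16_t read_x87_control_word(void)
{
    uint16_t cw;
    __asm__ volatile ("fnstcw %0" : "=m" (cw));
    return cw;
}
#endif
```

Printing x87_precision_bits(read_x87_control_word()) at the start of main and
again just before the suspicious comparison would show whether the 32-bit and
64-bit binaries really run with the same precision setting.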

> or maybe there's a bug in gcc exposed by the explicit -fexcess-precision option?

Maybe.

> should i report this as a potential gcc bug?

No, because we haven't even established that there is a gcc bug yet.
There's little point in reporting a bug without a test case that shows what
gcc is doing wrong.
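
As a rough idea of what a reduced test case for this kind of report could look
like, the sketch below isolates a single float comparison. It is not derived
from METIS and the function name is hypothetical. For inputs such as 1.0f and
3.0f the result may depend on the target and on options such as -mfpmath and
-fexcess-precision, so no expected value is claimed here. A difference between
builds would not by itself demonstrate a gcc bug, because excess precision in
intermediate results is behaviour the C standard allows.

```c
/* Returns 1 when a quotient stored in a float compares equal to the same
 * quotient recomputed inside the comparison, 0 otherwise. The volatile
 * parameters keep the compiler from folding the divisions at compile time. */
int stored_equals_recomputed(volatile float a, volatile float b)
{
    float stored = a / b;     /* may be rounded to float when stored */
    return stored == a / b;   /* may be compared at a wider precision */
}
```

Calling it with 1.0f and 2.0f gives a quotient that is exact in every format
involved, which makes a useful control next to an inexact case like 1.0f / 3.0f.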

In general, floating-point on 64-bit x86 is better behaved than on 32-bit.

Andrew.
