* question about equivalent x87/x64-64 fpu code...
@ 2011-05-13 18:12 Paweł Sikora
  2011-05-16 11:03 ` Andrew Haley
From: Paweł Sikora @ 2011-05-13 18:12 UTC
To: gcc-help

Hi,

i'm using a 3rd-party engine http://glaros.dtc.umn.edu/gkhome/metis/metis/overview
for partitioning some complex data. it worked fine for years until today (may 13)...

observations:
- the 32-bit metis build produces nice and balanced partitions.
- the 64-bit metis build produces bad and unbalanced partitions.

the metis engine uses arrays of integers on the public interface and internally
some float-based operations that are unsafe in terms of precision (x<y and x==y).

so, i've built/tested the following metis variants:

1). -m32 -march=pentium4 -O1 - works fine.
2). -m32 -march=pentium4 -O1 -mfpmath=sse - works fine.
3). -m64 -march=x86-64 -O1 - bad/unbalanced partitions.
4). -m64 -march=x86-64 -O1 -mfpmath=387 - bad/unbalanced partitions.

at this point i expected wrong results (< 80-bit precision) from variants 2/3
and good results from variants 1/4, but the real world differs.

next, i've isolated one place in the sources with a float x<y stmt and changed it
to (x-y)<0.00001. with this change both native variants 1/3 give nice/equivalent results.

so, where is the problem? are variants 1/4 really equivalent?

BR,
Paweł.
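For readers following along, the source change described in the message above can be sketched in C. This is a minimal sketch only: the helper names `less_raw`, `less_shifted` and `less_with_margin` are hypothetical and do not come from METIS, and the tolerance is passed in as a parameter. One thing the sketch makes visible is that `(x - y) < eps` is true whenever `x < y + eps`, so values within `eps` of each other also compare as "less"; it changes how near-ties are resolved rather than only hiding rounding noise.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helpers for illustration; none of these names come from METIS. */

/* The original style of test: exact comparison of two floats. */
bool less_raw(float x, float y)
{
    return x < y;
}

/* The modified test from the message: (x - y) < eps.
 * True whenever x < y + eps, so near-ties also count as "less". */
bool less_shifted(float x, float y, float eps)
{
    assert(eps >= 0.0f);
    return (x - y) < eps;
}

/* A stricter alternative: x must be below y by more than eps. */
bool less_with_margin(float x, float y, float eps)
{
    assert(eps >= 0.0f);
    return (y - x) > eps;
}
```

With equal inputs, `less_raw` and `less_with_margin` both report false while `less_shifted` reports true, which may be why the modified build settles on the same partitions in both variants.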
* Re: question about equivalent x87/x64-64 fpu code...
  2011-05-13 18:12 question about equivalent x87/x64-64 fpu code Paweł Sikora
@ 2011-05-16 11:03 ` Andrew Haley
  2011-05-16 22:58   ` Pawel Sikora
From: Andrew Haley @ 2011-05-16 11:03 UTC
To: gcc-help

On 13/05/11 19:11, Paweł Sikora wrote:
> [...]
> so, where is the problem? is the variants 1/4 really equivalent?

It's going to be very hard for gcc specialists to answer this.  You really
need a numerical analyst who is familiar with the code to have a look.

This may be a gcc bug, or it may be a bug in the code.  It's impossible
to know without doing more digging into the problem.

Andrew.
* Re: question about equivalent x87/x64-64 fpu code...
  2011-05-16 11:03 ` Andrew Haley
@ 2011-05-16 22:58   ` Pawel Sikora
  2011-05-16 23:57     ` Andrew Haley
From: Pawel Sikora @ 2011-05-16 22:58 UTC
To: gcc-help; +Cc: Andrew Haley

On Monday 16 of May 2011 11:15:29 Andrew Haley wrote:
> On 13/05/11 19:11, Paweł Sikora wrote:
> > [...]
> > so, where is the problem? is the variants 1/4 really equivalent?
>
> It's going to be very hard for gcc specialists to answer this.  You really
> need a numerical analyst who is familiar with the code to have a look.
>
> This may be a gcc bug, or it may be a bug in the code.  It'd impossible
> to know without doing more digging into the problem.

Hi,

i've naturally reported these numerical problems to the author in the first place,
but i'm still surprised that code produced by gcc for x87/x86-64 with explicit
and equal -mpc32/-mfpmath options gives different results.

the testcase compiled for 32/64-bit with SSE math and fpu precision forced to 32-bit
gives the same (bad) results:

$ LANG=C make METIS_VER=4.0.1 EXTRA_CFLAGS="-march=core2 -mpc32" EXTRA_CFLAGS32="-mfpmath=sse" EXTRA_CFLAGS64=""
compiling 32-bit metis-4.0.1 testcase...
gcc -m32 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32 -mfpmath=sse
gcc -m32 -lm *.o -o test32.m4.0.1 && rm *.o
./test32.m4.0.1 && mv test{,32.m4.0.1}.out
partition 0: lut+dram: 216506, flip-flop: 56961, bram: 141955
partition 1: lut+dram: 86815, flip-flop: 36485, bram: 143550
partition 2: lut+dram: 142807, flip-flop: 49038, bram: 151525
partition 3: lut+dram: 142517, flip-flop: 48918, bram: 149930
compiling 64-bit metis-4.0.1 testcase...
gcc -m64 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32
gcc -m64 -lm *.o -o test64.m4.0.1 && rm *.o
./test64.m4.0.1 && mv test{,64.m4.0.1}.out
partition 0: lut+dram: 216506, flip-flop: 56961, bram: 141955
partition 1: lut+dram: 86815, flip-flop: 36485, bram: 143550
partition 2: lut+dram: 142807, flip-flop: 49038, bram: 151525
partition 3: lut+dram: 142517, flip-flop: 48918, bram: 149930
diff -u test32.m4.0.1.out test64.m4.0.1.out >test.m4.0.1.out.diff

a similar variant with math forced to x87 behaves differently:

$ LANG=C make METIS_VER=4.0.1 EXTRA_CFLAGS="-march=core2 -mpc32" EXTRA_CFLAGS32="" EXTRA_CFLAGS64="-mfpmath=387"
compiling 32-bit metis-4.0.1 testcase...
gcc -m32 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32
gcc -m32 -lm *.o -o test32.m4.0.1 && rm *.o
./test32.m4.0.1 && mv test{,32.m4.0.1}.out
partition 0: lut+dram: 150173, flip-flop: 46357, bram: 141955
partition 1: lut+dram: 153148, flip-flop: 47089, bram: 143550
partition 2: lut+dram: 141322, flip-flop: 49043, bram: 151525
partition 3: lut+dram: 144002, flip-flop: 48913, bram: 149930
compiling 64-bit metis-4.0.1 testcase...
gcc -m64 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32 -mfpmath=387
gcc -m64 -lm *.o -o test64.m4.0.1 && rm *.o
./test64.m4.0.1 && mv test{,64.m4.0.1}.out
partition 0: lut+dram: 216506, flip-flop: 56961, bram: 141955
partition 1: lut+dram: 86815, flip-flop: 36485, bram: 143550
partition 2: lut+dram: 142807, flip-flop: 49038, bram: 151525
partition 3: lut+dram: 142517, flip-flop: 48918, bram: 149930
diff -u test32.m4.0.1.out test64.m4.0.1.out >test.m4.0.1.out.diff
make: *** [all] Error 1

but... adding -fexcess-precision=standard to the 32-bit testcase again gives me bad but equal results:

$ LANG=C make METIS_VER=4.0.1 EXTRA_CFLAGS="-march=core2 -mpc32" EXTRA_CFLAGS32="-fexcess-precision=standard" EXTRA_CFLAGS64="-mfpmath=387"
compiling 32-bit metis-4.0.1 testcase...
gcc -m32 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32 -fexcess-precision=standard
gcc -m32 -lm *.o -o test32.m4.0.1 && rm *.o
./test32.m4.0.1 && mv test{,32.m4.0.1}.out
partition 0: lut+dram: 216506, flip-flop: 56961, bram: 141955
partition 1: lut+dram: 86815, flip-flop: 36485, bram: 143550
partition 2: lut+dram: 142807, flip-flop: 49038, bram: 151525
partition 3: lut+dram: 142517, flip-flop: 48918, bram: 149930
compiling 64-bit metis-4.0.1 testcase...
gcc -m64 -O1 -Imetis-4.0.1 metis-4.0.1/*.c test.c -c -march=core2 -mpc32 -mfpmath=387
gcc -m64 -lm *.o -o test64.m4.0.1 && rm *.o
./test64.m4.0.1 && mv test{,64.m4.0.1}.out
partition 0: lut+dram: 216506, flip-flop: 56961, bram: 141955
partition 1: lut+dram: 86815, flip-flop: 36485, bram: 143550
partition 2: lut+dram: 142807, flip-flop: 49038, bram: 151525
partition 3: lut+dram: 142517, flip-flop: 48918, bram: 149930
diff -u test32.m4.0.1.out test64.m4.0.1.out >test.m4.0.1.out.diff

should -mpc32 and an equal fpmath model produce equal results (no matter good or bad)?
or maybe there's a bug in gcc exposed by the explicit -fexcess-precision option?
should i report this as a potential gcc bug?

BR,
Paweł.
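The kind of effect being asked about can be shown in isolation. The following is a minimal sketch, not METIS code and not a reproduction of what gcc emitted for any of the builds above: the helper names are hypothetical, `long double` merely stands in for a value that stays in a wider x87 register, and the sketch assumes IEEE single-precision `float` and default x87 precision control (that is, built without `-mpc32`, which I would expect to change the second helper's result on x87 targets).

```c
#include <assert.h>

/* Hypothetical helpers for illustration; not taken from METIS or gcc. */

/* Compare two quotients after rounding each one to float.
 * The volatile stores force the values out of any wider register. */
int less_as_float(float a, float b, float c, float d)
{
    assert(b != 0.0f && d != 0.0f);
    volatile float lhs = a / b;
    volatile float rhs = c / d;
    return lhs < rhs;
}

/* Compare the same quotients while keeping them in long double,
 * standing in for intermediates that keep excess precision. */
int less_as_extended(float a, float b, float c, float d)
{
    assert(b != 0.0f && d != 0.0f);
    long double lhs = (long double)a / b;
    long double rhs = (long double)c / d;
    return lhs < rhs;
}
```

Calling both helpers with a = 1, b = 3, c = the float nearest to 1/3 and d = 1 should give 0 from the first and 1 from the second under the assumptions above, because 1/3 rounds upward when stored as a float. A single `x<y` that flips like this could be enough to send a partitioner down a different path.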
* Re: question about equivalent x87/x64-64 fpu code...
  2011-05-16 22:58   ` Pawel Sikora
@ 2011-05-16 23:57     ` Andrew Haley
From: Andrew Haley @ 2011-05-16 23:57 UTC
To: gcc-help

On 16/05/11 11:45, Pawel Sikora wrote:
> [...]
> i've naturally reported these numerical problems to the author at first place
> but i'm still impressed that code produced by gcc for x87/x86-64 with explicit
> and equal -mpc32/-mfpmath options gives different results.
>
> should -mpc32 and equal fpmath model produce equal results (no matter good or bad) ?

Not necessarily.  Whatever libraries your code is calling won't be affected
by the compiler options you use, for example.

> or maybe there's a bug in gcc exposed by explicit -fexcess-precision option?

Maybe.

> should i report this as potential gcc bug?

No, because we haven't even established that there is a gcc bug yet.
There's little point in reporting a bug without a test case that shows
what gcc is doing wrong.

In general, floating-point on 64-bit x86 is better behaved than on 32-bit.

Andrew.
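One possible way to narrow such a problem down to a test case is to log the operands of a suspect comparison as exact bit patterns, so that the 32-bit and 64-bit runs can be diffed line by line. The sketch below is only an illustration under assumptions: `float_bits` and `trace_less` are hypothetical names, nothing here comes from METIS, and it assumes a 32-bit `float` and a C99 `<float.h>` that defines `FLT_EVAL_METHOD`.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tracing helpers; not part of METIS or gcc. */

/* Return the raw bit pattern of a float (assumes a 32-bit float). */
uint32_t float_bits(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Log one comparison: evaluation method, exact operands, and the outcome. */
void trace_less(FILE *out, const char *tag, float x, float y)
{
    fprintf(out, "%s FLT_EVAL_METHOD=%d x=%08lx y=%08lx x<y=%d\n",
            tag, (int)FLT_EVAL_METHOD,
            (unsigned long)float_bits(x),
            (unsigned long)float_bits(y),
            x < y);
}
```

Passing the values through a function call as `float` may itself change what is compared, so the first diverging line would be a hint about where to look rather than proof of what the compiler did.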