* incorrectly rounded square root @ 2021-05-04 8:08 Paul Zimmermann 2021-05-31 20:52 ` Jeff Johnston 0 siblings, 1 reply; 19+ messages in thread From: Paul Zimmermann @ 2021-05-04 8:08 UTC (permalink / raw) To: newlib Hi, according to IEEE 754, the square root function should be correctly rounded for all rounding modes. I noticed this is not the case in Newlib: $ cat test_sqrt.c #include <stdio.h> #include <math.h> #include <fenv.h> #ifdef NEWLIB int errno; int* __errno () { return &errno; } #endif int main() { int rnd[4] = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD }; char Rnd[4] = "NZUD"; float x = 0x1.ff07fep+127f; float y; for (int i = 0; i < 4; i++) { fesetround (rnd[i]); y = sqrtf (x); printf ("RND%c: %a\n", Rnd[i], y); } } $ gcc -DNEWLIB -fno-builtin test_sqrt.c /localdisk/zimmerma/newlib-4.1.0/libm.a -lm $ ./a.out RNDN: 0x1.ff83fp+63 RNDZ: 0x1.ff83fp+63 RNDU: 0x1.ff83fp+63 RNDD: 0x1.ff83fp+63 The RNDZ and RNDD results are wrong. With glibc I get: $ gcc -fno-builtin test_sqrt.c -lm $ ./a.out RNDN: 0x1.ff83fp+63 RNDZ: 0x1.ff83eep+63 RNDU: 0x1.ff83fp+63 RNDD: 0x1.ff83eep+63 Best regards, Paul Zimmermann ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: incorrectly rounded square root 2021-05-04 8:08 incorrectly rounded square root Paul Zimmermann @ 2021-05-31 20:52 ` Jeff Johnston 2021-06-01 7:11 ` Paul Zimmermann 0 siblings, 1 reply; 19+ messages in thread From: Jeff Johnston @ 2021-05-31 20:52 UTC (permalink / raw) To: Paul Zimmermann; +Cc: Newlib Hi Paul, Not all platforms supply a machine implementation of fesetround. Which platform are you using? -- Jeff J. On Tue, May 4, 2021 at 4:09 AM Paul Zimmermann <Paul.Zimmermann@inria.fr> wrote: > Hi, > > according to IEEE 754, the square root function should be correctly rounded > for all rounding modes. I noticed this is not the case in Newlib: > > $ cat test_sqrt.c > #include <stdio.h> > #include <math.h> > #include <fenv.h> > > #ifdef NEWLIB > int errno; > int* __errno () { return &errno; } > #endif > > int main() > { > int rnd[4] = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD }; > char Rnd[4] = "NZUD"; > float x = 0x1.ff07fep+127f; > float y; > for (int i = 0; i < 4; i++) > { > fesetround (rnd[i]); > y = sqrtf (x); > printf ("RND%c: %a\n", Rnd[i], y); > } > } > > $ gcc -DNEWLIB -fno-builtin test_sqrt.c > /localdisk/zimmerma/newlib-4.1.0/libm.a -lm > $ ./a.out > RNDN: 0x1.ff83fp+63 > RNDZ: 0x1.ff83fp+63 > RNDU: 0x1.ff83fp+63 > RNDD: 0x1.ff83fp+63 > > The RNDZ and RNDD results are wrong. With glibc I get: > > $ gcc -fno-builtin test_sqrt.c -lm > $ ./a.out > RNDN: 0x1.ff83fp+63 > RNDZ: 0x1.ff83eep+63 > RNDU: 0x1.ff83fp+63 > RNDD: 0x1.ff83eep+63 > > Best regards, > Paul Zimmermann > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: incorrectly rounded square root 2021-05-31 20:52 ` Jeff Johnston @ 2021-06-01 7:11 ` Paul Zimmermann 2021-06-01 16:28 ` Jeff Johnston 0 siblings, 1 reply; 19+ messages in thread From: Paul Zimmermann @ 2021-06-01 7:11 UTC (permalink / raw) To: Jeff Johnston; +Cc: newlib Hi Jeff, > Not all platforms supply a machine implementation of fesetround. Which > platform are you using? this is on a Intel(R) Core(TM) i5-4590, with gcc 10.2.1, under Debian GNU/Linux 11 (bullseye). Best regards, Paul ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: incorrectly rounded square root 2021-06-01 7:11 ` Paul Zimmermann @ 2021-06-01 16:28 ` Jeff Johnston 2021-06-02 7:51 ` Paul Zimmermann 0 siblings, 1 reply; 19+ messages in thread From: Jeff Johnston @ 2021-06-01 16:28 UTC (permalink / raw) To: Paul Zimmermann; +Cc: Newlib, joel Ok, not sure what is happening. You should be using the newlib/libm/machine/shared_x86/fenv.c which includes fesetround which you can confirm. Comparing to glibc's version of fesetround, it is very similar. The only difference appears to be that glibc does the x87 fpu first, then the MSCSR register for sse, where newlib's version intermixes the code, but the same insn's are used. Looking at the sqrtf implementation of newlib's libm/math/ef_sqrt.c vs glibc's sysdeps/ieee754/flt-32/e_sqrtf.c the rounding logic is identical. /* use floating add to find out rounding direction */ if(ix!=0) { z = one-tiny; /* trigger inexact flag */ if (z>=one) { z = one+tiny; if (z>one) q += 2; else q += (q&1); } } ix = (q>>1)+0x3f000000; ix += (m <<23); SET_FLOAT_WORD(z,ix); return z; In fact, other than the normalization code, the code is identical (not sure that difference is applicable here). Corinna is on vacation so I'm cc'ing Joel in case he can offer some input. -- Jeff J. On Tue, Jun 1, 2021 at 3:12 AM Paul Zimmermann <Paul.Zimmermann@inria.fr> wrote: > Hi Jeff, > > > Not all platforms supply a machine implementation of fesetround. Which > > platform are you using? > > this is on a Intel(R) Core(TM) i5-4590, with gcc 10.2.1, under > Debian GNU/Linux 11 (bullseye). > > Best regards, > Paul > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: incorrectly rounded square root 2021-06-01 16:28 ` Jeff Johnston @ 2021-06-02 7:51 ` Paul Zimmermann 2021-06-02 13:07 ` Joel Sherrill 0 siblings, 1 reply; 19+ messages in thread From: Paul Zimmermann @ 2021-06-02 7:51 UTC (permalink / raw) To: Jeff Johnston; +Cc: newlib, joel Hi Jeff, thank you for your answer. After investigating more it seems fesetround() is ineffective with newlib. Indeed if I print the result of fegetround() just after the fesetround() calls I get: fegetround: 0 RNDN: 0x1.ff83fp+63 fegetround: 0 RNDZ: 0x1.ff83fp+63 fegetround: 0 RNDU: 0x1.ff83fp+63 fegetround: 0 RNDD: 0x1.ff83fp+63 whereas with GNU libc I get: fegetround: 0 RNDN: 0x1.ff83fp+63 fegetround: 3072 RNDZ: 0x1.ff83eep+63 fegetround: 2048 RNDU: 0x1.ff83fp+63 fegetround: 1024 RNDD: 0x1.ff83eep+63 Thus it seems this has nothing to do with the square root. According to gdb the fesetround() code used is the following: (gdb) break fesetround Breakpoint 1 at 0x1650: file ../../../../../../newlib/libm/machine/x86_64/fenv.c, line 371. Paul ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: incorrectly rounded square root 2021-06-02 7:51 ` Paul Zimmermann @ 2021-06-02 13:07 ` Joel Sherrill 2021-06-02 18:43 ` Jeff Johnston 0 siblings, 1 reply; 19+ messages in thread From: Joel Sherrill @ 2021-06-02 13:07 UTC (permalink / raw) To: Paul Zimmermann; +Cc: Jeff Johnston, Newlib On Wed, Jun 2, 2021 at 2:51 AM Paul Zimmermann <Paul.Zimmermann@inria.fr> wrote: > Hi Jeff, > > thank you for your answer. After investigating more it seems fesetround() > is ineffective with newlib. Indeed if I print the result of fegetround() > just after the fesetround() calls I get: > > fegetround: 0 > RNDN: 0x1.ff83fp+63 > fegetround: 0 > RNDZ: 0x1.ff83fp+63 > fegetround: 0 > RNDU: 0x1.ff83fp+63 > fegetround: 0 > RNDD: 0x1.ff83fp+63 > > whereas with GNU libc I get: > > fegetround: 0 > RNDN: 0x1.ff83fp+63 > fegetround: 3072 > RNDZ: 0x1.ff83eep+63 > fegetround: 2048 > RNDU: 0x1.ff83fp+63 > fegetround: 1024 > RNDD: 0x1.ff83eep+63 > > Thus it seems this has nothing to do with the square root. > > According to gdb the fesetround() code used is the following: > > (gdb) break fesetround > Breakpoint 1 at 0x1650: file > ../../../../../../newlib/libm/machine/x86_64/fenv.c, line 371. > That is close to what I see in the source. That matches a variable declaration in fesetround(). I was concerned that maybe the stub magic was resulting in empty methods getting in for one of these. But doing an objdump on i386-rtems, that doesn't appear to be the case. The implementation came from Cygwin and was lightly massaged to get here. A student and I migrated that code to newlib in Feb 2020. Corinna did some work in March 2021 to use this for Cygwin but I don't see any reason it is broken. The i386-rtems installed sys/fenv.h is the one for x86. fegetround() is quite simple and fesetround isn't much more than that. https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libm/machine/shared_x86/fenv.c;h=ccc08e2d8103ab974baf8d9591f71e5565d73ace;hb=HEAD#l350 I really expected to see a build issue and you using a stub. Could you confirm that the code we expect to be in those methods really is there for you? The line number indicates you have the expected code but.... The rounding is set in the x87 control register and then if SSE is there (dynamic check), it is also set in the SSE control register. The rounding more is read from the x86 control register and the SSE control register is ignored. Could you step through get and set and see if it looks like the x87 control register is actually changed? I'm confused. This looks like it should work. --joel > Paul > > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: incorrectly rounded square root 2021-06-02 13:07 ` Joel Sherrill @ 2021-06-02 18:43 ` Jeff Johnston 2021-06-02 19:07 ` Marco Atzeri 0 siblings, 1 reply; 19+ messages in thread From: Jeff Johnston @ 2021-06-02 18:43 UTC (permalink / raw) To: joel; +Cc: Paul Zimmermann, Newlib On Wed, Jun 2, 2021 at 9:08 AM Joel Sherrill <joel@rtems.org> wrote: > > > On Wed, Jun 2, 2021 at 2:51 AM Paul Zimmermann <Paul.Zimmermann@inria.fr> > wrote: > >> Hi Jeff, >> >> thank you for your answer. After investigating more it seems fesetround() >> is ineffective with newlib. Indeed if I print the result of fegetround() >> just after the fesetround() calls I get: >> >> fegetround: 0 >> RNDN: 0x1.ff83fp+63 >> fegetround: 0 >> RNDZ: 0x1.ff83fp+63 >> fegetround: 0 >> RNDU: 0x1.ff83fp+63 >> fegetround: 0 >> RNDD: 0x1.ff83fp+63 >> >> whereas with GNU libc I get: >> >> fegetround: 0 >> RNDN: 0x1.ff83fp+63 >> fegetround: 3072 >> RNDZ: 0x1.ff83eep+63 >> fegetround: 2048 >> RNDU: 0x1.ff83fp+63 >> fegetround: 1024 >> RNDD: 0x1.ff83eep+63 >> >> Thus it seems this has nothing to do with the square root. >> >> According to gdb the fesetround() code used is the following: >> >> (gdb) break fesetround >> Breakpoint 1 at 0x1650: file >> ../../../../../../newlib/libm/machine/x86_64/fenv.c, line 371. >> > > That is close to what I see in the source. That matches a variable > declaration in fesetround(). > > I was concerned that maybe the stub magic was resulting in empty > methods getting in for one of these. But doing an objdump on i386-rtems, > that doesn't appear to be the case. > > The implementation came from Cygwin and was lightly massaged to > get here. A student and I migrated that code to newlib in Feb 2020. > Corinna did some work in March 2021 to use this for Cygwin but I > don't see any reason it is broken. The i386-rtems installed sys/fenv.h > is the one for x86. > > fegetround() is quite simple and fesetround isn't much more than that. > > > https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libm/machine/shared_x86/fenv.c;h=ccc08e2d8103ab974baf8d9591f71e5565d73ace;hb=HEAD#l350 > > I really expected to see a build issue and you using a stub. Could you > confirm that the code we expect to be in those methods really is there > for you? The line number indicates you have the expected code but.... > > The rounding is set in the x87 control register and then if SSE is there > (dynamic check), it is also set in the SSE control register. The rounding > more is read from the x86 control register and the SSE control register is > ignored. > > Could you step through get and set and see if it looks like the x87 > control register is actually changed? > > I'm confused. This looks like it should work. > > --joel > > I'll second Joel's comment. The code is extremely close to the glibc code both in sqrtf and fesetround. The only thing I can think of is that the glibc code does the x87 stuff first and does the set back into FPU state before doing the SSE stuff. The newlib code sets back the FPU state at the end after the SSE stuff. Don't know if this is relevant or not. Any Cygwin users out there who can verify that the code is working/not working for them? -- Jeff J. >> Paul >> >> >> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: incorrectly rounded square root 2021-06-02 18:43 ` Jeff Johnston @ 2021-06-02 19:07 ` Marco Atzeri 2021-06-02 19:12 ` Joel Sherrill 2021-06-03 11:25 ` Paul Zimmermann 0 siblings, 2 replies; 19+ messages in thread From: Marco Atzeri @ 2021-06-02 19:07 UTC (permalink / raw) To: newlib On 02.06.2021 20:43, Jeff Johnston wrote: > On Wed, Jun 2, 2021 at 9:08 AM Joel Sherrill <joel@rtems.org> wrote: > >> >> >> On Wed, Jun 2, 2021 at 2:51 AM Paul Zimmermann <Paul.Zimmermann@inria.fr> >> wrote: >> >> > I'll second Joel's comment. The code is extremely close to the glibc code > both in sqrtf and fesetround. The only > thing I can think of is that the glibc code does the x87 stuff first and > does the set back into FPU state before doing the > SSE stuff. The newlib code sets back the FPU state at the end after the > SSE stuff. Don't know if this is relevant or not. > > Any Cygwin users out there who can verify that the code is working/not > working for them? > > -- Jeff J. > current Cygwin produces for both i686 and X86_64 $ gcc -DNEWLIB -fno-builtin test_sqrt.c -lm $ ./a.exe RNDN: 0x1.ff83fp+63 RNDZ: 0x1.ff83fp+63 RNDU: 0x1.ff83fp+63 RNDD: 0x1.ff83fp+63 ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: incorrectly rounded square root 2021-06-02 19:07 ` Marco Atzeri @ 2021-06-02 19:12 ` Joel Sherrill 2021-06-03 3:01 ` Jeff Johnston 2021-06-03 11:25 ` Paul Zimmermann 1 sibling, 1 reply; 19+ messages in thread From: Joel Sherrill @ 2021-06-02 19:12 UTC (permalink / raw) To: Marco Atzeri; +Cc: Newlib On Wed, Jun 2, 2021, 2:08 PM Marco Atzeri <marco.atzeri@gmail.com> wrote: > On 02.06.2021 20:43, Jeff Johnston wrote: > > On Wed, Jun 2, 2021 at 9:08 AM Joel Sherrill <joel@rtems.org> wrote: > > > >> > >> > >> On Wed, Jun 2, 2021 at 2:51 AM Paul Zimmermann < > Paul.Zimmermann@inria.fr> > >> wrote: > >> > > >> > > I'll second Joel's comment. The code is extremely close to the glibc > code > > both in sqrtf and fesetround. The only > > thing I can think of is that the glibc code does the x87 stuff first and > > does the set back into FPU state before doing the > > SSE stuff. The newlib code sets back the FPU state at the end after the > > SSE stuff. Don't know if this is relevant or not. > > > > Any Cygwin users out there who can verify that the code is working/not > > working for them? > > > > -- Jeff J. > > > > current Cygwin produces for both i686 and X86_64 > > $ gcc -DNEWLIB -fno-builtin test_sqrt.c -lm > > $ ./a.exe > RNDN: 0x1.ff83fp+63 > RNDZ: 0x1.ff83fp+63 > RNDU: 0x1.ff83fp+63 > RNDD: 0x1.ff83fp+63 > Does fegetround() return different values based on what mode was set? --joel > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: incorrectly rounded square root 2021-06-02 19:12 ` Joel Sherrill @ 2021-06-03 3:01 ` Jeff Johnston 2021-06-03 10:21 ` Paul Zimmermann 0 siblings, 1 reply; 19+ messages in thread From: Jeff Johnston @ 2021-06-03 3:01 UTC (permalink / raw) To: joel; +Cc: Marco Atzeri, Newlib [-- Attachment #1: Type: text/plain, Size: 1741 bytes --] Something weird is going on. I condensed the newlib code into the following files and when I compile and run the test, it works as expected: gcc -std=c99 -g3 -O0 test_sqrt.c fesetround.c ef_sqrt.c [jjohnstn@localhost shared_x86]$ ./a.out RNDN: 0x1.ff83fp+63 RNDZ: 0x1.ff83eep+63 RNDU: 0x1.ff83fp+63 RNDD: 0x1.ff83eep+63 Can you verify that you are calling the fesetround in fenv.c and what configuration call did you use for building newlib? -- Jeff J. On Wed, Jun 2, 2021 at 3:12 PM Joel Sherrill <joel@rtems.org> wrote: > On Wed, Jun 2, 2021, 2:08 PM Marco Atzeri <marco.atzeri@gmail.com> wrote: > > > On 02.06.2021 20:43, Jeff Johnston wrote: > > > On Wed, Jun 2, 2021 at 9:08 AM Joel Sherrill <joel@rtems.org> wrote: > > > > > >> > > >> > > >> On Wed, Jun 2, 2021 at 2:51 AM Paul Zimmermann < > > Paul.Zimmermann@inria.fr> > > >> wrote: > > >> > > > > >> > > > I'll second Joel's comment. The code is extremely close to the glibc > > code > > > both in sqrtf and fesetround. The only > > > thing I can think of is that the glibc code does the x87 stuff first > and > > > does the set back into FPU state before doing the > > > SSE stuff. The newlib code sets back the FPU state at the end after > the > > > SSE stuff. Don't know if this is relevant or not. > > > > > > Any Cygwin users out there who can verify that the code is working/not > > > working for them? > > > > > > -- Jeff J. > > > > > > > current Cygwin produces for both i686 and X86_64 > > > > $ gcc -DNEWLIB -fno-builtin test_sqrt.c -lm > > > > $ ./a.exe > > RNDN: 0x1.ff83fp+63 > > RNDZ: 0x1.ff83fp+63 > > RNDU: 0x1.ff83fp+63 > > RNDD: 0x1.ff83fp+63 > > > > Does fegetround() return different values based on what mode was set? > > --joel > > > > > > > [-- Attachment #2: ef_sqrt.c --] [-- Type: text/x-csrc, Size: 1939 bytes --] /* ef_sqrtf.c -- float version of e_sqrt.c. * Conversion to float by Ian Lance Taylor, Cygnus Support, ian@cygnus.com. */ /* * ==================================================== * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved. * * Developed at SunPro, a Sun Microsystems, Inc. business. * Permission to use, copy, modify, and distribute this * software is freely granted, provided that this notice * is preserved. * ==================================================== */ #include "fdlibm.h" #ifdef __STDC__ static const float one = 1.0, tiny=1.0e-30; #else static float one = 1.0, tiny=1.0e-30; #endif float sqrtf(float x) { float z; __uint32_t r,hx; __int32_t ix,s,q,m,t,i; GET_FLOAT_WORD(ix,x); hx = ix&0x7fffffff; /* take care of Inf and NaN */ if(!FLT_UWORD_IS_FINITE(hx)) return x*x+x; /* sqrt(NaN)=NaN, sqrt(+inf)=+inf sqrt(-inf)=sNaN */ /* take care of zero and -ves */ if(FLT_UWORD_IS_ZERO(hx)) return x;/* sqrt(+-0) = +-0 */ if(ix<0) return (x-x)/(x-x); /* sqrt(-ve) = sNaN */ /* normalize x */ m = (ix>>23); if(FLT_UWORD_IS_SUBNORMAL(hx)) { /* subnormal x */ for(i=0;(ix&0x00800000L)==0;i++) ix<<=1; m -= i-1; } m -= 127; /* unbias exponent */ ix = (ix&0x007fffffL)|0x00800000L; if(m&1) /* odd m, double x to make it even */ ix += ix; m >>= 1; /* m = [m/2] */ /* generate sqrt(x) bit by bit */ ix += ix; q = s = 0; /* q = sqrt(x) */ r = 0x01000000L; /* r = moving bit from right to left */ while(r!=0) { t = s+r; if(t<=ix) { s = t+r; ix -= t; q += r; } ix += ix; r>>=1; } /* use floating add to find out rounding direction */ if(ix!=0) { z = one-tiny; /* trigger inexact flag */ if (z>=one) { z = one+tiny; if (z>one) q += 2; else q += (q&1); } } ix = (q>>1)+0x3f000000L; ix += (m <<23); SET_FLOAT_WORD(z,ix); return z; } [-- Attachment #3: fdlibm.h --] [-- Type: text/x-chdr, Size: 12567 bytes --] /* @(#)fdlibm.h 5.1 93/09/24 */ /* * ==================================================== * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved. * * Developed at SunPro, a Sun Microsystems, Inc. business. * Permission to use, copy, modify, and distribute this * software is freely granted, provided that this notice * is preserved. * ==================================================== */ /* REDHAT LOCAL: Include files. */ #include <sys/types.h> #define __IEEE_LITTLE_ENDIAN /* Most routines need to check whether a float is finite, infinite, or not a number, and many need to know whether the result of an operation will overflow. These conditions depend on whether the largest exponent is used for NaNs & infinities, or whether it's used for finite numbers. The macros below wrap up that kind of information: FLT_UWORD_IS_FINITE(X) True if a positive float with bitmask X is finite. FLT_UWORD_IS_NAN(X) True if a positive float with bitmask X is not a number. FLT_UWORD_IS_INFINITE(X) True if a positive float with bitmask X is +infinity. FLT_UWORD_MAX The bitmask of FLT_MAX. FLT_UWORD_HALF_MAX The bitmask of FLT_MAX/2. FLT_UWORD_EXP_MAX The bitmask of the largest finite exponent (129 if the largest exponent is used for finite numbers, 128 otherwise). FLT_UWORD_LOG_MAX The bitmask of log(FLT_MAX), rounded down. This value is the largest input that can be passed to exp() without producing overflow. FLT_UWORD_LOG_2MAX The bitmask of log(2*FLT_MAX), rounded down. This value is the largest input than can be passed to cosh() without producing overflow. FLT_LARGEST_EXP The largest biased exponent that can be used for finite numbers (255 if the largest exponent is used for finite numbers, 254 otherwise) */ #ifdef _FLT_LARGEST_EXPONENT_IS_NORMAL #define FLT_UWORD_IS_FINITE(x) 1 #define FLT_UWORD_IS_NAN(x) 0 #define FLT_UWORD_IS_INFINITE(x) 0 #define FLT_UWORD_MAX 0x7fffffff #define FLT_UWORD_EXP_MAX 0x43010000 #define FLT_UWORD_LOG_MAX 0x42b2d4fc #define FLT_UWORD_LOG_2MAX 0x42b437e0 #define HUGE ((float)0X1.FFFFFEP128) #else #define FLT_UWORD_IS_FINITE(x) ((x)<0x7f800000L) #define FLT_UWORD_IS_NAN(x) ((x)>0x7f800000L) #define FLT_UWORD_IS_INFINITE(x) ((x)==0x7f800000L) #define FLT_UWORD_MAX 0x7f7fffffL #define FLT_UWORD_EXP_MAX 0x43000000 #define FLT_UWORD_LOG_MAX 0x42b17217 #define FLT_UWORD_LOG_2MAX 0x42b2d4fc #define HUGE ((float)3.40282346638528860e+38) #endif #define FLT_UWORD_HALF_MAX (FLT_UWORD_MAX-(1L<<23)) #define FLT_LARGEST_EXP (FLT_UWORD_MAX>>23) /* Many routines check for zero and subnormal numbers. Such things depend on whether the target supports denormals or not: FLT_UWORD_IS_ZERO(X) True if a positive float with bitmask X is +0. Without denormals, any float with a zero exponent is a +0 representation. With denormals, the only +0 representation is a 0 bitmask. FLT_UWORD_IS_SUBNORMAL(X) True if a non-zero positive float with bitmask X is subnormal. (Routines should check for zeros first.) FLT_UWORD_MIN The bitmask of the smallest float above +0. Call this number REAL_FLT_MIN... FLT_UWORD_EXP_MIN The bitmask of the float representation of REAL_FLT_MIN's exponent. FLT_UWORD_LOG_MIN The bitmask of |log(REAL_FLT_MIN)|, rounding down. FLT_SMALLEST_EXP REAL_FLT_MIN's exponent - EXP_BIAS (1 if denormals are not supported, -22 if they are). */ #ifdef _FLT_NO_DENORMALS #define FLT_UWORD_IS_ZERO(x) ((x)<0x00800000L) #define FLT_UWORD_IS_SUBNORMAL(x) 0 #define FLT_UWORD_MIN 0x00800000 #define FLT_UWORD_EXP_MIN 0x42fc0000 #define FLT_UWORD_LOG_MIN 0x42aeac50 #define FLT_SMALLEST_EXP 1 #else #define FLT_UWORD_IS_ZERO(x) ((x)==0) #define FLT_UWORD_IS_SUBNORMAL(x) ((x)<0x00800000L) #define FLT_UWORD_MIN 0x00000001 #define FLT_UWORD_EXP_MIN 0x43160000 #define FLT_UWORD_LOG_MIN 0x42cff1b5 #define FLT_SMALLEST_EXP -22 #endif #ifdef __STDC__ #undef __P #define __P(p) p #else #define __P(p) () #endif /* * set X_TLOSS = pi*2**52, which is possibly defined in <values.h> * (one may replace the following line by "#include <values.h>") */ #define X_TLOSS 1.41484755040568800000e+16 /* Functions that are not documented, and are not in <math.h>. */ #ifdef _SCALB_INT extern double scalb __P((double, int)); #else extern double scalb __P((double, double)); #endif extern double significand __P((double)); extern long double __ieee754_hypotl __P((long double, long double)); /* ieee style elementary functions */ extern double __ieee754_sqrt __P((double)); extern double __ieee754_acos __P((double)); extern double __ieee754_acosh __P((double)); extern double __ieee754_log __P((double)); extern double __ieee754_atanh __P((double)); extern double __ieee754_asin __P((double)); extern double __ieee754_atan2 __P((double,double)); extern double __ieee754_exp __P((double)); extern double __ieee754_cosh __P((double)); extern double __ieee754_fmod __P((double,double)); extern double __ieee754_pow __P((double,double)); extern double __ieee754_lgamma_r __P((double,int *)); extern double __ieee754_gamma_r __P((double,int *)); extern double __ieee754_tgamma __P((double)); extern double __ieee754_log10 __P((double)); extern double __ieee754_sinh __P((double)); extern double __ieee754_hypot __P((double,double)); extern double __ieee754_j0 __P((double)); extern double __ieee754_j1 __P((double)); extern double __ieee754_y0 __P((double)); extern double __ieee754_y1 __P((double)); extern double __ieee754_jn __P((int,double)); extern double __ieee754_yn __P((int,double)); extern double __ieee754_remainder __P((double,double)); extern __int32_t __ieee754_rem_pio2 __P((double,double*)); #ifdef _SCALB_INT extern double __ieee754_scalb __P((double,int)); #else extern double __ieee754_scalb __P((double,double)); #endif /* fdlibm kernel function */ extern double __kernel_standard __P((double,double,int)); extern double __kernel_sin __P((double,double,int)); extern double __kernel_cos __P((double,double)); extern double __kernel_tan __P((double,double,int)); extern int __kernel_rem_pio2 __P((double*,double*,int,int,int,const __int32_t*)); /* Undocumented float functions. */ #ifdef _SCALB_INT extern float scalbf __P((float, int)); #else extern float scalbf __P((float, float)); #endif extern float significandf __P((float)); /* ieee style elementary float functions */ extern float __ieee754_sqrtf __P((float)); extern float __ieee754_acosf __P((float)); extern float __ieee754_acoshf __P((float)); extern float __ieee754_logf __P((float)); extern float __ieee754_atanhf __P((float)); extern float __ieee754_asinf __P((float)); extern float __ieee754_atan2f __P((float,float)); extern float __ieee754_expf __P((float)); extern float __ieee754_coshf __P((float)); extern float __ieee754_fmodf __P((float,float)); extern float __ieee754_powf __P((float,float)); extern float __ieee754_lgammaf_r __P((float,int *)); extern float __ieee754_gammaf_r __P((float,int *)); extern float __ieee754_tgammaf __P((float)); extern float __ieee754_log10f __P((float)); extern float __ieee754_sinhf __P((float)); extern float __ieee754_hypotf __P((float,float)); extern float __ieee754_j0f __P((float)); extern float __ieee754_j1f __P((float)); extern float __ieee754_y0f __P((float)); extern float __ieee754_y1f __P((float)); extern float __ieee754_jnf __P((int,float)); extern float __ieee754_ynf __P((int,float)); extern float __ieee754_remainderf __P((float,float)); extern __int32_t __ieee754_rem_pio2f __P((float,float*)); #ifdef _SCALB_INT extern float __ieee754_scalbf __P((float,int)); #else extern float __ieee754_scalbf __P((float,float)); #endif #if !__OBSOLETE_MATH /* The new math code does not provide separate wrapper function for error handling, so the extern symbol is called directly. This is valid as long as there are no namespace issues (the extern symbol is reserved whenever the caller is reserved) and there are no observable error handling side effects. */ # define __ieee754_exp(x) exp(x) # define __ieee754_log(x) log(x) # define __ieee754_pow(x,y) pow(x,y) # define __ieee754_expf(x) expf(x) # define __ieee754_logf(x) logf(x) # define __ieee754_powf(x,y) powf(x,y) #endif /* float versions of fdlibm kernel functions */ extern float __kernel_sinf __P((float,float,int)); extern float __kernel_cosf __P((float,float)); extern float __kernel_tanf __P((float,float,int)); extern int __kernel_rem_pio2f __P((float*,float*,int,int,int,const __int32_t*)); /* The original code used statements like n0 = ((*(int*)&one)>>29)^1; * index of high word * ix0 = *(n0+(int*)&x); * high word of x * ix1 = *((1-n0)+(int*)&x); * low word of x * to dig two 32 bit words out of the 64 bit IEEE floating point value. That is non-ANSI, and, moreover, the gcc instruction scheduler gets it wrong. We instead use the following macros. Unlike the original code, we determine the endianness at compile time, not at run time; I don't see much benefit to selecting endianness at run time. */ #ifndef __IEEE_BIG_ENDIAN #ifndef __IEEE_LITTLE_ENDIAN #error Must define endianness #endif #endif /* A union which permits us to convert between a double and two 32 bit ints. */ #ifdef __IEEE_BIG_ENDIAN typedef union { double value; struct { __uint32_t msw; __uint32_t lsw; } parts; } ieee_double_shape_type; #endif #ifdef __IEEE_LITTLE_ENDIAN typedef union { double value; struct { __uint32_t lsw; __uint32_t msw; } parts; } ieee_double_shape_type; #endif /* Get two 32 bit ints from a double. */ #define EXTRACT_WORDS(ix0,ix1,d) \ do { \ ieee_double_shape_type ew_u; \ ew_u.value = (d); \ (ix0) = ew_u.parts.msw; \ (ix1) = ew_u.parts.lsw; \ } while (0) /* Get the more significant 32 bit int from a double. */ #define GET_HIGH_WORD(i,d) \ do { \ ieee_double_shape_type gh_u; \ gh_u.value = (d); \ (i) = gh_u.parts.msw; \ } while (0) /* Get the less significant 32 bit int from a double. */ #define GET_LOW_WORD(i,d) \ do { \ ieee_double_shape_type gl_u; \ gl_u.value = (d); \ (i) = gl_u.parts.lsw; \ } while (0) /* Set a double from two 32 bit ints. */ #define INSERT_WORDS(d,ix0,ix1) \ do { \ ieee_double_shape_type iw_u; \ iw_u.parts.msw = (ix0); \ iw_u.parts.lsw = (ix1); \ (d) = iw_u.value; \ } while (0) /* Set the more significant 32 bits of a double from an int. */ #define SET_HIGH_WORD(d,v) \ do { \ ieee_double_shape_type sh_u; \ sh_u.value = (d); \ sh_u.parts.msw = (v); \ (d) = sh_u.value; \ } while (0) /* Set the less significant 32 bits of a double from an int. */ #define SET_LOW_WORD(d,v) \ do { \ ieee_double_shape_type sl_u; \ sl_u.value = (d); \ sl_u.parts.lsw = (v); \ (d) = sl_u.value; \ } while (0) /* A union which permits us to convert between a float and a 32 bit int. */ typedef union { float value; __uint32_t word; } ieee_float_shape_type; /* Get a 32 bit int from a float. */ #define GET_FLOAT_WORD(i,d) \ do { \ ieee_float_shape_type gf_u; \ gf_u.value = (d); \ (i) = gf_u.word; \ } while (0) /* Set a float from a 32 bit int. */ #define SET_FLOAT_WORD(d,i) \ do { \ ieee_float_shape_type sf_u; \ sf_u.word = (i); \ (d) = sf_u.value; \ } while (0) /* Macros to avoid undefined behaviour that can arise if the amount of a shift is exactly equal to the size of the shifted operand. */ #define SAFE_LEFT_SHIFT(op,amt) \ (((amt) < 8 * sizeof(op)) ? ((op) << (amt)) : 0) #define SAFE_RIGHT_SHIFT(op,amt) \ (((amt) < 8 * sizeof(op)) ? ((op) >> (amt)) : 0) #ifdef _COMPLEX_H /* * Quoting from ISO/IEC 9899:TC2: * * 6.2.5.13 Types * Each complex type has the same representation and alignment requirements as * an array type containing exactly two elements of the corresponding real type; * the first element is equal to the real part, and the second element to the * imaginary part, of the complex number. */ typedef union { float complex z; float parts[2]; } float_complex; typedef union { double complex z; double parts[2]; } double_complex; typedef union { long double complex z; long double parts[2]; } long_double_complex; #define REAL_PART(z) ((z).parts[0]) #define IMAG_PART(z) ((z).parts[1]) #endif /* _COMPLEX_H */ [-- Attachment #4: fesetround.c --] [-- Type: text/x-csrc, Size: 2168 bytes --] /* * SPDX-License-Identifier: BSD-2-Clause * * Copyright (c) 2010-2019 Red Hat, Inc. */ #define _GNU_SOURCE // for FE_NOMASK_ENV #include <errno.h> /* Mask and shift amount for rounding bits. */ #define FE_CW_ROUND_MASK (0x0c00) #define FE_CW_ROUND_SHIFT (10) /* Same, for SSE MXCSR. */ #define FE_MXCSR_ROUND_MASK (0x6000) #define FE_MXCSR_ROUND_SHIFT (13) #define FE_DOWNWARD (1) #define FE_TONEAREST (0) #define FE_TOWARDZERO (3) #define FE_UPWARD (2) static inline int use_sse(void) { unsigned int edx, eax; /* Check for presence of SSE: invoke CPUID #1, check EDX bit 25. */ eax = 1; __asm__ volatile ("cpuid" : "=d" (edx), "+a" (eax) :: "%ecx", "%ebx"); /* If this flag isn't set we'll avoid trying to execute any SSE. */ if ((edx & (1 << 25)) != 0) return 1; return 0; } /* Returns the currently selected rounding mode, represented by one of the values of the defined rounding mode macros. */ int fegetround (void) { unsigned short cw; /* Get control word. We assume SSE and x87 stay in sync. */ __asm__ volatile ("fnstcw %0" : "=m" (*&cw) : ); return (cw & FE_CW_ROUND_MASK) >> FE_CW_ROUND_SHIFT; } /* Changes the currently selected rounding mode to round. If round does not correspond to one of the supported rounding modes nothing is changed. fesetround returns zero if it changed the rounding mode, a nonzero value if the mode is not supported. */ int fesetround (int round) { unsigned short cw; unsigned int mxcsr = 0; /* Will succeed for any valid value of the input parameter. */ if (round < FE_TONEAREST || round > FE_TOWARDZERO) return EINVAL; /* Get control words. */ __asm__ volatile ("fnstcw %0" : "=m" (cw) : ); if (use_sse()) __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr) : ); /* Twiddle bits. */ cw &= ~FE_CW_ROUND_MASK; cw |= (round << FE_CW_ROUND_SHIFT); mxcsr &= ~FE_MXCSR_ROUND_MASK; mxcsr |= (round << FE_MXCSR_ROUND_SHIFT); /* Set back into FPU state. */ __asm__ volatile ("fldcw %0" :: "m" (cw)); if (use_sse()) __asm__ volatile ("ldmxcsr %0" :: "m" (mxcsr)); /* Indicate success. */ return 0; } [-- Attachment #5: test_sqrt.c --] [-- Type: text/x-csrc, Size: 446 bytes --] #include <stdio.h> #include <math.h> int fesetround(int); #define FE_DOWNWARD (1) #define FE_TONEAREST (0) #define FE_TOWARDZERO (3) #define FE_UPWARD (2) int main() { int rnd[4] = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD }; char Rnd[4] = "NZUD"; float x = 0x1.ff07fep+127f; float y; for (int i = 0; i < 4; i++) { fesetround(rnd[i]); y = sqrtf (x); printf ("RND%c: %a\n", Rnd[i], y); } } ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: incorrectly rounded square root 2021-06-03 3:01 ` Jeff Johnston @ 2021-06-03 10:21 ` Paul Zimmermann 2021-06-03 15:50 ` Jeff Johnston 0 siblings, 1 reply; 19+ messages in thread From: Paul Zimmermann @ 2021-06-03 10:21 UTC (permalink / raw) To: Jeff Johnston; +Cc: joel, newlib Hi Jeff, I've investigated and found (partially) the cause of the problem. When I compile my program, gcc uses the system fenv.h, which contains FE_TONEAREST=0, FE_DOWNWARD=0x400, FE_UPWARD=0x800, and FE_TOWARDZERO=0xc00 in /usr/include/bits/fenv.h. However, the values in x86_64/newlib/targ-include/sys/fenv.h are different: FE_TONEAREST=0, FE_DOWNWARD=1, FE_UPWARD=2, FE_TOWARDZERO=3. Thus the test at the beginning of fesetround in newlib/libm/machine/x86_64/fenv.c fails except for round=FE_TONEAREST: /* Will succeed for any valid value of the input parameter. */ if (round < FE_TONEAREST || round > FE_TOWARDZERO) return EINVAL; enter fesetround, round=0 use_sse=1 enter fesetround, round=3072 use_sse=1 enter fesetround, round=2048 use_sse=1 enter fesetround, round=1024 use_sse=1 This explains why the other modes are not set. A first question is why Newlib and the system libm (here glibc) use different constants for the rounding modes. Now if I compile with -I/tmp/x86_64/include (where the Newlib fenv.h is installed) I get: enter fesetround, round=0 use_sse=1 enter fesetround, round=3 use_sse=1 enter fesetround, round=2 use_sse=1 enter fesetround, round=1 use_sse=1 but I still do get the same results whatever the rounding mode! If I look at the cw and mxcsr registers like in Newlib fenv.c, I get with glibc (after the fesetround call): fegetround: 0 cw=895 mxcsr=8064 RNDN: 0x1.ff83fp+63 fegetround: 3072 cw=3967 mxcsr=32672 RNDZ: 0x1.ff83eep+63 fegetround: 2048 cw=2943 mxcsr=24480 RNDU: 0x1.ff83fp+63 fegetround: 1024 cw=1919 mxcsr=16288 RNDD: 0x1.ff83eep+63 and with Newlib: fegetround: 0 cw=895 mxcsr=8064 RNDN: 0x1.ff83fp+63 fegetround: 3 cw=3967 mxcsr=32640 RNDZ: 0x1.ff83fp+63 fegetround: 2 cw=2943 mxcsr=24448 RNDU: 0x1.ff83fp+63 fegetround: 1 cw=1919 mxcsr=16256 RNDD: 0x1.ff83fp+63 The cw values do match but not the mxcsr values. Best regards, Paul > Something weird is going on. I condensed the newlib code into the > following files and when I compile and run the test, > it works as expected: > > gcc -std=c99 -g3 -O0 test_sqrt.c fesetround.c ef_sqrt.c > [jjohnstn@localhost shared_x86]$ ./a.out > RNDN: 0x1.ff83fp+63 > RNDZ: 0x1.ff83eep+63 > RNDU: 0x1.ff83fp+63 > RNDD: 0x1.ff83eep+63 > > Can you verify that you are calling the fesetround in fenv.c and what > configuration call did you use for building newlib? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: incorrectly rounded square root 2021-06-03 10:21 ` Paul Zimmermann @ 2021-06-03 15:50 ` Jeff Johnston 2021-06-04 7:14 ` Paul Zimmermann 0 siblings, 1 reply; 19+ messages in thread From: Jeff Johnston @ 2021-06-03 15:50 UTC (permalink / raw) To: Paul Zimmermann; +Cc: joel, Newlib Hi Paul, I figured the values were off when I had to hard-code them in my own test_sqrt.c but forgot to include that info in my note. Now, that said, using the code I attached earlier, I am seeing the exact values you are quoting above for glibc for the mxcsr register and the round is working. Have your tried running that code? The mxcsr values you are seeing that are different are not due to the fesetround code. The code is shifting the round value 13 bits and for 3, that ends up being 0x6000. It is masking mxcsr with 0xffff9fff first so when you start with 0x1fxx and end up with 0x7fxx, the code is doing what is supposed to do. The difference in values above is 0x20 (e.g. 0x7fa0 vs 0x7f80) which is a bit in the last 2 hex digits which isn't touched by the code logic. -- Jeff J. On Thu, Jun 3, 2021 at 6:22 AM Paul Zimmermann <Paul.Zimmermann@inria.fr> wrote: > Hi Jeff, > > I've investigated and found (partially) the cause of the problem. When I > compile my program, gcc uses the system fenv.h, which contains > FE_TONEAREST=0, > FE_DOWNWARD=0x400, FE_UPWARD=0x800, and FE_TOWARDZERO=0xc00 in > /usr/include/bits/fenv.h. > > However, the values in x86_64/newlib/targ-include/sys/fenv.h are different: > FE_TONEAREST=0, FE_DOWNWARD=1, FE_UPWARD=2, FE_TOWARDZERO=3. Thus the test > at the beginning of fesetround in newlib/libm/machine/x86_64/fenv.c fails > except for round=FE_TONEAREST: > > /* Will succeed for any valid value of the input parameter. */ > if (round < FE_TONEAREST || round > FE_TOWARDZERO) > return EINVAL; > > enter fesetround, round=0 use_sse=1 > enter fesetround, round=3072 use_sse=1 > enter fesetround, round=2048 use_sse=1 > enter fesetround, round=1024 use_sse=1 > > This explains why the other modes are not set. > > A first question is why Newlib and the system libm (here glibc) use > different > constants for the rounding modes. > > Now if I compile with -I/tmp/x86_64/include (where the Newlib fenv.h is > installed) I get: > > enter fesetround, round=0 use_sse=1 > enter fesetround, round=3 use_sse=1 > enter fesetround, round=2 use_sse=1 > enter fesetround, round=1 use_sse=1 > > but I still do get the same results whatever the rounding mode! > > If I look at the cw and mxcsr registers like in Newlib fenv.c, I get with > glibc > (after the fesetround call): > > fegetround: 0 > cw=895 mxcsr=8064 > RNDN: 0x1.ff83fp+63 > fegetround: 3072 > cw=3967 mxcsr=32672 > RNDZ: 0x1.ff83eep+63 > fegetround: 2048 > cw=2943 mxcsr=24480 > RNDU: 0x1.ff83fp+63 > fegetround: 1024 > cw=1919 mxcsr=16288 > RNDD: 0x1.ff83eep+63 > > and with Newlib: > > fegetround: 0 > cw=895 mxcsr=8064 > RNDN: 0x1.ff83fp+63 > fegetround: 3 > cw=3967 mxcsr=32640 > RNDZ: 0x1.ff83fp+63 > fegetround: 2 > cw=2943 mxcsr=24448 > RNDU: 0x1.ff83fp+63 > fegetround: 1 > cw=1919 mxcsr=16256 > RNDD: 0x1.ff83fp+63 > > The cw values do match but not the mxcsr values. > > Best regards, > Paul > > > Something weird is going on. I condensed the newlib code into the > > following files and when I compile and run the test, > > it works as expected: > > > > gcc -std=c99 -g3 -O0 test_sqrt.c fesetround.c ef_sqrt.c > > [jjohnstn@localhost shared_x86]$ ./a.out > > RNDN: 0x1.ff83fp+63 > > RNDZ: 0x1.ff83eep+63 > > RNDU: 0x1.ff83fp+63 > > RNDD: 0x1.ff83eep+63 > > > > Can you verify that you are calling the fesetround in fenv.c and what > > configuration call did you use for building newlib? > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: incorrectly rounded square root 2021-06-03 15:50 ` Jeff Johnston @ 2021-06-04 7:14 ` Paul Zimmermann 2021-06-04 18:44 ` Jeff Johnston 0 siblings, 1 reply; 19+ messages in thread From: Paul Zimmermann @ 2021-06-04 7:14 UTC (permalink / raw) To: Jeff Johnston; +Cc: joel, newlib Hi Jeff, > I figured the values were off when I had to hard-code them in my own > test_sqrt.c but forgot to include that info in my note. > > Now, that said, using the code I attached earlier, I am seeing the exact > values you are quoting above for glibc for the mxcsr register and the round > is working. Have your > tried running that code? yes it works as expected, but it doesn't work with Newlib's fenv.h and libm.a (see below). > The mxcsr values you are seeing that are different are not due to the > fesetround code. The code is shifting the round value 13 bits > and for 3, that ends up being 0x6000. It is masking mxcsr with 0xffff9fff > first so when you start with 0x1fxx and end up with 0x7fxx, the code is > doing what is supposed to do. > The difference in values above is 0x20 (e.g. 0x7fa0 vs 0x7f80) which is a > bit in the last 2 hex digits which isn't touched by the code logic. here is how to reproduce the issue: tar xf newlib-4.1.0.tar.gz cd newlib-4.1.0 mkdir build cd build ../configure --prefix=/tmp --disable-multilib --target=x86_64 make -j4 make install $ cat test_sqrt_2.c #include <stdio.h> #include <math.h> #include <fenv.h> #ifdef NEWLIB /* RedHat's libm claims: undefined reference to `__errno' in j1f/y1f */ int errno; int* __errno () { return &errno; } #endif int main() { int rnd[4] = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD }; char Rnd[4] = "NZUD"; float x = 0x1.ff07fep+127f; float y; for (int i = 0; i < 4; i++) { unsigned short cw; unsigned int mxcsr = 0; fesetround (rnd[i]); __asm__ volatile ("fnstcw %0" : "=m" (cw) : ); __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr) : ); y = sqrtf (x); printf ("RND%c: %a cw=%u mxcsr=%u\n", Rnd[i], y, cw, mxcsr); } } With GNU libc: $ gcc -fno-builtin test_sqrt_2.c -lm $ ./a.out RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064 RNDZ: 0x1.ff83eep+63 cw=3967 mxcsr=32672 RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24480 RNDD: 0x1.ff83eep+63 cw=1919 mxcsr=16288 With Newlib: $ gcc -I/tmp/x86_64/include -DNEWLIB -fno-builtin test_sqrt_2.c /tmp/libm.a $ ./a.out RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064 RNDZ: 0x1.ff83fp+63 cw=3967 mxcsr=32640 RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24448 RNDD: 0x1.ff83fp+63 cw=1919 mxcsr=16256 Can you reproduce that on x86_64 Linux? Paul ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: incorrectly rounded square root 2021-06-04 7:14 ` Paul Zimmermann @ 2021-06-04 18:44 ` Jeff Johnston 2021-06-04 18:59 ` Joel Sherrill ` (2 more replies) 0 siblings, 3 replies; 19+ messages in thread From: Jeff Johnston @ 2021-06-04 18:44 UTC (permalink / raw) To: Paul Zimmermann; +Cc: joel, Newlib Ok, I now know exactly what is happening. The compiler is optimizing out the rounding check in ef_sqrt.c, probably due to the operation using two constants. 86 ix += (m <<23); (gdb) list 81 else 82 q += (q&1); When I debug, it always does the else at line 81 without performing the one-tiny operation. The difference in the mxcsr register is the PE bit which I believe gets set when you do the one-tiny operation. Since we aren't doing it, it never gets set on and the difference of 0x20 in the mxcsr register is explained. By making the constants volatile, I am able to get the code working as it should. I have pushed a patch for this. -- Jeff J. On Fri, Jun 4, 2021 at 3:14 AM Paul Zimmermann <Paul.Zimmermann@inria.fr> wrote: > Hi Jeff, > > > I figured the values were off when I had to hard-code them in my own > > test_sqrt.c but forgot to include that info in my note. > > > > Now, that said, using the code I attached earlier, I am seeing the exact > > values you are quoting above for glibc for the mxcsr register and the > round > > is working. Have your > > tried running that code? > > yes it works as expected, but it doesn't work with Newlib's fenv.h and > libm.a > (see below). > > > The mxcsr values you are seeing that are different are not due to the > > fesetround code. The code is shifting the round value 13 bits > > and for 3, that ends up being 0x6000. It is masking mxcsr with > 0xffff9fff > > first so when you start with 0x1fxx and end up with 0x7fxx, the code is > > doing what is supposed to do. > > The difference in values above is 0x20 (e.g. 0x7fa0 vs 0x7f80) which is a > > bit in the last 2 hex digits which isn't touched by the code logic. > > here is how to reproduce the issue: > > tar xf newlib-4.1.0.tar.gz > cd newlib-4.1.0 > mkdir build > cd build > ../configure --prefix=/tmp --disable-multilib --target=x86_64 > make -j4 > make install > > $ cat test_sqrt_2.c > #include <stdio.h> > #include <math.h> > #include <fenv.h> > > #ifdef NEWLIB > /* RedHat's libm claims: > undefined reference to `__errno' in j1f/y1f */ > int errno; > int* __errno () { return &errno; } > #endif > > int main() > { > int rnd[4] = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD }; > char Rnd[4] = "NZUD"; > float x = 0x1.ff07fep+127f; > float y; > for (int i = 0; i < 4; i++) > { > unsigned short cw; > unsigned int mxcsr = 0; > fesetround (rnd[i]); > __asm__ volatile ("fnstcw %0" : "=m" (cw) : ); > __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr) : ); > y = sqrtf (x); > printf ("RND%c: %a cw=%u mxcsr=%u\n", Rnd[i], y, cw, mxcsr); > } > } > > With GNU libc: > $ gcc -fno-builtin test_sqrt_2.c -lm > $ ./a.out > RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064 > RNDZ: 0x1.ff83eep+63 cw=3967 mxcsr=32672 > RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24480 > RNDD: 0x1.ff83eep+63 cw=1919 mxcsr=16288 > > With Newlib: > $ gcc -I/tmp/x86_64/include -DNEWLIB -fno-builtin test_sqrt_2.c /tmp/libm.a > $ ./a.out > RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064 > RNDZ: 0x1.ff83fp+63 cw=3967 mxcsr=32640 > RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24448 > RNDD: 0x1.ff83fp+63 cw=1919 mxcsr=16256 > > Can you reproduce that on x86_64 Linux? > > Paul > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: incorrectly rounded square root 2021-06-04 18:44 ` Jeff Johnston @ 2021-06-04 18:59 ` Joel Sherrill 2021-06-05 13:25 ` Brian Inglis 2021-06-12 22:31 ` Maciej W. Rozycki 2 siblings, 0 replies; 19+ messages in thread From: Joel Sherrill @ 2021-06-04 18:59 UTC (permalink / raw) To: Jeff Johnston; +Cc: Paul Zimmermann, Newlib On Fri, Jun 4, 2021, 1:44 PM Jeff Johnston <jjohnstn@redhat.com> wrote: > Ok, I now know exactly what is happening. > > The compiler is optimizing out the rounding check in ef_sqrt.c, probably > due to the operation using two constants. > > 86 ix += (m <<23); > (gdb) list > 81 else > 82 q += (q&1); > > When I debug, it always does the else at line 81 without performing the > one-tiny operation. The difference in the mxcsr > register is the PE bit which I believe gets set when you do the one-tiny > operation. Since we aren't doing it, it never gets > set on and the difference of 0x20 in the mxcsr register is explained. > > By making the constants volatile, I am able to get the code working as it > should. I have pushed a patch for this. > Awesome catch Paul and great eye to spot the problem Jeff! --joel > > -- Jeff J. > > On Fri, Jun 4, 2021 at 3:14 AM Paul Zimmermann <Paul.Zimmermann@inria.fr> > wrote: > >> Hi Jeff, >> >> > I figured the values were off when I had to hard-code them in my own >> > test_sqrt.c but forgot to include that info in my note. >> > >> > Now, that said, using the code I attached earlier, I am seeing the exact >> > values you are quoting above for glibc for the mxcsr register and the >> round >> > is working. Have your >> > tried running that code? >> >> yes it works as expected, but it doesn't work with Newlib's fenv.h and >> libm.a >> (see below). >> >> > The mxcsr values you are seeing that are different are not due to the >> > fesetround code. The code is shifting the round value 13 bits >> > and for 3, that ends up being 0x6000. It is masking mxcsr with >> 0xffff9fff >> > first so when you start with 0x1fxx and end up with 0x7fxx, the code is >> > doing what is supposed to do. >> > The difference in values above is 0x20 (e.g. 0x7fa0 vs 0x7f80) which is >> a >> > bit in the last 2 hex digits which isn't touched by the code logic. >> >> here is how to reproduce the issue: >> >> tar xf newlib-4.1.0.tar.gz >> cd newlib-4.1.0 >> mkdir build >> cd build >> ../configure --prefix=/tmp --disable-multilib --target=x86_64 >> make -j4 >> make install >> >> $ cat test_sqrt_2.c >> #include <stdio.h> >> #include <math.h> >> #include <fenv.h> >> >> #ifdef NEWLIB >> /* RedHat's libm claims: >> undefined reference to `__errno' in j1f/y1f */ >> int errno; >> int* __errno () { return &errno; } >> #endif >> >> int main() >> { >> int rnd[4] = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD }; >> char Rnd[4] = "NZUD"; >> float x = 0x1.ff07fep+127f; >> float y; >> for (int i = 0; i < 4; i++) >> { >> unsigned short cw; >> unsigned int mxcsr = 0; >> fesetround (rnd[i]); >> __asm__ volatile ("fnstcw %0" : "=m" (cw) : ); >> __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr) : ); >> y = sqrtf (x); >> printf ("RND%c: %a cw=%u mxcsr=%u\n", Rnd[i], y, cw, mxcsr); >> } >> } >> >> With GNU libc: >> $ gcc -fno-builtin test_sqrt_2.c -lm >> $ ./a.out >> RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064 >> RNDZ: 0x1.ff83eep+63 cw=3967 mxcsr=32672 >> RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24480 >> RNDD: 0x1.ff83eep+63 cw=1919 mxcsr=16288 >> >> With Newlib: >> $ gcc -I/tmp/x86_64/include -DNEWLIB -fno-builtin test_sqrt_2.c >> /tmp/libm.a >> $ ./a.out >> RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064 >> RNDZ: 0x1.ff83fp+63 cw=3967 mxcsr=32640 >> RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24448 >> RNDD: 0x1.ff83fp+63 cw=1919 mxcsr=16256 >> >> Can you reproduce that on x86_64 Linux? >> >> Paul >> >> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: incorrectly rounded square root 2021-06-04 18:44 ` Jeff Johnston 2021-06-04 18:59 ` Joel Sherrill @ 2021-06-05 13:25 ` Brian Inglis 2021-06-07 9:51 ` Paul Zimmermann 2021-06-12 22:31 ` Maciej W. Rozycki 2 siblings, 1 reply; 19+ messages in thread From: Brian Inglis @ 2021-06-05 13:25 UTC (permalink / raw) To: newlib Great catch, analysis, and fix! Now sqrtf rounds correctly on Cygwin! $ ./test-sqrtf-round Direction CW MX Input Hex Input Decimal Sqrt Hex Sqrt Decimal RNDN 0 0 37f 1f80: 0x1.ff07fe00p+127 339638501828070541185766401939693633536 0x1.ff83f000p+63 18429283829060468736 RNDD 1 1 77f 3f80: 0x1.ff07fe00p+127 339638501828070541185766401939693633536 0x1.ff83ee00p+63 18429282729548840960 RNDU 2 2 b7f 5f80: 0x1.ff07fe00p+127 339638501828070541185766401939693633536 0x1.ff83f000p+63 18429283829060468736 RNDZ 3 3 f7f 7f80: 0x1.ff07fe00p+127 339638501828070541185766401939693633536 0x1.ff83ee00p+63 18429282729548840960 -- Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada This email may be disturbing to some readers as it contains too much technical detail. Reader discretion is advised. On 2021-06-04 12:44, Jeff Johnston wrote: > Ok, I now know exactly what is happening. > The compiler is optimizing out the rounding check in ef_sqrt.c, probably > due to the operation using two constants. > 86 ix += (m <<23); > (gdb) list > 81 else > 82 q += (q&1); > When I debug, it always does the else at line 81 without performing the > one-tiny operation. The difference in the mxcsr > register is the PE bit which I believe gets set when you do the one-tiny > operation. Since we aren't doing it, it never gets > set on and the difference of 0x20 in the mxcsr register is explained. > By making the constants volatile, I am able to get the code working > as it should. I have pushed a patch for this. > On Fri, Jun 4, 2021 at 3:14 AM Paul Zimmermann wrote: >>> I figured the values were off when I had to hard-code them in my own >>> test_sqrt.c but forgot to include that info in my note. >>> >>> Now, that said, using the code I attached earlier, I am seeing >>> the exact values you are quoting above for glibc for the mxcsr >>> register and the round is working. Have your tried running that >>> code? >> yes it works as expected, but it doesn't work with Newlib's fenv.h and >> libm.a (see below). >>> The mxcsr values you are seeing that are different are not due to >>> the fesetround code. The code is shifting the round value 13 >>> bits and for 3, that ends up being 0x6000. It is masking mxcsr >>> with 0xffff9fff first so when you start with 0x1fxx and end up >>> with 0x7fxx, the code is doing what is supposed to do. >>> The difference in values above is 0x20 (e.g. 0x7fa0 vs 0x7f80) >>> which is a bit in the last 2 hex digits which isn't touched by >>> the code logic. >> here is how to reproduce the issue: >> tar xf newlib-4.1.0.tar.gz >> cd newlib-4.1.0 >> mkdir build >> cd build >> ../configure --prefix=/tmp --disable-multilib --target=x86_64 >> make -j4 >> make install >> $ cat test_sqrt_2.c >> #include <stdio.h> >> #include <math.h> >> #include <fenv.h> >> #ifdef NEWLIB >> /* RedHat's libm claims: >> undefined reference to `__errno' in j1f/y1f */ >> int errno; >> int* __errno () { return &errno; } >> #endif >> int main() >> { >> int rnd[4] = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD }; >> char Rnd[4] = "NZUD"; >> float x = 0x1.ff07fep+127f; >> float y; >> for (int i = 0; i < 4; i++) >> { >> unsigned short cw; >> unsigned int mxcsr = 0; >> fesetround (rnd[i]); >> __asm__ volatile ("fnstcw %0" : "=m" (cw) : ); >> __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr) : ); >> y = sqrtf (x); >> printf ("RND%c: %a cw=%u mxcsr=%u\n", Rnd[i], y, cw, mxcsr); >> } >> } >> With GNU libc: >> $ gcc -fno-builtin test_sqrt_2.c -lm >> $ ./a.out >> RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064 >> RNDZ: 0x1.ff83eep+63 cw=3967 mxcsr=32672 >> RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24480 >> RNDD: 0x1.ff83eep+63 cw=1919 mxcsr=16288 >> With Newlib: >> $ gcc -I/tmp/x86_64/include -DNEWLIB -fno-builtin test_sqrt_2.c /tmp/libm.a >> $ ./a.out >> RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064 >> RNDZ: 0x1.ff83fp+63 cw=3967 mxcsr=32640 >> RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24448 >> RNDD: 0x1.ff83fp+63 cw=1919 mxcsr=16256 >> Can you reproduce that on x86_64 Linux? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: incorrectly rounded square root 2021-06-05 13:25 ` Brian Inglis @ 2021-06-07 9:51 ` Paul Zimmermann 0 siblings, 0 replies; 19+ messages in thread From: Paul Zimmermann @ 2021-06-07 9:51 UTC (permalink / raw) To: newlib; +Cc: newlib > Great catch, analysis, and fix! > Now sqrtf rounds correctly on Cygwin! thank you, I also confirm I now get expected results, after applying Jeff's patch on top of Newlib 4.1.0: zimmerma@tomate:/tmp$ gcc -frounding-math -I/tmp/x86_64/include -DNEWLIB -fno-builtin test_sqrt_2.c /tmp/x86_64/lib/libm.a zimmerma@tomate:/tmp$ ./a.out RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064 RNDZ: 0x1.ff83eep+63 cw=3967 mxcsr=32672 RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24480 RNDD: 0x1.ff83eep+63 cw=1919 mxcsr=16288 zimmerma@tomate:/tmp$ gcc -frounding-math -fno-builtin test_sqrt_2.c -lm zimmerma@tomate:/tmp$ ./a.out RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064 RNDZ: 0x1.ff83eep+63 cw=3967 mxcsr=32672 RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24480 RNDD: 0x1.ff83eep+63 cw=1919 mxcsr=16288 Paul ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: incorrectly rounded square root 2021-06-04 18:44 ` Jeff Johnston 2021-06-04 18:59 ` Joel Sherrill 2021-06-05 13:25 ` Brian Inglis @ 2021-06-12 22:31 ` Maciej W. Rozycki 2 siblings, 0 replies; 19+ messages in thread From: Maciej W. Rozycki @ 2021-06-12 22:31 UTC (permalink / raw) To: Jeff Johnston; +Cc: Paul Zimmermann, Newlib On Fri, 4 Jun 2021, Jeff Johnston wrote: > The compiler is optimizing out the rounding check in ef_sqrt.c, probably > due to the operation using two constants. Have you checked the `-frounding-math' option to GCC? This piece of code (as probably all newlib's math pieces) ought to be built with that option, and if that still does not help (which it may well, as noted in the GCC manual), then would you please file a bug with GCC, attaching the test case discussed here as a reproducer? FWIW glibc does use said option, which may be why seemingly equivalent code works with glibc but not with newlib. HTH, Maciej ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: incorrectly rounded square root 2021-06-02 19:07 ` Marco Atzeri 2021-06-02 19:12 ` Joel Sherrill @ 2021-06-03 11:25 ` Paul Zimmermann 1 sibling, 0 replies; 19+ messages in thread From: Paul Zimmermann @ 2021-06-03 11:25 UTC (permalink / raw) To: Marco Atzeri; +Cc: newlib Dear Marco, > current Cygwin produces for both i686 and X86_64 > > $ gcc -DNEWLIB -fno-builtin test_sqrt.c -lm > > $ ./a.exe > RNDN: 0x1.ff83fp+63 > RNDZ: 0x1.ff83fp+63 > RNDU: 0x1.ff83fp+63 > RNDD: 0x1.ff83fp+63 make sure you are using the Newlib fenv.h (or print the values of FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD). Cf my previous mail. Paul ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2021-06-12 22:31 UTC | newest] Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-05-04 8:08 incorrectly rounded square root Paul Zimmermann 2021-05-31 20:52 ` Jeff Johnston 2021-06-01 7:11 ` Paul Zimmermann 2021-06-01 16:28 ` Jeff Johnston 2021-06-02 7:51 ` Paul Zimmermann 2021-06-02 13:07 ` Joel Sherrill 2021-06-02 18:43 ` Jeff Johnston 2021-06-02 19:07 ` Marco Atzeri 2021-06-02 19:12 ` Joel Sherrill 2021-06-03 3:01 ` Jeff Johnston 2021-06-03 10:21 ` Paul Zimmermann 2021-06-03 15:50 ` Jeff Johnston 2021-06-04 7:14 ` Paul Zimmermann 2021-06-04 18:44 ` Jeff Johnston 2021-06-04 18:59 ` Joel Sherrill 2021-06-05 13:25 ` Brian Inglis 2021-06-07 9:51 ` Paul Zimmermann 2021-06-12 22:31 ` Maciej W. Rozycki 2021-06-03 11:25 ` Paul Zimmermann
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).