incorrectly rounded square root

public inbox for newlib@sourceware.org
 help / color / mirror / Atom feed

* incorrectly rounded square root
@ 2021-05-04  8:08 Paul Zimmermann
  2021-05-31 20:52 ` Jeff Johnston
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Zimmermann @ 2021-05-04  8:08 UTC (permalink / raw)
  To: newlib

       Hi,

according to IEEE 754, the square root function should be correctly rounded
for all rounding modes. I noticed this is not the case in Newlib:

$ cat test_sqrt.c
#include <stdio.h>
#include <math.h>
#include <fenv.h>

#ifdef NEWLIB
int errno;
int* __errno () { return &errno; }
#endif

int main()
{
  int rnd[4] = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD };
  char Rnd[4] = "NZUD";
  float x = 0x1.ff07fep+127f;
  float y;
  for (int i = 0; i < 4; i++)
  {
    fesetround (rnd[i]);
    y = sqrtf (x);
    printf ("RND%c: %a\n", Rnd[i], y);
  }
}

$ gcc -DNEWLIB -fno-builtin test_sqrt.c /localdisk/zimmerma/newlib-4.1.0/libm.a -lm
$ ./a.out
RNDN: 0x1.ff83fp+63
RNDZ: 0x1.ff83fp+63
RNDU: 0x1.ff83fp+63
RNDD: 0x1.ff83fp+63

The RNDZ and RNDD results are wrong. With glibc I get:

$ gcc -fno-builtin test_sqrt.c -lm
$ ./a.out
RNDN: 0x1.ff83fp+63
RNDZ: 0x1.ff83eep+63
RNDU: 0x1.ff83fp+63
RNDD: 0x1.ff83eep+63

Best regards,
Paul Zimmermann


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: incorrectly rounded square root
  2021-05-04  8:08 incorrectly rounded square root Paul Zimmermann
@ 2021-05-31 20:52 ` Jeff Johnston
  2021-06-01  7:11   ` Paul Zimmermann
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Johnston @ 2021-05-31 20:52 UTC (permalink / raw)
  To: Paul Zimmermann; +Cc: Newlib

Hi Paul,

Not all platforms supply a machine implementation of fesetround.  Which
platform are you using?

-- Jeff J.

On Tue, May 4, 2021 at 4:09 AM Paul Zimmermann <Paul.Zimmermann@inria.fr>
wrote:

>        Hi,
>
> according to IEEE 754, the square root function should be correctly rounded
> for all rounding modes. I noticed this is not the case in Newlib:
>
> $ cat test_sqrt.c
> #include <stdio.h>
> #include <math.h>
> #include <fenv.h>
>
> #ifdef NEWLIB
> int errno;
> int* __errno () { return &errno; }
> #endif
>
> int main()
> {
>   int rnd[4] = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD };
>   char Rnd[4] = "NZUD";
>   float x = 0x1.ff07fep+127f;
>   float y;
>   for (int i = 0; i < 4; i++)
>   {
>     fesetround (rnd[i]);
>     y = sqrtf (x);
>     printf ("RND%c: %a\n", Rnd[i], y);
>   }
> }
>
> $ gcc -DNEWLIB -fno-builtin test_sqrt.c
> /localdisk/zimmerma/newlib-4.1.0/libm.a -lm
> $ ./a.out
> RNDN: 0x1.ff83fp+63
> RNDZ: 0x1.ff83fp+63
> RNDU: 0x1.ff83fp+63
> RNDD: 0x1.ff83fp+63
>
> The RNDZ and RNDD results are wrong. With glibc I get:
>
> $ gcc -fno-builtin test_sqrt.c -lm
> $ ./a.out
> RNDN: 0x1.ff83fp+63
> RNDZ: 0x1.ff83eep+63
> RNDU: 0x1.ff83fp+63
> RNDD: 0x1.ff83eep+63
>
> Best regards,
> Paul Zimmermann
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: incorrectly rounded square root
  2021-05-31 20:52 ` Jeff Johnston
@ 2021-06-01  7:11   ` Paul Zimmermann
  2021-06-01 16:28     ` Jeff Johnston
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Zimmermann @ 2021-06-01  7:11 UTC (permalink / raw)
  To: Jeff Johnston; +Cc: newlib

       Hi Jeff,

> Not all platforms supply a machine implementation of fesetround.  Which
> platform are you using?

this is on a Intel(R) Core(TM) i5-4590, with gcc 10.2.1, under
Debian GNU/Linux 11 (bullseye).

Best regards,
Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: incorrectly rounded square root
  2021-06-01  7:11   ` Paul Zimmermann
@ 2021-06-01 16:28     ` Jeff Johnston
  2021-06-02  7:51       ` Paul Zimmermann
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Johnston @ 2021-06-01 16:28 UTC (permalink / raw)
  To: Paul Zimmermann; +Cc: Newlib, joel

Ok, not sure what is happening.

You should be using the newlib/libm/machine/shared_x86/fenv.c which
includes fesetround which you can confirm.

Comparing to glibc's version of fesetround, it is very similar.  The only
difference appears to be that glibc does the x87 fpu first, then the MSCSR
register for sse,
where newlib's version intermixes the code, but the same insn's are used.

Looking at the sqrtf implementation of newlib's libm/math/ef_sqrt.c vs
glibc's sysdeps/ieee754/flt-32/e_sqrtf.c
the rounding logic is identical.

    /* use floating add to find out rounding direction */
        if(ix!=0) {
            z = one-tiny; /* trigger inexact flag */
            if (z>=one) {
                z = one+tiny;
                if (z>one)
                    q += 2;
                else
                    q += (q&1);
            }
        }
        ix = (q>>1)+0x3f000000;
        ix += (m <<23);
        SET_FLOAT_WORD(z,ix);
        return z;

In fact, other than the normalization code, the code is identical (not sure
that difference is applicable here).

Corinna is on vacation so I'm cc'ing Joel in case he can offer some input.

-- Jeff J.

On Tue, Jun 1, 2021 at 3:12 AM Paul Zimmermann <Paul.Zimmermann@inria.fr>
wrote:

>        Hi Jeff,
>
> > Not all platforms supply a machine implementation of fesetround.  Which
> > platform are you using?
>
> this is on a Intel(R) Core(TM) i5-4590, with gcc 10.2.1, under
> Debian GNU/Linux 11 (bullseye).
>
> Best regards,
> Paul
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: incorrectly rounded square root
  2021-06-01 16:28     ` Jeff Johnston
@ 2021-06-02  7:51       ` Paul Zimmermann
  2021-06-02 13:07         ` Joel Sherrill
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Zimmermann @ 2021-06-02  7:51 UTC (permalink / raw)
  To: Jeff Johnston; +Cc: newlib, joel

       Hi Jeff,

thank you for your answer. After investigating more it seems fesetround()
is ineffective with newlib. Indeed if I print the result of fegetround()
just after the fesetround() calls I get:

fegetround: 0
RNDN: 0x1.ff83fp+63
fegetround: 0
RNDZ: 0x1.ff83fp+63
fegetround: 0
RNDU: 0x1.ff83fp+63
fegetround: 0
RNDD: 0x1.ff83fp+63

whereas with GNU libc I get:

fegetround: 0
RNDN: 0x1.ff83fp+63
fegetround: 3072
RNDZ: 0x1.ff83eep+63
fegetround: 2048
RNDU: 0x1.ff83fp+63
fegetround: 1024
RNDD: 0x1.ff83eep+63

Thus it seems this has nothing to do with the square root.

According to gdb the fesetround() code used is the following:

(gdb) break fesetround
Breakpoint 1 at 0x1650: file ../../../../../../newlib/libm/machine/x86_64/fenv.c, line 371.

Paul



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: incorrectly rounded square root
  2021-06-02  7:51       ` Paul Zimmermann
@ 2021-06-02 13:07         ` Joel Sherrill
  2021-06-02 18:43           ` Jeff Johnston
  0 siblings, 1 reply; 19+ messages in thread
From: Joel Sherrill @ 2021-06-02 13:07 UTC (permalink / raw)
  To: Paul Zimmermann; +Cc: Jeff Johnston, Newlib

On Wed, Jun 2, 2021 at 2:51 AM Paul Zimmermann <Paul.Zimmermann@inria.fr>
wrote:

>        Hi Jeff,
>
> thank you for your answer. After investigating more it seems fesetround()
> is ineffective with newlib. Indeed if I print the result of fegetround()
> just after the fesetround() calls I get:
>
> fegetround: 0
> RNDN: 0x1.ff83fp+63
> fegetround: 0
> RNDZ: 0x1.ff83fp+63
> fegetround: 0
> RNDU: 0x1.ff83fp+63
> fegetround: 0
> RNDD: 0x1.ff83fp+63
>
> whereas with GNU libc I get:
>
> fegetround: 0
> RNDN: 0x1.ff83fp+63
> fegetround: 3072
> RNDZ: 0x1.ff83eep+63
> fegetround: 2048
> RNDU: 0x1.ff83fp+63
> fegetround: 1024
> RNDD: 0x1.ff83eep+63
>
> Thus it seems this has nothing to do with the square root.
>
> According to gdb the fesetround() code used is the following:
>
> (gdb) break fesetround
> Breakpoint 1 at 0x1650: file
> ../../../../../../newlib/libm/machine/x86_64/fenv.c, line 371.
>

That is close to what I see in the source. That matches a variable
declaration in fesetround().

I was concerned that maybe the stub magic was resulting in empty
methods getting in for one of these. But doing an objdump on i386-rtems,
that doesn't appear to be the case.

The implementation came from Cygwin and was lightly massaged to
get here. A student and I migrated that code to newlib in Feb 2020.
Corinna did some work in March 2021 to use this for Cygwin but I
don't see any reason it is broken. The i386-rtems installed sys/fenv.h
is the one for x86.

fegetround() is quite simple and fesetround isn't much more than that.

https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libm/machine/shared_x86/fenv.c;h=ccc08e2d8103ab974baf8d9591f71e5565d73ace;hb=HEAD#l350

 I really expected to see a build issue and you using a stub. Could you
confirm that the code we expect to be in those methods really is there
for you? The line number indicates you have the expected code but....

The rounding is set in the x87 control register and then if SSE is there
(dynamic check), it is also set in the SSE control register. The rounding
more is read from the x86 control register and the SSE control register is
ignored.

Could you step through get and set and see if it looks like the x87
control register is actually changed?

I'm confused. This looks like it should work.

--joel

> Paul
>
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: incorrectly rounded square root
  2021-06-02 13:07         ` Joel Sherrill
@ 2021-06-02 18:43           ` Jeff Johnston
  2021-06-02 19:07             ` Marco Atzeri
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Johnston @ 2021-06-02 18:43 UTC (permalink / raw)
  To: joel; +Cc: Paul Zimmermann, Newlib

On Wed, Jun 2, 2021 at 9:08 AM Joel Sherrill <joel@rtems.org> wrote:

>
>
> On Wed, Jun 2, 2021 at 2:51 AM Paul Zimmermann <Paul.Zimmermann@inria.fr>
> wrote:
>
>>        Hi Jeff,
>>
>> thank you for your answer. After investigating more it seems fesetround()
>> is ineffective with newlib. Indeed if I print the result of fegetround()
>> just after the fesetround() calls I get:
>>
>> fegetround: 0
>> RNDN: 0x1.ff83fp+63
>> fegetround: 0
>> RNDZ: 0x1.ff83fp+63
>> fegetround: 0
>> RNDU: 0x1.ff83fp+63
>> fegetround: 0
>> RNDD: 0x1.ff83fp+63
>>
>> whereas with GNU libc I get:
>>
>> fegetround: 0
>> RNDN: 0x1.ff83fp+63
>> fegetround: 3072
>> RNDZ: 0x1.ff83eep+63
>> fegetround: 2048
>> RNDU: 0x1.ff83fp+63
>> fegetround: 1024
>> RNDD: 0x1.ff83eep+63
>>
>> Thus it seems this has nothing to do with the square root.
>>
>> According to gdb the fesetround() code used is the following:
>>
>> (gdb) break fesetround
>> Breakpoint 1 at 0x1650: file
>> ../../../../../../newlib/libm/machine/x86_64/fenv.c, line 371.
>>
>
> That is close to what I see in the source. That matches a variable
> declaration in fesetround().
>
> I was concerned that maybe the stub magic was resulting in empty
> methods getting in for one of these. But doing an objdump on i386-rtems,
> that doesn't appear to be the case.
>
> The implementation came from Cygwin and was lightly massaged to
> get here. A student and I migrated that code to newlib in Feb 2020.
> Corinna did some work in March 2021 to use this for Cygwin but I
> don't see any reason it is broken. The i386-rtems installed sys/fenv.h
> is the one for x86.
>
> fegetround() is quite simple and fesetround isn't much more than that.
>
>
> https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libm/machine/shared_x86/fenv.c;h=ccc08e2d8103ab974baf8d9591f71e5565d73ace;hb=HEAD#l350
>
>  I really expected to see a build issue and you using a stub. Could you
> confirm that the code we expect to be in those methods really is there
> for you? The line number indicates you have the expected code but....
>
> The rounding is set in the x87 control register and then if SSE is there
> (dynamic check), it is also set in the SSE control register. The rounding
> more is read from the x86 control register and the SSE control register is
> ignored.
>
> Could you step through get and set and see if it looks like the x87
> control register is actually changed?
>
> I'm confused. This looks like it should work.
>
> --joel
>
>
I'll second Joel's comment.  The code is extremely close to the glibc code
both in sqrtf and fesetround.  The only
thing I can think of is that the glibc code does the x87 stuff first and
does the set back into FPU state before doing the
SSE stuff.  The newlib code sets back the FPU state at the end after the
SSE stuff.  Don't know if this is relevant or not.

Any Cygwin users out there who can verify that the code is working/not
working for them?

-- Jeff J.


>> Paul
>>
>>
>>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: incorrectly rounded square root
  2021-06-02 18:43           ` Jeff Johnston
@ 2021-06-02 19:07             ` Marco Atzeri
  2021-06-02 19:12               ` Joel Sherrill
  2021-06-03 11:25               ` Paul Zimmermann
  0 siblings, 2 replies; 19+ messages in thread
From: Marco Atzeri @ 2021-06-02 19:07 UTC (permalink / raw)
  To: newlib

On 02.06.2021 20:43, Jeff Johnston wrote:
> On Wed, Jun 2, 2021 at 9:08 AM Joel Sherrill <joel@rtems.org> wrote:
> 
>>
>>
>> On Wed, Jun 2, 2021 at 2:51 AM Paul Zimmermann <Paul.Zimmermann@inria.fr>
>> wrote:
>>

>>
> I'll second Joel's comment.  The code is extremely close to the glibc code
> both in sqrtf and fesetround.  The only
> thing I can think of is that the glibc code does the x87 stuff first and
> does the set back into FPU state before doing the
> SSE stuff.  The newlib code sets back the FPU state at the end after the
> SSE stuff.  Don't know if this is relevant or not.
> 
> Any Cygwin users out there who can verify that the code is working/not
> working for them?
> 
> -- Jeff J.
> 

current Cygwin produces for both i686 and X86_64

  $ gcc -DNEWLIB -fno-builtin test_sqrt.c  -lm

  $ ./a.exe
RNDN: 0x1.ff83fp+63
RNDZ: 0x1.ff83fp+63
RNDU: 0x1.ff83fp+63
RNDD: 0x1.ff83fp+63


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: incorrectly rounded square root
  2021-06-02 19:07             ` Marco Atzeri
@ 2021-06-02 19:12               ` Joel Sherrill
  2021-06-03  3:01                 ` Jeff Johnston
  2021-06-03 11:25               ` Paul Zimmermann
  1 sibling, 1 reply; 19+ messages in thread
From: Joel Sherrill @ 2021-06-02 19:12 UTC (permalink / raw)
  To: Marco Atzeri; +Cc: Newlib

On Wed, Jun 2, 2021, 2:08 PM Marco Atzeri <marco.atzeri@gmail.com> wrote:

> On 02.06.2021 20:43, Jeff Johnston wrote:
> > On Wed, Jun 2, 2021 at 9:08 AM Joel Sherrill <joel@rtems.org> wrote:
> >
> >>
> >>
> >> On Wed, Jun 2, 2021 at 2:51 AM Paul Zimmermann <
> Paul.Zimmermann@inria.fr>
> >> wrote:
> >>
>
> >>
> > I'll second Joel's comment.  The code is extremely close to the glibc
> code
> > both in sqrtf and fesetround.  The only
> > thing I can think of is that the glibc code does the x87 stuff first and
> > does the set back into FPU state before doing the
> > SSE stuff.  The newlib code sets back the FPU state at the end after the
> > SSE stuff.  Don't know if this is relevant or not.
> >
> > Any Cygwin users out there who can verify that the code is working/not
> > working for them?
> >
> > -- Jeff J.
> >
>
> current Cygwin produces for both i686 and X86_64
>
>   $ gcc -DNEWLIB -fno-builtin test_sqrt.c  -lm
>
>   $ ./a.exe
> RNDN: 0x1.ff83fp+63
> RNDZ: 0x1.ff83fp+63
> RNDU: 0x1.ff83fp+63
> RNDD: 0x1.ff83fp+63
>

Does fegetround() return different values based on what mode was set?

--joel

>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: incorrectly rounded square root
  2021-06-02 19:12               ` Joel Sherrill
@ 2021-06-03  3:01                 ` Jeff Johnston
  2021-06-03 10:21                   ` Paul Zimmermann
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Johnston @ 2021-06-03  3:01 UTC (permalink / raw)
  To: joel; +Cc: Marco Atzeri, Newlib

[-- Attachment #1: Type: text/plain, Size: 1741 bytes --]

Something  weird is going on.  I condensed the newlib code into the
following files and when I compile and run the test,
it works as expected:

gcc -std=c99 -g3 -O0 test_sqrt.c fesetround.c ef_sqrt.c
[jjohnstn@localhost shared_x86]$ ./a.out
RNDN: 0x1.ff83fp+63
RNDZ: 0x1.ff83eep+63
RNDU: 0x1.ff83fp+63
RNDD: 0x1.ff83eep+63

Can you verify that you are calling the fesetround in fenv.c and what
configuration call did you use for building newlib?

-- Jeff J.



On Wed, Jun 2, 2021 at 3:12 PM Joel Sherrill <joel@rtems.org> wrote:

> On Wed, Jun 2, 2021, 2:08 PM Marco Atzeri <marco.atzeri@gmail.com> wrote:
>
> > On 02.06.2021 20:43, Jeff Johnston wrote:
> > > On Wed, Jun 2, 2021 at 9:08 AM Joel Sherrill <joel@rtems.org> wrote:
> > >
> > >>
> > >>
> > >> On Wed, Jun 2, 2021 at 2:51 AM Paul Zimmermann <
> > Paul.Zimmermann@inria.fr>
> > >> wrote:
> > >>
> >
> > >>
> > > I'll second Joel's comment.  The code is extremely close to the glibc
> > code
> > > both in sqrtf and fesetround.  The only
> > > thing I can think of is that the glibc code does the x87 stuff first
> and
> > > does the set back into FPU state before doing the
> > > SSE stuff.  The newlib code sets back the FPU state at the end after
> the
> > > SSE stuff.  Don't know if this is relevant or not.
> > >
> > > Any Cygwin users out there who can verify that the code is working/not
> > > working for them?
> > >
> > > -- Jeff J.
> > >
> >
> > current Cygwin produces for both i686 and X86_64
> >
> >   $ gcc -DNEWLIB -fno-builtin test_sqrt.c  -lm
> >
> >   $ ./a.exe
> > RNDN: 0x1.ff83fp+63
> > RNDZ: 0x1.ff83fp+63
> > RNDU: 0x1.ff83fp+63
> > RNDD: 0x1.ff83fp+63
> >
>
> Does fegetround() return different values based on what mode was set?
>
> --joel
>
> >
> >
>
>

[-- Attachment #2: ef_sqrt.c --]
[-- Type: text/x-csrc, Size: 1939 bytes --]

/* ef_sqrtf.c -- float version of e_sqrt.c.
 * Conversion to float by Ian Lance Taylor, Cygnus Support, ian@cygnus.com.
 */

/*
 * ====================================================
 * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
 *
 * Developed at SunPro, a Sun Microsystems, Inc. business.
 * Permission to use, copy, modify, and distribute this
 * software is freely granted, provided that this notice 
 * is preserved.
 * ====================================================
 */

#include "fdlibm.h"

#ifdef __STDC__
static	const float	one	= 1.0, tiny=1.0e-30;
#else
static	float	one	= 1.0, tiny=1.0e-30;
#endif

	float sqrtf(float x)
{
	float z;
	__uint32_t r,hx;
	__int32_t ix,s,q,m,t,i;

	GET_FLOAT_WORD(ix,x);
	hx = ix&0x7fffffff;

    /* take care of Inf and NaN */
	if(!FLT_UWORD_IS_FINITE(hx))
	    return x*x+x;		/* sqrt(NaN)=NaN, sqrt(+inf)=+inf
					   sqrt(-inf)=sNaN */
    /* take care of zero and -ves */
	if(FLT_UWORD_IS_ZERO(hx)) return x;/* sqrt(+-0) = +-0 */
	if(ix<0) return (x-x)/(x-x);		/* sqrt(-ve) = sNaN */

    /* normalize x */
	m = (ix>>23);
	if(FLT_UWORD_IS_SUBNORMAL(hx)) {		/* subnormal x */
	    for(i=0;(ix&0x00800000L)==0;i++) ix<<=1;
	    m -= i-1;
	}
	m -= 127;	/* unbias exponent */
	ix = (ix&0x007fffffL)|0x00800000L;
	if(m&1)	/* odd m, double x to make it even */
	    ix += ix;
	m >>= 1;	/* m = [m/2] */

    /* generate sqrt(x) bit by bit */
	ix += ix;
	q = s = 0;		/* q = sqrt(x) */
	r = 0x01000000L;		/* r = moving bit from right to left */

	while(r!=0) {
	    t = s+r; 
	    if(t<=ix) { 
		s    = t+r; 
		ix  -= t; 
		q   += r; 
	    } 
	    ix += ix;
	    r>>=1;
	}

    /* use floating add to find out rounding direction */
	if(ix!=0) {
	    z = one-tiny; /* trigger inexact flag */
	    if (z>=one) {
	        z = one+tiny;
		if (z>one)
		    q += 2;
		else
		    q += (q&1);
	    }
	}
	ix = (q>>1)+0x3f000000L;
	ix += (m <<23);
	SET_FLOAT_WORD(z,ix);
	return z;
}

[-- Attachment #3: fdlibm.h --]
[-- Type: text/x-chdr, Size: 12567 bytes --]


/* @(#)fdlibm.h 5.1 93/09/24 */
/*
 * ====================================================
 * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
 *
 * Developed at SunPro, a Sun Microsystems, Inc. business.
 * Permission to use, copy, modify, and distribute this
 * software is freely granted, provided that this notice 
 * is preserved.
 * ====================================================
 */

/* REDHAT LOCAL: Include files.  */
#include <sys/types.h>

#define __IEEE_LITTLE_ENDIAN

/* Most routines need to check whether a float is finite, infinite, or not a
   number, and many need to know whether the result of an operation will
   overflow.  These conditions depend on whether the largest exponent is
   used for NaNs & infinities, or whether it's used for finite numbers.  The
   macros below wrap up that kind of information:

   FLT_UWORD_IS_FINITE(X)
	True if a positive float with bitmask X is finite.

   FLT_UWORD_IS_NAN(X)
	True if a positive float with bitmask X is not a number.

   FLT_UWORD_IS_INFINITE(X)
	True if a positive float with bitmask X is +infinity.

   FLT_UWORD_MAX
	The bitmask of FLT_MAX.

   FLT_UWORD_HALF_MAX
	The bitmask of FLT_MAX/2.

   FLT_UWORD_EXP_MAX
	The bitmask of the largest finite exponent (129 if the largest
	exponent is used for finite numbers, 128 otherwise).

   FLT_UWORD_LOG_MAX
	The bitmask of log(FLT_MAX), rounded down.  This value is the largest
	input that can be passed to exp() without producing overflow.

   FLT_UWORD_LOG_2MAX
	The bitmask of log(2*FLT_MAX), rounded down.  This value is the
	largest input than can be passed to cosh() without producing
	overflow.

   FLT_LARGEST_EXP
	The largest biased exponent that can be used for finite numbers
	(255 if the largest exponent is used for finite numbers, 254
	otherwise) */

#ifdef _FLT_LARGEST_EXPONENT_IS_NORMAL
#define FLT_UWORD_IS_FINITE(x) 1
#define FLT_UWORD_IS_NAN(x) 0
#define FLT_UWORD_IS_INFINITE(x) 0
#define FLT_UWORD_MAX 0x7fffffff
#define FLT_UWORD_EXP_MAX 0x43010000
#define FLT_UWORD_LOG_MAX 0x42b2d4fc
#define FLT_UWORD_LOG_2MAX 0x42b437e0
#define HUGE ((float)0X1.FFFFFEP128)
#else
#define FLT_UWORD_IS_FINITE(x) ((x)<0x7f800000L)
#define FLT_UWORD_IS_NAN(x) ((x)>0x7f800000L)
#define FLT_UWORD_IS_INFINITE(x) ((x)==0x7f800000L)
#define FLT_UWORD_MAX 0x7f7fffffL
#define FLT_UWORD_EXP_MAX 0x43000000
#define FLT_UWORD_LOG_MAX 0x42b17217
#define FLT_UWORD_LOG_2MAX 0x42b2d4fc
#define HUGE ((float)3.40282346638528860e+38)
#endif
#define FLT_UWORD_HALF_MAX (FLT_UWORD_MAX-(1L<<23))
#define FLT_LARGEST_EXP (FLT_UWORD_MAX>>23)

/* Many routines check for zero and subnormal numbers.  Such things depend
   on whether the target supports denormals or not:

   FLT_UWORD_IS_ZERO(X)
	True if a positive float with bitmask X is +0.	Without denormals,
	any float with a zero exponent is a +0 representation.	With
	denormals, the only +0 representation is a 0 bitmask.

   FLT_UWORD_IS_SUBNORMAL(X)
	True if a non-zero positive float with bitmask X is subnormal.
	(Routines should check for zeros first.)

   FLT_UWORD_MIN
	The bitmask of the smallest float above +0.  Call this number
	REAL_FLT_MIN...

   FLT_UWORD_EXP_MIN
	The bitmask of the float representation of REAL_FLT_MIN's exponent.

   FLT_UWORD_LOG_MIN
	The bitmask of |log(REAL_FLT_MIN)|, rounding down.

   FLT_SMALLEST_EXP
	REAL_FLT_MIN's exponent - EXP_BIAS (1 if denormals are not supported,
	-22 if they are).
*/

#ifdef _FLT_NO_DENORMALS
#define FLT_UWORD_IS_ZERO(x) ((x)<0x00800000L)
#define FLT_UWORD_IS_SUBNORMAL(x) 0
#define FLT_UWORD_MIN 0x00800000
#define FLT_UWORD_EXP_MIN 0x42fc0000
#define FLT_UWORD_LOG_MIN 0x42aeac50
#define FLT_SMALLEST_EXP 1
#else
#define FLT_UWORD_IS_ZERO(x) ((x)==0)
#define FLT_UWORD_IS_SUBNORMAL(x) ((x)<0x00800000L)
#define FLT_UWORD_MIN 0x00000001
#define FLT_UWORD_EXP_MIN 0x43160000
#define FLT_UWORD_LOG_MIN 0x42cff1b5
#define FLT_SMALLEST_EXP -22
#endif

#ifdef __STDC__
#undef __P
#define	__P(p)	p
#else
#define	__P(p)	()
#endif

/* 
 * set X_TLOSS = pi*2**52, which is possibly defined in <values.h>
 * (one may replace the following line by "#include <values.h>")
 */

#define X_TLOSS		1.41484755040568800000e+16 

/* Functions that are not documented, and are not in <math.h>.  */

#ifdef _SCALB_INT
extern double scalb __P((double, int));
#else
extern double scalb __P((double, double));
#endif
extern double significand __P((double));

extern long double __ieee754_hypotl __P((long double, long double));

/* ieee style elementary functions */
extern double __ieee754_sqrt __P((double));			
extern double __ieee754_acos __P((double));			
extern double __ieee754_acosh __P((double));			
extern double __ieee754_log __P((double));			
extern double __ieee754_atanh __P((double));			
extern double __ieee754_asin __P((double));			
extern double __ieee754_atan2 __P((double,double));			
extern double __ieee754_exp __P((double));
extern double __ieee754_cosh __P((double));
extern double __ieee754_fmod __P((double,double));
extern double __ieee754_pow __P((double,double));
extern double __ieee754_lgamma_r __P((double,int *));
extern double __ieee754_gamma_r __P((double,int *));
extern double __ieee754_tgamma __P((double));
extern double __ieee754_log10 __P((double));
extern double __ieee754_sinh __P((double));
extern double __ieee754_hypot __P((double,double));
extern double __ieee754_j0 __P((double));
extern double __ieee754_j1 __P((double));
extern double __ieee754_y0 __P((double));
extern double __ieee754_y1 __P((double));
extern double __ieee754_jn __P((int,double));
extern double __ieee754_yn __P((int,double));
extern double __ieee754_remainder __P((double,double));
extern __int32_t __ieee754_rem_pio2 __P((double,double*));
#ifdef _SCALB_INT
extern double __ieee754_scalb __P((double,int));
#else
extern double __ieee754_scalb __P((double,double));
#endif

/* fdlibm kernel function */
extern double __kernel_standard __P((double,double,int));
extern double __kernel_sin __P((double,double,int));
extern double __kernel_cos __P((double,double));
extern double __kernel_tan __P((double,double,int));
extern int    __kernel_rem_pio2 __P((double*,double*,int,int,int,const __int32_t*));

/* Undocumented float functions.  */
#ifdef _SCALB_INT
extern float scalbf __P((float, int));
#else
extern float scalbf __P((float, float));
#endif
extern float significandf __P((float));

/* ieee style elementary float functions */
extern float __ieee754_sqrtf __P((float));			
extern float __ieee754_acosf __P((float));			
extern float __ieee754_acoshf __P((float));			
extern float __ieee754_logf __P((float));			
extern float __ieee754_atanhf __P((float));			
extern float __ieee754_asinf __P((float));			
extern float __ieee754_atan2f __P((float,float));			
extern float __ieee754_expf __P((float));
extern float __ieee754_coshf __P((float));
extern float __ieee754_fmodf __P((float,float));
extern float __ieee754_powf __P((float,float));
extern float __ieee754_lgammaf_r __P((float,int *));
extern float __ieee754_gammaf_r __P((float,int *));
extern float __ieee754_tgammaf __P((float));
extern float __ieee754_log10f __P((float));
extern float __ieee754_sinhf __P((float));
extern float __ieee754_hypotf __P((float,float));
extern float __ieee754_j0f __P((float));
extern float __ieee754_j1f __P((float));
extern float __ieee754_y0f __P((float));
extern float __ieee754_y1f __P((float));
extern float __ieee754_jnf __P((int,float));
extern float __ieee754_ynf __P((int,float));
extern float __ieee754_remainderf __P((float,float));
extern __int32_t __ieee754_rem_pio2f __P((float,float*));
#ifdef _SCALB_INT
extern float __ieee754_scalbf __P((float,int));
#else
extern float __ieee754_scalbf __P((float,float));
#endif

#if !__OBSOLETE_MATH
/* The new math code does not provide separate wrapper function
   for error handling, so the extern symbol is called directly.
   This is valid as long as there are no namespace issues (the
   extern symbol is reserved whenever the caller is reserved)
   and there are no observable error handling side effects.  */
# define __ieee754_exp(x) exp(x)
# define __ieee754_log(x) log(x)
# define __ieee754_pow(x,y) pow(x,y)
# define __ieee754_expf(x) expf(x)
# define __ieee754_logf(x) logf(x)
# define __ieee754_powf(x,y) powf(x,y)
#endif

/* float versions of fdlibm kernel functions */
extern float __kernel_sinf __P((float,float,int));
extern float __kernel_cosf __P((float,float));
extern float __kernel_tanf __P((float,float,int));
extern int   __kernel_rem_pio2f __P((float*,float*,int,int,int,const __int32_t*));

/* The original code used statements like
	n0 = ((*(int*)&one)>>29)^1;		* index of high word *
	ix0 = *(n0+(int*)&x);			* high word of x *
	ix1 = *((1-n0)+(int*)&x);		* low word of x *
   to dig two 32 bit words out of the 64 bit IEEE floating point
   value.  That is non-ANSI, and, moreover, the gcc instruction
   scheduler gets it wrong.  We instead use the following macros.
   Unlike the original code, we determine the endianness at compile
   time, not at run time; I don't see much benefit to selecting
   endianness at run time.  */

#ifndef __IEEE_BIG_ENDIAN
#ifndef __IEEE_LITTLE_ENDIAN
 #error Must define endianness
#endif
#endif

/* A union which permits us to convert between a double and two 32 bit
   ints.  */

#ifdef __IEEE_BIG_ENDIAN

typedef union 
{
  double value;
  struct 
  {
    __uint32_t msw;
    __uint32_t lsw;
  } parts;
} ieee_double_shape_type;

#endif

#ifdef __IEEE_LITTLE_ENDIAN

typedef union 
{
  double value;
  struct 
  {
    __uint32_t lsw;
    __uint32_t msw;
  } parts;
} ieee_double_shape_type;

#endif

/* Get two 32 bit ints from a double.  */

#define EXTRACT_WORDS(ix0,ix1,d)				\
do {								\
  ieee_double_shape_type ew_u;					\
  ew_u.value = (d);						\
  (ix0) = ew_u.parts.msw;					\
  (ix1) = ew_u.parts.lsw;					\
} while (0)

/* Get the more significant 32 bit int from a double.  */

#define GET_HIGH_WORD(i,d)					\
do {								\
  ieee_double_shape_type gh_u;					\
  gh_u.value = (d);						\
  (i) = gh_u.parts.msw;						\
} while (0)

/* Get the less significant 32 bit int from a double.  */

#define GET_LOW_WORD(i,d)					\
do {								\
  ieee_double_shape_type gl_u;					\
  gl_u.value = (d);						\
  (i) = gl_u.parts.lsw;						\
} while (0)

/* Set a double from two 32 bit ints.  */

#define INSERT_WORDS(d,ix0,ix1)					\
do {								\
  ieee_double_shape_type iw_u;					\
  iw_u.parts.msw = (ix0);					\
  iw_u.parts.lsw = (ix1);					\
  (d) = iw_u.value;						\
} while (0)

/* Set the more significant 32 bits of a double from an int.  */

#define SET_HIGH_WORD(d,v)					\
do {								\
  ieee_double_shape_type sh_u;					\
  sh_u.value = (d);						\
  sh_u.parts.msw = (v);						\
  (d) = sh_u.value;						\
} while (0)

/* Set the less significant 32 bits of a double from an int.  */

#define SET_LOW_WORD(d,v)					\
do {								\
  ieee_double_shape_type sl_u;					\
  sl_u.value = (d);						\
  sl_u.parts.lsw = (v);						\
  (d) = sl_u.value;						\
} while (0)

/* A union which permits us to convert between a float and a 32 bit
   int.  */

typedef union
{
  float value;
  __uint32_t word;
} ieee_float_shape_type;

/* Get a 32 bit int from a float.  */

#define GET_FLOAT_WORD(i,d)					\
do {								\
  ieee_float_shape_type gf_u;					\
  gf_u.value = (d);						\
  (i) = gf_u.word;						\
} while (0)

/* Set a float from a 32 bit int.  */

#define SET_FLOAT_WORD(d,i)					\
do {								\
  ieee_float_shape_type sf_u;					\
  sf_u.word = (i);						\
  (d) = sf_u.value;						\
} while (0)

/* Macros to avoid undefined behaviour that can arise if the amount
   of a shift is exactly equal to the size of the shifted operand.  */

#define SAFE_LEFT_SHIFT(op,amt)					\
  (((amt) < 8 * sizeof(op)) ? ((op) << (amt)) : 0)

#define SAFE_RIGHT_SHIFT(op,amt)				\
  (((amt) < 8 * sizeof(op)) ? ((op) >> (amt)) : 0)

#ifdef  _COMPLEX_H

/*
 * Quoting from ISO/IEC 9899:TC2:
 *
 * 6.2.5.13 Types
 * Each complex type has the same representation and alignment requirements as
 * an array type containing exactly two elements of the corresponding real type;
 * the first element is equal to the real part, and the second element to the
 * imaginary part, of the complex number.
 */
typedef union {
        float complex z;
        float parts[2];
} float_complex;

typedef union {
        double complex z;
        double parts[2];
} double_complex;

typedef union {
        long double complex z;
        long double parts[2];
} long_double_complex;

#define REAL_PART(z)    ((z).parts[0])
#define IMAG_PART(z)    ((z).parts[1])

#endif  /* _COMPLEX_H */


[-- Attachment #4: fesetround.c --]
[-- Type: text/x-csrc, Size: 2168 bytes --]

/*
 * SPDX-License-Identifier: BSD-2-Clause
 *
 * Copyright (c) 2010-2019 Red Hat, Inc.
 */

#define _GNU_SOURCE        // for FE_NOMASK_ENV

#include <errno.h>

/*  Mask and shift amount for rounding bits.  */
#define FE_CW_ROUND_MASK	(0x0c00)
#define FE_CW_ROUND_SHIFT	(10)
/*  Same, for SSE MXCSR.  */
#define FE_MXCSR_ROUND_MASK	(0x6000)
#define FE_MXCSR_ROUND_SHIFT	(13)

#define FE_DOWNWARD     (1)
#define FE_TONEAREST    (0)
#define FE_TOWARDZERO   (3)
#define FE_UPWARD       (2)

static inline int use_sse(void)
{
  unsigned int edx, eax;

  /* Check for presence of SSE: invoke CPUID #1, check EDX bit 25.  */
  eax = 1;
  __asm__ volatile ("cpuid" : "=d" (edx), "+a" (eax) :: "%ecx", "%ebx");
  /* If this flag isn't set we'll avoid trying to execute any SSE.  */
  if ((edx & (1 << 25)) != 0)
    return 1;

  return 0;
}

/*  Returns the currently selected rounding mode, represented by one of the
   values of the defined rounding mode macros.  */
int
fegetround (void)
{
  unsigned short cw;

  /* Get control word.  We assume SSE and x87 stay in sync.  */
  __asm__ volatile ("fnstcw %0" : "=m" (*&cw) : );

  return (cw & FE_CW_ROUND_MASK) >> FE_CW_ROUND_SHIFT;
}

/*  Changes the currently selected rounding mode to round. If round does
   not correspond to one of the supported rounding modes nothing is changed.
   fesetround returns zero if it changed the rounding mode, a nonzero value
   if the mode is not supported.  */
int
fesetround (int round)
{
  unsigned short cw;
  unsigned int mxcsr = 0;

  /* Will succeed for any valid value of the input parameter.  */
  if (round < FE_TONEAREST || round > FE_TOWARDZERO)
    return EINVAL;

  /* Get control words.  */
  __asm__ volatile ("fnstcw %0" : "=m" (cw) : );
if (use_sse())
    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr) : );

  /* Twiddle bits.  */
  cw &= ~FE_CW_ROUND_MASK;
  cw |= (round << FE_CW_ROUND_SHIFT);
  

  mxcsr &= ~FE_MXCSR_ROUND_MASK;
  mxcsr |= (round << FE_MXCSR_ROUND_SHIFT);
/* Set back into FPU state.  */
  __asm__ volatile ("fldcw %0" :: "m" (cw));
  if (use_sse())
    __asm__ volatile ("ldmxcsr %0" :: "m" (mxcsr));

  /* Indicate success.  */
  return 0;
}


[-- Attachment #5: test_sqrt.c --]
[-- Type: text/x-csrc, Size: 446 bytes --]

#include <stdio.h>
#include <math.h>

int fesetround(int);
#define FE_DOWNWARD     (1)
#define FE_TONEAREST    (0)
#define FE_TOWARDZERO   (3)
#define FE_UPWARD       (2)


int main()
{
  int rnd[4] = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD };
  char Rnd[4] = "NZUD";
  float x = 0x1.ff07fep+127f;
  float y;
  for (int i = 0; i < 4; i++)
  {
    fesetround(rnd[i]);
    y = sqrtf (x);
    printf ("RND%c: %a\n", Rnd[i], y);
  }
}


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: incorrectly rounded square root
  2021-06-03  3:01                 ` Jeff Johnston
@ 2021-06-03 10:21                   ` Paul Zimmermann
  2021-06-03 15:50                     ` Jeff Johnston
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Zimmermann @ 2021-06-03 10:21 UTC (permalink / raw)
  To: Jeff Johnston; +Cc: joel, newlib

       Hi Jeff,

I've investigated and found (partially) the cause of the problem. When I
compile my program, gcc uses the system fenv.h, which contains FE_TONEAREST=0,
FE_DOWNWARD=0x400, FE_UPWARD=0x800, and FE_TOWARDZERO=0xc00 in
/usr/include/bits/fenv.h.

However, the values in x86_64/newlib/targ-include/sys/fenv.h are different:
FE_TONEAREST=0, FE_DOWNWARD=1, FE_UPWARD=2, FE_TOWARDZERO=3. Thus the test
at the beginning of fesetround in newlib/libm/machine/x86_64/fenv.c fails
except for round=FE_TONEAREST:

  /* Will succeed for any valid value of the input parameter.  */
  if (round < FE_TONEAREST || round > FE_TOWARDZERO)
    return EINVAL;

enter fesetround, round=0 use_sse=1
enter fesetround, round=3072 use_sse=1
enter fesetround, round=2048 use_sse=1
enter fesetround, round=1024 use_sse=1

This explains why the other modes are not set.

A first question is why Newlib and the system libm (here glibc) use different
constants for the rounding modes.

Now if I compile with -I/tmp/x86_64/include (where the Newlib fenv.h is
installed) I get:

enter fesetround, round=0 use_sse=1
enter fesetround, round=3 use_sse=1
enter fesetround, round=2 use_sse=1
enter fesetround, round=1 use_sse=1

but I still do get the same results whatever the rounding mode!

If I look at the cw and mxcsr registers like in Newlib fenv.c, I get with glibc
(after the fesetround call):

fegetround: 0
cw=895 mxcsr=8064
RNDN: 0x1.ff83fp+63
fegetround: 3072
cw=3967 mxcsr=32672
RNDZ: 0x1.ff83eep+63
fegetround: 2048
cw=2943 mxcsr=24480
RNDU: 0x1.ff83fp+63
fegetround: 1024
cw=1919 mxcsr=16288
RNDD: 0x1.ff83eep+63

and with Newlib:

fegetround: 0
cw=895 mxcsr=8064
RNDN: 0x1.ff83fp+63
fegetround: 3
cw=3967 mxcsr=32640
RNDZ: 0x1.ff83fp+63
fegetround: 2
cw=2943 mxcsr=24448
RNDU: 0x1.ff83fp+63
fegetround: 1
cw=1919 mxcsr=16256
RNDD: 0x1.ff83fp+63

The cw values do match but not the mxcsr values.

Best regards,
Paul

> Something  weird is going on.  I condensed the newlib code into the
> following files and when I compile and run the test,
> it works as expected:
> 
> gcc -std=c99 -g3 -O0 test_sqrt.c fesetround.c ef_sqrt.c
> [jjohnstn@localhost shared_x86]$ ./a.out
> RNDN: 0x1.ff83fp+63
> RNDZ: 0x1.ff83eep+63
> RNDU: 0x1.ff83fp+63
> RNDD: 0x1.ff83eep+63
> 
> Can you verify that you are calling the fesetround in fenv.c and what
> configuration call did you use for building newlib?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: incorrectly rounded square root
  2021-06-02 19:07             ` Marco Atzeri
  2021-06-02 19:12               ` Joel Sherrill
@ 2021-06-03 11:25               ` Paul Zimmermann
  1 sibling, 0 replies; 19+ messages in thread
From: Paul Zimmermann @ 2021-06-03 11:25 UTC (permalink / raw)
  To: Marco Atzeri; +Cc: newlib

       Dear Marco,

> current Cygwin produces for both i686 and X86_64
> 
>   $ gcc -DNEWLIB -fno-builtin test_sqrt.c  -lm
> 
>   $ ./a.exe
> RNDN: 0x1.ff83fp+63
> RNDZ: 0x1.ff83fp+63
> RNDU: 0x1.ff83fp+63
> RNDD: 0x1.ff83fp+63

make sure you are using the Newlib fenv.h (or print the values of FE_TONEAREST,
FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD). Cf my previous mail.

Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: incorrectly rounded square root
  2021-06-03 10:21                   ` Paul Zimmermann
@ 2021-06-03 15:50                     ` Jeff Johnston
  2021-06-04  7:14                       ` Paul Zimmermann
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Johnston @ 2021-06-03 15:50 UTC (permalink / raw)
  To: Paul Zimmermann; +Cc: joel, Newlib

Hi Paul,

I figured the values were off when I had to hard-code them in my own
test_sqrt.c but forgot to include that info in my note.

Now, that said, using the code I attached earlier, I am seeing the exact
values you are quoting above for glibc for the mxcsr register and the round
is working.  Have your
tried running that code?

The mxcsr values you are seeing that are different are not due to the
fesetround code.  The code is shifting the round value 13 bits
and for 3, that ends up being 0x6000.  It is masking mxcsr with 0xffff9fff
first so when you start with 0x1fxx and end up with 0x7fxx, the code is
doing what is supposed to do.
The difference in values above is 0x20 (e.g. 0x7fa0 vs 0x7f80) which is a
bit in the last 2 hex digits which isn't touched by the code logic.

-- Jeff J.



On Thu, Jun 3, 2021 at 6:22 AM Paul Zimmermann <Paul.Zimmermann@inria.fr>
wrote:

>        Hi Jeff,
>
> I've investigated and found (partially) the cause of the problem. When I
> compile my program, gcc uses the system fenv.h, which contains
> FE_TONEAREST=0,
> FE_DOWNWARD=0x400, FE_UPWARD=0x800, and FE_TOWARDZERO=0xc00 in
> /usr/include/bits/fenv.h.
>
> However, the values in x86_64/newlib/targ-include/sys/fenv.h are different:
> FE_TONEAREST=0, FE_DOWNWARD=1, FE_UPWARD=2, FE_TOWARDZERO=3. Thus the test
> at the beginning of fesetround in newlib/libm/machine/x86_64/fenv.c fails
> except for round=FE_TONEAREST:
>
>   /* Will succeed for any valid value of the input parameter.  */
>   if (round < FE_TONEAREST || round > FE_TOWARDZERO)
>     return EINVAL;
>
> enter fesetround, round=0 use_sse=1
> enter fesetround, round=3072 use_sse=1
> enter fesetround, round=2048 use_sse=1
> enter fesetround, round=1024 use_sse=1
>
> This explains why the other modes are not set.
>
> A first question is why Newlib and the system libm (here glibc) use
> different
> constants for the rounding modes.
>
> Now if I compile with -I/tmp/x86_64/include (where the Newlib fenv.h is
> installed) I get:
>
> enter fesetround, round=0 use_sse=1
> enter fesetround, round=3 use_sse=1
> enter fesetround, round=2 use_sse=1
> enter fesetround, round=1 use_sse=1
>
> but I still do get the same results whatever the rounding mode!
>
> If I look at the cw and mxcsr registers like in Newlib fenv.c, I get with
> glibc
> (after the fesetround call):
>
> fegetround: 0
> cw=895 mxcsr=8064
> RNDN: 0x1.ff83fp+63
> fegetround: 3072
> cw=3967 mxcsr=32672
> RNDZ: 0x1.ff83eep+63
> fegetround: 2048
> cw=2943 mxcsr=24480
> RNDU: 0x1.ff83fp+63
> fegetround: 1024
> cw=1919 mxcsr=16288
> RNDD: 0x1.ff83eep+63
>
> and with Newlib:
>
> fegetround: 0
> cw=895 mxcsr=8064
> RNDN: 0x1.ff83fp+63
> fegetround: 3
> cw=3967 mxcsr=32640
> RNDZ: 0x1.ff83fp+63
> fegetround: 2
> cw=2943 mxcsr=24448
> RNDU: 0x1.ff83fp+63
> fegetround: 1
> cw=1919 mxcsr=16256
> RNDD: 0x1.ff83fp+63
>
> The cw values do match but not the mxcsr values.
>
> Best regards,
> Paul
>
> > Something  weird is going on.  I condensed the newlib code into the
> > following files and when I compile and run the test,
> > it works as expected:
> >
> > gcc -std=c99 -g3 -O0 test_sqrt.c fesetround.c ef_sqrt.c
> > [jjohnstn@localhost shared_x86]$ ./a.out
> > RNDN: 0x1.ff83fp+63
> > RNDZ: 0x1.ff83eep+63
> > RNDU: 0x1.ff83fp+63
> > RNDD: 0x1.ff83eep+63
> >
> > Can you verify that you are calling the fesetround in fenv.c and what
> > configuration call did you use for building newlib?
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: incorrectly rounded square root
  2021-06-03 15:50                     ` Jeff Johnston
@ 2021-06-04  7:14                       ` Paul Zimmermann
  2021-06-04 18:44                         ` Jeff Johnston
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Zimmermann @ 2021-06-04  7:14 UTC (permalink / raw)
  To: Jeff Johnston; +Cc: joel, newlib

       Hi Jeff,

> I figured the values were off when I had to hard-code them in my own
> test_sqrt.c but forgot to include that info in my note.
> 
> Now, that said, using the code I attached earlier, I am seeing the exact
> values you are quoting above for glibc for the mxcsr register and the round
> is working.  Have your
> tried running that code?

yes it works as expected, but it doesn't work with Newlib's fenv.h and libm.a
(see below).

> The mxcsr values you are seeing that are different are not due to the
> fesetround code.  The code is shifting the round value 13 bits
> and for 3, that ends up being 0x6000.  It is masking mxcsr with 0xffff9fff
> first so when you start with 0x1fxx and end up with 0x7fxx, the code is
> doing what is supposed to do.
> The difference in values above is 0x20 (e.g. 0x7fa0 vs 0x7f80) which is a
> bit in the last 2 hex digits which isn't touched by the code logic.

here is how to reproduce the issue:

tar xf newlib-4.1.0.tar.gz
cd newlib-4.1.0
mkdir build
cd build
../configure --prefix=/tmp --disable-multilib --target=x86_64
make -j4
make install

$ cat test_sqrt_2.c
#include <stdio.h>
#include <math.h>
#include <fenv.h>

#ifdef NEWLIB
/* RedHat's libm claims:
   undefined reference to `__errno' in j1f/y1f */
int errno;
int* __errno () { return &errno; }
#endif

int main()
{
  int rnd[4] = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD };
  char Rnd[4] = "NZUD";
  float x = 0x1.ff07fep+127f;
  float y;
  for (int i = 0; i < 4; i++)
  {
    unsigned short cw;
    unsigned int mxcsr = 0;
    fesetround (rnd[i]);
    __asm__ volatile ("fnstcw %0" : "=m" (cw) : );
    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr) : );
    y = sqrtf (x);
    printf ("RND%c: %a cw=%u mxcsr=%u\n", Rnd[i], y, cw, mxcsr);
  }
}

With GNU libc:
$ gcc -fno-builtin test_sqrt_2.c -lm
$ ./a.out
RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064
RNDZ: 0x1.ff83eep+63 cw=3967 mxcsr=32672
RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24480
RNDD: 0x1.ff83eep+63 cw=1919 mxcsr=16288

With Newlib:
$ gcc -I/tmp/x86_64/include -DNEWLIB -fno-builtin test_sqrt_2.c /tmp/libm.a
$ ./a.out 
RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064
RNDZ: 0x1.ff83fp+63 cw=3967 mxcsr=32640
RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24448
RNDD: 0x1.ff83fp+63 cw=1919 mxcsr=16256

Can you reproduce that on x86_64 Linux?

Paul


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: incorrectly rounded square root
  2021-06-04  7:14                       ` Paul Zimmermann
@ 2021-06-04 18:44                         ` Jeff Johnston
  2021-06-04 18:59                           ` Joel Sherrill
                                             ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Jeff Johnston @ 2021-06-04 18:44 UTC (permalink / raw)
  To: Paul Zimmermann; +Cc: joel, Newlib

Ok, I now know exactly what is happening.

The compiler is optimizing out the rounding check in ef_sqrt.c, probably
due to the operation using two constants.

86 ix += (m <<23);
(gdb) list
81 else
82    q += (q&1);

When I debug, it always does the else at line 81 without performing the
one-tiny operation.  The difference in the mxcsr
register is the PE bit which I believe gets set when you do the one-tiny
operation.  Since we aren't doing it, it never gets
set on and the difference of 0x20 in the mxcsr register is explained.

By making the constants volatile, I am able to get the code working as it
should.  I have pushed a patch for this.

-- Jeff J.

On Fri, Jun 4, 2021 at 3:14 AM Paul Zimmermann <Paul.Zimmermann@inria.fr>
wrote:

>        Hi Jeff,
>
> > I figured the values were off when I had to hard-code them in my own
> > test_sqrt.c but forgot to include that info in my note.
> >
> > Now, that said, using the code I attached earlier, I am seeing the exact
> > values you are quoting above for glibc for the mxcsr register and the
> round
> > is working.  Have your
> > tried running that code?
>
> yes it works as expected, but it doesn't work with Newlib's fenv.h and
> libm.a
> (see below).
>
> > The mxcsr values you are seeing that are different are not due to the
> > fesetround code.  The code is shifting the round value 13 bits
> > and for 3, that ends up being 0x6000.  It is masking mxcsr with
> 0xffff9fff
> > first so when you start with 0x1fxx and end up with 0x7fxx, the code is
> > doing what is supposed to do.
> > The difference in values above is 0x20 (e.g. 0x7fa0 vs 0x7f80) which is a
> > bit in the last 2 hex digits which isn't touched by the code logic.
>
> here is how to reproduce the issue:
>
> tar xf newlib-4.1.0.tar.gz
> cd newlib-4.1.0
> mkdir build
> cd build
> ../configure --prefix=/tmp --disable-multilib --target=x86_64
> make -j4
> make install
>
> $ cat test_sqrt_2.c
> #include <stdio.h>
> #include <math.h>
> #include <fenv.h>
>
> #ifdef NEWLIB
> /* RedHat's libm claims:
>    undefined reference to `__errno' in j1f/y1f */
> int errno;
> int* __errno () { return &errno; }
> #endif
>
> int main()
> {
>   int rnd[4] = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD };
>   char Rnd[4] = "NZUD";
>   float x = 0x1.ff07fep+127f;
>   float y;
>   for (int i = 0; i < 4; i++)
>   {
>     unsigned short cw;
>     unsigned int mxcsr = 0;
>     fesetround (rnd[i]);
>     __asm__ volatile ("fnstcw %0" : "=m" (cw) : );
>     __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr) : );
>     y = sqrtf (x);
>     printf ("RND%c: %a cw=%u mxcsr=%u\n", Rnd[i], y, cw, mxcsr);
>   }
> }
>
> With GNU libc:
> $ gcc -fno-builtin test_sqrt_2.c -lm
> $ ./a.out
> RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064
> RNDZ: 0x1.ff83eep+63 cw=3967 mxcsr=32672
> RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24480
> RNDD: 0x1.ff83eep+63 cw=1919 mxcsr=16288
>
> With Newlib:
> $ gcc -I/tmp/x86_64/include -DNEWLIB -fno-builtin test_sqrt_2.c /tmp/libm.a
> $ ./a.out
> RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064
> RNDZ: 0x1.ff83fp+63 cw=3967 mxcsr=32640
> RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24448
> RNDD: 0x1.ff83fp+63 cw=1919 mxcsr=16256
>
> Can you reproduce that on x86_64 Linux?
>
> Paul
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: incorrectly rounded square root
  2021-06-04 18:44                         ` Jeff Johnston
@ 2021-06-04 18:59                           ` Joel Sherrill
  2021-06-05 13:25                           ` Brian Inglis
  2021-06-12 22:31                           ` Maciej W. Rozycki
  2 siblings, 0 replies; 19+ messages in thread
From: Joel Sherrill @ 2021-06-04 18:59 UTC (permalink / raw)
  To: Jeff Johnston; +Cc: Paul Zimmermann, Newlib

On Fri, Jun 4, 2021, 1:44 PM Jeff Johnston <jjohnstn@redhat.com> wrote:

> Ok, I now know exactly what is happening.
>
> The compiler is optimizing out the rounding check in ef_sqrt.c, probably
> due to the operation using two constants.
>
> 86 ix += (m <<23);
> (gdb) list
> 81 else
> 82    q += (q&1);
>
> When I debug, it always does the else at line 81 without performing the
> one-tiny operation.  The difference in the mxcsr
> register is the PE bit which I believe gets set when you do the one-tiny
> operation.  Since we aren't doing it, it never gets
> set on and the difference of 0x20 in the mxcsr register is explained.
>
> By making the constants volatile, I am able to get the code working as it
> should.  I have pushed a patch for this.
>

Awesome catch Paul and great eye to spot the problem Jeff!

--joel

>
> -- Jeff J.
>
> On Fri, Jun 4, 2021 at 3:14 AM Paul Zimmermann <Paul.Zimmermann@inria.fr>
> wrote:
>
>>        Hi Jeff,
>>
>> > I figured the values were off when I had to hard-code them in my own
>> > test_sqrt.c but forgot to include that info in my note.
>> >
>> > Now, that said, using the code I attached earlier, I am seeing the exact
>> > values you are quoting above for glibc for the mxcsr register and the
>> round
>> > is working.  Have your
>> > tried running that code?
>>
>> yes it works as expected, but it doesn't work with Newlib's fenv.h and
>> libm.a
>> (see below).
>>
>> > The mxcsr values you are seeing that are different are not due to the
>> > fesetround code.  The code is shifting the round value 13 bits
>> > and for 3, that ends up being 0x6000.  It is masking mxcsr with
>> 0xffff9fff
>> > first so when you start with 0x1fxx and end up with 0x7fxx, the code is
>> > doing what is supposed to do.
>> > The difference in values above is 0x20 (e.g. 0x7fa0 vs 0x7f80) which is
>> a
>> > bit in the last 2 hex digits which isn't touched by the code logic.
>>
>> here is how to reproduce the issue:
>>
>> tar xf newlib-4.1.0.tar.gz
>> cd newlib-4.1.0
>> mkdir build
>> cd build
>> ../configure --prefix=/tmp --disable-multilib --target=x86_64
>> make -j4
>> make install
>>
>> $ cat test_sqrt_2.c
>> #include <stdio.h>
>> #include <math.h>
>> #include <fenv.h>
>>
>> #ifdef NEWLIB
>> /* RedHat's libm claims:
>>    undefined reference to `__errno' in j1f/y1f */
>> int errno;
>> int* __errno () { return &errno; }
>> #endif
>>
>> int main()
>> {
>>   int rnd[4] = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD };
>>   char Rnd[4] = "NZUD";
>>   float x = 0x1.ff07fep+127f;
>>   float y;
>>   for (int i = 0; i < 4; i++)
>>   {
>>     unsigned short cw;
>>     unsigned int mxcsr = 0;
>>     fesetround (rnd[i]);
>>     __asm__ volatile ("fnstcw %0" : "=m" (cw) : );
>>     __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr) : );
>>     y = sqrtf (x);
>>     printf ("RND%c: %a cw=%u mxcsr=%u\n", Rnd[i], y, cw, mxcsr);
>>   }
>> }
>>
>> With GNU libc:
>> $ gcc -fno-builtin test_sqrt_2.c -lm
>> $ ./a.out
>> RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064
>> RNDZ: 0x1.ff83eep+63 cw=3967 mxcsr=32672
>> RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24480
>> RNDD: 0x1.ff83eep+63 cw=1919 mxcsr=16288
>>
>> With Newlib:
>> $ gcc -I/tmp/x86_64/include -DNEWLIB -fno-builtin test_sqrt_2.c
>> /tmp/libm.a
>> $ ./a.out
>> RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064
>> RNDZ: 0x1.ff83fp+63 cw=3967 mxcsr=32640
>> RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24448
>> RNDD: 0x1.ff83fp+63 cw=1919 mxcsr=16256
>>
>> Can you reproduce that on x86_64 Linux?
>>
>> Paul
>>
>>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: incorrectly rounded square root
  2021-06-04 18:44                         ` Jeff Johnston
  2021-06-04 18:59                           ` Joel Sherrill
@ 2021-06-05 13:25                           ` Brian Inglis
  2021-06-07  9:51                             ` Paul Zimmermann
  2021-06-12 22:31                           ` Maciej W. Rozycki
  2 siblings, 1 reply; 19+ messages in thread
From: Brian Inglis @ 2021-06-05 13:25 UTC (permalink / raw)
  To: newlib

Great catch, analysis, and fix!
Now sqrtf rounds correctly on Cygwin!

$ ./test-sqrtf-round
  Direction    CW   MX          Input Hex
                           Input Decimal         Sqrt Hex
         Sqrt Decimal
RNDN   0   0 37f 1f80: 0x1.ff07fe00p+127 
339638501828070541185766401939693633536 0x1.ff83f000p+63 
18429283829060468736
RNDD   1   1 77f 3f80: 0x1.ff07fe00p+127 
339638501828070541185766401939693633536 0x1.ff83ee00p+63 
18429282729548840960
RNDU   2   2 b7f 5f80: 0x1.ff07fe00p+127 
339638501828070541185766401939693633536 0x1.ff83f000p+63 
18429283829060468736
RNDZ   3   3 f7f 7f80: 0x1.ff07fe00p+127 
339638501828070541185766401939693633536 0x1.ff83ee00p+63 
18429282729548840960

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.


On 2021-06-04 12:44, Jeff Johnston wrote:
> Ok, I now know exactly what is happening.
> The compiler is optimizing out the rounding check in ef_sqrt.c, probably
> due to the operation using two constants.
> 86 ix += (m <<23);
> (gdb) list
> 81 else
> 82    q += (q&1);
> When I debug, it always does the else at line 81 without performing the
> one-tiny operation.  The difference in the mxcsr
> register is the PE bit which I believe gets set when you do the one-tiny
> operation.  Since we aren't doing it, it never gets
> set on and the difference of 0x20 in the mxcsr register is explained.
> By making the constants volatile, I am able to get the code working
> as it should. I have pushed a patch for this.

> On Fri, Jun 4, 2021 at 3:14 AM Paul Zimmermann wrote:

>>> I figured the values were off when I had to hard-code them in my own
>>> test_sqrt.c but forgot to include that info in my note.
>>>
>>> Now, that said, using the code I attached earlier, I am seeing
>>> the exact values you are quoting above for glibc for the mxcsr
>>> register and the round is working.  Have your tried running that
>>> code?

>> yes it works as expected, but it doesn't work with Newlib's fenv.h and
>> libm.a (see below).

>>> The mxcsr values you are seeing that are different are not due to
>>> the fesetround code. The code is shifting the round value 13
>>> bits and for 3, that ends up being 0x6000.  It is masking mxcsr
>>> with 0xffff9fff first so when you start with 0x1fxx and end up
>>> with 0x7fxx, the code is doing what is supposed to do.
>>> The difference in values above is 0x20 (e.g. 0x7fa0 vs 0x7f80) 
>>> which is a bit in the last 2 hex digits which isn't touched by
>>> the code logic.

>> here is how to reproduce the issue:
>> tar xf newlib-4.1.0.tar.gz
>> cd newlib-4.1.0
>> mkdir build
>> cd build
>> ../configure --prefix=/tmp --disable-multilib --target=x86_64
>> make -j4
>> make install
>> $ cat test_sqrt_2.c
>> #include <stdio.h>
>> #include <math.h>
>> #include <fenv.h>
>> #ifdef NEWLIB
>> /* RedHat's libm claims:
>>     undefined reference to `__errno' in j1f/y1f */
>> int errno;
>> int* __errno () { return &errno; }
>> #endif
>> int main()
>> {
>>    int rnd[4] = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD };
>>    char Rnd[4] = "NZUD";
>>    float x = 0x1.ff07fep+127f;
>>    float y;
>>    for (int i = 0; i < 4; i++)
>>    {
>>      unsigned short cw;
>>      unsigned int mxcsr = 0;
>>      fesetround (rnd[i]);
>>      __asm__ volatile ("fnstcw %0" : "=m" (cw) : );
>>      __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr) : );
>>      y = sqrtf (x);
>>      printf ("RND%c: %a cw=%u mxcsr=%u\n", Rnd[i], y, cw, mxcsr);
>>    }
>> }
>> With GNU libc:
>> $ gcc -fno-builtin test_sqrt_2.c -lm
>> $ ./a.out
>> RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064
>> RNDZ: 0x1.ff83eep+63 cw=3967 mxcsr=32672
>> RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24480
>> RNDD: 0x1.ff83eep+63 cw=1919 mxcsr=16288
>> With Newlib:
>> $ gcc -I/tmp/x86_64/include -DNEWLIB -fno-builtin test_sqrt_2.c /tmp/libm.a
>> $ ./a.out
>> RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064
>> RNDZ: 0x1.ff83fp+63 cw=3967 mxcsr=32640
>> RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24448
>> RNDD: 0x1.ff83fp+63 cw=1919 mxcsr=16256
>> Can you reproduce that on x86_64 Linux?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: incorrectly rounded square root
  2021-06-05 13:25                           ` Brian Inglis
@ 2021-06-07  9:51                             ` Paul Zimmermann
  0 siblings, 0 replies; 19+ messages in thread
From: Paul Zimmermann @ 2021-06-07  9:51 UTC (permalink / raw)
  To: newlib; +Cc: newlib

> Great catch, analysis, and fix!
> Now sqrtf rounds correctly on Cygwin!

thank you, I also confirm I now get expected results, after applying Jeff's
patch on top of Newlib 4.1.0:

zimmerma@tomate:/tmp$ gcc -frounding-math -I/tmp/x86_64/include -DNEWLIB -fno-builtin test_sqrt_2.c /tmp/x86_64/lib/libm.a 
zimmerma@tomate:/tmp$ ./a.out 
RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064
RNDZ: 0x1.ff83eep+63 cw=3967 mxcsr=32672
RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24480
RNDD: 0x1.ff83eep+63 cw=1919 mxcsr=16288

zimmerma@tomate:/tmp$ gcc -frounding-math -fno-builtin test_sqrt_2.c -lm
zimmerma@tomate:/tmp$ ./a.out 
RNDN: 0x1.ff83fp+63 cw=895 mxcsr=8064
RNDZ: 0x1.ff83eep+63 cw=3967 mxcsr=32672
RNDU: 0x1.ff83fp+63 cw=2943 mxcsr=24480
RNDD: 0x1.ff83eep+63 cw=1919 mxcsr=16288

Paul


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: incorrectly rounded square root
  2021-06-04 18:44                         ` Jeff Johnston
  2021-06-04 18:59                           ` Joel Sherrill
  2021-06-05 13:25                           ` Brian Inglis
@ 2021-06-12 22:31                           ` Maciej W. Rozycki
  2 siblings, 0 replies; 19+ messages in thread
From: Maciej W. Rozycki @ 2021-06-12 22:31 UTC (permalink / raw)
  To: Jeff Johnston; +Cc: Paul Zimmermann, Newlib

On Fri, 4 Jun 2021, Jeff Johnston wrote:

> The compiler is optimizing out the rounding check in ef_sqrt.c, probably
> due to the operation using two constants.

 Have you checked the `-frounding-math' option to GCC?  This piece of code 
(as probably all newlib's math pieces) ought to be built with that option, 
and if that still does not help (which it may well, as noted in the GCC 
manual), then would you please file a bug with GCC, attaching the test 
case discussed here as a reproducer?

 FWIW glibc does use said option, which may be why seemingly equivalent 
code works with glibc but not with newlib.

 HTH,

  Maciej

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-06-12 22:31 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-04  8:08 incorrectly rounded square root Paul Zimmermann
2021-05-31 20:52 ` Jeff Johnston
2021-06-01  7:11   ` Paul Zimmermann
2021-06-01 16:28     ` Jeff Johnston
2021-06-02  7:51       ` Paul Zimmermann
2021-06-02 13:07         ` Joel Sherrill
2021-06-02 18:43           ` Jeff Johnston
2021-06-02 19:07             ` Marco Atzeri
2021-06-02 19:12               ` Joel Sherrill
2021-06-03  3:01                 ` Jeff Johnston
2021-06-03 10:21                   ` Paul Zimmermann
2021-06-03 15:50                     ` Jeff Johnston
2021-06-04  7:14                       ` Paul Zimmermann
2021-06-04 18:44                         ` Jeff Johnston
2021-06-04 18:59                           ` Joel Sherrill
2021-06-05 13:25                           ` Brian Inglis
2021-06-07  9:51                             ` Paul Zimmermann
2021-06-12 22:31                           ` Maciej W. Rozycki
2021-06-03 11:25               ` Paul Zimmermann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).