public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* Re: fpu/e_expl.c for i686
@ 2001-05-07 12:24 Stephen L Moshier
  2001-05-07 12:38 ` Ulrich Drepper
  2001-05-07 12:45 ` Andreas Jaeger
  0 siblings, 2 replies; 6+ messages in thread
From: Stephen L Moshier @ 2001-05-07 12:24 UTC (permalink / raw)
  To: Andreas Jaeger; +Cc: libc-alpha

> +__ieee754_expl (long double x)
> ...
> +       "fldl2e\n"
> +       "fmulp\n"			/* x * log2(e) */
> +       "fld				%%st\n"
> +       "frndint\n"					/* int(x *> log2(e)) */
> +       "fsubr	%%st,%%st(1)\n"		/* fract(x * log2(e)) */

This code seems adequate for double precision but it has poor
accuracy for long double, due to error amplification over the
wider domain of the long double function.  For example, expl(1000)
is off by roughly 460 ulps.  I would suggest improving the range reduction
method here, but since that entails three or four more multiplications
you might not want to make the speed sacrifice.  What would your
policy be about this tradeoff of speed versus accuracy?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: fpu/e_expl.c for i686
  2001-05-07 12:24 fpu/e_expl.c for i686 Stephen L Moshier
@ 2001-05-07 12:38 ` Ulrich Drepper
  2001-05-07 12:45 ` Andreas Jaeger
  1 sibling, 0 replies; 6+ messages in thread
From: Ulrich Drepper @ 2001-05-07 12:38 UTC (permalink / raw)
  To: moshier; +Cc: Andreas Jaeger, libc-alpha

Stephen L Moshier <moshier@mediaone.net> writes:

> What would your policy be about this tradeoff of speed versus
> accuracy?

Accuracy is more important here.

-- 
---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: fpu/e_expl.c for i686
  2001-05-07 12:24 fpu/e_expl.c for i686 Stephen L Moshier
  2001-05-07 12:38 ` Ulrich Drepper
@ 2001-05-07 12:45 ` Andreas Jaeger
  2001-05-07 12:57   ` Stephen L Moshier
  1 sibling, 1 reply; 6+ messages in thread
From: Andreas Jaeger @ 2001-05-07 12:45 UTC (permalink / raw)
  To: moshier; +Cc: libc-alpha

Stephen L Moshier <moshier@mediaone.net> writes:

> > +__ieee754_expl (long double x)
> > ...
> > +       "fldl2e\n"
> > +       "fmulp\n"			/* x * log2(e) */
> > +       "fld				%%st\n"
> > +       "frndint\n"					/* int(x *> log2(e)) */
> > +       "fsubr	%%st,%%st(1)\n"		/* fract(x * log2(e)) */
> 
> This code seems adequate for double precision but it has poor
> accuracy for long double, due to error amplification over the
> wider domain of the long double function.  For example, expl(1000)
> is off by roughly 460 ulps.  I would suggest improving the range reduction

expl (1000) is .19e435 and can only be represented with long doubles.
How much would the error be for expl (100)?

> method here, but since that entails three or four more multiplications
> you might not want to make the speed sacrifice.  What would your
> policy be about this tradeoff of speed versus accuracy?

I personally would go for accuracy.

Andreas
-- 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.inka.de
    http://www.suse.de/~aj

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: fpu/e_expl.c for i686
  2001-05-07 12:45 ` Andreas Jaeger
@ 2001-05-07 12:57   ` Stephen L Moshier
  0 siblings, 0 replies; 6+ messages in thread
From: Stephen L Moshier @ 2001-05-07 12:57 UTC (permalink / raw)
  To: Andreas Jaeger; +Cc: moshier, libc-alpha

> expl (1000) is .19e435 and can only be represented with long doubles.
> How much would the error be for expl (100)?

Instead of exp(x) the program computes  exp(x(1 + dx)) where dx
is the relative input error introduced, in this case by the range
reduction method.
    exp(x(1 + dx)) = exp(x) (1 + x dx + ...)
So if x is 100 and dx is 1/2 ulp, the relative error would be
about 50 ulps (long double ulps).

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: fpu/e_expl.c for i686
  2001-05-06 13:17 Andreas Jaeger
@ 2001-05-06 16:01 ` Ulrich Drepper
  0 siblings, 0 replies; 6+ messages in thread
From: Ulrich Drepper @ 2001-05-06 16:01 UTC (permalink / raw)
  To: libc-alpha; +Cc: Jan Hubicka

Andreas Jaeger <aj@suse.de> writes:

> to these removing one conditional jump:
> 1:	fldz
>         testl	$0x200, %eax	/* Test sign.  */
> 	fcmovnz %st(1),%st(0)
>         fstp	%st(1)
> 2:

Seems to be right.  This is what I had in mind.  If it passes the test
suite it should be fine.

> Changing the assembler file to an inline asm, I also have a problem
> that the inline asm can only have one exit point.  Is my
> transformation ok?

Looks like it.

-- 
---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------

^ permalink raw reply	[flat|nested] 6+ messages in thread

* fpu/e_expl.c for i686
@ 2001-05-06 13:17 Andreas Jaeger
  2001-05-06 16:01 ` Ulrich Drepper
  0 siblings, 1 reply; 6+ messages in thread
From: Andreas Jaeger @ 2001-05-06 13:17 UTC (permalink / raw)
  To: libc-alpha; +Cc: Jan Hubicka

Uli, you comment in sysdeps/i386/fpu/e_expl.S that the code can be
written better for i686.  What have you had in mind?  I'd like to
optimize the code for i686 and newer systems.  The only thing I
considered was changing these instructions:

1:	testl	$0x200, %eax	/* Test sign.  */
        jz	2f		/* If positive, jump.  */
        fstp	%st
        fldz			/* Set result to 0.  */
2:

to these removing one conditional jump:
1:	fldz
        testl	$0x200, %eax	/* Test sign.  */
	fcmovnz %st(1),%st(0)
        fstp	%st(1)
2:

Changing the assembler file to an inline asm, I also have a problem
that the inline asm can only have one exit point.  Is my
transformation ok?

Andreas

============================================================
Index: sysdeps/i386/i686/fpu/e_expl.c
--- sysdeps/i386/i686/fpu/e_expl.c	created
+++ sysdeps/i386/i686/fpu/e_expl.c	Sun May  6 22:14:16 2001	1.1
@@ -0,0 +1,46 @@
+/*
+ * Written by J.T. Conklin <jtc@netbsd.org>.
+ * Public domain.
+ *
+ * Adapted for `long double' by Ulrich Drepper <drepper@cygnus.com>.
+ */
+
+#include <math_private.h>
+
+/* e^x = 2^(x * log2l(e)) */
+long double
+__ieee754_expl (long double x)
+{
+  long double res;
+
+/* I added the following ugly construct because expl(+-Inf) resulted
+   in NaN.  The ugliness results from the bright minds at Intel.
+   For the i686 the code can be written better.
+   -- drepper@cygnus.com.  */
+  asm ("fxam\n"				/* Is NaN or +-Inf?  */
+       "fstsw	%%ax\n"
+       "movb	$0x45, %%dh\n"
+       "andb	%%ah, %%dh\n"
+       "cmpb	$0x05, %%dh\n"
+       "je	1f\n"			/* Is +-Inf, jump.  */
+       "fldl2e\n"
+       "fmulp\n"			/* x * log2(e) */
+       "fld	%%st\n"
+       "frndint\n"			/* int(x * log2(e)) */
+       "fsubr	%%st,%%st(1)\n"		/* fract(x * log2(e)) */
+       "fxch\n"
+       "f2xm1\n"			/* 2^(fract(x * log2(e))) - 1 */
+       "fld1\n"
+       "faddp\n"			/* 2^(fract(x * log2(e))) */
+       "fscale\n"			/* e^x */
+       "fstp	%%st(1)\n"
+       "jmp 2f\n"
+       "1:\ttestl	$0x200, %%eax\n"	/* Test sign.  */
+       "jz	2f\n"			/* If positive, jump.  */
+       "fstp	%%st\n"
+       "fldz\n"				/* Set result to 0.  */
+       "2:\t\n"
+       : "=t" (res) : "0" (x) : "ax", "dx");
+
+  return res;
+}

-- 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.inka.de
    http://www.suse.de/~aj

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2001-05-07 12:57 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-05-07 12:24 fpu/e_expl.c for i686 Stephen L Moshier
2001-05-07 12:38 ` Ulrich Drepper
2001-05-07 12:45 ` Andreas Jaeger
2001-05-07 12:57   ` Stephen L Moshier
  -- strict thread matches above, loose matches on Subject: below --
2001-05-06 13:17 Andreas Jaeger
2001-05-06 16:01 ` Ulrich Drepper

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).