From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 19663 invoked by alias); 20 Sep 2013 03:32:15 -0000 Mailing-List: contact libc-ports-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Post: List-Help: , Sender: libc-ports-owner@sourceware.org Received: (qmail 19654 invoked by uid 89); 20 Sep 2013 03:32:15 -0000 Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 20 Sep 2013 03:32:15 +0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-4.5 required=5.0 tests=AWL,BAYES_00,KHOP_THREADED,RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: mx1.redhat.com Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r8K3WAZm009257 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 19 Sep 2013 23:32:10 -0400 Received: from [10.3.113.73] (ovpn-113-73.phx2.redhat.com [10.3.113.73]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r8K3W8Hl021875; Thu, 19 Sep 2013 23:32:09 -0400 Message-ID: <523BC1B8.4040102@redhat.com> Date: Fri, 20 Sep 2013 03:32:00 -0000 From: "Carlos O'Donell" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130805 Thunderbird/17.0.8 MIME-Version: 1.0 To: Steve Ellcey CC: libc-ports@sourceware.org Subject: Re: [PATCH] Speed up libm on MIPS References: <1379631395.5770.445.camel@ubuntu-sellcey> In-Reply-To: <1379631395.5770.445.camel@ubuntu-sellcey> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2013-09/txt/msg00111.txt.bz2 On 09/19/2013 06:56 PM, Steve Ellcey wrote: > This patch defines various inline routines and macros used by the math > library in order to speed up libm on MIPS. It does not affect > soft-float builds but for hard-float builds 'make bench' shows a > speed-up. With an o32 little-endian glibc, sin() went from 27792.6 > iter/s to 31293.6 iter/s. On n32 it went from 32955.2 to 36179.7 and on > n64 from 33074.7 to 36242. exp() went from 45742.4 to 56511.2 on o32 and > pow() went from 19008.8 to 20508.7. I have attached the original and > new bench.out files for o32, n32, and n64 ABIs in case you want to see > more of the data. These are all little-endian hard-float runs. > > I ran 'make check' and 'make bench' using the o32, n32, and n64 ABIs > with big and little endian and with hard and soft float to verify there > were no failures. I did run into an unrelated problem that is being > fixed (https://sourceware.org/ml/libc-alpha/2013-09/msg00601.html) but > there were no other failures except the expected ones for MIPS. Thank you for running `make bench' and posting the results. I appreciate you going out of your way to make the measurements and post them. > OK for checkin? Does MIPS have a slow fpu save/restore? Does using HAVE_RM_CTX speed anything up for you? For example see 2506109403de69bd454de27835d42e6eb6ec3abc Two nits below. > Steve Ellcey > sellcey@mips.com > > > 2013-09-18 Steve Ellcey > > * sysdeps/mips/math_private.h (libc_feholdexcept_mips): New function. > (libc_feholdexcept): New macro. > (libc_feholdexceptf): New macro. > (libc_feholdexceptl): New macro. > (libc_fesetround_mips): New function. > (libc_fesetround): New macro. > (libc_fesetroundf): New macro. > (libc_fesetroundl): New macro. > (libc_feholdexcept_setround_mips): New function. > (libc_feholdexcept_setround): New macro. > (libc_feholdexcept_setroundf): New macro. > (libc_feholdexcept_setroundl): New macro. > (libc_fesetenv_mips): New function. > (libc_fesetenv): New macro. > (libc_fesetenvf): New macro. > (libc_fesetenvl): New macro. > (libc_feupdateenv_mips): New function. > (libc_feupdateenv): New macro. > (libc_feupdateenvf): New macro. > (libc_feupdateenvl): New macro. > > > mips-libm.patch > > > diff --git a/ports/sysdeps/mips/math_private.h b/ports/sysdeps/mips/math_private.h > index 6b99957..0ac18fd 100644 > --- a/ports/sysdeps/mips/math_private.h > +++ b/ports/sysdeps/mips/math_private.h > @@ -26,6 +26,119 @@ > # define HIGH_ORDER_BIT_IS_SET_FOR_SNAN > #endif > > +/* Inline functions to speed up the math library implementation. The > + default versions of these routines are in generic/math_private.h > + and call fesetround, feholdexcept, etc. These routines use inlined > + code instead. */ > + > +#ifdef __mips_hard_float > + > +#include > +#include > +#include > + > +static __always_inline void > +libc_feholdexcept_mips (fenv_t *envp) > +{ > + fpu_control_t cw; > + > + /* Save the current state. */ > + _FPU_GETCW (cw); > + envp->__fp_control_register = cw; > + > + /* Clear all exception enable bits and flags. */ > + cw &= ~(_FPU_MASK_V|_FPU_MASK_Z|_FPU_MASK_O|_FPU_MASK_U|_FPU_MASK_I|FE_ALL_EXCEPT); > + _FPU_SETCW (cw); > +} > +#define libc_feholdexcept libc_feholdexcept_mips > +#define libc_feholdexceptf libc_feholdexcept_mips > +#define libc_feholdexceptl libc_feholdexcept_mips > + > +static __always_inline void > +libc_fesetround_mips (int round) > +{ > + fpu_control_t cw; > + > + /* Get current state. */ > + _FPU_GETCW (cw); > + > + /* Set rounding bits. */ > + cw &= ~0x3; What's the magic ~0x3? Should it be a new macro? > + cw |= round; > + > + /* Set new state. */ > + _FPU_SETCW (cw); > +} > +#define libc_fesetround libc_fesetround_mips > +#define libc_fesetroundf libc_fesetround_mips > +#define libc_fesetroundl libc_fesetround_mips > + > +static __always_inline void > +libc_feholdexcept_setround_mips (fenv_t *envp, int round) > +{ > + fpu_control_t cw; > + > + /* Save the current state. */ > + _FPU_GETCW (cw); > + envp->__fp_control_register = cw; > + > + /* Clear all exception enable bits and flags. */ > + cw &= ~(_FPU_MASK_V|_FPU_MASK_Z|_FPU_MASK_O|_FPU_MASK_U|_FPU_MASK_I|FE_ALL_EXCEPT); > + > + /* Set rounding bits. */ > + cw &= ~0x3; Likewise? > + cw |= round; > + > + /* Set new state. */ > + _FPU_SETCW (cw); > +} > +#define libc_feholdexcept_setround libc_feholdexcept_setround_mips > +#define libc_feholdexcept_setroundf libc_feholdexcept_setround_mips > +#define libc_feholdexcept_setroundl libc_feholdexcept_setround_mips > + > +static __always_inline void > +libc_fesetenv_mips (fenv_t *envp) > +{ > + fpu_control_t cw; > + > + /* Read first current state to flush fpu pipeline. */ > + _FPU_GETCW (cw); > + > + if (envp == FE_DFL_ENV) > + _FPU_SETCW (_FPU_DEFAULT); > + else if (envp == FE_NOMASK_ENV) > + _FPU_SETCW (_FPU_IEEE); > + else > + _FPU_SETCW (envp->__fp_control_register); > +} > +#define libc_fesetenv libc_fesetenv_mips > +#define libc_fesetenvf libc_fesetenv_mips > +#define libc_fesetenvl libc_fesetenv_mips > + > +static __always_inline void > +libc_feupdateenv_mips (fenv_t *envp) > +{ > + int temp; > + > + /* Save current exceptions. */ > + _FPU_GETCW (temp); > + > + /* Set flag bits (which are accumulative), and *also* set the > + cause bits. The setting of the cause bits is what actually causes Two spaces after period. > + the hardware to generate the exception, if the corresponding enable > + bit is set as well. */ > + temp &= FE_ALL_EXCEPT; > + temp |= envp->__fp_control_register | (temp << CAUSE_SHIFT); > + > + /* Set new state. */ > + _FPU_SETCW (temp); > +} > +#define libc_feupdateenv libc_feupdateenv_mips > +#define libc_feupdateenvf libc_feupdateenv_mips > +#define libc_feupdateenvl libc_feupdateenv_mips > + > +#endif > + > #include_next > > #endif Otherwise looks good to me. Cheers, Carlos.