public inbox for gcc-patches@gcc.gnu.org
* [PATCH] Inline IBM long double __gcc_qsub
@ 2021-08-26  0:23 David Edelsohn
  2021-08-26  7:35 ` Andreas Schwab
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: David Edelsohn @ 2021-08-26  0:23 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: GCC Patches

    rs6000: inline ldouble __gcc_qsub

    While performing some tests of IEEE 128 float for PPC64LE, Michael
    Meissner noticed that __gcc_qsub is substantially slower than
    __gcc_qadd.  __gcc_qsub valls __gcc_add with the second operand
    negated.  Because the functions normally are invoked through
    libgcc shared object, the extra PLT overhead has a large impact
    on the overall time of the function.  Instead of trying to be
    fancy with function decorations to prevent interposition, this
    patch inlines the definition of __gcc_qadd into __gcc_qsub with
    the negation propagated through the function.

    libgcc/ChangeLog:

            * config/rs6000/ibm-ldouble.c (__gcc_qsub): Inline negated
            __gcc_qadd.

diff --git a/libgcc/config/rs6000/ibm-ldouble.c b/libgcc/config/rs6000/ibm-ldouble.c
index 4c13453f975..ed74900e5c3 100644
--- a/libgcc/config/rs6000/ibm-ldouble.c
+++ b/libgcc/config/rs6000/ibm-ldouble.c
@@ -158,9 +158,42 @@ __gcc_qadd (double a, double aa, double c, double cc)
 }

 IBM128_TYPE
-__gcc_qsub (double a, double b, double c, double d)
+__gcc_qsub (double a, double aa, double c, double cc)
 {
-  return __gcc_qadd (a, b, -c, -d);
+  double xh, xl, z, q, zz;
+
+  z = a - c;
+
+  if (nonfinite (z))
+    {
+      if (fabs (z) != inf ())
+       return z;
+      z = -cc + aa - c + a;
+      if (nonfinite (z))
+       return z;
+      xh = z;  /* Will always be DBL_MAX.  */
+      zz = aa - cc;
+      if (fabs (a) > fabs (c))
+       xl = a - z - c + zz;
+      else
+       xl = -c - z + a + zz;
+    }
+  else
+    {
+      q = a - z;
+      zz = q - c + (a - (q + z)) + aa - cc;
+
+      /* Keep -0 result.  */
+      if (zz == 0.0)
+       return z;
+
+      xh = z + zz;
+      if (nonfinite (xh))
+       return xh;
+
+      xl = z - xh + zz;
+    }
+  return pack_ldouble (xh, xl);
 }

 #ifdef __NO_FPRS__
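
For anyone who wants a quick sanity check from user code: on a target
where long double is the IBM double-double format, the subtraction below
should go through __gcc_qsub and the addition of the negated value
through __gcc_qadd, so the two results must stay identical with this
change.  (Illustrative test only, not part of the patch.)

#include <stdio.h>

int main (void)
{
  /* volatile keeps the compiler from folding everything at compile time.  */
  volatile long double a = 1.0L / 3.0L;
  volatile long double b = 1.0L / 7.0L;
  volatile long double nb = -b;      /* negating both halves is exact */

  long double sub = a - b;           /* expected to go through __gcc_qsub */
  long double add = a + nb;          /* expected to go through __gcc_qadd */

  printf ("%d\n", sub == add);       /* expect 1 */
  return 0;
}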

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Inline IBM long double __gcc_qsub
  2021-08-26  0:23 [PATCH] Inline IBM long double __gcc_qsub David Edelsohn
@ 2021-08-26  7:35 ` Andreas Schwab
  2021-08-26  7:40 ` Andreas Schwab
  2021-08-26 12:15 ` [PATCH] " Segher Boessenkool
  2 siblings, 0 replies; 7+ messages in thread
From: Andreas Schwab @ 2021-08-26  7:35 UTC (permalink / raw)
  To: David Edelsohn via Gcc-patches; +Cc: Segher Boessenkool, David Edelsohn

On Aug 25 2021, David Edelsohn via Gcc-patches wrote:

>     rs6000: inline ldouble __gcc_qsub
>
>     While performing some tests of IEEE 128 float for PPC64LE, Michael
>     Meissner noticed that __gcc_qsub is substantially slower than
>     __gcc_qadd.  __gcc_qsub valls __gcc_add with the second operand
                                    __gcc_qadd

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Inline IBM long double __gcc_qsub
  2021-08-26  0:23 [PATCH] Inline IBM long double __gcc_qsub David Edelsohn
  2021-08-26  7:35 ` Andreas Schwab
@ 2021-08-26  7:40 ` Andreas Schwab
  2021-08-26 18:57   ` [PATCH v2] " David Edelsohn
  2021-08-26 12:15 ` [PATCH] " Segher Boessenkool
  2 siblings, 1 reply; 7+ messages in thread
From: Andreas Schwab @ 2021-08-26  7:40 UTC (permalink / raw)
  To: David Edelsohn via Gcc-patches; +Cc: Segher Boessenkool, David Edelsohn

On Aug 25 2021, David Edelsohn via Gcc-patches wrote:

>     rs6000: inline ldouble __gcc_qsub
>
>     While performing some tests of IEEE 128 float for PPC64LE, Michael
>     Meissner noticed that __gcc_qsub is substantially slower than
>     __gcc_qadd.  __gcc_qsub valls __gcc_add with the second operand
>     negated.  Because the functions normally are invoked through
>     libgcc shared object, the extra PLT overhead has a large impact
>     on the overall time of the function.  Instead of trying to be
>     fancy with function decorations to prevent interposition, this
>     patch inlines the definition of __gcc_qadd into __gcc_qsub with
>     the negation propagated through the function.
>
>     libgcc/ChangeLog:
>
>             * config/rs6000/ibm-ldouble.c (__gcc_qsub): Inline negated
>             __gcc_qadd.

How about defining a static function that is used by both?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Inline IBM long double __gcc_qsub
  2021-08-26  0:23 [PATCH] Inline IBM long double __gcc_qsub David Edelsohn
  2021-08-26  7:35 ` Andreas Schwab
  2021-08-26  7:40 ` Andreas Schwab
@ 2021-08-26 12:15 ` Segher Boessenkool
  2 siblings, 0 replies; 7+ messages in thread
From: Segher Boessenkool @ 2021-08-26 12:15 UTC (permalink / raw)
  To: David Edelsohn; +Cc: GCC Patches

Hi!

On Wed, Aug 25, 2021 at 08:23:32PM -0400, David Edelsohn wrote:
>     rs6000: inline ldouble __gcc_qsub
> 
>     While performing some tests of IEEE 128 float for PPC64LE, Michael
>     Meissner noticed that __gcc_qsub is substantially slower than
>     __gcc_qadd.  __gcc_qsub valls __gcc_add with the second operand

("calls", "__gcc_qadd")

>     negated.  Because the functions normally are invoked through
>     libgcc shared object, the extra PLT overhead has a large impact
>     on the overall time of the function.  Instead of trying to be
>     fancy with function decorations to prevent interposition, this
>     patch inlines the definition of __gcc_qadd into __gcc_qsub with
>     the negation propagated through the function.

Looks good to me, and it is a good way to resolve this.  This code is
too old (and unimportant) to do serious engineering on.  If we want
any serious optimisation on it we should do that at tree level (why does
that not happen yet anyway?), and inline all of this.  This patch is
really just to make benchmark results saner ;-)

Thanks David!


Segher

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2] Inline IBM long double __gcc_qsub
  2021-08-26  7:40 ` Andreas Schwab
@ 2021-08-26 18:57   ` David Edelsohn
  2021-08-26 22:51     ` Segher Boessenkool
  0 siblings, 1 reply; 7+ messages in thread
From: David Edelsohn @ 2021-08-26 18:57 UTC (permalink / raw)
  To: Andreas Schwab, Segher Boessenkool; +Cc: David Edelsohn via Gcc-patches

    While performing some tests of IEEE 128 float for PPC64LE, Michael
    Meissner noticed that __gcc_qsub is substantially slower than
    __gcc_qadd.  __gcc_qsub calls __gcc_qadd with the second operand
    negated.  Because the functions normally are invoked through the
    libgcc shared object, the extra PLT overhead has a large impact
    on the overall time of the function.  This patch moves the body of
    __gcc_qadd into a static inline function that is invoked by both
    __gcc_qadd and __gcc_qsub.

    libgcc/ChangeLog:

            * config/rs6000/ibm-ldouble.c (ldouble_qadd_internal): Rename from
            __gcc_qadd.
            (__gcc_qadd): Call ldouble_qadd_internal.
            (__gcc_qsub): Call ldouble_qadd_internal with second long double
            argument negated.

diff --git a/libgcc/config/rs6000/ibm-ldouble.c b/libgcc/config/rs6000/ibm-ldouble.c
index 4c13453f975..0b385aa940b 100644
--- a/libgcc/config/rs6000/ibm-ldouble.c
+++ b/libgcc/config/rs6000/ibm-ldouble.c
@@ -118,8 +118,8 @@ pack_ldouble (double dh, double dl)
 }

 /* Add two 'IBM128_TYPE' values and return the result. */
-IBM128_TYPE
-__gcc_qadd (double a, double aa, double c, double cc)
+static inline IBM128_TYPE
+ldouble_qadd_internal (double a, double aa, double c, double cc)
 {
   double xh, xl, z, q, zz;

@@ -158,9 +158,15 @@ __gcc_qadd (double a, double aa, double c, double cc)
 }

 IBM128_TYPE
-__gcc_qsub (double a, double b, double c, double d)
+__gcc_qadd (double a, double aa, double c, double cc)
+{
+  return ldouble_qadd_internal (a, aa, c, cc);
+}
+
+IBM128_TYPE
+__gcc_qsub (double a, double aa, double c, double cc)
 {
-  return __gcc_qadd (a, b, -c, -d);
+  return ldouble_qadd_internal (a, aa, -c, -cc);
 }

 #ifdef __NO_FPRS__

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] Inline IBM long double __gcc_qsub
  2021-08-26 18:57   ` [PATCH v2] " David Edelsohn
@ 2021-08-26 22:51     ` Segher Boessenkool
  2021-08-26 23:07       ` David Edelsohn
  0 siblings, 1 reply; 7+ messages in thread
From: Segher Boessenkool @ 2021-08-26 22:51 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Andreas Schwab, David Edelsohn via Gcc-patches

Hi!

On Thu, Aug 26, 2021 at 02:57:35PM -0400, David Edelsohn wrote:
>             * config/rs6000/ibm-ldouble.c (ldouble_qadd_internal): Rename from
>             __gcc_qadd.
>             (__gcc_qadd): Call ldouble_qadd_internal.
>             (__gcc_qsub): Call ldouble_qadd_internal with second long double
>             argument negated.

Still looks good, please commit.  Thanks :-)

> +static inline IBM128_TYPE
> +ldouble_qadd_internal (double a, double aa, double c, double cc)

Does it end up actually inlined, or as one static function that both
__gcc_qadd and __gcc_qsub use?  This is fine for complexity, it is just
a simple tail-call jump, just wondering what the compiler thinks is best
here (it matters in other cases, if the inline function has conditional
branches for example).


Segher

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] Inline IBM long double __gcc_qsub
  2021-08-26 22:51     ` Segher Boessenkool
@ 2021-08-26 23:07       ` David Edelsohn
  0 siblings, 0 replies; 7+ messages in thread
From: David Edelsohn @ 2021-08-26 23:07 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Andreas Schwab, David Edelsohn via Gcc-patches

On Thu, Aug 26, 2021 at 6:53 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi!
>
> On Thu, Aug 26, 2021 at 02:57:35PM -0400, David Edelsohn wrote:
> >             * config/rs6000/ibm-ldouble.c (ldouble_qadd_internal): Rename from
> >             __gcc_qadd.
> >             (__gcc_qadd): Call ldouble_qadd_internal.
> >             (__gcc_qsub): Call ldouble_qadd_internal with second long double
> >             argument negated.
>
> Still looks good, please commit.  Thanks :-)
>
> > +static inline IBM128_TYPE
> > +ldouble_qadd_internal (double a, double aa, double c, double cc)
>
> Does it end up actually inlined, or as one static function that both
> __gcc_qadd and __gcc_qsub use?  This is fine for complexity, it is just
> a simple tail-call jump, just wondering what the compiler thinks is best
> here (it matters in other cases, if the inline function has conditional
> branches for example).

I confirmed that the implementation is inlined in both functions when
compiled with optimization, and the negation is propagated into qsub.
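
To make "the negation is propagated" concrete, here is a small
self-contained model (not the libgcc code itself; a simplified
finite-path-only double-double addition, with names invented for the
example).  It shows that calling the add core with the second operand
negated performs the same sequence of double operations as the
hand-propagated subtraction from the v1 patch, so the results are
bit-for-bit identical; the compiler simply reaches the v1 form on its
own once ldouble_qadd_internal is inlined.

#include <stdio.h>

typedef struct { double hi, lo; } dd;

/* Simplified finite-path version of the double-double addition in
   __gcc_qadd; the real code also handles non-finite values and the
   zz == 0.0 case.  */
static inline dd
dd_add_core (double a, double aa, double c, double cc)
{
  double z = a + c;
  double q = a - z;
  double zz = q + c + (a - (q + z)) + aa + cc;
  double xh = z + zz;
  double xl = z - xh + zz;
  dd r = { xh, xl };
  return r;
}

/* v2 shape: negate the second operand and reuse the core.  */
static dd
dd_sub_via_core (double a, double aa, double c, double cc)
{
  return dd_add_core (a, aa, -c, -cc);
}

/* v1 shape: the same computation with the negation folded through.  */
static dd
dd_sub_propagated (double a, double aa, double c, double cc)
{
  double z = a - c;
  double q = a - z;
  double zz = q - c + (a - (q + z)) + aa - cc;
  double xh = z + zz;
  double xl = z - xh + zz;
  dd r = { xh, xl };
  return r;
}

int main (void)
{
  dd x = dd_sub_via_core (1.0, 1e-20, 3.0, -2e-21);
  dd y = dd_sub_propagated (1.0, 1e-20, 3.0, -2e-21);
  printf ("%d %d\n", x.hi == y.hi, x.lo == y.lo);  /* expect: 1 1 */
  return 0;
}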

Thanks, David

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-08-26 23:08 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-26  0:23 [PATCH] Inline IBM long double __gcc_qsub David Edelsohn
2021-08-26  7:35 ` Andreas Schwab
2021-08-26  7:40 ` Andreas Schwab
2021-08-26 18:57   ` [PATCH v2] " David Edelsohn
2021-08-26 22:51     ` Segher Boessenkool
2021-08-26 23:07       ` David Edelsohn
2021-08-26 12:15 ` [PATCH] " Segher Boessenkool
