[PATCH] aarch64: Implement math acceleration via builtins

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

* [PATCH] aarch64:  Implement math acceleration via builtins
@ 2017-10-13 21:26 Michael Collison
  2017-10-13 22:37 ` Adhemerval Zanella
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Collison @ 2017-10-13 21:26 UTC (permalink / raw)
  To: libc-alpha; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 3239 bytes --]

This patch converts asm statements into builtin and neon intrinsics for AArch64.
As an example for the file sysdeps/aarch64/fpu/_s_ceil.c, we convert the function from

double
__ceil (double x)
{
  double result;
  asm ("frintp\t%d0, %d1" :
       "=w" (result) : "w" (x) );
  return result;
}

into

double
__ceil (double x)
{
  return __builtin_ceil (x);
}

Tested on aarch64-linux-gnu with gcc-4.9.4 and gcc-6. Okay for trunk?

2017-10-13  Michael Collison  <michael.collison@arm.com>

	* sysdeps/aarch64/fpu/e_sqrt.c (ieee754_sqrt): Replace
	asm statements with __builtin_sqrt.
	* sysdeps/aarch64/fpu/e_sqrtf.c (ieee754_sqrtf): Replace
	asm statements with __builtin_sqrtf.
	* sysdeps/aarch64/fpu/s_ceil.c (__ceil): Replace
	asm statements with __builtin_ceil.
	* sysdeps/aarch64/fpu/s_ceilf.c (__ceilf): Replace
	asm statements with __builtin_ceilf.
	* sysdeps/aarch64/fpu/s_floor.c (__floor): Replace
	asm statements with __builtin_floor.
	* sysdeps/aarch64/fpu/s_floorf.c (__floorf): Replace
	asm statements with __builtin_floorf.
	* sysdeps/aarch64/fpu/s_fma.c (__fma): Replace
	asm statements with __builtin_fma.
	* sysdeps/aarch64/fpu/s_fmaf.c (__fmaf): Replace
	asm statements with __builtin_fmaf.
	* sysdeps/aarch64/fpu/s_fmax.c (__fmax): Replace
	asm statements with __builtin_fmax.
	* sysdeps/aarch64/fpu/s_fmaxf.c (__fmaxf): Replace
	asm statements with __builtin_fmaxf.
	* sysdeps/aarch64/fpu/s_fmin.c (__fmin): Replace
	asm statements with __builtin_fmin.
	* sysdeps/aarch64/fpu/s_fminf.c (__fminf): Replace
	asm statements with __builtin_fminf.
	* sysdeps/aarch64/fpu/s_frint.c: Delete file.
	* sysdeps/aarch64/fpu/s_frintf.c: Delete file.
	* sysdeps/aarch64/fpu/s_llrint.c (__llrint): Replace
	asm statements with builtin and neon intrinsic.
	* sysdeps/aarch64/fpu/s_llrintf.c (__llrintf): Ditto.
	* sysdeps/aarch64/fpu/s_llround.c (__llround): Replace
	asm statements with neon intrinsic.
	* sysdeps/aarch64/fpu/s_llroundf.c (__llroundf): Ditto.
	* sysdeps/aarch64/fpu/s_lrint.c (__lrint): Replace
	asm statements with builtin and neon intrinsic.
	* sysdeps/aarch64/fpu/s_lrintf.c (__lrintf): Ditto.
	* sysdeps/aarch64/fpu/s_lround.c (__lround): Replace
	asm statements with neon intrinsic vcvtad_s64_f64.
	* sysdeps/aarch64/fpu/s_lroundf.c (__lroundf): Replace
	asm statements with neon intrinsic vcvtas_s32_f32.
	* sysdeps/aarch64/fpu/s_nearbyint.c (__nearbyint): Replace
	asm statements with __builtin_nearbyint.
	* sysdeps/aarch64/fpu/s_nearbyintf.c (__nearbyintf): Replace
	asm statements with __builtin_nearbyintf.
	* sysdeps/aarch64/fpu/s_rint.c (__rint): Replace
	asm statements with __builtin_rint.
	* sysdeps/aarch64/fpu/s_rintf.c (__rintf): Replace
	asm statements with __builtin_rintf.
	* sysdeps/aarch64/fpu/s_round.c (__round): Replace
	asm statements with __builtin_round.
	* sysdeps/aarch64/fpu/s_roundf.c (__roundf): Replace
	asm statements with __builtin_roundf.
	* sysdeps/aarch64/fpu/s_trunc.c (__trunc): Replace
	asm statements with __builtin_trunc.
	* sysdeps/aarch64/fpu/s_truncf.c (__truncf): Replace
	asm statements with __builtin_truncf.
	* sysdeps/aarch64/fpu/Makefile: Build e_sqrt[f].c with
	-fno-math-errno.

[-- Attachment #2: gnutools-1925-rebase-v12.patch --]
[-- Type: application/octet-stream, Size: 21753 bytes --]

From 750555187d336401d8833bcfe29497e54267aae2 Mon Sep 17 00:00:00 2001
From: Michael Collison <michael.collison@arm.com>
Date: Wed, 11 Oct 2017 08:22:54 +0100
Subject: [PATCH] Commit for GNUTOOLS-1925 after rebase

Commit latest ReviewBoard comments

fix whitespace problem
---
 sysdeps/aarch64/fpu/Makefile       |  4 +++
 sysdeps/aarch64/fpu/e_sqrt.c       |  4 +--
 sysdeps/aarch64/fpu/e_sqrtf.c      |  4 +--
 sysdeps/aarch64/fpu/s_ceil.c       | 12 ++++++---
 sysdeps/aarch64/fpu/s_ceilf.c      | 12 ++++++---
 sysdeps/aarch64/fpu/s_floor.c      | 12 ++++++---
 sysdeps/aarch64/fpu/s_floorf.c     | 12 ++++++---
 sysdeps/aarch64/fpu/s_fma.c        | 26 +++---------------
 sysdeps/aarch64/fpu/s_fmaf.c       | 13 ++++++---
 sysdeps/aarch64/fpu/s_fmax.c       | 12 ++++++---
 sysdeps/aarch64/fpu/s_fmaxf.c      | 14 ++++++----
 sysdeps/aarch64/fpu/s_fmin.c       | 30 +++------------------
 sysdeps/aarch64/fpu/s_fminf.c      | 13 ++++++---
 sysdeps/aarch64/fpu/s_frint.c      | 49 ---------------------------------
 sysdeps/aarch64/fpu/s_frintf.c     | 24 -----------------
 sysdeps/aarch64/fpu/s_llrint.c     | 20 +++++++++++---
 sysdeps/aarch64/fpu/s_llrintf.c    | 23 +++++++++++-----
 sysdeps/aarch64/fpu/s_llround.c    | 13 ++++++---
 sysdeps/aarch64/fpu/s_llroundf.c   | 15 ++++++-----
 sysdeps/aarch64/fpu/s_lrint.c      | 46 ++++++++++---------------------
 sysdeps/aarch64/fpu/s_lrintf.c     | 20 +++++++++++---
 sysdeps/aarch64/fpu/s_lround.c     | 55 +++++---------------------------------
 sysdeps/aarch64/fpu/s_lroundf.c    | 13 ++++++---
 sysdeps/aarch64/fpu/s_nearbyint.c  | 12 ++++++---
 sysdeps/aarch64/fpu/s_nearbyintf.c | 12 ++++++---
 sysdeps/aarch64/fpu/s_rint.c       | 12 ++++++---
 sysdeps/aarch64/fpu/s_rintf.c      | 12 ++++++---
 sysdeps/aarch64/fpu/s_round.c      | 12 ++++++---
 sysdeps/aarch64/fpu/s_roundf.c     | 12 ++++++---
 sysdeps/aarch64/fpu/s_trunc.c      | 12 ++++++---
 sysdeps/aarch64/fpu/s_truncf.c     | 12 ++++++---
 31 files changed, 254 insertions(+), 288 deletions(-)
 create mode 100644 sysdeps/aarch64/fpu/Makefile
 delete mode 100644 sysdeps/aarch64/fpu/s_frint.c
 delete mode 100644 sysdeps/aarch64/fpu/s_frintf.c

diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile
new file mode 100644
index 0000000..4bd4e69
--- /dev/null
+++ b/sysdeps/aarch64/fpu/Makefile
@@ -0,0 +1,4 @@
+ifeq ($(subdir),math)
+CFLAGS-e_sqrtf.c += -fno-math-errno
+CFLAGS-e_sqrt.c += -fno-math-errno
+endif
diff --git a/sysdeps/aarch64/fpu/e_sqrt.c b/sysdeps/aarch64/fpu/e_sqrt.c
index f984d87..b80ac27 100644
--- a/sysdeps/aarch64/fpu/e_sqrt.c
+++ b/sysdeps/aarch64/fpu/e_sqrt.c
@@ -21,8 +21,6 @@
 double
 __ieee754_sqrt (double d)
 {
-  double res;
-  asm ("fsqrt   %d0, %d1" : "=w" (res) : "w" (d));
-  return res;
+  return __builtin_sqrt (d);
 }
 strong_alias (__ieee754_sqrt, __sqrt_finite)
diff --git a/sysdeps/aarch64/fpu/e_sqrtf.c b/sysdeps/aarch64/fpu/e_sqrtf.c
index 67707ef..7380454 100644
--- a/sysdeps/aarch64/fpu/e_sqrtf.c
+++ b/sysdeps/aarch64/fpu/e_sqrtf.c
@@ -21,8 +21,6 @@
 float
 __ieee754_sqrtf (float s)
 {
-  float res;
-  asm ("fsqrt   %s0, %s1" : "=w" (res) : "w" (s));
-  return res;
+  return __builtin_sqrtf (s);
 }
 strong_alias (__ieee754_sqrtf, __sqrtf_finite)
diff --git a/sysdeps/aarch64/fpu/s_ceil.c b/sysdeps/aarch64/fpu/s_ceil.c
index d0a8bd8..bc90ab9 100644
--- a/sysdeps/aarch64/fpu/s_ceil.c
+++ b/sysdeps/aarch64/fpu/s_ceil.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define	FUNC ceil
-#define INSN "frintp"
-#include <s_frint.c>
+#include <math.h>
+
+double
+__ceil (double x)
+{
+  return __builtin_ceil (x);
+}
+
+weak_alias (__ceil, ceil)
diff --git a/sysdeps/aarch64/fpu/s_ceilf.c b/sysdeps/aarch64/fpu/s_ceilf.c
index b9c2e7c..d5c4383 100644
--- a/sysdeps/aarch64/fpu/s_ceilf.c
+++ b/sysdeps/aarch64/fpu/s_ceilf.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define	FUNC ceilf
-#define INSN "frintp"
-#include <s_frintf.c>
+#include <math.h>
+
+float
+__ceilf (float x)
+{
+  return __builtin_ceilf (x);
+}
+
+weak_alias (__ceilf, ceilf)
diff --git a/sysdeps/aarch64/fpu/s_floor.c b/sysdeps/aarch64/fpu/s_floor.c
index f7f8731..049535c 100644
--- a/sysdeps/aarch64/fpu/s_floor.c
+++ b/sysdeps/aarch64/fpu/s_floor.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC floor
-#define INSN "frintm"
-#include <s_frint.c>
+#include <math.h>
+
+double
+__floor (double x)
+{
+  return __builtin_floor (x);
+}
+
+weak_alias (__floor, floor)
diff --git a/sysdeps/aarch64/fpu/s_floorf.c b/sysdeps/aarch64/fpu/s_floorf.c
index 7be63b5..fa6fa17 100644
--- a/sysdeps/aarch64/fpu/s_floorf.c
+++ b/sysdeps/aarch64/fpu/s_floorf.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC floorf
-#define INSN "frintm"
-#include <s_frintf.c>
+#include <math.h>
+
+float
+__floorf (float x)
+{
+  return __builtin_floorf (x);
+}
+
+weak_alias (__floorf, floorf)
diff --git a/sysdeps/aarch64/fpu/s_fma.c b/sysdeps/aarch64/fpu/s_fma.c
index 6f62ce2..e496ec6 100644
--- a/sysdeps/aarch64/fpu/s_fma.c
+++ b/sysdeps/aarch64/fpu/s_fma.c
@@ -18,28 +18,10 @@
 
 #include <math.h>
 
-#ifndef FUNC
-# define FUNC fma
-#endif
-
-#ifndef TYPE
-# define TYPE double
-# define REGS "d"
-#else
-# ifndef REGS
-#  error REGS not defined
-# endif
-#endif
-
-#define __CONCATX(a,b) __CONCAT(a,b)
-
-TYPE
-__CONCATX(__,FUNC) (TYPE x, TYPE y, TYPE z)
+double
+__fma (double x, double y, double z)
 {
-  TYPE result;
-  asm ( "fmadd" "\t%" REGS "0, %" REGS "1, %" REGS "2, %" REGS "3"
-        : "=w" (result) : "w" (x), "w" (y), "w" (z) );
-  return result;
+  return __builtin_fma (x, y, z);
 }
 
-weak_alias (__CONCATX(__,FUNC), FUNC)
+weak_alias (__fma, fma)
diff --git a/sysdeps/aarch64/fpu/s_fmaf.c b/sysdeps/aarch64/fpu/s_fmaf.c
index 880a22d..ff1abbf 100644
--- a/sysdeps/aarch64/fpu/s_fmaf.c
+++ b/sysdeps/aarch64/fpu/s_fmaf.c
@@ -16,7 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC fmaf
-#define TYPE float
-#define REGS "s"
-#include <s_fma.c>
+#include <math.h>
+
+float
+__fmaf (float x, float y, float z)
+{
+  return __builtin_fmaf (x, y, z);
+}
+
+weak_alias (__fmaf, fmaf)
diff --git a/sysdeps/aarch64/fpu/s_fmax.c b/sysdeps/aarch64/fpu/s_fmax.c
index 395a9ba..d7a82f8 100644
--- a/sysdeps/aarch64/fpu/s_fmax.c
+++ b/sysdeps/aarch64/fpu/s_fmax.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC fmax
-#define INSN "fmaxnm"
-#include <fpu/s_fmin.c>
+#include <math.h>
+
+double
+__fmax (double x, double y)
+{
+  return __builtin_fmax (x, y);
+}
+
+weak_alias (__fmax, fmax)
diff --git a/sysdeps/aarch64/fpu/s_fmaxf.c b/sysdeps/aarch64/fpu/s_fmaxf.c
index f450d9f..ec4dcdd 100644
--- a/sysdeps/aarch64/fpu/s_fmaxf.c
+++ b/sysdeps/aarch64/fpu/s_fmaxf.c
@@ -16,8 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC fmaxf
-#define INSN "fmaxnm"
-#define TYPE float
-#define REGS "s"
-#include <fpu/s_fmin.c>
+#include <math.h>
+
+float
+__fmaxf (float x, float y)
+{
+  return __builtin_fmaxf (x, y);
+}
+
+weak_alias (__fmaxf, fmaxf)
diff --git a/sysdeps/aarch64/fpu/s_fmin.c b/sysdeps/aarch64/fpu/s_fmin.c
index b6d32d5..bba894e 100644
--- a/sysdeps/aarch64/fpu/s_fmin.c
+++ b/sysdeps/aarch64/fpu/s_fmin.c
@@ -18,32 +18,10 @@
 
 #include <math.h>
 
-#ifndef FUNC
-# define FUNC fmin
-#endif
-
-#ifndef INSN
-# define INSN "fminnm"
-#endif
-
-#ifndef TYPE
-# define TYPE double
-# define REGS "d"
-#else
-# ifndef REGS
-#  error REGS not defined
-# endif
-#endif
-
-#define __CONCATX(a,b) __CONCAT(a,b)
-
-TYPE
-__CONCATX(__,FUNC) (TYPE x, TYPE y)
+double
+__fmin (double x, double y)
 {
-  TYPE result;
-  asm ( INSN "\t%" REGS "0, %" REGS "1, %" REGS "2"
-        : "=w" (result) : "w" (x), "w" (y) );
-  return result;
+  return __builtin_fmin (x, y);
 }
 
-weak_alias (__CONCATX(__,FUNC), FUNC)
+weak_alias (__fmin, fmin)
diff --git a/sysdeps/aarch64/fpu/s_fminf.c b/sysdeps/aarch64/fpu/s_fminf.c
index 032262d..7d3a3a3 100644
--- a/sysdeps/aarch64/fpu/s_fminf.c
+++ b/sysdeps/aarch64/fpu/s_fminf.c
@@ -16,7 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC fminf
-#define TYPE float
-#define REGS "s"
-#include <fpu/s_fmin.c>
+#include <math.h>
+
+float
+__fminf (float x, float y)
+{
+  return __builtin_fminf (x, y);
+}
+
+weak_alias (__fminf, fminf)
diff --git a/sysdeps/aarch64/fpu/s_frint.c b/sysdeps/aarch64/fpu/s_frint.c
deleted file mode 100644
index 48881f5..0000000
--- a/sysdeps/aarch64/fpu/s_frint.c
+++ /dev/null
@@ -1,49 +0,0 @@
-/* Copyright (C) 1996-2017 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-
-#ifndef FUNC
-# error FUNC not defined
-#endif
-
-#ifndef TYPE
-# define TYPE double
-# define REGS "d"
-#else
-# ifndef REGS
-#  error REGS not defined
-# endif
-#endif
-
-#ifndef INSN
-# error INSN not defined
-#endif
-
-#define __CONCATX(a,b) __CONCAT(a,b)
-
-TYPE
-__CONCATX(__,FUNC) (TYPE x)
-{
-  TYPE result;
-  asm ( INSN "\t%" REGS "0, %" REGS "1" :
-	"=w" (result) : "w" (x) );
-  return result;
-}
-
-weak_alias (__CONCATX(__,FUNC), FUNC)
diff --git a/sysdeps/aarch64/fpu/s_frintf.c b/sysdeps/aarch64/fpu/s_frintf.c
deleted file mode 100644
index dae99d7..0000000
--- a/sysdeps/aarch64/fpu/s_frintf.c
+++ /dev/null
@@ -1,24 +0,0 @@
-/* Copyright (C) 2011-2017 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of the
-   License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#ifndef FUNC
-# error FUNC not defined
-#endif
-#define TYPE float
-#define REGS "s"
-#include <s_frint.c>
diff --git a/sysdeps/aarch64/fpu/s_llrint.c b/sysdeps/aarch64/fpu/s_llrint.c
index 57821c0..3e41546 100644
--- a/sysdeps/aarch64/fpu/s_llrint.c
+++ b/sysdeps/aarch64/fpu/s_llrint.c
@@ -16,7 +16,19 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC llrint
-#define OTYPE long long int
-#define OREG_SIZE 64
-#include <s_lrint.c>
+#include <math.h>
+#include <math_private.h>
+
+long long int
+__llrint (double x)
+{
+  double r = __builtin_rint (x);
+
+  // Prevent gcc from calling lrint directly when compiled with -fno-math-errno
+  // by inserting a barrier
+
+  math_opt_barrier (r);
+  return r;
+}
+
+weak_alias (__llrint, llrint)
diff --git a/sysdeps/aarch64/fpu/s_llrintf.c b/sysdeps/aarch64/fpu/s_llrintf.c
index 98ed4f8..baf9027 100644
--- a/sysdeps/aarch64/fpu/s_llrintf.c
+++ b/sysdeps/aarch64/fpu/s_llrintf.c
@@ -16,9 +16,20 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC llrintf
-#define ITYPE float
-#define IREG_SIZE 32
-#define OTYPE long long int
-#define OREG_SIZE 64
-#include <s_lrint.c>
+#include <math.h>
+#include <math_private.h>
+
+long long int
+__llrintf (float x)
+{
+  float r = __builtin_rintf (x);
+
+  // Prevent gcc from calling lrintf directly when compiled with -fno-math-errno
+  // by inserting a barrier
+
+
+  math_opt_barrier (r);
+  return r;
+}
+
+weak_alias (__llrintf, llrintf)
diff --git a/sysdeps/aarch64/fpu/s_llround.c b/sysdeps/aarch64/fpu/s_llround.c
index ef7aedf..2902946 100644
--- a/sysdeps/aarch64/fpu/s_llround.c
+++ b/sysdeps/aarch64/fpu/s_llround.c
@@ -16,7 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC llround
-#define OTYPE long long int
-#define OREG_SIZE 64
-#include <s_lround.c>
+#include <math.h>
+
+long long int
+__llround (double x)
+{
+  return __builtin_llround (x);
+}
+
+weak_alias (__llround, llround)
diff --git a/sysdeps/aarch64/fpu/s_llroundf.c b/sysdeps/aarch64/fpu/s_llroundf.c
index 294f0f4..0ca390b 100644
--- a/sysdeps/aarch64/fpu/s_llroundf.c
+++ b/sysdeps/aarch64/fpu/s_llroundf.c
@@ -16,9 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC llroundf
-#define ITYPE float
-#define IREG_SIZE 32
-#define OTYPE long long int
-#define OREG_SIZE 64
-#include <s_lround.c>
+#include <math.h>
+
+long long int
+__llroundf (float x)
+{
+  return __builtin_llroundf (x);
+}
+
+weak_alias (__llroundf, llroundf)
diff --git a/sysdeps/aarch64/fpu/s_lrint.c b/sysdeps/aarch64/fpu/s_lrint.c
index 6ef64e2..ba4d7cc 100644
--- a/sysdeps/aarch64/fpu/s_lrint.c
+++ b/sysdeps/aarch64/fpu/s_lrint.c
@@ -19,38 +19,17 @@
 #include <math.h>
 #include <get-rounding-mode.h>
 #include <stdint.h>
+#include <math_private.h>
 
-#ifndef FUNC
-# define FUNC lrint
-#endif
-
-#ifndef ITYPE
-# define ITYPE double
 # define IREG_SIZE 64
-#else
-# ifndef IREG_SIZE
-#  error IREG_SIZE not defined
-# endif
-#endif
 
-#ifndef OTYPE
-# define OTYPE long int
 # ifdef __ILP32__
 #  define OREG_SIZE 32
 # else
 #  define OREG_SIZE 64
 # endif
-#else
-# ifndef OREG_SIZE
-#  error OREG_SIZE not defined
-# endif
-#endif
 
-#if IREG_SIZE == 32
-# define IREGS "s"
-#else
 # define IREGS "d"
-#endif
 
 #if OREG_SIZE == 32
 # define OREGS "w"
@@ -58,15 +37,14 @@
 # define OREGS "x"
 #endif
 
-#define __CONCATX(a,b) __CONCAT(a,b)
 
-OTYPE
-__CONCATX(__,FUNC) (ITYPE x)
+long int
+__lrint (double x)
 {
-  OTYPE result;
-  ITYPE temp;
 
 #if IREG_SIZE == 64 && OREG_SIZE == 32
+  long int result;
+
   if (__builtin_fabs (x) > INT32_MAX)
     {
       /* Converting large values to a 32 bit int may cause the frintx/fcvtza
@@ -96,10 +74,14 @@ __CONCATX(__,FUNC) (ITYPE x)
       return result;
     }
 #endif
-  asm ( "frintx" "\t%" IREGS "1, %" IREGS "2\n\t"
-        "fcvtzs" "\t%" OREGS "0, %" IREGS "1"
-        : "=r" (result), "=w" (temp) : "w" (x) );
-  return result;
+
+  double r =  __builtin_rint (x);
+
+  // Prevent gcc from calling lrintf directly when compiled with -fno-math-errno
+  // by inserting a barrier
+
+  math_opt_barrier (r);
+  return r;
 }
 
-weak_alias (__CONCATX(__,FUNC), FUNC)
+weak_alias (__lrint, lrint)
diff --git a/sysdeps/aarch64/fpu/s_lrintf.c b/sysdeps/aarch64/fpu/s_lrintf.c
index 2e73271..8d4d4d0 100644
--- a/sysdeps/aarch64/fpu/s_lrintf.c
+++ b/sysdeps/aarch64/fpu/s_lrintf.c
@@ -16,7 +16,19 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC lrintf
-#define ITYPE float
-#define IREG_SIZE 32
-#include <s_lrint.c>
+#include <math.h>
+#include <math_private.h>
+
+long int
+__lrintf (float x)
+{
+  float r = __builtin_rintf (x);
+
+  // Prevent gcc from calling lrintf directly when compiled with -fno-math-errno
+  // by inserting a barrier
+
+  math_opt_barrier (r);
+  return r;
+}
+
+weak_alias (__lrintf, lrintf)
diff --git a/sysdeps/aarch64/fpu/s_lround.c b/sysdeps/aarch64/fpu/s_lround.c
index 1f77d82..90c3163 100644
--- a/sysdeps/aarch64/fpu/s_lround.c
+++ b/sysdeps/aarch64/fpu/s_lround.c
@@ -18,53 +18,10 @@
 
 #include <math.h>
 
-#ifndef FUNC
-# define FUNC lround
-#endif
+long int
+__lround (double x)
+ {
+  return __builtin_lround (x);
+ }
 
-#ifndef ITYPE
-# define ITYPE double
-# define IREG_SIZE 64
-#else
-# ifndef IREG_SIZE
-#  error IREG_SIZE not defined
-# endif
-#endif
-
-#ifndef OTYPE
-# define OTYPE long int
-# ifdef __ILP32__
-#  define OREG_SIZE 32
-# else
-#  define OREG_SIZE 64
-# endif
-#else
-# ifndef OREG_SIZE
-#  error OREG_SIZE not defined
-# endif
-#endif
-
-#if IREG_SIZE == 32
-# define IREGS "s"
-#else
-# define IREGS "d"
-#endif
-
-#if OREG_SIZE == 32
-# define OREGS "w"
-#else
-# define OREGS "x"
-#endif
-
-#define __CONCATX(a,b) __CONCAT(a,b)
-
-OTYPE
-__CONCATX(__,FUNC) (ITYPE x)
-{
-  OTYPE result;
-  asm ( "fcvtas" "\t%" OREGS "0, %" IREGS "1"
-        : "=r" (result) : "w" (x) );
-  return result;
-}
-
-weak_alias (__CONCATX(__,FUNC), FUNC)
+weak_alias (__lround, lround)
diff --git a/sysdeps/aarch64/fpu/s_lroundf.c b/sysdeps/aarch64/fpu/s_lroundf.c
index b30ddb6..baf0693 100644
--- a/sysdeps/aarch64/fpu/s_lroundf.c
+++ b/sysdeps/aarch64/fpu/s_lroundf.c
@@ -16,7 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC lroundf
-#define ITYPE float
-#define IREG_SIZE 32
-#include <s_lround.c>
+#include <math.h>
+
+long int
+__lroundf (float x)
+{
+  return __builtin_lroundf (x);
+}
+
+weak_alias (__lroundf, lroundf)
diff --git a/sysdeps/aarch64/fpu/s_nearbyint.c b/sysdeps/aarch64/fpu/s_nearbyint.c
index 51067f2..6ba5de1 100644
--- a/sysdeps/aarch64/fpu/s_nearbyint.c
+++ b/sysdeps/aarch64/fpu/s_nearbyint.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define	FUNC nearbyint
-#define INSN "frinti"
-#include <s_frint.c>
+#include <math.h>
+
+double
+__nearbyint (double x)
+{
+  return __builtin_nearbyint (x);
+}
+
+weak_alias (__nearbyint, nearbyint)
diff --git a/sysdeps/aarch64/fpu/s_nearbyintf.c b/sysdeps/aarch64/fpu/s_nearbyintf.c
index 8125646..de69fd9 100644
--- a/sysdeps/aarch64/fpu/s_nearbyintf.c
+++ b/sysdeps/aarch64/fpu/s_nearbyintf.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC nearbyintf
-#define INSN "frinti"
-#include <s_frintf.c>
+#include <math.h>
+
+float
+__nearbyintf (float x)
+{
+  return __builtin_nearbyintf (x);
+}
+
+weak_alias (__nearbyintf, nearbyintf)
diff --git a/sysdeps/aarch64/fpu/s_rint.c b/sysdeps/aarch64/fpu/s_rint.c
index 73b4e26..b4ac349 100644
--- a/sysdeps/aarch64/fpu/s_rint.c
+++ b/sysdeps/aarch64/fpu/s_rint.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC rint
-#define INSN "frintx"
-#include <s_frint.c>
+#include <math.h>
+
+double
+__rint (double x)
+{
+  return __builtin_rint (x);
+}
+
+weak_alias (__rint, rint)
diff --git a/sysdeps/aarch64/fpu/s_rintf.c b/sysdeps/aarch64/fpu/s_rintf.c
index 3560dc2..d0f70ce 100644
--- a/sysdeps/aarch64/fpu/s_rintf.c
+++ b/sysdeps/aarch64/fpu/s_rintf.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC rintf
-#define INSN "frintx"
-#include <s_frintf.c>
+#include <math.h>
+
+float
+__rintf (float x)
+{
+  return __builtin_rintf (x);
+}
+
+weak_alias (__rintf, rintf)
diff --git a/sysdeps/aarch64/fpu/s_round.c b/sysdeps/aarch64/fpu/s_round.c
index 6781748..a34fca1 100644
--- a/sysdeps/aarch64/fpu/s_round.c
+++ b/sysdeps/aarch64/fpu/s_round.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC round
-#define INSN "frinta"
-#include <s_frint.c>
+#include <math.h>
+
+double
+__round (double x)
+{
+  return __builtin_round (x);
+}
+
+weak_alias (__round, round)
diff --git a/sysdeps/aarch64/fpu/s_roundf.c b/sysdeps/aarch64/fpu/s_roundf.c
index ef6f672..66c8ee6 100644
--- a/sysdeps/aarch64/fpu/s_roundf.c
+++ b/sysdeps/aarch64/fpu/s_roundf.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC roundf
-#define INSN "frinta"
-#include <s_frintf.c>
+#include <math.h>
+
+float
+__roundf (float x)
+{
+  return __builtin_roundf (x);
+}
+
+weak_alias (__roundf, roundf)
diff --git a/sysdeps/aarch64/fpu/s_trunc.c b/sysdeps/aarch64/fpu/s_trunc.c
index 2bf5474..6550dfc 100644
--- a/sysdeps/aarch64/fpu/s_trunc.c
+++ b/sysdeps/aarch64/fpu/s_trunc.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC trunc
-#define INSN "frintz"
-#include <s_frint.c>
+#include <math.h>
+
+double
+__trunc (double x)
+{
+  return __builtin_trunc (x);
+}
+
+weak_alias (__trunc, trunc)
diff --git a/sysdeps/aarch64/fpu/s_truncf.c b/sysdeps/aarch64/fpu/s_truncf.c
index 94865a4..b7890a2 100644
--- a/sysdeps/aarch64/fpu/s_truncf.c
+++ b/sysdeps/aarch64/fpu/s_truncf.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC truncf
-#define INSN "frintz"
-#include <s_frintf.c>
+#include <math.h>
+
+float
+__truncf (float x)
+{
+  return __builtin_truncf (x);
+}
+
+weak_alias (__truncf, truncf)
-- 
1.9.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] aarch64: Implement math acceleration via builtins
  2017-10-13 21:26 [PATCH] aarch64: Implement math acceleration via builtins Michael Collison
@ 2017-10-13 22:37 ` Adhemerval Zanella
  2017-10-16  9:37   ` Szabolcs Nagy
  2017-10-17 16:14   ` Szabolcs Nagy
  0 siblings, 2 replies; 10+ messages in thread
From: Adhemerval Zanella @ 2017-10-13 22:37 UTC (permalink / raw)
  To: libc-alpha



On 13/10/2017 18:26, Michael Collison wrote:
> This patch converts asm statements into builtin and neon intrinsics for AArch64.
> As an example for the file sysdeps/aarch64/fpu/_s_ceil.c, we convert the function from
> 
> double
> __ceil (double x)
> {
>   double result;
>   asm ("frintp\t%d0, %d1" :
>        "=w" (result) : "w" (x) );
>   return result;
> }
> 
> into
> 
> double
> __ceil (double x)
> {
>   return __builtin_ceil (x);
> }
> 
> Tested on aarch64-linux-gnu with gcc-4.9.4 and gcc-6. Okay for trunk?
> 
> 2017-10-13  Michael Collison  <michael.collison@arm.com>
> 
> 	* sysdeps/aarch64/fpu/e_sqrt.c (ieee754_sqrt): Replace
> 	asm statements with __builtin_sqrt.
> 	* sysdeps/aarch64/fpu/e_sqrtf.c (ieee754_sqrtf): Replace
> 	asm statements with __builtin_sqrtf.
> 	* sysdeps/aarch64/fpu/s_ceil.c (__ceil): Replace
> 	asm statements with __builtin_ceil.
> 	* sysdeps/aarch64/fpu/s_ceilf.c (__ceilf): Replace
> 	asm statements with __builtin_ceilf.
> 	* sysdeps/aarch64/fpu/s_floor.c (__floor): Replace
> 	asm statements with __builtin_floor.
> 	* sysdeps/aarch64/fpu/s_floorf.c (__floorf): Replace
> 	asm statements with __builtin_floorf.
> 	* sysdeps/aarch64/fpu/s_fma.c (__fma): Replace
> 	asm statements with __builtin_fma.
> 	* sysdeps/aarch64/fpu/s_fmaf.c (__fmaf): Replace
> 	asm statements with __builtin_fmaf.
> 	* sysdeps/aarch64/fpu/s_fmax.c (__fmax): Replace
> 	asm statements with __builtin_fmax.
> 	* sysdeps/aarch64/fpu/s_fmaxf.c (__fmaxf): Replace
> 	asm statements with __builtin_fmaxf.
> 	* sysdeps/aarch64/fpu/s_fmin.c (__fmin): Replace
> 	asm statements with __builtin_fmin.
> 	* sysdeps/aarch64/fpu/s_fminf.c (__fminf): Replace
> 	asm statements with __builtin_fminf.
> 	* sysdeps/aarch64/fpu/s_frint.c: Delete file.
> 	* sysdeps/aarch64/fpu/s_frintf.c: Delete file.
> 	* sysdeps/aarch64/fpu/s_llrint.c (__llrint): Replace
> 	asm statements with builtin and neon intrinsic.
> 	* sysdeps/aarch64/fpu/s_llrintf.c (__llrintf): Ditto.
> 	* sysdeps/aarch64/fpu/s_llround.c (__llround): Replace
> 	asm statements with neon intrinsic.
> 	* sysdeps/aarch64/fpu/s_llroundf.c (__llroundf): Ditto.
> 	* sysdeps/aarch64/fpu/s_lrint.c (__lrint): Replace
> 	asm statements with builtin and neon intrinsic.
> 	* sysdeps/aarch64/fpu/s_lrintf.c (__lrintf): Ditto.
> 	* sysdeps/aarch64/fpu/s_lround.c (__lround): Replace
> 	asm statements with neon intrinsic vcvtad_s64_f64.
> 	* sysdeps/aarch64/fpu/s_lroundf.c (__lroundf): Replace
> 	asm statements with neon intrinsic vcvtas_s32_f32.
> 	* sysdeps/aarch64/fpu/s_nearbyint.c (__nearbyint): Replace
> 	asm statements with __builtin_nearbyint.
> 	* sysdeps/aarch64/fpu/s_nearbyintf.c (__nearbyintf): Replace
> 	asm statements with __builtin_nearbyintf.
> 	* sysdeps/aarch64/fpu/s_rint.c (__rint): Replace
> 	asm statements with __builtin_rint.
> 	* sysdeps/aarch64/fpu/s_rintf.c (__rintf): Replace
> 	asm statements with __builtin_rintf.
> 	* sysdeps/aarch64/fpu/s_round.c (__round): Replace
> 	asm statements with __builtin_round.
> 	* sysdeps/aarch64/fpu/s_roundf.c (__roundf): Replace
> 	asm statements with __builtin_roundf.
> 	* sysdeps/aarch64/fpu/s_trunc.c (__trunc): Replace
> 	asm statements with __builtin_trunc.
> 	* sysdeps/aarch64/fpu/s_truncf.c (__truncf): Replace
> 	asm statements with __builtin_truncf.
> 	* sysdeps/aarch64/fpu/Makefile: Build e_sqrt[f].c with
> 	-fno-math-errno.

Looks good in general with some comments below.  Is there any advantage
of the refactor besides avoid inline assembly?

Since you refactoring it, I would also suggest to use builtin for the 
aarch64 match_private.h sqrt and get/set fpcr at fpu_control.h.

> 
> diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile
> new file mode 100644
> index 0000000..4bd4e69
> --- /dev/null
> +++ b/sysdeps/aarch64/fpu/Makefile
> @@ -0,0 +1,4 @@
> +ifeq ($(subdir),math)
> +CFLAGS-e_sqrtf.c += -fno-math-errno
> +CFLAGS-e_sqrt.c += -fno-math-errno
> +endif
> diff --git a/sysdeps/aarch64/fpu/e_sqrt.c b/sysdeps/aarch64/fpu/e_sqrt.c
> index f984d87..b80ac27 100644
> --- a/sysdeps/aarch64/fpu/e_sqrt.c
> +++ b/sysdeps/aarch64/fpu/e_sqrt.c
> @@ -21,8 +21,6 @@
>  double
>  __ieee754_sqrt (double d)
>  {
> -  double res;
> -  asm ("fsqrt   %d0, %d1" : "=w" (res) : "w" (d));
> -  return res;
> +  return __builtin_sqrt (d);
>  }
>  strong_alias (__ieee754_sqrt, __sqrt_finite)
> diff --git a/sysdeps/aarch64/fpu/e_sqrtf.c b/sysdeps/aarch64/fpu/e_sqrtf.c
> index 67707ef..7380454 100644
> --- a/sysdeps/aarch64/fpu/e_sqrtf.c
> +++ b/sysdeps/aarch64/fpu/e_sqrtf.c
> @@ -21,8 +21,6 @@
>  float
>  __ieee754_sqrtf (float s)
>  {
> -  float res;
> -  asm ("fsqrt   %s0, %s1" : "=w" (res) : "w" (s));
> -  return res;
> +  return __builtin_sqrtf (s);
>  }
>  strong_alias (__ieee754_sqrtf, __sqrtf_finite)
> diff --git a/sysdeps/aarch64/fpu/s_ceil.c b/sysdeps/aarch64/fpu/s_ceil.c
> index d0a8bd8..bc90ab9 100644
> --- a/sysdeps/aarch64/fpu/s_ceil.c
> +++ b/sysdeps/aarch64/fpu/s_ceil.c
> @@ -16,6 +16,12 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define	FUNC ceil
> -#define INSN "frintp"
> -#include <s_frint.c>
> +#include <math.h>
> +
> +double
> +__ceil (double x)
> +{
> +  return __builtin_ceil (x);
> +}
> +
> +weak_alias (__ceil, ceil)
> diff --git a/sysdeps/aarch64/fpu/s_ceilf.c b/sysdeps/aarch64/fpu/s_ceilf.c
> index b9c2e7c..d5c4383 100644
> --- a/sysdeps/aarch64/fpu/s_ceilf.c
> +++ b/sysdeps/aarch64/fpu/s_ceilf.c
> @@ -16,6 +16,12 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define	FUNC ceilf
> -#define INSN "frintp"
> -#include <s_frintf.c>
> +#include <math.h>
> +
> +float
> +__ceilf (float x)
> +{
> +  return __builtin_ceilf (x);
> +}
> +
> +weak_alias (__ceilf, ceilf)
> diff --git a/sysdeps/aarch64/fpu/s_floor.c b/sysdeps/aarch64/fpu/s_floor.c
> index f7f8731..049535c 100644
> --- a/sysdeps/aarch64/fpu/s_floor.c
> +++ b/sysdeps/aarch64/fpu/s_floor.c
> @@ -16,6 +16,12 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define FUNC floor
> -#define INSN "frintm"
> -#include <s_frint.c>
> +#include <math.h>
> +
> +double
> +__floor (double x)
> +{
> +  return __builtin_floor (x);
> +}
> +
> +weak_alias (__floor, floor)
> diff --git a/sysdeps/aarch64/fpu/s_floorf.c b/sysdeps/aarch64/fpu/s_floorf.c
> index 7be63b5..fa6fa17 100644
> --- a/sysdeps/aarch64/fpu/s_floorf.c
> +++ b/sysdeps/aarch64/fpu/s_floorf.c
> @@ -16,6 +16,12 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define FUNC floorf
> -#define INSN "frintm"
> -#include <s_frintf.c>
> +#include <math.h>
> +
> +float
> +__floorf (float x)
> +{
> +  return __builtin_floorf (x);
> +}
> +
> +weak_alias (__floorf, floorf)
> diff --git a/sysdeps/aarch64/fpu/s_fma.c b/sysdeps/aarch64/fpu/s_fma.c
> index 6f62ce2..e496ec6 100644
> --- a/sysdeps/aarch64/fpu/s_fma.c
> +++ b/sysdeps/aarch64/fpu/s_fma.c
> @@ -18,28 +18,10 @@
>  
>  #include <math.h>
>  
> -#ifndef FUNC
> -# define FUNC fma
> -#endif
> -
> -#ifndef TYPE
> -# define TYPE double
> -# define REGS "d"
> -#else
> -# ifndef REGS
> -#  error REGS not defined
> -# endif
> -#endif
> -
> -#define __CONCATX(a,b) __CONCAT(a,b)
> -
> -TYPE
> -__CONCATX(__,FUNC) (TYPE x, TYPE y, TYPE z)
> +double
> +__fma (double x, double y, double z)
>  {
> -  TYPE result;
> -  asm ( "fmadd" "\t%" REGS "0, %" REGS "1, %" REGS "2, %" REGS "3"
> -        : "=w" (result) : "w" (x), "w" (y), "w" (z) );
> -  return result;
> +  return __builtin_fma (x, y, z);
>  }
>  
> -weak_alias (__CONCATX(__,FUNC), FUNC)
> +weak_alias (__fma, fma)
> diff --git a/sysdeps/aarch64/fpu/s_fmaf.c b/sysdeps/aarch64/fpu/s_fmaf.c
> index 880a22d..ff1abbf 100644
> --- a/sysdeps/aarch64/fpu/s_fmaf.c
> +++ b/sysdeps/aarch64/fpu/s_fmaf.c
> @@ -16,7 +16,12 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define FUNC fmaf
> -#define TYPE float
> -#define REGS "s"
> -#include <s_fma.c>
> +#include <math.h>
> +
> +float
> +__fmaf (float x, float y, float z)
> +{
> +  return __builtin_fmaf (x, y, z);
> +}
> +
> +weak_alias (__fmaf, fmaf)
> diff --git a/sysdeps/aarch64/fpu/s_fmax.c b/sysdeps/aarch64/fpu/s_fmax.c
> index 395a9ba..d7a82f8 100644
> --- a/sysdeps/aarch64/fpu/s_fmax.c
> +++ b/sysdeps/aarch64/fpu/s_fmax.c
> @@ -16,6 +16,12 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define FUNC fmax
> -#define INSN "fmaxnm"
> -#include <fpu/s_fmin.c>
> +#include <math.h>
> +
> +double
> +__fmax (double x, double y)
> +{
> +  return __builtin_fmax (x, y);
> +}
> +
> +weak_alias (__fmax, fmax)
> diff --git a/sysdeps/aarch64/fpu/s_fmaxf.c b/sysdeps/aarch64/fpu/s_fmaxf.c
> index f450d9f..ec4dcdd 100644
> --- a/sysdeps/aarch64/fpu/s_fmaxf.c
> +++ b/sysdeps/aarch64/fpu/s_fmaxf.c
> @@ -16,8 +16,12 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define FUNC fmaxf
> -#define INSN "fmaxnm"
> -#define TYPE float
> -#define REGS "s"
> -#include <fpu/s_fmin.c>
> +#include <math.h>
> +
> +float
> +__fmaxf (float x, float y)
> +{
> +  return __builtin_fmaxf (x, y);
> +}
> +
> +weak_alias (__fmaxf, fmaxf)
> diff --git a/sysdeps/aarch64/fpu/s_fmin.c b/sysdeps/aarch64/fpu/s_fmin.c
> index b6d32d5..bba894e 100644
> --- a/sysdeps/aarch64/fpu/s_fmin.c
> +++ b/sysdeps/aarch64/fpu/s_fmin.c
> @@ -18,32 +18,10 @@
>  
>  #include <math.h>
>  
> -#ifndef FUNC
> -# define FUNC fmin
> -#endif
> -
> -#ifndef INSN
> -# define INSN "fminnm"
> -#endif
> -
> -#ifndef TYPE
> -# define TYPE double
> -# define REGS "d"
> -#else
> -# ifndef REGS
> -#  error REGS not defined
> -# endif
> -#endif
> -
> -#define __CONCATX(a,b) __CONCAT(a,b)
> -
> -TYPE
> -__CONCATX(__,FUNC) (TYPE x, TYPE y)
> +double
> +__fmin (double x, double y)
>  {
> -  TYPE result;
> -  asm ( INSN "\t%" REGS "0, %" REGS "1, %" REGS "2"
> -        : "=w" (result) : "w" (x), "w" (y) );
> -  return result;
> +  return __builtin_fmin (x, y);
>  }
>  
> -weak_alias (__CONCATX(__,FUNC), FUNC)
> +weak_alias (__fmin, fmin)
> diff --git a/sysdeps/aarch64/fpu/s_fminf.c b/sysdeps/aarch64/fpu/s_fminf.c
> index 032262d..7d3a3a3 100644
> --- a/sysdeps/aarch64/fpu/s_fminf.c
> +++ b/sysdeps/aarch64/fpu/s_fminf.c
> @@ -16,7 +16,12 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define FUNC fminf
> -#define TYPE float
> -#define REGS "s"
> -#include <fpu/s_fmin.c>
> +#include <math.h>
> +
> +float
> +__fminf (float x, float y)
> +{
> +  return __builtin_fminf (x, y);
> +}
> +
> +weak_alias (__fminf, fminf)
> diff --git a/sysdeps/aarch64/fpu/s_frint.c b/sysdeps/aarch64/fpu/s_frint.c
> deleted file mode 100644
> index 48881f5..0000000
> --- a/sysdeps/aarch64/fpu/s_frint.c
> +++ /dev/null
> @@ -1,49 +0,0 @@
> -/* Copyright (C) 1996-2017 Free Software Foundation, Inc.
> -
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <http://www.gnu.org/licenses/>.  */
> -
> -#include <math.h>
> -
> -#ifndef FUNC
> -# error FUNC not defined
> -#endif
> -
> -#ifndef TYPE
> -# define TYPE double
> -# define REGS "d"
> -#else
> -# ifndef REGS
> -#  error REGS not defined
> -# endif
> -#endif
> -
> -#ifndef INSN
> -# error INSN not defined
> -#endif
> -
> -#define __CONCATX(a,b) __CONCAT(a,b)
> -
> -TYPE
> -__CONCATX(__,FUNC) (TYPE x)
> -{
> -  TYPE result;
> -  asm ( INSN "\t%" REGS "0, %" REGS "1" :
> -	"=w" (result) : "w" (x) );
> -  return result;
> -}
> -
> -weak_alias (__CONCATX(__,FUNC), FUNC)
> diff --git a/sysdeps/aarch64/fpu/s_frintf.c b/sysdeps/aarch64/fpu/s_frintf.c
> deleted file mode 100644
> index dae99d7..0000000
> --- a/sysdeps/aarch64/fpu/s_frintf.c
> +++ /dev/null
> @@ -1,24 +0,0 @@
> -/* Copyright (C) 2011-2017 Free Software Foundation, Inc.
> -
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public License as
> -   published by the Free Software Foundation; either version 2.1 of the
> -   License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <http://www.gnu.org/licenses/>.  */
> -
> -#ifndef FUNC
> -# error FUNC not defined
> -#endif
> -#define TYPE float
> -#define REGS "s"
> -#include <s_frint.c>
> diff --git a/sysdeps/aarch64/fpu/s_llrint.c b/sysdeps/aarch64/fpu/s_llrint.c
> index 57821c0..3e41546 100644
> --- a/sysdeps/aarch64/fpu/s_llrint.c
> +++ b/sysdeps/aarch64/fpu/s_llrint.c
> @@ -16,7 +16,19 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define FUNC llrint
> -#define OTYPE long long int
> -#define OREG_SIZE 64
> -#include <s_lrint.c>
> +#include <math.h>
> +#include <math_private.h>
> +
> +long long int
> +__llrint (double x)
> +{
> +  double r = __builtin_rint (x);
> +
> +  // Prevent gcc from calling lrint directly when compiled with -fno-math-errno
> +  // by inserting a barrier
> +
> +  math_opt_barrier (r);

I am not seeing any difference on code generation with or without the
math barrier when using GCC 4.9, GCC7 or GCC mainline with or without
-fno-math-errno (I am seeing just a combination of frintx plus fcvtzs).
Do we really need this barrier here?

Also I think the comment line is too long and I am not sure if it is
usual to use '//' comments (usually we use default '/*' '*/' pairs).

> +  return r;
> +}
> +
> +weak_alias (__llrint, llrint)
> diff --git a/sysdeps/aarch64/fpu/s_llrintf.c b/sysdeps/aarch64/fpu/s_llrintf.c
> index 98ed4f8..baf9027 100644
> --- a/sysdeps/aarch64/fpu/s_llrintf.c
> +++ b/sysdeps/aarch64/fpu/s_llrintf.c
> @@ -16,9 +16,20 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define FUNC llrintf
> -#define ITYPE float
> -#define IREG_SIZE 32
> -#define OTYPE long long int
> -#define OREG_SIZE 64
> -#include <s_lrint.c>
> +#include <math.h>
> +#include <math_private.h>
> +
> +long long int
> +__llrintf (float x)
> +{
> +  float r = __builtin_rintf (x);
> +
> +  // Prevent gcc from calling lrintf directly when compiled with -fno-math-errno
> +  // by inserting a barrier
> +
> +
> +  math_opt_barrier (r);
> +  return r;
> +}
> +
> +weak_alias (__llrintf, llrintf)
> diff --git a/sysdeps/aarch64/fpu/s_llround.c b/sysdeps/aarch64/fpu/s_llround.c
> index ef7aedf..2902946 100644
> --- a/sysdeps/aarch64/fpu/s_llround.c
> +++ b/sysdeps/aarch64/fpu/s_llround.c
> @@ -16,7 +16,12 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define FUNC llround
> -#define OTYPE long long int
> -#define OREG_SIZE 64
> -#include <s_lround.c>
> +#include <math.h>
> +
> +long long int
> +__llround (double x)
> +{
> +  return __builtin_llround (x);
> +}
> +
> +weak_alias (__llround, llround)
> diff --git a/sysdeps/aarch64/fpu/s_llroundf.c b/sysdeps/aarch64/fpu/s_llroundf.c
> index 294f0f4..0ca390b 100644
> --- a/sysdeps/aarch64/fpu/s_llroundf.c
> +++ b/sysdeps/aarch64/fpu/s_llroundf.c
> @@ -16,9 +16,12 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define FUNC llroundf
> -#define ITYPE float
> -#define IREG_SIZE 32
> -#define OTYPE long long int
> -#define OREG_SIZE 64
> -#include <s_lround.c>
> +#include <math.h>
> +
> +long long int
> +__llroundf (float x)
> +{
> +  return __builtin_llroundf (x);
> +}
> +
> +weak_alias (__llroundf, llroundf)
> diff --git a/sysdeps/aarch64/fpu/s_lrint.c b/sysdeps/aarch64/fpu/s_lrint.c
> index 6ef64e2..ba4d7cc 100644
> --- a/sysdeps/aarch64/fpu/s_lrint.c
> +++ b/sysdeps/aarch64/fpu/s_lrint.c
> @@ -19,38 +19,17 @@
>  #include <math.h>
>  #include <get-rounding-mode.h>
>  #include <stdint.h>
> +#include <math_private.h>
>  
> -#ifndef FUNC
> -# define FUNC lrint
> -#endif
> -
> -#ifndef ITYPE
> -# define ITYPE double
>  # define IREG_SIZE 64
> -#else
> -# ifndef IREG_SIZE
> -#  error IREG_SIZE not defined
> -# endif
> -#endif
>  
> -#ifndef OTYPE
> -# define OTYPE long int
>  # ifdef __ILP32__
>  #  define OREG_SIZE 32
>  # else
>  #  define OREG_SIZE 64
>  # endif
> -#else
> -# ifndef OREG_SIZE
> -#  error OREG_SIZE not defined
> -# endif
> -#endif
>  
> -#if IREG_SIZE == 32
> -# define IREGS "s"
> -#else
>  # define IREGS "d"
> -#endif
>  
>  #if OREG_SIZE == 32
>  # define OREGS "w"
> @@ -58,15 +37,14 @@
>  # define OREGS "x"
>  #endif
>  
> -#define __CONCATX(a,b) __CONCAT(a,b)
>  
> -OTYPE
> -__CONCATX(__,FUNC) (ITYPE x)
> +long int
> +__lrint (double x)
>  {
> -  OTYPE result;
> -  ITYPE temp;
>  
>  #if IREG_SIZE == 64 && OREG_SIZE == 32
> +  long int result;
> +
>    if (__builtin_fabs (x) > INT32_MAX)
>      {
>        /* Converting large values to a 32 bit int may cause the frintx/fcvtza
> @@ -96,10 +74,14 @@ __CONCATX(__,FUNC) (ITYPE x)
>        return result;
>      }
>  #endif
> -  asm ( "frintx" "\t%" IREGS "1, %" IREGS "2\n\t"
> -        "fcvtzs" "\t%" OREGS "0, %" IREGS "1"
> -        : "=r" (result), "=w" (temp) : "w" (x) );
> -  return result;
> +
> +  double r =  __builtin_rint (x);
> +
> +  // Prevent gcc from calling lrintf directly when compiled with -fno-math-errno
> +  // by inserting a barrier
> +
> +  math_opt_barrier (r);
> +  return r;
>  }

Same comment as before about math_opt_barrier and the comments seems off (it is
referring to lrintf instead of lrint).

>  
> -weak_alias (__CONCATX(__,FUNC), FUNC)
> +weak_alias (__lrint, lrint)
> diff --git a/sysdeps/aarch64/fpu/s_lrintf.c b/sysdeps/aarch64/fpu/s_lrintf.c
> index 2e73271..8d4d4d0 100644
> --- a/sysdeps/aarch64/fpu/s_lrintf.c
> +++ b/sysdeps/aarch64/fpu/s_lrintf.c
> @@ -16,7 +16,19 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define FUNC lrintf
> -#define ITYPE float
> -#define IREG_SIZE 32
> -#include <s_lrint.c>
> +#include <math.h>
> +#include <math_private.h>
> +
> +long int
> +__lrintf (float x)
> +{
> +  float r = __builtin_rintf (x);
> +
> +  // Prevent gcc from calling lrintf directly when compiled with -fno-math-errno
> +  // by inserting a barrier
> +
> +  math_opt_barrier (r);
> +  return r;
> +}
> +
> +weak_alias (__lrintf, lrintf)
> diff --git a/sysdeps/aarch64/fpu/s_lround.c b/sysdeps/aarch64/fpu/s_lround.c
> index 1f77d82..90c3163 100644
> --- a/sysdeps/aarch64/fpu/s_lround.c
> +++ b/sysdeps/aarch64/fpu/s_lround.c
> @@ -18,53 +18,10 @@
>  
>  #include <math.h>
>  
> -#ifndef FUNC
> -# define FUNC lround
> -#endif
> +long int
> +__lround (double x)
> + {
> +  return __builtin_lround (x);
> + }
>  
> -#ifndef ITYPE
> -# define ITYPE double
> -# define IREG_SIZE 64
> -#else
> -# ifndef IREG_SIZE
> -#  error IREG_SIZE not defined
> -# endif
> -#endif
> -
> -#ifndef OTYPE
> -# define OTYPE long int
> -# ifdef __ILP32__
> -#  define OREG_SIZE 32
> -# else
> -#  define OREG_SIZE 64
> -# endif
> -#else
> -# ifndef OREG_SIZE
> -#  error OREG_SIZE not defined
> -# endif
> -#endif
> -
> -#if IREG_SIZE == 32
> -# define IREGS "s"
> -#else
> -# define IREGS "d"
> -#endif
> -
> -#if OREG_SIZE == 32
> -# define OREGS "w"
> -#else
> -# define OREGS "x"
> -#endif
> -
> -#define __CONCATX(a,b) __CONCAT(a,b)
> -
> -OTYPE
> -__CONCATX(__,FUNC) (ITYPE x)
> -{
> -  OTYPE result;
> -  asm ( "fcvtas" "\t%" OREGS "0, %" IREGS "1"
> -        : "=r" (result) : "w" (x) );
> -  return result;
> -}
> -
> -weak_alias (__CONCATX(__,FUNC), FUNC)
> +weak_alias (__lround, lround)
> diff --git a/sysdeps/aarch64/fpu/s_lroundf.c b/sysdeps/aarch64/fpu/s_lroundf.c
> index b30ddb6..baf0693 100644
> --- a/sysdeps/aarch64/fpu/s_lroundf.c
> +++ b/sysdeps/aarch64/fpu/s_lroundf.c
> @@ -16,7 +16,12 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define FUNC lroundf
> -#define ITYPE float
> -#define IREG_SIZE 32
> -#include <s_lround.c>
> +#include <math.h>
> +
> +long int
> +__lroundf (float x)
> +{
> +  return __builtin_lroundf (x);
> +}
> +
> +weak_alias (__lroundf, lroundf)
> diff --git a/sysdeps/aarch64/fpu/s_nearbyint.c b/sysdeps/aarch64/fpu/s_nearbyint.c
> index 51067f2..6ba5de1 100644
> --- a/sysdeps/aarch64/fpu/s_nearbyint.c
> +++ b/sysdeps/aarch64/fpu/s_nearbyint.c
> @@ -16,6 +16,12 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define	FUNC nearbyint
> -#define INSN "frinti"
> -#include <s_frint.c>
> +#include <math.h>
> +
> +double
> +__nearbyint (double x)
> +{
> +  return __builtin_nearbyint (x);
> +}
> +
> +weak_alias (__nearbyint, nearbyint)
> diff --git a/sysdeps/aarch64/fpu/s_nearbyintf.c b/sysdeps/aarch64/fpu/s_nearbyintf.c
> index 8125646..de69fd9 100644
> --- a/sysdeps/aarch64/fpu/s_nearbyintf.c
> +++ b/sysdeps/aarch64/fpu/s_nearbyintf.c
> @@ -16,6 +16,12 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define FUNC nearbyintf
> -#define INSN "frinti"
> -#include <s_frintf.c>
> +#include <math.h>
> +
> +float
> +__nearbyintf (float x)
> +{
> +  return __builtin_nearbyintf (x);
> +}
> +
> +weak_alias (__nearbyintf, nearbyintf)
> diff --git a/sysdeps/aarch64/fpu/s_rint.c b/sysdeps/aarch64/fpu/s_rint.c
> index 73b4e26..b4ac349 100644
> --- a/sysdeps/aarch64/fpu/s_rint.c
> +++ b/sysdeps/aarch64/fpu/s_rint.c
> @@ -16,6 +16,12 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define FUNC rint
> -#define INSN "frintx"
> -#include <s_frint.c>
> +#include <math.h>
> +
> +double
> +__rint (double x)
> +{
> +  return __builtin_rint (x);
> +}
> +
> +weak_alias (__rint, rint)
> diff --git a/sysdeps/aarch64/fpu/s_rintf.c b/sysdeps/aarch64/fpu/s_rintf.c
> index 3560dc2..d0f70ce 100644
> --- a/sysdeps/aarch64/fpu/s_rintf.c
> +++ b/sysdeps/aarch64/fpu/s_rintf.c
> @@ -16,6 +16,12 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define FUNC rintf
> -#define INSN "frintx"
> -#include <s_frintf.c>
> +#include <math.h>
> +
> +float
> +__rintf (float x)
> +{
> +  return __builtin_rintf (x);
> +}
> +
> +weak_alias (__rintf, rintf)
> diff --git a/sysdeps/aarch64/fpu/s_round.c b/sysdeps/aarch64/fpu/s_round.c
> index 6781748..a34fca1 100644
> --- a/sysdeps/aarch64/fpu/s_round.c
> +++ b/sysdeps/aarch64/fpu/s_round.c
> @@ -16,6 +16,12 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define FUNC round
> -#define INSN "frinta"
> -#include <s_frint.c>
> +#include <math.h>
> +
> +double
> +__round (double x)
> +{
> +  return __builtin_round (x);
> +}
> +
> +weak_alias (__round, round)
> diff --git a/sysdeps/aarch64/fpu/s_roundf.c b/sysdeps/aarch64/fpu/s_roundf.c
> index ef6f672..66c8ee6 100644
> --- a/sysdeps/aarch64/fpu/s_roundf.c
> +++ b/sysdeps/aarch64/fpu/s_roundf.c
> @@ -16,6 +16,12 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define FUNC roundf
> -#define INSN "frinta"
> -#include <s_frintf.c>
> +#include <math.h>
> +
> +float
> +__roundf (float x)
> +{
> +  return __builtin_roundf (x);
> +}
> +
> +weak_alias (__roundf, roundf)
> diff --git a/sysdeps/aarch64/fpu/s_trunc.c b/sysdeps/aarch64/fpu/s_trunc.c
> index 2bf5474..6550dfc 100644
> --- a/sysdeps/aarch64/fpu/s_trunc.c
> +++ b/sysdeps/aarch64/fpu/s_trunc.c
> @@ -16,6 +16,12 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define FUNC trunc
> -#define INSN "frintz"
> -#include <s_frint.c>
> +#include <math.h>
> +
> +double
> +__trunc (double x)
> +{
> +  return __builtin_trunc (x);
> +}
> +
> +weak_alias (__trunc, trunc)
> diff --git a/sysdeps/aarch64/fpu/s_truncf.c b/sysdeps/aarch64/fpu/s_truncf.c
> index 94865a4..b7890a2 100644
> --- a/sysdeps/aarch64/fpu/s_truncf.c
> +++ b/sysdeps/aarch64/fpu/s_truncf.c
> @@ -16,6 +16,12 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define FUNC truncf
> -#define INSN "frintz"
> -#include <s_frintf.c>
> +#include <math.h>
> +
> +float
> +__truncf (float x)
> +{
> +  return __builtin_truncf (x);
> +}
> +
> +weak_alias (__truncf, truncf)
> -- 1.9.1
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] aarch64: Implement math acceleration via builtins
  2017-10-13 22:37 ` Adhemerval Zanella
@ 2017-10-16  9:37   ` Szabolcs Nagy
  2017-10-17 16:43     ` Adhemerval Zanella
  2017-10-17 16:14   ` Szabolcs Nagy
  1 sibling, 1 reply; 10+ messages in thread
From: Szabolcs Nagy @ 2017-10-16  9:37 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha; +Cc: nd

On 13/10/17 23:36, Adhemerval Zanella wrote:
> On 13/10/2017 18:26, Michael Collison wrote:
>> +long long int
>> +__llrint (double x)
>> +{
>> +  double r = __builtin_rint (x);
>> +
>> +  // Prevent gcc from calling lrint directly when compiled with -fno-math-errno
>> +  // by inserting a barrier
>> +
>> +  math_opt_barrier (r);
> 
> I am not seeing any difference on code generation with or without the
> math barrier when using GCC 4.9, GCC7 or GCC mainline with or without
> -fno-math-errno (I am seeing just a combination of frintx plus fcvtzs).
> Do we really need this barrier here?
> 

https://godbolt.org/g/zgvReU

if gcc can do this transformation then we
better put a barrier there.

> Also I think the comment line is too long and I am not sure if it is
> usual to use '//' comments (usually we use default '/*' '*/' pairs).
> 
>> +  return r;
>> +}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] aarch64: Implement math acceleration via builtins
  2017-10-13 22:37 ` Adhemerval Zanella
  2017-10-16  9:37   ` Szabolcs Nagy
@ 2017-10-17 16:14   ` Szabolcs Nagy
  2017-10-17 16:48     ` Adhemerval Zanella
  2017-10-19 19:16     ` Michael Collison
  1 sibling, 2 replies; 10+ messages in thread
From: Szabolcs Nagy @ 2017-10-17 16:14 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha, Michael Collison; +Cc: nd

On 13/10/17 23:36, Adhemerval Zanella wrote:
> On 13/10/2017 18:26, Michael Collison wrote:
>> This patch converts asm statements into builtin and neon intrinsics for AArch64.
>> As an example for the file sysdeps/aarch64/fpu/_s_ceil.c, we convert the function from
>>
>> double
>> __ceil (double x)
>> {
>>   double result;
>>   asm ("frintp\t%d0, %d1" :
>>        "=w" (result) : "w" (x) );
>>   return result;
>> }
>>
>> into
>>
>> double
>> __ceil (double x)
>> {
>>   return __builtin_ceil (x);
>> }
>>
...
> Looks good in general with some comments below.  Is there any advantage
> of the refactor besides avoid inline assembly?

i think this is a code clean up patch (the generated code
should not change other than temp register numbers), it
avoids the current preprocessor hackery and inline asm.

> Since you refactoring it, I would also suggest to use builtin for the 
> aarch64 match_private.h sqrt and get/set fpcr at fpu_control.h.
> 

note that's no longer a code clean up patch:

e.g. gcc does not know the cost of inline asm so it cannot
schedule it optimally, but it has detailed model of the
builtins so using them in headers should be an improvement.

Wilco has some sqrt patches, so sqrt can be left alone
(if those patches don't progress then it can be fixed later)
https://sourceware.org/ml/libc-alpha/2017-09/msg00958.html
https://sourceware.org/ml/libc-alpha/2017-09/msg00959.html
https://sourceware.org/ml/libc-alpha/2017-09/msg00960.html

i think using builtins for fpcr/fpsr is a good idea.

__builtin_aarch64_get_fpcr
__builtin_aarch64_set_fpcr
__builtin_aarch64_get_fpsr
__builtin_aarch64_set_fpsr

please check if earliest supported gcc (4.9) have them,
i think fixing this can be a separate patch.

>> +long long int
>> +__llrint (double x)
>> +{
>> +  double r = __builtin_rint (x);
>> +
>> +  // Prevent gcc from calling lrint directly when compiled with -fno-math-errno
>> +  // by inserting a barrier
>> +
>> +  math_opt_barrier (r);
...
> 
> Also I think the comment line is too long and I am not sure if it is
> usual to use '//' comments (usually we use default '/*' '*/' pairs).
> 

please fix the comment style.

>> +long int
>> +__lrint (double x)
>>  {
>> -  OTYPE result;
>> -  ITYPE temp;
>>  
>>  #if IREG_SIZE == 64 && OREG_SIZE == 32
>> +  long int result;
>> +
>>    if (__builtin_fabs (x) > INT32_MAX)
>>      {
>>        /* Converting large values to a 32 bit int may cause the frintx/fcvtza
>> @@ -96,10 +74,14 @@ __CONCATX(__,FUNC) (ITYPE x)
>>        return result;
>>      }
>>  #endif
>> -  asm ( "frintx" "\t%" IREGS "1, %" IREGS "2\n\t"
>> -        "fcvtzs" "\t%" OREGS "0, %" IREGS "1"
>> -        : "=r" (result), "=w" (temp) : "w" (x) );
>> -  return result;
>> +
>> +  double r =  __builtin_rint (x);
>> +
>> +  // Prevent gcc from calling lrintf directly when compiled with -fno-math-errno
>> +  // by inserting a barrier
>> +
>> +  math_opt_barrier (r);
>> +  return r;
>>  }
> 
> Same comment as before about math_opt_barrier and the comments seems off (it is
> referring to lrintf instead of lrint).
> 

please fix the comment.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] aarch64: Implement math acceleration via builtins
  2017-10-16  9:37   ` Szabolcs Nagy
@ 2017-10-17 16:43     ` Adhemerval Zanella
  0 siblings, 0 replies; 10+ messages in thread
From: Adhemerval Zanella @ 2017-10-17 16:43 UTC (permalink / raw)
  To: Szabolcs Nagy, libc-alpha; +Cc: nd



On 16/10/2017 07:36, Szabolcs Nagy wrote:
> On 13/10/17 23:36, Adhemerval Zanella wrote:
>> On 13/10/2017 18:26, Michael Collison wrote:
>>> +long long int
>>> +__llrint (double x)
>>> +{
>>> +  double r = __builtin_rint (x);
>>> +
>>> +  // Prevent gcc from calling lrint directly when compiled with -fno-math-errno
>>> +  // by inserting a barrier
>>> +
>>> +  math_opt_barrier (r);
>>
>> I am not seeing any difference on code generation with or without the
>> math barrier when using GCC 4.9, GCC7 or GCC mainline with or without
>> -fno-math-errno (I am seeing just a combination of frintx plus fcvtzs).
>> Do we really need this barrier here?
>>
> 
> https://godbolt.org/g/zgvReU
> 
> if gcc can do this transformation then we
> better put a barrier there.

Right, it seems that indeed GCC 5 and GCC 6 does this transformation.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] aarch64: Implement math acceleration via builtins
  2017-10-17 16:14   ` Szabolcs Nagy
@ 2017-10-17 16:48     ` Adhemerval Zanella
  2017-10-19 19:16     ` Michael Collison
  1 sibling, 0 replies; 10+ messages in thread
From: Adhemerval Zanella @ 2017-10-17 16:48 UTC (permalink / raw)
  To: Szabolcs Nagy, libc-alpha, Michael Collison; +Cc: nd



On 17/10/2017 14:14, Szabolcs Nagy wrote:
> On 13/10/17 23:36, Adhemerval Zanella wrote:
>> On 13/10/2017 18:26, Michael Collison wrote:
>>> This patch converts asm statements into builtin and neon intrinsics for AArch64.
>>> As an example for the file sysdeps/aarch64/fpu/_s_ceil.c, we convert the function from
>>>
>>> double
>>> __ceil (double x)
>>> {
>>>   double result;
>>>   asm ("frintp\t%d0, %d1" :
>>>        "=w" (result) : "w" (x) );
>>>   return result;
>>> }
>>>
>>> into
>>>
>>> double
>>> __ceil (double x)
>>> {
>>>   return __builtin_ceil (x);
>>> }
>>>
> ...
>> Looks good in general with some comments below.  Is there any advantage
>> of the refactor besides avoid inline assembly?
> 
> i think this is a code clean up patch (the generated code
> should not change other than temp register numbers), it
> avoids the current preprocessor hackery and inline asm.
> 
>> Since you refactoring it, I would also suggest to use builtin for the 
>> aarch64 match_private.h sqrt and get/set fpcr at fpu_control.h.
>>
> 
> note that's no longer a code clean up patch:
> 
> e.g. gcc does not know the cost of inline asm so it cannot
> schedule it optimally, but it has detailed model of the
> builtins so using them in headers should be an improvement.

Yes, I am aware of it.  My question was more out of curiosity if 
the code cleanup brings any non obvious performance improvement 
over libm.

> 
> Wilco has some sqrt patches, so sqrt can be left alone
> (if those patches don't progress then it can be fixed later)
> https://sourceware.org/ml/libc-alpha/2017-09/msg00958.html
> https://sourceware.org/ml/libc-alpha/2017-09/msg00959.html
> https://sourceware.org/ml/libc-alpha/2017-09/msg00960.html
> 
> i think using builtins for fpcr/fpsr is a good idea.

Ack.

> 
> __builtin_aarch64_get_fpcr
> __builtin_aarch64_set_fpcr
> __builtin_aarch64_get_fpsr
> __builtin_aarch64_set_fpsr
> 
> please check if earliest supported gcc (4.9) have them,
> i think fixing this can be a separate patch.

Alright, I will prepare a patch then.

> 
>>> +long long int
>>> +__llrint (double x)
>>> +{
>>> +  double r = __builtin_rint (x);
>>> +
>>> +  // Prevent gcc from calling lrint directly when compiled with -fno-math-errno
>>> +  // by inserting a barrier
>>> +
>>> +  math_opt_barrier (r);
> ...
>>
>> Also I think the comment line is too long and I am not sure if it is
>> usual to use '//' comments (usually we use default '/*' '*/' pairs).
>>
> 
> please fix the comment style.
> 
>>> +long int
>>> +__lrint (double x)
>>>  {
>>> -  OTYPE result;
>>> -  ITYPE temp;
>>>  
>>>  #if IREG_SIZE == 64 && OREG_SIZE == 32
>>> +  long int result;
>>> +
>>>    if (__builtin_fabs (x) > INT32_MAX)
>>>      {
>>>        /* Converting large values to a 32 bit int may cause the frintx/fcvtza
>>> @@ -96,10 +74,14 @@ __CONCATX(__,FUNC) (ITYPE x)
>>>        return result;
>>>      }
>>>  #endif
>>> -  asm ( "frintx" "\t%" IREGS "1, %" IREGS "2\n\t"
>>> -        "fcvtzs" "\t%" OREGS "0, %" IREGS "1"
>>> -        : "=r" (result), "=w" (temp) : "w" (x) );
>>> -  return result;
>>> +
>>> +  double r =  __builtin_rint (x);
>>> +
>>> +  // Prevent gcc from calling lrintf directly when compiled with -fno-math-errno
>>> +  // by inserting a barrier
>>> +
>>> +  math_opt_barrier (r);
>>> +  return r;
>>>  }
>>
>> Same comment as before about math_opt_barrier and the comments seems off (it is
>> referring to lrintf instead of lrint).
>>
> 
> please fix the comment.
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH] aarch64: Implement math acceleration via builtins
  2017-10-17 16:14   ` Szabolcs Nagy
  2017-10-17 16:48     ` Adhemerval Zanella
@ 2017-10-19 19:16     ` Michael Collison
  2017-10-20  9:36       ` Szabolcs Nagy
  1 sibling, 1 reply; 10+ messages in thread
From: Michael Collison @ 2017-10-19 19:16 UTC (permalink / raw)
  To: Szabolcs Nagy, Adhemerval Zanella, libc-alpha; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 6377 bytes --]

Patch updated to reflect recommended changes.

Okay for trunk?

2017-10-19  Michael Collison  <michael.collison@arm.com>

	* sysdeps/aarch64/fpu/e_sqrt.c (ieee754_sqrt): Replace
	asm statements with __builtin_sqrt.
	* sysdeps/aarch64/fpu/e_sqrtf.c (ieee754_sqrtf): Replace
	asm statements with __builtin_sqrtf.
	* sysdeps/aarch64/fpu/s_ceil.c (__ceil): Replace
	asm statements with __builtin_ceil.
	* sysdeps/aarch64/fpu/s_ceilf.c (__ceilf): Replace
	asm statements with __builtin_ceilf.
	* sysdeps/aarch64/fpu/s_floor.c (__floor): Replace
	asm statements with __builtin_floor.
	* sysdeps/aarch64/fpu/s_floorf.c (__floorf): Replace
	asm statements with __builtin_floorf.
	* sysdeps/aarch64/fpu/s_fma.c (__fma): Replace
	asm statements with __builtin_fma.
	* sysdeps/aarch64/fpu/s_fmaf.c (__fmaf): Replace
	asm statements with __builtin_fmaf.
	* sysdeps/aarch64/fpu/s_fmax.c (__fmax): Replace
	asm statements with __builtin_fmax.
	* sysdeps/aarch64/fpu/s_fmaxf.c (__fmaxf): Replace
	asm statements with __builtin_fmaxf.
	* sysdeps/aarch64/fpu/s_fmin.c (__fmin): Replace
	asm statements with __builtin_fmin.
	* sysdeps/aarch64/fpu/s_fminf.c (__fminf): Replace
	asm statements with __builtin_fminf.
	* sysdeps/aarch64/fpu/s_frint.c: Delete file.
	* sysdeps/aarch64/fpu/s_frintf.c: Delete file.
	* sysdeps/aarch64/fpu/s_llrint.c (__llrint): Replace
	asm statements with builtin and neon intrinsic.
	* sysdeps/aarch64/fpu/s_llrintf.c (__llrintf): Ditto.
	* sysdeps/aarch64/fpu/s_llround.c (__llround): Replace
	asm statements with neon intrinsic.
	* sysdeps/aarch64/fpu/s_llroundf.c (__llroundf): Ditto.
	* sysdeps/aarch64/fpu/s_lrint.c (__lrint): Replace
	asm statements with builtin and neon intrinsic.
	* sysdeps/aarch64/fpu/s_lrintf.c (__lrintf): Ditto.
	* sysdeps/aarch64/fpu/s_lround.c (__lround): Replace
	asm statements with neon intrinsic vcvtad_s64_f64.
	* sysdeps/aarch64/fpu/s_lroundf.c (__lroundf): Replace
	asm statements with neon intrinsic vcvtas_s32_f32.
	* sysdeps/aarch64/fpu/s_nearbyint.c (__nearbyint): Replace
	asm statements with __builtin_nearbyint.
	* sysdeps/aarch64/fpu/s_nearbyintf.c (__nearbyintf): Replace
	asm statements with __builtin_nearbyintf.
	* sysdeps/aarch64/fpu/s_rint.c (__rint): Replace
	asm statements with __builtin_rint.
	* sysdeps/aarch64/fpu/s_rintf.c (__rintf): Replace
	asm statements with __builtin_rintf.
	* sysdeps/aarch64/fpu/s_round.c (__round): Replace
	asm statements with __builtin_round.
	* sysdeps/aarch64/fpu/s_roundf.c (__roundf): Replace
	asm statements with __builtin_roundf.
	* sysdeps/aarch64/fpu/s_trunc.c (__trunc): Replace
	asm statements with __builtin_trunc.
	* sysdeps/aarch64/fpu/s_truncf.c (__truncf): Replace
	asm statements with __builtin_truncf.
	* sysdeps/aarch64/fpu/Makefile: Build e_sqrt[f].c with
	-fno-math-errno.

-----Original Message-----
From: Szabolcs Nagy 
Sent: Tuesday, October 17, 2017 9:15 AM
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>; libc-alpha@sourceware.org; Michael Collison <Michael.Collison@arm.com>
Cc: nd <nd@arm.com>
Subject: Re: [PATCH] aarch64: Implement math acceleration via builtins

On 13/10/17 23:36, Adhemerval Zanella wrote:
> On 13/10/2017 18:26, Michael Collison wrote:
>> This patch converts asm statements into builtin and neon intrinsics for AArch64.
>> As an example for the file sysdeps/aarch64/fpu/_s_ceil.c, we convert 
>> the function from
>>
>> double
>> __ceil (double x)
>> {
>>   double result;
>>   asm ("frintp\t%d0, %d1" :
>>        "=w" (result) : "w" (x) );
>>   return result;
>> }
>>
>> into
>>
>> double
>> __ceil (double x)
>> {
>>   return __builtin_ceil (x);
>> }
>>
...
> Looks good in general with some comments below.  Is there any 
> advantage of the refactor besides avoid inline assembly?

i think this is a code clean up patch (the generated code should not change other than temp register numbers), it avoids the current preprocessor hackery and inline asm.

> Since you refactoring it, I would also suggest to use builtin for the
> aarch64 match_private.h sqrt and get/set fpcr at fpu_control.h.
> 

note that's no longer a code clean up patch:

e.g. gcc does not know the cost of inline asm so it cannot
schedule it optimally, but it has detailed model of the
builtins so using them in headers should be an improvement.

Wilco has some sqrt patches, so sqrt can be left alone
(if those patches don't progress then it can be fixed later)
https://sourceware.org/ml/libc-alpha/2017-09/msg00958.html
https://sourceware.org/ml/libc-alpha/2017-09/msg00959.html
https://sourceware.org/ml/libc-alpha/2017-09/msg00960.html

i think using builtins for fpcr/fpsr is a good idea.

__builtin_aarch64_get_fpcr
__builtin_aarch64_set_fpcr
__builtin_aarch64_get_fpsr
__builtin_aarch64_set_fpsr

please check if earliest supported gcc (4.9) have them,
i think fixing this can be a separate patch.

>> +long long int
>> +__llrint (double x)
>> +{
>> +  double r = __builtin_rint (x);
>> +
>> +  // Prevent gcc from calling lrint directly when compiled with -fno-math-errno
>> +  // by inserting a barrier
>> +
>> +  math_opt_barrier (r);
...
> 
> Also I think the comment line is too long and I am not sure if it is
> usual to use '//' comments (usually we use default '/*' '*/' pairs).
> 

please fix the comment style.

>> +long int
>> +__lrint (double x)
>>  {
>> -  OTYPE result;
>> -  ITYPE temp;
>>  
>>  #if IREG_SIZE == 64 && OREG_SIZE == 32
>> +  long int result;
>> +
>>    if (__builtin_fabs (x) > INT32_MAX)
>>      {
>>        /* Converting large values to a 32 bit int may cause the frintx/fcvtza
>> @@ -96,10 +74,14 @@ __CONCATX(__,FUNC) (ITYPE x)
>>        return result;
>>      }
>>  #endif
>> -  asm ( "frintx" "\t%" IREGS "1, %" IREGS "2\n\t"
>> -        "fcvtzs" "\t%" OREGS "0, %" IREGS "1"
>> -        : "=r" (result), "=w" (temp) : "w" (x) );
>> -  return result;
>> +
>> +  double r =  __builtin_rint (x);
>> +
>> +  // Prevent gcc from calling lrintf directly when compiled with -fno-math-errno
>> +  // by inserting a barrier
>> +
>> +  math_opt_barrier (r);
>> +  return r;
>>  }
> 
> Same comment as before about math_opt_barrier and the comments seems off (it is
> referring to lrintf instead of lrint).
> 

please fix the comment.


[-- Attachment #2: gnutools-1925-rebase-v13.patch --]
[-- Type: application/octet-stream, Size: 19600 bytes --]

diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile
new file mode 100644
index 0000000..4bd4e69
--- /dev/null
+++ b/sysdeps/aarch64/fpu/Makefile
@@ -0,0 +1,4 @@
+ifeq ($(subdir),math)
+CFLAGS-e_sqrtf.c += -fno-math-errno
+CFLAGS-e_sqrt.c += -fno-math-errno
+endif
diff --git a/sysdeps/aarch64/fpu/e_sqrt.c b/sysdeps/aarch64/fpu/e_sqrt.c
index f984d87..b80ac27 100644
--- a/sysdeps/aarch64/fpu/e_sqrt.c
+++ b/sysdeps/aarch64/fpu/e_sqrt.c
@@ -21,8 +21,6 @@
 double
 __ieee754_sqrt (double d)
 {
-  double res;
-  asm ("fsqrt   %d0, %d1" : "=w" (res) : "w" (d));
-  return res;
+  return __builtin_sqrt (d);
 }
 strong_alias (__ieee754_sqrt, __sqrt_finite)
diff --git a/sysdeps/aarch64/fpu/e_sqrtf.c b/sysdeps/aarch64/fpu/e_sqrtf.c
index 67707ef..7380454 100644
--- a/sysdeps/aarch64/fpu/e_sqrtf.c
+++ b/sysdeps/aarch64/fpu/e_sqrtf.c
@@ -21,8 +21,6 @@
 float
 __ieee754_sqrtf (float s)
 {
-  float res;
-  asm ("fsqrt   %s0, %s1" : "=w" (res) : "w" (s));
-  return res;
+  return __builtin_sqrtf (s);
 }
 strong_alias (__ieee754_sqrtf, __sqrtf_finite)
diff --git a/sysdeps/aarch64/fpu/s_ceil.c b/sysdeps/aarch64/fpu/s_ceil.c
index d0a8bd8..bc90ab9 100644
--- a/sysdeps/aarch64/fpu/s_ceil.c
+++ b/sysdeps/aarch64/fpu/s_ceil.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define	FUNC ceil
-#define INSN "frintp"
-#include <s_frint.c>
+#include <math.h>
+
+double
+__ceil (double x)
+{
+  return __builtin_ceil (x);
+}
+
+weak_alias (__ceil, ceil)
diff --git a/sysdeps/aarch64/fpu/s_ceilf.c b/sysdeps/aarch64/fpu/s_ceilf.c
index b9c2e7c..d5c4383 100644
--- a/sysdeps/aarch64/fpu/s_ceilf.c
+++ b/sysdeps/aarch64/fpu/s_ceilf.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define	FUNC ceilf
-#define INSN "frintp"
-#include <s_frintf.c>
+#include <math.h>
+
+float
+__ceilf (float x)
+{
+  return __builtin_ceilf (x);
+}
+
+weak_alias (__ceilf, ceilf)
diff --git a/sysdeps/aarch64/fpu/s_floor.c b/sysdeps/aarch64/fpu/s_floor.c
index f7f8731..049535c 100644
--- a/sysdeps/aarch64/fpu/s_floor.c
+++ b/sysdeps/aarch64/fpu/s_floor.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC floor
-#define INSN "frintm"
-#include <s_frint.c>
+#include <math.h>
+
+double
+__floor (double x)
+{
+  return __builtin_floor (x);
+}
+
+weak_alias (__floor, floor)
diff --git a/sysdeps/aarch64/fpu/s_floorf.c b/sysdeps/aarch64/fpu/s_floorf.c
index 7be63b5..fa6fa17 100644
--- a/sysdeps/aarch64/fpu/s_floorf.c
+++ b/sysdeps/aarch64/fpu/s_floorf.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC floorf
-#define INSN "frintm"
-#include <s_frintf.c>
+#include <math.h>
+
+float
+__floorf (float x)
+{
+  return __builtin_floorf (x);
+}
+
+weak_alias (__floorf, floorf)
diff --git a/sysdeps/aarch64/fpu/s_fma.c b/sysdeps/aarch64/fpu/s_fma.c
index 6f62ce2..e496ec6 100644
--- a/sysdeps/aarch64/fpu/s_fma.c
+++ b/sysdeps/aarch64/fpu/s_fma.c
@@ -18,28 +18,10 @@
 
 #include <math.h>
 
-#ifndef FUNC
-# define FUNC fma
-#endif
-
-#ifndef TYPE
-# define TYPE double
-# define REGS "d"
-#else
-# ifndef REGS
-#  error REGS not defined
-# endif
-#endif
-
-#define __CONCATX(a,b) __CONCAT(a,b)
-
-TYPE
-__CONCATX(__,FUNC) (TYPE x, TYPE y, TYPE z)
+double
+__fma (double x, double y, double z)
 {
-  TYPE result;
-  asm ( "fmadd" "\t%" REGS "0, %" REGS "1, %" REGS "2, %" REGS "3"
-        : "=w" (result) : "w" (x), "w" (y), "w" (z) );
-  return result;
+  return __builtin_fma (x, y, z);
 }
 
-weak_alias (__CONCATX(__,FUNC), FUNC)
+weak_alias (__fma, fma)
diff --git a/sysdeps/aarch64/fpu/s_fmaf.c b/sysdeps/aarch64/fpu/s_fmaf.c
index 880a22d..ff1abbf 100644
--- a/sysdeps/aarch64/fpu/s_fmaf.c
+++ b/sysdeps/aarch64/fpu/s_fmaf.c
@@ -16,7 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC fmaf
-#define TYPE float
-#define REGS "s"
-#include <s_fma.c>
+#include <math.h>
+
+float
+__fmaf (float x, float y, float z)
+{
+  return __builtin_fmaf (x, y, z);
+}
+
+weak_alias (__fmaf, fmaf)
diff --git a/sysdeps/aarch64/fpu/s_fmax.c b/sysdeps/aarch64/fpu/s_fmax.c
index 395a9ba..d7a82f8 100644
--- a/sysdeps/aarch64/fpu/s_fmax.c
+++ b/sysdeps/aarch64/fpu/s_fmax.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC fmax
-#define INSN "fmaxnm"
-#include <fpu/s_fmin.c>
+#include <math.h>
+
+double
+__fmax (double x, double y)
+{
+  return __builtin_fmax (x, y);
+}
+
+weak_alias (__fmax, fmax)
diff --git a/sysdeps/aarch64/fpu/s_fmaxf.c b/sysdeps/aarch64/fpu/s_fmaxf.c
index f450d9f..ec4dcdd 100644
--- a/sysdeps/aarch64/fpu/s_fmaxf.c
+++ b/sysdeps/aarch64/fpu/s_fmaxf.c
@@ -16,8 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC fmaxf
-#define INSN "fmaxnm"
-#define TYPE float
-#define REGS "s"
-#include <fpu/s_fmin.c>
+#include <math.h>
+
+float
+__fmaxf (float x, float y)
+{
+  return __builtin_fmaxf (x, y);
+}
+
+weak_alias (__fmaxf, fmaxf)
diff --git a/sysdeps/aarch64/fpu/s_fmin.c b/sysdeps/aarch64/fpu/s_fmin.c
index b6d32d5..bba894e 100644
--- a/sysdeps/aarch64/fpu/s_fmin.c
+++ b/sysdeps/aarch64/fpu/s_fmin.c
@@ -18,32 +18,10 @@
 
 #include <math.h>
 
-#ifndef FUNC
-# define FUNC fmin
-#endif
-
-#ifndef INSN
-# define INSN "fminnm"
-#endif
-
-#ifndef TYPE
-# define TYPE double
-# define REGS "d"
-#else
-# ifndef REGS
-#  error REGS not defined
-# endif
-#endif
-
-#define __CONCATX(a,b) __CONCAT(a,b)
-
-TYPE
-__CONCATX(__,FUNC) (TYPE x, TYPE y)
+double
+__fmin (double x, double y)
 {
-  TYPE result;
-  asm ( INSN "\t%" REGS "0, %" REGS "1, %" REGS "2"
-        : "=w" (result) : "w" (x), "w" (y) );
-  return result;
+  return __builtin_fmin (x, y);
 }
 
-weak_alias (__CONCATX(__,FUNC), FUNC)
+weak_alias (__fmin, fmin)
diff --git a/sysdeps/aarch64/fpu/s_fminf.c b/sysdeps/aarch64/fpu/s_fminf.c
index 032262d..7d3a3a3 100644
--- a/sysdeps/aarch64/fpu/s_fminf.c
+++ b/sysdeps/aarch64/fpu/s_fminf.c
@@ -16,7 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC fminf
-#define TYPE float
-#define REGS "s"
-#include <fpu/s_fmin.c>
+#include <math.h>
+
+float
+__fminf (float x, float y)
+{
+  return __builtin_fminf (x, y);
+}
+
+weak_alias (__fminf, fminf)
diff --git a/sysdeps/aarch64/fpu/s_frint.c b/sysdeps/aarch64/fpu/s_frint.c
deleted file mode 100644
index 48881f5..0000000
--- a/sysdeps/aarch64/fpu/s_frint.c
+++ /dev/null
@@ -1,49 +0,0 @@
-/* Copyright (C) 1996-2017 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-
-#ifndef FUNC
-# error FUNC not defined
-#endif
-
-#ifndef TYPE
-# define TYPE double
-# define REGS "d"
-#else
-# ifndef REGS
-#  error REGS not defined
-# endif
-#endif
-
-#ifndef INSN
-# error INSN not defined
-#endif
-
-#define __CONCATX(a,b) __CONCAT(a,b)
-
-TYPE
-__CONCATX(__,FUNC) (TYPE x)
-{
-  TYPE result;
-  asm ( INSN "\t%" REGS "0, %" REGS "1" :
-	"=w" (result) : "w" (x) );
-  return result;
-}
-
-weak_alias (__CONCATX(__,FUNC), FUNC)
diff --git a/sysdeps/aarch64/fpu/s_frintf.c b/sysdeps/aarch64/fpu/s_frintf.c
deleted file mode 100644
index dae99d7..0000000
--- a/sysdeps/aarch64/fpu/s_frintf.c
+++ /dev/null
@@ -1,24 +0,0 @@
-/* Copyright (C) 2011-2017 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of the
-   License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#ifndef FUNC
-# error FUNC not defined
-#endif
-#define TYPE float
-#define REGS "s"
-#include <s_frint.c>
diff --git a/sysdeps/aarch64/fpu/s_llrint.c b/sysdeps/aarch64/fpu/s_llrint.c
index 57821c0..f0e0b09 100644
--- a/sysdeps/aarch64/fpu/s_llrint.c
+++ b/sysdeps/aarch64/fpu/s_llrint.c
@@ -16,7 +16,19 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC llrint
-#define OTYPE long long int
-#define OREG_SIZE 64
-#include <s_lrint.c>
+#include <math.h>
+#include <math_private.h>
+
+long long int
+__llrint (double x)
+{
+  double r = __builtin_rint (x);
+
+  /* Prevent gcc from calling llrint directly when compiled with
+     -fno-math-errno by inserting a barrier.  */
+
+  math_opt_barrier (r);
+  return r;
+}
+
+weak_alias (__llrint, llrint)
diff --git a/sysdeps/aarch64/fpu/s_llrintf.c b/sysdeps/aarch64/fpu/s_llrintf.c
index 98ed4f8..f115749 100644
--- a/sysdeps/aarch64/fpu/s_llrintf.c
+++ b/sysdeps/aarch64/fpu/s_llrintf.c
@@ -16,9 +16,20 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC llrintf
-#define ITYPE float
-#define IREG_SIZE 32
-#define OTYPE long long int
-#define OREG_SIZE 64
-#include <s_lrint.c>
+#include <math.h>
+#include <math_private.h>
+
+long long int
+__llrintf (float x)
+{
+  float r = __builtin_rintf (x);
+
+  /* Prevent gcc from calling rintf directly when compiled with
+     -fno-math-errno by inserting a barrier.  */
+
+
+  math_opt_barrier (r);
+  return r;
+}
+
+weak_alias (__llrintf, llrintf)
diff --git a/sysdeps/aarch64/fpu/s_llround.c b/sysdeps/aarch64/fpu/s_llround.c
index ef7aedf..2902946 100644
--- a/sysdeps/aarch64/fpu/s_llround.c
+++ b/sysdeps/aarch64/fpu/s_llround.c
@@ -16,7 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC llround
-#define OTYPE long long int
-#define OREG_SIZE 64
-#include <s_lround.c>
+#include <math.h>
+
+long long int
+__llround (double x)
+{
+  return __builtin_llround (x);
+}
+
+weak_alias (__llround, llround)
diff --git a/sysdeps/aarch64/fpu/s_llroundf.c b/sysdeps/aarch64/fpu/s_llroundf.c
index 294f0f4..0ca390b 100644
--- a/sysdeps/aarch64/fpu/s_llroundf.c
+++ b/sysdeps/aarch64/fpu/s_llroundf.c
@@ -16,9 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC llroundf
-#define ITYPE float
-#define IREG_SIZE 32
-#define OTYPE long long int
-#define OREG_SIZE 64
-#include <s_lround.c>
+#include <math.h>
+
+long long int
+__llroundf (float x)
+{
+  return __builtin_llroundf (x);
+}
+
+weak_alias (__llroundf, llroundf)
diff --git a/sysdeps/aarch64/fpu/s_lrint.c b/sysdeps/aarch64/fpu/s_lrint.c
index 6ef64e2..ee645ad 100644
--- a/sysdeps/aarch64/fpu/s_lrint.c
+++ b/sysdeps/aarch64/fpu/s_lrint.c
@@ -19,38 +19,17 @@
 #include <math.h>
 #include <get-rounding-mode.h>
 #include <stdint.h>
+#include <math_private.h>
 
-#ifndef FUNC
-# define FUNC lrint
-#endif
-
-#ifndef ITYPE
-# define ITYPE double
 # define IREG_SIZE 64
-#else
-# ifndef IREG_SIZE
-#  error IREG_SIZE not defined
-# endif
-#endif
 
-#ifndef OTYPE
-# define OTYPE long int
 # ifdef __ILP32__
 #  define OREG_SIZE 32
 # else
 #  define OREG_SIZE 64
 # endif
-#else
-# ifndef OREG_SIZE
-#  error OREG_SIZE not defined
-# endif
-#endif
 
-#if IREG_SIZE == 32
-# define IREGS "s"
-#else
 # define IREGS "d"
-#endif
 
 #if OREG_SIZE == 32
 # define OREGS "w"
@@ -58,15 +37,14 @@
 # define OREGS "x"
 #endif
 
-#define __CONCATX(a,b) __CONCAT(a,b)
 
-OTYPE
-__CONCATX(__,FUNC) (ITYPE x)
+long int
+__lrint (double x)
 {
-  OTYPE result;
-  ITYPE temp;
 
 #if IREG_SIZE == 64 && OREG_SIZE == 32
+  long int result;
+
   if (__builtin_fabs (x) > INT32_MAX)
     {
       /* Converting large values to a 32 bit int may cause the frintx/fcvtza
@@ -96,10 +74,14 @@ __CONCATX(__,FUNC) (ITYPE x)
       return result;
     }
 #endif
-  asm ( "frintx" "\t%" IREGS "1, %" IREGS "2\n\t"
-        "fcvtzs" "\t%" OREGS "0, %" IREGS "1"
-        : "=r" (result), "=w" (temp) : "w" (x) );
-  return result;
+
+  double r =  __builtin_rint (x);
+
+  /* Prevent gcc from calling lrint directly when compiled with
+     -fno-math-errno by inserting a barrier.  */
+
+  math_opt_barrier (r);
+  return r;
 }
 
-weak_alias (__CONCATX(__,FUNC), FUNC)
+weak_alias (__lrint, lrint)
diff --git a/sysdeps/aarch64/fpu/s_lrintf.c b/sysdeps/aarch64/fpu/s_lrintf.c
index 2e73271..0d28344 100644
--- a/sysdeps/aarch64/fpu/s_lrintf.c
+++ b/sysdeps/aarch64/fpu/s_lrintf.c
@@ -16,7 +16,19 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC lrintf
-#define ITYPE float
-#define IREG_SIZE 32
-#include <s_lrint.c>
+#include <math.h>
+#include <math_private.h>
+
+long int
+__lrintf (float x)
+{
+  float r = __builtin_rintf (x);
+
+  /* Prevent gcc from calling rintf directly when compiled with
+     -fno-math-errno by inserting a barrier.  */
+
+  math_opt_barrier (r);
+  return r;
+}
+
+weak_alias (__lrintf, lrintf)
diff --git a/sysdeps/aarch64/fpu/s_lround.c b/sysdeps/aarch64/fpu/s_lround.c
index 1f77d82..90c3163 100644
--- a/sysdeps/aarch64/fpu/s_lround.c
+++ b/sysdeps/aarch64/fpu/s_lround.c
@@ -18,53 +18,10 @@
 
 #include <math.h>
 
-#ifndef FUNC
-# define FUNC lround
-#endif
+long int
+__lround (double x)
+ {
+  return __builtin_lround (x);
+ }
 
-#ifndef ITYPE
-# define ITYPE double
-# define IREG_SIZE 64
-#else
-# ifndef IREG_SIZE
-#  error IREG_SIZE not defined
-# endif
-#endif
-
-#ifndef OTYPE
-# define OTYPE long int
-# ifdef __ILP32__
-#  define OREG_SIZE 32
-# else
-#  define OREG_SIZE 64
-# endif
-#else
-# ifndef OREG_SIZE
-#  error OREG_SIZE not defined
-# endif
-#endif
-
-#if IREG_SIZE == 32
-# define IREGS "s"
-#else
-# define IREGS "d"
-#endif
-
-#if OREG_SIZE == 32
-# define OREGS "w"
-#else
-# define OREGS "x"
-#endif
-
-#define __CONCATX(a,b) __CONCAT(a,b)
-
-OTYPE
-__CONCATX(__,FUNC) (ITYPE x)
-{
-  OTYPE result;
-  asm ( "fcvtas" "\t%" OREGS "0, %" IREGS "1"
-        : "=r" (result) : "w" (x) );
-  return result;
-}
-
-weak_alias (__CONCATX(__,FUNC), FUNC)
+weak_alias (__lround, lround)
diff --git a/sysdeps/aarch64/fpu/s_lroundf.c b/sysdeps/aarch64/fpu/s_lroundf.c
index b30ddb6..baf0693 100644
--- a/sysdeps/aarch64/fpu/s_lroundf.c
+++ b/sysdeps/aarch64/fpu/s_lroundf.c
@@ -16,7 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC lroundf
-#define ITYPE float
-#define IREG_SIZE 32
-#include <s_lround.c>
+#include <math.h>
+
+long int
+__lroundf (float x)
+{
+  return __builtin_lroundf (x);
+}
+
+weak_alias (__lroundf, lroundf)
diff --git a/sysdeps/aarch64/fpu/s_nearbyint.c b/sysdeps/aarch64/fpu/s_nearbyint.c
index 51067f2..6ba5de1 100644
--- a/sysdeps/aarch64/fpu/s_nearbyint.c
+++ b/sysdeps/aarch64/fpu/s_nearbyint.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define	FUNC nearbyint
-#define INSN "frinti"
-#include <s_frint.c>
+#include <math.h>
+
+double
+__nearbyint (double x)
+{
+  return __builtin_nearbyint (x);
+}
+
+weak_alias (__nearbyint, nearbyint)
diff --git a/sysdeps/aarch64/fpu/s_nearbyintf.c b/sysdeps/aarch64/fpu/s_nearbyintf.c
index 8125646..de69fd9 100644
--- a/sysdeps/aarch64/fpu/s_nearbyintf.c
+++ b/sysdeps/aarch64/fpu/s_nearbyintf.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC nearbyintf
-#define INSN "frinti"
-#include <s_frintf.c>
+#include <math.h>
+
+float
+__nearbyintf (float x)
+{
+  return __builtin_nearbyintf (x);
+}
+
+weak_alias (__nearbyintf, nearbyintf)
diff --git a/sysdeps/aarch64/fpu/s_rint.c b/sysdeps/aarch64/fpu/s_rint.c
index 73b4e26..b4ac349 100644
--- a/sysdeps/aarch64/fpu/s_rint.c
+++ b/sysdeps/aarch64/fpu/s_rint.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC rint
-#define INSN "frintx"
-#include <s_frint.c>
+#include <math.h>
+
+double
+__rint (double x)
+{
+  return __builtin_rint (x);
+}
+
+weak_alias (__rint, rint)
diff --git a/sysdeps/aarch64/fpu/s_rintf.c b/sysdeps/aarch64/fpu/s_rintf.c
index 3560dc2..d0f70ce 100644
--- a/sysdeps/aarch64/fpu/s_rintf.c
+++ b/sysdeps/aarch64/fpu/s_rintf.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC rintf
-#define INSN "frintx"
-#include <s_frintf.c>
+#include <math.h>
+
+float
+__rintf (float x)
+{
+  return __builtin_rintf (x);
+}
+
+weak_alias (__rintf, rintf)
diff --git a/sysdeps/aarch64/fpu/s_round.c b/sysdeps/aarch64/fpu/s_round.c
index 6781748..a34fca1 100644
--- a/sysdeps/aarch64/fpu/s_round.c
+++ b/sysdeps/aarch64/fpu/s_round.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC round
-#define INSN "frinta"
-#include <s_frint.c>
+#include <math.h>
+
+double
+__round (double x)
+{
+  return __builtin_round (x);
+}
+
+weak_alias (__round, round)
diff --git a/sysdeps/aarch64/fpu/s_roundf.c b/sysdeps/aarch64/fpu/s_roundf.c
index ef6f672..66c8ee6 100644
--- a/sysdeps/aarch64/fpu/s_roundf.c
+++ b/sysdeps/aarch64/fpu/s_roundf.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC roundf
-#define INSN "frinta"
-#include <s_frintf.c>
+#include <math.h>
+
+float
+__roundf (float x)
+{
+  return __builtin_roundf (x);
+}
+
+weak_alias (__roundf, roundf)
diff --git a/sysdeps/aarch64/fpu/s_trunc.c b/sysdeps/aarch64/fpu/s_trunc.c
index 2bf5474..6550dfc 100644
--- a/sysdeps/aarch64/fpu/s_trunc.c
+++ b/sysdeps/aarch64/fpu/s_trunc.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC trunc
-#define INSN "frintz"
-#include <s_frint.c>
+#include <math.h>
+
+double
+__trunc (double x)
+{
+  return __builtin_trunc (x);
+}
+
+weak_alias (__trunc, trunc)
diff --git a/sysdeps/aarch64/fpu/s_truncf.c b/sysdeps/aarch64/fpu/s_truncf.c
index 94865a4..b7890a2 100644
--- a/sysdeps/aarch64/fpu/s_truncf.c
+++ b/sysdeps/aarch64/fpu/s_truncf.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC truncf
-#define INSN "frintz"
-#include <s_frintf.c>
+#include <math.h>
+
+float
+__truncf (float x)
+{
+  return __builtin_truncf (x);
+}
+
+weak_alias (__truncf, truncf)
-- 
1.9.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] aarch64: Implement math acceleration via builtins
  2017-10-19 19:16     ` Michael Collison
@ 2017-10-20  9:36       ` Szabolcs Nagy
  2017-10-20 20:16         ` Michael Collison
  0 siblings, 1 reply; 10+ messages in thread
From: Szabolcs Nagy @ 2017-10-20  9:36 UTC (permalink / raw)
  To: Michael Collison, Adhemerval Zanella, libc-alpha; +Cc: nd

On 19/10/17 20:16, Michael Collison wrote:
> Patch updated to reflect recommended changes.
> 
> Okay for trunk?
> 
> 2017-10-19  Michael Collison  <michael.collison@arm.com>
> 
> 	* sysdeps/aarch64/fpu/e_sqrt.c (ieee754_sqrt): Replace
> 	asm statements with __builtin_sqrt.
> 	* sysdeps/aarch64/fpu/e_sqrtf.c (ieee754_sqrtf): Replace
> 	asm statements with __builtin_sqrtf.
> 	* sysdeps/aarch64/fpu/s_ceil.c (__ceil): Replace
> 	asm statements with __builtin_ceil.
> 	* sysdeps/aarch64/fpu/s_ceilf.c (__ceilf): Replace
> 	asm statements with __builtin_ceilf.
> 	* sysdeps/aarch64/fpu/s_floor.c (__floor): Replace
> 	asm statements with __builtin_floor.
> 	* sysdeps/aarch64/fpu/s_floorf.c (__floorf): Replace
> 	asm statements with __builtin_floorf.
> 	* sysdeps/aarch64/fpu/s_fma.c (__fma): Replace
> 	asm statements with __builtin_fma.
> 	* sysdeps/aarch64/fpu/s_fmaf.c (__fmaf): Replace
> 	asm statements with __builtin_fmaf.
> 	* sysdeps/aarch64/fpu/s_fmax.c (__fmax): Replace
> 	asm statements with __builtin_fmax.
> 	* sysdeps/aarch64/fpu/s_fmaxf.c (__fmaxf): Replace
> 	asm statements with __builtin_fmaxf.
> 	* sysdeps/aarch64/fpu/s_fmin.c (__fmin): Replace
> 	asm statements with __builtin_fmin.
> 	* sysdeps/aarch64/fpu/s_fminf.c (__fminf): Replace
> 	asm statements with __builtin_fminf.
> 	* sysdeps/aarch64/fpu/s_frint.c: Delete file.
> 	* sysdeps/aarch64/fpu/s_frintf.c: Delete file.
> 	* sysdeps/aarch64/fpu/s_llrint.c (__llrint): Replace
> 	asm statements with builtin and neon intrinsic.

neon intrinsic is not used, it's builtin and conversion to int.
(same for the rest of the changelog)

> 	* sysdeps/aarch64/fpu/s_llrintf.c (__llrintf): Ditto.

s/Ditto/Likewise/
(same for the rest of the changelog)

> 	* sysdeps/aarch64/fpu/s_llround.c (__llround): Replace
> 	asm statements with neon intrinsic.
> 	* sysdeps/aarch64/fpu/s_llroundf.c (__llroundf): Ditto.
> 	* sysdeps/aarch64/fpu/s_lrint.c (__lrint): Replace
> 	asm statements with builtin and neon intrinsic.
> 	* sysdeps/aarch64/fpu/s_lrintf.c (__lrintf): Ditto.
> 	* sysdeps/aarch64/fpu/s_lround.c (__lround): Replace
> 	asm statements with neon intrinsic vcvtad_s64_f64.
> 	* sysdeps/aarch64/fpu/s_lroundf.c (__lroundf): Replace
> 	asm statements with neon intrinsic vcvtas_s32_f32.
> 	* sysdeps/aarch64/fpu/s_nearbyint.c (__nearbyint): Replace
> 	asm statements with __builtin_nearbyint.
> 	* sysdeps/aarch64/fpu/s_nearbyintf.c (__nearbyintf): Replace
> 	asm statements with __builtin_nearbyintf.
> 	* sysdeps/aarch64/fpu/s_rint.c (__rint): Replace
> 	asm statements with __builtin_rint.
> 	* sysdeps/aarch64/fpu/s_rintf.c (__rintf): Replace
> 	asm statements with __builtin_rintf.
> 	* sysdeps/aarch64/fpu/s_round.c (__round): Replace
> 	asm statements with __builtin_round.
> 	* sysdeps/aarch64/fpu/s_roundf.c (__roundf): Replace
> 	asm statements with __builtin_roundf.
> 	* sysdeps/aarch64/fpu/s_trunc.c (__trunc): Replace
> 	asm statements with __builtin_trunc.
> 	* sysdeps/aarch64/fpu/s_truncf.c (__truncf): Replace
> 	asm statements with __builtin_truncf.
> 	* sysdeps/aarch64/fpu/Makefile: Build e_sqrt[f].c with
> 	-fno-math-errno.
> 

OK to commit with the ChangeLog fixes and comment fixes listed
below. Please remove neon intrinsics from the commit message
(text of your original mail) before commit (and let me know
if you don't have commit rights so i'll do it for you).

> +long long int
> +__llrintf (float x)
> +{
> +  float r = __builtin_rintf (x);
> +
> +  /* Prevent gcc from calling rintf directly when compiled with
> +     -fno-math-errno by inserting a barrier.  */

use llrintf in the comment.

> +long int
> +__lrintf (float x)
> +{
> +  float r = __builtin_rintf (x);
> +
> +  /* Prevent gcc from calling rintf directly when compiled with
> +     -fno-math-errno by inserting a barrier.  */

use lrintf in the comment.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH] aarch64: Implement math acceleration via builtins
  2017-10-20  9:36       ` Szabolcs Nagy
@ 2017-10-20 20:16         ` Michael Collison
  2017-10-23 14:43           ` Szabolcs Nagy
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Collison @ 2017-10-20 20:16 UTC (permalink / raw)
  To: Szabolcs Nagy, Adhemerval Zanella, libc-alpha; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 7157 bytes --]

Patch updated with upstream comments. ChangeLog updated as well.

Tested on aarch64-linux-gnu with gcc-5.4 and gcc-6. Okay for trunk?

2017-10-20  Michael Collison  <michael.collison@arm.com>

	* sysdeps/aarch64/fpu/e_sqrt.c (ieee754_sqrt): Replace
	asm statements with __builtin_sqrt.
	* sysdeps/aarch64/fpu/e_sqrtf.c (ieee754_sqrtf): Replace
	asm statements with __builtin_sqrtf.
	* sysdeps/aarch64/fpu/s_ceil.c (__ceil): Replace
	asm statements with __builtin_ceil.
	* sysdeps/aarch64/fpu/s_ceilf.c (__ceilf): Replace
	asm statements with __builtin_ceilf.
	* sysdeps/aarch64/fpu/s_floor.c (__floor): Replace
	asm statements with __builtin_floor.
	* sysdeps/aarch64/fpu/s_floorf.c (__floorf): Replace
	asm statements with __builtin_floorf.
	* sysdeps/aarch64/fpu/s_fma.c (__fma): Replace
	asm statements with __builtin_fma.
	* sysdeps/aarch64/fpu/s_fmaf.c (__fmaf): Replace
	asm statements with __builtin_fmaf.
	* sysdeps/aarch64/fpu/s_fmax.c (__fmax): Replace
	asm statements with __builtin_fmax.
	* sysdeps/aarch64/fpu/s_fmaxf.c (__fmaxf): Replace
	asm statements with __builtin_fmaxf.
	* sysdeps/aarch64/fpu/s_fmin.c (__fmin): Replace
	asm statements with __builtin_fmin.
	* sysdeps/aarch64/fpu/s_fminf.c (__fminf): Replace
	asm statements with __builtin_fminf.
	* sysdeps/aarch64/fpu/s_frint.c: Delete file.
	* sysdeps/aarch64/fpu/s_frintf.c: Delete file.
	* sysdeps/aarch64/fpu/s_llrint.c (__llrint): Replace
	asm statements with builtin_rint and conversion to int.
	* sysdeps/aarch64/fpu/s_llrintf.c (__llrintf): Ditto.
	* sysdeps/aarch64/fpu/s_llround.c (__llround): Replace
	asm statements with builtin_llround.
	* sysdeps/aarch64/fpu/s_llroundf.c (__llroundf): Ditto.
	* sysdeps/aarch64/fpu/s_lrint.c (__lrint): Replace
	asm statements with builtin_rint and conversion to long
	int.
	* sysdeps/aarch64/fpu/s_lrintf.c (__lrintf): Ditto.
	* sysdeps/aarch64/fpu/s_lround.c (__lround): Replace
	asm statements with builtin_lround.
	* sysdeps/aarch64/fpu/s_lroundf.c (__lroundf): Replace
	asm statements with builtin_lroundf.
	* sysdeps/aarch64/fpu/s_nearbyint.c (__nearbyint): Replace
	asm statements with __builtin_nearbyint.
	* sysdeps/aarch64/fpu/s_nearbyintf.c (__nearbyintf): Replace
	asm statements with __builtin_nearbyintf.
	* sysdeps/aarch64/fpu/s_rint.c (__rint): Replace
	asm statements with __builtin_rint.
	* sysdeps/aarch64/fpu/s_rintf.c (__rintf): Replace
	asm statements with __builtin_rintf.
	* sysdeps/aarch64/fpu/s_round.c (__round): Replace
	asm statements with __builtin_round.
	* sysdeps/aarch64/fpu/s_roundf.c (__roundf): Replace
	asm statements with __builtin_roundf.
	* sysdeps/aarch64/fpu/s_trunc.c (__trunc): Replace
	asm statements with __builtin_trunc.
	* sysdeps/aarch64/fpu/s_truncf.c (__truncf): Replace
	asm statements with __builtin_truncf.
	* sysdeps/aarch64/fpu/Makefile: Build e_sqrt[f].c with
	-fno-math-errno.

-----Original Message-----
From: Szabolcs Nagy 
Sent: Friday, October 20, 2017 2:36 AM
To: Michael Collison <Michael.Collison@arm.com>; Adhemerval Zanella <adhemerval.zanella@linaro.org>; libc-alpha@sourceware.org
Cc: nd <nd@arm.com>
Subject: Re: [PATCH] aarch64: Implement math acceleration via builtins

On 19/10/17 20:16, Michael Collison wrote:
> Patch updated to reflect recommended changes.
> 
> Okay for trunk?
> 
> 2017-10-19  Michael Collison  <michael.collison@arm.com>
> 
> 	* sysdeps/aarch64/fpu/e_sqrt.c (ieee754_sqrt): Replace
> 	asm statements with __builtin_sqrt.
> 	* sysdeps/aarch64/fpu/e_sqrtf.c (ieee754_sqrtf): Replace
> 	asm statements with __builtin_sqrtf.
> 	* sysdeps/aarch64/fpu/s_ceil.c (__ceil): Replace
> 	asm statements with __builtin_ceil.
> 	* sysdeps/aarch64/fpu/s_ceilf.c (__ceilf): Replace
> 	asm statements with __builtin_ceilf.
> 	* sysdeps/aarch64/fpu/s_floor.c (__floor): Replace
> 	asm statements with __builtin_floor.
> 	* sysdeps/aarch64/fpu/s_floorf.c (__floorf): Replace
> 	asm statements with __builtin_floorf.
> 	* sysdeps/aarch64/fpu/s_fma.c (__fma): Replace
> 	asm statements with __builtin_fma.
> 	* sysdeps/aarch64/fpu/s_fmaf.c (__fmaf): Replace
> 	asm statements with __builtin_fmaf.
> 	* sysdeps/aarch64/fpu/s_fmax.c (__fmax): Replace
> 	asm statements with __builtin_fmax.
> 	* sysdeps/aarch64/fpu/s_fmaxf.c (__fmaxf): Replace
> 	asm statements with __builtin_fmaxf.
> 	* sysdeps/aarch64/fpu/s_fmin.c (__fmin): Replace
> 	asm statements with __builtin_fmin.
> 	* sysdeps/aarch64/fpu/s_fminf.c (__fminf): Replace
> 	asm statements with __builtin_fminf.
> 	* sysdeps/aarch64/fpu/s_frint.c: Delete file.
> 	* sysdeps/aarch64/fpu/s_frintf.c: Delete file.
> 	* sysdeps/aarch64/fpu/s_llrint.c (__llrint): Replace
> 	asm statements with builtin and neon intrinsic.

neon intrinsic is not used, it's builtin and conversion to int.
(same for the rest of the changelog)

> 	* sysdeps/aarch64/fpu/s_llrintf.c (__llrintf): Ditto.

s/Ditto/Likewise/
(same for the rest of the changelog)

> 	* sysdeps/aarch64/fpu/s_llround.c (__llround): Replace
> 	asm statements with neon intrinsic.
> 	* sysdeps/aarch64/fpu/s_llroundf.c (__llroundf): Ditto.
> 	* sysdeps/aarch64/fpu/s_lrint.c (__lrint): Replace
> 	asm statements with builtin and neon intrinsic.
> 	* sysdeps/aarch64/fpu/s_lrintf.c (__lrintf): Ditto.
> 	* sysdeps/aarch64/fpu/s_lround.c (__lround): Replace
> 	asm statements with neon intrinsic vcvtad_s64_f64.
> 	* sysdeps/aarch64/fpu/s_lroundf.c (__lroundf): Replace
> 	asm statements with neon intrinsic vcvtas_s32_f32.
> 	* sysdeps/aarch64/fpu/s_nearbyint.c (__nearbyint): Replace
> 	asm statements with __builtin_nearbyint.
> 	* sysdeps/aarch64/fpu/s_nearbyintf.c (__nearbyintf): Replace
> 	asm statements with __builtin_nearbyintf.
> 	* sysdeps/aarch64/fpu/s_rint.c (__rint): Replace
> 	asm statements with __builtin_rint.
> 	* sysdeps/aarch64/fpu/s_rintf.c (__rintf): Replace
> 	asm statements with __builtin_rintf.
> 	* sysdeps/aarch64/fpu/s_round.c (__round): Replace
> 	asm statements with __builtin_round.
> 	* sysdeps/aarch64/fpu/s_roundf.c (__roundf): Replace
> 	asm statements with __builtin_roundf.
> 	* sysdeps/aarch64/fpu/s_trunc.c (__trunc): Replace
> 	asm statements with __builtin_trunc.
> 	* sysdeps/aarch64/fpu/s_truncf.c (__truncf): Replace
> 	asm statements with __builtin_truncf.
> 	* sysdeps/aarch64/fpu/Makefile: Build e_sqrt[f].c with
> 	-fno-math-errno.
> 

OK to commit with the ChangeLog fixes and comment fixes listed below. Please remove neon intrinsics from the commit message (text of your original mail) before commit (and let me know if you don't have commit rights so i'll do it for you).

> +long long int
> +__llrintf (float x)
> +{
> +  float r = __builtin_rintf (x);
> +
> +  /* Prevent gcc from calling rintf directly when compiled with
> +     -fno-math-errno by inserting a barrier.  */

use llrintf in the comment.

> +long int
> +__lrintf (float x)
> +{
> +  float r = __builtin_rintf (x);
> +
> +  /* Prevent gcc from calling rintf directly when compiled with
> +     -fno-math-errno by inserting a barrier.  */

use lrintf in the comment.


[-- Attachment #2: gnutools-1925-rebase-v14.patch --]
[-- Type: application/octet-stream, Size: 19603 bytes --]

diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile
new file mode 100644
index 0000000..4bd4e69
--- /dev/null
+++ b/sysdeps/aarch64/fpu/Makefile
@@ -0,0 +1,4 @@
+ifeq ($(subdir),math)
+CFLAGS-e_sqrtf.c += -fno-math-errno
+CFLAGS-e_sqrt.c += -fno-math-errno
+endif
diff --git a/sysdeps/aarch64/fpu/e_sqrt.c b/sysdeps/aarch64/fpu/e_sqrt.c
index f984d87..b80ac27 100644
--- a/sysdeps/aarch64/fpu/e_sqrt.c
+++ b/sysdeps/aarch64/fpu/e_sqrt.c
@@ -21,8 +21,6 @@
 double
 __ieee754_sqrt (double d)
 {
-  double res;
-  asm ("fsqrt   %d0, %d1" : "=w" (res) : "w" (d));
-  return res;
+  return __builtin_sqrt (d);
 }
 strong_alias (__ieee754_sqrt, __sqrt_finite)
diff --git a/sysdeps/aarch64/fpu/e_sqrtf.c b/sysdeps/aarch64/fpu/e_sqrtf.c
index 67707ef..7380454 100644
--- a/sysdeps/aarch64/fpu/e_sqrtf.c
+++ b/sysdeps/aarch64/fpu/e_sqrtf.c
@@ -21,8 +21,6 @@
 float
 __ieee754_sqrtf (float s)
 {
-  float res;
-  asm ("fsqrt   %s0, %s1" : "=w" (res) : "w" (s));
-  return res;
+  return __builtin_sqrtf (s);
 }
 strong_alias (__ieee754_sqrtf, __sqrtf_finite)
diff --git a/sysdeps/aarch64/fpu/s_ceil.c b/sysdeps/aarch64/fpu/s_ceil.c
index d0a8bd8..bc90ab9 100644
--- a/sysdeps/aarch64/fpu/s_ceil.c
+++ b/sysdeps/aarch64/fpu/s_ceil.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define	FUNC ceil
-#define INSN "frintp"
-#include <s_frint.c>
+#include <math.h>
+
+double
+__ceil (double x)
+{
+  return __builtin_ceil (x);
+}
+
+weak_alias (__ceil, ceil)
diff --git a/sysdeps/aarch64/fpu/s_ceilf.c b/sysdeps/aarch64/fpu/s_ceilf.c
index b9c2e7c..d5c4383 100644
--- a/sysdeps/aarch64/fpu/s_ceilf.c
+++ b/sysdeps/aarch64/fpu/s_ceilf.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define	FUNC ceilf
-#define INSN "frintp"
-#include <s_frintf.c>
+#include <math.h>
+
+float
+__ceilf (float x)
+{
+  return __builtin_ceilf (x);
+}
+
+weak_alias (__ceilf, ceilf)
diff --git a/sysdeps/aarch64/fpu/s_floor.c b/sysdeps/aarch64/fpu/s_floor.c
index f7f8731..049535c 100644
--- a/sysdeps/aarch64/fpu/s_floor.c
+++ b/sysdeps/aarch64/fpu/s_floor.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC floor
-#define INSN "frintm"
-#include <s_frint.c>
+#include <math.h>
+
+double
+__floor (double x)
+{
+  return __builtin_floor (x);
+}
+
+weak_alias (__floor, floor)
diff --git a/sysdeps/aarch64/fpu/s_floorf.c b/sysdeps/aarch64/fpu/s_floorf.c
index 7be63b5..fa6fa17 100644
--- a/sysdeps/aarch64/fpu/s_floorf.c
+++ b/sysdeps/aarch64/fpu/s_floorf.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC floorf
-#define INSN "frintm"
-#include <s_frintf.c>
+#include <math.h>
+
+float
+__floorf (float x)
+{
+  return __builtin_floorf (x);
+}
+
+weak_alias (__floorf, floorf)
diff --git a/sysdeps/aarch64/fpu/s_fma.c b/sysdeps/aarch64/fpu/s_fma.c
index 6f62ce2..e496ec6 100644
--- a/sysdeps/aarch64/fpu/s_fma.c
+++ b/sysdeps/aarch64/fpu/s_fma.c
@@ -18,28 +18,10 @@
 
 #include <math.h>
 
-#ifndef FUNC
-# define FUNC fma
-#endif
-
-#ifndef TYPE
-# define TYPE double
-# define REGS "d"
-#else
-# ifndef REGS
-#  error REGS not defined
-# endif
-#endif
-
-#define __CONCATX(a,b) __CONCAT(a,b)
-
-TYPE
-__CONCATX(__,FUNC) (TYPE x, TYPE y, TYPE z)
+double
+__fma (double x, double y, double z)
 {
-  TYPE result;
-  asm ( "fmadd" "\t%" REGS "0, %" REGS "1, %" REGS "2, %" REGS "3"
-        : "=w" (result) : "w" (x), "w" (y), "w" (z) );
-  return result;
+  return __builtin_fma (x, y, z);
 }
 
-weak_alias (__CONCATX(__,FUNC), FUNC)
+weak_alias (__fma, fma)
diff --git a/sysdeps/aarch64/fpu/s_fmaf.c b/sysdeps/aarch64/fpu/s_fmaf.c
index 880a22d..ff1abbf 100644
--- a/sysdeps/aarch64/fpu/s_fmaf.c
+++ b/sysdeps/aarch64/fpu/s_fmaf.c
@@ -16,7 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC fmaf
-#define TYPE float
-#define REGS "s"
-#include <s_fma.c>
+#include <math.h>
+
+float
+__fmaf (float x, float y, float z)
+{
+  return __builtin_fmaf (x, y, z);
+}
+
+weak_alias (__fmaf, fmaf)
diff --git a/sysdeps/aarch64/fpu/s_fmax.c b/sysdeps/aarch64/fpu/s_fmax.c
index 395a9ba..d7a82f8 100644
--- a/sysdeps/aarch64/fpu/s_fmax.c
+++ b/sysdeps/aarch64/fpu/s_fmax.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC fmax
-#define INSN "fmaxnm"
-#include <fpu/s_fmin.c>
+#include <math.h>
+
+double
+__fmax (double x, double y)
+{
+  return __builtin_fmax (x, y);
+}
+
+weak_alias (__fmax, fmax)
diff --git a/sysdeps/aarch64/fpu/s_fmaxf.c b/sysdeps/aarch64/fpu/s_fmaxf.c
index f450d9f..ec4dcdd 100644
--- a/sysdeps/aarch64/fpu/s_fmaxf.c
+++ b/sysdeps/aarch64/fpu/s_fmaxf.c
@@ -16,8 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC fmaxf
-#define INSN "fmaxnm"
-#define TYPE float
-#define REGS "s"
-#include <fpu/s_fmin.c>
+#include <math.h>
+
+float
+__fmaxf (float x, float y)
+{
+  return __builtin_fmaxf (x, y);
+}
+
+weak_alias (__fmaxf, fmaxf)
diff --git a/sysdeps/aarch64/fpu/s_fmin.c b/sysdeps/aarch64/fpu/s_fmin.c
index b6d32d5..bba894e 100644
--- a/sysdeps/aarch64/fpu/s_fmin.c
+++ b/sysdeps/aarch64/fpu/s_fmin.c
@@ -18,32 +18,10 @@
 
 #include <math.h>
 
-#ifndef FUNC
-# define FUNC fmin
-#endif
-
-#ifndef INSN
-# define INSN "fminnm"
-#endif
-
-#ifndef TYPE
-# define TYPE double
-# define REGS "d"
-#else
-# ifndef REGS
-#  error REGS not defined
-# endif
-#endif
-
-#define __CONCATX(a,b) __CONCAT(a,b)
-
-TYPE
-__CONCATX(__,FUNC) (TYPE x, TYPE y)
+double
+__fmin (double x, double y)
 {
-  TYPE result;
-  asm ( INSN "\t%" REGS "0, %" REGS "1, %" REGS "2"
-        : "=w" (result) : "w" (x), "w" (y) );
-  return result;
+  return __builtin_fmin (x, y);
 }
 
-weak_alias (__CONCATX(__,FUNC), FUNC)
+weak_alias (__fmin, fmin)
diff --git a/sysdeps/aarch64/fpu/s_fminf.c b/sysdeps/aarch64/fpu/s_fminf.c
index 032262d..7d3a3a3 100644
--- a/sysdeps/aarch64/fpu/s_fminf.c
+++ b/sysdeps/aarch64/fpu/s_fminf.c
@@ -16,7 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC fminf
-#define TYPE float
-#define REGS "s"
-#include <fpu/s_fmin.c>
+#include <math.h>
+
+float
+__fminf (float x, float y)
+{
+  return __builtin_fminf (x, y);
+}
+
+weak_alias (__fminf, fminf)
diff --git a/sysdeps/aarch64/fpu/s_frint.c b/sysdeps/aarch64/fpu/s_frint.c
deleted file mode 100644
index 48881f5..0000000
--- a/sysdeps/aarch64/fpu/s_frint.c
+++ /dev/null
@@ -1,49 +0,0 @@
-/* Copyright (C) 1996-2017 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <math.h>
-
-#ifndef FUNC
-# error FUNC not defined
-#endif
-
-#ifndef TYPE
-# define TYPE double
-# define REGS "d"
-#else
-# ifndef REGS
-#  error REGS not defined
-# endif
-#endif
-
-#ifndef INSN
-# error INSN not defined
-#endif
-
-#define __CONCATX(a,b) __CONCAT(a,b)
-
-TYPE
-__CONCATX(__,FUNC) (TYPE x)
-{
-  TYPE result;
-  asm ( INSN "\t%" REGS "0, %" REGS "1" :
-	"=w" (result) : "w" (x) );
-  return result;
-}
-
-weak_alias (__CONCATX(__,FUNC), FUNC)
diff --git a/sysdeps/aarch64/fpu/s_frintf.c b/sysdeps/aarch64/fpu/s_frintf.c
deleted file mode 100644
index dae99d7..0000000
--- a/sysdeps/aarch64/fpu/s_frintf.c
+++ /dev/null
@@ -1,24 +0,0 @@
-/* Copyright (C) 2011-2017 Free Software Foundation, Inc.
-
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public License as
-   published by the Free Software Foundation; either version 2.1 of the
-   License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#ifndef FUNC
-# error FUNC not defined
-#endif
-#define TYPE float
-#define REGS "s"
-#include <s_frint.c>
diff --git a/sysdeps/aarch64/fpu/s_llrint.c b/sysdeps/aarch64/fpu/s_llrint.c
index 57821c0..f0e0b09 100644
--- a/sysdeps/aarch64/fpu/s_llrint.c
+++ b/sysdeps/aarch64/fpu/s_llrint.c
@@ -16,7 +16,19 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC llrint
-#define OTYPE long long int
-#define OREG_SIZE 64
-#include <s_lrint.c>
+#include <math.h>
+#include <math_private.h>
+
+long long int
+__llrint (double x)
+{
+  double r = __builtin_rint (x);
+
+  /* Prevent gcc from calling llrint directly when compiled with
+     -fno-math-errno by inserting a barrier.  */
+
+  math_opt_barrier (r);
+  return r;
+}
+
+weak_alias (__llrint, llrint)
diff --git a/sysdeps/aarch64/fpu/s_llrintf.c b/sysdeps/aarch64/fpu/s_llrintf.c
index 98ed4f8..dac73f2 100644
--- a/sysdeps/aarch64/fpu/s_llrintf.c
+++ b/sysdeps/aarch64/fpu/s_llrintf.c
@@ -16,9 +16,20 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC llrintf
-#define ITYPE float
-#define IREG_SIZE 32
-#define OTYPE long long int
-#define OREG_SIZE 64
-#include <s_lrint.c>
+#include <math.h>
+#include <math_private.h>
+
+long long int
+__llrintf (float x)
+{
+  float r = __builtin_rintf (x);
+
+  /* Prevent gcc from calling llrintf directly when compiled with
+     -fno-math-errno by inserting a barrier.  */
+
+
+  math_opt_barrier (r);
+  return r;
+}
+
+weak_alias (__llrintf, llrintf)
diff --git a/sysdeps/aarch64/fpu/s_llround.c b/sysdeps/aarch64/fpu/s_llround.c
index ef7aedf..2902946 100644
--- a/sysdeps/aarch64/fpu/s_llround.c
+++ b/sysdeps/aarch64/fpu/s_llround.c
@@ -16,7 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC llround
-#define OTYPE long long int
-#define OREG_SIZE 64
-#include <s_lround.c>
+#include <math.h>
+
+long long int
+__llround (double x)
+{
+  return __builtin_llround (x);
+}
+
+weak_alias (__llround, llround)
diff --git a/sysdeps/aarch64/fpu/s_llroundf.c b/sysdeps/aarch64/fpu/s_llroundf.c
index 294f0f4..0ca390b 100644
--- a/sysdeps/aarch64/fpu/s_llroundf.c
+++ b/sysdeps/aarch64/fpu/s_llroundf.c
@@ -16,9 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC llroundf
-#define ITYPE float
-#define IREG_SIZE 32
-#define OTYPE long long int
-#define OREG_SIZE 64
-#include <s_lround.c>
+#include <math.h>
+
+long long int
+__llroundf (float x)
+{
+  return __builtin_llroundf (x);
+}
+
+weak_alias (__llroundf, llroundf)
diff --git a/sysdeps/aarch64/fpu/s_lrint.c b/sysdeps/aarch64/fpu/s_lrint.c
index 6ef64e2..ee645ad 100644
--- a/sysdeps/aarch64/fpu/s_lrint.c
+++ b/sysdeps/aarch64/fpu/s_lrint.c
@@ -19,38 +19,17 @@
 #include <math.h>
 #include <get-rounding-mode.h>
 #include <stdint.h>
+#include <math_private.h>
 
-#ifndef FUNC
-# define FUNC lrint
-#endif
-
-#ifndef ITYPE
-# define ITYPE double
 # define IREG_SIZE 64
-#else
-# ifndef IREG_SIZE
-#  error IREG_SIZE not defined
-# endif
-#endif
 
-#ifndef OTYPE
-# define OTYPE long int
 # ifdef __ILP32__
 #  define OREG_SIZE 32
 # else
 #  define OREG_SIZE 64
 # endif
-#else
-# ifndef OREG_SIZE
-#  error OREG_SIZE not defined
-# endif
-#endif
 
-#if IREG_SIZE == 32
-# define IREGS "s"
-#else
 # define IREGS "d"
-#endif
 
 #if OREG_SIZE == 32
 # define OREGS "w"
@@ -58,15 +37,14 @@
 # define OREGS "x"
 #endif
 
-#define __CONCATX(a,b) __CONCAT(a,b)
 
-OTYPE
-__CONCATX(__,FUNC) (ITYPE x)
+long int
+__lrint (double x)
 {
-  OTYPE result;
-  ITYPE temp;
 
 #if IREG_SIZE == 64 && OREG_SIZE == 32
+  long int result;
+
   if (__builtin_fabs (x) > INT32_MAX)
     {
       /* Converting large values to a 32 bit int may cause the frintx/fcvtza
@@ -96,10 +74,14 @@ __CONCATX(__,FUNC) (ITYPE x)
       return result;
     }
 #endif
-  asm ( "frintx" "\t%" IREGS "1, %" IREGS "2\n\t"
-        "fcvtzs" "\t%" OREGS "0, %" IREGS "1"
-        : "=r" (result), "=w" (temp) : "w" (x) );
-  return result;
+
+  double r =  __builtin_rint (x);
+
+  /* Prevent gcc from calling lrint directly when compiled with
+     -fno-math-errno by inserting a barrier.  */
+
+  math_opt_barrier (r);
+  return r;
 }
 
-weak_alias (__CONCATX(__,FUNC), FUNC)
+weak_alias (__lrint, lrint)
diff --git a/sysdeps/aarch64/fpu/s_lrintf.c b/sysdeps/aarch64/fpu/s_lrintf.c
index 2e73271..5b6a426 100644
--- a/sysdeps/aarch64/fpu/s_lrintf.c
+++ b/sysdeps/aarch64/fpu/s_lrintf.c
@@ -16,7 +16,19 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC lrintf
-#define ITYPE float
-#define IREG_SIZE 32
-#include <s_lrint.c>
+#include <math.h>
+#include <math_private.h>
+
+long int
+__lrintf (float x)
+{
+  float r = __builtin_rintf (x);
+
+  /* Prevent gcc from calling lrintf directly when compiled with
+     -fno-math-errno by inserting a barrier.  */
+
+  math_opt_barrier (r);
+  return r;
+}
+
+weak_alias (__lrintf, lrintf)
diff --git a/sysdeps/aarch64/fpu/s_lround.c b/sysdeps/aarch64/fpu/s_lround.c
index 1f77d82..90c3163 100644
--- a/sysdeps/aarch64/fpu/s_lround.c
+++ b/sysdeps/aarch64/fpu/s_lround.c
@@ -18,53 +18,10 @@
 
 #include <math.h>
 
-#ifndef FUNC
-# define FUNC lround
-#endif
+long int
+__lround (double x)
+ {
+  return __builtin_lround (x);
+ }
 
-#ifndef ITYPE
-# define ITYPE double
-# define IREG_SIZE 64
-#else
-# ifndef IREG_SIZE
-#  error IREG_SIZE not defined
-# endif
-#endif
-
-#ifndef OTYPE
-# define OTYPE long int
-# ifdef __ILP32__
-#  define OREG_SIZE 32
-# else
-#  define OREG_SIZE 64
-# endif
-#else
-# ifndef OREG_SIZE
-#  error OREG_SIZE not defined
-# endif
-#endif
-
-#if IREG_SIZE == 32
-# define IREGS "s"
-#else
-# define IREGS "d"
-#endif
-
-#if OREG_SIZE == 32
-# define OREGS "w"
-#else
-# define OREGS "x"
-#endif
-
-#define __CONCATX(a,b) __CONCAT(a,b)
-
-OTYPE
-__CONCATX(__,FUNC) (ITYPE x)
-{
-  OTYPE result;
-  asm ( "fcvtas" "\t%" OREGS "0, %" IREGS "1"
-        : "=r" (result) : "w" (x) );
-  return result;
-}
-
-weak_alias (__CONCATX(__,FUNC), FUNC)
+weak_alias (__lround, lround)
diff --git a/sysdeps/aarch64/fpu/s_lroundf.c b/sysdeps/aarch64/fpu/s_lroundf.c
index b30ddb6..baf0693 100644
--- a/sysdeps/aarch64/fpu/s_lroundf.c
+++ b/sysdeps/aarch64/fpu/s_lroundf.c
@@ -16,7 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC lroundf
-#define ITYPE float
-#define IREG_SIZE 32
-#include <s_lround.c>
+#include <math.h>
+
+long int
+__lroundf (float x)
+{
+  return __builtin_lroundf (x);
+}
+
+weak_alias (__lroundf, lroundf)
diff --git a/sysdeps/aarch64/fpu/s_nearbyint.c b/sysdeps/aarch64/fpu/s_nearbyint.c
index 51067f2..6ba5de1 100644
--- a/sysdeps/aarch64/fpu/s_nearbyint.c
+++ b/sysdeps/aarch64/fpu/s_nearbyint.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define	FUNC nearbyint
-#define INSN "frinti"
-#include <s_frint.c>
+#include <math.h>
+
+double
+__nearbyint (double x)
+{
+  return __builtin_nearbyint (x);
+}
+
+weak_alias (__nearbyint, nearbyint)
diff --git a/sysdeps/aarch64/fpu/s_nearbyintf.c b/sysdeps/aarch64/fpu/s_nearbyintf.c
index 8125646..de69fd9 100644
--- a/sysdeps/aarch64/fpu/s_nearbyintf.c
+++ b/sysdeps/aarch64/fpu/s_nearbyintf.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC nearbyintf
-#define INSN "frinti"
-#include <s_frintf.c>
+#include <math.h>
+
+float
+__nearbyintf (float x)
+{
+  return __builtin_nearbyintf (x);
+}
+
+weak_alias (__nearbyintf, nearbyintf)
diff --git a/sysdeps/aarch64/fpu/s_rint.c b/sysdeps/aarch64/fpu/s_rint.c
index 73b4e26..b4ac349 100644
--- a/sysdeps/aarch64/fpu/s_rint.c
+++ b/sysdeps/aarch64/fpu/s_rint.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC rint
-#define INSN "frintx"
-#include <s_frint.c>
+#include <math.h>
+
+double
+__rint (double x)
+{
+  return __builtin_rint (x);
+}
+
+weak_alias (__rint, rint)
diff --git a/sysdeps/aarch64/fpu/s_rintf.c b/sysdeps/aarch64/fpu/s_rintf.c
index 3560dc2..d0f70ce 100644
--- a/sysdeps/aarch64/fpu/s_rintf.c
+++ b/sysdeps/aarch64/fpu/s_rintf.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC rintf
-#define INSN "frintx"
-#include <s_frintf.c>
+#include <math.h>
+
+float
+__rintf (float x)
+{
+  return __builtin_rintf (x);
+}
+
+weak_alias (__rintf, rintf)
diff --git a/sysdeps/aarch64/fpu/s_round.c b/sysdeps/aarch64/fpu/s_round.c
index 6781748..a34fca1 100644
--- a/sysdeps/aarch64/fpu/s_round.c
+++ b/sysdeps/aarch64/fpu/s_round.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC round
-#define INSN "frinta"
-#include <s_frint.c>
+#include <math.h>
+
+double
+__round (double x)
+{
+  return __builtin_round (x);
+}
+
+weak_alias (__round, round)
diff --git a/sysdeps/aarch64/fpu/s_roundf.c b/sysdeps/aarch64/fpu/s_roundf.c
index ef6f672..66c8ee6 100644
--- a/sysdeps/aarch64/fpu/s_roundf.c
+++ b/sysdeps/aarch64/fpu/s_roundf.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC roundf
-#define INSN "frinta"
-#include <s_frintf.c>
+#include <math.h>
+
+float
+__roundf (float x)
+{
+  return __builtin_roundf (x);
+}
+
+weak_alias (__roundf, roundf)
diff --git a/sysdeps/aarch64/fpu/s_trunc.c b/sysdeps/aarch64/fpu/s_trunc.c
index 2bf5474..6550dfc 100644
--- a/sysdeps/aarch64/fpu/s_trunc.c
+++ b/sysdeps/aarch64/fpu/s_trunc.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC trunc
-#define INSN "frintz"
-#include <s_frint.c>
+#include <math.h>
+
+double
+__trunc (double x)
+{
+  return __builtin_trunc (x);
+}
+
+weak_alias (__trunc, trunc)
diff --git a/sysdeps/aarch64/fpu/s_truncf.c b/sysdeps/aarch64/fpu/s_truncf.c
index 94865a4..b7890a2 100644
--- a/sysdeps/aarch64/fpu/s_truncf.c
+++ b/sysdeps/aarch64/fpu/s_truncf.c
@@ -16,6 +16,12 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define FUNC truncf
-#define INSN "frintz"
-#include <s_frintf.c>
+#include <math.h>
+
+float
+__truncf (float x)
+{
+  return __builtin_truncf (x);
+}
+
+weak_alias (__truncf, truncf)
-- 
1.9.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] aarch64: Implement math acceleration via builtins
  2017-10-20 20:16         ` Michael Collison
@ 2017-10-23 14:43           ` Szabolcs Nagy
  0 siblings, 0 replies; 10+ messages in thread
From: Szabolcs Nagy @ 2017-10-23 14:43 UTC (permalink / raw)
  To: Michael Collison, Adhemerval Zanella, libc-alpha; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 718 bytes --]

On 20/10/17 21:16, Michael Collison wrote:
> Patch updated with upstream comments. ChangeLog updated as well.
> 
> Tested on aarch64-linux-gnu with gcc-5.4 and gcc-6. Okay for trunk?
> 
> 2017-10-20  Michael Collison  <michael.collison@arm.com>
...
>> 	* sysdeps/aarch64/fpu/Makefile: Build e_sqrt[f].c with
>> 	-fno-math-errno.

i managed to miss this new Makefile in the commit.

but then i saw that lround, llround, lroundf, llroundf builtins
fail to inline as single instruction without -fno-math-errno
(at least with the toolchain i tried), so i added those to
the makefile, committed as in the attached patch.

however this feels fragile, we probably have to add the
-fno-math-errno flag to all math functions.


[-- Attachment #2: 0001-aarch64-Add-missing-math-Makefile-for-recent-commit.patch --]
[-- Type: text/x-patch, Size: 1427 bytes --]

From be080b6c143901d998c91f28ef7b2fe4a25c0237 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date: Mon, 23 Oct 2017 15:31:37 +0100
Subject: [PATCH] aarch64: Add missing math Makefile for recent commit

Without -fno-math-errno, the builtins just do a call instead of
inlining a single instruction.
---
 ChangeLog                    | 3 ++-
 sysdeps/aarch64/fpu/Makefile | 8 ++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/aarch64/fpu/Makefile

diff --git a/ChangeLog b/ChangeLog
index 4a011b136b..1316b401ae 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -55,7 +55,8 @@
 	with __builtin_trunc.
 	* sysdeps/aarch64/fpu/s_truncf.c (__truncf): Replace asm statements
 	with __builtin_truncf.
-	* sysdeps/aarch64/fpu/Makefile: Build e_sqrt[f].c with -fno-math-errno.
+	* sysdeps/aarch64/fpu/Makefile: Build e_sqrt[f].c with -fno-math-errno,
+	and s_l[l]round[f].c too.
 
 2017-10-23  Alan Modra  <amodra@gmail.com>
 
diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile
new file mode 100644
index 0000000000..bf38f2c589
--- /dev/null
+++ b/sysdeps/aarch64/fpu/Makefile
@@ -0,0 +1,8 @@
+ifeq ($(subdir),math)
+CFLAGS-e_sqrtf.c += -fno-math-errno
+CFLAGS-e_sqrt.c += -fno-math-errno
+CFLAGS-s_lroundf.c += -fno-math-errno
+CFLAGS-s_lround.c += -fno-math-errno
+CFLAGS-s_llroundf.c += -fno-math-errno
+CFLAGS-s_llround.c += -fno-math-errno
+endif
-- 
2.11.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-10-23 14:43 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-13 21:26 [PATCH] aarch64: Implement math acceleration via builtins Michael Collison
2017-10-13 22:37 ` Adhemerval Zanella
2017-10-16  9:37   ` Szabolcs Nagy
2017-10-17 16:43     ` Adhemerval Zanella
2017-10-17 16:14   ` Szabolcs Nagy
2017-10-17 16:48     ` Adhemerval Zanella
2017-10-19 19:16     ` Michael Collison
2017-10-20  9:36       ` Szabolcs Nagy
2017-10-20 20:16         ` Michael Collison
2017-10-23 14:43           ` Szabolcs Nagy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).