public inbox for gcc-patches@gcc.gnu.org
* [PATCH, powerpc] Add -mmass to use XL's MASS vectorization library
@ 2010-08-18 20:36 Michael Meissner
  2010-08-18 20:41 ` Richard Guenther
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Meissner @ 2010-08-18 20:36 UTC
  To: gcc-patches, dje.gcc

[-- Attachment #1: Type: text/plain, Size: 1224 bytes --]

This patch was cloned from the i386 -mveclibabi=<xxx> support, and it adds a
new switch (-mmass) that says to vectorize various mathematical functions (sin,
cos, etc.) on power7 systems.  This patch greatly speeds up three of the SPEC CPU2006
floating-point benchmarks (tonto, wrf, GemsFDTD) that heavily use the math
functions.  I have bootstrapped and run comparison tests on my power systems,
and there were no regressions.  Is it OK to install in the tree?
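For illustration, here is a minimal standalone C sketch of how the patch derives a MASS routine name from a scalar builtin. It strips the "__builtin_" prefix and then appends "d2" for a two-lane double vector or "4" for a four-lane float vector. The float builtin name already ends in 'f', so powf becomes powf4. In the patch this step lives in rs6000_builtin_vectorized_libmass. The helper mass_vector_name below is hypothetical and not part of GCC. It uses snprintf instead of the patch's strcpy/strcat so the result stays within the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring the naming step of
   rs6000_builtin_vectorized_libmass: drop the "__builtin_" prefix and
   append "d2" (two doubles) or "4" (four floats; the float builtin name
   already ends in 'f').  The caller must pass an element type that
   matches the builtin, as the patch's switch statement guarantees.
   Returns 0 on success, -1 when the shape or name is not handled or
   the result does not fit in OUT.  */
static int
mass_vector_name (char *out, size_t outlen, const char *builtin_name,
                  int is_double, int lanes)
{
  static const char prefix[] = "__builtin_";
  const char *suffix;
  int len;

  assert (out != NULL && builtin_name != NULL);

  if (is_double && lanes == 2)
    suffix = "d2";              /* pow  -> powd2 */
  else if (!is_double && lanes == 4)
    suffix = "4";               /* powf -> powf4 */
  else
    return -1;                  /* no MASS SIMD variant for this shape */

  /* sizeof prefix - 1 is the prefix length without the trailing NUL.  */
  if (strncmp (builtin_name, prefix, sizeof prefix - 1) != 0)
    return -1;

  /* Bounded copy instead of strcpy/strcat into a fixed buffer.  */
  len = snprintf (out, outlen, "%s%s", builtin_name + sizeof prefix - 1,
                  suffix);
  return (len < 0 || (size_t) len >= outlen) ? -1 : 0;
}
```

For example, mass_vector_name (buf, sizeof buf, "__builtin_sin", 1, 2) stores "sind2" in buf. Asking for four double lanes returns -1, much as the patch returns NULL_TREE for shapes it does not handle.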

[gcc]
2010-08-18  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.opt (-mmass): New option to enable the
	compiler to autovectorize mathematical functions for power7 using
	the Mathematical Acceleration Subsystem library.

	* config/rs6000/rs6000.c (rs6000_builtin_vectorized_libmass): New
	function to handle auto vectorizing math functions that are in the
	MASS library.
	(rs6000_builtin_vectorized_function): Call it.

	* doc/invoke.texi (RS/6000 and PowerPC Options): Document -mmass.

[gcc/testsuite]
2010-08-18  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/vsx-mass-1.c: New file, test -mmass.

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meissner@linux.vnet.ibm.com

[-- Attachment #2: gcc-power7.patch150b --]
[-- Type: text/plain, Size: 17387 bytes --]

Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 163347)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -115,6 +115,10 @@ mpopcntd
 Target Report Mask(POPCNTD)
 Use PowerPC V2.06 popcntd instruction
 
+mmass
+Target Report Var(TARGET_MASS) Init(0)
+Use the high-performance math routines from the Mathematical Acceleration Subsystem (MASS) library.
+
 mvsx
 Target Report Mask(VSX)
 Use vector/scalar (VSX) instructions
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 163347)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -989,6 +989,7 @@ static rtx rs6000_emit_stack_reset (rs60
 static rtx rs6000_make_savres_rtx (rs6000_stack_t *, rtx, int,
 				   enum machine_mode, bool, bool, bool);
 static bool rs6000_reg_live_or_pic_offset_p (int);
+static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
 static tree rs6000_builtin_vectorized_function (tree, tree, tree);
 static int rs6000_savres_strategy (rs6000_stack_t *, bool, int, int);
 static void rs6000_restore_saved_cr (rtx, int);
@@ -3602,6 +3603,145 @@ rs6000_parse_fpu_option (const char *opt
   return FPU_NONE;
 }
 
+
+/* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
+   library with vectorized intrinsics.  */
+
+static tree
+rs6000_builtin_vectorized_libmass (tree fndecl, tree type_out, tree type_in)
+{
+  char name[32];
+  const char *suffix = NULL;
+  tree fntype, new_fndecl, bdecl = NULL_TREE;
+  int n_args = 1;
+  const char *bname;
+  enum machine_mode el_mode, in_mode;
+  int n, in_n;
+
+  /* Libmass is suitable for unsafe math only as it does not correctly support
+     parts of IEEE with the required precision such as denormals.  Only support
+     it if we have VSX to use the simd d2 or f4 functions.
+     XXX: Add variable length support.  */
+  if (!flag_unsafe_math_optimizations || !TARGET_VSX)
+    return NULL_TREE;
+
+  el_mode = TYPE_MODE (TREE_TYPE (type_out));
+  n = TYPE_VECTOR_SUBPARTS (type_out);
+  in_mode = TYPE_MODE (TREE_TYPE (type_in));
+  in_n = TYPE_VECTOR_SUBPARTS (type_in);
+  if (el_mode != in_mode
+      || n != in_n)
+    return NULL_TREE;
+
+  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
+    {
+      enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
+      switch (fn)
+	{
+	case BUILT_IN_ATAN2:
+	case BUILT_IN_HYPOT:
+	case BUILT_IN_POW:
+	  n_args = 2;
+	  /* fall through */
+
+	case BUILT_IN_ACOS:
+	case BUILT_IN_ACOSH:
+	case BUILT_IN_ASIN:
+	case BUILT_IN_ASINH:
+	case BUILT_IN_ATAN:
+	case BUILT_IN_ATANH:
+	case BUILT_IN_CBRT:
+	case BUILT_IN_COS:
+	case BUILT_IN_COSH:
+	case BUILT_IN_ERF:
+	case BUILT_IN_ERFC:
+	case BUILT_IN_EXP2:
+	case BUILT_IN_EXP:
+	case BUILT_IN_EXPM1:
+	case BUILT_IN_LGAMMA:
+	case BUILT_IN_LOG10:
+	case BUILT_IN_LOG1P:
+	case BUILT_IN_LOG2:
+	case BUILT_IN_LOG:
+	case BUILT_IN_SIN:
+	case BUILT_IN_SINH:
+	case BUILT_IN_SQRT:
+	case BUILT_IN_TAN:
+	case BUILT_IN_TANH:
+	  bdecl = implicit_built_in_decls[fn];
+	  suffix = "d2";				/* pow -> powd2 */
+	  if (el_mode != DFmode
+	      || n != 2)
+	    return NULL_TREE;
+	  break;
+
+	case BUILT_IN_ATAN2F:
+	case BUILT_IN_HYPOTF:
+	case BUILT_IN_POWF:
+	  n_args = 2;
+	  /* fall through */
+
+	case BUILT_IN_ACOSF:
+	case BUILT_IN_ACOSHF:
+	case BUILT_IN_ASINF:
+	case BUILT_IN_ASINHF:
+	case BUILT_IN_ATANF:
+	case BUILT_IN_ATANHF:
+	case BUILT_IN_CBRTF:
+	case BUILT_IN_COSF:
+	case BUILT_IN_COSHF:
+	case BUILT_IN_ERFF:
+	case BUILT_IN_ERFCF:
+	case BUILT_IN_EXP2F:
+	case BUILT_IN_EXPF:
+	case BUILT_IN_EXPM1F:
+	case BUILT_IN_LGAMMAF:
+	case BUILT_IN_LOG10F:
+	case BUILT_IN_LOG1PF:
+	case BUILT_IN_LOG2F:
+	case BUILT_IN_LOGF:
+	case BUILT_IN_SINF:
+	case BUILT_IN_SINHF:
+	case BUILT_IN_SQRTF:
+	case BUILT_IN_TANF:
+	case BUILT_IN_TANHF:
+	  bdecl = implicit_built_in_decls[fn];
+	  suffix = "4";					/* powf -> powf4 */
+	  if (el_mode != SFmode
+	      || n != 4)
+	    return NULL_TREE;
+	  break;
+
+	default:
+	  return NULL_TREE;
+	}
+    }
+  else
+    return NULL_TREE;
+
+  gcc_assert (suffix != NULL);
+  bname = IDENTIFIER_POINTER (DECL_NAME (bdecl));
+  strcpy (name, bname + sizeof ("__builtin_") - 1);
+  strcat (name, suffix);
+
+  if (n_args == 1)
+    fntype = build_function_type_list (type_out, type_in, NULL);
+  else if (n_args == 2)
+    fntype = build_function_type_list (type_out, type_in, type_in, NULL);
+  else
+    gcc_unreachable ();
+
+  /* Build a function declaration for the vectorized function.  */
+  new_fndecl = build_decl (BUILTINS_LOCATION,
+			   FUNCTION_DECL, get_identifier (name), fntype);
+  TREE_PUBLIC (new_fndecl) = 1;
+  DECL_EXTERNAL (new_fndecl) = 1;
+  DECL_IS_NOVOPS (new_fndecl) = 1;
+  TREE_READONLY (new_fndecl) = 1;
+
+  return new_fndecl;
+}
+
 /* Returns a function decl for a vectorized version of the builtin function
    with builtin function code FN and the result vector type TYPE, or NULL_TREE
    if it is not available.  */
@@ -3768,6 +3908,10 @@ rs6000_builtin_vectorized_function (tree
 	}
     }
 
+  /* Generate calls to libmass if appropriate.  */
+  if (TARGET_MASS)
+    return rs6000_builtin_vectorized_libmass (fndecl, type_out, type_in);
+
   return NULL_TREE;
 }
 
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 163347)
+++ gcc/doc/invoke.texi	(working copy)
@@ -786,7 +786,9 @@ See RS/6000 and PowerPC Options.
 -mprototype  -mno-prototype @gol
 -msim  -mmvme  -mads  -myellowknife  -memb  -msdata @gol
 -msdata=@var{opt}  -mvxworks  -G @var{num}  -pthread @gol
--mrecip -mrecip=@var{opt} -mno-recip -mrecip-precision -mno-recip-precision}
+-mrecip -mrecip=@var{opt} -mno-recip -mrecip-precision
+-mno-recip-precision @gol
+-mmass}
 
 @emph{RX Options}
 @gccoptlist{-m64bit-doubles  -m32bit-doubles  -fpu  -nofpu@gol
@@ -15847,6 +15849,29 @@ automatically selects @option{-mrecip-pr
 precision square root estimate instructions are not generated by
 default on low precision machines, since they do not provide an
 estimate that converges after three steps.
+
+@item -mmass
+@itemx -mno-mass
+@opindex mmass
+Specifies to use IBM's Mathematical Acceleration Subsystem (MASS)
+libraries to vectorize intrinsics.  GCC
+will currently emit calls to @code{acosd2}, @code{acosf4},
+@code{acoshd2}, @code{acoshf4}, @code{asind2}, @code{asinf4},
+@code{asinhd2}, @code{asinhf4}, @code{atan2d2}, @code{atan2f4},
+@code{atand2}, @code{atanf4}, @code{atanhd2}, @code{atanhf4},
+@code{cbrtd2}, @code{cbrtf4}, @code{cosd2}, @code{cosf4},
+@code{coshd2}, @code{coshf4}, @code{erfcd2}, @code{erfcf4},
+@code{erfd2}, @code{erff4}, @code{exp2d2}, @code{exp2f4},
+@code{expd2}, @code{expf4}, @code{expm1d2}, @code{expm1f4},
+@code{hypotd2}, @code{hypotf4}, @code{lgammad2}, @code{lgammaf4},
+@code{log10d2}, @code{log10f4}, @code{log1pd2}, @code{log1pf4},
+@code{log2d2}, @code{log2f4}, @code{logd2}, @code{logf4},
+@code{powd2}, @code{powf4}, @code{sind2}, @code{sinf4}, @code{sinhd2},
+@code{sinhf4}, @code{sqrtd2}, @code{sqrtf4}, @code{tand2},
+@code{tanf4}, @code{tanhd2}, and @code{tanhf4} when generating code
+for power7.  Both @option{-ftree-vectorize} and
+@option{-funsafe-math-optimizations} must be enabled, and the MASS
+libraries must be specified at link time.
 @end table
 
 @node RX Options
Index: gcc/testsuite/gcc.target/powerpc/vsx-mass-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-mass-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-mass-1.c	(revision 0)
@@ -0,0 +1,554 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O3 -ftree-vectorize -mcpu=power7 -ffast-math -mmass" } */
+/* { dg-final { scan-assembler "bl atan2d2" } } */
+/* { dg-final { scan-assembler "bl atan2f4" } } */
+/* { dg-final { scan-assembler "bl hypotd2" } } */
+/* { dg-final { scan-assembler "bl hypotf4" } } */
+/* { dg-final { scan-assembler "bl powd2" } } */
+/* { dg-final { scan-assembler "bl powf4" } } */
+/* { dg-final { scan-assembler "bl acosd2" } } */
+/* { dg-final { scan-assembler "bl acosf4" } } */
+/* { dg-final { scan-assembler "bl acoshd2" } } */
+/* { dg-final { scan-assembler "bl acoshf4" } } */
+/* { dg-final { scan-assembler "bl asind2" } } */
+/* { dg-final { scan-assembler "bl asinf4" } } */
+/* { dg-final { scan-assembler "bl asinhd2" } } */
+/* { dg-final { scan-assembler "bl asinhf4" } } */
+/* { dg-final { scan-assembler "bl atand2" } } */
+/* { dg-final { scan-assembler "bl atanf4" } } */
+/* { dg-final { scan-assembler "bl atanhd2" } } */
+/* { dg-final { scan-assembler "bl atanhf4" } } */
+/* { dg-final { scan-assembler "bl cbrtd2" } } */
+/* { dg-final { scan-assembler "bl cbrtf4" } } */
+/* { dg-final { scan-assembler "bl cosd2" } } */
+/* { dg-final { scan-assembler "bl cosf4" } } */
+/* { dg-final { scan-assembler "bl coshd2" } } */
+/* { dg-final { scan-assembler "bl coshf4" } } */
+/* { dg-final { scan-assembler "bl erfd2" } } */
+/* { dg-final { scan-assembler "bl erff4" } } */
+/* { dg-final { scan-assembler "bl erfcd2" } } */
+/* { dg-final { scan-assembler "bl erfcf4" } } */
+/* { dg-final { scan-assembler "bl exp2d2" } } */
+/* { dg-final { scan-assembler "bl exp2f4" } } */
+/* { dg-final { scan-assembler "bl expd2" } } */
+/* { dg-final { scan-assembler "bl expf4" } } */
+/* { dg-final { scan-assembler "bl expm1d2" } } */
+/* { dg-final { scan-assembler "bl expm1f4" } } */
+/* { dg-final { scan-assembler "bl lgamma" } } */
+/* { dg-final { scan-assembler "bl lgammaf" } } */
+/* { dg-final { scan-assembler "bl log10d2" } } */
+/* { dg-final { scan-assembler "bl log10f4" } } */
+/* { dg-final { scan-assembler "bl log1pd2" } } */
+/* { dg-final { scan-assembler "bl log1pf4" } } */
+/* { dg-final { scan-assembler "bl log2d2" } } */
+/* { dg-final { scan-assembler "bl log2f4" } } */
+/* { dg-final { scan-assembler "bl logd2" } } */
+/* { dg-final { scan-assembler "bl logf4" } } */
+/* { dg-final { scan-assembler "bl sind2" } } */
+/* { dg-final { scan-assembler "bl sinf4" } } */
+/* { dg-final { scan-assembler "bl sinhd2" } } */
+/* { dg-final { scan-assembler "bl sinhf4" } } */
+/* { dg-final { scan-assembler "bl tand2" } } */
+/* { dg-final { scan-assembler "bl tanf4" } } */
+/* { dg-final { scan-assembler "bl tanhd2" } } */
+/* { dg-final { scan-assembler "bl tanhf4" } } */
+
+#ifndef SIZE
+#define SIZE 1024
+#endif
+
+double d1[SIZE] __attribute__((__aligned__(32)));
+double d2[SIZE] __attribute__((__aligned__(32)));
+double d3[SIZE] __attribute__((__aligned__(32)));
+
+float f1[SIZE] __attribute__((__aligned__(32)));
+float f2[SIZE] __attribute__((__aligned__(32)));
+float f3[SIZE] __attribute__((__aligned__(32)));
+
+void
+test_double_atan2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_atan2 (d2[i], d3[i]);
+}
+
+void
+test_float_atan2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_atan2f (f2[i], f3[i]);
+}
+
+void
+test_double_hypot (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_hypot (d2[i], d3[i]);
+}
+
+void
+test_float_hypot (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_hypotf (f2[i], f3[i]);
+}
+
+void
+test_double_pow (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_pow (d2[i], d3[i]);
+}
+
+void
+test_float_pow (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_powf (f2[i], f3[i]);
+}
+
+void
+test_double_acos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_acos (d2[i]);
+}
+
+void
+test_float_acos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_acosf (f2[i]);
+}
+
+void
+test_double_acosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_acosh (d2[i]);
+}
+
+void
+test_float_acosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_acoshf (f2[i]);
+}
+
+void
+test_double_asin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_asin (d2[i]);
+}
+
+void
+test_float_asin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_asinf (f2[i]);
+}
+
+void
+test_double_asinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_asinh (d2[i]);
+}
+
+void
+test_float_asinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_asinhf (f2[i]);
+}
+
+void
+test_double_atan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_atan (d2[i]);
+}
+
+void
+test_float_atan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_atanf (f2[i]);
+}
+
+void
+test_double_atanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_atanh (d2[i]);
+}
+
+void
+test_float_atanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_atanhf (f2[i]);
+}
+
+void
+test_double_cbrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_cbrt (d2[i]);
+}
+
+void
+test_float_cbrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_cbrtf (f2[i]);
+}
+
+void
+test_double_cos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_cos (d2[i]);
+}
+
+void
+test_float_cos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_cosf (f2[i]);
+}
+
+void
+test_double_cosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_cosh (d2[i]);
+}
+
+void
+test_float_cosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_coshf (f2[i]);
+}
+
+void
+test_double_erf (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_erf (d2[i]);
+}
+
+void
+test_float_erf (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_erff (f2[i]);
+}
+
+void
+test_double_erfc (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_erfc (d2[i]);
+}
+
+void
+test_float_erfc (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_erfcf (f2[i]);
+}
+
+void
+test_double_exp2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_exp2 (d2[i]);
+}
+
+void
+test_float_exp2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_exp2f (f2[i]);
+}
+
+void
+test_double_exp (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_exp (d2[i]);
+}
+
+void
+test_float_exp (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_expf (f2[i]);
+}
+
+void
+test_double_expm1 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_expm1 (d2[i]);
+}
+
+void
+test_float_expm1 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_expm1f (f2[i]);
+}
+
+void
+test_double_lgamma (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_lgamma (d2[i]);
+}
+
+void
+test_float_lgamma (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_lgammaf (f2[i]);
+}
+
+void
+test_double_log10 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log10 (d2[i]);
+}
+
+void
+test_float_log10 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_log10f (f2[i]);
+}
+
+void
+test_double_log1p (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log1p (d2[i]);
+}
+
+void
+test_float_log1p (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_log1pf (f2[i]);
+}
+
+void
+test_double_log2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log2 (d2[i]);
+}
+
+void
+test_float_log2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_log2f (f2[i]);
+}
+
+void
+test_double_log (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log (d2[i]);
+}
+
+void
+test_float_log (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_logf (f2[i]);
+}
+
+void
+test_double_sin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_sin (d2[i]);
+}
+
+void
+test_float_sin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_sinf (f2[i]);
+}
+
+void
+test_double_sinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_sinh (d2[i]);
+}
+
+void
+test_float_sinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_sinhf (f2[i]);
+}
+
+void
+test_double_sqrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_sqrt (d2[i]);
+}
+
+void
+test_float_sqrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_sqrtf (f2[i]);
+}
+
+void
+test_double_tan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_tan (d2[i]);
+}
+
+void
+test_float_tan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_tanf (f2[i]);
+}
+
+void
+test_double_tanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_tanh (d2[i]);
+}
+
+void
+test_float_tanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_tanhf (f2[i]);
+}


* Re: [PATCH, powerpc] Add -mmass to use XL's MASS vectorization library
  2010-08-18 20:36 [PATCH, powerpc] Add -mmass to use XL's MASS vectorization library Michael Meissner
@ 2010-08-18 20:41 ` Richard Guenther
  2010-08-18 20:59   ` Michael Meissner
  2010-08-18 22:17   ` Michael Meissner
  0 siblings, 2 replies; 6+ messages in thread
From: Richard Guenther @ 2010-08-18 20:41 UTC
  To: Michael Meissner, gcc-patches, dje.gcc

On Wed, Aug 18, 2010 at 10:32 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> This patch was cloned from the i386 -mveclibabi=<xxx> support, and it adds a
> new switch (-mmass) that says to vectorize various mathematical functions (sin,
> cos, etc.) on power7 systems.  This patch greatly speeds up three of the SPEC CPU2006
> floating-point benchmarks (tonto, wrf, GemsFDTD) that heavily use the math
> functions.  I have bootstrapped and run comparison tests on my power systems,
> and there were no regressions.  Is it OK to install in the tree?

In the case that we develop a common library for all archs it would be nice
to have the same switch for ppc as we have for x86, so why didn't you
use -mveclibabi=mass?

Richard.

> [gcc]
> 2010-08-18  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>        * config/rs6000/rs6000.opt (-mmass): New option to enable the
>        compiler to autovectorize mathematical functions for power7 using
>        the Mathematical Acceleration Subsystem library.
>
>        * config/rs6000/rs6000.c (rs6000_builtin_vectorized_libmass): New
>        function to handle auto vectorizing math functions that are in the
>        MASS library.
>        (rs6000_builtin_vectorized_function): Call it.
>
>        * doc/invoke.texi (RS/6000 and PowerPC Options): Document -mmass.
>
> [gcc/testsuite]
> 2010-08-18  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>        * gcc.target/powerpc/vsx-mass-1.c: New file, test -mmass.
>
> --
> Michael Meissner, IBM
> 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
> meissner@linux.vnet.ibm.com
>


* Re: [PATCH, powerpc] Add -mmass to use XL's MASS vectorization library
  2010-08-18 20:41 ` Richard Guenther
@ 2010-08-18 20:59   ` Michael Meissner
  2010-08-18 21:08     ` Sebastian Pop
  2010-08-18 22:17   ` Michael Meissner
  1 sibling, 1 reply; 6+ messages in thread
From: Michael Meissner @ 2010-08-18 20:59 UTC
  To: Richard Guenther; +Cc: Michael Meissner, gcc-patches, dje.gcc

On Wed, Aug 18, 2010 at 10:36:13PM +0200, Richard Guenther wrote:
> On Wed, Aug 18, 2010 at 10:32 PM, Michael Meissner
> <meissner@linux.vnet.ibm.com> wrote:
> > This patch was cloned from the i386 -mveclibabi=<xxx> support, and it adds a
> > new switch (-mmass) that says to vectorize various mathematical functions (sin,
> > cos, etc.) on power7 systems.  This patch greatly speeds up three of the SPEC CPU2006
> > floating-point benchmarks (tonto, wrf, GemsFDTD) that heavily use the math
> > functions.  I have bootstrapped and run comparison tests on my power systems,
> > and there were no regressions.  Is it OK to install in the tree?
> 
> In the case that we develop a common library for all archs it would be nice
> to have the same switch for ppc as we have for x86, so why didn't you
> use -mveclibabi=mass?

That sounds reasonable.

It isn't in this patch, but at some point I think it would be useful to add
an SSA pass that replaces such a loop with a single call to a function that
takes pointers and a length argument.  This way, the library can properly
deal with load delays, etc.  If memory serves, the Intel and AMD optimized
math libraries have similar functions, though the order of the arguments
differs from the MASS arguments.  Is this the case?

If I wasn't clear, consider the loop:

	for (i = 0; i < size; i++)
	  a[i] = __builtin_sin (b[i]);

right now gets transformed to:

	V2DF_a_ptr = (V2DF *)a;
	V2DF_b_ptr = (V2DF *)b;
	for (i = 0; i < size/2; i++)
	  V2DF_a_ptr[i] = sind2 (V2DF_b_ptr[i]);

and instead it should generate:

	len_tmp = size;
	vsin (a, b, &len_tmp);

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meissner@linux.vnet.ibm.com


* Re: [PATCH, powerpc] Add -mmass to use XL's MASS vectorization library
  2010-08-18 20:59   ` Michael Meissner
@ 2010-08-18 21:08     ` Sebastian Pop
  0 siblings, 0 replies; 6+ messages in thread
From: Sebastian Pop @ 2010-08-18 21:08 UTC
  To: Michael Meissner, Richard Guenther, gcc-patches, dje.gcc

On Wed, Aug 18, 2010 at 15:50, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> On Wed, Aug 18, 2010 at 10:36:13PM +0200, Richard Guenther wrote:
>> On Wed, Aug 18, 2010 at 10:32 PM, Michael Meissner
>> <meissner@linux.vnet.ibm.com> wrote:
>> > This patch was cloned from the i386 -mveclibabi=<xxx> support, and it adds a
>> > new switch (-mmass) that says to vectorize various mathematical functions (sin,
>> > cos, etc.) on power7 systems.  This patch greatly speeds up three of the SPEC CPU2006
>> > floating-point benchmarks (tonto, wrf, GemsFDTD) that heavily use the math
>> > functions.  I have bootstrapped and run comparison tests on my power systems,
>> > and there were no regressions.  Is it OK to install in the tree?
>>
>> In the case that we develop a common library for all archs it would be nice
>> to have the same switch for ppc as we have for x86, so why didn't you
>> use -mveclibabi=mass?
>
> That sounds reasonable.
>
> It isn't in this patch, but at some point I think it would be useful to add
> an SSA pass that replaces such a loop with a single call to a function that
> takes pointers and a length argument.  This way, the library can properly
> deal with load delays, etc.  If memory serves, the Intel and AMD optimized
> math libraries have similar functions, though the order of the arguments
> differs from the MASS arguments.  Is this the case?
>
> If I wasn't clear, consider the loop:
>
>        for (i = 0; i < size; i++)
>          a[i] = __builtin_sin (b[i]);
>
> right now gets transformed to:
>
>        V2DF_a_ptr = (V2DF *)a;
>        V2DF_b_ptr = (V2DF *)b;
>        for (i = 0; i < size/2; i++)
>          V2DF_a_ptr[i] = sind2 (V2DF_b_ptr[i]);
>
> and instead it should generate:
>
>        len_tmp = size;
>        vsin (a, b, &len_tmp);
>

I also thought about this transform, and I think it
could be called from loop distribution.

Sebastian


* Re: [PATCH, powerpc] Add -mmass to use XL's MASS vectorization library
  2010-08-18 20:41 ` Richard Guenther
  2010-08-18 20:59   ` Michael Meissner
@ 2010-08-18 22:17   ` Michael Meissner
  2010-08-20 14:44     ` David Edelsohn
  1 sibling, 1 reply; 6+ messages in thread
From: Michael Meissner @ 2010-08-18 22:17 UTC
  To: Richard Guenther; +Cc: Michael Meissner, gcc-patches, dje.gcc

[-- Attachment #1: Type: text/plain, Size: 1852 bytes --]

On Wed, Aug 18, 2010 at 10:36:13PM +0200, Richard Guenther wrote:
> On Wed, Aug 18, 2010 at 10:32 PM, Michael Meissner
> <meissner@linux.vnet.ibm.com> wrote:
> > This patch was cloned from the i386 -mveclibabi=<xxx> support, and it adds a
> > new switch (-mmass) that says to vectorize various mathematical functions (sin,
> > cos, etc.) on power7 systems.  This patch greatly speeds up three of the SPEC CPU2006
> > floating-point benchmarks (tonto, wrf, GemsFDTD) that heavily use the math
> > functions.  I have bootstrapped and run comparison tests on my power systems,
> > and there were no regressions.  Is it OK to install in the tree?
> 
> In the case that we develop a common library for all archs it would be nice
> to have the same switch for ppc as we have for x86, so why didn't you
> use -mveclibabi=mass?

This revised patch changes the name of the switch to -mveclibabi=mass.  Is it
ok to apply?

[gcc]
2010-08-18  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.opt (-mveclibabi=mass): New option to
	enable the compiler to autovectorize mathematical functions for
	power7 using the Mathematical Acceleration Subsystem library.

	* config/rs6000/rs6000.c (rs6000_veclib_handler): New variable to
	record which vector math library to use.
	(rs6000_override_options): Add -mveclibabi=mass support.
	(rs6000_builtin_vectorized_libmass): New function to handle auto
	vectorizing math functions that are in the MASS library.
	(rs6000_builtin_vectorized_function): Call it.

	* doc/invoke.texi (RS/6000 and PowerPC Options): Document
	-mveclibabi=mass.

[gcc/testsuite]
2010-08-18  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/vsx-mass-1.c: New file, test
	-mveclibabi=mass.

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meissner@linux.vnet.ibm.com

[-- Attachment #2: gcc-power7.patch151b --]
[-- Type: text/plain, Size: 18620 bytes --]

Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk)	(revision 163345)
+++ gcc/doc/invoke.texi	(working copy)
@@ -786,7 +786,9 @@ See RS/6000 and PowerPC Options.
 -mprototype  -mno-prototype @gol
 -msim  -mmvme  -mads  -myellowknife  -memb  -msdata @gol
 -msdata=@var{opt}  -mvxworks  -G @var{num}  -pthread @gol
--mrecip -mrecip=@var{opt} -mno-recip -mrecip-precision -mno-recip-precision}
+-mrecip -mrecip=@var{opt} -mno-recip -mrecip-precision
+-mno-recip-precision @gol
+-mveclibabi=@var{type}}
 
 @emph{RX Options}
 @gccoptlist{-m64bit-doubles  -m32bit-doubles  -fpu  -nofpu@gol
@@ -15847,6 +15849,30 @@ automatically selects @option{-mrecip-pr
 precision square root estimate instructions are not generated by
 default on low precision machines, since they do not provide an
 estimate that converges after three steps.
+
+@item -mveclibabi=@var{type}
+@opindex mveclibabi
+Specifies the ABI type to use for vectorizing intrinsics using an
+external library.  The only type supported at present is @code{mass},
+which specifies to use IBM's Mathematical Acceleration Subsystem
+(MASS) libraries to vectorize the intrinsics.
+GCC will currently emit calls to @code{acosd2}, @code{acosf4},
+@code{acoshd2}, @code{acoshf4}, @code{asind2}, @code{asinf4},
+@code{asinhd2}, @code{asinhf4}, @code{atan2d2}, @code{atan2f4},
+@code{atand2}, @code{atanf4}, @code{atanhd2}, @code{atanhf4},
+@code{cbrtd2}, @code{cbrtf4}, @code{cosd2}, @code{cosf4},
+@code{coshd2}, @code{coshf4}, @code{erfcd2}, @code{erfcf4},
+@code{erfd2}, @code{erff4}, @code{exp2d2}, @code{exp2f4},
+@code{expd2}, @code{expf4}, @code{expm1d2}, @code{expm1f4},
+@code{hypotd2}, @code{hypotf4}, @code{lgammad2}, @code{lgammaf4},
+@code{log10d2}, @code{log10f4}, @code{log1pd2}, @code{log1pf4},
+@code{log2d2}, @code{log2f4}, @code{logd2}, @code{logf4},
+@code{powd2}, @code{powf4}, @code{sind2}, @code{sinf4}, @code{sinhd2},
+@code{sinhf4}, @code{sqrtd2}, @code{sqrtf4}, @code{tand2},
+@code{tanf4}, @code{tanhd2}, and @code{tanhf4} when generating code
+for power7.  Both @option{-ftree-vectorize} and
+@option{-funsafe-math-optimizations} must be enabled, and the MASS
+libraries must be specified at link time.
 @end table
 
 @node RX Options
Index: gcc/testsuite/gcc.target/powerpc/vsx-mass-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-mass-1.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-mass-1.c	(revision 163355)
@@ -0,0 +1,554 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O3 -ftree-vectorize -mcpu=power7 -ffast-math -mveclibabi=mass" } */
+/* { dg-final { scan-assembler "bl atan2d2" } } */
+/* { dg-final { scan-assembler "bl atan2f4" } } */
+/* { dg-final { scan-assembler "bl hypotd2" } } */
+/* { dg-final { scan-assembler "bl hypotf4" } } */
+/* { dg-final { scan-assembler "bl powd2" } } */
+/* { dg-final { scan-assembler "bl powf4" } } */
+/* { dg-final { scan-assembler "bl acosd2" } } */
+/* { dg-final { scan-assembler "bl acosf4" } } */
+/* { dg-final { scan-assembler "bl acoshd2" } } */
+/* { dg-final { scan-assembler "bl acoshf4" } } */
+/* { dg-final { scan-assembler "bl asind2" } } */
+/* { dg-final { scan-assembler "bl asinf4" } } */
+/* { dg-final { scan-assembler "bl asinhd2" } } */
+/* { dg-final { scan-assembler "bl asinhf4" } } */
+/* { dg-final { scan-assembler "bl atand2" } } */
+/* { dg-final { scan-assembler "bl atanf4" } } */
+/* { dg-final { scan-assembler "bl atanhd2" } } */
+/* { dg-final { scan-assembler "bl atanhf4" } } */
+/* { dg-final { scan-assembler "bl cbrtd2" } } */
+/* { dg-final { scan-assembler "bl cbrtf4" } } */
+/* { dg-final { scan-assembler "bl cosd2" } } */
+/* { dg-final { scan-assembler "bl cosf4" } } */
+/* { dg-final { scan-assembler "bl coshd2" } } */
+/* { dg-final { scan-assembler "bl coshf4" } } */
+/* { dg-final { scan-assembler "bl erfd2" } } */
+/* { dg-final { scan-assembler "bl erff4" } } */
+/* { dg-final { scan-assembler "bl erfcd2" } } */
+/* { dg-final { scan-assembler "bl erfcf4" } } */
+/* { dg-final { scan-assembler "bl exp2d2" } } */
+/* { dg-final { scan-assembler "bl exp2f4" } } */
+/* { dg-final { scan-assembler "bl expd2" } } */
+/* { dg-final { scan-assembler "bl expf4" } } */
+/* { dg-final { scan-assembler "bl expm1d2" } } */
+/* { dg-final { scan-assembler "bl expm1f4" } } */
+/* { dg-final { scan-assembler "bl lgamma" } } */
+/* { dg-final { scan-assembler "bl lgammaf" } } */
+/* { dg-final { scan-assembler "bl log10d2" } } */
+/* { dg-final { scan-assembler "bl log10f4" } } */
+/* { dg-final { scan-assembler "bl log1pd2" } } */
+/* { dg-final { scan-assembler "bl log1pf4" } } */
+/* { dg-final { scan-assembler "bl log2d2" } } */
+/* { dg-final { scan-assembler "bl log2f4" } } */
+/* { dg-final { scan-assembler "bl logd2" } } */
+/* { dg-final { scan-assembler "bl logf4" } } */
+/* { dg-final { scan-assembler "bl sind2" } } */
+/* { dg-final { scan-assembler "bl sinf4" } } */
+/* { dg-final { scan-assembler "bl sinhd2" } } */
+/* { dg-final { scan-assembler "bl sinhf4" } } */
+/* { dg-final { scan-assembler "bl tand2" } } */
+/* { dg-final { scan-assembler "bl tanf4" } } */
+/* { dg-final { scan-assembler "bl tanhd2" } } */
+/* { dg-final { scan-assembler "bl tanhf4" } } */
+
+#ifndef SIZE
+#define SIZE 1024
+#endif
+
+double d1[SIZE] __attribute__((__aligned__(32)));
+double d2[SIZE] __attribute__((__aligned__(32)));
+double d3[SIZE] __attribute__((__aligned__(32)));
+
+float f1[SIZE] __attribute__((__aligned__(32)));
+float f2[SIZE] __attribute__((__aligned__(32)));
+float f3[SIZE] __attribute__((__aligned__(32)));
+
+void
+test_double_atan2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_atan2 (d2[i], d3[i]);
+}
+
+void
+test_float_atan2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_atan2f (f2[i], f3[i]);
+}
+
+void
+test_double_hypot (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_hypot (d2[i], d3[i]);
+}
+
+void
+test_float_hypot (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_hypotf (f2[i], f3[i]);
+}
+
+void
+test_double_pow (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_pow (d2[i], d3[i]);
+}
+
+void
+test_float_pow (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_powf (f2[i], f3[i]);
+}
+
+void
+test_double_acos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_acos (d2[i]);
+}
+
+void
+test_float_acos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_acosf (f2[i]);
+}
+
+void
+test_double_acosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_acosh (d2[i]);
+}
+
+void
+test_float_acosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_acoshf (f2[i]);
+}
+
+void
+test_double_asin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_asin (d2[i]);
+}
+
+void
+test_float_asin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_asinf (f2[i]);
+}
+
+void
+test_double_asinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_asinh (d2[i]);
+}
+
+void
+test_float_asinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_asinhf (f2[i]);
+}
+
+void
+test_double_atan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_atan (d2[i]);
+}
+
+void
+test_float_atan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_atanf (f2[i]);
+}
+
+void
+test_double_atanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_atanh (d2[i]);
+}
+
+void
+test_float_atanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_atanhf (f2[i]);
+}
+
+void
+test_double_cbrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_cbrt (d2[i]);
+}
+
+void
+test_float_cbrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_cbrtf (f2[i]);
+}
+
+void
+test_double_cos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_cos (d2[i]);
+}
+
+void
+test_float_cos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_cosf (f2[i]);
+}
+
+void
+test_double_cosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_cosh (d2[i]);
+}
+
+void
+test_float_cosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_coshf (f2[i]);
+}
+
+void
+test_double_erf (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_erf (d2[i]);
+}
+
+void
+test_float_erf (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_erff (f2[i]);
+}
+
+void
+test_double_erfc (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_erfc (d2[i]);
+}
+
+void
+test_float_erfc (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_erfcf (f2[i]);
+}
+
+void
+test_double_exp2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_exp2 (d2[i]);
+}
+
+void
+test_float_exp2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_exp2f (f2[i]);
+}
+
+void
+test_double_exp (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_exp (d2[i]);
+}
+
+void
+test_float_exp (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_expf (f2[i]);
+}
+
+void
+test_double_expm1 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_expm1 (d2[i]);
+}
+
+void
+test_float_expm1 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_expm1f (f2[i]);
+}
+
+void
+test_double_lgamma (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_lgamma (d2[i]);
+}
+
+void
+test_float_lgamma (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_lgammaf (f2[i]);
+}
+
+void
+test_double_log10 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log10 (d2[i]);
+}
+
+void
+test_float_log10 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_log10f (f2[i]);
+}
+
+void
+test_double_log1p (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log1p (d2[i]);
+}
+
+void
+test_float_log1p (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_log1pf (f2[i]);
+}
+
+void
+test_double_log2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log2 (d2[i]);
+}
+
+void
+test_float_log2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_log2f (f2[i]);
+}
+
+void
+test_double_log (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log (d2[i]);
+}
+
+void
+test_float_log (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_logf (f2[i]);
+}
+
+void
+test_double_sin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_sin (d2[i]);
+}
+
+void
+test_float_sin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_sinf (f2[i]);
+}
+
+void
+test_double_sinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_sinh (d2[i]);
+}
+
+void
+test_float_sinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_sinhf (f2[i]);
+}
+
+void
+test_double_sqrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_sqrt (d2[i]);
+}
+
+void
+test_float_sqrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_sqrtf (f2[i]);
+}
+
+void
+test_double_tan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_tan (d2[i]);
+}
+
+void
+test_float_tan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_tanf (f2[i]);
+}
+
+void
+test_double_tanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_tanh (d2[i]);
+}
+
+void
+test_float_tanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_tanhf (f2[i]);
+}
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk)	(revision 163345)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -115,6 +115,10 @@ mpopcntd
 Target Report Mask(POPCNTD)
 Use PowerPC V2.06 popcntd instruction
 
+mveclibabi=
+Target RejectNegative Joined Var(rs6000_veclibabi_name)
+Vector library ABI to use
+
 mvsx
 Target Report Mask(VSX)
 Use vector/scalar (VSX) instructions
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk)	(revision 163345)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -949,6 +949,9 @@ static const enum rs6000_btc builtin_cla
 #undef RS6000_BUILTIN
 #undef RS6000_BUILTIN_EQUATE
 
+/* Support for -mveclibabi=<xxx> to control which vector library to use.  */
+static tree (*rs6000_veclib_handler) (tree, tree, tree);
+
 \f
 static bool rs6000_function_ok_for_sibcall (tree, tree);
 static const char *rs6000_invalid_within_doloop (const_rtx);
@@ -989,6 +992,7 @@ static rtx rs6000_emit_stack_reset (rs60
 static rtx rs6000_make_savres_rtx (rs6000_stack_t *, rtx, int,
 				   enum machine_mode, bool, bool, bool);
 static bool rs6000_reg_live_or_pic_offset_p (int);
+static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
 static tree rs6000_builtin_vectorized_function (tree, tree, tree);
 static int rs6000_savres_strategy (rs6000_stack_t *, bool, int, int);
 static void rs6000_restore_saved_cr (rtx, int);
@@ -2771,6 +2775,15 @@ rs6000_override_options (const char *def
 	       rs6000_traceback_name);
     }
 
+  if (rs6000_veclibabi_name)
+    {
+      if (strcmp (rs6000_veclibabi_name, "mass") == 0)
+	rs6000_veclib_handler = rs6000_builtin_vectorized_libmass;
+      else
+	error ("unknown vectorization library ABI type (%s) for "
+	       "-mveclibabi= switch", rs6000_veclibabi_name);
+    }
+
   if (!rs6000_explicit_options.long_double)
     rs6000_long_double_type_size = RS6000_DEFAULT_LONG_DOUBLE_SIZE;
 
@@ -3602,6 +3615,145 @@ rs6000_parse_fpu_option (const char *opt
   return FPU_NONE;
 }
 
+
+/* Handler for the Mathematical Acceleration Subsystem (MASS) interface to a
+   library with vectorized intrinsics.  */
+
+static tree
+rs6000_builtin_vectorized_libmass (tree fndecl, tree type_out, tree type_in)
+{
+  char name[32];
+  const char *suffix = NULL;
+  tree fntype, new_fndecl, bdecl = NULL_TREE;
+  int n_args = 1;
+  const char *bname;
+  enum machine_mode el_mode, in_mode;
+  int n, in_n;
+
+  /* Libmass is suitable only for unsafe math, since it does not correctly
+     support all of IEEE 754 with the required precision (e.g. denormals).
+     Only support it if we have VSX to use the SIMD d2 or f4 functions.
+     XXX: Add variable length support.  */
+  if (!flag_unsafe_math_optimizations || !TARGET_VSX)
+    return NULL_TREE;
+
+  el_mode = TYPE_MODE (TREE_TYPE (type_out));
+  n = TYPE_VECTOR_SUBPARTS (type_out);
+  in_mode = TYPE_MODE (TREE_TYPE (type_in));
+  in_n = TYPE_VECTOR_SUBPARTS (type_in);
+  if (el_mode != in_mode
+      || n != in_n)
+    return NULL_TREE;
+
+  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
+    {
+      enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
+      switch (fn)
+	{
+	case BUILT_IN_ATAN2:
+	case BUILT_IN_HYPOT:
+	case BUILT_IN_POW:
+	  n_args = 2;
+	  /* fall through */
+
+	case BUILT_IN_ACOS:
+	case BUILT_IN_ACOSH:
+	case BUILT_IN_ASIN:
+	case BUILT_IN_ASINH:
+	case BUILT_IN_ATAN:
+	case BUILT_IN_ATANH:
+	case BUILT_IN_CBRT:
+	case BUILT_IN_COS:
+	case BUILT_IN_COSH:
+	case BUILT_IN_ERF:
+	case BUILT_IN_ERFC:
+	case BUILT_IN_EXP2:
+	case BUILT_IN_EXP:
+	case BUILT_IN_EXPM1:
+	case BUILT_IN_LGAMMA:
+	case BUILT_IN_LOG10:
+	case BUILT_IN_LOG1P:
+	case BUILT_IN_LOG2:
+	case BUILT_IN_LOG:
+	case BUILT_IN_SIN:
+	case BUILT_IN_SINH:
+	case BUILT_IN_SQRT:
+	case BUILT_IN_TAN:
+	case BUILT_IN_TANH:
+	  bdecl = implicit_built_in_decls[fn];
+	  suffix = "d2";				/* pow -> powd2 */
+	  if (el_mode != DFmode
+	      || n != 2)
+	    return NULL_TREE;
+	  break;
+
+	case BUILT_IN_ATAN2F:
+	case BUILT_IN_HYPOTF:
+	case BUILT_IN_POWF:
+	  n_args = 2;
+	  /* fall through */
+
+	case BUILT_IN_ACOSF:
+	case BUILT_IN_ACOSHF:
+	case BUILT_IN_ASINF:
+	case BUILT_IN_ASINHF:
+	case BUILT_IN_ATANF:
+	case BUILT_IN_ATANHF:
+	case BUILT_IN_CBRTF:
+	case BUILT_IN_COSF:
+	case BUILT_IN_COSHF:
+	case BUILT_IN_ERFF:
+	case BUILT_IN_ERFCF:
+	case BUILT_IN_EXP2F:
+	case BUILT_IN_EXPF:
+	case BUILT_IN_EXPM1F:
+	case BUILT_IN_LGAMMAF:
+	case BUILT_IN_LOG10F:
+	case BUILT_IN_LOG1PF:
+	case BUILT_IN_LOG2F:
+	case BUILT_IN_LOGF:
+	case BUILT_IN_SINF:
+	case BUILT_IN_SINHF:
+	case BUILT_IN_SQRTF:
+	case BUILT_IN_TANF:
+	case BUILT_IN_TANHF:
+	  bdecl = implicit_built_in_decls[fn];
+	  suffix = "4";					/* powf -> powf4 */
+	  if (el_mode != SFmode
+	      || n != 4)
+	    return NULL_TREE;
+	  break;
+
+	default:
+	  return NULL_TREE;
+	}
+    }
+  else
+    return NULL_TREE;
+
+  gcc_assert (suffix != NULL);
+  bname = IDENTIFIER_POINTER (DECL_NAME (bdecl));
+  strcpy (name, bname + sizeof ("__builtin_") - 1);
+  strcat (name, suffix);
+
+  if (n_args == 1)
+    fntype = build_function_type_list (type_out, type_in, NULL);
+  else if (n_args == 2)
+    fntype = build_function_type_list (type_out, type_in, type_in, NULL);
+  else
+    gcc_unreachable ();
+
+  /* Build a function declaration for the vectorized function.  */
+  new_fndecl = build_decl (BUILTINS_LOCATION,
+			   FUNCTION_DECL, get_identifier (name), fntype);
+  TREE_PUBLIC (new_fndecl) = 1;
+  DECL_EXTERNAL (new_fndecl) = 1;
+  DECL_IS_NOVOPS (new_fndecl) = 1;
+  TREE_READONLY (new_fndecl) = 1;
+
+  return new_fndecl;
+}
+
 /* Returns a function decl for a vectorized version of the builtin function
    with builtin function code FN and the result vector type TYPE, or NULL_TREE
    if it is not available.  */
@@ -3768,6 +3920,10 @@ rs6000_builtin_vectorized_function (tree
 	}
     }
 
+  /* Generate calls to libmass if appropriate.  */
+  if (rs6000_veclib_handler)
+    return rs6000_veclib_handler (fndecl, type_out, type_in);
+
   return NULL_TREE;
 }
 

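The naming rule in rs6000_builtin_vectorized_libmass (strip the `__builtin_` prefix from the scalar builtin's name, then append `d2` for 2-lane double or `4` for 4-lane float, and give up otherwise) can be sketched as a standalone C function. The enum `elem_kind` and the function `mass_vector_name` are hypothetical; GCC itself works on tree nodes and machine modes (DFmode/SFmode), not on strings and enums, so this is an illustration of the mapping rather than the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical element kinds standing in for DFmode and SFmode.  */
enum elem_kind { ELEM_DOUBLE, ELEM_FLOAT };

/* Writes the MASS SIMD name for BUILTIN_NAME into BUF and returns BUF.
   Returns NULL when the element kind / lane count pair has no MASS SIMD
   variant (mirroring the NULL_TREE returns in the patch) or when BUF is
   too small.  */
const char *
mass_vector_name (const char *builtin_name, enum elem_kind kind,
                  int lanes, char *buf, size_t bufsize)
{
  static const char prefix[] = "__builtin_";
  const char *suffix;

  if (kind == ELEM_DOUBLE && lanes == 2)
    suffix = "d2";                      /* pow  -> powd2 */
  else if (kind == ELEM_FLOAT && lanes == 4)
    suffix = "4";                       /* powf -> powf4 */
  else
    return NULL;

  /* Builtin decl names carry the "__builtin_" prefix.  */
  assert (strncmp (builtin_name, prefix, sizeof prefix - 1) == 0);

  /* Bounded copy of the name minus its prefix, followed by the suffix.  */
  int len = snprintf (buf, bufsize, "%s%s",
                      builtin_name + sizeof prefix - 1, suffix);
  if (len < 0 || (size_t) len >= bufsize)
    return NULL;
  return buf;
}
```

For example, a 2-lane double request for `__builtin_atan2` yields `atan2d2`, the symbol the test expects in `bl atan2d2`, while a 4-lane double request yields NULL, just as the patch returns NULL_TREE for a mode/lane mismatch.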

* Re: [PATCH, powerpc] Add -mmass to use XL's MASS vectorization library
  2010-08-18 22:17   ` Michael Meissner
@ 2010-08-20 14:44     ` David Edelsohn
  0 siblings, 0 replies; 6+ messages in thread
From: David Edelsohn @ 2010-08-20 14:44 UTC (permalink / raw)
  To: Michael Meissner, GCC Patches

On Wed, Aug 18, 2010 at 6:04 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:

>        * config/rs6000/rs6000.opt (-mveclibabi=mass): New option to
>        enable the compiler to autovectorize mathematical functions for
>        power7 using the Mathematical Acceleration Subsystem library.
>
>        * config/rs6000/rs6000.c (rs6000_veclib_handler): New variable
>        recording which vector math library to use.
>        (rs6000_override_options): Add -mveclibabi=mass support.
>        (rs6000_builtin_vectorized_libmass): New function to handle auto
>        vectorizing math functions that are in the MASS library.
>        (rs6000_builtin_vectorized_function): Call it.
>
>        * doc/invoke.texi (RS/6000 and PowerPC Options): Document
>        -mveclibabi=mass.
>
> [gcc/testsuite]
> 2010-08-18  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>        * gcc.target/powerpc/vsx-mass-1.c: New file, test
>        -mveclibabi=mass.

Okay.

Thanks, David


end of thread

Thread overview: 6+ messages
2010-08-18 20:36 [PATCH, powerpc] Add -mmass to use XL's MASS vectorization library Michael Meissner
2010-08-18 20:41 ` Richard Guenther
2010-08-18 20:59   ` Michael Meissner
2010-08-18 21:08     ` Sebastian Pop
2010-08-18 22:17   ` Michael Meissner
2010-08-20 14:44     ` David Edelsohn